gawk goes beyond standard awk’s multidimensional array access and provides true arrays of
arrays. Elements of a subarray are referred to by their own indices
enclosed in square brackets, just like the elements of the main array.
For example, the following creates a two-element subarray at index 1
of the main array a:
a[1][1] = 1
a[1][2] = 2
This simulates a true two-dimensional array. Each subarray element can
contain another subarray as a value, which in turn can hold other arrays
as well. In this way, you can create arrays of three or more dimensions.
The indices can be any awk expressions, including scalars
separated by commas (i.e., a regular awk simulated
multidimensional subscript). So the following is valid in gawk:
a[1][3][1, "name"] = "barney"
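As a small sketch of how the comma-separated subscript behaves inside a subarray, the following gawk program stores the element from the example above and then splits the combined index back into its parts using SUBSEP. The function name split_subscript is made up for this illustration and is not part of gawk:

```awk
# Hypothetical helper: split a simulated multidimensional subscript
# (its parts are joined by SUBSEP) into the array 'parts'.
# Returns the number of parts.
function split_subscript(idx, parts)
{
    return split(idx, parts, SUBSEP)
}

BEGIN {
    a[1][3][1, "name"] = "barney"

    # The innermost subarray has a single index: 1 SUBSEP "name".
    for (k in a[1][3]) {
        n = split_subscript(k, parts)
        print n, parts[1], parts[2], a[1][3][k]
    }
}
```

The innermost subarray holds only one element, because the comma-separated pair is concatenated into a single index string rather than adding another array level.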
Each subarray and the main array can be of different length. In fact, the
elements of an array or its subarray do not all have to have the same
type. This means that the main array and any of its subarrays can be
nonrectangular, or jagged in structure. You can assign a scalar value to
the index 4 of the main array a, even though a[1]
is itself an array and not a scalar:
a[4] = "An element in a jagged array"
The terms dimension, row, and column are meaningless when applied
to such an array, but we will use “dimension” henceforth to imply the
maximum number of indices needed to refer to an existing element. The
type of any element that has already been assigned cannot be changed
by assigning a value of a different type. You have to first delete the
current element, which effectively makes gawk forget about
the element at that index:
delete a[4]
a[4][5][6][7] = "An element in a four-dimensional array"
This removes the scalar value from index 4 and then inserts a
three-level nested subarray containing a scalar. You can also
delete an entire subarray or subarray of subarrays:
delete a[4][5]
a[4][5] = "An element in subarray a[4]"
But recall that you cannot delete the main array a and then use it
as a scalar.
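The delete-then-assign sequence for an element can be wrapped in a small helper. The following is only a sketch for gawk; the function name replace_with_scalar is hypothetical and chosen for this illustration:

```awk
# Hypothetical helper: store a scalar at arr[idx] regardless of what the
# element currently holds, by deleting the element first.
function replace_with_scalar(arr, idx, value)
{
    delete arr[idx]      # discards a scalar or a whole subarray
    arr[idx] = value
}

BEGIN {
    a[4][5][6][7] = "An element in a four-dimensional array"
    print isarray(a[4])          # a[4] is a subarray at this point

    replace_with_scalar(a, 4, "now a scalar")
    print isarray(a[4]), a[4]    # a[4] is a scalar at this point
}
```

Without the delete, the assignment inside the helper would try to change the type of an existing element, which is exactly what the text above says is not allowed.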
The built-in functions that take array arguments can also be used
with subarrays. For example, the following code fragment uses length()
(see String Functions) to determine the number of elements in the
main array a and its subarrays:
print length(a), length(a[1]), length(a[1][3])
This results in the following output for our main array a:
2 3 1
The ‘subscript in array’ expression
(see Reference to Elements) works similarly for both
regular awk-style
arrays and arrays of arrays. For example, the tests ‘1 in a’,
‘3 in a[1]’, and ‘(1, "name") in a[1][3]’ all evaluate to
one (true) for our array a.
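In a jagged array, an element at a given index may be a scalar, a subarray, or absent, so a nested membership test is safest when it checks each level in turn. The sketch below assumes gawk; has_nested is a hypothetical name introduced only for this example:

```awk
# Hypothetical helper: true if arr[i] exists, is a subarray, and has index j.
# The checks run left to right, so arr[i] is never used as an array
# unless it already is one.
function has_nested(arr, i, j)
{
    return (i in arr) && isarray(arr[i]) && (j in arr[i])
}

BEGIN {
    a[1][1] = 1
    a[4] = "An element in a jagged array"

    print has_nested(a, 1, 1)    # existing nested element
    print has_nested(a, 4, 5)    # a[4] is a scalar, not a subarray
    print has_nested(a, 9, 9)    # there is no a[9] at all
    print (9 in a)               # the test above did not create a[9]
}
```

Because ‘in’ only tests for an index and does not reference the element, the outer check leaves the array unchanged when the index is missing.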
The ‘for (item in array)’ statement (see Scanning an Array) can be nested to scan all the elements of an array of arrays if it is rectangular in structure. In order to print the contents (scalar values) of a two-dimensional array of arrays (i.e., in which each first-level element is itself an array, not necessarily of the same length), you could use the following code:
for (i in array)
    for (j in array[i])
        print array[i][j]
The isarray() function (see Type Functions)
lets you test if an array element is itself an array:
for (i in array) {
    if (isarray(array[i])) {
        for (j in array[i]) {
            print array[i][j]
        }
    }
    else
        print array[i]
}
If the structure of a jagged array of arrays is known in advance,
you can often devise workarounds using control statements. For example,
the following code prints the elements of our main array a:
for (i in a) {
    for (j in a[i]) {
        if (j == 3) {
            for (k in a[i][j])
                print a[i][j][k]
        }
        else
            print a[i][j]
    }
}
See Walking Arrays for a user-defined function that “walks” an arbitrarily dimensioned array of arrays.
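When the structure is not known in advance, recursion combined with isarray() handles any depth. The following is a minimal sketch of that idea for gawk, not the function from the referenced section; count_scalars is a hypothetical name, and the array contents repeat the assignments used earlier in this section:

```awk
# Hypothetical helper: count the scalar values stored anywhere in arr,
# descending into subarrays of any depth.
# 'i' and 'n' are local variables (extra parameters).
function count_scalars(arr,    i, n)
{
    n = 0
    for (i in arr) {
        if (isarray(arr[i]))
            n += count_scalars(arr[i])   # recurse into the subarray
        else
            n++                          # a scalar leaf
    }
    return n
}

BEGIN {
    a[1][1] = 1
    a[1][2] = 2
    a[1][3][1, "name"] = "barney"
    a[4][5] = "An element in subarray a[4]"

    print count_scalars(a)
}
```

Counting rather than printing keeps the result independent of the order in which ‘for (item in array)’ visits the indices.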
Recall that a reference to an uninitialized array element yields a value
of "", the null string. This has one important implication when you
intend to use a subarray as an argument to a function, as illustrated by
the following example:
$ gawk 'BEGIN { split("a b c d", b[1]); print b[1][1] }'
error→ gawk: cmd. line:1: fatal: split: second argument is not an array
The way to work around this is to first force b[1] to be an array by
creating an arbitrary index:
$ gawk 'BEGIN { b[1][1] = ""; split("a b c d", b[1]); print b[1][1] }'
-| a
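The same workaround can be packaged as a function so that callers do not have to remember the placeholder. This is only a sketch for gawk; the name split_into and the "placeholder" index are assumptions made for the example:

```awk
# Hypothetical helper: split str into the subarray arr[idx], creating the
# subarray first so that split() receives an array argument.
function split_into(str, arr, idx)
{
    delete arr[idx]                  # drop any earlier scalar or subarray
    arr[idx]["placeholder"] = ""     # arbitrary index; forces an array
    return split(str, arr[idx])      # split() clears arr[idx] before filling it
}

BEGIN {
    n = split_into("a b c d", b, 1)
    print n, b[1][1], b[1][4], ("placeholder" in b[1])
}
```

Since split() deletes any existing elements of its target array, the arbitrary index does not remain in the result.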