

16.4.2 General-Purpose Data Types

I have a true love/hate relationship with unions.

Arnold Robbins

That’s the thing about unions: the compiler will arrange things so they can accommodate both love and hate.

Chet Ramey

The extension API defines a number of simple types and structures for general-purpose use. Additional, more specialized, data structures are introduced in subsequent sections, together with the functions that use them.

The general-purpose types and structures are as follows:

typedef void *awk_ext_id_t;

A value of this type is received from gawk when an extension is loaded. That value must then be passed back to gawk as the first parameter of each API function.

#define awk_const …

This macro expands to ‘const’ when compiling an extension, and to nothing when compiling gawk itself. This makes certain fields in the API data structures unwritable from extension code, while allowing gawk to use them as it needs to.

typedef enum awk_bool {
    awk_false = 0,
    awk_true
} awk_bool_t;

A simple Boolean type.

typedef struct awk_string {
    char *str;      /* data */
    size_t len;     /* length thereof, in chars */
} awk_string_t;

This represents a mutable string. If gawk supplied the value, gawk owns the memory pointed to. Otherwise (that is, when the extension supplies the value), gawk takes ownership of the memory pointed to. Such memory must come from calling one of the gawk_malloc(), gawk_calloc(), or gawk_realloc() functions!

As mentioned earlier, strings are maintained using the current multibyte encoding.

typedef enum {
    AWK_UNDEFINED,
    AWK_NUMBER,
    AWK_STRING,
    AWK_ARRAY,
    AWK_SCALAR,         /* opaque access to a variable */
    AWK_VALUE_COOKIE    /* for updating a previously created value */
} awk_valtype_t;

This enum indicates the type of a value. It is used in the following struct.

typedef struct awk_value {
    awk_valtype_t val_type;
    union {
        awk_string_t       s;
        double             d;
        awk_array_t        a;
        awk_scalar_t       scl;
        awk_value_cookie_t vc;
    } u;
} awk_value_t;

An “awk value.” The val_type member indicates what kind of value the union holds, and each member is of the appropriate type.

#define str_value      u.s
#define num_value      u.d
#define array_cookie   u.a
#define scalar_cookie  u.scl
#define value_cookie   u.vc

Using these macros makes accessing the fields of the awk_value_t more readable.

typedef void *awk_scalar_t;

Scalars can be represented as an opaque type. These values are obtained from gawk and then passed back into it. This is discussed in a general fashion in the text following this list, and in more detail in Symbol table by cookie.

typedef void *awk_value_cookie_t;

A “value cookie” is an opaque type representing a cached value. This is also discussed in a general fashion in the text following this list, and in more detail in Cached values.

Scalar values in awk are either numbers or strings. The awk_value_t struct represents values. The val_type member indicates what is in the union.

Representing numbers is easy—the API uses a C double. Strings require more work. Because gawk allows embedded NUL bytes in string values, a string must be represented as a pair containing a data pointer and length. This is the awk_string_t type.
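To see why the length has to travel with the pointer, consider the following minimal sketch. It repeats the awk_string_t declaration shown earlier so that it compiles on its own (a real extension would get it from the API header), and the helpers wrap_bytes() and same_bytes() are hypothetical names invented for this illustration; they are not part of the API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local copy of the declaration shown earlier, so the sketch stands alone. */
typedef struct awk_string {
    char *str;      /* data */
    size_t len;     /* length thereof, in chars */
} awk_string_t;

/* Hypothetical helper: describe an existing buffer as an awk_string_t.
   Nothing is copied and ownership of the buffer does not change. */
awk_string_t
wrap_bytes(char *buf, size_t len)
{
    awk_string_t s;

    assert(buf != NULL);
    s.str = buf;
    s.len = len;
    return s;
}

/* Hypothetical helper: compare by length and content, so that bytes
   following an embedded NUL take part in the comparison. */
int
same_bytes(const awk_string_t *a, const awk_string_t *b)
{
    return a->len == b->len && memcmp(a->str, b->str, a->len) == 0;
}
```

Given the five bytes 'a', 'b', NUL, 'c', 'd', strlen() reports a length of 2, whereas the len member still records 5. Two such strings that differ only after the NUL compare equal with strcmp() but not with same_bytes().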

Identifiers (i.e., the names of global variables) can be associated with either scalar values or with arrays. In addition, gawk provides true arrays of arrays, where any given array element can itself be an array. Discussion of arrays is delayed until Array Manipulation.

The various macros listed earlier make it easier to use the elements of the union as if they were fields in a struct; this is a common coding practice in C. Such code is easier to write and to read, but it remains your responsibility to make sure that the val_type member correctly reflects the type of the value in the awk_value_t struct.
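One way to meet that responsibility is to set the tag and the union member together, and to check the tag before reading. The sketch below repeats the declarations from this section so that it compiles by itself. awk_array_t is not defined in this section, so it is declared here as void * purely as a stand-in, and set_number(), set_string(), and get_number() are hypothetical helper names, not API functions.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of the declarations shown earlier. */
typedef struct awk_string { char *str; size_t len; } awk_string_t;
typedef void *awk_array_t;          /* stand-in; assumed for this sketch */
typedef void *awk_scalar_t;
typedef void *awk_value_cookie_t;

typedef enum {
    AWK_UNDEFINED, AWK_NUMBER, AWK_STRING,
    AWK_ARRAY, AWK_SCALAR, AWK_VALUE_COOKIE
} awk_valtype_t;

typedef struct awk_value {
    awk_valtype_t val_type;
    union {
        awk_string_t       s;
        double             d;
        awk_array_t        a;
        awk_scalar_t       scl;
        awk_value_cookie_t vc;
    } u;
} awk_value_t;

#define str_value      u.s
#define num_value      u.d

/* Hypothetical helper: store a number, keeping tag and union in step. */
awk_value_t *
set_number(double num, awk_value_t *result)
{
    assert(result != NULL);
    result->val_type = AWK_NUMBER;
    result->num_value = num;
    return result;
}

/* Hypothetical helper: store a pointer/length pair as a string value. */
awk_value_t *
set_string(char *str, size_t len, awk_value_t *result)
{
    assert(result != NULL);
    result->val_type = AWK_STRING;
    result->str_value.str = str;
    result->str_value.len = len;
    return result;
}

/* Hypothetical helper: read the number only if the tag says it is one.
   Returns 1 on success, 0 if the value holds something else. */
int
get_number(const awk_value_t *val, double *out)
{
    if (val->val_type != AWK_NUMBER)
        return 0;
    *out = val->num_value;
    return 1;
}
```

With helpers of this kind, a value that was last filled in by set_string() is rejected by get_number() instead of having its string bytes misread as a double.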

Conceptually, the first three members of the union (string, number, and array) are all that are needed for working with awk values. However, because the API provides routines for accessing and changing the value of a global scalar variable only by using the variable’s name, there is a performance penalty: gawk must find the variable each time it is accessed or changed. This turns out to be a real issue, not just a theoretical one.

Thus, if you know that your extension will spend considerable time reading and/or changing the value of one or more scalar variables, you can obtain a scalar cookie(99) object for that variable, and then use the cookie for getting the variable’s value or for changing the variable’s value. The awk_scalar_t type holds a scalar cookie, and the scalar_cookie macro provides access to the value of that type in the awk_value_t struct. Given a scalar cookie, gawk can directly retrieve or modify the value, as required, without having to find it first.
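The difference between the two access paths can be pictured with a toy model. Everything in the following sketch is hypothetical: the toy_ names, the three-entry table, and the linear search are invented for illustration and say nothing about how gawk actually stores variables. The point is only that a lookup by name pays for a search on every call, whereas an opaque handle obtained once goes straight to the variable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy symbol table; purely illustrative. */
struct toy_var {
    const char *name;
    double value;
};

struct toy_var toy_table[] = {
    { "NR",    0.0 },
    { "count", 0.0 },
    { "total", 0.0 },
};

#define TOY_NVARS (sizeof(toy_table) / sizeof(toy_table[0]))

typedef void *toy_cookie_t;     /* opaque handle, in the spirit of awk_scalar_t */

/* Search by name and hand back a handle; NULL if there is no such variable. */
toy_cookie_t
toy_lookup(const char *name)
{
    size_t i;

    for (i = 0; i < TOY_NVARS; i++)
        if (strcmp(toy_table[i].name, name) == 0)
            return &toy_table[i];
    return NULL;
}

/* Slow path: every update repeats the search. Returns 1 on success. */
int
toy_update_by_name(const char *name, double value)
{
    struct toy_var *var = toy_lookup(name);

    if (var == NULL)
        return 0;
    var->value = value;
    return 1;
}

/* Fast path: the handle already identifies the variable. */
void
toy_update_by_cookie(toy_cookie_t cookie, double value)
{
    assert(cookie != NULL);
    ((struct toy_var *) cookie)->value = value;
}

double
toy_get_by_cookie(toy_cookie_t cookie)
{
    assert(cookie != NULL);
    return ((struct toy_var *) cookie)->value;
}
```

An extension that updates a variable inside a tight loop would, in this model, call toy_lookup() once before the loop and use only the cookie functions inside it.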

The awk_value_cookie_t type and value_cookie macro are similar. If you know that you wish to use the same numeric or string value for one or more variables, you can create the value once, retaining a value cookie for it, and then pass in that value cookie whenever you wish to set the value of a variable. This saves storage space within the running gawk process and reduces the time needed to create the value.
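The saving can again be pictured with a toy model. As before, every name below is hypothetical and the representation is invented for illustration; it is not gawk's internal layout. The string bytes are copied exactly once, when the value is created, and each later assignment merely shares the stored value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy cached value; illustrative only. */
struct toy_value {
    char *str;
    size_t len;
    unsigned refs;      /* number of assignments that share this value */
};

typedef void *toy_value_cookie_t;   /* opaque handle for a cached value */

/* A toy "variable": it simply points at its current value. */
struct toy_slot {
    struct toy_value *val;
};

/* Create the value once; this is the only place the bytes are copied.
   Returns NULL if memory cannot be allocated. */
toy_value_cookie_t
toy_create_value(const char *str, size_t len)
{
    struct toy_value *v = malloc(sizeof(*v));

    if (v == NULL)
        return NULL;
    v->str = malloc(len + 1);
    if (v->str == NULL) {
        free(v);
        return NULL;
    }
    memcpy(v->str, str, len);
    v->str[len] = '\0';
    v->len = len;
    v->refs = 0;
    return v;
}

/* Assign the cached value to a variable: no allocation, no copy.
   (A fuller model would also drop the slot's previous value.) */
void
toy_assign(struct toy_slot *slot, toy_value_cookie_t cookie)
{
    struct toy_value *v = cookie;

    assert(slot != NULL && v != NULL);
    slot->val = v;
    v->refs++;
}

/* Release the cached value when it is no longer needed. */
void
toy_release_value(toy_value_cookie_t cookie)
{
    struct toy_value *v = cookie;

    if (v != NULL) {
        free(v->str);
        free(v);
    }
}
```

In this model, assigning the same cookie to many slots leaves a single copy of the string in memory, which is the kind of saving in space and creation time that a value cookie is meant to provide.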


Footnotes

(99)

See the “cookie” entry in the Jargon file for a definition of cookie, and the “magic cookie” entry in the Jargon file for a nice example. See also the entry for “Cookie” in the Glossary.

