Previous: , Up: Sample Programs   [Contents][Index]


11.5 Exercises

  1. Rewrite cut.awk (see Cut Program) using split() with "" as the separator.
  2. In Egrep Program, we mentioned that ‘egrep -i’ could be simulated in versions of awk without IGNORECASE by using tolower() on the line and the pattern. In a footnote there, we also mentioned that this solution has a bug: the translated line is output, and not the original one. Fix this problem.
  3. The POSIX version of id takes options that control which information is printed. Modify the awk version (see Id Program) to accept the same arguments and perform in the same way.
  4. The split.awk program (see Split Program) assumes that letters are contiguous in the character set, which isn’t true for EBCDIC systems. Fix this problem. (Hint: Consider a different way to work through the alphabet, without relying on ord() and chr().)
  5. In uniq.awk (see Uniq Program), the logic for choosing which lines to print represents a state machine, which is “a device that can be in one of a set number of stable conditions depending on its previous condition and on the present values of its inputs.” (81) Brian Kernighan suggests that “an alternative approach to state machines is to just read the input into an array, then use indexing. It’s almost always easier code, and for most inputs where you would use this, just as fast.” Rewrite the logic to follow this suggestion.
  6. Why can’t the wc.awk program (see Wc Program) just use the value of FNR in endfile()? Hint: Examine the code in Filetrans Function.
  7. Manipulation of individual characters in the translate program (see Translate Program) is painful using standard awk functions. Given that gawk can split strings into individual characters using "" as the separator, how might you use this feature to simplify the program?
  8. The extract.awk program (see Extract Program) was written before gawk had the gensub() function. Use it to simplify the code.
  9. Compare the performance of the awksed.awk program (see Simple Sed) with the more straightforward:
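    BEGIN {
        pat = ARGV[1]             # regexp to search for
        repl = ARGV[2]            # replacement text
        ARGV[1] = ARGV[2] = ""    # keep awk from treating them as file names
    }

    # Replace every match in each record, then print the record
    { gsub(pat, repl); print }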
    
  10. What are the advantages and disadvantages of awksed.awk versus the real sed utility?
  11. In Igawk Program, we mentioned that not trying to save the line read with getline in the pathto() function when testing for the file’s accessibility for use with the main program simplifies things considerably. What problem does this engender, though?
  12. As an additional example of the idea that it is not always necessary to add new features to a program, consider the idea of having two files in a directory in the search path:
    default.awk

    This file contains a set of default library functions, such as getopt() and assert().

    site.awk

    This file contains library functions that are specific to a site or installation; i.e., locally developed functions. Having a separate file allows default.awk to change with new gawk releases, without requiring the system administrator to update it each time by adding the local functions.

    One user suggested that gawk be modified to automatically read these files upon startup. Instead, it would be very simple to modify igawk to do this. Since igawk can process nested @include directives, default.awk could simply contain @include statements for the desired library functions. Make this change.

  13. Modify anagram.awk (see Anagram Program) to avoid the use of the external sort utility.
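
Exercises 1 and 7 rely on gawk’s ability to split a string into individual characters when "" is given as the separator. As a minimal sketch of that feature, and not a solution to either exercise, the shell function below prints each character of its argument together with its index. The name split_chars and the sample string are illustrative only, and the sketch assumes that the awk on your search path is gawk; other awk versions may treat an empty separator differently.

```shell
# Hypothetical helper (not from the manual): print the characters of $1,
# one per line, by calling split() with "" as the separator.
# Assumes that `awk` on the search path is gawk.
split_chars() {
    awk -v str="$1" 'BEGIN {
        n = split(str, chars, "")      # n is the number of characters
        for (i = 1; i <= n; i++)
            print i, chars[i]          # index, then the character
    }'
}

split_chars "awk"
```

Once the characters are in an array, a program can index them directly instead of calling substr() repeatedly.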

Footnotes

(81)

This is the definition returned from entering define: state machine into Google.

