The A-Z of Programming Languages: AWK

Alfred V. Aho of AWK fame talks about the history and continuing popularity of his pattern matching language.

Were there any programs or languages that already had these functions at the time you developed AWK?

Our original model was GREP. But GREP had a very limited form of pattern-action processing, so we generalized the capabilities of GREP considerably. I was also interested at that time in string pattern matching algorithms and context-free grammar parsing algorithms for compiler applications. This means that you can see a certain similarity between what AWK does and what the compiler construction tools LEX and YACC do.

LEX and YACC were tools that were built around string pattern matching algorithms that I was working on: LEX was designed to do lexical analysis and YACC syntax analysis. These tools were compiler construction utilities which were widely used in Bell Labs, and later elsewhere, to create all sorts of little languages. Brian Kernighan was using them to make languages for typesetting mathematics and picture processing.

LEX is a tool that looks for lexemes in input text. Lexemes are sequences of characters that make up logical units. For example, a keyword like 'then' in a programming language is a lexeme. The character 't' by itself isn't interesting, 'h' by itself isn't interesting, but the combination 'then' is interesting. One of the first tasks a compiler has to do is read the source program and group its characters into lexemes.

AWK was influenced by this kind of textual processing, but AWK was aimed at data-processing tasks and it assumed very little background on the part of the user in terms of programming sophistication.

Can you provide Computerworld readers with a brief summary in your own words of AWK as a language?

AWK is a language for processing files of text. A file is treated as a sequence of records, and by default each line is a record. Each line is broken up into a sequence of fields, so we can think of the first word in a line as the first field, the second word as the second field, and so on. An AWK program is a sequence of pattern-action statements. AWK reads the input a line at a time. A line is scanned for each pattern in the program, and for each pattern that matches, the associated action is executed.

A simple example should make this clear. Suppose we have a file in which each line is a name followed by a phone number. Let's say the file contains the line 'Naomi 1234'. In the AWK program the first field is referred to as $1, the second field as $2, and so on. Thus, we can create an AWK program to retrieve Naomi's phone number by simply writing $1 == "Naomi" {print $2} which means if the first field matches Naomi, then print the second field. Now you're an AWK programmer! If you typed that program into AWK and presented it with a file that had names and phone numbers, then it would print 1234 as Naomi's phone number.
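The one-line program can be tried from a shell roughly as follows. This is a minimal sketch: the file name phones.txt and the second entry (Brian 5678) are made up for illustration; only the 'Naomi 1234' line and the AWK program itself come from the example.

```shell
# Hypothetical phone list: one name and one number per line.
# Only the "Naomi 1234" line is from the example; the rest is invented.
printf 'Naomi 1234\nBrian 5678\n' > phones.txt

# Pattern: the first field equals "Naomi". Action: print the second field.
awk '$1 == "Naomi" { print $2 }' phones.txt
```

Lines whose first field is anything other than Naomi match no pattern, so nothing is printed for them.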

A typical AWK program would have several pattern-action statements. The patterns can be Boolean combinations of strings and numbers; the actions can be statements in a C-like programming language.
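A program with several pattern-action statements might look something like the sketch below. The input file hours.txt, its three columns (name, department, hours) and the script name report.awk are all assumptions made up for illustration; END, NR and printf are standard AWK features that are not mentioned in the answer above.

```shell
# Hypothetical input: name, department, hours worked.
printf 'Naomi sales 42\nBrian eng 35\nPeter eng 50\n' > hours.txt

# Write the AWK program to a file (report.awk is an invented name).
cat > report.awk <<'EOF'
# Pattern: a Boolean combination of a string test and a numeric test.
$2 == "eng" && $3 > 40 { print $1, "worked overtime" }

# No pattern: the action runs for every input line.
{ total += $3 }

# END is a special pattern that matches once, after the last line.
END {
    if (NR > 0)
        printf "average hours: %.1f\n", total / NR
}
EOF

awk -f report.awk hours.txt
```

Every line is checked against each pattern in turn, so a single line can trigger more than one action; here the third line both prints a message and adds to the running total.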

AWK became popular since it was one of the standard programs that came with every UNIX system.
