Chapter 3: Lexical Analysis

Lex: A lexical analyzer generator

Refer to the Lex User's Manual.

In this section, we consider the design of a software tool that automatically constructs the code of a lexical analyzer from a specification of its tokens. Lex is one such lexical analyzer generator: it produces C code from token specifications written as regular expressions. The tool has been widely used to specify lexical analyzers for a variety of languages. We refer to the tool as the Lex compiler, and to its input specification as the Lex language.

Lex is generally used in the manner depicted in the slide. First, a specification of the lexical analyzer is prepared by writing a program lex.l in the Lex language. Then, lex.l is run through the Lex compiler to produce a C program lex.yy.c. The program lex.yy.c consists of a tabular representation of a transition diagram constructed from the regular expressions in lex.l, together with a standard routine that uses the table to recognize lexemes. The actions associated with the regular expressions in lex.l are pieces of C code and are carried over directly to lex.yy.c. Finally, lex.yy.c is run through the C compiler to produce an object program a.out, which is the lexical analyzer that transforms the input stream into a sequence of tokens.
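The idea of a transition table driven by a standard routine can be illustrated with a small hand-written sketch. This is not the code that Lex actually emits; it only assumes two token classes (identifiers and unsigned integers), and every name in it (next_state, accepting, next_token, the TOK_ and S_ constants) is hypothetical and chosen for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Token codes returned by the scanner (hypothetical names). */
enum { TOK_EOF, TOK_ID, TOK_NUM, TOK_ERROR };

/* Character classes: the columns of the transition table. */
enum { C_LETTER, C_DIGIT, C_OTHER, NUM_CLASSES };

/* States of the transition diagram: the rows of the table. */
enum { S_DEAD = -1, S_START, S_ID, S_NUM, NUM_STATES };

/* Tabular form of the diagram for  id = letter (letter|digit)*
 * and  num = digit+ . */
static const int next_state[NUM_STATES][NUM_CLASSES] = {
    /* S_START */ { S_ID,   S_NUM, S_DEAD },
    /* S_ID    */ { S_ID,   S_ID,  S_DEAD },
    /* S_NUM   */ { S_DEAD, S_NUM, S_DEAD },
};

/* Token recognized in each state, or -1 if the state is not accepting. */
static const int accepting[NUM_STATES] = { -1, TOK_ID, TOK_NUM };

static int char_class(int c)
{
    if (isalpha(c) || c == '_') return C_LETTER;
    if (isdigit(c)) return C_DIGIT;
    return C_OTHER;
}

/* The "standard routine": skips whitespace, then follows the table from
 * input[*pos], remembering the last accepting state (longest match).
 * On return, *start and *len describe the lexeme and *pos is advanced. */
int next_token(const char *input, size_t *pos, size_t *start, size_t *len)
{
    assert(input != NULL && pos != NULL && start != NULL && len != NULL);

    while (isspace((unsigned char)input[*pos]))
        (*pos)++;

    *start = *pos;
    *len = 0;
    if (input[*pos] == '\0')
        return TOK_EOF;

    int state = S_START;
    int last_token = -1;
    size_t last_end = *pos;

    for (size_t i = *pos; input[i] != '\0'; ) {
        state = next_state[state][char_class((unsigned char)input[i])];
        if (state == S_DEAD)
            break;
        i++;
        if (accepting[state] != -1) {
            last_token = accepting[state];
            last_end = i;
        }
    }

    if (last_token == -1) {   /* no pattern matched: skip one character */
        *len = 1;
        *pos += 1;
        return TOK_ERROR;
    }
    *len = last_end - *start;
    *pos = last_end;
    return last_token;
}
```

A caller would invoke next_token repeatedly on the same buffer until it returns TOK_EOF. In a generated scanner the tables are computed from the regular expressions in lex.l rather than written by hand, and the C action attached to each pattern runs at the point where this sketch simply returns a token code.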