Chapter 3: Lexical Analysis

Lexical analyzer generator

. Input to the generator

- List of regular expressions in priority order

- Associated action for each regular expression (generates the kind of token and other bookkeeping information)

. Output of the generator

- Program that reads the input character stream and breaks it into tokens

- Reports lexical errors (unexpected characters), if any

  We assume that the lexical analyzer is specified as a list of regular expressions, each with a corresponding action. An action is a program segment that is executed whenever a lexeme matching its regular expression is found in the input. So, the input to the generator is a list of regular expressions in priority order, together with the associated action for each regular expression. These actions generate the kind of token and other bookkeeping information. Our problem is to construct a recognizer that looks for lexemes in the input buffer. If more than one pattern matches, the recognizer chooses the longest lexeme matched. If two or more patterns match the longest lexeme, the first-listed matching pattern is chosen. So, the output of the generator is a program that reads the input character stream and breaks it into tokens. It also reports lexical errors, i.e. cases where unexpected characters occur or the input does not match any of the regular expressions.
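
  The two rules above (longest lexeme first, then first-listed pattern on a tie) can be sketched as a small hand-written recognizer. This is only an illustration, not what a real generator emits: a generator would normally compile the expressions into a single automaton, whereas this sketch tries each pattern in turn. The token names, the patterns in SPEC, and the names tokenize and LexicalError are all assumptions made for the example. The patterns are deliberately simple, so that Python's backtracking matcher returns the longest match for each one; that does not hold for arbitrary regular expressions.

```python
import re

# Hypothetical specification: (token kind, regular expression) in priority order.
# The keyword "if" is listed before ID so that it wins a tie on the lexeme "if".
SPEC = [
    ("IF", r"if"),
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUM", r"[0-9]+"),
    ("LE", r"<="),
    ("LT", r"<"),
    ("WS", r"[ \t\n]+"),
]
COMPILED = [(kind, re.compile(pattern)) for kind, pattern in SPEC]


class LexicalError(Exception):
    """Raised when no pattern matches at the current input position."""


def tokenize(text):
    """Break text into (kind, lexeme) pairs using longest match, then priority."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_kind, best_end = None, pos
        for kind, regex in COMPILED:
            m = regex.match(text, pos)
            # Strict '>' means a later pattern only wins with a longer lexeme,
            # so the first-listed pattern is kept on a tie.
            if m and m.end() > best_end:
                best_kind, best_end = kind, m.end()
        if best_kind is None:
            raise LexicalError(
                f"unexpected character {text[pos]!r} at offset {pos}"
            )
        # The "action" in this sketch: emit a token, except for whitespace.
        if best_kind != "WS":
            tokens.append((best_kind, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("if ifx <= 42"))
```

  On the input "if ifx <= 42", the lexeme "if" matches both IF and ID with the same length, so the first-listed pattern IF is chosen. For "ifx", ID gives the longer lexeme and wins even though IF is listed first. Likewise "<=" is taken as one LE token rather than LT followed by an unmatched "=". An input such as "a $" raises LexicalError, because no pattern matches at "$".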