Power System Analysis

Chapter 3: Lexical Analysis

How to specify tokens

If S is an alphabet of basic symbols, then a regular definition is a sequence of definitions of the form

    d_1 → r_1
    d_2 → r_2
    ...
    d_n → r_n

where each d_i is a distinct name, and each r_i is a regular expression over the symbols in S ∪ {d_1, d_2, ..., d_(i-1)}, i.e. the basic symbols and the previously defined names. By restricting each r_i to symbols of S and the previously defined names, we can construct a regular expression over S for any r_i by repeatedly replacing regular-expression names by the expressions they denote. If r_i used d_k for some k >= i, then r_i might be recursively defined, and this substitution process would not terminate.

We treat tokens as terminal symbols in the grammar for the source language. The lexeme matched by the pattern for a token consists of a string of characters in the source program and can be treated as a lexical unit. The lexical analyzer collects information about tokens into their associated attributes. As a practical matter, a token usually has only a single attribute, a pointer to the symbol-table entry in which the information about the token is kept; the pointer becomes the attribute for the token.
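As an illustration of the two ideas above, the following is a minimal Python sketch, not a definitive implementation. It first expands a regular definition by substituting previously defined names, rejecting any reference to a name that is not yet defined. It then scans a string for identifiers, giving each id token an index into a symbol table as its single attribute. The definitions (letter, digit, id), the {name} reference syntax, and the functions expand and scan_identifiers are all assumptions made for this example; the list index stands in for the pointer.

```python
import re

# Hypothetical regular definition: (name, expression) pairs, in order.
# {name} refers to a previously defined name; everything else is ordinary
# Python `re` syntax over the basic symbols.
DEFINITIONS = [
    ("letter", "[A-Za-z]"),
    ("digit", "[0-9]"),
    ("id", "{letter}({letter}|{digit})*"),
]


def expand(definitions):
    """Return a dict mapping each name to a regex over basic symbols only."""
    expanded = {}
    for name, expr in definitions:
        def substitute(match, name=name):
            ref = match.group(1)
            if ref not in expanded:
                # r_i uses d_k with k >= i: substitution would not terminate.
                raise ValueError(f"{name!r} uses {ref!r} before its definition")
            return f"(?:{expanded[ref]})"

        expanded[name] = re.sub(r"\{(\w+)\}", substitute, expr)
    return expanded


def scan_identifiers(text, id_regex, symbol_table):
    """Return ("id", index) tokens; index points at the symbol-table entry."""
    tokens = []
    for match in re.finditer(id_regex, text):
        lexeme = match.group(0)
        if lexeme not in symbol_table:
            symbol_table.append(lexeme)  # first occurrence: create the entry
        tokens.append(("id", symbol_table.index(lexeme)))
    return tokens


patterns = expand(DEFINITIONS)
table = []
tokens = scan_identifiers("count = count + x1", patterns["id"], table)
print(patterns["id"])
print(tokens, table)
```

Both occurrences of the lexeme count share one symbol-table entry, so their tokens carry the same attribute value. A definition such as ("a", "{a}b") is rejected by expand because it refers to a name that has not yet been fully defined.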