Chapter 3: Lexical Analysis

. The algorithm gives priority to tokens listed earlier

- Treats "if" as a keyword, not an identifier

. How much input is used? What if

- x1...xi ∈ L(R)

- x1...xj ∈ L(R)

- Pick the longest possible string in L(R)

- The principle of "maximal munch" (see the sketch after this list)

. Regular expressions provide a concise and useful notation for string patterns

. Good algorithms require only a single pass over the input

A simple technique for separating keywords from identifiers is to initialize appropriately the symbol table in which information about identifiers is saved. The algorithm gives priority to the tokens listed earlier and hence treats "if" as a keyword and not an identifier. The technique of placing keywords in the symbol table is almost essential if the lexical analyzer is coded by hand: without it, the number of states in a lexical analyzer for a typical programming language runs to several hundred, while with this trick fewer than a hundred states suffice.

If a token belongs to more than one category, we go by priority rules such as "first match" or "longest match"; that is, we prioritize our rules to remove the ambiguity. If both x1...xi ∈ L(R) and x1...xj ∈ L(R), then we pick the longest possible string in L(R). This is the principle of "maximal munch".

Regular expressions provide a concise and useful notation for string patterns. Our goal is to construct a lexical analyzer that isolates the lexeme for the next token in the input buffer and produces as output a pair consisting of the appropriate token and attribute value, using the translation table. We aim for an algorithm that tokenizes the input in a single pass, so that the data is tokenized both efficiently and correctly.