Chapter 3: Lexical Analysis

Transition diagram for unsigned numbers

. The lexeme for a given token must be the longest possible

. Assume the input is 12.34E56

. Starting in the third diagram, the accept state would be reached after only 12

. Therefore, the matching should always start with the first transition diagram

. If failure occurs in one transition diagram, then retract the forward pointer to the beginning of the lexeme and activate the next diagram from its start state

. If failure occurs in all diagrams, then a lexical error has occurred

The lexeme for a given token must be the longest possible. For example, assume the input is 12.34E56. In this case, the lexical analyzer must not stop after seeing 12 or even 12.3. If we start at the third diagram (which recognizes integers) on the previous slide, the accept state will be reached after 12, leaving .34E56 unread. Therefore, matching should always start with the first transition diagram. If a failure occurs in one transition diagram, we retract the forward pointer to the beginning of the lexeme and start again from the start state of the next diagram. If failure occurs in all diagrams, then a lexical error has occurred, i.e. the input does not pass through any of the three transition diagrams. So we need to prioritize our rules and try the transition diagrams in a fixed order; changing the order can make the analyzer accept a lexeme that is too short. This is how we respect the principle of maximal munch, i.e. the automaton should match the longest possible string as the lexeme.
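
The procedure described above can be sketched in Python. This is only a minimal illustration, not the diagrams from the slide: it assumes the first diagram accepts digits with an optional fraction and a mandatory exponent, the second accepts digits with a fraction, and the third accepts plain integers. All function names (scan_digits, exponent_diagram, fraction_diagram, integer_diagram, match_number) are hypothetical.

```python
DIGITS = "0123456789"


def scan_digits(text, pos):
    """Advance past one or more digits; return the new position, or None on failure."""
    end = pos
    while end < len(text) and text[end] in DIGITS:
        end += 1
    return end if end > pos else None


def exponent_diagram(text, start):
    """Assumed first diagram: digit+ ( . digit+ )? E ( + | - )? digit+"""
    pos = scan_digits(text, start)
    if pos is None:
        return None
    if pos < len(text) and text[pos] == ".":
        pos = scan_digits(text, pos + 1)
        if pos is None:
            return None
    if pos >= len(text) or text[pos] != "E":
        return None
    pos += 1
    if pos < len(text) and text[pos] in "+-":
        pos += 1
    return scan_digits(text, pos)


def fraction_diagram(text, start):
    """Assumed second diagram: digit+ . digit+"""
    pos = scan_digits(text, start)
    if pos is None or pos >= len(text) or text[pos] != ".":
        return None
    return scan_digits(text, pos + 1)


def integer_diagram(text, start):
    """Assumed third diagram: digit+"""
    return scan_digits(text, start)


# Priority order: the diagram that can consume the most input comes first.
ORDERED = [
    ("exponent", exponent_diagram),
    ("fraction", fraction_diagram),
    ("integer", integer_diagram),
]


def match_number(text, start=0, diagrams=ORDERED):
    """Try each diagram in order; return (diagram name, lexeme) for the first that accepts."""
    for name, diagram in diagrams:
        # Every diagram is run from `start` again, which models retracting
        # the forward pointer after a failure in the previous diagram.
        end = diagram(text, start)
        if end is not None:
            return name, text[start:end]
    # No diagram accepted the input: a lexical error.
    raise ValueError(f"lexical error at position {start}")


print(match_number("12.34E56"))
print(match_number("12.34E56", diagrams=list(reversed(ORDERED))))
```

With the diagrams in the intended order, the whole input 12.34E56 is returned as one lexeme. With the order reversed, the integer diagram accepts first and the lexeme is only 12, which is the problem the ordering is meant to avoid.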