Chapter 3: Lexical Analysis

How to break up text

elsex=0
  → else  x  =  0
  → elsex  =  0

- Regular expressions alone are not enough

- Normally the longest match wins

- Ties are resolved by prioritizing tokens

- Lexical definitions consist of regular definitions, priority rules, and the maximal munch principle

We can see that regular expressions alone are not sufficient for breaking up text. Consider the example "elsex=0". Depending on the programming language, this might mean "else x=0" (a keyword followed by an assignment) or "elsex=0" (a single identifier "elsex" being assigned). So regular expressions alone are not enough. When there are multiple possible matches, normally the longest match wins, and remaining ties are resolved by prioritizing tokens. Hence lexical definitions consist of regular definitions, priority rules, and disambiguation principles such as the maximal munch principle. Information about the language that is not captured by the regular languages of the tokens can be used to pinpoint errors in the input. There are several ways in which redundant matching in the transition diagrams can be avoided.
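As an illustration, here is a minimal Python sketch of maximal munch with rule priority. The token set (ELSE, IDENT, NUMBER, ASSIGN, WS) and the tokenize function are assumptions made for this example, not part of any particular lexer generator. At each position every pattern is tried, the longest match wins, and ties between equal-length matches are broken by the order in which the rules are listed.

    import re

    # Hypothetical token rules, listed in priority order: the keyword rule
    # outranks the identifier rule when both match a lexeme of equal length.
    TOKEN_RULES = [
        ("ELSE",   r"else"),
        ("IDENT",  r"[A-Za-z_][A-Za-z_0-9]*"),
        ("NUMBER", r"[0-9]+"),
        ("ASSIGN", r"="),
        ("WS",     r"\s+"),   # whitespace is matched but not emitted
    ]

    def tokenize(text):
        pos = 0
        tokens = []
        while pos < len(text):
            best = None  # (match length, -priority, kind, lexeme)
            for priority, (kind, pattern) in enumerate(TOKEN_RULES):
                m = re.compile(pattern).match(text, pos)
                if m:
                    candidate = (len(m.group()), -priority, kind, m.group())
                    # Maximal munch: longest match wins; on equal length,
                    # the earlier (higher-priority) rule wins.
                    if best is None or candidate[:2] > best[:2]:
                        best = candidate
            if best is None:
                raise SyntaxError("unexpected character at position %d" % pos)
            length, _, kind, lexeme = best
            if kind != "WS":
                tokens.append((kind, lexeme))
            pos += length
        return tokens

    print(tokenize("elsex=0"))   # [('IDENT', 'elsex'), ('ASSIGN', '='), ('NUMBER', '0')]
    print(tokenize("else x=0"))  # [('ELSE', 'else'), ('IDENT', 'x'), ('ASSIGN', '='), ('NUMBER', '0')]

In "elsex=0" the identifier rule produces the longer match "elsex", so maximal munch yields an identifier; in "else x=0" both ELSE and IDENT match the four characters "else", and the priority order selects the keyword.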