Difficulties in design of lexical analyzers
. Is it as simple as it sounds?
. Lexemes in a fixed position. Fixed format vs. free format languages
. Handling of blanks
- in Pascal, blanks separate identifiers
- in Fortran, blanks are significant only in literal strings; for example, the variable counter is the same as count er
- Another example
DO 10 I = 1.25    is read as    DO10I=1.25
DO 10 I = 1,25    is read as    DO10I=1,25
Designing a lexical analyzer is more complicated than it looks. Because programming languages differ widely in their lexical conventions, several kinds of problems arise. Let us look at some of them:
1. Fixed format vs. free format languages - A lexeme is a sequence of characters in the source program that is matched by the pattern for a token. In FORTRAN, lexemes appear in fixed positions. These white-space and fixed-format rules arose from the use of punch cards and errors in punching. Fixed format languages make life difficult because the lexical analyzer must also take into account the position (column) of each token.
2. Handling of blanks - We must decide how to handle blanks, since languages (such as Pascal and FORTRAN) differ in the significance they give to blanks and white space. When more than one pattern matches a lexeme, the lexical analyzer must pass additional information about the particular lexeme that matched to the subsequent phases of the compiler. In Pascal, blanks separate identifiers. In FORTRAN, blanks are significant only in literal strings; for example, the variable "counter" is the same as "count er".
Another example:
DO 10 I = 1.25
DO 10 I = 1,25
The first line is an assignment, DO10I = 1.25, to a variable named DO10I. The second line is the beginning of a DO loop. In such cases the lexical analyzer may need arbitrarily long lookahead: reading from left to right, it cannot distinguish between the two statements until it reaches the "," or the ".".
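Both difficulties can be made concrete with a small Python sketch of a FORTRAN-like front end. It assumes the FORTRAN 77 fixed-form layout (a 'C' or '*' in column 1 marks a comment, columns 1-5 hold the statement label, column 6 the continuation mark, columns 7-72 the statement), upper-case source, and apostrophe-delimited character literals; Hollerith constants and tabs are not handled. The names `Card`, `split_card`, `squeeze_blanks` and `is_do_loop` are hypothetical helpers for illustration, not part of any real compiler.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumes FORTRAN 77 fixed form, upper-case source, apostrophe literals.


@dataclass
class Card:
    label: str          # columns 1-5: optional statement label
    continuation: bool  # column 6: non-blank, non-'0' marks a continuation line
    text: str           # columns 7-72: the statement itself


def split_card(line: str) -> Optional[Card]:
    """Difficulty 1: split a fixed-form line by column position.

    Returns None for comment lines ('C', 'c' or '*' in column 1).
    """
    if line[:1] in ("C", "c", "*"):
        return None
    padded = line.rstrip("\n").ljust(72)  # pad so every column exists
    return Card(
        label=padded[0:5].strip(),
        continuation=padded[5] not in (" ", "0"),
        text=padded[6:72].rstrip(),       # anything past column 72 is ignored
    )


def squeeze_blanks(stmt: str) -> str:
    """Difficulty 2: drop blanks outside apostrophe-delimited literals."""
    out = []
    in_string = False
    for ch in stmt:
        if ch == "'":
            in_string = not in_string  # a doubled '' inside a literal toggles twice
        elif ch == " " and not in_string:
            continue                   # insignificant blank: skip it
        out.append(ch)
    return "".join(out)


# "DO", a label, a variable name, then "=": a prefix shared by both readings.
_DO_PREFIX = re.compile(r"DO\d+[A-Z][A-Z0-9]*=")


def is_do_loop(stmt: str) -> bool:
    """Decide DO loop vs. assignment for a blank-free statement.

    Looks ahead past '=' for a comma outside parentheses and literals;
    only such a comma shows that the statement is a loop.
    """
    m = _DO_PREFIX.match(stmt)
    if m is None:
        return False                   # cannot be a DO statement at all
    depth, in_string = 0, False
    for ch in stmt[m.end():]:          # lookahead to the end of the statement
        if ch == "'":
            in_string = not in_string
        elif in_string:
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return True                # "start,end": loop bounds
    return False                       # no top-level comma: DO10I = <expr>
```

Here `split_card("      DO 10 I = 1.25")` yields the text `DO 10 I = 1.25`, `squeeze_blanks` turns it into `DO10I=1.25`, and `is_do_loop` returns `False`; with `1,25` in place of `1.25` it returns `True`. Note that `is_do_loop` may have to scan to the end of the statement before it can answer, which is the arbitrarily long lookahead described above.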