Chapter 3: Lexical Analysis

 

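#include <stdio.h>
#include <ctype.h>

/* Token codes. The values are assumed here; the listing does not define them. */
enum { NONE = -1, num = 256 };

int lineno = 1;        /* current input line, used in error messages */
int tokenval = NONE;   /* value of the most recent num token */

int lex(void) {
    int t;
    while (1) {
        t = getchar();
        if (t == ' ' || t == '\t')
            ;                           /* skip blanks and tabs */
        else if (t == '\n')
            lineno = lineno + 1;        /* count the line; no token returned */
        else if (isdigit(t)) {
            tokenval = t - '0';
            t = getchar();
            while (isdigit(t)) {        /* accumulate the decimal value */
                tokenval = tokenval * 10 + t - '0';
                t = getchar();
            }
            ungetc(t, stdin);           /* push back the lookahead character */
            return num;
        }
        else {
            tokenval = NONE;
            return t;
        }
    }
}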

The function lex() above is a crude lexical analyzer that eliminates white space and collects numbers. Each time the body of the while statement is executed, a character is read into t. If the character is a blank (written ' ') or a tab (written '\t'), no token is returned to the parser; we merely go around the while loop again. If the character is a newline (written '\n'), the global variable lineno is incremented, thereby keeping track of line numbers in the input, but again no token is returned. Supplying a line number with error messages helps pinpoint errors.

The code for reading a sequence of digits is the branch guarded by isdigit(t). The predicate isdigit(t) from the include file <ctype.h> is used in that guard and again in the inner while loop to determine whether an incoming character t is a digit. If it is, then its integer value is given by the expression t-'0' in both ASCII and EBCDIC. With other character sets, the conversion may need to be done differently.
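
To make the behavior described above easier to try out, here is a minimal sketch of the same loop reading from an in-memory string instead of standard input. The names scanner, scan, NUM, and DONE, and all of the token code values, are hypothetical and chosen only for illustration; they do not come from the listing. Because the sketch peeks at the next character before consuming it, it has no need for the ungetc() call that the stdin version uses.

```c
#include <ctype.h>

/* Hypothetical token codes, assumed for this sketch only. */
enum { NONE = -1, DONE = 0, NUM = 256 };

/* Scanner state kept in a struct rather than in globals. */
struct scanner {
    const char *p;      /* next unread character of the input string */
    int lineno;         /* current line number, starting at 1 */
    int tokenval;       /* value of the last NUM token, else NONE */
};

/* Return the next token: NUM for a digit sequence, DONE at the end of
   the string, or the character itself for anything else. */
int scan(struct scanner *s) {
    for (;;) {
        int t = (unsigned char)*s->p;
        if (t == '\0') {                /* end of input */
            s->tokenval = NONE;
            return DONE;
        }
        s->p++;
        if (t == ' ' || t == '\t')
            ;                           /* skip blanks and tabs */
        else if (t == '\n')
            s->lineno++;                /* count the line; no token returned */
        else if (isdigit(t)) {
            s->tokenval = t - '0';
            /* Peek at the next character; consume it only if it is a digit. */
            while (isdigit((unsigned char)*s->p)) {
                s->tokenval = s->tokenval * 10 + (*s->p - '0');
                s->p++;
            }
            return NUM;
        }
        else {
            s->tokenval = NONE;
            return t;
        }
    }
}
```

Initializing a scanner with the string "12 +\n\t345", line number 1, and tokenval NONE, successive calls to scan() yield NUM with tokenval 12, then the character '+', then NUM with tokenval 345, at which point lineno is 2.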