Chapter 3: Lexical Analysis

Lexical Analysis

. Recognize tokens; ignore whitespace and comments

. Generate a token stream

. Error reporting

. Model using regular expressions

. Recognize using Finite State Automata

The first phase of the compiler is lexical analysis. Just as a sentence can be broken into words, the lexical analyzer breaks the source text into a sequence of tokens, discarding whitespace and comments along the way. Its output is a stream of tokens. The set of valid tokens is specified with regular expressions, and the tokens are recognized with finite state automata. If a piece of input is not valid, i.e., it does not fall into any of the identifiable token classes, the lexical analyzer reports an error. Lexical analysis thus involves recognizing the tokens in the source program and reporting errors, if any. We will study each of these processes in the subsequent slides.
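
The steps described above can be sketched in a few lines of Python. This is only a minimal illustration, not a production lexer: the token classes (`NUMBER`, `IDENT`, `OP`), the `//` comment syntax, and the names `TOKEN_SPEC`, `LexError`, and `tokenize` are all assumptions made up for this example. Each token class is given as a regular expression, and Python's `re` module plays the role of the automaton that recognizes them.

```python
import re

# Hypothetical token classes for a toy language. The names and patterns
# are assumptions for illustration; they do not come from the notes.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),       # line comment, discarded
    ("WS",      r"\s+"),            # whitespace, discarded
    ("NUMBER",  r"\d+"),            # integer literal
    ("IDENT",   r"[A-Za-z_]\w*"),   # identifier
    ("OP",      r"[+\-*/=()]"),     # single-character operator or parenthesis
]

# Combine all classes into one pattern with named groups. Alternatives are
# tried left to right, so COMMENT is checked before OP sees a lone "/".
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


class LexError(Exception):
    """Raised when the input matches none of the token classes."""


def tokenize(text):
    """Return a list of (token_class, lexeme) pairs for the input text."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            # Error reporting: no token class matches at this position.
            raise LexError(f"invalid character {text[pos]!r} at offset {pos}")
        if m.lastgroup not in ("WS", "COMMENT"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("x = 42 + y // add\n"))
```

Calling `tokenize("a $ b")` raises `LexError`, because `$` belongs to none of the assumed token classes. A real lexer would usually also record line and column numbers and might try to recover after an error rather than stop at the first one.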