Chapter 2: Introduction to compilers

Lexical Analysis

. Recognizing words is not completely trivial. For example, with its spaces misplaced, "is this a sentence?" becomes:
ist his ase nte nce?

. Therefore, we must know what the word separators are.

. The language must define rules for breaking a sentence into a sequence of words.

. Normally, white space and punctuation marks are the word separators in a language.

. In programming languages, a character from a different class may also be treated as a word separator.

. The lexical analyzer breaks a sentence into a sequence of words or tokens:
  - Input: if a == b then a = 1 ; else a = 2 ;
  - Sequence of words (total 14 words):
    if | a | == | b | then | a | = | 1 | ; | else | a | = | 2 | ;
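The 14-word count above can be checked with a minimal sketch in Python. It assumes white space is the only word separator, which works here only because every token in the example is already surrounded by spaces. The function name split_words is hypothetical and chosen for illustration.

```python
def split_words(sentence: str) -> list[str]:
    """Break a sentence into words, treating white space as the only separator."""
    # str.split() with no argument splits on runs of white space
    # and discards empty strings.
    return sentence.split()


words = split_words("if a == b then a = 1 ; else a = 2 ;")
print(len(words))
print(words)
```

A real lexical analyzer cannot rely on this alone: the same statement written as a==b has no spaces around the operator, so splitting on white space would return a==b as a single word.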

In simple words, lexical analysis is the process of identifying the words in an input string of characters, so that the parser can handle the input more easily. The words must either be separated by some predefined delimiter, or the language must impose rules for breaking a sentence into tokens (words), which are then passed on to the next phase, syntax analysis. In programming languages, a character from a different class may also act as a word separator: in a==b, for instance, the change from a letter to an operator character ends the word a even though no white space follows it.
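The idea that a change of character class ends a word can be sketched as follows. This is an illustrative simplification, not a complete lexer. The class names ("space", "word", "punct", "operator"), the choice of ';' as the only punctuation character, and the helper names char_class and tokenize are all assumptions made for this example.

```python
def char_class(ch: str) -> str:
    """Assign a character to one of four hypothetical classes."""
    if ch.isspace():
        return "space"
    if ch.isalnum() or ch == "_":
        return "word"        # letters, digits, underscore
    if ch == ";":
        return "punct"       # assumed: ';' is always a token on its own
    return "operator"        # everything else, e.g. '=' or '=='


def tokenize(text: str) -> list[str]:
    """Split text into tokens wherever the character class changes."""
    tokens: list[str] = []
    current = ""
    current_class = None
    for ch in text:
        cls = char_class(ch)
        # A class change ends the current word; punctuation never merges.
        if cls != current_class or cls == "punct":
            if current:
                tokens.append(current)
            current = ""
        if cls != "space":   # white space separates words but is not kept
            current += ch
        current_class = cls
    if current:
        tokens.append(current)
    return tokens


print(tokenize("if a==b then a=1;else a=2;"))
```

Even with most of the spaces removed, this yields the same 14 words as the spaced version of the statement. Note the simplification: adjacent operator characters are always merged into one token, so an input such as a=-1 would produce the single operator =-. Practical lexical analyzers avoid this by matching against an explicit list of token patterns.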