Chapter 3: Lexical Analysis

Notation

. Let S be a set of characters. A language over S is a set of strings of characters belonging to S

. A regular expression r denotes a language L(r)

. Rules that define the regular expressions over S

- ε is a regular expression that denotes { ε }, the set containing the empty string

- If a is a symbol in S then a is a regular expression that denotes {a}

Let S be a set of characters. A language over S is a set of strings of characters belonging to S. A regular expression is built up out of simpler regular expressions using a set of defining rules. Each regular expression r denotes a language L(r). The defining rules specify how L(r) is formed by combining, in various ways, the languages denoted by the subexpressions of r. The following rules define the regular expressions over S:

. ε is a regular expression that denotes { ε }, that is, the set containing the empty string.

. If a is a symbol in S, then a is a regular expression that denotes { a }, i.e., the set containing the string a. Although we use the same notation for all three, technically the regular expression a is different from the string a or the symbol a. It will be clear from the context whether we are talking about a as a regular expression, string or symbol.
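
The two rules above (the empty-string rule and the single-symbol rule) can be sketched in Python. This is only an illustration under stated assumptions, not a prescribed implementation: the class names Epsilon and Symbol and the function language are hypothetical names chosen for this example, and the empty string is represented by Python's "". Using a separate Symbol class also makes the distinction between the regular expression a and the string a explicit.

```python
from dataclasses import dataclass

EMPTY = ""  # assumption: the empty string is modelled as Python's ""


@dataclass(frozen=True)
class Epsilon:
    """Hypothetical class for the regular expression denoting the empty string."""


@dataclass(frozen=True)
class Symbol:
    """Hypothetical class for the regular expression a, where a is one symbol."""
    char: str


def language(r, alphabet):
    """Return L(r) as a set of strings, for the two base rules only."""
    if isinstance(r, Epsilon):
        return {EMPTY}            # the set containing only the empty string
    if isinstance(r, Symbol):
        if r.char not in alphabet:
            raise ValueError(f"{r.char!r} is not a symbol in the alphabet")
        return {r.char}           # L(a) = { a }
    raise TypeError(f"unsupported regular expression: {r!r}")


S = {"a", "b"}                    # example alphabet
print(language(Epsilon(), S))     # {''}
print(language(Symbol("a"), S))   # {'a'}
print(Symbol("a") == "a")         # False: the expression a is not the string a
```

Note that L(r) is always a set of strings, even when it contains a single string, so the result for the symbol a is the set { a } rather than the string a itself.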