Problems
. Scans text character by character
. Lookahead character determines what kind of token to read and when the current token ends
. First character alone cannot determine what kind of token we are going to read
The problem with the lexical analyzer is that the input is scanned character by character. It is not possible to determine, by looking only at the first character, what kind of token we are going to read, since that character might be common to multiple tokens; we saw one such example with > and >= previously. So the analyzer needs a lookahead character, based on which it can decide what kind of token to read or where the current token ends. The boundary may not be a punctuation mark or a blank but simply the start of another token. The lexical analyzer we just saw used the function ungetc() to push lookahead characters back into the input stream. Because a large amount of time can be consumed moving characters back and forth, there is considerable overhead in processing each input character. To reduce this overhead, many specialized buffering schemes have been developed and used.
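The > versus >= case can be sketched in C as follows, using ungetc() to push the lookahead character back when it belongs to the next token (the token names and overall structure here are illustrative assumptions, not the exact analyzer discussed above):

```c
#include <stdio.h>
#include <ctype.h>

/* Illustrative token kinds (names are assumptions). */
enum token { TOK_GT, TOK_GE, TOK_NUM, TOK_EOF, TOK_UNKNOWN };

/* Read one token from the stream. A lookahead character decides
   between ">" and ">=", and is pushed back with ungetc() when it
   is not part of the current token. */
static enum token next_token(FILE *in)
{
    int c = fgetc(in);
    while (c == ' ' || c == '\t' || c == '\n')   /* skip whitespace */
        c = fgetc(in);
    if (c == EOF)
        return TOK_EOF;
    if (c == '>') {
        int la = fgetc(in);          /* the lookahead character */
        if (la == '=')
            return TOK_GE;           /* lookahead consumed: token is ">=" */
        ungetc(la, in);              /* not ours: push it back */
        return TOK_GT;
    }
    if (isdigit(c)) {
        while (isdigit(c = fgetc(in)))
            ;                        /* scan the rest of the number */
        ungetc(c, in);               /* first non-digit ends the token */
        return TOK_NUM;
    }
    return TOK_UNKNOWN;
}
```

Note that the number-scanning branch shows the point made above: the token boundary need not be a blank or punctuation; any non-digit character, including the start of another token, ends the number and is pushed back for the next call.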