Lexical Analysis: The Almost Unforgotten One
Lexical analysis is a standard term in computer science, particularly in natural language processing and compiler construction. In compiler design it is the first phase of compilation: the lexer (also called a scanner) reads the source text and segments it into meaningful units called tokens, producing a stream that is much easier for later phases to read and analyze.
A token is the smallest meaningful unit in a programming language. Examples of tokens include keywords, operators, identifiers, and literals. For example, given the statement int x = 10; the lexer breaks the code into the tokens "int", "x", "=", "10", and ";". This tokenization simplifies parsing, semantic analysis, and the subsequent phases of compiler construction, because the input has already been matched to the constituent parts of the source code.
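A minimal sketch of such a tokenizer, written here in Python with regular expressions, illustrates the idea; the token names and patterns are illustrative assumptions rather than any fixed standard:

```python
import re

# Token specification: a name and a regular expression for each token class.
# The categories and patterns below are illustrative, not tied to any real compiler.
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:int|float|return|if|else|while)\b"),
    ("NUMBER",     r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("OPERATOR",   r"[=+\-*/]"),
    ("SEMICOLON",  r";"),
    ("SKIP",       r"\s+"),   # whitespace is discarded, not emitted as a token
    ("MISMATCH",   r"."),     # any other character is a lexical error
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source: str):
    """Yield (token_type, lexeme) pairs for the given source string."""
    for match in MASTER_RE.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"Unexpected character {lexeme!r} at position {match.start()}")
        yield (kind, lexeme)

print(list(tokenize("int x = 10;")))
# [('KEYWORD', 'int'), ('IDENTIFIER', 'x'), ('OPERATOR', '='), ('NUMBER', '10'), ('SEMICOLON', ';')]
```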
Additionally, lexical analysis lets a developer catch some errors quickly: in compiler design, invalid tokens are rejected during the earliest phase of compilation. In other words, a developer gets feedback about lexical errors right away, which undoubtedly improves code quality and productivity. In conclusion, lexical analysis acts as the very first step in translating a high-level language into binary, the foundation on which the rest of compilation and, ultimately, the application program are built.
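Continuing the sketch above (and assuming the same tokenize() helper), an invalid character is reported immediately during lexical analysis, before parsing even begins:

```python
# Reusing the tokenize() sketch above: the stray '@' is flagged by the lexer itself.
try:
    list(tokenize("int x = 10 @;"))
except SyntaxError as err:
    print(err)  # Unexpected character '@' at position 11
```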