In taking a recent course in compiler theory, I decided to write about the implementation details briefly, although the full source code is available at this repository. This compiler does not, however, feature any machine independent optimizations during the code generation phase. I may do another post on compiler optimization, and making use of parallelism in compilers.
The goal of a compiler is to translate
A compiler by definition translates a program in one language into an equivalent program in another language.
So, the lexical analysis phase attempts to break a program in tokens, specifically lexemes separated by a delimiter, which is usually whitespace or curly braces. The goal of the lexer is to generate a token stream to send to the parser.
Contructing an LL(1) Parser
Parsing a program is a rather complex task. The path the parser takes through the program is based on the language specification. There are also various solutions for implementation, including constructing a parse tree of all tokens in a program, such that a program can be expressed by an in order traversal of the Abstract Syntax Tree (AST). Basically, in my implementation I used an LL(1) parser, which is a top-down parser that uses a left-to-right scan, with a leftmost derivation. This parser also uses a one token look ahead. The lexical analyzer and the parser together comprise the semantic analysis component of the compiler.
Constructing a symbol table is useful in scoping and determining a variable’s type. I found the implementation to be rather straightforward. A symbol table can be implemented using a stack of scopes, each of which is a table of the symbols that are currently in scope. So, the symbol table must be integrated into the parser, in order for symbols to be added to the table. This allows for type checking to be implemented in the parsing phase of the program.
There were certain runtime components that needed to be implemented, in order for the program to have access to certain language level functions. These functions were simply added to the symbol table on a global scope, so that the resulting program would always have these runtime functions in scope.
As previously mentioned, this compiler project did not feature any machine independent optimizations. Because of this, the resulting code, although correct, was not extremely efficient. For example, a 3 variable addition:
could be broken down into:
1 2 3