![](https://steemitimages.com/640x0/https://i.postimg.cc/bYt3y2jB/compiler-series.jpg)
[Custom Thumbnail]
All the Code of the series can be found at the Github repository:
https://github.com/drifter1/compiler
Introduction
Hello it's a me again @drifter1! Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison. In this article we will tweak some of the previous code and generate the code of Arithmetic expressions and "simple" Variable References.Requirements:
Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:- What Compiler Design is (mainly the steps)
- For which exact Language the Compiler is build for (Tokens and Grammar)
- How to use Flex and Bison
- How to implement a lexer and parser for the language using those tools
- What the Symbol Table is and how we implement it
- How we combine Flex and Bison together
- How we can pass information from the Lexer to the Parser
- How we define operator priorities, precedencies and associativity
- What Semantic Analysis is (Attributes, SDT etc.)
- How we do the so called "Scope Resolution"
- How we declare types and check the type inter-compatibility for different cases that include function parameters and assignments
- How we check function calls later on using a "Revisit queue", as function declarations mostly happen after functions get used (in our Language)
- Intermediate Code Representations, including AST's
- How we implement an AST (structure and management)
- Action Rules for AST nodes and more
- The target system's architecture and its assembly language (MIPS)
- Register Allocation & Assignment and how we implement it
- How we generate Assembly code for various cases
Difficulty:
Talking about the series in general this series can be rated:- Intermediate to Advanced
- Medium
Actual Tutorial Content
Tweaking Previous Code
During the implementation of the new Code, I found out some things that have to be changed in order for them to work correctly or to just simplify things a little more...expression_data_type() Function
The first change is inside of the ast.c function "expression_data_type" that returns the datatype of any given node/expression. Knowing that the actual datatype of a pointer is an integer, we will return INT_TYPE instead of the inf_type entry that can easily return REAL_TYPE for pointers to real variables.So, the REF_NODE case in the switch-case of that function now becomes:
case REF_NODE: /* identifier reference */
temp_ref = (AST_Node_Ref *) node;
/* if "simple" type */
int type = temp_ref->entry->st_type;
if(type == INT_TYPE || type == REAL_TYPE || type == CHAR_TYPE){
return temp_ref->entry->st_type;
}
/* if array */
else if(type == ARRAY_TYPE){
return temp_ref->entry->inf_type;
}
else if(type == POINTER_TYPE){
return INT_TYPE;
}
break;
main_reg_allocation() Function
Thinking about the stuff that we talked about in part 1, relational and equality nodes don't get stored inside of temporary variables, but are translated into branch instructions. This tells us that the REL_NODE and EQU_NODE cases don't have to insert a new temporary variable. For those types of nodes we also don't have to insert edges to the adjacency graph.So, we will just comment-out any call of the function that's about conditions and empty-out those two previously mentioned node-cases.
The new code is:
...
case REL_NODE:
/* used in branch and loop conditions - not stored */
break;
case EQU_NODE:
/* used in branch and loop conditions - not stored */
break;
...
case IF_NODE:
temp_if = (struct AST_Node_If *) node;
//main_reg_allocation(temp_if->condition);
...
case ELSIF_NODE:
temp_elsif = (struct AST_Node_Elsif *) node;
//main_reg_allocation(temp_elsif->condition);
main_reg_allocation(temp_elsif->elsif_branch);
break;
case FOR_NODE:
temp_for = (struct AST_Node_For *) node;
//main_reg_allocation(temp_for->initialize);
//main_reg_allocation(temp_for->condition);
main_reg_allocation(temp_for->for_branch);
//main_reg_allocation(temp_for->increment);
break;
case WHILE_NODE:
temp_while = (struct AST_Node_While *) node;
//main_reg_allocation(temp_while->condition);
main_reg_allocation(temp_while->while_branch);
break;
....
generate_statements() Function
In order to make the load and store operations of the main "declared" variables less, we will add lots of new edges to the AdjGraph, so that those variables have a different register with each other and all the temporary variables.The Code for adding those edges is:
for(i = 0; i < var_count - temp_count; i++){
for(j = 1; j < var_count; j++){
if(i < j){
insertEdge(i, j);
insertEdge(j, i);
}
}
}
GetRegisterName() Function
To simplify the naming process of registers, we create a new function that takes in two parameters: the color and if the registers if GP or FP. So, each variable of the varArray is basically allocated two registers: one GP and one FP. Lots of the GP and FP registers will be allocated for other specific purposes (FP constants, type conversion etc.), but we will still have more than 10 GP registers available and 12 dual-FP registers available. Those numbers might change, but for now that's it!The GP registers that can be used are $s0-$s7 and $t0-$t7. The FP registers that are available are $f0-$f24. As we said last time, register $f30 is reserved for conversion purposes. Additionally, we will now also reserve register $f28 for floating-point constants and register $26 for the conversion of the result case: double to int.
Either way, for now the function's Code is:
char * GetRegisterName(int color, int isFloat){
char* regName;
regName = (char*) malloc(5 * sizeof(char));
if(isFloat == 0){
switch(color){
/* callee saved values */
case 0:
case 1:
case 2:
case 3:
case 4:
case 5:
case 6:
case 7:
sprintf(regName, "$s%d", color);
break;
/* caller saved temporaries */
case 8:
case 9:
case 10:
case 11:
case 12:
case 13:
case 14:
case 15:
sprintf(regName, "$t%d", color - 8);
break;
default:
fprintf(stderr, "Too many GP registers!\n");
exit(1);
}
}
else{
switch(color){
/* callee saved values */
case 0:
case 1:
case 2:
case 3:
case 4:
case 5:
case 6:
case 7:
case 8:
case 9:
case 10:
case 11:
case 12:
sprintf(regName, "$f%d", color*2);
break;
default:
fprintf(stderr, "Too many FP registers!\n");
exit(1);
}
}
return regName;
}
Generating the Main Function
main_func_traversal() Function
To generate the code of the main funciton we have to define a new function that again uses the same "recursive" concept as "ast_traversal" or the newer "main_reg_allocation" function. We will only use that new function to traverse through the main function's AST Tree (main_func_tree). The actual generation of expressions, statements etc. will be done through other functions. In other words, we will define functions like generate_arithm() or generate_bool(), to generate the needed code for those cases.So, how does this function look like? Well, the Code is
void main_func_traversal(FILE *fp, AST_Node *node){
int i;
AST_Node_Declarations *temp_declarations;
AST_Node_Decl *temp_decl;
AST_Node_Arithm *temp_arithm;
AST_Node_Bool *temp_bool;
AST_Node_Rel *temp_rel;
AST_Node_Equ *temp_equ;
AST_Node_Ref *temp_ref;
AST_Node_Statements *temp_statements;
AST_Node_If *temp_if;
AST_Node_Elsif *temp_elsif;
AST_Node_For *temp_for;
AST_Node_While *temp_while;
AST_Node_Incr *temp_incr;
AST_Node_Assign *temp_assign;
AST_Node_Func_Call *temp_func_call;
AST_Node_Call_Params *temp_call_params;
/* temp variable ST entry */
list_t *entry;>
/* check if empty */
if(node == NULL){
return;
}
switch(node->type){
/* declarations case */
case DECLARATIONS:
/* nothing */
break;
/* declaration case */
case DECL_NODE:
/* nothing */
break;
/* left and right child cases */
case BASIC_NODE:
main_func_traversal(fp, node->left);
main_func_traversal(fp, node->right);
break;
case ARITHM_NODE:
temp_arithm = (struct AST_Node_Arithm *) node;
main_func_traversal(fp, node->left);
main_func_traversal(fp, node->right);
generate_arithm(fp, temp_arithm);
break;
case BOOL_NODE:
temp_bool = (struct AST_Node_Bool *) node;
main_func_traversal(fp, node->left);
main_func_traversal(fp, node->right);
generate_bool(fp, temp_bool);
break;
case REL_NODE:
temp_rel = (struct AST_Node_Rel *) node;
main_func_traversal(fp, node->left);
main_func_traversal(fp, node->right);
generate_rel(fp, temp_rel);
break;
case EQU_NODE:
temp_equ = (struct AST_Node_Equ *) node;
main_func_traversal(fp, node->left);
main_func_traversal(fp, node->right);
generate_equ(fp, temp_equ);
break;
/* reference case */
case REF_NODE:
temp_ref = (struct AST_Node_Ref *) node;
/* load value from memory to register */
generate_load(fp, temp_ref);
break;
/* constant case */
case CONST_NODE:
/* already managed in the various generation functions */
break;
/* statements case */
case STATEMENTS:
temp_statements = (struct AST_Node_Statements *) node;
for(i = 0; i < temp_statements->statement_count; i++){
main_func_traversal(fp, temp_statements->statements[i]);
}
break;
/* the if case */
case IF_NODE:
temp_if = (struct AST_Node_If *) node;
main_func_traversal(fp, temp_if->condition);
main_func_traversal(fp, temp_if->if_branch);
if(temp_if->elseif_count > 0 ){
for(i = 0; i < temp_if->elseif_count; i++){
main_func_traversal(fp, temp_if->elsif_branches[i]);
}
}
if(temp_if->else_branch != NULL){
main_func_traversal(fp, temp_if->else_branch);
}
break;
/* the else if case */
case ELSIF_NODE:
temp_elsif = (struct AST_Node_Elsif *) node;
main_func_traversal(fp, temp_elsif->condition);
main_func_traversal(fp, temp_elsif->elsif_branch);
break;
/* for case */
case FOR_NODE:
temp_for = (struct AST_Node_For *) node;
main_func_traversal(fp, temp_for->initialize);
main_func_traversal(fp, temp_for->condition);
main_func_traversal(fp, temp_for->for_branch);
main_func_traversal(fp, temp_for->increment);
break;
/* while case */
case WHILE_NODE:
temp_while = (struct AST_Node_While *) node;
main_func_traversal(fp, temp_while->condition);
main_func_traversal(fp, temp_while->while_branch);
break;
/* assign case */
case ASSIGN_NODE:
temp_assign = (struct AST_Node_Assign *) node;
main_func_traversal(fp, temp_assign->assign_val);
break;
/* simple case */
case SIMPLE_NODE:
/* will be managed in another article */
break;
/* increment statement */
case INCR_NODE:
temp_incr = (AST_Node_Incr*) node;
/* will be covered in another article */
break;
/* function call case */
case FUNC_CALL:
temp_func_call = (struct AST_Node_Func_Call *) node;
if(temp_func_call->num_of_pars != 0){
for(i = 0; i < temp_func_call->num_of_pars; i++){
main_func_traversal(fp, temp_func_call->params[i]);
}
}
/* when function non-void */
if(temp_func_call->entry->inf_type != VOID_TYPE){
generate_func_call_res(fp, temp_func_call);
}
break;
case CALL_PARAMS:
temp_call_params = (struct AST_Node_Call_Params*) node;
/* parameters will be covered in another article */
break;
/* function declaration stuff */
case FUNC_DECLS:
case FUNC_DECL:
case RET_TYPE:
case DECL_PARAMS:
case RETURN_NODE:
/* can't occur in main */
break;
default: /* wrong choice case */
fprintf(stderr, "Error in node selection!\n");
exit(1);
}
}
The various generation function that we use are declared as following:
void generate_arithm(FILE *fp, AST_Node_Arithm *node);
void generate_bool(FILE *fp, AST_Node_Bool *node);
void generate_rel(FILE *fp, AST_Node_Rel *node);
void generate_equ(FILE *fp, AST_Node_Equ *node);
void generate_load(FILE *fp, AST_Node_Ref *node);
void generate_func_call_res(FILE *fp, AST_Node_Func_Call *node);
Let's get into the implementation of two of them...
generate_load() Function
The load instruction is called to get the value of a variable from memory and to load it into the corresponding register of that variable. The types of load instructions that we can have are basically:- LW - loading value from memory
- LA - loading address from memory
- L.D - loading floating-point value from memory
Based on that, we write the following code
void generate_load(FILE *fp, AST_Node_Ref *node){
if(node->entry->st_type == REAL_TYPE){
fprintf(fp, "L.D %s, %s\n", GetRegisterName(node->entry->g_index, 1), node->entry->st_name);
}
else{
if(node->ref == 1){
fprintf(fp, "LA %s, %s($0)\n", GetRegisterName(node->entry->g_index, 0), node->entry->st_name);
}
else{
fprintf(fp, "LW %s, %s($0)\n", GetRegisterName(node->entry->g_index, 0), node->entry->st_name);
}
}
}
generate_arithm() Function
Generating arithmetic expressions is veeery veeery complicated as we might have to convert values, store floating-point constant...There are basically lots of different types of operands and combinations between them. So, let's start simple...Basic block
The basic block of this function will be a switch-case of the operator-type and so we have:void generate_arithm(FILE *fp, AST_Node_Arithm *node){
...
/* operation */
switch(node->op){
case ADD:
/* code */
break;
case SUB:
/* code */
break;
case MUL:
/* code */
break;
case DIV:
/* code */
break;
case INC:
/* code */
break;
case DEC:
/* code */
break;
}
}
Of course the instruction depends on the operator!
Increment/Decrement Operators
The simplest arithmetic instructions are increments and decrements. For those we basically just have to create:- ADDI - incrementation of integer value
- SUBI - decrementation of integer value
- ADD.D with FP constant "1.0" stored in $f28 - incrementation of FP value
- SUB.D with FP constant "1.0" stored in $f28 - decrementation of FP value
In Code this looks like this
...
case INC:
/* check data type */
if (node->data_type == REAL_TYPE){
fprintf(fp, "LI.D $28, 1.0\n");
fprintf(fp, "ADD.D %s, %s, $28\n", GetRegisterName(node->g_index, 1), GetRegisterName(node->g_index, 1));
}
else{
fprintf(fp, "ADDI %s, %s, 1\n", GetRegisterName(node->g_index, 0), GetRegisterName(node->g_index, 0));
}
break;
case DEC:
/* check data type */
if (node->data_type == REAL_TYPE){
fprintf(fp, "LI.D $28, 1.0\n");
fprintf(fp, "SUB.D %s, %s, $28\n", GetRegisterName(node->g_index, 1), GetRegisterName(node->g_index, 1));
}
else{
fprintf(fp, "SUBI %s, %s, 1\n", GetRegisterName(node->g_index, 0), GetRegisterName(node->g_index, 0));
}
break;
...
As you can see we are using GetRegisterName() to get the specific register of the variable that gets incremented (node->g_index). Also see how we use 1 or 0 as the second parameter, to denote the type of the register (FP or GP).
Other Operators
The code of the other operators simply changes by the prefix (ADD, SUB, MUL, DIV), but when do we use OP, OPI or OP.D and how do we manage the various operands? Well, the easiest thing to do is to define some new variables:- float_op flag - 0: integer operation, 1: floating-point operation
- const_op flag - 0 : operation without constant, 1: operation contains constant
- Result (type) - 0: integer, 1: double
- Operand1 - 0: GP, 1: FP, 2: constant, 3: FP constant
- Operand2 - 0: GP, 1: FP, 2: constant, 3: FP constant
First we have to check the left (or 1st) operand:
if (expression_data_type(node->left) == REAL_TYPE){
float_op = 1;
if(node->left->type == CONST_NODE){
const_op = 1;
Operand1 = 3;
}
else{
Operand1 = 1;
}
}
else{
if(node->left->type == CONST_NODE){
const_op = 1;
Operand1 = 2;
}
else{
Operand1 = 0;
}
}
Similar code is written to check the right (or 2nd) operand.
Afterwards, we also check the result:
if(node->data_type == REAL_TYPE){
float_op = 1;
Result = 1;
}
After getting the information of the operands and result and storing it into the previously mentioned 5 variables, we basically have to check in which of the various combinations we are in...using continuous if-statements...
Experimenting a lot I ended up with the following code for the ADD operator:
if(float_op == 1){
if(const_op == 1){
if(Operand1 == 2 || Operand1 == 3){
temp_const = (AST_Node_Const *) node->left;
}
else{
temp_const = (AST_Node_Const *) node->right;
}
/* floating-point constant */
if(temp_const->const_type == REAL_TYPE){
fprintf(fp, "LI.D $f28, %.2f\n", temp_const->val);
}
else{
fprintf(fp, "LI.D $f28, %d.0\n", temp_const->val);
}
}
/* operand needs conversion */
if(Operand1 == 0){
fprintf(fp, "MTC1.D %s, $f30\n", GetRegisterName(getGraphIndex(node->left) , 0));
fprintf(fp, "CVT.D.W $f30, $f30\n");
}
else if(Operand2 == 0){
fprintf(fp, "MTC1.D %s, $f30\n", GetRegisterName(getGraphIndex(node->right) , 0));
fprintf(fp, "CVT.D.W $f30, $f30\n");
}
fprintf(fp, "ADD.D ");
/* check if result needs conversion */
if(Result == 0){
fprintf(fp, "$f26, ");
}
else{
fprintf(fp, "%s, ", GetRegisterName(node->g_index, 1));
}
switch(Operand1){
case 0:
fprintf(fp, "$f30 ");
break;
case 1:
fprintf(fp, "%s ", GetRegisterName(getGraphIndex(node->left) , 1));
break;
case 2:
case 3:
fprintf(fp, "$f28 ");
}
switch(Operand2){
case 0:
fprintf(fp, "$f30 ");
break;
case 1:
fprintf(fp, "%s ", GetRegisterName(getGraphIndex(node->right) , 1));
break;
case 2:
case 3:
fprintf(fp, "$f28 ");
}
fprintf(fp, "\n");
/* result needs type-conversion */
if(Result == 0){
fprintf(fp, "CVT.W.D $f26, $f26\n");
fprintf(fp, "MTC1 %s, $f26\n", GetRegisterName(node->g_index, 0));
}
}
else if(const_op == 1){
if(Operand1 != 0){
temp_const = (AST_Node_Const *) node->left;
fprintf(fp, "ADDI %s, %s, %d\n", GetRegisterName(node->g_index, 0), GetRegisterName(getGraphIndex(node->right), 0), temp_const->val);
}
if(Operand2 != 0){
temp_const = (AST_Node_Const *) node->right;
fprintf(fp, "ADDI %s, %s, %d\n", GetRegisterName(node->g_index, 0), GetRegisterName(getGraphIndex(node->left), 0), temp_const->val);
}
}
else{
fprintf(fp, "ADD %s, %s, %s\n", GetRegisterName(node->g_index, 0), GetRegisterName(getGraphIndex(node->left), 0), GetRegisterName(getGraphIndex(node->right), 0));
}
Hope that its understandable!
For now, the other generation functions simply print out the type...
Output for Examples
Let's run our compiler for the examples to see the new output files!Simple Example 1
Running the simplest example, we are basically already creating the final code...The Console output is:
![](https://steemitimages.com/640x0/https://i.postimg.cc/TPC5dJRg/image.png)
The generated output file is:
.data
# variables
i: .word 0
val: .double 0.000000
res: .double 0.000000
# messages
.text
main:
L.D $f2, val
LI.D $f28, 1.0
ADD.D $f6, $f2 $f28
Interesting right?
Simple Example 2
Running for the second example we get:![](https://steemitimages.com/640x0/https://i.postimg.cc/RF1SCM18/image.png)
The output file is:
.data
# variables
i: .word 0
val: .double 2.500000
res: .space 80
# messages
.text
main:
LW $s0, i($0)
Relational Expression
L.D $f2, val
LW $s0, i($0)
Function Call result
LW $s2, res($0)
LW $s2, res($0)
Here we can see that some of the code is missing!
Full Example
Running for the complete example we get:![](https://steemitimages.com/640x0/https://i.postimg.cc/4N70pWq4/image.png)
The output code is:
.data
# variables
c: .byte 'c'
i: .word 0
p: .word 0
val: .double 2.500000
res: .double 0.500000, 1.500000, 2.500000, 3.500000, 4.500000, 5.500000
# messages
msg1: .asciiz "\n"
msg2: .asciiz "\n"
msg3: .asciiz "iteration: 3\n"
msg4: .asciiz " "
msg5: .asciiz "\n"
.text
main:
LA $s3, res($0)
LW $s0, i($0)
Relational Expression
LW $s0, i($0)
Relational Expression
LW $s0, i($0)
Equality Expression
LW $s0, i($0)
MULI $s5, $s0, 2
Function Call result
L.D $f4, val
LW $s0, i($0)
Function Call result
LW $s3, res($0)
L.D $f4, val
LW $s0, i($0)
Function Call result
LW $s3, res($0)
LW $s3, res($0)
LW $s4, p($0)
ADDI $t1, $s4, 1
LW $s0, i($0)
Equality Expression
L.D $f4, val
Equality Expression
Boolean Expression
LW $s0, i($0)
Relational Expression
LW $s0, i($0)
LW $s1, c($0)
Oh no! In this example lots of the code is still missing, but we will make sure that all the empty spots get filled out, by the end of all these "generating code" articles!
The other expressions will be covered in the next part(s)...
RESOURCES
References:
No references, just using code that I implemented in my previous articles.Images:
All of the images are custom-made!Previous parts of the series
General Knowledge and Lexical Analysis
- Introduction
- A simple C Language
- Lexical Analysis using Flex
- Symbol Table (basic structure)
- Using Symbol Table in the Lexer
Syntax Analysis
- Syntax Analysis Theory
- Bison basics
- Creating a grammar for our Language
- Combine Flex and Bison
- Passing information from Lexer to Parser
- Finishing Off The Grammar/Parser: [part 1] [part 2]
Semantic Analysis (1)
- Semantic Analysis Theory
- Semantics Examples
- Scope Resolution using the Symbol Table
- Type Declaration and Checking
- Function Semantics: [part 1] [part 2]
Intermediate Code Generation (AST)
- Abstract Syntax Tree Principle
- Abstract Syntax Tree Structure
- Abstract Syntax Tree Management
- Action Rules for Declarations and Initializations
- Action Rules for Expressions
- Action Rules for Assignments and Simple Statements
- Action Rules for If-Else Statements
- Action Rules for Loop Statements and some Fixes
- Action Rules for Function Declarations: [part 1][part 2]
- Action Rules for Function Calls
Semantic Analysis (2)
- Datatype attribute for Expressions
- Type Checking for Assignments
- Revisit Queue and Parameter Checking: [part 1][part 2][part 3][part 4]
- Revisit Queue and Assignment Checking : [part 1][part 2][part 3]
Machine Code Generation
- Machine Code Generation Principles
- MIPS Instruction Set
- Simple Examples in MIPS Assembly
- full_example.c in MIPS Assembly: [part 1][part 2]
- Generating Code for Declarations and Initializations
- Generating Code for Array Initializations and String Messages
- Register Allocation & Assignment Theory
- Implementing Register Allocation: [part 1][part 2][part 3][part 4]
- Generating Code for Expressions: [part 1]
Final words | Next up on the project
And this is actually it for today's post! I hope that I explained everything as much as I needed to. In the next part(s) we will generate the code for the remaining expressions...Next up on this series, in general, are:
- Machine Code generation (MIPS Assembly)
- Various Optimizations in the Compiler's Code
So, see ya next time!
GitHub Account:
https://github.com/drifter1![](https://steemitimages.com/0x0/https://media.giphy.com/media/ybITzMzIyabIs/giphy.gif)
Keep on drifting! ;)
Thank you for your contribution @drifter1.
We have been analyzing your tutorial and we suggest the following points:
Again an excellent tutorial in which it is well explained and well structured. Good job!
Thanks for following our suggestions in your previous tutorial.
Thank you for your work in developing this tutorial.
Looking forward to your upcoming tutorials.
Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.
To view those questions and the relevant answers related to your post, click here.
Need help? Chat with us on Discord.
[utopian-moderator]
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit
Thank you for your review, @portugalcoin! Keep up the good work!
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit
Hi @drifter1!
Your post was upvoted by @steem-ua, new Steem dApp, using UserAuthority for algorithmic post curation!
Your post is eligible for our upvote, thanks to our collaboration with @utopian-io!
Feel free to join our @steem-ua Discord server
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit
Hey, @drifter1!
Thanks for contributing on Utopian.
We’re already looking forward to your next contribution!
Get higher incentives and support Utopian.io!
Simply set @utopian.pay as a 5% (or higher) payout beneficiary on your contribution post (via SteemPlus or Steeditor).
Want to chat? Join us on Discord https://discord.gg/h52nFrV.
Vote for Utopian Witness!
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit