
[Custom Thumbnail]
All the Code of the series can be found at the Github repository: https://github.com/drifter1/compiler
Introduction
Hello it's a me again @drifter1! Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison. In this article we will write the needed Action Rules for "while" and "for" Loop statements, whilst also fixing some stuff around precedence and the sign rule!More specifically, the topics that we will cover today are:
- Some Fixes (precedence, sign rule, messages)
- Action rules of while statements
- Action rules of for statements
- Running the compiler for "full_example.c"
Requirements:
Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:- What Compiler Design is (mainly the steps)
- For which exact Language the Compiler is build for (Tokens and Grammar)
- How to use Flex and Bison
- How to implement a lexer and parser for the language using those tools
- What the Symbol Table is and how we implement it
- How we combine Flex and Bison together
- How we can pass information from the Lexer to the Parser
- How we define operator priorities, precedencies and associativity
- What Semantic Analysis is (Attributes, SDT etc.)
- How we do the so called "Scope Resolution"
- How we declare types and check the type inter-compatibility for different cases, including function parameters
- How we check function calls later on using a "Revisit queue", as function declarations mostly happen after functions get used (in our Language)
- Intermediate Code Representations, including AST's
- How we implement an AST (structure and management)
- Action Rules for other cases
Difficulty:
Talking about the series in general this series can be rated:- Intermediate to Advanced
- Medium
Actual Tutorial Content
Some Fixes
Fixing the precedence
The first thing that I should change immediately is the way that we defined precedence and associativity in the parser! I put the definitions in the wrong order :D Precedence increases downwards and so we have to put them in increasing order and not decreasing order as I had them as of now! Also, let's set up a new kind of precedence "MINUS" that will help us define the precedence of the minus sign that we put in front of constants (and only them).So, the "new" precedencies and associativities are:
%left COMMA
%right ASSIGN
%left OROP
%left ANDOP
%left EQUOP
%left RELOP
%left ADDOP
%left MULOP DIVOP
%right NOTOP INCR REFER MINUS
%left LPAREN RPAREN LBRACK RBRACK
Changing the sign rule
Having the current sign-rule doesn't allow the use of the newly defined precedence for it! So, what do we do? Well, we will just replace this optional sign rule with two new sub-rules for expressions: one with a simple constant and one with a signed constant. Visually this looks as following:
Removing the sign rule completely and moving the action rules to the corresponding sub-rules we now have the following expressions rule snippet:
expression:
... other sub-rules
| constant
{
$$ = $1; /* no sign */
}
| ADDOP constant %prec MINUS
{
/* plus sign error */
if($1.ival == ADD){
fprintf(stderr, "Error having plus as a sign!\n");
exit(1);
}
else{
AST_Node_Const *temp = (AST_Node_Const*) $2;
/* inverse value depending on the constant type */
switch(temp->const_type){
case INT_TYPE:
temp->val.ival *= -1;
break;
case REAL_TYPE:
temp->val.fval *= -1;
break;
case CHAR_TYPE:
/* sign before char error */
fprintf(stderr, "Error having sign before character constant!\n");
exit(1);
break;
}
$$ = (AST_Node*) temp;
}
}
... other sub-rules
;
In the signed constant rule "ADDOP constant" we now define the precedence of this rule using "%prec MINUS", as you can see. And this is it actually :P
Removing-Tweaking lots of console messages
Last but not least let's also remove some console messages from the compiler. These are:- The comment eat-ups in the lexer (lexer.l)
- The add entry and scope hiding messages from the symbol table functions (symtab.c)
Commenting out the messages from the lexer:
...
"//".* { /* printf("Eat up comment at line %d\n", lineno); */ }
"/*" { /* printf("Eat up comment from line %d ", lineno); */ BEGIN(ML_COMMENT); }
"*/" { /* printf("to line %d\n", lineno); */ BEGIN(INITIAL); }
...
I did the same for the functions "insert" and "hide_scope" of "symtab.c". Go and check it out on GitHub! :)
A snippet of the "ast_traversal" function:
...
else if(node->type == IF_NODE){
AST_Node_If *temp_if = (struct AST_Node_If *) node;
ast_print_node(node);
printf("Condition:\n");
ast_traversal(temp_if->condition);
printf("If branch:\n");
ast_traversal(temp_if->if_branch);
if(temp_if->elseif_count > 0 ){
printf("Else if branches:\n");
for(i = 0; i < temp_if->elseif_count; i++){
printf("Else if branch%d:\n", i);
ast_traversal(temp_if->elsif_branches[i]);
}
}
if(temp_if->else_branch != NULL){
printf("Else branch:\n");
ast_traversal(temp_if->else_branch);
}
}
...
Action rules of while statements
The simplest of today's two statements is the while statement! A while statement can be visualized as following:
We have already managed the two parts of this statement in previous articles! So, what do we do? Well, first of all let's add the token definition of the two loop statements. Both are nodes and so it is pretty simple and looks as following:
%type <node> for_statement while_statement
In the "statement"-rule we just have to pass the information from the actual rule, which means that we have to write:
statement:
... other sub-rules
| for_statement
{
$$ = $1; /* just pass information */
}
| while_statement
{
$$ = $1; /* just pass information */
}
... other sub-rules
;
The actual "while_statement" rule, simple has to use the creation function, passing the correct parameters that it gets from the already made nodes of "expression" and "tail". So, we have:
while_statement: WHILE LPAREN expression RPAREN tail
{
$$ = new_ast_while_node($3, $5);
}
;
Action rules of for statements
We could do similar stuff for the for statements, but let's change the rule a little bit, so that we can store the loop counter-variable inside of the AST node. Let's also check if the same counter-variables is used in both the initialize and increment parts!So, the new AST_Node_For structure is:
typedef struct AST_Node_For{
enum Node_Type type; // node type
// initialization
struct AST_Node *initialize;
// condition
struct AST_Node *condition;
// incrementation
struct AST_Node *increment;
// branch
struct AST_Node *for_branch;
// loop counter
list_t *counter;
}AST_Node_For;
Let's not add this new entry to the creation function, but make a completely new function that will add the counter entry to the structure, whilst also checking if the same one is used in the initialize and increment parts!
This function is defined as following:
ast.h:
...
void set_loop_counter(AST_Node *node);
...
----------------------------------------------------------------------
ast.c:
...
void set_loop_counter(AST_Node *node){
/* type-cast to for node */
AST_Node_For *for_node = (AST_Node_For *) node;
/* find the counter */
AST_Node_Assign *assign_node = (AST_Node_Assign *) for_node->initialize;
for_node->counter = assign_node->entry;
/* check if the same one occurs in increment! */
AST_Node_Incr *incr_node = (AST_Node_Incr *) for_node->increment;
if( strcmp(incr_node->entry->st_name, assign_node->entry->st_name) ){
fprintf(stderr, "Variable used in init and incr of for are not the same!\n");
exit(1);
}
/* type-cast back to AST_Node */
node = (AST_Node *) for_node;
}
...
To make this procedure simpler, I expect to see an increment expression as the expression found in "increment". This means that we will change the "for_statement"-rule to only allow such an expression! So, In the end we can visualize the whole statement as:

The action rules for this statement now have to contain the creation of the increment node. This node and any other node that can be found from the bits and pieces of the rule, will be passed as an parameter of the action for-node creation function. IN the end we will also call the set_loop_counter function, which will take care of the new counter-stuff!
The final "for_statement"-rule becomes:
for_statement: FOR LPAREN assigment SEMI expression SEMI ID INCR RPAREN tail
{
/* create increment node*/
AST_Node *incr_node;
if($8.ival == INC){ /* increment */
incr_node = new_ast_incr_node($7, 0, 0);
}
else{
incr_node = new_ast_incr_node($7, 1, 0);
}
$$ = new_ast_for_node($3, $5, incr_node, $10);
set_loop_counter($$);
}
;
Running the Compiler for "full_example.c"
Let's run the compiler for the full example program. A snippet of the console messages and part of the example file this messages correspond too looks as following:
Amazing right? Changing the order of node printing and finally adding the for and while statements today, we can now clearly see how the program is build up.
Finding the precedence-error
Changing the order of printing I saw that our precedence was defined in wrong order! More specifically the last if-statement of the for (that you can't see in the snippet) had a condition that was parsed in the wrong way! Boolean operators have a higher precedence than equality operators, making them go "higher" in the tree and so print out later. Well, this was not the case with the way that I defined the precedence previously :PThe if-statement is:
if(i == 2 && val == 4.5) ...
So, what I saw in the messages was that the bool operator '&&" was printed before the equality operator '=='. The same also happened in arithmetic operators, making add or sub operators go higher than mul or div operations, which is clearly wrong!! We don't have such a case in the example program and so I never really noticed this error :D
RESOURCES
References:
No references, just using code that I implemented in my previous articles.Images:
All of the images are custom-made!Previous parts of the series
- Introduction -> What is a compiler, what you have to know and what you will learn
- A simple C Language -> Simplified C, comparison with C, tokens, basic structure
- Lexical Analysis using Flex -> Theory, Regular Expressions, Flex, Lexer
- Symbol Table (basic structure) ->Why Symbol Tables, Basic Implementation
- Using Symbol Table in the Lexer -> Flex and Symbol Table combination
- Syntax Analysis Theory -> Syntax Analysis, grammar types and parsing
- Bison basics -> Bison tutorial actually
- Creating a grammar for our Language -> Grammar and first Parser
- Combine Flex and Bison -> lexer and parser combined
- Passing information from Lexer to Parser -> Bug Fix, yylval variable, YYSTYPE union, Setting up the Parser, Passing information "directly" and through the symbol table (special case) with examples.
- Finishing Off The Grammar/Parser (part 1) -> Adding the missing grammar rules, small fixes
- Finishing Off The Grammar/Parser (part 2) -> Operator priorities, precedencies and associativity, complete example file (for testing), grammar rule visualization, finish off grammar/parser
- Semantic Analysis Theory -> What is Semantic Analysis about, CSG and CSL, Attribute Grammars, Syntax Directed Translations (SDT)
- Semantics Examples -> Visualization of Semantics for different rules/cases, needed attributes, what we have to implement from now on
- Scope Resolution using the Symbol Table -> What we have now, scope resolution, integration, compiler output
- Type Declaration and Checking -> Type declaration and type checking code, "simple" code integration
- Function Semantics (part 1) -> Code for parameter definitions, function declarations and parameter compatibility checking
- Function Semantics (part 2) -> Revisit queue concept, basic implementation, inserting undeclared identifiers
- Abstract Syntax Tree Principle -> Intermediate code generation and representations, how to design an AST
- Abstract Syntax Tree Structure -> Node, operator and value types, AST node structures, some management functions, integration with the rest of the compiler
- Abstract Syntax Tree Management -> Tweaking Nodes, Node creation and AST traversal functions, "manual" testing of the AST
- Action Rules for Declarations and Initializations -> Replacing many values with a Value union, Other smaller changes, Action rules for declarations, Action rules for initializations
- Action Rules for Expressions -> Separating operator tokens, Action rules for expressions, running the compiler
- Action Rules for Assignments and Simple Statements -> Action rules for simple statements, Action rules for assignments, running the compiler
- Action Rules for If-Else Statements -> Statements-node, Action rules for “simple” and “complicated” if-else statements, running the compiler
Final words | Next up on the project
And this is actually it for today's post! I hope that I explained everything as much as I needed to, meaning that you learned something out of it.Next up on this series are:
- Semantic analysis (using more action rules in Bison)
- Intermediate Code generation (using the AST structure for more cases)
- Machine Code generation (MIPS Assembly)
So, see ya next time!
GitHub Account:
https://github.com/drifter1
Keep on drifting! ;)
Thank you for your contribution @drifter1.
We suggest only one point for improvement in your tutorial
Thanks again for your good work on developing this tutorial. Your tutorials are very well explained. We look forward to seeing more tutorials.
Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.
To view those questions and the relevant answers related to your post, click here.
Need help? Write a ticket on https://support.utopian.io/.
Chat with us on Discord.
[utopian-moderator]
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit
Thank you for your review, @portugalcoin! Keep up the good work!
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit
I read years ago when they added generics to Java that supposedly angle brackets (e.g.:
List<String>()
) was really hard to get the compiler to see correctly. Have you noticed this?Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit
Haha seems legit...
Must be a pretty difficult grammar rule to setup :)
It's quite tricky when symbols are being used in more than one places of the grammar. A great example in C might be the '*' , which is the multiply symbol and also being used for pointers. The big problem with the generics in Java must have been something similar! The symbols < and > are also being used as relational operators. So, there might have been conflicts between function calls (object constructor actually), objects etc, which of course can also be compared using such relational operator!
Nice topic!
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit
Hey @inertia
Here's a tip for your valuable feedback! @Utopian-io loves and incentivises informative comments.
Contributing on Utopian
Learn how to contribute on our website.
Want to chat? Join us on Discord https://discord.gg/h52nFrV.
Vote for Utopian Witness!
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit
Hi @drifter1!
Your post was upvoted by @steem-ua, new Steem dApp, using UserAuthority for algorithmic post curation!
Your UA account score is currently 3.505 which ranks you at #6323 across all Steem accounts.
Your rank has not changed in the last three days.
In our last Algorithmic Curation Round, consisting of 230 contributions, your post is ranked at #214.
Evaluation of your UA score:
Feel free to join our @steem-ua Discord server
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit
Congratulations @drifter1! You have completed the following achievement on the Steem blockchain and have been rewarded with new badge(s) :
Click here to view your Board of Honor
If you no longer want to receive notifications, reply to this comment with the word
STOP
To support your work, I also upvoted your post!
Do not miss the last post from @steemitboard:
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit
Hey, @drifter1!
Thanks for contributing on Utopian.
We’re already looking forward to your next contribution!
Get higher incentives and support Utopian.io!
Simply set @utopian.pay as a 5% (or higher) payout beneficiary on your contribution post (via SteemPlus or Steeditor).
Want to chat? Join us on Discord https://discord.gg/h52nFrV.
Vote for Utopian Witness!
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit