![](https://steemitimages.com/640x0/https://i.postimg.cc/L43XFttv/compiler-series.jpg)
[Custom Thumbnail]
All the Code of the series can be found at the Github repository: https://github.com/drifter1/compiler
Introduction
Hello it's a me again @drifter1! Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison. In this article we will check if the types of the expression and identifier in an assignment statement are compatible with each other or not. Furthermore, we will also simplify some rules and tweak other things. One of those things is a "sneak-peak" of the next articles :)The topics that we will cover today are:
- Simplify pointer and array rules
- Advanced type error message
- Assignment type check
- Some stuff around parameters
Requirements:
Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:- What Compiler Design is (mainly the steps)
- For which exact Language the Compiler is build for (Tokens and Grammar)
- How to use Flex and Bison
- How to implement a lexer and parser for the language using those tools
- What the Symbol Table is and how we implement it
- How we combine Flex and Bison together
- How we can pass information from the Lexer to the Parser
- How we define operator priorities, precedencies and associativity
- What Semantic Analysis is (Attributes, SDT etc.)
- How we do the so called "Scope Resolution"
- How we declare types and check the type inter-compatibility for different cases, including function parameters
- How we check function calls later on using a "Revisit queue", as function declarations mostly happen after functions get used (in our Language)
- Intermediate Code Representations, including AST's
- How we implement an AST (structure and management)
- Action Rules for AST nodes and more
Difficulty:
Talking about the series in general this series can be rated:- Intermediate to Advanced
- Medium
Actual Tutorial Content
Simplify pointer and array rules
First of all, I would like to point out that I removed the DOT token (.) from the lexer, as it's not used right now. We might add it back later on, but right now it just gives us an "unused" warning. Also, the pointer and array rules from the grammar that we have right now, allow multiple pointers and multi-dimensional arrays. But, managing those in the symbol table structure and generating Assembly code is not that easily. So, to make the whole process easier for you to understand, and most importantly easier to code, I will only allow single-pointers and one-dimensional arrays. This means that the rules for them change into:pointer: MULOP ; /* for now only single pointers! */
array: /* for now only one-dimensional arrays */
LBRACK expression RBRACK /* can only be used in expressions */
{
// if declaration then error!
if(declare == 1){
fprintf(stderr, "Array declaration at %d contains expression!\n", lineno);
exit(1);
}
}
| LBRACK ICONST RBRACK
{
// set array_size for declaration purposes
$$ = $2.ival;
}
;
That way the actual structure that we have for storing such information (symbol table entry) makes much more sense now! We could also add new entries for such things, but why make things so complicated from the beginning? We can always add such behavior later on...
Advanced type error message
Next up is the "type_error" function that we defined during Semantics Analysis and so inside of the "semantics.c" file. This function get's triggered from the "get_result_type" function, whenever there is a type conflict. To help us distinguish the error that we have, we also added an error message. But this error message just prints out numbers, and remembering all the data and operator type integer mappings is quite difficult! So, let's make it print out the actual names of the data and operator types.In code this looks as following:
void type_error(int type_1, int type_2, int op_type){
fprintf(stderr, "Type conflict between ");
/* first type */
if (type_1 == INT_TYPE) fprintf(stderr,"%s","int");
else if (type_1 == REAL_TYPE) fprintf(stderr,"%s","real");
else if (type_1 == CHAR_TYPE) fprintf(stderr,"%s","char");
else fprintf(stderr,"%s","other");
fprintf(stderr, " and ");
/* second type */
if (type_2 == INT_TYPE) fprintf(stderr,"%s","int");
else if (type_2 == REAL_TYPE) fprintf(stderr,"%s","real");
else if (type_2 == CHAR_TYPE) fprintf(stderr,"%s","char");
else fprintf(stderr,"%s","other");
/* operator */
fprintf(stderr," using op type ");
switch(op_type){
case NONE:
fprintf(stderr,"%s","NONE");
break;
case ARITHM_OP:
fprintf(stderr,"%s","ARITHM_OP");
break;
case INCR_OP:
fprintf(stderr,"%s","INCR_OP");
break;
case BOOL_OP:
fprintf(stderr,"%s","BOOL_OP");
break;
case NOT_OP:
fprintf(stderr,"%s","NOT_OP");
break;
case REL_OP:
fprintf(stderr,"%s","REL_OP");
break;
case EQU_OP:
fprintf(stderr,"%s","EQU_OP");
break;
default:
fprintf(stderr, "Error in operator selection!\n");
exit(1);
}
/* line */
fprintf(stderr, " in line %d\n", lineno);
exit(1);
}
Consider the following code snippet:
...
int i;
i = 5.5;
...
Running the compiler now, when having a type conflict, would give us:
![](https://steemitimages.com/640x0/https://i.postimg.cc/c4Qfj9TH/image.png)
Type error cause we can't assign (operator NONE) a real to an integer!
Assignment type check
Having all the needed functions already implemented in previous articles/parts of the series, we just have to call the "get_result_type" function with the correct arguments:- The variable datatype can be found using the "get_type" function.
- The expression datatype can be found with the "expression_data_type" function
- The operator is NONE, which corresponds to assignments and so type checks only
assigment: var_ref ASSIGN expression
{
AST_Node_Ref *temp = (AST_Node_Ref*) $1;
$$ = new_ast_assign_node(temp->entry, temp->ref, $3);
// check assignment semantics
get_result_type(
get_type(temp->entry->st_name), /* variable datatype */
expression_data_type($3), /* expression datatype */
NONE /* checking compatibility only (no operator) */
);
}
;
Running this function with a NONE argument, will make it check if the first datatype can get a value of the second datatype. That way we know if the expression can be assigned as a value of the identifier. If not, then we will get an error message (like the one that we saw earlier).
Some stuff around parameters
The stuff that we covered up to now is actually all that I have to do for today's topic! But, as that would be just to little, let's now also dive into some other stuff, to prepare for the next big topic of function call semantics. In this topic we have to:- check if the calling parameters are compatible with the formal (declaration) parameters (which includes checking the parameter count)
- set the return type of the function, so that we can check if the function can be used in an expression, assignment etc.
Let's first add some new entries to the revisit queue entry. To check parameters and to be able to tweak the datatype of the symbol table entry/item we have to add:
- a "list_t" entry
- an array of parameter datatype ("par_types"
- the number of parameters ("num_of_pars")
typedef struct revisit_queue{
// symbol table entry
list_t *entry;
// name of identifier
char *st_name;
// type of revisit
int revisit_type;
// parameters
int *par_types;
int num_of_pars;
// maybe additional information to simplify the process ...
struct revisit_queue *next;
}revisit_queue;
The first of those newly added entries can be accessed easily during the "entry" creation inside of the "insert" function of the Symbol Table. This makes us think that the "add_to_queue" function of the Revisit Queue structure, should contain such a parameter. In the same way, to be able to access the queue entry easier, we should also create a search function that gives back the entry without removing it. This will become useful when we actually want to add the information of parameter data-types ("par_types") and the parameter count ("num_of_pars") to this entry. That way we now end up with the following new functions:
symtab.h:
...
void add_to_queue(list_t *entry, char *name, int type);
revisit_queue *search_queue(char *name);
...
----------------------------------------------------------------------------
symtab.c:
...
void add_to_queue(list_t *entry, char *name, int type){
revisit_queue *q;
/* queue is empty */
if(queue == NULL){
/* set up entry */
q = (revisit_queue*) malloc(sizeof(revisit_queue));
q->entry = entry;
q->st_name = name;
q->revisit_type = type;
q->next = NULL;
/* q "becomes" the queue */
queue = q;
}
/* queue not empty */
else{
/* find last element */
q = queue;
while(q->next != NULL) q = q->next;
/* add element to the end */
q->next = (revisit_queue*) malloc(sizeof(revisit_queue));
q->next->entry = entry;
q->next->st_name = name;
q->next->revisit_type = type;
q->next->next = NULL;
}
}
revisit_queue *search_queue(char *name){
revisit_queue *q;
/* search for the entry */
q = queue;
while( strcmp(q->st_name, name) != 0 ) q = q->next;
return q;
}
...
Doing that we of course have to change the "insert"-function, as it uses the "add_to_queue" function. That's pretty easily actually, as we just have to tweak the following line:
add_to_queue(l, l->st_name, PARAM_CHECK);
which is line 68 of the "symtab.c" file.
The "search_queue" function can be used in the "function_call"-rule, to be able to set the two other new entries of the revisit queue entry. This function gives back the queue entry and so we can then easily set the number of parameters, allocate memory for the array, and get the expression data type of each parameter, passing it to each corresponding "par_types" index. In code that looks as following:
function_call: ID LPAREN call_params RPAREN
{
AST_Node_Call_Params *temp = (AST_Node_Call_Params*) $3;
$$ = new_ast_func_call_node($1, temp->params, temp->num_of_pars);
/* add information to revisit queue entry (if one exists) */
revisit_queue *q = search_queue($1->st_name);
if(q != NULL){
q->num_of_pars = temp->num_of_pars;
q->par_types = (int*) malloc(temp->num_of_pars * sizeof(int));
/* get the types of the parameters */
int i;
for(i = 0; i < temp->num_of_pars; i++){
/* get datatype of parameter-expression */
q->par_types[i] = expression_data_type(temp->params[i]);
}
}
}
;
More around that in the next articles!
RESOURCES
References:
No references, just using code that I implemented in my previous articles.Images:
All of the images are custom-made!Previous parts of the series
- Introduction -> What is a compiler, what you have to know and what you will learn
- A simple C Language -> Simplified C, comparison with C, tokens, basic structure
- Lexical Analysis using Flex -> Theory, Regular Expressions, Flex, Lexer
- Symbol Table (basic structure) ->Why Symbol Tables, Basic Implementation
- Using Symbol Table in the Lexer -> Flex and Symbol Table combination
- Syntax Analysis Theory -> Syntax Analysis, grammar types and parsing
- Bison basics -> Bison tutorial actually
- Creating a grammar for our Language -> Grammar and first Parser
- Combine Flex and Bison -> lexer and parser combined
- Passing information from Lexer to Parser -> Bug Fix, yylval variable, YYSTYPE union, Setting up the Parser, Passing information "directly" and through the symbol table (special case) with examples.
- Finishing Off The Grammar/Parser (part 1) -> Adding the missing grammar rules, small fixes
- Finishing Off The Grammar/Parser (part 2) -> Operator priorities, precedencies and associativity, complete example file (for testing), grammar rule visualization, finish off grammar/parser
- Semantic Analysis Theory -> What is Semantic Analysis about, CSG and CSL, Attribute Grammars, Syntax Directed Translations (SDT)
- Semantics Examples -> Visualization of Semantics for different rules/cases, needed attributes, what we have to implement from now on
- Scope Resolution using the Symbol Table -> What we have now, scope resolution, integration, compiler output
- Type Declaration and Checking -> Type declaration and type checking code, "simple" code integration
- Function Semantics (part 1) -> Code for parameter definitions, function declarations and parameter compatibility checking
- Function Semantics (part 2) -> Revisit queue concept, basic implementation, inserting undeclared identifiers
- Abstract Syntax Tree Principle -> Intermediate code generation and representations, how to design an AST
- Abstract Syntax Tree Structure -> Node, operator and value types, AST node structures, some management functions, integration with the rest of the compiler
- Abstract Syntax Tree Management -> Tweaking Nodes, Node creation and AST traversal functions, "manual" testing of the AST
- Action Rules for Declarations and Initializations -> Replacing many values with a Value union, Other smaller changes, Action rules for declarations, Action rules for initializations
- Action Rules for Expressions -> Separating operator tokens, Action rules for expressions, running the compiler
- Action Rules for Assignments and Simple Statements -> Action rules for simple statements, Action rules for assignments, running the compiler
- Action Rules for If-Else Statements -> Statements-node, Action rules for “simple” and “complicated” if-else statements, running the compiler
- Action Rules for Loop Statements and some Fixes -> Some fixes, Action rules of while statements, Action rules of for statements, running the compiler
- Action Rules for Function Declarations (part 1) -> Small Tweaks and Fixes, Visualizing the Problem, The new AST nodes and creation functions, Action Rules for Declarations
- Action Rules for Function Declarations (part 2) -> Action Rules for Function declarations, Action Rules for a Function declaration Action Rules for Parameters, Traversing after parsing is done using "program", Compiler validation using examples
- Action Rules for Function Calls -> Visualizing the Problem,AST nodes and creation functions, Action Rules for Function Calls, Running the compiler
- Datatype attribute for Expressions -> Visualizing the Problem, Parameter datatype, Changes in AST Nodes and creation functions, Running the compiler
Final words | Next up on the project
And this is actually it for today's post! I hope that I explained everything as much as I needed to, meaning that you learned something out of it.Next up on this series are:
- Semantic analysis (using even more action rules in Bison)
- Machine Code generation (MIPS Assembly)
So, see ya next time!
GitHub Account:
https://github.com/drifter1![](https://steemitimages.com/0x0/https://media.giphy.com/media/ybITzMzIyabIs/giphy.gif)
Keep on drifting! ;)
Thank you for another contribution.
Another beautiful piece of work. It is nice to see you performing some enhancements on prior work, particularly the pointers and dot notation.
Oh and not to undermine the usefulness of proper error messages and the type checking.
You could have provided some more sample output/result of the modifications you worked on as well via screenshots/compiler output.
Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.
To view those questions and the relevant answers related to your post, click here.
Need help? Write a ticket on https://support.utopian.io/.
Chat with us on Discord.
[utopian-moderator]
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit
Thank you for your review, @mcfarhat! Keep up the good work!
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit
Hey, @drifter1!
Thanks for contributing on Utopian.
We’re already looking forward to your next contribution!
Get higher incentives and support Utopian.io!
Simply set @utopian.pay as a 5% (or higher) payout beneficiary on your contribution post (via SteemPlus or Steeditor).
Want to chat? Join us on Discord https://discord.gg/h52nFrV.
Vote for Utopian Witness!
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit
Hi @drifter1!
Your post was upvoted by @steem-ua, new Steem dApp, using UserAuthority for algorithmic post curation!
Your post is eligible for our upvote, thanks to our collaboration with @utopian-io!
Feel free to join our @steem-ua Discord server
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit
University was time-consuming in the last weeks and so I don't had much time to work around this series...and for Steemit in general.
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit