In this appendix, we demonstrate how to implement an important compiler part in this completely realistic way. Specifically, we implement a precedence parser that guides a syntax-directed translation of infix logical expressions to its postfix equivalents. We program this implementation in C++, which definitely belongs to the most popular programming languages today.
We implement the syntax-directed translation in the following way. The input grammar may content an identifiers and an integer numbers as operands. Every positive integer represents the logical value true while the zero means false. Lexemes and tokens specifying logical operators and, or and not are denoted by &, |, and , ! respectively. The syntax directed translation is defined by this attribute grammar
C -> !C {POSTFIX(!)} C -> C | C {POSTFIX(|)} C -> C & C {POSTFIX(&)} C -> (C) {} C -> i{$1} {POSTFIX($1)} C -> #{$1} {POSTFIX($1)}
where #’s attributes are 0 or 1, representing true or false, respectively. For instance, this syntax directed translation converts 0 | ! (day & night) to #{0} i{?day} i{?night} & ! |.
Synopsis. First, we conceptualize the implementation in Concept. Then, in C++ Source Code, we give a C++ code of this implementation. As opposed to the previous parts of this book, we present both sections in somewhat schematic way because computer program code is usually commented in this way.
The implementation is conceptualized as an object oriented system based upon these classes Lex
recognizes the next lexeme and produces its token that specifies the lexeme. Each identifier is recorded in the symbol table and represented by a token with an attribute, which is a symbol table address to the symbol table record of the identifier.
PushdownAutomaton
performs the fundamental pushdown operations.
SymbolTable
stores identifiers.
PrecedenceParser
makes the syntax analysis of the tokenized infix expression, obtained by Lex
, by using the precedence parsing method. In addition, it controls the semantic actions to produce the postfix equivalent to the infix expression. PrecedenceParser
holds the instances of the Lex
, PushdownAutomaton
, and SymbolTable
classes.
The following definitions represent the tokens together with the corresponding lexemes. They are specified in the Defines.h
file as follows:
typedef enum PD_SYMBOLS { UNDEF = -1, // used as end of the rule marker TOKEN_IDENT = 0, // 'i' identificator TOKEN_INTEGER = 1, // '#' integer TOKEN_LOG_AND = 2, // '&' disjunction TOKEN_LOG_OR = 3, // '|' conjunction TOKEN_LOG_NEG = 4, // '!' negation TOKEN_B_LEFT = 5, // '(' left parenthesis TOKEN_B_RIGHT = 6, // ')' right parenthesis BOTTOM = 7, // '$' used as the pushdown bottom N_COND = 8, // 'C' nonterminal CONDITION LSHARP = 9, // '<' < RSHARP = 10 // '>' > };
UNDEF symbol is used as an endmarker, which marks, for instance, the end of a rule. LSHARP and RSHARP are pushdown symbols. The other symbols correspond to the input tokens.
Lex
performs
T_Token * Lex::getNextToken():
getting the next tokenvoid Lex::setToken(int type, string name, int value):
setting token with a given valuestypedef struct T_Token { int type; string name; int value; }; T_token token;
PushdownAutomaton
is implemented as
// Definition of the pushdown symbol typedef struct T_PdSymbol { PD_SYMBOLS symbol; // type of the symbol T_Token * token; // pointer to the corresponding token if any }; // definitions of the pushdown structure typedef vector<T_PdSymbol *> T_PdString; typedef T_PdString T_PdAutomaton; typedef T_PdString::iterator T_PdAutomatonIter; typedef T_PdString::reverse_iterator T_PdAutomatonIterRev; T_PdAutomaton pdAutomaton;
PushdownAutomaton
performs
void PushdownAutomaton::push(T_PdSymbol * symbol):
pushing a symbol onto the pushdown top;void pop():
popping the topmost pushdown symbol;T_PdSymbol * PushdownAutomaton::getTopmostTerm():
getting the topmost pushdown terminal;bool PushdownAutomaton::switchTop(T_PdString oldTop, T_PdString newTop):
switching a string for another string on the pushdown top;bool PushdownAutomaton::switchSymbol(PD_SYMBOLS symbol, T_PdString newTop):
replacing the topmost pushdown symbol with a string;bool PushdownAutomaton::compareTop(T_PdString top):
comparing a topmost pushdown string with another string; if they coincide, it returns true, and if they do not, it returns false;T_PdString PushdownAutomaton::getSymbolsToReduce():
getting the pushdown string that starts at the topmost occurrence of symbol < and continues up to the very top of the pushdown;void PushdownAutomaton::setSymbol(T_PdSymbol * pdSymbol, PD_SYMBOLS symbol, T_Token * token):
setting pdSymbol with a given values.
SymbolTable
is implemented as
struct T_SymbolInfo { int length; }; typedef map<string, T_SymbolInfo> T_SymbolTable; typedef map<string, T_SymbolInfo>::iterator T_SymbolTableIter; T_SymbolTable symbolTable;''
SymbolTable
performs
void SymbolTable::installID(string lexeme):
storing new lexeme into the symbol table;T_SymbolTableIter SymbolTable::getID(string lexeme):
return of the symbol-table index addressing the given lexeme.
PrecedenceParser
class stores the grammatical rules in this way
typedef vector<PD_SYMBOLS> T_Rule; typedef vector<PD_SYMBOLS>::iterator T_RuleIter; typedef map<int, T_Rule> T_RuleTable; typedef map<int, T_Rule>::iterator T_RuleTableIter;
In PrecedenceParser
, an operator precedence table is implemented as
char precedenceTable[PTABLEROW][PTABLECOL];
The sematic actions corresponding to the rule used during reductions are implemented as
char postfixAction[PTABLEROW];
Finally, PrecedenceParser
contains
int PrecedenceParser::findRightSide(T_PdString rightSide):
finding the rule according to its right-hand side;T_Rule PrecedenceParser::getRule(int index):
getting the rule according to its table index.int PrecedenceParser::runParsing():
implementation of the syntax-directed translation based upon the precedence parser;Here you can see demonstration of the compiler.
In this section we give the code of the 2 key compiler methods.
PrecedenceParser::PrecedenceParser() { // initialization of the precedence table char precedenceTableInit[PTABLEROW][PTABLECOL] = { /* i # & | ~ ( ) $ */ /* i */ {'_','_','>','>','_','_','>','>'}, /* # */ {'_','_','>','>','_','_','>','>'}, /* & */ {'<','<','>','>','<','<','>','>'}, /* | */ {'<','<','<','>','<','<','>','>'}, /* ~ */ {'<','<','>','>','<','<','>','>'}, /* ( */ {'<','<','<','<','<','<','=','_'}, /* ) */ {'_','_','>','>','_','_','>','>'}, /* $ */ {'<','<','<','<','<','<','_','_'} }; // filling PrecedenceTable with precedenceTableInit int size = PTABLEROW * PTABLECOL * sizeof(char); memcpy(precedenceTable, precedenceTableInit, size); // Actions that generate the postfix notation. Index to the postfixAction array // corresponds to the index of a rule in ruleTable. In this way, we associate each // rule with its semantic action that generates the postfix notation. postfixAction[0] = '!'; postfixAction[1] = '|'; postfixAction[2] = '&'; postfixAction[3] = '_'; // no postfix action defined for brackets postfixAction[4] = 'i'; postfixAction[5] = '#'; // Initialization of the grammatical rules follows next. The first symbol of // each rule detones the left-hand side of the rule. PD_SYMBOLS rules[RULECOUNT][RULELENGTH]= { { N_COND, TOKEN_LOG_NEG, N_COND , UNDEF , UNDEF }, { N_COND, N_COND , TOKEN_LOG_OR , N_COND , UNDEF }, { N_COND, N_COND , TOKEN_LOG_AND, N_COND , UNDEF }, { N_COND, TOKEN_B_LEFT , N_COND , TOKEN_B_RIGHT, UNDEF }, { N_COND, TOKEN_IDENT , UNDEF , UNDEF , UNDEF }, { N_COND, TOKEN_INTEGER, UNDEF , UNDEF , UNDEF } }; // Filling ruleTable with the rules initRules(rules, RULECOUNT); // Initialization of SymbolTable, Lexical analyzer and pushdown automaton. symbolTable = new SymbolTable; lex = new Lex(symbolTable); pdAutomaton = new PushdownAutomaton(); } int PrecedenceParser::runParsing() { T_Token *token; T_PdSymbol *pdSymbol; T_PdSymbol *pdSymbolIns; PD_SYMBOLS inSymbol; T_PdString rightSide; T_PdString newTop; T_PdString oldTop; T_PdString stringToReduce; T_Rule reduceRule; T_PdString leftSide; T_PdString::iterator iter2; T_PdSymbol * pdSymbolTmp; // pdSymbol represents a pushdown symbol. pdSymbol = new T_PdSymbol; setSymbol(pdSymbol, BOTTOM, NULL); // pushing symbol $, which marks the pushdown bottom pdAutomaton->push(pdSymbol); // getting the first token token = lex->getNextToken(); // Token is stored into inSymbol. inSymbol = (PD_SYMBOLS)token->type; do{ // pdSymbol records the topmost terminal. pdSymbol = pdAutomaton->getTopmostTerm(); // inSymbol and pdSymbol determines the corresponding precedenceTable // entry, which then this entry species the parsing action to perform. switch(precedenceTable[pdSymbol->symbol][inSymbol]) { case '=': /* two computational actions are performed: (1): push(inSymbol) & (2): get the next token */ /* (1): push(inSymbol) */ pdSymbol = new T_PdSymbol; setSymbol(pdSymbol, inSymbol, token); pdAutomaton->push(pdSymbol); /* (2): read the next token */ token = lex->getNextToken(); inSymbol = (PD_SYMBOLS)token->type; break; case '<': /* three computational actions are performed: (1): replacement of pdSymbol with string consisting of pdSymbol followed by symbol < (2): push(inSymbol) (3): reading the next token */ /* (1): replacement of pdSymbol with a string consisting of pdSymbol followed by symbol < */ // preparing the string consisting of // pdSymbol followed by symbol < newTop.clear(); newTop.push_back(pdSymbol); pdSymbolIns = new T_PdSymbol; setSymbol(pdSymbolIns, LSHARP, NULL); newTop.push_back(pdSymbolIns); // replacing pdSymbol with the string prepared above pdAutomaton->switchSymbol(pdSymbol->symbol, newTop) /* (2): push(inSymbol) */ pdSymbolIns = new T_PdSymbol; setSymbol(pdSymbolIns, inSymbol, token); pdAutomaton->push(pdSymbolIns); /* (3): read the next token */ token = lex->getNextToken(); inSymbol = (PD_SYMBOLS)token->type; break; case '>': /* two computational actions are performed: (1): if the topmost occurrence of symbol < is followed by string y, make sure there exists a rule, r, with the right-hand side equal to y, and if this is the case, reduce <y to the the left-hand side of r on the pushdown top; (2): write out r */ /* (1): reduction */ // finding <y on the pushdown top stringToReduce.clear(); stringToReduce = pdAutomaton->getSymbolsToReduce(); // testing that a string starting with symbol < // occurs on the pushdown top if(!stringToReduce.empty()) { // placing y into handle T_PdString handle = stringToReduce; handle.erase(handle.begin()); // finding rule r with y on the right-hand side int res = findRightSide(handle); // r with y on the right-hand side is found if(res != -1) { // getting rule r of the form C -> y reduceRule = getRule(res); pdSymbolIns = new T_PdSymbol; setSymbol(pdSymbolIns, *(reduceRule.begin()), NULL); leftSide.clear(); leftSide.push_back(pdSymbolIns); //reducing y to C on the pushdown top pdAutomaton->switchTop(stringToReduce, leftSide); // postfixAction[] specifies the semantic action // associated with the rule according to which a reduction // is performed char c = postfixAction[res]; if(c != '_') { // some action prescribed if(c == '#') // integer value { // As described in Lex, 0 represents the zero value // while 1 represents any non-zero value in this // implementation. The next line converts the value // to the corresponding character that is concatenated // with the current postfix expression string. char cVal = (*(stringToReduce.back())).token->value+'0'; postfixExpr.push_back(cVal); } else if(c == 'i') { postfixExpr.push_back(postfixAction[res]); } else { postfixExpr.push_back(postfixAction[res]); } } } else // no rule has y on its right-hand side { cerr << "SYNTAX ERROR: the right-hand side of the rule is missing" << endl; return 0; } } else // no symbol < occurs in the pushdown { cerr << "SYNTAX ERROR: no symbol < occurs in the pushdown, so no handle is delimited" << endl; return 0; } break; default: cerr << "SYNTAX ERROR: table-detected error by a blank entry" << endl; return 0; break; } pdSymbolTmp = pdAutomaton->getTopmostTerm(); }while( !((pdSymbolTmp->symbol == BOTTOM) && (inSymbol == BOTTOM)) ); return 0; }
The whole compiler source code including common routines you can download here.