CHAPTER 4. SYNTAX ANALYSIS

    <head> : <body>1   { <semantic action>1 }
           | <body>2   { <semantic action>2 }
             ...
           | <body>n   { <semantic action>n }
           ;

In a Yacc production, unquoted strings of letters and digits not declared to
be tokens are taken to be nonterminals. A quoted single character, e.g. 'c',
is taken to be the terminal symbol c, as well as the integer code for the token
represented by that character (i.e., Lex would return the character code for
'c' to the parser, as an integer). Alternative bodies can be separated by a
vertical bar, and a semicolon follows each head with its alternatives and their
semantic actions. The first head is taken to be the start symbol.

A Yacc semantic action is a sequence of C statements. In a semantic action,
the symbol $$ refers to the attribute value associated with the nonterminal of
the head, while $i refers to the value associated with the ith grammar symbol
(terminal or nonterminal) of the body. The semantic action is performed
whenever we reduce by the associated production, so normally the semantic
action computes a value for $$ in terms of the $i's. In the Yacc specification,
we have written the two E-productions

    E → E + T | T

and their associated semantic actions as:

    expr : expr '+' term   { $$ = $1 + $3; }
         | term
         ;

Note that the nonterminal term in the first production is the third grammar
symbol of the body, while + is the second. The semantic action associated with
the first production adds the value of the expr and the term of the body and
assigns the result as the value for the nonterminal expr of the head. We have
omitted the semantic action for the second production altogether, since copying
the value is the default action for productions with a single grammar symbol
in the body. In general, { $$ = $1; } is the default semantic action.

Notice that we have added a new starting production

    line : expr '\n'   { printf("%d\n", $1); }

to the Yacc specification. This production says that an input to the desk
calculator is to be an expression followed by a newline character. The semantic
action associated with this production prints the decimal value of the
expression followed by a newline character.
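
The way $$ and the $i's map onto a parser's value stack can be sketched in plain C. This is only an illustration under stated assumptions: the ValueStack type and the functions push, reduce_add, and reduce_copy are hypothetical names invented here, and a real Yacc-generated parser organizes its stacks differently in detail.

```c
/* Hypothetical value stack: one attribute value per grammar symbol
 * currently on the parser stack. Not Yacc's actual data structure. */
#define STACK_MAX 64

typedef struct {
    int vals[STACK_MAX];
    int top;                 /* number of values currently on the stack */
} ValueStack;

void push(ValueStack *s, int v) { s->vals[s->top++] = v; }

/* Reduce by  expr : expr '+' term  { $$ = $1 + $3; }.
 * The body has three symbols, so $1, $2, $3 are the top three entries. */
void reduce_add(ValueStack *s) {
    int *body = &s->vals[s->top - 3];   /* body[0] is $1, body[2] is $3 */
    int result = body[0] + body[2];     /* $$ = $1 + $3 */
    s->top -= 3;                        /* pop the values of the body */
    push(s, result);                    /* push the value of the head */
}

/* Reduce by  expr : term  with the default action { $$ = $1; }.
 * One value is popped and the same value is pushed, so nothing moves. */
void reduce_copy(ValueStack *s) { (void)s; }
```

In this sketch the slot for '+' holds a placeholder value, since the action never reads $2.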
4.9. PARSER GENERATORS
The Supporting C-Routines Part
The third part of a Yacc specification consists of supporting C-routines. A
lexical analyzer by the name yylex() must be provided. Using Lex to produce
yylex() is a common choice; see Section 4.9.3. Other procedures such as error
recovery routines may be added as necessary.

The lexical analyzer yylex() produces tokens consisting of a token name
and its associated attribute value. If a token name such as DIGIT is returned,
the token name must be declared in the first section of the Yacc specification.
The attribute value associated with a token is communicated to the parser
through a Yacc-defined variable yylval.

The lexical analyzer in Fig. 4.58 is very crude. It reads input characters
one at a time using the C-function getchar(). If the character is a digit, the
value of the digit is stored in the variable yylval, and the token name DIGIT
is returned. Otherwise, the character itself is returned as the token name.
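
Since Fig. 4.58 itself is not reproduced here, the following is a minimal sketch of the lexer just described, not the figure's code. It reads from a string cursor instead of getchar() so that it can be exercised without standard input. The function name next_token, the numeric value chosen for DIGIT, and the int type of yylval are assumptions made for this illustration.

```c
#include <ctype.h>

enum { DIGIT = 257 };   /* assumed token code; Yacc normally assigns it */
int yylval;             /* attribute value handed to the parser (int assumed) */

/* Return the next token from *input and advance the cursor.
 * A digit yields DIGIT with its numeric value in yylval; any other
 * character is returned as its own token name. */
int next_token(const char **input) {
    int c = (unsigned char)**input;
    if (c == '\0')
        return 0;               /* 0 tells a Yacc parser the input has ended */
    (*input)++;
    if (isdigit(c)) {
        yylval = c - '0';       /* store the value of the digit */
        return DIGIT;
    }
    return c;                   /* the character itself is the token name */
}
```

A yylex() built on the same logic would call getchar() in place of the string cursor.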
4.9.2 Using Yacc with Ambiguous Grammars
Let us now modify the Yacc specification so that the resulting desk calculator
becomes more useful. First, we shall allow the desk calculator to evaluate a
sequence of expressions, one to a line. We shall also allow blank lines between
expressions. We do so by changing the first rule to

    lines : lines expr '\n'   { printf("%g\n", $2); }
          | lines '\n'
          | /* empty */
          ;

In Yacc, an empty alternative, as the third line is, denotes ε.

Second, we shall enlarge the class of expressions to include numbers instead
of single digits and to include the arithmetic operators +, - (both binary and
unary), *, and /. The easiest way to specify this class of expressions is to
use the ambiguous grammar

    E → E + E | E - E | E * E | E / E | - E | number

The resulting Yacc specification is shown in Fig. 4.59.

Since the grammar in the Yacc specification in Fig. 4.59 is ambiguous, the
LALR algorithm will generate parsing-action conflicts. Yacc reports the number
of parsing-action conflicts that are generated. A description of the sets of
items and the parsing-action conflicts can be obtained by invoking Yacc with a
-v option. This option generates an additional file y.output that contains the
kernels of the sets of items found for the grammar, a description of the
parsing-action conflicts generated by the LALR algorithm, and a readable
representation of the LR parsing table showing how the parsing-action conflicts
were resolved. Whenever Yacc reports that it has found parsing-action
conflicts, it

    %{
    #include <ctype.h>
    #include <stdio.h>
    #define YYSTYPE double   /* double type for Yacc stack */
    %}
    %token NUMBER

    %left '+' '-'
    %left '*' '/'
    %right UMINUS
    %%

    lines : lines expr '\n'   { printf("%g\n", $2); }
          | lines '\n'
          | /* empty */
          ;
    expr  : expr '+' expr     { $$ = $1 + $3; }
          | expr '-' expr     { $$ = $1 - $3; }
          | expr '*' expr     { $$ = $1 * $3; }
          | expr '/' expr     { $$ = $1 / $3; }
          | '(' expr ')'      { $$ = $2; }
          | '-' expr  %prec UMINUS   { $$ = - $2; }
          | NUMBER
          ;
    %%

    int yylex(void) {
        int c;
        /* skip blanks */
        while ( ( c = getchar() ) == ' ' );
        /* a digit or a decimal point starts a number */
        if ( (c == '.') || (isdigit(c)) ) {
            ungetc(c, stdin);
            scanf("%lf", &yylval);
            return NUMBER;
        }
        return c;
    }

Figure 4.59: Yacc specification for a more advanced desk calculator.

is wise to create and consult the file y.output to see why the parsing-action
conflicts were generated and to see whether they were resolved correctly.

Unless otherwise instructed, Yacc will resolve all parsing-action conflicts
using the following two rules:

1. A reduce/reduce conflict is resolved by choosing the conflicting production
   listed first in the Yacc specification.

2. A shift/reduce conflict is resolved in favor of shift. This rule resolves
   the shift/reduce conflict arising from the dangling-else ambiguity
   correctly.

Since these default rules may not always be what the compiler writer wants,
Yacc provides a general mechanism for resolving shift/reduce conflicts. In the
declarations portion, we can assign precedences and associativities to
terminals. The declaration

    %left '+' '-'

makes + and - be of the same precedence and be left associative. We can declare
an operator to be right associative by writing

    %right '^'

and we can force an operator to be a nonassociative binary operator (i.e., two
occurrences of the operator cannot be combined at all) by writing

    %nonassoc '<'

The tokens are given precedences in the order in which they appear in the
declarations part, lowest first. Tokens in the same declaration have the same
precedence. Thus, the declaration

    %right UMINUS

in Fig. 4.59 gives the token UMINUS a precedence level higher than that of the
four preceding terminals.

Yacc resolves shift/reduce conflicts by attaching a precedence and
associativity to each production involved in a conflict, as well as to each
terminal involved in a conflict. If it must choose between shifting input
symbol a and reducing by production A → α, Yacc reduces if the precedence of
the production is greater than that of a, or if the precedences are the same
and the associativity of the production is left. Otherwise, shift is the
chosen action.

Normally, the precedence of a production is taken to be the same as that of
its rightmost terminal. This is the sensible decision in most cases. For
example, given productions

    E → E + E | E * E

we would prefer to reduce by E → E + E with lookahead +, because the + in
the body has the same precedence as the lookahead, but is left associative.
With lookahead *, we would prefer to shift, because the lookahead has higher
precedence than the + in the production.
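
The shift/reduce rule stated above can be written as a small decision function. This is a sketch of the rule as the text gives it, not Yacc's source code. The names Assoc, Action, and resolve are hypothetical, precedence levels are assumed to be integers with larger meaning higher, and nonassociative operators are left out.

```c
typedef enum { ASSOC_LEFT, ASSOC_RIGHT } Assoc;
typedef enum { ACTION_SHIFT, ACTION_REDUCE } Action;

/* Choose between shifting the lookahead and reducing by a production.
 * prod_prec and prod_assoc describe the production (normally taken from
 * its rightmost terminal); lookahead_prec is the input symbol's level. */
Action resolve(int prod_prec, Assoc prod_assoc, int lookahead_prec) {
    if (prod_prec > lookahead_prec)
        return ACTION_REDUCE;       /* production binds tighter */
    if (prod_prec == lookahead_prec && prod_assoc == ASSOC_LEFT)
        return ACTION_REDUCE;       /* equal levels, left associative */
    return ACTION_SHIFT;            /* otherwise shift */
}
```

With + at level 1 and * at level 2, both left associative, resolve(1, ASSOC_LEFT, 1) reduces by E → E + E on lookahead +, while resolve(1, ASSOC_LEFT, 2) shifts on lookahead *, matching the example.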

In those situations where the rightmost terminal does not supply the proper
precedence to a production, we can force a precedence by appending to a
production the tag

    %prec <terminal>

The precedence and associativity of the production will then be the same as
that of the terminal, which presumably is defined in the declaration section.
Yacc does not report shift/reduce conflicts that are resolved using this
precedence and associativity mechanism.

This "terminal" can be a placeholder, like UMINUS in Fig. 4.59; this terminal
is not returned by the lexical analyzer, but is declared solely to define a
precedence for a production. In Fig. 4.59, the declaration

    %right UMINUS

assigns to the token UMINUS a precedence that is higher than that of * and /.
In the translation rules part, the tag:

    %prec UMINUS

at the end of the production

    expr : '-' expr

makes the unary-minus operator in this production have a higher precedence
than any other operator.
4.9.3 Creating Yacc Lexical Analyzers with Lex
Lex was designed to produce lexical analyzers that could be used with Yacc.
The Lex library ll will provide a driver program named yylex(), the name
required by Yacc for its lexical analyzer. If Lex is used to produce the
lexical analyzer, we replace the routine yylex() in the third part of the Yacc
specification by the statement

    #include "lex.yy.c"

and we have each Lex action return a terminal known to Yacc. By using the
#include "lex.yy.c" statement, the program yylex has access to Yacc's names
for tokens, since the Lex output file is compiled as part of the Yacc output
file y.tab.c.

Under the UNIX system, if the Lex specification is in the file first.l and
the Yacc specification in second.y, we can say

    lex first.l
    yacc second.y
    cc y.tab.c -ly -ll

to obtain the desired translator.

The Lex specification in Fig. 4.60 can be used in place of the lexical analyzer
in Fig. 4.59. The last pattern, meaning "any character," must be written \n|.
since the dot in Lex matches any character except newline.

    number    [0-9]+\.?|[0-9]*\.[0-9]+
    %%
    [ ]       { /* skip blanks */ }
    {number}  { sscanf(yytext, "%lf", &yylval);
                return NUMBER; }
    \n|.      { return yytext[0]; }

Figure 4.60: Lex specification for yylex() in Fig. 4.59

4.9.4 Error Recovery in Yacc
In Yacc, error recovery uses a form of error productions. First, the user
decides what "major" nonterminals will have error recovery associated with
them. Typical choices are some subset of the nonterminals generating
expressions, statements, blocks, and functions. The user then adds to the
grammar error productions of the form A → error α, where A is a major
nonterminal and α is a string of grammar symbols, perhaps the empty string;
error is a Yacc reserved word. Yacc will generate a parser from such a
specification, treating the error productions as ordinary productions.

However, when the parser generated by Yacc encounters an error, it treats
the states whose sets of items contain error productions in a special way. On
encountering an error, Yacc pops symbols from its stack until it finds the
topmost state on its stack whose underlying set of items includes an item of
the form A → · error α. The parser then "shifts" a fictitious token error onto
the stack, as though it saw the token error on its input.

When α is ε, a reduction to A occurs immediately and the semantic action
associated with the production A → error (which might be a user-specified
error-recovery routine) is invoked. The parser then discards input symbols
until it finds an input symbol on which normal parsing can proceed.

If α is not empty, Yacc skips ahead on the input looking for a substring
that can be reduced to α. If α consists entirely of terminals, then it looks
for this string of terminals on the input, and "reduces" them by shifting them
onto the stack. At this point, the parser will have error α on top of its
stack. The parser will then reduce error α to A, and resume normal parsing.

For example, an error production of the form

    %{
    #include <ctype.h>
    #include <stdio.h>
    #define YYSTYPE double   /* double type for Yacc stack */
    %}
    %token NUMBER

    %left '+' '-'
    %left '*' '/'
    %right UMINUS
    %%

    lines : lines expr '\n'   { printf("%g\n", $2); }
          | lines '\n'
          | /* empty */
          | error '\n'        { yyerror("reenter previous line:");
                                yyerrok; }
          ;
    expr  : expr '+' expr     { $$ = $1 + $3; }
          | expr '-' expr     { $$ = $1 - $3; }
          | expr '*' expr     { $$ = $1 * $3; }
          | expr '/' expr     { $$ = $1 / $3; }
          | '(' expr ')'      { $$ = $2; }
          | '-' expr  %prec UMINUS   { $$ = - $2; }
          | NUMBER
          ;

Figure 4.61: Desk calculator with error recovery


    stmt → error ;

would specify to the parser that it should skip just beyond the next semicolon
on seeing an error, and assume that a statement had been found. The semantic
routine for this error production would not need to manipulate the input, but
could generate a diagnostic message and set a flag to inhibit generation of
object code, for example.

Example 4.70: Figure 4.61 shows the Yacc desk calculator of Fig. 4.59 with
the error production

    lines : error '\n'

This error production causes the desk calculator to suspend normal parsing
when a syntax error is found on an input line. On encountering the error,
the parser in the desk calculator starts popping symbols from its stack until it
encounters a state that has a shift action on the token error. State 0 is such
a state (in this example, it's the only such state), since its items include

    lines → · error '\n'

Also, state 0 is always on the bottom of the stack. The parser shifts the token
error onto the stack, and then proceeds to skip ahead in the input until it has
found a newline character. At this point the parser shifts the newline onto the
stack, reduces error '\n' to lines, and emits the diagnostic message "reenter
previous line:". The special Yacc routine yyerrok resets the parser to its
normal mode of operation.
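
The two mechanical steps of this recovery, popping to a state that can shift error and then skipping input through the next newline, can be sketched in C. This only illustrates the idea and does not reflect how a Yacc-generated parser is actually coded. The function names, the has_error_shift table, and the state numbers used in the usage note are all hypothetical.

```c
#include <stddef.h>

/* Pop states until the top state has a shift action on the token error.
 * stack[0] is the bottom state; has_error_shift[s] is nonzero when state s
 * can shift error. Returns the new stack height. The bottom state is never
 * popped; a real parser would give up if it had no such action either. */
size_t pop_to_error_state(const int *stack, size_t height,
                          const int *has_error_shift) {
    while (height > 1 && !has_error_shift[stack[height - 1]])
        height--;
    return height;
}

/* Skip ahead in the input through the next newline, as required by
 * lines : error '\n'. Returns a pointer to the first character after the
 * newline, or to the terminating '\0' if no newline remains. */
const char *skip_line(const char *input) {
    while (*input != '\0' && *input != '\n')
        input++;
    if (*input == '\n')
        input++;
    return input;
}
```

For a stack holding states 0, 3, 7 where only state 0 can shift error, pop_to_error_state leaves just state 0, and skip_line applied to a bad line followed by a good one returns the start of the good line.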
4.9.5 Exercises for Section 4.9

! Exercise 4.9.1: Write a Yacc program that takes boolean expressions as input
[as given by the grammar of Exercise 4.2.2(g)] and produces the truth value of
the expressions.

! Exercise 4.9.2: Write a Yacc program that takes lists (as defined by the
grammar of Exercise 4.2.2(e), but with any single character as an element, not
just a) and produces as output a linear representation of the same list; i.e.,
a single list of the elements, in the same order that they appear in the input.

! Exercise 4.9.3: Write a Yacc program that tells whether its input is a
palindrome (sequence of characters that read the same forward and backward).

!! Exercise 4.9.4: Write a Yacc program that takes regular expressions (as
defined by the grammar of Exercise 4.2.2(d), but with any single character as
an argument, not just a) and produces as output a transition table for a
nondeterministic finite automaton recognizing the same language.

4.10 Summary of Chapter 4
+ Parsers. A parser takes as input tokens from the lexical analyzer and
  treats the token names as terminal symbols of a context-free grammar.
  The parser then constructs a parse tree for its input sequence of tokens;
  the parse tree may be constructed figuratively (by going through the
  corresponding derivation steps) or literally.

+ Context-Free Grammars. A grammar specifies a set of terminal symbols
  (inputs), another set of nonterminals (symbols representing syntactic
  constructs), and a set of productions, each of which gives a way in which
  strings represented by one nonterminal can be constructed from terminal
  symbols and strings represented by certain other nonterminals. A
  production consists of a head (the nonterminal to be replaced) and a body
  (the replacing string of grammar symbols).

+ Derivations. The process of starting with the start-nonterminal of a
  grammar and successively replacing it by the body of one of its productions
  is called a derivation. If the leftmost (or rightmost) nonterminal is always
  replaced, then the derivation is called leftmost (respectively, rightmost).

+ Parse Trees. A parse tree is a picture of a derivation, in which there is
  a node for each nonterminal that appears in the derivation. The children
  of a node are the symbols by which that nonterminal is replaced in the
  derivation. There is a one-to-one correspondence between parse trees,
  leftmost derivations, and rightmost derivations of the same terminal string.

+ Ambiguity. A grammar for which some terminal string has two or more
  different parse trees, or equivalently two or more leftmost derivations or
  two or more rightmost derivations, is said to be ambiguous. In most cases
  of practical interest, it is possible to redesign an ambiguous grammar so
  it becomes an unambiguous grammar for the same language. However,
  ambiguous grammars with certain tricks applied sometimes lead to more
  efficient parsers.

+ Top-Down and Bottom-Up Parsing. Parsers are generally distinguished
  by whether they work top-down (start with the grammar's start symbol
  and construct the parse tree from the top) or bottom-up (start with the
  terminal symbols that form the leaves of the parse tree and build the
  tree from the bottom). Top-down parsers include recursive-descent and
  LL parsers, while the most common forms of bottom-up parsers are LR
  parsers.

+ Design of Grammars. Grammars suitable for top-down parsing often are
  harder to design than those used by bottom-up parsers. It is necessary
  to eliminate left-recursion, a situation where one nonterminal derives a
  string that begins with the same nonterminal. We also must left-factor,
  that is, group productions for the same nonterminal that have a common
  prefix in the body.

+ Recursive-Descent Parsers. These parsers use a procedure for each
  nonterminal. The procedure looks at its input and decides which production
  to apply for its nonterminal. Terminals in the body of the production are
  matched to the input at the appropriate time, while nonterminals in the
  body result in calls to their procedure. Backtracking, in the case when
  the wrong production was chosen, is a possibility.

+ LL(1) Parsers. A grammar such that it is possible to choose the correct
  production with which to expand a given nonterminal, looking only at
  the next input symbol, is called LL(1). These grammars allow us to
  construct a predictive parsing table that gives, for each nonterminal and
  each lookahead symbol, the correct choice of production. Error correction
  can be facilitated by placing error routines in some or all of the table
  entries that have no legitimate production.

+ Shift-Reduce Parsing. Bottom-up parsers generally operate by choosing,
  on the basis of the next input symbol (lookahead symbol) and the contents
  of the stack, whether to shift the next input onto the stack, or to reduce
  some symbols at the top of the stack. A reduce step takes a production
  body at the top of the stack and replaces it by the head of the production.

+ Viable Prefixes. In shift-reduce parsing, the stack contents are always a
  viable prefix, that is, a prefix of some right-sentential form that ends
  no further right than the end of the handle of that right-sentential form.
  The handle is the substring that was introduced in the last step of the
  rightmost derivation of that sentential form.

+ Valid Items. An item is a production with a dot somewhere in the body.
  An item is valid for a viable prefix if the production of that item is used
  to generate the handle, and the viable prefix includes all those symbols
  to the left of the dot, but not those below.

+ LR Parsers. Each of the several kinds of LR parsers operates by first
  constructing the sets of valid items (called LR states) for all possible
  viable prefixes, and keeping track of the state for each prefix on the stack.
  The set of valid items guides the shift-reduce parsing decision. We prefer
  to reduce if there is a valid item with the dot at the right end of the body,
  and we prefer to shift the lookahead symbol onto the stack if that symbol
  appears immediately to the right of the dot in some valid item.

+ Simple LR Parsers. In an SLR parser, we perform a reduction implied by
  a valid item with a dot at the right end, provided the lookahead symbol
  can follow the head of that production in some sentential form. The
  grammar is SLR, and this method can be applied, if there are no
  parsing-action conflicts; that is, for no set of items, and for no lookahead
  symbol, are there two productions to reduce by, nor is there the option to
  reduce or to shift.

+ Canonical-LR Parsers. This more complex form of LR parser uses items
  that are augmented by the set of lookahead symbols that can follow the use
  of the underlying production. Reductions are only chosen when there is a
  valid item with the dot at the right end, and the current lookahead symbol
  is one of those allowed for this item. A canonical-LR parser can avoid some
  of the parsing-action conflicts that are present in SLR parsers, but often
  has many more states than the SLR parser for the same grammar.

+ Lookahead-LR Parsers. LALR parsers offer many of the advantages of
  SLR and Canonical-LR parsers, by combining the states that have the
  same kernels (sets of items, ignoring the associated lookahead sets). Thus,
  the number of states is the same as that of the SLR parser, but some
  parsing-action conflicts present in the SLR parser may be removed in
  the LALR parser. LALR parsers have become the method of choice in
  practice.

+ Bottom-Up Parsing of Ambiguous Grammars. In many important situations,
  such as parsing arithmetic expressions, we can use an ambiguous
  grammar, and exploit side information such as the precedence of operators
  to resolve conflicts between shifting and reducing, or between reduction by
  two different productions. Thus, LR parsing techniques extend to many
  ambiguous grammars.

+ Yacc. The parser-generator Yacc takes a (possibly) ambiguous grammar
  and conflict-resolution information and constructs the LALR states. It
  then produces a function that uses these states to perform a bottom-up
  parse and call an associated function each time a reduction is performed.

4.11 References for Chapter 4
The context-free grammar formalism originated with Chomsky [5], as part of
a study on natural language. The idea also was used in the syntax description
of two early languages: Fortran by Backus [2] and Algol 60 by Naur [26]. The
scholar Panini devised an equivalent syntactic notation to specify the rules of
Sanskrit grammar between 400 B.C. and 200 B.C. [19].

The phenomenon of ambiguity was observed first by Cantor [4] and Floyd
[13]. Chomsky Normal Form (Exercise 4.4.8) is from [6]. The theory of
context-free grammars is summarized in [17].

Recursive-descent parsing was the method of choice for early compilers,
such as [16], and compiler-writing systems, such as META [28] and TMG [25].
LL grammars were introduced by Lewis and Stearns [24]. Exercise 4.4.5, the
linear-time simulation of recursive-descent, is from [3].

One of the earliest parsing techniques, due to Floyd [14], involved the
precedence of operators. The idea was generalized to parts of the language
that do not involve operators by Wirth and Weber [29]. These techniques are
rarely used today, but can be seen as leading in a chain of improvements to LR
parsing.

LR parsers were introduced by Knuth [22], and the canonical-LR parsing
tables originated there. This approach was not considered practical, because the
parsing tables were larger than the main memories of typical computers of the
day, until Korenjak [23] gave a method for producing reasonably sized parsing
tables for typical programming languages. DeRemer developed the LALR [8]
and SLR [9] methods that are in use today. The construction of LR parsing
tables for ambiguous grammars came from [1] and [12].

Johnson's Yacc very quickly demonstrated the practicality of generating
parsers with an LALR parser generator for production compilers. The manual
for the Yacc parser generator is found in [20]. The open-source version, Bison,
is described in [10]. A similar LALR-based parser generator called CUP [18]
supports actions written in Java. Top-down parser generators include Antlr
[27], a recursive-descent parser generator that accepts actions in C++, Java, or
C#, and LLGen [15], which is an LL(1)-based generator.

Dain [7] gives a bibliography on syntax-error handling.

The general-purpose dynamic-programming parsing algorithm described in
Exercise 4.4.9 was invented independently by J. Cocke (unpublished), by
Younger [30], and by Kasami [21]; hence the "CYK algorithm." There is a more
complex, general-purpose algorithm due to Earley [11] that tabulates LR-items
for each substring of the given input; this algorithm, while also O(n^3) in
general, is only O(n^2) on unambiguous grammars.

1. Aho, A. V., S. C. Johnson, and J. D. Ullman, "Deterministic parsing of
   ambiguous grammars," Comm. ACM 18:8 (Aug., 1975), pp. 441-452.

2. Backus, J. W., "The syntax and semantics of the proposed international
   algebraic language of the Zurich-ACM-GAMM Conference," Proc. Intl.
   Conf. Information Processing, UNESCO, Paris (1959), pp. 125-132.

3. Birman, A. and J. D. Ullman, "Parsing algorithms with backtrack,"
   Information and Control 23:1 (1973), pp. 1-34.

4. Cantor, D. C., "On the ambiguity problem of Backus systems," J. ACM
   9:4 (1962), pp. 477-479.

5. Chomsky, N., "Three models for the description of language," IRE Trans.
   on Information Theory IT-2:3 (1956), pp. 113-124.

6. Chomsky, N., "On certain formal properties of grammars," Information
   and Control 2:2 (1959), pp. 137-167.

7. Dain, J., "Bibliography on Syntax Error Handling in Language Translation
   Systems," 1991. Available from the comp.compilers newsgroup; see

8. DeRemer, F., "Practical Translators for LR(k) Languages," Ph.D. thesis,
   MIT, Cambridge, MA, 1969.

9. DeRemer, F., "Simple LR(k) grammars," Comm. ACM 14:7 (July, 1971),
   pp. 453-460.

10. Donnelly, C. and R. Stallman, "Bison: The YACC-compatible Parser
    Generator," http://www.gnu.org/software/bison/manual/ .

11. Earley, J., "An efficient context-free parsing algorithm," Comm. ACM
    13:2 (Feb., 1970), pp. 94-102.

12. Earley, J., "Ambiguity and precedence in syntax description," Acta
    Informatica 4:2 (1975), pp. 183-192.

13. Floyd, R. W., "On ambiguity in phrase-structure languages," Comm.
    ACM 5:10 (Oct., 1962), pp. 526-534.

14. Floyd, R. W., "Syntactic analysis and operator precedence," J. ACM
    10:3 (1963), pp. 316-333.

15. Grune, D. and C. J. H. Jacobs, "A programmer-friendly LL(1) parser
    generator," Software Practice and Experience 18:1 (Jan., 1988),
    pp. 29-38. See also http://www.cs.vu.nl/~ceriel/LLgen.html .

16. Hoare, C. A. R., "Report on the Elliott Algol translator," Computer J.
    5:2 (1962), pp. 127-129.

17. Hopcroft, J. E., R. Motwani, and J. D. Ullman, Introduction to Automata
    Theory, Languages, and Computation, Addison-Wesley, Boston MA, 2001.

18. Hudson, S. E. et al., "CUP LALR Parser Generator in Java," Available
    at

19. Ingerman, P. Z., "Panini-Backus form suggested," Comm. ACM 10:3
    (March 1967), p. 137.

20. Johnson, S. C., "Yacc: Yet Another Compiler Compiler," Computing
    Science Technical Report 32, Bell Laboratories, Murray Hill, NJ, 1975.
    Available at http://dinosaur.compilertools.net/yacc/ .

21. Kasami, T., "An efficient recognition and syntax analysis algorithm for
    context-free languages," AFCRL-65-758, Air Force Cambridge Research
    Laboratory, Bedford, MA, 1965.

22. Knuth, D. E., "On the translation of languages from left to right,"
    Information and Control 8:6 (1965), pp. 607-639.

23. Korenjak, A. J., "A practical method for constructing LR(k) processors,"
    Comm. ACM 12:11 (Nov., 1969), pp. 613-623.

24. Lewis, P. M. II and R. E. Stearns, "Syntax-directed transduction," J.
    ACM 15:3 (1968), pp. 465-488.

25. McClure, R. M., "TMG: a syntax-directed compiler," Proc. 20th ACM
    Natl. Conf. (1965), pp. 262-274.

26. Naur, P. et al., "Report on the algorithmic language ALGOL 60," Comm.
    ACM 3:5 (May, 1960), pp. 299-314. See also Comm. ACM 6:1 (Jan.,
    1963), pp. 1-17.

27. Parr, T., "ANTLR," http://www.antlr.org/ .

28. Schorre, D. V., "Meta-II: a syntax-oriented compiler writing language,"
    Proc. 19th ACM Natl. Conf. (1964), pp. D1.3-1-D1.3-11.

29. Wirth, N. and H. Weber, "Euler: a generalization of Algol and its formal
    definition: Part I," Comm. ACM 9:1 (Jan., 1966), pp. 13-23.

30. Younger, D. H., "Recognition and parsing of context-free languages in
    time n^3," Information and Control 10:2 (1967), pp. 189-208.

Chapter 5
Syntax-Directed Translation

This chapter develops the theme of Section 2.3: the translation of languages
guided by context-free grammars. The translation techniques in this chapter
will be applied in Chapter 6 to type checking and intermediate-code generation.
The techniques are also useful for implementing little languages for specialized
tasks; this chapter includes an example from typesetting.

We associate information with a language construct by attaching attributes
to the grammar symbol(s) representing the construct, as discussed in
Section 2.3.2. A syntax-directed definition specifies the values of attributes
by associating semantic rules with the grammar productions. For example, an
infix-to-postfix translator might have a production and rule

    PRODUCTION        SEMANTIC RULE
    E → E1 + T        E.code = E1.code || T.code || '+'        (5.1)

This production has two nonterminals, E and T; the subscript in E1
distinguishes the occurrence of E in the production body from the occurrence
of E as the head. Both E and T have a string-valued attribute code. The
semantic rule specifies that the string E.code is formed by concatenating
E1.code, T.code, and the character '+'. While the rule makes it explicit that
the translation of E is built up from the translations of E1, T, and '+', it
may be inefficient to implement the translation directly by manipulating
strings.
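
A direct string implementation of the rule E.code = E1.code || T.code || '+' might look like the sketch below. The function name concat_plus is hypothetical and the caller is assumed to free the result. Each call copies both operand strings in full, which is the inefficiency noted above.

```c
#include <stdlib.h>
#include <string.h>

/* Build E.code from E1.code and T.code by concatenation, then append '+'.
 * Returns a newly allocated string, or NULL if allocation fails. */
char *concat_plus(const char *e1_code, const char *t_code) {
    size_t n1 = strlen(e1_code);
    size_t n2 = strlen(t_code);
    char *code = malloc(n1 + n2 + 2);   /* operands, '+', and '\0' */
    if (code == NULL)
        return NULL;
    memcpy(code, e1_code, n1);          /* E1.code */
    memcpy(code + n1, t_code, n2);      /* followed by T.code */
    code[n1 + n2] = '+';                /* followed by the character '+' */
    code[n1 + n2 + 1] = '\0';
    return code;
}
```

Applying the function twice, first to "9" and "5" and then to that result and "2", yields the postfix string for 9+5+2.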

From Section 2.3.5, a syntax-directed translation scheme embeds program
fragments called semantic actions within production bodies, as in

    E → E1 + T   { print '+' }        (5.2)

By convention, semantic actions are enclosed within curly braces. (If curly
braces occur as grammar symbols, we enclose them within single quotes, as in
'{' and '}'.) The position of a semantic action in a production body determines
the order in which the action is executed. In production (5.2), the action
occurs at the end, after all the grammar symbols; in general, semantic actions
may occur at any position in a production body.

Between the two notations, syntax-directed definitions can be more readable,
and hence more useful for specifications. However, translation schemes can be
more efficient, and hence more useful for implementations.
The most general approach to syntax-directed translation is to construct a
parse tree or a syntax tree, and then to compute the values of attributes at the
nodes of the tree by visiting the nodes of the tree. In many cases, translation
can be done during parsing, without building an explicit tree. We shall therefore
study a class of syntax-directed translations called "L-attributed translations"
(L for left-to-right), which encompass virtually all translations that can be
performed during parsing. We also study a smaller class, called "S-attributed
translations" (S for synthesized), which can be performed easily in connection
with a bottom-up parse.
5.1 Syntax-Directed Definitions
A syntax-directed definition (SDD) is a context-free grammar together with
attributes and rules. Attributes are associated with grammar symbols and rules
are associated with productions. If X is a symbol and a is one of its
attributes, then we write X.a to denote the value of a at a particular
parse-tree node labeled X. If we implement the nodes of the parse tree by
records or objects, then the attributes of X can be implemented by data fields
in the records that represent the nodes for X.
Attributes may be of any kind: numbers, types, table
references, or strings, for instance. The strings may even be long sequences of
code, say code in the intermediate language used by a compiler.
5.1.1 Inherited and Synthesized Attributes
We shall deal with two kinds of attributes for nonterminals:
1. A synthesized attribute for a nonterminal A at a parse-tree node N is
   defined by a semantic rule associated with the production at N. Note
   that the production must have A as its head. A synthesized attribute at
   node N is defined only in terms of attribute values at the children of N
   and at N itself.
2. An inherited attribute for a nonterminal B at a parse-tree node N is
   defined by a semantic rule associated with the production at the parent
   of N. Note that the production must have B as a symbol in its body. An
   inherited attribute at node N is defined only in terms of attribute values
   at N's parent, N itself, and N's siblings.
An Alternative Definition of Inherited Attributes
No additional translations are enabled if we allow an inherited attribute
B.c at a node N to be defined in terms of attribute values at the children
of N, as well as at N itself, at its parent, and at its siblings. Such rules can
be "simulated" by creating additional attributes of B, say B.c1, B.c2, ....
These are synthesized attributes that copy the needed attributes of the
children of the node labeled B. We then compute B.c as an inherited
attribute, using the attributes B.c1, B.c2, ... in place of attributes at the
children. Such attributes are rarely needed in practice.
While we do not allow an inherited attribute at node N to be defined in terms of
attribute values at the children of node N, we do allow a synthesized attribute
at node N to be defined in terms of inherited attribute values at node N
itself.
Terminals can have synthesized attributes, but not inherited attributes. At-
tributes for terminals have lexical values that are supplied by the lexical ana-
lyzer; there are no semantic rules in the SDD itself for computing the value of
an attribute for a terminal.
Example 5.1: The SDD in Fig. 5.1 is based on our familiar grammar for
arithmetic expressions with operators + and *. It evaluates expressions
terminated by an endmarker n. In the SDD, each of the nonterminals has a single
synthesized attribute, called val. We also suppose that the terminal digit has
a synthesized attribute lexval, which is an integer value returned by the
lexical analyzer.
     PRODUCTION        SEMANTIC RULES
1)   L → E n           L.val = E.val
2)   E → E1 + T        E.val = E1.val + T.val
3)   E → T             E.val = T.val
4)   T → T1 * F        T.val = T1.val × F.val
5)   T → F             T.val = F.val
6)   F → ( E )         F.val = E.val
7)   F → digit         F.val = digit.lexval

Figure 5.1: Syntax-directed definition of a simple desk calculator

The rule for production 1, L → E n, sets L.val to E.val, which we shall see
is the numerical value of the entire expression.
Production 2, E → E1 + T, also has one rule, which computes the val
attribute for the head E as the sum of the values at E1 and T. At any
parse-tree node N labeled E, the value of val for E is the sum of the values of
val at the children of node N labeled E and T.
Production 3, E → T, has a single rule that defines the value of val for E
to be the same as the value of val at the child for T. Production 4 is similar
to the second production; its rule multiplies the values at the children
instead of adding them. The rules for productions 5 and 6 copy values at a
child, like that for the third production. Production 7 gives F.val the value
of a digit, that is, the numerical value of the token digit that the lexical
analyzer returned.

An SDD that involves only synthesized attributes is called S-attributed; the
SDD in Fig. 5.1 has this property. In an S-attributed SDD, each rule computes
an attribute for the nonterminal at the head of a production from attributes
taken from the body of the production.
For simplicity, the examples in this section have semantic rules without
side effects. In practice, it is convenient to allow SDD's to have limited side
effects, such as printing the result computed by a desk calculator or
interacting with a symbol table. Once the order of evaluation of attributes is
discussed in Section 5.2, we shall allow semantic rules to compute arbitrary
functions, possibly involving side effects.
An S-attributed SDD can be implemented naturally in conjunction with an
LR parser. In fact, the SDD in Fig. 5.1 mirrors the Yacc program of Fig. 4.58,
which illustrates translation during LR parsing. The difference is that, in the
rule for production 1, the Yacc program prints the value E.val as a side effect,
instead of defining the attribute L.val.
An SDD without side effects is sometimes called an attribute grammar. The
rules in an attribute grammar define the value of an attribute purely in terms
of the values of other attributes and constants.
5.1.2 Evaluating an SDD at the Nodes of a Parse Tree
To visualize the translation specified by an SDD, it helps to work with parse
trees, even though a translator need not actually build a parse tree. Imagine
therefore that the rules of an SDD are applied by first constructing a parse tree
and then using the rules to evaluate all of the attributes at each of the nodes
of the parse tree. A parse tree, showing the value(s) of its attribute(s) is
called an annotated parse tree.
How do we construct an annotated parse tree? In what order do we evaluate
attributes? Before we can evaluate an attribute at a node of a parse tree, we
must evaluate all the attributes upon which its value depends. For example,
if all attributes are synthesized, as in Example 5.1, then we must evaluate the
val attributes at all of the children of a node before we can evaluate the val
attribute at the node itself.
With synthesized attributes, we can evaluate attributes in any bottom-up
order, such as that of a postorder traversal of the parse tree; the evaluation of
S-attributed definitions is discussed in Section 5.2.3.
For SDD's with both inherited and synthesized attributes, there is no
guarantee that there is even one order in which to evaluate attributes at nodes.
For instance, consider nonterminals A and B, with synthesized and inherited
attributes A.s and B.i, respectively, along with the production and rules

    PRODUCTION        SEMANTIC RULES
    A → B             A.s = B.i
                      B.i = A.s + 1

These rules are circular; it is impossible to evaluate either A.s at a node N
or B.i at the child of N without first evaluating the other. The circular
dependency of A.s and B.i at some pair of nodes in a parse tree is suggested by
Fig. 5.2.

Figure 5.2: The circular dependency of A.s and B.i on one another
It is computationally difficult to determine whether or not there exist any
circularities in any of the parse trees that a given SDD could have to translate.¹
Fortunately, there are useful subclasses of SDD's that are sufficient to
guarantee that an order of evaluation exists, as we shall see in Section 5.2.
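For a single parse tree, circularity is easy to observe: no evaluation order exists exactly when the dependencies among attribute instances contain a cycle. The sketch below assumes the two rules of the circular example have the form A.s = B.i and B.i = A.s + 1, and uses Python's standard `graphlib` module; the helper name `evaluation_order` is hypothetical. It checks one dependency graph only, not the much harder question of whether some parse tree of an SDD is circular.

```python
from graphlib import TopologicalSorter, CycleError

# Each key depends on the attribute instances in its value set.
deps = {
    'A.s': {'B.i'},   # A.s = B.i
    'B.i': {'A.s'},   # B.i = A.s + 1
}

def evaluation_order(dependencies):
    """Return an evaluation order, or None if the dependencies are circular."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError:
        return None

circular = evaluation_order(deps)
# A contrasting, purely synthesized pair of attributes has an order.
acyclic = evaluation_order({'A.s': {'B.s'}, 'B.s': set()})
```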
Example 5.2: Figure 5.3 shows an annotated parse tree for the input string
3 * 5 + 4 n, constructed using the grammar and rules of Fig. 5.1. The values
of lexval are presumed supplied by the lexical analyzer. Each of the nodes for
the nonterminals has attribute val computed in a bottom-up order, and we see
the resulting values associated with each node. For instance, at the node with
a child labeled *, after computing T.val = 3 and F.val = 5 at its first and
third children, we apply the rule that says T.val is the product of these two
values, or 15.
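The annotation for 3 * 5 + 4 n can be reproduced with a short Python sketch. The `Node` record, its `attrs` dictionary, and the helper names are assumptions made for illustration; attributes are stored as data fields of the node records, and val is computed after the children have been visited, following the rules of Fig. 5.1.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                                   # grammar symbol
    children: list = field(default_factory=list)
    attrs: dict = field(default_factory=dict)    # attribute name -> value

VALUED = ('L', 'E', 'T', 'F', 'digit')           # symbols that carry a value

def digit(n):
    return Node('digit', attrs={'lexval': n})    # lexval comes from the lexer

def annotate(node):
    """Compute val at every interior node, children first."""
    if not node.children:                        # terminals have no rules
        return
    for child in node.children:
        annotate(child)
    operands = [c for c in node.children if c.label in VALUED]
    if len(operands) == 2:                       # productions 2 and 4
        a, b = operands[0].attrs['val'], operands[1].attrs['val']
        op = node.children[1].label
        node.attrs['val'] = a + b if op == '+' else a * b
    elif operands[0].label == 'digit':           # production 7
        node.attrs['val'] = operands[0].attrs['lexval']
    else:                                        # productions 1, 3, 5, 6 copy
        node.attrs['val'] = operands[0].attrs['val']

# Parse tree for 3 * 5 + 4 n, built by hand.
t_3 = Node('T', [Node('F', [digit(3)])])
t_15 = Node('T', [t_3, Node('*'), Node('F', [digit(5)])])
t_4 = Node('T', [Node('F', [digit(4)])])
e_19 = Node('E', [Node('E', [t_15]), Node('+'), t_4])
root = Node('L', [e_19, Node('n')])
annotate(root)
```

After the call, `root.attrs['val']` holds the value of the whole expression and `t_15.attrs['val']` holds the product discussed in the example.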
Inherited attributes are useful when the structure of a parse tree does not
"match" the abstract syntax of the source code. The next example shows how
inherited attributes can be used to overcome such a mismatch due to a grammar
designed for parsing rather than translation.
¹ Without going into details, while the problem is decidable, it cannot be solved by a
polynomial-time algorithm, even if P = NP, since it has exponential time complexity.
Figure 5.3: Annotated parse tree for 3 * 5 + 4 n
Example 5.3: The SDD in Fig. 5.4 computes terms like 3 * 5 and 3 * 5 * 7.
The top-down parse of input 3 * 5 begins with the production T → F T'.
Here, F generates the digit 3, but the operator * is generated by T'. Thus,
the left operand 3 appears in a different subtree of the parse tree from *.
An inherited attribute will therefore be used to pass the operand to the
operator.
The grammar in this example is an excerpt from a non-left-recursive version
of the familiar expression grammar; we used such a grammar as a running
example to illustrate top-down parsing in Section 4.4.
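     PRODUCTION         SEMANTIC RULES
1)   T → F T'           T'.inh = F.val
                        T.val = T'.syn
2)   T' → * F T1'       T1'.inh = T'.inh × F.val
                        T'.syn = T1'.syn
3)   T' → ε             T'.syn = T'.inh
4)   F → digit          F.val = digit.lexval

Figure 5.4: An SDD based on a grammar suitable for top-down parsing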
Each of the nonterminals T and F has a synthesized attribute val; the
terminal digit has a synthesized attribute lexval. The nonterminal T' has two
attributes: an inherited attribute inh and a synthesized attribute syn.
The semantic rules are based on the idea that the left operand of the operator
* is inherited. More precisely, the head T' of the production T' → * F T1'
inherits the left operand of * in the production body. Given a term x * y * z,
the root of the subtree for * y * z inherits x. Then, the root of the subtree
for * z inherits the value of x * y, and so on, if there are more factors in
the term. Once all the factors have been accumulated, the result is passed back
up the tree using synthesized attributes.
To see how the semantic rules are used, consider the annotated parse tree
for 3 * 5 in Fig. 5.5. The leftmost leaf in the parse tree, labeled digit, has
attribute value lexval = 3, where the 3 is supplied by the lexical analyzer. Its
parent is for production 4, F → digit. The only semantic rule associated with
this production defines F.val = digit.lexval, which equals 3.
Figure 5.5: Annotated parse tree for 3 * 5
At the second child of the root, the inherited attribute T'.inh is defined by
the semantic rule T'.inh = F.val associated with production 1. Thus, the left
operand, 3, for the * operator is passed from left to right across the children
of the root.
The production at the node for T' is T' → * F T1'. (We retain the subscript 1
in the annotated parse tree to distinguish between the two nodes for T'.) The
inherited attribute T1'.inh is defined by the semantic rule
T1'.inh = T'.inh × F.val associated with production 2.
With T'.inh = 3 and F.val = 5, we get T1'.inh = 15. At the lower node for T1',
the production is T' → ε. The semantic rule T'.syn = T'.inh defines
T1'.syn = 15. The syn attributes at the nodes for T' pass the value 15 up the
tree to the node for T, where T.val = 15.
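The flow of inh down the tree and syn back up can be imitated with two mutually dependent functions, one per nonterminal. This is a sketch under assumptions: the term is supplied as a list of digit values with the * operators left implicit, and the names `term` and `t_prime` are hypothetical. The inherited attribute becomes a parameter and the synthesized attribute becomes the return value.

```python
def t_prime(inh, operands):
    """T' with inherited attribute inh; the return value plays the role of syn."""
    if not operands:                       # T' -> epsilon: T'.syn = T'.inh
        return inh
    f_val = operands[0]                    # T' -> * F T1', with F -> digit
    # T1'.inh = T'.inh * F.val, and T'.syn = T1'.syn
    return t_prime(inh * f_val, operands[1:])

def term(operands):
    """T -> F T': returns T.val for a term given as a list of digit lexvals."""
    f_val = operands[0]                    # F -> digit: F.val = digit.lexval
    return t_prime(f_val, operands[1:])    # T'.inh = F.val; T.val = T'.syn

val_3_5 = term([3, 5])
val_3_5_7 = term([3, 5, 7])
```

For 3 * 5 the call chain mirrors Fig. 5.5: `t_prime(3, [5])` calls `t_prime(15, [])`, which returns 15 unchanged.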
5.1.3 Exercises for Section 5.1
Exercise 5.1.1: For the SDD of Fig. 5.1, give annotated parse trees for the
following expressions:
Exercise 5.1.2: Extend the SDD of Fig. 5.4 to handle expressions as in
Fig. 5.1.
Exercise 5.1.3: Repeat Exercise 5.1.1, using your SDD from Exercise 5.1.2.
5.2 Evaluation Orders for SDD's

"Dependency graphs" are a useful tool for determining an evaluation order for
the attribute instances in a given parse tree. While an annotated parse tree
shows the values of attributes, a dependency graph helps us determine how
those values can be computed.
In this section, in addition to dependency graphs, we define two important
classes of SDD's: the "S-attributed" and the more general "L-attributed"
SDD's. The translations specified by these two classes fit well with the parsing
methods we have studied, and most translations encountered in practice can be
written to conform to the requirements of at least one of these classes.
5.2.1 Dependency Graphs
A dependency graph depicts the flow of information among the attribute
instances in a particular parse tree; an edge from one attribute instance to
another means that the value of the first is needed to compute the second. Edges
express constraints implied by the semantic rules. In more detail:
• For each parse-tree node, say a node labeled by grammar symbol X, the
  dependency graph has a node for each attribute associated with X.
• Suppose that a semantic rule associated with a production p defines the
  value of synthesized attribute A.b in terms of the value of X.c (the rule
  may define A.b in terms of other attributes in addition to X.c). Then,
  the dependency graph has an edge from X.c to A.b. More precisely, at
  every node N labeled A where production p is applied, create an edge to
  attribute b at N, from the attribute c at the child of N corresponding to
  this instance of the symbol X in the body of the production.²
• Suppose that a semantic rule associated with a production p defines the
  value of inherited attribute B.c in terms of the value of X.a. Then, the
  dependency graph has an edge from X.a to B.c. For each node N labeled B
  that corresponds to an occurrence of this B in the body of production p,
  create an edge to attribute c at N from the attribute a at the node M
  that corresponds to this occurrence of X. Note that M could be either
  the parent or a sibling of N.

² Since a node N can have several children labeled X, we again assume that
subscripts distinguish among uses of the same symbol at different places in
the production.
Example 5.4: Consider the following production and rule:

    PRODUCTION        SEMANTIC RULE
    E → E1 + T        E.val = E1.val + T.val

At every node N labeled E, with children corresponding to the body of this
production, the synthesized attribute val at N is computed using the values of
val at the two children, labeled E and T. Thus, a portion of the dependency
graph for every parse tree in which this production is used looks like Fig. 5.6.
As a convention, we shall show the parse tree edges as dotted lines, while the
edges of the dependency graph are solid.
Figure 5.6: E.val is synthesized from E1.val and T.val
Example 5.5: An example of a complete dependency graph appears in Fig. 5.7.
The nodes of the dependency graph, represented by the numbers 1 through 9,
correspond to the attributes in the annotated parse tree in Fig. 5.5.
Figure 5.7: Dependency graph for the annotated parse tree of Fig. 5.5
Nodes 1 and 2 represent the attribute lexval associated with the two leaves
labeled digit. Nodes 3 and 4 represent the attribute val associated with the
two nodes labeled F. The edges to node 3 from 1 and to node 4 from 2 result
from the semantic rule that defines F.val in terms of digit.lexval. In fact,
F.val equals digit.lexval, but the edge represents dependence, not equality.
Nodes 5 and 6 represent the inherited attribute T'.inh associated with each
of the occurrences of nonterminal T'. The edge to 5 from 3 is due to the rule
T'.inh = F.val, which defines T'.inh at the right child of the root from F.val
at the left child. We see edges to 6 from node 5 for T'.inh and from node 4
for F.val, because these values are multiplied to evaluate the attribute inh at
node 6.
Nodes 7 and 8 represent the synthesized attribute syn associated with the
occurrences of T'. The edge to node 7 from 6 is due to the semantic rule
T'.syn = T'.inh associated with production 3 in Fig. 5.4. The edge to node 8
from 7 is due to a semantic rule associated with production 2.
Finally, node 9 represents the attribute T.val. The edge to 9 from 8 is due
to the semantic rule, T.val = T'.syn, associated with production 1.
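The edges described in this example can also be derived mechanically from the semantic rules. The sketch below is one possible encoding, not a prescribed one: each rule is written as a target (position, attribute) plus the list of sources it reads, where position 0 is the head and positions 1, 2, ... are the body symbols. The node names such as `F1` and `T1'` and the table `RULES` are assumptions made for illustration.

```python
# Semantic rules of Fig. 5.4 as dependencies: (target, [sources]).
RULES = {
    "T -> F T'":    [((2, 'inh'), [(1, 'val')]),               # T'.inh = F.val
                     ((0, 'val'), [(2, 'syn')])],              # T.val = T'.syn
    "T' -> * F T'": [((3, 'inh'), [(0, 'inh'), (2, 'val')]),   # T1'.inh = T'.inh x F.val
                     ((0, 'syn'), [(3, 'syn')])],              # T'.syn = T1'.syn
    "T' -> eps":    [((0, 'syn'), [(0, 'inh')])],              # T'.syn = T'.inh
    "F -> digit":   [((0, 'val'), [(1, 'lexval')])],           # F.val = digit.lexval
}

def edges(tree):
    """Collect dependency edges (source, target) over a whole parse tree."""
    name, production, kids = tree
    if production is None:                 # a terminal leaf applies no rules
        return []
    at = [name] + [kid[0] for kid in kids]  # position -> node name
    found = []
    for (t_pos, t_attr), sources in RULES[production]:
        for s_pos, s_attr in sources:
            found.append((f"{at[s_pos]}.{s_attr}", f"{at[t_pos]}.{t_attr}"))
    for kid in kids:
        found.extend(edges(kid))
    return found

# Parse tree for 3 * 5: (node name, production applied, children).
tree = ("T", "T -> F T'", [
    ("F1", "F -> digit", [("digit1", None, [])]),
    ("T'", "T' -> * F T'", [
        ("*", None, []),
        ("F2", "F -> digit", [("digit2", None, [])]),
        ("T1'", "T' -> eps", []),
    ]),
])
graph_edges = edges(tree)
```

The result contains eight edges, one for each edge described in the text, for example the pair from `F2.val` to `T1'.inh` that corresponds to the edge to node 6 from node 4.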
5.2.2 Ordering the Evaluation of Attributes
The dependency graph characterizes the possible orders in which we can
evaluate the attributes at the various nodes of a parse tree. If the dependency
graph has an edge from node M to node N, then the attribute corresponding to M
must be evaluated before the attribute of N. Thus, the only allowable orders
of evaluation are those sequences of nodes N1, N2, ..., Nk such that if there is
an edge of the dependency graph from Ni to Nj, then i < j. Such an ordering
embeds a directed graph into a linear order, and is called a topological sort of
the graph.
If there is any cycle in the graph, then there are no topological sorts; that is,
there is no way to evaluate the SDD on this parse tree. If there are no cycles,
however, then there is always at least one topological sort. To see why, since
there are no cycles, we can surely find a node with no edge entering. For if there
were no such node, we could proceed from predecessor to predecessor until we
came back to some node we had already seen, yielding a cycle. Make this node
the first in the topological order, remove it from the dependency graph, and
repeat the process on the remaining nodes.
Example 5.6: The dependency graph of Fig. 5.7 has no cycles. One topological
sort is the order in which the nodes have already been numbered: 1, 2, ..., 9.
Notice that every edge of the graph goes from a node to a higher-numbered node,
so this order is surely a topological sort. There are other topological sorts as
well, such as 1, 3, 5, 2, 4, 6, 7, 8, 9.
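The argument for the existence of a topological sort is itself an algorithm: pick a node with no entering edge, remove it, and repeat. The Python sketch below follows that argument on the eight edges of Fig. 5.7 as they are described in Example 5.5. The function names are hypothetical, and ties are broken by taking the smallest node number, which is an arbitrary choice.

```python
# Edges of Fig. 5.7, written (from, to).
EDGES = [(1, 3), (2, 4), (3, 5), (5, 6), (4, 6), (6, 7), (7, 8), (8, 9)]

def topological_sort(nodes, edges):
    """Repeatedly take a node with no entering edge and remove it."""
    remaining = set(nodes)
    live = set(edges)
    order = []
    while remaining:
        targets = {dst for _, dst in live}
        free = sorted(n for n in remaining if n not in targets)
        if not free:
            raise ValueError("cycle: no topological sort exists")
        first = free[0]
        order.append(first)
        remaining.remove(first)
        live = {(s, d) for s, d in live if s != first}   # drop its edges
    return order

def is_topological(order, edges):
    """Check that every edge goes from an earlier to a later position."""
    position = {n: i for i, n in enumerate(order)}
    return all(position[s] < position[d] for s, d in edges)

order = topological_sort(range(1, 10), EDGES)
other_ok = is_topological([1, 3, 5, 2, 4, 6, 7, 8, 9], EDGES)
```

With this tie-breaking rule the computed order is the numbering 1 through 9, and the alternative order from the example also passes the check.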
5.2.3 S-Attributed Definitions
As mentioned earlier, given an SDD, it is very hard to tell whether there exist
any parse trees whose dependency graphs have cycles. In practice, translations
can be implemented using classes of SDD's that guarantee an evaluation order,
since they do not permit dependency graphs with cycles. Moreover, the two
classes introduced in this section can be implemented efficiently in connection
with top-down or bottom-up parsing.

The first class is defined as follows:
• An SDD is S-attributed if every attribute is synthesized.
Example 5.7: The SDD of Fig. 5.1 is an example of an S-attributed definition.
Each attribute, L.val, E.val, T.val, and F.val is synthesized.
When an SDD is S-attributed, we can evaluate its attributes in any bottom-up
order of the nodes of the parse tree. It is often especially simple to evaluate
the attributes by performing a postorder traversal of the parse tree and
evaluating the attributes at a node N when the traversal leaves N for the last
time. That is, we apply the function postorder, defined below, to the root of
the parse tree (see also the box "Preorder and Postorder Traversals" in
Section 2.3.4):
    postorder(N) {
        for ( each child C of N, from the left ) postorder(C);
        evaluate the attributes associated with node N;
    }

S-attributed definitions can be implemented during bottom-up parsing, since
a bottom-up parse corresponds to a postorder traversal. Specifically, postorder
corresponds exactly to the order in which an LR parser reduces a production
body to its head. This fact will be used in Section 5.4.2 to evaluate synthesized
attributes and store them on the stack during LR parsing, without creating the
tree nodes explicitly.
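The idea of keeping synthesized attributes on a stack can be sketched without a real LR parser. In the Python below, the sequence of shift and reduce moves for 3 * 5 + 4 n is written out by hand, which is an assumption standing in for the parser's decisions; the table `PRODUCTIONS` and the function `run` are hypothetical names. Each reduction pops one value per body symbol, applies the rule of Fig. 5.1, and pushes the value for the head, so no tree nodes are ever built.

```python
# Body length and semantic rule for each production of Fig. 5.1.
PRODUCTIONS = {
    1: (2, lambda v: v[0]),          # L -> E n
    2: (3, lambda v: v[0] + v[2]),   # E -> E1 + T
    3: (1, lambda v: v[0]),          # E -> T
    4: (3, lambda v: v[0] * v[2]),   # T -> T1 * F
    5: (1, lambda v: v[0]),          # T -> F
    6: (3, lambda v: v[1]),          # F -> ( E )
    7: (1, lambda v: v[0]),          # F -> digit
}

def run(actions):
    """Maintain a value stack in step with the given parser moves."""
    stack = []
    for kind, arg in actions:
        if kind == 'shift':
            stack.append(arg)        # lexval for digit, None otherwise
        else:
            length, rule = PRODUCTIONS[arg]
            values = stack[-length:]
            del stack[-length:]
            stack.append(rule(values))
    return stack

# Hand-written moves for the input 3 * 5 + 4 n.
ACTIONS = [
    ('shift', 3), ('reduce', 7), ('reduce', 5),
    ('shift', None),                                  # *
    ('shift', 5), ('reduce', 7), ('reduce', 4), ('reduce', 3),
    ('shift', None),                                  # +
    ('shift', 4), ('reduce', 7), ('reduce', 5), ('reduce', 2),
    ('shift', None),                                  # n
    ('reduce', 1),
]
final_stack = run(ACTIONS)
```

The reductions occur in the same order in which a postorder traversal would leave the corresponding tree nodes, which is why the stack always holds exactly the values a rule needs.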
5.2.4 L-Attributed Definitions
The second class of SDD's is called L-attributed definitions. The idea behind
this class is that, between the attributes associated with a production body,
dependency-graph edges can go from left to right, but not from right to left
(hence "L-attributed"). More precisely, each attribute must be either
1. Synthesized, or
2. Inherited, but with the rules limited as follows. Suppose that there is
   a production A → X1 X2 ··· Xn, and that there is an inherited attribute
   Xi.a computed by a rule associated with this production. Then the rule
   may use only:
   (a) Inherited attributes associated with the head A.
   (b) Either inherited or synthesized attributes associated with the
       occurrences of symbols X1, X2, ..., Xi-1 located to the left of Xi.
   (c) Inherited or synthesized attributes associated with this occurrence
       of Xi itself, but only in such a way that there are no cycles in a
       dependency graph formed by the attributes of this Xi.
Example 5.8: The SDD in Fig. 5.4 is L-attributed. To see why, consider the
semantic rules for inherited attributes, which are repeated here for convenience:

    PRODUCTION         SEMANTIC RULE
    T → F T'           T'.inh = F.val
    T' → * F T1'       T1'.inh = T'.inh × F.val

The first of these rules defines the inherited attribute T'.inh using only
F.val, and F appears to the left of T' in the production body, as required. The
second rule defines T1'.inh using the inherited attribute T'.inh associated with
the head, and F.val, where F appears to the left of T1' in the production body.
In each of these cases, the rules use information "from above or from the
left," as required by the class. The remaining attributes are synthesized.
Hence, the SDD is L-attributed.
Example 5.9: Any SDD containing the following production and rules cannot
be L-attributed:

    PRODUCTION        SEMANTIC RULES
    A → B C           A.s = B.b
                      B.i = f(C.c, A.s)

The first rule, A.s = B.b, is a legitimate rule in either an S-attributed or
L-attributed SDD. It defines a synthesized attribute A.s in terms of an
attribute at a child (that is, a symbol within the production body).
The second rule defines an inherited attribute B.i, so the entire SDD cannot
be S-attributed. Further, although the rule is legal, the SDD cannot be
L-attributed, because the attribute C.c is used to help define B.i, and C is to
the right of B in the production body. While attributes at siblings in a parse
tree may be used in L-attributed SDD's, they must be to the left of the symbol
whose attribute is being defined.
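The restriction on inherited attributes can be checked rule by rule. The Python sketch below tests conditions (a) and (b) of the definition, and admits attributes of Xi itself without performing the cycle check that condition (c) also requires. The function name, the use of the string 'head' for the head symbol, and the argument layout are all assumptions made for illustration.

```python
def is_l_attributed_rule(body, target, uses, head_inherited):
    """Check one rule that defines an inherited attribute of a body symbol.

    body:           names of the symbol occurrences in the production body
    target:         (symbol, attribute) being defined; symbol is in body
    uses:           (symbol, attribute) pairs read by the rule;
                    the name 'head' refers to the head of the production
    head_inherited: attribute names that are inherited at the head
    """
    i = body.index(target[0])
    for symbol, attr in uses:
        if symbol == 'head':
            if attr not in head_inherited:      # condition (a)
                return False
        elif symbol not in body[:i + 1]:        # condition (b), plus Xi itself
            return False
    return True

# T -> F T' with T'.inh = F.val
rule_1 = is_l_attributed_rule(['F', "T'"], ("T'", 'inh'),
                              [('F', 'val')], set())
# T' -> * F T1' with T1'.inh = T'.inh x F.val
rule_2 = is_l_attributed_rule(['*', 'F', "T1'"], ("T1'", 'inh'),
                              [('head', 'inh'), ('F', 'val')], {'inh'})
# A -> B C with B.i = f(C.c, A.s)
rule_bad = is_l_attributed_rule(['B', 'C'], ('B', 'i'),
                                [('C', 'c'), ('head', 's')], set())
```

The two rules of Fig. 5.4 pass, while the rule for B.i fails because it reads C.c, and C lies to the right of B.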
5.2.5 Semantic Rules with Controlled Side Effects
In practice, translations involve side effects: a desk calculator might print a
result; a code generator might enter the type of an identifier into a symbol table.
With SDD's, we strike a balance between attribute grammars and translation
schemes. Attribute grammars have no side effects and allow any evaluation
order consistent with the dependency graph. Translation schemes impose
left-to-right evaluation and allow semantic actions to contain any program
fragment; translation schemes are discussed in Section 5.4.
We shall control side effects in SDD's in one of the following ways:
