Chapter 3
Describing Syntax
and Semantics
ISBN 0-321-33025-0
Chapter 3 Topics
• Introduction
• The General Problem of Describing Syntax
• Formal Methods of Describing Syntax
• Attribute Grammars
• Describing the Meanings of Programs:
Dynamic Semantics
Copyright © 2006 Addison-Wesley. All rights reserved. 1-2
Introduction
• A language may be hard to learn and hard to
implement, and any ambiguity in the
specification may lead to dialect differences, if
we do not have a clear language definition
• Most new programming languages are subjected
to a period of scrutiny by potential users before
their designs are completed
• Who must use language definitions?
– Other language designers
– Implementors
– Programmers (the users of the language)
Introduction (cont.)
• The study of programming languages can be
divided into examinations of syntax and
semantics
– Syntax - the form or structure of the expressions,
statements, and program units
– Semantics - the meaning of the expressions,
statements, and program units
• Semantics should follow from syntax: the form
of statements should be clear and imply what
the statements do or how they should be used
Example
• Syntax Example: Simple C if statement
if (<expr>)
<true-statement>
else
<false-statement>
• Semantics Example:
If the expression evaluates to true, execute
the true statement; otherwise, execute the false
statement
The General Problem of Describing
Syntax
• A sentence is a string of characters over some
alphabet
• A language is a set of sentences
• A lexeme is the lowest level syntactic unit of a
language (e.g., *, +, =, sum, begin)
• A token is a category of lexemes (e.g.,
identifier)
• Languages can be formally defined in two
distinct ways: by recognition and by generation
Example
index = 2 * count + 10;
Lexeme    Token
index     identifier
=         equal_sign
2         int_literal
*         mult_op
count     identifier
+         plus_op
10        int_literal
;         semicolon
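The lexeme/token pairing in this table can be sketched with a small tokenizer. This is only an illustration: the token names come from the table, but the regular expressions and the function name `tokenize` are assumptions made for this sketch, not part of any particular compiler.

```python
import re

# Token names follow the table; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("equal_sign", r"="),
    ("mult_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (lexeme, token) pairs; raise ValueError on an unknown character."""
    pairs = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace separates lexemes but is not itself a lexeme
            continue
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        # lastgroup is the name of the alternative that matched, i.e. the token
        pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs


if __name__ == "__main__":
    for lexeme, token in tokenize("index = 2 * count + 10;"):
        print(f"{lexeme:<8}{token}")
```

Note that `index` and `count` are different lexemes that fall into the same token category, which is the distinction the table illustrates.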
Language Recognizers
• A recognition device reads input strings over
the alphabet of the language and decides whether the input
strings belong to the language
• Example: syntax analysis part of a compiler
Language Generators
• A device that generates sentences of a
language
• One can determine if the syntax of a
particular sentence is correct by comparing
it to the structure of the generator
Language Recognizers vs. Generators
• The language recognizer can only be used
in trial-and-error mode (black box)
• The structure of the language generator is
an open book which people can easily read
and understand
Formal Methods of Describing Syntax
• This section discusses the formal language
generation mechanisms that are commonly
used to describe the syntax of programming
languages
• These mechanisms are often called
grammars
• We will discuss the class of languages called
context-free languages
Context-Free Grammars
• Noam Chomsky
– A linguist
– Described four classes of grammars in the mid-1950s
• Two of these grammar classes, context-free
and regular grammars, are useful in
computer science
– The tokens of programming languages can be
described by regular grammars
– Whole programming languages, with minor
exceptions, can be described by context-free
grammars
Backus-Naur Form
• Invented by John Backus to describe Algol 58
• BNF is a metalanguage
–A metalanguage is a language used to describe
another language
• BNF is equivalent to context-free grammars
• Extended BNF (EBNF) improves readability
and writability of BNF
• In BNF, abstractions (also called nonterminal
symbols) are used to represent
syntactic structures
Backus-Naur Form (cont.)
• Although BNF is simple, it is sufficiently
powerful to describe the great majority of
the syntax of programming languages:
– Lists of similar constructs
– The order in which different constructs must
appear
– Nested structures to any depth
– Operator precedence
– Operator associativity
– …
Grammar and Rules
• A grammar is a finite nonempty set of rules
• A rule has a left-hand side (LHS) and a right-
hand side (RHS), and consists of terminal
(lexeme or token) and nonterminal symbols
<assign> -> <var> = <expression>
• An abstraction (or nonterminal symbol) can
have more than one RHS
<if_stmt> -> if <logic_expr> then <stmt> |
if <logic_expr> then <stmt> else <stmt>
Describing Lists
• Syntactic lists (for example, a list of
identifiers appearing on a data declaration
statement) are described using recursion
• A rule is recursive if its LHS appears in its
RHS
<ident_list> -> identifier | identifier , <ident_list>
Note: Comma ‘,’ is a terminal
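The recursion in the `<ident_list>` rule maps directly onto a recursive function. The sketch below assumes the input has already been split into a list of token strings and uses Python's `str.isidentifier` as a stand-in for the `identifier` token; both choices are assumptions for illustration.

```python
def is_ident_list(tokens):
    """Recognize <ident_list> -> identifier | identifier , <ident_list>."""
    # Both alternatives start with an identifier.
    if not tokens or not tokens[0].isidentifier():
        return False
    # First alternative: a single identifier.
    if len(tokens) == 1:
        return True
    # Second alternative: identifier , <ident_list> (the recursive case).
    return tokens[1] == "," and is_ident_list(tokens[2:])


if __name__ == "__main__":
    print(is_ident_list(["sum", ",", "count", ",", "total"]))
    print(is_ident_list(["sum", ","]))
```

Each recursive call consumes one identifier and one comma, so a list of any length is handled by the same two alternatives.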
Example: A grammar of expressions
• Seven terminal symbols:
+ - * ( ) x y
• Four non-terminal symbols:
‹expr› ‹term› ‹factor› ‹var›
• Start/goal symbol:
‹expr›
• Rules of grammar
‹expr› -> ‹term› | ‹expr› + ‹term› | ‹expr› - ‹term›
‹term› -> ‹factor› | ‹term› * ‹factor›
‹factor› -> ‹var› | ( ‹expr› )
‹var› -> x | y
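One way to check the counts on this slide is to write the rules down as data and compute the symbol sets from them. The dictionary layout and the helper names below are assumptions chosen for this sketch.

```python
# Each nonterminal maps to its list of alternative right-hand sides.
GRAMMAR = {
    "expr": [["term"], ["expr", "+", "term"], ["expr", "-", "term"]],
    "term": [["factor"], ["term", "*", "factor"]],
    "factor": [["var"], ["(", "expr", ")"]],
    "var": [["x"], ["y"]],
}
START = "expr"


def nonterminals(grammar):
    """Nonterminals are exactly the symbols that have rules (appear as an LHS)."""
    return set(grammar)


def terminals(grammar):
    """Terminals are RHS symbols that never appear as an LHS."""
    rhs_symbols = {sym for alts in grammar.values() for rhs in alts for sym in rhs}
    return rhs_symbols - set(grammar)


if __name__ == "__main__":
    print(sorted(nonterminals(GRAMMAR)))
    print(sorted(terminals(GRAMMAR)))
```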
Derivations
•A derivation is a repeated application of
rules, starting with the start symbol and
ending with a sentence (all terminal
symbols)
• Derivations can be used to generate all the
possible sentences of the language defined by a grammar
• Every string of symbols in the derivation is a
sentential form
Derivations (cont.)
• A sentence is a sentential form that has only
terminal symbols
• A leftmost derivation is one in which the
leftmost nonterminal in each sentential form
is the one that is expanded
• A derivation may be neither leftmost nor
rightmost
• Derivation order should have no effect on the
language generated by a grammar
Example: a grammar and left derivation
<sentence> -> <noun-phrase> <verb-phrase> .
<noun-phrase> -> <article> <noun>
<article> -> a | the
<noun> -> girl | dog
<verb-phrase> -> <verb> <noun-phrase>
<verb> -> sees | pets

<sentence> => <noun-phrase> <verb-phrase> .
=> <article> <noun> <verb-phrase> .
=> the <noun> <verb-phrase> .
=> the girl <verb-phrase> .
=> the girl <verb> <noun-phrase> .
=> the girl sees <noun-phrase> .
=> the girl sees <article> <noun> .
=> the girl sees a <noun> .
=> the girl sees a dog .
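A leftmost derivation can be reproduced mechanically: at every step, find the leftmost nonterminal and replace it with one of its right-hand sides. The sketch below is a minimal illustration; the dictionary layout, the function name `leftmost_derivation`, and the idea of passing the chosen alternatives as a list of indices are all assumptions of this example.

```python
GRAMMAR = {
    "<sentence>": [["<noun-phrase>", "<verb-phrase>", "."]],
    "<noun-phrase>": [["<article>", "<noun>"]],
    "<article>": [["a"], ["the"]],
    "<noun>": [["girl"], ["dog"]],
    "<verb-phrase>": [["<verb>", "<noun-phrase>"]],
    "<verb>": [["sees"], ["pets"]],
}


def leftmost_derivation(start, choices):
    """Return every sentential form produced by expanding the leftmost nonterminal.

    choices[i] is the index of the alternative used at step i.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost nonterminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form[i:i + 1] = GRAMMAR[form[i]][choice]
        steps.append(" ".join(form))
    return steps


if __name__ == "__main__":
    # Alternative indices that reproduce the derivation of "the girl sees a dog ."
    for line in leftmost_derivation("<sentence>", [0, 0, 1, 0, 0, 0, 0, 0, 1]):
        print(line)
```

Changing the choice list (for example, picking `pets` instead of `sees`) yields a different sentence of the same language, which is the sense in which a grammar is a generator.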
Context free?
• In the previous example you might wonder
about the idea of context
• In a context-free grammar, a nonterminal can be
replaced using any of its rules regardless of the
symbols around it; there is no context in which
a replacement cannot occur
– The dog pets the girl
– The girl pets the dog
• If there were certain contexts in which
a rule could not be applied, the grammar would
not be “context-free”
Example: A left derivation of expression:
( x - y ) * x + y
‹expr›
=> ‹expr› + ‹term›
=> ‹term› + ‹term›
=> ‹term› * ‹factor› + ‹term›
=> ‹factor› * ‹factor› + ‹term›
=> ( ‹expr› ) * ‹factor› + ‹term›
=> ( ‹expr› - ‹term› ) * ‹factor› + ‹term›
=> ( ‹term› - ‹term› ) * ‹factor› + ‹term›
=> ( ‹factor› - ‹term› ) * ‹factor› + ‹term›
=> ( ‹var› - ‹term› ) * ‹factor› + ‹term›
=> ( x - ‹term› ) * ‹factor› + ‹term›
=> ( x - ‹factor› ) * ‹factor› + ‹term›
=> ( x - ‹var› ) * ‹factor› + ‹term›
=> ( x - y ) * ‹factor› + ‹term›
=> ( x - y ) * ‹var› + ‹term›
=> ( x - y ) * x + ‹term›
=> ( x - y ) * x + ‹factor›
=> ( x - y ) * x + ‹var›
=> ( x - y ) * x + y
Example: A grammar for a small language
<program> -> begin <stmt_list> end
<stmt_list> -> <stmt> | <stmt> ; <stmt_list>
<stmt> -> <var> = <expression>
<var> -> A | B | C
<expression> -> <var> + <var> |
<var> - <var> |
<var>
Example: A left derivation of grammar
<program> => begin <stmt_list> end
=> begin <stmt> ; <stmt_list> end
=> begin <var> = <expression> ; <stmt_list> end
=> begin A = <expression> ; <stmt_list> end
=> begin A = <var> + <var> ; <stmt_list> end
=> begin A = B + <var> ; <stmt_list> end
=> begin A = B + C ; <stmt_list> end
=> begin A = B + C ; <stmt> end
=> begin A = B + C ; <var> = <expression> end
=> begin A = B + C ; B = <expression> end
=> begin A = B + C ; B = <var> end
=> begin A = B + C ; B = C end
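The same grammar can also be used the other way around, as a recognizer that decides whether a string belongs to the language. The following recursive-descent sketch has one function per nonterminal. It assumes tokens are separated by whitespace and that the minus operator is written as an ASCII `-`; the function name `recognize` is hypothetical.

```python
VARS = {"A", "B", "C"}


def recognize(source):
    """Return True if the whitespace-separated token string is a <program>."""
    tokens = source.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(symbol):
        nonlocal pos
        if peek() != symbol:
            raise SyntaxError(f"expected {symbol!r}, found {peek()!r}")
        pos += 1

    def var():  # <var> -> A | B | C
        nonlocal pos
        if peek() not in VARS:
            raise SyntaxError(f"expected a variable, found {peek()!r}")
        pos += 1

    def expression():  # <expression> -> <var> + <var> | <var> - <var> | <var>
        nonlocal pos
        var()
        if peek() in ("+", "-"):
            pos += 1
            var()

    def stmt():  # <stmt> -> <var> = <expression>
        var()
        expect("=")
        expression()

    def stmt_list():  # <stmt_list> -> <stmt> | <stmt> ; <stmt_list>
        stmt()
        if peek() == ";":
            expect(";")
            stmt_list()

    try:
        expect("begin")  # <program> -> begin <stmt_list> end
        stmt_list()
        expect("end")
        return pos == len(tokens)
    except SyntaxError:
        return False


if __name__ == "__main__":
    print(recognize("begin A = B + C ; B = C end"))
    print(recognize("begin A = B + end"))
```

The recognizer only answers yes or no; it does not show how the sentence was derived, which is the "black box" limitation noted earlier.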
Parse Tree
• Grammars naturally describe the hierarchical
syntactic structure of the sentences of the
languages they define
• These hierarchical structures are called parse
trees
– Every internal node of a parse tree is labeled with a
nonterminal symbol
– Every leaf is labeled with a terminal symbol
– Every subtree of a parse tree describes one instance of
an abstraction in the statement
Example: Parse tree of sentence
“the girl sees a dog .”
sentence
  noun-phrase
    article
      the
    noun
      girl
  verb-phrase
    verb
      sees
    noun-phrase
      article
        a
      noun
        dog
  .
Example: A Parse Tree
Rules of grammar:
<assign> -> <id> = <expr>
<expr> -> <id> + <expr> |
<id> * <expr> |
<id> |
( <expr> )
<id> -> A | B | C
Parse tree (for the sentence A = B * ( A + C )):
<assign>
  <id>
    A
  =
  <expr>
    <id>
      B
    *
    <expr>
      (
      <expr>
        <id>
          A
        +
        <expr>
          <id>
            C
      )
Ambiguity
• Two different derivations can lead to the same
parse tree; this is fine, because a different
derivation order alone does not make a grammar ambiguous
• Given a grammar:
<expr> -> <expr> + <expr> |
<expr> * <expr> |
( <expr> ) |
<number>
<number> -> <number> <digit> | <digit>
<digit> -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
… given 234, we have different derivations …
Rightmost derivation:
number
=> number digit
=> number 4
=> number digit 4
=> number 3 4
=> digit 3 4
=> 2 3 4
Leftmost derivation:
number
=> number digit
=> number digit digit
=> digit digit digit
=> 2 digit digit
=> 2 3 digit
=> 2 3 4
… but the parse tree is the same in either case …
number
  number
    number
      digit
        2
    digit
      3
  digit
    4
Ambiguity (cont.)
• A grammar is ambiguous if and only if it
generates a sentential form that has two or
more distinct parse trees
• Ambiguity should be avoided
Two distinct parse trees for sentence
A = B + C * A
Tree 1:
<assign>
  <id>
    A
  =
  <expr>
    <expr>
      <id>
        B
    +
    <expr>
      <expr>
        <id>
          C
      *
      <expr>
        <id>
          A
Tree 2:
<assign>
  <id>
    A
  =
  <expr>
    <expr>
      <expr>
        <id>
          B
      +
      <expr>
        <id>
          C
    *
    <expr>
      <id>
        A
and their meaning
Tree 1: A = B + (C * A)
Tree 2: A = (B + C) * A
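The claim that this sentence has two parse trees can be checked by counting. The sketch below counts the parse trees that the ambiguous rule `<expr> -> <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>` allows for a token span, trying every operator as the root. The function names and the memoized span-counting approach are assumptions of this sketch, not a general parsing algorithm.

```python
from functools import lru_cache

IDS = {"A", "B", "C"}


def count_expr_trees(tokens):
    """Count parse trees deriving the token list from <expr> in the ambiguous grammar."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct parse trees for tokens[i:j].
        if j <= i:
            return 0
        if j - i == 1:
            return 1 if tokens[i] in IDS else 0  # <expr> -> <id>
        total = 0
        if tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)  # <expr> -> ( <expr> )
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # <expr> -> <expr> op <expr> with the operator at position k as root
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


def count_assign_trees(source):
    """Count parse trees for <assign> -> <id> = <expr> on a space-separated string."""
    tokens = source.split()
    if len(tokens) < 3 or tokens[0] not in IDS or tokens[1] != "=":
        return 0
    return count_expr_trees(tokens[2:])


if __name__ == "__main__":
    print(count_assign_trees("A = B + C * A"))
```

A count greater than one for any sentence is enough to show that the grammar is ambiguous.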
Operator Precedence
• An operator in an arithmetic expression
which is generated lower in the parse tree
can be used to indicate that it has
precedence over an operator produced
higher up in the tree
• Although the grammar of the earlier parse tree
example (<expr> -> <id> + <expr> | <id> * <expr> | …) is not
ambiguous, the precedence order of its
operators is not the usual one
Removing Ambiguity
• An unambiguous grammar for expressions
<assign> -> <id> = <expr>
<id> -> A | B | C
<expr> -> <expr> + <term> | <term>
<term> -> <term> * <factor> | <factor>
<factor> -> ( <expr> ) | <id>
Example: Derivation and parse tree of
sentence A = B + C * A
<assign> => <id> = <expr>
=> A = <expr>
=> A = <expr> + <term>
=> A = <term> + <term>
=> A = <factor> + <term>
=> A = <id> + <term>
=> A = B + <term>
=> A = B + <term> * <factor>
=> A = B + <factor> * <factor>
=> A = B + <id> * <factor>
=> A = B + C * <factor>
=> A = B + C * <id>
=> A = B + C * A
Parse tree:
<assign>
  <id>
    A
  =
  <expr>
    <expr>
      <term>
        <factor>
          <id>
            B
    +
    <term>
      <term>
        <factor>
          <id>
            C
      *
      <factor>
        <id>
          A
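The effect of the `<expr>` / `<term>` / `<factor>` layering can be seen in a small parser that builds a nested tuple for each operator. This is a sketch under stated assumptions: tokens are whitespace-separated, the left recursive rules are implemented with loops (a common recursive-descent technique, swapped in here because a direct recursive call on a left recursive rule would not terminate), and the name `parse_assign` is hypothetical.

```python
IDS = {"A", "B", "C"}


def parse_assign(source):
    """Parse <assign> -> <id> = <expr> and return a nested tuple showing the grouping."""
    tokens = source.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():  # <factor> -> ( <expr> ) | <id>
        tok = peek()
        if tok == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if tok in IDS:
            advance()
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():  # <term> -> <term> * <factor> | <factor>
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():  # <expr> -> <expr> + <term> | <term>
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = peek()
    if target not in IDS:
        raise SyntaxError(f"expected an identifier, found {target!r}")
    advance()
    if peek() != "=":
        raise SyntaxError("expected '='")
    advance()
    tree = ("=", target, expr())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


if __name__ == "__main__":
    print(parse_assign("A = B + C * A"))
```

Because `*` is only reachable through `<term>`, it always ends up lower in the tree than `+`, which is how the grammar encodes its higher precedence.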
Associativity of Operators
• Depending on how its rule is written, addition becomes
left or right associative. This
isn’t so bad with addition, but associativity can
be a problem with other operators, e.g.,
subtraction and division
• When a BNF rule has its LHS also appearing at
the beginning of its RHS, the rule is said to be
left recursive
• We can fix this problem with the following rules:
– Left recursive rules become left associative
– Right recursive rules become right associative
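A short numeric check shows why associativity matters for subtraction but not for addition. The two helper functions below are hypothetical names for this sketch; they evaluate the same operand list with the grouping that a left recursive rule and a right recursive rule would produce.

```python
import operator
from functools import reduce


def eval_left(operands, op):
    """Group as ((a op b) op c): the grouping produced by a left recursive rule."""
    return reduce(op, operands)


def eval_right(operands, op):
    """Group as (a op (b op c)): the grouping produced by a right recursive rule."""
    *init, last = operands
    return reduce(lambda acc, x: op(x, acc), reversed(init), last)


if __name__ == "__main__":
    values = [10, 4, 3]
    # Subtraction: the two groupings disagree.
    print(eval_left(values, operator.sub), eval_right(values, operator.sub))
    # Addition: the two groupings agree for these integers.
    print(eval_left(values, operator.add), eval_right(values, operator.add))
```

For 10 - 4 - 3, left association gives (10 - 4) - 3 = 3 while right association gives 10 - (4 - 3) = 9, so the grammar must pick the conventional one.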
Associativity of Operators (cont.)
• In most languages that provide an
exponentiation operator, it is right associative
• The following rules could be used to describe
exponentiation as a right associative operator
<factor> -> <exp> ** <factor> | <exp>
<exp> -> ( <expr> ) | <id>
An Unambiguous Grammar for statement
if-then-else
• The ambiguous grammar of if-then-else
statement has the following rules:
<if_stmt> -> if <logic_expr> then <stmt> |
if <logic_expr> then <stmt> else <stmt>
• The simplest sentential form that illustrates
this ambiguity is
if <logic_expr> then
if <logic_expr> then <stmt>
else <stmt>
The two parse trees …
Tree 1 (else matched with the outer if):
<if_stmt>
  if
  <logic_expr>
  then
  <stmt>
    <if_stmt>
      if
      <logic_expr>
      then
      <stmt>
  else
  <stmt>
Tree 2 (else matched with the inner if):
<if_stmt>
  if
  <logic_expr>
  then
  <stmt>
    <if_stmt>
      if
      <logic_expr>
      then
      <stmt>
      else
      <stmt>
An Unambiguous Grammar …
• The rule for if constructs in most languages is
that an else clause, when present, is matched
with the nearest previous unmatched then (or if)
• The unambiguous grammar based on this rule
follows
<stmt> -> <matched> | <unmatched>
<matched> ->
if <logic_expr> then <matched> else <matched> |
any non-if statement
<unmatched> ->
if <logic_expr> then <stmt> |
if <logic_expr> then <matched> else <unmatched>
There is just one possible parse tree …
<stmt>
  <unmatched>
    if
    <logic_expr>
    then
    <stmt>
      <matched>
        if
        <logic_expr>
        then
        <stmt>
        else
        <stmt>
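The "nearest unmatched then" rule is what a simple recursive-descent parser does naturally when it always tries to consume an `else` right after parsing a then-part. The sketch below is not an implementation of the matched/unmatched grammar itself; it is a greedy parser intended to produce the same pairing, with conditions and simple statements represented by single hypothetical tokens such as `c1` and `s1`.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at tokens[pos]; return (tree, next_pos).

    An if statement becomes ("if", condition, then_part, else_part_or_None).
    Any other token is treated as a simple non-if statement.
    """
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] != "if":
        return tokens[pos], pos + 1
    if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
        raise SyntaxError("expected 'then' after the condition")
    condition = tokens[pos + 1]
    then_part, pos = parse_stmt(tokens, pos + 3)
    else_part = None
    # Greedy step: the innermost if still being parsed claims the next else.
    if pos < len(tokens) and tokens[pos] == "else":
        else_part, pos = parse_stmt(tokens, pos + 1)
    return ("if", condition, then_part, else_part), pos


if __name__ == "__main__":
    tree, _ = parse_stmt("if c1 then if c2 then s1 else s2".split())
    print(tree)
```

In the resulting tree the `else s2` belongs to the inner `if c2`, and the outer `if c1` has no else part.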
Extended BNF
• Extended BNF does not enhance the descriptive
power of BNF; it only increases BNF’s readability
and writability
• Three extensions:
– Optional parts are placed in brackets ([])
<selection> -> if (<expression>) <statement> [else <statement>] ;
– Put alternative parts of RHSs in parentheses and
separate them with vertical bars
<for_stmt> -> for <var> := <expr> (to | downto) <expr> do <stmt>
– Put repetitions (0 or more) in braces ({})
<ident> -> letter {letter | digit}
Example: BNF and EBNF versions of an
expression grammar
BNF:
<expr> -> <expr> + <term> |
<expr> - <term> |
<term>
<term> -> <term> * <factor> |
<term> / <factor> |
<factor>
EBNF:
<expr> -> <term> {(+ | -) <term>}
<term> -> <factor> {(* | /) <factor>}
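The EBNF braces translate almost word for word into loops: `{(+ | -) <term>}` becomes "while the next token is `+` or `-`, read another term". The evaluator below is a sketch of that correspondence. The slides do not define `<factor>`, so this example assumes a factor is a non-negative integer literal or a parenthesized expression, that tokens are whitespace-separated, and that `/` means ordinary (floating-point) division.

```python
def evaluate(source):
    """Evaluate an expression following the EBNF rules for <expr> and <term>."""
    tokens = source.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():  # assumed: <factor> -> integer | ( <expr> )
        tok = peek()
        if tok == "(":
            advance()
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        if tok is not None and tok.isdigit():
            advance()
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():  # <term> -> <factor> {(* | /) <factor>}
        value = factor()
        while peek() in ("*", "/"):
            op = peek()
            advance()
            right = factor()
            value = value * right if op == "*" else value / right
        return value

    def expr():  # <expr> -> <term> {(+ | -) <term>}
        value = term()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            right = term()
            value = value + right if op == "+" else value - right
        return value

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


if __name__ == "__main__":
    print(evaluate("2 + 3 * 4"))
    print(evaluate("8 - 4 - 2"))
```

Because each loop folds its operands from left to right, the evaluator keeps the left associativity that the left recursive BNF rules express.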
Syntax Graphs
• The information in BNF and EBNF rules can be
represented in a directed graph. Such graphs are
called syntax graphs
• A separate graph is used for each syntactic unit
• Syntax graphs use different kinds of nodes to
represent the terminal and nonterminal symbols
of the right sides of a grammar's rules
– Rectangle nodes contain the names of syntactic units
(nonterminals)
– Circles or ellipses contain terminal symbols
Example: The Ada if statement
[Figure: syntax graphs for if_stmt and else_if]
<if_stmt> -> if <condition> then <stmts> {<else_if>}
[else <stmts>] end if ;
<else_if> -> elsif <condition> then <stmts>
Attribute Grammars
• CFGs cannot describe all of the syntax of
programming languages
• Attribute grammars are additions to CFGs that carry some semantic
info along through parse trees
• Primary value of attribute grammars:
– Static semantics specification
– Compiler design (static semantics checking)
Static semantics
• Example 1: In Java, a floating-point value cannot be
assigned to an integer type variable, although the
opposite is legal
• Example 2: All variables must be declared before they
are referenced
• Example 3: If the end of an Ada subprogram is
followed by a name, that name must match the name
of the subprogram
• These problems exemplify the category of language
rules called static semantics rules. They are difficult
or impossible to specify in BNF
Attribute Grammars - Basic Concepts
• Attributes, which are associated with grammar
symbols, are similar to variables in the sense
that they can have values assigned to them
• Attribute computation functions (or semantic
functions) are associated with grammar rules.
They are used to specify how attribute values
are computed
• Predicate functions, which state the static
semantic rules of the language, are associated
with grammar rules
Attribute Grammars - Definition
• An attribute grammar is a CFG G = (S, N, T, P)
with the following additions:
– For each grammar symbol X there is a set A(X)
of attribute values
– Each rule has a set of semantic functions that
define certain attributes of the nonterminals in
the rule
– Each rule has a (possibly empty) set of
predicate functions to check for attribute
consistency
Attribute Grammars (cont.)
• The set A(X) consists of two disjoint sets:
– Synthesized attributes S(X): to pass semantic
information up a parse tree
– Inherited attributes I(X): to pass semantic information
down a parse tree
• Let $X_0 \to X_1 \ldots X_n$ be a rule
• Functions of the form $S(X_0) = f(A(X_1), \ldots, A(X_n))$ define
synthesized attributes of $X_0$
• Functions of the form $I(X_j) = f(A(X_0), \ldots, A(X_n))$, for
$1 \le j \le n$, define inherited attributes of $X_j$
• Initially, there are intrinsic attributes on the leaves
Attribute Grammars (cont.)
• The value of an inherited attribute on a parse
tree node depends on the attribute values of
that node's parent node and those of its
sibling nodes
• To avoid circularity, inherited attributes are
often restricted to functions of the form:
$I(X_j) = f(A(X_0), \ldots, A(X_{j-1}))$
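The roles of intrinsic attributes, synthesized attributes, and predicate functions can be sketched on the Java-style rule mentioned earlier (an integer may be assigned to a floating-point variable, but not the reverse). Everything concrete below is an assumption for illustration: the tiny grammar `<assign> -> <var> = <expr>`, `<expr> -> <var> + <var> | <var>`, the symbol table `DECLARED`, and the function names.

```python
# Hypothetical symbol table: supplies the intrinsic attribute (declared type) of each leaf.
DECLARED = {"i": "int", "j": "int", "x": "float"}


def expr_type(expr):
    """Synthesized attribute: the type of an <expr> node, computed from its children.

    expr is either a variable name (<expr> -> <var>) or a tuple
    ("+", left_var, right_var) for <expr> -> <var> + <var>.
    """
    if isinstance(expr, str):
        return DECLARED[expr]
    _, left, right = expr
    left_type, right_type = DECLARED[left], DECLARED[right]
    # Assumed typing rule: int + int is int, anything involving a float is float.
    return "int" if left_type == right_type == "int" else "float"


def assignment_ok(target, expr):
    """Predicate function for <assign> -> <var> = <expr>.

    The target's declared type plays the role of the expected type; an int
    value may be widened to float, but a float may not be stored in an int.
    """
    expected = DECLARED[target]
    actual = expr_type(expr)
    return expected == actual or (expected == "float" and actual == "int")


if __name__ == "__main__":
    print(assignment_ok("x", ("+", "i", "j")))
    print(assignment_ok("i", ("+", "i", "x")))
```

Here the type information flows up the tree from the leaves, so `expr_type` behaves like a synthesized attribute; looking up an undeclared name raises `KeyError`, which loosely corresponds to the declare-before-use rule.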