Describing syntax and semantics

Chapter 3
Describing Syntax
and Semantics
ISBN 0-321-33025-0
Chapter 3 Topics
• Introduction
• The General Problem of Describing Syntax
• Formal Methods of Describing Syntax
• Attribute Grammars
• Describing the Meanings of Programs: Dynamic Semantics
Copyright © 2006 Addison-Wesley. All rights reserved. 1-2
Introduction
• Without a clear language definition, a language may be hard to learn and
hard to implement, and any ambiguity in the specification may lead to
dialect differences
• Most new programming languages are subjected to a period of scrutiny by
potential users before their designs are completed
• Who must use language definitions?
– Other language designers
– Implementors
– Programmers (the users of the language)

Introduction (cont.)
• The study of programming languages can be divided into examinations of
syntax and semantics
• Syntax - the form or structure of the expressions, statements, and
program units
• Semantics - the meaning of the expressions, statements, and program units
• Semantics should follow from syntax: the form of statements should be
clear and imply what the statements do or how they should be used
Example
• Syntax Example: Simple C if statement
if (<expr>)
<true-statement>
else
<false-statement>
• Semantics Example: If the expression evaluates to true, execute the true
statement; otherwise execute the false statement
The General Problem of Describing Syntax
• A sentence is a string of characters over some alphabet
• A language is a set of sentences
• A lexeme is the lowest level syntactic unit of a language
(e.g., *, +, =, sum, begin)
• A token is a category of lexemes (e.g., identifier)
• Languages can be formally defined in two distinct ways: by recognition
and by generation
Example
index = 2 * count + 10;
Lexeme    Token
index     identifier
=         equal_sign
2         int_literal
*         mult_op
count     identifier
+         plus_op
10        int_literal
;         semicolon
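The lexeme/token table can be reproduced with a small scanner. The following is only a sketch: the token names come from the table, but the regular expressions and the `tokenize` function are illustrative assumptions, not part of the original material.

```python
import re

# Token categories from the slide's table; the regular expressions themselves
# are illustrative assumptions.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("equal_sign", r"="),
    ("mult_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (lexeme, token) pairs for text."""
    pairs = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":  # whitespace separates lexemes but is not one
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs


for lexeme, token in tokenize("index = 2 * count + 10;"):
    print(f"{lexeme:<10}{token}")
```

Each lexeme is the matched text and each token is the name of the category that matched, which mirrors the distinction drawn on the slide.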

Language Recognizers
• A recognition device reads input strings of the language and decides
whether the input strings belong to the language
• Example: syntax analysis part of a compiler
Language Generators
• A device that generates sentences of a language
• One can determine if the syntax of a particular sentence is correct by
comparing it to the structure of the generator
Language Recognizers vs. Generators
• The language recognizer can only be used in trial-and-error mode
(black box)
• The structure of the language generator is an open book which people can
easily read and understand
Formal Methods of Describing Syntax
• This section discusses the formal language generation mechanisms that are
commonly used to describe the syntax of programming languages
• These mechanisms are often called grammars
• We will discuss the class of languages called context-free languages
Context-Free Grammars
• Noam Chomsky
– A linguist
– Described four classes of grammars in the mid-1950s
– Two of these grammar classes, context-free and regular grammars, are
useful in computer science
– The tokens of programming languages can be described by regular grammars
– Whole programming languages, with minor exceptions, can be described by
context-free grammars
Backus-Naur Form
• Invented by John Backus to describe Algol 58
• BNF is a metalanguage
– A metalanguage is a language used to describe another language
• BNF is equivalent to context-free grammars
• Extended BNF (EBNF) improves readability and writability of BNF
• In BNF, abstractions are used to represent syntactic structures (also
called nonterminal symbols)
Backus-Naur Form (cont.)
• Although BNF is simple, it is sufficiently powerful to describe the great
majority of the syntax of programming languages:
– Lists of similar constructs
– The order in which different constructs must appear
– Nested structures to any depth
– Operator precedence
– Operator associativity
– …
Grammar and Rules
• A grammar is a finite nonempty set of rules
• A rule has a left-hand side (LHS) and a right-hand side (RHS), and
consists of terminal (lexeme or token) and nonterminal symbols
<assign> → <var> = <expression>
• An abstraction (or nonterminal symbol) can have more than one RHS
<if_stmt> → if <logic_expr> then <stmt> |
            if <logic_expr> then <stmt> else <stmt>
Describing Lists
• Syntactic lists (for example, a list of identifiers appearing on a data
declaration statement) are described using recursion
• A rule is recursive if its LHS appears in its RHS
<ident_list> → identifier | identifier , <ident_list>
Note: Comma ‘,’ is a terminal
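The recursive list rule maps directly onto a recursive function. This is a minimal sketch: `is_ident_list` is a hypothetical name, the input is assumed to be already split into tokens, and Python's `str.isidentifier` stands in for the `identifier` token.

```python
def is_ident_list(tokens):
    """Recognize <ident_list> -> identifier | identifier , <ident_list>."""
    # Both alternatives start with an identifier.
    if not tokens or not tokens[0].isidentifier():
        return False
    # First alternative: a single identifier.
    if len(tokens) == 1:
        return True
    # Second alternative: identifier , <ident_list> (the recursive case).
    return tokens[1] == "," and is_ident_list(tokens[2:])


print(is_ident_list(["sum", ",", "count", ",", "index"]))
print(is_ident_list(["sum", ","]))
```

The recursive call appears exactly where `<ident_list>` appears on the right-hand side of the rule.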
Example: A grammar of expressions
• Seven terminal symbols:
+ - * ( ) x y
• Four non-terminal symbols:
‹expr› ‹term› ‹factor› ‹var›
• Start/goal symbol:
‹expr›
• Rules of grammar
‹expr› → ‹term› | ‹expr› + ‹term› | ‹expr› - ‹term›
‹term› → ‹factor› | ‹term› * ‹factor›
‹factor› → ‹var› | ( ‹expr› )
‹var› → x | y
Derivations
• A derivation is a repeated application of rules, starting with the start
symbol and ending with a sentence (all terminal symbols)
• Derivations can be used to generate all the possible sentences in a
grammar
• Every string of symbols in the derivation is a sentential form
Derivations (cont.)
• A sentence is a sentential form that has only terminal symbols
• A leftmost derivation is one in which the leftmost nonterminal in each
sentential form is the one that is expanded
• A derivation may be neither leftmost nor rightmost
• Derivation order should have no effect on the language generated by a
grammar
Example: a grammar and leftmost derivation
<sentence> → <noun-phrase> <verb-phrase> .
<noun-phrase> → <article> <noun>
<article> → a | the
<noun> → girl | dog
<verb-phrase> → <verb> <noun-phrase>
<verb> → sees | pets

<sentence> ⇒ <noun-phrase> <verb-phrase> .
⇒ <article> <noun> <verb-phrase> .
⇒ the <noun> <verb-phrase> .
⇒ the girl <verb-phrase> .
⇒ the girl <verb> <noun-phrase> .
⇒ the girl sees <noun-phrase> .
⇒ the girl sees <article> <noun> .
⇒ the girl sees a <noun> .
⇒ the girl sees a dog .
Context free?
• In the previous example you might wonder
about the idea of context
• In a context-free grammar we find that
replacements do not have any context in which
they cannot occur
– The dog pets the girl = 
– The girl pets the dog =

• Conversely, if there were contexts in which a rule could not be applied,
the grammar would not be “context-free”
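Because the sentence grammar from the derivation example has no recursive rules, every sentence it generates can be enumerated. The sketch below treats the grammar as a generator; the dictionary encoding and the `sentences` function are illustrative assumptions.

```python
from itertools import product

# The sentence grammar from the slides, keyed by nonterminal.
GRAMMAR = {
    "<sentence>": [["<noun-phrase>", "<verb-phrase>", "."]],
    "<noun-phrase>": [["<article>", "<noun>"]],
    "<article>": [["a"], ["the"]],
    "<noun>": [["girl"], ["dog"]],
    "<verb-phrase>": [["<verb>", "<noun-phrase>"]],
    "<verb>": [["sees"], ["pets"]],
}


def sentences(symbol):
    """Yield every terminal string derivable from symbol as a list of words."""
    if symbol not in GRAMMAR:  # terminal symbol: it derives only itself
        yield [symbol]
        return
    for alternative in GRAMMAR[symbol]:
        # Expand each symbol of the alternative independently of its neighbours.
        expansions = [list(sentences(part)) for part in alternative]
        for combo in product(*expansions):
            yield [word for words in combo for word in words]


all_sentences = [" ".join(words) for words in sentences("<sentence>")]
print(len(all_sentences))
print("the dog pets the girl ." in all_sentences)
```

Each nonterminal is expanded without looking at its surroundings, so the 32 generated sentences include both "the girl pets the dog ." and "the dog pets the girl .".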
Example: A leftmost derivation of the expression ( x - y ) * x + y
‹expr› ⇒ ‹expr› + ‹term›
⇒ ‹term› + ‹term›
⇒ ‹term› * ‹factor› + ‹term›
⇒ ‹factor› * ‹factor› + ‹term›
⇒ ( ‹expr› ) * ‹factor› + ‹term›
⇒ ( ‹expr› - ‹term› ) * ‹factor› + ‹term›
⇒ ( ‹term› - ‹term› ) * ‹factor› + ‹term›
⇒ ( ‹factor› - ‹term› ) * ‹factor› + ‹term›
⇒ ( ‹var› - ‹term› ) * ‹factor› + ‹term›
⇒ ( x - ‹term› ) * ‹factor› + ‹term›
⇒ ( x - ‹factor› ) * ‹factor› + ‹term›
⇒ ( x - ‹var› ) * ‹factor› + ‹term›
⇒ ( x - y ) * ‹factor› + ‹term›
⇒ ( x - y ) * ‹var› + ‹term›
⇒ ( x - y ) * x + ‹term›
⇒ ( x - y ) * x + ‹factor›
⇒ ( x - y ) * x + ‹var›
⇒ ( x - y ) * x + y
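A leftmost derivation is mechanical once the rule to apply at each step is known. As a sketch, the expression grammar can be stored as a dictionary and each step recorded as the index of the chosen alternative; `leftmost_derivation` and `CHOICES` are hypothetical names introduced for illustration.

```python
# The expression grammar from the slides, keyed by nonterminal.
GRAMMAR = {
    "expr": [["term"], ["expr", "+", "term"], ["expr", "-", "term"]],
    "term": [["factor"], ["term", "*", "factor"]],
    "factor": [["var"], ["(", "expr", ")"]],
    "var": [["x"], ["y"]],
}


def leftmost_derivation(start, choices):
    """Expand the leftmost nonterminal once per entry in choices.

    Each entry is an index into that nonterminal's list of alternatives.
    Returns every sentential form, starting with [start].
    """
    form = [start]
    forms = [form]
    for choice in choices:
        # Position of the leftmost nonterminal in the current sentential form.
        i = next(k for k, symbol in enumerate(form) if symbol in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        forms.append(form)
    return forms


# Alternative indices that reproduce the derivation of ( x - y ) * x + y.
CHOICES = [1, 0, 1, 0, 1, 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]

for form in leftmost_derivation("expr", CHOICES):
    print("=>", " ".join(form))
```

The last form printed contains only terminal symbols, so it is a sentence; every earlier line is a sentential form.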
Example: A grammar for a small language
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var> | <var> - <var> | <var>
Example: A leftmost derivation in this grammar
<program> ⇒ begin <stmt_list> end
⇒ begin <stmt> ; <stmt_list> end
⇒ begin <var> = <expression> ; <stmt_list> end
⇒ begin A = <expression> ; <stmt_list> end
⇒ begin A = <var> + <var> ; <stmt_list> end
⇒ begin A = B + <var> ; <stmt_list> end
⇒ begin A = B + C ; <stmt_list> end
⇒ begin A = B + C ; <stmt> end
⇒ begin A = B + C ; <var> = <expression> end
⇒ begin A = B + C ; B = <expression> end
⇒ begin A = B + C ; B = <var> end
⇒ begin A = B + C ; B = C end
Parse Tree
• Grammars naturally describe the hierarchical
syntactic structure of the sentences of the
languages they define
• These hierarchical structures are called parse
trees
– Every internal node of a parse tree is labeled with a
nonterminal symbol
– Every leaf is labeled with a terminal symbol
– Every subtree of a parse tree describes one instance of an abstraction in
the statement
Example: Parse tree of the sentence "the girl sees a dog ."
sentence
  noun-phrase
    article: the
    noun: girl
  verb-phrase
    verb: sees
    noun-phrase
      article: a
      noun: dog
  .
Example: A Parse Tree
Rules of grammar:
<assign> → <id> = <expr>
<expr> → <id> + <expr> |
         <id> * <expr> |
         ( <expr> ) |
         <id>
<id> → A | B | C
Parse tree of A = B * ( A + C ):
<assign>
  <id>: A
  =
  <expr>
    <id>: B
    *
    <expr>
      (
      <expr>
        <id>: A
        +
        <expr>
          <id>: C
      )
Ambiguity
• Two different derivations can lead to the same parse tree; this is fine
and does not make the grammar ambiguous
• Given a grammar:
<expr> → <expr> + <expr> |
         <expr> * <expr> |
         ( <expr> ) |
         <number>
<number> → <number> <digit> | <digit>
<digit> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
… given 234, we have different derivations …
Rightmost derivation:
number ⇒ number digit
⇒ number 4
⇒ number digit 4
⇒ number 3 4
⇒ digit 3 4
⇒ 2 3 4
Leftmost derivation:
number ⇒ number digit
⇒ number digit digit
⇒ digit digit digit
⇒ 2 digit digit
⇒ 2 3 digit
⇒ 2 3 4
… but the parse tree is the same in either case …
number
  number
    number
      digit: 2
    digit: 3
  digit: 4
Ambiguity (cont.)
• A grammar is ambiguous if and only if it
generates a sentential form that has two or
more distinct parse trees
• Ambiguity should be avoided
Two distinct parse trees for the sentence A = B + C * A
First tree:
<assign>
  <id>: A
  =
  <expr>
    <expr>
      <id>: B
    +
    <expr>
      <expr>
        <id>: C
      *
      <expr>
        <id>: A
Second tree:
<assign>
  <id>: A
  =
  <expr>
    <expr>
      <expr>
        <id>: B
      +
      <expr>
        <id>: C
    *
    <expr>
      <id>: A
and their meaning:
First tree: A = B + (C * A)
Second tree: A = (B + C) * A
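One way to see the ambiguity concretely is to count the parse trees of an expression. The sketch below assumes the ambiguous rules `<expr> → <expr> + <expr> | <expr> * <expr> | <id>` with `<id> → A | B | C` and leaves out the parenthesized alternative for brevity; `count_parse_trees` is a hypothetical helper.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of tokens under <expr> -> <expr> + <expr> | <expr> * <expr> | <id>."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # tokens[lo:hi] is a single symbol: one tree if it is an identifier.
        if hi - lo == 1:
            return 1 if tokens[lo] in {"A", "B", "C"} else 0
        total = 0
        # Try every operator strictly inside the span as the root of the tree.
        for k in range(lo + 1, hi - 1):
            if tokens[k] in {"+", "*"}:
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))


print(count_parse_trees("B + C * A".split()))
```

For `B + C * A` the count is 2, matching the two trees above: one rooted at `+` and one rooted at `*`.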
Operator Precedence
• An operator in an arithmetic expression that is generated lower in the
parse tree can be used to indicate that it has precedence over an operator
produced higher up in the tree
• Although the grammar of the earlier parse tree example
(<expr> → <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>) is not
ambiguous, the precedence order of its operators is not the usual one
Removing Ambiguity
• An unambiguous grammar for expressions
<assign> → <id> = <expr>
<id> → A | B | C
<expr> → <expr> + <term> | <term>
<term> → <term> * <factor> | <factor>
<factor> → ( <expr> ) | <id>
Example: Derivation and parse tree of the sentence A = B + C * A
<assign> ⇒ <id> = <expr>
⇒ A = <expr>
⇒ A = <expr> + <term>
⇒ A = <term> + <term>
⇒ A = <factor> + <term>
⇒ A = <id> + <term>
⇒ A = B + <term>
⇒ A = B + <term> * <factor>
⇒ A = B + <factor> * <factor>
⇒ A = B + <id> * <factor>
⇒ A = B + C * <factor>
⇒ A = B + C * <id>
⇒ A = B + C * A
Parse tree:
<assign>
  <id>: A
  =
  <expr>
    <expr>
      <term>
        <factor>
          <id>: B
    +
    <term>
      <term>
        <factor>
          <id>: C
      *
      <factor>
        <id>: A
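The effect of the `<expr>` / `<term>` / `<factor>` layering can be checked with a small recursive-descent parser. This is a sketch under stated assumptions: the left-recursive rules are implemented as loops (a direct recursive call would never terminate), the tree is a nested tuple rather than a full parse tree, and `parse_expr` is a hypothetical name.

```python
def parse_expr(tokens):
    """Parse the right side of an assignment; return a nested (op, left, right) tuple."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():
        # <factor> -> ( <expr> ) | <id>
        token = peek()
        if token == "(":
            advance()
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if token in {"A", "B", "C"}:
            advance()
            return token
        raise SyntaxError(f"unexpected token {token!r}")

    def term():
        # <term> -> <term> * <factor> | <factor>, written as a loop
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        # <expr> -> <expr> + <term> | <term>, written as a loop
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse_expr("B + C * A".split()))
```

Because `*` is only reachable through `<term>`, it always ends up lower in the tree than `+`, which is what gives it precedence.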
Associativity of Operators
• Depending on the grammar, addition becomes left or right associative.
This isn't so bad with addition, but associativity can be a problem with
other operators, e.g., subtraction and division
• When a BNF rule has its LHS also appearing at the beginning of its RHS,
the rule is said to be left recursive
• Associativity can be controlled through recursion:
– Left recursive rules make an operator left associative
– Right recursive rules make an operator right associative
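The difference between the two groupings is easy to check numerically. In this sketch `fold_left` and `fold_right` are hypothetical helpers that group operands the way a left recursive and a right recursive rule would.

```python
from functools import reduce
import operator


def fold_left(op, operands):
    """Group as ((a op b) op c): the shape a left recursive rule produces."""
    return reduce(op, operands)


def fold_right(op, operands):
    """Group as (a op (b op c)): the shape a right recursive rule produces."""
    return reduce(lambda acc, x: op(x, acc), reversed(operands))


print(fold_left(operator.add, [10, 4, 3]), fold_right(operator.add, [10, 4, 3]))
print(fold_left(operator.sub, [10, 4, 3]), fold_right(operator.sub, [10, 4, 3]))
```

Addition gives 17 either way, while subtraction gives 3 for the left grouping, (10 - 4) - 3, and 9 for the right grouping, 10 - (4 - 3).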
Associativity of Operators (cont.)
• In most languages that provide an exponentiation operator, it is right
associative
• The following rules could be used to describe exponentiation as a right
associative operator
<factor> → <exp> ** <factor> | <exp>
<exp> → ( <expr> ) | <id>
An Unambiguous Grammar for the if-then-else statement
• The ambiguous grammar of the if-then-else statement has the following
rules:
<if_stmt> → if <logic_expr> then <stmt> |
            if <logic_expr> then <stmt> else <stmt>
• The simplest sentential form that illustrates this ambiguity is
if <logic_expr> then if <logic_expr> then <stmt> else <stmt>
The two parse trees …
First tree (the else belongs to the outer if):
<if_stmt>
  if
  <logic_expr>
  then
  <stmt>
    <if_stmt>
      if
      <logic_expr>
      then
      <stmt>
  else
  <stmt>
Second tree (the else belongs to the inner if):
<if_stmt>
  if
  <logic_expr>
  then
  <stmt>
    <if_stmt>
      if
      <logic_expr>
      then
      <stmt>
      else
      <stmt>
An Unambiguous Grammar …
• The rule for if constructs in most languages is that an else clause, when
present, is matched with the nearest previous unmatched then (or if)
• The unambiguous grammar based on this rule follows
<stmt> → <matched> | <unmatched>
<matched> → if <logic_expr> then <matched> else <matched> |
            any non-if statement
<unmatched> → if <logic_expr> then <stmt> |
              if <logic_expr> then <matched> else <unmatched>
There is just one possible parse tree …
<stmt>
  <unmatched>
    if
    <logic_expr>
    then
    <stmt>
      <matched>
        if
        <logic_expr>
        then
        <stmt>
        else
        <stmt>
Extended BNF
• Extended BNF does not enhance the descriptive power of BNF; it only
increases BNF's readability and writability
• Three extensions:
– Optional parts are placed in brackets ([])
<selection> → if (<expression>) <statement> [else <statement>] ;
– Put alternative parts of RHSs in parentheses and separate them with
vertical bars
<for_stmt> → for <var> := <expr> (to | downto) <expr> do <stmt>
– Put repetitions (0 or more) in braces ({})
<ident> → letter {letter | digit}
Example: BNF and EBNF versions of an expression grammar
BNF:
<expr> → <expr> + <term> |
         <expr> - <term> |
         <term>
<term> → <term> * <factor> |
         <term> / <factor> |
         <factor>
EBNF:
<expr> → <term> {(+ | -) <term>}
<term> → <factor> {(* | /) <factor>}
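The EBNF form translates almost line by line into code: each `{ … }` repetition becomes a `while` loop. The evaluator below is only a sketch; it assumes space-separated input and that `<factor>` is an integer literal, neither of which is stated in the slides, and `evaluate` is a hypothetical name.

```python
def evaluate(source):
    """Evaluate a space-separated expression of integers using + - * /."""
    tokens = source.split()
    pos = 0

    def factor():
        nonlocal pos
        value = int(tokens[pos])  # assumption: <factor> is an integer literal
        pos += 1
        return value

    def term():
        # <term> -> <factor> {(* | /) <factor>}
        nonlocal pos
        value = factor()
        while pos < len(tokens) and tokens[pos] in ("*", "/"):
            op = tokens[pos]
            pos += 1
            right = factor()
            value = value * right if op == "*" else value / right
        return value

    def expr():
        # <expr> -> <term> {(+ | -) <term>}
        nonlocal pos
        value = term()
        while pos < len(tokens) and tokens[pos] in ("+", "-"):
            op = tokens[pos]
            pos += 1
            right = term()
            value = value + right if op == "+" else value - right
        return value

    result = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result


print(evaluate("10 - 4 - 3"))
print(evaluate("2 + 3 * 4"))
```

Accumulating the result inside the loop groups operands from the left, so `10 - 4 - 3` evaluates as (10 - 4) - 3, the same associativity the left-recursive BNF rules express.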
Syntax Graphs
• The information in BNF and EBNF rules can be
represented in a directed graph. Such graphs are
called syntax graphs
• A separate graph is used for each syntactic unit
• Syntax graphs use different kinds of nodes to
represent the terminal and nonterminal symbols
of the right sides of a grammar's rules
– Rectangle nodes contain the names of syntactic units (nonterminals)
– Circles or ellipses contain terminal symbols
Example: The Ada if statement
[Figure: syntax graphs for if_stmt and else_if, with nodes if, condition,
then, stmts, else_if, else, elsif, end if, ;]
<if_stmt> → if <condition> then <stmts> {<else_if>}
            [else <stmts>] end if ;
<else_if> → elsif <condition> then <stmts>
Attribute Grammars
• CFGs cannot describe all of the syntax of programming languages
• Additions to CFGs to carry some semantic info along through parse trees
• Primary value of attribute grammars:
– Static semantics specification
– Compiler design (static semantics checking)
Static semantics
• Example 1: In Java, a floating-point value cannot be assigned to an
integer type variable, although the opposite is legal
• Example 2: All variables must be declared before they are referenced
• Example 3: If the end of an Ada subprogram is followed by a name, that
name must match the name of the subprogram
• These problems exemplify the category of language rules called static
semantics rules. They cannot be specified in BNF
Attribute Grammars - Basic Concepts

• Attributes, which are associated with grammar
symbols, are similar to variables in the sense
that they can have values assigned to them
• Attribute computation functions (or semantic
functions) are associated with grammar rules.
They are used to specify how attribute values
are computed
• Predicate functions, which state the static semantic rules of the
language, are associated with grammar rules
Attribute Grammars - Definition
• An attribute grammar is a CFG G = (S, N, T, P) with the following
additions:
– For each grammar symbol X there is a set A(X) of attribute values
– Each rule has a set of semantic functions that define certain attributes
of the nonterminals in the rule
– Each rule has a (possibly empty) set of predicate functions to check for
attribute consistency
Attribute Grammars (cont.)
• The set A(X) consists of two disjoint sets:
– Synthesized attributes S(X): to pass semantic information up a parse tree
– Inherited attributes I(X): to pass semantic information down a parse tree
• Let $X_0 \rightarrow X_1 \ldots X_n$ be a rule
• Functions of the form $S(X_0) = f(A(X_1), \ldots, A(X_n))$ define
synthesized attributes of $X_0$
• Functions of the form $I(X_j) = f(A(X_0), \ldots, A(X_n))$, for
$1 \le j \le n$, define inherited attributes of $X_j$
• Initially, there are intrinsic attributes on the leaves
Attribute Grammars (cont.)
• The value of an inherited attribute on a parse tree node depends on the
attribute values of that node's parent node and those of its sibling nodes
• To avoid circularity, inherited attributes are often restricted to
functions of the form:
$I(X_j) = f(A(X_0), \ldots, A(X_{j-1}))$
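As a rough illustration of these ideas, the Java rule from the static semantics examples (a floating-point value cannot be assigned to an integer variable) can be checked with one synthesized attribute and one predicate function. Everything named here is a hypothetical assumption: the `DECLARED` table, the `Var` and `Add` node classes, and the attribute name `actual_type` are chosen for the sketch and do not come from the slides.

```python
from dataclasses import dataclass

# Hypothetical declarations; they supply the intrinsic attributes on the leaves.
DECLARED = {"i": "int", "f": "float"}


@dataclass
class Var:
    name: str


@dataclass
class Add:
    left: object
    right: object


def actual_type(node):
    """Synthesized attribute: computed from the attributes of the children."""
    if isinstance(node, Var):
        return DECLARED[node.name]  # intrinsic attribute of a leaf
    left, right = actual_type(node.left), actual_type(node.right)
    return "int" if left == right == "int" else "float"


def assignment_ok(target, expr):
    """Predicate function for a rule of the form <assign> -> <var> = <expr>."""
    expected = DECLARED[target.name]
    # An int may be assigned to a float variable, but not the reverse.
    return expected == "float" or actual_type(expr) == "int"


print(assignment_ok(Var("i"), Add(Var("i"), Var("f"))))
print(assignment_ok(Var("f"), Add(Var("i"), Var("i"))))
```

The type information flows up the tree through `actual_type`, and the predicate compares it with the declared type of the assignment target; an inherited attribute would instead pass the expected type down to the expression.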