Chapter 3
Describing Syntax
and Semantics
ISBN 0-321-33025-0
Chapter 3 Topics
• Introduction
• The General Problem of Describing Syntax
• Formal Methods of Describing Syntax
• Attribute Grammars
• Describing the Meanings of Programs:
Dynamic Semantics
Copyright © 2006 Addison-Wesley. All rights reserved.
Introduction
• Without a clear language definition, a language
may be hard to learn and hard to implement, and
any ambiguity in the specification may lead to
dialect differences
• Most new programming languages are subjected
to a period of scrutiny by potential users before
their designs are completed
• Who must use language definitions?
– Other language designers
– Implementors
– Programmers (the users of the language)
Introduction (cont.)
• The study of programming languages can be
divided into examinations of syntax and
semantics
– Syntax - the form or structure of the expressions,
statements, and program units
– Semantics - the meaning of the expressions,
statements, and program units
• Semantics should follow from syntax: the form
of statements should be clear and imply what
the statements do or how they should be used
Example
• Syntax example: a simple C if statement
if (<expr>)
    <true-statement>
else
    <false-statement>
• Semantics example:
If the expression evaluates to true, execute
the true statement; otherwise, execute the
false statement
The General Problem of Describing
Syntax
• A sentence is a string of characters over some
alphabet
• A language is a set of sentences
• A lexeme is the lowest-level syntactic unit of a
language (e.g., *, +, =, sum, begin)
• A token is a category of lexemes (e.g.,
identifier, number, operator, …)
Example
index = 2 * count + 10;
Lexeme    Token
index     identifier
=         equal_sign
2         int_literal
*         mult_op
count     identifier
+         plus_op
10        int_literal
;         semicolon
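The lexeme/token pairing above can be sketched as a small scanner. This is only an illustrative Python sketch, not a real compiler's lexical analyzer: the names `TOKEN_SPEC` and `tokenize` are hypothetical, and the regular expressions for each token category are assumptions (the slide gives only the token names).

```python
import re

# Token categories from the example; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier",  r"[A-Za-z_]\w*"),
    ("equal_sign",  r"="),
    ("mult_op",     r"\*"),
    ("plus_op",     r"\+"),
    ("semicolon",   r";"),
    ("skip",        r"\s+"),   # whitespace separates lexemes and is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (lexeme, token) pairs for the input string."""
    pairs = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs

for lexeme, token in tokenize("index = 2 * count + 10;"):
    print(f"{lexeme:10}{token}")
```

Each token category here is described by a regular expression, which matches the later remark that tokens can be described by regular grammars.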
The Definition of Languages
• Languages can be formally defined in two
distinct ways: by recognition and by
generation
• Language Recognizers
– A recognition device of the language reads
input strings and decides whether the input
strings belong to the language
– Example: syntax analysis part of a compiler
The Definition of Languages (cont.)
• Language Generators
– A device that generates sentences of a language
– One can determine if the syntax of a particular
sentence is correct by comparing it to the
structure of the generator
Language Recognizers vs. Generators
• The language recognizer can only be used
in trial-and-error mode (black box)
• The structure of the language generator is
an open book which people can easily read
and understand
Formal Methods of Describing Syntax
• This section discusses the formal language
generation mechanisms that are commonly
used to describe the syntax of programming
languages
• These mechanisms are often called
grammars
• We will discuss the class of languages called
context-free languages
Context-Free Grammars
• Noam Chomsky
– A linguist
– Described four classes of grammars in the mid-1950s
• Two of these grammar classes, context-free
and regular grammars, are useful in
computer science
– The tokens of programming languages can be
described by regular grammars
– Whole programming languages, with minor
exceptions, can be described by context-free
grammars
Backus-Naur Form
• Invented by John Backus to describe Algol 58
• BNF is a metalanguage
– A metalanguage is a language used to describe
another language
• BNF is equivalent to context-free grammars
• In BNF, abstractions (also called nonterminal
symbols) are used to represent syntactic
structures
Backus-Naur Form (cont.)
• Although BNF is simple, it is sufficiently
powerful to describe the great majority of
the syntax of programming languages:
– Lists of similar constructs
– The order in which different constructs must
appear
– Nested structures to any depth
– Operator precedence
– Operator associativity
– …
Grammar and Rules
• A grammar is a finite nonempty set of rules
• A rule has a left-hand side (LHS) and a right-hand
side (RHS), and consists of terminal
(lexeme or token) and nonterminal symbols
<assign> → <var> = <expression>
• An abstraction (or nonterminal symbol) can
have more than one RHS
<if_stmt> → if <logic_expr> then <stmt> |
            if <logic_expr> then <stmt> else <stmt>
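One way to picture a grammar as "a finite nonempty set of rules" is to store it as data. The following is a minimal Python sketch under assumed conventions: the dictionary layout and the helper names `is_nonterminal` and `terminals` are hypothetical, and only the two rules shown on this slide are included.

```python
# Each nonterminal (LHS) maps to a list of alternative right-hand sides.
# Every RHS is a list of symbols; names in angle brackets are nonterminals.
GRAMMAR = {
    "<assign>": [["<var>", "=", "<expression>"]],
    "<if_stmt>": [
        ["if", "<logic_expr>", "then", "<stmt>"],
        ["if", "<logic_expr>", "then", "<stmt>", "else", "<stmt>"],
    ],
}

def is_nonterminal(symbol):
    """Assumed convention: nonterminals are written in angle brackets."""
    return symbol.startswith("<") and symbol.endswith(">")

def terminals(grammar):
    """Collect every terminal symbol that appears on some RHS."""
    return {
        symbol
        for alternatives in grammar.values()
        for rhs in alternatives
        for symbol in rhs
        if not is_nonterminal(symbol)
    }

print(sorted(terminals(GRAMMAR)))
print(len(GRAMMAR["<if_stmt>"]), "alternatives for <if_stmt>")
```

Storing the alternatives of one LHS in a list mirrors the `|` notation: `<if_stmt>` has one entry in the dictionary but two right-hand sides.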
Describing Lists
• Syntactic lists (for example, a list of
identifiers appearing on a data declaration
statement) are described using recursion
• A rule is recursive if its LHS appears in its
RHS
<ident_list> → identifier | identifier , <ident_list>
Note: the comma ',' is a terminal
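The recursive rule for `<ident_list>` maps directly onto a recursive function. This is a hedged Python sketch: the function names are hypothetical, the input is assumed to be already split into tokens, and the shape of an identifier (letter or underscore followed by letters, digits, or underscores) is an assumption, since the slide does not define it.

```python
import re

def is_identifier(token):
    # Assumed identifier shape; the slide leaves "identifier" undefined.
    return re.fullmatch(r"[A-Za-z_]\w*", token) is not None

def is_ident_list(tokens):
    """<ident_list> -> identifier | identifier , <ident_list>"""
    if not tokens or not is_identifier(tokens[0]):
        return False
    if len(tokens) == 1:
        # First alternative: a single identifier.
        return True
    # Second alternative: identifier , <ident_list> (the recursive case).
    return tokens[1] == "," and is_ident_list(tokens[2:])

print(is_ident_list(["a", ",", "b", ",", "c"]))
print(is_ident_list(["a", ","]))
```

The first call follows the recursive alternative twice and then the base alternative; the second fails because nothing follows the trailing comma.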
Example: A grammar of expressions
• Seven terminal symbols:
+ - * ( ) x y
• Four nonterminal symbols:
‹expr› ‹term› ‹factor› ‹var›
• Start/goal symbol:
‹expr›
Rules of grammar
‹expr› → ‹term› | ‹expr› + ‹term› | ‹expr› - ‹term›
‹term› → ‹factor› | ‹term› * ‹factor›
‹factor› → ‹var› | ( ‹expr› )
‹var› → x | y
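A recognizer for this grammar can be sketched as a set of mutually recursive functions, one per nonterminal. Note the swapped-in technique: the rules for ‹expr› and ‹term› are left-recursive, which a naive recursive function cannot follow directly, so this sketch uses loops for the repeated `+`, `-`, and `*` parts. It accepts the same sentences but is not a literal transcription of the rules. The function name `recognize` and the single-character tokenization are assumptions for illustration.

```python
def recognize(text):
    """Return True if text is a sentence of the expression grammar."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expr():
        # <expr> -> <term> | <expr> + <term> | <expr> - <term>
        nonlocal pos
        if not term():
            return False
        while peek() in ("+", "-"):
            pos += 1
            if not term():
                return False
        return True

    def term():
        # <term> -> <factor> | <term> * <factor>
        nonlocal pos
        if not factor():
            return False
        while peek() == "*":
            pos += 1
            if not factor():
                return False
        return True

    def factor():
        # <factor> -> <var> | ( <expr> ),  <var> -> x | y
        nonlocal pos
        if peek() in ("x", "y"):
            pos += 1
            return True
        if peek() == "(":
            pos += 1
            if not expr() or peek() != ")":
                return False
            pos += 1
            return True
        return False

    # The whole input must be consumed for the string to be a sentence.
    return expr() and pos == len(tokens)

print(recognize("(x-y)*x+y"))
print(recognize("x+"))
```

Like any recognizer, this function only answers yes or no; it says nothing about how the sentence is structured.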
Derivations
• A derivation is a repeated application of
rules, starting with the start symbol and
ending with a sentence (all terminal
symbols)
• Derivations can be used to generate all the
possible sentences of the language a grammar defines
• Every string of symbols in the derivation is a
sentential form
Example
<sentence> → <noun-phrase> <verb-phrase> .
<noun-phrase> → <article> <noun>
<article> → a | the
<noun> → girl | dog
<verb-phrase> → <verb> <noun-phrase>
<verb> → sees | pets
<sentence> => <noun-phrase> <verb-phrase> .
           => <article> <noun> <verb-phrase> .
           => the <noun> <verb-phrase> .
           => the girl <verb-phrase> .
           => the girl <verb> <noun-phrase> .
           => the girl sees <noun-phrase> .
           => the girl sees <article> <noun> .
           => the girl sees a <noun> .
           => the girl sees a dog .
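The derivation above can be replayed mechanically. The sketch below is a minimal Python illustration, assuming the grammar is stored as a dictionary and that the caller supplies which alternative to use at each step; the names `GRAMMAR`, `leftmost_derivation`, and `choices` are hypothetical.

```python
GRAMMAR = {
    "<sentence>":    [["<noun-phrase>", "<verb-phrase>", "."]],
    "<noun-phrase>": [["<article>", "<noun>"]],
    "<article>":     [["a"], ["the"]],
    "<noun>":        [["girl"], ["dog"]],
    "<verb-phrase>": [["<verb>", "<noun-phrase>"]],
    "<verb>":        [["sees"], ["pets"]],
}

def leftmost_derivation(start, choices):
    """Expand the leftmost nonterminal at each step.

    choices gives, in order, the index of the alternative to apply.
    Returns every sentential form, ending with a sentence.
    """
    form = [start]
    steps = [form]
    choices = iter(choices)
    while any(symbol in GRAMMAR for symbol in form):
        i = next(k for k, symbol in enumerate(form) if symbol in GRAMMAR)
        rhs = GRAMMAR[form[i]][next(choices)]
        form = form[:i] + rhs + form[i + 1:]
        steps.append(form)
    return steps

# Alternative indices chosen to reproduce the derivation on the slide.
steps = leftmost_derivation("<sentence>", [0, 0, 1, 0, 0, 0, 0, 0, 1])
for n, form in enumerate(steps):
    print(("   " if n == 0 else "=> ") + " ".join(form))
```

Every list in `steps` is a sentential form; only the last one, which contains no nonterminals, is a sentence.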
Derivations (cont.)
• A sentence is a sentential form that has only
terminal symbols
• A leftmost derivation is one in which the
leftmost nonterminal in each sentential form
is the one that is expanded
• A derivation may be neither leftmost nor
rightmost
• Derivation order has no effect on the
language generated by a grammar
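The claim about derivation order can be checked on the small sentence grammar from the earlier example. This Python sketch enumerates every sentence twice, once always expanding the leftmost nonterminal and once always expanding the rightmost, and compares the results. The names `GRAMMAR`, `sentences`, and `pick` are hypothetical, and the enumeration terminates only because this particular grammar has no recursive rules.

```python
GRAMMAR = {
    "<sentence>":    [["<noun-phrase>", "<verb-phrase>", "."]],
    "<noun-phrase>": [["<article>", "<noun>"]],
    "<article>":     [["a"], ["the"]],
    "<noun>":        [["girl"], ["dog"]],
    "<verb-phrase>": [["<verb>", "<noun-phrase>"]],
    "<verb>":        [["sees"], ["pets"]],
}

def sentences(form, pick):
    """Yield every sentence derivable from the sentential form.

    pick selects which nonterminal position to expand next:
    min gives a leftmost derivation, max a rightmost one.
    """
    positions = [i for i, symbol in enumerate(form) if symbol in GRAMMAR]
    if not positions:
        yield " ".join(form)   # only terminals remain: this is a sentence
        return
    i = pick(positions)
    for rhs in GRAMMAR[form[i]]:
        yield from sentences(form[:i] + rhs + form[i + 1:], pick)

leftmost = set(sentences(["<sentence>"], min))
rightmost = set(sentences(["<sentence>"], max))
print(len(leftmost), leftmost == rightmost)
```

Both orders produce the same set of sentences; the order changes only the sequence of sentential forms visited along the way.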
Context-free?
• In the previous example you might wonder
about the idea of context
• In a context-free grammar, a rule can replace
its nonterminal wherever that nonterminal
occurs; no surrounding context forbids it
– The dog pets the girl
– The girl pets the dog
• Both sentences can be derived from the grammar.
If there were contexts in which a rule did not
apply, the grammar would not be "context-free"
Example: A leftmost derivation of the expression
(x-y)*x+y
‹expr› => ‹expr› + ‹term›
       => ‹term› + ‹term›
       => ‹term› * ‹factor› + ‹term›
       => ‹factor› * ‹factor› + ‹term›
       => ( ‹expr› ) * ‹factor› + ‹term›
       => ( ‹expr› - ‹term› ) * ‹factor› + ‹term›
       => ( ‹term› - ‹term› ) * ‹factor› + ‹term›
       => ( ‹factor› - ‹term› ) * ‹factor› + ‹term›
       => ( ‹var› - ‹term› ) * ‹factor› + ‹term›
       => ( x - ‹term› ) * ‹factor› + ‹term›
       => ( x - ‹factor› ) * ‹factor› + ‹term›
       => ( x - ‹var› ) * ‹factor› + ‹term›
       => ( x - y ) * ‹factor› + ‹term›
       => ( x - y ) * ‹var› + ‹term›
       => ( x - y ) * x + ‹term›
       => ( x - y ) * x + ‹factor›
       => ( x - y ) * x + ‹var›
       => ( x - y ) * x + y
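Each step of this derivation can be checked against the grammar. The following Python sketch verifies that every sentential form follows from the previous one by expanding its leftmost nonterminal with one of the rules. The names `GRAMMAR`, `is_leftmost_step`, and `DERIVATION` are hypothetical, and nonterminals are written with ASCII angle brackets here for convenience.

```python
GRAMMAR = {
    "<expr>":   [["<term>"], ["<expr>", "+", "<term>"], ["<expr>", "-", "<term>"]],
    "<term>":   [["<factor>"], ["<term>", "*", "<factor>"]],
    "<factor>": [["<var>"], ["(", "<expr>", ")"]],
    "<var>":    [["x"], ["y"]],
}

def is_leftmost_step(before, after):
    """True if `after` results from expanding the leftmost nonterminal of `before`."""
    for i, symbol in enumerate(before):
        if symbol in GRAMMAR:
            return any(before[:i] + rhs + before[i + 1:] == after
                       for rhs in GRAMMAR[symbol])
    return False  # `before` has no nonterminal left to expand

DERIVATION = """
<expr>
<expr> + <term>
<term> + <term>
<term> * <factor> + <term>
<factor> * <factor> + <term>
( <expr> ) * <factor> + <term>
( <expr> - <term> ) * <factor> + <term>
( <term> - <term> ) * <factor> + <term>
( <factor> - <term> ) * <factor> + <term>
( <var> - <term> ) * <factor> + <term>
( x - <term> ) * <factor> + <term>
( x - <factor> ) * <factor> + <term>
( x - <var> ) * <factor> + <term>
( x - y ) * <factor> + <term>
( x - y ) * <var> + <term>
( x - y ) * x + <term>
( x - y ) * x + <factor>
( x - y ) * x + <var>
( x - y ) * x + y
"""
forms = [line.split() for line in DERIVATION.strip().splitlines()]
print(all(is_leftmost_step(a, b) for a, b in zip(forms, forms[1:])))
```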
Example: A grammar for a small language
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var> |
               <var> - <var> |
               <var>
Example: A leftmost derivation in this grammar
<program> => begin <stmt_list> end
          => begin <stmt> ; <stmt_list> end
          => begin <var> = <expression> ; <stmt_list> end
          => begin A = <expression> ; <stmt_list> end
          => begin A = <var> + <var> ; <stmt_list> end
          => begin A = B + <var> ; <stmt_list> end
          => begin A = B + C ; <stmt_list> end
          => begin A = B + C ; <stmt> end
          => begin A = B + C ; <var> = <expression> end
          => begin A = B + C ; B = <expression> end
          => begin A = B + C ; B = <var> end
          => begin A = B + C ; B = C end
Parse Tree
• Grammars naturally describe the hierarchical
syntactic structure of the sentences of the
languages they define
• These hierarchical structures are called parse
trees
– Every internal node of a parse tree is labeled with a
nonterminal symbol
– Every leaf is labeled with a terminal symbol
– Every subtree of a parse tree describes one instance of
an abstraction in the statement
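These properties can be illustrated with a hand-built parse tree for the sentence `begin A = B + C ; B = C end` from the small language above. This is a Python sketch under assumed conventions: a leaf is a plain string, an internal node is a `(label, children)` tuple, and `<program>` is assumed to be the name of the start symbol. The helper names `var`, `leaves`, and `show` are hypothetical.

```python
def var(name):
    """Build the subtree for one instance of the <var> abstraction."""
    return ("<var>", [name])

# Internal nodes are (nonterminal, children); leaves are terminal strings.
STMT1 = ("<stmt>", [var("A"), "=", ("<expression>", [var("B"), "+", var("C")])])
STMT2 = ("<stmt>", [var("B"), "=", ("<expression>", [var("C")])])
TREE = ("<program>", [
    "begin",
    ("<stmt_list>", [STMT1, ";", ("<stmt_list>", [STMT2])]),
    "end",
])

def leaves(node):
    """Return the terminal symbols at the leaves, left to right."""
    if isinstance(node, str):
        return [node]
    _label, children = node
    return [leaf for child in children for leaf in leaves(child)]

def show(node, depth=0):
    """Print the tree with one level of indentation per tree level."""
    if isinstance(node, str):
        print("  " * depth + node)
    else:
        label, children = node
        print("  " * depth + label)
        for child in children:
            show(child, depth + 1)

show(TREE)
print(" ".join(leaves(TREE)))
```

Reading the leaves from left to right gives back the sentence, and each subtree (for example `STMT1`) corresponds to one instance of an abstraction in it.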