Chapter 3

Describing Syntax
and Semantics

ISBN 0-321-33025-0

Chapter 3 Topics
• Introduction
• The General Problem of Describing Syntax
• Formal Methods of Describing Syntax
• Attribute Grammars

• Describing the Meanings of Programs:
Dynamic Semantics

Copyright © 2006 Addison-Wesley. All rights reserved.


Introduction
• Without a clear language definition, a language
may be hard to learn and hard to implement, and
any ambiguity in the specification may lead to
dialect differences
• Most new programming languages are subjected
to a period of scrutiny by potential users before
their designs are completed
• Who must use language definitions:

– Other language designers
– Implementors
– Programmers (the users of the language)

Introduction (cont.)
• The study of programming languages can be
divided into examinations of syntax and
semantics
– Syntax - the form or structure of the expressions,
statements, and program units
– Semantics - the meaning of the expressions,
statements, and program units

• Semantics should follow from syntax: the form
of a statement should be clear and should imply
what the statement does or how it should be used

Example
• Syntax example: a simple C if statement
if (<expr>)
    <true-statement>
else
    <false-statement>
• Semantics example:
If the expression evaluates to true, execute
the true statement; otherwise, execute the false
statement
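
The semantic rule above can be read operationally. The following is a minimal sketch, not a definition taken from the slides: the name `exec_if` and the choice to model the expression and both statements as zero-argument callables are assumptions made for illustration.

```python
def exec_if(expr, true_statement, false_statement):
    """Operational reading of: if (<expr>) <true-statement> else <false-statement>."""
    # All three arguments are zero-argument callables (an assumption of this
    # sketch), so neither branch runs until the condition has been evaluated.
    if expr():
        return true_statement()
    return false_statement()


trace = []
exec_if(lambda: 3 > 2, lambda: trace.append("true branch"), lambda: trace.append("false branch"))
exec_if(lambda: 3 < 2, lambda: trace.append("true branch"), lambda: trace.append("false branch"))
print(trace)  # ['true branch', 'false branch']
```

Exactly one of the two statements is executed per call, which is the meaning the syntax alone does not state.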

The General Problem of Describing
Syntax
• A sentence is a string of characters over some
alphabet
• A language is a set of sentences
• A lexeme is the lowest-level syntactic unit of a
language (e.g., *, +, =, sum, begin)
• A token is a category of lexemes (e.g.,
identifier, number, operator, …)


Example
index = 2 * count + 10;

Lexeme    Token
index     identifier
=         equal_sign
2         int_literal
*         mult_op
count     identifier
+         plus_op
10        int_literal
;         semicolon
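
The lexeme/token pairing in this example can be reproduced with a small scanner. This is a sketch under assumptions: the regular expressions chosen for each token category (for instance, that an identifier starts with a letter or underscore) and the names `TOKEN_SPEC` and `tokenize` are illustrative and do not come from the slides.

```python
import re

# Token categories and the lexeme patterns assumed for this sketch.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("equal_sign", r"="),
    ("mult_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
    ("skip", r"\s+"),  # whitespace separates lexemes and is discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return (lexeme, token) pairs; raise ValueError on an unknown character."""
    pairs = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs


for lexeme, token in tokenize("index = 2 * count + 10;"):
    print(f"{lexeme:<8}{token}")
```

Both `index` and `count` come out as `identifier`: many lexemes can belong to the same token category.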


The Definition of Languages
• Languages can be formally defined in two
distinct ways: by recognition and by
generation
• Language Recognizers
– A recognition device for a language reads
input strings and decides whether each one
belongs to the language
– Example: syntax analysis part of a compiler


The Definition of Languages (cont.)
• Language Generators
– A device that generates sentences of a language
– One can determine if the syntax of a particular
sentence is correct by comparing it to the
structure of the generator


Language Recognizers vs. Generators
• The language recognizer can only be used
in trial-and-error mode (black box)
• The structure of the language generator is
an open book that people can easily read
and understand
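
To make the contrast concrete, here is a sketch of a recognizer and a generator for the same language. The language used, strings of n a's followed by n b's with n >= 1, is a hypothetical example chosen for brevity and does not appear in the slides; the function names are likewise illustrative.

```python
from itertools import islice


def recognize(sentence):
    """Recognizer: decide whether `sentence` has the form a^n b^n with n >= 1."""
    n = len(sentence) // 2
    return n >= 1 and sentence == "a" * n + "b" * n


def generate():
    """Generator: produce the sentences of the same language, shortest first."""
    n = 1
    while True:
        yield "a" * n + "b" * n
        n += 1


print(list(islice(generate(), 3)))           # ['ab', 'aabb', 'aaabbb']
print(recognize("aabb"), recognize("abab"))  # True False
```

The recognizer only answers yes or no for one input at a time, whereas reading the generator shows the shape of every sentence at once.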


Formal Methods of Describing Syntax
• This section discusses the formal language
generation mechanisms that are commonly
used to describe the syntax of programming
languages
• These mechanisms are often called
grammars
• We will discuss the class of languages called
context-free languages

Context-Free Grammars
• Noam Chomsky
– A linguist
– Described four classes of grammars in the mid-1950s

• Two of these grammar classes, context-free
and regular grammars, are useful in
computer science
– The tokens of programming languages can be
described by regular grammars

– Whole programming languages, with minor
exceptions, can be described by context-free
grammars

Backus-Naur Form
• Invented by John Backus to describe Algol 58
• BNF is a metalanguage
– A metalanguage is a language used to describe
another language

• BNF is equivalent to context-free grammars
• In BNF, abstractions (also called nonterminal
symbols) are used to represent classes of
syntactic structures

Backus-Naur Form (cont.)
• Although BNF is simple, it is sufficiently
powerful to describe the great majority of
the syntax of programming languages:
– Lists of similar constructs
– The order in which different constructs must
appear

– Nested structures to any depth
– Operator precedence
– Operator associativity
– …

Grammar and Rules
• A grammar is a finite nonempty set of rules
• A rule has a left-hand side (LHS) and a right-hand
side (RHS), and consists of terminal
(lexeme or token) and nonterminal symbols
<assign> → <var> = <expression>

• An abstraction (or nonterminal symbol) can
have more than one RHS
<if_stmt> → if <logic_expr> then <stmt> |
            if <logic_expr> then <stmt> else <stmt>

Describing Lists
• Syntactic lists (for example, a list of
identifiers appearing in a data declaration
statement) are described using recursion
• A rule is recursive if its LHS appears in its
RHS
<ident_list> → identifier | identifier , <ident_list>

Note: the comma "," is a terminal
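
The recursive rule maps directly onto a recursive function. The sketch below is one possible recognizer for `<ident_list>`; the input is assumed to be already split into lexemes, and the pattern used for the `identifier` token is an assumption, since the slides do not define it.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")  # assumed shape of the identifier token


def is_ident_list(tokens):
    """<ident_list> -> identifier | identifier , <ident_list>"""
    if not tokens or not IDENTIFIER.fullmatch(tokens[0]):
        return False
    if len(tokens) == 1:  # first alternative: a single identifier
        return True
    # second alternative: identifier , <ident_list> (the recursive case)
    return tokens[1] == "," and is_ident_list(tokens[2:])


print(is_ident_list(["sum", ",", "count", ",", "index"]))
print(is_ident_list(["sum", ","]))
```

The function calls itself exactly where `<ident_list>` appears on the right-hand side of the rule, so a list of any length is handled by a rule of fixed size.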


Example: A grammar of expressions
• Seven terminal symbols:
+ - * ( ) x y
• Four nonterminal symbols:
‹expr› ‹term› ‹factor› ‹var›
• Start/goal symbol:
‹expr›

Rules of grammar
‹expr› → ‹term› | ‹expr› + ‹term› | ‹expr› - ‹term›
‹term› → ‹factor› | ‹term› * ‹factor›
‹factor› → ‹var› | ( ‹expr› )
‹var› → x | y
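
A recognizer for this grammar can be sketched by hand as a set of mutually recursive functions, one per nonterminal. Note the swapped-in technique: the rules for ‹expr› and ‹term› are left-recursive, which a naive recursive function cannot follow directly, so this sketch replaces the left recursion with loops that accept the same strings. The function name `recognize_expr` and the treatment of each character as one lexeme are assumptions.

```python
def recognize_expr(text):
    """Return True if `text` is a sentence of the expression grammar."""
    tokens = [ch for ch in text if not ch.isspace()]  # one character per lexeme
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def expr():
        # <expr> -> <term> | <expr> + <term> | <expr> - <term>, written as a loop
        if not term():
            return False
        while peek() in ("+", "-"):
            advance()
            if not term():
                return False
        return True

    def term():
        # <term> -> <factor> | <term> * <factor>, written as a loop
        if not factor():
            return False
        while peek() == "*":
            advance()
            if not factor():
                return False
        return True

    def factor():
        # <factor> -> <var> | ( <expr> ), with <var> -> x | y
        if peek() in ("x", "y"):
            advance()
            return True
        if peek() == "(":
            advance()
            if expr() and peek() == ")":
                advance()
                return True
        return False

    return expr() and pos == len(tokens)


print(recognize_expr("(x-y)*x+y"))
print(recognize_expr("x+*y"))
```

The final check `pos == len(tokens)` rejects inputs that begin with a valid expression but have leftover symbols.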


Derivations
• A derivation is a repeated application of
rules, starting with the start symbol and
ending with a sentence (all terminal
symbols)
• Derivations can be used to generate all the
possible sentences of the language a grammar defines
• Every string of symbols in the derivation is a
sentential form


Example
<sentence> → <noun-phrase> <verb-phrase> .
<noun-phrase> → <article> <noun>
<article> → a | the
<noun> → girl | dog
<verb-phrase> → <verb> <noun-phrase>
<verb> → sees | pets

<sentence> ⇒ <noun-phrase> <verb-phrase> .
           ⇒ <article> <noun> <verb-phrase> .
           ⇒ the <noun> <verb-phrase> .
           ⇒ the girl <verb-phrase> .
           ⇒ the girl <verb> <noun-phrase> .
           ⇒ the girl sees <noun-phrase> .
           ⇒ the girl sees <article> <noun> .
           ⇒ the girl sees a <noun> .
           ⇒ the girl sees a dog .
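
The derivation above can be replayed mechanically. The sketch below stores the grammar as a dictionary and always expands the leftmost nonterminal; the dictionary layout, the function name `leftmost_derivation`, and the convention of passing a list of alternative indices are assumptions made for illustration.

```python
GRAMMAR = {
    "<sentence>": [["<noun-phrase>", "<verb-phrase>", "."]],
    "<noun-phrase>": [["<article>", "<noun>"]],
    "<article>": [["a"], ["the"]],
    "<noun>": [["girl"], ["dog"]],
    "<verb-phrase>": [["<verb>", "<noun-phrase>"]],
    "<verb>": [["sees"], ["pets"]],
}


def leftmost_derivation(start, choices):
    """Return every sentential form, expanding the leftmost nonterminal each step.

    `choices` supplies, in order, the index of the alternative to use whenever
    the nonterminal being expanded has more than one right-hand side.
    """
    choices = iter(choices)
    form = [start]
    steps = [form]
    while True:
        index = next((i for i, sym in enumerate(form) if sym in GRAMMAR), None)
        if index is None:  # only terminals remain: the form is a sentence
            return steps
        alternatives = GRAMMAR[form[index]]
        rhs = alternatives[next(choices)] if len(alternatives) > 1 else alternatives[0]
        form = form[:index] + rhs + form[index + 1:]
        steps.append(form)


# the, girl, sees, a, dog correspond to alternative indices 1, 0, 0, 0, 1
for step in leftmost_derivation("<sentence>", [1, 0, 0, 0, 1]):
    print(" ".join(step))
```

Changing the list of choices yields other sentences of the same language, for example `[0, 1, 1, 1, 0]` gives "a dog pets the girl .".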


Derivations (cont.)
• A sentence is a sentential form that has only
terminal symbols

• A leftmost derivation is one in which the
leftmost nonterminal in each sentential form
is the one that is expanded

• A derivation may be neither leftmost nor
rightmost
• Derivation order has no effect on the
language generated by a grammar

Context free?
• The previous example may raise the question of
what "context" means here
• In a context-free grammar, a rule can be applied
to a nonterminal wherever it appears; no
surrounding context can forbid the replacement
– The dog pets the girl
– The girl pets the dog

• Both sentences can be derived from the grammar
• If a rule could be applied only in certain
contexts, the grammar would no longer be
"context-free"

Example: A leftmost derivation of the expression
( x - y ) * x + y
‹expr›
⇒ ‹expr› + ‹term›
⇒ ‹term› + ‹term›
⇒ ‹term› * ‹factor› + ‹term›
⇒ ‹factor› * ‹factor› + ‹term›
⇒ ( ‹expr› ) * ‹factor› + ‹term›
⇒ ( ‹expr› - ‹term› ) * ‹factor› + ‹term›
⇒ ( ‹term› - ‹term› ) * ‹factor› + ‹term›
⇒ ( ‹factor› - ‹term› ) * ‹factor› + ‹term›
⇒ ( ‹var› - ‹term› ) * ‹factor› + ‹term›
⇒ ( x - ‹term› ) * ‹factor› + ‹term›
⇒ ( x - ‹factor› ) * ‹factor› + ‹term›
⇒ ( x - ‹var› ) * ‹factor› + ‹term›
⇒ ( x - y ) * ‹factor› + ‹term›
⇒ ( x - y ) * ‹var› + ‹term›
⇒ ( x - y ) * x + ‹term›
⇒ ( x - y ) * x + ‹factor›
⇒ ( x - y ) * x + ‹var›
⇒ ( x - y ) * x + y

Example: A grammar for a small language
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expression>
<var> → A | B | C
<expression> → <var> + <var> |
               <var> - <var> |
               <var>


Example: A leftmost derivation in this grammar

<program> ⇒ begin <stmt_list> end
⇒ begin <stmt> ; <stmt_list> end
⇒ begin <var> = <expression> ; <stmt_list> end
⇒ begin A = <expression> ; <stmt_list> end
⇒ begin A = <var> + <var> ; <stmt_list> end
⇒ begin A = B + <var> ; <stmt_list> end
⇒ begin A = B + C ; <stmt_list> end
⇒ begin A = B + C ; <stmt> end
⇒ begin A = B + C ; <var> = <expression> end
⇒ begin A = B + C ; B = <expression> end
⇒ begin A = B + C ; B = <var> end
⇒ begin A = B + C ; B = C end
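
A written-out derivation like this one can be checked step by step. The sketch below assumes the start symbol is named `<program>`, stores the small-language grammar as a dictionary, and verifies that each sentential form follows from the previous one by expanding its leftmost nonterminal; the helper names are illustrative.

```python
GRAMMAR = {
    "<program>": [["begin", "<stmt_list>", "end"]],
    "<stmt_list>": [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>": [["<var>", "=", "<expression>"]],
    "<var>": [["A"], ["B"], ["C"]],
    "<expression>": [["<var>", "+", "<var>"], ["<var>", "-", "<var>"], ["<var>"]],
}


def is_leftmost_step(before, after):
    """True if `after` results from expanding the leftmost nonterminal of `before`."""
    for i, symbol in enumerate(before):
        if symbol in GRAMMAR:
            return any(before[:i] + rhs + before[i + 1:] == after for rhs in GRAMMAR[symbol])
    return False  # `before` has no nonterminal left to expand


def is_leftmost_derivation(lines):
    """Check a whole derivation given as whitespace-separated sentential forms."""
    forms = [line.split() for line in lines]
    return forms[0] == ["<program>"] and all(
        is_leftmost_step(a, b) for a, b in zip(forms, forms[1:])
    )


derivation = [
    "<program>",
    "begin <stmt_list> end",
    "begin <stmt> ; <stmt_list> end",
    "begin <var> = <expression> ; <stmt_list> end",
    "begin A = <expression> ; <stmt_list> end",
    "begin A = <var> + <var> ; <stmt_list> end",
    "begin A = B + <var> ; <stmt_list> end",
    "begin A = B + C ; <stmt_list> end",
    "begin A = B + C ; <stmt> end",
    "begin A = B + C ; <var> = <expression> end",
    "begin A = B + C ; B = <expression> end",
    "begin A = B + C ; B = <var> end",
    "begin A = B + C ; B = C end",
]
print(is_leftmost_derivation(derivation))
```

A derivation that expands some other nonterminal first is still a valid derivation of the same sentence, but this checker would reject it because it is not leftmost.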

Parse Tree

• Grammars naturally describe the hierarchical
syntactic structure of the sentences of the
languages they define
• These hierarchical structures are called parse
trees
– Every internal node of a parse tree is labeled with a
nonterminal symbol
– Every leaf is labeled with a terminal symbol
– Every subtree of a parse tree describes one instance of
an abstraction in the statement
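
These properties can be seen on a small tree. The sketch below hand-builds a parse tree for the statement A = B + C, using the rules `<stmt> → <var> = <expression>` and `<expression> → <var> + <var>` from the small-language grammar; the `Node` class and the helper functions are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                    # nonterminal or terminal symbol
    children: list = field(default_factory=list)  # empty for a leaf


def leaves(node):
    """Collect the leaf labels left to right; they spell out the derived string."""
    if not node.children:
        return [node.label]
    return [leaf for child in node.children for leaf in leaves(child)]


def show(node, depth=0):
    """Render the tree as indented text lines, one node per line."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(show(child, depth + 1))
    return lines


tree = Node("<stmt>", [
    Node("<var>", [Node("A")]),
    Node("="),
    Node("<expression>", [
        Node("<var>", [Node("B")]),
        Node("+"),
        Node("<var>", [Node("C")]),
    ]),
])
print("\n".join(show(tree)))
print(" ".join(leaves(tree)))  # A = B + C
```

Every internal node carries a nonterminal, every leaf a terminal, and the subtree rooted at `<expression>` is one instance of that abstraction within the statement.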
