INTRODUCTION TO COMPUTER SCIENCE
HANDOUT #9. GRAMMARS
K5 & K6, Computer Science Department, Văn Lang University
Second semester Feb, 2002
Instructor: Trần Đức Quang
Major themes:
1. Context-Free Grammars
2. Languages from Grammars
Reading: Sections 11.2 and 11.3.
9.1 CONTEXT-FREE GRAMMARS
In the last two handouts, we met two equivalent ways to describe patterns. In this
handout, we shall see another, more powerful way, called context-free grammars
(or simply "grammars"). They are more powerful in the sense that they can describe
more languages than the other two.
Suppose we want to define arithmetic expressions that involve
1. The four binary operators, +, −, ∗, and /,
2. Parentheses for grouping, and
3. Operands that are numbers.
The usual definition is of the form:
BASIS. A number is an expression.
INDUCTION. If E is an expression, then each of the following is also an expression.
In rules (2) to (5), the two occurrences of E may stand for different expressions.
1. ( E ). That is, we may place parentheses around an expression to get a new
expression.
2. E + E. That is, two expressions connected by a plus sign form an expression.
3. E − E. This and the next two rules are analogous to (2) with the other operators.
4. E ∗ E.
5. E / E.
More succinctly, we can use a grammar to define our expressions:
(1) <Expression> → number
(2) <Expression> → (<Expression>)
(3) <Expression> → <Expression> + <Expression>
(4) <Expression> → <Expression> − <Expression>
(5) <Expression> → <Expression> ∗ <Expression>
(6) <Expression> → <Expression> / <Expression>
The symbol <Expression> is called a syntactic category, or variable; it stands for
any arithmetic expression. The symbol → means "can be composed of". For example,
rule (2) states that an expression can be composed of a left parenthesis, followed by
any string that is an expression, followed by a right parenthesis.
There are three kinds of symbols that appear in grammars.
1. The first are "metasymbols," symbols that play special roles and do not stand for
themselves. The only example we have seen so far is the symbol →, which is
used to separate the syntactic category being defined from a way in which
strings of that syntactic category may be composed.
2. The second kind of symbol is a syntactic category, which, as we mentioned,
represents a set of strings being defined.
3. The third kind of symbol is called a terminal. Terminals can be characters such
as + or (, or they can be abstract symbols whose meaning is already known and
need not be defined in the grammar. The symbol number in our grammar is a
terminal of this abstract kind.
A context-free grammar consists of one or more productions. Each line in our grammar
is a production. In general, a production has three parts:
1. A head, which is the syntactic category on the left side of the arrow,
2. The metasymbol →, and
3. A body, consisting of zero or more syntactic categories and/or terminals on the
right side of the arrow.
Our grammar for simple expressions has six productions numbered 1 to 6.
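One way to make the head/body structure concrete is to store each production as a
pair. The Python sketch below is only an illustration, not part of the handout's
method: the names Production, GRAMMAR, is_category, and terminals are our own, the
operators are written in ASCII (- and *), and we assume that syntactic categories
are exactly the symbols written in angle brackets.

```python
from typing import NamedTuple, Tuple


class Production(NamedTuple):
    head: str               # the syntactic category on the left of the arrow
    body: Tuple[str, ...]   # zero or more symbols on the right of the arrow


E = "<Expression>"

# Productions (1) to (6) of the expression grammar.
# The arrow is a metasymbol, so it is not stored anywhere.
GRAMMAR = [
    Production(E, ("number",)),
    Production(E, ("(", E, ")")),
    Production(E, (E, "+", E)),
    Production(E, (E, "-", E)),
    Production(E, (E, "*", E)),
    Production(E, (E, "/", E)),
]


def is_category(symbol: str) -> bool:
    """Assumed convention: syntactic categories are written in angle brackets."""
    return len(symbol) > 2 and symbol.startswith("<") and symbol.endswith(">")


def terminals(grammar):
    """Collect every body symbol that is not a syntactic category."""
    return {s for p in grammar for s in p.body if not is_category(s)}


print(len(GRAMMAR), "productions")
print("terminals:", sorted(terminals(GRAMMAR)))
```

Keeping the body as a tuple of symbols, rather than as one string, lets a body be
empty and keeps a multi-character terminal such as number in one piece.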
We can augment the grammar for expressions by providing productions for number, a
symbol that has so far been viewed as a terminal and now becomes the syntactic
category <Number>, and productions for a new syntactic category <Digit>. Three
more productions can be added to our working grammar.
(7) <Digit> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
(8) <Number> → <Digit>
(9) <Number> → <Number> <Digit>

In fact, line (7) is shorthand for ten productions, one for each of the ten decimal
digits. The vertical bar | is another metasymbol; it separates the alternative bodies.
<Digit> → 0
<Digit> → 1
. . .
<Digit> → 9
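The expansion of this shorthand is mechanical, as the following Python sketch
suggests. The function name expand is hypothetical, and the sketch assumes that
body symbols are separated by spaces and that neither → nor | is itself used as a
terminal of the grammar.

```python
def expand(line: str):
    """Split 'head → b1 | b2 | ...' into one (head, body) pair per alternative."""
    head, bodies = line.split("→", 1)
    return [(head.strip(), tuple(b.split())) for b in bodies.split("|")]


digit_productions = expand("<Digit> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9")
for head, body in digit_productions:
    print(head, "→", " ".join(body))
```

A line without a vertical bar, such as production (9), simply yields a single
(head, body) pair.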
Putting these together, a more complete grammar for expressions is:
(1) <Expression> → <Number>
(2) <Expression> → ( <Expression> )
(3) <Expression> → <Expression> + <Expression>
(4) <Expression> → <Expression> − <Expression>
(5) <Expression> → <Expression> ∗ <Expression>
(6) <Expression> → <Expression> / <Expression>
(7) <Digit> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
(8) <Number> → <Digit>
(9) <Number> → <Number> <Digit>
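Productions (7) to (9) read directly as a recursive test for membership in
<Number>. The sketch below is one possible rendering in Python; the function names
is_digit and is_number are our own, and the input is assumed to be an ordinary
string of characters.

```python
DIGITS = set("0123456789")   # the bodies of the ten <Digit> productions


def is_digit(s: str) -> bool:
    """Production (7): a <Digit> is exactly one of the ten decimal digits."""
    return s in DIGITS


def is_number(s: str) -> bool:
    """Decide whether s belongs to the syntactic category <Number>."""
    # Production (8): <Number> → <Digit>
    if is_digit(s):
        return True
    # Production (9): <Number> → <Number> <Digit>
    return len(s) > 1 and is_number(s[:-1]) and is_digit(s[-1])


for candidate in ["7", "2002", "007", "", "12a"]:
    print(repr(candidate), is_number(candidate))
```

Note that the grammar, and therefore this test, accepts leading zeros and rejects
the empty string, because every <Number> contains at least one <Digit>.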
We can also describe the structure of control flow in a language like C grammatically.
For a simple example, it helps to imagine that there are abstract terminals condition
and simpleStat. The former stands for a conditional expression. We could replace this
terminal by a syntactic category, say <Condition>. The productions for <Condition>
would resemble those of our expression grammar above, but with logical operators like
&&, comparison operators like <, and the arithmetic operators.
The terminal simpleStat stands for a statement that does not involve a nested control
structure, such as an assignment, a function call, or a break, continue, or return
statement. Again, we could replace this terminal by a syntactic category and the
productions to expand it.
In the grammar for statements below, we use keywords like if, else, and while, and
punctuators like { and ;, as terminals.
<Statement> → while ( condition ) <Statement>
<Statement> → if ( condition ) <Statement>
<Statement> → if ( condition ) <Statement> else <Statement>
<Statement> → { <StatList> }
<Statement> → simpleStat ;
<StatList> → ε
<StatList> → <StatList> <Statement>
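To see how these productions constrain a sequence of terminals, we can sketch a
small recognizer in Python. This is an illustration under stated assumptions, not
a technique taken from the handout: the input is assumed to be already split into
terminals (a list of strings), the helper names expect, parse_statement, and
is_statement are hypothetical, an else is attached to the nearest if, and the two
<StatList> productions are handled by a loop that reads zero or more statements.

```python
def expect(tokens, i, *wanted):
    """Return the position after the wanted terminals, or None if they are not next."""
    if i is None:
        return None
    j = i + len(wanted)
    return j if tuple(tokens[i:j]) == wanted else None


def parse_statement(tokens, i):
    """Try to read one <Statement> starting at position i.

    Return the position just after it, or None if no statement starts there.
    """
    if i is None or i >= len(tokens):
        return None
    tok = tokens[i]
    if tok == "while":
        # <Statement> → while ( condition ) <Statement>
        j = expect(tokens, i, "while", "(", "condition", ")")
        return parse_statement(tokens, j)
    if tok == "if":
        # <Statement> → if ( condition ) <Statement> [ else <Statement> ]
        j = expect(tokens, i, "if", "(", "condition", ")")
        j = parse_statement(tokens, j)
        if j is not None and j < len(tokens) and tokens[j] == "else":
            return parse_statement(tokens, j + 1)
        return j
    if tok == "{":
        # <Statement> → { <StatList> }, where <StatList> is zero or more statements
        j = i + 1
        while j < len(tokens) and tokens[j] != "}":
            j = parse_statement(tokens, j)
            if j is None:
                return None
        return expect(tokens, j, "}")
    # <Statement> → simpleStat ;
    return expect(tokens, i, "simpleStat", ";")


def is_statement(tokens) -> bool:
    """True if the whole token list forms exactly one <Statement>."""
    tokens = list(tokens)
    return parse_statement(tokens, 0) == len(tokens)


print(is_statement("while ( condition ) { simpleStat ; simpleStat ; }".split()))
print(is_statement("if ( condition ) else simpleStat ;".split()))
```

The first call should report that the token list is a statement, and the second
that it is not, since no statement sits between the condition and the else.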
9.2 LANGUAGES FROM GRAMMARS
A grammar is essentially an inductive definition involving sets of strings. Thus, from a
grammar, we can produce the set of strings that belong to a syntactic category by
applying the productions repeatedly. Each time, we substitute strings already known
to belong to the syntactic categories in a body, and so obtain more and more strings
for the head.
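The process of applying productions round by round can be sketched in Python as
follows. The names one_round and generate are hypothetical, the operators are
written in ASCII, and symbols within a generated string are separated by single
spaces; these are choices of the sketch, not of the handout.

```python
from itertools import product

E = "<Expression>"

# Productions (1) to (6) of the first expression grammar, as (head, body) pairs.
GRAMMAR = [
    (E, ["number"]),
    (E, ["(", E, ")"]),
    (E, [E, "+", E]),
    (E, [E, "-", E]),
    (E, [E, "*", E]),
    (E, [E, "/", E]),
]


def one_round(grammar, known):
    """Apply every production once, filling each syntactic category in a body
    with strings already known to belong to that category."""
    new = {head: set(strings) for head, strings in known.items()}
    for head, body in grammar:
        # A syntactic category offers its known strings; a terminal offers itself.
        choices = [list(known[s]) if s in known else [s] for s in body]
        for pieces in product(*choices):
            new[head].add(" ".join(pieces))
    return new


def generate(grammar, rounds):
    """Start with no known strings and perform the given number of rounds."""
    known = {head: set() for head, _ in grammar}
    for _ in range(rounds):
        known = one_round(grammar, known)
    return known


for s in sorted(generate(GRAMMAR, 2)[E]):
    print(s)
```

After one round only the basis string number is known; the second round adds the
parenthesized form and the four binary forms. The set keeps growing with every
further round, so the language of <Expression> is infinite.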
If a grammar has more than one syntactic category, then by convention the syntactic
category whose strings we want to obtain is written first. In some compiler
textbooks, this syntactic category is called the start symbol. For example, in our
first grammar the start symbol is <Expression>, whereas in the second it is
<Statement>.
9.3 GLOSSARY
Grammar: Văn phạm.
Context-free grammar: Văn phạm phi ngữ cảnh.
Syntax: Cú pháp.
Syntactic Category: Phạm trù cú pháp.
Plus sign: Dấu cộng.
Minus sign: Dấu trừ.
Metasymbol: Meta ký hiệu.
Terminal: Ký hiệu tận, tận.
Nonterminal: Ký hiệu chưa tận, chưa tận.
Production: Luật sinh.
Head: Đầu (luật sinh).
Body: Thân (luật sinh).
Decimal Digit: Ký số thập phân.

Control Structure: Cấu trúc điều khiển.
Start Symbol: Ký hiệu khởi đầu, khởi tự.
