INTRODUCTION TO COMPUTER SCIENCE - PART 8 pot

Bạn đang xem bản rút gọn của tài liệu. Xem và tải ngay bản đầy đủ của tài liệu tại đây (114.26 KB, 3 trang )

INTRODUCTION TO COMPUTER SCIENCE
HANDOUT #8. REGULAR EXPRESSIONS
K5 & K6, Computer Science Department, Vaên Lang University
Second semester Feb, 2002
Instructor: Traàn Ñöùc Quang
Major themes:
1. Introduction
2. Algebraic Laws for Regular Expressions
Reading: Sections 10.5 and 10.7.
8.1 INTRODUCTION
In the previous handout, we have studied a finite automaton which is, in a sense, a
machine recognizing character strings with some patterns. In this handout, we meet
regular expressions which is an algebraic way to describe a string pattern. Regular
expressions are analogous to the algebra of arithmetic expressions with which we are
all familiar, and the relational algebra that we met in Handout #3.
Interestingly, the set of patterns that can be expressed in the regular-expression
algebra is exactly the same set of patterns that can be described by automata. For
example, the regular expression a | bc* can express the patterns described by the fol-
lowing automaton.
A
start
B
a
D
b
c
46 INTRODUCTION TO COMPUTER SCIENCE: HANDOUT #8. REGULAR EXPRESSIONS
A set of strings denoted by a regular expression E is called a regular language and
can be referred to as L(E). Given two regular languages L and M, we can construct new
languages by applying the operators below one or more times to these languages:
1. The union of two languages L and M, denoted L ∪ M, is the set of strings that

are in either L or M. For example, if L = {00, 01, 11} and M = {aa, ab}, the union
L ∪ M = {00, 01, 11, aa, ab}.
2. The concatenation of languages L and M, denoted LM, is the set of strings that
can be formed by taking any string in L and concatenating it with any string in
M. For the above example, we have LM = {00aa, 01aa, 11aa, 00ab, 01ab, 11ab}.
3. The closure (star, or Kleene closure) of a language L is denoted L* and represents
the set of strings that can be formed by taking any number of strings from L and
concatenating all of them.
Algebra of regular expressions, like all kinds of algebras, starts with some elementary
expressions, usually constants and/or variables denoting languages. We then construct
more expressions by applying the set of three operators (union, concatenation, star) to
these elemetary expressions and to previously constructed expressions. In particular:
1. Empty string is denoted by the regular expression ε.
2. A symbol a is denoted by the regular expression a.
3. Suppose R and S are two regular expressions denoting the languages L(R) and
L(S), respectively.
a) R + S (or R | S) is a regular expression for the union of the languages L(R)
and L(S).
b) RS is a regular expression for the concatenation of the languages L(R) and
L(S).
c) R* is a regular expression for the Kleen closure of the language L(R).
For a simple example, suppose letter is any character in the alphabet, digit is
any decimal digits, and under is the symbol _. We can define a regular expression as a
pattern of legal identifiers in C as follows:
identifiers = (letter|under)(letter|digit|under)*
As in arithmetic expressions, we can use parentheses to group regular expressions.
8.2 ALGEBRAIC LAWS FOR REGULAR EXPRESSIONS
It is possible for two regular expressions to denote the same language, just as two
arithmetic expressions can denote the same function of their operands. As an example,
the arithmetic expressions x + y and y + x each denote the same function of x and y,

8.3 GLOSSARY 47
because addition is commutative. So are the regular expressions R + S and S + R.
We now list some common algebraic laws for regular expressions with no proofs.
More useful laws can be found in the textbook.
1. Commutativity of union. (R | S) ≡ (S | R)
2. Associativity of union. ((R | S) | T) ≡ (R | (S | T))
3. Associativity of concatenation. ((RS)T) ≡ (R(ST))
4. Left distributivity of concatenation over union. (R(S | T)) ≡ (RS | RT)
5. Right distributivity of concatenation over union. ((S | T)R) ≡ (SR | TR)
6. Idempotence of union. (R | R) ≡ R
7. RR* ≡ R*R
8.3 GLOSSARY
Regular Expression: Biểu thức chính quy.
Arithmetic Expression: Biểu thức số học.
Algebra: Đại số.
algebra of regular expressions: đại số biểu thức chính quy.
algebra of arithmetic expressions: đại số biểu thức số học.
set algebra: đại số tập hợp.
relational algebra: đại số quan hệ.
Algebraic law: Luật đại số (tính chất đại số).
Commutative law, commutativity: Luật giao hoán, tính giao hoán.
Associative law, associativity: Luật kết hợp, tính kết hợp.
Distributive law, distributivity: Luật phân phối, tính phân phối.
Idempotence: Tính lũy đẳng.
Regular language, regular set: Ngôn ngữ chính quy, tập chính quy.
Union: Phép hợp.
Concatenation: Phép ghép nối.
Star, Kleene closure: Phép toán sao, phép lấy bao đóng Kleene.

INTRODUCTION TO COMPUTER SCIENCE - PART 8 pot

Tài liệu liên quan

Tài liệu bạn tìm kiếm đã sẵn sàng tải về