Ebook Data structures and problem solving using C++ (2nd edition) Part 2

Chapter 12
Stacks and Compilers
Stacks are used extensively in compilers. In this chapter we present two simple components of a compiler: a balanced-symbol checker and a simple calculator. We do so to show simple algorithms that use stacks and to show how the STL classes described in Chapter 7 are used.
In this chapter, we show:
- how to use a stack to check for balanced symbols,
- how to use a state machine to parse symbols in a balanced-symbol program, and
- how to use operator precedence parsing to evaluate infix expressions in a simple calculator program.

12.1 Balanced-Symbol Checker

As discussed in Section 7.2, compilers check your programs for syntax errors. Frequently, however, a lack of one symbol (such as a missing */ comment-ender or }) causes the compiler to produce numerous lines of diagnostics without identifying the real error. A useful tool to help debug compiler error messages is a program that checks whether symbols are balanced. In other words, every { must correspond to a }, every [ to a ], and so on. However, simply counting the numbers of each symbol is insufficient. For example, the sequence [ ( ) ] is legal, but the sequence [ ( ] ) is wrong.

12.1.1 Basic Algorithm
A stack is useful here because we know that when a closing symbol such as ) is seen, it matches the most recently seen unclosed (. Therefore, by placing an opening symbol on a stack, we can easily determine whether a closing symbol makes sense. Specifically, we have the following algorithm.

A stack can be used to detect mismatched symbols.

Symbols: ( [ ] } ) [
Errors (indicated by * in the figure):
    } when expecting )
    ) with no matching opening symbol
    [ unmatched at end of input

Figure 12.1 Stack operations in a balanced-symbol algorithm.

1. Make an empty stack.
2. Read symbols until the end of the file.
   a. If the symbol is an opening symbol, push it onto the stack.
   b. If it is a closing symbol, do the following.
      i. If the stack is empty, report an error.
      ii. Otherwise, pop the stack. If the symbol popped is not the corresponding opening symbol, report an error.
3. At the end of the file, if the stack is not empty, report an error.
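The three steps can be sketched as a short function over a string. This is a minimal illustration, not the chapter's Balance class: the name isBalanced is hypothetical, the function returns a yes/no answer instead of printing diagnostics, and it makes no attempt to skip comments or quoted constants.

```cpp
#include <stack>
#include <string>

// Hypothetical helper: returns true if every (, [, and { in text is closed
// by the matching symbol in the correct order. Comments, string constants,
// and character constants are NOT skipped in this sketch.
bool isBalanced( const std::string & text )
{
    std::stack<char> pending;              // Step 1: unclosed opening symbols

    for( char ch : text )                  // Step 2: read every symbol
    {
        switch( ch )
        {
          case '(': case '[': case '{':
            pending.push( ch );            // Step 2a
            break;

          case ')': case ']': case '}':
          {
            if( pending.empty( ) )         // Step 2b(i): nothing to match
                return false;
            char open = pending.top( );    // Step 2b(ii): pop and compare
            pending.pop( );
            if( ( ch == ')' && open != '(' ) ||
                ( ch == ']' && open != '[' ) ||
                ( ch == '}' && open != '{' ) )
                return false;
            break;
          }

          default:                         // Any other character is ignored
            break;
        }
    }
    return pending.empty( );               // Step 3: leftovers are unmatched
}
```

For example, isBalanced( "[()]" ) is true, whereas isBalanced( "[(])" ) is false because the ] arrives while ( is on top of the stack.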

Symbols in comments, string constants, and character constants need not be balanced.

Line numbers are needed for meaningful error messages.

In this algorithm, illustrated in Figure 12.1, the fourth, fifth, and sixth symbols all generate errors. The } is an error because the symbol popped from the top of stack is a (, so a mismatch is detected. The ) is an error because the stack is empty, so there is no corresponding (. The [ is an error detected when the end of input is encountered and the stack is not empty.
To make this technique work for C++ programs, we need to consider all
the contexts in which parentheses, braces, and brackets need not match. For
example, we should not consider a parenthesis as a symbol if it occurs inside
a comment, string constant, or character constant. We thus need routines to
skip comments, string constants, and character constants. A character constant in C++ can be difficult to recognize because of the many escape
sequences possible, so we need to simplify things. We want to design a program that works for the bulk of inputs likely to occur.
For the program to be useful, we must not only report mismatches but
also attempt to identify where the mismatches occur. Consequently, we keep
track of the line numbers where the symbols are seen. When an error is
encountered, obtaining an accurate message is always difficult. If there is an
extra }, does that mean that the } is extraneous? Or was a { missing earlier?

We keep the error handling as simple as possible, but once one error has
been reported, the program may get confused and start flagging many errors.
Thus only the first error can be considered meaningful. Even so, the program
developed here is very useful.

12.1.2 Implementation
The program has two basic components. One part, called tokenization, is the process of scanning an input stream for opening and closing symbols (the tokens) and generating the sequence of tokens that need to be recognized. The second part is running the balanced-symbol algorithm, based on the tokens. The two basic components are represented as separate classes.

Tokenization is the process of generating the sequence of symbols (tokens) that need to be recognized.

Figure 12.2 shows the Tokenizer class interface, and Figure 12.3 shows the Balance class interface. The Tokenizer class provides a constructor that requires an istream and then provides a set of accessors that can be used to get
- the next token (either an opening/closing symbol for the code in this chapter or an identifier for the code in Chapter 13),
- the current line number, and
- the number of errors (mismatched quotes and comments).
The Tokenizer class maintains most of this information in private data members. The Balance class also provides a similar constructor, but its only publicly visible routine is checkBalance. Everything else is a supporting routine or a class data member.
We begin by describing the Tokenizer class. inputStream is a reference to an istream object and is initialized at construction. Because of the ios hierarchy (see Section 4.1), it may be initialized with an ifstream object. The current character being scanned is stored in ch, and the current line number is stored in currentLine. Finally, an integer that counts the number of errors is stored in errors. The constructor initializes the error count to 0 and the current line number to 1 and sets the istream reference.
We can now implement the class methods, which, as we mentioned, are concerned with keeping track of the current line and attempting to differentiate symbols that represent opening and closing tokens from those that are inside comments, character constants, and string constants. This general process of recognizing tokens in a stream of symbols is called lexical analysis.

Lexical analysis is used to ignore comments and recognize symbols.

Figure 12.4 shows a pair of routines, nextChar and putBackChar. The nextChar method reads the next character from inputStream, assigns it

    #include <fstream>
    #include <vector>
    #include <stack>
    #include <string>
    #include <stdlib.h>
    using namespace std;

    // Tokenizer class.
    //
    // CONSTRUCTION: with an istream that is ready to be read.
    //
    // ******************PUBLIC OPERATIONS***********************
    // char getNextOpenClose( )  --> Return next open/close symbol
    // int getLineNumber( )      --> Return current line number
    // int getErrorCount( )      --> Return number of parsing errors
    // string getNextID( )       --> Return next C++ identifier
    //                               (see Section 13.2)
    // ******************ERRORS**********************************
    // Mismatched ', ", and EOF reached in a comment are noted.

    class Tokenizer
    {
      public:
        Tokenizer( istream & input )
          : currentLine( 1 ), errors( 0 ), inputStream( input ) { }

        // The public routines.
        char getNextOpenClose( );
        string getNextID( );
        int getLineNumber( ) const;
        int getErrorCount( ) const;

      private:
        enum CommentType { SLASH_SLASH, SLASH_STAR };

        istream & inputStream;    // Reference to the input stream
        char ch;                  // Current character
        int currentLine;          // Current line
        int errors;               // Number of errors detected

        // A host of internal routines.
        bool nextChar( );
        void putBackChar( );
        void skipComment( CommentType start );
        void skipQuote( char quoteType );
        string getRemainingString( );
    };

Figure 12.2 The Tokenizer class interface, used to retrieve tokens from an input stream.


    #include "Tokenizer.h"
    #include <iostream>
    using namespace std;

    // Symbol is the class that will be placed on the stack.
    struct Symbol
    {
        char token;
        int  theLine;
    };

    // Balance class interface: check for balanced symbols.
    //
    // CONSTRUCTION: with an istream object.
    // ******************PUBLIC OPERATIONS********************
    // int checkBalance( )  --> Print mismatches
    //                          return number of errors

    class Balance
    {
      public:
        Balance( istream & input ) : tok( input ), errors( 0 ) { }
        int checkBalance( );

      private:
        Tokenizer tok;    // Token source
        int errors;       // Mismatched open/close symbol errors

        void checkMatch( const Symbol & opSym, const Symbol & clSym );
    };

Figure 12.3 Class interface for a balanced-symbol program.

to ch, and updates currentLine if a newline is encountered. It returns false only if the end of the file has been reached. The complementary procedure putBackChar puts the current character, ch, back onto the input stream, and decrements currentLine if the character is a newline. Clearly, putBackChar should be called at most once between calls to nextChar; as it is a private routine, we do not worry about abuse on the part of the class user. Putting characters back onto the input stream is a commonly used technique in parsing. In many instances we have read one too many characters, and undoing the read is useful. In our case this occurs after processing a /. We must determine whether the next character begins the comment start token; if it does not, we cannot simply disregard it because it could be an opening or closing symbol or a quote. Thus we pretend that it is never read.

    // nextChar sets ch based on the next character in
    // inputStream and adjusts currentLine if necessary.
    // It returns the result of get.
    // putBackChar puts the character back onto inputStream.
    // Both routines adjust currentLine if necessary.
    bool Tokenizer::nextChar( )
    {
        if( !inputStream.get( ch ) )
            return false;
        if( ch == '\n' )
            currentLine++;
        return true;
    }

    void Tokenizer::putBackChar( )
    {
        inputStream.putback( ch );
        if( ch == '\n' )
            currentLine--;
    }

Figure 12.4 The nextChar routine for reading the next character, updating currentLine if necessary, and returning true if not at the end of file; and the putBackChar routine for putting back ch and updating currentLine if necessary.

The state machine is a common technique used to parse symbols; at any point, it is in some state, and each input character takes it to a new state. Eventually, the state machine reaches a state in which a symbol has been recognized.

Next is the routine skipComment, shown in Figure 12.5. Its purpose is to skip over the characters in the comment and position the input stream so that the next read is the first character after the comment ends. This technique is complicated by the fact that comments can either begin with //, in which case the line ends the comment, or /*, in which case */ ends the comment.[1] In the // case, we continually get the next character until either the end of file is reached (in which case the first half of the && operator fails) or we get a newline. At that point we return. Note that the line number is updated automatically by nextChar. Otherwise, we have the /* case, which is processed by the second loop in the routine.
The skipComment routine uses a simplified state machine. The state machine is a common technique used to parse symbols; at any point, it is in some state, and each input character takes it to a new state. Eventually, it reaches a state at which a symbol has been recognized.
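To make the state-machine idea concrete, here is a small standalone sketch that scans for the two-character terminator */ using a single Boolean as the state. It is an illustration only: skipBlockComment and restAfterComment are hypothetical free functions operating on a plain std::istream, not members of the chapter's Tokenizer class, and they do no line counting or error reporting.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Consumes characters up to and including the first "*/".
// Returns true if the terminator was found, false if input ran out first.
bool skipBlockComment( std::istream & in )
{
    bool sawStar = false;    // false = state 0, true = state 1 (just saw '*')
    char ch;

    while( in.get( ch ) )
    {
        if( sawStar && ch == '/' )
            return true;             // State 2: both characters matched
        sawStar = ( ch == '*' );     // Move to state 1 on '*', else state 0
    }
    return false;                    // End of input inside the comment
}

// Hypothetical demonstration helper: returns the text that follows the
// first "*/" in text, or a marker string if the comment never ends.
std::string restAfterComment( const std::string & text )
{
    std::istringstream in( text );
    if( !skipBlockComment( in ) )
        return "<unterminated>";

    std::string rest;
    std::getline( in, rest, '\0' );  // Read everything that is left
    return rest;
}
```

Note that a run of several * characters keeps the machine in state 1, so input such as **/ is still recognized as ending the comment.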
In skipComment, at any point, it has matched 0, 1, or 2 characters of the */ terminator, corresponding to states 0, 1, and 2. If it matches two characters, it can return. Thus, inside the loop, it can be in only state 0 or 1 because, if it is in state 1 and sees a /, it returns immediately. Thus the state

[1] We do not consider deviant cases involving \.
    // Precondition: We are about to process a comment;
    //                 have already seen comment start token.
    // Postcondition: Stream will be set immediately after
    //                 comment ending token.
    void Tokenizer::skipComment( CommentType start )
    {
        if( start == SLASH_SLASH )
        {
            while( nextChar( ) && ( ch != '\n' ) )
                ;
            return;
        }

        // Look for */
        bool state = false;    // Seen first char in comment ender.

        while( nextChar( ) )
        {
            if( state && ch == '/' )
                return;
            state = ( ch == '*' );
        }
        errors++;
        cout << "Unterminated comment at line "
             << getLineNumber( ) << endl;
    }

Figure 12.5 The skipComment routine for moving past an already started comment.

can be represented by a Boolean variable that is true if the state machine is in state 1. If it does not return, it either goes back to state 1 if it encounters a * or goes back to state 0 if it does not. This procedure is stated succinctly in the assignment to state at the end of the loop body.
If we never find the comment-ending token, eventually nextChar returns false and the while loop terminates, resulting in an error message. The skipQuote method, shown in Figure 12.6, is similar. Here, the parameter is the opening quote character, which is either " or '. In either case, we need to see that character as the closing quote. However, we must be prepared to handle the \ character; otherwise, our program will report errors when it is run on its own source. Thus we repeatedly digest characters. If the current character is a closing quote, we are done. If it is a newline, we have an unterminated character or string constant. And if it is a backslash, we digest an extra character without examining it.
Once we've written the skipping routines, writing getNextOpenClose is easier. If the current character is a /, we read a second character to see

    // Precondition: We are about to process a quote;
    //                 have already seen beginning quote.
    // Postcondition: Stream will be set immediately after
    //                 matching quote.
    void Tokenizer::skipQuote( char quoteType )
    {
        while( nextChar( ) )
        {
            if( ch == quoteType )
                return;
            if( ch == '\n' )
            {
                cout << "Missing closed quote at line "
                     << ( getLineNumber( ) - 1 ) << endl;
                errors++;
                return;
            }
            // If a backslash, skip next character.
            else if( ch == '\\' )
                nextChar( );
        }
    }

Figure 12.6 The skipQuote routine for moving past an already started character or string constant.

whether we have a comment. If so, we call skipComment; if not, we undo the second read. If we have a quote, we call skipQuote. If we have an opening or closing symbol, we can return. Otherwise, we keep reading until we eventually run out of input or find an opening or closing symbol. The entire routine is shown in Figure 12.7.
The getLineNumber and getErrorCount methods are one-liners that return the values of the corresponding data members and are not shown. We discuss the getNextID routine in Section 13.2.2 when it is needed.
In the Balance class, the balanced-symbol algorithm requires that we place opening symbols on a stack. In order to print diagnostics, we store a line number with each symbol, as shown previously in the Symbol struct in Figure 12.3.
The checkBalance routine is implemented as shown in Figure 12.8. It follows the algorithm description almost verbatim. A stack that stores pending opening symbols, pendingTokens, is declared at the top of the routine. Opening symbols are pushed onto the stack with the current line number. When a closing symbol is encountered and the stack is empty, the closing symbol is extraneous; otherwise, we remove the top item from the stack and verify that the opening symbol that was on the stack matches the closing symbol just read. To do so we use the
    // Return the next opening or closing symbol or '\0' (if EOF).
    // Skip past comments and character and string constants.
    char Tokenizer::getNextOpenClose( )
    {
        while( nextChar( ) )
        {
            if( ch == '/' )
            {
                if( nextChar( ) )
                {
                    if( ch == '*' )
                        skipComment( SLASH_STAR );
                    else if( ch == '/' )
                        skipComment( SLASH_SLASH );
                    else if( ch != '\n' )
                        putBackChar( );
                }
            }
            else if( ch == '\'' || ch == '"' )
                skipQuote( ch );
            else if( ch == '(' || ch == '[' || ch == '{' ||
                     ch == ')' || ch == ']' || ch == '}' )
                return ch;
        }
        return '\0';       // End of file
    }

Figure 12.7 The getNextOpenClose routine for skipping comments and quotes and returning the next opening or closing character.

checkMatch routine, which is shown in Figure 12.9. Once the end of input is encountered, any symbols on the stack are unmatched; they are repeatedly output in the final while loop of checkBalance. The total number of errors detected is then returned.
Note that the current implementation allows multiple calls to checkBalance. However, if the input stream is not reset externally, all that happens is that the end of the file is immediately detected and we return immediately. We can add functionality to the Tokenizer class, allowing it to change the stream source, and then add functionality to the Balance class to change the input stream (passing on the change to the Tokenizer class). We leave this task for you to do as Exercise 12.9.
Figure 12.10 shows that we expect a Balance object to be created and then checkBalance to be invoked. In our example, if there are no command-line arguments, the associated istream is cin; otherwise, we repeatedly use istreams associated with the files given in the command-line argument list.

The checkBalance routine does all the algorithmic work.

    // Print error message for unbalanced symbols.
    // Return number of errors detected.
    int Balance::checkBalance( )
    {
        char ch;
        Symbol lastSymbol, match;
        stack<Symbol> pendingTokens;    // Pending opening symbols

        while( ( ch = tok.getNextOpenClose( ) ) != '\0' )
        {
            lastSymbol.token = ch;
            lastSymbol.theLine = tok.getLineNumber( );

            switch( ch )
            {
              case '(': case '[': case '{':
                pendingTokens.push( lastSymbol );
                break;

              case ')': case ']': case '}':
                if( pendingTokens.empty( ) )
                {
                    cout << "Extraneous " << ch << " at line "
                         << tok.getLineNumber( ) << endl;
                    errors++;
                }
                else
                {
                    match = pendingTokens.top( );
                    pendingTokens.pop( );
                    checkMatch( match, lastSymbol );
                }
                break;

              default:    // Can't happen
                break;
            }
        }

        while( !pendingTokens.empty( ) )
        {
            match = pendingTokens.top( );
            pendingTokens.pop( );
            cout << "Unmatched " << match.token << " at line "
                 << match.theLine << endl;
            errors++;
        }

        return errors + tok.getErrorCount( );
    }

Figure 12.8 The checkBalance algorithm.


    // Print an error message if clSym does not match opSym.
    // Update errors.
    void Balance::checkMatch( const Symbol & opSym,
                              const Symbol & clSym )
    {
        if( opSym.token == '(' && clSym.token != ')' ||
            opSym.token == '[' && clSym.token != ']' ||
            opSym.token == '{' && clSym.token != '}' )
        {
            cout << "Found " << clSym.token
                 << " on line " << tok.getLineNumber( )
                 << "; does not match " << opSym.token
                 << " at line " << opSym.theLine << endl;
            errors++;
        }
    }

Figure 12.9 The checkMatch routine for checking that the closing symbol matches the opening symbol.

    // main routine for balanced symbol checker.
    int main( int argc, char **argv )
    {
        if( argc == 1 )
        {
            Balance p( cin );
            if( p.checkBalance( ) == 0 )
                cout << "No errors" << endl;
            return 0;
        }

        while( --argc )
        {
            ifstream ifp( *++argv );
            if( !ifp )
            {
                cerr << "Cannot open " << *argv << endl;
                continue;
            }
            cout << *argv << ": " << endl;
            Balance p( ifp );
            if( p.checkBalance( ) == 0 )
                cout << "No errors" << endl;
        }
        return 0;
    }

Figure 12.10 The main routine with command-line arguments.

12.2 A Simple Calculator
Some of the techniques used to implement compilers can be used on a smaller scale in the implementation of a typical pocket calculator. Typically, calculators evaluate infix expressions, such as 1+2, which consist of a binary operator with arguments to its left and right. This format, although often fairly easy to evaluate, can be more complex. Consider the expression

    1 + 2 * 3

Mathematically, this expression evaluates to 7 because the multiplication operator has higher precedence than addition. Some calculators give the answer 9, illustrating that a simple left-to-right evaluation is not sufficient; we cannot begin by evaluating 1+2. Now consider the expressions

    10 - 4 - 3
    2 ^ 3 ^ 3

in which ^ is the exponentiation operator. Which subtraction and which exponentiation get evaluated first? On the one hand, subtractions are processed left-to-right, giving the result 3. On the other hand, exponentiation is generally processed right-to-left, thereby reflecting the mathematical 2^(3^3) rather than (2^3)^3. Thus subtraction associates left-to-right, whereas exponentiation associates from right-to-left. All of these possibilities suggest that evaluating an expression such as

    1 - 2 - 4 ^ 5 * 3 * 6 / 7 ^ 2 ^ 2

would be quite challenging.
If the calculations are performed in integer math (i.e., rounding down on division), the answer is -8. To show this result, we insert parentheses to clarify ordering of the calculations:

    ( 1 - 2 ) - ( ( ( ( 4 ^ 5 ) * 3 ) * 6 ) / ( 7 ^ ( 2 ^ 2 ) ) )

In an infix expression a binary operator has arguments to its left and right.

When there are several operators, precedence and associativity determine how the operators are processed.

Although the parentheses make the order of evaluations unambiguous, they do not necessarily make the mechanism for evaluation any clearer. A different expression form, called a postfix expression, which can be evaluated by a postfix machine without using any precedence rules, provides a direct mechanism for evaluation. In the next several sections we explain how it works. First, we examine the postfix expression form and show how expressions can be evaluated in a simple left-to-right scan. Next, we show algorithmically how the previous expressions, which are presented as infix expressions, can be converted to postfix. Finally, we give a C++ program that evaluates infix expressions containing additive, multiplicative, and exponentiation operators, as well as overriding parentheses. We use an algorithm called operator precedence parsing to convert an infix expression to a postfix expression in order to evaluate the infix expression.

12.2.1 Postfix Machines

A postfix expression is a series of operators and operands. A postfix machine is used to evaluate a postfix expression as follows. When an operand is seen, it is pushed onto a stack. When an operator is seen, the appropriate number of operands are popped from the stack, the operator is evaluated, and the result is pushed back onto the stack. For binary operators, which are the most common, two operands are popped. When the complete postfix expression is evaluated, the result should be a single item on the stack that represents the answer. The postfix form represents a natural way to evaluate expressions because precedence rules are not required.

A postfix expression can be evaluated as follows. Operands are pushed onto a single stack. An operator pops its operands and then pushes the result. At the end of the evaluation, the stack should contain only one element, which represents the result.

A simple example is the postfix expression

    1 2 3 * +

The evaluation proceeds as follows: 1, then 2, and then 3 are each pushed onto the stack. To process *, we pop the top two items on the stack: 3 and then 2. Note that the first item popped becomes the rhs parameter to the binary operator and that the second item popped is the lhs parameter; thus parameters are popped in reverse order. For multiplication, the order does not matter, but for subtraction and division, it does. The result of the multiplication is 6, and that is pushed back onto the stack. At this point, the top of the stack is 6; below it is 1. To process the +, the 6 and 1 are popped, and their sum, 7, is pushed. At this point, the expression has been read and the stack has only one item. Thus the final answer is 7.
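The walkthrough above can be sketched as a compact postfix machine. This is an illustrative sketch under stated assumptions rather than the chapter's Evaluator class: evalPostfix and ipow are hypothetical names, tokens are assumed to be separated by whitespace, arithmetic is done in long, and ^ is assumed to take a nonnegative exponent.

```cpp
#include <sstream>
#include <stack>
#include <stdexcept>
#include <string>

// Integer exponentiation by repeated multiplication (exponent >= 0 assumed).
long ipow( long base, long exponent )
{
    long result = 1;
    for( long i = 0; i < exponent; i++ )
        result *= base;
    return result;
}

// Evaluate a whitespace-separated postfix expression such as "1 2 3 * +".
long evalPostfix( const std::string & expr )
{
    std::istringstream in( expr );
    std::stack<long> operands;
    std::string tok;

    while( in >> tok )
    {
        if( tok == "+" || tok == "-" || tok == "*" || tok == "/" || tok == "^" )
        {
            if( operands.size( ) < 2 )
                throw std::runtime_error( "Missing operands" );

            long rhs = operands.top( ); operands.pop( );  // First popped is rhs
            long lhs = operands.top( ); operands.pop( );  // Second popped is lhs

            switch( tok[ 0 ] )
            {
              case '+': operands.push( lhs + rhs ); break;
              case '-': operands.push( lhs - rhs ); break;
              case '*': operands.push( lhs * rhs ); break;
              case '/':
                if( rhs == 0 )
                    throw std::runtime_error( "Division by zero" );
                operands.push( lhs / rhs );
                break;
              case '^': operands.push( ipow( lhs, rhs ) ); break;
            }
        }
        else
            operands.push( std::stol( tok ) );            // Operand: push it
    }

    if( operands.size( ) != 1 )
        throw std::runtime_error( "Malformed postfix expression" );
    return operands.top( );
}
```

Calling evalPostfix( "1 2 3 * +" ) follows exactly the steps traced above and yields 7. Each token causes one push, which is why the running time is linear in the length of the expression.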
Every valid infix expression can be converted to postfix form. For example, the earlier long infix expression can be written in postfix notation as

    1 2 - 4 5 ^ 3 * 6 * 7 2 2 ^ ^ / -

Figure 12.11 shows the steps used by the postfix machine to evaluate this expression. Each step involves a single push. Consequently, as there are 9 operands and 8 operators, there are 17 steps and 17 pushes. Clearly, the time required to evaluate a postfix expression is linear.
The remaining task is to write an algorithm to convert from infix notation to postfix notation. Once we have it, we also have an algorithm that evaluates an infix expression.

Evaluation of a postfix expression takes linear time.

Figure 12.11 Steps in the evaluation of a postfix expression (postfix expression: 1 2 - 4 5 ^ 3 * 6 * 7 2 2 ^ ^ / -).

12.2.2 Infix to Postfix Conversion
The operator precedence parsing algorithm converts an infix expression to a postfix expression, so we can evaluate the infix expression.

The basic principle involved in the operator precedence parsing algorithm, which converts an infix expression to a postfix expression, is the following. When an operand is seen, we can immediately output it. However, when we see an operator, we can never output it because we must wait to see the second operand, so we must save it. In an expression such as

    1 + 2 * 3 ^ 4

which in postfix form is

    1 2 3 4 ^ * +

a postfix expression in some cases has operators in the reverse of the order in which they appear in an infix expression. Of course, this order can occur only if the precedence of the involved operators is increasing as we go from left to right. Even so, this condition suggests that a stack is appropriate for storing operators. Following this logic, then, when we read an operator it must somehow be placed on a stack. Consequently, at some point the operator must get off the stack. The rest of the algorithm involves deciding when operators go on and come off the stack.

An operator stack is used to store operators that have been seen but not yet output.
In another simple infix expression

    2 ^ 5 - 1

when we reach the - operator, 2 and 5 have been output and ^ is on the stack. Because - has lower precedence than ^, the ^ needs to be applied to 2 and 5. Thus we must pop the ^ and any other operators of higher precedence than - from the stack. After doing so, we push the -. The resulting postfix expression is

    2 5 ^ 1 -

In general, when we are processing an operator from input, we output those operators from the stack that the precedence (and associativity) rules tell us need to be processed.
A second example is the infix expression

    3 * 2 ^ 5 - 1

When we reach the ^ operator, 3 and 2 have been output and * is on the stack. As ^ has higher precedence than *, nothing is popped and ^ goes on the stack. The 5 is output immediately. Then we encounter a - operator. Precedence rules tell us that ^ is popped, followed by the *. At this point, nothing is left to pop, we are done popping, and - goes onto the stack. We then output 1. When we reach the end of the infix expression, we can pop the remaining operators from the stack. The resulting postfix expression is

    3 2 5 ^ * 1 -

Before the summarizing algorithm, we need to answer a few questions. First, if the current symbol is a + and the top of the stack is a +, should the + on the stack be popped or should it stay? The answer is determined by deciding whether the input + implies that the stack + has been completed. Because + associates from left to right, the answer is yes. However, if we are talking about the ^ operator, which associates from right to left, the answer is no. Therefore, when examining two operators of equal precedence, we look at the associativity to decide, as shown in Figure 12.12.

When an operator is seen on the input, operators of higher priority (or left-associative operators of equal priority) are removed from the stack, signaling that they should be applied. The input operator is then placed on the stack.


Infix Expression    Postfix Expression    Associativity
2 + 3 + 4           2 3 + 4 +             Left-associative: Input + is lower than stack +.
2 ^ 3 ^ 4           2 3 4 ^ ^             Right-associative: Input ^ is higher than stack ^.

Figure 12.12 Examples of using associativity to break ties in precedence.

A left parenthesis is treated as a high-precedence operator when it is an input symbol but as a low-precedence operator when it is on the stack. A left parenthesis is removed only by a right parenthesis.

What about parentheses? A left parenthesis can be considered a high-precedence operator when it is an input symbol but a low-precedence operator when it is on the stack. Consequently, the input left parenthesis is simply placed on the stack. When a right parenthesis appears on the input, we pop the operator stack until we come to a left parenthesis. The operators are written, but the parentheses are not.
The following is a summary of the various cases in the operator precedence parsing algorithm. With the exception of parentheses, everything popped from the stack is output.
- Operands: Immediately output.
- Close parenthesis: Pop stack symbols until an open parenthesis appears.
- Operator: Pop all stack symbols until a symbol of lower precedence or a right-associative symbol of equal precedence appears. Then push the operator.
- End of input: Pop all remaining stack symbols.
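The four cases in the summary can be sketched as a small converter. This is a simplified illustration, not the chapter's Evaluator: infixToPostfix and precedence are hypothetical names, tokens are assumed to be separated by whitespace, the numeric precedence values are arbitrary choices that only need to be ordered correctly, and no error checking is done for malformed input.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Precedence of an operator; 0 means "not an operator". A "(" sitting on
// the stack therefore behaves as the lowest-precedence symbol.
int precedence( const std::string & op )
{
    if( op == "^" )              return 3;
    if( op == "*" || op == "/" ) return 2;
    if( op == "+" || op == "-" ) return 1;
    return 0;
}

// Convert a whitespace-separated infix expression to postfix.
std::string infixToPostfix( const std::string & infix )
{
    std::istringstream in( infix );
    std::vector<std::string> opStack;   // Operators seen but not yet output
    std::string out, tok;

    // Append one symbol to the output, separated by single spaces.
    auto emit = [ &out ]( const std::string & s )
    {
        if( !out.empty( ) )
            out += ' ';
        out += s;
    };

    while( in >> tok )
    {
        if( tok == "(" )
            opStack.push_back( tok );            // Input "(" is simply pushed
        else if( tok == ")" )
        {
            // Pop and output until the matching "(", which is discarded.
            while( !opStack.empty( ) && opStack.back( ) != "(" )
            {
                emit( opStack.back( ) );
                opStack.pop_back( );
            }
            if( !opStack.empty( ) )
                opStack.pop_back( );
        }
        else if( precedence( tok ) > 0 )
        {
            bool rightAssoc = ( tok == "^" );
            // Pop higher-precedence operators, and equal-precedence ones
            // when the input operator is left-associative.
            while( !opStack.empty( ) &&
                   ( precedence( opStack.back( ) ) > precedence( tok ) ||
                     ( precedence( opStack.back( ) ) == precedence( tok ) &&
                       !rightAssoc ) ) )
            {
                emit( opStack.back( ) );
                opStack.pop_back( );
            }
            opStack.push_back( tok );
        }
        else
            emit( tok );                         // Operand: output immediately
    }

    while( !opStack.empty( ) )                   // End of input
    {
        if( opStack.back( ) != "(" )
            emit( opStack.back( ) );
        opStack.pop_back( );
    }
    return out;
}
```

For instance, infixToPostfix( "3 * 2 ^ 5 - 1" ) produces "3 2 5 ^ * 1 -", matching the second example traced earlier in this section.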

As an example, Figure 12.13 shows how the algorithm processes

Below each stack is the symbol read. To the right of each stack, in boldface,
is any output.

Figure 12.13 Infix to postfix conversion.

12.2.3 Implementation
The Evaluator class will parse and evaluate infix expressions.

We now have the theoretical background required to implement a simple calculator. Our calculator supports addition, subtraction, multiplication, division, and exponentiation. We write a class template Evaluator that can be instantiated with the type in which the math is to be performed (presumably, int or double or perhaps a HugeInt class). We make a simplifying assumption: Negative numbers are not allowed. Distinguishing between the binary minus operator and the unary minus requires extra work in the scanning routine and also complicates matters because it introduces a nonbinary operator. Incorporating unary operators is not difficult, but the extra code does not illustrate any unique concepts and thus we leave it for you to do as an exercise.

We need two stacks: an operator stack and a stack for the postfix machine.

Figure 12.14 shows the Evaluator class interface, which is used to process a single string of input. The basic evaluation algorithm requires two stacks. The first stack is used to evaluate the infix expression and generate the postfix expression. It is the stack of operators, opStack. An enumerated type, TokenType, is also declared; note that the symbols are listed in order of precedence. Rather than explicitly outputting the postfix expression, we send each postfix symbol to the postfix machine as it is generated. Thus we also need a stack that stores operands. Consequently, the postfix machine stack, postFixStack, is instantiated with NumericType. Note that, if we did not have templates, we would be in trouble because the two

    // Evaluator class interface: evaluate infix expression.
    // NumericType: Must have standard set of arithmetic operators
    //
    // CONSTRUCTION: with a string.
    //
    // ******************PUBLIC OPERATIONS***********************
    // NumericType getValue( )  --> Return value of infix expression
    // ******************ERRORS**********************************
    // Some error checking is performed.

    #include <stdlib.h>
    #include <math.h>
    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <vector>
    #include <string>
    using namespace std;

    enum TokenType { EOL, VALUE, OPAREN, CPAREN, EXP,
                     MULT, DIV, PLUS, MINUS };

    template <class NumericType>
    class Evaluator
    {
      public:
        Evaluator( const string & s ) : str( s )
          { opStack.push_back( EOL ); }

        NumericType getValue( );    // Do the evaluation

      private:
        vector<TokenType>   opStack;         // Operator stack for conversion
        vector<NumericType> postFixStack;    // Postfix machine stack
        istringstream str;                   // The character stream

        // Internal routines
        NumericType getTop( );               // Get top of postfix stack
        void binaryOp( TokenType topOp );    // Process an operator
        void processToken( const Token<NumericType> & lastToken );
    };

Figure 12.14 The Evaluator class interface.


 1 template <class NumericType>
 2 class Token
 3 {
 4   public:
 5     Token( TokenType tt = EOL, const NumericType & nt = 0 )
 6       : theType( tt ), theValue( nt ) { }
 7
 8     TokenType getType( ) const
 9       { return theType; }
10     const NumericType & getValue( ) const
11       { return theValue; }
12
13   private:
14     TokenType   theType;
15     NumericType theValue;
16 };
17
18 template <class NumericType>
19 class Tokenizer
20 {
21   public:
22     Tokenizer( istream & is ) : in( is ) { }
23     Token<NumericType> getToken( );
24
25   private:
26     istream & in;
27 };

Figure 12.15 The Token class and Tokenizer class interface.

stacks hold items of different types.² The remaining data member is an istringstream object used to step through the input line.³
As was the case with the balanced symbol checker, we can write a Tokenizer class that can be used to give us the token sequence. Although we could reuse code, there is in fact little commonality, so we write a Tokenizer class for this application only. Here, however, the tokens are a little more complex because, if we read an operand, the type of token is VALUE, but we must also know what the value is that has been read. Thus we define both a Tokenizer class and a Token class, shown in Figure 12.15. A Token stores both a TokenType and, if the token is a VALUE, its numeric value. Accessors can be used to obtain information about a token. (The
2. We use vector instead of the stack adapter, since it provides basic stack operations via push_back, pop_back, and back.
3. The istringstream class is not yet available on all compilers. The online code has a deprecated replacement for older compilers. See the online README file for details.



 1 // Find the next token, skipping blanks, and return it.
 2 // Print error message if input is unrecognized.
 3 template <class NumericType>
 4 Token<NumericType> Tokenizer<NumericType>::getToken( )
 5 {
 6     char ch;
 7     NumericType theValue;
 8
 9         // Skip blanks
10     while( in.get( ch ) && ch == ' ' )
11         ;
12
13     if( in.good( ) && ch != '\n' && ch != '\0' )
14     {
15         switch( ch )
16         {
17           case '^': return EXP;
18           case '/': return DIV;
19           case '*': return MULT;
20           case '(': return OPAREN;
21           case ')': return CPAREN;
22           case '+': return PLUS;
23           case '-': return MINUS;
24
25           default:
26             in.putback( ch );
27             if( !( in >> theValue ) )
28             {
29                 cerr << "Parse error" << endl;
30                 return EOL;
31             }
32             return Token<NumericType>( VALUE, theValue );
33         }
34     }
35
36     return EOL;
37 }

Figure 12.16 The getToken routine for returning the next token in the
input stream.

getValue function could be made more robust by signaling an error if
theType is not VALUE.) The Tokenizer class has one member function.
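One way the accessor could signal such an error is to throw an exception. The following is a minimal sketch under that assumption; the class name CheckedToken and the choice of logic_error are ours and are not part of the chapter's code.

```cpp
#include <stdexcept>
using namespace std;

enum TokenType { EOL, VALUE, OPAREN, CPAREN, EXP,
                 MULT, DIV, PLUS, MINUS };

// CheckedToken is a hypothetical variant of Token whose getValue
// refuses to hand back a number for a token that is not a VALUE.
template <class NumericType>
class CheckedToken
{
  public:
    CheckedToken( TokenType tt = EOL, const NumericType & nt = 0 )
      : theType( tt ), theValue( nt ) { }

    TokenType getType( ) const
      { return theType; }

    const NumericType & getValue( ) const
    {
        if( theType != VALUE )   // operators and EOL carry no value
            throw logic_error( "getValue called on a non-VALUE token" );
        return theValue;
    }

  private:
    TokenType   theType;
    NumericType theValue;
};
```

Whether an exception, an assertion, or an error message is most appropriate depends on how the rest of the program reports errors; the routines in this chapter print to cerr instead.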
Figure 12.16 shows the getToken routine. First we skip past any blanks; when the loop at line 10 ends, we have gone past all of them. If we have not reached the end of line, we check to see whether we match any of the one-character operators. If so, we return the appropriate token (a


 1 // Public routine that performs the evaluation.
 2 // Examines the postfix machine to see if a single result
 3 // is left and if so, returns it; otherwise prints error.
 4 template <class NumericType>
 5 NumericType Evaluator<NumericType>::getValue( )
 6 {
 7     Tokenizer<NumericType> tok( str );
 8     Token<NumericType> lastToken;
 9
10     do
11     {
12         lastToken = tok.getToken( );
13         processToken( lastToken );
14     } while( lastToken.getType( ) != EOL );
15
16     if( postFixStack.empty( ) )
17     {
18         cerr << "Missing operand!" << endl;
19         return 0;
20     }
21
22     NumericType theResult = postFixStack.back( );
23     postFixStack.pop_back( );
24     if( !postFixStack.empty( ) )
25         cerr << "Warning: missing operators!" << endl;
26     return theResult;
27
28 }

Figure 12.17 The getValue routine for reading and processing tokens and then
returning the item at the top of the stack.

Token object is constructed by using an implicit type conversion by virtue of a one-parameter constructor). Otherwise, we reach the default case in the switch statement. We expect that what remains is an operand, so we unread ch, use operator>> to get the value, and then return a Token object by explicitly constructing a Token object based on the value read. Note that for the putback to work we must use get. That is why we do not simply use operator>> (in place of lines 10-13) to skip implicitly past the blanks.

C++ note: get must be used so that putback works.

We can now discuss the member functions of the Evaluator class. The only publicly visible member function is getValue. Shown in Figure 12.17, getValue repeatedly reads a token and processes it until the end of line is detected. At that point the item at the top of the stack is the answer.
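The interplay of get, putback, and operator>> in getToken can be seen in isolation. The sketch below is a simplified, hypothetical helper (describeNext is our name, and it reads int values only) that skips blanks with get and then either reports a one-character operator or un-reads the character and lets operator>> parse the whole number.

```cpp
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

// Hypothetical helper: describe the next token found on the stream.
string describeNext( istream & in )
{
    char ch;
    while( in.get( ch ) && ch == ' ' )
        ;                               // get reads blanks one at a time

    if( !in.good( ) || ch == '\n' )
        return "EOL";

    if( ch == '+' || ch == '-' || ch == '*' || ch == '/' )
        return string( "operator " ) + ch;

    in.putback( ch );                   // return the first digit to the stream
    int value;
    if( !( in >> value ) )              // operator>> now sees the whole number
        return "parse error";

    ostringstream out;
    out << "value " << value;
    return out.str( );
}
```

For an istringstream holding "  42 + 7", four successive calls return "value 42", "operator +", "value 7", and "EOL".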



 1 // top and pop the postfix machine stack; return the result.
 2 // If the stack is empty, print an error message.
 3 template <class NumericType>
 4 NumericType Evaluator<NumericType>::getTop( )
 5 {
 6     if( postFixStack.empty( ) )
 7     {
 8         cerr << "Missing operand" << endl;
 9         return 0;
10     }
11
12     NumericType tmp = postFixStack.back( );
13     postFixStack.pop_back( );
14     return tmp;
15 }

Figure 12.18 The getTop routine for getting the top item in the postfix stack and
removing it.

A precedence table is used to decide what is removed from the operator stack. Left-associative operators have the operator stack precedence set at 1 higher than the input symbol precedence. Right-associative operators go the other way.

Figures 12.18 and 12.19 show the routines used to implement the postfix machine. The getTop routine returns and removes the top item in the postfix stack. The binaryOp routine applies topOp (which is expected to be the top item in the operator stack) to the top two items on the postfix stack and replaces them with the result. It also pops the operator stack (at line 33), signifying that processing for topOp is complete. The pow routine is presumed to exist for NumericType objects; we can either use the math library routine or adapt the one previously shown in Figure 8.14.
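For a NumericType that the math library does not handle, a power routine based on repeated squaring is one option. The routine below is not the one from Figure 8.14; it is a generic sketch under our own assumptions: the name intPow is hypothetical, the exponent is a nonnegative int, and NumericType supports multiplication and construction from 1.

```cpp
// Hypothetical generic power routine using repeated squaring.
// Assumes n >= 0; NumericType must support operator* and NumericType( 1 ).
template <class NumericType>
NumericType intPow( const NumericType & x, int n )
{
    if( n == 0 )
        return NumericType( 1 );

    NumericType half = intPow( x, n / 2 );   // x to the power n/2 (rounded down)
    NumericType sq = half * half;
    return ( n % 2 == 0 ) ? sq : sq * x;     // odd n needs one extra factor of x
}
```

Note that binaryOp passes both operands as NumericType, so using such a routine there would also require converting the right-hand operand to an int.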
Figure 12.20 declares a precedence table, which stores the operator precedences and is used to decide what is removed from the operator stack. The operators are listed in the same order as the enumeration type TokenType. Because enumerators are assigned consecutive values beginning with zero, they can be used to index an array. (The array initialization syntax used here was described in Section 1.2.6.)
We want to assign a number to each level of precedence. The higher the
number, the higher is the precedence. We could assign the additive operators
precedence 1, multiplicative operators precedence 3, exponentiation precedence 5, and parentheses precedence 99. However, we also need to take into
account associativity. To do so, we assign each operator a number that represents its precedence when it is an input symbol and a second number that
represents its precedence when it is on the operator stack. A left-associative
operator has the operator stack precedence set at 1 higher than the input
symbol precedence, and a right-associative operator goes the other way.
Thus the precedence of the + operator on the stack is 2.



 1 // Process an operator by taking two items off the postfix
 2 // stack, applying the operator, and pushing the result.
 3 // Print error if missing closing parenthesis or division by 0.
 4 template <class NumericType>
 5 void Evaluator<NumericType>::binaryOp( TokenType topOp )
 6 {
 7     if( topOp == OPAREN )
 8     {
 9         cerr << "Unbalanced parentheses" << endl;
10         opStack.pop_back( );
11         return;
12     }
13     NumericType rhs = getTop( );
14     NumericType lhs = getTop( );
15
16     if( topOp == EXP )
17         postFixStack.push_back( pow( lhs, rhs ) );
18     else if( topOp == PLUS )
19         postFixStack.push_back( lhs + rhs );
20     else if( topOp == MINUS )
21         postFixStack.push_back( lhs - rhs );
22     else if( topOp == MULT )
23         postFixStack.push_back( lhs * rhs );
24     else if( topOp == DIV )
25         if( rhs != 0 )
26             postFixStack.push_back( lhs / rhs );
27         else
28         {
29             cerr << "Division by zero" << endl;
30             postFixStack.push_back( lhs );
31         }
32
33     opStack.pop_back( );
34 }

Figure 12.19 The binaryOp routine for applying topOp to the postfix stack.

A consequence of this rule is that any two operators that have different precedences are still correctly ordered. However, if a + is on the operator stack and is also the input symbol, the operator on the top of the stack will appear to have higher precedence and thus will be popped. This is what we want for left-associative operators.
Similarly, if a ^ is on the operator stack and is also the input symbol, the operator on the top of the stack will appear to have lower precedence and thus it will not be popped. That is what we want for right-associative operators. The token VALUE never gets placed on the stack, so its precedence is meaningless. The end-of-line token is given lowest precedence because it is




 1 // PREC_TABLE matches order of TokenType enumeration.
 2 struct Precedence
 3 {
 4     int inputSymbol;
 5     int topOfStack;
 6 } PREC_TABLE [ ] =
 7 {
 8     { 0, -1 }, { 0, 0 },       // EOL, VALUE
 9     { 100, 0 }, { 0, 99 },     // OPAREN, CPAREN
10     { 6, 5 },                  // EXP
11     { 3, 4 }, { 3, 4 },        // MULT, DIV
12     { 1, 2 }, { 1, 2 }         // PLUS, MINUS
13 };


Figure 12.20 Table of precedences used to evaluate an infix expression.

placed on the stack for use as a sentinel (which is done in the constructor). If
we treat it as a right-associative operator, it is covered under the operator
case.
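The effect of the two precedence columns can be checked directly. The sketch below repeats the values of Figure 12.20 and wraps the comparison used for popping in a helper; the function name shouldPop is hypothetical.

```cpp
enum TokenType { EOL, VALUE, OPAREN, CPAREN, EXP,
                 MULT, DIV, PLUS, MINUS };

struct Precedence
{
    int inputSymbol;   // precedence when the operator arrives as input
    int topOfStack;    // precedence when the operator is on the stack
};

// Same values as Figure 12.20, indexed by TokenType.
const Precedence PREC_TABLE[ ] =
{
    { 0, -1 }, { 0, 0 },       // EOL, VALUE
    { 100, 0 }, { 0, 99 },     // OPAREN, CPAREN
    { 6, 5 },                  // EXP (right-associative)
    { 3, 4 }, { 3, 4 },        // MULT, DIV (left-associative)
    { 1, 2 }, { 1, 2 }         // PLUS, MINUS (left-associative)
};

// Hypothetical helper: true if the operator on top of the stack should
// be applied before the incoming operator is pushed.
bool shouldPop( TokenType input, TokenType top )
{
    return PREC_TABLE[ input ].inputSymbol <= PREC_TABLE[ top ].topOfStack;
}
```

Here shouldPop( PLUS, PLUS ) is true, so 1 - 2 - 3 groups from the left, whereas shouldPop( EXP, EXP ) is false, so 2 ^ 3 ^ 2 groups from the right. An incoming EOL never pops the EOL sentinel, because 0 is not less than or equal to -1.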
The remaining method is processToken, which is shown in Figure 12.21.
When we see an operand, we push it onto the postfix stack. When we see a
closing parenthesis, we repeatedly pop and process the top operator on the
operator stack until the opening parenthesis appears (lines 18-20). The
opening parenthesis is then popped at line 22. (The test at line 21 is used to
avoid popping the sentinel in the event of a missing opening parenthesis.)
Otherwise, we have the general operator case, which is succinctly described
by the code in lines 28-32.
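To see the pieces work together, the following condensed sketch folds the tokenizer, the operator stack, and the postfix machine into a few free functions. It is not the chapter's Evaluator class: the names applyTop, nextToken, and evaluate are hypothetical, arithmetic is done in double only, and all error checking is omitted, so the input is assumed to be a well-formed expression.

```cpp
#include <cmath>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

enum TokenType { EOL, VALUE, OPAREN, CPAREN, EXP, MULT, DIV, PLUS, MINUS };

struct Precedence { int inputSymbol; int topOfStack; };

const Precedence PREC_TABLE[ ] =
{
    { 0, -1 }, { 0, 0 }, { 100, 0 }, { 0, 99 },   // EOL, VALUE, OPAREN, CPAREN
    { 6, 5 }, { 3, 4 }, { 3, 4 },                 // EXP, MULT, DIV
    { 1, 2 }, { 1, 2 }                            // PLUS, MINUS
};

// Pop the top operator and apply it to the top two operands.
void applyTop( vector<TokenType> & ops, vector<double> & vals )
{
    TokenType op = ops.back( ); ops.pop_back( );
    double rhs = vals.back( );  vals.pop_back( );
    double lhs = vals.back( );  vals.pop_back( );
    switch( op )
    {
      case EXP:  vals.push_back( pow( lhs, rhs ) ); break;
      case MULT: vals.push_back( lhs * rhs );       break;
      case DIV:  vals.push_back( lhs / rhs );       break;
      case PLUS: vals.push_back( lhs + rhs );       break;
      default:   vals.push_back( lhs - rhs );       break;  // MINUS
    }
}

// Read the next token; for a VALUE, the number is stored in val.
TokenType nextToken( istream & in, double & val )
{
    char ch;
    while( in.get( ch ) && ch == ' ' )
        ;
    if( !in.good( ) || ch == '\n' )
        return EOL;
    switch( ch )
    {
      case '^': return EXP;
      case '*': return MULT;
      case '/': return DIV;
      case '+': return PLUS;
      case '-': return MINUS;
      case '(': return OPAREN;
      case ')': return CPAREN;
      default:
        in.putback( ch );
        in >> val;
        return VALUE;
    }
}

// Condensed evaluator; assumes a well-formed expression.
double evaluate( const string & line )
{
    istringstream in( line );
    vector<TokenType> ops( 1, EOL );   // operator stack; EOL is the sentinel
    vector<double> vals;               // postfix machine stack
    double val = 0;

    for( ; ; )
    {
        TokenType type = nextToken( in, val );
        if( type == VALUE )
            vals.push_back( val );
        else if( type == CPAREN )
        {
            while( ops.back( ) != OPAREN )
                applyTop( ops, vals );
            ops.pop_back( );           // discard the matching OPAREN
        }
        else
        {
            while( PREC_TABLE[ type ].inputSymbol <=
                   PREC_TABLE[ ops.back( ) ].topOfStack )
                applyTop( ops, vals );
            if( type == EOL )
                return vals.back( );
            ops.push_back( type );
        }
    }
}
```

With this sketch, evaluate( "1 - 2 - 3" ) yields -4, evaluate( "2 ^ 3 ^ 2" ) yields 512, and evaluate( "( 1 + 2 ) * 3" ) yields 9, which reflects left associativity, right associativity, and parentheses, respectively.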
A simple main routine is given in Figure 12.22. It repeatedly reads a line
of input, instantiates an Evaluator object, and computes its value. As written, the program performs int math. We can change line 8 to use double
math or perhaps a large-integer class.

12.2.4 Expression Trees
In an expression tree, the leaves contain operands and the other nodes contain operators.

Figure 12.23 shows an example of an expression tree, the leaves of which
are operands (e.g., constants or variable names) and the other nodes contain
operators. This particular tree happens to be binary because all the operations are binary. Although it is the simplest case, nodes can have more than
two children. A node also may have only one child, as is the case with the
unary minus operator.
We evaluate an expression tree T by applying the operator at the root to the values obtained by recursively evaluating the left and right subtrees. In this example, the left subtree evaluates to (a+b) and the right subtree evaluates to


 1 // After token is read, use operator precedence parsing
 2 // algorithm to process it; missing opening parentheses
 3 // are detected here.
 4 template <class NumericType>
 5 void Evaluator<NumericType>::
 6 processToken( const Token<NumericType> & lastToken )
 7 {
 8     TokenType topOp;
 9     TokenType lastType = lastToken.getType( );
10
11     switch( lastType )
12     {
13       case VALUE:
14         postFixStack.push_back( lastToken.getValue( ) );
15         return;
16
17       case CPAREN:
18         while( ( topOp = opStack.back( ) ) != OPAREN &&
19                  topOp != EOL )
20             binaryOp( topOp );
21         if( topOp == OPAREN )
22             opStack.pop_back( );  // Get rid of opening parens
23         else
24             cerr << "Missing open parenthesis" << endl;
25         break;
26
27       default:    // General operator case
28         while( PREC_TABLE[ lastType ].inputSymbol <=
29                PREC_TABLE[ topOp = opStack.back( ) ].topOfStack )
30             binaryOp( topOp );
31         if( lastType != EOL )
32             opStack.push_back( lastType );
33         break;
34     }
35 }

Figure 12.21 The processToken routine for processing lastToken, using
the operator precedence parsing algorithm.

(a-b). The entire tree therefore represents ((a+b)*(a-b)). We can produce an (overly parenthesized) infix expression by recursively producing a parenthesized left expression, printing out the operator at the root, and recursively producing a parenthesized right expression. This general strategy (left, node, right) is called an inorder traversal. This type of traversal is easy to remember because of the type of expression it produces.
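The inorder strategy translates almost directly into a recursive routine. The sketch below is illustrative only: the node type ExprNode and the function toInfix are hypothetical names, and operands and operators are stored simply as strings.

```cpp
#include <string>
using namespace std;

// Hypothetical node type: a leaf holds an operand, any other node an operator.
struct ExprNode
{
    string     element;
    ExprNode * left;
    ExprNode * right;

    ExprNode( const string & e, ExprNode * l = 0, ExprNode * r = 0 )
      : element( e ), left( l ), right( r ) { }
};

// Inorder traversal (left, node, right) that builds a fully
// parenthesized infix string for the tree rooted at t.
string toInfix( const ExprNode * t )
{
    if( t == 0 )
        return "";
    if( t->left == 0 && t->right == 0 )
        return t->element;                  // a leaf needs no parentheses

    return "(" + toInfix( t->left ) + t->element + toInfix( t->right ) + ")";
}
```

For a tree whose root is *, whose left child is + with leaves a and b, and whose right child is - with leaves a and b, toInfix returns ((a+b)*(a-b)).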
