The C++ Programming Language, Third Edition (part 2)


5.3.1 Navigating Arrays

is equivalent to a traversal using a pointer:
    void fp(char v[])
    {
        for (char* p = v; *p!=0; p++) use(*p);
    }

The prefix * operator dereferences a pointer so that *p is the character pointed to by p, and ++
increments the pointer so that it refers to the next element of the array.
There is no inherent reason why one version should be faster than the other. With modern compilers, identical code should be generated for both examples (see §5.9[8]). Programmers can
choose between the versions on logical and aesthetic grounds.
The result of applying the arithmetic operators +, -, ++, or -- to pointers depends on the type
of the object pointed to. When an arithmetic operator is applied to a pointer p of type T*, p is
assumed to point to an element of an array of objects of type T; p+1 points to the next element of
that array, and p-1 points to the previous element. This implies that the integer value of p+1 will
be sizeof(T) larger than the integer value of p. For example, executing
    #include <iostream>

    int main()
    {
        int vi[10];
        short vs[10];

        std::cout << &vi[0] << ' ' << &vi[1] << '\n';
        std::cout << &vs[0] << ' ' << &vs[1] << '\n';
    }

produced
    0x7fffaef0 0x7fffaef4
    0x7fffaedc 0x7fffaede

using a default hexadecimal notation for pointer values. This shows that on my implementation,
sizeof(short) is 2 and sizeof(int) is 4.
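The relationship between p+1 and sizeof(T) can also be checked without printing addresses. The following is a minimal sketch that is not part of the original text; byte_step is a hypothetical helper that measures the distance in bytes between two adjacent elements of an array of T:

```cpp
#include <cassert>
#include <cstddef>

// Distance in bytes between two adjacent elements of an array of T.
// (byte_step is a hypothetical name used only for this illustration.)
template<class T>
std::size_t byte_step()
{
    T v[2] = {};
    T* p = &v[0];
    assert(p+1 == &v[1]);   // p+1 refers to the next element

    // View the same two addresses as addresses of bytes.
    const char* lo = reinterpret_cast<const char*>(p);
    const char* hi = reinterpret_cast<const char*>(p+1);
    return static_cast<std::size_t>(hi-lo);
}
```

For any object type T, byte_step<T>() should equal sizeof(T); the actual numbers depend on the implementation.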
Subtraction of pointers is defined only when both pointers point to elements of the same array
(although the language has no fast way of ensuring that is the case). When subtracting one pointer
from another, the result is the number of array elements between the two pointers (an integer). One
can add an integer to a pointer or subtract an integer from a pointer; in both cases, the result is a
pointer value. If that value does not point to an element of the same array as the original pointer or
one beyond, the result of using that value is undefined. For example:
    void f()
    {
        int v1[10];
        int v2[10];

        int i1 = &v1[5]-&v1[3];    // i1 = 2
        int i2 = &v1[5]-&v2[3];    // result undefined

        int* p1 = v2+2;            // p1 = &v2[2]
        int* p2 = v2-2;            // *p2 undefined
    }

The C++ Programming Language, Third Edition by Bjarne Stroustrup. Copyright ©1997 by AT&T.
Published by Addison Wesley Longman, Inc. ISBN 0-201-88954-4. All rights reserved.

Complicated pointer arithmetic is usually unnecessary and often best avoided. Addition of pointers
makes no sense and is not allowed.
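A common, well-defined use of pointer subtraction is turning a pointer into an index within one array. The sketch below is an illustration only; index_of is a hypothetical helper, and both pointers it subtracts refer to elements of the same array v:

```cpp
#include <cassert>
#include <cstddef>

// Return the index of the first element equal to x in v[0..n), or -1 if there is none.
// (index_of is a hypothetical helper, not taken from the text.)
std::ptrdiff_t index_of(const int* v, std::size_t n, int x)
{
    assert(v != 0 || n == 0);
    for (const int* p = v; p != v+n; ++p)
        if (*p == x) return p-v;   // number of elements between v and p
    return -1;
}
```

The result type std::ptrdiff_t is the signed integer type that the standard library provides for pointer differences.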
Arrays are not self-describing because the number of elements of an array is not guaranteed to
be stored with the array. This implies that to traverse an array that does not contain a terminator the
way character strings do, we must somehow supply the number of elements. For example:
    void fp(char v[], unsigned int size)
    {
        for (int i=0; i<size; i++) use(v[i]);

        const int N = 7;
        char v2[N];
        for (int i=0; i<N; i++) use(v2[i]);
    }
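Because the size must travel separately from the array, a function typically takes it as a second argument. As a sketch under our own naming (sum is hypothetical), the second overload below takes a reference to an array so that the compiler supplies the element count itself:

```cpp
#include <cstddef>

// Sum the first size elements of v; the caller supplies size
// because the array does not describe itself.
int sum(const int v[], std::size_t size)
{
    int total = 0;
    for (std::size_t i = 0; i<size; i++) total += v[i];
    return total;
}

// For a real array (not a pointer), the number of elements N is part of
// the array's type and can be deduced from a reference to the array.
template<std::size_t N>
int sum(const int (&v)[N])
{
    return sum(v, N);
}
```

The one-argument form works only where the array type is still visible; once an array has been passed as a plain pointer, the count is lost and must be passed explicitly.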

Note that most C++ implementations offer no range checking for arrays. This array concept is
inherently low-level. A more advanced notion of arrays can be provided through the use of classes;
see §3.7.1.

5.4 Constants [ptr.const]
C++ offers the concept of a user-defined constant, a const, to express the notion that a value doesn’t
change directly. This is useful in several contexts. For example, many objects don’t actually have
their values changed after initialization, symbolic constants lead to more maintainable code than do
literals embedded directly in code, pointers are often read through but never written through, and
most function parameters are read but not written to.
The keyword const can be added to the declaration of an object to make the object declared a
constant. Because it cannot be assigned to, a constant must be initialized. For example:
    const int model = 90;              // model is a const
    const int v[] = { 1, 2, 3, 4 };    // v[i] is a const
    const int x;                       // error: no initializer

Declaring something const ensures that its value will not change within its scope:
    void f()
    {
        model = 200;    // error
        v[2]++;         // error
    }

Note that const modifies a type; that is, it restricts the ways in which an object can be used, rather
than specifying how the constant is to be allocated. For example:
    void g(const X* p)
    {
        // can’t modify *p here
    }

    void h()
    {
        X val;      // val can be modified
        g(&val);
        // ...
    }

Depending on how smart it is, a compiler can take advantage of an object being a constant in
several ways. For example, the initializer for a constant is often (but not always) a constant expression
(§C.5); if it is, it can be evaluated at compile time. Further, if the compiler knows every use of the
const, it need not allocate space to hold it. For example:
    const int c1 = 1;
    const int c2 = 2;
    const int c3 = my_f(3);    // don’t know the value of c3 at compile time
    extern const int c4;       // don’t know the value of c4 at compile time
    const int* p = &c2;        // need to allocate space for c2
Given this, the compiler knows the values of c1 and c2 so that they can be used in constant
expressions. Because the values of c3 and c4 are not known at compile time (using only the information
available in this compilation unit; see §9.1), storage must be allocated for c3 and c4. Because the
address of c2 is taken (and presumably used somewhere), storage must be allocated for c2. The
simple and common case is the one in which the value of the constant is known at compile time and
no storage needs to be allocated; c1 is an example of that. The keyword extern indicates that c4 is
defined elsewhere (§9.2).
It is typically necessary to allocate store for an array of constants because the compiler cannot,
in general, figure out which elements of the array are referred to in expressions. On many
machines, however, efficiency improvements can be achieved even in this case by placing arrays of
constants in read-only storage.
Common uses for consts are as array bounds and case labels. For example:
    const int a = 42;
    const int b = 99;
    const int max = 128;

    int v[max];

    void f(int i)
    {
        switch (i) {
        case a:
            // ...
        case b:
            // ...
        }
    }
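A complete, compilable variant of this idea might look as follows. This is only a sketch; the names low, high, max_items, table, item, and classify are hypothetical and chosen for the illustration:

```cpp
#include <cassert>

const int low = 1;          // used as a case label
const int high = 2;         // used as a case label
const int max_items = 4;    // used as an array bound

int table[max_items] = { 10, 20, 30, 40 };

// Return table[i]; the bound is checked against the same symbolic constant
// that sized the array, so the two cannot drift apart.
int item(int i)
{
    assert(0<=i && i<max_items);
    return table[i];
}

// Map i to -1, 1, or 0 using consts as case labels.
int classify(int i)
{
    switch (i) {
    case low:  return -1;
    case high: return 1;
    default:   return 0;
    }
}
```

Both uses require the value to be known at compile time, which is why the consts are initialized with constant expressions.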

Enumerators (§4.8) are often an alternative to consts in such cases.
The way const can be used with class member functions is discussed in §10.2.6 and §10.2.7.
Symbolic constants should be used systematically to avoid ‘‘magic numbers’’ in code. If a
numeric constant, such as an array bound, is repeated in code, it becomes hard to revise that code
because every occurrence of that constant must be changed to make a correct update. Using a symbolic constant instead localizes information. Usually, a numeric constant represents an assumption
about the program. For example, 4 may represent the number of bytes in an integer, 128 the number
of characters needed to buffer input, and 6.24 the exchange factor between Danish kroner and
U.S. dollars. Left as numeric constants in the code, these values are hard for a maintainer to spot
and understand. Often, such numeric values go unnoticed and become errors when a program is
ported or when some other change violates the assumptions they represent. Representing assumptions as well-commented symbolic constants minimizes such maintenance problems.
5.4.1 Pointers and Constants [ptr.pc]
When using a pointer, two objects are involved: the pointer itself and the object pointed to.
‘‘Prefixing’’ a declaration of a pointer with const makes the object, but not the pointer, a constant. To
declare a pointer itself, rather than the object pointed to, to be a constant, we use the declarator
operator *const instead of plain *. For example:
    void f1(char* p)
    {
        char s[] = "Gorm";

        const char* pc = s;           // pointer to constant
        pc[3] = 'g';                  // error: pc points to constant
        pc = p;                       // ok

        char *const cp = s;           // constant pointer
        cp[3] = 'a';                  // ok
        cp = p;                       // error: cp is constant

        const char *const cpc = s;    // const pointer to const
        cpc[3] = 'a';                 // error: cpc points to constant
        cpc = p;                      // error: cpc is constant
    }

The declarator operator that makes a pointer constant is *const. There is no const* declarator
operator, so a const appearing before the * is taken to be part of the base type. For example:
    char *const cp;     // const pointer to char
    char const* pc;     // pointer to const char
    const char* pc2;    // pointer to const char

Some people find it helpful to read such declarations right-to-left. For example, ‘‘cp is a const
pointer to a char’’ and ‘‘pc2 is a pointer to a char const.’’
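The distinction between a constant pointer and a pointer to constant can be checked mechanically. The sketch below assumes a compiler that provides static_assert and <type_traits>, facilities that postdate this text; the typedef names PC, CP, and CPC and the function count_char are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

typedef const char*       PC;     // pointer to const char
typedef char* const       CP;     // const pointer to char
typedef const char* const CPC;    // const pointer to const char

// Is the pointer itself constant?
static_assert(!std::is_const<PC>::value,  "a PC can be made to point elsewhere");
static_assert( std::is_const<CP>::value,  "a CP cannot be reassigned");
static_assert( std::is_const<CPC>::value, "a CPC cannot be reassigned");

// Is the object pointed to constant?
static_assert( std::is_const<std::remove_pointer<PC>::type>::value,  "no writing through a PC");
static_assert(!std::is_const<std::remove_pointer<CP>::type>::value,  "writing through a CP is ok");
static_assert( std::is_const<std::remove_pointer<CPC>::type>::value, "no writing through a CPC");

// Count occurrences of c in the zero-terminated string s.
// s is a pointer to const: the function may advance s but not modify *s.
std::size_t count_char(const char* s, char c)
{
    assert(s != 0);
    std::size_t n = 0;
    for (; *s != 0; ++s)
        if (*s == c) ++n;
    return n;
}
```

Note that count_char increments its own copy of the pointer freely; only the characters pointed to are protected.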

An object that is a constant when accessed through one pointer may be variable when accessed
in other ways. This is particularly useful for function arguments. By declaring a pointer argument
const, the function is prohibited from modifying the object pointed to. For example:
    char* strcpy(char* p, const char* q);    // cannot modify *q

You can assign the address of a variable to a pointer to constant because no harm can come from
that. However, the address of a constant cannot be assigned to an unrestricted pointer because this
would allow the object’s value to be changed. For example:
    void f4()
    {
        int a = 1;
        const int c = 2;

        const int* p1 = &c;    // ok
        const int* p2 = &a;    // ok
        int* p3 = &c;          // error: initialization of int* with const int*
        *p3 = 7;               // try to change the value of c
    }

It is possible to explicitly remove the restrictions on a pointer to const by explicit type conversion
(§10.2.7.1 and §15.4.2.1).

5.5 References [ptr.ref]
A reference is an alternative name for an object. The main use of references is for specifying
arguments and return values for functions in general and for overloaded operators (Chapter 11) in
particular. The notation X& means reference to X. For example:
    void f()
    {
        int i = 1;
        int& r = i;    // r and i now refer to the same int
        int x = r;     // x = 1

        r = 2;         // i = 2
    }

To ensure that a reference is a name for something (that is, bound to an object), we must initialize
the reference. For example:
    int i = 1;
    int& r1 = i;       // ok: r1 initialized
    int& r2;           // error: initializer missing
    extern int& r3;    // ok: r3 initialized elsewhere

Initialization of a reference is something quite different from assignment to it. Despite appearances, no operator operates on a reference. For example:

    void g()
    {
        int ii = 0;
        int& rr = ii;
        rr++;             // ii is incremented to 1
        int* pp = &rr;    // pp points to ii
    }

This is legal, but rr++ does not increment the reference rr; rather, ++ is applied to an int that
happens to be ii. Consequently, the value of a reference cannot be changed after initialization; it
always refers to the object it was initialized to denote. To get a pointer to the object denoted by a
reference rr, we can write &rr.
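That a reference and the object it denotes are one and the same can be confirmed by comparing addresses. This is a minimal sketch; same_object and demo_reference are hypothetical names:

```cpp
#include <cassert>

// True if r and obj denote exactly the same int object.
bool same_object(const int& r, const int& obj)
{
    return &r == &obj;    // & applied to a reference yields the address of the object
}

// Returns the final value of ii after incrementing through the reference.
int demo_reference()
{
    int ii = 0;
    int& rr = ii;
    rr++;                 // increments ii, not "the reference"
    int* pp = &rr;        // pp points to ii
    assert(pp == &ii);
    assert(same_object(rr, ii));
    return ii;
}
```

Calling demo_reference() yields 1 because the increment applied through rr was applied to ii itself.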
The obvious implementation of a reference is as a (constant) pointer that is dereferenced each
time it is used. It doesn’t do much harm thinking about references that way, as long as one remembers that a reference isn’t an object that can be manipulated the way a pointer is:
[Figure: pp holds &ii; rr is another name for the int ii, whose value is 1.]

In some cases, the compiler can optimize away a reference so that there is no object representing
that reference at run-time.
Initialization of a reference is trivial when the initializer is an lvalue (an object whose address
you can take; see §4.9.6). The initializer for a ‘‘plain’’ T& must be an lvalue of type T.
The initializer for a const T& need not be an lvalue or even of type T. In such cases,
[1] first, implicit type conversion to T is applied if necessary (see §C.6);
[2] then, the resulting value is placed in a temporary variable of type T; and
[3] finally, this temporary variable is used as the value of the initializer.
Consider:
    double& dr = 1;           // error: lvalue needed
    const double& cdr = 1;    // ok

The interpretation of this last initialization might be:
    double temp = double(1);     // first create a temporary with the right value
    const double& cdr = temp;    // then use the temporary as the initializer for cdr

A temporary created to hold a reference initializer persists until the end of its reference’s scope.
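The temporary can be observed when a const reference is initialized from an object of a different type. In this sketch (through_const_ref is a hypothetical name), a change to the original int does not show through the reference, because the reference denotes the temporary double:

```cpp
#include <cassert>

// Returns the value seen through a const double& that was initialized from an int.
double through_const_ref()
{
    int i = 1;
    const double& cdr = i;    // conversion needed: cdr refers to a temporary double holding 1
    i = 2;                    // changes i, not the temporary
    assert(cdr == 1.0);
    return cdr;
}
```

This is the behavior that would be error-prone for a non-const reference: an assignment through it would silently update the temporary rather than i.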
References to variables and references to constants are distinguished because the introduction of
a temporary in the case of the variable is highly error-prone; an assignment to the variable would
become an assignment to the – soon to disappear – temporary. No such problem exists for references to constants, and references to constants are often important as function arguments (§11.6).
A reference can be used to specify a function argument so that the function can change the
value of an object passed to it. For example:

    void increment(int& aa) { aa++; }

    void f()
    {
        int x = 1;
        increment(x);    // x = 2
    }
The semantics of argument passing are defined to be those of initialization, so when called,
increment’s argument aa became another name for x. To keep a program readable, it is often best
to avoid functions that modify their arguments. Instead, you can return a value from the function
explicitly or require a pointer argument:
    int next(int p) { return p+1; }

    void incr(int* p) { (*p)++; }

    void g()
    {
        int x = 1;
        increment(x);    // x = 2
        x = next(x);     // x = 3
        incr(&x);        // x = 4
    }

The increment(x) notation doesn’t give a clue to the reader that x’s value is being modified, the
way x=next(x) and incr(&x) do. Consequently ‘‘plain’’ reference arguments should be used
only where the name of the function gives a strong hint that the reference argument is modified.
References can also be used to define functions that can be used on both the left-hand and
right-hand sides of an assignment. Again, many of the most interesting uses of this are found in the
design of nontrivial user-defined types. As an example, let us define a simple associative array.
First, we define struct Pair like this:
    struct Pair {
        string name;
        double val;
    };

The basic idea is that a string has a floating-point value associated with it. It is easy to define a
function, value(), that maintains a data structure consisting of one Pair for each different string
that has been presented to it. To shorten the presentation, a very simple (and inefficient)
implementation is used:
    vector<Pair> pairs;

    double& value(const string& s)
    /*
        maintain a set of Pairs:
        search for s, return its value if found; otherwise make a new Pair and return the default value 0
    */
    {
        for (int i = 0; i < pairs.size(); i++)
            if (s == pairs[i].name) return pairs[i].val;

        Pair p = { s, 0 };
        pairs.push_back(p);    // add Pair at end (§3.7.3)

        return pairs[pairs.size()-1].val;
    }

This function can be understood as an array of floating-point values indexed by character strings.

For a given argument string, value() finds the corresponding floating-point object (not the value
of the corresponding floating-point object); it then returns a reference to it. For example:
    int main()    // count the number of occurrences of each word on input
    {
        string buf;

        while (cin>>buf) value(buf)++;

        for (vector<Pair>::const_iterator p = pairs.begin(); p!=pairs.end(); ++p)
            cout << p->name << ": " << p->val << '\n';
    }

Each time around, the while-loop reads one word from the standard input stream cin into the string
buf (§3.6) and then updates the counter associated with it. Finally, the resulting table of different
words in the input, each with its number of occurrences, is printed. For example, given the input
    aa bb bb aa aa bb aa aa

this program will produce:
    aa: 5
    bb: 3

It is easy to refine this into a proper associative array type by using a template class with the
selection operator [] overloaded (§11.8). It is even easier just to use the standard library map (§17.4.1).
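As a sketch of the map-based alternative, the function below counts words with std::map. It is not the text's program: count_words is a hypothetical name, and it reads from a string through std::istringstream rather than from cin so that it can be run without input. As with value(), map's operator[] returns a reference to the counter, creating it with the value 0 if the word has not been seen before:

```cpp
#include <map>
#include <sstream>
#include <string>

// Count the number of occurrences of each whitespace-separated word in text.
std::map<std::string,int> count_words(const std::string& text)
{
    std::map<std::string,int> counts;
    std::istringstream in(text);
    std::string buf;
    while (in>>buf) counts[buf]++;    // counts[buf] is a reference to the word's counter
    return counts;
}
```

Given the input used above, "aa bb bb aa aa bb aa aa", the resulting map holds 5 for aa and 3 for bb.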

5.6 Pointer to Void [ptr.ptrtovoid]
A pointer of any type of object can be assigned to a variable of type void*, a void* can be assigned
to another void*, void*s can be compared for equality and inequality, and a void* can be explicitly
converted to another type. Other operations would be unsafe because the compiler cannot know
what kind of object is really pointed to. Consequently, other operations result in compile-time
errors. To use a void*, we must explicitly convert it to a pointer to a specific type. For example:
    void f(int* pi)
    {
        void* pv = pi;    // ok: implicit conversion of int* to void*
        *pv;              // error: can’t dereference void*
        pv++;             // error: can’t increment void* (the size of the object pointed to is unknown)

        int* pi2 = static_cast<int*>(pv);          // explicit conversion back to int*

        double* pd1 = pv;                          // error
        double* pd2 = pi;                          // error
        double* pd3 = static_cast<double*>(pv);    // unsafe
    }

In general, it is not safe to use a pointer that has been converted (‘‘cast’’) to a type that differs from
the type of the object pointed to. For example, a machine may assume that every double is allocated
on an 8-byte boundary. If so, strange behavior could arise if pi pointed to an int that wasn’t
allocated that way. This form of explicit type conversion is inherently unsafe and ugly. Consequently,
the notation used, static_cast, was designed to be ugly.
The primary use for void* is for passing pointers to functions that are not allowed to make
assumptions about the type of the object and for returning untyped objects from functions. To use
such an object, we must use explicit type conversion.
Functions using void* pointers typically exist at the very lowest level of the system, where real
hardware resources are manipulated. For example:
    void* my_alloc(size_t n);    // allocate n bytes from my special heap

Occurrences of void*s at higher levels of the system should be viewed with suspicion because they
are likely indicators of design errors. Where used for optimization, void* can be hidden behind a
type-safe interface (§13.5, §24.4.2).
Pointers to functions (§7.7) and pointers to members (§15.5) cannot be assigned to void*s.
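To suggest what hiding void* behind a type-safe interface can look like, here is a small sketch. It rests on assumptions: the text only declares my_alloc(), so the definition below simply forwards to std::malloc as a stand-in for a special heap, and my_free, alloc_ints, and free_ints are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Low level: untyped memory. Stand-in definitions for illustration only.
void* my_alloc(std::size_t n) { return std::malloc(n); }
void my_free(void* p) { std::free(p); }

// Type-safe layer: users see int*, never void*.
int* alloc_ints(std::size_t count)
{
    assert(count != 0);
    void* pv = my_alloc(count*sizeof(int));
    return static_cast<int*>(pv);    // the one explicit conversion, kept in one place
}

void free_ints(int* p)
{
    my_free(p);                      // int* converts implicitly to void*
}
```

The single static_cast is confined to alloc_ints, so code at higher levels never needs to convert from void* itself.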

5.7 Structures [ptr.struct]
An array is an aggregate of elements of the same type. A struct is an aggregate of elements of
(nearly) arbitrary types. For example:
    struct address {
        char* name;         // "Jim Dandy"
        long int number;    // 61
        char* street;       // "South St"
        char* town;         // "New Providence"
        char state[2];      // 'N' 'J'
        long zip;           // 7974
    };
This defines a new type called address consisting of the items you need in order to send mail to
someone. Note the semicolon at the end. This is one of very few places in C++ where it is
necessary to have a semicolon after a curly brace, so people are prone to forget it.
Variables of type address can be declared exactly as other variables, and the individual
members can be accessed using the . (dot) operator. For example:
    void f()
    {
        address jd;
        jd.name = "Jim Dandy";
        jd.number = 61;
    }

The notation used for initializing arrays can also be used for initializing variables of structure types.
For example:
    address jd = {
        "Jim Dandy",
        61, "South St",
        "New Providence", {'N','J'}, 7974
    };

Using a constructor (§10.2.3) is usually better, however. Note that jd.state could not be initialized
by the string "NJ". Strings are terminated by the character '\0'. Hence, "NJ" has three characters,
one more than will fit into jd.state.
Structure objects are often accessed through pointers using the -> (structure pointer dereference) operator. For example:
    void print_addr(address* p)
    {
        cout << p->name << '\n'
             << p->number << ' ' << p->street << '\n'
             << p->town << '\n'
             << p->state[0] << p->state[1] << ' ' << p->zip << '\n';
    }

When p is a pointer, p->m is equivalent to (*p).m.
Objects of structure types can be assigned, passed as function arguments, and returned as the
result from a function. For example:
    address current;

    address set_current(address next)
    {
        address prev = current;
        current = next;
        return prev;
    }

Other plausible operations, such as comparison (== and !=), are not defined. However, the user
can define such operators (Chapter 11).
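As a brief preview of what defining such operators involves, here is a sketch for a small structure. Point is a hypothetical type chosen for the illustration; the comparison is member by member:

```cpp
// A hypothetical structure used only to illustrate user-defined comparison.
struct Point {
    int x;
    int y;
};

// Two Points are equal if all their members are equal.
bool operator==(const Point& a, const Point& b)
{
    return a.x==b.x && a.y==b.y;
}

// Define != in terms of == so that the two cannot disagree.
bool operator!=(const Point& a, const Point& b)
{
    return !(a==b);
}
```

The arguments are passed by const reference, so the operands are neither copied nor modified.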
The size of an object of a structure type is not necessarily the sum of the sizes of its members.
This is because many machines require objects of certain types to be allocated on
architecture-dependent boundaries or handle such objects much more efficiently if they are. For example,
integers are often allocated on word boundaries. On such machines, objects are said to have to be
aligned properly. This leads to ‘‘holes’’ in the structures. For example, on many machines,
sizeof(address) is 24, and not 22 as might be expected. You can minimize wasted space by
simply ordering members by size (largest member first). However, it is usually best to order members
for readability and sort them by size only if there is a demonstrated need to optimize.
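The effect of member order can be measured with sizeof. The sketch below uses two hypothetical structures, Loose and Tight, with the same members in different orders; padding_bytes is also a hypothetical helper. The actual sizes are implementation-dependent, so no particular numbers are claimed:

```cpp
#include <cassert>
#include <cstddef>

struct Loose {      // a large member between two small ones
    char a;
    double d;
    char b;
};

struct Tight {      // same members, largest first
    double d;
    char a;
    char b;
};

// Number of bytes in S beyond the sizes of one double and two chars,
// that is, the ‘‘holes’’ introduced for alignment.
template<class S>
std::size_t padding_bytes()
{
    const std::size_t members = sizeof(double)+2*sizeof(char);
    assert(sizeof(S) >= members);    // members never overlap
    return sizeof(S)-members;
}
```

On many implementations padding_bytes<Loose>() is larger than padding_bytes<Tight>(), which is the saving obtained by ordering members by size.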
The name of a type becomes available for use immediately after it has been encountered and not
just after the complete declaration has been seen. For example:
    struct Link {
        Link* previous;
        Link* successor;
    };

It is not possible to declare new objects of a structure type until the complete declaration has been
seen. For example:
    struct No_good {
        No_good member;    // error: recursive definition
    };

This is an error because the compiler is not able to determine the size of No_good. To allow two
(or more) structure types to refer to each other, we can declare a name to be the name of a structure
type. For example:
    struct List;    // to be defined later

    struct Link {
        Link* pre;
        Link* suc;
        List* member_of;
    };

    struct List {
        Link* head;
    };

Without the first declaration of List, use of List in the declaration of Link would have caused a
syntax error.
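To show the two mutually referring types in use, here is a minimal sketch that repeats the declarations and adds one operation. The function push_front is hypothetical and not part of the text:

```cpp
struct List;    // to be defined later

struct Link {
    Link* pre;
    Link* suc;
    List* member_of;
};

struct List {
    Link* head;
};

// Insert n as the first Link of lst and record which List n belongs to.
void push_front(List& lst, Link& n)
{
    n.pre = 0;
    n.suc = lst.head;
    n.member_of = &lst;
    if (lst.head) lst.head->pre = &n;    // the old first Link now has a predecessor
    lst.head = &n;
}
```

Only pointers to List appear inside Link, so the compiler never needs the size of List before List is defined.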
The name of a structure type can be used before the type is defined as long as that use does not
require the name of a member or the size of the structure to be known. For example:
    class S;       // ‘S’ is the name of some type

    extern S a;
    S f();
    void g(S);
    S* h(S*);

However, many such declarations cannot be used unless the type S is defined:
    void k(S* p)
    {
        S a;            // error: S not defined; size needed to allocate

        f();            // error: S not defined; size needed to return value
        g(a);           // error: S not defined; size needed to pass argument
        p->m = 7;       // error: S not defined; member name not known

        S* q = h(p);    // ok: pointers can be allocated and passed
        q->m = 7;       // error: S not defined; member name not known
    }

A struct is a simple form of a class (Chapter 10).
For reasons that reach into the pre-history of C, it is possible to declare a struct and a
non-structure with the same name in the same scope. For example:
    struct stat { /* ... */ };
    int stat(char* name, struct stat* buf);

In that case, the plain name (stat) is the name of the non-structure, and the structure must be
referred to with the prefix struct. Similarly, the keywords class, union (§C.8.2), and enum (§4.8)
can be used as prefixes for disambiguation. However, it is best not to overload names to make that
necessary.
5.7.1 Type Equivalence [ptr.equiv]
Two structures are different types even when they have the same members. For example,
    struct S1 { int a; };
    struct S2 { int a; };

are two different types, so
    S1 x;
    S2 y = x;    // error: type mismatch

Structure types are also different from fundamental types, so
    S1 x;
    int i = x;    // error: type mismatch

Every struct must have a unique definition in a program (§9.2.3).
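When a value really must move between two such types, the conversion has to be written out member by member. The following sketch assumes a compiler that provides static_assert and <type_traits>, which postdate this text; to_S2 is a hypothetical helper:

```cpp
#include <type_traits>

struct S1 { int a; };
struct S2 { int a; };

// Same members, yet distinct types.
static_assert(!std::is_same<S1,S2>::value, "S1 and S2 are different types");

// Explicit, member-wise conversion; S2 y = x; would not compile.
S2 to_S2(const S1& x)
{
    S2 y = { x.a };
    return y;
}
```

Making the conversion an explicit function keeps the type distinction visible at the point of use.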

5.8 Advice [ptr.advice]
[1] Avoid nontrivial pointer arithmetic; §5.3.
[2] Take care not to write beyond the bounds of an array; §5.3.1.
[3] Use 0 rather than NULL; §5.1.1.
[4] Use vector and valarray rather than built-in (C-style) arrays; §5.3.1.
[5] Use string rather than zero-terminated arrays of char; §5.3.
[6] Minimize use of plain reference arguments; §5.5.
[7] Avoid void* except in low-level code; §5.6.
[8] Avoid nontrivial literals (‘‘magic numbers’’) in code. Instead, define and use symbolic
    constants; §4.8, §5.4.

5.9 Exercises [ptr.exercises]

1. (∗1) Write declarations for the following: a pointer to a character, an array of 10 integers, a
   reference to an array of 10 integers, a pointer to an array of character strings, a pointer to a pointer
   to a character, a constant integer, a pointer to a constant integer, and a constant pointer to an
   integer. Initialize each one.
2. (∗1.5) What, on your system, are the restrictions on the pointer types char*, int*, and void*?
   For example, may an int* have an odd value? Hint: alignment.
3. (∗1) Use typedef to define the types unsigned char, const unsigned char, pointer to integer,
   pointer to pointer to char, pointer to arrays of char, array of 7 pointers to int, pointer to an array
   of 7 pointers to int, and array of 8 arrays of 7 pointers to int.
4. (∗1) Write a function that swaps (exchanges the values of) two integers. Use int* as the
   argument type. Write another swap function using int& as the argument type.
5. (∗1.5) What is the size of the array str in the following example:
       char str[] = "a short string";
   What is the length of the string "a short string"?
6. (∗1) Define functions f(char), g(char&), and h(const char&). Call them with the arguments
   'a', 49, 3300, c, uc, and sc, where c is a char, uc is an unsigned char, and sc is a signed
   char. Which calls are legal? Which calls cause the compiler to introduce a temporary variable?
7. (∗1.5) Define a table of the names of months of the year and the number of days in each month.
   Write out that table. Do this twice; once using an array of char for the names and an array for
   the number of days and once using an array of structures, with each structure holding the name
   of a month and the number of days in it.
8. (∗2) Run some tests to see if your compiler really generates equivalent code for iteration using
   pointers and iteration using indexing (§5.3.1). If different degrees of optimization can be
   requested, see if and how that affects the quality of the generated code.
9. (∗1.5) Find an example where it would make sense to use a name in its own initializer.
10. (∗1) Define an array of strings in which the strings contain the names of the months. Print those
    strings. Pass the array to a function that prints those strings.
11. (∗2) Read a sequence of words from input. Use Quit as a word that terminates the input. Print
    the words in the order they were entered. Don’t print a word twice. Modify the program to sort
    the words before printing them.
12. (∗2) Write a function that counts the number of occurrences of a pair of letters in a string and
    another that does the same in a zero-terminated array of char (a C-style string). For example,
    the pair "ab" appears twice in "xabaacbaxabb".
13. (∗1.5) Define a struct Date to keep track of dates. Provide functions that read Dates from
    input, write Dates to output, and initialize a Date with a date.
6

Expressions and Statements
Premature optimization
is the root of all evil.
– D. Knuth

On the other hand,
we cannot ignore efficiency.
– Jon Bentley

Desk calculator example – input – command line arguments – expression summary – logical and relational operators – increment and decrement – free store – explicit type conversion – statement summary – declarations – selection statements – declarations in conditions – iteration statements – the infamous goto – comments and indentation – advice – exercises.

6.1 A Desk Calculator [expr.calculator]
Statements and expressions are introduced by presenting a desk calculator program that provides
the four standard arithmetic operations as infix operators on floating-point numbers. The user can
also define variables. For example, given the input
    r = 2.5
    area = pi * r * r

(pi is predefined) the calculator program will write
    2.5
    19.635

where 2.5 is the result of the first line of input and 19.635 is the result of the second.


The calculator consists of four main parts: a parser, an input function, a symbol table, and a
driver. Actually, it is a miniature compiler in which the parser does the syntactic analysis, the input
function handles input and lexical analysis, the symbol table holds permanent information, and the
driver handles initialization, output, and errors. We could add many features to this calculator to
make it more useful (§6.6[20]), but the code is long enough as it is, and most features would just
add code without providing additional insight into the use of C++.
6.1.1 The Parser [expr.parser]
Here is a grammar for the language accepted by the calculator:
    program:
        END                         // END is end-of-input
        expr_list END

    expr_list:
        expression PRINT            // PRINT is semicolon
        expression PRINT expr_list

    expression:
        expression + term
        expression - term
        term

    term:
        term / primary
        term * primary
        primary

    primary:
        NUMBER
        NAME
        NAME = expression
        - primary
        ( expression )

In other words, a program is a sequence of expressions separated by semicolons. The basic units of
an expression are numbers, names, and the operators *, /, +, - (both unary and binary), and =.
Names need not be declared before use.
The style of syntax analysis used is usually called recursive descent; it is a popular and straightforward top-down technique. In a language such as C++, in which function calls are relatively cheap, it is also efficient. For each production in the grammar, there is a function that calls other functions. Terminal symbols (for example, END, NUMBER, +, and -) are recognized by the lexical analyzer, get_token(); and nonterminal symbols are recognized by the syntax analyzer functions, expr(), term(), and prim(). As soon as both operands of a (sub)expression are known, the expression is evaluated; in a real compiler, code could be generated at this point.
The parser uses a function get_token() to get input. The value of the most recent call of get_token() can be found in the global variable curr_tok. The type of curr_tok is the enumeration Token_value:

    enum Token_value {
        NAME,       NUMBER,     END,
        PLUS='+',   MINUS='-',  MUL='*',    DIV='/',
        PRINT=';',  ASSIGN='=', LP='(',     RP=')'
    };

    Token_value curr_tok = PRINT;

Representing each token by the integer value of its character is convenient and efficient and can be
a help to people using debuggers. This works as long as no character used as input has a value used
as an enumerator – and no character set I know of has a printing character with a single-digit integer value. I chose PRINT as the initial value for curr_tok because that is the value it will have after the calculator has evaluated an expression and displayed its value. Thus, I ‘‘start the system’’ in a normal state to minimize the chance of errors and the need for special startup code.
Each parser function takes a bool (§4.2) argument indicating whether the function needs to call get_token() to get the next token. Each parser function evaluates ‘‘its’’ expression and returns the value. The function expr() handles addition and subtraction. It consists of a single loop that looks for terms to add or subtract:
    double expr(bool get)       // add and subtract
    {
        double left = term(get);

        for (;;)                // ‘‘forever’’
            switch (curr_tok) {
            case PLUS:
                left += term(true);
                break;
            case MINUS:
                left -= term(true);
                break;
            default:
                return left;
            }
    }

This function really does not do much itself. In a manner typical of higher-level functions in a
large program, it calls other functions to do the work.
The switch-statement tests the value of its condition, which is supplied in parentheses after the switch keyword, against a set of constants. The break-statements are used to exit the switch-statement. The constants following the case labels must be distinct. If the value tested does not match any case label, the default is chosen. The programmer need not provide a default.
Note that an expression such as 2-3+4 is evaluated as (2-3)+4, as specified in the grammar.
The curious notation for(;;) is the standard way to specify an infinite loop; you could pronounce it ‘‘forever.’’ It is a degenerate form of a for-statement (§6.3.3); while(true) is an alternative. The switch-statement is executed repeatedly until something different from + and - is found, and then the return-statement in the default case is executed.
The operators += and -= are used to handle the addition and subtraction; left=left+term() and left=left-term() could have been used without changing the meaning of the program. However, left+=term() and left-=term() not only are shorter but also express the intended operation directly. Each assignment operator is a separate lexical token, so a + = 1; is a syntax error because of the space between the + and the =.
Assignment operators are provided for the binary operators
    +   -   *   /   %   &   |   ^   <<   >>

so that the following assignment operators are possible
    =   +=   -=   *=   /=   %=   &=   |=   ^=   <<=   >>=

The % is the modulo, or remainder, operator; &, |, and ^ are the bitwise logical operators AND, OR, and exclusive OR; << and >> are the left shift and right shift operators; §6.2 summarizes the operators and their meanings. For a binary operator @ applied to operands of built-in types, an expression x@=y means x=x@y, except that x is evaluated once only.
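The ‘‘evaluated once only’’ rule matters when the left-hand operand has a side effect. The following is a minimal sketch of that difference; next_index(), add_compound(), add_spelled_out(), and the counter calls are names invented for this illustration and are not part of the calculator.

```cpp
#include <cstddef>

int calls = 0;                  // hypothetical counter: how often next_index() ran

// Hypothetical helper with a visible side effect; it always selects element 0.
std::size_t next_index()
{
    ++calls;
    return 0;
}

// Compound form: the operand expression v[next_index()] is evaluated once.
int add_compound(int v[])
{
    calls = 0;
    v[next_index()] += 5;
    return calls;
}

// Spelled-out form: the operand expression appears, and is evaluated, twice.
int add_spelled_out(int v[])
{
    calls = 0;
    v[next_index()] = v[next_index()] + 5;
    return calls;
}
```

Both functions add 5 to the first element, but add_compound() reports one call of next_index() and add_spelled_out() reports two.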
Chapter 8 and Chapter 9 discuss how to organize a program as a set of modules. With one
exception, the declarations for this calculator example can be ordered so that everything is declared
exactly once and before it is used. The exception is expr(), which calls term(), which calls prim(), which in turn calls expr(). This loop must be broken somehow. A declaration
    double expr(bool);

before the definition of prim() will do nicely.
Function term() handles multiplication and division in the same way expr() handles addition and subtraction:
    double term(bool get)       // multiply and divide
    {
        double left = prim(get);

        for (;;)
            switch (curr_tok) {
            case MUL:
                left *= prim(true);
                break;
            case DIV:
                if (double d = prim(true)) {
                    left /= d;
                    break;
                }
                return error("divide by 0");
            default:
                return left;
            }
    }

The result of dividing by zero is undefined and usually disastrous. We therefore test for 0 before
dividing and call error() if we detect a zero divisor. The function error() is described in §6.1.4.
The variable d is introduced into the program exactly where it is needed and initialized immediately. The scope of a name introduced in a condition is the statement controlled by that condition, and the resulting value is the value of the condition (§6.3.2.1). Consequently, the division and assignment left/=d is done if and only if d is nonzero.
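A declaration in a condition can be tried in isolation. The sketch below assumes a hypothetical helper, divide_or(), that is not part of the calculator; it only illustrates that d is declared, initialized, and tested in one place.

```cpp
// Hypothetical helper: returns left/divisor, or fallback when divisor is zero.
double divide_or(double left, double divisor, double fallback)
{
    if (double d = divisor) {   // the condition is true exactly when d is nonzero
        left /= d;              // reached only for a nonzero d
        return left;
    }
    // d is no longer in scope once the if-statement has ended
    return fallback;
}
```

For example, divide_or(6,3,-1) yields 2, and divide_or(6,0,-1) yields the fallback value -1 without dividing.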

The function prim() handling a primary is much like expr() and term(), except that because we are getting lower in the call hierarchy a bit of real work is being done and no loop is necessary:
    double number_value;
    string string_value;

    double prim(bool get)       // handle primaries
    {
        if (get) get_token();

        switch (curr_tok) {
        case NUMBER:            // floating-point constant
        {   double v = number_value;
            get_token();
            return v;
        }
        case NAME:
        {   double& v = table[string_value];
            if (get_token() == ASSIGN) v = expr(true);
            return v;
        }
        case MINUS:             // unary minus
            return -prim(true);
        case LP:
        {   double e = expr(true);
            if (curr_tok != RP) return error(") expected");
            get_token();        // eat ')'
            return e;
        }
        default:
            return error("primary expected");
        }
    }

When a NUMBER (that is, an integer or floating-point literal) is seen, its value is returned. The input routine get_token() places the value in the global variable number_value. Use of a global variable in a program often indicates that the structure is not quite clean – that some sort of optimization has been applied. So it is here. Ideally, a lexical token consists of two parts: a value specifying the kind of token (a Token_value in this program) and (when needed) the value of the token. Here, there is only a single, simple variable, curr_tok, so the global variable number_value is needed to hold the value of the last NUMBER read. Eliminating this spurious global variable is left as an exercise (§6.6[21]). Saving the value of number_value in the local variable v before calling get_token() is not really necessary. For every legal input, the calculator always uses one number in the computation before reading another from input. However, saving the value and displaying it correctly after an error helps the user.

In the same way that the value of the last NUMBER is kept in number_value, the character string representation of the last NAME seen is kept in string_value. Before doing anything to a name, the calculator must first look ahead to see if it is being assigned to or simply read. In both cases, the symbol table is consulted. The symbol table is a map (§3.7.4, §17.4.1):
    map<string,double> table;

That is, when table is indexed by a string, the resulting value is the double corresponding to the string. For example, if the user enters
    radius = 6378.388;

the calculator will execute
    double& v = table["radius"];
    // ... expr() calculates the value to be assigned ...
    v = 6378.388;

The reference v is used to hold on to the double associated with radius while expr() calculates the value 6378.388 from the input characters.
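Holding a reference into the table is safe because inserting further elements into a std::map does not invalidate references to existing elements. The sketch below illustrates this with a hypothetical function, assign_radius(); it also relies on operator[] creating a missing entry with the value 0.0.

```cpp
#include <map>
#include <string>

std::map<std::string, double> table;

// Hypothetical stand-in for the NAME case of prim().
double assign_radius()
{
    double& v = table["radius"];    // creates the entry (value 0.0) if it is absent
    table["pi"] = 3.14159;          // other entries may be added in the meantime
    v = 6378.388;                   // v still refers to the "radius" entry
    return table["radius"];
}
```

One consequence of using operator[] is that merely reading an undefined name also enters it into the table with the value 0.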
6.1.2 The Input Function [expr.input]
Reading input is often the messiest part of a program. This is because a program must communicate with a person; it must cope with that person’s whims, conventions, and seemingly random errors. Trying to force the person to behave in a manner more suitable for the machine is often (rightly) considered offensive. The task of a low-level input routine is to read characters and compose higher-level tokens from them. These tokens are then the units of input for higher-level routines. Here, low-level input is done by get_token(). Writing a low-level input routine need not be an everyday task. Many systems provide standard functions for this.
I build get_token() in two stages. First, I provide a deceptively simple version that imposes a burden on the user. Next, I modify it into a slightly less elegant, but much easier to use, version.
The idea is to read a character, use that character to decide what kind of token needs to be composed, and then return the Token_value representing the token read.
The initial statements read the first non-whitespace character into ch and check that the read operation succeeded:

    Token_value get_token()
    {
        char ch = 0;
        cin>>ch;

        switch (ch) {
        case 0:
            return curr_tok=END;    // assign and return

By default, operator >> skips whitespace (that is, spaces, tabs, newlines, etc.) and leaves the value
of ch unchanged if the input operation failed. Consequently, ch==0 indicates end of input.
Assignment is an operator, and the result of the assignment is the value of the variable assigned to. This allows me to assign the value END to curr_tok and return it in the same statement. Having a single statement rather than two is useful in maintenance. If the assignment and the return became separated in the code, a programmer might update the one and forget to update the other.
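The behavior of >> on a char can be checked without a terminal by reading from a string stream. This is a small sketch under that assumption; first_char() and first_char_of() are hypothetical helpers, and std::istringstream stands in for cin.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: read one character the way the first get_token() does.
// Returns 0 if no character could be read.
char first_char(std::istream& in)
{
    char ch = 0;
    in >> ch;       // skips leading whitespace; ch keeps its old value on failure
    return ch;
}

// Convenience wrapper so the sketch can be tried on a string.
char first_char_of(const std::string& text)
{
    std::istringstream in(text);
    return first_char(in);
}
```

With this, first_char_of("  \t\n+ 1") yields '+', whereas an empty or all-whitespace input yields 0, which is the value get_token() treats as end of input.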


Let us look at some of the cases separately before considering the complete function. The
expression terminator ´;´, the parentheses, and the operators are handled simply by returning their
values:
        case ';':
        case '*':
        case '/':
        case '+':
        case '-':
        case '(':
        case ')':
        case '=':
            return curr_tok=Token_value(ch);

Numbers are handled like this:
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
        case '.':
            cin.putback(ch);
            cin >> number_value;
            return curr_tok=NUMBER;

Stacking case labels horizontally rather than vertically is generally not a good idea because this arrangement is harder to read. However, having one line for each digit is tedious. Because operator >> is already defined for reading floating-point constants into a double, the code is trivial. First the initial character (a digit or a dot) is put back into cin. Then the constant can be read into number_value.
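The put-back-then-read step can be exercised on a string stream. The following sketch assumes a hypothetical struct Number_result and function read_number(), neither of which is part of the calculator; it reports the number read and the character that follows it.

```cpp
#include <sstream>
#include <string>

// Hypothetical result type: the number read and the next unread character.
struct Number_result {
    double value;
    char next;      // stays 0 if the input is exhausted
};

Number_result read_number(const std::string& text)
{
    std::istringstream in(text);
    char ch = 0;
    in >> ch;               // the character that selected the number case
    in.putback(ch);         // hand it back so >> sees the whole constant
    Number_result r = { 0, 0 };
    in >> r.value;          // reads the floating-point constant
    in.get(r.next);         // peek at what the next token would start with
    return r;
}
```

For the input "2.5*r", the value read is 2.5 and the next character is '*', so the operator is left for the following call of the input function.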
A name is handled similarly:
        default:            // NAME, NAME =, or error
            if (isalpha(ch)) {
                cin.putback(ch);
                cin>>string_value;
                return curr_tok=NAME;
            }
            error("bad token");
            return curr_tok=PRINT;

The standard library function isalpha() (§20.4.2) is used to avoid listing every character as a separate case label. Operator >> applied to a string (in this case, string_value) reads until it hits whitespace. Consequently, a user must terminate a name by a space before an operator using the name as an operand. This is less than ideal, so we will return to this problem in §6.1.3.
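The effect of >> on a string is easy to observe with a string stream. In this sketch, first_word() is a hypothetical helper and std::istringstream stands in for cin.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns what >> reads into a string from text.
std::string first_word(const std::string& text)
{
    std::istringstream in(text);
    std::string s;
    in >> s;        // reads up to, but not including, the next whitespace
    return s;
}
```

Given "x = 7", the helper returns "x"; given "x=7", it returns the whole of "x=7", which is why this version of the calculator needs a space after a name.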
Here, finally, is the complete input function:
    Token_value get_token()
    {
        char ch = 0;
        cin>>ch;

        switch (ch) {
        case 0:
            return curr_tok=END;
        case ';':
        case '*':
        case '/':
        case '+':
        case '-':
        case '(':
        case ')':
        case '=':
            return curr_tok=Token_value(ch);
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
        case '.':
            cin.putback(ch);
            cin >> number_value;
            return curr_tok=NUMBER;
        default:            // NAME, NAME =, or error
            if (isalpha(ch)) {
                cin.putback(ch);
                cin>>string_value;
                return curr_tok=NAME;
            }
            error("bad token");
            return curr_tok=PRINT;
        }
    }

The conversion of an operator to its token value is trivial because the Token_value of an operator was defined as the integer value of the operator (§4.8).
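The conversion can be seen in isolation in the sketch below. The enumeration repeats the one from §6.1.1; operator_token() is a hypothetical wrapper added only for illustration and is meant to be called with one of the operator characters.

```cpp
enum Token_value {
    NAME,       NUMBER,     END,
    PLUS='+',   MINUS='-',  MUL='*',    DIV='/',
    PRINT=';',  ASSIGN='=', LP='(',     RP=')'
};

// Hypothetical wrapper: an operator character is its own token value.
Token_value operator_token(char ch)
{
    return Token_value(ch);
}
```

The enumerators without initializers take the values 0, 1, and 2, so they cannot collide with any of the operator characters.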
6.1.3 Low-level Input [expr.low]
Using the calculator as defined so far reveals a few inconveniences. It is tedious to remember to
add a semicolon after an expression in order to get its value printed, and having a name terminated
by whitespace only is a real nuisance. For example, x=7 is an identifier – rather than the identifier x followed by the operator = and the number 7. Both problems are solved by replacing the type-oriented default input operations in get_token() with code that reads individual characters.
First, we’ll make a newline equivalent to the semicolon used to mark the end of expression:
    Token_value get_token()
    {
        char ch;

        do {    // skip whitespace except '\n'
            if(!cin.get(ch)) return curr_tok = END;
        } while (ch!='\n' && isspace(ch));

        switch (ch) {
        case ';':
        case '\n':
            return curr_tok=PRINT;

A do-statement is used; it is equivalent to a while-statement except that the controlled statement is always executed at least once. The call cin.get(ch) reads a single character from the standard input stream into ch. By default, get() does not skip whitespace the way operator >> does. The test if (!cin.get(ch)) fails if no character can be read from cin; in this case, END is returned to terminate the calculator session. The operator ! (NOT) is used because get() returns true in case of success.
The standard library function isspace() provides the standard test for whitespace (§20.4.2); isspace(c) returns a nonzero value if c is a whitespace character and zero otherwise. The test is implemented as a table lookup, so using isspace() is much faster than testing for the individual whitespace characters. Similar functions test if a character is a digit – isdigit() – a letter – isalpha() – or a digit or letter – isalnum().
After whitespace has been skipped, the next character is used to determine what kind of lexical token is coming.
The problem caused by >> reading into a string until whitespace is encountered is solved by reading one character at a time until a character that is not a letter or a digit is found:
        default:            // NAME, NAME=, or error
            if (isalpha(ch)) {
                string_value = ch;
                while (cin.get(ch) && isalnum(ch)) string_value.push_back(ch);
                cin.putback(ch);
                return curr_tok=NAME;
            }
            error("bad token");
            return curr_tok=PRINT;

Fortunately, these two improvements could both be implemented by modifying a single local section of code. Constructing programs so that improvements can be implemented through local modifications only is an important design aim.
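The character-at-a-time technique can be tried outside the calculator. The sketch below is one possible standalone version, not the book's code: read_name() and name_then_rest() are hypothetical helpers, a string stream stands in for cin, and the arguments to isalpha() and isalnum() are converted to unsigned char, a precaution the fragment above does not take.

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper: reads a name (a letter followed by letters or digits).
// Returns the empty string if the next character is not a letter.
std::string read_name(std::istream& in)
{
    std::string name;
    char ch = 0;
    if (!in.get(ch)) return name;
    if (!std::isalpha(static_cast<unsigned char>(ch))) {
        in.putback(ch);
        return name;
    }
    name = ch;
    while (in.get(ch)) {
        if (!std::isalnum(static_cast<unsigned char>(ch))) {
            in.putback(ch);     // the terminator belongs to the next token
            break;
        }
        name.push_back(ch);
    }
    return name;
}

// Convenience wrapper: the name read, a '|', and whatever input is left.
std::string name_then_rest(const std::string& text)
{
    std::istringstream in(text);
    std::string name = read_name(in);
    std::string rest;
    std::getline(in, rest);
    return name + "|" + rest;
}
```

For the input "x=7" the wrapper returns "x|=7": the name ends at the = without any whitespace, and the = remains available as the next token.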
6.1.4 Error Handling [expr.error]
Because the program is so simple, error handling is not a major concern. The error function simply
counts the errors, writes out an error message, and returns:
    int no_of_errors;

    double error(const string& s)
    {
        no_of_errors++;
        cerr << "error: " << s << '\n';
        return 1;
    }

The stream cerr is an unbuffered output stream usually used to report errors (§21.2.1).
The reason for returning a value is that errors typically occur in the middle of the evaluation of an expression, so we should either abort that evaluation entirely or return a value that is unlikely to cause subsequent errors. The latter is adequate for this simple calculator. Had get_token() kept track of the line numbers, error() could have informed the user approximately where the error occurred. This would be useful when the calculator is used noninteractively (§6.6[19]).
Often, a program must be terminated after an error has occurred because no sensible way of continuing has been devised. This can be done by calling exit(), which first cleans up things like output streams and then terminates the program with its argument as the return value (§9.4.1.1).
More stylized error-handling mechanisms can be implemented using exceptions (see §8.3, Chapter 14), but what we have here is quite suitable for a 150-line calculator.
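The ‘‘count, report, and return a harmless value’’ approach can be sketched on its own as follows. The error() function mirrors the one above with explicit std:: qualification; divide() is a hypothetical caller added for illustration.

```cpp
#include <iostream>
#include <string>

int no_of_errors = 0;

// Count the error, report it on cerr, and return a value that is
// unlikely to cause further trouble in the surrounding expression.
double error(const std::string& s)
{
    ++no_of_errors;
    std::cerr << "error: " << s << '\n';
    return 1;
}

// Hypothetical caller: divides unless the divisor is zero.
double divide(double left, double d)
{
    if (d != 0) return left / d;
    return error("divide by 0");
}
```

A failed division thus evaluates to 1 and evaluation continues, while no_of_errors records that the result should not be trusted.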
6.1.5 The Driver [expr.driver]
With all the pieces of the program in place, we need only a driver to start things. In this simple
example, main() can do that:
    int main()
    {
        table["pi"] = 3.1415926535897932385;    // insert predefined names
        table["e"] = 2.7182818284590452354;

        while (cin) {
            get_token();
            if (curr_tok == END) break;
            if (curr_tok == PRINT) continue;
            cout << expr(false) << '\n';
        }
        return no_of_errors;
    }

Conventionally, main() should return zero if the program terminates normally and nonzero otherwise (§3.2). Returning the number of errors accomplishes this nicely. As it happens, the only initialization needed is to insert the predefined names into the symbol table.
The primary task of the main loop is to read expressions and write out the answer. This is achieved by the line:
    cout << expr(false) << '\n';

The argument false tells expr() that it does not need to call get_token() to get a current token on which to work.
Testing cin each time around the loop ensures that the program terminates if something goes wrong with the input stream, and testing for END ensures that the loop is correctly exited when get_token() encounters end-of-file. A break-statement exits its nearest enclosing switch-statement or loop (that is, a for-statement, while-statement, or do-statement). Testing for PRINT (that is, for '\n' and ';') relieves expr() of the responsibility for handling empty expressions. A continue-statement is equivalent to going to the very end of a loop, so in this case
    while (cin) {
        // ...
        if (curr_tok == PRINT) continue;
        cout << expr(false) << '\n';
    }

is equivalent to
    while (cin) {
        // ...
        if (curr_tok != PRINT)
            cout << expr(false) << '\n';
    }
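The equivalence of the two loop forms can be checked on a simplified stand-in for the driver. In this sketch the functions with_continue() and with_negated_test() are hypothetical, and a vector of strings replaces the token stream: the string "PRINT" plays the role of an empty expression and every other string is simply echoed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Loop written with a continue-statement.
std::string with_continue(const std::vector<std::string>& tokens)
{
    std::string out;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (tokens[i] == "PRINT") continue;     // go to the very end of the loop body
        out += tokens[i] + '\n';
    }
    return out;
}

// The same loop written with the test negated.
std::string with_negated_test(const std::vector<std::string>& tokens)
{
    std::string out;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        if (tokens[i] != "PRINT")
            out += tokens[i] + '\n';
    }
    return out;
}
```

Both functions produce the same output for any input; which form reads better depends on how much code follows the test.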

6.1.6 Headers [expr.headers]
The calculator uses standard library facilities. Therefore, appropriate headers must be #included to complete the program:
    #include<iostream>      // I/O
    #include<string>        // strings
    #include<map>           // map
    #include<cctype>        // isalpha(), etc.

All of these headers provide facilities in the std namespace, so to use the names they provide we must either use explicit qualification with std:: or bring the names into the global namespace by
    using namespace std;

To avoid confusing the discussion of expressions with modularity issues, I did the latter. Chapter 8 and Chapter 9 discuss ways of organizing this calculator into modules using namespaces and how to organize it into source files. On many systems, standard headers have equivalents with a .h suffix that declare the classes, functions, etc., and place them in the global namespace (§9.2.1, §9.2.4, §B.3.1).
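The two ways of naming standard library facilities can be compared side by side. The sketch below uses two hypothetical functions, upper_qualified() and upper_unqualified(); unlike the calculator, it places the using-directive inside a function so that its effect stays local.

```cpp
#include <cctype>
#include <string>

// Explicit qualification: every standard name carries std::.
std::string upper_qualified(const std::string& s)
{
    std::string r;
    for (std::string::size_type i = 0; i < s.size(); ++i)
        r.push_back(static_cast<char>(std::toupper(static_cast<unsigned char>(s[i]))));
    return r;
}

// Using-directive: the same names can be written without std::.
std::string upper_unqualified(const std::string& s)
{
    using namespace std;    // local to this function in this sketch
    string r;
    for (string::size_type i = 0; i < s.size(); ++i)
        r.push_back(static_cast<char>(toupper(static_cast<unsigned char>(s[i]))));
    return r;
}
```

The two functions behave identically; the choice affects only how names are written and which names are visible in the enclosing scope.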
6.1.7 Command-Line Arguments [expr.command]
After the program was written and tested, I found it a bother to first start the program, then type the
expressions, and finally quit. My most common use was to evaluate a single expression. If that
expression could be presented as a command-line argument, a few keystrokes could be avoided.
A program starts by calling main() (§3.2, §9.4). When this is done, main() is given two arguments specifying the number of arguments, usually called argc, and an array of arguments, usually called argv. The arguments are character strings, so the type of argv is char*[argc+1]. The name of the program (as it occurs on the command line) is passed as argv[0], so argc is always at least 1. The list of arguments is zero-terminated; that is, argv[argc]==0. For example, for the command
    dc 150/1.1934

the arguments have these values:
