The C++ Programming Language, Third Edition (part 10)

D.4.4.3 Date and Time Output
actual output and may contain the following special-purpose format specifiers:

    %a   abbreviated weekday name (e.g., Sat)
    %A   full weekday name (e.g., Saturday)
    %b   abbreviated month name (e.g., Feb)
    %B   full month name (e.g., February)
    %c   date and time (e.g., Sat Feb 06 21:46:05 1999)
    %d   day of month [01,31] (e.g., 06)
    %H   24-hour clock hour [00,23] (e.g., 21)
    %I   12-hour clock hour [01,12] (e.g., 09)
    %j   day of year [001,366] (e.g., 037)
    %m   month of year [01,12] (e.g., 02)
    %M   minute of hour [00,59] (e.g., 48)
    %p   a.m./p.m. indicator for 12-hour clock (e.g., PM)
    %S   second of minute [00,61] (e.g., 40)
    %U   week of year [00,53] starting with Sunday (e.g., 05); the first Sunday starts week 1
    %w   day of week [0,6]; 0 means Sunday (e.g., 6)
    %W   week of year [00,53] starting with Monday (e.g., 05); the first Monday starts week 1
    %x   date (e.g., 02/06/99)
    %X   time (e.g., 21:48:40)
    %y   year without century [00,99] (e.g., 99)
    %Y   year (e.g., 1999)
    %Z   time zone indicator (e.g., EST) if the time zone is known

This long list of very specialized formatting rules could be used as an argument for the use of
extensible I/O systems. However, as with most specialized notations, it is adequate for its task and
often even convenient.
In addition to these formatting directives, most implementations support "modifiers," such as
an integer specifying a field width (§21.8), %10X. Modifiers for the time-and-date formats are not
part of the C++ standard, but some platform standards, such as POSIX, require them. Consequently,
modifiers can be difficult to avoid even if their use isn’t perfectly portable.
The sprintf-like (§21.8) function strftime() from <ctime> and <time.h> produces output using
the time and date format directives:
    size_t strftime(char* s, size_t max, const char* format, const tm* tmp);

This function places a maximum of max characters from *tmp and the format into *s according to the
format. For example:
    int main()
    {
        const int max = 20;   // sloppy: hope strftime() will never produce more than 20 characters
        char buf[max];
        time_t t = time(0);
        strftime(buf,max,"%A\n",localtime(&t));
        cout << buf;
    }
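The example above hopes that 20 characters are enough. A slightly more defensive sketch checks the value returned by strftime(), which is 0 when the result (including the terminating zero) does not fit. The helper name format_tm, the initial buffer size, and the retry limit are illustrative assumptions, not part of the original text.

```cpp
#include <cstddef>
#include <ctime>
#include <string>
#include <vector>

// Hypothetical helper: format *t according to fmt, growing the buffer as needed.
// Returns an empty string if the output still does not fit after a few retries
// (strftime() also returns 0 for a legitimately empty result).
std::string format_tm(const char* fmt, const std::tm* t)
{
    std::vector<char> buf(32);
    for (int attempt = 0; attempt < 8; ++attempt) {
        std::size_t n = std::strftime(&buf[0], buf.size(), fmt, t);
        if (n != 0) return std::string(&buf[0], n);
        buf.resize(buf.size() * 2);   // too small (or empty output): try a larger buffer
    }
    return std::string();
}
```

A call such as format_tm("%A", localtime(&t)) would then replace the fixed-size buffer in main().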

The C++ Programming Language, Special Edition by Bjarne Stroustrup. Copyright ©2000 by AT&T.
Published by Addison Wesley Inc. ISBN 0-201-70073-5. All rights reserved.


Locales


Appendix D

On a Wednesday, this will print Wednesday in the default classic() locale (§D.2.3) and onsdag in
a Danish locale.
Characters that are not part of a format specifier, such as the newline in the example, are simply
copied into the first argument (s).
When put() identifies a format character f (and optional modifier character m), it calls the
virtual do_put() to do the actual formatting: do_put(b,s,fill,t,f,m).
A call put(b,s,fill,t,f,m) is a simplified form of put(), where a format character (f) and a
modifier character (m) are explicitly provided. Thus,
    const char fmt[] = "%10X";
    put(b,s,fill,t,fmt,fmt+sizeof(fmt)-1);   // -1: don't pass the terminating zero

can be abbreviated to
    put(b,s,fill,t,'X',10);

If a format contains multibyte characters, it must both begin and end in the default state (§D.4.6).
We can use put() to implement a locale-sensitive output operator for Date:
    ostream& operator<<(ostream& s, const Date& d)
    {
        ostream::sentry guard(s);   // see §21.3.8
        if (!guard) return s;
        tm* tmp = localtime(&d.d);
        try {
            if (use_facet< time_put<char> >(s.getloc()).put(s,s,s.fill(),tmp,'x').failed())
                s.setstate(ios_base::failbit);
        }
        catch (...) {
            handle_ioexception(s);   // see §D.4.2.2
        }
        return s;
    }
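Because the operator above depends on the Date class and on handle_ioexception(), it cannot be tried in isolation. The following minimal sketch exercises the same time_put::put() call on a plain tm. The function name put_field is an assumption made for illustration.

```cpp
#include <ctime>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Format one field of *t using the time_put facet of loc.
// format_char is a strftime()-style conversion character such as 'A' or 'Y'.
std::string put_field(const std::locale& loc, const std::tm* t, char format_char)
{
    std::ostringstream os;
    os.imbue(loc);
    const std::time_put<char>& tp = std::use_facet< std::time_put<char> >(loc);
    // put() writes through an output iterator and reports failure through it.
    if (tp.put(std::ostreambuf_iterator<char>(os), os, os.fill(), t, format_char).failed())
        os.setstate(std::ios_base::failbit);
    return os.str();
}
```

With the classic() locale and a tm whose tm_wday is 3, put_field(loc,&t,'A') yields the English weekday name. Other locales are expected to yield their own names, provided the implementation supports them.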

Since there is no standard Date type, there is no default layout for date I/O. Here, I specified the
%x format by passing the character 'x' as the format character. Because the %x format is the
default for get_date() (§D.4.4.4), that is probably as close to a standard as one can get. See
§D.4.4.5 for an example of how to use alternative formats.
A _byname version (§D.4, §D.4.1) of time_put is also provided:
    template <class Ch, class Out = ostreambuf_iterator<Ch> >
    class std::time_put_byname : public time_put<Ch,Out> { /* ... */ };

D.4.4.4 Date and Time Input
As ever, input is trickier than output. When we write code to output a value, we often have a
choice among different formats. In addition, when we write input code, we must deal with errors
and sometimes the possibility of several alternative formats.

The time_get facet implements input of time and date. The idea is that time_get of a locale can
read the times and dates produced by the locale’s time_put. However, there are no standard date
and time classes, so a programmer can use a locale to produce output according to a variety of
formats. For example, the following representations could all be produced by using a single output
statement, using time_put (§D.4.4.5) from different locales:
    January 15th 1999
    Thursday 15th January 1999
    15 Jan 1999AD
    Thurs 15/1/99

The C++ standard encourages implementers of time_get to accept dates and time formats as
specified by POSIX and other standards. The problem is that it is difficult to standardize the intent
to read dates and times in whatever format is conventional in a given culture. It is wise to
experiment to see what a given locale provides (§D.6[8]). If a format isn’t accepted, a programmer
can provide a suitable alternative time_get facet.
The standard time input facet, time_get, is derived from time_base:
    struct std::time_base {
        enum dateorder {
            no_order,   // no order, possibly more elements (such as day of week)
            dmy,        // day before month before year
            mdy,        // month before day before year
            ymd,        // year before month before day
            ydm         // year before day before month
        };
    };


An implementer can use this enumeration to simplify the parsing of date formats.
Like num_get, time_get accesses its buffer through a pair of input iterators:
    template <class Ch, class In = istreambuf_iterator<Ch> >
    class time_get : public locale::facet, public time_base {
    public:
        typedef Ch char_type;
        typedef In iter_type;
        explicit time_get(size_t r = 0);
        dateorder date_order() const { return do_date_order(); }
        // read [b,e) into d, using formatting rules from s, reporting errors by setting r:
        In get_time(In b, In e, ios_base& s, ios_base::iostate& r, tm* d) const;
        In get_date(In b, In e, ios_base& s, ios_base::iostate& r, tm* d) const;
        In get_year(In b, In e, ios_base& s, ios_base::iostate& r, tm* d) const;
        In get_weekday(In b, In e, ios_base& s, ios_base::iostate& r, tm* d) const;
        In get_monthname(In b, In e, ios_base& s, ios_base::iostate& r, tm* d) const;
        static locale::id id;   // facet identifier object (§D.2, §D.3, §D.3.1)
    protected:
        ~time_get();
        // virtual "do_" functions for public functions (see §D.4.1)
    };
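The date_order() member declared above can be queried directly, which is one way to experiment with what a given locale provides. This is only a sketch: the function name date_order_name is an assumption, and the value reported for any particular locale, including classic(), differs between implementations.

```cpp
#include <locale>
#include <string>

// Return a printable name for the date order preferred by loc's time_get facet.
std::string date_order_name(const std::locale& loc)
{
    const std::time_get<char>& tg = std::use_facet< std::time_get<char> >(loc);
    switch (tg.date_order()) {
    case std::time_base::no_order: return "no_order";
    case std::time_base::dmy:      return "dmy";
    case std::time_base::mdy:      return "mdy";
    case std::time_base::ymd:      return "ymd";
    case std::time_base::ydm:      return "ydm";
    }
    return "unknown";   // not reached for a conforming facet
}
```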

The get_time() function calls do_get_time(). The default get_time() reads time as produced by
the locale’s time_put::put(), using the %X format (§D.4.4). Similarly, the get_date() function
calls do_get_date(). The default get_date() reads a date as produced by the locale’s
time_put::put(), using the %x format (§D.4.4).
Thus, the simplest input operator for Dates is something like this:
    istream& operator>>(istream& s, Date& d)
    {
        istream::sentry guard(s);   // see §21.3.8
        if (!guard) return s;
        ios_base::iostate res = 0;
        tm x = { 0 };
        istreambuf_iterator<char,char_traits<char> > end;
        try {
            use_facet< time_get<char> >(s.getloc()).get_date(s,end,s,res,&x);
            if (res==0 || res==ios_base::eofbit)
                d = Date(x.tm_mday,Date::Month(x.tm_mon+1),x.tm_year+1900);
            else
                s.setstate(res);
        }
        catch (...) {
            handle_ioexception(s);   // see §D.4.2.2
        }
        return s;
    }
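The same get_date() call can be tried without the Date class by reading from a string. The sketch below assumes that the locale passed in accepts the text offered. For classic(), the %x layout is month/day/two-digit year, as in the 02/06/99 example of §D.4.4.3. The name parse_date is illustrative.

```cpp
#include <ctime>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Try to read a date from text using the time_get facet of loc.
// On success, fills *out and returns true.
bool parse_date(const std::string& text, const std::locale& loc, std::tm* out)
{
    std::istringstream is(text);
    is.imbue(loc);
    std::ios_base::iostate err = std::ios_base::goodbit;
    std::tm x = std::tm();
    std::istreambuf_iterator<char> begin(is), end;
    std::use_facet< std::time_get<char> >(loc).get_date(begin, end, is, err, &x);
    // eofbit alone is not an error: it only means the date ended the input
    if (err & (std::ios_base::failbit | std::ios_base::badbit)) return false;
    *out = x;
    return true;
}
```

Here the iterators are constructed explicitly. The operator above obtains them through the implicit conversion from the stream.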

The call get_date(s,end,s,res,&x) relies on two implicit conversions from istream: As the first
argument, s is used to construct an istreambuf_iterator. As third argument, s is converted to the
istream base class ios_base.
This input operator will work correctly for dates in the range that can be represented by time_t.
A trivial test case would be:
    int main()
    try {
        Date today;
        cout << today << endl;   // write using %x format
        Date d(12, Date::may, 1998);
        cout << d << endl;
        Date dd;
        while (cin >> dd) cout << dd << endl;   // read dates produced by %x format
    }
    catch (Date::Bad_date) {
        cout << "exit: bad date caught\n";
    }

A _byname version (§D.4, §D.4.1) of time_get is also provided:
    template <class Ch, class In = istreambuf_iterator<Ch> >
    class std::time_get_byname : public time_get<Ch,In> { /* ... */ };

D.4.4.5 A More Flexible Date Class
If you tried to use the Date class from §D.4.4.2 with the I/O from §D.4.4.3 and §D.4.4.4, you’d
soon find it restrictive:
[1] It can handle only dates that can be represented by a time_t; that typically means in the
    [1970,2038] range.
[2] It accepts dates only in the standard format, whatever that might be.
[3] Its reporting of input errors is unacceptable.
[4] It supports only streams of char, not streams of arbitrary character types.
A more interesting and more useful input operator would accept a wider range of dates, recognize a
few common formats, and reliably report errors in a useful form. To do this, we must depart from
the time_t representation:
    class Date {
    public:
        enum Month { jan=1, feb, mar, apr, may, jun, jul, aug, sep, oct, nov, dec };
        struct Bad_date {
            const char* why;
            Bad_date(const char* p) : why(p) { }
        };
        Date(int dd, Month mm, int yy, int day_of_week = 0);
        Date();
        void make_tm(tm* t) const;     // place tm representation of Date in *t
        time_t make_time_t() const;    // return time_t representation of Date
        int year() const { return y; }
        Month month() const { return m; }
        int day() const { return d; }
        // ...
    private:
        char d;
        Month m;
        int y;
    };

For simplicity, I reverted to the (d,m,y) representation (§10.2).
The constructor might be defined like this:
    Date::Date(int dd, Month mm, int yy, int day_of_week)
        :d(dd), m(mm), y(yy)
    {
        if (d==0 && m==Month(0) && y==0) return;   // Date(0,0,0) is the "null date"
        if (mm<jan || dec<mm) throw Bad_date("bad month");
        if (dd<1 || 31<dd)
            throw Bad_date("bad day of month");
        if (day_of_week && day_in_week(yy,mm,dd)!=day_of_week)
            throw Bad_date("bad day of week");
    }
    Date::Date() :d(0), m(Month(0)), y(0) { }   // a "null date"

The day_in_week() calculation is nontrivial and immaterial to the locale mechanisms, so I have
left it out. If you need one, your system will have one somewhere.
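For readers who want something to experiment with, here is one possible day_in_week(), a sketch rather than the version the text has in mind. It uses a well-known table-based method for the Gregorian calendar (often attributed to Tomohiko Sakamoto) and numbers days as tm_wday does, with 0 meaning Sunday. That numbering is an assumption.

```cpp
// Sketch (assumption): day of week for a Gregorian calendar date.
// Returns 0 for Sunday through 6 for Saturday, like tm_wday.
// month is in [1,12]; year is the full year (e.g., 1999).
int day_in_week(int year, int month, int day)
{
    static const int offset[] = { 0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4 };
    if (month < 3) year -= 1;   // count January and February with the previous year
    return (year + year/4 - year/100 + year/400 + offset[month-1] + day) % 7;
}
```

Note that the constructor above treats a day_of_week of 0 as "no day given," so with this numbering a caller cannot ask it to verify a Sunday.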
Comparison operations are always useful for types such as Date:
    bool operator==(const Date& x, const Date& y)
    {
        return x.year()==y.year() && x.month()==y.month() && x.day()==y.day();
    }
    bool operator!=(const Date& x, const Date& y)
    {
        return !(x==y);
    }


Having departed from the standard tm and time_t formats, we need conversion functions to
cooperate with software that expects those types:
    void Date::make_tm(tm* p) const   // put date into *p
    {
        tm x = { 0 };
        *p = x;
        p->tm_year = y-1900;
        p->tm_mday = d;
        p->tm_mon = m-1;
    }

    time_t Date::make_time_t() const
    {
        if (y<1970 || 2038<y)   // oversimplified
            throw Bad_date("date out of range for time_t");
        tm x;
        make_tm(&x);
        return mktime(&x);
    }
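The two conversions can be illustrated without the Date class. The sketch below fills a tm the same way make_tm() does (years counted from 1900, months from 0) and hands it to mktime(). The function names and the choice of local noon are assumptions made for the example.

```cpp
#include <ctime>

// Build a time_t for local noon on the given date; returns time_t(-1) on failure.
// year is the full year (e.g., 1999); month is in [1,12].
std::time_t to_time_t(int year, int month, int day)
{
    std::tm x = std::tm();
    x.tm_year = year - 1900;   // tm counts years from 1900
    x.tm_mon = month - 1;      // tm counts months from 0
    x.tm_mday = day;
    x.tm_hour = 12;            // noon avoids most daylight-saving edge cases
    x.tm_isdst = -1;           // let mktime() determine daylight saving time
    return std::mktime(&x);
}

// Day of week (0 means Sunday) as computed by mktime() and localtime(); -1 on failure.
int weekday_of(int year, int month, int day)
{
    std::time_t t = to_time_t(year, month, day);
    if (t == std::time_t(-1)) return -1;
    const std::tm* p = std::localtime(&t);
    return p ? p->tm_wday : -1;
}
```

Checking for time_t(-1) is a finer-grained alternative to the fixed [1970,2038] test, because the representable range depends on the platform's time_t.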

D.4.4.6 Specifying a Date Format
C++ doesn’t define a standard output format for dates (%x is as close as we get; §D.4.4.3).
However, even if a standard format existed, we would probably want to be able to use alternatives.
This could be done by providing a "default format" and a way of changing it. For example:

    class Date_format {
        static const char fmt[];   // default format
        const char* curr;          // current format
        const char* curr_end;
    public:
        Date_format() :curr(fmt), curr_end(fmt+strlen(fmt)) { }
        const char* begin() const { return curr; }
        const char* end() const { return curr_end; }
        void set(const char* p, const char* q) { curr=p; curr_end=q; }
        void set(const char* p) { curr=p; curr_end=curr+strlen(p); }
        static const char* default_fmt() { return fmt; }
    };
    const char Date_format::fmt[] = "%A, %B %d, %Y";   // e.g., Friday, February 05, 1999
    Date_format date_fmt;
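To see the effect of switching formats without the rest of the Date machinery, the following sketch pairs a cut-down stand-in for Date_format with strftime(). The names Format_holder and format_with, and the 128-character buffer, are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <ctime>
#include <string>

// Simplified stand-in for Date_format: holds the current format as a [begin,end) range.
class Format_holder {
    const char* curr;
    const char* curr_end;
public:
    explicit Format_holder(const char* p) : curr(p), curr_end(p + std::strlen(p)) { }
    void set(const char* p) { curr = p; curr_end = p + std::strlen(p); }
    const char* begin() const { return curr; }
    const char* end() const { return curr_end; }
};

// Format *t with strftime() according to the holder's current format.
std::string format_with(const Format_holder& f, const std::tm* t)
{
    std::string fmt(f.begin(), f.end());   // strftime() needs a zero-terminated format
    char buf[128];
    std::size_t n = std::strftime(buf, sizeof buf, fmt.c_str(), t);
    return std::string(buf, n);            // empty if the output did not fit
}
```

Like Date_format, the holder stores pointers only, so the format string passed to set() must outlive its use.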


To be able to use that strftime() format (§D.4.4.3), I have refrained from parameterizing the
Date_format class on the character type used. This implies that this solution allows only date
notations for which the format can be expressed as a char[]. I also used a global format object
(date_fmt) to provide a default Date format. Since the value of date_fmt can be changed, this
provides a crude way of controlling Date formatting, similar to the way global() (§D.2.3) can be
used to control formatting.
A more general solution is to add Date_in and Date_out facets to control reading and writing
from a stream. That approach is presented in §D.4.4.7.
Given Date_format, Date::operator<<() can be written like this:
    template<class Ch, class Tr>
    basic_ostream<Ch,Tr>& operator<<(basic_ostream<Ch,Tr>& s, const Date& d)
        // write according to user-specified format
    {
        typename basic_ostream<Ch,Tr>::sentry guard(s);   // see §21.3.8
        if (!guard) return s;
        tm t;
        d.make_tm(&t);
        try {
            const time_put<Ch>& f = use_facet< time_put<Ch> >(s.getloc());
            if (f.put(s,s,s.fill(),&t,date_fmt.begin(),date_fmt.end()).failed())
                s.setstate(ios_base::failbit);
        }
        catch (...) {
            handle_ioexception(s);   // see §D.4.2.2
        }
        return s;
    }
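The pattern-range form of put() used above can be exercised on its own. This is a sketch for streams of char only, and the function name put_pattern is an assumption.

```cpp
#include <cstring>
#include <ctime>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

// Format *t according to a strftime()-style pattern, using loc's time_put facet.
std::string put_pattern(const std::locale& loc, const std::tm* t, const char* pattern)
{
    std::ostringstream os;
    os.imbue(loc);
    const std::time_put<char>& tp = std::use_facet< std::time_put<char> >(loc);
    // This overload of put() takes the pattern as a [begin,end) character range
    // and copies characters that are not part of a format specifier unchanged.
    tp.put(std::ostreambuf_iterator<char>(os), os, os.fill(), t,
           pattern, pattern + std::strlen(pattern));
    return os.str();
}
```

Unlike a direct strftime() call, the result follows the locale passed in rather than the global C locale.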


I could have used has_facet to verify that s’s locale had a time_put<Ch> facet. However, here it
seemed simpler to handle that problem by catching any exception thrown by use_facet.

Here is a simple test program that controls the output format through date_fmt:
    int main()
    try {
        Date dd;
        while (cin >> dd && dd != Date()) cout << dd << endl;   // write using default date_fmt
        date_fmt.set("%Y/%m/%d");
        while (cin >> dd && dd != Date()) cout << dd << endl;   // write using "%Y/%m/%d"
    }
    catch (Date::Bad_date e) {
        cout << "bad date caught: " << e.why << endl;
    }


D.4.4.7 A Date Input Facet
As ever, input is a bit more difficult than output. However, because the interface to low-level input
is fixed by get_date() and because the operator>>() defined for Date in §D.4.4.4 didn’t directly
access the representation of a Date, we could use that operator>>() unchanged. Here is a
templatized version to match the operator<<() from §D.4.4.6:
    template<class Ch, class Tr>
    basic_istream<Ch,Tr>& operator>>(basic_istream<Ch,Tr>& s, Date& d)
    {
        typename basic_istream<Ch,Tr>::sentry guard(s);
        if (guard) try {
            ios_base::iostate res = 0;
            tm x = { 0 };
            istreambuf_iterator<Ch,Tr> end;
            use_facet< time_get<Ch> >(s.getloc()).get_date(s,end,s,res,&x);
            if (res==0 || res==ios_base::eofbit)
                d = Date(x.tm_mday,Date::Month(x.tm_mon+1),x.tm_year+1900,x.tm_wday);
            else
                s.setstate(res);
        }
        catch (...) {
            handle_ioexception(s);   // see §D.4.2.2
        }
        return s;
    }

This Date input operator calls get_date() from the istream’s time_get facet (§D.4.4.4).
Therefore, we can provide a different and more flexible form of input by defining a new facet derived
from time_get:
    template<class Ch, class In = istreambuf_iterator<Ch> >
    class Date_in : public std::time_get<Ch,In> {
    public:
        Date_in(size_t r = 0) : std::time_get<Ch,In>(r) { }
    protected:
        In do_get_date(In b, In e, ios_base& s, ios_base::iostate& r, tm* tmp) const;
    private:
        enum Vtype { novalue, unknown, dayofweek, month };
        In getval(In b, In e, ios_base& s, ios_base::iostate& r, int* v, Vtype* res) const;
    };


The getval() needs to read a year, a month, a day of the month, and optionally a day of the week
and compose the result into a tm.
The names of the months and the names of the days of the week are locale specific. Consequently,
we can’t mention them directly in our input function. Instead, we recognize months and
days by calling the functions that time_get provides for that: get_monthname() and
get_weekday() (§D.4.4.4).
The year, the day of the month, and possibly the month are represented as integers. Unfortunately,
a number does not indicate whether it denotes a day or a month, or whatever. For example,
7 could denote July, day 7 of a month, or even the year 2007. The real purpose of time_get’s
date_order() is to resolve such ambiguities.
The strategy of Date_in is to read values, classify them, and then use date_order() to see
whether (or how) the values entered make sense. The private getval() function does the actual
reading from the stream buffer and the initial classification:
    template<class Ch, class In>
    In Date_in<Ch,In>::getval(In b, In e,
            ios_base& s, ios_base::iostate& r, int* v, Vtype* res) const
        // read part of Date: number, day_of_week, or month. Skip whitespace and punctuation.
    {
        const ctype<Ch>& ct = use_facet< ctype<Ch> >(s.getloc());   // ctype is defined in §D.4.5
        Ch c;
        *res = novalue;   // no value found
        for (;;) {   // skip whitespace and punctuation
            if (b == e) return e;
            c = *b;
            if (!(ct.is(ctype_base::space,c) || ct.is(ctype_base::punct,c))) break;
            ++b;
        }
        if (ct.is(ctype_base::digit,c)) {   // read integer without regard for numpunct
            int i = 0;
            do {   // turn digit from arbitrary character set into decimal value:
                static char const digits[] = "0123456789";
                i = i*10 + (find(digits,digits+10,ct.narrow(c,' '))-digits);
                if (++b == e) break;   // don't dereference the end iterator
                c = *b;
            } while (ct.is(ctype_base::digit,c));
            *v = i;
            *res = unknown;   // an integer, but we don’t know what it represents
            return b;
        }
        if (ct.is(ctype_base::alpha,c)) {   // look for name of month or day of week
            basic_string<Ch> str;
            while (ct.is(ctype_base::alpha,c)) {   // read characters into string
                str += c;
                if (++b == e) break;
                c = *b;
            }
            tm t;
            basic_stringstream<Ch> ss(str);
            typedef istreambuf_iterator<Ch> SI;   // iterator type for ss’ buffer
            get_monthname(ss.rdbuf(),SI(),s,r,&t);   // read from in-memory stream buffer
            if ((r&(ios_base::badbit|ios_base::failbit))==0) {
                *v = t.tm_mon;
                *res = month;
                return b;
            }
            r = 0;   // clear state before trying to read a second time
            get_weekday(ss.rdbuf(),SI(),s,r,&t);   // read from in-memory stream buffer
            if ((r&(ios_base::badbit|ios_base::failbit))==0) {
                *v = t.tm_wday;
                *res = dayofweek;
                return b;
            }
        }
        r |= ios_base::failbit;
        return b;
    }

The tricky part here is to distinguish months from weekdays. We read through input iterators, so
we cannot read [b,e) twice, looking first for a month and then for a day. On the other hand, we
cannot look at one character at a time and decide, because only get_monthname() and
get_weekday() know which character sequences make up the names of the months and the names
of the days of the week in a given locale. The solution I chose was to read strings of alphabetic
characters into a string, make a stringstream from that string, and then repeatedly read from that
stream’s streambuf.
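The idea of buffering a word and then offering it first to get_monthname() and then to get_weekday() can be sketched as a free function. This sketch departs from getval() in one respect, which is an assumption of mine: it builds a fresh in-memory stream for each attempt, so the second read is guaranteed to start at the first character. The names Name_kind and classify_name are hypothetical.

```cpp
#include <ctime>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>

enum Name_kind { not_a_name, month_name, weekday_name };   // hypothetical classification

// Classify word as a month or weekday name in loc; *value receives tm_mon or tm_wday.
Name_kind classify_name(const std::string& word, const std::locale& loc, int* value)
{
    typedef std::istreambuf_iterator<char> SI;
    const std::time_get<char>& tg = std::use_facet< std::time_get<char> >(loc);
    const std::ios_base::iostate bad = std::ios_base::badbit | std::ios_base::failbit;
    std::tm t = std::tm();

    std::istringstream as_month(word);   // first attempt: a month name
    as_month.imbue(loc);
    std::ios_base::iostate r = std::ios_base::goodbit;
    tg.get_monthname(SI(as_month), SI(), as_month, r, &t);
    if ((r & bad) == 0) { *value = t.tm_mon; return month_name; }

    std::istringstream as_day(word);     // second attempt: the same characters, read again
    as_day.imbue(loc);
    r = std::ios_base::goodbit;          // clear state before trying a second time
    tg.get_weekday(SI(as_day), SI(), as_day, r, &t);
    if ((r & bad) == 0) { *value = t.tm_wday; return weekday_name; }

    return not_a_name;
}
```

As in getval(), only the fail and bad bits count as errors, because reaching the end of the buffered word legitimately sets eofbit.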
The error recording uses the state bits, such as ios_base::badbit, directly. This is necessary
because the more convenient functions for manipulating stream state, such as clear() and
setstate(), are defined in basic_ios rather than in its base ios_base (§21.3.3). If necessary, the >>
operator then uses the error results reported by get_date() to reset the state of the input stream.
Given getval(), we can read values first and then try to see whether they make sense later. The
date_order() can be crucial:

    template<class Ch, class In>
    In Date_in<Ch,In>::do_get_date(In b, In e,
            ios_base& s, ios_base::iostate& r, tm* tmp) const
        // optional day of week followed by ymd, dmy, mdy, or ydm
    {
        int val[3];                   // for day, month, and year values in some order
        Vtype res[3] = { novalue };   // for value classifications
        for (int i=0; b!=e && i<3; ++i) {   // read day, month, and year
            b = getval(b,e,s,r,&val[i],&res[i]);
            if (r) return b;   // oops: error
            if (res[i]==novalue) {   // couldn’t complete date
                r |= ios_base::badbit;
                return b;
            }
            if (res[i]==dayofweek) {
                tmp->tm_wday = val[i];
                --i;   // oops: not a day, month, or year
            }
        }
        time_base::dateorder order = date_order();   // now try to make sense of the values read
        if (res[0] == month) {   // mdy or error
            // ...
        }
        else if (res[1] == month) {   // dmy or ymd or error
            tmp->tm_mon = val[1];
            switch (order) {
            case dmy:
                tmp->tm_mday = val[0];
                tmp->tm_year = val[2];
                break;
            case ymd:
                tmp->tm_year = val[0];
                tmp->tm_mday = val[2];
                break;
            default:
                r |= ios_base::badbit;
                return b;
            }
        }
        else if (res[2] == month) {   // ydm or error
            // ...
        }
        else {   // rely on dateorder or error
            // ...
        }
        tmp->tm_year -= 1900;   // adjust base year to suit tm convention
        return b;
    }


I have omitted bits of code that do not add to the understanding of locales, dates, or the handling of input. Writing better and more general date input functions is left as an exercise (§D.6[9-10]).
Here is a simple test program:

    int main()
    try {
        cin.imbue(loc(locale(),new Date_in));   // read Dates using Date_in
        while (cin >> dd && dd != Date()) cout << dd << endl;
    }
    catch (Date::Bad_date e) {
        cout << "bad date caught: " << e.why << endl;
    }

Note that do_get_date() will accept meaningless dates, such as

    Thursday October 7, 1998

and

    1999/Feb/31

The checks for consistency of the year, month, day, and optional day of the week are done in Date’s constructor. It is the Date class’s job to know what constitutes a correct date, and it is not necessary for Date_in to share that knowledge.
It would be possible to have getval() or do_get_date() guess about the meaning of numeric values. For example,

    12 May 1922

is clearly not the day 1922 of year 12. That is, we could ‘‘guess’’ that a numeric value that couldn’t be a day of the specified month must be a year. Such ‘‘guessing’’ can be useful in specific constrained contexts. However, it is not a good idea in more general contexts. For example,

    12 May 15

could be a date in the year 12, 15, 1912, 1915, 2012, or 2015. Sometimes, a better approach is to augment the notation with clues that disambiguate years and days. For example, 1st and 15th are clearly days of a month. Similarly, 751BC and 1453AD are explicitly identified as years.
D.4.5 Character Classification
When reading characters from input, it is often necessary to classify them to make sense of what is being read. For example, to read a number, an input routine needs to know which characters are digits. Similarly, §6.1.2 showed a use of standard character classification functions for parsing input.
Naturally, classification of characters depends on the alphabet used. Consequently, a facet ctype is provided to represent character classification in a locale.
The character classes are described by an enumeration called mask:



    class std::ctype_base {
    public:
        enum mask {               // the actual values are implementation defined
            space = 1,            // whitespace (in "C" locale: ' ', '\n', '\t', ...)
            print = 1<<1,         // printing characters
            cntrl = 1<<2,         // control characters
            upper = 1<<3,         // uppercase characters
            lower = 1<<4,         // lowercase characters
            alpha = 1<<5,         // alphabetic characters
            digit = 1<<6,         // decimal digits
            punct = 1<<7,         // punctuation characters
            xdigit = 1<<8,        // hexadecimal digits
            alnum=alpha|digit,    // alphanumeric characters
            graph=alnum|punct
        };
    };

This mask doesn’t depend on a particular character type. Consequently, this enumeration is placed in a (non-template) base class.
Clearly, mask reflects the traditional C and C++ classification (§20.4.1). However, for different character sets, different character values fall into different classes. For example, for the ASCII character set, the integer value 125 represents the character '}', which is a punctuation character (punct). However, in the Danish national character set, 125 represents the vowel 'å', which in a Danish locale must be classified as an alpha.
The classification is called a ‘‘mask’’ because the traditional efficient implementation of character classification for small character sets is a table in which each entry holds bits representing the classification. For example:

    table['a'] == lower|alpha|xdigit
    table['1'] == digit
    table[' '] == space

Given that implementation, table[c]&m is nonzero if the character c is an m and 0 otherwise.
The ctype facet is defined like this:

    template <class Ch>
    class std::ctype : public locale::facet, public ctype_base {
    public:
        typedef Ch char_type;
        explicit ctype(size_t r = 0);

        bool is(mask m, Ch c) const;   // is "c" an "m"?

        // place classification for each Ch in [b:e) into v:
        const Ch* is(const Ch* b, const Ch* e, mask* v) const;

        const Ch* scan_is(mask m, const Ch* b, const Ch* e) const;    // find an m
        const Ch* scan_not(mask m, const Ch* b, const Ch* e) const;   // find a non-m

        Ch toupper(Ch c) const;
        const Ch* toupper(Ch* b, const Ch* e) const;   // convert [b:e)
        Ch tolower(Ch c) const;
        const Ch* tolower(Ch* b, const Ch* e) const;

        Ch widen(char c) const;
        const char* widen(const char* b, const char* e, Ch* b2) const;
        char narrow(Ch c, char def) const;
        const Ch* narrow(const Ch* b, const Ch* e, char def, char* b2) const;

        static locale::id id;   // facet identifier object (§D.2, §D.3, §D.3.1)

    protected:
        ~ctype();
        // virtual ‘‘do_’’ functions for public functions (see §D.4.1)
    };

A call is(m,c) tests whether the character c belongs to the classification m. For example:

    int count_spaces(const string& s, const locale& loc)
    {
        const ctype<char>& ct = use_facet< ctype<char> >(loc);
        int i = 0;
        for(string::const_iterator p = s.begin(); p != s.end(); ++p)
            if (ct.is(ctype_base::space,*p)) ++i;   // whitespace as defined by ct
        return i;
    }

Note that it is also possible to use is() to check whether a character belongs to one of a number of classifications. For example:

    ct.is(ctype_base::space|ctype_base::punct,c);   // is c whitespace or punctuation in ct?

A call is(b,e,v) determines the classification of each character in [b,e) and places it in the corresponding position in the array v.
A call scan_is(m,b,e) returns a pointer to the first character in [b,e) that is an m. If no character is classified as an m, e is returned. As ever for standard facets, the public member function is implemented by a call to its ‘‘do_’’ virtual function. A simple implementation might be:

    template <class Ch>
    const Ch* std::ctype<Ch>::do_scan_is(mask m, const Ch* b, const Ch* e) const
    {
        while (b!=e && !is(m,*b)) ++b;
        return b;
    }

A call scan_not(m,b,e) returns a pointer to the first character in [b,e) that is not an m. If all characters are classified as m, e is returned.
A call toupper(c) returns the uppercase version of c if such a version exists in the character set used and c itself otherwise.
A call toupper(b,e) converts each character in the range [b,e) to uppercase and returns e. A simple implementation might be:


    template <class Ch>
    const Ch* std::ctype<Ch>::to_upper(Ch* b, const Ch* e)
    {
        for (; b!=e; ++b) *b = toupper(*b);
        return e;
    }

The tolower() functions are similar to toupper() except that they convert to lowercase.
A call widen(c) transforms the character c into its corresponding Ch value. If Ch’s character set provides several characters corresponding to c, the standard specifies that ‘‘the simplest reasonable transformation’’ be used. For example,

    wcout << use_facet< ctype<wchar_t> >(wcout.getloc()).widen('e');

will output a reasonable equivalent to the character e in wcout’s locale.
Translation between unrelated character representations, such as ASCII and EBCDIC, can also be done by using widen(). For example, assume that an ebcdic locale exists:

    char EBCDIC_e = use_facet< ctype<char> >(ebcdic).widen('e');

A call widen(b,e,v) takes each character in the range [b,e) and places a widened version in the corresponding position in the array v.
A call narrow(ch,def) produces a char value corresponding to the character ch from the Ch type. Again, ‘‘the simplest reasonable transformation’’ is to be used. If no such corresponding char exists, def is returned.
A call narrow(b,e,def,v) takes each character in the range [b,e) and places a narrowed version in the corresponding position in the array v.
The general idea is that narrow() converts from a larger character set to a smaller one and that widen() performs the inverse operation. For a character c from the smaller character set, we expect:

    c == narrow(widen(c),0)   // not guaranteed

This is true provided that the character represented by c has only one representation in ‘‘the smaller character set.’’ However, that is not guaranteed. If the characters represented by a char are not a subset of those represented by the larger character set (Ch), we should expect anomalies and potential problems with code treating characters generically.
Similarly, for a character ch from the larger character set, we might expect:

    widen(narrow(ch,def)) == ch || widen(narrow(ch,def)) == widen(def)   // not guaranteed

However, even though this is often the case, it cannot be guaranteed for a character that is represented by several values in the larger character set but only once in the smaller character set. For example, a digit, such as 7, often has several separate representations in a large character set. The reason for that is typically that a large character set has several conventional character sets as subsets and that the characters from the smaller sets are replicated for ease of conversion.
For every character in the basic source character set (§C.3.3), it is guaranteed that

    widen(narrow(ch_lit,0)) == ch_lit


For example:

    widen(narrow('x',0)) == 'x'


The narrow() and widen() functions respect character classifications wherever possible. For example, if is(alpha,c), then is(alpha,narrow(c,'a')) and is(alpha,widen(c)) wherever alpha is a valid mask for the locale used.
A major reason for using a ctype facet in general and for using narrow() and widen() functions in particular is to be able to write code that does I/O and string manipulation for any character set; that is, to make such code generic with respect to character sets. This implies that iostream implementations depend critically on these facilities. By relying on <iostream> and <string>, a user can avoid most direct uses of the ctype facet.
A _byname version (§D.4, §D.4.1) of ctype is provided:

    template <class Ch> class std::ctype_byname : public ctype<Ch> { /* ... */ };

D.4.5.1 Convenience Interfaces
The most common use of the ctype facet is to inquire whether a character belongs to a given classification. Consequently, a set of functions is provided for that:

    template <class Ch> bool isspace(Ch c, const locale& loc);
    template <class Ch> bool isprint(Ch c, const locale& loc);
    template <class Ch> bool iscntrl(Ch c, const locale& loc);
    template <class Ch> bool isupper(Ch c, const locale& loc);
    template <class Ch> bool islower(Ch c, const locale& loc);
    template <class Ch> bool isalpha(Ch c, const locale& loc);
    template <class Ch> bool isdigit(Ch c, const locale& loc);
    template <class Ch> bool ispunct(Ch c, const locale& loc);
    template <class Ch> bool isxdigit(Ch c, const locale& loc);
    template <class Ch> bool isalnum(Ch c, const locale& loc);
    template <class Ch> bool isgraph(Ch c, const locale& loc);

These functions are trivially implemented by using use_facet. For example:

    template <class Ch>
    inline bool isspace(Ch c, const locale& loc)
    {
        return use_facet< ctype<Ch> >(loc).is(space,c);
    }

The one-argument versions of these functions, presented in §20.4.2, are simply these functions for the current C global locale (not the global C++ locale, locale()). Except for the rare cases in which the C global locale and the C++ global locale differ (§D.2.3), we can think of a one-argument version as the two-argument version applied to locale(). For example:

    inline int isspace(int i)
    {
        return isspace(i,locale());   // almost
    }



D.4.6 Character Code Conversion
Sometimes, the representation of characters stored in a file differs from the desired representation of those same characters in main memory. For example, Japanese characters are often stored in files in which indicators (‘‘shifts’’) tell to which of the four common character sets (kanji, katakana, hiragana, and romaji) a given sequence of characters belongs. This is a bit unwieldy because the meaning of each byte depends on its ‘‘shift state,’’ but it can save memory because only a kanji requires more than one byte for its representation. In main memory, these characters are easier to manipulate when represented in a multi-byte character set where every character has the same size. Such characters (for example, Unicode characters) are typically placed in wide characters (wchar_t; §4.3). Consequently, the codecvt facet provides a mechanism for converting characters from one representation to another as they are read or written. For example:

    Disk representation:            JIS
                                    I/O conversions controlled by codecvt
    Main memory representation:     Unicode

This code-conversion mechanism is general enough to provide arbitrary conversions of character representations. It allows us to write a program to use a suitable internal character representation (stored in char, wchar_t, or whatever) and to then accept a variety of input character stream representations by adjusting the locale used by iostreams. The alternative would be to modify the program itself or to convert input and output files from/to a variety of formats.
The codecvt facet provides conversion between different character sets when a character is moved between a stream buffer and external storage:

    class std::codecvt_base {
    public:
        enum result { ok, partial, error, noconv };   // result indicators
    };

    template <class I, class E, class State>
    class std::codecvt : public locale::facet, public codecvt_base {
    public:
        typedef I intern_type;
        typedef E extern_type;
        typedef State state_type;

        explicit codecvt(size_t r = 0);

        result in(State&, const E* from, const E* from_end, const E*& from_next,   // read
                I* to, I* to_end, I*& to_next) const;

        result out(State&, const I* from, const I* from_end, const I*& from_next,  // write
                E* to, E* to_end, E*& to_next) const;

        result unshift(State&, E* to, E* to_end, E*& to_next) const;   // end character sequence

        int encoding() const throw();         // characterize basic encoding properties
        bool always_noconv() const throw();   // can we do I/O without code translation?

        int length(const State&, const E* from, const E* from_end, size_t max) const;
        int max_length() const throw();       // maximum possible length()

        static locale::id id;   // facet identifier object (§D.2, §D.3, §D.3.1)

    protected:
        ~codecvt();
        // virtual ‘‘do_’’ functions for public functions (see §D.4.1)
    };

A codecvt facet is used by basic_filebuf (§21.5) to read or write characters. A basic_filebuf obtains this facet from the stream’s locale (§21.7.1).
The State template argument is the type used to hold the shift state of the stream being converted. State can also be used to identify different conversions by specifying a specialization. The latter is useful because characters of a variety of character encodings (character sets) can be stored in objects of the same type. For example:

    class JISstate { /* .. */ };

    p = new codecvt<wchar_t,char,mbstate_t>;   // standard char to wide char
    q = new codecvt<wchar_t,char,JISstate>;    // JIS to wide char

Without the different State arguments, there would be no way for the facet to know which encoding to assume for the stream of chars. The mbstate_t type from <cwchar> or <wchar.h> identifies the system’s standard conversion between char and wchar_t.
A new codecvt can also be created as a derived class and identified by name. For example:

    class JIScvt : public codecvt<wchar_t,char,mbstate_t> { /* ... */ };

A call in(s,from,from_end,from_next,to,to_end,to_next) reads each character in the range [from,from_end) and tries to convert it. If a character is converted, in() writes its converted form to the corresponding position in the [to,to_end) range; if not, in() stops at that point. Upon return, in() stores the position one-beyond-the-last character read in from_next and the position one-beyond-the-last character written in to_next. The result value returned by in() indicates how much work was done:

    ok:        all characters in the [from,from_end) range converted
    partial:   not all characters in the [from,from_end) range were converted
    error:     in() encountered a character it couldn’t convert
    noconv:    no conversion was needed

Note that a partial conversion is not necessarily an error. Possibly more characters have to be read before a multibyte character is complete and can be written, or maybe the output buffer has to be emptied to make room for more characters.


The s argument of type State indicates the state of the input character sequence at the start of the call of in(). This is significant when the external character representation uses shift states. Note that s is a (non-const) reference argument: At the end of the call, s holds the shift state of the input sequence. This allows a programmer to deal with partial conversions and to convert a long sequence using several calls to in().
A call out(s,from,from_end,from_next,to,to_end,to_next) converts [from,from_end) from the internal to the external representation in the same way that in() converts from the external to the internal representation.
A character stream must start and end in a ‘‘neutral’’ (unshifted) state. Typically, that state is State(). A call unshift(s,to,to_end,to_next) looks at s and places characters in [to,to_end) as needed to bring a sequence of characters back to that unshifted state. The result of unshift() and the use of to_next are handled just as for out().
A call length(s,from,from_end,max) returns the number of characters that in() could convert from [from,from_end).
A call encoding() returns

    -1   if the encoding of the external character set uses state (for example, uses shift and unshift character sequences)
    0    if the encoding uses a varying number of bytes to represent individual characters (for example, a character representation might use a bit in a byte to indicate whether one or two bytes are used to represent that character)
    n    if every character of the external character representation is n bytes

A call always_noconv() returns true if no conversion is required between the internal and the external character sets and false otherwise. Clearly, always_noconv()==true opens the possibility for the implementation to provide the maximally efficient implementation that simply doesn’t invoke the conversion functions.
A call max_length() returns the maximum value that length() can return for a valid set of arguments.

The simplest code conversion that I can think of is one that converts input to uppercase. Thus, this is about as simple as a codecvt can be and still perform a service:

    class Cvt_to_upper : public codecvt<char,char,mbstate_t> {   // convert to uppercase

        explicit Cvt_to_upper(size_t r = 0) : codecvt(r) { }

    protected:
        // read external representation write internal representation:
        result do_in(State& s, const char* from, const char* from_end, const char*& from_next,
                char* to, char* to_end, char*& to_next) const;

        // read internal representation write external representation:
        result do_out(State& s, const char* from, const char* from_end, const char*& from_next,
                char* to, char* to_end, char*& to_next) const
        {
            return codecvt<char,char,mbstate_t>::do_out
                (s,from,from_end,from_next,to,to_end,to_next);
        }

        result do_unshift(State&, E* to, E* to_end, E*& to_next) const { return ok; }

        int do_encoding() const throw() { return 1; }
        bool do_always_noconv() const throw() { return false; }

        int do_length(const State&, const E* from, const E* from_end, size_t max) const;
        int do_max_length() const throw();   // maximum possible length()
    };

    codecvt<char,char,mbstate_t>::result
    Cvt_to_upper::do_in(State& s, const char* from, const char* from_end,
            const char*& from_next, char* to, char* to_end, char*& to_next) const
    {
        // ... §D.6[16] ...
    }

    int main()   // trivial test
    {
        locale ulocale(locale(), new Cvt_to_upper);
        cin.imbue(ulocale);

        char ch;
        while (cin>>ch) cout << ch;
    }


A _byname version (§D.4, §D.4.1) of codecvt is provided:

    template <class I, class E, class State>
    class std::codecvt_byname : public codecvt<I,E,State> { /* ... */ };

D.4.7 Messages
Naturally, most end users prefer to use their native language to interact with a program. However,
we cannot provide a standard mechanism for expressing locale-specific general interactions.
Instead, the library provides a simple mechanism for keeping a locale-specific set of strings from
which a programmer can compose simple messages. In essence, messages implements a trivial
read-only database:

    class std::messages_base {
    public:
        typedef int catalog;    // catalog identifier type
    };

    template <class Ch>
    class std::messages : public locale::facet, public messages_base {
    public:
        typedef Ch char_type;
        typedef basic_string<Ch> string_type;

        explicit messages(size_t r = 0);

The C++ Programming Language, Special Edition by Bjarne Stroustrup. Copyright ©2000 by AT&T.
Published by Addison Wesley Inc. ISBN 0-201-70073-5. All rights reserved.


        catalog open(const basic_string<char>& fn, const locale&) const;
        string_type get(catalog c, int set, int msgid, const string_type& d) const;
        void close(catalog c) const;

        static locale::id id;    // facet identifier object (§D.2, §D.3, §D.3.1)
    protected:
        ~messages();

        // virtual ‘‘do_’’ functions for public functions (see §D.4.1)
    };

A call open(s,loc) opens a ‘‘catalog’’ of messages called s for the locale loc. A catalog is a set
of strings organized in an implementation-specific way and accessed through the messages::get()
function. A negative value is returned if no catalog named s can be opened. A catalog must be
opened before the first use of get().
A call close(cat) closes the catalog identified by cat and frees all resources associated with
that catalog.
A call get(cat,set,id,"foo") looks for a message identified by (set,id) in the catalog cat.
If a string is found, get() returns that string; otherwise, get() returns the default string (here,
string("foo")).
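The open/get/close protocol described above might be wrapped as follows. This is only a sketch: message_or() and fetch_once() are hypothetical helper names, and which catalog names can actually be opened, and what open() returns for an unknown name, depends on the implementation.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: look up message (set,id) in catalog cat,
// returning def when the catalog identifier is negative (open() failed).
std::string message_or(const std::messages<char>& m,
                       std::messages_base::catalog cat,
                       int set, int id, const std::string& def)
{
    if (cat < 0) return def;            // no catalog: do not call get() at all
    return m.get(cat, set, id, def);    // get() returns def if (set,id) is not found
}

// Hypothetical helper: open catalog `name` for the current global locale,
// fetch one message, and close the catalog again.
std::string fetch_once(const std::string& name, int set, int id, const std::string& def)
{
    std::locale loc;    // copy of the current global locale
    const std::messages<char>& m = std::use_facet< std::messages<char> >(loc);
    std::messages_base::catalog cat = m.open(name, loc);
    std::string result = message_or(m, cat, set, id, def);
    if (cat >= 0) m.close(cat);         // release the catalog's resources
    return result;
}
```

Keeping the negative-catalog check in one place means callers never pass an identifier from a failed open() to get() or close().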
Here is an example of a messages facet for an implementation in which a message catalog is a
vector of sets of ‘‘messages’’ and a ‘‘message’’ is a string:

    struct Set {
        vector<string> msgs;
    };

    struct Cat {
        vector<Set> sets;
    };

    class My_messages : public messages<char> {
        vector<Cat>& catalogs;
    public:
        explicit My_messages(size_t = 0) :catalogs(*new vector<Cat>) { }

        catalog do_open(const string& s, const locale& loc) const;      // open catalog s
        string do_get(catalog c, int s, int m, const string&) const;    // get message (s,m) in c
        void do_close(catalog cat) const
        {
            // erase only if cat identifies an existing catalog
            if (0<=cat && cat<catalogs.size()) catalogs.erase(catalogs.begin()+cat);
        }

        ~My_messages() { delete &catalogs; }
    };

All messages’ member functions are const, so the catalog data structure (the vector<Cat>) is stored
outside the facet.
A message is selected by specifying a catalog, a set within that catalog, and a message string
within that set. A string is supplied as an argument, to be used as a default result in case no message is found in the catalog:

    string My_messages::do_get(catalog cat, int set, int msg, const string& def) const
    {
        if (catalogs.size()<=cat) return def;
        Cat& c = catalogs[cat];
        if (c.sets.size()<=set) return def;
        Set& s = c.sets[set];
        if (s.msgs.size()<=msg) return def;
        return s.msgs[msg];
    }

Opening a catalog involves reading a textual representation from disk into a Cat structure. Here, I
chose a representation that is trivial to read. A set is delimited by <<< and >>>, and each message
is a line of text:
    messages<char>::catalog My_messages::do_open(const string& n, const locale& loc) const
    {
        string nn = n + locale().name();
        ifstream f(nn.c_str());
        if (!f) return -1;

        catalogs.push_back(Cat());    // make in-core catalog
        Cat& c = catalogs.back();
        string s;
        // read whole lines so that the newline after <<< is not taken as an empty first message
        while (getline(f,s) && s=="<<<") {    // read Set
            c.sets.push_back(Set());
            Set& ss = c.sets.back();
            while (getline(f,s) && s != ">>>") ss.msgs.push_back(s);    // read message
        }
        return catalogs.size()-1;
    }
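The <<< ... >>> representation can be tried out without a facet or a file. The following sketch parses it from any istream and looks messages up with the same bounds checks as do_get(). The function names read_catalog(), lookup(), and catalog_from_text() are assumptions made for this example; the reader works line by line with getline() so that the line break after <<< is not stored as an empty message.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Set { std::vector<std::string> msgs; };
struct Cat { std::vector<Set> sets; };

// Read sets delimited by the lines "<<<" and ">>>"; each line in between is one message.
// Reading stops at end of input or at the first line that does not open a set.
Cat read_catalog(std::istream& in)
{
    Cat c;
    std::string line;
    while (std::getline(in, line) && line == "<<<") {
        c.sets.push_back(Set());
        Set& s = c.sets.back();
        while (std::getline(in, line) && line != ">>>") s.msgs.push_back(line);
    }
    return c;
}

// Return message (set,msg) of c, or def if either index is out of range.
std::string lookup(const Cat& c, int set, int msg, const std::string& def)
{
    if (set < 0 || c.sets.size() <= static_cast<std::size_t>(set)) return def;
    const Set& s = c.sets[set];
    if (msg < 0 || s.msgs.size() <= static_cast<std::size_t>(msg)) return def;
    return s.msgs[msg];
}

// Hypothetical convenience wrapper: parse a catalog held in a string.
Cat catalog_from_text(const std::string& text)
{
    std::istringstream in(text);
    return read_catalog(in);
}
```

For example, parsing a text with the two sets (hello, goodbye) and (yes, no, maybe) and asking for message (1,2) yields maybe, while (1,3) falls back to the default string.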

Here is a trivial use:
    int main()
    {
        if (!has_facet<My_messages>(locale())) {
            cerr << "no messages facet found in " << locale().name() << '\n';
            exit(1);
        }

        const messages<char>& m = use_facet<My_messages>(locale());
        extern string message_directory;    // where I keep my messages

        int cat = m.open(message_directory,locale());
        if (cat<0) {
            cerr << "no catalog found\n";
            exit(1);
        }

        cout << m.get(cat,0,0,"Missed again!") << endl;
        cout << m.get(cat,1,2,"Missed again!") << endl;
        cout << m.get(cat,1,3,"Missed again!") << endl;
        cout << m.get(cat,3,0,"Missed again!") << endl;
    }

If the catalog is

    <<<
    hello
    goodbye
    >>>
    <<<
    yes
    no
    maybe
    >>>

this program prints

    hello
    maybe
    Missed again!
    Missed again!

D.4.7.1 Using Messages from Other Facets

In addition to being a repository for locale-dependent strings used to communicate with users, messages
can be used to hold strings for other facets. For example, the Season_io facet (§D.3.2) could
have been written like this:
    class Season_io : public locale::facet {
        const messages<char>& m;    // message directory
        int cat;                    // message catalog
    public:
        class Missing_messages { };

        Season_io(int i = 0)
            : locale::facet(i),
              m(use_facet<Season_messages>(locale())),
              cat(m.open(message_directory,locale()))
        { if (cat<0) throw Missing_messages(); }

        ~Season_io() { }    // to make it possible to destroy Season_io objects (§D.3)

        const string& to_str(Season x) const;    // string representation of x

        bool from_str(const string& s, Season& x) const;    // place Season corresponding to s in x

        static locale::id id;    // facet identifier object (§D.2, §D.3, §D.3.1)
    };

    locale::id Season_io::id;    // define the identifier object

    const string& Season_io::to_str(Season x) const
    {
        // m is a reference, so use m.get(); get() takes (catalog,set,msgid,default),
        // and set 0 is assumed here to hold the season names
        return m.get(cat,0,x,"no-such-season");
    }

    bool Season_io::from_str(const string& s, Season& x) const
    {
        for (int i = Season::spring; i<=Season::winter; i++)
            if (m.get(cat,0,i,"no-such-season") == s) {    // set 0 assumed, as in to_str()
                x = Season(i);
                return true;
            }
        return false;
    }

This messages-based solution differs from the original solution (§D.3.2) in that the implementer of
a set of Season strings for a new locale needs to be able to add them to a messages directory. This
is easy for someone adding a new locale to an execution environment. However, since messages
provides only a read-only interface, adding a new set of season names may be beyond the scope of
an application programmer.
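To see the idea work end to end without a message directory on disk, one can install a messages<char> facet whose catalog lives in memory and let the season functions fetch their strings from whatever messages facet the locale holds. The sketch below assumes C++11 or later; Table_messages, the free functions to_str() and from_str(), the catalog name "seasons", the use of set 0, and italian_seasons() are all inventions for this example and not part of Season_io as presented above.

```cpp
#include <cstddef>
#include <locale>
#include <string>
#include <vector>

enum Season { spring, summer, fall, winter };

// Hypothetical in-memory messages facet: one catalog (identifier 0)
// whose set 0 holds the season names in enumerator order.
class Table_messages : public std::messages<char> {
    std::vector<std::string> names;
public:
    explicit Table_messages(const std::vector<std::string>& n, std::size_t refs = 0)
        : std::messages<char>(refs), names(n) { }
protected:
    catalog do_open(const std::basic_string<char>&, const std::locale&) const override
    { return 0; }    // every name opens the single catalog

    string_type do_get(catalog c, int set, int id, const string_type& def) const override
    {
        if (c != 0 || set != 0 || id < 0 || names.size() <= static_cast<std::size_t>(id))
            return def;
        return names[id];
    }

    void do_close(catalog) const override { }    // nothing to release
};

// Season -> name, using the messages<char> facet found in loc.
std::string to_str(const std::locale& loc, Season x)
{
    const std::messages<char>& m = std::use_facet< std::messages<char> >(loc);
    std::messages_base::catalog cat = m.open("seasons", loc);    // made-up catalog name
    if (cat < 0) return "no-such-season";
    std::string s = m.get(cat, 0, x, "no-such-season");
    m.close(cat);
    return s;
}

// name -> Season; returns false and leaves x unchanged if s names no season.
bool from_str(const std::locale& loc, const std::string& s, Season& x)
{
    for (int i = spring; i <= winter; ++i)
        if (to_str(loc, Season(i)) == s) { x = Season(i); return true; }
    return false;
}

// Build a locale that differs from the classic locale only in its messages facet.
std::locale italian_seasons()
{
    std::vector<std::string> n;
    n.push_back("primavera");
    n.push_back("estate");
    n.push_back("autunno");
    n.push_back("inverno");
    return std::locale(std::locale::classic(), new Table_messages(n));
}
```

Because Table_messages inherits its id from messages<char>, the new facet replaces the standard one in the constructed locale, so code written against messages<char> picks up the new strings without change.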
A _byname version (§D.4, §D.4.1) of messages is provided:

    template <class Ch>
    class std::messages_byname : public messages<Ch> { /* ... */ };

D.5 Advice
[1] Expect that every nontrivial program or system that interacts directly with people will be used
in several different countries; §D.1.
[2] Don’t assume that everyone uses the same character set as you do; §D.4.1.
[3] Prefer using locales to writing ad hoc code for culture-sensitive I/O; §D.1.
[4] Avoid embedding locale name strings in program text; §D.2.1.
[5] Minimize the use of global format information; §D.2.3, §D.4.4.7.
[6] Prefer locale-sensitive string comparisons and sorts; §D.2.4, §D.4.1.
[7] Make facets immutable; §D.2.2, §D.3.
[8] Keep changes of locale to a few places in a program; §D.2.3.
[9] Let locale handle the lifetime of facets; §D.3.
[10] When writing locale-sensitive I/O functions, remember to handle exceptions from user-supplied (overriding) functions; §D.4.2.2.
[11] Use a simple Money type to hold monetary values; §D.4.3.
[12] Use simple user-defined types to hold values that require locale-sensitive I/O (rather than casting to and from values of built-in types); §D.4.3.
[13] Don’t believe timing figures until you have a good idea of all factors involved; §D.4.4.1.
[14] Be aware of the limitations of time_t; §D.4.4.1, §D.4.4.5.
[15] Use a date-input routine that accepts a range of input formats; §D.4.4.5.
[16] Prefer the character classification functions in which the locale is explicit; §D.4.5, §D.4.5.1.
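
Items [6] and [16] can be illustrated briefly. In this sketch the function names locale_sort() and count_upper() are made up; the code relies on two standard facilities: a locale object can be passed to sort() as the comparison criterion (it compares strings with its collate facet), and <locale> provides classification functions such as isupper() that take the locale as an explicit second argument.

```cpp
#include <algorithm>
#include <cstddef>
#include <locale>
#include <string>
#include <vector>

// Sort v according to the collation rules of loc.
// std::locale::operator() compares two strings using loc's collate<char> facet.
void locale_sort(std::vector<std::string>& v, const std::locale& loc)
{
    std::sort(v.begin(), v.end(), loc);
}

// Count the characters of s that loc classifies as uppercase,
// rather than consulting the global C locale implicitly.
std::size_t count_upper(const std::string& s, const std::locale& loc)
{
    std::size_t n = 0;
    for (std::string::size_type i = 0; i != s.size(); ++i)
        if (std::isupper(s[i], loc)) ++n;
    return n;
}
```

Passing the locale as an argument keeps the choice of cultural conventions visible at the call site, so the same functions can serve several locales in one program.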

D.6 Exercises
1. (∗2.5) Define a Season_io (§D.3.2) for a language other than American English.
2. (∗2) Define a Season_io (§D.3.2) class that takes a set of name strings as a constructor argument
so that Season names for different locales can be represented as objects of this class.
3. (∗3) Write a collate<char>::compare() that gives dictionary order. Preferably, do this for a
language, such as German or French, that has more letters in its alphabet than English does.
4. (∗2) Write a program that reads and writes bools as numbers, as English words, and as words in
another language of your choice.
5. (∗2.5) Define a Time type for representing time of day. Define a Date_and_time type by using
Time and a Date type. Discuss the pros and cons of this approach compared to the Date from
(§D.4.4). Implement locale-sensitive I/O for Time and Date_and_time.
6. (∗2.5) Design and implement a postal code (zip code) facet. Implement it for at least two countries
with dissimilar conventions for writing addresses. For example: NJ 07932 and CB2 1QA.
7. (∗2.5) Design and implement a phone number facet. Implement it for at least two countries
with dissimilar conventions for writing phone numbers. For example, (973) 360-8000 and
1223 343000.
8. (∗2.5) Experiment to find out what input and output formats your implementation uses for date
information.
9. (∗2.5) Define a get_time() that ‘‘guesses’’ about the meaning of ambiguous dates, such as
12/5/1995, but still rejects all or almost all mistakes. Be precise about what ‘‘guesses’’ are
accepted, and discuss the likelihood of a mistake.
10. (∗2) Define a get_time() that accepts a greater variety of input formats than the one in
§D.4.4.5.
11. (∗2) Make a list of the locales supported on your system.
12. (∗2.5) Figure out where named locales are stored on your system. If you have access to the part
of the system where locales are stored, make a new named locale. Be very careful not to break
existing locales.
13. (∗2) Compare the two Season_io implementations (§D.3.2 and §D.4.7.1).
14. (∗2) Write and test a Date_out facet that writes Dates using a format supplied as a constructor
argument. Discuss the pros and cons of this approach compared to the global date format provided by date_fmt (§D.4.4.6).
15. (∗2.5) Implement I/O of Roman numerals (such as XI and MDCLII).
16. (∗2.5) Implement and test Cvt_to_upper (§D.4.6).
17. (∗2.5) Use clock() to determine average cost of (1) a function call, (2) a virtual function call,
(3) reading a char, (4) reading a 1-digit int, (5) reading a 5-digit int, (6) reading a 5-digit double,
(7) a 1-character string, (8) a 5-character string, and (9) a 40-character string.
18. (∗6.5) Learn another natural language.
