6.5.2.5 Compound literals
1058
Commentary
All objects defined outside the body of a function have static storage duration. The storage for such objects is
initialized before program startup, so their initializers can only consist of constant expressions. This constraint
only differs from an equivalent one for initializers by being framed in terms of “occurring outside the body of a
function” rather than “an object that has static storage duration.”
1065 compound literal outside function body
455 static storage duration
151 static storage duration initialized before startup
1644 initializer static storage duration object
Semantics
1057
A postfix expression that consists of a parenthesized type name followed by a brace-enclosed list of initializers
is a compound literal.
compound literal
Commentary
This defines the term compound literal. A compound literal differs from an initializer list in that it can occur
outside of an object definition. Because there need be no associated type definition, a type name must be
specified (for initializers the type is obtained from the type of the object being initialized).
1641 initialization syntax
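The difference can be sketched as follows. This is only an illustration; the type struct point and the function
names are hypothetical and do not come from the Standard.

```c
struct point {
    int x;
    int y;
};

/* Hypothetical helper: sums the members of a point passed by value. */
static int coord_sum(struct point p)
{
    return p.x + p.y;
}

int compound_literal_demo(void)
{
    struct point a = {1, 2};     /* initializer list: the type comes from a */
    struct point b;

    b = (struct point){3, 4};    /* compound literal: a type name is required */

    /* A compound literal can also be written directly as an argument. */
    return coord_sum(a) + coord_sum(b) + coord_sum((struct point){5, 6});
}
```

The initializer list is only usable in the declaration of a, while the compound literal appears in an assignment
and in an argument list.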
Other Languages
A form of compound literal is supported in some languages (e.g., Ada, Algol 68, CHILL, and Extended
Pascal). These languages do not always require a type name to be given. The type of the parenthesized list of
expressions is deduced from the context in which it occurs.
Coding Guidelines
From the coding guideline point of view, the use of compound literals appears fraught with potential pitfalls,
including the use of the term compound literal, which suggests a literal value, not an unnamed object.
However, this construct is new in C99 and there is not yet sufficient experience in its use to know if any
specific guideline recommendations might apply to it.
1066 compound literal inside function body
1061 compound literal is lvalue
1058
It provides an unnamed object whose value is given by the initializer list.81)
compound literal unnamed object
Commentary
The difference between this kind of unnamed object and that created by a call to a memory allocation function
(e.g., malloc) is that its definition includes a type and it has a storage duration other than allocated (i.e.,
either static or automatic).
Other Languages
Some languages treat their equivalent of compound literals as just that, a literal. For instance, like other
literals, it is not possible to take their address.
Common Implementations
In those cases where a translator can deduce that storage need not be allocated for the unnamed object, the
as-if rule can be used, and it need not allocate any storage. This situation is likely to occur for compound
literals because, unless their address is taken (explicitly using the address-of operator, or in the case of an
array type implicit conversion to pointer type), they are only assigned a value at one location in the source
code. At their point of definition, and use, a translator can generate machine code that operates on their
constituent values directly rather than copying them to an unnamed object and operating on that.
Coding Guidelines
Guideline recommendations applicable to the unnamed object are the same as those that apply to objects
having the same storage duration; for instance, the guideline recommendation dealing with assigning the
address of objects to pointers.
1088.1 object address assigned
Example
The following example not only requires that storage be allocated for the unnamed object created by the
compound literal, but that the value it contains be reset on every iteration of the loop.
June 24, 2009 v 1.2
struct s_r {
    int mem;
};

extern void glob(struct s_r *);

void f(void)
{
    struct s_r *p_s_r;

    do {
        glob(p_s_r = &((struct s_r){1}));
        /*
         * Instead of writing the above we could have written:
         *    struct s_r unnamed_s_r = {1};
         *    glob(p_s_r = &unnamed_s_r);
         * which assigns 1 to the member on every iteration, as
         * part of the process of defining the object.
         */
        p_s_r->mem++; /* Increment value held by unnamed object. */
    } while (p_s_r->mem != 10);
}
1059
If the type name specifies an array of unknown size, the size is determined by the initializer list as specified in
6.7.8, and the type of the compound literal is that of the completed array type.
Commentary
This behavior is discussed elsewhere.
array of unknown size initialized 1683
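A minimal sketch of how the initializer list completes the array type (the function names are hypothetical):

```c
#include <stddef.h>

/* Three initializers complete the type to int [3]. */
size_t literal_length(void)
{
    return sizeof((int []){10, 20, 30}) / sizeof(int);
}

/* The largest designated index is 4, so the type is completed to int [5]. */
size_t designated_length(void)
{
    return sizeof((int []){[4] = 1}) / sizeof(int);
}
```

In both cases sizeof sees the completed array type, not the incomplete type written in the type name.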
Coding Guidelines
Some of the issues involved in declaring arrays having an unknown size are discussed elsewhere.
array incomplete type 1573
1060
Otherwise (when the type name specifies an object type), the type of the compound literal is that specified by
the type name.
Commentary
Presumably this is the declared type of the unnamed object initialized by the initializer list and therefore also
its effective type.
effective type 948
1061
In either case, the result is an lvalue.
compound literal is lvalue
Commentary
While the specification for a compound literal meets the requirements needed to be an lvalue, wording
elsewhere might be read to imply that the result is not an lvalue. This specification clarifies the behavior.
lvalue 721
lvalue converted to value 725
Other Languages
Some languages consider their equivalent of compound literals to be just that, literals. For such languages
the result is an rvalue.
rvalue 736
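The consequences of the result being an lvalue in C can be sketched as follows (the function name is
hypothetical; the direct assignment is legal but has no practical use):

```c
int lvalue_demo(void)
{
    int *p = &(int){1};   /* the address-of operator requires an lvalue operand */

    (int){0} = 7;         /* permitted: assigns to a second, unnamed object */
    *p += 41;             /* modifies the first unnamed object through p */

    return *p;
}
```

Both compound literals occur in the outermost block of the function, so the object pointed to by p exists until
the function returns.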
1062
81) Note that this differs from a cast expression.
footnote 81
Commentary
A cast operator takes a single scalar value (if necessary any lvalue is converted to its value) as its operand
and returns a value as its result.
Coding Guidelines
Developers are unlikely to write expressions, such as (int){1}, when (int)1 had been intended (on
standard US PC-compatible keyboards the pair of characters ( { and the pair ) } appear on four different
keys). Such usage may occur through the use of parameterized macros. However, at the time of this writing
there is insufficient experience with use of this new language construct to know whether any guideline
recommendation is worthwhile.
Example
The following all assign a value to loc. The first two assignments involve an lvalue to value conversion. In
the last two assignments the operand being assigned is already a value.
extern int glob = 1;

void f(void)
{
    int loc;

    loc=glob;
    loc=(int){1};

    loc=2;
    loc=(int)2;
}
1063
For example, a cast specifies a conversion to scalar types or void only, and the result of a cast expression is
not an lvalue.
Commentary
These are restrictions on the types and operands of such an expression and one property of its result.
1134 cast scalar or void type
1131 footnote 85
Example
&(int)x;    /* Constraint violation. */
&(int){x};  /* Address of an unnamed object containing the current value of x. */
1064
The value of the compound literal is that of an unnamed object initialized by the initializer list.
Commentary
The distinction between a compound literal acting as if the initializer list was its value, and an unnamed
object (initialized with values from the initializer list) being its value, is only apparent when the address-of
operator is applied to it. The creation of an unnamed object does not mean that locally allocated storage is a
factor in this distinction. Implementations of languages where compound literals are defined to be literals
sometimes use locally allocated temporary storage to hold their values. C implementations may find they can
optimize away allocation of any actual unnamed storage.
Common Implementations
If a compound literal occurs in a context where its value is required (e.g., assignment) there are obvious
opportunities for implementations to use the values of the initializer list directly. C99 is still too new to know
whether most implementations will make use of this optimization.
Coding Guidelines
The distinction between the value of a compound literal being an unnamed object and being the values of the
initializer list could be viewed as an unnecessary complication that is not worth educating a developer about.
Until more experience has been gained with the kinds of mistakes developers make with compound literals,
it is not possible to recommend any guidelines.
Example
#include <string.h>

struct TAG {
    int mem_1;
    float mem_2;
};

struct TAG o_s1 = (struct TAG){1, 2.3};

void f(void)
{
    memcpy(&o_s1, &(struct TAG){4, 5.6}, sizeof(struct TAG));
}
1065
If the compound literal occurs outside the body of a function, the object has static storage duration;
compound literal outside function body
Commentary
This specification is consistent with how other object declarations, outside of function bodies, behave. The
storage duration of a compound literal is based on the context in which it occurs, not whether its initializer
list consists of constant expressions.
storage duration object 448
struct s_r {
    int mem;
};

static struct s_r glob = {4};
static struct s_r col = (struct s_r){4};       /* Constraint violation. */
static struct s_r *p_g = &(struct s_r){4};

void f(void)
{
    static struct s_r loc = {4};
    static struct s_r col = (struct s_r){4};   /* Constraint violation. */
    static struct s_r *p_l = &(struct s_r){4}; /* Constraint violation. */
}
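One observable effect of static storage duration is that the unnamed object keeps its value between function
calls. A minimal sketch, using hypothetical names:

```c
/* File scope: the unnamed int has static storage duration. */
static int *call_count = &(int){0};

int count_call(void)
{
    return ++*call_count;   /* the unnamed object keeps its value between calls */
}
```

The initializer is accepted here because the address of an object having static storage duration is an address
constant.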
Other Languages
The storage duration specified by other languages, which support some form of compound literal, varies.
Some allow the developer to choose (e.g., Algol 68), others require them to be dynamically allocated (e.g.,
Ada), while in others (e.g., Fortran and Pascal) the issue is irrelevant because it is not possible to obtain their
address.
1066
otherwise, it has automatic storage duration associated with the enclosing block.
compound literal inside function body
Commentary
A parallel can be drawn between an object definition that includes an initializer and a compound literal (that
is the definition of an unnamed object). The lifetime of the associated objects starts when the block that
contains their definition is entered. However, the objects are not assigned their initial value, if any, until the
declaration is encountered during program execution.
458 object lifetime from entry to exit of block
462 initialization performed every time declaration reached
The unnamed object associated with a compound literal is initialized each time the statement that contains
it is encountered during program execution. Previous invocations, which may have modified the value of the
unnamed object, or nested invocations in a recursive call, do not affect the value of the newly created object.
1711 object initializer evaluated when
1026 function call recursive
Storage for the unnamed object is created on block entry. Executing a statement containing a compound
literal does not cause any new storage to be allocated. Recursive calls to a function containing a compound
literal will cause different storage to be allocated, for the unnamed object, for each nested call.
1078 EXAMPLE compound literal single object
#include <stddef.h>   /* For NULL. */

struct foo {
    struct foo *next;
    int i;
};

void WG14_N759(void)
{
    struct foo *p,
               *q;
    /*
     * The following loop ...
     */
    p = NULL;
    for (int j = 0; j < 10; j++)
    {
        q = &((struct foo){ .next = p, .i = j });
        p = q;
    }
    /*
     * ... is equivalent to the loop below.
     */
    p = NULL;
    for (int j = 0; j < 10; j++)
    {
        struct foo T;

        T.next = p;
        T.i = j;
        q = &T;
        p = q;
    }
}
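The reinitialization on every execution can be observed with a smaller sketch (the function name is
hypothetical), in which the pointer is only used inside the block containing the compound literal:

```c
int reinit_demo(void)
{
    int total = 0;

    for (int i = 0; i < 3; i++) {
        int *p = &(int){10};   /* the unnamed object is set to 10 on every iteration */

        total += *p;           /* always reads 10 */
        *p += 5;               /* this change is not visible on the next iteration */
    }
    return total;
}
```

The modification made at the end of each iteration has no effect on the value read at the start of the next one.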
Common Implementations
To what extent is it worth trying to optimize compound literals made up of a list of constant expressions;
for instance, by detecting those that are never modified, or by placing them in a static region of storage
that can be copied from or pointed at? The answer to these and many other optimization issues relating to
compound literals will have to wait until translator vendors get a feel for how their customers use this new, to
C, construct.
Coding Guidelines
Parallels can be drawn between the unnamed object associated with a compound literal and the temporaries
created in C++. Experience has shown that C++ developers sometimes assume that the lifetime of a temporary
is greater than it is required to be by that language’s standard. Based on this experience it is to be expected
that developers using C might make similar mistakes with the lifetime of the unnamed object associated with
a compound literal. Only time will tell whether these mistakes will be sufficiently common, or serious, that
the benefits of being able to apply the address-of operator to a compound literal (the operator that needs to be
used to extend the range of statements over which an unnamed object can be accessed) are outweighed by
the probable cost of faults.
The guideline recommendation dealing with assigning the address of an object to a pointer object, whose
lifetime is greater than that of the addressed object, is applicable here.
object address assigned 1088.1
#include <stdlib.h>

extern int glob;
struct s_r {
    int mem;
};

void f(void)
{
    struct s_r *p_s_r;

    if (glob == 0)
    {
        p_s_r = &((struct s_r){1});
    }
    else
    {
        p_s_r = &((struct s_r){2});
    }
    /* The value of p_s_r is indeterminate here. */

    /*
     * The iteration-statements all enclose their associated bodies in
     * a block. The effect of this block is to start and terminate
     * the lifetime of the contained compound literal.
     */
    p_s_r=NULL;
    while (glob < 10)
    {
        /*
         * In the following test the value of p_s_r is indeterminate
         * on the second and subsequent iterations of the loop.
         */
        if (p_s_r == NULL)
            ;
        p_s_r = &((struct s_r){1});
    }
}
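One way of avoiding the first problem, sketched here under the assumption that a conditional expression is an
acceptable replacement for the if statement (the function pick is hypothetical), is to write the compound
literals in the same block as the pointer that refers to them:

```c
struct s_r {
    int mem;
};

int pick(int flag)
{
    /*
     * Both compound literals occur directly in the outermost block of the
     * function, so both unnamed objects exist until the function returns.
     */
    struct s_r *p_s_r = flag ? &(struct s_r){1} : &(struct s_r){2};

    return p_s_r->mem;
}
```

Here the lifetime of the unnamed objects is at least as long as that of p_s_r, so the pointer never holds an
indeterminate value.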
1067
All the semantic rules and constraints for initializer lists in 6.7.8 are applicable to compound literals.82)
Commentary
They are the same except
• initializer lists don’t create objects, they are simply a list of values with which to initialize an object; and
• the type is deduced from the object being initialized, not a type name.
Coding Guidelines
Many of the coding guideline issues discussed for initializers also apply to compound literals.
initialization syntax 1641
1068
String literals, and compound literals with const-qualified types, need not designate distinct objects.83)
string literal distinct object
compound literal distinct object
Commentary
A strictly conforming program can deduce if an implementation uses the same object for two string literals,
or compound literals, by performing an equality comparison on their addresses (an infinite number of
comparisons would be needed to deduce whether an implementation always used distinct objects). This
permission for string literals is also specified elsewhere.
1076 EXAMPLE string literals shared
908 string literal distinct array
The only way a const-qualified object can be modified is by casting a pointer to it to a non-const-qualified
pointer. Such usage results in undefined behavior. If the pointer was used to modify such an unnamed object
that was not distinct, the undefined behavior could also modify the values of other compound literal objects.
746 pointer converting qualified/unqualified
Other Languages
Most languages do not consider any kind of literal to be modifiable, so whether they share the same storage
locations is not an issue.
Common Implementations
The extent to which developers will use compound literals having a const-qualified type, for which storage
is allocated and whose values form a sharable subset with another compound literal, remains to be seen.
Without such usage it is unlikely that implementors of optimizers will specifically look for savings in this
area, although they may come about as a consequence of optimizations not specifically aimed at compound
literals.
Example
In the following there is an opportunity to overlay the two unnamed objects containing zero values.
const int *p1 = (const int [99]){0};
const int *p2 = (const int [20]){0};
1069
EXAMPLE 1 The file scope definition
int *p = (int []){2, 4};
initializes p to point to the first element of an array of two ints, the first having the value two and the second,
four. The expressions in this compound literal are required to be constant. The unnamed object has static
storage duration.
Commentary
This usage, rather than the more obvious int p[] = {2, 4};, can arise when the initialization value is
derived through macro replacement and the same macro is also used in noninitialization contexts.
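A sketch of the kind of macro usage that could lead to this form (the macro EVENS and the other names are
hypothetical):

```c
#define EVENS (int []){2, 4}   /* hypothetical macro used in both contexts */

int *p_static = EVENS;         /* file scope: the unnamed array has static storage duration */

int sum_evens(void)
{
    int *q = EVENS;            /* block scope: a separate unnamed array, automatic storage duration */

    return p_static[0] + p_static[1] + q[0] + q[1];
}
```

Had the macro body been the brace-enclosed list only, it could not have been used as the right operand of an
assignment or as a function argument.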
1070
EXAMPLE 2 In contrast, in
void f(void)
{
    int *p;
    /* ... */
    p = (int [2]){*p};
    /* ... */
}
p is assigned the address of the first element of an array of two ints, the first having the value previously
pointed to by p and the second, zero. The expressions in this compound literal need not be constant. The
unnamed object has automatic storage duration.
Commentary
The assignment of values to the unnamed object occurs before the value of the right operand is assigned to p.
Example
The above example is not the same as declaring p to be an array.
void f(void)
{
    int p[2]; /* Storage for p is created by its definition. */

    /*
     * Cannot assign new object to p, can only change existing values.
     */
    p[1]=0;
}
1071
EXAMPLE 3 Initializers with designations can be combined with compound literals. Structure objects created
using compound literals can be passed to functions without depending on member order:
drawline((struct point){.x=1, .y=1},
(struct point){.x=3, .y=4});
Or, if drawline instead expected pointers to struct point:
drawline(&(struct point){.x=1, .y=1},
&(struct point){.x=3, .y=4});
Commentary
This usage removes the need to create a temporary in the calling function. The arguments are passed by
value, like any other structure argument.
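The two calling styles can be exercised with the following sketch. The functions span_by_value and
span_by_pointer are hypothetical stand-ins for drawline; they return a value so that the effect of the
designators can be checked.

```c
struct point {
    int x;
    int y;
};

/* Hypothetical stand-in for drawline taking structures: returns |dx| + |dy|. */
static int span_by_value(struct point from, struct point to)
{
    int dx = to.x - from.x;
    int dy = to.y - from.y;

    return (dx < 0 ? -dx : dx) + (dy < 0 ? -dy : dy);
}

/* Hypothetical stand-in for drawline taking pointers to struct point. */
static int span_by_pointer(const struct point *from, const struct point *to)
{
    return span_by_value(*from, *to);
}

int by_value(void)
{
    /* The designators make the order of .x and .y irrelevant. */
    return span_by_value((struct point){.x=1, .y=1}, (struct point){.y=4, .x=3});
}

int by_pointer(void)
{
    return span_by_pointer(&(struct point){.x=1, .y=1}, &(struct point){.x=3, .y=4});
}
```

In the pointer form the unnamed objects exist until the end of the block containing the call, which is long
enough for the called function to use them.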
1072
EXAMPLE 4 A read-only compound literal can be specified through constructions like:
(const float []){1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6}
Commentary
An implementation may choose to place the contents of this compound literal in read-only memory, but it is
not required to do so. The term read-only is something of a misnomer, since it is possible to cast its address
to a non-const-qualified type and assign to the pointed-to object. (The behavior is undefined, but unless the
values are held in a kind of storage that cannot be modified, they are likely to be modified.)
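A sketch of one possible use of such a literal, as an in-line lookup table (the function name and the range
check are assumptions made for this illustration):

```c
/* Returns 10 to the power n for 0 <= n <= 6, or 0.0f when n is out of range. */
float power_of_ten(int n)
{
    if (n < 0 || n > 6)
        return 0.0f;
    return (const float []){1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6}[n];
}
```

Because the element type is const-qualified, an attempt to assign to an element through the literal itself is a
constraint violation.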
Other Languages
Some languages support a proper read-only qualifier.
Common Implementations
On some freestanding implementations this compound literal might be held in ROM.
1073
82) For example, subobjects without explicit initializers are initialized to zero.
footnote 82
Commentary
This behavior reduces the volume of the visible source code when the object type includes large numbers of
members or elements.
initializer fewer in list than members 1682
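A minimal sketch of this behavior (the type struct settings and the function name are hypothetical):

```c
struct settings {
    int width;
    int height;
    double scale;
};

/* Returns 1 if the members without an explicit initializer are zero. */
int unspecified_members_are_zero(void)
{
    struct settings s = (struct settings){ .width = 3 };

    return s.width == 3 && s.height == 0 && s.scale == 0.0;
}
```

Only the member of interest needs to appear in the source; the remaining members are given the same values they
would have in an object with static storage duration and no initializer.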
Coding Guidelines
Some of the readability issues applicable to statements have different priorities than those for declarations.
These are discussed elsewhere.
initialization syntax 1641
1074
83) This allows implementations to share storage for string literals and constant compound literals with the
same or overlapping representations.
footnote 83
Commentary
The need to discuss an implementation’s ability to share storage for string literals occurs because it is
possible to detect such sharing in a conforming program (e.g., by comparing two pointers assigned the
addresses of two distinct, in the visible source code, string literals). The C Committee chose to permit this
implementation behavior. (There were existing implementations, when the C90 Standard was being drafted,
that shared storage.)
1075
EXAMPLE 5 The following three expressions have different meanings:
"/tmp/fileXXXXXX"
(char []){"/tmp/fileXXXXXX"}
(const char []){"/tmp/fileXXXXXX"}
The first always has static storage duration and has type array of char, but need not be modifiable; the last
two have automatic storage duration when they occur within the body of a function, and the first of these two
is modifiable.
Commentary
In all three cases, a pointer to the start of storage is returned and the first 16 bytes of the storage allocated
will have the same set of values. If all three expressions occurred in the same source file, the first and third
could share the same storage even though their storage durations were different. Developers who see a
potential storage saving in using a compound literal instead of a string literal (the storage for one only need
be allocated during the lifetime of its enclosing block) also need to consider potential differences in the
number of machine code instructions that will be generated. Overall, there may be no savings.
1076 EXAMPLE string literals shared
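The practical difference between the first two forms is that only the second may be modified. A sketch (the
function name and the replacement characters are made up for this illustration):

```c
#include <string.h>

/* Returns 1 if a template held in a compound literal can be edited in place. */
int template_is_modifiable(void)
{
    char *name = (char []){"/tmp/fileXXXXXX"};      /* modifiable, automatic storage duration */

    memcpy(name + strlen(name) - 6, "000001", 6);   /* overwrite the six X characters */
    return strcmp(name, "/tmp/file000001") == 0;
}
```

Applying the same modification to the string literal "/tmp/fileXXXXXX" would result in undefined behavior.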
1076
EXAMPLE 6 Like string literals, const-qualified compound literals can be placed into read-only memory and
can even be shared. For example,
(const char []){"abc"} == "abc"
might yield 1 if the literals’ storage is shared.
EXAMPLE string literals shared
Commentary
In this example pointers to the first element of the compound literal and a string literal are being compared
for equality. Permission to share the storage allocated for a compound literal only applies to those having a
const-qualified type (there is no such restriction on string literals).
1068 compound literal distinct object
908 string literal distinct array
Coding Guidelines
Comparing strings using an equality operator, rather than a call to the strcmp library function, is a common
beginner mistake. Training is the obvious solution.
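The difference between the two comparisons can be sketched as follows (both function names are hypothetical):

```c
#include <string.h>

/* Compares the characters, which is almost always what was intended. */
int same_text(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Compares addresses only: the result depends on where the arrays are stored. */
int same_object(const char *a, const char *b)
{
    return a == b;
}
```

Given two distinct arrays that both hold "abc", same_text returns 1 and same_object returns 0.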
Usage
In the visible source of the .c files 0.1% of string literals appeared as the operand of the equality operator
(representing 0.3% of the occurrences of this operator).
1077
EXAMPLE 7 Since compound literals are unnamed, a single compound literal cannot specify a circularly
linked object. For example, there is no way to write a self-referential compound literal that could be used as
the function argument in place of the named object endless_zeros below:
struct int_list { int car; struct int_list *cdr; };
struct int_list endless_zeros = {0, &endless_zeros};
eval(endless_zeros);
Commentary
A modification using pointer types, and an additional assignment, creates a circularly linked list that uses the
storage of the unnamed object:
struct int_list { int car; struct int_list *cdr; };
struct int_list *endless_zeros = &(struct int_list){0, 0};

endless_zeros->cdr=endless_zeros; /* Let’s follow ourselves. */
The following statement would not have achieved the same result:
endless_zeros = &(struct int_list){0, endless_zeros};
because the second compound literal would occupy a distinct object, different from the first. The value of
endless_zeros in the second compound literal would be pointing at the unnamed object allocated for the
first compound literal.
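The two-step construction can be checked with a self-contained sketch (the function circular_demo is
hypothetical; the type is the one used in EXAMPLE 7):

```c
#include <stddef.h>

struct int_list {
    int car;
    struct int_list *cdr;
};

/* Returns 1 if following cdr from the unnamed object leads back to itself. */
int circular_demo(void)
{
    struct int_list *endless_zeros = &(struct int_list){0, NULL};

    endless_zeros->cdr = endless_zeros;   /* close the loop after the object exists */

    return endless_zeros->cdr == endless_zeros
        && endless_zeros->cdr->cdr->car == 0;
}
```

The pointer is only used within the block containing the compound literal, so the unnamed object is alive for
every access.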
Other Languages
Algol 68 supports the creation of circularly linked objects (see the Other Languages subsection in the
following C sentence).
1078
EXAMPLE 8 Each compound literal creates only a single object in a given scope:
EXAMPLE compound literal single object
struct s { int i; };
int f (void)
{
    struct s *p = 0, *q;
    int j = 0;
again:
    q = p, p = &((struct s){ j++ });
    if (j < 2) goto again;
    return p == q && q->i == 1;
}
The function f() always returns the value 1.
Note that if an iteration statement were used instead of an explicit goto and a labeled statement, the lifetime
of the unnamed object would be the body of the loop only, and on entry next time around p would have an
indeterminate value, which would result in undefined behavior.
Commentary
Specifying that a single object is created helps prevent innocent-looking code consuming large amounts of
storage (e.g., use of a compound literal in a loop).
Other Languages
In Algol 68 LOC creates storage for block scope objects. However, it generates new storage every time it is
executed. The following allocates 1,000 objects on the stack.
MODE M = STRUCT (REF M next, INT i);
M p;
INT i := 0

again:
p := LOC M := (p, i);
i +:= 1;
IF i < 1000 THEN
   GO TO again
FI;
1079
Forward references: type names (6.7.6), initialization (6.7.8).
6.5.3 Unary operators
1080
unary-expression syntax
unary-expression:
    postfix-expression
    ++ unary-expression
    -- unary-expression
    unary-operator cast-expression
    sizeof unary-expression
    sizeof ( type-name )
unary-operator: one of
    & * + - ~ !
Commentary
Note that the operand of unary-operator is a cast-expression, not a unary-expression. A unary operator
usually refers to an operator that takes a single argument. Technically all of the operators listed here, plus the
postfix increment and decrement operators, could be considered as being unary operators.
1133 cast-expression syntax
Rationale
Unary plus was adopted by the C89 Committee from several implementations, for symmetry with unary minus.
Other Languages
Some languages (e.g., Ada and Pascal) specify the unary operators to have lower precedence than the
multiplicative operators; for instance, -x/y is equivalent to -(x/y) in Ada, but (-x)/y in C. Most languages
call all operators that take a single operand unary operators.
1143 multiplicative-expression syntax
Languages that support the unary + operator include Ada, Fortran, and Pascal. Some languages use the
keyword NOT rather than !. In the case of Cobol this keyword can also appear to the left of an operator,
indicating negation of the operator (i.e., NOT < meaning not less than).
Coding Guidelines
Coding guidelines need to be careful in their use of the term unary operator. Its meaning, as developers
understand it, may be different from its actual definition in C. The operators in a unary-expression occur
to the left of the operand. The only situation where a developer’s incorrect assumption about precedence
relationships might lead to a difference between predicted and actual behavior is when a postfix operator
occurs immediately to the right of the unary-expression.
Dev 943.1
Except when sizeof ( type-name ) is immediately followed visually by a token having the lexical form
of an additive operator, if a unary-expression is not immediately followed by a postfix operator it need
not be parenthesized.
Although the expression sizeof (int)-1 may not occur in the visible source code, it could easily occur as
the result of macro replacement of the operand of the sizeof operator. This is one of the reasons behind the
guideline recommendation specifying the parenthesizing of macro bodies (without parentheses the expression
is equivalent to (sizeof(int))-1).
1931.2 macro definition expression
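A sketch of how macro replacement can produce this expression (both macros and both functions are hypothetical):

```c
#include <stddef.h>

#define MINUS_ONE_BARE   (int)-1     /* hypothetical macro, body not parenthesized */
#define MINUS_ONE_SAFE   ((int)-1)   /* same value, body parenthesized */

size_t bare_size(void)
{
    return sizeof MINUS_ONE_BARE;    /* expands to sizeof (int)-1, i.e., (sizeof(int))-1 */
}

size_t safe_size(void)
{
    return sizeof MINUS_ONE_SAFE;    /* expands to sizeof ((int)-1), i.e., sizeof(int) */
}
```

The first function returns one less than the size of int, which is unlikely to be what the author of the macro
intended.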
Example
struct s {
    int x;
};
struct s *a;
int x;

void f(void)
{
    x<-a->x;
    x<--a->x;
    x<- --a->x;
    x<- - --a->x;

    sizeof(long)-3; /* Could be mistaken for sizeof a cast-expression. */
    (sizeof(long))-3;
    sizeof((long)-3);
}
Usage
See the Usage section of postfix-expression for ++ and -- digraph percentages.
postfix-expression syntax 985
Table 1080.1: Common token pairs involving sizeof, unary-operator, prefix ++, or prefix -- (as a percentage of all
occurrences of each token). Based on the visible form of the .c files.

Token Sequence          % Occurrence of First Token   % Occurrence of Second Token
! defined                2.0                          16.7
*v --v                   0.3                           7.8
-v floating-constant     0.3                           6.7
*v ++v                   0.5                           6.3
! --v                    0.2                           4.8
-v integer-constant     69.0                           4.1
&v identifier           96.1                           1.9
sizeof (                97.5                           1.8
*v identifier           86.8                           1.0
! identifier            81.9                           0.8
! (                     14.5                           0.5
-v identifier           30.2                           0.4
*v (                     9.0                           0.4
~ integer-constant      20.1                           0.2
++v identifier          97.3                           0.1
~ identifier            56.3                           0.1
~ (                     23.4                           0.1
+v integer-constant     49.0                           0.0
--v identifier          97.1                           0.0
[Figure 1080.1 plot not reproduced. Two panels: unary - and unary ~. Horizontal axis: Numeric value (0 to 255).
Vertical axis: Occurrences (1 to 100,000). Legend: × decimal-constant, • hexadecimal-constant.]
Figure 1080.1: Number of integer-constants having a given value appearing as the operand of the unary minus and unary
~ operators. Based on the visible form of the .c files.
Table 1080.2: Occurrence of the unary-operators, prefix ++, and prefix -- having particular operand types (as a
percentage of all occurrences of the particular operator; an _ prefix indicates a literal operand). Based on the
translated form of this book’s benchmark programs.

Operator  Type            %      Operator  Type            %      Operator  Type            %
-v        _int            96.0   ~         unsigned long   6.8    !         _long           2.7
*v        ptr-to          95.3   &v        int             6.2    ~         unsigned char   2.5
+v        _int            72.2   ~         unsigned int    6.0    &v        unsigned char   2.4
--v       int             54.7   +v        unsigned long   5.6    !         unsigned long   2.1
!         int             50.0   +v        long            5.6    ~         long            2.0
~         _int            49.3   +v        float           5.6    ++v       unsigned char   1.9
&v        other-types     45.1   !         other-types     5.6    ~         _unsigned long  1.7
++v       int             43.8   ++v       unsigned long   5.2    ~         _unsigned int   1.7
++v       ptr-to          33.3   &v        struct *        4.9    !         unsigned char   1.6
~         int             28.5   --v       unsigned long   4.7    ~         other-types     1.6
--v       unsigned int    22.1   !         unsigned int    4.7    -v        _double         1.4
!         ptr-to          20.1   *v        fnptr-to        4.1    -v        other-types     1.3
--v       ptr-to          14.6   &v        unsigned long   4.0    ++v       long            1.2
&v        struct          13.9   --v       other-types     4.0    -v        int             1.2
&v        char            13.1   &v        long            3.4    !         _int            1.2
++v       unsigned int    12.6   &v        unsigned int    3.0    ++v       unsigned short  1.1
+v        int             11.1   &v        unsigned short  2.9    &v        char *          1.1
!         char             9.2   !         enum            2.9
6.5.3.1 Prefix increment and decrement operators
Constraints
1081
The operand of the prefix increment or decrement operator shall have qualified or unqualified real or pointer
type and shall be a modifiable lvalue.
postfix operator operand
Commentary
This constraint mirrors that for the postfix forms of these operators.
1081 postfix operator operand
C++
The use of an operand of type bool with the prefix ++ operator is deprecated (5.3.2p1); there is no corresponding
entry in annex D, but the proposed response to C++ DR #145 inserted one. In the case of the
decrement operator:
5.3.2p1
The operand shall not be of type bool.
A C source file containing an instance of the prefix -- operator applied to an operand having type _Bool is
likely to result in a C++ translator issuing a diagnostic.
Coding Guidelines
Enumerated types are usually thought about in symbolic rather than arithmetic terms. The increment and
822 symbolic
name
517 enumeration
set of named
constants
decrement operators can also be given a symbolic interpretation. They are sometimes thought about in terms
of moving on to the next symbolic name in a list. This move to next operation relies on the enumeration
constants being represented by successive numeric values. While this usage is making use of representation
information, there is often a need to step through a series of symbolic names (and C provides no other built-in
mechanism), for instance, iterating over the named constants defined by an enumerated type.
1199 relational
operators
real operands
Dev
569.1
The operand of a prefix increment or decrement operator may have an enumerated type, provided the
enumeration constants defined by that type have successive numeric values.
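The iteration usage permitted by this deviation can be sketched as follows. This is a minimal illustration only; the type color, its constants, and the function count_colors are hypothetical names, and the sketch assumes the constants take the successive values 0, 1, 2, 3.

```c
#include <assert.h>

/* Hypothetical enumeration; no explicit values are given, so the
   constants have the successive values 0, 1, 2 and 3. */
enum color { RED, GREEN, BLUE, COLOR_COUNT };

/* Counts the named constants by stepping through them with prefix ++. */
int count_colors(void)
{
    int n = 0;
    enum color c;

    for (c = RED; c != COLOR_COUNT; ++c)
        n++;
    return n;
}
```

The loop only visits every constant because the values are successive; giving one constant an explicit, non-adjacent value would break it.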
Semantics
June 24, 2009 v 1.2
1082
The value of the operand of the prefix ++ operator is incremented.
prefix ++ incremented
Commentary
The ordering of this and the following C sentence is the reverse of that specified for the postfix ++ operator.
postfix ++
result
1047
Common Implementations
The implementation of this operator is usually very straightforward. A value is loaded into a register, incremented, and then stored back into the original object, leaving the result in the register. Some CISC processors contain instructions that increment the contents of storage directly. Processors that have a stack-based architecture either need to contain store instructions that leave the value on the stack, or be willing to pay the penalty of another load from storage.
Coding Guidelines
Translators have now progressed to the point where the optimizations many of them perform are much more sophisticated than those needed to detect the more verbose sequence of operations equivalent to the prefix ++ operator. The writers of optimizers study existing source code to find out what constructs occur frequently (they don’t want to waste time and money implementing optimizations for constructs that rarely occur). However, in existing code it is rare to see an object being incremented (or decremented) without one of these operators being used. Consequently optimizers are unlikely to attempt to transform the C source i=i+1 into ++i (which they might have to do for, say, Pascal, which has no increment operators, requiring optimizers to analyze an expression looking for operations that effectively increment an object). So the assertion that ++i can be written as i=i+1 and will be optimized by the translator is not guaranteed to hold, even for a highly optimizing translator. However, this is rarely an important issue anyway; the difference in quality of generated machine code rarely has any impact on program performance.
From the coding guidelines perspective, uses of these operators can be grouped into three categories:
1. The only operator in an expression statement. In this context the result returned by the operation is ignored. The statement simply increments/decrements its operand. Use of the prefix, rather than the postfix, form does not follow the pattern seen at the start of most visible source code statement lines: an identifier followed by an operator (see Figure 940.2). A reader’s scanning of the source looking for objects that are modified will be disrupted by the initial operator. For this reason, use of the postfix form is recommended.
postfix operator constraint 1046
2. One of the operators in a full expression that contains other operators. It is possible to write the code so that a prefix operator does not occur in the same expression as other operators. The evaluation can be moved back before the containing expression (see the postfix operators for a fuller discussion of this point).
full expression 1712
postfix operator constraint 1046
   ...++i...
becomes the equivalent form:
   i++;
   ...i...
The total cognitive effort needed to comprehend the equivalent form may be less than the prefix form, and the peak effort is likely to be less (because the operations may have been split into smaller chunks in serial rather than nested form).
3. The third point is the same as for the postfix operators.
postfix operator constraint 1046
Cg
1082.1
The prefix operators shall not appear in an expression statement.
1083
The result is the new value of the operand after incrementation.
prefix ++ result
Other Languages
Pascal contains the succ operator. This returns the successor value (i.e., it adds one to its operand), but it does not modify the value of an object appearing as its operand.
1084
The expression ++E is equivalent to (E+=1).
Commentary
The expression ++E need not be equivalent to E=E+1 (e.g., the expression E may contain a side effect).
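The difference can be made visible with an operand that has a side effect. The following is a sketch only; next_slot, calls, and slots are hypothetical names introduced for illustration. The operand E is *next_slot(), and the helper counts how many times it is called.

```c
#include <assert.h>

static int calls;      /* number of times next_slot has been called */
static int slots[2];

/* Hypothetical helper with a side effect: it counts its own invocations. */
static int *next_slot(void)
{
    calls++;
    return &slots[0];
}

/* ++E evaluates E once, so next_slot is called exactly once. */
int calls_for_prefix_increment(void)
{
    calls = 0;
    ++*next_slot();
    return calls;
}

/* E = E + 1 names E twice, so next_slot is called twice. */
int calls_for_expanded_form(void)
{
    calls = 0;
    *next_slot() = *next_slot() + 1;
    return calls;
}
```

Both forms add one to slots[0] here; they differ only in how often the side effect inside E occurs.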
C++
C++ lists an exception (5.3.2p1) for the case when E has type bool. This is needed because C++ does not define its boolean type in the same way as C. The behavior of this operator on operands of that type is defined as a special case in C++. The final result is the same as in C.
476 _Bool large enough to store 0 and 1
1085
See the discussions of additive operators and compound assignment for information on constraints, types, side effects, and conversions and the effects of operations on pointers.
prefix operators see also
Commentary
The same references are given for the postfix operators.
1050 postfix operators see also
C++
5.3.2p1
[Note: see the discussions of addition (5.7) and assignment operators (5.17) for information on conversions. ]
There is no mention that the conditions described in these clauses also apply to this operator.
1086
The prefix -- operator is analogous to the prefix ++ operator, except that the value of the operand is decremented.
Commentary
The same Commentary and Coding Guidelines’ issues also apply. See the discussion elsewhere for cases where the effects are not analogous.
1082 prefix ++ incremented
1052 postfix -- analogous to ++
C++
The prefix -- operator is not analogous to the prefix ++ operator in that its operand may not have type bool.
Other Languages
Pascal contains the pred reserved identifier. This returns the predecessor value, but does not modify the value of its operand.
Coding Guidelines
The guideline recommendation for the prefix ++ operator has been worded to apply to either operator.
1082.1 prefix
in expression
statement
1087
Forward references: additive operators (6.5.6), compound assignment (6.5.16.2).
6.5.3.2 Address and indirection operators
Constraints
1088
The operand of the unary & operator shall be either a function designator, the result of a [] or unary * operator, or an lvalue that designates an object that is not a bit-field and is not declared with the register storage-class specifier.
unary & operand constraints
Commentary
Bit-fields are permitted (intended even) to occupy part of a storage unit. Requiring bit addressing could be a huge burden on implementations. Very few processors support bit addressing and C is based on the byte being the basic unit of addressability.
bit-field packed into 1410
byte addressable unit 53
The register storage-class specifier is only a hint to the translator. Taking the address of an object could effectively prevent a translator from keeping its value in a register. A harmless consequence, but the C Committee decided to make it a constraint violation.
register storage-class 1369
C90
The words:
. . . , the result of a [ ] or unary * operator,
are new in C99 and were added to cover the following case:
   int a[10];

   for (int *p = &a[0]; p < &a[10]; p++)
      /* ... */
where C90 requires the operand to refer to an object. The expression a+10 exists, but does not refer to an object. In C90 the expression &a[10] is undefined behavior, while C99 defines the behavior.
C++
Like C90 the C++ Standard does not say anything explicit about the result of a [] or unary * operator. The C++ Standard does not explicitly exclude objects declared with the register storage-class specifier appearing as operands of the unary & operator. In fact, there is wording suggesting that such a usage is permitted:
7.1.1p3
A register specifier has the same semantics as an auto specifier together with a hint to the implementation
that the object so declared will be heavily used. [Note: the hint can be ignored and in most implementations it
will be ignored if the address of the object is taken. —end note]
Source developed using a C++ translator may contain occurrences of the unary & operator applied to an operand declared with the register storage-class specifier, which will cause a constraint violation if processed by a C translator.
   void f(void)
   {
      register int a[10]; /* undefined behavior */
                          // well-formed

      &a[1]               /* constraint violation */
                          // well-formed
         ;
   }
Other Languages
Many languages that support pointers have no address operator (e.g., Pascal and Java, which has references, not pointers). In these languages, pointers can only point at objects returned by the memory-allocation functions. The address-of operator was introduced in Ada 95 (it was not available in Ada 83). Many languages do not allow the address of a function to be taken.
Coding Guidelines
In itself, use of the address-of operator is relatively harmless. The problems occur subsequently when the
value returned is used to access storage. The following are three, coding guideline related, consequences of
being able to take the address of an object:
• It provides another mechanism for accessing the individual bytes of an object representation (a pointer to an object can be cast to a pointer to character type, enabling the individual bytes of an object representation to be accessed).
761 pointer converted to pointer to character
• It is an alias for the object having that address.
• It provides a mechanism for accessing the storage allocated to an object after the lifetime of that object has terminated.
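The first of these consequences, access to the bytes of an object representation through a pointer to character type, can be sketched as follows. The function name sum_bytes is hypothetical and the code is an illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Sums the bytes of the object representation starting at obj,
   which is size bytes long. */
unsigned long sum_bytes(const void *obj, size_t size)
{
    const unsigned char *bytes = obj;   /* pointer to character type */
    unsigned long total = 0;
    size_t i;

    for (i = 0; i < size; i++)
        total += bytes[i];
    return total;
}
```

A caller would typically write sum_bytes(&object, sizeof object); for most types the result depends on representation details such as byte order and padding.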
Assigning the address of an object potentially increases the scope over which that object can be accessed. When is it necessary to increase the scope of an object? What are the costs/benefits of referring to an object using its address rather than its name? (If a larger scope is needed, could an object’s definition be moved to a scope where it is visible to all source code statements that need to refer to it?)
The parameter-passing mechanism in C is pass by value. What is often known as pass by reference is achieved, in C, by explicitly passing the address of an object. Different calls to a function having pass-by-reference arguments can involve different objects in different calls. Passing arguments, by reference, to functions is not a necessity; it is possible to pass information into and out of functions using file scope objects.
1004 function call preparing for
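A minimal sketch of pass by reference achieved by explicitly passing addresses; swap_ints and swap_demo are hypothetical names used only for this illustration.

```c
#include <assert.h>

/* Exchanges the values of the two objects whose addresses are passed. */
void swap_ints(int *a, int *b)
{
    int tmp = *a;

    *a = *b;
    *b = tmp;
}

/* Hypothetical caller: the addresses of two block scope objects are
   passed, so the callee can modify objects it cannot name. */
int swap_demo(void)
{
    int x = 1, y = 2;

    swap_ints(&x, &y);
    return x * 10 + y;   /* 21 if the values were exchanged */
}
```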
Assigning the address of an object creates an alias for that object. It then becomes possible to access the same object in more than one way. The use of aliases creates technical problems for translators (the behavior implied by the use of the restrict keyword was introduced into C99 to help get around this problem) and can require developers to use additional cognitive resources (they need to keep track of aliased objects).
1491 restrict intended use
A classification often implicitly made by developers is to categorize objects based on how they are
accessed, the two categories being those accessed by the name they were declared with and those accessed
via pointers. A consequence of using this classification is that developers overlook the possibility, within a
sequence of statements, of a particular object being modified via both methods. When readers are aware of an
object having two modes of reference (a name and a pointer dereference) is additional cognitive effort needed
to comprehend the source? Your author knows of no research on this subject. These coding guidelines
discuss the aliasing issue purely from the oversight point of view (faults being introduced because of lack of
information), because there is no known experimental evidence for any cognitive factors.
One way of reducing aliasing issues at the point of object access is to reduce the number of objects whose
addresses are taken. Is it possible to specify a set of objects whose addresses should not be taken and what
are the costs of having no alternatives for these cases? Is the cost worth the benefit? Restricting the operands
of the address operator to be objects having block scope would limit the scope over which aliasing could
occur. However, there are situations where the addresses of objects at file scope need to be used, including:
• An argument to a function could be an object with block scope, or file scope; for instance, the qsort function might be called.
• In resource-constrained environments it may be decided not to use dynamic storage allocation. For instance, all of the required storage may be defined at file scope and pointers to objects within this storage used by the program.
• The return from a function call is sometimes a pointer to an object, holding information. It may simplify storage management if this is a pointer to an object at file scope.
The following guideline recommendation ensures that the storage allocated to an object is not accessed once
the object’s lifetime has terminated.
Cg
1088.1
The address of an object shall not be assigned to another object whose scope is greater than that of
the object assigned.
Dev
1088.1
An object defined in block scope, having static storage duration, may have its address assigned to any
other object.
A function designator can appear as the operand of the address-of operator. However, taking the address of a
function is redundant. This issue is discussed elsewhere. Likewise for objects having an array type.
function designator converted to type 732
array converted to pointer 729
Example
In the following it is not possible to take the address of a or any of its elements.
   register int a[3];
In fact this object is virtually useless (the identifier a can appear as the operand to the sizeof operator). If allocated memory is not permitted (we know the memory requirements of the following on program startup):
   extern int *p;

   void init(void)
   {
      static int p_obj[20];

      p = p_obj; /* same as &p_obj[0]; &p_obj would have type int (*)[20] */
   }
This provides pointers to objects, but hides those objects within a block scope. There is no pointer/identifier
aliasing problem.
1089
The operand of the unary * operator shall have pointer type.
unary * operand has pointer type
Commentary
Depending on the context in which it occurs, there may be restrictions on the pointed-to type (because of the
type of the result).
unary *
result type
1098
C++
5.3.1p1
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type . . .
C++ does not permit the unary * operator to be applied to an operand having a pointer to void type.
   void *g_ptr;

   void f(void)
   {
      &*g_ptr; /* DR #012 */
               // DR #232
   }
Other Languages
In some languages indirection is a postfix operator; for instance, Pascal uses the token ^ as a postfix operator.
Semantics
1090
The unary & operator yields the address of its operand.
unary & operator
Commentary
For operands with static storage duration, the value of the address operator may be a constant (objects having an array type also need to be indexed with a constant expression). There is no requirement that the address of an object be the same between different executions of the same program image (for objects with static storage duration) or different executions of the same function (for objects with automatic storage duration).
1341 address constant
All external function references are resolved during translation phase 8. Any identifier denoting a function definition will have been resolved.
139 translation phase 8
The C99 Standard refers to this as the address-of operator.
1014 footnote 79
C90
This sentence is new in C99 and summarizes what the unary & operator does.
C++
Like C90, the C++ Standard specifies a pointer to its operand (5.3.1p1). But later on (5.3.1p2) goes on to say:
“In particular, the address of an object of type “cv T” is “pointer to cv T,” with the same cv-qualifiers.”
Other Languages
Many languages do not contain an address-of operator. Fortran 95 has an address assignment operator, =>. The left operand is assigned the address of the right operand.
Common Implementations
Early versions of K&R C treated p=&x as being equivalent to p&=x.[734]
In the case of constant addresses the value used in the program image is often calculated at link-time. For
objects with automatic storage duration, their address is usually calculated by adding a known, at translation
time, value (the offset of an object within its local storage area) to the value of the frame pointer for that
function invocation. Addresses of elements, or members, of objects can be calculated using the base address
of the object plus the offset of the corresponding subobject.
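The base-plus-offset calculation for members and elements can be observed from C itself. This is a sketch under stated assumptions: struct point and both function names are hypothetical, and offsetof from <stddef.h> supplies the member offset.

```c
#include <assert.h>
#include <stddef.h>

struct point { int x; int y; };

/* The address of a member equals the base address of the object
   plus the offset of that member. */
int member_address_matches_offset(void)
{
    struct point pt = { 1, 2 };
    unsigned char *base = (unsigned char *)&pt;

    return (unsigned char *)&pt.y == base + offsetof(struct point, y);
}

/* The address of an element equals the base address of the array
   plus the index scaled by the element size. */
int element_address_matches_offset(void)
{
    int a[4] = { 0 };
    unsigned char *base = (unsigned char *)a;

    return (unsigned char *)&a[3] == base + 3 * sizeof(int);
}
```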
Having an object appear as the operand of the address-of operator causes many implementations to play
safe and not attempt to perform some optimizations on that object. For instance, without sophisticated pointer
analysis, it is not possible to know which object a pointer dereference will access. (Implementations often
assume all objects that have had their address taken are possible candidates, others might use information on
the pointed-to type to attempt to reduce the set of possible accessed objects.) This often results in no attempt
being made to keep the values of such objects in registers.
Implementations’ representation of addresses is discussed elsewhere.
540 pointer type
describes a
1091
If the operand has type “type”, the result has type “pointer to type”.
Commentary
Although developers often refer to the address returned by the address-of operator, C does not have an
address type.
1092
If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue.
&*
Commentary
The only effect of the operator pair &* is to remove any lvalue-ness from the underlying operand. The combination *& returns an lvalue if its operand is an lvalue.
1114 footnote 84
1115 *&
This specification is consistent with the behavior of the last operator applied controlling lvalue-ness. This case was added in C99 to cover a number of existing coding idioms; for instance:
   #include <stddef.h>

   void DR_076(void)
   {
      int *n = NULL;
      int *p;

      /*
       * The following case is most likely to occur when the
       * expression *n is a macro argument, or body of a macro.
       */
      p = &*n;
      /* ... */
   }
C90
The responses to DR #012, DR #076, and DR #106 specified that the above constructs were constraint
violations. However, no C90 implementations known to your author diagnosed occurrences of these
constructs.
C++
This behavior is not specified in C++. Given that either operator could be overloaded by the developer to have a different meaning, such a specification would be out of place.
At the time of this writing a response to C++ DR #232 is being drafted (a note from the Oct 2003 WG21 meeting says: “We agreed that the approach in the standard seems okay: p = 0; *p; is not inherently an error. An lvalue-to-rvalue conversion would give it undefined behavior.”).
   void DR_232(void)
   {
      int *loc = 0;

      if (&*loc == 0) /* no dereference of a null pointer, defined behavior */
                      // probably not a dereference of a null pointer.
         ;

      &*loc = 0; /* not an lvalue in C */
                 // how should an implementation interpret the phrase must not (5.3.1p1)?
   }
Common Implementations
Some C90 implementations did not optimize the operator pair &* into a no-op. In these implementations the behavior of the unary * operator was not altered by the subsequent address-of operator. C99 implementations are required to optimize away the operator pair &*.
1093
Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator.
Commentary
This case was added in C99 to cover a number of coding idioms; for instance:
   void DR_076(void)
   {
      int a[10];
      int *p;

      /*
       * It is possible to point one past the end of an object.
       * For instance, we might want to loop over an object, using
       * this one past the end value. Given the equivalence that
       * applies to the subscript operator the operand of & in the
       * following case is the result of a unary * operator.
       */
      p = &a[10];

      for (p = &a[0]; p < &a[10]; p++)
         /* ... */ ;
   }
C90
This requirement was not explicitly specified in the C90 Standard. It was the subject of DR #076, which was closed by adding this wording to the C99 Standard.
C++
This behavior is not specified in C++. Given that either operator could be overloaded by the developer to have a different meaning, such a specification would be out of place. The response to C++ DR #232 may specify the behavior for this case.
Common Implementations
This requirement describes how all known C90 implementations behave.
Coding Guidelines
The expression &a[index], in the visible source code, could imply
• a lack of knowledge of C semantics (why wasn’t a+index written?),
• that the developer is trying to make the intent explicit, and
• that the developer is adhering to a coding standard that recommends against the use of pointer arithmetic; the authors of such standards often view (a+index) as pointer arithmetic, but a[index] as an array index (the equivalence between these two forms being lost on them).
989 array subscript identical to
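The equivalence of &a[index] and a+index can be checked directly. A minimal sketch, assuming a hypothetical ten-element array and an index in the range 0 to 10 inclusive (10 yields the one past the end pointer, which C99 permits to be formed but not dereferenced):

```c
#include <assert.h>

/* Returns 1 when &a[index] and a + index denote the same address.
   The caller is assumed to pass an index between 0 and 10. */
int same_address(int index)
{
    static int a[10];

    return &a[index] == a + index;
}
```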
1094
Otherwise, the result is a pointer to the object or function designated by its operand.
Commentary
There is no difference between the use of objects having a pointer type and using the address-of operator. For
instance, the result of the address-of operator could be assigned to an object having the appropriate pointer
type, and that object used interchangeably with the value assigned to it.
Common Implementations
In most implementations a pointer refers to the actual address of an object or function.
540 pointer type
describes a
1095
The unary * operator denotes indirection.
unary * indirection
Commentary
The terms indirection and dereference are both commonly used by developers.
C++
5.3.1p1
The unary * operator performs indirection.
Other Languages
Some languages (e.g., Pascal and Ada) use the postfix operator ^. Other languages (Algol 68 and Fortran 95) implicitly perform the indirection operation. In this case, an occurrence of an operand having a pointer type is dereferenced to return the value of the pointed-to object.
Coding Guidelines
Some coding guideline documents place a maximum limit on the number of simultaneous indirection
operators that can be successively applied. The rationale being that deeply nested indirections can be difficult
to comprehend. Is there any substance to this claim?
Expressions, such as ***p, are similar to nested function calls in that they have to be comprehended in a right-to-left order. The issue of nested constructions in natural language is discussed in that earlier C sentence. At the time of this writing there is insufficient experimental evidence to enable a meaningful cost/benefit analysis to be performed and these coding guidelines say nothing more about this issue.
sequential nesting *
sequential nesting () 1000
If sequences of unary * operators are needed in an expression, it is because an algorithm’s data structures make the usage necessary. In practice, long sequences of indirections using the unary * operator are rare. Like the function call case, it may be possible to provide a visual form that provides a higher-level interpretation and hides the implementation’s details of the successive indirections.
An explicit unary * operator is not the only way of specifying an indirection. Both the array subscript, [], and member selection, ->, binary operators imply an indirection. Developers rarely use the form (*s).m ((&s)->m), the form s->m (s.m) being much more obvious and natural. While the expression s1->m1->m2->m3 is technically equivalent to (*(*(*s1).m1).m2).m3, it is comprehended in a left-to-right order.
member selection 1031
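The equivalence of the two member selection forms can be sketched as follows. The structure types a, b, and c and the function chain_forms_agree are hypothetical, chosen only so that the member names m1, m2, and m3 match the expression in the text.

```c
#include <assert.h>

struct c { int m3; };
struct b { struct c *m2; };
struct a { struct b *m1; };

/* Returns 1 when both spellings yield the same value and designate
   the same object. */
int chain_forms_agree(void)
{
    struct c inner = { 42 };
    struct b mid = { &inner };
    struct a outer = { &mid };
    struct a *s1 = &outer;

    return s1->m1->m2->m3 == (*(*(*s1).m1).m2).m3
        && &s1->m1->m2->m3 == &(*(*(*s1).m1).m2).m3;
}
```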
Usage
A study by Mock, Das, Chambers, and Eggers[965] looked at how many different objects the same pointer dereference referred to during program execution (10 programs from the SPEC95 and SPEC2000 benchmarks were used). They found that in 90% to 100% of cases (average 98%) the set of objects pointed at, by a particular pointer dereference, contained one item. They also performed a static analysis of the source using a variety of algorithms for deducing points-to sets. On average (geometric mean) the static points-to sets were 3.3 times larger than the dynamic points-to sets.
SPEC benchmarks 0
1096
If the operand points to a function, the result is a function designator;
Commentary
The operand could be an object, with some pointer to function type, or it could be an identifier denoting a function that has been implicitly converted to a pointer to function type. This result is equivalent to the original function designator. Depending on the context in which it occurs this function designator may be converted to a pointer to function type.
function designator 731
function designator converted to type 732
C++
The C++ Standard also specifies (5.3.1p1) that this result is an lvalue. This difference is only significant for reference types, which are not supported by C.
Other Languages
Those languages that support some form of pointers to functions usually only provide a mechanism for indirect calls to the designated value. Operators for obtaining the function designator independent of a call are rarely provided. Some languages (e.g., Algol 68, Lisp) provide a mechanism for defining anonymous functions in an expression context, which can be assigned to objects and subsequently called.
Common Implementations
For most implementations the result is an address of a storage location. Whether there is a function definition
(translated machine code) at that address is not usually relevant until an attempt is made to call the designated
function (using the result).
Coding Guidelines
Because of the implicit conversions a translator is required to perform, the unary * operator is not required to
cause the designated function to be called. There are a number of situations that can cause such usage to
appear in source code: the token sequence may be in automatically generated source code, or the sequence
may occur in developer-written source via arguments passed to macros, or developers may apply it to objects
having a pointer to function type because they are unaware of the implicit conversions that need to be
performed.
Example
   extern void f(void);
   extern void (*p_f)(void);

   void g(void)
   {
      f();
      (*f)();
      (*******f)();
      (*p_f)();
   }
1097
if it points to an object, the result is an lvalue designating the object.
Commentary
The indirection operator produces a result that allows the pointed-to object to be treated like an anonymous object. The result can appear in the same places that an identifier (defined to be an object of the same type) can appear. The resulting lvalue might not be a modifiable lvalue. There may already be an identifier that refers to the same object. If two or more different access paths to an object exist, it is said to be aliased.
724 modifiable lvalue
971 object aliased
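A minimal sketch of an aliased object, with two access paths to the same storage; the names alias_demo, counter, and p are hypothetical.

```c
#include <assert.h>

/* counter is modified through its name and through *p; both paths
   designate the same object. */
int alias_demo(void)
{
    int counter = 0;
    int *p = &counter;   /* second access path to counter */

    counter = 1;         /* modified through its name */
    *p += 4;             /* modified through the pointer */
    return counter;
}
```

A reader who tracks only assignments to the name counter would expect the function to return 1 rather than 5.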
Common Implementations
Some processors (usually CISC) have instructions that treat their operand as an indirect reference. For
instance, an indirect load instruction obtains its value from the storage location pointed to by the storage
location that is the operand of the instruction.
1098
If the operand has type “pointer to type”, the result has type “type”.
unary * result type
Commentary
The indirection operator removes one level of pointer from the operand’s type. The operand is required to have pointer type. In many contexts the result type of a pointer to function type will be implicitly converted back to a pointer type.
1089 unary * operand has pointer type
732 function designator converted to type
1099
If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.84)
Commentary
The standard does not provide an all-encompassing definition of what an invalid value is. The footnote gives some examples. An invalid value has to be created before it can be assigned and this may involve a conversion operation. Those pointer conversions for which the standard defines the behavior do not create invalid values. So the original creation of the invalid value, prior to assignment, must also involve undefined behavior.
1114 footnote 84
743 pointer to void converted to/from
If no value has been assigned to an object, it has an indeterminate value.
461 object initial value indeterminate
The equivalence between the array access operator and the indirection operator means that the behavior of what is commonly known as an out of bounds array access is specified here.
989 array subscript identical to
C++
The C++ Standard does not explicitly state the behavior for this situation.
Common Implementations
For most implementations the undefined behavior is decided by the behavior of the processor executing the program. The root cause of applying the indirection operator to an invalid value is often a fault in a program and implementations that perform runtime checks sometimes issue a diagnostic when such an event occurs. (Some vendors have concluded that their customers would not accept the high performance penalty incurred in performing this check, and they don’t include it in their implementation.) The result can often be manipulated independently of whether there is an object at that storage location, although some processors do perform a few checks.
pointer cause undefined behavior 454
Techniques for detecting the dereferencing of invalid pointer values usually incur a significant runtime overhead[63, 692, 701, 1049, 1314] (programs often execute at least a factor of 10 times slower). A recent implementation developed by Dhurjati and Adve[357] reported performance overheads in the range 12% to 69%.
Coding Guidelines
This usage corresponds to a fault in the program and these coding guidelines are not intended to recommend
against the use of constructs that are obviously faults.
guidelines
not faults
0
1100
Forward references: storage-class specifiers (6.7.1), structure and union specifiers (6.7.2.1).
6.5.3.3 Unary arithmetic operators
Constraints
1101
The operand of the unary + or - operator shall have arithmetic type;
Commentary
The unary - operator is sometimes passed as a parameter in a macro invocation. In those cases where negation of an operand is not required (in the final macro replacement), the unary + operator can be passed as an argument (empty macro arguments cause problems for some preprocessors). The symmetry of having two operators can also simplify the automatic generation of source code. While it would have been possible to permit the unary + operator to have an operand of any type (since it has no effect other than performing the integer promotions on its operand), it is very unlikely that this operator would ever appear in a context that the unary - operator would not also appear in.
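The macro usage described above might look like the following sketch. The macro APPLY_SIGN and the two wrapper functions are hypothetical; the point is only that + can be supplied where no negation is wanted, so the argument is never empty.

```c
#include <assert.h>

/* Hypothetical macro: sign_op is expected to be the token + or -. */
#define APPLY_SIGN(sign_op, value) (sign_op (value))

/* Unary + passed where negation is not required. */
int keep_sign(int v)
{
    return APPLY_SIGN(+, v);
}

/* Unary - passed where negation is required. */
int flip_sign(int v)
{
    return APPLY_SIGN(-, v);
}
```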
C++
The C++ Standard permits the operand of the unary + operator to have pointer type (5.3.1p6).
Coding Guidelines
While applying the unary minus operator to an operand having an unsigned integer type is seen in some algorithms (it can be a more efficient method of subtracting the value from the corresponding U*_MAX macro, in <limits.h>, and adding one), this usage is generally an oversight by a developer.
Rev
1101.1
The promoted operand of the unary - operator shall not be an unsigned type.
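The equivalence mentioned in the guideline discussion can be sketched for unsigned int as below. The
function names are hypothetical, chosen for illustration; the arithmetic relies only on unsigned operations
being performed modulo UINT_MAX + 1.

```c
#include <limits.h>

/* Negating an unsigned int: the result is reduced modulo UINT_MAX + 1. */
unsigned int negate_with_operator(unsigned int v) { return -v; }

/* The longhand form: subtract the value from UINT_MAX, then add one. */
unsigned int negate_longhand(unsigned int v) { return UINT_MAX - v + 1u; }
```

Both functions return the same value for every argument. The guideline is worded in terms of the promoted
operand because an operand of a narrower unsigned type, such as unsigned short, is promoted to int before
negation on implementations where int can represent all of its values.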
1102
of the ~ operator, integer type;
v 1.2 June 24, 2009
Commentary
There are algorithms (e.g., in graphics applications) that require the bits in an integer value to be
complemented, and processors invariably contain an instruction for performing this operation. Complementing
the bits in a floating-point value is a very rarely required operation and processors do not contain such an
instruction. This constraint reflects this common usage.
Other Languages
While many languages do not contain an equivalent of the ~ operator, their implementations sometimes
include it as an extension.
Coding Guidelines
Some coding guideline documents only recommend against the use of operands having a signed type. The
argument is that the representation of unsigned types is defined by the standard, while signed types might have
one of several representations. In practice, signed types almost universally have the same representation:
two’s complement. However, the possibility of variability of integer representation across processors is not
the only important issue here. The ~ operator treats its operand as a sequence of bits, not a numeric value. As
such it may be making use of representation information and the guideline recommendation dealing with this
issue would be applicable.
612 two’s complement
569.1 representation information using
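The difference between the signed and unsigned cases can be sketched as below. The function names are
hypothetical. The numeric result for the signed case is an assumption that holds only on a two’s complement
implementation, which is why a small helper for detecting that representation is included.

```c
#include <limits.h>

/* For unsigned int the result is defined by the standard: the maximum
   value representable in the type minus the operand. */
unsigned int complement_unsigned(unsigned int v) { return ~v; }

/* For signed int the numeric result depends on the representation.
   Assumption: on a two's complement implementation ~v equals -v - 1. */
int complement_signed(int v) { return ~v; }

/* Hypothetical helper: nonzero when negative values use two's complement. */
int is_twos_complement(void) { return (-1 & 3) == 3; }
```

In both functions the operator is applied to a bit pattern; only the unsigned variant has a result that can
be stated without reference to representation information.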
1103
of the ! operator, scalar type. !
operand type
Commentary
The logical negation operator is defined in terms of the equality operator, whose behavior in turn is only
defined for scalar types.
1113 ! equivalent to
1213 equality operators constraints
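A minimal sketch of the constraint: the ! operator accepts any scalar operand (integer, floating, or
pointer) and gives the same result as comparing that operand for equality with zero. The function names are
hypothetical.

```c
#include <stddef.h>

/* Each operand type below is scalar, so each use of ! is permitted. */
int not_int(int v)             { return !v; }
int not_double(double v)       { return !v; }
int not_pointer(const void *p) { return !p; }

/* The same result spelled with the equality operator. */
int zero_equals_int(int v)     { return 0 == v; }

/* An operand of structure or union type is not scalar, so applying !
   to it would violate the constraint. */
```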
C++
The C++ Standard does not specify any requirements on the type of the operand of the ! operator.
5.3.1p8
The operand of the logical negation operator ! is implicitly converted to bool (clause 4);
But the behavior is only defined if operands of scalar type are converted to bool:
4.12p1
An rvalue of arithmetic, enumeration, pointer, or pointer to member type can be converted to an rvalue of type
bool.
Other Languages
Some languages require the operand to have a boolean type.
Coding Guidelines
The following are two possible ways of thinking about this operator:
1. As a shorthand form of the == operator in a conditional expression. That is, in the same way the
two forms if (x) and if (x != 0) are equivalent, the two forms if (!x) and if (x == 0) are equivalent.
2. As a logical negation operator that reverses the state of a boolean value (it can take as its operand a
value in either of the possible boolean representation models and map it to the model that uses the 0/1
for its boolean representation).
476 boolean role
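Both ways of thinking about the operator can be sketched in a few lines. The function names are hypothetical
and the code is an illustration only, not a recommended usage.

```c
/* View 1: !x in a condition is shorthand for comparing x with zero. */
int is_zero_shorthand(int x) { if (!x) return 1; return 0; }
int is_zero_explicit(int x)  { if (x == 0) return 1; return 0; }

/* View 2: ! reverses a boolean state and always yields 0 or 1, so
   applying it twice maps any nonzero "true" value to 1. */
int to_zero_one(int x) { return !!x; }
```

For example, to_zero_one(42) and to_zero_one(-3) both return 1, while to_zero_one(0) returns 0.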
A double negative is very often interpreted as a positive statement in English (e.g., “It is not unknown for
double negatives to occur in C source”). The same semantics apply in C. However, in some languages
(e.g., Spanish) a double negative is interpreted as making the statement more negative (this usage does occur
in casual English speech, e.g., “you haven’t seen nothing yet”, but it is rare and frowned on socially [120]).
The token ! is commonly called the not operator. This term is a common English word whose use in a
sentence is similar to its use in a C expression. Through English language usage the word not, or an equivalent