calculate p′ = 1/(2n) when r = 0 and p′ = (2n − 1)/(2n) when r = n, and obtain y from p′.
A further point is that the probit transformation does not stabilize variances,
even for observations with constant n. Some form of weighting is therefore
desirable in any analysis. A rigorous approach is provided by the method called
probit analysis (Finney, 1971; see also §20.4).
The effect of the probit transformation in linearizing a relationship is shown
in Fig. 14.1. In Figure 14.1(b) the vertical axis on the left is the NED of p, and the
scale on the right is the probability scale, in which the distances between points
on the vertical scale are proportional to the corresponding distances on the
probit or NED scale.
Logit transformation
The logit of p is defined as

$$y = \ln\left(\frac{p}{1-p}\right). \qquad (14.5)$$

Occasionally (Fisher & Yates, 1963; Finney, 1978) the definition incorporates a factor ½, so that y = ½ ln[p/(1 − p)]; this has the effect of making the values rather
The effect of the logit (or logistic) transformation is very similar indeed to
that of the probit transformation.
The probit transformation is reasonable on biological grounds in some
circumstances; for example, in a quantal assay of insecticides applied under
different controlled conditions, a known number of flies might be exposed at a
number of different doses and a count made of the number killed. In this type of
study, individual tolerances or their logs may be assumed to have a normal
distribution, and this leads directly to the probit model (§20.4).
The logit transformation is more arbitrary, but has important advantages.
First, it is easier to calculate, since it requires only the log function rather than
the inverse normal distribution function. Secondly, and more importantly, the
logit is the logarithm of the odds, and logit differences are logarithms of odds
ratios (see (4.22)). The odds ratio is important in the analysis of epidemiological
studies, and logistic regression can be used for a variety of epidemiological study
designs (§19.4) to provide estimates of relative risk (§19.5).
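As a small illustration of the log-odds interpretation, the sketch below computes the logit and an odds ratio directly. It is written in Python with numpy, which is an assumption of this presentation; the original analyses in this chapter were run in packages such as SAS, SPSS and GLIM.

```python
import numpy as np

def logit(p):
    """The logit of p: the log odds, ln(p / (1 - p)), as in (14.5)."""
    return np.log(p / (1 - p))

# A difference of two logits is the logarithm of an odds ratio (cf. (4.22)):
p1, p2 = 0.6, 0.3
odds_ratio = np.exp(logit(p1) - logit(p2))   # (0.6/0.4) / (0.3/0.7) = 3.5
print(odds_ratio)
```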
14.2 Logistic regression
The logit transformation gives the method of logistic regression:

$$\ln\left(\frac{m}{1-m}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p. \qquad (14.6)$$
Fitting a model
Two approaches are possible: first, an approximate method using empirical
weights and, secondly, the theoretically more satisfactory maximum likelihood
solution. The former method is a weighted regression analysis (p. 344), where
each value of the logit is weighted by the reciprocal of its approximate variance.
This method is not exact: first, because the ys are not normally distributed about their population values and, secondly, because the weights are not exactly in inverse proportion to the variances, being expressed in terms of the estimated proportion p. For this reason the weights are often called empirical. Although this method is adequate if most of the sample sizes are reasonably large and few of the ps are close to 0 or 1 (Example 14.1 was analysed using this method in earlier editions of this book), the ease of using the more satisfactory maximum likelihood method with statistical software means it is no longer recommended. If the observed proportions p are based on n = 1 observation only, their values will be either 0 or 1, and the empirical method cannot be used. This situation occurs in the analysis of prognostic data, where an individual patient is classified as `success' or `failure', several explanatory variables x_j are observed, and the object is to predict the probability of success in terms of the xs.
Maximum likelihood
The method of estimation by maximum likelihood, introduced in §4.1, has certain desirable theoretical properties and can be applied to fit logistic regression and other generalized linear models. The likelihood of the data is propor-
tional to the probability of obtaining the data (§3.3). For data of known
distributional form, and where the mean value is given in terms of a generalized
linear model, the probability of the observed data can be written down using the
appropriate probability distributions. For example, with logistic regression
the probability for each group or individual can be calculated using the binomial
probability from (14.6) in (3.12) and the likelihood of the whole data is the
product of these probabilities over all groups or individuals. This likelihood
depends on the values of the regression coefficients, and the maximum likelihood
estimates of these regression coefficients are those values that maximize the
likelihood, that is, the values for which the data are most likely to occur. For
theoretical reasons, and also for practical convenience, it is preferable to work in
terms of the logarithm of the likelihood. Thus it is the log-likelihood, L, that is
maximized. The method also gives standard errors of the estimated regression
coefficients and significance tests of specific hypotheses.
By analogy with the analysis of variance for a continuous variable, the
analysis of deviance is used in generalized linear models. The deviance is defined
as twice the difference between the log-likelihood of a perfectly fitting model and
that of the current model, and has associated degrees of freedom (DF) equal to
the difference in the number of parameters between these two models. Where the
error distribution is completely defined by the link between the random and linear parts of the model (this will be the case for binomial and Poisson variables but not for a normal variable, for which the size of the variance is also required), deviances follow approximately the χ² distribution and can be used for the testing of significance. In particular, reductions in deviance due to adding extra terms into the model can be used to assess whether the inclusion of the extra terms has resulted in a significant improvement to the model. This is analogous to the analysis of variance test for deletion of variables described in §11.6 for a continuous variable.
The significance of an effect on a single degree of freedom may be tested
by the ratio of its estimate to its standard error (SE), assessed as a stand-
ardized normal deviate. This is known as the Wald test, and its square as the Wald χ².
Another test is the score test which is based on the first derivative of the log-
likelihood with respect to a parameter and its variance (see Agresti, 1996, §4.5.2).
Both are evaluated at the null value of the parameter and conditionally on the
other terms in the model. This statistic is less readily available from statistical
software except in simple situations.
The procedure for fitting a model using the maximum likelihood method usually involves iteration, that is, repeating a sequence of calculations until a stable solution is reached. Fitted weights are used and, since these depend on the parameter estimates, they change from cycle to cycle of the iteration. The approximate solution using empirical weights could be the first cycle in this iterative procedure, and the whole procedure is sometimes called iterative weighted least squares. The technical details of the procedure will not be given, since the process is rather tedious and the computations require appropriate statistical software (for example, PROC LOGISTIC in SAS (2000), LOGISTIC REGRESSION in SPSS (1999), or GLIM (Healy, 1988)). For further details of the maximum likelihood method see, for example, Wetherill (1981).
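The iterative scheme just described can be sketched in a few lines. The following Python function is an illustrative assumption of this presentation, not code from any of the packages named above; it fits a logistic regression to grouped binomial data by iterative weighted least squares, where X is the design matrix including a constant column and r events are observed out of n in each group.

```python
import numpy as np

def logistic_irls(X, n, r, n_iter=25, tol=1e-10):
    """Fit ln(m/(1-m)) = X b to grouped binomial data (r out of n)
    by iterative weighted least squares; returns the estimates,
    their standard errors and the deviance."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))        # fitted proportions
        w = n * mu * (1.0 - mu)                # binomial weights
        z = eta + (r - n * mu) / w             # working variate
        beta_new = np.linalg.solve((X.T * w) @ X, (X.T * w) @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = n * mu * (1.0 - mu)
    se = np.sqrt(np.diag(np.linalg.inv((X.T * w) @ X)))
    # deviance: twice the log-likelihood gap from the saturated model
    p = np.clip(r / n, 1e-12, 1 - 1e-12)
    dev = 2 * np.sum(r * np.log(p / mu) + (n - r) * np.log((1 - p) / (1 - mu)))
    return beta, se, dev
```

The first cycle of this loop plays the role of the approximate solution, and subsequent cycles update the weights until the estimates stabilize.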
Example 14.1
Table 14.1 shows some data reported by Lombard and Doering (1947) from a survey of
knowledge about cancer. These data have been used by several other authors (Dyke &
Patterson, 1952; Naylor, 1964). Each line of the table corresponds to a particular combination of factors in a 2⁴ factorial arrangement, n being the number of individuals in this
category and r the number who gave a good score in response to questions about cancer
knowledge. The four factors are: A, newspaper reading; B, listening to radio; C, solid
reading; D, attendance at lectures.
Table 14.1 A 2⁴ factorial set of proportions (Lombard & Doering, 1947). The fitted proportions from a logistic regression analysis are shown in column (4).

Factor        (1) Number of    (2) Number with   (3) Observed             (4) Fitted
combination   individuals, n   good score, r     proportion, p = (2)/(1)  proportion
(1)           477               84               0·176                    0·188
(a)           231               75               0·325                    0·308
(b)            63               13               0·206                    0·240
(ab)           94               35               0·372                    0·377
(c)           150               67               0·447                    0·382
(ac)          378              201               0·532                    0·542
(bc)           32               16               0·500                    0·458
(abc)         169              102               0·604                    0·618
(d)            12                2               0·167                    0·261
(ad)           13                7               0·538                    0·404
(bd)            7                4               0·571                    0·325
(abd)          12                8               0·667                    0·480
(cd)           11                3               0·273                    0·485
(acd)          45               27               0·600                    0·643
(bcd)           4                1               0·250                    0·562
(abcd)         31               23               0·742                    0·711
Although the data were obtained from a survey rather than from a randomized
experiment, we can usefully study the effect on cancer knowledge of the four main effects
and their interactions. The main effects and interactions will not be orthogonal but can be
estimated.
There are 16 groups of individuals and a model containing all main effects and all
interactions would fit the data perfectly. Thus by definition it would have a deviance of
zero and serves as the reference point in assessing the fit of simpler models.
The first logistic regression model fitted was that containing only the main effects. This
gave a model in which the logit of the probability of a good score was estimated as
−1·4604 + 0·6498A + 0·3101B + 0·9806C + 0·4204D
SE:         0·1154    0·1222    0·1107    0·1910
z:          5·63      2·54      8·86      2·20
P:         <0·001     0·011    <0·001     0·028
Here the terms involving A, B, C and D are included when these factors are present and
omitted otherwise.
The significance of the main effects has been tested by Wald's test, that is, the ratio of an estimate to its standard error assessed as a standardized normal deviate. Alternatively, the significance may be established by analysis of deviance. For example, fitting the model containing only the main effects of B, C and D gives a deviance of 45·47 with 12 DF. Adding the main effect of A to the model reduces the deviance to 13·59 with 11 DF, so that the deviance test for the effect of A, after allowing for B, C and D, is 45·47 − 13·59 = 31·88 as an approximate χ²₁. This test is numerically similar to Wald's test, since √31·88 = 5·65, but in general such close agreement would not be expected. Although the deviance tests of main effects are not necessary here, in general they are needed. For example, if a factor with more than two levels were fitted, using dummy variables (§11.7), a deviance test with the appropriate degrees of freedom would be required.
The deviance associated with the model including all the main effects is 13·59 with 11 DF, and this represents the 11 interactions not included in the model. Taking the deviance as a χ²₁₁, there is no evidence that the interactions are significant and the model with just main effects is a good fit. However, there is still scope for one of the two-factor interactions to be significant and it is prudent to try adding each of the six two-factor interactions in turn to the model. As an example, when the interaction of the two kinds of reading, AC, is included, the deviance reduces to 10·72 with 10 DF. Thus, this interaction has an approximate χ²₁ of 2·87, which is not significant (P = 0·091). Similarly, none of the other interactions is significant.

The adequacy of the fit can be visualized by comparing the observed and fitted proportions over the 16 cells. The fitted proportions are shown in column (4) of Table 14.1 and seem in reasonable agreement with the observed values in column (3). A formal test may be constructed by calculating the expected frequencies, E(r) and E(n − r), for each factor combination and calculating the Pearson χ² statistic (8.28). This has the value 13·61 with 11 DF (16 − 5, since five parameters have been fitted). This test statistic is very similar to the deviance in this example, and the model with just the main effects is evidently a good fit.
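The main-effects analysis of Example 14.1 can be reproduced from Table 14.1 with any GLM software. The following Python sketch uses statsmodels, which is an assumption of this presentation (the book's own computations were done in packages such as SAS and GLIM); it should roughly reproduce the coefficients, standard errors and deviance quoted above.

```python
import numpy as np
import statsmodels.api as sm

# Table 14.1: factor combination, number of individuals n, good scores r
cells = [("",     477,  84), ("a",    231,  75), ("b",    63,  13),
         ("ab",    94,  35), ("c",    150,  67), ("ac",  378, 201),
         ("bc",    32,  16), ("abc",  169, 102), ("d",    12,   2),
         ("ad",    13,   7), ("bd",     7,   4), ("abd",  12,   8),
         ("cd",    11,   3), ("acd",   45,  27), ("bcd",   4,   1),
         ("abcd",  31,  23)]
X = np.array([[int(f in lab) for f in "abcd"] for lab, _, _ in cells], float)
n = np.array([c[1] for c in cells], float)
r = np.array([c[2] for c in cells], float)

# Grouped binomial response: events and non-events as two columns
fit = sm.GLM(np.column_stack([r, n - r]), sm.add_constant(X),
             family=sm.families.Binomial()).fit()
print(fit.params)    # roughly -1.4604, 0.6498, 0.3101, 0.9806, 0.4204
print(fit.bse)       # roughly 0.1154, 0.1222, 0.1107, 0.1910 for A-D
print(fit.deviance)  # roughly 13.59 on 11 DF
```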
The data of Example 13.2 could be analysed using logistic regression. In this
case the observed proportions are each based on one observation only. As a
model we could suppose that the logit of the population probability of survival,
Y, was related to haemoglobin, x₁, and bilirubin, x₂, by the linear logistic regression formula (14.6):

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2.$$

Application of the maximum likelihood method gave the following estimates of β₀, β₁ and β₂ with their standard errors:

β̂₀ = −2·354 ± 2·416
β̂₁ = 0·5324 ± 0·1487
β̂₂ = −0·4892 ± 0·3448.    (14.7)
The picture is similar to that presented by the discriminant analysis of Example
13.2. Haemoglobin is an important predictor; bilirubin is not. An interesting
point is that, if the distributions of the xs are multivariate normal, with the same
variances and covariances for both successes and failures (the basic model for
discriminant analysis), the discriminant function (13.4) can also be used to
predict Y. The formula is:
$$Y = b'_0 + b'_1 x_1 + b'_2 x_2 + \cdots + b'_p x_p,$$

where b′_j = b_j and

$$b'_0 = -\tfrac{1}{2}\left[b'_1(\bar{x}_{A1} + \bar{x}_{B1}) + \cdots + b'_p(\bar{x}_{Ap} + \bar{x}_{Bp})\right] + \ln(n_A/n_B). \qquad (14.8)$$
In Example 13.2, using the discriminant function coefficients b₁ and b₂ given there, we find

b′₀ = −4·135, b′₁ = 0·6541, b′₂ = −0·3978,

which lead to values of Y not differing greatly from those obtained from (14.7), except for extreme values of x₁ and x₂.
An example of the use of the linear discriminant function to predict the
probability of coronary heart disease is given by Truett et al. (1967). The point
should be emphasized that, in situations in which the distributions of xs are far
from multivariate normal, this method may be unreliable, and the maximum
likelihood solution will be preferable.
To test the adequacy of the logistic regression model (14.6), after fitting by maximum likelihood, an approximate χ² test statistic is given by the deviance. This was the approach in Example 14.1, where the deviance after fitting the four main effects was 13·59 with 11 DF (since four main effects and a constant term had been estimated from 16 groups). The fit is clearly adequate, suggesting that there is no need to postulate interactions, although, as was done in the example, a further refinement to testing the goodness of fit is to try interactions, since a single effect with 1 DF could be undetected when tested with other effects contributing 10 DF.
In general terms, the adequacy of the model can be assessed by including terms such as x_i², to test for linearity in x_i, and x_i x_j, to test for an interaction between x_i and x_j.
The approximation to the distribution of the deviance by χ² is unreliable for sparse data, that is, if a high proportion of the observed counts are small. The extreme case of sparse data is where all values of n are 1. Differences between deviances can still be used to test for the inclusion of extra terms in the model. For sparse data, tests based on the differences in deviances are superior to the corresponding Wald test (Hauck & Donner, 1977). Goodness-of-fit tests should
be carried out after forming groups of individuals with the same covariate
patterns. Even for a case of individual data, it may be that the final model results
in a smaller number of distinct covariate patterns; this is particularly likely to be
the case if the covariates are categorical variables with just a few levels. The value
of the deviance is unaltered by grouping into covariate patterns, but the degrees
of freedom are equal to the number of covariate patterns less the number of
parameters fitted.
For individual data that do not reduce to a smaller number of covariate
patterns, tests based on grouping the data may be constructed. For a logistic
regression, grouping could be by the estimated probabilities and a χ² test produced by comparing observed and expected frequencies (Lemeshow & Hosmer, 1982; Hosmer & Lemeshow, 1989, §5.2.2). In this test the individuals are ranked in terms of the size of the estimated probability, P, obtained from the fitted logistic regression model. The individuals are then divided into g groups; often g = 10. One way of doing this is to have the groups of equal size, that is, the first 10% of subjects are in the first group, etc. Another way is to define the groups in terms of the estimated probabilities, so that the first group contains those with estimated probabilities less than 0·1, the second 0·1 to 0·2, etc. A g × 2 table is then formed, in which the columns represent the two categories of the dichotomous outcome variable, containing the observed and expected numbers in each cell. The expected numbers for each group are the sum of the estimated probabilities, P, and the sum of 1 − P, for all the individuals in that group. A χ² goodness-of-fit statistic is then calculated (11.73). Based on simulations, Hosmer and Lemeshow (1980) showed that this test statistic is distributed approximately as a χ² with g − 2 degrees of freedom. This test can be modified when some individuals have the same covariate pattern (Hosmer & Lemeshow, 1989, §5.2.2), provided that the total number of covariate patterns is not too different from the total number of individuals.
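A minimal sketch of this grouping procedure, assuming Python with numpy and scipy (an assumption of this presentation; y is the 0/1 outcome and p the fitted probabilities from the model):

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p, g=10):
    """Rank subjects by fitted probability, split them into g groups of
    (nearly) equal size, and compare observed with expected frequencies
    in the resulting g x 2 table."""
    groups = np.array_split(np.argsort(p), g)
    x2 = 0.0
    for idx in groups:
        obs1, exp1 = y[idx].sum(), p[idx].sum()            # events
        obs0, exp0 = len(idx) - obs1, (1 - p[idx]).sum()   # non-events
        x2 += (obs1 - exp1) ** 2 / exp1 + (obs0 - exp0) ** 2 / exp0
    return x2, chi2.sf(x2, g - 2)   # approximately chi-squared on g-2 DF
```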
Diagnostic methods based on residuals similar to those used in classical
regression (§11.9) can be applied. If the data are already grouped, as in Example
14.1, then standardized residuals can be produced and assessed, where each
residual is standardized by its estimated standard error. In logistic regression
the standardized residual is

$$\frac{r - n\hat{m}}{\sqrt{n\hat{m}(1 - \hat{m})}},$$

where there are r events out of n. For individual data the residual may be defined using the above expression, with r either 0 or 1, but the individual residuals are of little use since they are not distributed normally and cannot be assessed individually. For example, if m̂ = 0·01, the only possible values of the standardized residual are 9·9 and −0·1; the occurrence of the larger residual does not necessarily indicate an outlying point, and if accompanied by 99 of the smaller residuals the fit would be perfect. It is, therefore, necessary to group the residuals, defining groups as individuals with similar values of the x_i.
Alternative definitions of the residual include correcting for the leverage of
the point in the space of the explanatory variables to produce a residual equiva-
lent to the Studentized residual (11.67). Another definition is the deviance re-
sidual, defined as the square root of the contribution of the point to the deviance.

Cox and Snell (1989; §2.7) give a good description of the use of residuals in
logistic regression.
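For grouped data, both kinds of residual can be computed directly from the fitted proportions. A Python sketch, assumed here for illustration (not taken from Cox and Snell); r events out of n are observed in each group, with fitted proportion mu:

```python
import numpy as np

def logistic_residuals(r, n, mu):
    """Standardized (Pearson) and deviance residuals for grouped data."""
    pearson = (r - n * mu) / np.sqrt(n * mu * (1 - mu))
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = np.where(r > 0, r * np.log(r / (n * mu)), 0.0)
        t2 = np.where(n > r, (n - r) * np.log((n - r) / (n * (1 - mu))), 0.0)
    # each deviance residual is the signed square root of the group's
    # contribution to the deviance
    return pearson, np.sign(r - n * mu) * np.sqrt(2 * (t1 + t2))
```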
The use of influence diagnostics is discussed in Cox and Snell (1989) and by
Hosmer and Lemeshow (1989, §5.3). There are some differences in leverage
between logistic regression and classical multiple regression. In the latter (see
p. 366) the points furthest from the mean of the x variables have the highest
leverages. In logistic regression the leverage is modified by the weight of each
observation and points with low or high expected probabilities have small
weight. As such probabilities are usually associated with distant points, this
reduces the leverage of these points. The balance between position of an obser-
vation in the x variable space and weight suggests that the points with highest leverage are those with fitted probabilities of about 0·2 or 0·8 (Hosmer & Lemeshow, 1989, §5.3). The concept of Cook's distance can be used in logistic regression and (11.72) applies, using the modified leverage as just discussed, although in this case only approximately (Pregibon, 1981).
In some cases the best-fitting model may not be a good fit, but all attempts to
improve it through adding in other or transformed x variables fail to give any
worthwhile improvement. This may be because of overdispersion due to some
extra source of variability. Unless this variability can be explained by some
extension to the model, the overdispersion can be taken into account in tests of
significance and the construction of confidence intervals by the use of a scaling
factor. Denoting this factor by f, any χ² statistics are divided by f and standard errors are multiplied by √f; f may be estimated from a goodness-of-fit test. For non-sparse data this could be the residual deviance divided by its degrees of freedom. For sparse data it is difficult to identify and estimate overdispersion. For a more detailed discussion, see McCullagh and Nelder (1989, §4.5).

The model might be inadequate because of an inappropriate choice of the
link function. An approach to this problem is to extend the link function into a
family indexed by one or more parameters. Tests can then be derived to deter-
mine if there is evidence against the particular member of the family originally
used (Pregibon, 1980; Brown, 1982; McCullagh & Nelder, 1989).
The strength of fit, or the extent to which the fitted regression discriminates
between observed and predicted, is provided by the concordance/discordance of
pairs of responses. These measures are constructed as follows.
1 Define all pairs of observations in which one member of the pair has the characteristic under analysis and the other does not.
2 Find the fitted probabilities of each member of the pair, p₊ and p₋.
3 Then: if p₊ > p₋ the pair is concordant; if p₊ < p₋ the pair is discordant; if p₊ = p₋ the pair is tied.
4 Over all pairs find the percentages in the three classes: concordant, discordant and tied.
These three percentages may be combined into a single summary measure in various ways. A particularly useful summary measure is

c = (% concordant + 0·5 × % tied)/100.

A value of c of 0·5 indicates no discrimination and 1·0 perfect discrimination. (c is also the area under the receiver operating characteristic (ROC) curve (see §19.9).)
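A direct, if brute-force, computation of these percentages and of c is sketched below in Python (an assumption of this presentation; most statistical packages report c routinely):

```python
import numpy as np

def concordance(y, p):
    """Proportions of concordant, discordant and tied pairs, and the
    index c = proportion concordant + half the proportion tied."""
    diff = p[y == 1][:, None] - p[y == 0][None, :]  # every event vs non-event
    n_pairs = diff.size
    conc = np.sum(diff > 0) / n_pairs
    disc = np.sum(diff < 0) / n_pairs
    tied = np.sum(diff == 0) / n_pairs
    return conc, disc, tied, conc + 0.5 * tied
```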
For small data sets the methods discussed above will be inadequate, because the approximations of the test statistics to the χ² distribution will be unsatisfactory, or a convergent maximum likelihood solution may not be obtained with standard statistical software. Exact methods for logistic regression may be applied (Mehta & Patel, 1995) using the LogXact software.
It was mentioned earlier that an important advantage of logistic regression is that it can be applied to data from a variety of epidemiological designs, including cohort studies and case–control studies (§19.4). In a matched case–control study, controls are chosen to match their corresponding case for some variables. Logistic regression can be applied to estimate the effects of variables not included in the matching, but the analysis is conditional within the case–control sets; the method is then referred to as conditional logistic regression (§19.5).
14.3 Polytomous regression
Some procedures for the analysis of ordered categorical data are described in Chapter 15. These procedures are limited in two respects: they are appropriate
for relatively simple data structures, where the factors to be studied are few in
number; and the emphasis is mainly on significance tests, with little discussion of the need to describe the nature of any associations revealed by the tests. Both of
these limitations are overcome by generalized linear models, which relate the
distribution of the ordered categorical response to a number of explanatory
variables. Because response variables of this type have more than two categories,
they are often referred to as polytomous responses and the corresponding proce-
dures as polytomous regression.
Three approaches are described very briefly here. The first two are general-
izations of logistic regression, and the third is related to comparisons of mean
scores (see (15.8)).
The cumulative logits model
Denote the polytomous response variable by Y, and a particular category of Y by j. The set of explanatory variables, x₁, x₂, …, x_p, will be denoted by the vector x. Let

$$F_j(x) = P(Y \le j, \text{ given } x)$$

and

$$L_j = \operatorname{logit} F_j(x) = \ln\left[\frac{F_j(x)}{1 - F_j(x)}\right].$$

The model is described by the equation

$$L_j(x) = \alpha_j - \beta' x, \qquad (14.9)$$

where β′x represents the usual linear function of the explanatory variables, β₁x₁ + β₂x₂ + … + β_p x_p.
This model effectively gives a logistic regression, as in (14.6), for each of the
binary variables produced by drawing boundaries between two adjacent cate-
gories. For instance, if there are four categories numbered 1 to 4, there are three binary variables representing the splits between 1 and 2–4, 1–2 and 3–4, and 1–3 and 4. Moreover, the regression coefficients β for the explanatory variables are the same for all the splits, although the intercept term α_j varies with the split.
Although a standard logistic regression could be carried out for any one of
the splits, a rather more complex analysis is needed to take account of the
interrelations between the data for different splits. Computer programs are
available (for instance, in SAS) to estimate the coefficients in the model and
their precision, either by maximum likelihood (as in SAS LOGIST) or weighted
least squares (as in SAS CATMOD), the latter being less reliable when many of
the frequencies in the original data are low.
The adjacent categories model
Here we define logits in terms of the probabilities for adjacent categories. Define

$$L_j = \ln\left(\frac{p_j}{p_{j+1}}\right),$$

where p_j is the probability of falling into the jth response category. The model is described by the equation

$$L_j = \alpha_j - \beta' x. \qquad (14.10)$$
When there are only two response categories, (14.9) and (14.10) are entirely
equivalent, and both the cumulative logits model and the adjacent categor-
ies model reduce to ordinary logistic regression. In the more general case,
with more than two categories, computer programs are available for estima-
tion of the coefficients. For example, SAS CATMOD uses weighted least
squares.
The mean response model
Suppose that scores x are assigned to the categories, as in §15.2, and denote by M(x) the mean score for individuals with explanatory variables x. The model specifies the same linear relation as in multiple regression:

$$M(x) = \alpha + \beta' x. \qquad (14.11)$$
The approach is thus a generalization of that underlying the comparison of
mean scores by (15.8) in the simple two-group case. In the general case the
regression coefficients cannot be estimated accurately by standard multiple
regression methods, because there may be large departures from normality
and disparities in variance. Nor can exact variances such as (15.5) be easily
exploited.

Choice of model
The choice between the models described briefly above, or any others, is largely
empirical: which is the most convenient to use, and which best describes the
data? There is no universally best choice. The two logistic models attempt
to describe the relative frequencies of observations in the various categories,
and their adequacy for any particular data set may be checked by compar-
ing observed and expected frequencies. The mean response model is less
searching, since it aims to describe only the mean values. It may, therefore, be a
little more flexible in fitting data, and particularly appropriate where there is a
natural underlying continuous response variate or scoring system, but
less appropriate when the fine structure of the categorical response is under
study.
Further descriptions of these models are given in Agresti (1990, Chapter 9),
and an application to repeated measures data is described in Agresti (1989). An
example relating alcohol consumption in eight ordered categories to biochemical
and haematological variables was discussed by Ashby et al. (1986). These
authors also discussed a test of goodness of fit, which is essentially an extension
of the Hosmer–Lemeshow test, and a method of allocating an individual to one of
the groups.
Example 14.2
Bishop (2000) followed up 207 patients admitted to hospital following injury and recorded
functional outcome after 3 months using a modified Glasgow Outcome Score (GOS). This
score had five ordered categories: full recovery, mild disability, moderate disability,
severe disability, and dead or vegetative state.
The relationship between functional outcome and a number of variables relating to the
patient and the injury was analysed using the cumulative logits model (14.9) of
polytomous regression. The final model included seven βs indicating the relationship between outcome and seven variables, which included age, whether the patient was transferred from a peripheral hospital, three variables representing injury severity and two interaction terms, together with four αs representing the splits between the five categories of GOS.
The model fitted well and was assessed in terms of ability to predict GOS for each
patient. For 88 patients there was exact agreement between observed and predicted GOS
scores, compared with 52·4 expected by chance if the model had no predicting ability, and there were three patients who differed by three or four categories on the GOS scale, compared with 23·3 expected. As discussed in §13.3, this is likely to be overoptimistic as
far as the ability of the model to predict the categories of future patients is concerned.
14.4 Poisson regression
Poisson distribution
The expectation of a Poisson variable is positive and so limited to the range 0 to ∞. A link function is required to transform this to the unlimited range −∞ to ∞. The usual transformation is the logarithmic transformation

$$g(m) = \ln m,$$

leading to the log-linear model

$$\ln m = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p. \qquad (14.12)$$
Example 14.3

Table 14.2 shows the number of cerebrovascular accidents experienced during a certain
period by 41 men, each of whom had recovered from a previous cerebrovascular accident
and was hypertensive. Sixteen of these men received treatment with hypotensive drugs and
25 formed a control group without such treatment. The data are shown in the form of a
frequency distribution, as the number of accidents takes only the values 0, 1, 2 and 3. This
was not a controlled trial with random allocation, but it was nevertheless useful to enquire
whether the difference in the mean numbers of accidents for the two groups was signifi-
cant, and since the age distributions of the two groups were markedly different it was
thought that an allowance for age might be important.
The data consist of 41 men, classified by three age groups and two treatment groups,
and the variable to be analysed is the number of cerebrovascular accidents, which takes
integral values. If the number of accidents is taken as having a Poisson distribution with
Table 14.2 Distribution of numbers of cerebrovascular accidents experienced by males in hypotensive-treated and control groups, subdivided by age.

                                Number of men by age (years)
                Number of       40–      50–      60–
                accidents

Control group       0            0        3        4
                    1            1        3        8
                    2            0        4        1
                    3            0        1        0
                  Total          1       11       13

Treated group       0            4        7        1
                    1            0        4        0
                  Total          4       11        1
expectation dependent on age and treatment group, then a log-linear model (14.12) would
be appropriate.
Several log-linear models have been fitted and the following analysis of deviance has
been constructed:
Analysis of deviance

Fitting             Deviance   DF   Effect        Deviance difference   DF
Constant term       40·54      40
Treatment, T        31·54      39   T (unadj.)    9·00                  1
Age, A              37·63      38   A (unadj.)    2·91                  2
T + A               28·84      37   T (adj.)      8·79                  1
                                    A (adj.)      2·70                  2
T + A + T × A       27·04      35   T × A         1·80                  2
The test of the treatment effect after allowing for age is obtained from the deviance difference after adding a treatment effect to a model already containing age; that is, 37·63 − 28·84 = 8·79 (1 DF). Similarly, the effect of age adjusted for treatment has a test statistic of 2·70 (2 DF). There is no evidence of an interaction between treatment and age, or of a main effect of age.
The log-linear model fitting just treatment is

ln m = 0·00 − 1·386 (treated group),
SE:           0·536,

giving fitted expectations of exp(0·00) = 1·00 for the control group and exp(−1·386) = 0·25 for the treated group. These values are identical with the observed values (25 accidents in 25 men in the control group and four accidents in 16 men in the treated group), although if it had proved necessary to adjust for age this would not have been so.

The deviance of 27·04 with 35 DF after fitting all effects is a measure of how well the Poisson model fits the data. However, it would not be valid to assess this deviance as an approximate χ² because of the low counts on which it is based. Note that this restriction does not apply to the tests of main effects and interactions, since these comparisons are based on amalgamated data, as illustrated above for the effect of treatment.
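The treatment-only fit in Example 14.3 can be checked with any Poisson GLM routine. The sketch below (Python with statsmodels, assumed for illustration) expands Table 14.2 into one record per man and should roughly reproduce the estimates above.

```python
import numpy as np
import statsmodels.api as sm

# Table 14.2 as (treated, age group, accidents, number of such men)
cells = [(0, 0, 1, 1),
         (0, 1, 0, 3), (0, 1, 1, 3), (0, 1, 2, 4), (0, 1, 3, 1),
         (0, 2, 0, 4), (0, 2, 1, 8), (0, 2, 2, 1),
         (1, 0, 0, 4),
         (1, 1, 0, 7), (1, 1, 1, 4),
         (1, 2, 0, 1)]
treated = np.repeat([c[0] for c in cells], [c[3] for c in cells])
age     = np.repeat([c[1] for c in cells], [c[3] for c in cells])
y       = np.repeat([c[2] for c in cells], [c[3] for c in cells])

# Log-linear model with treatment only; dummy variables built from
# `age` would give the age-adjusted fits in the analysis of deviance.
fit = sm.GLM(y, sm.add_constant(treated.astype(float)),
             family=sm.families.Poisson()).fit()
print(fit.params)  # roughly 0.00 and -1.386
print(fit.bse)     # SE of the treatment effect roughly 0.54
                   # (0.536 is quoted in the text)
```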
We conclude our discussion of Poisson regression with an example of a log-
linear model applied to Poisson counts.
Example 14.4
Table 14.3 gives data on the number of incident cases of cancer in a large group of
ex-servicemen, who had been followed up over a 20-year period. The servicemen are in
two groups according to whether they served in a combat zone (veterans) or not, and the
experience of each serviceman is classified into subject-years at risk in 5-year age groups.
The study is described in Australian Institute of Health and Welfare (1992), where the
analysis also controlled for calendar year. Each serviceman passed through several of
these groups during the period of follow-up. The study was carried out in order to assess if
there was a difference in cancer risk between veterans and non-veterans. The model used
was a variant on (14.12). If y_ij is the number of cases of cancer in group i and age group j, and N_ij is the corresponding number of subject-years, then y_ij/N_ij is the incidence rate.
Table 14.3 Number of incident cases of cancer and subject-years at risk in a group of ex-servicemen (reproduced by permission of the Australian Institute of Health and Welfare).

               Veterans                      Non-veterans
Age       Number of    Subject-years    Number of    Subject-years
          cancers                       cancers
–24            6           60 840           18           208 487
25–29         21          157 175           60           303 832
30–34         54          176 134          122           325 421
35–39        118          186 514          191           312 242
40–44         97          135 475          108           165 597
45–49         58           42 620           74            54 396
50–54         56           25 001           88            40 716
55–59         54           13 710          120            33 801
60–64         34            6 163          141            26 618
65–69          9            1 575          108            17 404
70–            2              273           99            14 146
Total        509          805 480         1129         1 502 660
The log-linear model states that the logarithm of incidence will follow a linear model on variables representing the group and age. Thus if m_ij is the expectation of y_ij, then

$$\ln m_{ij} = \ln N_{ij} + \alpha + \beta_i x_i + \gamma_j z_j, \qquad (14.13)$$

where x_i and z_j are dummy variables representing the veteran groups and the age groups, respectively (the dummy variables were defined as in §11.7, with x₁ = 1 for the veterans group, and z₁, z₂, …, z₁₀ = 1 for age groups 25–29, 30–34, …, 70–; no dummy variable was required for the non-veterans or the youngest age group as their effects are included within the coefficient α). This model differs from (14.12) in the inclusion of the first term on the right-hand side, which ensures that the number of years at risk is taken into account (see (19.38)).
The model was fitted by maximum likelihood using GLIM with ln N_ij included as an OFFSET. The estimates of the regression coefficients were:

                    Estimate    SE
α                   −9·324
β₁ (veterans)       −0·0035    0·0555
γ₁ (25–29)           0·679     0·232
γ₂ (30–34)           1·371     0·218
γ₃ (35–39)           1·940     0·212
γ₄ (40–44)           2·034     0·216
γ₅ (45–49)           2·727     0·222
γ₆ (50–54)           3·203     0·221
γ₇ (55–59)           3·716     0·218
γ₈ (60–64)           4·093     0·218
γ₉ (65–69)           4·236     0·224
γ₁₀ (70–)            4·364     0·227
The estimate of the veterans effect is not significant: Wald z = −0·0035/0·0555 = −0·06 (P = 0·95). Converting back from the log scale, the estimate of the relative risk of cancer in veterans compared with non-veterans, after controlling for age, is exp(−0·0035) = 1·00. The 95% confidence limits are exp(−0·0035 ± 1·96 × 0·0555) = 0·89 and 1·11.
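The same fit can be sketched with an offset in any GLM package; the original analysis used GLIM, and Python with statsmodels is assumed here. The age dummies below correspond to γ₁ to γ₁₀, and the veteran indicator to β₁.

```python
import numpy as np
import statsmodels.api as sm

# Table 14.3: cancers and subject-years, veterans then non-veterans
cancers = np.array([6, 21, 54, 118, 97, 58, 56, 54, 34, 9, 2,
                    18, 60, 122, 191, 108, 74, 88, 120, 141, 108, 99], float)
years = np.array([60840, 157175, 176134, 186514, 135475, 42620,
                  25001, 13710, 6163, 1575, 273,
                  208487, 303832, 325421, 312242, 165597, 54396,
                  40716, 33801, 26618, 17404, 14146], float)
veteran = np.array([1.0] * 11 + [0.0] * 11)
age = np.tile(np.arange(11), 2)
age_dummies = (age[:, None] == np.arange(1, 11)[None, :]).astype(float)

X = sm.add_constant(np.column_stack([veteran, age_dummies]))
fit = sm.GLM(cancers, X, family=sm.families.Poisson(),
             offset=np.log(years)).fit()
b1, se1 = fit.params[1], fit.bse[1]
print(b1, se1)                  # roughly -0.0035 and 0.0555
print(np.exp(b1),               # relative risk, roughly 1.00
      np.exp(b1 - 1.96 * se1),  # lower 95% limit, roughly 0.89
      np.exp(b1 + 1.96 * se1))  # upper 95% limit, roughly 1.11
```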
15 Empirical methods for categorical data
15.1 Introduction
Categorical data show the frequencies with which observations fall into various
categories or combinations of categories. Some of the basic methods of handling
this type of data have been discussed in earlier sections of the book, particularly
§§3.6, 3.7, 4.4, 4.5, 5.2, 8.5, 8.6 and 8.8. In the present chapter we gather together
a number of more advanced techniques for handling categorical data.
Many of the techniques described in these sections make use of the χ² distributions, which have been used extensively in earlier chapters. These χ² methods are, however, almost exclusively designed for significance testing. In many problems involving categorical data, the estimation of relevant parameters which describe the nature of possible associations between variables is much more important than the performance of significance tests of null hypotheses. Chapter 14 is devoted to a general approach to modelling the relationships between variables, of which some particular cases are relevant to categorical data.
It is useful at this stage to make a distinction between three different types of
classification into categories, according to the types of variable described in §2.3.

1 Nominal variables, in which no ordering is implied.
2 Ordinal variables, in which the categories assume a natural ordering although
they are not necessarily associated with a quantitative measurement.
3 Quantitative variables, in which the categories are ordered by their associa-
tion with a quantitative measurement.
It is often useful to consider both ordinal and quantitative variables as
ordered, and to distinguish particularly between nominal and ordered data. But
data can sometimes be considered from more than one point of view. For
instance, quantitative data might be regarded as merely ordinal if it seemed
important to take account of the ordering but not to rely too closely on the
specific underlying variable. Ordered data might be regarded as purely nominal if
there seemed to be differences between the effects of different categories which
were not related to their natural order. We need, therefore, methods which can
be adapted to a wide range of situations.
Many of the χ² tests introduced earlier have involved test statistics distributed as χ² on several degrees of freedom. In each instance the test was sensitive to departures from a null hypothesis, which could occur in various ways. In a 2 × k contingency table, for instance, the null hypothesis postulates equality between the expected proportions of individuals in each column which fall into the first row. There are k of these proportions, and the null hypothesis can be falsified if any one of them differs from the others. These tests may be thought of as `portmanteau' techniques, able to serve many different purposes. If, however, we were particularly interested in a certain form of departure from the null hypothesis, it might be possible to formulate a test which was particularly sensitive to this situation, although perhaps less effective than the portmanteau χ² test in detecting other forms of departure. Sometimes these specially directed tests can be achieved by subdividing the total χ² statistic into portions which follow χ² distributions on reduced numbers of degrees of freedom (DF).
The situation is very similar to that encountered in the analysis of variance,
where a sum of squares (SSq) can sometimes be subdivided into portions, on
reduced numbers of DF, which represent specific contrasts between groups (§8.4).
In §§15.2 and 15.3 we describe methods for detecting trends in the probabilities
with which observations fall into a series of ordered categories. In §15.4 a similar
method is described for a single series of counts. In §15.5 two other situations are
described, in which the x
2
statistic calculated for a contingency table is subdivided
to shed light on specific ways in which categorical variables may be associated. In
§§15.6 and 15.7 some of the methods described earlier are generalized for situa-
tions in which the data are stratified (i.e. divided into subgroups), so that trends
can be examined within strata and finally pooled. Finally, in §15.8 we discuss exact
tests for some of the situations considered in the earlier sections.
More comprehensive treatments of the analysis of categorical data are con-
tained in the monographs by Fienberg (1980), Fleiss (1981), Cox and Snell (1989)
and Agresti (1990, 1996).
15.2 Trends in proportions
Suppose that, in a 2 × k contingency table of the type discussed in §8.5, the k groups have a natural order. They may correspond to different values, or groups of values, of a quantitative variable like age; or they may correspond to qualitative categories, such as severity of a disease, which can be ordered but not readily assigned a numerical value. The usual χ²_(k−1) test is designed to detect differences between the k proportions of observations falling into the first row. More specifically, one might ask whether there is a significant trend in these proportions from group 1 to group k.
For convenience of exposition we shall assign the groups to the rows of the table, which now becomes k × 2 rather than 2 × k. Let us assign a quantitative variable, x, to the k groups. If the definition of groups uses such a variable, this can be chosen to be x. If the definition is qualitative, x can take integer values from 1 to k. The notation is as follows:
                        Frequency
Group      x        Positive    Negative      Total    Proportion positive
1          x₁       r₁          n₁ − r₁       n₁       p₁
2          x₂       r₂          n₂ − r₂       n₂       p₂
…          …        …           …             …        …
i          x_i      r_i         n_i − r_i     n_i      p_i
…          …        …           …             …        …
k          x_k      r_k         n_k − r_k     n_k      p_k
All groups
combined            R           N − R         N        P = R/N
The numerator of the χ²_(k−1) statistic, X², is, from (8.29),

$$\sum n_i (p_i - P)^2,$$

a weighted sum of squares of the p_i about the (weighted) mean P (see discussion after (8.30)). It also turns out to be a straightforward sum of squares, between groups, of a variable y taking the value 1 for each positive individual and 0 for each negative. This SSq can be divided (as in §11.1) into an SSq due to regression of y on x and an SSq due to departures from linear regression. If there is a trend of p_i with x_i, we might find the first of these two portions to be greater than would be expected by chance. Dividing this portion by PQ, the denominator of (8.29), gives us a χ²₁ statistic, X²₁, which forms part of X² and is particularly sensitive to trend.
A little algebraic manipulation (Armitage, 1955) gives

$$X_1^2 = \frac{N\left(N\sum r_i x_i - R\sum n_i x_i\right)^2}{R(N-R)\left[N\sum n_i x_i^2 - \left(\sum n_i x_i\right)^2\right]}, \qquad (15.1)$$

often referred to as the Cochran–Armitage test of trend. The difference between the two statistics,

$$X_2^2 = X^2 - X_1^2, \qquad (15.2)$$

may be regarded as a χ²_(k−2) statistic testing departures from linear regression of p_i on x_i. As usual, both of these tests are approximate, but the approximation (15.2) is likely to be adequate if only a small proportion of the expected frequencies are less than about 5. The trend test (15.1) is adequate in these conditions but also more widely, since it is based on a linear function of the frequencies, and is likely to be satisfactory provided that only a small proportion of expected frequencies are less than about 2 and that these do not occur in adjacent rows. If appropriate statistical software is available, an exact test can be constructed (§15.8).
Example 15.1
In the analysis of the data summarized in Table 15.1, it would be reasonable to ask
whether the proportion of patients accepting their general practitioner's invitation to
attend screening mammography tends to decrease as the time since their last consultation
increases. The first step is to decide on scores representing the four time-period categories.
It would be possible to use the mid-points of the time intervals, 3 months, 9 months, etc.,
but the last interval, being open, would be awkward. Instead, we shall use equally spaced integer scores, as shown in the table.
From (15.1),

$$X_1^2 = \frac{278(278 \times 49 - 86 \times 236)^2}{86 \times 192 \times (278 \times 530 - 236^2)} = \frac{1.2383 \times 10^{10}}{0.15132 \times 10^{10}} = 8.18,$$

which, as a χ²₁ variate, is highly significant (P = 0·004).
The overall χ²₃ statistic, from (8.28) or (8.30), is calculated as X² = 8·92.
The test for departures from a linear trend (15.2) thus gives X²₂ = 8·92 − 8·18 = 0·74 as a χ²₂ variate, which is clearly non-significant. There is thus a definite trend, which may well result in approximately equal decreases in the proportion of attenders as we change successively to the categories representing longer times since the last consultation.
A number of other formulae are equivalent or nearly equivalent to (15.1). The regression coefficient of y on x, measuring the rate at which the proportion p_i changes with the score x_i, is estimated by the expression
Table 15.1 Numbers of patients attending or not attending screening mammography, classified by time since last visit to the general practitioner (Irwig et al., 1990).

Time since       Score    Attendance        Total    Proportion
last visit       x        Yes      No                attending
<6 months        0        59       97       156      0·378
6–12 months      1        10       31        41      0·244
1–2 years        2        12       36        48      0·250
>2 years         3         5       28        33      0·152
                          86      192       278
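The arithmetic of Example 15.1 is easily checked by machine. A Python sketch (assumed for illustration) computing the overall χ² and its trend component (15.1) from Table 15.1:

```python
import numpy as np
from scipy.stats import chi2

x = np.array([0.0, 1, 2, 3])          # scores
r = np.array([59.0, 10, 12, 5])       # attenders
n = np.array([156.0, 41, 48, 33])     # totals
R, N = r.sum(), n.sum()

# Overall chi-squared on k - 1 DF, from (8.28); note that the
# non-attender column contributes (r - e)^2 / (n - e) to each cell
e = n * R / N
X2 = np.sum((r - e) ** 2 / e + (r - e) ** 2 / (n - e))
# Cochran-Armitage trend component (15.1), on 1 DF
num = N * (N * np.sum(r * x) - R * np.sum(n * x)) ** 2
den = R * (N - R) * (N * np.sum(n * x ** 2) - np.sum(n * x) ** 2)
X2_1 = num / den
print(X2, X2_1, X2 - X2_1)            # roughly 8.92, 8.18 and 0.74
print(chi2.sf(X2_1, 1))               # P roughly 0.004
```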
$$b = \frac{NT}{N\sum n_i x_i^2 - \left(\sum n_i x_i\right)^2}, \qquad (15.3)$$

where

$$T = \sum r_i x_i - \frac{R\sum n_i x_i}{N} = \sum x_i (r_i - e_i), \qquad (15.4)$$

the cross-product of the scores x_i and the discrepancies r_i − e_i in the contingency table between the frequencies in the first column (i.e. of positives) and their expected values from the margins of the table. Here r_i and e_i correspond to the O and E of (8.28), e_i being calculated as Rn_i/N. On the null hypothesis of no association between rows and columns, for fixed values of the marginal totals, the exact variance of T is

$$\operatorname{var}(T) = \frac{R(N-R)\left[N\sum n_i x_i^2 - \left(\sum n_i x_i\right)^2\right]}{N^2(N-1)}. \qquad (15.5)$$
A χ²₁ test for the trend in proportions is therefore provided by the statistic

$$X_{1a}^2 = \frac{T^2}{\operatorname{var}(T)} = \frac{(N-1)\left[\sum x_i(r_i - e_i)\right]^2}{(N-R)\left[\sum e_i x_i^2 - \left(\sum e_i x_i\right)^2/R\right]}. \qquad (15.6)$$

In fact, X²₁ₐ = (N − 1)X²₁/N, so the two tests are very nearly equivalent. The distinction is unimportant in most analyses, when N is fairly large, but (15.6) should be used when N is rather small. In particular, it is preferable in situations to be considered in §15.7, where data are subdivided into strata, some of which may be small.
If the null hypothesis is untrue, (15.5) overestimates var(T), since it makes use of the total variation of y rather than the variation about regression on x. For the regression of a binary variable y on x, an analysis of variance could be calculated, as in Table 11.1. In this analysis the sum of squares about regression turns out to be R(N − R)(N − X²₁)/N², and, using (7.16), var(b) may be calculated as

$$\operatorname{var}(b) = \frac{R(N-R)(N - X_1^2)}{N(N-2)\left[N\sum n_i x_i^2 - \left(\sum n_i x_i\right)^2\right]}. \qquad (15.7)$$

For the calculation of confidence limits for the slope, therefore, the formula after (7.18) may be used, with the percentile of the t distribution on N − 2 DF (which in most applications will be close to the standardized normal value), and with SE(b) = √var(b).
By analogy with the situation for simple regression (see the paragraph after
(7.19)), the test for association based on the regression of y on x, as in (15.1) and
(15.6), should give the same significance level as that based on the regression of x
on y. Since y is a binary variable, the latter regression is essentially determined by
the difference between the mean values of x at the two levels of y. In many
problems, particularly where y is clearly the dependent variable, this difference is
of no interest. In other situations, for example when the columns of the table represent different treatments and the rows are ordered categories of a response to treatment, this is a natural way of approaching the data. The standard
method for comparing two means is, of course, the two-sample t test. The
method now under discussion provides an alternative, which may be preferable
for categorical responses since the data are usually far from normal.
The difference between the mean scores for the positive and negative responses is

$$d = \frac{NT}{R(N-R)}, \qquad (15.8)$$

where T is given by (15.4). Since d²/var(d) = X²₁ₐ, as given by (15.6), it can easily be checked that

$$\operatorname{var}(d) = s_x^2\left(\frac{1}{R} + \frac{1}{N-R}\right), \qquad (15.9)$$

where s²ₓ is the variance of x, given by

$$s_x^2 = \frac{\sum n_i x_i^2 - \left(\sum n_i x_i\right)^2/N}{N-1}. \qquad (15.10)$$

Note that (15.10) is the variance of x for the complete data, not the variance within the separate columns. If the null hypothesis is not true, (15.9) will overestimate the variance of d, and confidence limits for the difference in means calculated from (15.9) will tend to be somewhat too wide.
The test for the difference in means described above is closely related to the Wilcoxon and Mann–Whitney distribution-free tests described in §10.3.
In the previous chapter it was noted that logistic regression is a powerful

method of analysing dichotomous data. Logistic regression can be used to test
for a trend in proportions by fitting a model on x. If this is done, then one of the
test statistics, the score statistic (p. 490), is identical to (15.1).
Example 15.1, continued
Applying (15.3) and (15.7) to the data of Table 15.1 gives:

b = −0·0728, SE(b) = 0·0252, with 95% confidence limits (−0·122, −0·023).

Note that b²/var(b) = 8·37, a little higher than X²₁, as would be expected.
Fitting a logistic regression gives a regression coefficient on x of:

b′ = −0·375, SE(b′) = 0·133, with 95% confidence limits (−0·637, −0·114).

Of course, b and b′ are different because the former is a regression of the proportion and the latter of the logit transform of the proportion. Interpretation of b′ is facilitated by taking the exponential, which gives a reduction in the odds of attending by a factor of 0·69 (95% confidence interval 0·53 to 0·89) per category of time since the last consultation.
The χ² test statistics of the trend are 8·67 for the deviance test, 7·90 for Wald's test, and 8·18 for the score test, this last value being identical to the value obtained earlier from (15.1).
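Both parts of this continuation can be verified in the same way (a Python sketch, assumed as before): the slope b with its variance from (15.3), (15.4) and (15.7), and the logistic coefficient b′ from a grouped binomial fit.

```python
import numpy as np
import statsmodels.api as sm

x = np.array([0.0, 1, 2, 3])
r = np.array([59.0, 10, 12, 5])
n = np.array([156.0, 41, 48, 33])
R, N = r.sum(), n.sum()

T = np.sum(r * x) - R * np.sum(n * x) / N               # (15.4)
Sxx = N * np.sum(n * x ** 2) - np.sum(n * x) ** 2
b = N * T / Sxx                                         # (15.3)
X2_1 = N ** 3 * T ** 2 / (R * (N - R) * Sxx)            # (15.1) rewritten
var_b = R * (N - R) * (N - X2_1) / (N * (N - 2) * Sxx)  # (15.7)
print(b, np.sqrt(var_b))            # roughly -0.0728 and 0.0252

# Logistic regression of the grouped proportions on x
fit = sm.GLM(np.column_stack([r, n - r]), sm.add_constant(x),
             family=sm.families.Binomial()).fit()
print(fit.params[1], fit.bse[1])    # roughly -0.375 and 0.133
print(np.exp(fit.params[1]))        # odds ratio per category, roughly 0.69
```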
15.3 Trends in larger contingency tables
Tests for trend can also be applied to contingency tables larger than the k × 2 table considered in §15.2. The extension to more than two columns of frequencies gives rise to two possibilities: the columns may be nominal (i.e. unordered) or ordered. In the first case, we might wish to test for differences in the mean row scores between the different columns; this would be an alternative to the one-way analysis of variance, just as the χ² test based on (15.8) and (15.9) is an alternative to the two-sample t test. In the second case, of ordered column categories, the problem might be to test the regression of one set of scores on the other, or equivalently the correlation between the row and column scores. Both situations are illustrated by an example, the methods of analysis following closely those described by Yates (1948).
Example 15.2
Sixty-six mothers who had suffered the death of a newborn baby were studied to assess the
relationship between their state of grief and degree of support (Tudehope et al., 1986).
Grief was recorded on a qualitative ordered scale with four categories and degree of
support on an ordered scale with three categories (Table 15.2). The overall test statistic (8.28) is 9·96 (6 DF), which is clearly not significant. Nevertheless, examination of the contingency table suggests that those with good support experienced less grief than those with poor support, whilst those with adequate support were intermediate, and that this effect is being missed by the overall test. The aim of the trend test is to produce a more sensitive test on this specific aspect.
We first ignore the ordering of the columns, regarding them as three different categories of a nominal variable. The calculations proceed as follows.
1 Assign scores to rows (x) and columns (y): integer values starting from 1 have been used. Denote the row totals by R_i, i = 1 to r, the column totals by C_j, j = 1 to c, and the total number of subjects by N.
2 For each column calculate the sum of the row scores, X_j, and the mean row score x̄_j. For the first column,

X₁ = (17 × 1) + (6 × 2) + (3 × 3) + (1 × 4) = 42,
x̄₁ = 42/27 = 1·56.

This calculation is also carried out for the column of row totals to give 126, which serves as a check on the values of X_j, which sum over columns to this value. This total is the sum of the row scores for all the mothers, i.e. Σx = 126.
Table 15.2 Numbers of mothers by state of grief and degree of support (data of Tudehope et al., 1986).

Grief      Row score    Support                      Total    Sum of column
state      x_i          Good   Adequate   Poor       R_i      scores, Y_i
I          1            17     9          8          34       59
II         2             6     5          1          12       19
III        3             3     5          4          12       25
IV         4             1     2          5           8       20
Total, C_j              27     21         18         66       123
Col. score, y_j          1      2          3
Sum of row scores, X_j  42     42         42         126
Mean score, x̄_j        1·56   2·00       2·33
3 Calculate the sum of squares of row scores for all the mothers,

Σx² = (34 × 1²) + (12 × 2²) + (12 × 3²) + (8 × 4²) = 318.

Correct this sum of squares for the mean:

S_xx = 318 − 126²/66 = 77·455.

4 A test of the equality of the mean scores x̄_j may now be carried out. The test statistic is

$$X^2 = (N-1)\left[\sum X_j^2/C_j - \left(\sum x\right)^2/N\right]\Big/S_{xx} = \frac{65}{77.455}\left(\frac{42^2}{27} + \frac{42^2}{21} + \frac{42^2}{18} - \frac{126^2}{66}\right) = 5.70. \qquad (15.11)$$

This may be regarded as a χ²_(c−1), i.e. a χ²₂ statistic. This is not quite significant at the 5% level (P = 0·058), but is sufficiently close to allow the possibility that a test for the apparent trend in the column means, x̄_j, may be significant. We now, therefore, make use of the column scores, y_j.
5 Repeat steps 2 and 3, working across rows instead of down columns; it is not necessary to calculate the mean scores:

Σy = 123, Σy² = 273, S_yy = 273 − 123²/66 = 43·773.

6 Calculate the sum of products of the sums of row scores, X_j, and the corresponding column scores, y_j:

ΣX_j y_j = (42 × 1) + (42 × 2) + (42 × 3) = 252.

This total is Σxy over all mothers; correcting for the means,

S_xy = 252 − (123 × 126)/66 = 17·182.
7 The test statistic for the trend in the mean scores, x̄_j, is

$$X^2 = (N-1)S_{xy}^2/(S_{xx}S_{yy}) = 65 \times 17.182^2/(77.455 \times 43.773) = 5.66. \qquad (15.12)$$

This is approximately a χ²₁ statistic and is significant (P = 0·017). In this example most of the difference between column means lies in the trend. This is clear from examination of the column means, and the corresponding result for the test statistics is that the overall test statistic of 5·70 (2 DF) for equality of column means may be subdivided into 5·66 (1 DF) for linear trend and, by subtraction, 0·04 (1 DF) for departures from the trend.
Note that the test statistic (15.12) is (N − 1) times the square of the correlation coefficient between the row and column scores, and this may be a convenient way of calculating it on a computer. When r = 2, (15.11) tests the equality of c proportions, and is identical with (8.30) except for a multiplying factor of (N − 1)/N; (15.12) tests the trend in the proportions and is identical with (15.6). Both (15.11) and (15.12) are included in the SAS program PROC FREQ, the former as the `ANOVA statistic' and the latter as the `Mantel–Haenszel chi-square'.
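The whole of Example 15.2 reduces to a few array operations. The sketch below (Python, assumed for illustration) computes (15.11) and (15.12) from the frequencies of Table 15.2:

```python
import numpy as np

# Table 15.2: rows = grief states I-IV, columns = good/adequate/poor support
f = np.array([[17, 9, 8],
              [ 6, 5, 1],
              [ 3, 5, 4],
              [ 1, 2, 5]], dtype=float)
x = np.array([1.0, 2, 3, 4])   # row scores
y = np.array([1.0, 2, 3])      # column scores
N = f.sum()
Ri, Cj = f.sum(axis=1), f.sum(axis=0)

Sxx = np.sum(Ri * x ** 2) - np.sum(Ri * x) ** 2 / N
Syy = np.sum(Cj * y ** 2) - np.sum(Cj * y) ** 2 / N
Xj = f.T @ x                   # sum of row scores within each column

# Equality of mean row scores across columns, (15.11)
X2_cols = (N - 1) * (np.sum(Xj ** 2 / Cj) - np.sum(Ri * x) ** 2 / N) / Sxx
# Linear trend of mean row scores on the column scores, (15.12)
Sxy = np.sum(Xj * y) - np.sum(Ri * x) * np.sum(Cj * y) / N
X2_trend = (N - 1) * Sxy ** 2 / (Sxx * Syy)
print(X2_cols, X2_trend)       # roughly 5.70 and 5.66
```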
15.4 Trends in counts
In many problems of the type considered in §15.3, the proportions under consideration are very small. If, for instance, the frequencies in the `positive' column of the table on p. 505 are much smaller than those in the `negative' column, almost all the contribution to the test statistic comes from the positives. In the limit, as the p_i become very small and the n_i become very large, both (15.1) and (15.6) take the form

$$X_{1P}^2 = \frac{\left[\sum x_i(r_i - e_i)\right]^2}{\sum e_i x_i^2 - \left(\sum e_i x_i\right)^2/R}. \qquad (15.13)$$

The subscript P is used for this test statistic because in this limiting situation the observed frequencies r_i can be regarded as Poisson variates with means (under the null hypothesis) e_i, the expected values. In some applications the expected values are proportional to subject-years of observation for individuals in each category (see §19.7).
Example 15.3
Table 15.3 shows the observed deaths and the person-years of observation, weighted by time since exposure. If there were no association between the death
Table 15.3 Mortality due to pleural mesothelioma in asbestos factory workers according to an ordered category of amount of exposure (Berry et al., 2000).

Category of       Observed        Person-years,    Expected,
exposure, x_i     deaths, r_i     P_i              e_i
1                 11              23 522           15·482
2                 18              34 269           22·556
3                  7               7 075            4·657
4                 16              14 138            9·306
Totals            52              79 004           52·001
rate and category of exposure, then the expected frequencies of deaths would be in proportion to the person-years of observation in the different exposure categories, so as to add to the observed total of 52. The total χ²₃ statistic, calculated as Σ(r_i − e_i)²/e_i, is 8·21, which is significant (P = 0·042). There is a suggestion that the association consists of a deficit of deaths in the low-exposure categories and an excess with the higher exposures. Application of (15.13) gives X²₁P = 7·27 (P = 0·007). There is clearly evidence for a gradual increase in the rate of deaths due to pleural mesothelioma with increasing exposure in this study.
In Example 15.3, the observed number of deaths is a Poisson variable and,
therefore, the method of Poisson regression (§14.4) may be applied, but with an
additional term in (14.12) to incorporate the fact that the expected number of
deaths is proportional to the number of person-years of observation modified by
the regression model. The rationale is similar to that used in Example 14.4
leading to (14.13).

Example 15.3, continued
Poisson regression models (§14.4) have been fitted to the data of Table 15.3. The offset was the logarithm of the number of years of observation. The first model fitted was a null model just containing an intercept, and this gave a deviance of 7·41 (3 DF). Then the score for category of exposure, x, was added, to give a deviance of 0·59 (2 DF). Thus, the deviance test of the trend of mortality with category of exposure was 6·82 which, as an approximate χ²₁, gives P = 0·009. The regression coefficient of x was 0·3295 with standard error 0·1240, giving a Wald χ² of 7·06 (P = 0·008). These test statistics and significance levels are in reasonable agreement with the value of 7·27 (P = 0·007) found for the trend test.
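Both forms of the trend analysis in Example 15.3 can be sketched as follows (Python assumed; the offset device mirrors the analysis described above):

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

x = np.array([1.0, 2, 3, 4])                   # exposure scores
r = np.array([11.0, 18, 7, 16])                # observed deaths
pyears = np.array([23522.0, 34269, 7075, 14138])
R = r.sum()
e = pyears * R / pyears.sum()                  # expected deaths

# Poisson trend test (15.13)
X2_1P = np.sum(x * (r - e)) ** 2 / (np.sum(e * x ** 2)
                                    - np.sum(e * x) ** 2 / R)
print(X2_1P, chi2.sf(X2_1P, 1))    # roughly 7.27, P roughly 0.007

# The same trend by Poisson regression with a person-years offset
fit = sm.GLM(r, sm.add_constant(x), family=sm.families.Poisson(),
             offset=np.log(pyears)).fit()
print(fit.params[1], fit.bse[1])   # roughly 0.3295 and 0.1240
```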
15.5 Other components of χ²
Most of the χ² statistics described earlier in this chapter can be regarded as components of the total χ² statistic for a contingency table. Two further examples of the subdivision of χ² statistics are given below.