Modeling Hydrologic Change: Statistical Methods - Chapter 8

Detection of Change in Moments

8.1 INTRODUCTION

Hydrologic variables depend on many factors associated with the meteorological
and hydrologic processes that govern their behavior. The measured values are often
treated as random variables, and the probability of occurrence of all random variables
is described by the underlying probability distribution, which includes both the
parameters and the function. Change in either the meteorological or the hydrologic
processes can induce change in the underlying population through either a change
in the probability function or the parameters of the function.
A sequential series of hydrologic data that has been affected by watershed change
is considered nonstationary. Some statistical methods are most sensitive to changes
in the moments, most noticeably the mean and variance; other statistical methods
are more sensitive to change in the distribution of the hydrologic variable. Selecting
a statistical test that is most sensitive to detecting a difference in means when the
change in the hydrologic process primarily caused a change in distribution may lead
to the conclusion that hydrologic change did not occur. It is important to hypothesize
the most likely effect of the hydrologic change on the measured hydrologic data so
that the most appropriate statistical test can be selected. Where the nature of the
change to the hydrologic data is uncertain, it may be prudent to subject the data to
tests that are sensitive to different types of change to decide whether the change in
the hydrologic processes caused a noticeable change in the measured data.
In this chapter, tests that are sensitive to changes in moments are introduced. In
the next chapter, tests that are appropriate for detecting change in the underlying
probability function are introduced. All of the tests follow the same six steps of
hypothesis testing (see Chapter 3), but because of differences in their sensitivities,
the user should be discriminating in the selection of a test.


8.2 GRAPHICAL ANALYSIS

Graphical analyses are quite useful as the starting point for detecting change and
identifying the nature of any change uncovered. Changes in central tendency can be
seen from a shift of a univariate histogram or the movement up or down of a
frequency curve. For a periodic or cyclical hydrologic process, a graphical analysis
might reveal a shift in the mean value without any change in the amplitude or phase
angle of the periodic function.
L1600_Frame_C08.fm Page 173 Tuesday, September 24, 2002 3:24 PM
© 2003 by CRC Press LLC

A change in the variance of a hydrologic variable will be characterized by a
change in spread of the data. For example, an increase in variance due to hydrologic
change would appear in the elongation of the tails of a histogram. A change in slope
of a frequency curve would also suggest a change in variance. An increase or decrease
in the amplitude of a periodic function would be a graphical indication of a change
in variance of the random variable.
Once a change has been detected graphically, statistical tests can be used to
confirm the change. While the test does not provide a model of the effect of change,
it does provide a theoretical justification for modeling the change. Detection is the
first step, justification or confirmation the second step, and modeling the change is
the final step. This chapter and Chapter 9 concentrate on tests that can be used to
statistically confirm the existence of change.

8.3 THE SIGN TEST

The sign test can be applied in cases that involve two related samples. The name of
the test implies that the random variable is quantified by signs rather than a numerical
value. It is a useful test when the criterion cannot be accurately evaluated but can
be accurately ranked as being above or below some standard, often implied to mean
the central tendency. Measurement is on the ordinal scale, with only three possible
outcomes: above the standard, below the standard, or equal to the standard. Symbols
such as +, −, and 0 are often used to reflect these three possibilities.
The data consist of two random variables or a single criterion in which the
criterion is evaluated for two conditions, such as before treatment versus after
treatment. A classical example of the use of the sign test would be a poll in which
voters would have two options: voting for or against passage of proposed legislation.
The first part of the experiment would be to record each voter's opinion on the
proposed legislation, that is, for or against, + or −. The treatment would be to have
those in the random sample of voters read a piece of literature that discusses the
issue. Then they would be asked their opinion a second time, that is, for or against.
Of interest to the pollster is whether the literature, which may possibly be biased
toward one decision, can influence the opinion of voters. This is a before–after
comparison. Since the before-and-after responses of the individuals in the sample are
associated with the individual, they are paired. The sign test would be the
appropriate test for analyzing the data.
The hypotheses can be expressed in several ways, with the appropriate alternative
pair of hypotheses selected depending on the specific situation. One expression of
the null hypothesis is
$$H_0: P(X > Y) = P(X < Y) = 0.5 \tag{8.1a}$$
in which X and Y are the criterion values for the two conditions. In the polling case,
X would be the pre-treatment response (i.e., for or against) and Y would be the
post-treatment response (i.e., for or against). If the treatment does not have a
significant effect, then the changes in one direction should be balanced by changes
in the other direction. If the criterion did not change, then the values of X and Y
are equal, which is denoted as a tie. Other ways of expressing the null hypothesis are
$$H_0: P(+) = P(-) \tag{8.1b}$$
or
$$H_0: E(X) = E(Y) \tag{8.1c}$$
where + indicates a change in one direction while − indicates a change in the other
direction, and the hypothesis expressed in terms of expectation suggests that the two
conditions have the same central tendency.
Both one- and two-tailed alternative hypotheses can be formulated:
$$H_A: P(X > Y) > P(X < Y) \quad \text{or} \quad P(+) > P(-) \tag{8.2a}$$
$$H_A: P(X > Y) < P(X < Y) \quad \text{or} \quad P(+) < P(-) \tag{8.2b}$$
$$H_A: P(X > Y) \neq P(X < Y) \quad \text{or} \quad P(+) \neq P(-) \tag{8.2c}$$
The alternative hypothesis selected would depend on the intent of the analysis.
To conduct the test, a sample of N is collected and measurements on both X and Y
made, that is, pre-test and post-test. Given that two conditions are possible, X and
Y, for each test, the analysis can be viewed as the following matrix:

                 Post-test
Pre-test       X          Y
   X         N_11       N_12
   Y         N_21       N_22

Only the responses where a change was made are of interest. The treatment would
have an effect for cells (X, Y) and (Y, X) but not for cells (X, X) and (Y, Y). Thus, the
number of responses $N_{12}$ and $N_{21}$ are pertinent while the number of responses
$N_{11}$ and $N_{22}$ are considered "ties" and are not pertinent to the calculation of the
sample test statistic. If $N = N_{11} + N_{22} + N_{12} + N_{21}$ but only the latter two are of
interest, then the sample size of those affected is $n = N_{12} + N_{21}$, where n ≤ N. The
value of n, not N, is used to obtain the critical value.
The critical test statistic depends on the sample size. For small samples of about
20 to 25 or less, the critical value can be obtained directly from the binomial
distribution. For large samples (i.e., 20 to 25 or more), a normal approximation is
applied. The critical value also depends on the alternative hypothesis: one-tailed
lower, one-tailed upper, and two-tailed.
For small samples, the critical value is obtained from the cumulative binomial
distribution for a probability of p = 0.5. For any value T and sample size n, the
cumulative distribution F(T) is:
$$F(T) = \sum_{i=0}^{T} \binom{n}{i} p^i (1 - p)^{n-i} = \sum_{i=0}^{T} \binom{n}{i} (0.5)^n \tag{8.3}$$
When the alternative hypothesis suggests a one-tailed lower test, the critical value
is the largest value of T for which α ≥ F(T). For a one-tailed upper test, the critical
value is the value of T for which α ≥ F(n − T). For a two-tailed test, the lower and
upper critical values are the values of T for which α/2 ≥ F(T) and α/2 ≥ F(n − T).
Consider the case where n = 13. Table 8.1 includes the mass and cumulative
mass functions as a function of T. For a 1% level of significance, the critical value
for a one-tailed lower test would be 1 since F(2) is greater than α. For a 5% level
of significance, the critical value would be 3 since F(4) is greater than α. For a
one-tailed upper test, the critical values for 1% and 5% are 12 and 10. For a two-tailed
test, the critical values for a 1% level of significance are 1 and 12, which means the
null hypothesis is rejected if the computed value is equal to 0, 1, 12, or 13. For a
5% level of significance the critical values would be 2 and 11, which means that
the null hypothesis is rejected if the computed value is equal to 0, 1, 2, 11, 12, or
13. Even though α is set at 5%, the actual rejection probability is 2(0.01123) =
0.02246, that is, 2.2%, rather than 5%. Table 8.2 gives critical values computed using
the binomial distribution for sample sizes of 10 to 50.

TABLE 8.1
Binomial Probabilities for Sign Test with n = 13; f(T) = Mass Function and
F(T) = Cumulative Mass Function

 T     f(T)       F(T)
 0    0.00012    0.00012
 1    0.00159    0.00171
 2    0.00952    0.01123
 3    0.03491    0.04614
 4    0.08728    0.13342
 5    0.15711    0.29053
 6    0.20947    0.50000
 7    0.20947    0.70947
 8    0.15711    0.86658
 9    0.08728    0.95386
10    0.03491    0.98877
11    0.00952    0.99829
12    0.00159    0.99988
13    0.00012    1.00000

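The entries of Table 8.1 and the small-sample critical values can be reproduced with a few lines of Python. This is only a minimal sketch of Equation 8.3 with p = 0.5; the function names `sign_test_table` and `lower_critical` are illustrative choices, not part of the original text.

```python
from math import comb

def sign_test_table(n):
    """Rows (T, f(T), F(T)) of the binomial distribution with p = 0.5."""
    rows, cumulative = [], 0.0
    for T in range(n + 1):
        mass = comb(n, T) * 0.5 ** n      # f(T), one term of Equation 8.3
        cumulative += mass                # F(T), running sum of the terms
        rows.append((T, mass, cumulative))
    return rows

def lower_critical(n, alpha):
    """Largest T with F(T) <= alpha, or None if even F(0) exceeds alpha."""
    crit = None
    for T, _, F in sign_test_table(n):
        if F <= alpha:
            crit = T
    return crit

# Reproduce the n = 13 case discussed in the text.
n = 13
for T, f, F in sign_test_table(n):
    print(f"{T:2d}  {f:.5f}  {F:.5f}")
for alpha in (0.01, 0.05):
    lo = lower_critical(n, alpha)
    # By symmetry of the p = 0.5 binomial, the upper value is n minus the lower one.
    print(f"alpha = {alpha}: lower = {lo}, upper = {n - lo}")
```

For a two-tailed test, the same function can be called with α/2, for example `lower_critical(13, 0.025)`.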
For large sample sizes, the binomial mass function can be approximated with a
normal distribution. The mean and standard deviation of the binomial distribution are
$$\mu = np \tag{8.4}$$
$$\sigma = [np(1 - p)]^{0.5} \tag{8.5}$$
For a probability of 0.5, these reduce to $\mu = 0.5n$ and $\sigma = 0.5n^{0.5}$. Using these
with the standard normal deviate gives:
$$z = \frac{x - \mu}{\sigma} = \frac{x - 0.5n}{0.5n^{0.5}} \tag{8.6}$$
Equation 8.6 gives a slightly biased estimate of the true probability. A better estimate
can be obtained by applying a continuity correction to x, such that z is
$$z = \frac{(x + 0.5) - 0.5n}{0.5n^{0.5}} \tag{8.7}$$
Equation 8.7 can be rearranged to solve for the random variable x, which is the
upper critical value for any level of significance:
$$x = 0.5n + 0.5zn^{0.5} - 0.5 = 0.5(n + zn^{0.5} - 1) \tag{8.8}$$

TABLE 8.2
Critical Values for the Sign Tests (a)

 n   X.01  X.05      n   X.01  X.05      n   X.01  X.05      n   X.01  X.05
10   1(b)  2(b)     20   4     6(b)     30   8     10       40   12    14
11   1     2        21   4     6        31   8     10       41   12    15(b)
12   1     2        22   5     6        32   9     11(b)    42   13    15
13   2(b)  3        23   5     7        33   9     11       43   13    15
14   2     3        24   6(b)  7        34   10(b) 11       44   14(b) 16
15   3(b)  4(b)     25   6     8(b)     35   10    12       45   14    16
16   3(b)  4        26   6     8        36   10    12       46   14    17(b)
17   3     4        27   7     8        37   11    13       47   15    17
18   3     5        28   7     9        38   11    13       48   15    18(b)
19   4     5        29   8(b)  9        39   12(b) 14(b)    49   16(b) 18
                                                            50   16    18

(a) For the upper tail critical values for probabilities of 5% and 1%, use
$X_{u.05} = n - X_{.05}$ and $X_{u.01} = n - X_{.01}$.
(b) To obtain a conservative estimate, reduce by 1. This entry has a rejection
probability slightly larger than that indicated, but the probability of the lower
value is much less than the indicated rejection probability. For example, for n = 13
and a 1% rejection probability, X = 1 has a probability of 0.0017 and X = 2 has a
probability of 0.0112.

Enter Equation 8.8 with the standard normal deviate z corresponding to the rejection
probability to compute the critical value. Equation 8.8 yields the upper bound. For
the one-sided lower test, use −z. Values obtained with this normal transformation
are approximate. It would be necessary to round up or round down the value obtained

with Equation 8.8. The rounded value can then be used with Equation 8.7 to estimate
the actual rejection probability.
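The large-sample procedure can be sketched in Python as follows. The sketch assumes the forms of Equations 8.7 and 8.8 given above, and uses the standard library's `NormalDist` for the normal deviate and cumulative probability; the function names `normal_critical` and `rejection_probability` are hypothetical.

```python
from math import sqrt, floor
from statistics import NormalDist

def normal_critical(n, z):
    """Equation 8.8: x = 0.5 * (n + z * n**0.5 - 1); use a negative z for the lower tail."""
    return 0.5 * (n + z * sqrt(n) - 1.0)

def rejection_probability(x, n):
    """Equation 8.7 (continuity-corrected z) converted to a lower-tail probability."""
    z = ((x + 0.5) - 0.5 * n) / (0.5 * sqrt(n))
    return NormalDist().cdf(z)

n = 46
for alpha in (0.01, 0.05):
    z = NormalDist().inv_cdf(alpha)       # negative deviate for a one-sided lower test
    x = normal_critical(n, z)
    x_low = floor(x)                      # rounding down keeps the probability at or below alpha
    print(f"alpha = {alpha}: x = {x:.1f}, rounded = {x_low}, "
          f"approximate rejection probability = {rejection_probability(x_low, n):.4f}")
```

Rounding down is the conservative choice here; rounding up would give a rejection probability somewhat larger than the nominal level.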
Example 8.1
Mulch is applied to exposed soil surfaces at 63 sites, with each slope divided into
two equal sections. The density of mulch applied to the two sections is different
(i.e., high and low). The extent of the erosion was qualitatively assessed during storm
events with the extent rated as high or low. The sign test is used to test the following
hypotheses:
H
0
: The density of mulch does not affect the amount of eroded soil.
H
A
: Sites with the higher density of mulch experienced less erosion.
The statement of the alternative hypothesis dictates the use of a one-sided test.
Of the 63 sites, 46 showed a difference in eroded soil between the two sections,
with 17 sites not showing a difference. Of the 46 sites, the section with higher mulch
density showed the lower amount of eroded material on 41 sites. On 5 sites, the
section with higher mulch density showed the higher amount of erosion. Therefore,
the test statistic is 5. For a sample size of 46, the critical values for 1% and 5%
levels of significance are 14 and 17, respectively (see Table 8.2). Therefore, the null
hypothesis should be rejected. To use the normal approximation, the one-sided 1%
and 5% values of z are −2.327 and −1.645, which yield critical values of 14.6 and
16.9, respectively. These normal approximations are close to the actual binomial
values of 14 and 17.
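The tally used in Example 8.1 can be illustrated with a short Python sketch. The coding of the outcomes is an assumption made for illustration: '+' marks a site where the high-density section showed more erosion, '-' a site where it showed less, and '0' a tie. The function name `sign_test_lower` is hypothetical.

```python
from math import comb

def sign_test_lower(signs):
    """One-sided lower sign test. Returns (n, T, p) with ties discarded."""
    plus = sum(1 for s in signs if s == '+')
    minus = sum(1 for s in signs if s == '-')
    n = plus + minus                      # ties ('0') do not count toward n
    T = plus                              # test statistic: count of the rarer sign under H_A
    p = sum(comb(n, i) for i in range(T + 1)) * 0.5 ** n   # F(T) from Equation 8.3
    return n, T, p

# 63 sites: 41 with less erosion under high mulch density, 5 with more, 17 ties.
signs = ['-'] * 41 + ['+'] * 5 + ['0'] * 17
n, T, p = sign_test_lower(signs)
print(n, T, p)   # n = 46 and T = 5, as in the example
```

The exact binomial probability of 5 or fewer '+' outcomes in 46 is far below 1%, which agrees with the decision to reject the null hypothesis.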
8.4 TWO-SAMPLE t-TEST
In some cases, samples are obtained from two different populations, and it is of
interest to determine if the population means are equal. For example, two laboratories
may advertise that they evaluate water quality samples of some pollutant with an
accuracy of ±0.1 mg/L; samples may be used to test whether the means of the two

populations are equal. Similarly, tests could be conducted on engineering products
to determine whether the means are equal. The fraction of downtime for two com-
puter types could be tested to decide whether the mean times differ.
A number of tests can be used to test a pair of means. The method presented
here should be used to test the means of two independent samples. This test is
frequently of interest in engineering research when the investigator is interested in
comparing an experimental group to a control group. For example, an environmental
engineer might be interested in comparing the mean growth rates of microorganisms
in a polluted and natural environment. The procedure presented in this section can
be used to make the test.
Step 1: Formulate hypotheses. If the means of two populations are denoted as
$\mu_1$ and $\mu_2$, the null hypothesis for a test on two independent means would be:
H_0: The means of two populations are equal. (8.9a)
Mathematically, this is
$$H_0: \mu_1 = \mu_2 \tag{8.9b}$$
Both one-sided and two-sided alternatives can be used:
$$H_{A1}: \mu_1 < \mu_2 \tag{8.10a}$$
$$H_{A2}: \mu_1 > \mu_2 \tag{8.10b}$$
$$H_{A3}: \mu_1 \neq \mu_2 \tag{8.10c}$$
The selection of the alternative hypotheses should depend on the statement
of the problem.
Step 2: Select the appropriate model. For the case of two independent samples,
the hypotheses of step 1 can be tested using the following test statistic:
$$t = \frac{\bar{X}_1 - \bar{X}_2}{S_p \left( \frac{1}{n_1} + \frac{1}{n_2} \right)^{0.5}} \tag{8.11}$$
in which $\bar{X}_1$ and $\bar{X}_2$ are the means of the samples drawn from populations
1 and 2, respectively; $n_1$ and $n_2$ are the sample sizes used to compute $\bar{X}_1$ and
$\bar{X}_2$, respectively; t is the value of a random variable that has a t distribution
with degrees of freedom (ν) of $\nu = n_1 + n_2 - 2$; and $S_p$ is the square root of
the pooled variance that is given by
$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} \tag{8.12}$$
in which $S_1^2$ and $S_2^2$ are the variances of the samples from population 1 and 2,
respectively. This test statistic assumes that the variances of the two
populations are equal, but unknown.
Step 3: Select the level of significance. As usual, the level of significance
should be selected on the basis of the problem. However, values of either
5% or 1% are used most frequently.
Step 4: Compute an estimate of the test statistic. Samples are drawn from the two
populations, and the sample means and variances computed. Equation 8.11
can be computed to test the null hypothesis of Equation 8.9.
Step 5: Define the region of rejection. The region of rejection is a function of
the degrees of freedom ($\nu = n_1 + n_2 - 2$), the level of significance (α), and
the statement of the alternative hypothesis. The regions of rejection for the
alternative hypotheses are as follows:

If H_A is          then reject H_0 if
µ_1 < µ_2          t < −t_α
µ_1 > µ_2          t > t_α
µ_1 ≠ µ_2          t < −t_α/2 or t > t_α/2

Step 6: Select the appropriate hypothesis. The sample estimate of the t statistic
from step 4 can be compared with the critical value (see Appendix
Table A.2), which is based on either $t_\alpha$ or $t_{\alpha/2}$ obtained from step 5. If the
sample value lies in the region of rejection, then the null hypothesis should
be rejected.
Example 8.2
A study was made to measure the effect of suburban development on total nitrogen
levels in small streams. A decision was made to use the mean concentrations before
and after the development as the criterion. Eleven measurements of the total nitrogen
(mg/L) were taken prior to the development, with a mean of 0.78 mg/L and a standard
deviation of 0.36 mg/L. Fourteen measurements were taken after the development,
with a mean of 1.37 mg/L and a standard deviation of 0.87 mg/L. The data are used
to test the null hypothesis that the population means are equal against the alternative
hypothesis that the urban development increased total nitrogen levels, which requires
the following one-tailed test:
$$H_0: \mu_b = \mu_a \tag{8.13}$$
$$H_A: \mu_b < \mu_a \tag{8.14}$$
where $\mu_b$ and $\mu_a$ are the pre- and post-development means, respectively. Rejection
of $H_0$ would suggest that the nitrogen levels after development significantly exceed
the nitrogen levels before development, with the implication that the development
might have caused the increase.
Based on the sample data, the pooled variance of Equation 8.12 is
$$S_p^2 = \frac{(11 - 1)(0.36)^2 + (14 - 1)(0.87)^2}{11 + 14 - 2} = 0.4842 \tag{8.15}$$
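The arithmetic of Example 8.2 can be checked from the summary statistics alone. The following Python sketch evaluates Equation 8.12 and then Equation 8.11; the function name `pooled_t_test` is an illustrative choice, and the critical values are simply the tabled values quoted in the example rather than values computed by the code.

```python
from math import sqrt

def pooled_t_test(mean1, s1, n1, mean2, s2, n2):
    """Return (t, pooled variance, degrees of freedom) for two independent samples."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)   # Equation 8.12
    t = (mean1 - mean2) / (sqrt(sp2) * sqrt(1.0 / n1 + 1.0 / n2))     # Equation 8.11
    return t, sp2, n1 + n2 - 2

# Pre-development sample first, post-development sample second.
t_stat, sp2, dof = pooled_t_test(0.78, 0.36, 11, 1.37, 0.87, 14)
print(f"Sp^2 = {sp2:.4f}, t = {t_stat:.3f}, degrees of freedom = {dof}")

# One-tailed lower critical values for 23 degrees of freedom, as quoted from Table A.2.
for alpha, t_crit in ((0.05, -1.714), (0.01, -2.500)):
    decision = "reject H0" if t_stat < t_crit else "do not reject H0"
    print(f"alpha = {alpha}: critical t = {t_crit}, {decision}")
```

The sketch assumes equal but unknown population variances, as stated for Equation 8.11.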
The computed value of the test statistic is
$$t = \frac{0.78 - 1.37}{\sqrt{0.4842} \left( \frac{1}{11} + \frac{1}{14} \right)^{0.5}} = -2.104 \tag{8.16}$$
which has ν = 11 + 14 − 2 = 23 degrees of freedom. From Table A.2, with a 5%
level of significance and 23 degrees of freedom, the critical value of t is −1.714.
Thus, the null hypothesis is rejected, with the implication that the development
caused a significant change in nitrogen level. However, for a 1% significance level,
the critical value is −2.500, which leads to the decision that the increase is not
significant. This shows the importance of selecting the level of significance before
analyzing the data.
8.5 MANN–WHITNEY TEST
For cases in which watershed change occurs as an episodic event within the duration
of a flood series, the series can be separated into two subseries. The Mann–Whitney
U-test (Mann and Whitney, 1947) is a nonparametric alternative to the t-test for two
independent samples and can be used to test whether two independent samples have
been taken from the same population. Therefore, when the assumptions of the
parametric t-test are violated or are difficult to evaluate, such as with small samples,
the Mann–Whitney U-test should be applied. This test is equivalent to the
Wilcoxon–Mann–Whitney rank-sum test described in many textbooks as a t-test on the
rank-transformed data (Inman and Conover, 1983).
The procedure for applying the Mann–Whitney test follows.
1. Specify the hypotheses:
   H_0: The two independent samples are drawn from the same population.
   H_A: The two independent samples are not drawn from the same population.
   The alternative hypothesis shown is presented as a two-sided hypothesis;
   one-sided alternative hypotheses can also be used:
   H_A: Higher (or lower) values are associated with one part of the series.
   For the one-sided alternative, it is necessary to specify either higher or
   lower values prior to analyzing the data.
2. The computed value (U) of the Mann–Whitney U-test is equal to the lesser
   of $U_a$ and $U_b$ where
   $$U_a = n_a n_b + 0.5 n_b (n_b + 1) - S_b \tag{8.17a}$$
   $$U_b = n_a n_b + 0.5 n_a (n_a + 1) - S_a \tag{8.17b}$$
   in which $n_a$ and $n_b$ are the sample sizes of subseries A and B, respectively.
   The values of $S_a$ and $S_b$ are computed as follows: the two groups are
   combined, with the items in the combined group ranked in order from
   smallest (rank = 1) to the largest (rank = $n = n_a + n_b$). On each rank, include
   a subscript a or b depending on whether the value is from subseries A or
   B. $S_a$ and $S_b$ are the sums of the ranks with subscript a and b, respectively.
   Since $S_a$ and $S_b$ are related by the following, only one value needs to be
   computed by actual summation of the ranks:
   $$S_a + S_b = 0.5 n (n + 1) \tag{8.17c}$$
3. The level of significance (α) must be specified. As suggested earlier, α is
   usually 0.05.
4. Compute the test statistic value of U of step 2 and the value of Z:
   $$Z = \frac{U - 0.5 n_a n_b}{[n_a n_b (n_a + n_b + 1)/12]^{0.5}} \tag{8.18}$$
   in which Z is the value of a random variable that has a standard normal
   distribution.
5. Obtain the critical value ($Z_{\alpha/2}$ for a two-sided test or $Z_\alpha$ for a one-sided
   test) from a standard-normal table (Table A.1).
6. Reject the null hypothesis if the computed value of Z (step 4) is greater
   than $Z_{\alpha/2}$ or less than $-Z_{\alpha/2}$ ($Z_\alpha$ for an upper one-sided test or $-Z_\alpha$ for a
   lower, one-sided test).
In some cases, the hydrologic change may dictate the direction of change in the
annual peak series; in such cases, it is appropriate to use the Mann–Whitney test as
a one-tailed test. For example, if channelization takes place within a watershed, the
peaks for the channelized condition should be greater than for the natural watershed.
To apply the U-test as a one-tailed test, specify subseries A as the series with the
smaller expected central tendency and then use $U_a$ as the computed value of the test
statistic, $U = U_a$ (rather than the lesser of $U_a$ and $U_b$); the critical value of Z is
$-Z_\alpha$ (rather than $-Z_{\alpha/2}$) and the null hypothesis is accepted when $Z > -Z_\alpha$.
Example 8.3
The 65-year annual maximum series for the Elizabeth River appears (Figure 7.2) to
change in magnitude about 1951 when the river was channelized. To test if the
channelization was accompanied by an increase in flood peaks, the flood series was
divided into two series, 1924 to 1951 (28 years) and 1952 to 1988 (37 years). Statistical
characteristics for the periods are given in Table 8.3.
Since the direction of change was dictated by the problem (i.e., the peak
discharges increased after channelization), the alternative hypothesis is
H_A: The two independent samples are not from the same population and
the logarithms of the peaks for 1924–1951 are expected to be less than
those for 1952–1988.
Analysis of the annual peak discharges yields $U_a$ = 200 and $U_b$ = 836. Since a
one-tailed test is made, $U = U_a$. Using Equation 8.18, the computed value of Z is −4.213.
For a one-tailed test and a level of significance of 0.001, the critical value is −3.090.
Since the computed Z is less than the critical value (−4.213 < −3.090), the null
hypothesis is rejected. The results of the Mann–Whitney U-test suggest that
channelization significantly increased the peak discharges.
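The steps of the Mann–Whitney procedure can be sketched in Python as follows. The sketch follows Equations 8.17a, 8.17b, and 8.18 as written above. The handling of tied values (each tied value receives the average of the tied ranks) is an assumption, since the text does not discuss ties, and the function names are illustrative.

```python
from math import sqrt

def rank_sums(a, b):
    """Rank the combined samples (1 = smallest) and return (S_a, S_b).
    Tied values share the average of their ranks (an assumption, see above)."""
    combined = sorted([(v, 'a') for v in a] + [(v, 'b') for v in b])
    sums = {'a': 0.0, 'b': 0.0}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0      # mean of ranks i+1 through j
        for k in range(i, j):
            sums[combined[k][1]] += avg_rank
        i = j
    return sums['a'], sums['b']

def mann_whitney_u(a, b):
    """Return (U_a, U_b) from Equations 8.17a and 8.17b."""
    na, nb = len(a), len(b)
    s_a, s_b = rank_sums(a, b)
    u_a = na * nb + 0.5 * nb * (nb + 1) - s_b
    u_b = na * nb + 0.5 * na * (na + 1) - s_a
    return u_a, u_b

def z_statistic(u, na, nb):
    """Equation 8.18: normal deviate for a computed U."""
    return (u - 0.5 * na * nb) / sqrt(na * nb * (na + nb + 1) / 12.0)

# Check against Example 8.3: U_a = 200 with n_a = 28 and n_b = 37.
print(round(z_statistic(200, 28, 37), 3))
```

For a one-tailed test, pass the subseries with the smaller expected central tendency as `a` and use the first returned value as U; for a two-tailed test, use the lesser of the two.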
8.5.1 RATIONAL ANALYSIS OF THE MANN–WHITNEY TEST
When applying the Mann–Whitney test, the flows of the annual maximum series are
transformed to ranks. Thus, the variation within the floods of the annual series is
reduced to variation of ranks. The transformation of measurements on a continuous
variable (e.g., flow) to an ordinal scale (i.e., the rank of the flow) reduces the importance
of the between-flow variation. However, in contrast to the runs test, which reduces
the flows to a nominal variable, the importance of variation is greater for the
Mann–Whitney test than for the runs test. Random variation that is large relative to the
variation introduced by the watershed change into the annual flood series will more
likely mask the effect of the change when using the Mann–Whitney test compared
to the runs test.
This can be more clearly illustrated by examining the statistics $U_a$ and $U_b$ of
Equations 8.17. The first two terms of the equation represent the maximum possible
total of the ranks that could occur for a series with $n_a$ and $n_b$ elements. The third
terms, $S_a$ and $S_b$, represent the summations of the actual ranks in sections A and B
of the series. Thus, when $S_a$ and $S_b$ are subtracted, the differences represent
deviations. If a trend is present, then one would expect either $S_a$ or $S_b$ to be small and the
other large, which would produce a small value of the test statistic U. If the flows
exhibit random variation that is large relative to the variation due to watershed
change, then it can introduce significant variation into the values of $S_a$ and $S_b$, thereby
making it more difficult to detect a trend resulting from watershed change. In a
sense, the test is examining the significance of variation introduced by watershed
change relative to the inherent natural variation of the flows.
In addition to the variation from the watershed change, the temporal location of
the watershed change is important. In contrast to the runs test,
the Mann–Whitney test is less sensitive to the location of the trend. It is more likely
to detect a change in flows that results from watershed change near one of the ends
of the series than the runs test. If watershed change occurs near an end of the series,
then either $n_a$ or $n_b$ of Equations 8.17 would be small, but the sum of the ranks
would also be small. The summation terms $S_a$ and $S_b$ decrease in relative proportion
to the magnitudes of the first two terms. Thus, a change near either end of a series
is just as likely to be detected as a change near the middle of the series.

TABLE 8.3
Annual Peak Discharge Characteristics for Elizabeth River,
New Jersey, Before and After Channelization

              Characteristics of Series          Logarithms
              Mean       Standard                          Standard
Period        (ft³/s)    Deviation (ft³/s)       Mean      Deviation    Skew
1924–1951     1133       643                     3.0040    0.2035       0.57
1952–1988     2059       908                     3.2691    0.2074       −0.44

The Mann–Whitney test is intended for the analysis of two independent samples.
When applying the test to hydrologic data, a single series is separated into two parts
based on a watershed-change criterion. Thus, the test would not strictly apply if the
watershed change occurred over an extended period of time. Consequently, it may
be inappropriate to apply the test to data for watershed change that occurred gradually
with time. Relatively abrupt changes are more appropriate.
8.6 THE t-TEST FOR TWO RELATED SAMPLES
The t-test for independent samples was introduced in Section 8.4. The t-test described
here assumes that the pairs being compared are related. The relationship can arise
in two situations. First, the random sample is subjected to two different treatments
at different times and administered the same test following each treatment. The two

values of the paired test criterion are then compared and evaluated for a significant
difference. As an example, n watersheds could be selected and the slope of the main
channel computed for maps of different scales. The same computational method is
used, but different map scales may lead to different estimates of the slope. The
lengths may differ because the map with the coarser scale fails to show the same
degree of meandering. The statistical comparison of the slopes would seek to deter-
mine if the map scale makes a significant difference in estimates of channel slope.
The second case where pairing arises is when objects that are alike (e.g., identical
twins) or are similar (e.g., adjacent watersheds) need to be compared. In this case,
the pairs are given different treatments and the values of the test criterion compared.
For example, pairs of adjacent watersheds that are similar in slope, soils, climate, and
size, but differ in land cover (forested vs. deforested) are compared on the basis of a
criterion such as base flow rate, erosion rate, or evapotranspiration (ET). A significant
difference in the criterion would indicate the effect of the difference in the land cover.
The hypotheses for this test are the same as those for the t-test for two
independent samples:
$$H_0: \mu_1 = \mu_2$$
$$H_A: \mu_1 \neq \mu_2 \quad \text{(two-tailed)}$$
$$\mu_1 < \mu_2 \quad \text{(one-tailed lower)}$$
$$\mu_1 > \mu_2 \quad \text{(one-tailed upper)}$$
The sampling variation for this test differs from that of the pooled variance for the
test of two independent samples (Equation 8.12). For this test, the sampling variation is
$$S_m = \left[ \frac{\sum d^2}{n(n - 1)} \right]^{0.5} \tag{8.19}$$
in which n is the number of pairs and
$$\sum d^2 = \sum_i (D_i - \bar{D})^2 = \sum_i D_i^2 - \frac{\left( \sum_i D_i \right)^2}{n} \tag{8.20}$$
where $D_i$ is the difference between the ith pair of scores and $\bar{D}$ is the average
difference between scores. The test statistic is:
$$t = \bar{D} / S_m \quad \text{with} \quad \upsilon = n - 1 \tag{8.21}$$
This test assumes that the difference scores are normally distributed. A log
transformation of the data may be used for highly skewed data. If the difference scores
deviate significantly from normality, especially if they are highly asymmetric, the
actual level of significance will differ considerably from the value used to find the
critical test statistic.
The computed test statistic is compared with the tabled t value for level of
significance α and degrees of freedom υ. The region of rejection depends on the
statement of the alternative hypothesis, as follows:

If H_A is          reject H_0 if
µ_x < µ_y          t < −t_α,υ
µ_x > µ_y          t > t_α,υ
µ_x ≠ µ_y          t < −t_α/2,υ or t > t_α/2,υ

Example 8.4
Two different methods of assigning values of Manning's roughness coefficient (n)
are used on eight small urban streams. The resulting values are given in columns 2
and 3 of Table 8.4. The two-tailed alternative hypothesis is used since the objective
is only to decide if the methods give, on the average, similar values. The value of
$\bar{D}$ is equal to −0.027/8 = −0.003375. Results of Equation 8.20 follow:
$$\sum d^2 = 0.001061 - \frac{(-0.027)^2}{8} = 0.0009699 \tag{8.22}$$
The sampling variation is computed using Equation 8.19:
$$S_m = \left[ \frac{0.0009699}{8(8 - 1)} \right]^{0.5} = 0.004162 \tag{8.23}$$
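The computations of Equations 8.19 through 8.21 for this example can be sketched in Python. The roughness values are those of columns 2 and 3 of Table 8.4, and the differences are computed from those columns (for reach 8, 0.037 − 0.045 = −0.008). The function name `paired_t` is a hypothetical choice.

```python
from math import sqrt

def paired_t(x, y):
    """Return (D_bar, sum of squared deviations, S_m, t, degrees of freedom)."""
    d = [xi - yi for xi, yi in zip(x, y)]          # difference scores D_i
    n = len(d)
    d_bar = sum(d) / n
    sum_d2 = sum(di ** 2 for di in d) - sum(d) ** 2 / n   # Equation 8.20
    s_m = sqrt(sum_d2 / (n * (n - 1)))                     # Equation 8.19
    t = d_bar / s_m                                        # Equation 8.21
    return d_bar, sum_d2, s_m, t, n - 1

method_1 = [0.072, 0.041, 0.068, 0.054, 0.044, 0.066, 0.080, 0.037]
method_2 = [0.059, 0.049, 0.083, 0.070, 0.055, 0.057, 0.071, 0.045]

d_bar, sum_d2, s_m, t_stat, dof = paired_t(method_1, method_2)
print(f"D_bar = {d_bar:.6f}, sum d^2 = {sum_d2:.7f}, "
      f"S_m = {s_m:.6f}, t = {t_stat:.3f}, degrees of freedom = {dof}")
```

The sketch assumes, as the text does, that the difference scores are approximately normally distributed.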
The test statistic is:
$$t = \frac{\bar{D}}{S_m} = \frac{-0.003375}{0.004162} = -0.811 \tag{8.24}$$
with υ = n − 1 = 7 degrees of freedom. For a two-tailed test and a level of significance
of 20%, the critical value is ±1.415 (see Table A.2). Since the computed value of −0.811
lies between the critical values, the null hypothesis cannot be rejected even at this
large level of significance. In spite of differences in the methods of estimating
the roughness coefficients, the methods do not give significantly different values
on the average.

TABLE 8.4
Comparison of Roughness Coefficients Using the Two-Sample
t-Test for Related Samples: Example 8.4

Channel    Roughness with    Roughness with
Reach      Method 1          Method 2           D         D²
1          0.072             0.059              0.013     0.000169
2          0.041             0.049             −0.008     0.000064
3          0.068             0.083             −0.015     0.000225
4          0.054             0.070             −0.016     0.000256
5          0.044             0.055             −0.011     0.000121
6          0.066             0.057              0.009     0.000081
7          0.080             0.071              0.009     0.000081
8          0.037             0.045             −0.008     0.000064
                                          ∑D = −0.027     ∑D² = 0.001061

Example 8.5
The paired t-test can be applied to baseflow discharge data for two watersheds of
similar size (see Table 8.5). It is generally believed that increases in the percentage
of urban/suburban development cause reductions in baseflow discharges because
less water infiltrates during storm events. It is not clear exactly how much land
development is necessary to show a significant decrease in baseflow. The baseflows
of two pairs of similarly sized watersheds will be compared. The baseflows (ft³/sec),
based on the average daily flows between 1990 and 1998, were computed for each
month. Since baseflow rates depend on watershed size, the watersheds in each pair
have similar areas, with percentage differences in the area of 7.50% and 3.16%. Such
area differences are not expected to have significant effects on the differences in
baseflow.
For the first pair of watersheds (USGS gage station numbers 1593500 and
1591000) the monthly baseflows are given in columns 2 and 3 of Table 8.5. The two
watersheds have drainage areas of 37.78 mi² and 35.05 mi². For the larger watershed,
the percentages of high-density and residential area are 1.25% and 29.27%, respectively.
For the smaller watershed, the corresponding percentages are 0% and 1.08%.
A similar analysis was made for a second pair of watersheds (USGS gage station
numbers 1583600 and 1493000) that have areas of 21.09 mi² and 20.12 mi², respectively.
The larger watershed has a percentage of high-density development of 1.04%
and of residential development of 20.37%, while the smaller watershed has corresponding
percentages of 0.04% and 1.77%, respectively. The first pair of watersheds
is compared first. The mean and standard deviation of the difference scores in column
4 of Table 8.5 are −2.733 and 3.608, respectively. Using Equation 8.21, the computed
value of the statistic is:
$$t = \frac{-2.733}{3.608/\sqrt{12}} = -2.624 \quad \text{with } \upsilon = 12 - 1 = 11 \tag{8.25}$$
For the one-tailed lower test, the critical t values for levels of significance of 5%,
2.5%, and 1% are −1.796, −2.201, and −2.718, respectively. Thus, the rejection
probability is approximately 1.5%. The null hypothesis of equal means would be
rejected at 5% but accepted at 1%.
For the second pair of watersheds, the difference scores (column 7 of
Table 8.5) have a mean and standard deviation of 2.708 and 3.768, respectively. The
computed value of the t statistic is:
$$t = \frac{2.708}{3.768/\sqrt{12}} = 2.490 \quad \text{with } \upsilon = 12 - 1 = 11 \tag{8.26}$$
Since the critical t values are negative because this is a one-tailed lower test, the
null hypothesis cannot be rejected. The mean baseflow discharges are not significantly
different in spite of the differences in watershed development.

TABLE 8.5
Two-Sample t-Test of Baseflow Discharge Rates

          Watershed Pair 1                                 Watershed Pair 2
          X_i         Y_i         Difference               X_i         Y_i         Difference
          Gage No.    Gage No.    Scores                   Gage No.    Gage No.    Scores
Month     1593500     1591000     D_i = X_i − Y_i          1583600     1493000     D_i = X_i − Y_i
(1)       (2)         (3)         (4)                      (5)         (6)         (7)
Jan.      35.0        40           −5.0                    20.0        25.0        −5.0
Feb.      40.0        40            0.0                    25.0        25.0         0.0
Mar.      40.0        50          −10.0                    30.0        30.0         0.0
Apr.      35.0        40           −5.0                    30.0        30.0         0.0
May       30.0        30            0.0                    25.0        20.0         5.0
June      20.0        20            0.0                    17.5        12.5         5.0
July      10.0        15           −5.0                    17.5        12.5         5.0
Aug.      10.0        10            0.0                    15.0        10.0         5.0
Sept.     10.0        10            0.0                    15.0         8.0         7.0
Oct.      12.5        10            2.5                    12.5        10.0         2.5
Nov.      15.0        20           −5.0                    20.0        12.0         8.0
Dec.      30.0        35           −5.0                    20.0        20.0         0.0
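The arithmetic of the two-sample t-test for related samples can be checked with a short script. The following is a minimal sketch, not part of the original text: the function name `paired_t` is hypothetical, the data are the roughness coefficients of Table 8.4, and the formulas are those used in Equations 8.22 to 8.24.

```python
import math


def paired_t(x, y):
    """Return (mean difference, standard error S_m, t, degrees of freedom)."""
    n = len(x)
    diffs = [xi - yi for xi, yi in zip(x, y)]
    d_bar = sum(diffs) / n
    # Sum of squares about the mean: sum(D^2) - (sum(D))^2 / n
    sum_sq = sum(d * d for d in diffs) - sum(diffs) ** 2 / n
    # Standard error of the mean difference: [sum_sq / (n (n - 1))]^0.5
    s_m = math.sqrt(sum_sq / (n * (n - 1)))
    return d_bar, s_m, d_bar / s_m, n - 1


# Roughness coefficients from Table 8.4 (Example 8.4)
method1 = [0.072, 0.041, 0.068, 0.054, 0.044, 0.066, 0.080, 0.037]
method2 = [0.059, 0.049, 0.083, 0.070, 0.055, 0.057, 0.071, 0.045]

d_bar, s_m, t, dof = paired_t(method1, method2)
print(f"D_bar = {d_bar:.6f}, S_m = {s_m:.6f}, t = {t:.3f}, dof = {dof}")
```

For these data the script should reproduce the values of Example 8.4 to rounding (t of about −0.811 with 7 degrees of freedom). The critical value still has to be taken from a t table such as Table A.2.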
8.7 THE WALSH TEST
The Walsh test is a nonparametric alternative to the two-sample t-test for related
samples. It is applicable to data measured on an interval scale. The test assumes that
the two samples are drawn from symmetrical populations; however, the two popu-
lations do not have to be the same population, and they do not have to be normal
distributions. If the populations are symmetrical, then the mean of each population
will equal the median.
The null hypothesis for the Walsh test is:
$H_0$: The average of the difference scores ($\mu_d$) is equal to zero. (8.27a)
Both one-tailed and two-tailed alternative hypotheses can be tested:
$H_{A1}$: $\mu_d < 0$ (8.27b)
$H_{A2}$: $\mu_d > 0$ (8.27c)
$H_{A3}$: $\mu_d \ne 0$ (8.27d)
The alternative hypothesis should be selected based on the physical problem being
studied and prior to analyzing the data. Acceptance or rejection of the null hypothesis
will imply that the two populations have the same or different central tendencies,
respectively.
The test statistic for the Walsh test is computed based on the differences between
the paired observations. Table 8.6 gives the test statistics, which differ with the
sample size, level of significance, and tail of interest. The critical value is always
zero. For a one-tailed lower test (Equation 8.27b), one of the test statistics in
column 4 of Table 8.6 should be used. For a one-tailed upper test (Equation 8.27c),
one of the test statistics of column 5 should be used. For a two-tailed test (Equation
8.27d), the test statistics of both columns 4 and 5 are used.
The procedure for conducting Walsh’s test is as follows:

1. Arrange the data as a set of related pairs: ($x_i$, $y_i$), i = 1, 2, …, n.
2. For each pair, compute the difference in the scores, $d_i$: $d_i = x_i − y_i$ (note
   that the values of $d_i$ can be positive, negative, or zero).
3. Arrange the difference scores in order of size, from the most negative
   value, which is given a rank of 1, to the algebraically largest value, which is
   given a rank of n. Tied values of $d_i$ are given sequential ranks, not average
   ranks. Therefore, the d values are in order such that
   $d_1 \le d_2 \le d_3 \le \dots \le d_n$. (Note that the subscript on d now indicates
   the order according to size, not the number of the pair of step 2.)
4. Compute the sample value of the test statistic indicated in Table 8.6.
5. The critical test statistic is always zero.
TABLE 8.6
Critical Values for the Walsh Test

Columns: (1) sample size n; (2) significance level of the one-tailed test; (3) significance
level of the two-tailed test; (4) one-tailed test: accept $\mu_d < 0$ if; (5) one-tailed test:
accept $\mu_d > 0$ if. Two-tailed test: accept $\mu_d \ne 0$ if either (4) or (5) holds.

n     One-      Two-      One-Tailed: Accept µ_d < 0 If                   One-Tailed: Accept µ_d > 0 If
      Tailed    Tailed
(1)   (2)       (3)       (4)                                             (5)
4     .062      .125      d_4 < 0                                         d_1 > 0
5     .062      .125      1/2(d_4 + d_5) < 0                              1/2(d_1 + d_2) > 0
      .031      .062      d_5 < 0                                         d_1 > 0
6     .047      .094      max[d_5, 1/2(d_4 + d_6)] < 0                    min[d_2, 1/2(d_1 + d_3)] > 0
      .031      .062      1/2(d_5 + d_6) < 0                              1/2(d_1 + d_2) > 0
      .016      .031      d_6 < 0                                         d_1 > 0
7     .055      .109      max[d_5, 1/2(d_4 + d_7)] < 0                    min[d_3, 1/2(d_1 + d_4)] > 0
      .023      .047      max[d_6, 1/2(d_5 + d_7)] < 0                    min[d_2, 1/2(d_1 + d_3)] > 0
      .016      .031      1/2(d_6 + d_7) < 0                              1/2(d_1 + d_2) > 0
      .008      .016      d_7 < 0                                         d_1 > 0
8     .043      .086      max[d_6, 1/2(d_4 + d_8)] < 0                    min[d_3, 1/2(d_1 + d_5)] > 0
      .027      .055      max[d_6, 1/2(d_5 + d_8)] < 0                    min[d_3, 1/2(d_1 + d_4)] > 0
      .012      .023      max[d_7, 1/2(d_6 + d_8)] < 0                    min[d_2, 1/2(d_1 + d_3)] > 0
      .008      .016      1/2(d_7 + d_8) < 0                              1/2(d_1 + d_2) > 0
      .004      .008      d_8 < 0                                         d_1 > 0
9     .051      .102      max[d_6, 1/2(d_4 + d_9)] < 0                    min[d_4, 1/2(d_1 + d_6)] > 0
      .022      .043      max[d_7, 1/2(d_5 + d_9)] < 0                    min[d_3, 1/2(d_1 + d_5)] > 0
      .010      .020      max[d_8, 1/2(d_5 + d_9)] < 0                    min[d_2, 1/2(d_1 + d_5)] > 0
      .006      .012      max[d_8, 1/2(d_7 + d_9)] < 0                    min[d_2, 1/2(d_1 + d_3)] > 0
      .004      .008      1/2(d_8 + d_9) < 0                              1/2(d_1 + d_2) > 0
10    .056      .111      max[d_6, 1/2(d_4 + d_10)] < 0                   min[d_5, 1/2(d_1 + d_7)] > 0
      .025      .051      max[d_7, 1/2(d_5 + d_10)] < 0                   min[d_4, 1/2(d_1 + d_6)] > 0
      .011      .021      max[d_8, 1/2(d_6 + d_10)] < 0                   min[d_3, 1/2(d_1 + d_5)] > 0
      .005      .010      max[d_9, 1/2(d_6 + d_10)] < 0                   min[d_2, 1/2(d_1 + d_5)] > 0
11    .048      .097      max[d_7, 1/2(d_4 + d_11)] < 0                   min[d_5, 1/2(d_1 + d_8)] > 0
      .028      .056      max[d_7, 1/2(d_5 + d_11)] < 0                   min[d_5, 1/2(d_1 + d_7)] > 0
      .011      .021      max[1/2(d_6 + d_11), 1/2(d_8 + d_9)] < 0        min[1/2(d_1 + d_6), 1/2(d_3 + d_4)] > 0
      .005      .011      max[d_9, 1/2(d_7 + d_11)] < 0                   min[d_3, 1/2(d_1 + d_5)] > 0
12    .047      .094      max[1/2(d_4 + d_12), 1/2(d_5 + d_11)] < 0       min[1/2(d_1 + d_9), 1/2(d_2 + d_8)] > 0
      .024      .048      max[d_8, 1/2(d_5 + d_12)] < 0                   min[d_5, 1/2(d_1 + d_8)] > 0
      .010      .020      max[d_9, 1/2(d_6 + d_12)] < 0                   min[d_4, 1/2(d_1 + d_7)] > 0
      .005      .011      max[1/2(d_7 + d_12), 1/2(d_9 + d_10)] < 0       min[1/2(d_1 + d_6), 1/2(d_3 + d_4)] > 0
13    .047      .094      max[1/2(d_4 + d_13), 1/2(d_5 + d_12)] < 0       min[1/2(d_1 + d_10), 1/2(d_2 + d_9)] > 0
      .023      .047      max[1/2(d_5 + d_13), 1/2(d_6 + d_12)] < 0       min[1/2(d_1 + d_9), 1/2(d_2 + d_8)] > 0
      .010      .020      max[1/2(d_6 + d_13), 1/2(d_9 + d_10)] < 0       min[1/2(d_1 + d_8), 1/2(d_4 + d_5)] > 0
      .005      .010      max[d_10, 1/2(d_7 + d_13)] < 0                  min[d_4, 1/2(d_1 + d_7)] > 0
14    .047      .094      max[1/2(d_4 + d_14), 1/2(d_5 + d_13)] < 0       min[1/2(d_1 + d_11), 1/2(d_2 + d_10)] > 0
      .023      .047      max[1/2(d_5 + d_14), 1/2(d_6 + d_13)] < 0       min[1/2(d_1 + d_10), 1/2(d_2 + d_9)] > 0
      .010      .020      max[d_10, 1/2(d_6 + d_14)] < 0                  min[d_5, 1/2(d_1 + d_9)] > 0
      .005      .010      max[1/2(d_7 + d_14), 1/2(d_10 + d_11)] < 0      min[1/2(d_1 + d_8), 1/2(d_4 + d_5)] > 0
15    .047      .094      max[1/2(d_4 + d_15), 1/2(d_5 + d_14)] < 0       min[1/2(d_1 + d_12), 1/2(d_2 + d_11)] > 0
      .023      .047      max[1/2(d_5 + d_15), 1/2(d_6 + d_14)] < 0       min[1/2(d_1 + d_11), 1/2(d_2 + d_10)] > 0
      .010      .020      max[1/2(d_6 + d_15), 1/2(d_10 + d_11)] < 0      min[1/2(d_1 + d_10), 1/2(d_5 + d_6)] > 0
      .005      .010      max[d_11, 1/2(d_7 + d_15)] < 0                  min[d_5, 1/2(d_1 + d_9)] > 0
6. Compare the computed value ($W_L$ and/or $W_U$) to the critical value of zero,
   as indicated in the table. The decision to accept or reject is indicated by:

If $H_A$ is        reject $H_0$ if
$\mu_d < 0$        $W_L < 0$
$\mu_d > 0$        $W_U > 0$
$\mu_d \ne 0$      $W_L < 0$ or $W_U > 0$

Example 8.6
The data from Example 8.5 can be compared with the Walsh test. The first pair of
watersheds is compared first. Since the larger watershed has the higher percentages
of developed land, it would have the lower baseflows if the development has caused
decreases in baseflow. Therefore, the alternative hypothesis of Equation 8.27b will
be tested and the larger watershed will be denoted as $x_i$. The difference scores and
their ranks are given in columns 4 and 5, respectively, of Table 8.7. The values of
the test statistics are computed and shown in Table 8.8. The null hypothesis could
be rejected for a level of significance of 4.7% but would be accepted for the smaller
levels of significance. It may be of interest to note that the computed values for two
of the three smaller levels of significance (2.4% and 1.0%) are equal to the critical
value of zero. While this is not sufficient for rejection, it may suggest that the
baseflows for the more highly developed watershed tend to be lower. In only 1 month
(October) was the baseflow from the more developed watershed higher than that for
the less developed watershed. This shows the importance of the selection of the level
of significance, as well as the limitations of statistical analysis.
TABLE 8.7
Application of Walsh Test to Baseflow Discharge Data

          Watershed Pair 1                                            Watershed Pair 2
          X_i         Y_i         Difference                          X_i         Y_i
          Gage No.    Gage No.    Scores              Rank of         Gage No.    Gage No.                        Rank of
Month     1593500     1591000     d_i = x_i − y_i     d_i             1583600     1493000     d_i = x_i − y_i     d_i
(1)       (2)         (3)         (4)                 (5)             (6)         (7)         (8)                 (9)
Jan.      35.0        40           −5.0               2               20.0        25.0        −5.0                1
Feb.      40.0        40            0.0               7               25.0        25.0         0.0                2
Mar.      40.0        50          −10.0               1               30.0        30.0         0.0                3
Apr.      35.0        40           −5.0               3               30.0        30.0         0.0                4
May       30.0        30            0.0               8               25.0        20.0         5.0                7
June      20.0        20            0.0               9               17.5        12.5         5.0                8
July      10.0        15           −5.0               4               17.5        12.5         5.0                9
Aug.      10.0        10            0.0               10              15.0        10.0         5.0                10
Sept.     10.0        10            0.0               11              15.0         8.0         7.0                11
Oct.      12.5        10            2.5               12              12.5        10.0         2.5                6
Nov.      15.0        20           −5.0               5               20.0        12.0         8.0                12
Dec.      30.0        35           −5.0               6               20.0        20.0         0.0                5
The second pair of watersheds can be similarly compared. Again, the larger
watershed is denoted as $x_i$, and since it has the larger percentages of development,
it is expected to have lower baseflow discharges. Therefore, the alternative hypothesis
of Equation 8.27b is selected. The baseflow discharges are given in columns 6 and
7 of Table 8.7. The difference scores and the ranks of the scores are given in columns
8 and 9. In this case, only one of the difference scores is negative (January). This
in itself suggests that the land development did not decrease the baseflow. Application
of the Walsh test yields the sample values of the test statistic shown in Table 8.8.
All of these are greater than the criterion of less than zero, so the null hypothesis
is accepted at all four levels of significance.
Both data sets suggest that for these two pairs of coastal watersheds, land development
did not cause a significant decrease in baseflow discharge. Even though the
assumptions of the two-sample t-test may be violated, t-tests of the same data support
the results of the Walsh test. The computed values, whether for the untransformed data
or the logarithms of the data, are well within the region of acceptance.
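The Walsh computations of Example 8.6 can be reproduced with a few lines of code. The sketch below is illustrative only: the function name `walsh_lower_n12` is hypothetical, it covers only the n = 12 lower-tail criteria listed in Table 8.6, and the difference scores are those of Table 8.7.

```python
def walsh_lower_n12(diffs):
    """Lower-tail Walsh statistics for n = 12, keyed by one-tailed significance level."""
    if len(diffs) != 12:
        raise ValueError("this sketch only covers n = 12")
    # 1-based ordered differences: d[1] <= d[2] <= ... <= d[12]
    d = [None] + sorted(diffs)

    def half(i, j):
        return 0.5 * (d[i] + d[j])

    return {
        0.047: max(half(4, 12), half(5, 11)),
        0.024: max(d[8], half(5, 12)),
        0.010: max(d[9], half(6, 12)),
        0.005: max(half(7, 12), half(9, 10)),
    }


# Difference scores d_i = x_i - y_i from Table 8.7, January through December
pair1 = [-5.0, 0.0, -10.0, -5.0, 0.0, 0.0, -5.0, 0.0, 0.0, 2.5, -5.0, -5.0]
pair2 = [-5.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 7.0, 2.5, 8.0, 0.0]

for name, diffs in (("pair 1", pair1), ("pair 2", pair2)):
    for alpha, w_lower in walsh_lower_n12(diffs).items():
        # H0 is rejected for the lower-tail alternative only when W_L < 0
        decision = "reject H0" if w_lower < 0 else "accept H0"
        print(f"{name}: alpha = {alpha:.3f}, W_L = {w_lower:+.2f}, {decision}")
```

Because the test uses ordered values rather than average ranks, sorting the differences is sufficient; tied differences simply occupy adjacent positions.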
8.8 WILCOXON MATCHED-PAIRS,
SIGNED-RANKS TEST
The Wilcoxon matched-pairs, signed-ranks test is used to test whether two related
groups show a difference in central tendency. The null hypothesis is that the central
tendencies of two related populations are not significantly different. Thus, the test
is an alternative to the parametric two-sample t-test. However, the Wilcoxon test can
TABLE 8.8
Computation of Sample Test Statistics for Walsh Test

(a) Watershed Pair No. 1
Level of
Significance, α    Test Statistic                                  Sample Value              Decision
0.047              max[1/2(d_4 + d_12), 1/2(d_5 + d_11)] < 0       max (−1.25, −2.5) < 0     Reject $H_0$
0.024              max[d_8, 1/2(d_5 + d_12)] < 0                   max (0, −1.25) < 0        Accept $H_0$
0.010              max[d_9, 1/2(d_6 + d_12)] < 0                   max (0, −1.25) < 0        Accept $H_0$
0.005              max[1/2(d_7 + d_12), 1/2(d_9 + d_10)] < 0       max (1.25, 0) < 0         Accept $H_0$

(b) Watershed Pair No. 2
Level of
Significance, α    Sample Value              Decision
0.047              max (4, 3.5) < 0          Accept $H_0$
0.024              max (5, 4) < 0            Accept $H_0$
0.010              max (5, 5.25) < 0         Accept $H_0$
0.005              max (6.5, 5) < 0          Accept $H_0$
be used with variables that are measured on the ordinal scale. The following six-step
procedure is used to test the hypotheses:
1. The null hypothesis is that two related groups do not show a difference
   in central tendency. Both one-sided and two-sided alternative hypotheses
   can be tested.
2. The test statistic (T) is the lesser of the sum of the ranks of the positive
   differences and the sum of the ranks of the negative differences between the
   paired values in the samples from the two groups.
3. Select a level of significance.
4. The sample value of the test statistic (T) is obtained as follows:
   a. Compute the difference between each pair of values.
   b. Rank the absolute value of the differences in ascending order (rank 1
      for the smallest difference; rank n for the largest difference).
   c. Place the sign of the difference on the rank.
   d. Compute the sum of the ranks of the positive differences, $S_p$, and the
      sum of the ranks of the negative differences, $S_n$.
   e. The value of the test statistic T is the lesser of the absolute values of
      $S_p$ and $S_n$.
5. For sample sizes of less than 50, the critical value, $T_\alpha$, is obtained from
   Table A.9 using the sample size, n, and the level of significance as the
   arguments. An approximation will be given below for samples larger than
   25.
6. Using $\mu_i$ to indicate the central tendency of sample i, the decision criterion
   is as follows:

If $H_A$ is           reject $H_0$ if
$\mu_1 < \mu_2$       $T < T_\alpha$
$\mu_1 > \mu_2$       $T < T_\alpha$
$\mu_1 \ne \mu_2$     $T < T_{\alpha/2}$

It is important to note that the decision rule for both of the one-sided alternatives is
the same; this is the result of taking the absolute value in step 4e.
For sample sizes greater than 25, a normal approximation can be used. The mean
($\bar{T}$) and the standard deviation ($S_T$) of the random variable T can be approximated by:
$$\bar{T} = \frac{n(n+1)}{4} \tag{8.28}$$
and
$$S_T = \left[\frac{n(n+1)(2n+1)}{24}\right]^{0.5} \tag{8.29}$$
The random variable Z has a standard normal distribution, where Z is given by
$$Z = \frac{T - \bar{T}}{S_T} \tag{8.30}$$
For this test statistic, the region of rejection is

If $H_A$ is           reject $H_0$ if
$\mu_1 < \mu_2$       $Z < -Z_\alpha$
$\mu_1 > \mu_2$       $Z > Z_\alpha$
$\mu_1 \ne \mu_2$     $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$

Several additional points are important:
1. For a one-tailed test, the direction of the difference must be specified
   before analyzing the data. This implies that the test statistic is equal to
   whichever sum, $S_p$ or $S_n$, is specified in advance, not the lesser of the two.
2. If one or more differences is zero, treat them as follows:
   a. If only 1, delete it and reduce n by 1.
   b. If an even number, split them between $S_p$ and $S_n$ and use an average
      rank.
   c. If an odd number (3 or larger), delete 1 value, reduce n by 1, and treat
      the remaining zeroes as an even number of zeroes.

Example 8.7
Consider the case of two 5-acre experimental watersheds, one continually maintained
with a low-density brush cover and the other allowed to naturally develop a more
dense brush cover over the 11-year period of record. The annual maximum discharges
were recorded for each year of record on both watersheds. The results are
given in Table 8.9. The annual maximum discharge rates from the watershed with the
low-density brush cover (i.e., watershed 1) are expected to be higher than
those for the watershed undergoing change (i.e., watershed 2). The null hypothesis
is that the means are equal, which would imply that the
brush density is not a dominant factor in peak discharge magnitude. A one-tailed
alternative is used, as peak discharges should decrease with increasing brush density.
Therefore, the sum of the negative ranks is to be used as the value of the test statistic T.
As shown in Table 8.9, the sample value is 3.
For a sample size of 11 and a one-tailed test, the critical values for levels of
significance of 5%, 1%, and 0.5% are 14, 7, and 5, respectively (see Table A.9).
Since the computed value of T is less than the critical value even for 0.5%, the null
hypothesis can be rejected, which leads to the conclusion that the mean discharge
decreases as the density of brush cover increases.
8.8.1 TIES
Tied values can play an important part in the accuracy of the Wilcoxon test. Ties
can arise in two ways. First, paired values of the criterion variable can be the same,
thus producing a difference of zero. Second, two differences can have the same
magnitude in which case they need to be assigned equal ranks. Understanding the
proper way to handle ties is important to the proper use of the Wilcoxon test.

Tied differences are more easily handled than tied values of the criterion. As a
general rule, if two differences have the same magnitude, then they are given the
average of the two ranks that would have been assigned to them. If three differences
have the same magnitude, then all three are assigned the average of the three ranks
that would have been assigned to them. Consider the case where the differences of
12 pairs of scores are as follows:
5, 7, −3, −5, 1, −2, 4, 3, 6, 4, 5, 8
Note that the sequence includes a pair of threes, with one negative and one positive.
These would have been assigned ranks of 3 and 4, so each is assigned a rank of 3.5.
Note that the difference in the signs of the differences, +3 and −3, is irrelevant in
assigning ranks. The sequence of differences also includes two values of 4, which
would have been ranked 5 and 6. Therefore, each is assigned a rank of 5.5. The
above sequence includes three differences that have magnitudes of 5. Since they
would have been assigned ranks of 7, 8, and 9, each of them is given a rank of 8.
Thus, the ranks of the 12 differences shown above are:
8, 11, 3.5, 8, 1, 2, 5.5, 3.5, 10, 5.5, 8, 12
Note that ranks of 7 and 9 are not included because of the ties.
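The averaging rule for tied differences can be expressed compactly in code. The following is a minimal sketch, assuming plain Python lists; the function name `average_ranks` is hypothetical, and the data are the 12 differences listed above.

```python
def average_ranks(values):
    """Rank |values| in ascending order; tied magnitudes share their average rank."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]))
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next magnitude equals the current one
        while (end + 1 < len(order)
               and abs(values[order[end + 1]]) == abs(values[order[start]])):
            end += 1
        # Average of the 1-based positions start+1 .. end+1
        shared = (start + 1 + end + 1) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


diffs = [5, 7, -3, -5, 1, -2, 4, 3, 6, 4, 5, 8]
print(average_ranks(diffs))
```

The ranks are returned in the original order of the differences, so the sign of each difference can be attached to its rank afterward.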
TABLE 8.9
Detection of Change in Cover Density

        Annual Maximum for
        Watershed
Year    1       2       Difference    r_p     r_n
1983    4.7     5.1     −0.4                  2
1984    6.7     6.8     −0.1                  1
1985    6.3     5.6      0.7          5
1986    3.7     3.2      0.5          3
1987    5.2     4.6      0.6          4
1988    6.3     5.2      1.1          6
1989    4.3     2.4      1.9          7
1990    4.9     2.7      2.2          8
1991    5.9     3.4      2.5          10
1992    3.5     1.2      2.3          9
1993    6.7     4.0      2.7          11
                                      S_p = 63    S_n = 3
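The rank sums of Table 8.9 can be verified with a short script. This is a sketch under stated assumptions rather than a general implementation: the function name `signed_rank_sums` is hypothetical, and the code assumes there are no zero differences and no tied magnitudes, which holds for the Example 8.7 data.

```python
def signed_rank_sums(x, y):
    """Return (S_p, S_n), the rank sums of the positive and negative differences x - y.

    Assumes no zero differences and no tied magnitudes.
    """
    # Rounding removes floating-point noise from differences such as 4.9 - 2.7
    diffs = [round(a - b, 10) for a, b in zip(x, y)]
    ordered = sorted(diffs, key=abs)          # rank 1 = smallest magnitude
    s_p = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
    s_n = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)
    return s_p, s_n


# Annual maximum discharges from Table 8.9, 1983 through 1993
watershed1 = [4.7, 6.7, 6.3, 3.7, 5.2, 6.3, 4.3, 4.9, 5.9, 3.5, 6.7]
watershed2 = [5.1, 6.8, 5.6, 3.2, 4.6, 5.2, 2.4, 2.7, 3.4, 1.2, 4.0]

s_p, s_n = signed_rank_sums(watershed1, watershed2)
t_stat = s_n  # one-tailed test: the sum of the negative ranks was specified in advance

# Critical values quoted in Example 8.7 for n = 11 (from Table A.9)
critical = {0.05: 14, 0.01: 7, 0.005: 5}
for alpha, t_crit in critical.items():
    decision = "reject H0" if t_stat < t_crit else "accept H0"
    print(f"alpha = {alpha}: T = {t_stat}, critical = {t_crit}, {decision}")
```

The critical values are copied from the example, not computed; a table such as Table A.9 is still needed for other sample sizes.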
The presence of tied criterion scores is a more important problem since the
procedure assumes that the random variable is continuously distributed. However,
because tied measurements occur, the occurrence of tied scores must be addressed.
A pair of identical scores produces a difference of zero. The handling of a difference
of zero is subject to debate. Several courses of action are possible. First, the zero
value could be assigned a split rank of 0.5, with half being placed in the +rank
column and 0.5 being placed in the −rank column. Second, the pair with equal scores
could be discarded and the sample size reduced by 1. The problem with this alter-
native is that the scores were measured and even suggest that the null hypothesis of
equal means is correct. Therefore, discarding the value would bias the decision
toward rejection of the null hypothesis. Third, if the differences contain multiple
ties, they could be assigned equal ranks and divided between the +rank and −rank
columns. Fourth, the tied difference could be ranked with the rank assigned to the
column that is most likely to lead to acceptance of $H_0$. Each of the four alternatives
is flawed, but at the same time has some merit. Therefore, a fifth possibility is to

complete the test using each of the four rules. If the alternatives lead to different
decisions, then it may be preferable to use another test.
Example 8.8
The handling of tied scores is illustrated with a hypothetical data set of eight paired
scores (columns 1 and 2 of Table 8.10). The data include one pair of ties of 3, which
yields a difference of zero. Method 1 would give the value a rank of 1, as it is the
smallest difference, but split the rank between the positive and negative sums (see
columns 4 and 5). Method 4 assigns the rank of 1 to the negative column, as this
will make the smaller sum larger, thus decreasing the likelihood of rejecting the null
hypothesis. For method 2, the zero difference is omitted from the sample, which
reduces the sample size to seven.
TABLE 8.10
Effect of Tied Scores with the Wilcoxon Test

                  Method 1            Method 2            Method 4
X    Y    d       + Rank   − Rank     + Rank   − Rank     + Rank   − Rank
7    3     4      6.5      -          5.5      -          6.5      -
8    6     2      3.5      -          2.5      -          3.5      -
4    1     3      5.0      -          4.0      -          5.0      -
6    1     5      8.0      -          7.0      -          8.0      -
7    5     2      3.5      -          2.5      -          3.5      -
3    3     0      0.5      0.5        -        -          -        1
9    5     4      6.5      -          5.5      -          6.5      -
2    3    −1      -        2.0        -        1          -        2
Totals            33.5     2.5        27.0     1          33.0     3
For each of the three cases, the positive and negative ranks are added, which
gives the sums in Table 8.10. Using the smaller of the positive and negative ranks
yields values of 2.5, 1, and 3 for the test statistic under methods 1, 2, and 4,
respectively. For methods 1 and 4, the critical values are 4, 2, and 0 for a two-tailed
test (see Table A.9) for levels of significance of 5%, 2%, and 1%, respectively. For
these cases, $H_0$ would be rejected at the 5% level, but accepted at the 2% level. The
tied-score pair would not influence the decision. For method 2, the critical values
are 2 and 0 for 5% and 2%, respectively. In this case, the null hypothesis is rejected
at the 5% level but accepted at the 2% level, which is the same decision produced
using methods 1 and 4.
In this case, the method of handling ties did not make a difference in the decision.
In other cases, it is possible that the method selected would make a difference, which
suggests that all methods should be checked.
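Checking several rules for zero differences is easy to automate. The sketch below is one possible reading of methods 1, 2, and 4 as they are applied in Example 8.8; the function names `average_ranks` and `rank_sums` are hypothetical, and the paired scores are those of Table 8.10.

```python
def average_ranks(magnitudes):
    """Ascending ranks of the magnitudes; ties share the average of their ranks."""
    ranks = []
    for m in magnitudes:
        smaller = sum(1 for v in magnitudes if v < m)
        equal = sum(1 for v in magnitudes if v == m)
        ranks.append(smaller + (equal + 1) / 2)
    return ranks


def rank_sums(diffs, method):
    """Return (S_p, S_n) for one rule of handling zero differences.

    method 1: rank the zeros and split their ranks equally between S_p and S_n
    method 2: discard the zeros before ranking
    method 4: rank the zeros and add their ranks to the smaller of S_p and S_n
    """
    if method == 2:
        diffs = [d for d in diffs if d != 0]
    ranks = average_ranks([abs(d) for d in diffs])
    s_p = sum(r for r, d in zip(ranks, diffs) if d > 0)
    s_n = sum(r for r, d in zip(ranks, diffs) if d < 0)
    zero_total = sum(r for r, d in zip(ranks, diffs) if d == 0)
    if method == 1:
        s_p += zero_total / 2
        s_n += zero_total / 2
    elif method == 4:
        # Enlarging the smaller sum makes rejection of H0 less likely
        if s_p <= s_n:
            s_p += zero_total
        else:
            s_n += zero_total
    return s_p, s_n


# Paired scores from Table 8.10
x = [7, 8, 4, 6, 7, 3, 9, 2]
y = [3, 6, 1, 1, 5, 3, 5, 3]
diffs = [a - b for a, b in zip(x, y)]

results = {m: rank_sums(diffs, m) for m in (1, 2, 4)}
for m, (s_p, s_n) in results.items():
    print(f"method {m}: S_p = {s_p}, S_n = {s_n}, T = {min(s_p, s_n)}")
```

If the resulting values of T fall on different sides of the critical value, the choice of rule matters and, as noted above, another test may be preferable.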
Example 8.9
Twenty-eight pairs of watersheds are used to examine the effect of a particular type
of land development on the erosion rate. The erosion rates (tons/acre/year) are given
in Table 8.11 for the unchanged watershed ($X_1$) and the developed watershed ($X_2$).
Of interest is whether development increases the mean erosion rate. Therefore, a
one-tailed test is applied. For testing the alternative that the mean of $X_2$ is greater
than the mean of $X_1$, the summation of the positive ranks is used as the value of the
test statistic. For the 28 paired watersheds, only 12 experienced greater erosion than
the undeveloped watershed. Two pairs of watersheds had the same erosion rate,
which yields two differences of zero. These were assigned average ranks of 1.5,
with one placed in the +rank column and one in the −rank column. The sum of the
positive ranks is 220. If the normal approximation of Equations 8.28 to 8.30 is
applied, the mean $\bar{T}$ is 203, the standard deviation $S_T$ is 43.91, and the z value is
$$z = \frac{220 - 203}{43.91} = 0.387$$
Critical values of z for a lower-tailed test would be −1.282 and −1.645 for 10% and
5% levels of significance, respectively. Since the computed value of z is positive,
the null hypothesis of equal means is accepted.
For the standard test, critical sums of 130 and 102 are obtained from Table A.9
for 5% and 1% levels of significance, respectively. Since $S_p$ is larger than these
critical values, the null hypothesis of equal means is accepted.
Note that if the two-tailed alternative were tested, then the computed value of T
would be the lesser of $S_p$ and $S_n$, which for the data would have been 186. This value
is still safely within the region of acceptance.
8.9 ONE-SAMPLE CHI-SQUARE TEST
Watershed change can influence the spread of data, not just the mean. Channelization
may increase the central tendency of floods and either increase or decrease the spread
of the data. If the channelization reduces the natural storage, then the smoothing of
the flows will not take place. However, if the channelization is limited in effects, it
may cause the smaller events to show a relatively greater effect than the larger events,
thereby reducing the spread. Variation in reservoir operating rules can significantly
influence the spread of flood data in channels downstream of the dam. Urbanization
can also change the spread of the data.
Both parametric and nonparametric tests are available for detecting change in
the variance of data. Additionally, both univariate and bivariate methods are available.
The one-sample test can be used to test the variance of a single random variable
against a standard of comparison with the following null hypothesis:
$H_0$: $\sigma^2 = \sigma_o^2$ (8.31)

TABLE 8.11
Application of Wilcoxon Test to Paired Watersheds

Erosion Rate for
Watershed           Absolute      Rank
1        2          Difference    +        −
35       42         7             -        8
27       18         9             11       -
62       49         13            16       -
51       57         6             -        6
43       18         25            27       -
26       54         28            -        28
30       53         23            -        25
41       41         0             1.5      -
64       57         7             8        -
46       33         13            16       -
19       32         13            -        16
55       38         17            21.5     -
26       43         17            -        21.5
42       39         3             4        -
34       45         11            -        13
72       58         14            18       -
38       23         15            19       -
23       47         24            -        26
52       62         10            -        12
41       33         8             10       -
45       26         19            23       -
29       29         0             -        1.5
12       17         5             -        5
37       30         7             8        -
66       50         16            20       -
27       49         22            -        24
55       43         12            14       -
44       43         1             3        -
                                  S_p = 220    S_n = 186
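The rank sums of Table 8.11 and the normal approximation of Equations 8.28 to 8.30 can be reproduced as follows. This is a minimal sketch, assuming the difference is taken as watershed 1 minus watershed 2 and that the ranks of zero differences are split equally between the two columns, as described in Example 8.9; the function names `average_ranks` and `wilcoxon_normal` are hypothetical.

```python
import math


def average_ranks(magnitudes):
    """Ascending ranks of the magnitudes; ties share the average of their ranks."""
    ranks = []
    for m in magnitudes:
        smaller = sum(1 for v in magnitudes if v < m)
        equal = sum(1 for v in magnitudes if v == m)
        ranks.append(smaller + (equal + 1) / 2)
    return ranks


def wilcoxon_normal(x1, x2):
    """Return (S_p, S_n, mean of T, S_T, z) with z based on S_p."""
    diffs = [a - b for a, b in zip(x1, x2)]
    ranks = average_ranks([abs(d) for d in diffs])
    # Ranks of zero differences are divided equally between the two sums
    zero_total = sum(r for r, d in zip(ranks, diffs) if d == 0)
    s_p = sum(r for r, d in zip(ranks, diffs) if d > 0) + zero_total / 2
    s_n = sum(r for r, d in zip(ranks, diffs) if d < 0) + zero_total / 2
    n = len(diffs)
    mean_t = n * (n + 1) / 4                           # Equation 8.28
    s_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)    # Equation 8.29
    z = (s_p - mean_t) / s_t                           # Equation 8.30 with T = S_p
    return s_p, s_n, mean_t, s_t, z


# Erosion rates (tons/acre/year) from Table 8.11
x1 = [35, 27, 62, 51, 43, 26, 30, 41, 64, 46, 19, 55, 26, 42,
      34, 72, 38, 23, 52, 41, 45, 29, 12, 37, 66, 27, 55, 44]
x2 = [42, 18, 49, 57, 18, 54, 53, 41, 57, 33, 32, 38, 43, 39,
      45, 58, 23, 47, 62, 33, 26, 29, 17, 30, 50, 49, 43, 43]

s_p, s_n, mean_t, s_t, z = wilcoxon_normal(x1, x2)
print(f"S_p = {s_p}, S_n = {s_n}, mean = {mean_t}, S_T = {s_t:.2f}, z = {z:.3f}")
```

The script only evaluates the statistic; the comparison with the critical z values (−1.282 and −1.645 for the lower-tailed test) is made as in Example 8.9.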