Statistics for Environmental Engineers P2


© 2002 By CRC Press LLC

at the shorter times and in this region the residuals are large and predominantly positive. Tukey (1977)
calls this process of plotting residuals flattening the data. He emphasizes its power to shift our attention
from the fitted line to the discrepancies between prediction and observation. It is these discrepancies
that contain the information needed to improve the model.
Make it a habit to examine the residuals of a ﬁtted model, including deviations from a simple mean.
Check for normality by making a dot diagram or histogram. Plot the residuals against the predicted
values, against the predictor variables, and as a function of the time order in which the measurements
were made. Residuals that appear to be random and to have uniform variance are persuasive evidence
that the model has no serious deﬁciencies. If the residuals show a trend, it is evidence that the model is
inadequate. If the residuals spread out, it suggests that a data transformation is probably needed.
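
As a minimal sketch of the residual checks described above, the following Python fragment fits a straight line to made-up data that actually follow a curve, then inspects the residuals. The data, the variable names, and the use of NumPy are assumptions for illustration; none of them come from the text.

```python
import numpy as np

# Hypothetical data (not from the text): a response that levels off with time.
time = np.arange(1.0, 11.0)                  # hours
conc = 30.0 * (1.0 - np.exp(-0.3 * time))    # made-up concentrations

# Fit a straight line by least squares and compute the residuals.
slope, intercept = np.polyfit(time, conc, 1)
predicted = intercept + slope * time
residuals = conc - predicted

# Count how often consecutive residuals change sign. Random residuals change
# sign often; long runs of one sign indicate a trend the model has missed.
signs = np.sign(residuals)
sign_changes = int(np.sum(signs[1:] != signs[:-1]))

for t, r in zip(time, residuals):
    print(f"time {t:4.1f}  residual {r:7.3f}")
print("sign changes:", sign_changes)
```

Here the residuals are negative at both ends and positive in the middle, which is the kind of trend that signals an inadequate model. In practice the same residuals would also be plotted against the predicted values and against time order.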
FIGURE 3.11 Graphing residuals. The visual impression from the top plot is that the vertical deviations are greater for
large values of time, but the residual plot (bottom) shows that the curve does not fit the points at low times.
[Axes: Time (hours) vs. Concentration (top) and Residual (bottom).]

FIGURE 3.12 Calibration curve for measuring chloride with an ion chromatograph. There are three replicate
measurements at each of the 13 levels of chloride. [Axes: Standard conc. (mg/L) vs. Peak (1000s).]

Figure 3.12 is a calibration curve for measuring chloride using an ion chromatograph. There are three
replicate measures at each concentration level. The hidden variation of the replicates is revealed in Figure 3.13,
which has flattened the data by looking at deviations from the average of the three values at each level.
An important fact is revealed: the measurement error (variation) tends to increase as the concentration
increases. This must be taken into account when fitting the calibration curve to the data.
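
The flattening of replicate data described above can be sketched in a few lines of Python. The peak values below are invented for illustration (they are not the chloride data of Figure 3.12), and the array layout, one row per concentration level, is an assumption.

```python
import numpy as np

# Hypothetical calibration data: 3 replicate peak readings at each of 4 levels.
levels = np.array([10.0, 40.0, 80.0, 120.0])      # standard conc. (made-up)
peaks = np.array([
    [ 60.0,  62.0,  61.0],
    [250.0, 262.0, 244.0],
    [510.0, 480.0, 531.0],
    [700.0, 790.0, 760.0],
])

# Flatten the data: subtract the average of the replicates at each level.
level_means = peaks.mean(axis=1, keepdims=True)
deviations = peaks - level_means

# The range of the deviations at each level shows how the scatter changes.
spread = deviations.max(axis=1) - deviations.min(axis=1)
for c, s in zip(levels, spread):
    print(f"conc {c:6.1f}   range of deviations {s:6.1f}")
```

With these made-up numbers the range of the deviations grows with concentration, the same kind of pattern the text reports for the chloride replicates.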

A Note on Clarity and Style

Here are the words of some people who have devoted their talent and energy to improving the quality
of graphical presentations of statistical data.

“Excellence in statistical graphics consists of complex ideas communicated with clarity, precision,
and efficiency.” Edward Tufte (1983)
“The greatest possibilities of visual display lie in vividness and inescapability of the intended
message.” John Tukey (1990)
“Graphing data should be an iterative experiment process.” Cleveland (1994)

Tufte (1983) emphasizes clarity and simplicity in graphics. Wainer (1997) uses elegance, grace, and
impact to describe good graphics. Cleveland (1994) emphasizes clarity, precision, and efﬁciency. William
Playfair (1786), a pioneer and innovator in the use of statistical graphics, desires to tell a story graphically
as well as dramatically.
Vividness, drama, elegance, grace, clarity, and impact are not technical terms and the ideas they convey
are not easy to capture in technical rules, but Cleveland (1994) and Tufte (1983) have suggested basic
principles that will produce better graphics. Tufte (1983) says that graphical excellence:
• is the well-designed presentation of interesting data: a matter of substance, of statistics, and
of design
• consists of complex ideas communicated with clarity, precision, and efﬁciency
• is that which gives the viewer the greatest number of ideas in the shortest time with the least
ink in the smallest space
• is almost always multivariate
• requires telling the truth about the data
These guidelines discourage fanciﬁed graphs with multiple fonts, cross hatching, and 3-D effects.
They do not say that color is necessary or helpful. A poor graph does not become better because color
is added.
Style is to a large extent personal. Let us look at five graphical versions of the same data in Figure 3.14.
The graphs show how the downward trend in the average number of bald eagle hatchlings in northwestern
Ontario reversed after DDT was banned in 1973. The top graphic (so easily produced by computer
graphics) does not facilitate understanding the data. It is loaded with what Tufte (1983) calls chartjunk:
three-dimensional boxes and shading. “Every bit of ink on a graphic requires a reason. And nearly always
that reason should be that the ink presents new information (Tufte, 1983).” The two bar charts in the
middle row are clear. The version on the right is cleaner and clearer (the box frame is not needed). The
white lines through the bars serve as the vertical scale. The two graphs in the bottom row are better yet.
The bars become dots with a line added to emphasize the trend. The version on the right smooths the
trend with a curve and adds a note to show when DDT was banned.
Most data sets, like this simple one, can be plotted in a variety of ways. The viewer will appreciate the
effort required to explore variations and present one that is clear, precise, and efficient in the presentation
of the essential information.

FIGURE 3.13 Residuals of the chloride data with respect to the average peak value at each concentration level.
[Axes: Standard concentration (mg/L) vs. Residuals from average.]

Should We Always Plot the Data?

According to Farquhar and Farquhar (1891), two 19th-century economists, “Getting information from
a table is like extracting sunlight from a cucumber.” A virtually perfect rule of statistics is “Plot the
data.” There are times, however, when a plot is unnecessary. Figure 3.15 is an example. This is a simplified
reproduction (shading removed) of a published graph that showed five values:
pH = 5    COD = 2300 mg/L    BOD = 1500 mg/L    TSS = 875 mg/L    TDS = 5700 mg/L

FIGURE 3.14 Several versions of plots that show how banning DDT helped the recovery of the bald eagle population in
northwestern Ontario. [Five panels; horizontal axes: Year; vertical axes: Average Number of Bald Eagle Hatchlings
per Area, Mean Number of Young per Breeding Area, Bald Eagle Hatchlings per Nesting Site; annotation: DDT banned
in Ontario in 1973.]

These five values say it all, and better than the graph. Do not use an axe to hack your way through an
open door. Aside from being unnecessary, this chart has three major faults. It confuses units: pH is not
measured in mg/L. Three-dimensional effects make it more difficult to read the numerical values. Using
a log scale makes the values seem nearly the same when they are much different. The 875 mg/L TSS
and the 1500 mg/L BOD have bars that are nearly the same height.
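
A short calculation makes the log-scale complaint concrete. The sketch below assumes the bars are drawn on a log10 axis running from 1 to 10000 mg/L (the limits are an assumption based on the tick labels of Figure 3.15) and uses the four concentrations listed above; pH is left out because it is not a concentration.

```python
import math

# Concentrations quoted in the text (mg/L).
values = {"COD": 2300, "BOD": 1500, "TSS": 875, "TDS": 5700}

# Assumed axis limits for the log scale.
axis_min, axis_max = 1.0, 10000.0

def bar_height_fraction(value):
    """Fraction of the axis height filled by a bar on the assumed log10 scale."""
    span = math.log10(axis_max) - math.log10(axis_min)
    return (math.log10(value) - math.log10(axis_min)) / span

for name, v in values.items():
    print(f"{name}: {v} mg/L -> {bar_height_fraction(v):.2f} of the axis height")
```

Under these assumptions the 1500 mg/L bar is only about 8% taller than the 875 mg/L bar, although the concentration itself is about 71% larger.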

Summary

Graphical methods are obviously useful for both initial and exploratory data analyses, but they also serve
us well in the ﬁnal analysis. “A picture is worth a thousand words” is a cliché, but still powerfully true.
The right graph may reveal all that is important. If it only tells part of the story, that is the part that is
most likely to be remembered.
Tables of numbers camouﬂage the interesting features of data. The human mind, which is remarkably
well adapted to so many and varied tasks, is simply not capable of extracting useful information from
tabulated ﬁgures. Putting these same numbers in appropriate graphical form completely changes the
situation. The informed human mind can then operate efﬁciently with these graphs as inputs. In short,
suitable graphs of data and the human mind are an effective combination; endless tables of data and the
mind are not.
It is extremely important that plots be kept current because the ﬁrst purpose of keeping these plots
is to help monitor and, if necessary, to troubleshoot difﬁculties as they arise. The plots do not have to
be beautiful, or computer drafted, to be useful. Make simple plots by hand as the data become available. If
the plots are made at some future date to provide a record of what happened in the distant past, it will
be too late to take appropriate action to improve performance. The second purpose is to have an accurate
record of what has happened in the past, especially if the salient information is in such a form that it is
easily communicated and readily understood. If they are kept up-to-date and used for the ﬁrst purpose,
they can also be used for the second. On the other hand, if they are not kept up-to-date, they may be
useful for the second purpose only. In the interest of efﬁciency, they ought to serve double duty.
Intelligent data analysis begins with plotting the data. Be imaginative. Use a collection of different
graphs to see different aspects of the data. Plotting graphs in a notebook is not as useful as making plots
large and visible. Plots should be displayed in a prominent place so that those concerned with the
environmental system can review them readily.
We close with Tukey’s (1977) declaration: “The greatest value of a picture is when it forces us to
notice what we never expected to see.” (Emphasis and italics in the original.)

FIGURE 3.15 This unnecessary graph, which shows just five values, should be replaced by a table.
[Bar chart; vertical axis: Concentration (mg/L), with ticks at 1, 10, 100, 1000, and 10000.]

References

Anscombe, F. J. (1973). “Graphs in Statistical Analysis,” Am. Statistician, 27, 17–21.
Chatfield, C. (1988). Problem Solving: A Statistician’s Guide, London, Chapman & Hall.
Chatfield, C. (1991). “Avoiding Statistical Pitfalls,” Stat. Sci., 6(3), 240–268.
Cleveland, W. S. (1990). The Elements of Graphing Data, 2nd ed., Summit, NJ, Hobart Press.
Cleveland, W. S. (1994). Visualizing Data, Summit, NJ, Hobart Press.
Farquhar, A. B. and H. Farquhar (1891). Economic and Industrial Delusions: A Discourse of the Case for
Protection, New York, Putnam.
Gameson, A. L. H., G. A. Truesdale, and M. J. Van Overdijk (1961). “Variation in Performance of Twelve
Replicate Small-Scale Percolating Filters,” Water and Waste Treatment J., 9, 342–350.
Hunter, J. S. (1988). “The Digidot Plot,” Am. Statistician, 42, 54.
Tufte, E. R. (1983). The Visual Display of Quantitative Information, Cheshire, CT, Graphics Press.
Tufte, E. R. (1990). Envisioning Information, Cheshire, CT, Graphics Press.
Tufte, E. R. (1997). Visual Explanations, Cheshire, CT, Graphics Press.
Tukey, J. W. (1977). Exploratory Data Analysis, Reading, MA, Addison-Wesley.
Tukey, J. W. (1990). “Data Based Graphics: Visual Display in the Decades to Come,” Stat. Sci., 5, 327–329.
Wainer, H. (1997). Visual Revelations: Graphical Tales of Fate and Deception from Napoleon Bonaparte to
Ross Perot, New York, Copernicus, Springer-Verlag.

Exercises

3.1 Box-Whisker Plot. For the 11 ordered observations below, make the box-whisker plot to show
the median, the upper and lower quartiles, and the upper and lower cut-off.
36 37 45 52 56 58 66 68 75 90 100

3.2 Phosphorus in Sludge. The values below are annual average concentrations of total phosphorus
in municipal sewage sludge, measured as percent of dry weight solids. Time runs from right
to left. The ﬁrst value is for 1979. Make several plots of the data to discover any trends or
patterns. Try to explain any patterns you discover.
2.7 2.5 2.3 2.4 2.6 2.7 2.6 2.7 2.3 2.9 2.8
2.5 2.6 2.7 2.8 2.6 2.4 2.7 3.0 4.5 4.5 4.3

3.3 Waste Load Survey Data Analysis. The table gives 52 weekly average flow and BOD5 data
for wastewater. Plot the data in a variety of ways that might interest an engineer who needs
to base a treatment plant design on these data. As a minimum, (a) make the time series plots
for BOD concentration, flow, and BOD mass load (lb/day); and (b) determine whether flow
and BOD are correlated.

Week  Flow    BOD       Week  Flow    BOD       Week  Flow    BOD
      (MGD)   (mg/L)          (MGD)   (mg/L)          (MGD)   (mg/L)
   1  3.115   1190        18  3.42    1143        35  2.434   1167
   2  3.08    1211        19  3.276   1213        36  2.484   1042
   3  4.496   1005        20  3.595   1300        37  2.466   1116
   4  3.207   1208        21  4.377   1245        38  2.69    1228
   5  3.881   1349        22  3.28    1211        39  2.026   1156
   6  4.769   1221        23  3.986   1148        40  1.004   1073
   7  3.5     1288        24  3.838   1258        41  1.769   1259
   8  5.373   1193        25  3.424   1289        42  1.63    1337
   9  3.779   1380        26  3.794   1147        43  2.67    1228
  10  3.113   1168        27  2.903   1169        44  1.416   1107
  11  4.008   1250        28  1.055   1102        45  2.164   1298
  12  3.455   1437        29  2.931   1000        46  2.559   1284
  13  3.106   1105        30  2.68    1372        47  1.735   1064
  14  3.583   1155        31  2.048   1077        48  2.073   1245
  15  3.889   1278        32  2.548   1324        49  1.641   1199
  16  4.721   1046        33  1.457   1063        50  2.991   1279
  17  4.241   1068        34  1.68    1242        51  3.031   1203
                                                  52  2.972   1197
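
The BOD mass load asked for in part (a) is a unit conversion, sketched below in Python. The factor 8.34 is the weight in pounds of one gallon of water, so flow in MGD times concentration in mg/L times 8.34 gives lb/day. The function name and the choice to show only the first three weeks are illustrative assumptions.

```python
# Pounds per day for each (MGD x mg/L): one gallon of water weighs about 8.34 lb.
LB_PER_MGD_MG_PER_L = 8.34

def bod_mass_load(flow_mgd, bod_mg_per_l):
    """Return the BOD mass load in lb/day."""
    return flow_mgd * bod_mg_per_l * LB_PER_MGD_MG_PER_L

# First three weeks of the table: (week, flow in MGD, BOD in mg/L).
weeks = [(1, 3.115, 1190), (2, 3.08, 1211), (3, 4.496, 1005)]
for week, flow, bod in weeks:
    print(f"week {week}: {bod_mass_load(flow, bod):,.0f} lb/day")
```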


3.4 Effluent Suspended Solids. The data below are effluent suspended solids data for one year of
a wastewater treatment plant operation. Plot the data and discuss any patterns or characteristics
of the data that might interest plant management or operators.

Day  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug Sept  Oct  Nov  Dec
  1   33   32   16   25   21   20   29   26    6   23   18   11
  2   37   22   14   26   30   11   36   33   18   13   16    9
  3   23   22   12   26   31   22   68   21   13   13   20   12
  4   28   38   14   30   28   21   32   21    8    9   15   15
  5   41   10   13   27   31   21   24   19    9    9   12   18
  6   37   34   23   24   35   27   22   19   22    5   12   17
  7   23   32   15   20   26   58   22   15   38    6   12   19
  8   24   33   22   25   39   40   31   20   11   10   39    6
  9   22   42   17   22   13   25   32   25   10    5    8   10
 10   28   19   22   15   21   24   21   11    8   10   16    9
 11   27   29   28   12   20   26   24   13    9   10   14   10
 12   16   28   11   18   41   24   20   14   10    5   12   12
 13   18   31   41   30   25   32   23   26   12    4   13    9
 14   24   29   30   23   25    9   24   19   14   16   15   11
 15   21   20   34   23   21   13   26   21   15    7   16   14
 16   26   21   24   33   20   12   23   16    9    8   14    8
 17   21   12   19   38   19   15   22    7   18   11   18    9
 18   31   37   35   31   21   20   28   13   20    8   15    4
 19   28   25   27   31   18   24   14   10   29   11   12   17
 20   13   19   20   32   24   20   28   10   33    5   10   21
 21   14   16   60   36   42    9   36   11   14   10   16   28
 22   30   23   68   24   28    8   50   11   11   15   18   42
 23   41    9   46   20   35   15   24   14   26   47   10   22
 24   30   17   27   20   25   20   22   11   34   20   11   16
 25   47   20   23   22   11   19   24   16   38   13    3    9
 26   33   24   26   25   22   28   26   13   40   28    9   12
 27   41   28   26   25   25   29   11   12   41   41    8    9
 28   44   13   42   24   23   25   19   20   18   29    9    7
 29   30        20   33   27   28   21   10   28   29   29   10
 30   15        21   13   33   25   21   13   38   18   11    7
 31   32        22        26        24   14        25        12

3.5 Solid Waste Fuel Value. The table gives fuel values (Btu/lb) for typical solid waste from 35
countries. (The United States is number 35.) Make a matrix scatterplot of the five waste
characteristics and any other plots that might help to identify a plausible model to relate fuel
value to composition.

MSW Components and Fuel Value
Country Paper Metal Glass Food Plastic Fuel Value (Btu/lb)
1 38 11 18 13 0.1 3260
2 35 10 9 24 6 3951
3 2 1 9 40 1 1239
4 20 5.3 8 40 5 3669
5 10 1.7 1.6 54 1.7 2208
6 1 3 6 80 4 2590
7 22 1 2 56 5 3342
8 13.4 6.2 6.6 41.8 4.2 2637
9 32.9 4.1 6.1 44 6.8 4087
10 37 8 8 28 2 3614
11 55 5 6 20 6 4845
12 30 4 4 30 1 2913
13 6 5 9 77 3 2824
14 20 5 10 21 2 2237
15 32 2 10 9 11 3596
16 3 1 8 36 1 1219
17 10 2 1 72 6 3136
18 17.2 1.8 2.1 69.8 3.8 3708
19 31 7 3 36 7 3945
20 21 5.7 3.9 50 6.2 3500
21 12.2 2.7 1.3 42.6 1 2038
22 22.2 3.2 11.9 50 6.2 3558
23 28 6 7 48 0.1 3219
24 15.5 4.5 2.5 51.5 2 2651
25 38.2 2 7.5 30.4 6.5 3953
26 2.2 2.2 1.75 52.5 1.2 1574
27 17 2 5 43 4 2649
28 24 9 8 53 2 3462
29 43 3 1 5 6 3640
30 18 4 3 50 4 2555
31 8 1 6 80 1 2595
32 4 3 30 2.6 1418
33 50 7 8 15 8 4792
34 8 1 3 25 2 1378
35 28.9 9.3 10.4 17.8 3.4 3088
Source: Khan, et al., J. Envir. Eng., ASCE, 117, 376, 1991.

3.6 Highway TPH Contamination. Total petroleum hydrocarbons (TPH) in soil specimens
collected at 30 locations alongside a 44.8-mile stretch of major highway are given in the table
below. The length was divided into 29 segments of 1.5 miles and one segment of 1.3 miles.
The sampling order for these segments was randomized, as was the location within each
segment. Also, the sample collection was randomized with respect to the eastbound or
westbound lane of the roadway. There are duplicate measurements on three specimens. Plot
the data in a variety of ways to check for randomness, independence, trend, and other
interesting patterns.

Distance (mile)  Location  Sample Order  TPH (mg/kg)
 0.7             West      14             19
 1.5             East      15             40
 4.1             West      13             48
 4.8             East      17             23
 7.2             East      30             79
 8.7             West      12             19
 9               West      11            118
11.7             East      18              8
12.6             West      10             91
14.5             East      19             21
15.6             East      20             36
16.6             West      9              16
19.4             West      8              44
20.5             East      21             44
21.7             East      22a           160
21.7             East      22b           153
23.4             West      7             101
25.3             West      6             119
25.6             West      5             129
28.2             East      23a           114
28.2             East      23b            92
28.7             East      24a            43
28.7             East      24b            62
30.9             East      25            230
32.8             West      4              14
33.2             West      3             272
34.6             West      2             242
36.7             West      1              30
38.9             East      26             76
39.8             East      27            208
40.6             East      28            196
42.7             West      16            125
44.3             East      29            167
Source: Phillips, I. (1999). Unpublished paper, Tufts University.
3.7 Heavy Metals. Below are 100 daily observations of wastewater influent and effluent lead (Pb)
concentration, measured in µg/L. State your expectation for the relation
between influent and effluent and then plot the data to see whether your ideas need
modification.
Obs Inf Eff Obs Inf Eff Obs Inf Eff Obs Inf Eff
1 47 2 26 16 7 51 29 1 76 13 1
2 30 4 27 32 9 52 21 1 77 14 1
3 23 4 28 19 6 53 18 1 78 18 1
4 29 1 29 22 4 54 19 1 79 10 1
5 30 6 30 32 4 55 27 1 80 4 1
6 28 1 31 29 7 56 36 2 81 5 1
7 13 6 32 48 2 57 27 1 82 60 2
8 15 3 33 34 1 58 28 1 83 28 1
9 30 6 34 22 1 59 31 1 84 18 1
10 52 6 35 37 2 60 6 1 85 8 11
11 39 5 36 64 19 61 18 1 86 11 1
12 29 2 37 24 15 62 97 1 87 16 1
13 33 4 38 33 36 63 20 1 88 15 1
14 29 5 39 41 2 64 17 2 89 25 3
15 33 4 40 28 2 65 9 3 90 11 1
16 42 7 41 21 3 66 12 6 91 8 1
17 36 10 42 27 1 67 10 5 92 7 1

18 26 4 43 30 1 68 23 5 93 4 1
19 105 82 44 34 1 69 41 4 94 3 1
20 128 93 45 36 3 70 28 4 95 4 1
21 122 2 46 38 2 71 18 4 96 6 1
22 170 156 47 40 2 72 5 1 97 5 2
23 128 103 48 10 2 73 2 1 98 5 1
24 139 128 49 10 1 74 19 10 99 5 1
25 31 7 50 42 1 75 24 10 100 16 1

4 Smoothing Data

KEY WORDS moving average, exponentially weighted moving average, weighting factors, smoothing,
and median smoothing.

Smoothing is drawing a smooth curve through data in order to eliminate the roughness (scatter) that blurs
the fundamental underlying pattern. It sharpens our focus by unhooking our eye from the irregularities.
Smoothing can be thought of as a decomposition of the data. In curve fitting, this decomposition has
the general relation: data = fit + residuals. In smoothing, the analogous expression is:
data = smooth + rough. Because the smooth is intended to be smooth (as the “fit” is smooth in curve
fitting), we usually show its points connected. Similarly, we show the rough (or residuals) as separated
points, if we show them at all. We may choose to show only those rough (residual) points that stand
out markedly from the smooth (Tukey, 1977).
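
The decomposition data = smooth + rough can be illustrated with any smoother. The sketch below uses a running median of three values, chosen here only because median smoothing appears among the key words; the series is made up, and copying the two end points unchanged is one simple convention assumed for the example.

```python
import statistics

def running_median3(y):
    """Running median of 3; the two end points are copied unchanged."""
    smooth = list(y)
    for i in range(1, len(y) - 1):
        smooth[i] = statistics.median(y[i - 1:i + 2])
    return smooth

# Made-up series with one spike at the fourth observation.
data = [5, 6, 5, 20, 6, 7, 6, 8, 7]

smooth = running_median3(data)
rough = [d - s for d, s in zip(data, smooth)]

print("data  :", data)
print("smooth:", smooth)
print("rough :", rough)
```

The spike disappears from the smooth and shows up in the rough, where it stands out markedly from the other values. Adding smooth and rough term by term recovers the original data exactly.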
We will discuss several methods of smoothing to produce graphs that are especially useful with time
series data from treatment plants and complicated environmental systems. The methods are well
established and have a long history of successful use in industry and econometrics. The methods are effective
and economical in terms of time and money. They are simple; they are useful to everyone, regardless
of statistical expertise. Only elementary arithmetic is needed. A computer may be helpful, but is not
needed, especially if one keeps the plot up-to-date by adding points daily or weekly as they become
available.
In statistics and quality control literature, one ﬁnds mathematics and theory that can embellish these
graphs. A formal statistical analysis, such as adding control limits, can become quite complex because
often the assumptions on which such tests are usually based are violated rather badly by environmental
data. These embellishments are discussed in another chapter.

Smoothing Methods

One method of smoothing would be to ﬁt a straight line or polynomial curve to the data. Aside from
the computational bother, this is not a useful general procedure because the very fact that smoothing is
needed means that we cannot see the underlying pattern clearly enough to know what particular polynomial
would be useful.
The simplest smoothing method is to plot the data on a logarithmic scale (or plot the logarithm of
y instead of y itself). Smoothing by plotting the moving averages (MA) or exponentially weighted moving
averages (EWMA) requires only arithmetic.
A moving average (MA) gives equal weight to a sequence of past values; the weight depends on how
many past values are to be remembered. The EWMA gives more weight to recent events and progressively
forgets the past. How quickly the past is forgotten is determined by one parameter. The EWMA will
follow the current observations more closely than the MA. Often this is desirable but this responsiveness
is purchased by a loss in smoothing.
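
The difference between the two averages can be sketched with a few lines of Python. The EWMA recursion used here, z[t] = lam*y[t] + (1 - lam)*z[t-1], is a common textbook form and is an assumption; the chapter may parameterize the weights differently. The data, the window length of 4, and the value lam = 0.5 are likewise made up for illustration.

```python
def moving_average(y, k):
    """Equal-weight average of the current value and the k - 1 values before it."""
    return [sum(y[i - k + 1:i + 1]) / k for i in range(k - 1, len(y))]

def ewma(y, lam):
    """EWMA by the recursion z[t] = lam*y[t] + (1 - lam)*z[t-1], with z[0] = y[0]."""
    z = [y[0]]
    for value in y[1:]:
        z.append(lam * value + (1 - lam) * z[-1])
    return z

# Made-up series with a step change from 10 to 20.
y = [10, 10, 10, 10, 20, 20, 20, 20]

print("MA(4)    :", moving_average(y, 4))   # first value corresponds to the 4th observation
print("EWMA(0.5):", ewma(y, 0.5))
```

Two observations after the step, the 4-point MA has reached 15 while the EWMA has reached 17.5, so with these settings the EWMA follows the current observations more closely. A smaller lam would smooth more and respond more slowly.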
The choice of a smoothing method might be inﬂuenced by the application. Because the EWMA forgets
the past, it may give a more realistic representation of the actual threat of the pollutant to the environment.


For example, the BOD discharged into a freely ﬂowing stream is important the day it is discharged. A
2- or 3-day average might also be important because a few days of dissolved oxygen depression could
be disastrous while one day might be tolerable to aquatic organisms. A 30-day average of BOD could
be a less informative statistic about the threat to ﬁsh than a short-term average, but it may be needed to
assess the long-term trend in treatment plant performance.
For suspended solids that settle on a stream bed and form sludge banks, a long-term average might
be related to depth of the sludge bed and therefore be an informative statistic. If the solids do not settle,
the daily values may be more descriptive of potential damage. For a pollutant that could be ingested by
an organism and later excreted or metabolized, the exponentially weighted moving average might be a
good statistic.

Conversely, some pollutants may not exhibit their effect for years. Carcinogens are an example where
the long-term average could be important. Long-term in this context is years, so the 30-day average would
not be a particularly useful statistic. The ﬁrst ingested (or inhaled) irritants may have more importance
than recently ingested material. If so, perhaps past events should be weighted more heavily than recent
events if a statistic is to relate source of pollution to present effect. Choosing a statistic with the
appropriate weighting could increase the value of the data to biologists, epidemiologists, and others who
seek to relate pollutant discharges to effects on organisms.

Plotting on a Logarithmic Scale

The top panel of Figure 4.1 is a plot of inﬂuent copper concentration at a wastewater treatment plant.
This plot emphasizes the few high values, especially those at days 225, 250, and 340. The bottom panel
shows the same data on a logarithmic scale. Now the process behavior appears more consistent. The
low values are more evident, and the high values do not seem so extreme. The episode around day 250
still looks unusual, but the day 225 and 340 values are above the average (on the log scale) by about
the same amount that the lowest values are below average.
Are the high values so extraordinary as to deserve special attention? Or are they rogue values (outliers)
that can be disregarded? This question cannot be answered without knowing the underlying distribution
of the data. If the underlying process naturally generates data with a lognormal distribution, the high
values ﬁt the general pattern of the data record.
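
A small numerical sketch shows why high values look less extreme on a log scale. The three concentrations below are hypothetical, chosen so that the arithmetic is easy to follow; they are not the copper data of Figure 4.1.

```python
import math

# Hypothetical concentrations spanning two orders of magnitude.
values = [10.0, 100.0, 1000.0]

# Deviations from the mean on the arithmetic scale.
mean = sum(values) / len(values)
arith_dev = [v - mean for v in values]

# Deviations from the mean on the log10 scale.
logs = [math.log10(v) for v in values]
log_mean = sum(logs) / len(logs)
log_dev = [x - log_mean for x in logs]

print("arithmetic deviations:", arith_dev)
print("log-scale deviations :", [round(d, 6) for d in log_dev])
print("geometric mean       :", 10 ** log_mean)
```

On the arithmetic scale the largest value lies far above the mean and the smallest lies much closer to it. On the log scale the highest and lowest values are equally far from the center, which is how data from a lognormal process would be expected to look.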

FIGURE 4.1 Copper data plotted on arithmetic and logarithmic scales give a different impression about the high values.
[Two panels; horizontal axis: Days; vertical axes: Copper (mg/L), arithmetic scale (top) and logarithmic scale (bottom).]
