
Applied statistics for civil and environmental engineers, 2nd edition


SOFTbank E-Book Center Tehran, Phone: 66403879,66493070 For Educational Use.

www.ebookcenter.ir


P1: SFK/RPW

P2: SFK/RPW

BLUK154-Kottegoda

QC: SFK/RPW

April 13, 2008

T1: SFK

12:36

APPLIED STATISTICS FOR CIVIL AND
ENVIRONMENTAL ENGINEERS



APPLIED STATISTICS FOR
CIVIL AND ENVIRONMENTAL
ENGINEERS
Second Edition

Nathabandu T. Kottegoda
Department of Hydraulic, Environmental, and Surveying Engineering
Politecnico di Milano, Italy

Renzo Rosso
Department of Hydraulic, Environmental, and Surveying Engineering
Politecnico di Milano, Italy


This edition first published 2008
© 2008 by Blackwell Publishing Ltd and 1997 by The McGraw-Hill Companies, Inc.
Blackwell Publishing was acquired by John Wiley & Sons in February 2007. Blackwell’s publishing
programme has been merged with Wiley’s global Scientific, Technical, and Medical business to form
Wiley-Blackwell.
Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom
Editorial office
9600 Garsington Road, Oxford, OX4 2DQ, United Kingdom
For details of our global editorial offices, for customer services and for information about how to apply for
permission to reuse the copyright material in this book please see our website at
www.wiley.com/wiley-blackwell.
The right of the author to be identified as the author of this work has been asserted in accordance with the
Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted,
in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as
permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be
available in electronic books.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand
names and product names used in this book are trade names, service marks, trademarks or registered
trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in
this book. This publication is designed to provide accurate and authoritative information in regard to the
subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering
professional services. If professional advice or other expert assistance is required, the services of a competent
professional should be sought.

ISBN: 978-1-4051-7917-1
Library of Congress Cataloging-in-Publication Data
Kottegoda, N. T.
Applied statistics for civil and environmental engineers / Nathabandu T. Kottegoda, Renzo Rosso. – 2nd ed.
p. cm.
Prev. ed. published as: Statistics, probability, and reliability for civil and environmental engineers. New York :
McGraw-Hill, c1997.
Includes bibliographical references and index.
ISBN-13: 978-1-4051-7917-1 (hardback : alk. paper)
ISBN-10: 1-4051-7917-1 (hardback : alk. paper) 1. Civil engineering–Statistical methods. 2. Environmental
engineering–Statistical methods. 3. Probabilities. I. Rosso, Renzo. II. Kottegoda, N. T. Statistics, probability,
and reliability for civil and environmental engineers. III. Title.
TA340.K67 2008
519.502 4624–dc22
2007047496
A catalogue record for this book is available from the British Library.
Set in 10/12pt Times by Aptara Inc., New Delhi, India
Printed in Singapore by Utopia Press Pte Ltd
1 2008


Contents

Dedication
Preface to the First Edition
Preface to the Second Edition
Introduction

1 Preliminary Data Analysis
1.1 Graphical Representation
1.1.1 Line diagram or bar chart
1.1.2 Dot diagram
1.1.3 Histogram
1.1.4 Frequency polygon
1.1.5 Cumulative relative frequency diagram
1.1.6 Duration curves
1.1.7 Summary of Section 1.1
1.2 Numerical Summaries of Data
1.2.1 Measures of central tendency
1.2.2 Measures of dispersion
1.2.3 Measure of asymmetry
1.2.4 Measure of peakedness
1.2.5 Summary of Section 1.2
1.3 Exploratory Methods
1.3.1 Stem-and-leaf plot
1.3.2 Box plot
1.3.3 Summary of Section 1.3
1.4 Data Observed in Pairs
1.4.1 Correlation and graphical plots
1.4.2 Covariance and the correlation coefficient
1.4.3 Q-Q plots
1.4.4 Summary of Section 1.4
1.5 Summary for Chapter 1
References
Problems

2 Basic Probability Concepts
2.1 Random Events
2.1.1 Sample space and events
2.1.2 The null event, intersection, and union
2.1.3 Venn diagram and event space
2.1.4 Summary of Section 2.1
2.2 Measures of Probability
2.2.1 Interpretations of probability
2.2.2 Probability axioms
2.2.3 Addition rule
2.2.4 Further properties of probability functions
2.2.5 Conditional probability and multiplication rule
2.2.6 Stochastic independence
2.2.7 Total probability and Bayes’ theorems
2.2.8 Summary of Section 2.2
2.3 Summary for Chapter 2
References
Problems

3 Random Variables and Their Properties
3.1 Random Variables and Probability Distributions
3.1.1 Random variables
3.1.2 Probability mass function
3.1.3 Cumulative distribution function of a discrete random variable
3.1.4 Probability density function
3.1.5 Cumulative distribution function of a continuous random variable
3.1.6 Summary of Section 3.1
3.2 Descriptors of Random Variables
3.2.1 Expectation and other population measures
3.2.2 Generating functions
3.2.3 Estimation of parameters
3.2.4 Summary of Section 3.2
3.3 Multiple Random Variables
3.3.1 Joint probability distributions of discrete variables
3.3.2 Joint probability distributions of continuous variables
3.3.3 Properties of multiple variables
3.3.4 Summary of Section 3.3
3.4 Associated Random Variables and Probabilities
3.4.1 Functions of a random variable
3.4.2 Functions of two or more variables
3.4.3 Properties of derived variables

3.4.4 Compound variables
3.4.5 Summary of Section 3.4
3.5 Copulas
3.6 Summary for Chapter 3
References
Problems

4 Probability Distributions
4.1 Discrete Distributions
4.1.1 Bernoulli distribution
4.1.2 Binomial distribution
4.1.3 Poisson distribution
4.1.4 Geometric and negative binomial distributions
4.1.5 Log-series distribution
4.1.6 Multinomial distribution
4.1.7 Hypergeometric distribution
4.1.8 Summary of Section 4.1
4.2 Continuous Distributions
4.2.1 Uniform distribution
4.2.2 Exponential distribution
4.2.3 Erlang and gamma distribution
4.2.4 Beta distribution
4.2.5 Weibull distribution
4.2.6 Normal distribution
4.2.7 Lognormal distribution
4.2.8 Summary of Section 4.2
4.3 Multivariate Distributions
4.3.1 Bivariate normal distribution
4.3.2 Other bivariate distributions
4.4 Summary for Chapter 4
References
Problems

5 Model Estimation and Testing
5.1 A Review of Terms Related to Random Sampling
5.2 Properties of Estimators
5.2.1 Unbiasedness
5.2.2 Consistency

5.2.3 Minimum variance
5.2.4 Efficiency
5.2.5 Sufficiency
5.2.6 Summary of Section 5.2
5.3 Estimation of Confidence Intervals
5.3.1 Confidence interval estimation of the mean when the standard deviation is known
5.3.2 Confidence interval estimation of the mean when the standard deviation is unknown
5.3.3 Confidence interval for a proportion
5.3.4 Sampling distribution of differences and sums of statistics
5.3.5 Interval estimation for the variance: chi-squared distribution
5.3.6 Summary of Section 5.3
5.4 Hypothesis Testing
5.4.1 Procedure for testing
5.4.2 Probabilities of Type I and Type II errors and the power function
5.4.3 Neyman-Pearson lemma
5.4.4 Tests of hypotheses involving the variance
5.4.5 The F distribution and its use
5.4.6 Summary of Section 5.4
5.5 Nonparametric Methods
5.5.1 Sign test applied to the median
5.5.2 Wilcoxon signed-rank test for association of paired observations
5.5.3 Kruskal-Wallis test for paired observations in k samples
5.5.4 Tests on randomness: runs test
5.5.5 Spearman’s rank correlation coefficient
5.5.6 Summary of Section 5.5
5.6 Goodness-of-Fit Tests
5.6.1 Chi-squared goodness-of-fit test
5.6.2 Kolmogorov-Smirnov goodness-of-fit test
5.6.3 Kolmogorov-Smirnov two-sample test
5.6.4 Anderson-Darling goodness-of-fit test
5.6.5 Other methods for testing the goodness-of-fit to a normal distribution
5.6.6 Summary of Section 5.6
5.7 Analysis of Variance
5.7.1 One-way analysis of variance
5.7.2 Two-way analysis of variance
5.7.3 Summary of Section 5.7

5.8 Probability Plotting Methods and Visual Aids
5.8.1 Probability plotting for uniform distribution
5.8.2 Probability plotting for normal distribution
5.8.3 Probability plotting for Gumbel or EV1 distribution
5.8.4 Probability plotting of other distributions
5.8.5 Visual fitting methods based on the histogram
5.8.6 Summary of Section 5.8
5.9 Identification and Accommodation of Outliers
5.9.1 Hypothesis tests
5.9.2 Test statistics for detection of outliers
5.9.3 Dealing with nonnormal data
5.9.4 Estimation of probabilities of extreme events when outliers are present
5.9.5 Summary of Section 5.9
5.10 Summary of Chapter 5
References
Problems

6 Methods of Regression and Multivariate Analysis
6.1 Simple Linear Regression
6.1.1 Estimates of the parameters
6.1.2 Properties of the estimators and errors
6.1.3 Tests of significance and confidence intervals
6.1.4 The bivariate normal model and correlation
6.1.5 Summary of Section 6.1
6.2 Multiple Linear Regression
6.2.1 Formulation of the model
6.2.2 Linear least squares solutions using the matrix method
6.2.3 Properties of least squares estimators and error variance

6.2.4 Model testing
6.2.5 Model adequacy
6.2.6 Residual plots
6.2.7 Influential observations and outliers in regression
6.2.8 Transformations

6.2.9 Confidence intervals on mean response and prediction
6.2.10 Ridge regression
6.2.11 Other methods and discussion of Section 6.2
6.3 Multivariate Analysis
6.3.1 Principal components analysis
6.3.2 Factor analysis
6.3.3 Cluster analysis
6.3.4 Other methods and summary of Section 6.3
6.4 Spatial Correlation
6.4.1 The estimation problem
6.4.2 Spatial correlation and the semivariogram
6.4.3 Some semivariogram models and physical aspects
6.4.4 Spatial interpolations and Kriging
6.4.5 Summary of Section 6.4
6.5 Summary of Chapter 6
References

Problems

7 Frequency Analysis of Extreme Events
7.1 Order Statistics
7.1.1 Definitions and distributions
7.1.2 Functions of order statistics
7.1.3 Expected value and variance of order statistics
7.1.4 Summary of Section 7.1
7.2 Extreme Value Distributions
7.2.1 Basic concepts of extreme value theory

7.2.2 Gumbel distribution
7.2.3 Fréchet distribution
7.2.4 Weibull distribution as an extreme value model
7.2.5 General extreme value distribution
7.2.6 Contagious extreme value distributions
7.2.7 Use of other distributions as extreme value models
7.2.8 Summary of Section 7.2
7.3 Analysis of Natural Hazards
7.3.1 Floods, storms, and droughts
7.3.2 Earthquakes and volcanic eruptions
7.3.3 Winds
7.3.4 Sea levels and highest sea waves
7.3.5 Summary of Section 7.3
7.4 Summary of Chapter 7
References
Problems

8 Simulation Techniques for Design
8.1 Monte Carlo Simulation
8.1.1 Statistical experiments
8.1.2 Probability integral transform
8.1.3 Sample size and accuracy of Monte Carlo experiments
8.1.4 Summary for Section 8.1
8.2 Generation of Random Numbers
8.2.1 Random outcomes from standard uniform variates
8.2.2 Random outcomes from continuous variates
8.2.3 Random outcomes from discrete variates
8.2.4 Random outcomes from jointly distributed variates
8.2.5 Summary of Section 8.2
8.3 Use of Simulation
8.3.1 Distributions of derived design variates
8.3.2 Sampling statistics
8.3.3 Simulation of time- or space-varying systems
8.3.4 Design alternatives and optimal design
8.3.5 Summary of Section 8.3
8.4 Sensitivity and Uncertainty Analysis
8.5 Summary and Discussion of Chapter 8
References
Problems

9 Risk and Reliability Analysis
9.1 Measures of Reliability
9.1.1 Factors of safety

9.1.2 Safety margin
9.1.3 Reliability index
9.1.4 Performance function and limiting state
9.1.5 Further practical solutions
9.1.6 Summary of Section 9.1
9.2 Multiple Failure Modes
9.2.1 Independent failure modes
9.2.2 Mutually dependent failure modes
9.2.3 Summary of Section 9.2
9.3 Uncertainty in Reliability Assessments
9.3.1 Reliability limits
9.3.2 Bayesian revision of reliability
9.3.3 Summary of Section 9.3
9.4 Temporal Reliability
9.4.1 Failure process and survival time
9.4.2 Hazard function
9.4.3 Reliable life
9.4.4 Summary of Section 9.4
9.5 Reliability-Based Design
9.6 Summary for Chapter 9
References
Problems

10 Bayesian Decision Methods and Parameter Uncertainty
10.1 Basic Decision Theory
10.1.1 Bayes’ rules
10.1.2 Decision trees
10.1.3 The minimax solution
10.1.4 Summary of Section 10.1
10.2 Posterior Bayesian Decision Analysis
10.2.1 Subjective probabilities

10.2.2 Loss and utility functions
10.2.3 The discrete case
10.2.4 Inference with conditional binomial and prior beta

10.2.5 Poisson hazards and gamma prior
10.2.6 Inferences with normal distribution
10.2.7 Likelihood ratio testing
10.2.8 Summary of Section 10.2
10.3 Markov Chain Monte Carlo Methods
10.4 James-Stein Estimators
10.5 Summary and Discussion of Chapter 10
References
Problems

Appendix A: Further mathematics
A.1 Chebyshev Inequality
A.2 Convex Function and Jensen Inequality
A.3 Derivation of the Poisson distribution
A.4 Derivation of the normal distribution
A.5 MGF of the normal distribution
A.6 Central limit theorem
A.7 Pdf of Student’s T distribution
A.8 Pdf of the F distribution
A.9 Wilcoxon signed-rank test: mean and variance of the test statistic
A.10 Spearman’s rank correlation coefficient

Appendix B: Glossary of Symbols
Appendix C: Tables of Selected Distributions
Appendix D: Brief Answers to Selected Problems
Appendix E: Data Lists
Index


Dedication

To my parents. To estimate the debt I owe them requires a lifespan of nibbanic extent. To
Mali, Shani, Siraj, and Natasha.
N.T.K.

A mamma Aria, a Donatella, ai due Riccardi della mia vita e al nostro indimenticabile
Rufus.
R.R.


Preface to the First Edition

Statistics, probability, and reliability are subject areas that students of civil and environmental engineering do not usually find easy. Such difficulties notwithstanding, greater
emphasis is currently being placed on the teaching of these methods throughout institutions
of higher learning. Many professors with whom we have spoken have expressed the need
for a single textbook of sufficient breadth and clarity to cover these topics.
One might ask why it is necessary to write a new book specifically for civil and environmental engineers. Firstly, we see a particular importance of statistical and associated
methods in our disciplines. For example, some modes of failure, interactions, probability
distributions, outliers, and spatial relationships that one encounters are unique and require
different approaches. Secondly, colleagues have said that existing books are either old and
outdated or omit particularly important engineering problems, emphasizing instead areas
that may not be directly relevant to the practitioner.
We set ourselves several objectives in writing this book. First, it was necessary to update
much of the older material, which has rightly stood for decades, indeed even centuries.
Second, we had to look at the engineer’s structures, waterways, and the like and bring in
as much material as possible for the tasks at hand. We felt an urgent need to modernize,
incorporate new concepts throughout, and reduce or eliminate the impact of some topics.
We aimed to order the material in a logical sequence. In particular we tried to adopt a
writing style and method of presentation that are lively and without overrigorous drudgery.
These had to be accomplished without compromising a deep and thorough treatment of
fundamentals.
The layout of the book is straightforward, so it can be used to suit one’s personal needs.
We apologize to any readers who think we have strayed from the path of simplicity in
certain parts, such as the associated variables and contagious distributions of Chapter 3
and the order statistics of Chapter 7. One might wish to omit these sections on a first
reading. The introductions to the chapters will be helpful for this purpose.
The explanation of the theory is accompanied by the assumptions made. Definitions are
separately highlighted. In many places we point out the limitations and pitfalls or violations. There are warnings of possible misuses, misunderstandings, and misinterpretations.
We provide guidance to the proper interpretation of statistical results.
The numerous examples, for which we have for the most part used recorded observations, will be helpful to beginners as well as to mature students who will consult the text
as a reference. We hope these examples will lead to a better understanding of the material
and design variabilities, a prelude to the making of sound decisions.
Each chapter concludes with extensive homework problems. In many instances, as in
Chapter 1, they are based on real data not used elsewhere in the text. We have not used
cards or dice or coins or black and red balls in any of the problems and examples. Answers
to selected problems are summarized in Appendix D. A detailed manual of solutions is
available.
Computers are continuously becoming cheaper and more powerful. Newer ways of
handling data are being devised. At the inception, we seriously considered the use of
commercial software packages to enhance the scope of the book. However, the problem
of choosing one from the many suitable packages acted as a deterrent. Our concern was the
serious limitations imposed by utilizing a source that necessitates corresponding purchase
by an adopting school or by individual engineers. Besides, the calculations illustrated
in the book can be made using worksheets available as standard software for personal
computers. As an aid, the data in Appendix E will be placed on the Internet.
We have utilized the space saved (from jargon and notation of a particular software,
output, graphs, and tables) to widen the scope, make our explanations more thorough,
and insert additional illustrations and problems. Readers also have an almost all-inclusive
index, a comprehensive glossary of notation, additional mathematical explanations, and
other material in the appendixes. Furthermore, we hope that the extensive, annotated bibliographies at the end of each chapter, numerous citations and tables, will make this a useful
reference source.
The book is written for use by students, practicing engineers, teachers, and researchers in
civil and environmental engineering and applied statistics; female readers will find no hint
of male chauvinism here. It is designed for a one- or two-semester course and is suitable
for final-year undergraduate and first-year graduate students. The text is self-contained for
study by engineers. A background of elementary calculus and matrix algebra is assumed.

ACKNOWLEDGMENTS
We acknowledge with thanks the work of the staff at Publication Services, Inc., in Champaign, IL. Gianfausto Salvadori gave his time generously in reviewing the manuscript and
providing solutions to some homework problems. Thanks are due again to Adri Buishand
for his elaborate and painstaking reviews. Our publisher solicited other reviewers whose
reports were useful. Howard Tillotson and colleagues at the University of Birmingham,
England, provided data and some student problems. Discussions with Tony Lawrance at
lunch in the University Staff House and the example problem he solved at Helsinki Airport
are appreciated. Valuable assistance was provided by Giovanni Solari and Giulio Ballio in
wind and steel engineering, respectively. In addition, Giovanni Vannuchi was consulted
on geotechnical engineering. Research staff and doctoral students at the Politecnico di Milano helped with the homework problems and the preparation of the index. Dora Tartaglia
worked diligently on revisions to the manuscript. We thank the publishers, companies,
and individuals who gave us permission to use their material, data, and tables; some of the
tables were obtained through our own resources. We shall be pleased to have any omissions
brought to our notice. The support and hospitality provided at the Università degli Studi di
Pavia by Luigi Natale and others are acknowledged with thanks. Most importantly, without
the patience and tolerance of our families this book could not have been completed.

N. T. Kottegoda
R. Rosso
Milano, Italy
1 July 1996


Preface to the Second Edition

Last year a senior European professor, who uses our book, was visiting us in Milano.
When told of the revisions underway he expressed some surprise. “There is nothing to
revise,” he said. But all books need revision sooner or later, especially a multidimensional
one. The equations, examples, problems, figures, tables, references, and footnotes are all
subject to inevitable human fallibilities: typographical errors and errors of fact. Our first
objective was to bring the text as close to the ideal state as possible. The second priority
was to modernize.
In Chapter 10, a new section is added on Markov chain Monte Carlo modeling; this has
popularized Bayesian methods in recent years; there is a full description and case study
on Gibbs sampling. In Chapter 8 on simulation, we include a new section on sensitivity
analysis and uncertainty analysis; a clear and detailed distinction is made between epistemic and aleatory uncertainties; their implications in decision-making are discussed. In
Chapter 7 on Frequency Analysis of Extreme Events, natural hazards and flood hydrology
are updated. In Chapter 6 on regression analysis, further considerations have been made on
the diagnostics of regression; there are new discussions on general and generalized linear
models. In Chapter 5 on Model Estimation and Testing we give special importance to the
Anderson-Darling goodness-of-fit test because of its sensitivity to departures in the tail
areas of a probability density function; we make applications to nonnormal distributions
using the same data as in the estimation of parameters. In Chapter 3 a section is added on
the novel method of copulas with particular emphasis on bivariate distributions. We have
revised the problems following Basic Probability Concepts in Chapter 2. Other chapters
are also revised and modernized and the annotated references are updated.
As before, we have kept in mind the scientific method of Claude Bernard, the French
medical researcher of the nineteenth century. This had three essential parts: observation of
phenomena in nature (seen in Appendix E, and in the examples and problems), observation
of experiments (as reported in each chapter), and the theoretical part (clear enough for the
audience in mind, but without over-simplification).
“Nobody trusts a model except the one who originated it; everybody trusts data except
those who record it.” Models and data are subject to uncertainty. There is still a gap
between models and data. We attempt to bridge this gap.
The title of the book has been abridged from Statistics, Probability, and Reliability
for Civil and Environmental Engineers to Applied Statistics for Civil and Environmental
Engineers. The applications and problems pertain almost equally to both disciplines and
all areas are included.
Another aspect we emphasized before was that the calculations illustrated in the book
can be made using worksheets available as standard software for personal computers.
Alternatively, R, which is now commonplace, can be downloaded free of charge and adopted
to run some of the homework problems, if one so prefers. Our decision not to recommend
the use of particular commercial software packages, by giving details of jargon, notation,
and so on, seems to be justified. We find that a specific version soon becomes obsolete with
the advent of a new version.
A limited access solutions manual is available with the data from Appendix E on the
Wiley-Blackwell website [www.blackwellpublishing.com/kottegoda].

We are grateful for the encouragement given by many users of the first edition, and
to the few who pointed out some discrepancies. We thank the anonymous reviewers for
their useful comments. Gianfausto Salvadori, Carlo De Michele, Adri Buishand, and Tony
Lawrance assisted us again in the revisions. Julia Burden and Lucy Alexander of Blackwell
Publishing supported us throughout the project. Università degli Studi di Pavia is thanked
for continued hospitality. The help provided by Fabrizio Borsa and Enrico Raiteri in the
preparation of some figures is acknowledged.
N. T. Kottegoda
R. Rosso
Milano, Italy
14 September 2007


Introduction

As a wide-ranging discipline, statistics concerns numerous procedures for deriving information from data that have been affected by chance variations. On the basis of scientific
experiments, one may record and make summaries of observations, quantify variations,
or other changes of significance, and compare data sequences by means of some numbers
or characteristics. The use of statistics in this way is for descriptive purposes. At a more
sophisticated level of analysis and interpretation, one can, for instance, test hypotheses
using the inferential approach developed during the twentieth century. Thus it may be
ascertained, for instance, whether the change of an ingredient affects the properties of
a concrete or whether a particular method of surfacing produces a longer-lasting road;
this approach often includes the estimation by means of observations of the parameters
of a statistical model. Then inferences can be drawn from data and predictions made or
decisions taken. When faced with uncertainty, this last phase is the principal aim of a civil
or environmental engineer acting as an applied statistician.
In all activities, engineers have to cope with possible uncertainties. Observations of soil
pressures, tensile strengths of concrete, yield strengths of steel, traffic densities, rainfalls,
river flows, and pollution loads in streams vary from one case to the next for apparently
unknown reasons or on account of factors that cannot be assessed to any degree of accuracy. However, designs need to be completed and structures, highways, water supply, and
sewerage schemes constructed. Sound engineering judgment, in fact, springs from physical and mathematical theories, but it goes far beyond that. Randomness in nature must be
taken into account. Thus the onus of dealing with the uncertainties lies with the engineer.
The appropriate methods of tackling the uncertainty vary with different circumstances.
The key is often the dispersion that is commonly evidenced in available data sets. Some
phenomena may have negligible or low variability. In such a case, the mean of past observations may be used as a descriptor, for example, the elastic constant of a steel. Nevertheless,
the consequences of a possible change in the mean should also be considered. Frequently,
the variability in observations is found to be quite substantial. In such situations, an engineer sometimes uses, rather conservatively, a design value such as the peak storm runoff
or the compressive strength of a concrete. Alternatively, it has been the practice to express
the ability of a component in a structure to withstand a specified loading without failure
or a permissible deflection by a so-called factor of safety; this is in effect a blanket to
cover all possible contingencies. However, we envisage some problems here in following
a purely deterministic approach because there are doubts concerning the consistency of
specified strengths, flows, loads, or factors from one case to another. These cannot be
lightly dismissed or easily compounded when the consequences of ignoring variability
are detrimental or, in general, if the decision is sensitive to a particular uncertainty. (Often
there are crucial economic considerations in these matters.) This obstacle strongly suggests that the way forward is by treating statistics and probability as necessary aids in
decision making, thus coping with uncertainty through the engineering process.
Note that statistical methods are in no way intended to replace the physical knowledge and experience of the engineer and his or her skills in experimentation. The engineer
should know how the measurements are made and recorded and how errors may arise from
possible limitations in the equipment. There should be readiness to make changes and improvements so that the data-gathering process is as reliable and representative as possible.
On this basis, statistics can be a complementary and a valuable aid to technology. In prudent
hands it can lead to the best practical assessment of what is partially known or uncertain.
The quantification of uncertainty and the assessment of its effects on design and implementation must include concepts and methods of probability, because statistics is built
on the foundation of probability theory. In addition, decision making under risk involves
the use of applied probability. Historically, probability theory arose as a branch of mathematics concerned with the analysis of certain games of chance; it consequently found
applications in the measurement and understanding of uncertainty in innumerable natural
phenomena and human activities. The fundamental interrelationship between statistics
and probability is clearly evident in practice. As seen in past decades, there has been an
irreversible change in emphasis from descriptive to inferential statistics. In this respect
we must note that statistical inferences and the risk and reliability of decision making
under uncertainty are evaluated through applied probability, using frequentist or Bayesian
estimation. This applies to the most widely used methods. Alternatives that come under
generalized information theory are now available.
The reliability of a system, structure, or component is the complement of its probability
of failure. Risk and reliability analysis, however, entail many activities. The survival probability of a system is usually stated in terms of the reliabilities of its components. The
modeling process is an essential part of the analysis, and time can be an important factor.
Also, the risk factor that one computes may be inherent, additional, or composite. All
these points show that reliability design deserves special emphasis.
Methods of reducing data, reviewed in Chapter 1, begin with tabulation and graphical
representation, which are necessary first steps in understanding the uncertainty in data and
the inherent variability. Numerical summaries provide descriptions for further analysis.
Exploratory methods are followed by relationships between data observed in pairs. Thus
the investigation begins. The route is long and diverse, because statistics is the science
and art of experimenting, collecting, analyzing, and making inferences from data. This
opening chapter provides a route map of what is to follow so that one can gain insight
into the numerous tools statistics offers and realize the variety of problems that can be
tackled. In Chapters 2 and 3, we develop a background in probability theory for coping
with uncertainty in engineering. Using basic concepts, we then discuss the total probability
and Bayes’ theorems and define statistical properties of distributions used for estimation
purposes. Chapter 4 examines various mathematical models of random processes. There
is a wide-ranging discussion of discrete and continuous distributions; joint and derived
types are also given in Chapters 3 and 4; we introduce copulas that can effectively model
joint distributions. Model estimation and testing methods, such as confidence intervals,
hypothesis testing, analysis of variance, probability plotting, and identification of outliers,
are treated in Chapter 5. The estimation and testing are based on the principle that all
suppositions need to be carefully examined in light of experimentation and observation.
Details of regression and multivariate statistical methods are provided in Chapter 6, along
with principal component analysis and associated methods and spatial correlation. Extreme
value analysis applied to floods, droughts, winds, earthquakes, and other natural hazards is
found in Chapter 7; some special types of models are included. Simulation is the subject of
Chapter 8, which comprises the use of simulation in design and for other practical purposes;
also, we discuss sensitivity analysis and uncertainty analysis of the aleatory and epistemic
types. In Chapter 9, risk and reliability analysis and reliability design are developed in
detail. Chapter 10 is devoted to Bayesian and other types of economic decision making,
used when the engineer faces uncertainty; we include here Markov chain Monte Carlo
methods that have recently popularized the Bayesian approach.


Chapter 1

Preliminary Data Analysis

All natural processes, as well as those devised by humans, are subject to variability.
Civil engineers are aware, for example, that crushing strengths of concrete, soil pressures,
strengths of welds, traffic flow, floods, and pollution loads in streams have wide variations.
These may arise on account of natural changes in properties, differences in interactions
between the ingredients of a material, environmental factors, or other causes. To cope
with uncertainty, the engineer must first obtain and investigate a sample of data, such as
a set of flow data or triaxial test results. The sample is used in applying statistics and
probability at the descriptive stage. For inferential purposes, however, one needs to make
decisions regarding the population from which the sample is drawn. By this we mean the
total or aggregate, which, for most physical processes, is the virtually unlimited universe
of all possible measurements. The main interest of the statistician is in the aggregation;
the individual items provide the hints, clues, and evidence.
A data set comprises a number of measurements of a phenomenon such as the failure
load of a structural component. The quantities measured are termed variables, each of
which may take any one of a specified set of values. Because of its inherent randomness
and hence unpredictability, a phenomenon that an engineer or scientist usually encounters
is referred to as a random variable, a name given to any quantity whose value depends
on chance.¹ Random variables are usually denoted by capital letters. These are classified
by the form that their values can possibly take (or are assumed to take). The pattern of
variability is called a distribution. A continuous variable can have any value on a continuous scale between two limits, such as the volume of water flowing in a river per second
or the amount of daily rainfall measured in some city. A discrete variable, on the contrary,
can only assume countable isolated numbers like integers, such as the number of vehicles
turning left at an intersection, or other distinct values.
Having obtained a sample of data, the first step is its presentation. Consider, for example, the modulus of rupture data for a certain type of timber shown in Table E.1.1, in
Appendix E. The initial problem facing the civil engineer is that such an array of data by
itself does not give a clear idea of the underlying characteristics of the stress values in
this natural type of construction material. To extract the salient features and the particular
types of information one needs, one must summarize the data and present them in some
readily comprehensible forms. There are several methods of presentation, organization,
and reduction of data. Graphical methods constitute the first approach.

1.1 GRAPHICAL REPRESENTATION
If “a picture is worth a thousand words,” then graphical techniques provide an excellent
method to visualize the variability and other properties of a set of data. To the powerful
interactive system of one’s brain and eyes, graphical displays provide insight into the form
and shape of the data and lead to a preliminary concept of the generating process. We
proceed by assembling the data into graphs, scanning the details, and noting the important
characteristics. There are numerous types of graphs. Line and dot diagrams, histograms,
relative frequency polygons, and cumulative frequency curves are given in this section.
Subsequently, exploratory methods, such as stem-and-leaf plots and box diagrams and
graphs depicting a possible association between two variables, are presented in Sections
1.3 and 1.4. We begin with the simple task of counting.

¹ The term random variable will be formally defined in Section 3.1.

1.1.1 Line diagram or bar chart
The occurrences of a discrete variable can be classified on a line diagram or bar chart.
In this type of graph, the horizontal axis gives the values of the discrete variable and the
occurrences are represented by the heights of vertical lines. The horizontal spread of these
lines and their relative heights indicate the variability and other characteristics of the data.
Example 1.1. Flood occurrences. Consider the annual number of floods of the Magra River
at Calamazza, situated between Pisa and Genoa in northwestern Italy, over a 34-year period,
as shown in Table 1.1.1.
A flood in the river at the point of measurement means the river has risen above a specified
level, beyond which the river poses a threat to lives and property. The data are plotted in
Fig. 1.1.1 as a line diagram.
The data suggest a symmetrical distribution with a midlocation of four floods per year.
In some other river basins, there is a nonlinear decrease in the occurrences for increasing
numbers of floods in a year commencing at zero, showing a negative exponential type of
variation.
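
As a rough illustration, the counts of Table 1.1.1 can be drawn as a text line diagram. The sketch below is in Python, which the original text does not use; the names flood_counts and line_diagram are hypothetical, and the character output is only a stand-in for Fig. 1.1.1.

```python
# Counts from Table 1.1.1: number of floods in a year -> number of years
flood_counts = {0: 0, 1: 2, 2: 6, 3: 7, 4: 9, 5: 4, 6: 1, 7: 4, 8: 1, 9: 0}


def line_diagram(counts):
    """Return a text line diagram: one column of '|' marks per value."""
    top = max(counts.values())
    values = sorted(counts)
    rows = []
    # Draw from the highest occurrence level down to 1.
    for level in range(top, 0, -1):
        marks = " ".join("|" if counts[v] >= level else " " for v in values)
        rows.append(f"{level:2d} {marks}")
    # Horizontal axis: the values of the discrete variable.
    rows.append("   " + " ".join(str(v) for v in values))
    return "\n".join(rows)


total_years = sum(flood_counts.values())
most_frequent = max(flood_counts, key=flood_counts.get)

print(line_diagram(flood_counts))
print("years of record:", total_years)
print("most frequent annual count:", most_frequent)
```

The tallest line, at four floods per year, marks the most frequent count, and the counts sum to the 34 years of record.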


Table 1.1.1 Number of flood occurrences per year from 1939 to 1972 at the gauging station of
Calamazza on the Magra River, between Pisa and Genoa in northwestern Italyᵃ

Number of floods in a year    Number of occurrences
            0                           0
            1                           2
            2                           6
            3                           7
            4                           9
            5                           4
            6                           1
            7                           4
            8                           1
            9                           0
          Total                        34

ᵃ A flood occurrence is defined as river discharge exceeding 300 m³/s.

Fig. 1.1.1 Line diagram for flood occurrences in the Magra River at Calamazza between Genoa
and Pisa in northwestern Italy. [Horizontal axis: number of floods; vertical axis: number of occurrences.]

1.1.2 Dot diagram
A different type of graph is required to present continuous data. If the data are few (say,
less than 25 items) a dot diagram is a useful visual aid. Consider the possibility that only
the first 15 items of data in Table E.1.1 (which shows the modulus of rupture in N/mm²
for 50 mm × 150 mm Swedish redwood and whitewood) are available. The abridged
data are ranked in ascending order and are given in Table 1.1.2 and plotted in Fig. 1.1.2.
The reader can see that the midlocation is close to 40 N/mm² but the wide spread makes
this location difficult to discern. A larger sample should certainly be helpful.
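
The midlocation and spread of the short sample can be checked with a brief sketch. It is written in Python as an illustration only; the names strengths and dot_diagram are hypothetical, and placing dots in 1 N/mm² slots is an assumption made for text output, not a feature of Fig. 1.1.2.

```python
from statistics import median

# The 15 ranked values of Table 1.1.2 (modulus of rupture, N/mm^2)
strengths = [29.11, 29.93, 32.02, 32.40, 33.06, 34.12, 35.58, 39.34,
             40.53, 41.64, 45.54, 48.37, 48.78, 50.98, 65.35]


def dot_diagram(data, low, high, step=1.0):
    """Return a text dot diagram on the axis [low, high].

    Each value is placed in the step-wide slot that contains it; values
    that share a slot are stacked vertically so that no dot is lost.
    """
    slots = int((high - low) / step) + 1
    counts = [0] * slots
    for x in data:
        counts[int((x - low) // step)] += 1
    rows = []
    for level in range(max(counts), 0, -1):
        rows.append("".join("." if c >= level else " " for c in counts))
    return "\n".join(rows)


mid = median(strengths)                    # middle (8th) of the 15 values
spread = max(strengths) - min(strengths)   # range of the short sample

print(dot_diagram(strengths, low=25, high=70))
print(f"median = {mid:.2f} N/mm^2, range = {spread:.2f} N/mm^2")
```

The median of the 15 values is 39.34 N/mm², which agrees with a midlocation close to 40 N/mm², while the single value at 65.35 N/mm² stretches the range.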
1.1.3 Histogram
If there are at least, say, 25 observations, one of the most common graphical forms is a
block diagram called the histogram. For this purpose, the data are divided into groups
according to their magnitudes. The horizontal axis of the graph gives the magnitudes.
Blocks are drawn to represent the groups, each of which has a distinct upper and lower
limit. The area of a block is proportional to the number of occurrences in the group.
The variability of the data is shown by the horizontal spread of the blocks, and the most
common values are found in blocks with the largest areas. Other features such as the
symmetry of the data or lack of it are also shown.
The first step is to take into account the range r of the observations, that is, the difference
between the largest and smallest values.
Example 1.2. Timber strength. We go back to the timber strength data given in Table E.1.1.
They are arranged in order of magnitude in Table 1.1.3.
There are n = 165 observations with somewhat high variability, as expected, because
timber is a naturally variable material. Here the range r = 70.22 − 0.00 = 70.22 N/mm².

To draw a histogram, one divides the range into a number of classes or cells, $n_c$. The
number of occurrences in each class is counted and tabulated. These are called frequencies.

Table 1.1.2 The first 15 items of modulus of rupture data measuring
timber strengths in N/mm², from Table E.1.1 (commencing with the
top row), ranked in increasing order

29.11  29.93  32.02  32.40  33.06  34.12  35.58  39.34
40.53  41.64  45.54  48.37  48.78  50.98  65.35


Fig. 1.1.2 Dot diagram for a short sample of timber strengths from Table 1.1.3.
[Horizontal axis: modulus of rupture, N/mm², from 25 to 70.]

The width of the classes is usually made equal to facilitate interpretation. For some work
such as the fitting of a theoretical function to observed frequencies, however, unequal class
widths are used. Care should be exercised in the choice of the number of classes, $n_c$. Too
few will cause an omission of some important features of the data; too many will not give
a clear overall picture because there may be high fluctuations in the frequencies. A rule
of thumb is to make $n_c = \sqrt{n}$ or an integer close to this, but it should be at least 5 and not
greater than 25. Thus, histograms based on fewer than 25 items may not be meaningful.
Sturges (1926) suggested the approximation

$$n_c = 1 + 3.3 \log_{10} n. \tag{1.1.1}$$
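
The two rules can be compared on the timber data, and the resulting frequencies tabulated, with a short sketch. It is written in Python for illustration; the function class_frequencies is hypothetical, rounding each rule to the nearest integer is an assumption, and so is the convention of assigning the largest observation to the last class.

```python
import math

# Ranked timber strengths of Table 1.1.3 (modulus of rupture, N/mm^2)
strengths = [
    0.00, 17.98, 22.67, 22.74, 22.75, 23.14, 23.16, 23.19, 24.09, 24.25, 24.84,
    25.39, 25.98, 26.63, 27.31, 27.90, 27.93, 28.00, 28.13, 28.46, 28.69, 28.71,
    28.76, 28.83, 28.97, 28.98, 29.11, 29.90, 29.93, 30.02, 30.05, 30.33, 30.53,
    31.33, 31.60, 32.02, 32.03, 32.40, 32.48, 32.68, 32.76, 33.06, 33.14, 33.18,
    33.19, 33.47, 33.61, 33.71, 33.92, 34.12, 34.40, 34.44, 34.49, 34.56, 34.63,
    35.03, 35.17, 35.30, 35.43, 35.58, 35.67, 35.88, 35.89, 36.00, 36.38, 36.47,
    36.53, 36.81, 36.84, 36.85, 36.88, 36.92, 37.51, 37.65, 37.69, 37.78, 38.00,
    38.05, 38.16, 38.64, 38.71, 38.81, 39.05, 39.15, 39.20, 39.21, 39.33, 39.34,
    39.60, 39.62, 39.77, 39.93, 39.97, 40.20, 40.27, 40.39, 40.53, 40.71, 40.85,
    40.85, 41.64, 41.72, 41.75, 41.78, 41.85, 42.31, 42.47, 43.07, 43.12, 43.26,
    43.33, 43.33, 43.41, 43.48, 43.48, 43.64, 43.99, 44.00, 44.07, 44.30, 44.36,
    44.36, 44.51, 44.54, 44.59, 44.78, 44.78, 45.19, 45.54, 45.92, 45.97, 46.01,
    46.33, 46.50, 46.86, 46.99, 47.25, 47.42, 47.61, 47.74, 47.83, 48.37, 48.39,
    48.78, 49.57, 49.59, 49.65, 50.91, 50.98, 51.39, 51.90, 53.00, 53.63, 53.99,
    54.04, 54.71, 55.23, 56.60, 56.80, 57.99, 58.34, 65.35, 65.61, 69.07, 70.22,
]

n = len(strengths)
r = max(strengths) - min(strengths)          # range, as in Example 1.2

nc_sqrt = round(math.sqrt(n))                # rule of thumb: n_c close to sqrt(n)
nc_sturges = round(1 + 3.3 * math.log10(n))  # Eq. (1.1.1), rounded (assumption)


def class_frequencies(data, n_classes):
    """Count observations in n_classes equal-width classes over the range.

    The largest value would fall on the upper boundary of the last class,
    so it is assigned to that class (an assumed convention).
    """
    low = min(data)
    width = (max(data) - low) / n_classes
    freq = [0] * n_classes
    for x in data:
        k = min(int((x - low) / width), n_classes - 1)
        freq[k] += 1
    return width, freq


width, freq = class_frequencies(strengths, nc_sqrt)
print(f"n = {n}, r = {r:.2f} N/mm^2")
print(f"square-root rule: n_c = {nc_sqrt}; Sturges: n_c = {nc_sturges}")
print(f"class width = {width:.2f} N/mm^2")
print("frequencies:", freq)
```

For n = 165 the square-root rule gives 13 classes and Eq. (1.1.1) gives 8, both within the suggested limits of 5 and 25. In practice one would usually round the class width to a convenient value.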

A more theoretically based alternative follows the work of Freedman and Diaconis (1981):²

$$n_c = \frac{r\, n^{1/3}}{2\, \mathrm{iqr}}. \tag{1.1.2}$$

Here iqr is the interquartile range. To clarify this term, we must define $Q_2$, or the
median. This denotes the middle term of a set of data when the values are arranged in
ascending order, or the average of the two middle terms if n is an even number. The first
or lower quartile, $Q_1$, is the median of the lower half of the data, and likewise the third
Table 1.1.3 Ranked modulus of rupture data for timber strengths in N/mm², in
ascending orderᵃ (read across the rows)

 0.00  17.98  22.67  22.74  22.75  23.14  23.16  23.19  24.09  24.25  24.84
25.39  25.98  26.63  27.31  27.90  27.93  28.00  28.13  28.46  28.69  28.71
28.76  28.83  28.97  28.98  29.11  29.90  29.93  30.02  30.05  30.33  30.53
31.33  31.60  32.02  32.03  32.40  32.48  32.68  32.76  33.06  33.14  33.18
33.19  33.47  33.61  33.71  33.92  34.12  34.40  34.44  34.49  34.56  34.63
35.03  35.17  35.30  35.43  35.58  35.67  35.88  35.89  36.00  36.38  36.47
36.53  36.81  36.84  36.85  36.88  36.92  37.51  37.65  37.69  37.78  38.00
38.05  38.16  38.64  38.71  38.81  39.05* 39.15  39.20  39.21  39.33  39.34
39.60  39.62  39.77  39.93  39.97  40.20  40.27  40.39  40.53  40.71  40.85
40.85  41.64  41.72  41.75  41.78  41.85  42.31  42.47  43.07  43.12  43.26
43.33  43.33  43.41  43.48  43.48  43.64  43.99  44.00  44.07  44.30  44.36
44.36  44.51  44.54  44.59  44.78  44.78  45.19  45.54  45.92  45.97  46.01
46.33  46.50  46.86  46.99  47.25  47.42  47.61  47.74  47.83  48.37  48.39
48.78  49.57  49.59  49.65  50.91  50.98  51.39  51.90  53.00  53.63  53.99
54.04  54.71  55.23  56.60  56.80  57.99  58.34  65.35  65.61  69.07  70.22

ᵃ The original data set is given in Table E.1.1; n = 165. The median is marked with an asterisk.

² See also Scott (1979).
