RELATIONAL MANAGEMENT and DISPLAY of SITE ENVIRONMENTAL DATA

David W. Rich, Ph.D.

LEWIS PUBLISHERS
A CRC Press Company
Boca Raton  London  New York  Washington, D.C.
This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with
permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish
reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials
or for the consequences of their use.
Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical,
including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior
permission in writing from the publisher.
The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works,
or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying.
Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation, without intent to infringe.
Visit the CRC Press Web site at www.crcpress.com
© 2002 by CRC Press LLC
Lewis Publishers is an imprint of CRC Press LLC


No claim to original U.S. Government works
International Standard Book Number 1-56670-591-6
Library of Congress Card Number 2002019441
Printed in the United States of America 1 2 3 4 5 6 7 8 9 0
Printed on acid-free paper
Library of Congress Cataloging-in-Publication Data
Rich, David William, 1952-
Relational management and display of site environmental data / David W. Rich.
p. cm.
Includes bibliographical references and index.
ISBN 1-56670-591-6 (alk. paper)
1. Pollution—Measurement—Data processing. 2. Environmental monitoring—Data
processing. 3. Database management. I. Title.
TD193 .R53 2002
628.5′028′7—dc21 2002019441
PREFACE
The environmental industry is changing, along with the way it manages data. Many projects
are making a transition from investigation through remediation to ongoing monitoring. Data
management is evolving from individual custom systems for each project to standardized,
centralized databases, and many organizations are starting to realize the cost savings of this
approach. The objective of Relational Management and Display of Site Environmental Data is to
bring together in one place the information necessary to manage the data well, so everyone, from
students to project managers, can learn how to benefit from better data management.
This book has come from many sources. It started out as a set of course notes to help transfer
knowledge about earth science computing and especially environmental data management to our
clients as part of our software and consulting practice. While it is still used for that purpose, it has
evolved into a synthesis of theory and a relation of experience in working with site environmental
data. It is not intended to be the last word on the way things are or should be done, but rather to
help people learn from the experience of others, and avoid mistakes whenever possible.

The book has six main sections plus appendices. Part One provides an overview of the subject
and some general concepts, including a discussion of system data content. Part Two covers system
design and implementation, including database elements, user interface issues, and implementation
and operation of the system. Part Three addresses gathering the data, starting with an overview of
site investigation and remediation, progressing through gathering samples in the field, and ending
with laboratory analysis. Part Four covers the data management process, including importing,
editing, maintaining data quality, and managing multiple projects. Part Five is about using the data
once it is in the database. It starts with selecting data, and then covers various aspects of data
output and analysis including reporting and display; graphs; cross sections and similar displays; a
large chapter on mapping and GIS; statistical analysis; and integration with other programs.
Part Six discusses problems, benefits, and successes with implementing a site environmental
data management system, along with an attempt to look into the future of data management and
environmental projects. Appendices include examples of a needs assessment, a data model, a data
transfer standard, typical constituent parameters, some exercises, a glossary, and a bibliography.
A number of people have contributed directly and indirectly to this book, including my
parents, Dr. Robert and Audrey Rich; Dr. William Fairley, my uncle and professor of geology at
the University of Notre Dame; and Dr. Albert Carozzi, my advisor and friend at the University of
Illinois. Numerous coworkers and friends at Texaco, Inc., Shell Oil Company, Sabine Corporation,
Grant Environmental, and Geotech Computer Systems, Inc. helped bring me to the point
professionally where I could write this book. These include Larry Ratliff, Jim Thomson, Dr. James
L. Grant, Neil Geitner, Steve Wampler, Jim Quin, Cathryn Stewart, Bill Thoen, Judy Mitchell, Dr.
Mike Wiley, and other Geotech staff members who helped with the book in various ways. Friends
in other organizations have also helped me greatly in this process, including Jim Reed of
RockWare, Tom Bresnahan of Golden Software, and other early members of the Computer
Oriented Geological Society. Thanks also go to Dr. William Ganus, Roy Widmann, Sherron
Hendricks, and Frank Schultz of Kerr-McGee for their guidance.
I would also like to specifically thank those who reviewed all or part of the book, including
Cathryn Stewart (AquAeTer), Bill Thoen (GISNet), Mike Keester (Oklahoma State University),
Bill Ganus and Roy Widmann (Kerr-McGee), Mike Wiley (The Consulting Operation), and Sue Stefanosky and Steve Clough (Roy F. Weston). The improvements are theirs. The errors are still
mine.
Finally, my wife, business partner, and best friend, Toni Rich, has supported me throughout
my career, hanging in there through the good times and bad, and has always done what she could to
make our enterprise successful. She’s also a great proofreader.
Throughout this book a number of trademarks and registered trademarks are used. The
registered trademarks are registered in the United States, and may be registered in other countries.
Any omissions are unintentional and will be remedied in later editions. Enviro Data and Spase are
registered trademarks of Geotech Computer Systems, Incorporated. Microsoft, Office, Windows,
NT, Access, SQL Server, Visual Basic, Excel, and FoxPro are trademarks or registered trademarks
of Microsoft Corporation. Oracle is a registered trademark of Oracle Corporation. Paradox and
dBase are registered trademarks of Borland International, Incorporated. IBM and DB2 are
registered trademarks of International Business Machines Corporation. AutoCAD and AutoCAD
Map are registered trademarks of Autodesk, Incorporated. ArcView is a registered trademark of
Environmental Systems Research Institute, Incorporated. Norton Ghost is a trademark of Symantec
Corporation. Apple and Macintosh are registered trademarks of Apple Computer, Incorporated.
Sun is a registered trademark and Sparcstation is a trademark of Sun Microsystems. Capability
Maturity Model and CMM are registered trademarks of The Software Engineering Institute of
Carnegie Mellon University. Adobe and Acrobat are registered trademarks of Adobe Systems.
Grapher is a trademark and Surfer is a registered trademark of Golden Software, Inc. RockWare is
a registered trademark and RockWorks and Gridzo are trademarks of RockWare, Inc. Intergraph
and GeoMedia are trademarks of Intergraph Corporation. Corel is a trademark and Corel Draw is a
registered trademark of Corel Corporation. UNIX is a registered trademark of The Open Group.
Linux is a trademark of Linus Torvalds. Use of these products is for illustration only, and does not
signify endorsement by the author.
A Web site has been established for updates, exercises, and other information related to this
book. It is located at www.geotech.com/relman.
I welcome your comments and questions and can be reached by email.
David W. Rich

AUTHOR
David W. Rich is founder and president of Geotech Computer Systems, Inc. in Englewood,
CO. Geotech provides off-the-shelf and custom software and consulting services for environmental
data management, GIS, and other technical computing projects. Dr. Rich received his B.S. in
Geology from the University of Notre Dame in 1974, and his M.S. and Ph.D. in Geology from the
University of Illinois in 1977 and 1979, with his dissertation on “Porosity in Oolitic Limestones.”
He worked for Texaco, Inc. in Tulsa, OK and Shell Oil Company in Houston, TX, exploring for oil
and gas in Illinois and Oklahoma. He then moved to Sabine Corporation in Denver, CO as part of a
team that successfully explored for oil in the Minnelusa Formation in the Powder River Basin of
Wyoming. He directed the data management and graphics groups at Grant Environmental in
Englewood, CO where he worked on several projects involving soil and groundwater contaminated
with metals, organics, and radiologic constituents. His team created automated systems for
mapping and cross section generation directly from a database. In 1986 he founded Geotech
Computer Systems, Inc., where he has developed and supervised the development of custom and
commercial software for data management, GIS, statistics, and Web data access.
Environmental projects with which Dr. Rich has been directly involved include two Superfund
wood treating sites, three radioactive material processing facilities, two hazardous waste disposal
facilities, many municipal solid waste landfills, two petroleum refineries, and several mining and
petroleum production and transportation projects. He has been the lead developer on three public
health projects involving blood lead and related data, including detailed residential environmental
measurements. In addition he has been involved in many projects outside of the environmental
field, including a real-time Web-based weather mapping system, an agricultural GIS analysis tool,
and database systems for petroleum exploration and production data, paleontological data, land
ownership, health care tracking, parts inventory and invoice printing, and GPS data capture.
Dr. Rich has been using computers since 1970, and has been applying them to earth science
problems since 1975. He was a co-founder and president of the Computer Oriented Geological
Society in the early 1980s, and has authored or co-authored more than a dozen technical papers,
book chapters, and journal articles on environmental and petroleum data management, geology,
and computer applications. He has taught many short courses on geological and environmental
computing in several countries, and has given dozens of talks at various industry conventions and other events.
When he is not working, Dr. Rich enjoys spending time with his family and riding his
motorcycle in the mountains, and often both at the same time.
CONTENTS
PART ONE - OVERVIEW AND CONCEPTS
CHAPTER 1 - OVERVIEW OF ENVIRONMENTAL DATA MANAGEMENT
Concern for the environment
The computer revolution
Convergence - Environmental data management
Concept of data vs. information
EMS vs. EMIS vs. EDMS
CHAPTER 2 - SITE DATA MANAGEMENT CONCEPTS
Purpose of data management
Types of data storage
Responsibility for data management
Understanding the data
CHAPTER 3 - RELATIONAL DATA MANAGEMENT THEORY
What is relational data management?
History of relational data management
Data normalization
Structured Query Language
Benefits of normalization
Automated normalization
CHAPTER 4 - DATA CONTENT
Data content overview
Project technical data
Project administrative data
Project document data
Reference data

Document management
PART TWO - SYSTEM DESIGN AND IMPLEMENTATION
CHAPTER 5 - GENERAL DESIGN ISSUES
Database management software
Database location options
Distributed vs. centralized databases
The data model
Data access requirements
Government EDMS systems
Other issues
CHAPTER 6 - DATABASE ELEMENTS
Hardware and software components
Units of data storage
Databases and files
Tables (“databases”)
Fields (columns)
Records (rows)
Queries (views)
Other database objects
CHAPTER 7 - THE USER INTERFACE
General user interface issues
Conceptual guidelines
Guidelines for specific elements
Documentation
CHAPTER 8 - IMPLEMENTING THE DATABASE SYSTEM
Designing the system
Buy or build?
Implementing the system
Managing the system

CHAPTER 9 - ONGOING DATA MANAGEMENT ACTIVITIES
Managing the workflow
Managing the data
Administering the system
PART THREE - GATHERING ENVIRONMENTAL DATA
CHAPTER 10 - SITE INVESTIGATION AND REMEDIATION
Overview of environmental regulations
The investigation and remediation process
Environmental Assessments and Environmental Impact Statements
CHAPTER 11 - GATHERING SAMPLES AND DATA IN THE FIELD
General sampling issues
Soil
Sediment
Groundwater
Surface water
Decontamination of equipment
Shipping of samples
Air
Other media
Overview of parameters
CHAPTER 12 - ENVIRONMENTAL LABORATORY ANALYSIS
Laboratory workflow
Sample preparation
Analytical methods
Other analysis issues
PART FOUR - MAINTAINING THE DATA
CHAPTER 13 - IMPORTING DATA
Manual entry
Electronic import

Tracking imports
Undoing an import
Tracking quality
CHAPTER 14 - EDITING DATA
Manual editing
Automated editing
CHAPTER 15 - MAINTAINING AND TRACKING DATA QUALITY
QA vs. QC
The QAPP
QC samples and analyses
Data quality procedures
Database support for data quality and usability
Precision vs. accuracy
Protection from loss
CHAPTER 16 - DATA VERIFICATION AND VALIDATION
Types of data review
Meaning of verification
Meaning of validation
The verification and validation process
Verification and validation checks
Software assistance with verification and validation
CHAPTER 17 - MANAGING MULTIPLE PROJECTS AND DATABASES
One file or many?
Sharing data elements
Moving between databases
Limiting site access
PART FIVE - USING THE DATA
CHAPTER 18 - DATA SELECTION
Text-based queries
Graphical selection

Query-by-form
CHAPTER 19 - REPORTING AND DISPLAY
Text output
Formatted reports
Formatting the result
Interactive output
Electronic distribution of data
CHAPTER 20 - GRAPHS
Graph overview
General concepts
Types of graphs
Graph examples
Curve fitting
Graph theory
CHAPTER 21 - CROSS SECTIONS, FENCE DIAGRAMS, AND 3-D DISPLAYS
Lithologic and wireline logs
Cross sections
Profiles
Fence diagrams and stick displays
Block diagrams and 3-D displays
CHAPTER 22 - MAPPING AND GIS
Mapping concepts
Mapping software
Displaying data
Contouring and modeling
Specialized displays
CHAPTER 23 - STATISTICS AND ENVIRONMENTAL DATA
Statistical concepts
Types of statistical analyses

Outliers and comparison with limits
Toxicology and risk assessment
CHAPTER 24 - INTEGRATION WITH OTHER PROGRAMS
Export-import
Digital output
Export-import advantages and disadvantages
Direct connection
Data warehousing and data mining
Data integration
PART SIX - PROBLEMS, BENEFITS, AND SUCCESSES
CHAPTER 25 - AVOIDING PROBLEMS
Manage expectations
Use the right tool
Prepare for problems with the data
Plan project administration
Increasing the chance of a positive outcome
CHAPTER 26 - SUCCESS STORIES
Financial benefits
Technical benefits
Subjective benefits
CHAPTER 27 - THE FUTURE OF ENVIRONMENTAL DATA MANAGEMENT
PART SEVEN - APPENDICES
APPENDIX A - NEEDS ASSESSMENT EXAMPLE
APPENDIX B - DATA MODEL EXAMPLE
Introduction
Conventions
Primary tables
Lookup tables
Reference tables

Utility tables
APPENDIX C - DATA TRANSFER STANDARD
Purpose
Database background information
Data content
Acceptable file formats
Submittal requirements
Non-conforming data
APPENDIX D - THE PARAMETERS
Overview
Inorganic parameters
Organic parameters
Other parameters
Method reference
APPENDIX E - EXERCISES
Database redesign exercise
Data normalization exercise
Group discussion - data management and your organization
Database redesign exercise solution
Data normalization exercise solution
Database software exercises
APPENDIX F - GLOSSARY
APPENDIX G - BIBLIOGRAPHY
PART ONE - OVERVIEW AND
CONCEPTS
CHAPTER 1
OVERVIEW OF ENVIRONMENTAL
DATA MANAGEMENT

Concern for our environment has been on the rise for many years, and rightly so. At many
industrial facilities and other locations toxic or potentially toxic materials have been released into
the environment in large amounts. While the health impact of these releases has been quite variable
and, in some cases, controversial, it clearly is important to understand the impact or potential
impact of these releases on the public, as well as on the natural environment. This has led to
increased study of the facilities and the areas around them, which has generated a large amount of
data. More and more, people are looking to sophisticated database management technology,
together with related technologies such as geographic information systems and statistical analysis
packages, to make sense of this data. This chapter discusses this increasing concern for the
environment, the growth of computer technology to support environmental data management, and
then some general thoughts on environmental data management in an organization.
CONCERN FOR THE ENVIRONMENT
The United States federal government has been regulating human impact on the environment
for over a century. Section 13 of the River and Harbor Act of 1899 made it unlawful (with some
exceptions) to put any refuse matter into navigable waters (Mackenthun, 1998, p. 20). Since then
hundreds of additional laws have been enacted to protect the environment. This regulation occurs
at all levels of government from international treaties, through federal and state governments, to
individual municipalities. Often this situation of multiple regulatory oversight results in a maze of
regulations that makes even legitimate efforts to improve the situation difficult, but it has definitely
increased the effort to clean up the environment and keep it clean.
Through the 1950s the general public had very little awareness or concern about
environmental issues. In the 1960s concern for the environment began to grow, helped at least
some by the book Silent Spring by Rachel Carson (Carson, 1962). The ongoing significance of this
book is highlighted by the fact that a 1994 edition of the book has a foreword by then Vice
President Al Gore. In this book Ms. Carson brought attention to the widespread and sometimes
indiscriminate use of DDT and other chlorinated hydrocarbons, organic phosphates, arsenic, and
other materials, and the impact of this use on ground and surface water, soil, plants, and animals.
She cites examples of workers overcome by exposure to large doses of chemicals, and changes in
animal populations after use of these chemicals, to build the case that widespread use of these
materials is harmful. She also discusses the link between these chemicals and cancer.

Rachel Carson’s message about concern for the environment came at a time, the 1960s, when
America was ready for a “back-to-the-earth” message. With the youth of America and others
organizing to oppose the war in Vietnam, the two causes fit well together and encouraged each
other’s growth. This was reflected in the music of the time, with many songs in the sixties and
seventies discussing environmental issues, often combined with sentiments against the war and
nuclear power. The war in Vietnam ended, but the environmental movement lives on.
There are many examples of rock songs of the sixties and seventies discussing environmental
issues. In 1968 the rock musical Hair warned about the health effects of sulfur dioxide and carbon
monoxide. Zager and Evans in their 1969 song In The Year 2525 talked about taking from the earth
and not giving back, and in 1970 the Temptations discussed air pollution and many other social
issues in Ball of Confusion. Three Dog Night also warned about air pollution in their 1970 songs
Cowboy and Out in the Country. Perhaps the best example of a song about the environment is
Marvin Gaye’s 1971 song Mercy Mercy Me (The Ecology), in which he talked about oil polluting
the ocean, mercury in fish, and radiation in the air and underground. In 1975 Joni Mitchell told
farmers not to use DDT in her song Big Yellow Taxi, and the incomparable songwriter Bob Dylan
got into the act with his 1976 song A Hard Rain’s A-gonna Fall, warning about poison in water
and global hunger. It’s not a coincidence that this time frame overlaps all of the significant early
environmental regulations.
A good example of an organized environmental effort that started in those days and continues
today is Earth Day. Organized by Senator Gaylord Nelson and patterned after teach-ins against the
war in Vietnam, the first Earth Day was held on April 22, 1970, and an estimated 20 million people
around the country attended, according to television anchor Walter Cronkite. In the 10 years after
the first Earth Day, 28 significant pieces of federal environmental legislation were passed, along
with the establishment of the U.S. Environmental Protection Agency (EPA) in December of 1970.
The first major environmental act, the National Environmental Policy Act of 1969 (NEPA)
predated Earth Day, and had the stated purposes (Yost, 1997) of establishing harmony between
man and the environment; preventing or eliminating damage to the environment; stimulating the
health and welfare of man; enriching the understanding of ecological systems; and establishment of
the Council on Environmental Quality. Since that act, many laws protecting the environment have been passed at the national, state, and local levels.
Evidence that public interest in environmental issues is still high can be found in the public
reaction to the book A Civil Action (Harr, 1995). This book describes the experience of people in
the town of Woburn, Massachusetts. A number of people in the town became ill and some died due
to contamination of groundwater with TCE, an industrial solvent. This book made the New York
Times bestseller list, and was later made into a movie starring John Travolta. More recently, the
movie Erin Brockovich starring Julia Roberts covered a similar issue in California with Pacific Gas
and Electric and problems with hexavalent chrome in groundwater causing serious health issues.
Public interest in the environment is exemplified by the various watchdog organizations that
track environmental issues in great detail. A good example of this is Scorecard.org, (Environmental
Defense, 2001) a Web site that provides a very large amount of information on environmental
current events, releases of toxic substances, environmental justice, and similar topics. For example,
on this site you can find the largest releasers of pollutants near your residence. Sites like this
definitely raise public awareness of environmental issues.
It’s also important to point out that the environmental industry is big business. According to
reports by the U.S. Department of Commerce and Environmental Business International (as quoted
in Diener, Terkla, and Cooke, 2000), the environmental industry in the U.S. in 1998 had $188.7
billion in sales, up 1.6% from the previous year. It employed 1,354,100 people in 115,850
companies. The worldwide market for environmental goods and services for the same period was
estimated to be $484 billion.
Figure 1 - The author (front row center) examining state-of-the-art punch card technology in 1959
THE COMPUTER REVOLUTION
In parallel with growing public concern for the environment has been growth of technology to
support a better understanding of environmental conditions. While people have been using
computing devices of some sort for over a thousand years and mainframe computers since the
1950s (see Environmental Computing History Timeline sidebar), the advent of personal computers
in the 1980s made it possible to use them effectively on environmental projects. For more
information on the history of computers, see Augarten (1984) and Evans (1981). Discussions of the
history of geological use of computers are contained in Merriam (1983, 1985).

With the advent of Windows-based, consumer-oriented database management programs in the
1990s, the tools were in place to create an environmental data management system (EDMS) to
store data for one or more facilities and use it to improve project management.
Computers have assumed an increasingly important role in our lives, both at work and at
home. The average American home contains more computers than bathtubs. From electronic
watches to microwave ovens, we are using computers of one type or another a significant
percentage of our waking hours. In the workplace, computers have changed from big number
crunchers cloistered somewhere in a climate-controlled environment to something that sits on our
desk (or our lap). No longer are computers used only for massive computing jobs which could not
be done by hand, but they are now replacing the manual way of doing our daily work. This is as
true in the earth science disciplines as anywhere else. Consequently, industry sages have suggested
that those who do not have computer skills will be left behind in the next wave of automation of the
workplace. At the least, those who are computer aware will be in a better position to evaluate how computers can help them in their work.
Environmental Computing History Timeline
1000 BC – The Abacus was invented (still in use).
1623 – The first mechanical calculator was invented by German professor Wilhelm
Schickard.
1834 – Charles Babbage began work on the Analytical Engine, which was never completed.
1850 – Charles Lyell was the first person to use statistics in geology.
1876 – Alexander Graham Bell patented the telephone.
1890 – Herman Hollerith built the Tabulating Machine, the first successful punched-card tabulating machine.
1899 – The River and Harbor Act was the first environmental law passed in the United
States.
1943 – The Mark 1, an electromechanical calculator, was developed.
1946 – ENIAC (Electronic Numerical Integrator and Computer) was completed.
(Dick Tracy’s wrist radio also debuted in the comic strip.)
1947 – The transistor was invented by Bardeen, Brattain, and Shockley at Bell Labs.
1951 – UNIVAC, the first commercial computer, became available.
1952 – Digital plotters were introduced.

1958 – The integrated circuit was invented by Jack Kilby at Texas Instruments.
1962 – Rachel Carson’s Silent Spring was published, starting the environmental movement.
1965 – IBM white paper on computerized contouring appeared.
1969 – National Environmental Policy Act (NEPA) was enacted.
1970 – The first Earth Day was held.
1970 – Relational data management was described by Edgar Codd.
1971 – The first microprocessor, the Intel 4004, was introduced.
1973 – SQL was introduced by Boyce and Chamberlin.
1977 – The Apple II, the first widely accepted personal computer, was introduced.
1981 – IBM released its Personal Computer. This was the computer that legitimized small
computers for business use.
1984 – The Macintosh computer was introduced, the first significant use of a graphical user
interface on a personal computer.
1985 – Windows 1.0 was released.
1990 – Microsoft shipped Windows 3.0, the first widely accepted version.
1994 – Netscape Navigator was released by Mosaic Communications, leading to widespread
use of the World Wide Web.
The growth that we have seen in computer processing power is related to Moore’s law
(Moore, 1965; see also Schaller, 1996), which is commonly stated as saying that the capacity of semiconductor memory doubles roughly every 18 months. The price-performance ratio of most computer components meets or
exceeds this law over time. For example, I bought a 10 megabyte hard drive in 1984 for $800. In
2001 I could buy a 20 gigabyte hard drive for $200, a price-performance increase of 8000 times in
17 years. This averages to a doubling about every 16 months. Over the same time, PC processing
speed has increased from 4 megahertz for $5000 to 1000 megahertz for $1000, a price-performance increase of 1250 times, or a doubling every 20 months. These average to 18 months. So computers become twice as
powerful every year and a half, obeying Moore’s law.
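The arithmetic behind these doubling estimates is easy to check. The short Python calculation below, using the prices and speeds quoted above, reproduces both figures:

import math

def doubling_period_months(improvement_ratio, years):
    # Months per doubling implied by an overall price-performance ratio.
    return years * 12 / math.log2(improvement_ratio)

# Hard drive: $800 for 10 MB (1984) vs. $200 for 20 GB (2001).
hd_ratio = (20000 / 200) / (10 / 800)                 # megabytes per dollar, now vs. then
print(round(doubling_period_months(hd_ratio, 17)))    # about 16 months

# Processor: 4 MHz for $5000 vs. 1000 MHz for $1000 over the same period.
cpu_ratio = (1000 / 1000) / (4 / 5000)
print(round(doubling_period_months(cpu_ratio, 17)))   # about 20 months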
Unlike 10 or especially 20 years ago, it is now usual in industrial and engineering companies
for each employee to have a suitable computer on his or her desk, and for that computer to be networked to other people’s computers and often a server. This computing environment is a good
base on which to build a data management system.
As the hardware has developed, so has the data management software. It is now possible to
outfit an organization with the software for a client-server data management system starting at
$1,000 or $2,000 a seat. Users probably already have the hardware. Adding software
customization, training, support, and other costs still allows a powerful data management system to
be put in place for a cost which is acceptable for many projects.
In general, computers perform best when problem solving calls for either deductive or
inductive reasoning, and poorly when using subjective reasoning. For example, calculating a series
of stratigraphic horizon elevations where the ground level elevation and the depth to the formation
are known is an example of deductive reasoning. Computers perform optimally on problems
requiring deductive reasoning because the answers are precise, requiring explicit computations.
Estimating the volume of contamination or contouring a surface is an example of inductive
reasoning. Inductive reasoning is less precise, and requires a skilled geoscientist to critique and
modify the interpretation. Lastly, the feeling that carbonate aquifers may be more complex than
clastic aquifers is an example of subjective reasoning. Subjective reasoning uses qualitative data
and is the most difficult of all for computer analysis. In such instances, the analytical potential of
the computer is secondary to its ability to store and graphically portray large amounts of
information. Graphic capabilities are essential for earth scientists to make qualitative data usable for interpretation.
Another example of appropriate use of computers relative to types of reasoning is the
distinction between verification and validation, which is discussed in detail in Chapter 16.
Verification, which checks compliance of data with project requirements, is an example of
deductive logic. Either a continuous calibration verification sample was run every ten samples or it
wasn’t. Validation, on the other hand, which determines the suitability of the data for use, is very
subjective, requiring an understanding of sampling conditions, analytical procedures, and the
expected use of the data. Verification is easily done with computer software. How far software can
go toward complete automation of the validation process remains to be seen.
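Because verification rules of this kind are purely deductive, they can be expressed in a few lines of code. The Python sketch below checks the every-ten-samples rule mentioned above; the field name sample_type and the code "CCV" are illustrative assumptions, not any standard deliverable format.

def ccv_every_ten(samples):
    # samples is a list of dicts in run order, each with a 'sample_type' key.
    # Returns (True, None) if every block of ten contains at least one
    # continuous calibration verification (CCV) sample, otherwise
    # (False, index of the first failing block).
    for start in range(0, len(samples), 10):
        block = samples[start:start + 10]
        if not any(s["sample_type"] == "CCV" for s in block):
            return False, start
    return True, None

run = [{"sample_type": "FIELD"}] * 9 + [{"sample_type": "CCV"}]
print(ccv_every_ten(run * 3))    # (True, None)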
CONVERGENCE - ENVIRONMENTAL DATA MANAGEMENT

Efficient data management is taking on increased importance in many organizations, and yours
is probably no exception. In the words of one author (Diamondstone, 1990, p. 3):
Automated measuring equipment has provided rapidly increasing amounts of
data. Now, the challenge before us is to assure sufficient data uniformity and
compatibility and to implement data quality measures so that these data will be
useful for integrative environmental problem solving.
This is particularly true in organizations where many different types of site investigation and
monitoring data are coming from a variety of different sources. Fortunately, software tools are now
available which allow off-the-shelf programs to be used by people who are not computer experts to
retrieve this data in a format that is meaningful to them. According to Finkelstein (1989, p. 3):
Management is on the threshold of an explosive boom in the use of computers. A
boom initiated by simplicity and ease of use. Managers and staff at all levels of
an organization will be able to design and implement their own systems, thereby
dramatically reducing their dependence on the data processing (DP)
department, while still ensuring that DP maintains central control, so that
application systems and their data can be used by others in the business.
With the advent of relatively easy to use software tools such as Microsoft Windows and
Microsoft Access, it is even more true now that individuals can have a much greater role in
satisfying their own data management needs. It is important to develop a data management
approach that makes efficient use of these tools to solve the business problem of managing data
within the organization. The environmental data management system that will result from
implementation of a plan based on this approach will provide users with access to the
organization’s environmental data to satisfy their business needs. It will allow them to expand their
data retrievals as their needs change and as their skills develop.
As with most business decisions, the decision to implement a data management system should
be based on an analysis of the expected return on the time and money invested. In the case of an
office automation project, some of the return is tangible and can be expressed in dollar savings,
and some is intangible savings in efficiency in everyday operations. In general, the best approach
for system implementation is to look for leverage points in the system where a great return can be had for a small cost. The question becomes: How do you improve the process to get the greatest
savings?
Often some examples of tangible returns can be identified within the organization. The
benefits can best be seen from analyzing the impact of the data management system on the whole
site investigation and remediation process. For example, during remediation you might be able, by
more careful tracking and modeling of the contamination, to decrease the amount of waste to be
removed or water to be processed. You may also be able to decrease the time required to complete
the project and save many person-years of cost by making quality data available in a standardized
format and in a timely fashion. For smaller sites, automating the data management process can
provide savings by repetition. Once the system has been set up for one site and people trained to
use it, that effort can be re-used on the next site.
The intangible benefits of a data management system are difficult to quantify, but subjectively
can include increased job satisfaction of project workers, a higher quality work product, and better
decision making. The cumulative financial and intangible return on investment of these various
benefits can easily justify reasonable expenditures for a data management system.
CONCEPT OF DATA VS. INFORMATION
It is important to recognize that there is a difference between numbers and letters stored in a
computer and useful information. Numbers stored in a computer, or printed out onto a sheet of
paper, may not themselves be of any value. It is only when those numbers are presented in a form
that is useful to the intended audience that they become useful information. The keys to making the
transition from data to information are organization and access. It doesn't matter if you have a file
of all the monitoring wells ever drilled; if you can't get the information you want out of the file, it is
useless. Before any database is created, careful attention should be paid to how the data is going to
be used, to ensure that the maximum use can be received from the effort.
Statistics and graphics can be tremendously helpful in perceiving relationships among different
variables contained in the data. As the power and ease-of-use of both general business programs
and technical programs for statistics and graphics improves, it will become common to take a good
look at the data as a set before working with individual members of the set.
The next step is to move from information to knowledge. The difference between the two is
understanding. Once you have processed the information and understand it, it becomes knowledge.

This transition is a human activity, not a computer activity, but the computer can help by
presenting the information in an understandable manner.
EMS VS. EMIS VS. EDMS
A final overview issue to discuss is the relationship between EMS (environmental management
systems), EMIS (environmental management information systems), and site EDMS (environmental
data management systems). An EMS is a set of policies and procedures for managing environmental issues for an organization or a facility. An EMIS is a software system implemented to support the administration of the EMS (see Gilbert, 1999). An EMIS usually has a focus on record keeping and reporting, and is implemented with the hope of improving business processes and practices. A site environmental data management system (EDMS) is a software system for managing data regarding the environmental impact of current or former operations. An EDMS overlaps partially with an EMIS. For an operating facility, the EDMS is a part of the EMIS. For a facility no longer in operation, there may be no formal EMS or EMIS, but the EDMS is necessary to facilitate monitoring and cleanup.

Data is or Data are?
Is “data” singular or plural? In this book the word data is used as a singular noun. Depending on your background, you may not like this. Many engineers and scientists think of data as the plural of “datum,” so they consider the word plural. Computer people view data as a chunk of stuff, and, like “chunk,” consider it singular. In one dictionary I consulted (Webster, 1984), data as the plural of datum was the third definition, with the first two being synonyms for “information,” which is always treated as singular. It also states that common usage at this time is singular rather than plural, and that “data can now be used as a singular form in English.” In Strunk and White (1935), a style manual that I use, the discussion of singular vs. plural nouns uses an example of the contents of a jar. If the jar contains marbles, its contents are plural. If it contains jam, its content is singular. You decide: Is data jam or marbles?
CHAPTER 2
SITE DATA MANAGEMENT CONCEPTS

The size and complexity of environmental investigation and monitoring programs at industrial
facilities continue to increase. Consequently the amount of environmental data, both at operating
facilities and orphan sites, is growing as well. The volume of data often exceeds the capacity of
simple tools like paper reports and spreadsheets. When that happens it is appropriate to implement
a more powerful data management system and often the system of choice is a relational database
manager. This section provides a top-down discussion of management of environmental data. It
focuses on the purpose and goals of environmental data management, and on the types and
locations of data storage. These issues should always be resolved before an electronic (or in fact
any) data management system should be implemented.
PURPOSE OF DATA MANAGEMENT
Why manage data electronically? Or why even manage it at all? Clear answers to these
questions are critical before a successful system can be implemented. This section addresses some
of the issues related to the purpose of data management. It all comes down to planning. If you
understand the goal to be accomplished, you have a better chance of accomplishing it.
There is only one real purpose of data management: to support the goals of the organization.
These goals are discussed in detail in Chapter 8. No data management system should be built
unless it satisfies one or more significant business or technical goals. Identification of these goals
should be done prior to designing and implementing the system for two reasons. One reason is that
the achievement of these goals provides the economic justification for the effort of building the
system. The other reason is that the system is more likely to generate satisfactory results if those
results are understood, at least to a first approximation, before the system is implemented and
functionality is frozen.
Different organizations have different things that make them tick. For some organizations,
internal considerations such as cost and efficiency are most important. For others, outside
appearances are equally or more important. The goals of the organization must be taken into
consideration in the design of the system so that the greatest benefit can be achieved. Typical goals
include:
Improve efficiency – Environmental site investigation and remediation projects can involve
an enormous amount of data. Computerized methods, if they are well designed and implemented,
can be a great help in improving the flow of data through the project. They can also be a great sink
of time and effort if poorly managed.
Maximize quality – Because of the great importance of the results derived from
environmental investigation and remediation, it is critical that the quality of the output be
maximized relative to the cost. This is not trivial, and careful data storage, and annotation of data
with quality information, can be a great help in achieving data quality objectives.
Minimize cost – No organization has an unlimited amount of money, and even those with a
high level of commitment to environmental quality must spend their money wisely to receive the
greatest return on their investment. This means that unnecessary costs, whether in time or money,
must be minimized. Electronic data management can help contain costs by saving time and
minimizing lost data.
People tend to start working on a database without giving a lot of thought to what a database
really is. It is more than an accumulation of numbers and letters. It is a special way to help us
understand information. Here are some general thoughts about databases:
A database is a model of reality – In many cases, the data that we have for a facility is the
only representation that we have for conditions at that facility. This is especially true in the
subsurface, and for chemical constituents that are not visible, either because of their physical
condition or their location.
The model helps us understand the reality – In general, conditions at sites are nearly
infinitely complex. The total combination of geological, hydrological and engineering factors
usually exceeds our ability to understand it without some simplification. Our model of the site,
based on the data that we have, helps us to perform this simplification in a meaningful way.
This understanding helps us make decisions – Our simplified understanding of the site
allows us to make decisions about actions to be taken to improve the situation at the site. Our
model lets us propose and test solutions based on the data that we have, identify additional data
that we need, and then choose from the alternative solutions.
The clearer the model, the better the decisions – Since our decisions are based on our data-
based model, it follows that we will make better decisions if we have a clear, accurate, up-to-date
model. The purpose of a database management system for environmental data is to provide us the
information to build accurate models and keep them current.

Clearly information technology, including data management, is important to organizations.
Linderholm (2001) reports the results of a study that asked business executives about the
importance of information technology (IT) to their business. 70% reported that it was absolutely
essential, and 20% said it was extremely valuable. The net increase in revenue attributable to IT,
after accounting for IT costs, was estimated to be 20%, clearly a good return. 70% said that the
role of IT in business strategy is increasing. In the environmental business the story must be
similar, but perhaps not as strong. If you were to survey project managers today about the
importance of data management on their projects, probably the percentage that said it was essential
or extremely valuable would be less than the 90% quoted above, and maybe less than 50%. But as
the amount of data for sites continues to grow, this number will surely increase.
TYPES OF DATA STORAGE
Once the purpose of the system has been determined, the next step is to identify the data to be
contained in the system and how it is to be stored. Some data must be stored electronically, while other data might not need to be stored this way. Implementers should first develop a thorough understanding of their existing data and storage methods, and then make decisions about how electronic storage can provide an improvement. This section will cover ways of storing site environmental data. The content of an EDMS will be discussed in Chapter 4.

Environmental problems are complex problems. Complex problems have simple, easy-to-understand wrong answers.
From Environmental Humor by Gerald Rich (1996), reprinted with permission
Hard copy
Hard copy data storage has been the lifeblood of the environmental industry since its inception. Many organizations have thousands of boxes of paper related to their projects. The
importance of this data varies greatly, but in many organizations, it is not well understood.
A data management system for hard copy data is different from a data management system for
digital data such as laboratory analytical results. The former is really a document management
system, and many vendors offer software and other tools to build this type of system. The latter is
more of a technical database issue, and can be addressed by in-house generated solutions or off-
the-shelf or semi-custom solutions from environmental software vendors.

LAB REPORTS
Laboratory analyses can generate a large volume of paper. Programs like the U.S. EPA Contract Laboratory Program (CLP) specify deliverables that can be hundreds of pages for one sampling
event. This paper is important as backup for the data, but these hundreds of pages can cause a
storage and retrieval problem for many organizations. Often the usable data from the lab event, that
is, the data actually used to make site decisions, may be only a small fraction of the paper, with the
rest being quality assurance and other backup information.
DERIVED REPORTS
Evaluation of the results of laboratory analysis and other investigation efforts usually results in
a printed report. These reports contain a large amount of useful information, but over time can also
become a storage and retrieval problem.
Electronic
There are many different ways of organizing data for digital storage. There is no “right” or
“wrong” way, but there are approaches that provide greater benefits than others in specific
situations. People store environmental data a lot of different ways, both in database systems and in
other file types. Here we will discuss two non-database ways of storing data, and several different
database system designs for storing data.
TEXT FILES AND WORD PROCESSOR FILES
The simplest way to manage environmental data is in text files. These files contain just the
information of interest, with no formatting or information about the data structure or relationships
between different data elements. Usually these files are encoded in ASCII, which stands for
American Standard Code for Information Interchange and is pronounced like as′-kee. For this
reason they are sometimes called ASCII files. Text files can be effectively used for storing and
transferring small amounts of data. Because they lack “intelligence” they are not usually practical
for large data sets. For example, in order to search for one piece of data in a text file you must look
at every word until you find the one you are looking for, rather than using a more efficient method
such as indexed searching used by data management programs.
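As a rough illustration, the Python sketch below contrasts a token-by-token scan of a text file with a lookup in an index built ahead of time. Real database indexes use more sophisticated structures (B-trees and the like), but the principle is the same.

def find_in_text_file(path, target):
    # Linear scan: every token is examined until the target is found.
    with open(path) as f:
        for line_no, line in enumerate(f, 1):
            if target in line.split():
                return line_no
    return None

def build_index(path):
    # Built once, the index maps each token to the first line it appears on,
    # so later lookups are a single dictionary access.
    index = {}
    with open(path) as f:
        for line_no, line in enumerate(f, 1):
            for token in line.split():
                index.setdefault(token, line_no)
    return index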
A variation on text files is word processor files, which contain some formatting and structure
resulting from the word processing program that created them. An example of this would be the
data in a table in a report. Again this works well only for small amounts of data.

SPREADSHEETS
Over the years a large amount of environmental data has been managed in spreadsheets. This
approach works for data sets that are small to medium in size, and where the display and retrieval
requirements are relatively simple. For large data sets, a database manager program is usually
required because spreadsheets have a limit to the number of rows and columns that they contain,
and these limits can easily be exceeded by a large data set. For example, Lotus 1-2-3 has a limit of
about 16,000 rows of data, and Excel 97 has a limit of 65,536 rows.
Spreadsheets do have their place in working with environmental data. They are particularly
useful for statistical analysis of data and for graphing in a variety of ways. Spreadsheets are for
doing calculations. Database managers are for managing data. As long as both are used
appropriately, the two together can be very powerful.
The problem with spreadsheets occurs when they are used in situations where real data
management is required. For example, it’s not unusual for organizations to manage quarterly
groundwater monitoring data using spreadsheets. They can do statistics on the data and print
reports. Where the problem becomes evident is when it becomes necessary to do a historical
analysis of the data. It can be very difficult to tie the data together. The format of the spreadsheets
may have evolved over time. The file for one quarter may be missing or corrupted. Suddenly it
becomes a chore to pull all of the data together to answer a question such as “What is the trend of
the sulfate values over the last five years?”
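The same question is trivial to answer once all of the results live in one relational table. The following is a minimal sketch using Python's built-in sqlite3 module; the table layout, column names, and values are invented for illustration and are not a recommended data model.

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE results (well TEXT, sample_date TEXT, "
            "parameter TEXT, value REAL)")
con.executemany("INSERT INTO results VALUES (?, ?, ?, ?)", [
    ("B-2", "1997-05-08", "Sulfate", 41.0),
    ("B-2", "1999-05-12", "Sulfate", 48.5),
    ("B-2", "2001-05-10", "Sulfate", 55.2),
])

# One query assembles the multi-year history that scattered quarterly
# spreadsheets make so difficult to pull together.
for row in con.execute(
        "SELECT sample_date, value FROM results "
        "WHERE well = ? AND parameter = ? ORDER BY sample_date",
        ("B-2", "Sulfate")):
    print(row)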
DATABASE MANAGERS
For storing large amounts of data, and where immediate calculations are not as important,
database managers usually do a better job than spreadsheets, although the capabilities of
spreadsheets and databases certainly overlap somewhat. The better database managers allow you to
store related data in several different tables and to link them together based on the contents of the
data. Many database manager programs have a reputation for not being very easy to use, partly
because of the sheer number of options available. This has been improved with the menu-driven
interfaces that are now available. These interfaces help with the learning curve, but data
management software, especially database server software, can still be very difficult to master.
Many database manager programs provide a programming language, which allows you to

automate tasks that you perform often or repeatedly. It also allows you to configure the system for
other users. This language provides the tools to develop sophisticated applications programs for
nearly any data handling need, and provides the basis for some commercial EDMS software.
Database managers are usually classified by how they store and relate data. The most common
types are flat files, hierarchical, network, object-oriented, and relational. Most use the terminology
of “record” for each object in the database (such as a well or sample location) and “field” for each
type of information on each object (such as county or collection date). For information on database
management concepts see Date (1981) and Rumble and Hampel (1984).
Sullivan (2001) quotes a study by the University of California at Berkeley that humans have
generated 12 exabytes (an exabyte is over 1 million terabytes, or a million trillion bytes) of data
since the start of time, and will double this in the next two and a half years. Currently, about 20%
of the world’s data is contained in relational databases, while the rest is in flat files, audio, video,
pre-relational, and unstructured formats.
Flat file
A flat file is a two-dimensional array of data organized in rows and columns similar to a
spreadsheet. This is the simplest type of database manager. All of the data for a particular type of
object is stored in a single file or table, and each record can have one instance of data for each
field. A good analogy is a 3"×5" card file, where there is one card (record) for each item being
tracked in the database, and one line (field) for each type of information stored.
Flat file database managers are usually the cheapest to buy, and often the easiest to use, but the
complexity of real-world data often requires more power than they can provide.
In a flat file storage system, each row represents one observation, such as a boring or a sample.
Each column contains the same kind of data. An example of a flat file of environmental data is
shown in the following table:
Well  Elev  X     Y     SampDate  Sampler  As   AsFlag    Cl   ClFlag    pH
B-1   725   1050  681   2/3/96    JLG      .05  not det                  6.8
B-1   725   1050  681   5/8/96    DWR      .05  not det   .05  not det   6.7
B-2   706   342   880   11/4/95   JAM      3.7  detected  9.1  detected  5.2
B-2   706   342   880   2/3/96    JLG      2.1  detected  8.4  detected  5.3
B-2   706   342   880   5/8/96    DWR      1.4  detected  7.2  detected  5.8
B-3   714   785   1101  2/3/96    JLG      .05  not det                  8.1
B-3   714   785   1101  5/8/96    CRS      .05  not det   .05  not det   7.9

Figure 2 - Flat file of environmental data
In this table, each line is the result of one sampling event for an observation well. Since the
wells were sampled more than once, and analyzed for multiple parameters, information specific to
the well, such as the elevation and location (X and Y), is repeated. This wastes space and increases
the chance for error since the same data element must be entered more than once. The same is true
for sampling events, represented here by the date and the initials of the person doing the sampling.
Also, since the format for the analysis results requires space for each value, if the value is missing,
as it is for some of the chloride measurements, the space for that data is wasted.
In general, flat files work acceptably for managing small amounts of data such as individual
sampling events. They become less efficient as the size of the database grows. Examples of flat file
data management programs are FileMaker Pro (www.filemaker.com) and Web-based database
programs such as QuickBase (www.quickbase.com).
Hierarchical
In the hierarchical design, the one-to-many relationship common to many data sets is
formalized into the database design. This design works well for situations such as multiple samples
for each boring, but has difficulty with other situations such as many-to-many relationships. This
type of program is less common than flat files or relational database managers, but is appropriate
for some types of data. In a hierarchical database, data elements can be viewed as branches of an
inverted tree.
A good example of a hierarchical database might be a database of organisms. At the top would
be the kingdom, and underneath that would be the phyla for each kingdom. Each phylum belongs
to only one kingdom, but each kingdom can have several phyla. The same continues down the line
for class, order, and so on. The most important factor in fitting data into this scheme is that there
must be no data element at one level that needs to be under more than one element at a higher
level. If a crinoid could be both a plant and an animal at the same time, it could not be classified in
a hierarchical database by phylogeny (which biological kingdom it evolved from).
Environmental site data is for the most part hierarchical in nature. Each site can have many

monitoring wells. Each well can have many samples, either over time or by depth. Then each
sample can be analyzed for multiple constituents. Each constituent analysis comes from one
specific sample, which comes from one well, which comes from one site.
A data set which is inherently hierarchical can be stored in a relational database manager, and
relational database managers are somewhat more flexible, so pure hierarchical database managers
are now rare.
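The hierarchy described above maps naturally onto related tables, with each level keyed to its parent. The following is a minimal sketch in Python/SQLite; the table and column names are illustrative assumptions, not the example data model presented in Appendix B.

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sites   (site_id   INTEGER PRIMARY KEY, site_name TEXT);
CREATE TABLE wells   (well_id   INTEGER PRIMARY KEY,
                      site_id   INTEGER REFERENCES sites(site_id),
                      well_name TEXT);
CREATE TABLE samples (sample_id INTEGER PRIMARY KEY,
                      well_id   INTEGER REFERENCES wells(well_id),
                      sample_date TEXT);
CREATE TABLE results (result_id INTEGER PRIMARY KEY,
                      sample_id INTEGER REFERENCES samples(sample_id),
                      parameter TEXT, value REAL, flag TEXT);
""")
# Each result belongs to exactly one sample, each sample to one well, and
# each well to one site: the one-to-many tree described above.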
Network
In the network data model, multiple relationships between different elements at the same level
are easy to manage. Hypertext systems (such as the World Wide Web) are examples of managing
data this way. Network database managers are not common, but are appropriate in some cases,
especially those in which the interrelationships among data are complex.
An example of a network database would be a database of authors and articles. Each author
may have written many articles, and each article may have one or more authors. This is called a
“many-to-many” relationship. This is a good project for a network database manager. Each author
is entered, as is each article, and then the links between authors and articles are established. Then an article can be called up, and the
information on its authors can be retrieved. Likewise, an author can be named, and his or her
articles listed.
A network data topology (geometric configuration) can be stored in a relational database
manager. A “join table” is needed to handle the many-to-many relationships. Storing the above
article database in a relational system would require three tables, one for authors, one for articles,
and a join table with the connections between them.
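A minimal sketch of that three-table layout, again in SQLite with invented names, shows how the join table carries the many-to-many links:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE authors  (author_id  INTEGER PRIMARY KEY, name  TEXT);
CREATE TABLE articles (article_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE article_authors (          -- the join table
    article_id INTEGER REFERENCES articles(article_id),
    author_id  INTEGER REFERENCES authors(author_id),
    PRIMARY KEY (article_id, author_id));
""")

# Listing every article by a given author walks through the join table.
articles_by_author = """
    SELECT ar.title
    FROM articles ar
    JOIN article_authors aa ON aa.article_id = ar.article_id
    JOIN authors au ON au.author_id = aa.author_id
    WHERE au.name = ?"""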
Object oriented
This relatively recent invention stores each data element as an object with properties and
methods encapsulated (wrapped up) into each object. This is a deviation from the usual separation
of code and data, but is being used successfully in many settings. Current object-oriented systems
do not provide the data retrieval speed on large data sets provided by relational systems. Using this
type of software involves a complete re-education of the user, since different terminology and
concepts are used. It is a very powerful way to manipulate data for many purposes, and is likely to

see more widespread use. Some of the features of object-oriented databases are described in the
next few paragraphs.
Encapsulation – Traditional programming languages focus on what is to be done. This is
referred to as “procedural programming.” Object-oriented programming focuses on objects, which
are a blend of data and program code (Watterson, 1989). In a procedural paradigm (a paradigm is
an approach or model), the data and the programs are separate. In an object-oriented paradigm, the
objects consist of data that knows what to do with itself, that is, objects contain methods for
performing actions. This is called encapsulation. Thus, instead of applying procedures to passive
data, in object-oriented programming systems (OOPS), methods are part of the objects.
Some examples of the difference between procedural systems and OOPS might be helpful. In a
procedural system, the data for a well could contain a field for well type, such as monitoring well
or soil boring. The program operating on the data would know what symbol to draw on the map
based on the contents of that field. In an OOPS the object called “soil boring” would include a
method to draw its symbol, based on the data content (properties) of the object. Properties of
objects in OOPS are usually loosely typed, which means that the distinction between data types
such as integers and characters is not rigorously defined. This can be useful when, as is often the
case, a numeric property such as depth to a particular formation needs to be filled with character
values such as NP (not present) or NDE (not deep enough).
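A loose sketch of the idea in Python follows; the class name, property, and symbol are invented for illustration.

class SoilBoring:
    def __init__(self, name, depth_to_formation):
        self.name = name
        # A "loosely typed" property: a number such as 12.5, or a code
        # such as "NP" (not present) or "NDE" (not deep enough).
        self.depth_to_formation = depth_to_formation

    def draw_symbol(self):
        # The object itself, rather than an external program reading a
        # well-type field, knows how it should appear on the map.
        print(f"{self.name}: plot soil-boring symbol "
              f"(depth to formation: {self.depth_to_formation})")

SoilBoring("SB-7", "NDE").draw_symbol()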
For another illustration, imagine modeling a rock or soil body subject to chemical and physical
processes such as leaching or neutralization using an OOPS. Each mineral phase would be an
object of class “mineral,” while each fluid phase would be an object of class “fluid.” Methods
known to the objects would include precipitation, dissolution, compaction, and so on. The model is
given an initial condition, and then the objects interact via messages triggering methods until some
final state is reached.
Inheritance – Objects in an OOPS belong to classes, and members of a particular class share
the same methods. Also, similar classes of objects can inherit properties and methods from an
existing class. This feature, called inheritance, allows a building-block approach to designing a