The National Health and Nutrition Examination Surveys
(NHANES) Volatile Organic Compound Dataset:
An Introduction to the Project and Analyses of the Relationship
between Personal Exposures to VOCs and Behavioral,
Socioeconomic, and Demographic Characteristics
A Collaborative Project of The Mickey Leland National
Urban Air Toxics Research Center and The National
Center for Health Statistics
NUMBER 16
2009
ABOUT THE NUATRC
The Mickey Leland National Urban Air Toxics Research Center (NUATRC or the Leland
Center) was established in 1991 to develop and support research into potential human
health effects of exposure to air toxics in urban communities. Authorized under the Clean
Air Act Amendments (CAAA) of 1990, the Center released its first Request for Applications
in 1993. The aim of the Leland Center since its inception has been to build a research
program structured to investigate and assess the risks to public health that may be
attributed to air toxics. Projects sponsored by the Leland Center are designed to provide
sound scientific data useful for researchers and for those charged with formulating
environmental regulations.
The Leland Center is a public-private partnership, in that it receives support from
government sources and from the private sector. Thus, government funding is leveraged
by funds contributed by organizations and businesses, enhancing the effectiveness of the
funding from both of these stakeholder groups. The U.S. Environmental Protection Agency
(EPA) has provided the major portion of the Center’s government funding to date, and a
number of corporate sponsors, primarily in the chemical and petrochemical fields, have
also supported the program.
A nine-member Board of Directors oversees the management and activities of the Leland
Center. The Board also appoints the thirteen members of a Scientific Advisory Panel (SAP)
who are drawn from the fields of government, academia and industry. These members
represent such scientific disciplines as epidemiology, biostatistics, toxicology and medicine.
The SAP provides guidance in the formulation of the Center’s research program and
conducts peer review of research results of the Center’s completed projects.
The Leland Center is named for the late United States Congressman George Thomas
“Mickey” Leland from Texas who sponsored and supported legislation to reduce the
problems of pollution, hunger, and poor housing that unduly affect residents of low-income
urban communities.
This project has been funded wholly or in part by the United States Environmental Protection Agency under assistance agreement X83234601.
The contents of this document do not necessarily reflect the views and policies of the Environmental Protection Agency, nor does mention of
trade names or commercial products constitute endorsement or recommendation for use.
2 NUATRC RESEARCH REPORT NO. 16
TABLE OF CONTENTS
BACKGROUND AND PURPOSE
THE MICKEY LELAND NATIONAL URBAN AIR TOXICS RESEARCH CENTER (NUATRC)
THE NATIONAL HEALTH AND NUTRITION EXAMINATION SURVEYS (NHANES)
THE NUATRC-NCHS COLLABORATION: THE VOC PROJECT
PURPOSE OF THIS REPORT
THE VOC PROJECT
OBJECTIVE
VOC MEASUREMENT
REVIEW OF LABORATORY ANALYSES
QUALITY CONTROL AND QUALITY ASSURANCE PROCEDURES
BLOOD LEVEL VOCS
PUBLIC RELEASE OF THE VOC DATASET
ANALYSIS OF THE NHANES VOC DATASET
CONCLUSION
REFERENCES
ACKNOWLEDGMENTS
ABBREVIATIONS
JOURNAL MANUSCRIPT REPRINTS
DISTRIBUTIONS OF PERSONAL VOC EXPOSURES: A POPULATION-BASED ANALYSIS (JIA, ET AL)
PREDICTORS OF PERSONAL AIR CONCENTRATIONS OF CHLOROFORM AMONG U.S. ADULTS IN NHANES 1999-2000 (RIEDERER, ET AL)
DEMOGRAPHIC, RESIDENTIAL, AND BEHAVIORAL DETERMINANTS OF ELEVATED EXPOSURES TO BENZENE, ETHYLBENZENE, AND XYLENES AMONG U.S. POPULATION: RESULTS FROM 1999-2000 NHANES (SYMANSKI, ET AL)
CHARACTERIZING RELATIONSHIPS BETWEEN PERSONAL EXPOSURES TO VOCS AND SOCIOECONOMIC, DEMOGRAPHIC, BEHAVIORAL VARIABLES (WANG, ET AL)
BACKGROUND AND PURPOSE
THE MICKEY LELAND NATIONAL URBAN AIR TOXICS
RESEARCH CENTER (NUATRC)
The Clean Air Act Amendments of 1990 established a
control program for sources of 187 “hazardous air
pollutants,” or “air toxics” that may pose a risk to public
health. With the passage of these amendments, Congress
established the NUATRC to develop and direct an
environmental health research program that would promote
a better understanding of the risks posed to human health
by the presence of these toxic chemicals in urban air.
Established as a public/private research organization, the
NUATRC's research program is developed with guidance
from a Scientific Advisory Panel composed of scientific
experts from academia, industry, and government and seeks
to fill gaps in scientific data. NUATRC-funded research is
intended to assist policy makers in the evaluation and
promulgation of sound environmental health decisions.
The NUATRC accomplishes its research mission by
sponsoring research on human health effects of air toxics at
universities and research institutions, by supporting
periodic workshops to share the current science on air
toxics, and by publishing NUATRC-funded study results in
its “NUATRC Research Reports,” thereby contributing
meaningful and relevant data to the peer-reviewed
literature.
THE NATIONAL HEALTH AND NUTRITION EXAMINATION
SURVEYS (NHANES)
The National Health Survey Act, passed in 1956,
authorized a continuing survey of the Nation's health to
provide current statistical data on the effects of illness and
disability in the US. To comply with the Act, the National
Center for Health Statistics (NCHS) conducted three
National Health Examination Surveys in the 1960s. In 1970,
a nutrition component was added to the survey, and,
between 1971 and 1994, NCHS conducted four National
Health and Nutrition Examination Surveys (NHANES).
These surveys were designed to capture specific
consecutive time periods, usually of six years' duration, and
data were released for three or six-year periods. In these
surveys, data on individuals were typically collected by at
least three approaches: through direct interview, physical
examination, and by clinical testing and measurement.
With the inception of the 1999 NHANES, the survey
became a continuous annual event. It now collects data
from a representative sample of the US population each
year. About 5,000 randomly selected subjects per year are
chosen, aged from birth onward, from 15 different locations
across the nation. Participants provide demographic and
health data and undergo physical examinations to assess
their current health status. For this purpose, fully equipped
Mobile Examination Centers (MECs) are transported to data
collection sites, referred to as “stands,” so that medical
personnel can conduct the exams on-site in a standardized
manner.
THE NUATRC-NCHS COLLABORATION: THE VOC PROJECT
The NUATRC submitted a proposal in 1997 to the NCHS
for a collaborative project that would measure personal
exposures to volatile organic compounds (VOCs) among a
representative subgroup of participants in NHANES 1999-
2001. The collaborative project was designed to provide a
profile of VOC exposures experienced by US adults during
their daily activities. The NHANES-VOC project was a data-
gathering effort; the data are available on the NCHS website,
as described below.
To encourage wide use of the dataset for new research
projects and scientific publications, the NUATRC released a
Request For Applications (RFA) in 2006 entitled:
“Relationship between Personal Exposures to VOCs and
Behavioral, Socioeconomic, and Demographic
Characteristics: Analysis of the NHANES VOC Project
Dataset.” Manuscripts written by the project grantees, based
on their research under this program, are reproduced in this
report.
PURPOSE OF THIS REPORT
This report is intended to inform the research community
about the NUATRC- and NCHS-funded VOC database so
that it can be accessed for future data mining activities. It
also features the analyses of four investigators funded by
NUATRC to analyze the dataset; their work highlights the
utility of the dataset in understanding the national
distribution of personal exposures to VOCs and
determinants of these exposures. Their work can be used by
other investigators to generate hypotheses about potentially
significant exposure sources and pathways for VOCs in the
general US population.
The Mickey Leland National Urban Air Toxics Research Center and The National Center for Health Statistics
The Relationship between Personal Exposures to VOCs and Behavioral, Socioeconomic, and Demographic Characteristics
THE VOC PROJECT
OBJECTIVE
The NUATRC proposed a project that would collect
personal exposure data on specific VOCs in a representative
subset of NHANES participants. Such data would provide
information on the distribution of personal exposures to
these hazardous air pollutants in the US population. If such
an effort were continued, it would provide valuable
information on trends over time of these exposures and also
help evaluate the impact of regulations to control these
hazardous air pollutants.
The NUATRC proposal was accepted by NCHS, and the
Collaborative NCHS-NUATRC VOC Project (VOC Project)
became a three-year component of the NHANES survey
during the period 1999-2001. The aim of the project was to
collect personal exposure data about specific VOCs in a
representative subset of NHANES participants between the
ages of 20 and 59 years. The target sample size for the VOC
Project was 1,000 participants over the three-year period.
Personal exposure data were obtained for periods of 48 to
72 hours, using small lightweight passive sampling badges
that subjects wore from the time they left the MECs until
they returned to the MEC 48 to 72 hours later. Eligible
participants were recruited after completion of their
physical examinations. Activity data for the exposure
periods were collected from participants by means of a
questionnaire administered at the end of the exposure
periods when the participants returned to the MEC. The
participants also provided information about household
characteristics at that time.
VOC MEASUREMENT
The VOCs measured in the personal exposure study
included: benzene, chloroform, ethylbenzene,
tetrachloroethene, toluene, trichloroethene, o-xylene, m-p-
xylene, 1,4-dichlorobenzene, and methyl tert-butyl ether
(MTBE).
The VOC passive exposure monitor (or badge) used
in the study was the 3M Organic Vapor Monitor (Model
3520, 3M Company, St. Paul MN). All VOC analyses
were performed in accordance with methods described
in the 3M publication: “Organic Vapor Monitor
Sampling and Analysis Guide- October 1998.”
Extraction efficiencies were determined in accordance with
the 3M procedures. Method detection limits were
determined for each compound based on the standard
laboratory methods. A Gas Chromatograph/Mass
Spectrometer was used for analyses. Laboratory procedures
and equipment standards followed accepted USEPA
protocols.
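As an illustration of how a passive badge result is usually turned into an air concentration, the sketch below applies the generic diffusive-sampler relationship: analyte mass divided by the product of sampling rate, exposure time, and extraction (recovery) efficiency. The report does not give the formula or the compound-specific sampling rates, so the function name and every numeric value here are assumptions chosen only for illustration.

```python
def badge_concentration_ug_m3(mass_ng, sampling_rate_ml_min, minutes, recovery=1.0):
    """Time-weighted average concentration from a diffusive badge.

    mass_ng / (rate * minutes) is ng per mL of air sampled;
    1 ng/mL equals 1000 ug/m3, hence the factor of 1000.
    """
    if sampling_rate_ml_min <= 0 or minutes <= 0 or not 0 < recovery <= 1:
        raise ValueError("rate and time must be positive; recovery must be in (0, 1]")
    air_volume_ml = sampling_rate_ml_min * minutes
    return 1000.0 * mass_ng / (recovery * air_volume_ml)


# Hypothetical inputs (not taken from the report): 500 ng recovered,
# an assumed sampling rate of 35.5 mL/min, and a 48-hour wearing period.
example = badge_concentration_ug_m3(500.0, 35.5, 48 * 60)
print(round(example, 2), "ug/m3")
```

Because the wearing period ranged from 48 to 72 hours, the exposure time would need to be taken per participant rather than fixed.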
REVIEW OF LABORATORY ANALYSES
During the three-year project period, two different
laboratory contractors performed the badge analyses in two
different time periods. Exposure data for the first year and a
half of the project were analyzed by Clayton Laboratories,
and for the remainder of the project, by the Environmental
and Occupational Health Sciences Institute (EOHSI)
laboratory of the University of Medicine and Dentistry, New
Jersey (UMDNJ), both contractors to the Leland Center.
Prior to approving the release of the VOC Project data set,
NCHS scientists conducted a review of the procedures
followed by the two laboratory groups in order to assess the
compatibility of the approaches taken by the two
laboratories and the reasonableness of the data produced for
the project. Although the methods used by the two contract
laboratories differed from those used at NCHS, the results
were judged to be comparable after the review was
completed.
QUALITY CONTROL AND QUALITY ASSURANCE PROCEDURES
Laboratory procedures and equipment standards
followed accepted USEPA protocols. For Quality Assurance
purposes, 10 percent of samples were split and analyzed
independently by the NUATRC contractor laboratory and
an outside laboratory. The analyses of these paired samples
were conducted at the two laboratories concurrently. The
results were evaluated for consistency and accuracy.
Quality Control procedures during the VOC Project
included the collection and analysis of the following
samples from each of the stands: two field blanks, one
positive control, two duplicate pairs, and one office air
sample.
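The report does not say which statistic was used to judge the consistency of split or duplicate samples. One common choice for paired measurements is the relative percent difference (RPD), sketched below; the sample values and the 20 percent flagging threshold are hypothetical and are not criteria stated by the VOC Project.

```python
def relative_percent_difference(a, b):
    """RPD between two paired measurements, as a percentage of their mean."""
    mean = (a + b) / 2.0
    if mean == 0:
        return 0.0  # both results are zero, so there is no disagreement
    return abs(a - b) / mean * 100.0


# Hypothetical duplicate pairs (ug/m3); not data from the report.
pairs = [(2.0, 2.2), (10.0, 9.0), (0.5, 0.9)]
THRESHOLD = 20.0  # assumed acceptance limit, for illustration only

for first, second in pairs:
    rpd = relative_percent_difference(first, second)
    status = "review" if rpd > THRESHOLD else "ok"
    print(first, second, round(rpd, 1), status)
```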
BLOOD LEVEL VOCS
A subset of VOC Project participants also took part in a
related NHANES component, sponsored by the Centers for
Disease Control's (CDC) Center for Environmental Health
(CEH). That component collected data on blood-level VOCs
and home drinking water VOCs. Those study subjects were
asked to bring samples of home drinking water to the MEC
when they returned at the end of their exposure periods.
The goal of the CEH Project was to characterize the
distributions of blood and water VOCs and to investigate
possible relationships between them.
PUBLIC RELEASE OF THE VOC DATASET
After the three-year data collection period for the VOC
Project ended, a Workshop was held to review the project
data. Participants included a panel of six researchers with
significant experience in conducting and evaluating
community studies of environmental health effects (Edo
Pellizzari of Research Triangle Institute, Paul Feder of
Battelle, David Ashley of CDC, Thomas Stock of the
University of Texas School of Public Health, Martin Harper
of CDC, and Edward Avol of the University of Southern
California Keck School of Medicine), NCHS scientists and
staff, and NUATRC staff.
At the conclusion of the Workshop, the Panel
recommended that the 1999-2000 VOC Project dataset be
released on the NCHS web site as part of the 1999-2000
NHANES data release. Data for ten VOCs were released in
April 2005: benzene, chloroform, ethylbenzene,
tetrachloroethylene, trichloroethylene, toluene,
m-p-xylene, o-xylene, 1,4-dichlorobenzene, and MTBE. The
website for the 1999-2000 NHANES dataset is:
http://www.cdc.gov/nchs/nhanes/nhanes99_00.htm.
The 2001 VOC Project dataset could not be publicly
released because of its small size and the risk of disclosure
of individual information or identities in a one-year dataset.
The three-year 1999-2001 VOC Project dataset was released for use
in the Research Data Center in 2007.
The Research Data Center at NCHS was established to
assist researchers whose projects require access to data that
are confidential in nature, or might lead to the disclosure of
confidential information or individual identities. These
researchers are asked to submit proposals to the Research
Data Center, describing their projects. If their proposals are
approved, the staff will then prepare a dataset created for
the particular project, while maintaining strict
confidentiality, and can provide statistical programming
and consulting expertise to facilitate the data analysis for
the project. There are fees associated with using the
Research Data Center.
The Research Data Center is located at the NCHS
headquarters office in Hyattsville, Maryland. Researchers
may work onsite at the headquarters or may access their
data at a remote site. Another option is to carry out the
research at a Census Research Data Center. The web site
address for this Center is:
http://www.cdc.gov/nchs/r&d/rdc.htm
ANALYSIS OF THE NHANES VOC DATASET
To encourage wide use of the dataset for new research
projects and scientific publications, the NUATRC released
an RFA in 2006 entitled: “Relationship between Personal
Exposures to VOCs and Behavioral, Socioeconomic, and
Demographic Characteristics: Analysis of the NHANES
VOC Project Dataset.”
In November 2006, the NUATRC awarded four one-year
contracts. A condition of the award was that each
investigator was to prepare a manuscript based on the
project and submit it to a peer-reviewed publication. Grants
were awarded to the following investigators:
• Stuart Batterman, Environmental Health Sciences, School
of Public Health, University of Michigan, Ann Arbor,
Michigan
• P. Barry Ryan, Department of Environmental and
Occupational Health, Rollins School of Public Health,
Emory University, Atlanta, Georgia
• Elaine Symanski, Division of Epidemiology and Disease
Control, University of Texas School of Public Health,
Houston, Texas
• Sheng-Wei Wang, Institute of Environmental Health,
Taiwan (formerly of Environmental and Occupational
Health Sciences Institute, Piscataway, New Jersey)
In conformance with award requirements, each of these
investigators published their findings in the peer-reviewed
literature, and these publications (through agreement with
the respective journals) are reprinted in the pages that
follow.
Briefly, Drs. Jia, D'Souza, and Batterman (2008)
characterized distributions of personal exposures to ten of
the VOCs measured in the 1999-2000 NHANES. This study
provides graphs and tables that illustrate the national
exposure distribution and compares the NHANES results to
studies assessing VOC exposures among different
populations. According to the Jia et al. analyses,
participants' exposures to VOCs vary dramatically. They
identified four groups of possible emission sources:
gasoline vapors and exhaust; tap water disinfection
products; cleaning products; and gasoline additive (MTBE).
They identified several methodological issues, and
suggested that complete models for the distribution of VOC
exposures require an approach that combines standard and
extreme value distributions and carefully identifies outliers.
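The idea of combining a standard distribution for the bulk of the data with an extreme value distribution for the upper tail can be sketched as follows. This is a minimal illustration, not the authors' procedure: it fits a log-normal to all values from the mean and standard deviation of the logs, and a maximum Gumbel distribution to the upper tail by the method of moments. The 10 percent tail fraction, the helper names, and the sample data are all assumptions.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_lognormal(values):
    """Return (mu, sigma) of log(values); values must be positive."""
    logs = [math.log(v) for v in values]
    return statistics.mean(logs), statistics.stdev(logs)


def fit_gumbel_max(values):
    """Method-of-moments fit of a maximum Gumbel: (location, scale)."""
    scale = statistics.stdev(values) * math.sqrt(6.0) / math.pi
    location = statistics.mean(values) - EULER_GAMMA * scale
    return location, scale


def gumbel_max_cdf(x, location, scale):
    """P(X <= x) under the maximum Gumbel distribution."""
    return math.exp(-math.exp(-(x - location) / scale))


def upper_tail(values, fraction=0.10):
    """Largest `fraction` of the values (at least two), sorted ascending."""
    k = max(2, math.ceil(len(values) * fraction))
    return sorted(values)[-k:]


# Hypothetical exposure values (ug/m3); not data from the report.
data = [0.4, 0.7, 0.9, 1.1, 1.3, 1.6, 2.0, 2.4, 2.9, 3.5,
        4.2, 5.0, 6.1, 7.5, 9.3, 12.0, 16.0, 23.0, 41.0, 95.0]
mu, sigma = fit_lognormal(data)
loc, scale = fit_gumbel_max(upper_tail(data, 0.25))
print("log-normal:", round(mu, 3), round(sigma, 3))
print("Gumbel tail:", round(loc, 3), round(scale, 3))
```

A real analysis would also apply the survey weights and handle values below the detection limit, both of which this sketch ignores.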
Drs. Riederer, Bartell, and Ryan (2009) found that 8 of 10
US adults were exposed to detectable levels of chloroform.
Significant predictors of personal exposure to chloroform
included: demographic (age, race/ethnicity) and housing
characteristics (type of home, chloroform concentration in
home tap water), and personal exposure microevents
(leaving home windows open, visiting a pool). Reported
showering activity was not a significant predictor of
personal air chloroform in the study. The authors argued
that NHANES measurements likely underestimated true
inhalation exposures since subjects did not wear sampling
badges while showering or swimming, and because of
possible undersampling by the passive monitors.
Drs. Symanski, Stock, Tee, and Chan (2009) investigated
the relationship of socioeconomic, behavioral,
demographic, and residential characteristics to personal
exposures to benzene, toluene, ethylbenzene, and xylenes
(BTEX) compounds among a subsample of the NHANES
participants. Geometric mean (GM) levels were
significantly higher for males for all compounds except
toluene. For benzene, GM levels were elevated among
smokers and Hispanics. Regression analyses suggested that
the presence of an attached garage (for BTEX), having
windows closed in the home during the monitoring period
(for benzene and toluene), pumping gasoline (for toluene,
ethylbenzene and xylenes), or using paint thinner, brush
cleaner, or stripper (for xylenes) resulted in higher
exposures in the general population. The results of these
analyses confirmed findings of previous studies.
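For readers unfamiliar with the geometric mean used in these comparisons, the sketch below computes a weighted geometric mean as the exponential of the weighted mean of the log concentrations. It assumes every value is positive and that each participant carries a sampling weight; the function name and the example numbers are illustrative and do not come from the NHANES data.

```python
import math


def weighted_geometric_mean(values, weights):
    """exp( sum(w * ln(x)) / sum(w) ) for positive values x."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal in length")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    total_weight = sum(weights)
    log_sum = sum(w * math.log(v) for v, w in zip(values, weights))
    return math.exp(log_sum / total_weight)


# Hypothetical concentrations (ug/m3) and sampling weights.
concentrations = [1.2, 3.4, 0.8, 15.0]
weights = [1500.0, 2200.0, 900.0, 400.0]
print(round(weighted_geometric_mean(concentrations, weights), 3))
```

Because the geometric mean works on the log scale, a few very high exposures pull it up far less than they would an arithmetic mean, which suits the skewed exposure distributions described in this report.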
Drs. Wang, Majeed, Chu, and Lin (2009) found that
different subsets of behavioral, socioeconomic, and
demographic variables were significant exposure
predictors, depending upon the nature of the VOCs.
Sociodemographic factors (e.g., race/ethnicity and family
income) were generally found to influence personal
exposures to three chlorinated compounds: chloroform,
1,4-dichlorobenzene, and tetrachloroethene. For the BTEX
compounds, housing characteristics (e.g., leaving windows
open and having an attached garage), and personal activities
related to the use of fuels or solvent-related products had a
significant influence on exposures. Differences in BTEX
exposures were also found in relation to gender due to
differences in time spent at work/school and outdoors. The
investigators presented a variety of statistical analysis
techniques for resolving challenges and limitations of the
dataset, including dealing with issues of outliers,
collinearity, and interaction effects.
CONCLUSION
A number of VOCs are among the air toxics listed in the
1990 Clean Air Act Amendments. Many of these compounds
were known to be present in both indoor and outdoor air,
but had not been monitored among the general population.
Information on levels of exposure to these compounds was
essential to determine the need for regulatory mechanisms
to reduce the levels of hazardous air pollutants to which the
general public is exposed. The NUATRC therefore
embarked on a project with the NCHS to develop a profile
of VOC exposures encountered by US adults in their daily
activities.
The NUATRC-NCHS collaborative project provides
valuable data, revealing a national distribution of personal
exposures to VOCs, which can be used to compare how
exposures in individual communities relate to the national
distribution. Because the NHANES characterized national-
level VOC exposures using a population-based sampling
strategy, the results represent non-occupational VOC
exposures throughout the US. The results of the four
NUATRC grant recipients can be used by other investigators
in generating hypotheses about potentially significant
exposure sources and pathways for VOCs in the general US
population. The results may also help in developing
approaches for minimizing VOC exposures and reducing
environmental health risks in the general population. Other
investigators are encouraged to access the dataset for future
data mining activities.
REFERENCES
Jia C, J D'Souza, and S Batterman. 2008. Distributions of
Personal VOC Exposures: A Population-based Analysis.
Environ Int 34(7):922-931.
Riederer AM, SM Bartell, and PB Ryan. 2009. Predictors of
Personal Air Concentrations of Chloroform among US
Adults in NHANES 1999-2000. J Expo Sci Environ
Epidemiol 19(3):248-259.
Symanski E, TH Stock, PG Tee, and W Chan. 2009.
Demographic, Residential, and Behavioral Determinants of
Elevated Exposures to Benzene, Toluene, Ethylbenzene,
and Xylenes among the US Population: Results from 1999-
2000 NHANES. J Toxicol Environ Health A 72(14):915-924.
Wang SW, MA Majeed, PL Chu, and HC Lin. 2009.
Characterizing Relationships between Personal Exposures
to VOCs and Socioeconomic, Demographic, Behavioral
Variables. Atmos Environ 43:2296-2302.
ACKNOWLEDGMENTS
The NUATRC wishes to express its sincere appreciation
to the recipients of its NHANES VOC Project grants, Dr.
Stuart Batterman at the University of Michigan, Drs. Barry Ryan
and Anne Riederer at Emory University, Dr. Elaine
Symanski at the University of Texas, and Dr. Sheng-Wei
Wang at the Institute of Environmental Health in Taiwan, as
well as their research teams. The NUATRC also thanks Drs.
Thomas Stock and Maria Morandi, who developed the
original study design and questionnaire for the Pilot Study
and Dr. Clifford Weisel of EOHSI, who supervised the
analysis of badge samples. We also thank Brenda Gehan,
NUATRC Project Coordinator; Clifford Johnson, Director of
NHANES; Susan Schober, Senior Epidemiologist, NCHS;
David Lacher, Medical Officer, NCHS; Lester Curtin, Senior
Mathematical Statistician, NCHS; and the NUATRC Scientific
Advisory Panel, whose expertise, diligence, and patience
have facilitated the successful completion of this report.
ABBREVIATIONS
BTEX         benzene, ethylbenzene, toluene, and xylene
CAAA         Clean Air Act Amendments
CDC          Centers for Disease Control
CEH          Center for Environmental Health
EOHSI        Environmental and Occupational Health Sciences Institute
EPA          Environmental Protection Agency
GM           geometric mean
MTBE         methyl tert-butyl ether
MEC          mobile examination center
NCHS         National Center for Health Statistics
NHANES       National Health and Nutrition Examination Surveys
NUATRC       National Urban Air Toxics Research Center
RFA          Request for Applications
SAP          Scientific Advisory Panel
UMDNJ        University of Medicine and Dentistry, New Jersey
VOC          volatile organic compound
VOC Project  Collaborative NCHS-NUATRC VOC Project
Distributions of personal VOC exposures: A population-based analysis
Chunrong Jia, Jennifer D'Souza, Stuart Batterman *
University of Michigan, Ann Arbor, MI 48109-2029, USA
Environment International 34 (2008) 922–931
doi:10.1016/j.envint.2008.02.002
0160-4120/$ – see front matter © 2008 Elsevier Ltd. All rights reserved.
* Corresponding author. Tel.: +1 734 763 2417.
E-mail address: (S. Batterman).
ARTICLE INFO
Article history:
Received 19 November 2007
Accepted 10 February 2008
Available online 1 April 2008
Keywords:
Benzene
Distribution
Exposure
Gumbel
Log-normal
MTBE
Outliers
Personal
Risk
Volatile organic compound
VOCs
ABSTRACT
Information regarding the distribution of volatile organic compound (VOC) concentrations and exposures is
scarce, and there have been few, if any, studies using population-based samples from which representative
estimates can be derived. This study characterizes distributions of personal exposures to ten different VOCs in
the U.S. measured in the 1999–2000 National Health and Nutrition Examination Survey (NHANES). Personal
VOC exposures were collected for 669 individuals over 2–3 days, and measurements were weighted to derive
national-level statistics. Four common exposure sources were identified using factor analyses: gasoline vapor
and vehicle exhaust, methyl tert-butyl ether (MTBE) as a gasoline additive, tapwater disinfection products, and
household cleaning products. Benzene, toluene, ethyl benzene, xylenes, chloroform, and tetrachloroethene
were fit to log-normal distributions with reasonably good agreement to observations. 1,4-Dichlorobenzene and
trichloroethene were fit to Pareto distributions, and MTBE to a Weibull distribution, but agreement was poor.
However, distributions that attempt to match all of the VOC exposure data can lead to incorrect conclusions
regarding the level and frequency of the higher exposures. Maximum Gumbel distributions gave generally
good fits to extrema; however, they could not fully represent the highest exposures of the NHANES
measurements. The analysis suggests that complete models for the distribution of VOC exposures require an
approach that combines standard and extreme value distributions, and that carefully identifies outliers. This is
the first study to provide national-level and representative statistics regarding the VOC exposures, and its
results have important implications for risk assessment and probabilistic analyses.
Jia et al: Reprinted from Environment International, 34(7), Jia C, J D'Souza, and S Batterman, “Distributions of Personal VOC Exposures:
A Population-based Analysis,” 922-931, 2008, with permission from Elsevier.
1. Introduction
Information regarding the distribution of pollutant concentrations
is used to answer many important questions in exposure and risk
assessment, such as ‘What is the variability of the exposure
estimates?’ (US EPA, 1992), and ‘How many individuals have exposure
over a given risk-based threshold?’ In the context of risk management,
this information is needed to apportion emission sources and, more
generally, to evaluate policies and interventions: ‘What fraction of
exposure is due to occupational exposure, traffic, indoor and other
sources?’ and: ‘Would controlling emission sources in residential
garages significantly reduce benzene exposure?’ (Batterman et al.,
2007; Loh et al., 2007). The availability, and then the form and
parameterization of distributions are critical assumptions that
determine the answers to such questions. While the use of “standard”
distributions has been encouraged when feasible (Finley and
Paustenbach, 1994), typical statistical measures of central tendency and
dispersion, such as means, medians and standard deviations, and the
common assumption of log-normality may inadequately describe the
true distribution. Probabilistic methods, which use probability
distributions instead of point estimates to represent the range of
possible exposures, are potentially more representative of actual
exposures (Sielken and Valdez-Flores, 1999) and have been considered
the most promising technique to emerge in the field of exposure
assessment (Nieuwenhuijsen et al., 2006). These remarks apply to air
pollutant measurements obtained using ambient, indoor and personal
monitoring, and to many other types of environmental measurements.
Further, they apply to both longitudinal (sequences) and
cross-sectional (spatial) data, with some differences resulting from the
types of correlations involved.
Pollutant distributions used in exposure and risk analyses are
usually derived from empirical data, and measurements using
personal monitoring are considered to be the best approximations to
actual exposure (NRC, 1991). While personal monitoring has been used
for many pollutants, e.g., particulate matter, nitrogen oxides and
volatile organic compounds (VOCs), previous studies have not used a
population-based sample, and thus are not necessarily representative
of a broad population. In addition, the databases underlying many
studies used to estimate distributions may be unavailable, inconsistent
in quality, and difficult to understand. Indeed, it is a mammoth task to
design, recruit, monitor, quality-assure and evaluate a
population-based program, especially for large regions like the U.S. Importantly, if
the assumed pollutant distribution is not representative, then
predictions may not reflect true exposures, and conclusions regarding
exposures and risks may be erroneous.
The objective of this study is to characterize the distributions of
personal exposures to VOCs in the U.S. measured in the 1999–2000
National Health and Nutrition Examination Survey (NHANES). This
population-based survey represents what is believed to be the largest
study of VOC exposures in a community setting. The behavior of the
full range of the measurements is described using common statistical
distributions. We use correlations and factor analyses to identify
related VOCs and possible sources, and compare measurements to
risk-based levels. We then fit extreme concentrations to the maximum
Gumbel distribution, and address the issue of outliers. We conclude by
contrasting the NHANES measurements with several other recent
studies of personal VOC exposures.
2. Methods
2.1. NHANES
NHANES was designed primarily to assess the health and nutritional status of
adults and children in the U.S. through interviews and physical examinations. Surveys
were conducted periodically from 1971 to 1994, and became continuous in 1999. The
current NHANES (also known as continuous NHANES) was initiated in 1999 and uses a
2-year survey cycle. In the overall NHANES 1999–2000 sample, there were 9965
participants (5161 adults and 4804 children ≤ 18 years of age). Participants were
sampled through a stratified, multistage probability sampling scheme (CDC, 2006a,b).
Initially, counties (or blocks of counties) were selected. Within counties, groups of
blocks (household clusters) were chosen. Letters were sent to selected households
within those blocks, informing them of the study, after which NHANES staff visited the
households and one or more participants were interviewed from the household. Five
sub-populations were over-sampled to ensure sufficient sample size, specifically, low-
income persons, adolescents 12–19 years, persons ≥ 60 years of age, African Americans,
and Mexican Americans. The 1999–2000 survey was the first to measure personal
exposure to VOCs. A sub-sample of 851 adults (ages 20–59 years) of the overall NHANES
sample was selected to participate in these measurements. The sub-sample is based on
a one-fourth sample from 1999 and a one-third sample from 2000, and was designed to
be nationally representative.
2.2. VOC sampling and analysis
Personal VOC exposures were collected on the adult sub-sample selected from the
NHANES sample. There were no additional exclusion criteria. Participants were
instructed to wear badge-type passive exposure monitors (3M 3520 OVM, 3M Co., St.
Paul, MN) for 48–72 h. Additionally, participants were administered a short
questionnaire regarding the length of time they wore their badge and 30 other
questions on factors potentially related to VOC exposures, e.g., contact with dry cleaning,
tobacco smoke and gasoline vapor over the past several days. These questions were not
included in the larger NHANES survey.
VOC badges were chemically desorbed and analyzed by gas chromatography/mass
spectrometry (GC/MS, HP 5890/5972 MSD, EnviroQuant ChemStation, Hewlett-Packard,
Palo Alto, CA) following well-defined analytical and QA/QC protocols (CDC, 2006c;
Weisel et al., 2005a; Chung et al., 1999a,b). VOCs included benzene, toluene, ethyl
benzene, m,p-xylene, o-xylene (i.e., BTEX compounds), chloroform, trichloroethene
(TCE), tetrachloroethene (PERC), 1,4-dichlorobenzene (p-DCB) and methyl tert-butyl
ether (MTBE) (CDC, 2006c). Properties and method detection limits (MDLs) of these
compounds are summarized in Table 1, and the MDLs determined by Weisel et al.
(2005a) were applied in this paper.
2.3. Data acquisition and cleaning
Data were extracted from the 1999–2000 NHANES databases, maintained at the
Centers for Disease Control and Prevention's (CDC) website (www.cdc.gov/nchs/about/major/nhanes/lab99_00.htm).
The original dataset contained 851 cases (individuals)
and 53 variables, which included the participant's identification number, concentrations
and detection status of the ten VOCs, sampling information (including number of
hours the badge was worn), house characteristics, and participant activities. The dataset
also contains sampling variables specific to the VOC dataset, which represent the
influence of the observation in extrapolating to the national level, and which account
for the clustering in the data. These variables allow the results to be generalized to the
U.S. civilian non-institutionalized population. Due to the clustering, the total variance
also includes intra-cluster correlation, since observations within a cluster tend to be
similar. Not accounting for the clustering gives incorrect variance estimates and inflated
significance.
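The effect of clustering on variance can be illustrated with the standard Kish design-effect approximation, deff = 1 + (m − 1)ρ, where m is the average cluster size and ρ is the intra-cluster correlation. The sketch below is only an illustration under the assumption of equal cluster sizes; the cluster size and ρ used are hypothetical and are not NHANES design parameters.

```python
import math

def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Kish approximation of variance inflation under cluster sampling."""
    return 1.0 + (avg_cluster_size - 1.0) * icc

def effective_sample_size(n: int, deff: float) -> float:
    """Number of independent observations carrying the same information."""
    return n / deff

# Hypothetical inputs, for illustration only (not taken from NHANES).
n = 665
deff = design_effect(avg_cluster_size=20, icc=0.05)
n_eff = effective_sample_size(n, deff)
se_inflation = math.sqrt(deff)  # factor by which naive standard errors are too small

print(f"deff={deff:.2f}, effective n={n_eff:.0f}, SE inflation={se_inflation:.2f}")
```

Ignoring a design effect above 1 understates standard errors, which is how unadjusted analyses arrive at inflated significance.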
Of the 851 cases, 182 were non-respondents and were excluded from further
analyses. Two cases with excessively long sampling periods (5.7 and 7.9 days,
participants #578 and #468, respectively) were excluded. An initial screening analysis
identified two outliers (participants #3852 and #4076) with extremely high concentrations
of BTEX (> 2000 µg m⁻³ of ethyl benzene and xylenes for #3852, and > 6000 µg m⁻³
of toluene for #4076). These two cases were excluded. The final dataset included 665
participants.
2.4. Data analysis
As simple indicators of exposure, we defined two new variables: BTEX as the sum of
the five BTEX components; and TVOC₁₀ as the sum of the ten VOCs measured in
NHANES. The sums also used one-half of the MDLs for non-detects. Analysis started
with basic descriptive statistics, including sample size, detection frequency (DF),
average, standard deviation and percentiles. Spearman rank correlation coefficients
were calculated to investigate the relationship among pairs of VOCs using the weighted
dataset. The statistical significance of the correlations was determined for each VOC pair
as the minimum p-value from two linear regressions of each VOC on the other, also
using the weights as well as appropriate variance estimates. This procedure was used
for |r| > 0.4, and coefficients were considered significant for p ≤ 0.05. These statistics
were generated by SAS-callable SUDAAN (release 9.0, Research Triangle Institute,
Research Triangle Park, NC, U.S.) and the survey procedures in SAS 9.1 (SAS Institute Inc.,
Cary, NC, U.S.), which contain algorithms that properly weight cases and account for the
non-random and clustered sampling of the NHANES data. Factor analysis was used to
help identify common VOC sources and to identify a subset of four VOCs with varying
properties and different sources for further analysis in the present paper (Supplemental
materials give results for all ten VOCs). This analysis used log-transformed unweighted
data as full concentrations of most compounds were roughly log-normally distributed
(see results), and varimax rotations. Our analysis focused on the larger factor loadings,
typically > 0.6. These analyses used SAS 9.1.
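The two summary variables can be sketched as follows. This is a minimal illustration, assuming a single fixed MDL per compound (the EOHSI values in Table 1) and a hypothetical encoding in which a non-detect is stored as `None`; the participant record is invented for the example.

```python
from typing import Dict, Optional, Tuple

# EOHSI MDLs (µg/m³) from Table 1; m- and p-xylene are reported as one compound.
MDL: Dict[str, float] = {
    "benzene": 1.1, "toluene": 6.7, "ethyl_benzene": 0.74,
    "mp_xylene": 1.4, "o_xylene": 0.85, "p_dcb": 0.91,
    "chloroform": 0.42, "tce": 0.44, "perc": 0.42, "mtbe": 0.68,
}
BTEX_COMPONENTS = ["benzene", "toluene", "ethyl_benzene", "mp_xylene", "o_xylene"]

def with_half_mdl(sample: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Replace non-detects (None, a hypothetical encoding) with one-half of the MDL."""
    return {voc: (c if c is not None else MDL[voc] / 2.0) for voc, c in sample.items()}

def btex_and_tvoc10(sample: Dict[str, Optional[float]]) -> Tuple[float, float]:
    """Return (BTEX, TVOC10) for one participant, in µg/m³."""
    filled = with_half_mdl(sample)
    btex = sum(filled[v] for v in BTEX_COMPONENTS)
    tvoc10 = sum(filled[v] for v in MDL)
    return btex, tvoc10

# Hypothetical participant: TCE and MTBE were not detected.
participant = {
    "benzene": 2.8, "toluene": 17.4, "ethyl_benzene": 2.6, "mp_xylene": 6.5,
    "o_xylene": 2.4, "p_dcb": 1.7, "chloroform": 1.1, "tce": None,
    "perc": 0.7, "mtbe": None,
}
btex, tvoc10 = btex_and_tvoc10(participant)
print(f"BTEX = {btex:.2f}, TVOC10 = {tvoc10:.2f}")
```

Each NHANES measurement carries its own MDL, so a faithful implementation would use the per-sample value rather than the fixed table above.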
Table 1
Physical and chemical properties and method detection limits (MDLs) of the 10 VOCs

| VOC | Abbreviation | Chemical formula | CAS no. | MW | MP (°C) | BP (°C) | MDLᵃ, EOHSIᶜ (µg m⁻³) | MDLᵃ, UTSPHᵈ (µg m⁻³) | MDLᵃ, UTSPHᵈ (µg m⁻³) | Unit Riskᵇ (per µg m⁻³) | RfCᵇ (µg m⁻³) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Benzene | Benzene | C₆H₆ | 71-43-2 | 78.1 | 5.5 | 80.1 | 1.1 | 0.54 | 0.7 | 7.8 × 10⁻⁶ | 30 |
| Toluene | Toluene | C₇H₈ | 108-88-3 | 92.1 | −95.0 | 110.6 | 6.7 | 7.12 | 5.5 | NA | NA |
| Ethyl benzene | Ethyl benzene | C₈H₁₀ | 100-41-4 | 106.2 | −95.0 | 136.2 | 0.74 | 0.22 | NA | NA | 1000 |
| m-Xyleneᵉ | m-Xylene | C₈H₁₀ | 108-38-3 | 106.2 | −47.9 | 139.1 | 1.4 | 0.65 | NA | NA | NA |
| p-Xyleneᵉ | p-Xylene | C₈H₁₀ | 106-42-3 | 106.2 | 13.3 | 138.4 | 1.4 | 0.65 | NA | NA | NA |
| o-Xylene | o-Xylene | C₈H₁₀ | 95-47-6 | 106.2 | −25.2 | 144.4 | 0.85 | 0.29 | NA | NA | NA |
| 1,4-Dichlorobenzene | p-DCB | C₆H₄Cl₂ | 106-46-7 | 147.0 | 53.0 | 174.1 | 0.91 | 0.43 | 2.2 | NA | 800 |
| Chloroform | Chloroform | CHCl₃ | 67-66-3 | 119.4 | −63.5 | 61.2 | 0.42 | 0.28 | 0.3 | 2.3 × 10⁻⁵ | NA |
| Trichloroethene | TCE | C₂HCl₃ | 79-01-6 | 131.4 | −84.8 | 87.0 | 0.44 | 0.24 | NA | NA | NA |
| Tetrachloroethene | PERC | C₂Cl₄ | 127-18-4 | 165.8 | −22.4 | 121.3 | 0.42 | 0.22 | 1.1 | NA | NA |
| Methyl tert-butyl ether | MTBE | C₅H₁₂O | 1634-04-4 | 88.2 | −108.6 | 55.2 | 0.68 | 0.38 | NA | NA | 3000 |

CAS=Chemical Abstracts Service, MW=molecular weight, MP=melting point, and BP=boiling point are all from the CRC handbook (Lide, 2005). RfC=Reference concentration, Unit Risk=carcinogenic slope factor.
ᵃ Based on 48-hour samples.
ᵇ From US EPA (2007) showing the high unit risk estimate for benzene (low estimate is 2.2 × 10⁻⁶).
ᶜ From Weisel et al. (2005a).
ᵈ From Chung et al. (1999b).
ᵉ m- and p-xylenes cannot be separated in the method, and they are considered as one compound.

To fit distributions of the full range of concentrations and extreme values, we
synthesized a derived dataset (n = 14,898) in which cases were repeated with the
frequency of repetitions based on the case weights. This approach yields valid statistics
when the variance and correlation among variables are unimportant, e.g., univariate
analyses. Distributions were fitted by maximum likelihood estimation (Thompson,
1999a) using a sample size > 10,000 to achieve a high level of reliability in distributional
attribution (Haas, 1997). Goodness-of-fit was evaluated using Anderson–Darling (A–D),
Kolmogorov–Smirnov (K–S), and Chi-square (χ²) tests, and by visually examining
probability plots and histograms. The A–D test served as the primary criterion since it is
suitable for fitting distributions with extreme tails, and thus appropriate for the
extrema emphasized here. Smaller A–D statistics indicate better fits. The other tests
help to confirm or improve the selection. These analyses primarily used Crystal Ball
(Decisioneering, Inc., Denver, CO, U.S.).
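The weight-based replication and the fitting step can be sketched as follows. This is not the Crystal Ball procedure used in the study: the rounding rule in `replicate_by_weight`, the target size, and the example values and weights are all assumptions made for illustration. The log-normal maximum likelihood estimates and the A–D statistic use their standard textbook forms.

```python
import math
from typing import List, Sequence, Tuple

def replicate_by_weight(values: Sequence[float], weights: Sequence[float],
                        target_n: int = 10000) -> List[float]:
    """Repeat each case in proportion to its weight (assumed rounding rule)."""
    scale = target_n / sum(weights)
    derived: List[float] = []
    for v, w in zip(values, weights):
        derived.extend([v] * max(1, round(w * scale)))
    return derived

def fit_lognormal_mle(data: Sequence[float]) -> Tuple[float, float]:
    """MLE of a log-normal: mean and (population) SD of the log-values."""
    logs = [math.log(x) for x in data]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma

def lognormal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def anderson_darling(data: Sequence[float], mu: float, sigma: float) -> float:
    """A-D statistic A^2 against a log-normal; smaller values mean a better fit."""
    xs = sorted(data)
    n = len(xs)
    eps = 1e-12  # keeps the logarithms finite at the extremes
    total = 0.0
    for i, x in enumerate(xs, start=1):
        f_low = min(max(lognormal_cdf(x, mu, sigma), eps), 1.0 - eps)
        f_high = min(max(lognormal_cdf(xs[n - i], mu, sigma), eps), 1.0 - eps)
        total += (2 * i - 1) * (math.log(f_low) + math.log(1.0 - f_high))
    return -n - total / n

# Hypothetical concentrations (µg/m³) and case weights.
values = [1.4, 2.8, 5.8, 13.5, 0.7]
weights = [3000, 5000, 2500, 800, 1200]
derived = replicate_by_weight(values, weights)
mu, sigma = fit_lognormal_mle(derived)
a2 = anderson_darling(derived, mu, sigma)
print(len(derived), round(mu, 3), round(sigma, 3), round(a2, 1))
```

Because replication creates many tied values, the A–D statistic on a derived dataset is useful mainly for ranking candidate distributions against each other, which is how it is used in the text.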
To test whether the highest concentrations fit a maximum Gumbel distribution, a
form used in several earlier air pollution analyses (Roberts, 1979a,b), we used a
relatively simple procedure (Barnett, 1975) in which each ordered extreme value $C_i$ is
plotted against the quantity $-\ln[-\ln(P_v)]$, where $P_v$ is:

$$P_v = \frac{r - 0.44}{N + 0.12} \tag{1}$$

and where r = the reverse rank of $C_i$, and N = the number of the extreme values. A good
fit (e.g., R² near unity) to the linear regression line confirms the appropriateness of this
distribution. This analysis was performed for the top decile among all participants
(n = 64–65 cases after eliminating missing data), and also for the top 5% of
concentrations that exceeded MDLs (n = 11–30 cases, depending on the VOC).
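The probability-plot procedure of Eq. (1) can be sketched in a few lines. The sketch assumes that r = 1 is assigned to the smallest of the N extreme values, so that the largest concentration receives the highest P_v; the source does not spell out this convention. The example concentrations are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def gumbel_plot_points(extremes: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (reduced variate, concentration) pairs using Eq. (1)."""
    xs = sorted(extremes)  # ascending, so r = 1 is the smallest extreme (assumption)
    n = len(xs)
    points = []
    for r, c in enumerate(xs, start=1):
        p_v = (r - 0.44) / (n + 0.12)
        points.append((-math.log(-math.log(p_v)), c))
    return points

def linear_fit(points: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Ordinary least squares of concentration on reduced variate: slope, intercept, R^2."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return slope, my - slope * mx, (sxy * sxy) / (sxx * syy)

# Hypothetical top-decile concentrations (µg/m³).
top_decile = [14.0, 15.2, 16.1, 17.5, 18.7, 20.3, 23.0, 26.4, 32.6, 119.5]
slope, intercept, r2 = linear_fit(gumbel_plot_points(top_decile))
print(f"slope={slope:.1f}, intercept={intercept:.1f}, R2={r2:.2f}")
```

A single very large value, such as the last entry above, pulls R² down sharply, which is the pattern the outlier screening in Section 3.3.2 addresses.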
3. Results
3.1. Descriptive analysis
Descriptive statistics for the NHANES 1999–2000 VOC data are given in Table 2
(Supplementary materials give the complementary unweighted analysis in Table S1).
Most of the VOCs had detection frequencies (DF) exceeding 60%, except for TCE
(DF = 23%) and MTBE (DF = 28%). Concentrations varied widely, reflected in large
standard deviations and skewness coefficients. Chloroform's range was more restricted
(< MDL to 54 µg m⁻³). In most cases, statistics obtained using weighted and unweighted
approaches were similar (Tables 2 and S2), although p-DCB and MTBE show several
differences at the higher concentrations, e.g., the weighted 75th and higher percentile
concentrations were much lower than the unweighted data for p-DCB, showing the
importance of using population-based statistics.
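How case weights can move an upper percentile is illustrated below. This is a minimal sketch using a step-function (empirical CDF) definition of a weighted percentile; SUDAAN and SAS survey procedures may interpolate differently, and the values and weights are hypothetical.

```python
from typing import Sequence

def weighted_percentile(values: Sequence[float], weights: Sequence[float],
                        pct: float) -> float:
    """Smallest value whose cumulative weight reaches pct% of the total weight."""
    pairs = sorted(zip(values, weights))
    threshold = pct / 100.0 * sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return pairs[-1][0]

# Hypothetical data: the two high values belong to cases with small weights.
concentrations = [1.0, 2.0, 50.0, 100.0]
unweighted_p75 = weighted_percentile(concentrations, [1, 1, 1, 1], 75)
weighted_p75 = weighted_percentile(concentrations, [5, 5, 1, 1], 75)
print(unweighted_p75, weighted_p75)
```

When highly exposed participants come from over-sampled groups with small weights, the weighted upper percentiles fall below the unweighted ones, as reported for p-DCB.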
Of the ten VOCs, four had reference concentrations related to non-cancer toxicity
and two had cancer-slope factors listed in the US EPA IRIS database (US EPA, 2007)
(toxicity information for other VOCs is available elsewhere, but we restricted analyses to
the IRIS list, which is peer-reviewed and widely accepted). To identify those individuals
with high exposures to certain VOCs, we calculated the fraction with exposures that
exceeded the reference concentration or excess lifetime cancer risk levels of 10⁻⁴, 10⁻⁵
and 10⁻⁶, with the strong assumption that the short-term NHANES measurement was
representative of long-term exposures. Nearly all (> 99%) of the measurements fell below
the reference concentrations. A few (< 1%) of the benzene and ethyl benzene
measurements exceeded reference concentrations. However, 77 and 10% of the NHANES
measurements exceeded benzene concentrations that correspond to lifetime individual
risks of 10⁻⁵ and 10⁻⁴ (1.3 and 12.8 µg m⁻³, respectively) (the upper bound cancer-slope
factor in IRIS was used for benzene). For chloroform, 86 and 16% exceeded these risk
levels (0.4 and 4.4 µg m⁻³, respectively). However, because benzene's MDL (typically
1.1 µg m⁻³) corresponds to a risk level of 8.6 × 10⁻⁶, and chloroform's MDL (0.4 µg m⁻³)
corresponds to 9.7 × 10⁻⁶, the statistics for the 10⁻⁵ risk level (and lower) may not be
meaningful. Still, statistics for the higher exposures are significant and striking: there
are few other environmental pollutants that yield ≥ 10⁻⁴ risks in 10–16% of the
population. The median risks for benzene and chloroform (2.2 × 10⁻⁵ and 2.3 × 10⁻⁵,
respectively) also are very similar to predictions based on microenvironmental
concentrations and time activity patterns (Loh et al., 2007), although the fraction of the
NHANES subjects with risks ≥ 10⁻⁴ for these compounds appears to exceed the upper
range of predictions. This suggests that a full range distribution provides a poor fit to
extrema, which deserves special attention since these extrema represent the most
exposed individuals. In the following, we discuss the major VOC groups and individual
compounds.
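The risk-based thresholds quoted above follow from a linear relationship between lifetime risk and concentration (risk = unit risk × concentration). The sketch below reproduces that arithmetic with the unit risks and EOHSI MDLs from Table 1; the list of concentrations passed to `fraction_exceeding` is hypothetical.

```python
from typing import Sequence

# Unit risks (per µg/m³) and EOHSI MDLs (µg/m³) from Table 1.
UNIT_RISK = {"benzene": 7.8e-6, "chloroform": 2.3e-5}
MDL = {"benzene": 1.1, "chloroform": 0.42}

def conc_at_risk(voc: str, risk: float) -> float:
    """Concentration (µg/m³) corresponding to a given excess lifetime cancer risk."""
    return risk / UNIT_RISK[voc]

def risk_at_conc(voc: str, conc: float) -> float:
    """Excess lifetime cancer risk for continuous exposure at conc (µg/m³)."""
    return conc * UNIT_RISK[voc]

def fraction_exceeding(concs: Sequence[float], threshold: float) -> float:
    """Share of measurements above a threshold concentration."""
    return sum(c > threshold for c in concs) / len(concs)

for voc in UNIT_RISK:
    print(voc,
          f"1e-5 level: {conc_at_risk(voc, 1e-5):.2f} µg/m³,",
          f"1e-4 level: {conc_at_risk(voc, 1e-4):.2f} µg/m³,",
          f"risk at MDL: {risk_at_conc(voc, MDL[voc]):.2e}")

# Hypothetical benzene measurements (µg/m³).
example = [0.7, 1.4, 2.8, 5.8, 13.5]
share_above_1e4 = fraction_exceeding(example, conc_at_risk("benzene", 1e-4))
```

Because the risk at the MDL is already close to 10⁻⁵ for both compounds, exceedance fractions at that level largely reflect detection rather than exposure, which is the caveat raised in the text.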
3.1.1. BTEX compounds
Unsurprisingly, the five BTEX compounds were detected in most samples
(DF = 66% for benzene to DF = 96% for m,p-xylene). Toluene and m,p-xylene had the
highest concentrations among the ten VOCs (medians of 17.4 and 6.5 µg m⁻³,
respectively), and toluene was the predominant VOC component among the ten
VOCs for most (55%) participants. BTEX comprised the majority of TVOC₁₀ (average
percentage of BTEX:TVOC₁₀ = 67 ± 25%). BTEX compounds often arise as a group,
primarily from evaporated gasoline and vehicle exhaust. However, toluene and xylene
also have many separate and indoor sources, e.g., paints, solvents, and cigarette smoke.
Many studies have detected and reported high concentrations of the BTEX compounds
(Raw et al., 2004; Saarela et al., 2003; Mohamed et al., 2002; Clayton et al., 1999).
Table 2
Descriptive statistics of weighted data including the ten VOCs plus BTEX and TVOC₁₀
(DF in %; Mean, SD, GM, Min, percentiles and Max in µg m⁻³)

| VOC | Missing | DF | Mean | SD | GM | GSD | Skewness | Min | 25th | Median | 75th | 90th | 95th | 99th | Max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Benzene | 21 | 65.5 | 5.3 | 7.0 | 3.2 | 2.6 | 4.3 | 0.7 | 1.4 | 2.8 | 5.8 | 13.5 | 18.7 | 32.6 | 119.5 |
| Toluene | 30 | 93.6 | 36.4 | 107.3 | 17.5 | 2.8 | 10.3 | 1.7 | 9.2 | 17.4 | 29.9 | 59.8 | 98.3 | 331.1 | 1610.8 |
| Ethyl benzene | 26 | 93.0 | 8.4 | 41.3 | 2.9 | 3.2 | 17.7 | 0.1 | 1.3 | 2.6 | 5.2 | 14.2 | 25.2 | 110.9 | 837.1 |
| m,p-Xylene | 22 | 95.9 | 18.8 | 43.2 | 7.2 | 3.5 | 5.8 | 0.2 | 3.3 | 6.5 | 14.6 | 38.7 | 69.8 | 233.0 | 728.7 |
| o-Xylene | 22 | 92.5 | 6.5 | 14.5 | 2.8 | 3.2 | 6.6 | 0.1 | 1.3 | 2.4 | 4.9 | 14.1 | 26.4 | 62.5 | 202.3 |
| BTEX | 15 | 97.6 | 74.4 | 153.0 | 36.5 | 2.9 | 6.6 | 0.8 | 18.6 | 33.2 | 66.6 | 152.8 | 285.3 | 784.4 | 1966.2 |
| p-DCB | 24 | 62.9 | 27.3 | 120.7 | 3.2 | 5.7 | 11.5 | 0.3 | 0.9 | 1.7 | 9.2 | 34.8 | 142.1 | 490.8 | 2235.6 |
| Chloroform | 17 | 79.3 | 2.7 | 4.5 | 1.4 | 3.0 | 4.4 | 0.2 | 0.6 | 1.1 | 3.0 | 5.9 | 12.1 | 25.4 | 53.9 |
| TCE | 24 | 22.9 | 3.4 | 22.7 | 0.4 | 3.4 | 11.0 | 0.1 | 0.2 | 0.3 | 0.5 | 1.2 | 7.4 | 75.5 | 327.3 |
| PERC | 26 | 69.0 | 5.2 | 31.2 | 1.0 | 4.1 | 16.1 | 0.1 | 0.4 | 0.7 | 2.4 | 6.6 | 18.5 | 76.8 | 659.1 |
| MTBE | 24 | 27.8 | 5.2 | 15.6 | 1.4 | 4.1 | 7.9 | 0.4 | 0.5 | 0.6 | 5.5 | 10.7 | 21.3 | 50.0 | 181.7 |
| TVOC₁₀ | 13 | 99.2 | 117.3 | 200.9 | 61.9 | 2.9 | 5.3 | 0.6 | 31.1 | 55.6 | 106.0 | 273.4 | 382.8 | 1206.4 | 2276.1 |

n=665.
Percentiles below method detection limits (MDLs) are italicized. DF=Detection frequency; SD=standard deviation; GM=geometric mean; GSD=geometric standard deviation. Non-detects were set to one-half of the MDLs.
Notes: As noted in the text, two cases were censored as outliers: Subject #4076 had a toluene concentration of 6280 µg m⁻³ (TVOC₁₀ = 6488 µg m⁻³); Subject #3852 had ethyl benzene, m,p-xylene and o-xylene concentrations of 2210, 8370 and 2321 µg m⁻³, respectively (TVOC₁₀ = 14,287 µg m⁻³).

Table 3
Spearman rank correlation coefficients for the 10 VOCs using the weighted data, with statistically significant coefficients (p < 0.05) in bold

| VOCs | Toluene | Ethyl benzene | m,p-Xylene | o-Xylene | p-DCB | Chloroform | TCE | PERC | MTBE |
|---|---|---|---|---|---|---|---|---|---|
| Benzene | 0.59 | 0.61 | 0.67 | 0.60 | 0.10 | 0.14 | −0.10 | 0.04 | 0.22 |
| Toluene | | 0.70 | 0.72 | 0.72 | 0.13 | 0.13 | 0.02 | 0.12 | 0.11 |
| Ethyl benzene | | | 0.95 | 0.92 | 0.05 | 0.02 | 0.09 | 0.17 | 0.18 |
| m,p-Xylene | | | | 0.95 | 0.07 | 0.05 | 0.07 | 0.18 | 0.20 |
| o-Xylene | | | | | 0.07 | 0.07 | 0.09 | 0.21 | 0.19 |
| p-DCB | | | | | | 0.29 | 0.08 | 0.04 | 0.00 |
| Chloroform | | | | | | | 0.02 | 0.15 | 0.06 |
| TCE | | | | | | | | 0.41 | 0.21 |
| PERC | | | | | | | | | 0.24 |

3.1.2. Chlorinated compounds
The four chlorinated compounds in the NHANES dataset had lower detection
frequencies (23–79%) than the BTEX compounds. Typically, outdoor levels of these
compounds are low, and exposure occurs mostly from indoor or especially occupational
sources. Due to the differences among these compounds, each is discussed separately.
• p-DCB levels were surprisingly high and showed tremendous variability (median =
1.7 µg m⁻³, average = 27 µg m⁻³, maximum = 2236 µg m⁻³), possibly due to the use of
mothballs, air fresheners and other deodorants (Sack et al., 1992; Wallace et al., 1987).
p-DCB was the predominant VOC in 15% of the exposure measurements, and for these
133 participants, the median concentration was high, 61.7 µg m⁻³.
• Chloroform was found in most samples (79%) at a median concentration of 1.1 µg m⁻³.
Chloroform, along with bromodichloromethane, dibromochloromethane and bromoform,
is one of the trihalomethanes (THMs) that are often formed as water disinfection
by-products when chlorine is added to water, and that can be released to indoor air
when chlorinated tap water is used (Weisel et al., 1999).
• Tetrachloroethylene (PERC) was found in 69% of samples with a median concentration
of 0.7 µg m⁻³. PERC is a component of dry-cleaning fluids, and high concentrations
might result from wearing freshly dry-cleaned clothes or visiting a dry cleaner. Two
measurements were extremely high (659 and 490 µg m⁻³ for participants #9751 and
#130), more than five times higher than the next measurement. It is puzzling,
however, that these two participants did not report dry-cleaning exposure, breathing
fumes from or using dry-cleaning fluid or spot remover. Subject #9751 spent an
unusually large amount of time at work/school (mean = 9.4 h day⁻¹). Subject #130
worked with paint thinners, brush cleaners, or strippers as well as glues, adhesives,
hobbies or crafts, and also reported having new carpet installed in the past 6 months,
and possibly the high exposure might be explained by “exposure to solvents” that this
individual reported.
• Trichloroethylene (TCE) was detected in relatively few cases (DF = 23%). However, the
top ten highest concentrations exceeded 300 µg m⁻³. TCE has many industrial
applications, e.g., it has been commonly used as a degreasing solvent, but residential
uses are limited. Some exposure can occur from vapor intrusion into buildings from
contaminated sub-soils and from other environmental sources, but the high
concentrations suggest more immediate contact with solvents.
3.1.3. MTBE
This gasoline additive was detected in 28% of the measurements, with six very high
concentrations (98–182 µg m⁻³). It was the predominant VOC in 5% (n = 45) of the
subjects, for whom the median level was 23 µg m⁻³. MTBE has been used in gasoline in
selected areas in the U.S. since 1979, though it is now being phased out. No other uses
are likely to lead to public exposure. Thus, MTBE should not be detected in areas where
MTBE is not in gasoline, and MTBE should be a unique tracer for gasoline vapor in areas
where this compound is in gasoline. This suggests that MTBE will have a bimodal
distribution in the NHANES data, which combines these two areas, as shown later. The
geographic location of participants is not available for NHANES 1999–2000 (in contrast
to earlier data) because the sample size is much smaller, estimates by geographic region
are less stable, and the risk of identifying subjects is greater.
3.2. Correlations and factor analyses
As expected, the BTEX compounds were strongly correlated (r ≥ 0.59, Table 3), and
the correlations between ethyl benzene, m,p-xylene and o-xylene were especially high
(0.92 ≤ r ≤ 0.95). The latter three compounds co-exist in gasoline, as well as in other
products where they are called “mixed xylenes” (ATSDR, 2005). Chlorinated compounds
TCE and PERC showed moderate correlation (r = 0.41). Correlations among other VOCs
were weak. Weighted and unweighted (Table S3) correlation matrices were similar.
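For readers unfamiliar with rank correlation, the sketch below computes an unweighted Spearman coefficient as the Pearson correlation of average ranks. It is an illustration only: the coefficients in Table 3 were computed with case weights in SUDAAN/SAS, which this sketch does not reproduce, and the paired concentrations are hypothetical.

```python
import math
from typing import List, Sequence

def average_ranks(xs: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Unweighted Spearman rank correlation."""
    return pearson(average_ranks(a), average_ranks(b))

# Hypothetical paired concentrations (µg/m³) for two co-occurring compounds.
ethyl_benzene = [1.3, 2.6, 5.2, 14.2, 25.2, 0.4]
o_xylene = [1.1, 2.4, 4.9, 14.1, 26.4, 0.6]
rho = spearman(ethyl_benzene, o_xylene)
```

Working on ranks makes the coefficient insensitive to the heavy right tails of the concentration data, which is why it suits these skewed distributions.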
The factor analysis identified three factors that explained 67% of the total variance
when an eigenvalue cut-off of 1.0 was used, but the results implausibly
associated MTBE, the gasoline tracer, with the chlorinated solvents TCE and PERC. We
then used a four-factor analysis with a lower eigenvalue cut-off (0.8), which resolved
this issue. This analysis explained 76% of the variance. Factor 1 included the BTEX
compounds in which the mixed xylenes had very high loadings (> 0.9), following from
the correlations and showing that these VOCs nearly always occur together. Toluene
and benzene had lower loadings (0.73 and 0.79, respectively), indicating that other
factors contribute to these compounds. Factor 2 included TCE and PERC, which are
mainly used in dry-cleaning products. Factor 3 contained p-DCB, a deodorant found
especially in toilets, and chloroform, a water disinfection byproduct; thus this factor
likely reflects exposures in bathrooms. Factor 4 contained only MTBE (loading of 0.83).
These factors varied slightly depending on whether or not the data were log-transformed.
The factor analyses helped confirm the identification of the major VOC groups
and the likely sources of exposure. For further analysis in the present paper, we selected
one compound from each of the four factors, specifically, benzene, PERC, chloroform and
MTBE (Supplemental materials show the other VOCs).
3.3. Probability and frequency distributions
3.3.1. Full distributions
Frequency distributions show “heavy” tails for the BTEX compounds, and high,
narrow peaks at low concentrations with only a very few high observations for TCE,
PERC and MTBE (Fig. S1). The latter three compounds were detected least frequently
(i.e., many values were below MDLs), and their median concentrations were the lowest
among the ten VOCs in NHANES. Bimodal distributions were observed for MTBE and
chloroform. For MTBE, over 70% of measurements were below the MDL, which formed a
mode around 0.7 µg m⁻³; the second but smaller mode occurred around 7 µg m⁻³. The
lower mode of the bimodal distribution reflects MDLs obtained for those study
participants living in areas where MTBE is not used, as well as those living in MTBE-use
areas but who have very low exposure to gasoline vapors. The upper mode reflects
MTBE-exposed participants living in MTBE-use areas. For chloroform, the lower mode
at about 1 µg m⁻³ may reflect both background levels and perhaps an erroneously low
MDL (stated as 0.4 µg m⁻³); the upper mode near 4 µg m⁻³ may reflect individuals
having higher exposure to chloroform. Each NHANES measurement (all VOCs) is
assigned a unique MDL, which depends on the averaging time.

Table 4
Identification and parameters of best-fit distributions

| VOCs | Best fits | Location | Scale | Shape | A–D | p-value |
|---|---|---|---|---|---|---|
| Benzene | Log-normal | 4.95 | 5.89 | – | 150.1 | < 0.005 |
| Toluene | Log-normal | 29.40 | 39.76 | – | 76.7 | < 0.005 |
| Ethyl benzene | Log-normal | 5.79 | 9.96 | – | 107.4 | < 0.005 |
| m,p-Xylene | Log-normal | 15.89 | 30.95 | – | 86.1 | < 0.005 |
| o-Xylene | Log-normal | 5.47 | 9.28 | – | 132.8 | < 0.005 |
| BTEX | Log-normal | 65.52 | 97.65 | – | 95.5 | < 0.005 |
| p-DCBᵃ | Pareto | 0.31 | – | 0.43 | 393.2 | < 0.005 |
| | Log-normal | 14.41 | 63.78 | – | 441.6 | < 0.005 |
| Chloroform | Log-normal | 2.51 | 3.94 | – | 109.4 | < 0.005 |
| TCEᵃ | Pareto | 0.12 | – | 0.79 | 635.1 | < 0.005 |
| | Log-normal | 0.90 | 1.65 | – | 1293.7 | < 0.005 |
| PERC | Log-normal | 2.75 | 6.82 | – | 222.1 | < 0.005 |
| MTBEᵃ | Weibull | 0.38 | 1.38 | 0.39 | 716.7 | < 0.010 |
| | Log-normal | 3.75 | 9.40 | – | 1278.1 | < 0.005 |
| TVOC₁₀ | Log-normal | 108.34 | 155.62 | – | 92.0 | < 0.005 |

Location, Scale and Shape are the distribution parameters; A–D and p-value are the goodness-of-fit test results.
ᵃ Log-normal distribution is not the best fit for p-DCB, TCE and MTBE, but estimated parameters for this distribution as well as the best-fit distributions are shown.

Fig. 1. A. Observed cumulative frequency distribution for measurements, and fitted (log-normal) cumulative probability distribution for benzene concentrations. B. Probability plots
for maximum Gumbel type distribution fitting both the top 5 and 10% of measurements. Points show individual measurements; lines show fitted distribution based on linear
regression after removing outliers, as discussed in text.
Of the candidate distributions, log-normal distributions had the best fit to all VOCs
except for p-DCB and TCE, which were assigned the Pareto distribution, and MTBE, which
was assigned the Weibull distribution (Table 4). MTBE is a special case of a mixed
distribution since, as just discussed, concentrations outside the MTBE-use area reflect
MDLs, which in turn reflect the small amount of variation in the time that the badge was
exposed. In the MTBE-use area, distributions would be expected to be roughly log-normal,
paralleling benzene which also arises from gasoline-related sources. However, as noted,
the MTBE distribution cannot be cleanly split since information on the locations of
participants is unavailable. These distributions were selected using all observations
(n = 665) and the A–D test; the K–S and χ² tests gave similar results. However, goodness-of-fit
tests usually rejected the candidate distributions, a typical result for environmental data,
in part due to anomalies and measurement errors (Ott, 1995).
Fitted and measured cumulative frequency distributions are compared for the four
VOCs in Figs. 1–4 (Fig. S2 shows similar plots for all ten VOCs, BTEX and TVOC₁₀).
Agreement was considered “good” if fitted quantiles were within ±20% of the
observations. Most concentrations below the 20th percentile were underestimated;
however, these measurements usually fell below MDLs and risk-based values (Table 1).
Otherwise, fits varied by VOC and percentile. The BTEX compounds and chloroform
generally showed good agreement with log-normal distributions, although ethyl
benzene and xylenes showed moderate differences, e.g., 65–80th percentiles were
overestimated by 20–30%, and 99th percentile concentrations were underestimated by
13–60%. Other compounds showed poor agreement, e.g., 95th to 99th percentile
concentration measurements were underestimated by 30–65% for chloroform, TCE
and PERC; overestimated for MTBE by 7–35%; and hugely overestimated for p-DCB
(factor of 2–28). Fits for TCE and MTBE were also poor at intermediate percentiles. Log-normal
distributions for p-DCB, TCE and MTBE, e.g., Fig. 4B, demonstrated poor fits that
were clearly worse than the selected Pareto and Weibull distributions. The composite
variables, BTEX and TVOC₁₀, closely fit log-normal distributions, probably because these
summations of VOCs tended to “average-out” disparities.
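The ±20% comparison of fitted and observed quantiles can be sketched for benzene. The key assumption, which the source does not state, is that the log-normal "Location" and "Scale" in Table 4 are the arithmetic mean and standard deviation of the fitted distribution; under that reading the implied geometric mean and GSD come out close to the values in Table 2. Observed percentiles are taken from Table 2.

```python
import math
from statistics import NormalDist
from typing import Tuple

def lognormal_params_from_mean_sd(mean: float, sd: float) -> Tuple[float, float]:
    """Convert arithmetic mean/SD to the log-scale parameters (mu, sigma)."""
    sigma2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)

def lognormal_quantile(p: float, mu: float, sigma: float) -> float:
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))

def relative_error(fitted: float, observed: float) -> float:
    return (fitted - observed) / observed

def is_good(fitted: float, observed: float, tolerance: float = 0.20) -> bool:
    """Agreement criterion from the text: fitted quantile within ±20% of observed."""
    return abs(relative_error(fitted, observed)) <= tolerance

# Benzene: Table 4 parameters (assumed to be mean and SD) and Table 2 percentiles.
mu, sigma = lognormal_params_from_mean_sd(4.95, 5.89)
observed = {0.50: 2.8, 0.75: 5.8, 0.90: 13.5, 0.95: 18.7, 0.99: 32.6}
errors = {p: relative_error(lognormal_quantile(p, mu, sigma), obs)
          for p, obs in observed.items()}

for p, err in errors.items():
    print(f"{int(p * 100)}th percentile: fitted vs observed {err:+.0%}")
```

Under this assumption the fitted benzene quantiles at the 95th and 99th percentiles fall below the observed values, in line with the underestimation of upper percentiles described above.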
While log-normal distributions provided moderately good fits to most compounds,
both low and high concentrations were underestimated, and the middle range was
overestimated. Geometric means were very close to medians for BTEX compounds,
moderately higher for chlorinated compounds, and much higher for p-DCB and MTBE
(Table 2). The highest concentrations (≥ 95th percentile) were significantly underpredicted.
The geometric standard deviations σ_g ranged from 2.6 (benzene) to 5.7 (p-DCB),
showing considerable variation and no clear groupings (Table 2). None of the
candidate distributions fit p-DCB, TCE and MTBE, compounds with low detection
frequencies (63, 23 and 28%, respectively). As elaborated in the Discussion, we speculate
that these measurements reflect multiple circumstances: non-detects; moderate
concentrations due to local but dispersed sources; and very high concentrations due
to some unusual contact or exposure situation.
3.3.2. Extreme distributions
Fitted and observed maximum Gumbel distributions for concentrations exceeding
the 90th and 95th percentile concentrations are shown for benzene, PERC, chloroform
and MTBE in Figs. 1–4, respectively; fitting results, e.g., goodness-of-fit as R², are shown
in Table 5 (plots for all ten VOCs are shown in Fig. S2). Considering the top decile and all
of the data, R² values were not impressive, ranging from 0.38 (ethyl benzene) to 0.89
(chloroform). Most VOCs attained only poor-to-fair fits. Several of the highest
measurements exceeded the next highest measurement by about 2-fold, suggesting
statistical outliers. These are also apparent as large deviations between the measurements
and the fitted Gumbel distributions. On this basis, we identified the following outliers:
• Benzene, 1 measurement (119 µg m⁻³, subject #5359, Fig. 1B). This individual
reported using household disinfectants, degreasing cleaners or furniture polish.
• Toluene, 6 measurements (1611, 1551, 1399, 1267, 797 and 668 µg m⁻³ for subjects
#4879, #8631, #2037, #4479, #2002, and #1002, respectively). All of these subjects
reported at least one of the following activities: pumping gasoline into a car; being near a
smoking person for > 10 min; and breathing fumes or using gasoline. Note that at the
onset, we deleted two cases with still higher toluene levels (1352 and 6280 µg m⁻³ for
participants #3852 and #4076).
• Ethyl benzene, 1 measurement (837 µg m⁻³, subject #4514).
• m,p-Xylene, 1 measurement (729 µg m⁻³, subject #8801).
• o-Xylene, 3 measurements (202, 173 and 129 µg m⁻³ for subjects #8801, #4514 and
#8110, respectively). Subjects #4514 and #8801 reported being near a smoker for
> 10 min. Subject #8110 reported pumping gasoline into a car.
• PERC, 2 measurements (659 and 490 µg m⁻³ for subjects #9751 and #130, respectively,
Fig. 2B). These subjects did not report any contact with dry-cleaning products.
• p-DCB, 4 measurements (2236, 2227, 1511, 1152 µg m⁻³ for subjects #3294, #8172,
#7929 and #9158, respectively). Three of these subjects reported deodorizer use (not
#3294).
• MTBE, 6 measurements (182, 170, 159, 155, 126 and 98 µg m⁻³ for subjects #6514, #2031,
#1551, #4350, #1002 and #7949, respectively, Fig. 4B). Five of these subjects (not #7949)
reported pumping gasoline into a car, or breathing fumes or using gasoline.

Fig. 2. Observed and fitted distributions for PERC. Otherwise as Fig. 1.
Fig. 3. Observed and fitted distributions for chloroform. No outliers are removed from the maximum Gumbel distribution. Otherwise as Fig. 1.
Chloroform and TCE did not show obvious outliers. Interestingly, only four subjects
had multiple outliers (#1551 for toluene and MTBE; #2002 for toluene and MTBE;
#4514 for ethyl benzene and o-xylene; #8801 for m,p-xylene and o-xylene). In
addition, subject #5359, who had a high benzene exposure, had the second highest
chloroform concentration. Several of these concentrations are extremely high and
indicate the presence of very strong and local sources, e.g., p-DCB concentrations of
> 1000 µg m⁻³ are likely due to the use of mothballs and deodorizers. If such exposures
are infrequent, then the calculated lifetime exposures and risks may not be excessive.
Unfortunately, the NHANES dataset does not allow an estimate of the frequency of such
events. BTEX and TVOC₁₀ did not show outliers other than the two cases (subjects
#4076 and #3852) removed at the onset.
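One of the two cues used above, a highest measurement that exceeds the next highest by about 2-fold, can be turned into a simple screening rule. The function below is a hypothetical illustration and not the authors' procedure, which also relied on visual deviations from the fitted Gumbel line. In the example, 659 and 490 µg m⁻³ are the two PERC outliers from the text; the lower values are invented.

```python
from typing import List, Sequence

def flag_by_gap(values: Sequence[float], min_ratio: float = 2.0,
                top_k: int = 10) -> List[float]:
    """Flag the values lying above the lowest large gap among the top_k measurements.

    A 'large gap' is a ratio >= min_ratio between consecutive sorted values.
    """
    ranked = sorted(values, reverse=True)[: top_k + 1]
    cut = 0
    for i in range(len(ranked) - 1):
        if ranked[i + 1] > 0 and ranked[i] / ranked[i + 1] >= min_ratio:
            cut = i + 1  # everything ranked above this gap is flagged
    return ranked[:cut]

# 659 and 490 are from the text; the remaining values are hypothetical.
perc_top = [659.0, 490.0, 90.0, 77.0, 60.0, 45.0]
flagged = flag_by_gap(perc_top)
print(flagged)
```

A gap rule of this kind would not flag a cluster of similar high values, such as the toluene outliers listed above, which is one reason the Gumbel plot is needed as a second check.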
After removing these measurements, the fit of the Gumbel distribution improved
considerably, and most R² values exceeded 0.75 with the exceptions of toluene and TCE.
Results for chloroform and TCE were unchanged since no data were removed (the
rationale and approach to such selective data censoring are discussed in the Discussion).
Still, the top decile may not represent true extrema, especially if many measurements
fall below MDLs. Thus, we refit the Gumbel distribution to the top 5% of the data that
exceeded MDLs. This improved the fit for all VOCs, especially for TCE, for which the R²
jumped to 0.88. Removal of outliers further improved fits, giving R² ≥ 0.85 for all VOCs
except MTBE. Removing the top 8 MTBE measurements (2 additional points, rather than
just the 6 noted earlier) improved MTBE's R² to 0.87. Thus, with appropriate delineation
of extrema and exclusion of outliers, extreme values can be closely modeled.
4. Discussion
Measurements of environmental pollutants such as those in the
NHANES VOC exposure database reflect multiple circumstances that
may be classified into four groups based on the capabilities of the
monitoring method: (1) values falling below method detection limits
(MDLs), which are frequently assigned an estimated or imputed value,
e.g., 1/2 MDL; (2) detections or “traces” exceeding MDLs but still below
quantitation limits (e.g., 10 σ), that can only be imprecisely deter-
mined; (3) values within the normal linear range of the instrument;
and (4) “over-range” measurements that are likely to be under-re-
ported due to saturation or other non-linear effects. Reported mea-
surements are also prone to errors in the collection, analysis, data
entry and other factors. Measurements also may be classified into four
groups with respect to the phenomena that underlie the pollution or
the pollutant “event” during the measurement period: (1) an absence
of the pollutant; (2) generally low or “background levels” that arise
due to contributions from distant or “regional” emission sources; (3)
moderate-to-high concentrations from “local” or strong emission
sources that are well-dispersed; and (4) occasional very high concentration
“hits” yielding “extrema” due to “near-field” impacts, exceptionally
strong sources, or a combination of moderate-to-strong
sources and unfavorable dispersive conditions. For pollutants where
MDLs are low, measurements often reflect contributions from both
background and local sources. A conceptual understanding of these
groupings and at least some quantification of the applicable con-
centration ranges, which do not have precise boundaries, are necessary
to properly interpret measurements and distributions, including the
identification of outliers. We note that few laboratories or investigators
report performance measures that include limit of quantitation and
linear dynamic range. Also, all VOC sampling techniques have
limitations, and partial saturation of the adsorbents in the passive
samplers used in NHANES will reduce their sampling uptake rate at
Fig. 4. Observed and fitted distributions for MTBE. “Fitted Distribution1” is the Weibull distribution, the best fit. “Fitted Distribution2” is log-normal, shown for comparison. Otherwise as Fig. 1.
Table 5
Parameters of extreme distributions, including slopes, intercepts (IC), and R² from Eq. 1
               Top 10th percentile                     Top 5th percentile                      No. of
               With outliers       Without outliers    With outliers       Without outliers    outliers
VOCs           Slope    IC    R²   Slope    IC    R²   Slope    IC    R²   Slope    IC    R²   removed
Benzene            8    18  0.79       7    18  0.83      12     9  0.85      11    11  0.86   1
Toluene          177   107  0.61      67   105  0.69     389  −295  0.87     180   −45  0.92   6
Ethyl benzene     61    25  0.38      25    27  0.75     157  −147  0.59      53   −14  0.94   1
m,p-Xylene        68    79  0.85      66    80  0.85     104    22  0.95     100    27  0.95   1
o-Xylene          22    26  0.78      15    26  0.84      38    −3  0.91      27    11  0.94   3
BTEX             237   277  0.78     237   277  0.78     446  −124  0.95     446  −124  0.95   0
p-DCB            210   114  0.70     143   122  0.89     395  −260  0.79     219     2  0.93   4
Chloroform         6    10  0.89       6    10  0.89       8     6  0.94       8     6  0.94   0
TCE               41     9  0.62      41     9  0.62     119  −170  0.88     119  −170  0.88   0
PERC              48    15  0.45      19    17  0.81     130  −149  0.70      39   −13  0.96   2
MTBE              25    21  0.65      11    20  0.82      56   −42  0.70      15    20  0.72   6
TVOC₁₀           283   414  0.82     283   414  0.82     441   143  0.96     441   143  0.96   0
Results shown for data with and without outliers. ‘Top 10th percentile’ is the top 10% of all data, and ‘Top 5th percentile’ is the top 5% of the observations above MDLs. Number of
outliers shown at right.
927C. Jia et al. / Environment International 34 (2008) 922–931
long averaging times and at high concentrations, leading to negative
biases at high concentrations or sampling times (Jia et al., 2007).
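The four method-capability groups listed at the start of this section, together with the common 1/2 MDL substitution for non-detects, can be expressed as a short sketch. The function names and the three thresholds (`mdl`, `loq`, `upper_linear`) are hypothetical inputs chosen for illustration; NHANES does not report a limit of quantitation or a linear dynamic range, so the values passed in the example are placeholders.

```python
def classify_measurement(conc, mdl, loq, upper_linear):
    """Assign a concentration to one of the four method-capability groups.

    mdl: method detection limit; loq: limit of quantitation;
    upper_linear: top of the instrument's linear range (all hypothetical inputs).
    """
    if conc < mdl:
        return "below MDL"
    if conc < loq:
        return "trace"          # detected, but only imprecisely determined
    if conc <= upper_linear:
        return "linear range"
    return "over-range"         # likely under-reported due to saturation

def impute_below_mdl(conc, mdl):
    """Replace a non-detect with 1/2 MDL; detected values pass through."""
    return 0.5 * mdl if conc < mdl else conc

# Example with placeholder thresholds (same units as the concentration).
groups = [classify_measurement(c, mdl=0.5, loq=1.5, upper_linear=500.0)
          for c in (0.2, 1.0, 40.0, 6280.0)]
```

The boundaries between groups are not sharp in practice, so a classification of this kind is only a starting point for interpreting a distribution and flagging possible outliers.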
4.1. Probability distributions
In general, pollutant concentrations and exposures are random in
nature as they depend upon a number of variable factors, e.g., emis-
sion rates, microenvironmental characteristics, time activity budgets,
and human activities. Often, basic information regarding VOC
concentrations or exposures is neither available, generalizable, nor
certain. This is in strong contrast to distributions of other variables
used in exposure and risk calculations, e.g., dosimetric parameters
(e.g., body weight, intake rate) and time activity durations (Sexton
et al., 1992; Finley et al., 1994), which are well-characterized and easily
bounded. Notably, the variation in concentrations or exposures can
dwarf the variation in other parameters (with the possible exception
of toxicity parameters like cancer-slope factors). Further variation in
results of measurement programs can be caused by a host of factors,
including sampling and analysis methods, sampling time, study popu-
lation, season, weather, etc.
4.1.1. Full distributions
Early studies of probability distributions focused on ambient mea-
surements of criteria pollutants in cities, e.g., carbon monoxide (Ott,
1979) and sulfur dioxide (Berger et al., 1982), and concentrations at all
averaging times were usually found to approximate log-normal
distributions (Larsen, 1969; Ott, 1995). In workplace settings, 8-hour
time-weighted average concentrations also have been frequently
represented using log-normal distributions (Nicas and Jayjock, 2002).
Relatively few studies have examined distributions of VOC concentra-
tions in non-occupational settings. Log-normal distributions were
assigned to ten VOCs measured in 427 indoor air samples collected in
residences in Denver, Colorado (Foster et al., 2003). Gamma distribu-
tions provided the best fit to concentrations of 28 VOCs (including 11
aldehydes) measured in 1417 Japanese homes (Park and Ikeda, 2004).
A recent U.S. review reported the log-normal distribution as the best
fit for 9 VOCs in most microenvironments, and the Gamma dis-
tribution for chloroform in dining rooms (Loh et al., 2007). As in the
ambient, workplace and indoor studies, log-normal distributions
provided only an approximate fit, at best, for most of the ten VOCs
examined in the present study. The fit was not always very good,
especially for the less frequently detected compounds, and statistical
tests of agreement usually failed.
Fitting and analyzing distributions of air pollutant
concentrations has not become routine practice in exposure assessment.
To determine the underlying distribution, measurements are
generally matched to theoretical distributions using three steps:
selection of a candidate distribution; estimation of its parameters; and
assessment of the goodness-of-fit (US EPA, 1997). Despite the availability
of automated software that can rapidly perform such analyses
(e.g., Crystal Ball, Decisioneering, Inc., Denver, CO, USA; @Risk,
Palisade Corporation, Ithaca, NY, USA; Risk Solver, Frontline Systems,
Inc., Incline Village, NV, USA), it appears that the most common
approach continues to be the assumption of log-normality. Thus,
medians or geometric means are used as a measure of central ten-
dency, and data are log-transformed for statistical inference testing.
These statistics give little if any information regarding extrema, and the
log-normal distributions rarely meet goodness-of-fit criteria.
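The three fitting steps named above (candidate selection, parameter estimation, goodness-of-fit assessment) can be sketched for the log-normal case. This is an illustration only: it estimates parameters from the log-transformed data and uses a Kolmogorov–Smirnov distance as a simple stand-in for the A–D tests reported in this paper. The function names are hypothetical, and the sketch does not reproduce the commercial packages listed above.

```python
import math

def fit_lognormal(values):
    """Step 2: estimate (mu, sigma) of ln(x) from strictly positive values."""
    logs = [math.log(v) for v in values]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / (n - 1))
    return mu, sigma

def lognormal_cdf(x, mu, sigma):
    """Cumulative probability of the candidate log-normal distribution."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(values, cdf):
    """Step 3: largest gap between the empirical and the fitted CDF."""
    ordered = sorted(values)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Step 1 is the choice of the log-normal as the candidate distribution.
sample = [math.exp(z) for z in (0.0, 1.0, 2.0, 3.0, 4.0)]
mu, sigma = fit_lognormal(sample)
d = ks_statistic(sample, lambda x: lognormal_cdf(x, mu, sigma))
geometric_mean = math.exp(mu)
```

A small distance indicates agreement over the bulk of the data, but because it is dominated by the center of the distribution, it says little about how well the upper tail is represented.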
4.1.2. Extreme distributions
As we noted at the outset, few studies have used a sampling design
or attained a sample size that is sufficient to characterize population
exposure. Importantly, extrema can only be derived from large studies.
Extreme values generally do not follow the distribution derived
from the full range of the data. In many cases, a particular distribution
or several distributions may reasonably approximate the middle 80%
of the values; however, it may be inappropriate for the top 5 or 10% of
the data (Haas, 1997). Extreme concentrations of air pollutants were
found to follow the Gumbel distribution when the full range was log-
normally distributed (Singpurwalla, 1972). We recently found that
Gumbel distributions were appropriate for the top decile concentra-
tions of 23 VOCs and carbonyls measured in Michigan, U.S. (Le et al.,
2007). The present study confirms that Gumbel distributions can be
used to describe the extreme values (e.g., top 10th or 5th percentiles)
of personal VOC exposures, with the caveat that a small number of
outliers will still exceed the fitted distribution. As noted by Ott (1995),
the upper tail of a distribution reflects a stochastic process, and it is
insensitive to the type of the hypothetical distribution, regardless of the
original distribution producing the tail. Thus, a variety of distributions
can fit the extreme values equally well. While the NHANES VOC
extrema were well fit by the Gumbel distribution, we found that
Gamma and Weibull distributions were selected for the top 10th
percentile data, and Gamma and Beta distributions for the top 5th
percentile data on the basis of A–D tests (data not shown). One
advantage of the Gumbel distribution, however, is that its linear plot
helps in the identification of outliers.
Our experience analyzing the NHANES data provides guidance in
fitting extrema. First, a large sample is required, and it is advantageous
if most measurements exceed MDLs. Possibly 5% or fewer of the
observed values above MDLs may be considered extrema. Second,
distribution fitting cannot depend solely on goodness-of-fit tests, but
also on subjective judgment. Third, while the Gumbel (and other)
distributions are extreme value distributions, they may not fit outliers;
thus, these points must still be identified and removed, and an
iterative approach may be the best option. Such data censoring also
may be necessary to improve model fit for both full and extreme value
distributions. Such actions often and justifiably are criticized as
“cherry picking”. We recognize the uncertainty of the data, and
believe that most of the deleted values represent unusual cases.
However, relatively common situations such as refueling a vehicle,
smoking, and wearing freshly dry-cleaned clothes need more
investigation to see if they can produce the very high measurements
encountered. Still, the 24 censored measurements, plus the 2 censored
cases representing 20 additional measurements, represent a very
small percentage (0.7%) of the 6600 VOC measurements in NHANES.
For most of these measurements, our initial examination of the
NHANES survey data did not show anything unusual, though this
investigation is ongoing. Unfortun ately, in a study design like
NHANES, follow-up interviews or repeated measurements to try to
understand the exposure source and the reliability of the measure-
ment are not possible.
4.2. Comparison of NHANES and other exposure estimates
For some years, it has been known that exposure estimates derived
using personal sampling often exceed exposures based on indoor
monitoring, which in turn exceed measurements using outdoor or
ambient monitoring. This can apply to VOCs (Sexton et al., 2004;
Edwards et al., 2005), as well as other pollutants, e.g., particulate
matter (Wallace, 2000). While this “personal pollution cloud” or
“Linus effect” (after the comic strip character) is becoming better
recognized, its strength and variability among individuals have not
been quantified. Due to its significance, NHANES measurements
should only be compared to other studies that use personal sampling.
For VOCs, these include the Total Exposure Assessment Method
(TEAM) studies in the 1980s (Wallace, 2001), the National Human
Exposure Assessment Survey (NHEXAS) in the late 1990s, and more
recently, the Relationships of Indoor, Outdoor and Personal Air
(RIOPA) study. However, these (and other mostly smaller) studies
are not necessarily representative of the U.S. population, and none
used a population-based sampling strategy. Thus, these comparisons
may reflect local or regional differences in VOC exposure.
We selected three U.S. studies that measured personal VOC
exposures that were more or less contemporaneous with NHANES.
These were conducted in Minnesota (MN) by Sexton et al. (2004), in
Maryland (MD) by Payne-Sturges et al. (2004), and in New Jersey,
Texas and California (NJ/TX/CA; Weisel et al., 2005b). We also included
the slightly earlier (mid-1990s) NHEXAS study (Clayton et al., 1999).
Table 6 compares average, median and 90th percentile (95th
percentile for NJ/TX/CA) concentrations reported in these studies.
Measurements from all studies show the very strong effect of non-
normality, e.g., means are typically 2 to 3 times higher than medians
(the NJ/TX/CA study shows a 30-fold difference for p-DCB). Largely
due to the influence of high concentrations (including potential
outliers), and to an extent due to the limited sample sizes (especially
in the MD study), it is clear that averages do not provide robust
measures of central tendency. Thus, the following discussion
emphasizes non-parametric statistics.
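The degree of skew noted above can be checked directly from the means and medians in Table 6. The sketch below copies four (mean, median) pairs from that table; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# (mean, median) pairs copied from Table 6.
table6 = {
    ("NHANES", "Benzene"): (5.3, 2.8),
    ("NHANES", "Toluene"): (36.4, 17.4),
    ("RIOPA", "p-DCB"): (56.7, 1.9),
    ("Minneapolis", "PERC"): (31.8, 0.9),
}

def mean_to_median_ratio(mean, median):
    """Ratio well above 1 signals a right-skewed (non-normal) distribution."""
    return mean / median

ratios = {key: round(mean_to_median_ratio(*pair), 1)
          for key, pair in table6.items()}
```

The benzene and toluene ratios are close to 2, while the p-DCB ratio for the NJ/TX/CA study is close to 30, consistent with the 30-fold difference mentioned in the text.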
Of the four reported VOCs, median concentrations in NHEXAS
significantly exceeded those in the more recent studies. This is
unsurprising given the general downward trend in indoor and
outdoor VOC concentrations (Hodgson and Levin, 2003). In the three
other studies, medians and upper percentile statistics were similar to
NHANES. Only three compounds showed sizable differences:
• p-DCB: In MN, levels were very low (examining 50th and 90th percentiles),
about 4 to 6 times lower than the NHANES data. In NJ/TX/CA,
medians were comparable to NHANES, but the 95th percentile
concentration was extremely high (314 µg m⁻³), twice that in
NHANES (95th percentile concentration is 142 µg m⁻³, Table 2).
• TCE: NHANES data showed a median TCE level 1.7–2.6 times higher
than those in the three other studies.
• MTBE: In MD, the median MTBE level was nearly 5 times higher than
the NHANES results, while the 90th percentile concentration was 6
times higher. The NJ/TX/CA statistics were 3 to 4 times higher. In
both cases, these studies emphasized highly traffic-exposed individuals;
moreover, MTBE may be widely used in these study areas in
comparison to NHANES, which included areas where it was not
used. After censoring non-detected MTBE measurements, the
NHANES data gave a median of 6.2 µg m⁻³, just slightly lower than levels in
the NJ/TX/CA and MD studies. Note that this comparison is meaningful
only if it is assumed that all or most measurements in MTBE
usage areas would result in detections, which did occur for the other
gasoline components (MTBE was not reported in the MN study).
This comparison reveals several important findings. First, larger
though localized studies can give statistics that are representative or
nearly so, judged on the basis of their similarity to the NHANES data,
which is population-based and thus should be representative. This
mainly applies to the BTEX compounds that are ubiquitous. Second,
there is a need for additional and probably improved measurements of
chlorinated compounds, especially since some or much of the inter-
study variation seems likely to arise from MDL effects (lower MDLs are
needed). Finally, as noted earlier, when a pollutant like MTBE is used in
only a subset of the region studied, the resulting statistics and derived
distributions may not be reliable or nationally representative.
4.3. Importance and applications
The analysis of the NHANES data suggests that representing the full
range of VOC exposures requires a combined approach, namely, a log-
normal (or other) distribution may be used for low to moderately high
concentrations, and an extreme value distribution for the very highest
(≥ 95th percentile) concentrations. It is the highest concentrations and
exposures that may need control or mitigation, or drive policies to this
effect, thus these values require further attention. Also, the shift from
deterministic to probabilistic analyses, such as Monte Carlo methods,
requires appropriate distributions of exposure parameters (US EPA,
1995), and fitting and assigning probability distribution is a first and
critical step (Haas, 1997; Hamed and Bedient, 1997; Thompson, 1999b).
Log-normal distributions are not always the first choice, and several
VOCs appear to follow other distributions. All of the full distributions,
that is, those that attempt to match all of the data, are likely to lead to
the wrong conclusions concerning the level and frequency of extrema.
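One way to picture the combined approach described above is a two-part quantile function: a log-normal body below a split probability and a Gumbel line above it. The sketch below is a hypothetical construction, not the method used in this paper. It assumes the tail probability is rescaled to the interval (0, 1) before the Gumbel reduced variate is computed, and it makes no attempt to force the two parts to meet at the split point.

```python
import math
from statistics import NormalDist

def combined_quantile(p, mu, sigma, slope, intercept, p_split=0.95):
    """Concentration at cumulative probability p under a two-part model.

    mu, sigma: log-normal parameters for the body (of ln concentration).
    slope, intercept: Gumbel-plot line fitted to the tail (assumed inputs).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if p < p_split:
        # Body: log-normal quantile exp(mu + sigma * z_p).
        return math.exp(mu + sigma * NormalDist().inv_cdf(p))
    # Tail: rescale p to a probability within the tail, then read the Gumbel line.
    p_tail = (p - p_split) / (1.0 - p_split)
    p_tail = min(max(p_tail, 1e-9), 1.0 - 1e-9)   # keep both logarithms defined
    return intercept + slope * (-math.log(-math.log(p_tail)))

# Illustrative parameter values only; they are not fitted to NHANES data.
median_conc = combined_quantile(0.50, mu=1.0, sigma=0.5, slope=8.0, intercept=6.0)
upper_conc = combined_quantile(0.99, mu=1.0, sigma=0.5, slope=8.0, intercept=6.0)
```

In a Monte Carlo analysis, a function of this kind could be fed uniform random numbers to draw exposures, but any discontinuity at the split point would need to be examined before the results are used.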
4.4. Study limitations
We could not stratify the data to isolate regions where MTBE is
used in gasoline, and thus a single distribution very poorly described
MTBE concentrations. There are no replicates in the NHANES dataset,
uncertainty estimates for individual data points, or opportunities to further
investigate outliers. Exposure assumptions were simplified, i.e., short-
term NHANES measurements were extrapolated to estimate lifetime
exposures without adjustment for trends and uncertainties. We also
note that the risk levels and reference concentrations used are pro-
tective guidelines, not standards. As concentrations of many VOCs are
decreasing, the fitted distributions and other statistics in the present
paper will likely need updates in future years. Our identification of the
factors that explain the variation in the dataset is tentative, and might
change with additional information. Finally, it should be recognized
that due to correlations among VOCs, univariate analyses cannot be
Table 6
Results from selected studies of personal exposure to VOCs in the U.S. since 1990, and comparison to NHANES (concentrations in µg m⁻³)
Study                  Study area                                     Period      Sample size
NHANES                 U.S.                                           1999–2000   665
RIOPA                  Elizabeth, NJ; Houston, TX; Los Angeles, CA    1999–2001   545
NHEXAS                 IL, IN, OH, MI, MN, WI                         1995–1997   386
Minneapolis study      Minneapolis, MN                                1999        288
South Baltimore study  South Baltimore, MD                            2000–2001   37

               NHANES                RIOPA                 NHEXAS                Minneapolis, MN       South Baltimore, MD
Statistics       Mean Median    Q90    Mean Median    Q95    Mean Median    Q90    Mean Median    Q90    Mean Median    Q90
Benzene           5.3    2.8   13.5     3.6    2.4   10.7     7.5    5.4   13.7     7.6    3.2   18.3     4.1    2.9    7.3
Toluene          36.4   17.4   59.8    19.2   12.2   50.2      NA     NA     NA    30.3   17.1   62.9    26.8   14.7   41.3
Ethyl benzene     8.4    2.6   14.2     2.8    1.7    7.5      NA     NA     NA     5.6    2.2   11.8     4.4    2.5    9.5
m,p-Xylene       18.8    6.5   38.7     8.1    4.4   22.7      NA     NA     NA    21.0    7.4   48.6   17.8ᵃ   9.5ᵃ  30.9ᵃ
o-Xylene          6.5    2.4   14.1     2.9    1.7    8.1      NA     NA     NA     6.8    2.3   15.6      NA     NA     NA
p-DCB            27.3    1.7   34.8    56.7    1.9  314.0      NA     NA     NA     3.2    0.4    5.1      NA     NA     NA
Chloroform        2.7    1.1    5.9     4.2    1.0    6.3     2.3    2.0    4.5     1.5    1.0    3.9     4.8    2.3    7.8
TCE               3.4    0.3    1.2     1.0    0.1    1.9     5.3    0.6    6.0     1.0    0.2    1.4     0.4    0.2    0.8
PERC              5.2    0.7    6.6     7.1    0.6    7.2    31.9    2.0   10.8    31.8    0.9    7.0     3.0    0.9    8.2
MTBE              5.2    0.6   10.7    14.8    7.1   42.7      NA     NA     NA      NA     NA     NA    24.7    8.8   66.6
RIOPA is cited in Weisel et al., 2005b, NHEXAS in Clayton et al., 1999, Minneapolis study in Sexton et al., 2004, and South Baltimore in Payne-Sturges et al., 2004. “Q90” and “Q95” are
90th and 95th percentile concentrations, respectively. “NA” is not available in the indicated study.
ᵃ Includes m-, p- and o-xylenes.
used to represent VOC mixtures, which represent a challenging public
health issue (US EPA, 2000; ATSDR, 2000).
5. Conclusions
This study explored the distribution of personal exposure measurements
of VOCs, and its findings are relevant to health risk assessment
and risk management. It is the first study to characterize
VOC exposures at the national level using a population-based
sampling strategy; thus, results should be broadly representative of
non-occupational VOC exposures throughout the U.S. Eight of the ten
VOCs monitored using personal sampling of 669 individuals in the
NHANES dataset were detected in most samples. Exposures among
study participants showed tremendous variability, ranging from
below method detection limits to as high as 6280 µg m⁻³ for individual
compounds and 14,287 µg m⁻³ as the sum of the ten VOCs in
the NHANES dataset. Correlations and factor analysis identified four
groups of possible emission sources: gasoline vapors and exhaust; tap
water disinfection products; cleaning products; and gasoline additive
(MTBE). Log-normal distributions were assigned to benzene, toluene,
ethyl benzene, xylenes, chloroform and PERC with moderate-to-good
agreement to observations. Different distributions were assigned to p-
DCB and TCE (Pareto distributions) and MTBE (Weibull distribution),
all with considerably poorer fit. Extrema were fit to the maximum
Gumbel distribution, and reasonable agreement was found for most
compounds, especially after censoring outliers and defining extrema
as the top 5% of measurements above MDLs. The dataset contained a
small fraction (<1%) of extremely high concentrations, considered to
be outliers as they fit neither the full nor the extreme value distributions.
The NHANES exposure database suggests that log-normal
distributions are not always the first choice for distributions, and that
none of the standard distributional forms provided a close match to the
levels and frequencies of the highest exposure concentrations that
pose the greatest risks.
Acknowledgement
This work was performed under the support of the Mickey Leland
National Urban Air Toxics Research Center, Grant RFA 2006-01,
entitled “The relationship between personal exposures to VOCs and
behavioral, socioeconomic, demographic characteristics: analysis of
the NHANES VOC project dataset.”
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in
the online version, at doi:10.1016/j.envint.2008.02.002.
References
ATSDR. Toxicological Profile for Xylene. Atlanta, GA: US Agency for Toxic Substances and
Disease Registry; 2005.
ATSDR. Guidance manual for the Assessment of Joint Toxic Action of Chemical Mixtures.
Atlanta, GA: US Agency for Toxic Substances and Disease Registry; 2000.
Barnett V. Probability plotting methods and order statistics. Appl Stat 1975;24:95–108.
Batterman S, Jia C, Hatzivailis G. Migration of volatile organic compounds from attached
garages to residences: a major exposure source. Environ Res 2007;104:224–40.
Berger A, Melice JL, Demuth CL. Statistical distributions of daily and high atmospheric
SO₂ concentrations. Atmos Environ 1982;16:2863–77.
CDC (Centers for Disease Control and Prevention). National Health and Nutrition
Examination Survey 1999–2000 Public Data Release File Documentation. Hyatts-
ville, MD: U.S. Department of Health and Human Services; 2006a. .
gov/nchs/data/nhanes/gendoc.pdf. Accessed 20 May 2007.
CDC (Centers for Disease Control and Prevention). National Health and Nutrition
Examination Survey 1999–2000 Data Documentation, Lab 21-Volatile Organic
Compounds. Hyattsville, MD: U.S. Department of Health and Human Services;
2006b. Accessed
20 May 2007.
CDC (Centers for Disease Control and Prevention). National Health and Nutrition
Examination Survey Questionnaire (or Examination Protocol, or Laboratory
Protocol). Hyattsville, MD: U.S. Department of Health and Human Services;
2006c. Accessed 20 May 2007.
Chung CW, Morandi MT, Stock TH, Afshar M. Evaluation of a passive sampler for volatile
organic compounds at ppb concentrations, varying temperatures, and humidities
with 24-h exposures. 1. Description and characterization of exposure chamber
system. Environ Sci Technol 1999a;33:3661–5.
Chung CW, Morandi MT, Stock TH, Afshar M. Evaluation of a passive sampler for volatile
organic compounds at ppb concentrations, varying temperatures, and humidities
with 24-h exposures. 2. Sampler performance. Environ Sci Technol 1999b;33:
3666–71.
Clayton CA, Pellizzari ED, Whitmore RW, Perritt RL, Quackenboss JJ. National Human
Exposure Assessment Survey (NHEXAS): distributions and associations of lead, arsenic
and volatile organic compounds in EPA Region 5. J Expo Anal Environ Epidemiol
1999;9:381–92.
Edwards RD, Schweizer C, Jantunen M, Lai HK, Bayer-Oglesby L, Katsouyanni K, et al.
Personal exposures to VOC in the upper end of the distribution — relationships to
indoor, outdoor and workplace concentrations. Atmos Environ 2005;39:2299–307.
Finley BL, Paustenbach DJ. The benefits of probabilistic exposure assessment: three case
studies involving contaminated air, water, and soil. Risk Anal 1994;14:53–73.
Finley B, Proctor D, Scott P, Harrington N, Paustenbach D, Price P. Recommended
distributions for exposure factors frequently used in health risk assessment. Risk
Anal 1994;14:533–51.
Foster SJ, Kurtz JP, Woodland AK. Background indoor air risks at selected residences in
Denver Colorado; 2003. Accessed
12 June 2007.
Haas CN. Importance of distributional form in characterizing inputs to Monte Carlo risk
assessments. Risk Anal 1997;17:107–13.
Hamed MM, Bedient PB. On the effect of probability distributions of input variables in
public health risk assessment. Risk Anal 1997;17:97–105.
Hodgson AT, Levin H. Volatile organic compounds in indoor air: a review of
concentrations measured in North America since 1990. Berkeley, CA: Lawrence
Berkeley National Laboratory; 2003. Report LBNL-51715.
Jia C, Batterman S, Godwin C. Continuous, intermittent and passive sampling of airborne
VOCs. J Environ Monit 2007;9:1220–30.
Larsen RI. A new mathematical model of air pollutant concentration averaging time and
frequency. J Air Pollut Control Assoc 1969;19:24–30.
Le HQ, Batterman SA, Wahl RL. Reproducibility and imputation of air toxics data.
J Environ Monit 2007;9:1358–72.
Lide DR, editor. CRC Handbook of Chemistry and Physics. Boca Raton, FL: CRC Press;
2005.
Loh MM, Levy JI, Spengler JD, Houseman EA, Bennett DH. Ranking cancer risks of
organic hazardous air pollutants in the United States. Environ Health Perspect
2007;115:1160–8.
Mohamed M, Kang D, Aneja V. Volatile organic compounds in some urban locations in
United States. Chemosphere 2002;47:863–82.
NRC (National Research Council). Human Exposure Assessment of Airborne Pollutants:
Advances and Opportunities. Washington, DC: National Academy of Sciences; 1991.
Nicas M, Jayjock M. Uncertainty in exposure estimates made by modeling versus
monitoring. AIHAJ 2002;63:275–83.
Nieuwenhuijsen M, Paustenbach D, Duarte-Davidson R. New developments in exposure
assessment: the impact on the practice of health risk assessment and epidemio-
logical studies. Environ Int 2006;32:996–1009.
Ott WR. Environmental Statistics and Data Analysis. Boca Raton, FL: CRC Press, Inc.;
1995.
Ott WR. Testing the Validity of the Lognormal Probability Model: Computer Analysis of
Carbon Monoxide Data from U.S. Cities. Washington, DC: US Environmental
Protection Agency; 1979. EPA-600/4-79-040.
Park JS, Ikeda K. Exposure to the mixtures of organic compounds in homes in Japan.
Indoor Air 2004;14:413–20.
Payne-Sturges DC, Burke TA, Breysse P, Diener-West M, Buckley TJ. Personal exposure
meets risk assessment: a comparison of measured and modeled exposures and
risks in an urban community. Environ Health Perspect 2004;112:589–98.
Raw GJ, Coward SKD, Brown VM, Crump DR. Exposure to air pollutants in English
homes. J Expo Anal Environ Epidemiol 2004;14:S85–94.
Roberts EM. Review of statistics of extreme values with applications to air quality data.
Part I. Review. J Air Pollut Control Assoc 1979a;29:632–7.
Roberts EM. Review of statistics of extreme values with applications to air quality data.
Part II. Application. J Air Pollut Control Assoc 1979b;29:733–40.
Saarela K, Tirkkonen T, Laine-Ylijoki J, Jurvelin J, Nieuwenhuijsen M, Jantunen M.
Exposure of population and microenvironmental distributions of volatile organic
compound concentrations in the EXPOLIS study. Atmos Environ 2003;37:5563–75.
Sack TM, Steele DH, Hammerstrom K, Remmers J. A survey of household products for
volatile organic-compounds. Atmos Environ Part A — Gen Topics 1992;26:1063–70.
Sexton K, Selevan S, Wagener D, Lybarger J. Estimating human exposures to
environmental pollutants: availability and utility of existing databases. Arch Environ
Health 1992;47:398–407.
Sexton K, Adgate JL, Ramachandran G, Pratt GC, Mongin SJ, Stock TH, et al. Comparison
of personal, indoor, and outdoor exposures to hazardous air pollutants in three
urban communities. Environ Sci Technol 2004;38:423–30.
Sielken RL, Valdez-Flores C. Probabilistic risk assessment's use of trees and distributions
to reflect uncertainty and variability and to overcome the limitations of default
assumptions. Environ Int 1999;25:755–72.
Singpurwalla ND. Extreme values from a lognormal law with applications to air
pollution problems. Technometrics 1972;14:703–11.
Thompson KM. Software review of distribution fitting programs: Crystal Ball and BestFit
Add-In to @RISK. Hum Ecol Risk Assess 1999a;5:501–8.
Thompson KM. Developing univariate distributions from data for risk analysis. Hum
Ecol Risk Assess 1999b;5:755–83.
US EPA (US Environmental Protection Agency). Guidelines for Exposure Assessment
(FRL-4129-5), vol. 104. Washington, DC: Federal Register; 1992. p. 22888–938.
US EPA (US Environmental Protection Agency). Guidance for risk characterization.
Washington, DC: Science Policy Council; 1995.
US EPA (US Environmental Protection Agency). Guiding Principles for Monte Carlo
Analysis. Washington, DC: Risk Assessment Forum; 1997. EPA/630/R-97/001.
US EPA (US Environmental Protection Agency). Supplementary Guidance for Conducting
Health Risk Assessment of Chemical Mixtures. Washington, DC: Risk
Assessment Forum; 2000. EPA/630/R-00/002.
US EPA (US Environmental Protection Agency). IRIS Database for Risk Assessment; 2007.
Accessed 4 June 2007.
Wallace LA. Correlations of personal exposure to particles with outdoor air measurements:
a review of recent studies. Aerosol Sci Tech 2000;32:15–25.
Wallace LA. Human exposure to volatile organic pollutants: implications for indoor air
studies. Annu Rev Energy Environ 2001;26:269–301.
Wallace LA, Pellizzari E, Leaderer B, Zelon H, Sheldon L. Emissions of volatile organic
compounds from building-materials and consumer products. Atmos Environ 1987;21:
385–93.
Weisel CP, Kim H, Haltmeier P, Klotz JB. Human respiratory uptake of chloroform and
haloketones during showering. J Expo Anal Environ Epidemiol 1999;15:6–16.
Weisel CP, Zhang J, Turpin BJ, Morandi MT, Colome S, Stock TH, et al. The relationships of
indoor, outdoor and personal air (RIOPA) study: study design, methods and initial
results. J Expo Anal Environ Epidemiol 2005a;15:123–37.
Weisel CP, Zhang J, Turpin BJ, Morandi MT, Colome S, Stock TH, et al. Relationships of
Indoor, Outdoor, and Personal Air (RIOPA): Part I. Collection Methods and
Descriptive Analyses. Houston, TX: Health Effects Institute, Boston, MA and National
Urban Air Toxics Research Center; 2005b. />id=31. Accessed 18 October 2007.
Predictors of personal air concentrations of chloroform among US adults in
NHANES 1999–2000
ANNE M. RIEDERERᵃ, SCOTT M. BARTELLᵃ,ᵇ AND P. BARRY RYANᵃ
ᵃ Department of Environmental and Occupational Health, Rollins School of Public Health, Emory University, Atlanta, Georgia, USA
ᵇ Program in Public Health, University of California, Irvine, California, USA
Keywords: chloroform personal air inhalation risk.
Journal of Exposure Science and Environmental Epidemiology (2009) 19, 248–259
© 2009 Nature Publishing Group All rights reserved 1559-0631/09/$32.00
www.nature.com/jes
Riederer et al: Reprinted by permission from Macmillan Publishers Ltd: Journal of Exposure Science & Environmental
Epidemiology, 19(3): 248-259, 2009.
Volunteer studies suggest that showering/bathing with chlorinated tap water contributes to daily chloroform inhalation exposure for the majority of US
adults. We used data from the 1999–2000 US National Health and Nutrition Examination Survey (NHANES) and weighted multiple linear regression to
test the hypothesis that personal exposure microevents such as showering or spending time at a swimming pool would be significantly associated with
chloroform levels in 2–3 day personal air samples. The NHANES data show that eight of 10 US adults are exposed to detectable levels of chloroform.
Median (1.13 µg/m³), upper percentile (95th, 12.05 µg/m³), and cancer risk estimates were similar to those from recent US regional studies. Significant
predictors of log personal air chloroform in our model (R² = 0.34) included age, chloroform concentrations in home tap water, having no windows open
at home during the sampling period, visiting a swimming pool during the sampling period, living in a mobile home/trailer or apartment versus living in a
single family (detached) home, and being Non-Hispanic Black versus Non-Hispanic White, although the race/ethnicity estimates appear influenced by
several outlying observations. Reported showering activity was not a significant predictor of personal air chloroform, possibly due to the wording of the
NHANES shower question. The NHANES measurements likely underestimate true inhalation exposures since subjects did not wear sampling badges
while showering or swimming, and because of potential undersampling by the passive monitors. Research is needed to quantify the potential difference.
Journal of Exposure Science and Environmental Epidemiology (2009) 19, 248–259; doi:10.1038/jes.2008.7; published online 12 March 2008
Introduction
Chloroform is a colorless, volatile liquid that is sparingly soluble
in water and moderately lipophilic (Lide, 1996). Natural
sources including sea water and soil processes account for 90%
of emissions (Keene et al., 1999) while anthropogenic sources
include releases from drinking water and wastewater treatment,
certain industrial processes, cooling towers, and swimming
pools (McCulloch, 2003). Most chloroform in the environment
partitions to air, with the global average atmospheric
concentration estimated to be 73 ng/m³ (McCulloch, 2003).
In mammals, inhaled chloroform is metabolized in the
liver, kidney, and nasal mucosa to trichloromethanol, which
degrades to phosgene (US Environmental Protection Agency
(EPA, 2001a)). Phosgene reacts with nucleophilic groups on
enzymes and proteins to form cytotoxic adducts (EPA,
2001a). There is no current evidence of long-term bioaccumulation
in humans (EPA, 2001a). Although an inhalation
reference concentration has not been published, EPA
published an oral reference dose of 0.01 milligrams per
kilogram body weight per day (mg/kg-d) based on animal
evidence of hepatotoxicity (EPA, 2007). EPA classifies
chloroform as a probable human carcinogen based on animal
studies showing that inhalation or ingestion at cytotoxic
doses produces hepatic and renal neoplasia (EPA, 2001a).
EPA has published an inhalation unit risk of 2.3 × 10⁻⁵ per
µg/m³ and estimated air concentrations of 4 µg/m³,
4 × 10⁻¹ µg/m³, and 4 × 10⁻² µg/m³ at the 1 in 10⁴, 1 in
10⁵, and 1 in 10⁶ cancer risk levels, respectively (EPA, 2007).
The California Environmental Protection Agency (CalEPA)
published an inhalation unit risk of 5.3 × 10⁻⁶ per µg/m³
(CalEPA, 2002). There is limited evidence for mutagenicity
or reproductive effects at doses below those causing systemic
toxicity (EPA, 2001a).
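The risk-specific air concentrations quoted above follow from dividing a target risk level by the unit risk. The short sketch below reproduces that arithmetic for both the EPA and CalEPA values; the function and constant names are illustrative choices, not part of any agency tool, and the published EPA figures appear to be rounded to one significant digit.

```python
# Inhalation unit risks quoted in the text (lifetime excess risk per µg/m³).
EPA_UNIT_RISK = 2.3e-5
CALEPA_UNIT_RISK = 5.3e-6


def concentration_at_risk(target_risk: float, unit_risk: float) -> float:
    """Air concentration (µg/m³) at which unit_risk * concentration equals target_risk."""
    return target_risk / unit_risk


if __name__ == "__main__":
    # Risk levels discussed in the text: 1 in 10^4, 1 in 10^5, 1 in 10^6.
    for target in (1e-4, 1e-5, 1e-6):
        epa = concentration_at_risk(target, EPA_UNIT_RISK)
        calepa = concentration_at_risk(target, CALEPA_UNIT_RISK)
        print(f"risk {target:.0e}: EPA {epa:.2g} µg/m³, CalEPA {calepa:.2g} µg/m³")
```

Because the CalEPA unit risk is roughly four times smaller, the same target risk corresponds to a roughly fourfold higher air concentration under that value.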
The present work was performed at the Department of Environmental
and Occupational Health, Rollins School of Public Health, Emory
University.
1. Abbreviations: AER, air exchange rate; CalEPA, California
Environmental Protection Agency; CDC, US Centers for Disease
Control and Prevention; CI, confidence interval; DHHS, US Department
of Health and Human Services; NHANES, US National Health
and Nutrition Examination Survey; RfD, reference dose; TEAM, Total
Exposure Assessment Methodology; EPA, US Environmental Protection
Agency
2. Address all correspondence to: Dr. Anne M. Riederer, Department of
Environmental and Occupational Health, Rollins School of Public Health,
Emory University, 1518 Clifton Road NE, Atlanta, GA 30322, USA. Tel.:
+404 712 8458. Fax: +404 727 8744. E-mail:
Received 26 October 2007; accepted 24 January 2008; published online
12 March 2008
Chlorinated water is thought to be the primary source of
non-occupational chloroform exposure among US adults
(Nieuwenhuijsen et al., 2000; Wallace, 2001). Chloroform is
formed in treated water by the reaction of chlorine with
humic acids and other organic material. Concentrations vary
by region, day, and time with reported levels ranging from
below detection to maximum values of 100–200 µg/l
(Clayton et al., 1999; Backer et al., 2000; Kerger et al.,
2000; Lynberg et al., 2001; Gordon et al., 2006). Bench-scale
experiments have shown that heating tap water gradually (as
in a hot water heater) or boiling can affect point of use levels.
Weisel and Chen (1994) recorded up to twofold increases in
tap water chloroform after heating from 25 to 65 °C for 30 min,
presumably from increased formation reactions among free
chlorine and dissolved organic constituents. Krasner and
Wright (2005) on the other hand hypothesized that
simultaneous formation and volatilization were responsible
for the 34% decrease they observed in tap water chloroform
after boiling for one minute.
Chloroform’s volatility and ubiquitous presence in tap
water may help explain why it is frequently detected in
personal air (i.e., air sampled from the breathing zone of
subjects) and indoor air at concentrations 10–100-fold higher
than outdoor levels. The EPA TEAM (Total Exposure
Assessment Methodology) studies showed consistently higher
levels in personal than outdoor air in 24-h samples collected
from over 1,500 subjects in four states (Wallace, 1987).
Recently, Weisel et al. (2005) measured higher concentrations
in 48-h samples of personal air (adult median 1.04 µg/m³)
and indoor air (median 0.92 µg/m³) than colocated outdoor
samples (median 0.17 µg/m³) from 300 homes in Los
Angeles, Elizabeth, and Houston. Other US researchers
have found similar ratios of personal to indoor and outdoor
levels (Clayton et al., 1999; Payne-Sturges et al., 2004;
Sexton et al., 2004a).
While these studies illustrate the greater exposure potential
of personal and indoor air versus outdoor air, less is known
about which activities and microenvironments contribute the
largest fraction of daily inhalation intake. US adult volunteer
studies point to showering and/or bathing with chlorinated
tap water as a major contributor to daily inhalation
exposures. Gordon et al. (2006) found a 440-fold increase
in bathroom air chloroform after subjects took hot showers
in their study of household water use activities by seven
volunteers. Kerger et al. (2000) found that bathroom air
chloroform increased 3 and 1 µg/m³ during showering and
bathing respectively for each µg/l chloroform in water. Using
a mass balance approach, water use data, and exposure
factor assumptions including 1 mg/l tap water chloroform,
McKone (1987) estimated that showering contributes up to
50% of lifetime chloroform inhalation exposures for the
average US adult versus spending time in the bathroom or
remainder of the house. Additionally, recent biomarker
studies of US adults show that showering/bathing is
significantly associated with increases in breath and/or blood
chloroform, while other household water use activities such
as washing dishes or clothes are not (Weisel et al., 1999;
Backer et al., 2000; Lynberg et al., 2001; Nuckols et al.,
2005; Xu and Weisel, 2005). Swimming in chlorinated pools
is also associated with elevated biomarker concentrations
though most studies have been conducted outside the United
States (Lindstrom et al., 1997; Lévesque et al., 2000;
Erdinger et al., 2004; Caro and Gallego, 2007).
We used multiple linear regression to investigate the major
predictors of chloroform in personal air in the NHANES
1999–2000 VOC (volatile organic compound) Subsample
(US Centers for Disease Control and Prevention (CDC,
2007a)). The NHANES data, which include chloroform
concentrations in personal air and household tap water in
addition to socioeconomic data and information on activity
patterns, provide a unique opportunity to evaluate predictors
of inhalation exposures in a nationally representative sample.
We hypothesized that personal exposure microevents such as
showering/bathing and/or spending time at a pool would be
significantly associated with chloroform concentrations in
personal air while associations with other exposure factors
would not. We also compared personal air levels to EPA’s
inhalation unit risk values to evaluate the distribution of
cancer risk at the national level and among key subgroups.
Methods
NHANES Data Collection
Detailed methods are available at the NHANES website
(CDC, 2001). Briefly, a random subsample of subjects aged
20–59 was recruited to participate in the VOC study during
the NHANES medical examination. Consenting subjects
wore passive VOC exposure badges (3M™ Organic Vapor
Monitor 3520, 3M Corporation, St Paul, MN, USA)
continuously for 46–76 h after the examination. Subjects
were instructed to wear the badge on the upper chest, leave it on a
bedside table or clipped to a nearby lampshade while
sleeping, and leave it in an adjacent room while showering
since humidity affects readings. Subjects were also asked to
record hours spent indoors at home, indoors at work/school,
and outdoors using an activity log, and instructed to collect a
tap water sample from a bathtub or an outside faucet in an
NHANES-provided container (CDC, 2001). When subjects
returned their samples, an NHANES interviewer administered
a brief questionnaire to collect information on VOC
exposure-related activities (CDC, 2001). Home examiners
interviewed and collected samples from subjects who could
not return to the trailer within 46–76 h; samples collected
outside this window were considered invalid.
Samples were analyzed at CDC or contract laboratories.
Badge measurements below the analytical detection limit
were replaced with the detection limit, adjusted for badge
wearing minutes, divided by √2 (CDC, 2005a). Water
measurements below detection were replaced with the
detection limit divided by √2. Although badge field
Predictors of chloroform in personal air among US adults Riederer et al.
Journal of Exposure Science and Environmental Epidemiology (2009) 19(3) 249
duplicates, field blanks, and positive controls were collected,
results for these quality control samples were not available in
the NHANES public release data.
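The substitution rule for non-detects described above can be sketched as follows. This is a minimal illustration, assuming a non-detect is represented as `None` or as a value below the limit (a convention chosen for this sketch, not the NHANES file layout); the badge-specific adjustment for wearing minutes is not reproduced because the text does not give its formula.

```python
import math
from typing import Optional


def fill_below_detection(value: Optional[float], detection_limit: float) -> float:
    """Return the measurement, or detection_limit / sqrt(2) when it is a non-detect.

    `value` is None when no quantifiable result was reported (an assumed
    convention for this sketch only).
    """
    if value is None or value < detection_limit:
        return detection_limit / math.sqrt(2)
    return value


if __name__ == "__main__":
    # Hypothetical water results with a hypothetical detection limit of 1.0.
    for raw in (None, 0.4, 13.7):
        print(raw, "->", fill_below_detection(raw, detection_limit=1.0))
```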
In addition to tap water chloroform, we considered 30
NHANES variables potential predictors of chloroform
inhalation exposure. Of these, 17 were from the VOC
Questionnaire (CDC, 2001), eight from the Demographic
Questionnaire (CDC, 2005b), and three from the Housing
Characteristics Questionnaire (CDC, 2005c). Another, body
mass index, was recorded during the NHANES examination
(CDC, 2005d). We considered the variable indicating
whether subjects participated in morning, afternoon or
evening examination sessions (CDC, 2005d) a proxy for
the time of day subjects began wearing the badge.
We downloaded the relevant data sets from the NHANES
website (CDC, 2005a, d–f, 2007a, b) and used the NHANES
VOC Subsample weights (WTSVOC2Y) as well as the
stratum (SDMVSTRA) and cluster (SDMVPSU) variables
available in the NHANES 1999–2000 demographic data for
weighted statistical analyses. Certain NHANES data were
updated after their initial public release; all used in the
present study were updated as of June 2007.
Variable Recodes
We preserved the NHANES categorical variable groupings
but recoded them so the group with the highest weighted
frequency in the VOC subsample was the reference group.
Minor recodes included combining the “something else” and
“dorm” responses to the NHANES type of home question
into one category and transforming badge wearing minutes
to hours. We treated household income (INDHHINC) as a
continuous variable using the NHANES numerical categories
(1–11) instead of their corresponding income ranges.
NHANES included two additional income categories (12,
>$20,000 and 13, <$20,000) to minimize refused/don’t
know responses. We recoded Category 12 responses as
missing; this affected 3.2% of subjects. There were no
Category 13 responses.
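The income recode described above amounts to keeping categories 1–11 as a numeric score and setting Category 12 to missing. A minimal sketch follows; the function name is hypothetical, and treating Category 13 and any other out-of-range code as missing is an assumption of this sketch (the text reports no Category 13 responses, so it does not say how they would have been handled).

```python
from typing import Optional


def recode_household_income(indhhinc: Optional[int]) -> Optional[int]:
    """Keep INDHHINC categories 1-11 as a numeric score; everything else becomes missing."""
    if indhhinc is not None and 1 <= indhhinc <= 11:
        return indhhinc
    # Category 12 (>$20,000) is recoded as missing, as described in the text.
    # Category 13 and other codes are also returned as missing (assumption).
    return None


if __name__ == "__main__":
    print([recode_household_income(code) for code in (1, 6, 11, 12, 13, None)])
```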
We developed a new occupation variable to identify
subjects with workplace exposure potential. NHANES
Question OCD230 asked subjects the industry they worked
in while Question OCD240 asked the type of work they
performed. We created a variable (“occupation”) with five
response categories: 0 = other; 1 = food preparation/store/
restaurant; 2 = manufacturing (paper, chemicals, food,
electrical/transport equipment); 3 = construction; and 4 =
no industry/job recorded. We considered Categories 1–3 to
have workplace exposure potential based on information
from the 11th Report on Carcinogens (US Department of
Health and Human Services (DHHS, 2005)) and industries
reporting >10,000 lb annual chloroform releases during
1999–2000 to the EPA Toxic Release Inventory Program
(EPA, 2001b, 2002). We considered other industries/jobs
(Category 0) to have limited exposure potential.
Category 1 includes subjects who reported “retail-food
stores” or “retail-eating/drinking places” in response to
Question OCD230, as well as subjects who reported “cooks”
or “miscellaneous food preparation/service” to Question
OCD240. We assumed these workers would spend part of the
day in a kitchen around water use activities. If a subject said
s/he worked in food preparation but as a waitress/waiter, we
coded her/him as Category 0, assuming s/he spent less time
around water than cooks or dishwashers for example.
Category 2 includes workers in food/kindred products, paper
products/printing/publishing, chemicals/petroleum/coal products,
or transportation equipment industries. Textile/apparel/furnishings
machine operators were also included in
Category 2 since one author (A. Riederer) observed extensive
water use on visits to US textile mills in the 1990s, and since
the textile response category to Question OCD230 applied to
finished products which we assumed do not require as much
water to manufacture as unfinished cloth. Category 3
includes subjects who reported working in construction
(Question OCD230) and/or in construction trades (Question
OCD240).
Exploratory Data Analysis
Of the 851 subjects selected, 669 completed the VOC
sampling protocol. Subsample weights were adjusted by
CDC for non-response, and to match projected Census 2000
counts, and sum to 150,249,991 (CDC, 2006). We calculated
weighted response frequencies and 95% confidence intervals
(95% CIs) using PROC SURVEYFREQ in SAS 9.1 (SAS
Institute, Cary, NC, USA). We also conducted exploratory
analysis on the weighted and unweighted continuous
variables. Distributions of raw and log-transformed data
were visually evaluated for normality and outliers. Variables
with histograms appearing right-skewed were log-transformed
for the regressions. We evaluated collinearity between
continuous predictors using simple scatter plots. Last, we
calculated weighted cumulative percentiles of personal air
chloroform and 95% CIs for the percentile estimates using
the DESCRIPT procedure in SUDAAN 9.0.0 (Research
Triangle Institute, Research Triangle Park, NC, USA).
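To illustrate what a weighted cumulative percentile means in this context, the sketch below finds the smallest observed value at which the cumulative share of sample weight reaches a requested percentile. It is a simplified stand-in, not a reproduction of the SUDAAN DESCRIPT procedure: it does no interpolation, and it does not compute design-based confidence intervals, which would require the stratum and cluster variables.

```python
from typing import Sequence


def weighted_percentile(values: Sequence[float], weights: Sequence[float], pct: float) -> float:
    """Smallest value whose cumulative weight share reaches pct (0-100)."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and the same length")
    pairs = sorted(zip(values, weights))          # order observations by value
    target = sum(w for _, w in pairs) * pct / 100.0
    running = 0.0
    for value, weight in pairs:
        running += weight                         # accumulate sample weight
        if running >= target:
            return value
    return pairs[-1][0]


if __name__ == "__main__":
    # Hypothetical badge concentrations (µg/m³) and sample weights.
    conc = [0.3, 0.9, 1.2, 4.8, 12.0]
    wts = [20000.0, 35000.0, 30000.0, 10000.0, 5000.0]
    print(weighted_percentile(conc, wts, 50), weighted_percentile(conc, wts, 95))
```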
Regression Modeling and Diagnostics
We conducted weighted regression modeling in SUDAAN
PROC REGRESS, using the NHANES fill-in values for
measurements below detection. Model building was conducted
by first performing univariate regressions of log-transformed
chloroform badge concentrations on each of the
31 initial predictors. We also included a quadratic term for
badge wearing time to account for potential non-linearity in
response. Predictors with p-values of 0.2 or less were retained
for the multivariable analysis. These were assigned a random
number and added one-by-one in ascending order to a
multivariable model fitted using PROC REGRESS. Predictors
with p ≤ 0.2 were retained in each subsequent step.
We fit the final model and manually removed predictors with
p > 0.05 until all remaining predictors had p ≤ 0.05, our
criterion for statistical significance.
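The three-stage selection procedure described above can be outlined in code. The sketch below is only an illustration of the control flow: `fit_pvalues` is a hypothetical callable standing in for a weighted regression fit (SUDAAN PROC REGRESS in the study) that returns a p-value for each predictor in the model. Two details are assumptions, since the text does not specify them: every predictor in the model is rechecked against the 0.2 threshold after each addition, and the final pruning removes the least significant predictor first.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: given the predictors in a model, return a p-value for each.
PValueFn = Callable[[Sequence[str]], Dict[str, float]]


def build_model(candidates: Sequence[str], fit_pvalues: PValueFn,
                screen_p: float = 0.2, final_p: float = 0.05,
                seed: int = 0) -> List[str]:
    # Stage 1: univariate screen, keeping predictors with p <= screen_p.
    screened = [c for c in candidates if fit_pvalues([c])[c] <= screen_p]

    # Stage 2: add survivors one at a time in random order; after each
    # addition keep only predictors that still have p <= screen_p (assumption).
    rng = random.Random(seed)
    rng.shuffle(screened)
    model: List[str] = []
    for predictor in screened:
        trial = model + [predictor]
        pvals = fit_pvalues(trial)
        model = [p for p in trial if pvals[p] <= screen_p]

    # Stage 3: remove the least significant predictor until all have p <= final_p.
    while model:
        pvals = fit_pvalues(model)
        worst = max(model, key=lambda p: pvals[p])
        if pvals[worst] <= final_p:
            break
        model.remove(worst)
    return model


if __name__ == "__main__":
    # Toy stand-in for a regression fit: fixed, made-up p-values per predictor.
    fixed = {"age": 0.01, "pool": 0.03, "income": 0.15, "bmi": 0.60}
    print(sorted(build_model(list(fixed), lambda preds: {p: fixed[p] for p in preds})))
```

In a real analysis the p-values change as predictors enter and leave the model, which is why each stage refits rather than reusing the univariate results.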
We evaluated model assumptions of normality and
homoscedasticity by examining plots of predicted values
versus residuals as well as histograms and normal probability
plots of residuals. Model fit was evaluated using the R²
statistic. Following Korn and Graubard (1998), we examined
partial regression plots to identify potentially influential
observations then compared parameter estimates in the full
model versus a model with each influential observation
excluded. Influential observations were excluded one at a
time in these analyses.
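The one-at-a-time exclusion check can be illustrated with a deliberately simplified case: a weighted regression on a single predictor, where the slope has a closed form. The study's model was multivariable and fitted with survey software, so the sketch below only shows the comparison logic (full-data estimate versus estimate with one flagged observation removed); all names and data are hypothetical.

```python
from typing import Dict, Sequence


def weighted_slope(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Slope of a weighted least-squares line of y on x."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


def slope_shift_when_excluded(x: Sequence[float], y: Sequence[float], w: Sequence[float],
                              flagged: Sequence[int]) -> Dict[int, float]:
    """Change in slope when each flagged observation is left out, one at a time."""
    full = weighted_slope(x, y, w)
    shifts: Dict[int, float] = {}
    for i in flagged:
        keep = [j for j in range(len(x)) if j != i]
        reduced = weighted_slope([x[j] for j in keep],
                                 [y[j] for j in keep],
                                 [w[j] for j in keep])
        shifts[i] = reduced - full
    return shifts


if __name__ == "__main__":
    # Made-up data in which the last point pulls the slope upward.
    print(slope_shift_when_excluded([0, 1, 2, 3], [0, 1, 2, 10], [1, 1, 1, 1], flagged=[3]))
```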
Cancer Risk Estimates
We estimated lifetime excess cancer risk for each
subject by multiplying her/his badge concentration by
EPA’s chloroform inhalation unit risk (EPA, 2005). This
method estimates an individual’s upper-bound risk of
developing cancer over a lifetime (70 years) of exposure at
the measured concentration. We estimated population risk in
units of excess cancer cases by multiplying each subject’s
individual excess risk by her/his NHANES sample weight,
then summing across the total population or subgroup. To
evaluate the distribution of risk burden within subgroups, we
calculated the weighted percent of each subgroup at the ≥1
in 10⁴, 1 in 10⁶–1 in 10⁵, and ≤1 in 10⁶ individual risk levels.
We considered subgroups with higher proportions of people
at the ≥1 in 10⁴ risk level to bear a greater cancer burden
than subgroups with fewer at that level. For comparison, we
repeated these calculations using the CalEPA inhalation unit
risk (CalEPA, 2002).
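The risk calculations in this subsection reduce to three small operations: multiply each concentration by the unit risk, weight and sum the individual risks to get expected excess cases, and compute the weighted share of people at or above a risk threshold. The sketch below shows them under stated assumptions: the function names are illustrative, the example data are made up, and only the "at or above" threshold is implemented rather than the full set of risk bands.

```python
from typing import List, Sequence

EPA_UNIT_RISK = 2.3e-5  # lifetime excess risk per µg/m³, as quoted in the Introduction


def individual_risks(concentrations: Sequence[float],
                     unit_risk: float = EPA_UNIT_RISK) -> List[float]:
    """Upper-bound lifetime excess cancer risk for each subject."""
    return [c * unit_risk for c in concentrations]


def population_excess_cases(risks: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of individual risk times sample weight, in units of excess cases."""
    return sum(r * w for r, w in zip(risks, weights))


def weighted_percent_at_or_above(risks: Sequence[float], weights: Sequence[float],
                                 threshold: float) -> float:
    """Weighted percent of the group whose individual risk is >= threshold."""
    total = sum(weights)
    at_or_above = sum(w for r, w in zip(risks, weights) if r >= threshold)
    return 100.0 * at_or_above / total


if __name__ == "__main__":
    # Hypothetical badge concentrations (µg/m³) and sample weights.
    conc = [1.0, 10.0]
    wts = [1000.0, 3000.0]
    risks = individual_risks(conc)
    print(population_excess_cases(risks, wts))
    print(weighted_percent_at_or_above(risks, wts, 1e-4))
```

Swapping in the CalEPA unit risk only requires passing a different `unit_risk` value to `individual_risks`.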
Results
Weighted Detection and Response Frequencies
Chloroform was measured at levels at or above detection
limits in 77.2% of badge and 80.1% of water samples.
Measurements were below detection in 20.0% of badge and
15.3% of water samples, while 2.8 and 4.6% of badge and
water samples respectively were missing. One water measurement
exceeded the upper bound of the calibrated range of the
analytical method, but by <20%, so we included it in our
regressions; excluding it did not change statistical outcomes.
Table 1 shows weighted response frequencies and descriptive
statistics for the regression predictors. Missing responses
ranged from 0 to 4.2% while refusals or “don’t know” (not
shown) accounted for <1% of responses. Household income
(not shown) was missing for 8.4% of subjects, while the three
most commonly reported categories were $25,000–34,999
(11.7%), $55,000–64,999 (10.4%), and ≥$75,000 (22.7%).
The largest share of subjects (48.8%) participated in the morning
NHANES examination. A majority reported wearing the
badge at all times (88.2%) and taking a hot shower for
≥5 min (85.9%). Roughly half reported having windows
open at home (55.4%) or breathing fumes from/using air
fresheners/room deodorizers (47.4%), and 39.5% reported
breathing fumes from/using disinfectant/degreasing cleaners.
Fewer than a third responded yes to other chloroform-related
items on the VOC Questionnaire. Only 8.8% reported visiting a pool.
Median badge wearing time was 53.6 h and no subject wore her/his badge
for <28 h. Median hours spent indoors at home, indoors at
work/school, and outdoors were 29.9, 7.8 and 5.7,
respectively. Median chloroform in water was 13.7 ng/ml
(7.0–19.3 ng/ml, 95% CI) while the 95th percentile was
74.7 ng/ml (50.6–112.9 ng/ml, 95% CI).
Distribution of Personal Air Chloroform
Figure 1 shows the weighted cumulative distribution of
personal air chloroform in NHANES 1999–2000. Median
and 95th percentile levels were 1.13 µg/m³ (0.93–1.39 µg/m³,
95% CI) and 12.05 µg/m³ (8.12–13.54 µg/m³, 95% CI),
respectively. The maximum concentration was 53.9 µg/m³.
Figure 1 also shows the detection limits and the EPA-estimated
air concentrations at the 1 in 10⁵ and 1 in 10⁴
cancer risk levels. Detection limits varied for each badge
depending on wearing duration, with those worn longer
having lower limits than those worn for shorter periods. All
measurements at or below the 1 in 10⁵ risk level (0.4 µg/m³),
corresponding to 13% of US adults, were below detection.
All measurements at or above 0.55 µg/m³ were in the
detectable range. Approximately 59% (6% of US adults)
of values in the 0.41–0.55 µg/m³ range were below detection
while 41% (4% of US adults) of values in this range were
detectable. The majority (62%) of US adults had measurements
at the 1 in 10⁵–1 in 10⁴ risk level while 19% had values
exceeding the 1 in 10⁴ risk level.
Significant Predictors of Personal Air Chloroform
Predictors eliminated by the univariate screen included: wore
badge at all times, education, body mass index, new carpets,
hours indoors at work/school, hours outdoors, took hot
shower for ≥5 min, in dry cleaning shop/drycleaned clothes,
near wood-burning, breathed fumes from/used dry cleaning
fluid/spot remover, and breathed fumes from/used glues/
adhesives hobbies/crafts. Predictors eliminated during multivariable
modeling included: badge wearing hours, examination
session, gender, occupation, income, wear respirator at
work, wear gloves at work, number of rooms in the home,
hours indoors at home, use home water treatment devices,
store paints/fuels inside home, and breathe fumes from/use
paint, disinfectant/degreasing cleaners, and air fresheners/
room deodorizers.
Diagnostic plots suggested that model assumptions of
normality and homoscedasticity were valid. A maximum of
13 parameters were estimable in the final fitted model. Table 2
summarizes the regression coefficients (βs) for predictors
Table 1. Weighted response frequencies and descriptive statistics of chloroform inhalation exposure predictors in the NHANES 1999–2000 VOC Subsample.

Time-activity patterns
Predictor (NHANES ID)                               % Missing   Median   95th percentile   Range
Badge wearing hours (LBAVOCSD-h)                    0.9         53.6     72.3              34–190
Hours indoors at home (VTQ090)                      3.1         29.9     49.5              2–70
Hours indoors work/school (VTQ110)                  3.1         7.8      24.6              0–45
Hours outdoors (VTQ120)                             3.1         5.7      23.4              0–40

Personal exposure microevents
Predictor (NHANES ID)                               % Missing   % Yes    % No
Wear badge at all times (VTQ015)                    4.2         88.2     7.6
Any windows open at home (VTQ100)                   3.1         55.4     41.2
Any time at swimming pool (VTQ140)                  3.4         8.8      87.8
In drycleaning shop, drycleaned clothes (VTQ150)    3.4         14.2     82.1
Near wood-burning fire 10 min or longer (VTQ160)    3.4         8.9      87.4
Hot shower for 5 min or longer (VTQ180)             3.4         85.9     10.5
Breathe fumes from/use:
  Paint (VTQ200A)                                   3.7         10.0     86.2
  Disinfectant/degreasing cleaners (VTQ200C)        3.8         39.5     56.7
  Air fresheners/room deodorizers (VTQ200J)         3.8         47.7     48.2
  Drycleaning fluid/spot remover (VTQ200K)          3.8         6.3      89.9
  Glues/adhesives, hobbies crafts (VTQ200L)         3.8         8.8      87.4
New carpets home/work past 6 months (VTQ070)        3.1         18.3     78.6

Categorical and other predictors
Predictor (NHANES ID)                   NHANES code                  %
Exam session (PHDSESN)                  missing/don’t know           -
                                        0 = morning                  48.8
                                        1 = afternoon                29.6
                                        2 = evening                  21.7
Demographic
Age (RIDAGEYR)                          missing/don’t know           -
                                        number (20–59)               100
Gender (RIAGENDR)                       missing/don’t know           -
                                        1 = male                     48.6
                                        2 = female                   51.4
Race/ethnicity (RIDRETH1)               missing/don’t know           -
                                        1 = Mexican American         7.3
                                        2 = Other Hispanic           7.8
                                        3 = Non-Hispanic White       68.8
                                        4 = Non-Hispanic Black       11.7
                                        5 = Other Race               4.4
Highest level education (DMDEDUC)       missing/don’t know           -
                                        1 = < high school            20.0
                                        2 = high school diploma      26.4
                                        3 = > high school            53.6
Body mass index (BMXBMI)                missing                      0.2
                                        number                       99.8
Housing
Type of home (HOD010)                   missing                      0.6
                                        1 = mobile home/trailer      6.9
                                        2 = 1 family, detached       61.5
                                        3 = 1 family, attached       6.1
                                        4 = apartment                23.2
                                        5 = something else           1.0
                                        6 = dorm                     0.6
Number of rooms in home (HOD050)        missing                      0.6
                                        number (1–12)                98.7
                                        13 = 13 or more              0.5
Source of tap water in home (HOQ070)    missing                      0.6
                                        1 = private/public water     83.9