Modeling Phosphorus in the Environment

6 Uncertainty Estimation in Phosphorus Models
Keith Beven
Lancaster University, Lancaster, United Kingdom
Trevor Page
Lancaster University, Lancaster, United Kingdom
Malcolm McGechan
Scottish Agricultural College, Bush Estate, Penicuik,
United Kingdom
CONTENTS
6.1 Sources of Uncertainty in Modeling P Transport to Stream Channels
6.2 Sources of Uncertainty
6.3 Uncertainty Is Not Only Statistics
6.4 Uncertainty Estimation: Formal Bayes Methods
6.5 Uncertainty Estimation Based on the Equifinality Concept and Formal Rejectionist Methods
6.6 Uncertainty as Part of a Learning Process
6.7 An Example Application
6.7.1 The MACRO Model
6.7.2 Study Site and Data
6.7.2.1 Drainage Discharge and Phosphorus Concentrations
6.7.2.2 Slurry Applications
6.7.3 MACRO Implementation within a Model Rejection Framework
6.7.4 Results and Discussion
6.7.4.1 Using Initial Rejection Criteria
6.7.4.2 Using Relaxed Rejection Criteria
6.7.4.3 Simulations for the Period from 1994 to 1995
6.7.4.4 Simulations and Parameterizations for the Period 1995 to 1996
6.8 Learning from Rejection: What If All the Models Tried Are Nonbehavioral?
6.9 What Are the Implications for P Models?
Acknowledgments
References
© 2007 by Taylor & Francis Group, LLC
6.1 SOURCES OF UNCERTAINTY IN MODELING P TRANSPORT TO STREAM CHANNELS
The starting point for this contribution is the extensive review of Beck (1987).
Summarizing his arguments at the end of the review, he posed the following questions:
- Are the basic problems of model identification ones primarily of inadequate method or of inadequate forms of data?
- What opportunities are there for the development of improved, novel methods of model structure identification, particularly regarding exposing the failure of inadequate, constituent model hypotheses?
- How can an archive of prior hypotheses be appropriately engaged in inferring the form of an improved model structure from diagnosis of the failure of an inadequate model structure? Moreover, in what form should the knowledge of the archive be most usefully represented?
- What does a lack of identifiability imply for the distortion of a model structure, and what are the consequences of a distorted model structure in terms of generating predictions?
- Given uncertainty, how can one speculate about the prediction of a "radically different" future?
- What, in the end, does all this mean for decision making under uncertainty?
These questions have been reinforced by the more recent analyses of environmental
modeling by Beven (2002a, 2002b, 2004a, 2005, 2006a) and demand an answer to
why, nearly two decades later, there are still many model structures and applications
that do not consider model identification problems and uncertainties explicitly.

The uncertainties exist. They are often ignored. It seems as if the saving grace
of the environmental modeler has been model calibration. If a model has at least
approximately the right sort of functionality, then there are generally sufficient
degrees of freedom to be able to adjust effective values of the parameters to get an
acceptable fit to the data and to declare some sort of success in reporting results in
scientific articles and reports to decision makers. This obviously does not mean that
what is being reported is good science if the calibration allows compensation for
errors in model structure as a representation of the processes actually controlling
water quality variables, including phosphorus (P) concentrations in different forms.
Perhaps we are now reaching a stage where it might be possible to take account of
some of the sources of uncertainty in predicting water quality more explicitly, P
being a particularly interesting and practically relevant example.
It is important to recognize from the outset, however, that this will be difficult:
(1) to evaluate model structures as working hypotheses about the functioning of
catchment systems independently of errors in the input data used to drive the
model and the calibration of effective parameter values; and (2) to estimate effec-
tive values of physical and geochemical parameter values a priori by measurement.
The struggle to improve water-quality modeling remains as much a struggle against
the limitations of current measurement techniques as against the limitations of
current model structures.
6.2 SOURCES OF UNCERTAINTY
The sources of uncertainty in the modeling process are manifold, and, generally
speaking, good methodologies have not been developed for assessing the nature and
magnitude of uncertainties from different sources. They are thus frequently
neglected. For example, some uncertainty exists in the input and boundary condition
data used to drive a model. Such uncertainties include measurement errors in assess-
ing the inputs at the measurement scale, together with interpolation errors in space
and time to provide the values required at the lumped or distributed element scale
of the model. The interpolation error may be made worse by a lack of resolution in
the measurements in space and time and by nonstationarity in the processes con-
trolling the inputs. Rainfall is a good example. There are issues with all the
measurement techniques available to estimate rainfall, whether at a point using gauges
or over an area using radar or microwave techniques. Point ground-level measure-
ments may be sparse in space, whereas the spatial and temporal variability of rainfall
intensities may vary markedly between events. Rainfall may show fractal character-
istics in space and time, but analyses suggest that there may be nonstationarity in
the fractal scaling between events. Thus, interpolation of the measurements to pro-
vide the inputs — and an estimate of their uncertainty — at the space and time
scales of the model may be difficult. What is clear is that a point measurement of
rainfall is, under many circumstances, not a good estimate of the rainfall inputs
required by the model. The two variables may, because of time and space variability,
actually be related but different variables — they are incommensurate. Yet rainfall
data are essential to drive models that will predict the fluxes in hydrological pathways
that will control the transport of P. However, the number of nonhypothetical hydro-
logical modeling studies that have attempted to include a treatment of rainfall
estimation error is very small indeed.
The problem is compounded by other uncertainties. Most particularly for event-
based simulations, errors in the estimation of model initial conditions may be
important. Errors may be associated with the model structures used due to the
incorrect representation of some processes or the neglect of processes (e.g., prefer-
ential flow pathways) that are important in the real system. There may be errors in
estimating or calibrating the effective values of parameters in the model that may
control the predictions of P mobilization and transport in different pathways. Finally,
there may be errors in the observations used to evaluate the model predictions or to
calibrate the model parameters.
Unfortunately, the possibility of assessing all these different sources of error is
limited. In general, only the total model error produced can be assessed by comparing
an observation, which is not error-free, with a variable produced
by a model that is itself subject to structural and input errors. Unless some very strong
— and usually difficult to justify — assumptions are made about the nature of the
sources of error, disaggregating the total model error into its component parts will
be impossible. It is an ill-posed problem. The result will be an inevitable ambiguity
in model calibrations and error assessment — an ambiguity that also brings with it
difficulty in transferring information gained in one application to applications at
other sites or different hydrological conditions.
6.3 UNCERTAINTY IS NOT ONLY STATISTICS
The aim of science, however, is a single true description of reality. The ambiguity
arising from uncertainties from these different sources means that this aim is difficult
to achieve in applications to places that are all unique in their characteristics and
uncertainties. It follows that many descriptions may be compatible with current
understanding and available observations, called the equifinality thesis (Beven 1993,
2006a). One way of viewing these multiple descriptions is as different working
hypotheses of how a system functions. The concept of the single description may
remain a philosophical axiom or theoretical aim but will generally be impossible to
achieve in practice in applications to real systems (Beven 2002a, 2002b).
This view is fundamentally different from a statistical approach to model
identification. In both frequentist and Bayesian approaches to statistics, the uncer-
tainty associated with a model prediction is often assumed to be adequately treated
as a single lumped additive variable in the form

O(X, t) = M(Θ, ε_θ, I, ε_I, X, t) + ε(X, t)    (6.1)
where O(X, t) is a measured output variable, such as discharge, at point X and time
t; M(Θ, ε_θ, I, ε_I, X, t) is the prediction of that variable from the model with parameter
set Θ with errors ε_θ and driven by the input vector I with errors ε_I; and ε(X, t) is the
model error at that point in space and time. Transformations of the variables of
Equation 6.1 can also be used where appropriate to constrain the modeling problem
to this form. A logarithmic transformation, for example, can be used for an error
that is multiplicative — that is, increasing with the magnitude of the model prediction
— as a simple way of allowing for heteroscedasticity (nonconstant variance) in the
errors. Other transformations can also be used to try to stabilize the statistical
characteristics of the error series (Box and Cox 1964). Normal statistical inference
then aims to identify the parameter set Θ that will be in some sense optimal, normally
by minimizing the residual error variance of a model of the model error, which
might include its own parameters for bias and autocorrelation terms with the aim
of making the residual error independent and identically distributed, even though
there may be good physical reasons why errors that have constant statistical
characteristics in hydrological and water quality modeling should not be expected (see,
e.g., Freer et al. 1996).
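The logarithmic case of the Box and Cox (1964) transformation can be sketched as follows; the function name and the discharge values are illustrative, and λ = 0 is chosen to show the conversion of a multiplicative error into an additive one:

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox transform: natural log when lam == 0, the power
    transform (y**lam - 1) / lam otherwise. Requires y > 0."""
    y = np.asarray(y, dtype=float)
    if lam == 0.0:
        return np.log(y)
    return (y**lam - 1.0) / lam

# Hypothetical observed and simulated discharges (m3/s):
obs = np.array([0.5, 1.2, 3.0, 8.0, 2.1])
sim = np.array([0.6, 1.0, 3.5, 7.2, 2.5])

# With lam = 0 the residuals become log-ratios, so an error that grows
# in proportion to the prediction has roughly constant variance here.
resid = boxcox(obs, 0.0) - boxcox(sim, 0.0)
```

On the untransformed scale the same residuals would grow with discharge; scipy.stats.boxcox can fit λ by maximum likelihood if a data-driven choice is preferred.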
The additive form of Equation 6.1 allows the full range of statistical estimation
techniques, including Bayesian updating, to be used in model calibration. The
approach has been widely used in hydrological and water resources applications,
including flood forecasting involving data assimilation (e.g., Krzysztofowicz 2002;
Young 2001, 2002 and references therein), groundwater modeling, including Bayesian
averaging of model structures (e.g., Ye et al. 2004), and rainfall-runoff modeling
(e.g. Kavetski et al. 2002; Vrugt et al. 2002, 2003).
In principle, the additive error assumption that underlies this form of uncertainty
is particularly valuable for two reasons: (1) it allows checking of whether the actual
errors conform to the assumptions made about the structural model of the errors;
and (2) if this is so, then a true probability of predicting an observation, conditional
on the model, can be predicted as the likelihood L(O(X, t) | M(θ, I, X, t)). These
advantages, however, may be difficult to justify in many real applications where
poorly known input errors are processed through a nonlinear model subject to
structural error and equifinality (see Hall 2003; Klir 1994 for reviews of more
generalized mathematizations of uncertainty, including discussion of fuzzy set
methods and the Dempster-Shafer theory of evidence). One implication of the limitations
of the additive error model is that it may actually be quite difficult to estimate the
true probability of predicting an observation, given one or more models, except in
ideal cases because the model structural error has a complex and nonlinear effect,
structured in both time and space, on the total model error, ε(X, t).

This implies that an approach philosophically different from the statistical one
might be worth investigating. In the statistical approach, the error model is generally
evaluated as conditioned on finding the best maximum likelihood model. In evaluating
models as multiple working hypotheses, it is often more interesting to estimate the
likelihood of a model conditioned on some vector of observations, such as
L(M(θ, ε_θ, I, ε_I, X, t) | O(X, t)), and, in particular, to reject those models as unacceptable
hypotheses that should have a zero likelihood.
This is the basis for the Generalized Likelihood Uncertainty Estimation (GLUE)
methodology, first proposed by Beven and Binley (1992). It can be argued that the
formal statistical approaches are a special case of the GLUE methodology within
which the formal assumptions of a defined error model can be accepted such that
the formal likelihood function can be used to weight model predictions. It can also
be argued that the GLUE methodology is a generalization of formal statistical inference,
in which informal likelihood measures replace a formal likelihood function with its
rigorous assumptions about the nature of the error model. GLUE can indeed make
use of formal likelihood measures if the associated assumptions can be justified. It
is perhaps better, however, to consider the two approaches as based on different
philosophical frameworks to the uncertainty estimation problem.
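A minimal GLUE sketch under stated assumptions: a toy one-parameter exponential-recession simulator stands in for a hydrological or P-transport model, the informal likelihood is an inverse error variance, and the 90th-percentile behavioral cutoff is arbitrary; none of these choices comes from the chapter itself:

```python
import numpy as np

rng = np.random.default_rng(42)

def model(theta, t):
    """Toy stand-in simulator: exponential recession with rate theta."""
    return np.exp(-theta * t)

t = np.linspace(0.0, 5.0, 20)
obs = model(0.5, t) + rng.normal(0.0, 0.02, t.size)  # synthetic "observations"

# Sample parameter sets from a uniform prior range, score each run with
# an informal likelihood, and reject the rest as nonbehavioral.
thetas = rng.uniform(0.0, 2.0, 1000)
scores = np.array([1.0 / np.mean((obs - model(th, t))**2) for th in thetas])

threshold = np.percentile(scores, 90)     # informal behavioral cutoff
keep = scores >= threshold
behavioral, weights = thetas[keep], scores[keep]
weights = weights / weights.sum()         # likelihood weights sum to 1
```

The retained parameter sets and their normalized weights together represent the equifinal, behavioral model population; everything below the cutoff receives a likelihood of zero.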
6.4 UNCERTAINTY ESTIMATION: FORMAL BAYES METHODS
The traditional approach to model calibration in hydrological modeling has been to
simplify Equation 6.1 to the form

O(X, t) = M(θ, I, X, t) + ε(X, t)    (6.2)
with the aim of minimizing the total error in some way. This assumes that the effect
of all sources of error can be subsumed into the total error series, as if the model
were correct and the input and boundary condition data and observations were
known precisely.
Furthermore, if the total error ε(X, t) can be assumed to have a relatively simple
form — or can be suitably transformed to a simple form — then a formal statistical
likelihood function can be defined, dependent on the assumed error structure. Thus,
for an evaluation made for observations at a single site for total model errors that
can be assumed to have zero mean, constant variance, independence in time, and a
Gaussian distribution, the likelihood function takes the form

L(ε | M(θ, I, X, t)) = (2πσ²)^(−T/2) exp[ −(1/(2σ²)) Σ_{t=1…T} ε_t² ]    (6.3)

where ε_t = O(X, t) − M(θ, I, X, t) at time t, T is the total number of time steps, and
σ² is the residual error variance. For total model errors that can be assumed to have
a constant bias, constant variance, autocorrelation in time, and a Gaussian distribution,
the likelihood function takes the form

L(ε | M(θ, I, X, t)) = (2πσ²)^(−T/2) (1 − α²)^(1/2)
    × exp{ −(1/(2σ²)) [ (1 − α²)(ε_1 − µ)² + Σ_{t=2…T} (ε_t − µ − α(ε_{t−1} − µ))² ] }    (6.4)

where µ is the mean residual error (bias) and α is the lag 1 correlation coefficient
of the total model residuals in time. More complex error structure assumptions will
lead to more complex likelihood functions, with more parameters to be estimated.
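Assuming the residual series has already been computed as ε_t = O(X, t) − M(θ, I, X, t), Equations 6.3 and 6.4 can be evaluated in log form (numerically safer than the raw product); this is a sketch, and the function names are ours:

```python
import numpy as np

def gaussian_iid_loglik(resid, sigma2):
    """Log of Equation 6.3: zero-mean, constant-variance,
    time-independent Gaussian total model errors."""
    resid = np.asarray(resid, dtype=float)
    T = resid.size
    return (-0.5 * T * np.log(2.0 * np.pi * sigma2)
            - np.sum(resid**2) / (2.0 * sigma2))

def gaussian_ar1_loglik(resid, sigma2, mu, alpha):
    """Log of Equation 6.4: constant bias mu and lag-1
    autocorrelation alpha in the total model errors."""
    resid = np.asarray(resid, dtype=float)
    T = resid.size
    e = resid - mu
    innov = e[1:] - alpha * e[:-1]  # AR(1) innovations
    return (-0.5 * T * np.log(2.0 * np.pi * sigma2)
            + 0.5 * np.log(1.0 - alpha**2)
            - ((1.0 - alpha**2) * e[0]**2 + np.sum(innov**2))
            / (2.0 * sigma2))
```

With mu = 0 and alpha = 0 the second function collapses to the first, which is a useful consistency check when implementing either likelihood.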
A significant advantage of this formal statistical approach is that when the
assumptions are satisfied, the theory allows the estimation of the probability with
which an observation will be predicted, conditional on the model and parameter
values, and the probability density functions of the parameter estimates, which under
these assumptions will be multivariate normal. As more data are made available, the
use of these likelihood functions will also lead to reduced uncertainty in the estimated
parameter values, even if the total error variance is not reduced. O'Hagan (2004)
suggested that this is the only satisfactory way of addressing the issue of model
uncertainty: without proper probability estimates, statements about modeling
uncertainty will have no meaning.
There is an issue, however, about when probability estimates based on additive,
or transformed, error structures are meaningful. From a purely empirical point of
view, a test of the actual model residuals ε(X, t) for validity relative to the assumptions
made in formulating the likelihood function might be considered sufficient to justify
probability statements of uncertainty. From a theoretical point of view, however,
there has to be some concern about treating the full sources of error in Equation 6.2
in this type of aggregated form. Model structural errors will, in the general case, be
nonlinear, nonstationary, and nonadditive. Input and boundary condition errors, as
well as any parameter errors, will also be processed through the model structure in
nonlinear, nonstationary, and nonadditive ways.

Kennedy and O’Hagan (2001) attempted to address this problem by showing
that all sources of error might be represented within a hierarchical Bayesian frame-
work. In particular, where any model structural error is simple in form, it might be
possible to estimate this as what they called a “model inadequacy function,” or, more
recently, “model discrepancy function” (O’Hagan 2004). In principle, this could take
any nonlinear form, although the most complex in the cases they considered was a
constant bias, which can, in any case, be included as a parameter in Equation 6.4.
The aim is to extract as much structural information from the total error series as
possible, ideally leaving a Gaussian independent and identically distributed residual
error term. The model discrepancy function can then also be used in prediction,
under the assumption that the nature of the structural errors in calibration will be
similar in prediction.
It should be noted, however, that the model discrepancy function is not a direct
representation of model structural error. It is a compensatory term for all the unknown
sources of error in Equation 6.1, conditional on any particular realization of the
model, including specified parameter values and input data. These sources of error
could, in principle, be considered explicitly in the Bayesian hierarchy if good
information were available as to their nature. This will rarely be the case in hydro-
logical modeling applications, where, for example, rainfall inputs to the system may
be poorly known for all events in some catchments and where even the most
fundamental equation — the water balance — cannot be closed by measurement
(Beven 2001, 2002b). Thus, disaggregation of the different error components will
be necessarily poorly posed, and ignoring potential sources of error, including model
structural error, may result in an overestimation of the information content of addi-
tional data and may lead to an unjustified overconfidence in estimated parameter
values (see discussion in Beven and Young 2003). In representing the modeling
process by the simplified form of Equation 6.2, the error model is required to
compensate for all sources of deficiency.
6.5 UNCERTAINTY ESTIMATION BASED ON THE EQUIFINALITY CONCEPT AND FORMAL REJECTIONIST METHODS
The equifinality thesis is the central concept of the GLUE methodology (Beven and
Binley 1992; Beven and Freer 2001). The GLUE methodology does not purport to
estimate the probability of predicting an observation given the model but rather
attempts to evaluate the predicted distribution of a variable that is always conditional
on the model or models considered, the ranges of parameter values considered, the
evaluation measures used, and the input and output data available to the application
for model evaluation. The prediction distributions do not consider the residual error
associated with a particular model run explicitly. There is instead an assumption that
the error series associated with a model run in calibration will have similar
characteristics in prediction — note the similar assumption about model structural error
in the formal likelihood approach just described. Thus, in weighting the predictions
of multiple models to form the predictive distribution for a variable, there is an
implicit weighting of the error series associated with those models, without the need
to consider different sources of error explicitly; explicit error models can be handled
in this framework by treating them as additional model components (see, e.g.,
Romanowicz et al. 1998).
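The weighting of behavioral-model predictions into a predictive distribution can be sketched as a weighted-quantile calculation; the five P-concentration predictions, their weights, and the 5% and 95% levels below are all hypothetical:

```python
import numpy as np

def glue_prediction_bounds(preds, weights, quantiles=(0.05, 0.95)):
    """Weighted quantiles of behavioral-model predictions at one time
    step: sort the predictions, accumulate the likelihood weights into
    a discrete CDF, and read off the requested levels."""
    order = np.argsort(preds)
    p = np.asarray(preds, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    cdf = np.cumsum(w) / np.sum(w)
    return [p[np.searchsorted(cdf, q)] for q in quantiles]

# Hypothetical P concentrations (mg/L) from five behavioral models:
preds = [0.11, 0.09, 0.14, 0.10, 0.12]
weights = [0.3, 0.1, 0.1, 0.2, 0.3]
lower, upper = glue_prediction_bounds(preds, weights)
```

Repeating this at every time step traces out the prediction limits; the implicit weighting of each model's error series is carried entirely by the likelihood weights.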
One of the most interesting features of the GLUE methodology is the comple-
mentarity of model equifinality and model rejection. Equifinality accepts that mul-
tiple models may be useful in prediction and that any attempt to identify an optimal
model might be illusory. But if multiple models are to be considered acceptable or
behavioral, it is evident that models can also be rejected (given a likelihood of zero)
where they can be shown to be nonbehavioral (given unacceptable simulations of
the available observables). Thus, there is always a possibility that all the models
tried will be rejected — unlike the statistical approach where it is possible to
compensate for model deficiencies by some error structure.
However, at this point the limitations of implicit handling of error series in the GLUE
methodology become apparent since it is possible that some hypothetical perfect model
could be rejected if driven by poor input and boundary condition data or if compared
with poor observation data. Thus, there is a need for a more explicit consideration of
sources of error in this framework while retaining the possibility of model rejection.
A potential methodology has been proposed by Beven (2005, 2006a). Equation
6.1 can be rewritten to reflect more sources of error as
O(X, t) + ε_O(X, t) + ε_C(∆x, ∆t, X, t) = M(θ, ε_θ, I, ε_I, X, t) + ε_M(θ, ε_θ, I, ε_I, X, t) + ε_r    (6.5)
The error terms on the left-hand side of Equation 6.5 represent the measurement
error, ε_O(X, t), and the commensurability error between observed and predicted
variables, ε_C(∆x, ∆t, X, t). The model term, M(θ, ε_θ, I, ε_I, X, t), will reflect error in input
and boundary conditions, model parameters, and model structure. The error term,
ε_M(θ, ε_θ, I, ε_I, X, t), can now be interpreted as a compensatory error term for model
deficiencies, analogous to the discrepancy function in the Bayesian statistical approach of
O'Hagan (2004), but it must also reflect error in input and boundary conditions,
model parameters, and model structure. Finally, there may be a random error term, ε_r.
Equation 6.5 has been written in this form both to highlight the importance
of observation measurement errors and the commensurability error issue and to
reflect the real difficulty of separating input and boundary condition errors,
parameter errors, and model structural error in nonlinear cases. There is no general
theory available for doing this in nonlinear dynamic cases. One simplification can
be made in Equation 6.5: If applied on a model-by-model basis, model parameter
error has no real meaning. It is the model structure and set of effective parameter
values together that process the nonerror-free input data and determine total model
error in space and time. Thus, Equation 6.5 could be rewritten, for any model
structure, as
O(X, t) + ε_O(X, t) + ε_C(∆x, ∆t, X, t) = M(θ, I, ε_I, X, t) + ε_M(θ, I, ε_I, X, t) + ε_r    (6.6)
and ε_M(θ, I, ε_I, X, t) is a model-specific error term.
The question that then arises within this framework is whether ε_M(θ, I, ε_I, X, t)
is acceptable in relation to the terms ε_O(X, t) + ε_C(∆x, ∆t, X, t). This is equivalent
to asking if the following inequality holds:
O_min(X, t) < M(θ, I, ε_I, X, t) < O_max(X, t)  for all O(X, t)    (6.7)
where O_min(X, t) and O_max(X, t) are acceptable limits for the prediction of the output
variables given ε_O(X, t) and ε_C(∆x, ∆t, X, t), which together might be termed an
effective observation error. The effective observation error takes account of both
real measurement errors and commensurability errors between observed and
predicted variables. When defined in this way, the effective observation error need
have neither zero mean nor constant variance, nor be Gaussian or stationary in the form
of its distribution in space or time, particularly where there may be physical
constraints on the nature of that error. Note that the commensurability error might be
expected to be model-implementation dependent in that the difference between
observed and predicted variables may depend on model time and space
discretisations and measurement scales in relation to expected time and space heterogeneities
of the observable quantities. However, it should really be possible to develop a
methodology for making prior estimates of both measurement and commensurability
errors, since they should be independent of individual model runs. An objective
evaluation of each model run using Equation 6.7 should then be possible. If a model
does not provide predictions within the specified range, for any O(X, t), then it should
be rejected as nonbehavioral.
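The inequality in Equation 6.7 can be applied as a run-by-run test; the P-load observations and their asymmetric acceptability limits below are hypothetical, and a single excursion outside the limits is enough to reject a run:

```python
import numpy as np

def is_behavioral(sim, obs_min, obs_max):
    """Equation 6.7: retain a model run only if every prediction lies
    strictly within the effective observation error bounds."""
    sim = np.asarray(sim, dtype=float)
    return bool(np.all((sim > obs_min) & (sim < obs_max)))

# Hypothetical observed P loads with asymmetric limits (the bounds need
# not be symmetric, Gaussian, or stationary):
obs_min = np.array([0.8, 1.5, 2.9])
obs_max = np.array([1.4, 2.6, 4.5])

ok = is_behavioral([1.0, 2.0, 3.5], obs_min, obs_max)   # within all limits
bad = is_behavioral([1.0, 2.7, 3.5], obs_min, obs_max)  # fails the second limit
```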
This rejectionist framework, based on the equifinality concept, is analogous to
set-theoretic concepts previously used in environmental modeling (by, e.g., Klepper
et al. 1991; Osidele et al. 2005; Rose et al. 1991; Spear et al. 1994; van Straten and
Keesman 1991). It is also a generalization of the Hornberger-Spear-Young method
of Generalized Sensitivity Analysis (Hornberger and Spear 1981; Young 1983),
which was also based on a split of a series of Monte Carlo model runs into sets of
behavioral and nonbehavioral models. It results in a set of provisionally behavioral
models that satisfy all the evaluation criteria expressed in the form of
Equation 6.7.
The approach can also be relativist in taking account of the performance of
different models within the set of behavioral models (Beven 2004b, 2005). Within
the behavioral range, for all O(X, t), a positive weight could be assigned to the model
predictions, M(θ, I, ε_I, X, t), according to the level of past performance. The simplest
possible weighting scheme that need not be symmetric around the observed value,
given an observation O(X, t) and the acceptable range [O_min(X, t), O_max(X, t)], is the
triangular relative weighting scheme, but other bounded weighting schemes could
be used — including truncated Gaussian forms. A core range of observational
ambiguity, or equal weighting, could be added if required (Beven 2006a).
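The triangular scheme can be written as a weight of 1 at the observation falling linearly to 0 at each limit; the observation and limits below are hypothetical, and the asymmetry is deliberate:

```python
def triangular_weight(sim, obs, obs_min, obs_max):
    """Triangular relative weight for one prediction: 1.0 at the
    observed value, falling linearly to 0.0 at the acceptability
    limits, which need not be symmetric about the observation."""
    if sim <= obs_min or sim >= obs_max:
        return 0.0          # outside the limits: nonbehavioral
    if sim <= obs:
        return (sim - obs_min) / (obs - obs_min)
    return (obs_max - sim) / (obs_max - obs)

# Observation 2.0 with the asymmetric acceptable range [1.0, 5.0]:
w_at_obs = triangular_weight(2.0, 2.0, 1.0, 5.0)
w_high = triangular_weight(3.5, 2.0, 1.0, 5.0)
```

A core of equal weighting around the observation, or a truncated Gaussian, would simply replace the two linear branches.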
This methodology gives rise to some interesting possibilities. Within this frame-
work there is no possibility of a representation of model error being allowed to
compensate for poor model performance, even for the “optimal model,” unless the
acceptability limits are made artificially wide to avoid rejecting all of the models
— but this might not generally be considered to be good practice. If no model proves
to be behavioral, then it is an indication that there are conceptual, structural, or data
errors, though it may still be difficult to decide which is the most important. There
is perhaps then more possibility of learning from the modeling process on occasions
when it proves necessary to reject all the models tried.
However, this type of evaluation requires that consideration also be given to
input and boundary condition errors, since, as noted before, even the perfect model
might not provide behavioral predictions if it is driven with poor input data.
Thus, the combination of input and boundary data realization — within reasonable
bounds — and model structure and parameter set in producing M(θ, I, ε_I, X, t) should
be evaluated against the effective observational error. The result will hopefully still
be a set of behavioral models, each associated with some likelihood weight. Any
compensation effect between an input realization — and initial and boundary con-
ditions — and model parameter set in achieving success in the calibration period
will then be implicitly included in the set of behavioral models.

There is also the possibility that the behavioral models defined in this way
do not provide predictions that span the complete range of the acceptable error
around an observation. The behavioral models might, for example, provide
simulations of an observed variable O(X, t) that all lie in the range O(X, t) to O_max(X, t)
or even in just a small part of it. They are all still acceptable but are apparently
biased. This provides real information about the performance of the model or
other sources of error that can be investigated and allowed for specifically at that
site in prediction rather than being lost in a statistical representation of model
error.
6.6 UNCERTAINTY AS PART OF A LEARNING PROCESS
Both Bayesian and equifinality (rejectionist set-theoretic) concepts allow the mod-
eling process to be set up within a learning framework, using data assimilation to
update the model each time new data become available. This can be for short-term
forecasting with the aim of minimizing forecast uncertainty as conditioned on the
new data or in a simulation context with the aim of refining the model representation
of the system of interest as new information is received to update the Bayes likelihood
function or the weights associated with the set of behavioral models using the Bayes
equation, proposed by Thomas Bayes and published posthumously in 1763 (see Bernardo
and Smith 1994; Howson and Urbach 1993).
In formal Bayes theory, the posterior likelihood is intended to represent the
probability of predicting an observation, given the true model, L(Y | θ), where Y is the
observation vector and θ is the parameter vector:

L_p(O | θ) ∝ L_o(θ) L(θ | Y)    (6.8)

where L_p(O | θ) is the posterior probability of predicting observations O given a
model with parameter set θ, L_o(θ) is the prior likelihood of parameter set θ, and
L(θ | Y) is the likelihood given data Y. However, Bayes's equation was originally
stated in the more general conditioning form for hypotheses, H, given evidence, E, as

L_p(H | E) ∝ L_o(H) L(E | H)    (6.9)

or, in the discrete form for k potential hypotheses, proposed independently by Pierre-
Simon Laplace in 1820, as

L_p(H_k | E) = L_o(H_k) L(E | H_k) / Σ_k L_o(H_k) L(E | H_k)    (6.10)

where L_p(H_k | E) is the posterior likelihood for hypothesis H_k given the evidence E;
L_o(H_k) is a prior likelihood for H_k; and L(E | H_k) is the likelihood of predicting the
evidence E given H_k. Here, the hypotheses of interest are each model of the system,
including its parameter values and any other ancillary hypotheses.
When this type of Bayes conditioning is first applied, the prior likelihoods can
be chosen subjectively based on any available evidence about each model as hypoth-
esis. In later evaluations, which may be as each new piece or set of data becomes
available, the posterior for the last updating step becomes the prior for the next
updating step. It is therefore important that the errors at each updating step remain
consistent with the assumptions that underlie the definition of the likelihood function.
The equifinality approach as previously formulated in the extended GLUE meth-
odology is also effectively Bayesian in nature in that each time a new model
evaluation is made using Equation 6.10, the distribution of behavioral models and
the weights associated with them can be reevaluated as a combination of the prior
likelihood weights and the new evaluation for each model. The approach is less
formal in that more choices can be made about how to weight different behavioral
models and how to combine the weights in successive evaluations. It is easy to
include model evaluations based on multiple criteria within the GLUE methodology
in this way. Bayes’s equation implies a multiplicative combination but other types
of combinations, such as a weighted average of multiple evaluation measures, can
be used to provide a likelihood weight for each behavioral model. These choices
must, however, be made explicit so that they can be reproduced by others if necessary.
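The weight-combination step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of GLUE or MACRO as distributed: the function name and the choice of a simple equal-weight average for the informal alternative are ours.

```python
import numpy as np

def update_weights(prior, evaluation, mode="bayes"):
    """Combine prior likelihood weights with a new model evaluation.

    prior, evaluation: non-negative scores, one per candidate model
    (parameter set); nonbehavioral models carry a score of zero.
    mode="bayes" multiplies, as in the Bayes equation; mode="average"
    uses a simple equal-weight average, one informal GLUE alternative.
    """
    prior = np.asarray(prior, dtype=float)
    evaluation = np.asarray(evaluation, dtype=float)
    if mode == "bayes":
        posterior = prior * evaluation          # L_o(H_k) * L(E|H_k)
    else:
        posterior = 0.5 * (prior + evaluation)  # weighted-average combination
    total = posterior.sum()
    if total == 0.0:
        raise ValueError("all models rejected: no behavioral models remain")
    return posterior / total  # normalize over the behavioral set

# Three candidate models: equal priors; the new evidence rejects the third.
weights = update_weights([1/3, 1/3, 1/3], [0.8, 0.4, 0.0])
```

Note that whichever combination rule is chosen, a model given a zero evaluation at any step remains rejected thereafter under the multiplicative form, which is one reason the choice of rule must be made explicit.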
The most important part of the learning process is the successive application of
Equation 6.7. This defines the set of behavioral models and the rejection of
nonbehavioral models. In the case of multicriteria evaluation, this may mean that a
model that is successful on one criterion may be rejected on another.
This type of learning process will increasingly represent the way models are
implemented. The possibility of the routine application of data assimilation within
a learning framework raises some interesting questions about the nature of modeling
and model evaluation. Effectively, repeated updating and correction of model pre-
dictions will allow the data assimilation process to compensate for errors in model
inputs and model structure. Model evaluation therefore becomes more difficult. In
real-time forecasting, this may not be such a problem. In fact, the desire is for the
data assimilation process to compensate for errors in model inputs and model
structures if this results in improved forecasts with maximized accuracy and mini-
mized uncertainty.
This may not be the case, however, in simulation where such compensation may
not be desirable if it leads to a model structure being accepted when it should be
rejected. There is, of course, a fundamental difficulty in deconstructing the causes
of model error and isolating the model structural error alone (see discussions in
Beven 2005, 2006a; Beven and Young 2003; Kavetski et al. 2002). Though we do
not want to accept a model structure because of the compensation allowed by data
assimilation, equally we would not want to make the error of rejecting a perfectly
good model because of errors in the input data. Differentiating between these sources
of error may be very difficult, if only because the nature of both input errors and
model structural errors may be nonstationary in time.
Thus, there is a question as to whether the use of data assimilation can, in a
simulation context, reveal deficiencies in either model structures or input data.
There is an analogy here with the State Dependent Parameter (SDP) estimation
methodology used by Young (2003) and Romanowicz et al. (2004). In the SDP
approach, an initial estimate of a linear transfer function model is used within a
recursive data assimilation framework to examine how the best estimates of the
parameters change over time or with respect to some other variable. This can lead
to the identification of the dominant nonlinear modes of behavior of the system
based directly on the observations rather than prior conceptual assumptions about
the system response.
6.7 AN EXAMPLE APPLICATION
The following application of the methodology will illustrate some of the issues that
arise in thinking about different potential sources of error in the modeling processes
and the relationship between observed and predicted variables. It examines the use
of the MACRO-P model in predicting observations of P concentrations in the
drainage from slurry application experiments on instrumented grassland plots in
Scotland.
6.7.1 THE MACRO MODEL
The MACRO model of water flow in structured soils was developed by Jarvis and
colleagues (Jarvis 1994). It was adapted for colloid-facilitated transport of contam-
inants (Jarvis et al. 1999; Villholth et al. 2000), and subsequently the version used
here was adapted for P transport (McGechan 2002; McGechan et al. 2001). Only a
brief description of the representation of the more important model processes will
be given here as background to the model parameters investigated in the study.
MACRO is a soil profile model that is divided into a number of vertical layers
and is partitioned into micropore (soil matrix) and macropore domains. The boundary
between the two domains is described in the model by the air–entry soil–water
tension in the Brooks-Corey equation. The two domains function separately and are
associated with their own degree of saturation, hydraulic conductivity, and flux. Flow
in the micropores is calculated using the Richards (1931) equation. Flow in the
macropores is gravity driven at a rate determined by degree of macropore saturation.
Simulation of soluble P transport requires a single run of the model where the
convection-dispersion equation is solved for each layer at each time step to calculate
the P concentration. For simulation of colloid-facilitated P transport, two consecutive
model runs are required. In the first run the concentration of colloids is simulated
in place of the solute, and in the second run the concentration of P sorbed to sites
on the colloids is calculated. This requires the concentrations of colloidal particles
and inorganic P, respectively, to be specified. For this application, all colloidal
particles were those derived from the slurry applied on specific dates. An alternative
option in the model that was not used — to reduce the number of parameters to be
considered — was colloid generation by rainfall impact.
Sorption of P to the soil matrix is described using the Freundlich isotherm
equation

s = K_d c^n    (6.11)

where s is the sorbed phase concentration in either micropores or macropores, K_d is
the Freundlich sorption coefficient, c is the solution concentration, and n is the
Freundlich sorption exponent. For sorption to colloids a similar formulation is
employed, using a sorption coefficient for colloids, K_c, and with the Freundlich
sorption exponent set to unity.
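As a minimal sketch of Equation 6.11 (our own illustration, not MACRO code; the argument values are arbitrary):

```python
def freundlich_sorbed(c, kd, n=1.0):
    """Sorbed-phase concentration s = Kd * c**n (Equation 6.11).

    c  : solution concentration
    kd : Freundlich sorption coefficient
    n  : Freundlich sorption exponent (set to 1 for sorption to colloids,
         where kd is replaced by the colloid coefficient Kc)
    """
    return kd * c ** n

s_matrix = freundlich_sorbed(c=0.5, kd=2.0, n=1.2)  # soil matrix, nonlinear
s_colloid = freundlich_sorbed(c=0.5, kd=300.0)      # colloids, n = 1 (linear)
```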
Filtration, the physical trapping of particles in macropores and micropores
leading to the irreversible removal of colloids, is included using the following
equations. For macropores,

F = f_ref v^nf v_ref^(1−nf) c θ    (6.12)

where F is the filtration sink term, f_ref is a reference filter coefficient, nf is an
empirical exponent, v_ref is the pore water velocity at which f_ref is measured, and θ
is the volumetric soil water content for the domain. For micropores,

F = f_c v c θ    (6.13)

where f_c is the micropore filter coefficient.
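The two sink terms can be written as simple functions. This is a sketch only (the function and argument names are ours, and the macropore form assumes the reconstruction of Equation 6.12 above, i.e., a filter coefficient that scales with pore water velocity relative to the reference velocity):

```python
def filtration_macropore(v, c, theta, f_ref, nf, v_ref):
    """Macropore filtration sink (Equation 6.12):
    F = f_ref * v**nf * v_ref**(1 - nf) * c * theta,
    i.e., a filter coefficient f_ref * (v / v_ref)**(nf - 1) applied to v*c*theta."""
    return f_ref * v ** nf * v_ref ** (1.0 - nf) * c * theta

def filtration_micropore(v, c, theta, f_c):
    """Micropore filtration sink F = f_c * v * c * theta (Equation 6.13)."""
    return f_c * v * c * theta

# With nf = 1 the macropore form collapses to the micropore form with f_ref = f_c
f_macro = filtration_macropore(v=2.0, c=1.5, theta=0.4, f_ref=3.0, nf=1.0, v_ref=5.0)
f_micro = filtration_micropore(v=2.0, c=1.5, theta=0.4, f_c=3.0)
```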
6.7.2 STUDY SITE AND DATA
The study site is located at Crichton Royal Farm, Dumfries, Scotland. The annual
average precipitation at the site is 1054 mm (Hooda et al. 1999); precipitation for
the two simulation periods chosen for this study was 755 mm (October 1, 1994 to
March 31, 1995) and 624 mm (October 1, 1995 to April 30, 1996). The two 0.5-ha
grassland plots have a silty clay loam soil that showed significant vertical
macroporous flow channels (Hooda et al. 1999). A drainage system isolated the plots
from each other and from external areas, apart from deep percolation, which is
thought to be insignificant on these soils (Hooda et al. 1999; McGechan et al. 1997).

6.7.2.1 Drainage Discharge and Phosphorus Concentrations
From each plot the discharge from a main drain was recorded and summed to a
weekly total. Flow-proportional sampling was used to obtain an effective P con-
centration for the period since the previous sample. Different forms of P were
analyzed, but only total inorganic P (molybdate-reactive phosphorus, MRP) was
used for this study.
6.7.2.2 Slurry Applications

Slurry applications were made at a rate of approximately 50 m³ ha⁻¹ on the
following dates: February 14, 1994 (23 kg ha⁻¹ total phosphorus (TP), 5000 g m⁻³
colloid concentration); May 25, 1994 (8.4 kg ha⁻¹ TP, 1825 g m⁻³); November 21,
1994 (17 kg ha⁻¹ TP, 3700 g m⁻³); May 31, 1995 (12.7 kg ha⁻¹ TP, 2760 g m⁻³);
July 7, 1995 (4.1 kg ha⁻¹ TP, 900 g m⁻³); and January 31, 1996 (23.9 kg ha⁻¹ TP,
5200 g m⁻³). For assumptions made in estimating the P and colloid concentrations
in slurry, see McGechan (2002).
6.7.3 MACRO IMPLEMENTATION WITHIN A MODEL
REJECTION FRAMEWORK
MACRO has a large number of parameter values that must be specified before a run
is made. Many of the parameters are effective parameters with values that are difficult
to specify at the scale of application because of scale and heterogeneity effects. Here
the most important parameters based on previous experience of calibrating MACRO
have been varied across ranges defined to allow for uncertainty in these effective
values (Table 6.1). Lacking knowledge about the nature of the prior distributions
and the possible covariation of the parameters, uniform independent distributions
have been assumed, noting that the evaluation of the simulations from each parameter
set will result in posterior likelihood weights that reflect any interactions among the
effective parameter values in achieving an acceptable simulation.
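The uniform independent sampling of parameter sets can be sketched as follows. The ranges shown are a subset of Table 6.1 (non-layered parameters only); the parameter names follow that table, but the code itself is our illustration, not part of MACRO.

```python
import random

# (min, max) ranges for a few of the sampled parameters; Table 6.1 gives
# the full, layer-dependent set.
ranges = {
    "FREUND": (0.8, 2.2),     # Freundlich exponent n (soil matrix)
    "ZKDPC": (0.1, 5.0),      # Kd, soil matrix (m3 g-1)
    "FILTERMI": (0.0, 80.0),  # micropore filter coefficient (m-1)
    "REFILTER": (0.0, 4.0),   # macropore reference filter coefficient (m-1)
    "AREA": (1.29, 1.75),     # drainage basin area (ha)
}

def sample_parameter_set(rng):
    """Draw one parameter set from independent uniform priors."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}

rng = random.Random(42)  # fixed seed for repeatability
parameter_sets = [sample_parameter_set(rng) for _ in range(10000)]
```

Each sampled set would then drive one MACRO run, with the evaluation of that run determining its posterior likelihood weight.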
For the two periods of interest, as specified previously (1994 to 1995 and 1995
to 1996), 10,000 and 11,500 model simulations were carried out, respectively. At
each weekly time step, MACRO’s estimates of discharge and P concentration were
compared to the observed data described previously. Use of weekly discharge totals
and weekly flow-weighted average concentrations in the model evaluation does add
a degree of incommensurability to the model evaluations. This is particularly the
case for concentrations that can vary rapidly within hydrological events, particularly
at this relatively small scale, meaning that flow-weighted average concentrations
may systematically differ from those predicted by the model. For discharge the
problem is likely to be less pronounced, as the simulation of weekly discharge will
depend less on correct model dynamics and more on longer-term mass balances;
this does, however, mean that the evaluations will not provide a strong constraint
on the model’s description of the flow pathways.
To allow for these problems, effective observation errors were defined in a fuzzy
manner by allowing model simulations to be considered acceptable within some
range of effective observation error in the GLUE framework as described previously.
The errors were taken as ±40% for flow-weighted P concentrations and −30% to
between +30 and +300% for discharge. The range +30 to +300% was used for the
upper limit of the discharge evaluations because very low flows were known to be
underestimated by the measurement equipment. For this reason positive errors were
estimated on a linear sliding scale from +30% for flows ≥ 80 mm wk⁻¹ to +300%
for flows ≤ 1 mm wk⁻¹. The ranges were implemented as triangular fuzzy
distributions (see Figure 6.1).
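The fuzzy bounds and triangular weighting for discharge can be sketched as follows (an illustration under the stated assumptions; function names and the linear-interpolation endpoints are ours, taken from the percentages in the text):

```python
def discharge_error_bounds(q_obs):
    """Fuzzy acceptability bounds for a weekly discharge observation (mm/wk).

    Lower bound is -30%. The upper bound slides linearly from +30% for
    flows >= 80 mm/wk to +300% for flows <= 1 mm/wk, reflecting the known
    underestimation of very low flows by the measurement equipment.
    """
    if q_obs >= 80.0:
        upper_pct = 30.0
    elif q_obs <= 1.0:
        upper_pct = 300.0
    else:
        # linear interpolation between (1 mm, +300%) and (80 mm, +30%)
        upper_pct = 300.0 + (q_obs - 1.0) * (30.0 - 300.0) / (80.0 - 1.0)
    return q_obs * 0.7, q_obs * (1.0 + upper_pct / 100.0)

def triangular_weight(sim, obs, lower, upper):
    """Triangular fuzzy weight: 1 at the observation, 0 at (or beyond) the bounds.
    Assumes lower < obs < upper."""
    if sim <= lower or sim >= upper:
        return 0.0
    if sim <= obs:
        return (sim - lower) / (obs - lower)
    return (upper - sim) / (upper - obs)
```

For the flow-weighted P concentrations the corresponding bounds would simply be ±40% of each observed value.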

TABLE 6.1
MACRO Parameters Varied within the GLUE Simulations and Ranges Used for Sampling

Hydrology

KSATMIN, macropore Ksat(a): min 37.5 (1), 20.8 (2), 16.7 (3-8), 5 × 10⁻⁷ (9); max 3750 (1), 2080 (2), 1670 (3-8), 5 × 10⁻⁷ (9); units mm hr⁻¹

TPORV, total porosity(a): min 50 (1), 47.4 (2), 43.9 (3-9); max 54.8 (1), 52.4 (2), 48.5 (3-9); units % (volume)

MAC_POR, macroporosity fraction(a): min 2.12 (1), 2.71 (2), 0.972 (3-9); max 2.59 (1), 3.31 (2), 1.19 (3-9); units % (volume)

CTEN, critical soil moisture tension MIC/MAC (determines when macropores flow): min 8.5 (1-9); max 17.5 (1-9); units cm H2O

KTHETA, hydraulic conductivity at the critical soil moisture tension (CTEN): min 0.5 (1), 0.36 (2), 0.075 (3-8), 1 × 10⁻²⁶ (9); max 6.0 (1), 1.92 (2), 0.4 (3-8), 1 × 10⁻²⁶ (9); units mm hr⁻¹

THETAINI, initial soil moisture content: min 38.0 (1), 35.6 (2-3), 29.0 (4), 32.2 (5), 32.6 (6), 32.9 (7), 32.9 (8), 36.3 (9); max 55.0 (1), 53.5 (2-3), 43.6 (4), 48.3 (5), 48.9 (6), 48.9 (7), 49.3 (8), 54.4 (9); units % (volume)

AREA, drainage basin area (controls deep percolation): min 1.29; max 1.75; units ha

P process

FREUND (n), Freundlich sorption exponent (soil matrix): min 0.8; max 2.2; dimensionless

ZKDPC (K_d), Freundlich sorption coefficient (soil matrix): min 0.1; max 5; units m³ g⁻¹

ZKD (K_c), Freundlich sorption coefficient (colloidal particles): min 91 (1), 185 (2), 300 (3), 750 (4), 1531 (5), 4500 (6-9); max 366 (1), 740 (2), 1400 (3), 6000 (4), 7500 (5), 11000 (6-9); units (m³)^n mg^(1−n) kg⁻¹

FILTERMI (f_c), filter coefficient (micropores): min 0; max 80; units m⁻¹

REFILTER (f_ref), reference filter coefficient (macropores): min 0; max 4; units m⁻¹

Note: Values in brackets represent the numbering of the soil layers in the model (layer 1 at the surface).
(a) Parameters varied within ranges by a multiplicative factor applied to all layers simultaneously.
In assessing the performance of each simulation, it was accepted as behavioral only
if simulated values were within the prescribed range for all time steps. For simula-
tions that met this criterion the fuzzy measures at each time step were combined by
addition to generate a weighting coefficient. Simulations where the criterion was
not met were given a likelihood weighting of zero; i.e., the simulation was deemed
nonbehavioral and was rejected. Furthermore, for simulations to be deemed
behavioral, they had to achieve this criterion for both discharge and concentration
information. Only the simulations that satisfied both criteria were used in the calculation
of the distribution of simulated P fluxes. It is demonstrated in the following that
even these rather relaxed rejection criteria had to be relaxed further to allow some
predictability using this model in this application.
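The screening rule described above, with rejection if the simulation falls outside the bounds at any single time step and a summed fuzzy weight otherwise, can be sketched as follows (our own illustration; it assumes each observation lies strictly inside its bounds):

```python
def evaluate_simulation(sim_series, obs_series, bounds_series):
    """Screen one simulation against fuzzy bounds at every time step.

    Returns the summed triangular fuzzy weight if the simulation stays
    inside the bounds at all time steps, and 0.0 (nonbehavioral,
    rejected) otherwise.
    """
    total = 0.0
    for sim, obs, (lower, upper) in zip(sim_series, obs_series, bounds_series):
        if not (lower <= sim <= upper):
            return 0.0  # one miss anywhere means rejection
        # triangular membership: 1 at the observation, 0 at the bounds
        if sim <= obs:
            total += (sim - lower) / (obs - lower)
        else:
            total += (upper - sim) / (upper - obs)
    return total
```

In the study itself, a parameter set was retained only if this evaluation returned a nonzero weight for both the discharge series and the concentration series.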
6.7.4 RESULTS AND DISCUSSION
6.7.4.1 Using Initial Rejection Criteria
From the 10,000 and 11,500 model simulations for the periods from 1994 to 1995
and 1995 to 1996, using the stringent concentration fuzzy acceptance criteria
(Figure 6.2), there were no behavioral simulations. For discharge-constrained
simulations, there was a single behavioral simulation for 1994 to 1995, and no
behavioral simulations remained after both evaluation periods. However, even that
single discharge-constrained simulation could be rejected, as it did not satisfy the
P concentration criteria.
One possible reason for the model’s poor performance with respect to con-
centration is P inputs from grazing animals. These inputs were poorly known,
both in their timing and their magnitude, and were not included as model inputs.
The likely effects of grazing inputs on discharge P concentrations are discussed
further below.
For discharge, the only data problem known a priori was that for the study period
from 1994 to 1995 there was approximately 13 mm more discharge than was
observed as rainfall input (767.9 mm and 755 mm, respectively). There was no
apparent explanation for this, though it appears that there may be a long history of
this in plot experiments subject to the possibility of subsurface inflows from upslope
(see Beven 2004c). Furthermore, the problem is likely to be more pronounced, owing
to the potential underestimation of low magnitude discharges mentioned previously.
The effects of this problem are also discussed in the following.
FIGURE 6.1 Example of triangular fuzzy distribution of observed variable used in model
constraint.
FIGURE 6.2 Time series of fuzzy ranges for stringent rejection criteria (dashed line)
compared to relaxed rejection criteria (solid line) for concentration and discharge. Left-hand
plots show 1994 to 1995 data; right-hand plots show 1995 to 1996 data.
Rejection of all models does not allow utilization of the model for prediction
but does force attention on possible problems rather than relying on a statistical
representation of model error to compensate for such problems. In this study it does
appear as if there are important errors in the inputs to the model, and in the
observations with which it is being compared, as well as possible model structural
errors. There is, therefore, the possibility of unjustifiably rejecting a model because
of errors in the data. Thus, in the case of the grazing inputs and the imbalance in
the hydrological fluxes, it is argued that the rejection criteria should be relaxed as the
model should not be expected to simulate correctly with incorrect or missing inputs.
6.7.4.2 Using Relaxed Rejection Criteria
The rejection criteria were relaxed only at those time steps where many
parameterizations were rejected, so as to retain a reasonable number of simulations
for analysis and to achieve adequate simulation of the high-magnitude flux time
steps. This approach implicitly
allowed the investigation of performance at individual time steps and the assessment
of apparent errors in terms of their origins. The relaxed criteria (Figure 6.2) allowed
the designation of behavioral parameterizations as given in Table 6.2. Results of
these most successful simulations are presented in the following in relation to the
original rejection criteria so that deviations are clearly seen.
6.7.4.3 Simulations for the Period from 1994 to 1995
Figure 6.3a shows likelihood-weighted percentiles of simulated discharge com-
pared to the original fuzzy measure ranges. It can be seen that, in general, the
observed pattern is simulated well but that there is a systematic underestimation
for a number of time steps, having both low and high discharge magnitudes.
Conversely, and more importantly for the rejection of parameter sets, the two time
steps that were the most poorly simulated (steps 13 and 14) are overestimated by
the model. Presented in Figure 6.3b are histograms of simulated discharge for
time steps 13 and 14 together with a well-simulated time step 17, where the
simulated distribution is encompassed, and approximately centered within, the orig-
inal range of acceptable error. Figure 6.3c shows the same time steps as Figure 6.3b
but compares the cumulative distribution functions (CDF) of the observed and
simulated discharges. The deviations of given quantiles between the two CDFs
for each time step are plotted as a time series in Figure 6.3d where the pattern of
TABLE 6.2
Behavioral Simulation Numbers as Constrained by the Relaxed Criteria

Study Period     Discharge    Concentration    Flux
1994 to 1995     4120         691              620
1995 to 1996     3010         755              536
FIGURE 6.3 (a) Likelihood-weighted percentiles of simulated discharge compared to the
stringent fuzzy rejection criteria for the study period from 1994 to 1995 (see legend); note the
poor simulation of time steps 13 and 14. (b) Histograms of simulated discharge compared to
the stringent (full line) and relaxed (dashed line) fuzzy rejection criteria for time steps 13, 14,
and 17. (c) Comparison of cumulative distribution functions for the observed stringent
distribution (dashed line) and the simulated distribution (full line) for time steps 13, 14, and
17. (d) Time series of quantile deviations (see legend); note that positive numbers on the
y axis denote an overestimation and vice versa.
general underprediction can be seen, expressed as negative quantile deviations.
The fact that time steps 13 and 14 have low-magnitude observed drainage dis-
charges, and hence may be subject to the problem of low-flow measurement error,
does not appear to be significant as there is no strong relationship between under-
estimation and the magnitude of the discharge, although the use of weekly dis-
charges results in a loss of information regarding the dynamics of individual events.
The general underestimation is consistent with the known inconsistency between
rainfall inputs and observed discharges for the period. However, the use of fuzzy
observations has allowed retention of behavioral parameterizations, but it must be
remembered that potential bias may now be incorporated into the likelihood
weightings assigned to successful parameter sets if used for prediction purposes,
though this can be examined explicitly by looking at the distribution of retained
predictions in relation to any of the observed values.
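The likelihood-weighted percentiles referred to here (and plotted in Figures 6.3 to 6.6) can be computed from the behavioral ensemble as weighted quantiles. The following sketch is our own, using a common cumulative-weight interpolation approach rather than any particular GLUE implementation:

```python
import numpy as np

def weighted_quantiles(values, weights, qs):
    """Quantiles of an ensemble of behavioral predictions, with each
    simulation's value carrying its likelihood weight."""
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    cdf = np.cumsum(w) / np.sum(w)
    # interpolate the empirical weighted CDF at the requested quantiles
    return np.interp(qs, cdf, v)

# e.g. 5th, 50th, and 95th percentiles of simulated weekly discharge
p5, p50, p95 = weighted_quantiles([3.1, 4.0, 2.2, 5.6],
                                  [0.2, 0.5, 0.1, 0.2],
                                  [0.05, 0.50, 0.95])
```

Comparing such simulated quantiles with the observed fuzzy ranges at each time step gives the quantile deviations plotted in panels (d) of the figures.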
A similar evaluation of the P concentration simulations showed an under-
estimation for most time steps (Figure 6.4a,d). To obtain a useful number of
behavioral simulations, the rejection criteria had to be relaxed significantly for the
majority of time steps. For example, see Figure 6.2b and time step 3, and Figure 6.4b.
There is a strong positive relationship between underestimation and the concen-
tration magnitude, particularly for those time steps not directly associated with
the slurry application of November 21, 1994 (time step 6). The underestimation
also occurs mainly for time steps with relatively high discharges. This suggests
that the underestimation may result from the lack of P inputs derived from grazing
but may also be as a result of poor model representation of processes during
hydrological events.
The relaxation of the constraining ranges allowed satisfactory simulation of time
step 6, which is associated with the slurry application (Figure 6.4a), although the
distribution of simulated concentrations for this time step shows a general under-
estimation (Figure 6.4b,c). The behavioral simulations for both discharge and con-
centration were used to estimate MRP fluxes, where the effects of underestimation
of both discharge and concentration result in significant underestimation of MRP
fluxes for most of the high magnitude time steps (not shown here). As mentioned
previously, there is no information regarding the possible imbalance between rainfall
and discharge, and it remains unclear whether or not the concentration inconsisten-
cies can be fully explained by grazing inputs.
6.7.4.4 Simulations and Parameterizations for the Period
1995 to 1996
The discharge quantiles for the period 1995 to 1996 show a random pattern of error
compared to the original ranges of effective observation error (Figure 6.5a,d). There
is no apparent relationship between underestimation or overestimation and discharge

magnitude. Two time steps proved the most problematic to simulate: 5 and 13
(Figure 6.5a through 6.5d). Both were underestimated, even though the model
explicitly includes a preferential flow component. Time step 1 was also difficult to
simulate well in combination with the underestimated time step 5. To achieve an
adequate number of behavioral simulations, the rejection criteria had to be relaxed
FIGURE 6.4 (a) Likelihood-weighted percentiles of simulated concentration compared to
the stringent fuzzy rejection criteria for the study period from 1994 to 1995 (see legend).
(b) Histograms of simulated concentration compared to the stringent (full line) and relaxed
(dashed line) fuzzy rejection criteria for time steps 3, 6, and 20. (c) Comparison of cumulative
distribution functions for the observed stringent distribution (dashed line) and the simulated
distribution (full line) for time steps 3, 6, and 20. (d) Time series of quantile deviations (see
legend); note that positive numbers on the y axis denote an overestimation and vice versa.
FIGURE 6.5 (a) Likelihood-weighted percentiles of simulated discharge compared to the
stringent fuzzy rejection criteria for the study period 1995 to 1996 (see legend). (b) Histograms
of simulated discharge compared to the stringent (full line) and relaxed (dashed line) fuzzy
rejection criteria for time steps 1, 5, and 13. (c) Comparison of cumulative distribution
functions for the observed stringent distribution (dashed line) and the simulated distribution
(full line) for time steps 1, 5, and 13. (d) Time series of quantile deviations (see legend); note
that positive numbers on the y axis denote an overestimation and vice versa.
for time step 1 to allow far higher initial flows, achieved by the specification of a
high initial soil moisture content (θ_init).
Simulation of P concentrations showed a significant underestimation from time
steps 1 to 7 and 11 but with more reasonable distributions from time steps 8 to 10
and 12 to 13 (Figure 6.6a and 6.6d). Figure 6.6b and 6.6c show that for the time
step associated with the slurry application of January 31, 1996 (time step 8; January
15, 1996 to February 12, 1996), the distribution of simulated concentration is
adequate and centered on the observed value. As for the study period from 1994 to
1995, the general underestimation for other time steps may result from the lack of
grazing inputs, but similarly a strong relationship exists between the magnitude of
weekly discharge and underestimation, excluding the time step associated with the
slurry application; the observed relationship is nonlinear with decreasing severity of
underestimation with a decrease in weekly discharge.
In contrast to the study period from 1994 to 1995, the simulated MRP fluxes,
resulting from the likelihood-weighted distributions of behavioral discharges and
concentrations, represent the observed pattern well. Some of the earlier time steps
are underestimated as a result of the underestimation of concentrations but are
relatively unimportant in terms of overall MRP flux for the period.
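The flux calculation itself is a simple product of discharge volume and flow-weighted concentration. The following sketch (function name, units, and example values are ours) illustrates the conversion for a single weekly value:

```python
def mrp_flux(discharge_mm, concentration_mg_per_l, area_ha):
    """Weekly MRP flux (kg) from weekly discharge (mm) and the
    flow-weighted mean MRP concentration (mg/l) over a plot of given area.

    1 mm of drainage over 1 ha = 10 m3 = 10,000 l; 1 mg = 1e-6 kg.
    """
    volume_l = discharge_mm * area_ha * 10000.0
    return volume_l * concentration_mg_per_l * 1e-6

# e.g. 20 mm/wk at 0.5 mg/l over a 0.5-ha plot gives 0.05 kg MRP
flux_kg = mrp_flux(20.0, 0.5, 0.5)
```

Applying this to every (discharge, concentration) pair in the behavioral ensemble yields the likelihood-weighted distribution of simulated MRP fluxes discussed above.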
6.8 LEARNING FROM REJECTION: WHAT IF ALL THE MODELS
TRIED ARE NONBEHAVIORAL?
As noted at the start of this chapter, it is possible to look at a model of a system as
a hypothesis about how that system works. Consideration of multiple model struc-
tures and parameter sets is then analogous to considering multiple competing hypoth-
eses about how the system works (Beven 2002b). The possibility of model rejection
within the modeling framework outlined in this chapter does lead to the possibility
that all the models tried as hypotheses will be rejected, as in the application presented
here (see also, e.g., Freer et al. 2002; Page et al. 2006). This contrasts with the
statistical approach within which at least the best model found will be retained, as
if it were correct, and the remaining error can be represented by an error model.
The assumption that the model is correct, together with an overestimation of the
information content of the observational data, can lead to some strange conclusions, however.
Thiemann et al. (2001), for example, applying a Bayesian data assimilation approach
in a rainfall-runoff modeling application, showed how the updating at each time step
results in a very well-defined set of parameters for the model but is associated with
a larger error variance. In addition, each time there is a rainfall event the observations
are outside the confidence limits of the error model. In such a case it might have
been better to allow for other sources of error, including model structural error, or
to allow that even the best model does not give an adequate simulation and should
be rejected (see discussion by Beven and Young 2003).
But concluding that all the models tried are nonbehavioral also demands a
response. One response is to ensure that the model space has been searched suffi-
ciently to be sure that no behavioral models exist. In high-dimensional spaces this
may require a very large number of model runs, and in the application reported here
FIGURE 6.6 (a) Likelihood-weighted percentiles of simulated concentration compared to
the stringent fuzzy rejection criteria for the study period 1995 to 1996 (see legend).
(b) Histograms of simulated concentration compared to the stringent (full line) and relaxed
(dashed line) fuzzy rejection criteria for time steps 1, 8, and 11. (c) Comparison of cumulative
distribution functions for the observed stringent distribution (dashed line) and the simulated
distribution (full line) for time steps 1, 8, and 11. (d) Time series of quantile deviations (see
legend); note that positive numbers on the y axis denote an overestimation and vice versa.
computing resources limited the number of simulations performed to some 10,000
or so, which might be insufficient to find the best possible simulations in the
parameter space searched. Iorgulescu et al. (2005), for example, using a model with
17 parameters, carried out 2 billion simulations and accepted only 236.
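The kind of exhaustive Monte Carlo search of the model space discussed above can be sketched in a few lines. The sketch below is a minimal illustration only: `run_model` is a toy two-parameter stand-in for a real simulation model such as MACRO, the "observations" are synthetic, and the simple limits-of-acceptability test is an assumption made for the example, not the rejection criteria used in the study.

```python
import random

def run_model(params):
    # Stand-in for a real simulation model: returns a simulated
    # time series from a toy linear response (illustrative only).
    a, b = params["a"], params["b"]
    return [a * t + b for t in range(10)]

def is_behavioral(simulated, observed, tolerance):
    # Reject a run if any simulated value falls outside
    # observation +/- tolerance (a simple limits-of-acceptability test).
    return all(abs(s - o) <= tolerance for s, o in zip(simulated, observed))

random.seed(1)
observed = [2.0 * t + 1.0 for t in range(10)]  # synthetic "observations"

behavioral = []
n_runs = 10_000  # cf. the ~10,000 runs reported in this application
for _ in range(n_runs):
    params = {"a": random.uniform(0.0, 4.0), "b": random.uniform(-2.0, 4.0)}
    sim = run_model(params)
    if is_behavioral(sim, observed, tolerance=1.5):
        behavioral.append(params)

print(f"{len(behavioral)} of {n_runs} runs retained as behavioral")
```

In a real application each model run may be expensive, and with many parameters the fraction of the space sampled by even millions of runs remains small, which is why the adequacy of the search must be considered before concluding that no behavioral models exist.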
It is also necessary to ensure that the reasons for failure can be properly justified
and are not just because of outliers or errors in the measurements or treatment of
the input data used to drive the models or the observations with which the models
are being compared. Some periods of dubious data, or particular types of model
failure, might be identified in this way. This was also an issue in the example
application previously reported here, though it has to be recognized that it is always
difficult to go back over what was done in a particular experiment when such
difficulties are identified. This is clearly, however, one way in which modelers and
field experimentalists might usefully interact in discussing why particular problems
might have occurred. Much more interaction is needed to develop appropriate
parameterizations of processes at the scale at which they will be applied (Beven 2006b).
The most satisfying response would be to learn from the model rejections to try
to improve the model structure. In complex model structures, however, there is
clearly no easy way of moving from identifying a lack of functionality resulting in
model rejection to a modification to improve that functionality. This is a step
requiring creativity from the modeler. The least satisfying response is to relax the
criteria of rejection — as was necessary in this example — to allow for the multiple
unknown sources of error and to ensure that a sample of behavioral models is retained
for use in prediction. However, again, examination of the reasons for model failure
might be sufficient to justify such a relaxation. This is a step requiring the modeler
to act responsibly.
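One way to make the relaxation of rejection criteria concrete is with fuzzy limits of acceptability around each observation, combined by taking the minimum membership across time steps. The Python sketch below is hypothetical: the trapezoidal membership shape, the core and support widths, and the example concentrations are illustrative assumptions and are not the fuzzy criteria actually used in this study.

```python
def fuzzy_score(sim, obs, core, support):
    # Trapezoidal membership: 1 inside obs +/- core, falling
    # linearly to 0 at obs +/- support (support > core).
    d = abs(sim - obs)
    if d <= core:
        return 1.0
    if d >= support:
        return 0.0
    return (support - d) / (support - core)

def evaluate(simulated, observed, core, support):
    # A run is behavioral only if every time step has nonzero
    # membership; the overall score is the weakest time step.
    scores = [fuzzy_score(s, o, core, support)
              for s, o in zip(simulated, observed)]
    return min(scores)

observed = [100.0, 250.0, 400.0]   # e.g. MRP concentrations, ug/l (illustrative)
simulated = [120.0, 300.0, 380.0]

strict = evaluate(simulated, observed, core=10.0, support=40.0)
relaxed = evaluate(simulated, observed, core=30.0, support=120.0)
print(strict, relaxed)  # strict rejects this run (score 0); relaxed retains it
```

Widening the core and support is precisely the relaxation step discussed above: the same simulation that fails the stringent criteria can be retained as behavioral under the relaxed ones, which is why any such widening needs explicit justification from the likely errors in the input and observational data.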
6.9 WHAT ARE THE IMPLICATIONS FOR P MODELS?
In outlining the stages of the modeling process, Beven (2001) distinguished between
the perceptual model of the processes governing a flux of interest and the conceptual
model of those processes as represented in mathematical form. It is important to
recognize that the relationship between the two is complex (see Beven 2002a, 2002b,
2006a, 2006b). In particular the perceptual model can recognize many aspects of
the processes that are difficult to represent in mathematical form. The conceptual
model used in prediction is therefore a simplification, and often a gross simplifica-
tion, of the perceptual model. This is certainly true of models of P mobilization and
transport to stream channels. The perceptual model will recognize the possibility of
complex controls on P mobilization in different forms into different pathways,
including spatial and temporal nonstationarities, in a way that the conceptual models
described in later chapters of this volume can reproduce only approximately. In the
perceptual model, a qualitative description always can be provided of processes that
may be missing from some conceptual models, such as the role of macropores and
preferential pathways in the soil on P movement, or the role of colloids in controlling
time variable surface infiltration rates by blocking some larger flow pathways, or
the highly variable depths of overland flow and their interaction with raindrop
impacts in mobilizing particulate attached P. The conceptual models must assume