Engineering Statistics Handbook Episode 5 Part 11 docx

4. Process Modeling
4.4. Data Analysis for Process Modeling
4.4.4. How can I tell if a model fits my data?
4.4.4.2. How can I detect non-constant variation across the data?
Scatter Plots Allow Comparison of Random Variation Across Data
Similar to their use in checking the sufficiency of the functional form of the model, scatter plots
of the residuals are also used to check the assumption of constant standard deviation of random
errors. Scatter plots of the residuals versus the explanatory variables and versus the predicted
values from the model allow comparison of the amount of random variation in different parts of
the data. For example, the plot below shows residuals from a straight-line fit to the
Pressure/Temperature data. In this plot the range of the residuals looks essentially constant across
the levels of the predictor variable, temperature. The scatter in the residuals at temperatures
between 20 and 30 degrees is similar to the scatter in the residuals between 40 and 50 degrees and
between 55 and 70 degrees. This suggests that the standard deviation of the random errors is the
same for the responses observed at each temperature.
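The visual comparison described above can be backed up with a rough numerical check. The sketch below is only an illustration: the data are synthetic stand-ins (the handbook's Pressure/Temperature values are not reproduced here), and the helper name `spread_by_bin` and the bin edges are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the Pressure/Temperature data (values are made up).
temperature = rng.uniform(20, 70, size=40)
pressure = 7.7 + 3.9 * temperature + rng.normal(0, 4.0, size=40)

# Straight-line least squares fit and its residuals.
slope, intercept = np.polyfit(temperature, pressure, deg=1)
residuals = pressure - (intercept + slope * temperature)

def spread_by_bin(x, resid, edges):
    """Sample standard deviation of the residuals within each bin of x."""
    x = np.asarray(x, dtype=float)
    resid = np.asarray(resid, dtype=float)
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (x >= lo) & (x < hi)
        # A bin needs at least two points for a sample standard deviation.
        out.append(resid[in_bin].std(ddof=1) if in_bin.sum() > 1 else float("nan"))
    return np.array(out)

edges = [20, 30, 40, 50, 60, 70.0001]
sd_per_bin = spread_by_bin(temperature, residuals, edges)
for lo, hi, sd in zip(edges[:-1], edges[1:], sd_per_bin):
    print(f"{lo:5.0f} to {hi:5.0f} degrees: residual sd = {sd:.2f}")
```

Roughly similar values in every bin are consistent with a constant standard deviation. With only a handful of points per bin the estimates are noisy, so the numbers supplement the residual plot rather than replace it.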
[Figure: Residuals from Pressure / Temperature Example]
4.4.4.2. How can I detect non-constant variation across the data?
(1 of 6) [5/1/2006 10:22:13 AM]
Modification of Example
To illustrate how the residuals from the Pressure/Temperature data would look if the standard
deviation were not constant across the different temperature levels, a modified version of the data
was simulated. In the modified version, the standard deviation increases with increasing values of
pressure. Situations like this, in which the standard deviation increases with increasing values of
the response, are among the most common ways that non-constant random variation occurs in
physical science and engineering applications. A plot of the data is shown below. Comparison of
these two versions of the data is interesting because in the original units of the data they don't
look strikingly different.
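A simulation of this kind can be sketched as follows. The handbook does not give the values used, so the line coefficients, the noise levels, and the choice of an error standard deviation proportional to the mean response are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

temperature = np.linspace(20, 70, 60)
mean_pressure = 7.7 + 3.9 * temperature      # hypothetical straight-line trend

# Constant-sd version: every random error has the same standard deviation.
pressure_const = mean_pressure + rng.normal(0, 4.0, size=temperature.size)

# Modified version: the error sd is proportional to the mean response (assumed form).
sd_growing = 0.05 * mean_pressure
pressure_modified = mean_pressure + rng.normal(0, 1.0, size=temperature.size) * sd_growing

def fit_residuals(x, y):
    """Residuals from a straight-line least squares fit of y on x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (intercept + slope * x)

for label, y in [("constant sd", pressure_const), ("growing sd", pressure_modified)]:
    r = fit_residuals(temperature, y)
    low, high = r[temperature < 45], r[temperature >= 45]
    print(f"{label:12s} sd(low T) = {low.std(ddof=1):6.2f}   sd(high T) = {high.std(ddof=1):6.2f}")
```

Both versions follow the same straight line, which is why they look alike in the original units. The difference appears mainly when the residual spread is compared across the temperature range.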
[Figure: Pressure Data with Non-Constant Residual Standard Deviation]
Residuals Indicate Non-Constant Standard Deviation
The residual plot from a straight-line fit to the modified data, however, highlights the
non-constant standard deviation in the data. The horn-shaped residual plot, starting with residuals
close together around 20 degrees and spreading out more widely as the temperature (and the
pressure) increases, is a typical plot indicating that the assumptions of the analysis are not
satisfied with this model. Other residual plot shapes besides the horn shape could indicate
non-constant standard deviation as well. For example, if the response variable for a data set
peaked in the middle of the range of the predictors and was small for extreme values of the
predictors, the residuals plotted versus the predictors would look like two horns with the bells
facing one another. In a case like this, a plot of the residuals versus the predicted values would
exhibit the single horn shape, however.
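The two-horn case can be explored with a small simulation. This is a sketch under stated assumptions: the quadratic mean function, the error standard deviation proportional to the response, and the helper `abs_resid_corr` are all invented for illustration and are not taken from the handbook.

```python
import numpy as np

rng = np.random.default_rng(2)

x = np.linspace(-1, 1, 200)
mean_y = 10.0 - 8.0 * x**2                  # response peaks mid-range (hypothetical)
# Error sd is 10% of the mean response, so it is largest in the middle.
y = mean_y + rng.normal(0, 1.0, size=x.size) * 0.1 * mean_y

coeffs = np.polyfit(x, y, deg=2)            # quadratic fit, matching the true form
fitted = np.polyval(coeffs, x)
residuals = y - fitted

def abs_resid_corr(values, resid):
    """Pearson correlation between |residual| and the plotting variable."""
    return float(np.corrcoef(values, np.abs(resid))[0, 1])

print("vs predictor:", round(abs_resid_corr(x, residuals), 2))
print("vs predicted:", round(abs_resid_corr(fitted, residuals), 2))
```

Plotted against the predictor, the residual spread is widest in the middle and narrow at both ends, so a linear association between the residual size and the predictor tends to be weak. Plotted against the predicted values, the spread grows in one direction only, which is the single horn described above.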
[Figure: Residuals from Modified Pressure Data]
Residual Plots Comparing Variability Apply to Most Methods
The use of residual plots to check the assumption of constant standard deviation works in the
same way for most modeling methods. It is not limited to least squares regression even though
that is almost always the context in which it is explained. The plot below shows the residuals
from a LOESS fit to the data from the Thermocouple Calibration example. The even spread of the
residuals across the range of the data does not indicate any changes in the standard deviation,
leading us to the conclusion that this assumption is not unreasonable for these data.
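To illustrate that the same residual check applies to a smoother, the sketch below uses a simplified LOESS-style fit written with NumPy only. It is not the handbook's LOESS implementation: it is a tricube-weighted local straight-line fit with no robustness iterations, and the function name, the `frac` parameter, and the synthetic data are assumptions.

```python
import numpy as np

def local_linear_smooth(x, y, frac=0.3):
    """Simplified LOESS-style smoother: a tricube-weighted straight-line
    fit around each x value (no robustness iterations)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    k = max(3, int(np.ceil(frac * n)))       # points in each local neighbourhood
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = np.sort(dist)[k - 1]             # distance to the k-th nearest point
        u = np.clip(dist / h, 0.0, 1.0) if h > 0 else np.zeros(n)
        w = (1.0 - u**3) ** 3                # tricube weights, zero outside the window
        # Weighted least squares via square-root weights on the design matrix.
        A = np.column_stack([np.ones(n), x - x[i]])
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        fitted[i] = beta[0]                  # local intercept is the fit at x[i]
    return fitted

rng = np.random.default_rng(3)
x = np.linspace(0, 10, 80)
y = np.sin(x) + rng.normal(0, 0.2, size=x.size)   # hypothetical curve, constant-sd noise

residuals = y - local_linear_smooth(x, y, frac=0.3)
half = x.size // 2
print(f"sd, lower half of x = {residuals[:half].std(ddof=1):.3f}")
print(f"sd, upper half of x = {residuals[half:].std(ddof=1):.3f}")
```

The residuals are examined exactly as they would be for a least squares fit: similar spread in different parts of the range gives no indication of a changing standard deviation.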
[Figure: Residuals from LOESS Fit to Thermocouple Calibration Data]
Correct Function Needed to Check for Constant Standard Deviation
One potential pitfall in using residual plots to check for constant standard deviation across the
data is that the functional part of the model must adequately describe the systematic variation in
the data. If that is not the case, then the typical horn shape observed in the residuals could be due
to an artifact of the function fit to the data rather than to non-constant variation. For example, in
the Polymer Relaxation example it was hypothesized that both time and temperature are related to
the response variable, torque. However, if a single stretched exponential model in time was the
initial model used for the process, the residual plots could be misinterpreted fairly easily, leading
to the false conclusion that the standard deviation is not constant across the data. When the
functional part of the model does not fit the data well, the residuals do not reflect purely random
variations in the process. Instead, they reflect the remaining structure in the data not accounted
for by the function. Because the residuals are not random, they cannot be used to answer
questions about the random part of the model. This also emphasizes the importance of plotting the
data before fitting the initial model, even if a theoretical model for the data is available. Looking
at the data before fitting the initial model, at least in this case, would likely forestall this potential
problem.
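The pitfall can be reproduced with a toy example. The sketch below does not use the Polymer Relaxation data or a stretched exponential. It assumes a plain exponential decay whose amplitude depends on a second variable (`temp_factor`, a hypothetical stand-in for temperature), with a noise standard deviation that is constant by construction.

```python
import numpy as np

rng = np.random.default_rng(4)

time = np.tile(np.linspace(0, 5, 30), 3)
temp_factor = np.repeat([0.5, 1.0, 1.5], 30)   # hypothetical second explanatory variable
noise_sd = 0.02                                # the same for every observation
torque = temp_factor * np.exp(-time) + rng.normal(0, noise_sd, size=time.size)

# Misspecified model: a single curve c * exp(-time) that ignores temp_factor.
basis = np.exp(-time)
c_hat = float(basis @ torque / (basis @ basis))
resid_wrong = torque - c_hat * basis

# Adequate model: a separate amplitude for each temp_factor level.
resid_right = np.empty_like(torque)
for level in np.unique(temp_factor):
    m = temp_factor == level
    a_hat = basis[m] @ torque[m] / (basis[m] @ basis[m])
    resid_right[m] = torque[m] - a_hat * basis[m]

early, late = time < 1, time > 4
for label, r in [("wrong function", resid_wrong), ("adequate function", resid_right)]:
    print(f"{label:18s} sd(early) = {r[early].std(ddof=1):.3f}   sd(late) = {r[late].std(ddof=1):.3f}")
```

With the misspecified function the residual spread is much larger at early times than at late times, even though the random errors have the same standard deviation throughout. Once the second variable is accounted for, the spread is roughly the same at both ends.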
[Figure: Polymer Relaxation Data Modeled as a Single Stretched Exponential]
[Figure: Residuals from Single Stretched Exponential Model]
Getting Back on Course After a Bad Start
Fortunately, even if the initial model were incorrect, and the residual plot above were made, there
are clues in this plot that indicate that the horn shape (pointing left this time) is not caused by
non-constant standard deviation. The cluster of residuals at time zero that have a residual torque
near one indicates that the functional part of the model does not fit the data. In addition, even when
the residuals occur with equal frequency above and below zero, the spacing of the residuals at
each time does not really look random. The spacing is too regular to represent random
measurement errors. At measurement times near the low end of the scale, the spacing of the
points increases as the residuals decrease, and at the upper end of the scale the spacing decreases
as the residuals decrease. The patterns in the spacing of the residuals also point to the fact that
the functional form of the model is not correct and needs to be corrected before drawing
conclusions about the distribution of the residuals.
No Drift Indicated
Taken as a whole, this plot essentially shows that there is only random scatter in the relationship
between the observed pressures and order in which the data were collected, rather than any
systematic relationship. Although there appears to be a slight trend in the residuals when plotted
in run order, the trend is small when measured against short-term random variation in the data,
indicating that it is probably not a real effect. The presence of this apparent trend does emphasize,
however, that practice and judgment are needed to correctly interpret these plots. Although
residual plots are a very useful tool, if critical judgment is not used in their interpretation, you can
see things that aren't there or miss things that are. One hint that the slight slope visible in the data
is not worrisome in this case is the fact that the residuals overlap zero across all runs. If the
process were drifting significantly, it is likely that there would be some parts of the run sequence
in which the residuals would not overlap zero. If there is still some doubt about the slight trend
visible in the data after using this graphical procedure, a term describing the drift can be added to
the model and tested numerically to see if it has a significant impact on the results.
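One way to carry out the numerical test mentioned above is to add run order as an extra regressor and look at the t statistic of its coefficient. The sketch below assumes an ordinary least squares fit, a linear drift term, and synthetic data. The helper `ols_with_t` is written for this example and is not a library function.

```python
import numpy as np

def ols_with_t(X, y):
    """Ordinary least squares; returns the coefficients and their t statistics."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]                 # residual degrees of freedom
    sigma2 = resid @ resid / dof                  # estimated error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)         # covariance of the estimates
    return beta, beta / np.sqrt(np.diag(cov))

rng = np.random.default_rng(5)
n = 40
run_order = np.arange(1, n + 1)
temperature = rng.uniform(20, 70, size=n)         # hypothetical, randomized run order
pressure = 7.7 + 3.9 * temperature + rng.normal(0, 4.0, size=n)

# Columns: intercept, temperature, drift term (run order).
X = np.column_stack([np.ones(n), temperature, run_order])
beta, t_stats = ols_with_t(X, pressure)
print(f"drift estimate = {beta[2]:.3f} units per run, t = {t_stats[2]:.2f}")
```

The t statistic is compared with a Student's t distribution on n - 3 degrees of freedom. A value that is small in magnitude supports the graphical conclusion that the apparent trend is not a real effect.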
Modification of Example
To illustrate how the residuals from the Pressure/Temperature data would look if there were drift
in the process, a modified version of the data was simulated. A small drift of 0.3
units/measurement was added to the process. A plot of the data is shown below. In this run
sequence plot a clear, strong trend is visible and there are portions of the run order where the
residuals do not overlap zero. Because the structure is so evident in this case, it is easy to
conclude that some sort of drift is present. Then, of course, its cause needs to be determined so
that appropriate steps can be taken to eliminate the drift from the process or to account for it in
the model.
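The drift simulation and the "overlap zero" check can be sketched as below. Only the drift rate of 0.3 units per measurement comes from the text. The underlying data, the noise level, the window width, and the helper `windows_not_overlapping_zero` are assumptions made for illustration.

```python
import numpy as np

def windows_not_overlapping_zero(resid, width=5):
    """Count consecutive run-order windows whose residuals all lie on one side of zero."""
    resid = np.asarray(resid, dtype=float)
    count = 0
    for start in range(0, resid.size - width + 1, width):
        block = resid[start:start + width]
        if np.all(block > 0) or np.all(block < 0):
            count += 1
    return count

rng = np.random.default_rng(6)
n = 40
run_order = np.arange(n)
temperature = rng.uniform(20, 70, size=n)
base = 7.7 + 3.9 * temperature + rng.normal(0, 4.0, size=n)   # hypothetical values
drifting = base + 0.3 * run_order              # drift of 0.3 units per measurement

for label, y in [("no drift", base), ("with drift", drifting)]:
    slope, intercept = np.polyfit(temperature, y, deg=1)
    resid = y - (intercept + slope * temperature)   # residuals kept in run order
    print(f"{label:10s} windows entirely above or below zero: "
          f"{windows_not_overlapping_zero(resid)}")
```

A window count is a crude summary and depends on the chosen width. It is meant only to mirror what the eye looks for in a run sequence plot, namely stretches of the run order where the residuals sit entirely on one side of zero.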
4.4.4.3. How can I tell if there was drift in the measurement process?
(2 of 4) [5/1/2006 10:22:14 AM]
[Figure: Run Sequence Plot for Pressure / Temperature Data with Drift]
As in the case when the standard deviation was not constant across the data set, comparison of
these two versions of the data is interesting because the drift is not apparent in either data set
when viewed in the scale of the data. This highlights the need for graphical residual analysis
when developing process models.
Applicable to Most Regression Methods

The run sequence plot, like most types of residual plots, can be used to check for drift in many
regression methods. It is not limited to least squares fitting or one particular type of model. The
run sequence plot below shows the residuals from the fit of the nonlinear model
to the data from the Polymer Relaxation example. The even spread of the residuals across the
range of the data indicates that there is no apparent drift in this process.
[Figure: Run Sequence Plot for Polymer Relaxation Data]
[Figure: Lag Plot: Thermocouple Calibration Example]
4.4.4.4. How can I assess whether the random errors are independent from one to the next?
(2 of 4) [5/1/2006 10:22:14 AM]
[Figure: Lag Plot: Polymer Relaxation Example]
Next Steps
Some of the different patterns that might be found in the residuals when the errors are not
independent are illustrated in the general discussion of the lag plot. If the residuals are not
random, then time series methods might be required to fully model the data. Some time series
basics are given in Section 4 of the chapter on Process Monitoring. Before jumping to
conclusions about the need for time series methods, however, be sure that a run order plot does
not show any trends, or other structure, in the data. If there is a trend in the run order plot,
whether caused by drift or by the use of the wrong functional form, the source of the structure
shown in the run order plot will also induce structure in the lag plot. Structure induced in the lag
plot in this way does not necessarily indicate dependence in successive random errors. The lag
plot can only be interpreted clearly after accounting for any structure in the run order plot.
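The way a run-order trend induces structure in the lag plot can be seen with simulated errors. In this sketch the errors are independent by construction. The trend size, the sample size, and the use of the lag-1 correlation as a numerical summary of the lag plot are assumptions made for illustration.

```python
import numpy as np

def lag1_correlation(resid):
    """Correlation between each residual and the one before it (the two lag plot axes)."""
    resid = np.asarray(resid, dtype=float)
    return float(np.corrcoef(resid[:-1], resid[1:])[0, 1])

rng = np.random.default_rng(7)
n = 200
run = np.arange(n)
independent = rng.normal(0, 1.0, size=n)       # independent random errors
with_trend = independent + 0.03 * run          # same errors plus a run-order trend

# Account for the run-order structure first, then look at the lag plot again.
slope, intercept = np.polyfit(run, with_trend, deg=1)
detrended = with_trend - (intercept + slope * run)

print(f"lag-1 correlation, independent errors: {lag1_correlation(independent):6.2f}")
print(f"lag-1 correlation, trend present:      {lag1_correlation(with_trend):6.2f}")
print(f"lag-1 correlation, trend removed:      {lag1_correlation(detrended):6.2f}")
```

The strong lag-1 correlation in the second line comes entirely from the trend, not from dependence between successive errors, and it largely disappears once the trend is removed.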
[Figure: Normal Probability Plot: Thermocouple Calibration Example]
4.4.4.5. How can I test whether or not the random errors are distributed normally?
(2 of 7) [5/1/2006 10:22:15 AM]
[Figure: Normal Probability Plot: Polymer Relaxation Example]
Further Discussion and Examples
If the random errors from one of these processes were not normally distributed, then significant
curvature may have been visible in the relationship between the residuals and the quantiles from
the standard normal distribution, or there would be residuals at the upper and/or lower ends of the
line that clearly did not fit the linear relationship followed by the bulk of the data. Examples of
some typical cases obtained with non-normal random errors are illustrated in the general
discussion of the normal probability plot.
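The coordinates of a normal probability plot can be computed directly, which makes the "straight line" criterion concrete. The sketch below uses Blom's plotting positions, which is an assumption (the handbook's own plotting positions may differ), and the function names and synthetic residuals are invented for this example.

```python
from statistics import NormalDist

import numpy as np

def normal_probability_points(resid):
    """Return (standard normal quantiles, sorted residuals) for a normal probability plot."""
    resid = np.sort(np.asarray(resid, dtype=float))
    n = resid.size
    # Blom's approximate plotting positions (an assumed choice).
    probs = (np.arange(1, n + 1) - 0.375) / (n + 0.25)
    quantiles = np.array([NormalDist().inv_cdf(p) for p in probs])
    return quantiles, resid

def straightness(resid):
    """Correlation between the plot's two axes; values near 1 indicate a straight line."""
    q, r = normal_probability_points(resid)
    return float(np.corrcoef(q, r)[0, 1])

rng = np.random.default_rng(8)
normal_resid = rng.normal(0, 1.0, size=100)             # hypothetical normal errors
skewed_resid = rng.exponential(1.0, size=100) - 1.0     # hypothetical skewed errors

print(f"normal errors: straightness = {straightness(normal_resid):.3f}")
print(f"skewed errors: straightness = {straightness(skewed_resid):.3f}")
```

The correlation is only a summary. Curvature, or points at the ends that fall off the line, are still best judged from the plot itself.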
Histogram
The normal probability plot helps us determine whether it is reasonable to assume that the
random errors in a statistical process are drawn from a normal distribution. An
advantage of the normal probability plot is that the human eye is very sensitive to deviations from
a straight line that might indicate that the errors come from a non-normal distribution. However,
when the normal probability plot suggests that the normality assumption may not be reasonable, it
does not give us a very good idea what the distribution does look like. A histogram of the
residuals from the fit, on the other hand, can provide a clearer picture of the shape of the
distribution. Because the histogram provides more general distributional information than
the normal probability plot does, deviations from normality are harder to discern in a histogram
than in the more specifically oriented normal probability plot.
Examples
Histograms for the three examples used to illustrate the normal probability plot are shown below.
The histograms are all more-or-less bell-shaped, confirming the conclusions from the normal
probability plots. Additional examples can be found in the gallery of graphical techniques.
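A histogram of residuals can be produced with a few lines of code. The sketch below prints a text histogram so that it runs without a plotting library. The residuals are synthetic, and the helper `text_histogram`, the number of bins, and the bar width are assumptions made for illustration.

```python
import numpy as np

def text_histogram(resid, bins=8, width=40):
    """Return (lines, counts): one text line per bin with its range, a bar, and the count."""
    counts, edges = np.histogram(np.asarray(resid, dtype=float), bins=bins)
    scale = width / max(counts.max(), 1)       # longest bar spans `width` characters
    lines = []
    for count, lo, hi in zip(counts, edges[:-1], edges[1:]):
        bar = "#" * int(round(count * scale))
        lines.append(f"{lo:8.2f} to {hi:8.2f} | {bar} {count}")
    return lines, counts

rng = np.random.default_rng(9)
residuals = rng.normal(0, 4.0, size=200)       # hypothetical residuals, not handbook data
lines, counts = text_histogram(residuals)
print("\n".join(lines))
```

The apparent shape depends on the number of bins, so it is worth trying a few values before drawing conclusions about the distribution.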
[Figure: Histogram: Temperature / Pressure Example]
[Figure: Histogram: Thermocouple Calibration Example]