Microsoft Excel 2010: Data Analysis and Business Modeling, part 8

484 Microsoft Excel 2010: Data Analysis and Business Modeling
In the Two Way ANOVA with Interaction worksheet, I changed the data from the previous
example to the data shown in Figure 57-10. After running the analysis for a two-factor
ANOVA with replication, I obtained the results shown in Figure 57-11.
FIGURE 57-10 Sales data with interaction between price and advertising.
FIGURE 57-11 Output for the two-factor ANOVA with interaction.
In this data set, the p-value for interaction is .001. When you see a low p-value (less than
.15) for interaction, you do not even check p-values for row and column factors. You simply
forecast sales for any price and advertising combination to equal the mean of the three
observations involving that price and advertising combination. For example, the best forecast
for sales during a month with high advertising and medium price is:
$$\frac{34 + 40 + 32}{3} = \frac{106}{3} = 35.333 \text{ units}$$
The standard deviation of forecast errors is again the square root of the mean square within
($\sqrt{17.11} = 4.14$).
Thus, you can be 95 percent sure that the sales forecast is accurate within 8.26 units.
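As a rough check outside Excel, the same arithmetic can be sketched in Python. The three observations and the mean square within are the values quoted above; the variable names are illustrative assumptions, and the band uses the two-standard-deviation rule of thumb.

```python
import math

# Three observations for high advertising and medium price (from the text).
observations = [34, 40, 32]
forecast = sum(observations) / len(observations)

# Mean square within, as quoted from the ANOVA output.
mean_square_within = 17.11
std_error = math.sqrt(mean_square_within)

# Rough 95 percent band: two standard deviations on either side of the forecast.
half_width = 2 * std_error
low, high = forecast - half_width, forecast + half_width
print(f"forecast={forecast:.3f}, sd={std_error:.2f}, range=({low:.1f}, {high:.1f})")
```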
Chapter 57 Randomized Blocks and Two-Way ANOVA 485
Figure 57-12 illustrates why this data exhibits a signicant interaction between price and
advertising. For a low and medium price, increased advertising increases sales, but if price
is high, increased advertising has no effect on sales. This explains why you cannot use
equation 2 to forecast sales when a signicant interaction is present. After all, how can you
talk about an advertising effect when the effect of advertising depends on the price?
FIGURE 57-12 Price and advertising exhibit a signicant interaction in this set of data.
Problems
The data for the following problems is in the le Ch57.xlsx.
1. You believe that pressure (high, medium, or low) and temperature (high, medium, or
low) inuence the yield of a production process. Given this theory, determine the an-


swers to the following problems:

Use the data in the Problem 1 worksheet to determine how temperature and/or
pressure inuence the yield of the process.

With high pressure and low temperature, you’re 95 percent sure that process
yield will be in what range?
2. You are trying to determine how the particular sales representative and the number of
sales calls (one, three, or ve) made to a doctor inuence the amount (in thousands of
dollars) that each doctor prescribes of your drug. Use the data in the Problem 2 work-
sheet to determine the answers to the following problems:

How do the representative and number of sales calls inuence sales volume?

If Rep 3 makes ve sales calls to a doctor, you’re 95 percent sure she will generate
prescriptions within what range of dollars?
3. Answer the questions in Problem 2 by using the data in the Problem 3 worksheet.
4. The le Coupondata.xlsx contains information on sales of peanut butter for weeks
when a coupon was given out (or not) and advertising was done (or not) in the Sunday
paper. Describe how the coupon and advertising inuence peanut butter sales.
Chapter 58
Using Moving Averages to
Understand Time Series
Question answered in this chapter:

I’m trying to analyze the upward trend in quarterly revenues of Amazon.com since
1996. Fourth quarter sales in the U.S. are usually larger (because of Christmas) than
sales during the first quarter of the following year. This pattern obscures the
upward trend in sales. Is there any way that I can graphically show the upward trend in
revenues?
Answers to This Chapter’s Question
Time series data simply displays the same quantity measured at different points in time. For
example, the data in the file Amazon.xlsx, a subset of which is shown in Figure 58-1, displays
the time series for quarterly revenues in millions of dollars for Amazon.com. The data covers
the time interval from the fourth quarter of 1995 through the third quarter of 2009.
To graph this time series, select the range C2:D59, which contains the quarter number (the
first quarter is Quarter 1 and the last is Quarter 57) and Amazon quarterly revenues (in
millions of dollars). Then choose Chart on the Insert tab, and choose the second option under
the Scatter chart type (Scatter with Smooth Lines and Markers). The time series plot is shown
in Figure 58-2.
FIGURE 58-1 Quarterly revenues for Amazon sales.
FIGURE 58-2 Time series plot of quarterly Amazon revenues.
There is an upward trend in revenues, but the fact that fourth quarter revenues dwarf
revenues during the first three quarters of each year makes it hard to spot the trend. Because
there are four quarters per year, it would be nice to graph average revenues during the
last four quarters. This is called a four-period moving average. Using a four-quarter
moving average smooths out the seasonal influence because each average will contain one data
point for each quarter. Such a graph is called a moving average graph because the plotted
average “moves” over time. Moving average graphs also smooth out random variation, which
helps you get a better idea of what is going on with your data.
To create a moving average graph of quarterly revenues, you can modify the chart. Select the
graph, and then click a data point until all the data points are displayed in blue. Right-click
any point, click Add Trendline, and then select the Moving Average option. Set the period
equal to 4. Microsoft Excel now creates the four-quarter moving average trend curve that’s
shown in Figure 58-3. (See the le Amazonma.xlsx.)
For each quarter, Excel plots the average of the current quarter and the last three quarters.
Of course, for a four-quarter moving average, the moving average curve starts with the
fourth data point. The moving average curve makes it clear that Amazon.com’s revenues

had a steady upward trend. In fact the slope of the four-quarter moving average appears to
be increasing. In all likelihood , the slope of this moving average graph will eventually level
off, resulting in a graph that looks like an S curve. The Excel Trend Curve feature cannot t S
curves, but the Excel 2010 Solver can be used to t S curves to data.
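For readers who want to see the calculation behind the trendline, here is a minimal Python sketch of a trailing moving average (current period plus the previous three). The function name and the sample revenues are hypothetical; they are not taken from Amazon.xlsx.

```python
def trailing_moving_average(values, period=4):
    """Average of the current value and the previous period-1 values.

    Positions without a full window get None, which mirrors a curve that
    starts at the fourth data point when period is 4.
    """
    averages = []
    for i in range(len(values)):
        if i + 1 < period:
            averages.append(None)
        else:
            window = values[i + 1 - period : i + 1]
            averages.append(sum(window) / period)
    return averages


# Hypothetical quarterly revenues with a fourth-quarter spike (not Amazon's data).
revenues = [10, 12, 11, 20, 14, 16, 15, 28]
smoothed = trailing_moving_average(revenues, period=4)
print(smoothed)
```

Because every window contains exactly one value from each quarter, the fourth-quarter spike is spread evenly across the averages, and the smoothed values rise steadily.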
FIGURE 58-3 Four-quarter moving average trend curve.
Problem

The le Ch58data.xlsx contains quarterly revenues for GM, Ford, and GE. Construct a
four-quarter moving average trend curve for each company’s revenues. Describe what
you learn from each trend curve.
Chapter 59
Winters’s Method
You often need to predict future values of a time series, such as monthly costs or monthly
product revenues. This is usually difcult because the characteristics of any
time series
are constantly changing. Smoothing or adaptive methods are usually best suited for
forecasting future values of a time series. In this chapter, I describe the most powerful
smoothing method: Winters’s method. To help you understand how Winters’s method
works, I’ll use it to forecast monthly housing starts in the United States. Housing starts are
simply the number of new homes whose construction begins during a month. I’ll begin by
describing the three key characteristics of a time series.
Time Series Characteristics
The behavior of most time series can be explained by understanding three characteristics:
base, trend, and seasonality.

The base of a series describes the series’ current level in the absence of any seasonality.
For example, suppose the base level for U.S. housing starts is 160,000. In this case, you
can believe that if the current month were an average month relative to other months
of the year, 160,000 housing starts would occur.

The trend of a time series is the percentage increase per period in the base. Thus, a
trend of 1.02 means that you estimate that housing starts are increasing by 2 percent
each month.

The seasonality (seasonal index) for a period tells you how far above or below a typical
month you can expect housing starts to be. For example, if the December seasonal
index is .8, then December housing starts are 20 percent below a typical month. If
the June seasonal index is 1.3, then June housing starts are 30 percent higher than a
typical month.
Parameter Denitions
paAfter observing month t, you will have used all data observed through the end of month t
to estimate the following quantities of interest:

L
t
=Level of series

T
t
=Trend of series

S
t
=Seasonal index for current month
The key to Winters’s method is the following three equations, which are used to update $L_t$,
$T_t$, and $S_t$. In the following formulas, alp, bet, and gam are called smoothing parameters. You
choose the values of these parameters to optimize forecasts. In the following formulas, c
equals the number of periods in a seasonal cycle (c=12 months, for example) and $x_t$ equals
the observed value of the time series at time t.

Formula 1: $L_t = \mathrm{alp}\,(x_t / S_{t-c}) + (1 - \mathrm{alp})(L_{t-1} \cdot T_{t-1})$

Formula 2: $T_t = \mathrm{bet}\,(L_t / L_{t-1}) + (1 - \mathrm{bet})\,T_{t-1}$

Formula 3: $S_t = \mathrm{gam}\,(x_t / L_t) + (1 - \mathrm{gam})\,S_{t-c}$

Formula 1 indicates that the new base estimate is a weighted average of the current
observation (deseasonalized) and the last period’s base updated by the last trend estimate.
Formula 2 indicates that the new trend estimate is a weighted average of the ratio of the
current base to the last period’s base (this is a current estimate of trend) and the last period’s
trend. Formula 3 indicates that you update your seasonal index estimate as a weighted
average of the estimate of the seasonal index based on the current period and the previous
estimate. Note that larger values of the smoothing parameters correspond to putting more
weight on the current observation.
You can dene F
t,k
as your forecast (F) after period t for the period t+k. This results in the
formula F
t,k
=L
t*(
T
t)
k
s

t+k–c
. (I refer to this as formula 4.)
This formula rst uses the current trend estimate to update the base k periods forward. Then
the resulting base estimate for period t+k is adjusted by the appropriate seasonal index.
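To make formulas 1 through 4 concrete, the following Python sketch applies one update step and one forecast. The function names and all numeric inputs are hypothetical and chosen only to exercise the equations; they do not come from the housing starts data.

```python
def winters_update(x_t, level_prev, trend_prev, seasonal_prev_cycle, alp, bet, gam):
    """Apply formulas 1-3 once and return the new (level, trend, seasonal index)."""
    # Formula 1: blend the deseasonalized observation with the trended prior level.
    level = alp * (x_t / seasonal_prev_cycle) + (1 - alp) * (level_prev * trend_prev)
    # Formula 2: blend the observed level ratio with the prior trend.
    trend = bet * (level / level_prev) + (1 - bet) * trend_prev
    # Formula 3: blend the observed seasonal ratio with the index from one cycle ago.
    seasonal = gam * (x_t / level) + (1 - gam) * seasonal_prev_cycle
    return level, trend, seasonal


def winters_forecast(level, trend, seasonal_for_target, k):
    """Formula 4: project the level k periods ahead, then reseasonalize."""
    return level * trend ** k * seasonal_for_target


# Hypothetical inputs, not values from the workbook.
level, trend, seasonal = winters_update(
    x_t=120.0, level_prev=100.0, trend_prev=1.02, seasonal_prev_cycle=1.2,
    alp=0.5, bet=0.1, gam=0.3,
)
next_forecast = winters_forecast(level, trend, seasonal_for_target=0.8, k=1)
print(level, trend, seasonal, next_forecast)
```

Here the deseasonalized observation (120/1.2 = 100) and the trended prior level (100 × 1.02 = 102) are weighted equally, so the new level lands halfway between them.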
Initializing Winters’s Method
To start Winters’s method, you must have initial estimates for the series base, trend, and
seasonal indexes. I used monthly housing starts for the years 1986 and 1987 to initialize
Winters’s method. Then I chose smoothing parameters to optimize one-month-ahead
forecasts for the years 1988 through 1996. See Figure 59-1 and the file House2.xlsx. Here
are the steps I followed.
FIGURE 59-1 Initialization of Winters’s method.
Step 1 I estimated, for example, the January seasonal index as the average of
January housing starts for 1986 and 1987 divided by the average monthly starts
for 1986 and 1987. Therefore, copying from G14 to G15:G25 the formula
=AVERAGE(B2,B14)/AVERAGE($B$2:$B$25) generates the estimates of seasonal indexes.
For example, the January estimate is 0.75 and the June estimate is 1.17.
Step 2 To estimate the average monthly trend, I took the twelfth root of the 1987 mean
starts divided by the 1986 mean starts. I computed this in cell J3 (and copied it to cell D25)
with the formula =(J1/J2)^(1/12).
Step 3 Going into January 1988, I estimated the base of the series as the deseasonalized
December 1987 value. This was computed in C25 with the formula =(B25/G25).
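The three initialization steps can be sketched in Python as follows. This is an illustration under stated assumptions: the function name is invented here, and the two years of monthly values are hypothetical stand-ins for the 1986 and 1987 housing starts.

```python
def initialize_winters(year1, year2):
    """Return (seasonal_indexes, trend, base) from two years of 12 monthly values."""
    overall_mean = (sum(year1) + sum(year2)) / 24
    # Step 1: mean of the same month in both years, divided by the overall monthly mean.
    seasonal = [((a + b) / 2) / overall_mean for a, b in zip(year1, year2)]
    # Step 2: twelfth root of the second year's mean divided by the first year's mean.
    trend = ((sum(year2) / 12) / (sum(year1) / 12)) ** (1 / 12)
    # Step 3: deseasonalize the last observed month to get the starting base.
    base = year2[-1] / seasonal[-1]
    return seasonal, trend, base


# Hypothetical data: the second year is 20 percent above the first.
year1 = [50, 100, 150] * 4
year2 = [60, 120, 180] * 4
seasonal, trend, base = initialize_winters(year1, year2)
print(seasonal[:3], trend, base)
```

Dividing by the overall monthly mean makes the twelve initial indexes average to 1, which is the property the seasonal indexes need.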
Estimating the Smoothing Constants
Now I’m ready to estimate smoothing constants. In column C, I will update the series base;
in column D, the series trend; and in column G, the seasonal indexes. In column E, I
compute the forecast for next month, and in column F, I compute the absolute percentage error
for each month. Finally, I use the Solver to choose values for the smoothing constants that
minimize the sum of the absolute percentage errors. Here’s the process.
Step 1 In G11:I11, I enter trial values (between 0 and 1) for the smoothing constants.

Step 2 In C26:C119, I compute the updated series level with formula 1 by copying from C26
to C27:C119 the formula =alp*(B26/G14)+(1-alp)*(C25*D25).
Step 3 In D26:D119, I use formula 2 to update the series trend, copying from D26 to
D27:D119 the formula =bet*(C26/C25)+(1-bet)*D25.
Step 4 In G26:G119, I use formula 3 to update the seasonal indexes, copying from G26 to
G27:G119 the formula =gam*(B26/C26)+(1-gam)*G14.
Step 5 In E26:E119, I use formula 4 to compute the forecast for the current month by
copying from E26 to E27:E119 the formula =(C25*D25)*G14.
Step 6 In F26:F119, I compute the absolute percentage error for each month by copying
from F26 to F27:F119 the formula =ABS(B26-E26)/B26.
Step 7 I compute the average absolute percentage error for the years 1988 through 1996
in F21 with the formula =AVERAGE(F26:F119).
Step 8 Now I use Solver to determine smoothing parameter values that minimize the
average absolute percentage error. The Solver Parameters dialog box is shown in Figure 59-2.
FIGURE 59-2 Solver Parameters dialog box for Winters’s model.
I used the smoothing parameters (G11:I11) as changing cells to minimize the average absolute
percentage error (cell F21). The Solver ensures that you find the best combination of smoothing constants.
Smoothing constants must be between 0 and 1. Here, alp=.50, bet=.01, and gam=.27
minimize the average absolute percentage error. You might find slightly different values
for the smoothing constants, but you should obtain a mean absolute percentage error
(MAPE) close to 7.3 percent. In this example, there are many combinations of the smoothing
constants that give forecasts having approximately the same MAPE. Our one-month-ahead
forecasts are off by an average of 7.3 percent.
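The optimization idea can also be sketched outside Excel. The Python below is an assumption-heavy illustration, not the workbook's procedure: it replaces Solver with a coarse grid search over alp, bet, and gam, uses a quarterly cycle (c = 4) to stay short, and runs on a hypothetical series. All function and variable names are invented for the sketch.

```python
from itertools import product


def one_step_mape(series, seasonal, trend, base, c, alp, bet, gam):
    """Mean absolute percentage error of one-period-ahead Winters forecasts.

    seasonal holds the c initial indexes, ordered so that seasonal[0]
    applies to the first value in series.
    """
    seasonal = list(seasonal)
    errors = []
    for t, x in enumerate(series):
        s_old = seasonal[t % c]
        forecast = base * trend * s_old                               # formula 4, k = 1
        errors.append(abs(x - forecast) / x)
        new_base = alp * (x / s_old) + (1 - alp) * (base * trend)     # formula 1
        trend = bet * (new_base / base) + (1 - bet) * trend           # formula 2
        seasonal[t % c] = gam * (x / new_base) + (1 - gam) * s_old    # formula 3
        base = new_base
    return sum(errors) / len(errors)


def grid_search(series, seasonal, trend, base, c, step=0.1):
    """Try every (alp, bet, gam) on a coarse grid in [0, 1]; keep the lowest MAPE."""
    grid = [round(i * step, 10) for i in range(int(round(1 / step)) + 1)]
    best = None
    for alp, bet, gam in product(grid, repeat=3):
        mape = one_step_mape(series, seasonal, trend, base, c, alp, bet, gam)
        if best is None or mape < best[0]:
            best = (mape, alp, bet, gam)
    return best


# Hypothetical quarterly series: 1 percent growth, fixed seasonality, two bumps.
true_seasonal = [0.8, 1.0, 1.2, 1.0]
series = [100 * 1.01 ** (t + 1) * true_seasonal[t % 4] for t in range(12)]
series[5] *= 1.05
series[9] *= 0.97
best_mape, best_alp, best_bet, best_gam = grid_search(
    series, true_seasonal, trend=1.01, base=100.0, c=4
)
print(best_mape, best_alp, best_bet, best_gam)
```

A grid search is cruder than Solver, but it makes visible the point made above: many parameter combinations can give nearly the same MAPE.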

Remarks

Instead of choosing smoothing parameters to optimize one-period forecast errors, you
could, for example, choose to optimize the average absolute percentage error incurred
in forecasting total housing starts for the next six months.

If at the end of quarter t you want to forecast sales for the next four quarters, you would
simply add $F_{t,1} + F_{t,2} + F_{t,3} + F_{t,4}$. If you want, you could choose smoothing parameters to
minimize the absolute percentage error incurred in estimating sales for the next year.
Problems
All the data for the following problems is in the file Quarterly.xlsx.
1. Use Winters’s method to forecast one-quarter-ahead revenues for Apple.
2. Use Winters’s method to forecast one-quarter-ahead revenues for Amazon.com.
3. Use Winters’s method to forecast one-quarter-ahead revenues for Home Depot.
4. Use Winters’s method to forecast total revenues for the next two quarters for
Home Depot.
Chapter 60
Ratio-to-Moving-Average Forecast
Method
Questions answered in this chapter:


What is the trend of a time series?

How do I dene seasonal indexes for a time series?

Is there an easy way to incorporate trend and seasonality into forecasting future
product sales?
Often you need a simple, accurate method to predict future quarterly revenues of a
corporation or future monthly sales of a product. The ratio-to-moving-average method
provides an accurate, easy-to-use forecasting method for these situations.
In the le Ratioma.xlsx, you are given sales of a product during 20 quarters (shown later
in Figure 60-1 in rows 5 through 24), and you want to predict sales during the next four
quarters (quarters 21-24). This time series has both trend and seasonality.
Answers to This Chapter’s Questions
What is the trend of a time series?
A trend of 10 units per quarter means, for example, that sales are increasing by 10 units
per quarter, while a trend of -5 units per quarter means that sales tend to decrease 5
units per quarter.
How do I dene seasonal indexes for a time series?
We know that Walmart sees a large increase in its sales during the fourth quarter (because
of the holiday season). If you do not recognize this, you will have trouble coming up with
good forecasts of quarterly Walmart revenues. The concept of seasonal indexes helps you
better understand a company’s sales pattern. The quarterly seasonal indexes for Walmart
revenues are as follows:

Quarter 1 (January through March): .90

Quarter 2 (April through June): .98

Quarter 3 (July through September): .96


Quarter 4 (October through December): 1.16
These indexes imply, for example, that sales during a fourth quarter are typically 16 percent
higher than sales during an average quarter. Seasonal indexes must average out to 1.
To see whether you understand seasonal indexes, try to answer the following question:
Suppose that during Quarter 4 of 2013 Walmart has sales of $200 billion, and during Quarter
1 of 2014 Walmart has sales of $180 billion. Are things getting better or worse for Walmart?
The key idea here is to deseasonalize sales and express each quarter’s sales in terms of an
average quarter. For example, the Quarter 4 2013 sales are equivalent to selling 200/1.16 =
$172.4 billion in an average quarter, and the Quarter 1 2014 sales are equivalent to selling
180/.9 = $200 billion in an average quarter. Thus, even though Walmart’s actual sales
decreased 10 percent, sales appear to be increasing by (200/172.4) – 1 = 16 percent per
quarter. This simple example shows how important it is to understand your company’s or
product’s seasonal indexes.
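The deseasonalizing arithmetic in this example is easy to reproduce. The Python sketch below uses the four Walmart indexes and the two sales figures quoted above; the helper name is an illustrative assumption.

```python
# Quarterly seasonal indexes quoted in the text.
seasonal_index = {1: 0.90, 2: 0.98, 3: 0.96, 4: 1.16}


def deseasonalize(sales, quarter):
    """Express a quarter's sales in terms of an average quarter."""
    return sales / seasonal_index[quarter]


q4_2013 = deseasonalize(200.0, 4)  # billions of dollars
q1_2014 = deseasonalize(180.0, 1)
growth = q1_2014 / q4_2013 - 1

print(f"Q4 2013: {q4_2013:.1f}, Q1 2014: {q1_2014:.1f}, growth: {growth:.0%}")
```

The raw figures fall from 200 to 180, yet the deseasonalized figures rise, which is exactly the distinction the example is making.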
Is there an easy way to incorporate trend and seasonality into forecasting future
product sales?
Now let’s turn to the simple ratio-to-moving-average forecasting method. This technique
enables you to easily estimate a time series’ trend and seasonal indexes and makes it easy
to generate forecasts of future values of the time series. The work I did for this question is
shown in Figure 60-1 and the le Ratioma.xlsx.
FIGURE 60-1 Data for the ratio-to-moving-average example.
You begin by trying to estimate the deseasonalized level of the series during each period
(using centered moving averages). Then you can t a trend line to your deseasonalized
estimates (in column G). Next you determine the seasonal index for each quarter. Finally,
you estimate the future level of the series by extrapolating the trend line and then predict
future sales by reseasonalizing the trend line estimate.

Calculating moving averages To begin, compute a four-quarter moving average
(four quarters eliminates seasonality) for each quarter by averaging the prior quarter,
the current quarter, and the next two quarters. To do this, copy from F6 to F7:F22
the formula AVERAGE(E5:E8). For example, for Quarter 2, the moving average is
.25*(24+44+61+79) = 52.

Calculating centered moving averages The moving average for Quarter 2 is
centered at Quarter 2.5, while the moving average for Quarter 3 is centered at Quarter
3.5. Averaging these two moving averages gives a centered moving average, which
estimates the level of the process at the end of Quarter 3. Copying from cell G7 to G8:G22 the
formula AVERAGE(F6:F7) gives you an estimate of the level of the series during each
quarter, without seasonality!

Fitting a trend line to the centered moving averages You use the centered moving
averages to fit a trend line that can be used to estimate the future level of the series.
In F1, I use the formula SLOPE(G7:G22,B7:B22) to find the slope of the trend line, and
in cell F2 I use the formula INTERCEPT(G7:G22,B7:B22) to find the intercept of the
trend line. You can now estimate the level of the series during Quarter t to be
6.94t + 30.17. Copying from G25 to G26:G28 the formula Intercept + Slope*B25
computes the estimated level of the series from Quarter 21 onward.

Computing the seasonal indexes Recall that a seasonal index of, say, 2 for a quarter
means sales in that quarter are twice sales during an average quarter, and a seasonal
index of .5 for a quarter means that sales during that quarter are half of an average
quarter. To determine the seasonal indexes, begin by calculating the ratio actual
sales/centered moving average for each quarter that has a centered moving average. To do
this, copy from cell H7 to H8:H22 the formula =E7/G7. You’ll see, for example, that during each first
quarter, sales were 77, 71, 90, and 89 percent of average, so you can estimate the
seasonal index for Quarter 1 as the average of these four numbers (82 percent). To
calculate the initial seasonal index estimates, copy from cell K3 to K4:K6 the formula
AVERAGEIF($D$7:$D$22,J3,$H$7:$H$22). In K3, this formula averages the four estimates you
have for Quarter 1 seasonality.

Unfortunately, the seasonal indexes do not average exactly to 1. To ensure that the final
seasonal indexes average to 1, copy from L3 to L4:L6 the formula
K3/AVERAGE($K$3:$K$6).

Forecasting sales during Quarters 21–24 To create the sales forecast for each future
quarter, you simply multiply the trend line estimate for the quarter’s level (from column
G) by the appropriate seasonal index. Copying from cell G25 to G26:G28 the formula
VLOOKUP(D25,season,3)*G25 computes the final forecast for Quarters 21–24.
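The steps above can be sketched end to end in Python. This is a minimal illustration, assuming quarterly data that starts in Quarter 1 of a year; the function name is invented, and apart from the first four values (24, 44, 61, 79, quoted above) the sales figures are hypothetical rather than the contents of Ratioma.xlsx.

```python
def ratio_to_moving_average(sales, horizon=4):
    """Forecast `horizon` future quarters from quarterly sales (Quarter 1 first)."""
    n = len(sales)

    # Four-quarter moving average for Quarter q: prior, current, and next two quarters.
    ma = {q: sum(sales[q - 2 : q + 2]) / 4 for q in range(2, n - 1)}

    # Centered moving average: mean of two adjacent moving averages.
    cma = {q: (ma[q - 1] + ma[q]) / 2 for q in range(3, n - 1)}

    # Least-squares trend line through (quarter, centered moving average).
    xs = list(cma)
    ys = [cma[q] for q in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x

    # Ratio of actual sales to centered moving average, grouped by quarter of year.
    ratios = {1: [], 2: [], 3: [], 4: []}
    for q in xs:
        ratios[(q - 1) % 4 + 1].append(sales[q - 1] / cma[q])
    raw = {k: sum(v) / len(v) for k, v in ratios.items()}

    # Rescale so the four indexes average exactly 1.
    raw_mean = sum(raw.values()) / 4
    index = {k: v / raw_mean for k, v in raw.items()}

    # Extrapolate the trend line and reseasonalize.
    forecasts = {
        q: (intercept + slope * q) * index[(q - 1) % 4 + 1]
        for q in range(n + 1, n + horizon + 1)
    }
    return ma, index, forecasts


# First four values are from the text; the rest are hypothetical.
sales = [24, 44, 61, 79, 33, 52, 75, 98, 60, 69, 101, 124,
         81, 96, 126, 164, 114, 130, 170, 210]
ma, index, forecasts = ratio_to_moving_average(sales)
print(ma[2], index, forecasts)
```

With 20 quarters of data, the centered moving averages cover Quarters 3 through 18, so each quarter of the year contributes four ratios to its index.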
If you think the trend of the series has changed recently, you can estimate the series’
trend based on more recent data. For example, you could use the centered moving
averages for Quarters 13–18 to get a more recent trend estimate with the formula
SLOPE(G17:G22,B17:B22). This yields an estimated trend of 8.09 units per quarter. If you want
to forecast Quarter 22 sales, for example, you would take the last centered moving average
you have (from Quarter 18) of 160.13 and add 4(8.09) to estimate the level of the series in
Quarter 22. Then multiplying by the Quarter 2 seasonal index of .933 yields a final forecast
for Quarter 22 sales of (160.13+4(8.09))*(.933) = 179.6 units.
Problem
1. The le Walmartdata.xlsx contains quarterly revenues of Walmart during the years
1994–2009. Use the ratio-to-moving-average method to forecast revenues for
Quarters 3 and 4 in 2009 and Quarters 1 and 2 of 2010. Use Quarters 53–60 to create
a trend estimate that you use in your forecasts.
Chapter 61
Forecasting in the Presence of
Special Events
Questions answered in this chapter:

How can I determine whether specic factors inuence customer trafc?


How can I evaluate forecast accuracy?

How can I check whether my forecast errors are random?
For a student project, a class and I attempted to forecast the number of customers visiting
the Eastland Plaza Branch of the Indiana University (IU) Credit Union each day. Interviews
with the branch manager made it clear that the following factors affected the number of
customers:

Month of the year

Day of the week

Whether the day was a faculty or staff payday

Whether the day before or the day after was a holiday
Answers to This Chapter’s Questions
How can I determine whether specic factors inuence customer trafc?
The data collected is contained in the Original worksheet in the le Creditunion.xlsx,
shown in Figure 61-1. If you try to run a regression on this data by using dummy variables
(as described in Chapter 54, “Incorporating Qualitative Factors into Multiple Regression”),
the dependent variable would be the number of customers arriving each day (the data in
column E). You would need 19 independent variables:

Eleven to account for the month (12 months minus 1)

Four to account for the day of the week (5 business days minus 1)

Two to account for the types of paydays that occur each month

Two to account for whether a particular day follows or precedes a holiday

Microsoft Excel 2010 allows only 15 independent variables, so it appears that you’re
in trouble.
FIGURE 61-1 Data used to predict credit union customer trafc.
When a regression forecasting model requires more than 15 independent variables, you
can use the Excel Solver to estimate the coefcients of the independent variables. You can
also use Excel to compute the R-squared values between forecasts and actual customer
trafc and the standard deviation for the forecast errors. To analyze this data, I created a
forecasting equation by using a lookup table to locate the day of the week, the month, and
other factors. Then I used Solver to choose the coefcients for each level of each factor that
yields the minimum sum of squared errors. (Each day’s error equals actual customers minus
forecasted customers.) Here are the particulars.
I began by creating indicator variables (in columns G through J) for whether the day is a
staff payday (SP), faculty payday (FAC), before a holiday (BH), or after a holiday (AH). (See
Figure 61-1.) For example, in cells G4, H4, and J4, I entered 1 to indicate that January 2 was a
staff payday, faculty payday, and after a holiday. Cell I4 contains 0 to indicate that January 2
was not before a holiday.
The forecast is dened by a constant (which helps to center the forecasts so that they will be
more accurate) and effects for each day of the week, each month, a staff payday, a faculty
payday, a day occurring before a holiday, and a day occurring after a holiday. I inserted trial
values for all these parameters (the Solver changing cells) in the cell range O4:O26, shown in
Figure 61-2. Solver will then choose values that make the model best t the data. For each
day, the forecast of customer count will be generated by the following equation:
Predicted customer count = Constant + (Month effect) + (Day of week effect)
+ (Staff payday effect, if any) + (Faculty payday effect, if any)
+ (Before holiday effect, if any) + (After holiday effect, if any)
Using this model, you can compute a forecast for each day’s customer count by copying from
K4 to K5:K257 the following formula:
$O$26+VLOOKUP(B4,$N$14:$O$25,2)+VLOOKUP(D4,$N$4:$O$8,2) +G4*$O$9+H4*$O$10+I4*$O$11+J4*$O$12

Cell O26 picks up the constant term. VLOOKUP(B4,$N$14:$O$25,2) picks up the month
coefficient for the current month, and VLOOKUP(D4,$N$4:$O$8,2) picks up the day of the
week coefficient for the current day. G4*$O$9+H4*$O$10+I4*$O$11+J4*$O$12 picks up
the effects (if any) when the current day is coded as SP, FAC, BH, or AH.
By copying from L4 to L5:L257 the formula (E4-K4)^2, I compute the squared error for each
day. Then, in cell L2, I compute the sum of squared errors with the formula SUM(L4:L257).
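The structure of this forecast, a constant plus table lookups plus indicator effects, can be sketched in Python. Everything below is hypothetical: the parameter values, the dictionary layout, and the single sample day are assumptions made for illustration and are not the fitted values from Creditunion.xlsx.

```python
def predict_customers(day, params):
    """Constant plus month, weekday, and special-day effects for one day."""
    forecast = params["constant"]
    forecast += params["month"][day["month"]]      # plays the role of the month lookup
    forecast += params["weekday"][day["weekday"]]  # plays the role of the weekday lookup
    for flag in ("SP", "FAC", "BH", "AH"):
        forecast += day[flag] * params[flag]       # 0/1 indicator times its effect
    return forecast


def sum_squared_errors(days, params):
    """Objective to minimize: sum over days of (actual - forecast) squared."""
    return sum((d["actual"] - predict_customers(d, params)) ** 2 for d in days)


# Hypothetical trial parameters and one hypothetical day.
params = {
    "constant": 500.0,
    "month": {1: -20.0},
    "weekday": {5: 100.0},
    "SP": 400.0, "FAC": 50.0, "BH": 30.0, "AH": 80.0,
}
days = [{"month": 1, "weekday": 5, "SP": 1, "FAC": 0, "BH": 0, "AH": 1, "actual": 1100.0}]

prediction = predict_customers(days[0], params)
sse = sum_squared_errors(days, params)
print(prediction, sse)
```

An optimizer would then adjust the entries of `params` to make `sse` as small as possible, which is the job Solver performs on the changing cells.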
FIGURE 61-2 Changing cells and customer forecasts.
In cell R4, I average the day of the week changing cells with the formula AVERAGE(O4:O8),
and in cell R5, I average the month changing cells with the formula AVERAGE(O14:O25). Later,
I’ll constrain the average month and day of the week effects to equal 0, which ensures that a
month or day of the week with a positive effect has a higher than average customer count,
and a month or day of the week with a negative effect has a lower than average customer
count.
You can use the Solver settings shown in Figure 61-3 to choose the forecast parameters to
minimize the sum of squared errors.
FIGURE 61-3 Solver Parameters dialog box for determining forecast parameters.
The Solver model changes the coefcients for the month, day of the week, BH, AH, SP, FAC,
and the constant to minimize the sum of square errors. I also constrained the average day of
the week and month effect to equal 0. Using the Solver, the results shown in Figure 61-2 are
obtained. For example, Friday is the busiest day of the week and June is the busiest month.
A staff payday raises the forecast (all else being equal—in the Latin, ceteris paribus) by 397
customers.
How can I evaluate forecast accuracy?
To evaluate the accuracy of the forecast, you compute the R-squared value between
the forecasts and the actual customer count in cell J1. The formula you use is
RSQ(E4:E257,K4:K257). This formula computes the percentage of the actual variation in
customer count that is explained by the forecasting model. Here, the independent variables
explain 77 percent of the daily variation in customer count.

You compute the error for each day in column M by copying from M4 to M5:M257 the
formula E4-K4. A close approximation to the standard error of the forecast is given by the
standard deviation of the errors. This value is computed in cell M1 by using the formula
STDEV.S(M4:M257). Thus, approximately 68 percent of the forecasts should be accurate
within 163 customers, 95 percent accurate within 326 customers, and so on.
Let’s try and spot any outliers. Recall that an observation is an outlier if the absolute value
of a forecast error exceeds two times the standard error of the regression. Select the range
M4:M257, and then click Conditional Formatting on the Home tab. Next, select New Rule,
and in the New Formatting Rule dialog box, choose Use A Formula To Determine Which Cells
To Format. Fill in the rule description in the dialog box as shown in Figure 61-4. (For more
information about conditional formatting, see Chapter 24, “Conditional Formatting.”)
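The accuracy measures and the outlier rule can be sketched in Python. This is an illustration under assumptions: R-squared is computed as the squared correlation between actual and forecast values, the spread of the errors uses the sample standard deviation, and the data are hypothetical.

```python
import statistics


def r_squared(actual, forecast):
    """Squared correlation between actual and forecast values."""
    mean_a = statistics.mean(actual)
    mean_f = statistics.mean(forecast)
    cov = sum((a - mean_a) * (f - mean_f) for a, f in zip(actual, forecast))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    return cov ** 2 / (var_a * var_f)


def flag_outliers(actual, forecast):
    """Return the error standard deviation and a flag per day for |error| > 2 sd."""
    errors = [a - f for a, f in zip(actual, forecast)]
    sd = statistics.stdev(errors)  # sample standard deviation
    return sd, [abs(e) > 2 * sd for e in errors]


# Hypothetical daily counts; the last day is badly underforecast.
actual = [10, 12, 14, 16, 18, 50]
forecast = [11, 11, 15, 15, 19, 20]

rsq = r_squared(actual, forecast)
sd, flags = flag_outliers(actual, forecast)
print(rsq, sd, flags)
```

Only the last day is flagged here, because its error of 30 is more than twice the standard deviation of the errors.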
FIGURE 61-4 Using conditional formatting to spot forecast outliers.
After choosing a format with a red font, the conditional formatting settings will display in
red any error that exceeds 2*(standard deviation of errors) in absolute error. Looking at the
outliers, you can see that the model often underforecasts the customer count for the first
three days of the month. Also, during the second week in March (spring break), the model
overforecasts, and the day before spring break, it greatly underforecasts.
To remedy this problem, in the 1st Three Days worksheet, I added changing cells for each
of the rst three days of the month and for spring break and the day before spring break. I
added trial values for these new effects in cells O26:O30. By copying from K4 to K5:K257 the
following formula:
$O$25+VLOOKUP(B4,$N$13:$O$24,2)+VLOOKUP(D4,$N$4:$O$8,2)+G4*$O$9+H4*$O$10+I4*$O$11+J4*$O$12
+IF(C4=1,$O$26,IF(C4=2,$O$27,IF(C4=3,$O$28,0)))
I include the effects of the rst three days of the month. (The term IF(C4=1,$O$26,
IF(C4=2,$O$27,IF(C4=3,$O$28,0))) picks up the effect of the rst three days of the month.)
I manually entered the spring break coefcients in cells K52:K57. For example, in cell K52 I
added +O29 to the formula, and in cells K53:K57, I added +O30.
After including the new changing cells in the Solver dialog box, I get the results shown in
Figure 61-5. Notice that the rst three days of the month greatly increase customer count

(probably because of government support and Social Security checks), and that spring
break reduces customer count. Figure 61-5 also shows the improvement in the forecasting
accuracy. The R squared value (RSQ) has improved to 87 percent and the standard error is
reduced to 122 customers.
FIGURE 61-5 Forecast parameters and forecasts including spring break and the first three days of the month.
By looking at the forecast errors for the week 12/24 through 12/31 (see Figure 61-6), you can
see that the model has greatly overforecasted the customer counts for the days in this week.
It also underforecasted customer counts for the week before Christmas. Further examination
of the forecast errors (often called residuals) also shows the following:

Thanksgiving is different from a normal holiday in that the credit union is far less busy
than expected the day after Thanksgiving.

The day before Good Friday is really busy because people leave town for Easter.

Tax day (April 16) is also busier than expected.

The week before Indiana University starts fall classes (last week in August) was not busy,
probably because many staff and faculty take a “summer fling vacation” before the
hectic onrush of the fall semester.
FIGURE 61-6 Errors for Christmas week.
In the Christmas week worksheet, I added changing cells to incorporate the effects of these
factors. After adding the new parameters as changing cells, I ran Solver again. The results
are shown in Figure 61-7. The RSQ is up to 92 percent, and the standard error is down to
98.61 customers! Note that the post-Christmas week effect reduced daily customer count by
359, the day before Thanksgiving added 607 customers, the day after Thanksgiving reduced
customer count by 161, and so on.
FIGURE 61-7 Final forecast parameters.

Notice also how the forecasting model is improved by using outliers. If your outliers have
something in common (like being the first three days of the month), include the common
factor as an independent variable and your forecasting error will drop.
How can I check whether my forecast errors are random?
A good forecasting method should create forecast errors or residuals that are random. By
random errors, I mean that your errors exhibit no discernible pattern. If forecast errors are
random, the sign of your errors should change (from plus to minus or minus to plus) approximately
half the time. Therefore, a commonly used test to evaluate the randomness of forecast
errors is to look at the number of sign changes in the errors. If you have n observations,
nonrandomness of the errors is indicated if you find either fewer than
$\frac{n-1}{2} - \sqrt{n}$ or more than $\frac{n-1}{2} + \sqrt{n}$
changes in sign. In the Christmas week worksheet, as shown in Figure 61-7, I determined
the number of sign changes in the residuals by copying from cell P5 to P6:P257 the formula
IF(M5*M4<0,1,0). A sign change in the residuals occurs if and only if the product of two
consecutive residuals is negative. Therefore, this formula yields 1 whenever a change in the
sign of the residuals occurs. There were 125 changes in sign. In cell P1, I computed
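$\frac{254-1}{2} - \sqrt{254} = 110.6$
changes in sign as the cutoff for nonrandom residuals. Because the 125 observed sign changes lie above
this cutoff (and below the upper cutoff of 142.4), we have random residuals.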
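The sign-change test can also be sketched outside Excel. The Python below is a minimal illustration; the function names are made up for this example, and the residuals would come from your own forecast errors.

```python
import math

def count_sign_changes(residuals):
    """Count consecutive pairs whose product is negative, as IF(M5*M4<0,1,0) does."""
    return sum(1 for prev, cur in zip(residuals, residuals[1:]) if prev * cur < 0)

def sign_change_cutoffs(n):
    """Return the lower and upper cutoffs (n - 1)/2 -/+ sqrt(n) for n observations."""
    center = (n - 1) / 2
    return center - math.sqrt(n), center + math.sqrt(n)

def looks_random(residuals):
    """True when the number of sign changes falls between the two cutoffs."""
    lower, upper = sign_change_cutoffs(len(residuals))
    return lower <= count_sign_changes(residuals) <= upper

# The chapter's case: 254 observations and 125 sign changes
lower, upper = sign_change_cutoffs(254)
print(round(lower, 1), round(upper, 1), lower <= 125 <= upper)
```

Note that a residual of exactly zero never counts as a sign change here, which matches the behavior of the worksheet formula.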
A similar analysis was performed to predict daily customer counts for dinner at a major
restaurant chain. The special factors corresponded to holidays. The study found that Super
Sunday (the day of the NFL’s Super Bowl) was the least busy day and Valentine’s Day and
Mother’s Day were the busiest. Also, Saturday was the busiest day of the week for dinner and
Friday was the busiest day of the week for lunch.
Problems
1. How can you use the techniques outlined in this chapter to predict the daily sales of
pens at Staples?
2. If you had several years of data, how would you incorporate a trend in the analysis?
Chapter 62
An Introduction to Random Variables
Questions answered in this chapter:

What is a random variable?

What is a discrete random variable?

What are the mean, variance, and standard deviation of a random variable?

What is a continuous random variable?

What is a probability density function?

What are independent random variables?
In today’s world, the only thing that’s certain is that we face a great deal of uncertainty.
In the next nine chapters, I’ll give you some powerful techniques that you can use to
incorporate uncertainty in business models. The key building block in modeling uncertainty
is understanding how to use random variables.
Answers to This Chapter’s Questions
What is a random variable?
Any situation whose outcome is uncertain is called an experiment. The value of a random
variable is based on the (uncertain) outcome of an experiment. For example, tossing a pair
of dice is an experiment, and a random variable might be defined as the sum of the values
shown on each die. In this case, the random variable could assume any of the values 2, 3, and
so on up to 12. As another example, consider the experiment of selling a new video game
console, for which a random variable might be defined as the market share for this new
product.
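To make the dice example concrete, the following Python sketch enumerates the possible sums of two dice and their probabilities. It assumes two fair six-sided dice, so that all 36 outcomes are equally likely; that assumption is not stated in the text, and the variable names are purely illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 outcomes of tossing two dice (assumed fair and six-sided)
outcomes = list(product(range(1, 7), repeat=2))

# The random variable is the sum of the two values shown
counts = Counter(first + second for first, second in outcomes)

# Probability of each possible sum, kept as exact fractions
distribution = {total: Fraction(count, len(outcomes))
                for total, count in sorted(counts.items())}

for total, probability in distribution.items():
    print(total, probability)
```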
What is a discrete random variable?
A random variable is discrete if it can assume a finite number of possible values. Here are
some examples of discrete random variables:

Number of potential competitors for your product

Number of aces drawn in a five-card poker hand

Number of car accidents you have (hopefully zero!) in a year

Number of dots showing on a die

Number of free throws out of 12 that Phoenix Suns star Steve Nash makes during a
basketball game
What are the mean, variance, and standard deviation of a random variable?
In Chapter 42, “Summarizing Data by Using Descriptive Statistics,” I discussed the mean,
variance, and standard deviation for a data set. In essence, the mean of a random variable
(often denoted by µ) is the average value of the random variable you would expect if you
performed an experiment many times. The mean of a random variable is often referred to
as the random variable’s expected value. The variance of a random variable (often denoted
by σ²) is the average value of the squared deviation from the mean of a random variable that
you would expect if you performed an experiment many times. The standard deviation of a
random variable (often denoted by σ) is simply the square root of its variance. As with data
sets, the mean of a random variable is a summary measure for a typical value of the random
variable, whereas the variance and standard deviation measure the spread of the random
variable about its mean.
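For a discrete random variable, these verbal definitions are commonly written as follows. The symbols $x_i$ (the possible values) and $p_i$ (their probabilities) are notation assumed here for illustration and do not appear in the text above.

```latex
\mu = \sum_i p_i x_i, \qquad
\sigma^2 = \sum_i p_i (x_i - \mu)^2, \qquad
\sigma = \sqrt{\sigma^2}
```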
As an example of how to compute the mean, variance, and standard deviation of a random
variable, suppose you believe that the return on the stock market during the next year is
governed by the following probabilities:
Probability    Market return
.40            +20 percent
.30            0 percent
.30            -20 percent
Hand calculations show the following:
$\mu = .40(.20) + .30(.00) + .30(-.20) = .02$, or 2 percent
$\sigma^2 = .40(.20 - .02)^2 + .30(.00 - .02)^2 + .30(-.20 - .02)^2 = .0276$
Then σ=.166 or 16.6 percent.
In the file Meanvariance.xlsx (shown in Figure 62-1), I verified these computations.

FIGURE 62-1 Computing the mean, standard deviation, and variance of a random variable.
I computed the mean of the market return in cell C9 with the formula
SUMPRODUCT(B4:B6,C4:C6). This formula multiplies each value of the random variable by its
probability and sums up the products.
To compute the variance of the market return, I determined the squared deviation of each
value of the random variable from its mean by copying from D4 to D5:D6 the formula
(B4-$C$9)^2. Then, in cell C10, I computed the variance of the market return as the
average squared deviation from the mean with the formula SUMPRODUCT(C4:C6,D4:D6).
Finally, I computed the standard deviation of the market return in cell C11 with the formula
SQRT(C10).
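The same three steps can be mirrored in a few lines of Python. This is a sketch rather than a transcription of the workbook: the list names and the small `sumproduct` helper are illustrative stand-ins for the worksheet ranges and Excel's SUMPRODUCT function.

```python
import math

# Probabilities and market returns from the example above
probabilities = [0.40, 0.30, 0.30]
returns = [0.20, 0.00, -0.20]

def sumproduct(xs, ys):
    """Multiply two equal-length lists element by element and sum the products."""
    return sum(x * y for x, y in zip(xs, ys))

# Mean: each return weighted by its probability
mean = sumproduct(probabilities, returns)

# Variance: squared deviations from the mean, weighted by probability
squared_deviations = [(r - mean) ** 2 for r in returns]
variance = sumproduct(probabilities, squared_deviations)

# Standard deviation: square root of the variance
std_dev = math.sqrt(variance)

print(round(mean, 4), round(variance, 4), round(std_dev, 3))  # 0.02 0.0276 0.166
```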
What is a continuous random variable?
A continuous random variable is a random variable that can assume a very large number
or, to all intents and purposes, an infinite number of values. Here are some examples of
continuous random variables:

Price of Microsoft stock one year from now

Market share for a new product

Market size for a new product

Cost of developing a new product

Newborn baby’s weight

Person’s IQ

Dirk Nowitzki’s three-point shooting percentage during next season
What is a probability density function?
A discrete random variable can be specified by a list of values and the probability of occurrence
for each value of the random variable. Because a continuous random variable can assume
an infinite number of values, you can’t list the probability of occurrence for each value
of a continuous random variable. A continuous random variable is completely described by
its probability density function. For example, the probability density function for a randomly
chosen person’s IQ is shown in Figure 62-2.
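As a rough illustration of how a density function is used, the Python sketch below treats IQ as normally distributed with mean 100 and standard deviation 15. Those parameters are an assumption made for this example (a common convention, not something stated above), and the function names are hypothetical. With a density, the probability that a value falls in a range is the area under the curve over that range.

```python
import math

def normal_pdf(x, mean, sd):
    """Height of the normal density curve at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def normal_cdf(x, mean, sd):
    """Area under the normal density to the left of x."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

def probability_between(low, high, mean, sd):
    """Area under the density between low and high."""
    return normal_cdf(high, mean, sd) - normal_cdf(low, mean, sd)

# Assumed parameters for illustration only: mean 100, standard deviation 15
print(probability_between(85, 115, mean=100, sd=15))
```

The height returned by `normal_pdf` is not itself a probability; only areas under the curve are.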
