Handbook of Economic Forecasting part 54 ppt


504 H. White

Table 8
Artificial data: Modified nonlinear least squares – Logistic
Summary goodness of fit

Hidden   Estimation   CV        Hold-out   Estimation   CV          Hold-out
units    MSE          MSE       MSE        R-squared    R-squared   R-squared
 0       1.30098      1.58077   0.99298    0.23196      0.06679     0.06664
 1       1.30013      1.49201   0.99851    0.23247      0.11919     0.06144
 2       1.30000      1.50625   1.00046    0.23255      0.11079     0.05961
 3       0.91397      1.10375   0.84768    0.46044      0.34840     0.20321
 4       0.86988      1.05591   0.80838    0.48647      0.37665     0.24016
 5       0.85581      1.03175   0.80328    0.49478      0.39091     0.24495
 6       0.85010      1.01461   0.80021    0.49815      0.40102     0.24783
 7       0.84517      1.00845   0.79558    0.50105      0.40466     0.25219
 8       0.83541      1.00419   0.75910    0.50681      0.40718     0.28648
 9       0.80738      1.07768   0.75882    0.52336      0.36379     0.28674
10       0.79669      1.03882   0.73159    0.52967      0.38673     0.31233
11       0.79664      1.04495   0.73181    0.52971      0.38312     0.31213
12       0.79629      1.05454   0.72912    0.52991      0.37745     0.31466
13       0.79465      1.06053   0.72675    0.53088      0.37392     0.31688
14       0.78551      1.04599   0.71959    0.53628      0.38250     0.32361
15       0.78360      1.07676   0.72182    0.53740      0.36433     0.32152
16       0.76828      1.09929   0.70041    0.54645      0.35103     0.34165
17       0.76311      1.08872   0.70466    0.54950      0.35727     0.33765
18       0.76169      1.11237   0.70764    0.55034      0.34332     0.33484
19       0.76160      1.13083   0.70768    0.55039      0.33242     0.33481
20       0.76135      1.13034   0.70736    0.55054      0.33271     0.33511
...
41       0.68366      1.14326   0.65124    0.59640      0.32508     0.38786

results as good as those seen in Table 9. Nevertheless, we observe quite good performance. The best CV MSE performance occurs with 50 hidden units, corresponding to a respectable hold-out R² of 0.471. Moreover, CV MSE appears to be trending downward, suggesting that additional terms could further improve performance.
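For concreteness, the goodness-of-fit measures reported in these tables can be computed as follows. This is a minimal sketch on synthetic data (the variable names and the simple linear fit are illustrative assumptions, not the chapter's actual models). Note that R-squared here is one minus the ratio of the model's MSE to the variance of the target, which is why a badly cross-validating model can show negative CV R-squared, as in Table 11.

```python
import numpy as np

def mse(y, yhat):
    """Mean squared prediction error."""
    return float(np.mean((y - yhat) ** 2))

def r_squared(y, yhat):
    """1 - MSE / Var(y): goes negative when the model predicts worse
    than the sample mean (cf. the CV columns of Table 11)."""
    return 1.0 - mse(y, yhat) / float(np.var(y))

# Illustrative synthetic data and a plain linear fit (assumptions).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.0, -0.5, 0.25]) + rng.normal(scale=0.5, size=300)

n_est = 200                                    # estimation sample size
X_est, y_est = X[:n_est], y[:n_est]            # estimation sample
X_hold, y_hold = X[n_est:], y[n_est:]          # hold-out sample

beta, *_ = np.linalg.lstsq(X_est, y_est, rcond=None)
est_r2 = r_squared(y_est, X_est @ beta)        # estimation R-squared
hold_r2 = r_squared(y_hold, X_hold @ beta)     # hold-out R-squared
```

The same two functions, applied to cross-validated predictions, give the CV MSE and CV R-squared columns.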
Table 11 shows analogous results for the polynomial version of QuickNet. Again we see that additional polynomial terms do not improve in-sample fit as rapidly as do the ANN terms. We also again see the extremely erratic behavior of CV MSE, arising from precisely the same source as before, rendering CV MSE useless for polynomial model selection purposes. Interestingly, however, the hold-out R² of the better-performing models isn't bad, with a maximum value of 0.390. The challenge is that this model could never be identified using CV MSE.
We summarize these experiments with the following remarks. Compared to the familiar benchmark of algebraic polynomials, the use of ANNs appears to offer the ability to more quickly capture nonlinearities; and the alarmingly erratic behavior of
Ch. 9: Approximate Nonlinear Forecasting Methods 505
Table 9
Artificial data: QuickNet – Logistic
Summary goodness of fit

Hidden   Estimation   CV        Hold-out   Estimation   CV          Hold-out
units    MSE          MSE       MSE        R-squared    R-squared   R-squared
 0       1.30098      1.58077   0.99298    0.23196      0.06679     0.06664
 1       1.21467      1.44012   0.93839    0.28292      0.14983     0.11795
 2       1.00622      1.16190   0.86194    0.40598      0.31407     0.18982
 3       0.87534      1.02132   0.81237    0.48324      0.39706     0.23641
 4       0.82996      0.94456   0.71615    0.51004      0.44238     0.32685
 5       0.79297      0.91595   0.67986    0.53187      0.45927     0.36096
 6       0.76903      0.89458   0.67679    0.54600      0.47188     0.36384
 7       0.72552      0.84374   0.62678    0.57169      0.50190     0.41085
 8       0.68977      0.81835   0.58523    0.59280      0.51689     0.44991
 9       0.66635      0.80670   0.55821    0.60662      0.52376     0.47530
10       0.63501      0.79596   0.55889    0.62512      0.53010     0.47466
...
29       0.49063      0.62450   0.49194    0.71036      0.63133     0.53759
30       0.47994      0.61135   0.49207    0.71667      0.63909     0.53747
31       0.47663      0.61293   0.48731    0.71862      0.63816     0.54195
32       0.47217      0.60931   0.48532    0.72125      0.64029     0.54382
33       0.46507      0.59559   0.48624    0.72545      0.64840     0.54295
34       0.46105      0.59797   0.48943    0.72782      0.64699     0.53995
35       0.45784      0.60633   0.48603    0.72971      0.64206     0.54315
36       0.45480      0.60412   0.48765    0.73151      0.64336     0.54163
37       0.45401      0.60424   0.48977    0.73198      0.64329     0.53964
...
49       0.43136      0.64107   0.47770    0.74535      0.62154     0.55098

CV MSE for polynomials definitely serves as a cautionary note. In our controlled environment, QuickNet, with either the logistic cdf or the ridgelet activation function, performs well in rapidly extracting a reliable nonlinear predictive relationship. Naïve NLS is better than a simple linear forecast, as is modified NLS. The lackluster performance of the latter method does little to recommend it, however. Nor do the computational complexity, modest performance, and somewhat erratic behavior of naïve NLS support its routine use. The relatively good performance of QuickNet seen here suggests it is well worth application, further study, and refinement.
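To make the comparison concrete, the greedy idea behind QuickNet can be sketched as follows. This is a stylized toy version under stated assumptions, not White's exact algorithm: the random candidate-generation scheme, the number of candidates, and the fixed stopping rule are all illustrative, and a real application would choose the number of hidden units by cross-validation, as in the tables above.

```python
import numpy as np

def logistic(z):
    """Logistic cdf activation function."""
    return 1.0 / (1.0 + np.exp(-z))

def quicknet_sketch(X, y, max_units=10, n_candidates=50, seed=0):
    """Stylized QuickNet-type greedy fit: at each step, draw random
    candidate hidden-unit directions, keep the one whose activation
    correlates most with the current residuals, then re-estimate all
    output weights by OLS.  The model thus stays linear in the
    parameters, avoiding full nonlinear optimization."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    Z = np.ones((n, 1))                        # start with intercept only
    for _ in range(max_units):
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        # Random candidate directions: a hypothetical scheme standing
        # in for the chapter's candidate-generation procedure.
        candidates = rng.normal(size=(n_candidates, k + 1))
        best_h, best_corr = None, -1.0
        for g in candidates:
            h = logistic(g[0] + X @ g[1:])
            corr = abs(np.corrcoef(h, resid)[0, 1])
            if corr > best_corr:
                best_corr, best_h = corr, h
        Z = np.column_stack([Z, best_h])       # add the winning unit
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return Z, beta
```

Because each step is a regressor search plus an OLS refit, in-sample MSE can only decrease as units are added; guarding against overfit is left to the model-selection step.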
7.2. Explaining forecast outcomes

In this section we illustrate application of the explanatory taxonomy provided in Section 6.2. For conciseness, we restrict attention to examining the out-of-sample predictions made with the CV MSE-best nonlinear forecasting model corresponding to
Table 10
Artificial data: QuickNet – Ridgelet
Summary goodness of fit

Hidden   Estimation   CV        Hold-out   Estimation   CV          Hold-out
units    MSE          MSE       MSE        R-squared    R-squared   R-squared
 0       1.30098      1.58077   0.99298    0.23196      0.06679     0.06664
 1       1.22724      1.43273   0.87504    0.27550      0.15419     0.17750
 2       1.17665      1.39998   0.83579    0.30537      0.17352     0.21439
 3       1.09149      1.30517   0.75993    0.35564      0.22949     0.28570
 4       0.98380      1.22154   0.75393    0.41922      0.27887     0.29134
 5       0.88845      1.13625   0.73192    0.47550      0.32922     0.31203
 6       0.85571      1.03044   0.71145    0.49483      0.39168     0.33126
 7       0.83444      1.02006   0.69144    0.50739      0.39781     0.35008
 8       0.81150      0.98440   0.64753    0.52093      0.41886     0.39135
 9       0.78824      0.99417   0.67279    0.53467      0.41309     0.36761
10       0.77323      0.96053   0.70196    0.54352      0.43295     0.34018
...
27       0.56099      0.82982   0.55838    0.66882      0.51012     0.47515
28       0.55073      0.80588   0.53706    0.67488      0.52425     0.49518
29       0.54414      0.82178   0.51536    0.67877      0.51487     0.51559
30       0.54103      0.81704   0.53229    0.68060      0.51766     0.49967
31       0.53545      0.80240   0.53970    0.68390      0.52630     0.49271
32       0.53222      0.80171   0.55080    0.68581      0.52671     0.48227
...
47       0.47173      0.75552   0.56503    0.72152      0.55398     0.46890
48       0.46773      0.74575   0.55972    0.72388      0.55975     0.47389
49       0.46531      0.73767   0.55892    0.72530      0.56452     0.47464
50       0.46239      0.73640   0.56272    0.72703      0.56527     0.47107


Table 9. This is an ANN with logistic cdf activation and 33 hidden units, achieving a hold-out R² of 0.5493.
The first step in applying the taxonomy is to check whether the forecast function $\hat{f}$ is monotone or not. A simple way to check this is to examine the first partial derivatives of $\hat{f}$ with respect to the predictors, $x$, which we write $D\hat{f} = (D_1\hat{f}, \ldots, D_9\hat{f})$, with $D_j\hat{f} \equiv \partial\hat{f}/\partial x_j$. If any of these derivatives changes sign over the estimation or hold-out samples, then $\hat{f}$ is not monotone. Note that this is a necessary and not a sufficient condition for monotonicity: if $\hat{f}$ is nonmonotone only over regions not covered by the data, then this simple check will not signal nonmonotonicity. In such cases, further exploration of the forecast function may be required. In Table 12 we display summary statistics, including the minimum and maximum values of the elements of $D\hat{f}$ over the hold-out sample. The nonmonotonicity is obvious from the differing signs of the maxima and minima. We are thus in Case II of the taxonomy.
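This derivative-based check is easy to automate when analytic derivatives of the fitted function are inconvenient. The sketch below uses central finite differences to approximate the partials over a sample and flags predictors whose partial derivatives change sign; the example function is a made-up stand-in for the fitted ANN.

```python
import numpy as np

def numerical_gradient(f, X, eps=1e-5):
    """Central finite-difference partials of f at each row of X;
    returns an (n, k) array whose column j holds D_j f."""
    n, k = X.shape
    D = np.empty((n, k))
    for j in range(k):
        step = np.zeros(k)
        step[j] = eps
        D[:, j] = (f(X + step) - f(X - step)) / (2 * eps)
    return D

def monotonicity_flags(f, X):
    """True for each predictor whose partial derivative changes sign
    over the sample -- evidence that f is not monotone in it."""
    D = numerical_gradient(f, X)
    return (D.min(axis=0) < 0) & (D.max(axis=0) > 0)

# Illustrative stand-in: nonmonotone in x1 (quadratic), monotone in x2.
f = lambda X: X[:, 0] ** 2 + 3.0 * X[:, 1]
X = np.random.default_rng(0).normal(size=(500, 2))
flags = monotonicity_flags(f, X)
```

As noted in the text, a clean result (no sign change) does not establish monotonicity outside the region covered by the data.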
Table 11
Artificial data: QuickNet – Polynomial
Summary goodness of fit

Hidden   Estimation   CV         Hold-out   Estimation   CV          Hold-out
units    MSE          MSE        MSE        R-squared    R-squared   R-squared
 0       1.30098       1.58077   0.99298    0.23196       0.06679    0.06664
 1       1.20939       1.42354   0.96230    0.28604       0.15962    0.09547
 2       1.13967       1.54695   0.93570    0.32720       0.08676    0.12048
 3       1.09208       2.26962   0.93592    0.35529      −0.33987    0.12027
 4       1.03733       2.14800   0.89861    0.38761      −0.26807    0.15534
 5       1.00583       4.26301   0.87986    0.40621      −1.51666    0.17297
 6       0.98113       4.01405   0.86677    0.42079      −1.36969    0.18527
 7       0.95294       3.34959   0.85683    0.43743      −0.97743    0.19461
 8       0.93024       3.88817   0.86203    0.45083      −1.29538    0.18972
 9       0.90701       4.35370   0.84558    0.46455      −1.57020    0.20519
10       0.89332       3.45478   0.84267    0.47263      −1.03953    0.20792
...
41       0.61881      15.22200   0.67752    0.63468      −7.98627    0.36316
42       0.61305      14.85660   0.67194    0.63809      −7.77057    0.36841
43       0.60894      15.82990   0.67470    0.64051      −8.34518    0.36581
44       0.60399      15.23310   0.67954    0.64344      −7.99283    0.36126
45       0.60117      13.93220   0.67664    0.64510      −7.22489    0.36399
46       0.59572      15.58510   0.66968    0.64832      −8.20064    0.37053
47       0.59303      15.63730   0.66592    0.64990      −8.23149    0.37407
48       0.58907      16.39490   0.65814    0.65224      −8.67874    0.38137
49       0.58607      15.33290   0.65483    0.65402      −8.05178    0.38448
50       0.58171      16.08150   0.64922    0.65659      −8.49372    0.38976

Table 12
Hold-out sample: Summary statistics

Summary statistics for derivatives of prediction function

        x1        x2       x3      x4        x5        x6       x7      x8       x9
mean   −8.484     7.638    3.411   −7.371    −9.980    −8.375   0.538   −5.512   −12.267
sd     17.353    19.064    6.313   13.248    18.843    10.144   8.918    7.941    17.853
min   −155.752   −5.672   −6.355  −115.062  −168.269  −93.124  −9.563  −68.698  −156.821
max     3.785   166.042   51.985    2.084     4.331     0.219  70.775    3.177     2.722

Summary statistics for predictions and predictors

       Prediction   x1      x2      x3      x4     x5     x6     x7     x8     x9
mean   −0.111       0.046   0.048   0.043   0.580  0.582  0.586  1.009  1.010  1.013
sd      0.775       0.736   0.738   0.743   0.455  0.456  0.457  0.406  0.406  0.408
min    −2.658      −1.910  −1.910  −1.910   0.000  0.000  0.000  0.000  0.000  0.000
max     3.087       2.234   2.234   2.234   2.234  2.234  2.234  2.182  2.182  2.182
Table 13
Hold-out sample: Actual and standardized values of predictors

Order stat.  Prediction   x1      x2      x3      x4      x5      x6      x7      x8      x9
253           3.087      −1.463  −0.577  −0.835   1.463   0.577   0.835   1.686   0.896   1.132
                         −2.051  −0.847  −1.183   1.944  −0.010   0.545   1.668  −0.281   0.290
252           1.862       0.014   1.240   0.169   0.014   1.240   0.169   0.303   1.089   0.339
                         −0.043   1.615   0.169  −1.243   1.444  −0.913  −1.738   0.193  −1.654
251           1.750      −0.815  −0.315  −1.043   0.815   0.315   1.043   1.093   0.523   1.583
                         −1.170  −0.492  −1.463   0.517  −0.584   1.001   0.208  −1.198   1.397
  2          −2.429      −0.077   0.167   0.766   0.077   0.167   0.766   1.008   1.965   0.786
                         −0.167   0.161   0.973  −1.107  −0.909   0.394  −0.003   2.349  −0.559
  1          −2.658      −0.762  −0.014   1.146   0.762   0.014   1.146   1.483   0.634   1.194
                         −1.097  −0.084   1.484   0.400  −1.244   1.225   1.167  −0.925   0.444

Note: Actual values in first row, standardized values in second row.
The next step is to examine $\hat{\delta} = \hat{f} - Y$ for remarkable values, that is, values that are either unusual or extreme. When one is considering a single out-of-sample prediction, the comparison must be done relative to the estimation data set. Here, however, we have a hold-out sample containing a relatively large number of observations, so we can conduct our examination relative to the hold-out data. For this, it is convenient to sort the hold-out observations in order of $\hat{\delta}$ (equivalently $\hat{f}$) and examine the distances between the order statistics. Large values for these distances identify potentially remarkable values. In this case the largest distances between order statistics occur only in the tails, so the only remarkable values are the extreme values. We are thus dealing with cases II.C.2, II.D.3, or II.D.4.
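The order-statistic spacing check just described can be sketched as follows; the specific threshold rule and the planted outlier are illustrative assumptions, not the chapter's procedure.

```python
import numpy as np

def remarkable_by_spacing(delta, threshold=3.0):
    """Sort delta = f_hat - y and flag observations bordering an
    unusually wide gap between consecutive order statistics
    (gap > threshold * median gap) -- a rough rule of thumb for
    spotting remarkable (unusual or extreme) values."""
    order = np.argsort(delta)
    gaps = np.diff(delta[order])
    cutoff = threshold * np.median(gaps)
    flagged = set()
    for i, gap in enumerate(gaps):
        if gap > cutoff:
            flagged.update((int(order[i]), int(order[i + 1])))
    return sorted(flagged)

# Illustrative data: one planted extreme prediction error.
rng = np.random.default_rng(1)
delta = rng.normal(size=200)
delta[0] = 6.0
flagged = remarkable_by_spacing(delta)
```

With roughly normal errors, the wide gaps (and hence the flags) concentrate in the tails, matching the pattern described in the text.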
The taxonomy resolves the explanation once we determine whether the predictors are remarkable or not, and if remarkable, in what way (unusual or extreme). The comparison data must be the estimation sample if there are only a few predictions, but given the relatively large hold-out sample here, we can assess the behavior of the predictors relative to the hold-out data. As mentioned in Section 6.2, a quick and dirty way to check for remarkable values is to consider each predictor separately. A check of the order statistic spacings for the individual predictors does not reveal unusual values in the hold-out data, so in Table 13 we present information bearing on whether or not the values of the predictors associated with the five most extreme $\hat{f}$'s are extreme. We provide both actual values and standardized values, in terms of (hold-out) standard deviations from the (hold-out) mean.
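The standardized values in Table 13 are simply hold-out z-scores: each predictor value less its hold-out mean, divided by its hold-out standard deviation. A two-predictor toy example (the matrix is illustrative, not the chapter's data):

```python
import numpy as np

def standardize(X):
    """Express each column in standard deviations from its mean,
    as in the second rows of Table 13 (computed there over the
    hold-out sample)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

X_hold = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])
Z = standardize(X_hold)   # each column now has mean 0 and sd 1
```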
The largest and most extreme prediction ($\hat{f} = 3.0871$) has associated predictor values that are plausibly extreme: $x_1$ and $x_4$ are approximately two standard deviations from their hold-out sample means, and $x_7$ is at 1.67 standard deviations. This first example therefore is plausibly case II.D.4: an extreme forecast explained by extreme predictors. This classification is also plausible for examples 2 and 4, as predictors $x_2$, $x_7$, and $x_9$ are moderately extreme for example 2 and predictor $x_8$ is extreme for example 4. On the other hand, the predictors for examples 3 and 5 do not appear to be particularly extreme. As we earlier found no evidence of unusual nonextreme predictors, these examples are plausibly classified as case II.C.2: extreme forecasts explained by nonmonotonicities.
It is worth emphasizing that the discussion of this section is not definitive, as we have illustrated our explanatory taxonomy using only the most easily applied tools. This is deliberate: these tools are the most accessible to practitioners, and they afford a simple first cut at understanding particular outcomes. They are also helpful in identifying cases for which further analysis, and in particular the application of more sophisticated tools, such as those involving multivariate density estimation, may be warranted.
8. Summary and concluding remarks
In this chapter, we have reviewed key aspects of forecasting using nonlinear models. In economics, any model, whether linear or nonlinear, is typically misspecified. Consequently, the resulting forecasts provide only an approximation to the best possible forecast. As we have seen, it is possible, at least in principle, to obtain superior approximations to the optimal forecast using a nonlinear approach. Against this possibility lie some potentially serious practical challenges. Primary among these are computational difficulties, the dangers of overfit, and potential difficulties of interpretation.
As we have seen, by focusing on models linear in the parameters and nonlinear in the predictors, it is possible to avoid the main computational difficulties and retain the benefits of the additional flexibility afforded by predictor nonlinearity. Further, use of nonlinear approximation, that is, using only the more important terms of a nonlinear series, can afford further advantages. There is a vast range of possible methods of this sort. Choice among these methods can be guided to only a modest degree by a priori knowledge. The remaining guidance must come from the data. Specifically, careful application of methods for controlling model complexity, such as Geisser's (1975) delete-d cross-validation for cross-section data or Racine's (2000) hv-block cross-validation for time-series data, is required in order to properly address the danger of overfit. A careful consideration of the interpretational issues shows that the difficulties there lie not so much with nonlinear models as with their relative unfamiliarity; as we have seen, the interpretational issues are either identical or highly parallel for linear and nonlinear approaches.
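The delete-d idea can be sketched as follows. (Racine's hv-block variant additionally removes a block of neighboring observations around each held-out block to handle time-series dependence; that refinement is omitted here.) The random sampling of splits and the plain OLS example are illustrative assumptions:

```python
import numpy as np

def delete_d_cv_mse(X, y, d, fit, predict, n_splits=200, seed=0):
    """Delete-d cross-validation (Geisser, 1975): repeatedly hold out
    d observations, fit on the rest, and average the squared
    prediction error on the held-out block.  Exhaustive enumeration
    of all C(n, d) splits is infeasible for large n, so splits are
    sampled at random here."""
    rng = np.random.default_rng(seed)
    n = len(y)
    errs = []
    for _ in range(n_splits):
        out = rng.choice(n, size=d, replace=False)        # held-out indices
        keep = np.setdiff1d(np.arange(n), out)            # estimation indices
        model = fit(X[keep], y[keep])
        errs.append(np.mean((y[out] - predict(model, X[out])) ** 2))
    return float(np.mean(errs))

# Usage with a plain OLS fit on synthetic data.
fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict = lambda b, X: X @ b
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=100)
cv = delete_d_cv_mse(X, y, d=10, fit=fit, predict=predict)
```

Choosing the model (for example, the number of hidden units) that minimizes this criterion is the complexity-control step referred to above.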
In our discussion here, we have paid particular attention to nonlinear models constructed using artificial neural networks (ANNs), using these to illustrate both the challenges to the use of nonlinear methods and effective solutions to these challenges. In particular, we propose QuickNet, an appealing family of algorithms for constructing nonlinear forecasts that retains the benefits of using a model nonlinear in the predictors while avoiding or mitigating the other challenges to the use of nonlinear forecasting models. In our limited example with artificial data, we saw some encouraging performance from QuickNet, both in terms of computational speed relative to more standard ANN methods and in terms of resulting forecasting performance relative to more familiar polynomial approximations. In our real-world data example, we also saw that building useful forecasting models can be quite challenging. There is no substitute for a thorough understanding of the strengths and weaknesses of the methods applied; nor can the importance of a thorough understanding of the domain being modeled be overemphasized.
Acknowledgements
The author is grateful for the comments and suggestions of the editors and three anonymous referees, which have led to substantial improvements over the initial draft. Any errors remain the author's responsibility.
References
Akaike, H. (1970). "Statistical predictor identification". Annals of the Institute of Statistical Mathematics 22, 203–217.
Akaike, H. (1973). "Information theory and an extension of the likelihood principle". In: Petrov, B.N., Csaki, F. (Eds.), Proceedings of the Second International Symposium of Information Theory. Akademiai Kiado, Budapest.
Allen, D. (1974). "The relationship between variable selection and data augmentation and a method for prediction". Technometrics 16, 125–127.
Benjamini, Y., Hochberg, Y. (1995). "Controlling the false discovery rate: A practical and powerful approach to multiple testing". Journal of the Royal Statistical Society, Series B 57, 289–300.
Burman, P., Chow, E., Nolan, D. (1994). "A cross validatory method for dependent data". Biometrika 81, 351–358.
Bierens, H. (1990). "A consistent conditional moment test of functional form". Econometrica 58, 1443–1458.
Candès, E. (1998). "Ridgelets: Theory and applications". Ph.D. Dissertation, Department of Statistics, Stanford University.
Candès, E. (1999a). "Harmonic analysis of neural networks". Applied and Computational Harmonic Analysis 6, 197–218.
Candès, E. (1999b). "On the representation of mutilated Sobolev functions". SIAM Journal of Mathematical Analysis 33, 2495–2509.
Candès, E. (2003). "Ridgelets: Estimating with ridge functions". Annals of Statistics 33, 1561–1599.
Chen, X. (2005). "Large sample sieve estimation of semi-nonparametric models". C.V. Starr Center Working Paper, New York University.
Coifman, R., Wickerhauser, M. (1992). "Entropy based algorithms for best basis selection". IEEE Transactions on Information Theory 32, 712–718.
Craven, P., Wahba, G. (1979). "Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation". Numerische Mathematik 31, 377–403.
Daubechies, I. (1988). "Orthonormal bases of compactly supported wavelets". Communications on Pure and Applied Mathematics 41, 909–996.
Daubechies, I. (1992). Ten Lectures on Wavelets. SIAM, Philadelphia, PA.
Dekel, S., Leviatan, D. (2003). "Adaptive multivariate piecewise polynomial approximation". SPIE Proceedings 5207, 125–133.
DeVore, R. (1998). "Nonlinear approximation". Acta Numerica 7, 51–150.
DeVore, R., Temlyakov, V. (1996). "Some remarks on greedy algorithms". Advances in Computational Mathematics 5, 173–187.
Gallant, A.R. (1981). "On the bias in flexible functional forms and an essentially unbiased form: The Fourier flexible form". Journal of Econometrics 15, 211–245.
Geisser, S. (1975). "The predictive sample reuse method with applications". Journal of the American Statistical Association 70, 320–328.
Gençay, R., Selçuk, F., Whitcher, B. (2001). An Introduction to Wavelets and Other Filtering Methods in Finance and Economics. Academic Press, New York.
Gonçalves, S., White, H. (2005). "Bootstrap standard error estimation for linear regressions". Journal of the American Statistical Association 100, 970–979.
Hahn, J. (1998). "On the role of the propensity score in efficient semiparametric estimation of average treatment effects". Econometrica 66, 315–331.
Hannan, E., Quinn, B. (1979). "The determination of the order of an autoregression". Journal of the Royal Statistical Society, Series B 41, 190–195.
Hendry, D.F., Krolzig, H.-M. (2001). Automatic Econometric Model Selection with PcGets. Timberlake Consultants Press, London.
Hirano, K., Imbens, G. (2001). "Estimation of causal effects using propensity score weighting: An application to right heart catheterization". Health Services & Outcomes Research Methodology 2, 259–278.
Jones, L.K. (1992). "A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training". Annals of Statistics 20, 608–613.
Jones, L.K. (1997). "The computational intractability of training sigmoid neural networks". IEEE Transactions on Information Theory 43, 167–173.
Kim, T., White, H. (2003). "Estimation, inference, and specification testing for possibly misspecified quantile regressions". In: Fomby, T., Hill, R.C. (Eds.), Maximum Likelihood Estimation of Misspecified Models: Twenty Years Later. Elsevier, New York, pp. 107–132.
Koenker, R., Bassett, G. (1978). "Regression quantiles". Econometrica 46, 33–50.
Kuan, C.-M., White, H. (1994). "Artificial neural networks: An econometric perspective". Econometric Reviews 13, 1–92.
Lehmann, E.L., Romano, J.P. (2005). "Generalizations of the familywise error rate". Annals of Statistics 33, 1138–1154.
Lendasse, A., Lee, J., de Bodt, E., Wertz, V., Verleysen, M. (2003). "Approximation by radial basis function networks: Application to option pricing". In: Lesage, C., Cottrell, M. (Eds.), Connectionist Approaches in Economics and Management Sciences. Kluwer, Amsterdam, pp. 203–214.
Li, Q., Racine, J. (2003). "Nonparametric estimation of distributions with categorical and continuous data". Journal of Multivariate Analysis 86, 266–292.
Mallows, C. (1973). "Some comments on C_p". Technometrics 15, 661–675.
Pérez-Amaral, T., Gallo, G.M., White, H. (2003). "A flexible tool for model building: The RElevant Transformation of the Inputs Network Approach (RETINA)". Oxford Bulletin of Economics and Statistics 65, 821–838.
Pérez-Amaral, T., Gallo, G.M., White, H. (2005). "A comparison of complementary automatic modeling methods: RETINA and PcGets". Econometric Theory 21, 262–277.
Pisier, G. (1980). "Remarques sur un résultat non publié de B. Maurey". Séminaire d'Analyse Fonctionnelle 1980–81, École Polytechnique, Centre de Mathématiques, Palaiseau.
Powell, M. (1987). "Radial basis functions for multivariate interpolation: A review". In: Mason, J.C., Cox, M.G. (Eds.), Algorithms for Approximation. Oxford University Press, Oxford, pp. 143–167.
Racine, J. (1997). "Feasible cross-validatory model selection for general stationary processes". Journal of Applied Econometrics 12, 169–179.
Racine, J. (2000). "A consistent cross-validatory method for dependent data: hv-block cross-validation". Journal of Econometrics 99, 39–61.
Rissanen, J. (1978). "Modeling by shortest data description". Automatica 14, 465–471.
Schwarz, G. (1978). "Estimating the dimension of a model". Annals of Statistics 6, 461–464.
Shao, J. (1993). "Linear model selection by cross-validation". Journal of the American Statistical Association 88, 486–495.
Shao, J. (1997). "An asymptotic theory for linear model selection". Statistica Sinica 7, 221–264.
Stinchcombe, M., White, H. (1998). "Consistent specification testing with nuisance parameters present only under the alternative". Econometric Theory 14, 295–325.
Stone, M. (1974). "Cross-validatory choice and assessment of statistical predictions". Journal of the Royal Statistical Society, Series B 36, 111–147.
Stone, M. (1976). "An asymptotic equivalence of choice of model by cross-validation and Akaike's criterion". Journal of the Royal Statistical Society, Series B 39, 44–47.
Sullivan, R., Timmermann, A., White, H. (1999). "Data snooping, technical trading rule performance, and the bootstrap". Journal of Finance 54, 1647–1692.
Swanson, N., White, H. (1995). "A model selection approach to assessing the information in the term structure using linear models and artificial neural networks". Journal of Business and Economic Statistics 13, 265–276.
Teräsvirta, T. (2006). "Forecasting economic variables with nonlinear models". In: Elliott, G., Granger, C.W.J., Timmermann, A. (Eds.), Handbook of Economic Forecasting. Elsevier, Amsterdam. Chapter 8 in this volume.
Timmermann, A., Granger, C.W.J. (2004). "Efficient market hypothesis and forecasting". International Journal of Forecasting 20, 15–27.
Trippi, R., Turban, E. (1992). Neural Networks in Finance and Investing: Using Artificial Intelligence to Improve Real World Performance. McGraw-Hill, New York.
Vu, V.H. (1998). "On the infeasibility of training neural networks with small mean-squared error". IEEE Transactions on Information Theory 44, 2892–2900.
Wahba, G. (1990). Spline Models for Observational Data. SIAM, Philadelphia, PA.
Wahba, G., Wold, S. (1975). "A completely automatic French curve: Fitting spline functions by cross-validation". Communications in Statistics 4, 1–17.
Westfall, P., Young, S. (1993). Resampling-Based Multiple Testing: Examples and Methods for P-Value Adjustment. Wiley, New York.
White, H. (1980). "Using least squares to approximate unknown regression functions". International Economic Review 21, 149–170.
White, H. (1981). "Consequences and detection of misspecified nonlinear regression models". Journal of the American Statistical Association 76, 419–433.
White, H. (2001). Asymptotic Theory for Econometricians. Academic Press, San Diego, CA.
Williams, E. (2003). "Essays in multiple comparison testing". Ph.D. Dissertation, Department of Economics, University of California, San Diego, CA.
PART 3
FORECASTING WITH PARTICULAR
DATA STRUCTURES
