
BOOTSTRAP METHODS FOR MARKOV PROCESSES


by


Joel L. Horowitz
Department of Economics
Northwestern University
Evanston, IL 60208-2600


October 2002


ABSTRACT

The block bootstrap is the best known method for implementing the bootstrap with time-
series data when the analyst does not have a parametric model that reduces the data generation
process to simple random sampling. However, the errors made by the block bootstrap converge
to zero only slightly faster than those made by first-order asymptotic approximations. This paper
describes a bootstrap procedure for data that are generated by a (possibly higher-order) Markov
process or by a process that can be approximated by a Markov process with sufficient accuracy.
The procedure is based on estimating the Markov transition density nonparametrically. Bootstrap
samples are obtained by sampling the process implied by the estimated transition density.
Conditions are given under which the errors made by the Markov bootstrap converge to zero
more rapidly than those made by the block bootstrap.















______________________________________________________________________________

I thank Kung-Sik Chan, Wolfgang Härdle, Bruce Hansen, Oliver Linton, Daniel McFadden,
Whitney Newey, Efstathios Paparoditis, Gene Savin, and two anonymous referees for many
helpful comments and suggestions. Research supported in part by NSF Grant SES-9910925.


BOOTSTRAP METHODS FOR MARKOV PROCESSES


1. INTRODUCTION
This paper describes a bootstrap procedure for data that are generated by a (possibly
higher-order) Markov process. The procedure is also applicable to non-Markov processes, such
as finite-order MA processes, that can be approximated with sufficient accuracy by Markov
processes. Under suitable conditions, the procedure is more accurate than the block bootstrap,
which is the leading nonparametric method for implementing the bootstrap with time-series data.
The bootstrap is a method for estimating the distribution of an estimator or test statistic

by resampling one’s data or a model estimated from the data. Under conditions that hold in a
wide variety of econometric applications, the bootstrap provides approximations to distributions
of statistics, coverage probabilities of confidence intervals, and rejection probabilities of tests that
are more accurate than the approximations of first-order asymptotic distribution theory. Monte
Carlo experiments have shown that the bootstrap can spectacularly reduce the difference between
the true and nominal probabilities that a test rejects a correct null hypothesis (hereinafter the error
in the rejection probability or ERP). See Horowitz (1994, 1997, 1999) for examples. Similarly,
the bootstrap can greatly reduce the difference between the true and nominal coverage
probabilities of a confidence interval (the error in the coverage probability or ECP).
The methods that are available for implementing the bootstrap and the improvements in
accuracy that it achieves relative to first-order asymptotic approximations depend on whether the
data are a random sample from a distribution or a time series. If the data are a random sample,
then the bootstrap can be implemented by sampling the data randomly with replacement or by
sampling a parametric model of the distribution of the data. The distribution of a statistic is
estimated by its empirical distribution under sampling from the data or parametric model
(bootstrap sampling). To summarize important properties of the bootstrap when the data are a
random sample, let n be the sample size and
T be a statistic that is asymptotically distributed as
N(0,1) (e.g., a t statistic for testing a hypothesis about a slope parameter in a linear regression
model). Then the following results hold under regularity conditions that are satisfied by a wide
variety of econometric models. See Hall (1992) for details.
1. The error in the bootstrap estimate of the one-sided probability $P(T_n \le z)$ is $O_p(n^{-1})$, whereas the error made by first-order asymptotic approximations is $O(n^{-1/2})$.

2. The error in the bootstrap estimate of the symmetrical probability $P(|T_n| \le z)$ is $O_p(n^{-3/2})$, whereas the error made by first-order approximations is $O(n^{-1})$.


3. When the critical value of a one-sided hypothesis test is obtained by using the bootstrap, the ERP of the test is $O(n^{-1})$, whereas it is $O(n^{-1/2})$ when the critical value is obtained from first-order approximations. The same result applies to the ECP of a one-sided confidence interval. In some cases, the bootstrap can reduce the ERP of a one-sided test to $O(n^{-3/2})$ (Hall 1992, p. 178; Davidson and MacKinnon 1999).

4. When the critical value of a symmetrical hypothesis test is obtained by using the bootstrap, the ERP of the test is $O(n^{-2})$, whereas it is $O(n^{-1})$ when the critical value is obtained from first-order approximations. The same result applies to the ECP of a symmetrical confidence interval.
The practical consequence of these results is that the ERP’s of tests and ECP’s of
confidence intervals based on the bootstrap are often substantially smaller than ERP’s and ECP’s
based on first-order asymptotic approximations. These benefits are available with samples of the
sizes encountered in applications (Horowitz 1994, 1997, 1999).
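To make these results concrete, here is a minimal Python sketch (not part of the paper; the function name and defaults are illustrative) of a symmetrical bootstrap-t test for the mean of an iid sample. The statistic is recentered at the sample mean so that the null hypothesis holds under bootstrap sampling.

```python
import numpy as np

def bootstrap_t_pvalue(x, mu0, n_boot=999, rng=None):
    """Symmetrical bootstrap-t test of H0: E(X) = mu0 for iid data."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = x.size
    t_stat = np.sqrt(n) * (x.mean() - mu0) / x.std(ddof=1)
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.choice(x, size=n, replace=True)
        # Recenter at the sample mean: the null holds in the bootstrap world.
        t_boot[b] = np.sqrt(n) * (xb.mean() - x.mean()) / xb.std(ddof=1)
    # Bootstrap estimate of P(|T_n| >= |t_stat|).
    return (np.abs(t_boot) >= np.abs(t_stat)).mean()
```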
The situation is more complicated when the data are a time series. To obtain asymptotic
refinements, bootstrap sampling must be carried out in a way that suitably captures the
dependence structure of the data generation process (DGP). If a parametric model is available
that reduces the DGP to independent random sampling (e.g., an ARMA model), then the results
summarized above continue to hold under appropriate regularity conditions. See, for example,
Andrews (1999), Bose (1988), and Bose (1990). If a parametric model is not available, then the
best known method for generating bootstrap samples consists of dividing the data into blocks and
sampling the blocks randomly with replacement. This is called the block bootstrap. The blocks,
whose lengths increase with increasing size of the estimation data set, may be non-overlapping
(Carlstein 1986, Hall 1985) or overlapping (Hall 1985, Künsch 1989). Regardless of the method
that is used, blocking distorts the dependence structure of the data and, thereby, increases the
error made by the bootstrap. The main results are that under regularity conditions and when the
block length is chosen optimally:
1. The errors in the bootstrap estimates of one-sided and symmetrical probabilities are almost surely $O_p(n^{-3/4})$ and $O_p(n^{-6/5})$, respectively (Hall et al. 1995).

2. The ECP's (ERP's) of one-sided and symmetrical confidence intervals (tests) are $O(n^{-3/4})$ and $O(n^{-5/4})$, respectively (Zvingelis 2000).
Thus, the errors made by the block bootstrap converge to zero at rates that are slower
than those of the bootstrap based on data that are a random sample. Monte Carlo results have
confirmed this disappointing performance of the block bootstrap (Hall and Horowitz 1996).
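For concreteness, the following Python sketch (illustrative, not from the paper) draws one bootstrap series by the overlapping-blocks scheme of Hall (1985) and Künsch (1989); the block length is the tuning parameter discussed above.

```python
import numpy as np

def block_bootstrap_sample(x, block_length, rng=None):
    """Draw one bootstrap series by concatenating overlapping blocks
    sampled randomly with replacement (moving-blocks bootstrap)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    n = x.size
    n_blocks = int(np.ceil(n / block_length))
    # Random starting indices of the overlapping blocks.
    starts = rng.integers(0, n - block_length + 1, size=n_blocks)
    blocks = [x[s:s + block_length] for s in starts]
    return np.concatenate(blocks)[:n]
```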


The relatively poor performance of the block bootstrap has led to a search for other ways
to implement the bootstrap with dependent data. Bühlmann (1997, 1998), Choi and Hall (2000),
Kreiss (1992), and Paparoditis (1996) have proposed a sieve bootstrap for linear processes (that
is, AR, vector AR, or invertible MA processes of possibly infinite order). In the sieve bootstrap,
the DGP is approximated by an AR(p) model in which p increases with increasing sample size.
Bootstrap samples are generated by the estimated AR(p) model. Choi and Hall (2000) have
shown that the ECP of a one-sided confidence interval based on the sieve bootstrap is $O(n^{-1+\varepsilon})$ for any $\varepsilon > 0$, which is only slightly larger than the ECP of $O(n^{-1})$ that is available when the data are a random sample. This result is encouraging, but its practical utility is limited. If a process has a finite-order ARMA representation, then the ARMA model can be used to reduce the DGP to random sampling from some distribution. Standard methods can be used to implement the bootstrap, and the sieve bootstrap is not needed. Sieve methods have not been developed for nonlinear processes such as nonlinear autoregressive, ARCH, and GARCH processes.
The bootstrap procedure described in this paper applies to a linear or nonlinear DGP that
is a (possibly higher-order) Markov process or can be approximated by one with sufficient
accuracy. The procedure is based on estimating the Markov transition density nonparametrically.
Bootstrap samples are obtained by sampling the process implied by the estimated transition
density. This procedure will be called the Markov conditional bootstrap (MCB). Conditions are
given under which:
1. The errors in the MCB estimates of one-sided and symmetrical probabilities are almost surely $O(n^{-1+\varepsilon})$ and $O(n^{-3/2+\varepsilon})$, respectively, for any $\varepsilon > 0$.

2. The ERP's (ECP's) of one-sided and symmetrical tests (confidence intervals) based on the MCB are $O(n^{-1+\varepsilon})$ and $O(n^{-3/2+\varepsilon})$, respectively, for any $\varepsilon > 0$.
Thus, under the conditions that are given here, the errors made by the MCB converge to
zero more rapidly than those made by the block bootstrap. Moreover, for one-sided probabilities,
symmetrical probabilities, and one-sided confidence intervals and tests, the errors made by the
MCB converge only slightly less rapidly than those made by the bootstrap for data that are
sampled randomly from a distribution.
The conditions required to obtain these results are stronger than those required to obtain
asymptotic refinements with the block bootstrap. If the required conditions are not satisfied, then
the errors made by the MCB may converge more slowly than those made by the block bootstrap.
Moreover, as will be explained in Section 3.2, the MCB suffers from a form of the curse of
dimensionality of nonparametric estimation. A large data set (e.g., high-frequency financial data)


is likely to be needed to obtain good performance if the DGP is a high-dimension vector process
or a high-order Markov process. Thus, the MCB is not a replacement for the block bootstrap.

The MCB is, however, an attractive alternative to the block bootstrap when the conditions needed
for good performance of the MCB are satisfied.
There have been several previous investigations of the MCB. Rajarshi (1990) gave
conditions under which the MCB consistently estimates the asymptotic distribution of a statistic.
Datta and McCormick (1995) gave conditions under which the error in the MCB estimator of the distribution function of a normalized sample average is almost surely $o(n^{-1/2})$. Hansen (1999) proposed using an empirical likelihood estimator of the Markov transition probability but did not prove that the resulting version of the MCB is consistent or provides asymptotic refinements. Chan and Tong (1998) proposed using the MCB in a test for multimodality in the distribution of dependent data. Paparoditis and Politis (2001a, 2001b) proposed estimating the Markov transition probability by resampling the data in a suitable way. No previous authors have evaluated the ERP or ECP of the MCB or compared its accuracy to that of the block bootstrap. Thus, the results presented here go well beyond those of previous investigators.
The MCB is described informally in Section 2 of this paper. Section 3 presents regularity
conditions and formal results for data that are generated by a Markov process. Section 4 extends
the MCB to generalized method of moments (GMM) estimators and approximate Markov
processes. Section 5 presents the results of a Monte Carlo investigation of the numerical
performance of the MCB. Section 6 presents concluding comments. The proofs of theorems are
in the Appendix.
2. INFORMAL DESCRIPTION OF THE METHOD
This section describes the MCB procedure for data that are generated by a Markov
process and provides an informal summary of the main results of the paper. For any integer $j$, let $X_j \in \mathbb{R}^d$ ($d \ge 1$) be a continuously distributed random variable. Let $\{X_j : j = 1, 2, \ldots, n\}$ be a realization of a strictly stationary, $q$'th order Markov process. Thus,

$$P(X_j \le x_j \mid X_{j-1} = x_{j-1}, X_{j-2} = x_{j-2}, \ldots) = P(X_j \le x_j \mid X_{j-1} = x_{j-1}, \ldots, X_{j-q} = x_{j-q})$$

almost surely for $d$-vectors $x_j, x_{j-1}, x_{j-2}, \ldots$ and some finite integer $q \ge 1$. It is assumed that $q$ is known. Cheng and Tong (1992) show how to estimate $q$. In addition, for technical reasons that are discussed further in Section 3.1, it is assumed that $X_j$ has bounded support and that $\mathrm{cov}(X_j, X_{j+k}) = 0$ if $k > M$ for some $M < \infty$. Define $\mu = E(X_1)$ and $m = n^{-1} \sum_{j=1}^{n} X_j$.

2.1. Statement of the Problem

The problem addressed in the remainder of this section and in Section 3 is to carry out inference based on a Studentized statistic, $T_n$, whose form is

(2.1) $T_n = n^{1/2}[H(m) - H(\mu)]/s_n$,

where $H$ is a sufficiently smooth, scalar-valued function, $s_n^2$ is a consistent estimator of the variance of the asymptotic distribution of $n^{1/2}[H(m) - H(\mu)]$, and $T_n \stackrel{d}{\to} N(0,1)$ as $n \to \infty$. The objects of interest are (1) the probabilities $P(T_n \le t)$ and $P(|T_n| \le t)$ for any finite, scalar $t$, (2) the probability that a test based on $T_n$ rejects the correct hypothesis $H_0: E[H(X)] = H(\mu)$, and (3) the coverage probabilities of confidence intervals for $H(\mu)$ that are based on $T_n$. To avoid repetitive arguments, only probabilities and symmetrical hypothesis tests are treated explicitly. An $\alpha$-level symmetrical test based on $T_n$ rejects $H_0$ if $|T_n| > z_{n\alpha}$, where $z_{n\alpha}$ is the $\alpha$-level critical value. Arguments similar to those made in this section and Section 4 can be used to obtain the results stated in the introduction for one-sided tests and for confidence intervals based on the MCB.

The focus on statistics of the form (2.1) with a continuously distributed $X$ may appear to be restrictive, but this appearance is misleading. A wide variety of statistics that are important in applications can be approximated with negligible error by statistics of the form (2.1). In particular, as will be explained in Section 4.1, $t$ statistics for testing hypotheses about parameters estimated by GMM can be approximated this way.$^1$

2.2. The MCB Procedure

Consider the problem of estimating $P(T_n \le z)$, $P(|T_n| \le z)$, or $z_{n\alpha}$. For any integer $j > q$, define $Y_j = (X'_{j-1}, \ldots, X'_{j-q})'$. Let $p_y$ denote the probability density function of $Y_{q+1} = (X'_q, \ldots, X'_1)'$. Let $f$ denote the probability density function of $X_j$ conditional on $Y_j$. If $f$ and $p_y$ were known, then $P(T_n \le z)$ and $P(|T_n| \le z)$ could be estimated as follows:

1. Draw $\tilde{Y}_{q+1} = (\tilde{X}'_q, \ldots, \tilde{X}'_1)'$ from the distribution whose density is $p_y$. Draw $\tilde{X}_{q+1}$ from the distribution whose density is $f(\cdot \mid \tilde{Y}_{q+1})$. Set $\tilde{Y}_{q+2} = (\tilde{X}'_{q+1}, \ldots, \tilde{X}'_2)'$.

2. Having obtained $\tilde{Y}_j = (\tilde{X}'_{j-1}, \ldots, \tilde{X}'_{j-q})'$ for any $j \ge q+2$, draw $\tilde{X}_j$ from the distribution whose density is $f(\cdot \mid \tilde{Y}_j)$. Set $\tilde{Y}_{j+1} = (\tilde{X}'_j, \ldots, \tilde{X}'_{j-q+1})'$.

3. Repeat step 2 until a simulated data series $\{\tilde{X}_j : j = 1, \ldots, n\}$ has been obtained. Compute $\mu$ as (say) $\int x_q\, p_y(x_1, \ldots, x_q)\, dx_1 \cdots dx_q$. Then compute a simulated test statistic $\tilde{T}_n$ by substituting the simulated data into (2.1).

4. Estimate $P(T_n \le z)$ ($P(|T_n| \le z)$) from the empirical distribution of $\tilde{T}_n$ ($|\tilde{T}_n|$) that is obtained by repeating steps 1-3 many times. Estimate $z_{n\alpha}$ by the $1-\alpha$ quantile of the empirical distribution of $|\tilde{T}_n|$.
This procedure cannot be implemented in an application because $f$ and $p_y$ are unknown. The MCB replaces $f$ and $p_y$ with kernel nonparametric estimators. To obtain the estimators, let $K_f$ be a kernel function (in the sense of nonparametric density estimation) of a $d(q+1)$-dimensional argument. Let $K_p$ be a kernel function of a $dq$-dimensional argument. Let $\{h_n : n = 1, 2, \ldots\}$ be a sequence of positive constants (bandwidths) such that $h_n \to 0$ as $n \to \infty$. Conditions that $K_f$, $K_p$, and $\{h_n\}$ must satisfy are given in Section 3. For $x \in \mathbb{R}^d$, $y \in \mathbb{R}^{dq}$, and $z = (x, y)$, define

$$p_{nz}(x, y) = \frac{1}{(n-q) h_n^{d(q+1)}} \sum_{j=q+1}^{n} K_f\!\left(\frac{x - X_j}{h_n}, \frac{y - Y_j}{h_n}\right)$$

and

$$p_{ny}(y) = \frac{1}{(n-q) h_n^{dq}} \sum_{j=q+1}^{n} K_p\!\left(\frac{y - Y_j}{h_n}\right).$$

The estimators of $p_y$ and $f$, respectively, are $p_{ny}$ and

(2.2) $f_n(x \mid y) = p_{nz}(x, y)/p_{ny}(y)$.
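A direct transcription of $p_{nz}$, $p_{ny}$, and (2.2) for scalar data ($d = 1$) might look as follows. This is a sketch: a Gaussian product kernel stands in for $K_f$ and $K_p$, whereas the paper's theory requires compactly supported, possibly higher-order kernels (see Assumptions 5 and 6).

```python
import numpy as np

def make_transition_density(x_data, q, h):
    """Kernel estimator f_n(x | y) of eq. (2.2) for scalar (d = 1) data.

    y = (x_{j-1}, ..., x_{j-q}) is the vector of the q most recent lags.
    """
    x_data = np.asarray(x_data, dtype=float)
    n = x_data.size
    X = x_data[q:]                                   # X_j for j = q+1, ..., n
    # Row i of Y holds (x_{j-1}, ..., x_{j-q}) for the observation in row i of X.
    Y = np.column_stack([x_data[q - k:n - k] for k in range(1, q + 1)])

    def K(u):  # Gaussian kernel as a stand-in for the paper's kernels
        return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

    def f_n(x, y):
        y = np.atleast_1d(y)
        ky = np.prod(K((y - Y) / h), axis=1)         # kernel weights in y
        p_nz = np.sum(K((x - X) / h) * ky) / ((n - q) * h ** (q + 1))
        p_ny = np.sum(ky) / ((n - q) * h ** q)
        return p_nz / p_ny

    return f_n
```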
The MCB estimates $P(T_n \le z)$, $P(|T_n| \le z)$, and $z_{n\alpha}$ by repeatedly sampling the Markov process generated by the transition density $f_n(x \mid y)$. However, $f_n(x \mid y)$ is an inaccurate estimator of $f(x \mid y)$ in regions where $p_y(y)$ is close to zero. To obtain the asymptotic refinements described in Section 1, it is necessary to avoid such regions. Here, this is done by truncating the MCB sample. To carry out the truncation, let $C_n = \{y : p_y(y) \ge \lambda_n\}$, where $\lambda_n > 0$ for each $n = 1, 2, \ldots$ and $\lambda_n \to 0$ as $n \to \infty$ at a rate that is specified in Section 3.1. Having obtained realizations $\hat{X}_1, \ldots, \hat{X}_j$ ($j \ge q+1$) from the Markov process induced by $f_n(x \mid y)$, the MCB retains a realization of $\hat{X}_j$ only if $(\hat{X}'_j, \ldots, \hat{X}'_{j-q+1})' \in C_n$. Thus, the MCB proceeds as follows:

MCB 1. Draw $\hat{Y}_{q+1} = (\hat{X}'_q, \ldots, \hat{X}'_1)'$ from the distribution whose density is $p_{ny}$. Retain $\hat{Y}_{q+1}$ if $\hat{Y}_{q+1} \in C_n$. Otherwise, discard the current $\hat{Y}_{q+1}$ and draw a new one. Continue this process until a $\hat{Y}_{q+1} \in C_n$ is obtained.

MCB 2. Having obtained $\hat{Y}_j = (\hat{X}'_{j-1}, \ldots, \hat{X}'_{j-q})'$ for any $j \ge q+1$, draw $\hat{X}_j$ from the distribution whose density is $f_n(\cdot \mid \hat{Y}_j)$. Retain $\hat{X}_j$ and set $\hat{Y}_{j+1} = (\hat{X}'_j, \ldots, \hat{X}'_{j-q+1})'$ if $(\hat{X}'_j, \ldots, \hat{X}'_{j-q+1})' \in C_n$. Otherwise, discard the current $\hat{X}_j$ and draw a new one. Continue this process until an $\hat{X}_j$ is obtained for which $(\hat{X}'_j, \ldots, \hat{X}'_{j-q+1})' \in C_n$.

MCB 3. Repeat step 2 until a bootstrap data series $\{\hat{X}_j : j = 1, \ldots, n\}$ has been obtained. Compute the bootstrap test statistic $\hat{T}_n = n^{1/2}[H(\hat{m}) - H(\hat{\mu})]/\hat{s}_n$, where $\hat{m} = n^{-1} \sum_{j=1}^{n} \hat{X}_j$, $\hat{\mu}$ is the mean of $\hat{X}_j$ relative to the distribution induced by the sampling procedure of steps MCB 1 and MCB 2 (bootstrap sampling), and $\hat{s}_n^2$ is an estimator of the variance of the asymptotic distribution of $n^{1/2}[H(\hat{m}) - H(\hat{\mu})]$ under bootstrap sampling.

MCB 4. Estimate $P(T_n \le z)$ ($P(|T_n| \le z)$) from the empirical distribution of $\hat{T}_n$ ($|\hat{T}_n|$) that is obtained by repeating steps 1-3 many times. Estimate $z_{n\alpha}$ by the $1-\alpha$ quantile of the empirical distribution of $|\hat{T}_n|$. Denote this estimator by $\hat{z}_{n\alpha}$.

A symmetrical test of $H_0$ based on $T_n$ and the bootstrap critical value $\hat{z}_{n\alpha}$ rejects at the nominal $\alpha$ level if $|T_n| \ge \hat{z}_{n\alpha}$.
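The following sketch (illustrative Python; it assumes kernel estimators such as the hypothetical `make_transition_density` shown after (2.2), with `p_ny` and `f_n` passed in as functions) draws one truncated MCB series by the rejection scheme of steps MCB 1 and MCB 2. Sampling from $f_n(\cdot \mid y)$ uses the inverse-distribution method on a fine grid, as in the experiments of Section 5 (footnote 10).

```python
import numpy as np

def mcb_sample(x_data, q, p_ny, f_n, lam, grid, rng=None):
    """One truncated MCB series of the same length as x_data.

    p_ny(y): kernel estimate of the density of the lag vector Y_j.
    f_n(x, y): kernel transition density estimate, eq. (2.2).
    lam: truncation level; states y with p_ny(y) < lam are rejected.
    grid: fine grid of x values for inverse-distribution sampling.
    """
    rng = np.random.default_rng(rng)
    x_data = np.asarray(x_data, dtype=float)
    n = x_data.size

    # MCB 1 (simplified): start from an observed q-block as a stand-in
    # for drawing from p_ny, rejecting until it lies in C_n.
    while True:
        j = rng.integers(q, n)
        y = x_data[j - q:j][::-1]            # (x_{j-1}, ..., x_{j-q})
        if p_ny(y) >= lam:
            break

    series = list(y[::-1])                   # oldest observation first
    while len(series) < n:
        # MCB 2: draw from f_n(. | y) by inverse CDF on the grid, then
        # keep the draw only if the updated state stays inside C_n.
        dens = np.array([f_n(x, y) for x in grid])
        cdf = np.cumsum(dens)
        cdf /= cdf[-1]
        x_new = grid[np.searchsorted(cdf, rng.uniform())]
        y_new = np.concatenate(([x_new], y))[:q]
        if p_ny(y_new) >= lam:
            series.append(x_new)
            y = y_new
    return np.array(series)
```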
2.3. Properties of the MCB

This section presents an informal summary of the main results of the paper and of the arguments that lead to them. The results are stated formally in Section 3. Let $\|\cdot\|$ denote the Euclidean norm. Let $\hat{P}$ denote the probability measure induced by the MCB sampling procedure (steps MCB 1 - MCB 2) conditional on the data $\{X_j : j = 1, \ldots, n\}$. Let any $\varepsilon > 0$ be given.
The main results are that under regularity conditions stated in Section 3.1:

(2.3) $\sup_z |\hat{P}(\hat{T}_n \le z) - P(T_n \le z)| = O(n^{-1+\varepsilon})$

almost surely,

(2.4) $\sup_z |\hat{P}(|\hat{T}_n| \le z) - P(|T_n| \le z)| = O(n^{-3/2+\varepsilon})$

almost surely, and

(2.5) $P(|T_n| > \hat{z}_{n\alpha}) = \alpha + O(n^{-3/2+\varepsilon})$.
These results may be contrasted with the analogous ones for the block bootstrap. The block bootstrap with optimal block lengths yields $O_p(n^{-3/4})$, $O_p(n^{-6/5})$, and $O(n^{-5/4})$ for the right-hand sides of (2.3)-(2.5), respectively (Hall et al. 1995, Zvingelis 2000). Therefore, the MCB is more accurate than the block bootstrap under the regularity conditions of Section 3.1.
These results are obtained by carrying out Edgeworth expansions of $P(T_n \le z)$ and $\hat{P}(\hat{T}_n \le z)$. Additional notation is needed to describe the expansions. Let $\chi \equiv \{X_j : j = 1, \ldots, n\}$ denote the data. Let $\Phi$ and $\phi$, respectively, denote the standard normal distribution function and density. The $j$'th cumulant of $T_n$ ($1 \le j \le 4$) has the form $n^{-1/2} \kappa_j + o(n^{-1/2})$ if $j$ is odd and $I(j = 2) + n^{-1} \kappa_j + o(n^{-1})$ if $j$ is even, where $\kappa_j$ is a constant (Hall 1992, p. 46). Define $\kappa = (\kappa_1, \ldots, \kappa_4)$. Conditional on $\chi$, the $j$'th cumulant of $\hat{T}_n$ almost surely has the form $n^{-1/2} \hat{\kappa}_j + o(n^{-1/2})$ if $j$ is odd and $I(j = 2) + n^{-1} \hat{\kappa}_j + o(n^{-1})$ if $j$ is even. The quantities $\hat{\kappa}_j$ depend on $\chi$. They are nonstochastic relative to bootstrap sampling but are random variables relative to the stochastic process that generates $\chi$. Define $\hat{\kappa} = (\hat{\kappa}_1, \ldots, \hat{\kappa}_4)$.
Under the regularity conditions of Section 3.1, $P(T_n \le z)$ has the Edgeworth expansion

(2.6) $P(T_n \le z) = \Phi(z) + \sum_{j=1}^{2} n^{-j/2} \pi_j(z, \kappa) \phi(z) + O(n^{-3/2})$

uniformly over $z$, where $\pi_j(z, \kappa)$ is a polynomial function of $z$ for each $\kappa$, a continuously differentiable function of the components of $\kappa$ for each $z$, an even function of $z$ if $j = 1$, and an odd function of $z$ if $j = 2$. Moreover, $P(|T_n| \le z)$ has the expansion

(2.7) $P(|T_n| \le z) = 2\Phi(z) - 1 + 2 n^{-1} \pi_2(z, \kappa) \phi(z) + O(n^{-3/2})$

uniformly over $z$. Conditional on $\chi$, the bootstrap probabilities $\hat{P}(\hat{T}_n \le z)$ and $\hat{P}(|\hat{T}_n| \le z)$ have the expansions

(2.8) $\hat{P}(\hat{T}_n \le z) = \Phi(z) + \sum_{j=1}^{2} n^{-j/2} \pi_j(z, \hat{\kappa}) \phi(z) + O(n^{-3/2})$

and

(2.9) $\hat{P}(|\hat{T}_n| \le z) = 2\Phi(z) - 1 + 2 n^{-1} \pi_2(z, \hat{\kappa}) \phi(z) + O(n^{-3/2})$

uniformly over $z$ almost surely. Therefore,

(2.10) $|\hat{P}(\hat{T}_n \le z) - P(T_n \le z)| = O(n^{-1/2} \|\hat{\kappa} - \kappa\|) + O(n^{-1})$

and

(2.11) $|\hat{P}(|\hat{T}_n| \le z) - P(|T_n| \le z)| = O(n^{-1} \|\hat{\kappa} - \kappa\|) + O(n^{-3/2})$

almost surely uniformly over $z$. Under the regularity conditions of Section 3.1,

(2.12) $\|\hat{\kappa} - \kappa\| = O(n^{-1/2+\varepsilon})$

almost surely for any $\varepsilon > 0$. Results (2.3)-(2.4) follow by substituting (2.12) into (2.10)-(2.11).
To obtain (2.5), observe that $\hat{P}(|\hat{T}_n| \le \hat{z}_{n\alpha}) = P(|T_n| \le z_{n\alpha}) = 1 - \alpha$. It follows from (2.7) and (2.10) that

(2.13) $2\Phi(z_{n\alpha}) - 1 + 2 n^{-1} \pi_2(z_{n\alpha}, \kappa) \phi(z_{n\alpha}) = 1 - \alpha + O(n^{-3/2})$

and

(2.14) $2\Phi(\hat{z}_{n\alpha}) - 1 + 2 n^{-1} \pi_2(\hat{z}_{n\alpha}, \hat{\kappa}) \phi(\hat{z}_{n\alpha}) = 1 - \alpha + O(n^{-3/2})$

almost surely. Let $v_\alpha$ denote the $1 - \alpha/2$ quantile of the $N(0,1)$ distribution. Then Cornish-Fisher inversions of (2.13) and (2.14) (e.g., Hall 1992, pp. 88-89) give

(2.15) $z_{n\alpha} = v_\alpha - n^{-1} \pi_2(v_\alpha, \kappa) + O(n^{-3/2})$

and

(2.16) $\hat{z}_{n\alpha} = v_\alpha - n^{-1} \pi_2(v_\alpha, \hat{\kappa}) + O(n^{-3/2})$

almost surely. Therefore,

(2.17) $P(|T_n| \le \hat{z}_{n\alpha}) = P\{|T_n| \le z_{n\alpha} + n^{-1}[\pi_2(v_\alpha, \kappa) - \pi_2(v_\alpha, \hat{\kappa})] + O(n^{-3/2})\}$

(2.18) $\qquad\qquad = P[|T_n| \le z_{n\alpha} + O(n^{-1} \|\hat{\kappa} - \kappa\|) + O(n^{-3/2})]$.

Result (2.5) follows by applying (2.12) to the right-hand side of (2.18).
3. MAIN RESULTS
This section presents theorems that formalize results (2.3)-(2.5).


3.1 Assumptions
Results (2.3)-(2.5) are established under assumptions that are stated in this section. The
proof of the validity of the Edgeworth expansions (2.6)-(2.9) relies on a theorem of Götze and
Hipp (1983) and requires certain restrictive assumptions. See Assumption 4 below. It is likely
that the expansions are valid under weaker assumptions, but proving this conjecture is beyond the

scope of this paper. The results of this section hold under weaker assumptions if the Edgeworth
expansions remain valid.
The following additional notation is used. Let $p_z$ denote the probability density function of $Z_{q+1} \equiv (X_{q+1}, Y'_{q+1})'$. Let $\hat{E}$ denote the expectation with respect to the distribution induced by bootstrap sampling (steps MCB 1 and MCB 2 of Section 2.2). Define $\hat{\Sigma} = n\hat{E}[(\hat{m} - \hat{\mu})(\hat{m} - \hat{\mu})']$ and $\tilde{s}_n^2 = \nabla H(\hat{\mu})' \hat{\Sigma} \nabla H(\hat{\mu})$. For reasons that are explained later in this section, it is assumed that $E(X_1 - \mu)(X_{1+j} - \mu)' = 0$ if $j > M$ for some integer $M < \infty$. Define

$$\Sigma_n = (n - M)^{-1} \sum_{i=1}^{n-M} \left\{ (X_i - m)(X_i - m)' + \sum_{j=1}^{M} [(X_i - m)(X_{i+j} - m)' + (X_{i+j} - m)(X_i - m)'] \right\},$$

$$\hat{\Sigma}_n = (n - M)^{-1} \sum_{i=1}^{n-M} \left\{ (\hat{X}_i - \hat{m})(\hat{X}_i - \hat{m})' + \sum_{j=1}^{M} [(\hat{X}_i - \hat{m})(\hat{X}_{i+j} - \hat{m})' + (\hat{X}_{i+j} - \hat{m})(\hat{X}_i - \hat{m})'] \right\},$$

$$\tilde{\Sigma}_n = \hat{E}\left\{ (\hat{X}_1 - \hat{\mu})(\hat{X}_1 - \hat{\mu})' + \sum_{j=1}^{M} [(\hat{X}_1 - \hat{\mu})(\hat{X}_{1+j} - \hat{\mu})' + (\hat{X}_{1+j} - \hat{\mu})(\hat{X}_1 - \hat{\mu})'] \right\},$$

and $\vartheta_n^2 = \nabla H(\hat{\mu})' \tilde{\Sigma}_n \nabla H(\hat{\mu})/\tilde{s}_n^2$. Note that $\vartheta_n^2$ can be evaluated in an application because the bootstrap DGP is known. For any $\lambda > 0$, define $C_\lambda = \{x \in \mathbb{R}^d : p_y(x, x_1, \ldots, x_{q-1}) \ge \lambda$ for some $x_1, \ldots, x_{q-1}\}$. Let $\mathcal{B}(C_\lambda)$ denote the measurable subsets of $C_\lambda$. Let $P_k(\xi, A)$ denote the $k$-step transition probability from a point $\xi \in C_\lambda$ to a set $A \in \mathcal{B}(C_\lambda)$.

Assumption 1: $\{X_j : j = 1, 2, \ldots, n;\ X_j \in \mathbb{R}^d\}$ is a realization of a strictly stationary, $q$'th order Markov process that is geometrically strongly mixing (GSM).$^2$

Assumption 2: (i) The distribution of $Z_{q+1}$ is absolutely continuous with respect to Lebesgue measure. (ii) For $t \in \mathbb{R}^d$ and each $k$ such that $0 < k \le q$,

(3.1) $\limsup_{\|t\| \to \infty} E\,|E[\exp(\iota t' X_j) \mid X_{j'},\ |j - j'| \le k,\ j' \ne j]| < 1$.

(iii) The functions $p_y$, $p_z$, and $f$ are bounded. (iv) For some $\ell \ge 2$, $p_y$ and $p_z$ are everywhere at least $\ell$ times continuously differentiable with respect to any mixture of their arguments.

Assumption 3: (i) $H$ is three times continuously differentiable in a neighborhood of $\mu$. (ii) The gradient of $H$ is non-zero in a neighborhood of $\mu$.

Assumption 4: (i) $X_j$ has bounded support. (ii) For all sufficiently small $\lambda > 0$, some $\varepsilon > 0$, and some integer $k > 0$,

$$\sup_{\xi, \zeta \in C_\lambda;\ A \in \mathcal{B}(C_\lambda)} |P_k(\xi, A) - P_k(\zeta, A)| < 1 - \varepsilon.$$

(iii) For some $M < \infty$, $E(X_1 - \mu)(X_{1+j} - \mu)' = 0$ if $j > M$. (iv) The sample and bootstrap variance estimators are $s_n^2 = \nabla H(m)' \Sigma_n \nabla H(m)$ and $\hat{s}_n^2 = \vartheta_n^2 \nabla H(\hat{m})' \hat{\Sigma}_n \nabla H(\hat{m})$. (v) For $\lambda_n$ as in Assumption 6, $P[p_y(Y_{q+2}) < \lambda_n \mid Y_{q+1} = y] = o(n^{-1})$ as $n \to \infty$ uniformly over $y$ such that $p_y(y) \ge \lambda_n$.

Let $K$ be a bounded, continuous function whose support is $[-1, 1]$ and that is symmetrical about 0. For each integer $j = 0, \ldots, \ell$, let $K$ satisfy

$$\int_{-1}^{1} v^j K(v)\, dv = \begin{cases} 1 & \text{if } j = 0 \\ 0 & \text{if } 1 \le j < \ell \\ B_K \text{ (nonzero)} & \text{if } j = \ell. \end{cases}$$

For any integer $J > 0$, let $W^{(j)}$ ($j = 1, \ldots, J$) denote the $j$'th component of the vector $W \in \mathbb{R}^J$.

Assumption 5: Let $v_f \in \mathbb{R}^{d(q+1)}$ and $v_p \in \mathbb{R}^{dq}$. $K_f$ and $K_p$ have the forms

$$K_f(v_f) = \prod_{j=1}^{d(q+1)} K(v_f^{(j)}); \quad K_p(v_p) = \prod_{j=1}^{dq} K(v_p^{(j)}).$$

Assumption 6: (i) Let $\eta = 1/[2\ell + d(q+1)]$. Then $h_n = c_h n^{-\eta}$ for some finite constant $c_h > 0$. (ii) $n h_n^{d(q+1)} \lambda_n^2/(\log n)^2 \to \infty$ as $n \to \infty$.
Assumptions 2(i), 2(ii) and 4 are used to insure the validity of the Edgeworth expansions (2.6)-(2.9). Condition (3.1) is a dependent-data version of the Cramér condition. See Götze and Hipp (1983). Assumptions 2(iii) and 2(iv) insure sufficiently rapid convergence of $\|\hat{\kappa} - \kappa\|$. Assumptions 4(i)-4(ii) are used to show that the bootstrap DGP is GSM. The GSM property is used to prove the validity of the Edgeworth expansions (2.8)-(2.9). The results of this paper hold when $X_j$ has unbounded support if the expansions (2.8)-(2.9) are valid and $p_y(y)$ decreases at an exponentially fast rate as $\|y\| \to \infty$.$^3$ Assumption 4(iii) is used to insure the validity of the Edgeworth expansions (2.6)-(2.9). It is needed because the known conditions for the validity of these expansions apply to statistics that are functions of sample moments. Under 4(iii)-4(iv), $s_n$ and $T_n$ are functions of sample moments of $X_j$. This is not the case if $T_n$ is Studentized with a kernel-type variance estimator (e.g., Andrews 1991; Andrews and Monahan 1992; Newey and West 1987, 1994). However, under Assumption 1, the smoothing parameter of a kernel variance estimator can be chosen so that the estimator is $n^{1/2-\varepsilon}$-consistent for any $\varepsilon > 0$. Therefore, if the required Edgeworth expansions are valid, the results established here hold when $T_n$ is Studentized with a kernel variance estimator.$^4$

The quantity $\vartheta_n^2$ in $\hat{s}_n^2$ is a correction factor analogous to that used by Hall and Horowitz (1996) and Andrews (1999, 2002). It is needed because $\nabla H(\hat{m})' \hat{\Sigma}_n \nabla H(\hat{m})$ is a biased estimator of the variance of the asymptotic distribution of $n^{1/2}[H(\hat{m}) - H(\hat{\mu})]$. The bias converges to zero too slowly to enable the MCB to achieve the asymptotic refinements described in Section 2.3. The correction factor removes the bias without distorting higher-order terms of the Edgeworth expansion of the CDF of $\hat{T}_n$. Assumption 4(v) restricts the tail thickness of the transition density function.$^5$ Assumption 5 specifies convenient forms for $K_f$ and $K_p$, which may be higher-order kernels. Higher-order kernels are used to insure sufficiently rapid convergence of $\|\hat{\kappa} - \kappa\|$.
3.2 Theorems
This section gives theorems that establish conditions under which results (2.3)-(2.5) hold.
The bootstrap is implemented by carrying out steps MCB 1-MCB 4. Let $\delta = \ell/[2\ell + d(q+1)]$.

Theorem 3.1: Let Assumptions 1-6 hold. Then for every $\nu > 0$

(3.2) $\sup_z |\hat{P}(\hat{T}_n \le z) - P(T_n \le z)| = O(n^{-1/2-\delta+\nu})$

and

(3.3) $\sup_z |\hat{P}(|\hat{T}_n| \le z) - P(|T_n| \le z)| = O(n^{-1-\delta+\nu})$

almost surely.

Theorem 3.2: Let Assumptions 1-6 hold. For any $\alpha \in (0,1)$ let $\hat{z}_{n\alpha}$ satisfy $\hat{P}(|\hat{T}_n| > \hat{z}_{n\alpha}) = \alpha$. Then for every $\nu > 0$

(3.4) $P(|T_n| > \hat{z}_{n\alpha}) = \alpha + O(n^{-1-\delta+\nu})$.
Theorems 3.1 and 3.2 imply that results (2.3)-(2.5) hold if $\ell > (1 - 2\varepsilon) d(q+1)/(4\varepsilon)$. With the block bootstrap, the right-hand sides of (3.2)-(3.4) are $O_p(n^{-3/4})$, $O_p(n^{-6/5})$, and $O(n^{-5/4})$, respectively. The errors made by the MCB converge to zero more rapidly than do those of the block bootstrap if $\ell$ is sufficiently large. With the MCB, the right-hand side of (3.2) is $o(n^{-3/4})$ if $\ell > d(q+1)/2$, the right-hand side of (3.3) is $o(n^{-6/5})$ if $\ell > d(q+1)/3$, and the right-hand side of (3.4) is $o(n^{-5/4})$ if $\ell > d(q+1)/2$. However, the errors of the MCB converge more slowly than do those of the block bootstrap if the distribution of $Z_{q+1}$ is not sufficiently smooth ($\ell$ is too small). Moreover, the MCB suffers from a form of the curse of dimensionality in nonparametric estimation. That is, with a fixed value of $\ell$, the accuracy of the MCB decreases as $d$ and $q$ increase. Thus, the MCB, like all nonparametric estimators, is likely to be most attractive in applications where $d$ and $q$ are not large. It is possible that this problem can be mitigated, though at the cost of imposing additional structure on the DGP, through the use of dimension reduction methods. For example, many familiar time series DGP's can be represented as single-index models or nonparametric additive models with a possibly unknown link function. However, investigation of dimension reduction methods for the MCB is beyond the scope of this paper.



4. EXTENSIONS

Section 4.1 extends the results of Section 3 to tests based on GMM estimators. Section
4.2 presents the extension to approximate Markov processes.
4.1 Tests Based on GMM Estimators
This section gives conditions under which (3.2)-(3.4) hold for the t statistic for testing a
hypothesis about a parameter that is estimated by GMM. The main task is to show that the
probability distribution of the GMM t statistic can be approximated with sufficient accuracy by
the distribution of a statistic of the form (2.1). Hall and Horowitz (1996) and Andrews (1999,
2002) use similar approximations to show that the block bootstrap provides asymptotic
refinements for t tests based on GMM estimators.
Denote the sample by $\{X_j : j = 1, \ldots, n\}$. In this section, some components of $X_j$ may be discrete, but there must be at least one continuous component. See Assumption 9 below. Suppose that the GMM moment conditions depend on up to $\tau$ lags of $X_j$. Define $\mathbf{X}_j = (X'_j, \ldots, X'_{j-\tau})'$ for some fixed integer $\tau \ge 0$ and $j \ge \tau + 1$. Let $\mathbf{X}$ denote a random vector that is distributed as $\mathbf{X}_{\tau+1}$. Estimation of the parameter $\theta$ is based on the moment condition $E\, G(\mathbf{X}, \theta) = 0$, where $G$ is a known $L_G \times 1$ function and $L_G \ge L_\theta$. Let $\theta_0$ denote the true but unknown value of $\theta$. Assume that $E\, G(\mathbf{X}_i, \theta_0) G(\mathbf{X}_j, \theta_0)' = 0$ if $|i - j| > M_G$ for some $M_G < \infty$.$^6$
As in Hall and Horowitz (1996) and Andrews (1999, 2002), two forms of the GMM estimator are considered. One uses a fixed weight matrix, and the other uses an estimator of the asymptotically optimal weight matrix. The results stated here can be extended to other forms such as the continuous updating estimator. In the first form, the estimator of $\theta$, $\theta_n$, solves

(4.1) $\min_{\theta \in \Theta} J_n(\theta) \equiv \left[ n^{-1} \sum_{i=\tau+1}^{n} G(\mathbf{X}_i, \theta) \right]' \Omega \left[ n^{-1} \sum_{i=\tau+1}^{n} G(\mathbf{X}_i, \theta) \right]$,

where $\Theta$ is the parameter set and $\Omega$ is an $L_G \times L_G$, positive-semidefinite, symmetrical matrix of constants. In the second form, $\theta_n$ solves

(4.2) $\min_{\theta \in \Theta} J_n(\theta) \equiv \left[ n^{-1} \sum_{i=\tau+1}^{n} G(\mathbf{X}_i, \theta) \right]' \Omega_n(\tilde{\theta}_n)^{-1} \left[ n^{-1} \sum_{i=\tau+1}^{n} G(\mathbf{X}_i, \theta) \right]$,

where $\tilde{\theta}_n$ solves (4.1),

$$\Omega_n(\theta) = n^{-1} \sum_{i=\tau+1}^{n} \left\{ G(\mathbf{X}_i, \theta) G(\mathbf{X}_i, \theta)' + \sum_{j=1}^{M_G} H(\mathbf{X}_i, \mathbf{X}_{i+j}, \theta) \right\},$$

and $H(\mathbf{X}_i, \mathbf{X}_{i+j}, \theta) = G(\mathbf{X}_i, \theta) G(\mathbf{X}_{i+j}, \theta)' + G(\mathbf{X}_{i+j}, \theta) G(\mathbf{X}_i, \theta)'$.$^7$

To obtain the $t$ statistic, let $\Omega_0 = E[\Omega_n(\theta_0)]$. Define the matrices $D = E[\partial G(\mathbf{X}, \theta_0)/\partial \theta]$ and $D_n = n^{-1} \sum_{i=\tau+1}^{n} \partial G(\mathbf{X}_i, \theta_n)/\partial \theta$. Define

(4.3) $\sigma = (D' \Omega D)^{-1} D' \Omega\, \Omega_0\, \Omega D (D' \Omega D)^{-1}$

if $\theta_n$ solves (4.1) and

(4.4) $\sigma = (D' \Omega_0^{-1} D)^{-1}$

if $\theta_n$ solves (4.2). Let $\sigma_n$ be the consistent estimator of $\sigma$ that is obtained by replacing $D$ and $\Omega_0$ in (4.3) and (4.4) by $D_n$ and $\Omega_n(\theta_n)$. In addition, let $\sigma_{n(rr)}$ be the $(r, r)$ component of $\sigma_n$, and let $\theta_{nr}$ and $\theta_{0r}$ be the $r$'th components of $\theta_n$ and $\theta_0$, respectively. The $t$ statistic for testing $H_0: \theta_r = \theta_{0r}$ is $t_{nr} = n^{1/2}(\theta_{nr} - \theta_{0r})/\sigma_{n(rr)}^{1/2}$.

To obtain the MCB version of $t_{nr}$, let $\{\hat{X}_j : j = 1, \ldots, n\}$ be a bootstrap sample that is obtained by carrying out steps MCB 1 and MCB 2 but with the modified transition density estimator that is described in equations (4.5)-(4.7) below. The modified estimator allows some components of $X_j$ to be discrete. Define $\hat{\mathbf{X}}_j = (\hat{X}'_j, \ldots, \hat{X}'_{j-\tau})'$. Let $\hat{\mathbf{X}}$ denote a random vector that is distributed as $\hat{\mathbf{X}}_{\tau+1}$. Let $\hat{E}$ denote the expectation with respect to the distribution induced by bootstrap sampling. Define $\tilde{G}(\hat{\mathbf{X}}, \theta) = G(\hat{\mathbf{X}}, \theta) - \hat{E}\, G(\hat{\mathbf{X}}, \theta_n)$. The bootstrap version of the moment condition $E\, G(\mathbf{X}, \theta) = 0$ is $\hat{E}\, \tilde{G}(\hat{\mathbf{X}}, \theta) = 0$. As in Hall and Horowitz (1996) and Andrews (1999, 2002), the bootstrap version is recentered relative to the population version because, except in special cases, there is no $\theta$ such that $\hat{E}\, G(\hat{\mathbf{X}}, \theta) = 0$ when $L_G > L_\theta$. Brown, et al. (2000) and Hansen (1999) discuss an empirical likelihood approach to recentering. Recentering is unnecessary if $L_G = L_\theta$, but it simplifies the technical analysis and, therefore, is done here.$^8$

To form the bootstrap version of $t_{nr}$, let $\hat{\theta}_n$ denote the bootstrap estimator of $\theta$. Let $\hat{D}_n$ be the quantity that is obtained by replacing $\mathbf{X}_i$ with $\hat{\mathbf{X}}_i$ and $\theta_n$ with $\hat{\theta}_n$ in the expression for $D_n$. Define $\tilde{D}_n = \hat{E}\, \partial \tilde{G}(\hat{\mathbf{X}}_{\tau+1}, \hat{\theta}_n)/\partial \theta$,

$$\hat{\Omega}_n(\theta) = n^{-1} \sum_{i=\tau+1}^{n} \left\{ \tilde{G}(\hat{\mathbf{X}}_i, \theta) \tilde{G}(\hat{\mathbf{X}}_i, \theta)' + \sum_{j=1}^{M_G} \tilde{H}(\hat{\mathbf{X}}_i, \hat{\mathbf{X}}_{i+j}, \theta) \right\},$$

$$\tilde{\Omega}_n = \hat{E} \left\{ \tilde{G}(\hat{\mathbf{X}}_{\tau+1}, \hat{\theta}_n) \tilde{G}(\hat{\mathbf{X}}_{\tau+1}, \hat{\theta}_n)' + \sum_{j=1}^{M_G} \tilde{H}(\hat{\mathbf{X}}_{\tau+1}, \hat{\mathbf{X}}_{\tau+1+j}, \hat{\theta}_n) \right\},$$

and

$$\bar{\Omega}_n = \hat{E} \left\{ \tilde{G}(\hat{\mathbf{X}}_{\tau+1}, \theta_n) \tilde{G}(\hat{\mathbf{X}}_{\tau+1}, \theta_n)' + \sum_{j=1}^{M_G} \tilde{H}(\hat{\mathbf{X}}_{\tau+1}, \hat{\mathbf{X}}_{\tau+1+j}, \theta_n) \right\},$$

where $\tilde{H}$ is the analog of $H$ with $\tilde{G}$ in place of $G$. Define $\hat{\sigma}_n$ by replacing $G$ with $\tilde{G}$, $\mathbf{X}_i$ with $\hat{\mathbf{X}}_i$, $\theta_n$ with $\hat{\theta}_n$, $D_n$ with $\hat{D}_n$, and $\Omega_n(\theta_n)$ with $\hat{\Omega}_n(\hat{\theta}_n)$ in the formula for $\sigma_n$. Let $\Omega^*_n = \Omega$ if $\theta_n$ solves (4.1) and $\Omega^*_n = \hat{\Omega}_n(\hat{\theta}_n)$ if $\theta_n$ solves (4.2). Define

$$\tilde{\sigma}_n = (\tilde{D}'_n \Omega^*_n \tilde{D}_n)^{-1} \tilde{D}'_n \Omega^*_n\, \tilde{\Omega}_n\, \Omega^*_n \tilde{D}_n (\tilde{D}'_n \Omega^*_n \tilde{D}_n)^{-1},$$

$$\bar{\sigma}_n = (\tilde{D}'_n \Omega^*_n \tilde{D}_n)^{-1} \tilde{D}'_n \Omega^*_n\, \bar{\Omega}_n\, \Omega^*_n \tilde{D}_n (\tilde{D}'_n \Omega^*_n \tilde{D}_n)^{-1},$$

and $\vartheta_{nr}^2 = \tilde{\sigma}_{n(rr)}/\bar{\sigma}_{n(rr)}$, where $\tilde{\sigma}_{n(rr)}$ and $\bar{\sigma}_{n(rr)}$ are the $(r, r)$ components of $\tilde{\sigma}_n$ and $\bar{\sigma}_n$, respectively. Let $\hat{\theta}_{nr}$ denote the $r$'th component of $\hat{\theta}_n$. Then the MCB version of the $t$ statistic is $\hat{t}_{nr} = n^{1/2} \vartheta_{nr} (\hat{\theta}_{nr} - \theta_{nr})/\hat{\sigma}_{n(rr)}^{1/2}$. The quantity $\vartheta_{nr}$ is a correction factor analogous to that used by Hall and Horowitz (1996) and Andrews (1999, 2002).
Now let $V(\mathbf{X}_j, \theta)$ ($j = \tau+1, \ldots, n$) be the vector containing the unique components of $G(\mathbf{X}_j, \theta)$, $G(\mathbf{X}_j, \theta) G(\mathbf{X}_{i+j}, \theta)'$ ($0 \le i \le M_G$), and the derivatives through order 6 of $G(\mathbf{X}_j, \theta)$ and $G(\mathbf{X}_j, \theta) G(\mathbf{X}_{i+j}, \theta)'$. Let $S_X$ denote the support of $(X'_1, \ldots, X'_{\tau+1})'$. Define $p_y$, $p_z$, and $f$ as in Sections 2-3 but with counting measure as the dominating measure for discrete components of $X_j$. The following new assumptions are used to derive the results of this section. Assumptions 7-9 are similar to ones made by Hall and Horowitz (1996) and Andrews (1999, 2002).
Assumption 7: $\theta_0$ is an interior point of the compact parameter set $\Theta$ and is the unique solution in $\Theta$ to the equation $E\, G(\mathbf{X}, \theta) = 0$.

Assumption 8: (i) There are finite constants $C_G$ and $C_V$ such that $\|G(\mathbf{X}_{\tau+1}, \theta)\| \le C_G$ and $\|V(\mathbf{X}_{\tau+1}, \theta)\| \le C_V$ for all $\mathbf{X}_{\tau+1} \in S_X$ and $\theta \in \Theta$. (ii) $E\, G(\mathbf{X}_{\tau+1}, \theta_0) G(\mathbf{X}_{\tau+1+j}, \theta_0)' = 0$ if $j > M_G$ for some $M_G < \infty$. (iii) $E\{G(\mathbf{X}_{\tau+1}, \theta) G(\mathbf{X}_{\tau+1}, \theta)' + \sum_{j=1}^{M_G} [G(\mathbf{X}_{\tau+1}, \theta) G(\mathbf{X}_{\tau+1+j}, \theta)' + G(\mathbf{X}_{\tau+1+j}, \theta) G(\mathbf{X}_{\tau+1}, \theta)']\}$ exists for all $\theta \in \Theta$. Its smallest eigenvalue is bounded away from 0 uniformly over $\theta$ in an open sphere, $N_0$, centered on $\theta_0$. (iv) There is a bounded function $C_G(\cdot)$ such that $\|G(\mathbf{X}_{\tau+1}, \theta_1) - G(\mathbf{X}_{\tau+1}, \theta_2)\| \le C_G(\mathbf{X}_{\tau+1}) \|\theta_1 - \theta_2\|$ for all $\mathbf{X}_{\tau+1} \in S_X$ and $\theta_1, \theta_2 \in \Theta$. (v) $G$ is 6-times continuously differentiable with respect to the components of $\theta$ everywhere in $N_0$. (vi) There is a bounded function $C_V(\cdot)$ such that $\|V(\mathbf{X}_{\tau+1}, \theta_1) - V(\mathbf{X}_{\tau+1}, \theta_2)\| \le C_V(\mathbf{X}_{\tau+1}) \|\theta_1 - \theta_2\|$ for all $\mathbf{X}_{\tau+1} \in S_X$ and $\theta_1, \theta_2 \in \Theta$.

Assumption 9: (i) $X_j$ ($j = 1, \ldots, n$) can be partitioned as $X_j = (X_j^{(c)}, X_j^{(d)})$, where $X_j^{(c)} \in \mathbb{R}^d$ for some $d \ge 1$, the distribution of $X_j^{(c)}$ is absolutely continuous with respect to Lebesgue measure, and the distribution of $X_j^{(d)}$ is discrete with finitely many mass points. There need not be any discrete components of $X_j$, but there must be at least one continuous component. (ii) The functions $p_y$, $p_z$, and $f$ are bounded. (iii) For some integer $\ell \ge 2$, $p_y$ and $p_z$ are everywhere at least $\ell$ times continuously differentiable with respect to any mixture of their continuous arguments.

Assumption 10: Assumptions 2 and 4 hold with $V(\mathbf{X}_j, \theta_0)$ in place of $X_j$.

As in Sections 2-3, $\{\hat{X}_j : j = 1, \ldots, n\}$ in the MCB for GMM is a realization of the stochastic process induced by a nonparametric estimator of the Markov transition density. If $X_j$ has no discrete components, then the density estimator is (2.2) and MCB samples are generated by carrying out steps MCB 1 and MCB 2 of Section 2.2. A modified transition density estimator is needed if $X_j$ has one or more discrete components. Let $Y_j = (Y_j^{(c)}, Y_j^{(d)})$ be the partition of $Y_j$ into continuous and discrete components. The modified density estimator is

(4.5) $f_n(x \mid y) = g_n(x, y)/p_{ny}(y)$,

where

(4.6) $g_n(x, y) = \dfrac{1}{(n-q) h_n^{\tilde{d}(q+1)}} \displaystyle\sum_{j=q+1}^{n} K_f\!\left(\dfrac{x^{(c)} - X_j^{(c)}}{h_n}, \dfrac{y^{(c)} - Y_j^{(c)}}{h_n}\right) I(x^{(d)} = X_j^{(d)})\, I(y^{(d)} = Y_j^{(d)})$,

$\tilde{d} = \dim(X_j^{(c)})$, and

(4.7) $p_{ny}(y) = \dfrac{1}{(n-q) h_n^{\tilde{d} q}} \displaystyle\sum_{j=q+1}^{n} K_p\!\left(\dfrac{y^{(c)} - Y_j^{(c)}}{h_n}\right) I(y^{(d)} = Y_j^{(d)})$.

The result of this section is given by the following theorem.

Theorem 4.1: Let Assumptions 1, 4(i), 4(v), and 5-10 hold. For any $\alpha \in (0,1)$ let $\hat{z}_{n\alpha}$ satisfy $\hat{P}(|\hat{t}_{nr}| > \hat{z}_{n\alpha}) = \alpha$. Then (3.2)-(3.4) hold with $t_{nr}$ and $\hat{t}_{nr}$ in place of $T_n$ and $\hat{T}_n$.
4.2 Approximate Markov Processes
This section extends the results of Section 3.2 to approximate Markov processes. As in Sections 2-3, the objective is to carry out inference based on the statistic $T_n$ defined in (2.1). For an arbitrary random vector $V$, let $p(x \mid v)$ denote the conditional probability density of $X_j$ at $x$ and $V = v$. An approximate Markov process is defined to be a stochastic process that satisfies the following assumption.

Assumption AMP: (i) $\{X_j : j = 0, \pm 1, \pm 2, \ldots;\ X_j \in \mathbb{R}^d\}$ is strictly stationary and GSM. (ii) For some finite $b > 0$, integer $q_0 > 0$, all finite $j$, and all $q \ge q_0$,

$$\sup_{x_j, x_{j-1}, \ldots} |p(x_j \mid x_{j-1}, x_{j-2}, \ldots) - p(x_j \mid x_{j-1}, \ldots, x_{j-q})| < e^{-bq}.$$

Assumption AMP is satisfied, for example, by the MA(1) process $X_j = U_j + \beta U_{j-1}$, where $|\beta| < 1$ and $\{U_j\}$ is iid with mean zero and bounded support.$^9$

The MCB for an approximate Markov process (hereinafter abbreviated AMCB) is the same as the MCB except that the order $q$ of the estimated Markov process (2.2) increases at the rate $(\log n)^2$ as $n \to \infty$. The estimated transition density (2.2) is calculated as if the data were generated by a true Markov process of order $q \propto (\log n)^2$. The AMCB is implemented by carrying out steps MCB 1-MCB 4 with the resulting estimated transition density. Because $q$ increases very slowly as $n \to \infty$, a large value of $q$ is not necessarily required to obtain good finite-sample performance with the AMCB. Section 5 provides an illustration.
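For example, a rule of the form $q = \max\{q_{\min}, \mathrm{round}[c(\log n)^2]\}$ grows very slowly in $n$; in the Python sketch below, the constant `c` is a user choice, not a value specified by the paper.

```python
import math

def amcb_order(n, c=0.1, q_min=1):
    """Order of the approximating Markov process, proportional to (log n)^2."""
    return max(q_min, round(c * math.log(n) ** 2))

# With c = 0.1: amcb_order(50) == 2 and amcb_order(10**6) == 19, so the
# order grows slowly and roughly matches q = 2 used for n = 50 in Section 5.
```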
To formalize the properties of the AMCB, let $\{X_j^{(q)}\}$ be the Markov process that is induced by the relation

(4.8) $P(X_j^{(q)} \le x_j \mid X_{j-1}^{(q)} = x_{j-1}, \ldots, X_{j-q}^{(q)} = x_{j-q}) = P(X_j \le x_j \mid X_{j-1} = x_{j-1}, \ldots, X_{j-q} = x_{j-q})$.

Define $Y_j^{(q)} = (X_{j-1}^{(q)}, \ldots, X_{j-q}^{(q)})'$ and $Z_{q+1}^{(q)} \equiv (X_{q+1}^{(q)}, Y_{q+1}^{(q)\prime})'$. Let $p_y^{(q)}$ and $p_z^{(q)}$, respectively, denote the probability density functions of $Y_{q+1}^{(q)}$ and $Z_{q+1}^{(q)}$. Let $f^{(q)}$ denote the probability density function of $X_j^{(q)}$ conditional on $Y_j^{(q)}$. Let $f_j$ denote the probability density of $X_j$ conditional on $X_{j-1}, X_{j-2}, \ldots$. To accommodate a Markov process whose order increases with increasing $n$, Assumptions 2, 5 and 6 are modified as follows. In these assumptions, $q$ increases as $n \to \infty$ in such a way that $q/(\log n)^2 \to c$ for some finite constant $c > 0$, and $\{\ell_n\}$ is a sequence of positive, even integers satisfying $\ell_n \to \infty$ and $d(q+1)/\ell_n \to 0$ as $n \to \infty$.
Assumption 2′: For some finite integer $n_0$ and each $n \ge n_0$: (i) The distribution of $Z_{q+1}^{(q)}$ is absolutely continuous with respect to Lebesgue measure. (ii) For $t \in \mathbb{R}^d$ and each $k$ such that $0 < k \le q$,

(3.1) $\limsup_{\|t\| \to \infty} E\,|E[\exp(\iota t' X_j^{(q)}) \mid X_{j'}^{(q)},\ |j - j'| \le k,\ j' \ne j]| < 1$.

(iii) The functions $p_y^{(q)}$, $p_z^{(q)}$, $f^{(q)}$, and $f$ are bounded. (iv) The functions $p_y^{(q)}$ and $p_z^{(q)}$ are everywhere $\ell_n$ times continuously differentiable with respect to any mixture of their arguments.

For each positive integer $n$, let $K_n$ be a bounded, continuous function whose support is $[-1, 1]$, that is symmetrical about 0, and that satisfies

$$\int_{-1}^{1} v^j K_n(v)\, dv = \begin{cases} 1 & \text{if } j = 0 \\ 0 & \text{if } 1 \le j < \ell_n \\ B_{K_n} \text{ (nonzero)} & \text{if } j = \ell_n. \end{cases}$$

Assumption 5′: Let $v_f \in \mathbb{R}^{d(q+1)}$ and $v_p \in \mathbb{R}^{dq}$. For each finite integer $n$, $K_f$ and $K_p$ have the forms

$$K_f(v_f) = \prod_{j=1}^{d(q+1)} K_n(v_f^{(j)}); \quad K_p(v_p) = \prod_{j=1}^{dq} K_n(v_p^{(j)}).$$

Assumption 6′: (i) Let $\eta = 1/[2\ell_n + d(q+1)]$. Then $h_n = c_h n^{-\eta}$ for some finite constant $c_h > 0$. (ii) $n h_n^{d(q+1)} \lambda_n^2/(\log n)^2 \to \infty$ as $n \to \infty$.
The main difference between these assumptions and Assumptions 2, 5, and 6 of the MCB is the strengthened smoothness Assumption 2′(iv). The MCB for an order $q$ Markov process provides asymptotic refinements whenever $p_y$ and $p_z$ have derivatives of fixed order $\ell > d(q+1)$, whereas the AMCB requires the existence of derivatives of all orders as $n \to \infty$. The ability of the AMCB to provide asymptotic refinements is established by the following theorem.

Theorem 4.2: Let Assumptions AMP, 2′, 3, 4, 5′, and 6′ hold with $q/(\log n)^2 \to c$ for some finite $c > 0$. For any $\alpha \in (0,1)$ let $\hat{z}_{n\alpha}$ satisfy $\hat{P}(|\hat{T}_n| > \hat{z}_{n\alpha}) = \alpha$. Then for every $\nu > 0$

$$\sup_z |\hat{P}(\hat{T}_n \le z) - P(T_n \le z)| = O(n^{-1+\nu}),$$

$$\sup_z |\hat{P}(|\hat{T}_n| \le z) - P(|T_n| \le z)| = O(n^{-3/2+\nu})$$

almost surely, and

$$P(|T_n| > \hat{z}_{n\alpha}) = \alpha + O(n^{-3/2+\nu}).$$
5. MONTE CARLO EXPERIMENTS
This section describes four Monte Carlo experiments that illustrate the numerical
performance of the MCB. The number of experiments is small because the computations are very
lengthy.
Each experiment consists of testing the hypothesis $H_0$ that the slope coefficient is zero in the regression of $X_j$ on $X_{j-1}$. The coefficient is estimated by ordinary least squares (OLS), and acceptance or rejection of $H_0$ is based on the OLS $t$ statistic. The experiments evaluate the empirical rejection probabilities of one-sided and symmetrical tests at the nominal 0.05 level. Results are reported using critical values obtained from the MCB, the block bootstrap, and first-order asymptotic distribution theory. Four DGPs are used in the experiments. Two are the ARCH(1) processes

(5.1) $X_j = U_j (1 + 0.3 X_{j-1}^2)^{1/2}$,
where $\{U_j\}$ is an iid sequence that has either the $N(0,1)$ distribution or the distribution with

(5.2) $P(U_j \le u) = 0.5[\sin^7(\pi u/2) + 1]\, I(|u| \le 1)$.

The other two DGPs are the GARCH(1,1) processes

(5.3) $X_j = U_j h_j^{1/2}$,

where

(5.4) $h_j = 1 + 0.4(h_{j-1} + X_{j-1}^2)$

and $\{U_j\}$ is an iid sequence with either the $N(0,1)$ distribution or the distribution of (5.2). DGP (5.1) is a first-order Markov process. DGP (5.3)-(5.4) is an approximate Markov process. In the experiments reported here, this DGP is approximated by a Markov process of order $q = 2$. When $U_j$ has the admittedly somewhat artificial distribution (5.2), $X_j$ has bounded support as required by Assumption 4. When $U_j \sim N(0,1)$, $X_j$ has unbounded support and $X_j^2$ has moments only through orders 8 and 4 for models (5.1) and (5.3)-(5.4), respectively (He and Teräsvirta 1999). Therefore, the experiments with $U_j \sim N(0,1)$ illustrate the performance of the MCB under conditions that are considerably weaker than those of the formal theory.
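The experiments were run in GAUSS; the following Python re-implementation of the DGPs is an illustrative sketch. Innovations with the CDF (5.2), as reconstructed above, are drawn by the inverse-CDF method.

```python
import numpy as np

def draw_innovations(n, dist, rng):
    """Iid innovations: standard normal or the bounded distribution (5.2)."""
    if dist == "normal":
        return rng.standard_normal(n)
    p = rng.uniform(size=n)
    s = 2.0 * p - 1.0
    # Invert (5.2): P(U <= u) = 0.5[sin^7(pi*u/2) + 1] on [-1, 1].
    return (2.0 / np.pi) * np.arcsin(np.sign(s) * np.abs(s) ** (1.0 / 7.0))

def simulate_arch1(n, dist="normal", burn=200, rng=None):
    """DGP (5.1): X_j = U_j (1 + 0.3 X_{j-1}^2)^{1/2}."""
    rng = np.random.default_rng(rng)
    u = draw_innovations(n + burn, dist, rng)
    x = np.zeros(n + burn)
    for j in range(1, n + burn):
        x[j] = u[j] * np.sqrt(1.0 + 0.3 * x[j - 1] ** 2)
    return x[burn:]

def simulate_garch11(n, dist="normal", burn=200, rng=None):
    """DGP (5.3)-(5.4): X_j = U_j h_j^{1/2}, h_j = 1 + 0.4(h_{j-1} + X_{j-1}^2)."""
    rng = np.random.default_rng(rng)
    u = draw_innovations(n + burn, dist, rng)
    x, h = np.zeros(n + burn), np.ones(n + burn)
    for j in range(1, n + burn):
        h[j] = 1.0 + 0.4 * (h[j - 1] + x[j - 1] ** 2)
        x[j] = u[j] * np.sqrt(h[j])
    return x[burn:]
```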
The MCB was carried out using the 4th-order kernel

(5.5) $K(v) = (105/64)(1 - 5v^2 + 7v^4 - 3v^6)\, I(|v| \le 1)$.

Implementation of the MCB requires choosing the bandwidth parameter $h_n$. Preliminary experiments showed that the Monte Carlo results are not highly sensitive to the choice of $h_n$, so a simple method motivated by Silverman's (1986) rule-of-thumb is used. This consists of setting $h_n$ equal to the asymptotically optimal bandwidth for estimating the $(q+1)$-variate normal density $N(0, \sigma_n^2 I_{q+1})$, where $I_{q+1}$ is the $(q+1) \times (q+1)$ identity matrix and $\sigma_n^2$ is the estimated variance of $X_1$. Of course, there is no reason to believe that this is optimal in any sense in the MCB setting. The preliminary experiments also indicated that the Monte Carlo results are insensitive to the choice of trimming parameter, so trimming was not carried out in the experiments reported here.
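A sketch of the kernel (5.5) and the rule-of-thumb bandwidth in Python (illustrative; the exact Silverman-type proportionality constant used in the paper is not reported, so `c` below is an assumption):

```python
import numpy as np

def kernel_4th_order(v):
    """The 4th-order kernel (5.5)."""
    v = np.asarray(v, dtype=float)
    return (105.0 / 64.0) * (1 - 5 * v**2 + 7 * v**4 - 3 * v**6) * (np.abs(v) <= 1)

def rule_of_thumb_bandwidth(x, q, c=1.0):
    """Bandwidth h_n = c * sigma_n * n^{-1/(q+9)}.

    The exponent is the rate of Assumption 6 with ell = 4 and d = 1,
    i.e., 1/[2*4 + (q+1)]; sigma_n is the estimated std of X.
    """
    n = len(x)
    sigma = np.std(x, ddof=1)
    return c * sigma * n ** (-1.0 / (2 * 4 + (q + 1)))
```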
Implementation of the block bootstrap requires selecting the block length. Data-based methods for selecting block lengths in hypothesis testing are not available, so results are reported here for three different block lengths (2, 5, and 10). The experiments were carried out in GAUSS using GAUSS random number generators. The sample size is $n = 50$. There are 5000 Monte Carlo replications in each experiment. MCB and block bootstrap critical values are based on 99 bootstrap samples.$^{10}$
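A skeleton of one Monte Carlo replication in Python (illustrative; `simulate` and `bootstrap_series` stand for hypothetical DGP and resampling helpers such as those sketched earlier):

```python
import numpy as np

def ols_t_stat(x):
    """OLS t statistic for the slope in a regression of X_j on X_{j-1}."""
    y, z = x[1:], x[:-1]
    zc = z - z.mean()
    beta = zc @ y / (zc @ zc)
    resid = y - y.mean() - beta * zc
    se = np.sqrt(resid @ resid / (len(y) - 2) / (zc @ zc))
    return beta / se

def one_replication(simulate, bootstrap_series, n=50, n_boot=99, alpha=0.05,
                    rng=None):
    rng = np.random.default_rng(rng)
    x = simulate(n, rng=rng)
    t = ols_t_stat(x)
    t_boot = np.array([ols_t_stat(bootstrap_series(x, rng=rng))
                       for _ in range(n_boot)])
    crit = np.quantile(np.abs(t_boot), 1 - alpha)  # symmetrical critical value
    return np.abs(t) >= crit                       # 1 if H0 is rejected
```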

The results of the experiments are shown in Table 1. The differences between the
empirical and nominal rejection probabilities (ERP’s) with first-order asymptotic critical values
tend to be large. The symmetrical and lower-tail tests reject the null hypothesis too often. The
upper-tail test does not reject the null hypothesis often enough when the innovations have the
distribution (5.2). The ERP’s with block bootstrap critical values are sensitive to the block
length. With some block lengths, the ERP’s are small, but with others they are comparable to or
larger than the ERP’s with asymptotic critical values. With the MCB, the ERP’s are smaller than
they are with asymptotic critical values in 10 of the 12 experiments. The MCB has relatively
large ERP’s with the GARCH(1,1) model and normal innovations because this DGP lacks the
higher-order moments needed to obtain good accuracy with the bootstrap even with iid data.
6. CONCLUSIONS
The block bootstrap is the best known method for implementing the bootstrap with time
series data when one does not have a parametric model that reduces the DGP to simple random
sampling. However, the errors made by the block bootstrap converge to zero only slightly faster

than those made by first-order asymptotic approximations. This paper has shown that the errors
made by the MCB converge to zero more rapidly than those made by the block bootstrap if the
DGP is a Markov or approximate Markov process and certain other conditions are satisfied.
These conditions are stronger than those required by the block bootstrap. Therefore, the MCB is
not a substitute for the block bootstrap, but the MCB is an attractive alternative to the block
bootstrap when the MCB’s stronger regularity conditions are satisfied. Further research could
usefully investigate the possibility of developing bootstrap methods that are more accurate than
the block bootstrap but impose less a priori structure on the DGP than do the MCB or the sieve
bootstrap for linear processes.



FOOTNOTES
1
Statistics with asymptotic chi-square distributions are not treated explicitly in this paper.
However, the arguments made here can be extended to show that asymptotic chi-square statistics
based on GMM estimators behave like symmetrical statistics. See, for example, the discussion of
the GMM test of overidentifying restrictions in Hall and Horowitz (1996) and Andrews (1999).
Under regularity conditions similar to those of Sections 3 and 4, the results stated here for
symmetrical probabilities and tests apply to asymptotic chi-square statistics based on GMM
estimators. Andrews (1999) defines a class of "minimum ρ estimators" that is closely related to GMM estimators. The results here also apply to t tests based on minimum ρ estimators.
2
GSM means that the process is strongly mixing with a mixing parameter that is an
exponentially decreasing function of the lag length.


3
If $Y$ has finitely many moments and the required Edgeworth expansions are valid, then the errors made by the MCB increase as the number of moments of $Y$ decreases. If $Y$ has too few moments, then the errors made by the MCB converge to zero more slowly than the errors made by the block bootstrap.
4
Götze and Künsch (1996) have given conditions under which T with a kernel-type variance
estimator has an Edgeworth expansion up to
On . Analogous conditions are not yet known
for expansions through
. A recent paper by Inoue and Shintani (2000) gives expansions
for statistics based on the block bootstrap for linear models with kernel-type variance estimators.
n
1/2
()

1
()On

5
As an example of a sufficient condition for 4(v), let $X$ be scalar with support $[0, 1]$. Then 4(v) holds if there are finite constants $c_1 > 0$, $c_2 > 0$, $\eta > 0$, and $\varepsilon > 0$ such that $c_1 x_q^\eta \le p_y(x_1, \ldots, x_q) \le c_2 x_q^\eta$ when $x_q < \varepsilon$ and $c_1 (1 - x_q)^\eta \le p_y(x_1, \ldots, x_q) \le c_2 (1 - x_q)^\eta$ when $1 - x_q < \varepsilon$. The generalization to multivariate $X$ is analytically straightforward though notationally cumbersome.

6
This assumption is analogous to Assumption 4(iii) in Section 3.1 and is made for the same
reason. The discussion of Assumption 4(iii) applies to the current assumption.



7
As in Section 3, $\Omega_n$ is used instead of a kernel-type covariance matrix estimator to insure that the test statistics of interest can be approximated with sufficient accuracy by smooth functions of sample moments. This is not possible with a kernel covariance estimator.

8
The formulae required when $L_G = L_\theta$ depend on whether recentering is used. The formulae given here apply only with recentering.
9
As in Section 3, the assumption of bounded support can be relaxed at the cost of additional
technical complexity if the required Edgeworth expansions are valid.

10
Bootstrap samples were generated by applying the inverse-distribution method to a fine grid
of points. The computations are slow because the transition probability from each sampled grid
point to every other point must be estimated nonparametrically.


MATHEMATICAL APPENDIX
This Appendix presents the proofs of the theorems stated in the text. To minimize the complexity of the notation, the proofs are given only for $d = 1$. The proofs for $d \ge 2$ are similar but require notation that is more complex and lengthy.

A.1 Preliminary Lemmas

This section states lemmas that are used to prove Theorems 3.1 and 3.2. Lemmas 1-11 establish properties of a truncated version of the DGP. Lemmas 12-14 establish properties of the bootstrap DGP. The main result is Lemma 14, which establishes the rate of convergence of $\|\hat{\kappa} - \kappa\|$. This result is used in Section A.2 to prove Theorems 3.1 and 3.2. See Section 2.3 for an informal outline. Assumptions 1-6 hold throughout. "Almost surely" is abbreviated "a.s."
Define $C^*_n = \{x_j : (x_j, \ldots, x_{j-q+1}) \in C_n\}$. A tilde over a probability density function (e.g., $\tilde{f}$) denotes the density of a truncated random variable whose density function without truncation is $f$. Define $f_n(x \mid y)$ and $p_{ny}(y)$ as in (2.2). The transition density of the bootstrap DGP is

$$\tilde{f}_n(x_j \mid y_j) = \frac{f_n(x_j \mid y_j)\, I[(x_j, \ldots, x_{j-q+1}) \in C_n]}{\Pi_n(C^*_n \mid x_{j-1}, \ldots, x_{j-q})},$$

where

$$\Pi_n(C^*_n \mid y_j) = \int_{C^*_n} f_n(x \mid y_j)\, dx.$$

The initial bootstrap observation is sampled from the distribution whose density is

$$\tilde{p}_{ny}(x_1, \ldots, x_q) = \frac{p_{ny}(x_1, \ldots, x_q)\, I[(x_1, \ldots, x_q) \in C_n]}{\Pi_n(C_n)},$$

where

$$\Pi_n(C_n) = \int_{C_n} p_{ny}(x_1, \ldots, x_q)\, dx_1 \cdots dx_q.$$

Conditional on the data $\{X_i : i = 1, \ldots, n\}$, define

$$\tilde{f}(x_j \mid y_j) = \frac{f(x_j \mid y_j)\, I(x_j \in C^*_n)}{\Pi(C^*_n \mid y_j)},$$

where

$$\Pi(C^*_n \mid y_j) = \int_{C^*_n} f(x \mid y_j)\, dx.$$