Peter Bartlett
Last lecture: Yule-Walker estimation.

This lecture:
1. Maximum likelihood estimation
2. Computational simplifications: un/conditional least squares
3. Diagnostics
4. Model selection
Suppose that $X_1, X_2, \ldots, X_n$ is drawn from a zero-mean Gaussian ARMA(p,q) process. The likelihood of parameters $\phi \in \mathbb{R}^p$, $\theta \in \mathbb{R}^q$, $\sigma_w^2 \in \mathbb{R}^+$ is defined as the density of $X = (X_1, X_2, \ldots, X_n)'$ under the Gaussian model with those parameters:
$$L(\phi, \theta, \sigma_w^2) = \frac{1}{(2\pi)^{n/2}\,|\Gamma_n|^{1/2}} \exp\left( -\frac{1}{2} X' \Gamma_n^{-1} X \right),$$
where $|A|$ denotes the determinant of a matrix $A$, and $\Gamma_n$ is the variance/covariance matrix of $X$ with the given parameter values.
We can simplify the likelihood by expressing it in terms of the innovations. Since the innovations are linear in previous and current values, we can write
$$\underbrace{\begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix}}_{X} = C \underbrace{\begin{pmatrix} X_1 - X_1^0 \\ \vdots \\ X_n - X_n^{n-1} \end{pmatrix}}_{U},$$
where C is a lower triangular matrix with ones on the diagonal.
Take the variance/covariance of both sides to see that
$$\Gamma_n = C D C', \qquad \text{where } D = \operatorname{diag}(P_1^0, \ldots, P_n^{n-1}),$$

since the innovations are uncorrelated, with variances $P_i^{i-1} = E(X_i - X_i^{i-1})^2$. Thus $|\Gamma_n| = |C|^2 P_1^0 \cdots P_n^{n-1} = P_1^0 \cdots P_n^{n-1}$ (as $|C| = 1$) and

$$X'\Gamma_n^{-1}X = U'C'(CDC')^{-1}CU = U'C'(C')^{-1}D^{-1}C^{-1}CU = U'D^{-1}U.$$
So we can rewrite the likelihood as
$$L(\phi, \theta, \sigma_w^2) = \frac{1}{\left((2\pi)^n\, P_1^0 \cdots P_n^{n-1}\right)^{1/2}} \exp\left(-\frac{1}{2}\sum_{i=1}^n \frac{(X_i - X_i^{i-1})^2}{P_i^{i-1}}\right)$$
$$= \frac{1}{\left((2\pi\sigma_w^2)^n\, r_1^0 \cdots r_n^{n-1}\right)^{1/2}} \exp\left(-\frac{S(\phi,\theta)}{2\sigma_w^2}\right),$$
where $r_i^{i-1} = P_i^{i-1}/\sigma_w^2$ and

$$S(\phi, \theta) = \sum_{i=1}^n \frac{\left(X_i - X_i^{i-1}\right)^2}{r_i^{i-1}}.$$
The log likelihood of $\phi, \theta, \sigma_w^2$ is

$$l(\phi, \theta, \sigma_w^2) = \log L(\phi, \theta, \sigma_w^2) = -\frac{n}{2}\log(2\pi\sigma_w^2) - \frac{1}{2}\sum_{i=1}^n \log r_i^{i-1} - \frac{S(\phi,\theta)}{2\sigma_w^2}.$$
Differentiating with respect to $\sigma_w^2$ shows that the MLE $(\hat\phi, \hat\theta, \hat\sigma_w^2)$ satisfies

$$\frac{n}{2\hat\sigma_w^2} = \frac{S(\hat\phi, \hat\theta)}{2\hat\sigma_w^4} \quad\Leftrightarrow\quad \hat\sigma_w^2 = \frac{S(\hat\phi, \hat\theta)}{n},$$

and, substituting this back into the log likelihood (profiling out $\sigma_w^2$), that $\hat\phi, \hat\theta$ minimize

$$\log\left(\frac{S(\phi, \theta)}{n}\right) + \frac{1}{n}\sum_{i=1}^n \log r_i^{i-1}.$$
Minimization is done numerically (e.g., Newton-Raphson).
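To make this concrete, here is a minimal sketch in Python (illustrative names and a simulated series, not code from the lecture) of exact Gaussian MLE for an AR(1) process. For AR(1), the one-step predictions and their mean squared errors have closed forms: $X_1^0 = 0$ with $P_1^0 = \sigma_w^2/(1-\phi^2)$, and $X_t^{t-1} = \phi X_{t-1}$ with $P_t^{t-1} = \sigma_w^2$ for $t \ge 2$, so $r_1^0 = 1/(1-\phi^2)$ and $r_t^{t-1} = 1$. Rather than Newton-Raphson, the sketch uses scipy's bounded scalar minimizer on the profile criterion above.

```python
# Exact Gaussian MLE for AR(1) via the innovations form of the likelihood.
# A minimal sketch, assuming the closed-form AR(1) predictions noted above.
import numpy as np
from scipy.optimize import minimize_scalar

def ar1_criterion(phi, x):
    """log(S(phi)/n) + (1/n) * sum_i log r_i^{i-1}, minimized by the MLE."""
    n = len(x)
    # S(phi): squared innovations weighted by 1/r_i^{i-1}.
    S = (1 - phi**2) * x[0]**2 + np.sum((x[1:] - phi * x[:-1])**2)
    return np.log(S / n) - np.log(1 - phi**2) / n

# Simulate an AR(1) series with phi = 0.6, sigma_w = 1 (illustration only).
rng = np.random.default_rng(0)
n, phi_true = 500, 0.6
x = np.empty(n)
x[0] = rng.standard_normal() / np.sqrt(1 - phi_true**2)  # stationary start
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.standard_normal()

res = minimize_scalar(ar1_criterion, bounds=(-0.99, 0.99), args=(x,),
                      method='bounded')
phi_hat = res.x
S_hat = (1 - phi_hat**2) * x[0]**2 + np.sum((x[1:] - phi_hat * x[:-1])**2)
sigma2_hat = S_hat / n   # from the MLE relation sigma_hat^2 = S(phi,theta)/n
print(phi_hat, sigma2_hat)
```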
Computational simplifications:
• Unconditional least squares: drop the $\log r_i^{i-1}$ terms.
• Conditional least squares: also approximate the computation of $X_i^{i-1}$ by dropping initial terms in $S$. For example, for an AR(2), all but the first two terms in $S$ depend linearly on $\phi_1, \phi_2$, so we have a least squares problem.
The differences diminish as the sample size increases. For example, $P_t^{t-1} \to \sigma_w^2$, so $r_t^{t-1} \to 1$, and thus $n^{-1}\sum_{i=1}^n \log r_i^{i-1} \to 0$.
For an ARMA(p,q) process, the MLE and un/conditional least squares estimators satisfy

$$\begin{pmatrix} \hat\phi - \phi \\ \hat\theta - \theta \end{pmatrix} \sim AN\left(0, \; \frac{\sigma_w^2}{n}\begin{pmatrix} \Gamma_{\phi\phi} & \Gamma_{\phi\theta} \\ \Gamma_{\theta\phi} & \Gamma_{\theta\theta} \end{pmatrix}^{-1}\right),$$

where

$$\begin{pmatrix} \Gamma_{\phi\phi} & \Gamma_{\phi\theta} \\ \Gamma_{\theta\phi} & \Gamma_{\theta\theta} \end{pmatrix} = \operatorname{Cov}\big((X, Y), (X, Y)\big),$$

with $X = (X_1, \ldots, X_p)'$ from the autoregression $\phi(B)X_t = W_t$, and $Y = (Y_1, \ldots, Y_q)'$ from the autoregression $\theta(B)Y_t = W_t$.
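For example, for an AR(1) process (a standard specialization of the statement above), $\Gamma_{\phi\phi} = \operatorname{Var}(X_1) = \sigma_w^2/(1-\phi^2)$, so $\hat\phi \sim AN\big(\phi,\ (1-\phi^2)/n\big)$, giving the approximate 95% confidence interval $\hat\phi \pm 1.96\sqrt{(1-\hat\phi^2)/n}$.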
The overall strategy for building ARMA models (sketched in code after the list):
1. Plot the time series.
Look for trends, seasonal components, step changes, outliers.
2. Nonlinearly transform data, if necessary.
3. Identify preliminary values of p and q.
4. Estimate parameters.
5. Use diagnostics to confirm residuals are white/iid/normal.
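A minimal sketch of these steps, assuming the statsmodels library and a series `x` stored as a 1-d numpy array (the ARMA(2,1) order is only a placeholder):

```python
# Sketch of the model-building workflow (assumes statsmodels is installed;
# x is a 1-d numpy array holding the observed, possibly transformed, series).
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA

# Steps 1-3: plot the series and its sample ACF/PACF to pick trial p and q.
plt.plot(x)
plot_acf(x)
plot_pacf(x)

# Step 4: estimate parameters by Gaussian maximum likelihood.
fit = ARIMA(x, order=(2, 0, 1)).fit()   # ARMA(2,1); d = 0: no differencing
print(fit.summary())

# Step 5: diagnostics -- standardized residuals, histogram, Q-Q plot, ACF.
fit.plot_diagnostics()
plt.show()
```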
How do we check that a model fits well?
The residuals (innovations, $x_t - x_t^{t-1}$) should be white.
Consider the standardized innovations,

$$e_t = \frac{x_t - \hat{x}_t^{t-1}}{\sqrt{\hat{P}_t^{t-1}}}.$$
This should behave like a mean-zero, unit variance, iid sequence.
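Continuing the AR(1) sketch from earlier (`phi_hat`, `sigma2_hat`, and `x` are the illustrative names defined there), the standardized innovations are:

```python
# Standardized innovations for the fitted AR(1) sketch above:
# e_1 = x_1 / sqrt(P_1^0) with P_1^0 = sigma^2 / (1 - phi^2);
# e_t = (x_t - phi * x_{t-1}) / sigma for t >= 2.
e = np.empty(len(x))
e[0] = x[0] * np.sqrt((1 - phi_hat**2) / sigma2_hat)
e[1:] = (x[1:] - phi_hat * x[:-1]) / np.sqrt(sigma2_hat)
print(e.mean(), e.var())   # should be near 0 and 1 if the model fits
```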
• Check a time plot.

Several tests of the i.i.d. hypothesis are also useful. The turning point test: $\{X_t\}$ i.i.d. implies that $X_t$, $X_{t+1}$, and $X_{t+2}$ are equally likely to occur in any of the six possible orders (provided $X_t, X_{t+1}, X_{t+2}$ are distinct). A turning point occurs when the middle value $X_{t+1}$ is either larger or smaller than both of its neighbors; four of the six orderings produce a turning point, so a turning point occurs with probability $2/3$.
Define $T = |\{t : X_t, X_{t+1}, X_{t+2} \text{ is a turning point}\}|$. Then $ET = 2(n-2)/3$ (there are $n-2$ consecutive triples), and one can show that $T \sim AN(2n/3,\ 8n/45)$.
Reject (at the 5% level) the hypothesis that the series is i.i.d. if

$$\left| T - \frac{2n}{3} \right| > 1.96 \sqrt{\frac{8n}{45}}.$$
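A short sketch of the test (the function name and the use of numpy are mine, not the lecture's):

```python
import numpy as np

def turning_point_test(x):
    """Turning point test: returns (T, z); reject iid at 5% if |z| > 1.96."""
    x = np.asarray(x)
    n = len(x)
    mid, left, right = x[1:-1], x[:-2], x[2:]
    # A turning point: the middle value is a strict local max or local min.
    T = np.sum(((mid > left) & (mid > right)) | ((mid < left) & (mid < right)))
    z = (T - 2 * n / 3) / np.sqrt(8 * n / 45)
    return T, z
```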
The difference-sign test: define

$$S = |\{i : X_i > X_{i-1}\}| = |\{i : (\nabla X)_i > 0\}|.$$

Then $ES = (n-1)/2$, and one can show that $S \sim AN(n/2,\ n/12)$. Reject (at the 5% level) the hypothesis that the series is i.i.d. if

$$\left| S - \frac{n}{2} \right| > 1.96 \sqrt{\frac{n}{12}}.$$
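The same pattern gives a sketch of this test (continuing the numpy setup above; the function name is mine):

```python
def difference_sign_test(x):
    """Difference-sign test: returns (S, z); reject iid at 5% if |z| > 1.96."""
    x = np.asarray(x)
    n = len(x)
    S = np.sum(np.diff(x) > 0)        # number of increases, (grad X)_i > 0
    z = (S - n / 2) / np.sqrt(n / 12)
    return S, z
```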
Tests for trend. The rank test: define

$$N = |\{(i,j) : X_i > X_j \text{ and } i > j\}|.$$

Then $EN = n(n-1)/4$, and one can show that $N \sim AN(n^2/4,\ n^3/36)$. Reject (at the 5% level) the hypothesis that the series is i.i.d. if

$$\left| N - \frac{n^2}{4} \right| > 1.96 \sqrt{\frac{n^3}{36}}.$$
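And a sketch of the rank test in the same style (function name mine; the double loop is written as a comprehension for brevity, O(n^2) as the definition suggests):

```python
def rank_test(x):
    """Rank test for trend: returns (N, z); reject iid at 5% if |z| > 1.96."""
    x = np.asarray(x)
    n = len(x)
    # Count pairs (i, j) with i > j and X_i > X_j.
    N = sum(int(np.sum(x[i] > x[:i])) for i in range(1, n))
    z = (N - n**2 / 4) / np.sqrt(n**3 / 36)
    return N, z
```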
To check for normality, plot the pairs $(m_1, X_{(1)}), \ldots, (m_n, X_{(n)})$, where $m_j = EZ_{(j)}$, with $Z_{(1)} < \cdots < Z_{(n)}$ the order statistics of an i.i.d. $N(0,1)$ sample of size $n$, and $X_{(1)} < \cdots < X_{(n)}$ the order statistics of the series $X_1, \ldots, X_n$ (a normal Q-Q plot).

Idea: if $X_i \sim N(\mu, \sigma^2)$, then $EX_{(j)} = \mu + \sigma m_j$, so the plot of the pairs $(m_j, X_{(j)})$ should be roughly linear.
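A sketch using scipy's probplot on the standardized innovations `e` from earlier (note: probplot uses approximate normal quantile positions rather than the exact expected order statistics $m_j = EZ_{(j)}$, which is the usual practical substitute):

```python
# Normal Q-Q plot of the standardized innovations e (from the sketch above).
import matplotlib.pyplot as plt
from scipy import stats

stats.probplot(e, dist="norm", plot=plt)
plt.show()   # points near a straight line suggest Gaussian residuals
```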