CHAPTER 65
Disturbance Related (Seemingly Unrelated) Regressions
One has $m$ time-series regression equations $y_i = X_i\beta_i + \varepsilon_i$. Everything is different: the dependent variables, the explanatory variables, the coefficient vectors. Even the numbers of observations may be different. The $i$th regression has $k_i$ explanatory variables and $t_i$ observations. They may be time series covering different but partly overlapping time periods. This is why they are called "seemingly unrelated" regressions. The only connection between the regressions is that for those observations which overlap in time the disturbances for different regressions are contemporaneously correlated, and these correlations are assumed to be constant over
time. In tiles, this model is
(65.0.18) [tile diagram: the $t \times m$ array $Y$ equals the tile product of the regressor array $X$ (dimensions $t$, $k$, $m$) with the coefficient array $B$ (dimensions $k$, $m$), plus the $t \times m$ disturbance array $E$]
65.1. The Supermatrix Representation
One can combine all these regressions into one big “supermatrix” as follows:
(65.1.1)
$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}
=
\begin{bmatrix}
X_1 & O & \cdots & O \\
O & X_2 & \cdots & O \\
\vdots & \vdots & \ddots & \vdots \\
O & O & \cdots & X_m
\end{bmatrix}
\begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_m \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{bmatrix}
$$
The covariance matrix of the disturbance term in (65.1.1) has the following “striped”
form:
(65.1.2)
$$
\mathcal{V}\left[\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{bmatrix}\right]
=
\begin{bmatrix}
\sigma_{11} I_{11} & \sigma_{12} I_{12} & \cdots & \sigma_{1m} I_{1m} \\
\sigma_{21} I_{21} & \sigma_{22} I_{22} & \cdots & \sigma_{2m} I_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{m1} I_{m1} & \sigma_{m2} I_{m2} & \cdots & \sigma_{mm} I_{mm}
\end{bmatrix}
$$
Here $I_{ij}$ is the $t_i \times t_j$ matrix which has zeros everywhere except at the intersections of rows and columns denoting the same time period.
In the special case that all time periods are identical, i.e., all $t_i = t$, one can define the matrices $Y = \begin{bmatrix} y_1 & \cdots & y_m \end{bmatrix}$ and $E = \begin{bmatrix} \varepsilon_1 & \cdots & \varepsilon_m \end{bmatrix}$, and write the equations in matrix form as follows:
(65.1.3) $\qquad Y = \begin{bmatrix} X_1\beta_1 & \cdots & X_m\beta_m \end{bmatrix} + E = H(B) + E$
The vector of dependent variables and the vector of disturbances in the supermatrix representation (65.1.1) can in this special case be written in terms of the vectorization operator as $\operatorname{vec} Y$ and $\operatorname{vec} E$. And the covariance matrix can be written as a Kronecker product: $\mathcal{V}[\operatorname{vec} E] = \Sigma \otimes I$, since all $I_{ij}$ in (65.1.2) are $t \times t$ identity
matrices. If $t = 5$ and $m = 3$, the covariance matrix would be
$$
\Sigma \otimes I =
\begin{bmatrix}
\sigma_{11} I & \sigma_{12} I & \sigma_{13} I \\
\sigma_{21} I & \sigma_{22} I & \sigma_{23} I \\
\sigma_{31} I & \sigma_{32} I & \sigma_{33} I
\end{bmatrix},
$$
a $15 \times 15$ matrix in which each block $\sigma_{ij} I$ is the $5 \times 5$ diagonal matrix with $\sigma_{ij}$ in every diagonal position and zeros elsewhere.
If in addition all regressions have the same number of regressors, one can combine the coefficients into a matrix $B$ and can write the system as

(65.1.4) $\qquad \operatorname{vec} Y = Z \operatorname{vec} B + \operatorname{vec} E, \qquad \operatorname{vec} E \sim (o, \Sigma \otimes I),$
where Z contains the regressors arranged in a block-diagonal “supermatrix.”
If one knows $\Sigma$ up to a multiplicative factor, and if all regressions cover the same time period, then one can apply (26.0.2) to (65.1.4) to get the following formula for the GLS estimator, which is at the same time the maximum likelihood estimator:
(65.1.5) $\qquad \operatorname{vec}(\hat B) = \bigl(Z^\top(\Sigma^{-1} \otimes I)Z\bigr)^{-1} Z^\top(\Sigma^{-1} \otimes I)\operatorname{vec}(Y).$
To evaluate this, note first that
$$
Z^\top(\Sigma^{-1} \otimes I) =
\begin{bmatrix}
X_1^\top & O & \cdots & O \\
O & X_2^\top & \cdots & O \\
\vdots & \vdots & \ddots & \vdots \\
O & O & \cdots & X_m^\top
\end{bmatrix}
\begin{bmatrix}
\sigma^{11} I & \sigma^{12} I & \cdots & \sigma^{1m} I \\
\sigma^{21} I & \sigma^{22} I & \cdots & \sigma^{2m} I \\
\vdots & \vdots & \ddots & \vdots \\
\sigma^{m1} I & \sigma^{m2} I & \cdots & \sigma^{mm} I
\end{bmatrix}
=
\begin{bmatrix}
\sigma^{11} X_1^\top & \cdots & \sigma^{1m} X_1^\top \\
\vdots & \ddots & \vdots \\
\sigma^{m1} X_m^\top & \cdots & \sigma^{mm} X_m^\top
\end{bmatrix}
$$
where $\sigma^{ij}$ are the elements of the inverse of $\Sigma$; therefore
(65.1.6)
$$
\begin{bmatrix} \hat\beta_1 \\ \vdots \\ \hat\beta_m \end{bmatrix}
=
\begin{bmatrix}
\sigma^{11} X_1^\top X_1 & \cdots & \sigma^{1m} X_1^\top X_m \\
\vdots & \ddots & \vdots \\
\sigma^{m1} X_m^\top X_1 & \cdots & \sigma^{mm} X_m^\top X_m
\end{bmatrix}^{-1}
\begin{bmatrix}
X_1^\top \sum_{i=1}^m \sigma^{1i} y_i \\
\vdots \\
X_m^\top \sum_{i=1}^m \sigma^{mi} y_i
\end{bmatrix}.
$$
In the seemingly unrelated regression model, OLS on each equation singly is therefore less efficient than an approach which estimates all the equations simultaneously. If the numbers of observations in the different regressions are unequal, then the formula for the GLSE is no longer so simple. It is given in [JHG+88, (11.2.59) on p. 464].
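Formula (65.1.6) is straightforward to compute when all equations cover the same time period and $\Sigma$ is known. The following sketch is only an illustration (it is not part of the original notes; the data and all names are made up), but it shows how the blocks $\sigma^{ij}X_i^\top X_j$ and $X_i^\top\sum_j \sigma^{ij}y_j$ are assembled:

```python
import numpy as np

def sur_gls(X_list, y_list, Sigma):
    """GLS/SUR estimator (65.1.6) for m equations with a common sample size t,
    assuming the m x m contemporaneous covariance matrix Sigma is known.
    Returns the stacked coefficient vector (beta_1', ..., beta_m')'."""
    m = len(X_list)
    Sigma_inv = np.linalg.inv(Sigma)              # its elements are the sigma^{ij}
    # Left-hand block matrix: block (i, j) = sigma^{ij} X_i' X_j
    A = np.block([[Sigma_inv[i, j] * X_list[i].T @ X_list[j] for j in range(m)]
                  for i in range(m)])
    # Right-hand side: block i = X_i' sum_j sigma^{ij} y_j
    b = np.concatenate([X_list[i].T @ sum(Sigma_inv[i, j] * y_list[j] for j in range(m))
                        for i in range(m)])
    return np.linalg.solve(A, b)

# Small simulated two-equation example
rng = np.random.default_rng(0)
t = 200
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
E = rng.multivariate_normal(np.zeros(2), Sigma, size=t)
X1 = np.column_stack([np.ones(t), rng.normal(size=t)])
X2 = np.column_stack([np.ones(t), rng.normal(size=t), rng.normal(size=t)])
y1 = X1 @ np.array([1.0, 2.0]) + E[:, 0]
y2 = X2 @ np.array([-1.0, 0.5, 3.0]) + E[:, 1]
print(sur_gls([X1, X2], [y1, y2], Sigma))   # close to (1, 2, -1, 0.5, 3)
```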
65.2. The Likelihood Function
We know therefore what to do in the hypothetical case that $\Sigma$ is known. What if it is not known? We will derive here the maximum likelihood estimator. For the exponent of the likelihood function we need the following mathematical tool:
Problem 532. Show that $\sum_{s=1}^{t} a_s^\top \Omega a_s = \operatorname{tr} A^\top \Omega A$ where $A = \begin{bmatrix} a_1 & \dots & a_t \end{bmatrix}$.
Answer.
$$
A^\top \Omega A =
\begin{bmatrix} a_1^\top \\ \vdots \\ a_t^\top \end{bmatrix}
\Omega
\begin{bmatrix} a_1 & \dots & a_t \end{bmatrix}
=
\begin{bmatrix}
a_1^\top \Omega a_1 & a_1^\top \Omega a_2 & \cdots & a_1^\top \Omega a_t \\
a_2^\top \Omega a_1 & a_2^\top \Omega a_2 & \cdots & a_2^\top \Omega a_t \\
\vdots & \vdots & \ddots & \vdots \\
a_t^\top \Omega a_1 & a_t^\top \Omega a_2 & \cdots & a_t^\top \Omega a_t
\end{bmatrix}
$$
Now take the trace of this. □
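A quick numerical illustration of Problem 532 (my own check, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(1)
t, m = 5, 3
A = rng.normal(size=(m, t))            # columns a_1, ..., a_t
Omega = rng.normal(size=(m, m))
Omega = Omega @ Omega.T                # any symmetric Omega will do

lhs = sum(A[:, s] @ Omega @ A[:, s] for s in range(t))
rhs = np.trace(A.T @ Omega @ A)
print(np.isclose(lhs, rhs))            # True
```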
To derive the likelihood function, define the matrix function $H(B)$ as follows: $H(B)$ is a $t \times m$ matrix the $i$th column of which is $X_i\beta_i$, i.e., $H(B)$ as a column-partitioned matrix is $H(B) = \begin{bmatrix} X_1\beta_1 & \cdots & X_m\beta_m \end{bmatrix}$. In tiles,
(65.2.1) [tile diagram: $H(B)$ as the tile product of the regressor array $X$ (dimensions $t$, $k$, $m$) with the coefficient array $B$ (dimensions $k$, $m$)]
The above notation follows [DM93, 315–318]. [Gre97, p. 683 top] writes this same $H$ as the matrix product

(65.2.2) $\qquad H(B) = Z\Pi(B)$

where $Z$ has all the different regressors in the different regressions as columns (it is $Z = \begin{bmatrix} X_1 & \cdots & X_m \end{bmatrix}$ with duplicate columns deleted), and the $i$th column of $\Pi$ has zeros for those regressors which are not in the $i$th equation, and elements of $B$ for those regressors which are in the $i$th equation.
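To make the bookkeeping behind $H(B) = Z\Pi(B)$ concrete, here is a small hypothetical sketch (two equations sharing a constant and one regressor; all names are mine, and it is only meant to illustrate the construction described above, not to reproduce [Gre97]): $Z$ stacks the distinct regressors, and $\Pi$ places each element of $B$ in the row of its regressor and the column of its equation, with zeros elsewhere.

```python
import numpy as np

t = 6
rng = np.random.default_rng(2)
regressors = {"const": np.ones(t), "x": rng.normal(size=t), "z": rng.normal(size=t)}
eq_vars = [["const", "x"], ["const", "x", "z"]]          # which regressors enter each equation
betas = [np.array([1.0, 2.0]), np.array([-1.0, 0.5, 3.0])]

names = list(regressors)                                  # column order of Z
Z = np.column_stack([regressors[n] for n in names])       # t x K, duplicate columns deleted
Pi = np.zeros((len(names), len(eq_vars)))                 # K x m, zeros for excluded regressors
for i, (vars_i, beta_i) in enumerate(zip(eq_vars, betas)):
    for name, coef in zip(vars_i, beta_i):
        Pi[names.index(name), i] = coef

# Column i of Z @ Pi equals X_i beta_i, so Z @ Pi reproduces H(B)
H = np.column_stack([np.column_stack([regressors[n] for n in v]) @ b
                     for v, b in zip(eq_vars, betas)])
print(np.allclose(Z @ Pi, H))                             # True
```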
Using $H$, the model is simply, as in (65.0.18),

(65.2.3) $\qquad Y = H(B) + E, \qquad \operatorname{vec}(E) \sim N(o, \Sigma \otimes I)$
This is a matrix generalization of (56.0.21).
The likelihood function which we are going to derive now is valid not only for this particular $H$ but for more general, possibly nonlinear $H$. Define $\eta_s(B)$ to be the $s$th row of $H$, written as a column vector, i.e., as a row-partitioned matrix we have $H(B) = \begin{bmatrix} \eta_1^\top(B) \\ \vdots \\ \eta_t^\top(B) \end{bmatrix}$. Then (65.2.3) in row-partitioned form reads
(65.2.4)
$$
\begin{bmatrix} y_1^\top \\ \vdots \\ y_t^\top \end{bmatrix}
=
\begin{bmatrix} \eta_1^\top(B) \\ \vdots \\ \eta_t^\top(B) \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1^\top \\ \vdots \\ \varepsilon_t^\top \end{bmatrix}
$$
We assume normality: the $s$th row vector is $y_s^\top \sim N(\eta_s^\top(B), \Sigma)$, or $y_s \sim N(\eta_s(B), \Sigma)$, and we assume that different rows are independent. Therefore the density function is
$$
\begin{aligned}
f_Y(Y) &= \prod_{s=1}^{t} (2\pi)^{-m/2} (\det\Sigma)^{-1/2}
   \exp\Bigl(-\tfrac{1}{2}\,(y_s - \eta_s(B))^\top \Sigma^{-1} (y_s - \eta_s(B))\Bigr) \\
&= (2\pi)^{-mt/2} (\det\Sigma)^{-t/2}
   \exp\Bigl(-\tfrac{1}{2}\sum_s (y_s - \eta_s(B))^\top \Sigma^{-1} (y_s - \eta_s(B))\Bigr) \\
&= (2\pi)^{-mt/2} (\det\Sigma)^{-t/2}
   \exp\Bigl(-\tfrac{1}{2}\operatorname{tr}\,(Y - H(B))\,\Sigma^{-1}(Y - H(B))^\top\Bigr) \\
&= (2\pi)^{-mt/2} (\det\Sigma)^{-t/2}
   \exp\Bigl(-\tfrac{1}{2}\operatorname{tr}\,(Y - H(B))^\top(Y - H(B))\,\Sigma^{-1}\Bigr). \qquad (65.2.5)
\end{aligned}
$$
Problem 533. Explain exactly the step in the derivation of (65.2.5) in which the trace enters.
Answer. Write the quadratic form in the exponent as follows:
$$
\begin{aligned}
\sum_{s=1}^{t} (y_s - \eta_s(B))^\top \Sigma^{-1} (y_s - \eta_s(B))
&= \sum_{s=1}^{t} \operatorname{tr}\,(y_s - \eta_s(B))^\top \Sigma^{-1} (y_s - \eta_s(B))
&& (65.2.6) \\
&= \sum_{s=1}^{t} \operatorname{tr}\,\Sigma^{-1} (y_s - \eta_s(B))(y_s - \eta_s(B))^\top
&& (65.2.7) \\
&= \operatorname{tr}\,\Sigma^{-1} \sum_{s=1}^{t} (y_s - \eta_s(B))(y_s - \eta_s(B))^\top
&& (65.2.8) \\
&= \operatorname{tr}\,\Sigma^{-1}
   \begin{bmatrix} (y_1 - \eta_1(B)) & \cdots & (y_t - \eta_t(B)) \end{bmatrix}
   \begin{bmatrix} (y_1 - \eta_1(B))^\top \\ \vdots \\ (y_t - \eta_t(B))^\top \end{bmatrix}
&& (65.2.9) \\
&= \operatorname{tr}\,\Sigma^{-1} (Y - H(B))^\top (Y - H(B))
&& (65.2.10)
\end{aligned}
$$
□
The log likelihood function $\ell(Y; B, \Sigma)$ is therefore

(65.2.11) $\qquad \ell = -\dfrac{mt}{2}\log 2\pi - \dfrac{t}{2}\log\det\Sigma - \dfrac{1}{2}\operatorname{tr}\,(Y - H(B))^\top(Y - H(B))\,\Sigma^{-1}.$
In order to concentrate out $\Sigma$ it is simpler to take the partial derivatives with respect to $\Sigma^{-1}$ than those with respect to $\Sigma$ itself. Using the matrix differentiation rules (C.1.24) and (C.1.16) and noting that $-\frac{t}{2}\log\det\Sigma = \frac{t}{2}\log\det\Sigma^{-1}$ one gets:

(65.2.12) $\qquad \dfrac{\partial \ell}{\partial \Sigma^{-1}} = \dfrac{t}{2}\,\Sigma - \dfrac{1}{2}\,(Y - H(B))^\top(Y - H(B)),$

and if we set this zero we get

(65.2.13) $\qquad \hat\Sigma(B) = \dfrac{1}{t}\,(Y - H(B))^\top(Y - H(B)).$
Written row vector by row vector this is

(65.2.14) $\qquad \hat\Sigma = \dfrac{1}{t}\sum_{s=1}^{t} (y_s - \eta_s(B))(y_s - \eta_s(B))^\top$
The maximum likelihood estimator of $\Sigma$ is therefore simply the sample covariance matrix of the residuals taken with the maximum likelihood estimates of $B$.
We know therefore what the maximum likelihood estimator of $\Sigma$ is if $B$ is known: it is the sample covariance matrix of the residuals. And we know what the maximum likelihood estimator of $B$ is if $\Sigma$ is known: it is given by equation (65.1.6). In such a situation, one good numerical method is to iterate: start with an initial estimate of $\Sigma$ (perhaps from the OLS residuals), get from this an estimate of $B$, then use this to get a second estimate of $\Sigma$, etc., until it converges. This iterative scheme is called iterated Zellner or iterated SUR. See [Ruu00, p. 706]; the original article is [Zel62].
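A minimal sketch of this iteration for the equal-sample-size case (an illustration of the scheme just described, not code from the notes or from [Zel62]; the function name and defaults are made up):

```python
import numpy as np

def iterated_sur(X_list, y_list, tol=1e-10, max_iter=100):
    """Iterated Zellner / iterated SUR for m equations sharing the sample size t.
    Alternates between (65.2.13) for Sigma and (65.1.6) for the coefficients."""
    m, t = len(X_list), len(y_list[0])
    # start from equation-by-equation OLS
    betas = [np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(X_list, y_list)]
    sizes = np.cumsum([0] + [X.shape[1] for X in X_list])
    for _ in range(max_iter):
        # residual matrix (t x m) and Sigma_hat = E_hat' E_hat / t, as in (65.2.13)
        E_hat = np.column_stack([y - X @ b for X, y, b in zip(X_list, y_list, betas)])
        Sigma_hat = E_hat.T @ E_hat / t
        S_inv = np.linalg.inv(Sigma_hat)
        # GLS step (65.1.6) with the current Sigma_hat
        A = np.block([[S_inv[i, j] * X_list[i].T @ X_list[j] for j in range(m)]
                      for i in range(m)])
        rhs = np.concatenate([X_list[i].T @ sum(S_inv[i, j] * y_list[j] for j in range(m))
                              for i in range(m)])
        new = np.linalg.solve(A, rhs)
        new_betas = [new[sizes[i]:sizes[i + 1]] for i in range(m)]
        converged = max(np.max(np.abs(nb - b)) for nb, b in zip(new_betas, betas)) < tol
        betas = new_betas
        if converged:
            break
    return betas, Sigma_hat
```

Applied to simulated data like that in the sketch after (65.1.6), the iteration typically converges in a handful of steps; at the fixed point the coefficients and $\hat\Sigma$ satisfy (65.1.6) and (65.2.13) simultaneously.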
65.3. Concentrating out the Covariance Matrix (Incomplete)

One can rewrite (65.2.11) using (65.2.13) as a definition:
(65.3.1) $\qquad \ell = -\dfrac{mt}{2}\log 2\pi - \dfrac{t}{2}\log\det\Sigma - \dfrac{t}{2}\operatorname{tr}\,\Sigma^{-1}\hat\Sigma$
and therefore the concentrated log likelihood function is, compare [Gre97, 15-53 on p. 685]:
$$
\ell_c = -\frac{mt}{2}\log 2\pi - \frac{t}{2}\log\det\hat\Sigma - \frac{t}{2}\operatorname{tr}\,\hat\Sigma^{-1}\hat\Sigma
       = -\frac{mt}{2}(1 + \log 2\pi) - \frac{t}{2}\log\det\hat\Sigma(B). \qquad (65.3.2)
$$
This is an important formula which is valid for all the different models, including nonlinear models, which can be written in the form (65.2.3).
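A small numerical sanity check of (65.3.2), again with made-up data (not part of the notes): plugging $\hat\Sigma(B)$ from (65.2.13) into the full log likelihood (65.2.11) reproduces the concentrated form, because $\operatorname{tr}\,\hat\Sigma^{-1}\hat\Sigma = m$.

```python
import numpy as np

rng = np.random.default_rng(3)
t, m = 50, 3
Y = rng.normal(size=(t, m))
H_B = rng.normal(size=(t, m))           # stands in for H(B) at some trial value of B

E = Y - H_B
Sigma_hat = E.T @ E / t                  # (65.2.13)

# Full log likelihood (65.2.11), evaluated at Sigma = Sigma_hat
ll_full = (-m * t / 2 * np.log(2 * np.pi)
           - t / 2 * np.linalg.slogdet(Sigma_hat)[1]
           - 0.5 * np.trace(E.T @ E @ np.linalg.inv(Sigma_hat)))

# Concentrated log likelihood (65.3.2)
ll_conc = -m * t / 2 * (1 + np.log(2 * np.pi)) - t / 2 * np.linalg.slogdet(Sigma_hat)[1]

print(np.isclose(ll_full, ll_conc))      # True
```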
As a next step we will write, following [Gre97, p. 683], $H(B) = Z\Pi(B)$ and derive the following formula from [Gre97, p. 685]:

(65.3.3) $\qquad \dfrac{\partial \ell_c}{\partial \Pi^\top} = \hat\Sigma^{-1}(Y - Z\Pi)^\top Z$
Here is a derivation of this using tile notation. We use the notation $\hat E = Y - H(B)$ for the matrix of residuals, and apply the chain rule to get the derivatives:

(65.3.4) $\qquad \dfrac{\partial \ell_c}{\partial \Pi^\top} = \dfrac{\partial \ell_c}{\partial \hat\Sigma}\;\dfrac{\partial \hat\Sigma}{\partial \hat E}\;\dfrac{\partial \hat E}{\partial \Pi^\top}$

The product here is not a matrix product but the concatenation of a matrix with three arrays of rank 4. In tile notation, the first term in this product is

(65.3.5) $\qquad \partial\ell_c/\partial\hat\Sigma = -\dfrac{t}{2}\,\hat\Sigma^{-1}$

This is an array of rank 2, i.e., a matrix, but the other factors are arrays of rank 4. Using (C.1.22) we get
$$\partial\hat\Sigma/\partial\hat E = \partial\bigl(\tfrac{1}{t}\,\hat E^\top\hat E\bigr)\big/\partial\hat E$$
[tile diagram: a sum of two rank-4 arrays, each $\frac{1}{t}$ times a tile built from $\hat E$, one with straight and one with crossing arms]

Finally, by (C.1.18), $\partial\hat E/\partial\Pi^\top$ is $-1$ times the rank-4 array $\partial(Z\Pi)/\partial\Pi^\top$:

[tile diagram: the rank-4 array built from $Z$]

Putting it all together, using the symmetry of the first term (65.3.5) (which has the effect that the term with the crossing arms is the same as the straight one), gives
$$\frac{\partial \ell_c}{\partial \Pi^\top} = \hat\Sigma^{-1}\hat E^\top Z$$
which is exactly (65.3.3).
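Formula (65.3.3) can also be checked against a finite-difference gradient of the concentrated log likelihood; the sketch below does this for an unrestricted $K \times m$ matrix $\Pi$ with made-up data (only an illustration, not part of the notes).

```python
import numpy as np

rng = np.random.default_rng(4)
t, K, m = 60, 4, 3
Z = rng.normal(size=(t, K))
Y = rng.normal(size=(t, m))
Pi = rng.normal(size=(K, m))

def ell_c(Pi):
    """Concentrated log likelihood (65.3.2) with H(B) = Z Pi."""
    E = Y - Z @ Pi
    Sigma_hat = E.T @ E / t
    return -m * t / 2 * (1 + np.log(2 * np.pi)) - t / 2 * np.linalg.slogdet(Sigma_hat)[1]

# Analytic gradient (65.3.3): an m x K matrix whose (j, i) entry is d ell_c / d Pi[i, j]
E = Y - Z @ Pi
Sigma_hat = E.T @ E / t
grad_analytic = np.linalg.inv(Sigma_hat) @ E.T @ Z

# Central finite differences, arranged the same way
h = 1e-6
grad_fd = np.zeros((m, K))
for i in range(K):
    for j in range(m):
        Pi_plus, Pi_minus = Pi.copy(), Pi.copy()
        Pi_plus[i, j] += h
        Pi_minus[i, j] -= h
        grad_fd[j, i] = (ell_c(Pi_plus) - ell_c(Pi_minus)) / (2 * h)

print(np.allclose(grad_analytic, grad_fd))   # True
```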
65.4. Situations in which OLS is Best
One of the most amazing results regarding seemingly unrelated regressions is: if the $X$ matrices are identical, then it is not necessary to do GLS, because OLS on each equation separately gives exactly the same result. Question 534 gives three different proofs of this (and a small numerical check is sketched after the Problem):
Problem 534. Given a set of disturbance related regression equations

(65.4.1) $\qquad y_i = X\beta_i + \varepsilon_i, \qquad i = 1, \ldots, m,$

in which all $X_i$ are equal to $X$; note that equation (65.4.1) has no subscript at the matrices of explanatory variables.
• a. 1 point Defining $Y = \begin{bmatrix} y_1 & \cdots & y_m \end{bmatrix}$, $B = \begin{bmatrix} \beta_1 & \cdots & \beta_m \end{bmatrix}$ and $E = \begin{bmatrix} \varepsilon_1 & \cdots & \varepsilon_m \end{bmatrix}$, show that the $m$ equations (65.4.1) can be combined into the single matrix equation

(65.4.2) $\qquad Y = XB + E.$

Answer. The only step needed to show this is that $XB$, column by column, can be written $XB = \begin{bmatrix} X\beta_1 & \dots & X\beta_m \end{bmatrix}$. □
• b. 1 point The contemporaneous correlations of the disturbances can now be written $\operatorname{vec}(E) \sim (o, \Sigma \otimes I)$.
• c. 4 points For this part of the Question you will need the following properties of vec and $\otimes$: $(A \otimes B)^\top = A^\top \otimes B^\top$, $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$, $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$, $\operatorname{vec}(A + B) = \operatorname{vec}(A) + \operatorname{vec}(B)$, and finally the important identity (B.5.19): $\operatorname{vec}(ABC) = (C^\top \otimes A)\operatorname{vec}(B)$.

By applying the vec operator to (65.4.2) show that the BLUE of the matrix $B$ is $\hat B = (X^\top X)^{-1} X^\top Y$, i.e., show that, despite the fact that the dispersion matrix is not spherical, one simply has to apply OLS to every equation separately.
Answer. Use (B.5.19) to write (65.4.2) in vectorized form as
$$\operatorname{vec}(Y) = (I \otimes X)\operatorname{vec}(B) + \operatorname{vec}(E)$$
Since $\mathcal{V}[\operatorname{vec}(E)] = \Sigma \otimes I$, the GLS estimate is
$$
\begin{aligned}
\operatorname{vec}(\hat B)
&= \bigl((I \otimes X)^\top(\Sigma \otimes I)^{-1}(I \otimes X)\bigr)^{-1}(I \otimes X)^\top(\Sigma \otimes I)^{-1}\operatorname{vec}(Y) \\
&= \bigl((I \otimes X^\top)(\Sigma^{-1} \otimes I)(I \otimes X)\bigr)^{-1}(I \otimes X^\top)(\Sigma^{-1} \otimes I)\operatorname{vec}(Y) \\
&= \bigl(\Sigma^{-1} \otimes X^\top X\bigr)^{-1}\bigl(\Sigma^{-1} \otimes X^\top\bigr)\operatorname{vec}(Y) \\
&= \bigl(I \otimes (X^\top X)^{-1} X^\top\bigr)\operatorname{vec}(Y)
\end{aligned}
$$
and applying (B.5.19) again, this is equivalent to
$$\hat B = (X^\top X)^{-1} X^\top Y. \qquad (65.4.3)$$
□
• d. 3 points [DM93, p. 313] appeals to Kruskal’s theorem, which is Question
499, to prove this. Supply the details of this proof.
Answer. Look at the derivation of (65.4.3) again. The $\Sigma^{-1}$ in numerator and denominator cancel out since they commute with $Z$. Defining $\Omega = \Sigma \otimes I$, this "commuting" is the formula $\Omega Z = ZK$ for some $K$, i.e.,

(65.4.4)
$$
\begin{bmatrix}
\sigma_{11} I & \dots & \sigma_{1m} I \\
\vdots & \ddots & \vdots \\
\sigma_{m1} I & \dots & \sigma_{mm} I
\end{bmatrix}
\begin{bmatrix}
X & \dots & 0 \\
\vdots & \ddots & \vdots \\
0 & \dots & X
\end{bmatrix}
=
\begin{bmatrix}
X & \dots & 0 \\
\vdots & \ddots & \vdots \\
0 & \dots & X
\end{bmatrix}
\begin{bmatrix}
\sigma_{11} I & \dots & \sigma_{1m} I \\
\vdots & \ddots & \vdots \\
\sigma_{m1} I & \dots & \sigma_{mm} I
\end{bmatrix}.
$$
Note that the $I$ on the lefthand side are $t \times t$, and those on the right are $k \times k$. This "commuting" allows us to apply Kruskal's theorem. □
• e. 4 points Theil [The71, pp. 500–502] gives a different proof: he maximizes the likelihood function of $Y$ with respect to $B$ for the given $\Sigma$, using the fact that the matrix of OLS estimates $\hat B$ has the property that $(Y - X\hat B)^\top(Y - X\hat B)$ is by a nnd matrix smaller than any other $(Y - XB)^\top(Y - XB)$. Carry out this proof in detail.
Answer. Let $B = \hat B + A$; then $(Y - XB)^\top(Y - XB) = (Y - X\hat B)^\top(Y - X\hat B) + A^\top X^\top XA$ because the cross product terms $A^\top X^\top(Y - X\hat B) = O$ since $\hat B$ satisfies the normal equation $X^\top(Y - X\hat B) = O$.

Instead of maximizing the likelihood function with respect to $B$ and $\Sigma$ simultaneously, Theil in [The71, pp. 500–502] only maximizes it with respect to $B$ for the given $\Sigma$ and finds a solution which is independent of $\Sigma$. The likelihood function of $Y$ is (65.2.5) with $H(B) = XB$, i.e.,

(65.4.5) $\qquad f_Y(Y) = (2\pi)^{-tm/2}(\det\Sigma)^{-t/2}\exp\Bigl(-\tfrac{1}{2}\operatorname{tr}\,\Sigma^{-1}(Y - XB)^\top(Y - XB)\Bigr)$

The trace in the exponent can be split up into $\operatorname{tr}\bigl(\Sigma^{-1}(Y - X\hat B)^\top(Y - X\hat B)\bigr) + \operatorname{tr}\bigl(\Sigma^{-1}A^\top X^\top XA\bigr)$; but this last term is equal to $\operatorname{tr}\bigl(XA\Sigma^{-1}A^\top X^\top\bigr)$, which is $\geq 0$. □
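The result of Problem 534 is also easy to confirm numerically (a sketch with simulated data, not part of the notes): with identical regressor matrices, GLS on the stacked system reproduces equation-by-equation OLS exactly, whatever $\Sigma$ is.

```python
import numpy as np

rng = np.random.default_rng(5)
t, k, m = 40, 3, 2
X = np.column_stack([np.ones(t), rng.normal(size=(t, k - 1))])   # same X in every equation
Sigma = np.array([[1.0, 0.9], [0.9, 2.0]])
E = rng.multivariate_normal(np.zeros(m), Sigma, size=t)
B_true = rng.normal(size=(k, m))
Y = X @ B_true + E

# Equation-by-equation OLS
B_ols = np.linalg.solve(X.T @ X, X.T @ Y)

# Full GLS on vec(Y) = (I kron X) vec(B) + vec(E) with V[vec(E)] = Sigma kron I
Z = np.kron(np.eye(m), X)
V_inv = np.kron(np.linalg.inv(Sigma), np.eye(t))
vec_y = Y.T.reshape(-1)                      # stacks the columns y_1, ..., y_m
vec_B = np.linalg.solve(Z.T @ V_inv @ Z, Z.T @ V_inv @ vec_y)
B_gls = vec_B.reshape(m, k).T                # undo the vec

print(np.allclose(B_ols, B_gls))             # True
```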
Joint estimation has therefore the greatest efficiency gains over OLS if the correlations between the errors are high and the correlations between the explanatory variables are low.
Problem 535. Are the following statements true or false?
• a. 1 point In a seemingly unrelated regression framework, joint estimation of
the whole model is much better than estimation of each equation singly if the errors
are highly correlated. True or false?
Answer. True 
• b. 1 point In a seemingly unrelated regression framework, joint estimation
of the whole model is much better than estimation of each equation singly if the
independent variables in the different regressions are highly correlated. True or false?
Answer. False. 
Assume I have two equations whose disturbances are correlated, and the second has all variables that the first has, plus some additional ones. Then the inclusion of the second equation does not give additional information for the first; however, including the first gives additional information for the second!
What is the rationale for this? Since the first equation has fewer variables than the second, I know its disturbances better. For instance, if the equation had no explanatory variables at all, then I would know the disturbances exactly. But if I know these disturbances, and know that they are correlated with the disturbances of the second equation, then I can also say something about the disturbances of the second equation, and therefore estimate the parameters of the second equation better.
Problem 536. You have two disturbance-related equations

(65.4.6) $\qquad y_1 = X_1\beta_1 + \varepsilon_1, \qquad y_2 = X_2\beta_2 + \varepsilon_2, \qquad
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \end{bmatrix} \sim
\left( \begin{bmatrix} o \\ o \end{bmatrix},\;
\begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{bmatrix} \otimes I \right)$

where all $\sigma_{ij}$ are known, and the set of explanatory variables in $X_1$ is a subset of those in $X_2$. One of the following two statements is correct, the other is false. Which is correct? (a) in order to estimate $\beta_1$, OLS on the first equation singly is as good as SUR. (b) in order to estimate $\beta_2$, OLS on the second equation singly is as good as SUR. Which of these two is true?
Answer. The first is true. One cannot obtain a more efficient estimator of $\beta_1$ by considering the whole system. This is [JGH+85, p. 469]. □
65.5. Unknown Covariance Matrix
What to do when we don’t know Σ
Σ
Σ? Two main possibilities: One is “feas ible
GLS”, which uses the OLS residuals to estimate Σ
Σ
Σ, and then uses the GLS formula
with the estimated elements of Σ
Σ
Σ. This is the most obvious method; unfortunately
if the numbers of observations are unequal, then this may no longer give a nonneg-
ative definite matrix. The other is the maximum likelihood estimation of B and Σ
Σ

Σ
simultaneously. If one iterates the “feasible GLS” method, i.e., uses the residuals of
the feasible GLS equation to get new estimates of Σ
Σ
Σ, then does feasible GLS with
the new Σ
Σ
Σ, etc., then one will get the maximum likelihood estimator.
Problem 537. 4 points Explain how to do iterated EGLS (i.e., GLS with an
estimated covariance matrix) in a model with first-order autoregression, and in a
seemingly unrelated regression model. Will you end up with the (normal) maximum
likelihood estimator if you iterate until convergence?
Answer. You will only get the maximum likelihood estimator in the SUR case, not in the AR1 case, because the determinant term will never come in by iteration, and in the AR1 case EGLS is known to underestimate the $\rho$. Of course, iterated EGLS is in both situations asymptotically as good as maximum likelihood, but the question was whether it is in small samples already equal to the ML. You can have asymptotically equivalent estimates which differ greatly in small samples. □
Asymptotically, feasible GLS is as good as maximum likelihood. This is really nothing new and nothing exciting. The two estimators may have quite different properties before the asymptotic limit is reached! But there is another, much stronger result: already for finite sample size, iterated feasible GLS is equal to the maximum likelihood estimator.
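This can be made plausible numerically (an illustrative sketch with simulated data; nothing here is from the notes): run the iteration until it converges and check that, at the resulting coefficients, the score of the concentrated log likelihood vanishes. The score with respect to $\beta_i$ is the restriction of (65.3.3) to the coefficients that actually appear in equation $i$, namely $X_i^\top\hat E\hat\Sigma^{-1}e_i$ where $e_i$ is the $i$th unit vector; its vanishing is the first-order condition characterizing the maximum likelihood estimator.

```python
import numpy as np

rng = np.random.default_rng(7)
t, m = 80, 2
Sigma = np.array([[1.0, 0.7], [0.7, 1.5]])
E = rng.multivariate_normal(np.zeros(m), Sigma, size=t)
X1 = np.column_stack([np.ones(t), rng.normal(size=t)])
X2 = np.column_stack([np.ones(t), rng.normal(size=t), rng.normal(size=t)])
y1 = X1 @ np.array([1.0, 2.0]) + E[:, 0]
y2 = X2 @ np.array([-1.0, 0.5, 3.0]) + E[:, 1]
X_list, y_list = [X1, X2], [y1, y2]

# Iterated feasible GLS, starting from equation-by-equation OLS
betas = [np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(X_list, y_list)]
for _ in range(200):
    E_hat = np.column_stack([y - X @ b for X, y, b in zip(X_list, y_list, betas)])
    S_inv = np.linalg.inv(E_hat.T @ E_hat / t)
    A = np.block([[S_inv[i, j] * X_list[i].T @ X_list[j] for j in range(m)]
                  for i in range(m)])
    rhs = np.concatenate([X_list[i].T @ sum(S_inv[i, j] * y_list[j] for j in range(m))
                          for i in range(m)])
    new = np.linalg.solve(A, rhs)
    betas = [new[:2], new[2:]]

# Score of the concentrated likelihood at the converged point: should be (numerically) zero
E_hat = np.column_stack([y - X @ b for X, y, b in zip(X_list, y_list, betas)])
S_inv = np.linalg.inv(E_hat.T @ E_hat / t)
scores = [X_list[i].T @ (E_hat @ S_inv)[:, i] for i in range(m)]
print([float(np.max(np.abs(s))) for s in scores])    # both essentially zero
```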
Problem 538. 5 points Define “seemingly unrelated equations” and discuss the
estimation issues involved.
CHAPTER 66
Simultaneous Equations Systems
This was a central part of econometrics in the fifties and sixties.
66.1. Examples
[JHG+88, 14.1 Introduction] gives examples. The first example is clearly not identified; indeed it has no exogenous variables. But the idea of a simultaneous equations system is not dependent on this:
(66.1.1) $\qquad y_d = \iota\alpha + p\beta + \varepsilon_1$

(66.1.2) $\qquad y_s = \iota\gamma + p\delta + \varepsilon_2$

(66.1.3) $\qquad y_d = y_s$
$y_d$, $y_s$, and $p$ are the jointly determined endogenous variables. The first equation describes the behavior of the consumers, the second the behavior of producers.
Problem 539. [Gre97, p. 709 ff]. Here is a demand and supply curve with $q$ quantity, $p$ price, $y$ income, and $\iota$ is the vector of ones. All vectors are $t$-vectors.
(66.1.4) $\qquad q = \alpha_0\iota + \alpha_1 p + \alpha_2 y + \varepsilon_d, \qquad \varepsilon_d \sim (o, \sigma_d^2 I) \qquad$ (demand)

(66.1.5) $\qquad q = \beta_0\iota + \beta_1 p + \varepsilon_s, \qquad \varepsilon_s \sim (o, \sigma_s^2 I) \qquad$ (supply)
$\varepsilon_d$ and $\varepsilon_s$ are independent of $y$, but amongst each other they are contemporaneously correlated, with their covariance constant over time:

(66.1.6) $\qquad \operatorname{cov}[\varepsilon_{dt}, \varepsilon_{su}] = \begin{cases} 0 & \text{if } t \neq u \\ \sigma_{ds} & \text{if } t = u \end{cases}$
• a. 1 point Which variables are exogenous and which are endogenous?
Answer. $p$ and $q$ are called jointly dependent or endogenous. $y$ is determined outside the system or exogenous. □
• b. 2 points Assuming $\alpha_1 \neq \beta_1$, verify that the reduced-form equations for $p$ and $q$ are as follows:
$$
p = \frac{\alpha_0 - \beta_0}{\beta_1 - \alpha_1}\,\iota + \frac{\alpha_2}{\beta_1 - \alpha_1}\,y + \frac{\varepsilon_d - \varepsilon_s}{\beta_1 - \alpha_1} \qquad (66.1.7)
$$
$$
q = \frac{\beta_1\alpha_0 - \beta_0\alpha_1}{\beta_1 - \alpha_1}\,\iota + \frac{\beta_1\alpha_2}{\beta_1 - \alpha_1}\,y + \frac{\beta_1\varepsilon_d - \alpha_1\varepsilon_s}{\beta_1 - \alpha_1} \qquad (66.1.8)
$$
Answer. One gets the reduced form equation for $p$ by simply setting the righthand sides equal:
$$
\beta_0\iota + \beta_1 p + \varepsilon_s = \alpha_0\iota + \alpha_1 p + \alpha_2 y + \varepsilon_d
$$
$$
(\beta_1 - \alpha_1)\,p = (\alpha_0 - \beta_0)\,\iota + \alpha_2 y + \varepsilon_d - \varepsilon_s,
$$
hence (66.1.7). To get the reduced form equation for $q$, plug that for $p$ into the supply function (one might also plug it into the demand function but the math would be more complicated):
$$
q = \beta_0\iota + \beta_1 p + \varepsilon_s
  = \beta_0\iota + \frac{\beta_1(\alpha_0 - \beta_0)}{\beta_1 - \alpha_1}\,\iota
    + \frac{\beta_1\alpha_2}{\beta_1 - \alpha_1}\,y
    + \frac{\beta_1(\varepsilon_d - \varepsilon_s)}{\beta_1 - \alpha_1} + \varepsilon_s
$$
Combining the first two and the last two terms gives (66.1.8). □
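The reduced-form algebra can be verified numerically (a sketch with arbitrary made-up parameter values, not part of the notes): solve the structural system (66.1.4)-(66.1.5) observation by observation and compare with (66.1.7) and (66.1.8).

```python
import numpy as np

rng = np.random.default_rng(8)
t = 10
alpha0, alpha1, alpha2 = 5.0, -1.2, 0.8      # demand parameters (made up)
beta0, beta1 = 1.0, 0.9                       # supply parameters (made up)
y = rng.normal(size=t)
eps_d = rng.normal(size=t)
eps_s = rng.normal(size=t)

# Solve the structural system for (q, p) at each observation:
#   q - alpha1 p = alpha0 + alpha2 y + eps_d      (demand)
#   q - beta1  p = beta0             + eps_s      (supply)
A = np.array([[1.0, -alpha1], [1.0, -beta1]])
rhs = np.column_stack([alpha0 + alpha2 * y + eps_d, beta0 + eps_s])
q, p = np.linalg.solve(A, rhs.T)

# Reduced-form expressions (66.1.7) and (66.1.8)
p_rf = ((alpha0 - beta0) + alpha2 * y + (eps_d - eps_s)) / (beta1 - alpha1)
q_rf = ((beta1 * alpha0 - beta0 * alpha1) + beta1 * alpha2 * y
        + (beta1 * eps_d - alpha1 * eps_s)) / (beta1 - alpha1)

print(np.allclose(p, p_rf), np.allclose(q, q_rf))   # True True
```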
• c. 2 points Show that one will in general not get consistent estimates of the supply equation parameters if one regresses $q$ on $p$ (with an intercept).