Metropolis-Hastings algorithm
Dr. Jarad Niemi
STAT 544 - Iowa State University
April 2, 2019
Outline
Metropolis-Hastings algorithm
Independence proposal
Random-walk proposal
Optimal tuning parameter
Binomial example
Normal example
Binomial hierarchical example
Metropolis-Hastings algorithm
Let
p(θ|y) be the target distribution and
θ(t) be the current draw from p(θ|y).
The Metropolis-Hastings algorithm performs the following
1. propose θ∗ ∼ g(θ|θ(t) )
2. accept θ(t+1) = θ∗ with probability min{1, r} where
\[
r = r(\theta^{(t)}, \theta^*)
  = \frac{p(\theta^*\mid y)\,/\,g(\theta^*\mid\theta^{(t)})}{p(\theta^{(t)}\mid y)\,/\,g(\theta^{(t)}\mid\theta^*)}
  = \frac{p(\theta^*\mid y)\, g(\theta^{(t)}\mid\theta^*)}{p(\theta^{(t)}\mid y)\, g(\theta^*\mid\theta^{(t)})}
\]
otherwise, set θ(t+1) = θ(t) .
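As a concrete reference, here is a minimal sketch of this algorithm in R; the function and argument names (mh, p, rg, dg) are illustrative choices, not notation from the slides.

```r
# A sketch of the Metropolis-Hastings algorithm, assuming the user supplies
#   p : function(theta) returning the target density p(theta | y)
#   rg: function(theta) drawing one proposal from g(. | theta)
#   dg: function(theta_new, theta_old) returning g(theta_new | theta_old)
mh <- function(n_iter, theta0, p, rg, dg) {
  theta <- numeric(n_iter + 1)
  theta[1] <- theta0
  for (t in 1:n_iter) {
    theta_star <- rg(theta[t])                           # 1. propose theta* ~ g(. | theta_t)
    r <- p(theta_star) * dg(theta[t], theta_star) /      # 2. acceptance ratio r
         (p(theta[t]) * dg(theta_star, theta[t]))
    theta[t + 1] <- if (runif(1) < r) theta_star else theta[t]  # accept w.p. min{1, r}
  }
  theta
}

# Illustrative use: N(0, 1) target with a N(theta_t, 1) proposal
draws <- mh(1e4, theta0 = 0, p = dnorm,
            rg = function(theta) rnorm(1, theta, 1),
            dg = function(new, old) dnorm(new, old, 1))
```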
Metropolis-Hastings algorithm
Suppose we only know the target up to a normalizing constant, i.e.
p(θ|y) = q(θ|y)/q(y)
where we only know q(θ|y).
The Metropolis-Hastings algorithm performs the following
1. propose θ∗ ∼ g(θ|θ(t) )
2. accept θ(t+1) = θ∗ with probability min{1, r} where
\[
r = r(\theta^{(t)}, \theta^*)
  = \frac{p(\theta^*\mid y)\, g(\theta^{(t)}\mid\theta^*)}{p(\theta^{(t)}\mid y)\, g(\theta^*\mid\theta^{(t)})}
  = \frac{q(\theta^*\mid y)/q(y)\; g(\theta^{(t)}\mid\theta^*)}{q(\theta^{(t)}\mid y)/q(y)\; g(\theta^*\mid\theta^{(t)})}
  = \frac{q(\theta^*\mid y)\, g(\theta^{(t)}\mid\theta^*)}{q(\theta^{(t)}\mid y)\, g(\theta^*\mid\theta^{(t)})}
\]
otherwise, set θ(t+1) = θ(t) .
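Since q(y) cancels, the ratio can be computed from the unnormalized kernel alone; a quick numerical check of this, assuming a N(0, 1) target (so q(θ|y) = exp(−θ²/2) up to the constant) and an arbitrary N(θ, 1) proposal:

```r
p <- dnorm                                   # normalized target density
q <- function(theta) exp(-theta^2 / 2)       # unnormalized kernel, missing 1/sqrt(2*pi)
g <- function(new, old) dnorm(new, old, 1)   # proposal density g(new | old)

theta_t <- 0.3; theta_star <- 1.2            # arbitrary current and proposed values
r_p <- p(theta_star) * g(theta_t, theta_star) / (p(theta_t) * g(theta_star, theta_t))
r_q <- q(theta_star) * g(theta_t, theta_star) / (q(theta_t) * g(theta_star, theta_t))
all.equal(r_p, r_q)                          # TRUE: the normalizing constant cancels
```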
Metropolis-Hastings algorithm
Two standard Metropolis-Hastings algorithms
Independence Metropolis-Hastings
Independent proposal, i.e. g(θ|θ(t) ) = g(θ)
Random-walk Metropolis
Symmetric proposal, i.e. g(θ|θ(t) ) = g(θ(t) |θ) for all θ, θ(t) .
Independence Metropolis-Hastings
Let
p(θ|y) ∝ q(θ|y) be the target distribution,
θ(t) be the current draw from p(θ|y), and
g(θ|θ(t) ) = g(θ), i.e. the proposal is independent of the current value.
The independence Metropolis-Hastings algorithm performs the following
1. propose θ∗ ∼ g(θ)
2. accept θ(t+1) = θ∗ with probability min{1, r} where
\[
r = \frac{q(\theta^*\mid y)\,/\,g(\theta^*)}{q(\theta^{(t)}\mid y)\,/\,g(\theta^{(t)})}
  = \frac{q(\theta^*\mid y)\, g(\theta^{(t)})}{q(\theta^{(t)}\mid y)\, g(\theta^*)}
\]
otherwise, set θ(t+1) = θ(t) .
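A sketch of this sampler in R, written on the log scale for numerical stability; log_q, rg, and dg_log are illustrative names for log q(θ|y), a draw from g, and log g(θ).

```r
independence_mh <- function(n_iter, theta0, log_q, rg, dg_log) {
  theta <- numeric(n_iter + 1)
  theta[1] <- theta0
  for (t in 1:n_iter) {
    theta_star <- rg()                                   # proposal ignores the current value
    log_r <- log_q(theta_star) - log_q(theta[t]) +       # log of q(theta*|y) g(theta_t) /
             dg_log(theta[t]) - dg_log(theta_star)       #        q(theta_t|y) g(theta*)
    theta[t + 1] <- if (log(runif(1)) < log_r) theta_star else theta[t]
  }
  theta
}
```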
Independence Metropolis-Hastings
Intuition through examples
[Figure: 3 × 3 grid of panels (current = −1, 0, 1 by proposed = −1, 0, 1) showing the proposal and target densities over θ, with the current and proposed values marked and the proposed value colored by whether it is accepted.]
Independence Metropolis-Hastings
Example: Normal-Cauchy model
Let Y ∼ N (θ, 1) with θ ∼ Ca(0, 1) such that the posterior is
\[
p(\theta\mid y) \propto p(y\mid\theta)\,p(\theta) \propto \frac{\exp\!\left(-(y-\theta)^2/2\right)}{1+\theta^2}.
\]
Use N(y, 1) as the proposal; then the Metropolis-Hastings acceptance probability is min{1, r} with
\[
r = \frac{q(\theta^*\mid y)\, g(\theta^{(t)})}{q(\theta^{(t)}\mid y)\, g(\theta^*)}
  = \frac{\dfrac{\exp\!\left(-(y-\theta^*)^2/2\right)}{1+(\theta^*)^2}\,\exp\!\left(-(\theta^{(t)}-y)^2/2\right)}
         {\dfrac{\exp\!\left(-(y-\theta^{(t)})^2/2\right)}{1+(\theta^{(t)})^2}\,\exp\!\left(-(\theta^*-y)^2/2\right)}
  = \frac{1+(\theta^{(t)})^2}{1+(\theta^*)^2}.
\]
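A sketch of this sampler in R; the data value y is not specified on the slide, so y = 1 below is an arbitrary illustrative choice. Because the exponential terms cancel, the acceptance ratio reduces to the final expression above.

```r
set.seed(1)
y <- 1                                           # assumed data value, for illustration only
n_iter <- 1e4
theta <- numeric(n_iter + 1)
theta[1] <- y                                    # start at the proposal mean
for (t in 1:n_iter) {
  theta_star <- rnorm(1, y, 1)                   # independence proposal N(y, 1)
  r <- (1 + theta[t]^2) / (1 + theta_star^2)     # simplified acceptance ratio
  theta[t + 1] <- if (runif(1) < r) theta_star else theta[t]
}
mean(theta)                                      # estimate of E(theta | y)
```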
[Figure: the N(y, 1) proposal density and the target posterior density plotted over θ.]
[Figure: trace plots of θ over 100 iterations of independence Metropolis-Hastings, started from a typical value (top) and from a poor starting value (bottom).]
Independence Metropolis-Hastings
Need heavy tails
Recall that
rejection sampling requires the proposal to have heavy tails and
importance sampling is efficient only when the proposal has heavy
tails.
Independence Metropolis-Hastings also requires heavy-tailed proposals for efficiency. If θ(t) is in a region where p(θ(t)|y) ≫ g(θ(t)), i.e. the target has heavier tails than the proposal, then any proposal θ∗ such that p(θ∗|y) ≈ g(θ∗), i.e. in the center of the target and proposal, will result in
\[
r = \frac{p(\theta^*\mid y)\, g(\theta^{(t)})}{p(\theta^{(t)}\mid y)\, g(\theta^*)} \approx 0
\]
and few samples will be accepted.
Independence Metropolis-Hastings
Need heavy tails - example
Suppose θ|y ∼ Ca(0, 1) and we use a standard normal as a proposal. Then
[Figure: the Ca(0, 1) target and N(0, 1) proposal densities (top) and the target-to-proposal density ratio on a log scale (bottom), which grows rapidly in the tails.]
[Figure: trace plot of θ over 1,000 iterations of independence Metropolis-Hastings for the Ca(0, 1) target with a N(0, 1) proposal.]
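A sketch of this experiment: independence Metropolis-Hastings with the Ca(0, 1) target and a N(0, 1) proposal, tracking the acceptance indicator so the sticking behaviour can be summarized.

```r
set.seed(2)
n_iter <- 1e4
theta  <- numeric(n_iter + 1)
accept <- logical(n_iter)
for (t in 1:n_iter) {
  theta_star <- rnorm(1)                                        # light-tailed N(0, 1) proposal
  log_r <- dcauchy(theta_star, log = TRUE) - dcauchy(theta[t], log = TRUE) +
           dnorm(theta[t],     log = TRUE) - dnorm(theta_star,  log = TRUE)
  accept[t]    <- log(runif(1)) < log_r
  theta[t + 1] <- if (accept[t]) theta_star else theta[t]
}
mean(accept)                               # overall acceptance rate
with(rle(accept), max(lengths[!values]))   # longest run of consecutive rejections
```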
Random-walk Metropolis
Let
p(θ|y) ∝ q(θ|y) be the target distribution,
θ(t) be the current draw from p(θ|y), and
g(θ∗ |θ(t) ) = g(θ(t) |θ∗ ), i.e. the proposal is symmetric.
The Metropolis algorithm performs the following
1. propose θ∗ ∼ g(θ|θ(t) )
2. accept θ(t+1) = θ∗ with probability min{1, r} where
\[
r = \frac{q(\theta^*\mid y)\, g(\theta^{(t)}\mid\theta^*)}{q(\theta^{(t)}\mid y)\, g(\theta^*\mid\theta^{(t)})}
  = \frac{q(\theta^*\mid y)}{q(\theta^{(t)}\mid y)}
\]
otherwise, set θ(t+1) = θ(t) .
This is also referred to as random-walk Metropolis.
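A sketch in R, using a normal random walk as one common symmetric proposal; only the log unnormalized target log_q is needed since the proposal density cancels.

```r
rw_metropolis <- function(n_iter, theta0, log_q, v = 1) {
  theta <- numeric(n_iter + 1)
  theta[1] <- theta0
  for (t in 1:n_iter) {
    theta_star <- rnorm(1, theta[t], v)              # symmetric N(theta_t, v^2) proposal
    log_r <- log_q(theta_star) - log_q(theta[t])     # proposal terms cancel
    theta[t + 1] <- if (log(runif(1)) < log_r) theta_star else theta[t]
  }
  theta
}
```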
Random-walk Metropolis
Stochastic hill climbing
Notice that r = q(θ∗|y)/q(θ(t)|y), so the algorithm always accepts whenever the target density is larger at the proposed value than at the current value.
Suppose θ|y ∼ N(0, 1), θ(t) = 1, and θ∗ ∼ N(θ(t), 1).

[Figure: the N(0, 1) target and N(1, 1) proposal densities plotted over x.]
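A quick numerical check of this behaviour for the setup above; the proposed values below are arbitrary.

```r
theta_t    <- 1
theta_star <- c(-0.5, 0, 0.5, 1.5, 2)            # some hypothetical proposed values
r <- dnorm(theta_star) / dnorm(theta_t)          # r = q(theta* | y) / q(theta_t | y)
data.frame(theta_star, r, always_accepted = r >= 1)
```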
Random-walk Metropolis
Example: Normal-Cauchy model
Let Y ∼ N (θ, 1) with θ ∼ Ca(0, 1) such that the posterior is
\[
p(\theta\mid y) \propto p(y\mid\theta)\,p(\theta) \propto \frac{\exp\!\left(-(y-\theta)^2/2\right)}{1+\theta^2}.
\]
Use N(θ(t), v²) as the proposal; then the acceptance probability is min{1, r} with
\[
r = \frac{q(\theta^*\mid y)}{q(\theta^{(t)}\mid y)}
  = \frac{p(y\mid\theta^*)\,p(\theta^*)}{p(y\mid\theta^{(t)})\,p(\theta^{(t)})}.
\]
For this example, let v² = 1.
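A sketch of this sampler with v² = 1; as before, y = 1 is an arbitrary illustrative data value and the ratio is computed on the log scale.

```r
set.seed(3)
y <- 1; v <- 1; n_iter <- 1e4
log_q <- function(theta) -(y - theta)^2 / 2 - log(1 + theta^2)   # log unnormalized posterior
theta <- numeric(n_iter + 1)
for (t in 1:n_iter) {
  theta_star <- rnorm(1, theta[t], v)                            # N(theta_t, v^2) proposal
  log_r <- log_q(theta_star) - log_q(theta[t])
  theta[t + 1] <- if (log(runif(1)) < log_r) theta_star else theta[t]
}
quantile(theta[-(1:1000)], c(0.025, 0.5, 0.975))                 # drop some burn-in draws
```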
[Figure: trace plots of θ over 100 iterations of random-walk Metropolis, started from a typical value (top) and from a poor starting value (bottom).]
Random-walk Metropolis
Optimal tuning parameter
Random-walk tuning parameter
Let p(θ|y) be the target distribution, the proposal be symmetric with scale v², and θ(t) be (approximately) distributed according to p(θ|y).

If v² ≈ 0, then θ∗ ≈ θ(t) and
\[
r = \frac{q(\theta^*\mid y)}{q(\theta^{(t)}\mid y)} \approx 1
\]
and all proposals are accepted, but θ∗ ≈ θ(t).

As v² → ∞, q(θ∗|y) ≈ 0 since θ∗ will be far from the mass of the target distribution and
\[
r = \frac{q(\theta^*\mid y)}{q(\theta^{(t)}\mid y)} \approx 0
\]
so all proposed values are rejected.
So there is an optimal v² somewhere in between. For normal targets, the optimal random-walk proposal variance is 2.4² Var(θ|y)/d, where d is the dimension of θ, which results in an acceptance rate of about 40% for d = 1 down to about 20% as d → ∞.
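A sketch comparing acceptance rates for a N(0, 1) target across a few illustrative values of the proposal standard deviation v:

```r
set.seed(4)
acceptance_rate <- function(v, n_iter = 1e4) {
  theta <- 0
  accept <- logical(n_iter)
  for (t in 1:n_iter) {
    theta_star <- rnorm(1, theta, v)
    log_r <- dnorm(theta_star, log = TRUE) - dnorm(theta, log = TRUE)
    if (log(runif(1)) < log_r) { theta <- theta_star; accept[t] <- TRUE }
  }
  mean(accept)
}
sapply(c(0.1, 1, 2.4, 10), acceptance_rate)  # v near 2.4 gives roughly the 40% rate cited above
```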
Random-walk Metropolis
Optimal tuning parameter
Random-walk with tuning parameter that is too big and too small
Let y|θ ∼ N (θ, 1), θ ∼ Ca(0, 1), and y = 1.
[Figure: trace plots of θ over 100 iterations of random-walk Metropolis with tuning parameter v = 0.1 and v = 10.]
Random-walk Metropolis
Binomial model
Let Y ∼ Bin(n, θ) and θ ∼ Be(1/2, 1/2), thus the posterior is
\[
p(\theta\mid y) \propto \theta^{y-0.5}(1-\theta)^{n-y-0.5}\,\mathrm{I}(0<\theta<1).
\]
To construct a random-walk Metropolis algorithm, we choose the proposal
θ∗ ∼ N(θ(t), 0.4²)
and accept, i.e. θ(t+1) = θ∗ with probability min{1, r} where
\[
r = \frac{p(\theta^*\mid y)}{p(\theta^{(t)}\mid y)}
  = \frac{(\theta^*)^{y-0.5}(1-\theta^*)^{n-y-0.5}\,\mathrm{I}(0<\theta^*<1)}{(\theta^{(t)})^{y-0.5}(1-\theta^{(t)})^{n-y-0.5}\,\mathrm{I}(0<\theta^{(t)}<1)}
\]
otherwise, set θ(t+1) = θ(t) .
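A sketch of this sampler in R; n = 10 and y = 3 are arbitrary illustrative data, and the indicator is handled by returning −∞ for the log target outside (0, 1), so those proposals are always rejected.

```r
set.seed(5)
n <- 10; y <- 3                                         # assumed data, for illustration only
log_q <- function(theta) {
  if (theta <= 0 || theta >= 1) return(-Inf)            # indicator I(0 < theta < 1)
  (y - 0.5) * log(theta) + (n - y - 0.5) * log(1 - theta)
}
n_iter <- 1e4
theta <- numeric(n_iter + 1)
theta[1] <- y / n
for (t in 1:n_iter) {
  theta_star <- rnorm(1, theta[t], 0.4)                 # N(theta_t, 0.4^2) proposal
  log_r <- log_q(theta_star) - log_q(theta[t])
  theta[t + 1] <- if (log(runif(1)) < log_r) theta_star else theta[t]
}
mean(theta[-(1:1000)])   # compare with the Be(y + 0.5, n - y + 0.5) mean, (y + 0.5)/(n + 1)
```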