Metropolis-Hastings algorithm
Dr. Jarad Niemi
STAT 544 - Iowa State University
April 2, 2019
Outline
Metropolis-Hastings algorithm
Independence proposal
Random-walk proposal
Optimal tuning parameter
Binomial example
Normal example
Binomial hierarchical example
Metropolis-Hastings algorithm
Let
p(θ|y) be the target distribution and
θ(t) be the current draw from p(θ|y).
The Metropolis-Hastings algorithm performs the following
1. propose θ∗ ∼ g(θ|θ(t) )
2. accept θ(t+1) = θ∗ with probability min{1, r} where
\[
r = r(\theta^{(t)}, \theta^*)
  = \frac{p(\theta^*\mid y)\,/\,g(\theta^*\mid\theta^{(t)})}{p(\theta^{(t)}\mid y)\,/\,g(\theta^{(t)}\mid\theta^*)}
  = \frac{p(\theta^*\mid y)\, g(\theta^{(t)}\mid\theta^*)}{p(\theta^{(t)}\mid y)\, g(\theta^*\mid\theta^{(t)})}
\]
otherwise, set θ(t+1) = θ(t) .
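As a concrete reference, here is a minimal sketch of this algorithm in R; the function and argument names (mh, p, rg, dg) are illustrative choices, not notation from the slides.

```r
# A sketch of the Metropolis-Hastings algorithm, assuming the user supplies
#   p : function(theta) returning the target density p(theta | y)
#   rg: function(theta) drawing one proposal from g(. | theta)
#   dg: function(theta_new, theta_old) returning g(theta_new | theta_old)
mh <- function(n_iter, theta0, p, rg, dg) {
  theta <- numeric(n_iter + 1)
  theta[1] <- theta0
  for (t in 1:n_iter) {
    theta_star <- rg(theta[t])                           # 1. propose theta* ~ g(. | theta_t)
    r <- p(theta_star) * dg(theta[t], theta_star) /      # 2. acceptance ratio r
         (p(theta[t]) * dg(theta_star, theta[t]))
    theta[t + 1] <- if (runif(1) < r) theta_star else theta[t]  # accept w.p. min{1, r}
  }
  theta
}

# Illustrative use: N(0, 1) target with a N(theta_t, 1) proposal
draws <- mh(1e4, theta0 = 0, p = dnorm,
            rg = function(theta) rnorm(1, theta, 1),
            dg = function(new, old) dnorm(new, old, 1))
```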
Metropolis-Hastings algorithm
Suppose we only know the target up to a normalizing constant, i.e.
p(θ|y) = q(θ|y)/q(y)
where we only know q(θ|y).
The Metropolis-Hastings algorithm performs the following
1. propose θ∗ ∼ g(θ|θ(t) )
2. accept θ(t+1) = θ∗ with probability min{1, r} where
\[
r = r(\theta^{(t)}, \theta^*)
  = \frac{p(\theta^*\mid y)\, g(\theta^{(t)}\mid\theta^*)}{p(\theta^{(t)}\mid y)\, g(\theta^*\mid\theta^{(t)})}
  = \frac{q(\theta^*\mid y)/q(y)\; g(\theta^{(t)}\mid\theta^*)}{q(\theta^{(t)}\mid y)/q(y)\; g(\theta^*\mid\theta^{(t)})}
  = \frac{q(\theta^*\mid y)\, g(\theta^{(t)}\mid\theta^*)}{q(\theta^{(t)}\mid y)\, g(\theta^*\mid\theta^{(t)})}
\]
otherwise, set θ(t+1) = θ(t) .
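Since q(y) cancels, the ratio can be computed from the unnormalized kernel alone; a quick numerical check of this, assuming a N(0, 1) target (so q(θ|y) = exp(−θ²/2) up to the constant) and an arbitrary N(θ, 1) proposal:

```r
p <- dnorm                                   # normalized target density
q <- function(theta) exp(-theta^2 / 2)       # unnormalized kernel, missing 1/sqrt(2*pi)
g <- function(new, old) dnorm(new, old, 1)   # proposal density g(new | old)

theta_t <- 0.3; theta_star <- 1.2            # arbitrary current and proposed values
r_p <- p(theta_star) * g(theta_t, theta_star) / (p(theta_t) * g(theta_star, theta_t))
r_q <- q(theta_star) * g(theta_t, theta_star) / (q(theta_t) * g(theta_star, theta_t))
all.equal(r_p, r_q)                          # TRUE: the normalizing constant cancels
```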
Metropolis-Hastings algorithm
Two standard Metropolis-Hastings algorithms
Independence Metropolis-Hastings
Independent proposal, i.e. g(θ|θ(t) ) = g(θ)
Random-walk Metropolis
Symmetric proposal, i.e. g(θ|θ(t) ) = g(θ(t) |θ) for all θ, θ(t) .
Independence Metropolis-Hastings
Let
p(θ|y) ∝ q(θ|y) be the target distribution,
θ(t) be the current draw from p(θ|y), and
g(θ|θ(t) ) = g(θ), i.e. the proposal is independent of the current value.
The independence Metropolis-Hastings algorithm performs the following
1. propose θ∗ ∼ g(θ)
2. accept θ(t+1) = θ∗ with probability min{1, r} where
\[
r = \frac{q(\theta^*\mid y)\,/\,g(\theta^*)}{q(\theta^{(t)}\mid y)\,/\,g(\theta^{(t)})}
  = \frac{q(\theta^*\mid y)\, g(\theta^{(t)})}{q(\theta^{(t)}\mid y)\, g(\theta^*)}
\]
otherwise, set θ(t+1) = θ(t) .
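A sketch of this sampler in R, written on the log scale for numerical stability; log_q, rg, and dg_log are illustrative names for log q(θ|y), a draw from g, and log g(θ).

```r
independence_mh <- function(n_iter, theta0, log_q, rg, dg_log) {
  theta <- numeric(n_iter + 1)
  theta[1] <- theta0
  for (t in 1:n_iter) {
    theta_star <- rg()                                   # proposal ignores the current value
    log_r <- log_q(theta_star) - log_q(theta[t]) +       # log of q(theta*|y) g(theta_t) /
             dg_log(theta[t]) - dg_log(theta_star)       #        q(theta_t|y) g(theta*)
    theta[t + 1] <- if (log(runif(1)) < log_r) theta_star else theta[t]
  }
  theta
}
```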
Independence Metropolis-Hastings
Intuition through examples
[Figure: 3 × 3 grid of panels (current = −1, 0, 1 by proposed = −1, 0, 1) showing the proposal and target densities over θ, with the current and proposed values marked and the proposed value colored by whether it is accepted.]
Independence Metropolis-Hastings
Example: Normal-Cauchy model
Let Y ∼ N (θ, 1) with θ ∼ Ca(0, 1) such that the posterior is
\[
p(\theta\mid y) \propto p(y\mid\theta)\,p(\theta) \propto \frac{\exp\!\left(-(y-\theta)^2/2\right)}{1+\theta^2}.
\]
Use N(y, 1) as the proposal; then the Metropolis-Hastings acceptance probability is min{1, r} with
\[
r = \frac{q(\theta^*\mid y)\, g(\theta^{(t)})}{q(\theta^{(t)}\mid y)\, g(\theta^*)}
  = \frac{\dfrac{\exp\!\left(-(y-\theta^*)^2/2\right)}{1+(\theta^*)^2}\,\exp\!\left(-(\theta^{(t)}-y)^2/2\right)}
         {\dfrac{\exp\!\left(-(y-\theta^{(t)})^2/2\right)}{1+(\theta^{(t)})^2}\,\exp\!\left(-(\theta^*-y)^2/2\right)}
  = \frac{1+(\theta^{(t)})^2}{1+(\theta^*)^2}.
\]
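A sketch of this sampler in R; the data value y is not specified on the slide, so y = 1 below is an arbitrary illustrative choice. Because the exponential terms cancel, the acceptance ratio reduces to the final expression above.

```r
set.seed(1)
y <- 1                                           # assumed data value, for illustration only
n_iter <- 1e4
theta <- numeric(n_iter + 1)
theta[1] <- y                                    # start at the proposal mean
for (t in 1:n_iter) {
  theta_star <- rnorm(1, y, 1)                   # independence proposal N(y, 1)
  r <- (1 + theta[t]^2) / (1 + theta_star^2)     # simplified acceptance ratio
  theta[t + 1] <- if (runif(1) < r) theta_star else theta[t]
}
mean(theta)                                      # estimate of E(theta | y)
```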
[Figure: the N(y, 1) proposal density and the target posterior density plotted over θ.]
[Figure: trace plots of θ over 100 iterations of independence Metropolis-Hastings, started from a typical value (top) and from a poor starting value (bottom).]
Independence Metropolis-Hastings
Need heavy tails
Recall that
rejection sampling requires the proposal to have heavy tails and
importance sampling is efficient only when the proposal has heavy
tails.
Independence Metropolis-Hastings also requires heavy-tailed proposals for efficiency. If θ(t) is in a region where p(θ(t)|y) ≫ g(θ(t)), i.e. the target has heavier tails than the proposal, then any proposal θ∗ such that p(θ∗|y) ≈ g(θ∗), i.e. in the center of the target and proposal, will result in
\[
r = \frac{p(\theta^*\mid y)\, g(\theta^{(t)})}{p(\theta^{(t)}\mid y)\, g(\theta^*)} \approx 0
\]
and few samples will be accepted.
Independence Metropolis-Hastings
Need heavy tails - example
Suppose θ|y ∼ Ca(0, 1) and we use a standard normal as a proposal. Then
[Figure: the Ca(0, 1) target and N(0, 1) proposal densities (top) and the target-to-proposal density ratio on a log scale (bottom), which grows rapidly in the tails.]
[Figure: trace plot of θ over 1,000 iterations of independence Metropolis-Hastings for the Ca(0, 1) target with a N(0, 1) proposal.]
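A sketch of this experiment: independence Metropolis-Hastings with the Ca(0, 1) target and a N(0, 1) proposal, tracking the acceptance indicator so the sticking behaviour can be summarized.

```r
set.seed(2)
n_iter <- 1e4
theta  <- numeric(n_iter + 1)
accept <- logical(n_iter)
for (t in 1:n_iter) {
  theta_star <- rnorm(1)                                        # light-tailed N(0, 1) proposal
  log_r <- dcauchy(theta_star, log = TRUE) - dcauchy(theta[t], log = TRUE) +
           dnorm(theta[t],     log = TRUE) - dnorm(theta_star,  log = TRUE)
  accept[t]    <- log(runif(1)) < log_r
  theta[t + 1] <- if (accept[t]) theta_star else theta[t]
}
mean(accept)                               # overall acceptance rate
with(rle(accept), max(lengths[!values]))   # longest run of consecutive rejections
```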
Random-walk Metropolis
Let
p(θ|y) ∝ q(θ|y) be the target distribution,
θ(t) be the current draw from p(θ|y), and
g(θ∗ |θ(t) ) = g(θ(t) |θ∗ ), i.e. the proposal is symmetric.
The Metropolis algorithm performs the following
1. propose θ∗ ∼ g(θ|θ(t) )
2. accept θ(t+1) = θ∗ with probability min{1, r} where
\[
r = \frac{q(\theta^*\mid y)\, g(\theta^{(t)}\mid\theta^*)}{q(\theta^{(t)}\mid y)\, g(\theta^*\mid\theta^{(t)})}
  = \frac{q(\theta^*\mid y)}{q(\theta^{(t)}\mid y)}
\]
otherwise, set θ(t+1) = θ(t) .
This is also referred to as random-walk Metropolis.
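A sketch in R, using a normal random walk as one common symmetric proposal; only the log unnormalized target log_q is needed since the proposal density cancels.

```r
rw_metropolis <- function(n_iter, theta0, log_q, v = 1) {
  theta <- numeric(n_iter + 1)
  theta[1] <- theta0
  for (t in 1:n_iter) {
    theta_star <- rnorm(1, theta[t], v)              # symmetric N(theta_t, v^2) proposal
    log_r <- log_q(theta_star) - log_q(theta[t])     # proposal terms cancel
    theta[t + 1] <- if (log(runif(1)) < log_r) theta_star else theta[t]
  }
  theta
}
```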
Random-walk Metropolis
Stochastic hill climbing
Notice that r = q(θ∗|y)/q(θ(t)|y), so the algorithm always accepts whenever the target density is larger at the proposed value than at the current value.
Suppose θ|y ∼ N(0, 1), θ(t) = 1, and θ∗ ∼ N(θ(t), 1).

[Figure: the N(0, 1) target and N(1, 1) proposal densities plotted over x.]
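A quick numerical check of this behaviour for the setup above; the proposed values below are arbitrary.

```r
theta_t    <- 1
theta_star <- c(-0.5, 0, 0.5, 1.5, 2)            # some hypothetical proposed values
r <- dnorm(theta_star) / dnorm(theta_t)          # r = q(theta* | y) / q(theta_t | y)
data.frame(theta_star, r, always_accepted = r >= 1)
```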
Random-walk Metropolis
Example: Normal-Cauchy model
Let Y ∼ N (θ, 1) with θ ∼ Ca(0, 1) such that the posterior is
\[
p(\theta\mid y) \propto p(y\mid\theta)\,p(\theta) \propto \frac{\exp\!\left(-(y-\theta)^2/2\right)}{1+\theta^2}.
\]
Use N(θ(t), v²) as the proposal; then the acceptance probability is min{1, r} with
\[
r = \frac{q(\theta^*\mid y)}{q(\theta^{(t)}\mid y)}
  = \frac{p(y\mid\theta^*)\,p(\theta^*)}{p(y\mid\theta^{(t)})\,p(\theta^{(t)})}.
\]
For this example, let v² = 1.
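A sketch of this sampler with v² = 1; as before, y = 1 is an arbitrary illustrative data value and the ratio is computed on the log scale.

```r
set.seed(3)
y <- 1; v <- 1; n_iter <- 1e4
log_q <- function(theta) -(y - theta)^2 / 2 - log(1 + theta^2)   # log unnormalized posterior
theta <- numeric(n_iter + 1)
for (t in 1:n_iter) {
  theta_star <- rnorm(1, theta[t], v)                            # N(theta_t, v^2) proposal
  log_r <- log_q(theta_star) - log_q(theta[t])
  theta[t + 1] <- if (log(runif(1)) < log_r) theta_star else theta[t]
}
quantile(theta[-(1:1000)], c(0.025, 0.5, 0.975))                 # drop some burn-in draws
```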
[Figure: trace plots of θ over 100 iterations of random-walk Metropolis, started from a typical value (top) and from a poor starting value (bottom).]
Random-walk Metropolis
Optimal tuning parameter
Random-walk tuning parameter
Let p(θ|y) be the target distribution, the proposal be symmetric with scale v², and θ(t) be (approximately) distributed according to p(θ|y).

If v² ≈ 0, then θ∗ ≈ θ(t) and
\[
r = \frac{q(\theta^*\mid y)}{q(\theta^{(t)}\mid y)} \approx 1
\]
and all proposals are accepted, but θ∗ ≈ θ(t).

As v² → ∞, q(θ∗|y) ≈ 0 since θ∗ will be far from the mass of the target distribution and
\[
r = \frac{q(\theta^*\mid y)}{q(\theta^{(t)}\mid y)} \approx 0
\]
so all proposed values are rejected.
So there is an optimal v² somewhere in between. For normal targets, the optimal random-walk proposal variance is 2.4² Var(θ|y)/d, where d is the dimension of θ, which results in an acceptance rate of about 40% for d = 1 down to about 20% as d → ∞.
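A sketch comparing acceptance rates for a N(0, 1) target across a few illustrative values of the proposal standard deviation v:

```r
set.seed(4)
acceptance_rate <- function(v, n_iter = 1e4) {
  theta <- 0
  accept <- logical(n_iter)
  for (t in 1:n_iter) {
    theta_star <- rnorm(1, theta, v)
    log_r <- dnorm(theta_star, log = TRUE) - dnorm(theta, log = TRUE)
    if (log(runif(1)) < log_r) { theta <- theta_star; accept[t] <- TRUE }
  }
  mean(accept)
}
sapply(c(0.1, 1, 2.4, 10), acceptance_rate)  # v near 2.4 gives roughly the 40% rate cited above
```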
Random-walk Metropolis
Optimal tuning parameter
Random-walk with tuning parameter that is too big and too small
Let y|θ ∼ N (θ, 1), θ ∼ Ca(0, 1), and y = 1.
[Figure: trace plots of θ over 100 iterations of random-walk Metropolis with tuning parameter v = 0.1 and v = 10.]
Random-walk Metropolis
Binomial model
Let Y ∼ Bin(n, θ) and θ ∼ Be(1/2, 1/2), thus the posterior is
\[
p(\theta\mid y) \propto \theta^{y-0.5}(1-\theta)^{n-y-0.5}\,\mathrm{I}(0<\theta<1).
\]
To construct a random-walk Metropolis algorithm, we choose the proposal
θ∗ ∼ N(θ(t), 0.4²)
and accept, i.e. θ(t+1) = θ∗ with probability min{1, r} where
\[
r = \frac{p(\theta^*\mid y)}{p(\theta^{(t)}\mid y)}
  = \frac{(\theta^*)^{y-0.5}(1-\theta^*)^{n-y-0.5}\,\mathrm{I}(0<\theta^*<1)}{(\theta^{(t)})^{y-0.5}(1-\theta^{(t)})^{n-y-0.5}\,\mathrm{I}(0<\theta^{(t)}<1)}
\]
otherwise, set θ(t+1) = θ(t) .
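A sketch of this sampler in R; n = 10 and y = 3 are arbitrary illustrative data, and the indicator is handled by returning −∞ for the log target outside (0, 1), so those proposals are always rejected.

```r
set.seed(5)
n <- 10; y <- 3                                         # assumed data, for illustration only
log_q <- function(theta) {
  if (theta <= 0 || theta >= 1) return(-Inf)            # indicator I(0 < theta < 1)
  (y - 0.5) * log(theta) + (n - y - 0.5) * log(1 - theta)
}
n_iter <- 1e4
theta <- numeric(n_iter + 1)
theta[1] <- y / n
for (t in 1:n_iter) {
  theta_star <- rnorm(1, theta[t], 0.4)                 # N(theta_t, 0.4^2) proposal
  log_r <- log_q(theta_star) - log_q(theta[t])
  theta[t + 1] <- if (log(runif(1)) < log_r) theta_star else theta[t]
}
mean(theta[-(1:1000)])   # compare with the Be(y + 0.5, n - y + 0.5) mean, (y + 0.5)/(n + 1)
```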