
JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS: Vol. 105, No. 2, pp. 347–369, MAY 2000

Optimal Control Problem for the Lyapunov Exponents of Random Matrix Products^1

N. H. DU^2
Communicated by G. P. Papavassilopoulos

Abstract. This paper deals with the optimal control problem for the
Lyapunov exponents of stochastic matrix products when these matrices
depend on a controlled Markov process with values in a finite or
countable set. Under some hypotheses, the reduced process satisfies the
Doeblin condition and the existence of an optimal control is proved.
Furthermore, with this optimal control, the spectrum of the system consists of only one element.
Key Words. Random matrix products, Lyapunov exponents, Markov
processes, decision models, optimal policy, optimal control, system
spectrum.

1. Introduction
In this article, we deal with an optimal decision problem in which the
objective function is the essential supremum of the Lyapunov exponents for
a dynamical system described by random matrix products when these
matrices depend on a controlled Markov process $(\xi_n)$ with values in a finite or countable set $I$. We assume that $(\xi_n)$ has transition probability

$$P(a) = (P_{ij}(a) : i, j \in I),$$

which depends on a control parameter $a$.
^1 The author thanks Professor C. C. Heyde, Dr. D. J. Daley, Mrs. Lynne Simpson, and Ms. Jenny Goodwin for helping him during his stay in the School of Mathematical Sciences, Australian National University, Canberra, Australia. He thanks the referee for suggesting constructive ideas for this article.
^2 Professor, Faculty of Mathematics, Mechanics, and Informatics, Vietnam National University, Thanh Xuan, Hanoi, Vietnam.


For any admissible control $(u_t)$, which we shall define precisely below, we consider the $\mathbb{R}^d$-valued random variables $(X_n : n = 0, 1, \ldots)$ given by the following difference equation:

$$X_{n+1} = M(\xi_{n+1}, Y_{n+1}) X_n, \tag{1a}$$
$$X_0 = x \in \mathbb{R}^d, \tag{1b}$$

where $(Y_n)$ is a sequence of i.i.d. random variables and the $M(i, y)$ are invertible $d \times d$ matrices. The behavior of the solutions of the system (1) when the transition probability does not depend on $a$ has been studied by many authors; see Refs. 1–3. Let $X_n^u(x)$ be the solution of (1) associated with the control $(u_t)$. The process $(u_t)$ affects the solutions of the system through the transition probability $P(a)$. We define the Lyapunov exponent of $X_n^u(x)$ by

$$\lambda^u[x] = \limsup_{n \to \infty} \, (1/n) \log |X_n^u(x)|. \tag{2}$$
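Definition (2) is easy to approximate numerically. The sketch below is purely illustrative and not from the paper: it assumes two states $I = \{0, 1\}$, one fixed transition matrix (a constant control), standard normal $Y_n$, and matrices of the form used in Example 2.1 below; it estimates $\lambda^u[x]$ by iterating (1) with renormalization so the product never overflows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions (not from the paper): two states I = {0, 1},
# one fixed transition matrix (i.e., a constant control), Y_n ~ N(0, 1).
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

def M(i, y):
    # Invertible a.s.: det = -y (i = 0) or +y (i = 1), nonzero for y != 0.
    sign = 1.0 if i == 0 else -1.0
    return np.array([[1.0, sign * y],
                     [1.0, 0.0]])

def lyapunov_estimate(x0, n_steps=100_000):
    """Estimate (1/n) log |X_n| for X_{n+1} = M(xi_{n+1}, Y_{n+1}) X_n,
    accumulating the norm in log scale so the product never overflows."""
    xi = 0
    x = np.asarray(x0, dtype=float)
    log_norm = np.log(np.linalg.norm(x))
    x = x / np.linalg.norm(x)
    for _ in range(n_steps):
        xi = rng.choice(2, p=P[xi])        # next state of the chain (xi_n)
        x = M(xi, rng.standard_normal()) @ x
        r = np.linalg.norm(x)
        log_norm += np.log(r)
        x = x / r                          # renormalize; keep direction only
    return log_norm / n_steps

print(lyapunov_estimate([1.0, 0.0]))       # approximates lambda^u[x] in (2)
```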

For any admissible control $u$, the Lyapunov exponent of the system (1) is in general a random variable; in order to exclude the randomness, we introduce here the new concept of essential Lyapunov exponent of $X_n^u(x)$, defined as follows:

$$\Lambda^u[x] = P\text{-}\operatorname{ess\,sup} \, \lambda^u[x],$$

where $P\text{-}\operatorname{ess\,sup}$ denotes the essential supremum taken under the probability $P$. Because of the linearity of (1), it is easy to verify that the map $x \to \Lambda^u[x]$ satisfies the general properties of Lyapunov exponents, i.e.,

(a) $\Lambda^u[\alpha x] = \Lambda^u[x]$, for any $\alpha \ne 0$, $x \in \mathbb{R}^d$,
(b) $\Lambda^u[x + y] \le \max\{\Lambda^u[x], \Lambda^u[y]\}$.

Therefore, $\Lambda^u[\cdot]$ takes at most $d$ finite values, namely,

$$\Lambda_1^u \le \Lambda_2^u \le \cdots \le \Lambda_d^u.$$

It is well known that the trivial solution $X \equiv 0$ of (1) is stable if

$$\sup_{|x| \le 1} \Lambda^u[x] < 0. \tag{3}$$

Hence, for the trivial solution $X \equiv 0$ to be stable, it suffices to choose a control $(u_t)$ such that condition (3) holds. However, this condition is not always satisfied because, in some cases, among the class of admissible controls, we are not able to find a control $(u_t)$ that yields negative Lyapunov exponents for the solutions of (1). So, in view of applications, it is natural to look for a control $(u_t)$ with which the system (1) is nearest to stability. This means that the Lyapunov exponents of our system must be as small as possible.



This question leads us to consider the problem of minimizing the function $\Lambda^u[x]$ over the class of admissible controls. In this article, the main idea for solving this problem is to relate it to a Markov decision problem with average cost per unit time.
The paper is organized as follows. Section 2 introduces the fundamental
notations and hypotheses, in terms of which we define the policies and
objective function for the problem. Sections 3–4 contain the main results:
we reduce the state space and prove that, under the assumptions introduced
in Section 2, our model satisfies the Doeblin condition. From this, we can
use methods dealt with in Ref. 4 and the properties of Lyapunov exponents
to show the existence of an optimal policy. Furthermore, with this policy,
the spectrum of the system (1) consists of only one element.
2. Notations and Hypotheses
Let $A$ be a compact metric space, called the space of actions, and let $\mathbb{N} = \{1, 2, \ldots\}$ be the set of natural numbers. Throughout this paper, if $m(\cdot)$ is a measure and $f$ is an $m$-integrable function, we denote $\int f(x)\, dm$ by $m(f)$; and if $S$ is a topological space, we write $\mathcal{B}(S)$ for the Borel sets of $S$. Let $Y$ be a measurable space endowed with the $\sigma$-algebra $\mathcal{B}(Y)$, and suppose that $\mu$ is a probability measure on $(Y, \mathcal{B}(Y))$. Let $I$ be a finite or countable set. Suppose that, for every $a \in A$, we have a transition matrix

$$P(a) = (P_{ij}(a) : i, j \in I).$$

Let $M : I \times Y \to \mathrm{Gl}(d, \mathbb{R})$ be a measurable map from $I \times Y$ into the group of invertible matrices $\mathrm{Gl}(d, \mathbb{R})$. Throughout this paper, we shall make the following hypotheses.
Assumption A1.

$$\sup_{i \in I} \int_Y \big[ \, |\log |M(i, y)^{-1}||^p + |\log |M(i, y)||^p \, \big] \, \mu(dy) < \infty, \qquad \text{for some } p > 1. \tag{4}$$

Assumption A2. The map $a \to P(a)$ is weakly continuous in the sense that, for any $i \in I$, the $i$th row vector of the matrix $P(a)$ is continuous in $a$ in $l_1$. Moreover, the Markov chain $P(a)$ satisfies the Doeblin condition; namely, there exist a finite set $K \subset I$ and numbers $\alpha > 0$, $n_0 > 0$ such that

$$\sum_{j_{n_0} \in K} \; \sum_{j_1, j_2, \ldots, j_{n_0-1} \in I} P_{i j_1}(a_1) \, P_{j_1 j_2}(a_2) \cdots P_{j_{n_0-1} j_{n_0}}(a_{n_0}) \ge \alpha, \tag{5}$$

for any $i \in I$ and $a_1, a_2, \ldots, a_{n_0} \in A$.
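When $I$ and $A$ are finite, condition (5) can be verified exhaustively. A minimal sketch with hypothetical data (four states, two actions, $K = \{0, 1\}$, $n_0 = 1$): it returns the best constant $\alpha$, and (5) holds whenever the returned value is positive.

```python
import itertools
import numpy as np

# Hypothetical finite setup: I = {0,1,2,3}, two actions, K = {0, 1}.
P = {
    "a0": np.array([[0.5, 0.2, 0.2, 0.1],
                    [0.3, 0.4, 0.2, 0.1],
                    [0.25, 0.25, 0.25, 0.25],
                    [0.1, 0.2, 0.3, 0.4]]),
    "a1": np.array([[0.4, 0.4, 0.1, 0.1],
                    [0.2, 0.5, 0.2, 0.1],
                    [0.3, 0.3, 0.2, 0.2],
                    [0.2, 0.1, 0.4, 0.3]]),
}
K = [0, 1]

def doeblin_alpha(P, K, n0):
    """Smallest, over start states i and action sequences a_1..a_{n0},
    probability of being in K after n0 steps; condition (5) holds with
    alpha equal to this value whenever it is positive."""
    worst = np.inf
    for seq in itertools.product(P.values(), repeat=n0):
        prod = seq[0]
        for mat in seq[1:]:
            prod = prod @ mat              # P(a_1) P(a_2) ... P(a_{n0})
        worst = min(worst, prod[:, K].sum(axis=1).min())
    return worst

print(doeblin_alpha(P, K, n0=1))           # 0.3 > 0 here, so (5) holds
```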



Assumption A3. For the distribution $Q(i, H) = \mu(y \in Y : M(i, y) \in H)$, $H \in \mathcal{B}(\mathrm{Gl}(d, \mathbb{R}))$, of $M(i, \cdot)$ on $\mathrm{Gl}(d, \mathbb{R})$, there exists a number $n_1 > 0$ such that^3

$$Q(i_1, i_2, \ldots, i_{n_1}, \cdot) = Q(i_1, \cdot) * Q(i_2, \cdot) * \cdots * Q(i_{n_1}, \cdot), \tag{6}$$

for any $i_1, i_2, \ldots, i_{n_1} \in I$, has a nonvanishing absolutely continuous part in its Lebesgue decomposition.

Assumption A3 means that, if

$$Q(i_1, i_2, \ldots, i_{n_1}, \cdot) = Q^c(\cdot) + Q^s(\cdot),$$

where $Q^c$ and $Q^s$ are respectively the absolutely continuous and singular parts with respect to the Lebesgue measure, then $Q^c(\mathrm{Gl}(d, \mathbb{R})) > 0$.
Example 2.1. Let $I = \{+, -\}$ and let

$$M(\pm, y) = \begin{pmatrix} 1 & \pm y \\ 1 & 0 \end{pmatrix}.$$

Suppose that $Y_n \sim \gamma$, where $\gamma$ has a continuous distribution; then, it is easy to see that Assumption A3 is true with $n_1 = 4$.
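A heuristic numerical check of this claim (not in the paper): $\mathrm{Gl}(2, \mathbb{R})$ is four-dimensional, and the product of $n_1 = 4$ factors is a smooth map of $(y_1, \ldots, y_4)$; if its Jacobian has rank 4 somewhere, the push-forward of a continuous law has a nonvanishing absolutely continuous part.

```python
import numpy as np

def M(sign, y):
    # The matrices of Example 2.1, invertible a.s. (det = -(sign * y)).
    return np.array([[1.0, sign * y], [1.0, 0.0]])

def product4(ys, signs=(+1, -1, +1, -1)):
    # M(i1,y1) M(i2,y2) M(i3,y3) M(i4,y4), flattened to a point of R^4.
    out = np.eye(2)
    for s, y in zip(signs, ys):
        out = out @ M(s, y)
    return out.ravel()

rng = np.random.default_rng(1)
y0 = rng.standard_normal(4)
eps = 1e-6
# Central-difference Jacobian of the 4-fold product w.r.t. (y1,..,y4).
J = np.column_stack([
    (product4(y0 + eps * e) - product4(y0 - eps * e)) / (2 * eps)
    for e in np.eye(4)
])
print(np.linalg.matrix_rank(J))  # 4 at a generic point: full rank
```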
We shall now formulate the problem in canonical space. Denote by $\Omega_0$ the set of all sequences $(w_n)$ with $w_n = (\xi_n, y_n, a_n)$, where $\xi_n \in I$, $y_n \in Y$, and $a_n \in A$; i.e.,

$$\Omega_0 = \{f : \mathbb{N} \to I \times Y \times A\}, \qquad \Omega = \mathbb{R}^d \times \Omega_0.$$

Let $x_0$ be a random variable, let $W_n = (\xi_n, Y_n, A_n)$ be the canonical process defined on $\Omega$ by

$$W_n(x, \omega) = \omega_n, \qquad x_0(x, \omega) = x, \qquad (x, \omega) \in \mathbb{R}^d \times \Omega_0 = \Omega,$$

and let $\mathcal{F}_n = \sigma(x_0, W_t : t \le n)$ be the canonical filtration on $\Omega$. $(\mathcal{F}_t)$ is called the $\sigma$-field of observable events up to $t$. We write $\mathcal{F}$ for $\mathcal{F}_\infty$,

$$\mathcal{F} = \bigvee_{n=1}^{\infty} \mathcal{F}_n, \qquad \mathcal{F}_0 = \sigma(x_0).$$

A decision $\pi_t$ at time $t$ is a stochastic kernel on $\mathcal{B}(A) \times \mathbb{R}^d \times (I \times Y \times A)^{t-1} \times (I \times Y)$, namely,

$$\pi_t = \pi_t(\cdot \,|\, x, w_1, w_2, \ldots, w_{t-1}, \xi_t, y_t).$$

^3 The asterisk in (6) denotes the convolution operation.



A sequence of decisions $\pi = (\pi_1, \pi_2, \ldots)$ is called a policy. We use $\Pi$ to denote the class of all policies.

Let $\pi \in \Pi$ be a policy, and let $q \in \mathcal{P}(I)$, $\nu \in \mathcal{P}(\mathbb{R}^d)$, where $\mathcal{P}(S)$ denotes the set of probability measures on $S$ for any measurable space $S$. Then, we can define a probability measure $P$ on $(\Omega, \mathcal{F}_t, \mathcal{F})$ such that the following conditions are satisfied: for any $n = 1, 2, \ldots$, $B \in \mathcal{B}(Y)$, and $i, j \in I$:

(i) $P(Y_n \in B \,|\, \mathcal{F}_{n-1}, \xi_n) = \mu(B)$,  (7)
(ii) $P(\xi_{n+1} = j \,|\, \mathcal{F}_n) = P_{\xi_n j}(A_n)$,  (8)
(iii) $P(A_n \in \cdot \,|\, \mathcal{F}_{n-1}, \xi_n, Y_n) = \pi_n(\cdot \,|\, x_0, W_1, W_2, \ldots, W_{n-1}, \xi_n, Y_n)$,  (9)
(iv) $P(\xi_0 = i) = q_i$, with $q = (q_1, q_2, \ldots)$,
(v) $P(x_0 \in B) = \nu(B)$, for all $B \in \mathcal{B}(\mathbb{R}^d)$,

with the convention $W_0 = \text{const}$.

The probability $P$ is called the control associated with the policy $\pi$ and the initial distributions $q$, $\nu$. We denote by $R(q, \nu)$ the class of controls starting from $(q, \nu)$. It is well known that $R(q, \nu)$ is a convex, closed set.
Let $P \in R(q, \nu)$ be a control associated with the policy $\pi \in \Pi$ and $q$, $\nu$. We consider a difference equation of the form

$$X_{n+1} = M(\xi_{n+1}, Y_{n+1}) X_n, \tag{10a}$$
$$X_0 = x_0. \tag{10b}$$

Suppose that $X(n, x_0)$ is the solution of (10) starting at $x_0$, i.e., $X(0, x_0) = x_0$, $P$-a.s. We consider the following two objective functions:

$$\Lambda(q, \nu, \pi) = P\text{-}\operatorname{ess\,sup} \Big\{ \limsup_{t \to \infty} \, (1/t) \log |X(t, x_0)| \Big\}, \tag{11}$$

with the essential supremum taken under the probability $P$, and

$$\Psi(q, \nu, \pi) = E_{q,\nu}^{\pi} \limsup_{t \to \infty} \, (1/t) \log |X(t, x_0)|, \tag{12}$$

where $E_{q,\nu}^{\pi}$ denotes the expectation with respect to the measure $P_{q,\nu}$. If $q$ and $\nu$ are degenerate at $i$ and $x$, we will write simply $\Lambda(i, x, \pi)$ and $\Psi(i, x, \pi)$ instead of $\Lambda(q, \nu, \pi)$ and $\Psi(q, \nu, \pi)$, respectively. It is evident that

$$\Lambda(q, \nu, \pi) \ge \Psi(q, \nu, \pi), \tag{13}$$



for any $q, \nu, \pi$. Let

$$\Lambda(q, \nu) = \inf\{\Lambda(q, \nu, \pi) : \pi \in \Pi\}, \qquad \Lambda^* = \inf_{q,\nu} \Lambda(q, \nu), \tag{14a}$$
$$\Psi(q, \nu) = \inf\{\Psi(q, \nu, \pi) : \pi \in \Pi\}, \qquad \Psi^* = \inf_{q,\nu} \Psi(q, \nu). \tag{14b}$$

The triplet $(q, \nu, \pi)$ is said to be minimum for problem (11) [respectively (12)] if $\Lambda(q, \nu, \pi) = \Lambda^*$ [respectively $\Psi(q, \nu, \pi) = \Psi^*$], and $\pi^* \in \Pi$ is called optimal if $\Lambda(i, x, \pi^*) = \Lambda(i, x)$ [respectively $\Psi(i, x, \pi^*) = \Psi(i, x)$] for any $i \in I$, $x \in \mathbb{R}^d$.

From (13), we get

$$\Lambda^* \ge \Psi^*.$$

So, if $(q, \nu, \pi)$ is minimum for problem (12) and $\Lambda(q, \nu, \pi) = \Psi^*$, then $(q, \nu, \pi)$ is also minimum for problem (11). Therefore, we hope that, under suitable hypotheses, it suffices to consider problem (12) to find an optimal control for problem (11).
3. Reduced Markov Decision Model

It is well known that the objective function given in the form (11) or (12) is independent of the length of the vectors. Therefore, we may reduce the state space. Any two nonzero vectors are said to be equivalent if they are proportional. The space of equivalence classes is denoted by $P^{d-1}$. The action of a matrix $g$ on $\mathbb{R}^d$ preserves the equivalence relation; we use $g$ again to denote the quotient action on $P^{d-1}$. Let us consider the $\mathcal{F}_n$-adapted reduced process

$$Z_n = (\xi_n, S_n), \qquad n = 1, 2, \ldots, \tag{15}$$

defined on $\Omega$ with values in $I \times P^{d-1}$, where

$$S_n = X_n / |X_n|, \qquad n = 0, 1, 2, \ldots.$$

We put

$$\rho_n(i, s) = \log\big[1 / |M^{-1}(i, Y_n)s|\big], \qquad i \in I, \; s \in P^{d-1}, \; n = 1, 2, \ldots.$$

Then, it is easy to check that

$$\log |X(t, x_0)| = \rho_1(Z_1) + \rho_2(Z_2) + \cdots + \rho_t(Z_t) + \log |x_0|, \tag{16}$$

where

$$\rho_k(Z_k) = \log\big[1 / |M^{-1}(\xi_k, Y_k)S_k|\big] = \log |X_k| - \log |X_{k-1}|.$$



Hence,

$$\limsup_{t \to \infty} \, (1/t) \log |X(t, x_0)| = \limsup_{t \to \infty} \, (1/t) \sum_{n=1}^{t} \rho_n(Z_n). \tag{17}$$
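The decomposition (16)-(17) is the whole point of the reduction: the growth of $|X_t|$ is an additive functional of $(Z_n)$. A minimal sketch, reusing the illustrative two-state setup from Section 1, that iterates the projective component $S_n$ and accumulates the increments $\rho_n$:

```python
import numpy as np

rng = np.random.default_rng(2)

def M(i, y):
    # Illustrative invertible 2x2 matrices, one per state i in {0, 1}.
    sign = 1.0 if i == 0 else -1.0
    return np.array([[1.0, sign * y], [1.0, 0.0]])

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])   # fixed transition matrix (constant policy)

T = 10_000
s = np.array([1.0, 0.0])     # S_0 = x_0 / |x_0|
xi = 0
rho_sum = 0.0                # running sum of rho_n(Z_n)

for n in range(T):
    xi = rng.choice(2, p=P[xi])            # xi_{n+1}
    A = M(xi, rng.standard_normal())       # M(xi_{n+1}, Y_{n+1})
    As = A @ s
    r = np.linalg.norm(As)
    # rho_{n+1}(Z_{n+1}) = log|X_{n+1}| - log|X_n| = log |A S_n|
    # (equal to log[1/|A^{-1} S_{n+1}|] in the paper's notation).
    rho_sum += np.log(r)
    s = As / r                             # projective part S_{n+1}

# By (16), rho_sum = log|X_T| when |x_0| = 1; by (17), the Cesaro
# average of the rho_n recovers the Lyapunov exponent.
print(rho_sum / T)
```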

If the policy is constant, i.e.,

$$\pi_t(\cdot \,|\, x_0, w_1, w_2, \ldots, w_{t-1}, \xi_t, y_t) = \delta_a,$$

where $a \in A$ is fixed and $\delta_a$ is the Dirac mass at $a$, then $(Z_n)$ is a Markov process with transition operator

$$P(\xi_{t+1} = j, \, S_{t+1} \in B \,|\, \mathcal{F}_{t-1}, \, \xi_t = i, \, S_t = s) = P\big(\xi_{t+1} = j, \, M(j, Y_{t+1})s / |M(j, Y_{t+1})s| \in B \,\big|\, \xi_t = i, \, S_t = s\big) = P_{ij}(a) \int 1_B\big[M(j, y)s / |M(j, y)s|\big] \, \mu(dy),$$

for any $i, j \in I$, $s \in P^{d-1}$, $B \in \mathcal{B}(P^{d-1})$. We denote this transition by

$$T(j \times B \,|\, i, s, a) = P_{ij}(a) \int 1_B\big[M(j, y)s / |M(j, y)s|\big] \, \mu(dy). \tag{18}$$
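Once $P(a)$, the matrices $M(j, y)$, and $\mu$ are fixed, the kernel (18) can be estimated by Monte Carlo. A sketch under the assumption $\mu = N(0, 1)$, with the set $B$ passed as a sign-invariant indicator on unit vectors (representatives of points of $P^{d-1}$):

```python
import numpy as np

rng = np.random.default_rng(3)

def T_estimate(P_a, M, i, j, s, indicator_B, n_samples=100_000):
    """Monte Carlo estimate of T(j x B | i, s, a) in (18):
    P_ij(a) times mu(y : M(j, y)s / |M(j, y)s| in B)."""
    hits = 0
    for _ in range(n_samples):
        v = M(j, rng.standard_normal()) @ s    # M(j, y)s with y ~ mu
        hits += indicator_B(v / np.linalg.norm(v))
    return P_a[i, j] * hits / n_samples

# Usage with the earlier illustrative M and P, and
# B = directions whose two coordinates have equal sign
# (the indicator is invariant under u -> -u, as a set of P^1 must be):
# T_estimate(P, M, 0, 1, np.array([1.0, 0.0]), lambda u: u[0] * u[1] >= 0)
```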

The policy $\pi = (\pi_1, \pi_2, \ldots)$ is said to be Markov stationary for the control problem of Lyapunov exponents (or randomized stationary; see Ref. 4) if there exists a kernel $\Phi$ on $\mathcal{B}(A) \times (I \times P^{d-1})$ such that, for $t = 1, 2, \ldots$,

$$\pi_t(da \,|\, x_0, W_1, W_2, \ldots, W_{t-1}, \xi_t, Y_t) = \Phi(da \,|\, Z_t).$$

We write $\Phi^\infty$ for the policy $(\Phi, \Phi, \ldots)$.

A Markov stationary policy $\Phi$ is called a stationary policy (or deterministic stationary policy) if $\Phi(\cdot \,|\, i, s)$ is a Dirac mass for any $i \in I$, $s \in P^{d-1}$. In this case, a stationary policy is described completely by a measurable mapping $f : I \times P^{d-1} \to A$ such that

$$\Phi(\{f(i, s)\} \,|\, i, s) = 1, \qquad \text{for } i \in I, \; s \in P^{d-1};$$

see Refs. 5–6. We denote this policy by $f^\infty$.

Let $\Phi(da \,|\, i, s)$ be a Markov stationary policy; then, under the probability associated with $\Phi$, the process $(Z_t)$ is Markov with transition probability $T_\Phi$ given by

$$T_\Phi(C \,|\, i, s) = \int T(C \,|\, i, s, a) \, \Phi(da \,|\, i, s).$$
Lemma 3.1. Under Assumptions A2 and A3, for any Markov stationary policy, the Markov chain $(Z_n)$ satisfies the Doeblin condition (see Refs. 7–8) with respect to the product measure of the Lebesgue measure $\operatorname{meas}(\cdot)$ on $P^{d-1}$ and a counting measure on $I$.
Proof. We have to prove that, for any Markov stationary policy $\Phi(\cdot \,|\, i, s)$, there exist a counting measure on $I$ ($\gamma$, say) and numbers $\epsilon > 0$, $\delta > 0$, and $m_0$ such that, for every $i \in I$ and $s \in P^{d-1}$,

$$T_\Phi^{m_0}(C \,|\, i, s) \le 1 - \epsilon, \tag{19}$$

for any $C \in \mathcal{B}(I) \times \mathcal{B}(P^{d-1})$ such that $\gamma \times \operatorname{meas}(C) < \delta$, where $\operatorname{meas}(\cdot)$ denotes the Lebesgue measure on $P^{d-1}$.

Let $K$ and $\alpha$, $n_0$ be given as in Assumption A2. Then,

$$\sum_{j \notin K} \; \sum_{j_1, j_2, \ldots, j_{n_0} \in I} P_{i j_1}(a_0) \, P_{j_1 j_2}(a_1) \cdots P_{j_{n_0} j}(a_{n_0}) \le 1 - \alpha, \tag{20}$$

for any $i \in I$ and $a_0, a_1, \ldots, a_{n_0} \in A$. We note that, if (20) is satisfied for $n_0$, then it is satisfied for any $n \ge n_0$. Indeed,

$$\sum_{j \notin K} \; \sum_{j_1, \ldots, j_{n_0+1} \in I} P_{i j_1}(a_0) \, P_{j_1 j_2}(a_1) \cdots P_{j_{n_0+1} j}(a_{n_0+1}) = \sum_{j_1 \in I} P_{i j_1}(a_0) \sum_{j \notin K} \; \sum_{j_2, \ldots, j_{n_0+1} \in I} P_{j_1 j_2}(a_1) \cdots P_{j_{n_0+1} j}(a_{n_0+1}) \le 1 - \alpha.$$

Furthermore, if Assumption A3 is true for $n_1$, then it is still true for any $n \ge n_1$, by the following property: if one of the measures $\sigma_1$ and $\sigma_2$ is absolutely continuous with respect to $\sigma$ on a topological group, then their convolution is absolutely continuous with respect to $\sigma$. Hence, without loss of generality, we can suppose that $n_0 = n_1 = 1$, and we shall show that (19) is satisfied for $m_0 = 1$. To avoid complexity, we put

$$Q(i, s, B) = \mu\{y : M(i, y)s / |M(i, y)s| \in B\},$$
$$Q(i, H) = \mu\{y : M(i, y) \in H\},$$
$$C_i = \{s \in P^{d-1} : (i, s) \in C\},$$

for any $i \in I$, $H \in \mathcal{B}(\mathrm{Gl}(d, \mathbb{R}))$, and $C \in \mathcal{B}(I \times P^{d-1})$.

Let us choose $\gamma$ to be a probability measure on $I$ such that

$$\gamma(i) = 1/r, \qquad \text{if } i \in K,$$

where $r$ is the number of elements of $K$. We denote by $m(\cdot)$ the product measure $\gamma(\cdot) \times \operatorname{meas}(\cdot)$ on $I \times P^{d-1}$. Suppose that $\delta_1 < 1/r$; then, from $m(C) < \delta_1^2$, it follows that

$$\operatorname{meas}(C_i) < \delta_1, \qquad \text{for any } i \in K.$$



By using the definition of $T_\Phi$, we have

$$T_\Phi(C \,|\, i, s) = \sum_{j \in I} \int P_{ij}(a) \, Q(j, s, C_j) \, \Phi(da \,|\, i, s)
= \sum_{j \notin K} \int P_{ij}(a) \, Q(j, s, C_j) \, \Phi(da \,|\, i, s) + \sum_{j \in K} \int P_{ij}(a) \, Q(j, s, C_j) \, \Phi(da \,|\, i, s)
\le \sum_{j \notin K} \int P_{ij}(a) \, \Phi(da \,|\, i, s) + \sup_{j \in K} Q(j, s, C_j) \sum_{j \in K} \int P_{ij}(a) \, \Phi(da \,|\, i, s). \tag{21}$$

Let $i \in K$ be fixed. By Assumption A3, there exists a function $F$ defined on $\mathrm{Gl}(d, \mathbb{R})$ such that

$$Q^c(i, H) = \int_H F(g) \, dg.$$

Since $F \ne 0$, we can find a bounded Borel set $H_0$ such that $Q^c(i, H_0) = \sigma > 0$ and $F(g)$ is essentially bounded on $H_0$. We suppose that

$$|g| \le c, \quad |F(g)| \le c, \qquad \text{for any } g \in H_0.$$

From this, we have

$$Q^s(i, H_0) + Q(i, \bar{H}_0) \le 1 - \sigma,$$

where $\bar{H}_0$ denotes the complement of $H_0$. Letting

$$c \cdot B = \{x : x/|x| \in B, \; |x| < c\},$$

we get

$$Q(i, s, B) \le Q(i, \bar{H}_0) + Q^s(i, H_0) + Q^c(i, \{g \in H_0 : gs/|gs| \in B\})
\le 1 - \sigma + Q^c(i, H_0 \cap \{gs \in c \cdot B\}).$$

But

$$Q^c(i, H_0 \cap \{gs \in c \cdot B\}) \le k \cdot \operatorname{meas}(c \cdot B) \le k \cdot c^d \cdot \operatorname{meas}(B), \tag{22}$$

where there is an abuse of notation between $\operatorname{meas}(\cdot)$ on $\mathrm{Gl}(d, \mathbb{R})$ and $\operatorname{meas}(\cdot)$ on $\mathbb{R}^d$. This implies that, if

$$\operatorname{meas}(B) < \delta_1 := \sigma / (2 k c^d),$$

then, by (22),

$$Q(i, s, B) \le 1 - (1/2)\sigma.$$

From (21), we have

$$T_\Phi(C \,|\, i, s) \le \sum_{j \notin K} \int P_{ij}(a) \, \Phi(da \,|\, i, s) + (1 - (1/2)\sigma) \sum_{j \in K} \int P_{ij}(a) \, \Phi(da \,|\, i, s)
\le 1 - (1/2)\sigma \sum_{j \in K} \int P_{ij}(a) \, \Phi(da \,|\, i, s)
\le 1 - (1/2)\alpha\sigma.$$

The proof of the lemma is completed by putting

$$\epsilon = (1/2)\alpha\sigma \qquad \text{and} \qquad \delta = \min\{1/r^2, \delta_1^2\}. \qquad \square$$



In connection with the value functions $\Lambda(\cdot)$ and $\Psi(\cdot)$, we consider the Markov decision model mentioned above with value function of the form

$$V(i, s, \pi) = \limsup_{t \to \infty} \, (1/t) \, E_i^\pi \sum_{n=1}^{t} \rho_n(Z_n), \qquad Z_0 = (i, s), \tag{23}$$

which is familiar to us. We put

$$V(i, s) = \inf_\pi V(i, s, \pi), \qquad V(q, \nu) = \inf_\pi V(q, \nu, \pi), \qquad V^* = \inf_{q,\nu} V(q, \nu),$$

as in (14). Let

$$\rho(i, s, a) = \sum_{j \in I} P_{ij}(a) \, E \log |M(j, Y_n)s|.$$

By Assumption A1, $\rho(i, s, a)$ is a bounded continuous function, and it is easy to see that

$$E_i^\pi \rho_n(Z_n) = E_i^\pi \rho(Z_{n-1}, A_{n-1}), \qquad n = 1, 2, \ldots.$$

From this, we get

$$V(i, s, \pi) = \limsup_{t \to \infty} \, (1/t) \sum_{n=1}^{t} E_i^\pi \rho(Z_n, A_n). \tag{24}$$
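Under the same illustrative assumptions ($\mu = N(0, 1)$, finite $I$), the one-step cost $\rho(i, s, a)$ can be approximated directly from its definition:

```python
import numpy as np

rng = np.random.default_rng(4)

def rho(P_a, M, i, s, n_samples=50_000):
    """Monte Carlo estimate of rho(i, s, a) = sum_j P_ij(a) E log|M(j, Y)s|,
    with Y ~ N(0, 1) standing in for mu (an assumption)."""
    ys = rng.standard_normal(n_samples)
    total = 0.0
    for j in range(P_a.shape[1]):
        logs = np.log([np.linalg.norm(M(j, y) @ s) for y in ys])
        total += P_a[i, j] * logs.mean()
    return total
```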



Because $(Y_n)$ is an i.i.d. sequence, by Assumption A1 the sequence $(\rho(Z_n))$ is uniformly integrable for any $\pi \in \Pi$. By virtue of the Fatou lemma, we have

$$\Psi(i, s, \pi) \ge V(i, s, \pi), \qquad i \in I, \; s \in P^{d-1}, \; \pi \in \Pi. \tag{25}$$

Hence,

$$\Psi(i, s) \ge V(i, s). \tag{26}$$

Theorem 3.1. If $\Phi^\infty$ is a Markov policy, then

$$\Psi(i, s, \Phi^\infty) = V(i, s, \Phi^\infty). \tag{27}$$

Proof. Under the policy $\Phi^\infty$, $(Z_n)$ is a Markov process with transition probability

$$T(C \,|\, i, s, \Phi) = \int_A T(C \,|\, i, s, a) \, \Phi(da \,|\, i, s) =: T_\Phi(C \,|\, i, s).$$

By Lemma 3.1, the Markov process $(Z_n)$ satisfies the Doeblin condition with respect to the measure $m(\cdot)$ defined in the proof of Lemma 3.1, with constants $\delta$ and $\epsilon$. So, we can define a decomposition of the state space $I \times P^{d-1}$ into a transient set $F$ and a finite number of ergodic sets $C^1, C^2, \ldots, C^p$, with

$$m(C^r) \ge \delta, \qquad 1 \le r \le p.$$

The restriction of $(Z_n)$ to $C^r$ is ergodic, so it is Harris recurrent with respect to the invariant measure $\gamma^r(\cdot)$ defined by

$$\gamma^r(\cdot) = \lim_{t \to \infty} \, (1/t) \sum_{n=1}^{t} T_\Phi^n(\cdot \,|\, i, s), \qquad (i, s) \in C^r.$$

By Exercise 31 in Ref. 9, $T_\Phi$ is quasicompact. On the other hand, if we put

$$\rho_\Phi = \int_A \rho(i, s, a) \, \Phi(da \,|\, i, s),$$

then it is easy to show that

$$\int [\rho_\Phi(i, s) - V(i, s, \Phi)] \, \nu(di, ds) = 0.$$

This implies that $\rho_\Phi - V(i, s, \Phi)$ is a charge on $C^r$. Hence, the Poisson equation

$$(E - T_\Phi) h = \rho_\Phi - V, \tag{28}$$

where $E$ is the identity operator on $C^r$, has a bounded solution. Let $h$ be such a solution of (28), and put

$$H_n = \rho_{n+1}(Z_{n+1}) + h(Z_{n+1}) - h(Z_n) - V(Z_n); \tag{29}$$

we remark that, for any $r$, $V$ is constant on $C^r$ by the ergodicity of $(Z_n)$. From (28), we can prove that $\sum_{n=0}^{t} H_n$ is an $\mathcal{F}_n$-martingale. Indeed,

$$E[H_n \,|\, \mathcal{F}_n] = \rho_\Phi(Z_n) + T h(Z_n) - h(Z_n) - V(Z_n) = 0.$$
From Assumption A1, it follows that

$$\sup_n E\big[|H_n| (\log^+ |H_n|)^2\big] < \infty.$$

By using the law of large numbers for martingale sequences, we get

$$\lim_{t \to \infty} \, (1/t) \sum_{n=0}^{t} H_n = 0, \qquad \text{a.s.}$$

Hence,

$$\lim_{t \to \infty} \, (1/t) \sum_{n=0}^{t} \rho_{n+1}(Z_{n+1}) = V(i, s, \Phi^\infty), \qquad \text{a.s.,} \; (i, s) \in C^r. \tag{30}$$

This means that

$$\Psi(i, s, \Phi^\infty) = V(i, s, \Phi^\infty), \qquad \text{for any } (i, s) \in \bigcup_{r=1}^{p} C^r.$$


We consider now $(i, s) \in F$. Let $\tau$ be the last exit time from the set $F$, i.e.,

$$\tau = \sup\{n > 0 : Z_{n-1} \in F\}.$$

Since $F$ is a transient set, $P(\tau < \infty) = 1$, and $\tau$ is a stopping time because every ergodic set is absorbing. Hence,

$$\Psi(i, s, \Phi^\infty) = E \limsup_{t \to \infty} \, (1/t) \sum_{n=1}^{t} \rho_n(Z_n) = E \limsup_{t \to \infty} \, (1/t) \sum_{n=\tau}^{t} \rho_n(Z_n)
= \sum_{k=1}^{\infty} \int_{\cup C^r} E\Big[\limsup_{t \to \infty} \, (1/t) \sum_{n=k}^{t} \rho_n(Z_n) \,\Big|\, \tau = k, \, Z_k = (j, u)\Big] \, P(\tau = k, \, Z_k \in (dj, du)).$$



Using a proof similar to that of (30), we can show that

$$E\Big[\lim_{t \to \infty} \, (1/t) \sum_{n=k}^{t} \rho_n(Z_n) \,\Big|\, \tau = k, \, Z_k = (j, u)\Big] = V(j, u, \Phi^\infty),$$

for any $(j, u) \in \bigcup C^r$. Then, we get

$$\Psi(i, s, \Phi^\infty) = \sum_{k=1}^{\infty} \sum_{r=1}^{p} V_r \, P(\tau = k, \, Z_k \in C^r), \qquad (i, s) \in F, \tag{31}$$

where

$$V_r = V(i, s), \qquad \text{when } (i, s) \in C^r.$$

On the other hand, if $(i, s) \in F$, then

$$V(i, s, \Phi^\infty) = \limsup_{t \to \infty} \, (1/t) \, E \sum_{n=1}^{t} \rho_n(Z_n) \ge E \liminf_{t \to \infty} \, (1/t) \sum_{n=1}^{t} \rho_n(Z_n)
= \sum_{k=1}^{\infty} \int_{\cup C^r} E\Big[\liminf_{t \to \infty} \, (1/t) \sum_{n=k}^{t} \rho_n(Z_n) \,\Big|\, \tau = k, \, Z_k = (j, u)\Big] \, P(\tau = k, \, Z_k \in (dj, du))
= \sum_{k=1}^{\infty} \sum_{r=1}^{p} V_r \, P(\tau = k, \, Z_k \in C^r). \tag{32}$$

By comparing (31) and (32), we get

$$E \lim_{t \to \infty} \, (1/t) \sum_{n=1}^{t} \rho_n(Z_n) = \lim_{t \to \infty} \, (1/t) \, E \sum_{n=1}^{t} \rho_n(Z_n),$$

for any $(i, s) \in F$, i.e.,

$$\Psi(i, s, \Phi^\infty) = V(i, s, \Phi^\infty), \qquad (i, s) \in F.$$

From this and (30), Theorem 3.1 is proved. □
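The engine of this proof is the Poisson equation (28). For a finite ergodic chain, solving it is plain linear algebra, as the following sketch shows (hypothetical 3-state data; $V$ is the stationary average cost and $h$ the relative value function):

```python
import numpy as np

def solve_poisson(T, rho):
    """Solve the Poisson equation (E - T) h = rho - V for a finite
    ergodic chain: V is the stationary average cost, and h (the
    relative value) is pinned down by the normalization pi @ h = 0."""
    n = T.shape[0]
    # Stationary distribution pi: pi T = pi, sum(pi) = 1.
    A = np.vstack([T.T - np.eye(n), np.ones(n)])
    b = np.concatenate([np.zeros(n), [1.0]])
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    V = pi @ rho
    # (I - T) h = rho - V, with pi @ h = 0 to fix the free constant.
    B = np.vstack([np.eye(n) - T, pi])
    c = np.concatenate([rho - V, [0.0]])
    h = np.linalg.lstsq(B, c, rcond=None)[0]
    return V, h

# Sanity check on a hypothetical 3-state chain.
T = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.3, 0.3, 0.4]])
rho = np.array([1.0, -2.0, 0.5])
V, h = solve_poisson(T, rho)
print(V, np.abs((np.eye(3) - T) @ h - (rho - V)).max())  # residual ~ 0
```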



Since (30) takes place $P$-a.s., it is easy to establish a relation between the value functions (11) and (23).
Theorem 3.2. Under the assumptions of Theorem 3.1, for any Markov stationary policy $\Phi^\infty = (\Phi, \Phi, \ldots)$, let $\tau$ and $C^r$, $V_r$, $1 \le r \le l$, be defined as in the proof of Theorem 3.1. Then,

$$\Lambda(i, s, \Phi^\infty) = \begin{cases} V(i, s, \Phi), & \text{if } (i, s) \in C^r, \\ P\text{-}\operatorname{ess\,sup} \big\{ \sum_{r=1}^{l} 1_{\{Z_\tau \in C^r\}} \cdot V_r \big\}, & \text{if } (i, s) \in F. \end{cases}$$

We now turn to the reduced problem. Denote by $\bar{\Pi}$ the subclass of $\Pi$ consisting of kernels of the form $\bar{\pi}(da \,|\, Z_1, A_1, \ldots, Z_{t-1}, A_{t-1}, Z_t)$, which may be considered as kernels on $\mathcal{B}(A) \times (I \times P^{d-1} \times A)^t \times (I \times P^{d-1})$. Let $\pi \in \Pi$ be an arbitrary policy, and let

$$\bar{\mathcal{F}}_t = \sigma\{Z_n, A_n : n \le t\}.$$

Suppose that $\bar{\pi}$ is the dual projection of $\pi$ on $\bar{\mathcal{F}}_{n-1} \vee \sigma(Z_n)$, i.e.,

$$E[\pi_t(f) \,|\, \bar{\mathcal{F}}_{t-1}, Z_t] = \bar{\pi}_t(f \,|\, Z_1, A_1, Z_2, A_2, \ldots, Z_{t-1}, A_{t-1}, Z_t),$$

for any measurable bounded $f \ge 0$. It is obvious that $\bar{\pi} \in \bar{\Pi}$, and it is easy to verify that

$$E_{is}^{\pi} \rho(Z_n, A_n) = E_{is}^{\bar{\pi}} \rho(Z_n, A_n), \qquad \text{for any } n \in \mathbb{N};$$

i.e., the control $P$ associated with $\pi$ and the control $\bar{P}$ associated with $\bar{\pi}$ agree on $(\Omega, \bar{\mathcal{F}}_n, \bar{\mathcal{F}}_\infty)$. So,

$$V(i, s, \pi) = \limsup_{t \to \infty} \, (1/t) \sum_{n=1}^{t} E_{is}^{\pi} \rho(Z_n, A_n) = \limsup_{t \to \infty} \, (1/t) \sum_{n=1}^{t} E_{is}^{\bar{\pi}} \rho(Z_n, A_n) = V(i, s, \bar{\pi});$$

i.e., $\pi$ and $\bar{\pi}$ have the same value.

Therefore, for the control problem of Lyapunov exponents, we can reduce our model by considering $(Z_n, A_n)$ as a canonical process defined on the canonical space

$$\bar{\Omega} = \{f : \mathbb{N} \to I \times P^{d-1} \times A\},$$

with policies in the class $\bar{\Pi}$ and controlled transition probability (18). This reduced model has many advantages because $P^{d-1}$ is compact. Therefore, in the following, we consider only the reduced model. Furthermore, we can find optimal policies in $\bar{\Pi}$, as we now show.



4. Existence of an Optimal Policy

To prove the existence of an optimal policy, we use the ideas of Kurano, which are explained in Ref. 4. In this section, we replace Assumption A2 by the following assumption.

Assumption A2′. The map $a \to P(a)$ is continuous and the family $\{P(a) : a \in A\}$ is tight; i.e., for any $\epsilon > 0$, there is a finite set $K$ such that, for any $i \in I$, $a \in A$,

$$\sum_{j \in K} P_{ij}(a) \ge 1 - \epsilon.$$

This assumption implies the tightness of the sequence $(Z_n, A_n)$ under any policy $\pi \in \bar{\Pi}$ and any initial distribution of $Z_0$. Indeed, $P^{d-1}$ and $A$ are compact, so we have only to prove that the sequence $(\xi_n)$ is tight. Taking any policy $\pi \in \bar{\Pi}$ and any initial distribution of $Z_0$, we have

$$\sum_{j \in K} P(\xi_n = j) = \sum_{j \in K} E\, P(\xi_n = j \,|\, \mathcal{F}_{n-1}) = \sum_{j \in K} E\, P_{\xi_{n-1} j}(A_{n-1}) \ge 1 - \epsilon.$$

Lemma 4.1. (See Ref. 4, Lemma 2.1.) For any policy $\pi \in \bar{\Pi}$ and any initial distributions $q$, $\nu$ of $\xi_0$, $S_0$, we can find a probability measure $\sigma$ on $I \times P^{d-1} \times A$ such that

$$\int_{I \times P^{d-1} \times A} \rho(z, a) \, \sigma(dz, da) \le V(q, \nu, \pi) \tag{33}$$

and

$$\int_{I \times P^{d-1} \times A} g(z) \, \sigma(dz, da) = \int_{I \times P^{d-1} \times A} \sigma(dz, da) \int_{I \times P^{d-1}} g(z') \, T(dz' \,|\, z, a), \tag{34}$$

for any bounded continuous function $g$.
Proof. For given $\pi \in \bar{\Pi}$, $q \in \mathcal{P}(I)$, and $\nu \in \mathcal{P}(P^{d-1})$, we put

$$\mu_T(D) = (1/T) \sum_{n=1}^{T} E\, 1_D(Z_n, A_n),$$

where the expectation is taken with respect to the control associated with $(q, \nu, \pi)$. The family $\{\mu_T(\cdot) : T = 1, 2, \ldots\}$ is tight, so there exist a sequence $\{T_n\}$ and a probability measure $\sigma$ on $I \times P^{d-1} \times A$ such that $\mu_{T_n}(\cdot) \xrightarrow{w} \sigma(\cdot)$. This implies that

$$\sigma(\rho) = \int \rho(z, a) \, \sigma(dz, da) = \lim_n \mu_{T_n}(\rho) \le \limsup_{T \to \infty} \mu_T(\rho) = \limsup_{T \to \infty} \, (1/T) \sum_{n=1}^{T} E \rho(Z_n, A_n) = V(q, \nu, \pi),$$

so we have (33).
On the other hand, for any bounded continuous function $g$, we have

$$0 = (1/T) \, E \sum_{n=1}^{T} \big( g(Z_n) - E[g(Z_n) \,|\, \bar{\mathcal{F}}_{n-1}] \big)$$

and

$$E[g(Z_n) \,|\, \bar{\mathcal{F}}_{n-1}] = \int g(z) \, T(dz \,|\, Z_{n-1}, A_{n-1}).$$

Hence,

$$\int g(z) \, \sigma(dz, da) = \lim_{n \to \infty} \, (1/T_n) \sum_{t=1}^{T_n} E g(Z_t) = \lim_{n \to \infty} \, (1/T_n) \sum_{t=1}^{T_n} E\, T(g \,|\, Z_{t-1}, A_{t-1}) = \int T(g \,|\, z, a) \, \sigma(dz, da),$$

so we get (34). The lemma is proved. □

This lemma allows us to conclude that the set of Markov policies is complete, in the sense that

$$\inf_{q,\nu} \{V(q, \nu, \pi) : \pi \text{ is a policy}\} = \inf_{q,\nu} \{V(q, \nu, \Phi) : \Phi \text{ is Markov}\}.$$



Lemma 4.2. If $\{\sigma_n\}$ is a sequence of probability measures which satisfy (34), then $\{\sigma_n\}$ is tight.

Proof. Let $\epsilon > 0$, and let $K$ be as in Assumption A2′. Then, from (34), we have

$$\sigma_n(K \times P^{d-1} \times A) = \int \sigma_n(dz, da) \, T(K \times P^{d-1} \,|\, z, a) = \int \sigma_n(dz, da) \sum_{j \in K} P_{ij}(a) \ge 1 - \epsilon,$$

with $z = (i, s)$, for any $n$. This means that the sequence $\{\sigma_n\}$ is tight. Hence, the lemma is proved. □

From this lemma, and using the same argument as in Ref. 4, we can show that there are an invariant measure $\nu$ on $I \times P^{d-1}$ and a Markov policy $\Phi^\infty$ such that

$$V(\nu, \Phi^\infty) = V^*;$$

here, there is an abuse of notation, but we can define $V(\nu, \pi)$ exactly as $V(i, s, \pi)$. Under the Doeblin condition on the process $(Z_n)$, we have a decomposition of the state space as in the proof of Theorem 3.1, namely, the subsets $F, C^1, \ldots, C^l$, for which

$$m(C^r) \ge \delta, \qquad \text{for any } 1 \le r \le l,$$

and

$$V(i, s, \Phi^\infty) = V^*, \tag{35}$$

and, by Theorem 3.2,

$$\Lambda(i, s, \Phi^\infty) = V^*, \tag{36}$$

for any $(i, s) \in C := \bigcup C^r$, where the union is over all $r \in \{r : \nu(C^r) > 0\}$. On the other hand, for any fixed $i \in I$ such that

$$\operatorname{meas}(C_i) \ne 0, \qquad \text{where } C_i = \{s : (i, s) \in C\},$$

the map

$$s \to V(i, s, \Phi^\infty)$$



satisfies the general property of the Lyapunov exponents (see Ref. 10). Hence, the set

$$L_i = \{s : V(i, s, \Phi^\infty) = V^*\}$$

is the projection of a linear subspace of $\mathbb{R}^d$ containing $C_i$, so it must be $P^{d-1}$. Let

$$S = \{(i, s) \in I \times P^{d-1} : V(i, s, \Phi^\infty) = V^*\}.$$

We remark that $C \subset S$, so there exists an $i \in I$ such that $\operatorname{meas}(S_i) > 0$. This implies that

$$S_i = P^{d-1}, \tag{37}$$

for some $i \in I$.
Lemma 4.3. $S$ is an invariant set.

Proof. Suppose that there is an $(i_0, s_0) \in S$ such that

$$P_{i_0}^{\Phi^\infty}\{\omega : Z_n(\omega) \notin S, \text{ for some } n > 0\} > 0.$$

Then, we can find an integer $k > 0$ and a nonempty Borel set $B \subset I \times P^{d-1}$ such that

$$P\{\omega : Z_k(\omega) \in B\} > 0$$

and

$$V(i, s, \Phi^\infty) \ge \alpha > V^*,$$

for any $(i, s) \in B$. Because $\Phi^\infty$ is a Markov policy, we have by Theorem 3.1 that

$$V(i, s, \Phi^\infty) = \Psi(i, s, \Phi^\infty),$$

for any $(i, s)$. This implies that

$$V^* = V(i_0, s_0, \Phi^\infty) = E \lim_{t \to \infty} \, (1/t) \sum_{n=1}^{t} \rho_n(Z_n) = E \lim_{t \to \infty} \, (1/t) \sum_{n=k}^{t} \rho_n(Z_n)
= E\Big[ \lim_{t \to \infty} \, (1/t) \sum_{n=k}^{t} \rho_n(Z_n) 1_B(Z_k) + \lim_{t \to \infty} \, (1/t) \sum_{n=k}^{t} \rho_n(Z_n) 1_{\bar{B}}(Z_k) \Big]
= E[\Psi(Z_k) \cdot 1_B(Z_k)] + E[\Psi(Z_k) \cdot 1_{\bar{B}}(Z_k)]
\ge \alpha \, P(Z_k \in B) + V^* P(Z_k \in \bar{B}) > V^*.$$

This is a contradiction. Thus, $S$ is invariant. □



Theorem 4.1. Suppose that there exists a Markov policy $L^\infty(\cdot \,|\, i, s)$ such that

$$P_i^{L^\infty}\Big\{ \bigcup_{n=1}^{\infty} \{\xi_n = j\} \Big\} = 1, \tag{38}$$

for any $i, j \in I$. Then, there exists an optimal stationary policy for all problems with the objective functions $\Lambda$, $\Psi$, $V$.

Proof. We define a new policy

$$\Phi'(\cdot \,|\, i, s) = \begin{cases} \Phi(\cdot \,|\, i, s), & \text{if } (i, s) \in S, \\ L(\cdot \,|\, i, s), & \text{otherwise}, \end{cases}$$

where $\Phi$ and $S$ are as in (36)–(37). Let $(i_0, s_0) \in I \times P^{d-1}$ be fixed, and let $\tau = \tau(i_0, s_0)$ be the first hitting time of the set $S$ by $(Z_n)$. From (37)–(38), it follows that $P(\tau < \infty) = 1$. So, in a way similar to the proofs of Theorems 3.1 and 3.2, we get

$$V(i_0, s_0, \Phi'^\infty) = \sum_{k=1}^{\infty} V^* \cdot P(\tau = k, \, Z_k \in S) = V^* P(\tau < \infty) = V^*,$$

and this means that $\Phi'^\infty$ is an optimal policy for the objective function $V(\cdot)$. The fact that $\Phi'$ is optimal for the objective function $\Lambda(\cdot)$ follows from Theorem 3.2; for $\Psi(\cdot)$, it is deduced from Inequalities (13) and (25). In this case, we have

$$\Lambda^* = \Psi^* = V^*.$$

Let $F, C^1, \ldots, C^m$ be a decomposition of $I \times P^{d-1}$ with respect to $\Phi'^\infty$ as in Theorem 3.1. Then, there exists a function $h$ defined on $I \times P^{d-1}$ such that

$$(E - T_{\Phi'}) h = \rho_{\Phi'} - V^*, \tag{39}$$

where

$$T_{\Phi'}(\cdot \,|\, i, s) = T(\cdot \,|\, i, s, \Phi') = \int_A T(\cdot \,|\, i, s, a) \, \Phi'(da \,|\, i, s)$$

and

$$\rho_{\Phi'}(i, s) = \rho(i, s, \Phi') = \int_A \rho(i, s, a) \, \Phi'(da \,|\, i, s).$$

Indeed, the function $h$ can be defined on $C = \bigcup_{i=1}^{m} C^i$, because $T_{\Phi'}$ is quasicompact and $\rho_{\Phi'} - V^*$ is a charge on every $C^r$. On the other hand, the set $F$ is transient, so

$$\sum_n T_{\Phi'}^n(F \,|\, i, s) < \infty,$$

for any $(i, s) \in F$. Since $\rho_{\Phi'} - V^*$ is bounded, we can define the function $h$ on $F$ by

$$h(i, s) = \sum_{n=0}^{\infty} T_{\Phi'}^n(\rho_{\Phi'} - V^* \,|\, i, s).$$

It is easy to check that $h$ satisfies Eq. (39). Let $h$ be such a solution of (39); we put

$$S(i, s) = \{a \in A : h(i, s) - T(h \,|\, i, s, a) \ge \rho(i, s, a) - V^*\}; \tag{40}$$

here, $S(i, s)$ is measurable and, by (39),

$$\Phi'(S(i, s) \,|\, i, s) > 0, \qquad \text{for any } (i, s),$$

so it follows from the selection theorem that we can find a map $f : I \times P^{d-1} \to A$ such that

$$h(i, s) - T(h \,|\, i, s, f(i, s)) \ge \rho(i, s, f(i, s)) - V^*.$$

Putting

$$T_f(h \,|\, i, s) = T(h \,|\, i, s, f(i, s)), \qquad \rho_f(i, s) = \rho(i, s, f(i, s)),$$

from (39)–(40) we get

$$h(i, s) - T_f(h \,|\, i, s) \ge \rho_f(i, s) - V^*,$$
$$T_f(h \,|\, i, s) - T_f^2(h \,|\, i, s) \ge T_f(\rho_f \,|\, i, s) - V^*,$$
$$\vdots$$
$$T_f^{n-1}(h \,|\, i, s) - T_f^n(h \,|\, i, s) \ge T_f^{n-1}(\rho_f \,|\, i, s) - V^*.$$

This implies that

$$0 \ge \limsup_{t \to \infty} \, (1/t) \sum_{n=1}^{t} T_f^n(\rho_f \,|\, i, s) - V^*,$$

i.e.,

$$V(i, s, f^\infty) = \limsup_{t \to \infty} \, (1/t) \sum_{n=1}^{t} T_f^n(\rho_f \,|\, i, s) = V^*.$$

This means that $f^\infty$ is an optimal stationary policy. Therefore, the theorem is proved. □
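In a finite model, the measurable selection of $f$ in this proof reduces to a per-state minimization: any action minimizing $\rho(x, a) + (T_a h)(x)$ lies in the set $S(x)$ of (40). A sketch under that finiteness assumption:

```python
import numpy as np

def greedy_policy(T_a, rho_a, h):
    """Per-state greedy improvement step.

    T_a:   (nA, n, n) transition kernels, one per action;
    rho_a: (nA, n) one-step costs rho(x, a);
    h:     (n,) solution of the Poisson equation (39).

    Returns f with f[x] = argmin_a [rho(x, a) + sum_y T(y|x,a) h(y)],
    which satisfies h(x) - T_f h(x) >= rho_f(x) - V* as in the proof."""
    scores = rho_a + np.einsum('axy,y->ax', T_a, h)
    return np.argmin(scores, axis=0)
```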


Corollary 4.1. If there exists $a_0 \in A$ such that the chain $P(a_0)$ is Harris recurrent, then there exists an optimal stationary policy.

In conclusion, we see that

$$\Lambda(i, s, \pi) \ge \Psi(i, s, \pi) \ge V(i, s, \pi),$$

for any policy $\pi \in \Pi$; and if $\Phi^\infty$ is a Markov policy, then, under the hypotheses of Theorem 4.1, we have

$$\Lambda(i, s, \Phi^\infty) = \Psi(i, s, \Phi^\infty) = V(i, s, \Phi^\infty).$$

But we have proved that there exists a stationary policy $f^\infty$ such that

$$\Lambda(i, s, f^\infty) = V(i, s) = V(i, s, f^\infty) = V^*,$$

for any $(i, s) \in I \times P^{d-1}$. This implies that

$$\Lambda(i, s) = \Lambda(i, s, f^\infty).$$

Theorem 4.2. If there exists a Markov policy such that condition (38) is satisfied, then there exists a stationary policy $f^\infty$ which minimizes the Lyapunov exponents of the solutions of the system (10). In this case, the spectrum of the system (10) consists of only one element, namely $\Lambda^*$. This means that

$$\lim_{t \to \infty} \, (1/t) \log |X^{f^\infty}(t, x)| = \Lambda^*, \qquad P_i^{f^\infty}\text{-a.s.},$$

for any $i \in I$ and $x \in \mathbb{R}^d \setminus \{0\}$.
Example 4.1. We consider an example for $d = 2$, $I = \{1, 2\}$, and

$$P(a) = \begin{pmatrix} 1 - \alpha & \alpha \\ \beta & 1 - \beta \end{pmatrix}, \qquad \forall a = (\alpha, \beta) \in A = [1/3, 2/3] \times [1/3, 2/3].$$

Suppose that, for any $i \in I$, the expectation

$$\lambda_i = E \log |M(i, Y_n)s|$$

does not depend on $s \in P^{d-1}$. Without loss of generality, we can suppose that $\lambda_1 \le \lambda_2$. In this case, $\rho$ defined in Section 3 is given by

$$\rho(i, s, a) = p_{i1}\lambda_1 + p_{i2}\lambda_2.$$

By Lemma 4.1, it suffices to consider only stationary policies. Let $\{f(i, s)\}$ be such a policy, and let $\nu(di, ds)$ be an invariant measure associated to $f^\infty$. Then,

$$V(\nu, f^\infty) = \int \rho(i, s, f(i, s)) \, \nu(di, ds) = \sum_{j \in I} \int p_{ij}(f(i, s)) \, \lambda_j \, \nu(di, ds) = \lambda_1 + (\lambda_2 - \lambda_1) \int p_{i2}(f(i, s)) \, \nu(di, ds) \ge \lambda_1 + (1/3)(\lambda_2 - \lambda_1) = (2/3)\lambda_1 + (1/3)\lambda_2.$$

The inequality becomes an equality if and only if

$$f(1, s) = 1/3, \qquad f(2, s) = 2/3, \tag{41}$$

i.e., the policy chooses $\alpha = 1/3$ in state 1 and $\beta = 2/3$ in state 2. This means that an optimal policy is given by (41), and in this case

$$V^* = (2/3)\lambda_1 + (1/3)\lambda_2.$$

It is easy to see that we can choose an example where $V^* < 0$ but $V(\nu, f) \ge 0$ for other stationary policies; i.e., under the optimal policy, our system is stable, while, under other policies, our system may not be stable.
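A direct numerical check of the example, with assumed values $\lambda_1 = -1$, $\lambda_2 = 0.5$ (any $\lambda_1 \le \lambda_2$ behaves the same): the stationary law of $P(a)$ is $\nu = (\beta, \alpha)/(\alpha + \beta)$, and a grid search over $A$ locates the minimizer at $(\alpha, \beta) = (1/3, 2/3)$, matching (41).

```python
import numpy as np

lam = np.array([-1.0, 0.5])   # assumed values with lambda_1 <= lambda_2

def average_cost(alpha, beta):
    # Stationary law of P(a) is nu = (beta, alpha) / (alpha + beta).
    nu = np.array([beta, alpha]) / (alpha + beta)
    P = np.array([[1 - alpha, alpha],
                  [beta, 1 - beta]])
    # rho(i, s, a) = p_i1*lambda_1 + p_i2*lambda_2, averaged over nu.
    return nu @ (P @ lam)

grid = np.linspace(1/3, 2/3, 101)
best = min((average_cost(a, b), a, b) for a in grid for b in grid)
print(best)                                  # minimizer at (1/3, 2/3)
print((2/3) * lam[0] + (1/3) * lam[1])       # V* = (2/3)lam1 + (1/3)lam2
```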

References

1. FURSTENBERG, H., and KIFER, Y., Random Matrix Products and Measures on Projective Spaces, Israel Journal of Mathematics, Vol. 46, pp. 12–32, 1982.
2. ARNOLD, L., and KLIEMANN, W., Qualitative Theory of Stochastic Systems, Probabilistic Analysis and Related Topics, Vol. 3, pp. 1–79, 1983.
3. DU, N. H., and NHUNG, T. V., Relation between the Sample and Moment Lyapunov Exponents, Stochastics and Stochastics Reports, Vol. 37, pp. 201–211, 1991.
4. KURANO, M., The Existence of a Minimum Pair of State and Policy for Markov Decision Processes under the Hypothesis of Doeblin, SIAM Journal on Control and Optimization, Vol. 27, pp. 296–307, 1989.
5. KUSHNER, H., Stochastic Stability and Control, Academic Press, New York, NY, 1967.
6. DYNKIN, E. B., and IUSHKEVICH, A. A., Markov Controlled Processes and Applications, Nauka, Moscow, Russia, 1976 (in Russian).
7. ROSENBLATT, M., Markov Processes: Structure and Asymptotic Behavior, Springer Verlag, Heidelberg, Germany, 1983.
8. DOOB, J. L., Stochastic Processes, John Wiley and Sons, New York, NY, 1953.
9. REVUZ, D., Markov Chains, North Holland, Amsterdam, Holland, 1975.
10. BYLOV, B. F., VINOGRAD, R. E., GROBMAN, M. M., and NEMYCKII, V. V., Theory of Lyapunov Exponents, Nauka, Moscow, Russia, 1986 (in Russian).


