
LIMITING BEHAVIOR OF EIGENVECTORS
OF LARGE DIMENSIONAL RANDOM
MATRICES
XIA NINGNING
NATIONAL UNIVERSITY OF SINGAPORE
2013
LIMITING BEHAVIOR OF EIGENVECTORS
OF LARGE DIMENSIONAL RANDOM
MATRICES
XIA NINGNING
(B.Sc. Bohai University of China)
A THESIS SUBMITTED
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
DEPARTMENT OF STATISTICS AND APPLIED
PROBABILITY
NATIONAL UNIVERSITY OF SINGAPORE
2013
ACKNOWLEDGEMENTS
I would like to express my deep and sincere gratitude to my supervisor, Professor
Bai Zhidong. His valuable guidance and continuous support are crucial to the
completion of this thesis. He is truly a great mentor not only in statistics but also
in daily life. I have learned many things from him, especially regarding academic
research and character building. Next, I would like to thank Assistant Professor
Pan Guangming and Qin Yingli for discussion on various topics in research. I also
thank all my friends who helped me to make life easier as a graduate student.
Finally, I wish to express my gratitude to the university and the department for
supporting me through NUS Graduate Research Scholarship.
CONTENTS

Acknowledgements

Summary

List of Notations

Chapter 1  Introduction
1.1  Large Dimensional Random Matrices
     1.1.1  Spectral Analysis
     1.1.2  Eigenvector
1.2  Methodologies
     1.2.1  Moment Method
     1.2.2  Stieltjes Transform
     1.2.3  Organization of the Thesis

Chapter 2  Literature Review for Sample Covariance Matrices
2.1  Spectral Analysis
     2.1.1  Limiting Spectral Distribution
     2.1.2  Limits of Extreme Eigenvalues
     2.1.3  Convergence Rate
     2.1.4  CLT of Linear Spectral Statistics
2.2  Eigenvector Properties

Chapter 3  Convergence Rate of VESD for Sample Covariance Matrices
3.1  Introduction and Main Result
     3.1.1  Introduction
     3.1.2  Main Theorems
3.2  Methodology
     3.2.1  Stieltjes Transform
     3.2.2  Inequalities for the Distance between Distributions via Stieltjes Transforms
3.3  Preliminary Formulae
3.4  Proofs
     3.4.1  Truncation and Normalization
     3.4.2  Proof of Theorem 3.1
     3.4.3  Proof of Theorem 3.2
     3.4.4  Proof of Theorem 3.3
     3.4.5  Appendix

Chapter 4  Functional CLT of Eigenvectors for Sample Covariance Matrices
4.1  Introduction and Main Result
     4.1.1  Introduction
     4.1.2  Main Result
4.2  Bernstein Polynomial Strategy
4.3  Proof
     4.3.1  Convergence of ∇_1 − E∇_1
     4.3.2  Mean Function
     4.3.3  Appendix

Chapter 5  Conclusion and Future Research
5.1  Conclusion
5.2  Future Research

Bibliography
SUMMARY
All classical limiting theorems in multivariate statistical analysis assume that the number of variables is fixed and that the sample size is much larger than the dimension of the data. With the rapid development of computer science, however, we now deal with large dimensional data in most cases. In the move from low dimensional to large dimensional problems, random matrix theory (RMT) has emerged as an efficient approach, receiving much attention and developing significantly. The original statistical questions of multivariate analysis have thus turned into the investigation of the limiting properties of eigenvalues and eigenvectors of large dimensional random matrices in RMT.
Based on the observation that the limiting spectral properties of large dimensional sample covariance matrices are asymptotically distribution free, and the fact that the matrix of eigenvectors (eigenmatrix) of a Wishart matrix is Haar distributed over the group of unitary matrices, it is conjectured that the eigenmatrix of a large sample covariance matrix should asymptotically behave as Haar distributed under suitable moment conditions. This thesis is therefore concerned with the limiting behavior of eigenvectors of large sample covariance matrices.
The main work in this thesis consists of two parts. In the first part (Chapter 3), to investigate the limiting behavior of eigenvectors of a large sample covariance matrix, we define the eigenvector empirical spectral distribution (VESD), with weights determined by the eigenvectors, and establish three types of convergence rates of the VESD when the data dimension n and the sample size N tend to infinity proportionally. In the second part (Chapter 4), the limiting behavior of eigenvectors of sample covariance matrices is discussed further. Using Bernstein polynomial approximation and the results obtained in Chapter 3, we prove a central limit theorem for the linear spectral statistics associated with the VESD, indexed by a set of functions with continuous second order derivatives. This result provides strong evidence for the conjecture that the eigenmatrix of a large sample covariance matrix is asymptotically Haar distributed. Based on the result in Chapter 4, we thus obtain a better view of the asymptotic properties of eigenvectors of large general random matrices, such as Wigner matrices.
LIST OF NOTATIONS
X_n = (X_ij)_{n×N} = (X_1, ···, X_N)

S_n = (1/N) X_n X_n*, the simplified sample covariance matrix

S̲_n = (1/N) X_n* X_n, the companion matrix of S_n

x_n : a unit vector in the space C^n, with ∥x_n∥ = 1

c_n = n/N → c, the dimension to sample size ratio index

F^c(x), F^{c_n}(x) : the M-P law with ratio c and c_n

F^{S_n}(x), F^{S̲_n}(x) : the empirical spectral distribution (ESD) of S_n and S̲_n

F̲^{c_n}(x) = (1 − c_n) I_{(0,∞)}(x) + c_n F^{c_n}(x)

H^{S_n}(x) : the eigenvector empirical spectral distribution (VESD) of S_n

m_n(z), m_n^H(z) : the Stieltjes transforms of F^{S_n}(x) and H^{S_n}(x)

m_n^0(z), m(z) : the Stieltjes transforms of F^{c_n}(x) and F^c(x)

m̲_n(z) : the Stieltjes transform of F^{S̲_n}(x)

m̲_n^0(z), m̲(z) : the Stieltjes transforms of F̲^{c_n}(x) and F̲^c(x)

U : an open interval including the support of the M-P law

G_n(x) = √N (H^{S_n}(x) − F^{c_n}(x))

G_n(f) = ∫_U f(x) dG_n(x), f ∈ C²(U)

γ_m : the contour formed by the boundary of the rectangle with vertices (a_l ± i/√m) and (b_r ± i/√m)

f_m(x) : Bernstein polynomial functions

A_n = O_p(B_n) : lim_{t→∞} sup_n P(|A_n/B_n| ≥ t) = 0

∥x_n∥ : the Euclidean norm of a vector x_n ∈ C^n

∥A∥ : the spectral norm of a matrix, i.e. ∥A∥ = √(λ_max(AA*))

∥F(x)∥ : the norm of a function, i.e. ∥F(x)∥ = sup_x |F(x)|
CHAPTER 1
Introduction
1.1 Large Dimensional Random Matrices

The development of random matrix theory (RMT) stems from the fact that classical multivariate analysis is no longer suitable for dealing with large dimensional problems. Classical multivariate analysis assumes that the dimension of the data is small and fixed while the number of observations, or sample size, is large and tends to infinity. However, most of the data sets we deal with nowadays are ones in which the dimension increases together with the sample size; in other words, the dimension and the sample size are of the same order.
The following two examples illustrate the serious effects of applying conventional statistical analysis to large dimensional problems. Bai and Saranadasa (1996) showed that both Dempster's non-exact test (Dempster, 1958) and their asymptotically normally distributed test have higher power than the classical Hotelling test when the data dimension is proportionally close to the sample degrees of freedom. Another example was presented in Bai and Silverstein (2004). When the dimension n increases proportionally to the sample size N, an important statistic in multivariate analysis, L_n = ln(det S_n), behaves in a completely different manner than it does for data of very low dimension with large sample size. Thus, a serious error is made when classical limiting theory is used to show the asymptotic normality of L_n in the large dimensional case.
Therefore, the theory of random matrices, as a feasible and effective method for large dimensional data analysis, has received much attention among statisticians in recent years. For the same reason, RMT is widely applied in many research areas, such as finance, engineering, signal processing, genetics, network security, image processing and wireless communication.

From its inception, random matrix theory has been heavily influenced by its applications in physics, statistics and engineering. The landmark contributions to the theory of random matrices by Wishart (1928), Wigner (1958), and Marčenko and Pastur (1967) were motivated to a large extent by practical experimental problems. Nowadays, RMT finds applications in fields as diverse as the Riemann hypothesis, stochastic differential equations, condensed matter physics, statistical physics, chaotic systems, numerical linear algebra, neural networks, multivariate statistics, information theory, signal processing, and small-world networks.
1.1.1 Spectral Analysis
Definition 1.1. (Empirical Spectral Distribution)
Suppose A is an n × n matrix with eigenvalues λ_j, j = 1, 2, ···, n. If all these eigenvalues are real, e.g. if A is Hermitian, we can define a one-dimensional distribution function

F^A(x) = (1/n) Σ_{j=1}^n I(λ_j ≤ x),

called the empirical spectral distribution (ESD) of the matrix A, where I denotes the indicator function. If the eigenvalues λ_j are not all real, we can define a two-dimensional empirical spectral distribution of the matrix A:

F^A(x, y) = (1/n) Σ_{j=1}^n I(ℜ(λ_j) ≤ x, ℑ(λ_j) ≤ y),

where ℜ and ℑ denote the real part and the imaginary part, respectively.
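As a concrete illustration, the ESD of Definition 1.1 can be computed in a few lines of NumPy; the sketch below (matrix size, seed and the Wigner-type normalization are our illustrative choices) evaluates F^A at a few points for a random symmetric matrix:

```python
import numpy as np

def esd(A, x):
    """F^A(x) = (1/n) * #{j : lambda_j <= x}, the ESD of a Hermitian matrix A."""
    eigs = np.linalg.eigvalsh(A)                 # real eigenvalues
    return np.mean(eigs[:, None] <= np.asarray(x), axis=0)

rng = np.random.default_rng(0)
n = 200
G = rng.standard_normal((n, n))
A = (G + G.T) / np.sqrt(2 * n)                   # a random symmetric matrix

print(esd(A, [-10.0, 0.0, 10.0]))                # steps from 0 up to 1
```

The function returned is a right-continuous step function increasing from 0 to 1, exactly as the definition requires.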
We are especially interested in sequences of random matrices whose dimension (number of rows) tends to infinity. One of the main problems in RMT is to investigate the convergence of the sequence of empirical spectral distributions {F^{A_n}} for a given sequence of random matrices {A_n}. The limit distribution F, which is usually nonrandom, is called the Limiting Spectral Distribution (LSD) of the sequence {A_n}.

The initial investigation of the spectral analysis of random matrices came from nuclear physics in the 1950s. There are thousands of energy levels in a quantum system; it is impossible to observe all of them individually, but they can be represented by the eigenvalues of a certain matrix. Since then, many theorems and applications concerning the spectra of large dimensional random matrices have been established by physicists and statisticians; see Mehta (1990). The interest of the spectral analysis of large dimensional random matrices for statistical inference stems from the fact that many important statistics in classical multivariate analysis can be expressed as functionals of the ESDs of certain random matrices. Thus, we can revise the conventional results using random matrix theory and make them effective in applications.
1.1.2 Eigenvector
Besides spectral analysis, practical applications of RMT have also raised the need for a better understanding of the limiting behavior of eigenvectors of large dimensional random matrices. For example, in principal component analysis (PCA), the eigenvectors corresponding to a few of the largest eigenvalues of random matrices (that is, the directions of the principal components) are of special interest. Therefore, the limiting behavior of eigenvectors of large dimensional random matrices has become an important issue in RMT. However, eigenvectors have received considerably less attention than eigenvalues in the literature, owing to the difficulty of their mathematical formulation when the dimension increases with the sample size.
1.2 Methodologies
This section introduces two important methods in the spectral analysis of large dimensional random matrices, namely the moment method and the Stieltjes transform method, both widely used in random matrix theory. We give a detailed discussion of the two methods, with particular attention to the investigation of convergence rates via the Stieltjes transform.

1.2.1 Moment Method
The moment method has been widely used for establishing the existence of limiting spectral distributions and limit theorems for extreme eigenvalues ever since it was first used by Wigner in 1958 to prove the famous semicircle law. Using the moment method, Bai, Silverstein and Yin (1988) proved an important theorem: the existence of a finite fourth moment of the entries of both Wigner and sample covariance matrices is a necessary and sufficient condition for the largest eigenvalues to converge almost surely to the largest number in the support of their limiting spectral distributions.
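This almost sure convergence of the largest eigenvalue can be observed numerically. The sketch below (standard Gaussian entries, which certainly have a finite fourth moment, and the ratio c = n/N = 0.5 are our illustrative choices) tracks the largest eigenvalue of the sample covariance matrix S_n = (1/N) X X* against the right edge (1 + √c)² of the Marčenko-Pastur support:

```python
import numpy as np

rng = np.random.default_rng(1)
c = 0.5                                  # dimension-to-sample-size ratio n/N
edge = (1 + np.sqrt(c)) ** 2             # right edge of the M-P support

for N in (200, 800, 3200):
    n = int(c * N)
    X = rng.standard_normal((n, N))      # entries with finite fourth moment
    S = X @ X.T / N                      # sample covariance matrix S_n
    lam_max = np.linalg.eigvalsh(S)[-1]  # largest eigenvalue
    print(N, lam_max, edge)
```

As N grows, the largest eigenvalue settles near the edge, in line with the theorem.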
The moment method is based on the moment convergence theorem. Suppose {F_n} denotes a sequence of distribution functions with finite moments of all orders. Let the k-th moment of the distribution F_n be denoted by

β_{n,k} = β_k(F_n) = ∫ x^k dF_n(x).

The following lemmas are summarized in Bai and Silverstein (2010, Appendix B).
Lemma 1.1. (Moment Convergence Theorem)
A sequence of distribution functions {F_n} converges weakly to a limit if the following conditions are satisfied:

(1) Each F_n has finite moments of all orders.

(2) For each fixed integer k ≥ 0, β_{n,k} converges to a finite limit β_k as n → ∞.

(3) If two right continuous nondecreasing functions F, G have the same moment sequence {β_k}, then F = G + constant.
The following two lemmas guarantee that a probability distribution function is uniquely determined by its moments.

Lemma 1.2. (Carleman)
Let {β_k = β_k(F)} be the sequence of moments of the distribution function F. If the following Carleman condition is satisfied:

Σ_{k=1}^∞ β_{2k}^{−1/(2k)} = ∞,

then F is uniquely determined by the moment sequence {β_k, k = 0, 1, ···}.

Lemma 1.3. (M. Riesz)
Let {β_k} be the sequence of moments of the distribution function F. If

lim inf_{k→∞} (1/k) β_{2k}^{1/(2k)} < ∞,

then F is uniquely determined by the moment sequence {β_k, k = 0, 1, ···}.
The moment convergence theorem specifies under what conditions the convergence of moments of all orders implies the weak convergence of the sequence of distributions {F_n}. In finding limiting spectral distributions, the fundamental tool is the moment convergence theorem combined with Carleman's condition or Riesz's condition.
1.2.2 Stieltjes Transform
In this section, another important method in the spectral analysis of random matrix theory will be introduced: the Stieltjes transform method. The Stieltjes transform method is commonly used to investigate the limiting spectral properties of a class of random matrices. Compared with the moment method, it is often more attractive to researchers, mainly because the moment method tends to involve sophisticated graph theory and combinatorics, which make the proofs tedious and complex.

First, we introduce the basic concepts and properties of the Stieltjes transform, following Bai and Silverstein (2010, Appendix B). Then the use of the Stieltjes transform will be demonstrated, that is, how to find limiting spectral distributions and estimate convergence rates of empirical spectral distributions in terms of their Stieltjes transforms.
Definition 1.2. (The Stieltjes Transform)
If G(x) is a function of bounded variation on the real line, then its Stieltjes transform is defined by

m_G(z) = ∫ 1/(λ − z) dG(λ), z ∈ C⁺,

where C⁺ ≡ {z ∈ C : ℑz > 0} and ℑ denotes the imaginary part.

Remark 1.1. The Stieltjes transform is defined on C⁺.

Remark 1.2. The imaginary part of z plays an important role in the Stieltjes transform. For any function of bounded variation, the Stieltjes transform always exists and is well defined, since the absolute value of m_G(z) is bounded by the total variation of G divided by v, where v = ℑz; in particular, |m_G(z)| ≤ 1/v when G is a distribution function.
Theorem 1.1. (Inversion formula)
For any continuity points a < b of G, we have

G{[a, b]} = lim_{ϵ→0⁺} (1/π) ∫_a^b ℑ m_G(x + iϵ) dx.

Considering G as a finite signed measure, the above theorem shows a one-to-one correspondence between finite signed measures and their Stieltjes transforms. Another important advantage of the Stieltjes transform is that the density function of a signed measure can be obtained easily via its Stieltjes transform. We have the following theorem.
Theorem 1.2. (Differentiability)
Let G be a function of bounded variation and x_0 ∈ R. Suppose that lim_{z∈C⁺→x_0} ℑ m_G(z) exists; call it ℑ m_G(x_0). Then G is differentiable at x_0, and its derivative is (1/π) ℑ m_G(x_0).
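Theorem 1.2 suggests a practical recipe: evaluating (1/π) ℑ m_G(x + iϵ) for a small ϵ > 0 approximates the density at x. The sketch below (a Wigner matrix, ϵ = 0.05 and the evaluation points are our choices) recovers the semicircle density √(4 − x²)/(2π) from an empirical Stieltjes transform:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
G = rng.standard_normal((n, n))
eigs = np.linalg.eigvalsh((G + G.T) / np.sqrt(2 * n))  # semicircle LSD on [-2, 2]

def density_via_stieltjes(x, eps=0.05):
    # Theorem 1.2: (1/pi) Im m_n(x + i*eps) approximates the density near x
    return np.mean(1.0 / (eigs - (x + 1j * eps))).imag / np.pi

def semicircle_density(x):
    return np.sqrt(max(4.0 - x * x, 0.0)) / (2.0 * np.pi)

for x in (-1.0, 0.0, 1.0, 3.0):
    print(x, density_via_stieltjes(x), semicircle_density(x))
```

Inside the support the two columns are close; outside (x = 3) the smoothed estimate is near zero, as the theorem predicts.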
Note that in advanced probability theory, the famous continuity theorem states that the weak convergence of distribution functions can be established via the convergence of their characteristic functions. Next, we introduce a parallel theorem relating bounded variation functions and their Stieltjes transforms. Using the following theorem, the convergence of the empirical spectral distributions of a class of random matrices can be established by showing the convergence of their Stieltjes transforms, and the limiting spectral distribution can be found as the limit of a sequence of Stieltjes transforms.
Theorem 1.3. (Continuity)
Assume that {G_n} is a sequence of functions of bounded variation with G_n(−∞) = 0 for all n. Then

lim_{n→∞} m_{G_n}(z) = m(z) for all z ∈ C⁺

if and only if there is a function of bounded variation G with G(−∞) = 0 and Stieltjes transform m(z) such that G_n → G vaguely.
Next, we show how the Stieltjes transform is used in random matrix theory to find the limiting spectral distribution and to establish the convergence rate of the empirical spectral distribution.
Let F^A be the empirical spectral distribution function of an n × n Hermitian matrix A. Then the Stieltjes transform of F^A is given by

m_{F^A}(z) = ∫ 1/(x − z) dF^A(x) = (1/n) tr(A − zI)^{−1},

where I denotes the identity matrix. Applying the inverse matrix formula further, we have

m_{F^A}(z) = (1/n) Σ_{k=1}^n 1/(a_{kk} − z − α_k*(A_k − zI)^{−1} α_k),

where a_{kk} is the (k, k)-th entry of A, * denotes the conjugate transpose, A_k is the (n−1) × (n−1) submatrix of A with the k-th row and k-th column removed, and α_k is the k-th column of A with the k-th entry removed.

If the denominator a_{kk} − z − α_k*(A_k − zI)^{−1} α_k can be expressed as g(z, m_{F^A}(z)) + o(1) for some function g, then the limiting spectral distribution exists and its Stieltjes transform m(z) is the solution to the equation

m(z) = 1/g(z, m(z)).
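For the sample covariance matrix S_n = (1/N) X_n X_n* with unit-variance entries and ratio c = n/N, this recipe leads to the well-known quadratic equation c z m² − (1 − c − z) m + 1 = 0, equivalently m = 1/(1 − c − z − c z m), for the Stieltjes transform of the M-P law (we quote this standard fact without derivation). The sketch below (sizes, seed and the test point z are ours) compares its C⁺ root with the empirical resolvent trace:

```python
import numpy as np

rng = np.random.default_rng(5)
c, N = 0.5, 2000
n = int(c * N)
X = rng.standard_normal((n, N))
eigs = np.linalg.eigvalsh(X @ X.T / N)

z = 1.0 + 0.05j
m_emp = np.mean(1.0 / (eigs - z))       # (1/n) tr (S_n - zI)^{-1}

# c z m^2 - (1 - c - z) m + 1 = 0, i.e.  m = 1/(1 - c - z - c z m)
roots = np.roots([c * z, -(1 - c - z), 1.0])
m_mp = roots[np.argmax(roots.imag)]     # pick the root with positive imaginary part
print(m_emp, m_mp)                      # close for large n
```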
To estimate the convergence rate of an empirical spectral distribution function to its limiting spectral distribution, the following three lemmas (also called the Bai inequalities), first established by Bai in 1993, bound the distance between two distribution functions in terms of their Stieltjes transforms.
Lemma 1.4. (Theorem 2.1, Bai (1993a))
Let F be a distribution function and let G be a function of bounded variation satisfying ∫ |F(x) − G(x)| dx < ∞. Denote their Stieltjes transforms by f(z) and g(z), respectively. Then we have

∥F − G∥ := sup_x |F(x) − G(x)|
≤ (1/(π(2γ − 1))) [ ∫_{−∞}^{∞} |f(z) − g(z)| du + (1/v) sup_x ∫_{|y|≤2vh} |G(x + y) − G(x)| dy ],

where z = u + iv, v > 0, and h and γ are constants related to each other by

γ = (1/π) ∫_{|u|<h} 1/(u² + 1) du > 1/2.
Sometimes, we can bound ∥F − G∥ in terms of the integral of the absolute difference of the Stieltjes transforms over a finite interval, when the functions F and G have light tails or bounded support. We have the following lemmas.
Lemma 1.5. (Theorem 2.2, Bai (1993a))
Under the assumptions of Lemma 1.4, we have

∥F − G∥ ≤ (1/(π(1 − κ)(2γ − 1))) [ ∫_{−A}^{A} |f(z) − g(z)| du + 2πv^{−1} ∫_{|x|>B} |F(x) − G(x)| dx + v^{−1} sup_x ∫_{|y|≤2vh} |G(x + y) − G(x)| dy ],

where A and B are positive constants such that A > B and

κ = 4B/(π(A − B)(2γ − 1)) < 1.    (1.1)
Lemma 1.6. (Corollary 2.3, Bai (1993a))
In addition to the assumptions of Lemma 1.5, assume further that, for some constant B > 0, F([−B, B]) = 1 and |G|((−∞, −B)) = |G|((B, ∞)) = 0, where |G|((a, b)) denotes the total variation of the signed measure G on the interval (a, b). Then we have

∥F − G∥ ≤ (1/(π(1 − κ)(2γ − 1))) [ ∫_{−A}^{A} |f(z) − g(z)| du + v^{−1} sup_x ∫_{|y|≤2vh} |G(x + y) − G(x)| dy ],

where A, B and κ are defined in (1.1).
Remark 1.3. From these lemmas, we can see that the convergence rate of ∥F − G∥ does not depend on the constants h, γ, κ, A, B, but only on the rate at which v tends to 0, which will be illustrated in detail in Chapter 3.
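The distance ∥F − G∥ controlled by these lemmas can also be measured by brute force. The sketch below (ratio c = 0.5, grid-based integration of the M-P density, and the sample sizes are our choices) evaluates sup_x |F^{S_n}(x) − F^c(x)| at the jump points of the ESD and watches it shrink as n grows:

```python
import numpy as np

rng = np.random.default_rng(6)
c = 0.5
a, b = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2

def mp_cdf(x):
    """M-P distribution function with ratio c, by trapezoidal integration."""
    if x <= a:
        return 0.0
    grid = np.linspace(a, min(x, b), 2000)
    dens = np.sqrt(np.maximum((b - grid) * (grid - a), 0.0)) / (2 * np.pi * c * grid)
    return float(np.sum((dens[1:] + dens[:-1]) * np.diff(grid)) / 2)

dists = []
for N in (100, 400, 1600):
    n = int(c * N)
    X = rng.standard_normal((n, N))
    eigs = np.sort(np.linalg.eigvalsh(X @ X.T / N))
    F_mp = np.array([mp_cdf(x) for x in eigs])
    j = np.arange(1, n + 1) / n
    # Kolmogorov distance, evaluated at the jump points of the ESD
    dists.append(max(np.max(j - F_mp), np.max(F_mp - (j - 1 / n))))
    print(N, dists[-1])
```

How fast this distance (and its VESD analogue) shrinks is exactly the question studied in Chapter 3.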
1.2.3 Organization of the Thesis
The thesis consists of five chapters and is organized as follows. In Chapter 1, we have provided a general introduction to RMT, including spectral and eigenvector analysis as well as the two main research methodologies. In Chapter 2, we give a detailed review of the spectral analysis of random matrices.
Chapters 3 and 4 are the main parts of this thesis. We prove our main results: three types of convergence rates of the VESD, and a central limit theorem for linear spectral statistics of the VESD for sample covariance matrices, respectively. Here VESD denotes the eigenvector empirical spectral distribution, defined in (3.2).
In the last chapter, we discuss future research.
CHAPTER 2
Literature Review for Sample
Covariance Matrices
2.1 Spectral Analysis
The sample covariance matrix is the most fundamental and important random matrix in multivariate statistical analysis. It plays an important role in hypothesis testing, principal component analysis and factor analysis, and many test statistics can be expressed as functions of its eigenvalues. Thus, the spectral analysis of sample covariance matrices has been well developed in the past decades. We will introduce
