R. D. DeGroat, E. M. Dowling, and D. A. Linebarger, "Subspace Tracking," CRC Press LLC, 2000.
Subspace Tracking

R. D. DeGroat
The University of Texas at Dallas

E. M. Dowling
The University of Texas at Dallas

D. A. Linebarger
The University of Texas at Dallas
66.1 Introduction
66.2 Background
    EVD vs. SVD • Short Memory Windows for Time Varying Estimation • Classification of Subspace Methods • Historical Overview of MEP Methods • Historical Overview of Adaptive, Non-MEP Methods
66.3 Issues Relevant to Subspace and Eigen Tracking Methods
    Bias Due to Time Varying Nature of Data Model • Controlling Roundoff Error Accumulation and Orthogonality Errors • Forward-Backward Averaging • Frequency vs. Subspace Estimation Performance • The Difficulty of Testing and Comparing Subspace Tracking Methods • Spherical Subspace (SS) Updating: A General Framework for Simplified Updating • Initialization of Subspace and Eigen Tracking Algorithms • Detection Schemes for Subspace Tracking
66.4 Summary of Subspace Tracking Methods Developed Since 1990
    Modified Eigen Problems • Gradient-Based Eigen Tracking • The URV and Rank Revealing QR (RRQR) Updates • Miscellaneous Methods
References
66.1 Introduction
Most high resolution direction-of-arrival (DOA) estimation methods rely on subspace or eigen-based information which can be obtained from the eigenvalue decomposition (EVD) of an estimated correlation matrix, or from the singular value decomposition (SVD) of the corresponding data matrix. However, the expense of directly computing these decompositions is usually prohibitive for real-time processing. Also, because the DOA angles are typically time-varying, repeated computation is necessary to track the angles. This has motivated researchers in recent years to develop low cost eigen and subspace tracking methods. Four basic strategies have been pursued to reduce computation: (1) computing only a few eigencomponents, (2) computing a subspace basis instead of individual eigencomponents, (3) approximating the eigencomponents or basis, and (4) recursively updating the eigencomponents or basis. The most efficient methods usually employ several of these strategies.
In 1990, an extensive survey of SVD tracking methods was published by Comon and Golub [7]. They classified the various algorithms according to complexity, and basically two categories emerge: O(n^2 r) and O(nr^2) methods, where n is the snapshot vector size and r is the number of extreme eigenpairs to be tracked. Typically, r < n or r ≪ n, so the O(nr^2) methods involve significantly fewer computations than the O(n^2 r) algorithms. However, since 1990, a number of O(nr) algorithms have
been developed. This article will primarily focus on recursive subspace and eigen updating methods developed since 1990, especially the O(nr^2) and O(nr) algorithms.
66.2 Background
66.2.1 EVD vs. SVD
Let X = [x_1 | x_2 | ... | x_N] be an n × N data matrix where the kth column corresponds to the kth snapshot vector, x_k ∈ C^n. With block processing, the correlation matrix for a zero mean, stationary, ergodic vector process is typically estimated as R = (1/N) X X^H, where the true correlation matrix is E[x_k x_k^H] = E[R].
The EVD of the estimated correlation matrix is closely related to the SVD of the corresponding data matrix. The SVD of X is given by X = U S V^H, where U ∈ C^(n×n) and V ∈ C^(N×N) are unitary matrices and S ∈ C^(n×N) is a diagonal matrix whose nonzero entries are positive. It is easy to see that the left singular vectors of X are the eigenvectors of X X^H = U S S^T U^H, and the right singular vectors of X are the eigenvectors of X^H X = V S^T S V^H. This is so because X X^H and X^H X are Hermitian positive semidefinite matrices (which have orthogonal eigenvectors and real, nonnegative eigenvalues). Also note that the nonzero singular values of X are the positive square roots of the nonzero eigenvalues of X X^H and X^H X. Mathematically, the eigen information contained in the SVD of X or the EVD of X X^H (or X^H X) is equivalent, but the dynamic range of the eigenvalues is twice that of the corresponding singular values. With finite precision arithmetic, the greater dynamic range can result in a loss of
information. For example, in rank determination, suppose the smallest singular value is ε, where ε is machine precision. The corresponding eigenvalue, ε^2, would be considered a machine precision zero, and the EVD of X X^H (or X^H X) would incorrectly indicate a rank deficiency. Because of the dynamic range issue, it is generally recommended to use the SVD of X (or a square root factor of R). However, because additive sensor noise usually dominates numerical errors, this choice may not be critical in most signal processing applications.
66.2.2 Short Memory Windows for Time Varying Estimation
Ultimately, we are interested in tracking some aspect of the eigenstructure of a time varying correlation (or data) matrix. For simplicity we will focus on time varying estimation of the correlation matrix, realizing that the EVD of R is trivially related to the SVD of X. A time varying estimator must have a short term memory in order to track changes. An example of long memory estimation is an estimator that involves a growing rectangular data window. As time goes on, the estimated quantities depend more and more on the old data, and less and less on the new data. The two most popular short memory approaches to estimating a time varying correlation matrix involve (1) a moving rectangular window and (2) an exponentially faded window. Unfortunately, an unbiased, causal estimate of the true instantaneous correlation matrix at time k, E[x_k x_k^H], is not possible if averaging is used and the vector process is truly time varying. However, it is usually assumed that the process is varying slowly enough within the effective observation window that the process is approximately stationary and some averaging is desirable. In any event, at time k, a length N moving rectangular data window results in a rank two modification of the correlation matrix estimate, i.e.,
R_k^(rect) = R_(k−1)^(rect) + (1/N) (x_k x_k^H − x_(k−N) x_(k−N)^H)    (66.1)
where x_k is the new snapshot vector and x_(k−N) is the oldest vector, which is being removed from the estimate. The corresponding data matrix is given by X_k^(rect) = [x_k | x_(k−1) | ... | x_(k−N+1)] and R_k^(rect) = (1/N) X_k^(rect) (X_k^(rect))^H. Subtracting the rank one matrix from the correlation estimate is referred to as
a rank one downdate. Downdating moves all of the eigenvalues down (or leaves them unchanged). Updating, on the other hand, moves all of the eigenvalues up (or leaves them unchanged). Downdating is potentially ill-conditioned because the smallest eigenvalue can move towards zero.
An exponentially faded data window produces a rank one modification in

R_k^(fade) = α R_(k−1)^(fade) + (1 − α) x_k x_k^H    (66.2)

where α is the fading factor with 0 ≤ α ≤ 1. In this case, the data matrix is growing in size, but the older data is de-emphasized with a diagonal weighting matrix, X_k^(fade) = [x_k | x_(k−1) | ... | x_1] sqrt(diag(1, α, α^2, ..., α^(k−1))) and R_k^(fade) = (1 − α) X_k^(fade) (X_k^(fade))^H.
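To make the two windowing schemes concrete, here is a minimal NumPy sketch (not from the chapter; the function and variable names are made up) of one step of the moving rectangular update (66.1) and the exponentially faded update (66.2):

```python
import numpy as np

def rect_update(R, x_new, x_old, N):
    # Rank two moving rectangular window update, Eq. (66.1).
    return R + (np.outer(x_new, x_new.conj()) - np.outer(x_old, x_old.conj())) / N

def fade_update(R, x_new, alpha):
    # Rank one exponentially faded update, Eq. (66.2).
    return alpha * R + (1.0 - alpha) * np.outer(x_new, x_new.conj())

# Example usage: track a 4 x 4 correlation estimate from a stream of snapshots.
rng = np.random.default_rng(1)
n, N, alpha = 4, 25, 0.96        # alpha chosen near 1 - 1/N for comparable memory
snapshots = rng.standard_normal((200, n)) + 1j * rng.standard_normal((200, n))

R_rect = np.zeros((n, n), dtype=complex)
R_fade = np.zeros((n, n), dtype=complex)
for k, x in enumerate(snapshots):
    x_old = snapshots[k - N] if k >= N else np.zeros(n, dtype=complex)
    R_rect = rect_update(R_rect, x, x_old, N)
    R_fade = fade_update(R_fade, x, alpha)
```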
Of course, the two windows could be combined to produce an exponentially faded moving rectangular window, but this kind of hybrid short memory window has not been the subject of much study in the signal processing literature. Similarly, not much attention has been paid to which short memory windowing scheme is most appropriate for a given data model. Since downdating is potentially ill-conditioned, and since two rank one modifications usually involve more computation than one, the exponentially faded window has some advantages over the moving rectangular window. The main advantage of a (short) rectangular window is in tracking sudden changes. Assuming stationarity within the effective observation window, the power in a rectangular window will be equal to the power in an exponentially faded window when

N ≈ 1/(1 − α)    or equivalently    α ≈ 1 − 1/N = (N − 1)/N.    (66.3)
Based on a Fourier analysis of linearly varying frequencies, equal frequency lags occur when [14]

N ≈ (1 + α)/(1 − α)    or equivalently    α ≈ (N − 1)/(N + 1).    (66.4)
Either one of these relationships could be used as a rule of thumb for relating the effective observation
window of the two most popular short memory windowing schemes.
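For example, with an effective window of N = 25 snapshots, the two rules of thumb give slightly different fading factors (a small illustrative computation, not part of the original text):

```python
N = 25
alpha_power = 1 - 1 / N          # Eq. (66.3), equal window power:  alpha = 0.96
alpha_lag = (N - 1) / (N + 1)    # Eq. (66.4), equal frequency lag: alpha ~ 0.923
print(alpha_power, alpha_lag)
```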
66.2.3 Classification of Subspace Methods
Eigenstructure estimation can be classified as (1) block or (2) recursive. Block methods simply
compute an EVD, SVD, or related decomposition based on a block of data. Recursive methods
update the previously computed eigen information using new data as it arrives. We focus on recursive
subspace updating methods in this article.
Most subspace tracking algorithms can also be broadly categorized as (1) modified eigen problem
(MEP) methods or (2) adaptive (or non-MEP) methods. With short memory windowing, MEP
methods are adaptive in the sense that they can track time varying eigen information. However,
when we use the word adaptive, we mean that exact eigen information is not computed at each
update, but rather, an adaptive method tends to move towards an EVD (or some aspect of an EVD) at
each update. For example, gradient-based, perturbation-based, and neural network-based methods
are classified as adaptive because on average they move towards an EVD at each update. On the other
hand, rank one, rank k, and sphericalized EVD and SVD updates are, by definition, MEP methods
because exact eigen information associated with an explicit matrix is computed at each update. Both
MEP and adaptive methods are supposed to track the eigen information of the instantaneous, time
varying correlation matrix.
66.2.4 Historical Overview of MEP Methods
Many researchers have studied SVD and EVD tracking problems. Golub [19] introduced one of the
first eigen-updating schemes, and his ideas were developed and expanded by Bunch and co-workers
in [3, 4]. The basic idea is to update the EVD of a symmetric (or Hermitian) matrix when modified
by a rank one matrix. The rank-one eigen update was simplified in [37], when Schreiber introduced
a transformation that makes the core eigenproblem real. Based on an additive white noise model,
Karasalo [21] and Schreiber [37] suggested that the noise subspace be “sphericalized”, i.e., replace
the noise eigenvalues by their average value so that deflation [4] could be used to significantly reduce
computation. By deflating the noise subspace and only tracking the r dominant eigenvectors, the
computation is reduced from O(n^3) to O(nr^2) per update. DeGroat reduced computation further
by extending this concept to the signal subspace [8]. By sphericalizing and deflating both the signal
and the noise subspaces, the cost of tracking the r dimensional signal (or noise) subspace is O(nr)
and no iteration is involved. To make eigen updating more practical, DeGroat and Roberts developed
stabilization schemes to control the loss of orthogonality due to the buildup of roundoff error [10].
Further work related to eigenvector stabilization is reported in [15, 28, 29, 30]. Recently, a more
stable version of Bunch’s algorithm was developed by Gu and Eisenstat [20]. In [46], Yu extended
rank one eigen updating to rank k updating.
DeGroat showed in [8] that forcing certain subspaces of the correlation matrix to be spherical, i.e.,
replacing the associated eigenvalues with a fixed or average value, is an easy way to deflate the size
of the updating problem and reduce computation. Basically, a spherical subspace (SS) update is a
rank one EVD update of a sphericalized correlation matrix. Asymptotic convergence analysis of SS
updating is found in [11, 13]. A four level SS update capable of automatic signal subspace rank and
size adjustment is described in [9, 11]. The four level and the two level SS updates are the only MEP
updates to date that are O(nr) and noniterative. For more details on SS updating, see Section 66.3.6,
Spherical Subspace (SS) Updating: A General Framework for Simplified Updating.
In [42], Xu and Kailath present a Lanczos based subspace tracking method with an associated
detection scheme to track the number of sources. A reference list for systolic implementations of
SVD based subspace trackers is contained in [12].
66.2.5 Historical Overview of Adaptive, Non-MEP Methods
Owsley pioneered orthogonal iteration and stochastic-based subspace trackers in [32]. Yang and
Kaveh extended Owsley's work in [44] by devising a family of constrained gradient-based algorithms.
A highly parallel algorithm, denoted the inflation method, is introduced for the estimation of the
noise subspace. The computational complexity of this family of gradient-based methods varies from
(approximately) n^2 r to (7/2) nr for the adaptation equation. However, since the eigenvectors are only approximately orthogonal, an additional nr^2 flops may be needed if Gram Schmidt orthogonalization is used. It may be that a partial orthogonalization scheme (see Section 66.3.2, Controlling Roundoff Error Accumulation and Orthogonality Errors) can be combined with Yang and Kaveh's methods to improve orthogonality enough to eliminate the O(nr^2) Gram Schmidt computation. Karhunen [22] also extended Owsley's work by developing a stochastic approximation method for subspace computation.
Bin Yang [43] used recursive least squares (RLS) methods with a projection approximation approach
to develop the projection approximation subspace tracker (PAST) which tracks an arbitrary basis for
the signal subspace, and PASTd which uses deflation to track the individual eigencomponents. A
multi-vector eigen tracker based on the conjugate gradient method is developed in [18]. Previous
conjugate gradient-based methods tracked a single eigenvector only. Orthogonal iteration, lossless
adaptive filter, and perturbation-based subspace trackers appear in [40], [36], and [5], respectively.
A family of non-EVD subspace trackers is given in [16]. An adaptive subspace method that uses a
linear operator, referred to as the Propagator, is given in [26]. Approximate SVD methods that are
based on a QR update step followed by a single (or partial) Jacobi sweep to move the triangular factor
towards a diagonal form appear in [12, 17, 30]. These methods can be described as approximate SVD
methods because they will converge to an SVD if the Jacobi sweeps are repeated.
Subspace estimation methods based on URV or rank revealing QR (RRQR) decompositions are
referenced in [6]. These rank revealing decompositions can divide a set of orthonormal vectors into
sets that span the signal and noise subspaces. However, a threshold (noise power) level that lies
between the largest noise eigenvalue and the smallest signal eigenvalue must be known in advance.
In some ways, the URV decomposition can be viewed as an approximate SVD. For example, the
transposed QR (TQR) iteration [12] can be used to compute the SVD of a matrix, but if the iteration
is stopped before convergence, the resulting decomposition is URV-like.
Artificial neural networks (ANN) have also been used to estimate eigen information [35]. In 1982,
Oja [31] was one of the first to develop an eigenvector estimating ANN. Using a Hebbian type learning
rule, this ANN adaptively extracts the first principal eigenvector. Much research has been done in
this area since 1982. For an overview and a list of references, see [35].
66.3 Issues Relevant to Subspace and Eigen Tracking
Methods
66.3.1 Bias Due to Time Varying Nature of Data Model
Because direction-of-arrival (DOA) angles are typically time varying, a range of spatial frequencies
is usually included in the effective observation window. Most spatial frequency estimation methods
yield frequency estimates that are approximately equal to the effective frequency average in the
window. Consequently, the estimates lag the true instantaneous frequency. If the frequency variation
is assumed to be linear within the effective observation window, this lag (or bias) can be easily
estimated and compensated [14].
66.3.2 Controlling Roundoff Error Accumulation and Orthogonality
Errors
Numerical algorithms are generally defined as stable if the roundoff error accumulates in a linear
fashion. However, recursive updating algorithms cannot tolerate even a linear buildup of error if large
(possibly unbounded) numbers of updates are to be performed. For real time processing, periodic
reinitialization is undesirable. Most of the subspace tracking algorithms involve the product of at
least k orthogonal matrices by the time the kth update is computed. According to Parlett [33], the
error propagated by a product of orthogonal matrices is bounded as
|U_k U_k^H − I|_E ≤ (k + 1) n^1.5 ε    (66.5)

where the n × n matrix U_k = U_(k−1) Q_k = Q_k Q_(k−1) ... Q_1 is a product of k matrices that are each orthogonal to working accuracy, ε is machine precision, and | · |_E denotes the Euclidean matrix norm. Clearly, if k is large enough, the roundoff error accumulation can be significant.
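This growth is easy to observe empirically. The sketch below (illustrative only; it assumes random rotations and single precision arithmetic, neither of which is specified in the chapter) accumulates a long product of nearly orthogonal matrices and prints the Frobenius norm departure from orthogonality as k grows:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
U = np.eye(n, dtype=np.float32)

# Accumulate a product of k random orthogonal matrices in single precision
# and monitor the orthogonality error ||U_k U_k^T - I||.
for k in range(1, 100001):
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)).astype(np.float32))
    U = U @ Q
    if k % 20000 == 0:
        err = np.linalg.norm(U @ U.T - np.eye(n))
        print(f"k = {k:6d}   orthogonality error = {err:.2e}")
```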
There are really only two sources of error in updating a symmetric or Hermitian EVD: (1) the
eigenvalues and (2) the eigenvectors. Of course, the eigenvectors and eigenvalues are interrelated.
Errors in one tend to produce errors in the other. At each update, small errors may occur in the
EVD update so that the eigenvalues become slowly perturbed and the eigenvectors become slowly
nonorthonormal. The solution is to prevent significant errors from ever accumulating in either.
We do not expect the main source of error to be from the eigenvalues. According to Stewart [38],
the eigenvalues of a Hermitian matrix are perfectly conditioned, having condition numbers of one.
Moreover, it is easy to show that when exponential weighting is used, the accumulated roundoff error