
JID:YDSPR AID:2072 /FLA  [m5G; v1.195; Prn:13/01/2017; 9:02] P.1 (1-12)

Digital Signal Processing ••• (••••) •••–•••

Contents lists available at ScienceDirect

Digital Signal Processing

www.elsevier.com/locate/dsp

Second-order optimization based adaptive PARAFAC decomposition of
three-way tensors

Viet-Dung Nguyen a,∗, Karim Abed-Meraim a, Nguyen Linh-Trung b

a PRISME Laboratory, University of Orléans, 12 rue de Blois BP 6744, Orléans, France
b University of Engineering and Technology, Vietnam National University Hanoi, 144 Xuan Thuy, Cau Giay, Hanoi, Viet Nam

a r t i c l e   i n f o

Article history:
Available online xxxx

Keywords:
Fast adaptive PARAFAC
Big data
Parallel computing
Non-negative constraint

a b s t r a c t

A fast adaptive parallel factor (PARAFAC) decomposition algorithm is proposed for a class of third-order tensors that have one dimension growing linearly with time. It is based on an alternating least squares approach in conjunction with a Newton-type optimization technique. By preserving the Khatri–Rao product structure and exploiting the reduced-rank update structure of the estimated subspace at each time instant, the algorithm achieves linear complexity and superior convergence performance. A modified version of the algorithm is also proposed to deal with the non-negativity constraint. In addition, parallel implementation issues are investigated. Finally, the performance of the algorithm is numerically studied and compared to several state-of-the-art algorithms.

© 2017 Elsevier Inc. All rights reserved.

1. Introduction

With the recent advances in sensor and streaming technologies, processing massive volumes of data (or "big data") under time constraints, or even in real time, is not only crucial but also challenging [1] in a wide range of applications, including MIMO radars [2], biomedical imaging [3], and signal processing [4]. A typical situation in these cases is that data are acquired along multiple dimensions, one of which is time. Such data can be naturally represented by multi-way arrays, called tensors. Tensor decomposition can therefore be used as an important tool to analyze, understand, or eventually compress the data. Tucker decomposition and parallel factor (PARAFAC) decomposition are two widely used tensor decomposition methods, and both can be considered generalizations of the singular value decomposition (SVD) to multi-way arrays. While Tucker decomposition lacks uniqueness and often requires imposed constraints such as orthogonality, non-negativity, or sparseness [5], PARAFAC decomposition is unique up to scale and permutation indeterminacies under mild conditions. For recent surveys on tensor decomposition, the reader is referred to [4,6,7] and references therein. In this paper, PARAFAC decomposition is the method of interest.

For streaming tensors, direct application of batch (i.e., offline) PARAFAC decomposition is computationally demanding. Instead, an adaptive (i.e., incremental) approach is more suitable and should provide a good trade-off between quality and efficiency. In contrast to adaptive filtering [8] or subspace tracking [9–11], which have a long-standing history and are well understood, adaptive tensor decomposition has received little attention so far.

In [2], Nion and Sidiropoulos proposed an adaptive decomposition model for a class of third-order tensors that have one dimension growing with time. Accordingly, they proposed two algorithms: recursive least-squares tracking (PARAFAC-RLST) and simultaneous diagonalization tracking (PARAFAC-SDT). In [3], Mardani, Mateos and Giannakis also proposed an adaptive PARAFAC method for streaming data under partial observation. A common basis of these studies is the use of first-order (i.e., gradient-based) methods to optimize an exponentially weighted least-squares cost function.

For the above class of third-order tensors, we have recently proposed a fast algorithm for adaptive PARAFAC decomposition that has only linear complexity [12]. This algorithm, called 3D-OPAST, generalizes the orthonormal projection approximation subspace tracking (OPAST) algorithm [13] by exploiting a special interpretation of the Khatri–Rao product as collinear vectors inside each column of the estimated subspace.

In this paper, we provide an improved version of 3D-OPAST. In particular, we propose a new algorithm for second-order optimization based adaptive PARAFAC decomposition (SOAP). Compared to 3D-OPAST, SOAP performs slightly better and is stable over long runs. The main contributions of the proposed algorithm are summarized as follows.

* Corresponding author.
E-mail addresses: (V.-D. Nguyen), (K. Abed-Meraim), (N. Linh-Trung).
1051-2004/© 2017 Elsevier Inc. All rights reserved.


1. SOAP has lower complexity and comparable or superior convergence as compared to PARAFAC-RLST and PARAFAC-SDT. In terms of complexity, if I and K are the two tensor dimensions other than time, and R is the tensor rank, then SOAP requires only O(IKR) flops per iteration (linear complexity with respect to R), while PARAFAC-RLST and PARAFAC-SDT require O(IKR²) (quadratic complexity). This is achieved by first exploiting a second-order stochastic gradient algorithm, in place of the first-order gradient algorithm used in [2], to improve estimation accuracy. Then, at each step of the algorithm, one column of the estimated subspace is forced to have a Kronecker product structure, so that the overall subspace approximately preserves the Khatri–Rao product structure. When possible, a rank-one update is also exploited to achieve linear complexity.
2. A variant of SOAP is proposed for adaptive PARAFAC decomposition with a non-negativity constraint. It is known that imposing a non-negativity constraint on PARAFAC, when applicable, not only improves the physical interpretation [14] but also helps to avoid diverging components [15,16]. To the best of our knowledge, adaptive non-negative PARAFAC has not previously been addressed in the literature.
3. SOAP lends itself to parallel/decentralized implementation, an advantage not considered in [2]. This is especially important when performing large-scale online processing tasks. SOAP allows a reduction of algorithm complexity and storage when several parallel computing units (DSPs) are available.


Notations: We follow the notations used in [7]. Calligraphic letters are used for tensors (𝒜, ℬ, ...). Matrices, (row and column) vectors, and scalars are denoted by boldface uppercase, boldface lowercase, and lowercase letters, respectively; for example, A, a, and a. The element (i, j, k) of a tensor 𝒜 ∈ C^{I×J×K} is denoted a_{ijk}, the element (i, j) of a matrix A ∈ C^{I×J} as a_{ij}, and the entry i of a vector a ∈ C^I as a_i. A ⊗ B denotes the Kronecker product of A and B, A ⊙ B the Khatri–Rao product (column-wise Kronecker product), A ∗ B the Hadamard product (element-wise matrix product), and a ◦ b the outer product of a and b. [A]_+ = max{0, a_{ij}}, for all i, j, is the positive-orthant projection of a real-valued matrix A. A^T, A^∗, A^H and A^# denote the transpose, the complex conjugate, the complex conjugate transpose and the pseudo-inverse of A, respectively. A ≥ 0 denotes a non-negative matrix A, whose entries satisfy a_{ij} ≥ 0 for all i, j.
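As a quick illustration of these matrix products (a minimal sketch in Python/NumPy; the array values are arbitrary examples, not taken from the paper):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])          # 2 x 2
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])          # 2 x 2

kron = np.kron(A, B)                # Kronecker product A ⊗ B, 4 x 4

# Khatri-Rao product A ⊙ B: column-wise Kronecker product, (2*2) x 2
khatri_rao = np.column_stack(
    [np.kron(A[:, r], B[:, r]) for r in range(A.shape[1])])

hadamard = A * B                    # Hadamard (element-wise) product A ∗ B

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])
outer = np.outer(a, b)              # outer product a ◦ b

A_plus = np.maximum(A - 2.5, 0.0)   # positive-orthant projection [.]_+
```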


2. Batch and adaptive PARAFAC


2.1. Batch PARAFAC

Consider a tensor X ∈ C^{I×J×K}. The PARAFAC decomposition of X can be written as

    X = Σ_{r=1}^{R} a_r ◦ b_r ◦ c_r,    (1)

a sum of R rank-one tensors, where R is the rank of X. The sets of vectors {a_r}, {b_r}, and {c_r} can be grouped into the so-called loading matrices A = [a_1 ⋯ a_R] ∈ C^{I×R}, B = [b_1 ⋯ b_R] ∈ C^{J×R}, and C = [c_1 ⋯ c_R] ∈ C^{K×R}.

In practice, (1) holds only approximately; in a noisy environment we have

    X = Σ_{r=1}^{R} a_r ◦ b_r ◦ c_r + N,    (2)

where N is a noise tensor. Thus, given a data tensor X, PARAFAC seeks the best rank-R least-squares approximation. Equation (1) can also be re-formulated in matrix form as

    X^{(1)} = (A ⊙ C) B^T,    (3)

where X^{(1)} ∈ C^{IK×J} with X^{(1)}_{(i−1)K+k, j} = x_{ijk}. Analogous expressions can be written for X^{(2)} and X^{(3)} [2]. Without loss of generality, we assume that 2 ≤ I ≤ K ≤ J. PARAFAC is generically unique if it satisfies the following condition [17], [18]:

    R ≤ min(I + K − 2, J).    (4)

Moreover, if 3 ≤ I ≤ K, then generic uniqueness generally holds (see [18] and references therein) when

    R ≤ (I − 1)(K − 1) and (I − 1)(K − 1) ≤ J.    (5)

Fig. 1. Adaptive third-order tensor model and its equivalent matrix form.
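The unfolding relation (3) can be checked numerically. The sketch below (Python/NumPy, with arbitrary small dimensions) builds a rank-R tensor from random loading matrices and verifies X^{(1)} = (A ⊙ C)B^T under the row ordering (i−1)K + k:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 3, 5, 4, 2                      # small illustrative dimensions

A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))

# Build X from the PARAFAC model (1): sum of R outer products a_r ∘ b_r ∘ c_r
X = np.zeros((I, J, K))
for r in range(R):
    X += np.einsum('i,j,k->ijk', A[:, r], B[:, r], C[:, r])

# Khatri-Rao product A ⊙ C (column-wise Kronecker product), IK x R
H = np.column_stack([np.kron(A[:, r], C[:, r]) for r in range(R)])

# Mode-1 unfolding with rows ordered as (i-1)K + k and columns indexed by j
X1 = X.transpose(0, 2, 1).reshape(I * K, J)

print(np.allclose(X1, H @ B.T))   # True
```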

2.2. Adaptive PARAFAC

In batch PARAFAC, the dimensions of X are fixed. In contrast, in adaptive PARAFAC they grow with time, that is, X(t) ∈ C^{I(t)×J(t)×K(t)} in the general case. In this paper, we consider the case where only one dimension grows with time, in particular X(t) ∈ C^{I×J(t)×K}.

For ease of comparison, we follow the basic adaptive PARAFAC model and assumptions introduced in [2]. Under this model, the mode-1 tensor represented in matrix form at time t, X^{(1)}(t), is given by

    X^{(1)}(t) ≈ H(t) B^T(t),    (6)

where H(t) = A(t) ⊙ C(t) ∈ C^{IK×R} and B(t) ∈ C^{J(t)×R}. Taking two successive time instants into account, it can be expressed as the concatenation of the mode-1 tensor of the past data at time t − 1 and a vector of new data at time t, i.e.,

    X^{(1)}(t) = (X^{(1)}(t − 1), x(t)),    (7)

where x(t) ∈ C^{IK} is obtained by vectorizing the new slice of data at time t. Fig. 1 illustrates this formulation.

The loading matrices A and C are assumed to follow unknown but slowly time-varying models, such that A(t) ≈ A(t − 1) and C(t) ≈ C(t − 1), and hence H(t) ≈ H(t − 1). Accordingly, we have

    B^T(t) ≈ (B^T(t − 1), b^T(t)).    (8)

This means that at each time instant we only need to estimate the row vector b(t) and append it to B(t − 1) to obtain B(t), instead of updating the whole of B(t).

Also, the tensor rank R is assumed to be known and constant, so that at each time instant, when new data are added to the old tensor, the uniqueness of the new tensor is guaranteed by (4) and (5). We note that estimating the tensor rank is an NP-hard problem [7]; several heuristic methods can be found in [19] and references therein.

3. Proposed second-order optimization based adaptive PARAFAC

Consider the following exponentially weighted least-squares cost function:

    Φ(t) = (1/2) Σ_{τ=1}^{t} λ^{t−τ} φ(τ),    (9)


where

    φ(τ) = ‖x(τ) − H(t) b^T(τ)‖²₂    (10)

and λ ∈ (0, 1] is referred to as the forgetting factor. In (10), we rely on the slow time variation of the loading matrices, so that H(τ) ≈ H(t) within the considered processing window. Now, finding the loading matrices of the adaptive PARAFAC model (6) corresponds to minimizing (9), that is,

    min_{H(t), B(t)} Φ(t),  s.t.  H(t) = A(t) ⊙ C(t).    (11)

This cost function is well known in adaptive filter theory [8] and can be minimized by a recursive least-squares method, as done in [2]. In this paper, we provide an alternative that not only improves the performance of the algorithm but also reduces its complexity. For the performance, our idea is to first optimize the exponentially weighted least-squares cost function using a second-order stochastic gradient, and then to approximately preserve the Khatri–Rao product structure of the estimated subspace H(t) at each step. To achieve linear complexity, we propose to update one column of the subspace per time instant, using a cyclic strategy. Accordingly, our algorithm is called Second-Order Optimization based Adaptive PARAFAC (SOAP).

Given the estimates of A(t − 1), B(t − 1) and C(t − 1), the objective of SOAP is to construct recursive update expressions for A(t), B(t) and C(t) using alternating minimization. The algorithm consists of four main steps:

Step 1: Estimate b^T(t).
Step 2: Given b(t), estimate H(t).
Step 3: Extract A(t) and C(t), and re-estimate one column of H(t).
Step 4: Calculate H^#(t) and update b(t).

While the steps are similar to those in [2], their details contain our various improvements. We now explain these steps in detail. The summary of SOAP is given in Table 1.

Table 1
Summary of SOAP.

Inputs: A(t − 1), B(t − 1), C(t − 1), H(t − 1), H^#(t − 1), R^{−1}(t − 1)

1. Estimate b^T(t)
   b^T(t) = H^#(t − 1) x(t)    (13)
2. Estimate H(t)
   β(t) = 1 + λ^{−1} b^∗(t) R^{−1}(t − 1) b^T(t)    (22)
   u(t) = λ^{−1} R^{−1}(t − 1) b^T(t)    (23)
   R^{−1}(t) = λ^{−1} R^{−1}(t − 1) − β^{−1}(t) u(t) u^H(t)    (21)
   γ(t) = η(1 − β^{−1}(t) b^∗(t) u(t))    (26)
   d(t) = γ(t)[x(t) − H(t − 1) b^T(t)]    (25)
   H(t) = H(t − 1) + d(t) u^H(t)    (24)
3. Extract A(t) and C(t), update one column of H(t)
   – Extract A(t) and C(t):
     for i = 1, ..., R
       H_i(t) = unvec(h_i(t))
       a_i(t) = H_i^T(t) c_i(t − 1)
       c_i(t) = H_i(t) a_i(t) / ‖H_i(t) a_i(t)‖
   – Update column j of H(t):
     j = (t mod R) + 1
     ĥ_j(t) = a_j(t) ⊗ c_j(t)    (30)
     z(t) = ĥ_j(t) − h_j(t)    (32)
     H(:, j)(t) = ĥ_j(t)
4. Calculate H^#(t) and update b(t)
   Calculate H^#(t) using the fast matrix inversion lemma
   b^T(t) = H^#(t) x(t)    (33)
   B^T(t) = [B^T(t − 1), b^T(t)]    (8)

Outputs: A(t), B(t), C(t), H(t), H^#(t), R^{−1}(t)
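A minimal real-valued sketch of one SOAP iteration following Table 1 (Python/NumPy). The dimensions, step size eta, and forgetting factor lam are illustrative defaults, and, as a simplification, the pseudo-inverse is recomputed directly here instead of via the fast rank-2 update of Step 4; the cyclic column index is 0-based (t mod R) rather than the paper's 1-based (t mod R) + 1:

```python
import numpy as np

def soap_step(x, t, A, C, B, H, Rinv, lam=0.8, eta=0.02):
    """One (real-valued) SOAP iteration, following Table 1 (sketch)."""
    I, R = A.shape
    K = C.shape[0]

    # Step 1: estimate b(t) from the previous subspace estimate (13)
    b = np.linalg.pinv(H) @ x                    # simplification: direct pinv

    # Step 2: second-order stochastic update of H(t), (21)-(26)
    beta = 1.0 + (b @ Rinv @ b) / lam            # (22)
    u = (Rinv @ b) / lam                         # (23)
    Rinv = Rinv / lam - np.outer(u, u) / beta    # (21)
    gamma = eta * (1.0 - (b @ u) / beta)         # (26)
    d = gamma * (x - H @ b)                      # (25)
    H = H + np.outer(d, u)                       # (24)

    # Step 3: extract A(t), C(t) via one Bi-SVD iteration (28)-(29),
    # then re-impose the Kronecker structure on one column (30)
    for i in range(R):
        Hi = H[:, i].reshape(I, K).T             # unvec: K x I rank-1 matrix
        A[:, i] = Hi.T @ C[:, i]
        ci = Hi @ A[:, i]
        C[:, i] = ci / np.linalg.norm(ci)
    j = t % R                                    # cyclic column selection
    H[:, j] = np.kron(A[:, j], C[:, j])

    # Step 4: recompute the pseudo-inverse and update b(t), B(t), (33) and (8)
    b = np.linalg.pinv(H) @ x
    B = np.vstack([B, b])
    return A, C, B, H, Rinv
```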

3.1. Estimate b^T(t)

This step is the same as in [2]. Vector b^T(t) can be obtained as the least-squares solution of (10), according to

    arg min_{b^T} ‖x(t) − H(t − 1) b^T(t)‖²₂,    (12)

which is

    b̂^T(t) = H^#(t − 1) x(t),    (13)

where H^#(t − 1) has been calculated in the previous iteration.

3.2. Estimate H(t)

Unlike [2], we use here a Newton-type method to find H(t). Let h = vec(H). The function φ(τ) in (10) can be rewritten as

    φ(τ) = ‖x(τ) − (b(τ) ⊗ I_{IK}) h(t)‖²₂,    (14)

wherein we have exploited the fact that vec(ABC^T) = (C ⊗ A) vec(B). As mentioned in [20], the direction of the maximum rate of change of Φ with respect to h is given by the derivative of Φ with respect to h^∗, i.e., [D_{h^∗} Φ(h, h^∗)]^T. Consequently, we have

    [D_{h^∗} Φ(h, h^∗)]^T |_{h=h(t−1)} = − Σ_{τ=1}^{t} λ^{t−τ} [(b^H(τ) ⊗ I_{IK}) x(τ) − (b^H(τ) b(τ) ⊗ I_{IK}) h(t − 1)].    (15)

Thus, its Hessian is computed as

    𝓗 = D_h([D_{h^∗} Φ(h, h^∗)]^T) |_{h=h(t−1)} = Σ_{τ=1}^{t} λ^{t−τ} [(b^H(τ) b(τ)) ⊗ I_{IK}] = R(t) ⊗ I_{IK},    (16)

where

    R(t) = Σ_{τ=1}^{t} λ^{t−τ} b^H(τ) b(τ) = λ R(t − 1) + b^H(t) b(t).

To achieve linear complexity, we replace (15) by the instantaneous (i.e., stochastic) gradient estimate

    [D_{h^∗} φ(h, h^∗, t)]^T |_{h=h(t−1)} = −[(b^H(t) ⊗ I_{IK}) x(t) − (b^H(t) b(t)) ⊗ I_{IK} h(t − 1)].    (17)

The update rule for h is thus given by

    h(t) = h(t − 1) − η 𝓗^{−1} [D_{h^∗} φ(h, h^∗, t)]^T,    (18)

where η is a step size. By substituting (16) and (17) into (18), we obtain

    h(t) = h(t − 1) + η [R^{−1}(t)(b^H(t) ⊗ I_{IK}) x(t) − R^{−1}(t)(b^H(t) b(t) ⊗ I_{IK}) h(t − 1)].    (19)

We can stack (i.e., unvec) (19) in matrix form as follows:

    H(t) = H(t − 1) + η [x(t) − H(t − 1) b^T(t)] b^∗(t) R^{−1}(t).    (20)

Here, we can see that calculating and storing the Hessian explicitly as in (16) is not necessary. Instead, we only need to calculate the inverse of R(t). Since R(t) has a rank-1 update structure, its inverse can be efficiently updated using the matrix inversion lemma, as given in (21)–(23) below.
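This rank-one update of the inverse can be checked numerically; the sketch below (Python/NumPy, real-valued, with illustrative values) compares the inversion-lemma update against a direct inverse of λR(t − 1) + b^T(t)b^∗(t):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.9                                   # forgetting factor λ (illustrative)

# Previous R(t-1): a random symmetric positive-definite matrix (illustrative)
M = rng.standard_normal((3, 3))
R_prev = M @ M.T + 3.0 * np.eye(3)
Rinv_prev = np.linalg.inv(R_prev)

b = rng.standard_normal(3)                  # new row b(t), real-valued here

# Rank-one update of the inverse, as in (21)-(23)
beta = 1.0 + (b @ Rinv_prev @ b) / lam      # β(t)
u = (Rinv_prev @ b) / lam                   # u(t)
Rinv_new = Rinv_prev / lam - np.outer(u, u) / beta

# Direct computation for comparison: R(t) = λ R(t-1) + bᵀb
Rinv_direct = np.linalg.inv(lam * R_prev + np.outer(b, b))
```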




    R^{−1}(t) = [λ R(t − 1) + b^T(t) b^∗(t)]^{−1} = λ^{−1} R^{−1}(t − 1) − β^{−1}(t) u(t) u^H(t),    (21)

where

    β(t) = 1 + λ^{−1} b^∗(t) R^{−1}(t − 1) b^T(t),    (22)
    u(t) = λ^{−1} R^{−1}(t − 1) b^T(t).    (23)

Substituting (21) into (20) yields

    H(t) = H(t − 1) + d(t) u^H(t),    (24)

where

    d(t) = γ(t)[x(t) − H(t − 1) b^T(t)],    (25)
    γ(t) = η(1 − β^{−1}(t) b^∗(t) u(t)).    (26)

3.3. Extract A(t) and C(t) & update one column of H(t)

The purpose of this step is to (i) preserve an approximate Khatri–Rao product structure of H(t) in order to improve the estimation accuracy and ensure the convergence of the algorithm, (ii) provide a reduced-rank update structure that allows the calculation of H^#(t) in the next step with linear complexity, and (iii) extract from H(t) the loading matrices A(t) and C(t). This can be implemented efficiently as follows.

Before proceeding further, recall from [21,2] that A(t) and C(t) can be extracted from H(t), based on

    H(t) = A(t) ⊙ C(t) = [a_1(t) ⊗ c_1(t) ⋯ a_R(t) ⊗ c_R(t)] = [vec(c_1(t) a_1^T(t)) ⋯ vec(c_R(t) a_R^T(t))].    (27)

Each column is the vectorization of a rank-1 matrix. The loading vectors c_i(t) and a_i(t) are, thus, the principal left singular vector and the conjugate of the principal right singular vector of the matrix H_i(t) = unvec(a_i(t) ⊗ c_i(t)), respectively [2]. However, since batch SVD is not suitable for adaptive tracking, we use a single Bi-SVD iteration [22] to update a_i(t) and c_i(t) recursively according to

    for i = 1, ..., R
        a_i(t) = H_i^T(t) c_i(t − 1),    (28)
        c_i(t) = H_i(t) a_i(t) / ‖H_i(t) a_i(t)‖.    (29)

Then, having obtained A(t) and C(t), H(t) could be re-updated according to (27), as done in [2]. However, to achieve linear complexity, we choose in this paper to re-update only one column of H(t) at each iteration. In particular, at time instant t, we select the column of H(t) to be updated in a cyclic way; that is, we select column j with j = (t mod R) + 1. This column is then updated as

    ĥ_j(t) = a_j(t) ⊗ c_j(t).    (30)

Because of the best rank-1 approximation property of the SVD, we take advantage of the denoised loading vectors a_j(t) and c_j(t) and thereby improve the accuracy of the H(t) estimate. The other columns of H(t) are left unchanged to preserve the reduced-rank structure of the updated matrix. The updated version of H(t) can be expressed as

    Ĥ(t) = H(t) + z(t) e_j^T(t),    (31)

where

    z(t) = ĥ_j(t) − h_j(t),    (32)

and e_j(t) is the unit vector whose j-th entry is one. Here, h_j(t) denotes the j-th column of H(t) estimated in the previous step, and z(t) is the error of the j-th column between the current step (i.e., ĥ_j(t)) and the previous step (i.e., h_j(t)). It is straightforward to see that Ĥ(t) has a rank-2 update structure by substituting (24) into (31).

3.4. Calculate H^#(t) and update b(t)

As mentioned in Step 3, we can compute the pseudo-inverse of Ĥ(t) efficiently thanks to its rank-2 update structure. Our case corresponds to Theorem 5 in [23], as given in the Appendix. Then, we can update b(t) by

    b^T(t) = Ĥ^#(t) x(t),    (33)

and hence obtain B(t) from (8).

3.5. Algorithm initialization

To initialize A(0), B(0), C(0), H^#(0) and R^{−1}(0) before tracking, we can capture J_0 slices, where J_0 is chosen to satisfy the uniqueness conditions (4) and (5), and then run a batch PARAFAC algorithm to obtain A(0), B(0), and C(0). After that, we compute H^#(0) = (A(0) ⊙ C(0))^# and R^{−1}(0) = (B^T(0) B(0))^{−1}.

4. Adaptive non-negative PARAFAC

In this section, we consider the case where a non-negativity constraint is imposed, and we modify SOAP accordingly. We call this modification of SOAP non-negative SOAP (NSOAP).

Given the non-negative estimates A(t − 1) ≥ 0, B(t − 1) ≥ 0, and C(t − 1) ≥ 0, we want to find recursive updates of A(t) ≥ 0, B(t) ≥ 0, and C(t) ≥ 0, the loading matrices of the PARAFAC decomposition. We note that, while SOAP works in the general complex-valued case, in this section we consider only real non-negative values.

A simple approach is to use the positive-orthant projection. That is, at each step of SOAP, we project the result onto the positive orthant; for example, in Step 1, set b^T(t) := [b^T(t)]_+. However, this naive combination does not work for Step 2 (the so-called projected Newton-type method), as indicated in the context of constrained optimization [24] and least-squares non-negative matrix approximation [25].

In practice, for batch processing, the projected Newton-type method requires a combination of restrictions on the Hessian (e.g., diagonal or partly diagonal [26]) and an Armijo-like step rule [24] to guarantee a quadratic rate of convergence. In spite of this advantage, computing the Armijo-like step rule is expensive and not suitable for adaptive algorithms. It is even more difficult in our case, because the global optimum can change continuously, depending on the new data.

Therefore, we propose to use a simpler strategy. In particular, because of the slowly time-varying model assumption, we restrict ourselves to calculating only the diagonal of the Hessian and using a fixed step rule. Even though a convergence proof is not yet available, this strategy still gives acceptable performance and represents a good trade-off between performance and complexity, as indicated in Section 6.

Now, for the details, the modifications of SOAP for handling the non-negativity constraint are as follows. At the end of Step 1, we add one minor step after obtaining b^T(t), by setting

    b^T(t) := [b^T(t)]_+.    (34)

In Step 2, after calculating R^{−1}(t), we extract the diagonal matrix (which is non-negative, since R is positive-definite Hermitian) as


    R̂^{−1}(t) = diag(diag(R^{−1}(t))),    (35)

and then calculate

    û(t) = λ^{−1} R̂^{−1}(t) b^T(t).    (36)

Thus, H(t) is updated, using d(t) and û(t) instead of d(t) and u(t), by

    H(t) = H(t − 1) + d(t) û^T(t).    (37)

Then, we project H(t) onto the positive orthant to obtain

    H̃(t) = [H(t)]_+.    (38)

As previously discussed in Step 3, c_i(t) and a_i(t) are, respectively, the principal left singular vector and the conjugate of the principal right singular vector of the matrix H_i(t) = unvec(a_i(t) ⊗ c_i(t)). The updated loading matrices A(t) and C(t), obtained by (28) and (29), are still non-negative, since H̃_i(t) is non-negative and so are A(t − 1) and C(t − 1) (already obtained at the previous time instant). In Step 4, a positive-orthant projection like (34) is used.

A summary of all the steps of the NSOAP algorithm is given in Table 2. The initialization of NSOAP is similar to that of SOAP, except that a batch non-negative PARAFAC algorithm is used instead of standard PARAFAC.

Table 2
Summary of NSOAP.

Inputs: A(t − 1) ≥ 0, B(t − 1) ≥ 0, C(t − 1) ≥ 0, H(t − 1), H^#(t − 1), R^{−1}(t − 1)

1. Estimate b^T(t)
   Perform Step 1 of SOAP    (13)
   b^T(t) = [b^T(t)]_+    (34)
2. Estimate H(t)
   β(t) = 1 + λ^{−1} b^∗(t) R^{−1}(t − 1) b^T(t)    (22)
   u(t) = λ^{−1} R^{−1}(t − 1) b^T(t)    (23)
   R^{−1}(t) = λ^{−1} R^{−1}(t − 1) − β^{−1}(t) u(t) u^H(t)    (21)
   R̂^{−1}(t) = diag(diag(R^{−1}(t)))    (35)
   γ(t) = η(1 − β^{−1}(t) b^∗(t) u(t))    (26)
   d(t) = γ(t)[x(t) − H(t − 1) b^T(t)]    (25)
   û(t) = λ^{−1} R̂^{−1}(t) b^T(t)    (36)
   H(t) = H(t − 1) + d(t) û^T(t)    (37)
   H̃(t) = [H(t)]_+    (38)
3. Same as Step 3 of SOAP, but with H̃(t)
4. Calculate H^#(t) and update b(t)
   Perform Step 4 of SOAP
   b^T(t) = [b^T(t)]_+    (34)
   B^T(t) = [B^T(t − 1), b^T(t)]    (8)

Outputs: A(t) ≥ 0, B(t) ≥ 0, C(t) ≥ 0, H(t), H^#(t), R^{−1}(t)
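The NSOAP modifications essentially amount to a diagonal restriction of the inverse and positive-orthant projections. A small illustrative sketch (Python/NumPy; all values are arbitrary, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

# (34): positive-orthant projection of the estimated row b(t)
b = np.maximum(rng.standard_normal(3), 0.0)

M = rng.standard_normal((3, 3))
Rinv = np.linalg.inv(M @ M.T + np.eye(3))   # illustrative R^{-1}(t)

# (35): keep only the diagonal of R^{-1}(t)
Rinv_hat = np.diag(np.diag(Rinv))

# (36): u_hat(t) = λ^{-1} R_hat^{-1}(t) b(t), with λ = 0.8 (illustrative)
u_hat = Rinv_hat @ b / 0.8

# (38): project an updated subspace estimate onto the positive orthant
H = rng.standard_normal((6, 3))
H_tilde = np.maximum(H, 0.0)
```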

5. Discussions

In this section, we provide some important comments on the similarities and differences between our algorithms and those developed in [2]. Discussions on parallel implementation are also given.

First, it is straightforward to see that, in all steps, the main cost of SOAP comes from matrix–vector products. Thus, it has a linear complexity of O(IKR). NSOAP is slightly more expensive, but has the same complexity order as SOAP.

Second, in Step 3 of SOAP, we can obviously choose to update d > 1 columns of H(t) instead of only one. This parameter d can be chosen to balance estimation accuracy against numerical cost. However, in our simulation contexts, we have observed that the loss of estimation accuracy is quite minor compared to updating all the columns of H(t), and so we opted for d = 1 in all the simulation experiments reported in this paper.

Third, the main reason that SOAP achieves linear complexity, while still having comparable or even superior performance (as shown in the next section), stems from Steps 2 and 3. In fact, the subspace H(t) in SOAP is updated twice. In Step 2, both the (stochastic) gradient and the Hessian are used, instead of only the gradient as in PARAFAC-RLST. This first update concerns all R columns (i.e., equation (24)). However, it does not preserve the Khatri–Rao product structure of H(t). Thus, in Step 3, we exploit the Khatri–Rao product structure to enhance the performance; this is the second update. Moreover, we note that, in [2], PARAFAC-RLST and PARAFAC-SDT extract A(t) and C(t) directly from the subspace update, without preserving the Khatri–Rao product structure as our proposed algorithm does. The fast calculation of H^#(t) using the pseudo-inverse lemma in Step 4 is a consequence of designing H(t) to have a rank-2 update in Steps 2 and 3.

Finally, in Step 3, we could update all columns of H(t) using (28) and (29) for i = 1, ..., R. However, this would lead to calculating H^#(t) without a rank-2 update structure as in SOAP, by using the Khatri–Rao product structure as follows:

    H^#(t) = [A^T(t) A(t) ∗ C^T(t) C(t)]^{−1} [A(t) ⊙ C(t)]^T.    (39)

The cost of this implementation is O(IKR²), and it is thus disregarded in this paper.

Now, we show that both SOAP and NSOAP are easy to realize in a parallel scheme. This implementation is important when used for massive data (large-dimensional systems). It can be observed that the main computational cost comes from matrix–vector products. Assume that R computational units (DSPs) are available. Then, in Step 1, Equation (13) corresponds to

    b_i^T(t) = h̃_i(t − 1) x(t),  i = 1, ..., R,    (40)

where h̃_i(t − 1) is the i-th row of H^#(t − 1). This means that we have replaced the matrix–vector product by vector–vector products. The same procedure can also be applied in Step 4. Steps 2 and 3 themselves already have a parallel structure and, again, each column of H(t) can be estimated independently. In this way, the overall cost per computing unit can be reduced, by approximately a factor of R, to O(IK) flops per iteration.

6. Simulations

In this section, we study the performance of the proposed algorithms using both synthetic and real data, the latter from [27]. We also consider the effect of different system parameters on the performance.

6.1. Performance comparison

First, we use the framework provided by the authors of [2] to verify and compare the performance of the considered algorithms. A time-varying model is thereby constructed so that, at time instant t, we generate the loading matrices A(t) and C(t) as

    A(t) = (1 − ε_A) A(t − 1) + ε_A N_A,    (41)
    C(t) = (1 − ε_C) C(t − 1) + ε_C N_C,    (42)

where ε_A and ε_C control the speed of variation of A and C between two successive observations, and N_A and N_C are random matrices of the same sizes as A and C, respectively. A vector b(t) is generated randomly, and the noiseless input data x(t) are given by

    x(t) = [A(t) ⊙ C(t)] b^T(t).

Thus, this observation vector follows the model described in Section 2.2 and is constrained by the assumptions therein. The noisy observation is then given by


Table 3
Experimental set-up parameters.

Fig. |  I | J0 |  K |  R |     T | ε_A, ε_C                  | λ   | η
2    | 20 | 50 | 20 |  8 |  1000 | 10^{−3}                   | 0.8 | NA
3    | 20 | 50 | 20 |  8 |  1000 | 10^{−3}                   | 0.8 | 0.02
4    |  5 | 70 | 61 |  3 |   201 | 10^{−3}                   | 0.8 | 0.002
5    | 20 | 50 | 20 |  8 |  1000 | 10^{−3}                   | 0.8 | 0.02
6    | 50 | 50 | 50 | 20 |  1000 | 10^{−3}                   | 0.8 | NA
7    | 20 | 50 | 20 |  8 | 10000 | 10^{−2}, 10^{−3}, 10^{−5}, 10^{−3} | 0.8 | NA
8–9  |  5 | 50 | 61 |  3 |  1000 | 0, 0                      | 0.8 | 0.02

x˜ (t ) = x(t ) + σ n(t ),

79

(43)

80
81

where n(t ) is a zero mean, unit-variance noise vector while parameter σ is introduced to control the noise level. We set a default
value of σ to 10−3 . To have a fair comparison, we keep all default
parameters of the algorithms and the model as offered by the authors of [2]. A summary of parameters used in our experiments is
showed in Table 3.
The performance measures for the loading matrices A(t ) and
C(t ) are the standard deviations (STD) between the true loading
matrices, A(t ) and C(t ), and their estimates, Aes (t ) and Ces (t ), up
to a scale and permutation indeterminacy at each time

82
83
84
85
86
87
88

89
90

25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51

52
53
54
55
56
57
58
59
60
61
62
63
64
65
66

91

STDA (t ) = A(t ) − Aes (t )

F,

(44)

STDC (t ) = C(t ) − Ces (t )

F

.


(45)

92
93
94
95

For the loading matrix B(t ), because of its time-shift structure, we
verify its performance through x(t ) by

STDB (t ) = x(t ) − xes (t )

96
97

2

.

(46)

98
99

We recall that, in the noiseless case, x(t) = [A(t) ⊙ C(t)]b^T(t). Thus, when we assess the performance of the algorithms by equation (46), we evaluate the whole model at each time instant and indirectly verify the estimation accuracy of b(t).

To assess the convergence rate of the algorithms, we set up the following scenario: the speed of variation of A and C is kept constant except at a few specific time instants, at which it increases arbitrarily. The algorithm that recovers faster thus has the better convergence rate. This scenario is similar to convergence-rate assessment in the context of subspace tracking; see, for example, [28,29].
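Since the measures in (44)-(45) are defined only up to the scale and permutation indeterminacy, the estimated columns must first be matched to the true ones before taking the Frobenius norm. The following sketch shows one way to do this in NumPy (greedy correlation matching; the function name and matching strategy are our own illustration, not the code used in the paper):

```python
import numpy as np

def std_metric(A_true, A_est):
    """Frobenius distance between two loading matrices after resolving
    the column permutation and scale (sign) indeterminacy."""
    A = A_true / np.linalg.norm(A_true, axis=0, keepdims=True)
    B = A_est / np.linalg.norm(A_est, axis=0, keepdims=True)
    R = A.shape[1]
    corr = np.abs(A.conj().T @ B)              # column-to-column similarity
    perm = np.full(R, -1, dtype=int)
    used = set()
    for flat in np.argsort(-corr, axis=None):  # greedy: best pairs first
        i, j = divmod(int(flat), R)
        if perm[i] == -1 and j not in used:
            perm[i] = j
            used.add(j)
    B = B[:, perm]
    signs = np.sign(np.sum(A * B, axis=0))     # fix the sign of each column
    return float(np.linalg.norm(A - B * signs))
```

For an estimate that is an exact column-permuted and rescaled copy of the truth, the returned value is numerically zero.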
The first experiment compares the performance of SOAP with that of PARAFAC-RLST, PARAFAC-SDT (exponential window), 3DOPAST and batch PARAFAC-ALS (alternating least squares). Batch PARAFAC here serves as a "lower bound" for the adaptive algorithms. As shown in Fig. 2, SOAP outperforms PARAFAC-RLST, PARAFAC-SDT and 3DOPAST, and is closer to batch PARAFAC than the others. In addition, SOAP, 3DOPAST, PARAFAC-RLST and PARAFAC-SDT have approximately the same convergence rate. Again, we note that the computational complexity of SOAP and 3DOPAST is O(IKR), as compared to O(IKR²) for PARAFAC-RLST and PARAFAC-SDT.
In the second experiment, for non-negative data, we take the absolute value of the previous model, i.e.,

A+(t) = |A(t)|,   (47)
C+(t) = |C(t)|,   (48)
x+(t) = |x(t)|,   (49)

where (·)+ denotes the non-negative version. Since there exist no other adaptive non-negative PARAFAC algorithms apart from our NSOAP, we compare NSOAP with the batch non-negative PARAFAC (Batch N-PARAFAC) algorithm implemented in the N-way toolbox [30]. The results are shown in Fig. 3. As expected, the performance of Batch N-PARAFAC is better than that of NSOAP. However, the advantage of NSOAP is its low computational complexity, which makes it suitable for streaming-data contexts.

Fig. 2. Performance and speed convergence rate comparison of five algorithms when loading matrices change relatively fast, εA = εC = 10−3.

To further study the performance of NSOAP, we apply it to a typical example using a fluorescence dataset [27], which includes five samples of fluorescence excitation-emission data of size 5 samples × 201 emission wavelengths × 61 excitation wavelengths. It has been shown that the loading matrices estimated by PARAFAC with the non-negativity constraint are similar to the pure spectra. Here, we use an initialization tensor of size 5 × 70 × 61. Note that the emission-wavelength range is relatively short, and during the interval [250, 300] one of the three components is almost zero. Fig. 4 shows that NSOAP can recover the tensor components in this particular example.


6.2. Relative execution time comparison


In Section 5, we indicated that our algorithms have linear complexity O(IKR). Here, we provide a rough complexity assessment of the algorithms using CPU execution time as a measure. We emphasize the relative nature of this comparison because CPU execution time depends on various factors, including the platform, the programming language and the implementation. We compare SOAP with PARAFAC-RLST and PARAFAC-SDT.¹ Again, all parameters of PARAFAC-RLST and PARAFAC-SDT are kept at their default values. The algorithms are run on a computer with an Intel Core i7 2.8 GHz processor and 8 GB RAM, using Matlab R2015a. Fig. 6 shows that SOAP is faster than PARAFAC-RLST and PARAFAC-SDT.
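A rough sketch of how such a relative CPU-time measurement can be organized in Python (the two callables below are hypothetical stand-ins contrasting O(IKR) and O(IKR²) per-iteration work, not the compared algorithms themselves):

```python
import time
import numpy as np

def time_update(update, n_iters=200, n_repeats=5):
    """Median wall-clock time per call of a tracking-update function."""
    runs = []
    for _ in range(n_repeats):
        t0 = time.perf_counter()
        for _ in range(n_iters):
            update()
        runs.append((time.perf_counter() - t0) / n_iters)
    return float(np.median(runs))

I, K, R = 20, 20, 8
X = np.random.randn(I * K, R)
linear_cost = lambda: X @ np.random.randn(R)        # O(IKR) matrix-vector work
quadratic_cost = lambda: X @ np.random.randn(R, R)  # O(IKR^2) matrix-matrix work
```

Taking the median over several repeats reduces the influence of operating-system jitter on the measurement.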


6.3. Effect of the speed of variation



In this section, we consider different values of the speeds of variation εA and εC to evaluate their effect on the performance. Fig. 5 shows that SOAP adapts better to fast variation (εA = εC = 10−2) than PARAFAC-RLST, PARAFAC-SDT and 3DOPAST. In this case, SOAP still converges while the others diverge. When the variation is slower (εA = εC = 10−3 or 10−5), SOAP is comparable to PARAFAC-RLST and PARAFAC-SDT.
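The time-varying loading matrices in such experiments are commonly simulated as a random walk whose step size plays the role of the speed of variation (in the spirit of the model of [2]). The sketch below, with names of our own choosing, illustrates the effect of this parameter:

```python
import numpy as np

def simulate_loadings(I, R, T, eps, seed=0):
    """Random-walk model A(t) = A(t-1) + eps * W(t), with W(t) i.i.d. Gaussian.
    eps plays the role of the speed of variation (eps_A or eps_C)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((I, R))
    trajectory = [A.copy()]
    for _ in range(T - 1):
        A = A + eps * rng.standard_normal((I, R))
        trajectory.append(A.copy())
    return trajectory

# Same noise sequence, different step sizes: a larger eps drifts faster.
slow = simulate_loadings(20, 8, 100, eps=1e-5)
fast = simulate_loadings(20, 8, 100, eps=1e-2)
```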


6.4. Long run-time stability


The performance of various algorithms, including the well-known recursive least squares (RLS) and least mean squares (LMS) algorithms, can suffer when they run for a long time. This is referred to as long run-time stability, or the limited-precision effect [31], caused by quantization errors accumulating over time. We show experimentally that SOAP is more stable than PARAFAC-RLST, PARAFAC-SDT and 3DOPAST in this respect² (Fig. 7), at least in the described context and with the given parameters. A theoretical analysis explaining why SOAP is more stable remains an open problem and deserves future work. We note that, in practice, the limited-precision effect can be resolved by reinitializing the algorithms after, typically, several thousand iterations.


6.5. Waveform-preserving character


The waveform-preserving character is important in communication and biomedical applications, for example in blind receivers for direct-sequence code-division multiple access (DS-CDMA) [32] and in multi-subject Functional Magnetic Resonance
receivers for direct-sequence code-division multiple access (DSCDMA) [32] and multi-subject Functional Magnetic Resonance


Fig. 3. Performance comparison of NSOAP with batch non-negative PARAFAC when loading matrices change relatively fast, εA = εC = 10−3.

¹ We disregard 3DOPAST in this experiment because its code is not yet optimized.
² An experiment in which we ran SOAP for 10⁶ iterations was conducted to confirm the stability of the proposed algorithm.



Fig. 4. Performance comparison of NSOAP with batch non-negative PARAFAC on the fluorescence data set. For B(t), we present only a part of the recovered loading matrix; the initialization part is disregarded.

Fig. 5. The effect of the speed of variation on the algorithm performance.




Fig. 6. CPU time-based comparison when loading matrices change relatively fast, εA = εC = 10−3.
Imaging (fMRI) analysis [33]. In this section, we illustrate this property of our algorithms via a synthetic example. We first generate the loading matrix B(t) comprising three kinds of signals: a chirp, a rectangular wave and a sawtooth wave. The signal length is 1 s and the sample rate is 1 kHz. For the chirp, the instantaneous frequency is 0 at t = 0 and crosses 300 Hz at t = 1 s. The rectangular wave has a frequency of 50 Hz, and the symmetric sawtooth wave has a repetition frequency of 20 Hz with a sawtooth width of 0.05 s. For A(t) and C(t), we use the loading matrices from the fluorescence example, where the components of A(t) exhibit a sharp change and the components of C(t) exhibit a smooth change. The PARAFAC model is then disturbed by Gaussian noise with a signal-to-noise ratio (SNR) of 15 dB. The SNR (in dB) is defined by

SNR(dB) = E{‖[A(t) ⊙ C(t)] b(t)‖²} / σ².   (50)

The simulation results obtained when applying SOAP and NSOAP are shown in Figs. 8 and 9, corresponding to the first 200 data samples (iterations). As we can see, both algorithms lead to a good restoration of the original components.
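The three test signals described above can be generated, for instance, with scipy.signal (a sketch of the set-up, not the authors' exact generator):

```python
import numpy as np
from scipy.signal import chirp, square, sawtooth

fs, T = 1000, 1.0                          # 1 kHz sample rate, 1 s of data
t = np.arange(0.0, T, 1.0 / fs)            # 1000 samples

b1 = chirp(t, f0=0.0, f1=300.0, t1=T)      # linear chirp: 0 Hz at t=0, 300 Hz at t=1 s
b2 = square(2 * np.pi * 50.0 * t)          # 50 Hz rectangular wave
b3 = sawtooth(2 * np.pi * 20.0 * t, width=0.5)  # 20 Hz symmetric (triangular) sawtooth

B = np.column_stack([b1, b2, b3])          # candidate columns of the loading matrix B
```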

7. Conclusions


In this paper, we have proposed two efficient adaptive PARAFAC decomposition algorithms: SOAP for the standard setup and NSOAP for the non-negativity-constrained case. To the best of our knowledge, no adaptive non-negative PARAFAC algorithm had been addressed before. By exploiting the data structure, the proposed algorithms achieve a linear computational complexity of O(IKR) per iteration while enjoying good performance compared to the state-of-the-art algorithms. These algorithms³ can be considered a starting point for real-time PARAFAC-based applications.


Acknowledgments


Fig. 7. Long run-time stability of four algorithms when loading matrices change relatively fast, εA = εC = 10−3.


We would like to thank the associate editor and the reviewers for their efforts and careful evaluations, comments and suggestions. We would also like to thank Dr. Nion and Prof. Sidiropoulos for making their codes available.

³ Program codes will be made available online after publication of this work.


Fig. 8. Illustration of the waveform-preserving character of SOAP through a synthetic example with SNR = 15 dB. The solid line represents the solution of SOAP while the dash-dot line represents the ground truth.

Fig. 9. Illustration of the waveform-preserving character of NSOAP through a synthetic example with SNR = 15 dB.


This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.02-2015.32.


Appendix A


To make our paper self-contained, we present here the rank-1 update of the pseudo-inverse discussed in Step 4 of the proposed algorithm (Section 3.4). Because the matrix Ĥ(t) has a rank-2 structure, we can apply formula (51) twice to obtain its pseudo-inverse Ĥ#(t). Given a matrix A ∈ C^{I×J}, its pseudo-inverse A# ∈ C^{J×I} and two vectors c ∈ C^{I×1}, d ∈ C^{J×1}, the fast update of (A + cd^H)#, corresponding to Theorem 5 in [23], is given by


(A + cd^H)# = A# + (1/β*) A# h^H u^H − (β*/σ) p q^H,   (51)

where

β = 1 + d^H A# c,
h = d^H A#,
k = A# c,
u = c − Ak,
p = −(‖u‖²/β*) A# h^H − k,
q^H = −(‖h‖²/β*) u^H − h,
σ = ‖h‖² ‖u‖² + |β|².

We note that this update involves only matrix-vector multiplications and thus preserves the linear complexity of our algorithm.
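Formula (51) can be checked numerically against a direct pseudo-inverse computation. The sketch below (our own NumPy illustration, not code from the paper) constructs a case satisfying the conditions of Theorem 5 in [23], namely c outside the range of A, d in the range of A^H, and β ≠ 0:

```python
import numpy as np

def pinv_rank1_update(A, Ap, c, d):
    """Rank-1 pseudo-inverse update of (A + c d^H), following the
    Theorem-5 case of Meyer [23]: c outside range(A), d inside
    range(A^H), beta != 0. Ap is the known pseudo-inverse of A."""
    beta = 1.0 + (d.conj().T @ Ap @ c).item()
    h = d.conj().T @ Ap                  # 1 x I row vector
    k = Ap @ c                           # J x 1 column vector
    u = c - A @ k                        # residual of c w.r.t. range(A)
    nh2 = np.linalg.norm(h) ** 2
    nu2 = np.linalg.norm(u) ** 2
    sigma = nh2 * nu2 + abs(beta) ** 2
    bc = np.conj(beta)
    p = -(nu2 / bc) * (Ap @ h.conj().T) - k
    qH = -(nh2 / bc) * u.conj().T - h
    return Ap + (1.0 / bc) * (Ap @ h.conj().T) @ u.conj().T - (bc / sigma) * p @ qH

# Construct a case satisfying the theorem's range conditions.
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 4))   # rank-3, 6 x 4
Ap = np.linalg.pinv(A)
c = rng.standard_normal((6, 1))                      # generically not in range(A)
d = 0.1 * A.conj().T @ rng.standard_normal((6, 1))   # in range(A^H)
updated = pinv_rank1_update(A, Ap, c, d)
```

Applying the update twice, as described above for the rank-2 matrix Ĥ(t), amounts to chaining two such calls.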

References

[1] K. Slavakis, G. Giannakis, G. Mateos, Modeling and optimization for big data analytics: (statistical) learning tools for our era of data deluge, IEEE Signal Process. Mag. 31 (5) (2014) 18–31.
[2] D. Nion, N.D. Sidiropoulos, Adaptive algorithms to track the PARAFAC decomposition of a third-order tensor, IEEE Trans. Signal Process. 57 (6) (2009) 2299–2310.
[3] M. Mardani, G. Mateos, G.B. Giannakis, Subspace learning and imputation for streaming big data matrices and tensors, IEEE Trans. Signal Process. 63 (10) (2015) 2663–2677.
[4] A. Cichocki, D. Mandic, L. De Lathauwer, G. Zhou, Q. Zhao, C. Caiafa, H.A. Phan, Tensor decompositions for signal processing applications: from two-way to multiway component analysis, IEEE Signal Process. Mag. 32 (2) (2015) 145–163.
[5] M. Mørup, L.K. Hansen, S.M. Arnfred, Algorithms for sparse nonnegative Tucker decompositions, Neural Comput. 20 (8) (2008) 2112–2131.
[6] P. Comon, Tensors: a brief introduction, IEEE Signal Process. Mag. 31 (3) (2014) 44–53.
[7] T.G. Kolda, B.W. Bader, Tensor decompositions and applications, SIAM Rev. 51 (3) (2009) 455–500.
[8] S.S. Haykin, Adaptive Filter Theory, Pearson Education India, 2007.
[9] P. Comon, G.H. Golub, Tracking a few extreme singular values and vectors in signal processing, Proc. IEEE 78 (8) (1990) 1327–1343.
[10] Y. Hua, Y. Xiang, T. Chen, K. Abed-Meraim, Y. Miao, A new look at the power method for fast subspace tracking, Digit. Signal Process. 9 (4) (1999) 297–314.
[11] X.G. Doukopoulos, G.V. Moustakides, Fast and stable subspace tracking, IEEE Trans. Signal Process. 56 (4) (2008) 1452–1465.
[12] V.-D. Nguyen, K. Abed-Meraim, N. Linh-Trung, Fast adaptive PARAFAC decomposition algorithm with linear complexity, in: International Conference on Acoustics, Speech and Signal Processing, ICASSP, 2016, pp. 6235–6239.
[13] K. Abed-Meraim, A. Chkeif, Y. Hua, Fast orthonormal PAST algorithm, IEEE Signal Process. Lett. 7 (3) (2000) 60–62.
[14] A. Cichocki, R. Zdunek, A.H. Phan, S. Amari, Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation, John Wiley & Sons, 2009.
[15] L.-H. Lim, P. Comon, Nonnegative approximations of nonnegative tensors, J. Chemom. 23 (7–8) (2009) 432–441.
[16] A. Stegeman, Finding the limit of diverging components in three-way CANDECOMP/PARAFAC – a demonstration of its practical merits, Comput. Stat. Data Anal. 75 (2014) 203–216.
[17] J.B. Kruskal, Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics, Linear Algebra Appl. 18 (2) (1977) 95–138.
[18] I. Domanov, L. De Lathauwer, Generic uniqueness conditions for the canonical polyadic decomposition and INDSCAL, SIAM J. Matrix Anal. Appl. 36 (4) (2015) 1567–1589.
[19] K. Liu, J.P.C. da Costa, H.C. So, L. Huang, J. Ye, Detection of number of components in CANDECOMP/PARAFAC models via minimum description length, Digit. Signal Process. 51 (2016) 110–123.
[20] A. Hjorungnes, D. Gesbert, Complex-valued matrix differentiation: techniques and key results, IEEE Trans. Signal Process. 55 (6) (2007) 2740–2746.
[21] F. Roemer, M. Haardt, Tensor-based channel estimation (TENCE) for two-way relaying with multiple antennas and spatial reuse, in: IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 2009, pp. 3641–3644.
[22] P. Strobach, Bi-iteration SVD subspace tracking algorithms, IEEE Trans. Signal Process. 45 (5) (1997) 1222–1240.
[23] C.D. Meyer Jr., Generalized inversion of modified matrices, SIAM J. Appl. Math. 24 (3) (1973) 315–323.
[24] D.P. Bertsekas, Projected Newton methods for optimization problems with simple constraints, SIAM J. Control Optim. 20 (2) (1982) 221–246.
[25] D. Kim, S. Sra, I.S. Dhillon, Fast Newton-type methods for the least squares nonnegative matrix approximation problem, in: SDM, vol. 7, SIAM, 2007, pp. 343–354.
[26] M. Schmidt, D. Kim, S. Sra, Projected Newton-type methods in machine learning, in: Optimization for Machine Learning, 2012, p. 305.
[27] R. Bro, PARAFAC. Tutorial and applications, Chemom. Intell. Lab. Syst. 38 (2) (1997) 149–171.
[28] P. Strobach, Fast recursive subspace adaptive ESPRIT algorithms, IEEE Trans. Signal Process. 46 (9) (1998) 2413–2430.
[29] R. Badeau, G. Richard, B. David, Fast and stable YAST algorithm for principal and minor subspace tracking, IEEE Trans. Signal Process. 56 (8) (2008) 3437–3446.
[30] C.A. Andersson, R. Bro, The N-way toolbox for MATLAB, Chemom. Intell. Lab. Syst. 52 (1) (2000) 1–4.
[31] J.M. Cioffi, Limited-precision effects in adaptive filtering, IEEE Trans. Circuits Syst. 34 (7) (1987) 821–833.
[32] N.D. Sidiropoulos, G.B. Giannakis, R. Bro, Blind PARAFAC receivers for DS-CDMA systems, IEEE Trans. Signal Process. 48 (3) (2000) 810–823.
[33] C.F. Beckmann, S.M. Smith, Tensorial extensions of independent component analysis for multisubject FMRI analysis, Neuroimage 25 (1) (2005) 294–311.


Viet-Dung Nguyen received the bachelor's degree from the VNU University of Engineering and Technology, Vietnam, in 2009, the M.Sc. degree in network and telecommunication from the École Normale Supérieure (ENS) de Cachan, France, in 2012, and the Ph.D. degree from the University of Orléans, France, in 2016, in the field of signal processing. He currently holds a postdoctoral research fellow position at the Signals and Systems Laboratory (L2S), University of Paris-Saclay.
His current research interests include wireless communication for the Internet of Things, matrix and tensor decompositions, adaptive signal processing, blind source separation, array signal processing and statistical performance analysis.



Karim Abed-Meraim was born in 1967. He received the State Engineering Degree from Ecole Polytechnique, Paris, France, in 1990, the State Engineering Degree from Ecole Nationale Supérieure des Télécommunications (ENST), Paris, France, in 1992, the M.Sc. degree from Paris XI University, Orsay, France, in 1992, and the Ph.D. degree from ENST in 1995 (in the field of signal processing and communications). From 1995 to 1998, he held a research fellow position at the Electrical Engineering Department of the University of Melbourne, where he worked on research projects related to "Blind System Identification for Wireless Communications" and "Array Processing for Communications". From 1998 to 2012 he was Assistant and then Associate Professor at the Signal and Image Processing Department of Télécom ParisTech. His research interests are related to statistical signal processing with applications to communications, system identification, adaptive filtering and tracking, radar and array processing, biomedical signal processing and statistical performance analysis. In September 2012 he joined the University of Orléans (PRISME Lab.) as a full Professor.
He has also been a visiting scholar at the Centre for Wireless Communications (National University of Singapore) in 1999, at the EEE Department of Nanyang Technological University (Singapore) in 2001, at Telecom




Malaysia Research and Development Centre in 2004, at the School of Engineering and Mathematics of Edith Cowan University (Perth, Australia) in 2004, at the EEE Department of the National University of Singapore in 2006, at Sharjah University (UAE) in 2008–2009, and at King Abdullah University of Science and Technology (KSA) in 2013 and 2014.
He is the author of about 400 scientific publications, including book chapters, international journal and conference papers, and patents.


Dr. Nguyen Linh-Trung received both his B.Eng. and Ph.D. degrees in Electrical Engineering from Queensland University of Technology, Brisbane, Australia. Since 2006, he has been on the faculty of the University of Engineering and Technology (VNU-UET), a member university of Vietnam National University, Hanoi (VNU), where he is currently an associate professor of electronic engineering in the Faculty of Electronics and Telecommunications.
His technical interest is in signal processing methods and algorithms (especially time-frequency signal analysis, blind source separation, compressive sampling, tensor-based signal analysis and graph signal processing) and their application to wireless communication and networking and to biomedical engineering, with a current focus on large-scale processing.
He has held a postdoctoral research fellow position at the French National Space Agency (CNES), and visiting positions at Télécom ParisTech, Vanderbilt University, CentraleSupélec, the Université Paris Sud, the Université Paris 13, and the University of Illinois. He has served the Radio-Electronics Association of Vietnam (REV) and the IEEE in a number of positions, including member of the REV Standing Committee, senior member of the IEEE, TPC co-chair of the REV-IEEE annual International Conference on Advanced Technologies for Communications (ATC), and managing editor of the REV Journal on Electronics and Communications.
