
New Mathematics and Natural Computation
Vol. 11, No. 2 (2015) 121–133
© World Scientific Publishing Company
DOI: 10.1142/S1793005715400013

New Math. and Nat. Computation 2015.11:121-133. Downloaded from www.worldscientific.com
by MCMASTER UNIVERSITY on 10/26/15. For personal use only.

Novel Algorithm for Non-Negative Matrix Factorization

Tran Dang Hien*
Vietnam National University
Hanoi, Vietnam


Do Van Tuan
Hanoi College of Commerce and Tourism
Hanoi, Vietnam


Pham Van At
Hanoi University of Communications and Transport
Hanoi, Vietnam


Le Hung Son
Hanoi University of Science and Technology
Hanoi, Vietnam



Non-negative matrix factorization (NMF) is an emerging technique with a wide spectrum of potential applications in data analysis. Mathematically, NMF can be formulated as a minimization problem with non-negativity constraints. This problem attracts much attention from researchers, both for theoretical reasons and for its potential applications. Currently, the most popular approach to solving NMF is the multiplicative update algorithm proposed by Lee and Seung. In this paper, we propose an additive update algorithm that is computationally faster than Lee and Seung's multiplicative update algorithm.

Keywords: NMF; non-negative matrix factorization; KKT; Karush–Kuhn–Tucker optimality conditions; stationary point; updating an element of a matrix; updating matrices.

1. Introduction
Non-negative matrix factorization (NMF) is an approximate representation of a given non-negative matrix $V \in \mathbb{R}^{n \times m}$ as a product of two non-negative matrices

*Corresponding author.
$W \in \mathbb{R}^{n \times r}$ and $H \in \mathbb{R}^{r \times m}$:

$$V \approx W \times H. \tag{1.1}$$


Because r is usually chosen to be a small number, the sizes of the matrices W and H are much smaller than that of V. If V is a data matrix of an object, then W and H can be viewed as approximate representations of V. Thus, NMF can be considered an effective technique for representing and reducing data. Although this technique has been developed only recently, it has a wide range of applications, such as mathematical optimization [9], document clustering [7, 11], data mining [8], object recognition [5] and detecting forgery [10, 12].
To measure the quality of the approximation in (1.1), the Frobenius norm of the difference matrix is used:

$$f(W, H) = \frac{1}{2}\|WH - V\|_F^2 = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{m}\big((WH)_{ij} - V_{ij}\big)^2. \tag{1.2}$$
Thus, NMF can be formulated as an optimization problem with non-negativity constraints:

$$\min_{W \ge 0,\; H \ge 0} f(W, H). \tag{1.3}$$

Since the objective function f(W, H) is not convex, most known methods can only find a pair of matrices (W, H) satisfying the Karush–Kuhn–Tucker (KKT) conditions [2] (such a pair (W, H) is called a stationary point of (1.3)):

$$W_{ia} \ge 0, \quad H_{bj} \ge 0,$$
$$\big((WH - V)H^T\big)_{ia} \ge 0, \quad \big(W^T(WH - V)\big)_{bj} \ge 0,$$
$$W_{ia} \cdot \big((WH - V)H^T\big)_{ia} = 0, \quad H_{bj} \cdot \big(W^T(WH - V)\big)_{bj} = 0, \quad \forall\, i, a, b, j, \tag{1.4}$$

where

$$(WH - V)H^T = \frac{\partial f(W, H)}{\partial W}, \qquad W^T(WH - V) = \frac{\partial f(W, H)}{\partial H}.$$
To update H or W (the other being held fixed), the reverse gradient direction is often used with an appropriate step size to decrease the value of the objective function while still ensuring the non-negativity of H and W. Among the known algorithms for solving (1.3), the Lee and Seung (LS) algorithm [6] is the most often mentioned. This algorithm is a simple calculation scheme that is easy to implement and gives good results, so it remains the most commonly used algorithm [10, 12]. The LS algorithm uses the formulas:

$$\tilde{H}_{ij} = H_{ij} - \eta_{ij}\left(\frac{\partial f}{\partial H}\right)_{ij} \equiv H_{ij} + \eta_{ij}\big(W^T V - W^T W H\big)_{ij}, \tag{1.5}$$

$$\tilde{W}_{ij} = W_{ij} - \delta_{ij}\left(\frac{\partial f}{\partial W}\right)_{ij} \equiv W_{ij} + \delta_{ij}\big(V\tilde{H}^T - W\tilde{H}\tilde{H}^T\big)_{ij}. \tag{1.6}$$



By selecting the step sizes $\eta_{ij}$ and $\delta_{ij}$ as

$$\eta_{ij} = \frac{H_{ij}}{(W^T W H)_{ij}}, \qquad \delta_{ij} = \frac{W_{ij}}{(W\tilde{H}\tilde{H}^T)_{ij}}, \tag{1.7}$$

formulas (1.5) and (1.6) become

$$\tilde{H}_{ij} = H_{ij}\,\frac{(W^T V)_{ij}}{(W^T W H)_{ij}}, \qquad \tilde{W}_{ij} = W_{ij}\,\frac{(V\tilde{H}^T)_{ij}}{(W\tilde{H}\tilde{H}^T)_{ij}}.$$

Because the above update formulas use multiplications, this algorithm is called the multiplicative update method. With these updates, non-negativity is ensured automatically.
Reference 6 also proves the monotonic decrease of the objective function after each update:

$$f(\tilde{W}, \tilde{H}) < f(W, H).$$

This algorithm is easy to implement on a computer, but it converges slowly.
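In a modern array language the LS updates take only a few lines; the following NumPy sketch applies one pass of (1.5)–(1.7) (the small constant eps, which guards against division by zero, is our addition and is not part of the original algorithm):

```python
import numpy as np

def ls_update(V, W, H, eps=1e-12):
    """One pass of the Lee-Seung multiplicative updates for
    min 0.5 * ||W H - V||_F^2 subject to W, H >= 0."""
    # (1.5) with step (1.7): H <- H * (W^T V) / (W^T W H)
    H = H * (W.T @ V) / (W.T @ W @ H + eps)
    # (1.6) with step (1.7), using the updated H:
    # W <- W * (V H^T) / (W H H^T)
    W = W * (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because every factor on the right-hand side is non-negative, non-negativity of W and H is preserved without any explicit projection.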
To improve the convergence speed, Gonzalez and Zhang [3] improved the LS algorithm by using one additional coefficient for each column of H and one additional coefficient for each row of W. In other words, instead of (1.5) and (1.6) they use the formulas:

$$\tilde{H}_{ij} = H_{ij} + \alpha_j\,\eta_{ij}\big(W^T V - W^T W H\big)_{ij},$$
$$\tilde{W}_{ij} = W_{ij} + \beta_i\,\delta_{ij}\big(V\tilde{H}^T - W\tilde{H}\tilde{H}^T\big)_{ij},$$

where $\eta_{ij}$ and $\delta_{ij}$ are defined by (1.7), and $\alpha_j$ and $\beta_i$ are calculated through a function

$$\gamma = g(A, b, x),$$

where A is a matrix and b and x are vectors. This function is defined as follows:

$$q = A^T(b - Ax) \quad \text{and} \quad p = \big[x ./ (A^T A x)\big] \odot q,$$

where the symbols (./) and ($\odot$) denote component-wise division and multiplication, respectively. Then $\gamma = g(A, b, x)$ is calculated by the formula:

$$\gamma = \min\left\{\frac{p^T q}{p^T A^T A p},\; 0.99 \times \max\{\gamma : x + \gamma p \ge 0\}\right\}.$$

The coefficients $\alpha_j$ and $\beta_i$ are determined by the function g(A, b, x) as follows:

$$\alpha_j = g(W, V_j, H_j), \quad j = 1, \ldots, m,$$
$$\beta_i = g(H^T, V_i^T, W_i^T), \quad i = 1, \ldots, n,$$

where $V_j$ and $H_j$ denote the jth columns of V and H, and $V_i$ and $W_i$ denote the ith rows of V and W. However, experiments show that the Gonzalez–Zhang (GZ) algorithm [3] is not faster than the LS algorithm.
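The step-length function g(A, b, x) above can be sketched as follows (a sketch from the displayed formulas; the handling of the case where p has no negative component, in which the ratio bound is unbounded, is our reading of the max expression):

```python
import numpy as np

def g(A, b, x):
    """Step length gamma = g(A, b, x) of the Gonzalez-Zhang variant."""
    q = A.T @ (b - A @ x)              # q = A^T (b - A x)
    p = (x / (A.T @ (A @ x))) * q      # p = [x ./ (A^T A x)] .* q
    # Largest gamma keeping x + gamma * p >= 0, shrunk by the factor 0.99
    neg = p < 0
    gamma_max = np.min(-x[neg] / p[neg]) if neg.any() else np.inf
    # Exact minimizer of the quadratic objective along direction p
    gamma_q = (p @ q) / (p @ (A.T @ (A @ p)))
    return min(gamma_q, 0.99 * gamma_max)
```

The 0.99 safety factor keeps the updated vector strictly inside the non-negative orthant rather than exactly on its boundary.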
In this paper we propose a new algorithm that updates each element of the matrices W and H, based on the idea of the non-linear Gauss–Seidel method [4]. Under some assumptions, the proposed algorithm is guaranteed to reach a stationary point (Theorem 3.2.2, Sec. 3.2). Experiments show that the proposed algorithm converges faster than the LS and GZ algorithms.

The content of this paper is organized as follows. In Sec. 2, we present an algorithm to update one element of the matrix W or H. This algorithm is used in Sec. 3 to construct a new algorithm for NMF (1.3); we also consider the convergence properties of the new algorithm. Section 4 presents a scheme for implementing the new algorithm on a computer. In Sec. 5, we present experimental results that compare the computational speed of the algorithms. Finally, conclusions are given in Sec. 6.
2. Algorithm for Updating an Element of a Matrix
2.1. Updating an element of the matrix W

In this subsection, we consider the algorithm for updating one element of W, while keeping the remaining elements of W and all of H fixed. Suppose $W_{ij}$ is adjusted by adding a parameter $\alpha$:

$$\tilde{W}_{ij} = W_{ij} + \alpha. \tag{2.1}$$

If $\tilde{W}$ is the resulting matrix, then by some matrix operations we have

$$(\tilde{W}H)_{ab} = \begin{cases} (WH)_{ab}, & a \ne i,\; b = 1, \ldots, m, \\ (WH)_{ib} + \alpha H_{jb}, & a = i,\; b = 1, \ldots, m. \end{cases}$$

From (1.2) it follows that

$$f(\tilde{W}, H) = f(W, H) + g(\alpha), \tag{2.2}$$
where

$$g(\alpha) = \frac{1}{2}\,p\,\alpha^2 + q\,\alpha, \tag{2.3}$$

$$p = \sum_{b=1}^{m} H_{jb}^2, \tag{2.4}$$

$$q = \sum_{b=1}^{m} (WH - V)_{ib}\, H_{jb}. \tag{2.5}$$

To minimize $f(\tilde{W}, H)$, one needs to define $\alpha$ so that $g(\alpha)$ achieves its minimum value subject to the condition $\tilde{W}_{ij} = W_{ij} + \alpha \ge 0$. Because $g(\alpha)$ is a quadratic function, $\alpha$ can be defined as follows:

$$\alpha = \begin{cases} 0, & q = 0, \\ \max\left(-\dfrac{q}{p},\, -W_{ij}\right), & q > 0, \\ -\dfrac{q}{p}, & q < 0. \end{cases} \tag{2.6}$$

Formula (2.6) is always meaningful, because if $q \ne 0$ then by (2.5) some $H_{jb} \ne 0$, and hence by (2.4) we have $p > 0$. From (2.3) and (2.6), we get:

$$g(\alpha) = 0, \quad \text{if } (q = 0) \text{ or } (q > 0 \text{ and } W_{ij} = 0), \tag{2.7a}$$
$$g(\alpha) < 0, \quad \text{otherwise.} \tag{2.7b}$$


By using the update formulas (2.1) and (2.6), the monotonic decrease of the objective function f(W, H) is confirmed in the following lemma.

Lemma 2.1.1. If the KKT conditions are not satisfied at $W_{ij}$, then

$$f(\tilde{W}, H) < f(W, H);$$

otherwise, $\tilde{W} = W$.

Proof. From (2.5) it follows that

$$q = \big((WH - V)H^T\big)_{ij}.$$

Therefore, if the KKT conditions (1.4) are not satisfied at $W_{ij}$, then the properties

$$W_{ij} \ge 0, \quad q \ge 0, \quad W_{ij} \cdot q = 0 \tag{2.8}$$

cannot hold simultaneously. From this, and because $W_{ij} \ge 0$, it follows that case (2.7a) cannot occur. Therefore, case (2.7b) must occur, so $g(\alpha) < 0$ and, from (2.2), we obtain

$$f(\tilde{W}, H) < f(W, H).$$

Conversely, if (2.8) is satisfied, it means that $q = 0$, or $q > 0$ and $W_{ij} = 0$. So from (2.6) it follows that $\alpha = 0$, and therefore, by (2.1),

$$\tilde{W}_{ij} = W_{ij}.$$

Thus, the lemma is proved.
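As a concrete illustration, the single-element update (2.1) with (2.4)–(2.6) can be sketched as follows (a minimal NumPy sketch; here WH is recomputed on every call, which Sec. 4 avoids by maintaining the residual matrix D = WH - V):

```python
import numpy as np

def update_w_element(V, W, H, i, j):
    """Update W[i, j] in place by formulas (2.1) and (2.4)-(2.6),
    keeping all other elements of W and all of H fixed."""
    R = W @ H - V                          # residual matrix WH - V
    p = np.sum(H[j, :] ** 2)               # (2.4)
    q = np.dot(R[i, :], H[j, :])           # (2.5)
    if q == 0:
        alpha = 0.0                        # already optimal along this coordinate
    elif q > 0:
        alpha = max(-q / p, -W[i, j])      # clip so that W[i, j] stays >= 0
    else:
        alpha = -q / p                     # unconstrained minimizer of g(alpha)
    W[i, j] += alpha                       # (2.1)
    return W
```

Each call performs an exact coordinate-wise line search, so the objective never increases, which mirrors Lemma 2.1.1.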
2.2. Updating an element of the matrix H

Let $\tilde{H}$ be obtained from the update rule

$$\tilde{H}_{ij} = H_{ij} + \beta, \tag{2.9}$$

where $\beta$ is defined by the formulas:

$$u = \sum_{a=1}^{n} W_{ai}^2, \tag{2.10}$$

$$v = \sum_{a=1}^{n} W_{ai}\,(WH - V)_{aj}, \tag{2.11}$$

$$\beta = \begin{cases} 0, & v = 0, \\ \max\left(-\dfrac{v}{u},\, -H_{ij}\right), & v > 0, \\ -\dfrac{v}{u}, & v < 0. \end{cases} \tag{2.12}$$

By the same arguments as in Lemma 2.1.1, we have the following lemma.
Lemma 2.2.1. If the KKT conditions are not satisfied at $H_{ij}$, then

$$f(W, \tilde{H}) < f(W, H);$$

otherwise, $\tilde{H} = H$.
3. The Proposed Algorithm

3.1. Updating the matrices W and H

In this subsection, we consider the transformation T from (W, H) to $(\tilde{W}, \tilde{H})$ defined as follows:

• Modify the elements of W as in Sec. 2.1.
• Modify the elements of H as in Sec. 2.2.

In other words, the transformation $(\tilde{W}, \tilde{H}) = T(W, H)$ is carried out as follows:

Step 1: Initialize
$$\tilde{W} = W, \quad \tilde{H} = H.$$

Step 2: Update the elements of $\tilde{W}$.
For j = 1, …, r and i = 1, …, n:
$$\tilde{W}_{ij} \leftarrow \tilde{W}_{ij} + \alpha,$$
where $\alpha$ is computed from (2.4)–(2.6).

Step 3: Update the elements of $\tilde{H}$.
For i = 1, …, r and j = 1, …, m:
$$\tilde{H}_{ij} \leftarrow \tilde{H}_{ij} + \beta,$$
where $\beta$ is computed from (2.10)–(2.12).

From Lemmas 2.1.1 and 2.2.1, we easily obtain the following important result.

Lemma 3.1.1. If the solution (W, H) does not satisfy the KKT conditions (1.4), then

$$f(\tilde{W}, \tilde{H}) = f(T(W, H)) < f(W, H).$$

Otherwise, $(\tilde{W}, \tilde{H}) = (W, H)$.
The following property is obtained directly from Lemma 3.1.1.

Corollary 3.1.1. For any (W, H) ≥ 0, if we set

$$(\tilde{W}, \tilde{H}) = T(W, H),$$

then $(\tilde{W}, \tilde{H}) = (W, H)$ or $f(\tilde{W}, \tilde{H}) < f(W, H)$.
3.2. Algorithm for NMF (1.3)

The algorithm is described through the transformation T as follows:

Step 1: Initialize $W = W^1 \ge 0$, $H = H^1 \ge 0$.
Step 2: For k = 1, 2, …:
$$(W^{k+1}, H^{k+1}) = T(W^k, H^k).$$

From Corollary 3.1.1, we obtain the following important property of the preceding algorithm.

Theorem 3.2.1. Suppose $(W^k, H^k)$ is the sequence of solutions created by Algorithm 3.2; then the sequence of objective function values $f(W^k, H^k)$ decreases monotonically:

$$f(W^{k+1}, H^{k+1}) < f(W^k, H^k), \quad \forall\, k \ge 1.$$

Moreover, the sequence $f(W^k, H^k)$ is bounded below by zero, so Theorem 3.2.1 implies the following corollary.

Corollary 3.2.1. The sequence $f(W^k, H^k)$ is convergent. In other words, there exists a non-negative value $f^*$ such that

$$\lim_{k \to \infty} f(W^k, H^k) = f^*.$$
Now, we consider another convergence property of Algorithm 3.2.

Theorem 3.2.2. Suppose $(W^*, H^*)$ is a limit point of the sequence $(W^k, H^k)$ and

$$\sum_{b=1}^{m} (H^*_{jb})^2 > 0, \quad j = 1, \ldots, r, \tag{3.1}$$

$$\sum_{a=1}^{n} (W^*_{ai})^2 > 0, \quad i = 1, \ldots, r; \tag{3.2}$$

then $(W^*, H^*)$ is a stationary point of the problem (1.3).

Proof. By assumption, $(W^*, H^*)$ is the limit of some subsequence $(W^{t_k}, H^{t_k})$ of the sequence $(W^k, H^k)$:

$$\lim_{k \to \infty} (W^{t_k}, H^{t_k}) = (W^*, H^*). \tag{3.3}$$


By conditions (3.1) and (3.2), the transformation T is continuous at $(W^*, H^*)$. Therefore, from (3.3) we get

$$\lim_{k \to \infty} T(W^{t_k}, H^{t_k}) = T(W^*, H^*).$$

Moreover, since $T(W^{t_k}, H^{t_k}) = (W^{t_k+1}, H^{t_k+1})$, we have

$$\lim_{k \to \infty} (W^{t_k+1}, H^{t_k+1}) = T(W^*, H^*). \tag{3.4}$$

Using the continuity of the objective function f(W, H), from (3.3) and (3.4) we have

$$\lim_{k \to \infty} f(W^{t_k}, H^{t_k}) = f(W^*, H^*),$$

$$\lim_{k \to \infty} f(W^{t_k+1}, H^{t_k+1}) = f(T(W^*, H^*)).$$

On the other hand, by Corollary 3.2.1, the sequence $f(W^k, H^k)$ is convergent, so it follows that

$$f(T(W^*, H^*)) = f(W^*, H^*).$$

Therefore, by Lemma 3.1.1, $(W^*, H^*)$ must be a stationary point of problem (1.3). Thus, the theorem is proved.

4. Some Variations of the Algorithm

In this section, we provide variations of the algorithm in Sec. 3.2 (Algorithm 3.2) that reduce the volume of calculation and make the algorithm more convenient to implement on a computer.

4.1. Evaluating the computational complexity

To update an element $W_{ij}$ by formulas (2.1), (2.4), (2.5) and (2.6), we need m multiplications for calculating p and $m \times (n \times m \times r)$ multiplications for calculating q. Similarly, to update an element $H_{ij}$ using (2.9), (2.10), (2.11) and (2.12), we need n multiplications for computing u and $(n \times m \times r) \times n$ multiplications for computing v. It follows that the number of multiplications needed to perform one loop (one transformation $(\tilde{W}, \tilde{H}) = T(W, H)$) of Algorithm 3.2 is

$$2 \times n \times m \times r \times (1 + n \times m \times r). \tag{4.1}$$

4.2. Some variations for updating W and H

4.2.1. Updating $W_{ij}$

If we set

$$D = WH - V, \tag{4.2}$$

then the formula (2.5) for q becomes

$$q = \sum_{b=1}^{m} D_{ib}\, H_{jb}. \tag{4.3}$$

If D is considered known, the calculation of q in (4.3) needs only m multiplications. After updating $W_{ij}$ by formula (2.1), we need to recalculate D from $\tilde{W}$ so that it can be used for adjusting the other elements of W:

$$\tilde{D} = \tilde{W}H - V.$$

From (2.1) and (4.2), it is seen that $\tilde{D}$ is determined from D by the formula

$$\tilde{D}_{ab} = \begin{cases} D_{ab}, & a \ne i,\; b = 1, \ldots, m, \\ D_{ib} + \alpha H_{jb}, & a = i,\; b = 1, \ldots, m. \end{cases} \tag{4.4}$$

Therefore, we only need to adjust the ith row of D, using m multiplications. From formulas (2.1), (2.4), (2.6), (4.3) and (4.4), we have a new scheme for updating the matrix W as follows.
4.2.2. Scheme for updating the matrix W

For j = 1 to r:
    $p = \sum_{b=1}^{m} H_{jb}^2$
    For i = 1 to n:
        $q = \sum_{b=1}^{m} D_{ib} H_{jb}$
        $\alpha = 0$ if $q = 0$; otherwise $\alpha = \max(-q/p,\, -W_{ij})$
        $W_{ij} \leftarrow W_{ij} + \alpha$
        $D_{ib} \leftarrow D_{ib} + \alpha H_{jb}$, b = 1, …, m
    End for i
End for j

The total number of multiplications used to adjust the matrix W is $2 \times n \times m \times r + m \times r$.



4.2.3. Updating $H_{ij}$

Similarly, the formula (2.11) for v becomes

$$v = \sum_{a=1}^{n} W_{ai}\, D_{aj}. \tag{4.5}$$

According to this formula, we use only n multiplications to calculate v. After adjusting $H_{ij}$ by formula (2.9), we need to recalculate the matrix D using the following formula:

$$\tilde{D}_{ab} = \begin{cases} D_{ab}, & b \ne j,\; a = 1, \ldots, n, \\ D_{aj} + \beta W_{ai}, & b = j,\; a = 1, \ldots, n. \end{cases} \tag{4.6}$$

Therefore, we only need to adjust the jth column of D, using n multiplications. From formulas (2.9), (2.10), (2.12), (4.5) and (4.6), we have a new scheme for updating the matrix H as follows.
4.2.4. Scheme for updating the matrix H

For i = 1 to r:
    $u = \sum_{a=1}^{n} W_{ai}^2$
    For j = 1 to m:
        $v = \sum_{a=1}^{n} W_{ai} D_{aj}$
        $\beta = 0$ if $v = 0$; otherwise $\beta = \max(-v/u,\, -H_{ij})$
        $H_{ij} \leftarrow H_{ij} + \beta$
        $D_{aj} \leftarrow D_{aj} + \beta W_{ai}$, a = 1, …, n
    End for j
End for i

The total number of multiplications used to adjust the matrix H is $2 \times n \times m \times r + n \times r$.
Using the preceding results together, we can construct a new calculation scheme for Algorithm 3.2, given in Sec. 4.3.

4.3. New calculation scheme for Algorithm 3.2

(i) Initialize $W = W^1 \ge 0$, $H = H^1 \ge 0$, and
$$D = WH - V.$$
(ii) For k = 1, 2, …:
(a) Update W by using the scheme of Sec. 4.2.2.
(b) Update H by using the scheme of Sec. 4.2.4.

The computational complexity of this scheme is as follows:

• The initialization step needs $n \times m \times r$ multiplications for computing D.
• Each loop needs $4 \times n \times m \times r + r \times (n + m)$ multiplications.

Compared with (4.1), the number of operations is now greatly reduced.
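Assuming the notation above, the scheme of Sec. 4.3 can be sketched in NumPy as follows (a sketch: the guards for a zero denominator are our addition, and since max(-q/p, -W[i, j]) equals -q/p whenever q < 0, the two non-zero branches of (2.6) and (2.12) collapse into one, exactly as in Secs. 4.2.2 and 4.2.4):

```python
import numpy as np

def new_nmf_iteration(V, W, H, D):
    """One loop of the proposed additive update algorithm (Sec. 4.3).
    D = W @ H - V is maintained incrementally, so each element update
    costs only O(m) (for W) or O(n) (for H) multiplications."""
    n, m = V.shape
    r = W.shape[1]
    # Scheme 4.2.2: update W column by column
    for j in range(r):
        p = np.sum(H[j, :] ** 2)               # (2.4)
        if p == 0:
            continue                           # row j of H is zero; skip
        for i in range(n):
            q = np.dot(D[i, :], H[j, :])       # (4.3)
            alpha = 0.0 if q == 0 else max(-q / p, -W[i, j])
            W[i, j] += alpha                   # (2.1)
            D[i, :] += alpha * H[j, :]         # (4.4): only row i changes
    # Scheme 4.2.4: update H row by row
    for i in range(r):
        u = np.sum(W[:, i] ** 2)               # (2.10)
        if u == 0:
            continue                           # column i of W is zero; skip
        for j in range(m):
            v = np.dot(W[:, i], D[:, j])       # (4.5)
            beta = 0.0 if v == 0 else max(-v / u, -H[i, j])
            H[i, j] += beta                    # (2.9)
            D[:, j] += beta * W[:, i]          # (4.6): only column j changes
    return W, H, D
```

In practice the two inner loops would be vectorized or compiled; the sketch keeps the element-wise structure to match the derivation.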
5. Experiments

In this section, we present the results of two experiments on the algorithms: New NMF (the proposed additive update algorithm), GZ and LS. The programs are written in MATLAB and run on a machine with the following configuration: Intel Pentium Core 2 P6100 2.0 GHz, 3 GB RAM. New NMF is implemented according to the scheme in Sec. 4.3.
5.1. Experiment 1

This experiment compares the speed of convergence of the algorithms to a stationary point. First of all, the KKT condition (1.4) is equivalent to the following condition:

$$\delta(W, H) = 0,$$

where

$$\delta(W, H) = \sum_{i=1}^{n}\sum_{a=1}^{r} \big|\min\big(W_{ia}, ((WH - V)H^T)_{ia}\big)\big| + \sum_{b=1}^{r}\sum_{j=1}^{m} \big|\min\big(H_{bj}, (W^T(WH - V))_{bj}\big)\big|.$$

Thus, the smaller $\delta(W, H)$ is, the closer (W, H) is to a stationary point of the problem (1.3). To get a quantity independent of the sizes of W and H, we use the following formula:

$$\Delta(W, H) = \frac{\delta(W, H)}{\sigma_W + \sigma_H},$$

in which $\sigma_W$ is the number of elements of the set

$$\big\{\,|\min(W_{ia}, ((WH - V)H^T)_{ia})| \ne 0,\ i = 1, \ldots, n,\ a = 1, \ldots, r\,\big\}$$

and $\sigma_H$ is the number of elements of the set

$$\big\{\,|\min(H_{bj}, (W^T(WH - V))_{bj})| \ne 0,\ j = 1, \ldots, m,\ b = 1, \ldots, r\,\big\}.$$

The quantity $\Delta(W, H)$ is called a normalized KKT residual. Table 1 presents the value $\Delta(W, H)$ of the solution (W, H) obtained by each algorithm within the given time periods on a dataset of size (n, m, r) = (200, 100, 10), in which V, $W^1$ and $H^1$ were generated randomly with $V_{ij} \in [0, 500]$, $(W^1)_{ij} \in [0, 5]$, $(H^1)_{ij} \in [0, 5]$.

Table 1. Normalized KKT residual values $\Delta(W, H)$.

Time (s)    New NMF    GZ           LS
60          3.6450     3700.4892    3576.0937
120         1.5523     3718.2967    3539.8986
180         0.1514     3708.6043    3534.6358
240         0.0260     3706.4059    3524.6715
300         0.0029     3696.7690    3508.3239

The results in Table 1 show that the GZ and LS algorithms do not converge to a stationary point (the value $\Delta(W, H)$ remains large), while the new NMF algorithm does converge to a stationary point, since the value $\Delta(W, H)$ becomes approximately equal to zero.
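The normalized KKT residual used in Table 1 is straightforward to compute; the following NumPy sketch (our own, following the definitions above) evaluates it:

```python
import numpy as np

def normalized_kkt_residual(V, W, H):
    """Normalized KKT residual Delta(W, H) of Sec. 5.1."""
    GW = (W @ H - V) @ H.T           # gradient of f with respect to W
    GH = W.T @ (W @ H - V)           # gradient of f with respect to H
    RW = np.abs(np.minimum(W, GW))   # per-element KKT violation in W
    RH = np.abs(np.minimum(H, GH))   # per-element KKT violation in H
    sigma = np.count_nonzero(RW) + np.count_nonzero(RH)
    if sigma == 0:
        return 0.0                   # (W, H) is an exact stationary point
    return (RW.sum() + RH.sum()) / sigma
```

At an exact stationary point every term $\min(W_{ia}, \cdot)$ and $\min(H_{bj}, \cdot)$ vanishes, so the residual is zero; otherwise the average is taken only over the violated elements.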
5.2. Experiment 2

This experiment compares the convergence speed of the algorithms toward the minimum value of the objective function f(W, H) within given time periods on a dataset of size (n, m, r) = (500, 100, 20), in which V, $W^1$ and $H^1$ were generated randomly with $V_{ij} \in [0, 500]$, $(W^1)_{ij} \in [0, 1]$, $(H^1)_{ij} \in [0, 1]$. The algorithms were run five times with five different pairs of $W^1$ and $H^1$ generated randomly in the interval [0, 1]. The values of the objective function, averaged over the five runs, are presented for each time period in Table 2.

The results in Table 2 show that the objective function values of the solutions generated by the GZ and LS algorithms remain quite large, while the objective function values of the New NMF algorithm are much smaller.

Table 2. Average values of the objective function.

Time (s)    New NMF    GZ         LS
60          57.054     359.128    285.011
120         21.896     319.674    273.564
180         18.116     299.812    267.631
240         17.220     290.789    264.632
300         16.684     284.866    262.865
360         16.458     281.511    261.914
