

Signal Processing 87 (2007) 2708–2717
www.elsevier.com/locate/sigpro

A novel efficient two-phase algorithm for training interpolation radial basis function networks

Hoang Xuan Huan (a), Dang Thi Thu Hien (a), Huu Tue Huynh (b,c,*)

(a) Faculty of Information Technology, College of Technology, Vietnam National University, Hanoi, Vietnam
(b) Faculty of Electronics and Telecommunications, College of Technology, Vietnam National University, Hanoi, Vietnam
(c) Department of Electrical and Computer Engineering, Laval University, Quebec, Canada
Received 18 October 2006; received in revised form 28 April 2007; accepted 8 May 2007
Available online 16 May 2007

Abstract
Interpolation radial basis function (RBF) networks have been widely used in various applications. The output layer weights are usually determined by minimizing the sum-of-squares error or by directly solving the interpolation equations. When the number of interpolation nodes is large, these methods are time consuming, make it difficult to control the balance between the convergence rate and the generality, and make it hard to reach high accuracy. In this paper, we propose a two-phase algorithm for training interpolation RBF networks with bell-shaped basis functions. In the first phase, the width parameters of the basis functions are determined by taking into account the tradeoff between the error and the convergence rate. Then, the output layer weights are determined by finding the fixed point of a given contraction transformation. The running time of this new algorithm is relatively short, and the balance between the convergence rate and the generality is easily controlled by adjusting the involved parameters, while the error can be made as small as desired. Also, its running time can be further reduced because the proposed algorithm can be parallelized. Finally, its efficiency is illustrated by simulations.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Radial basis functions; Width parameters; Output weights; Contraction transformation; Fixed point

This work has been financially supported by the College of Technology, Vietnam National University, Hanoi. Some preliminary results of this work were presented at the Vietnamese National Workshop on Some Selected Topics of Information Technology, Hai Phong, 25–27 August 2005.

*Corresponding author. Faculty of Electronics and Telecommunications, College of Technology, Vietnam National University, 144 Xuanthuy, Caugiay, Hanoi, Vietnam. Tel.: +84 4 754 9271; fax: +84 4 754 9338.

E-mail addresses: (H.X. Huan), (D.T.T. Hien), (H.T. Huynh).

1. Introduction

Radial basis function (RBF) networks, first
proposed by Powell [1] and introduced into neural
network literature by Broomhead and Lowe [2],
have been widely used in pattern recognition,
equalization, clustering, etc. (see [3–6]). In a multivariate interpolation network of a function f, the interpolation function is of the form

\varphi(x) = \sum_{k=1}^{M} w_k\, h\big(\lVert x - v^k \rVert, \sigma_k\big) + w_0

such that φ(x^k) = y^k for all k = 1, …, N, where {x^k}_{k=1}^{N} is a set of n-dimensional vectors (called interpolation nodes) and y^k = f(x^k) is the measured value of the function f at the node x^k. The real functions h(‖x − v^k‖, σ_k) are called RBFs with centers v^k; M (M ≤ N) is the number of RBFs used to approximate f, and w_k and σ_k are unknown parameters to be determined. Properties of RBFs were studied in [7–9]. The most common kind of RBF is the Gaussian function h(u, σ) = e^{−u²/σ²}. In interpolation RBF networks, the centers are the interpolation nodes; in this case, M = N and v^k = x^k for all k. In network training algorithms, the parameters w_k and σ_k are often determined by minimizing the sum-of-squares error or by directly solving the interpolation equations (see [4,6]). An advantage of
interpolation RBF functions, proved by Bianchini
et al. [10], is that their sum of squared error has no
local minima, so that any optimization procedure
always gives a unique solution. The most common
training algorithm is the gradient descent method. Although the training time of an RBF network is shorter than that of a multilayer perceptron (MLP), it is still rather long, and the efficiency of any such optimization algorithm depends on the choice of initial values [ref]. On the other hand, it is difficult to obtain small errors, and it is not easy to control the balance between the convergence rate and the generality, which depends on the radial parameters. Consequently, interpolation networks are only used when the number of interpolation nodes is not too large; Looney [5] suggests using such networks only when the number of interpolation nodes is less than 200.
Let us consider an interpolation problem in the space R^4, with 10 points along each dimension. The total number of nodes is 10^4; even with this relatively high figure, the interpolation problem is still very sparse in nature. With known methods, it is impossible to handle this situation. In this paper, we propose a highly efficient two-phase algorithm for training interpolation networks. In the first phase, the radial parameters σ_k are determined by balancing the convergence rate against the generality. In the second phase, the output weights w_k are determined by calculating the fixed point of a given contraction transformation. This algorithm converges quickly and can be parallelized in order to reduce its running time. Furthermore, this method gives high accuracy. Preliminary results show that the algorithm works well even when the number of interpolation nodes is relatively large, up to 5000 nodes.


This paper is organized as follows. In Section 2,
RBF networks and usual training methods are
briefly introduced. Section 3 is dedicated to the new
training algorithm. Simulation results are presented
in Section 4. Finally, important features of the
algorithm are discussed in the conclusion.
2. Interpolation problems and RBF networks: an
overview
In this section, the interpolation problem is stated
first, then Gaussian RBFs and interpolation RBF
networks are briefly introduced.
2.1. Multivariate interpolation problem and radial
basis functions
2.1.1. Multivariate interpolation problem
Consider a multivariate function f: D (⊂ R^n) → R^m and a sample set {x^k, y^k}_{k=1}^{N} (x^k ∈ R^n, y^k ∈ R^m) such that f(x^k) = y^k for k = 1, …, N. Let φ be a function of a known form satisfying the interpolation conditions:

\varphi(x^i) = y^i, \quad \forall i = 1, \ldots, N. \qquad (1)

Eq. (1) helps determine the unknown parameters in φ.
The points x^k are called interpolation nodes, and the function φ is called the interpolation function of f and is used to approximate f on the domain D. In 1987, Powell proposed to use RBFs as the interpolation function φ. This technique, using Gaussian RBFs, is described in the following; for further details, see [4–6].
2.1.2. Radial basis function technique
Without loss of generality, it is assumed that m is equal to 1. The interpolation function φ has the following form:

\varphi(x) = \sum_{k=1}^{N} w_k \varphi_k(x) + w_0, \qquad (2)

where

\varphi_k(x) = e^{-\lVert x - v^k \rVert^2 / \sigma_k^2} \qquad (3)

is the kth RBF, corresponding to the function h(‖x − v^k‖, σ_k) of Section 1; ‖u‖ = (Σ_{i=1}^{n} u_i²)^{1/2} is the Euclidean norm of u; the interpolation node v^k is the center vector of φ_k; and σ_k and w_k are unknown parameters to be determined. For each k, the parameter σ_k, also called the width of φ_k, is used



to control the domain of influence of the RBF φ_k: if ‖x − v^k‖ > 3σ_k, then φ_k(x) is almost negligible. In the approximation problem, the number of RBFs is much smaller than N and the center vectors are chosen by any convenient method.
By inserting the interpolation conditions of (1) into (2), a system of equations is obtained in order to determine the sets {σ_k} and {w_k}:

\varphi(x^i) = \sum_{k=1}^{N} w_k \varphi_k(x^i) + w_0 = y^i, \quad \forall i = 1, \ldots, N. \qquad (4)

Taking (3) into account, i.e. v^k = x^k, this gives

\sum_{k=1}^{N} w_k\, e^{-\lVert x^i - x^k \rVert^2 / \sigma_k^2} = y^i - w_0 = z_i, \quad \forall i = 1, \ldots, N. \qquad (5)
If the parameters σ_k are selected, then we consider the N × N matrix Φ:

\Phi = (\varphi_{k,i})_{N \times N} \qquad (6)

with

\varphi_{k,i} = \varphi_k(x^i) = e^{-\lVert x^i - x^k \rVert^2 / \sigma_k^2}.

Michelli [11] has proved that if the nodes x^k are pairwise different, then Φ is positive-definite and hence invertible. Therefore, for any w_0, there always exists a unique solution w_1, …, w_N of (5). The above technique can then be used to design interpolation RBF neural networks (hereinafter called interpolation RBF networks).
2.2. Interpolation RBF networks
An interpolation RBF network which interpolates an n-variable real function f: D (⊂ R^n) → R^m is a three-layer feedforward network. It is composed of n input nodes, represented by the input vector x ∈ R^n; N hidden neurons, the kth of which outputs the value of the radial function φ_k; and m output neurons, which produce the interpolated values of f.

The hidden layer is also called the RBF layer. Like other two-phase algorithms, one advantage of this new algorithm is that the m neurons of the output layer can be trained separately. There are many different ways to train an RBF network. Schwenker et al. [6] categorize these training methods into one-, two-, and three-phase learning schemes. In one-phase training, the widths σ_k of the RBF layer are set to a predefined real number σ, and only the output layer weights w_k are adjusted. The most common learning scheme is two-phase training, where the two layers of the RBF network are trained separately: the width parameters of the RBF layer are determined first, and the output layer weights are then trained by a supervised learning rule. Three-phase training is only used for approximation RBF networks; after the initialization of the RBF network using two-phase training, the whole architecture is adjusted through a further optimization procedure. The output layer may be determined directly by solving (4). However, when the number of interpolation nodes reaches hundreds, these methods become unstable. Usually, in a training algorithm, the output weights are determined by minimizing the sum-of-squares error, which is defined as

E = \sum_{k=1}^{N} \big( \varphi(x^k) - y^k \big)^2. \qquad (7)
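For comparison, the following is a minimal sketch (our own illustration, not the paper's implementation) of this conventional alternative: adjusting the output weights by gradient descent on E, with Phi the matrix of Eq. (6) and lr an assumed learning rate:

import numpy as np

def train_output_weights_gd(Phi, y, w0, lr=1e-3, n_iter=10000):
    # Minimize E = sum_k (phi(x^k) - y^k)^2 over the output weights by gradient descent
    w = np.zeros(Phi.shape[0])
    for _ in range(n_iter):
        residual = Phi @ w + w0 - y          # phi(x^k) - y^k at every node
        w -= lr * (2.0 * Phi.T @ residual)   # gradient of E with respect to w
    return w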

Since the function E does not have local minima, optimization procedures always give a good solution for (4). In practice, the training time of an RBF network is much shorter than that of an MLP. However, known multivariate minimization methods still take rather long running times, and it is difficult to reach a very small error or to parallelize the algorithm structure.
Moreover, the width parameters of the RBFs also affect the network quality and the training time [5]. Preferably, these parameters should be chosen large when the number of nodes is small, and small when the number of nodes is large. Therefore, they can be used to control the balance between the convergence rate and the generality of the network.
In the following, a new two-phase training
algorithm is proposed. Briefly, in the first phase,
the width parameters sk of the network are
determined by balancing between its approximation
generality and its convergence rate, and in the
second phase, the output weights wk are iteratively
adjusted by finding the corresponding fixed point of
a given contraction transformation.
3. Iterative training algorithm
The main idea of the new training algorithm, stated in the following basic theorem, is based on a contraction mapping related to the matrix Φ.



3.1. Basic theorem

Let I be the identity matrix, and let W = [w_1 … w_N]^T and Z = [z_1 … z_N]^T be, respectively, the output weight vector and the right-hand side of (5). By setting

C = I - \Phi = [c_{k,j}]_{N \times N}, \qquad (8)

we have

c_{k,j} = \begin{cases} 0 & \text{if } k = j, \\ -e^{-\lVert x^j - x^k \rVert^2 / \sigma_k^2} & \text{if } k \neq j. \end{cases} \qquad (9)

Then, (4) can be expressed as follows:

W = CW + Z. \qquad (10)

As mentioned in Section 2.1, if the width parameters σ_k and w_0 are determined, then (10) always has a unique solution W. First, we set w_0 to the average of all the y^k:

w_0 = \frac{1}{N} \sum_{k=1}^{N} y^k. \qquad (11)

Fig. 1. Network training procedure.

Now, for each k ≤ N, we define the following function q_k of the argument σ_k:

q_k = \sum_{j=1}^{N} \lvert c_{k,j} \rvert. \qquad (12)

Theorem 1. The function q_k(σ_k) is increasing. Also, for every positive number q < 1, there exists a σ_k such that q_k is equal to q.

Proof. From (9) and (12), we can easily verify that q_k is an increasing function of σ_k. Moreover, we have

\lim_{\sigma_k \to \infty} q_k = N - 1 \quad \text{and} \quad \lim_{\sigma_k \to 0} q_k = 0. \qquad (13)

Because the function q_k is continuous, for every q ∈ (0, 1) there exists a σ_k such that q_k(σ_k) = q. The theorem is proved.
This theorem shows that for each given positive value q < 1, we can find a set of values {σ_k}_{k=1}^{N} such that the solution W* of (10) is the fixed point of the contraction transformation CW + Z corresponding to the contraction coefficient q.
3.2. Algorithm description

Given an error ε, a positive number q < 1 and a given 0 < α < 1, the objective of our algorithm is to determine the parameters σ_k and W*. In the first phase, the σ_k are determined such that q_k ≤ q, and such that if σ_k were replaced by σ_k/α then q_k > q. Therefore, the norm ‖C‖_* = max_{‖u‖_* ≤ 1} ‖Cu‖_* of the matrix C induced by the vector norm ‖·‖_* defined in Eq. (14) does not exceed q. In the second phase, the solution W* of Eq. (10) is iteratively adjusted by finding the fixed point of the contraction transformation CW + Z. The algorithm is specified in Fig. 1 and described in detail thereafter.

3.2.1. Phase 1. Determining width parameters
The first phase of the algorithm determines the width parameters σ_k such that q_k ≤ q and q_k is as close as possible to q; i.e., if we replaced σ_k by σ_k/α, then q_k > q. Given a positive number α < 1 and an initial width σ_0, which might be chosen equal to 1/(√2 (2N)^{1/n}) as suggested in [5], the algorithm performs the following iterative procedure.
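Since Fig. 2 is not reproduced here, the following Python sketch gives one possible reading of this first phase (our own illustration under the stated rules q_k(σ_k) ≤ q and q_k(σ_k/α) > q; names are illustrative):

import numpy as np

def q_of_sigma(X, k, sigma_k):
    # q_k of Eq. (12): sum over j != k of exp(-||x^j - x^k||^2 / sigma_k^2)
    d2 = ((X - X[k]) ** 2).sum(axis=1)
    return np.exp(-d2 / sigma_k ** 2).sum() - 1.0  # subtract the j = k term

def phase1_widths(X, q=0.8, alpha=0.9):
    N, n = X.shape
    sigma0 = 1.0 / (np.sqrt(2.0) * (2.0 * N) ** (1.0 / n))  # initial width suggested in [5]
    sigma = np.empty(N)
    for k in range(N):
        s = sigma0
        if q_of_sigma(X, k, s) > q:
            while q_of_sigma(X, k, s) > q:            # too wide: shrink by alpha until q_k <= q
                s *= alpha
        else:
            while q_of_sigma(X, k, s / alpha) <= q:   # too narrow: grow while q_k stays <= q
                s /= alpha
        sigma[k] = s
    return sigma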
3.2.2. Phase 2. Determining output weights
To determine the solution W* of Eq. (10), the following iterative procedure is executed: starting from an initial approximation, successive approximations W^0, W^1, … are computed by applying the transformation CW + Z until an end condition is met. For each N-dimensional vector u, we denote by ‖u‖_* the norm

\lVert u \rVert_* = \sum_{j=1}^{N} \lvert u_j \rvert. \qquad (14)

The end condition of the algorithm can be chosen as one of the following:

(a) \frac{q}{1-q} \lVert W^1 - W^0 \rVert_* \le \varepsilon, \qquad (15)

(b) t \ge \frac{\ln\big(\varepsilon(1-q)/\lVert Z \rVert_*\big)}{\ln q} = \frac{\ln \varepsilon - \ln \lVert Z \rVert_* + \ln(1-q)}{\ln q}, \qquad (16)

where t is the number of iterations.
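A minimal sketch of this second phase (our own illustration; the paper's exact procedure is specified in Fig. 3, which is not reproduced here), iterating W ← CW + Z and stopping with condition (a):

import numpy as np

def phase2_weights(Phi, y, q, eps=1e-6):
    # Fixed-point iteration for Eq. (10) with C = I - Phi and Z = y - w0
    w0 = y.mean()                      # Eq. (11)
    Z = y - w0
    C = np.eye(len(y)) - Phi
    W_prev = np.zeros_like(Z)
    W = C @ W_prev + Z                 # first step gives W^1 = Z
    while q / (1.0 - q) * np.abs(W - W_prev).sum() > eps:   # end condition (a), Eq. (15)
        W_prev, W = W, C @ W + Z
    return W, w0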



These end conditions are suggested by the following convergence property.

3.3. Convergence property

The following theorem ensures the convergence of the algorithm and allows us to estimate its error.

Theorem 2. The algorithm always ends after a finite number of iterations, and the final error is bounded by

\lVert W^1 - W^* \rVert_* \le \varepsilon. \qquad (17)
Proof. First, from the conclusion of Theorem 1, it can be seen that the first phase of the algorithm always ends after a finite number of steps and that q_k ≤ q for every k. On the other hand, the norm ‖C‖_* of the matrix C induced by the vector norm ‖·‖_* in Eq. (14) is determined by the following equation (see [12, Theorem 2, Subsection 9.6, Chapter I]):

\lVert C \rVert_* = \max_{k \le N} \{ q_k \} \le q. \qquad (18)

Therefore, phase 2 corresponds to the procedure of finding the fixed point of the contraction transformation Cu + Z with contraction coefficient q, with respect to the initial approximations u^0 = 0 and u^1 = Z. It follows that if we perform t iterative steps in phase 2, then W^1 corresponds to the (t+1)th approximate solution u^{t+1} of the fixed point W* of the contraction transformation. Using Theorem 1 in Subsection 12.2 of [4], the training error can be bounded by

\lVert W^1 - W^* \rVert_* \le \frac{q^{t+1}}{1-q} \lVert u^1 - u^0 \rVert_* = \frac{q^{t+1}}{1-q} \lVert Z \rVert_*. \qquad (19)

It is easy to verify that expression (16) is equivalent to q^{t+1}/(1−q) ‖Z‖_* ≤ ε; hence the statement holds if end condition (b) is used. On the other hand, applying Eq. (19) at t = 0, with W^0 = u^0 and u^1 = W^1, gives

\lVert W^1 - W^* \rVert_* \le \frac{q}{1-q} \lVert W^1 - W^0 \rVert_*. \qquad (20)

Combining (15) and (20) gives (17); hence the statement holds if end condition (a) is used. The theorem is proved.

3.4. Complexity of the algorithm

In this section, the complexity of each phase of the algorithm is analyzed.

Phase 1: Besides n and N, the complexity of the first phase depends on the distribution of the interpolation nodes {x^k}_{k=1}^{N} and does not depend on the function f. Depending on the initial choice of σ_0, either p > q (corresponding to step 3 of Fig. 2) or p < q (corresponding to step 4 of Fig. 2). In the former case, for every k ≤ N, let m_k be the number of iterations in step 3 such that q_k > q with σ_k = α^{m_k−1} σ_0 but q_k ≤ q with σ_k = α^{m_k} σ_0. Therefore,

m_k \le \log_\alpha \frac{\sigma_{\min}}{\sigma_0}, \quad \text{where } \sigma_{\min} = \min\{\sigma_k\} \ (\le \sigma_0). \qquad (21)

In the same manner, if m_k is the number of iterations in step 4, then

m_k \le \log_\alpha \frac{\sigma_{\max}}{\sigma_0}, \quad \text{where } \sigma_{\max} = \max\{\sigma_k\} \ (\ge \sigma_0). \qquad (22)

Let

c = \max\left\{ \left\lvert \log_\alpha \frac{\sigma_{\min}}{\sigma_0} \right\rvert, \left\lvert \log_\alpha \frac{\sigma_{\max}}{\sigma_0} \right\rvert \right\}; \qquad (23)

then the complexity of phase 1 is O(cnN²) (Fig. 3).

Phase 2: The number T of iterations in phase 2 depends on the norm ‖Z‖_* of the vector Z and on the value q. It follows from (16) and the proof of Theorem 2 that T can be estimated by

T = \frac{\ln\big(\varepsilon(1-q)/\lVert Z \rVert_*\big)}{\ln q} = \log_q \frac{\varepsilon(1-q)}{\lVert Z \rVert_*}. \qquad (24)

Therefore, the complexity of phase 2 is O(TnN²). Hence, the total complexity of this new algorithm is O((T+c)nN²).
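As a rough illustration of Eq. (24), with assumed and purely illustrative values not taken from the paper: for q = 0.8, ε = 10^{−6} and ‖Z‖_* = 10^4, one gets T = log_{0.8}(0.2 × 10^{−6}/10^4) = ln(2 × 10^{−11})/ln 0.8 ≈ 110 iterations, independently of the number of nodes N.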
4. Simulation study
Simulations for a 3-input RBF network are
performed in order to test the training time and
the generality of the algorithm. Its efficiency is also
compared to that of the gradient algorithm. The
network generality is tested by the following
procedure: first, choose some points that do not
belong to the set of interpolation nodes, then after
the network has been trained, the network outputs

are compared to the true values of the function at
these points in order to estimate the error.
Because all norms in a finite-dimensional space are equivalent (see [12, theorem in Section 9.2]), instead of the norm ‖u‖_* determined by (14), the maximum norm ‖u‖_∞ = max_{j ≤ N} |u_j| is used for the end condition (15). Since ‖u‖_* ≤ N ‖u‖_∞, this change does not affect the convergence property of the algorithm.



Fig. 2. Specification of the first phase of the algorithm.

Fig. 3. Specification of the second phase of the algorithm.

4.1. Test of training time

The training time, which reflects the convergence rate, is examined for several numbers of nodes and for different values of the parameters q, α and ε. The data in the following example are obtained by approximately scaling each dimension, combining the coordinates over all dimensions, and choosing the data from among these points. The simulations are run on a computer with the following configuration: Intel Pentium 4 processor, 3.0 GHz, 256 MB DDR RAM. The test results and comments for the function y = x1²x2 + sin(x2 + x3 + 1) + 4 are presented below.

4.1.1. Testing results
Simulations are done for the function y = x1²x2 + sin(x2 + x3 + 1) + 4 with x1 ∈ [0,3], x2 ∈ [0,4], x3 ∈ [0,5]. The stopping rule uses ε = 10^−6. The parameters q and α are set in turn to 0.5 and 0.5; 0.8 and 0.5; 0.8 and 0.7; 0.8 and 0.8, respectively, with the number of nodes varying from 100 to 5000. The simulation results are presented in Table 1. Table 2 shows results for q = α = 0.7 with 2500 nodes and different values of the stopping error ε.



Table 1
Training time (in seconds) for the stopping rule defined by ε = 10^−6

Number of nodes    q = 0.5, α = 0.5    q = 0.8, α = 0.5    q = 0.8, α = 0.7    q = 0.8, α = 0.8
100                1                   1                   1                   1
400                2                   3                   4                   4
1225               30                  35                  37                  45
2500               167                 170                 173                 174
3600               378                 390                 530                 597
4900               602                 886                 1000                1125

Table 2
Training time (in seconds) for 2500 nodes, q = α = 0.7, and different stopping errors ε

Number of nodes    ε = 10^−9    ε = 10^−8    ε = 10^−6    ε = 10^−5
2500               177          175          172          170

Comments: From these results, it is observed that:
(1) The training time of our algorithm is relatively short (only about several minutes for about 3000 nodes). It increases when q or α increases; that is, the smaller q or α, the shorter the training time. However, the training time is more sensitive to α than to q.
(2) When the stopping error is reduced, the total training time changes only very slightly. This means that a high accuracy requirement for a given application does not strongly affect the training time.
4.2. Test of generality when q or α is changed sequentially

To avoid unnecessarily long running times, in this part we limit the number of nodes to 400. These nodes, scattered in the domain {x1 ∈ [0,3], x2 ∈ [0,4], x3 ∈ [0,5]}, are generated as described above, and the network is trained for different values of q and α with the stopping error ε = 10^−6. After the training is completed, the errors at 8 randomly chosen points that do not belong to the trained nodes are checked. Test results for the cases where q or α is changed sequentially are presented in Tables 3 and 4.
4.2.1. Test with q = 0.8 and α changed sequentially
Testing results: Experiment results for ε = 10^−6, q = 0.8 and α set in turn to 0.9, 0.8, 0.6 and 0.4 are presented in Table 3.
Comment: From these results, it can be observed that as α increases the checked errors decrease quickly. This implies that when α is small, the width parameters σ_k are also small, which affects the generality of the network. In our experience, it is convenient to set α ∈ [0.7, 0.9]; the concrete choice depends on the desired balance between training time and generality of the network.
4.2.2. Test with α = 0.9 and q changed sequentially
Testing results: The results for ε = 10^−6, α = 0.9 and q set in turn to 0.9, 0.7, 0.5 and 0.3 are presented in Table 4.
Comment: These results show that the generality of the network increases strongly when q increases, although the change of q only weakly influences the training time, as mentioned in Section 4.1.
4.3. Comparison with the gradient algorithm

We have performed simulations for the function y = x1²x2 + sin(x2 + x3 + 1) + 4 with 100 interpolation nodes and x1 ∈ [0,3], x2 ∈ [0,4], x3 ∈ [0,5]. For the gradient algorithm family, it is very difficult to reach a high training accuracy, and it is also difficult to control the generality of the networks. Besides the training time, the accuracy at trained nodes and the error at untrained nodes (generality) obtained by the gradient method and by our algorithm are now compared. The gradient algorithm is implemented in Matlab 6.5.

4.3.1. Test of accuracy at trained nodes
We randomly choose 8 nodes among the 100 interpolation nodes. After training the network by our algorithm with ε = 10^−6, q = 0.8, α = 0.9 (training time: 1 s) and by the gradient algorithm in two cases, with 100 iterations (training time: 1 s) and with 10,000 iterations (training time: 180 s), we check the errors at the chosen nodes to compare the accuracy of the algorithms.


Table 3
Checking error at nodes with q = 0.8, ε = 10^−6 and α set in turn to 0.9, 0.8, 0.6 (errors in units of 10^−4)

Checked point (X1, X2, X3)         Original value    α = 0.9 (5 s): value / error    α = 0.8 (4 s): value / error    α = 0.6 (4 s): value / error
(2.68412, 2.94652, 3.329423)       26.065739         26.0679 / 21.6                  26.06879 / 30.502               26.0691 / 33.802
(2.21042, 1.052145, 0.040721)      10.007523         10.0024 / 51.24                 10.0144 / 68.763                10.0146 / 71.163
(2.842314, 2.525423, 0.048435)     23.983329         24.01001 / 266.81               24.0201 / 367.706               24.0251 / 417.90
(2.842315, 3.789123, 3.283235)     35.587645         35.5818 / 58.45                 35.5799 / 77.452                35.5963 / 86.548
(2.05235, 3.78235, 1.63321)        20.063778         20.05203 / 117.48               20.0803 / 165.219               20.0812 / 174.21
(2.84202, 3.789241, 3.283023)      35.582265         35.5986 / 163.34                35.5621 / 201.655               35.561 / 212.65
(2.051234, 3.15775, 0.59763)       16.287349         16.28183 / 55.16                16.294 / 66.505                 16.295 / 78.505
(2.52621, 3.36832, 0.86412)        24.627938         24.67451 / 465.72               24.58628 / 416.584              24.5798 / 481.38
Average error                                        149.97                          174.298                         194.522

Table 4
Checking error at nodes with α = 0.9, ε = 10^−6 and q set in turn to 0.9, 0.7, 0.5 (errors in units of 10^−4)

Checked point (X1, X2, X3)         Original value    q = 0.9: value / error    q = 0.7: value / error    q = 0.5: value / error
(2.68412, 2.94652, 3.32942)        26.06573          26.0655 / 2.22            26.0654 / 3.12            26.0693 / 35.46
(2.21042, 1.052145, 0.04072)       10.00752          10.0217 / 141.79          10.0196 / 120.33          10.0224 / 149.06
(2.842314, 2.525423, 0.04843)      23.98332          24.0112 / 279.17          24.0204 / 370.87          24.0221 / 387.53
(2.842315, 3.789123, 3.28323)      35.58764          35.5818 / 58.03           35.5819 / 57.27           35.5818 / 58.08
(2.05235, 3.78235, 1.63321)        20.06377          20.1105 / 467.62          20.1159 / 520.95          20.1135 / 497.7
(2.84202, 3.7892411, 3.28302)      35.58226          35.5881 / 58.26           35.5884 / 61.45           35.5886 / 63.11
(2.051234, 3.15775, 0.59763)       16.28734          16.2853 / 20.73           16.2852 / 21.13           16.2775 / 98.93
(2.52621, 3.36832, 0.86412)        24.62793          24.6117 / 162.8           24.6133 / 146.16          24.6108 / 171.74
Average error                                        148.83                    162.67                    182.71

Testing results: The experiment results are presented in Table 5.
Comment: It can be observed that our algorithm is much better than the gradient algorithm in both training time and accuracy. This seems natural, because the gradient algorithm relies on an optimization procedure, and it is known to be difficult to obtain high accuracy with any optimization procedure.

4.3.2. Comparing the generality
We randomly choose 8 untrained nodes. After training the network with the two algorithms, using the same parameters as in Section 4.3.1, we check the errors at the chosen nodes to compare the generality of the algorithms.

Testing results: The experiment results are presented in Table 6.
Comments: From these results, it is important to observe that in MLP networks it is well known that when the training error is small, the overfitting phenomenon might happen [13]. For RBF networks, however, the RBFs have only local influence, so that when the data are not noisy, overfitting is not a serious problem. In fact, the simulation results show that this new algorithm offers a very short training time with a very small test error, compared to the gradient algorithm.


Table 5
Checking error at 8 trained nodes to compare accuracy (errors in units of 10^−4)

Checked node (X1, X2, X3)          Original value    Gradient, 100 iterations (1 s): value / error    Gradient, 10,000 iterations (180 s): value / error    New algorithm, ε = 10^−6, q = 0.8, α = 0.9 (1 s): value / error
(1.666667, 0.000000, 0.000000)     4.841471          4.4645 / 3769.7                                  5.0959 / 2544.2                                       4.84146 / 0.1
(0.333333, 0.444444, 1.379573)     4.361647          3.5933 / 7683.4                                  3.6708 / 6908.4                                       4.36166 / 0.09
(2.666667, 0.444444, 1.536421)     7.320530          8.7058 / 13852.7                                 7.2647 / 558.2                                        7.32052 / 0.08
(0.666667, 1.333333, 0.128552)     5.221158          4.0646 / 11565.5                                 4.9517 / 2694.5                                       5.22117 / 0.1
(2.666667, 1.333333, 1.589585)     12.77726          12.5041 / 2731.6                                 12.1965 / 5807.6                                      12.7772 / 0.7
(1.666667, 1.777778, 0.088890)     9.209746          6.6682 / 25415.4                                 9.2944 / 846.5                                        9.20972 / 0.2
(2.333333, 0.444444, 0.039225)     7.415960          6.7228 / 6931.5                                  7.48 / 640.4                                          7.41596 / 0.005
(2.666667, 3.555556, 0.852303)     28.51619          28.0927 / 4234.9                                 29.2798 / 7636.0                                      28.5162 / 0.09
Average error                                        9523.1                                           3454.5                                                0.19

Table 6
Checking error at 8 untrained nodes to compare the generality (errors in units of 10^−4)

Checked node (X1, X2, X3)          Original value    Gradient, 100 iterations (1 s): value / error    Gradient, 10,000 iterations (180 s): value / error    New algorithm, ε = 10^−6, q = 0.8, α = 0.9 (1 s): value / error
(0.32163, 0.45123, 1.38123)        4.350910          2.1394 / 22115.1                                 3.9309 / 4200.1                                       4.32214 / 287.7
(0.67123, 0.8912, 1.4512)          4.202069          2.8529 / 13491.6                                 4.7884 / 5863.3                                       4.20115 / 9.1
(1.68125, 1.34121, 0.27423)        8.293276          6.1078 / 21854.7                                 8.3869 / 936.2                                        8.30878 / 155.0
(0.34312, 1.78123, 2.56984)        3.406823          3.2115 / 1953.2                                  4.1438 / 7369.7                                       3.399 / 78.2
(2.65989, 3.56012, 0.8498)         28.42147          27.5174 / 9040.7                                 29.1648 / 7433.2                                      28.429 / 75.2
(1.67013, 2.23123, 0.29423)        9.84913           8.6415 / 12076.3                                 9.5863 / 2628.3                                       9.79204 / 570.9
(2.65914, 3.56123, 0.85612)        28.41991          27.5147 / 9052.1                                 29.1634 / 7434.8                                      28.419 / 9.1
(1.3163, 0.44925, 1.12987)         5.311670          3.5188 / 17928.7                                 5.3729 / 612.2                                        5.28737 / 243.0
Average error                                        13439.0                                          4559.7                                                178.56

5. Discussion and conclusion

This paper proposes a simple two-phase algorithm to train interpolation RBF networks. The first phase iteratively determines the width of the Gaussian RBF associated with each node, and each RBF is trained separately from the others. The second phase iteratively computes the output layer weights by using a given contraction mapping. It is shown in this paper that the algorithm always converges, and that the running time depends only on the initial values of q, α and ε, on the distribution of the interpolation nodes, and on the vector norm of the interpolated function computed at these nodes.
Owing to the numerical advantages of contraction transformations, it is easy to obtain very small training errors, and it is also easy to control the balance between the convergence rate and the generality of the network by setting appropriate parameter values. One of the most important features of this algorithm is that the output layer weights can be trained independently, so that the whole algorithm can be parallelized. Furthermore, for a large

network, the stopping rule based on the norm of N-dimensional vectors can be replaced by the much simpler one defined in Eq. (16) to avoid lengthy computations.
When the number of nodes is very large, a clustering approach can be used to regroup the data into several sets of smaller size. By doing so, the training can be done in parallel for each cluster, which helps to reduce the training time. The obtained networks are called local RBF networks. This approach might be considered as equivalent to the spline method, and it will be presented in a forthcoming paper.
In the case of a very large number N of nodes, and from the point of view of a neural network as an associative memory, another approach can be exploited. An approximate RBF network can be designed with a number of hidden nodes much smaller than N, based on the following scheme. First, the data set is partitioned into K clusters {C_i}_{i=1}^{K} by using any clustering algorithm (for example, the k-means method). Then the center v^i of the RBF associated with the ith hidden neuron can be chosen to be the mean vector d_i of C_i or the vector in C_i nearest to d_i. The network is trained by the algorithm with the set of new interpolation nodes {v^i}_{i=1}^{K}. Other choices based on variations of this approach can be made, depending on the context of the desired applications.
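A minimal sketch of this cluster-based center selection (our own illustration, assuming scikit-learn's KMeans is available; it follows the two options mentioned above for choosing each center):

import numpy as np
from sklearn.cluster import KMeans

def cluster_centers(X, K, use_nearest_point=True):
    # Partition the N nodes into K clusters and return one RBF center per cluster:
    # either the cluster mean or the data point of the cluster nearest to that mean.
    km = KMeans(n_clusters=K, n_init=10).fit(X)
    centers = km.cluster_centers_
    if use_nearest_point:
        picked = []
        for i in range(K):
            members = X[km.labels_ == i]
            d2 = ((members - centers[i]) ** 2).sum(axis=1)
            picked.append(members[d2.argmin()])
        centers = np.array(picked)
    return centers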
An advantage of RBF networks is their local influence property (see [13,14]), so the width parameters are generally chosen small (see Section 6.1, pp. 288–289 of [15]); in particular, in [5, Section 7.7, p. 262] it is suggested to choose σ = 0.05 or 0.1. In Section 3.11, p. 99 of [5], it is also suggested to use σ = 1/(2N)^{1/n}, which is very small. Therefore, q_k must be small. In fact, the condition q_k < 1 presents some limitations, but it does not affect the algorithm performance.
Our iterative algorithm is based on the principle of contraction mapping; to ensure the contraction property, the choice of q in (12) is fundamental to this algorithm, but this choice is rather empirical and does not correspond to any optimality consideration. For RBF networks, determining the optimum width parameters (or q) is still an open problem.
References

[1] M.J.D. Powell, Radial basis function approximations to polynomials, in: Proceedings of Numerical Analysis 1987, Dundee, UK, 1988, pp. 223–241.
[2] D.S. Broomhead, D. Lowe, Multivariable functional interpolation and adaptive networks, Complex Syst. 2 (1988) 321–355.
[3] E. Blanzieri, Theoretical interpretations and applications of radial basis function networks, Technical Report DIT-03-023, Informatica e Telecomunicazioni, University of Trento, 2003.
[4] S. Haykin, Neural Networks: A Comprehensive Foundation, second ed., Prentice-Hall, Englewood Cliffs, NJ, 1999.
[5] C.G. Looney, Pattern Recognition Using Neural Networks: Theory and Algorithms for Engineers and Scientists, Oxford University Press, New York, 1997.
[6] F. Schwenker, H.A. Kestler, G. Palm, Three learning phases for radial-basis-function networks, Neural Networks 14 (4–5) (2001) 439–458.
[7] E.J. Hartman, J.D. Keeler, J.M. Kowalski, Layered neural networks with Gaussian hidden units as universal approximations, Neural Comput. 2 (2) (1990) 210–215.
[8] J. Park, I.W. Sandberg, Approximation and radial-basis-function networks, Neural Comput. 5 (3) (1993) 305–316.
[9] T. Poggio, F. Girosi, Networks for approximation and learning, Proc. IEEE 78 (9) (1990) 1481–1497.
[10] M. Bianchini, P. Frasconi, M. Gori, Learning without local minima in radial basis function networks, IEEE Trans. Neural Networks 6 (3) (1995) 749–756.
[11] C. Michelli, Interpolation of scattered data: distance matrices and conditionally positive definite functions, Constr. Approx. 2 (1986) 11–22.
[12] L. Collatz, Functional Analysis and Numerical Mathematics, Academic Press, New York, 1966.
[13] T.M. Mitchell, Machine Learning, McGraw-Hill, New York, 1997.
[14] H.X. Huan, D.T.T. Hien, An iterative algorithm for training interpolation RBF networks, in: Proceedings of the Vietnamese National Workshop on Some Selected Topics of Information Technology, Haiphong, Vietnam, 2005, pp. 314–323.
[15] M.H. Hassoun, Fundamentals of Artificial Neural Networks, MIT Press, Cambridge, MA, 1995.


