
MINISTRY OF EDUCATION AND TRAINING

MINISTRY OF SCIENCE AND TECHNOLOGY

NATIONAL CENTER FOR TECHNOLOGICAL PROGRESS

VU TRUNG KIEN

RESEARCH AND DEVELOPMENT FOR WI-FI BASED
INDOOR POSITIONING TECHNIQUE

SUMMARY OF DOCTORAL THESIS
Field of study: Electronics Engineering
Code: 9520203

HA NOI - 2019


The thesis is completed at:
National Center for Technological Progress

Supervisor: Prof., Dr. Le Hung Lan

Reviewer 1: Assoc. Prof., Dr. Thai Quang Vinh
Reviewer 2: Assoc. Prof., Dr. Ha Hai Nam
Reviewer 3: Assoc. Prof., Dr. Hoang Van Phuc

The thesis shall be defended in front of the Thesis Committee at
Academy Level at National Center for Technological Progress
At...... hour....... date...... month...... year 2019


The thesis can be found at: The Library of National Center for
Technological Progress; The National Library


LIST OF PUBLISHED WORKS RELATED TO THE THESIS
[CT1] Hoang Manh Kha, Duong Thi Hang, Vu Trung Kien, Trinh Anh Vu (2017), "Enhancing WiFi based Indoor Positioning by Modeling Measurement Data with GMM", IEEE International Conference on Advanced Technologies for Communications, IEEE, Quy Nhon, Vietnam, pp. 325-328.
[CT2] Vu, T.K., Hoang, M.K., and Le, H.L. (2018), "WLAN Fingerprinting based Indoor Positioning in the Presence of Dropped Mixture Data", Journal of Military Science and Technology, 57A(3), pp. 25-34.
[CT3] Vu, Trung Kien and Le, Hung Lan (2018), "Gaussian Mixture Modeling for Wi-Fi Fingerprinting based Indoor Positioning in the Presence of Censored Data", Vietnam Journal of Science, Technology and Engineering, 61(1), pp. 3-8.
[CT4] (ISI-Q2) Vu, Trung Kien, Hoang, Manh Kha, and Le, Hung Lan (2019), "An EM algorithm for GMM parameter estimation in the presence of censored and dropped data with potential application for indoor positioning", ICT Express, 5(2), pp. 120-123, DOI: 10.1016/j.icte.2018.08.001.
Accepted paper:
[CT5] (ISI-Q3) Vu, Trung Kien, Hoang, Manh Kha, and Le, Hung Lan (2019), "Performance Enhancement of Wi-Fi Fingerprinting based IPS by Accurate Parameter Estimation of Censored and Dropped Data", Radioengineering, ISSN: 1805-9600. Submission: 06/04/2019, Reviews Opened: 27/05/2019, Accepted: 03/09/2019.



INTRODUCTION
1. The necessity of the thesis
Satellite based positioning systems such as the GPS (Global
Positioning System) can accurately locate objects in outdoor
environments. However, in indoor environments, because satellite
signals are not transmitted directly to the positioning device, the
accuracy of these systems is greatly reduced. On the other hand, indoor navigation needs are growing, such as guiding smartphone users through terminals, airports, and commercial centers; locating goods in warehouses; and positioning cars in parking lots. For these reasons, the IPS (Indoor Positioning System) has attracted considerable research and development interest in recent years.
Among current indoor positioning technologies, Wi-Fi based positioning in the WLAN (Wireless Local Area Network) is the most commonly used, because Wi-Fi is available in most areas and popular mobile devices such as phones and computers are equipped with Wi-Fi transceivers.
Accordingly, the author has chosen the topic "Research and development for Wi-Fi based indoor positioning techniques", which focuses on the RSSIF-IPT (Received Signal Strength Indication Fingerprinting based Indoor Positioning Technique).
2. Scope of the study
The thesis studies techniques for positioning static objects in 2-dimensional indoor spaces, focusing on RSSIF-IPT. The studied issues include: the characteristics of Wi-Fi RSSI; modelling the distribution of Wi-Fi RSSI; algorithms for estimating and optimizing the parameters of the model used to describe the Wi-Fi RSSI distribution; and the online positioning algorithm.



3. Research objectives of the topic
Researching and developing the Wi-Fi RSSI fingerprinting based
indoor positioning technique in order to minimize positioning errors and
optimize positioning time. The detailed research objectives are as follows:
+ Developing algorithms to estimate the parameters and number of
Gaussian components in GMM (Gaussian Mixture Model) in the
presence of unobservable data;
+ Developing a positioning algorithm that minimizes positioning errors and optimizes positioning time.
4. Methods
Statistical methods for characterizing the collected data (Wi-Fi RSSI); analytical methods for developing parameter estimation and positioning algorithms; the Monte Carlo method for evaluating the proposed algorithms; and empirical methods on both simulated and real data to verify the effectiveness of the proposals applied to IPS.

5. New findings of the doctoral dissertation
- The parameter estimation algorithm for GMM in the presence of
censored and dropped mixture data [CT2-CT4];
- The model selection algorithm for GMM from incomplete data
[CT5];
- The positioning procedure in the presence of unobservable data
[CT5].
6. Organization of dissertation
The thesis is divided into four chapters. Chapter 1: Overview of Wi-Fi based IPS. Chapter 2: GMM parameter estimation in the presence of censored and dropped data. Chapter 3: GMM model selection in the presence of censored and dropped data. Chapter 4: Positioning algorithm and experimental results.



CHAPTER 1. OVERVIEW OF WI-FI BASED IPS
1.1. Wi-Fi based indoor positioning techniques
Wi-Fi based indoor positioning techniques (IPT) can be divided into
two main groups:
- Time and Space Attributes of Received Signal (TSARS) based IPTs.
TSARS can be the Time of Arrival (ToA); the Time Difference of
Arrival (TDoA) or the Angle of Arrival (AoA).
- RSSI based IPTs. This group includes the proximity positioning
technique; Path Loss Model (PLM) based positioning technique and
RSSIF-IPT.
RSSIF-IPT consists of two phases: the offline training phase and the online positioning phase. During the training phase, RSSIs are collected at the reference points (RPs) to build the database. In the online positioning phase, the RSSIs collected by the object (OB) are compared with the database, and the position of the OB is estimated through the locations of one or several RPs. Among the positioning techniques, RSSIF-IPT has the most advantages.
RSSIF-IPT can use either a deterministic method (D-RSSIF-IPT) or a probabilistic method (P-RSSIF-IPT). Compared with D-RSSIF-IPT, P-RSSIF-IPT has a lower positioning error because its database can capture the variation of RSSI. P-RSSIF-IPT can use a non-parametric model (e.g. histogram) or a parametric model (e.g. Gaussian process, GMM) to model the distribution of Wi-Fi RSSIs. P-RSSIF-IPT using a parametric model has lower positioning errors, and its database has to store fewer parameters than P-RSSIF-IPT using a non-parametric model.
1.2. Theoretical studies about the available RSSIF-IPT
The distribution of Wi-Fi RSSIs can be fitted by a Gaussian process or by a GMM. When data are collected under changing conditions (e.g. doors opening or closing, people moving around), the distribution often becomes multi-modal; therefore, compared to the Gaussian process, the GMM can model the Wi-Fi RSSI distribution more accurately.
However, some data samples may not be observable due to either of
the following reasons:
- Censoring, i.e., clipping. This problem refers to the fact that sensors
are unable to measure RSSI values below some threshold, such as −100
dBm.
- Dropping. It means that occasionally RSSI measurements of access
points are not available, although their value is clearly above the
censoring threshold.
While censoring occurs due to the limited sensitivity of the Wi-Fi sensors on portable devices, dropping comes from the limitations of sensor drivers and the operation of the WLAN system.
According to our data investigation, the data set (Wi-Fi RSSIs) collected at an RP from an AP has characteristics corresponding to one of the following eight cases:
(1) The distribution of the data can be drawn from one Gaussian component, and the data set is fully observable;
(2) The distribution of the data can be drawn from one Gaussian component, and a part of the data set is unobservable due to the censoring problem;
(3) The distribution of the data can be drawn from one Gaussian component, and a part of the data set is unobservable due to the dropping problem;
(4) The distribution of the data can be drawn from one Gaussian component, and a part of the data set is unobservable due to the censoring and dropping problems;
(5) The distribution of the data can be drawn from more than one Gaussian component, and the data set is fully observable;
(6) The distribution of the data can be drawn from more than one Gaussian component, and a part of the data set is unobservable due to the censoring problem (figure 1.10a);
(7) The distribution of the data can be drawn from more than one Gaussian component, and a part of the data set is unobservable due to the dropping problem (figure 1.10b);
(8) The distribution of the data can be drawn from more than one Gaussian component, and a part of the data set is unobservable due to the censoring and dropping problems (figure 1.10c).


Figure 1.10. Histograms of Wi-Fi RSSIs: (a) censored data; (b) dropped data; (c) censored and dropped data
Published studies have addressed data sets with characteristics (1)-(5). However, no study has addressed data sets with characteristics (6)-(8). For this reason, the thesis focuses on researching and proposing solutions that develop RSSIF-IPT to simultaneously handle the censoring, dropping, and multi-component problems (cases (6)-(8)).
1.3. Conclusion of chapter 1
In this chapter, the thesis presents available Wi-Fi based indoor
positioning techniques. Chapter 1 also summarizes and analyzes related
works on RSSIF-IPT. According to related works and the issues that
have not been solved for RSSIF-IPT, the thesis proposes scientific
research goals.



CHAPTER 2. GMM PARAMETER ESTIMATION IN THE
PRESENCE OF CENSORED AND DROPPED DATA
2.1. Motivation
In indoor environments, the data set (Wi-Fi RSSIs) collected at an RP from an AP can be modeled by a GMM with J Gaussian components (J a finite number). Let y_n denote the RSSI value gathered at the n-th time instant (y_n ∈ ℝ, n = 1, ..., N), where N is the number of measurements; the y_n are independent and identically distributed random variables. In a GMM, the PDF (Probability Density Function) of an observation y_n is

p(y_n;\Theta) = \sum_{j=1}^{J} w_j\,\phi(y_n;\theta_j), \qquad (2.1)

where Θ is the set of parameters of the GMM, and w_j and θ_j are the mixing weight and the parameters of the j-th Gaussian component.
Let y = (y_1, y_2, ..., y_N) be the set of complete data, i.e. the latent non-censored, non-dropped values; let c be the threshold below which a portable device (e.g., a smartphone) does not report the signal strength; and let x = (x_1, x_2, ..., x_N) be the set of observed, possibly censored and possibly dropped data (incomplete data). The censoring problem can be presented as follows:

x_n = \begin{cases} y_n & \text{if } y_n > c \\ c & \text{if } y_n \le c \end{cases}, \qquad n = 1,\ldots,N. \qquad (2.4)

Let d = (d_1, d_2, ..., d_N) be the set of hidden binary variables indicating whether an observation y_n is dropped (d_n = 1) or not (d_n = 0). The dropping problem can be presented as follows:

x_n = \begin{cases} y_n & \text{if } d_n = 0 \\ c & \text{if } d_n = 1 \end{cases}, \qquad n = 1,\ldots,N. \qquad (2.5)

If data are unobservable owing to both the censoring and dropping problems, then

x_n = \begin{cases} y_n & \text{if } y_n > c \text{ and } d_n = 0 \\ c & \text{otherwise} \end{cases}, \qquad n = 1,\ldots,N. \qquad (2.6)

The motivation of this chapter is the estimation of the GMM parameters from the incomplete data x.
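For concreteness, a minimal Python sketch of this data model is given below; the GMM parameters, sample size and seed are illustrative assumptions, not values from the thesis. It draws complete samples y_n from a two-component GMM (2.1) and then applies the censoring rule (2.4) and the dropping rule (2.5), combined as in (2.6), to produce the incomplete data x.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed two-component GMM (illustrative values only)
weights = np.array([0.6, 0.4])
means = np.array([-72.0, -86.0])      # dBm
stds = np.array([3.0, 4.0])

N = 400                # number of measurements at one RP from one AP
c = -90.0              # censoring threshold (dBm)
lam = 0.15             # dropping probability P(d_n = 1)

# Complete data y_n ~ GMM (eq. 2.1)
comp = rng.choice(len(weights), size=N, p=weights)
y = rng.normal(means[comp], stds[comp])

# Dropping indicator d_n (eq. 2.5) and censoring (eq. 2.4), combined as in eq. (2.6)
d = rng.random(N) < lam
x = np.where((y > c) & (~d), y, c)

print(f"observed samples: {np.sum(x > c)}, reported at threshold c: {np.sum(x == c)}")
```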
2.2. Introduction to the EM algorithm
The EM (Expectation Maximization) algorithm is an iterative method
for ML (Maximum Likelihood) estimation of parameters of statistical
models in the presence of hidden variables. This method can be used to
estimate the parameters of a GMM, and consists of two steps:
- E-step: computes the expectation of the log-likelihood evaluated using the current parameter estimates.
- M-step: computes the parameters that maximize the expected log-likelihood found in the E-step.
2.3. GMM parameter estimation in the presence of censored data
The EM algorithm for GMM parameter estimation in the presence of censored data (EM-C-GMM) [CT3] is developed as follows:
Let Δ_nj (n = 1, ..., N, j = 1, ..., J) be the latent variables, with Δ_nj = 1 if y_n belongs to the j-th Gaussian component and Δ_nj = 0 otherwise. The expectation of the log-likelihood function (LLF) of y, given the observations x and the previously estimated parameters, is calculated as follows.
E-step:

Q(\Theta;\Theta^{(k)}) = E\big[\ln\mathcal{L}(\Theta; y, \Delta)\,\big|\, x;\Theta^{(k)}\big]
 = \sum_{n=1}^{N}\sum_{j=1}^{J}\int \Delta_{nj}\,\ln\big[w_j\, p(y_n;\theta_j)\big]\, p\big(y_n,\Delta_{nj}\,\big|\,x_n;\Theta^{(k)}\big)\, dy_n. \qquad (2.17)






The function Q(Θ; Θ^(k)) is calculated for the two cases x_n = y_n and x_n = c, which yields

Q(\Theta;\Theta^{(k)}) = \sum_{n=1}^{N}\sum_{j=1}^{J} (1-z_n)\,\gamma\big(x_n;\theta_j^{(k)}\big)\big[\ln w_j + \ln\phi(x_n;\theta_j)\big]
 + \sum_{n=1}^{N}\sum_{j=1}^{J} z_n\,\beta\big(\theta_j^{(k)}\big)\int_{-\infty}^{c}\big[\ln w_j + \ln\phi(y_n;\theta_j)\big]\,\frac{\phi\big(y_n;\theta_j^{(k)}\big)}{I_0\big(\theta_j^{(k)}\big)}\, dy_n. \qquad (2.19)

In equation (2.19), z_n (n = 1, ..., N) are hidden binary variables indicating whether y_n is unobservable (z_n = 1, x_n = c) or observable (z_n = 0, x_n = y_n). The notations γ(x_n; θ_j^(k)), β(θ_j^(k)) and I_0(θ_j^(k)) are given as follows:

\gamma\big(x_n;\theta_j^{(k)}\big) = \frac{w_j^{(k)}\,\phi\big(x_n;\theta_j^{(k)}\big)}{\sum_{j'=1}^{J} w_{j'}^{(k)}\,\phi\big(x_n;\theta_{j'}^{(k)}\big)}; \qquad (2.20)

\beta\big(\theta_j^{(k)}\big) = \frac{w_j^{(k)}\, I_0\big(\theta_j^{(k)}\big)}{\sum_{j'=1}^{J} w_{j'}^{(k)}\, I_0\big(\theta_{j'}^{(k)}\big)}; \qquad (2.21)

I_0\big(\theta_j^{(k)}\big) = \int_{-\infty}^{c}\phi\big(y_n;\theta_j^{(k)}\big)\, dy_n = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{\mu_j^{(k)} - c}{\sqrt{2}\,\sigma_j^{(k)}}\right). \qquad (2.22)

M-step:
The re-estimated parameters at the (k+1)-th iteration are obtained by computing the partial derivatives of Q(Θ; Θ^(k)) in equation (2.19) with respect to μ_j, σ_j and w_j and setting them to zero, which leads to the formulae given in equations (2.23)-(2.25).


\mu_j^{(k+1)} = \frac{\sum_{n=1}^{N}(1-z_n)\,\gamma\big(x_n;\theta_j^{(k)}\big)\,x_n + \beta\big(\theta_j^{(k)}\big)\,\frac{I_1(\theta_j^{(k)})}{I_0(\theta_j^{(k)})}\sum_{n=1}^{N} z_n}{\sum_{n=1}^{N}(1-z_n)\,\gamma\big(x_n;\theta_j^{(k)}\big) + \beta\big(\theta_j^{(k)}\big)\sum_{n=1}^{N} z_n}. \qquad (2.23)

\sigma_j^{2(k+1)} = \frac{\sum_{n=1}^{N}(1-z_n)\,\gamma\big(x_n;\theta_j^{(k)}\big)\big(x_n-\mu_j^{(k)}\big)^2 + \beta\big(\theta_j^{(k)}\big)\left[\frac{I_2(\theta_j^{(k)}) - 2\mu_j^{(k)} I_1(\theta_j^{(k)})}{I_0(\theta_j^{(k)})} + \big(\mu_j^{(k)}\big)^2\right]\sum_{n=1}^{N} z_n}{\sum_{n=1}^{N}(1-z_n)\,\gamma\big(x_n;\theta_j^{(k)}\big) + \beta\big(\theta_j^{(k)}\big)\sum_{n=1}^{N} z_n}. \qquad (2.24)

w_j^{(k+1)} = \frac{\sum_{n=1}^{N}(1-z_n)\,\gamma\big(x_n;\theta_j^{(k)}\big) + \beta\big(\theta_j^{(k)}\big)\sum_{n=1}^{N} z_n}{N}. \qquad (2.25)

In equations (2.23)-(2.25), I_1(θ_j^(k)) and I_2(θ_j^(k)) are given as follows:

I_1\big(\theta_j^{(k)}\big) = \mu_j^{(k)}\, I_0\big(\theta_j^{(k)}\big) - \frac{\sigma_j^{(k)}}{\sqrt{2\pi}}\exp\!\left(-\frac{\big(c-\mu_j^{(k)}\big)^2}{2\big(\sigma_j^{(k)}\big)^2}\right); \qquad (2.26)

I_2\big(\theta_j^{(k)}\big) = \Big[\big(\mu_j^{(k)}\big)^2 + \big(\sigma_j^{(k)}\big)^2\Big]\, I_0\big(\theta_j^{(k)}\big) - \frac{\sigma_j^{(k)}\big(c+\mu_j^{(k)}\big)}{\sqrt{2\pi}}\exp\!\left(-\frac{\big(c-\mu_j^{(k)}\big)^2}{2\big(\sigma_j^{(k)}\big)^2}\right). \qquad (2.27)
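The truncated-Gaussian integrals I_0, I_1 and I_2 in (2.22), (2.26) and (2.27) admit the closed forms above; the following Python sketch (an illustration with arbitrary test values, not the thesis code) computes them and cross-checks the result against numerical integration.

```python
import numpy as np
from scipy.special import erfc
from scipy.integrate import quad
from scipy.stats import norm

def censored_moments(mu, sigma, c):
    """I_0, I_1, I_2 of a Gaussian component over (-inf, c], cf. eqs. (2.22), (2.26), (2.27)."""
    i0 = 0.5 * erfc((mu - c) / (np.sqrt(2.0) * sigma))
    g = np.exp(-(c - mu) ** 2 / (2.0 * sigma ** 2)) / np.sqrt(2.0 * np.pi)
    i1 = mu * i0 - sigma * g
    i2 = (mu ** 2 + sigma ** 2) * i0 - sigma * (c + mu) * g
    return i0, i1, i2

# Quick numerical cross-check with illustrative values
mu, sigma, c = -75.0, 4.0, -80.0
i0, i1, i2 = censored_moments(mu, sigma, c)
for k, ik in enumerate((i0, i1, i2)):
    ref, _ = quad(lambda y, k=k: y ** k * norm.pdf(y, mu, sigma), -np.inf, c)
    print(f"I_{k}: closed form {ik:.6f}, numerical {ref:.6f}")
```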

2.4. GMM parameter estimation in the presence of dropped data
The EM algorithm for GMM parameter estimation in the presence of dropped data (EM-D-GMM) [CT2] is developed as follows:
E-step:

Q(\Theta;\Theta^{(k)}) = \sum_{n=1}^{N}\sum_{j=1}^{J} d_n\, w_j^{(k)}\big[\ln w_j + \ln\lambda\big] + \sum_{n=1}^{N}\sum_{j=1}^{J} (1-d_n)\,\gamma\big(x_n;\theta_j^{(k)}\big)\big[\ln w_j + \ln(1-\lambda) + \ln\phi(x_n;\theta_j)\big]. \qquad (2.30)

In equation (2.30), λ = P(d_n = 1) is the dropping probability.
M-step:

\mu_j^{(k+1)} = \frac{\sum_{n=1}^{N}(1-d_n)\,\gamma\big(x_n;\theta_j^{(k)}\big)\, x_n}{\sum_{n=1}^{N}(1-d_n)\,\gamma\big(x_n;\theta_j^{(k)}\big)}. \qquad (2.31)

\sigma_j^{2(k+1)} = \frac{\sum_{n=1}^{N}(1-d_n)\,\gamma\big(x_n;\theta_j^{(k)}\big)\big(x_n-\mu_j^{(k)}\big)^2}{\sum_{n=1}^{N}(1-d_n)\,\gamma\big(x_n;\theta_j^{(k)}\big)}. \qquad (2.32)

w_j^{(k+1)} = \frac{\sum_{n=1}^{N}\Big[(1-d_n)\,\gamma\big(x_n;\theta_j^{(k)}\big) + d_n\, w_j^{(k)}\Big]}{N}. \qquad (2.33)

\lambda^{(k+1)} = \frac{\sum_{n=1}^{N} d_n}{N}. \qquad (2.34)
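A minimal sketch of one EM-D-GMM iteration is given below, under the assumption that, with dropping only, every sample reported at the threshold c is a dropped sample (d_n = 1); the function name and array layout are illustrative choices, not the thesis implementation.

```python
import numpy as np
from scipy.stats import norm

def em_d_gmm_step(x, c, w, mu, sigma):
    """One EM-D-GMM iteration (eqs. 2.30-2.34); x_n == c marks a dropped sample (d_n = 1)."""
    d = (x == c)                                   # dropping indicators
    xo = x[~d]                                     # observed samples only
    # E-step: responsibilities gamma(x_n; theta_j^(k)) for observed samples
    dens = w * norm.pdf(xo[:, None], mu, sigma)    # shape (N_obs, J)
    gamma = dens / dens.sum(axis=1, keepdims=True)
    # M-step (2.31)-(2.34)
    nk = gamma.sum(axis=0)
    mu_new = (gamma * xo[:, None]).sum(axis=0) / nk
    var_new = (gamma * (xo[:, None] - mu) ** 2).sum(axis=0) / nk
    w_new = (nk + d.sum() * w) / x.size
    lam_new = d.mean()
    return w_new, mu_new, np.sqrt(var_new), lam_new
```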

2.5. GMM parameter estimation in the presence of censored and
dropped data
The EM algorithm for GMM parameter estimation in the presence of censored and dropped data (EM-CD-GMM) [CT4] is developed as follows:
E-step:


Q(\Theta;\Theta^{(k)}) = \sum_{n=1}^{N}\sum_{j=1}^{J} (1-v_n)\,\gamma\big(x_n;\theta_j^{(k)}\big)\big[\ln w_j + \ln(1-\lambda) + \ln\phi(x_n;\theta_j)\big]
 + \sum_{n=1}^{N}\sum_{j=1}^{J} v_n\,\beta\big(\theta_j^{(k)}\big)\,\alpha\big(\Theta^{(k)},\lambda^{(k)}\big)\int_{-\infty}^{c}\big[\ln w_j + \ln(1-\lambda) + \ln\phi(y_n;\theta_j)\big]\,\frac{\phi\big(y_n;\theta_j^{(k)}\big)}{I_0\big(\theta_j^{(k)}\big)}\, dy_n
 + \sum_{n=1}^{N}\sum_{j=1}^{J} v_n\, w_j^{(k)}\Big[1-\alpha\big(\Theta^{(k)},\lambda^{(k)}\big)\Big]\ln\lambda. \qquad (2.52)

In equation (2.52), v_n (n = 1, ..., N) are hidden binary variables indicating whether y_n is unobservable (v_n = 1, x_n = c) or observable (v_n = 0, x_n = y_n), and

\alpha\big(\Theta^{(k)},\lambda^{(k)}\big) = \frac{\big(1-\lambda^{(k)}\big)\sum_{j=1}^{J} w_j^{(k)}\, I_0\big(\theta_j^{(k)}\big)}{\big(1-\lambda^{(k)}\big)\sum_{j=1}^{J} w_j^{(k)}\, I_0\big(\theta_j^{(k)}\big) + \lambda^{(k)}},

which can be interpreted as the posterior probability that an unobserved sample (x_n = c) is censored rather than dropped.

M-step:



\mu_j^{(k+1)} = \frac{\sum_{n=1}^{N}(1-v_n)\,\gamma\big(x_n;\theta_j^{(k)}\big)\, x_n + \beta\big(\theta_j^{(k)}\big)\,\alpha\big(\Theta^{(k)},\lambda^{(k)}\big)\,\frac{I_1(\theta_j^{(k)})}{I_0(\theta_j^{(k)})}\sum_{n=1}^{N} v_n}{\sum_{n=1}^{N}(1-v_n)\,\gamma\big(x_n;\theta_j^{(k)}\big) + \beta\big(\theta_j^{(k)}\big)\,\alpha\big(\Theta^{(k)},\lambda^{(k)}\big)\sum_{n=1}^{N} v_n}. \qquad (2.53)

\sigma_j^{2(k+1)} = \frac{\sum_{n=1}^{N}(1-v_n)\,\gamma\big(x_n;\theta_j^{(k)}\big)\big(x_n-\mu_j^{(k)}\big)^2 + \beta\big(\theta_j^{(k)}\big)\,\alpha\big(\Theta^{(k)},\lambda^{(k)}\big)\left[\frac{I_2(\theta_j^{(k)}) - 2\mu_j^{(k)} I_1(\theta_j^{(k)})}{I_0(\theta_j^{(k)})} + \big(\mu_j^{(k)}\big)^2\right]\sum_{n=1}^{N} v_n}{\sum_{n=1}^{N}(1-v_n)\,\gamma\big(x_n;\theta_j^{(k)}\big) + \beta\big(\theta_j^{(k)}\big)\,\alpha\big(\Theta^{(k)},\lambda^{(k)}\big)\sum_{n=1}^{N} v_n}. \qquad (2.54)

w_j^{(k+1)} = \frac{\sum_{n=1}^{N}(1-v_n)\,\gamma\big(x_n;\theta_j^{(k)}\big) + \beta\big(\theta_j^{(k)}\big)\,\alpha\big(\Theta^{(k)},\lambda^{(k)}\big)\sum_{n=1}^{N} v_n}{N - \Big[1-\alpha\big(\Theta^{(k)},\lambda^{(k)}\big)\Big]\sum_{n=1}^{N} v_n}. \qquad (2.55)

\lambda^{(k+1)} = \frac{\Big[1-\alpha\big(\Theta^{(k)},\lambda^{(k)}\big)\Big]\sum_{n=1}^{N} v_n}{N}. \qquad (2.56)
As can be seen in equations (2.53)-(2.56), all of the collected data, including the observable, censored and dropped samples, contribute to the estimates simultaneously. This means the proposed EM algorithm can deal with all of the mentioned phenomena present in the collected data.
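The following Python sketch illustrates one EM-CD-GMM iteration as reconstructed in (2.52)-(2.56). It is only a sketch under that reconstruction: the helper censored_moments is the one from the sketch after (2.27), and the variable names and array layout are illustrative assumptions rather than the thesis implementation.

```python
import numpy as np
from scipy.stats import norm
# censored_moments(mu, sigma, c) -> (I0, I1, I2) as defined in the sketch after eq. (2.27)

def em_cd_gmm_step(x, c, w, mu, sigma, lam):
    """One EM-CD-GMM iteration (eqs. 2.52-2.56); x_n == c marks an unobserved sample (v_n = 1)."""
    v = (x == c)
    xo = x[~v]
    n_unobs = v.sum()
    i0, i1, i2 = censored_moments(mu, sigma, c)          # per-component truncated moments
    # E-step: responsibilities for observed samples, beta for censored, alpha for the mix
    dens = w * norm.pdf(xo[:, None], mu, sigma)
    gamma = dens / dens.sum(axis=1, keepdims=True)
    beta = w * i0 / np.sum(w * i0)
    alpha = (1 - lam) * np.sum(w * i0) / ((1 - lam) * np.sum(w * i0) + lam)
    # M-step (2.53)-(2.56)
    no = gamma.sum(axis=0)                               # effective observed counts per component
    nc = beta * alpha * n_unobs                          # effective censored counts per component
    mu_new = ((gamma * xo[:, None]).sum(axis=0) + nc * i1 / i0) / (no + nc)
    var_new = ((gamma * (xo[:, None] - mu) ** 2).sum(axis=0)
               + nc * ((i2 - 2 * mu * i1) / i0 + mu ** 2)) / (no + nc)
    w_new = (no + nc) / (x.size - (1 - alpha) * n_unobs)
    lam_new = (1 - alpha) * n_unobs / x.size
    return w_new, mu_new, np.sqrt(var_new), lam_new
```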

2.6. Evaluation of the EM-CD-GMM
In this section, the proposed EM-CD-GMM is evaluated and compared with other EM algorithms using the Kullback-Leibler Divergence (KLD). After 1000 experiments, the mean of the KLD is shown in Table 2.1 and the standard deviation of the KLD is shown in Table 2.2 (for c = −90 dBm).
Table 2.1. Mean of the KLD of the EM algorithms after 1000 experiments (c = −90 dBm; columns: dropping probability λ)

| Algorithm  | λ = 0  | λ = 0.075 | λ = 0.15 | λ = 0.225 | λ = 0.3 |
|------------|--------|-----------|----------|-----------|---------|
| EM-GMM     | 3.1491 | 3.2325    | 3.3142   | 3.5054    | 6.1253  |
| EM-CD-G    | 0.0798 | 0.0864    | 0.1096   | 0.1329    | 0.1998  |
| EM-CD-GMM  | 0.0098 | 0.0111    | 0.0229   | 0.0334    | 0.0364  |

Table 2.2. Standard deviation of the KLD of the EM algorithms after 1000 experiments (c = −90 dBm; columns: dropping probability λ)

| Algorithm  | λ = 0  | λ = 0.075 | λ = 0.15 | λ = 0.225 | λ = 0.3 |
|------------|--------|-----------|----------|-----------|---------|
| EM-GMM     | 0.0351 | 0.3535    | 1.7911   | 2.202     | 2.4937  |
| EM-CD-G    | 0.1199 | 0.1364    | 0.1535   | 0.1963    | 0.296   |
| EM-CD-GMM  | 0.0227 | 0.0601    | 0.0857   | 0.1005    | 0.1302  |




As can be seen in Tables 2.1 and 2.2:
- When λ = 0 and the censoring threshold is low (c = −96 dBm), the data are almost fully observable. The EM-GMM and the EM-CD-GMM give the same results. The EM-CD-G has a larger error because this algorithm assumes that the data follow a Gaussian process.
- In the other cases, the mean and standard deviation of the KLD of the EM-CD-GMM are always the smallest. Hence, EM-CD-GMM is the most effective algorithm for GMM parameter estimation in the presence of censored and dropped data.
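The KLD between two GMMs has no closed form, so it is commonly approximated by Monte Carlo sampling; the sketch below shows one such estimator and is an illustration of the evaluation metric only, not the thesis evaluation code.

```python
import numpy as np
from scipy.stats import norm

def gmm_logpdf(y, w, mu, sigma):
    """Log-density of a 1-D GMM evaluated at the points y."""
    comp = np.log(w) + norm.logpdf(y[:, None], mu, sigma)
    return np.logaddexp.reduce(comp, axis=1)

def kld_monte_carlo(true_p, est_p, n_samples=100_000, seed=0):
    """Monte Carlo estimate of KLD(true || estimated); each argument is a (w, mu, sigma) tuple."""
    rng = np.random.default_rng(seed)
    w, mu, sigma = true_p
    comp = rng.choice(len(w), size=n_samples, p=w)
    y = rng.normal(mu[comp], sigma[comp])
    return np.mean(gmm_logpdf(y, *true_p) - gmm_logpdf(y, *est_p))
```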
2.7. Conclusion of chapter 2
In chapter 2, the author proposed three algorithms to estimate the parameters of a GMM when a part of the data set cannot be observed due to censoring, due to dropping, or due to both censoring and dropping. Experimental results demonstrated the effectiveness of the EM-CD-GMM algorithm compared to EM-GMM and EM-CD-G.



CHAPTER 3. GMM MODEL SELECTION IN THE PRESENCE
OF CENSORED AND DROPPED DATA
3.1. Motivation
In complex indoor environments, the distribution of collected Wi-Fi RSSIs can be drawn from one or more Gaussian components. If a GMM with J Gaussian components is used, the number of parameters of the GMM is N_Ps = 3J − 1. This means that the number of parameters to store in the database and the computational cost of the positioning algorithm are proportional to the number of Gaussian components used to describe the distribution of Wi-Fi RSSIs. Therefore, a solution is needed to estimate the number of Gaussian components in the GMM, in order to optimize the database and reduce the computational complexity of the positioning algorithm of the IPS.
3.2. Methods for GMM model selection
3.2.1. Penalty Function (PF) based methods

x be the mixture and observable data set; N is the number of
ˆ is the set of parameters of GMM with J Gaussian
samples in x ; Θ
J
Let

ˆ | x) is the
components; N Ps is the number of parameters of GMM; (Θ
J

likelihood function. PF of Akaike Information Criterion (AIC), AIC3
and Bayesian Information Criterion (BIC) were defined as follows:

PFAIC (Θˆ J )  2ln[(Θˆ J | x)]  2NPs.

(3.3)

PFAIC3(Θˆ J )  2ln[(Θˆ J | x)]  3NPs.

(3.4)

PFBIC (Θˆ J )  2ln[(Θˆ J | x)]  NPs ln  N .


(3.5)

3.2.2. Characteristic Function (CF) based methods
The CF based method uses the convergence of the Sum of Weighted
Real parts of all Log-Characteristic Functions (SWRLCF) to determine
the number of Gaussian components, as follows:

\mathrm{SWRLCF}(J) = \sum_{j=1}^{J} \hat{w}_j\,\hat{\psi}_j, \qquad (3.6)

where ψ̂_j denotes the real part of the log-characteristic function of the j-th estimated Gaussian component.

3.3. GMM model selection in the presence of censored and dropped
data [CT5]
In the presence of censored and dropped data, the log-likelihood term of PF_BIC in equation (3.5) can be calculated as follows:

\ln\big[\mathcal{L}(\hat{\Theta}_J,\hat{\lambda}\,|\,x)\big] = \sum_{n=1}^{N}(1-v_n)\ln\Big[(1-\hat{\lambda})\sum_{j=1}^{J}\hat{w}_j\,\phi\big(x_n;\hat{\theta}_j\big)\Big] + \sum_{n=1}^{N} v_n\ln\Big[(1-\hat{\lambda})\sum_{j=1}^{J}\hat{w}_j\, I_0\big(\hat{\theta}_j\big) + \hat{\lambda}\Big]. \qquad (3.7)

ˆ ,ˆ ) be the PF of BIC in the presence of censored and
Let PFBICCD (Θ
J

dropped data, we have:





N




J

n1



j 1







ˆ ,ˆ  2 1  vn  ln 1 ˆ  wˆ  xn ;ˆ 
PFBICCD Θ
j

 j
J

 2 vn ln  1 ˆ
n1

N






  wˆ j I0 ˆj  ˆ   3J ln  N .
J



j 1



(3.12)
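A direct Python transcription of (3.7) and (3.12) could look as follows; this is a sketch under the reconstruction above, with x_n = c marking unobserved samples and an assumed parameter layout.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import erfc

def pf_bic_cd(x, c, w, mu, sigma, lam):
    """Penalty function of eq. (3.12): BIC computed on both observed and unobserved samples."""
    v = (x == c)                                         # unobserved samples (censored or dropped)
    i0 = 0.5 * erfc((mu - c) / (np.sqrt(2.0) * sigma))   # per-component censored mass, eq. (2.22)
    # Log-likelihood of eq. (3.7)
    obs = np.log((1 - lam) * (w * norm.pdf(x[~v][:, None], mu, sigma)).sum(axis=1))
    unobs_term = np.log((1 - lam) * np.sum(w * i0) + lam)
    loglik = obs.sum() + v.sum() * unobs_term
    n_components = len(w)
    return -2.0 * loglik + 3 * n_components * np.log(x.size)
```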

The algorithm for GMM parameter estimation and model selection in the presence of censored and dropped data (EM-CD-GMM-PF_BIC-CD) is as follows (figure 3.4):
Input: a set of incomplete data x, the convergence threshold of the EM algorithm for the CD-GMM (ε_EM), and the maximum number of Gaussian components (J_max) for calculating the PFs.
Output: the estimated number of Gaussian components Ĵ and the estimated parameters (Θ̂, λ̂) of the CD-GMM used to model the distribution of x.


Begin: set J = 1.
Run the EM algorithm for the CD-GMM with J components: set k = 1 and initialize θ_j (j = 1, ..., J) and λ; then repeat:
- E-step: compute γ(x_n; Θ^(k)), I_0(θ_j^(k)), I_1(θ_j^(k)), I_2(θ_j^(k)), β(θ_j^(k)) and α(Θ^(k), λ^(k)) at the k-th iteration (j = 1, ..., J), and compute ln L(Θ_J^(k), λ^(k) | x);
- M-step: compute Θ_j^(k+1) = (μ_j^(k+1), σ_j^{2(k+1)}, w_j^(k+1)) and λ^(k+1) at the (k+1)-th iteration (j = 1, ..., J), and compute ln L(Θ_J^(k+1), λ^(k+1) | x);
- if |ln L(Θ_J^(k+1), λ^(k+1) | x) − ln L(Θ_J^(k), λ^(k) | x)| > ε_EM, set k = k + 1 and repeat; otherwise output the set of estimated parameters of the CD-GMM with J Gaussian components, Θ̂_J = Θ_J^(k+1) and λ̂ = λ^(k+1).
Compute PF_BIC-CD(Θ̂_J, λ̂).
If J < J_max, set J = J + 1 and run the EM algorithm again; otherwise select the smallest penalty function among the J_max values, PF_BIC-CD(Θ̂_Ĵ, λ̂) = min{PF_BIC-CD(Θ̂_1, λ̂), ..., PF_BIC-CD(Θ̂_J_max, λ̂)}, and output the estimated number of Gaussian components Ĵ and the estimated parameters (Θ̂_Ĵ, λ̂).
End.

Figure 3.4. The EM-CD-GMM-PF_BIC-CD algorithm
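The outer model-selection loop of figure 3.4 can be sketched as follows; fit_em_cd_gmm is a hypothetical wrapper that iterates the EM-CD-GMM updates until the log-likelihood change falls below ε_EM, and pf_bic_cd is the sketch given after (3.12).

```python
import numpy as np
# Assumes fit_em_cd_gmm(x, c, n_components, eps_em) exists and returns (w, mu, sigma, lam),
# e.g. by iterating em_cd_gmm_step until the log-likelihood change is below eps_em,
# and pf_bic_cd(...) from the sketch after eq. (3.12).

def select_gmm_order(x, c, j_max=6, eps_em=1e-6):
    """Sketch of the EM-CD-GMM-PF_BIC-CD loop: fit J = 1..J_max and keep the smallest PF."""
    best = None
    for j in range(1, j_max + 1):
        w, mu, sigma, lam = fit_em_cd_gmm(x, c, n_components=j, eps_em=eps_em)
        pf = pf_bic_cd(x, c, w, mu, sigma, lam)
        if best is None or pf < best[0]:
            best = (pf, j, (w, mu, sigma, lam))
    _, j_hat, params = best
    return j_hat, params
```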




3.4. Evaluation of GMM model selection algorithms
In this section, the following GMM model selection algorithms will be
evaluated through various experiments with artificial data:
- the GMM model selection algorithm using EM-GMM and PF_AIC (EM-GMM-PF_AIC); initialized parameters: ε_EM = 10^−6, J_max = 6;
- the GMM model selection algorithm using EM-GMM and PF_BIC (EM-GMM-PF_BIC); initialized parameters: ε_EM = 10^−6, J_max = 6;
- the GMM model selection algorithm using EM-GMM and SWRLCF (EM-GMM-SWRLCF); initialized parameters: ε_EM = 10^−6, ε_CF = 0.02;
- the proposed algorithm (EM-CD-GMM-PF_BIC-CD); initialized parameters: ε_EM = 10^−6, J_max = 6.
After 1000 experiments, the differences between the true number (J) and the estimated number (Ĵ) of Gaussian components were recorded in Table 3.2.
As can be seen in Table 3.2, the proposed method gives far better results than the other approaches, especially when the data suffer from censoring, dropping, or both. This can be explained as follows. The proposed method uses the extended version of the EM algorithm, in which both observable data (x_n = y_n) and unobservable data (x_n = c) contribute to the estimates. When data are unobservable owing to the censoring and dropping problems, this algorithm produces much better results than the standard EM algorithm. Moreover, in the PF of AIC, the PF of BIC and SWRLCF, unobservable data make almost no practical contribution, whereas they really contribute to the likelihood in the PF of our proposal, as mentioned in sub-section 3.3.




Table 3.2. Difference between J and Ĵ for the four approaches (c = −92 dBm; columns: dropping probability λ)

| Method               | Probability    | λ = 0 | λ = 0.1 | λ = 0.2 |
|----------------------|----------------|-------|---------|---------|
| EM-GMM-PF_AIC        | P(J = Ĵ)       | 0.01  | 0.01    | 0.01    |
|                      | P(|J − Ĵ| = 1) | 0.31  | 0.27    | 0.22    |
|                      | P(|J − Ĵ| ≥ 2) | 0.68  | 0.72    | 0.78    |
| EM-GMM-PF_BIC        | P(J = Ĵ)       | 0.01  | 0.01    | 0.01    |
|                      | P(|J − Ĵ| = 1) | 0.39  | 0.37    | 0.3     |
|                      | P(|J − Ĵ| ≥ 2) | 0.6   | 0.62    | 0.69    |
| EM-GMM-SWRLCF        | P(J = Ĵ)       | 0.52  | 0.02    | 0.01    |
|                      | P(|J − Ĵ| = 1) | 0.39  | 0.78    | 0.77    |
|                      | P(|J − Ĵ| ≥ 2) | 0.09  | 0.2     | 0.22    |
| EM-CD-GMM-PF_BIC-CD  | P(J = Ĵ)       | 0.82  | 0.8     | 0.79    |
|                      | P(|J − Ĵ| = 1) | 0.16  | 0.18    | 0.2     |
|                      | P(|J − Ĵ| ≥ 2) | 0.02  | 0.02    | 0.01    |

3.5. Conclusion of chapter 3
When a portion of the data is not observed due to dropping or
censoring or both, the other GMM model selection algorithms have a
large error due to the absence of unobserved data samples. . In chapter 3,
PF of BIC is calculated on both the observed data samples and the
unobserved data samples. These are new findings of the proposed GMM
model selection method compared to others.




CHAPTER 4. POSITIONING ALGORITHM AND
EXPERIMENTAL RESULTS
4.1. Motivation
P-RSSIF-IPT includes an offline training phase and an online positioning phase. In the offline training phase, let N_RP be the number of RPs, N_AP the number of APs, and x_{q,i} (q = 1, ..., N_RP, i = 1, ..., N_AP) the data set collected at the q-th RP from the i-th AP. The database built in the offline training phase of an IPS using P-RSSIF-IPT is

R = \{\hat{\Theta}_{q,i};\ q = 1,\ldots,N_{RP},\ i = 1,\ldots,N_{AP}\}, \qquad (4.1)

where Θ̂_{q,i} is the set of parameters of the GMM used to model the distribution of x_{q,i}, estimated by the EM-CD-GMM-PF_BIC-CD.
During the online positioning phase, let x^on = (x_1^on, ..., x_{N_AP}^on) be the data set collected by the OB. The positioning problem can then be formulated as a classification problem, where the classes are the positions at which RSSI measurements were taken during the offline training phase (the RPs).
To estimate the target's position, a MAP (maximum a posteriori) based classification rule is developed in this chapter. The censoring and dropping problems are also considered in this proposal.
4.2. Optimal classification rule for censored and dropped mixture
data [CT5]
Let  q be the position of the qth RP; xon  [x1on , x2on ,..., xNonAP ] is the data
set gathered by OB. Posterior probability is determined as follows:
N AP


p  xion |  q  P   q 

i 1

p   q | xon   N RP N AP

p


q' 1 i 1

xion

  

|  q' P  q'

(4.2)


In equation (4.2), P(ℓ_q) is the prior (marginal) probability of the q-th RP; considering that the RPs are equally probable, P(ℓ_q) = 1/N_RP. The term \sum_{q'=1}^{N_{RP}}\prod_{i=1}^{N_{AP}} p(x_i^{on}\,|\,\ell_{q'})\, P(\ell_{q'}) is the normalizing constant, and p(x_i^{on}\,|\,\ell_q) is the likelihood, which can be calculated as follows:
p(\ell_q\,|\,x^{on}) =
\begin{cases}
\dfrac{\prod_{i=1}^{N_{AP}}\Big[\big(1-\hat{\lambda}_{q,i}\big)\sum_{j=1}^{\hat{J}_{q,i}}\hat{w}_{q,i,j}\,\phi\big(x_i^{on};\hat{\theta}_{q,i,j}\big)\Big]}{\sum_{q'=1}^{N_{RP}}\prod_{i=1}^{N_{AP}}\Big[\big(1-\hat{\lambda}_{q',i}\big)\sum_{j=1}^{\hat{J}_{q',i}}\hat{w}_{q',i,j}\,\phi\big(x_i^{on};\hat{\theta}_{q',i,j}\big)\Big]} & \text{if } x_i^{on} > c,\\[2ex]
\dfrac{\prod_{i=1}^{N_{AP}}\Big[\big(1-\hat{\lambda}_{q,i}\big)\sum_{j=1}^{\hat{J}_{q,i}}\hat{w}_{q,i,j}\, I_0\big(\hat{\theta}_{q,i,j}\big) + \hat{\lambda}_{q,i}\Big]}{\sum_{q'=1}^{N_{RP}}\prod_{i=1}^{N_{AP}}\Big[\big(1-\hat{\lambda}_{q',i}\big)\sum_{j=1}^{\hat{J}_{q',i}}\hat{w}_{q',i,j}\, I_0\big(\hat{\theta}_{q',i,j}\big) + \hat{\lambda}_{q',i}\Big]} & \text{if } x_i^{on} = c.
\end{cases} \qquad (4.9)



Using the set K_NN of nearest neighbours, chosen among the offline locations as those with the largest posteriors, the final location estimate is obtained by the weighted average:

\hat{\ell}\big(x^{on}\big) = \frac{\sum_{q\in K_{NN}} \ell_q\, p\big(\ell_q\,|\,x^{on}\big)}{\sum_{q\in K_{NN}} p\big(\ell_q\,|\,x^{on}\big)}. \qquad (4.10)
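A compact Python sketch of the online stage, the MAP posterior of (4.2) and (4.9) followed by the weighted K_NN average of (4.10), is given below; the database layout (per-RP, per-AP parameter dictionaries) is an assumption made only for illustration.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import erfc

def rp_likelihood(x_on, c, rp_db):
    """Likelihood of one online fingerprint x_on (length N_AP) at one RP, cf. eq. (4.9).
    rp_db is a list of per-AP parameter dicts {'w', 'mu', 'sigma', 'lam'} (assumed layout)."""
    lik = 1.0
    for x_i, ap in zip(x_on, rp_db):
        w, mu, sigma, lam = ap['w'], ap['mu'], ap['sigma'], ap['lam']
        if x_i > c:                                   # observed RSSI
            lik *= (1 - lam) * np.sum(w * norm.pdf(x_i, mu, sigma))
        else:                                         # reported at the threshold: censored or dropped
            i0 = 0.5 * erfc((mu - c) / (np.sqrt(2.0) * sigma))
            lik *= (1 - lam) * np.sum(w * i0) + lam
    return lik

def locate(x_on, c, database, rp_positions, k_nn=3):
    """MAP posterior over RPs (eqs. 4.2, 4.9) and weighted K_NN average (eq. 4.10).
    rp_positions: array of shape (N_RP, 2) with the RP coordinates."""
    post = np.array([rp_likelihood(x_on, c, rp_db) for rp_db in database])
    post = post / post.sum()                          # uniform prior P(l_q) = 1/N_RP cancels
    top = np.argsort(post)[-k_nn:]                    # K_NN largest posteriors
    return (rp_positions[top] * post[top, None]).sum(axis=0) / post[top].sum()
```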

4.3. Experimental results
4.3.1. Positioning accuracy
In order to evaluate the positioning accuracy of the proposed method compared to other state-of-the-art approaches, the author of this thesis conducted experiments with both simulated data and real field data.



4.3.1.1. Simulation results
In order to evaluate the effectiveness of the proposed approach, a floor
plan having an overall size of 45m by 45m with 100 RPs and 10 APs
was generated. The training data were generated as follows (a small sketch is given after this list):
(1) Collect data at each RP from each AP according to the PLM:

y_n[\mathrm{dBm}] = \mathrm{RSSI}_0[\mathrm{dBm}] - 10\,\eta\log_{10}\!\left(\frac{r}{r_0}\right) + \nu, \qquad (4.11)

where η is the path loss exponent and ν is the measurement noise.
(2) Round y_n.
(3) Generate censored and dropped data with λ = 0.15 and c = −100 dBm.
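As a hedged illustration of steps (1)-(3), the sketch below generates the training RSSIs at one RP from one AP; RSSI_0, the path loss exponent η, the noise level and the distance are assumed values, since the summary does not list them.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_training_rssi(r, n_meas=400, rssi0=-40.0, r0=1.0, eta=3.0,
                           noise_std=4.0, lam=0.15, c=-100.0):
    """Steps (1)-(3): PLM samples (eq. 4.11), rounding, then censoring and dropping."""
    nu = rng.normal(0.0, noise_std, size=n_meas)                # measurement noise
    y = rssi0 - 10.0 * eta * np.log10(r / r0) + nu              # eq. (4.11)
    y = np.round(y)                                             # step (2)
    dropped = rng.random(n_meas) < lam                          # step (3): dropping
    return np.where((y > c) & (~dropped), y, c)                 # censoring + dropping

x_qi = simulate_training_rssi(r=12.5)   # e.g. an RP assumed to be 12.5 m away from the AP
```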
In the training phase, 400 measurements were collected at each RP from each AP. The data collected at 50% of the RPs are distributed according to a Gaussian process; the data at 17% of the RPs follow a GMM with 2 Gaussian components; the data at another 17% of the RPs follow a GMM with 3 Gaussian components; and the data at the remaining 16% of the RPs follow a GMM with 4 Gaussian components.
During the online positioning phase, 1000 measurements were collected at the 100 RP locations. At each location, 10 samples were collected under the same scenarios as the training data. Table 4.2 shows the mean and variance of the positioning error of the IPS using four approaches:
- Histogram.
- EM-GMM-AIC-MaP. In this approach, during the training phase, a GMM is used to model the distribution of the data, and the EM-GMM algorithm combined with the AIC criterion is used to estimate the parameters of the GMM. In the online positioning phase, the MAP based positioning algorithm is applied.
- EM-CD-G-MaP. This approach uses a Gaussian process to model the distribution of the data and the EM-CD-G algorithm to estimate its parameters in the training phase. The MAP based positioning algorithm is also applied in the online positioning phase.




- EM-CD-GMM-BIC-MaP is the method proposed by the author. In the training phase, a GMM is used to describe the distribution of the data and the EM-CD-GMM-PF_BIC-CD algorithm is used to estimate the parameters of the GMM; its initial parameters are ε_EM = 10^−6 and J_max = 6. In the online positioning phase, the MAP based positioning algorithm (section 4.2) is used with K_NN = 3 nearest neighbours. After 1000 experiments, the positioning results were calculated and reported in table 4.1.
4.3.1.2. Experimental results
In order to evaluate the positioning accuracy of the proposed method compared to the three state-of-the-art approaches on real data, the author conducted experiments on a floor with an overall size of 360 m².
In the training phase, RSSI values were taken at 25 RPs (25 free positions, without walls or furniture), roughly evenly distributed, resulting in an average distance of 2.7 m between two locations. At each RP, 400 measurements were collected from each available AP. Training measurements were gathered at four different times of the day, including morning, noon, afternoon and evening (100 samples per session). The direction of the collecting device (a smartphone) was also changed during the measurement collection: in each of the directions 0°, 90°, 180° and 270°, 25 measurements were collected. Among the 26 APs detected in the collected training data, 4 APs are available at all 25 RP positions. The strongest-AP selection strategy was applied to select the 4 APs with the largest mean RSSI values, and these were used to build the radio map by means of the algorithm introduced in sub-section 4.2. The convergence threshold of the EM algorithm was set to 10^−6 (ε_EM = 10^−6) and the maximum number of Gaussian components for calculating the PFs was set to 6 (J_max = 6).

In the online phase, 100 sets (x^on) of Wi-Fi RSSI measurements were gathered at the positions of the 25 RPs (4 sets per RP) under the same scenarios as the training data. The MAP method presented in sub-section 4.2

