Regression Analysis with Support Vector Machines (SVM)
Institutional-level scientific research project

MINISTRY OF EDUCATION AND TRAINING
UNIVERSITY OF ECONOMICS HO CHI MINH CITY

Project code: CS-2007-01
Principal investigator: MSc., Senior Lecturer Huynh Van Duc
Member: MSc., Lecturer Nguyen Cong Tri

Ho Chi Minh City, 2009
Preface

Today we face an enormous volume of data that hides regularities not yet discovered. With the development of science, our understanding of many objects and phenomena has become more complete and more detailed. The relationships among factors have accordingly become more complex. One attribute may be related to a great many other attributes, so observed data often have very high dimension, which causes many difficulties for traditional methods.

After the golden age of particular (parametric) statistics (the decades 1930-1960, with the maximum likelihood method proposed by Fisher in the early 1930s, which works very well on low-dimensional data), people began to return to general statistics [1]. A general principle was immediately and widely accepted: the principle of Empirical Risk Minimization (ERM). With high-dimensional data the hypothesis space becomes complex. How can one control the hypothesis space and at the same time guarantee the consistency of the estimators? The principle of Structural Risk Minimization (SRM) was proposed in the mid-1970s; in essence it is the ERM principle with control over the complexity of the hypothesis space.

Later (1990), Support Vector Machine (SVM) models were introduced as a method for implementing the SRM principle. Since then, SVM algorithms have proved able to work effectively with high-dimensional data.

In this project we introduce the SVM model as an effective regression method for highly nonlinear data. Within the scope of an institutional-level project we have no great ambition and present neither new research nor an effective practical application. We concentrate on a systematic presentation of the concepts, the problems, and the training algorithms, showing that SVM deserves deeper study.

In addition, we implemented an SVM training algorithm, presented it at the Third National Conference on Fundamental and Applied Information Technology Research in 2007 (FAIR07), and built a demonstration program. We used this program to experiment with real data taken from a ministry-level research project [20].

The report consists of three chapters and one appendix.
- Chapter 1 sketches the overall picture and introduces the motivation for the research.
- Chapter 2 details the construction of the model.
- Chapter 3 presents a training algorithm in enough detail to be implemented easily.
- The appendix presents the experimental results, including the data taken from [20].
I sincerely thank the Office of Research Management and International Cooperation for creating the conditions that allowed us to complete this project. I also thank the colleagues in the Faculty of Management Informatics and the colleagues from the Faculty of Information Technology, University of Science, Ho Chi Minh City, who took part in the seminars organized for this project and contributed valuable comments.

Although we made every effort to stay close to the objectives, the project as carried out still has shortcomings. We earnestly welcome all comments.

Ho Chi Minh City, 24/04/2009
The authors
Table of Contents

Preface
Table of Contents
Chapter 1: Problem Statement
Chapter 2: The SVM Model
  1. The linear separation SVM model
     The separation problem
     The mathematical model
     The error-tolerant model
  2. The nonlinear separation model
  3. The SVM regression model
     Structure of the hypothesis space
     The mathematical model
Chapter 3: The SMO Training Algorithm
  1. Description of the algorithm
     Checking optimality of the dual solution
     Adjusting the solution
     Building the computation table
     Illustration
     Illustration of the nonlinear case
     Platt's SMO algorithm [25]
     Heuristic for choosing i
     Heuristic for choosing j
  2. The SMO algorithm for the regression problem
     Building the computation table
     Illustration
Conclusion
References
Appendix 1: Experiments
  1. The separation problem
  2. The regression problem
  3. The real-world problem
     Data
     Experimental results
Index
Chapter 1: Problem Statement

The problem of inductive inference has existed for more than 2000 years. However, it was not until the 18th century that the relationship between the empirical sciences and the exact sciences such as mathematics and logic was raised (D. Hume and I. Kant, the demarcation problem) [1]. It can be said that the development of science and the information technology revolution of the 20th century were the premise for the appearance of new ideas in statistical inference.

Although elements of statistical inference have existed for more than two centuries, in the work of Gauss and Laplace, the real foundation of the theory was laid only at the end of the 1920s. By that time descriptive statistics was almost complete, with many statistical laws that described well the events occurring in the real world. Also in the 1920s, the basic models for both approaches, classical statistics (also called parametric statistics) and general statistics, took shape [1].

The development of modern science, beginning at the end of the 19th century, changed our understanding of the general model of the real world from a deterministic model to a stochastic one. The new ideas of this period that matter for statistical inference are those of Karl Popper, Glivenko, Cantelli, Andrei N. Kolmogorov, and Ronald A. Fisher [1].

Karl Popper, in the early 1930s, considered the induction problem from a philosophical viewpoint. His demarcation principle is general and rests on the notion of the falsifiability of a theory. For the first time he linked generalization ability with the notion of capacity.

Also in the early 1930s, Andrei N. Kolmogorov considered the induction problem from the viewpoint of theoretical statistics. His work rests on two main results: the convergence of the empirical distribution to the true distribution (Glivenko and Cantelli, 1933), and the fact that this convergence is exponentially fast and independent of the distribution (Kolmogorov, 1933). These two results are the main basis for the development of the principle of general statistics.

In the same period, Ronald A. Fisher considered the induction problem from the viewpoint of applied statistics. Because practical work at the time demanded fast, simple, and effective computations, R. Fisher proposed a particular approach: estimating the parameters of a density function. This approach split statistical science into two branches, general statistics and particular statistics, the latter also called parametric statistics (footnote 1). While the general statistical model developed slowly, the parametric statistical model developed quickly. Starting in the 1930s, within only ten years the main elements of the parametric statistical model had been introduced. The period from 1930 to 1960 was the golden age of this approach.

Footnote 1: The proper term is parametric statistics.
The main assumptions of the parametric statistical model are [1]:
1. To find a functional dependence from data, statisticians define a set of functions depending on parameters, with a small number of parameters and linear in the parameters;
2. The statistical law of the random component, that is, the error between the model and the real data, follows the normal distribution;
3. Under assumption 2, the maximum likelihood method is a good method.
Today Fisher's scheme is usually referred to as classical statistics. Classical statistics solves three problems, density estimation, regression estimation, and discriminant function estimation, using different parametric models (the maximum likelihood method, R. A. Fisher, 1930) on a solid mathematical basis (Mathematical Methods of Statistics, Harald Cramer, 1946).

In general, statistical inference solves a problem of minimizing a functional on the basis of empirical data. With Fisher's particular approach, classical statistical theory did not consider this functional minimization problem in detail (footnote 2).

In addition, estimating a real-valued function from data is regarded as the central problem of applied statistics. The main techniques used here are the least squares method and the least absolute deviations method, proposed long ago by Gauss and Laplace. However, the analysis of these methods was carried out only in the 20th century. Accordingly, classical statistics emphasizes unbiased estimators (footnote 3). The unbiasedness assumption began to be reconsidered (footnote 4) after James and Stein (1961) constructed an estimator of the mean of a random vector (n ≥ 3) that is normally distributed with a unit covariance matrix. This estimator is biased, and for any fixed number of observations it is uniformly better than the sample mean (an unbiased estimator of the expectation). Later Baranchik gave a whole set of such estimators, which includes the James-Stein estimator.

Furthermore, in real problems not all the assumptions of the parametric statistical model are satisfied. Today's problems have very high dimension, which leads to a combinatorial explosion in the number of parameters. In addition, the law of the random component may not follow the normal distribution (Tukey), and the maximum likelihood method is not always the best method (James and Stein) [1].

There have been attempts to overcome these limitations:
1. P. Huber (1960) developed the robust approach, which relaxes the assumption that the random component is normally distributed;
2. J. Nelder (1970) proposed the generalized linear model, which allows the best model to be chosen;
3. L. Breiman, P. Huber, and J. Friedman considered functions nonlinear in the parameters and used Empirical Risk Minimization (ERM) instead of maximum likelihood.

Footnote 2: The functional minimization problem has become the main problem connected with function approximation and functional analysis.
Footnote 3: Among unbiased estimation methods, the least squares method is the one with the smallest variance.
Footnote 4: In the 1960s the theory of ill-posed problems gave a method for constructing biased estimators. Later this idea was used for the regression estimation problem in statistical learning theory. Classical statistics turned toward the model selection problem.

The information technology revolution 50 years later had a great impact on life and opened new opportunities for creativity in everyday work. In classical statistics the number of model parameters is small, so its results are limited to functions of low dimension. As soon as the information technology revolution made it possible to estimate functions of high dimension, people re-examined Fisher's scheme and returned to general statistics.

There have been efforts to solve high-dimensional problems. Before 1970 the main approaches to multidimensional regression estimation were the least squares method and the least absolute deviations method with functions linear in the parameters. During the 1970s generalized linear functions were used in the hope of finding a small number of basis functions. In the 1980s and 1990s the dictionary method appeared: given a large number of functions in advance, the data are used to select a small number of them and to estimate the coefficients. This family includes Projection Pursuit (Friedman and Stuetzle (1981), Huber (1985)) and MARS (Multivariate Adaptive Regression Splines; see Friedman (1991)); it attracted much attention and became a main tool of multidimensional analysis.
By contrast, the general approach was forgotten for 20 years. In 1958 F. Rosenblatt, a physiologist, proposed the perceptron model for the linear separation problem, a model that can be generalized. The perceptron reflects the classical neurophysiological view of the learning mechanism as the interaction of a large number of simple agents (the McCulloch-Pitts neuron model). A general principle was accepted immediately, namely the ERM principle. The ERM theory for the pattern recognition problem was then constructed at the end of the 1960s.

In 1963 Novikoff presented the convergence theorem for the perceptron algorithm (machine learning seminar at the Institute of Control Sciences, Moscow), which influenced those who attended: computers and simple algorithms that imitate the way humans, animals, and nature work can solve the problem. The theorem raised two questions:
1. How can an optimal solution be found?
2. Is the separation problem the best way to generalize?
Later, in the 1980s, one of the foundational problems (the Glivenko-Cantelli problem) led to the general statistical theory, which is based on the structure of a family of nested hypothesis spaces [1]. Besides the quality of the approximation, this approach is also concerned with the complexity of the hypothesis spaces. Controlling the hypothesis spaces is therefore one of its main tools.

How can the complexity of the hypothesis space be controlled? By the law of large numbers, the frequency of an event converges to the probability of that event. For a family of events, however, it is not under control whether uniform convergence is guaranteed. The complexity of the hypothesis space is related to the theory of convergence. Three notions of complexity of a hypothesis space are discussed (see [1], chapter 2): the annealed entropy, the growth function, and the VC dimension. The convergence theory was built at the end of the 1960s (Vapnik and Chervonenkis, 1968, 1971); its cornerstone is a family of capacity notions for sets of indicator functions (functions taking the values 0 and 1), also known as the VC dimension. The principle of minimizing the error function over spaces of small VC dimension is called the principle of Structural Risk Minimization (SRM).
The continued development of this principle led to a new algorithm called the Support Vector Machine (SVM) [1, 2]. Like the perceptron model, SVM algorithms generalize from the solution of the linear separation problem.

From the perceptron model and its generalization ability (F. Rosenblatt, 1958), the Artificial Neural Network (ANN) model developed and found effective applications in many different fields [3, 4, 5, 6, 7]. What neural networks can do, SVM can do as well, sometimes even more effectively [2, 8, 9]. The successes of various SVM models have demonstrated the capability of this algorithm [8, 9, 10, 11]. In particular, a recent article [9] (Xindong Wu, 2007) ranked SVM among the top 10 data mining algorithms.
Today the volume of data doubles every 20 months (Sever Hayri, 1998). Many regularities hidden inside this enormous volume of data need to be discovered. Economics is no exception. What happens if a company understands the behavior of its customers? An effective business strategy can be devised.

In economic science, data processing is an extremely important task. At each stage of building and testing a model, the data must be processed by some method so that the economic study is consistent with the data. The two main methods of data analysis used in economics are the technical method and the fundamental method [20]. Both rest on probability theory.

We know that the main problem of probability theory is the study of sums of independent random variables with finite variance. Data of small and limited scope may satisfy this assumption, but with such massive data volumes it no longer holds. With the methods above, the accuracy of sampling still has a large influence on the results. How should one sample appropriately for the research question in this context? In addition, with economic globalization, many new factors are acting on economies. Their influence is hidden in the data, and inaccurate sampling can distort the results of the analysis.

The stock market has long been regarded as a high-profit investment field. Stock price forecasting is affected by the interaction of economic and political conditions, policies, and even public psychology; it is very complex and therefore very hard to forecast. There is evidence (Yunos, Zaid, Jamaluddin, Shamsuddin, Sallehuddin, & Alwi, 2001) that technical analysis cannot forecast stock prices accurately.

Footnote 5: Some references use the Vietnamese term "máy vectơ hỗ trợ" for support vector machine.

Recently, computational techniques such as granular computing, rough sets, neural networks, fuzzy sets, and genetic algorithms have been widely used because their forecasting accuracy and computational efficiency are better than those of technical analysis. Neural networks have proved effective in stock price forecasting (Yoon & Swales, 1991); they can decode the nonlinearity of the data and describe the characteristics of the stock market (Lapedes & Farber, 1987), forecast market indices (Chong & Kyoung, 1992) (Freisleben, 1992), recognize patterns in trading charts (Dutta & Shekhar, 1990), handle corporate bond rates, estimate option prices (Li, 1994), and give buy and sell signals (Chapman, 1994) (Margarita, 1992).
The need for additional methods and techniques for data processing keeps growing. Data mining methods and techniques, which have been, are being, and will continue to be proposed, have proved effective in many different fields, including economics. Examples are SVM, association rule mining, and rough set theory.

We find that SVM algorithms are built on the SRM principle with a solid mathematical foundation. In addition, SVM models have been shown to perform well compared with artificial neural network models and other statistical models [21]. We hope that SVM models provide a further effective tool for the very large demand for finding functional relationships from data in economics [11, 12, 13, 14, 15, 16, 17, 18, 19]. The effectiveness of a model is for the most part demonstrated through learning on sample data sets.
To answer the questions above satisfactorily, we need to return to the theory and carry out research of a fundamental nature. Only then do we have a basis for proposing new models and applying them to real problems. Some questions arise naturally for a specific SVM model:
1. Is the model consistent?
2. How can the nested hypothesis spaces be controlled?
3. What is the complexity of the model's training algorithm?
This is a complex undertaking. Within the scope of an institutional-level project we carry out only part of this research. The objectives set for the project are:
1. Introduce the basic SVM models
2. Introduce a fast training algorithm
3. Build an experimental implementation

Through this project we want to introduce a model for the regression problem and apply it to the estimation of functional relationships from economic data. The real data are taken from a ministry-level research project (2007), in which the authors used a linear regression model following the parametric statistics approach [20].

The basic SVM models are presented following the references [1, 2, 21, 22, 23, 24]. The fast training algorithm is SMO [25, 26, 27, 28], chosen for the following reasons:
- It makes clear how the algorithm is constructed, so it can be learned from;
- It leaves room for improvement [26, 27, 29];
- It is still topical [13].

The experimental implementation follows an object-oriented design that allows both new algorithms and the source code to be used effectively.

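Chapter 2: The SVM Model

In statistical learning theory the supervised learning problem is formulated as follows [1, 2, 21]. We are given a training set $\{(x_i, y_i)\}$ drawn according to an unknown probability distribution $p(x, y)$. Suppose there is a functional relationship in which $y$ depends on $x$. For a candidate function $f$ we define a function $V(y, f(x))$ that measures the loss (loss function) incurred when $f$ is accepted. The function $f$ we look for is the solution of the problem of minimizing the risk functional:

\[ \int V(y, f(x))\, p(x, y)\, dx\, dy \to \min \qquad (2.1) \]

Because $p$ is unknown, we look for the solution within a class of functions (called the hypothesis space) by Empirical Risk Minimization (ERM), where $m$ is the number of training samples:

\[ \frac{1}{m} \sum_{i=1}^{m} V(y_i, f(x_i)) \to \min \qquad (2.2) \]

Let $f^*$ be the solution of the risk minimization problem (2.1) and $\hat{f}$ the solution of the empirical risk minimization problem (2.2). Let $L$ be the value of the risk functional at $f^*$ and $L_E$ the value of the empirical risk at $\hat{f}$. We have

\[ L = \int V(y, f^*(x))\, p(x, y)\, dx\, dy \qquad (2.3) \]

\[ L_E = \frac{1}{m} \sum_{i=1}^{m} V(y_i, \hat{f}(x_i)) \qquad (2.4) \]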
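The empirical risk in (2.2) can be made concrete with a small sketch. The squared loss, the four data points, and the three candidate functions below are all illustrative assumptions, not taken from the report; the point is only that ERM picks the candidate with the smallest average loss on the sample.

```python
from typing import Callable, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) with scalar x, to keep the sketch short


def squared_loss(y: float, prediction: float) -> float:
    """One possible V(y, f(x)); the text does not fix a particular loss."""
    return (y - prediction) ** 2


def empirical_risk(
    f: Callable[[float], float],
    samples: Sequence[Sample],
    loss: Callable[[float, float], float] = squared_loss,
) -> float:
    """Average loss of f over the sample, i.e. the objective in (2.2)."""
    return sum(loss(y, f(x)) for x, y in samples) / len(samples)


# Hypothetical data generated by y = 2x (an assumption for illustration).
samples = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# Hypothetical hypothesis space: three lines through the origin.
hypotheses = {
    "slope_1": lambda x: 1.0 * x,
    "slope_2": lambda x: 2.0 * x,
    "slope_3": lambda x: 3.0 * x,
}

risks = {name: empirical_risk(f, samples) for name, f in hypotheses.items()}
best = min(risks, key=risks.get)  # the ERM choice within this tiny space
print(risks, best)
```

The true risk (2.1) cannot be computed this way because $p(x, y)$ is unknown, which is why the gap between the two quantities matters.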
The natural question is how to estimate the difference between these two values. In the SRM approach, mentioned in the preface, the risk depends not only on the error of the predictor $f(x)$ with respect to the true value $y$ but also on the complexity of the hypothesis space.

There are several notions that measure the complexity of a family of functions [1]. In this report we choose to introduce the notion of VC dimension (footnote 6) [1, 2]. We define it in detail in the course of the modelling.

Within the scope of the project we do not study estimates of the difference between $L$ (2.3) and $L_E$ (2.4); we concentrate on the modelling process and on the training algorithm. Before going into detail, however, we present one theorem that shows the role of the VC dimension in such estimates. The theorem below bounds the difference between $L$ and $L_E$ in terms of the confidence level, the VC dimension, and the size of the data set.

Footnote 6: VC stands for the names of the two authors who proposed the concept, Vapnik and Chervonenkis (1998).
Theorem 1. Suppose $\nu$ is the VC dimension of the hypothesis space $H$ and $m$ is the size of the data set. With probability $1 - \eta$, the smallest expected error $L$ and the smallest empirical error $L_E$ satisfy the constraint

\[ |L - L_E| \le 4 \sqrt{\frac{\nu \left(1 + \log \frac{2m}{\nu}\right) - \log \frac{\eta}{4}}{m}} \qquad (2.5) \]

independently of the probability distribution $p(x, y)$.
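To get a feel for how such a bound behaves, here is a minimal sketch that evaluates its right-hand side. It assumes the bound has the form $4\sqrt{(\nu(1 + \log(2m/\nu)) - \log(\eta/4))/m}$ with natural logarithms; the source equation is damaged, so this form is itself an assumption, and the function name `vc_gap_bound` is hypothetical.

```python
import math


def vc_gap_bound(vc_dim: int, m: int, eta: float) -> float:
    """Right-hand side of the assumed form of bound (2.5).

    vc_dim : VC dimension of the hypothesis space (finite, positive)
    m      : size of the data set (taken >= vc_dim so the log term is positive)
    eta    : the bound holds with probability 1 - eta
    """
    if vc_dim <= 0 or m < vc_dim or not 0.0 < eta < 1.0:
        raise ValueError("need 0 < vc_dim <= m and 0 < eta < 1")
    inner = (vc_dim * (1.0 + math.log(2.0 * m / vc_dim)) - math.log(eta / 4.0)) / m
    return 4.0 * math.sqrt(inner)


# The bound tightens as m grows and loosens as the VC dimension grows.
for m in (1_000, 100_000, 10_000_000):
    print(m, round(vc_gap_bound(vc_dim=3, m=m, eta=0.05), 4))
```

Whatever the exact constants, the qualitative behaviour is what the text relies on: the bound is only informative when the VC dimension is finite and small relative to $m$.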
Clearly the theorem above is meaningful only for hypothesis spaces with finite VC dimension. Controlling the hypothesis space therefore plays an important role in implementing the SRM principle. Under the SRM principle the hypothesis spaces consist of nested families of functions [1, 2]; they form a linear order, which makes it easy to build a learning model. To be able to use estimates such as theorem (2.5), the VC dimension of each hypothesis subspace must also be finite.

A problem of high dimension is a complex problem (R. Bellman, 1960). The SVM model is built on the SRM principle and is able to control the complexity of the hypothesis space, which allows high-dimensional problems to be solved. Starting from the linear separation problem, which can be generalized (the perceptron model of F. Rosenblatt, 1958), the SVM model was developed and gradually became one of the effective methods for estimating functions from empirical data. Building on the perceptron model, the SVM model looks for a separating hyperplane that is optimal in the sense of maximizing the corridor between the two classes of objects [1].

The hypothesis space of the SVM model is thus built from the family of functions [1, 2]

\[ f(x, w) = w^T x + b \qquad (2.6) \]

To implement the SRM principle, the SVM model constructs nested families of functions of the form

\[ \{ f(x, w) = w^T x + b,\ |w| \le A \} \qquad (2.7) \]

Instead of solving the problem for each hypothesis subspace (2.7), the SVM model controls these hypothesis spaces through one parameter [1].

In this chapter we are not concerned with measurement scales or with the attributes that describe the objects. We take for granted that an object is described by a vector of n numbers and concentrate on the mathematical models. The models introduced in this chapter are: the linear separation model, the error-tolerant model, the nonlinear separation model, and the regression model.
The separation problem is a special case of the classification problem. This report presents SVM for the separation problem; the SVM model for general classification is a separate topic. The SVM approach is based on linear separation [1].

Suppose the problem is linearly separable. There are very many hyperplanes that separate the two classes. The SVM approach selects an optimal one. Intuitively, this hyperplane gives the largest width of the margin determined by the hyperplane. The following figure illustrates the important concepts of SVM: the optimal hyperplane, the support vectors, and the margin. The optimal hyperplane is the separating hyperplane with the largest margin, and it is determined by the support vectors.

Intuitively, the optimal hyperplane is unique. The equation of a hyperplane, however, is not unique. We look for a representation, called canonical, such that the canonical representation of the optimal hyperplane is unique.
Given the data set $\{(x_i, y_i)\}_{i = 1..m}$, where $y_i \in \{-1, 1\}$, consider an arbitrary hyperplane $w^T x + b = 0$ and define

\[ d_+ = \min_{y_i = 1} \frac{w^T x_i + b}{|w|}, \qquad d_- = \max_{y_i = -1} \frac{w^T x_i + b}{|w|}, \qquad d = d_+ - d_- \]

This hyperplane is a solution of the linear separation problem if $d > 0$. In that case $d$ is called the width of the margin.

Suppose the data set is separable and the hyperplane under consideration is a separating hyperplane. We can choose $b$ so that $d_+ = -d_-$. With such a $b$, the equation

\[ \frac{w^T x + b}{d|w|/2} = 0 \]

is an equation of the hyperplane, and moreover

\[ \min_{y_i = 1} \frac{w^T x_i + b}{d|w|/2} = 1, \qquad \max_{y_i = -1} \frac{w^T x_i + b}{d|w|/2} = -1 \]

An equation satisfying the following constraint is called the canonical equation of the hyperplane (also called the canonical hyperplane) with respect to the given data set X.

\[ \min_{i} \; y_i (w^T x_i + b) = 1 \qquad (2.8) \]
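The normalization just described can be sketched in a few lines. The function name `canonical_form` and the starting hyperplane are hypothetical; the five points are the ones used in the report's first worked example. The sketch shifts $b$ so that $d_+ = -d_-$ and then divides by $d|w|/2$.

```python
import math


def canonical_form(w, b, xs, ys):
    """Rescale a separating hyperplane (w, b) to its canonical representation.

    Returns (w_c, b_c, d), where d is the margin width of the input hyperplane.
    Raises ValueError if (w, b) does not separate the two classes.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    scores = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in xs]
    d_plus = min(s for s, y in zip(scores, ys) if y == 1) / norm
    d_minus = max(s for s, y in zip(scores, ys) if y == -1) / norm
    d = d_plus - d_minus
    if d <= 0:
        raise ValueError("not a separating hyperplane")
    # Shift b so that the hyperplane sits midway between the classes.
    b_centered = b - norm * (d_plus + d_minus) / 2.0
    # Divide by d|w|/2 so that the closest points score exactly +1 and -1.
    scale = d * norm / 2.0
    return [wi / scale for wi in w], b_centered / scale, d


xs = [(1, 1), (3, 2), (4, 4), (1, 5), (3, 6)]
ys = [1, 1, 1, -1, -1]

# Hypothetical starting point: same direction as the optimum, arbitrary scale and offset.
w_c, b_c, d = canonical_form([1.0, -2.0], 7.0, xs, ys)
print(w_c, b_c, d)
```

In canonical form the margin width equals $2/|w|$, which is why maximizing the margin turns into minimizing $w^T w$.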
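We are interested in the separating hyperplane with the largest $d$. With the canonical representation the optimal hyperplane is unique [1]. In canonical form the margin width is $d = 2/|w|$, so maximizing $d$ amounts to minimizing $w^T w$. The mathematical model of SVM for the linear separation problem is therefore the following quadratic program (also called the primal problem):

\[ \frac{1}{2} w^T w \to \min, \qquad (w^T x_i + b)\, y_i \ge 1 \qquad (2.9) \]

where $\{(x_i, y_i)\}_{i = 1..m}$ is the data of the $m$ objects and $y_i \in \{-1, 1\}$ gives the class of the $i$-th object. Rewritten in matrix form:

\[ \frac{1}{2} w^T w \to \min, \qquad Y X^T w + b\, y \ge e \qquad (2.10) \]

with

\[ X = (x_1\ x_2\ \dots\ x_m), \quad y^T = (y_1\ y_2\ \dots\ y_m), \quad e^T = (1\ 1\ \dots\ 1), \quad Y = \mathrm{diag}(y_1\ y_2\ \dots\ y_m) \]

The dual problem of the primal problem (2.10) is

\[ \frac{1}{2} a^T D a - e^T a \to \min, \qquad y^T a = 0, \quad a \ge 0 \qquad (2.11) \]

where $D = Y^T X^T X Y$.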
From duality theory we have the following dual pairs of constraints (see appendix 3):

  1. $y_i (w^T x_i + b) \ge 1$   paired with   $a_i \ge 0$
  2. $w = X Y a$   paired with   $w$ free
  3. $y^T a = 0$   paired with   $b$ free

We need to find the support vectors, which are the vectors on the two margin boundaries, where the constraints hold with equality. From the first dual pair, a vector that corresponds to a positive component of $a$ is clearly a support vector.
Example 1. Consider the data table

         M1   M2   M3   M4   M5
  x1      1    3    4    1    3
  x2      1    2    4    5    6
  y       1    1    1   -1   -1

We have m = 5, n = 2. The quadratic program is:

\[ \frac{1}{2}(w_1^2 + w_2^2) \to \min \]
\[ \begin{aligned}
 w_1 + w_2 + b &\ge 1 \\
 3w_1 + 2w_2 + b &\ge 1 \\
 4w_1 + 4w_2 + b &\ge 1 \\
 -w_1 - 5w_2 - b &\ge 1 \\
 -3w_1 - 6w_2 - b &\ge 1
\end{aligned} \]

The figure shows that the support vectors are M3, M4, and M5. Solving the equations that correspond to the support vectors,

\[ 4w_1 + 4w_2 + b = 1, \qquad -w_1 - 5w_2 - b = 1, \qquad -3w_1 - 6w_2 - b = 1, \]

gives $w_1 = 0.4$, $w_2 = -0.8$, and $b = 2.6$.
This result is only intuitive and not yet convincing; we need to check its optimality. First compute

\[ XY = \begin{pmatrix} 1 & 3 & 4 & -1 & -3 \\ 1 & 2 & 4 & -5 & -6 \end{pmatrix} \]

Since M1 and M2 are not support vectors, $a_1 = a_2 = 0$. The second and third constraints give

\[ \begin{aligned}
 4a_3 - a_4 - 3a_5 &= 0.4 \\
 4a_3 - 5a_4 - 6a_5 &= -0.8 \\
 a_3 - a_4 - a_5 &= 0
\end{aligned} \]

so $a_3 = a_5 = 0.4$ and $a_4 = 0$.

Checking the solutions: the primal solution $w^T = (0.4\ \ {-0.8})$, $b = 2.6$ and the dual solution $a^T = (0\ \ 0\ \ 0.4\ \ 0\ \ 0.4)$ satisfy the optimality criterion.
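The optimality check for this five-point example can be repeated numerically. The sketch below uses only the values stated in the text ($w = (0.4, -0.8)$, $b = 2.6$, $a = (0, 0, 0.4, 0, 0.4)$); the variable names and the tolerance `TOL` are assumptions made for illustration.

```python
XS = [(1, 1), (3, 2), (4, 4), (1, 5), (3, 6)]  # M1..M5
YS = [1, 1, 1, -1, -1]
W, B = (0.4, -0.8), 2.6                         # primal solution from the text
A = (0.0, 0.0, 0.4, 0.0, 0.4)                   # dual solution from the text
TOL = 1e-9


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


# Primal feasibility: y_i (w.x_i + b) >= 1 for every point.
margins = [y * (dot(W, x) + B) for x, y in zip(XS, YS)]
primal_feasible = all(m >= 1 - TOL for m in margins)

# Dual pairs 2 and 3: w = XYa and y.a = 0.
w_from_dual = [sum(a * y * x[k] for a, y, x in zip(A, YS, XS)) for k in range(2)]
stationary = all(abs(p - q) < TOL for p, q in zip(W, w_from_dual))
balanced = abs(dot(A, YS)) < TOL

# Complementary slackness: a_i > 0 only where the margin constraint is tight.
slack_ok = all(abs(a * (m - 1)) < TOL for a, m in zip(A, margins))

# Equal objective values certify optimality of both solutions.
primal_value = 0.5 * dot(W, W)
dual_value = sum(A) - 0.5 * dot(w_from_dual, w_from_dual)

on_margin = [i + 1 for i, m in enumerate(margins) if abs(m - 1) < TOL]
print(margins, on_margin, primal_value, dual_value)
```

Note that M4 lies on the margin boundary although its multiplier is zero; the solution is carried by M3 and M5 alone.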
In practice, the representation of the objects in n-dimensional space is often not separable. The optimal hyperplane is then chosen according to two conflicting objectives: making the margin as wide as possible and making the separation error as small as possible.
First we need to define the VC dimension for the separation problem. Suppose the data are chosen from some set X. The VC dimension of a family of functions F with respect to X is the largest number of vectors that can be split into two classes in every possible way by functions belonging to F.

Suppose the data vectors are taken from $\mathbb{R}^n$. Consider the family of functions representing hyperplanes, a family depending on the parameter $a \in \mathbb{R}^n$: $F = \{ f_a : x \in \mathbb{R}^n \mapsto a^T x + b \}$. The VC dimension of F equals $n + 1$.

As mentioned, SVM works with canonical hyperplanes, structured into nested hypothesis spaces:

\[ \{ f(x, w) = w^T x + b,\ |w| \le A \} \]

The following theorem [1] gives the VC dimension of these hypothesis spaces when the data are taken from a bounded region of $\mathbb{R}^n$.

Theorem 2. Let the data set have components $x$ whose magnitude is bounded above by $D$, and let the hypothesis space consist of canonical hyperplanes with $w$ whose magnitude is bounded above by $A$. Then the VC dimension is bounded above by $\min(\mathrm{int}(D^2 A^2),\ n) + 1$.

Clearly, with theorem 2, theorem 1 becomes meaningful for the linear separation problem because the VC dimension is bounded.
In the error-tolerant model some vectors are allowed to violate the margin constraint, that is, $y_i (w^T x_i + b) < 1$. The mathematical model of the error-tolerant model is the following quadratic program (written in matrix form):

\[ \frac{1}{2} w^T w + C e^T z \to \min, \qquad Y X^T w + b\, y \ge e - z, \quad z \ge 0 \qquad (2.12) \]

The parameter C is used to control the hypothesis spaces [1]. More precisely, the larger C is, the more complex the hypothesis space in which the solution is sought, that is, the larger its VC dimension. The width of the margin accordingly also depends on C, which is why the error-tolerant model is also called the soft margin model.
The quadratic program for the data given in example 1 is:

\[ \frac{1}{2}(w_1^2 + w_2^2) + C(z_1 + z_2 + z_3 + z_4 + z_5) \to \min \]
\[ \begin{aligned}
 w_1 + w_2 + b &\ge 1 - z_1 \\
 3w_1 + 2w_2 + b &\ge 1 - z_2 \\
 4w_1 + 4w_2 + b &\ge 1 - z_3 \\
 -w_1 - 5w_2 - b &\ge 1 - z_4 \\
 -3w_1 - 6w_2 - b &\ge 1 - z_5 \\
 z_i &\ge 0 \quad (i = 1..5)
\end{aligned} \]

The dual problem of problem (2.12) is given by

\[ \frac{1}{2} a^T D a - e^T a \to \min, \qquad y^T a = 0, \quad 0 \le a \le C \qquad (2.13) \]
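where $D = Y^T X^T X Y$. For the example above, compute:

\[ XY = \begin{pmatrix} 1 & 3 & 4 & -1 & -3 \\ 1 & 2 & 4 & -5 & -6 \end{pmatrix} \]

Hence

\[ D = \begin{pmatrix}
 2 & 5 & 8 & -6 & -9 \\
 5 & 13 & 20 & -13 & -21 \\
 8 & 20 & 32 & -24 & -36 \\
 -6 & -13 & -24 & 26 & 33 \\
 -9 & -21 & -36 & 33 & 45
\end{pmatrix} \]

and the dual problem is:

\[ \begin{aligned}
 & a_1^2 + 5a_1a_2 + 8a_1a_3 - 6a_1a_4 - 9a_1a_5 + \tfrac{13}{2}a_2^2 + 20a_2a_3 - 13a_2a_4 - 21a_2a_5 \\
 & \quad + 16a_3^2 - 24a_3a_4 - 36a_3a_5 + 13a_4^2 + 33a_4a_5 + \tfrac{45}{2}a_5^2 - a_1 - a_2 - a_3 - a_4 - a_5 \to \min \\
 & a_1 + a_2 + a_3 - a_4 - a_5 = 0 \\
 & 0 \le a_i \le C \quad (i = 1..5)
\end{aligned} \]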
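The matrix $D = Y^T X^T X Y$ and the dual objective can be reproduced with a short sketch. The helper names `build_d` and `dual_objective` are hypothetical; the data and the multiplier vector $a = (0, 0, 0.4, 0, 0.4)$ are the ones stated in the text.

```python
XS = [(1, 1), (3, 2), (4, 4), (1, 5), (3, 6)]  # M1..M5
YS = [1, 1, 1, -1, -1]


def build_d(xs, ys):
    """D[i][j] = y_i * y_j * <x_i, x_j>, i.e. D = Y^T X^T X Y."""
    m = len(xs)
    return [
        [ys[i] * ys[j] * sum(p * q for p, q in zip(xs[i], xs[j])) for j in range(m)]
        for i in range(m)
    ]


def dual_objective(d, a):
    """Value of (1/2) a^T D a - e^T a, the quantity minimized in the dual."""
    m = len(a)
    quad = sum(a[i] * d[i][j] * a[j] for i in range(m) for j in range(m))
    return 0.5 * quad - sum(a)


D = build_d(XS, YS)
for row in D:
    print(row)

value = dual_objective(D, [0.0, 0.0, 0.4, 0.0, 0.4])
print(value)  # the negative of the primal optimum, by strong duality
```

Only inner products of data vectors enter $D$, which is the property the kernel construction later relies on.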
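The dual pairs of constraints are:

  1. $y_i (w^T x_i + b) \ge 1 - z_i$   paired with   $a_i \ge 0$
  2. $z_i \ge 0$   paired with   $a_i \le C$
  3. $w = X Y a$   paired with   $w$ free
  4. $y^T a = 0$   paired with   $b$ free

We have

\[ 0 < a_i < C \ \Rightarrow\ y_i (w^T x_i + b) = 1 \]

Hence the $i$-th vector, corresponding to a component $a_i$ that is positive and smaller than $C$, is a support vector. The following example shows the role of C.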
Take again the data given in example 1.
- With C = 1, the primal solution $w^T = (0.4, -0.8)$, $b = 2.6$, $z^T = (0, 0, 0, 0, 0)$ and the dual solution $a^T = (0, 0, 0.4, 0, 0.4)$ satisfy the optimality conditions. The objective function attains its minimum value 0.4.
- With C = 0.2, the primal solution $w^T = (0.25, -0.5)$, $b = 1.25$, $z^T = (0, 0, 0.75, 0, 0)$ and the dual solution $a^T = (0.03125, 0, 0.2, 0.05625, 0.175)$ satisfy the optimality conditions. The objective function attains its minimum value 0.30625.

This result shows that the solution may differ between hypothesis spaces. In the example we also observe two things:
1. For large C, the solution coincides with that of the separable model;
2. For small C, the objective function decreases, which shows how the fit between the model and the data changes with C.
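Both reported solutions can be checked against the optimality conditions of the soft margin model. The sketch below is a minimal verification under the dual pairs listed earlier; the function names and the tolerance are assumptions, while all numbers come from the text.

```python
XS = [(1, 1), (3, 2), (4, 4), (1, 5), (3, 6)]  # M1..M5
YS = [1, 1, 1, -1, -1]


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def primal_objective(w, z, c):
    """(1/2) w^T w + C * sum(z), the objective of problem (2.12)."""
    return 0.5 * dot(w, w) + c * sum(z)


def satisfies_optimality(w, b, z, a, c, tol=1e-9):
    """Check primal/dual feasibility, w = XYa, and complementary slackness."""
    margins = [y * (dot(w, x) + b) for x, y in zip(XS, YS)]
    primal_ok = all(m >= 1 - zi - tol and zi >= -tol for m, zi in zip(margins, z))
    dual_ok = all(-tol <= ai <= c + tol for ai in a) and abs(dot(a, YS)) <= tol
    w_dual = [sum(ai * y * x[k] for ai, y, x in zip(a, YS, XS)) for k in range(len(w))]
    stationary = all(abs(p - q) <= tol for p, q in zip(w, w_dual))
    slack_ok = all(
        abs(ai * (m - 1 + zi)) <= tol and abs((c - ai) * zi) <= tol
        for ai, m, zi in zip(a, margins, z)
    )
    return primal_ok and dual_ok and stationary and slack_ok


hard = dict(w=(0.4, -0.8), b=2.6, z=(0, 0, 0, 0, 0), a=(0, 0, 0.4, 0, 0.4), c=1.0)
soft = dict(w=(0.25, -0.5), b=1.25, z=(0, 0, 0.75, 0, 0),
            a=(0.03125, 0, 0.2, 0.05625, 0.175), c=0.2)

for case in (hard, soft):
    print(satisfies_optimality(**case), primal_objective(case["w"], case["z"], case["c"]))
```

With C = 0.2 the multiplier of M3 reaches the upper bound C and M3 falls inside the margin ($z_3 = 0.75$), which is how a smaller C trades margin violations for a wider margin.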
In the linear separation problem the solution (a separating hyperplane) is usually not unique. The Support Vector Machine (SVM) approach gives an optimality criterion that singles out the best solution among the feasible ones. In particular, this approach determines the separating hyperplane through the support vectors instead of through an explicit equation. With the support vectors we can change the space of the problem without making the transformation explicit. This allows nonlinear separation models to be built in a flexible and effective way.

The following figure illustrates the nonlinear case, in which the separating line is replaced by a separating curve. For problems that are not separable (in the sense that a separating hyperplane exists), we solve them by looking for a nonlinear separation.

The essence of the SVM approach is linear separation. To perform nonlinear separation, SVM maps the data of the objects into a new space of higher dimension. The problem in the new space is separable.

Footnote 7: Overfitting (between the model and the sample) is not a good phenomenon.
Footnote 8: Some references use the Vietnamese term "vectơ hỗ trợ" for support vector. In this report we use the term "vectơ tựa" because of the actual role of these vectors.
Consider the following data table:

         M1   M2   M3   M4   M5   M6   M7
  x       1    2    4    5    6    8    9
  y       1    1   -1   -1   -1    1    1

These two classes are not separable. We map X into another space of dimension 2 by the map:

\[ x \mapsto \Phi(x) = (x,\ (x - 5)^2) \]

The new data table is separable in the new space.

  x    (x - 5)^2     y
  1       16         1
  2        9         1
  4        1        -1
  5        0        -1
  6        1        -1
  8        9         1
  9       16         1
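The effect of the map $\Phi(x) = (x, (x - 5)^2)$ on the seven-point table can be checked directly. In the sketch below the separating line in the new space (second coordinate equal to 5) is a hypothetical choice made only to demonstrate separability; it is not claimed to be the optimal hyperplane.

```python
XS = [1, 2, 4, 5, 6, 8, 9]          # M1..M7
YS = [1, 1, -1, -1, -1, 1, 1]


def phi(x):
    """The feature map from the text: x -> (x, (x - 5)^2)."""
    return (x, (x - 5) ** 2)


def label_changes(xs, ys):
    """Number of class changes along the line; more than one means no single
    threshold on x can separate the two classes."""
    ordered = [y for _, y in sorted(zip(xs, ys))]
    return sum(1 for p, q in zip(ordered, ordered[1:]) if p != q)


mapped = [phi(x) for x in XS]

# Hypothetical separating line in the new space: (x - 5)^2 = 5, i.e. w = (0, 1), b = -5.
w, b = (0.0, 1.0), -5.0
separated = all(y * (w[0] * p[0] + w[1] * p[1] + b) > 0 for p, y in zip(mapped, YS))

print(label_changes(XS, YS), mapped, separated)
```

In the original one-dimensional space the labels change twice along the line, so no threshold works; after the map a single straight line does.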
With this approach we face two difficulties related to the map \(\Phi\):
1. How to determine this map, and
2. How to return to the original representation space.
In fact, with SVM we do not have to worry about this, because solving the problem is equivalent to determining the support vectors. Indeed:

\[ x \mapsto w^{T}x + b = \sum_{i=1}^{m} \alpha_i y_i \langle x_i, x\rangle + b = \sum_{\alpha_i > 0} \alpha_i y_i \langle x_i, x\rangle + b \]
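The primal problem for nonlinear separation is no different from the linear one, except that the vectors are given in the new space:

\[
\min\ \tfrac{1}{2} w^{T} w + C z^{T} e
\quad \text{subject to} \quad
\begin{cases}
\left(w^{T}\Phi(x_i) + b\right) y_i \ge 1 - z_i \\
z \ge 0
\end{cases}
\]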
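The dual problem is likewise no different from the linear case,

\[
\min\ \tfrac{1}{2} \alpha^{T} D \alpha - e^{T}\alpha
\quad \text{subject to} \quad
\begin{cases}
y^{T}\alpha = 0 \\
0 \le \alpha \le C
\end{cases}
\tag{2.14}
\]

except that the entries \(\langle x_i, x_j\rangle\) of the symmetric matrix \(X^{T}X\) are replaced by the new entries \(\langle \Phi(x_i), \Phi(x_j)\rangle\). Denote

\[ k_{ij} = \langle \Phi(x_i), \Phi(x_j)\rangle \]

The matrix \(K = (k_{ij})\) is called the kernel matrix.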
In practice we do not need to determine \(\Phi\); instead we use a function called the kernel function:

\[ k_{ij} = \mathrm{kernel}(x_i, x_j) \]

In many cases it is a function of the inner product:

\[ k_{ij} = \mathrm{kernel}(\langle x_i, x_j\rangle) \]
The study of kernel functions is beyond the scope of this project. Here we only introduce some commonly used kernel functions [2].
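Table of commonly used kernel functions

Type                  Expression
Linear                \( \mathrm{kernel}(x, y) = \langle x, y\rangle \)
Polynomial            \( \mathrm{kernel}(x, y) = (\langle x, y\rangle + 1)^{d} \)
Gaussian              \( \mathrm{kernel}(x, y) = e^{-\frac{\|x-y\|^{2}}{2\sigma^{2}}} \)
Hyperbolic tangent    \( \tanh(\kappa\langle x, y\rangle - \delta) \)
Inverse               \( \dfrac{1}{\sqrt{\|x-y\|^{2} + \beta}} \)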
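The kernels in the table translate directly into code. The sketch below is an illustration only: the function names, the default parameter values, and the reading of the garbled parameter symbols as kappa, delta and beta are assumptions, and the helper `kernel_matrix` builds the matrix K = (k_ij) for a list of points.

```python
import math


def dot(x, y):
    """Inner product <x, y> of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def sq_dist(x, y):
    """Squared Euclidean distance ||x - y||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, d=2):
    return (dot(x, y) + 1) ** d


def gaussian_kernel(x, y, sigma=1.0):
    return math.exp(-sq_dist(x, y) / (2 * sigma ** 2))


def tanh_kernel(x, y, kappa=1.0, delta=0.0):
    return math.tanh(kappa * dot(x, y) - delta)


def inverse_kernel(x, y, beta=1.0):
    # Requires ||x - y||^2 + beta > 0.
    return 1.0 / math.sqrt(sq_dist(x, y) + beta)


def kernel_matrix(points, kernel):
    """Build K = (k_ij) with k_ij = kernel(x_i, x_j)."""
    return [[kernel(p, q) for q in points] for p in points]


if __name__ == "__main__":
    pts = [(1.0, 2.0), (3.0, 4.0), (0.0, -1.0)]
    for row in kernel_matrix(pts, gaussian_kernel):
        print(row)
```

Because every kernel here is symmetric in its two arguments, the resulting matrix K is symmetric as well.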
When a kernel function is used, we do not need to return to the original space. For each x, we decide which class x belongs to from the result:

\[ x \mapsto w^{T}\Phi(x) + b = \sum_{\alpha_j > 0} y_j \alpha_j \Phi(x_j)^{T}\Phi(x) + b = \sum_{\alpha_j > 0} y_j \alpha_j k(x_j, x) + b \]
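As an illustration of this decision rule, the sketch below classifies the seven points of the earlier toy example using only kernel evaluations. The kernel is the one induced by \(\Phi(x) = (x, (x-5)^2)\). The multipliers and the offset are not taken from the report: they were worked out by hand for the hard-margin case of this toy data (support vectors x = 2, 4, 6, 8 with \(\alpha = 1/64\), b = -1.25), and all names are hypothetical.

```python
# Toy data: (x, y) pairs with labels in {+1, -1}.
DATA = [(1, 1), (2, 1), (4, -1), (5, -1), (6, -1), (8, 1), (9, 1)]


def kernel(u, v):
    """k(u, v) = <Phi(u), Phi(v)> for Phi(x) = (x, (x - 5)^2)."""
    return u * v + (u - 5) ** 2 * (v - 5) ** 2


# Assumed solution (derived by hand, hard margin): (x_j, y_j, alpha_j).
SUPPORT = [(2, 1, 1 / 64), (8, 1, 1 / 64), (4, -1, 1 / 64), (6, -1, 1 / 64)]
B = -1.25


def decision(x):
    """Sum over support vectors of y_j * alpha_j * k(x_j, x), plus b."""
    return sum(y * a * kernel(xj, x) for xj, y, a in SUPPORT) + B


def classify(x):
    """Return 1 if the decision value is >= 0, otherwise -1."""
    return 1 if decision(x) >= 0 else -1


if __name__ == "__main__":
    for x, y in DATA:
        print(x, y, decision(x), classify(x))
```

The map \(\Phi\) never appears in `decision`; only the kernel values between x and the four support vectors are needed, and the support vectors sit exactly on the margins (decision value +1 or -1).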
In essence, SVM is a method for estimating the components of the normal vector of the (optimal) hyperplane. Is this estimate consistent? Theorems 1 and 2 confirm this in the linear case, where the VC dimension is finite. This no longer holds in the nonlinear case: there we solve the problem in a new space whose VC dimension may be infinite [1, 2]. However, SVM is a method that implements the SRM principle with a structure of nested function spaces of finite VC dimension [1]. The model parameter C is used to control the VC dimension of these hypothesis subspaces.
To determine the best value of C, we have to solve an optimization problem [1]. This is a separate topic of SVM that lies outside the scope of this project, so we mention it only as a guarantee for applying SVM models to practical problems.
Unlike the separation problem, where y takes the value 1 or -1, the regression problem works with continuous values of y. Intuitively, we split the data set into two classes so that a model can be built with SVM.
The following example shows how SVM, which by nature works on two classes, can solve a regression problem.
Consider the data table:
X y
1
1

1
3
2 3
3
3
4 5
4
6
7
6
The regression equation is: [...]
By duplicating each point and pushing the two copies to opposite sides along the y axis, we obtain two classes. SVM can now be applied.
From the intuition above we now build the mathematical model. To see how the mathematical model is set up, consider the following chain of formal transformations.
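The SVM problem for the new data vectors x and the new normal vector w is

\[
\min\ \tfrac{1}{2} w^{T} w + C e^{T} z
\quad \text{subject to} \quad
\begin{cases}
Y\left(X^{T} w + b e\right) \ge e - z \\
z \ge 0
\end{cases}
\]

which is equivalent to

\[
\min\ \tfrac{1}{2} w^{T} w + C e^{T} z
\quad \text{subject to} \quad
\begin{cases}
Y\left(X^{T} w - y + b e\right) \ge e - z \\
z \ge 0
\end{cases}
\]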
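Now replace each \((x_i, y_i)\) by two new points divided between the two classes: \((x_i,\ y_i - 1 - \varepsilon)\), belonging to the class with label 1, and \((x_i,\ y_i + 1 + \varepsilon)\), belonging to the class with label -1. We obtain the problem:

\[
\min\ \tfrac{1}{2} w^{T} w + C e^{T} z
\quad \text{subject to} \quad
\begin{cases}
X^{T} w - y + b e \ge -\varepsilon e - z^{+} \\
-X^{T} w + y - b e \ge -\varepsilon e - z^{-} \\
z^{+},\ z^{-} \ge 0
\end{cases}
\]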
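The substitution can be verified numerically: for any w and b, the slack required by the two-class margin constraint on a doubled point equals the slack required by the corresponding \(\varepsilon\)-constraint on the original point. The sketch below assumes scalar inputs x for brevity; the sample values, parameter values and function names are hypothetical.

```python
def double_points(samples, eps):
    """Replace each (x, y) by (x, y - 1 - eps) with label +1
    and (x, y + 1 + eps) with label -1."""
    doubled = []
    for x, y in samples:
        doubled.append(((x, y - 1 - eps), +1))
        doubled.append(((x, y + 1 + eps), -1))
    return doubled


def margin_slack(point, label, w, b):
    """Smallest z >= 0 with label * (w*x - y + b) >= 1 - z."""
    x, y = point
    return max(0.0, 1 - label * (w * x - y + b))


def tube_slacks(x, y, w, b, eps):
    """Smallest z+ and z- >= 0 satisfying
    w*x - y + b >= -eps - z+   and   -w*x + y - b >= -eps - z-."""
    z_plus = max(0.0, -eps - (w * x - y + b))
    z_minus = max(0.0, -eps - (-w * x + y - b))
    return z_plus, z_minus


if __name__ == "__main__":
    samples = [(1.0, 1.0), (2.0, 2.5), (3.0, 2.0)]  # hypothetical data
    w, b, eps = 0.5, 0.5, 0.25                       # hypothetical parameters
    doubled = double_points(samples, eps)
    for i, (x, y) in enumerate(samples):
        z_cls = (margin_slack(*doubled[2 * i], w, b),
                 margin_slack(*doubled[2 * i + 1], w, b))
        print((x, y), z_cls, tube_slacks(x, y, w, b, eps))
```

Each original point produces two constraints, which is why the slack vector splits into the two parts z+ and z-.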
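This gives the mathematical model for linear SVM regression:

\[
\min\ \tfrac{1}{2} w^{T} w + C e^{T}\left(z^{+} + z^{-}\right)
\quad \text{subject to} \quad
\begin{cases}
y_i - w^{T} x_i - b \le \varepsilon + z_i^{+} \\
-y_i + w^{T} x_i + b \le \varepsilon + z_i^{-} \\
z_i^{+},\ z_i^{-} \ge 0
\end{cases}
\]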
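For fixed w and b, the smallest feasible slacks follow directly from the constraints, so the objective can be evaluated in a few lines. The sketch below is an illustration under stated assumptions: inputs are scalar, the sample data and the parameter values are hypothetical, and no optimization over w and b is performed.

```python
def svr_objective(samples, w, b, C, eps):
    """Evaluate 1/2 * w^2 + C * sum(z+ + z-) for scalar inputs, using the
    smallest slacks that satisfy the constraints at the given (w, b)."""
    z_plus = [max(0.0, y - w * x - b - eps) for x, y in samples]
    z_minus = [max(0.0, -y + w * x + b - eps) for x, y in samples]
    value = 0.5 * w * w + C * (sum(z_plus) + sum(z_minus))
    return value, z_plus, z_minus


if __name__ == "__main__":
    samples = [(0.0, 0.0), (1.0, 1.0), (2.0, 3.0)]  # hypothetical data
    value, z_plus, z_minus = svr_objective(samples, w=1.0, b=0.0, C=2.0, eps=0.5)
    print(value, z_plus, z_minus)
```

Points whose residual lies within \(\varepsilon\) of the line contribute no slack; in this example only the third point lies outside the tube, with \(z^{+} = 0.5\).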
Intuition can help find approaches that suit the problem. However, we very much need to validate this model. We must ensure that the solution of the model is a consistent estimate for the regression problem. With the SVM approach, we have to address the VC dimension.
The SVM model solves the separation problem using hyperplanes in the following way:

\[
\hat{y} =
\begin{cases}
-1, & w^{T}x + b < 0 \\
1, & w^{T}x + b \ge 0
\end{cases}
\]

In that case all the hypothesis spaces have finite VC dimension.
The SVM model solves the regression problem using hyperplanes in a different way:

\[ \hat{y} = w^{T}x + b \]