
Regression Analysis with Support Vector Machines (SVM)


MINISTRY OF EDUCATION AND TRAINING
UNIVERSITY OF ECONOMICS HO CHI MINH CITY

INSTITUTION-LEVEL SCIENTIFIC RESEARCH PROJECT

REGRESSION ANALYSIS WITH SUPPORT VECTOR MACHINES (SVM)

PROJECT CODE: CS-2007-01

Principal investigator: MSc., Senior Lecturer Huỳnh Văn Đức

Ho Chi Minh City, 2009


Principal investigator: MSc., Senior Lecturer Huỳnh Văn Đức
Member: MSc., Lecturer Nguyễn Công Trí


Today we face an enormous volume of data that conceals many regularities yet to be discovered. As science advances, our understanding of many objects and phenomena becomes fuller and more detailed, and the relationships among the factors involved become correspondingly more complex. A single attribute may be related to a great many others, so observed data are often very high-dimensional, which creates serious difficulties for traditional methods.

After the golden age of particular (parametric) statistics (the 1930s to 1960s, built on the maximum likelihood method that Fisher proposed in the early 1930s, which works very well on low-dimensional data), researchers began to return to general statistics [1]. One general principle was immediately and widely accepted: Empirical Risk Minimization (ERM).

With high-dimensional data the hypothesis space becomes complex. How can we control the hypothesis space while still guaranteeing the consistency of the estimates? The principle of Structural Risk Minimization (SRM) was proposed in the mid-1970s to carry out the ERM principle while controlling the complexity of the hypothesis space.

Later, in the early 1990s, Support Vector Machines (SVM) were introduced as a method that implements the SRM principle. Since then, SVM algorithms have proven able to work effectively with high-dimensional data.
In this project we introduce the SVM model as an effective regression method for high-dimensional, highly nonlinear data. Within the scope of an institution-level project our ambitions are modest: we present neither new research results nor an effective real-world application. Instead we concentrate on a systematic presentation of the concepts, problems, and training algorithms, to show that SVM deserves deeper study.

In addition, we have improved an SVM training algorithm, presented at the Third National Conference on Fundamental and Applied Information Technology Research in 2007 (FAIR07), and built a demonstration program. We ran this program on real data taken from a ministry-level research project [20].

The report consists of three chapters and one appendix.
- Chapter 1 sketches the overall picture and introduces the motivation for the research.
- Chapter 2 details the construction of the model.
- Chapter 3 presents a training algorithm in enough detail that it can be implemented easily.
- The appendix presents the experimental results, including runs on the data taken from [20].



I would like to express my sincere thanks to the Office of Research Management and International Cooperation for making it possible for us to complete this project. We also thank our colleagues in the Faculty of Management Informatics and in the Faculty of Information Technology, University of Science, Ho Chi Minh City, who took part in the seminars organised for this project and contributed valuable comments.

Although we have made every effort to stay close to our objectives, the work surely still has many shortcomings. We welcome all comments.

Ho Chi Minh City, 24/04/2009
The authors



Contents

Preface
Contents
Chapter 1: Problem Statement
Chapter 2: The SVM Model
    1. The linear separation SVM model
        The separation problem
        Mathematical model
        Error-tolerant model
    2. The nonlinear separation model
    3. The SVM regression model
        Structure of the hypothesis space
        Mathematical model
Chapter 3: The SMO Training Algorithm
    1. Description of the algorithm
        Checking the optimality of the dual solution
        Adjusting the solution
        Building the computation table
        Illustration
        Illustration of the nonlinear case
        Platt's SMO algorithm [25]
        Heuristic for finding i
        Heuristic for finding j
    2. The SMO algorithm for the regression problem
        Building the computation table
        Illustration
Conclusion
References
Appendix 1: Experiments
    1. The separation problem
    2. The regression problem
    3. A real-world problem
        Data
        Experimental results
Index


Chapter 1:
Problem Statement

The problem of inductive inference is more than 2000 years old. Only in the eighteenth century, however, was the relationship between the empirical sciences and exact sciences such as mathematics and logic formulated (D. Hume and I. Kant, the demarcation problem) [1].

It is fair to say that the development of science and the information technology revolution of the twentieth century were the preconditions for the emergence of new ideas in statistical inference.

Although elements of statistical inference have existed for more than two centuries, in the work of Gauss and Laplace, the real foundations of the theory were laid only at the end of the 1920s. By then descriptive statistics was almost complete, with many statistical laws that described events in the real world well. Also in the 1920s, the basic models for both approaches took shape: classical statistics (also called parametric statistics) and general statistics [1]. The development of modern science, which began at the end of the nineteenth century, changed our understanding of the general model of the real world from a deterministic one to a stochastic one. The new ideas of this period that mattered for statistical inference came from Karl Popper, Glivenko, Cantelli, Andrei N. Kolmogorov, and Ronald A. Fisher [1].
In the early 1930s Karl Popper examined the problem of induction from a philosophical standpoint. His demarcation principle is very general and rests on the notion of the falsifiability of a theory. He was the first to link the ability to generalize with the notion of capacity.

Also in the early 1930s, Andrei N. Kolmogorov considered the problem of induction from the standpoint of theoretical statistics. His work rests on two main results: the convergence of the empirical distribution to the true distribution (Glivenko and Cantelli, 1933), and the fact that this convergence is exponentially fast and independent of the distribution (Kolmogorov, 1933). These two results are the main foundation for the development of general statistics.
In the same period, Ronald A. Fisher considered the problem of induction from the standpoint of applied statistics. Because the work of the day demanded results that were fast, simple, and efficient to compute, Fisher proposed a particular approach: estimating the parameters of a density function. This approach split statistical science into two branches, general statistics and particular statistics, the latter also called parametric statistics¹. While general statistics developed slowly, parametric statistics developed very quickly. Starting in the 1930s, the main elements of the parametric model were put in place within just ten years. The period from 1930 to 1960 was the golden age of this approach. The main assumptions of the parametric statistical model are [1]:
1. To find a functional dependence from data, statisticians define a set of functions that depend on parameters, with a small number of parameters and linear in those parameters;
2. The statistical law of the random component, that is, the error between the model and the real data, follows the normal distribution;
3. Under assumption 2, the maximum likelihood method is a good method.

¹ The original term is "parametric statistics".

Today Fisher's scheme is usually referred to as classical statistics. Classical statistics solves three problems: density estimation, regression estimation, and discriminant function estimation, using different parametric models (the maximum likelihood method, R. A. Fisher, 1930) on a solid mathematical foundation (Mathematical Methods of Statistics, Harald Cramér, 1946). In general terms, statistical inference solves a functional minimization problem based on empirical data. Following Fisher's particular approach, classical statistical theory did not examine this functional minimization problem in detail².

Moreover, estimating a real-valued function from data is regarded as the central problem of applied statistics. The main techniques used here are the least squares method and the least modulus method, proposed long ago by Gauss and Laplace. These methods were analysed, however, only in the twentieth century. Accordingly, classical statistics emphasises unbiased estimators³.

The assumption of unbiasedness came under scrutiny⁴ after James and Stein (1961) constructed an estimator of the mean of a normally distributed random vector (n ≥ 3) with identity correlation matrix. This estimator is biased, yet for any fixed number of observations it is uniformly better than the sample mean (an unbiased estimator of the mean). Baranchik later gave a whole family of such estimators that includes the James-Stein estimator.

Furthermore, in real problems not all assumptions of the parametric model are satisfied. Today's problems are very high-dimensional, which leads to a combinatorial explosion in the number of parameters. In addition, the law of the random component may not be normal (Tukey), and the maximum likelihood method need not be the best method (James and Stein) [1].
There have been attempts to overcome these limitations:
1. P. Huber (1960s) developed the robust approach, which removes the normality assumption on the random component;
2. J. Nelder (1970s) proposed the generalized linear model, which allows the best model to be selected;
3. L. Breiman, P. Huber, and J. Friedman considered function forms that are nonlinear in the parameters and began to use Empirical Risk Minimization (ERM) in place of maximum likelihood.

² Functional minimization has become the central problem connecting function approximation and functional analysis.
³ Among unbiased estimation methods, least squares is the one with the smallest variance.
⁴ In the 1960s the theory of ill-posed problems provided a method for constructing biased estimators. This idea was later applied to regression estimation in statistical learning theory. Classical statistics concentrated on the problem of model selection.
The information technology revolution fifty years later had a great impact on everyday life and opened up new opportunities for creativity in daily work. In classical statistics the number of model parameters is small, so its results are limited to low-dimensional functions. Once the information technology revolution made it possible to estimate high-dimensional functions, researchers began to re-examine Fisher's scheme and to return to general statistics.

Many attempts have been made to solve high-dimensional problems. Before 1970 the main approach to multidimensional regression estimation was the least squares method and the least modulus method with functions linear in the parameters. In the 1970s generalized linear functions were used in the hope of finding a small number of basis functions. The 1980s and 1990s saw the dictionary method: given a large set of predefined functions, the data are used to select a small number of them and to estimate their coefficients. Methods of this kind, including Projection Pursuit (Friedman and Stuetzle (1981), Huber (1985)) and MARS (Multivariate Adaptive Regression Splines; see Friedman (1991)), attracted great interest and became the main tools of multidimensional analysis.

Let us return to the general approach, which had been neglected for twenty years. In 1958 F. Rosenblatt, a physiologist, proposed the perceptron model for the linear separation problem, a model that is capable of generalization. The perceptron reflects the classical neurophysiological understanding of learning as the interaction of a large number of simple agents (the McCulloch-Pitts neuron model). A general principle was accepted immediately: the ERM principle. The theory of ERM for pattern recognition was then built in the late 1960s.

In 1963 Novikoff presented a theorem on the convergence of the perceptron algorithm (at a machine learning conference at the Institute of Control Sciences, Moscow), which strongly influenced those who attended: computers and simple algorithms that imitate the way humans, animals, and nature work could be used to solve problems. The theorem raised two questions:
1. How can an optimal solution be found?
2. Is the separation problem the best way to control generalization?

Later, in the 1980s, one of the foundational problems (the Glivenko-Cantelli problem) led to general statistical theory, based on a structure of nested hypothesis spaces [1]. In addition to the quality of the approximation, this approach also takes into account the complexity of the hypothesis spaces. Controlling the hypothesis spaces is therefore one of its main tools.

How can the complexity of the hypothesis space be controlled? By the classical law of large numbers, the frequency of an event converges to the probability of that event. For a family of events, however, uniform convergence is not guaranteed. Controlling the complexity of the hypothesis space is tied to the theory of uniform convergence. Three notions of the complexity of a hypothesis space are discussed (see [1], chapter 2): the annealed entropy, the growth function, and the VC dimension. The theory of uniform convergence was built in the late 1960s (Vapnik and Chervonenkis, 1968, 1971) on a family of capacity concepts for sets of indicator functions (functions taking the values 0 or 1), among them the VC dimension. The principle of minimizing the error function while keeping the VC dimension small is called Structural Risk Minimization (SRM).
The further development of this principle led to a new class of algorithms called support vector machines⁵ (SVM) [1, 2]. Like the perceptron model, SVM algorithms generalize from the solution of the linear separation problem.

From the perceptron model and its capacity for generalization (F. Rosenblatt, 1958), the artificial neural network (ANN) model developed and found effective applications in many fields [3, 4, 5, 6, 7]. Whatever neural networks can do, SVM can do as well, sometimes more effectively [2, 8, 9]. The successes of various SVM models have demonstrated the capability of this class of algorithms [8, 9, 10, 11]. In particular, a recent survey [9] (Xindong Wu, 2007) ranked SVM among the top 10 data mining algorithms.

Today the volume of data doubles every 20 months (Sever Hayri, 1998). A great many regularities hidden in this vast volume of data remain to be discovered, and economics is no exception. What happens when a company understands the behaviour of its customers? It will surely be able to devise an effective business strategy.

In economics, data processing is extremely important. Many stages of building and testing a model require data processing. In some respects, research in economics can be identified with the data.

The two main methods of data analysis used in economics are technical analysis and fundamental analysis [20]. Both rest on probability theory. As we all know, the main problem of probability theory is the study of sums of independent random variables with equal variances.

Data of narrow scope and short duration may satisfy this condition, but with today's massive volumes of data this is no longer certain. For the methods mentioned above, the accuracy of sampling has a strong effect on the results. How should one sample appropriately for the research question in this setting? Moreover, with the globalization of the economy, many entirely new factors are acting on national economies. Their influence is hidden in the data, and inaccurate sampling can distort the results of the analysis.
The stock market has long been regarded as a highly profitable field of investment. Stock price forecasting is affected by very complex interactions among economic conditions, policy, and even psychology, which makes forecasting very difficult. There is evidence (Yunos, Zaid, Jamaluddin, Shamsuddin, Sallehuddin, & Alwi, 2001) that technical analysis cannot forecast stock prices accurately. Recently, soft computing techniques such as granular computing, rough sets, neural networks, fuzzy sets, and genetic algorithms have been widely used to improve forecast accuracy and computational efficiency relative to technical analysis. Neural networks have proven effective in stock price forecasting (Yoon & Swales, 1991); they can capture the nonlinearity of the data and describe the characteristics of the stock market (Lapedes & Farber, 1987), forecast market indices (Chong & Kyoung, 1992; Freisleben, 1992), recognise patterns in trading charts (Dutta & Shekhar, 1990), estimate corporate bond interest rates and option prices (Li, 1994), and give buy and sell signals (Chapman, 1994; Margarita, 1992).

⁵ The Vietnamese original renders this term as "máy vectơ tựa"; some Vietnamese-language sources use "máy vectơ hỗ trợ" instead.
The need for new methods and techniques for data processing keeps growing. Many data mining methods and techniques for knowledge discovery have been, are being, and will continue to be proposed, and they have proven effective in many fields, economics among them. Examples include SVM, association rule mining, and rough set theory. We find that SVM algorithms are built on the SRM principle and have a solid mathematical foundation. Moreover, SVM models have been shown to perform well compared with artificial neural networks and many other statistical models [21]. We hope that SVM models will provide additional effective tools for the very large present need to find functional relationships in economic data.
Many SVM models have been proposed recently [11, 12, 13, 14, 15, 16, 17, 18, 19]. The effectiveness of many of them is argued mostly through learning on sample data sets. To answer questions about these models satisfactorily, we need to go back to the theory and carry out fundamental research. Only then do we have a basis for proposing new models and applying them to real problems.

Some questions arise naturally for any particular SVM model:
1. Is the model consistent?
2. How can the nested hypothesis spaces be controlled?
3. What is the complexity of the model's training algorithm?

This is a complex undertaking. Within the scope of an institution-level project we carry out only a limited study. The objectives of this project are:
1. To present the basic SVM models;
2. To introduce a fast training algorithm;
3. To build an experimental implementation.
Through this project we wish to introduce a class of regression models and apply it to the problem of estimating functional relationships from economic data. The experimental data are taken from a ministry-level research project (2007), whose authors used a linear regression model under the parametric approach [20]. The basic SVM models are presented following [1, 2, 21, 22, 23, 24]. The fast training algorithm is SMO [25, 26, 27, 28], which we chose to present for the following reasons:
- its construction can be explained clearly and is instructive;
- it leaves room for improvement [26, 27, 29];
- it is still topical [13].

The experimental program has an object-oriented design, which allows effective reuse both of the source code and when designing new algorithms.


Chapter 2:
The SVM Model

In statistical learning theory, the supervised learning problem is formulated as follows [1, 2, 21]. We are given a training set $\{(x_i, y_i)\}$ sampled from an unknown probability distribution $p(x, y)$. Suppose there is a functional relationship in which $y$ depends on $x$. For a candidate function $f$, we define a function $V(y, f(x))$ that measures the loss (loss function) incurred by accepting $f$. The function $f$ we seek is the solution of the problem of minimizing the risk functional:

\[
\int V(y, f(x))\, p(x, y)\, dx\, dy \tag{2.1}
\]

Since $p$ is unknown, we look for the solution within a class of functions (called the hypothesis space) by minimizing the empirical risk over the $m$ training examples (Empirical Risk Minimization, ERM):

\[
\frac{1}{m} \sum_{i=1}^{m} V(y_i, f(x_i)) \tag{2.2}
\]
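As a minimal sketch of the ERM idea, the snippet below computes an empirical risk and picks its minimiser from a finite hypothesis space. The squared loss, the toy sample, and the one-parameter family $f_a(x) = a x$ searched over a small grid are all assumptions made for illustration; none of them comes from the report.

```python
def squared_loss(y, prediction):
    """Loss V(y, f(x)); the squared loss is an assumption for illustration."""
    return (y - prediction) ** 2


def empirical_risk(f, sample, loss=squared_loss):
    """Average loss of f over the sample {(x_i, y_i)}."""
    return sum(loss(y, f(x)) for x, y in sample) / len(sample)


def erm(hypotheses, sample):
    """Return the hypothesis with the smallest empirical risk."""
    return min(hypotheses, key=lambda f: empirical_risk(f, sample))


# Hypothetical toy sample, roughly following y = 2x.
sample = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

# Hypothetical hypothesis space: lines through the origin with slope a.
slopes = [1.0, 1.5, 2.0, 2.5]
hypotheses = [lambda x, a=a: a * x for a in slopes]

best = erm(hypotheses, sample)
best_slope = best(1.0)  # f_a(1) = a recovers the slope of the chosen line
print(best_slope, empirical_risk(best, sample))
```

A richer hypothesis space can only lower the empirical risk, which is why the text goes on to ask how far the empirical value may be from the true risk.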

Let $f^*$ be the solution of the risk functional minimization problem (2.1) and $\hat{f}$ the solution of the empirical risk minimization problem (2.2). Let $L$ be the value of the risk functional at $f^*$ and $L_E$ the value of the empirical risk at $\hat{f}$. We have

\[
L = \int V(y, f^*(x))\, p(x, y)\, dx\, dy \tag{2.3}
\]

\[
L_E = \frac{1}{m} \sum_{i=1}^{m} V(y_i, \hat{f}(x_i)) \tag{2.4}
\]

The natural question is how to assess the difference between these two values. Under the SRM approach mentioned in the preface, the loss depends not only on the error of the predictor $f(x)$ relative to the true value $y$ but also on the complexity of the hypothesis space.

There are several notions that measure the complexity of a family of functions [1]. In this report we choose to introduce the VC dimension⁶ [1, 2], which we define in detail in the course of the modelling.

Within the scope of this project we do not study estimates of the difference between $L$ (2.3) and $L_E$ (2.4); we concentrate on the modelling and the training algorithm. Before going into detail, however, we wish to state a theorem that shows the role of the VC dimension in such estimates. The following theorem bounds the difference between $L$ and $L_E$ in terms of the confidence level, the VC dimension, and the size of the data set.

⁶ VC abbreviates the names of the two authors who proposed the concept: Vapnik and Chervonenkis (1998).


Let $v$ be the VC dimension of the hypothesis space $H$ and $m$ the size of the data set. With probability $1 - \eta$, the smallest expected error $L$ and the smallest empirical error $L_E$ satisfy the constraint

\[
|L - L_E| \le 4 \sqrt{\frac{v\left(1 + \log\frac{2m}{v}\right) - \log\frac{\eta}{4}}{m}} \tag{2.5}
\]

independently of the probability distribution $p(x, y)$.
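To get a feel for how the bound behaves, the sketch below evaluates its right-hand side numerically. It assumes that (2.5) has the form $4\sqrt{\left(v(1 + \log(2m/v)) - \log(\eta/4)\right)/m}$ with natural logarithms; the function name `vc_gap_bound` and the sample values of $v$, $m$, and $\eta$ are hypothetical.

```python
import math


def vc_gap_bound(v, m, eta):
    """Evaluate 4*sqrt((v*(1 + log(2m/v)) - log(eta/4)) / m).

    v: VC dimension, m: sample size, eta: 1 - confidence level.
    The exact form of the bound is an assumption stated in the lead-in.
    """
    if v <= 0 or m <= 0 or not (0.0 < eta < 1.0):
        raise ValueError("need v > 0, m > 0 and 0 < eta < 1")
    inside = (v * (1.0 + math.log(2.0 * m / v)) - math.log(eta / 4.0)) / m
    return 4.0 * math.sqrt(inside)


# Hypothetical setting: VC dimension 3, confidence level 95 percent.
for m in (100, 1_000, 10_000, 100_000):
    print(m, round(vc_gap_bound(3, m, 0.05), 4))
```

Under this reading the bound shrinks as the sample grows and widens as the VC dimension grows, which is the trade-off that SRM sets out to control.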

Clearly the theorem above is meaningful only for hypothesis spaces with finite VC dimension. Controlling the hypothesis space therefore plays an important role in implementing the SRM principle. Under SRM, the hypothesis spaces are nested families of functions [1, 2] that form a linear order, which makes it easy to build the learning model. And for estimates such as (2.5) to be usable, the VC dimension of each hypothesis space must also be finite.

Problems with a large number of dimensions are hard (R. Bellman, 1960). The SVM model is built on the SRM principle; it can control the complexity of the hypothesis space and thereby solve high-dimensional problems. Starting from the linear separation problem, which can be generalized (the perceptron model of F. Rosenblatt, 1958), the SVM model was built and has gradually become one of the effective methods for estimating functions from empirical data. Building on the perceptron model, the SVM model looks for a separating hyperplane that is optimal in the sense of maximizing the margin, the corridor that divides the two classes of objects [1].

The hypothesis space of the SVM model is therefore built from the family of functions of the form [1, 2]

\[
f(x, w) = w^T x + b \tag{2.6}
\]

To implement the SRM principle, the SVM model builds nested families of functions of the form:

\[
\{\, f(x, w) = w^T x + b,\ |w| \le A \,\} \tag{2.7}
\]


Instead of solving the problem for each hypothesis subspace (2.7), the SVM model controls these hypothesis spaces through a single parameter [1]. In this chapter we are not concerned with the design of measurement scales or of the attributes that describe the objects. We assume that each object is described by an n-dimensional vector of numbers and concentrate on the mathematical models. The models introduced in this chapter are the linear separation model, the error-tolerant model, the nonlinear separation model, and the regression model.

The separation problem is a special case of the classification problem. This report presents SVM for the separation problem; SVM for the general classification problem is a separate topic. The SVM approach rests on linear separation [1]. Suppose the problem is linearly separable. Many hyperplanes can separate the two classes; the SVM approach selects one optimal hyperplane. Intuitively, this is the hyperplane for which the margin it determines is widest.

The following figure illustrates the key concepts of SVM: the optimal hyperplane, the support vectors, and the margin. The optimal hyperplane is the separating hyperplane with the largest margin, and it is determined by the support vectors.

[Figure: optimal hyperplane, support vectors, and margin]


Intuitively the optimal hyperplane is unique. The equation representing a hyperplane, however, is not unique. We will look for a representation, to be called the canonical form, such that the canonical representation of the optimal hyperplane is unique. Let $\{(x_i, y_i)\}_{i=1..m}$ be the data set, with $y_i \in \{-1, 1\}$. Consider an arbitrary hyperplane $w^T x + b = 0$ and set

\[
d_+ = \min_{y_i = 1} \frac{w^T x_i + b}{|w|}, \qquad
d_- = \max_{y_i = -1} \frac{w^T x_i + b}{|w|}, \qquad
d = d_+ - d_-
\]

This hyperplane solves the linear separation problem if $d > 0$, in which case $d$ is called the width of the margin. Suppose the data set is separable and the hyperplane under consideration is a separating hyperplane. We can choose $b$ so that $d_+ = -d_-$. With such a $b$, the equation

\[
\frac{w^T x + b}{d\,|w|/2} = 0
\]

also represents the hyperplane, and moreover

\[
1 = \min_{y_i = 1} \frac{w^T x_i + b}{d\,|w|/2}, \qquad
-1 = \max_{y_i = -1} \frac{w^T x_i + b}{d\,|w|/2}
\]

An equation $w^T x + b = 0$ that satisfies the following constraint is called the canonical equation of the hyperplane (the hyperplane is then also called a canonical hyperplane) with respect to the given data set $X$:

\[
\min_{y_i = 1} \left(w^T x_i + b\right) = 1, \qquad
\max_{y_i = -1} \left(w^T x_i + b\right) = -1 \tag{2.8}
\]

We are interested in the separating hyperplane with the largest $d$. With the canonical representation, the optimal hyperplane is unique [1].
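The normalisation just described can be traced numerically. The sketch below computes $d_+$, $d_-$, and $d$ for a hyperplane and rescales it so that the positive class attains the value 1 and the negative class the value -1. The five points and the starting hyperplane $x_1 - 2x_2 + 5 = 0$ are illustrative choices, and the helper names are hypothetical.

```python
import math


def signed_distances(w, b, points):
    """Signed distance (w^T x + b) / |w| of every point to the hyperplane."""
    norm = math.sqrt(sum(wk * wk for wk in w))
    return [(sum(wk * xk for wk, xk in zip(w, x)) + b) / norm for x in points]


def margin_width(w, b, points, labels):
    """Return (d_plus, d_minus, d) as defined in the text."""
    dist = signed_distances(w, b, points)
    d_plus = min(s for s, y in zip(dist, labels) if y == 1)
    d_minus = max(s for s, y in zip(dist, labels) if y == -1)
    return d_plus, d_minus, d_plus - d_minus


def canonical_form(w, b, points, labels):
    """Shift b so that d_plus = -d_minus, then divide w and b by d|w|/2."""
    norm = math.sqrt(sum(wk * wk for wk in w))
    d_plus, d_minus, d = margin_width(w, b, points, labels)
    if d <= 0:
        raise ValueError("no shift of b makes this direction separate the classes")
    # Changing b by delta moves both d_plus and d_minus by delta / |w|.
    b_centred = b - norm * (d_plus + d_minus) / 2.0
    scale = d * norm / 2.0
    return [wk / scale for wk in w], b_centred / scale


points = [(1, 1), (3, 2), (4, 4), (1, 5), (3, 6)]
labels = [1, 1, 1, -1, -1]

w_c, b_c = canonical_form((1.0, -2.0), 5.0, points, labels)
print(w_c, b_c)
print(margin_width(w_c, b_c, points, labels))
```

After the rescaling, the smallest value of $w^T x_i + b$ over the positive class is 1 and the largest over the negative class is -1, so every hyperplane with the same direction is mapped to the same canonical pair $(w, b)$.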

With the canonical representation, the mathematical model of SVM for the linear separation problem is the following quadratic program (also called the primal problem):

\[
\frac{1}{2} w^T w \to \min, \qquad (w^T x_i + b)\, y_i \ge 1,\quad i = 1, \dots, m \tag{2.9}
\]

where $\{(x_i, y_i)\}_{i=1..m}$ are the data of the $m$ objects and $y_i \in \{-1, 1\}$ gives the class of the $i$-th object. In matrix form:

\[
\frac{1}{2} w^T w \to \min, \qquad Y X^T w + b\, y \ge e \tag{2.10}
\]

with

\[
X = (x_1 \; x_2 \; \dots \; x_m), \quad
y^T = (y_1 \; y_2 \; \dots \; y_m), \quad
e^T = (1 \; 1 \; \dots \; 1), \quad
Y = \mathrm{diag}(y_1, y_2, \dots, y_m)
\]

The dual of the primal problem (2.10) is

\[
\frac{1}{2} \alpha^T D \alpha - e^T \alpha \to \min, \qquad y^T \alpha = 0,\quad \alpha \ge 0 \tag{2.11}
\]

where $D = Y X^T X Y$.

From duality theory we obtain the following pairs of dual constraints (see Appendix 3):

1. $y_i (w^T x_i + b) \ge 1$ is paired with $\alpha_i \ge 0$;
2. $w = X Y \alpha$ is paired with $w$ free;
3. $y^T \alpha = 0$ is paired with $b$ free.

We need to find the support vectors, that is, the vectors lying on the two boundary lines, for which the constraints hold with equality. From the first pair of dual constraints, it is clear that a vector corresponding to a positive component of $\alpha$ is a support vector.

Consider the data table:

        x1    x2     y
M1       1     1     1
M2       3     2     1
M3       4     4     1
M4       1     5    -1
M5       3     6    -1

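We have $m = 5$, $n = 2$. The quadratic program becomes:

\[
\begin{aligned}
& \tfrac{1}{2}(w_1^2 + w_2^2) \to \min \\
& \begin{cases}
w_1 + w_2 + b \ge 1 \\
3w_1 + 2w_2 + b \ge 1 \\
4w_1 + 4w_2 + b \ge 1 \\
-w_1 - 5w_2 - b \ge 1 \\
-3w_1 - 6w_2 - b \ge 1
\end{cases}
\end{aligned}
\]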

The following figure shows that the support vectors are M3, M4 and M5.

[Figure: the data points and the support vectors M3, M4 and M5.]

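Solving the system of equations corresponding to the support vectors

\[
\begin{cases}
4w_1 + 4w_2 + b = 1 \\
-w_1 - 5w_2 - b = 1 \\
-3w_1 - 6w_2 - b = 1
\end{cases}
\]

gives $w_1 = 0.4$, $w_2 = -0.8$ and $b = 2.6$.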
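The three support-vector equations form a small linear system that can be checked mechanically. The sketch below solves it with exact rational arithmetic; the helper `solve_linear` is a hypothetical name introduced for this illustration, not part of the report.

```python
from fractions import Fraction

def solve_linear(a, rhs):
    """Gauss-Jordan elimination with exact rational arithmetic."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(a, rhs)]
    for col in range(n):
        # pick a row with a non-zero pivot and move it into place
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        # eliminate the pivot column from every other row
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]

# rows: equations for M3, M4, M5; unknowns: w1, w2, b
a = [[4, 4, 1], [-1, -5, -1], [-3, -6, -1]]
w1, w2, b = solve_linear(a, [1, 1, 1])
print(w1, w2, b)  # 2/5 -4/5 13/5
```

The fractions 2/5, -4/5 and 13/5 are the values 0.4, -0.8 and 2.6 given in the text.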


This result is only intuitive and not yet convincing; we need to check its optimality. First compute

\[
XY = \begin{pmatrix} 1 & 3 & 4 & -1 & -3 \\ 1 & 2 & 4 & -5 & -6 \end{pmatrix}
\]

Since M1 and M2 are not support vectors, $\alpha_1 = \alpha_2 = 0$. The dual constraint pairs 2 and 3 give the system

\[
\begin{cases}
4\alpha_3 - \alpha_4 - 3\alpha_5 = 0.4 \\
4\alpha_3 - 5\alpha_4 - 6\alpha_5 = -0.8 \\
\alpha_3 - \alpha_4 - \alpha_5 = 0
\end{cases}
\]

whose solution is $\alpha_3 = \alpha_5 = 0.4$, $\alpha_4 = 0$.

Checking the pair of solutions: the primal solution $w^T = (0.4, -0.8)$, $b = 2.6$ and the dual solution $\alpha^T = (0, 0, 0.4, 0, 0.4)$ satisfy the optimality criterion.
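The optimality check can also be done numerically. The following sketch tests the three pairs of dual constraints and compares the primal objective with the dual objective $\tfrac{1}{2}\alpha^T D \alpha - e^T \alpha$; the variable names and the tolerance `1e-9` are assumptions made for this illustration.

```python
points = [(1, 1), (3, 2), (4, 4), (1, 5), (3, 6)]
labels = [1, 1, 1, -1, -1]
w, b = (0.4, -0.8), 2.6
alpha = [0.0, 0.0, 0.4, 0.0, 0.4]

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

margins = [y * (dot(w, x) + b) for x, y in zip(points, labels)]

# pair 1: primal feasibility, dual feasibility, complementary slackness
primal_ok = all(m >= 1 - 1e-9 for m in margins)
dual_ok = all(a >= 0 for a in alpha)
slackness_ok = all(abs(a * (m - 1)) < 1e-9 for a, m in zip(alpha, margins))

# pair 2: w = X Y alpha
w_from_alpha = [sum(a * y * x[k] for a, y, x in zip(alpha, labels, points))
                for k in range(2)]

# pair 3: y^T alpha = 0
balance = dot(labels, alpha)

primal_value = 0.5 * dot(w, w)
# alpha^T D alpha equals |X Y alpha|^2, so D need not be formed explicitly
dual_value = 0.5 * dot(w_from_alpha, w_from_alpha) - sum(alpha)
print(primal_ok, dual_ok, slackness_ok, w_from_alpha, balance)
print(primal_value, dual_value)
```

At an optimal pair the two objective values are opposite in sign: here the primal value is about 0.4 and the dual value, in the minimization form used above, is about -0.4.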

In practice, the representations of the objects in n-dimensional space are often not separable. The optimal hyperplane is then chosen according to two conflicting goals: making the margin as wide as possible and making the separation error as small as possible.

First we need to define the VC dimension for the separation problem.

Suppose the data are chosen from some set X. The VC dimension of a family of functions F with respect to X is the largest number of vectors that can be separated into two classes in every possible way by functions belonging to F.

Suppose the data vectors are taken in $\mathbb{R}^n$. Consider the family of functions representing hyperplanes, parameterized by $a \in \mathbb{R}^n$, $F = \{ l_a : x \in \mathbb{R}^n \mapsto a^T x + b \}$. The VC dimension of F equals $n + 1$.

As mentioned, SVM works with canonical hyperplanes structured by nested hypothesis spaces:

\[
\{ f(x, w) = w^T x + b, \; |w| \le A \}
\]

The following theorem [1] determines the VC dimension of these hypothesis spaces when the data are taken in a bounded region of $\mathbb{R}^n$.

Given a data set whose vectors $x$ have norm bounded above by $D$, and a hypothesis space consisting of canonical hyperplanes whose $w$ has norm bounded above by $A$, the VC dimension is bounded above by $\min(\mathrm{int}(D^2 A^2), n) + 1$.

Clearly, with Theorem 2 above, Theorem 1 stated earlier is meaningful for the linear separation problem because the VC dimension is bounded.
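To get a feeling for the bound, it can be evaluated for a few values. This is a minimal sketch that assumes `int` in the formula means the integer part; the function name `vc_bound` and the sample values are hypothetical.

```python
def vc_bound(D, A, n):
    """Upper bound min(int(D^2 A^2), n) + 1 on the VC dimension."""
    return min(int(D ** 2 * A ** 2), n) + 1

# small |w| bound: the bound is driven by D^2 A^2, not by the dimension n
print(vc_bound(D=2.0, A=0.5, n=10))
# large D^2 A^2: the bound falls back to n + 1
print(vc_bound(D=10.0, A=1.0, n=3))
```

With these sample values the bound is 2 in the first case and 4 in the second, which shows how restricting $|w| \le A$ can keep the VC dimension well below $n + 1$.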

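Set

\[
z_i = \begin{cases}
0, & y_i (w^T x_i + b) \ge 1 \\
1 - y_i (w^T x_i + b), & y_i (w^T x_i + b) < 1
\end{cases}
\]

The mathematical model of the error-tolerant model is the following quadratic program (written in matrix form):

\[
\begin{aligned}
& \tfrac{1}{2} w^T w + C e^T z \to \min \\
& \begin{cases}
Y X^T w + b y \ge e - z \\
z \ge 0
\end{cases}
\end{aligned}
\qquad (2.12)
\]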

Here the parameter C controls the hypothesis spaces [1]. More precisely, the larger C is, the more complex the hypothesis space in which the solution is sought, that is, the larger its VC dimension. The width of the margin therefore also depends on C, which is why the error-tolerant model is also called the soft margin model.

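For the data of Example 1, the quadratic program is:

\[
\begin{aligned}
& \tfrac{1}{2}(w_1^2 + w_2^2) + C(z_1 + z_2 + z_3 + z_4 + z_5) \to \min \\
& \begin{cases}
w_1 + w_2 + b \ge 1 - z_1 \\
3w_1 + 2w_2 + b \ge 1 - z_2 \\
4w_1 + 4w_2 + b \ge 1 - z_3 \\
-w_1 - 5w_2 - b \ge 1 - z_4 \\
-3w_1 - 6w_2 - b \ge 1 - z_5 \\
z_i \ge 0 \quad (i = 1,\dots,5)
\end{cases}
\end{aligned}
\]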

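The dual problem of problem (2.12) is given by

\[
\begin{aligned}
& \tfrac{1}{2} \alpha^T D \alpha - e^T \alpha \to \min \\
& \begin{cases}
y^T \alpha = 0 \\
0 \le \alpha_i \le C \quad (i = 1,\dots,m)
\end{cases}
\end{aligned}
\qquad (2.13)
\]

where $D = Y X^T X Y$.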

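Continuing the example above, compute:

\[
XY = \begin{pmatrix} 1 & 3 & 4 & -1 & -3 \\ 1 & 2 & 4 & -5 & -6 \end{pmatrix}
\]

It follows that

\[
D = \begin{pmatrix}
2 & 5 & 8 & -6 & -9 \\
5 & 13 & 20 & -13 & -21 \\
8 & 20 & 32 & -24 & -36 \\
-6 & -13 & -24 & 26 & 33 \\
-9 & -21 & -36 & 33 & 45
\end{pmatrix}
\]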
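The matrix $D = Y X^T X Y$ is simply the matrix of inner products of the columns of $XY$, so it can be recomputed directly from the data. A minimal sketch (variable names are chosen for this illustration):

```python
points = [(1, 1), (3, 2), (4, 4), (1, 5), (3, 6)]
labels = [1, 1, 1, -1, -1]

# the columns of XY are the vectors y_i * x_i
xy_cols = [[y * c for c in x] for x, y in zip(points, labels)]

# D[i][j] = <y_i x_i, y_j x_j>
D = [[sum(a * c for a, c in zip(u, v)) for v in xy_cols] for u in xy_cols]
for row in D:
    print(row)
```

The printed rows reproduce the matrix D of the example, and the matrix is symmetric as expected.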


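The dual problem is then:

\[
\begin{aligned}
& \alpha_1^2 + 5\alpha_1\alpha_2 + 8\alpha_1\alpha_3 - 6\alpha_1\alpha_4 - 9\alpha_1\alpha_5 \\
& \quad + 6.5\alpha_2^2 + 20\alpha_2\alpha_3 - 13\alpha_2\alpha_4 - 21\alpha_2\alpha_5 \\
& \quad + 16\alpha_3^2 - 24\alpha_3\alpha_4 - 36\alpha_3\alpha_5 \\
& \quad + 13\alpha_4^2 + 33\alpha_4\alpha_5 \\
& \quad + 22.5\alpha_5^2 \\
& \quad - \alpha_1 - \alpha_2 - \alpha_3 - \alpha_4 - \alpha_5 \to \min \\
& \begin{cases}
\alpha_1 + \alpha_2 + \alpha_3 - \alpha_4 - \alpha_5 = 0 \\
0 \le \alpha_i \le C \quad (i = 1,\dots,5)
\end{cases}
\end{aligned}
\]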

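The pairs of dual constraints are:

1.  $y_i (w^T x_i + b) \ge 1 - z_i$    paired with   $\alpha_i \ge 0$
2.  $z_i \ge 0$                        paired with   $\alpha_i \le C$
3.  $w = X Y \alpha$                   paired with   $w$ free
4.  $y^T \alpha = 0$                   paired with   $b$ free

We have

\[
0 < \alpha_i < C \;\Rightarrow\; y_i (w^T x_i + b) = 1
\]

It follows that the $i$-th vector, corresponding to a component $\alpha_i$ that is positive and smaller than $C$, is a support vector.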

The following example shows the role of C. We again use the data of Example 1.

With $C = 1$: the pair of solutions $w^T = (0.4, -0.8)$, $b = 2.6$, $z^T = (0, 0, 0, 0, 0)$ of the primal problem and $\alpha^T = (0, 0, 0.4, 0, 0.4)$ of the dual problem satisfies the optimality conditions. The objective function attains its minimum value 0.4.

With $C = 0.2$: the pair of solutions $w^T = (0.25, -0.5)$, $b = 1.25$, $z^T = (0, 0, 0.75, 0, 0)$ of the primal problem and $\alpha^T = (0.03125, 0, 0.2, 0.05625, 0.175)$ of the dual problem satisfies the optimality conditions. The objective function attains its minimum value 0.30625.
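The two reported solutions can be compared by evaluating the soft margin objective $\tfrac{1}{2} w^T w + C e^T z$ for each of them under both values of C. The sketch below does this; the function `soft_margin_value` and the labels "hard" and "soft" are hypothetical names used only here.

```python
def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def soft_margin_value(w, b, C, points, labels):
    """Return the objective (1/2) w.w + C * sum(z) and the slack vector z."""
    slacks = [max(0.0, 1 - y * (dot(w, x) + b)) for x, y in zip(points, labels)]
    return 0.5 * dot(w, w) + C * sum(slacks), slacks

points = [(1, 1), (3, 2), (4, 4), (1, 5), (3, 6)]
labels = [1, 1, 1, -1, -1]
hard = ((0.4, -0.8), 2.6)     # solution reported for C = 1
soft = ((0.25, -0.5), 1.25)   # solution reported for C = 0.2

for C in (1.0, 0.2):
    for name, (w, b) in (("hard", hard), ("soft", soft)):
        value, slacks = soft_margin_value(w, b, C, points, labels)
        print(C, name, round(value, 5), [round(z, 5) for z in slacks])
```

For $C = 1$ the first solution is cheaper (about 0.4 against 0.90625), while for $C = 0.2$ the second one is cheaper (0.30625 against about 0.4), which agrees with the example: lowering C makes it worthwhile to accept the slack $z_3 = 0.75$ in exchange for a smaller $|w|$.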
This result shows that the solution may differ across hypothesis spaces. In the example above we also observe the following two points:

1. For large C, the solutions coincide;
2. For small C, the objective function decreases, which suggests that the model fits reality more closely (see footnote 7).

In the linear separation problem, the solution (a separating hyperplane), if it exists, is usually not unique. The support vector machine (SVM) approach provides an optimality criterion that singles out the best solution among the feasible ones. In particular, this approach determines the separating hyperplane through the support vectors rather than through an explicit equation. With the support vectors, we can change the space in which the problem is represented without making the transformation explicit. This approach allows nonlinear separation models to be built flexibly and efficiently.

The following figure illustrates nonlinear separation, in which the separating line is replaced by a separating curve. Problems that are not separable (in the sense that no separating hyperplane exists) are solved by looking for a nonlinear separation.

[Figure: two classes separated by a curve.]

In essence, the SVM approach is linear separation. To perform nonlinear separation, SVM maps the data representing the objects into a new space of higher dimension, in which the problem is separable.

7 Overfitting (of the model to the sample) is not a desirable phenomenon. Overfitting the sample may ...

Many documents use the term "vectơ hỗ trợ" (assisting vector). This report uses the term "vectơ tựa" (supporting vector) because of the actual role of these vectors.


Consider the following data table:

        x     y
M1      1     1
M2      2     1
M3      4    -1
M4      5    -1
M5      6    -1
M6      8     1
M7      9     1

These two classes are not separable. We map the data into another space of dimension 2 by the mapping $\Phi$:

\[
x \mapsto \Phi(x) = (x, (x - 5)^2)
\]

The new data table is separable in the new space.

x     (x-5)^2     y
1       16        1
2        9        1
4        1       -1
5        0       -1
6        1       -1
8        9        1
9       16        1
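The effect of the mapping can be checked directly. In the sketch below, the separating line $(x - 5)^2 = 5$ in the new space, that is $w = (0, 1)$ and $b = -5$, is a hypothetical choice made for illustration; it is one of many lines that separate the mapped data and is not claimed to be the optimal hyperplane.

```python
def phi(x):
    """Map a scalar x to the two-dimensional point (x, (x - 5)^2)."""
    return (x, (x - 5) ** 2)

xs = [1, 2, 4, 5, 6, 8, 9]
ys = [1, 1, -1, -1, -1, 1, 1]
mapped = [phi(x) for x in xs]

# hypothetical separating line in the new space: second coordinate = 5
w, b = (0.0, 1.0), -5.0
predicted = [1 if w[0] * u + w[1] * v + b >= 0 else -1 for u, v in mapped]
print(mapped)
print(predicted == ys)
```

All seven points fall on the correct side of this line, whereas no single threshold on the original coordinate x can separate the two classes.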

[Figure: the mapped data plotted in the new space, with $x$ on the horizontal axis and $(x - 5)^2$ on the vertical axis.]

With this approach we face two difficulties related to the mapping $\Phi$:
1. how to determine this mapping, and
2. how to return to the original representation space.

In fact, with SVM we do not have to worry about this, because determining the hyperplane is equivalent to determining the support vectors. Indeed:

\[
x \mapsto w^T x + b = \sum_{i=1}^{m} \alpha_i y_i \langle x_i, x \rangle + b = \sum_{\alpha_i > 0} \alpha_i y_i \langle x_i, x \rangle + b
\]
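The identity above can be verified on the linear example of this chapter: evaluating $w^T x + b$ directly and evaluating the sum over the vectors with $\alpha_i > 0$ give the same value. The test point $(2, 3)$ and the function names are assumptions introduced for this sketch.

```python
points = [(1, 1), (3, 2), (4, 4), (1, 5), (3, 6)]
labels = [1, 1, 1, -1, -1]
alpha = [0.0, 0.0, 0.4, 0.0, 0.4]
w, b = (0.4, -0.8), 2.6

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def decision_primal(x):
    """Evaluate w^T x + b with the explicit normal vector."""
    return dot(w, x) + b

def decision_support(x):
    """Evaluate the same function using only vectors with alpha_i > 0."""
    return sum(a * y * dot(xi, x)
               for a, y, xi in zip(alpha, labels, points) if a > 0) + b

x_new = (2, 3)  # hypothetical query point
print(decision_primal(x_new), decision_support(x_new))
```

Both expressions evaluate to approximately 1.0 for this point, and only M3 and M5 enter the second sum.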

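The primal problem for nonlinear separation is the same as for linear separation, except that the representing vectors are given in the new space:

\[
\begin{aligned}
& \tfrac{1}{2} w^T w + C z^T e \to \min \\
& \begin{cases}
(w^T \Phi(x_i) + b)\, y_i \ge 1 - z_i \\
z \ge 0
\end{cases}
\end{aligned}
\qquad (2.14)
\]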

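The dual problem is also the same as for linear separation,

\[
\begin{aligned}
& \tfrac{1}{2} \alpha^T D \alpha - e^T \alpha \to \min \\
& \begin{cases}
y^T \alpha = 0 \\
0 \le \alpha_i \le C \quad (i = 1,\dots,m)
\end{cases}
\end{aligned}
\]

except that the entries $\langle x_i, x_j \rangle$ of the symmetric matrix $X^T X$ are replaced by the new entries $\langle \Phi(x_i), \Phi(x_j) \rangle$.

Denote

\[
k_{ij} = \langle \Phi(x_i), \Phi(x_j) \rangle
\]

The matrix $K = (k_{ij})$ is called the kernel matrix.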

In practice we do not need to determine $\Phi$ explicitly; instead we use a function called the kernel function:

\[
k_{ij} = \mathrm{kernel}(x_i, x_j)
\]

In many cases it is a function of the inner product:

\[
k_{ij} = \mathrm{kernel}(\langle x_i, x_j \rangle)
\]
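A standard illustration of why $\Phi$ never has to be computed is the polynomial kernel of degree 2 in two dimensions, for which an explicit feature map is known. The sketch below compares the kernel value with the inner product in the six-dimensional feature space; the feature map `phi` is the usual textbook construction and is not taken from this report.

```python
import math

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def poly2_kernel(x, y):
    """Polynomial kernel of degree 2: (<x, y> + 1)^2."""
    return (dot(x, y) + 1) ** 2

def phi(x):
    """Explicit feature map whose inner product equals poly2_kernel."""
    x1, x2 = x
    s = math.sqrt(2)
    return (1, s * x1, s * x2, x1 * x1, s * x1 * x2, x2 * x2)

x, y = (1, 2), (3, -1)  # hypothetical sample vectors
print(poly2_kernel(x, y), dot(phi(x), phi(y)))
```

Both quantities agree up to rounding, so evaluating the kernel on the original vectors replaces an inner product in the larger space.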

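The study of kernel functions is beyond the scope of this project. Here we only introduce some commonly used kernel functions [2].

Table of commonly used kernel functions

Type                  Expression
Linear                $\mathrm{kernel}(x, y) = \langle x, y \rangle$
Polynomial            $\mathrm{kernel}(x, y) = (\langle x, y \rangle + 1)^d$
Gaussian              $\mathrm{kernel}(x, y) = e^{-\|x - y\|^2 / (2\sigma^2)}$
Hyperbolic tangent    $\mathrm{kernel}(x, y) = \tanh(\kappa \langle x, y \rangle - \delta)$
Inverse               $\mathrm{kernel}(x, y) = 1 / \sqrt{\|x - y\|^2 + \beta}$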
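The kernels in the table translate directly into code. The following sketch uses plain Python tuples as vectors; the function names and the parameter names `d`, `sigma`, `kappa`, `delta` and `beta` are assumptions that mirror the symbols of the table.

```python
import math

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def sq_dist(u, v):
    return sum((a - c) ** 2 for a, c in zip(u, v))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, d):
    return (dot(x, y) + 1) ** d

def gaussian_kernel(x, y, sigma):
    return math.exp(-sq_dist(x, y) / (2 * sigma ** 2))

def tanh_kernel(x, y, kappa, delta):
    return math.tanh(kappa * dot(x, y) - delta)

def inverse_kernel(x, y, beta):
    return 1 / math.sqrt(sq_dist(x, y) + beta)

u, v = (1, 2), (3, 4)  # hypothetical sample vectors
print(linear_kernel(u, v), polynomial_kernel(u, v, d=2))
print(gaussian_kernel(u, v, sigma=1.0), inverse_kernel(u, v, beta=1.0))
```

A kernel matrix $K = (k_{ij})$ is then obtained by applying one of these functions to every pair of data vectors.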

When a kernel function is used, we do not need to return to the original space. For a vector $x$, we decide which class $x$ belongs to based on the result of:

\[
x \mapsto w^T \Phi(x) + b
= \sum_{\alpha_j > 0} y_j \alpha_j \langle \Phi(x_j), \Phi(x) \rangle + b
= \sum_{\alpha_j > 0} y_j \alpha_j\, k(x_j, x) + b
\]


In essence, SVM is a method for estimating the components of the normal vector of the optimal (canonical) hyperplane. Is this estimate consistent? Theorems 1 and 2 confirm that it is for linear separation, where the VC dimension is finite. This no longer holds for nonlinear separation, where the problem is solved in a new space whose VC dimension may be infinite [1, 2]. However, SVM is a method that implements the SRM principle with a structure of nested function spaces of finite VC dimension [1]. The parameter C of the model controls the VC dimension of these hypothesis subspaces.

To determine the best value of the parameter C we have to solve an optimization problem [1]. This is a separate SVM topic beyond the scope of this project, so we mention it only as an assurance for applying SVM models to practical problems.


Unlike the separation problem, where y takes the value 1 or -1, the regression problem works with a continuous range of values of y. Intuitively, we will turn the data set into two classes so that a model can be built with SVM. The following example shows how SVM, which by nature works on two classes, can solve the regression problem.


Consider the data table:

y     x
1     1
1     3
2     3
3     3
4     5
4     6
7     6

The regression equation is shown in the following figure.

[Figure: the data points and the regression line.]

By duplicating each point and pushing the two copies to opposite sides along the y axis, we obtain two classes. SVM can now be applied.

[Figure: the duplicated data points forming two classes, one on each side of the regression line.]

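From the intuition above we now build the mathematical model. To see how the model is set up, consider the following sequence of formal transformations. Set

\[
\tilde{x} = \begin{pmatrix} x \\ y \end{pmatrix}, \qquad
\tilde{w} = \begin{pmatrix} w \\ -1 \end{pmatrix}
\]

The SVM problem for the new data vectors $\tilde{x}$ and the new normal vector $\tilde{w}$ is

\[
\begin{aligned}
& \tfrac{1}{2} \tilde{w}^T \tilde{w} + C e^T z \to \min \\
& \begin{cases}
Y(\tilde{X}^T \tilde{w} + b e) \ge e - z \\
z \ge 0
\end{cases}
\end{aligned}
\]

which is equivalent to

\[
\begin{aligned}
& \tfrac{1}{2} w^T w + C e^T z \to \min \\
& \begin{cases}
Y(X^T w - y + b e) \ge e - z \\
z \ge 0
\end{cases}
\end{aligned}
\]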
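Now replace each pair $(x_i, y_i)$ by two new pairs, split evenly between the two classes: $(x_i, y_i - 1 - \varepsilon)$ in the class labelled 1 and $(x_i, y_i + 1 + \varepsilon)$ in the class labelled $-1$. This gives the problem:

\[
\begin{aligned}
& \tfrac{1}{2} w^T w + C e^T (z^+ + z^-) \to \min \\
& \begin{cases}
X^T w - y + b e \ge -\varepsilon e - z^+ \\
-X^T w + y - b e \ge -\varepsilon e - z^- \\
z^+, z^- \ge 0
\end{cases}
\end{aligned}
\]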
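The point-doubling construction can be checked numerically: a point lies within $\varepsilon$ of the line $y = wx + b$ exactly when both of its copies satisfy the classification constraint with margin at least 1 for the augmented normal vector $(w, -1)$. The data, the values of `w`, `b` and `eps`, and the function names below are hypothetical and serve only as an illustration with scalar x.

```python
def double_points(data, eps):
    """Replace each (x, y) by a class +1 copy below and a class -1 copy above."""
    doubled = []
    for x, y in data:
        doubled.append((x, y - 1 - eps, 1))
        doubled.append((x, y + 1 + eps, -1))
    return doubled

def margin(w, b, x, t, label):
    # classifier on the augmented vector (x, t) with normal vector (w, -1)
    return label * (w * x - t + b)

data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.4)]   # hypothetical (x, y) pairs
w, b, eps = 1.0, 0.0, 0.2

inside_tube = [abs(y - (w * x + b)) <= eps for x, y in data]
doubled = double_points(data, eps)
both_margins_ok = [
    margin(w, b, *doubled[2 * i]) >= 1 and margin(w, b, *doubled[2 * i + 1]) >= 1
    for i in range(len(data))
]
print(inside_tube, both_margins_ok)
```

The two lists agree: the first two points lie inside the tube and need no slack, while the third lies 0.4 away from the line and violates the margin constraint of its class +1 copy.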
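This gives the mathematical model of SVM for linear regression:

\[
\begin{aligned}
& \tfrac{1}{2} w^T w + C e^T (z^+ + z^-) \to \min \\
& \begin{cases}
y_i - w^T x_i - b \le \varepsilon + z_i^+ \\
-y_i + w^T x_i + b \le \varepsilon + z_i^- \\
z_i^+, z_i^- \ge 0
\end{cases}
\end{aligned}
\]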
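For a fixed candidate $(w, b)$, the smallest feasible slacks and the value of the objective of this model are easy to compute. The sketch below does so; the function name `svr_objective`, the data and the parameter values are hypothetical.

```python
def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def svr_objective(w, b, C, eps, xs, ys):
    """Objective (1/2) w.w + C * sum(z+ + z-) with the smallest feasible slacks."""
    preds = [dot(w, x) + b for x in xs]
    # z+ absorbs targets more than eps above the prediction
    z_plus = [max(0.0, y - p - eps) for p, y in zip(preds, ys)]
    # z- absorbs targets more than eps below the prediction
    z_minus = [max(0.0, p - y - eps) for p, y in zip(preds, ys)]
    value = 0.5 * dot(w, w) + C * (sum(z_plus) + sum(z_minus))
    return value, z_plus, z_minus

xs = [(1.0,), (2.0,), (3.0,)]   # hypothetical inputs
ys = [1.0, 2.5, 2.0]            # hypothetical targets
value, z_plus, z_minus = svr_objective((1.0,), 0.0, C=2.0, eps=0.25, xs=xs, ys=ys)
print(value, z_plus, z_minus)
```

With these numbers only the second point has a positive $z^+$ (0.25) and only the third a positive $z^-$ (0.75), giving an objective value of 2.5; points inside the $\varepsilon$-tube contribute nothing.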

Intuition can help us find approaches suited to the problem. However, we do need to validate this model: we must ensure that the solution of the model is a consistent estimate for the regression problem. With the SVM approach, this means we have to address the VC dimension.

The SVM model for the separation problem uses hyperplanes as follows:

\[
\hat{y} = \begin{cases}
-1, & w^T x + b < 0 \\
1, & w^T x + b \ge 0
\end{cases}
\]

In that case all hypothesis spaces have finite VC dimension. The SVM model for the regression problem uses hyperplanes in a different way:

\[
\hat{y} = w^T x + b
\]