
VIETNAM NATIONAL UNIVERSITY, HANOI
COLLEGE OF TECHNOLOGY
BAYES ALGORITHM AND ITS APPLICATIONS
REGULAR FULL-TIME UNDERGRADUATE THESIS
Major: Information Technology
Supervisor: M.Sc. Nguyễn Nam Hải
Co-supervisor: M.Sc. Đỗ Hoàng Kiên
Writing a scientific thesis is one of the most difficult tasks I have ever had to complete. While working on this project I faced many difficulties and much that was new to me. Without the help and sincere encouragement of many teachers, friends and my family, I could hardly have completed this thesis.
First, I would like to sincerely thank Mr. Nguyễn Nam Hải and Mr. Đỗ Hoàng Kiên, who directly supervised me in completing this thesis. Thanks to them, I gained access to valuable sources of material, as well as precious advice along the way.
Besides this help, the staff of the Computer Center gave me every opportunity to access the system. I am grateful for the days and months I spent working alongside them, and I will never forget that wonderful time.
While I was gathering valuable knowledge, my teachers and friends were the people who stood by my side throughout my studies and research at the College of Technology.
In all these efforts, I cannot fail to mention the immense, unrepayable debt I owe to my parents, who gave me life, raised me, and always reminded and encouraged me to fulfil my duties well.
Nguyễn Văn Huy
Abstract
Statistics is a very important branch of mathematics with many significant practical applications: it helps people extract information from observed data in order to solve real-world problems.

This thesis presents a statistical approach to predicting events based on Bayesian theory, which computes the probability of an event from statistics of events observed in the past. After the theoretical part, we study a practical problem in information technology: automatic spam filtering. Solving this problem combines many techniques, such as DNS blacklists, recipient and sender checks, Bayesian filters, IP address blocking and blacklists/whitelists. A Bayesian filter is a smart approach that stays close to the user, because the users themselves train it to recognize spam. This thesis focuses on BayesSpam, an open-source spam filter installed on SquirrelMail, the open-source email system currently used for the College of Technology's email service (Coltech Mail). The results show that the filter's effectiveness varies depending on how users train it with the messages they consider spam, but overall the filter performs quite well.
Appendix A: The Filter's Database
References
Chapter 1 Introduction
1.1 Overview
Statistics plays an extremely important, indeed indispensable, role in any scientific research, especially in experimental sciences such as medicine, biology, agriculture, chemistry and even sociology. Experiments based on statistical methods can give science the most objective answers to the hardest problems.
Statistics is the science of collecting, analyzing, interpreting and presenting data in order to discover the nature and regularities of economic, social and natural phenomena. It rests on statistical theory, a branch of applied mathematics. In statistical theory, randomness and uncertainty can be modeled using probability theory. Because the goal of statistics is to produce the "most correct" information from the available data, many scholars regard statistics as a kind of decision theory.

Statistics also plays an important role in providing truthful, objective, accurate, complete and timely statistical information for assessing and forecasting situations, planning strategies and policies, building socio-economic development plans, and meeting the statistical information needs of organizations and individuals.
Among these roles, forecasting is one of the most significant: it involves an internal training process and can run automatically once trained. In other words, once we have knowledge drawn from statistical data or from users' experience, combined with a learning (training) method based on statistical theory, we obtain a practical system that can make decisions with fairly high accuracy.
Statistical analysis is an indispensable step in scientific research, especially in experimental science. A research study, however costly and important, will never have the chance to appear in a scientific journal if it is not analyzed with appropriate methods. Today, a glance at research journals around the world shows that almost every medical paper has a "Statistical Analysis" section, in which the authors must carefully describe how the analysis and computations were done and briefly explain why those methods were used, in order to "vouch for", or add scientific weight to, the statements in the paper. The more prestigious a medical journal, the more demanding its requirements for statistical analysis. Without statistical analysis, a paper cannot be considered a "scientific paper", and a research study cannot be considered complete.
In statistics there are two "competing" schools: the frequentist school and the Bayesian school. Most statistical methods in use today were developed by the frequentist school, but the Bayesian school is now on its way to "conquering" science with a "new" way of thinking about science and scientific inference. Frequentist methods are usually simpler than Bayesian ones; some have even quipped that those who do Bayesian statistics are geniuses!
To understand the fundamental difference between the two schools, it helps to say a few words about the philosophy of statistics through an example from medical research. To find out whether two treatments are equally effective, a researcher must collect data from two groups of patients (one treated with method A and the other with method B). The frequentist school asks: "if the two treatments are equally effective, what is the probability of the observed data?", whereas the Bayesian school asks: "given the observed data, what is the probability that treatment A is more effective than treatment B?". At first glance the two questions seem no different, but on reflection we see that the difference is a matter of scientific philosophy, and its implications are very important. For doctors (and scientists in general), Bayesian reasoning is very natural and fits practice well. In clinical medicine, a doctor must use test results to judge whether or not a patient has cancer (just as in scientific research we must use data to reason about the plausibility of a hypothesis).
1.2 Structure
The remainder of the thesis is organized as follows:
Chapter 2 presents the foundations of Bayesian theory, together with the concepts and methods used in the thesis.
Chapter 3 presents an advanced form of Bayesian theory, Naive Bayes. It covers its concepts, advantages and classification applications, which serve as the basis for building a text classification system.
Chapter 4 describes the filter in detail, including its knowledge base, how it is trained, how it works and possible improvements for spam filtering.
Chapter 5 presents conclusions about the BayesSpam filter application installed on the SquirrelMail email system.
Chapter 2 Theoretical Background
2.1 Statement of Bayes' Theorem
Bayes' theorem makes it possible to compute the probability that a random event A occurs given that a related event B has occurred. This probability is written P(A | B) and read "the probability of A given B". It is called the conditional probability or posterior probability, because it is derived from, or depends on, the given value of B.
According to Bayes' theorem, the probability of A given B depends on three quantities:
• The probability of A on its own, regardless of B, written P(A) and read "the probability of A". It is called the marginal or prior probability; it is "prior" in the sense that it does not take any information about B into account.
• The probability of B on its own, regardless of A, written P(B) and read "the probability of B". It acts as a normalizing constant: it is the same whatever event A we are interested in.
• The probability of B given that A has occurred, written P(B | A) and read "the probability of B given A". This quantity is called the likelihood of B given A. Be careful not to confuse the likelihood of B given A with the probability of A given B.
Knowing these three quantities, the probability of A given B is given by:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
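The theorem is easy to check numerically. The short Python sketch below applies it to the kind of diagnostic question raised in Chapter 1; the prevalence, sensitivity and false-positive rate are hypothetical numbers chosen for illustration, not data from this thesis, and P(B) is obtained by the law of total probability.

```python
def bayes_posterior(prior_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """Return P(A|B) = P(B|A) P(A) / P(B).

    P(B) is computed with the law of total probability:
    P(B) = P(B|A) P(A) + P(B|not A) (1 - P(A)).
    """
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    if p_b == 0.0:
        raise ValueError("P(B) is zero, so P(A|B) is undefined")
    return p_b_given_a * prior_a / p_b


# Hypothetical diagnostic test: A = "patient has the disease", B = "test is positive".
posterior = bayes_posterior(prior_a=0.01, p_b_given_a=0.90, p_b_given_not_a=0.05)
print(f"P(A|B) = {posterior:.4f}")  # P(A|B) = 0.1538
```

Even with a fairly accurate test, the small prior P(A) keeps P(A | B) near 15%, which illustrates why the likelihood P(B | A) must not be confused with P(A | B).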
2.2 Risk Minimization in Bayesian Classification
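Now consider the cork stopper problem: imagine that a factory produces two classes of stoppers, ω1 = Super and ω2 = Average.
Suppose further that the factory keeps a record of its production stocks, summarized as follows:
Number of cork stoppers produced of class ω1: n1
Number of cork stoppers produced of class ω2: n2
Total number of cork stoppers produced: n = n1 + n2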
From these records we can easily estimate the probability that a stopper belongs to either of the two classes; these are called the prior probabilities, or prevalences:
$$P(\omega_1) = n_1/n = 0.4, \qquad P(\omega_2) = n_2/n = 0.6 \tag{1-1}$$
Note that these prior probabilities are not entirely controlled by the factory; they depend mainly on the quality of the raw material, just as a cardiologist cannot control the prevalence of myocardial infarction in a population. Prevalences can therefore be regarded as "states of nature".
Suppose we must make a blind decision, for example assigning a class to an arbitrary cork stopper without knowing anything about it. If the only information available is the prior probabilities, we choose class ω2. This way we expect to be wrong only 40% of the time.
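The blind decision amounts to a two-line rule: pick the class with the largest prior, and the expected error rate is one minus that prior. The sketch below assumes hypothetical production counts in the ratio 2:3, which reproduce the prevalences in (1-1); the function names are illustrative.

```python
def priors_from_counts(counts: dict[str, int]) -> dict[str, float]:
    """Estimate prevalences P(w_i) = n_i / n from production counts."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def blind_decision(priors: dict[str, float]) -> tuple[str, float]:
    """Choose the class with the largest prior; the expected error is 1 - max prior."""
    best = max(priors, key=priors.get)
    return best, 1.0 - priors[best]


priors = priors_from_counts({"w1": 2, "w2": 3})  # hypothetical counts, ratio 2:3
label, error = blind_decision(priors)
print(label, round(error, 2))  # w2 0.4
```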
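Now suppose we can measure the feature vector x of the cork stopper, and let P(ωi | x) be the conditional probability that the stopper represented by x belongs to class ωi. If we can determine P(ω1 | x) and P(ω2 | x), the sensible decision is:
$$\begin{cases} x \in \omega_1 & \text{if } P(\omega_1 \mid x) > P(\omega_2 \mid x) \\ x \in \omega_2 & \text{if } P(\omega_1 \mid x) < P(\omega_2 \mid x) \\ \text{arbitrary} & \text{if } P(\omega_1 \mid x) = P(\omega_2 \mid x) \end{cases} \tag{1-2}$$
which can be condensed as:
$$\text{if } P(\omega_1 \mid x) > P(\omega_2 \mid x) \text{ then } x \in \omega_1 \text{ else } x \in \omega_2 \tag{1-2a}$$
The posterior probabilities P(ωi | x) can be computed if we know the pdfs (probability density functions) of the feature vector distributions of both classes. The quantity p(x | ωi), the probability density of observing the feature vector x for an object of class ωi, is called the likelihood of x. We then use Bayes' formula:
$$P(\omega_i \mid x) = \frac{P(\omega_i)\,p(x \mid \omega_i)}{p(x)}, \qquad p(x) = \sum_{j=1}^{c} P(\omega_j)\,p(x \mid \omega_j) \tag{1-3}$$
where c is the number of classes and p(x) is the total probability density of x.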
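Note that P(ωi) and P(ωi | x) are discrete probabilities, whereas p(x | ωi) and p(x) are values of probability density functions. Note also that p(x) is a common term on both sides of the comparison in (1-2a), so for two classes we can rewrite the rule as:
$$\text{if } p(x \mid \omega_1)\,P(\omega_1) > p(x \mid \omega_2)\,P(\omega_2) \text{ then } x \in \omega_1 \text{ else } x \in \omega_2 \tag{1-4}$$
or, equivalently:
$$\text{if } v(x) = \frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} > \frac{P(\omega_2)}{P(\omega_1)} \text{ then } x \in \omega_1 \text{ else } x \in \omega_2 \tag{1-4a}$$
In formula (1-4a), v(x) is called the likelihood ratio. The decision depends on how it compares with the inverse prevalence ratio P(ω2)/P(ω1), called the prevalence threshold.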
Figure 1: Histograms of feature N for the two classes of cork stoppers. The threshold value N = 65 is marked with a vertical line.
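Assume that each cork stopper is described by a single feature N, i.e. the feature vector is x = [N], and that we are presented with a stopper with x = [65]. From the histograms we can estimate the likelihoods:
$$p(x \mid \omega_1) = 20/24 = 0.833 \;\rightarrow\; P(\omega_1)\,p(x \mid \omega_1) = 0.4 \times 0.833 = 0.333 \tag{1-5a}$$
$$p(x \mid \omega_2) = 16/23 = 0.696 \;\rightarrow\; P(\omega_2)\,p(x \mid \omega_2) = 0.6 \times 0.696 = 0.418 \tag{1-5b}$$
Since 0.418 > 0.333, we assign x = [65] to class ω2, even though the likelihood of ω1 is larger than that of ω2.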
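The cork-stopper example can be reproduced in a few lines. The sketch below computes the posteriors with Bayes' formula and applies the likelihood-ratio rule (1-4a), using the likelihood estimates 20/24 and 16/23 from (1-5a) and (1-5b); the function names are illustrative, not part of any library.

```python
def posteriors(priors: list[float], likelihoods: list[float]) -> list[float]:
    """Bayes' formula: P(w_i|x) = P(w_i) p(x|w_i) / p(x)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # p(x), the total probability density of x
    return [j / evidence for j in joint]


def decide_two_class(priors: list[float], likelihoods: list[float]) -> int:
    """Rule (1-4a): choose w1 iff v(x) = p(x|w1) / p(x|w2) > P(w2) / P(w1)."""
    likelihood_ratio = likelihoods[0] / likelihoods[1]
    prevalence_threshold = priors[1] / priors[0]
    return 1 if likelihood_ratio > prevalence_threshold else 2


priors = [0.4, 0.6]                 # prevalences (1-1)
likelihoods = [20 / 24, 16 / 23]    # (1-5a), (1-5b) at x = [65]
post = posteriors(priors, likelihoods)
print([round(p, 3) for p in post])  # [0.444, 0.556]
print(decide_two_class(priors, likelihoods))  # 2
```

The likelihood ratio, about 1.198, favours ω1, but it falls below the prevalence threshold 0.6/0.4 = 1.5, so the stopper is assigned to ω2.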

Figure 2 illustrates the effect of adjusting the prevalence threshold, for the case of equal probability density functions:
• Equal prevalences. With equal pdfs, the decision threshold lies halfway between the class means. The number of misclassified cases, proportional to the shaded areas, is the same for both classes; this situation coincides with the minimum-distance classifier.
• Prevalence of ω1 larger than that of ω2. The decision threshold moves towards the class with the smaller prevalence, which reduces the number of misclassified cases of the class with the higher prevalence, as seems convenient.
Figure 2: Equal prevalences (a); unequal prevalences (b).
We see that the shift of the decision threshold has indeed given better performance for class ω2 than for class ω1. This seems reasonable, since class ω2 now occurs more often. However, since the overall error has increased, one may wonder whether the influence of the prevalences was beneficial after all. The answer to this question is related to the topic of classification risks, presented next.
Let us assume that a cork stopper of class ω1 costs 0.025 € and one of class ω2 costs 0.015 €, and that class ω1 stoppers are used in special bottles while class ω2 stoppers are used in normal bottles. Misclassifying an Average stopper as Super therefore costs 0.015 €, while misclassifying a Super stopper as Average costs the price difference, 0.025 − 0.015 = 0.01 €. We denote:
SB: the action of using a cork stopper in special bottles.
NB: the action of using a cork stopper in normal bottles.
ω1 = S (Super class); ω2 = A (Average class)
Figure 3: Classification results of the cork stoppers with unequal prevalences: 0.4 for class ω1 and 0.6 for class ω2 (discriminant analysis output; rows: observed classifications, columns: predicted classifications).
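Definition: let λij = λ(αi | ωj) denote the loss incurred by taking action αi when the true class is ωj; here αi ∈ {SB, NB}. Arranging the λij in a loss matrix Λ (rows: actions SB, NB; columns: true classes S, A) gives, for the cork stoppers:
$$\Lambda = \begin{bmatrix} 0 & 0.015 \\ 0.01 & 0 \end{bmatrix} \tag{1-6}$$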
Therefore, the risk (expected loss) of the action of using a cork stopper described by the feature vector x in special bottles can be expressed as:
$$R(\alpha_1 \mid x) = R(SB \mid x) = \lambda(SB \mid S)\,P(S \mid x) + \lambda(SB \mid A)\,P(A \mid x) = 0.015\,P(A \mid x) \tag{1-6a}$$
Similarly, for the action of using it in normal bottles:
$$R(\alpha_2 \mid x) = R(NB \mid x) = \lambda(NB \mid S)\,P(S \mid x) + \lambda(NB \mid A)\,P(A \mid x) = 0.01\,P(S \mid x) \tag{1-6b}$$
We assume that the risk is affected only by wrong decisions. Hence a correct decision causes no loss, λii = 0, as in (1-6).
We are interested in minimizing the average risk computed over an arbitrarily large number of cork stoppers. The Bayes rule for minimum risk achieves this by minimizing the individual conditional risks R(αi | x).
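The conditional risks in (1-6a) and (1-6b) are a matrix-vector product between the loss matrix (1-6) and the posterior vector, and the minimum-risk rule picks the smallest entry. The sketch below writes this for c classes, R(αi | x) = Σj λ(αi | ωj) P(ωj | x), which reduces to (1-6a) and (1-6b) for the two cork-stopper actions; the posterior values used are hypothetical.

```python
def conditional_risks(loss: list[list[float]], posterior: list[float]) -> list[float]:
    """R(a_i|x) = sum_j loss[i][j] * P(w_j|x), where loss[i][j] = lambda(a_i|w_j)."""
    return [sum(l * p for l, p in zip(row, posterior)) for row in loss]


def min_risk_action(loss, posterior, actions):
    """Return the action with the smallest conditional risk, plus all the risks."""
    risks = conditional_risks(loss, posterior)
    best = min(range(len(risks)), key=risks.__getitem__)
    return actions[best], risks


# Loss matrix (1-6): rows are actions (SB, NB), columns are true classes (S, A).
LOSS = [[0.0, 0.015],
        [0.01, 0.0]]
ACTIONS = ["SB", "NB"]

# Hypothetical posteriors [P(S|x), P(A|x)] for one stopper.
action, risks = min_risk_action(LOSS, [0.7, 0.3], ACTIONS)
print(action, [round(r, 4) for r in risks])  # SB [0.0045, 0.007]
```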
Assume first that all wrong decisions imply the same loss, which can be scaled to a unit loss:
$$\lambda_{ij} = \lambda(\alpha_i \mid \omega_j) = \begin{cases} 0 & \text{if } i = j \\ 1 & \text{if } i \neq j \end{cases} \tag{1-7a}$$
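In this case, since all posterior probabilities add up to one, we have to minimize:
$$R(\alpha_i \mid x) = \sum_{j \neq i} P(\omega_j \mid x) = 1 - P(\omega_i \mid x) \tag{1-7b}$$
This is equivalent to maximizing P(ωi | x), so the Bayes decision rule for minimum risk corresponds to generalizing the two-class rule to c classes:
$$\text{decide } \omega_i \text{ if } P(\omega_i \mid x) > P(\omega_j \mid x), \quad \forall j \neq i \tag{1-7c}$$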
In summary, when correct classifications incur no loss and wrong classifications incur a unit loss, the Bayes decision rule for minimum risk consists of choosing the class with the maximum posterior probability.
The decision function for class ωi is therefore:
$$g_i(x) = P(\omega_i \mid x) \tag{1-7d}$$
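With the zero-one loss (1-7a), each conditional risk equals 1 − P(ωi | x), so minimizing the risk and maximizing the posterior select the same class. The sketch below checks this on a hypothetical three-class posterior vector.

```python
def zero_one_loss(c: int) -> list[list[float]]:
    """Loss matrix (1-7a): 0 on the diagonal, 1 elsewhere."""
    return [[0.0 if i == j else 1.0 for j in range(c)] for i in range(c)]


def conditional_risks(loss, posterior):
    """R(a_i|x) = sum_j loss[i][j] * P(w_j|x)."""
    return [sum(l * p for l, p in zip(row, posterior)) for row in loss]


posterior = [0.2, 0.5, 0.3]  # hypothetical P(w_i|x), i = 1..3
risks = conditional_risks(zero_one_loss(3), posterior)
min_risk_class = min(range(3), key=risks.__getitem__) + 1
map_class = max(range(3), key=posterior.__getitem__) + 1  # class with largest g_i(x) = P(w_i|x)
print(min_risk_class, map_class)  # 2 2
```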
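Now consider the situation in which wrong decisions carry different losses, assuming for simplicity that c = 2. From expressions (1-6a) and (1-6b) it is easy to see that a cork stopper is assigned to class ω1 if:
$$\lambda_{21}\,P(\omega_1 \mid x) > \lambda_{12}\,P(\omega_2 \mid x) \;\Longrightarrow\; p(x \mid \omega_1)\,\lambda_{21}\,P(\omega_1) > p(x \mid \omega_2)\,\lambda_{12}\,P(\omega_2) \tag{1-8}$$
The threshold applied to the likelihood ratio is therefore weighted by the losses: x is assigned to ω1 when v(x) > λ12 P(ω2) / (λ21 P(ω1)). The Bayes decision rule can be implemented as shown in Figure 5. Equivalently, we can adjust the prior probabilities as follows:
$$P^*(\omega_1) = \frac{\lambda_{21}\,P(\omega_1)}{\lambda_{21}\,P(\omega_1) + \lambda_{12}\,P(\omega_2)}, \qquad P^*(\omega_2) = \frac{\lambda_{12}\,P(\omega_2)}{\lambda_{21}\,P(\omega_1) + \lambda_{12}\,P(\omega_2)} \tag{1-8a}$$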
Figure 5: Implementation of the Bayesian decision rule for two classes with different loss factors for wrong decisions.
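With λ12 = 0.015, λ21 = 0.01 and the prevalences (1-1), the adjusted prevalences are P*(ω1) = 0.308 and P*(ω2) = 0.692; Figure 6 shows the classification results obtained with them.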
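The adjusted prevalences for the cork stoppers can be computed directly from the losses λ12 = 0.015, λ21 = 0.01 and the prevalences (1-1). The sketch below also checks, on a small grid of hypothetical likelihood values, that the ordinary rule (1-4) applied with the adjusted priors makes the same decisions as the loss-weighted minimum-risk rule.

```python
from itertools import product

P1, P2 = 0.4, 0.6           # prevalences (1-1)
LAM12, LAM21 = 0.015, 0.01  # losses from the matrix (1-6)


def adjusted_priors(p1, p2, lam12, lam21):
    """P*(w1) = lam21 P1 / z and P*(w2) = lam12 P2 / z, with z = lam21 P1 + lam12 P2."""
    z = lam21 * p1 + lam12 * p2
    return lam21 * p1 / z, lam12 * p2 / z


PSTAR1, PSTAR2 = adjusted_priors(P1, P2, LAM12, LAM21)
print(round(PSTAR1, 3), round(PSTAR2, 3))  # 0.308 0.692


def decide_min_risk(l1, l2):
    """Loss-weighted rule: w1 iff p(x|w1) lam21 P(w1) > p(x|w2) lam12 P(w2)."""
    return 1 if l1 * LAM21 * P1 > l2 * LAM12 * P2 else 2


def decide_adjusted(l1, l2):
    """Rule (1-4) using the adjusted priors P*(w_i) instead of P(w_i)."""
    return 1 if l1 * PSTAR1 > l2 * PSTAR2 else 2


grid = [0.05, 0.3, 0.7, 1.1, 2.0]  # hypothetical likelihood values p(x|w_i)
agree = all(decide_min_risk(a, b) == decide_adjusted(a, b) for a, b in product(grid, repeat=2))
print(agree)  # True
```

Both rules compare the same two quantities up to the positive factor 1/z, which is why re-weighting the priors is enough to account for unequal losses.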
Figure 6: Classification matrix of the cork stoppers (rows: observed classifications; columns: predicted classifications) obtained with the adjusted prevalences p = 0.308 for class ω1 and p = 0.692 for class ω2.
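We can compute the average risk for the two-class case as:
$$R = \lambda_{12}\,Pe_{12} + \lambda_{21}\,Pe_{21}$$
where Pe_ij is the error probability of deciding class ωi when the true class is ωj. Let us use the training set to estimate these errors: Pe12 = 0.1 and Pe21 = 0.46 (see Figure 6). The average risk per cork stopper is then:
$$R = 0.015\,Pe_{12} + 0.01\,Pe_{21} = 0.015 \times 0.1 + 0.01 \times 0.46 = 0.0061 \text{ €}$$
For a set of classes Ω, we have the general formula:
$$R = \sum_{\omega_i \in \Omega} \sum_{\omega_j \in \Omega} \lambda_{ij}\,Pe_{ij} \tag{1-9}$$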
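Formula (1-9) is a loss-weighted sum of error probabilities. The sketch below evaluates it for the two cork-stopper classes with the losses of (1-6) and the training-set estimates Pe12 = 0.1 and Pe21 = 0.46; the diagonal entries of the error matrix are set to zero only as placeholders, since λii = 0 removes them from the sum anyway.

```python
def average_risk(loss: list[list[float]], pe: list[list[float]]) -> float:
    """(1-9): R = sum_i sum_j lambda_ij * Pe_ij.

    pe[i][j] is the probability of deciding class w_i when the true class is w_j.
    """
    c = len(loss)
    return sum(loss[i][j] * pe[i][j] for i in range(c) for j in range(c))


LOSS = [[0.0, 0.015],   # lambda_11, lambda_12
        [0.01, 0.0]]    # lambda_21, lambda_22
PE = [[0.0, 0.10],      # Pe_12 = 0.1 (diagonal: placeholder, multiplied by lambda_ii = 0)
      [0.46, 0.0]]      # Pe_21 = 0.46
risk = average_risk(LOSS, PE)
print(f"R = {risk:.4f} EUR per stopper")  # R = 0.0061 EUR per stopper
```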
The Bayes decision rule is not the only choice in statistical classification. Note also that, in practice, one of the main difficulties in minimizing the average risk is that we must use estimates of the probability density functions computed from a training set, as we did above for the cork stoppers. If we have grounds to assume a parametric model for the pdfs, we can instead estimate the appropriate parameters from the training set. Alternatively, we can use empirical risk minimization (ERM), whose principle is to minimize the empirical risk, measured on the training data, instead of the true risk.
2.3 Normal Bayesian Classification
Up to now we have not assumed any particular distribution model for the likelihoods. Frequently, however, the normal model is a reasonable assumption. It is related to the well-known central limit theorem, according to which the (suitably normalized) sum of a large number of independent, identically distributed random variables converges to the normal law. In practice, we obtain a good approximation to the normal law even when only a relatively small number of random variables are added. For features that can be regarded as the result of adding independent variables, the normal assumption is therefore often acceptable.
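The normal likelihood of class ωi is expressed by the probability density function:
$$p(x \mid \omega_i) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_i|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x - \mu_i)'\,\Sigma_i^{-1}\,(x - \mu_i)\right) \tag{1-10}$$
with
$$\mu_i = E_i[x] \quad \text{(mean vector of class } \omega_i\text{)} \tag{1-10a}$$
$$\Sigma_i = E_i\big[(x - \mu_i)(x - \mu_i)'\big] \quad \text{(covariance matrix of class } \omega_i\text{)} \tag{1-10b}$$
where d is the dimension of the feature vector x. Figure 7 illustrates the normal distribution in the two-dimensional case.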
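Formula (1-10) translates directly into NumPy. The sketch below is an illustrative implementation (the function name is hypothetical); it solves a linear system instead of forming Σi^{-1} explicitly, a common numerically safer choice, and it assumes Σi is symmetric positive definite.

```python
import numpy as np


def normal_likelihood(x, mu, sigma) -> float:
    """Evaluate the normal pdf (1-10) at x for mean vector mu and covariance sigma."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = mu.size
    diff = x - mu
    mahalanobis_sq = diff @ np.linalg.solve(sigma, diff)  # (x - mu)' Sigma^{-1} (x - mu)
    norm = (2.0 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return float(np.exp(-0.5 * mahalanobis_sq) / norm)


# At the mean of a standard bivariate normal the density is 1 / (2 pi).
print(round(normal_likelihood([0.0, 0.0], [0.0, 0.0], np.eye(2)), 4))  # 0.1592
```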
Consider a training set of n patterns T = {x1, x2, ..., xn}, characterized by a distribution with probability density function p(T | θ), where θ is a parameter vector of the distribution (for example, the mean vector of a normal distribution). A useful way to obtain a sample estimate of the parameter vector is to maximize p(T | θ), which, viewed as a function of θ, is called the likelihood of θ for the training set. Assuming that each pattern is drawn independently from a potentially infinite population, we can express this likelihood as:
$$p(T \mid \theta) = \prod_{i=1}^{n} p(x_i \mid \theta)$$
When using maximum likelihood estimation of distribution parameters, it is often easier to maximize ln[p(T | θ)], which is equivalent because the logarithm is a monotonically increasing function. For Gaussian distributions, the sample estimates corresponding to formulas (1-10a) and (1-10b), namely the sample mean and the sample covariance (computed with divisor n), are the maximum likelihood estimates, and they converge to the true values as the number of samples grows.
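For the normal model, the maximum likelihood estimates are the sample mean and the sample covariance with divisor n. The sketch below computes them with NumPy on synthetic data (the generating mean and mixing matrix are arbitrary, hypothetical values) and checks that perturbing the estimates lowers the log-likelihood ln p(T | θ).

```python
import numpy as np


def gaussian_mle(samples):
    """Maximum likelihood estimates of the mean vector and covariance (divisor n)."""
    X = np.asarray(samples, dtype=float)
    mu = X.mean(axis=0)
    centered = X - mu
    sigma = centered.T @ centered / X.shape[0]
    return mu, sigma


def log_likelihood(samples, mu, sigma) -> float:
    """ln p(T|theta) = sum_i ln p(x_i|theta), assuming independent samples."""
    X = np.asarray(samples, dtype=float)
    n, d = X.shape
    diff = X - mu
    mahalanobis_sq = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)
    _, logdet = np.linalg.slogdet(sigma)
    return float(-0.5 * mahalanobis_sq.sum()
                 - 0.5 * n * d * np.log(2 * np.pi) - 0.5 * n * logdet)


rng = np.random.default_rng(0)
# Synthetic data with an arbitrary (hypothetical) mixing matrix and mean.
X = rng.normal(size=(500, 2)) @ np.array([[1.0, 0.3], [0.0, 0.8]]) + np.array([1.0, -2.0])
mu_hat, sigma_hat = gaussian_mle(X)
best = log_likelihood(X, mu_hat, sigma_hat)
print(best > log_likelihood(X, mu_hat + 0.1, sigma_hat))  # True
print(best > log_likelihood(X, mu_hat, 1.2 * sigma_hat))  # True
```

Dividing by n − 1 instead of n gives the unbiased covariance estimate, which differs slightly from the maximum likelihood one for small samples.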
Figure 7: The bell-shaped surface of a two-dimensional normal distribution. An ellipse of points with equal probability density is also shown.
As can be seen from (1-10), the surfaces of equal probability density of a normal likelihood satisfy the Mahalanobis metric, i.e. they are the surfaces on which:
$$(x - \mu_i)'\,\Sigma_i^{-1}\,(x - \mu_i) = \text{constant}$$
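We now compute the decision functions for normally distributed features. Dropping the common term p(x), we can use:
$$g_i(x) = P(\omega_i)\,p(x \mid \omega_i) \;\propto\; P(\omega_i \mid x) \tag{1-11}$$
Taking logarithms, a monotonic transformation that does not change the decision, gives:
$$g_i(x) = \ln P(\omega_i) + \ln p(x \mid \omega_i) \tag{1-11a}$$
$$g_i(x) = -\tfrac{1}{2}(x - \mu_i)'\,\Sigma_i^{-1}\,(x - \mu_i) - \tfrac{d}{2}\ln 2\pi - \tfrac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i) \tag{1-11b}$$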
Using these decision functions, which clearly depend on the Mahalanobis metric, we can build the Bayes classifier with minimum risk, called the optimum classifier; in general its decision surfaces are quadratic. Note that formula (1-11b) uses the true value of the Mahalanobis distance, whereas earlier we used an estimate of this distance.
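The quadratic discriminant (1-11b) and the resulting optimum classifier can be sketched as below; the class parameters in the example are hypothetical. Working in the logarithmic form avoids numerical underflow of the densities, and `np.linalg.slogdet` returns ln|Σi| directly.

```python
import numpy as np


def quadratic_discriminant(x, mu, sigma, prior) -> float:
    """g_i(x) of (1-11b) for one class with mean mu, covariance sigma and prior P(w_i)."""
    x, mu, sigma = (np.asarray(a, dtype=float) for a in (x, mu, sigma))
    d = mu.size
    diff = x - mu
    mahalanobis_sq = diff @ np.linalg.solve(sigma, diff)
    _, logdet = np.linalg.slogdet(sigma)
    return float(-0.5 * mahalanobis_sq - 0.5 * d * np.log(2 * np.pi)
                 - 0.5 * logdet + np.log(prior))


def classify(x, classes) -> int:
    """Return the 1-based index of the class with the largest g_i(x)."""
    scores = [quadratic_discriminant(x, *params) for params in classes]
    return int(np.argmax(scores)) + 1


# Hypothetical classes: (mean, covariance, prior).
CLASSES = [
    ([0.0, 0.0], np.eye(2), 0.5),        # compact class w1
    ([1.0, 1.0], 9.0 * np.eye(2), 0.5),  # widely spread class w2
]
print(classify([0.0, 0.0], CLASSES), classify([6.0, 6.0], CLASSES))  # 1 2
```

Because the two covariances differ, the decision boundary here is a closed curve around the compact class rather than a straight line.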
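When all classes share the same covariance matrix (Σi = Σ), the terms −(d/2) ln 2π and −½ ln|Σ| are common to all classes and can be dropped, giving:
$$h_i(x) = -\tfrac{1}{2}(x - \mu_i)'\,\Sigma^{-1}\,(x - \mu_i) + \ln P(\omega_i)$$
For two classes, the quadratic term x'Σ^{-1}x also cancels in the difference g(x) = h1(x) − h2(x), so we obtain a linear decision function, with x assigned to ω1 when g(x) > 0:
$$g(x) = w'x + w_0 \tag{1-12}$$
$$w = \Sigma^{-1}(\mu_1 - \mu_2), \qquad w_0 = -\tfrac{1}{2}(\mu_1 - \mu_2)'\,\Sigma^{-1}(\mu_1 + \mu_2) + \ln\frac{P(\omega_1)}{P(\omega_2)} \tag{1-12a}$$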
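With a common covariance matrix, the difference between the two class scores hi(x) = −½(x − μi)'Σ^{-1}(x − μi) + ln P(ωi) collapses to the linear form (1-12). The sketch below computes w and w0 as in (1-12a) and checks numerically, on hypothetical parameters, that w'x + w0 matches h1(x) − h2(x) at random points.

```python
import numpy as np


def linear_discriminant(mu1, mu2, sigma, p1, p2):
    """(1-12a): w = Sigma^{-1}(mu1 - mu2),
    w0 = -1/2 (mu1 - mu2)' Sigma^{-1} (mu1 + mu2) + ln(p1 / p2)."""
    mu1, mu2, sigma = (np.asarray(a, dtype=float) for a in (mu1, mu2, sigma))
    sigma_inv = np.linalg.inv(sigma)
    w = sigma_inv @ (mu1 - mu2)
    w0 = -0.5 * (mu1 - mu2) @ sigma_inv @ (mu1 + mu2) + np.log(p1 / p2)
    return w, float(w0)


def h(x, mu, sigma_inv, prior):
    """h_i(x) = -1/2 (x - mu_i)' Sigma^{-1} (x - mu_i) + ln P(w_i)."""
    diff = x - mu
    return -0.5 * diff @ sigma_inv @ diff + np.log(prior)


# Hypothetical two-class problem with a shared covariance matrix.
MU1, MU2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
SIGMA = np.array([[2.0, 0.5], [0.5, 1.0]])
P1, P2 = 0.4, 0.6

w, w0 = linear_discriminant(MU1, MU2, SIGMA, P1, P2)
sigma_inv = np.linalg.inv(SIGMA)
points = np.random.default_rng(1).normal(size=(5, 2))
linear = points @ w + w0
quadratic_diff = np.array([h(x, MU1, sigma_inv, P1) - h(x, MU2, sigma_inv, P2) for x in points])
print(np.allclose(linear, quadratic_diff))  # True
```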
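For two classes with normal distributions, equal prior probabilities and equal covariance matrices, there is a very simple formula for the error probability of the optimum classifier:
$$Pe = 1 - \operatorname{erf}(\delta/2) \tag{1-13}$$
$$\delta^2 = (\mu_1 - \mu_2)'\,\Sigma^{-1}\,(\mu_1 - \mu_2) \tag{1-13a}$$
$$\operatorname{erf}(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2}\,dt \tag{1-13b}$$
Here δ² is the square of the Bhattacharyya distance, a Mahalanobis distance between the class means, which reflects class separability. Note that in (1-13) erf denotes the standard normal cumulative distribution function, not the usual error function.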
Figure 8 shows how Pe behaves as the squared Bhattacharyya distance increases. Pe decreases roughly exponentially and converges asymptotically to 0. It is therefore increasingly difficult to lower the classification error once it is already small.
Note that even when the distributions are not normal, as long as they are symmetric and obey the Mahalanobis metric, we obtain the same decision surfaces as for the normal classifier, although the error estimates and posterior probabilities differ. To illustrate this, consider two classes with equal prior probabilities and three kinds of symmetric distributions with the same spread and means 0 and 2.3, as shown in Figure 9.
Figure 9: Two classes with symmetric distributions and the same standard deviation: (a) Normal; (b) Cauchy; (c) Logistic.
The optimum classifier uses the same decision threshold, 1.15, in all three cases, but the classification errors differ:
Normal: Pe = 1 − erf(2.3/2) = 12.5%
Cauchy: Pe = 22.7%
Logistic: Pe = 24.0%
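With equal priors and two symmetric densities of the same shape centred at 0 and 2.3, the optimum threshold is the midpoint 1.15 and both classes have the same error, so Pe is the tail probability of one density beyond 1.15. The sketch below assumes unit scale parameters for all three families, which the text does not state, and it wraps Python's `math.erfc` because the erf of (1-13) is the standard normal CDF. Under these assumptions the Normal and Logistic values match the text, while the Cauchy value comes out at about 22.8%, slightly above the quoted 22.7%.

```python
import math

MEAN_1, MEAN_2 = 0.0, 2.3
THRESHOLD = (MEAN_1 + MEAN_2) / 2  # 1.15: optimal for equal priors and identical symmetric shapes


def normal_tail(t: float) -> float:
    """P(X > t) for a standard normal X, i.e. 1 - Phi(t)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def cauchy_tail(t: float) -> float:
    """P(X > t) for a Cauchy X with location 0 and scale 1 (assumed scale)."""
    return 0.5 - math.atan(t) / math.pi


def logistic_tail(t: float) -> float:
    """P(X > t) for a logistic X with location 0 and scale 1 (assumed scale)."""
    return 1.0 / (1.0 + math.exp(t))


# Class w1 is centred at MEAN_1, so its error is the probability mass beyond the threshold.
tails = [("Normal", normal_tail), ("Cauchy", cauchy_tail), ("Logistic", logistic_tail)]
errors = {name: tail(THRESHOLD - MEAN_1) for name, tail in tails}
for name, pe in errors.items():
    print(f"{name}: Pe = {pe:.1%}")
```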
Experimental results show that, when the covariance matrices deviate only moderately from one another, a classifier designed under the equal-covariance assumption can perform about as well as the optimum one. This is reasonable: when the covariances do not differ much, the difference between the quadratic and linear solutions is only significant for patterns far from the class prototypes, as shown in Figure 10.
Figure 10: Discrimination of two classes with the optimum quadratic classifier (solid line) and the sub-optimum linear classifier (dotted line).
We illustrate this using the Norm2c2d dataset. The theoretical error for this two-class, two-dimensional dataset follows from (1-13) and (1-13a) with:
$$\delta^2 = \begin{bmatrix} 2 & 3 \end{bmatrix} \begin{bmatrix} 0.8 & -0.8 \\ -0.8 & 1.6 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \end{bmatrix} = 8, \qquad Pe = 1 - \operatorname{erf}(\sqrt{8}/2) \approx 7.9\%$$
The training set error estimate for this dataset is 5%. By introducing deviations of ±0.1 into the values of the transformation matrix A of the dataset, with deviations between 15% and 42% in the covariance values, we obtained a training set error of 6%.
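The theoretical error for Norm2c2d can be checked with (1-13) and (1-13a). The sketch below assumes, as the layout of the printed expression suggests, that the 2×2 matrix is Σ^{-1} and that [2 3] is the difference of the class means; it computes 1 − Φ(z) as ½ erfc(z/√2), because the erf of (1-13) denotes the standard normal CDF.

```python
import math

import numpy as np

SIGMA_INV = np.array([[0.8, -0.8],
                      [-0.8, 1.6]])  # assumed to be Sigma^{-1}
MEAN_DIFF = np.array([2.0, 3.0])     # assumed to be mu_1 - mu_2

delta_sq = float(MEAN_DIFF @ SIGMA_INV @ MEAN_DIFF)  # (1-13a)
delta = math.sqrt(delta_sq)
pe = 0.5 * math.erfc((delta / 2) / math.sqrt(2.0))  # (1-13): 1 - Phi(delta / 2)
print(f"delta^2 = {delta_sq:.2f}, Pe = {pe:.4f}")  # delta^2 = 8.00, Pe = 0.0786
```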
