TẠP CHÍ KHOA HỌC ĐẠI HỌC VĂN LANG (Van Lang University Journal of Science)

BUILDING A SET OF COMPARISON-WORD LABELS FOR ANALYZING USER SENTIMENT FROM VIETNAMESE COMMENTS
(Xây dựng tập nhãn từ so sánh để phân tích cảm xúc người dùng từ những bình luận tiếng Việt)

LÝ THỊ HUYỀN CHÂU (*)

ABSTRACT: Comparison sentences play an important role in expressing a writer's feelings about a topic of interest: the writer compares it with other objects in order to judge whether it is good or not. This paper builds a set of labels for identifying comparison sentences in Vietnamese comments from a specific domain (websites selling mobile phones), and applies the set of comparison-word labels to analyze users' sentiment from their comments. The label set is built step by step by analyzing data from a specific domain, applying language processing programs, and drawing on the rich vocabulary of the Vietnamese sentiment dictionary in order to reach highly accurate analysis results. The effectiveness of the method is demonstrated through an application program built to evaluate the accuracy of the label set in identifying comparison sentences in Vietnamese comments.
Keywords: label, comparison, sentiment, positive score, negative score.
1. INTRODUCTION
The growing number of web-enabled devices allows people to communicate with one another in web communities in many forms, such as forums, social networks, and blogs. As a result, users in these communities generate a large volume of heterogeneous data, in which user comments are a vast resource of practical value. Today, businesses routinely use online social networks to promote their activities, and use the built-in services of existing social media sites to support their business operations.
In the era of social networks, users write comparative comments to express their attitude toward a product they care about, or read earlier comments by people who have already looked into a product (consumer electronics such as computers and phones) to learn about it. Businesses, for their part, want to know how users rate their products from these comparative comments. Such comments express the commenter's judgment of the company's product relative to another product, which may be better or worse, and they influence the psychology and feelings of many other readers.

(*) MSc., Van Lang University.

No. 02 / 2017

2. CURRENT STATE OF RESEARCH ON SENTIMENT ANALYSIS FROM COMPARATIVE COMMENTS
Recognizing the importance of extracting opinions from comparative comments, study [10] proposes a method for determining user sentiment by identifying the opinion holder and the sentiment words that give rise to multiple sentiments within one sentence. However, identifying the opinion holder does not give good results when a sentence contains more than one holder.
Another study, by Jindal and Liu [7, pp. 244-251], shows that identifying comparison sentences is useful for analyzing the sentences of a document. Recognizing the importance of comparison sentences, the paper sets out the problems of identifying and classifying them, defines labels, and then applies supervised learning to identify comparison sentences in documents by combining CSR (Class Sequential Rules) with machine learning.
In addition, paper [4, pp. 417-422] studies the construction of a sentiment dictionary based on English vocabulary, with positive and negative score weights. It builds the SentiWordNet lexical dictionary as a public resource for other opinion mining research.
A similar study by Jindal and Liu [8, pp. 1331-1336] classifies the types of comparison sentences, identifies their distinguishing characteristics, and determines the positions of entities in order to produce accurate opinion mining results. However, it does not identify the other objects in a comparison sentence and covers English only.
Opinion mining at the sentence and clause level is carried out in [5, pp. 241-248]. That study proposes solutions to the problems left open by the authors' earlier work, and its results are a good reference for opinion mining.
In [14, pp. 230-235], the authors analyze the main tasks involved in mining opinions from customers' web comments about the products and services they use. The study gives an overview by presenting the many tasks and techniques that support opinion mining.
Another study, [6, pp. 211-217], mines opinions from microblogs on the internet by extracting the adjectives belonging to a specific domain, and proposes a new approach that extracts adjectives automatically in order to derive user opinions from documents collected on the internet.
Treating opinion mining as an extraction task over a collection of documents, study [2, pp. 523-526] evaluates an approach that uses quotations extracted from news provided by the Europe Media Monitor (EMM) news gathering tool. Because it works on a special kind of data (quotations), this study broadens the ways in which consumer opinions can be discovered.
Sentence-level sentiment analysis is carried out in [9, p. 153] by building a rule-based sentiment analysis system on the GATE framework. The study reports highly accurate sentiment analysis results for several products on both training and test data, and lays the groundwork for exploring further problems in Vietnamese sentiment analysis.
In addition, [1, pp. 17-23] presents the construction of the SentiWordNet lexical dictionary, which helps users classify sentiment and extract opinions. However, the vocabulary in the dictionary is incomplete and only serves a specific domain.
Using data collected from Twitter, [11, pp. 538-541] investigates the usefulness of linguistic features for detecting the sentiment of Twitter messages. This is an evaluation of the resources used, and it is genuinely useful for the many studies that rely on them for opinion mining.
Recognizing the importance of keywords in opinion extraction, study [3, pp. 56-59] focuses on determining a keyword set for classifying and extracting opinions. It presents a keyword set for sentiment classification and evaluates its effectiveness, contributing to later opinion mining research.
Extracting users' opinion bias from texts on social networks is addressed in [12, pp. 538-547], which provides a method for detecting a user's bias from the personal opinions they post on Twitter. The study contributes a new algorithm for detecting the bias of the author of a text.
Sentiment analysis based on a Vietnamese sentiment dictionary is carried out in [15, pp. 136-148]. The dictionary is fairly accurate because it is built on SentiWordNet together with sentiment words extracted from social network sites in a specific domain. The study provides a Vietnamese sentiment dictionary with a fairly large vocabulary that is useful for opinion mining.
In natural language processing, study [16] argues that extracting user sentiment on social networks is by nature a machine learning process. Working from comments and microblogs on social networks, the study finds that human behavior is expressed to a large extent through language and needs to be recorded.
The studies above show that most opinions have been extracted from English comments, with little focus on comparison sentences. Building a label set that identifies comparison sentences in Vietnamese comparative comments from a specific domain, in order to produce sentiment analysis results, is therefore a problem of considerable current interest.
3. PART-OF-SPEECH TAGGING SOFTWARE AND THE VIETNAMESE SENTIMENT DICTIONARY
3.1. Vietnamese part-of-speech tagging software
vnTagger is open-source software by Lê Hồng Phương for word segmentation and part-of-speech tagging of Vietnamese text. Study [13, p. 12] describes the tag set used by vnTagger, which consists of 18 part-of-speech tags. The version we use is 4.2.0, released in April 2010.
3.2. The Vietnamese sentiment dictionary
Using a dictionary to extract sentiment is one of the main approaches to opinion mining. In [15], the authors used the English vocabulary of SentiWordNet to build a Vietnamese dictionary of 26,186 sentiment words (adjectives, adverbs, nouns, and verbs), in which each sentiment word carries a positive and a negative score weight. In addition, the dictionary was built on a specific domain, namely comments collected from commercial websites, mobile phones and computers in particular, so it fits the purpose of this study well. Because the dictionary was already built on SentiWordNet and WordNet, this study uses the SentiWordNet corpus only as a database for checking the accuracy of the dictionary. Reference [1] describes the components of SentiWordNet as follows:
Synset: a record made up of 6 columns separated by <tab>.
POS: the part of speech of the word.
ID: the identifier of the synset.
PosScore: the positive weight of the word.
NegScore: the negative weight of the word.
SynsetTerms: the terms in the synset. A synset may contain several words, and these words are synonyms of one another. A word can occur in several different contexts with different Pos(s)/Neg(s) weights, so each word is given a sense number to tell the occurrences apart.

Figure 1. Sample data rows from the Vietnamese sentiment dictionary (columns: POS, ID, PosScore, NegScore, SynsetTerms)
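The record layout described above can be read with a few lines of code. The following is a minimal sketch, not the author's implementation: it assumes tab-separated columns in the order POS, ID, PosScore, NegScore, SynsetTerms, and the SentiWordNet convention of writing each term as `word#sense`. The sample line, the `SynsetRecord` class, and the `parse_record` function are hypothetical names and data introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SynsetRecord:
    pos: str                      # part of speech of the entry
    synset_id: str                # identifier of the synset
    pos_score: float              # positive weight
    neg_score: float              # negative weight
    terms: List[Tuple[str, int]]  # (word, sense number) pairs


def parse_record(line: str) -> SynsetRecord:
    """Parse one tab-separated dictionary row (assumed column order)."""
    fields = line.rstrip("\n").split("\t")
    pos, synset_id, pos_score, neg_score, synset_terms = fields[:5]
    terms = []
    for term in synset_terms.split():
        if "#" in term:
            word, _, sense = term.rpartition("#")
            terms.append((word, int(sense)))
        else:
            # Assumption: a term without a sense number is treated as sense 1.
            terms.append((term, 1))
    return SynsetRecord(pos, synset_id, float(pos_score), float(neg_score), terms)


# Hypothetical row, not taken from the real dictionary.
record = parse_record("a\t00001740\t0.125\t0\tđẹp#1 xinh_đẹp#2")
```

Keeping the sense number alongside each word makes it possible to tell apart the different weights that the same word receives in different contexts.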




4. PROPOSED SENTIMENT ANALYSIS METHOD BASED ON THE VIETNAMESE SENTIMENT DICTIONARY
4.1. Identifying Vietnamese comparison types
Vietnamese resembles English in its types of comparison, which are described in detail in [5]. Vietnamese comment sentences usually belong to one of the three comparison types below; the remaining sentences are either ordinary sentences or irregular sentences:
Superlative sentences: sentences that rank one object above or below all the others. They usually contain words such as: nhất, số 1,...
Example: "iPhone là dòng điện thoại đẹp nhất" (the iPhone is the most beautiful phone line).
Equative sentences: sentences that state that objects are equivalent in some characteristic. They usually contain words such as: như nhau, giống,...
Example: "iPhone và Android là hai dòng điện thoại cảm ứng tốt như nhau" (iPhone and Android are two equally good touchscreen phone lines).
Comparative sentences: sentences that compare greater and lesser, giving an ordering between objects. They usually contain words such as: hơn, thua,...
Example: "iPhone chụp hình đẹp hơn Nokia" (the iPhone takes better photos than Nokia).
Ordinary sentences: ordinary comments that express no comparison and give no ordering between objects.
Example: "Điện thoại iPhone cảm ứng rất tốt" (the iPhone's touchscreen is very good).
Irregular sentences: sentences written in slang, without diacritics, or in teenage or social network jargon,...
Example: "Điện thoại iPhone thì chuẩn cơm mẹ nấu" (slang, roughly: the iPhone is just right).
This study focuses on Vietnamese comparative comments, so ordinary and irregular sentences can be ignored in the analysis. They are still collected, however, in order to assess the proportion of comparison sentences relative to ordinary sentences in comments gathered from commercial websites. Table 1 lists the comparison types on which we focus.

Table 1. List of comparison sentence types
No.  Comparison type               Label
1    Superlative (so sánh nhất)    N
2    Comparative (so sánh hơn)     H
3    Equative (so sánh bằng)       B

4.2. Determining the word set for each comparison type
Working from the comments collected from commercial websites, the researcher manually identified the comparative comments and built a word set for each comparison type. The initial result contains 16 words (with labels N: superlative, H: comparative, B: equative).

Table 2. Initial list of words by comparison type
No.  Label  Word
1    N      nhất
2    N      no 1
3    N      number 1
4    N      số 1
5    N      số một
6    N      number one
7    H      hơn
8    H      thua
9    H      kém
10   B      giống
11   B      same
12   B      co
13   B      y xì
14   B      như
15   B      bằng
16   B      đều

The accuracy of the algorithm with these 16 initial words is summarized in Table 3. The statistics were computed on 705 comment sentences taken from 5 randomly chosen topics.

Table 3. Accuracy of the algorithm for identifying and labeling comparison sentences
No.  Topic                                                                         Comment sentences  Correct  Accuracy
1    Which phone has a camera that takes better photos than the iPhone 6?         98                 88       89%
2    After using the iPhone 6 Plus, switch to the Note 4 or the HTC One M9?       246                231      94%
3    Is the Galaxy Note 4 or the iPhone 6 Plus better suited to secretarial work?  105                100      96%
4    The Galaxy S6 battery is better than the iPhone 6                            67                 63       94%
5    Galaxy S6 Edge and iPhone 6 Plus compete on image stabilization              189                172      91%


Table 3 shows that, with the initial set of 16 words in Table 2, the average accuracy of the algorithm for identifying and labeling comparison sentences is 92.8%.
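The average can be reproduced from the counts in Table 3. The sketch below is illustrative only: the source does not say whether the average is taken over topics or over all sentences, so both variants are computed, and the function names are introduced here for that purpose. Averaging the five rounded percentages printed in the table, (89 + 94 + 96 + 94 + 91) / 5, also gives 92.8.

```python
# (comment sentences, correctly labeled sentences) per topic, from Table 3.
TABLE_3 = [(98, 88), (246, 231), (105, 100), (67, 63), (189, 172)]


def macro_accuracy(rows):
    """Mean of the per-topic accuracies (each topic weighs the same)."""
    return sum(correct / total for total, correct in rows) / len(rows)


def micro_accuracy(rows):
    """Correct sentences over all sentences (each sentence weighs the same)."""
    return sum(correct for _, correct in rows) / sum(total for total, _ in rows)


macro = round(100 * macro_accuracy(TABLE_3), 1)
micro = round(100 * micro_accuracy(TABLE_3), 1)
total_sentences = sum(total for total, _ in TABLE_3)
```

Both ways of averaging round to the same figure on this data, so the reported value does not depend on which one was used.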
The algorithm's errors are concentrated mainly in phrases built around the word "như" in the Table 3 data: the sentence contains "như" but does not express an equative comparison, for example: hầu như, như vậy thôi, mong như thế, giá như, như kiểu của em, đơn cử như, như thế là, như sau, như cách nhìn,... The word "hơn" can likewise lead to a few wrong cases, such as: hơn 1 năm,...
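The keyword lookup and the error cases just described can be illustrated as follows. This is a minimal sketch under stated assumptions, not the paper's algorithm: it assumes that labeling amounts to whole-word matching against the word list of Table 2, it restores Vietnamese diacritics in the keywords, and it leaves out entry 12 of Table 2, whose spelling is ambiguous in the source. The names `INITIAL_WORDS` and `comparison_labels` are hypothetical.

```python
import re
import unicodedata

# Initial word list from Table 2 (entry 12 omitted: its spelling is unclear).
INITIAL_WORDS = {
    "nhất": "N", "no 1": "N", "number 1": "N",
    "số 1": "N", "số một": "N", "number one": "N",
    "hơn": "H", "thua": "H", "kém": "H",
    "giống": "B", "same": "B", "y xì": "B",
    "như": "B", "bằng": "B", "đều": "B",
}


def _normalize(text: str) -> str:
    # Use one Unicode form so that accented letters compare equal.
    return unicodedata.normalize("NFC", text).lower()


# One whole-word pattern per keyword (no letter or digit on either side).
_PATTERNS = [
    (re.compile(r"(?<!\w)" + re.escape(_normalize(word)) + r"(?!\w)"), label)
    for word, label in INITIAL_WORDS.items()
]


def comparison_labels(sentence: str) -> set:
    """Return the set of comparison labels whose keywords occur in the sentence."""
    text = _normalize(sentence)
    return {label for pattern, label in _PATTERNS if pattern.search(text)}


print(comparison_labels("iPhone chụp hình đẹp hơn Nokia"))
print(comparison_labels("hầu như ai cũng dùng iPhone"))
```

The second call shows the kind of false positive discussed above: "hầu như" (almost) contains "như" and is labeled B even though the sentence makes no comparison. A list of such fixed phrases to skip would be one possible remedy, but that is a suggestion of this sketch rather than something stated in the source.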
After computing the accuracy and examining the words that led to wrong results, the researcher found it necessary to add a number of words to the initial set, because these words occur frequently and are standard Vietnamese usage.
The comparison word set now contains 26 words. Running the updated algorithm for identifying and labeling comparison sentences with this new word set on 1720 comment sentences yields 457 comparison sentences. The full list of initial words and of words added after the analysis is given in Table 4.
4.3. Main steps
Step 1: Collect and preprocess the comment data. Comments are collected automatically from commercial websites (using the Craw Tool of the Internet Marketing Ninjas website); the data is then normalized and split into sentences to suit the purpose of the analysis.
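The normalization and sentence splitting of Step 1 could look like the sketch below. The source does not list the delimiter characters or the normalization rules, so the delimiter set (`.`, `!`, `?`, `;`, newline) and the whitespace cleanup are assumptions made for illustration, and `split_sentences` is a hypothetical helper.

```python
import re

# Assumed sentence delimiters; the actual character set is not given in the source.
_DELIMITERS = re.compile(r"[.!?;\n]+")


def split_sentences(comment: str) -> list:
    """Split a comment into sentences and collapse repeated whitespace."""
    sentences = []
    for part in _DELIMITERS.split(comment):
        cleaned = " ".join(part.split())  # trim and collapse whitespace
        if cleaned:
            sentences.append(cleaned)
    return sentences


sentences = split_sentences("Pin S6 tốt hơn iPhone 6.  Nhưng camera thì thua!  ")
```

A rule this simple would also split at the decimal point in a number such as "5.5 inch", so a production version would need extra care around numerals and abbreviations.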
Step 2: Identify Vietnamese comparative comment sentences. The list of comparison words is used to identify and label comparison sentences. The vnTagger program is then used to assign Vietnamese part-of-speech tags, after which the list and positions of the words carrying the tags required by the analysis are extracted.
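Extracting the tagged words and their positions can be sketched as follows. The `word/TAG` layout matches the tagged example given in this paper ("Note/N 4/M chụp/V đẹp/A hơn/R ip/N 6/M"); treating underscores as joiners inside multi-syllable words is an assumption, and `Token`, `parse_tagged`, and `select_tags` are hypothetical names, not part of vnTagger.

```python
from typing import List, NamedTuple, Set


class Token(NamedTuple):
    position: int  # index of the word in the sentence, starting at 0
    word: str
    tag: str       # part-of-speech tag, e.g. N, V, A, R, M


def parse_tagged(tagged: str) -> List[Token]:
    """Turn a 'word/TAG word/TAG ...' string into a list of tokens."""
    tokens = []
    for position, item in enumerate(tagged.split()):
        word, _, tag = item.rpartition("/")
        # Assumption: underscores join the syllables of one word.
        tokens.append(Token(position, word.replace("_", " "), tag))
    return tokens


def select_tags(tokens: List[Token], wanted: Set[str]) -> List[Token]:
    """Keep only the tokens whose tag is of interest."""
    return [token for token in tokens if token.tag in wanted]


tokens = parse_tagged("Note/N 4/M chụp/V đẹp/A hơn/R ip/N 6/M")
adjectives_and_verbs = select_tags(tokens, {"A", "V"})
```

Keeping the position with each token is what later allows the method to relate an adjective, a negation word, and a topic object to the position of the comparison word.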
Table 4. List of words by comparison type after the analysis
No.  Label  Word
1    N      nhất
2    N      no 1
3    N      number 1
4    N      số 1
5    N      số một
6    N      number one
7    N      vô đối
8    N      trên cả tuyệt vời
9    N      khó ai vượt qua
10   N      xuất sắc
11   N      hoàn hảo
12   N      làm gì có đối thủ
13   N      chưa có đối thủ
14   N      đỉnh của đỉnh
15   N      ăn đứt hết
16   H      hơn
17   H      thua
18   H      kém
19   B      giống
20   B      same
21   B      co
22   B      y xì
23   B      như
24   B      bằng
25   B      đều
26   B      ngang

Step 3: Use the Vietnamese sentiment dictionary to compute the positive and negative score weights. This step checks whether the comment sentence is a negated sentence, then uses the Vietnamese sentiment dictionary and the list of tagged words to compute the positive and negative scores.
The positive score of adjectives and verbs is computed as:
$$pos = \sum_{i} P_i \tag{1}$$
where:
pos: the positive score.
P_i: the positive score of the i-th adjective/verb.
The negative score of adjectives and verbs is computed as:
$$neg = \sum_{i} N_i \tag{2}$$
where:
neg: the negative score.
N_i: the negative score of the i-th adjective/verb.
Example: "Note/N 4/M chụp/V đẹp/A hơn/R ip/N 6/M".
Result: the adjective in this sentence is "đẹp". Looking it up in the Vietnamese sentiment dictionary and applying formulas (1) and (2) gives a positive score of pos = 6.75 and a negative score of neg = 0.5 for "đẹp".
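Formulas (1) and (2) amount to two running sums over the adjectives and verbs of a sentence. The sketch below assumes a lookup table that maps each word to its (positive, negative) score pair; the entry for "đẹp" uses the values reported in the example above, while the table itself, the tag names A and V, and the choice to score words missing from the dictionary as zero are assumptions of this illustration.

```python
from typing import Dict, List, Tuple

# Word -> (positive score, negative score). Only "đẹp" is taken from the text.
LEXICON: Dict[str, Tuple[float, float]] = {"đẹp": (6.75, 0.5)}

SCORED_TAGS = {"A", "V"}  # adjectives and verbs


def sentence_scores(
    tagged_words: List[Tuple[str, str]],
    lexicon: Dict[str, Tuple[float, float]],
) -> Tuple[float, float]:
    """Sum P_i and N_i over the adjectives/verbs found in the lexicon."""
    pos = 0.0
    neg = 0.0
    for word, tag in tagged_words:
        if tag in SCORED_TAGS and word in lexicon:
            p_i, n_i = lexicon[word]
            pos += p_i  # formula (1)
            neg += n_i  # formula (2)
    return pos, neg


example = [("Note", "N"), ("4", "M"), ("chụp", "V"), ("đẹp", "A"),
           ("hơn", "R"), ("ip", "N"), ("6", "M")]
pos, neg = sentence_scores(example, LEXICON)
```

With this toy table the example sentence yields the same pair as in the text, since "đẹp" is the only scored word.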
If a negation word occurs in the sentence immediately before the adjective/verb, the positive and negative scores of that adjective/verb are computed as:
$$fpos = neg, \qquad fneg = pos \tag{3}$$
where:
fpos: the positive score of the negated adjective/verb.
fneg: the negative score of the negated adjective/verb.
Example: "Note/N 4/M chụp/V không/R đẹp/A hơn/R ip/N 6/M".
Result: the positive and negative scores of the adjective "đẹp" were computed in Step 3. With the negation word "không" placed before it, the scores of this adjective become:
Positive score: pos = 0.5
Negative score: neg = 6.75
The positive and negative scores assigned to each topic object depend on where that object appears relative to the comparison word and on the type of comparison sentence.

Table 5. Priority of topic objects in comparison sentences
No.  Comparison   Syntax                    Priority object  Example
1    Superlative  O1..n + Adj/V + char      O1..n            iPhone 6 là đẹp nhất.
2    Superlative  Adj/V + char + O1..n      O1..n            đẹp nhất là iPhone 6.
3    Equative     O1 + O2 + Adj/V + char    O1, O2           iPhone 6 và Z3 đẹp như nhau.
4    Equative     O1 + Adj/V + char + O2    O1, O2           iPhone 6 đẹp như Z3.
5    Comparative  O1 + Adj/V + char + O2    O1               iPhone 6 đẹp hơn Z2.
6    Comparative  O1 + Adj/V + char         O1               iPhone 6 đẹp hơn.
7    Comparative  O1 + char + O2 + Adj/V    O1               iPhone 6 hơn Z3 về chụp đẹp.
In the syntax column, O is a topic object, Adj/V is an adjective or verb, and char is the comparison word (for example nhất, như, hơn).

The positive score for a whole topic is computed as:
$$spos = \sum_{j} pos_j \tag{4}$$
where:
spos: the total positive score of the topic.
pos_j: the positive score of the j-th object.
The negative score for a whole topic is computed as:
$$sneg = \sum_{j} neg_j \tag{5}$$
where:
sneg: the total negative score of the topic.
neg_j: the negative score of the j-th object.
Once the total positive and negative scores of the objects have been computed, the results are compared and interpreted:
If spos > sneg: the object in the topic is rated good.
If spos < sneg: the object in the topic is rated not good.
If spos = sneg: the object in the topic is rated neutral.
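The negation swap of formula (3), the totals of formulas (4) and (5), and the final decision rule fit in a few small functions. This is a sketch under stated assumptions: only "không" is named as a negation word in the text, so the negation set contains just that word, and the second score pair in the usage lines is a made-up value used to show the sums.

```python
from typing import List, Tuple

NEGATION_WORDS = {"không"}  # the only negation word named in the text


def is_negated(words: List[str], index: int) -> bool:
    """True if a negation word stands immediately before words[index]."""
    return index > 0 and words[index - 1] in NEGATION_WORDS


def apply_negation(pos: float, neg: float, negated: bool) -> Tuple[float, float]:
    """Formula (3): swap the two scores when the word is negated."""
    return (neg, pos) if negated else (pos, neg)


def topic_totals(object_scores: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Formulas (4) and (5): sum the scores of all objects of a topic."""
    spos = sum((p for p, _ in object_scores), 0.0)
    sneg = sum((n for _, n in object_scores), 0.0)
    return spos, sneg


def verdict(spos: float, sneg: float) -> str:
    """Decision rule applied to the topic totals."""
    if spos > sneg:
        return "good"
    if spos < sneg:
        return "not good"
    return "neutral"


words = ["Note", "4", "chụp", "không", "đẹp", "hơn", "ip", "6"]
swapped = apply_negation(6.75, 0.5, is_negated(words, words.index("đẹp")))
totals = topic_totals([swapped, (1.0, 0.25)])  # second pair is hypothetical
```

For the negated example sentence the swap reproduces the values given above (0.5 and 6.75), and a topic whose totals are dominated by such sentences ends up rated "not good".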
Step 4: Analyze user sentiment from Vietnamese comparative comments. This step determines the positions of the topic objects in each comparison sentence in order to compute the total positive and negative scores for each object, then aggregates and compares the totals and gives an assessment.
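Choosing which object receives the scores of a comparison sentence can be read directly off the priority table (Table 5): all objects for a superlative, the first two for an equative, and the first one for a comparative. The sketch below encodes only that reading; the function name and the assumption that objects are passed in their order of appearance are choices of this illustration, and the source does not say how sentences outside the seven listed patterns are handled.

```python
from typing import List


def priority_objects(label: str, objects: List[str]) -> List[str]:
    """Return the topic objects that receive the sentence's scores.

    label:   "N" (superlative), "B" (equative) or "H" (comparative)
    objects: topic objects in their order of appearance in the sentence
    """
    if label == "N":
        return list(objects)   # O1..n
    if label == "B":
        return objects[:2]     # O1, O2
    if label == "H":
        return objects[:1]     # O1
    raise ValueError(f"unknown comparison label: {label!r}")


# "iPhone 6 đẹp hơn Z2." -> the scores of "đẹp" go to the first object.
assigned = priority_objects("H", ["iPhone 6", "Z2"])
```

The per-object scores collected this way are the pos_j and neg_j terms that are summed into the topic totals.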
5. RESULTS AND EVALUATION
The comments were collected from commercial websites, specifically in the phone domain: 25 article topics (source: sohoa.vnexpress.net) and 2,185 comments (including both ordinary and comparison sentences).
The 2,185 comments were normalized so that they meet the requirements of the study. After normalization, 17 topics and 795 comments remained.
These 17 topics contain 25 topic objects, together with 427 words covering abbreviations and other possible written forms of those objects.
Splitting the 795 normalized comments into sentences with the delimiter character set yields 1,720 sentences. Of these 1,720 sentences, 457 are comparative comment sentences and were labeled.
After applying the label set for identifying Vietnamese comparison sentences, we measured its accuracy by checking the sentiment analysis result for each comment sentence (1,720 sentences). The items checked were:
Part-of-speech tagging (only the tags of interest: nouns, verbs, adjectives,...)
Identification of the comparison type
Computation of the positive score
Computation of the negative score
The average accuracy of applying the label set for identifying comparison sentences to overall sentiment analysis is 74.7%. Our evaluation shows that the label set is accurate enough to be used in practice and works well when the comments come from a single specific domain. Performance degrades if the reviews come from different domains. In Vietnamese, there are cases in which an opinion word can be interpreted with several different meanings. The word "dài" (long) is positive when it refers to battery life but becomes negative when it describes the waiting time in a restaurant. This is why the accuracy of the label set is not as high as expected; nevertheless, the method lays the groundwork for further research on analyzing user sentiment from Vietnamese comments.
6. CONCLUSION AND RECOMMENDATIONS
This paper has examined and defined the comparison types and the label set for identifying comparison sentences, and has presented a way to identify and label such sentences. The study also provides a sentence splitting algorithm and applies the vnTagger program to assign Vietnamese part-of-speech tags. On that basis, and using the Vietnamese sentiment dictionary, it produces sentiment analysis results based on the sets of nouns, proper nouns, unit nouns, and numerals, and on the positive and negative scores of adjectives and verbs.
To demonstrate the accuracy of the label set for identifying comparison sentences, the researcher built an experimental program. The program was run on 17 topics, 795 comments, and 1,720 sentences, using the Vietnamese sentiment dictionary of 26,186 words. The results of applying the label set to Vietnamese comparison sentences are promising, with an average accuracy of 74.7%.
In the coming period, besides continuing to resolve the outstanding issues, the following work is planned: further study of vnTagger's tagging rules so as to cover all tagging cases; extending the list of alternative spellings of objects that may appear on social network sites; and improving the method for identifying comparison sentences by completing the set of words that express each comparison type.

REFERENCES
1. Baccianella, S., A. Esuli, and F. Sebastiani (2010), SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining, Proceedings of the International Conference on Language Resources and Evaluation.
2. Balahur, A., et al. (2009), Opinion Mining on Newspaper Quotations, Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology.
3. Baumgarten, M., et al. (2013), Keyword-Based Sentiment Mining using Twitter, International Journal of Ambient Computing and Intelligence.
4. Esuli, A. and F. Sebastiani (2006), SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining, Proceedings of the 5th Conference on Language Resources and Evaluation.
5. Ganapathibhotla, M. and B. Liu (2008), Mining Opinions in Comparative Sentences, Proceedings of the 22nd International Conference on Computational Linguistics.
6. Harb, A., et al. (2008), Web Opinion Mining: How to Extract Opinions from Blogs?, Proceedings of the 5th International Conference on Soft Computing as Transdisciplinary Science and Technology.
7. Jindal, N. and B. Liu (2006), Identifying Comparative Sentences in Text Documents, Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
8. Jindal, N. and B. Liu (2006), Mining Comparative Sentences and Relations, Proceedings of the 21st National Conference on Artificial Intelligence.
9. Kieu, B.T. and S.B. Pham (2010), Sentiment Analysis for Vietnamese, Proceedings of the 2010 Second International Conference on Knowledge and Systems Engineering.
10. Kim, S.-M. and E. Hovy (2004), Determining the Sentiment of Opinions, Proceedings of the 20th International Conference on Computational Linguistics, no. 1367.
11. Kouloumpis, E., T. Wilson, and J.D. Moore (2011), Twitter Sentiment Analysis: The Good the Bad and the OMG!, Proceedings of the Fifth International Conference on Weblogs and Social Media.
12. Kwon, A. and K.-S. Lee (2013), Opinion Bias Detection Based on Social Opinions for Twitter, Journal of Information Processing Systems.
13. Le-Hong, P., et al. (2010), An Empirical Study of Maximum Entropy Approach for Part-of-Speech Tagging of Vietnamese Texts, Traitement Automatique des Langues Naturelles.
14. Lee, D., O.-R. Jeong, and S.-g. Lee (2008), Opinion Mining of Customer Feedback Data on the Web, Proceedings of the 2nd International Conference on Ubiquitous Information Management and Communication.
15. Nguyen, H.N., et al. (2014), Domain Specific Sentiment Dictionary for Opinion Mining of Vietnamese Text, Proceedings of the 8th International Workshop on Multi-disciplinary Trends in Artificial Intelligence.
16. Thakkar, H. and D. Patel (2015), Approaches for Sentiment Analysis on Twitter: A State-of-Art Study, arXiv preprint arXiv:1512.01043.
Received: 08/11/2016. Editing completed: 17/02/2017. Accepted for publication: 21/3/2017