Combining statistical machine learning with
transformation rule learning for Vietnamese Word
Sense Disambiguation
Phu-Hung Dinh, Ngoc-Khuong Nguyen, and Anh-Cuong Le
Dept. of Computer Science
University of Engineering and Technology
Vietnam National University, Ha Noi
144 Xuan Thuy, Cau Giay, Ha Noi, Viet Nam

Abstract—Word Sense Disambiguation (WSD) is the task of determining the right sense of a word depending on the context in which it appears. Among the various approaches developed for this task,
statistical machine learning methods have been showing their
advantages in comparison with others. However, there are some
cases which cannot be solved by a general statistical model. This
paper proposes a novel framework, in which we use the rules
generated by transformation based learning (TBL) to improve
the performance of a statistical machine learning model. This
framework can be considered as a combination of a rule-based method and a statistical method. We have developed this method for the problem of Vietnamese WSD and achieved some
promising results.
Index Terms—Machine Learning, Transformation Based
Learning, Naive Bayesian classification

I. INTRODUCTION
As natural languages are ambiguous, a word may have multiple meanings (senses). Practically speaking, an ambiguous word may be ambiguous in both its part-of-speech and its meaning. WSD usually aims at disambiguating the meaning of a word within a specific part-of-speech. A word that has several meanings within a specific part-of-speech is called polysemous. For example, the noun “bank” has at least two different meanings:


“bank” in “Bank of England” and “bank” in “river bank”.
Besides, polysemous words also exist in Vietnamese. For instance, consider the following sentences:
• Anh ta đang câu cá ở ao.
He is fishing in a pond.
• Đại bác câu trúng lô cốt.
The guns lobbed home shells on the blockhouse.
The occurrence of the word “câu” in the two sentences
clearly denotes different meanings: “to fish” and “to lob”.
WSD here means determining the right sense of such a word in a particular context. The success of this task benefits many Natural Language Processing (NLP) problems such as information retrieval, machine translation, human-computer communication, and so on.
The automatic disambiguation of word senses has received attention since the 1950s [1]. Since that time, many studies have investigated various methods for this problem, but the performance of available WSD systems and published results is still limited. These methods can be mainly divided into two approaches: knowledge-based and machine learning (corpus-based).
Knowledge-based methods rely on previously acquired linguistic knowledge: the WSD task is performed by matching the context in which the target word appears against information from an external knowledge source. The methods in this approach are based on knowledge resources such as the WordNet thesaurus, as well as grammar rules or hand-coded rules for disambiguation (see [1] for more detail and discussion).
In the machine learning approach, since the 1990s, empirical and statistical methods have attracted most studies in the NLP field. Many machine learning methods have been applied to a large variety of NLP tasks (including WSD) with remarkable success. The methods in this approach use techniques from statistics and machine learning to induce models of language usage from large samples of text. Generally, depending on whether they use labeled data, unlabeled data, or both, machine learning methods can be divided into three groups: supervised, unsupervised, and semi-supervised. Because supervised systems are based on annotated data, they achieve better results. Many machine learning methods have been applied to WSD systems, such as maximum entropy models [2], [3], support vector machines (SVM) [4], decision lists [5], [6], and the Naive Bayesian (NB) classifier [7], [8]. Other studies have tried to use linguistic knowledge from dictionaries and thesauri, as in [9], [10].
The machine learning approach seems to show advantages over the knowledge-based approach. While the knowledge-based approach relies on rules generated by experts, and is therefore limited by their ability and has difficulty covering a large number of cases, the machine learning approach can address the problem on a large scale without paying much attention to linguistic aspects.
However, the results obtained for WSD (e.g. in English) are still far from being usable in real systems. Although the average accuracy on the Senseval-2 and Senseval-3 corpora is around 70%, some studies such as [13] achieve higher accuracy (about 90% for several words) when trained on large training data.

From our observation, the first reason for the unexpected results of statistical machine learning WSD systems is the sparseness of the corpus. The second reason is that there are usually exceptional cases in any NLP problem (particularly in WSD) which do not follow a general principle (or model). Therefore, in this paper we focus on correcting the cases which may be misclassified by a statistical machine learning system. Borrowing the idea from the knowledge-based approach, but instead of having an expert generate the rules, we apply the techniques of TBL to produce the rules automatically.
Firstly, based on the training corpus, a machine learning model is trained and used as the initial classification system for a TBL-based error-driven learning procedure that amends the initial predictions using a development corpus. Consequently, a set of TBL rules is produced. Secondly, in the final model, we first use the machine learning model to predict senses for the polysemous words and then apply the obtained transformation rules to the results of this first step to get the final senses.
The paper is organized into six sections, including this introduction. In Section II, we present the background, including TBL and a statistical machine learning method. The detail of our proposed model is then presented in Section III. In Section IV, we present feature selection and rule template selection. Data preparation and experiments are presented in Section V. Finally, we conclude the paper in Section VI.
II. BACKGROUND
In this section we introduce NB classification (from the corpus-based approach) and TBL (from the rule-based approach), the two basic methods used in the combination method we propose.
A. NB model
NB methods have been used in much classification work and were first used for WSD by Gale et al. [7]. NB classifiers work
on the assumption that all the feature variables representing a
problem are conditionally independent given the classes.
Assume that the polysemous word w is to be disambiguated. Suppose that w has a set of potential senses (classes) S = {s1, ..., sc}, and that a context of w is given, represented by a set of features F = {f1, ..., fn}. Bayesian decision theory suggests that the word w should be assigned to the class sk whose posterior probability is maximum, namely

sk = argmax_{sj} P(sj|F), j ∈ {1, ..., c}

where the value of P(sj|F) is computed by the following equation:

P(sj|F) = P(sj) P(F|sj) / P(F)

P(F) is constant for all senses and therefore does not influence the choice of the best sense. The sense sk of w is then:

sk = argmax_{sj} P(sj|F)
   = argmax_{sj} P(sj) P(F|sj) / P(F)
   = argmax_{sj} P(sj) ∏_{i=1..n} P(fi|sj)
   = argmax_{sj} [ log P(sj) + Σ_{i=1..n} log P(fi|sj) ]
The values of P(fi|sj) and P(sj) are estimated by maximum likelihood as:

P(sj) = C(sj) / N   and   P(fi|sj) = C(fi, sj) / C(sj)

where C(fi, sj) is the number of occurrences of fi in a context of sense sj in the training corpus, C(sj) is the number of occurrences of sj in the training corpus, and N is the total number of occurrences of the polysemous word w, i.e. the size of the training dataset. To avoid the effect of zero counts when estimating the conditional probabilities of the model, we set P(fi|sj) equal to 1/N for each sense sj when a new feature fi is met in a context of the test dataset. The NB algorithm for WSD is as follows:
Training:
for all senses sj of w do
  for all features fi extracted from the training data do
    P(fi|sj) = C(fi, sj) / C(sj)
  end
end
for all senses sj of w do
  P(sj) = C(w, sj) / C(w)
end

Disambiguation:
for all senses sj of w do
  score(sj) = log(P(sj))
  for all features fi in the context window c do
    score(sj) = score(sj) + log(P(fi|sj))
  end
end
choose sk = argmax_{sj} score(sj)
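For illustration, the following is a minimal Python sketch of the training and disambiguation steps above; the function names (train_nb, disambiguate) and the data layout are our own illustrative choices rather than the authors' implementation, and unseen features fall back to 1/N as described above.

import math
from collections import Counter, defaultdict

def train_nb(contexts):
    # contexts: list of (features, sense) pairs for one polysemous word
    sense_count = Counter()            # C(sj)
    feat_count = defaultdict(Counter)  # C(fi, sj), indexed by sense then feature
    for features, sense in contexts:
        sense_count[sense] += 1
        for f in features:
            feat_count[sense][f] += 1
    n = sum(sense_count.values())      # N: total number of labeled contexts of w
    return sense_count, feat_count, n

def disambiguate(features, sense_count, feat_count, n):
    # choose sk = argmax_sj [ log P(sj) + sum_i log P(fi|sj) ]
    best_sense, best_score = None, float("-inf")
    for sense, c_s in sense_count.items():
        score = math.log(c_s / n)      # log P(sj)
        for f in features:
            c_fs = feat_count[sense][f]
            p = c_fs / c_s if c_fs else 1.0 / n   # 1/N smoothing for unseen features
            score += math.log(p)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense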

B. Transformation-Based Learning
TBL is known as one of the most successful methods in the rule-based approach for many NLP tasks because it provides a way to learn the rules automatically.
Brill [11] introduced TBL and showed that it can perform part-of-speech tagging with fairly high accuracy. The same method can be applied to many natural language processing tasks, for example text chunking, parsing, named entity recognition, and word sense disambiguation. The method's key idea is to compare the golden-corpus, which is correctly tagged, with the current-corpus, which is created by an initial tagger, and then automatically generate rules that correct the errors, based on predefined templates.
The transformation-based learning algorithm runs in multiple iterations as follows:
Input: A raw-corpus containing the raw text without labels, extracted from the golden-corpus that contains manually labeled context/label pairs.
• Step 1: Generate the initial-corpus by running an initial labeler on the raw-corpus.
• Step 2: Compare the initial-corpus with the golden-corpus to determine the label errors in the initial-corpus, from which all rule templates are used to create potential rules.
• Step 3: Apply each potential rule to a copy of the initial-corpus. The score of a rule is the number of correctly changed labels minus the number of additional errors it introduces. The rule with the best score is selected.
• Step 4: Update the initial-corpus by applying the selected rule and move this rule to the list of transformation rules.
• Step 5: Stop if the best score is smaller than a predefined threshold T; otherwise repeat from Step 2.
Output: The list of transformation rules.
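This loop can be sketched in Python as follows; generate_candidate_rules and apply_rule are hypothetical helpers standing in for the template instantiation and rule application steps, so this is only a schematic rendering of the algorithm, not the authors' code.

def tbl_learn(initial_corpus, golden_corpus, generate_candidate_rules, apply_rule, threshold=1):
    # initial_corpus and golden_corpus are parallel lists of (context, label) pairs
    current = list(initial_corpus)
    transformation_rules = []
    while True:
        # Step 2: positions where the current labels disagree with the gold labels
        errors = [i for i, (cur, gold) in enumerate(zip(current, golden_corpus))
                  if cur[1] != gold[1]]
        candidates = generate_candidate_rules(current, golden_corpus, errors)
        # Step 3: score = (labels corrected) - (labels newly broken)
        best_rule, best_score = None, float("-inf")
        for rule in candidates:
            updated = [apply_rule(rule, item) for item in current]
            fixed = sum(1 for u, c, g in zip(updated, current, golden_corpus)
                        if c[1] != g[1] and u[1] == g[1])
            broken = sum(1 for u, c, g in zip(updated, current, golden_corpus)
                         if c[1] == g[1] and u[1] != g[1])
            if fixed - broken > best_score:
                best_rule, best_score = rule, fixed - broken
        # Step 5: stop when no rule improves the corpus by at least the threshold T
        if best_rule is None or best_score < threshold:
            return transformation_rules
        # Step 4: apply the best rule and add it to the list of transformation rules
        current = [apply_rule(best_rule, item) for item in current]
        transformation_rules.append(best_rule)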
III. OUR APPROACH
In this section, we describe our approach for inducing a model which corrects the senses mis-tagged by a statistical machine learning model (here we choose the NB classification model). The model includes a training phase and a testing phase.
Notice that in this model we use a training data set to build the NB classifier and a development data set to learn the transformation rules. These two tagged data sets are constructed by manually labeling a set of selected contexts of the polysemous word.
A. The training phase
The training process consists of two stages. In the first stage, the error list is determined based on the NB model. This stage is described as shown in Figure 1.
Input: Training-corpus and developing-corpus containing manually labeled context/label pairs.
• Step 1: Obtain the raw developing-corpus by removing the labels from the developing-corpus.
• Step 2: Use the training-corpus to train a NB classification model. This classification model is then applied to the raw developing-corpus obtained in Step 1. The obtained result is called the initial-corpus.
• Step 3: Compare the initial-corpus with the developing-corpus to determine all contexts with wrong labels from the NB classification model.
Output: The list of contexts with wrong labels (we call it the error list, as shown in Figure 1).



Figure 1. The diagram describing the training algorithm (first stage).

In the second stage, the set of TBL rules is determined by applying the TBL algorithm to the error list obtained in the first stage. Notice that in this stage we use predefined templates for generating potential TBL rules (described in detail in Section IV.B). This stage is described as follows (see Figure 2).
Input: Developing-corpus, initial-corpus, and the error list.
• Step 1: Apply the rule templates to the error list to generate a list of transformation rules (called the list of potential rules).
• Step 2: Apply each potential rule to a copy of the initial-corpus. The score of each rule is s2 − s1, where s2 is the number of cases that are corrected and s1 is the number of cases whose right labels are transformed into wrong labels. The rule with the highest score is selected.
• Step 3: Update the initial-corpus by applying the rule with the highest score and move this rule to the selected TBL rules. The error list is also updated by comparing the initial-corpus with the developing-corpus.
• Step 4: Stop if the highest score is smaller than a predefined threshold T; otherwise go to Step 1.
Output: The list of transformation rules (i.e. the selected TBL rules).

Figure 2. The diagram describing the training algorithm (second stage).

B. The test phase

The proposed approach uses the selected TBL rules obtained in the training phase for testing as follows (shown in Figure 3).
Input: Test-corpus and the selected TBL rules.
• Step 1: Obtain the raw test-corpus by removing the labels from the test-corpus.
• Step 2: Run the NB classification model on the raw test-corpus to obtain the so-called initial-corpus.
• Step 3: Apply the selected TBL rules to the initial-corpus to create the labeled corpus.
• Step 4: Compare the labeled corpus with the test-corpus to evaluate the system (i.e. compute the accuracy).
Output: Accuracy of the proposed model.

Figure 3. The diagram describing the test phase.
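As a summary of these steps, a short Python sketch of the combined test phase is given below; it reuses the NB disambiguate function sketched in Section II and assumes a hypothetical apply_rule helper that applies one transformation rule to a labeled context.

def combined_wsd_accuracy(test_corpus, nb_model, tbl_rules, disambiguate, apply_rule):
    # test_corpus: list of (features, gold_sense) pairs for one polysemous word
    sense_count, feat_count, n = nb_model
    # Steps 1-2: drop the gold labels and tag the raw test corpus with the NB model
    initial = [(feats, disambiguate(feats, sense_count, feat_count, n))
               for feats, _ in test_corpus]
    # Step 3: apply the selected TBL rules in the order in which they were learned
    for rule in tbl_rules:
        initial = [apply_rule(rule, item) for item in initial]
    # Step 4: compare the predicted labels with the gold senses to get the accuracy
    correct = sum(1 for (_, pred), (_, gold) in zip(initial, test_corpus) if pred == gold)
    return correct / len(test_corpus)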


IV. FEATURES AND RULE TEMPLATES
A. Feature Selection
One of the most important tasks in WSD is determining which information is useful for identifying word senses. In the corpus-based approach, most studies have considered only the information extracted from the context in which the target word appears.
Context is the only means of identifying the meaning of a polysemous word. Therefore, all work on sense disambiguation relies on the context of the target word to provide the information used for its disambiguation. For corpus-based methods, context also provides the prior knowledge with which the current context is compared to achieve disambiguation.
Suppose that w is the polysemous word to be disambiguated, and S = {s1, s2, ..., sm} is the set of its potential senses. A context W of w is represented as
W = {... w−3, w−2, w−1, w0, w1, w2, w3 ...}
where W is taken within a window (-3, +3), w0 = w is the target word, and for each i ∈ {−3, ..., +3}, wi is the word appearing at position i relative to w.
Based on previous studies [12], [13], [14] and our experiments, we propose to use two kinds of knowledge and represent them as the following subsets of features:
• Bag-of-words, F1(l, r) = {w−l, ..., w+r}: we choose F1(-3, +3), which has seven elements (features):
F1(-3, +3) = {w−3, w−2, w−1, w0, w1, w2, w3}
• Collocations of words, F2 = {w−l ... w+r}: we choose collocations whose lengths (including the target word) are at most 4, i.e. (l + r + 1) ≤ 4. This gives nine elements (features):
F2 = {w−1w0, w0w1, w−2w−1w0, w−1w0w1, w0w1w2, w−2w−1w0w1, w−1w0w1w2, w−3w−2w−1w0, w0w1w2w3}
In summary, we obtain 16 features, denoted (f1, f2, ..., f16). These features are used both in the NB classification model and for building the TBL rules.
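As an illustration of how these 16 features could be extracted from a (-3, +3) window, the sketch below uses our own offset lists to enumerate F1 and F2; the exact feature encoding used by the authors is not specified in the paper.

# offsets of the nine collocations in F2 (each one contains the target word at offset 0)
COLLOCATION_OFFSETS = [
    (-1, 0), (0, 1),
    (-2, -1, 0), (-1, 0, 1), (0, 1, 2),
    (-2, -1, 0, 1), (-1, 0, 1, 2), (-3, -2, -1, 0), (0, 1, 2, 3),
]

def extract_features(words, target_index):
    # returns the 16 features of the target word: 7 bag-of-words + 9 collocations
    def at(offset):
        i = target_index + offset
        return words[i] if 0 <= i < len(words) else None
    # F1(-3, +3): the seven words of the window, keyed by their offset
    bag = [("w%+d" % off, at(off)) for off in range(-3, 4)]
    # F2: collocations of length <= 4 that contain the target word
    colloc = [("colloc" + str(offs), " ".join(str(at(o)) for o in offs))
              for offs in COLLOCATION_OFFSETS]
    return bag + colloc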
B. Rule templates for building TBL rules
Rule templates are important components of the TBL algorithm. They are used for automatically generating TBL rules. Based on previous studies [15], [16] and the features presented above, we propose the following rule templates (shown in Figure 4):

A → B word C @ [ -1 ]
A → B word C @ [ 1 ]
A → B word C @ [ -2 ] & word D @ [ -1 ]
A → B word C @ [ -1 ] & word D @ [ 1 ]
A → B word C @ [ 1 ] & word D @ [ 2 ]

Figure 4. The rule templates.


For example, the rule templates can be read as follows: the template “A → B word C @ [ 1 ]” means “change the label of the current word from A to B if the next word is C”, and the template “A → B word C @ [ -1 ] & word D @ [ 1 ]” means “change the label of the current word from A to B if the previous word is C and the next word is D”.
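To make the template semantics concrete, the following small Python sketch applies a rule of the general form “A → B word C @ [p] (& word D @ [q])” to one occurrence of the target word; the Rule structure and its field names are hypothetical, chosen only for this illustration.

from collections import namedtuple

# from_label A, to_label B, and a list of (word, offset) conditions such as [("C", 1)]
Rule = namedtuple("Rule", ["from_label", "to_label", "conditions"])

def apply_rule_at(rule, words, target_index, current_label):
    # return the (possibly changed) label of the word at target_index
    if current_label != rule.from_label:
        return current_label
    for word, offset in rule.conditions:
        i = target_index + offset
        if i < 0 or i >= len(words) or words[i] != word:
            return current_label   # a condition fails: keep the current label
    return rule.to_label           # all conditions hold: change the label from A to B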
V. EXPERIMENT
A. Data preparation
Since we consider WSD as a classification problem, we need an annotated corpus for this task. For English, many studies use corpora such as Senseval-1, Senseval-2, Senseval-3, and so on. Because a standard corpus does not exist for Vietnamese, it is necessary to build a training corpus for it. To this end, we first use a crawler to collect data from web sites and obtain about 1.2 GB of raw data (approximately 120,000 articles from more than 50 Vietnamese web sites such as www.vnexpress.net, www.dantri.com.vn, etc.). We then extract from this corpus the contexts (containing several sentences around the ambiguous word) of 10 ambiguous words. For example, a context of the ambiguous word “bạc” is shown in Figure 5.
Trọng tâm của tháng là sự hòa hợp trong gia đình, khi các thành viên đồng thuận về con đường sự nghiệp của bạn. Giữa tháng 3, tình hình tài chính của bạn cải thiện rất nhiều. Tiền bạc vẫn đổ dồn về, nhưng phải luôn biết cách chi tiêu hợp lý. Đây cũng là khoảng thời gian thích hợp để bạn đầu tư vào các tài sản cố định. Nếu may mắn, bạn sẽ thu về một khoản tiền lớn.
(The focus of the month is harmony in the family, as its members agree on your career path. In mid-March your financial situation improves greatly. Money keeps pouring in, but you must always know how to spend it reasonably. This is also a suitable time for you to invest in fixed assets. If you are lucky, you will collect a large sum of money.)

Figure 5. A context of the word “bạc”.

After that, these contexts of the 10 ambiguous words are manually labeled to obtain the labeled corpus. Table I describes in detail the number of samples and senses of each ambiguous word.

Table I
STATISTICS ON THE LABELED DATA

No | Word | Part of speech | Senses | Examples
1  | Bạc  | Noun | 4 | 1224
2  | Bạc  | Adj  | 4 | 552
3  | Cất  | Verb | 8 | 1203
4  | Câu  | Noun | 2 | 3142
5  | Câu  | Verb | 3 | 295
6  | Cầu  | Noun | 2 | 1174
7  | Khai | Verb | 4 | 3459
8  | Pha  | Verb | 2 | 592
9  | Phát | Verb | 8 | 2151
10 | Sắc  | Noun | 4 | 2000

To conduct the experiment we build the data sets as follows. Firstly, we divide the labeled corpus into two parts in the ratio 3:1, called data-corpus 1 and data-corpus 2 respectively. Data-corpus 1 is used for training and data-corpus 2 is used for testing in the NB, TBL, SVM², and proposed models.
Secondly, data-corpus 1 is used for building the TBL rules, so it is randomly divided 10 times into two parts in the ratio 3:1; one part is used for training (called the training-corpus) and the other for development (called the developing-corpus). Notice that the training phase for building the TBL rules is run 10 times, corresponding to these training and developing sets, in order to extract as many TBL rules as possible.
² We use libsvm for the SVM model.
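The repeated 3:1 splits of data-corpus 1 described above could be produced as in the following sketch; the function name, the fixed ratio, and the use of a seeded random generator are our assumptions, not details given in the paper.

import random

def repeated_splits(data_corpus_1, n_rounds=10, train_ratio=0.75, seed=0):
    # yields (training_corpus, developing_corpus) pairs for the 10 TBL training rounds
    rng = random.Random(seed)
    for _ in range(n_rounds):
        shuffled = list(data_corpus_1)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_ratio)
        yield shuffled[:cut], shuffled[cut:]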
Table II shows the numbers of training, developing, and test examples.

Table II
DATA SETS

No | Word | Part of speech | Corpus 1: Training | Corpus 1: Developing | Corpus 2: Test
1  | Bạc  | Noun | 687  | 230 | 307
2  | Bạc  | Adj  | 308  | 105 | 139
3  | Cất  | Verb | 673  | 229 | 301
4  | Câu  | Noun | 1767 | 589 | 786
5  | Câu  | Verb | 163  | 57  | 75
6  | Cầu  | Noun | 659  | 220 | 295
7  | Khai | Verb | 1944 | 650 | 865
8  | Pha  | Verb | 331  | 112 | 149
9  | Phát | Verb | 1205 | 408 | 538
10 | Sắc  | Noun | 1124 | 376 | 500

B. Experimental results

In this section, we present experimental results for four models: the NB model, the TBL model, the SVM model, and the proposed model that combines NB and TBL.
From the data sets above, we first evaluate the accuracy of the NB model and obtain the results shown in Table III. The average accuracy of this model is about 86.5%.

Table III
NB MODEL RESULTS

No | Word | Part of speech | Training | Test | Accuracy (%)
1  | Bạc  | Noun | 917  | 307 | 81.8
2  | Bạc  | Adj  | 413  | 139 | 85.6
3  | Cất  | Verb | 902  | 301 | 84.4
4  | Câu  | Noun | 2356 | 786 | 97.6
5  | Câu  | Verb | 220  | 75  | 85.3
6  | Cầu  | Noun | 879  | 295 | 95.6
7  | Khai | Verb | 2594 | 865 | 90.4
8  | Pha  | Verb | 443  | 149 | 79.2
9  | Phát | Verb | 1613 | 538 | 73.6
10 | Sắc  | Noun | 1500 | 500 | 91.6
Averages |  |  | 1328 | 444 | 86.5

Secondly, for each ambiguous word, using the training algorithm in Section III, we obtain a list of TBL rules. As this phase is run 10 times, we obtain 10 lists of TBL rules. Table IV shows the experimental results when we separately apply (in the combination model) each of these TBL lists to the word "bạc" with the adjective part-of-speech. Moreover, if we combine all rules into one list, we obtain a better accuracy of 92.8%. Some of the TBL rules obtained for the word "bạc" are shown in Figure 6.
4 → 2 word vàng @ [ -1 ]
2 → 4 word sới @ [ -1 ]
2 → 1 word cao @ [ 1 ] & word cấp @ [ 2 ]
2 → 3 word tiền @ [ 1 ]
2 → 3 word mấy @ [ -2 ] & word triệu @ [ -1 ]
3 → 2 word tờ @ [ -1 ]
4 → 1 word két @ [ -1 ]

Figure 6. Some TBL rules for the word “bạc”.

Table IV
NB & RULE-BASED MODEL RESULTS FOR THE AMBIGUOUS WORD “BẠC”

No | List of rules  | Accuracy of NB & TBL (%)
1  | list rules 1   | 89.2
2  | list rules 2   | 89.9
3  | list rules 3   | 89.2
4  | list rules 4   | 89.9
5  | list rules 5   | 89.9
6  | list rules 6   | 89.9
7  | list rules 7   | 89.2
8  | list rules 8   | 90.6
9  | list rules 9   | 92.1
10 | list rules 10  | 89.2
11 | combined rules | 92.8

Finally, we show the experimental results of our system for the 10 ambiguous words. It can be seen from Table V that the results obtained from the proposed model (combining NB classification and TBL) are better than those obtained from the NB classification model, the TBL model, and the SVM model. The average accuracy of the proposed model is about 91.3% for the 10 ambiguous words, which is 4.8%, 7.4%, and 3.1% higher than the NB classification model, TBL model, and SVM model respectively.
Table V
NB & TBL & SVM & OUR PROPOSED MODEL RESULTS

No | Word | Part of speech | Accur1 (%) | Accur2 (%) | Accur3 (%) | Accur4 (%)
1  | Bạc  | Noun | 81.8 | 82.4 | 84.4 | 88.6
2  | Bạc  | Adj  | 85.6 | 83.5 | 88.5 | 92.8
3  | Cất  | Verb | 84.4 | 79.7 | 86.4 | 89.7
4  | Câu  | Noun | 97.6 | 97.3 | 97.8 | 98.3
5  | Câu  | Verb | 85.3 | 88.0 | 86.7 | 96.0
6  | Cầu  | Noun | 95.6 | 85.4 | 95.6 | 95.9
7  | Khai | Verb | 90.4 | 88.2 | 91.2 | 92.9
8  | Pha  | Verb | 79.2 | 76.5 | 81.2 | 83.9
9  | Phát | Verb | 73.6 | 75.2 | 77.1 | 80.9
10 | Sắc  | Noun | 91.6 | 83.2 | 92.8 | 94.0
Averages |  |  | 86.5 | 83.9 | 88.1 | 91.3

Accur1: accuracy of the NB model; Accur2: accuracy of the TBL model; Accur3: accuracy of the SVM model; Accur4: accuracy of the NB & TBL model.

VI. CONCLUSIONS
This paper has proposed a new method for combining the advantages of the machine learning approach and the rule-based approach for the task of word sense disambiguation. In particular, we have used NB classification as the machine learning method and combined it with TBL. We have experimented on some Vietnamese polysemous words, and the obtained accuracy increases by 4.8%, 7.4%, and 3.1% compared with the results of the NB model, the TBL model, and the SVM model respectively. This also shows that TBL can be utilized to correct wrong results from statistical machine learning models.
This model can be applied to other languages for the task of WSD, and we believe that it can also be applied to some other natural language processing tasks such as part-of-speech tagging, syntactic parsing, and so on.

ACKNOWLEDGMENT
This work is partially supported by the Vietnam National Foundation for Science and Technology Development (NAFOSTED), project code 102.99.35.09.

REFERENCES

[1] N. Ide and J. Véronis, “Introduction to the special issue on word sense
disambiguation: the state of the art,” Comput. Linguist., vol. 24, pp.
2–40, March 1998.
[2] A. Suárez and M. Palomar, “A maximum entropy-based word sense
disambiguation system,” in Proceedings of the 19th international conference on Computational linguistics - Volume 1, ser. COLING ’02.
Stroudsburg, PA, USA: Association for Computational Linguistics, 2002,
pp. 1–7.
[3] A. L. Berger, S. A. D. Pietra, and V. J. D. Pietra, “A maximum entropy
approach to natural language processing,” Computational Linguistics,
vol. 22, pp. 39–71, 1996.
[4] Y. K. Lee, H. T. Ng, and T. K. Chia, “Supervised word sense disambiguation with support vector machines and multiple knowledge sources,” in
Senseval-3: Third International Workshop on the Evaluation of Systems
for the Semantic Analysis of Text, 2004, pp. 137–140.
[5] D. Yarowsky, “Unsupervised word sense disambiguation rivaling supervised methods,” in Proceedings of the 33rd annual meeting on
Association for Computational Linguistics, ser. ACL ’95. Stroudsburg,
PA, USA: Association for Computational Linguistics, 1995, pp. 189–
196.
[6] T. Pedersen, “A decision tree of bigrams is an accurate predictor of word
sense,” in Proceedings of the second meeting of the North American
Chapter of the Association for Computational Linguistics on Language
technologies, ser. NAACL ’01. Stroudsburg, PA, USA: Association for
Computational Linguistics, 2001, pp. 1–8.
[7] W. A. Gale, K. W. Church, and D. Yarowsky, “A method for disambiguating word senses in a large corpus,” Computers and the Humanities,
vol. 26, pp. 415–439, 1992.
[8] T. Pedersen, “A simple approach to building ensembles of naive bayesian classifiers for word sense disambiguation,” 2000.
[9] M. Lesk, “Automatic sense disambiguation using machine readable
dictionaries: how to tell a pine cone from an ice cream cone,” in
Proceedings of the 5th annual international conference on Systems
documentation, ser. SIGDOC ’86. New York, NY, USA: ACM, 1986,
pp. 24–26.
[10] R. Navigli and P. Velardi, “Structural semantic interconnections: A
knowledge-based approach to word sense disambiguation,” IEEE Trans.
Pattern Anal. Mach. Intell., vol. 27, pp. 1075–1086, July 2005.
[11] E. Brill, “Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging,” Comput.
Linguist., vol. 21, pp. 543–565, December 1995.
[12] C. A. Le, “A study of classifier combination and semi-supervised
learning for word sense disambiguation,” Ph.D. dissertation, School of
Information Science Japan Advanced Institute of Science and Technology, 2007.
[13] C. A. Le and A. Shimazu, “High word sense disambiguation using
naive bayesian classifier with rich features,” in The 18th Pacific Asia
Conference on Language, Information and Computation (PACLIC-2004),
2004, pp. 105–113.
[14] R. F. Mihalcea, “Word sense disambiguation with pattern learning and
automatic feature selection,” Nat. Lang. Eng., vol. 8, pp. 343–358,
December 2002.
[15] G. Ngai and R. Florian, “Transformation-based learning in the fast
lane,” in Proceedings of the second meeting of the North American
Chapter of the Association for Computational Linguistics on Language
technologies, ser. NAACL ’01. Stroudsburg, PA, USA: Association for
Computational Linguistics, 2001, pp. 1–8.
[16] R. L. Milidiú, J. C. Duarte, and C. Nogueira Dos Santos, “Current topics
in artificial intelligence,” D. Borrajo, L. Castillo, and J. M. Corchado,
Eds. Berlin, Heidelberg: Springer-Verlag, 2007, ch. TBL Template
Selection: An Evolutionary Approach, pp. 180–189.



