Multi-class Text Classification using Support Vector Machines (SVMs)


<div class='page_container' data-page=1>

The 1st UTS-VNU Research School

Advanced Technologies for IoT Applications

Abstract

My project presents experimental conclusions on the suitability of
support vector machines (SVMs) for multi-class text classification
across different datasets, and discusses the text classification
process using a series of novel multi-class SVM methods. It addresses
the following points:

• How can text documents be represented as feature vectors, and
how does the text representation affect the classification result?

• How can binary classifiers be used for the multi-class
classification problem? This is examined through empirical
experiments on different datasets in both English and Vietnamese.
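The first point can be illustrated with a minimal sketch of one common text representation, TF-IDF weighting over a bag of words. This is an assumption chosen for illustration; the poster does not state which representations were compared. The function name `tfidf_vectors`, the smoothed IDF formula, and the toy documents are all hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn tokenized documents into L2-normalized TF-IDF feature vectors."""
    # Fixed, sorted vocabulary so every document maps to the same dimensions.
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (one common variant, assumed here): rarer terms weigh more.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts; missing terms count as 0
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors

# Toy corpus (hypothetical), already tokenized on whitespace.
docs = [
    "svm text classification".split(),
    "svm kernel margin".split(),
    "text document corpus".split(),
]
vocab, X = tfidf_vectors(docs)
```

Each row of `X` is a vector that a binary or multi-class SVM could take as input. Swapping raw counts, binary presence, or a different IDF formula in the marked lines is one way to study how the representation affects the result.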

Conclusion

Many different SVM-based approaches were tested, and the experiments show that SVMs are well suited to the
multi-class text classification problem. Most of the approaches are based on combinations of binary classifiers
and look for ways to use these classifiers more effectively. In my empirical evaluation, each method has its own
advantages: an approach may be worse in training time and accuracy yet be simple in idea and construction.
The novel approaches, such as tree-based SVMs (e.g., DAGSVM) and membership-function SVMs (e.g., fuzzy SVMs),
performed better than the others. However, the gains amount to only a few percentage points in accuracy and
less than a minute in training time, which is not enough to make a significant difference.
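To make "combinations of binary classifiers" concrete, the sketch below contrasts one-vs-one majority voting with a DAG-style evaluation in the spirit of DAGSVM. It is an illustration under stated assumptions, not the implementation evaluated on this poster: the pairwise decision functions are hypothetical nearest-center rules standing in for trained binary SVMs, and the class names and centers are invented.

```python
from collections import Counter
from itertools import combinations

def ovo_vote(x, classes, pairwise):
    """One-vs-one: run all k(k-1)/2 binary classifiers and take the majority vote."""
    votes = Counter(pairwise[(a, b)](x) for a, b in combinations(classes, 2))
    # Ties go to the class listed first; this tie-break is an arbitrary choice.
    return max(classes, key=lambda c: votes[c])

def dag_predict(x, classes, pairwise):
    """DAG-style: each binary decision eliminates one class, so only k-1 classifiers run."""
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        winner = pairwise[(a, b)](x)
        # Drop the class that lost this comparison.
        remaining.remove(b if winner == a else a)
    return remaining[0]

# Hypothetical stand-ins for trained binary SVMs: 1-D nearest-center rules.
centers = {"sports": 0.0, "politics": 5.0, "tech": 10.0}
classes = list(centers)

def make_pair(a, b):
    # Returns a decision function that picks whichever class center is closer to x.
    return lambda x: a if abs(x - centers[a]) <= abs(x - centers[b]) else b

pairwise = {(a, b): make_pair(a, b) for a, b in combinations(classes, 2)}

label_vote = ovo_vote(4.0, classes, pairwise)
label_dag = dag_predict(4.0, classes, pairwise)
```

Both strategies reuse the same set of pairwise classifiers. They differ only at prediction time: voting evaluates every pair, while the DAG path evaluates k-1 of them.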

References

1. Joachims, Thorsten. "Text categorization with support vector machines: Learning with many relevant
features." European conference on machine learning. Springer Berlin Heidelberg, 1998.

2. Abe, Shigeo. Support vector machines for pattern classification. Vol. 2. London: Springer, 2005.

3. Chang, Chih-Chung, and Chih-Jen Lin. "LIBSVM: A library for support vector machines." ACM Transactions on
Intelligent Systems and Technology, 2:27:1-27:27, 2011.

Future work

In recent years, various interesting approaches to utilizing unlabeled data have emerged, such as self-training
and co-training, generative probabilistic models, semi-supervised support vector machines, and
graph-based semi-supervised learning. In the short term, I would like to study previous related work to find out
how other researchers have addressed this problem, and to carefully carry out an empirical study evaluating the
most common and effective approaches.

Title

Multi-class Text Classification using Support Vector Machines (SVMs)

Author Names and Affiliations

NGHIA NGUYEN HOANG

University of Information Technology,
Ho Chi Minh City

Problem Statement

Nowadays, a vast and ever-growing amount of data is being produced,
which makes it difficult to gain insights from that data. This
creates a strong demand for automatically classifying large amounts
of text information. Support vector machines have proved to be an
effective learning machine, especially for classification.

Contributions

• Hai NT, Nghia NH, Le TD, Nguyen VT. A
Hybrid Feature Selection Method for
Vietnamese Text Classification. In 2015
Seventh International Conference on
Knowledge and Systems Engineering
(KSE), 2015 Oct 8 (pp. 91-96). IEEE.

</div>