
Research and application of advanced automation techniques to automatic text summarization


VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY

DO THUY DUONG

RESEARCH AND APPLY EVOLUTIONARY
COMPUTATION TECHNIQUES ON
AUTOMATIC TEXT SUMMARIZATION

MASTER THESIS IN INFORMATION TECHNOLOGY

HANOI - 2015


Field: Information technology
Major: Software Engineering
Code: 60480103

SUPERVISOR: Assoc. Prof. Nguyen Xuan Hoai


Declaration of authorship
I, Do Thuy Duong, declare that this thesis ‘Research and apply evolutionary
computation techniques on automatic text summarization’ and the work
presented in it are my own.
I confirm that:
This work was done wholly or mainly while in candidature for a research degree
at this University;
Where any part of this thesis has previously been submitted for a degree or any
other qualification at this University or any other institution, this has been
clearly stated;
Where I have consulted the published work of others, this is always clearly
attributed;
I have acknowledged all main sources of help;
Where the thesis is based on work done by myself jointly with others, I have
made clear exactly what was done by others and what I have contributed myself;

Signed:
……………………………………………………………………………………


Date:
……………………………………………………………………………………



Acknowledgements
I am heartily thankful to my supervisor, Prof. Nguyen Xuan Hoai, whose
encouragement, guidance and support from the initial to the final stage
enabled me to develop an understanding of the topic.
I would like to express my gratitude to the teachers at the University of
Engineering and Technology, Vietnam National University, Hanoi, for helping
me to gain a large body of knowledge during my two years of study.
Lastly, I offer my regards and blessings to my friends and my family, who have
always encouraged me so that I could finish this challenging research.



Contents
Declaration of authorship
Acknowledgements
Contents
List of figures
List of tables
1. Chapter 1: Introduction
    1.1. Motivation
    1.2. Research Objectives
    1.3. Thesis overview
2. Chapter 2: Background knowledge
    2.1. Automatic text summarization
        2.1.1. Definition
        2.1.2. Types of text summarization
        2.1.3. Methodologies for automatic text summarization
    2.2. Evolutionary computation
    2.3. Differential evolution (DE)
    2.4. Conclusion
3. Chapter 3: Automatic text summarization using differential evolution algorithm
    3.1. Automatic text summarization using differential evolution (DE)
        3.1.1. Document collection representation
        3.1.2. Objective/Fitness function
        3.1.3. Main steps of differential evolution
        3.1.4. Experiment, result and discussion
    3.2. Improvement
        3.2.1. Method
        3.2.2. Experiment, result and discussion
    3.3. Conclusion
4. Chapter 4: Conclusion and future work
    4.1. Contributions
    4.2. Future work
5. Reference



List of figures
Figure 2.1. A typical summarization system
Figure 2.2. A summarizer highlights all sentences included in an extractive summary
Figure 2.3. An example of the abstract summary
Figure 2.4. Multi-document summarization
Figure 2.5. The general scheme of an Evolutionary Algorithm in pseudo-code
Figure 2.6. General scheme of evolutionary algorithms
Figure 2.7. Correlation between number of generations and best fitness in population
Figure 2.8. Steps of differential evolution algorithm
Figure 2.9. Steps to get the next X1 (generation 1)
Figure 3.1. Illustration of mutation operation
Figure 3.2. Illustration of crossover operation
Figure 3.3. Changes in summary length in [DE] method on DUC2004
Figure 3.4. Changes in summary length in [DE] method on DUC2007
Figure 3.5. Summary length in [MultiDE] method on DUC2004
Figure 3.6. Summary length in [MultiDE] method on DUC2007
Figure 3.7. Comparison between F-values of [DE] and [MultiDE] on DUC2004
Figure 3.8. Comparison between F-values of [DE] and [MultiDE] on DUC2007



List of tables
Table 2.1. The basic evolutionary computation linking natural evolution to problem solving
Table 2.2. Fitness of six individuals at generation 0
Table 2.3. Creation of mutant vector V1
Table 2.4. Creation of trial vector Z1
Table 2.5. Values of X1 in generation 1
Table 3.1. Description of the datasets used in the experiment
Table 3.2. Parameter settings of the first experiment
Table 3.3. Summary lengths of some document collections in DUC2004 using [DE] method
Table 3.4. Summary lengths of some document collections in DUC2007 using [DE] method
Table 3.5. F-Values of three evaluation measures of method [DE] on DUC2004 and DUC2007
Table 3.6. Parameter settings of the second experiment
Table 3.7. Summary lengths of some document collections in DUC2004 using [MultiDE] method
Table 3.8. Summary lengths of some document collections in DUC2007 using [MultiDE] method
Table 3.9. F-Values of three evaluation measures of method [MultiDE] on DUC2004 and DUC2007



1. Chapter 1
Introduction
Automatic text summarization means identifying the important content of one
or more documents and presenting it in condensed form. This is a very
challenging problem that relates to many scientific areas, such as artificial
intelligence, statistics and linguistics.
Much research has been conducted worldwide since 1950 and has produced
systems such as SUMMARIST, SweSUM, MEAD and SUMMON.
However, this research area remains challenging and attracts more and more
attention.
In this thesis, we study some evolutionary computation techniques and then
apply the differential evolution algorithm to a practical problem:
automatic text summarization, in particular multi-document summarization.
Moreover, we attempt to deal with the constraint on the summary length, which
has not been handled effectively in these stochastic population-based methods.
1.1. Motivation
Evolutionary computation techniques use different algorithms to evolve a
population of individuals over a certain number of generations. Operations
such as mutation, crossover and selection are applied to the population to
produce new offspring, which then compete with each other and with the
previous generation to survive, based on some evaluation function. The
process ends when a stopping criterion is reached, and the best individual
found is taken as the best solution to the real-world problem.
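The loop described above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used in this thesis: it assumes the common DE/rand/1/bin variant of differential evolution, a toy evaluation function (`sphere`) to be minimised, a fixed number of generations as the stopping criterion, and arbitrary parameter values (`f`, `cr`, `pop_size`) chosen for the example.

```python
import random


def sphere(x):
    """Toy evaluation function (an assumption for this sketch): lower is better."""
    return sum(v * v for v in x)


def differential_evolution(fitness, dim=5, pop_size=20, generations=100,
                           f=0.5, cr=0.9, bounds=(-5.0, 5.0), seed=0):
    """Minimise `fitness` with a basic DE/rand/1/bin loop (illustrative values)."""
    rng = random.Random(seed)
    low, high = bounds
    # Initial population: random real-valued vectors inside the bounds.
    pop = [[rng.uniform(low, high) for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]

    for _ in range(generations):  # stopping criterion: fixed generation budget
        next_pop, next_scores = [], []
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            others = [j for j in range(pop_size) if j != i]
            a, b, c = rng.sample(others, 3)
            mutant = [pop[a][k] + f * (pop[b][k] - pop[c][k]) for k in range(dim)]
            # Crossover: mix mutant and parent; j_rand forces one mutant gene.
            j_rand = rng.randrange(dim)
            trial = [mutant[k] if (rng.random() < cr or k == j_rand) else pop[i][k]
                     for k in range(dim)]
            # Keep the offspring inside the search bounds.
            trial = [min(max(v, low), high) for v in trial]
            # Selection: the offspring survives only if it is not worse.
            trial_score = fitness(trial)
            if trial_score <= scores[i]:
                next_pop.append(trial)
                next_scores.append(trial_score)
            else:
                next_pop.append(pop[i])
                next_scores.append(scores[i])
        pop, scores = next_pop, next_scores

    best = min(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]
```

Calling `differential_evolution(sphere)` returns the best vector found and its score. Because each offspring replaces its parent only when it is at least as good, the best score in the population can never get worse from one generation to the next.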
Evolutionary algorithms have been applied to solve numerous problems in
various fields, one of which is automatic text summarization. However, we have
found that, unlike sentence ranking methods, they are weak at handling the
summary length. Therefore, this research attempts to improve this aspect of
these algorithms.


1.2. Research Objectives
This thesis aims to study evolutionary computation techniques, especially the
differential evolution algorithm, and their application to the problem of
automatic text summarization. We identify the limitations of the ways other
researchers handle the summary length with this algorithm, then propose a new
method that keeps the summary within the length the user demands while
preserving the quality of the summary.
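To make the length constraint concrete, the sketch below checks whether a candidate extractive summary fits a word limit. It is a hypothetical illustration only: the binary selection vector, the word-based length measure and the names `summary_length` and `is_feasible` are assumptions made for this example, not the representation or the method proposed in this thesis.

```python
def summary_length(selection, sentences):
    """Total number of words in the sentences switched on by `selection`."""
    return sum(len(s.split()) for s, bit in zip(sentences, selection) if bit)


def is_feasible(selection, sentences, max_words):
    """True if the selected sentences fit within the user's word limit."""
    return summary_length(selection, sentences) <= max_words


# Hypothetical document collection, already split into sentences.
sentences = [
    "Evolutionary algorithms evolve a population of candidate solutions.",
    "Differential evolution is one such algorithm.",
    "A summary must respect a length limit set by the user.",
]

# Each candidate is a 0/1 vector: 1 means the sentence is in the summary.
short_candidate = [1, 1, 0]
long_candidate = [1, 0, 1]
```

With a limit of 15 words, `short_candidate` (14 words) is feasible while `long_candidate` (19 words) is not. An evolutionary search that ignores this check may return individuals like the second one.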
1.3. Thesis overview
The rest of this thesis is organized as follows. In Chapter 2, we review the
background knowledge of text summarization and its classification, and
introduce the main principles of evolutionary computation. In particular, the
differential evolution algorithm is discussed.
Chapter 3 explains in detail how this algorithm is applied to automatic text
summarization, in our case on multi-document collections. An experiment is
then performed to test the original differential evolution algorithm.
We then improve on the result of that experiment by dealing with the summary
length, so that the document collection is compressed quickly and
effectively.
Chapter 4 recapitulates the thesis, presents our contributions and states some
future research directions in this field.


