Tải bản đầy đủ (.pdf) (5 trang)

Báo cáo khoa học: "Evaluation of Importance of Sentences based on Connectivity to Title" doc

Bạn đang xem bản rút gọn của tài liệu. Xem và tải ngay bản đầy đủ của tài liệu tại đây (419.88 KB, 5 trang )

Evaluation of Importance of Sentences
based on Connectivity to Title
Takehiko Yoshimi and Toshiyuki Okunishi
Takahiro Yamaji and Yoji Fukumochi
Software Business Development Center, SHARP Corporation
492 Minosho-cho Yamatokoriyama Nara, Japan
Abstract
This paper proposes a method of selecting impor-
tant sentences from a text based on the evaluation
of the connectivity between sentences by using sur-
face information. We assume that the title of a text
is the most concise statement which expresses the
most essential information of the text, and that the
closer a sentence relates to an important sentence,
the more important this sentence is. The importance
of a sentence is defined as the connectivity between
the sentence and the title. The connectivity between
two sentences is measured based on correference be-
tween a pronoun and a preceding (pro)noun, and on
lexical cohesion of lexical items. In an experiment
with 80 English texts, which consist of an average
of 29.0 sentences, the proposed method has marked
recall of 78.2% and precision of 57.7%, with the se-
lection ratio being 25%. The recall and precision
values surpass those achieved by conventional meth-
ods, which means that our method is more effective
in abridging relatively short texts.
(Luhn, 1958; Edmundson, 1969; i~I~I~, 1987;
l~)g~)~ and ~I~, 1988; r~]i~l~ et al., 1989;
Salton et al., 1994; Brandow et al., 1995; ~'2~tZ~
and ~3kl~, 1995; {~)l~ et al., 1995; I£1~$1~ et


al., 1995; Watanabe, 1996; Zechner, 1996; {q~it~,
1997).
1443
i~©~ L~, 1)
~~9 ~
b~
~tS~ (F,~i~k~ et al., 1989; Ono et al., 1994)
(Hoey, 1991; Collier, 1994; ~gi~ , 1997; {~ ;~
~ et al., 1993) 7)~b5. ~&i~-~t,¢<'~-~
ot,¢7)~ 9 ~)~b;5 (Halliday and Hasan, 1976) 7)~, ~
2
2.1
ff-#Y, I- ~Ili~ J- R:q) ~_~l~ I= ~ ~ 7~ {K~
~:Z ( ¢~ ~) ~ b ~? :Z ( ©~ ~) ~ © ;~ ~
[A,B,C} ~.
$2 $4"~'' ~. ~.
'\
{A,D,E} [B,C,G}
~ ~\,
/
~'k IA,D,H,K}
{ A,D,E,F} {D,H}
f4~ .~lZ ~ 7o ~}/~[~.( 5~ ~
~)
[] 1: ~¢~/~6~]~
~.
i<3
(1)
2.2 ~a~ ~9 ~1~ o~ ~ffi
.~.~ ~ ©~-e~:, #o~ ~ ,~ Iz t~ t ~-c ~, ~.

I~, ~-~, ),.~.f~, .~-~, ~-~, ~l~©
~,-~-e~ ~ ~ L.,= e~_~°, t~.~.
& eS~¢)~t~ = M~,I
(e)
y_,y_z2, Mi, i
~S i ~©~ ~ t~S~
(2) ~9,~.~t~ 2.2A ~_/~-~-t6. _ 9¢~
1444
2.2.1
A,~~(~) :~O)~,,~o)~t~
2.2.2 ~@~ ~"~ ~/J¢ ~.J ~ ~
to~=~d'h~t~, ff~J~_[~, "put pressure on"
"put" [:t~' ~W, "cabinet meeting" ~ "meet-
ing" g~ ~ ~k'~ ,~,.~h~- 6.
2.2.3 ~-,, 0) ~:g~ {-~
:k:-~ t~{[~_ ~ (Edmundson, 1969; P,~8,~ et al.,
1989; Watanabe, 1996) ©~){~'~bTo. ~g~'~t±,
t
1
b, ~.~I: w : 5 ~ bt:.
2.2.4
~-~:a)~.a) 9~ 9
~X Sj
t:_$31,,~©~ (theme) ~ b'~l~,
(Givon, 1979). ~o~:, ~ Sj ¢)~Y~S~ ~e)~tZ]~
(t~, 1985)~L ~T~, ~e{£~$<,
~ ~:~#~, Sj ~ ~l~]~ ~o~ , ~©~i~:Y. ~"
~X~¢) 1/4 ~-F¢~
3
¢)~© 17.9% ~b o ~:

b'~I~-~hl~T~ (~k~/K
and ~
t
F
~S¢)~_~ =
N
-:zx~.~ A, ~~,~o©-:z:~-~ B, C,
25%
~ LT:. ~2
1:-3:~l~,
~ +Yi=~J~
1445
20
15
~
b~J~
10
I I
© 0
0 0
© © 0
0 © 0
0 © 0 0
0 0 © O0
0 0 0 0 O0
0 0 0 0 O0
I
40 60
I
2O

[] 2:
~-~t:~ 6~/~¢
$
©
©
© ©
I
8O
100
~-~ 78.2% 57.7% 25%
-:/x ~.h A 72.3% 52.6% 26%
-:/;z ~ ~ B 61.7% 39.5% 29%
-:/y~ ~.h C 61.4% 40.9% 29%
->" ~ ~- A D 57.5% 42.2% 27%
~g/~otz=k-~5.
~za~-~x b-~l~, "shooting"
"gunfire" ©~1~,~-~-~
t;~z~,
"gun-
fire"
~ZE~t~
g'~ ~lz ~o ot~7]~ ~ t~!, , ~ 7%
~#©(~ b ~:~ (base) ©~J~ =~ ~t~ ~_tff,
nounce" ~ "announcement" t~:, ~.~ L~ b~"
1446
t$5.
4
~3~ ~) l:
t: ~ btz ~ -~, ~ 78.2%, ~#~ 57.7% ©~
~6 : k ~, U~=.

RIU (Hearst, 1997), ~r+)-:~ " b l:°y~ ' ='~ {:-~-T-~
References
R. Brandow, K. Mitze, and L. F. Rau. 1995. Auto-
matic Condensation of Electric Publications by
Sentence Selection. Information Processing &'
Management, 31(5):675-685.
100
80
6O
~#m
/
~m
(%)
4O
20
0
0
~,~ ' , _ ~'¢¢
x. ~ -~ + ' i~IgN~NNo~N~ -+. -
+.
I I I I
20 40 60 80 100
~m (%)
~3:~~~~
A. Collier. 1994. A System for Automating Concor-
dance Line Selection. In Proceedings of NeMLaP,
pages 95-100.
H. P. Ednmndson. 1969. New Methods in
Automatic Extracting. Journal of the ACM,
16(2):264-285.

T. Givon. 1979. From Discourse to Syntax: Gram-
mar as a Processing Strategy. In T. Givon, editor,
Discourse and Syntax, pages 81-112. Academic
Press.
M. A. K. Halliday and R. Hasan. 1976. Cohesion in
English. Longman.
M. A. Hearst. 1997. TextTiling: Segmenting Text
into Multi-paragraph Subtopic Passages. Compu-
tational Linguistics, 23(1):33-64.
M. Hoey. 1991. Patterns of Lexis in Text. Describ-
ing English Language. Oxford University Press.
H. P. Luhn. 1958. The Automatic Creation of Lit-
erature Abstracts. IBM Journal for Research and
Development, 2(2):159-165.
K. Ono, K. Sumita, and S. Miike. 1994. Abstract
Generation based on Rhetorical Structure Extrac-
tion. In Proceedings of COLING, pages 344-348.
G. Salton, J. Allan, C. Buckley, and A. Singhal.
1994. Automatic Analysis, Theme Generation,
and Summarization of Machine-Readable Texts.
Science, 264(3):1421-1426.
H. Watanabe. 1996. A Method for Abstracting
Newspaper Articles by Using Surface Clues. In
Proceedings of COLING, pages 974-979.
1447
K. Zechner. 1996. Fast Generation of Abstracts
from General Domain Text Corpora by Extract-
ing Relevant Sentences. In Proceedings of COL-
ING, pages 986-989.
r,~]i¢~, ~i/~, and ~I~ 1989. ~P)~3~cP~')~]~t~

~I:-"p~'~. NLC89-40, ~-T-~{~.
~I~B. 1987. ~~t~-:,':xff-2~. NL63-
6, '1~¢~~.
~'~ ~, ~, and P~J~ 1993. ~3~
~.
~J~J~Fq, ~, and ~1~ 1995. ~-Y-=~
:~©~'4~-~ ~-~. ~~:~,
36(10):2371-2379.
~g~n~:, ~1~, and I~ 1995. ~t~
~-~Iz$~J~ L ~': ~~-:/~ ff h GREEN.
~J~, 2(1):39-55.
~'2~1:1:~ and ~<;~gl~. 1995. ~ffl/¢~ :/©~
~~~, 36(8):1838-1844.
{~. 1997. ~L~$~J~bt:.~[~ • t~ b~
~.
~. 1985.
*~©~t. :k~.
~2g?~ 1997. 3~©~#~t:-~-'-J< I~1~. In
,,J~J~x~ 31~lZ~:~-~l~, pages 321-
324.
~g~)X and ~1~. 1988. ~ ~7 1~'~9~
~~ ~ ~, 29(3):325-328.

×