Tải bản đầy đủ (.pdf) (8 trang)

Báo cáo khoa học: "Extracting Key Semantic Terms from Chinese Speech Query for Web Searches" ppt

Bạn đang xem bản rút gọn của tài liệu. Xem và tải ngay bản đầy đủ của tài liệu tại đây (163.49 KB, 8 trang )

ExtractingKeySemanticTermsfromChineseSpeechQueryforWeb
Searches

GangWANG

NationalUniversityof
Singapore



Tat-SengCHUA

NationalUniversityofSinga-
pore


Yong-ChengWANG

ShanghaiJiaoTongUniver-
sity,China,200030

Abstract
This paper discusses the challenges and pro-
poses a solution to performing information re-
trievalontheWebusingChinesenaturallanguage
speech query. The main contribution of this re-
searchisindevisingadivide-and-conquerstrategy
toalleviatethespeech recognition errors. It uses
thequerymodeltofacilitatetheextractionofmain
coresemanticstring(CSS)fromtheChinesenatu-
rallanguagespeechquery.ItthenbreakstheCSS


into basic components corresponding to phrases,
and uses a multi-tier strategy to map the basic
components to known phrases inorderto further
eliminatetheerrors.Theresultingsystemhasbeen
foundtobeeffective.
1 Introduction
Weareentering aninformation era, where infor-
mationhasbecomeoneofthemajorresourcesin
ourdailyactivities.Withitswidespreadadoption,
Internethasbecomethelargestinformationwealth
for all to share.Currently, most (Chinese)search
engines can only support term-based information
retrieval,wheretheusersarerequiredtoenterthe
queriesdirectlythroughkeyboardsinfrontofthe
computer. However, there is a large segment of
populationinChinaandtherestoftheworldwho
areilliterateanddonothavetheskillstousethe
computer.Theyarethusunabletotakeadvantage
ofthevastamountoffreelyavailableinformation.
Since almost every person can speak and under-
standspokenlanguage,theresearchon“(Chinese)
natural language speech query retrieval” would
enableaveragepersonstoaccessinformationusing
thecurrentsearchengineswithouttheneedtolearn
specialcomputerskillsortraining.Theycansim-
ply access the search engine using common de-
vices that they are familiar with such as the
telephone,PDAandsoon.
Inordertoimplementaspeech-basedinforma-
tion retrieval system, one of the most important

challengesishowtoobtainthecorrectqueryterms
fromthespokennaturallanguagequerythatcon-
veythemainsemanticsofthequery.Thisrequires
theintegrationofnaturallanguagequeryprocess-
ingandspeechrecognitionresearch.
Naturallanguagequeryprocessinghasbeenan
activeareaofresearchfor many yearsandmany
techniques have been developed (Jacobs and
Rau1993;Kupie,1993;
Strzalkowski,1999;Yuet
al,1999).Mostofthesetechniques,however,focus
onlyonwrittenlanguage,withfewdevotedtothe
studyofspokenlanguagequeryprocessing.
Speech recognition involves the conversion of
acousticspeechsignalstoastreamoftext.Because
ofthecomplexityofhumanvocaltract,thespeech
signalsbeingobservedaredifferent,evenformul-
tipleutterancesofthesamesequenceofwordsby
thesameperson(Leeetal1996).Furthermore,the
speechsignalscanbeinfluencedbythedifferences
across different speakers, dialects, transmission
distortions, and speaking environments. These
have contributed to the noise and variability of
speechsignals.Asoneofthemainsourcesofer-
rorsinChinesespeechrecognitioncomefromsub-
stitution (Wang 2002; Zhou 1997), in which a
wrongbutsimilarsoundingtermisusedinplaceof
thecorrectterm,confusionmatrixhasbeenusedto
recordconfusedsoundpairsinanattempttoelimi-
nate this error. Confusion matrix has been em-

ployed effectively in spoken document retrieval
(Singhaletal,1999andSrinivasanetal2000)and
tominimizespeechrecognitionerrors(Shenetal,
1998). However, when such method is used di-
rectlytocorrectspeechrecognitionerrors,ittends
tobringin too many irrelevantterms(Ng2000).
Becauseimportant terms in a longdocumentare
oftenrepeatedseveraltimes,thereisagoodchance
thatsuchtermswillbecorrectlyrecognizedatleast
oncebyaspeechrecognitionenginewithareason-
ablelevelof wordrecognitionrate.Manyspoken
documentretrieval(SDR)systemstookadvantage
ofthisfactinreducingthespeechrecognitionand
matchingerrors(Mengetal2001;Wangetal2001;
Chen et al2001). Incontrastto SDR,very little
work has been done on Chinese spoken query
processing(SQP),whichistheuseofspokenque-
riestoretrievaltextualdocuments.Moreover,spo-
kenqueriesinSQPtendtobeveryshortwithfew
repeatedterms.
In this paper, we aim to integrate the spoken
languageandnaturallanguageresearchtoprocess
spokenquerieswithspeechrecognitionerrors.The
maincontributionofthisresearchisindevisinga
divide-and-conquerstrategytoalleviatethespeech
recognition errors. It first employs the Chinese
query model to isolate the Core Semantic String
(CSS) that conveys the semantics of the spoken
query. It then breaks the CSS into basic compo-
nentscorrespondingtophrases,andusesamulti-

tierstrategytomapthebasiccomponentstoknown
phrasesinadictionaryinordertofurthereliminate
theerrors.
Intherestofthispaper,anoverviewofthepro-
posedapproachisintroducedinSection2.Section
3describesthequerymodel,whileSection4out-
lines the use of multi-tier approach to eliminate
errorsinCSS.Section5discussestheexperimental
setup and results.Finally,Section 6 contains our
concludingremarks.
2 Overviewoftheproposedapproach
Therearemanychallengesinsupportingsurfingof
Webbyspeechqueries.Oneofthemainchallenges
isthatthecurrentspeechrecognitiontechnologyis
notverygood,especiallyforaverageusersthatdo
nothaveanyspeechtrainings.Forsuchunlimited
user group, the speech recognition engine could
achieveanaccuracyoflessthan50%.Becauseof
this,thekeyphraseswederived fromthespeech
querycouldbeinerrorormissingthemainseman-
ticofthequeryaltogether.This wouldaffectthe
effectivenessoftheresultingsystemtremendously.
Giventhespeech-to-textoutputwitherrors,the
keyissueisonhowtoanalyzethequeryinorderto
grasptheCoreSemanticString(CSS)asaccurately
as possible. CSS is defined as the key term se-
quenceinthequerythatconveysthemainseman-
tics of the query. For example, given the query:

”(Pleasetell

metheinformationonhowtheU.S.separatesthe
most-favored-nation status from human rights is-
sueinchina).TheCSSinthequeryisunderlined.
WecansegmenttheCSSintoseveralbasiccom-
ponentsthatcorrespondtokey concepts suchas:
 (U.S.),  (China),  (human
rightsissue),
(themost-favored-nation
status)and
(separate).
Because of the difficulty in handling speech
recognitionerrorsinvolvingmultiple segmentsof
CSSs,welimitourresearchtoqueriesthatcontain
onlyoneCSSstring.However,weallowaCSSto
includemultiplebasiccomponentsasdepicted in
theaboveexample.Thisisreasonableasmostque-
riesposedbytheusersontheWebtendtobeshort
withonlyafewcharacters(Pu2000).
Thus the accurate extraction of CSS and its
separation into basic components is essential to
alleviatethespeechrecognitionerrors.Firstofall,
isolatingCSSfromtherestofspeechenablesusto
ignoreerrorsinotherpartsofspeech,suchasthe
greetingsandpoliteremarks,whichhavenoeffects
ontheoutcomeofthequery.Second,byseparating
theCSSintobasiccomponents,wecanlimitthe
propagationoferrors,andemploythesetofknown
phrasesinthedomaintohelpcorrecttheerrorsin
thesecomponentsseparately.








Figure1:Overviewoftheproposedapproach
To achieve this, weprocess the query in three
mainstagesasillustratedinFigure1.First,given
theuser’soralquery,thesystemusesaspeechrec-
ognitionenginetoconvertthespeechtotext.Sec-
ond, we analyze the query using a query model
(QM) to extract CSS from the query with mini-
mumerrors.QMdefinesthestructuresandsome
of the standard phrases used in typical queries.
Third,wedividetheCSSintobasiccomponents,
andemployamulti-tierapproachtomatchtheba-
QM

Confusionmatrix

PhraseDictionary

Multi-Tier
mapping
Basic
Components

Speech
Query


CSS

sic components to the nearest known phrases in
ordertocorrectthespeechrecognitionerrors.The
aimhereistoimproverecallwithoutexcessivelost
in precision. The resulting key components are
thenusedasquerytostandardsearchengine.
The following sections describe the details of
ourapproach.
3 QueryModel(QM)
Querymodel (QM) is used to analyzethe query
and extract the core semantic string (CSS) that
containsthemainsemanticofthequery.Thereare
twomaincomponentsforaquerymodel.Thefirst
isquery componentdictionary,which isa set of
phrasesthathascertainsemanticfunctions,suchas
the polite remarks, prepositions, time etc. The
othercomponentisthequerystructure,whichde-
finesasequenceofacceptablesemanticallytagged
tokens, such as “Begin, Core Semantic String,
QuestionPhrase, and End”.Each querystructure
alsoincludesitsoccurrenceprobabilitywithinthe
query corpus. Table 2 gives some examples of
querystructures.
3.1QueryModelGeneration
Inordertocomeupwithasetofgeneralizedquery
structures, we use a query log of typical queries
posedbyusers.Thequerylogconsistsof557que-
ries,collectedfromtwenty-eighthumansubjectsat

the Shanghai Jiao Tong University (Ying 2002).
Eachsubjectisaskedtopose20separatequeriesto
retrievegeneralinformationfromtheWeb.
After analyzing the queries, we derive a query
modelcomprising51querystructuresandasetof
query components. For each query structure, we
compute its probability of occurrence, which is
used to determine the more likely structure con-
tainingCSSincasetherearemultipleCSSsfound.
Aspartoftheanalysisofthequerylog,weclassify
thequerycomponentsintotenclasses,aslistedin
Table1.Thesetenclassesarecalledsemantictags.
Theycanbefurtherdividedintotwomaincatego-
ries:theclosedclassandopenclass.Closedclasses
are those that have relatively fixed word lists.
Theseincludequestionphrases,quantifiers,polite
remarks, prepositions, time and commonly used
verb and subject-verb phrases. Wecollectallthe
phrasesbelongingtoclosedclassesfromthequery
logandstoretheminthequerycomponentdiction-
ary.TheopenclassistheCSS,which wedonot
knowinadvance.CSStypicallyincludesperson’s
names,eventsandcountry’snamesetc.
Table1:DefinitionandExamplesofSemantictags
SemTag

Nameoftag Example
1. Verb-Object
Phrase
 give  

(me)
2. QuestionPhrase
(isthere)
3. QuestionField
(news),
(report)
4. Quantifier
(some)
5. VerbPhrase
(find)
collect 
6. PoliteRemark
(pleasehelp
me)
7. Preposition
(about),
(about)
8. Subject-Verb
phrase
(I) (want)
9. CoreSemantic
String
9.11 
(9.11event)
10. Time
(today)
Table2:ExamplesofQueryStructure

1


Q1:0,2,7,9,3,0:0.0025,
 9.11  
2793
IsthereanyinformationonSeptember11?

2

Q2:0,1,7,9,3,0:0.01
   
1793
GivemesomeinformationaboutBenladen.
Giventhesetofsamplequeries,aheuristicrule-
basedapproachisusedtoanalyzethequeries,and
break them into basic components with assigned
semantictagsbymatchingthewordslistedinTa-
ble 1. Any sequences of words or phrases not
foundintheclosedclassaretaggedasCSS(with
Semantic Tag 9). We can thus derive the query
structuresoftheformgiveninTable2.
3.2ModelingofQueryStructureasFSA
Duetospeechrecognitionerrors,wedonotexpect
thequerycomponentsandhencethequerystruc-
turetoberecognizedcorrectly. Instead,weparse
thequerystructure inordertoisolateandextract
CSS.Tofacilitatethis,weemploytheFiniteState
Automata(FSA)tomodelthequerystructure.FSA
modelstheexpectedsequencesoftokensintypical
queries andannotate the semantictags,including
CSS.AFSA isdefinedforeach of the51query
structures.AnexampleofFSAisgiveninFigure2.

BecauseCSSisanopenset,wedonotknowits
contentinadvance.Instead,weusethefollowing
tworulestodeterminethecandidatesforCSS:(a)
itisan unknownstring not present intheQuery
Component Dictionary; and (b) its length is not
lessthantwo,astheaveragelengthofconceptsin
Chineseisgreaterthanone(Wang1992).
At each stage of parsing the query using FSA
(Hobbsetal1997),weneedtomakedecisionon
which state to proceedand how to handleunex-
pected tokens in the query. Thus at each stage,
FSAneedstoperformthreefunctions:
a) Gotofunction:Itmapsapairconsistingofa
stateandaninputsymbolintoanewstateor
thefailstate.WeuseG(N,X)=N’todefine
thegotofunctionfromStateNtoStateN’,
giventheoccurrenceoftokenX.
b) Fail function: It is consulted whenever the
gotofunctionreportsafailurewhenencoun-
teringanunexpectedtoken.Weusef(N)=N’
torepresentthefailfunction.
c) Output function: In the FSA, certain states
aredesignatedasoutputstates,which indi-
cate that a sequence of tokens has been
found and are tagged with the appropriate
semantictag.
To construct a goto function, we begin with a
graph consisting of one vertex which represents
State0.WethenentereachtokenXintothegraph
byaddingadirectedpathtothegraphthatbegins

atthestartstate.Newverticesandedgesareadded
tothegraph sothat therewill be, startingat the
startstate,apathinthegraphthatspellsoutthe
tokenX.ThetokenXisaddedtotheoutputfunc-
tionofthestateatwhichthepathterminates.
Forexample,supposethatourQueryComponent
Dictionary consists of seven phrases as follows:

 (please help me);  (some); 
(about);
(news); (collect); (tell
me);
 (what do youhave)”. Adding these
tokensintothegraphwillresultinaFSAasshown
inFigure2.ThepathfromState0toState3spells
outthephrase“
(Pleasehelpme)”,andon
completion of this path, we associate its output
withsemantictag6.Similarly,theoutputof“

(some)” is associated with State 5, and semantic
tag4,andsoon.
Wenowuseanexampletoillustratetheprocess
of parsing the query. Suppose the user issues a
speechquery:”
” (please help me to collect some information
about Bin Laden).However, the resultofspeech
recognition witherrors is: ”
 (please)  (help)
(me) (receive) (send) (some)

(about) (half) (pull) (light) (of) 
(news)”. Note that there are 4 mis-recognized
characterswhichareunderlined.

Note:indicatesthesemantictag.
Figure2:FSAforpartofQueryComponentDictionary
TheFSAbeginswithState0.Whenthesystem
encountersthesequenceofcharacters
(please)
(help) (me),thestatechangesfrom0to1,2
andeventuallyto3.AtState3,thesystemrecog-
nizes a polite remark phrase and output a token
withsemantictag6.
Next,thesystemmeetsthecharacter
(receive),
itwilltransittoState10,becauseofg(0,
)=10.
Whenthesystemseesthenextcharacter
(send),
which does not have a corresponding transition
rule, the goto function reports a failure. Because
thelengthofthestringis2andthestringisnotin
theQueryComponentDictionary,thesemantictag
9isassignedtotoken”
”accordingtothedefi-
nitionofCSS.
By repeating the aboveprocess, we obtain the
followingresult:
     
694793

HerethesemantictagsareasdefinedinTable1.
Itisnotedthatbecauseofspeechrecognitionerrors,
thesystem detected twoCSSs,andboth ofthem
containspeechrecognitionerrors.
3.3CSSExtractionbyQueryModel
Giventhat we mayfind multiple CSSs, the next
stageistoanalyzetheCSSsfoundalongwiththeir
surroundingcontextinordertodeterminethemost
probableCSS.Theapproachisbasedontheprem-
isethatchoosingthebestsenseforaninputvector
amountstochoosingthemostprobablesensegiven
that vector. The input vector i has three compo-
nents:leftcontext(L
i
),theCSSitself(CSS
i
),and
rightcontext(R
i
).Theprobabilityofsuchastruc-
tureoccurringintheQueryModelisasfollows:

=
=
n
j
jiji
pCs
0
)*(  (1)

whereC
ij
issetto 1ifthe inputvectori(L
i
,R
i
)
matchesthetwocorrespondingleftandrightCSS
contextofthequerystructurej,and0otherwise.p
j

is the possibility of occurrence of the j
th
 query
structure,andnisthetotalnumberofthestructures
intheQueryModel.NotethatEquation(1)givesa
detectedCSShigherweightifitmatchestomore
querystructureswithhigheroccurrenceprobabili-
ties. We simply select the best CSS
i
 such that
)(maxarg
i
i
s
accordingtoEqn(1).
Forillustration,let’sconsidertheaboveexample
with2detectedCSSs.ThetwoCSSvectorsare:[6,
9, 4] and [7, 9, 3]. From the Query Model, we
know that the probability of occurrence, p

j
, of
structure[6,9,4]is0,andthatofstructure[7,9,3]
is0.03,withthelattermatchestoonlyonestruc-
ture.Hencethes
i
valuesforthemare0and0.03
respectively.Thusthemostprobablecoresemantic
structureis[7,9,3]andtheCSS“
(half) (pull)
(light)”isextracted.
4 QueryTermsGeneration
Becauseofspeechrecognitionerror,theCSSob-
tained is likely to contain error, or in the worse
case,missingthemainsemanticsofthequeryalto-
gether.Wenowdiscusshowwealleviatetheerrors
inCSSfortheformercase.Wewillfirstbreakthe
CSS into one or more basic semantic parts, and
thenapplythemulti-tiermethodtomapthequery
componentstoknownphrases.
4.1BreakingCSSintoBasicComponents
Inmanycases,theCSSobtainedmaybemadeup
ofseveralsemanticcomponentsequivalenttobase
nounphrases.Hereweemployatechniquebased
onChinesecutmarks(Wang1992)toperformthe
segmentation. The Chinese cut marks are tokens
that can separate aChinesesentence into several
semanticparts.Zhou(1997)usedsuchtechniqueto
detectnewChinesewords,andreportedgoodre-
sults with precision and recall of 92% and 70%

respectively.ByseparatingtheCSSintobasickey
components,wecanlimitthepropagationoferrors.
4.2Multi-tierquerytermmapping
Inordertofurthereliminatethespeechrecognition
errors,weproposeamulti-tierapproachtomapthe
basic componentsin CSS into known phrases by
usingacombinationofmatchingtechniques.Todo
this,weneedtobuildupaphrasedictionarycon-
taining typical conceptsused ingeneral and spe-
cificdomains.MostbasicCSScomponentsshould
bemappedtooneofthesephrases.Thusevenifa
basiccomponentcontainserrors,aslongaswecan
findasufficientlysimilarphraseinthephrasedic-
tionary, wecanusethisinplaceoftheerroneous
CSScomponent,thuseliminatingtheerrors.
We collected a phrase dictionary containing
about32,842phrases,covering mostlybasenoun
phraseandnamedentity.Thephrasesarederived
fromtwosources.We firstderivedasetofcom-
mon phrases from the digital dictionary and the
logsinthesearchengineusedattheShanghaiJiao
TongUniversity.Wealsoderivedasetofdomain
specific phrases by extracting the base noun
phrasesandnamedentitiesfromtheon-linenews
articlesobtainedduringtheperiod.Thisapproach
isreasonableasinpracticewecanuserecentweb
ornewsarticlesto extractconceptstoupdatethe
phrasedictionary.
Given the phrase dictionary, the next problem
then is to map the basicCSS components tothe

nearest phrases in the dictionary. As the basic
componentsmaycontainerrors,wecannotmatch
them exactly just at the character level. We thus
propose to match each basic component with the
knownphrasesinthedictionaryatthreelevels:(a)
character level; (b) syllable string level; and (c)
confusion syllable string level. The purpose of
matching at levels b and c is to overcome the
homophoneprobleminCSS.Forexample,“

(Laden)” is wrongly recognized as “
 (pull
lamp)”bythespeechrecognitionengine.Sucher-
rorscannotbere-solvedatthecharactermatching
level,butitcanprobablybematchedatthesyllable
stringlevel.Theconfusionmatrixisusedtofurther
reducetheeffectofspeechrecognitionerrorsdue
tosimilarsoundingcharacters.
To account for possible errors in CSS compo-
nents, we perform similarity, instead of exact,
matchingatthethreelevels.GiventhebasicCSS
componentq
i
,andaphrasec
j
inthedictionary,we
compute:
=
=
),(

0
*
|}||,max{|
),(
),(
ii
cqLCS
k
k
ii
ii
ii
M
cq
cqLCS
cqSim
(2)
where LCS(q
i
,c
j
)gives the number of characters/
syllablematchedbetweenq
i
andc
i
intheorderof
theirappearanceusingthelongestcommonsubse-
quence matching (LCS) algorithm (Cormen et al
1990).M

k
isintroducedtoaccountsforthesimilar-
itybetweenthetwomatchingunits,andisdepend-
ent on the level of matching. If the matching is
performedatthecharacterorsyllablestringlevels,
thebasicmatchingunitisonecharacteroronesyl-
lableandthesimilaritybetweenthetwomatching
unitsis1.Ifthematchingisdoneattheconfusion
syllablestringlevel,M
k
isthecorrespondingcoef-
ficientsintheconfusionmatrix.HenceLCS(q
i
,c
j
)
givesthedegreeofmatchbetweenq
i
andc
j
,nor-
malizedbythemaximumlengthofq
i
orc
j
;andΣM
gives the degree of similarity between the units
beingmatched.
Thethreelevelofmatchingalsorangesfrombe-
ingmoreexactatthecharacterlevel,tolessexact

attheconfusionsyllablelevel.Thusifwecanfind
a relevant phrase with sim(q
i
,c
j
)>
 at the higher
characterlevel,wewillnotperformfurthermatch-
ing at the lower levels. Otherwise, we will relax
theconstrainttoperformthe matchingatsucces-
sivelylowerlevels,probablyattheexpenseofpre-
cision.
 Thedetailofalgorithmislistedasfollows:
Input:BasicCSSComponent,q
i

a. Matchq
i
withphrasesindictionaryatcharacter
levelusingEqn.(2).
b. Ifwecannotfindamatch,thenmatchq
i
with
phrasesatthesyllablelevelusingEqn.(2).
c. Ifwestillcannotfindamatch,matchq
i
with
phrasesattheconfusionsyllablelevelusing
Eqn.(2).
d. Ifwefoundamatch,setq’

i
=c
j
;otherwiseset
q’
i
=q
i
.
Forexample,givenaquery:“
”(pleasetellmesomenewsabout
Iraq).Ifthequeryiswronglyrecognizedas“
”. If, however, we
couldcorrectly extracttheCSS“
(Iraq)
fromthismis-recognizedquery,thenwecouldig-
norethespeechrecognitionerrorsinotherpartsof
the above query. Even if there are errors in the
CSSextracted,suchas“
(chen) (waterside)”
insteadof“
(chenshuibian)”,wecouldap-
plythesyllablestringlevelmatchingtocorrectthe
homophone errors. For CSS errors such as “

(corrupt)
(usually)”insteadofthecorrectCSS

(Taliban)”, which could not be corrected
atthesyllablestringmatchinglevel,wecouldap-

plytheconfusionsyllablestringmatchingtoover-
comethiserror.
5 Experimentsandanalysis
Asoursystem aimsto correct theerrorsand ex-
tractCSScomponentsinspokenqueries,itisim-
portant todemonstrate thatour system is able to
handlequeriesofdifferentcharacteristics.Tothis
end,wedevisedtwosetsoftestqueriesasfollows.
a)Corpuswithshortqueries
We devised 10 queries, each containing a CSS
withonlyonebasiccomponent.Thisisthetypical
typeofqueriesposedbytheusersontheweb.We
asked10 differentpeopleto “speak” thequeries,
and used the IBM ViaVoice 98 to perform the
speechtotextconversion.Thisgivesrisetoacol-
lectionof100spokenqueries.Thereisatotalof
1,340Chinesecharactersinthetestquerieswitha
speechrecognitionerrorrateof32.5%.
b)Corpuswithlongqueries

Inordertotestonqueriesusedinstandardtest
corpuses,weadoptedthequerytopics(1-10)em-
ployed in TREC-5Chinese-Languagetrack.Here
each query contains more thanone key semantic
component.Werephrasedthequeriesintonatural
languagequeryformat,andaskedtwelvesubjects
to “read” the queries. We again used the IBM
ViaVoice98toperformthespeechrecognitionon
theresulting120 differentspokenqueries,giving
risetoatotalof2,354Chinesecharacters witha

speechrecognitionerrorrateof23.75%.
Wedevisedtwoexperimentstoevaluatetheper-
formance of ourtechniques.The firstexperiment
wasdesignedtotesttheeffectivenessofourquery
model in extracting CSSs. The second was de-
signedtotesttheaccuracyofouroverallsystemin
extractingbasicquerycomponents.

5.1Test1:AccuracyofextractingCSSs
The test results show that by using our query
model,wecouldcorrectlyextract99%and96%of
CSSs from the spoken queries for the short and
long query category respectively. The errors are
mainly due to the wrong tagging of some query
components,whichcausedthequerymodeltomiss
the correct querystructure, or match to a wrong
structure.
Forexample:giventhequery“
”(pleasetellmesomenewsabout
Taliban).Ifitiswronglyrecognizedas:
   
97910
which is a nonsensical sentence. Since the prob-
abilitiesofoccurrencebothquerystructures[0,9,7]
and[7,9,10]are0,wecouldnotfindtheCSSatall.
Thiserrorismainlyduetothemis-recognitionof
thelastquerycomponent“
(news)”to“ 
(afternoon)”.ItconfusestheQueryModel,which
couldnotfindthecorrectCSS.

Theoverallresultsindicatethattherearefewer
errorsinshortqueriesassuchqueriescontainonly
one CSS component. This is encouraging as in
practicemostusersissueonlyshortqueries.
5.2Test2:Accuracyofextracting basic query
components
In order to test the accuracy of extracting basic
querycomponents,weaskedonesubjecttomanu-
ally divide the CSS into basic components, and
used that as the ground truth. We compared the
followingtwomethodsofextractingCSScompo-
nents:
a) As a baseline, we simply performed the stan-
dardstopwordremovalanddividedthequery
intocomponentswiththehelpofadictionary.
However, there is no attempt to correct the
speechrecognitionerrorsinthesecomponents.
Hereweassumethatthenaturallanguagequery
isabagofwordswithstopwordremoved(Ri-
cardo,1999).Currently,mostsearchenginesare
basedonthisapproach.
b)WeappliedourquerymodeltoextractCSSand
employed the multi-tier mapping approach to
extractandcorrecttheerrorsinthebasicCSS
components.
Tables 3 and 4 give the comparisons between
Methods(a)and(b),whichclearlyshowthatour
methodoutperformsthe baselinemethodbyover
20.2% and20%inF
1

measure fortheshortand
longqueriesrespectively.
Table3:ComparisonofMethodsaandbforshortquery
 Average
Precision

Average
Recall
F
1

Methoda

31% 58.5% 40.5%
Methodb

53.98% 69.4% 60.7%
 +22.98%

+10.9% +20.2%
Table4:ComparisonofMethodsaandbforlongquery
 Average
Precision

Average
Recall
F
1

Methoda


39.23% 85.99% 53.9%
Methodb

67.75% 81.31% 73.9%
 +28.52%

-4.68% +20.0%
Theimprovementislargelyduetotheuseofour
approach to extract CSS and correct the speech
recognition errors in the CSS components. More
detailedanalysisoflongqueriesinTable3reveals
thatourmethodperformsworsethanthebaseline
method in recall. This is mainly due to errors in
extracting and breaking CSS into basic compo-
nents. Although we used the multi-tier mapping
approachtoreducetheerrorsfromspeechrecogni-
tion, its improvement is insufficient to offset the
lost in recallduetoerrors inextractingCSS.On
theotherhand, fortheshortquerycases,without
theerrorsinbreakingCSS,oursystemismoreef-
fectivethanthebaselineinrecall.Itisnotedthatin
bothcases,oursystemperformssignificantlybet-
terthanthebaselineintermsofprecisionandF
1

measures.
6 Conclusion
Althoughresearchonnaturallanguagequeryproc-
essingandspeechrecognitionhasbeencarriedout

formanyyears,thecombinationofthesetwoap-
proachesto help a large population of infrequent
usersto“surfthewebbyvoice”hasbeenrelatively
recent. This paper outlines a divide-and-conquer
approachtoalleviatetheeffectofspeechrecogni-
tionerror,andinextractingkeyCSScomponents
foruseinastandardsearchenginetoretrieverele-
vantdocuments.Themaininnovativestepsinour
system are: (a) we use a query model to isolate
CSSinspeechqueries;(b)webreaktheCSSinto
basiccomponents;and(c)weemployamulti-tier
approach tomapthebasiccomponentstoknown
phrases in the dictionary. The tests demonstrate
thatourapproachiseffective.
Theworkisonlythebeginning.Furtherresearch
canbecarriedoutasfollows.First,asmostofthe
queriesareaboutnamedentities suchastheper-
sonsororganizations,weneedtoperformnamed
entityanalysis onthequeriestobetterextractits
structure,andinmappingtoknownnamedentities.
Second,mostspeechrecognitionenginewillreturn
a list of probable words for each syllable. This
couldbeincorporatedintoourframeworktofacili-
tatemulti-tiermapping.
References
BerlinChen,Hsin-minWang,andLin-ShanLee
(2001),“ImprovedSpokenDocumentRetrieval
byExploringExtraAcousticandLinguistic
Cues”,Proceedingsofthe7thEuropeanConfer-
enceonSpeechCommunicationandTechnology

locatedat
/>PaulS.JacobsandLisaF.Rau(1993),Innova-
tionsinTextInterpretation,ArtificialIntelli-
gence,Volume63,October1993(SpecialIssue
onTextUnderstanding)pp.143-191
Thomas H. Cormen, Charles E. Leiserson and
RonaldL.Rivest(1990),“Introductiontoalgo-
rithms”,publishedbyMcGraw-Hill.
JerryR.Hobbs,etal,(1997),FASTUS:ACas-
cadedFinite-StateTransducerforExtractingIn-
formationfromNatural-LanguageText,Finite-
StateLanguageProcessing,EmmanuelRoche
andYvesSchabes,pp.383-406,MITPress,
JulianKupiec(1993),MURAX:“Arobustlinguis-
tic approach for question answering using an
one-lineencyclopedia”, Proceedings of 16
th
 an-
nual conference on Research and Development
inInformationRetrieval(SIGIR),pp.181-190
Chin-Hui Lee et al (1996), “A Survey on Auto-
matic Speech Recognition with an Illustrative
ExampleOnContinuousSpeechRecognitionof
Mandarin”, in Computational Linguistics and
ChineseLanguageProcessing,pp.1-36
Helen Meng and Pui Yu Hui (2001), “Spoken
DocumentRetrievalfor the languages of Hong
Kong”, International Symposium on Intelligent
Multimedia,VideoandSpeechProcessing,May
2001,locatedat

www.se.cuhk.edu.hk/PEOPLE/
KenneyNg(2000),“InformationFusionForSpo-
ken Document Retrieval”, Proceedings of
ICASSP’00, Istanbul, Turkey, Jun, located at
/>Hsiao Tieh Pu (2000), “Understanding Chinese
Users’ Information Behaviors through Analysis
of Web Search Term Logs”, Journal of Com-
puters,pp.75-82
Liqin, Shen, Haixin Chai, Yong Qin and Tang
Donald (1998),“CharacterError Correction for
ChineseSpeechRecognition System”,Proceed-
ings of International Symposium on Chinese
Spoken Language Processing Symposium Pro-
ceedings,pp.136-138
Amit Singhal and Fernando Pereira (1999),
“Document Expansion for Speech Retrieval”,
Proceedings of the 22
nd
 Annual International
conferenceonResearchandDevelopmentinIn-
formationRetrieval(SIGIR),pp.34~41
Tomek Strzalkowski (1999), “Natural language
information retrieval”,Boston: Kluwer Publish-
ing.
GangWang(2002),“WebsurfingbyChinese
Speech”,Masterthesis,NationalUniversityof
Singapore.
Hsin-minWang,HelenMeng,PatrickSchone,Ber-
lin Chen and Wai-Kt Lo (2001), “Multi-Scale
Audio Indexing for translingual spoken docu-

ment retrieval”, Proceedings of IEEE Interna-
tionalConferenceonAcoustics,Speech, Signal
processing,SaltLakeCity,USA,May2001,lo-
catedat
/>YongchengWang(1992),Technologyandbasisof
Chinese Information Processing, Shanghai Jiao
TongUniversityPress
Baeza-Yates, Ricardo and Ribeiro-Neto, Berthier
(1999),“Introductiontomoderninformationre-
trieval”,PublishedbyLondon:LibraryAssocia-
tionPublishing.
Hai-nanYing,YongJiandWeiShen,(2002),“re-
portofquerylog”,internalreportinShanghai
JiaoTongUniversity
GuodongZhouandKimTengLua(1997)Detec-
tionofUnknownChineseWordsUsingaHybrid
ApproachComputerProcessingofOrientalLan-
guages,Vol11,No1,1997,63-75
GuodongZhou(1997),“LanguageModellingin
MandarinSpeechRecognition”,Ph.D.Thesis,
NationalUniversityofSingapore.

×