
Addison-Wesley, XML and Java: Developing Web Applications, 2nd Edition, May 2002, ISBN 0201770040


Chapter 5. Working with SAX
Section 5.1. Introduction
Section 5.2. Basic Tips for Using SAX
Section 5.3. DOM versus SAX
Section 5.4. Summary


5.1 Introduction
Unlike DOM, SAX is not a W3C specification. SAX was developed on the xml-dev mailing list, the largest community of XML-related developers. The development of SAX 1.0 was finished in May 1998. SAX 2.0, which introduced namespace support and the feature/property mechanism, was completed in May 2000.
As described in Chapter 2, SAX is an event-based parsing API. Its methods and data structures are much simpler than those of DOM. This simplicity means that applications based on SAX have to do more work themselves than applications based on DOM. On the other hand, SAX-based programs can often achieve high performance.
In this chapter, we describe some tips for using SAX. Then we compare DOM and SAX, and introduce sample programs that use DOM and SAX.


5.2 Basic Tips for Using SAX
In Chapter 2, Sections 2.4 (see Figure 2.2) and 2.4.2 describe the basic concepts of SAX and the programming model for SAX. The concept of SAX is simple. A SAX parser reads an XML document from the beginning, and the parser tells an application what it finds by using the callback methods of ContentHandler or other interfaces.
However, there are some things you should know. We discuss them in this section.
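The callback model can be made concrete with a minimal sketch that records the structural ContentHandler callbacks for a tiny document. It is not from the book. It uses the JAXP SAXParserFactory bundled with the JDK rather than Xerces directly. EventLogger and its run() helper are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: logs structural callbacks in the order the parser makes them.
public class EventLogger extends DefaultHandler {
    final List<String> log = new ArrayList<>();

    @Override public void startDocument() { log.add("startDocument"); }
    @Override public void endDocument() { log.add("endDocument"); }

    @Override
    public void startElement(String uri, String local, String qname, Attributes atts) {
        log.add("startElement " + qname);  // qname is reported even without namespace support
    }

    @Override
    public void endElement(String uri, String local, String qname) {
        log.add("endElement " + qname);
    }

    static List<String> run(String xml) {
        EventLogger handler = new EventLogger();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler.log;
    }

    public static void main(String[] args) {
        // The parser reads the document from the beginning and calls back as it goes.
        for (String event : run("<root><child/></root>")) {
            System.out.println(event);
        }
    }
}
```

Nothing is kept in memory except what the handler itself chooses to store, here a list of strings.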

5.2.1 ContentHandler
In this section, we discuss a major trap for beginning users of SAX, and then the parser feature mechanism, an important feature introduced in SAX2.

Trap of the characters() Events
The characters() method of ContentHandler confuses SAX beginners. Consider the following document:

<root>
Hello,
XML&amp;Java!
</root>
A programmer might expect the parsing of this document to generate five events:

startDocument()
startElement() for the root element
characters(): "\nHello,\nXML&Java!\n"
endElement() for the root element
endDocument()
Actually, the SAX parser of Xerces produces three characters() events between startElement() and endElement(). They are:

characters(): "\nHello,\nXML"
characters(): "&"
characters(): "Java!\n"
The SAX parser of Crimson produces eight characters() events:

characters(): ""
characters(): "\n"
characters(): "Hello,"
characters(): "\n"
characters(): "XML"
characters(): "&"
characters(): "Java!"
characters(): "\n"
These behaviors are not bugs in these parsers. The SAX specification allows splitting a text segment into several events. So take care when you write an application that processes character data.
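To see this splitting with whatever parser is at hand, here is a small sketch that records every characters() call separately. It uses the JDK's bundled JAXP parser, which may split differently from Xerces or Crimson. CharactersChunks and collect() are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: keeps each characters() chunk as a separate string.
public class CharactersChunks extends DefaultHandler {
    final List<String> chunks = new ArrayList<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        chunks.add(new String(ch, start, length));
    }

    static List<String> collect(String xml) {
        CharactersChunks handler = new CharactersChunks();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler.chunks;
    }

    public static void main(String[] args) {
        List<String> chunks = collect("<root>\nHello,\nXML&amp;Java!\n</root>");
        // The number of chunks is up to the parser; only their concatenation is fixed.
        System.out.println(chunks.size() + " characters() events");
        for (String chunk : chunks) {
            System.out.println("[" + chunk.replace("\n", "\\n") + "]");
        }
    }
}
```

A handler should therefore never assume that one characters() call holds a complete text node.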
Listing 5.1 is a program that checks whether the text in an element matches a given string. The program shows a way to solve the problem of split characters() events.
Listing 5.1 A correct way to process text, chap05/TextMatch.java

package chap05;
import java.io.IOException;
import java.util.Stack;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

public class TextMatch extends DefaultHandler {
    StringBuffer buffer;   // character data collected since the last tag
    String pattern;        // the text to look for
    Stack context;         // local names of the currently open elements

    public TextMatch(String pattern) {
        this.buffer = new StringBuffer();
        this.pattern = pattern;
        this.context = new Stack();
    }

    // Compares the buffered text with the pattern, then empties the buffer.
    protected void flushText() {
        if (this.buffer.length() > 0) {
            String text = new String(this.buffer);
            if (pattern.equals(text)) {
                System.out.print("Pattern '" + this.pattern
                                 + "' has been found around ");
                for (int i = 0; i < this.context.size(); i++) {
                    System.out.print("/" + this.context.elementAt(i));
                }
                System.out.println("");
            }
        }
        this.buffer.setLength(0);
    }

    // A text segment may arrive in several calls, so only append here.
    public void characters(char[] ch, int start, int len)
        throws SAXException {
        this.buffer.append(ch, start, len);
    }

    public void ignorableWhitespace(char[] ch, int start, int len)
        throws SAXException {
        this.buffer.append(ch, start, len);
    }

    public void processingInstruction(String target, String data)
        throws SAXException {
        // Nothing to do because a PI does not affect the text content
        // of a document.
    }

    // Tags end a text segment: match the buffered text first.
    public void startElement(String uri, String local,
                             String qname, Attributes atts)
        throws SAXException {
        this.flushText();
        this.context.push(local);
    }

    public void endElement(String uri, String local, String qname)
        throws SAXException {
        this.flushText();
        this.context.pop();
    }

    public static void main(String[] argv) {
        if (argv.length != 2) {
            System.out.println("Usage: java chap05.TextMatch <pattern> <uri>");
            System.exit(1);
        }
        try {
            XMLReader xreader = XMLReaderFactory.createXMLReader(
                "org.apache.xerces.parsers.SAXParser");
            xreader.setContentHandler(new TextMatch(argv[0]));
            xreader.parse(argv[1]);
        } catch (IOException ioe) {
            ioe.printStackTrace();
        } catch (SAXException se) {
            se.printStackTrace();
        }
    }
}
This program assumes that start tags and end tags split the text, and that comments and processing instructions do not. Character data is saved to a buffer in the characters() method, and the buffer is matched against the pattern in the start-tag and end-tag events.
Let's run TextMatch against the XML document shown in Listing 5.2.
Listing 5.2 A sample document for TextMatch, chap05/match.xml

<?xml version="1.0" encoding="us-ascii"?>
<root>
<movie>A 3x3 Matrix</movie>
<book>XM<!---->L&amp;Jav<?target?>a</book>
</root>
R:\samples>java chap05.TextMatch "XML&Java" file:./ch
Pattern 'XML&Java' has been found around {}root/{}boo
TextMatch finds "XML&Java" in the book element, whose character data is split by a comment, an entity reference, and a processing instruction.

Parser Features
The SAX2 specification defines two standard features: namespace and namespace-prefix. The default feature settings of SAX2-compliant parsers are as follows.
The namespace feature is true.
The namespace-prefix feature is false.

The default settings have these meanings.
The parser provides information about namespace URIs and local names via ContentHandler.startElement(), ContentHandler.endElement(), Attributes.getURI(), and Attributes.getLocalName().
ContentHandler.startPrefixMapping() and ContentHandler.endPrefixMapping() are called when elements declaring namespaces are entered and left, respectively.
An Attributes instance contains no namespace declarations.
The availability of qualified names is implementation-dependent.
If the namespace feature is turned off, the availability of namespace URIs and local names is implementation-dependent, start/endPrefixMapping() are not called, and an Attributes instance contains namespace declarations.
If the namespace-prefix feature is turned on, qualified names are available, and an Attributes instance contains namespace declarations.
Table 5.1 shows a summary of these features.
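Table 5.1. SAX Features

NAMESPACE  NAMESPACE-PREFIX  NS URI/LOCAL     QUALIFIED        CALLS
FEATURE    FEATURE           NAME             NAME             *PrefixMapping()
true       false             available        impl.-dependent  yes
true       true              available        available        yes
false      false             impl.-dependent  impl.-dependent  no
false      true              impl.-dependent  available        no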

Basically, you need not disable the namespace feature. Turn it off only when the slight overhead of this feature is unacceptable. Turn on the namespace-prefix feature if you need qualified names or namespace declarations as attributes.
According to the JAXP specification, a SAX parser created by SAXParserFactory is not namespace-aware by default. In the JAXP implementation of Xerces, SAXParserFactory.setNamespaceAware() affects the setting of the namespace feature. As for Crimson in the JAXP 1.1 reference implementation, SAXParserFactory.setNamespaceAware() seems to affect neither the namespace feature nor the namespace-prefix feature. We recommend that you always get an XMLReader instance by using SAXParser.getXMLReader() and that you set these features explicitly.
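The recommendation above can be sketched as follows: obtain the XMLReader through JAXP, set both features explicitly, and observe what changes. The two identifiers are the standard SAX2 feature URIs. The parser is the JDK's bundled JAXP implementation. FeatureDemo and describeRoot() are hypothetical names. Because qualified names are implementation-dependent when namespace-prefix is false, the sketch does not rely on them in that case.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class FeatureDemo {
    // Standard SAX2 feature identifiers.
    static final String NAMESPACES =
        "http://xml.org/sax/features/namespaces";
    static final String NAMESPACE_PREFIXES =
        "http://xml.org/sax/features/namespace-prefixes";

    // Returns "uri|local|qname|attributeCount|prefixMappingCount" for the root element.
    static String describeRoot(String xml, boolean prefixes) {
        final StringBuilder root = new StringBuilder();
        final int[] mappings = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                mappings[0]++;
            }

            @Override
            public void startElement(String uri, String local, String qname,
                                     Attributes atts) {
                if (root.length() == 0) {  // record the first (root) element only
                    root.append(uri).append('|').append(local).append('|')
                        .append(qname).append('|').append(atts.getLength());
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            // Do not rely on factory defaults: set both features explicitly.
            reader.setFeature(NAMESPACES, true);
            reader.setFeature(NAMESPACE_PREFIXES, prefixes);
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return root.append('|').append(mappings[0]).toString();
    }

    public static void main(String[] args) {
        String xml = "<a:root xmlns:a=\"urn:example\"/>";
        System.out.println("namespace-prefixes=false: " + describeRoot(xml, false));
        System.out.println("namespace-prefixes=true:  " + describeRoot(xml, true));
    }
}
```

With namespace-prefixes on, the xmlns:a declaration shows up as an attribute. In both runs, startPrefixMapping() is still called once, because the namespace feature stays on.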

5.2.2 Using and Writing SAX Filters
A SAX filter receives SAX events from a SAX parser, modifies these events, and forwards them to a handler, as shown in Figure 5.1. As far as the SAX parser is concerned, the SAX filter can be seen as a handler. On the other hand, as far as the handler is concerned, the SAX filter can be seen as a SAX parser.
Figure 5.1. SAX filter

The SAX2 specification provides the XMLFilter interface for SAX filters. This interface is derived from XMLReader, the interface for SAX parsers.
Typical uses of SAX filters are the following.

Modifying XML documents
When you write a program for modifying XML documents, you might want to reuse XMLSerializer for serializing SAX events to an XML document. Then you only have to write a SAX filter that modifies SAX events and insert the filter between a SAX parser and XMLSerializer.

Convenience for the next handler
You can simplify handlers for complicated tasks by creating preprocessing SAX filters. For example, suppose that you want to write a SAX handler that supports both <book title="foobar">...</book> and <book><title>foobar</title>...</book>. The SAX handler becomes simpler if you write a filter that canonicalizes events to one of the two formats. Another example is the characters() trap discussed in Section 5.2.1. You can avoid the trap by implementing a SAX filter that concatenates consecutive characters() events.


Control of event flow
Suppose that you want to use two handlers for a single XML document at the same time. Unfortunately, you cannot register two or more handlers of the same type with one XMLReader instance. So either you implement one handler as a SAX filter (see Figure 5.2), or you make a filter that accepts the registration of two handlers and duplicates the input events (see Figure 5.3).
Figure 5.2. A handler acting as a filter

Figure 5.3. A filter duplicating events
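The duplication idea of Figure 5.3 can be sketched with one simplification: the object that duplicates events here is a plain ContentHandler forwarding every callback to two handlers, rather than a full XMLFilter. TeeHandler, ElementCounter, and demo() are hypothetical names. The parser is the JDK's bundled JAXP parser. Both handlers receive the same Attributes and char[] objects, so neither should modify them.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical "tee": a single ContentHandler that duplicates every event to two others.
public class TeeHandler implements ContentHandler {
    private final ContentHandler first;
    private final ContentHandler second;

    public TeeHandler(ContentHandler first, ContentHandler second) {
        this.first = first;
        this.second = second;
    }

    public void setDocumentLocator(Locator l) { first.setDocumentLocator(l); second.setDocumentLocator(l); }
    public void startDocument() throws SAXException { first.startDocument(); second.startDocument(); }
    public void endDocument() throws SAXException { first.endDocument(); second.endDocument(); }
    public void startPrefixMapping(String p, String u) throws SAXException {
        first.startPrefixMapping(p, u); second.startPrefixMapping(p, u);
    }
    public void endPrefixMapping(String p) throws SAXException {
        first.endPrefixMapping(p); second.endPrefixMapping(p);
    }
    public void startElement(String uri, String local, String qname, Attributes atts) throws SAXException {
        first.startElement(uri, local, qname, atts); second.startElement(uri, local, qname, atts);
    }
    public void endElement(String uri, String local, String qname) throws SAXException {
        first.endElement(uri, local, qname); second.endElement(uri, local, qname);
    }
    public void characters(char[] ch, int start, int len) throws SAXException {
        first.characters(ch, start, len); second.characters(ch, start, len);
    }
    public void ignorableWhitespace(char[] ch, int start, int len) throws SAXException {
        first.ignorableWhitespace(ch, start, len); second.ignorableWhitespace(ch, start, len);
    }
    public void processingInstruction(String target, String data) throws SAXException {
        first.processingInstruction(target, data); second.processingInstruction(target, data);
    }
    public void skippedEntity(String name) throws SAXException {
        first.skippedEntity(name); second.skippedEntity(name);
    }

    // Counts start tags; used only to show that both handlers see every event.
    static class ElementCounter extends DefaultHandler {
        int count;
        @Override
        public void startElement(String uri, String local, String qname, Attributes atts) {
            count++;
        }
    }

    static int[] demo(String xml) {
        ElementCounter a = new ElementCounter();
        ElementCounter b = new ElementCounter();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(new TeeHandler(a, b));  // the reader has a single slot
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return new int[] {a.count, b.count};
    }

    public static void main(String[] args) {
        int[] counts = demo("<r><x/><y/></r>");
        System.out.println(counts[0] + " " + counts[1]);  // prints "3 3"
    }
}
```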


Using Filters
A typical code fragment for using a SAX parser follows.

XMLReader parser = XMLReaderFactory.createXMLReader();
// or parser = new SAXParser() if you use Xerces.
parser.setContentHandler(handler);
parser.parse(...);
If you want a filter between the parser and the handler, modify this code fragment to this:

XMLReader parser = ...
XMLFilter filter = new SomethingFilter();
filter.setParent(parser);
filter.setContentHandler(handler);
filter.parse(...);
or to this:

// If the constructor of the filter takes a parent
// (parser or filter) as a parameter.
XMLReader parser = ...
XMLReader filter = new SomethingFilter(parser);
filter.setContentHandler(handler);
filter.parse(...);
The following two code fragments use a parser and two filters.
First fragment:

XMLReader parser = ...
XMLFilter filter1 = new SomethingFilter();
filter1.setParent(parser);
XMLFilter filter2 = new OtherFilter();
filter2.setParent(filter1);
filter2.setContentHandler(handler);
filter2.parse(...);
Second fragment:

XMLReader parser = ...
XMLReader filter2 = new OtherFilter(new SomethingFilter(parser));
filter2.setContentHandler(handler);
filter2.parse(...);
These code fragments make an event chain, as shown in Figure 5.4.
Figure 5.4. A parser, two filters, and a handler
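The second fragment's chain can be made concrete and runnable with XMLFilterImpl. In this sketch, each hypothetical SuffixFilter appends a tag to element names before forwarding them. The names the handler finally sees show the order in which the filters ran. FilterChainDemo, SuffixFilter, and run() are illustrative names. The parser is the JDK's JAXP parser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class FilterChainDemo {
    // Hypothetical filter: appends a suffix to every element name it forwards.
    static class SuffixFilter extends XMLFilterImpl {
        private final String suffix;

        SuffixFilter(XMLReader parent, String suffix) {
            super(parent);
            this.suffix = suffix;
        }

        @Override
        public void startElement(String uri, String local, String qname, Attributes atts)
                throws SAXException {
            super.startElement(uri, local + suffix, qname + suffix, atts);
        }

        @Override
        public void endElement(String uri, String local, String qname) throws SAXException {
            super.endElement(uri, local + suffix, qname + suffix);
        }
    }

    static List<String> run(String xml) {
        final List<String> seen = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);  // so that local names are reported
            XMLReader parser = factory.newSAXParser().getXMLReader();
            // parser -> filter1 ("-1") -> filter2 ("-2") -> handler
            XMLReader filter2 = new SuffixFilter(new SuffixFilter(parser, "-1"), "-2");
            filter2.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qname, Attributes atts) {
                    seen.add(local);
                }
            });
            filter2.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run("<root><item/></root>"));  // [root-1-2, item-1-2]
    }
}
```

Calling parse() on the last filter is enough. XMLFilterImpl registers each filter as its parent's handler before passing the parse request up the chain.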

Writing Filters
The XMLFilter interface is derived from the XMLReader interface by adding getParent() and setParent(). XMLFilter is merely an interface definition, and it does not help us implement a filter. As a base class for implementing filters, SAX provides the XMLFilterImpl class.
As demonstrated earlier, if a filter constructor takes an XMLReader as an argument, the application code becomes simpler.
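Following that advice, here is a sketch of the concatenating filter suggested under "Convenience for the next handler". It extends XMLFilterImpl and takes the parent XMLReader in its constructor. TextMergeFilter and chunksAfterFilter() are hypothetical names, and the parser is the JDK's JAXP parser. The filter flushes buffered text before element, processing-instruction, ignorable-whitespace, and end-of-document events. Comments are LexicalHandler events, which do not pass through XMLFilterImpl, so text on both sides of a comment ends up merged.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: merges consecutive characters() events into one event.
public class TextMergeFilter extends XMLFilterImpl {
    private final StringBuilder pending = new StringBuilder();

    public TextMergeFilter(XMLReader parent) {
        super(parent);
    }

    // Sends the buffered text downstream as a single characters() call.
    private void flush() throws SAXException {
        if (pending.length() > 0) {
            char[] text = pending.toString().toCharArray();
            pending.setLength(0);
            super.characters(text, 0, text.length);
        }
    }

    @Override
    public void characters(char[] ch, int start, int len) {
        pending.append(ch, start, len);  // hold back until a non-text event arrives
    }

    @Override
    public void startElement(String uri, String local, String qname, Attributes atts)
            throws SAXException {
        flush();
        super.startElement(uri, local, qname, atts);
    }

    @Override
    public void endElement(String uri, String local, String qname) throws SAXException {
        flush();
        super.endElement(uri, local, qname);
    }

    @Override
    public void processingInstruction(String target, String data) throws SAXException {
        flush();
        super.processingInstruction(target, data);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int len) throws SAXException {
        flush();
        super.ignorableWhitespace(ch, start, len);
    }

    @Override
    public void endDocument() throws SAXException {
        flush();
        super.endDocument();
    }

    // Returns the characters() chunks seen by a handler placed after the filter.
    static List<String> chunksAfterFilter(String xml) {
        final List<String> chunks = new ArrayList<>();
        try {
            XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            XMLReader filter = new TextMergeFilter(parser);
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int len) {
                    chunks.add(new String(ch, start, len));
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        // One merged event, however the parser itself split the text.
        System.out.println(chunksAfterFilter("<root>\nHello,\nXML&amp;Java!\n</root>").size());
    }
}
```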
Listing 5.3 is an example of a SAX filter. It replaces elements like <email></email> with <uri>mailto:</uri>.
Listing 5.3 An example of a SAX filter, chap05/MailFilter.java

package chap05;
import org.apache.xerces.parsers.SAXParser;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;
import org.xml.sax.helpers.XMLReaderFactory;

/**
 * <email></email>
 * -> <uri>mailto:</uri>
 */
public class MailFilter extends XMLFilterImpl {
    public MailFilter(XMLReader parent) {
        super(parent);
    }

    /**
     * Replace `email' with `uri',
     * and make a characters event for "mailto:".
     */
    public void startElement(String uri, String local, String qname,
                             Attributes atts)
        throws SAXException {
        ContentHandler ch = this.getContentHandler();
        if (ch == null)          // no handler registered: nothing to forward to
            return;
        if (uri.length() == 0 && local.equals("email")) {
            ch.startElement("", "uri", "uri", atts);
            String mailto = "mailto:";
            ch.characters(mailto.toCharArray(), 0, mailto.length());
        } else
            ch.startElement(uri, local, qname, atts);
    }

    /**
     * Replace `email' with `uri'.
     */
    public void endElement(String uri, String local, String qname)
        throws SAXException {
        ContentHandler ch = this.getContentHandler();
        if (ch == null)
            return;
        if (uri.length() == 0 && local.equals("email")) {
            ch.endElement("", "uri", "uri");
        } else
            ch.endElement(uri, local, qname);
    }

    public static void main(String[] argv) throws Exception {
        OutputFormat format
            = new OutputFormat("xml", "UTF-8", false);
        format.setPreserveSpace(true);
        // XMLSerializer writes the events it receives as an XML document.
        ContentHandler handler = new XMLSerializer(System.out, format).asContentHandler();
        XMLReader parser = XMLReaderFactory.createXMLReader(
            "org.apache.xerces.parsers.SAXParser");
        XMLReader filter = new MailFilter(parser);
        filter.setContentHandler(handler);
        filter.parse(argv[0]);
        System.out.println("");
    }
}
In the overriding methods of your filter, remember to forward the (modified) SAX events to the appropriate methods of the registered handler. Note that the getXxxHandler() methods may return null, so you have to check whether the next handler is null before calling it.
To see how this program works, type the following:

R:\samples>type chap05\addresses.xml
<?xml version="1.0" encoding="us-ascii"?>
<addresses>
<email></email>
<email></email>
<email></email>
</addresses>

R:\samples>java chap05.MailFilter file:./chap05/addres
<?xml version="1.0" encoding="UTF-8"?>
<addresses>
<uri>mailto:</uri>
<uri>mailto:</uri>
<uri>mailto:</uri>
</addresses>
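Listing 5.3 depends on Xerces's XMLSerializer. Where only the JDK is available, the same filter logic can be serialized through a JAXP identity Transformer that reads from a SAXSource. This swaps XMLSerializer for javax.xml.transform and is not the book's code. MailFilterJaxp, rewrite(), and the address user@example.com are made up for the sketch. It forwards events through XMLFilterImpl's own methods (super.startElement() and so on), which already skip a null handler.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical variant of MailFilter that needs only the JDK.
public class MailFilterJaxp extends XMLFilterImpl {
    public MailFilterJaxp(XMLReader parent) {
        super(parent);
    }

    private static boolean isEmail(String uri, String local) {
        return uri.length() == 0 && local.equals("email");
    }

    @Override
    public void startElement(String uri, String local, String qname, Attributes atts)
            throws SAXException {
        if (isEmail(uri, local)) {
            super.startElement("", "uri", "uri", atts);
            char[] mailto = "mailto:".toCharArray();
            super.characters(mailto, 0, mailto.length);  // extra text event
        } else {
            super.startElement(uri, local, qname, atts);
        }
    }

    @Override
    public void endElement(String uri, String local, String qname) throws SAXException {
        if (isEmail(uri, local)) {
            super.endElement("", "uri", "uri");
        } else {
            super.endElement(uri, local, qname);
        }
    }

    static String rewrite(String xml) {
        StringWriter out = new StringWriter();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);  // needed so that uri/local are reported
            XMLReader filter = new MailFilterJaxp(factory.newSAXParser().getXMLReader());
            // The identity Transformer registers itself as the filter's ContentHandler
            // and serializes whatever events come out of the filter.
            TransformerFactory.newInstance().newTransformer().transform(
                new SAXSource(filter, new InputSource(new StringReader(xml))),
                new StreamResult(out));
        } catch (ParserConfigurationException | SAXException | TransformerException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("<addresses><email>user@example.com</email></addresses>"));
    }
}
```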
5.2.3 New Features of SAX2
In this section, we summarize the new features of SAX2 for developers who have experience with SAX1.

Namespace support
SAX1 was finalized before the "Namespaces in XML" specification became a W3C Recommendation, so SAX1 has no namespace support. With SAX2, applications can receive namespace information as described in Section 5.2.1.

SAX filters
SAX1 has no interface for filters, though we can write filters without such an interface. SAX2 introduced a standard XMLFilter interface, which makes writing and using filters easier.

More information about an XML document
With SAX1, applications can know nothing about comments, CDATA sections, and many types of declarations in DTDs. SAX2 supports them with new interfaces.

Feature/property mechanism
SAX2 provides a generic mechanism to enable or disable the features of SAX parsers and to set or get extra information about SAX parsers.

Name changes to classes and interfaces
Some interfaces of SAX1 were made obsolete by SAX2. We recommend using the SAX2 interfaces even if you don't need the new features of SAX2. Table 5.2 summarizes the name changes.
Table 5.2. Interface Changes between SAX1 and SAX2

SAX1               SAX2              CHANGES
Parser             XMLReader         Support of new interfaces
ParserFactory      XMLReaderFactory  Support of new interfaces
DocumentHandler    ContentHandler    Support of namespace
HandlerBase        DefaultHandler    Support of new interfaces
AttributeList      Attributes        Support of namespace
AttributeListImpl  AttributesImpl    Support of new interfaces
N/A                DeclHandler       Receive declarations in DTDs
N/A                LexicalHandler    Receive lexical information such as comments and CDATA sections
N/A                XMLFilter         New filter interface
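LexicalHandler, one of the new interfaces in Table 5.2, has no setter on XMLReader. It is registered through the property mechanism instead. The sketch below uses the standard SAX2 property identifier http://xml.org/sax/properties/lexical-handler with the JDK's bundled parser. LexicalDemo and run() are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.LexicalHandler;

// Hypothetical demo: records comments and CDATA boundaries reported by SAX2.
public class LexicalDemo implements LexicalHandler {
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";
    final List<String> events = new ArrayList<>();

    public void comment(char[] ch, int start, int len) {
        events.add("comment:" + new String(ch, start, len));
    }
    public void startCDATA() { events.add("startCDATA"); }
    public void endCDATA() { events.add("endCDATA"); }
    // DTD and entity boundaries are not needed for this demo.
    public void startDTD(String name, String publicId, String systemId) { }
    public void endDTD() { }
    public void startEntity(String name) { }
    public void endEntity(String name) { }

    static List<String> run(String xml) {
        LexicalDemo lexical = new LexicalDemo();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // Registered as a property, not via a setXxxHandler() method.
            reader.setProperty(LEXICAL_HANDLER, lexical);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return lexical.events;
    }

    public static void main(String[] args) {
        System.out.println(run("<r><!-- note --><![CDATA[a<b]]></r>"));
    }
}
```

The text inside the CDATA section still arrives through ContentHandler.characters(). LexicalHandler only marks where the section starts and ends.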


5.3 DOM versus SAX
We discussed the basic concepts of DOM and tips for using DOM in Chapter 4, and those of SAX in the previous section. In Section 2.4.3, we discussed points for deciding whether to use DOM or SAX. In this section, we compare the performance of DOM and SAX and study conversion from DOM to SAX and from SAX to DOM.

5.3.1 Performance: Memory and Speed
In this section, we compare the performance of DOM and SAX based on memory usage and on parsing speed.

Memory Usage
First, we compare the memory usage of DOM and SAX. We can guess that SAX uses less memory than DOM.
We use the XML document shown in Listing 5.4. Its size is 348 bytes.
Listing 5.4 A sample document to test memory usage, chap05/memtest10.xml

<?xmlversion="1.0"encoding="us-ascii"?>
<root>
<child>Hello,XML!1</child>

<child>Hello,XML!2</child>
<child>Hello,XML!3</child>
<child>Hello,XML!4</child>
<child>Hello,XML!5</child>


<child>Hello,XML!6</child>
<child>Hello,XML!7</child>
<child>Hello,XML!8</child>
<child>Hello,XML!9</child>
<child>Hello,XML!10</child>
</root>
Listing 5.5 parses a given XML document ten times with a SAX parser and prints the memory usage after each iteration.
Listing 5.5 Print memory usage for SAX parsing, chap05/MemoryUsageSAX.java

package chap05;
import org.apache.xerces.parsers.SAXParser;

public class MemoryUsageSAX {
    // Runs the garbage collector, then prints the heap currently in use.
    static void printMemory() {
        System.gc();
        Runtime rt = Runtime.getRuntime();
        System.out.print(rt.totalMemory() - rt.freeMemory());
    }

    public static void main(String[] argv) throws Exception {
        String xml = argv[0];
        printMemory();
        System.out.println("");
        final int N = 10;
        SAXParser saxp = new SAXParser();
        printMemory();
        for (int i = 0; i < N; i++) {
            System.out.print(",");
            saxp.parse(xml);    // no handler is registered, so nothing is kept
            printMemory();
        }
        System.out.println("");
    }
}
R:\samples>java chap05.MemoryUsageSAX file:./chap05/mem
104792,152912,208360,207712,247704,207712,247704,207712
247704,207712
A SAX parser creates events and passes them to a handler. If the handler does nothing or there are no handlers, nothing is stored in memory. The result just shown confirms this observation: the amount of memory used did not keep increasing after the first parsing. The memory consumed in the first parsing was for the classes and working area of the parser.
Next, let's do similar experiments for DOM. Listing 5.6 parses a given XML document with a DOM parser ten times and prints the memory usage after each iteration. To see how much memory is used for the DOM trees, the program keeps each of the created DOM trees in memory.
Listing 5.6 Print memory usage for DOM parsing, chap05/MemoryUsageDOM.java

package chap05;
import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.Document;

public class MemoryUsageDOM {
    static final String PROP_DOC =
    " />

    static final String FEATURE_DEFER =
    " />

    // Runs the garbage collector, then prints the heap currently in use.
    static void printMemory() {
        System.gc();
        Runtime rt = Runtime.getRuntime();
        System.out.print(rt.totalMemory() - rt.freeMemory());
    }

    public static void main(String[] argv) throws Exception {
        String className = argv[0];               // Document implementation class
        boolean defer = argv[1].equals("true");   // enable Deferred DOM?
        String xml = argv[2];
        printMemory();
        System.out.println("");
        final int N = 10;
        Document[] docs = new Document[N];        // keeps every tree reachable
        DOMParser domp = new DOMParser();
        domp.setProperty(PROP_DOC, className);
        domp.setFeature(FEATURE_DEFER, defer);
        printMemory();
        for (int i = 0; i < N; i++) {
            System.out.print(",");
            domp.parse(xml);
            docs[i] = domp.getDocument();
            printMemory();
        }
        System.out.println("");
    }
}
Xerces has two DOM implementations. One is fully compliant with all DOM Level 2 specifications; its Document implementation class is org.apache.xerces.dom.DocumentImpl. The other supports DOM Level 2 Core only; its Document implementation class is org.apache.xerces.dom.CoreDocumentImpl. In addition, DocumentImpl has the Deferred DOM feature, which improves parsing speed. If Deferred DOM is enabled, the Xerces parser does not create all DOM nodes during parsing. They are created only when an application program attempts to access them.
In this section, we call DocumentImpl with Deferred DOM "Deferred DOM," DocumentImpl without Deferred DOM "Non-deferred DOM," and CoreDocumentImpl "Core DOM."
Listing 5.6 can check the memory usage of these three implementations: Deferred DOM, Non-deferred DOM, and Core DOM.


R:\samples>java chap05.MemoryUsageDOM org.apache.xerces
DocumentImpl true file:./chap05/memtest10.xml
104768,155576,334536,446816,563016,679216,795416,896928
1129328,1245528,1347040

R:\samples>java chap05.MemoryUsageDOM org.apache.xerce
DocumentImpl false file:./chap05/memtest10.xml
104768,155576,278488,280832,324480,327120,329776,291456
340400,302080

R:\samples>java chap05.MemoryUsageDOM org.apache.xerce
CoreDocumentImpl false file:./chap05/memtest10.xml
104776,155584,278472,280792,324416,327032,329664,291320
340192,301848


The first command invokes Deferred DOM, which is the default setting of Xerces, and uses approximately 110 KB for one document. The second invokes Non-deferred DOM and uses about 2.62 KB for one document. The third invokes Core DOM and uses about 2.60 KB for one document.
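The per-document figures can be reproduced from the printed numbers under one assumption: each figure is the heap growth between the measurement after the first parse (the third number of each run) and the measurement after the tenth parse (the last number), divided by the nine documents in between. The first parse is excluded because it also pays for parser classes and working areas. This sketch only does that arithmetic. PerDocumentMemory and perDocument() are hypothetical names.

```java
public class PerDocumentMemory {
    // Average heap growth per document between the first and the last parse.
    static double perDocument(long afterFirstParse, long afterLastParse, int parses) {
        return (double) (afterLastParse - afterFirstParse) / (parses - 1);
    }

    public static void main(String[] args) {
        final int N = 10;  // documents parsed by Listing 5.6
        // Values copied from the three runs: after parse 1, after parse 10.
        System.out.printf("Deferred DOM:     %.0f bytes/document%n",
                          perDocument(334536, 1347040, N));
        System.out.printf("Non-deferred DOM: %.0f bytes/document%n",
                          perDocument(278488, 302080, N));
        System.out.printf("Core DOM:         %.0f bytes/document%n",
                          perDocument(278472, 301848, N));
    }
}
```

The results are about 112500, 2621, and 2597 bytes. These match "approximately 110 KB," "about 2.62 KB," and "about 2.60 KB." Because of garbage-collection noise, the intermediate values fluctuate and are not used.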
Figure 5.5 shows the memory usage of SAX, Deferred DOM, Non-deferred DOM, and Core DOM.
Figure 5.5. Memory usage for SAX and DOM implementations

For Non-deferred DOM or Core DOM, the amount of memory used increases in proportion to the number of nodes in a document. For Deferred DOM, the amount of memory used is not proportional: it does not use 220 KB for a document twice as large. Table 5.3 shows the memory usage for documents containing 10, 100, 200, 300, 400, or 500 child nodes.
This result indicates that Deferred DOM wastes a lot of memory. In fact, Deferred DOM defers creating DOM nodes in order to improve not memory performance but parsing speed. In general, object creation in Java costs much time, and reducing object creation (new operators) is very effective for improving
