Chapter 5. Working with SAX
Section 5.1. Introduction
Section 5.2. Basic Tips for Using SAX
Section 5.3. DOM versus SAX
Section 5.4. Summary
5.1 Introduction
Unlike DOM, the SAX specification is not authorized by W3C.
SAX was developed through the xml-dev mailing list, the largest
community of XML-related developers. The development of SAX
was finished in May 1998. SAX 2.0, which introduced
namespace support and the feature/property mechanism, was
completed in May 2000.
As described in Chapter 2, SAX is an event-based parsing API.
Its methods and data structures are much simpler than those of
DOM. This simplicity implies that application programs based on
SAX are required to do more work than those based on DOM.
On the other hand, SAX-based programs can often achieve high
performance.
In this chapter, we describe some tips for using SAX. Then we
compare DOM and SAX, and introduce sample programs using
DOM and SAX.
5.2 Basic Tips for Using SAX
In Chapter 2, Sections 2.4 (see Figure 2.2) and 2.4.2 describe
the basic concepts of SAX and the programming model for SAX.
The concept of SAX is simple. A SAX parser reads an XML
document from the beginning, and the parser tells an
application what it finds by using the callback methods of
ContentHandler or other interfaces.
However, there are some things you should know. We discuss
them in this section.
5.2.1 ContentHandler
In this section, we discuss a major trap for beginning users of
SAX and the parser feature mechanism, an important feature
introduced in SAX2.
Trap of the characters() Events
The characters() method of ContentHandler confuses SAX
beginners. Consider the following document:
<root>
Hello,
XML&amp;Java!
</root>
A programmer might expect the parsing of this document to
throw five events:
startDocument()
startElement() for the root element
characters(): "\nHello,\nXML&Java!\n"
endElement() for the root element
endDocument()
Actually, the SAX parser of Xerces produces three
characters() events between startElement() and
endElement(). They are:
characters(): "\nHello,\nXML"
characters(): "&"
characters(): "Java!\n"
The SAX parser of Crimson produces eight characters()
events:
characters(): ""
characters(): "\n"
characters(): "Hello,"
characters(): "\n"
characters(): "XML"
characters(): "&"
characters(): "Java!"
characters(): "\n"
These behaviors are not bugs in these parsers. The SAX
specification allows splitting a text segment into several events.
So take care when you write an application that processes
character data.
Listing 5.1 is a program that checks whether the text in an
element matches a given string. The program shows a way to
solve the problem of split characters() events.
Listing 5.1 A correct way to process text,
chap05/TextMatch.java
package chap05;

import java.io.IOException;
import java.util.Stack;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

public class TextMatch extends DefaultHandler {
    StringBuffer buffer;   // Text collected since the last tag
    String pattern;        // String to look for
    Stack context;         // Names of the currently open elements

    public TextMatch(String pattern) {
        this.buffer = new StringBuffer();
        this.pattern = pattern;
        this.context = new Stack();
    }

    // Compares the collected text with the pattern, then clears the buffer.
    protected void flushText() {
        if (this.buffer.length() > 0) {
            String text = new String(this.buffer);
            if (pattern.equals(text)) {
                System.out.print("Pattern '" + this.pattern
                                 + "' has been found around ");
                for (int i = 0; i < this.context.size(); i++) {
                    System.out.print("/" + this.context.elementAt(i));
                }
                System.out.println("");
            }
        }
        this.buffer.setLength(0);
    }

    // May be called several times for one run of text, so only collect here.
    public void characters(char[] ch, int start, int len)
        throws SAXException {
        this.buffer.append(ch, start, len);
    }

    public void ignorableWhitespace(char[] ch, int start, int len)
        throws SAXException {
        this.buffer.append(ch, start, len);
    }

    public void processingInstruction(String target, String data)
        throws SAXException {
        // Nothing to do: a processing instruction does not split the text.
    }

    public void startElement(String uri, String local,
                             String qname, Attributes atts)
        throws SAXException {
        this.flushText();
        this.context.push(local);
    }

    public void endElement(String uri, String local, String qname)
        throws SAXException {
        this.flushText();
        this.context.pop();
    }

    public static void main(String[] argv) {
        if (argv.length != 2) {
            System.out.println("TextMatch <pattern> <file>");
            System.exit(1);
        }
        try {
            XMLReader xreader = XMLReaderFactory.createXMLReader(
                "org.apache.xerces.parsers.SAXParser");
            xreader.setContentHandler(new TextMatch(argv[0]));
            xreader.parse(argv[1]);
        } catch (IOException ioe) {
            ioe.printStackTrace();
        } catch (SAXException se) {
            se.printStackTrace();
        }
    }
}
This program assumes that the start tags and end tags split the
text and that the comments and processing instructions do not.
Character data is saved to a buffer in the characters()
method, and a matching process against the buffer is invoked in
tag events.
Let's run TextMatch against the XML document shown in
Listing 5.2.
Listing 5.2 A sample document for TextMatch,
chap05/match.xml
<?xml version="1.0" encoding="us-ascii"?>
<root>
<movie>A 3x3 MatriX</movie>
<book>XM<!---->L&amp;Jav<?target?>a</book>
</root>
R:\samples> java chap05.TextMatch "XML&Java" file:./ch
Pattern 'XML&Java' has been found around {}root/{}boo
TextMatch finds "XML&Java" in the book element, the
character data of which is split by a comment, an entity
reference, and a processing instruction.
Parser Features
The SAX2 specification defines two standard features:
namespace and namespace-prefix. The default feature settings
of SAX2-compliant parsers are as follows.
Namespace feature,
http://xml.org/sax/features/namespaces, is true.
Namespace-prefix feature,
http://xml.org/sax/features/namespace-prefixes, is false.
The default settings have these meanings.
The parser provides information about namespace URIs and
local names via ContentHandler.startElement(),
ContentHandler.endElement(),
Attributes.getURI(), and
Attributes.getLocalName().
ContentHandler.startPrefixMapping() and
ContentHandler.endPrefixMapping() are called when
elements declaring namespaces are visited and left,
respectively.
An Attributes instance contains no namespace
declarations.
The availability of qualified names is implementation-dependent.
If the namespace feature is turned off, the availability of
namespace URIs and local names is implementation-dependent,
start/endPrefixMapping() are not called, and an
Attributes instance contains namespace declarations.
If the namespace-prefix feature is turned on, qualified names
are available, and an Attributes instance contains
namespace declarations.
Table 5.1 shows a summary of these features.
Table 5.1. SAX Features
NAMESPACE  NAMESPACE-PREFIX  NS URI/LOCAL              QUALIFIED                 CALLS
FEATURE    FEATURE           NAME                      NAME                      *PrefixMapping()
true       false             available                 implementation-dependent  yes
true       true              available                 available                 yes
false      true              implementation-dependent  available                 no
Basically, you need not disable the namespace feature. Turn it
off only when the slight overhead of this feature is
unacceptable. Turn on the namespace-prefix feature if you need
qualified names or namespace declarations as attributes.
According to the JAXP specification, a SAX parser created by
SAXParserFactory is not namespace-aware by default. In the
JAXP implementation of Xerces,
SAXParserFactory.setNamespaceAware() affects the
setting of the namespace feature. As for Crimson in the JAXP
1.1 reference implementation,
SAXParserFactory.setNamespaceAware() seems to affect
neither the namespace feature nor the namespace-prefix
feature. We recommend that you always get an XMLReader
instance by using SAXParser.getXMLReader() and that you
set these features explicitly.
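The recommendation above can be sketched as follows. This is a minimal illustration, not code from the book's samples: the class name ExplicitFeatures and its helper methods are hypothetical, and it assumes the JAXP parser bundled with the JDK accepts both standard feature URIs.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, used only for illustration.
public class ExplicitFeatures {
    static final String NAMESPACES =
        "http://xml.org/sax/features/namespaces";
    static final String NAMESPACE_PREFIXES =
        "http://xml.org/sax/features/namespace-prefixes";

    /** Gets an XMLReader through JAXP and sets both features explicitly. */
    static XMLReader createReader(boolean namespaces, boolean prefixes)
            throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setFeature(NAMESPACES, namespaces);
        reader.setFeature(NAMESPACE_PREFIXES, prefixes);
        return reader;
    }

    /** Parses xml and returns "{uri}local" for the root element. */
    static String rootName(String xml) throws Exception {
        final String[] result = new String[1];
        XMLReader reader = createReader(true, false);
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local,
                                     String qname, Attributes atts) {
                if (result[0] == null) {
                    result[0] = "{" + uri + "}" + local;
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return result[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootName("<p:root xmlns:p='urn:example'/>"));
    }
}
```

Because the features are set on the XMLReader itself, the handler should see the namespace URI and local name regardless of how the factory was configured.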
5.2.2 Using and Writing SAX Filters
A SAX filter receives SAX events from a SAX parser, modifies
these events, and forwards them to a handler, as shown in
Figure 5.1. As far as the SAX parser is concerned, the SAX filter
can be seen as a handler. On the other hand, as far as the handler
is concerned, the SAX filter can be seen as a SAX parser.
Figure 5.1. SAX filter
The SAX2 specification provides the XMLFilter interface for
SAX filters. This interface is derived from XMLReader, the
interface for SAX parsers.
Typical uses of SAX filters are the following.
Modifying XML documents
When you write a program for modifying XML documents, you
might want to reuse XMLSerializer for serializing SAX events
to an XML document. Then you only have to write a SAX filter
that modifies SAX events, and insert the filter between a SAX
parser and XMLSerializer.
Convenience for the next handler
You can simplify handlers for complicated tasks by creating
preprocessing SAX filters. For example, suppose that you want
to write a SAX handler that supports both <book
title="foobar">...</book> and <book>
<title>foobar</title>...</book>. The SAX handler
becomes simpler if you write a filter for canonicalizing events to
one of the two formats. Another example is the characters()
trap discussed in Section 5.2.1. You can avoid the trap by
implementing a SAX filter that concatenates consecutive
characters() events.
Control of event flow
Suppose that you want to use two handlers for a single XML
document at the same time. Unfortunately, you cannot register
two or more handlers of the same type to one XMLReader
instance. So you implement a handler as a SAX filter (see
Figure 5.2), or you make a filter that accepts the registration of
two handlers and duplicates the input events (see Figure 5.3).
Figure 5.2. A handler performs as a filter.
Figure 5.3. A filter duplicates events.
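A filter that duplicates events for two handlers might look like the following sketch. The class TeeFilter, its setSecondContentHandler() method, and the ElementCounter handler are hypothetical names introduced here for illustration, and only the most common ContentHandler events are duplicated. It is based on XMLFilterImpl from org.xml.sax.helpers.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: forwards each event to the registered handler
// (through XMLFilterImpl) and to a second handler.
public class TeeFilter extends XMLFilterImpl {
    private ContentHandler second;

    public TeeFilter(XMLReader parent) { super(parent); }

    public void setSecondContentHandler(ContentHandler handler) {
        this.second = handler;
    }

    @Override public void startDocument() throws SAXException {
        super.startDocument();
        if (second != null) second.startDocument();
    }
    @Override public void endDocument() throws SAXException {
        super.endDocument();
        if (second != null) second.endDocument();
    }
    @Override public void startElement(String uri, String local,
            String qname, Attributes atts) throws SAXException {
        super.startElement(uri, local, qname, atts);
        if (second != null) second.startElement(uri, local, qname, atts);
    }
    @Override public void endElement(String uri, String local,
            String qname) throws SAXException {
        super.endElement(uri, local, qname);
        if (second != null) second.endElement(uri, local, qname);
    }
    @Override public void characters(char[] ch, int start, int len)
            throws SAXException {
        super.characters(ch, start, len);
        if (second != null) second.characters(ch, start, len);
    }

    /** Example handler that counts start tags. */
    static class ElementCounter extends DefaultHandler {
        int count;
        @Override public void startElement(String uri, String local,
                String qname, Attributes atts) {
            count++;
        }
    }

    /** Parses xml once and returns the counts seen by two handlers. */
    static int[] countWithTwoHandlers(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        TeeFilter tee = new TeeFilter(factory.newSAXParser().getXMLReader());
        ElementCounter first = new ElementCounter();
        ElementCounter other = new ElementCounter();
        tee.setContentHandler(first);
        tee.setSecondContentHandler(other);
        tee.parse(new InputSource(new StringReader(xml)));
        return new int[] { first.count, other.count };
    }

    public static void main(String[] args) throws Exception {
        int[] counts = countWithTwoHandlers("<a><b/><c/></a>");
        System.out.println(counts[0] + ", " + counts[1]);
    }
}
```

A complete version would also duplicate the remaining ContentHandler methods, such as startPrefixMapping() and ignorableWhitespace().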
Using Filters
A typical code fragment for using a SAX parser follows.
XMLReader parser = XMLReaderFactory.createXMLReader();
// or parser = new SAXParser() if you use Xerces.
parser.setContentHandler(handler);
parser.parse(...);
If you want a filter between the parser and the handler, modify
this code fragment to this:
XMLReader parser = ...
XMLFilter filter = new SomethingFilter();
filter.setParent(parser);
filter.setContentHandler(handler);
filter.parse(...);
or to this:
// If the constructor for the filter takes a parent
// (parser or filter) as a parameter.
XMLReader parser = ...
XMLReader filter = new SomethingFilter(parser);
filter.setContentHandler(handler);
filter.parse(...);
The following two code fragments use a parser and two filters.
First fragment:
XMLReader parser = ...
XMLFilter filter1 = new SomethingFilter();
filter1.setParent(parser);
XMLFilter filter2 = new OtherFilter();
filter2.setParent(filter1);
filter2.setContentHandler(handler);
filter2.parse(...);
Second fragment:
XMLReader parser = ...
XMLReader filter2 = new OtherFilter(new SomethingFilter(parser));
filter2.setContentHandler(handler);
filter2.parse(...);
These code fragments make an event chain, as shown in Figure
5.4.
Figure 5.4. A parser, two filters, and a handler
Writing Filters
The XMLFilter interface is derived from the XMLReader
interface by adding getParent() and setParent(). The
XMLFilter is merely an interface definition, and it does not
help us to implement a filter. As a base class for implementing
filters, SAX provides the XMLFilterImpl class.
As demonstrated earlier, if a filter constructor takes an
XMLReader as an argument, the application code becomes
simpler.
Listing 5.3 is an example of a SAX filter. It replaces elements
like <email></email> with
<uri>mailto:</uri>.
Listing 5.3 An example of a SAX filter,
chap05/MailFilter.java
package chap05;

import org.apache.xerces.parsers.SAXParser;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.XMLSerializer;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;
import org.xml.sax.helpers.XMLReaderFactory;

/**
 * <email></email>
 * -> <uri>mailto:</uri>
 */
public class MailFilter extends XMLFilterImpl {
    public MailFilter(XMLReader parent) {
        super(parent);
    }

    /**
     * Replace `email' with `uri',
     * and make a characters event for "mailto:".
     */
    public void startElement(String uri, String local, String qname,
                             Attributes atts)
        throws SAXException {
        ContentHandler ch = this.getContentHandler();
        if (ch == null)
            return;
        if (uri.length() == 0 && local.equals("email")) {
            ch.startElement("", "uri", "uri", atts);
            String mailto = "mailto:";
            ch.characters(mailto.toCharArray(), 0, mailto.length());
        } else
            ch.startElement(uri, local, qname, atts);
    }

    /**
     * Replace `email' with `uri'.
     */
    public void endElement(String uri, String local, String qname)
        throws SAXException {
        ContentHandler ch = this.getContentHandler();
        if (ch == null)
            return;
        if (uri.length() == 0 && local.equals("email")) {
            ch.endElement("", "uri", "uri");
        } else
            ch.endElement(uri, local, qname);
    }

    public static void main(String[] argv) throws Exception {
        OutputFormat format
            = new OutputFormat("xml", "UTF-8", false);
        format.setPreserveSpace(true);
        // Serialize the filtered events to standard output.
        ContentHandler handler = new XMLSerializer(System.out, format);
        XMLReader parser = XMLReaderFactory.createXMLReader(
            "org.apache.xerces.parsers.SAXParser");
        XMLReader filter = new MailFilter(parser);
        filter.setContentHandler(handler);
        filter.parse(argv[0]);
        System.out.println("");
    }
}
In the overriding methods of your filter, remember to forward
(modified) SAX events to the appropriate methods of the
registered handler. Note that getXxxHandler() methods may
return null. So you have to check whether the next handler is
null before calling it.
To see how this program works, type the following:
R:\samples> type chap05\addresses.xml
<?xml version="1.0" encoding="us-ascii"?>
<addresses>
<email></email>
<email></email>
<email></email>
</addresses>
R:\samples> java chap05.MailFilter file:./chap05/addres
<?xml version="1.0" encoding="UTF-8"?>
<addresses>
<uri>mailto:</uri>
<uri>mailto:</uri>
<uri>mailto:</uri>
</addresses>
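The same XMLFilterImpl technique can address the characters() trap from Section 5.2.1. The following is a minimal sketch of a filter that concatenates consecutive characters() events; the class name CoalescingFilter and the textEvents() helper are hypothetical. Unlike Listing 5.1, this sketch also flushes the buffered text before a processing instruction, an assumption made here so that the order of forwarded events is preserved.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: merges consecutive characters() events into one.
public class CoalescingFilter extends XMLFilterImpl {
    private final StringBuilder buffer = new StringBuilder();

    public CoalescingFilter(XMLReader parent) { super(parent); }

    /** Forwards the buffered text, if any, as a single event. */
    private void flush() throws SAXException {
        if (buffer.length() > 0) {
            char[] text = buffer.toString().toCharArray();
            buffer.setLength(0);
            super.characters(text, 0, text.length);
        }
    }

    @Override public void characters(char[] ch, int start, int len) {
        buffer.append(ch, start, len);   // collect only, forward later
    }
    @Override public void startElement(String uri, String local,
            String qname, Attributes atts) throws SAXException {
        flush();
        super.startElement(uri, local, qname, atts);
    }
    @Override public void endElement(String uri, String local,
            String qname) throws SAXException {
        flush();
        super.endElement(uri, local, qname);
    }
    @Override public void processingInstruction(String target,
            String data) throws SAXException {
        flush();
        super.processingInstruction(target, data);
    }

    /** Parses xml through the filter and lists the text events seen. */
    static List<String> textEvents(String xml) throws Exception {
        final List<String> events = new ArrayList<>();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader filter =
            new CoalescingFilter(factory.newSAXParser().getXMLReader());
        filter.setContentHandler(new DefaultHandler() {
            @Override public void characters(char[] ch, int s, int l) {
                events.add(new String(ch, s, l));
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\nHello,\nXML&amp;Java!\n</root>";
        System.out.println(textEvents(xml).size() + " text event(s)");
    }
}
```

With this filter in front of it, a handler receives the text of each element in one characters() call, however the underlying parser splits it.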
5.2.3 New Features of SAX2
In this section, we summarize the new features of SAX2 for
developers who have experience with SAX1.
Namespace support
SAX1 was finalized before the "Namespaces in XML" specification
became a W3C Recommendation. So SAX1 has no namespace
support. With SAX2, applications can receive namespace
information as described in Section 5.2.1.
SAX filters
SAX1 has no interface for filters, though we can write filters
without such an interface. SAX2 introduced a standard
XMLFilter interface. It makes writing and using filters easier.
More information about an XML document
With SAX1, applications can know nothing about comments,
CDATA sections, and many types of declarations in DTDs. SAX2
supports them with new interfaces.
Feature/property mechanism
SAX2 provides a generic mechanism to enable or disable the
features of SAX parsers and to set or get extra information
about SAX parsers.
Name changes to classes and interfaces
Some interfaces of SAX1 were made obsolete by SAX2. We
recommend using the SAX2 interfaces even if you don't need
the new features of SAX. Table 5.2 summarizes the name
changes.
Table 5.2. Interface Changes between SAX1 and SAX2
SAX1               SAX2              CHANGES
Parser             XMLReader         Support of new interfaces
ParserFactory      XMLReaderFactory  Support of new interfaces
DocumentHandler    ContentHandler    Support of namespace
HandlerBase        DefaultHandler    Support of new interfaces
AttributeList      Attributes        Support of namespace
AttributeListImpl  AttributesImpl    Support of new interfaces
N/A                DeclHandler       Receive declarations in DTDs
N/A                LexicalHandler    Receive lexical information such as
                                     comments and CDATA sections
N/A                XMLFilter         New filter interface
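As an illustration of the property mechanism and of LexicalHandler together, the following sketch registers a handler through the standard SAX2 property http://xml.org/sax/properties/lexical-handler and records comment and CDATA events. The class name LexicalEvents and the collect() helper are hypothetical. It uses DefaultHandler2 from org.xml.sax.ext, and it assumes the parser in use recognizes this property.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical example class, used only for illustration.
public class LexicalEvents {
    static final String LEXICAL_HANDLER =
        "http://xml.org/sax/properties/lexical-handler";

    /** Parses xml and returns the lexical events that were reported. */
    static List<String> collect(String xml) throws Exception {
        final List<String> events = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override public void comment(char[] ch, int start, int len) {
                events.add("comment:" + new String(ch, start, len));
            }
            @Override public void startCDATA() {
                events.add("startCDATA");
            }
            @Override public void endCDATA() {
                events.add("endCDATA");
            }
        };
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        // A LexicalHandler is registered as a property, not with a setter.
        reader.setProperty(LEXICAL_HANDLER, handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><!--note-->text<![CDATA[raw]]></root>";
        for (String event : collect(xml)) {
            System.out.println(event);
        }
    }
}
```

A parser that does not support the property is expected to throw SAXNotRecognizedException from setProperty(), so production code would normally catch it.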
5.3 DOM versus SAX
We discussed the basic concepts of DOM and tips for using DOM
in Chapter 4 and discussed those of SAX in the previous
section. In Section 2.4.3, we discussed points for deciding
whether to use DOM or SAX. In this section, we compare the
performance of DOM and SAX and study the conversion of DOM
from and to SAX.
5.3.1 Performance: Memory and Speed
In this section, we compare the performance of DOM and SAX
based on memory usage and on parsing speed.
Memory Usage
First, we compare the memory usage of DOM and SAX. We can
guess that SAX uses less memory than DOM.
We use the XML document shown in Listing 5.4. Its size is 348
bytes.
Listing 5.4 A sample document to test memory usage,
chap05/memtest10.xml
<?xml version="1.0" encoding="us-ascii"?>
<root>
<child>Hello,XML!1</child>
<child>Hello,XML!2</child>
<child>Hello,XML!3</child>
<child>Hello,XML!4</child>
<child>Hello,XML!5</child>
<child>Hello,XML!6</child>
<child>Hello,XML!7</child>
<child>Hello,XML!8</child>
<child>Hello,XML!9</child>
<child>Hello,XML!10</child>
</root>
Listing 5.5 parses a given XML document ten times with a SAX
parser and prints the memory usage for each iteration.
Listing 5.5 Print memory usage for SAX parsing,
chap05/MemoryUsageSAX.java
package chap05;

import org.apache.xerces.parsers.SAXParser;

public class MemoryUsageSAX {
    // Prints the amount of heap memory currently in use, in bytes.
    static void printMemory() {
        System.gc();
        Runtime rt = Runtime.getRuntime();
        System.out.print(rt.totalMemory() - rt.freeMemory());
    }

    public static void main(String[] argv) throws Exception {
        String xml = argv[0];
        printMemory();
        System.out.println("");
        final int N = 10;
        SAXParser saxp = new SAXParser();
        printMemory();
        // Parse the same document N times and report after each parse.
        for (int i = 0; i < N; i++) {
            System.out.print(",");
            saxp.parse(xml);
            printMemory();
        }
        System.out.println("");
    }
}
R:\samples> java chap05.MemoryUsageSAX file:./chap05/mem
104792,152912,208360,207712,247704,207712,247704,207712
247704,207712
A SAX parser creates events and throws them to a handler. If
the handler does nothing or there are no handlers, nothing is
stored in memory. The result just shown confirms this
observation. The amount of memory used did not increase after
the first parsing. The memory consumed in the first parsing was
for the classes and working area of the parser.
Next, let's do similar experiments for DOM. Listing 5.6 parses a
given XML document with a DOM parser ten times and prints
the memory usage for each iteration. To see how much memory
is used for the DOM trees, the program keeps each of the
created DOM trees in memory.
Listing 5.6 Print memory usage for DOM parsing,
chap05/MemoryUsageDOM.java
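package chap05;

import org.apache.xerces.parsers.DOMParser;
import org.w3c.dom.Document;

public class MemoryUsageDOM {
    // Xerces property: class name of the Document implementation
    static final String PROP_DOC =
        "http://apache.org/xml/properties/dom/document-class-name";
    // Xerces feature: deferred node expansion (Deferred DOM)
    static final String FEATURE_DEFER =
        "http://apache.org/xml/features/dom/defer-node-expansion";

    // Prints the amount of heap memory currently in use, in bytes.
    static void printMemory() {
        System.gc();
        Runtime rt = Runtime.getRuntime();
        System.out.print(rt.totalMemory() - rt.freeMemory());
    }

    public static void main(String[] argv) throws Exception {
        String className = argv[0];
        boolean defer = argv[1].equals("true");
        String xml = argv[2];
        printMemory();
        System.out.println("");
        final int N = 10;
        // Keep every DOM tree reachable so that it is not collected.
        Document[] docs = new Document[N];
        DOMParser domp = new DOMParser();
        domp.setProperty(PROP_DOC, className);
        domp.setFeature(FEATURE_DEFER, defer);
        printMemory();
        for (int i = 0; i < N; i++) {
            System.out.print(",");
            domp.parse(xml);
            docs[i] = domp.getDocument();
            printMemory();
        }
        System.out.println("");
    }
}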
Xerces has two DOM implementations. One is fully compliant
with all DOM Level 2 specifications. Its Document
implementation class is
org.apache.xerces.dom.DocumentImpl. Another
implementation supports DOM Level 2 Core only. Its Document
implementation class is
org.apache.xerces.dom.CoreDocumentImpl. In addition,
DocumentImpl has the Deferred DOM feature, which improves
parsing speed. If Deferred DOM is enabled, the Xerces parser
does not create all DOM nodes during parsing. They are created
only when an application program attempts to access them.
In this section, we call DocumentImpl with Deferred DOM
"Deferred DOM," we call DocumentImpl without deferred DOM
"Non-deferred DOM," and we call CoreDocumentImpl "Core
DOM."
Listing 5.6 can check the memory usage of these three
implementations: Deferred DOM, Non-deferred DOM, and Core
DOM.
R:\samples> java chap05.MemoryUsageDOM org.apache.xerces
DocumentImpl true file:./chap05/memtest10.xml
104768,155576,334536,446816,563016,679216,795416,896928
1129328,1245528,1347040
R:\samples> java chap05.MemoryUsageDOM org.apache.xerce
DocumentImpl false file:./chap05/memtest10.xml
104768,155576,278488,280832,324480,327120,329776,291456
340400,302080
R:\samples> java chap05.MemoryUsageDOM org.apache.xerce
CoreDocumentImpl false file:./chap05/memtest10.xml
104776,155584,278472,280792,324416,327032,329664,291320
340192,301848
The first command invokes Deferred DOM, which is the default
setting of Xerces, and uses approximately 110KB for one
document. The second invokes Non-deferred DOM and uses
about 2.62KB for one document. The third invokes Core DOM
and uses about 2.60KB for one document.
Figure 5.5 shows the memory usage of SAX, Deferred DOM,
Non-deferred DOM, and Core DOM.
Figure 5.5. Memory usage for SAX and DOM
implementations
For Non-deferred DOM or Core DOM, the amount of memory
used increases in proportion to the number of nodes in a
document. For Deferred DOM, the amount of memory used is
not proportional. It does not use 220KB for a document twice as
large. Table 5.3 shows the memory usage for documents
containing 10, 100, 200, 300, 400, or 500 child nodes.
This result indicates that Deferred DOM wastes much memory.
In fact, Deferred DOM defers creating DOM nodes in order to
improve not memory performance but parsing speed. In
general, object creation in Java costs much time, and reducing
object creation (new operators) is very effective for improving