Learn XML in a Weekend
ERIK WESTERMANN

Premier Press, a division of Course Technology
2645 Erie Avenue, Suite 41,
Cincinnati, Ohio 45208
Copyright © 2002 by Premier Press, a division of Course Technology.
All rights reserved. No part of this book may be reproduced or transmitted in any form
or by any means, electronic or mechanical, including photocopying, recording, or by
any information storage or retrieval system without written permission from Premier
Press, except for the inclusion of brief quotations in a review.
The Premier Press logo and related trade dress are trademarks of Premier Press,
Inc. and may not be used without written permission.
Publisher: Stacy L. Hiquet
Marketing Manager: Heather Hurley
Managing Editor: Sandy Doell
Acquisitions Editor: Todd Jensen
Project Editor/Copy Editor: Sean Medlock
Editorial Assistants: Margaret Bauer, Elizabeth Barrett
Technical Reviewer: Michelle Jones
Interior Layout: Marian Hartsough
Cover Designer: Mike Tanamachi
Indexer: Katherine Stimson
Proofreader: Lorraine Gunter
Extensible Markup Language (XML) 1.0 (Second Edition), © 2000 W3C (MIT, INRIA,
Keio), All Rights Reserved. W3C liability, trademark, document use, and software
licensing rules apply.
The Unicode Consortium, UNICODE STANDARD VERSION 3.0, Fig. 2-3 pg. 14, ©
2000, 1992 by Unicode, Inc. Reprinted by permission of Pearson Education, Inc.
DocBook, © 1992–2000 HaL Computer Systems, Inc., O'Reilly & Associates, Inc.,
ArborText, Inc., Fujitsu Software Corporation, Norman Walsh, and the Organization
for the Advancement of Structured Information Standards (OASIS). All other
trademarks are the property of their respective owners.
Important: Premier Press cannot provide software support. Please contact the
appropriate software manufacturer's technical support line or Web site for assistance.
Premier Press and the author have attempted throughout this book to distinguish
proprietary trademarks from descriptive terms by following the capitalization style
used by the manufacturer.
Information contained in this book has been obtained by Premier Press from sources
believed to be reliable. However, because of the possibility of human or mechanical
error by our sources, Premier Press, or others, the Publisher does not guarantee the
accuracy, adequacy, or completeness of any information and is not responsible for
any errors or omissions or the results obtained from use of such information. Readers
should be particularly aware of the fact that the Internet is an ever-changing entity.
Some facts may have changed since this book went to press.
ISBN: 1-59200-010-X
Library of Congress Catalog Card Number: 2002106524
Printed in the United States of America
02 03 04 05 BH 10 9 8 7 6 5 4 3 2 1
For the two greatest boys in the world, my sons, Vikranth and Siddharth.
ABOUT THE AUTHOR
ERIK WESTERMANN is an independent, accomplished developer with more than 10
years of experience in professional programming and design. Erik also enjoys writing
and has written for a number of publications on the Internet and in print. Erik's
professional affiliations include the IEEE Computer Society, the Association for
Computing Machinery, and the Worldwide Institute of Software Architects, where he
is a practicing member. Erik has
spoken at conferences including VSLive 2001 in Sydney, Australia. Erik's Web site is
.

ACKNOWLEDGMENTS
First and foremost, I'd like to thank Brad Jones for helping me get this project off the
ground; Todd Jensen, acquisitions editor, for putting up with my "short" e-mails; Amy
Pettinella, my project editor, for overseeing the project from (almost) the beginning;
and Michelle Jones, technical editor, for her comments and suggestions.
I would also like to thank Altova, the producers of XML Spy, for the copy of XML Spy,
and Jon Bachman at eXcelon for helping to get a copy of eXcelon Stylus Studio for
the readers of this book.
I'd like to thank Tom Archer for his support throughout the project, and for helping me
get my writing career started in the first place. I could not have done it without you.
Thanks, Tom!
I'd also like to thank my sons, Vikranth and Siddharth, for understanding when I was
busy, and for the time they gave up spending with me so that I could produce this
book for you. I'd also like to thank my wife, Shanthi, for her ceaseless support in all of
my endeavors.
Foreword
The first time I met Erik was while running the popular CodeGuru Web site a few
years ago, where he was responsible for writing the book reviews. While Erik's
reviews had proven to be one of the most popular aspects of the site, we never had a
system in place that would allow us to easily provide a means for the user to read
archived reviews. Obviously, we could have simply organized the reviews much like
we did the code articles, but we also wanted a means by which reviews could be
searched using criteria such as rating, publisher, author, and title.
The solution Erik came up with was both elegant and functional. By combining the
powers of ASP (Active Server Pages), XML, and XSL, in a weekend he wrote the
foundations for the book review archive section that is still in use today at CodeGuru,
as well as many other popular Web sites. His application design was so flexible that
his work was later expanded to work with archived newsletters and many other
document types.
Okay, so we know that Erik is great with XML, but will reading this book make you as
productive as he is? I'll admit that when I was approached about writing this
foreword, I was a bit wary that any reasonable amount of XML could be learned in a
single weekend. I told Erik that I would need to read the entire book to make sure my
name would be associated with something that I believe in. Well, two days later, not
only was I surprised that the book does indeed deliver on its promises, but I actually
learned several new bits of information about XML despite having used it for over two
years now!
If you're new to XML and have no time to waste on theoretical discussions, this book
is a goldmine of information. By the end of Saturday afternoon's lesson, many XML
documents that you may have seen but never quite understood will begin to make
sense. By the end of Saturday afternoon's lesson, you will understand basic XML
constructs such as elements and attributes, you will have worked with XML
namespaces and fully comprehend how to use them properly, and you'll understand
how XML fits into practical applications. By Sunday evening, you will have done
everything from working with document models and DTDs, to creating and interfacing
your own XML documents with style sheets (both CSS and XSL), to programmatically
accessing XML documents from your applications using the XML DOM.
The key is that Erik takes a pragmatic approach, helping you become productive
quickly while taking the time to explain important details along the way. I found the
discussions on character sets, character encoding, and schemas particularly
interesting because they were so detailed, yet so easy to read and understand.
That's unique in books like this. Erik enjoys teaching others, and his experience
shines through on every page. The numerous sample XML documents throughout the
book make it an interesting read, but Erik goes beyond that and includes code for
Web pages and applications using programming languages like VBScript, JavaScript,
and C#. Also, the samples are interesting even if you're not a programmer, because
they provide you with another perspective on how developers work with XML.

Simply put, the clear explanations, real-world examples, and a focus on relevant
technologies make this book an essential addition to your bookshelf if you're serious
about XML.
Tom Archer

July 2002
Introduction
Welcome to Learn XML In a Weekend. This book contains seven lessons and other
resources that are focused on only one thing: getting you up to speed with XML, its
related technologies, and its latest developments. The lessons span a weekend,
beginning on Friday evening and ending on Sunday evening. Yes, you can learn
XML in a weekend!
As you look at all of the other XML books that line the shelves, you might ask,
"What's so special about this book?" This book is different from the rest of the pack
because not only does it explain what XML is and how to use it, but it presents
relevant, practical, and real-world uses of XML. While a lot of books focus on core
XML (its syntax, DTDs, and so on), which is very useful information, they often
assume that you have the expertise to integrate XML into your organization's
operations.
This book focuses on relevant XML technologies like XPath, XSD, DTD, and CSS,
and explains why other technologies, like XDR, may not be important in certain
scenarios. This book also takes a practical approach to working with XML. After
showing you the core syntax and other rules, I'll show you how to work with XML
using two of the best XML editors on the market today: eXcelon's Stylus Studio and
Altova's XML Spy. There's not much point in writing XML documents, schemas, and
transformations by hand if XML editors can generate a lot of the XML for you!
I'll also discuss how to use XML in Internet Explorer, Microsoft Active Server Pages,
and Microsoft's latest offerings: the .NET Framework and the Visual C# .NET
programming language.
This book succinctly describes XML and its related technologies, focusing only on
what's relevant in today's rapidly changing marketplace. I'll help you make choices
that can mean the difference between a successful solution and one that fails
because it uses irrelevant, incompatible, or outdated standards. Skim through the
book now and take a look through Saturday afternoon's lesson, which describes how
to create XML documents. That single lesson covers everything you need to know,
from basic syntax to creating XML documents using different languages (important in
today's global marketplace). By the end of that lesson alone, you'll already
understand terms like entity reference, character sets, and namespaces.
How This Book Is Organized
This book is organized into seven lessons that span a weekend, beginning on Friday
evening and ending on Sunday evening. By Monday morning, you'll be right up to
speed with XML and its related technologies. If you're like me and cannot devote an
entire weekend to reading a book because of other commitments, feel free to read
this book whenever you like.
Here's an overview of each lesson:
Friday Evening focuses on introducing XML: what it is, why it's useful, and how
people use it.
Saturday Morning is a slightly longer lesson that focuses on using XML in Internet
Explorer with HTML and XSL, and using XML with Microsoft's Active Server Pages.
This lesson gives you an overview of what you can do with XML. Don't worry if you're
not a programmer or don't understand the programming language that's used in the
lesson. The idea is to expose you to these technologies so that you'll gain a better
understanding of how others use XML.
Saturday Afternoon is a slightly longer lesson, focusing on how to write XML
documents by following the rules that XML imposes. This lesson covers basic
document structure, working with attributes, comments, and CDATA sections. The
lesson also covers character encoding, which allows international users to read your
XML documents, and namespaces, a feature that makes your XML documents more
useful by allowing you to share them with others.
Saturday Evening is one of the longest lessons in the book, focusing on document
modeling using DTD and XSD. I suggest that you start reading this chapter as soon
as you can after you complete Saturday afternoon's lesson so that you can complete
it in one evening.
Sunday Morning focuses on using XML Spy and Stylus Studio to create and work
with XML solutions. The lesson also covers XSL debugging using Stylus Studio,
which can save you hours of frustration when your XSL code doesn't work as you
expect it should. This lesson also describes Microsoft XML Core Services, how to
determine what version is installed on your system, and how to get the latest
updates.
Sunday Afternoon is a longer lesson, so I recommend you try to start it as soon as
possible after completing the previous lesson. This lesson focuses on presenting
data on the Web using presentation technologies like CSS and XSL. It examines how
to repurpose an XML document using XSL that you create using Stylus Studio's
graphical XSL editor.
Sunday Evening shows you how to use XML with Internet Explorer's Data Source
Object (DSO), the XML Document Object Model (XML DOM), and Microsoft's .NET
Framework. The DSO produces impressive results, like support for paging through
long sets of data without any programming. The XML DOM is useful for creating and
manipulating an XML document programmatically (via an application's code), and the
Microsoft .NET Framework offers support for XML throughout.
Appendix A provides an HTML and XPath reference to help you become more
productive. This appendix includes examples and screen shots.
Appendix B presents the W3C XML 1.0 Specification. This is a shorter specification
than the one published by the W3C and uses examples throughout.
Appendix C is a list of Web resources.
The Glossary is a comprehensive listing of terms, along with their definitions. Most
terms are used in the book, but there are some additional terms that you'll come
across as you work with XML that do not appear in the book.
Conventions Used in This Book
This book uses a number of conventions that make it easier to read:
Note Notes provide additional information.
Tip Tips highlight information that appears in the surrounding text.
Code that appears within the body of a paragraph is shown in another font to make it
stand out from the rest of the surrounding text.
Code listings appear in another font, sometimes including bold lines to highlight
certain parts of the listing. The following is an example of a listing that contains bold
text:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="
<xs:complexType name="license_t">
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="licenseNumber" type="xs:string"/>
<xs:attribute name="ownerName" type="xs:string"/>
References
The following is a list of materials I used to prepare this book:
W3C, Extensible Markup Language (XML) 1.0 (Second Edition), World Wide Web Consortium, 2000
W3C, XML Path Language (XPath) Version 1.0, World Wide Web Consortium, 1999
W3C, XSL Transformations (XSLT) Version 1.0, World Wide Web Consortium, 1999
W3C, Cascading Style Sheets, level 1, World Wide Web Consortium, 1996
Nikola Ozu et al., Professional XML, Wrox Press, 2001
Khun Yee Fung, XSLT: Working with XML and HTML, Addison Wesley, 2000
The Unicode Consortium, UNICODE STANDARD VERSION 3.0, 2000
Friday Evening: Introducing XML
Good evening! Tonight you begin learning how people use XML in real-world
scenarios. This evening introduces you to what XML is, how to create XML
documents and play by XML's rules, the benefits of using XML, and how XML relates
to HTML. The remainder of the evening discusses the typical life cycle of an XML
document, describes how others make XML work for them, and covers the basics of
the types of XML documents you'll probably encounter.
What Is XML?
XML stands for Extensible Markup Language, a syntax that describes how to add
structure to data. A markup language is a specification that adds new information to
existing information while keeping the two sets of information separate. If it were as
simple as that, I could describe XML to you in just a few pages.
However, XML is more complicated than that. It's a simple syntax that describes
information, a set of technologies that allows you to format and filter information
independently of how that information is represented, and the embodiment of an idea
that reduces data to its purest form, devoid of formatting and other irrelevant aspects,
to attain a very high level of usefulness and flexibility.
Oddly enough, XML is not a markup language. Instead, it defines a set of rules for
creating markup languages. There are many types of markup languages, the most
popular of which is HTML (Hypertext Markup Language), the publishing language of
the Internet. HTML combines formatting information with a Web page's content so
that you see the page in the way the designer intended for you to see it.
The two most important elements that make HTML work are the HTML itself and
software that's capable of interpreting HTML. When you view a Web page, your
browser retrieves the page, interprets the HTML, and displays the resulting document
on your screen. The same two elements, XML itself and software that's capable of
interpreting XML, are needed with XML.
Assume that you're working with a file that looks like this:
Learn XML In A Weekend, Erik Westermann, 159200010X

This file describes information about a book using three fields: the title, author, and
ISBN (a number that uniquely identifies a book). While it's clear to you and me that
Learn XML In A Weekend represents the title of a book, a computer would have a
tough time figuring out that
• There are three fields in the file (separated by commas).
• Each field represents an individual piece of data.
XML enables you to add structure to the data. Here's the same file marked up with
XML:
<books>
<book>
<title>Learn XML In A Weekend</title>
<author>Erik Westermann</author>
<isbn>159200010X</isbn>
</book>
</books>
It's now apparent, both to us and to software that's capable of interpreting XML, that
the file contains information about a collection of books (there's only one book in this
collection) broken into three fields: title, author, and ISBN. For software to be able to
interpret the XML, the sample follows certain rules:
• Text inside the angle brackets (< and >) represents a markup element.
• Text outside of the angle brackets is data.
• The beginning of a unit of data is marked with a start tag.
• The end of a unit of data is marked with an end tag. This is almost identical to
a start tag, except that it begins with a slash (/).
For example, <title> is a start tag, Learn XML In A Weekend represents a unit of
data, and </title> is an end tag. XML defines only the syntax—the rules—and leaves
it to you to decide how you structure it and what data you store in it.
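To see these rules from the software's side, here is a minimal sketch in Python (a language chosen only for illustration; it is not one the book itself uses). It reads the marked-up book list with the standard library's xml.etree.ElementTree module. The function and variable names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The marked-up book list from the listing above.
BOOKS_XML = """
<books>
<book>
<title>Learn XML In A Weekend</title>
<author>Erik Westermann</author>
<isbn>159200010X</isbn>
</book>
</books>
"""

def read_books(xml_text):
    """Return one dict per <book>, keyed by the names of its child elements."""
    root = ET.fromstring(xml_text.strip())
    return [{field.tag: field.text for field in book}
            for book in root.findall("book")]

books = read_books(BOOKS_XML)
for book in books:
    print(book["title"], "|", book["author"], "|", book["isbn"])
```

Because the tags name each field, the program never has to guess where the title ends and the author begins, which is exactly the difficulty with the comma-separated version.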
XML documents reside in files that you can create with an editor like Windows
Notepad, making XML very accessible. Specialized editors are available to help you
manage XML documents and ensure that you follow the rules of the XML
specification. I'll cover two such editors later in this book.
Note Windows Notepad is a simple text editor that comes with Windows. You can
start Notepad by clicking Start, Run, and then typing notepad.
It is important to understand that XML is an enabling technology, which is analogous
to any written or spoken language. A language doesn't communicate for us. We're
able to communicate because we use language.
Just as you play a role in reading the words on this page (the words are meaningless,
unless someone reads them), XML becomes useful only in the context of a system
that's able to interpret it. Unlike written and spoken languages, you're not likely to
directly read or write XML. People rarely read XML documents—in most cases,
software creates an XML file and then other software uses it without anyone actually
viewing the XML document itself. However, you still need to understand what XML is
and how to use it to your advantage.
There are three important characteristics of XML that make it useful in a variety of
systems and solutions:
• XML is extensible.
• XML separates data from presentation.
• XML is a widely accepted public standard.
XML Is Extensible
Think of XML like this: one syntax, many languages.
XML describes the basic syntax—the basic format—and rules that XML documents
must follow. Unlike markup languages like HTML, which has a predefined set of tags
(items with the angle brackets, as in the previous sample), XML doesn't put any
limitations on which tags you can use or create. For example, there isn't any reason
you couldn't rename the <book> tag to <manuscript> or <record>.
XML essentially allows you to create your own language, or vocabulary, that suits
your application. The XML standard (described shortly) describes how to create tags
and structure an XML document, creating a framework. As long as you stay within
the framework, you're free to define tags that suit your data or application.
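As a small illustration of "one syntax, many languages," the following Python sketch parses the same data under two different vocabularies. The <library> and <manuscript> names are hypothetical, picked only to show that a general-purpose XML parser accepts whatever tag names you choose as long as the syntax rules are followed.

```python
import xml.etree.ElementTree as ET

def describe(xml_text):
    """Return the root tag, the first record's tag, and that record's child tags."""
    root = ET.fromstring(xml_text)
    record = root[0]
    return root.tag, record.tag, [child.tag for child in record]

# Two vocabularies for the same information; the second is made up for this example.
as_books = "<books><book><title>Learn XML In A Weekend</title></book></books>"
as_manuscripts = ("<library><manuscript><title>Learn XML In A Weekend</title>"
                  "</manuscript></library>")

print(describe(as_books))
print(describe(as_manuscripts))
```

The parser handles both documents identically; only the application that consumes them needs to know what <book> or <manuscript> means.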
XML Separates Data from Presentation

Take a close look at the page layout of this book—it contains several types of
headings and other formatting elements. The information on this page wouldn't
change if you changed its format, though. If you remove the headings, italic
characters, and other formatting, you'll be left with the essence of this book—the
information that it contains, or its content.
XML allows you to store content without regard to how it will be presented, whether in
print, on a computer screen, on a cellular phone's tiny display screen, or even read
aloud by speech software. When you want to present an XML document, you'll often
use another XML vocabulary (set of XML tags) to describe the presentation. Also,
you'll use other software to perform the transformation from XML into the format you
want to present the content in, as shown in Figure 1.1.

Figure 1.1: Presenting an XML document by first transforming it.
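The idea can be sketched in a few lines of Python. This is only a stand-in for the transformation step: the book performs transformations with XSL, whereas the two functions below (to_html and to_text, both hypothetical names) are ordinary Python that produce two different presentations from one unchanged XML document.

```python
import xml.etree.ElementTree as ET
from html import escape

BOOKS_XML = ("<books><book><title>Learn XML In A Weekend</title>"
             "<author>Erik Westermann</author><isbn>159200010X</isbn></book></books>")

def to_html(root):
    """Present the books as an HTML bulleted list."""
    items = []
    for book in root.findall("book"):
        title = escape(book.findtext("title", default=""))
        author = escape(book.findtext("author", default=""))
        items.append("<li><em>" + title + "</em> by " + author + "</li>")
    return "<ul>" + "".join(items) + "</ul>"

def to_text(root):
    """Present the same books as plain text, one per line."""
    lines = []
    for book in root.findall("book"):
        title = book.findtext("title", default="")
        isbn = book.findtext("isbn", default="")
        lines.append(title + " (" + isbn + ")")
    return "\n".join(lines)

root = ET.fromstring(BOOKS_XML)
html_view = to_html(root)
text_view = to_text(root)
print(html_view)
print(text_view)
```

The content lives in one place; each presentation decides for itself which fields to show and how to format them.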
XML Is a Widely Accepted Public Standard
XML was developed by an organization called the World Wide Web Consortium
(W3C), whose role is to promote interoperability between computer systems and
applications by developing standards and technologies for the Internet. The W3C
members include people from technology product vendors, content providers,
corporate users, research labs, and governments. The W3C's goal is to ensure that its
recommendations (commonly referred to as standards) are vendor-neutral (not
specific to a particular company or organization) and receive consideration from a
broad range of users and developers.
The W3C's standards cannot be changed or dropped altogether without input from its
members and from the general public (if they choose to participate in the process).
This process is in contrast to proprietary standards that some vendors implement.
For example, Microsoft could decide to stop developing a standard it has created,
and subsequently stop incorporating it into its products. This is not likely to happen to
standards that the W3C develops.
Is XML a Programming Language?

A programming language is a vocabulary and syntax for instructing a computer to
perform specific tasks. XML doesn't qualify as a programming language because it
doesn't instruct a computer to do anything, as such. It's usually stored in a simple text
file and is processed by special software that's capable of interpreting XML. For
example, if the processing software is designed to change the behavior of an
application based on the contents of an XML file, the software will carry out the
changes. XML acts as a syntax to add structure to data, and it relies on other
software to make it useful.
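The following Python sketch shows the division of labor under simple assumptions: the XML only holds values, and the program decides what to do with them. The <settings>, <greeting>, and <repeat> elements are invented for this example and do not come from any real application.

```python
import xml.etree.ElementTree as ET

# Hypothetical settings file; the element names are made up for this example.
SETTINGS_XML = """<settings>
<greeting>Good evening</greeting>
<repeat>2</repeat>
</settings>"""

def greet(settings_xml, name):
    """Build greeting lines according to the values stored in the XML."""
    settings = ET.fromstring(settings_xml)
    greeting = settings.findtext("greeting", default="Hello")
    repeat = int(settings.findtext("repeat", default="1"))
    return [greeting + ", " + name + "!"] * repeat

lines = greet(SETTINGS_XML, "reader")
print(lines)
```

Changing the XML changes the program's behavior, but all of the instructions (the loop, the string building, the defaults) are in the Python code, not in the XML.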
Is XML Related to HTML?
HTML, the publishing language of the Internet, is related to XML through a language
called SGML (Standard Generalized Markup Language).
SGML is a complex markup language that has its roots in GML, another markup
language developed by a researcher working for IBM during the late 1960s. HTML is
an SGML application, which means that HTML is a type of document that SGML
directly supports. XML is a drastic simplification of SGML that removes its less
frequently used features and imposes new constraints that make it easier to work
with than SGML. Like HTML, then, XML is derived from SGML: HTML as an application of
it, XML as a simplified subset.
Why Not Use HTML?
Web developers are a very resourceful group of people. HTML has many
shortcomings, and the Web developer community at large has worked to overcome
them. The underlying problem with HTML is that it's a language that describes how to
present information—it doesn't describe the information itself (with the exception of a
few tags like <title> and <body>). Some people ask why the W3C doesn't extend
HTML so it describes information. The problem with that approach is backward
compatibility with existing HTML pages and Web browsers. The syntax that describes
how to format HTML and the software that processes HTML aren't as strict as the
rules that XML imposes. Along with less strict rules comes an increase in the
complexity of the software that interprets HTML, and adding new tags and
capabilities to HTML would make the software even more complex.
The W3C has created a recommendation (a standard, in practical terms) called
XHTML to address some of these complexities. XHTML is essentially a strict version
of HTML: it combines the strength of HTML with the power of XML by imposing XML
rules on HTML documents. For example, this is a fragment of a simple HTML
document:
<TABLE width=50% ALIGN=center>
<tr>
<td>
<ul>
<li>List Item 1
<li>List Item 2
</ul>
</tr>
</table>
<p>The above table contains a list
<HR>
<p>Contact the author for details

Notice that the <TABLE> element includes two attributes, width and align, and the
end tag, </table>, is in lowercase as opposed to the uppercase start tag. The list
items (the ones that start with the <li> tag) don't have an end tag, as is the case with
the <p> tags that appear after the table. The <hr> tag doesn't require an end tag,
since the tag stands on its own. This listing represents completely legal HTML.
Browsers will display the page as the designer intends it to be shown.
If you rewrite the fragment using XHTML, it would look something like this:
<table width="50%" align="center">
<tr>
<td>
<ul>
<li>List Item 1</li>
<li>List Item 2</li>
</ul>
</td>
</tr>
</table>
<p>The above table contains a list</p>
<hr/>
<p>Contact the author for details</p>
The differences between the two fragments are subtle:
• All tags and attributes must be in lowercase.
• Attribute values must appear in quotes (refer to the <table> tag's width and
align attributes).
• All elements must have both a start and an end tag.
• Empty tags, like the HTML <hr> tag, must appear as empty XML elements
using the syntax shown in the previous listing (<hr/>; note the slash character
just before the last angle bracket).
XHTML allows Web developers to combine HTML with XML either in the same file or
in separate files. The bottom line on HTML, however, is that its rules are too relaxed
and the software that processes it is too complex to survive a major revision. The
restrictions that XHTML imposes alleviate these problems to allow for further
development.
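One practical consequence of the stricter rules is that any XML parser can read an XHTML page. The Python sketch below feeds condensed versions of the two fragments to the standard library's XML parser (with the <td> element explicitly closed in the XHTML version). The helper name is_well_formed is an assumption for this example, and the check covers XML well-formedness only; it does not test XHTML-specific rules such as lowercase tag names.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Return True if the fragment parses as XML once wrapped in a single root."""
    try:
        ET.fromstring("<root>" + fragment + "</root>")
        return True
    except ET.ParseError:
        return False

# Relaxed HTML: unquoted attribute values, missing end tags, mixed case.
html_style = ("<TABLE width=50% ALIGN=center><tr><td><ul>"
              "<li>List Item 1<li>List Item 2</ul></tr></table><HR>")

# XHTML: quoted attribute values, every element closed, empty element as <hr/>.
xhtml_style = ('<table width="50%" align="center"><tr><td><ul>'
               "<li>List Item 1</li><li>List Item 2</li>"
               "</ul></td></tr></table><hr/>")

print(is_well_formed(html_style))
print(is_well_formed(xhtml_style))
```

The relaxed fragment is rejected at the first unquoted attribute value, while the XHTML fragment parses cleanly.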
Biography of an XML Document
Throughout the chapter I've hinted at the stages an XML document passes through,
beginning at its creation and ending at its presentation. Figure 1.2 summarizes how
to create an XML document. It shows a person using Windows Notepad to create an
XML document and store it in a file.

Figure 1.2: Creating an XML document.
Figure 1.3 shows what happens when a user requests a page from a Web site that
uses XML documents to manage its content.

Figure 1.3: Later stages of an XML document's life.
The process starts with the user making a request for a page from a Web site (step
1). The Web server (the computer that runs the Web site) retrieves the document the
user wants (step 2). However, the document is in XML, and the user expects the document to
be a Web page that's marked up using HTML. In step 3, the Web server transforms
the XML document into HTML by combining it with another document that describes
how to perform the transformation. The software that performs the actual
transformation is called a parser. An XML parser interprets the tags in an XML
document and can perform other functions, like transforming XML into other formats.
In step 4, the parser produces the resulting HTML document, which gets passed on
to the Web site in step 5. The final step in the process occurs when the Web site
delivers the HTML file to the user's computer. The user's browser interprets the
HTML file and displays it onscreen (not shown in the figure).
This scenario is just one of many ways to use XML documents. The next section
describes how people use XML documents in real-world applications.
Elements of XML Documents
The best way to learn what makes up an XML document is to work from a simple
example. The following listing is a complete XML document that lists the names of
two people:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE people [
<!ELEMENT people (person+)>
<!ELEMENT person (name)>
<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
]>
<people>
<person>
<name>
<first>Essam</first>
<last>Ahmed</last>
</name>
</person>
<person>
<name>
<first>Tom</first>
<last>Archer</last>
</name>
</person>
</people>
XML lets you name the parts of the document anything you want. It doesn't matter
how you're going to use the document, and the final appearance of the document
doesn't matter either. All that matters is that you follow the basic rules for creating
tags, as described earlier. This sample document contains some markup at the very
beginning that obviously doesn't follow the basic rules—I'll explain what those parts
are in a moment.
Figure 1.4 highlights the various elements of the sample document.

Figure 1.4: Elements of an XML document.
The sample document, like all XML documents, has content interspersed with
markup symbols. Take a closer look at the parts that make up this document. The
numbers refer to the numbers in black circles in Figure 1.4:
• 1 XML declaration: Describes general characteristics of the document, such
as that it's an XML document, which version of the XML specification it
complies with (1.0 is the only known version at the time of this writing), and
which character encoding it uses. (I'll describe character encoding in Saturday
morning's lesson, "Separating Content from Style.")
• 2 Document type declaration: This contains (or points to) the document type
definition (DTD), which describes the structure of the document in terms of which
elements it may contain, along with any restrictions it may have. (I'll describe
the DTD in detail on Saturday morning.)
• 3 Internal DTD subset: A DTD can contain references to other DTDs.
However, the one in this example uses internal declarations that are local to
the XML document.
• 4 XML information set: This represents the XML document's content—the
information the document conveys.
• 5 Root element: This encloses all the information. An XML document can
have only one root element.
• 6 Start tag: XML elements have a start and end tag—the start tag provides
the name of the XML element.
• 7 End tag: The name of the end tag must exactly match the name of the start
tag.
• 8 XML element: The start tag, the end tag, and everything between them are
collectively referred to as an XML element.
• 9 Data: XML elements can contain data between the start and end tags.
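Several of these parts can be inspected programmatically. The sketch below uses Python's standard xml.dom.minidom module on a condensed copy of the sample document; the variable names are assumptions for this example. Note that this parser reads the document type declaration but does not check the document against the DTD.

```python
from xml.dom import minidom

# Condensed copy of the sample document (same declaration, DTD, and content).
PEOPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE people [
<!ELEMENT people (person+)>
<!ELEMENT person (name)>
<!ELEMENT name (first, last)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
]>
<people>
<person><name><first>Essam</first><last>Ahmed</last></name></person>
<person><name><first>Tom</first><last>Archer</last></name></person>
</people>"""

doc = minidom.parseString(PEOPLE_XML.encode("utf-8"))

doctype_name = doc.doctype.name          # name given in the <!DOCTYPE ...> line
root_name = doc.documentElement.tagName  # the single root element
first_names = [node.firstChild.data      # data between <first> and </first>
               for node in doc.getElementsByTagName("first")]

print(doctype_name, root_name, first_names)
```

The name in the document type declaration matches the root element, and the data sits between matching start and end tags, just as the numbered list describes.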
An XML document represents information using a hierarchy. That is, it begins with a
root element, which contains sub-elements, which in turn can contain other
sub-elements, text, or both. One way of depicting such a hierarchy is an upside-down tree
structure, as shown in Figure 1.5.

Figure 1.5: Tree view of an XML document.
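A text version of such a tree can be produced with a short recursive walk. This is a minimal Python sketch, assuming the sample people document without its declaration and DTD; the function name outline is made up for this example.

```python
import xml.etree.ElementTree as ET

PEOPLE_XML = """<people>
<person><name><first>Essam</first><last>Ahmed</last></name></person>
<person><name><first>Tom</first><last>Archer</last></name></person>
</people>"""

def outline(element, depth=0):
    """Return indented lines, one per element, showing the hierarchy."""
    text = (element.text or "").strip()
    label = element.tag + (": " + text if text else "")
    lines = ["  " * depth + label]
    for child in element:
        # Each sub-element is one level deeper than its parent.
        lines.extend(outline(child, depth + 1))
    return lines

tree_lines = outline(ET.fromstring(PEOPLE_XML))
print("\n".join(tree_lines))
```

Each level of indentation corresponds to one level of the tree: the root element at the top, sub-elements below it, and the text data at the leaves.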
Although XML is designed so that people can read it, it isn't intended to create a
finished document. In other words, you can't open up just any XML-tagged document
in a browser and expect it to be formatted nicely. XML is meant to hold content so
that when the document is combined with other resources, such as a style sheet, it
becomes a finished product.

XML in the Real World
XML enjoys broad support from major software vendors, programming languages,
and platforms (operating systems). Since XML is platform- and vendor-neutral, it's
easy to integrate in a variety of ways. XML plays three primary roles:
• Application integration
• Knowledge management
• System-level integration
Using XML for Application Integration
A classic example of integrating applications is adding package-tracking functionality
to a company's Web site that fulfills customers' orders. For example, assume that you
run an online store and want to let your customers track the status of their orders
without leaving your site. You could implement a page that displays the order, along
with a link that allows the customer to check the order's status and get package-
tracking information after the order ships.
Your company uses several couriers to deliver orders to customers, and you want to
present this tracking information regardless of the courier. XML is perfect for this
scenario. It allows your Web site to request package-tracking information from
another site on the customer's behalf, and the results are delivered in a predictable
format that's easy to integrate into your site. As long as the software on your Web
site knows the format (structure) of the XML document(s) on the other couriers' Web
sites, your site will be able to integrate the results into the customer's order status
page.
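Here's a rough sketch of the package-tracking idea in Python. The tracking, package, and status tags are a hypothetical vocabulary that I invented for this example, and the courier responses are hard-coded strings rather than real replies from a courier's Web site.

```python
import xml.etree.ElementTree as ET

# Hypothetical responses from two couriers that agreed on one XML format.
COURIER_RESPONSES = {
    "courier-a": "<tracking><package id='A100'>"
                 "<status>In transit</status></package></tracking>",
    "courier-b": "<tracking><package id='B200'>"
                 "<status>Delivered</status></package></tracking>",
}


def order_status(response_xml):
    """Extract the package id and status from one courier response."""
    root = ET.fromstring(response_xml)
    package = root.find("package")
    return package.get("id"), package.findtext("status")


# The same function handles every courier because the format is shared.
statuses = {name: order_status(body) for name, body in COURIER_RESPONSES.items()}
for name, (package_id, status) in statuses.items():
    print(name, package_id, status)
```

Because both couriers use the same structure, the site needs only one piece of parsing code no matter how many couriers it works with.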
That's a very simple example of integrating applications. A more complex example
involves Microsoft's .NET Platform, which makes extensive use of XML to achieve a
high degree of interoperability between distributed applications. Using the .NET
Framework, a developer could create an application that requests information and
interacts with other applications on the Internet using standardized XML vocabularies
(XML tags), without the users even being aware that it's happening. The developer
could integrate Internet-based applications that provide paid services or free
information, or that simply perform processing on behalf of the user. The
possibilities are limitless.
This level of integration is possible because XML is platform-neutral. As long as two
applications "speak" XML, using a predetermined vocabulary, they can interact with
each other regardless of where they physically reside or how they're implemented.
Using XML for Knowledge Management
Most personal Web sites are made up of HTML pages that contain static
(unchanging) content. Using HTML pages to provide content to your site's visitors
works well, as long as the number of pages you need to manage remains relatively
small. If you want to update your existing pages, you have to edit them directly. If you
want to change the appearance of some or all pages on your Web site, you have to
edit them directly as well. As your site grows, changing sitewide characteristics such
as the site's overall appearance, navigation aids, and interactive capabilities
becomes a significant problem because you have to change a large number of
pages.
Managing a Web site's content is easier with a class of applications called Content
Management Systems (CMS). A CMS allows Web site owners, content providers such
as journalists, and other (usually) nontechnical users to add new information to a
Web site without any knowledge of the site's underlying structure or operation. Web
sites that display ads in certain positions on each page, or that track how their visitors
use them, are particularly difficult to manage because they often incorporate
additional programming to manage those functions.
XML has made great strides toward integrating CMS solutions. An XML-based CMS
stores a Web site's content in XML files and delivers the content to users in a variety
of formats, including HTML. In fact, there are some free, XML-based CMSs available
on the Internet. FullXML is one example. It uses Microsoft technologies
like Windows, Microsoft Internet Information Server (a Web server), and the Microsoft
XML parser (software that interprets XML). Visit the FullXML Web site for more
information.
XML is also being used as a portable database system. I use portable in terms of
easily moving a data store (a repository of data) that's stored on one system to
another system. Popular database systems are based on proprietary formats that
their vendors have invented. For example, if you use a database system from one
vendor, it's very difficult to integrate it with a database system from another vendor.
Besides the obvious competitive reasons, there are incompatibilities in the systems'
file formats and methods of communication.
XML addresses these problems by allowing you to retain the structure that a
database system provides while making it easy to access and move the entire set of
data from one system to another. For example, you can move a data set from a Unix-
based system to a Windows-based system without using any special software, which
is practically impossible with proprietary database systems. The advantage of XML is
that the data store (repository) becomes open (easily accessible without having to
use any special software) and vendor-neutral. Those are two very important
characteristics in the face of fast-paced economic changes that could lead to vendors
going out of business or dropping entire product lines.
Another aspect of knowledge management is content reuse. With the increasing
demand for quality content, providers are looking for interesting ways to reuse and
integrate content that they've spent a lot of money to acquire. XML makes it easy to
aggregate content from a number of XML documents into a new document and
present it in various formats.
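As a small illustration of content reuse, the following Python sketch gathers the book elements from several source documents into one new document. The second source document and its title are made up for the example.

```python
import xml.etree.ElementTree as ET

# Two source documents that use the same "books" vocabulary.
SOURCES = [
    "<books><book><title>Learn XML In A Weekend</title></book></books>",
    "<books><book><title>A Second Title</title></book></books>",  # made-up entry
]


def aggregate(documents):
    """Collect every <book> from the given documents under one new root."""
    combined = ET.Element("books")
    for document in documents:
        for book in ET.fromstring(document).findall("book"):
            combined.append(book)
    return combined


catalog = aggregate(SOURCES)
titles = [book.findtext("title") for book in catalog.findall("book")]
print(ET.tostring(catalog, encoding="unicode"))
```

The combined document is ordinary XML, so you can present it in whatever format you like without touching the source documents.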
Using XML for System-level Integration
The software you use every day relies on the fundamental functions of other software
(such as a Web server) and operating systems (such as Windows). Sometimes
developers need to move data and system-level entities (objects, if you're interested)
from one computer to another, or from one application to another on the same
computer. For decades, this has been a difficult problem to address.
XML helps by providing a format that's easy to marshal (transport). Documents are
stored as simple text files, which easily translate into strings that are relatively easy to
marshal between computers and processes. For example, the Microsoft .NET
Framework uses XML to marshal data on a single system or across systems
interconnected by a network, like the Internet. If I've lost you, don't worry. All you
need to understand is that XML can help you quickly achieve interoperability at very
low levels within a system.
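If you'd like to see what marshaling looks like in practice, here's a minimal sketch in Python. It isn't how the .NET Framework does it; the Order class and the order and quantity tags are assumptions I made up to show an object turning into an XML string and back.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Order:
    """A made-up object that we want to move between processes."""
    order_id: str
    quantity: int


def marshal(order):
    """Turn an Order into a plain XML string that is easy to transport."""
    root = ET.Element("order", id=order.order_id)
    ET.SubElement(root, "quantity").text = str(order.quantity)
    return ET.tostring(root, encoding="unicode")


def unmarshal(text):
    """Rebuild an Order from the XML string produced by marshal()."""
    root = ET.fromstring(text)
    return Order(order_id=root.get("id"), quantity=int(root.findtext("quantity")))


original = Order("A100", 3)
wire_text = marshal(original)   # a string you could send to another computer
restored = unmarshal(wire_text)
print(wire_text)
```

The string in the middle is the whole point: any process on any platform that can read text can rebuild the object from it.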
XML Vocabularies
As you've learned, XML allows you to create your own vocabulary that suits your
application or data. A vocabulary is simply a set of tags with specific meanings that
developers and applications understand. For example, the "books" XML document at
the beginning of this chapter uses an XML vocabulary that defines the meanings of
the <books>, <book>, <title>, <author>, and <isbn> tags. Specifically, when an
application reads the "books" XML document, it understands that the <books> tag
refers to a set of books, while a single book is represented by the <book> tag.
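One simple way to picture a vocabulary is as a set of allowed tag names. The following Python sketch checks a document against the "books" vocabulary and reports any tags it doesn't recognize. It's only a rough illustration (a DTD, described on Saturday morning, does this job properly), and the price tag in the test document is invented.

```python
import xml.etree.ElementTree as ET

# The tags that make up the "books" vocabulary from this chapter.
BOOKS_VOCABULARY = {"books", "book", "title", "author", "isbn"}


def unknown_tags(xml_text, vocabulary):
    """Return the sorted tag names in the document that the vocabulary lacks."""
    root = ET.fromstring(xml_text)
    return sorted({element.tag for element in root.iter()} - vocabulary)


good = "<books><book><title>Learn XML In A Weekend</title></book></books>"
bad = "<books><book><title>Learn XML In A Weekend</title><price>1</price></book></books>"

print(unknown_tags(good, BOOKS_VOCABULARY))
print(unknown_tags(bad, BOOKS_VOCABULARY))
```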
Since XML is so flexible, new XML vocabularies are being developed at an incredible
pace. Some vocabularies have become so popular and useful that the community at
large, and even the W3C, have adopted them as industry standards. Once a
vocabulary becomes standardized, it's easier for developers and vendors to support
the vocabulary and integrate it into applications and other systems.
XML vocabularies are broadly divided into two groups, horizontal and vertical, as
shown in Figure 1.6.

Figure 1.6: Groups of XML vocabularies.
Horizontal XML vocabularies represent core definitions and elements upon which all
industry-specific XML vocabularies rely. For example, SOAP is a vocabulary that's
useful for all types of XML applications that need to communicate with each other
over a network like the Internet. Vertical XML vocabularies are industry-specific.
Table 1.1 lists some industries and the names of some of their XML vocabularies,
either in use or under development.
Table 1.1: INDUSTRY-SPECIFIC XML VOCABULARIES

Industry             Examples of XML Vocabularies
Accounting           XFRML (Extensible Financial Reporting Markup Language),
                     SMBXML (Small and Medium Sized Business XML)
Entertainment        SMDL (Standard Music Description Language), ChessGML (Chess
                     Game Markup Language), BGML (Board Game Markup Language)
Customer relations   CIML (Customer Information Markup Language), NAML
                     (Name/Address Markup Language), vCard
Education            TML (Tutorial Markup Language), SCORM (Shareable Courseware
                     Object Reference Model Initiative), LMML (Learning Material
                     Markup Language)
Software             OSD (Open Software Description), PML (Pattern Markup
                     Language), BRML (Business Rules Markup Language)
Manufacturing        SML (Steel Markup Language)
Computer             XLF (Extensible Logfile Format), SML (Smart Card Markup
                     Language), TDML (Timing Diagram Markup Language)
Energy               PetroXML, ProductionML, GeophysicsML
Multimedia           SVG (Scalable Vector Graphics), MML (Music Markup Language),
                     X3D (Extensible 3D)

The following sections describe some popular vocabularies to give you an idea of
how much development has already taken place. Keep in mind that these are all XML
vocabularies. That is, they represent XML documents that developers and software
applications have agreed to use to facilitate communication and interoperability.
XSL
XSL, the Extensible Stylesheet Language, is an XML vocabulary that describes how
to present a document. In other words, you write XSL using XML. When you combine
XSL with XML using a parser, as shown in Figure 1.3, the parser produces a new file
that's based on the formatting commands that you specify using XSL. You can
present the resulting document on a screen, in print, or in other media. XSL enables
XML content to remain separate from its presentation. If you don't fully understand
how this works, it's described in more detail on Sunday afternoon. For now, it's
important to understand the underlying concept of using XSL to describe the
presentation of an XML document.
For example, consider the "books" XML document at the beginning of this chapter.
Suppose that you want to format the document as a table, as shown in Figure 1.7.

Figure 1.7: Presenting an XML document in a browser using HTML.
Using XML Spy, a tool that I discuss on Sunday morning and Sunday afternoon, you
can easily generate the necessary XSL with drag-and-drop editing. Here's a fragment
of the XSL that the parser uses to perform the transformation (note that this is only a
small part of the complete document):
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="
  <xsl:template match="/">
    <html>
      <head/>
      <body>
        <xsl:for-each select="books">
          <xsl:for-each select="book">
            <xsl:if test="position()=1">
              <xsl:text disable-output-escaping="yes">&lt;table border="1"&gt;</xsl:text>
            </xsl:if>
            <xsl:if test="position()=1">
              <thead>
                <tr>
                  <td>Title</td>
                  <td>Author</td>
                  <td>ISBN</td>
                </tr>
              </thead>
            </xsl:if>
For the moment, you don't need to understand what the XSL means. The point is that
this is an XML document that happens to use the XSL vocabulary. The document
follows all of XML's rules with regard to start and end tags (and several other rules
that I'll describe in the next lesson). If you combine the complete XSL document with
the "books" XML document, you'll end up with the table shown in Figure 1.7. If you
want to display the "books" XML document in another format, such as a bulleted
listing, just change the XSL document and transform the XML document again. The
XML document remains the same, regardless of which format you choose to display
it in.
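To show the idea of keeping content separate from presentation, here's a short Python sketch that renders the same "books" document two ways. To be clear, this is not XSL; it's plain Python standing in for the style sheet, and the function names as_table and as_list are made up for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

BOOKS_XML = """<books>
  <book>
    <title>Learn XML In A Weekend</title>
    <author>Erik Westermann</author>
    <isbn>159200010X</isbn>
  </book>
</books>"""

FIELDS = ("title", "author", "isbn")


def as_table(root):
    """Present each book as one row of an HTML table."""
    rows = []
    for book in root.findall("book"):
        cells = "".join(
            "<td>" + escape(book.findtext(name, "")) + "</td>" for name in FIELDS
        )
        rows.append("<tr>" + cells + "</tr>")
    return '<table border="1">' + "".join(rows) + "</table>"


def as_list(root):
    """Present each book title as one item of a bulleted list."""
    items = []
    for book in root.findall("book"):
        items.append("<li>" + escape(book.findtext("title", "")) + "</li>")
    return "<ul>" + "".join(items) + "</ul>"


root = ET.fromstring(BOOKS_XML)
print(as_table(root))
print(as_list(root))
```

Notice that BOOKS_XML never changes. Only the presentation code differs, which is the role the XSL document plays in the real transformation.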
CDF
CDF, the Channel Definition Format, is an XML vocabulary invented by Microsoft to
automatically notify Web users that new content is available. That way, users can find
out about new content without having to actually visit the site.
CDF pushes information out to users who are interested in receiving updates. Web
publishers use CDF to describe the information they want to publish, and how
frequently they want to update interested users in any changes. When a Web
publisher changes its site, interested users' systems are automatically updated. In
fact, CDF is integrated into Microsoft Windows through the Active Desktop, so a user
can have Web site updates appear as part of his or her Windows desktop.
CDF also allows users to customize how they want to be notified when a Web site is
updated. Users can choose from several notification methods, including e-mail,
screen saver, desktop component, and channel. The first two formats are
self-explanatory. A desktop component is a special window that remains open on your
screen but resides on the desktop itself (where the wallpaper is). It always has the
latest information in it, and when you click on a link, it starts Internet Explorer and
opens the Web site.
Figure 1.8 shows a desktop component that the W3C publishes.

Figure 1.8: A desktop component displaying updates from the W3C Web site.
A channel is like an item in Internet Explorer's Favorites menu—you simply select the
channel, and IE opens up a page that has information about the Web site's updates.
The twist with the channel format is that you may be able to browse through some or
all of the content when you're not connected to the Internet. (The Web publisher
determines if you can view the content offline.) The channel format is a benefit to
mobile users, or users who prefer to use a portable device to catch up on the latest
from their favorite Web sites.
The only browser that's capable of working with CDF is IE. Microsoft submitted the
CDF format to the W3C in 1997 for consideration and possible development as a
widely accepted standard, but the W3C hasn't pursued the format since then.
MathML
Presenting mathematical expressions and equations in Web documents is usually
difficult, because most systems support only basic symbols for operators like
addition, subtraction, multiplication, and division.
MathML, the Mathematical Markup Language, meets the needs of a broad set of users,
including scientists, teachers, the publishing industry, and vendors of software tools
that allow you to create and manipulate mathematical expressions. It's a W3C
recommendation, which means it's a broadly accepted industry standard. For
example, Figure 1.9 shows a complex mathematical expression with characters that
most browsers, including IE, cannot display using standard HTML.

Figure 1.9: A mathematical equation based on a MathML document.
Note: The samples for this book include a page called testMathML.html in the
\XMLInAWeekend\chapter01 folder. To view it, you need to download and install a
browser that's capable of interpreting MathML documents, like the freely available
Amaya browser. On the Amaya Web site, select the Distributions option and pick the
download file for your operating system. Please see the Preface for information on
where to obtain the samples.
MathML can get rather complicated. For example, the following listing represents the
MathML for the expression in Figure 1.9:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE math PUBLIC "-//W3C//DTD MathML 2.0//EN"
"
<math
xmlns="
xmlns:xlink="
<mrow>
<mi>A</mi>
<mo>=</mo>
<mrow displaystyle='true'>
<msubsup>
<mo>∫</mo>
<mn>0</mn>
<mn>1</mn>
</msubsup>
<mfrac>
<mrow>
<mo movablelimits='true'>ln</mo>
<mrow>
<mo stretchy='false'>(</mo>

</math>
There are three types of MathML elements: presentation elements, content elements,
and interface elements. Presentation elements describe mathematical notation
structures, such as rows (mrow), identifiers (mi), and numbers (mn). Content
elements represent mathematical concepts like addition and constructs like matrices.
There is only one interface element: the math element. It allows MathML to coexist
with HTML, providing MathML-capable software with a general overview of the
MathML document. It also allows special style sheets (formatting instructions) to be
associated with MathML documents.
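Here's a small Python sketch that reads a tiny MathML document and counts its elements by name. The expression (A = 1) is a made-up example that's much simpler than the one in Figure 1.9, and the helper name local_name is my own.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# A tiny MathML document: the math interface element wraps presentation elements.
MATHML_DOC = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow>
    <mi>A</mi>
    <mo>=</mo>
    <mn>1</mn>
  </mrow>
</math>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.split("}", 1)[-1]


root = ET.fromstring(MATHML_DOC)
counts = Counter(local_name(element.tag) for element in root.iter())
print(local_name(root.tag), dict(counts))
```

The root is the math element, and everything inside it here is a presentation element: one row (mrow) holding an identifier (mi), an operator (mo), and a number (mn).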
DocBook
DocBook is an XML vocabulary designed to help publishers and authors create
books. Although DocBook works particularly well for books on computer software and
hardware, it's useful for other types of books too. It's not a W3C standard, but a
group called the Organization for the Advancement of Structured Information
Standards (OASIS) promotes its use and develops it, along with other important
industry specifications.
The following listing demonstrates some of the content from this chapter, marked up
using DocBook:
<chapter id="chapter1">
  <title>What is XML?</title>
  <warning>
    <para>I have changed the content a little</para>
  </warning>
  <para>XML provides a means to add <emphasis>structure</emphasis>
  to the data, making the structure more apparent. Here's the same
  file marked up with XML:</para>
  <programlisting><![CDATA[
<books>
  <book>
    <title>Learn XML In A Weekend</title>
    <author>Erik Westermann</author>
    <isbn>159200010X</isbn>
  </book>
</books>]]>
  </programlisting>
</chapter>
The preceding listing is based on the DocBook specification, which is a DTD (briefly
described in the "Elements of XML Documents" section earlier in the chapter). The
following listing is a very small fragment of the DTD that describes DocBook:
<![%book.element;[
<!ELEMENT book %ho; ((%div.title.content;)?, bookinfo?,
(dedication | toc | lot
| glossary | bibliography | preface
| %chapter.class; | reference | part
| %article.class;
| %appendix.class;
| %index.class;
| colophon)*)
%ubiq.inclusion;>
<!--end of book.element-->]]>
<!ENTITY % book.attlist "INCLUDE">
<![%book.attlist;[
<!ATTLIST book fpi CDATA #IMPLIED
%label.attrib;
%status.attrib;
%common.attrib;
%book.role.attrib;
%local.book.attrib;
>
<!--end of book.attlist-->]]>
<![%chapter.element;[
<!ELEMENT chapter %ho; (beginpage?,
chapterinfo?,
(%bookcomponent.title.content;),
(%nav.class;)*,
tocchap?,
(%bookcomponent.content;),
(%nav.class;)*)
%ubiq.inclusion;>
<!--end of chapter.element-->]]>
<!ENTITY % chapter.attlist "INCLUDE">
<![%chapter.attlist;[
<!ATTLIST chapter
%label.attrib;
%status.attrib;
%common.attrib;
%chapter.role.attrib;
%local.chapter.attrib;
>
<!--end of chapter.attlist-->]]>
SVG
SVG, Scalable Vector Graphics, is an XML vocabulary for describing
two-dimensional graphics. Most graphics on the Internet are bitmaps. A
bitmap is a file that contains information about a graphical image, including the
location and color of each individual picture element (pixel). Bitmaps store a lot of
information, so
the files can get very large. That's why it takes longer for pages with lots of graphics
to download into your browser.

SVG makes it possible to describe images using XML instead of a bitmap. It
describes an image in terms of its lines and curves instead of its individual picture
elements, making it much more descriptive and compact than bitmaps. For example,
Figure 1.10 shows a simple graphic that takes about 54,000 bytes to store in a
bitmap file (specifically, a JPG file). Expressing the same image using SVG requires
about 3,000 bytes, or one-eighteenth of the space.

Figure 1.10: A simple SVG-based image.
The following is a partial listing of the SVG used to generate the image in
Figure 1.10:
<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN"
"
<svg width="200" height="300">
  <path
    d="M122.966 199.448 C124.37 199.448 125.509 "
    style="fill:rgb(192,192,192);stroke:rgb(0,0,0);stroke-width:1"/>
  <text x="81px" y="91px" transform="translate(9 8) ">Learn</text>
  <text x="80px" y="111px" transform="translate(11 7) ">XML in a</text>
  <text x="77px" y="122px" transform="translate(0 1) ">Weekend</text>
</svg>
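To get a feel for why describing shapes takes less space than describing pixels, here's a rough Python sketch. It builds a small SVG document of the same 200 by 300 size and compares its length with an uncompressed bitmap at 3 bytes per pixel. These are my own assumptions: the book's 54,000-byte figure is for a compressed JPG, so the numbers here won't match it, and the rectangle is a stand-in for the real drawing.

```python
import xml.etree.ElementTree as ET

WIDTH, HEIGHT = 200, 300

# Describe the image by its shapes: one rectangle and one line of text.
svg = ET.Element(
    "svg", xmlns="http://www.w3.org/2000/svg", width=str(WIDTH), height=str(HEIGHT)
)
ET.SubElement(
    svg, "rect", x="10", y="10", width="180", height="280",
    style="fill:rgb(192,192,192);stroke:rgb(0,0,0);stroke-width:1",
)
label = ET.SubElement(svg, "text", x="81", y="91")
label.text = "Learn"

svg_bytes = ET.tostring(svg, encoding="utf-8")

# An uncompressed 24-bit bitmap stores 3 bytes for every pixel.
raw_bitmap_bytes = WIDTH * HEIGHT * 3

print(len(svg_bytes), "bytes of SVG versus", raw_bitmap_bytes, "bytes of raw bitmap")
```

The SVG size depends on how many shapes you describe, not on the width and height, which is why simple drawings stay small even when you scale them up.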
The following listing is a fragment of the DTD that describes the SVG vocabulary. It
doesn't include attribute and entity declarations:
<!-- =============================================================
PARTIAL DECLARATIONS CORRESPONDING TO: Document Structure
============================================================= -->
<!ENTITY % svgExt "" >
<!ELEMENT svg (desc|title|metadata|defs|
