Fundamental XML for Developers
Dr. Timothy M. Chester
Texas A&M University
Timothy M. Chester is. . .
•
Senior IT Manager, Texas A&M University
–
Application Development, Systems Integration, Developer Tools
& Training
•
Lecturer, Texas A&M College of Business
–
Courses on Business Programming Fundamentals (VB.NET,
C#), XML & Advanced Web Development.
•
Author
–
Visual Studio Magazine, Dr. Dobbs Journal, IT Professional
•
Consultant
–
President & Principal, eInternet Studios
•
Contact Information
–
E-mail:
–
Web:
You Are. . .
•
Software Developers
–
New to XML, Object Oriented Development
–
Require ‘basics’ of XML course
•
IT Managers
–
Need familiarity with XML basics and
terminology
–
Interested in how XML can affect both
software development and legacy system
integration
This session . . .
•
Assumes you know nothing about XML or
XML based technologies
•
Provides a basic introduction to XML
based technologies
•
Demonstrates some of the basics of
working with the DOM, XSLT, Schema,
WSDL, and SOAP.
Agenda
•
XML
•
Document Object Model (DOM)
•
XPATH
•
XSLT
•
Schema
•
WSDL
•
SOAP
•
Questions
Underlying Technologies
XML Is the Glue
[Figure: layered diagram with axes labeled Technology and Innovation]
•
Connect the Web: TCP/IP (connectivity; FTP, e-mail, Gopher)
•
Browse the Web: HTML (presentation; Web pages)
•
Program the Web: XML (connecting applications; Web services)
Evolution of Web
•
Generation 1: Static HTML (HTML)
•
Generation 2: Web Applications (HTML)
•
Generation 3: Web Services (HTML, XML)
Web Services Overview
Application Model
[Figure labels: YourCompany.com, Application Business Logic Tier, Data Access and Storage Tier, Internet + XML, End Users, Other Applications, Partner Web Service, Other Web Services]
Introducing XML
•
XML stands for Extensible Markup
Language. A markup language specifies
the structure and content of a document.
•
Because it is extensible, XML can be used
to create a wide variety of document
types.
Introducing XML
•
XML is a subset of the Standard Generalized
Markup Language (SGML), which was introduced
in the 1980s. SGML is very complex and can be
costly.
•
These reasons led to the creation of Hypertext
Markup Language (HTML), a more easily used
markup language. XML can be seen as sitting
between SGML and HTML – easier to learn than
SGML, but more robust than HTML.
The Limits of HTML
•
HTML was designed for formatting text on a Web page.
It was not designed for dealing with the content of a Web
page. Additional features have been added to HTML, but
they do not solve data description or cataloging issues in
an HTML document.
•
Because HTML is not extensible, it cannot be modified to
meet specific needs. Browser developers have added
features making HTML more robust, but this has resulted
in a confusing mix of different HTML standards.
Introducing XML
•
HTML cannot be applied consistently.
Different browsers implement different
standards, so the same document can
appear differently from one browser
to another.
Introduction to XML Markup
•
XML document (intro.xml)
–
Marks up message as XML
–
Commonly stored in text files
•
Extension .xml
1 <?xml version = "1.0"?>
2
3 <!-- Fig. 5.1 : intro.xml -->
4 <!-- Simple introduction to XML markup -->
5
6 <myMessage>
7    <message>Welcome to XML!</message>
8 </myMessage>
Line numbers are not part
of XML document. We
include them for clarity.
Document begins with declaration
that specifies XML version 1.0
Element message is
child element of root
element myMessage
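The root/child relationship in intro.xml can be checked from code. The following is a minimal sketch in Python, assuming the standard-library xml.etree.ElementTree parser (the slides themselves use MSXML); the names INTRO_XML and describe are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# The intro.xml document from the slide, without the display line numbers.
INTRO_XML = """<?xml version="1.0"?>
<!-- Fig. 5.1 : intro.xml -->
<!-- Simple introduction to XML markup -->
<myMessage>
   <message>Welcome to XML!</message>
</myMessage>
"""


def describe(xml_text):
    """Return the root tag and a list of (tag, text) pairs for its children."""
    root = ET.fromstring(xml_text)
    children = [(child.tag, child.text) for child in root]
    return root.tag, children


root_tag, children = describe(INTRO_XML)
print("root element:", root_tag)
for tag, text in children:
    print("child element:", tag, "contains:", text)
```

The parser reports myMessage as the single root element and message as its only child, matching the callouts on the slide.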
Introduction to XML Markup (cont.)
•
XML documents
–
Must contain exactly one root element
•
Attempting to create more than one root element is an error
–
Elements must be nested properly
•
Incorrect: <x><y>hello</x></y>
•
Correct: <x><y>hello</y></x>
–
Must be well-formed
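The well-formedness rules above can be tried against a real parser. This is a minimal sketch, assuming Python's standard-library xml.etree.ElementTree; the helper name is_well_formed and the samples dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False if it raises ParseError."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The nesting examples from the slide, plus a document with two root elements.
samples = {
    "properly nested": "<x><y>hello</y></x>",
    "improperly nested": "<x><y>hello</x></y>",
    "two root elements": "<x>one</x><x>two</x>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, "->", "well-formed" if ok else "rejected")
```

Only the properly nested sample is accepted; the parser rejects the other two instead of trying to repair them.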
XML Parsers
•
An XML processor (also called XML
parser) evaluates the document to make
sure it conforms to all XML specifications
for structure and syntax.
•
XML parsers are strict. It is this rigidity
built into XML that ensures XML code
accepted by the parser will work the same
everywhere.
XML Parsers
•
Microsoft’s parser is called MSXML and is
built directly into Internet Explorer versions 5.0 and above.
•
Netscape 6.0 and above parse XML with the
Mozilla (Gecko) engine, which relies on the
Expat parser.
Parsers and Well-formed XML
Documents (cont.)
•
XML parsers support
–
Document Object Model (DOM)
•
Builds tree structure containing document data in
memory
–
Simple API for XML (SAX)
•
Generates events when tags, comments, etc. are
encountered
–
(Events are notifications to the application)
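The difference between the two models can be seen by running both over the same document. This is a minimal sketch, assuming Python's standard-library xml.dom.minidom and xml.sax modules as stand-ins for the DOM and SAX support described above; the class name EventLogger and the variable names are hypothetical.

```python
import xml.sax
from xml.dom import minidom

DOC = "<myMessage><message>Welcome to XML!</message></myMessage>"

# DOM: the whole document is loaded into an in-memory tree, then navigated.
dom = minidom.parseString(DOC)
message_node = dom.documentElement.getElementsByTagName("message")[0]
dom_text = message_node.firstChild.data


# SAX: the parser calls back into the application as it meets each item.
class EventLogger(xml.sax.ContentHandler):
    """Record start-tag, end-tag, and character-data events in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # A parser may deliver text in several chunks; skip whitespace-only ones.
        if content.strip():
            self.events.append(("text", content))


logger = EventLogger()
xml.sax.parseString(DOC.encode("utf-8"), logger)

print("DOM text:", dom_text)
for event in logger.events:
    print("SAX event:", event)
```

With DOM the application asks the tree for what it wants after parsing finishes. With SAX nothing is retained unless the handler stores it, which is why SAX tends to suit large documents.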
Parsing an XML Document with
MSXML
•
XML document
–
Contains data
–
Does not contain formatting information
–
Load XML document into Internet Explorer 5.0
•
Document is parsed by MSXML.
•
Places plus (+) or minus (-) signs next to container elements
–
Plus sign indicates that all child elements are hidden
–
Clicking plus sign expands container element
»
Displays children
–
Minus sign indicates that all child elements are visible
–
Clicking minus sign collapses container element
»
Hides children
•
Error is generated if the document is not well-formed
XML document shown in IE6.
Character Set
•
XML documents may contain
–
Carriage returns
–
Line feeds
–
Unicode characters
•
Enables computers to process characters for
several languages
Characters vs. Markup
•
XML must differentiate between
–
Markup text
•
Enclosed in angle brackets (< and >)
–
e.g., child elements
–
Character data
•
Text between start tag and end tag
–
Welcome to XML!
–
Elements versus Attributes
White Space, Entity References
and Built-in Entities
•
Whitespace characters
–
Spaces, tabs, line feeds and carriage returns
•
Significant (preserved by application)
•
Insignificant (not preserved by application)
–
Normalization
»
Whitespace collapsed into single whitespace
character
»
Sometimes whitespace removed entirely
<markup>This   is   character   data</markup>
after normalization, becomes
<markup>This is character data</markup>
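Normalization of this kind can be sketched in a few lines. The following assumes Python and treats only the four XML whitespace characters (space, tab, carriage return, line feed). Trimming the ends is an assumption made here for illustration, since applications differ on whether leading and trailing whitespace is kept. The names XML_WHITESPACE and normalize_whitespace are hypothetical.

```python
import re

# One or more of the four XML whitespace characters.
XML_WHITESPACE = re.compile(r"[ \t\r\n]+")


def normalize_whitespace(text):
    """Collapse each whitespace run to one space, then trim both ends."""
    return XML_WHITESPACE.sub(" ", text).strip(" ")


before = "This   is\n\tcharacter    data"
after = normalize_whitespace(before)
print(repr(before))
print(repr(after))
```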
White Space, Entity References and
Built-in Entities (cont.)
•
XML-reserved characters
–
Ampersand (&)
–
Left-angle bracket (<)
–
Right-angle bracket (>)
–
Apostrophe (')
–
Double quote (")
•
Entity references
–
Allow XML-reserved characters to appear in character data
•
Begin with ampersand (&) and end with semicolon (;)
–
Prevent the parser from misinterpreting character data as markup
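A round trip shows entity references at work. This is a minimal sketch, assuming Python's standard-library xml.sax.saxutils.escape and xml.etree.ElementTree; the sample text and the <note> element are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'Tom & Jerry say "x < y" isn\'t markup'

# escape() handles &, < and > by default; the quote characters are
# supplied as extra replacements so all five reserved characters are covered.
escaped = escape(raw, {'"': "&quot;", "'": "&apos;"})

# Embedding the escaped text yields a well-formed document, and the
# parser turns the entity references back into the original characters.
parsed = ET.fromstring("<note>" + escaped + "</note>").text

# Embedding the raw text instead is rejected, because the bare "&" and
# "<" are read as the start of markup.
try:
    ET.fromstring("<note>" + raw + "</note>")
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False

print(escaped)
print(parsed)
print("raw text accepted:", raw_accepted)
```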