CHAPTER 6

XML Processing
One of the most remarkable aspects of the Microsoft .NET Framework is its deep integration with XML.
In many .NET applications, you won’t even be aware you’re using XML technologies—they’ll just be
used behind the scenes when you serialize a Microsoft ADO.NET DataSet, call a web service, or read
application settings from a Web.config configuration file. In other cases, you’ll want to work directly with
the System.Xml namespaces to manipulate Extensible Markup Language (XML) data. Common XML
tasks don’t just include parsing an XML file, but also include validating it against a schema, applying an
Extensible Stylesheet Language (XSL) transform to create a new document or Hypertext Markup
Language (HTML) page, and searching intelligently with XPath.
In .NET 3.5, Microsoft added LINQ to XML, which integrates XML handling into the LINQ model for
querying data sources. You can use the same keywords and syntax to query XML as you would a
collection or a database.
The recipes in this chapter describe how to do the following:
• Read, parse, and manipulate XML data (recipes 6-1, 6-2, 6-3, and 6-7)
• Search an XML document for specific nodes, either by name (recipe 6-4), by
namespace (recipe 6-5), or by using XPath (recipe 6-6)
• Validate an XML document with an XML schema (recipe 6-8)
• Serialize an object to XML (recipe 6-9), create an XML schema for a class (recipe 6-10),
and generate the source code for a class based on an XML schema (recipe 6-11)
• Transform an XML document to another document using an XSL Transformations
(XSLT) stylesheet (recipe 6-12)
• Use LINQ to XML to load, create, query, and modify XML trees (recipes 6-13, 6-14,
6-15, and 6-16)
6-1. Show the Structure of an XML Document in a TreeView
Problem
You need to display the structure and content of an XML document in a Windows-based application.
Solution
Load the XML document using the System.Xml.XmlDocument class. Create a recursive method that
converts a single XmlNode into a System.Windows.Forms.TreeNode, and have it call itself to walk through
the entire document.
How It Works
The .NET Framework provides several different ways to process XML documents. The one you use
depends in part upon your programming task. One of the most fully featured classes is XmlDocument,
which provides an in-memory representation of an XML document that conforms to the W3C Document
Object Model (DOM). The XmlDocument class allows you to browse through the nodes in any direction,
insert and remove nodes, and change the structure on the fly. For details of the DOM specification, go to
www.w3.org.
■ Note The XmlDocument class is not scalable for very large XML documents, because it holds the entire XML
content in memory at once. If you want a more memory-efficient alternative, and you can afford to read and
process the XML piece by piece, consider the XmlReader and XmlWriter classes described in recipe 6-7.
To use the XmlDocument class, simply create a new instance of the class and call the Load method with
a file name, a Stream, a TextReader, or an XmlReader object. It is also possible to read the XML from a
simple string with the LoadXml method. You can even supply a string with a URL that points to an XML
document on the Web using the Load method. The XmlDocument instance will be populated with the tree
of elements, or nodes, from the source document. The entry point for accessing these nodes is the root
element, which is provided through the XmlDocument.DocumentElement property. DocumentElement is an
XmlElement object that can contain one or more nested XmlNode objects, which in turn can contain more
XmlNode objects, and so on. An XmlNode is the basic ingredient of an XML file. Common XML nodes
include elements, attributes, comments, and contained text.
When dealing with an XmlNode or a class that derives from it (such as XmlElement or XmlAttribute),
you can use the following basic properties:
• ChildNodes is an XmlNodeList collection that contains the first level of nested
nodes.
• Name is the name of the node.
• NodeType returns a member of the System.Xml.XmlNodeType enumeration that
indicates the type of the node (element, attribute, text, and so on).
• Value is the content of the node for node types that carry a value, such as text, CDATA,
comment, and attribute nodes. It is null for elements.
• Attributes provides a collection of node objects representing the attributes
applied to the element.
• InnerText retrieves a string with the concatenated value of the node and all nested
nodes.
• InnerXml retrieves a string with the concatenated XML markup for all nested
nodes.
• OuterXml retrieves a string with the concatenated XML markup for the current
node and all nested nodes.
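The properties listed above are easiest to compare side by side on a small fragment. The following is a minimal sketch, not part of the recipe's sample code: the class name NodePropertiesSketch and the inline markup are made up for illustration, and the fragment is loaded with LoadXml rather than from a file.

```csharp
using System;
using System.Xml;

public static class NodePropertiesSketch
{
    // Hypothetical sample markup, loosely based on the product catalog.
    public const string SampleXml =
        "<product id=\"1001\"><productName>Gourmet Coffee</productName>" +
        "<productPrice>0.99</productPrice></product>";

    public static XmlElement LoadRoot()
    {
        XmlDocument doc = new XmlDocument();
        // LoadXml parses a string; Load accepts a file name, stream, or reader.
        doc.LoadXml(SampleXml);
        return doc.DocumentElement;
    }

    public static void Main()
    {
        XmlElement root = LoadRoot();

        Console.WriteLine(root.Name);                   // Name of the element
        Console.WriteLine(root.NodeType);               // Member of XmlNodeType
        Console.WriteLine(root.Value == null);          // Elements have no Value of their own
        Console.WriteLine(root.Attributes["id"].Value); // Attributes are kept apart from ChildNodes
        Console.WriteLine(root.ChildNodes.Count);       // First level of nested nodes only
        Console.WriteLine(root.InnerText);              // Text of all nested nodes, concatenated
        Console.WriteLine(root.FirstChild.InnerXml);    // Markup of the nested nodes
        Console.WriteLine(root.FirstChild.OuterXml);    // Markup including the node itself
    }
}
```

Note that the id attribute does not count toward ChildNodes; attributes are reached only through the Attributes collection.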
The Code
The following example walks through every element of an XmlDocument using the ChildNodes property
and a recursive method. Each node is displayed in a TreeView control, with descriptive text that either
identifies it or shows its content.

using System;
using System.Windows.Forms;
using System.Xml;
using System.IO;

namespace Apress.VisualCSharpRecipes.Chapter06
{
    public partial class Recipe06_01 : System.Windows.Forms.Form
    {
        public Recipe06_01()
        {
            InitializeComponent();
        }

        // Default the file name to the sample document.
        private void Recipe06_01_Load(object sender, EventArgs e)
        {
            txtXmlFile.Text = Path.Combine(Application.StartupPath,
                @"..\..\ProductCatalog.xml");
        }

        private void cmdLoad_Click(object sender, System.EventArgs e)
        {
            // Clear the tree.
            treeXml.Nodes.Clear();

            // Load the XML document.
            XmlDocument doc = new XmlDocument();
            try
            {
                doc.Load(txtXmlFile.Text);
            }
            catch (Exception err)
            {
                MessageBox.Show(err.Message);
                return;
            }

            // Populate the TreeView.
            ConvertXmlNodeToTreeNode(doc, treeXml.Nodes);

            // Expand all nodes.
            treeXml.Nodes[0].ExpandAll();
        }

        private void ConvertXmlNodeToTreeNode(XmlNode xmlNode,
            TreeNodeCollection treeNodes)
        {
            // Add a TreeNode node that represents this XmlNode.
            TreeNode newTreeNode = treeNodes.Add(xmlNode.Name);

            // Customize the TreeNode text based on the XmlNode
            // type and content.
            switch (xmlNode.NodeType)
            {
                case XmlNodeType.ProcessingInstruction:
                case XmlNodeType.XmlDeclaration:
                    newTreeNode.Text = "<?" + xmlNode.Name + " " +
                        xmlNode.Value + "?>";
                    break;
                case XmlNodeType.Element:
                    newTreeNode.Text = "<" + xmlNode.Name + ">";
                    break;
                case XmlNodeType.Attribute:
                    newTreeNode.Text = "ATTRIBUTE: " + xmlNode.Name;
                    break;
                case XmlNodeType.Text:
                case XmlNodeType.CDATA:
                    newTreeNode.Text = xmlNode.Value;
                    break;
                case XmlNodeType.Comment:
                    newTreeNode.Text = "<!--" + xmlNode.Value + "-->";
                    break;
            }

            // Call this routine recursively for each attribute.
            // (XmlAttribute is a subclass of XmlNode.)
            if (xmlNode.Attributes != null)
            {
                foreach (XmlAttribute attribute in xmlNode.Attributes)
                {
                    ConvertXmlNodeToTreeNode(attribute, newTreeNode.Nodes);
                }
            }

            // Call this routine recursively for each child node.
            // Typically, this child node represents a nested element
            // or element content.
            foreach (XmlNode childNode in xmlNode.ChildNodes)
            {
                ConvertXmlNodeToTreeNode(childNode, newTreeNode.Nodes);
            }
        }
    }
}
Usage
As an example, consider the following simple XML file (which is included with the sample code as the
ProductCatalog.xml file):

<?xml version="1.0" ?>
<!-- This document is a sample catalog for demonstration purposes -->
<productCatalog>
  <catalogName>Freeman and Freeman Unique Catalog 2010</catalogName>
  <expiryDate>2012-01-01</expiryDate>
  <products>
    <product id="1001">
      <productName>Gourmet Coffee</productName>
      <description>Beans from rare Chilean plantations.</description>
      <productPrice>0.99</productPrice>
      <inStock>true</inStock>
    </product>
    <product id="1002">
      <productName>Blue China Tea Pot</productName>
      <description>A trendy update for tea drinkers.</description>
      <productPrice>102.99</productPrice>
      <inStock>true</inStock>
    </product>
  </products>
</productCatalog>

Figure 6-1 shows how this file will be rendered in the Recipe06_01 form.
Figure 6-1. The displayed structure of an XML document
6-2. Insert Nodes in an XML Document
Problem
You need to modify an XML document by inserting new data, or you want to create an entirely new XML
document in memory.
Solution
Create the node using the appropriate XmlDocument method (such as CreateElement, CreateAttribute,
CreateNode, and so on). Then insert it using the appropriate XmlNode method (such as InsertAfter,
InsertBefore, or AppendChild).
How It Works
Inserting a node into an XmlDocument is a two-step process. You must first create the node, and
then you insert it at the appropriate location. You can then call XmlDocument.Save to persist changes.
To create a node, you use one of the XmlDocument methods starting with the word Create, depending
on the type of node. This ties the new node to the document that created it; a node created by one
XmlDocument cannot be inserted into another without first being imported with ImportNode. (To place
the node in a specific XML namespace, supply the namespace URI as an additional string argument.)
Next, you must find a suitable related node and use one of its insertion methods to add the new node
to the tree.
The Code
The following example demonstrates this technique by programmatically creating a new XML
document:

using System;
using System.Xml;

namespace Apress.VisualCSharpRecipes.Chapter06
{
    public class Recipe06_02
    {
        private static void Main()
        {
            // Create a new, empty document.
            XmlDocument doc = new XmlDocument();
            XmlNode docNode = doc.CreateXmlDeclaration("1.0", "UTF-8", null);
            doc.AppendChild(docNode);

            // Create and insert a new element.
            XmlNode productsNode = doc.CreateElement("products");
            doc.AppendChild(productsNode);

            // Create a nested element (with an attribute).
            XmlNode productNode = doc.CreateElement("product");
            XmlAttribute productAttribute = doc.CreateAttribute("id");
            productAttribute.Value = "1001";
            productNode.Attributes.Append(productAttribute);
            productsNode.AppendChild(productNode);

            // Create and add the subelements for this product node
            // (with contained text data).
            XmlNode nameNode = doc.CreateElement("productName");
            nameNode.AppendChild(doc.CreateTextNode("Gourmet Coffee"));
            productNode.AppendChild(nameNode);
            XmlNode priceNode = doc.CreateElement("productPrice");
            priceNode.AppendChild(doc.CreateTextNode("0.99"));
            productNode.AppendChild(priceNode);

            // Create and add another product node.
            productNode = doc.CreateElement("product");
            productAttribute = doc.CreateAttribute("id");
            productAttribute.Value = "1002";
            productNode.Attributes.Append(productAttribute);
            productsNode.AppendChild(productNode);
            nameNode = doc.CreateElement("productName");
            nameNode.AppendChild(doc.CreateTextNode("Blue China Tea Pot"));
            productNode.AppendChild(nameNode);
            priceNode = doc.CreateElement("productPrice");
            priceNode.AppendChild(doc.CreateTextNode("102.99"));
            productNode.AppendChild(priceNode);

            // Save the document (to the console window rather than a file).
            doc.Save(Console.Out);

            Console.ReadLine();
        }
    }
}

When you run this code, the generated XML document looks like this:
<?xml version="1.0"?>
<products>
  <product id="1001">
    <productName>Gourmet Coffee</productName>
    <productPrice>0.99</productPrice>
  </product>
  <product id="1002">
    <productName>Blue China Tea Pot</productName>
    <productPrice>102.99</productPrice>
  </product>
</products>
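
The listing in this recipe only appends nodes to the end of their parent. As a rough sketch of the other two insertion methods named in the solution, the following example places one product before an existing node with InsertBefore and another after it with InsertAfter. The class name InsertPositionSketch and the inline markup are illustrative assumptions, and XmlElement.SetAttribute is used here as a shortcut for the CreateAttribute and Append calls shown earlier.

```csharp
using System;
using System.Xml;

public static class InsertPositionSketch
{
    public static XmlDocument Build()
    {
        XmlDocument doc = new XmlDocument();
        // Hypothetical starting document with a single product.
        doc.LoadXml("<products><product id=\"1002\" /></products>");

        XmlNode products = doc.DocumentElement;
        XmlNode existing = products.FirstChild;

        // Insert a new product in front of the existing one.
        XmlElement first = doc.CreateElement("product");
        first.SetAttribute("id", "1001");
        products.InsertBefore(first, existing);

        // Insert another product directly after the existing one.
        XmlElement last = doc.CreateElement("product");
        last.SetAttribute("id", "1003");
        products.InsertAfter(last, existing);

        return doc;
    }

    public static void Main()
    {
        // List the id of each product in document order.
        foreach (XmlNode product in Build().DocumentElement.ChildNodes)
        {
            Console.WriteLine(product.Attributes["id"].Value);
        }
    }
}
```

Both methods are called on the parent node and take the new node first and the reference node second.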
6-3. Quickly Append Nodes in an XML Document
Problem
You need to add nodes to an XML document without requiring lengthy, verbose code.
Solution
Create a helper function that accepts a tag name and content, and can generate the entire element at
once. Alternatively, use the XmlDocument.CloneNode method to copy branches of an XmlDocument.
How It Works
Inserting a single element into an XmlDocument requires several lines of code. You can shorten this code
in several ways. One approach is to create a dedicated helper class with higher-level methods for adding
elements and attributes. For example, you could create an AddElement method that generates a new
element, inserts it, and adds any contained text—the three operations needed to insert most elements.
The Code
Here’s an example of one such helper class:

using System;
using System.Xml;

namespace Apress.VisualCSharpRecipes.Chapter06
{
    public class XmlHelper
    {
        public static XmlNode AddElement(string tagName,
            string textContent, XmlNode parent)
        {
            XmlNode node = parent.OwnerDocument.CreateElement(tagName);
            parent.AppendChild(node);

            if (textContent != null)
            {
                XmlNode content;
                content = parent.OwnerDocument.CreateTextNode(textContent);
                node.AppendChild(content);
            }
            return node;
        }

        public static XmlNode AddAttribute(string attributeName,
            string textContent, XmlNode parent)
        {
            XmlAttribute attribute;
            attribute = parent.OwnerDocument.CreateAttribute(attributeName);
            attribute.Value = textContent;
            parent.Attributes.Append(attribute);

            return attribute;
        }
    }
}

You can now condense the XML-generating code from recipe 6-2 with the simpler syntax shown
here:

public class Recipe06_03
{
    private static void Main()
    {
        // Create the basic document.
        XmlDocument doc = new XmlDocument();
        XmlNode docNode = doc.CreateXmlDeclaration("1.0", "UTF-8", null);
        doc.AppendChild(docNode);
        XmlNode products = doc.CreateElement("products");
        doc.AppendChild(products);

        // Add two products.
        XmlNode product = XmlHelper.AddElement("product", null, products);
        XmlHelper.AddAttribute("id", "1001", product);
        XmlHelper.AddElement("productName", "Gourmet Coffee", product);
        XmlHelper.AddElement("productPrice", "0.99", product);

        product = XmlHelper.AddElement("product", null, products);
        XmlHelper.AddAttribute("id", "1002", product);
        XmlHelper.AddElement("productName", "Blue China Tea Pot", product);
        XmlHelper.AddElement("productPrice", "102.99", product);

        // Save the document (to the console window rather than a file).
        doc.Save(Console.Out);
        Console.ReadLine();
    }
}

Alternatively, you might want to take the helper methods such as AddAttribute and AddElement and
make them instance methods in a custom class you derive from XmlDocument.
Another approach to simplifying writing XML is to duplicate nodes using the XmlNode.CloneNode
method. CloneNode accepts a Boolean deep parameter. If you supply true, CloneNode will duplicate the
entire branch, with all nested nodes.
Here is an example that creates a new product node by copying the first node:

// (Add first product node.)

// Create a new element based on an existing product.
product = product.CloneNode(true);

// Modify the node data.
product.Attributes[0].Value = "1002";
product.ChildNodes[0].ChildNodes[0].Value = "Blue China Tea Pot";
product.ChildNodes[1].ChildNodes[0].Value = "102.99";

// Add the new element.
products.AppendChild(product);

Notice that in this case, certain assumptions are being made about the existing nodes (for example,
that the first child in the product node is always the name, and the second child is always the price). If this
assumption is not guaranteed to be true, you might need to examine the node name programmatically.
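One way to avoid depending on child positions is to look up each child by name. The sketch below is an illustration under assumptions rather than part of the recipe's sample code: the class name CloneByNameSketch and the inline document are invented, and it relies on the XmlNode indexer that returns the first child element with a given name, together with the InnerText setter.

```csharp
using System;
using System.Xml;

public static class CloneByNameSketch
{
    public static XmlDocument Build()
    {
        XmlDocument doc = new XmlDocument();
        // Hypothetical starting document with one product.
        doc.LoadXml(
            "<products><product id=\"1001\">" +
            "<productName>Gourmet Coffee</productName>" +
            "<productPrice>0.99</productPrice>" +
            "</product></products>");

        XmlElement original = (XmlElement)doc.DocumentElement.FirstChild;

        // Deep-clone the branch. The copy is detached until it is appended.
        XmlElement copy = (XmlElement)original.CloneNode(true);

        // Address the attribute and the children by name, not by position.
        copy.SetAttribute("id", "1002");
        copy["productName"].InnerText = "Blue China Tea Pot";
        copy["productPrice"].InnerText = "102.99";

        doc.DocumentElement.AppendChild(copy);
        return doc;
    }

    public static void Main()
    {
        Console.WriteLine(Build().OuterXml);
    }
}
```

The indexer returns null when no child element has the requested name, so production code would normally check the result before using it.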
6-4. Find Specific Elements by Name
Problem
You need to retrieve a specific node from an XmlDocument, and you know its name but not its position.
Solution
Use the XmlDocument.GetElementsByTagName method, which searches an entire document and returns a
System.Xml.XmlNodeList containing any matches.
How It Works
The XmlDocument class provides a convenient GetElementsByTagName method that searches an entire
document for nodes that have the indicated element name. It returns the results as a collection of
XmlNode objects.
The Code
The following code demonstrates how you could use GetElementsByTagName to calculate the total price of
items in a catalog by retrieving all elements with the name productPrice:

using System;
using System.Globalization;
using System.Xml;

namespace Apress.VisualCSharpRecipes.Chapter06
{
    public class Recipe06_04
    {
        private static void Main()
        {
            // Load the document.
            XmlDocument doc = new XmlDocument();
            doc.Load(@"..\..\ProductCatalog.xml");

            // Retrieve all prices.
            XmlNodeList prices = doc.GetElementsByTagName("productPrice");

            decimal totalPrice = 0;
            foreach (XmlNode price in prices)
            {
                // Get the inner text of each matching element. Parse with the
                // invariant culture so that "0.99" is read the same way
                // regardless of the regional settings of the machine.
                totalPrice += Decimal.Parse(price.ChildNodes[0].Value,
                    CultureInfo.InvariantCulture);
            }

            Console.WriteLine("Total catalog value: " + totalPrice.ToString());
            Console.ReadLine();
        }
    }
}
Notes
You can also search portions of an XML document by using the XmlElement.GetElementsByTagName
method. It searches all the descendant nodes looking for matches. To use this method, first retrieve an
XmlNode that corresponds to an element. Then cast this object to an XmlElement. The following example
demonstrates how to find the price node under the first product element:

// Retrieve a reference to the first product.
XmlNode product = doc.GetElementsByTagName("product")[0];

// Find the price under this product.
XmlNode productPrice
    = ((XmlElement)product).GetElementsByTagName("productPrice")[0];
Console.WriteLine("Price is " + productPrice.InnerText);

If your elements include an attribute of type ID (declared as such in the document's DTD), you can
also use a method called GetElementById to retrieve an element that has a matching ID value.
6-5. Get XML Nodes in a Specific XML Namespace
Problem
You need to retrieve nodes from a specific namespace using an XmlDocument.
Solution
Use the overload of the XmlDocument.GetElementsByTagName method that requires a namespace name as
a string argument. Additionally, supply an asterisk (*) for the element name if you want to match all
tags.
How It Works
Many XML documents contain nodes from more than one namespace. For example, an XML document
that represents a scientific article might use a separate type of markup for denoting math equations and
vector diagrams, or an XML document with information about a purchase order might aggregate client
and order information with a shipping record. Similarly, an XML document that represents a
business-to-business transaction might include portions from both companies, written in separate markup
languages.
A common task in XML programming is to retrieve the elements found in a specific namespace. You
can perform this task with the overloaded version of the XmlDocument.GetElementsByTagName method that
requires a namespace name. You can use this method to find tags by name or to find all the tags in the
specified namespace if you supply an asterisk for the tag name parameter.
The Code
As an example, consider the following compound XML document, which includes order and client
information, in two different namespaces (http://mycompany/OrderML and http://mycompany/ClientML):

<?xml version="1.0" ?>
<ord:order xmlns:ord="http://mycompany/OrderML"
           xmlns:cli="http://mycompany/ClientML">

  <cli:client>
    <cli:firstName>Sally</cli:firstName>
    <cli:lastName>Sergeyeva</cli:lastName>
  </cli:client>

  <ord:orderItem itemNumber="3211"/>
  <ord:orderItem itemNumber="1155"/>

</ord:order>

Here is a simple console application that selects all the tags in the http://mycompany/OrderML
namespace:

using System;
using System.Xml;

namespace Apress.VisualCSharpRecipes.Chapter06
{
    public class Recipe06_05
    {
        private static void Main()
        {
            // Load the document.
            XmlDocument doc = new XmlDocument();
            doc.Load(@"..\..\Order.xml");

            // Retrieve all order tags.
            XmlNodeList matches = doc.GetElementsByTagName("*",
                "http://mycompany/OrderML");

            // Display all the information.
            Console.WriteLine("Element \tAttributes");
            Console.WriteLine("******* \t**********");

            foreach (XmlNode node in matches)
            {
                Console.Write(node.Name + "\t");
                foreach (XmlAttribute attribute in node.Attributes)
                {
                    Console.Write(attribute.Value + " ");
                }
                Console.WriteLine();
            }
            Console.ReadLine();
        }
    }
}

The output of this program is as follows:
Element Attributes
******* **********
ord:order http://mycompany/OrderML http://mycompany/ClientML
ord:orderItem 3211
ord:orderItem 1155
6-6. Find Elements with an XPath Search
Problem
You need to search an XML document for nodes using advanced search criteria. For example, you might
want to search a particular branch of an XML document for nodes that have certain attributes or contain
a specific number of nested child nodes.
Solution
Execute an XPath expression using the SelectNodes or SelectSingleNode method of the XmlDocument
class.
How It Works
The XmlNode class defines two methods that perform XPath searches: SelectNodes and SelectSingleNode.
These methods operate on all contained child nodes. Because the XmlDocument inherits from XmlNode,
you can call XmlDocument.SelectNodes to search an entire document.
The Code
For example, consider the following XML document, which represents an order for two items. This
document includes text and numeric data, nested elements, and attributes, and so is a good way to test
simple XPath expressions.

<?xml version="1.0"?>
<Order id="2004-01-30.195496">
  <Client id="ROS-930252034">
    <Name>Remarkable Office Supplies</Name>
  </Client>

  <Items>
    <Item id="1001">
      <Name>Electronic Protractor</Name>
      <Price>42.99</Price>
    </Item>
    <Item id="1002">
      <Name>Invisible Ink</Name>
      <Price>200.25</Price>
    </Item>
  </Items>
</Order>

Basic XPath syntax uses a pathlike notation. For example, the path /Order/Items/Item indicates an
<Item> element that is nested inside an <Items> element, which in turn is nested in a root <Order>
element. This is an absolute path. The following example uses an XPath absolute path to find the name
of every item in an order:

using System;
using System.Xml;

namespace Apress.VisualCSharpRecipes.Chapter06
{
    public class Recipe06_06
    {
        private static void Main()
        {
            // Load the document.
            XmlDocument doc = new XmlDocument();
            doc.Load(@"..\..\orders.xml");

            // Retrieve the name of every item.
            // This could not be accomplished as easily with the
            // GetElementsByTagName method, because Name elements are
            // used in Item elements and Client elements, and so
            // both types would be returned.
            XmlNodeList nodes = doc.SelectNodes("/Order/Items/Item/Name");

            foreach (XmlNode node in nodes)
            {
                Console.WriteLine(node.InnerText);
            }
            Console.ReadLine();
        }
    }
}

The output of this program is as follows:

Electronic Protractor
Invisible Ink
Notes
XPath provides a rich and powerful search syntax, and it is impossible to explain all the variations you
can use in a short recipe. However, Table 6-1 outlines some of the key ingredients in more advanced
XPath expressions and includes examples that show how they would work with the order document. For
a more detailed reference, refer to the W3C XPath recommendation, at www.w3.org/TR/xpath.
Table 6-1. XPath Expression Syntax

Expression: /
Description: Starts an absolute path that selects from the root node.
Example: /Order/Items/Item selects all Item elements that are children of an Items element, which is
itself a child of the root Order element.

Expression: //
Description: Starts a relative path that selects nodes anywhere.
Example: //Item/Name selects all the Name elements that are children of an Item element, regardless of
where they appear in the document.

Expression: @
Description: Selects an attribute of a node.
Example: /Order/@id selects the attribute named id from the root Order element.

Expression: *
Description: Selects any element in the path.
Example: /Order/* selects both Items and Client nodes because both are contained by a root Order
element.

Expression: |
Description: Combines multiple paths.
Example: /Order/Items/Item/Name | /Order/Client/Name selects the Name nodes used to describe a Client
and the Name nodes used to describe an Item.

Expression: .
Description: Indicates the current (default) node.
Example: If the current node is an Order, the expression ./Items refers to the related items for that
order.

Expression: ..
Description: Indicates the parent node.
Example: //Name/.. selects any element that is parent to a Name, which includes the Client and Item
elements.

Expression: [ ]
Description: Defines selection criteria that can test a contained node or an attribute value.
Example: /Order[@id="2004-01-30.195496"] selects the Order elements with the indicated attribute value.
/Order/Items/Item[Price > 50] selects products higher than $50 in price.
/Order/Items/Item[Price > 50 and Name="Laser Printer"] selects products that match two criteria.

Expression: starts-with
Description: Retrieves elements based on what text a contained element starts with.
Example: /Order/Items/Item[starts-with(Name, "C")] finds all Item elements that have a Name element
that starts with the letter C.

Expression: position
Description: Retrieves elements based on position.
Example: /Order/Items/Item[position()=2] selects the second Item element.

Expression: count
Description: Counts elements. You specify the name of the child element to count or an asterisk (*) for
all children.
Example: /Order/Items/Item[count(Price) = 1] retrieves Item elements that have exactly one nested Price
element.
■ Note XPath expressions and all element and attribute names you use inside them are always case-sensitive,
because XML itself is case-sensitive.
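To try a few of these expressions against the order document, you can load it from a string and call SelectSingleNode or SelectNodes. The following is a small sketch under assumptions: the class name XPathSketch is invented, and the order document from this recipe is embedded inline rather than loaded from orders.xml.

```csharp
using System;
using System.Xml;

public static class XPathSketch
{
    // The order document from this recipe, embedded for convenience.
    public const string OrderXml =
        "<Order id=\"2004-01-30.195496\">" +
        "<Client id=\"ROS-930252034\"><Name>Remarkable Office Supplies</Name></Client>" +
        "<Items>" +
        "<Item id=\"1001\"><Name>Electronic Protractor</Name><Price>42.99</Price></Item>" +
        "<Item id=\"1002\"><Name>Invisible Ink</Name><Price>200.25</Price></Item>" +
        "</Items></Order>";

    public static XmlDocument LoadOrder()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(OrderXml);
        return doc;
    }

    public static void Main()
    {
        XmlDocument doc = LoadOrder();

        // @ selects an attribute; the result is an attribute node.
        Console.WriteLine(doc.SelectSingleNode("/Order/@id").Value);

        // [ ] filters on the numeric value of a nested element.
        foreach (XmlNode item in doc.SelectNodes("/Order/Items/Item[Price > 50]"))
        {
            Console.WriteLine(item["Name"].InnerText);
        }

        // | combines the results of two paths.
        Console.WriteLine(
            doc.SelectNodes("/Order/Items/Item/Name | /Order/Client/Name").Count);

        // . makes a path relative to the node the method is called on.
        XmlNode items = doc.SelectSingleNode("/Order/Items");
        Console.WriteLine(items.SelectNodes("./Item").Count);

        // .. steps up to the parent of each match.
        Console.WriteLine(doc.SelectNodes("//Name/..").Count);
    }
}
```

SelectSingleNode returns null when nothing matches, whereas SelectNodes returns an empty list, so only the former needs a null check in real code.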
6-7. Read and Write XML Without Loading an Entire Document into Memory
Problem
You need to read XML from a stream or write it to a stream. However, you want to process the
information one node at a time, rather than loading it all into memory with an XmlDocument.
Solution
To write XML, create an XmlWriter that wraps a stream and use Write methods (such as
WriteStartElement and WriteEndElement). To read XML, create an XmlReader that wraps a stream, and
call Read to move from node to node.
How It Works
The XmlWriter and XmlReader classes read or write XML directly from a stream one node at a time. These
classes do not provide the same features for navigating and manipulating your XML as XmlDocument, but
they do provide higher performance and a smaller memory footprint, particularly if you need to deal
with large XML documents.
Both XmlWriter and XmlReader are abstract classes, which means you cannot create an instance of
them directly. Instead, you should call the Create method of XmlWriter or XmlReader and supply a file or
stream. The Create method will return the right derived class based on the options you specify. This
allows for a more flexible model. Because your code uses the base classes, it can work seamlessly with
any derived class. For example, you could switch to a validating reader (as shown in the next recipe)
without needing to modify your code.
To write XML to any stream, you can use the streamlined XmlWriter. It provides Write methods that
write one node at a time. These include the following:
• WriteStartDocument, which writes the document prologue, and WriteEndDocument,
which closes any open elements at the end of the document.
• WriteStartElement, which writes an opening tag for the element you specify. You
can then add more elements nested inside this element, or you can call
WriteEndElement to write the closing tag.
• WriteElementString, which writes an entire element, with an opening tag, a
closing tag, and text content.
• WriteAttributeString, which writes an entire attribute for the nearest open
element, with a name and value.
Using these methods usually requires less code than creating an XmlDocument by hand, as
demonstrated in recipes 6-2 and 6-3.
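As a compact illustration of these Write methods, the following sketch writes a single product into a StringBuilder instead of a file. The class name WriterSketch is an invention for this example, and the XmlWriterSettings values (indentation on, XML declaration omitted) are choices made here to keep the result short, not requirements of the recipe.

```csharp
using System;
using System.Text;
using System.Xml;

public static class WriterSketch
{
    public static string WriteProducts()
    {
        // Settings are optional; these two are chosen for readable output.
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;
        settings.OmitXmlDeclaration = true;

        StringBuilder output = new StringBuilder();

        // Disposing the writer flushes any buffered content to the builder.
        using (XmlWriter w = XmlWriter.Create(output, settings))
        {
            w.WriteStartElement("products");

            w.WriteStartElement("product");
            w.WriteAttributeString("id", "1001");   // Applies to the open <product> tag
            w.WriteElementString("productName", "Gourmet Coffee");
            w.WriteElementString("productPrice", "0.99");
            w.WriteEndElement();                    // Closes <product>

            w.WriteEndElement();                    // Closes <products>
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(WriteProducts());
    }
}
```

Attributes must be written immediately after WriteStartElement, before any nested element or text is written for that element.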
To read the XML, you use the Read method of the XmlReader. This method advances the reader to the
next node and returns true. If no more nodes can be found, it returns false. You can retrieve
information about the current node through XmlReader properties, including its Name, Value, and
NodeType.
To find out whether an element has attributes, you must explicitly test the HasAttributes property
and then use the GetAttribute method to retrieve the attributes by name or index number. The
XmlReader class can access only one node at a time, and it cannot move backward or jump to an
arbitrary node, which gives much less flexibility than the XmlDocument class.
The Code
The following console application writes and reads a simple XML document using the XmlWriter and
XmlReader classes. This is the same XML document created in recipes 6-2 and 6-3 using the XmlDocument
class.

using System;
using System.Xml;
using System.IO;
using System.Text;

namespace Apress.VisualCSharpRecipes.Chapter06
{
    public class Recipe06_07
    {
        private static void Main()
        {
            // Create the file and writer.
            FileStream fs = new FileStream("products.xml", FileMode.Create);

            // If you want to configure additional details (like indenting,
            // encoding, and new line handling), use the overload of the Create
            // method that accepts an XmlWriterSettings object instead.
            XmlWriter w = XmlWriter.Create(fs);

            // Start the document.
            w.WriteStartDocument();
            w.WriteStartElement("products");

            // Write a product.
            w.WriteStartElement("product");
            w.WriteAttributeString("id", "1001");
            w.WriteElementString("productName", "Gourmet Coffee");
            w.WriteElementString("productPrice", "0.99");
            w.WriteEndElement();

            // Write another product.
            w.WriteStartElement("product");
            w.WriteAttributeString("id", "1002");
            w.WriteElementString("productName", "Blue China Tea Pot");
            w.WriteElementString("productPrice", "102.99");
            w.WriteEndElement();

            // End the document.
            w.WriteEndElement();
            w.WriteEndDocument();
            w.Flush();
            fs.Close();

            Console.WriteLine("Document created. " +
                "Press Enter to read the document.");
            Console.ReadLine();

            fs = new FileStream("products.xml", FileMode.Open);

            // If you want to configure additional details (like comments,
            // whitespace handling, or validation), use the overload of the Create
            // method that accepts an XmlReaderSettings object instead.
            XmlReader r = XmlReader.Create(fs);

            // Read all nodes.
            while (r.Read())
            {
                if (r.NodeType == XmlNodeType.Element)
                {
                    Console.WriteLine();
                    Console.WriteLine("<" + r.Name + ">");

                    if (r.HasAttributes)
                    {
                        for (int i = 0; i < r.AttributeCount; i++)
                        {
                            Console.WriteLine("\tATTRIBUTE: " +
                                r.GetAttribute(i));
                        }
                    }
                }
                else if (r.NodeType == XmlNodeType.Text)
                {
                    Console.WriteLine("\tVALUE: " + r.Value);
                }
            }

            // Release the reader and the underlying file.
            r.Close();
            fs.Close();

            Console.ReadLine();
        }
    }
}

Often, when using the XmlReader, you are searching for specific nodes rather than processing every
element, as in this example. The approach used in this example does not work as well in this situation. It
forces you to read element tags, text content, and CDATA sections separately, which means you need to
explicitly keep track of where you are in the document. A better approach is to read the entire node and
text content at once (for simple text-only nodes) by using the ReadElementString method. You can also
use methods such as ReadToDescendant, ReadToFollowing, and ReadToNextSibling, all of which allow you
to skip some nodes.
For example, you can use ReadToFollowing("Price"); to skip straight to the next Price element,
without worrying about whitespace, comments, or other elements before it. (If a Price element cannot
be found, the XmlReader moves to the end of the document, and the ReadToFollowing method returns
false.)
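The skipping approach can be sketched as follows. This is an illustration under assumptions rather than the recipe's own code: the class name ReaderSkipSketch is invented, the element name productPrice comes from the products document written earlier in this recipe, and ReadElementContentAsString is used here as one way to read a text-only element in a single call.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ReaderSkipSketch
{
    public static List<string> ReadPrices(string xml)
    {
        List<string> prices = new List<string>();

        using (XmlReader r = XmlReader.Create(new StringReader(xml)))
        {
            // ReadToFollowing advances to the next element with this name
            // and returns false once no further match exists.
            while (r.ReadToFollowing("productPrice"))
            {
                // Reads the element's text and moves past its end tag.
                // Because ReadToFollowing always advances before testing,
                // this loop assumes the price elements are not directly
                // adjacent siblings, which holds for the products document.
                prices.Add(r.ReadElementContentAsString());
            }
        }
        return prices;
    }

    public static void Main()
    {
        // Hypothetical inline copy of the products document.
        string xml =
            "<products>" +
            "<product id=\"1001\"><productName>Gourmet Coffee</productName>" +
            "<productPrice>0.99</productPrice></product>" +
            "<product id=\"1002\"><productName>Blue China Tea Pot</productName>" +
            "<productPrice>102.99</productPrice></product>" +
            "</products>";

        foreach (string price in ReadPrices(xml))
        {
            Console.WriteLine(price);
        }
    }
}
```

The calling code never has to inspect NodeType or track nesting depth, which is the main benefit over the node-by-node loop shown in the recipe.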
6-8. Validate an XML Document Against a Schema
Problem

You need to validate the content of an XML document by ensuring that it conforms to an XML schema.
Solution
When you call XmlReader.Create, supply an XmlReaderSettings object that indicates you want to perform
validation. Then move through the document one node at a time by calling XmlReader.Read, catching
any validation exceptions. To find all the errors in a document without catching exceptions, handle the
ValidationEventHandler event on the XmlReaderSettings object that you pass to XmlReader.Create.
How It Works
An XML schema defines the rules that a given type of XML document must follow. The schema includes
rules that define the following:
• The elements and attributes that can appear in a document
• The data types for elements and attributes
• The structure of a document, including what elements are children of other
elements
• The order and number of child elements that appear in a document
• Whether elements are empty, can include text, or require fixed values
XML schema documents are beyond the scope of this chapter, but you can learn much from a
simple example. This recipe uses the product catalog first presented in recipe 6-1.
At its most basic level, XML Schema Definition (XSD) defines the elements that can occur in an XML
document. XSD documents are themselves written in XML, and you use a separate predefined element
(named <element>) in the XSD document to indicate each element that is required in the target
document. The type attribute indicates the data type. Here is an example for a product name:
<xsd:element name="productName" type="xsd:string" />
And here is an example for the product price:
<xsd:element name="productPrice" type="xsd:decimal" />
The basic schema data types are defined at www.w3.org/TR/xmlschema-2. They map closely to .NET
data types and include string, int, long, decimal, float, dateTime, boolean, and base64Binary—to name
a few of the most frequently used types.
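As a rough illustration of how these schema types line up with .NET types, the sketch below uses the XmlConvert class to parse the text form of a few values. The sample values and the class name are assumptions chosen for illustration; only the XmlConvert calls come from the .NET Framework.

```csharp
using System;
using System.Xml;

public static class XsdTypeMappingSketch
{
    public static void Main()
    {
        // Each string is the XML text form of a value. XmlConvert parses it
        // using XSD rules, which do not depend on the current culture.
        decimal price = XmlConvert.ToDecimal("12.99");   // xsd:decimal
        int quantity = XmlConvert.ToInt32("42");         // xsd:int
        bool inStock = XmlConvert.ToBoolean("true");     // xsd:boolean
        DateTime expiry = XmlConvert.ToDateTime(         // xsd:dateTime
            "2010-12-31T00:00:00",
            XmlDateTimeSerializationMode.Unspecified);

        // Converting back with XmlConvert.ToString yields XSD text again.
        Console.WriteLine(XmlConvert.ToString(price));
        Console.WriteLine(XmlConvert.ToString(quantity));
        Console.WriteLine(XmlConvert.ToString(inStock));
        Console.WriteLine(expiry.Year);
    }
}
```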
Both productName and productPrice are simple types because they contain only character data.
Elements that contain nested elements are called complex types. You can nest them together using a
<sequence> tag, if order is important, or an <all> tag if it is not. Here is how you might model the
<product> element in the product catalog. Notice that attributes are always declared after elements, and
they are not grouped with a <sequence> or <all> tag because order is never important:

<xsd:complexType name="product">
  <xsd:sequence>
    <xsd:element name="productName" type="xsd:string"/>
    <xsd:element name="productPrice" type="xsd:decimal"/>
    <xsd:element name="inStock" type="xsd:boolean"/>
  </xsd:sequence>
  <xsd:attribute name="id" type="xsd:integer"/>
</xsd:complexType>

By default, a listed element can occur exactly one time in a document. You can configure this
behavior by specifying the maxOccurs and minOccurs attributes. Here is an example that allows an
unlimited number of products in the catalog:

<xsd:element name="product" type="product" maxOccurs="unbounded" />

Here is the complete schema for the product catalog XML:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Define the complex type product. -->
  <xsd:complexType name="product">
    <xsd:sequence>
      <xsd:element name="productName" type="xsd:string"/>
      <xsd:element name="description" type="xsd:string"/>
      <xsd:element name="productPrice" type="xsd:decimal"/>
      <xsd:element name="inStock" type="xsd:boolean"/>
    </xsd:sequence>
    <xsd:attribute name="id" type="xsd:integer"/>
  </xsd:complexType>

  <!-- This is the structure the document must match.
       It begins with a productCatalog element that nests other elements. -->
  <xsd:element name="productCatalog">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="catalogName" type="xsd:string"/>
        <xsd:element name="expiryDate" type="xsd:date"/>
        <xsd:element name="products">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="product" type="product"
                           maxOccurs="unbounded" />
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>

The XmlReader class can enforce these schema rules, provided you explicitly request a validating
reader when you use the XmlReader.Create method. (Even if you do not use a validating reader, an
exception will be thrown if the reader discovers XML that is not well formed, such as an illegal character,
improperly nested tags, and so on.)
Once you have created your validating reader, the validation occurs automatically as you read
through the document. As soon as an error is found, the XmlReader raises the ValidationEventHandler
event on the XmlReaderSettings object supplied at creation time, passing information about the error.
If you want, you can handle this event and continue processing the document to find more errors. If you
do not handle this event, an XmlSchemaValidationException will be thrown when the first error is
encountered, and processing will be aborted.
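To make the unhandled case concrete, here is a minimal sketch that validates a document without attaching an event handler, so reading stops at the first schema violation. The one-element schema, the sample documents, and the names UnhandledValidationSketch and FirstError are assumptions made for illustration. The sketch catches XmlSchemaException, the base class of the schema validation exceptions in System.Xml.Schema.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class UnhandledValidationSketch
{
    // Hypothetical one-element schema, used only for this illustration.
    private const string Xsd =
        "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>" +
        "<xsd:element name='productPrice' type='xsd:decimal'/>" +
        "</xsd:schema>";

    // Returns null if the document is valid, or the first error message.
    public static string FirstError(string xml)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));
        // No ValidationEventHandler is attached, on purpose.

        try
        {
            using (XmlReader reader =
                XmlReader.Create(new StringReader(xml), settings))
            {
                // Validation happens as a side effect of reading.
                while (reader.Read()) { }
            }
            return null;
        }
        catch (XmlSchemaException err)
        {
            // Reading was aborted at the first schema violation.
            return err.Message;
        }
    }

    public static void Main()
    {
        Console.WriteLine(
            FirstError("<productPrice>12.99</productPrice>") ?? "valid");
        Console.WriteLine(
            FirstError("<productPrice>cheap</productPrice>") ?? "valid");
    }
}
```

Because the exception ends the read loop, this style reports at most one problem per run; attaching a handler is the way to see every error in a document.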
The Code
The next example shows a utility class that displays all errors in an XML document when the ValidateXml
method is called. Errors are displayed in a console window, and the method returns a Boolean value that
indicates the success or failure of the entire validation operation.

using System;
using System.Xml;
using System.Xml.Schema;

namespace Apress.VisualCSharpRecipes.Chapter06
{
    public class ConsoleValidator
    {
        // Set to true if at least one error exists.
        private bool failed;

        public bool Failed
        {
            get { return failed; }
        }

        public bool ValidateXml(string xmlFilename, string schemaFilename)
        {
            // Set the type of validation.
            XmlReaderSettings settings = new XmlReaderSettings();
            settings.ValidationType = ValidationType.Schema;

            // Load the schema file.
            XmlSchemaSet schemas = new XmlSchemaSet();
            settings.Schemas = schemas;
            // When loading the schema, specify the namespace it validates
            // and the location of the file. Use null to use
            // the targetNamespace value from the schema.
            schemas.Add(null, schemaFilename);

            // Specify an event handler for validation errors.
            settings.ValidationEventHandler += ValidationEventHandler;

            // Create the validating reader.
            XmlReader validator = XmlReader.Create(xmlFilename, settings);

            failed = false;

            try
            {
                // Read all XML data.
                while (validator.Read()) {}
            }
            catch (XmlException err)
            {
                // This happens if the XML document includes illegal characters
                // or tags that aren't properly nested or closed.
                Console.WriteLine("A critical XML error has occurred.");
                Console.WriteLine(err.Message);
                failed = true;
            }
            finally
            {
                validator.Close();
            }

            return !failed;
        }

        private void ValidationEventHandler(object sender,
            ValidationEventArgs args)
        {
            failed = true;

            // Display the validation error.
            Console.WriteLine("Validation error: " + args.Message);
            Console.WriteLine();
        }
    }
}

Here is how you would use the class to validate the product catalog:

public class Recipe06_08
{
private static void Main()
{
ConsoleValidator consoleValidator = new ConsoleValidator();
Console.WriteLine("Validating ProductCatalog.xml.");
