Tải bản đầy đủ (.pdf) (8 trang)

Tài liệu XML Features in ADO.NET ppt

Bạn đang xem bản rút gọn của tài liệu. Xem và tải ngay bản đầy đủ của tài liệu tại đây (44.97 KB, 8 trang )

XML Features in ADO.NET
Dino Esposito
Wintellect
December 13, 2001
XML and some of its related technologies, including XPath, XSL Transformation, and XML Schema, are
unquestionably at the foundation of ADO.NET. As a matter of fact, XML constitutes the key element for the greatly
improved interoperability of the ADO.NET object model when compared to ADO. In ADO, XML was merely an (non-
default) I/O format used to persist the content of a disconnected recordset. The participation of XML in the building
and in the inter-workings of ADO.NET is much deeper. The aspects of ADO.NET where the interaction and the
integration with XML is stronger can be summarized in the following points:
l
Objects serialization and remoting
l
A dual programming interface
l
XML-driven batch update (for SQL Server 2000 only)
In ADO.NET, you have several options to save objects to, and restore objects from, XML documents. To say it all,
this ability belongs to one object only—the DataSet—but can be extended to other container objects with minimal
coding. Saving objects like DataTable and DataView to XML is substantially seen as a special case of the DataSet
serialization.
Furthermore, ADO.NET and XML classes provide for a sort of unified intermediate API that is made available to
programmers through a dual and synchronized programming interface. You can access and update data using either
the hierarchical and node-based approach of XML ,or the relational approach of column-based tabular data sets. At
any time, you can switch from a DataSet representation of the data to XMLDOM, and vice versa. Data is
synchronized and any change you enter in either model is immediately reflected and visible in the other. In this
article, I'll cover ADO.NET-to-XML serialization and XML data access —that is, the first two points in the list above.
Next month, I'll attack XML-driven batch update—one of the coolest features you get from SQL Server 2000 XML
Extensions (SQLXML 2.0).
DataSet and XML
Just like any other .NET object, the DataSet object is stored in memory in a binary format. Unlike other objects,
though, the DataSet is always remoted and serialized in a special XML format called the DiffGram. When the


DataSet trespasses the boundaries of the app -domains, or the physical borders of the machine, it is automatically
rendered as a DiffGram. At destination, the DataSet is silently rebuilt as a binary and promptly usable object. The
same serialization facilities are available to applications through a bunch of methods, a pair of which clearly stands
out. They are ReadXml and WriteXml. The table below presents the DataSet's methods you can use to work with
XML, both in reading and in writing.
Methods Description
GetXml
l
Returns a string that is the XML representation of the data stored in the DataSet
l
No schema information is included
GetXmlSchema
l
Returns a string that is the XML schema information for the DataSet
ReadXml
l
Populates a DataSet object with the specified XML data read from a stream or a file
ReadXmlSchema
l
Loads the specified XML schema information into the current DataSet object
WriteXml
l
Writes the XML data, and optionally the schema, that represents the DataSet
l
Can write to a stream or a file
WriteXmlSchema
l
Writes the string being the XML schema information for the DataSet
Page 1 of 8XML Features in ADO.NET
1/3/2002 />As the table shows, when working with DataSet and XML you can manage data and schema information as distinct

entities. You can take the XML schema out of a DataSet and use it as a string. Alternately, you could write it to a
disk file or load it into an empty DataSet object. Side by side with the methods listed in the table above, the
DataSet object also features two XML-related properties—Namespace and Prefix . Namespace determines the
XML namespace used to scope XML attributes and elements when you read them into a DataSet. The prefix to alias
the namespace is stored in the Prefix property.
Building a DataSet from XML
The ReadXml method fills out a DataSet object reading from a variety of sources, including disk files, .NET
streams, or instances of XmlReader objects. The method can process any type of XML file, but, of course, XML files
having a non-tabular and rather irregularly shaped structure may create some problems when rendered in terms of
rows and columns.
The ReadXml method has several overloads, all of which are rather similar. They take the XML source plus an
optional XmlReadMode value as arguments. For example:
public XmlReadMode ReadXml(String, XmlReadMode);
The method creates the relational schema for the DataSet depending on the read mode specified, and whether or
not a schema already exists in the DataSet. The following code snippet illustrates the typical code you would use to
load a DataSet from XML.
StreamReader sr = new StreamReader(fileName);
DataSet ds = new DataSet();
ds.ReadXml(sr); // defaults to XmlReadMode.Auto
sr.Close();
When loading the contents of XML sources into a DataSet, ReadXml does not merge rows whose primary key
information match. To merge an existing DataSet with one loaded from XML, you first have to create a new DataSet,
and then merge the two using the Merge method. During the merging, the rows that get overwritten are those with
matching primary keys. An alternate way to merge existing DataSet objects with contents read from XML is
through the DiffGram format (more on this later).
The table below illustrates the various read modes that ReadXml supports. You can set them using the
XmlReadMode enumeration.
The default read mode is not listed in the table and is XmlReadMode.Auto . When this mode is set, or when no
read mode has been explicitly set, the ReadXml method examines the XML source and chooses the most
appropriate option.

If the XML source turns out to be a DiffGram, then it is loaded as such. If the source happens to contain an inline
schema, or a reference to an external schema, it is loaded using ReadXmlSchema. Finally, if no schema
information is available in the XML source, then the ReadXml method infers the schema using the DataSet's
InferXmlSchema method. The relational structure, or schema, of a DataSet is made up of tables, columns,
constraints, and relations. Let's review what exactly happens when each of the modes is set.
The XmlReadMode.IgnoreSchema option causes the method to ignore any inline or referenced schema. So, the
data is loaded into the existing DataSet schema and any data that does not fit in it is discarded. If no schema exists
in the DataSet, no data is loaded. Notice that an empty DataSet has no schema information. Bear in mind that if the
l Can write to a stream or a file
Read Mode Description
IgnoreSchema Ignores any inline schema and relies on the DataSet's existing schema
ReadSchema Reads any inline schema and loads the data and schema
InferSchema Ignores any inline schema and infers the schema from the XML data
DiffGram Reads a DiffGram and adds the data to the current schema
Fragment Reads and adds XML fragments until the end of the stream is reached
Page 2 of 8XML Features in ADO.NET
1/3/2002 />XML source is in a DiffGram format, then the IgnoreSchema option has the same effect as
XmlReadMode.DiffGram .
// no schema in the DataSet, no data will be loaded
DataSet ds = new DataSet();
StreamReader sr = new StreamReader(fileName);
ds.ReadXml(sr, XmlReadMode.IgnoreSchema);
The XmlReadMode.ReadSchema option works only with inline schema and does not recognize external
references. It can add new tables to the DataSet, but if any tables defined in the inline schema already exist in the
DataSet, an exception is thrown. You cannot use the ReadSchema option to change the schema of an existing
table. If the DataSet does not contain a schema (that is, it's empty), and there is no inline schema, no data is read
and loaded. ReadXml can only read inline schemas defined using the XML Schema definition language (XSD) or
XML-Data Reduced (XDR). No document type definition (DTD) is supported.
When the XmlReadMode.InferSchema option is set, ReadXml infers the schema directly from the structure of
the XML data and ignores any inline schema that may be present. The data is loaded only when the schema has

been inferred. Existing schemas are extended by adding new tables, or by adding new columns to existing tables, as
appropriate. You can use the DataSet's InferXmlSchema method to load the schema from the specified XML file
into the DataSet. You can control, to some extent, the XML elements processed during the schema inference
operation. The signature of the method InferXmlSchema allows you to specify an array of namespaces whose
elements will be excluded from inference.
void InferXmlSchema(String fileName, String[] rgNamespace);
A DiffGram is an XML format that ADO.NET utilizes to persist the state of a DataSet. Similar to the SQLXML's
updategram format, the DiffGram contains both current and original versions of data rows. Loading a DiffGram using
ReadXml will merge rows that have the matching primary keys. You explicitly instruct ReadXml to work on a
DiffGram using the XmlReadMode.DiffGram flag. When using the DiffGram format, the target DataSet must have
the same schema as the DiffGram, otherwise the merge operation fails and an exception is thrown.
When the XmlReadMode.Fragment option is set, the DataSet is loaded from an XML fragment. An XML fragment
is a valid piece of XML that identifies elements, attributes, and documents. The XML fragment for an element is the
markup text that fully qualifies the XML element (node, CDATA, processing instruction, comment). The fragment for
an attribute is the attribute value, and for a document is the entire content set. When the XML data is a fragment,
the root level rules for well-formed XML documents are not applied. Fragments that match the existing schema are
appended to the appropriate tables and fragments that do not match the schema are discarded. ReadXml reads
from the current position to the end of the stream. The XmlReadMode.Fragment option should not be used to
populate an empty, and subsequently schema-less, DataSet.
Serializing DataSet Objects to XML
The XML representation of the DataSet can be written to a file, a stream, an XmlWriter object, or a string, using
the WriteXml method. The XML representation can include, or not include, schema information. The actual behavior
of the WriteXml method can be controlled through the optional XmlWriteMode parameter you can pass. The
values in the XmlWriteMode enum determine the output's layout. The DataSet representation includes tables,
relations, and constraints definitions. The rows in the DataSet's tables are written in their current versions unless
you choose to employ the DiffGram format. The table below summarizes the writing options available with
XmlWriteMode.
XmlWriteMode.IgnoreSchema is the default option. The following code shows the typical way to serialize a
DataSet to XML.
Write Mode Description

IgnoreSchema Writes the contents of the DataSet as XML data without schema
WriteSchema Writes the contents of the DataSet with an inline XSD schema
DiffGram Writes the contents of the DataSet as a DiffGram, including original and current values
Page 3 of 8XML Features in ADO.NET
1/3/2002 />// ds is the DataSet
StreamWriter sw = new StreamWriter(fileName);
ds.WriteXml(sw); // defaults to XmlWriteMode.IgnoreSchema
sw.Close();
A few factors influence the final structure of the XML document created from the contents of a DataSet object. They
are;
l The overall XML format used—DiffGram or plain hierarchical representation of the current contents
l The presence of schema information
l Nested relations
l How table columns are mapped to XML elements
The DiffGram format is a special XML format that I'll describe more in depth in a moment. It does not include
schema information, but preserves row state and row errors. As such, it seems able to constitute a closer
representation of the living instance of the DataSet.
Schema information, if present in the DataSet being created, is always written as inline XSD. There's no way for you
to write it as XDR, DTD, or add a reference to an external file. The root node of the generated XML file takes the
name of the DataSet, or NewDataSet, if no name has been specified. The following code snippet is an example of
the XML representation of a DataSet object made of two tables, Customers and Orders, put into relationship on
the CustomerID field.
<MyDataSet>
<xs:schema ... />
<Customers>
<CustomerID>1</CustomerID>
<FName>John</FName>
<LName>Smith</LName>
</Customers>
<Customers>

<CustomerID>2</CustomerID>
<FName>Joe</FName>
<LName>Users</LName>
</Customers>
<Orders>
<CustomerID>1</CustomerID>
<OrderID>000A01</OrderID>
</Orders>
<Orders>
<CustomerID>1</CustomerID>
<OrderID>000B01</OrderID>
</Orders>
</MyDataSet>
From the listing above, you can hardly say that the two tables are in relation. Some information about this is set in
the <xs:schema> tree, but aside from this, nothing else would hint toward that conclusion. A relation set on the
CustomerID field put down in words sounds like this—all the orders issued by a given customer. The XML tree above
does not provide an immediate representation for this information. To change the order of the nodes when a data
relation is present in the DataSet, you can set the Nested attribute of the DataRelation object to true. As a result
of this change, the XML code from above changes as follows:
<MyDataSet>
<xs:schema ... />
<Customers>
<CustomerID>1</CustomerID>
<FName>John</FName>
<LName>Smith</LName>
<Orders>

<CustomerID>1</CustomerID>

<OrderID>000A01</OrderID>


Page 4 of 8XML Features in ADO.NET
1/3/2002 /></Orders>

<Orders>

<CustomerID>1</CustomerID>

<OrderID>000B01</OrderID>

</Orders>

</Customers>
<Customers>
<CustomerID>2</CustomerID>
<FName>Joe</FName>
<LName>Users</LName>
</Customers>
</MyDataSet>
As you can see, all the orders are now grouped under the corresponding customer subtree.
By default, in XML table columns are rendered as node elements. However, this is only a setting that can be
adjusted on a per column basis. The DataColumn object has a property called ColumnMapping that determines
how that column will be rendered in XML. The ColumnMapping property takes values from the MappingType
enum listed below.
If the XML output format is the DiffGram, then the Hidden mapping type is ignored. In this case, though, the
DiffGram representation of the column features a special attribute that marks the column as originally hidden from
XML serialization. The SimpleContent mapping type is not always available and can be used only if the table has one
column.
The DiffGram Format
A DiffGram is simply an XML string written according to a particular schema that represents the contents of a

DataSet. It is in no way a .NET type. The following code snippet shows how to serialize a DataSet object to a
DiffGram.
StreamWriter sw = new StreamWriter(fileName);
ds.WriteXml(sw, XmlWriteMode.DiffGram);
sw.Close();
The resulting XML code is rooted in the <diffgr:diffgram> node, and contains up to three distinct sections of data,
as shown below:
<diffgr:diffgram>
<MyDataSet>
:
</MyDataSet>

<diffgr:before>
:
</diffgr:before>

<diffgr:errors>
:
</diffgr:errors>
</diffgr:diffgram>
Mapping Description
Element Mapped to an XML node element:
<CustomerID>value</CustomerID>

Attribute Mapped to an XML node attribute:
<Customers CustomerID=value>

Hidden Not displayed in the XML data unless the DiffGram format is used
SimpleContent Mapped to simple text:
<Customers>value</Customers>


Page 5 of 8XML Features in ADO.NET
1/3/2002 />

×