CHAPTER 9 ■ ADDITIONAL XML CAPABILITIES
try
{
xDocument.Validate(schemaSet, (o, vea) =>
{
Console.WriteLine(
"A validation error occurred processing object type {0}.",
o.GetType().Name);
Console.WriteLine(vea.Message);
throw (new Exception(vea.Message));
});
Console.WriteLine("Document validated successfully.");
}
catch (Exception ex)
{
Console.WriteLine("Exception occurred: {0}", ex.Message);
Console.WriteLine("Document validated unsuccessfully.");
}
Check that out. An entire method specified as a lambda expression. Do lambda expressions rock
or what? Here are the results:
Here is the source XML document:
<BookParticipants>
<BookParticipant type="Author" language="English">
<FirstName>Joe</FirstName>
<LastName>Rattz</LastName>
</BookParticipant>
<BookParticipant type="Editor">
<FirstName>Ewan</FirstName>
<LastName>Buckingham</LastName>
</BookParticipant>
</BookParticipants>
A validation error occurred processing object type XAttribute.
The 'language' attribute is not declared.
Exception occurred: The 'language' attribute is not declared.
Document validated unsuccessfully.
Now I’ll try an example that tells the Validate method to add the schema information, as shown in Listing 9-17.
Listing 9-17. Unsuccessfully Validating an XML Document Against an XSD Schema Using a Lambda
Expression and Specifying to Add Schema Information
XDocument xDocument = new XDocument(
new XElement("BookParticipants",
new XElement("BookParticipant",
new XAttribute("type", "Author"),
new XElement("FirstName", "Joe"),
new XElement("MiddleName", "Carson"),
new XElement("LastName", "Rattz")),
Rattz_789-3.book Page 327 Tuesday, October 16, 2007 2:21 PM
new XElement("BookParticipant",
new XAttribute("type", "Editor"),
new XElement("FirstName", "Ewan"),
new XElement("LastName", "Buckingham"))));
Console.WriteLine("Here is the source XML document:");
Console.WriteLine("{0}{1}{1}", xDocument, System.Environment.NewLine);
XmlSchemaSet schemaSet = new XmlSchemaSet();
schemaSet.Add(null, "bookparticipants.xsd");
xDocument.Validate(schemaSet, (o, vea) =>
{
Console.WriteLine("An exception occurred processing object type {0}.",
o.GetType().Name);
Console.WriteLine("{0}{1}", vea.Message, System.Environment.NewLine);
},
true);
foreach(XElement element in xDocument.Descendants())
{
Console.WriteLine("Element {0} is {1}", element.Name,
element.GetSchemaInfo().Validity);
XmlSchemaElement se = element.GetSchemaInfo().SchemaElement;
if (se != null)
{
Console.WriteLine(
"Schema element {0} must have MinOccurs = {1} and MaxOccurs = {2}{3}",
se.Name, se.MinOccurs, se.MaxOccurs, System.Environment.NewLine);
}
else
{
// Invalid elements will not have a SchemaElement.
Console.WriteLine();
}
}
This example starts like the previous one. It creates an XML document. This time, though, I added
an additional element for the first BookParticipant: MiddleName. This is invalid because it is not
specified in the schema I am validating against. Unlike the previous example, I tell the Validate
method to add the schema information. Also unlike the previous example, I do not throw an
exception in my validation event handling code. As you may recall, I mentioned previously that the
validation must complete for the schema information to be added, so your handler must not throw an
exception. Therefore, I also removed the try/catch block.
After the validation completes, I am enumerating all the elements in the document and displaying
whether they are valid. Additionally, I obtain the SchemaElement object from the added schema infor-
mation. Notice that I make sure the SchemaElement property is not null, because if the element is not
valid, the SchemaElement property may be null. After all, the element may not be valid because it is not
in the schema, so how could there be schema information? The same applies to the SchemaAttribute
property for invalid attributes. Once I have a SchemaElement object, I display its Name, MinOccurs, and
MaxOccurs properties.
Here are the results:
Here is the source XML document:
<BookParticipants>
<BookParticipant type="Author">
<FirstName>Joe</FirstName>
<MiddleName>Carson</MiddleName>
<LastName>Rattz</LastName>
</BookParticipant>
<BookParticipant type="Editor">
<FirstName>Ewan</FirstName>
<LastName>Buckingham</LastName>
</BookParticipant>
</BookParticipants>
An exception occurred processing object type XElement.
The element 'BookParticipant' has invalid child element 'MiddleName'. List of
possible elements expected: 'LastName'.
Element BookParticipants is Invalid
Schema element BookParticipants must have MinOccurs = 1 and MaxOccurs = 1
Element BookParticipant is Invalid
Schema element BookParticipant must have MinOccurs = 1 and MaxOccurs =
79228162514264337593543950335
Element FirstName is Valid
Schema element FirstName must have MinOccurs = 1 and MaxOccurs = 1
Element MiddleName is Invalid
Element LastName is NotKnown
Element BookParticipant is Valid
Schema element BookParticipant must have MinOccurs = 1 and MaxOccurs =
79228162514264337593543950335
Element FirstName is Valid
Schema element FirstName must have MinOccurs = 1 and MaxOccurs = 1
Element LastName is Valid
Schema element LastName must have MinOccurs = 1 and MaxOccurs = 1
There are no real surprises in this output. Notice that the MaxOccurs property value for the
BookParticipant element is a very large number. This is because in the schema, the maxOccurs attribute
is specified to be "unbounded".
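The large number in the output is decimal.MaxValue, which is the value the MaxOccurs property reports for an unbounded particle in the results above. If you would rather display the word than the number, a small helper can translate it. What follows is only a minimal sketch; the OccursFormatter class and its FormatMaxOccurs method are hypothetical names of my own and are not part of the .NET schema API.

```csharp
using System;

static class OccursFormatter
{
    // Hypothetical helper: show "unbounded" instead of the raw
    // decimal.MaxValue that stands for maxOccurs='unbounded'.
    public static string FormatMaxOccurs(decimal maxOccurs)
    {
        return maxOccurs == decimal.MaxValue ? "unbounded" : maxOccurs.ToString();
    }

    static void Main()
    {
        // In Listing 9-17 you would pass se.MaxOccurs; literal values are
        // used here so the sketch does not need a schema file.
        Console.WriteLine("MaxOccurs = {0}", FormatMaxOccurs(1m));
        Console.WriteLine("MaxOccurs = {0}", FormatMaxOccurs(decimal.MaxValue));
    }
}
```

In Listing 9-17, the call would replace se.MaxOccurs in the Console.WriteLine argument list.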
For the final pair of validation examples, I will use one of the Validate method prototypes that
applies to validating elements. The first thing you will notice about it is that it has an argument that
requires an XmlSchemaObject to be passed. This means the document must have already been validated.
That may seem odd, but this prototype is meant for a scenario where we have already validated once and
need to revalidate a portion of the XML tree.
For this scenario, imagine I load an XML document and validate it to start. Next, I have allowed
a user to update the data for one of the book participants and now need to update the XML document
to reflect the user’s changes, and I want to validate that portion of the XML tree again, after the updates.
This is where the Validate method prototypes of the elements and attributes can come in handy.
Because this example, shown in Listing 9-18, is more complex than some of the previous exam-
ples, I will explain it as I go. First, to be a little different, and because I need an expanded schema to
facilitate an edit to the XML tree, I will define the schema programmatically instead of loading it from
a file, as I have in the previous examples.
Listing 9-18. Successfully Validating an XML Element
string schema =
@"<?xml version='1.0' encoding='utf-8'?>
<xs:schema attributeFormDefault='unqualified' elementFormDefault='qualified'
xmlns:xs='http://www.w3.org/2001/XMLSchema'>
<xs:element name='BookParticipants'>
<xs:complexType>
<xs:sequence>
<xs:element maxOccurs='unbounded' name='BookParticipant'>
<xs:complexType>
<xs:sequence>
<xs:element name='FirstName' type='xs:string' />
<xs:element minOccurs='0' name='MiddleInitial'
type='xs:string' />
<xs:element name='LastName' type='xs:string' />
</xs:sequence>
<xs:attribute name='type' type='xs:string' use='required' />
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>";
XmlSchemaSet schemaSet = new XmlSchemaSet();
schemaSet.Add("", XmlReader.Create(new StringReader(schema)));
In the previous code, I merely copied the schema from the file that I have been using and replaced
the double quotes with single quotes. I also added a MiddleInitial
element between the FirstName and LastName elements. Notice that I specify the minOccurs attribute
as 0 so the element is not required. Next, I create a schema set from the schema. Now it’s time to
create an XML document:
XDocument xDocument = new XDocument(
new XElement("BookParticipants",
new XElement("BookParticipant",
new XAttribute("type", "Author"),
new XElement("FirstName", "Joe"),
new XElement("LastName", "Rattz")),
new XElement("BookParticipant",
new XAttribute("type", "Editor"),
new XElement("FirstName", "Ewan"),
new XElement("LastName", "Buckingham"))));
Console.WriteLine("Here is the source XML document:");
Console.WriteLine("{0}{1}{1}", xDocument, System.Environment.NewLine);
There is nothing new here. I just created the same document I usually do for the examples and
displayed it. Now I will validate the document:
bool valid = true;
xDocument.Validate(schemaSet, (o, vea) =>
{
Console.WriteLine("An exception occurred processing object type {0}.",
o.GetType().Name);
Console.WriteLine(vea.Message);
valid = false;
}, true);
Console.WriteLine("Document validated {0}.{1}",
valid ? "successfully" : "unsuccessfully",
System.Environment.NewLine);
Notice that I validate a little differently than I have in previous examples. I initialize a bool to
true, representing whether the document is valid. Inside the validation handler, I set it to false. So
if a validation error occurs, valid will be set to false. I then check the value of valid after validation
to determine whether the document is valid, and display its validity. In this example, the document
is valid at this point.
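The flag pattern described above can be wrapped in a reusable method so that each caller does not have to repeat the handler. The following is a minimal sketch under my own assumptions: the ValidationSketch class, the TryValidate method, and the tiny inline schema are hypothetical and exist only for illustration. Instead of writing to the console, the handler collects the messages in a list.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

static class ValidationSketch
{
    // Hypothetical helper: returns true when no validation event fires,
    // and records every validation message in the supplied list.
    public static bool TryValidate(XDocument document, XmlSchemaSet schemas,
        List<string> errors)
    {
        bool valid = true;
        document.Validate(schemas, (o, vea) =>
        {
            errors.Add(vea.Message);
            valid = false;
        }, true);
        return valid;
    }

    // A deliberately small schema, assumed only for this sketch.
    public static XmlSchemaSet BuildSchemaSet()
    {
        string schema =
            @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
                <xs:element name='BookParticipant'>
                  <xs:complexType>
                    <xs:sequence>
                      <xs:element name='FirstName' type='xs:string' />
                      <xs:element name='LastName' type='xs:string' />
                    </xs:sequence>
                  </xs:complexType>
                </xs:element>
              </xs:schema>";
        XmlSchemaSet schemaSet = new XmlSchemaSet();
        schemaSet.Add("", XmlReader.Create(new StringReader(schema)));
        return schemaSet;
    }

    static void Main()
    {
        XmlSchemaSet schemaSet = BuildSchemaSet();
        List<string> errors = new List<string>();

        XDocument xDocument = new XDocument(
            new XElement("BookParticipant",
                new XElement("FirstName", "Joe"),
                new XElement("MiddleName", "Carson"),
                new XElement("LastName", "Rattz")));

        bool valid = TryValidate(xDocument, schemaSet, errors);
        Console.WriteLine("Document validated {0}.",
            valid ? "successfully" : "unsuccessfully");
        foreach (string error in errors)
        {
            Console.WriteLine(error);
        }
    }
}
```

Because the handler never throws, validation always runs to completion, so the schema information is still added to the tree.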
Now, it’s time to imagine that I am allowing a user to edit any particular book participant. The
user has edited the book participant whose first name is "Joe". So I obtain a reference for that element,
update it, and revalidate it after the update:
XElement bookParticipant = xDocument.Descendants("BookParticipant").
Where(e => ((string)e.Element("FirstName")).Equals("Joe")).First();
bookParticipant.Element("FirstName").
AddAfterSelf(new XElement("MiddleInitial", "C"));
valid = true;
bookParticipant.Validate(bookParticipant.GetSchemaInfo().SchemaElement, schemaSet,
(o, vea) =>
{
Console.WriteLine("An exception occurred processing object type {0}.",
o.GetType().Name);
Console.WriteLine(vea.Message);
valid = false;
}, true);
Console.WriteLine("Element validated {0}.{1}",
valid ? "successfully" : "unsuccessfully",
System.Environment.NewLine);
As you can see, I initialize valid to true and call the Validate method, this time on the
bookParticipant element instead of the entire document. Inside the validation event handler, I set
valid to false. After validation of the book participant element, I display its validity. Here are the results:
Here is the source XML document:
<BookParticipants>
<BookParticipant type="Author">
<FirstName>Joe</FirstName>
<LastName>Rattz</LastName>
</BookParticipant>
<BookParticipant type="Editor">
<FirstName>Ewan</FirstName>
<LastName>Buckingham</LastName>
</BookParticipant>
</BookParticipants>
Document validated successfully.
Element validated successfully.
As you can see, the validation of the element is successful. For the final example, I have the same
code, except this time when I update the BookParticipant element, I will create a MiddleName element,
as opposed to MiddleInitial, which is not valid. Listing 9-19 is the code.
Listing 9-19. Unsuccessfully Validating an XML Element
string schema =
@"<?xml version='1.0' encoding='utf-8'?>
<xs:schema attributeFormDefault='unqualified' elementFormDefault='qualified'
xmlns:xs='http://www.w3.org/2001/XMLSchema'>
<xs:element name='BookParticipants'>
<xs:complexType>
<xs:sequence>
<xs:element maxOccurs='unbounded' name='BookParticipant'>
<xs:complexType>
<xs:sequence>
<xs:element name='FirstName' type='xs:string' />
<xs:element minOccurs='0' name='MiddleInitial' type='xs:string' />
<xs:element name='LastName' type='xs:string' />
</xs:sequence>
<xs:attribute name='type' type='xs:string' use='required' />
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>";
XmlSchemaSet schemaSet = new XmlSchemaSet();
schemaSet.Add("", XmlReader.Create(new StringReader(schema)));
XDocument xDocument = new XDocument(
new XElement("BookParticipants",
new XElement("BookParticipant",
new XAttribute("type", "Author"),
new XElement("FirstName", "Joe"),
new XElement("LastName", "Rattz")),
new XElement("BookParticipant",
new XAttribute("type", "Editor"),
new XElement("FirstName", "Ewan"),
new XElement("LastName", "Buckingham"))));
Console.WriteLine("Here is the source XML document:");
Console.WriteLine("{0}{1}{1}", xDocument, System.Environment.NewLine);
bool valid = true;
xDocument.Validate(schemaSet, (o, vea) =>
{
Console.WriteLine("An exception occurred processing object type {0}.",
o.GetType().Name);
Console.WriteLine(vea.Message);
valid = false;
}, true);
Console.WriteLine("Document validated {0}.{1}",
valid ? "successfully" : "unsuccessfully",
System.Environment.NewLine);
XElement bookParticipant = xDocument.Descendants("BookParticipant").
Where(e => ((string)e.Element("FirstName")).Equals("Joe")).First();
bookParticipant.Element("FirstName").
AddAfterSelf(new XElement("MiddleName", "Carson"));
valid = true;
bookParticipant.Validate(bookParticipant.GetSchemaInfo().SchemaElement, schemaSet,
(o, vea) =>
{
Console.WriteLine("An exception occurred processing object type {0}.",
o.GetType().Name);
Console.WriteLine(vea.Message);
valid = false;
}, true);
Console.WriteLine("Element validated {0}.{1}",
valid ? "successfully" : "unsuccessfully",
System.Environment.NewLine);
This code is identical to the previous example except instead of adding a MiddleInitial element, I
added a MiddleName element that is invalid. Here are the results:
Here is the source XML document:
<BookParticipants>
<BookParticipant type="Author">
<FirstName>Joe</FirstName>
<LastName>Rattz</LastName>
</BookParticipant>
<BookParticipant type="Editor">
<FirstName>Ewan</FirstName>
<LastName>Buckingham</LastName>
</BookParticipant>
</BookParticipants>
Document validated successfully.
An exception occurred processing object type XElement.
The element 'BookParticipant' has invalid child element 'MiddleName'. List of
possible elements expected: 'MiddleInitial, LastName'.
Element validated unsuccessfully.
As you can see, the element is no longer valid. Now, this example may seem a little hokey because
I said to imagine a user is editing the document. No developer in their right mind would create a user
interface that would intentionally allow a user to create edits that would be invalid. But imagine if
that user is in reality some other process on the XML document. Perhaps you passed the XML docu-
ment to someone else’s program to make some update and you know they personally have it in for
you and are seeking your personal destruction. Now it may make sense to revalidate. You know you
can’t trust them.
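One detail worth guarding against when revalidating a portion of the tree: GetSchemaInfo returns null for an element that has never been validated with schema information added, so the expression bookParticipant.GetSchemaInfo().SchemaElement would throw. The following is a minimal sketch of a defensive wrapper. The RevalidationSketch class and its Revalidate method are hypothetical names of my own, and the schema is a trimmed copy of the one in Listing 9-18.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

static class RevalidationSketch
{
    // Same shape as the schema in Listing 9-18, trimmed for brevity.
    const string Schema =
        @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
            <xs:element name='BookParticipants'>
              <xs:complexType>
                <xs:sequence>
                  <xs:element maxOccurs='unbounded' name='BookParticipant'>
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element name='FirstName' type='xs:string' />
                        <xs:element minOccurs='0' name='MiddleInitial' type='xs:string' />
                        <xs:element name='LastName' type='xs:string' />
                      </xs:sequence>
                      <xs:attribute name='type' type='xs:string' use='required' />
                    </xs:complexType>
                  </xs:element>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:schema>";

    public static XmlSchemaSet BuildSchemaSet()
    {
        XmlSchemaSet schemaSet = new XmlSchemaSet();
        schemaSet.Add("", XmlReader.Create(new StringReader(Schema)));
        return schemaSet;
    }

    public static XDocument BuildDocument()
    {
        return new XDocument(
            new XElement("BookParticipants",
                new XElement("BookParticipant",
                    new XAttribute("type", "Author"),
                    new XElement("FirstName", "Joe"),
                    new XElement("LastName", "Rattz"))));
    }

    // Hypothetical helper: revalidates one element against the schema
    // element it was matched to during an earlier validation pass.
    public static bool Revalidate(XElement element, XmlSchemaSet schemas)
    {
        IXmlSchemaInfo info = element.GetSchemaInfo();
        if (info == null || info.SchemaElement == null)
        {
            // Nothing to revalidate against; validate the document first.
            throw new InvalidOperationException(
                "Element has no schema information from a prior validation.");
        }

        bool valid = true;
        element.Validate(info.SchemaElement, schemas,
            (o, vea) => { valid = false; }, true);
        return valid;
    }

    static void Main()
    {
        XmlSchemaSet schemaSet = BuildSchemaSet();
        XDocument xDocument = BuildDocument();

        // The initial full validation attaches the schema information.
        xDocument.Validate(schemaSet,
            (o, vea) => Console.WriteLine(vea.Message), true);

        XElement bookParticipant = xDocument.Root.Element("BookParticipant");
        bookParticipant.Element("FirstName")
            .AddAfterSelf(new XElement("MiddleInitial", "C"));

        Console.WriteLine("Element valid after edit: {0}",
            Revalidate(bookParticipant, schemaSet));
    }
}
```

Throwing InvalidOperationException is just one choice; falling back to a full document validation would be another reasonable design.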
XPath
If you are accustomed to using XPath, you can also gain some XPath query capabilities thanks to the
System.Xml.XPath.Extensions class in the System.Xml.XPath namespace. This class adds XPath search
capability via extension methods.
Prototypes
Here is a list of some of the method prototypes available in the System.Xml.XPath.Extensions class:
XPathNavigator Extensions.CreateNavigator(this XNode node);
XPathNavigator Extensions.CreateNavigator(this XNode node, XmlNameTable nameTable);
object Extensions.XPathEvaluate(this XNode node, string expression);
object Extensions.XPathEvaluate(this XNode node, string expression,
IXmlNamespaceResolver resolver);
XElement Extensions.XPathSelectElement(this XNode node, string expression);
XElement Extensions.XPathSelectElement(this XNode node, string expression,
IXmlNamespaceResolver resolver);
IEnumerable<XElement> Extensions.XPathSelectElements(this XNode node,
string expression);
IEnumerable<XElement> Extensions.XPathSelectElements(this XNode node,
string expression, IXmlNamespaceResolver resolver);
Examples
Using these extension methods, it is possible to query a LINQ to XML document using XPath search
expressions. Listing 9-20 is an example.
Listing 9-20. Querying XML with XPath Syntax
XDocument xDocument = new XDocument(
new XElement("BookParticipants",
new XElement("BookParticipant",
new XAttribute("type", "Author"),
new XElement("FirstName", "Joe"),
new XElement("LastName", "Rattz")),
new XElement("BookParticipant",
new XAttribute("type", "Editor"),
new XElement("FirstName", "Ewan"),
new XElement("LastName", "Buckingham"))));
XElement bookParticipant = xDocument.XPathSelectElement(
"//BookParticipants/BookParticipant[FirstName='Joe']");
Console.WriteLine(bookParticipant);
As you can see, I created my typical XML document. I didn’t display the document this time,
though. I called the XPathSelectElement method on the document and provided an XPath search
expression to find the BookParticipant element whose FirstName element’s value is "Joe". Here are
the results:
<BookParticipant type="Author">
<FirstName>Joe</FirstName>
<LastName>Rattz</LastName>
</BookParticipant>
Using the XPath extension methods, you can obtain a reference to a System.Xml.XPath.
XPathNavigator object to navigate your XML document, perform an XPath query to return an
element or sequence of elements, or evaluate an XPath query expression.
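To round out the three capabilities just listed, here is a minimal sketch that exercises XPathSelectElements, XPathEvaluate, and CreateNavigator on the usual document. The XPathSketch class and its BuildDocument method are hypothetical names of my own; the extension methods themselves are the ones shown in the prototypes.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;
using System.Xml.XPath;

static class XPathSketch
{
    public static XDocument BuildDocument()
    {
        return new XDocument(
            new XElement("BookParticipants",
                new XElement("BookParticipant",
                    new XAttribute("type", "Author"),
                    new XElement("FirstName", "Joe"),
                    new XElement("LastName", "Rattz")),
                new XElement("BookParticipant",
                    new XAttribute("type", "Editor"),
                    new XElement("FirstName", "Ewan"),
                    new XElement("LastName", "Buckingham"))));
    }

    static void Main()
    {
        XDocument xDocument = BuildDocument();

        // XPathSelectElements returns every matching element as a sequence.
        IEnumerable<XElement> firstNames =
            xDocument.XPathSelectElements("//BookParticipant/FirstName");
        foreach (XElement firstName in firstNames)
        {
            Console.WriteLine(firstName.Value);
        }

        // XPathEvaluate returns object; a count() expression yields a double.
        double count =
            (double)xDocument.XPathEvaluate("count(//BookParticipant)");
        Console.WriteLine("Participants: {0}", count);

        // CreateNavigator exposes the same tree through XPathNavigator.
        XPathNavigator navigator = xDocument.CreateNavigator();
        XPathNavigator lastName = navigator.SelectSingleNode(
            "//BookParticipant[@type='Editor']/LastName");
        Console.WriteLine(lastName.Value);
    }
}
```

Because XPathEvaluate returns object, the cast you need depends on the expression: a node-set query returns a sequence rather than a double.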
Summary
At this point, if you came into this chapter without any knowledge of XML, I can only assume you are
overwhelmed. If you did have a basic understanding of XML, but not of LINQ to XML, I hope I have
made this understandable for you. The power and flexibility of the LINQ to XML API is quite intoxicating.
While writing this chapter and creating the examples, I would find myself lulled into a state of
XML euphoria, a state without the underlying desire to avoid using “real” XML, only to find myself
back at my day job planning on taking advantage of the simplicity LINQ to XML offers, despite the
fact that my work project cannot use it because it has not been released yet. So many times I thought,
if I could just use functional construction to whip up this piece of XML, only to find the reality of the
situation causing me to use my standby XML library, the String.Format method.
Don’t chastise me for taking the easy way out. As I previously mentioned, I was at a Microsoft
seminar where the presenter demonstrated code that built XML in a similar manner.
Having written the many examples in this chapter and the previous LINQ to XML chapters, I can’t
tell you how excited I will be to actually use the LINQ to XML API in my real production code. The fact
is that with LINQ to XML, because XML creation is largely based on elements rather than documents
coupled with the capability of functional construction, creating XML is painless. It might even be
fun. Combine the easy creation with the intuitive traversal and modification, and it becomes a joy to
work with, considering the alternatives.
Having all this ease of use working with XML piled on top of a powerfully flexible query language
makes LINQ to XML my personal favorite part of LINQ. If you find yourself dreading XML or intimidated
by it, I think you will find the LINQ to XML API quite pleasant.
■ ■ ■
PART 4
LINQ to DataSet
■ ■ ■
CHAPTER 10
LINQ to DataSet Operators
While I haven’t covered LINQ to SQL yet, let me mention at this time that to utilize LINQ to SQL for a
given database, source code classes must be generated for that database and compiled, or a mapping
file must be created. This means that performing LINQ queries with LINQ to SQL on a database
unknown until runtime is not possible. Additionally, LINQ to SQL only works with Microsoft SQL
Server. What is a developer to do?
The LINQ to DataSet operators allow a developer to perform LINQ queries on a DataSet, and
since a DataSet can be obtained using normal ADO.NET SQL queries, LINQ to DataSet allows LINQ
queries over any database that can be queried with ADO.NET. This provides a far more dynamic
database-querying interface than LINQ to SQL.
You may be wondering, under what circumstances would you not know the database until runtime?
It is true that for the typical application, the database is known while the application is being developed,
and therefore LINQ to DataSet is not as necessary. But what about a database utility type application? For
example, consider an application such as SQL Server Enterprise Manager. It doesn’t know what
databases are going to be installed on the server until runtime. The Enterprise Manager application
allows you to examine whatever databases are installed on the server, with whatever tables are in a
specified database. There is no way the Enterprise Manager application developer could generate
the LINQ to SQL classes at compile time for your database. This is when LINQ to DataSet becomes a
necessity.
While this part of the book is named LINQ to DataSet, you will find that the added operators
really pertain to DataTable, DataRow, and DataColumn objects. Don’t be surprised that you don’t see
DataSet objects referenced often in this chapter. It is understood that in real-life circumstances, your
DataTable objects will almost always come from DataSet objects. However, for the purpose of database
independence, brevity, and clarity, I have intentionally created simple DataTable objects program-
matically, rather than retrieved them from a database, for most of the examples.
The LINQ to DataSet operators consist of several special operators from multiple assemblies
and namespaces that allow the developer to do the following:
• Perform set operations on sequences of DataRow objects.
• Retrieve and set DataColumn values.
• Obtain a LINQ standard IEnumerable<T> sequence from a DataTable so Standard Query Oper-
ators may be called.
• Copy modified sequences of DataRow objects to a DataTable.
In addition to these LINQ to DataSet operators, once you have called the AsEnumerable operator,
you can call the LINQ to Objects Standard Query Operators on the returned sequence of DataRow
objects, resulting in even more power and flexibility.
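To make the capabilities listed above a little more concrete before diving into each operator, here is a minimal sketch that touches AsEnumerable, Field<T>, SetField<T>, and CopyToDataTable in one place. The DataSetSketch class, its method names, and the sample rows are assumptions of my own, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

static class DataSetSketch
{
    public static DataTable BuildTable()
    {
        DataTable table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(1, "Joe Rattz");
        table.Rows.Add(19, "Bob Tanko");
        table.Rows.Add(45, "Erin Doutensal");
        return table;
    }

    // AsEnumerable yields a sequence of DataRow objects, so a Standard Query
    // Operator such as Where applies; Field<T> reads a typed column value.
    public static DataTable RowsWithIdAbove(DataTable source, int minimumId)
    {
        IEnumerable<DataRow> rows = source.AsEnumerable()
            .Where(row => row.Field<int>("Id") > minimumId);

        // CopyToDataTable throws InvalidOperationException if no rows remain.
        return rows.CopyToDataTable();
    }

    static void Main()
    {
        DataTable table = BuildTable();

        // SetField<T> writes a typed column value.
        table.Rows[0].SetField("Name", "Joseph Rattz");

        DataTable filtered = RowsWithIdAbove(table, 10);
        foreach (DataRow row in filtered.Rows)
        {
            Console.WriteLine("{0,-15}{1,-15}",
                row.Field<int>("Id"), row.Field<string>("Name"));
        }
    }
}
```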
Assembly References
For the examples in this chapter, you will need to add references to your project for the
System.Data.dll and System.Data.DataSetExtensions.dll assemblies, if they have not already been
added.
Referenced Namespaces
To use the LINQ to DataSet operators, add using directives to the top of your code for the
System.Linq and System.Data namespaces if they are not already there:
using System.Data;
using System.Linq;
This will allow your code to find the LINQ to DataSet operators.
Common Code for the Examples
Virtually every example in this chapter will require a DataTable object on which to perform LINQ to
DataSet queries. In real production code, you would typically obtain these DataTable objects by querying
a database. However, for some of these examples, I present situations where the data conditions in
a typical database table will not suffice. For example, I need duplicate records to demonstrate the
Distinct method. Rather than jump through hoops trying to manipulate the database to contain the
data I may need, I programmatically create a DataTable containing the specific data I desire for each
example. This also relieves you of the burden of having a database for testing the majority of these
examples.
Since I will not actually be querying a database for the DataTable objects, and to make creating
the DataTable objects easy, I generate them from an array of objects of a predefined class. For the
predefined class, I use the Student class.
A Simple Class with Two Public Members
class Student
{
public int Id;
public string Name;
}
You should just imagine that I am querying a table named Students where each record is a
student, and the table contains two columns: Id and Name.
To make creation of the DataTable simple, and to prevent obscuring the relevant details of each
example, I use a common method to convert an array of Student objects into a DataTable object. This
allows the data to easily vary from example to example. Here is that common method.
Converting an Array of Student Objects to a DataTable
static DataTable GetDataTable(Student[] students)
{
DataTable table = new DataTable();
table.Columns.Add("Id", typeof(Int32));
table.Columns.Add("Name", typeof(string));
foreach (Student student in students)
{
table.Rows.Add(student.Id, student.Name);
}
return (table);
}
There isn’t anything complex in this method. I just instantiate a DataTable object, add two
columns, and add a row for each element in the passed students array.
For many of the examples of the LINQ to DataSet operators, I need to display a DataTable for the
results of the code to be clear. While the actual data in the DataTable varies, the code needed to display
the DataTable object’s header will not. Instead of repeating this code throughout all the examples, I
create the following method and call it in any example needing to display a DataTable header.
The OutputDataTableHeader Method
static void OutputDataTableHeader(DataTable dt, int columnWidth)
{
string format = string.Format("{0}0,-{1}{2}", "{", columnWidth, "}");
// Display the column headings.
foreach(DataColumn column in dt.Columns)
{
Console.Write(format, column.ColumnName);
}
Console.WriteLine();
foreach(DataColumn column in dt.Columns)
{
for(int i = 0; i < columnWidth; i++)
{
Console.Write("=");
}
}
Console.WriteLine();
}
The purpose of the method is to output the header of a DataTable in a tabular form.
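The format string in OutputDataTableHeader is built at runtime so that the column width can vary; for a width of 15 it becomes {0,-15}. If that trick seems opaque, the same layout can be produced with PadRight. The following is only a sketch of an alternative, not the method the examples in this chapter call. The HeaderSketch class and BuildHeader method are hypothetical names of my own, and the method returns the header as a string rather than writing it to the console.

```csharp
using System;
using System.Data;
using System.Text;

static class HeaderSketch
{
    // Hypothetical variant of OutputDataTableHeader that returns the
    // two header lines instead of writing them to the console.
    public static string BuildHeader(DataTable dt, int columnWidth)
    {
        StringBuilder header = new StringBuilder();

        // PadRight left-justifies each name, like the {0,-N} format item.
        foreach (DataColumn column in dt.Columns)
        {
            header.Append(column.ColumnName.PadRight(columnWidth));
        }
        header.AppendLine();

        // One '=' for every character of the combined column widths.
        header.Append(new string('=', columnWidth * dt.Columns.Count));
        return header.ToString();
    }

    static void Main()
    {
        DataTable table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));

        Console.WriteLine(BuildHeader(table, 15));
    }
}
```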
DataRow Set Operators
As you may recall, in the LINQ to Objects API, there are a handful of Standard Query Operators that
exist for the purpose of making sequence set-type comparisons. I am referring to the Distinct, Except,
Intersect, Union, and SequenceEqual operators. Each of these operators performs a set-type operation on
one or two sequences.
For each of these set-type operators, determining sequence element equality is necessary to
perform the appropriate set operation. These operators perform element comparisons by calling the
GetHashCode and Equals methods on the elements. For a DataRow, this results in a reference compar-
ison, which is not the desired behavior. This will result in the incorrect determination of element
equality, thereby causing the operators to return erroneous results. Because of this, each of these
operators has an additional prototype that I omitted in the LINQ to Objects chapters. This additional
prototype allows an IEqualityComparer object to be provided as an argument. Conveniently, a comparer
object has been provided for us specifically for these versions of the operators,
System.Data.DataRowComparer.Default. This comparer class is in the System.Data namespace in
the System.Data.DataSetExtensions.dll assembly. This comparer determines element equality by comparing
the number of columns and the static data type of each column, and using the IComparable interface
on the column’s dynamic data type if that type implements the interface; otherwise, it calls
System.Object’s static Equals method.
Each of these additional operator prototypes is defined in the System.Linq.Enumerable static
class just as the other prototypes of these operators are.
In this section, I provide some examples to illustrate the incorrect and, more importantly, correct
way to make these sequence comparisons when working with DataSet objects.
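Before looking at the individual operators, it may help to see the underlying difference in isolation. The following is a minimal sketch comparing two DataRow objects that hold identical values, first with the Equals method that DataRow inherits from System.Object and then with DataRowComparer.Default. The ComparerSketch class and its BuildTable method are hypothetical names of my own.

```csharp
using System;
using System.Data;

static class ComparerSketch
{
    // Builds a table whose two rows contain exactly the same values.
    public static DataTable BuildTable()
    {
        DataTable table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(1, "Joe Rattz");
        table.Rows.Add(1, "Joe Rattz");
        return table;
    }

    static void Main()
    {
        DataTable table = BuildTable();
        DataRow first = table.Rows[0];
        DataRow second = table.Rows[1];

        // DataRow does not override Equals, so this compares references.
        Console.WriteLine("Object.Equals: {0}", first.Equals(second));

        // DataRowComparer.Default compares the rows by their column values.
        Console.WriteLine("DataRowComparer: {0}",
            DataRowComparer.Default.Equals(first, second));
    }
}
```

The set-type operators see the same distinction, which is why the examples that follow pass the comparer explicitly.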
Distinct
The Distinct operator removes duplicate rows from a sequence of objects. It returns an object that
when enumerated, enumerates a source sequence of objects and returns a sequence of objects with
the duplicate rows removed. Typically, this operator determines duplicates by calling each element’s
data type’s GetHashCode and Equals methods. However, for DataRow type objects, this would cause an
incorrect result.
Because I am going to call the additional prototype and provide the System.Data.DataRowComparer.
Default comparer object, the element equality will be properly determined. With it, a row is deemed
to be a duplicate by comparing DataRow objects using the number of columns in a row and the static
data type of each column, and then using the IComparable interface on each column if its dynamic data
type implements the IComparable interface, or calling the static Equals method in System.Object if it
does not.
Prototypes
The Distinct operator has one prototype I will cover.
The Distinct Prototype
public static IEnumerable<T> Distinct<T> (
this IEnumerable<T> source,
IEqualityComparer<T> comparer);
Examples
In the first example, I create a DataTable from an array of Student objects using my common
GetDataTable method, and the array will have one duplicate in it. The record whose Id is equal to 1
is repeated in the array. I then display the DataTable. This proves that the record is in the DataTable
twice. Then I remove any duplicate rows by calling the Distinct operator, and display the DataTable
again, showing that the duplicate row has been removed. Listing 10-1 shows the code.
Listing 10-1. The Distinct Operator with an Equality Comparer
Student[] students = {
new Student { Id = 1, Name = "Joe Rattz" },
new Student { Id = 6, Name = "Ulyses Hutchens" },
new Student { Id = 19, Name = "Bob Tanko" },
new Student { Id = 45, Name = "Erin Doutensal" },
new Student { Id = 1, Name = "Joe Rattz" },
new Student { Id = 12, Name = "Bob Mapplethorpe" },
new Student { Id = 17, Name = "Anthony Adams" },
new Student { Id = 32, Name = "Dignan Stephens" }
};
DataTable dt = GetDataTable(students);
Console.WriteLine("{0}Before calling Distinct(){0}",
System.Environment.NewLine);
OutputDataTableHeader(dt, 15);
foreach (DataRow dataRow in dt.Rows)
{
Console.WriteLine("{0,-15}{1,-15}",
dataRow.Field<int>(0),
dataRow.Field<string>(1));
}
IEnumerable<DataRow> distinct =
dt.AsEnumerable().Distinct(DataRowComparer.Default);
Console.WriteLine("{0}After calling Distinct(){0}",
System.Environment.NewLine);
OutputDataTableHeader(dt, 15);
foreach (DataRow dataRow in distinct)
{
Console.WriteLine("{0,-15}{1,-15}",
dataRow.Field<int>(0),
dataRow.Field<string>(1));
}
Notice that I use the AsEnumerable operator to get a sequence of DataRow objects from the
DataTable because that is what I must call the Distinct operator on. Also notice that in the students
array, the record with an Id equal to 1 is repeated.
You no doubt noticed that I call a method named Field on the DataRow object. For now, just
understand that this is a helper method that makes obtaining a DataColumn object’s value from a
DataRow more convenient. I cover the Field<T> operator in depth later in the “DataRow Field Operators”
section of this chapter.
Here are the results:
Before calling Distinct()
Id Name
==============================
1 Joe Rattz
6 Ulyses Hutchens
19 Bob Tanko
45 Erin Doutensal
1 Joe Rattz
12 Bob Mapplethorpe
17 Anthony Adams
32 Dignan Stephens
After calling Distinct()
Id Name
==============================
1 Joe Rattz
6 Ulyses Hutchens
19 Bob Tanko
45 Erin Doutensal
12 Bob Mapplethorpe
17 Anthony Adams
32 Dignan Stephens
Notice that in the results, before I call the Distinct operator, the record whose Id is 1 is repeated,
and that after calling the Distinct operator, the second occurrence of that record has been removed.
For a second example, I am going to demonstrate the results if I had called the Distinct operator
without specifying the comparer object. The code is shown in Listing 10-2.
Listing 10-2. The Distinct Operator Without an Equality Comparer
Student[] students = {
new Student { Id = 1, Name = "Joe Rattz" },
new Student { Id = 6, Name = "Ulyses Hutchens" },
new Student { Id = 19, Name = "Bob Tanko" },
new Student { Id = 45, Name = "Erin Doutensal" },
new Student { Id = 1, Name = "Joe Rattz" },
new Student { Id = 12, Name = "Bob Mapplethorpe" },
new Student { Id = 17, Name = "Anthony Adams" },
new Student { Id = 32, Name = "Dignan Stephens" }
};
DataTable dt = GetDataTable(students);
Console.WriteLine("{0}Before calling Distinct(){0}",
System.Environment.NewLine);
OutputDataTableHeader(dt, 15);
foreach (DataRow dataRow in dt.Rows)
{
Console.WriteLine("{0,-15}{1,-15}",
dataRow.Field<int>(0),
dataRow.Field<string>(1));
}
IEnumerable<DataRow> distinct = dt.AsEnumerable().Distinct();
Console.WriteLine("{0}After calling Distinct(){0}",
System.Environment.NewLine);
OutputDataTableHeader(dt, 15);
foreach (DataRow dataRow in distinct)
{
Console.WriteLine("{0,-15}{1,-15}",
dataRow.Field<int>(0),
dataRow.Field<string>(1));
}
The only difference between this code and the previous example is that the call to the Distinct
operator does not have an equality comparer provided. Will it remove the duplicate row? Let’s take
a look:
Before calling Distinct()
Id Name
==============================
1 Joe Rattz
6 Ulyses Hutchens
19 Bob Tanko
45 Erin Doutensal
1 Joe Rattz
12 Bob Mapplethorpe
17 Anthony Adams
32 Dignan Stephens
After calling Distinct()
Id Name
==============================
1 Joe Rattz
6 Ulyses Hutchens
19 Bob Tanko
45 Erin Doutensal
1 Joe Rattz
12 Bob Mapplethorpe
17 Anthony Adams
32 Dignan Stephens
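No, it did not remove the duplicate. Without a comparer, the Distinct operator falls back on the
DataRow type's own GetHashCode and Equals methods, which compare object references rather than
column values, so two separate rows holding the same data are never considered equal.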
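To see the difference in isolation, the following minimal sketch compares two rows holding identical values, first with the Equals method that DataRow inherits and then with DataRowComparer.Default. The RowEqualityDemo class and its BuildTable method are assumptions made for illustration; they are not the chapter's GetDataTable helper. On the .NET Framework, DataRowComparer needs a reference to the System.Data.DataSetExtensions assembly.

```csharp
using System;
using System.Data;

public static class RowEqualityDemo
{
    // Hypothetical helper: builds a table with two rows that hold the same values.
    public static DataTable BuildTable()
    {
        DataTable table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(1, "Joe Rattz");
        table.Rows.Add(1, "Joe Rattz");
        return table;
    }

    public static void Main()
    {
        DataTable table = BuildTable();
        DataRow a = table.Rows[0];
        DataRow b = table.Rows[1];

        // Inherited Object.Equals: a reference check, so two separate rows differ.
        Console.WriteLine("Default Equals:  {0}", a.Equals(b));

        // DataRowComparer.Default: compares the column values of the two rows.
        Console.WriteLine("DataRowComparer: {0}",
            DataRowComparer.Default.Equals(a, b));
    }
}
```

The first line should report False and the second True, which is the same split seen between Listing 10-2 and Listing 10-1.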
Except
The Except operator produces a sequence of the DataRow objects that are in the first sequence of
DataRow objects but do not exist in the second sequence of DataRow objects. The operator returns an
object that, when enumerated, enumerates the first sequence of DataRow objects collecting the unique
elements, followed by enumerating the second sequence of DataRow objects, removing from the
collection those elements that also occur in the second sequence. Lastly, it yields the remaining
elements in the collection in the order they are collected.
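The steps just described can be sketched as a small generic iterator. This is only an illustration of the described behavior, not the library's actual implementation; the ExceptSketch class and the ExceptByComparer method are hypothetical names, and plain integers stand in for DataRow objects to keep the sketch free of any table setup.

```csharp
using System;
using System.Collections.Generic;

public static class ExceptSketch
{
    // Hypothetical helper that mirrors the described steps; not the library code.
    public static IEnumerable<T> ExceptByComparer<T>(
        IEnumerable<T> first,
        IEnumerable<T> second,
        IEqualityComparer<T> comparer)
    {
        // Step 1: collect the unique elements of the first sequence, in order.
        HashSet<T> seen = new HashSet<T>(comparer);
        List<T> collected = new List<T>();
        foreach (T item in first)
        {
            if (seen.Add(item))
            {
                collected.Add(item);
            }
        }

        // Step 2: enumerate the second sequence to learn what must be removed.
        HashSet<T> excluded = new HashSet<T>(second, comparer);

        // Step 3: yield what remains, in the order it was collected.
        foreach (T item in collected)
        {
            if (!excluded.Contains(item))
            {
                yield return item;
            }
        }
    }

    public static void Main()
    {
        int[] first = { 1, 7, 13, 72, 7 };
        int[] second = { 5, 7, 29, 72 };

        // Prints 1 and 13, the Ids that appear only in the first sequence.
        foreach (int id in ExceptByComparer(first, second, EqualityComparer<int>.Default))
        {
            Console.WriteLine(id);
        }
    }
}
```

Because the method is an iterator, nothing is enumerated until the caller starts the foreach, which matches the deferred behavior described above. Every equality decision goes through the comparer handed to the two HashSet objects, which is why the choice of comparer matters so much for DataRow objects.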
To determine that elements from the same sequence are unique, and that one element in one
sequence is or is not equal to an element in the other sequence, the operator must be able to
determine whether two elements are equal. Typically, this operator determines element equality by
calling each element's data type's GetHashCode and Equals methods. However, for DataRow type
objects, those methods compare object references rather than column values, so this would cause an
incorrect result.
Because I am going to call the additional prototype and provide the
System.Data.DataRowComparer.Default comparer object, the element equality will be properly
determined. With it, a row is deemed to be a duplicate by comparing DataRow objects using the number
of columns in a row and the static data type of each column, and then using the IComparable
interface on each column if its dynamic data type implements the IComparable interface, or calling
the static Equals method in System.Object if it does not.
Prototypes
The Except operator has one prototype I will cover.
The Except Prototype
public static IEnumerable<T> Except<T> (
this IEnumerable<T> first,
IEnumerable<T> second,
IEqualityComparer<T> comparer);
Examples
In this example, I call the Except operator twice. The first time, I pass the
System.Data.DataRowComparer.Default comparer object, so the results of the first query with the
Except operator should be correct. The second time I call the Except operator I will not pass the
comparer object. This causes the results of that query to be incorrect. Listing 10-3 shows the code.
Listing 10-3. The Except Operator with and Without the Comparer Object
Student[] students = {
new Student { Id = 1, Name = "Joe Rattz" },
new Student { Id = 7, Name = "Anthony Adams" },
new Student { Id = 13, Name = "Stacy Sinclair" },
new Student { Id = 72, Name = "Dignan Stephens" }
};
Student[] students2 = {
new Student { Id = 5, Name = "Abe Henry" },
new Student { Id = 7, Name = "Anthony Adams" },
new Student { Id = 29, Name = "Future Man" },
new Student { Id = 72, Name = "Dignan Stephens" }
};
DataTable dt1 = GetDataTable(students);
IEnumerable<DataRow> seq1 = dt1.AsEnumerable();
DataTable dt2 = GetDataTable(students2);
IEnumerable<DataRow> seq2 = dt2.AsEnumerable();
IEnumerable<DataRow> except =
seq1.Except(seq2, System.Data.DataRowComparer.Default);
Console.WriteLine("{0}Results of Except() with comparer{0}",
System.Environment.NewLine);
OutputDataTableHeader(dt1, 15);
foreach (DataRow dataRow in except)
{
Console.WriteLine("{0,-15}{1,-15}",
dataRow.Field<int>(0),
dataRow.Field<string>(1));
}
except = seq1.Except(seq2);
Console.WriteLine("{0}Results of Except() without comparer{0}",
System.Environment.NewLine);
OutputDataTableHeader(dt1, 15);
foreach (DataRow dataRow in except)
{
Console.WriteLine("{0,-15}{1,-15}",
dataRow.Field<int>(0),
dataRow.Field<string>(1));
}
There isn’t much to this example. I basically create two DataTable objects that are populated
from the Student arrays. I create sequences from each DataTable object by calling the AsEnumerable
method. I then call the Except operator on the two sequences and display the results of each. As you
can see, the first time I call the Except operator, I pass the System.Data.DataRowComparer.Default
comparer object. The second time I do not.
Let’s look at the results of that code by pressing Ctrl+F5:
Results of Except() with comparer
Id Name
==============================
1 Joe Rattz
13 Stacy Sinclair
Results of Except() without comparer
Id Name
==============================
1 Joe Rattz
7 Anthony Adams
13 Stacy Sinclair
72 Dignan Stephens
As you can see, the Except operator called with the System.Data.DataRowComparer.Default
comparer object is able to properly determine the element equality for the two sequences, whereas
the Except operator without the comparer object does not identify any elements from the two sequences
as being equal, which is not the desired behavior for this operator.
Intersect
The Intersect operator produces a sequence of DataRow objects that is the intersection of two
sequences of DataRow objects. It returns an object that, when enumerated, enumerates the first
sequence of DataRow objects collecting the unique elements, followed by enumerating the second
sequence of DataRow objects, marking those elements occurring in both sequences. Lastly, it yields
the marked elements in the order they are collected.
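A rough sketch of that collect, mark, and yield sequence follows. It is not the library's implementation; IntersectSketch and IntersectByComparer are hypothetical names. Strings that differ only in case stand in for DataRow objects here, so that StringComparer.OrdinalIgnoreCase can play the role of a value-aware comparer and EqualityComparer<string>.Default the role of the comparer that misses the matches.

```csharp
using System;
using System.Collections.Generic;

public static class IntersectSketch
{
    // Hypothetical helper that mirrors the described steps; not the library code.
    public static IEnumerable<T> IntersectByComparer<T>(
        IEnumerable<T> first,
        IEnumerable<T> second,
        IEqualityComparer<T> comparer)
    {
        // Collect the unique elements of the first sequence, in order.
        HashSet<T> seen = new HashSet<T>(comparer);
        List<T> collected = new List<T>();
        foreach (T item in first)
        {
            if (seen.Add(item))
            {
                collected.Add(item);
            }
        }

        // Mark the collected elements that also occur in the second sequence.
        HashSet<T> marked = new HashSet<T>(comparer);
        foreach (T item in second)
        {
            if (seen.Contains(item))
            {
                marked.Add(item);
            }
        }

        // Yield the marked elements in the order they were collected.
        foreach (T item in collected)
        {
            if (marked.Contains(item))
            {
                yield return item;
            }
        }
    }

    public static void Main()
    {
        string[] first = { "Joe Rattz", "Anthony Adams", "Stacy Sinclair", "Dignan Stephens" };
        string[] second = { "Abe Henry", "ANTHONY ADAMS", "Future Man", "dignan stephens" };

        // A comparer that understands the data finds the two shared names.
        foreach (string name in IntersectByComparer(first, second, StringComparer.OrdinalIgnoreCase))
        {
            Console.WriteLine(name);
        }

        // The default comparer treats differently cased names as different, so nothing matches.
        int matches = 0;
        foreach (string name in IntersectByComparer(first, second, EqualityComparer<string>.Default))
        {
            matches++;
        }
        Console.WriteLine("Matches with default comparer: {0}", matches);
    }
}
```

The elements that come back are the ones from the first sequence, which is why the case-insensitive call prints the names as they are spelled in the first array.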
To determine that elements from the same sequence are unique, and that one element in one
sequence is or is not equal to an element in the other sequence, the operator must be able to
determine whether two elements are equal. Typically, this operator determines element equality by
calling each element's data type's GetHashCode and Equals methods. However, for DataRow type
objects, those methods compare object references rather than column values, so this would cause an
incorrect result.
Because I am going to call the additional prototype and provide the
System.Data.DataRowComparer.Default comparer object, the element equality will be properly
determined. With it, a row is deemed to be a duplicate by comparing DataRow objects using the number
of columns in a row and the static data type of each column, and then using the IComparable
interface on each column if its dynamic data type implements the IComparable interface, or calling
the static Equals method in System.Object if it does not.
Prototypes
The Intersect operator has one prototype I will cover.
The Intersect Prototype
public static IEnumerable<T> Intersect<T> (
this IEnumerable<T> first,
IEnumerable<T> second,
IEqualityComparer<T> comparer);
Examples
In this example, I use the same basic code I use in the Except example, but I change the operator
calls from Except to Intersect. Listing 10-4 shows that code.
Listing 10-4. The Intersect Operator with and Without the Comparer Object
Student[] students = {
new Student { Id = 1, Name = "Joe Rattz" },
new Student { Id = 7, Name = "Anthony Adams" },
new Student { Id = 13, Name = "Stacy Sinclair" },
new Student { Id = 72, Name = "Dignan Stephens" }
};
Student[] students2 = {
new Student { Id = 5, Name = "Abe Henry" },
new Student { Id = 7, Name = "Anthony Adams" },
new Student { Id = 29, Name = "Future Man" },
new Student { Id = 72, Name = "Dignan Stephens" }
};
DataTable dt1 = GetDataTable(students);
IEnumerable<DataRow> seq1 = dt1.AsEnumerable();
DataTable dt2 = GetDataTable(students2);
IEnumerable<DataRow> seq2 = dt2.AsEnumerable();
IEnumerable<DataRow> intersect =
seq1.Intersect(seq2, System.Data.DataRowComparer.Default);
Console.WriteLine("{0}Results of Intersect() with comparer{0}",
System.Environment.NewLine);
OutputDataTableHeader(dt1, 15);
foreach (DataRow dataRow in intersect)
{
Console.WriteLine("{0,-15}{1,-15}",
dataRow.Field<int>(0),
dataRow.Field<string>(1));
}
intersect = seq1.Intersect(seq2);
Console.WriteLine("{0}Results of Intersect() without comparer{0}",
System.Environment.NewLine);
OutputDataTableHeader(dt1, 15);
foreach (DataRow dataRow in intersect)
{
Console.WriteLine("{0,-15}{1,-15}",
dataRow.Field<int>(0),
dataRow.Field<string>(1));
}
There is nothing new here. I create a couple of DataTable objects from the two Student arrays
and obtain sequences from them. I then call the Intersect operator first with the comparer object
and then without. I display the results after each Intersect call. Let’s look at the results of that code
by pressing Ctrl+F5:
Results of Intersect() with comparer
Id Name
==============================
7 Anthony Adams
72 Dignan Stephens
Results of Intersect() without comparer
Id Name
==============================
As you can see, the Intersect operator with the comparer is able to properly determine the
element equality from the two sequences, whereas the Intersect operator without the comparer does
not identify any elements from the two sequences as being equal and so returns an empty sequence,
which is not the desired behavior for this operator.
Union
The Union operator produces a sequence of DataRow objects that is the union of two sequences
of DataRow objects. It returns an object that, when enumerated, enumerates the first sequence of
DataRow objects, followed by the second sequence of DataRow objects, yielding any element that has
not already been yielded.
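The "yield anything not already yielded" rule can be sketched in a few lines. As before, this is an illustration rather than the library's implementation, UnionSketch and UnionByComparer are hypothetical names, and integers stand in for DataRow objects.

```csharp
using System;
using System.Collections.Generic;

public static class UnionSketch
{
    // Hypothetical helper that mirrors the described behavior; not the library code.
    public static IEnumerable<T> UnionByComparer<T>(
        IEnumerable<T> first,
        IEnumerable<T> second,
        IEqualityComparer<T> comparer)
    {
        // Tracks every element yielded so far; equality is decided by the comparer.
        HashSet<T> yielded = new HashSet<T>(comparer);

        foreach (T item in first)
        {
            // HashSet<T>.Add returns false when an equal element is already present.
            if (yielded.Add(item))
            {
                yield return item;
            }
        }

        foreach (T item in second)
        {
            if (yielded.Add(item))
            {
                yield return item;
            }
        }
    }

    public static void Main()
    {
        int[] first = { 1, 7, 13, 72 };
        int[] second = { 5, 7, 29, 72 };

        // Prints 1, 7, 13, 72, 5, 29: the second 7 and 72 are skipped.
        foreach (int id in UnionByComparer(first, second, EqualityComparer<int>.Default))
        {
            Console.WriteLine(id);
        }
    }
}
```

If the comparer never reports two elements as equal, Add always succeeds and every element of both sequences is yielded, duplicates included.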
To determine that elements have already been yielded, the operator must be able to determine
whether two elements are equal. Typically, this operator determines element equality by calling
each element's data type's GetHashCode and Equals methods. However, for DataRow type objects, those
methods compare object references rather than column values, so this would cause an incorrect result.
Because I am going to call the additional prototype and provide the
System.Data.DataRowComparer.Default comparer object, the element equality will be properly
determined. With it, a row is deemed to be a duplicate by comparing DataRow objects using the number
of columns in a row and the static data type of each column, and then using the IComparable
interface on each column if its dynamic data type implements the IComparable interface, or calling
the static Equals method in System.Object if it does not.
Prototypes
The Union operator has one prototype I will cover.
The Union Prototype
public static IEnumerable<T> Union<T> (
this IEnumerable<T> first,
IEnumerable<T> second,
IEqualityComparer<T> comparer);
Examples
In this example, I use the same basic code I use in the Intersect example, except I will change the
operator calls from Intersect to Union. Listing 10-5 shows that code.
Listing 10-5. The Union Operator with and Without the Comparer Object
Student[] students = {
new Student { Id = 1, Name = "Joe Rattz" },
new Student { Id = 7, Name = "Anthony Adams" },
new Student { Id = 13, Name = "Stacy Sinclair" },
new Student { Id = 72, Name = "Dignan Stephens" }
};
Student[] students2 = {
new Student { Id = 5, Name = "Abe Henry" },
new Student { Id = 7, Name = "Anthony Adams" },
new Student { Id = 29, Name = "Future Man" },
new Student { Id = 72, Name = "Dignan Stephens" }
};
DataTable dt1 = GetDataTable(students);
IEnumerable<DataRow> seq1 = dt1.AsEnumerable();
DataTable dt2 = GetDataTable(students2);
IEnumerable<DataRow> seq2 = dt2.AsEnumerable();
IEnumerable<DataRow> union =
seq1.Union(seq2, System.Data.DataRowComparer.Default);
Console.WriteLine("{0}Results of Union() with comparer{0}",
System.Environment.NewLine);
OutputDataTableHeader(dt1, 15);
foreach (DataRow dataRow in union)
{
Console.WriteLine("{0,-15}{1,-15}",
dataRow.Field<int>(0),
dataRow.Field<string>(1));
}
union = seq1.Union(seq2);
Console.WriteLine("{0}Results of Union() without comparer{0}",
System.Environment.NewLine);
OutputDataTableHeader(dt1, 15);
foreach (DataRow dataRow in union)
{
Console.WriteLine("{0,-15}{1,-15}",
dataRow.Field<int>(0),
dataRow.Field<string>(1));
}
Again, there is nothing new here. I create a couple of DataTable objects from the two Student
arrays and obtain sequences from them. I then call the Union operator first with the comparer object
and then without. I display the results after each Union call. Here are the results:
Results of Union() with comparer
Id Name
==============================
1 Joe Rattz
7 Anthony Adams
13 Stacy Sinclair
72 Dignan Stephens
5 Abe Henry
29 Future Man
Results of Union() without comparer
Id Name
==============================
1 Joe Rattz
7 Anthony Adams
13 Stacy Sinclair
72 Dignan Stephens
5 Abe Henry
7 Anthony Adams
29 Future Man
72 Dignan Stephens
Notice that the results of the Union operator with the comparer object are correct, but the results
of the Union operator without the comparer object are not: the rows for Anthony Adams and Dignan
Stephens, which appear in both sequences, are yielded twice.