SQL Server MVP Deep Dives- P6

CHAPTER 12

Using XML to transport relational data

entities—the XML Schema. Again, the subject of XML and XML Schema exceeds the
scope of this article; therefore let’s emphasize one principal benefit of using the XML
Schema.
In SQL Server, the XML standard is implemented as a data type, and because every data type represents (implements) a specific data domain, the purpose of the XML Schema with regard to the XML data type is to enforce that domain.
The XML Schema guarantees that the discography data coming into or going out of our database is valid, that is, that it complies with the business rules.

Data domain
A data domain defines which values are allowed in a specific data element (such as
a variable, a column, and so on).
For instance, a data element of a numerical type can only contain numerical data (numbers); it can’t contain letters or punctuation marks (with the obvious exception of the decimal point). A data element of the integer numerical type can only contain whole numbers, and no other characters.
The XML data domain is similar to the two examples in the previous paragraph, but
is governed by a much more complex set of rules defined by an XML Schema.
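
As a minimal sketch of how an XML Schema enforces the domain of an xml value, the batch below registers a tiny schema and binds a variable to it. The collection name DemoTitleSchema, the namespace urn:example:demo, and the title element are hypothetical; they are not part of the chapter's Discography schemas.

```sql
-- Hypothetical schema: a single <title> element limited to 1..450 characters.
create xml schema collection DemoTitleSchema as
N'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
             targetNamespace="urn:example:demo"
             elementFormDefault="qualified">
    <xs:element name="title">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:minLength value="1"/>
          <xs:maxLength value="450"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>
  </xs:schema>';
go

-- A typed xml variable accepts only instances that validate against the collection.
declare @valid xml(DemoTitleSchema);
set @valid = N'<title xmlns="urn:example:demo">Some Album</title>';

-- An empty title violates minLength, so this assignment is expected to fail validation.
begin try
    declare @invalid xml(DemoTitleSchema);
    set @invalid = N'<title xmlns="urn:example:demo"></title>';
end try
begin catch
    print error_message();
end catch;
go

-- Clean up the demonstration object.
drop xml schema collection DemoTitleSchema;
```

The point of the sketch is only that the schema, not the application code, decides which values belong to the domain.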

ENTITIES OF PRINCIPAL IMPORTANCE

Let’s take another look at the physical model. We can see two entities that stand out as
being more significant to the business compared to the rest.
The first such entity is the Album—even from Joe’s narrative it should be quite
clear that the Album represents a principal business entity. It contains all the information vital to the discography business: all the data about the Tracks and about the
Album itself.
The other principal entity, also verifiable in both the logical and the physical models, as well as in Joe’s statements, is the Band. Bands represent (at least in our particular model) the groups of Persons collectively responsible for the existence of the discography business.
This means that we’ll require two XML Schemas: one to represent the Albums, and
one to represent the Bands. By using two separate schemas, we’ll also be able to isolate
the two principal business entities (allowing independent exchange of information
regarding each of them), and we’ll be able to eliminate some redundancy. We’ll illustrate that last statement in a minute.
Let’s now implement our physical model in the form of XML Schemas. We’ll be
implementing the same data model as before, but this time using a different technology. See listing 1.

Understanding before coding

Listing 1  The Album XML Schema

<xs:schema
    xmlns:xs="…"
    xmlns:ma="…"
    xmlns:m="…"
    elementFormDefault="qualified"
    attributeFormDefault="qualified"
    targetNamespace="…">
  <xs:import namespace="…" schemaLocation="common.xsd"/>
  <xs:element name="discography">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="album" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="track" maxOccurs="unbounded">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="author" type="m:person" maxOccurs="unbounded"/>
                    <xs:element name="band" maxOccurs="unbounded">
                      <xs:complexType>
                        <xs:attribute name="bandName" type="m:bandName"/>
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                  <xs:attribute name="title" type="m:entityTitle" use="required"/>
                  <xs:attribute name="trackNumber" type="…" use="required"/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="title" type="m:entityTitle" use="required"/>
            <xs:attribute name="published" type="xs:dateTime" use="required"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

In listing 1, we can observe how our data model can be implemented as an XML Schema from the perspective of the Album entity: a Discography contains one or more Albums, which contain one or more Tracks written by one or more Authors and performed by one or more Bands.
Because we’ll be using a separate XML Schema for the Band entity, we can leave out the Band Members from the Album definition, eliminating unnecessary data redundancy, as shown in listing 2.

Listing 2  The Band XML Schema

<xs:schema
    xmlns:xs="…"
    xmlns:mb="…"
    xmlns:m="…"
    elementFormDefault="qualified"
    attributeFormDefault="qualified"
    targetNamespace="…">
  <xs:import namespace="…" schemaLocation="common.xsd"/>
  <xs:element name="bands">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="band" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="member" type="m:person" maxOccurs="unbounded"/>
            </xs:sequence>
            <xs:attribute name="bandName" type="m:bandName" use="required"/>
            <xs:attribute name="established" type="xs:dateTime" use="required"/>
            <xs:attribute name="disbanded" type="…" default="9999-12-31T00:00:00.000"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

In listing 2 we can observe how the data model can be implemented from the perspective of the Band entity: a Discography is a collection of one or more Bands, containing
one or more Members.
This way, each individual Band entity can exist independently of any Album entity,
but the consistency of the Discography as a whole remains intact as long as each
Album entity references the appropriate Band entity (or entities).
A few comments on the structure of the XML Schemas
The entities are implemented as XML elements. Their attributes are implemented as XML attributes of the XML element implementing the corresponding entity.
The relationships between the entities are implemented in the structuring of the XML, and the nesting of XML elements. For example, following the logical model rule, which states that the Discography entity contains Album entities, the Album element is placed inside the Discography element, and because an Album entity contains Track entities, the latter are represented by elements nested inside the Album element.

In listings 1 and 2 we can observe that both XML Schemas import a third one. This is due to yet another simplification, based on the fact that both the Album and the Band XML Schemas use a shared collection of types. These shared types are defined in the Common XML Schema, shown in listing 3.
For instance, the Person entity is present in the Album XML as well as the Band XML; therefore both can use the same type for the Person entity, rather than explicitly implementing two separate types with the same set of properties.

Listing 3  Common XML Schema

<xs:schema
    xmlns:xs="…"
    xmlns:m="…"
    elementFormDefault="qualified"
    attributeFormDefault="qualified"
    targetNamespace="…">
  <xs:simpleType name="personName">
    <xs:restriction base="xs:string">
      <xs:maxLength value="150"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="bandName">
    <xs:restriction base="xs:string">
      <xs:minLength value="1"/>
      <xs:maxLength value="450"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="entityTitle">
    <xs:restriction base="xs:string">
      <xs:minLength value="1"/>
      <xs:maxLength value="450"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="person">
    <xs:attribute name="firstName" use="required">
      <xs:simpleType>
        <xs:restriction base="m:personName">
          <xs:minLength value="1"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
    <xs:attribute name="middleName" type="m:personName" use="required"/>
    <xs:attribute name="lastName" type="m:personName" use="required"/>
  </xs:complexType>
</xs:schema>

Now, if you want to see an example of the amount of redundancy eliminated because we chose to separate the two principal entities and implemented two XML Schemas instead of one, look at the XML examples containing partial discography data of two well-known rock bands published at http://www.manning.com/SQLServerMVPDeepDives (both XML Schemas are also located there).

Enabling and maintaining the data flow
After implementing the data store part of the data model, we can now focus on the operational part of the logical model. We mentioned three data management operations that will be supported by our solution: entity creation, entity modification, and entity retrieval.
Regarding their relationship to the data flow, we can divide the supported data management operations into two groups:
- Inbound operations: govern the flow of data into the database. Create and Update are both inbound operations.
- Outbound operations: govern the flow of data out of the database. Read is the outbound operation.
With inbound operations, our objective should be clear. We’ll have to
- Extract the data from the XML source.
- Insert into the data store the data that doesn’t yet exist there.
- Update data that already exists in the data store to reflect the data extracted from the source.
With outbound operations, the objective is to
- Read the data from the database and return it in XML format.

Preparing the inbound data flow
Before we begin coding, we must consider all the relevant facts about the XML sources
used in our solution.
Both XML Schemas allow the XML to contain more than one entity. The related entities are nested in the source, which reflects the relationships between them. Not only must we extract the entities from the XML source, but we also have to do this in the correct order.
How do we determine the correct order? By reviewing the physical model, shown in figure 1, we can observe the dependencies between the individual sets of data (follow the arrows and identify where they all point to). When importing the data into the database, we should start with the independent entities and finish with the dependent ones.
This is a valid order of inbound operations for the Album XML Schema:
1. Title (doesn’t depend on any other entity)
2. Album (depends on Title)
3. Track (depends on Title and Album)
4. Person (doesn’t depend on any other entity)
5. Track Author (depends on Person and Track)
6. Band (doesn’t depend on any other entity)
7. Track Performer (depends on Band and Track)

This is a valid order of inbound operations for the Band XML Schema:
1. Band (doesn’t depend on any other entity)
2. Person (doesn’t depend on any other entity)
3. Band Member (depends on Person and Band)

EXTRACTING DATA FROM XML USING TRANSACT-SQL

You can choose from three data retrieval methods implemented in SQL Server 2005 and SQL Server 2008, and all the details regarding them are available in Books Online. In this chapter, we only need to know the bare essentials about these methods:
- The purpose of the value() method is to extract the value from a single XML data element (a singleton) and return it in the designated data type. We’ll use this method to extract the values from the XML nodes.
- The purpose of the query() method is to read data from one or more XML nodes and return a sequence of XML data elements or a single XML data element. The query() method can also be used to create XML data, but in this chapter we’ll only use it to retrieve data. The return type of the query() method is XML. We’ll use this method to specify the target of the extraction operation and to transform the source data if needed.
- The purpose of the nodes() method is to read data from an XML entity and return a set of XML nodes. This method returns a row of XML data for each node in the XML entity that corresponds to the given criteria. We’ll use this method to retrieve the data from the XML source in the form of a dataset representing a single entity or a single relationship between our entities.
The execution of all three methods is governed through an XQuery statement or an XPath expression passed to each of the methods as an argument. A detailed explanation of XQuery and XPath expressions is once again outside the scope of this chapter, but a brief version of the explanation is presented in the sidebar, “A few words on XPath expressions and XQuery statements.”
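
The three methods can be seen working together in the following minimal sketch. The orders document, its element names, and the aliases Src and OrderNode are hypothetical and unrelated to the Discography schemas.

```sql
declare @xml xml;
set @xml = N'<orders>
  <order id="1"><orderDate>20071224</orderDate></order>
  <order id="2"><orderDate>20080215</orderDate></order>
</orders>';

select
    -- value() returns a scalar in the designated SQL type
    Src.OrderNode.value('@id', 'int') as OrderId
    -- query() returns XML: here, the orderDate element of the current order
   ,Src.OrderNode.query('orderDate') as OrderDateXml
    -- value() needs a singleton, hence the [1] on a path that could match many nodes
   ,Src.OrderNode.value('(orderDate/text())[1]', 'int') as OrderDate
from
    -- nodes() returns one row per order element
    @xml.nodes('/orders/order') Src (OrderNode);
```

The query returns one row per order element, which is the shape needed before the rows can be inserted into a table.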

A few words on XPath expressions and XQuery statements
The XPath expression is the principal expression used in retrieving data from XML entities. It guides the XML processor as it traverses the XML entity toward the targets containing the data that you want to extract.
For example, the /orders/order/orderDate XPath expression points to all elements named orderDate that exist inside elements named order, which in turn exist inside the element named orders, which exists at the root of the XML entity.
We could compare the XPath expression with the FROM clause of a Transact-SQL (T-SQL) query.
An XPath expression can be extended with an XPath predicate, the purpose of which is to restrict the traversal of the XML entity even further.
For example, the /orders/order/orderDate[. > 20080101] XPath expression contains an XPath predicate (enclosed in square brackets) restricting the XPath expression to point to only those elements named orderDate that contain values greater than 20080101.
We could compare the XPath predicate with the WHERE clause of a T-SQL query.
Compared to the XPath expression, the XQuery statement provides additional functionality needed in extracting the data from XML entities and transforming it. An XQuery statement can also be used to write XML data. One or more XPath expressions are used in every XQuery statement.


In this chapter, no data management operations against XML entities will require any
knowledge of XQuery.
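
To make the FROM and WHERE analogy concrete, here is a small sketch that applies the sidebar's predicate to a hypothetical orders document (the sample values and the aliases are illustrative only).

```sql
declare @xml xml;
set @xml = N'<orders>
  <order><orderDate>20071224</orderDate></order>
  <order><orderDate>20080215</orderDate></order>
</orders>';

-- The path plays the role of FROM; the predicate in square brackets plays the role of WHERE.
-- Only orderDate elements with a value greater than 20080101 are returned.
select
    Src.OrderDateNode.value('.', 'int') as OrderDate
from
    @xml.nodes('/orders/order/orderDate[. > 20080101]') Src (OrderDateNode);
```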
In table 3, we can see the XPath expressions pointing to individual entities of the
Album XML Schema, and in table 4 we can see the XPath expressions pointing to individual entities of the Band XML Schema.
Table 3  XPath expressions used to extract the entities from the Album XML

Entity   XPath expression
Title    /ma:discography/ma:album
         /ma:discography/ma:album/ma:track
Album    /ma:discography/ma:album
Track    /ma:discography/ma:album/ma:track
Person   /ma:discography/ma:album/ma:track/ma:author
Band     /ma:discography/ma:album/ma:track/ma:band

Note that in tables 3 and 4, the names of the elements are prefixed with a reference to
the respective XML namespace implemented by each XML Schema. You can observe
all of the XML namespace declarations in listings 1 through 3. The namespaces are
declared in the xmlns attributes of the root (schema) element of each XML Schema.
Each XML Schema also targets a specific XML namespace, as declared in the targetNamespace attribute of the schema element. This specifies the namespace of the XML
entity in which a particular XML Schema is used.
Table 4  XPath expressions used to extract the entities from the Band XML

Entity   XPath expression
Band     /mb:bands/mb:band
Person   /mb:bands/mb:band/mb:member

A few words about XML namespaces
First of all, the subject of XML namespaces exceeds the scope of this chapter. But
what you should know about XML namespaces in order to understand their role in
these examples is that they represent the business domain in which a particular XML
entity exists.
In our examples, we’ve introduced three XML namespaces: one for Album data, another for Band data, and a third to represent a shared domain used both by the Album
and the Band domains.
Think about it: does an Album represent the same business entity as a Band? No,
absolutely not! Therefore, if we’ve decided on using XML to represent each of them,
we need a way to distinguish between them, and this is where XML namespaces
come in.
An XML entity that exists in the Album namespace can’t be mistaken for an XML entity
that exists in the Band namespace, although they’re both represented as XML. In
plain English: an Album can’t be a Band and a Band can’t be an Album.
Microsoft SQL Server 2005 and later versions support XML namespaces and introduce two methods used to declare them using T-SQL. Throughout this chapter we’ll be using the WITH XMLNAMESPACES clause to declare XML namespaces that will be used in XPath expressions. All the details regarding XML namespaces in SQL Server and the WITH XMLNAMESPACES clause can be found in Books Online.
General information regarding XML namespaces can also be found online: http://www.w3.org/TR/xml-names/.
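
A brief sketch of the WITH XMLNAMESPACES clause follows. The namespace URI urn:example:album and the sample document are hypothetical stand-ins for the chapter's actual namespaces.

```sql
declare @xml xml;
set @xml = N'<d:discography xmlns:d="urn:example:album">
  <d:album d:title="First Album"/>
  <d:album d:title="Second Album"/>
</d:discography>';

-- The prefix declared here (ma) need not match the prefix used in the document (d);
-- only the namespace URI has to match.
with xmlnamespaces ('urn:example:album' as ma)
select
    Discography.Album.value('@ma:title', 'nvarchar(450)') as Title
from
    @xml.nodes('/ma:discography/ma:album') Discography (Album);
```

Note that the statement preceding WITH must be terminated with a semicolon.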

Importing the data
Using the XPath expressions listed in tables 3 and 4, we can prepare individual T-SQL SELECT statements used to extract the data from the XML source. In these SELECT statements, we’ll use the XML retrieval methods mentioned earlier, and in the final definition of the query, we’ll include them in INSERT statements that will be used to import the data extracted from the XML source into the corresponding tables of our Discography database.
Note that in the INSERT statements, we’ll also have to prevent certain constraint violations. Most of all, we’ll need to prevent the import of data that already exists in the database.
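
One way to keep existing rows out of an import is sketched below. The table variable @Title stands in for a destination table, and the titles document is hypothetical; neither is taken from the chapter's downloadable scripts.

```sql
-- Stand-in for a destination table with a unique constraint on the natural key.
declare @Title table
    (
     TitleId int identity(1,1) primary key
    ,Title   nvarchar(450) not null unique
    );
insert @Title (Title) values (N'Existing Title');

declare @xml xml;
set @xml = N'<titles><t>Existing Title</t><t>New Title</t><t>New Title</t></titles>';

-- DISTINCT removes duplicates within the source;
-- NOT EXISTS skips the rows that are already stored at the destination.
insert @Title (Title)
select
    Incoming.Title
from
    (
    select distinct Src.TitleNode.value('.', 'nvarchar(450)') as Title
    from @xml.nodes('/titles/t') Src (TitleNode)
    ) Incoming
where
    not exists
    (
    select *
    from @Title Existing
    where Existing.Title = Incoming.Title
    );

select TitleId, Title from @Title order by TitleId;
```

Without the DISTINCT and NOT EXISTS filters, the repeated and pre-existing titles would violate the unique constraint.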
EXTRACTING ALBUM DATA

The source of the Album data is an XML entity based on the Album XML Schema
shown in listing 1 earlier in this chapter. This XML Schema provides the structure to
hold the data for the Title, Album, and Person entities, including data for the Track
Author and Track Performer associative entities.
All the details regarding XPath functions implemented in SQL Server are available
in the Books Online article titled “XQuery Functions against the xml Data Type.”


In the following examples, @xml designates a variable of the XML type holding the
XML data in question. First off, listing 4 shows the code to extract the titles.
Listing 4  Extracting the titles

with    xmlnamespaces
        (
         '…' as xsi
        ,'…' as m
        ,'…' as ma
        )
select  Discography.Album.query('data(@ma:title)').value('.', 'nvarchar(450)') as Title
from    @xml.nodes('/ma:discography/ma:album') Discography (Album)
union
select  Discography.Track.query('data(@ma:title)').value('.', 'nvarchar(450)')
from    @xml.nodes('/ma:discography/ma:album/ma:track') Discography (Track)


The Title entity contains both the Album and the Track titles. Because in SQL Server 2005 it’s not possible to specify a union XPath expression, the two sets must be merged into one using the T-SQL UNION clause.
If the union XPath expression were available, the query could be simplified as shown in listing 5.
Listing 5  Simplified query with union XPath expression

with    xmlnamespaces
        (
         '…' as xsi
        ,'…' as m
        ,'…' as ma
        )
select  Discography.Album.query('data(@ma:title)').value('.', 'nvarchar(450)') as Title
from    @xml.nodes
        ('
        /ma:discography/ma:album
        |
        /ma:discography/ma:album/ma:track
        ') Discography (Album)

Next up, listing 6 shows the code to extract the albums.
Listing 6  Extracting the albums

with    xmlnamespaces
        (
         '…' as xsi
        ,'…' as m
        ,'…' as ma
        )
select  Discography.Album.query('data(@ma:title)').value('.', 'nvarchar(450)') as Title
       ,Discography.Album.query('data(@ma:published)').value('.', 'datetime') as Published
from    @xml.nodes('/ma:discography/ma:album') Discography (Album)

Listing 7 shows the code to extract the tracks.
Listing 7  Extracting the tracks

with    xmlnamespaces
        (
         '…' as xsi
        ,'…' as m
        ,'…' as ma
        )
select  Discography.Track.query('data(@ma:title)').value('.', 'nvarchar(450)') as TrackTitle
       ,Discography.Track.query('data(@ma:trackNumber)').value('.', 'int') as TrackNumber
       ,Discography.Track.query('data(parent::ma:album/@ma:title)').value('.', 'nvarchar(450)') as AlbumTitle
       ,Discography.Track.query('data(parent::ma:album/@ma:published)').value('.', 'datetime') as Published
from    @xml.nodes('/ma:discography/ma:album/ma:track') Discography (Track)


Listing 8 shows the code to extract the persons.
Listing 8  Extracting the persons

with    xmlnamespaces
        (
         '…' as xsi
        ,'…' as m
        ,'…' as ma
        )
select  distinct
        Discography.Person.query('data(@m:firstName)').value('.', 'nvarchar(150)') as FirstName
       ,Discography.Person.query('data(@m:middleName)').value('.', 'nvarchar(150)') as MiddleName
       ,Discography.Person.query('data(@m:lastName)').value('.', 'nvarchar(150)') as LastName
from    @xml.nodes('/ma:discography/ma:album/ma:track/ma:author') Discography (Person)

Listing 9 shows the code to extract the bands.
Listing 9  Extracting the bands

with    xmlnamespaces
        (
         '…' as xsi
        ,'…' as m
        ,'…' as ma
        )
select  distinct
        Discography.Band.query('data(@ma:bandName)').value('.', 'nvarchar(450)') as [Name]
from    @xml.nodes('/ma:discography/ma:album/ma:track/ma:band') Discography (Band)

EXTRACTING BAND DATA

The source of the Band data is an XML entity based on the Band XML Schema shown
in listing 2 earlier in this chapter. This XML Schema provides all the data for the Band
and Person entities, including the data for the Band Member associative entity.
Compare the XML namespace declarations in listing 10 with the declarations in the code listings presented earlier. Is there something different? Why?

Listing 10  Extracting the bands

with    xmlnamespaces
        (
         '…' as xsi
        ,'…' as m
        ,'…' as mb
        )
select  distinct
        Bands.Band.query('data(@mb:bandName)').value('.', 'nvarchar(450)') as [Name]
       ,nullif(Bands.Band.query('data(@mb:established)').value('.', 'datetime'), N'') as Established
       ,nullif(Bands.Band.query('data(@mb:disbanded)').value('.', 'datetime'), cast(N'99991231' as datetime)) as Disbanded
from    @xml.nodes('/mb:bands/mb:band') Bands (Band)

Listing 11 shows the code to extract the persons.
Listing 11  Extracting the persons

with    xmlnamespaces
        (
         '…' as xsi
        ,'…' as m
        ,'…' as mb
        )
select  distinct
        Bands.Person.query('data(@m:firstName)').value('.', 'nvarchar(150)') as FirstName
       ,Bands.Person.query('data(@m:middleName)').value('.', 'nvarchar(150)') as MiddleName
       ,Bands.Person.query('data(@m:lastName)').value('.', 'nvarchar(150)') as LastName
from    @xml.nodes('/mb:bands/mb:band/mb:member') Bands (Person)

By combining the queries listed previously into a workflow of data management operations, we can design two SQL procedures, each with a specific purpose based on the
two principal business entities mentioned in the section “The XML Schema”: one procedure to save the Album data and one procedure to save the Band data.
TIP

Here’s a beginner’s trick for memorizing XML retrieval methods: Nodes
provide the set, query retrieves the data element, and value extracts the
data.

We haven’t discussed one important issue yet: the question of associative entities. As you may have observed in our examples, only the primary entities are listed. Why is that? The answer is simple: associative entities, representing the many-to-many relationships between the primary entities, can be retrieved from the XML source by combining the queries used in retrieving the data of the individual primary entities of a particular relationship. The combinations are listed in table 5.
Table 5  Retrieving the associative entities

Associative entity   Provided by combining these primary entities
Track Author         Track joined with Person, based on the nesting of the Author XML element inside the Track XML element of the Album XML
Track Performer      Track joined with Band, based on the nesting of the Band XML element inside the Track XML element of the Album XML
Band Member          Band joined with Person, based on the nesting of the Member XML element inside the Band XML element of the Band XML

Similarly, the one-to-many relationships between primary entities can be retrieved:
- A Track is related to the corresponding Album based on the nesting of the Track XML element inside the Album XML element.
- A Track is related to a Title based on the value of the Title XML attribute of the Track XML element.
- An Album is related to a Title based on the value of the Title XML attribute of the Album XML element.
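
Relationships expressed through nesting can be read with the parent axis, as in this minimal sketch. The document is a simplified, namespace-free, hypothetical stand-in for the Album XML; it pairs each author with the track that contains it, which is the information a Track Author row needs.

```sql
declare @xml xml;
set @xml = N'<discography>
  <album title="Album A">
    <track title="Track 1">
      <author firstName="Ann" lastName="Lee"/>
      <author firstName="Bob" lastName="Ray"/>
    </track>
  </album>
</discography>';

-- One row per author element; the parent step (..) supplies the owning track.
select
    Src.AuthorNode.value('../@title', 'nvarchar(450)') as TrackTitle
   ,Src.AuthorNode.value('@firstName', 'nvarchar(150)') as FirstName
   ,Src.AuthorNode.value('@lastName', 'nvarchar(150)') as LastName
from
    @xml.nodes('/discography/album/track/author') Src (AuthorNode);
```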
The scripts containing the definitions of the two procedures are available for download. You should study both definitions thoroughly before creating and/or attempting to use the procedures.
Note that in both procedures, table variables are used as temporary storage, which
provides the primary keys (based on IDENTITY columns) needed for preserving referential integrity in the Discography database. The dependency of individual business
entities was mentioned in the section “Preparing the inbound data flow.”
In both procedures, in the INSERT statements used to import the data into the
tables of the Discography database, observe the methods used to prevent the unique
and primary key constraint violations.
In brief, this is the operational flow used in both procedures:
1. From the XML source, extract the data that represents each primary entity (in the order mentioned in the section “Preparing the inbound data flow”).
2. Insert the data into the database table, but exclude rows that already exist at the destination (using the EXCEPT clause or the NOT EXISTS predicate).
3. Save the data of each primary entity in a table variable, including the surrogate key values that the rows received when they were inserted into the database table.
4. After both primary entities of a particular many-to-many relationship have been inserted and temporarily saved in the corresponding table variables, insert the data representing these relationships into the associative database tables.
5. After all the data has been extracted and all primary and associative entities have been inserted, the process finishes.
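
The flow above can be sketched as follows. This is an illustration under stated assumptions, not the code of the downloadable procedures: table variables stand in for the Discography tables, the XML is a simplified, namespace-free, hypothetical Band document, and the OUTPUT clause is one possible way to capture the generated surrogate keys.

```sql
-- Stand-ins for the destination tables.
declare @Band       table (BandId int identity(1,1) primary key, Name nvarchar(450) not null);
declare @Person     table (PersonId int identity(1,1) primary key, LastName nvarchar(150) not null);
declare @BandMember table (BandId int not null, PersonId int not null, primary key (BandId, PersonId));

-- Temporary storage for the surrogate keys generated by the inserts.
declare @BandKeys   table (BandId int not null, Name nvarchar(450) not null);
declare @PersonKeys table (PersonId int not null, LastName nvarchar(150) not null);

declare @xml xml;
set @xml = N'<bands>
  <band bandName="Band A"><member lastName="Lee"/><member lastName="Ray"/></band>
</bands>';

-- Extract and insert the bands, capturing the new keys.
insert @Band (Name)
output inserted.BandId, inserted.Name into @BandKeys (BandId, Name)
select distinct Src.BandNode.value('@bandName', 'nvarchar(450)')
from @xml.nodes('/bands/band') Src (BandNode);

-- Extract and insert the persons, capturing the new keys.
insert @Person (LastName)
output inserted.PersonId, inserted.LastName into @PersonKeys (PersonId, LastName)
select distinct Src.MemberNode.value('@lastName', 'nvarchar(150)')
from @xml.nodes('/bands/band/member') Src (MemberNode);

-- The nesting of member inside band yields the pairs for the associative table.
insert @BandMember (BandId, PersonId)
select distinct
    BandKeys.BandId
   ,PersonKeys.PersonId
from
    (
    select
        Src.MemberNode.value('../@bandName', 'nvarchar(450)') as BandName
       ,Src.MemberNode.value('@lastName', 'nvarchar(150)') as LastName
    from @xml.nodes('/bands/band/member') Src (MemberNode)
    ) Pairs
    inner join @BandKeys BandKeys on BandKeys.Name = Pairs.BandName
    inner join @PersonKeys PersonKeys on PersonKeys.LastName = Pairs.LastName;

select BandId, PersonId from @BandMember;
```

The sketch omits the exclusion of rows that already exist at the destination; in that case, the keys of the existing rows would also have to be looked up by their natural keys and added to the key tables.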

After you’ve carefully studied both stored procedures and have identified all the concepts presented in this chapter, prepare a T-SQL script to import the XML samples.
Execute the script in steps: one XML file at a time, observe the progress, and inspect
the tables of the Discography database after each step of the script has finished.
A sample script can also be downloaded from http://www.manning.com/SQLServerMVPDeepDives.

Exporting the data
To provide the outbound data flow, we pretty much have to do the opposite of what we achieved in the previous section: extract the data from the database and return it as XML.
In the data export, we’ll also implement both XML Schemas designed earlier; therefore we’ll need two retrieval procedures, one for the Album data and another for the Band data.
You should study the following two queries carefully in order to understand how the FOR XML clauses using the PATH declaration instruct the database engine to construct the XML entity. You can find all the details regarding the FOR XML clause in Books Online.
Let’s start with the simpler of the two queries. As we defined earlier in this chapter, a Discography contains one or more Bands containing one or more Members. In the Band XML Schema, the relationship between the Band and the Person entities is implemented in the form of XML elements representing the Band Members nested inside the XML element representing each individual Band.
In listing 12, you can observe how the FOR XML query used to retrieve the Person entity data is nested inside the FOR XML query used to retrieve the Band entity data. The result from the nested query is exposed as a column in the outer query, and the name of this column is specified in the PATH declaration of the inner query’s FOR XML clause (in our example, mb:member).
Listing 12  To export the Band data from the database

with    xmlnamespaces
        (
         '…' as xsi
        ,'…' as m
        ,'…' as mb
        )
select  Music.Band.Name as [@mb:bandName]
       ,Music.Band.Established as [@mb:established]
       ,Music.Band.Disbanded as [@mb:disbanded]
       ,(
        select  Music.Person.FirstName as [@m:firstName]
               ,Music.Person.MiddleName as [@m:middleName]
               ,Music.Person.LastName as [@m:lastName]
        from    Music.Person
                inner join Music.BandMember
                    on Music.BandMember.PersonId = Music.Person.PersonId
        where   (Music.BandMember.BandId = Music.Band.BandId)
        order by
                Music.Person.LastName
               ,Music.Person.FirstName
               ,Music.Person.MiddleName
        for xml path('mb:member'), type
        )
from    Music.Band
order by
        Music.Band.Name
for xml path('mb:band'), root('mb:bands'), type

The proper nesting of the data (namely, that the Band contains the correct Members)
is achieved by correctly referencing the Band Member associative entity, where the
many-to-many relationships between the Bands and the Persons are stored.
The outer query also uses a PATH declaration specifying the XML node (mb:band)
together with the ROOT declaration specifying the name of the XML root node.
The TYPE declaration is used to instruct the database engine to return the resultset
as XML data rather than character data, which is the default (if the TYPE declaration is
omitted).
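
The effect of the TYPE declaration can be seen in a small sketch. The table variable and its rows are hypothetical, and namespaces are left out for brevity.

```sql
declare @Band table (Name nvarchar(450) not null);
insert @Band (Name) values (N'Band A');
insert @Band (Name) values (N'Band B');

declare @asText nvarchar(max), @asXml xml;

-- Without TYPE, the subquery yields character data.
set @asText = (select Name as [@bandName] from @Band order by Name
               for xml path('band'), root('bands'));

-- With TYPE, the subquery yields an xml value that supports the xml methods.
set @asXml = (select Name as [@bandName] from @Band order by Name
              for xml path('band'), root('bands'), type);

select
    @asText as AsText
   ,@asXml.query('/bands/band[1]') as FirstBand;
```

This is also why the nested queries in the listings use TYPE: the inner result has to reach the outer query as XML so that it is embedded as elements rather than as escaped text.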
Once again, we began the T-SQL query with the XML namespaces declaration, providing us with all the necessary namespaces implemented by the corresponding Band
XML Schema.
The first thing that should be apparent from listing 13 is the added complexity
resulting from the deeper nesting of the Album XML entity. Remember how we
defined the Album XML Schema: a Discography contains one or more Albums containing one or more Tracks written by one or more Authors and performed by one or
more Bands.
Listing 13  To export the Album data from the database

with    xmlnamespaces
        (
         '…' as xsi
        ,'…' as m
        ,'…' as ma
        )
select  Music.Title.Title as [@ma:title]
       ,Music.Album.Published as [@ma:published]
       ,(
        select  Music.Title.Title as [@ma:title]
               ,Music.Track.TrackNumber as [@ma:trackNumber]
               ,(
                select  Music.Person.FirstName as [@m:firstName]
                       ,Music.Person.MiddleName as [@m:middleName]
                       ,Music.Person.LastName as [@m:lastName]
                from    Music.Person
                        inner join Music.TrackAuthor
                            on Music.TrackAuthor.PersonId = Music.Person.PersonId
                where   (Music.TrackAuthor.TrackId = Music.Track.TrackId)
                order by
                        Music.Person.LastName
                       ,Music.Person.FirstName
                       ,Music.Person.MiddleName
                for xml path('ma:author'), type
                )
               ,(
                select  Music.Band.Name as [@ma:bandName]
                from    Music.Band
                        inner join Music.TrackPerformer
                            on Music.TrackPerformer.BandId = Music.Band.BandId
                where   (Music.TrackPerformer.TrackId = Music.Track.TrackId)
                order by
                        Music.Band.Name
                for xml path('ma:band'), type
                )
        from    Music.Title
                inner join Music.Track
                    on Music.Track.TitleId = Music.Title.TitleId
        where   (Music.Track.AlbumId = Music.Album.AlbumId)
        order by
                Music.Track.TrackNumber
        for xml path('ma:track'), type
        )
from    Music.Title
        inner join Music.Album
            on Music.Album.TitleId = Music.Title.TitleId
order by
        Music.Album.Published
for xml path('ma:album'), root('ma:discography'), type

The queries to retrieve Person and Band data are nested inside the outer query used
to retrieve Track data, which is nested inside the outermost query used to retrieve
Album data. The result of each inner query is exposed to the outer query as a column
of the outer query’s resultset, and its name is specified by the inner query’s PATH declaration of the FOR XML clause.
The outermost query also uses the PATH declaration specifying the destination
XML node and the ROOT declaration specifying the root node of the destination XML
entity.
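The nesting pattern can be tried in isolation with a much smaller query. The following is a minimal sketch, not the chapter's procedure: the table variables @Album and @Track and their columns are hypothetical stand-ins for the Discography tables, and the XML namespaces are left out to keep the example short.

```sql
-- Hypothetical in-memory tables; they are NOT part of the Discography schema.
declare @Album table (AlbumId int primary key, Title nvarchar(100));
declare @Track table (TrackId int primary key, AlbumId int,
                      TrackNumber int, Title nvarchar(100));

insert @Album (AlbumId, Title) values (1, N'First Album');
insert @Track (TrackId, AlbumId, TrackNumber, Title) values (10, 1, 1, N'Opening');
insert @Track (TrackId, AlbumId, TrackNumber, Title) values (11, 1, 2, N'Closing');

select  a.Title as [@title]               -- attribute of the album node
        ,(
        -- Correlated inner query: one track node per track of this album.
        select  t.TrackNumber as [@trackNumber]
                ,t.Title as [@title]
        from    @Track as t
        where   (t.AlbumId = a.AlbumId)
        order by
                t.TrackNumber
        for xml path('track'), type       -- TYPE keeps the result as XML
        )
from    @Album as a
for xml path('album'), root('discography'), type;
```

The result should be a single discography root node containing one album node, with the two track nodes nested inside it. Removing the TYPE declaration from the inner query is a quick way to see why it matters: the inner result is then treated as character data, and its markup is escaped in the outer document.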

The XML namespaces declaration at the beginning of the query provides all the
necessary namespaces implemented by the corresponding Album XML Schema.
The queries presented in listings 11 and 12 are used in two SQL procedures, the
definitions of which can be downloaded from http://www.manning.com/SQLServerMVPDeepDives.
Review both procedures carefully before creating them in the Discography database. Pay attention to the optional input parameters used by the procedures, and how
they’re used in the queries to restrict the resultset.
Can you predict what would happen if the parameters weren’t specified when
using the procedures to retrieve data?

Preparing the sample data
As the development of a client application to create and edit XML data is outside the
scope of this chapter, you could resort to a generic solution such as Microsoft InfoPath, or design a custom application implementing the functionalities provided in
this chapter, or even use a text editor to create XML data.
InfoPath, for instance, provides a fairly simple way of designing forms based on
sample XML data or on an XML Schema, such as the two schemas used in this chapter.
In fact, the samples published to http://www.manning.com/SQLServerMVPDeepDives
have been created using two InfoPath forms based on the Album and the Band
XML Schemas. These forms can also be downloaded from the same location.

Homework
Even though this chapter spans several diverse subjects, it’s not as diverse as the reality
it tries to imitate. We left a few gaps; for one, we haven’t considered all the facts about
the discography business that can be observed in reality.
Here are some things you could do to improve this solution:
1 Create additional sample data:
   - Use the editor of your choice to add data to the sample XML entities.
   - Design InfoPath forms based on the XML Schemas designed in this chapter.
   - Design a custom client application implementing the XML Schemas and
     SQL procedures designed in this chapter.
2 Extend the entities with additional attributes:
   - Track Duration.
   - Album Description.
   - Lyrics.
3 Think about other facts about discographies:
   - Tracks aren’t performed by Bands; they’re performed by Musicians. Sometimes, Bands hire additional Musicians who aren’t Band Members to help
     them record.
   - Musicians play (different) Instruments and perform in different Roles as
     Band Members.
   - Persons join the Band at some time, and they can also leave the Band at some
     time. They can even join and leave a Band more than once.
   - More people are involved in making an Album than Authors and Artists.
   - Bands can share a Name, yet Bands with the same name rarely share their
     Origin.
4 Think about data management as presented in this chapter and data management in general:
   - Could the existing processes be optimized?
   - Could the error handling in the procedures be improved in any way?
   - What would be needed to support all data management operations
     (including Delete)?
   - What other possibilities in terms of data analysis does the data model provide?
5 Think about other possibilities of retrieving Album and/or Band data as XML
  corresponding to the appropriate XML Schema.

Summary
We took a real-life business and analyzed it, and interviewed an (imaginary) expert in
the business and summarized his responses. We then collected all the facts and used
them to design the logical model of the forthcoming software solution.
After applying a bit of good old normalization “magic,” we transformed the logical
model into a physical model that we could then implement in the form of a SQL
Server database, and also in the form of two XML Schemas.

The Discography database will serve as permanent storage for our discography data,
and the schema-governed XML will serve as temporary storage and provide a way of
transporting the data in and out of the permanent data store.
We’ve seen examples of the XML retrieval functionalities provided in Microsoft
SQL Server’s T-SQL language. Essential information was provided regarding the XML
standard, the XML Schema, the XML Query (or XQuery), the XPath expression and
XPath predicates, and last but not least, the XML namespaces. This essential information provides a first step into the world of XML, and shows ways of bridging the gap
between the world of XML and the world of SQL, using SQL Server 2005 or later.

About the author
Matija Lah graduated from the Faculty of Law at the University of
Maribor, Slovenia, in 1999. As a lawyer with extensive experience in IT, in 2001 he joined IUS SOFTWARE d.o.o., the leading
provider of legal information in Slovenia, where he first came
into contact with Microsoft SQL Server. In 2005, he decided to
pursue a career as a freelance consultant in the domain of general, business, and legal information. In 2006 this led him to
join AI-in-Law Future Technologies, Inc., a company that
applies artificial intelligence to the legal information domain.
Based on his continuous contributions to the SQL community, Microsoft gave him the
Most Valuable Professional award for SQL Server in 2007.

13 Full-text searching
Robert C. Cain

Search is everywhere. In addition to the powerful search engines available to us,
it seems like every website we visit has a search box for searching within that site.
Wouldn’t it be great to incorporate search within your applications? Fortunately,
SQL Server provides a powerful text search engine that’s as easy to use as one-two-three!

Foundations of full-text searching
Before we begin the step-by-step process of creating and using full-text indexes,
there are a few fundamentals that you’ll need to understand. Full-text search isn’t a
fancy way of doing a LIKE search with SQL. Instead, every word is placed into a special type of index called a full-text index. These indexes are organized and stored in
full-text catalogs, which act as containers to organize our indexes.
Each word in a full-text index also includes a unique key for that record. You
should note that in order to full-text index a table, SQL Server requires the table to
have a unique, single-column key. This single-column key is used as part of the
ranking functions we’ll cover later in this chapter.
All of the text-based data types are eligible for full-text searching. The complete list is char, nchar, varchar, nvarchar, text, ntext, xml, image, and varbinary(max). According to online documentation from Microsoft, text, ntext, and
image data types will be deprecated in future versions of SQL Server, so I suggest
avoiding these if you can.
Char, nchar, varchar, and nvarchar all make sense as candidates for full-text
indexing. XML also makes sense, because it’s text based, but adds the advantage
that markup tags are ignored—only the data is full-text indexed. The data type that
might have you scratching your head is varbinary(max). To understand this, we
have to briefly delve into the history of the full-text engine.
The code base for the full-text search engine included with SQL Server
descended from a product called Microsoft Index Server. With it, you could index various document types stored on your server, be it a Windows NT 4.0 server or IIS
(Internet Information Server). The ability to look inside documents and index their
content was retained and lives on in SQL Server’s full-text search engine.
SQL Server allows you to store various types of unstructured documents, such as
Microsoft Word, Excel, and many others inside a varbinary(max) field. If the full-text
engine recognizes the type of document stored in a varbinary(max) field, it’ll open
the document and index all words contained in the document.
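To check which document types the full-text engine on a particular instance recognizes, the sys.fulltext_document_types catalog view can be queried. This is a small sketch; the list returned depends on the filters installed on the server, so no specific output is assumed here.

```sql
-- List the file extensions the full-text engine can open and index,
-- together with the filter component registered for each of them.
select  document_type      -- file extension, such as .doc or .xml
        ,[path]            -- location of the filter DLL
from    sys.fulltext_document_types
order by
        document_type;
```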
We’re almost ready to dig into some code, but before we do, you should note
that all of the examples in this chapter use the AdventureWorks2008 database.
This is freely available from Microsoft’s CodePlex site. As of this writing, you can
find AdventureWorks at http://www.codeplex.com/MSFTDBProdSamples/Release/ProjectReleases.aspx.

Creating and maintaining catalogs
The “one” in our one-two-three concerns the catalog. The catalog is a logical container to hold a group of one or more full-text indexes. Creating a catalog is fairly
straightforward. Let’s look at the basic statement to create one (for a complete syntax
diagram, refer to the SQL Server Books Online):
CREATE FULLTEXT CATALOG AdventureWorksFTC
AS DEFAULT;

First, note that you’ll want to supply the name of your catalog in place of AdventureWorksFTC. If you only have one full-text catalog for your database, I suggest using the
same name as the database followed by FTC (for full-text catalog), as in the example.
The optional AS DEFAULT tells SQL Server to use this particular catalog as the
default for all full-text commands if no catalog is specified. It’s a good idea to specify
at least one catalog as the default, and if you only have one, you definitely want to add
this to the statement.
That’s all there is to it; you now have an empty catalog waiting for your indexes.
Before we start loading it with full-text indexes, though, let’s take a moment to look at
a few commands available for maintaining the catalog.
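In a deployment script it can be handy to create the catalog only when it doesn't exist yet, and to confirm which catalog is the default. The following sketch assumes the catalog name used in this chapter and relies on the sys.fulltext_catalogs catalog view.

```sql
-- Create the catalog only if it is missing, so the script can be rerun safely.
if not exists (select 1
               from   sys.fulltext_catalogs
               where  name = N'AdventureWorksFTC')
    create fulltext catalog AdventureWorksFTC as default;

-- Show every catalog in the current database and which one is the default.
select  name
        ,is_default
from    sys.fulltext_catalogs;
```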

The first two are similar to each other, in that they update all of the indexes in the
catalog, but they do it in quite different ways. The first is the REBUILD command.
ALTER FULLTEXT CATALOG AdventureWorksFTC REBUILD;

This will go through each index and rebuild it from the source tables. It’s the fastest,
most efficient way to rebuild an entire catalog, but it has the side effect of taking the
catalog offline—your catalog won’t be available for your users to do any full-text
searching. If your operation is a 9-to-5 shop and you’re doing a rebuild during off
hours, then REBUILD is the way to go. But what if your operation runs 24 hours a day?
For those situations, we have the REORGANIZE command:
ALTER FULLTEXT CATALOG AdventureWorksFTC REORGANIZE;

The REORGANIZE command optimizes the indexes in place, merging the smaller index fragments
created during indexing into one large index (a master merge) without taking the catalog offline.
Your users will still be able to use and query the catalog normally. The downside is that
this is a lot slower than doing a rebuild.
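Whichever command is used, the state of the catalog can be observed while the work is in progress. This is a minimal sketch using the built-in FULLTEXTCATALOGPROPERTY function with the catalog name from this chapter.

```sql
select  -- 0 means the catalog is idle; other values indicate population activity
        fulltextcatalogproperty(N'AdventureWorksFTC', 'PopulateStatus') as PopulateStatus
        -- 1 while a master merge is in progress, otherwise 0
        ,fulltextcatalogproperty(N'AdventureWorksFTC', 'MergeStatus') as MergeStatus
        -- number of items currently indexed in the catalog
        ,fulltextcatalogproperty(N'AdventureWorksFTC', 'ItemCount') as ItemCount;
```

Polling this query from a maintenance script is one way to wait for a rebuild to finish before letting users back in.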

Fortunately, doing either a rebuild or a reorganize to your catalog is fairly rare.
The most likely call for this is during database updates that span the majority of
tables in the database. When making mass updates, you may find a significant speed
increase by turning off full-text indexing (using techniques shown later in this chapter),
doing the updates, turning indexing back on, and then rebuilding/reorganizing the
entire catalog.
The final command is quite simple: it sets a catalog to be the new default catalog.
ALTER FULLTEXT CATALOG AdventureWorksFTC AS DEFAULT;

Like the previous commands, this isn’t something you’ll use often. Perhaps in a
long script, you may want to change the default catalog temporarily to make your
coding easier.

Creating and maintaining full-text indexes
Now that the catalog exists, we’re ready to create indexes to put in it. In this section,
we’ll see not only how to create a full-text index, but how to maintain it.

Creating the full-text index
The second step in our one-two-three process is to create a full-text index. In the
AdventureWorks database is a table called Production.ProductDescription. Full-text
searching through product descriptions seems like a logical thing users would want to
do, so we’ll use this table. The next piece of information we need to know is what columns to search on. If you examine the table in SQL Server Management Studio, you’ll
see it only has one column that’s eligible for full-text searching: Description. The final
thing we need to know is the name of the unique index. Expanding the Keys branch
in Management Studio shows us one key, named PK_ProductDescription_ProductDescriptionID. Armed with this information, we can now issue the command to create our full-text index on this table:
CREATE FULLTEXT INDEX ON Production.ProductDescription
([Description])
KEY INDEX PK_ProductDescription_ProductDescriptionID
ON AdventureWorksFTC
WITH CHANGE_TRACKING AUTO;

We start by issuing CREATE FULLTEXT INDEX ON XXX (replacing XXX with the name of the
table we want to index). Note something interesting, though: at no point do we give
the full-text index a name. With full-text indexing, each table is allowed to have one
and only one full-text index. Because of this, SQL Server takes care of creating a
unique index name for us, allowing us to refer to it by the table name.
The single full-text index per table isn’t the limitation it might seem at first,
because you can have as many columns as you want in the index, as line two of the preceding code shows. List each column in parentheses, separated by commas. You can
also add and remove columns later, as we’ll see momentarily.
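Because Production.ProductDescription has only one eligible column, a multi-column index is easier to show on a different table. The sketch below uses a hypothetical table, dbo.Article, with a hypothetical primary key name, PK_Article; neither exists in AdventureWorks. It assumes the AdventureWorksFTC catalog has already been created.

```sql
-- Hypothetical table, used only to illustrate a multi-column full-text index.
create table dbo.Article
(
    ArticleId int           not null constraint PK_Article primary key,
    Title     nvarchar(200) not null,
    Body      nvarchar(max) not null
);

-- One full-text index per table, covering both text columns.
create fulltext index on dbo.Article
    (Title, Body)
    key index PK_Article
    on AdventureWorksFTC
    with change_tracking auto;
```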
The next line, KEY INDEX, asks you to specify the unique index for your table. This
will typically be your primary key index. The important thing is that it be a single-column, non-nullable unique index. SQL Server requires this in order to perform its
ranking functions, discussed later in this chapter.
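Instead of browsing Management Studio, the candidate key indexes can also be listed with a query against the catalog views. This is a sketch of one possible approach; it returns the unique indexes on the table that have exactly one key column, where that column doesn't allow NULLs.

```sql
select  i.name as IndexName
        ,min(c.name) as KeyColumn
from    sys.indexes as i
        inner join sys.index_columns as ic
            on  ic.object_id = i.object_id
            and ic.index_id = i.index_id
        inner join sys.columns as c
            on  c.object_id = ic.object_id
            and c.column_id = ic.column_id
where   i.object_id = object_id(N'Production.ProductDescription')
        and i.is_unique = 1
        and ic.is_included_column = 0              -- key columns only
group by
        i.name
having  count(*) = 1                               -- single-column index
        and min(cast(c.is_nullable as int)) = 0;   -- column is NOT NULL
```

On the table used in this chapter, the list should include PK_ProductDescription_ProductDescriptionID.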
The fourth line, ON AdventureWorksFTC, is optional if you have AdventureWorksFTC
as your default catalog. If you omit it, the full-text index will be placed into the default
catalog. If you have multiple catalogs for your system, you can add the catalog name in
order to place your new full-text index in a catalog other than the default.
The next line, WITH CHANGE_TRACKING, is probably the most important line in the
statement. It defines how SQL Server will manage your full-text index; therefore,
understanding the options is key to understanding how your index will get updated.
The AUTO option is the most straightforward, so we’ll tackle it first.
With AUTO, every time a row in your table is updated, SQL Server will update the full-text
index associated with that table. This is by far the easiest way to manage your full-text
indexes, but it can cause performance penalties if your table has a large number of
updates in a short time span. I’d like to give you a more definitive statement than
“large,” but it depends on a variety of variables. How beefy is your server? How much
RAM is installed? What’s the speed of the disks? And is the catalog on the same drive as
the database or a different one? All of these come into play; my best advice is to set up
your index in a test environment with the change tracking set to AUTO, and then test
with a load that simulates your production environment. If you can measure an unacceptable decrease in performance, you can instead set change tracking to MANUAL.
With MANUAL change tracking, each time a row is updated in your table, SQL Server
sets an internal flag that marks that row as having been updated—but no action is
taken to update your full-text index. To update the full-text index, you must issue an
ALTER command, which we’ll cover in detail shortly. This method is much more efficient and faster than using AUTO. It does have a downside, though, in that there’s a
time delay. You have to set up a SQL Server Agent job to issue the ALTER
command at a frequency acceptable to your users. Thus, there will be some time delay
between when a user updates a record in a table and when that data is available to be
full-text searched on. For tables with large numbers of updates, MANUAL is definitely
the preferred method.
The last option, OFF, will create the full-text index and populate it, but will make
no further updates to the index afterward. It won’t track changes to the table, as
with manual mode, nor will it set up the index to be automatically updated. OFF mode
would be useful with static tables—tables where you don’t plan on doing updates. Perhaps these are lookup tables, or they’re tables from a legacy system you want to be
able to report on for historical purposes, but that will never be updated.
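To confirm which change-tracking mode a full-text index is actually using, the sys.fulltext_indexes catalog view can be queried. A minimal sketch for the table used in this chapter:

```sql
select  object_name(fi.object_id) as TableName
        ,fi.is_enabled                    -- 1 when the index is enabled
        ,fi.change_tracking_state_desc    -- AUTO, MANUAL, or OFF
        ,fi.has_crawl_completed           -- 1 once the last population finished
from    sys.fulltext_indexes as fi
where   fi.object_id = object_id(N'Production.ProductDescription');
```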
With all three options, when you create the index, SQL Server immediately populates the full-text index from the source table. There may be times when this is undesirable. With AUTO or MANUAL, you don’t have much choice, but with OFF mode, there’s
an additional option: OFF, NO POPULATION. When you tack on NO POPULATION, SQL
Server will create the full-text index but not populate it. This would be useful when
you want to break your scripts into two parts—one to create the full-text indexes, and
a second you’d use later to populate them, perhaps during off hours, using the ALTER
statement as shown later in this chapter.
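The two-part approach might look like the following sketch. It is a variation of the CREATE statement shown earlier, and it assumes that Production.ProductDescription doesn't have a full-text index yet, because a table can have only one.

```sql
-- Script 1: create the index, but leave it empty for now.
create fulltext index on Production.ProductDescription
    ([Description])
    key index PK_ProductDescription_ProductDescriptionID
    on AdventureWorksFTC
    with change_tracking off, no population;

-- Script 2 (run later, for example during off hours): populate the index.
alter fulltext index on Production.ProductDescription
    start full population;
```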
Now that the full-text index has been created, we’re ready for step three, querying
data from our full-text index. Before we proceed, let’s take a few moments to examine
how to maintain our full-text indexes.

Maintaining full-text indexes
Anyone with five minutes of experience in the computer industry knows that the one
thing that’s constant is change. SQL Server provides many ways to change our full-text
indexes, most of which are variations of the ALTER command. Let’s look at some ways
to maintain the index we just created in the previous section. The first statement is
ALTER FULLTEXT INDEX ON Production.ProductDescription
START UPDATE POPULATION;

This is the command to update the full-text index when CHANGE_TRACKING is set to
MANUAL, and probably the command you’ll use the most. When issued, SQL Server will
roll through all of the rows in the table, and will update the corresponding full-text
index for rows that have been marked as updated. To make life easier, you could issue
this command from a scheduled SQL Server job on a timed basis.
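Such a job could be scripted with the SQL Server Agent stored procedures in msdb. The following is a sketch only: the job name, the schedule name, and the 15-minute interval are hypothetical choices, the database name assumes the AdventureWorks2008 sample, and the index is assumed to use MANUAL change tracking.

```sql
use msdb;
go

-- Job and schedule names below are hypothetical.
exec dbo.sp_add_job
    @job_name = N'FT update - ProductDescription';

exec dbo.sp_add_jobstep
    @job_name      = N'FT update - ProductDescription',
    @step_name     = N'Start update population',
    @subsystem     = N'TSQL',
    @database_name = N'AdventureWorks2008',
    @command       = N'ALTER FULLTEXT INDEX ON Production.ProductDescription START UPDATE POPULATION;';

exec dbo.sp_add_schedule
    @schedule_name        = N'Every 15 minutes',
    @freq_type            = 4,    -- daily
    @freq_interval        = 1,    -- every day
    @freq_subday_type     = 4,    -- repeat in units of minutes
    @freq_subday_interval = 15;   -- every 15 minutes

exec dbo.sp_attach_schedule
    @job_name      = N'FT update - ProductDescription',
    @schedule_name = N'Every 15 minutes';

-- Register the job on the local server so that the Agent will run it.
exec dbo.sp_add_jobserver
    @job_name = N'FT update - ProductDescription';
```

The interval is the knob that trades freshness of search results against load on the server, so it is worth choosing together with the users of the application.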
A corresponding command is the full population command:
ALTER FULLTEXT INDEX ON Production.ProductDescription
START FULL POPULATION;

This command will rebuild the entire full-text index for this table from the ground
up. You’d likely want to use this if you had turned off full-text indexing in order to
update the source table and were now ready to get it back in sync with the full-text
index.
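A full population runs in the background, so the ALTER statement returns before the index is ready. One way to watch its progress is the OBJECTPROPERTYEX function, as in this small sketch:

```sql
declare @TableId int;
set @TableId = object_id(N'Production.ProductDescription');

select  -- 0 = idle, 1 = full population in progress
        objectpropertyex(@TableId, 'TableFulltextPopulateStatus') as PopulateStatus
        -- number of rows that have been full-text indexed so far
        ,objectpropertyex(@TableId, 'TableFulltextItemCount') as ItemCount;
```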
The next two commands will allow us to add and remove columns from our full-text index:
ALTER FULLTEXT INDEX ON Production.ProductDescription
ADD ([Description]);
ALTER FULLTEXT INDEX ON Production.ProductDescription
DROP ([Description]);

All you need to do is indicate the column you want to add or drop, and SQL Server will
take care of the rest.
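After adding or dropping columns, it can be useful to verify which columns the full-text index currently covers. This sketch joins the sys.fulltext_index_columns catalog view to sys.columns to return the column names:

```sql
select  c.name as ColumnName
        ,fic.language_id          -- LCID of the word breaker used for the column
from    sys.fulltext_index_columns as fic
        inner join sys.columns as c
            on  c.object_id = fic.object_id
            and c.column_id = fic.column_id
where   fic.object_id = object_id(N'Production.ProductDescription');
```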
It’s also possible to alter the change-tracking mode after you create the full-text
index. The change-tracking mode works like it does when creating the index. For
example, if we wanted to change the tracking mode on the product description table,
we’d issue this:
ALTER FULLTEXT INDEX ON Production.ProductDescription
SET CHANGE_TRACKING MANUAL;

To set it back to AUTO: