Nielsen c19.tex V4 - 07/21/2009 1:02pm Page 502
Part III Beyond Relational
The other solution to the multiple-column search problem consists of adding an additional column
to hold all the text to be searched and duplicating the data from the original columns to a
FullTextSearch column within an after trigger or using a persisted computed column. This solution
is not smooth either. It duplicates data and costs performance time during inserts and updates. The crux
of the decision regarding how to solve the multiple-column search is the conflict between fast reads and
fast writes: OLAP versus OLTP.
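The persisted-computed-column variant might look like the following sketch. It assumes that the
Fable sample table has character columns named Title, Moral, and FableText (the name Moral is
inferred from the later discussion of the title and moral columns, so treat it as hypothetical) and
that a full-text index already exists on the table. The column name FullTextSearch follows the text
above.

```sql
-- Sketch only: the Moral column name and the column data types are assumptions.
-- Concatenate the searchable columns into one persisted computed column.
ALTER TABLE Fable
  ADD FullTextSearch AS
      (ISNULL(Title, '') + ' ' + ISNULL(Moral, '') + ' ' + ISNULL(FableText, ''))
      PERSISTED;
GO

-- Add the new column to the table's existing full-text index.
ALTER FULLTEXT INDEX ON Fable ADD (FullTextSearch);
GO
```

The trade-off is the one described above: every insert and update now also maintains the duplicated
text, in exchange for a single column to search.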
Searches with wildcards
Because the full-text search engine has its roots in Windows Index and was not a SQL Server–developed
component, its wildcards use the standard DOS conventions (asterisk for a multi-character wildcard, and
double quotes) instead of SQL-style wildcards and SQL single quotes.
The other thing to keep in mind about full-text wildcards is that they work only at the end of a word,
not at the beginning. Indexes search from the beginning of strings, as shown here:
SELECT Title
  FROM Fable
  WHERE CONTAINS (*, '"Hunt*"');
Result:
Title

The Hunter and the Woodman
The Ass in the Lion’s Skin
The Bald Knight
Phrase searches
Full-text search can attempt to locate full phrases if those phrases are surrounded by double quotes.
For example, to search for the fable about the boy who cried wolf, searching for “Wolf! Wolf!” does
the trick:
SELECT Title
  FROM Fable
  WHERE CONTAINS (*, '"Wolf! Wolf!"');
Result:


Title

The Shepherd’s Boy and the Wolf
Word-proximity searches
When searching large documents, it’s nice to be able to specify the proximity of the search words.
Full-text search implements a proximity switch by means of the NEAR option. The relative distance
between the words is calculated, and, if the words are close enough (within about 30 words, depending
on the size of the text), then full-text search returns true for the row.
The story of Androcles, the slave who pulls the thorn from the lion’s paw, is one of the longer fables
in the sample database, so it’s a good test sample.
The following query attempts to locate the fable “Androcles” based on the proximity of the words
“pardoned” and “forest” in the fable’s text:
SELECT Title
  FROM Fable
  WHERE CONTAINS (*, 'pardoned NEAR forest');
Result:
Title

Androcles
The proximity switch can handle multiple words. The following query tests the proximity of the words
“lion,” “paw,” and “bleeding”:
SELECT Title
  FROM Fable
  WHERE CONTAINS (*, 'lion NEAR paw NEAR bleeding');

Result:
Title

Androcles
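CONTAINS also accepts the tilde (~) as shorthand for NEAR, so the three-word proximity test can be
written more compactly. This is a sketch against the same Fable sample table, and it is expected to
behave the same way as the NEAR form:

```sql
-- The tilde (~) is equivalent to the NEAR keyword in a CONTAINS search condition.
SELECT Title
  FROM Fable
  WHERE CONTAINS (*, 'lion ~ paw ~ bleeding');
```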
The proximity feature can be used with CONTAINSTABLE; the RANK indicates relative proximity.
The following query ranks the fables that mention the word “life” near the word “death” in order of
proximity:
SELECT Fable.Title, FTS.Rank
  FROM Fable
    INNER JOIN CONTAINSTABLE (Fable, *, 'life NEAR death') AS FTS
      ON Fable.FableID = FTS.[KEY]
  ORDER BY FTS.Rank DESC;
Result:
Title                            Rank

The Serpent and the Eagle        7
The Eagle and the Arrow          1
The Woodman and the Serpent      1
Word-inflection searches
The full-text search engine can actually perform linguistic analysis and base a search for different words
on a common root word. This enables you to search for words without worrying about number or
tense. For example, the inflection feature makes possible a search for the word “flying” that finds a
row containing the word “flew.” The language you specify for the table is critical in a case like this.
Something else to keep in mind is that the word base will not cross parts of speech, meaning that
a search for a noun won’t locate a verb form of the same root. The following query demonstrates
inflection by locating the fable with the word “flew” in “The Crow and the Pitcher”:
SELECT Title
  FROM Fable
  WHERE CONTAINS (*, 'FORMSOF(INFLECTIONAL, fly)');
Result:
Title

The Crow and the Pitcher
The Bald Knight
Thesaurus searches
The full-text search engine has the capability to perform thesaurus lookups for word replacements as
well as synonyms. To configure your own thesaurus options, edit the thesaurus file. The location of the
thesaurus file depends on your language and server.
The thesaurus file for your language follows the naming convention TSXXX.xml, where XXX is your
language code (e.g., ENU for U.S. English, ENG for U.K. English, and so on). You need to remove the
comment lines from your thesaurus file. If you open the file in a text editor, you will see two sections,
or nodes: an expansion node and a replacement node. The expansion node is used to expand your search
argument from one term to a set of equivalent terms. For example, in the thesaurus file,
you will find the following expansion:
<expansion>
  <sub>Internet Explorer</sub>
  <sub>IE</sub>
  <sub>IE5</sub>
</expansion>
This will convert any searches on “IE” to search on “IE” or “IE5” or “Internet Explorer.”
The replacement node is used to replace a search argument with another argument. For example, if you
want the search argument sex interpreted as gender, you could use the replacement node to do that:
<replacement>
  <pat>sex</pat>
  <sub>gender</sub>
</replacement>

The pat element (sex) indicates the pattern you want substituted by the sub element (gender).
A FREETEXT query will automatically use the thesaurus file for the language type. Here is an example
of a generational query using the THESAURUS option:
SELECT * FROM TableName WHERE CONTAINS(*, 'FORMSOF(THESAURUS, "IE")');
This returns matches to rows containing IE, IE5, and Internet Explorer.
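Edits to a thesaurus file do not take effect until the file is reloaded. In SQL Server 2008, one way
to do that without restarting the service is the sp_fulltext_load_thesaurus_file system procedure.
The sketch below assumes that the edited file is the U.S. English one (LCID 1033); substitute the
LCID that matches your own file.

```sql
-- 1033 is the LCID for U.S. English (an assumption for this example).
-- Reparse and reload the thesaurus file for that language.
EXEC sys.sp_fulltext_load_thesaurus_file 1033;
```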
Variable-word-weight searches
In a search for multiple words, relative weight may be assigned, making one word critical to the search
and another word much less important. The weights are set on a scale of 0.0 to 1.0.
The ISABOUT option enables weighting, and any hit on the given word allows the rows to be returned,
so it functions as an implied Boolean OR operator.
The following two queries use the weight option with CONTAINSTABLE to highlight the differences
among the words “lion,” “brave,” and “eagle” as the weighting changes. The query will examine only the
FableText column to prevent the results from being skewed by the shorter lengths of the text found
on the title and moral columns. The first query weights the three words evenly:
SELECT Fable.Title, FTS.Rank
  FROM Fable
    INNER JOIN CONTAINSTABLE
      (Fable, FableText,
       'ISABOUT (Lion weight (.5),
                 Brave weight (.5),
                 Eagle weight (.5))') AS FTS
      ON Fable.FableID = FTS.[KEY]
  ORDER BY Rank DESC;
Result:
Title                            Rank

Androcles                        92
The Eagle and the Fox            85
The Hunter and the Woodman       50
The Serpent and the Eagle        50
The Dogs and the Fox             32
The Eagle and the Arrow          21
The Ass in the Lion’s Skin       16
When the relative importance of the word “eagle” is elevated, it’s a different story:
SELECT Fable.Title, FTS.Rank
  FROM Fable
    INNER JOIN CONTAINSTABLE
      (Fable, FableText,
       'ISABOUT (Lion weight (.2),
                 Brave weight (.2),
                 Eagle weight (.8))') AS FTS
      ON Fable.FableID = FTS.[KEY]
  ORDER BY Rank DESC;
Result:
Title                            Rank

The Eagle and the Fox            102
The Serpent and the Eagle        59
The Eagle and the Arrow          25
Androcles                        25
The Hunter and the Woodman       14
The Dogs and the Fox             9
The Ass in the Lion’s Skin       4
When all the columns participate in the full-text search, the small size of the moral and the title make
the target words seem relatively more important within the text. The next query uses the same weighting
as the previous query but includes all columns (*):
SELECT Fable.Title, FTS.Rank
  FROM Fable
    INNER JOIN CONTAINSTABLE
      (Fable, *,
       'ISABOUT (Lion weight (.2),
                 Brave weight (.2),
                 Eagle weight (.8))') AS FTS
      ON Fable.FableID = FTS.[KEY]
  ORDER BY Rank DESC;
Result:
Title                            Rank

The Wolf and the Kid             408
The Hunter and the Woodman       408
The Eagle and the Fox            102
The Eagle and the Arrow          80
The Serpent and the Eagle        80
Androcles                        25
The Ass in the Lion’s Skin       23
The Dogs and the Fox             9
The ranking is relative, and is based on word frequency, word proximity, and the relative importance of
a given word within the text. “The Wolf and the Kid” does not contain an eagle or a lion, but two
factors favor bravado. First, “brave” is a rarer word than “lion” or “eagle” in both the column and the table.
Second, the word “brave” appears in the moral as one of only 10 words. So even though “brave” was
weighted less, it rises to the top of the list. It’s all based on word frequencies and statistics (and
sometimes, I think, the phase of the moon!).
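Because rank values are meaningful only relative to one another within a single query, a common use
is to keep just the best few matches. CONTAINSTABLE takes an optional final argument,
top_n_by_rank, for this purpose. The following is a minimal sketch that reuses the weighted
all-columns query; the value 3 is arbitrary.

```sql
SELECT Fable.Title, FTS.Rank
  FROM Fable
    INNER JOIN CONTAINSTABLE
      (Fable, *,
       'ISABOUT (Lion weight (.2),
                 Brave weight (.2),
                 Eagle weight (.8))',
       3) AS FTS   -- top_n_by_rank: return only the 3 highest-ranked matches
      ON Fable.FableID = FTS.[KEY]
  ORDER BY FTS.Rank DESC;
```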
Fuzzy Searches
While the CONTAINS predicate and CONTAINSTABLE-derived table perform exact word searches, the
FREETEXT predicate expands on the CONTAINS functionality to include fuzzy, or approximate, full-text
searches from free-form text.
Instead of searching for two or three words and adding the options for inflection and weighting, the
fuzzy search handles the complexity of building searches that make use of all the full-text search engine
options, and tries to solve the problem for you. Internally, the free-form text is broken down into
multiple words and phrases, and the full-text search with inflections and weighting is then performed on
the result.
Freetext
FREETEXT works within a WHERE clause just like CONTAINS, but without all the options. The following
query uses a fuzzy search to find the fable about the big race:
SELECT Title
  FROM Fable
  WHERE FREETEXT
    (*, 'The tortoise beat the hare in the big race');
Result:
Title

The Hare and the Tortoise
FreetextTable

Fuzzy searches benefit from the FREETEXTTABLE-derived table, which returns the ranking in the same way
that CONTAINSTABLE does. The two queries shown in this section demonstrate a fuzzy full-text search
using the FREETEXTTABLE-derived table. Here is the first query:
SELECT Fable.Title, FTS.Rank
  FROM Fable
    INNER JOIN FREETEXTTABLE
      (Fable, *, 'The brave hunter kills the lion') AS FTS
      ON Fable.FableID = FTS.[KEY]
  ORDER BY Rank DESC;
Result:
Title                            Rank

The Hunter and the Woodman       257
The Ass in the Lion’s Skin       202
The Wolf and the Kid             187
Androcles                        113
The Dogs and the Fox             100
The Goose With the Golden Eggs   72
The Shepherd’s Boy and the Wolf  72
Here is the second query:
SELECT Fable.Title, FTS.Rank
  FROM Fable
    INNER JOIN FREETEXTTABLE
      (Fable, *, 'The eagle was shot by an arrow') AS FTS
      ON Fable.FableID = FTS.[KEY]
  ORDER BY Rank DESC;
Result:
Title                              Rank

The Eagle and the Arrow            288
The Eagle and the Fox              135
The Serpent and the Eagle          112
The Hunter and the Woodman         102
The Father and His Two Daughters   72
Performance
SQL Server 2008’s full-text search engine is several orders of magnitude faster than the engine in
previous versions of SQL Server. However, you still might want to tune your system for optimal
performance.
■ iFTS benefits from a very fast disk subsystem. Place your catalog on its own controller, preferably
its own RAID 10 array. A sweet spot exists for SQL iFTS on eight-way servers. After a full or
incremental population, force a master merge, which will consolidate all the shadow indexes
into a single master index, by issuing the following command:
ALTER FULLTEXT CATALOG catalog_name REORGANIZE;
■ You can also increase the maximum number of ranges that the gathering process can use. To
do so, issue the following command:
EXEC sp_configure 'max full-text crawl range', 32;
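Note that max full-text crawl range is an advanced option, so sp_configure will not accept the
change unless advanced options are visible, and the new value is not applied until RECONFIGURE
runs. A fuller version of the command, as a sketch (the value 32 is simply the one used above):

```sql
-- Make advanced options visible to sp_configure.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Raise the number of ranges the gatherer can use during a full crawl.
EXEC sp_configure 'max full-text crawl range', 32;
RECONFIGURE;
```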
Summary
SQL Server indexes are not designed for searching for words in the middle of a column. If the database
project requires flexible word searches, then Integrated Full-Text Search (iFTS) is the perfect solution,
even though it requires additional development and administrative work.
■ iFTS requires configuring a catalog for each table to be searched.
■ iFTS catalogs are not populated synchronously within the SQL Server transaction. They are
populated asynchronously following the transaction. The recommended method is using
Change Tracking, which can automatically push changes as they occur.
■ CONTAINS is used within the WHERE clause and performs simple word searches, but it can
also perform inflectional, proximity, and thesaurus searches.
■ CONTAINSTABLE functions like CONTAINS but it returns a data set that can be referenced in
a FROM clause.
■ FREETEXT and FREETEXTTABLE essentially turn on every advanced feature of iFTS and
perform a fuzzy word search.
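As an illustration of the Change Tracking point in the list above, automatic change tracking is a
setting on the table's full-text index. A minimal sketch, assuming the Fable sample table already
has a full-text index:

```sql
-- Propagate data changes to the full-text index automatically as rows change.
ALTER FULLTEXT INDEX ON Fable SET CHANGE_TRACKING AUTO;
```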
As you read through this “Beyond Relational” part of the book, I hope you’re getting a sense of the
breadth of data SQL Server can manage. The next chapter concludes this part with Filestream, a new
way to store large BLOBs with SQL Server.
Nielsen p04.tex V4 - 07/21/2009 1:06pm Page 511
Developing with SQL Server
IN THIS PART
Chapter 20: Creating the Physical Database Schema
Chapter 21: Programming with T-SQL
Chapter 22: Kill the Cursor!
Chapter 23: T-SQL Error Handling
Chapter 24: Developing Stored Procedures
Chapter 25: Building User-Defined Functions
Chapter 26: Creating DML Triggers
Chapter 27: Creating DDL Triggers
Chapter 28: Building the Data Abstraction Layer
Chapter 29: Dynamic SQL and Code Generation
Part II of this book was all about writing set-based queries. Part III
extended the select command to data types beyond relational.
This part continues to expand on select to provide programmable
flow of control to develop server-side solutions; and SQL Server has a large
variety of technologies to choose from to develop server-side code, from
the mature T-SQL language to .NET assemblies hosted within SQL Server.
This part opens with DDL commands (create, alter, and drop), and
progresses through 10 chapters of Transact-SQL that build on one another
into a crescendo with the data abstraction layer and dynamic SQL. The final
chapter fits CLR programming into the picture.
So, unleash the programmer within and have fun. There’s a whole world of
developer possibilities with SQL Server 2008.
If SQL Server is the box, then Part IV is all about thinking inside the box,
and moving the processing as close to the data as possible.