The Language of SQL- P22 pdf

The result is:
FirstName LastName
Cary Grant
Matching by Sound
Let’s turn from matching letters and characters to matching sounds. SQL provides
two functions that give you some interesting ways to compare the sounds of words
or phrases. The two functions are SOUNDEX and DIFFERENCE.
Let’s first look at an example that utilizes the SOUNDEX function:
SELECT
SOUNDEX ('Smith') AS 'Sound of Smith',
SOUNDEX ('Smythe') AS 'Sound of Smythe'
The result is:
Sound of Smith    Sound of Smythe
S530              S530
The SOUNDEX function always returns a four-character response, which is a sort
of code for the sound of the phrase. The first character is always the first letter of
the phrase. In this case, the first character is S because both Smith and Smythe
begin with an S.
The remaining three characters are calculated from an analysis of the sound of
the rest of the phrase. Internally, the function first removes all vowels and the
letter Y. So, the function takes the MITH from SMITH and converts it to MTH.
Likewise, it takes the MYTHE from SMYTHE and converts it to MTH. It then
assigns a digit to each remaining consonant sound: M becomes 5 and T becomes 3,
while H, like the vowels, carries no code. Because only two digits result, the
code is padded with a zero, which gives 530.
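
The encoding steps described above can be sketched in Python. This is a minimal illustration that assumes the classic American Soundex rules; the letter-to-digit table and the function name soundex are assumptions for illustration, not the internals of any particular database, and a real engine may differ at edge cases.

```python
# Letter groups and digits of classic American Soundex (assumed here;
# a given database's SOUNDEX may differ at edge cases).
CODES = {}
for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                     ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in group:
        CODES[letter] = digit


def soundex(phrase):
    """Return a four-character sound code such as 'S530'."""
    letters = [ch for ch in phrase.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code = letters[0]                     # the first letter is kept as-is
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = CODES.get(ch, "")         # vowels, Y, H, and W have no digit
        if digit and digit != previous:   # skip repeats of the same sound
            code += digit
        if ch not in "HW":                # H and W do not separate repeats
            previous = digit
    return (code + "000")[:4]             # pad with zeros, keep four characters


print(soundex("Smith"), soundex("Smythe"))
```

Both names reduce to the letter S followed by the digits for M and T, and the padding supplies the final zero.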
Since SOUNDEX returns a value of S530 for both Smith and Smythe, you can
conclude that they probably have very similar sounds.
Microsoft SQL Server provides one additional function, called DIFFERENCE,
which works in conjunction with the SOUNDEX function.
DATABASE DIFFERENCES: MySQL and Oracle
The DIFFERENCE function isn’t available in MySQL or Oracle.
Here’s an example, using the same words:
SELECT
DIFFERENCE ('Smith', 'Smythe') AS 'The Difference'
The result is:
The Difference
4
The DIFFERENCE function always requires two arguments. Internally, the
function first retrieves the SOUNDEX values for each of the arguments and then
compares those values. If it returns a value of 4, as in the previous example, that
means that all four characters in the SOUNDEX value are identical. A value of 0
means that none of the characters is identical. Therefore, a DIFFERENCE value
of 4 indicates the highest possible match, and a value of 0 is the lowest possible
match.
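
The comparison just described can be sketched in Python as a position-by-position count of matching characters. This follows the description in the text only; the helper names soundex and difference are hypothetical, the Soundex rules are the classic American ones (an assumption), and SQL Server's own scoring may differ for some pairs, so treat it as an illustration of the idea rather than a reimplementation.

```python
# Classic American Soundex digit table (assumed for illustration).
CODES = {letter: digit
         for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                              ("L", "4"), ("MN", "5"), ("R", "6"))
         for letter in group}


def soundex(phrase):
    """Return a four-character sound code such as 'J500'."""
    letters = [ch for ch in phrase.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    code, previous = letters[0], CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        if ch not in "HW":
            previous = digit
    return (code + "000")[:4]


def difference(first, second):
    """Count the positions at which the two sound codes agree (0 to 4)."""
    return sum(a == b for a, b in zip(soundex(first), soundex(second)))


print(difference("Smith", "Smythe"))   # both codes are S530
print(difference("Julie", "John"))     # J400 against J500
```

Under this simplified rule, Smith and Smythe agree in all four positions, while Julie and John agree in three.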
With this in mind, here’s an example of how the DIFFERENCE function can be
used to retrieve values that are very similar in sound to a specific phrase.
Working from the Actors table, you’re going to attempt to find rows with a first
name that sounds like John. The SELECT statement is:
SELECT
FirstName,
LastName
FROM Actors
WHERE DIFFERENCE (FirstName, 'John') = 4
The results are:
FirstName LastName
Jon Voight
John Wayne
Chapter 9: Inexact Matches
The DIFFERENCE function concluded that both John and Jon had a difference
value of 4 between the name and the specified value of John.
If you want to analyze exactly why these two rows were selected, you can alter
your SELECT to show both the SOUNDEX and DIFFERENCE values for all rows
in the table:
SELECT
FirstName,
LastName,
DIFFERENCE (FirstName, 'John') AS 'Difference Value',
SOUNDEX (FirstName) AS 'Soundex Value'
FROM Actors
This returns:
FirstName   LastName      Difference Value   Soundex Value
Cary        Grant         2                  C600
Mary        Steenburgen   2                  M600
Jon         Voight        4                  J500
Dustin      Hoffman       1                  D235
John        Wayne         4                  J500
Gary        Cooper        2                  G600
Julie       Andrews       3                  J400
Notice that both Jon Voight and John Wayne have a SOUNDEX value of J500 and
a DIFFERENCE value of 4 for their first names. This explains why they were
initially selected. Also notice that Julie Andrews has a DIFFERENCE value of 3. If
you had specified a WHERE clause where the DIFFERENCE value equaled 3 or 4,
that actor would have been selected as well.
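
The effect of loosening the threshold can be checked with a short Python sketch. It takes the Difference Value column from the table above as given data instead of recomputing it, and the names rows and select_by_difference are hypothetical.

```python
# Rows copied from the result table above:
# (FirstName, LastName, Difference Value against 'John').
rows = [
    ("Cary", "Grant", 2),
    ("Mary", "Steenburgen", 2),
    ("Jon", "Voight", 4),
    ("Dustin", "Hoffman", 1),
    ("John", "Wayne", 4),
    ("Gary", "Cooper", 2),
    ("Julie", "Andrews", 3),
]


def select_by_difference(rows, minimum):
    """Keep the rows whose difference value is at least `minimum`."""
    return [(first, last) for first, last, value in rows if value >= minimum]


strict = select_by_difference(rows, 4)   # same filter as the WHERE ... = 4 query
loose = select_by_difference(rows, 3)    # accepts a value of 3 or 4
print(strict)
print(loose)
```

Because 4 is the highest possible value, a minimum of 3 is the same condition as "equal to 3 or 4", and it adds Julie Andrews to the two rows returned by the stricter filter.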
Looking Ahead
This concludes our study of matching phrases by pattern or sound. Matching by
patterns is an important and widely used function of SQL. Any time you enter a
word in a search box and attempt to retrieve all entities containing that word,
you are utilizing pattern matching. Efforts to match by sound are much less
common. The technology exists, but there is an inherent difficulty in translating
words to sounds. The English language, or any language for that matter, contains
too many quirks and exceptions for such a match to be reliable.
In our next chapter, “Summarizing Data,” we’re going to turn our attention to
ways to separate data into groups and summarize the values in those groups with
various statistics. Back in Chapter 4, we talked about scalar functions. The next
chapter will introduce another type of function, called aggregate functions. These
aggregate functions will allow you to summarize your data in many useful ways.
For example, you’ll be able to look at any group of orders and determine the
number of orders, the total dollar amount of the orders, and the average order
size. With these techniques, you’ll be able to move beyond the presentation of
detailed data and begin to truly add value for your users as you deliver
summarized information.
Chapter 10
Summarizing Data
Keywords Introduced: DISTINCT, SUM, AVG, MIN, MAX, COUNT, GROUP BY, HAVING
Up until now, we’ve been presenting data basically as it exists in a database. Sure,
we’ve used some functions to move things around and have created some additional
calculations, but the rows we’ve retrieved have corresponded to rows in
the underlying database. We now want to turn to various methods to summarize
our data.
The computer term usually associated with this type of endeavor is aggregation,
which means “to combine into groups.” The ability to aggregate and summarize
your data is key to being able to move beyond a mere display of data to
something approaching real information. There’s a bit of magic involved when users
view summarized data in a report. They understand and appreciate that you’ve
been able to extract some real meaning from the mass of data in a database, in
order to present a clearer picture of what it all means.
Eliminating Duplicates
Although it doesn’t provide a true aggregation, the most elementary way
to summarize data is to eliminate duplicates. SQL has a keyword named DISTINCT,
which provides an easy way to remove duplicate rows from your output.
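
As a small sketch of the idea, the following Python snippet runs a SELECT DISTINCT query against an in-memory SQLite database. The Purchases table, its columns, and its rows are hypothetical and exist only for illustration; they are not taken from the book's sample data.

```python
import sqlite3

# Hypothetical table and data, used only to show the effect of DISTINCT.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Purchases (Customer TEXT, Product TEXT)")
conn.executemany(
    "INSERT INTO Purchases VALUES (?, ?)",
    [("Ann", "Pen"), ("Bob", "Ink"), ("Ann", "Paper"), ("Ann", "Pen")],
)

# Without DISTINCT, every row comes back, including repeated customers.
all_rows = conn.execute(
    "SELECT Customer FROM Purchases ORDER BY Customer").fetchall()

# With DISTINCT, each customer appears only once in the output.
distinct_rows = conn.execute(
    "SELECT DISTINCT Customer FROM Purchases ORDER BY Customer").fetchall()

print(all_rows)
print(distinct_rows)
conn.close()
```

DISTINCT changes only the output of the query; the four rows stored in the table are untouched.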