Microsoft SQL Server 2008 Study Guide, part 37

Nielsen c13.tex V4 - 07/21/2009 12:48pm Page 322
Part II Manipulating Data With Select
4 43877 2001-08-01 00:00:00.000
5 43894 2001-08-01 00:00:00.000
6 43895 2001-08-01 00:00:00.000
7 43911 2001-08-01 00:00:00.000
1 44109 2001-09-01 00:00:00.000
1 44285 2001-10-01 00:00:00.000
1 44483 2001-11-01 00:00:00.000
2 44501 2001-11-01 00:00:00.000
As expected, the windowed sort (in this case, the RowNumber column) restarts with every new month.
Ranking Functions
The windowing capability (the OVER() clause) by itself doesn’t create any query output columns; that’s
where the ranking functions come into play:
■ row_number
■ rank
■ dense_rank
■ ntile
Just to be explicit, the ranking functions all require the OVER() clause.
All the normal aggregate functions, such as SUM(), MIN(), MAX(), and COUNT(*), can also be
used with the OVER() clause as windowed functions.
Row_number() function
The ROW_NUMBER() function generates an on-the-fly auto-incrementing integer according to the sort
order of the OVER() clause. It’s similar to Oracle’s RowNum column.
The row number function simply numbers the rows in the query result; there’s absolutely no
correlation with any physical address or absolute row number. This is important because in a relational
database, row position, number, and order have no meaning. It also means that as rows are added or
deleted from the underlying data source, the row numbers for the query results will change. In addition,
if there are sets of rows with the same values in all ordering columns, then their order is undefined, so
their row numbers may change between two executions even if the underlying data does not change.
One common practical use of the ROW_NUMBER() function is to filter by the row number values for
pagination. For example, a query that easily produces rows 21–40 would be useful for returning the
second page of data for a web page. Just be aware that the rows in the pages may change between
requests as the underlying data changes; for that reason, a paging query typically grabs its data
from a temp table.
It would seem that the natural way to build a row number pagination query would be to simply add the
OVER() clause and ROW_NUMBER() function to the WHERE clause:
SELECT ROW_NUMBER() OVER(ORDER BY OrderDate, SalesOrderID) as RowNumber,
SalesOrderID
FROM Sales.SalesOrderHeader
WHERE SalesPersonID = 280
AND ROW_NUMBER() OVER(ORDER BY OrderDate, SalesOrderID)
BETWEEN 21 AND 40
ORDER BY RowNumber;
Result:
Msg 4108, Level 15, State 1, Line 4
Windowed functions can only appear in the SELECT or ORDER BY clauses.
Because the WHERE clause occurs very early in the query processing (often in the query operation that
actually reads the data from the data source) and the OVER() clause occurs late in the query
processing, the WHERE clause doesn’t yet know about the windowed sort of the data or the ranking
function. The WHERE clause can’t possibly filter by the generated row number.

There is a simple solution: Embed the windowing and ranking functionality in a subquery or common
table expression:
SELECT RowNumber, SalesOrderID, OrderDate, SalesOrderNumber
FROM (
SELECT ROW_NUMBER() OVER(ORDER BY OrderDate, SalesOrderID) as RowNumber, *
FROM Sales.SalesOrderHeader
WHERE SalesPersonID = 280
) AS Q
WHERE RowNumber BETWEEN 21 AND 40
ORDER BY RowNumber;
Result:
RowNumber SalesOrderID OrderDate SalesOrderNumber

21 45041 2002-01-01 00:00:00.000 SO45041
22 45042 2002-01-01 00:00:00.000 SO45042
23 45267 2002-02-01 00:00:00.000 SO45267
24 45283 2002-02-01 00:00:00.000 SO45283
25 45295 2002-02-01 00:00:00.000 SO45295
26 45296 2002-02-01 00:00:00.000 SO45296
27 45303 2002-02-01 00:00:00.000 SO45303
28 45318 2002-02-01 00:00:00.000 SO45318
29 45320 2002-02-01 00:00:00.000 SO45320
30 45338 2002-02-01 00:00:00.000 SO45338
31 45549 2002-03-01 00:00:00.000 SO45549
32 45783 2002-04-01 00:00:00.000 SO45783
33 46025 2002-05-01 00:00:00.000 SO46025
34 46042 2002-05-01 00:00:00.000 SO46042
35 46052 2002-05-01 00:00:00.000 SO46052
36 46053 2002-05-01 00:00:00.000 SO46053

37 46060 2002-05-01 00:00:00.000 SO46060
38 46077 2002-05-01 00:00:00.000 SO46077
39 46080 2002-05-01 00:00:00.000 SO46080
40 46092 2002-05-01 00:00:00.000 SO46092
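The same pagination pattern can also be written with a common table expression instead of a subquery. The following is only a minimal sketch, not the book's code: it runs against SQLite (version 3.25 or later, which supports window functions) through Python's sqlite3 module rather than SQL Server, and the 60-row SalesOrderHeader table, the Numbered CTE name, and the fetch_page() helper are all made up for illustration.

```python
import sqlite3

# Hypothetical miniature stand-in for Sales.SalesOrderHeader (not AdventureWorks data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesOrderHeader ("
    "SalesOrderID INTEGER PRIMARY KEY, OrderDate TEXT, SalesPersonID INTEGER)"
)
sample_rows = [(1000 + i, f"2002-{(i % 12) + 1:02d}-01", 280) for i in range(60)]
conn.executemany("INSERT INTO SalesOrderHeader VALUES (?, ?, ?)", sample_rows)

# A window function in the WHERE clause is rejected here too,
# much like Msg 4108 in SQL Server.
try:
    conn.execute(
        "SELECT SalesOrderID FROM SalesOrderHeader "
        "WHERE ROW_NUMBER() OVER (ORDER BY OrderDate, SalesOrderID) BETWEEN 21 AND 40"
    )
    where_rejected = False
except sqlite3.Error:
    where_rejected = True

# The CTE numbers the rows first; the outer query then filters on the number.
PAGE_QUERY = """
WITH Numbered AS (
    SELECT ROW_NUMBER() OVER (ORDER BY OrderDate, SalesOrderID) AS RowNumber,
           SalesOrderID, OrderDate
    FROM SalesOrderHeader
    WHERE SalesPersonID = :person
)
SELECT RowNumber, SalesOrderID, OrderDate
FROM Numbered
WHERE RowNumber BETWEEN :first AND :last
ORDER BY RowNumber;
"""

def fetch_page(page, page_size=20, person=280):
    """Return one page of rows; page numbers start at 1 (hypothetical helper)."""
    first = (page - 1) * page_size + 1
    last = page * page_size
    params = {"person": person, "first": first, "last": last}
    return conn.execute(PAGE_QUERY, params).fetchall()

page2 = fetch_page(2)
print(where_rejected, len(page2), page2[0], page2[-1])
```

Including SalesOrderID after OrderDate in the window's ORDER BY gives every row a unique sort key, so the page boundaries stay stable as long as the data itself does not change.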
The second query in this chapter, in the “Partitioning within the Window” section, showed how
grouping the sort order of the window generated row numbers that started over with every new partition.
Rank() and dense_rank() functions
The RANK() and DENSE_RANK() functions return values as if the rows were competing according to
the windowed sort order. Any ties are grouped together with the same ranked value. For example, if
Frank and Jim both tied for third place, then they would both receive a RANK() value of 3.
Using sales data from AdventureWorks2008, there are ties for least sold products, which makes it a
good table to play with RANK() and DENSE_RANK(). ProductIDs 943 and 911 tie for third place,
and ProductIDs 927 and 898 tie for fourth or fifth place depending on how ties are counted:
Least Sold Products:
SELECT ProductID, COUNT(*) as [count]
FROM Sales.SalesOrderDetail
GROUP BY ProductID
ORDER BY COUNT(*);
Result (abbreviated):
ProductID count

897 2

942 5
943 6
911 6
927 9
898 9
744 13
903 14

Examining the sales data using windowing and the RANK() function returns the ranking values:
SELECT ProductID, SalesCount,
RANK() OVER (ORDER BY SalesCount) as [Rank],
DENSE_RANK() OVER (ORDER BY SalesCount) as [DenseRank]
FROM (SELECT ProductID, COUNT(*) as SalesCount
FROM Sales.SalesOrderDetail
GROUP BY ProductID
) AS Q
ORDER BY [Rank];
Result (abbreviated):
ProductID SalesCount Rank DenseRank

897 2 1 1
942 5 2 2
943 6 3 3
911 6 3 3
927 9 5 4
898 9 5 4
744 13 7 5
903 14 8 6

This example perfectly demonstrates the difference between RANK() and DENSE_RANK(). RANK()
counts each tie as a ranked row. In this example, ProductIDs 943 and 911 both tie for third place
but consume the third and fourth rows in the ranking, placing ProductID 927 in fifth place.
DENSE_RANK() handles ties differently. Tied rows only consume a single value in the ranking, so
the next rank is the next place in the ranking order. No ranks are skipped. In the previous query,
ProductID 927 is in fourth place using DENSE_RANK().
Just as with the ROW_NUMBER() function, RANK() and DENSE_RANK() can be used with a partitioned
OVER() clause. The previous example could be partitioned by product category to rank product sales
within each category.
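To see the tie handling and the partitioned form side by side, here is a small sketch that runs in SQLite (3.25 or later) through Python's sqlite3 module rather than in SQL Server. The ProductID and SalesCount values are the eight rows shown in the abbreviated result above; the ProductSales table name, the Category column, and its 'A'/'B' values are hypothetical and exist only to give PARTITION BY something to work with.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ProductSales (ProductID INTEGER, Category TEXT, SalesCount INTEGER)"
)
# Sales counts come from the abbreviated result; the categories are invented.
conn.executemany(
    "INSERT INTO ProductSales VALUES (?, ?, ?)",
    [(897, "A", 2), (942, "A", 5), (943, "A", 6), (911, "B", 6),
     (927, "B", 9), (898, "B", 9), (744, "A", 13), (903, "B", 14)],
)

# One window over the whole set: RANK() leaves gaps after ties, DENSE_RANK() does not.
overall = conn.execute("""
    SELECT ProductID, SalesCount,
           RANK()       OVER (ORDER BY SalesCount) AS Rnk,
           DENSE_RANK() OVER (ORDER BY SalesCount) AS DenseRnk
    FROM ProductSales
    ORDER BY SalesCount, ProductID
""").fetchall()

# Partitioned window: the ranking restarts at 1 within each category.
by_category = conn.execute("""
    SELECT ProductID, Category, SalesCount,
           RANK() OVER (PARTITION BY Category ORDER BY SalesCount) AS CategoryRank
    FROM ProductSales
    ORDER BY Category, SalesCount, ProductID
""").fetchall()

for row in overall:
    print(row)
for row in by_category:
    print(row)
```

In the partitioned query the only change is the PARTITION BY inside OVER(); the ranking function itself is untouched.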
Ntile() function
The fourth ranking function organizes the rows into n number of groups, called tiles, and returns the tile
number. For example, if the result set has ten rows, then NTILE(5) would split the ten rows into five
equally sized tiles with two rows in each tile in the order of the OVER() clause’s ORDER BY.
If the number of rows is not evenly divisible by the number of tiles, then the first tiles each get one
extra row. For example, for 74 rows and 10 tiles, the first 4 tiles get 8 rows each, and tiles 5 through 10
get 7 rows each. This can skew the results for smaller data sets. For example, 15 rows into 10 tiles would
place 10 rows in the lower five tiles and only place five rows in the upper five tiles. But for larger data
sets (splitting a few hundred rows into 100 tiles, for example) it works great.
This rule also applies if there are fewer rows than tiles. The rows are not spread across all tiles; instead,
the tiles are filled until the rows are consumed. For example, if five rows are split using NTILE(10),
the result set would not use tiles 1, 3, 5, 7, and 9, but instead show tiles 1, 2, 3, 4, and 5.
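The distribution rules are easy to check by counting the rows that land in each tile. This is a sketch only, run against SQLite (3.25 or later) from Python instead of SQL Server; the Numbers table and the tile_sizes() helper are hypothetical names introduced for the experiment.

```python
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")

def tile_sizes(row_count, tiles):
    """Return {tile_number: rows_in_tile} when row_count rows are split by NTILE(tiles)."""
    conn.execute("DROP TABLE IF EXISTS Numbers")
    conn.execute("CREATE TABLE Numbers (n INTEGER)")
    conn.executemany(
        "INSERT INTO Numbers VALUES (?)", [(i,) for i in range(1, row_count + 1)]
    )
    # The tile count is forced to an int before being placed in the SQL text.
    query = f"SELECT NTILE({int(tiles)}) OVER (ORDER BY n) AS Tile FROM Numbers"
    return dict(Counter(tile for (tile,) in conn.execute(query)))

print(tile_sizes(10, 5))    # evenly divisible
print(tile_sizes(74, 10))   # the first tiles take the extra rows
print(tile_sizes(15, 10))   # small set: lower tiles are twice the size of upper tiles
print(tile_sizes(5, 10))    # fewer rows than tiles
```

Tiles that receive no rows simply never appear in the result, which is why the last call reports only five tiles.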
A common real-world example of NTILE() is the percentile scoring used in college entrance exams.
The following query first calculates the AdventureWorks2008 products’ sales quantity in the
subquery. The outer query then uses the OVER() clause to sort by the sales count, and the NTILE(100)
to calculate the percentile according to the sales count:
SELECT ProductID, SalesCount,
NTILE(100) OVER (ORDER BY SalesCount) as Percentile
FROM (SELECT ProductID, COUNT(*) as SalesCount
FROM Sales.SalesOrderDetail
GROUP BY ProductID
) AS Q
ORDER BY Percentile DESC;
Result (abbreviated):
ProductID SalesCount Percentile

712 3382 100
870 4688 100
921 3095 99
873 3354 99
707 3083 98
711 3090 98
922 2376 97

830 33 5
888 39 5

902 20 4
950 28 4
946 30 4
744 13 3
903 14 3
919 16 3
911 6 2
927 9 2
898 9 2
897 2 1
942 5 1
943 6 1
Like the other three ranking functions, NTILE() can be used with a partitioned OVER() clause.
Similar to the ranking example, the previous example could be partitioned by product category to generate
percentiles within each category.
Aggregate Functions
SQL query functions all fit together like a magnificent puzzle. A fine example is how windowing
can use not only the four ranking functions (ROW_NUMBER(), RANK(), DENSE_RANK(), and
NTILE()) but also the standard aggregate functions: COUNT(*), MIN(), MAX(), and so on, which
were covered in the last chapter.
I won’t rehash the aggregate functions here, and usually the aggregate functions will fit well within a
normal aggregate query, but here’s an example of using the SUM() aggregate function in a window to
calculate the total sales order count for each product subcategory, and then, using that result from the
window, calculate the percentage of sales orders for each product within its subcategory:
SELECT ProductID, Product, SalesCount,
NTILE(100) OVER (ORDER BY SalesCount) as Percentile,
SubCat,
CAST(CAST(SalesCount AS NUMERIC(9,2))
/ SUM(SalesCount) OVER(PARTITION BY SubCat)
* 100 AS NUMERIC (4,1)) AS PercOfSubCat
FROM (SELECT P.ProductID, P.[Name] AS Product,
PSC.[Name] AS SubCat, COUNT(*) as SalesCount
FROM Sales.SalesOrderDetail AS SOD
JOIN Production.Product AS P
ON SOD.ProductID = P.ProductID
JOIN Production.ProductSubcategory AS PSC
ON P.ProductSubcategoryID = PSC.ProductSubcategoryID
GROUP BY PSC.[Name], P.[Name], P.ProductID
) AS Q
ORDER BY Percentile DESC;
Result (abbreviated):
ProductID Product SalesCount Percentile SubCat PercOfSubCat

870 Water Bottle - 30 oz. 4688 100 Bottles and Cages 55.6
712 AWC Logo Cap 3382 100 Caps 100.0
921 Mountain Tire Tube 3095 99 Tires and Tubes 17.7
873 Patch Kit/8 Patches 3354 99 Tires and Tubes 19.2
707 Sport-100 Helmet, Red 3083 98 Helmets 33.6
711 Sport-100 Helmet, Blue 3090 98 Helmets 33.7
708 Sport-100 Helmet, Black 3007 97 Helmets 32.8
922 Road Tire Tube 2376 97 Tires and Tubes 13.6
878 Fender Set - Mountain 2121 96 Fenders 100.0
871 Mountain Bottle Cage 2025 96 Bottles and Cages 24.0
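The windowed SUM() is what makes the percentage possible without a second aggregate query: every row keeps its own SalesCount while also seeing its subcategory's total. The sketch below isolates that step. It is an illustration under assumptions, not the book's query: it runs in SQLite (3.25 or later) via Python's sqlite3 module, the ProductSales table is hypothetical, and it holds only the Helmets, Caps, and Fenders rows from the abbreviated result, whose subcategories appear complete there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ProductSales (ProductID INTEGER, SubCat TEXT, SalesCount INTEGER)"
)
# Pre-aggregated counts copied from the abbreviated result above.
conn.executemany(
    "INSERT INTO ProductSales VALUES (?, ?, ?)",
    [(707, "Helmets", 3083), (711, "Helmets", 3090), (708, "Helmets", 3007),
     (712, "Caps", 3382), (878, "Fenders", 2121)],
)

# SUM() OVER (PARTITION BY ...) repeats the subcategory total on every row
# of that subcategory, so each row can be divided by it directly.
perc = conn.execute("""
    SELECT ProductID, SubCat, SalesCount,
           SUM(SalesCount) OVER (PARTITION BY SubCat) AS SubCatTotal,
           ROUND(100.0 * SalesCount
                 / SUM(SalesCount) OVER (PARTITION BY SubCat), 1) AS PercOfSubCat
    FROM ProductSales
    ORDER BY SubCat, ProductID
""").fetchall()

for row in perc:
    print(row)
```

Multiplying by 100.0 forces floating-point division here, playing the same role as the inner CAST to NUMERIC(9,2) in the SQL Server query.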


Summary
Windowing, an extremely powerful technology that creates an independent sort of the query
results, supplies the sort order for the ranking functions, which calculate row numbers, ranks, dense
ranks, and n-tiles. When coding a complex query that makes the data twist and shout, creative use of
windowing and ranking can be the difference between solving the problem in a single query or resorting
to temp tables and code.
The key point to remember is that the OVER() clause generates the sort order for the ranking functions.
This chapter wraps up the set of chapters that explain how to query the data. The next chapters finish
up the part on select by showing how to package queries into reusable views, and add insert, update,
delete, and merge verbs to queries to modify data.
(In case you haven’t checked yet and still need to know: The hidden arrow in the FedEx logo is
between the E and the X.)
Nielsen c14.tex V4 - 07/21/2009 12:49pm Page 329
Projecting Data Through Views
IN THIS CHAPTER
Planning views wisely
Creating views with Management Studio or DDL
Updating through views
Performance and views
Nesting views
Security through views
Synonyms
A view is the saved text of a SQL SELECT statement that may be referenced
as a data source within a query, similar to how a subquery can be used as
a data source: no more, no less. A view can’t be executed by itself; it
must be used within a query.
Views are sometimes described as “virtual tables.” This isn’t an accurate description
because views don’t store any data. Like any other SQL query, views merely
refer to the data stored in tables.
With this in mind, it’s important to fully understand how views work, the pros
and cons of using views, and the best place to use views within your project
architecture.
Why Use Views?
While there are several opinions on the use of views, ranging from total abstinence
to overuse, the Information Architecture Principle (from Chapter 2, “Smart
Database Design”) serves as a guide for their most appropriate use. The principle
states that “information must be made readily available in a usable format for
daily operations and analysis by individuals, groups, and processes ...”
Presenting data in a more usable format is precisely what views do best.
Based on the premise that views are best used to increase data integrity and ease of
writing ad hoc queries, and not as a central part of a production application, here
are some ideas for building ad hoc query views:
■ Use views to denormalize or flatten complex joins and hide any surrogate
keys used to link data within the database schema. A well-designed
view invites the user to get right to the data of interest.
■ Save complex aggregate queries as views. Even power users will
appreciate a well-crafted aggregate query saved as a view.
Best Practice

Views are an important part of the abstraction puzzle; I recommend being intentional in their use. Some
developers are enamored with views and use them as the primary abstraction layer for their databases.
They create layers of nested views, or stored procedures that refer to views. This practice serves no valid
purpose, creates confusion, and requires needless overhead. The best database abstraction layer is a single
layer of stored procedures that directly refer to tables, or sometimes user-defined functions (see Chapter 28,
“Building out the Data Abstraction Layer”).
Instead, use views only to support ad hoc queries and reports. For queries that are run occasionally, views
perform well even when compared with stored procedures.
Data within a normalized database is rarely organized in a readily available format. Building ad hoc
queries that extract the correct information from a normalized database is a challenge for most end-users. A
well-written view can hide the complexity and present the correct data to the user.
■ Use aliases to change cryptic column names to recognizable column names. Just as the
SQL SELECT statement can use column or table aliases to modify the names of columns or
tables, these features may be used within a view to present a more readable record set to
the user.
■ Include only the columns of interest to the user. When columns that don’t concern users
are left out of the view, the view is easier to query. The columns that are included in the
view are called projected columns, meaning they project only the selected data from the entire
underlying table.
■ Plan generic, dynamic views that will have long, useful lives. Single-purpose views quickly become
obsolete and clutter the database. Build the view with the intention that it will be used with a
WHERE clause to select a subset of data. The view should return all the rows if the user does not
supply a WHERE restriction. For example, the vEventList view returns all the events; the user
should use a WHERE clause to select the local events, or the events in a certain month.
■ If a view is needed to return a restricted set of data, such as the next month’s events, then
the view should calculate the next month so that it will continue to function over time.
Hard-coding values such as a month number or name would be poor practice.
■ If the view selects data from a range, then consider writing it as a user-defined function (see
Chapter 25, “Building User-Defined Functions”), which can accept parameters.
■ Consolidate data from across a complex environment. Queries that need to collect data from
across multiple servers are simplified by encapsulating the union of data from multiple servers
within a view. This is one case where basing several reports, and even stored procedures, on a
view improves the stability, integrity, and maintainability of the system.
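Several of these ideas can be seen in one small sketch: flattening a join, hiding surrogate keys, aliasing cryptic column names, projecting only the columns of interest, and calculating the next month instead of hard-coding it. The example is illustrative only and runs in SQLite through Python's sqlite3 module rather than SQL Server. Only the view name vEventList comes from the text; the Event and Location tables, their columns, the sample rows, and the vNextMonthEvents view are all assumptions, and SQLite's date() function stands in for the date arithmetic a SQL Server view would use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Location (LocationID INTEGER PRIMARY KEY, LocName TEXT);
CREATE TABLE Event (
    EventID      INTEGER PRIMARY KEY,
    LocationID   INTEGER REFERENCES Location(LocationID),
    EvtNm        TEXT,   -- deliberately cryptic column names
    EvtDt        TEXT,
    InternalNote TEXT    -- of no interest to ad hoc users
);
INSERT INTO Location VALUES (1, 'Asheville'), (2, 'Raleigh');
INSERT INTO Event VALUES
    (1, 1, 'Fall Hike',  '2001-10-06', 'n/a'),
    (2, 2, 'River Trip', '2001-10-20', 'n/a'),
    (3, 1, 'Planning Meeting',
        date('now', 'start of month', '+1 month', '+9 days'), 'n/a');

-- Generic view: flattens the join, hides the surrogate keys,
-- aliases the cryptic names, and returns every event.
CREATE VIEW vEventList AS
SELECT E.EvtNm AS EventName, E.EvtDt AS EventDate, L.LocName AS City
FROM Event AS E
JOIN Location AS L ON E.LocationID = L.LocationID;

-- Restricted view: the date range is calculated, never hard-coded.
CREATE VIEW vNextMonthEvents AS
SELECT E.EvtNm AS EventName, E.EvtDt AS EventDate, L.LocName AS City
FROM Event AS E
JOIN Location AS L ON E.LocationID = L.LocationID
WHERE E.EvtDt >= date('now', 'start of month', '+1 month')
  AND E.EvtDt <  date('now', 'start of month', '+2 months');
""")

# With no WHERE clause the generic view returns all rows ...
cursor = conn.execute("SELECT * FROM vEventList ORDER BY EventDate")
view_columns = [col[0] for col in cursor.description]
all_events = cursor.fetchall()

# ... and the user narrows the result with an ordinary WHERE clause.
local_events = conn.execute(
    "SELECT EventName FROM vEventList WHERE City = ? ORDER BY EventDate",
    ("Asheville",),
).fetchall()

next_month = conn.execute("SELECT EventName FROM vNextMonthEvents").fetchall()
print(view_columns, len(all_events), local_events, next_month)
```

Both views read directly from the tables instead of one view selecting from the other, which keeps the sketch clear of the nested-view layering that the Best Practice note warns against.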
Using Views for Column-Level Security
One of the basic relational operators is projection: the ability to expose specific columns. One primary
advantage of views is their natural capacity to project a predefined set of columns. Here’s where theory
becomes practical. A view can project columns on a need-to-know basis and hide columns that are sensitive
(e.g., payroll and credit card data), irrelevant, or confusing for the purpose of the view.
SQL Server supports column-level security, and it’s a powerful feature. The problem is that ad hoc queries
made by users who don’t understand the schema very well will often run into security errors. I recommend
implementing SQL Server column-level security, and then also using views to shield users from ever
encountering the security. Grant users read permission from only the views, and restrict access to the
physical tables (see Chapter 50, “Authorizing Securables”).
I’ve seen databases that only use views for column-level security without any SQL Server–enforced security.
This is woefully inadequate and will surely be penalized by any serious security audit.
The goal when developing views is two-fold: to enable users to get to the data easily and to protect the
data from the users. By building views that provide the correct data, you are preventing erroneous or
inaccurate queries and misinterpretation.
There are other advanced forms of views.
Distributed partition views, or federated databases, divide very large tables across multiple smaller
tables or separate servers to improve performance. The partitioned view then spans the multiple tables
or servers, thus sharing the query load across more disk spindles. These are covered in Chapter 68,
“Partitioning.”
Indexed views are a powerful feature that actually materializes the data, storing the results of the view
in a clustered index on disk, so in this sense it’s not a pure view. Like any view, it can select data from
multiple data sources. Think of the indexed view as a covering index but with greater control: you
can include data from multiple data sources, and you don’t have to include the clustered index keys.
The index may then be referenced when executing queries, regardless of whether the view is in the
query, so the name is slightly confusing.
Because designing an indexed view is more like designing an indexing structure than creating
a view, I’ve included indexed views in Chapter 64, “Indexing Strategies.”
The Basic View
Using SQL Server Management Studio, views may be created, modified, executed, and included within
other queries, using either the Query Designer or the DDL code within the Query Editor.