Microsoft SQL Server 2008 R2 Unleashed

CHAPTER 34 Data Structures, Indexes, and Performance
Index Design Guidelines
SQL Server indexes are mostly transparent to end users and T-SQL developers. Indexes are
typically not specified in queries unless you use table hints to force the Query Optimizer
to use a particular index. (Although forcing indexes is generally not advised, using Query
Optimizer table hints is covered in more detail in Chapter 35.) Normally, based on the
index key histogram or density values, the SQL Server cost-based Query Optimizer auto-
matically chooses the index that is least expensive from an I/O standpoint.
Chapter 35 goes into greater detail on how the Query Optimizer estimates I/O and deter-
mines the most efficient query plan. In the meantime, the following are some of the
main guidelines to follow in creating useful indexes that the Query Optimizer can use
effectively:
. For composite indexes, try to keep the more selective columns leftmost in the
index. The first element in the index should be the most unique (if possible), and
index column order in general should be from most to least unique. However,
remember that selectivity doesn’t help if the first ordered index column is not speci-
fied in your SARGs or join clauses. To ensure that the index is used for the largest
number of queries, be sure the first ordered column is the column used most often
in your queries.
. Be sure to index columns used in joins. Joins are processed inefficiently if no index
exists on the column(s) specified in the join. Remember that a PRIMARY KEY constraint
automatically creates an index on its column(s), but a FOREIGN KEY constraint does not.
You should create indexes on your foreign key columns if your queries commonly
join between the primary key and foreign key tables.
. Tailor your indexes for your most critical queries and transactions. You cannot index
for every possible query that might be run against your tables. However, your appli-
cations will perform better if you can identify your critical and most frequently
executed queries and design indexes to support them. SQL Server Profiler, which is
covered in Chapter 6, is a useful tool for identifying the most frequently executed
queries. SQL Server Profiler can also help identify slow-running queries that might
benefit from improved index design.
. Avoid indexes on columns that have poor selectivity. The Query Optimizer is not
likely to use the indexes, so they would simply take up space and add unnecessary
overhead during inserts, updates, and deletes. One possible exception occurs when
the index can be used to cover a query. Index covering is discussed in more detail in
the “Index Covering” section, later in this chapter.
. Choose your clustered and nonclustered indexes carefully. The next two sections dis-
cuss tips and guidelines for choosing between clustered or nonclustered indexes,
based on the data contained in the columns and the types of queries executed
against the columns.
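As a minimal sketch of the join and composite-index guidelines above, the following T-SQL assumes a titles table whose pub_id column is a foreign key to a publishers table, and a sales table that most queries filter by stor_id and then by ord_date. The table, column, and index names are illustrative assumptions, not objects defined in this chapter.

```sql
-- A FOREIGN KEY constraint does not create an index, so index the
-- foreign key column explicitly to support joins to the parent table.
-- (Assumes titles.pub_id references publishers.pub_id.)
CREATE NONCLUSTERED INDEX idx_titles_pub_id
    ON titles (pub_id)
GO

-- Composite index: lead with the column that appears most often in
-- SARGs and join clauses, then add the next most useful column.
-- (Assumes most queries on sales filter on stor_id, then ord_date.)
CREATE NONCLUSTERED INDEX idx_sales_stor_date
    ON sales (stor_id, ord_date)
GO
```

A query that filters only on ord_date would not be able to seek on the second index, which is why the column used most often in your queries belongs first.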
Clustered Index Indications
Searching for rows via a clustered index is almost always faster than searching for rows via
a nonclustered index—for two reasons. One reason is that a clustered index contains only
pointers to pages rather than pointers to individual data rows; therefore, a clustered index
is more compact than a nonclustered index. Because a clustered index is smaller and
doesn’t require an additional lookup via the row locator to find the matching rows, the
rows can be found with fewer page reads than with a similarly defined nonclustered index.
The second reason is that because the data in a table with a clustered index is physically
sorted on the clustered key, searching for duplicate values or for a range of clustered key
values is faster; the rows are adjacent to each other, and SQL Server can simply locate the
first qualifying row and then search the rows in sequence until the last qualifying row is
found. However, because you are allowed to create only one clustered index per table, you
must judiciously choose the column or columns on which to define the clustered index.

If you require only a single index on a table, it’s typically advantageous to make it a clus-
tered index; the resulting overhead of maintaining clustered indexes during updates, inserts,
and deletes can be considerably less than the overhead incurred by nonclustered indexes.
By default, the primary key on a table is defined as a clustered unique index. In most
applications, the primary key column on a table is almost always retrieved in single-row
lookups. For single-row lookups, a nonclustered index usually costs you only a few more
I/Os than a similar clustered index. Are you or the users really going to notice a difference
between three page reads to retrieve a single data row versus four to six page reads to
retrieve a single data row? Not at all. However, if you have to perform a range retrieval,
such as a lookup on last name, will you notice a difference between scanning 10% of the
table versus having to find the rows using a full table scan? Most definitely. With this in
mind, you might want to consider creating your primary key as a unique nonclustered
index and choosing another candidate for your clustered index.
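The following sketch shows one way to apply this advice. It assumes a hypothetical customers_demo table in which single-row lookups use the primary key and range retrievals use last_name; all object names are made up for illustration.

```sql
-- Hypothetical table: single-row lookups go through the primary key,
-- range retrievals go through last_name.
CREATE TABLE customers_demo
(
    customer_id int         NOT NULL,
    last_name   varchar(40) NOT NULL,
    first_name  varchar(40) NOT NULL,
    -- Override the default (clustered) by naming the index type.
    CONSTRAINT PK_customers_demo PRIMARY KEY NONCLUSTERED (customer_id)
)
GO

-- Spend the one clustered index on the column used for range retrievals.
CREATE CLUSTERED INDEX CI_customers_demo_last_name
    ON customers_demo (last_name)
GO
```

The clustered index here is not unique, because several customers can share a last name; whether this trade-off pays off depends on how often the range queries run compared to the single-row lookups.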
Following are guidelines to consider for other potential candidates for clustered indexes:
. Columns with a number of duplicate values searched frequently (for
example, WHERE last_name = 'Smith'): Because the data is physically sorted,
all the duplicate values are kept together. Any query that tries to fetch records
against such keys finds all the values, using a minimum of I/O. SQL Server locates
the first row that matches the SARG and then scans the data rows in order until it
finds the last row matching the SARG.
. Columns often specified in the ORDER BY clause: Because the data is already
sorted, SQL Server can avoid having to re-sort the data if the ORDER BY is on the
clustered index key and the data is retrieved in clustered key order. Remember that even
for a table scan, the data is retrieved in clustered key order because the data in the
table is in clustered key order. The only exception is if a parallel query operation is
used to retrieve the data rows; in that case, the results need to be re-sorted when
the result sets from each parallel thread are merged. (For more information on parallel
query strategies, see Chapter 35.)
. Columns often searched for within a range of values (for example, WHERE
price between $10 and $20): A clustered index can be used to locate the first
qualifying row in the range of values. Because the rows in the table are in sorted
order, SQL Server can simply scan the data pages in order until it finds the last quali-
fying row within the range. When the result set within the range of values is large, a
clustered index scan is significantly more efficient in terms of total logical I/O
performed than repeated row locator lookups via a nonclustered index.
. Columns, other than the primary key, frequently used in join clauses:
Clustered indexes tend to be smaller than nonclustered indexes; the amount of page
I/O required per lookup is generally less for a clustered index than for a nonclustered
index. It can be a significant difference when joining many records. An extra page
read or two might not seem like much for a single-row retrieval, but add those addi-
tional page reads to 100,000 join iterations, and you’re looking at a total of 100,000
to 200,000 additional page reads.
When you consider columns for a clustered index, you might want to try to keep your
clustered indexes on relatively static columns to minimize the re-sorting of data rows
when an indexed column is updated. Any time a clustered index key value changes, the
entire data row has to be moved to keep the clustered data values in physical sort order. In
addition, all nonclustered indexes using the clustered key as the row locator to that row
also need to be updated.
You should also avoid creating clustered indexes on wide keys that are made up of several
columns, especially several large-size columns. The reason is that the clustered key values
are incorporated in all nonclustered indexes as the row locator. Because the nonclustered
index entries contain the clustering key in addition to the key columns defined for that
nonclustered index, the nonclustered indexes end up being significantly larger and less
efficient in terms of I/O.
Because you can physically sort the data in a table in only one way, you can have only
one clustered index per table. Any other columns you want to index have to be defined
with nonclustered indexes.
Nonclustered Index Indications
SQL Server allows you to create a maximum of 999 nonclustered indexes on a table. Until
tables become extremely large, the actual space taken by a nonclustered index is a minor
expense compared to the increased access performance. You need to keep in mind,
however, that as you add more indexes to the system, database modification statements
get slower due to the index maintenance overhead.
Also, when defining nonclustered indexes, you typically want to define indexes on
columns that are more selective (that is, columns with low density values) so that they can
be used effectively by the Query Optimizer. A high number of duplicate values in a
nonclustered index can often make it more expensive (in terms of I/O) to process the
query using the nonclustered index than a table scan. Let’s look at a hypothetical example:
select title from titles
where price between $5 and $10
Assume that you have 1 million rows within the range; those 1 million rows could be
randomly scattered throughout the table. Although the index leaf level has all the index
rows in sorted order, reading all data rows one at a time would require a separate lookup
via the row locator for each row in the worst-case scenario.
Thus, the worst-case I/O estimate for range retrievals using a nonclustered index is as
follows:
Number of levels in the nonclustered index
+ Number of index pages scanned to find all matching rows
+ (Number of matching rows × Number of pages per lookup via the row locator)
If you have no clustered index on the table, the row locator is simply a page and row
pointer and requires one data page read to find the matching data row. If 1 million rows
are in the range, the worst-case cost estimate to search via the nonclustered index with no
clustered index on the table would be as follows:
Number of index page reads to find all the row locators
+ (1 million matching rows × 1 data page read)
= more than 1 million I/Os
If you have a clustered index on the table, the row locator is a clustered index key for the
data row. Using the row locator to find the matching row requires searching the clustered
index tree to locate the data row. Assuming that the clustered index has two nonleaf
levels, it would cost three page reads to find each qualifying row on a data page. If the range
has 1 million rows, the worst-case cost estimate to search via the nonclustered index with
a clustered index on the table would be as follows:
Number of index page reads to find all the row locators
+ (1 million matching rows × 3 pages per lookup via the row locator)
= more than 3 million I/Os
Contrast each of these scenarios with the cost of a table scan. If the entire table takes up
50,000 pages, a full table scan would cost only 50,000 in terms of I/O. Therefore, in this
example, doing a table scan would actually be more efficient than using the nonclus-
tered index.
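The comparison can be worked through as simple arithmetic. The script below is only a sketch that plugs in the figures from this example (1 million matching rows, a 50,000-page table, and one or three page reads per lookup); the variable names are illustrative. It computes lower bounds only, because the text gives no figure for the index page reads needed to find the row locators.

```sql
-- Figures taken from the example in the text.
DECLARE @matching_rows   bigint = 1000000  -- rows in the price range
DECLARE @table_pages     bigint = 50000    -- pages read by a full table scan
DECLARE @reads_heap      int    = 1        -- row locator is a page and row pointer
DECLARE @reads_clustered int    = 3        -- two nonleaf levels plus one data page

-- Lower bounds only: index page reads to find the row locators are excluded.
SELECT  @matching_rows * @reads_heap      AS min_reads_nc_on_heap,
        @matching_rows * @reads_clustered AS min_reads_nc_with_clustered,
        @table_pages                      AS reads_table_scan,
        (@matching_rows * @reads_heap) / @table_pages      AS heap_vs_scan_ratio,
        (@matching_rows * @reads_clustered) / @table_pages AS clustered_vs_scan_ratio
```

With these inputs, the nonclustered index costs at least 20 times the I/O of the table scan when the table is a heap and at least 60 times when the table has a clustered index.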
The following guidelines help you identify potential candidates for nonclustered indexes
for your environment:
. Columns referenced in SARGs or join clauses that have a relatively high selectivity
(the density value is low).
. Columns referenced in both the WHERE clause and the ORDER BY clause. When the
data rows are retrieved using a nonclustered index, they are retrieved in nonclustered
index key order. If the result set is to be ordered by the nonclustered index
key(s) as well, SQL Server can avoid having to re-sort the result set, resulting in a
more efficient query. In the following sample query, SQL Server can avoid the extra
step of sorting the result set if a nonclustered index is on state and the index is
used to retrieve the matching rows:
select * from authors
where state like 'C%'
order by state
In general, nonclustered indexes are useful for single-row lookups, joins, queries on
columns that are highly selective, or queries with small range retrievals. Also, when
considering your nonclustered index design, you should not overlook the benefits of
index covering, as described in the following section.
Index Covering
Index covering is a situation in which all the information required by the query in the
SELECT and WHERE clauses can be found entirely within the nonclustered index itself.
Because the nonclustered index contains a leaf row corresponding to every data row in the
table, SQL Server can satisfy the query from the leaf rows of the nonclustered index. This
results in faster retrieval of data because all the information can come directly from the
index page, and SQL Server avoids lookups of the data pages.
Because the leaf pages in a nonclustered index are linked together, the leaf level of the index
can be scanned just like the data pages in a table. Because the leaf index rows are typically
much smaller than the data rows, a nonclustered index that covers a query will be faster
than a clustered index on the same columns because fewer pages would need to be read.
In the following example, a nonclustered index on the au_lname and au_fname columns of
the authors table would cover the query because the result columns and the SARGs can
all be derived from the index itself:
select au_lname, au_fname
from authors
where au_lname like 'M%'
go
Many other queries that use an aggregate function (such as MIN, MAX, AVG, SUM, and COUNT)
or simply check for existence of criteria also benefit from index covering. The following
aggregate query samples can take advantage of index covering:
select count(au_lname) from authors where au_lname like 'M%'
select count(*) from authors where au_lname like 'M%'
select count(*) from authors
You might wonder how the last query, which doesn’t even specify a SARG, can use an
index. SQL Server knows that by its nature, a nonclustered index contains a row for every
data row in the table; it can simply count all the rows in any of the nonclustered indexes
instead of scanning the whole table. For the last query, SQL Server chooses the smallest
nonclustered index—that is, the one with the smallest number of leaf pages.
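To see covering in practice, you could create the index described earlier and compare the logical reads reported for the sample queries. This is a sketch only: the index name idx_authors_name is hypothetical, and it assumes the authors table does not already have an index on these columns.

```sql
-- Hypothetical index name; the key columns match the example queries.
CREATE NONCLUSTERED INDEX idx_authors_name
    ON authors (au_lname, au_fname)
GO

-- Report logical reads so runs with and without the index can be compared.
SET STATISTICS IO ON
GO
SELECT au_lname, au_fname
FROM authors
WHERE au_lname LIKE 'M%'
GO
SELECT COUNT(*) FROM authors
GO
SET STATISTICS IO OFF
GO
```

If the queries are covered, the logical reads reported for authors should drop to roughly the number of index pages touched rather than the number of data pages in the table.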
Index covering can sometimes occur when you are not expecting it. As discussed previ-
ously in this chapter, when you have a clustered index defined on a table, the clustered
key is carried into all the nonclustered indexes to be used as the row locator to locate the
actual data row. Having the additional clustered key column values in the nonclustered
index provides more data values that can be used in index covering.
For example, assume that the authors table has a clustered index on au_lname and
au_fname and a nonclustered primary key defined on au_id. Each row in the nonclustered
index on au_id would contain the clustered key values for au_lname and au_fname for its
corresponding data row. Because of this, the following query would actually be covered by
the nonclustered index on au_id:
select au_lname, au_fname
from authors
where au_id like '123%'
Explicitly adding additional columns to nonclustered indexes to promote the occurrence
of index covering has historically been a common method of improving query response
time. Consider the following query:
select royalty from titles
where price between $10 and $20
If you create an index on only the price column, SQL Server can find the rows in the
index where price is between $10 and $20, but it has to access the data rows to retrieve
royalty. With 100 rows in the range, the worst-case I/O cost to retrieve the data rows
would be as follows:
Number of index levels
+ Number of index pages to find the 100 matching rows
+ (100 × Number of pages per lookup via the row locator)
If the royalty column were added to the index on the price column, SQL Server could
scan the index to retrieve the results instead of having to perform the lookups via the row
locator against the table, resulting in faster query response. The I/O cost using index
covering would be lower, as follows:
Number of index levels
+ Number of index pages to scan to find the 100 matching rows
If you are considering padding your indexes to take advantage of index covering, beware
of making an index too wide. As index row width approaches data row width, the benefits
of covering are lost as the number of pages in the leaf level increases. As the number of
leaf-level index pages approaches the number of pages in the table, the number of index
levels also increases, increasing the I/O cost of using the index to locate data.
You should also avoid adding frequently updated columns to the index.
Remember that any changes to the columns in the data rows cascade into the indexes as
well. This increases the index maintenance overhead, which can adversely affect update
performance.

As an alternative to adding columns to the nonclustered index key to encourage index
covering, you might want to consider taking advantage of the included columns feature in
SQL Server 2008.
Included Columns
A feature available for nonclustered indexes in SQL Server 2008 is included columns.
Included columns allow you to add nonkey columns to the leaf level of a nonclustered
index for the purpose of index covering.
One advantage of included columns is that because the nonkey columns are stored only
in the leaf level of the index, the nonleaf rows of the index are smaller, which helps
reduce the overall size of the index, thereby helping reduce the I/O cost of using the
index. Another advantage is that this feature allows you to exceed the SQL Server
maximum limits of 16 index key columns and 900-byte index key size. The included
nonkey columns are not factored in when calculating the number of index key columns
or index key size. All data types are allowed as included columns except for the text,
ntext, and image data types. To add included columns to an index, specify the INCLUDE
clause in the CREATE INDEX statement:
CREATE INDEX NC_titles_price on titles (price) INCLUDE (royalty)
An additional advantage of included columns is that you can add columns to a unique
index for index covering purposes without affecting the uniqueness of the actual index
key(s) and without having to create a second index on the unique key column(s) and the
additional covering columns. For example, consider that you have a large number of
queries that search titles by title_id to retrieve the price value. Creating a covering index
on title_id and price could improve performance of these queries. However, creating a
unique index on title_id and price would not enforce uniqueness on title_id alone (it
would allow the insertion of multiple rows with the same title_id as long as they had
different prices). Without using included columns, you would have to create a unique
index on title_id and an additional nonunique index on title_id and price to enforce
uniqueness on title_id and also have a covering index on title_id and price. However,
with the included column feature, you can create just a single unique index on title_id
with price as an included column:
CREATE UNIQUE INDEX UQ_titleid_price on titles (title_id) INCLUDE (price)
TIP
If you have existing nonclustered indexes with a large index key size, you might want to
consider redesigning them so that only columns used for searching and lookups are
key columns. You should make all other columns that were added for index covering
into included columns. This way, you still have all columns needed to cover your
queries, but the index key itself is smaller and more efficient.
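One possible way to carry out such a redesign is sketched below. It assumes an existing index named idx_titles_pub whose key was padded with extra columns purely for covering; the index name and column choices are hypothetical.

```sql
-- Before (assumed): a hypothetical index padded with key columns for covering:
--   CREATE INDEX idx_titles_pub ON titles (pub_id, type, price, pubdate)

-- After: keep only the search column in the key and move the
-- covering-only columns to the leaf level. DROP_EXISTING rebuilds the
-- index under the same name in a single statement.
CREATE NONCLUSTERED INDEX idx_titles_pub
    ON titles (pub_id)
    INCLUDE (type, price, pubdate)
    WITH (DROP_EXISTING = ON)
GO
```

The statement fails if idx_titles_pub does not already exist, so for a new index you would omit the DROP_EXISTING option.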
You still should be careful to avoid adding unnecessary columns as included columns of
an index. Adding too many index columns, key or nonkey, can adversely affect perfor-
mance for the following reasons:
. Fewer index leaf rows fit on a page, which can increase I/O costs to search the leaf
level of the index and also reduce data cache efficiency.
. Because of the increased leaf row size, more disk space is required to store the index,
especially if you are adding varchar(max), nvarchar(max), varbinary(max), or xml
data types as nonkey index columns. Because the column values are also copied into
the index leaf level, you are essentially storing the data values twice.
. Changes to the included columns in the data rows cascade into the leaf rows of the
index as well. This increases the index maintenance overhead, which can adversely
affect performance of data modifications.
Wide Indexes Versus Multiple Indexes
As an index key gets wider, the selectivity of the key generally becomes higher as well. It
might seem that creating wide indexes would result in better performance. This is not
necessarily true. The reason is that the wider the key, the fewer rows SQL Server stores on
the index pages, requiring more pages at each level; this results in a higher number of
levels in the index B-tree. To get to specific rows, SQL Server must perform more I/O.
To get better performance from queries, instead of creating a few wide indexes, you should
consider creating multiple narrower indexes. The advantage here is that with smaller keys,
the Query Optimizer can quickly scan through multiple indexes to determine the most
efficient access plan. SQL Server has the option of performing multiple index lookups
within a single query and merging the result sets together to generate an intersection of
the indexes. Also, with more indexes, the Query Optimizer can choose from a wider
variety of query plan alternatives.
If you are considering creating a wide key, you should individually check the distribution
of values for each member of the composite key. If the selectivity on the individual
columns is high, you might want to break up the index into multiple indexes. If the selec-
tivity of individual columns is low but is high for combined columns, it makes sense to
have wider keys on the table. To get to the right combination, you can populate your
table with real-world data, experiment with creating multiple indexes, and check the
distribution of values for each column. Based on the histogram steps and index density,
you can make the decisions for an index design that works best for your environment.
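As a rough sketch of how you might check these distributions, the queries below count distinct values for two candidate columns of the titles table, individually and in combination, and then display the statistics for an index. The column pair (pub_id, type) and the index name idx_titles_pub_type are assumptions chosen for illustration.

```sql
-- Distinct values per column; the closer a count is to total_rows,
-- the more selective that column is. COUNT(DISTINCT ...) ignores NULLs.
SELECT  COUNT(*)               AS total_rows,
        COUNT(DISTINCT pub_id) AS distinct_pub_id,
        COUNT(DISTINCT type)   AS distinct_type
FROM titles
GO

-- Distinct values for the two columns combined.
SELECT COUNT(*) AS distinct_pub_id_type
FROM (SELECT DISTINCT pub_id, type FROM titles) AS d
GO

-- Histogram steps and density values for an existing index
-- (idx_titles_pub_type is a hypothetical index on pub_id, type).
DBCC SHOW_STATISTICS ('titles', idx_titles_pub_type)
GO
```

If each column alone is nearly as selective as the combination, separate narrow indexes are worth testing; if only the combination is selective, the wider key is the better candidate.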
Indexed Views
As discussed in Chapter 27, “Creating and Managing Views,” SQL Server 2008 allows you
to create indexed views. An indexed view is any view that has a clustered index defined on
it. When a CREATE INDEX statement is executed on a view, the result set for the view is
materialized and stored in the database with the same structure as a table with a clustered
index. Changes made to the data in the underlying tables of the view are automatically
reflected in the view the same way any changes to a table are reflected in its indexes. In
addition to a clustered index, you can create additional nonclustered indexes on indexed
views to provide additional query performance. Additional indexes on views might
provide more options for the Query Optimizer to choose from during the optimization
process.
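A minimal sketch of creating an indexed view follows. The view name sales_Qty_Rollup matches the one queried later in this section, but its definition here is an assumption (a quantity rollup over a sales table with stor_id, title_id, and qty columns), as is the index name. The script also assumes that no view with this name exists yet.

```sql
CREATE VIEW dbo.sales_Qty_Rollup
WITH SCHEMABINDING                          -- required before a view can be indexed
AS
SELECT  stor_id,
        title_id,
        SUM(ISNULL(qty, 0)) AS total_qty,   -- SUM must not reference a nullable expression
        COUNT_BIG(*)        AS row_cnt      -- required when the view uses GROUP BY
FROM dbo.sales                              -- two-part names are required
GROUP BY stor_id, title_id
GO

-- The first index on a view must be unique and clustered; creating it
-- materializes the view's result set.
CREATE UNIQUE CLUSTERED INDEX CI_sales_Qty_Rollup
    ON dbo.sales_Qty_Rollup (stor_id, title_id)
GO
```

After the clustered index exists, additional nonclustered indexes can be created on the view in the same way as on a table.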
In the Developer and Enterprise Editions of SQL Server 2008, when an indexed view exists
on a table and you access the view directly within a query, the Query Optimizer automati-
cally considers using the index on the view to improve query performance, just as an
index on a table is used to improve performance. The Query Optimizer also considers
using the indexed view, even for queries that do not directly name the view in the FROM
clause. In other words, when a query might benefit from using the indexed view, the
Query Optimizer can use the indexed view to satisfy the query in place of an existing
index on the table itself. (For more information on how indexed views are used in query
plans, see Chapter 35.)
It is important to note that although indexed views can be created in all editions of SQL
Server 2008, only the Developer and Enterprise Editions automatically use indexed views to
optimize queries. In the other editions, indexed views are not used to improve query
performance unless the view is explicitly specified in the query and the NOEXPAND hint is
specified as well. Without the NOEXPAND hint, SQL Server expands the view to its underlying
base tables and optimizes based on the table indexes. The following example shows the use
of the NOEXPAND option to force SQL Server to use the indexed view specified in the query:
select * from sales_Qty_Rollup WITH (NOEXPAND)
where stor_id between 'B914' and 'B999'
SET ARITHABORT ON
Indexed views add overhead and can be more complex for SQL Server to maintain over
time than normal indexes. Each time an underlying table of a view is modified, SQL
Server has to update the view result set and potentially the index on that view. The scope
of a view’s index can be larger than that of any single table’s index, especially if the view
is defined on several large tables. The overhead associated with maintaining a view and its
index during updates can negate any benefit that queries gain from the indexed view.
Because of this additional maintenance overhead, you should create indexes only on
views where the advantage provided by the improved speed in retrieving the results
outweighs the increased maintenance overhead.
Following are some guidelines to consider when you design indexed views:
. Create indexes on views where the underlying table data is relatively static.
. Create indexed views that will be used by several queries.
. Keep the indexes small. As with table indexes, a smaller index allows SQL Server to
access the data more efficiently.
. Create indexed views that will be significantly smaller than the underlying table(s).
An indexed view might not provide significant performance gains if its size is similar
to the size of the original table.
. You need to specify the NOEXPAND hint in editions other than the Developer and
Enterprise Editions of SQL Server; otherwise, the indexed view is not
used to optimize the query.
Indexes on Computed Columns
SQL Server 2008 allows you to build indexes on computed columns in your tables.
Computed columns can participate at any position of an index, along with your other
table columns, including in a PRIMARY KEY or UNIQUE constraint. To create an index on
computed columns, you must set the following session options as shown:
. SET CONCAT_NULL_YIELDS_NULL ON
. SET QUOTED_IDENTIFIER ON
. SET ANSI_NULLS ON
. SET ANSI_PADDING ON
. SET ANSI_WARNINGS ON
. SET NUMERIC_ROUNDABORT OFF
If any of these six SET options were not in effect when you created the table, you get the
following message when you try to create an index on the computed column:
Server: Msg 1934, Level 16, State 1, Line 2
CREATE INDEX failed because the following SET options
have incorrect settings: '<OPTION NAME>'.
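The sequence can be sketched as follows, using a hypothetical order_lines_demo table with a deterministic, precise computed column. The table, column, and index names are assumptions. The script also sets ARITHABORT ON, which is not in the list above but which Microsoft's documentation names among the SET options required for indexes on computed columns.

```sql
-- Set the required options before creating the table and the index.
SET CONCAT_NULL_YIELDS_NULL ON
SET QUOTED_IDENTIFIER ON
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
SET NUMERIC_ROUNDABORT OFF
SET ARITHABORT ON   -- addition to the list above; see the note in the lead-in
GO

-- Hypothetical table with a deterministic, precise computed column.
CREATE TABLE order_lines_demo
(
    order_id   int   NOT NULL,
    qty        int   NOT NULL,
    unit_price money NOT NULL,
    line_total AS (qty * unit_price)
)
GO

CREATE NONCLUSTERED INDEX idx_order_lines_demo_total
    ON order_lines_demo (line_total)
GO
```

Sessions that later modify the table need the same settings in effect, so it is worth confirming the defaults used by your client connections.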