Microsoft SQL Server 2008 R2 Unleashed- P121 ppsx

CHAPTER 34 Data Structures, Indexes, and Performance
Deleting Rows
What happens when rows are deleted from a table? How, and when, does SQL Server
reclaim the space when data is removed from a table?
Deleting Rows from a Heap
In a heap table, SQL Server does not automatically compress the space on a page when a
row is removed; that is, the rows are not all moved up to the beginning of the page to
keep all free space at the end, as SQL Server did in versions prior to 7.0. To optimize
performance, SQL Server holds off on compacting the rows until the page needs
contiguous space for storing a new row.
Deleting Rows from an Index
Because the data pages of a clustered table are actually the leaf pages of the clustered
index, the behavior of data row deletes on a clustered table is the same as row deletions
from an index page.
When rows are deleted from the leaf level of an index, they are not actually deleted but are
marked as ghost records. Keeping the row as a ghost record makes it easier for SQL Server
to perform key-range locking (key-range locking is discussed in Chapter 37, “Locking and
Performance”). If ghost records were not used, SQL Server would have to lock the entire
range surrounding the deleted record. With the ghost record still present and visible
internally to SQL Server (it is not visible in query result sets), SQL Server can use the ghost
record as an endpoint for the key-range lock to prevent “phantom” records with the same
key value from being inserted, while allowing inserts of other values to proceed.
Ghost records do not stay around forever, though. SQL Server has a special internal
housekeeping process that periodically examines the leaf level of B-trees for ghost records and
removes them. This is the same thread that performs the autoshrink process for databases.
Whenever you delete a row, all nonclustered indexes need to be updated to remove the
pointers to the deleted row. Nonleaf index rows are not ghosted when deleted. As with
heap tables, however, the space is not compressed on the nonleaf index page until space is
needed for a new row.
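Ghost records can be observed, if only briefly, through the sys.dm_db_index_physical_stats dynamic management function. The following is a minimal sketch, assuming the current database is bigpubs2008 and the authors table lives in the dbo schema. The ghost counts are reported only in the SAMPLED and DETAILED scan modes, and they may already be back to zero if the housekeeping process has run since the delete:

```sql
-- Leaf-level ghost records per index on the authors table.
-- Assumes the current database is bigpubs2008 and the table is dbo.authors.
select index_id,
       index_type_desc,
       index_level,
       page_count,
       record_count,
       ghost_record_count,          -- ghosts waiting for cleanup
       version_ghost_record_count   -- ghosts retained for snapshot-based transactions
from sys.dm_db_index_physical_stats
       (db_id(), object_id(N'dbo.authors'), null, null, 'DETAILED')
where index_level = 0               -- leaf level only
order by index_id
```

Running the query immediately after a large delete, and again a little later, is one way to watch the counts drop as the ghosts are removed.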

Reclaiming Space
Only when the last row is deleted from a data page is the page deallocated from the table.
The only exception is if it is the last page remaining; all tables must have at least one page
allocated, even if it’s empty. When a deletion of an index row leaves only one row
remaining on the page, the remaining row is moved to a neighboring page, and the now-empty
index page is deallocated.
If the page to be deallocated is the last remaining used page in a uniform extent allocated
to the table, the extent is deallocated from the table as well.
Updating Rows
SQL Server 2008 performs row updates by evaluating the number of rows affected, whether
the rows are being accessed via a scan or index retrieval and whether any index keys are
being modified, and automatically chooses the appropriate and most efficient update
strategy for the rows affected. SQL Server can perform two types of update strategies:
. In-place updates
. Not-in-place updates
In-Place Updates
In SQL Server 2008, in-place updates are performed as often as possible to minimize the
overhead of an update. An in-place update means that the row is modified where it is on
the page, and only the affected bytes are changed.
When an in-place update is performed, in addition to the reduced overhead in the table
itself, only a single modify record is written to the log. However, if the table has a trigger
on it or is marked for replication, the update is still done in place but is recorded in the
log as a delete followed by an insert (this provides the before-and-after image for the
trigger that is referenced in the inserted and deleted tables).

In-place updates are performed whenever a heap is being updated and the row still fits on
the same page, or when a clustered table is updated and the clustered key itself is not
changed. You can get an in-place update if the clustered key changes but the row does not
have to move; that is, the sorting of the rows wouldn’t change.
Not-In-Place Updates
If the change to a clustered key prevents an in-place update from being performed, or if
the modification to a row increases its size such that it can no longer fit on its current
page, the update is performed as a delete followed by an insert; this is referred to as a
not-in-place update.
When performing an update that affects multiple index keys, SQL Server keeps a list of the
rows that need to be updated in memory, if it’s small enough; otherwise, it is stored in
tempdb. SQL Server then sorts the list by index key and type of operation (delete or insert).
This list of operations, called the input stream, consists of both the old and new values for
every column in the affected rows as well as the unique row identifier for each row.
SQL Server then examines the input stream to determine whether any of the updates
conflict or would generate duplicate key values while processing (if they were to generate
a duplicate key after processing, the update cannot proceed). It then rearranges the
operations in the input stream in a manner to prevent any intermediate violations of the
unique key.
For example, consider the following update to a table with a unique key on a sequential
primary key:
update table1 set pkey = pkey + 1
Even though all values would still be unique when the update finished, if the update were
performed internally one row at a time in sequential order, it would generate duplicates
during the intermediate processing as the pkey value was incremented and matched the
next pkey value. SQL Server would rearrange and rework the updates in the input stream
to process them in a manner that would avoid the duplicates and then process them a
row at a time. If possible, deletes and inserts on the same key value in the input stream
are collapsed into a single update. In some cases, you might still get some rows that can
be updated in place.
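The effect is easy to try. The sketch below uses a hypothetical temporary table, #table1, with three sequential key values; the multirow VALUES clause assumes SQL Server 2008 or later:

```sql
-- Hypothetical demo table with a unique (primary) key on sequential values
create table #table1 (pkey int not null primary key, descr varchar(10) not null)
insert into #table1 (pkey, descr) values (1, 'one'), (2, 'two'), (3, 'three')

-- Processed naively in key order, changing 1 to 2 would collide with the
-- existing key 2; as a single statement the update succeeds.
update #table1 set pkey = pkey + 1

select pkey, descr from #table1 order by pkey
drop table #table1
```

The final select returns the keys 2, 3, and 4 paired with 'one', 'two', and 'three', with no duplicate key error raised along the way.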
Forward Pointers
As mentioned earlier, when page splits on a clustered table occur, the nonclustered
indexes do not need to be updated to reflect the new location of the rows because the row
locator for the row is the clustered index key rather than the page and row ID. When an
update operation on a heap table causes rows to move, the row locators in the
nonclustered index would need to be updated to reflect the new location of the rows. This could
be expensive if there were a large number of nonclustered indexes on the heap.
SQL Server 2008 addresses this performance issue through the use of forward pointers.
When a row in a heap moves, it leaves a forward pointer in the original location of the row.
The forward pointer avoids having to update the nonclustered index row locator. When
SQL Server is searching for the row via the nonclustered index, the index pointer directs it
to the original location, where the forward pointer redirects it to the new row location.
A row never has more than one forward pointer. If the row moves again from its
forwarded location, the forward pointer stored at the original row location is updated to
the row’s new location. There is never a forward pointer that points to another forward
pointer. If the row ever shrinks enough to fit back into its original location, the forward
pointer is removed, and the row is put back where it originated.
When a forward pointer is created, it remains unless the row moves back to its original
location. The only other circumstance that results in forward pointers being deleted occurs
when the entire database is shrunk. When a database file is shrunk and the data
reorganized, all row locators are reassigned because the rows are moved to new pages.
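The number of forwarded rows in a heap is exposed by sys.dm_db_index_physical_stats. The following sketch builds a small heap, grows its rows so that they no longer fit on their original pages, and then reads the counter. The table dbo.fwd_demo and its columns are hypothetical, and the script assumes it is run in a scratch database where creating and dropping the table is acceptable:

```sql
-- Hypothetical heap (no clustered index); names are for illustration only
create table dbo.fwd_demo (id int not null, filler varchar(4000) not null)

-- Load 200 short rows; sys.all_columns is used only as a convenient row source
insert into dbo.fwd_demo (id, filler)
select top (200) row_number() over (order by (select null)), replicate('a', 50)
from sys.all_columns

-- Grow every row so that most can no longer stay on their original page
update dbo.fwd_demo set filler = replicate('b', 3000)

-- index_id 0 is the heap; forwarded_record_count is NULL in LIMITED mode
select index_type_desc, page_count, record_count, forwarded_record_count
from sys.dm_db_index_physical_stats
       (db_id(), object_id(N'dbo.fwd_demo'), 0, null, 'DETAILED')

drop table dbo.fwd_demo
```

A nonzero forwarded_record_count after the update indicates rows that are now reached through a forward pointer.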
Index Utilization
Now that you have an understanding of table and index structures and the overhead
required to maintain your data and indexes, you might want to put things into practice to
actually come up with an index design for your database, defining the appropriate indexes
to support your queries. To effectively determine the appropriate indexes that should be
created, you need to determine whether they’ll actually be used by the SQL Server Query
Optimizer. If an index is not being used effectively, it’s just wasting space and creating
unnecessary overhead during updates.
The main criterion to remember is that SQL Server uses an index for the more
efficient row locator lookup only if at least the first column of the index is included in a
valid search argument (SARG) or join clause. You should keep this point in mind when
choosing the column order for composite indexes. For example, consider the following
index on the stores table in the bigpubs2008 database:
create index nc1_stores on stores (city, state, zip)
NOTE
Unless stated otherwise, all sample queries from this point on in this chapter are run
in the bigpubs2008 database, which is available on the included CD or via download
from this book’s website at www.samspublishing.com. Instructions on installing this
database are provided in the Introduction.
Each of the following queries could use the index because they include the first column,
city, of the index as part of the SARG:
select stor_name from stores
where city = 'Frederick'
and state = 'MD'
and zip = '21702'

select stor_name from stores
where city = 'Frederick'
and state = 'MD'

select stor_name from stores
where city = 'Frederick'
and zip = '21702'
However, the following queries do not use the index for a row locator lookup because
they don’t specify the city column as a SARG:
select stor_name from stores
where state = 'MD'
and zip = '21702'

select stor_name from stores
where zip = '21702'
For the index nc1_stores to be used for a row locator lookup in the last query, you would
have to reorder the columns so that zip is first—but then the index wouldn’t be useful for
any queries specifying only city and/or state. Satisfying all the preceding queries in this
case would require additional indexes on the stores table.
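As a sketch of the additional-index approach, the following creates a second index with zip as the leading column and then asks SQL Server for the estimated plan of the zip-only query. The index name nc2_stores is hypothetical, and on a table as small as stores the Query Optimizer may still prefer a scan:

```sql
-- Hypothetical second index so that zip-only searches have a leading column to seek on
create index nc2_stores on stores (zip)
go
-- Display the estimated plan instead of running the query
set showplan_text on
go
select stor_name from stores
where zip = '21702'
go
set showplan_text off
go
```

SET SHOWPLAN_TEXT has to be the only statement in its batch, which is why each step is separated by go.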
NOTE
For the two preceding queries, if you were to display the execution plan information (as
described in Chapter 36, “Query Analysis”), you might see that the queries actually use
the nc1_stores index to retrieve the result set. However, if you look closely, you can
see the queries are not using the index in the most efficient manner; the index is
being used to perform an index scan rather than an index seek. An index seek is what
we are really after. (Alternative query access methods are discussed in more detail in
Chapter 35). In an index seek, SQL Server searches for the specific SARG by walking
the index tree from the root level down to the specific row(s) with matching index key
values and then uses the row locator value stored in the index row to directly retrieve
the matching row(s) from the data page(s); the row locator is either a specific row
identifier or the clustered key value for the row.
For an index scan, SQL Server searches all the rows in the leaf level of the index,
looking for possible matches. If any are found, it then uses the row locator to retrieve
the data row.
Although both seeks and scans use an index, the index scan is still more expensive in
terms of I/O than an index seek but slightly less expensive than a table scan, which is
why it is used. However, in this chapter you learn to design indexes that result in index
seeks, and when this chapter talks about queries using an index, index seeks are what
it refers to (except for the section on index covering, but that’s a horse of a slightly
different color).
You might think that the easy solution to get row locator lookups on all possible columns
is to index all the columns on a table so that any type of search criteria specified for a
query can be helped by an index. This strategy might be somewhat appropriate in a
read-only decision support system (DSS) environment that supports ad hoc queries, but it is
not likely because many of the indexes probably still wouldn’t even be used. As you see in
the section “Index Selection,” later in this chapter, just because an index is defined on a
column doesn’t mean that the Query Optimizer is necessarily always going to use it if the
search criteria are not selective enough. Also, creating that many indexes on a large table
could take up a significant amount of space in the database, increasing the time required
to back up and run DBCC checks on the database. As mentioned earlier, too many indexes
on a table in an OLTP environment can generate a significant amount of overhead during
inserts, updates, and deletes and have a detrimental impact on performance.
TIP
A common design mistake is defining too many indexes on tables in OLTP
environments. In many cases, some of the indexes are redundant or are never even
considered by the SQL Server Query Optimizer to process the queries used by the
applications. These indexes end up simply wasting space and adding unnecessary overhead
to data updates.
A case in point was one client who had eight indexes defined on a table, four of which
had the same column, which was a unique key, as the first column in the index. That
column was included in the WHERE clauses for all queries and updates performed on
the table. Only one of those four indexes was ever used.
It is hoped that, by the end of this chapter, you understand why all these indexes were
unnecessary and are able to recognize and determine which columns benefit from
having indexes defined on them and which indexes to avoid.
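One way to spot indexes that are never read is the sys.dm_db_index_usage_stats dynamic management view. The query below is a minimal sketch for the current database; it assumes the caller has VIEW SERVER STATE permission, and because the counters are reset when the SQL Server instance restarts, the numbers are meaningful only after a representative workload has run:

```sql
-- Read and write activity per index in the current database.
-- Indexes with no row in the DMV have not been touched since the last restart.
select object_name(i.object_id)    as table_name,
       i.name                      as index_name,
       coalesce(s.user_seeks, 0)   as user_seeks,
       coalesce(s.user_scans, 0)   as user_scans,
       coalesce(s.user_lookups, 0) as user_lookups,
       coalesce(s.user_updates, 0) as user_updates
from sys.indexes i
left join sys.dm_db_index_usage_stats s
       on s.database_id = db_id()
      and s.object_id   = i.object_id
      and s.index_id    = i.index_id
where objectproperty(i.object_id, 'IsUserTable') = 1
  and i.index_id > 0            -- skip heaps
order by coalesce(s.user_seeks, 0) + coalesce(s.user_scans, 0) + coalesce(s.user_lookups, 0),
         table_name, index_name
```

Indexes at the top of the list with high user_updates and no seeks, scans, or lookups are candidates for review, keeping in mind that an index backing a primary key or unique constraint still enforces uniqueness even if it is never read.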
Index Selection
To determine which indexes to define on a table, you need to perform a detailed query
analysis. This process involves examining the search clauses to see what columns are
referenced, knowing the bias of the data to determine the usefulness of the index, and ranking
the queries in order of importance and frequency of execution. You have to be careful not
to examine individual queries and develop indexes to support one query, without
considering the other queries that are executed on the table as well. You need to come up with a
set of indexes that work for the best cross-section of your queries.
TIP
A useful tool to help you identify your frequently executed and critical queries is SQL
Server Profiler. I’ve found SQL Server Profiler to be invaluable when going into a new
client site and having to identify the problem queries that need tuning. SQL Server
Profiler allows you to trace the procedures and queries being executed in SQL Server
and capture the runtime, reads and writes, execution plans, and other processing
information. This information can help you identify which queries are providing substandard
performance, which ones are being executed most often, which indexes are being used
by the queries, and so on.
You can analyze this information yourself manually or save a trace to analyze with the
Database Engine Tuning Advisor. The features of SQL Server Profiler are covered in
more detail in Chapter 6, “SQL Server Profiler.” The Database Engine Tuning Advisor is
discussed in more detail in Chapter 55, “Configuring, Tuning, and Optimizing SQL
Server Options.”

Because it’s usually not possible to index for everything, you should index first for the
queries most critical to your applications or those run frequently by many users. If you have
a query that’s run only once a month, is it worth creating an index to support only that
query and having to maintain it throughout the rest of the month? The sum of the
additional processing time throughout the month could conceivably exceed the time required to
perform a table scan to satisfy that one query.
TIP
If, due to query response time requirements, you must have an index in place when a
query is run, consider creating the index only when you run the query and then
dropping the index for the remainder of the month. This approach is feasible as long as the
time it takes to create the index and run the query that uses the index doesn’t exceed
the time it takes to simply run the query without the index in place.
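The create-run-drop pattern can be sketched as follows. The index name tmp_authors_state and the report query are hypothetical stand-ins for a real month-end job:

```sql
-- Build the index just before the infrequent query needs it
create index tmp_authors_state on authors (state)
go
-- Hypothetical month-end report that benefits from the index
select state, count(*) as numauthors
from authors
where state = 'NY'
group by state
go
-- Remove the index so that it adds no overhead to daily inserts, updates, and deletes
drop index tmp_authors_state on authors
go
```

Timing the whole script against the same query run without the index (for example, with SET STATISTICS TIME ON) shows whether the approach pays off for a given table.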
Evaluating Index Usefulness
SQL Server provides indexes for two primary reasons: as a method to enforce the
uniqueness of the data in the database tables and to provide faster access to data in the tables.
Creating the appropriate indexes for a database is one of the most important aspects of
physical database design. Because you can’t have an unlimited number of indexes on a
table, and it wouldn’t be feasible anyway, you should create indexes on columns that have
high selectivity so that your queries will use the indexes. The selectivity of an index can
be defined as follows:
Selectivity ratio = Number of unique index values / Number of rows in table
If the selectivity ratio is high—that is, if a large number of rows can be uniquely identified
by the key—the index is highly selective and useful to the Query Optimizer. The optimum
selectivity would be 1, meaning that there is a unique value for each row. A low selectivity
means that there are many duplicate values and the index would be less useful. The SQL
Server Query Optimizer decides whether to use any indexes for a query based on the
selectivity of the index. The higher the selectivity, the faster and more efficiently SQL Server
can retrieve the result set.
For example, say that you are evaluating useful indexes on the authors table in the
bigpubs2008 database. Assume that most of the queries access the table either by author’s
last name or by state. Because a large number of concurrent users modify data in this
table, you are allowed to choose only one index—author’s last name or state. Which one
should you choose? Let’s perform some analysis to see which one is a more useful, or
selective, index.
First, you need to determine the selectivity based on the author’s last name with a query
on the authors table in the bigpubs2008 database:
select count(distinct au_lname) as '# unique',
       count(*) as '# rows',
       str(count(distinct au_lname) / cast (count(*) as real),4,2) as 'selectivity'
from authors
go
# unique    # rows      selectivity
----------- ----------- -----------
160         172         0.93
The selectivity ratio calculated for the au_lname column on the authors table, 0.93,
indicates that an index on au_lname would be highly selective and a good candidate for an
index. All but 12 rows in the table contain a unique value for last name.
Now, look at the selectivity of the state column:
select count(distinct state) as '# unique',
       count(*) as '# rows',
       str(count(distinct state) / cast (count(*) as real),4,2) as 'selectivity'
from authors
go
# unique    # rows      selectivity
----------- ----------- -----------
38          172         0.22
As you can see, an index on the state column would be much less selective (0.22) than
an index on the au_lname column and possibly not as useful.
One of the questions to ask at this point is whether a few values in the state column that
have a high number of duplicates are skewing the selectivity or whether there are just a few
unique values in the table. You can determine this with a query similar to the following:
select state,
count(*) as numrows,
count(*)/b.totalrows * 100 as percentage
from authors a,
(select convert(numeric(6,2), count(*)) as totalrows from authors) as b
group by state, b.totalrows
having count(*) > 1
order by 2 desc
go
state numrows     percentage
----- ----------- -----------
CA    37          21.5116200
NY    18          10.4651100
TX    15          8.7209300
OH    9           5.2325500
FL    8           4.6511600
IL    7           4.0697600
NJ    7           4.0697600
WA    6           3.4883700
PA    6           3.4883700
CO    5           2.9069700
LA    5           2.9069700
MI    5           2.9069700
MN    3           1.7441800
MO    3           1.7441800
OK    3           1.7441800
AZ    3           1.7441800
AK    2           1.1627900
IN    2           1.1627900
GA    2           1.1627900
MA    2           1.1627900
NC    2           1.1627900
NE    2           1.1627900
SD    2           1.1627900
VA    2           1.1627900
WI    2           1.1627900
WV    2           1.1627900
As you can see, most of the state values are relatively unique, except for one value, 'CA',
which accounts for more than 20% of the values in the table. Therefore, state is probably
not a good candidate for an indexed column, especially if most of the time you are
searching for authors from the state of California. SQL Server would generally find it more
efficient to scan the whole table rather than search via the index.
NOTE
When a single value skews the selectivity of an index, as in this example with the
state column, this type of column might be a candidate for a filtered index, a new
feature in SQL Server 2008. See the section “Filtered Indexes and Statistics,” later in
this chapter.
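As a rough sketch of the idea, a filtered index on state could leave out the heavily duplicated 'CA' rows altogether. The index name below is hypothetical, and whether the Query Optimizer actually picks the index for a given query still depends on the statistics and on the query predicate falling inside the filter:

```sql
-- Hypothetical filtered index that excludes the heavily duplicated 'CA' rows
create index nc_authors_state_notca on authors (state)
where state <> 'CA'
go
-- A search for a less common state falls inside the filter predicate,
-- so the filtered index can be considered for this query
select au_lname, au_fname from authors
where state = 'WV'
go
```

Searches for state = 'CA' cannot use this index at all, which is the intent: those queries are expected to scan anyway.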
As a general rule of thumb, if the selectivity ratio for a nonclustered index key is less than
0.85 (in other words, if the Query Optimizer cannot discard at least 85% of the rows based
on the key value), the Query Optimizer generally chooses a table scan to process the query
rather than a nonclustered index. In such cases, performing a table scan to find all the
qualifying rows is more efficient than seeking through the B-tree to locate a large number
of data rows.
NOTE
You can relate the concept of selectivity to a hypothetical example. Say that you need
to find every instance of the word SQL in this book. Would it be easier to do it by using
the index and going back and forth from the index to all the pages that contain the
word, or would it be easier just to scan each page from beginning to end to locate
every occurrence? What if you had to find all references to the word squonk, if any?
Squonk would definitely be easier to find via the index (actually, the index would help
you determine that it doesn’t even exist). Therefore, the selectivity for Squonk would be
high, and the selectivity for SQL would be much lower.
How does SQL Server determine whether an index is selective and which index, if it has
more than one to choose from, would be the most efficient to use? For example, how
would SQL Server know how many rows the following query might return?
select * from table
where key between 1000000 and 2000000
If the table contains 10,000,000 rows with values ranging between 0 and 20,000,000, how
does the Query Optimizer know whether to use an index or a table scan? There could be
10 rows in the range, or 900,000. How does SQL Server estimate how many rows are
between 1,000,000 and 2,000,000? The Query Optimizer gets this information from the
index statistics, as described in the next section.
Index Statistics
As mentioned earlier, the selectivity of a key is an important factor that determines
whether an index will be used to retrieve the data rows that satisfy a query. SQL Server
stores the selectivity and a histogram of sample values of the key; based on the statistics
stored for the key columns for the index and the SARGs specified for the query, the Query
Optimizer decides which index to use.
To see the statistical information stored for an index, use the DBCC SHOW_STATISTICS
command, which returns the following pieces of information:
. A histogram that contains an even sampling of the values for the first column in the
index key. SQL Server stores up to 200 sample values in the histogram.
. Index densities for the combination of columns in the index. Index density indicates
the uniqueness of the index key(s) and is discussed later in this section.
. The number of rows in the table at the time the statistics were computed.
. The number of rows sampled to generate the statistics.
. The number of sample values (steps) stored in the histogram.
. The average key length.
. Whether the index is defined on a string column.
. The date and time the statistics were generated.
The syntax for DBCC SHOW_STATISTICS is as follows:
DBCC SHOW_STATISTICS (tablename, index)
Listing 34.4 displays the abbreviated output from DBCC SHOW_STATISTICS, showing the
statistical information for the aunmind nonclustered index on the au_lname and au_fname
columns of the authors table.
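For the aunmind index named above, the command can be written as shown in the following sketch. The WITH options, which return one section of the output at a time, are an addition for illustration; no output is reproduced here because the values depend on the data at the time the statistics were generated:

```sql
-- Full output: statistics header, density vector, and histogram
dbcc show_statistics ('authors', 'aunmind')
go
-- Each result set can also be requested on its own
dbcc show_statistics ('authors', 'aunmind') with stat_header
dbcc show_statistics ('authors', 'aunmind') with density_vector
dbcc show_statistics ('authors', 'aunmind') with histogram
go
```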