Microsoft SQL Server 2008 R2 Unleashed

CHAPTER 34 Data Structures, Indexes, and Performance
[Figure: a three-level B-tree. Root page 24 (Albert -> page 14, Hunt -> page 15); intermediate page 14 (Albert -> page 8, Cox -> page 9, Eddy -> page 10) and intermediate page 15 (Hunt -> page 11, Smith -> page 12, Watson -> page 13); data (leaf) pages 8 through 13 hold the data rows.]
FIGURE 34.16 The structure of a clustered index.
By default, a clustered index has a single partition and thus has at least one row in
sys.partitions with index_id = 1. When a clustered index has multiple partitions, each
partition has its own B-tree structure that contains the data for that partition.
Depending on the data types in the clustered index, each clustered index structure has
one or more allocation units in which to store and manage the data for a specific partition.
At a minimum, each clustered index has one IN_ROW_DATA allocation unit per partition.
If the table contains any LOB data, the clustered index also has one LOB_DATA
allocation unit per partition. It also has one ROW_OVERFLOW_DATA allocation unit per
partition if the table contains any variable-length columns that exceed the 8,060-byte
row size limit.
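The allocation unit rules above can be restated as a small Python sketch. The function name and its boolean parameters are hypothetical, chosen only for illustration; the sketch simply enumerates which allocation unit types the text says exist per partition.

```python
def allocation_units(partition_count, has_lob_data, has_row_overflow):
    """List (partition_number, allocation_unit_type) pairs for one clustered index."""
    units = []
    for partition_number in range(1, partition_count + 1):
        # Every partition always has an IN_ROW_DATA allocation unit.
        units.append((partition_number, "IN_ROW_DATA"))
        # LOB_DATA exists only when the table contains LOB data.
        if has_lob_data:
            units.append((partition_number, "LOB_DATA"))
        # ROW_OVERFLOW_DATA exists only when variable-length columns
        # can exceed the 8,060-byte row size limit.
        if has_row_overflow:
            units.append((partition_number, "ROW_OVERFLOW_DATA"))
    return units
```

For example, a three-partition index on a table with both LOB and row-overflow data would have nine allocation units under these rules.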
Clustered Index Row Structure
The structure of a clustered index row is similar to the structure of a data row except that
it contains only key columns; this structure is detailed in Figure 34.17.
Understanding Index Structures

[Figure: row layout diagram. Fields labeled in the figure: Status Byte A (1 byte); Fixed Length Key Data (n bytes); Row Locator, made up of File ID (2 bytes), Page Number (4 bytes), and Slot Number (2 bytes); Number of Columns (2 bytes); Null Bitmap (1 bit for each column); Number of Variable Length Columns (2 bytes); Column Offset Array (2 x number of variable columns); Variable Length Key Data (n bytes). Shaded areas represent data present only when the index contains nullable or variable-length columns.]
FIGURE 34.17 Clustered index row structure.
Notice that unlike a data row, index rows do not contain the status byte B or the 2 bytes to
hold the length of fixed-length data fields. Instead of storing the length of the fixed-length
data, which also indicates where the fixed-length portion of a row ends and the variable-
length portion begins, the page header pminlen value is used to help describe an index
row. The pminlen value is the minimum length of the index row, which is essentially the
sum of the size of all fixed-width fields and overhead. Therefore, if no variable-length or
nullable fields are in the index key, pminlen also indicates the width of each index row.
The null bitmap field and field for the number of columns in the index row are present
only when an index key contains nullable columns. The number of columns value is only
needed to determine how many bits are needed in the null bitmap and therefore how
many bytes are required to store the null bitmap (1 byte per eight columns). The data
contents of a clustered index row include the key values along with a 6-byte down-page
pointer (the first 2 bytes are the file ID, and the last 4 bytes are the page number). The
down-page pointer is the last value in the fixed-data portion of the row.
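Two of the sizes mentioned above lend themselves to a quick worked example in Python: the null bitmap (1 byte per eight columns) and the 6-byte down-page pointer (2-byte file ID followed by 4-byte page number). The function names are hypothetical, and the little-endian byte order is an assumption made for the sketch, not something the text states.

```python
import struct

def null_bitmap_bytes(num_columns):
    """Bytes needed for a bitmap holding 1 bit per column (rounded up)."""
    return (num_columns + 7) // 8

def pack_down_page_pointer(file_id, page_number):
    """Pack a 6-byte pointer: 2-byte file ID, then 4-byte page number.

    Assumption: little-endian, unsigned fields ("<HI" has no padding).
    """
    return struct.pack("<HI", file_id, page_number)

def unpack_down_page_pointer(raw):
    """Return (file_id, page_number) from a 6-byte pointer."""
    return struct.unpack("<HI", raw)
```

Under these assumptions, `pack_down_page_pointer(1, 14)` yields exactly 6 bytes, and an index key with 9 columns needs a 2-byte null bitmap.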
Nonunique Clustered Indexes
When a clustered index is defined on a table, the clustered index keys are used as row
locators to identify the data rows being referenced by nonclustered indexes (more on this
topic in the following section on nonclustered indexes). Because the clustered keys are
used as unique row pointers, there needs to be a way to uniquely refer to each row in the
table. If the clustered index is defined as a unique index, the key itself uniquely identifies
every row. If the clustered index was not created as a unique index, SQL Server adds a 4-
byte integer field, called a uniqueifier, to the data row to make each key unique when
necessary. When is the uniqueifier necessary? SQL Server adds the uniqueifier to a row
when the row is added to a table and that new row contains a key that is a duplicate of
the key for an already-existing row.
The uniqueifier is added to the variable-length data area of the data row, which also
results in the addition of the variable-length overhead bytes. Therefore, each duplicate row
in a clustered index has a minimum of 4 bytes of overhead added for the additional
uniqueifier. If the row previously had no variable-length columns, a total of 8 bytes of
overhead is added to the row: 4 bytes for the uniqueifier plus 4 overhead bytes
required for the variable data (storing the number of variable columns requires 2 bytes,
and the column offset array requires 2 bytes).
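The overhead arithmetic above can be checked with a short Python sketch. The function name is hypothetical, and the numbers are the ones given in the text; the result is a minimum, not an exact on-disk size.

```python
def min_uniqueifier_overhead(is_duplicate_key, has_variable_length_columns):
    """Minimum extra bytes a row pays for the uniqueifier, per the text."""
    if not is_duplicate_key:
        # The uniqueifier is added only when the key duplicates an existing key.
        return 0
    overhead = 4  # the 4-byte uniqueifier value itself
    if not has_variable_length_columns:
        # The row gains a variable-length section for the first time:
        # 2 bytes for the variable column count + 2 bytes for the offset array.
        overhead += 2 + 2
    return overhead
```

A duplicate row with no other variable-length columns therefore costs 8 extra bytes, while a row with a unique key value costs nothing.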
Nonclustered Indexes
A nonclustered index is a separate index structure, independent of the physical sort order
of the data rows in the table. You can have up to 999 nonclustered indexes per table.
A nonclustered index is similar to the index in the back of a book. To find the pages on
which a specific subject is discussed, you look up the subject in the index and then go to
the pages referenced in the index. This method is efficient as long as the subject is
discussed on only a few pages. If the subject is discussed on many pages, or if you want to
read about many subjects, it can be more efficient to read the entire book.
A nonclustered index works similarly to the book index. From the index’s perspective, the
data rows are randomly spread throughout the table. The nonclustered index tree contains
the index key values, in sorted order. There is a row at the leaf level of the index for each
data row in the table. Each leaf-level row contains a data row locator to locate the actual
data row in the table.
[Figure: nonleaf-level and leaf-level index pages hold the key values in sorted order (Albert, Alexis, Cox, Dean, Eddy, Franks, Hunt, Martin, Smith, Toms, Watson). Each leaf-level entry carries a page:row pointer (for example, 8:1 or 13:2) to a row on data pages 8 through 13, where the rows are stored in no particular key order.]
FIGURE 34.18 A nonclustered index on a heap table.
If no clustered index is created for the table, the data row locator for the leaf level of the
index is an actual pointer to the data page and the row number within the page where the
row is located (see Figure 34.18).
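As a rough illustration of a nonclustered index over a heap, the following Python sketch builds a sorted leaf level whose entries carry a (page, slot) locator and then follows that locator to the row. The sample pages and all names are hypothetical, and a sorted list stands in for the index tree.

```python
from bisect import bisect_left

# Hypothetical heap: page number -> rows in arrival order, not key order.
heap_pages = {
    8: [("Hunt", "Sally"), ("Alexis", "Amy")],
    9: [("Martin", "Emma"), ("Albert", "Lynn")],
    10: [("Franks", "Anabelle"), ("Cox", "Nancy")],
}

def build_leaf_level(pages):
    """One leaf entry per data row: (lastname, (page, slot)), sorted by key."""
    entries = []
    for page_no, rows in pages.items():
        for slot, row in enumerate(rows, start=1):
            entries.append((row[0], (page_no, slot)))
    entries.sort()
    return entries

def lookup(leaf, pages, lastname):
    """Find matching leaf entries, then follow each locator to the heap row."""
    keys = [key for key, _ in leaf]
    i = bisect_left(keys, lastname)
    matches = []
    while i < len(leaf) and leaf[i][0] == lastname:
        page_no, slot = leaf[i][1]
        matches.append(pages[page_no][slot - 1])
        i += 1
    return matches

leaf = build_leaf_level(heap_pages)
```

Here `lookup(leaf, heap_pages, "Cox")` follows the locator (10, 2) straight to the row on page 10, without consulting any other structure.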
Nonclustered indexes on clustered tables use the associated clustered index key value for
the record as the data row locator. When SQL Server reaches the leaf level of a nonclustered
index, it uses the clustered index key to start searching through the clustered index
to find the actual data row (see Figure 34.19). This adds some I/O to the search itself, but
the benefit is that if a page split occurs in a clustered table, or if a data row is moved (for
example, as a result of an update), the nonclustered index row locator stays the same. As
long as the clustered index key value itself is not modified, no data row locators in the
nonclustered index have to be updated.
[Figure: a search for WHERE firstname='Sally' is traced through the nonclustered index to a leaf row, and from there through the clustered index to the data page that holds the row.]
FIGURE 34.19 A nonclustered index on a clustered table.
[Figure: row layout diagram. Fields labeled in the figure: Status Byte A (1 byte); Fixed Length Key Data (n bytes); Row Locator, made up of File ID (2 bytes), Page Number (4 bytes), and Slot Number (2 bytes); Number of Columns (2 bytes); Null Bitmap (1 bit for each column); Number of Variable Length Columns (2 bytes); Column Offset Array (2 x number of variable columns); Variable Length Key Data (n bytes). Shaded areas represent data present only when the index contains nullable or variable-length columns.]
FIGURE 34.20 The structure of a nonclustered index leaf row for a heap table.
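The reason the locator stays stable on a clustered table can be seen in a deliberately tiny Python model. Plain dictionaries stand in for the two index trees, and the names, keys, and page numbers are hypothetical.

```python
def find_by_firstname(nonclustered, clustered, firstname):
    """Resolve a firstname to (clustered key, physical location) in two hops."""
    clustered_key = nonclustered[firstname]          # hop 1: nonclustered leaf row
    return clustered_key, clustered[clustered_key]   # hop 2: clustered index search

nonclustered = {"Sally": "Hunt"}   # firstname -> clustered key (the row locator)
clustered = {"Hunt": (11, 1)}      # lastname -> (page, slot) of the data row

before = find_by_firstname(nonclustered, clustered, "Sally")
clustered["Hunt"] = (307, 2)       # a page split relocates the row
after = find_by_firstname(nonclustered, clustered, "Sally")
```

Only the clustered structure changed when the row moved; the nonclustered entry for Sally still reads "Hunt" and still resolves to the row's new location, at the cost of the second hop.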
SQL Server performs the following steps when searching for a value by using a nonclustered
index:
1. Queries the system catalog to determine the page address for the root page of the
index.
2. Compares the search value against the index key values on the root page.
3. Finds the highest key value on the page where the key value is less than or equal to
the search value.
4. Follows the down-page pointer to the next level down in the nonclustered index tree.
5. Continues following page pointers (that is, repeats steps 3 and 4) until the nonclustered
index leaf page is reached.
6. Searches the index key rows on the leaf page to locate any matches for the search
value. If no matching row is found on the leaf page, the table contains no matching
values.
7. If a match is found on the leaf page, SQL Server follows the data row locator to the
data row on the data page.
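The steps above can be sketched in Python as a simple tree descent. The page dictionary, page numbers, and locators are hypothetical sample data, the ROOT constant stands in for the system catalog lookup in step 1, and the sketch ignores duplicate keys that span more than one leaf page.

```python
from bisect import bisect_right

# Hypothetical two-level index: a root (nonleaf) page and two leaf pages.
pages = {
    24: {"leaf": False, "rows": [("Albert", 14), ("Hunt", 15)]},
    14: {"leaf": True, "rows": [("Albert", (8, 1)), ("Cox", (9, 2)), ("Eddy", (10, 1))]},
    15: {"leaf": True, "rows": [("Hunt", (11, 1)), ("Smith", (12, 1)), ("Watson", (13, 1))]},
}
ROOT = 24  # step 1: stand-in for looking up the root page address

def search(pages, root, value):
    """Return the data row locators for value, or [] when nothing matches."""
    page = pages[root]
    while not page["leaf"]:
        keys = [key for key, _ in page["rows"]]
        # Steps 2-3: highest key on the page that is <= the search value.
        i = bisect_right(keys, value) - 1
        # A value below every key can only live under the first pointer.
        i = max(i, 0)
        # Steps 4-5: follow the down-page pointer and repeat.
        page = pages[page["rows"][i][1]]
    # Steps 6-7: scan the leaf page and hand back the matching locators.
    return [locator for key, locator in page["rows"] if key == value]
```

With this sample data, `search(pages, ROOT, "Smith")` descends from page 24 to page 15 and returns the locator (12, 1); a search for a value that is not indexed returns an empty list.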
Nonclustered Index Leaf Row Structures
In nonclustered indexes, if the row locator is a row ID, it is stored at the end of the fixed-
length data portion of the row. The rest of the structure of a nonclustered index leaf row is
similar to a clustered index row. Figure 34.20 shows the structure of a nonclustered leaf
row for a heap table.
If the row locator is a clustered index key value, the row locator resides in either the fixed
or variable portion of the row, depending on whether the clustered key columns were
defined as fixed or variable length. Figure 34.21 shows the structure of a nonclustered leaf
row for a clustered table.
When the row locator is a clustered key value and the clustered and nonclustered indexes
share columns, the data value for the key is stored only once in the nonclustered index
row. For example, if your clustered index key is on lastname and you have a nonclustered
index defined on both firstname and lastname, the index rows do not store the value of
lastname twice, but only once for both keys.
[Figure: row layout diagram. Fields labeled in the figure: Status Byte A (1 byte); Fixed Length Nonclustered Key Data (n bytes); Number of Columns (2 bytes); Null Bitmap (1 bit for each column); Number of Variable Length Columns (2 bytes); Column Offset Array (2 x number of variable columns); Variable Length Nonclustered Key Data (n bytes); Row Locator, made up of Non-Overlapping Fixed Length Clustered Key Data (n bytes) and Non-Overlapping Variable Length Clustered Key Data (n bytes). Shaded areas represent data present only when the index contains nullable or variable-length columns.]
FIGURE 34.21 The structure of a nonclustered index leaf row for a clustered table.
[Figure: row layout diagram. Fields labeled in the figure: Status Byte A (1 byte); Fixed Length Key Data (n bytes); Page-Down Pointer, made up of File ID (2 bytes) and Page Number (4 bytes); Number of Columns (2 bytes); Null Bitmap (1 bit for each column); Number of Variable Length Columns (2 bytes); Column Offset Array (2 x number of variable columns); Variable Length Key Data (n bytes). Shaded areas represent data present only when the index contains nullable or variable-length columns.]
FIGURE 34.22 The structure of a nonclustered nonleaf index row for a unique index.
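The rule that shared columns are stored only once amounts to appending just the non-overlapping clustered key columns to the nonclustered key. A minimal Python sketch, with a hypothetical function name, makes that explicit:

```python
def leaf_row_columns(nonclustered_key, clustered_key):
    """Columns stored in a nonclustered leaf row on a clustered table."""
    columns = list(nonclustered_key)
    # Append only the clustered key columns the nonclustered key lacks.
    columns += [col for col in clustered_key if col not in nonclustered_key]
    return columns
```

For the example in the text, `leaf_row_columns(["firstname", "lastname"], ["lastname"])` returns `["firstname", "lastname"]`, so lastname appears once even though it belongs to both keys.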
Nonclustered Index Nonleaf Row Structures
The nonclustered index nonleaf rows are similar in structure to clustered index nonleaf
rows in that they contain a page-down pointer to a page at the next level down in the
index tree. The nonleaf rows don’t need to point to data rows; they only need to provide
the path to traverse the index tree to a leaf row. If the nonclustered index is defined as
unique, the nonleaf index key row contains only the index key value and page-down
pointer. Figure 34.22 shows the structure of a nonleaf index row for a unique nonclustered
index.
If the nonclustered index is not defined as a unique index, the nonleaf rows also contain
the row locator information for the corresponding data row. Storing the row locator in the
nonleaf index row ensures each index key row is unique (because the row locator, by its
nature, must be unique). Ensuring each index key row is unique allows any corresponding
nonleaf index rows to be located and deleted more easily when the data row is deleted.
[Figure: row layout diagram. Fields labeled in the figure: Status Byte A (1 byte); Fixed Length Key Data (n bytes); Row Locator, made up of File ID (2 bytes), Page Number (4 bytes), and Slot Number (2 bytes); Page-Down Pointer, made up of Page Number (4 bytes) and File ID (2 bytes); Number of Columns (2 bytes); Null Bitmap (1 bit for each column); Number of Variable Length Columns (2 bytes); Column Offset Array (2 x number of variable columns); Variable Length Key Data (n bytes). Shaded areas represent data present only when the index contains nullable or variable-length columns.]
FIGURE 34.23 The structure of a nonclustered nonleaf index row for a nonunique index on a
heap table.
[Figure: row layout diagram. Fields labeled in the figure: Status Byte A (1 byte); Fixed Length Nonclustered Key Data (n bytes); Number of Columns (2 bytes); Null Bitmap (1 bit for each column); Number of Variable Length Columns (2 bytes); Column Offset Array (2 x number of variable columns); Variable Length Nonclustered Key Data (n bytes); Row Locator, made up of Non-Overlapping Fixed Length Clustered Key Data (n bytes) and Non-Overlapping Variable Length Clustered Key Data (n bytes); Page-Down Pointer, made up of File ID (2 bytes) and Page Number (4 bytes). Shaded areas represent data present only when the index contains nullable or variable-length columns.]
FIGURE 34.24 The structure of a nonclustered nonleaf index row for a nonunique index on a
clustered table.

For a heap table, the row locator is the corresponding data row’s page and row pointer, as
shown in Figure 34.23.
If the table is clustered, the clustered key values are stored in the nonleaf index rows of the
nonunique nonclustered index just as they are in the leaf rows, as shown in Figure 34.24.
As you can see, it’s possible for the index pointers and row overhead to exceed the size of
the index key itself. This is why, for I/O and storage reasons, it is generally recommended
that you keep your index keys as narrow as practical.
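To put rough numbers on that observation, the following Python sketch adds up the fixed fields shown in Figure 34.23 for a nonleaf row of a nonunique index on a heap. It assumes a fixed-length, non-nullable key (so none of the shaded fields apply), and the function name is hypothetical.

```python
STATUS_BYTE = 1            # Status Byte A
ROW_LOCATOR = 2 + 4 + 2    # file ID + page number + slot number
PAGE_POINTER = 2 + 4       # file ID + page number

def nonleaf_row_bytes(fixed_key_bytes):
    """Row size for a nonunique nonclustered index on a heap (fixed key only)."""
    return STATUS_BYTE + fixed_key_bytes + ROW_LOCATOR + PAGE_POINTER

def overhead_bytes(fixed_key_bytes):
    """Everything in the row that is not the key itself."""
    return nonleaf_row_bytes(fixed_key_bytes) - fixed_key_bytes
```

Under these assumptions, a 4-byte integer key produces a 19-byte row, of which 15 bytes are pointers and overhead.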
Data Modification and Performance
Now that you have a better understanding of the storage structures in SQL Server, it’s time
to look at how SQL Server maintains and manages those structures when data modifications
are taking place in the database.
Inserting Data
When you add a data row to a heap table, SQL Server adds the row to the heap wherever
space is available. SQL Server uses the IAM and PFS pages to identify whether any pages
with free space are available in the extents already allocated to the table. If no page
with enough free space is found, SQL Server uses the information from the GAM and SGAM
pages to locate a free extent and allocate it to the table.
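A heap insert can be modeled in a few lines of Python. In this sketch a linear scan over the pages stands in for the IAM and PFS lookup, appending eight empty pages stands in for allocating an extent through the GAM and SGAM pages, and page capacity is counted in rows rather than bytes. The class and its attributes are hypothetical.

```python
PAGES_PER_EXTENT = 8  # an extent is eight contiguous pages

class Heap:
    def __init__(self, rows_per_page):
        self.rows_per_page = rows_per_page
        self.pages = []             # each page is a list of rows
        self.extents_allocated = 0

    def insert(self, row):
        """Place the row on any page with room; return that page's index."""
        for page_no, page in enumerate(self.pages):
            if len(page) < self.rows_per_page:
                page.append(row)
                return page_no
        # No allocated page has room: allocate a whole new extent.
        first_new_page = len(self.pages)
        self.pages.extend([] for _ in range(PAGES_PER_EXTENT))
        self.extents_allocated += 1
        self.pages[first_new_page].append(row)
        return first_new_page
```

Note that the row lands wherever space exists, so the physical order of a heap reflects free space, not key values.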
For clustered tables, the new data row is inserted at the appropriate location on the
appropriate data page relative to the clustered index key order. If no more room is available on
the destination page, SQL Server needs to link a new page in the page chain to make room
available and add the row. This is called a page split.
In addition to modifying the affected data pages when adding rows, SQL Server needs to
update all nonclustered indexes to add a pointer to the new record. If a page split occurs,
this incurs even more overhead because the clustered index needs to be updated to store
the pointer for the new page added to the table. Fortunately, because the clustered key is
used as the row locator in nonclustered indexes when a table is clustered, even though
the page and row IDs have changed, the nonclustered index row locators for rows moved
by a page split do not have to be updated as long as the clustered key column values
remain the same.
Page Splits
When a page split occurs, SQL Server looks for an available page to link into the page
chain. It first tries to find an available page in the same extent as the pages it will be
linked to. If no free pages exist in the same extent, it looks at the IAM to determine
whether there are any free pages in any other extents already allocated to the table or
index. If no free pages are found, a new extent is allocated to the table.
When a new page is found or allocated to the table and linked into the page chain, the
original page is “split.” Approximately half the rows are moved to the new page, and the
rest remain on the original page (see Figure 34.25). Whether the new page goes before or
after the original page when the split is made depends on the amount of data to be moved.
In an effort to minimize logging, SQL Server moves the smaller rows to the new page. If
the smaller rows are at the beginning of the page, SQL Server places the new page before
the original page and moves the smaller rows to it. If the larger rows are at the beginning
of the page, SQL Server keeps them on the original page and moves the smaller rows to the
new page after the original page.
[Figure: before the split, page 1:201 holds rows AAAA, BBBB, CCCC, EEEE, and FFFF, and page 1:202 holds the rows that follow (GGGG, HHHH, JJJJ, ...). Inserting new row DDDD splits page 1:201: AAAA, BBBB, and CCCC remain on page 1:201, and DDDD, EEEE, and FFFF are placed on new page 1:307. Page 1:202 is unchanged.]
FIGURE 34.25 Page splitting due to inserts.
After determining where the new row goes between the existing rows and whether the
new page is to be added before or after the original page, SQL Server has to move rows to
the new page. The simplified algorithm for determining the split point is as follows:
1. Place the first row (with the lowest clustered key value) at the beginning of the first page.
2. Place the last row (with the highest clustered key value) on the second page.
3. Place the row with the next lowest clustered key value on the first page after the
existing row(s).
4. Place the next-to-last row (with the second highest clustered key value) on the
second page.
5. Continue alternating back and forth until the space between the two pages is balanced
or one of the pages is full.
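The alternating placement above can be approximated in Python. This sketch makes one assumption the text does not spell out: "balanced" is judged by bytes used, so each step gives the next row to whichever page currently holds fewer bytes (lowest remaining key to the first page, highest remaining key to the second). With equal-size rows this reduces to strict alternation. The function name and the row sizes are hypothetical, and page capacity is not modeled.

```python
from collections import deque

def split_rows(rows):
    """Split rows, a key-ordered list of (key, size_in_bytes), across two pages.

    Returns (keys_on_first_page, keys_on_second_page), both in key order.
    """
    remaining = deque(rows)
    first, second = [], deque()
    used_first = used_second = 0
    while remaining:
        if used_first <= used_second:
            key, size = remaining.popleft()   # lowest remaining key
            first.append(key)
            used_first += size
        else:
            key, size = remaining.pop()       # highest remaining key
            second.appendleft(key)
            used_second += size
    return first, list(second)
```

Applied to six equal-size rows AAAA through FFFF, the sketch leaves AAAA, BBBB, and CCCC on the first page and DDDD, EEEE, and FFFF on the second, which matches the outcome shown in Figure 34.25.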
In some situations a double split can occur. If the new row has to go between two existing
rows on a page, but the new row is too large to fit on either page with any of the existing
rows, a new page is added after the original. The new row is added to the new page, a
second new page is added after that, and the remaining original rows are inserted into the
second new page. An example of a double split is shown in Figure 34.26.
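A double split can be sketched in Python as a check on the new row's immediate neighbors. The sketch assumes that "too large to fit with any of the existing rows" can be tested against the rows directly before and after the insert position, and it returns None when an ordinary split would apply. The function name, row sizes, and capacity figure are hypothetical.

```python
from bisect import bisect_left

def double_split(rows, new_row, capacity):
    """Return [original page, new page, second new page] or None.

    rows: key-ordered list of (key, size) on the full page.
    new_row: (key, size) of the row being inserted.
    """
    keys = [key for key, _ in rows]
    pos = bisect_left(keys, new_row[0])
    before, after = rows[:pos], rows[pos:]
    # The double split applies only when the row lands between two rows
    # and cannot share a page with either neighbor.
    if before and after:
        neighbors = [before[-1], after[0]]
        if all(new_row[1] + size > capacity for _, size in neighbors):
            return [before, [new_row], after]
    return None  # an ordinary page split is enough
```

With five 100-byte rows (AAAA, BBBB, CCCC, EEEE, FFFF), a 7,950-byte DDDD row, and a hypothetical capacity of 8,000 bytes, the result is three pages: the original rows before DDDD, DDDD alone, and the remaining original rows.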
[Figure: before the split, page 1:201 holds rows AAAA, BBBB, CCCC, EEEE, and FFFF, and page 1:202 holds the rows that follow (GGGG, HHHH, JJJJ, ...). Inserting a very large DDDD row causes a double split: AAAA, BBBB, and CCCC remain on page 1:201, the large DDDD row occupies new page 1:307 by itself, and EEEE and FFFF move to second new page 1:308. Page 1:202 is unchanged.]
FIGURE 34.26 Double page split due to large row insert.
NOTE
Although page splits are expensive when they occur, they do generate free space in the
split pages for future inserts into those pages. Page splits also help keep the index
tree balanced as rows are added to the table. However, if you monitor the system with
Performance Monitor and are seeing hundreds of page splits per second, you might
want to consider rebuilding the clustered index on the table and applying a lower fill
factor to provide more free space in the existing pages. This can help improve system
performance until eventually the pages fill up and start splitting again. For this reason,
some shops supporting high-volume online transaction processing (OLTP) environments
with a lot of insert activity rebuild the indexes with a lower fill factor on a daily basis.
