
Microsoft SQL Server 2008 R2 Unleashed


CHAPTER 34 Data Structures, Indexes, and Performance
If SQL Server had to search throughout an entire database file to find free extents, it wouldn’t be efficient. Instead, SQL Server uses two special types of pages to record which extents have been allocated to tables or indexes and whether each is a mixed or uniform extent:
. Global allocation map pages (GAMs)
. Shared global allocation map pages (SGAMs)
Global and Shared Global Allocation Map Pages
The allocation map pages track whether extents have been allocated to objects and
indexes and whether the allocation is for mixed extents or uniform extents. As mentioned
in the preceding section, there are two types of allocation map pages:
. Global allocation map (GAM)—The GAM keeps track of all allocated extents in a database, regardless of what they are allocated to. The structure of the GAM is straightforward: each bit in the page outside the page header represents one extent in the file, where 1 means that the extent is not allocated, and 0 means that the extent is allocated. Nearly 8,000 bytes (64,000 bits) are available in a GAM page after the header and other overhead bytes are taken into account. Therefore, a single GAM covers approximately 64,000 extents, or 4GB (64,000 * 64KB) of data.
. Shared global allocation map (SGAM)—The SGAM keeps track of mixed extents
that have free space available. An SGAM has a structure similar to a GAM, with each
bit representing an extent. A value of 1 means that the extent is a mixed extent and
there is free space (at least one unused page) available on the extent. A value of 0
means that the extent is not currently allocated, that the extent is a uniform extent,
or that the extent is a mixed extent with no free pages.
Table 34.6 summarizes the meaning of the bits in GAMs and SGAMs.
When SQL Server needs to allocate a uniform extent, it simply searches the GAM for a bit with a value of 1 and sets it to 0 to indicate that the extent has been allocated. To find a mixed extent with free pages, it searches the SGAM for a bit set to 1. When all pages in a mixed extent are used, its corresponding bit is set to 0. When a new mixed extent needs to be allocated, SQL Server searches the GAM for an extent whose bit is set to 1, sets that bit to 0, and sets the corresponding SGAM bit to 1. There is some more processing involved as well, such as spreading the data evenly across database files, but the allocation algorithms are still relatively simple.
TABLE 34.6 Meaning of the GAM and SGAM Bits

Extent Usage                              GAM Bit   SGAM Bit
Free, not used                            1         0
Uniform, or mixed with no free pages      0         0
Mixed, with free pages available          0         1
SQL Server is able to easily locate GAM pages in a database because the first GAM page is
located at the third page in the file (page number 2). There is another GAM every 511,232
pages after the first GAM. The fourth page (page number 3) in each database file is the first
SGAM page, and there is another SGAM every 511,232 pages after the first SGAM.
Page Free Space Pages
A page free space (PFS) page records whether each page is allocated and the amount of free space available on the page. Each PFS page covers 8,088 contiguous pages in the file. For each of the 8,088 pages, the PFS stores a 1-byte record indicating whether the page is allocated and whether it is empty, 1 to 50% full, 51 to 80% full, 81 to 95% full, or more than 95% full. The first PFS page in a file is located at page number 1, the second PFS page is located at page 8,088, and each additional PFS page follows every 8,088 pages after that. SQL Server uses PFS pages to find free pages on extents and to find pages with space available when a new row needs to be added to a table or index.
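
If you want to examine these allocation pages directly, you can use the undocumented (but long-standing) DBCC PAGE command together with trace flag 3604. The sketch below assumes the bigpubs2008 sample database and file ID 1; because DBCC PAGE is undocumented, its output format is subject to change:

```sql
-- Send DBCC PAGE output to the client session instead of the error log
DBCC TRACEON(3604);

-- DBCC PAGE (database name, file ID, page number, print option)
-- Page 1 of the file is the first PFS page; print option 3 shows the
-- allocation status and fullness bytes for each page it covers
DBCC PAGE('bigpubs2008', 1, 1, 3);

-- Page 2 is the first GAM page and page 3 is the first SGAM page;
-- their output summarizes allocated and unallocated extent ranges
DBCC PAGE('bigpubs2008', 1, 2, 3);
DBCC PAGE('bigpubs2008', 1, 3, 3);
```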
Figure 34.6 shows the layout of GAM, SGAM, and PFS pages in a database file. Note that every file has a single file header located at page 0.
Index Allocation Map Pages
Index allocation map (IAM) pages keep track of the extents used by a heap or index. Each heap table and index has at least one IAM page for each file where it has extents. An IAM cannot reference pages in other database files; if the heap or index spreads to a new database file, a new IAM for the heap or index is created in that file. IAM pages are allocated as needed and are spread randomly throughout the database files.
An IAM page contains a small header that has the address of the first extent in the range
of pages being mapped by the IAM. It also contains eight page pointers that keep track of
index or heap pages that are in mixed extents. These pointers might or might not contain
any information, depending on whether any data has been deleted from the tables and
the page(s) released. Remember, an index or heap will have no more than eight pages in
mixed extents (after eight pages, it begins using uniform extents), so only the first IAM
page stores this information. The remainder of the IAM page is for the allocation bitmap.
The IAM bitmap works similarly to the GAM, indicating which extents over the range of
extents covered by the IAM are used by the heap or index the IAM belongs to. If a bit is
on, the corresponding extent is allocated to the table.
Each IAM covers a possible range of 63,903 extents (511,224 pages), covering a 4GB section of a file. Each bit represents an extent within that range, whether or not the extent is allocated to the object that the IAM belongs to. If the bit is set to 1, the relative extent in the range is allocated to the index or heap. If the bit is set to 0, the extent is either not allocated or might be allocated to another heap or index.

FIGURE 34.6 The layout of GAM, SGAM, and PFS pages in a database file. (The file header occupies page 0, the first PFS page is page 1, the first GAM page is page 2, and the first SGAM page is page 3; additional PFS pages repeat at 8,088-page intervals, and additional GAM and SGAM pages repeat at 511,232-page intervals.)
For example, assume that an IAM page resides at page 649 in the file. If the bit pattern in
the first byte of the IAM is 1010 0100, the first, third, and sixth extents within the range
of the IAM are allocated to the heap or index. The second, fourth, fifth, seventh, and
eighth extents are not.
NOTE
For a heap table, the data pages and rows within them are not stored in any specific
order. Unlike versions of SQL Server prior to 7.0, the pages in a heap structure are not
linked together in a page chain. The only logical connection between data pages is the
information recorded in the IAM pages, which are linked together. The structure of heap
tables is examined in more detail later in this chapter.
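
One way to see the IAM pages for a particular table is the undocumented DBCC IND command, which lists every page allocated to a heap or index. A sketch against the titles table in the bigpubs2008 sample database (the command is unsupported, so treat the output format as subject to change):

```sql
-- DBCC IND (database name, table name, index ID); -1 lists pages
-- for all indexes on the table, including their IAM pages
DBCC IND('bigpubs2008', 'titles', -1);

-- In the output, rows with PageType = 10 are IAM pages; the IAMFID and
-- IAMPID columns on the remaining rows identify the IAM page that maps
-- each data or index page
```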
Differential Changed Map Pages
The seventh page (page number 6), and every 511,232nd page thereafter, in the database file is the differential changed map (DCM) page. This page keeps track of which extents in a file have been modified since the last full database backup. When an extent has been modified, its corresponding bit in the DCM is turned on. This information is used when a differential backup is performed on the database. A differential backup copies only the extents changed since the last full backup was made. Using the DCM, SQL Server can quickly tell which extents need to be backed up by examining the bits on the DCM pages for each data file in the database. When a full backup is performed for the database, all the bits are set back to 0.
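
The DCM is what allows a differential backup to read only the changed extents rather than scanning the whole database. A minimal sketch (the backup paths are placeholders):

```sql
-- A full backup clears the DCM bits for the database
BACKUP DATABASE bigpubs2008
   TO DISK = 'C:\Backups\bigpubs2008_full.bak';

-- A later differential backup copies only the extents whose
-- DCM bits have been turned on since that full backup
BACKUP DATABASE bigpubs2008
   TO DISK = 'C:\Backups\bigpubs2008_diff.bak'
   WITH DIFFERENTIAL;
```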
Bulk Changed Map Pages
The eighth page (page number 7), and every 511,232nd page thereafter, in the database file is the bulk changed map (BCM) page. When you perform a minimally logged bulk operation in SQL Server 2008 in BULK_LOGGED recovery mode, SQL Server logs only the fact that the operation occurred and doesn’t log the actual data changes. The operation is still fully recoverable because SQL Server keeps track of which extents were actually modified by the bulk operation in the BCM page. Similar to the DCM page, each bit on a BCM page represents an extent within its range; if the bit is set to 1, the corresponding extent has been changed by a minimally logged bulk operation since the last full database backup. All the bits on the BCM page are reset to 0 whenever a full database backup or log backup occurs.
When you initiate a log backup for a database using the BULK_LOGGED recovery model, SQL
Server scans the BCM pages and backs up all the modified extents along with the contents
of the transaction log itself. You should be aware that the log file itself might be small, but
the backup of the log can be many times larger if a large bulk operation has been
performed since the last log backup.
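
The sequence below sketches this behavior; the database name and backup path are placeholders:

```sql
-- Use bulk-logged recovery so bulk operations are minimally logged
ALTER DATABASE bigpubs2008 SET RECOVERY BULK_LOGGED;

-- ... perform a large bulk operation here, such as BULK INSERT,
-- SELECT INTO, or an index rebuild ...

-- The log backup copies the extents flagged in the BCM pages along
-- with the log records, so it can be much larger than the log itself
BACKUP LOG bigpubs2008
   TO DISK = 'C:\Backups\bigpubs2008_log.trn';
```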
Data Compression
SQL Server 2008 introduced a new data compression feature that is available in Enterprise
and Datacenter Editions. Data compression helps to reduce both storage and memory
requirements as the data is compressed both on disk and when brought into the SQL
Server data cache.
When compression is enabled and data is written to disk, it is compressed and stored in
the designated compressed format. When the data is read from disk into the buffer cache,
it remains in its compressed format. This helps reduce both storage requirements and
memory requirements. It also reduces I/O because more data can be stored on a data page
when it’s compressed. When the data is passed to another component of SQL Server, however, the Database Engine has to uncompress the data on the fly. In other words, every time data is passed to or from the buffer cache, it must be compressed or uncompressed, which requires extra CPU overhead. In most cases, however, the amount of I/O and buffer cache space saved by compression more than makes up for the CPU cost, boosting the overall performance of SQL Server.
Data compression can be applied on the following database objects:
. Tables (clustered or heap)
. Nonclustered indexes
. Indexed views
As the DBA, you need to evaluate which of the preceding objects in your database could benefit from compression and then decide whether to compress them using either row-level or page-level compression. Compression is enabled or disabled at the object level; there is no single option you can enable that turns compression on or off for all objects in the database. Fortunately, other than turning compression on or off for the preceding objects, you don’t have to do anything else to use data compression. SQL Server handles data compression transparently, without your having to re-architect your database or your applications.
Row-Level Compression
Row-level compression isn’t true data compression. Instead, space savings are achieved by
using a more efficient storage format for fixed-length data to use the minimum amount of
space required. For example, the int data type uses 4 bytes of storage regardless of the value
stored, even NULL. However, only a single byte is required to store a value of 100. Row-level
compression allows fixed-length values to use only the amount of storage space required.
Row-level compression saves space and reduces I/O by
. Reducing the amount of metadata required to store data rows
. Storing fixed-length numeric data types as if they were variable-length data types,
using only as many bytes as necessary to store the actual value
. Storing CHAR data types as variable-length data types
. Not storing NULL or 0 values
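
A simple way to observe these savings on an existing table is to compare sp_spaceused output before and after a rebuild with row compression. The sketch below uses the titles table in the bigpubs2008 sample database:

```sql
USE bigpubs2008;

EXEC sp_spaceused 'titles';   -- note the data and index sizes

ALTER TABLE titles REBUILD WITH (DATA_COMPRESSION = ROW);

EXEC sp_spaceused 'titles';   -- compare the sizes after row compression
```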

Row-level data compression provides less compression than page-level data compression,
but it also incurs less overhead, reducing the amount of CPU resources required to
implement it.
Row-level compression can be enabled when creating a table or index or using the ALTER
TABLE or ALTER INDEX commands by specifying the WITH (DATA_COMPRESSION = ROW)
option. The following example enables row compression on the titles table in the
bigpubs2008 database:
ALTER TABLE titles REBUILD WITH (DATA_COMPRESSION=ROW)
Additionally, if a table or index is partitioned, you can apply compression at the partition level.
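
For example, assuming the sales_big table has been partitioned, you could page-compress one partition while leaving another row-compressed (the partition numbers here are illustrative):

```sql
-- Apply page compression to a single partition
ALTER TABLE sales_big REBUILD PARTITION = 1
    WITH (DATA_COMPRESSION = PAGE);

-- Different partitions of the same table can use different settings
ALTER TABLE sales_big REBUILD PARTITION = 2
    WITH (DATA_COMPRESSION = ROW);
```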
When row-level compression is applied to a table, a new row format is used that is unlike the standard data row format discussed previously, which has a fixed-length data section separate from a variable-length data section (see Figure 34.3). This new row format is referred to as column descriptor, or CD, format. The name refers to the fact that every column has description information contained in the row itself. Figure 34.7 illustrates a representative view of the CD format (a definitive view is difficult because, except for the header, the number of bytes in each region is completely dependent on the values in the data row).
The row header is always 1 byte in length and contains information similar to Status Bits
A in a normal data row:
. Bit 0—This bit indicates the type of record (1 = CD record format).
. Bit 1—This bit indicates whether the row contains versioning information.
. Bits 2–4—This three-bit value indicates what kind of information is stored in the
row (such as primary record, ghost record, forwarding record, index record).
. Bit 5—This bit indicates whether the row contains a Long data region (with values
greater than 8 bytes in length).
. Bits 6 and 7—These bits are not used.
The CD region consists of two parts. The first is either a 1- or 2-byte value indicating the number of short columns (8 bytes or less). If the most significant bit of the first byte is set to 0, it’s a 1-byte field representing up to 127 columns; if it’s 1, it’s a 2-byte field representing up to 32,767 columns. Following the first 1 or 2 bytes is the CD array. The CD array uses 4 bits for each column in the table to represent information about the length of the column. A bit representation of 0 indicates the column is NULL. A bit representation of the values 1 to 9 indicates the column is 0 to 8 bytes in length, respectively. A bit representation of 10 (0xa) indicates that the corresponding column value is a long data value and uses no space in the short data region. A bit representation of 11 (0xb) represents a bit column with a value of 1, and a bit representation of 12 (0xc) indicates that the corresponding value is a 1-byte symbol representing a value in the page compression dictionary (the page compression dictionary is discussed in the page-level compression section).

FIGURE 34.7 A representative structure of a CD format row: a 1-byte header, followed by the CD region, the short data region, the long data region, and the special information section.
The short data region contains each of the short data values. However, because accessing
the last columns can be expensive if there are hundreds of columns in the table, columns
are grouped into clusters of 30 columns. At the beginning of the short data region, there is
an area called the short data cluster array. Each entry in the array is a single byte, which
indicates the sum of the sizes of all the data in the previous cluster in the short data
region; the value is essentially a pointer to the first column of the cluster (no row offset is
needed for the first cluster because it starts immediately after the CD region).
Any data value in the row longer than 8 bytes is stored in the long data region. This can include LOB and row-overflow pointers. Long data needs an actual offset value to allow SQL Server to locate each value. This offset array looks similar to the offset array used in the standard data row structure. The long data region consists of three parts: an offset array, a long data cluster array, and the long data. The long data cluster array is similar to the short data cluster array; it has one entry for each 30-column cluster (except for the last one) and serves to limit the cost of locating columns near the end of a long list of columns.
The special information section at the end of the row contains three optional pieces of
information. The existence of any or all of this information is indicated by bits in the first
1-byte header at the beginning of the row. The three special pieces of information are
. Forwarding pointer—This pointer is used in a heap when a row is forwarded due to an update (forward pointers are discussed later in this chapter).
. Back pointer—If the row is a forwarded row, it contains a pointer back to the original row location.
. Versioning information—If snapshot isolation is being used, 14 bytes of versioning information are appended to the row.
Page-Level Compression
Page-level compression is an implementation of true data compression, using both column prefix and dictionary-based compression. Data is compressed by storing repeating values or common prefixes only once and then referencing those values from other columns and rows. When you implement page compression for a table, row compression is applied as well. Page-level compression offers increased data compression over row-level compression alone, but at the expense of greater CPU utilization. It works using these techniques:
. First, row-level data compression is applied to fit as many rows as it can on a
single page.
. Next, column prefix compression is run. Essentially, repeating patterns of data at the beginning of the values of a given column are removed and substituted with an abbreviated reference, which is stored in the compression information (CI) structure stored after the page header.
. Finally, dictionary compression is applied on the page. Dictionary compression
searches for repeated values anywhere on a page and stores them in the CI.
Page compression is applied only after a page is full and if SQL Server determines that
compressing a page will save a meaningful amount of space.
The amount of compression provided by page-level data compression is highly dependent on the data stored in a table or index. If a lot of the data repeats itself, compression is more efficient. If the data consists of mostly random, discrete values, fewer benefits are gained from using page-level compression.
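
You can verify which objects and partitions currently have compression enabled by querying the sys.partitions catalog view:

```sql
SELECT OBJECT_NAME(p.object_id) AS table_name,
       p.index_id,
       p.partition_number,
       p.data_compression_desc          -- NONE, ROW, or PAGE
FROM   sys.partitions AS p
WHERE  p.data_compression_desc <> 'NONE'
ORDER  BY table_name, p.index_id, p.partition_number;
```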
Column prefix compression looks at the column values on a single page and chooses a
common prefix that can be used to reduce the storage space required for values in that
column. The longest value in the column that contains the prefix is chosen as the anchor
value. A row that represents the prefix values for each column is created and stored in the
CI structure that immediately follows the page header. Each column is then stored as a
delta from the anchor value, where repeated prefix values in the column are replaced by a
reference to the corresponding prefix. If the value in a row does not exactly match the
selected prefix value, a partial match can still be indicated.
For example, consider a page that contains the data rows shown in Figure 34.8 before prefix compression is applied.
After you apply column prefix compression on the page, the CI structure is stored after
the page header holding the prefix values for each column. The columns then are stored
as the difference between the prefix and column value, as shown in Figure 34.9.
In the first column of the first data row, the value 4b indicates that the first four characters of the prefix (aaab) are present at the beginning of the column for that row, followed by the character b. If you append the character b to the first four characters of the prefix, it rebuilds the original value of aaabb. For any column values that are [empty], the column matches the prefix value exactly. Any column value that starts with 0 means that none of the first characters of the column match the prefix. For the fourth column, there is no common prefix value in the columns, so no prefix value is stored in the CI structure.
Page Header

    aaabb     aaaab     abcd    abc
    aaabccc   bbbbb     abcd    mno
    aaaccc    aaaacc    bbbb    xyz

FIGURE 34.8 Sample page of a table before prefix compression.
Page Header   [CI: aaabccc   aaaacc   abcd   (no prefix)]

    4b        4b        [empty]   abc
    [empty]   0bbbb     [empty]   mno
    3ccc      [empty]   0bbbb     xyz

FIGURE 34.9 Sample page of a table after prefix compression.
After column prefix compression is applied to every column individually on the page, SQL Server then looks to apply dictionary compression. Dictionary compression looks for repeated values anywhere on the page and stores them in the CI structure after the column prefix values. Dictionary compression replaces repeated values anywhere on a page. Figure 34.10 illustrates the same page shown previously after dictionary compression has been applied.

The dictionary is stored as a set of these duplicate values and a symbol to represent the values in the columns on the page. As you can see in this example, 4b is repeated in multiple columns in multiple rows, and it is replaced by the symbol 0 throughout the page. The value 0bbbb is replaced by the symbol 1. SQL Server recognizes that the value stored in a column is a symbol rather than a data value by examining the coding in the CD array, as discussed earlier.

Not all pages contain both a prefix record and a dictionary. Having both depends on whether the data has enough repeating values or patterns to warrant either a prefix record or a dictionary.
Page Header   [CI: aaabccc   aaaacc   abcd   [NULL];  dictionary: 0 = 4b, 1 = 0bbbb]

    0         0         [empty]   abc
    [empty]   1         [empty]   mno
    3ccc      [empty]   1         xyz

FIGURE 34.10 Sample page of a table after dictionary compression.
The CI Record
The CI record is the main structural change on a page that is page compressed versus a page that uses row compression only. As shown in the previous examples, the CI record is located immediately after the page header. There is no entry for the CI record in the row offset table because its location is always the same. A bit is set in the page header to indicate whether the page is page compressed. When this bit is set, SQL Server knows to look for the CI record. The CI record contains the data elements shown in Table 34.7.
Implementing Page Compression
Page compression can be implemented for a table at the time it is created or by using the
ALTER TABLE command, as in the following example:
ALTER TABLE sales_big REBUILD WITH (DATA_COMPRESSION=PAGE)

Unlike row compression, which is applied immediately on the rows, page compression isn’t applied until the page is full. The rows cannot be compressed until SQL Server can determine what encodings for prefix and dictionary substitution are going to be used to replace the actual data. When you enable page compression for a table or a partition, SQL Server examines every full page to determine the possible space savings. Any pages that are not full are not considered for compression. During the compression analysis, the prefix and dictionary values are created, and the column values are modified to reflect the prefix and dictionary values. Then row compression is applied. If the new compressed page can hold at least five additional rows, or 25% more rows than the page currently holds, the page is compressed. If neither of these criteria is met, the compressed version of the page is discarded.

TABLE 34.7 Data Elements Within the CI Record

Name             Description
Header           This structure contains 1 byte to keep track of information about the CI. Bit 0 is the version (currently always 0), bit 1 indicates the presence of a column prefix anchor record, and bit 2 indicates the presence of a compression dictionary.
PageModCount     This value keeps track of the number of changes to the page to determine whether the compression on the page should be reevaluated and the CI record rebuilt.
Offsets          This element contains values to help SQL Server find the dictionary: the offset of the end of the column prefix anchor record and the offset of the end of the CI record itself.
Anchor Record    This record looks exactly like a regular CD record (see Figure 34.7). The values stored are the common prefix values for each column, some of which might be NULL.
Dictionary       The first 2 bytes represent the number of entries in the dictionary, followed by an offset array of 2-byte entries indicating the end offset of each dictionary entry, and then the actual dictionary values.
New rows inserted into a compressed page are compressed as they are inserted. However,
new entries are not added to the prefix list or dictionary based on a single new row. The
prefix values and dictionary symbols are rebuilt only on an all-or-nothing basis. After the
page is changed a sufficient number of times, SQL Server evaluates whether to rebuild the
CI record. The PageModCount field in the CI record is used to keep track of the number of
changes to the page since the CI record was last built or rebuilt. This value is updated
every time a row is updated, deleted, or inserted. If SQL Server encounters a full page
during a data modification and the PageModCount is greater than 25 or the PageModCount
divided by the number of rows on the page is greater than 25%, SQL Server reapplies the
compression analysis on the page. Again, only if recompressing the page creates room for
five additional rows, or 25% more rows than the page currently holds, the new
compressed page replaces the existing page.
In B-tree structures (nonclustered indexes or a clustered table), only the leaf-level and data
pages are considered for compression. When you insert a new row into a leaf or data page,
if the compressed row fits, it is inserted and nothing more is done. If it doesn’t fit, SQL
Server attempts to recompress the page and then recompress the row based on the new CI
record. If the row fits after recompression, it is inserted and nothing more is done. If the
row still doesn’t fit, the page needs to be split. When a compressed page is split, the CI record is copied to the new page exactly as it was, along with the rows moved to the new page. However, the PageModCount value is set to 25 so that when the new page gets full, it will immediately be analyzed for recompression. Leaf and data pages are also checked for recompression whenever you run an index rebuild or shrink operation.
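
Because an index rebuild triggers this recompression check, it is also the natural time to change an index’s compression setting. The index name below is hypothetical:

```sql
-- Rebuild a single nonclustered index with page compression
ALTER INDEX IX_sales_big_date ON sales_big
    REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Or rebuild all indexes on the table with the same setting
ALTER INDEX ALL ON sales_big
    REBUILD WITH (DATA_COMPRESSION = PAGE);
```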
If you enable compression on a heap table, pages are evaluated for compression only
during rebuild and shrink operations. Also, if you drop a clustered index on a table,
turning it into a heap, SQL Server runs compression analysis on any full pages.

Compression is avoided during normal data modification operations on a heap to avoid changes to the Row IDs, which are used as the row locators for any indexes on the heap. (See the “Understanding Index Structures” section later in this chapter for a discussion of row locators.) Although the RowModCounter is still maintained, SQL Server essentially ignores it and never tries to recompress a page based on the RowModCounter value.
Evaluating Page Compression
Before choosing to implement page compression, you should determine whether the space savings will be sufficient to justify the overhead of page compression. To determine how changing the compression state will affect a table or an index, you can use the SQL Server 2008 sp_estimate_data_compression_savings stored procedure, which is available only in the editions of SQL Server that support data compression. This stored procedure evaluates the effects of compression by sampling up to 5,000 pages of the table, creating a copy of those pages in tempdb, compressing the copy, and then using the sample to estimate the overall size of the table after compression. The syntax for sp_estimate_data_compression_savings is as follows:
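
In outline, the procedure takes the schema and object names, an optional index ID and partition number (NULL meaning all), and the target compression setting. For example, to estimate page compression for the sales_big table used earlier:

```sql
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',        -- schema of the table
     @object_name      = 'sales_big',  -- table to evaluate
     @index_id         = NULL,         -- NULL = all indexes
     @partition_number = NULL,         -- NULL = all partitions
     @data_compression = 'PAGE';       -- NONE, ROW, or PAGE
```

The result set reports the current size and the estimated size under the requested compression setting for each index and partition sampled.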