Pro SQL Server Internals

www.it-ebooks.info

For your convenience Apress has placed some of the front
matter material after the index. Please use the Bookmarks
and Contents at a Glance links to access them.

Contents at a Glance
About the Author
About the Technical Reviewers
Acknowledgments
Introduction

■■Part 1: Tables and Indexes
■■Chapter 1: Data Storage Internals
■■Chapter 2: Tables and Indexes: Internal Structure and Access Methods
■■Chapter 3: Statistics
■■Chapter 4: Special Indexing and Storage Features
■■Chapter 5: Index Fragmentation
■■Chapter 6: Designing and Tuning the Indexes

■■Part 2: Other Things That Matter
■■Chapter 7: Constraints
■■Chapter 8: Triggers
■■Chapter 9: Views
■■Chapter 10: User-Defined Functions
■■Chapter 11: XML
■■Chapter 12: Temporary Tables
■■Chapter 13: CLR
■■Chapter 14: CLR Types
■■Chapter 15: Data Partitioning
■■Chapter 16: System Design Considerations

■■Part 3: Locking, Blocking and Concurrency
■■Chapter 17: Lock Types
■■Chapter 18: Troubleshooting Blocking Issues
■■Chapter 19: Deadlocks
■■Chapter 20: Lock Escalation
■■Chapter 21: Optimistic Isolation Levels
■■Chapter 22: Application Locks
■■Chapter 23: Schema Locks
■■Chapter 24: Designing Transaction Strategies

■■Part 4: Query Life Cycle
■■Chapter 25: Query Optimization and Execution
■■Chapter 26: Plan Caching

■■Part 5: Practical Troubleshooting
■■Chapter 27: System Troubleshooting
■■Chapter 28: Extended Events

■■Part 6: Inside the Transaction Log
■■Chapter 29: Transaction Log Internals
■■Chapter 30: Designing a Backup Strategy
■■Chapter 31: Designing a High Availability Strategy

■■Part 7: In-Memory OLTP Engine
■■Chapter 32: In-Memory OLTP Internals

■■Part 8: Columnstore Indexes
■■Chapter 33: In-Memory OLTP Programmability
■■Chapter 34: Introduction to Columnstore Indexes
■■Chapter 35: Clustered Columnstore Indexes
Index

Introduction
Several people asked me the same question during the time I worked on this book: “Why have you decided to write yet
another book on SQL Server Internals? There are plenty of books on this subject out there, including an excellent one
by Kalen Delaney et al., the latest version being entitled Microsoft SQL Server 2012 Internals, Developer Reference
series (Microsoft Press 2013).”
To be absolutely honest, I asked myself the same question while I toyed with the idea of writing this book. In the
end, I defined two goals:

1. I wanted to write a book that explains how SQL Server works while keeping the content as
practical as possible.

2. I wanted the book to be useful to both database administrators and developers.

There is a joke in the SQL Server community: “How do you distinguish between junior- and senior-level database
professionals? Just ask them any question about SQL Server. The junior-level person gives you the straight answer. The
senior-level person, on the other hand, always answers, ‘It depends.’”
As strange as it sounds, that is correct. SQL Server is a very complex product with a large number of components
that depend on each other. You can rarely give a straight yes or no answer to any question. Every decision comes with
its own set of strengths and weaknesses and leads to consequences that affect other parts of the system.
This book talks about what “it depends” on. My goal is to give you enough information about how SQL Server
works and to show you various examples of how specific database designs and code patterns affect SQL Server
behavior. I tried to avoid generic suggestions based on best practices. Even though those suggestions are great and
work in a large number of cases, there are always exceptions. I hope that, after you read this book, you will be able to
recognize those exceptions and make decisions that benefit your particular systems.
My second goal is based on the strong belief that the line between database administration and development is
very thin. It is impossible to be a successful database developer without knowledge of SQL Server Internals. Similarly,
it is impossible to be a successful database administrator without the ability to design an efficient database schema
and to write good T-SQL code. That knowledge also helps developers and administrators better understand
and collaborate with each other, which is especially important in the age of agile development and
multi-terabyte databases.
I have worn both hats in my life. I started my career in IT as an application developer, slowly moving to backend
and database development over the years. At some point, I found that it was impossible to write good T-SQL code
unless I understood how SQL Server executes it. That discovery forced me to learn SQL Server Internals, and it led to
a new life where I design, develop, and tune various database solutions. I do not write client applications anymore;
however, I perfectly understand the challenges that application developers face when they deal with SQL Server. I
have “been there and done that.”
I still remember how hard it was to find good learning materials. There were plenty of good books; however,
all of them had a clear separation in their content. They expected the reader to be either a developer or a database
administrator, never both. I tried to avoid that separation in this book. Obviously, some of the chapters are more
DBA-oriented, while others lean more towards developers. Nevertheless, I hope that anyone who is working with SQL
Server will find the content useful.
That said, do not consider this book a SQL Server tutorial. I expect you to have previous experience working
with relational databases, preferably with SQL Server. You need to know RDBMS concepts, be familiar with different
types of database objects, and be able to understand SQL code if you want to get the most out of this book.

Finally, I would like to thank you for choosing this book and for your trust in me. I hope that you will enjoy
reading it as much as I enjoyed writing it.

How This Book Is Structured
The book is logically separated into eight different parts. Even though all of these parts are relatively independent
of each other, I would encourage you to start with Part 1, “Tables and Indexes” anyway. This part explains how SQL
Server stores and works with data, which is the key point in understanding SQL Server Internals. The other parts of the
book rely on this understanding.

The Parts of the book are as follows:
Part 1: Tables and Indexes covers how SQL Server works with data. It explains the internal
structure of database tables, discusses how and when SQL Server uses indexes, and
provides you with basic guidelines on how to design and maintain them.
Part 2: Other Things That Matter provides an overview of different T-SQL objects, and it
outlines their strengths and weaknesses along with use-cases when they should or should
not be used. Finally, this part discusses data partitioning, and provides general system
design considerations for systems that utilize SQL Server as a database backend.
Part 3: Locking, Blocking, and Concurrency talks about the SQL Server concurrency
model. It explains the root-causes of various blocking issues in SQL Server, and it shows
you how to troubleshoot and address them in your systems. Finally, this part provides
you with a set of guidelines on how to design transaction strategies in a way that improves
concurrency in systems.
Part 4: Query Life Cycle discusses the optimization and execution of queries in SQL Server.
Moreover, it explains how SQL Server caches execution plans and it demonstrates several
plan-caching–related issues commonly encountered in systems.
Part 5: Practical Troubleshooting provides an overview of the SQL Server Execution
Model, and it explains how you can quickly diagnose systems and pinpoint the root-causes
of the problems.
Part 6: Inside the Transaction Log explains how SQL Server works with the transaction
log, and it gives you a set of guidelines on how to design Backup and High Availability
strategies in systems.
Part 7: In-Memory OLTP Engine (Hekaton) talks about the new in-memory OLTP engine
introduced in SQL Server 2014. It explains how Hekaton works internally and how you can
work with memory-optimized data in your systems.
Part 8: Columnstore Indexes provides an overview of columnstore indexes, which can
dramatically improve the performance of Data Warehouse solutions. It covers nonclustered
columnstore indexes, which were introduced in SQL Server 2012, along with clustered
columnstore indexes, introduced in SQL Server 2014.
As you may have already noticed, this book covers multiple SQL Server versions including the recently released
SQL Server 2014. I have noted version-specific features whenever necessary; however, most of the content is
applicable to any SQL Server version, starting with SQL Server 2005.
It is also worth noting that most of the figures and examples in this book were created in the Enterprise Edition
of SQL Server 2012 with parallelism disabled on the server level in order to simplify the resulting execution plans. In
some cases, you may get slightly different results when you run scripts in your environment using different versions of
SQL Server.

Downloading the Code
You can download the code used in this book from the Source Code section of the Apress web site (www.apress.com)
or from the Publications section of my blog (). The source code consists of SQL Server
Management Studio solutions, which include a set of projects (one per chapter). Moreover, it includes several
.NET C# projects, which provide the client application code used in the examples in Chapters 12, 13, 14, and 16.

Contacting the Author
You can visit my blog at: or email me at:

Part 1

Tables and Indexes

Chapter 1

Data Storage Internals
A SQL Server database is a collection of objects that allow you to store and manipulate data. In theory, SQL Server
supports 32,767 databases per instance, although a typical installation has only a few databases.
Obviously, the number of databases SQL Server can handle depends on the load and hardware. It is not unusual
to see servers hosting dozens or even hundreds of small databases.
In this chapter, we will discuss the internal structure of the databases, and will cover how SQL Server stores
the data.

Database Files and Filegroups
Every database consists of one or more transaction log files and one or more data files. A transaction log stores
information about database transactions and all of the data modifications made in each session. Every time the data is
modified, SQL Server stores enough information in the transaction log to undo (rollback) or redo (replay) this action.

■■Note We will talk about the transaction log in greater detail in Part 6 of this book, “Inside the Transaction Log.”

Every database has one primary data file, which by convention has an .mdf extension. In addition, every database
can also have secondary database files. Those files, by convention, have .ndf extensions.
All database files are grouped into filegroups. A filegroup is a logical unit that simplifies database
administration. It permits the logical separation of database objects and physical database files. When you create
database objects (tables, for example), you specify the filegroup in which they should be placed without worrying about
the underlying data files’ configuration.
Listing 1-1 shows a script that creates a database named OrderEntryDb. This database consists of three
filegroups. The primary filegroup has one data file stored on the M: drive. The second filegroup, Entities, has
one data file stored on the N: drive. The last filegroup, Orders, has two data files stored on the O: and P: drives. Finally,
there is a transaction log file stored on the L: drive.
Listing 1-1. Creating a database

create database [OrderEntryDb] on
primary
(name = N'OrderEntryDb', filename = N'm:\OEDb.mdf'),
filegroup [Entities]
(name = N'OrderEntry_Entities_F1', filename = N'n:\OEEntities_F1.ndf'),
filegroup [Orders]
(name = N'OrderEntry_Orders_F1', filename = N'o:\OEOrders_F1.ndf'),
(name = N'OrderEntry_Orders_F2', filename = N'p:\OEOrders_F2.ndf')
log on
(name = N'OrderEntryDb_log', filename = N'l:\OrderEntryDb_log.ldf')

You can see the physical layout of the database and data files in Figure 1-1. There are five disks with four data files
and one transaction log file. The dashed rectangles represent the filegroups.

Figure 1-1. Physical layout of the database and data files
The ability to put multiple data files inside a filegroup lets us spread the load across different storage drives,
which could help to improve the I/O performance of the system. Transaction log throughput, on the other hand, does
not benefit from multiple files. SQL Server works with transaction logs sequentially, and only one log file is
accessed at any given time.

■■Note We will talk about the transaction log's internal structure and the best practices associated with it in Chapter 29,
“Transaction Log Internals.”

Let's create a few tables, as shown in Listing 1-2. The Customers and Articles tables are placed into the Entities
filegroup. The Orders table resides in the Orders filegroup.
Listing 1-2. Creating tables
create table dbo.Customers
(
/* Table Columns */
) on [Entities];

create table dbo.Articles
(
/* Table Columns */
) on [Entities];

create table dbo.Orders
(
/* Table Columns */
) on [Orders];

Figure 1-2 shows the physical layout of the tables in the database and disks.

Figure 1-2. Physical layout of the tables
The separation between logical objects in the filegroups and the physical database files allows us to fine-tune
the database file layout to get the most out of the storage subsystem without worrying that it breaks the system. For
example, independent software vendors (ISVs) who deploy their products to different customers can adjust the
number of database files during the deployment stage based on the underlying I/O configuration and the expected
amount of data. These changes will be transparent to developers, who place database objects into
filegroups rather than into database files.
It is generally recommended to avoid using the PRIMARY filegroup for anything but system objects. Creating a
separate filegroup or set of filegroups for user objects simplifies database administration and disaster recovery,
especially in the case of large databases. We will discuss this in greater detail in Chapter 30, “Designing a Backup
Strategy.”
You can specify initial file sizes and auto-growth parameters at the time that you create the database or add new
files to an existing database. SQL Server uses a proportional fill algorithm when choosing to which data file it should
write data. It writes an amount of data proportional to the free space available in the files—the more free space a file
has, the more writes it handles.
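
The effect of proportional fill can be illustrated with a small simulation. This is only a rough model and not SQL Server's actual allocation code: the function name proportional_fill and the idea of splitting a batch of writes strictly by the free-space ratio are assumptions made for illustration, and the file sizes are hypothetical.

```python
def proportional_fill(free_mb, write_mb):
    """Split write_mb across data files in proportion to their free space.

    free_mb holds the free space of each file in the filegroup, in MB.
    Simplified model for illustration only.
    """
    total_free = sum(free_mb)
    if total_free == 0:
        raise ValueError("no free space left in the filegroup")
    return [write_mb * free / total_free for free in free_mb]


# Two files with equal free space share the writes evenly.
even = proportional_fill([1000, 1000], 100)

# A file with three times more free space receives three times more writes.
skewed = proportional_fill([3000, 1000], 100)

print(even)    # [50.0, 50.0]
print(skewed)  # [75.0, 25.0]
```

In this model, files that start with the same amount of free space receive the same share of the writes, while uneven free space skews the load toward the emptier file.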

■■Tip It is recommended that all files in a single filegroup have the same initial size and auto-growth parameters, with
the growth size defined in megabytes rather than as a percentage. This helps the proportional fill algorithm balance write
activity evenly across data files.

Every time SQL Server grows the files, it fills the newly allocated space with zeros. This process blocks all sessions
that are writing to the corresponding file or, in the case of transaction log growth, generating transaction log records.
SQL Server always zeros out the transaction log, and this behavior cannot be changed. However, you can control
whether data files are zeroed out by enabling or disabling Instant File Initialization. Enabling Instant File Initialization
helps speed up data file growth and reduces the time required to create or restore the database.

■■Note There is a small security risk associated with Instant File Initialization. When this option is enabled, an
unallocated part of the data file can contain information from previously deleted OS files. Database administrators are
able to examine such data.

You can enable Instant File Initialization by granting the SE_MANAGE_VOLUME_NAME permission, also known as
Perform volume maintenance tasks, to the SQL Server startup account. This can be done in the Local Security
Policy management application (secpol.msc), as shown in Figure 1-3. You need to open the properties for the
“Perform volume maintenance tasks” permission and add the SQL Server startup account to the list of users there.

Figure 1-3. Enabling Instant File Initialization in secpol.msc

■■Tip SQL Server checks to see if Instant File Initialization is enabled on startup. You need to restart the SQL Server service
after you give the corresponding permission to the SQL Server startup account.

In order to check if Instant File Initialization is enabled, you can use the code shown in Listing 1-3. This code sets
two trace flags that force SQL Server to put additional information into the error log, creates a small database, and
reads the content of the error log file.
Listing 1-3. Checking to see if Instant File Initialization is enabled
dbcc traceon(3004,3605,-1)
go

create database Dummy
go

exec sp_readerrorlog
go

drop database Dummy

go

dbcc traceoff(3004,3605,-1)
go

If Instant File Initialization is not enabled, the SQL Server error log indicates that SQL Server is zeroing out the
.mdf data file in addition to zeroing out the log .ldf file, as shown in Figure 1-4. When Instant File Initialization is
enabled, it would only show zeroing out of the log .ldf file.

Figure 1-4. Checking if Instant File Initialization is enabled - SQL Server error log
Another important database option that controls database file sizes is Auto Shrink. When this option is enabled,
SQL Server shrinks the database files every 30 minutes, reducing their size and releasing the space to the operating
system. This operation is very resource intensive and rarely useful, as the database files grow again when new data
comes into the system. It also greatly increases index fragmentation in the database. Auto Shrink should never be
enabled. Moreover, Microsoft will remove this option in future versions of SQL Server.

■■Note We will talk about index fragmentation in greater detail in Chapter 5, “Index Fragmentation.”

Data Pages and Data Rows
The space in the database is divided into logical 8KB pages. These pages are continuously numbered starting with
zero, and they can be referenced by specifying a file ID and page number. The page numbering is always continuous
such that when SQL Server grows the database file, new pages are numbered starting from the last highest page
number in the file plus one. Similarly, when SQL Server shrinks the file, it removes the highest number pages from
the file.
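
Because pages are 8KB in size and numbered from zero, the position of a page inside its data file follows from simple arithmetic. The sketch below shows that calculation; the function names and the 10MB file size are hypothetical and used only for illustration.

```python
PAGE_SIZE = 8 * 1024  # 8KB pages, as described above


def page_offset(page_id):
    """Byte offset of a page within its data file (pages are numbered from 0)."""
    return page_id * PAGE_SIZE


def pages_in_file(file_size_bytes):
    """Number of whole 8KB pages in a file of the given size."""
    return file_size_bytes // PAGE_SIZE


# A hypothetical 10MB file holds pages 0 through 1279, so the first page
# added when the file grows would be numbered 1280.
first_new_page = pages_in_file(10 * 1024 * 1024)

print(page_offset(0))       # 0
print(page_offset(214643))  # 1758355456
print(first_new_page)       # 1280
```
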
Figure 1-5 shows the structure of a data page.

Figure 1-5. The data page structure
A 96-byte page header contains various pieces of information about a page, such as the object to which the page
belongs, the number of rows and amount of free space available on the page, links to the previous and next pages if
the page is in an index page chain, and so on.
Following the page header is the area where actual data is stored. This is followed by free space. Finally, there is
a slot array, which is a block of 2-byte entries indicating the offset at which the corresponding data rows begin on
the page.
The slot array indicates the logical order of the data rows on the page. If data on a page needs to be sorted in
the order of the index key, SQL Server does not physically sort the data rows on the page, but rather it populates the
slot array based on the index sort order. The slot 0 (rightmost in Figure 1-5) stores the offset for the data row with the
lowest key value on the page; slot 1, the second lowest key value; and so forth.
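
The relationship between the physical position of the rows and the slot array can be sketched as follows. This is a simplified model with made-up offsets and key values, not a dump of a real page: the rows stay where they were written, and only the slot array reflects the index key order.

```python
# Hypothetical page content: rows sit on the page in the order they were written.
# Each entry is (offset_on_page, index_key); data starts after the 96-byte header.
rows_on_page = [(96, 30), (135, 10), (174, 20)]

# The slot array lists the row offsets in index key order; the rows do not move.
slot_array = [offset for offset, key in sorted(rows_on_page, key=lambda r: r[1])]

print(slot_array)  # [135, 174, 96]
```

Slot 0 points to the row with the lowest key (10), even though that row is physically the second one on the page.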

■■Note We will discuss indexes in greater detail in Chapter 2, “Tables and Indexes: Internal Structure and Access Methods.”

SQL Server offers a rich set of the system data types that can be logically separated into two different groups:
fixed length and variable length. Fixed-length data types, such as int, datetime, char, and others always use the same
amount of storage space regardless of their value, even when it is NULL. For example, the int column always uses 4
bytes and an nchar(10) column always uses 20 bytes to store information.
In contrast, variable-length data types, such as varchar, varbinary, and a few others, use as much storage space
as required to store data plus two extra bytes. For example, an nvarchar(4000) column would use only 12 bytes to
store a five-character string and, in most cases, 2 bytes to store a NULL value. We will discuss the case where
variable-length columns do not use storage space for NULL values later in this chapter.
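
The storage rules above can be expressed as a short calculation. This is a sketch under simplifying assumptions: the helper names are hypothetical, and every character is assumed to take a fixed number of bytes (2 for nchar and nvarchar).

```python
def fixed_length_bytes(declared_chars, bytes_per_char):
    """Fixed-length columns always use their full declared size, even for NULL."""
    return declared_chars * bytes_per_char


def variable_length_bytes(value, bytes_per_char):
    """Variable-length columns use the size of the data plus two extra bytes."""
    data_bytes = 0 if value is None else len(value) * bytes_per_char
    return data_bytes + 2


print(fixed_length_bytes(10, 2))          # nchar(10): 20
print(variable_length_bytes("hello", 2))  # nvarchar(4000), five characters: 12
print(variable_length_bytes(None, 2))     # NULL, in most cases: 2
```
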
Let's look at the structure of a data row, as shown in Figure 1-6.

Figure 1-6. Data row structure
The first 2 bytes of the row, called Status Bits A and Status Bits B, are bitmaps that contain information about the
row, such as the row type, whether the row has been logically deleted (ghosted), and whether the row has NULL values,
variable-length columns, and a versioning tag.
The next two bytes in the row are used to store the length of the fixed-length portion of the data. They are
followed by the fixed-length data itself.
After the fixed-length data portion, there is a null bitmap, which includes two different data elements. The first
2-byte element is the number of columns in the row. It is followed by a null bitmap array. This array uses one bit
for each column of the table, regardless of whether it is nullable or not.
A null bitmap is always present in data rows in heap tables or clustered index leaf rows, even when the table
does not have nullable columns. However, the null bitmap is not present in non-leaf index rows nor leaf-level rows of
nonclustered indexes when there are no nullable columns in the index.

■■Note We will talk about indexes in greater detail in Chapter 2, “Tables and Indexes: Internal Structure and Access Methods.”

Following the null bitmap, there is the variable-length data portion of the row. It starts with a two-byte number
of variable-length columns in the row followed by a column-offset array. SQL Server stores a two-byte offset value for
each variable-length column in the row, even when the value is NULL. The array is followed by the actual variable-length portion
of the data. Finally, there is an optional 14-byte versioning tag at the end of the row. This tag is used during operations
that require row-versioning, such as an online index rebuild, optimistic isolation levels, triggers, and a few others.
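
Adding up the elements described above gives an estimate of the size of a data row. The sketch below is a simplified calculation, not a SQL Server API: the function name row_size and its parameters are hypothetical, and it assumes that every variable-length column keeps an entry in the column-offset array.

```python
def row_size(fixed_bytes, column_count, var_lengths, versioning_tag=False):
    """Estimate the size of a data row from the layout described above."""
    size = 2                          # Status Bits A and Status Bits B
    size += 2                         # length of the fixed-length portion
    size += fixed_bytes               # fixed-length data
    size += 2                         # number of columns
    size += (column_count + 7) // 8   # null bitmap array, one bit per column
    if var_lengths:
        size += 2                     # number of variable-length columns
        size += 2 * len(var_lengths)  # column-offset array
        size += sum(var_lengths)      # variable-length data
    if versioning_tag:
        size += 14                    # optional versioning tag
    return size


# One int column plus three variable-length columns holding 10, 0 (NULL), and 10 bytes.
print(row_size(4, 4, [10, 0, 10]))                       # 39
print(row_size(4, 4, [10, 0, 10], versioning_tag=True))  # 53
```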

■■Note We will discuss Index Maintenance in Chapter 5; Triggers in Chapter 8; and Optimistic Isolation Levels in
Chapter 21.

Let's create a table, populate it with some data, and look at the actual row data. The code is shown in Listing 1-4.
The Replicate function repeats the character provided as the first parameter 10 times.
Listing 1-4. The data row format: Table creation
create table dbo.DataRows
(
ID int not null,
Col1 varchar(255) null,
Col2 varchar(255) null,
Col3 varchar(255) null
);

insert into dbo.DataRows(ID, Col1, Col3) values (1,replicate('a',10),replicate('c',10));

insert into dbo.DataRows(ID, Col2) values (2,replicate('b',10));

dbcc ind
(
'SQLServerInternals' /*Database Name*/
,'dbo.DataRows' /*Table Name*/
,-1 /*Display information for all pages of all indexes*/
);

An undocumented, but well-known DBCC IND command returns the information about table page allocations.
You can see the output of this command in Figure 1-7.

Figure 1-7. DBCC IND output
There are two pages that belong to the table. The first one, with PageType=10, is a special type of page called an
IAM allocation map. This page tracks the pages that belong to a particular object. Do not focus on that now, however,
as we will cover allocation map pages later in the chapter.

■■Note SQL Server 2012 introduces another undocumented dynamic management function (DMF),
sys.dm_db_database_page_allocations, which can be used as a replacement for the DBCC IND command. The output of this DMF provides
more information when compared to DBCC IND, and it can be joined with other system DMVs and/or catalog views.

The page with PageType=1 is the actual data page that contains the data rows. The PageFID and PagePID columns
show the actual file and page numbers for the page. You can use another undocumented command, DBCC PAGE, to
examine its contents, as shown in Listing 1-5.
Listing 1-5. The data row format: DBCC PAGE call
-- Redirecting DBCC PAGE output to console
dbcc traceon(3604)
dbcc page
(
'SqlServerInternals' /*Database Name*/
,1 /*File ID*/
,214643 /*Page ID*/
,3 /*Output mode: 3 - display page header and row details */
);

Listing 1-6 shows the output of the DBCC PAGE that corresponds to the first data row. SQL Server stores the data in
byte-swapped order. For example, a two-byte value of 0001 would be stored as 0100.
Listing 1-6. DBCC PAGE output for the first row
Slot 0 Offset 0x60 Length 39

Record Type = PRIMARY_RECORD

Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
Record Size = 39
Memory Dump @0x000000000EABA060

0000000000000000: 30000800 01000000 04000403 001d001d 00270061 0................'.a
0000000000000014: 61616161 61616161 61636363 63636363 636363   aaaaaaaaacccccccccc

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4
ID = 1

Slot 0 Column 2 Offset 0x13 Length 10 Length (physical) 10
Col1 = aaaaaaaaaa

Slot 0 Column 3 Offset 0x0 Length 0 Length (physical) 0
Col2 = [NULL]

Slot 0 Column 4 Offset 0x1d Length 10 Length (physical) 10
Col3 = cccccccccc

Let's look at the data row in more detail, as shown in Figure 1-8.

Figure 1-8. First data row
As you see, the row starts with the two status bytes followed by a two-byte value of 0800. This is the byte-swapped
value of 0008, which is the offset for the number of columns attribute in the row. This offset tells SQL Server where the
fixed-length data part of the row ends.

The next four bytes are used to store fixed-length data, which is the ID column in our case. After that, there is
the two-byte value that shows that the data row has four columns followed by a one-byte NULL bitmap. With just
four columns, one byte in the bitmap is enough. It stores the value of 04, which is 00000100 in the binary format. It
indicates that the third column in the row contains a NULL value.
The next two bytes store the number of variable-length columns in the row, which is 3 (0300 in byte-swapped
order). It is followed by an offset array, in which each two-byte entry stores the offset where the variable-length column data ends.
As you see, even though Col2 is NULL, it still uses a slot in the offset array. Finally, there is the actual data from the
variable-length columns.
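
The same walk through the row can be reproduced in a few lines of code. The sketch below decodes the bytes from the memory dump in Listing 1-6. It is an illustration rather than a general-purpose parser: the function name parse_row is hypothetical, and the code assumes a row whose record attributes are NULL_BITMAP and VARIABLE_COLUMNS with all data stored in-row. It does not interpret the status bytes.

```python
import struct

# Bytes of the first row, copied from the memory dump in Listing 1-6.
ROW_1 = bytes.fromhex(
    "30000800 01000000 04000403 001d001d 00270061"
    "61616161 61616161 61636363 63636363 636363"
)


def parse_row(row):
    """Decode a row that has a NULL bitmap and variable-length columns."""
    # Bytes 2-3: offset at which the fixed-length data ends (byte-swapped).
    fixed_end = struct.unpack_from("<H", row, 2)[0]
    fixed_data = row[4:fixed_end]

    # Two-byte column count, followed by one NULL bit per column.
    column_count = struct.unpack_from("<H", row, fixed_end)[0]
    bitmap_start = fixed_end + 2
    bitmap_len = (column_count + 7) // 8
    bitmap = row[bitmap_start:bitmap_start + bitmap_len]
    is_null = [bool((bitmap[i // 8] >> (i % 8)) & 1) for i in range(column_count)]

    # Two-byte count of variable-length columns, then one end offset per column.
    pos = bitmap_start + bitmap_len
    var_count = struct.unpack_from("<H", row, pos)[0]
    pos += 2
    end_offsets = list(struct.unpack_from("<%dH" % var_count, row, pos))
    pos += 2 * var_count

    # Each variable-length value runs from the previous end offset to its own.
    values = []
    for end in end_offsets:
        values.append(row[pos:end])
        pos = end

    return {
        "fixed_data": fixed_data,
        "column_count": column_count,
        "is_null": is_null,
        "end_offsets": end_offsets,
        "values": values,
    }


parsed = parse_row(ROW_1)
row_id = struct.unpack("<i", parsed["fixed_data"])[0]

print(row_id)                 # 1
print(parsed["is_null"])      # [False, False, True, False]
print(parsed["end_offsets"])  # [29, 29, 39]
print(parsed["values"])       # [b'aaaaaaaaaa', b'', b'cccccccccc']
```

The two identical end offsets (29 and 29) show how a NULL value in Col2 still occupies an entry in the offset array while storing no data.
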
Now let's look at the second data row. Listing 1-7 shows DBCC PAGE output, and Figure 1-9 shows the row data.
Listing 1-7. DBCC PAGE output for the second row
Slot 1 Offset 0x87 Length 27

Record Type = PRIMARY_RECORD
Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
Record Size = 27
Memory Dump @0x000000000EABA087

0000000000000000: 30000800 02000000 04000a02 0011001b 00626262 0................bbb
0000000000000014: 62626262 626262                              bbbbbbb

Slot 1 Column 1 Offset 0x4 Length 4 Length (physical) 4
ID = 2

Slot 1 Column 2 Offset 0x0 Length 0 Length (physical) 0
Col1 = [NULL]

Slot 1 Column 3 Offset 0x11 Length 10 Length (physical) 10
Col2 = bbbbbbbbbb

Slot 1 Column 4 Offset 0x0 Length 0 Length (physical) 0
Col3 = [NULL]

Figure 1-9. Second data row data
The NULL bitmap in the second row represents a binary value of 00001010, which shows that Col1 and Col3 are
NULL. Even though the table has three variable-length columns, the number of variable-length columns in the row
indicates that there are just two columns/slots in the offset-array. SQL Server does not maintain the information about
the trailing NULL variable-length columns in the row.
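
Both observations can be checked with a short sketch. The helper names below are hypothetical. The first function reads a one-byte NULL bitmap, and the second models how many offset-array entries remain once the trailing NULL variable-length columns are dropped.

```python
COLUMNS = ["ID", "Col1", "Col2", "Col3"]


def null_columns(bitmap_byte, columns):
    """Names of the columns whose bit is set in a one-byte NULL bitmap."""
    # Bit 0 corresponds to the first column, bit 1 to the second, and so on.
    return [name for i, name in enumerate(columns) if (bitmap_byte >> i) & 1]


def offset_entries(var_values):
    """Offset-array entries kept once trailing NULL values are dropped."""
    count = len(var_values)
    while count > 0 and var_values[count - 1] is None:
        count -= 1
    return count


print(null_columns(0x04, COLUMNS))                 # ['Col2']
print(null_columns(0x0A, COLUMNS))                 # ['Col1', 'Col3']
print(offset_entries(["a" * 10, None, "c" * 10]))  # 3
print(offset_entries([None, "b" * 10, None]))      # 2
```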

■■Tip You can reduce the size of the data row by defining variable-length columns that usually store NULL values
as the last columns in the CREATE TABLE statement. This is the only case when the
order of columns in the CREATE TABLE statement matters.

The fixed-length data and internal attributes must fit into the 8,060 bytes available on a single data page. SQL
Server does not let you create the table when this is not the case. For example, the code in Listing 1-8 produces
an error.
Listing 1-8. Creating a table with a data row size that exceeds 8060 bytes
create table dbo.BadTable
(
Col1 char(4000),
Col2 char(4060)
);

Msg 1701, Level 16, State 1, Line 1
Creating or altering table 'BadTable' failed because the minimum row size would be 8067, including 7
bytes of internal overhead. This exceeds the maximum allowable table row size of 8060 bytes.
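
The number in the error message can be reproduced with simple arithmetic. The sketch below assumes a table that has only fixed-length columns, so the internal overhead consists of the two status bytes, the two-byte length of the fixed-length portion, the two-byte column count, and the null bitmap. The function name min_row_size is hypothetical.

```python
MAX_ROW_SIZE = 8060  # bytes available for a row on a single data page


def min_row_size(fixed_column_sizes):
    """Minimum row size for a table that has only fixed-length columns."""
    null_bitmap = (len(fixed_column_sizes) + 7) // 8
    # Status bytes, fixed-length size, column count, and the null bitmap.
    overhead = 2 + 2 + 2 + null_bitmap
    return sum(fixed_column_sizes) + overhead


size = min_row_size([4000, 4060])
print(size, size > MAX_ROW_SIZE)  # 8067 True
```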

Large Objects Storage
Even though the fixed-length data and the internal attributes of a row must fit into a single page, SQL Server can store
the variable-length data on different data pages. There are two different ways to store the data, depending on the
data type and length.

Row-Overflow Storage
SQL Server stores variable-length column data that does not exceed 8,000 bytes on special pages called
row-overflow pages. Let's create a table and populate it with data, as shown in Listing 1-9.
Listing 1-9. ROW_OVERFLOW data: Creating a table
create table dbo.RowOverflow
(
ID int not null,
Col1 varchar(8000) null,
Col2 varchar(8000) null
);

insert into dbo.RowOverflow(ID, Col1, Col2) values (1,replicate('a',8000),replicate('b',8000));

As you see, SQL Server creates the table and inserts the data row without any errors, even though the data row
size exceeds 8,060 bytes. Let's look at the table page allocation using the DBCC IND command. The results are shown in
Figure 1-10.

Figure 1-10. ROW_OVERFLOW data: DBCC IND results
Now you can see two different sets of IAM and data pages. The data page with PageType=3 represents the data
page that stores ROW_OVERFLOW data.
Let's look at data page 214647, which is the in-row data page that stores main row data. The partial output of the
DBCC PAGE command for the page (1:214647) is shown in Listing 1-10.
Listing 1-10. ROW_OVERFLOW data: DBCC PAGE results for IN_ROW data
Slot 0 Offset 0x60 Length 8041

Record Type = PRIMARY_RECORD
Record Attributes = NULL_BITMAP VARIABLE_COLUMNS
Record Size = 8041
Memory Dump @0x000000000FB7A060

0000000000000000: 30000800 01000000 03000002 00511f69 9f616161 0............Q.iŸaaa
0000000000000014: 61616161 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaaaaaa
0000000000000028: 61616161 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaaaaaa
000000000000003C: 61616161 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaaaaaa
0000000000000050: 61616161 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaaaaaa
<Skipped>
0000000000001F2C: 61616161 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaaaaaa
0000000000001F40: 61616161 61616161 61616161 61616161 61020000 aaaaaaaaaaaaaaaaa...
0000000000001F54: 00010000 00290000 00401f00 00754603 00010000 .....)...@...uF.....
0000000000001F68: 00

As you see, SQL Server stores Col1 data in-row. Col2 data, however, has been replaced with a 24-byte value. The
first 16 bytes store off-row storage metadata, such as the type, the length of the data, and a few other attributes.
The last 8 bytes are the actual pointer to the row on the row-overflow page: the file, page, and slot number.
Figure 1-11 shows this in detail. Remember that all information is stored in byte-swapped order.

Figure 1-11. ROW_OVERFLOW data: Row-overflow page pointer structure
As you see, the slot number is 0, file number is 1, and page number is the hexadecimal value 0x00034675, which
is decimal 214645. The page number matches the DBCC IND results shown in Figure 1-10.
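The decoding described above can be reproduced in T-SQL. The snippet below is a minimal sketch: the variable name @Pointer is made up for illustration, and its value is the last 8 bytes of the 24-byte pointer copied from Listing 1-10. It reverses the byte order of each field and converts the result to an integer.

```sql
-- Last 8 bytes of the row-overflow pointer from Listing 1-10:
-- 4 bytes of page number, 2 bytes of file number, 2 bytes of slot number,
-- each stored in byte-swapped order. @Pointer is an illustrative name.
declare @Pointer binary(8) = 0x7546030001000000;

select
    -- Reassemble the bytes from last to first before converting.
    convert(int,
        substring(@Pointer, 4, 1) + substring(@Pointer, 3, 1) +
        substring(@Pointer, 2, 1) + substring(@Pointer, 1, 1)) as PageNumber,
    convert(int,
        substring(@Pointer, 6, 1) + substring(@Pointer, 5, 1)) as FileNumber,
    convert(int,
        substring(@Pointer, 8, 1) + substring(@Pointer, 7, 1)) as SlotNumber;
```

The query returns page number 214645, file number 1, and slot number 0, which agrees with the values read from the pointer by hand.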
The partial output of the DBCC PAGE command for the page (1:214645) is shown in Listing 1-11.

Listing 1-11. ROW_OVERFLOW data: DBCC PAGE results for ROW_OVERFLOW data
Blob row at: Page (1:214645) Slot 0 Length: 8014 Type: 3 (DATA)
Blob Id:2686976

0000000008E0A06E: 62626262 62626262 62626262 62626262 bbbbbbbbbbbbbbbb
0000000008E0A07E: 62626262 62626262 62626262 62626262 bbbbbbbbbbbbbbbb
0000000008E0A08E: 62626262 62626262 62626262 62626262 bbbbbbbbbbbbbbbb

As you see, Col2 data is stored in the first slot on the page.

LOB Storage
For text, ntext, and image columns, SQL Server stores the data off-row by default. It uses another kind of page,
called a LOB data page.

■■Note You can control this behavior to a degree by using the “text in row” table option. For example,
exec sp_tableoption 'dbo.MyTable', 'text in row', 200 forces SQL Server to store LOB data less than or equal to 200 bytes
in-row. LOB data greater than 200 bytes would be stored in LOB pages.
The logical LOB data structure is shown in Figure 1-12.

Figure 1-12. LOB data: Logical structure
Like ROW_OVERFLOW data, there is a pointer to another piece of information called the LOB root structure,
which contains a set of pointers to other data pages/rows. When LOB data is less than 32 KB and can fit into five
data pages, the LOB root structure contains the pointers to the actual chunks of LOB data. Otherwise, the LOB tree
starts to include additional, intermediate levels of pointers, similar to the index B-Tree, which we will discuss in
Chapter 2, “Tables and Indexes: Internal Structure and Access Methods.”

Let's create the table and insert one row of data, as shown in Listing 1-12. We need to cast the first argument of
the replicate function to varchar(max). Otherwise, the result of the replicate function would be limited to 8,000
bytes.
Listing 1-12. LOB data: Table creation
create table dbo.TextData
(
ID int not null,
Col1 text null
);

insert into dbo.TextData(ID, Col1) values (1, replicate(convert(varchar(max),'a'),16000));
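The effect of the cast on the first argument of replicate can be checked directly. The following is a small sketch; the column aliases WithoutCast and WithCast are illustrative names only.

```sql
-- Without the cast, the argument is a regular varchar, and the result
-- is truncated at 8,000 bytes.
select datalength(replicate('a', 16000)) as WithoutCast;

-- With the cast, the result is varchar(max) and keeps all 16,000 bytes.
select datalength(replicate(convert(varchar(max), 'a'), 16000)) as WithCast;
```

The first query returns 8000 and the second returns 16000.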

The page allocation for the table is shown in Figure 1-13.

Figure 1-13. LOB data: DBCC IND result
As you see, the table has one data page for in-row data and three data pages for LOB data. I am not going to
examine the structure of the data row for in-row allocation; it is similar to the ROW_OVERFLOW allocation. However,
with the LOB allocation, it stores less metadata information in the pointer and uses 16 bytes rather than the 24 bytes
required by the ROW_OVERFLOW pointer.
The result of the DBCC PAGE command for the page that stores the LOB root structure is shown in Listing 1-13.
Listing 1-13. LOB data: DBCC PAGE results for the LOB page with the LOB root structure
Blob row at: Page (1:3046835) Slot 0 Length: 84 Type: 5 (LARGE_ROOT_YUKON)
Blob Id: 131661824 Level: 0 MaxLinks: 5 CurLinks: 2

Child 0 at Page (1:3046834) Slot 0 Size: 8040 Offset: 8040
Child 1 at Page (1:3046832) Slot 0 Size: 7960 Offset: 16000

As you see, there are two pointers to the other pages with LOB data blocks, which are similar to the blob data
shown in Listing 1-11.
The format in which SQL Server stores the data from the (MAX) columns, such as varchar(max), nvarchar(max),
and varbinary(max), depends on the actual data size. SQL Server stores it in-row when possible. When in-row
allocation is impossible and the data size is less than or equal to 8,000 bytes, it is stored as row-overflow data. Data that
exceeds 8,000 bytes is stored as LOB data.
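One way to see which kinds of storage a table actually uses is to query the catalog views, as in the sketch below. It assumes the dbo.RowOverflow and dbo.TextData tables from Listings 1-9 and 1-12 exist in the current database; the join follows the documented relationship between sys.partitions and sys.allocation_units.

```sql
-- Lists the allocation units (IN_ROW_DATA, ROW_OVERFLOW_DATA, LOB_DATA)
-- of the two demo tables, together with their page counts.
select
    object_name(p.object_id) as TableName,
    au.type_desc,
    au.total_pages,
    au.used_pages,
    au.data_pages
from sys.partitions p
    join sys.allocation_units au on
        -- In-row and row-overflow units reference hobt_id;
        -- LOB units reference partition_id.
        (au.type in (1, 3) and au.container_id = p.hobt_id) or
        (au.type = 2 and au.container_id = p.partition_id)
where p.object_id in (object_id(N'dbo.RowOverflow'), object_id(N'dbo.TextData'))
order by TableName, au.type;
```

The type_desc column shows the storage type of each allocation unit, which makes it easy to confirm where the data from a particular column ended up.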

■■Note text, ntext, and image data types are deprecated, and they will be removed in future versions of SQL Server.
Use varchar(max), nvarchar(max), and varbinary(max) columns instead.

It is also worth mentioning that SQL Server always stores rows that fit into a single page using in-row allocations.
When a page does not have enough free space to accommodate a row, SQL Server allocates a new page and places the
row there rather than placing it on the half-full page and moving some of the data to row-overflow pages.

SELECT * and I/O
There are plenty of reasons why selecting all columns from a table with the select * pattern is not a good idea.
It increases network traffic by transmitting columns that the client application does not need. It also makes query
performance tuning more complicated, and it introduces side effects when the table schema changes.
It is recommended that you avoid this pattern and explicitly specify the list of columns needed by the client
application. This is especially important with row-overflow and LOB storage, when one row can have data stored in multiple
data pages. SQL Server needs to read all of those pages, which can significantly decrease the performance of queries.
As an example, let's assume that we have a table, dbo.Employees, with one column storing employee pictures.
Listing 1-14 creates the table and populates it with some data.
Listing 1-14. Select * and I/O: Table creation
create table dbo.Employees
(
EmployeeId int not null,
Name varchar(128) not null,
Picture varbinary(max) null
);

;WITH N1(C) AS (SELECT 0 UNION ALL SELECT 0) -- 2 rows
,N2(C) AS (SELECT 0 FROM N1 AS T1 CROSS JOIN N1 AS T2) -- 4 rows
,N3(C) AS (SELECT 0 FROM N2 AS T1 CROSS JOIN N2 AS T2) -- 16 rows
,N4(C) AS (SELECT 0 FROM N3 AS T1 CROSS JOIN N3 AS T2) -- 256 rows
,N5(C) AS (SELECT 0 FROM N4 AS T1 CROSS JOIN N2 AS T2) -- 1,024 rows
,IDs(ID) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM N5)
insert into dbo.Employees(EmployeeId, Name, Picture)
select
ID, 'Employee ' + convert(varchar(5),ID),
convert(varbinary(max),replicate(convert(varchar(max),'a'),120000))
from IDs;

The table has 1,024 rows, each with 120,000 bytes of binary data. Let's assume that we have code in the client
application that needs the EmployeeId and Name to populate a drop-down box. If a developer is not careful, he or she
can write a select statement using the select * pattern, even though a picture is not needed for this particular use case.
Let's compare the performance of two selects: one that selects all data columns and another that selects only
EmployeeId and Name. The code to do this is shown in Listing 1-15. The execution time and number of reads on my
computer are shown in Table 1-1.
Listing 1-15. Select * and I/O: Performance comparison
set statistics io on
set statistics time on

select * from dbo.Employees;
select EmployeeId, Name from dbo.Employees;

set statistics io off
set statistics time off

Table 1-1. Select *: Number of reads and execution time of the queries

                  select EmployeeId, Name from dbo.Employees   select * from dbo.Employees
Number of reads   7                                            90,895
Execution time    2 ms                                         3,343 ms

As you see, the first select, which reads the LOB data and transmits it to the client, is a few orders of magnitude
slower than the second select. One case where this becomes extremely important is with client applications that
use Object Relational Mapping (ORM) frameworks. Developers tend to reuse the same entity objects in different parts
of an application. As a result, an application may load all attributes/columns even though it does not need all of them
in many cases.
It is better to define different entities with a minimum set of required attributes on an individual use-case
basis. In our example, it would work best to create separate entities/classes, such as EmployeeList
and EmployeeProperties. An EmployeeList entity would have two attributes: EmployeeId and Name.
EmployeeProperties would include a Picture attribute in addition to the two mentioned. This approach can
significantly improve the performance of systems.
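To see where the extra reads in Table 1-1 come from, you can compare the number of in-row and LOB pages that belong to the table. The query below is a sketch that assumes the dbo.Employees table from Listing 1-14 exists in the current database and that the caller has permission to query sys.dm_db_partition_stats.

```sql
-- Page counts per storage type for dbo.Employees.
-- A query that returns only EmployeeId and Name touches the in-row pages;
-- select * also has to read the LOB pages that store the Picture data.
select
    ps.index_id,
    ps.row_count,
    ps.in_row_data_page_count,
    ps.row_overflow_used_page_count,
    ps.lob_used_page_count
from sys.dm_db_partition_stats ps
where ps.object_id = object_id(N'dbo.Employees');
```

The exact numbers depend on the environment, but the LOB page count should be far larger than the in-row page count for this table.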

Extents and Allocation Map Pages
SQL Server logically groups eight pages into 64KB units called extents. There are two types of extents: mixed
extents store data that belongs to different objects, and uniform extents store the data for a single object.
When a new object is created, SQL Server stores the first eight object pages in mixed extents. After that, all
subsequent space allocation for that object is done with uniform extents.
SQL Server uses a special kind of page, called an allocation map, to track extent and page usage in a file. There are
several different types of allocation map pages in SQL Server.
Global Allocation Map (GAM) pages track whether extents have been allocated to any object. The data is represented
as a bitmap where each bit indicates the allocation status of an extent. Zero bits indicate that the corresponding extents
are in use. The bits with a value of one indicate that the corresponding extents are free. Every GAM page covers about
64,000 extents, or almost 4GB of data. This means that every database file has one GAM page for about every 4GB of file size.
Shared Global Allocation Map (SGAM) pages track information about mixed extents. Similar to a GAM page, an SGAM page is a
bitmap with one bit per extent. The bit has a value of one if the corresponding extent is a mixed extent and has at least
one free page available. Otherwise, the bit is set to zero. Like a GAM page, an SGAM page tracks about 64,000 extents, or
almost 4GB of data.
SQL Server can determine the allocation status of the extent by looking at the corresponding bits in GAM and
SGAM pages. Table 1-2 shows the possible combinations of the bits.
Table 1-2. Allocation status of the extents

Status                                               SGAM bit   GAM bit
Free, not in use                                     0          1
Mixed extent with at least one free page available   1          0
Uniform extent or full mixed extent                  0          0
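The arithmetic behind the "almost 4GB" figure and the bit combinations from Table 1-2 can be sketched in T-SQL. This is an illustration only: it uses the rounded figure of 64,000 extents quoted in the text, and the names GamBit, SgamBit, and ExtentStatus are made up for the example. It does not read the actual allocation map pages.

```sql
-- Approximate space covered by one GAM or SGAM page:
-- 64,000 extents (rounded figure from the text) * 64KB per extent.
select 64000 * 64 / 1024.0 / 1024.0 as ApproxCoverageGB;

-- The three bit combinations from Table 1-2 expressed as a CASE expression.
select
    b.GamBit,
    b.SgamBit,
    case
        when b.GamBit = 1 and b.SgamBit = 0
            then 'Free, not in use'
        when b.GamBit = 0 and b.SgamBit = 1
            then 'Mixed extent with at least one free page available'
        when b.GamBit = 0 and b.SgamBit = 0
            then 'Uniform extent or full mixed extent'
    end as ExtentStatus
from (values (1, 0), (0, 1), (0, 0)) as b(GamBit, SgamBit);
```

The first query returns about 3.9, which is why one allocation map page is said to cover almost 4GB of a data file.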
