Index Fragmentation
Paul Randal
Dev Lead, Microsoft SQL Server
Introduction
Structural details of an index
How these structures are used
Index fragmentation
SQL Server 2000 DBCC SHOWCONTIG
SQL Server 2000 DBCC INDEXDEFRAG
"Yukon" dm_db_index_physical_stats
"Yukon" ALTER INDEX
Records (1)
How is data stored in SQL Server?
Example record types
Data
Index
Text
Ghost data/index
Records (2)
‘Inside SQL Server 2000’ has the details
Things to note:
Non-leaf index records contain a child page ID in every record
All non-clustered index leaf records contain a base table RID
“Yukon” allows INCLUDE’d columns in an index record
Pages (1)
How are records stored?
Example page types
Data page
Index page
Allocation maps (GAM, SGAM, IAM) pages
PFS page
Pages (2)
Layout is the same for all page types
8 KB page
96-byte header
slot array
records
free space
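As a rough sketch of how this layout bounds the number of records on a page: the snippet assumes an 8192-byte page, the 96-byte header from the slide, and a 2-byte slot-array entry per record. The slot size and the name `max_records_per_page` are assumptions for illustration, and per-record overhead is ignored, so the result is only an upper bound.

```python
PAGE_SIZE = 8192      # 8 KB page
HEADER_SIZE = 96      # fixed page header (from the slide)
SLOT_SIZE = 2         # assumed size of one slot-array entry, in bytes

def max_records_per_page(record_size: int) -> int:
    """Upper bound on records of the given size that fit on one page."""
    usable = PAGE_SIZE - HEADER_SIZE
    # Each record costs its own bytes plus one slot-array entry.
    return usable // (record_size + SLOT_SIZE)

# A record with three char(100) columns carries 300 bytes of column data.
print(max_records_per_page(300))
```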
Extents
How are pages grouped?
Extents are:
A group of 8 contiguous pages
Start on an 8 page boundary
Tracked in IAM, GAM, SGAM pages
0 1 2 3 4 5 6 7
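Because extents are 8 contiguous pages starting on an 8-page boundary, mapping a page to its extent is simple integer arithmetic. A minimal sketch (the function names are hypothetical):

```python
PAGES_PER_EXTENT = 8

def extent_of(page_id: int) -> int:
    """Index of the extent that contains the page."""
    return page_id // PAGES_PER_EXTENT

def extent_pages(page_id: int) -> range:
    """All page IDs in the same extent as page_id."""
    # Extents start on an 8-page boundary.
    first = extent_of(page_id) * PAGES_PER_EXTENT
    return range(first, first + PAGES_PER_EXTENT)

print(extent_of(13), list(extent_pages(13)))
```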
Heap (1)
Simplest storage arrangement
Composed entirely of data pages
Unordered, pages unlinked
Records are located using physical RID
create table foo (
    first char(100), last char(100), city char(100))
Clustered index (1)
Alternative table structure to heap
Data stored in defined order
Fast lookup through B-tree
Records located through logical RID
create clustered index foo_c on foo (city)
Clustered index (2)
[Figure: B-tree diagram with pages labeled R, P, and L; captions: "index tree pages", "data pages"]
Non-clustered index (1)
A way to provide a different ordering
Define on heaps or clustered indexes
Leaf records contain RID of matching record in base table
create index foo_nc on foo (last)
-- "Yukon" alternative with an INCLUDE'd column:
create index foo_nc on foo (last) include (first)
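To make the RID idea concrete, here is a toy model of a heap plus a non-clustered index on `last`. It is a sketch under assumptions, not SQL Server's implementation: the `RID` tuple of (file, page, slot), the sample rows, and the name `lookup_by_last` are all made up for illustration.

```python
from bisect import bisect_left
from typing import NamedTuple, Optional

class RID(NamedTuple):
    file_id: int
    page_id: int
    slot: int

# Toy heap: unordered records addressed by physical RID (sample data).
heap = {
    RID(1, 100, 0): ("Ann", "Smith", "Seattle"),
    RID(1, 100, 1): ("Bob", "Jones", "Redmond"),
    RID(1, 101, 0): ("Cat", "Adams", "Tacoma"),
}

# Toy non-clustered index on `last`: leaf records are (key, RID), sorted by key.
index_on_last = sorted((row[1], rid) for rid, row in heap.items())

def lookup_by_last(last: str) -> Optional[tuple]:
    """Find the key in the index, then follow the RID into the heap."""
    keys = [key for key, _ in index_on_last]
    i = bisect_left(keys, last)
    if i < len(keys) and keys[i] == last:
        return heap[index_on_last[i][1]]
    return None

print(lookup_by_last("Jones"))
```

The index gives a different ordering (by `last`) without moving the base records; each leaf entry only carries enough to find the matching record.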
Non-clustered index (2)
[Figure: B-tree diagram with pages labeled R, P, and L; captions: "index tree pages", "index leaf pages"]
Why use an index?
Allows a variety of access modes:
Singleton lookup
Range scan
Allocation-order scan
Allows skipping of sort step in query
Reduces amount of data to apply predicates to
Singleton lookup
Matching record
Range scan
Allocation-order scan
1 2 3 4 5 6
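The difference between a range scan and an allocation-order scan can be sketched with a toy leaf level. The page IDs, keys, and function names are assumptions for illustration; a real range scan would also seek to its starting page through the tree rather than begin at the first leaf page.

```python
# Toy leaf level: page_id -> (sorted keys on the page, next page in logical order).
# Page 4 sits logically between pages 1 and 2, although it is physically last.
leaf_pages = {
    1: ([10, 20], 4),
    2: ([50, 60], 3),
    3: ([70, 80], None),
    4: ([30, 40], 2),
}
FIRST_PAGE = 1

def range_scan(low, high):
    """Follow next-page links in logical order, returning keys in [low, high]."""
    out, page_id = [], FIRST_PAGE
    while page_id is not None:
        keys, next_page = leaf_pages[page_id]
        out.extend(k for k in keys if low <= k <= high)
        page_id = next_page
    return out

def allocation_order_scan():
    """Read pages in physical (page ID) order, ignoring the links."""
    return [k for page_id in sorted(leaf_pages) for k in leaf_pages[page_id][0]]

print(range_scan(20, 50))        # keys come back in index order
print(allocation_order_scan())   # keys come back in page order
```

The allocation-order scan touches every page exactly once but returns keys in whatever order the pages happen to be laid out.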
Side note: merry-go-rounds
Why don’t I get the index order when I do ‘select *’?
[Figure: scans 1, 2, and 3 circling the same DATA, each starting at a different point]
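A minimal sketch of why a merry-go-round scan does not return rows in index order: each scan is assumed to join at whatever page the shared scan has reached and then wrap around to pick up the pages it missed. The page names and start positions are made up.

```python
def merry_go_round_scan(pages, start):
    """Return every page once, starting at `start` and wrapping around."""
    return pages[start:] + pages[:start]

pages = ["p0", "p1", "p2", "p3", "p4", "p5"]

scan_1 = merry_go_round_scan(pages, 0)  # first scan starts at the beginning
scan_2 = merry_go_round_scan(pages, 2)  # joins when the shared scan is at p2
scan_3 = merry_go_round_scan(pages, 4)  # joins when the shared scan is at p4

print(scan_2)
```

Every scan sees all the pages, but only the one that happened to start at the beginning sees them in order.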
Readahead (1)
Why use readahead?
Keep the CPUs busy, maximize throughput
Feedback mechanism to determine how far ahead of the scan point to read
Driven from parent level
Issues 1-, 8-, or 32-page IOs
Better contiguity = bigger IOs
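The link between contiguity and IO size can be illustrated with a toy planner. This is not the actual readahead algorithm: it is a greedy sketch that assumes the only rule is "use the largest of the 1-, 8-, or 32-page IO sizes that the next physically contiguous run allows", and the function names are hypothetical.

```python
IO_SIZES = (32, 8, 1)   # page counts per IO, from the slide

def contiguous_run(pages, i):
    """Length of the run of physically consecutive page IDs starting at pages[i]."""
    n = 1
    while i + n < len(pages) and pages[i + n] == pages[i] + n:
        n += 1
    return n

def plan_ios(pages):
    """Greedy toy plan over page IDs listed in logical (scan) order."""
    plan, i = [], 0
    while i < len(pages):
        run = contiguous_run(pages, i)
        size = next(s for s in IO_SIZES if s <= run)  # largest size that fits
        plan.append(size)
        i += size
    return plan

print(plan_ios(list(range(64))))    # fully contiguous leaf level
print(plan_ios([0, 2, 4, 6]))       # every page out of place
```

In this model 64 contiguous pages need two IOs, while four scattered pages already need four.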
Readahead (2)
What causes fragmentation (1)
Index leaf level of newly built index
Red arrow is the allocation order
Black arrows are following the logical order
What causes fragmentation (2)
Newly built index leaf after a single page split
Red arrow is the allocation order
Black arrows are following the logical order
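A page split can be sketched with a toy leaf level. The model assumes a page holds 4 keys, that a split moves the upper half of the keys to a new page, and that the new page gets the next unused page ID; all names and sizes are made up for illustration.

```python
PAGE_CAPACITY = 4  # assumed keys per page, kept small for illustration

def insert(pages, order, key):
    """Insert key into a toy leaf level, splitting the target page if it is full.

    pages: page_id -> sorted list of keys
    order: page IDs in logical (key) order
    """
    # Target page: the last page whose first key is <= key (else the first page).
    pos = 0
    for i, pid in enumerate(order):
        if pages[pid] and pages[pid][0] <= key:
            pos = i
    pid = order[pos]
    if len(pages[pid]) >= PAGE_CAPACITY:
        new_pid = max(pages) + 1            # new page is allocated past the end
        half = len(pages[pid]) // 2
        pages[new_pid] = pages[pid][half:]  # upper half moves to the new page
        pages[pid] = pages[pid][:half]
        order.insert(pos + 1, new_pid)      # logically it follows the old page
        if key >= pages[new_pid][0]:
            pid = new_pid
    pages[pid].append(key)
    pages[pid].sort()

pages = {0: [10, 20, 30, 40], 1: [50, 60, 70, 80], 2: [90, 100, 110, 120]}
order = [0, 1, 2]
insert(pages, order, 55)
print(order)
```

After the split the logical order visits the new page between two older pages, so logical order and allocation order no longer agree.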
What causes fragmentation (3)
Index leaf level after random inserts/deletes
Red arrow is the allocation order
Black arrows are following the logical order
Logical scan fragmentation
Occurs when the next logical page is not the next physical page
Prevents optimal readahead
Reduces range scan performance
Does not affect cached pages
Smaller indexes affected less
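One simplified way to put a number on this: walk the leaf pages in logical order and count the steps where the next page is not the physically next one. The sketch assumes the index owns consecutive page IDs, so "next physical page" means page ID + 1; the calculation a real tool reports may differ.

```python
def logical_scan_fragmentation(logical_order):
    """Fraction of page-to-page steps where the next logical page
    is not the next physical page."""
    steps = list(zip(logical_order, logical_order[1:]))
    if not steps:
        return 0.0
    out_of_order = sum(1 for cur, nxt in steps if nxt != cur + 1)
    return out_of_order / len(steps)

print(logical_scan_fragmentation([0, 1, 2, 3, 4]))   # freshly built
print(logical_scan_fragmentation([0, 1, 4, 2, 3]))   # after one split
```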
Extent scan fragmentation
Occurs when the extents in an index are not contiguous
Also affects readahead but not as much
Extent:  1        2        3        4        5        6
Owner:   Index A  Index B  Index A  Index B  Index A  Index A
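Using the six-extent layout from the slide, a small sketch can count how often an index has to jump to reach its next extent. The function name and the "count the non-adjacent steps" measure are assumptions for illustration.

```python
def extent_gaps(owners, index):
    """Return the extents owned by `index` and how many steps between
    consecutive owned extents are not physically adjacent."""
    mine = [i for i, owner in enumerate(owners, start=1) if owner == index]
    gaps = sum(1 for a, b in zip(mine, mine[1:]) if b != a + 1)
    return mine, gaps

owners = ["A", "B", "A", "B", "A", "A"]  # extents 1..6 from the slide

print(extent_gaps(owners, "A"))
print(extent_gaps(owners, "B"))
```

Index A owns extents 1, 3, 5, and 6, so two of its three extent-to-extent steps skip over an extent belonging to Index B.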
DBCC SHOWCONTIG
Your tool for determining fragmentation
Keys to success are:
Knowing which indexes to look at
Knowing which options to use
Knowing how to interpret the results