
Microsoft® SQL Server™ 2005 Performance Optimization and Tuning Handbook
Ken England
Gavin Powell
Amsterdam • Boston • Heidelberg • London • New York • Oxford
Paris • San Diego• San Francisco • Singapore • Sydney • Tokyo
Digital Press is an imprint of Elsevier
30 Corporate Drive, Suite 400, Burlington, MA 01803, USA
Linacre House, Jordan Hill, Oxford OX2 8DP, UK
Copyright © 2007, Elsevier Inc. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or
transmitted in any form or by any means, electronic, mechanical, photocopying,
recording, or otherwise, without the prior written permission of the publisher.
Permissions may be sought directly from Elsevier’s Science & Technology Rights
Department in Oxford, UK: phone: (+44) 1865 843830, fax: (+44) 1865 853333.
You may also complete your request online via the Elsevier homepage, by
selecting “Support & Contact,” then “Copyright and Permission,” and then
“Obtaining Permissions.”
Recognizing the importance of preserving what has been written, Elsevier prints its
books on acid-free paper whenever possible.
Library of Congress Cataloging-in-Publication Data
Application Submitted.
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
ISBN: 978-1-55558-319-4
For information on all Elsevier Digital Press publications visit our Web site at
www.books.elsevier.com
Printed in the United States of America
07 08 09 10 11 12 10 9 8 7 6 5 4 3 2 1
Contents at a Glance
Introduction xv
1 Performance and SQL Server 2005 1
2 Logical Database Design for Performance 19
3 Physical Database Design 65
4 SQL Server Storage Structures 75
5 Indexing 121
6 Basic Query Tuning 193
7 What Is Query Optimization? 217
8 Investigating and Influencing the Optimizer 257
9 SQL Server and Windows 307
10 Transactions and Locking 355
11 Architectural Performance Options and Choices 409
12 Monitoring Performance 421
Appendices
A Syntax Conventions 445
B Database Scripts 447
C Performance Strategies and Tuning Checklist 477
Index 487
Contents
Introduction xv
1 Performance and SQL Server 2005 1
1.1 Partitioning tables and indexes 1
1.2 Building indexes online 2
1.3 Transact SQL improvements 2
1.4 Adding the .NET Framework 3
1.5 Trace and replay objects 4
1.6 Monitoring resource consumption with SQL OS 4
1.7 Establishing baseline metrics 4
1.8 Start using the GUI tools 7
1.8.1 SQL Server Management Studio 8
1.8.2 SQL Server Configuration Manager 9
1.8.3 Database Engine Tuning Advisor 9
1.8.4 SQL Server Profiler 12
1.8.5 Business Intelligence Development Studio 14
1.9 Availability and scalability 15
1.10 Other useful stuff 16
1.11 Where to begin? 17
2 Logical Database Design for Performance 19
2.1 Introducing logical database design for performance 19
2.2 Commercial normalization techniques 21
2.2.1 Referential integrity 22
2.2.2 Primary and foreign keys 23
2.2.3 Business rules in a relational database model 25

2.2.4 Alternate indexes 26
2.3 Denormalization for performance 29
2.3.1 What is denormalization? 31
2.3.2 Denormalizing the already normalized 31
2.3.2.1 Multiple table joins (more than two tables) 32
2.3.2.2 Multiple table joins finding a few fields 32
2.3.2.3 The presence of composite keys 34
2.3.2.4 One-to-one relationships 35
2.3.2.5 Denormalize static tables 37
2.3.2.6 Reconstructing collection lists 38
2.3.2.7 Removing tables with common fields 38
2.3.2.8 Reincorporating transitive dependencies 39
2.3.3 Denormalizing by context 40
2.3.3.1 Copies of single fields across tables 40
2.3.3.2 Summary fields in parent tables 42
2.3.3.3 Separating data by activity and application requirements 43
2.3.3.4 Local application caching 44
2.3.4 Denormalizing and special purpose objects 44
2.4 Extreme denormalization in data warehouses 48
2.4.1 The dimensional data model 51
2.4.1.1 What is a star schema? 53
2.4.1.2 What is a snowflake schema? 54
2.4.2 Data warehouse data model design basics 56
2.4.2.1 Dimension tables 57
2.4.2.2 Fact tables 60
2.4.2.3 Other factors to consider during design 63
3 Physical Database Design 65
3.1 Introducing physical database design 65

3.2 Data volume analysis 67
3.3 Transaction analysis 69
3.4 Hardware environment considerations 73
4 SQL Server Storage Structures 75
4.1 Databases and files 75
4.2 Creating databases 79
4.3 Increasing the size of a database 83
4.4 Decreasing the size of a database 84
4.4.1 The autoshrink database option 86
4.4.2 Shrinking a database in the SQL Server Management Studio 86
4.4.3 Shrinking a database using DBCC statements 88
4.5 Modifying filegroup properties 90
4.6 Setting database options 92
4.7 Displaying information about databases 95
4.8 System tables used in database configuration 98
4.9 Units of storage 102
4.10 Database pages 104
4.11 Looking into database pages 108
4.12 Pages for space management 112
4.13 Partitioning tables into physical chunks 115
4.13.1 Types of partitions 117
4.13.2 Creating a range partition 117
4.13.3 Creating an even distribution partition 118
4.14 The BankingDB database 119
5 Indexing 121
5.1 Data retrieval with no indexes 121
5.2 Clustered indexes 122

5.3 Non-clustered indexes 127
5.4 Online indexes 129
5.5 The more exotic indexing forms 129
5.5.1 Parallel indexing 129
5.5.2 Partition indexing 130
5.5.3 XML data type indexes 130
5.6 The role of indexes in insertion and deletion 131
5.7 A note with regard to updates 141
5.8 So how do you create indexes? 142
5.8.1 The Transact-SQL CREATE INDEX statement 142
5.8.2 The SQL Management Studio 153
5.8.3 The SQL Distributed Management Framework (SQL-DMF) 155
5.9 Dropping and renaming indexes 157
5.10 Displaying information about indexes 158
5.10.1 The system stored procedure sp_helpindex 158
5.10.2 The system table sysindexes 159
5.10.3 Using metadata functions to obtain information about indexes 161
5.10.4 The DBCC statement DBCC SHOWCONTIG 163
5.11 Creating indexes on views 167
5.12 Creating indexes with computed columns 170
5.13 Using indexes to retrieve data 171
5.13.1 Retrieving a single row 173
5.13.2 Retrieving a range of rows 175
5.13.3 Covered queries 177
5.13.4 Retrieving a single row with a clustered index on the table 178
5.13.5 Retrieving a range of rows with a clustered index on the table 179
5.13.6 Covered queries with a clustered index on the table 180
5.13.7 Retrieving a range of rows with multiple non-clustered indexes on the table 180
5.14 Choosing indexes 182
5.14.1 Why not create many indexes? 183
5.14.2 Online transaction processing versus decision support 184
5.14.3 Choosing sensible index columns 185
5.14.4 Choosing a clustered index or a non-clustered index 189
6 Basic Query Tuning 193
6.1 The SELECT statement 194
6.1.1 Filtering with the WHERE clause 195
6.1.2 Sorting with the ORDER BY clause 196
6.1.2.1 Overriding WHERE with ORDER BY 197
6.1.3 Grouping result sets 198
6.1.3.1 Sorting with the GROUP BY clause 198
6.1.3.2 Using DISTINCT 199
6.1.3.3 The HAVING clause 199
6.2 Using functions 200
6.2.1 Data type conversions 200
6.3 Comparison conditions 201
6.3.1 Equi, anti, and range 202
6.3.2 LIKE pattern matching 203
6.3.3 Set membership 204
6.4 Joins 204
6.4.1 Efficient joins 205
6.4.1.1 Intersections 205
6.4.1.2 Self joins 206
6.4.2 Inefficient joins 207
6.4.2.1 Cartesian products 207
6.4.2.2 Outer joins 207
6.4.2.3 Anti-joins 209
6.4.3 How to tune a join 209
6.5 Using subqueries for efficiency 210
6.5.1 Correlated versus non-correlated subqueries 210
6.5.2 IN versus EXISTS 210
6.5.3 Nested subqueries 210
6.5.4 Advanced subquery joins 211
6.6 Specialized metadata objects 213
6.7 Procedures in Transact SQL 214
7 What Is Query Optimization? 217
7.1 When is a query optimized? 218
7.2 The steps in query optimization 218
7.3 Query analysis 219
7.3.1 Search arguments 219
7.3.2 OR clauses 223
7.3.3 Join clauses 224
7.4 Index selection 225
7.4.1 Does a useful index exist? 226
7.4.2 How selective is the search argument? 226
7.4.3 Key distribution statistics 227
7.4.4 Column statistics 233
7.4.5 Updating index and column statistics 234
7.4.6 When can we not use statistics? 240
7.4.7 Translating rows to logical reads 241
7.4.7.1 No index present 242
7.4.7.2 A clustered index present 242
7.4.7.3 A non-clustered index present 243

7.4.7.4 A non-clustered index present and a clustered index present 245
7.4.7.5 Multiple non-clustered indexes present 245
7.5 Join order selection 246
7.6 How joins are processed 247
7.6.1 Nested loops joins 248
7.6.2 Merge joins 251
7.6.3 Hash joins 253
8 Investigating and Influencing the Optimizer 257
8.1 Text-based query plans and statistics 259
8.1.1 SET SHOWPLAN_TEXT { ON | OFF } 259
8.1.2 SET SHOWPLAN_ALL { ON | OFF } 260
8.1.3 SET SHOWPLAN_XML { ON | OFF } 265
8.1.4 SET STATISTICS PROFILE { ON | OFF } 266
8.1.5 SET STATISTICS IO { ON | OFF } 267
8.1.6 SET STATISTICS TIME { ON | OFF } 268
8.1.7 SET STATISTICS XML { ON | OFF } 270
8.2 Query plans in Management Studio 270
8.2.1 Statistics and cost-based optimization 275
8.3 Hinting to the optimizer 282
8.3.1 Join hints 283
8.3.2 Table and index hints 283
8.3.3 View hints 284
8.3.4 Query hints 285
8.4 Stored procedures and the query optimizer 289
8.4.1 A stored procedure challenge 292
8.4.1.1 Changes to the table structure 295
8.4.1.2 Changes to indexes 295
8.4.1.3 Executing update statistics 295

8.4.1.4 Aging the stored procedure out of cache 295
8.4.1.5 Table data modifications 295
8.4.1.6 Mixing data definition language and data manipulation language statements 296
8.4.2 Temporary tables 297
8.4.3 Forcing recompilation 298
8.4.4 Aging stored procedures from cache 300
8.5 Non-stored procedure plans 301
8.6 The syscacheobjects system table 304
9 SQL Server and Windows 307
9.1 SQL Server and CPU 307
9.1.1 An overview of Windows and CPU utilization 307
9.1.2 How SQL Server uses CPU 309
9.1.2.1 Priority 309
9.1.2.2 Use of symmetric multiprocessing systems 311
9.1.2.3 Thread use 312
9.1.2.4 Query parallelism 313
9.1.3 Investigating CPU bottlenecks 314
9.1.4 Solving problems with CPU 321
9.2 SQL Server and memory 323
9.2.1 An overview of Windows virtual memory management 323
9.2.2 How SQL Server uses memory 325
9.2.2.1 Configuring memory for SQL Server 326
9.2.3 Investigating memory bottlenecks 329
9.2.4 Solving problems with memory 335
9.3 SQL Server and disk I/O 335
9.3.1 An overview of Windows and disk I/O 336
9.3.2 How SQL Server uses disk I/O 339

9.3.2.1 An overview of the data cache 340
9.3.2.2 Keeping tables and indexes in cache 343
9.3.2.3 Read-ahead scans 344
9.3.2.4 Shrinking database files 346
9.3.3 Investigating disk I/O bottlenecks 348
9.3.4 Solving problems with disk I/O 352
10 Transactions and Locking 355
10.1 Why a locking protocol? 356
10.1.1 Scenario 1 356
10.1.2 Scenario 2 357
10.2 The SQL Server locking protocol 358
10.2.1 Shared and exclusive locks 358
10.2.2 Row-, page-, and table-level locking 360
10.2.2.1 When are row-level locks used? 361
10.2.2.2 When are table-level locks used? 362
10.2.3 Lock timeouts 363
10.2.4 Deadlocks 364
10.2.5 Update locks 365
10.2.6 Intent locks 367
10.2.7 Modifying the default locking behavior 367
10.2.7.1 Transaction isolation levels 368
10.2.7.2 Lock hints 369
10.2.8 Locking in system tables 373
10.2.9 Monitoring locks 374
10.2.9.1 Using the sp_lock system stored procedure 375
10.2.9.2 Using the SQL Server 2005 Management Studio 379
10.2.9.3 Using the System Monitor 381
10.2.9.4 Interrogating the syslockinfo table 383
10.2.9.5 Using the system procedure sp_who 386
10.2.9.6 The SQL Server Profiler 387

10.2.9.7 Using trace flags with DBCC 388
10.3 SQL Server locking in action 393
10.4 Uncommitted data, non-repeatable reads, phantoms, and more 398
10.4.1 Reading uncommitted data 398
10.4.2 Non-repeatable reads 399
10.4.3 Phantoms 401
10.4.4 More modified locking behavior 405
10.5 Application resource locks 406
10.6 A summary of lock compatibility 407
11 Architectural Performance Options and Choices 409
11.1 The Management Studio and the .NET Framework 410
11.2 Striping and mirroring 410
11.2.1 RAID arrays 410
11.2.2 Partitioning and Parallel Processing 411
11.3 Workflow management 411
11.4 Analysis Services and data warehousing 412
11.4.1 Data modeling techniques in SQL Server 2005 413
11.5 Distribution and replication 414
11.6 Standby failover (hot spare) 417
11.6.1 Clustered failover databases 418
11.7 Flashback snapshot databases 419
12 Monitoring Performance 421
12.1 System stored procedures 422
12.2 System monitor, performance logs, and alerts 424
12.3 SQL Server 2005 Management Studio 427
12.3.1 Client statistics 427
12.3.2 The SQL Server Profiler 428
12.3.2.1 What events can be traced? 429

12.3.2.2 What information is collected? 430
12.3.2.3 Filtering information 431
12.3.2.4 Creating an SQL Server profiler trace 431
12.3.2.5 Creating traces with stored procedures 438
12.3.3 Database Engine Tuning Advisor 442
12.4 SQL OS and resource consumption 443
A Syntax Conventions 445
B Database Scripts 447
C Performance Strategies and Tuning Checklist 477
Index 487
Introduction
What is the goal of tuning an SQL Server database? The goal is to improve
performance until acceptable levels are reached. Acceptable levels can be
defined in a number of ways. For a large online transaction processing
(OLTP) application the performance goal might be to provide sub-second
response time for critical transactions and to provide a response time of less
than two seconds for 95 percent of the other main transactions. For some
systems, typically batch systems, acceptable performance might be mea-
sured in throughput. For example, a settlement system may define accept-
able performance in terms of the number of trades settled per hour. For an
overnight batch suite acceptable performance might be that it must finish
before the business day starts.
Whatever the system, designing for performance should start early in
the design process and continue after the application has gone live. Per-
formance tuning is not a one-off process but an iterative process during
which response time is measured, tuning performed, and response time
measured again.
There is no right way to design a database; there are a number of possible
approaches, and all of these may be perfectly valid. It is sometimes said that
performance tuning is an art, not a science. This may be true, but it is
important to undertake performance tuning experiments with the same
kind of rigorous, controlled conditions under which scientific experiments
are performed. Measurements should be taken before and after any modifi-
cation, and these should be made one at a time so it can be established
which modification, if any, resulted in an improvement or degradation.
What areas should the database designer concentrate on? The simple
answer to this question is that the database designer should concentrate on
those areas that will return the most benefit. In my experience, for most
database designs I have worked with, large gains are typically made in the
area of query and index design. As we shall see later in this book, inappropriate
indexes and badly written queries, as well as some other contributing
factors, can negatively influence the query optimizer such that it chooses an
inefficient strategy.
To give you some idea of the gains to be made in this area, I once was
asked to look at a query that joined a number of large tables together. The
query was abandoned after it had not completed within 12 hours. The
addition of an index in conjunction with a modification to the query meant
the query now completed in less than eight minutes! This magnitude of
gain cannot be achieved just by purchasing more hardware or by twiddling
with some arcane SQL Server configuration option. A database designer or
administrator’s time is always limited, so make the best use of it! The other
main area where gains can be dramatic is lock contention. Removing lock
bottlenecks in a system with a large number of users can have a huge impact
on response times.
Now, some words of caution when chasing performance problems. If
users phone up to tell you that they are getting poor response times, do not
immediately jump to conclusions about what is causing the problem. Circle
at a high altitude first. Having made sure that you are about to monitor the
correct server, use the System Monitor to look at the CPU, disk subsystem,
and memory use. Are there any obvious bottlenecks? If there are, then look
for the culprit. Everyone blames the database, but it could just as easily be
someone running his or her favorite game! If there are no obvious bottle-
necks, and the CPU, disk, and memory counters in the System Monitor are
lower than usual, then that might tell you something. Perhaps the network
is sluggish or there is lock contention. Also be aware of the fact that some
bottlenecks hide others. A memory bottleneck often manifests itself as a
disk bottleneck.
There is no substitute for knowing your own server and knowing the
normal range of System Monitor counters. Establish trends. Measure a set
of counters regularly, and then, when someone comments that the system is
slow, you can wave a graph in front of him or her showing that it isn’t!
Special thanks are also due to Craig Mullins for his work on the technical
editing of this book.
So, when do we start to worry about performance? As soon as possible,
of course! We want to take the logical design and start to look at how we
should transform it into an efficient physical design.
Gavin Powell can be contacted at the following email address:

1 Performance and SQL Server 2005
1.1 Partitioning tables and indexes
Partitioning lets you split a large chunk of data into much more manageable,
smaller physical chunks of disk space. The intention is to reduce I/O activity.
For example, let's say you have a table with 10 million rows, and you only
want to read 1 million rows to compile an analytical report. If the table is
divided into 10 partitions, and your 1 million rows are contained in a single
partition, then you get to read 1 million rows as opposed to 10 million rows.
On that scale you can get quite a serious difference in I/O activity for a
single report.
SQL Server 2005 allows for table partitioning and index partitioning.
What this means is that you can create a table as a partitioned table, defin-
ing specifically where each physical chunk of the table or index resides.
SQL Server 2000 partitioning was essentially manual partitioning, using
multiple tables, distributed across multiple SQL Server computers. Then a
view (partition view) was created to overlay those tables across the servers.
In other words, a query required access to a view, which contained a query,
not data. SQL Server 2005 table partitions contain real physical rows.
Physically partitioning tables and indexes has a number of benefits:

 Data can be read from a single partition at a time, cutting down
enormously on performance-hogging I/O.
 Data can be accessed from multiple partitions in parallel, speeding
things up, depending on how many processors a server platform has.
 Different partitions can be managed separately, without having to
interfere with the entire table.
1.2 Building indexes online
Building an index online allows the table being indexed to remain accessible
during the index creation process. Creating or regenerating an index for a
very large table can consume a considerable period of time (hours, even days).
Without online index building, creating an index takes the table offline. If that
table is crucial to the running of a computer system, then you have down time.
The result was usually that indexes were not created, or never regenerated.
Even the most versatile B-tree indexes can sometimes require rebuilding
to increase their performance. Constant data manipulation activity on a
table (record inserts, updates, and deletions) can cause a B-tree index to
deteriorate over time. Online index building is crucial to the constant uptime
required by modern databases for popular websites.
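In Transact SQL terms, online building is requested with the ONLINE option of CREATE INDEX. The table and index names below are invented, and note that ONLINE = ON is an Enterprise Edition feature:

```sql
-- Rebuild an existing index without taking the table offline;
-- readers and writers can keep using the table meanwhile.
CREATE NONCLUSTERED INDEX ix_orders_order_date
ON orders (order_date)
WITH (ONLINE = ON, DROP_EXISTING = ON);
```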
1.3 Transact SQL improvements
Transact SQL provides programmable access to SQL Server. Programmable
access means that Transact SQL allows you to construct database stored
code blocks, such as stored procedures, triggers, and functions. These code
blocks have direct access to other database objects—most significantly
tables where query and data manipulation commands can be executed
directly in the stored code blocks; and code blocks are executed on the data-
base server. New capabilities added to Transact SQL in SQL Server 2005
are as follows:
 Error handling
 Recursive queries
 Better query writing capabilities
There is also something new to SQL Server 2005 called Multiple Active
Result Sets (MARS). MARS allows for more than a single set of rows for a
single connection. In other words, a second query can be submitted to a
SQL Server while the result set of a first query is still being returned from
database server to client application.
The overall result of Transact SQL enhancements to SQL Server 2005 is
increased performance of code, better written code, and more versatility.
Better written code can ultimately make for better performing applications
in general.
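Two of these additions can be sketched briefly (the tables here are hypothetical). TRY...CATCH gives Transact SQL structured error handling, and common table expressions (CTEs) allow recursive queries:

```sql
-- Structured error handling, new in SQL Server 2005.
BEGIN TRY
    UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER()  AS error_number,
           ERROR_MESSAGE() AS error_message;
END CATCH;

-- A recursive query: walking an employee/manager hierarchy
-- with a common table expression.
WITH org AS
(
    SELECT employee_id, manager_id, 0 AS depth
    FROM employees
    WHERE manager_id IS NULL
    UNION ALL
    SELECT e.employee_id, e.manager_id, org.depth + 1
    FROM employees e
    JOIN org ON e.manager_id = org.employee_id
)
SELECT * FROM org;
```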
1.4 Adding the .NET Framework
You can use programming languages other than just Transact SQL and
embed code into SQL Server as .NET Framework executables. These
programming languages can leverage existing personnel skills. Perhaps more
importantly, some tasks can be written in programming languages more
appropriate to the task at hand. For example, a language like C# can be
used, letting a programmer take advantage of the enormous speed
advantages of compiled executable code.
Overall, you get support for languages not inherently part of SQL Server
(Transact SQL). You get faster and easier development. You get to use Web
Services and XML (with Native XML capabilities using XML data types).
The result is faster development, better development, and, hopefully, better
overall database performance in the long run.
The result you get is something called managed code. Managed code is
code executed by the .NET Framework. As already stated, managed code
can be written using all sorts of programming languages. Different pro-
gramming languages have different benefits. For example, C is fast and effi-
cient, where Visual Basic is easier to write code with but executes slower.
Additionally, the .NET Framework has tremendous built-in functionality.
.NET is much, much more versatile and powerful than Transact SQL.
There is much to be said for placing executable code into a database, on a
database server such as SQL Server. There is also much to be said against
this practice. Essentially, the more metadata and logic you add to a data-
base, the more business logic you add to a database. In my experience, add-
ing too much business logic to a database can cause performance problems
in the long run. After all, application development languages cater to num-
ber crunching and other tasks. Why put intensive, non-data access process-
ing into a database? The database system has enough to do just in keeping
your data up to date and available.
Managed code also compiles to native code immediately prior to execution
in SQL Server, so it should execute a little faster, because it runs in a form
best suited to performance inside SQL Server.
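On the Transact SQL side, hooking managed code into SQL Server looks roughly like this. The assembly path, assembly name, class, and method below are invented, and CLR integration is off by default, so it must be enabled first:

```sql
-- Enable CLR integration (off by default in SQL Server 2005).
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;

-- Register a .NET assembly and expose one of its static methods
-- as a stored procedure.
CREATE ASSEMBLY util_assembly
FROM 'C:\assemblies\Util.dll';

CREATE PROCEDURE usp_crunch_numbers
AS EXTERNAL NAME util_assembly.[Util.Cruncher].Crunch;
```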
SQL Server 2005 includes a new management object model called SQL
Management Objects (SMO). The SMO has a basis in the .NET Frame-
work. The new graphical, SQL Server Management Studio, is written using
the SMO.
1.5 Trace and replay objects
Tracing is the process of producing large amounts of log entry information
during the process of normal database operations. However, it might be
prudent to not choose tracing as a first option to solving a performance
issue. Tracing can hurt performance simply because it generates lots of data.
The point of producing trace files is to aid in finding errors or performance
bottlenecks, which cannot be deciphered by more readily available means.
So, tracing quite literally produces trace information. Replay allows replay
of the actions that generated those trace events. So, you could replay a
sequence of events against a SQL Server, without actually changing any data,
and reproduce an unpleasant performance problem. Then you could reanalyze
the problem, decipher it, and try to resolve or improve it.
1.6 Monitoring resource consumption with SQL OS
SQL OS is a new tool for SQL Server 2005, which lives between an SQL
Server database and the underlying Windows operating system (OS). The
operating system manages, runs, and accesses computer hardware on your
database server, such as CPU, memory, disk I/O, and even tasks and sched-
uling. SQL OS allows a direct picture into the hardware side of SQL Server
and how the database is perhaps abusing that hardware and operating sys-
tem. The idea is to view the hardware and the operating system from within
an SQL Server 2005 database.
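Much of this picture is exposed through the sys.dm_os_* dynamic management views. As a quick illustration (both views exist in SQL Server 2005):

```sql
-- Where has the server been waiting?
SELECT TOP 10 wait_type, wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;

-- What hardware does SQL OS think it is running on?
SELECT cpu_count, physical_memory_in_bytes
FROM sys.dm_os_sys_info;
```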
1.7 Establishing baseline metrics
A baseline is a setting established by a database administrator, either written
down on paper or, preferably, stored in a database (generated by the database).
This baseline establishes an acceptable standard of performance. If a baseline
is exceeded, then the database is deemed to have a performance problem. A
metric is essentially a measure of something. The result is many metrics,
with established acceptable baseline values. If one or more metric baselines
are exceeded, then there is deemed to be one or more performance problems.
Additionally, each metric can be exceeded for a previously established reason,
based on what the metric is. So, if a table, with its indexes, has an established
baseline value of 10 bytes per minute of I/O activity, and suddenly that value
jumps to 10 gigabytes per minute, there is probably a performance problem.
An established baseline metric is a measure of normal or acceptable
activity.
Metric baselines have more significance (there are more metrics) in SQL
Server 2005 than in SQL Server 2000. The overall effect is that an SQL
Server 2005 database is now more easily monitored, and the prospect of
some automated tuning activities becomes more practical in the long term.
SQL Server 2005 has added over 70 additional baseline measures applicable
to performance of an SQL Server database. These new baseline metrics
cover areas such as memory usage, locking activities, scheduling, network
usage, transaction management, and disk I/O activity.
The obvious answer to a situation such as this is that a key index is
dropped, corrupt, or deteriorated. Or a query could be doing something
unexpected such as reading all rows in a very large table.
Using metrics and their established baseline or expected values, one can
perform a certain amount of automated monitoring and detection of per-
formance problems.
Baseline metrics are essentially statistical values collected for a set of
metrics.

A metric is a measure of some activity in a database.
The most effective method of gathering those expected metric values is
to collect multiple values—and then aggregate and average them. And thus
the term statistic applies because a statistic is an aggregate or average value,
resulting from a sample of multiple values. So, when some activity veers
away from previously established statistics, you know that there could be
some kind of performance problem—the larger the variation, the larger the
potential problem.
Baseline metrics should be gathered in the following activity sectors:

 High load: Peak times (highest database activity)
 Low load: Off-peak times (lowest database activity)
 Batch activity: Batch processing time, such as during backup processing
and heavy reporting or extraction cycles
 Downtime: How long it takes to back up, restore, and recover is
something executive management will always have to detail to clients.
This equates to uptime and potential downtime
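One way to gather such values, sketched here with an invented capture table, is to snapshot a few counters from sys.dm_os_performance_counters on a schedule and aggregate and average them later:

```sql
-- A baseline capture table (hypothetical name and shape).
CREATE TABLE baseline_metrics
(
    captured_at  DATETIME NOT NULL DEFAULT GETDATE(),
    counter_name NCHAR(128),
    cntr_value   BIGINT
);

-- Snapshot a couple of representative counters.
INSERT INTO baseline_metrics (counter_name, cntr_value)
SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Page life expectancy', 'Batch Requests/sec');
```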
Some very generalized categories of metric baseline measurement are
as follows:
 Applications database access: The most common performance
problems are caused by poorly built queries and locking or hot blocks
(conflict caused by too much concurrency on the same data). In
computer jargon, concurrency means lots of users accessing and
changing the same data all at the same time. If there are too many
concurrent users, ultimately any relational database has its limitations
on what it can manage efficiently.
 Internalized database activity: Statistics must not only be present
but also kept up to date. When a query reads a table, it uses what's
called an optimizer process to make a wild guess at what it should do.
If a table has 1 million rows, plus an index, and a query seeks 1
record, the optimizer will tell the query to read the index. The
optimizer uses statistics to compare the 1 record required against the
1 million rows available. Without the optimizer, 1 million rows will be
read to find 1 record. Without the statistics, the optimizer cannot even
hazard a guess and will probably read everything. If statistics are out of
date, such that the optimizer thinks the table has 2 rows when there are
really 1 million, then the optimizer will likely guess very badly.
 Internalized database structure: Too much business logic, such as
stored procedures or a highly overnormalized table structure, can
ultimately cause overloading of a database, slowing performance
because the database is just a little too top heavy.
 Database configuration: An OLTP database accesses a few rows at a
time. It often uses indexes, depending on table size, and will pass very
small amounts of data across network and telephone cables. So, an
OLTP database can be specifically configured to use lots of memory
(things like caching on client computers and middle-tier servers, i.e.,
web and application servers), plus very little I/O. A data warehouse,
on the other hand, produces a small number of very large transactions,
with low memory usage, enormous amounts of I/O, and lots of
throughput processing. So, a data warehouse doesn't care too much
about memory but wants the fastest access to disk possible, plus lots
of localized (LAN) network bandwidth. An OLTP database uses all
hardware resources, whereas a data warehouse uses mainly I/O.
 Hardware resource usage: This is really very similar to the point
above under database configuration, except that hardware can be
improved upon. In some circumstances beefing up hardware will
solve performance issues. For example, an OLTP database server
needs plenty of memory, whereas a data warehouse does well with fast
disks, and perhaps multiple CPUs with partitioning for rapid parallel
processing. Beefing up hardware doesn't always help, though.
Sometimes increasing CPU speed and count, or increasing onboard
memory, can only hide performance problems until a database grows
in physical size, or there are more users; the problem still exists. For
example, poor query coding and indexing in an OLTP database will
always cause performance problems, no matter how much money is
spent on hardware. Hardware solutions are sometimes easier and
cheaper, but often only a stopgap.
 Network design and configuration: Network bandwidth and
bottlenecks can cause problems sometimes, but this is something
rarely seen in commercial environments, because the network
engineers are usually prepared for potential bandwidth requirements.
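For the statistics point above, the age of a table's statistics can be checked, and the statistics refreshed, roughly as follows. The orders table is invented; STATS_DATE and sysindexes are as they stand in SQL Server 2005:

```sql
-- When were the statistics on this table last updated?
SELECT name, STATS_DATE(id, indid) AS last_updated
FROM sysindexes
WHERE id = OBJECT_ID('orders');

-- Refresh them with a full scan of the table.
UPDATE STATISTICS orders WITH FULLSCAN;
```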
The above categories are most often the culprits of the biggest perfor-
mance issues. There are other possibilities, but they are rare and don’t really
warrant mentioning at this point. Additionally, the most frequent and exac-
erbating causes of performance problems are usually the most obvious ones,
and more often than not something to do with the people maintaining and
using the software, inadequate software, or inadequate hardware. Hardware
is usually the easiest problem to fix. Fixing software is more expensive
depending on location of errors in database or application software. Per-
suading users to use your applications and database the way you want is

either a matter of expensive training, or developers having built software
without enough of a human use (user friendly) perspective in mind.
1.8 Start using the GUI tools
Traditionally, many database administrators still utilize command-line
tools because they perceive them as being more grassroots and thus easier
to use. Sometimes these administrators are correct. I am as guilty of this as
is anyone else. However, as in any profession, new gadgets are often
frowned upon due to simple resistance to change and a desire to deal with
tools and methods which are familiar. The new GUI tools appearing in
many relational databases these days are just too good to miss.
1.8.1 SQL Server Management Studio
The SQL Server Management Studio is a new tool used to manage all the
facets of an SQL Server installation, including multiple databases, tables,
indexes, fields, and data types; anything you can think of. Figure 1.1 shows
a sample view of the SQL Server Management Studio tool in SQL Server 2005.
SQL Server Management Studio is a fully integrated, multi-task ori-
ented screen (console) that can be used to manage all aspects of an SQL
Server installation, including direct access to metadata and business logic,
integration, analysis, reports, notification, scheduling, and XML, among
other facets of SQL Server architecture. Additionally, queries and scripting
can be constructed, tested, and executed. Scripting also includes versioning
control (multiple historical versions of the same piece of code allow for
backtracking). It can also be used for very easy general database mainte-
nance.
SQL Server Management Studio is in reality wholly constructed using
something called SQL Management Objects (SMO). SMO is essentially a
very large group of predefined objects, built-in and reusable, which can be
used to access all functionality of a SQL Server database. SMO is written
using the object-oriented and highly versatile .NET Framework. Database

Figure 1.1: SQL Server Management Studio