Microsoft SQL Server 2008 R2 Unleashed (P133)

CHAPTER 35 Understanding Query Optimization
Query Text                                 Query Hash          Query Plan Hash
-----------------------------------------  ------------------  ------------------
select * from titles where ytd_sales = 0   0x9AB21AC5889FE2D0  0x8D6DE6D258BABB2B
select * from titles where ytd_sales = 0   0x9AB21AC5889FE2D0  0x8D6DE6D258BABB2B
select * from titles where ytd_sales = 99  0x9AB21AC5889FE2D0  0xE889B5D23D917DFD
select * from titles where ytd_sales = 10  0x9AB21AC5889FE2D0  0xE889B5D23D917DFD
select * from titles where ytd_sales = 0   0x9AB21AC5889FE2D0  0x8D6DE6D258BABB2B
select * from titles where ytd_sales = 0   0x9AB21AC5889FE2D0  0xE889B5D23D917DFD
The query hash or query plan hash value can be used in a query to aggregate performance statistics for similar queries. For example, the following query returns the average processing time and logical reads for the same queries that were returned in Listing 35.2 (total_worker_time is reported in microseconds, so the final division by 1000. converts the average to milliseconds):
SELECT
SUM(total_worker_time) / SUM(execution_count)/1000. AS "Avg CPU Time(ms)",
SUM(total_logical_reads) / SUM(execution_count) AS "Avg Reads"
FROM
sys.dm_exec_query_stats
where query_hash = 0x9AB21AC5889FE2D0
go
Avg CPU Time(ms)  Avg Reads
----------------  ---------
164.092000        7
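The aggregation above sums the counters of every cached entry that shares a query hash and only then divides by the total number of executions. The following minimal Python sketch models that arithmetic outside SQL Server; the tuple layout and the sample rows are hypothetical stand-ins for sys.dm_exec_query_stats, not real output.

```python
from collections import defaultdict

def aggregate_by_hash(rows):
    """Per query_hash: SUM(counter) / SUM(execution_count), as in the T-SQL query above."""
    totals = defaultdict(lambda: [0, 0, 0])  # [worker_us, logical_reads, executions]
    for query_hash, worker_us, reads, executions in rows:
        t = totals[query_hash]
        t[0] += worker_us
        t[1] += reads
        t[2] += executions
    return {
        query_hash: {
            # T-SQL divides bigint by bigint first (integer division), then by 1000.
            "avg_cpu_ms": (worker_us // executions) / 1000.0,
            "avg_reads": reads // executions,
        }
        for query_hash, (worker_us, reads, executions) in totals.items()
    }

# Hypothetical rows: (query_hash, total_worker_time in microseconds,
#                     total_logical_reads, execution_count)
sample_rows = [
    ("0x9AB21AC5889FE2D0", 400_000, 20, 2),
    ("0x9AB21AC5889FE2D0", 200_000, 10, 1),
    ("0x0000000000000001", 50_000, 4, 1),
]
stats = aggregate_by_hash(sample_rows)
print(stats["0x9AB21AC5889FE2D0"])  # {'avg_cpu_ms': 200.0, 'avg_reads': 10}
```

Summing before dividing weights each cached entry by its execution count, which averaging the per-entry averages would not do.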
Listing 35.5 provides a sample query using the query hash value to return information
about the top 25 queries ranked by average processing time.
LISTING 35.5 Returning Top 25 Queries Using Query Hash
SELECT TOP 25 query_stats.query_hash AS "Query Hash",
    SUM(query_stats.total_worker_time) / SUM(query_stats.execution_count) AS "Avg CPU Time",
    MIN(query_stats.statement_text) AS "Statement Text"
FROM
    (SELECT QS.*,
        SUBSTRING(ST.text, (QS.statement_start_offset/2) + 1,
            ((CASE QS.statement_end_offset
                WHEN -1 THEN DATALENGTH(ST.text)
                ELSE QS.statement_end_offset END
              - QS.statement_start_offset)/2) + 1) AS statement_text
     FROM sys.dm_exec_query_stats AS QS
     CROSS APPLY sys.dm_exec_sql_text(QS.sql_handle) AS ST) AS query_stats
GROUP BY query_stats.query_hash
ORDER BY 2 DESC;
GO
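The SUBSTRING expression in Listing 35.5 converts byte offsets into character positions: the batch text is nvarchar, so each character occupies 2 bytes, and an end offset of -1 means "to the end of the batch." The Python sketch below mirrors only that arithmetic. It assumes the text contains no UTF-16 surrogate pairs, and the offsets in the example are hypothetical values chosen for illustration, not values read from the DMV.

```python
def statement_text(batch_text, start_offset, end_offset):
    """Mirror the SUBSTRING/CASE arithmetic of Listing 35.5 on a Python string."""
    # Assumption: no surrogate pairs, so 2 bytes per character and
    # len(batch_text) * 2 equals DATALENGTH(ST.text).
    if end_offset == -1:                      # CASE ... WHEN -1 THEN DATALENGTH(ST.text)
        end_offset = len(batch_text) * 2
    start_char = start_offset // 2            # SUBSTRING start is (offset/2) + 1, 1-based
    length = (end_offset - start_offset) // 2 + 1
    # Python slicing, like SUBSTRING, tolerates a length that runs past the end.
    return batch_text[start_char:start_char + length]

batch = "SELECT 1; SELECT 2"
print(statement_text(batch, 20, -1))  # SELECT 2
```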
sys.dm_exec_plan_attributes
If you want to get information about specific attributes of a specific query plan, you use
sys.dm_exec_plan_attributes. This DMV takes a plan_handle as an input parameter (see
Listing 35.1 for an example of a query that you can use to retrieve a query’s plan handle)
and returns one row for each attribute associated with the query plan. These attributes
include information such as the ID of the database context the query plan was generated
in, the ID of the user who generated the query plan, session SET options in effect at the
time the plan was generated, and so on. Many of these attributes are used as part of the
cache lookup key for the plan (indicated by the value 1 in the is_cache_key column).
Following is an example of the output for sys.dm_exec_plan_attributes:
select convert(varchar(30), attribute) as attribute,
convert(varchar(12), value) as value,
is_cache_key
FROM
sys.dm_exec_plan_attributes (0x06000400EBC44D2AB880A006000000000000000000000000)
where is_cache_key = 1
go
attribute value is_cache_key

set_options 187 1
objectid 709739755 1
dbid 4 1
dbid_execute 0 1
user_id -2 1
language_id 0 1
date_format 1 1
date_first 7 1
compat_level 100 1
status 0 1
required_cursor_options 0 1
acceptable_cursor_options 0 1
merge_action_type 0 1
is_replication_specific 0 1
optional_spid 0 1
optional_clr_trigger_dbid 0 1
optional_clr_trigger_objid 0 1
Note the attributes flagged as cache keys for the plan. If any one of these properties does not match the state of the current user session, the plan cannot be reused for that session, and a new plan must be compiled and stored in the plan cache. If you see multiple plans in cache for what appears to be the same query, you can find out why by comparing the values of each plan's cache-key attributes to see where they differ.
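Comparing cache keys by hand gets tedious when a query has several cached plans. As a minimal sketch, assuming you have already copied each plan's attribute, value, and is_cache_key rows into a Python dictionary (a hypothetical shape, not an API), the comparison reduces to a filtered diff. The value "251" and the attribute named example_non_key_attr are made up for illustration.

```python
def cache_key_differences(plan_a, plan_b):
    """Return {attribute: (value_a, value_b)} for cache-key attributes that differ.

    Each plan is a hypothetical dict {attribute: (value, is_cache_key)} built from
    the attribute, value, and is_cache_key columns of sys.dm_exec_plan_attributes.
    """
    diffs = {}
    for attribute in sorted(set(plan_a) | set(plan_b)):
        value_a, key_a = plan_a.get(attribute, (None, False))
        value_b, key_b = plan_b.get(attribute, (None, False))
        # Only cache-key attributes decide whether a plan can be reused.
        if (key_a or key_b) and value_a != value_b:
            diffs[attribute] = (value_a, value_b)
    return diffs

# Hypothetical attribute values for two plans of the same query text.
plan_1 = {"set_options": ("187", True), "dbid": ("4", True),
          "date_format": ("1", True), "example_non_key_attr": ("10", False)}
plan_2 = {"set_options": ("251", True), "dbid": ("4", True),
          "date_format": ("1", True), "example_non_key_attr": ("99", False)}
print(cache_key_differences(plan_1, plan_2))  # {'set_options': ('187', '251')}
```

The non-key attribute differs too, but it is ignored because it plays no part in the cache lookup.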
TIP
If SQL Server has been running for a while, with a lot of activity, the number of plans in
the plan cache can become quite large, resulting in a large number of rows being
returned by the plan cache DMVs. To run your own tests to determine which query
plans get cached and when specific query plans are reused, you should clear out the
cache occasionally. You can use the DBCC FREEPROCCACHE command to clear all
cached plans from memory. If you want to clear only the cached plans for objects or
queries in a specific database, you execute the following command:
DBCC FLUSHPROCINDB (dbid)
Keep in mind that you should run these commands only in a test environment. Running
them on a production server forces subsequent queries to be recompiled, which could
degrade the performance of the currently running applications.
Other Query Processing Strategies
In addition to the optimization strategies covered so far, SQL Server can apply some additional strategies to special types of queries. These strategies help further reduce the cost of executing those queries.
Predicate Transitivity
You might be familiar with the transitive property from algebra. The transitive property
simply states that if A=B and B=C, then A=C. SQL Server supports the transitive property
in its query predicates. Predicate transitivity enables SQL Server to infer a join equality
from two given equalities. Consider the following example:
SELECT *
FROM table1 t1
join table2 t2 on t1.column1 = t2.column1
join table3 t3 on t2.column1 = t3.column1
Using the principle of predicate transitivity, SQL Server is able to infer that t1.column1 is equal to t3.column1. This gives the Query Optimizer another join strategy to consider when optimizing this query (for example, joining t1 directly to t3), which might result in a much cheaper execution plan.
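One way to picture the inference is to treat each equality predicate as an edge and group the connected columns into equivalence classes; any two columns in the same class are equal even if no predicate states it directly. The Python sketch below does this with a small union-find structure. It is an illustration of the transitive property, not SQL Server's internal implementation.

```python
def equivalence_classes(equalities):
    """Group columns linked by equality predicates (A = B and B = C imply A = C)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for left, right in equalities:
        parent[find(left)] = find(right)   # union the two classes

    classes = {}
    for column in parent:
        classes.setdefault(find(column), set()).add(column)
    return list(classes.values())

def implied_joins(equalities):
    """Column pairs equal by transitivity that the query never states directly."""
    stated = {frozenset(pair) for pair in equalities}
    implied = []
    for cls in equivalence_classes(equalities):
        columns = sorted(cls)
        for i, a in enumerate(columns):
            for b in columns[i + 1:]:
                if frozenset((a, b)) not in stated:
                    implied.append((a, b))
    return implied

predicates = [("t1.column1", "t2.column1"), ("t2.column1", "t3.column1")]
print(implied_joins(predicates))  # [('t1.column1', 't3.column1')]
```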
The transitive property can also be applied to SARGs used on join columns. Consider the
following query:
select *
from sales s
join stores st on s.stor_id = st.stor_id
and s.stor_id = 'B199'
Again, using transitive closure, it follows that st.stor_id is also equal to 'B199'. SQL Server recognizes this and can compare the search value against the statistics on both tables to more accurately estimate the number of matching rows from each table.
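The same idea extends to search arguments: a constant attached to one column flows across every join equality that touches that column. This short Python sketch propagates constants to a fixed point; it assumes the predicates are already parsed into pairs and a dictionary, and it does not handle contradictory constants.

```python
def propagate_constants(column_equalities, constant_predicates):
    """Extend column = constant predicates across column = column equalities.

    column_equalities: iterable of (col_a, col_b) join predicates
    constant_predicates: dict {column: constant} taken from the search arguments
    """
    known = dict(constant_predicates)
    changed = True
    while changed:                      # repeat until no column picks up a new constant
        changed = False
        for a, b in column_equalities:
            for src, dst in ((a, b), (b, a)):
                if src in known and dst not in known:
                    known[dst] = known[src]
                    changed = True
    return known

print(propagate_constants([("s.stor_id", "st.stor_id")], {"s.stor_id": "B199"}))
# {'s.stor_id': 'B199', 'st.stor_id': 'B199'}
```

With st.stor_id = 'B199' known, the statistics on stores can be consulted for that value as well, which is the benefit the text describes.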
Group by Optimization
One way SQL Server can process GROUP BY results is to retrieve the matching detail rows into a worktable, sort the rows, and then calculate the aggregates on the groups formed. In SQL Server 2008, the Query Optimizer also may choose to use hashing to organize the data into groups and then compute the aggregates.
The hash aggregation strategy uses the same basic method for grouping and calculating aggregates as a hash join. As each input row is probed against the hash table to determine whether its group already exists, the row's value is aggregated into that group if a hash match is found; otherwise, a new entry is added. The following pseudocode summarizes the hash aggregation strategy:
create a hash table
for each row in the input table
    read the row
    hash the key value
    search the hash table for matches
    if match found
        aggregate the value into the old record
    else
        insert the key and its initial aggregate value into the hash bucket
scan and output the hash table contents
drop the hash table
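The pseudocode translates almost line for line into Python, with a dict standing in for the hash table. The sketch below computes SUM and COUNT per group; the column names stor_id and qty are assumptions for the example.

```python
def hash_aggregate(rows, key, value):
    """Hash aggregation: SUM(value) and COUNT(*) per key, following the pseudocode."""
    hash_table = {}                                   # create a hash table
    for row in rows:                                  # for each row in the input table
        k = row[key]                                  # hash the key value (dict hashes it)
        entry = hash_table.get(k)                     # search the hash table for a match
        if entry is not None:
            entry["sum"] += row[value]                # match: aggregate into the old record
            entry["count"] += 1
        else:
            hash_table[k] = {"sum": row[value], "count": 1}  # no match: insert new entry
    result = [(k, e["sum"], e["count"]) for k, e in hash_table.items()]  # scan and output
    hash_table.clear()                                # drop the hash table
    return result

sales = [{"stor_id": "A", "qty": 5}, {"stor_id": "B", "qty": 3}, {"stor_id": "A", "qty": 2}]
print(hash_aggregate(sales, "stor_id", "qty"))  # [('A', 7, 2), ('B', 3, 1)]
```

The output order here simply follows the dict's insertion order; SQL Server's hash aggregate makes no ordering promise at all.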
For some join queries that contain GROUP BY clauses, SQL Server might perform the grouping operation before processing the join. This can reduce the number of rows in the join input and lower the overall cost of executing the query.
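The Python sketch below compares the two orders of operation on toy data. It assumes the grouping column is the join key and that stor_id is unique in stores, which is the situation where aggregating first gives the same answer. The store names and quantities are hypothetical.

```python
from collections import defaultdict

def join_then_group(sales, stores):
    """Join every sales row to its store, then SUM(qty) per store name."""
    names = {s["stor_id"]: s["name"] for s in stores}
    totals, joined_rows = defaultdict(int), 0
    for row in sales:
        if row["stor_id"] in names:
            joined_rows += 1
            totals[names[row["stor_id"]]] += row["qty"]
    return dict(totals), joined_rows

def group_then_join(sales, stores):
    """Pre-aggregate sales per stor_id, then join the smaller result to stores."""
    partial = defaultdict(int)
    for row in sales:
        partial[row["stor_id"]] += row["qty"]
    names = {s["stor_id"]: s["name"] for s in stores}
    totals, joined_rows = {}, 0
    for stor_id, qty in partial.items():
        if stor_id in names:
            joined_rows += 1
            totals[names[stor_id]] = totals.get(names[stor_id], 0) + qty
    return totals, joined_rows

sales = [{"stor_id": "7066", "qty": 5}, {"stor_id": "7066", "qty": 3},
         {"stor_id": "7131", "qty": 2}, {"stor_id": "7131", "qty": 4}]
stores = [{"stor_id": "7066", "name": "Store A"}, {"stor_id": "7131", "name": "Store B"}]
print(join_then_group(sales, stores))  # ({'Store A': 8, 'Store B': 6}, 4)
print(group_then_join(sales, stores))  # ({'Store A': 8, 'Store B': 6}, 2)
```

Both orders return the same totals, but the join processes 2 rows instead of 4 when the grouping happens first.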
NOTE
One important point to keep in mind is that regardless of the GROUP BY strategy
employed, the rows are not guaranteed to be returned in sorted order by the grouping
column(s) as they were in earlier releases. If the results must be returned in a specific
sort order, you need to use the ORDER BY clause with GROUP BY to ensure ordered
results. You might want to get into the habit of doing this regularly.
Queries with DISTINCT
When the DISTINCT clause is specified in a query, SQL Server can eliminate duplicate rows
by the sorting the result set in a worktable to identify and remove the duplicates, similar
to how a worktable is used for GROUP BY queries. In SQL Server 2008, the Query Optimizer
can also employ a hashing strategy similar to that used for GROUP BY to return only the
distinct rows before the final result set is determined.
In addition, if the Query Optimizer can determine at compile time that there will be no
possibility of duplicate rows in the result set (for example, each row contains the table’s
primary key), the strategies for removing duplicate rows are skipped altogether.
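The three cases in this section (sort-based removal, hash-based removal, and skipping the step when rows are known to be unique) can be sketched in Python as follows. The strategy and known_unique parameters are illustrative names, not SQL Server settings; the optimizer chooses on its own.

```python
def distinct(rows, strategy="hash", known_unique=False):
    """Return rows with duplicates removed using the chosen strategy."""
    if known_unique:
        # e.g. every row carries the table's primary key: no duplicates are possible,
        # so the duplicate-removal step is skipped entirely
        return list(rows)
    if strategy == "sort":
        # sort so duplicates become adjacent, then keep the first row of each run
        result = []
        for row in sorted(rows):
            if not result or result[-1] != row:
                result.append(row)
        return result
    # hash strategy: keep a row only the first time it is seen
    seen, result = set(), []
    for row in rows:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result

states = [("CA",), ("OR",), ("CA",), ("WA",), ("OR",)]
print(distinct(states, strategy="sort"))  # [('CA',), ('OR',), ('WA',)]
```

The sorted output of the sort strategy is a side effect, not a guarantee, for the same reason given in the preceding note about GROUP BY.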
Queries with UNION
When you specify UNION in a query, SQL Server merges the result sets, applying one of the merge or concatenation operators together with a sorting strategy to remove any duplicate rows. Figure 35.25 shows an example, similar to the OR strategy, in which the rows are concatenated and then sorted to remove any duplicates.
If you specify UNION ALL in a query, SQL Server simply appends the result sets together. No
intermediate sorting or merge step is needed to remove duplicates. Figure 35.26 shows the
same query as in Figure 35.25, except that a UNION ALL is specified.
When you know that you do not need to worry about duplicate rows in a UNION result set,
always specify UNION ALL to eliminate the extra overhead required for sorting.
When a UNION is used to merge large result sets, SQL Server 2008 may opt to use a merge join or hash match operation to remove any duplicate rows. Figure 35.27 shows an example of a UNION query where the rows are concatenated, and then a hash match operation is used to remove any duplicates.
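The cost difference between UNION ALL and UNION comes down to the extra duplicate-removal step. This Python sketch models UNION ALL as plain concatenation and UNION as concatenation followed by either a sort or a hash-based pass; it illustrates the plan shapes in the figures rather than reproducing SQL Server's operators.

```python
def union_all(*inputs):
    """UNION ALL: append the inputs; there is no duplicate-removal step at all."""
    combined = []
    for rows in inputs:
        combined.extend(rows)
    return combined

def union_concat_sort(*inputs):
    """UNION: concatenate, then sort and drop adjacent duplicates."""
    combined = sorted(union_all(*inputs))
    return [row for i, row in enumerate(combined) if i == 0 or combined[i - 1] != row]

def union_concat_hash(*inputs):
    """UNION: concatenate, then remove duplicates with a hash table (dict keys)."""
    return list(dict.fromkeys(union_all(*inputs)))

west = [("CA",), ("OR",)]
coast = [("CA",), ("WA",)]
print(union_all(west, coast))          # [('CA',), ('OR',), ('CA',), ('WA',)]
print(union_concat_sort(west, coast))  # [('CA',), ('OR',), ('WA',)]
```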
FIGURE 35.25 An execution plan for a UNION query.
FIGURE 35.26 An execution plan for a UNION ALL query.
FIGURE 35.27 An execution plan for a UNION query, using a hash match to eliminate duplicate rows.
Parallel Query Processing
The query processor in SQL Server 2008 includes parallel query processing, an execution strategy that can improve the performance of complex queries on computers with more than one processor.
SQL Server inserts exchange operators into each parallel query to build and manage the query execution plan. The exchange operator is responsible for providing process management, data redistribution, and flow control. The exchange operators are displayed in the query plans as the Distribute Streams, Repartition Streams, and Gather Streams logical operators. One or more of these operators can appear in the execution plan output of a parallel query.
Whereas a parallel query execution plan can use more than one thread, a serial execution plan, used by a nonparallel query, uses only a single thread for its execution. Before a query executes, SQL Server determines whether the current system state and configuration allow for parallel query execution. If parallel query execution is justified, SQL Server determines the optimal number of threads, called the degree of parallelism, and distributes the query workload across those threads. The parallel query uses the same number of threads until the query completes. SQL Server reexamines the optimal degree of parallelism each time a query execution plan is retrieved from the procedure cache, so individual executions of the same query could be assigned different degrees of parallelism.
SQL Server calculates the degree of parallelism for each instance of a parallel query execution by using the following criteria:
• How many processors does the computer running SQL Server have, and how many are allocated to SQL Server?
If two or more processors are allocated to SQL Server, it can use parallel queries.
• What is the number of concurrent active users?
The degree of parallelism is inversely related to CPU usage. The Query Optimizer assigns a lower degree of parallelism if the CPUs are already busy.
• Is sufficient memory available for parallel query execution?
Queries, like other processes, require resources to execute, particularly memory. Obviously, a parallel query demands more memory than a serial query. More importantly, as the degree of parallelism increases, so does the amount of memory required. The Query Optimizer carefully considers this in developing a query execution plan and could either adjust the degree of parallelism or use a serial plan to complete the query.
• What is the type of query being executed?
Queries that consume many CPU cycles justify using a parallel execution plan. Some examples are joins of large tables, substantial aggregations, and sorting of large result sets. The Query Optimizer determines whether to use a parallel or serial plan by checking the value of the cost threshold for parallelism.
• Is a sufficient number of rows being processed in the given stream?
If the Query Optimizer determines that the number of rows in a stream is too low, it does not execute a parallel plan. This prevents scenarios where the parallel execution costs exceed the benefits of executing a parallel plan.
Regardless of the answers to the previous questions, the Query Optimizer does not use a parallel execution plan for a query if any one of the following conditions is true:
• The serial execution cost of the query is not high enough to consider an alternative parallel execution plan.
• A serial execution plan exists that is estimated to be faster than any possible parallel execution plan for the particular query.
• The query contains scalar or relational operators that cannot be run in parallel.
Parallel Query Configuration Options
Two server configuration options, maximum degree of parallelism and cost threshold for parallelism, affect whether a parallel query plan is considered. Although doing so is not recommended, you can change the default settings for each. On single-processor machines, these settings are ignored.
The maximum degree of parallelism option limits the number of threads to use in a parallel plan execution. The range of possible values is 0 to 32. This value is configured to 0 by default, which allows the Query Optimizer to use up to the actual number of CPUs allocated to SQL Server. If you want to suppress parallel processing completely, set the value to 1.
The cost threshold for parallelism option establishes the threshold value the Query Optimizer uses when deciding whether to consider parallel query execution plans. If the estimated cost of executing a serial plan is greater than the value set for the cost threshold for parallelism, a parallel plan is considered. This value is defined by the estimated time, in seconds, to execute the serial plan. The range of values for this setting is 0 to 32767. The default value is 5. If the maximum degree of parallelism is set to 1, or if the computer has a single processor, the cost threshold for parallelism value is ignored.
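To show how the two configuration options interact with the rules listed earlier, here is a deliberately simplified Python model. The function choose_dop and its parameters are hypothetical; the real optimizer also weighs current CPU load, available memory, and row counts, and it compares the costs of actual serial and parallel plans.

```python
def choose_dop(serial_cost, cost_threshold, max_dop, cpus_for_sql,
               has_serial_only_operator=False):
    """Hypothetical model: return 1 for a serial plan, else the number of threads."""
    if cpus_for_sql < 2 or max_dop == 1:
        return 1        # single CPU or parallelism disabled: the threshold is ignored
    if has_serial_only_operator:
        return 1        # an operator that cannot run in parallel forces a serial plan
    if serial_cost <= cost_threshold:
        return 1        # serial plan is cheap enough; no parallel plan is considered
    # max_dop = 0 means "use up to all CPUs allocated to SQL Server"
    return cpus_for_sql if max_dop == 0 else min(max_dop, cpus_for_sql)

print(choose_dop(serial_cost=20.0, cost_threshold=5, max_dop=0, cpus_for_sql=8))  # 8
print(choose_dop(serial_cost=20.0, cost_threshold=5, max_dop=2, cpus_for_sql=8))  # 2
```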
You can modify the settings for the maximum degree of parallelism and cost threshold for parallelism server configuration options either with the sp_configure system stored procedure or through SSMS. To set them with sp_configure, run the following commands from an SSMS query window or from SQLCMD:
USE master
go
exec sp_configure 'show advanced options', 1
GO
RECONFIGURE
GO
exec sp_configure 'max degree of parallelism', 2
exec sp_configure 'cost threshold for parallelism', 15
RECONFIGURE
GO
To set these configuration options via SSMS, right-click the SQL Server instance in the
Object Explorer and then click Properties. In the Server Properties dialog, select the
Advanced page. The parallelism options are near the bottom, as shown in Figure 35.28.

Identifying Parallel Queries
You can identify when a parallel execution plan is being chosen by displaying the graphical execution plan in SSMS. The graphical execution plan uses icons to represent the execution of specific statements and queries in SQL Server. The execution plan output for every parallel query has at least one of these three logical operators:
• Distribute Streams: Receives a single input stream of records and produces multiple output streams. The contents and form of the record are unchanged. All records enter through the same single input stream and appear in one of the output streams, preserving the relative order.
• Gather Streams: Assembles multiple input streams of records and yields a single output stream. The relative order of the records, contents, and form is maintained.
• Repartition Streams: Accepts multiple input streams and produces multiple streams of records. The record contents and format are unchanged.
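The three exchange operators differ only in how many streams go in and how many come out. The Python sketch below models them with lists instead of threads. The round-robin distribution and hash-based repartitioning are assumptions chosen for illustration (SQL Server supports several partitioning schemes), and the gather step here simply concatenates streams instead of merging by a sort order.

```python
from itertools import chain

def distribute_streams(rows, n):
    """1 stream -> n streams (round-robin); rows unchanged, relative order kept per stream."""
    streams = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        streams[i % n].append(row)
    return streams

def repartition_streams(streams, n, key):
    """m streams -> n streams, routed by a hash of the key so equal keys share a stream."""
    out = [[] for _ in range(n)]
    for row in chain.from_iterable(streams):
        out[hash(key(row)) % n].append(row)
    return out

def gather_streams(streams):
    """n streams -> 1 stream; each input stream's internal order is preserved."""
    return list(chain.from_iterable(streams))

parts = distribute_streams(list(range(10)), 3)
print(parts)  # [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
by_parity = repartition_streams(parts, 4, key=lambda x: x % 2)
print(sorted(gather_streams(by_parity)) == list(range(10)))  # True
```

Routing by a hash of the key is what lets a later parallel hash join or aggregate work on each stream independently, since all rows with a given key end up in the same stream.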
FIGURE 35.28 Setting SQL Server parallelism options.
Figure 35.29 shows a portion of a sample query plan that uses parallel query techniques—
both repartition streams and gather streams.
Parallel Queries on Partitioned Objects
SQL Server 2008 improves query processing performance for partitioned objects when running parallel plans. The improvements include changes in the way parallel and serial plans are represented and enhancements to the partitioning information provided in both compile-time and runtime execution plans. SQL Server 2008 also automates and improves the thread partitioning strategy for parallel query execution plans on partitioned objects.
In addition to the performance improvements, query plan information has also been improved in SQL Server 2008, which now provides the following information related to partitioned objects:
• The partitions accessed by the query, available in runtime execution plans.
• An optional Partitioned attribute indicating that an operation, such as a seek, scan, insert, update, merge, or delete, is performed on a partitioned table.
• Summary information that provides a total count of the partitions accessed. This information is available only in runtime plans.