Chapter 36  Understated changes in SQL Server 2005 replication

Reading the text of hidden replication stored procedures
Consider the case when you need to examine the text of a replication system stored procedure. You might want to see this text for several reasons. Perhaps you want to perform the same actions as the SQL Server Management Studio (SSMS) GUI, but hope to gain programmatic control of the process for yourself. Perhaps you want to get a better understanding of what happens under the hood in order to increase your replication expertise and troubleshooting ability. Whatever the reason, usually you will need to know the name of the system stored procedure and then use sp_helptext or the OBJECT_DEFINITION function to see the whole procedure definition. For some of the replication stored procedures, though, you will find that the text is hidden and these two methods will not work. For example, if you try the following code in a normal query window, you will have NULL returned:

SELECT OBJECT_DEFINITION(OBJECT_ID('sys.sp_MSrepl_helparticlecolumns'))
On the other hand, if you use the dedicated administrator connection (DAC), you will be able to access the underlying text of the procedure. The process is pretty straightforward and is shown here:

1. Enable remote access to the DAC:

   sp_configure 'remote admin connections', 1;
   GO
   RECONFIGURE;
   GO

2. Connect to the server using the DAC. Use a query window to connect to yourservername by using ADMIN:yourservername in the server name section (or use the sqlcmd command-prompt utility with the -A switch).

3. Execute the script:

   SELECT OBJECT_DEFINITION(OBJECT_ID('sys.sp_MSrepl_helparticlecolumns'))
You should find the procedure text returned as expected, and if you are on a production system, don't forget to close the DAC connection when you are done with it!
Creating snapshots without any data—only the schema
When we look in BOL at the definition of a replication stored procedure or a replication agent, we find that the permitted values for the parameters are all clearly listed. But it occasionally becomes apparent that there are other acceptable values that have never been documented. The exact number of these hidden parameters is something we'll never know, and in all cases they will be unsupported for the general public. Even so, sometimes they start being used and recommended prior to documentation, usually in order to fix a bug. A case in point is the sp_addpublication procedure, in which there is now the acceptable value of database snapshot for @sync_method. This value was for some time known about, undocumented and yet used, but it now exists in fully documented (and supported) form in BOL. The usual caveats apply if you decide to use any such workaround; you must take full responsibility, and any such modifications are completely unsupported.
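As a rough illustration, creating a publication with this value might look like the following sketch; the publication name and the other parameter values are placeholders, and only @sync_method is the point of interest here (database snapshots also require Enterprise Edition):

-- Hypothetical example: generate the publication snapshot from a database snapshot
-- so that the published tables are not locked while the Snapshot Agent runs.
EXEC sp_addpublication
    @publication = N'TestPublication',      -- placeholder name
    @sync_method = N'database snapshot',    -- the value discussed above
    @repl_freq   = N'continuous',
    @status      = N'active';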

Another example that exists in the public domain but is not yet in BOL is also available. If your distributor is SQL Server 2005, the Snapshot Agent has an undocumented /NoBcpData switch that will allow you to generate a snapshot without any BCP data. This can be useful when you need to (quickly) debug schema problems generated on initialization.
You can access the command line for running the Snapshot Agent from SSMS as follows:

1. Expand the SQL Server Agent node.
2. Expand the Jobs node.
3. Double-click on the Snapshot Agent job, which typically has a name of the form <Publisher>_<PublisherDB>_<Publication>_<number> (for example, Paul-PC-TestPub-TestPublication-1). You'll know if this is the correct job because the category will be listed as REPL-Snapshot.
4. Select Steps from the left pane.
5. Select the second Run Agent step, and click the Edit button to open it. You should see the command line in the Command text box.
Once you have added the /NoBcpData parameter to the command line, as shown in figure 1, click OK in the Job Step dialog box and click OK again in the Job dialog box to make sure that the change is committed. The /NoBcpData switch tells the Snapshot Agent to create empty BCP files instead of bulk-copying data out from the published tables.
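If you prefer to script the change rather than edit the job step in SSMS, something along the following lines should work; the job name is a placeholder, and you should confirm on your own server that step 2 is the Run Agent step before running it:

-- Hypothetical sketch: append the unofficial /NoBcpData switch to the Snapshot
-- Agent's Run Agent job step instead of editing it in the dialog box.
DECLARE @job sysname, @cmd nvarchar(max);
SET @job = N'PAUL-PC-TestPub-TestPublication-1';   -- placeholder Snapshot Agent job name

SELECT @cmd = s.command
FROM msdb.dbo.sysjobs AS j
JOIN msdb.dbo.sysjobsteps AS s ON s.job_id = j.job_id
WHERE j.name = @job AND s.step_id = 2;             -- the Run Agent step

EXEC msdb.dbo.sp_update_jobstep
    @job_name = @job,
    @step_id  = 2,
    @command  = @cmd + N' /NoBcpData';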
Figure 1  In the Snapshot Agent's job step, the unofficial (unsupported!) /NoBcpData is entered.
Some changed replication defaults

Many replication defaults changed between SQL Server 2000 and SQL Server 2005—far too many to cover in this section. Most of the new defaults are obvious and self-explanatory, but occasionally some of the following changes catch people out.
ROW-LEVEL CONFLICT DETECTION
In SQL Server 2000, column-level conflict detection was the default for merge replication, and this has changed to row-level conflict detection in SQL Server 2005. Which one is correct for your business is something only you can determine, but if you previously left this setting alone and intend to do the same now, you might find an unexpected set of records in the conflict viewer.
NEW MERGE IDENTITY RANGE MANAGEMENT
The following changes to identity range management for merge publications have been introduced in SQL Server 2005:

- Range allocation is automatic. In SQL Server 2005 merge publications, the article identity range management is set to automatic by default. In SQL Server 2000, the default identity range management was manual. What is the difference? Automatic range management ensures that each subscriber is reseeded with its own identity ranges without any extra configuration, whereas manual means that you will need to change either the seed or the increment of the identity range on each subscriber to avoid conflicts with the publisher. If you previously relied on leaving this article property alone and chose to manually administer the identity range, beware because a range of 1,000 values will have already been allocated to each of your subscribers.
- Default range sizes have increased. The publisher range has changed from 100 to 10,000, and the subscriber range size has increased from 100 to 1,000.
- Overflow range is allocated. The merge trigger code on the published article implements an overflow range that is the same size as the normal range. This means that by default you will have two ranges of 1,000 values allocated to a subscriber. The clever part is that the overflow range is automatically allocated by the merge insert trigger and therefore doesn't require a connection to the publisher. However, the reseeding performed in the trigger is restricted to those cases where a member of the db_owner role does the insert.
- The threshold parameter is no longer used. Although it appears in the article properties dialog box much the same as in SQL Server 2000, the threshold parameter only applies to subscribers running SQL Server Mobile or previous versions of SQL Server.
“NOT FOR REPLICATION” TREATMENT OF IDENTITY COLUMNS
Identity columns have a new default behavior. These columns are automatically marked as Not for Replication (NFR) on the publisher and are transferred with the identity NFR property intact at the subscriber. This retention of the NFR property applies to both transactional and merge replication.
Why might this be a useful change? First, it means that you don't need to wade through all the tables before creating the publication in order to manually set each identity column as NFR. This is a huge improvement because the method used in SQL Server 2000 by Enterprise Manager to set the NFR attribute involved making whole (time-consuming) copies of the table data. It also means that if you are using transactional replication as a disaster recovery solution, there is now one less hoop you will need to jump through on failover because you don't have to change this setting on each table at the subscriber. That particular part of your process can now be removed.

(If you are now thinking that it is not possible in T-SQL to directly add the NFR attribute to an existing identity column, please take a look inside the sp_identitycolumnforreplication system stored procedure, because this is the procedure that marks the identity column as NFR.)
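If you do decide to flip the attribute yourself, a sketch along the following lines has been used in the field; the table name is a placeholder, the procedure is undocumented, and you should read its definition and test on a non-production copy before relying on it:

-- Hypothetical sketch using the undocumented internal procedure mentioned above.
-- The second argument appears to be a flag: 1 marks the identity column as NFR,
-- 0 clears it (verify against the procedure text with sp_helptext first).
DECLARE @objid int;
SET @objid = OBJECT_ID(N'dbo.MyTable');             -- placeholder table name
EXEC sys.sp_identitycolumnforreplication @objid, 1;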
DEFERRED UPDATE TRACE FLAGS
For transactional replication, you might be using deferred update trace flags unnecessarily. In SQL Server 2000, updates to columns that do not participate in a unique key constraint are replicated as updates to the subscriber unless trace flag 8202 is enabled, after which they are treated as deferred updates (paired insert/deletes). On the other hand, updates to columns that do participate in unique constraints are always treated as deferred updates (paired insert/deletes) unless trace flag 8207 is enabled. In SQL Server 2005, all such changes are replicated as updates on the subscriber regardless of whether the columns being updated participate in a unique constraint or not.
PARTITIONING OF SNAPSHOT FILES
The following change to a replication default is more complicated to explain, but it deals with a significant improvement that has been made to the initial snapshot process. In SQL Server 2000, when an article is BCP'd to the filesystem (the distribution working folder) during the snapshot generation, there is always one file created that contains the table's data. In SQL Server 2005, when you look in the distribution working folder after creating a snapshot, you might be surprised to find many such files for each article, each containing a separate part of the table data, as shown in figure 2.
Clearly there has been a big change in the processing rules. I'll refer to this overall process of splitting data files as BCP partitioning, borrowing the term from a Microsoft developer who once pointed this out in a posting in the Microsoft Replication Newsgroup (microsoft.public.sqlserver.replication). This section explains why BCP partitioning exists, what the expected behavior is, and how to troubleshoot if it all goes wrong.

BCP partitioning has several benefits. First, it helps in those cases where there has been a network outage when the snapshot is being applied to the subscriber. In SQL Server 2000, this would mean that the complete snapshot would have to be reapplied, and in the case of concurrent snapshots, this would all have to be done in one transaction. In contrast, if you have a SQL Server 2005 distributor and SQL Server 2005 subscribers, there is now much greater granularity in the process. The article rows are partitioned into the separate text files, and each partition is applied in a separate transaction, meaning that after an outage, the snapshot distribution is able to continue with the partition where it left off and complete the remaining partitions. For a table containing a lot of rows, this could lead to a huge saving in time.

Other useful side effects are that this can cause less expansion of the transaction log (assuming that the migration crosses a backup schedule or the subscriber uses the simple recovery model), and it can lead to paths of parallel execution of the BCP process for those machines having more than one processor. (It is true that parallel execution existed in SQL Server 2000, but this was only for the processing of several articles concurrently and not for a single table.)

Similarly, the same benefits apply when creating the initial snapshot using the Snapshot Agent. Note that the -BcpBatchSize parameter of the Snapshot and Distribution Agents governs how often progress messages are logged and has no bearing at all on the number of partitions.
Figure 2  Snapshot data from a source table is now partitioned across several text files (for example, Articlename#1.bcp, Articlename#2.bcp, and so on).
To disable BCP partitioning, you can add the unofficial -EnableArticleBcpPartitioning 0 switch to the Snapshot Agent and a single data file will be produced, just like in SQL Server 2000. Why would you want to turn off such a useful feature? Well, anecdotally, things may get worse for folks who don't start off with empty tables (archiving or roll-up scenarios) or if the CPU, disk I/O, or network bandwidth is the bottleneck in the attempt to extract more snapshot processing throughput when using BCP partitioning.
Finally, for those tables that expand the transaction log, some DBAs like to enable the bulk-logged recovery mode to try to minimize logging, but this will not always work when dealing with multiple partitions. To ensure that there is a maximum chance of going down the bulk-logged path, you should use -MaxBcpThreads X (where X > 1) for the Distribution Agent and ensure that the target table doesn't have any indexes on it before the Distribution Agent delivers the snapshot.
More efficient methodologies
In the previous section, we looked at several undocumented techniques that can be used to enhance the replication behavior. We'll now look at some capabilities that are fully documented, but that are not always understood to be replacements for less-efficient methodologies.
Remove redundant pre-snapshot and post-snapshot scripts
In SQL Server 2000 publications, we sometimes use pre-snapshot and post-snapshot scripts. The pre-snapshot scripts are T-SQL scripts that run before the snapshot files are applied, whereas the post-snapshot scripts apply once the snapshot has completed. Their use is often to overcome DRI (declarative referential integrity) issues on the subscriber.

Remember that the initialization process starts by dropping tables on the subscriber. If all the tables on the subscriber originate from one publication, this is not an issue, but if there is more than one publication involved, we might have a scenario where the dropping of tables at the subscriber during initialization would be invalid because of relationships between articles originating from different publications. There might also be other tables on the subscriber that are related to replicated articles and that are not themselves part of any publication. Either way, we find the same DRI problem when initialization tries to drop the subscriber's table. In such cases, the pre-snapshot and post-snapshot scripts are needed—a pre-snapshot script would drop the foreign keys to allow the tables to be dropped, and a post-snapshot script would then add the foreign keys back in. Such scripts are not difficult to write, but each needs to be manually created and maintained, causing (another!) maintenance headache for the DBA.
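For context, this is roughly how such scripts are attached to a publication; the publication name and UNC paths are placeholders, and it is exactly this pairing of scripts that the automatic mechanism described next can make redundant:

-- Hypothetical sketch: register manually maintained pre- and post-snapshot scripts.
-- The files must be reachable by the Distribution Agent.
EXEC sp_changepublication
    @publication = N'TestPublication',
    @property    = N'pre_snapshot_script',
    @value       = N'\\fileserver\repl\drop_foreign_keys.sql';

EXEC sp_changepublication
    @publication = N'TestPublication',
    @property    = N'post_snapshot_script',
    @value       = N'\\fileserver\repl\add_foreign_keys.sql';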
In SQL Server 2005 there is a new, automatic way of achieving this on initialization at the subscriber. Initially, there is a call to the sys.sp_MSdropfkreferencingarticle system stored procedure, which saves the relevant DRI information to the following three metadata tables:

- dbo.MSsavedforeignkeys
- dbo.MSsavedforeignkeycolumns
- dbo.MSsavedforeignkeyextendedproperties

Once the information is safely hived away, the foreign keys are dropped. To re-add the foreign keys, the Distribution Agent calls the new sp_MSrestoresavedforeignkeys system stored procedure once the snapshot has been applied. Note that all this happens automatically and requires no manual scripts to be created.
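If you want to watch this mechanism at work, you can peek at the saved metadata in the subscription database while a snapshot is being delivered; a minimal sketch:

-- Run in the subscription database during snapshot delivery: the foreign keys that
-- initialization has saved (and dropped) and will restore once the snapshot is in.
SELECT * FROM dbo.MSsavedforeignkeys;
SELECT * FROM dbo.MSsavedforeignkeycolumns;
SELECT * FROM dbo.MSsavedforeignkeyextendedproperties;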
Take a look at your existing pre-snapshot and post-snapshot scripts. If they deal with the maintenance of foreign keys, there's a good chance they are doing work that is already done by default, in which case you'll be able to drop the scripts entirely and remove the maintenance issue.
Replace merge -EXCHANGETYPE parameters
In SQL Server 2005 merge replication, we can now mark articles as download-only, meaning that changes to the table are only allowed at the publisher and not at the subscriber. Previously, in SQL Server 2000, we would use the -EXCHANGETYPE value to set the direction of merge replication changes. This was implemented by manually editing the Merge Agent's job step and adding -EXCHANGETYPE 1|2|3 as text.

When using SQL Server 2000, entering a value of -EXCHANGETYPE 2 means that changes to a replicated article at the subscriber are not prohibited, are recorded in the merge metadata tables via merge triggers, and are subsequently filtered out when the Merge Agent synchronizes. This means there may be a huge amount of unnecessary metadata being recorded, which slows down both the data changes made to the table and the subsequent synchronization process.
This -EXCHANGETYPE setting is not reflected directly in the GUI and is hidden away in the text of the Merge Agent's job. Despite being a maintenance headache and causing an unnecessary slowing down of synchronization, it was the only way of achieving this end, and judging by the newsgroups, its use was commonplace.
In SQL Server 2005, when adding an article, there is an option to define the subscriber_upload_options either using the article properties screen in the GUI or in code, like this:

sp_addmergearticle @subscriber_upload_options = 1

This parameter defines restrictions on updates made at a subscriber. The parameter value of 1 is described as “download only, but allow subscriber changes” and seems equivalent to the -EXCHANGETYPE = 2 setting mentioned previously, but in the SQL Server 2005 case there are no triggers at all on the subscriber table. Another distinction is that this setting is made at the more granular article level rather than set for the entire publication. This means that although the -EXCHANGETYPE and sp_addmergearticle methods are logically equivalent, the implementation has become much more sophisticated in SQL Server 2005. Triggers that unnecessarily log metadata at the subscriber are no longer fired; therefore both subscriber data changes and the subsequent synchronization are significantly faster.

Put simply, you should replace the use of EXCHANGETYPE with download-only articles!
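A rough sketch of adding a download-only article in code follows; the publication, article, and table names are placeholders, and all other sp_addmergearticle parameters are left at their defaults:

-- Hypothetical sketch: publish dbo.Prices as a download-only merge article.
-- @subscriber_upload_options: 1 = download only but allow subscriber changes,
--                             2 = download only, subscriber changes prohibited.
EXEC sp_addmergearticle
    @publication               = N'TestPublication',
    @article                   = N'Prices',
    @source_object             = N'Prices',
    @source_owner              = N'dbo',
    @subscriber_upload_options = 2;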
Incidentally, this setting is also implemented by a separate check box in SSMS, as shown in figure 3. This check box does a similar job but sets the value of @subscriber_upload_options to 2, which again makes the changes download-only, but in this case any subscriber changes are prohibited and rolled back.

Figure 3  Merge replication articles can be marked as download-only to prevent subscriber changes and reduce metadata.




Summary
We have looked at many of the lesser-known replication techniques useful in SQL Server 2005. Some of these involve using parameters or procedures that are partially documented but that might help solve a particular issue. Other methods are fully documented, but we have looked at how these methods can be used to replace replication techniques used in SQL Server 2000 and improve our replication implementation and reduce administration.
About the author
Paul Ibison is a contractor SQL Server DBA in London. He runs the website www.replicationanswers.com—the only site dedicated to SQL Server replication—and has answered over 6,000 questions on the Microsoft SQL Server Replication newsgroup. When not working, he likes spending time with his wife and son, Ewa and Thomas, going fell-walking in the Lake District, and learning Ba Gua, a Chinese martial art.
Chapter 37  High-performance transactional replication

Hilary Cotter
The purpose of this chapter is to educate DBAs on how to get maximum performance from their high-performance transactional replication topology across all versions and editions. Most DBAs are concerned with latency—in other words, how old the transactions are when they are applied on the Subscriber.

To set expectations, you should know that the minimum latency of any transactional replication solution will be several seconds (lower limits are between 1 and 2 seconds). Should you need replication solutions which require lower latencies, you should look at products like Golden Gate, which is an IP application that piggybacks off the Log Reader Agent.
Focusing solely on latency will not give you a good indication of replication performance. The nature of your workload can itself contribute to larger latencies. For example, transactions consisting of single insert statements can be replicated with small latencies (that is, several seconds), but large batch operations can have large latencies (that is, many minutes or even hours). Large latencies in themselves are not necessarily indicative of poor replication performance, insufficient network bandwidth, or inadequate hardware to support your workload.

Consequently, in this study we'll be looking at the following:

- Throughput—The number of transactions and commands SQL Server can replicate per second. These can be measured by the performance monitor counters SQLServer:Replication Dist:Dist:Delivered Trans/sec and SQLServer:Replication Dist:Dist:Delivered Cmds/sec.
- Worker time—How long it takes for SQL Server to replicate a fixed number of transactions and commands. This statistic is logged when the replication agents are run from the command line.
- Latency—How old the transactions are when they arrive at the Subscriber. Latency can be measured using the performance monitor counter SQLServer:Replication Dist:Dist:Delivery Latency.

Although these performance monitor counters are the best way to get a handle on your current throughput and latency in your production environments, in this study we'll be focusing primarily on throughput. We'll focus mainly on worker time, or how long the distribution agent has to work to replicate a given set of commands. We'll focus on the Distribution Agent metrics, as the Log Reader is rarely the bottleneck in a replication topology. Additionally the Log Reader Agent operates asynchronously from the Distribution Agent; therefore, the Log Reader Agent can keep current with reading the log, while the Distribution Agent can be experiencing high latencies. By studying the output of the replication agents themselves, when you replay your workloads through them (or measure your workloads as they are replicated by the agents), you can determine the optimal configuration of profile settings for your workloads, and determine how to group articles into different publications for the maximum throughput.

This chapter assumes that you have a good understanding of replication concepts. Should you be unfamiliar with replication concepts, I advise you to study the section Replication Administrator InfoCenter in Books Online, accessible online at http://msdn.microsoft.com/en-us/library/ms151314(SQL.90).aspx.

Before we begin it is important to look at factors that are the performance kiss of death to any replication solution. After we look at these factors and possible ways to mitigate them, we'll look at tuning the replication agents themselves for maximum performance.
Performance kiss of death factors in transactional replication

The following factors will adversely affect the throughput of any replication solution:

- Batch updates
- Replicating text
- Logging
- Network latency
- Subscriber hardware
- Subscriber indexes and triggers
- Distributor hardware
- Large numbers of push subscriptions

We'll look at each of these in turn.
Batch updates
Transactional replication replicates transactions within a transactional context—hence the name transactional. This means that if I do a batch update, insert, or delete, the batch is written in the log as singleton commands. Singletons are data manipulation language (DML) commands that affect at most one row. For example, the following are all singletons:

insert into tableName (Col1, Col2) values(1,2)
update tableName set Col1=1, Col2=2 where pk=1
delete from tableName where pk=1
Each singleton is wrapped in a transaction. Contrast this with the following batch updates (the term update refers to any DML—an insert, update, or delete):

insert into tableName
select * from tableName1

update tableName set col1=1 where pk<=20

delete from tableName where pk<=20

The batch insert statement will insert as many rows as there are in tableName1 into tableName (as a transaction). Assuming there were 20 rows with a pk less than or equal to 20 in tableName, 20 rows would be affected by the batch update and the batch delete.
If you use any transaction log analysis tool, you'll see that the batch updates are decomposed into singleton commands. The following update command

update tableName set col1=1 where pk<=20

would be written in the log as 20 singleton commands, that is:

update tableName set col1=1 where pk=1
update tableName set col1=1 where pk=2
update tableName set col1=1 where pk=3
...
update tableName set col1=1 where pk=20

The Log Reader Agent reads committed transactions and their constituent singleton commands in the log and writes them to the distribution database as the constituent commands.
Details about the transaction are written to MSrepl_transactions, and details about its constituent commands are written to MSrepl_commands.

The Distribution Agent wakes up (if scheduled) or polls (if running continuously) and reads the last applied transaction on the subscription database for that publication. It then reads MSrepl_transactions on the distribution database and applies the corresponding commands for that transaction it finds in MSrepl_commands one by one on the Subscriber.

Transactions are committed to the database depending on the settings of the CommitBatchSize and CommitBatchThreshold parameters for the Distribution Agent. We'll talk about these settings later.
Key to understanding the performance impact of this architecture is realizing that replicating large transactions means that a transaction will be held on the Subscriber while all the singleton commands are being applied on the Subscriber. Then a commit is issued. This allows the Distribution Agent to roll back the entire transaction, should there be a primary key violation, foreign key violation, lack of transactional consistency (no rows affected), or some other event that causes the DML to fail (for example, the subscription database transaction log filling up). This can mean that a lengthy period of time is required to apply large batch updates. While these batch updates are being applied, the Distribution Agent will wrap them in a transaction so that it can roll them back on errors, and Subscriber resources are consumed to hold this transaction open. Latencies that were previously several seconds can quickly grow to many minutes, and occasionally to hours (for large numbers of modified rows). SQL Server will get bogged down when replicating transactions that affect large numbers of rows—typically in the tens or hundreds of thousands of rows. Strategies for improving performance in this regard are presented in the sections that follow.
REPLICATING THE EXECUTION OF STORED PROCEDURES
One strategy is to do your batch DML through a stored procedure and then replicate the execution of the stored procedure. If you choose to replicate the execution of a stored procedure, every time you execute that stored procedure its name and its parameters will be written to the log, and the Log Reader Agent will pick it up and write it to the distribution database, where the Distribution Agent will pick it up and apply it on the Subscriber. The performance improvements are due to two reasons:

- Instead of 100,000 commands (for example) being replicated, only one stored procedure statement would be replicated.
- The Log Reader Agent has to read only the stored procedure execution statement from the log, and not the 100,000 constituent singleton commands.

Naturally this will only work if you have a small number of parameters to pass. For example, if you're doing a batch insert of 100,000 rows, it will be difficult to pass the 100,000 rows to your stored procedure.
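A sketch of how such an article is defined follows; the publication and procedure names are placeholders. The 'serializable proc exec' article type replicates the execution only when the procedure runs inside a serializable transaction, which is generally the safer of the two execution-replication options:

-- Hypothetical sketch: publish the execution of a batch-update procedure rather
-- than the individual rows it touches.
EXEC sp_addarticle
    @publication   = N'TestPublication',
    @article       = N'usp_ArchiveOrders',
    @source_object = N'usp_ArchiveOrders',
    @source_owner  = N'dbo',
    @type          = N'serializable proc exec';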
SP_SETSUBSCRIPTIONXACTSEQNO
Another trick is to stop your Distribution Agent before you begin your batch update. Use the sp_browsereplcmds stored procedure to extract commands that have not been applied to the Subscriber and issue them on the Subscriber to bring it up to date with the Publisher. Then perform the batch update on your Publisher and Subscriber. The Log Reader Agent will pull all the commands from the Publisher into the distribution database, but make sure that they are not replicated (and hence applied twice at the Subscriber). Use sp_browsereplcmds to determine the transaction identifier (xact_seqno) for the last batch update command that the Log Reader Agent writes into the distribution database. Note that you can select where to stop and start sp_browsereplcmds, as it will take a long time to issue unqualified calls to sp_browsereplcmds.
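A bounded call looks roughly like this, run in the distribution database; the sequence numbers are placeholders that you would take from your own environment:

-- Hypothetical sketch: browse pending commands within an xact_seqno window rather
-- than issuing a slow, unqualified call.
EXEC sp_browsereplcmds
    @xact_seqno_start = N'0x0000002000001000000100000000',
    @xact_seqno_end   = N'0x0000002000002000000400000000';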
You may have to wait awhile before the Log Reader Agent reads all the commands from the Publisher's log and writes them to the distribution database.

When you detect the end of the batch update using sp_browsereplcmds, note the value of the last transaction identifier (xact_seqno) and then use sp_setsubscriptionxactseqno to tell the subscription database that all the batch updates have arrived on the Subscriber. Then restart your Distribution Agent, and the agent will apply only transactions that occurred after the batch update.

Take care to note any transactions that may be in the distribution database and occurred after the batch update started and before it stopped. You'll need to ensure that these commands are also applied on the subscription database.

The problem with this approach is that the Log Reader Agent still has to process all of the batch update commands that are written to the log. This approach will eliminate the lengthy time required for the Distribution Agent to apply the commands to the Subscriber, but will not address the time that it takes for the Log Reader Agent to read the batch commands from the log and write them to the distribution database.
MAXCMDSINTRAN
MaxCmdsInTran is a Log Reader Agent parameter which will break a large transaction into smaller batches. For example, if you set this to 1,000 and do a batch insert of 10,000 rows, as the Log Reader Agent reads 1,000 commands in the log it will write them to the distribution database, even before that batch insert has completed. This allows them to be replicated to the Subscriber. If this batch insert was wrapped in a transaction on the Publisher and the batch insert failed before the transaction was committed, the commands read by the Log Reader Agent and written to the distribution database would not be rolled back. For example, if the batch insert failed on the 9,999th row, the entire 10,000-row transaction would be rolled back on the Publisher. The log reader would already have read the first 9,000 rows out of the transaction log and written them to the distribution database, and then they would be written in 1,000-row batches to the Subscriber.

The advantage of this method is reduced latency because the Log Reader Agent can start reading these commands out of the Publisher's log before the transaction is committed, which will mean faster replication of commands. The disadvantage is that consistency may be lacking between your Publisher and Subscriber. Appropriate use cases are situations in which being up to date is more important than having a complete historical or transactional record. For example, a media giant used this method with the understanding that their peak usage would occur during a disaster. During 9/11 everyone used the news media resources for the latest news, and if a story was lost, its rewrite swiftly came down the wire. A large book seller also used this method when they wrote off a few lost orders, knowing that the bulk of them would be delivered to the subscribers on time.
Replicating text
Text in this context refers to any of the large-value data types—text, ntext, image, nvarchar(max), varchar(max), varbinary(max) with filestream enabled, varbinary(max), and XML.

Like a batch update, when you replicate text, the constituent commands may be spread over multiple rows in MSrepl_commands. For example, this statement

Insert into tableName (col1) values(replicate('x',8000))
is spread over eight rows in MSrepl_commands. When the text is being replicated, there is overhead not only when the command is read out of the Publisher's log and broken into eight commands in MSrepl_commands, but there is also overhead for the Distribution Agent in assembling these eight commands into one insert statement to apply on the Subscriber.

Unfortunately there is no easy way of getting around the overhead, other than using vertical filtering to avoid replicating text columns. On the newsgroups I frequently encounter misconceptions about the max text repl size (B) option, the value of which can be set using sp_configure. The misconception is that this setting somehow helps when replicating text. Some people think that if you are replicating text values larger than the setting of max text repl size, then the value is not replicated; others think that special optimizations kick in. In actuality your insert or update statement will fail with this message: “Length of text, ntext, or image data (x) to be replicated exceeds configured maximum 65536.”

Although you can use this option to avoid the expense of replicating large text values, ensure that your application can handle the error that is raised.
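For reference, the option is set like any other server configuration option; a minimal sketch (the value is in bytes):

-- Raise the cap on replicated large-value data to 2 MB. Statements that exceed the
-- configured maximum still fail with the error quoted above.
EXEC sp_configure 'max text repl size', 2097152;
RECONFIGURE;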
Logging
This is a catch-22. The agents need to log minimal replication activity so that the replication subsystem can detect hung agents. However, logging itself will degrade replication performance. Figure 1 illustrates the impact of various settings of the HistoryVerboseLevel parameter when replicating 10,000 singleton insert statements.

The y axis is worker time (ms), and the x axis is OutputVerboseLevel. Notice how a setting for HistoryVerboseLevel of 0 and using the default for OutputVerboseLevel (1) will give you the best performance and replicate 20 percent faster than its nearest competitor; 20 percent faster meant a total of 18,356 transactions per second.

The characteristics are completely different for 100 transactions of 100 singleton inserts, as displayed in figure 2. The y axis is worker time, and the x axis is OutputVerboseLevel.
Figure 1  The effect of HistoryVerboseLevel and OutputVerboseLevel settings on a workload of 10,000 singleton inserts (one series plotted for each HistoryVerboseLevel value of 0, 1, and 2)