Tải bản đầy đủ (.pdf) (201 trang)

microsoft sql server analysis services multidimensional performance and operations guide

Bạn đang xem bản rút gọn của tài liệu. Xem và tải ngay bản đầy đủ của tài liệu tại đây (5.93 MB, 201 trang )


Microsoft SQL Server Analysis
Services Multidimensional
Performance and Operations Guide
Thomas Kejser and Denny Lee
Contributors and Technical Reviewers: Peter Adshead (UBS), T.K. Anand,
KaganArca, Andrew Calvett (UBS), Brad Daniels, John Desch, Marius
Dumitru, WillfriedFärber (Trivadis), Alberto Ferrari (SQLBI), Marcel Franke
(pmOne), Greg Galloway (Artis Consulting), Darren Gosbell (James &
Monroe), DaeSeong Han, Siva Harinath, Thomas Ivarsson (Sigma AB),
Alejandro Leguizamo (SolidQ), Alexei Khalyako, Edward Melomed,
AkshaiMirchandani, Sanjay Nayyar (IM Group), TomislavPiasevoli, Carl
Rabeler (SolidQ), Marco Russo (SQLBI), Ashvini Sharma, Didier Simon, John
Sirmon, Richard Tkachuk, Andrea Uggetti, Elizabeth Vitt, Mike Vovchik,
Christopher Webb (Crossjoin Consulting), SedatYogurtcuoglu, Anne Zorner

Summary: Download this book to learn about Analysis Services Multidimensional
performance tuning from an operational and development perspective. This book
consolidates the previously published SQL Server 2008 R2 Analysis Services Operations
Guide and SQL Server 2008 R2 Analysis Services Performance Guide into a single
publication that you can view on portable devices.
Category: Guide
Applies to: SQL Server 2005, SQL Server 2008, SQL Server 2008 R2, SQL Server 2012
Source: White paper (
link to source content
, link to source content)
E-book publication date: May 2012
200 pages

This page intentionally left blank




Copyright © 2012 by Microsoft Corporation
All rights reserved. No part of the contents of this book may be reproduced or transmitted in any form or by any means
without the written permission of the publisher.


Microsoft and the trademarks listed at
are trademarks of the
Microsoft group of companies. All other marks are property of their respective owners.

The example companies, organizations, products, domain names, email addresses, logos, people, places, and events
depicted herein are fictitious. No association with any real company, organization, product, domain name, email address,
logo, person, place, or event is intended or should be inferred.

This book expresses the author’s views and opinions. The information contained in this book is provided without any
express, statutory, or implied warranties. Neither the authors, Microsoft Corporation, nor its resellers, or distributors will
be held liable for any damages caused or alleged to be caused either directly or indirectly by this book.

4

Contents
1 Introduction 5
2 Part1:BuildingaHigh‐PerformanceCube 6
2.1 DesignPatternsforScalableCubes 6
2.2 TestingAnalysisServicesCubes 32
2.3 TuningQueryPerformance 39
2.4 TuningProcessingPerformance 76
2.5 SpecialConsiderations 93
3 Part2:Running

aCubeinProduction 105
3.1 ConfiguringtheServer 106
3.2 MonitoringandTuningtheServer 133
3.3 SecurityandAuditing 143
3.4 HighAvailabilityandDisasterRecovery 147
3.5 DiagnosingandOptimizing 150
3.6 ServerMaintenance 189
3.7 SpecialConsiderations 192
4
 Conclusion 200
Sendfeedback. 200


5

1 Introduction
ThisbookconsolidatestwopreviouslypublishedguidesintooneessentialresourceforAnalysisServices
developersandoperationspersonnel.AlthoughthetitlesoftheoriginalpublicationsindicateSQLServer
2008R2,mostoftheknowledgethatyougainfromthisbookisea sily transferredtootherversionsof
AnalysisServices,includingmultidimensional
modelsbuiltusingSQLServer2012.
Part1isfromthe“SQLServer2008R2AnalysisServicesPerformanceGuide”. PublishedinOctober
2011,thisguidewascreatedfordevelopersandcubedesignerswhowanttobuildhigh‐performance
cubesusingbestpracticesandinsightslearnedfromreal‐worlddevelopmentprojects.
InPart1,you’ll
learnproventechniquesforbuildingsolutionsthatarefastertoprocessandquery,minimizingtheneed
forfurthertuningdowntheroad.
Part2isfromthe“SQLServer2008R2AnalysisServicesOperationsGuide“.Thisguide,publishedinJune
2011,isintendedfordevelopersandoperations
specialistswhomanagesolutionsthatarealreadyin

production.Part2showsyouhowtoextractperformancegainsfromaproductioncube,including
changingserverandsystemproperties,andperformingsystemmaintenancethathelpyouavoid
problemsbeforetheystart.
Whileeachguidetargetsadifferentpartofasolution
lifecycle,havingbothinasingleportableformat
givesyouanintellectualtoolkitthatyoucanaccessonmobiledeviceswhereveryoumaybe.Wehope
youfindthisbookhelpfulandeasytouse,butit is onlyoneofseveralformatsavailableforthiscontent.
Youcanalsoget
printableversionsofbothguidesbydownloadingthemfromtheMicrosoftwebsite.



6

2 Part 1: Building a High-Performance Cube
ThissectionprovidesinformationaboutbuildingandtuningAnalysisServicescubesforthebestpossible
performance.Itisprimarilyaimedatbusinessintelligence(BI)developerswhoarebuildinganewcube
fromscratchoroptimizinganexistingcubeforbetterperformance.
Thegoalofthissectionistoprovideyouwith
thenecessarybackgroundtounderstanddesigntradeoffs
andwithtechniquesanddesignpatternsthatwillhelpyouachievethebestpossibleperformanceof
evenlargecubes.
Cubeperformancecanbedividedintotwotypesofworkload:queryperformanceandprocessing
performance.Becausetheseworkloadsareverydifferent,thissectionisorganized
intofourmain
groups.
DesignPatternsforScalableCubes–Noamountofquerytuningandoptimizationcanbeatthebenefits
ofawell‐designeddatamodel.Thissectioncontainsguidancetohelpyougetthedesignrightthefirst
time.Ingeneral,goodcubedesignfollowsKimballmodelingtechniques,and
ifyouavoidsometypical

designmistakes,youareinverygoodshape.
TestingAnalysisServicesCubes–Inever yITproject,preproductiontestingisacrucialpartofthe
developmentanddeploymentcycle.Evenwiththemostcarefuldesign,testingwillstillbeabletoshake
outerrorsand
avoidproductionissues.Designingandrunningatestrunofanenterprisecubeistime
wellinvested.Hence,thissectionincludesadescriptionofthetestmethodsavailabletoyou.
TuningQueryPerformance‐Queryperformancedirectlyimpactsthequalityoftheend‐user
experience.Assuch,itistheprimarybenchmark
usedtoevaluatethesuccessofanonlineanalytical
processing(OLAP)implementation.AnalysisServicesprovidesavarietyofmechanismstoaccelerate
queryperformance,includingaggregations,caching,andindexeddataretrieval.Thissectionalso
providesguidanceonwritingefficientMultidimensionalExpressions(MDX)calculationscripts.
TuningProcessingPerformance‐Processingistheoperation
thatrefreshesdatainanAnalysisServices
database.Thefastertheprocessingperformance,thesooneruserscanaccessrefresheddata.Analysis
Servicesprovidesavarietyofmechanismsthatyoucanusetoinfluenceprocessingperformance,
includingparallelizedprocessingdesigns,relationaltuning,andaneconomicalprocessingstrategy(for
example,incrementalversusfull
refreshversusproactivecaching).
SpecialConsiderations–SomefeaturesofAnalysisServicessuchasdistinctcountmeasuresandmany‐
to‐manydimensionsrequiremorecarefulattentiontothecubedesignthanothers.AttheendofPart1,
youwillfindasectionthatdescribesthespecialtechniquesyoushould
applywhenusingthesefeatures.
2.1 Design Patterns for Scalable Cubes
CubespresentauniquechallengetotheBIdeveloper:theyaread‐hocdatabasesthatareexpectedto
respondtomostqueriesinshorttime.Thefreedomoftheenduserislimitedonlybythedatamodel 
youimplement.Achievingabalancebetweenuserfreedomandscalabledesignwilldetermine
the
7


successofacube.Eachindustryhasspecificdesignpatternsthatlendthemselveswelltovalueadding
reporting–andadetailedtreatmentofoptimal,industryspecificdatamodelisoutsidethescopeofthis
book.However,therearealotofcommondesignpatternsyoucanapplyacross
allindustries‐this
sectiondealswiththesepatternsandhowyoucanleveragethemforincreasedscalabilityinyourcube
design.
2.1.1 Building Optimal Dimensions
Awell‐tuneddimensiondesignisoneofthemostcriticalsuccessfactorsofahigh‐performing Analysis 
Servicessolution.Thedimensionsofthecubearethefirststopfordataanalysisandtheirdesignhasa
deepimpactontheperformanceofallmeasuresinthecube.
Dimensionsarecomposed
ofattributes,whicharerelatedtoeachotherthroughhierarchies.Efficient
useofattributesisakeydesignskilltomaster,andstudyingandimplementingtheattribute
relationshipsavailableinthebusinessmodelcanhe lpimprovecubeperformance.
Inthissection,youwillfindguidanceonbuildingoptimizeddimensionsand
properlyusingboth
attributesandhierarchies.
2.1.1.1 Using the KeyColumns, ValueColumn, and NameColumn Properties
Effectively
Whenyouaddanewattributetoadimension,threepropertiesareusedtodefinetheattribute.The
KeyColumnspropertyspecifiesoneormoresourcefieldsthatuniquelyidentifyeachinstanceofthe
attribute.
TheNameColumnpropertyspecifiesthesourcefield thatwillbedisplayedtoendusers.Ifyoudo
not
specifyavaluefortheNameColumnproperty,itisautomaticallysettothevalueoftheKeyColumns
property.
ValueColumnallowsyoutocarryfurtherinformationabouttheattribute–typicallyusedfor
calculations.Unlikememberproperties,thispropertyofanattributeisstronglytyped–providing
increasedperformancewhen
itisusedincalculations.Thecontentsofthispropertycanbeaccessed

throughtheMemberValueMDXfunction.
UsingbothValueColumnandNameColumntocarryinformationeliminatestheneedforextraneous
attributes.Thisreducesthetotalnumberofattri butesinyourdesign,makingitmoreefficient.
Itisabestpractice
toassignanumericsourcefield,ifavailable,totheKeyColumnspropertyratherthan
astringproperty.Furthermore,useasinglecolumnkeyinsteadofacomposite,multi‐columnkey.Not
onlydothesepracticesthisreduceprocessingtime,theyalsoreducethesizeofthedimensionandthe
likelihood
ofusererrors.Thisisespeciallytrueforattributesthathavealargenumberofmembers,that
is,greaterthanonemillionmembers.
8

2.1.1.2 Hiding Attribute Hierarchies
Formanydimensions,youwillwanttheusertonavigatehierarchiescreatedforeaseofaccess.For
example,acustomerdimensioncouldbenavigatedbydrillingintocountryandcitybeforereachingthe
customername,orbydrillingthroughagegroupsorincomelevels.Suchhierarchies,coveredinmore
detaillater,
makenavigationofthecubeeasier–andmakequeriesmoreefficient.
Inadditiontouserhierarchies,AnalysisServicesbydefaultcreatesaflathierarchyforeveryattributein
adimension–theseareattributehierarchies.Hidingattributehierarchiesisoftenagoodidea,because
alotofhierarchiesin
asingledimensionwilltypi callyconfuseusersandmakeclientqueriesless
efficient.ConsidersettingAttri buteHierarchyVisible=falseformostattributehierarchiesanduseuser
hierarchiesinstead.
2.1.1.2.1 Hiding the Surrogate Key
Itisoftenagoodideatohidethesurrogatekeyattributeinthedi mension.Ifyouexposethesurrogate
keytothe
clienttoolsasaValueColumn,thosetools mayrefertothekeyvaluesinreports.The
surrogatekeyinaKimballstarschemadesignholdsnobusinessinformation,andmayevenchangeif
youremodeltype2history.Afteryoucreateadependencytothekeyintheclienttools,you

cannot
changethekeywithoutbreakingreports.Becauseofthis,youdon’twantend‐userreportsreferringto
thesurrogatekeydirectly–andthisiswhywerecommendhidingit.
Thebestdesignforasurrogatekeyistohideitfromusersinthedimensiondesignbysetting
the
AttributeHierarchyVisible=falseandbynotincludingtheattributeinanyuserhierarchies.This
preventsend‐usertoolsfromreferencingthesurrogatekey,leavingyoufreetochangethekeyvalueif
requirementschange.
2.1.1.3 Setting or Disabling Ordering of Attributes
Inmostcases,youwantanattributetohaveanexplicitordering.Forexample,youwillwantaCity
attributetobesortedalphabetically.Youshould explicitlysettheOrderByorOrderByAttribute
propertyoftheattributetoexplicitlycontrolthisord ering.Typically,thisorderingisbyattributename
orkey,but
itmayalsobeanotherattribute.Ifyouincludeanattributeonlyforthepurposeofordering
anotherattribute,makesureyousetAttributeHierarchyEnabled=falseand
AttributeHierarchyOptimizedState=NotOptimizedtosaveonprocessingoperations.
Therearefewcaseswhereyoudon’tcareabouttheorderingofanattribute,yet
thesurrogatekeyis
onesuchcase.Forsuchhiddenattributethatyouusedonlyforimplementationpurposes,youcanset
AttributeHierarchyOrdered=falsetosavetimeduringprocessingofthedimension.
2.1.1.4 Setting Default Attribute Members
Anyquerythatdoesnotexplicitlyreferenceahierarchywillusethecurrentmemberofthat
hierarchy.ThedefaultbehaviorofAnalysisServicesistoassigntheAllmemberofadimensionasthe
defaultmember,whichisnormallythedesiredbehavior.Butforsomeattributes,suchasthecurrent
9

dayinadatedimension,itsometimesmakessensetoexplicitly assignadefaultmember.Forexample,
youmaysetadefaultdateintheAdventureWorkscubelikethis.
ALTERCUBE [Adventure Works]UPDATE
DIMENSION [Date], DEFAULT_MEMBER='[Date].[Date].&[2000]'


However,defaultmembersmaycauseissuesintheclienttool.Forexample,MicrosoftExcel2010will
notprovideavisualindicationthatadefaultmemberiscurrentlyselectedandhenceimplicitlyinfluence
thequeryresult.ThismayconfuseuserswhoexpecttheAllleveltobethecurrentmemberwhen
no
othermembersareimpliedbythequery.Also,ifyousetadefaultmemberinadimensionwithmultiple
hierarchies,youwilltypicallygetresultsthatarehardforuserstointerpret.
Ingeneral,preferexplicitlydefaultmembersonlyondimensionswithsinglehierarchiesorinhierarchies
thatdonot
haveanAlllevel.
2.1.1.5 Removing the All Level
MostdimensionsrolluptoacommonAlllevel,whichistheaggregationofalldescendants.Butthere
aresomeexceptionswhereisdoesnotmakesensetoqueryattheAlllevel.Forexample,youmayhave
acurrencydimensioninthecube–andaskingfor“thesumof
allcurrencies”isameaninglessquestion.
ItcanevenbeexpensivetoaskfortheAlllevelofdimensionifthereisnotgoodaggregatetorespondto
thequery.Forexample,ifyouhaveacubepartitionedbycurrency,askingfortheAlllevelofcurrency
willcausea
scanofallpartitions,whichcouldbeexpensiveandleadtoauselessresult.
InordertopreventusersfromqueryingmeaninglessAlllevels,youcandisabletheAllmemberina
hierarchy.YoudothisbysettingtheIsAggregateable=falseontheattributeatthetopofthehierarchy.

NotethatifyoudisabletheAlllevel,youshouldalsosetadefaultmemberasdescribedintheprevious
section–ifyoudon’t,AnalysisServiceswillchooseoneforyou.
2.1.1.6 Identifying Attribute Relationships
Attributerelationshipsdefinehierarchicaldependenciesbetweenattributes.Inotherwords,ifAhasa
relatedattributeB,writtenAB,thereisonememberinBforeverymemberinA,andmanymembers
inAforagivenmemberinB.Forexample,givenanattributerelationshipCity
State,ifthecurrent
cityisSeattle,weknowtheStatemustbeWashington.

Often,therearerelationshipsbetweenattributesthatmightormightnotbemanifestedintheoriginal
dimensiontablethatcanbeusedbytheAnalysisServicesenginetooptimizeperformance.Bydefault,
allattributesarerelated
tothekey,andtheattributerelationshipdiagramrepresentsa“bush”where
relationshipsallstemfromthekeyattributeandendateachother’sattribute.
10

Figure
1
Youcan
o
amodel
n
otherwo
r
relations
h
Figure
2
Attribute
 C
s
a
 A
r
e
 A
Consider
t
attribute

1
1: Bushy
a
o
ptimizeperf
o
n
ameidentifi
e
r
ds,asingle
s
h
ipsintheat
t
2
2: Redefi
n
relationship
s
rossproduct
s
a
vesCPUtim
e
ggregations
b
e
sourcesduri
uto‐Existcan

t
hecross‐pr
o
relationships
a
ttribute r
o
rmanceby
d
e
stheprodu
c
s
ubcategoryi
s
t
ributerelati
o
n
ed attrib
u
s
helpperfor
m
s
betweenle
v
e
duringque
r

b
uiltonattri
b
ngprocessin
g
moreefficie
o
ductbetwee
havebeene
elationshi
p
d
efininghiera
c
tlineandsu
s
notfoundi
n
o
nshipeditor
,
u
te relatio
n
m
anceinthr
e
v
elsinthehi
e

r
ies.
b
utescanbe
r
g
andforqu
e
ntlyeliminat
e
nSubcatego
r
xplicitlydefi
n
p
s
rchicalrelati
o
bcategory,a
n
n
morethan
o
,
therelation
n
ships
e
esignificant
e

rarchydon
o
r
eusedforq
u
e
ries.
e
attributec
o
r
yandCateg
o
n
ed,theengi
n
o
nshipssupp
n
dthesubca
t
o
necategory
shipsarecle
a
ways:
o
tneedtogo
u
eriesonrela

t
o
mbinations
t
o
ryinthetw
o
n
emustfirst
f
ortedbythe
t
egoryidenti
f
.Ifyouredef
i
a
rer.
throughthe
t
edattribute
s
t
hatdonote
x
o
figures.Int
f
indwhichp
r

data.Inthis
c
f
iesacatego
r
i
nethe
keyattribute
s
.Thissaves
x
istintheda
t
hefirst,whe
r
r
oductsarei
n
c
ase,
r
y.In

.This
t
a.
r
eno
n


11

eachsubcategoryandthendeterminewhichcategorieseachoftheseproductsbelongsto.Forlarge
dimensions,thiscantakealongtime.Iftheattributerelationshipisdefined,theAnalysisServ ices
engineknowsbeforehandwhichcategoryeachsubcategorybelongstoviaindexesbuiltatprocesstime.
2.1.1.6.1 Flexible vs. Rigid Relationships
Whenanattribute
relationshipisdefined,therelatio ncaneitherbeflexibleorrigid.Aflexibleattribute 
relationshipisonewherememberscanmovearoundduringdimensionupdates,andarigidattribute
relationshipisonewherethememberrelationshipsareguaranteedtobefixed.Forexample,the
relationshipbetweenmonthandyearis
fixedbecauseaparticularmonthisn’tgoingtochangeitsyear
whenthedimensionisreprocessed.However,therelationshipbetweencustomerandcitymaybe
flexibleascustomersmove.
Whenachangeisdetectedduringprocessinaflexiblerelationship,allindexesforpartitionsreferencing
theaffecteddimension(includingtheindexes
forattributethatarenotaffected)mustbeinvalidated.
ThisisanexpensiveoperationandmaycauseProcessUpdateoperationstotakeaverylongtime.
IndexesinvalidatedbychangesinflexiblerelationshipsmustberebuiltafteraProcessUpdateoperation
withaProcessIndexontheaffectedpartitions;this
addsevenmoretimetocubeprocessing.
Flexiblerelationshipsarethedefaultsetting.Carefullyconsidertheadvantages ofrigidrelationshipsand
changethedefaultwherethedesignallowsit.
2.1.1.7 Using Hierarchies Effectively
AnalysisServicesenablesyoutobuildtwotypesofuserhierarchies:naturalandunnaturalhierarchies.
Eachtypehasdifferentdesignandperformancecharacteristics.
Inanaturalhierarchy,allattributesparticipatingaslevelsinthehierarchyhavedirectorindirect
attributerelationshipsfromthebottomofthehierarchytothetopof
thehierarchy.
Inanunnaturalhierarchy,thehierarchyconsistsofatleasttwoconsecutivelevelsthathavenoattribute

relationships.Typicallythesehierarchiesareusedtocreatedrill‐downpathsofcommonlyviewed
attributesthatdonotfollowanynaturalhierarchy.Forexample,usersmaywanttoviewahierarchyof

GenderandEducation.

Figure 33: Natural and unnatural hierarchies
12

Fromaperformanceperspective,naturalhierarchiesbehaveverydifferentlythanunnaturalhierarchies
do.Innaturalhierarchies,thehierarchytreeismaterializedondiskinhierarchystores.Inaddition,all
attributesparticipating innaturalhierarchiesareautomaticallyconsideredtobeaggregationcandidates.
Unnaturalhierarchiesarenotmaterializedondisk,and
theattributesparticipatinginunnatural
hierarchiesarenotautomaticallyconsideredasaggregationcandidates.Rather,theysimplyprovide
userswitheasy‐to‐usedrill‐downpathsforcommonlyviewedattributesthatdonothavenatural
relationships.Byassemblingtheseattributesintohierarchies,youcanalsouseavarietyofMDX
navigationfunctions
toeasilyperformcalculationslikepercentofparent.
Totakeadvantageofnaturalhierarchies,definecascadingattributerelationshipsforallattributesthat
participateinthehierarchy.
2.1.1.8 Turning Off the Attribute Hierarchy
Memberpropertiesprovideadifferentmechanismtoexposedimensioninformation.Foragiven
attribute,memberpropertiesareautomaticallycreatedforeverydirectattributerelationship.Forthe
primarykeyattribute,thismeansthateveryattributethatisdirectlyrelatedtotheprimarykeyis
availableasamemberpropertyoftheprimary
keyattribute.
Ifyouonlywanttoaccessanattributeasmemberproperty,afteryouverifythatthecorrectrelationship
isinplace,youcandisabletheattribute’shierarchy bysettingtheAttributeHierarchyEnabledproperty
toFalse.Fromaprocessingperspective,disablingtheattributehierarchycanimproveperformanceand
decreasecube

sizebecausetheattributewillnolongerbeindexedoraggregated.Thiscanbeespecially
usefulforhigh‐cardinalityattribu testhathaveaone‐to‐onerelationshipwiththeprimarykey.High‐
cardinalityattributessuchasphonenumbersandaddressestypicallydonotrequireslice‐and‐dice
analysis.Bydisabling
thehierarchiesfortheseattributesand accessingthemviamemberproperties,
youcansaveprocessingtimeandreducecubesize.
Decidingwhethertodisabletheattribute’shierarchyrequiresthatyouconsiderboththequeryingand
processingimpactsofusingmemberproperties.Memberproperties cannotbeplacedonaqueryaxisin

anMDXqueryinthesamemannerasattributehierarchiesanduserhierarchies.Toqueryamember
property,youmustquerytheattributethatcontainsthatmemberproperty.
Forexample,ifyourequiretheworkphonenumberforacustome r ,youmustquerythepropertiesof
customerandthenrequest
thephonenumberproperty.Asaconvenience,mostfront‐endtoolseasily
displaymemberpropertiesintheiruserinterfaces.
Ingeneral,filteringmeasuresusingmemberpropertiesisslowerthanfilteringusingattribute
hierarchies,becausememberpropertiesarenotindexedanddonotparticipateinaggregations.The
actualimpacttoqueryperformance
dependsonhowyouusetheattribute.
Forexample,ifyouruserswanttosliceanddicedatabybothaccountnumber andaccountdescription,
fromaqueryingperspectiveyoumaybebetteroffhavingtheattributehierarchiesinplaceand
removingthebitmapindexesifprocessingperformanceisan
issue.
13

2.1.1.9 Reference Dimensions
Referencedimensionsallow youtobuildadimensionalmodelontopofasnowflakerelationaldesign.
Whilethisisapowerfulfeature,youshouldunderstandtheimplicationsofusingit.
Bydefault,areferencedimensionisnon‐materialized.Thismeansthatqueri eshavetoperformthejoin
betweenthe

referenceandtheouterdimensiontableatquerytime.Also,filtersdefinedonattributesin
theouterdimensiontablearenotdrivenintothemeasuregroupwhenthebitmapstherearescanned.
Thismayresultinreadingtoomuchdatafromdisktoansweruserqueries.Leavingadimensionasnon

materializedprioritizesmodelingflexibilityoverqueryperformance.Considercarefullywhetheryoucan
affordthistradeoff:cubesaretypically intendedtobefastad‐hocstructures,andputtingthe
performanceburdenontheenduserisrarelyagoodidea.
AnalysisServiceshastheabilitytomaterializethereferencesdimension.When
youenablethisoption,
memoryanddiskstructuresarecreatedthatmakethedimensionbehavejustlikeadenormalizedstar
schema.Thismeansthatyouwillretainalltheperformancebenefitsofaregular,non‐reference
dimension.However,becarefulwithmaterializedreferencedimension–ifyourunaproces supdate
on
theintermediatedimension,anychangesintherelationshipsbetweentheouterdimensionandthe
referencewillnotbereflectedinthecube.Instead,theoriginalrelationshipbetweentheouter
dimensionandthemeasuregroupisretained–whichismostlikelynotthedesiredresult.Inaway,you
canconsiderthereferencetabletobearigidrelationshiptoattributesintheouterattributes.Theonly
waytoreflectchangesinthereferencetableistofullyprocessthedimension.
2.1.1.10 Fast-Changing Attributes
Somedatamodelscontainattributesthatchangeveryfast.Dependingonwhichtypeofhistorytracking
youneed,youmayfacedifferentchallenges. 
Type2Fast‐ChangingAttributes‐Ifyoutrackeverychangetoafast‐changingattribu te,thismaycause
thedimensioncontainingtheattributetogrowverylarge.Type2
attributesaretypicallyaddedtoa
dimensionwithaProcessAddcommand.Atsomepoint,runningProcessAddonalargedimensionand
runningalltheconsistencycheckswilltakealongtime.Also,havingahugedimensionisunwieldy
becauseuserswillhavetroublequeryingitandtheserverwillhave
troublekeepingitinmemory.A
goodexampleofsuchamodelingchallengeistheageofacustomer–thiswillchangeeveryyearand
causethecustomerdimensiontogrowdramatically.

Type1Fast‐ChangingAttributes–Evenifyoudonottrackeverychangetotheattribute,you
maystill
runintoissueswithfast‐changingattributes.Toreflectachangeinthedatasourcetothecube,you
havetorunProcessUpdateonthechangeddimension.Asthecubeanddimensiongrowslarger,
runningProcessUpdatebecomesexpensive.Anexampleofsuchamodelingchallengeis
totrackthe
statusattributeofaserverinahostingenvironment(“Running”,“Shutdown”,“Overloaded”andsoon).
Astatusattributelikethismaychangeseveraltimesperdayorevenperhour.Runningfrequent
ProcessUpdatesonsuchadimensiontoreflectchangescanbeanexpensiveoperation,andit
maynot
befeasiblewiththelockingimplementationofAnalysisServicesinaproductionenvironment.
14

Inthefollowingsections,wewilllookatsomemodelingoptionsyoucanusetoaddresstheseproblems.
2.1.1.10.1 Type 2 Fast-Changing Attributes
Ifhistorytrackingisarequirementofafast‐changingattribute,thebestoptionisoftentousethefact
tabletotrackhistory.Thisisbestillustratedwithanexample.
Consideragainthecustomerdi mension 
withtheageattribute.ModelingtheAgeattri butedirectlyinthecustomerdimensionproducesadesign
likethis.


Figure 44: Age in customer dimension
NoticethateverytimeThomashasabirthday,anewrowisaddedinthedimensiontable.The
alternativedesignapproachsplitsthecustomerdimensionintotwodimensionslikethis. 
15


Figure 55: Age in its own dimension
Notethattherearesomerestrictionsonthesituationwherethisdesigncanbeapplied.Itworksbest

whenthechangingattributetakesonasmall,distinctsetofvalues.Italsoaddscomplexitytothe
design;byaddingmoredimensions tothemodel,itcreatesmoreworkfortheETL
developerswhenthe
facttableisloaded.Also,considerthestorageimpactonthefacttable:Withthealternativedesign,the
facttablebecomeswider,andmorebyteshavetobestoredperrow.
2.1.1.10.2 Type 1 Fast-Changing Attributes
Yourbusinessrequirementmaybeupdatinganattributeofadimensionathighfrequency,daily,or

evenhourly.Forasmallcube,runningProcessUpdatewillhelpyouaddressthisissue.Butasthecube
growslarger,theruntimeofProcessUpdatecanbecometoolongfortheba tchwindoworthereal‐
timerequirementsofthecube(youcanreadmoreabouttuningprocess
updateintheprocessing
section).
Consideragaintheserverhostingexample:Youmaywanttotrackthestatus,whichchangesfrequently,
ofallservers.Fortheexample,letussaythattheserverdimensionisusedbyafacttabletracking
performancecounters.Assumeyouhavemodeledlikethis.
16


Figure 66: Status column in server dimension
TheproblemwiththismodelistheStatuscolumn.IftheFactCounterislargeandstatuschangesalot,
ProcessUpdatewilltakeaverylongtimetorun.Tooptimize,considerthisdesigninstead.

Figure 77: Status column in its own dimension
IfyouimplementDimServerastheintermediatereferencetabletoDimServerStatus,AnalysisServices
nolongerhastokeeptrackofthemetadataintheFactCounterwhenyourunProcessUpdateon
DimServerStatus.Butasdescribedearlier,thismeansthatthejointoDimServerStatuswillhappenat
runtime,increasingCPU
costandquerytimes.Italsomeansthatyoucannotindexattributes in
DimServerbecausetheintermediatedimensionisnotmaterialized.Youhavetocarefullybalancethe

tradeoffbetweenprocessingtimeandqueryspeeds.
17

2.1.1.11 Large Dimensions
InSQLServer2005,SQLServer2008,andSQLServer2008R2,AnalysisServiceshas somebuilt‐in
limitationsthatlimitthesizeofthedimensionsyoucancreate.Firstofall,ittakestimetoupdatea
dimension–thisisexpensivebecauseallindexesonfacttableshaveto
beconsideredforinvalidation
whenanattributechanges.Second,stringvaluesindimensionattributes are storedonadiskstructure
calledthestringstore.Thisstructurehasasizelimitationof4GB.Ifadimensioncontainsattri butes
wherethetotalsizeofthestringvalues(thisincludestranslations)exceeds
4GB,youwillgetanerror
duringprocessing.ThenextversionofSQLServerAnalysisServices,code‐named“Denali”,isexpectedto
removethislimitation.
Considerforamomentadimensionwithtensorevenhundredsofmillionsofmembers.Sucha
dimensioncanbebui lt andaddedtoa
cube,evenonSQLServer2005,SQLServer 2008,andSQLServer
2008R2.Butwhatdoessuchadimensionmeantoanad‐hocuser?Howwilltheusernavigateit?Which
hierarchieswillgroupthemembersofthisdimensionintoreasonablesizesthatcanberenderedona
screen?Whileitmaymakesenseforsomereportingpurposestosearchforindividualmembersinsuch
adimension,itmaynotbetherightproblemtosolvewithacube.
Whenyoubuildcubes,askyourself:isthisacubeproblem?Forexample,thinkofthistypicaltelco
modelof
calldetailrecords.

Figure 88: Call detail records (CDRs)
Inthisparticularexample,thereare300millioncustomersinthedatamodel.Thereisnogoodwayto
groupthesecustomersandallowad‐hocaccesstothecubeatreasonablespeeds.Evenifyoumanageto
optimizethespaceusedtofitinthe4‐GBstringstore,how
wouldusersbrowseacustomerdimension

likethis?
Ifyoufindyourselfinasituationwhereadimensionbecomestoolargeandunwieldy,considerbuilding
thecubeontopofanaggregate.Forthetelcoexample,imagineatransformationlikethefollowing.
18


Figure 99: Cube built on aggregate
Usinganaggregatedfacttable,thisturnsa300‐million‐rowdime nsionprobleminto100,000‐row
dimensionproblem.Youcanconsideraggregatingthefactstosavestoragetoo–alternatively,youcan
addademographicskeydirectlytotheoriginalfacttable,processontopofthisdatasource,andrely
on
MOLAPcompressiontoreducedatasizes.
2.1.2 Partitioning a Cube
Partitionsseparatemeasuregroupdataintophysicalstorageunits.Effective use ofpartitionscan
enhancequeryperformance,improveprocessingperformance,andfacilitatedatamanagement.This
sectionspecificallyaddresseshowyoucanusepartitionstoimprovequeryperformance.Youmustoften
makeatradeoffbetweenqueryandprocessingperformanceinyourpartitioning
strategy.
Youcanusemultiplepartitionstobreakupyourmeasuregroupintoseparatephysicalcomponents.The
advantagesofpartitioningforimprovingqueryperformancearepartitioneliminationandaggregation
design.
Partitionelimination‐Partitionsthatdonotcontaindatainthesubcubearenotqueriedatall,thus
avoidingthecost
ofreadingtheindex(orscanningatableiftheserverisinROLAPmode).Whilereading
apartitionindexandfindingnoavailablerowsisacheapoperation,asthenumberofconcurrentusers
grows,thesereadsbegintoputastraininthethreadpool.Also,forqueriesthatdo
nothaveindexesto
supportthem,AnalysisServiceswillhavetoscanallpotentiallymatchingpartitionsfordata.
Aggregationdesign‐Eachpartitioncanhaveitsownorsharedaggregationdesign.Therefore,partitions
queriedmoreoftenordifferentlycanhavetheirowndesigns.

19


Figure 1010: Intelligent querying by partitions
Figure10displaystheprofiler traceofqueryrequestingResellerSalesAmountbyBusinessTypefrom
AdventureWorks.TheResellerSalesmeasuregroupoftheAdventureWorkscubecontainsfour 
partitions:oneforeachyear.Becausethequerysliceson2003,thestorageenginecangodirectlytothe
2003Reseller
Salespartitionandignoreotherparti tions.
2.1.2.1 Partition Slicing
Partitionsareboundtoasourcetable,view,orsourcequery.Whentheformulaenginerequestsa
subcube,thestorageenginelooksatthemetadataofpartitionfortherelevantmeasuregroup.Each
partitionmaycontainaslicedefinition,ahighleveldescriptionoftheminimumandmaximumattribute
DataIDsthat
existinthatdimension.Ifitcanbedeterminedfromtheslicedefinitionthattherequested
subcubedatais not presentinthepartition,thatpartitionisignored.Iftheslicedefinitionismissingorif
theinformationinthesliceindicatesthatrequireddataispresent,thepartitionis
accessedbyfirst
lookingattheindexes(ifany)andthenscanningthepartitionsegments.
Thesliceofapartitioncanbesetintwoways:
 Autoslice–whenAnalysisServicesreadsthedataduringprocessing,itkeepstrackofthe
minimumandmaximumattributeDataIDreads.Thesevalues
areusedtosettheslicewhenthe
indexesarebuiltonthepartition.
 Manualslicer–Therearecaseswhereautoslicewillnotwork–thesearedescribedinthenext
section.Forthosesituations,youcanmanuallysettheslice.Manualslicesaretheonlyavailable
sliceoptionforROLAPpartitionsandproactivecachingpartitions.
2.1.2.1.1 Auto Slice
DuringprocessingofMOLAPpartitions,AnalysisServicesinternallyidentifiestherangeofdatathatis
containedineachpartitionbyusingtheMinandMaxDataIDsofeachattributetocalculatetherange of

datathatiscontainedinthepartition.
Thedatarangeforeachattributeisthencombinedtocreatethe
slicedefinitionforthepartition.
20

TheMinandMaxDataIDscanspecifyaeitherasinglememberorarangeofmembers.Forexample,
partitioningbyyearresultsinthesameMinandMaxDataIDslicefortheyearattribute,andqueriestoa
specificmomentintimeonlyresultinpartitionqueriesto
thatyear’spartition.
ItisimportanttorememberthatthepartitionsliceismaintainedasarangeofDataIDsthatyouhaveno
explicitcontrolover.DataIDsareassignedduringdimensionprocessingasnewmembersare
encountered.BecauseAnalysisServicesjustlooksattheminimumandmaximumvalueoftheDataID,

youcanendupreadingpartitionsthatdon’tcontainrelevantdata.
Forexample:ifyouhaveapartition,P2003_4,thatcontainsboth2003and2004data,youarenot
guaranteedthattheminimumandmaximumDataIDintheslidecontainvaluesnexttoeachother(even
thoughtheyearsare
adjacent).Inourexample,letussaytheDataIDfor2003is42andtheDataIDfor
2004is45.BecauseyoucannotcontrolwhichDataIDgetsassignedtowhichmembers,youcouldbeina
situationwheretheDataIDfor2005is44.Whenauserrequestsdatafor
2005,AnalysisServiceslooksat
thesliceforP2003_4,seesthatitcontainsdataintheinterval42to45andthereforeconcludesthatthis
partitionhastobescannedtomakesureitdoesnotcontainthevaluesforDataID44(because44is
between42and45).
Because
ofthisbehavior,autoslicetypicallyworksbestifthedatacontainedinthepartitionmapstoa
singleattributevalue.Whenthatisthecase,themaximumandminimumDataIDcontainedintheslice
willbeequalandtheslicewillworkefficiently.
Notethattheautosliceisnot
definedandindexesarenotbuiltforpartitionswithfewerrowsthan

IndexBuildThreshold(whichhasadefaultvalueof4096).
2.1.2.1.2 Manually Setting Slices
NometadataisavailabletoAnalysisServicesaboutthecontentofROLAPandproactivecaching
partitions.Becauseofthis,youmustmanuallyidentifythesliceinthepropertiesofthe
partition.Itisa
bestpracticetomanually setslicesinROLAPandproactivecachingpartitions.
However,asshownintheprevioussection,therearecaseswhereautoslicewillnotgiveyouthe
desiredpartitioneliminationbehavior.Inthesecasesyoucanbenefitfromdefiningthesliceyourselffor
MOLAPpartitions.Forexample,ifyoupartitionbyyearwithsomepartitionscontainingarangeofyears,
definingthesliceexplicitlyavoidstheproblemofoverlappingDataIDs.Thiscanonlybedonewith
knowledgeofthedata–whichiswhereyoucanaddsomeoptimizationasaBIdeveloper.
It
isgenerallynotabestpracticetocreatepartitionsbeforeyouarereadytofillthemwithdata.Butfor
real‐timecubes,itissometimesagoodideatocreatepartitionsinadvancetoavoidlockingissues.
Whenyoutakethisapproach,itis alsoagoodideato
setamanualsliceonMOLAPpartitionstomake
surethestorageenginedoesnotspendtimescanningemptypartitions.
21

2.1.2.2 Partition Sizing
Fornondistinctcountmeasuregroups,testswithpartitionsizesintherangeof200MBtoupto3GB
indicatethatpartitionsizealonedoesnothaveasubstantialimpactonqueryspeeds.Infact,wehave
successfullydeployedgoodqueryperformanceonpartitionslargerthan3GB.
Thefollowing
graphshowsfourdifferentqueryrunswithdifferentpartitionsizes(theverticalaxisis
totalruntimeinhours).Performanceiscomparablebetweenpartitionsizesandisonlyaffectedbythe
designofthesecurityfeaturesinthisparticularcustomercube.

Figure 1111: Throughput by partition size (higher is better)
Thepartitioningstrategyshouldbebasedonthesefactors:

 Increasingprocessingspeedandflexibility
 Increasingmanageabilityofbringinginnewdata
 Increasingqueryperformancefrompartitioneliminationasdescribedearlier
 Supportfordifferentaggregationdesigns
Asyouaddmorepartitions,themetadataoverheadofmanagingthecube
growsexponentially.This
affectsProcessUpdateandProcessAddoperationsondimensions,whichhavetotraversethemetadata
dependenciestoupdatethecubewhendimensionschange.Asaruleofthumb,youshouldtherefore
seektokeepthenumberofpartitionsinthecubeinthelowthousands–whileatthe
sametime
balancingtherequirementsdiscussedhere.
Forlargecubes,preferlargerpartitionsovercreatingtoomanypartitions.Thisalsomeansthatyoucan
safelyignoretheAnalysisManagementObjects(AMO)warninginMicrosoftVisualStudiothatpartition
sizesshouldnotexceed20millionrows.
2.1.2.3 Partition Strategy
Fromguidanceonpartitionsizing,wecandevelop somecommondesignpatternsforpartition
strategies.
22

2.1.2.3.1 Partition by Date
Mostcubesarebuiltonatleastonecolumncontainingadate.Becausedataoftenarrivesinmonthly,
weekly,daily,orevenhourlyslices,itmakessensetopartitionthecubeondate.Partitioningondate
allowsyoutoreplaceafulldayincaseyouloadfaultydata.
Itallowsyoutoselectivelyarchiveolddata
bymovingthepartitiontocheapstorage.Andfinally,itallowsyoutoeasilygetridofdata,byremoving
anentirepartition.Typically,adatepartitioningschemelookssomewhatlikethis.

Figure 1212: Partitioning by Date
Notethatinordertomovethepartitiontocheaperstorage,youwillhavetochangethedatalocation
andreprocessesthepartition.Thisdesignworksverywellforsmalltomedium‐sizedcubes.Itis

reasonablysimpletoimplementandthenumberofpartitionsiskeptlow.However,itdoes
sufferfroma
fewdrawbacks:
1. Ifthegranularityofthepartitioningissmallenough(forexample,hourly),thenumberof
partitionscanquicklybecomeunmanageable.
2. Assumingdataisaddedonlytothelatestpartition, partitionprocessingislimitedtooneTCP/IP
connectionreadingfromthedatasource.If
youhavealotofdata,thiscanbeascalabilitylimit.
Ad1)Ifyouhavealotofdate‐basedpartitions,itisoftenagoodideatomergetheolderonesintolarge
partitions.YoucandothiseitherbyusingtheAnalysisServicesmergefunctionalityorby
droppingthe
oldpartitions,creatinganew,largerpartition,andthenreprocessingit.Repro cessingwilltypicallytake
23

longerthanmerging,butwehavefoundthatcompressionofthepartitioncanoftenincreaseifyou
reprocess.Amodified,datepartitioningschememaylooklikethis.

Figure 1313: Modified Date Partitioning
Thisdesignaddressesthemetadataoverheadofhavingtoomanypartitions.Butitisstillbottlenecked
bythemaximumspeedoftheProcessAddorProcessFullforthelatestpartition.Ifyourdatasourceis
SQLServer,thespeedofasingledatabaseconnectioncanbehundredsofthousandsof
rowsevery
second–whichworkswellformostscenarios.Butifthecuberequiresevenfasterprocessingspeeds,
considermatrixpartitioning.
2.1.2.3.2 Matrix Partitioning
Forlargecubes,itisoftenagoodideatoimplementamatrixpartitioningscheme:partitiononboth
dateandsomeotherkey.Thedatepartitioningisused
toselectivelydeleteormergeoldpartitionsas
describedearlier.Theotherkeycanbeusedtoachieveparallelismduringpartitionprocessingandto
restrictcertainuserstoasubsetofthepartitions.Forexample,consideraretailerthatoperatesinUS,

Europe,andAsia.Youmightdecidetopartition
likethis.
24


Figure 1414: Example of matrix partitioning
Iftheretailergrows,theymaychoosetosplittheregionpartitionsintosmallerpartitionstoincrease
parallelismofloadfurtherandtolimittheworst‐casescansthatausercanperform.Forcubesthatare
expectedtogrowdramatically,itisagoodideatochooseapartitionkey
thatgrowswiththebusiness
andgivesyouoptionsforextendingthematrixpartitioningstrategyappropriately.Thefollowingtable
containsexamplesofsuchpartitioningkeys.
Industry Examplepartitionkey Sourceofdataproliferation
Webretail Customerkey Addingcustomersandtransactions 
Storeretail Storekey Addingnewstores
Datahosting HostIDorracklocation Addinganewserver

×