Microsoft SQL Server Analysis
Services Multidimensional
Performance and Operations Guide
Thomas Kejser and Denny Lee
Contributors and Technical Reviewers: Peter Adshead (UBS), T.K. Anand,
Kagan Arca, Andrew Calvett (UBS), Brad Daniels, John Desch, Marius
Dumitru, Willfried Färber (Trivadis), Alberto Ferrari (SQLBI), Marcel Franke
(pmOne), Greg Galloway (Artis Consulting), Darren Gosbell (James &
Monroe), DaeSeong Han, Siva Harinath, Thomas Ivarsson (Sigma AB),
Alejandro Leguizamo (SolidQ), Alexei Khalyako, Edward Melomed,
Akshai Mirchandani, Sanjay Nayyar (IM Group), Tomislav Piasevoli, Carl
Rabeler (SolidQ), Marco Russo (SQLBI), Ashvini Sharma, Didier Simon, John
Sirmon, Richard Tkachuk, Andrea Uggetti, Elizabeth Vitt, Mike Vovchik,
Christopher Webb (Crossjoin Consulting), Sedat Yogurtcuoglu, Anne Zorner
Summary: Download this book to learn about Analysis Services Multidimensional
performance tuning from an operational and development perspective. This book
consolidates the previously published SQL Server 2008 R2 Analysis Services Operations
Guide and SQL Server 2008 R2 Analysis Services Performance Guide into a single
publication that you can view on portable devices.
Category: Guide
Applies to: SQL Server 2005, SQL Server 2008, SQL Server 2008 R2, SQL Server 2012
Source: White paper (link to source content, link to source content)
E-book publication date: May 2012
200 pages
Copyright © 2012 by Microsoft Corporation
All rights reserved. No part of the contents of this book may be reproduced or transmitted in any form or by any means
without the written permission of the publisher.
Microsoft and the trademarks listed at
are trademarks of the
Microsoft group of companies. All other marks are property of their respective owners.
The example companies, organizations, products, domain names, email addresses, logos, people, places, and events
depicted herein are fictitious. No association with any real company, organization, product, domain name, email address,
logo, person, place, or event is intended or should be inferred.
This book expresses the author’s views and opinions. The information contained in this book is provided without any
express, statutory, or implied warranties. Neither the authors, Microsoft Corporation, nor its resellers, or distributors will
be held liable for any damages caused or alleged to be caused either directly or indirectly by this book.
Contents
1 Introduction
2 Part 1: Building a High-Performance Cube
2.1 Design Patterns for Scalable Cubes
2.2 Testing Analysis Services Cubes
2.3 Tuning Query Performance
2.4 Tuning Processing Performance
2.5 Special Considerations
3 Part 2: Running a Cube in Production
3.1 Configuring the Server
3.2 Monitoring and Tuning the Server
3.3 Security and Auditing
3.4 High Availability and Disaster Recovery
3.5 Diagnosing and Optimizing
3.6 Server Maintenance
3.7 Special Considerations
4 Conclusion
Send feedback
1 Introduction
This book consolidates two previously published guides into one essential resource for Analysis Services developers and operations personnel. Although the titles of the original publications indicate SQL Server 2008 R2, most of the knowledge that you gain from this book is easily transferred to other versions of Analysis Services, including multidimensional models built using SQL Server 2012.
Part 1 is from the “SQL Server 2008 R2 Analysis Services Performance Guide”. Published in October 2011, this guide was created for developers and cube designers who want to build high-performance cubes using best practices and insights learned from real-world development projects. In Part 1, you’ll learn proven techniques for building solutions that are faster to process and query, minimizing the need for further tuning down the road.
Part 2 is from the “SQL Server 2008 R2 Analysis Services Operations Guide”. This guide, published in June 2011, is intended for developers and operations specialists who manage solutions that are already in production. Part 2 shows you how to extract performance gains from a production cube, including changing server and system properties, and performing system maintenance that helps you avoid problems before they start.
While each guide targets a different part of a solution lifecycle, having both in a single portable format gives you an intellectual toolkit that you can access on mobile devices wherever you may be. We hope you find this book helpful and easy to use, but it is only one of several formats available for this content. You can also get printable versions of both guides by downloading them from the Microsoft website.
2 Part 1: Building a High-Performance Cube
This section provides information about building and tuning Analysis Services cubes for the best possible performance. It is primarily aimed at business intelligence (BI) developers who are building a new cube from scratch or optimizing an existing cube for better performance.
The goal of this section is to provide you with the necessary background to understand design tradeoffs and with techniques and design patterns that will help you achieve the best possible performance of even large cubes.
Cube performance can be divided into two types of workload: query performance and processing performance. Because these workloads are very different, this section is organized into five main groups.
Design Patterns for Scalable Cubes - No amount of query tuning and optimization can beat the benefits of a well-designed data model. This section contains guidance to help you get the design right the first time. In general, good cube design follows Kimball modeling techniques, and if you avoid some typical design mistakes, you are in very good shape.
Testing Analysis Services Cubes - In every IT project, preproduction testing is a crucial part of the development and deployment cycle. Even with the most careful design, testing will still be able to shake out errors and avoid production issues. Designing and running a test run of an enterprise cube is time well invested. Hence, this section includes a description of the test methods available to you.
Tuning Query Performance - Query performance directly impacts the quality of the end-user experience. As such, it is the primary benchmark used to evaluate the success of an online analytical processing (OLAP) implementation. Analysis Services provides a variety of mechanisms to accelerate query performance, including aggregations, caching, and indexed data retrieval. This section also provides guidance on writing efficient Multidimensional Expressions (MDX) calculation scripts.
Tuning Processing Performance - Processing is the operation that refreshes data in an Analysis Services database. The faster the processing performance, the sooner users can access refreshed data. Analysis Services provides a variety of mechanisms that you can use to influence processing performance, including parallelized processing designs, relational tuning, and an economical processing strategy (for example, incremental versus full refresh versus proactive caching).
Special Considerations - Some features of Analysis Services such as distinct count measures and many-to-many dimensions require more careful attention to the cube design than others. At the end of Part 1, you will find a section that describes the special techniques you should apply when using these features.
2.1 Design Patterns for Scalable Cubes
Cubes present a unique challenge to the BI developer: they are ad-hoc databases that are expected to respond to most queries in a short time. The freedom of the end user is limited only by the data model you implement. Achieving a balance between user freedom and scalable design will determine the success of a cube. Each industry has specific design patterns that lend themselves well to value-adding reporting, and a detailed treatment of optimal, industry-specific data models is outside the scope of this book. However, there are many common design patterns that you can apply across all industries. This section deals with these patterns and how you can leverage them for increased scalability in your cube design.
2.1.1 Building Optimal Dimensions
A well-tuned dimension design is one of the most critical success factors of a high-performing Analysis Services solution. The dimensions of the cube are the first stop for data analysis and their design has a deep impact on the performance of all measures in the cube.
Dimensions are composed of attributes, which are related to each other through hierarchies. Efficient use of attributes is a key design skill to master, and studying and implementing the attribute relationships available in the business model can help improve cube performance.
In this section, you will find guidance on building optimized dimensions and properly using both attributes and hierarchies.
2.1.1.1 Using the KeyColumns, ValueColumn, and NameColumn Properties Effectively
When you add a new attribute to a dimension, three properties are used to define the attribute. The KeyColumns property specifies one or more source fields that uniquely identify each instance of the attribute.
The NameColumn property specifies the source field that will be displayed to end users. If you do not specify a value for the NameColumn property, it is automatically set to the value of the KeyColumns property.
ValueColumn allows you to carry further information about the attribute, typically for use in calculations. Unlike member properties, this property of an attribute is strongly typed, which provides increased performance when it is used in calculations. The contents of this property can be accessed through the MemberValue MDX function.
Using both ValueColumn and NameColumn to carry information eliminates the need for extraneous attributes. This reduces the total number of attributes in your design, making it more efficient.
It is a best practice to assign a numeric source field, if available, to the KeyColumns property rather than a string field. Furthermore, use a single-column key instead of a composite, multi-column key. Not only do these practices reduce processing time, they also reduce the size of the dimension and the likelihood of user errors. This is especially true for attributes that have a large number of members, that is, greater than one million members.
2.1.1.2 Hiding Attribute Hierarchies
For many dimensions, you will want the user to navigate hierarchies created for ease of access. For example, a customer dimension could be navigated by drilling into country and city before reaching the customer name, or by drilling through age groups or income levels. Such hierarchies, covered in more detail later, make navigation of the cube easier and make queries more efficient.
In addition to user hierarchies, Analysis Services by default creates a flat hierarchy for every attribute in a dimension; these are attribute hierarchies. Hiding attribute hierarchies is often a good idea, because a lot of hierarchies in a single dimension will typically confuse users and make client queries less efficient. Consider setting AttributeHierarchyVisible=false for most attribute hierarchies and use user hierarchies instead.
2.1.1.2.1 Hiding the Surrogate Key
It is often a good idea to hide the surrogate key attribute in the dimension. If you expose the surrogate key to the client tools as a ValueColumn, those tools may refer to the key values in reports. The surrogate key in a Kimball star schema design holds no business information, and may even change if you remodel type 2 history. After you create a dependency on the key in the client tools, you cannot change the key without breaking reports. Because of this, you don’t want end-user reports referring to the surrogate key directly, and this is why we recommend hiding it.
The best design for a surrogate key is to hide it from users in the dimension design by setting AttributeHierarchyVisible=false and by not including the attribute in any user hierarchies. This prevents end-user tools from referencing the surrogate key, leaving you free to change the key value if requirements change.
2.1.1.3 Setting or Disabling Ordering of Attributes
In most cases, you want an attribute to have an explicit ordering. For example, you will want a City attribute to be sorted alphabetically. You should set the OrderBy or OrderByAttribute property of the attribute to explicitly control this ordering. Typically, this ordering is by attribute name or key, but it may also be by another attribute. If you include an attribute only for the purpose of ordering another attribute, make sure you set AttributeHierarchyEnabled=false and AttributeHierarchyOptimizedState=NotOptimized to save on processing operations.
There are few cases where you don’t care about the ordering of an attribute, but the surrogate key is one such case. For such hidden attributes that you use only for implementation purposes, you can set AttributeHierarchyOrdered=false to save time during processing of the dimension.
2.1.1.4 Setting Default Attribute Members
Any query that does not explicitly reference a hierarchy will use the default member of that hierarchy as its current member. The default behavior of Analysis Services is to assign the All member of a dimension as the default member, which is normally the desired behavior. But for some attributes, such as the current day in a date dimension, it sometimes makes sense to explicitly assign a default member. For example, you may set a default date in the Adventure Works cube like this.
ALTER CUBE [Adventure Works] UPDATE
DIMENSION [Date], DEFAULT_MEMBER='[Date].[Date].&[2000]'
However, default members may cause issues in the client tool. For example, Microsoft Excel 2010 will not provide a visual indication that a default member is currently selected and hence implicitly influences the query result. This may confuse users who expect the All level to be the current member when no other members are implied by the query. Also, if you set a default member in a dimension with multiple hierarchies, you will typically get results that are hard for users to interpret.
In general, prefer explicit default members only on dimensions with single hierarchies or in hierarchies that do not have an All level.
2.1.1.5 Removing the All Level
Most dimensions roll up to a common All level, which is the aggregation of all descendants. But there are some exceptions where it does not make sense to query at the All level. For example, you may have a currency dimension in the cube, and asking for “the sum of all currencies” is a meaningless question. It can even be expensive to ask for the All level of a dimension if there is no good aggregate to respond to the query. For example, if you have a cube partitioned by currency, asking for the All level of currency will cause a scan of all partitions, which could be expensive and lead to a useless result.
In order to prevent users from querying meaningless All levels, you can disable the All member in a hierarchy. You do this by setting IsAggregatable=false on the attribute at the top of the hierarchy. Note that if you disable the All level, you should also set a default member as described in the previous section; if you don’t, Analysis Services will choose one for you.
2.1.1.6 Identifying Attribute Relationships
Attribute relationships define hierarchical dependencies between attributes. In other words, if A has a related attribute B, written A → B, there is one member in B for every member in A, and many members in A for a given member in B. For example, given an attribute relationship City → State, if the current city is Seattle, we know the State must be Washington.
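
Before declaring a relationship such as City → State, it can help to confirm that the source data really supports it, that is, that every member of the source attribute maps to exactly one member of the related attribute. The following Python sketch illustrates that check on in-memory rows. The function name and the sample rows are hypothetical and are not part of Analysis Services.

```python
from collections import defaultdict


def relationship_violations(rows, source, related):
    """Return the source members that map to more than one related member.

    An empty result means the data supports the relationship source -> related.
    """
    seen = defaultdict(set)
    for row in rows:
        seen[row[source]].add(row[related])
    return {member: sorted(values) for member, values in seen.items() if len(values) > 1}


# Hypothetical dimension rows, for illustration only.
rows = [
    {"City": "Seattle", "State": "Washington"},
    {"City": "Redmond", "State": "Washington"},
    {"City": "Portland", "State": "Oregon"},
    {"City": "Portland", "State": "Maine"},
]

violations = relationship_violations(rows, "City", "State")
print(violations)  # {'Portland': ['Maine', 'Oregon']}
```

In this sample, the city name alone does not determine the state, so a City → State relationship keyed only on the name would not hold for this data.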
Often, there are relationships between attributes that might or might not be manifested in the original dimension table that can be used by the Analysis Services engine to optimize performance. By default, all attributes are related to the key, and the attribute relationship diagram represents a “bush” where relationships all stem from the key attribute and end at each of the other attributes.
Figure 1: Bushy attribute relationships
You can optimize performance by defining hierarchical relationships supported by the data. In this case, a model name identifies the product line and subcategory, and the subcategory identifies a category. In other words, a single subcategory is not found in more than one category. If you redefine the relationships in the attribute relationship editor, the relationships are clearer.
Figure 2: Redefined attribute relationships
Attribute relationships help performance in three significant ways:
Cross products between levels in the hierarchy do not need to go through the key attribute. This saves CPU time during queries.
Aggregations built on attributes can be reused for queries on related attributes. This saves resources during processing and for queries.
Auto-Exist can more efficiently eliminate attribute combinations that do not exist in the data.
Consider the cross-product between Subcategory and Category in the two figures. In the first, where no attribute relationships have been explicitly defined, the engine must first find which products are in each subcategory and then determine which categories each of these products belongs to. For large dimensions, this can take a long time. If the attribute relationship is defined, the Analysis Services engine knows beforehand which category each subcategory belongs to via indexes built at process time.
2.1.1.6.1 Flexible vs. Rigid Relationships
When an attribute relationship is defined, the relation can either be flexible or rigid. A flexible attribute relationship is one where members can move around during dimension updates, and a rigid attribute relationship is one where the member relationships are guaranteed to be fixed. For example, the relationship between month and year is fixed because a particular month isn’t going to change its year when the dimension is reprocessed. However, the relationship between customer and city may be flexible as customers move.
When a change is detected during processing in a flexible relationship, all indexes for partitions referencing the affected dimension (including the indexes for attributes that are not affected) must be invalidated. This is an expensive operation and may cause ProcessUpdate operations to take a very long time. Indexes invalidated by changes in flexible relationships must be rebuilt after a ProcessUpdate operation with a ProcessIndex on the affected partitions; this adds even more time to cube processing.
Flexible relationships are the default setting. Carefully consider the advantages of rigid relationships and change the default where the design allows it.
2.1.1.7 Using Hierarchies Effectively
Analysis Services enables you to build two types of user hierarchies: natural and unnatural hierarchies. Each type has different design and performance characteristics.
In a natural hierarchy, all attributes participating as levels in the hierarchy have direct or indirect attribute relationships from the bottom of the hierarchy to the top of the hierarchy.
In an unnatural hierarchy, the hierarchy consists of at least two consecutive levels that have no attribute relationships. Typically these hierarchies are used to create drill-down paths of commonly viewed attributes that do not follow any natural hierarchy. For example, users may want to view a hierarchy of Gender and Education.
Figure 3: Natural and unnatural hierarchies
From a performance perspective, natural hierarchies behave very differently than unnatural hierarchies do. In natural hierarchies, the hierarchy tree is materialized on disk in hierarchy stores. In addition, all attributes participating in natural hierarchies are automatically considered to be aggregation candidates.
Unnatural hierarchies are not materialized on disk, and the attributes participating in unnatural hierarchies are not automatically considered as aggregation candidates. Rather, they simply provide users with easy-to-use drill-down paths for commonly viewed attributes that do not have natural relationships. By assembling these attributes into hierarchies, you can also use a variety of MDX navigation functions to easily perform calculations like percent of parent.
To take advantage of natural hierarchies, define cascading attribute relationships for all attributes that participate in the hierarchy.
2.1.1.8 Turning Off the Attribute Hierarchy
Member properties provide a different mechanism to expose dimension information. For a given attribute, member properties are automatically created for every direct attribute relationship. For the primary key attribute, this means that every attribute that is directly related to the primary key is available as a member property of the primary key attribute.
If you only want to access an attribute as a member property, after you verify that the correct relationship is in place, you can disable the attribute’s hierarchy by setting the AttributeHierarchyEnabled property to False. From a processing perspective, disabling the attribute hierarchy can improve performance and decrease cube size because the attribute will no longer be indexed or aggregated. This can be especially useful for high-cardinality attributes that have a one-to-one relationship with the primary key. High-cardinality attributes such as phone numbers and addresses typically do not require slice-and-dice analysis. By disabling the hierarchies for these attributes and accessing them via member properties, you can save processing time and reduce cube size.
Deciding whether to disable the attribute’s hierarchy requires that you consider both the querying and processing impacts of using member properties. Member properties cannot be placed on a query axis in an MDX query in the same manner as attribute hierarchies and user hierarchies. To query a member property, you must query the attribute that contains that member property.
For example, if you require the work phone number for a customer, you must query the properties of customer and then request the phone number property. As a convenience, most front-end tools easily display member properties in their user interfaces.
In general, filtering measures using member properties is slower than filtering using attribute hierarchies, because member properties are not indexed and do not participate in aggregations. The actual impact on query performance depends on how you use the attribute.
For example, if your users want to slice and dice data by both account number and account description, from a querying perspective you may be better off having the attribute hierarchies in place and removing the bitmap indexes if processing performance is an issue.
2.1.1.9 Reference Dimensions
Reference dimensions allow you to build a dimensional model on top of a snowflake relational design. While this is a powerful feature, you should understand the implications of using it.
By default, a reference dimension is non-materialized. This means that queries have to perform the join between the reference and the outer dimension table at query time. Also, filters defined on attributes in the outer dimension table are not driven into the measure group when the bitmaps there are scanned. This may result in reading too much data from disk to answer user queries. Leaving a dimension as non-materialized prioritizes modeling flexibility over query performance. Consider carefully whether you can afford this tradeoff: cubes are typically intended to be fast ad-hoc structures, and putting the performance burden on the end user is rarely a good idea.
Analysis Services has the ability to materialize the reference dimension. When you enable this option, memory and disk structures are created that make the dimension behave just like a denormalized star schema. This means that you will retain all the performance benefits of a regular, non-reference dimension. However, be careful with materialized reference dimensions: if you run a process update on the intermediate dimension, any changes in the relationships between the outer dimension and the reference will not be reflected in the cube. Instead, the original relationship between the outer dimension and the measure group is retained, which is most likely not the desired result. In a way, you can consider the reference table to have a rigid relationship to the attributes in the outer dimension. The only way to reflect changes in the reference table is to fully process the dimension.
2.1.1.10 Fast-Changing Attributes
Some data models contain attributes that change very fast. Depending on which type of history tracking you need, you may face different challenges.
Type 2 Fast-Changing Attributes - If you track every change to a fast-changing attribute, this may cause the dimension containing the attribute to grow very large. Type 2 attributes are typically added to a dimension with a ProcessAdd command. At some point, running ProcessAdd on a large dimension and running all the consistency checks will take a long time. Also, having a huge dimension is unwieldy because users will have trouble querying it and the server will have trouble keeping it in memory. A good example of such a modeling challenge is the age of a customer: this will change every year and cause the customer dimension to grow dramatically.
Type 1 Fast-Changing Attributes - Even if you do not track every change to the attribute, you may still run into issues with fast-changing attributes. To reflect a change in the data source to the cube, you have to run ProcessUpdate on the changed dimension. As the cube and dimension grow larger, running ProcessUpdate becomes expensive. An example of such a modeling challenge is to track the status attribute of a server in a hosting environment (“Running”, “Shutdown”, “Overloaded” and so on). A status attribute like this may change several times per day or even per hour. Running frequent ProcessUpdates on such a dimension to reflect changes can be an expensive operation, and it may not be feasible with the locking implementation of Analysis Services in a production environment.
In the following sections, we will look at some modeling options you can use to address these problems.
2.1.1.10.1 Type 2 Fast-Changing Attributes
If history tracking is a requirement of a fast-changing attribute, the best option is often to use the fact table to track history. This is best illustrated with an example. Consider again the customer dimension with the age attribute. Modeling the Age attribute directly in the customer dimension produces a design like this.
Figure 4: Age in customer dimension
Notice that every time Thomas has a birthday, a new row is added in the dimension table. The alternative design approach splits the customer dimension into two dimensions like this.
Figure 5: Age in its own dimension
Note that there are some restrictions on the situation where this design can be applied. It works best when the changing attribute takes on a small, distinct set of values. It also adds complexity to the design; by adding more dimensions to the model, it creates more work for the ETL developers when the fact table is loaded. Also, consider the storage impact on the fact table: with the alternative design, the fact table becomes wider, and more bytes have to be stored per row.
2.1.1.10.2 Type 1 Fast-Changing Attributes
Your business requirements may call for updating an attribute of a dimension at high frequency, daily or even hourly. For a small cube, running ProcessUpdate will help you address this issue. But as the cube grows larger, the run time of ProcessUpdate can become too long for the batch window or the real-time requirements of the cube (you can read more about tuning process update in the processing section).
Consider again the server hosting example: You may want to track the status, which changes frequently, of all servers. For the example, let us say that the server dimension is used by a fact table tracking performance counters. Assume you have modeled it like this.
Figure 6: Status column in server dimension
The problem with this model is the Status column. If the FactCounter is large and status changes a lot, ProcessUpdate will take a very long time to run. To optimize, consider this design instead.
Figure 7: Status column in its own dimension
If you implement DimServer as the intermediate reference table to DimServerStatus, Analysis Services no longer has to keep track of the metadata in the FactCounter when you run ProcessUpdate on DimServerStatus. But as described earlier, this means that the join to DimServerStatus will happen at run time, increasing CPU cost and query times. It also means that you cannot index attributes in DimServer because the intermediate dimension is not materialized. You have to carefully balance the tradeoff between processing time and query speeds.
2.1.1.11 Large Dimensions
In SQL Server 2005, SQL Server 2008, and SQL Server 2008 R2, Analysis Services has some built-in limitations that limit the size of the dimensions you can create. First of all, it takes time to update a dimension; this is expensive because all indexes on fact tables have to be considered for invalidation when an attribute changes. Second, string values in dimension attributes are stored on a disk structure called the string store. This structure has a size limitation of 4 GB. If a dimension contains attributes where the total size of the string values (this includes translations) exceeds 4 GB, you will get an error during processing. The next version of SQL Server Analysis Services, code-named “Denali”, is expected to remove this limitation.
Consider for a moment a dimension with tens or even hundreds of millions of members. Such a dimension can be built and added to a cube, even on SQL Server 2005, SQL Server 2008, and SQL Server 2008 R2. But what does such a dimension mean to an ad-hoc user? How will the user navigate it? Which hierarchies will group the members of this dimension into reasonable sizes that can be rendered on a screen? While it may make sense for some reporting purposes to search for individual members in such a dimension, it may not be the right problem to solve with a cube.
When you build cubes, ask yourself: is this a cube problem? For example, think of this typical telco model of call detail records.
Figure 8: Call detail records (CDRs)
In this particular example, there are 300 million customers in the data model. There is no good way to group these customers and allow ad-hoc access to the cube at reasonable speeds. Even if you manage to optimize the space used to fit in the 4-GB string store, how would users browse a customer dimension like this?
If you find yourself in a situation where a dimension becomes too large and unwieldy, consider building the cube on top of an aggregate. For the telco example, imagine a transformation like the following.
Figure 9: Cube built on aggregate
Using an aggregated fact table, this turns a 300-million-row dimension problem into a 100,000-row dimension problem. You can consider aggregating the facts to save storage too; alternatively, you can add a demographics key directly to the original fact table, process on top of this data source, and rely on MOLAP compression to reduce data sizes.
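
The idea of building the cube on an aggregate can be sketched in a few lines of Python: each call detail record is rolled up from the individual customer to a much smaller demographics group. Because the figure is not reproduced here, every column name, the customer-to-demographics mapping, and the function name below are assumptions made for illustration; in practice this step would normally run in the relational source or the ETL process.

```python
from collections import defaultdict


def aggregate_calls(cdr_rows, customer_demographics):
    """Roll call detail records up to (date_key, demographics_key)."""
    totals = defaultdict(lambda: {"call_count": 0, "duration_sec": 0})
    for row in cdr_rows:
        # Replace the high-cardinality customer key with its demographics group.
        key = (row["date_key"], customer_demographics[row["customer_key"]])
        totals[key]["call_count"] += 1
        totals[key]["duration_sec"] += row["duration_sec"]
    return dict(totals)


# Hypothetical data: customers 1 and 2 share demographics group 10.
customer_demographics = {1: 10, 2: 10, 3: 20}
cdr_rows = [
    {"date_key": 20120501, "customer_key": 1, "duration_sec": 60},
    {"date_key": 20120501, "customer_key": 2, "duration_sec": 120},
    {"date_key": 20120501, "customer_key": 3, "duration_sec": 30},
    {"date_key": 20120502, "customer_key": 1, "duration_sec": 45},
]

aggregated = aggregate_calls(cdr_rows, customer_demographics)
for key, measures in sorted(aggregated.items()):
    print(key, measures)
```

The aggregated rows reference only the demographics key, so the cube dimension built on them has one member per demographics group instead of one member per customer.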
2.1.2 Partitioning a Cube
Partitions separate measure group data into physical storage units. Effective use of partitions can enhance query performance, improve processing performance, and facilitate data management. This section specifically addresses how you can use partitions to improve query performance. You must often make a tradeoff between query and processing performance in your partitioning strategy.
You can use multiple partitions to break up your measure group into separate physical components. The advantages of partitioning for improving query performance are partition elimination and aggregation design.
Partition elimination - Partitions that do not contain data in the subcube are not queried at all, thus avoiding the cost of reading the index (or scanning a table if the server is in ROLAP mode). While reading a partition index and finding no available rows is a cheap operation, as the number of concurrent users grows, these reads begin to put a strain on the thread pool. Also, for queries that do not have indexes to support them, Analysis Services will have to scan all potentially matching partitions for data.
Aggregation design - Each partition can have its own or a shared aggregation design. Therefore, partitions queried more often or differently can have their own designs.
Figure 10: Intelligent querying by partitions
Figure 10 displays the Profiler trace of a query requesting Reseller Sales Amount by Business Type from Adventure Works. The Reseller Sales measure group of the Adventure Works cube contains four partitions: one for each year. Because the query slices on 2003, the storage engine can go directly to the 2003 Reseller Sales partition and ignore other partitions.
2.1.2.1 Partition Slicing
Partitions are bound to a source table, view, or source query. When the formula engine requests a subcube, the storage engine looks at the metadata of the partitions for the relevant measure group. Each partition may contain a slice definition, a high-level description of the minimum and maximum attribute DataIDs that exist in that dimension. If it can be determined from the slice definition that the requested subcube data is not present in the partition, that partition is ignored. If the slice definition is missing or if the information in the slice indicates that required data is present, the partition is accessed by first looking at the indexes (if any) and then scanning the partition segments.
The slice of a partition can be set in two ways:
Auto slice - When Analysis Services reads the data during processing, it keeps track of the minimum and maximum attribute DataIDs that it reads. These values are used to set the slice when the indexes are built on the partition.
Manual slice - There are cases where auto slice will not work; these are described in the next section. For those situations, you can manually set the slice. Manual slices are the only available slice option for ROLAP partitions and proactive caching partitions.
2.1.2.1.1 Auto Slice
During processing of MOLAP partitions, Analysis Services internally identifies the range of data that is contained in each partition by using the Min and Max DataIDs of each attribute. The data range for each attribute is then combined to create the slice definition for the partition.
The Min and Max DataIDs can specify either a single member or a range of members. For example, partitioning by year results in the same Min and Max DataID slice for the year attribute, and queries to a specific moment in time only result in partition queries to that year’s partition.
It is important to remember that the partition slice is maintained as a range of DataIDs that you have no explicit control over. DataIDs are assigned during dimension processing as new members are encountered. Because Analysis Services just looks at the minimum and maximum value of the DataID, you can end up reading partitions that don’t contain relevant data.
For example: if you have a partition, P2003_4, that contains both 2003 and 2004 data, you are not guaranteed that the minimum and maximum DataID in the slice contain values next to each other (even though the years are adjacent). In our example, let us say the DataID for 2003 is 42 and the DataID for 2004 is 45. Because you cannot control which DataID gets assigned to which members, you could be in a situation where the DataID for 2005 is 44. When a user requests data for 2005, Analysis Services looks at the slice for P2003_4, sees that it contains data in the interval 42 to 45 and therefore concludes that this partition has to be scanned to make sure it does not contain the values for DataID 44 (because 44 is between 42 and 45).
Because of this behavior, auto slice typically works best if the data contained in the partition maps to a single attribute value. When that is the case, the maximum and minimum DataID contained in the slice will be equal and the slice will work efficiently.
Note that the auto slice is not defined and indexes are not built for partitions with fewer rows
than IndexBuildThreshold (which has a default value of 4096).
2.1.2.1.2 Manually Setting Slices
No metadata is available to Analysis Services about the content of ROLAP and proactive caching
partitions. Because of this, you must manually identify the slice in the properties of the
partition. It is a best practice to manually set slices in ROLAP and proactive caching partitions.
However, as shown in the previous section, there are cases where auto slice will not give you the
desired partition elimination behavior. In these cases you can benefit from defining the slice
yourself for MOLAP partitions. For example, if you partition by year with some partitions
containing a range of years, defining the slice explicitly avoids the problem of overlapping
DataIDs. This can only be done with knowledge of the data, which is where you can add some
optimization as a BI developer.
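To make the difference concrete, the following Python sketch contrasts a range-based check (as auto slice is described in the previous section) with a check against explicitly declared members. It is a simplified model under stated assumptions, not the way Analysis Services evaluates slices internally; both function names are hypothetical, and the DataIDs are the ones from the earlier example.

```python
# DataIDs from the example in the previous section.
YEAR_DATA_IDS = {2003: 42, 2004: 45, 2005: 44}


def range_slice_matches(partition_years, requested_year):
    """Auto-slice style check: compare against the min/max DataID only."""
    ids = [YEAR_DATA_IDS[year] for year in partition_years]
    return min(ids) <= YEAR_DATA_IDS[requested_year] <= max(ids)


def member_slice_matches(slice_years, requested_year):
    """Manual-slice style check: compare against the declared members."""
    return requested_year in slice_years


p2003_4_years = {2003, 2004}

print(range_slice_matches(p2003_4_years, 2005))   # True: partition is scanned
print(member_slice_matches(p2003_4_years, 2005))  # False: partition is skipped
```

The explicit member list encodes knowledge of the data that the DataID range alone cannot express.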
It is generally not a best practice to create partitions before you are ready to fill them with
data. But for real-time cubes, it is sometimes a good idea to create partitions in advance to avoid
locking issues. When you take this approach, it is also a good idea to set a manual slice on MOLAP
partitions to make sure the storage engine does not spend time scanning empty partitions.
2.1.2.2 Partition Sizing
For non-distinct count measure groups, tests with partition sizes in the range of 200 MB up to
3 GB indicate that partition size alone does not have a substantial impact on query speeds. In
fact, we have seen good query performance on deployed partitions larger than 3 GB.
The following graph shows four different query runs with different partition sizes (the vertical
axis is total run time in hours). Performance is comparable between partition sizes and is only
affected by the design of the security features in this particular customer cube.
Figure 11: Throughput by partition size (higher is better)
The partitioning strategy should be based on these factors:
Increasing processing speed and flexibility
Increasing manageability of bringing in new data
Increasing query performance from partition elimination as described earlier
Support for different aggregation designs
As you add more partitions, the metadata overhead of managing the cube grows exponentially. This
affects ProcessUpdate and ProcessAdd operations on dimensions, which have to traverse the metadata
dependencies to update the cube when dimensions change. As a rule of thumb, you should therefore
seek to keep the number of partitions in the cube in the low thousands, while at the same time
balancing the requirements discussed here.
For large cubes, prefer larger partitions over creating too many partitions. This also means that
you can safely ignore the Analysis Management Objects (AMO) warning in Microsoft Visual Studio that
partition sizes should not exceed 20 million rows.
2.1.2.3 Partition Strategy
From the guidance on partition sizing, we can develop some common design patterns for partition
strategies.
2.1.2.3.1 Partition by Date
Most cubes are built on at least one column containing a date. Because data often arrives in
monthly, weekly, daily, or even hourly slices, it makes sense to partition the cube on date.
Partitioning on date allows you to replace a full day in case you load faulty data. It allows you
to selectively archive old data by moving the partition to cheap storage. And finally, it allows
you to easily get rid of data, by removing an entire partition. Typically, a date partitioning
scheme looks somewhat like this.
Figure 12: Partitioning by Date
Note that in order to move the partition to cheaper storage, you will have to change the data
location and reprocess the partition. This design works very well for small to medium-sized cubes.
It is reasonably simple to implement and the number of partitions is kept low. However, it does
suffer from a few drawbacks:
1. If the granularity of the partitioning is small enough (for example, hourly), the number of
partitions can quickly become unmanageable.
2. Assuming data is added only to the latest partition, partition processing is limited to one
TCP/IP connection reading from the data source. If you have a lot of data, this can be a
scalability limit.
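The first drawback is easy to quantify. The following Python sketch counts the date partitions needed to cover a period at different granularities; the `partition_count` function and the three-year period are illustrative assumptions, and the monthly and yearly branches assume that the period starts and ends on the first day of a month or year.

```python
from datetime import date


def partition_count(start: date, end: date, granularity: str) -> int:
    """Number of date partitions covering [start, end) at a given granularity."""
    days = (end - start).days
    if granularity == "hourly":
        return days * 24
    if granularity == "daily":
        return days
    if granularity == "monthly":
        # Assumes start and end both fall on the first day of a month.
        return (end.year - start.year) * 12 + (end.month - start.month)
    if granularity == "yearly":
        # Assumes start and end both fall on January 1.
        return end.year - start.year
    raise ValueError(f"unknown granularity: {granularity}")


# Three years of data, 2010 through 2012 (2012 is a leap year).
start, end = date(2010, 1, 1), date(2013, 1, 1)
counts = {g: partition_count(start, end, g)
          for g in ("hourly", "daily", "monthly", "yearly")}
print(counts)
# {'hourly': 26304, 'daily': 1096, 'monthly': 36, 'yearly': 3}
```

Under these assumptions, hourly partitions for only three years already exceed the "low thousands" rule of thumb from the partition sizing section by an order of magnitude.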
Ad 1) If you have a lot of date-based partitions, it is often a good idea to merge the older ones
into large partitions. You can do this either by using the Analysis Services merge functionality
or by dropping the old partitions, creating a new, larger partition, and then reprocessing it.
Reprocessing will typically take longer than merging, but we have found that compression of the
partition can often increase if you reprocess. A modified date partitioning scheme may look like
this.
Figure 13: Modified Date Partitioning
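As a rough illustration of the merge approach, the following Python sketch plans which monthly partitions to merge into yearly partitions while leaving the most recent months untouched. It only produces a plan; it does not call Analysis Services. The `plan_merges` function, the `P<year>_<month>` naming convention, and the `keep_recent` parameter are hypothetical choices made for this example.

```python
from collections import defaultdict


def plan_merges(monthly_partitions, keep_recent):
    """Group all but the newest `keep_recent` monthly partitions by year.

    monthly_partitions: iterable of (year, month) tuples.
    Returns (merges, untouched): merges maps a target yearly partition name
    to the monthly partition names to merge into it.
    """
    ordered = sorted(monthly_partitions)
    split = max(len(ordered) - keep_recent, 0)
    old, recent = ordered[:split], ordered[split:]

    by_year = defaultdict(list)
    for year, month in old:
        by_year[year].append(f"P{year}_{month:02d}")

    merges = {f"P{year}": names for year, names in by_year.items()}
    untouched = [f"P{year}_{month:02d}" for year, month in recent]
    return merges, untouched


# Eighteen monthly partitions: all of 2010 plus January to June 2011.
months = [(2010, m) for m in range(1, 13)] + [(2011, m) for m in range(1, 7)]
merges, untouched = plan_merges(months, keep_recent=3)

print(len(merges["P2010"]))  # 12
print(merges["P2011"])       # ['P2011_01', 'P2011_02', 'P2011_03']
print(untouched)             # ['P2011_04', 'P2011_05', 'P2011_06']
```

Whether each planned group is then merged or dropped and reprocessed is the trade-off described above: merging is typically faster, while reprocessing can yield better compression.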
This design addresses the metadata overhead of having too many partitions. But it is still
bottlenecked by the maximum speed of the ProcessAdd or ProcessFull for the latest partition. If
your data source is SQL Server, the speed of a single database connection can be hundreds of
thousands of rows every second, which works well for most scenarios. But if the cube requires even
faster processing speeds, consider matrix partitioning.
2.1.2.3.2 Matrix Partitioning
For large cubes, it is often a good idea to implement a matrix partitioning scheme: partition on
both date and some other key. The date partitioning is used to selectively delete or merge old
partitions as described earlier. The other key can be used to achieve parallelism during partition
processing and to restrict certain users to a subset of the partitions. For example, consider a
retailer that operates in the US, Europe, and Asia. You might decide to partition like this.
Figure 14: Example of matrix partitioning
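A matrix scheme is the cross product of the date key and the second key. The following Python sketch generates one partition definition per (year, region) cell and lists the partitions of a given year that could be processed in parallel. The function names, the partition naming convention, and the `YearKey` and `RegionKey` column names in the filter are all hypothetical and would need to match your own relational schema.

```python
from itertools import product


def matrix_partitions(years, regions):
    """Build one partition definition per (year, region) cell."""
    return [
        {
            "name": f"P{year}_{region}",
            "year": year,
            "region": region,
            # Hypothetical source filter; column names are assumptions.
            "where": f"YearKey = {year} AND RegionKey = '{region}'",
        }
        for year, region in product(years, regions)
    ]


def parallel_batch(cells, year):
    """Partitions of one year, each of which can use its own connection."""
    return [cell["name"] for cell in cells if cell["year"] == year]


cells = matrix_partitions([2011, 2012], ["US", "Europe", "Asia"])
print(len(cells))                   # 6
print(parallel_batch(cells, 2012))  # ['P2012_US', 'P2012_Europe', 'P2012_Asia']
print(cells[0]["where"])            # YearKey = 2011 AND RegionKey = 'US'
```

Because the latest date period is split across several partitions, loading it is no longer limited to a single connection to the data source.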
If the retailer grows, they may choose to split the region partitions into smaller partitions to
further increase the parallelism of the load and to limit the worst-case scans that a user can
perform. For cubes that are expected to grow dramatically, it is a good idea to choose a partition
key that grows with the business and gives you options for extending the matrix partitioning
strategy appropriately. The following table contains examples of such partitioning keys.
Industry       Example partition key      Source of data proliferation
Web retail     Customer key               Adding customers and transactions
Store retail   Store key                  Adding new stores
Data hosting   Host ID or rack location   Adding a new server