Building Scalable Web Sites
By Cal Henderson
...............................................
Publisher: O'Reilly
Pub Date: May 2006
Print ISBN-10: 0-596-10235-6
Print ISBN-13: 978-0-59-610235-7
Pages: 348

Slow web sites infuriate users. Lots of people can visit your web site
or use your web application, but you have to be prepared for those
visitors, or they won't come back. Your sites need to be built to
withstand the problems success creates.

Building Scalable Web Sites looks at a variety of techniques for
creating sites that can keep users cheerful even when there are
thousands or millions of them. Flickr.com developer, Cal Henderson,
explains how to build sites so that large numbers of visitors can enjoy
them. Henderson examines techniques that go beyond sheer speed,
exploring how to coordinate developers, support international users,
and integrate with other services from email to SOAP to RSS to the APIs
exposed by many Ajax-based web applications.

This book uncovers the secrets that you need to know for back-end
scaling, architecture and failover so your web sites can handle
countless requests. You'll learn how to take the "poor man's web
technologies" (Linux, Apache, MySQL and PHP or other scripting
languages) and scale them to compete with established "store bought"
enterprise web technologies. Toward the end of the book, you'll
discover techniques for keeping web applications running with event
monitoring and long-term statistical tracking for capacity planning.

If you're about to build your first dynamic web site, then Building
Scalable Web Sites isn't for you. But if you're an advanced developer
who's ready to realize the cost and performance benefits of a
comprehensive approach to scalable applications, then let your fingers
do the walking through this convenient guide.

Copyright
Preface
Chapter 1. Introduction
Section 1.1. What Is a Web Application?
Section 1.2. How Do You Build Web Applications?
Section 1.3. What Is Architecture?
Section 1.4. How Do I Get Started?
Chapter 2. Web Application Architecture
Section 2.1. Layered Software Architecture
Section 2.2. Layered Technologies
Section 2.3. Software Interface Design
Section 2.4. Getting from A to B
Section 2.5. The Software/Hardware Divide
Section 2.6. Hardware Platforms
Section 2.7. Hardware Platform Growth
Section 2.8. Hardware Redundancy
Section 2.9. Networking
Section 2.10. Languages, Technologies, and Databases
Chapter 3. Development Environments
Section 3.1. The Three Rules
Section 3.2. Use Source Control
Section 3.3. One-Step Build
Section 3.4. Issue Tracking
Section 3.5. Scaling the Development Model
Section 3.6. Coding Standards
Section 3.7. Testing
Chapter 4. i18n, L10n, and Unicode
Section 4.1. Internationalization and Localization
Section 4.2. Unicode in a Nutshell
Section 4.3. Unicode Encodings
Section 4.4. The UTF-8 Encoding
Section 4.5. UTF-8 Web Applications
Section 4.6. Using UTF-8 with PHP
Section 4.7. Using UTF-8 with Other Languages
Section 4.8. Using UTF-8 with MySQL
Section 4.9. Using UTF-8 with Email
Section 4.10. Using UTF-8 with JavaScript
Section 4.11. Using UTF-8 with APIs
Chapter 5. Data Integrity and Security
Section 5.1. Data Integrity Policies
Section 5.2. Good, Valid, and Invalid
Section 5.3. Filtering UTF-8
Section 5.4. Filtering Control Characters
Section 5.5. Filtering HTML
Section 5.6. Cross-Site Scripting (XSS)
Section 5.7. SQL Injection Attacks
Chapter 6. Email
Section 6.1. Receiving Email
Section 6.2. Injecting Email into Your Application
Section 6.3. The MIME Format
Section 6.4. Parsing Simple MIME Emails
Section 6.5. Parsing UUEncoded Attachments
Section 6.6. TNEF Attachments
Section 6.7. Wireless Carriers Hate You
Section 6.8. Character Sets and Encodings
Section 6.9. Recognizing Your Users
Section 6.10. Unit Testing
Chapter 7. Remote Services
Section 7.1. Remote Services Club
Section 7.2. Sockets
Section 7.3. Using HTTP
Section 7.4. Remote Services Redundancy
Section 7.5. Asynchronous Systems
Section 7.6. Exchanging XML
Section 7.7. Lightweight Protocols
Chapter 8. Bottlenecks
Section 8.1. Identifying Bottlenecks
Section 8.2. External Services and Black Boxes
Chapter 9. Scaling Web Applications
Section 9.1. The Scaling Myth
Section 9.2. Scaling the Network
Section 9.3. Load Balancing
Section 9.4. Scaling MySQL
Section 9.5. MyISAM
Section 9.6. MySQL Replication
Section 9.7. Database Partitioning
Section 9.8. Scaling Large Database
Section 9.9. Scaling Storage
Chapter 10. Statistics, Monitoring, and Alerting
Section 10.1. Tracking Web Statistics
Section 10.2. Application Monitoring
Section 10.3. Alerting
Chapter 11. APIs
Section 11.1. Data Feeds
Section 11.2. Mobile Content
Section 11.3. Web Services
Section 11.4. API Transports
Section 11.5. API Abuse
Section 11.6. Authentication
Section 11.7. The Future
About the Author
Colophon
Index

Building Scalable Web Sites
by Cal Henderson

Copyright © 2006 O'Reilly Media, Inc. All rights reserved.
Printed in the United States of America.

Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North,
Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales
promotional use. Online editions are also available for most titles
(safari.oreilly.com). For more information, contact our
corporate/institutional sales department: (800) 998-9938 or

Editor: Simon St.Laurent
Production Editor: Adam Witwer
Copyeditor: Adam Witwer
Proofreader: Colleen Gorman
Indexer: John Bickelhaupt
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrators: Robert Romano and Jessamyn Read

Printing History:
May 2006: First Edition.

Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo
are registered trademarks of O'Reilly Media, Inc. Building Scalable Web
Sites, the image of a carp, and related trade dress are trademarks of
O'Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to
distinguish their products are claimed as trademarks. Where those
designations appear in this book, and O'Reilly Media, Inc. was aware of
a trademark claim, the designations have been printed in caps or
initial caps.

While every precaution has been taken in the preparation of this book,
the publisher and author assume no responsibility for errors or
omissions, or for damages resulting from the use of the information
contained herein.

ISBN: 0-596-10235-6
[M]

Preface

The first web application I built was called Terrania. A visitor could
come to the web site, create a virtual creature with some
customizations, and then track that creature's progress through a
virtual world. Creatures would wander about, eat plants (or other
creatures), fight battles, and mate with other players' creatures. This
activity would then be reported back to players by twice-daily emails
summarizing the day's events.

Calling it a web application is a bit of a stretch; at the time I
certainly wouldn't have categorized it as such. The core of the game
was a program written in C++ that ran on a single machine, loading game
data from a single flat file, processing everything for the game
"tick," and storing it all again in a single flat file. When I started
building the game, the runtime was destined to become the server
component of a client-server game architecture. Programming network
data-exchange at the time was a difficult process that tended to
involve writing a lot of rote code just to exchange strings between a
server and client (we had no .NET in those days).

The Web gave application developers a ready-to-use platform for content
delivery across a network, cutting out the trickier parts of
client-server applications. We were free to build the server that did
the interesting parts while building a client in simple HTML that was
trivial in comparison. What would have traditionally been the client
component of Terrania resided on the server, simply accessing the same
flat file that the game server used. For most pages in the "client"
application, I simply loaded the file into memory, parsed out the
creatures that the player cared about, and displayed back some static
information in HTML. To create a new creature, I appended a block of
data to the end of a second file, which the server would then pick up
and process each time it ran, integrating the new creatures into the
game. All game processing, including the sending of progress emails,
was done by the server component. The web server "client" interface was
a simple C++ CGI application that could parse the game data file in a
couple of hundred lines of source.

This system was pretty satisfactory; perhaps I didn't see the
limitations at the time because I didn't come up against any of them.
The lack of interactivity through the web interface wasn't a big deal
as that was part of the game design. The only write operation performed
by a player was the initial creation of the creature, leaving the rest
of the game as a read-only process. Another issue that didn't come up
was concurrency. Since Terrania was largely read-only, any number of
players could generate pages simultaneously. All of the writes were
simple file appends that were fast enough to avoid spinning for locks.
Besides, there weren't enough players for there to be a reasonable
chance of two people reading or writing at once.

A few years would pass before I got around to working with something
more closely resembling a web application. While working for a new
media agency, I was asked to modify some of the HTML output by a
message board powered by UBB (Ultimate Bulletin Board, from Groupee,
Inc.). UBB was written in Perl and ran as a CGI. Application data
items, such as user accounts and the messages that comprised the
discussion, were stored in flat files using a custom format. Some pages
of the application were dynamic, being created on the fly from data
read from the flat files. Other pages, such as the discussions
themselves, were flat HTML files that were written to disk by the
application as needed. This render-to-disk technique is still used in
low-write, high-read setups such as weblogs, where the cost of
generating the viewed pages on the fly outweighs the cost of writing
files to disk (which can be a comparatively very slow operation).

The great thing about the UBB was that it was written in a "scripting"
language, Perl. Because the source code didn't need to be compiled, the
development cycle was massively reduced, making it much easier to
tinker with things without wasting days at a time. The source code was
organized into three main files: the endpoint scripts that users
actually requested and two library files containing utility functions
(called ubb_library.pl and ubb_library2.pl, seriously).

After a little experience working with UBB for a few commercial
clients, I got fairly involved with the message board "hacking"
community, a strange group of people who spent their time trying to add
functionality to existing message board software. I started a site
called UBB Hackers with a guy who later went on to be a programmer for
Infopop, writing the next version of UBB.

Early on, UBB had very poor concurrency because it relied on
nonportable file-locking code that didn't work on Windows (one of the
target platforms). If two users were replying to the same thread at the
same time, the thread's data file could become corrupted and some of
the data lost. As the number of users on any single system increased,
the chance for data corruption and race conditions increased. For
really active systems, rendering HTML files to disk quickly bottlenecks
on file I/O. The next step now seems like it should have been obvious,
but at the time it wasn't.

MySQL 3 changed a lot of things in the world of web applications.
Before MySQL, it wasn't as easy to use a database for storing web
application data. Existing database technologies were either
prohibitively expensive (Oracle), slow and difficult to work with
(FileMaker), or insanely complicated to set up and maintain
(PostgreSQL). With the availability of MySQL 3, things started to
change. PHP 4 was just starting to get widespread acceptance and the
phpMyAdmin project had been started. phpMyAdmin meant that web
application developers could start working with databases without the
visual design oddities of FileMaker or the arcane SQL syntax knowledge
needed to drive things on the command line. I can still never remember
the correct syntax for creating a table or granting access to a new
user, but now I don't need to.

MySQL brought application developers concurrency: we could read and
write at the same time and our data would never get inadvertently
corrupted. As MySQL progressed, we got even higher concurrency and
massive performance, miles beyond what we could have achieved with flat
files and render-to-disk techniques. With indexes, we could select data
in arbitrary sets and orders without having to load it all into memory
and walk the data structure. The possibilities were endless.

And they still are.

The current breed of web applications is still pushing the boundaries
of what can be done in terms of scale, functionality, and
interoperability. With the explosion of public APIs, the ability to
combine multiple applications to create new services has made for a
service-oriented culture. The API service model has shown us clear ways
to architect our applications for flexibility and scale at a low cost.

The largest and most popular web applications of the moment, such as
Flickr, Friendster, MySpace, and Wikipedia, handle billions of database
queries per day, have huge datasets, and run on massive hardware
platforms comprised of commodity hardware. While Google might be the
poster child of huge applications, these other smaller (though still
huge) applications are becoming role models for the next generation of
applications, now labeled Web 2.0. With increased read/write
interactivity, network effects, and open APIs, the next generation of
web application development is going to be very interesting.

What This Book Is About

This book is primarily about web application design: the design of
software and hardware systems for web applications. We'll be looking at
application architecture, development practices, technologies, Unicode,
and general infrastructural work.

Perhaps as importantly, this book is about the development of web
applications: the practice of building the hardware and implementing
the software systems that we design. While the theory of application
design is all well and good (and an essential part of the whole
process), we need to recognize that the implementation plays a very
important part in the construction of large applications and needs to
be borne in mind during the design process. If we're designing things
that we can't build, then we can't know if we're designing the right
thing.

This book is not about programming. At least, not really. Rather than
talking about snippets of code, function names, and so forth, we'll be
looking at generalized techniques and approaches for building web
applications. While the book does contain some snippets of example
code, they are just that: examples. Most of the code examples in this
book can be used only in the context of a larger application or
infrastructure.

A lot of what we'll be looking at relates to designing application
architectures and building application infrastructures. In the field of
web applications, infrastructures tend to mean a combination of
hardware platform, software platform, and maintenance and development
practices. We'll consider how all of these fit together to build a
seamless infrastructure for large-scale applications.

The largest chapter in this book (Chapter 9) deals solely with scaling
applications: architectural approaches to design for scalability as
well as technologies and techniques that can be used to help scale
existing systems. While we can hardly cover the whole field in a single
chapter (we could barely cover the basics in an entire book), we've
picked a couple of the most useful approaches for applications with
common requirements. It should be noted, however, that this is hardly
an exhaustive guide to scaling, and there's plenty more to learn. For
an introduction to the wider world of scalable infrastructures, you
might want to pick up a copy of Performance by Design: Computer
Capacity Planning by Example (Prentice Hall).

Toward the end of the book (Chapters 10 and 11), we look at techniques
for keeping web applications running with event monitoring and
long-term statistical tracking for capacity planning. Monitoring and
alerting are core skills for anyone looking to create an application
and then manage it for any length of time. For applications with custom
components, or even just many components, the task of designing and
building the probes and monitors often falls to the application
designers, since they should best know what needs to be tracked and
what constitutes an alertable state. For every component of our system,
we need to design some way to check that it's both working and working
correctly.

In the last chapter, we'll look at techniques for sharing data and
allowing other applications to integrate with our own via data feeds
and read/write APIs. While we'll be looking at the design of component
APIs throughout the book as we deal with different components in our
application, the final chapter deals with ways to present those
interfaces to the outside world in a safe and accessible manner. We'll
also look at the various standards that have evolved for data export
and interaction and look at approaches for presenting them from our
application.

What You Need to Know

This book is not meant for people building their first dynamic web
site. There are plenty of good books for first timers, so we won't be
attempting to cover that ground here. As such, you'll need to have a
little experience with building dynamic web sites or applications. At a
minimum you should have a little experience of exposing data for
editing via web pages and managing user data.

While this book isn't aimed solely at implementers, there are a number
of practical examples. To fully appreciate these examples, a basic
knowledge of programming is required. While you don't need to know
about continuations or argument currying, you'll need to have a working
knowledge of simple control structures and the basic von Neumann
input-process-storage-output model.

Along with the code examples, we'll be looking at quite a few examples
on the Unix command line. Having access to a Linux box (or other Unix
flavor) will make your life a lot easier. Having a server on which you
can follow along with the commands and code will make everything easier
to understand and have immediate practical usage. A working knowledge
of the command line is assumed, so I won't be telling you how to launch
a shell, execute a command, or kill a process. If you're new to the
command line, you should pick up an introductory book before going much
further; command-line experience is essential for Unix-based
applications and is becoming more important even for Windows-based
applications.

While the techniques in this book can be equally applied to any number
of modern technologies, the examples and discussions will deal with a
set of four core technologies upon which many of the largest
applications are built. PHP is the main glue language used in most code
examples; don't worry if you haven't used PHP before, as long as you've
used another C-like language. If you've worked with C, C++, Java,
JavaScript, or Perl, then you'll pick up PHP in no time at all and the
syntax should be immediately understandable.

For secondary code and utility work, there are some examples in Perl.
While Perl is also usable as a main application language, it's most
capable in a command-line scripting and data-munging role, so it is
often the sensible choice for building administration tools. Again, if
you've worked with a C-like language, then Perl syntax is a cinch to
pick up, so there's no need to run off and buy the camel book just yet.

For the database component of our application, we'll focus primarily on
MySQL, although we'll also touch on the other big three (Oracle, SQL
Server, and PostgreSQL). MySQL isn't always the best tool for the job,
but it has many advantages over the others: it's easy to set up,
usually good enough, and probably most importantly, free. For
prototyping or building small-scale applications, MySQL's low-effort
setup and administration, combined with tools like phpMyAdmin, make it
a very attractive choice. That's not to say that there's no space for
other database technologies for building web applications, as all four
have extensive usage, but it's also important to note that MySQL can be
used for large-scale applications; many of the largest applications on
the Internet use it. A basic knowledge of SQL and database theory will
be useful when reading this book, as will an instance of MySQL on which
you can play about and connect to example PHP scripts.

To keep in line with a Unix environment, all of the examples assume
that you're using Apache as an HTTP server. To an extent, Apache is the
least important component in the tool chain, since we don't talk much
about configuring or extending it (that's a large field in itself).
While experience with Apache is beneficial when reading this book, it's
not essential. Experience with any web server software will be fine.

Practical experience with using the software is not the only
requirement, however. To get the most out of this book, you'll need to
have a working knowledge of the theory behind these technologies. For
each of the core protocols and standards we look at, I will cite the
RFC or specification (which tends to be a little dry and impenetrable)
and in most cases refer to important books in the field. While I'll
talk in some depth about HTTP, TCP/IP, MIME, and Unicode, other
protocols are referred to only in passing (you'll see over 200
acronyms). For a full understanding of the issues involved, you're
encouraged to find out about these protocols and standards yourself.

Conventions Used in This Book

Items appearing in the book are sometimes given a special appearance to
set them apart from the regular text. Here's how they look:

Italic
    Used for citations of books and articles, commands, email
    addresses, URLs, filenames, emphasized text, and first references
    to terms

Constant width
    Used for literals, constant values, code listings, and XML markup

Constant width italic
    Used for replaceable parameter and variable names

Constant width bold
    Used to highlight the portion of a code listing being discussed

Indicates a tip, suggestion, or general note. For example, we'll tell
you if a certain setting is version-specific.

Indicates a warning or caution. For example, we'll tell you if a
certain setting has some kind of negative impact on the system.

Using Code Examples

The examples from this book are freely downloadable from the book's web
site at

This book is here to help you get the job done. In general, you may use
the code in this book in your programs and documentation. You do not
need to contact us for permission unless you're reproducing a
significant portion of the code. For example, writing a program that
uses several chunks of code from this book does not require permission.
Selling or distributing a CD-ROM of examples from O'Reilly books does
require permission. Answering a question by citing this book and
quoting example code does not require permission. Incorporating a
significant amount of example code from this book into your product's
documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually
includes the title, author, publisher, and ISBN. For example: "Building
Scalable Web Sites by Cal Henderson. Copyright 2006 O'Reilly Media,
Inc., 0-596-10235-6."

If you feel that your use of code examples falls outside fair use or
the permission given here, feel free to contact us at

Safari® Enabled

When you see a Safari® Enabled icon on the cover of your favorite
technology book, that means the book is available online through the
O'Reilly Network Safari Bookshelf.

Safari offers a solution that's better than e-books. It's a virtual
library that lets you easily search thousands of top tech books, cut
and paste code samples, download chapters, and find quick answers when
you need the most accurate, current information. Try it for free at.

How to Contact Us

We have tested and verified the information in this book to the best of
our ability, but you may find that features have changed (or even that
we have made mistakes!). Please let us know about any errors you find,
as well as your suggestions for future editions, by writing to:

O'Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, or
any additional information. You can access this page at:

To comment or ask technical questions about this book, send email to:

You can sign up for one or more of our mailing lists at:

For more information about our books, conferences, software, Resource
Centers, and the O'Reilly Network, see our web site at:

Acknowledgments

I'd like to thank the original Flickr/Ludicorp team (Stewart
Butterfield, George Oates, and Eric Costello) for letting me help build
such an awesome product and have a chance to make something people
really care about. Much of the larger scale systems design work has
come from discussions with other fellow Ludicorpers: John Allspaw,
Serguei Mourachov, Dathan Pattishall, and Aaron Straup Cope.

I'd also like to thank my long-suffering partner Elina for not
complaining too much when I ignored her for months while writing this
book.

Chapter 1. Introduction

Before we dive into any design or coding work, we need to step back and
define our terms. What is it we're trying to do and how does it differ
from what we've done before? If you've already built some web
applications, you're welcome to skip ahead to the next chapter (where
we'll start to get a bit nerdier), but if you're interested in getting
some general context then keep on reading.

1.1. What Is a Web Application?

If you're reading this book, you probably have a good idea of what a
web application is, but it's worth defining our terms because the label
has been routinely misapplied. A web application is neither a web site
nor an application in the usual desktop-ian sense. A web application
sits somewhere between the two, with elements of both.

While a web site contains pages of data, a web application is comprised
of data with a separate delivery mechanism. While web accessibility
enthusiasts get excited about the separation of markup and style with
CSS, web application designers get excited about real data separation:
the data in a web application doesn't have to have anything to do with
markup (although it can contain markup). We store the messages that
comprise the discussion component of a web application separately from
the markup. When the time comes to display data to the user, we extract
the messages from our data store (typically a database) and deliver the
data to the user in some format over some medium (typically HTML over
HTTP). The important part is that we don't have to deliver the data
using HTML; we could just as easily deliver it as a PDF by email.
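
To make the idea of data separation concrete, here is a minimal sketch.
It is written in Python rather than the PHP used for the book's own
examples, and the MESSAGES list and both render functions are
hypothetical names invented for this illustration. The stored messages
carry no markup; each delivery format is produced only at output time.

```python
from html import escape

# Hypothetical message records, standing in for rows read from a data store.
MESSAGES = [
    {"author": "alice", "body": "Is 5 < 7?"},
    {"author": "bob", "body": "Yes & always."},
]


def render_html(messages):
    """Deliver the messages as an HTML fragment, escaping the stored text."""
    items = [
        "<li><b>{}</b>: {}</li>".format(escape(m["author"]), escape(m["body"]))
        for m in messages
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


def render_text(messages):
    """Deliver the same messages as plain text, e.g. for an email body."""
    return "\n".join("{}: {}".format(m["author"], m["body"]) for m in messages)
```

Calling render_html(MESSAGES) and render_text(MESSAGES) produces two
different deliveries of the same data; a further format would only need
another render function, with no change to the stored messages.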

Web applications don't have pages in the same way web sites do. While a
web application may appear to have 10 pages, adding more data to the
data store increases the page count without our having to add further
markup or source code to our application. With a feature such as
search, which is driven by user input, a web application can have a
near infinite number of "pages," but we don't have to enter each of
these as a blob of HTML. A small set of templates and logic allows us
to generate pages on the fly based on input parameters such as URL or
POST data.
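
As a rough sketch of one template plus a little logic yielding many
"pages", consider the following Python fragment (Python is used here
for illustration; the book's own examples are in PHP). ARTICLES, PAGE,
and render_page are hypothetical names, and the in-memory dictionary
stands in for a real data store.

```python
from html import escape
from string import Template
from urllib.parse import parse_qs

# Hypothetical data store: adding a record adds a "page" with no new markup.
ARTICLES = {
    "1": {"title": "Hello", "body": "First post"},
    "2": {"title": "Scaling", "body": "Second post"},
}

# A single template shared by every generated page.
PAGE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1><p>$body</p></body></html>"
)


def render_page(query_string):
    """Build a page on the fly from the 'id' parameter of a query string."""
    params = parse_qs(query_string)
    article_id = params.get("id", [""])[0]
    article = ARTICLES.get(article_id)
    if article is None:
        return PAGE.substitute(title="Not found", body="No such article")
    return PAGE.substitute(
        title=escape(article["title"]), body=escape(article["body"])
    )
```

With this arrangement, render_page("id=2") builds the second article's
page, and inserting a third entry into ARTICLES makes render_page("id=3")
work without touching the template or the function.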

To the average user, a web application can be indistinguishable from a
web site. For a simple weblog, we can't tell by looking at the
outputted markup whether the pages are being generated on the fly from
a data store or written as static HTML documents. The file extension
can give us a clue, but can be faked for good reason in either
direction. A web application tends to appear to be an application only
to those users who edit the application's data. This is often, although
not always, accomplished via an HTML interface, but could just as
easily be achieved using a desktop application that edits the data
store directly or remotely.

With the advent of Ajax (Asynchronous JavaScript and XML, previously
known as remote scripting or "remoting"), the interaction model for web
applications has been extended. In the past, users interacted with web
applications using a page-based model. A user would request a page from
the server, submit his changes using an HTTP POST, and be presented
with a new page, either confirming the changes or showing the modified
data. With Ajax, we can send our data modifications in the background
without changing the page the user is on, bringing us closer to the
desktop application interaction model.

The nature of web applications is slowly changing. It can't be denied
that we've already come a long way from the first interactive
applications on the Web, but there's still a fair way to go. With
applications like Google's Gmail and Microsoft's Office Live, the web
application market is moving toward applications delivered over the Web
with the features and benefits of desktop applications combined with
the benefits of web applications. While desktop applications give us
rich interactivity and speed, web applications can offer zero-effort
upgrades, truly portable data, and reduced client requirements.
Whatever the model of interaction, one thing remains constant: web
applications are systems with a core data set that can be accessed and
modified using web pages, with the possibility of other interfaces.

1.2. How Do You Build Web Applications?

To build a web application, we need to create at least two major
components: a hardware platform and software platform.

For small, simple applications, a hardware platform may comprise a
single shared server running a web server and a database. At small
scales we don't need to think about hardware as a component of our
applications, but as we start to scale out, it becomes a more and more
important part of the overall design. In this book we'll look
extensively at both sides of application design and engineering, how
they affect each other, and how we can tie the two together to create
an effective architecture.

Developers who have worked at the small scale might be asking
themselves why we need to bother with "platform design" when we could
just use some kind of out-of-the-box solution. For small-scale
applications, this can be a great idea. We save time and money up front
and get a working and serviceable application. The problem comes at
larger scales: there are no off-the-shelf kits that will allow you to
build something like Amazon or Friendster. While building similar
functionality might be fairly trivial, making that functionality work
for millions of products, millions of users, and without spending far
too much on hardware requires us to build something highly customized
and optimized for our exact needs. There's a good reason why the
largest applications on the Internet are all bespoke creations: no
other approach can create massively scalable applications within a
reasonable budget.

We've already said that at the core of web applications we have some
set of data that can be accessed and perhaps modified. Within the
software element of an application, we need to decide how we store that
data (a schema), how we access and modify it (business logic), and how
we present it to our users (interaction logic). In Chapter 2 we'll be
looking at these different components, how they interact, and what
comprises them. A good application design works down from the very top,
defining software and hardware architecture, the components that
comprise your platform, and the functionality implemented by those
layers.
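
The three software decisions named above (schema, business logic, and
interaction logic) can be sketched in a few lines. This is only an
illustration under stated assumptions: it uses Python and an in-memory
SQLite database as a stand-in for the PHP and MySQL the book focuses
on, and the table and function names are hypothetical.

```python
import sqlite3
from html import escape

# Schema: how the data is stored.
SCHEMA = """
CREATE TABLE messages (
    id     INTEGER PRIMARY KEY,
    author TEXT NOT NULL,
    body   TEXT NOT NULL
)
"""


def open_store():
    """Create an in-memory database holding the schema (stand-in for MySQL)."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    return db


# Business logic: how the data is accessed and modified.
def add_message(db, author, body):
    """Validate and store a message, returning its new row id."""
    if not body.strip():
        raise ValueError("message body must not be empty")
    cur = db.execute(
        "INSERT INTO messages (author, body) VALUES (?, ?)", (author, body)
    )
    db.commit()
    return cur.lastrowid


def list_messages(db):
    """Return all messages in insertion order as plain dictionaries."""
    rows = db.execute("SELECT author, body FROM messages ORDER BY id")
    return [{"author": a, "body": b} for a, b in rows]


# Interaction logic: how the data is presented to users.
def messages_page(db):
    """Render the stored messages as an HTML list."""
    items = "".join(
        "<li>{}: {}</li>".format(escape(m["author"]), escape(m["body"]))
        for m in list_messages(db)
    )
    return "<ul>" + items + "</ul>"
```

Only messages_page knows about HTML and only the business-logic
functions know about SQL, so either layer could be replaced without
rewriting the other.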

This book aims to be a practical guide to designing and building
large-scale applications. By the end of the book, you'll have a good
idea of how to go about designing an application and its architecture,
how to scale your systems, and how to go about implementing and
executing those designs.

1.3. What Is Architecture?

We like to talk about architecting applications, but what does that
really mean? When an architect designs a house, he has a fairly
well-defined task: gather requirements, explore the options, and
produce a blueprint. When the builders turn that blueprint into a
building, we expect a few things: the building should stay standing,
keep the rain and wind out, and let enough light in. Sorry to shatter
the illusion, but architecting applications is not much like this.

For a start, if buildings were like software, the architect would be
involved in the actual building process, from laying the foundations
right through to installing the fixtures. When he designed and built
the house, he would start with a couple of rooms and some basic
amenities, and some people would then come and start living there
before the building was complete. When it looked like the building work
was about to finish, a whole bunch more people would turn up and start
living there, too. But these new residents would need new features:
more bedrooms to sleep in, a swimming pool, a basement, and on and on.
The architect would design these new rooms and features, augmenting his
original design. But when the time came to build them, the current
residents wouldn't leave. They'd continue living in the house even
while it was extended, all the time complaining about the noise and
dust from the building work. In fact, against all reason, more people
would move in while the extensions were being built. By the time the
modifications were complete, more would be needed to house the
newcomers and keep them happy.

The key to good application architecture is planning for these issues
from the beginning. If the architect of our mythical house started out
by building a huge, complex house, it would be overkill. By the time it
was ready, the residents would have gone elsewhere to live in a smaller
house built in a fraction of the time. If we build in such a way that
extending our house takes too long, then our residents might move
elsewhere. We need to know how to start at the right scale and allow
our house to be extended as painlessly as possible.

That's not to say that we're going to get anything right the first
time. In the scaling of a typical application, every aspect and feature
is probably going to be revisited and refactored. That's fine; the task
of an application architect is to minimize the time it takes to
refactor each component, through careful initial and ongoing design.