Statistics For Big Data For Dummies®
Published by: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774,
www.wiley.com
Copyright © 2015 by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
Library of Congress Control Number: 2015943222
ISBN 978-1-118-94001-3 (pbk); ISBN 978-1-118-94002-0 (ePub); ISBN 978-1-118-94003-7 (ePDF)
Table of Contents
Cover
Introduction
About This Book
Foolish Assumptions
Icons Used in This Book
Beyond the Book
Where to Go From Here
Part I: Introducing Big Data Statistics
Chapter 1: What Is Big Data and What Do You Do with It?
Characteristics of Big Data
Exploratory Data Analysis (EDA)
Statistical Analysis of Big Data
Chapter 2: Characteristics of Big Data: The Three Vs
Characteristics of Big Data
Traditional Database Management Systems (DBMS)
Chapter 3: Using Big Data: The Hot Applications
Big Data and Weather Forecasting
Big Data and Healthcare Services
Big Data and Insurance
Big Data and Finance
Big Data and Electric Utilities
Big Data and Higher Education
Big Data and Retailers
Big Data and Search Engines
Big Data and Social Media
Chapter 4: Understanding Probabilities
The Core Structure: Probability Spaces
Discrete Probability Distributions
Continuous Probability Distributions
Introducing Multivariate Probability Distributions
Chapter 5: Basic Statistical Ideas
Some Preliminaries Regarding Data
Summary Statistical Measures
Overview of Hypothesis Testing
Higher-Order Measures
Part II: Preparing and Cleaning Data
Chapter 6: Dirty Work: Preparing Your Data for Analysis
Passing the Eye Test: Does Your Data Look Correct?
Being Careful with Dates
Does the Data Make Sense?
Frequently Encountered Data Headaches
Other Common Data Transformations
Chapter 7: Figuring the Format: Important Computer File Formats
Spreadsheet Formats
Database Formats
Chapter 8: Checking Assumptions: Testing for Normality
Goodness of fit test
Jarque-Bera test
Chapter 9: Dealing with Missing or Incomplete Data
Missing Data: What’s the Problem?
Techniques for Dealing with Missing Data
Chapter 10: Sending Out a Posse: Searching for Outliers
Testing for Outliers
Robust Statistics
Dealing with Outliers
Part III: Exploratory Data Analysis (EDA)
Chapter 11: An Overview of Exploratory Data Analysis (EDA)
Graphical EDA Techniques
EDA Techniques for Testing Assumptions
Quantitative EDA Techniques
Chapter 12: A Plot to Get Graphical: Graphical Techniques
Stem-and-Leaf Plots
Scatter Plots
Box Plots
Histograms
Quantile-Quantile (QQ) Plots
Autocorrelation Plots
Chapter 13: You’re the Only Variable for Me: Univariate Statistical Techniques
Counting Events Over a Time Interval: The Poisson Distribution
Continuous Probability Distributions
Chapter 14: To All the Variables We’ve Encountered: Multivariate Statistical Techniques
Testing Hypotheses about Two Population Means
Using Analysis of Variance (ANOVA) to Test Hypotheses about Population Means
The F-Distribution
F-Test for the Equality of Two Population Variances
Correlation
Chapter 15: Regression Analysis
The Fundamental Assumption: Variables Have a Linear Relationship
Defining the Population Regression Equation
Estimating the Population Regression Equation
Testing the Estimated Regression Equation
Using Statistical Software
Assumptions of Simple Linear Regression
Multiple Regression Analysis
Multicollinearity
Chapter 16: When You’ve Got the Time: Time Series Analysis
Key Properties of a Time Series
Forecasting with Decomposition Methods
Smoothing Techniques
Seasonal Components
Modeling a Time Series with Regression Analysis
Comparing Different Models: MAD and MSE
Part IV: Big Data Applications
Chapter 17: Using Your Crystal Ball: Forecasting with Big Data
ARIMA Modeling
Simulation Techniques
Chapter 18: Crunching Numbers: Performing Statistical Analysis on Your Computer
Excelling at Excel
Programming with Visual Basic for Applications (VBA)
R, Matey!
Chapter 19: Seeking Free Sources of Financial Data
Yahoo! Finance
Federal Reserve Economic Data (FRED)
Board of Governors of the Federal Reserve System
U.S. Department of the Treasury
Other Useful Financial Websites
Part V: The Part of Tens
Chapter 20: Ten (or So) Best Practices in Data Preparation
Check Data Formats
Verify Data Types
Graph Your Data
Verify Data Accuracy
Identify Outliers
Deal with Missing Values
Check Your Assumptions about How the Data Is Distributed
Back Up and Document Everything You Do
Chapter 21: Ten (or So) Questions Answered by Exploratory Data Analysis (EDA)
What Are the Key Properties of a Dataset?
What’s the Center of the Data?
How Much Spread Is There in the Data?
Is the Data Skewed?
What Distribution Does the Data Follow?
Are the Elements in the Dataset Uncorrelated?
Does the Center of the Dataset Change Over Time?
Does the Spread of the Dataset Change Over Time?
Are There Outliers in the Data?
Does the Data Conform to Our Assumptions?
About the Authors
Cheat Sheet
Advertisement Page
Connect with Dummies
End User License Agreement
Introduction
Welcome to Statistics For Big Data For Dummies! Every day, what has come to be
known as big data is making its influence felt in our lives. Some of the most useful
innovations of the past 20 years have been made possible by the advent of massive
data-gathering capabilities combined with rapidly improving computer technology.
For example, we have become accustomed to finding almost any information
we need through the Internet. You can locate nearly anything under the sun
immediately by using a search engine such as Google or DuckDuckGo. Finding
information this way has become so commonplace that Google has slowly become a
verb, as in “I don’t know where to find that restaurant. I’ll just Google it.” Just think
how much more efficient our lives have become as a result of search engines. But how
does Google work? Google couldn’t exist without the ability to process massive
quantities of information at an extremely rapid speed, and its software has to be
extremely efficient.
Another area that has changed our lives forever is e-commerce, of which the classic
example is Amazon.com. People can buy virtually every product they use in their daily
lives online (and have it delivered promptly, too). Often online prices are lower than in
traditional “brick-and-mortar” stores, and the range of choices is wider. Online
shopping also lets people find the best available items at the lowest possible prices.
Another huge advantage to online shopping is the ability of the sellers to provide
reviews of products and recommendations for future purchases. Reviews from other
shoppers can give extremely important information that isn’t available from a simple
product description provided by manufacturers. And recommendations for future
purchases are a great way for consumers to find new products that they might not
otherwise have known about. Recommendations are enabled by one application of big
data: the use of highly sophisticated programs that analyze shopping data and
identify items that tend to be purchased by the same consumers.
Although online shopping is now second nature for many consumers, the reality is that
e-commerce has only come into its own in the last 15–20 years, largely thanks to the
rise of big data. A website such as Amazon.com must process quantities of information
that would have been unthinkably gigantic just a few years ago, and that processing
must be done quickly and efficiently. Thanks to rapidly improving technology, many
traditional retailers now also offer the option of making purchases online; failure to do
so would put a retailer at a huge competitive disadvantage.
In addition to search engines and e-commerce, big data is making a major impact in a
surprising number of other areas that affect our daily lives:
Social media
Online auction sites
Insurance
Healthcare
Energy
Political polling
Weather forecasting
Education
Travel
Finance
About This Book
This book is intended as an overview of the field of big data, with a focus on the
statistical methods used. It also provides a look at several key applications of big data.
Big data is a broad topic; it includes quantitative subjects such as math, statistics,
computer science, and data science. Big data also covers many applications, such as
weather forecasting, financial modeling, political polling methods, and so forth.
Our intentions for this book specifically include the following:
Provide an overview of the field of big data.
Introduce many useful applications of big data.
Show how data may be organized and checked for bad or missing information.
Show how to handle outliers in a dataset.
Explain how to identify assumptions that are made when analyzing data.
Provide a detailed explanation of how data may be analyzed with graphical techniques.
Cover several key univariate (involving only one variable) statistical techniques for analyzing data.
Explain widely used multivariate (involving more than one variable) statistical techniques.
Provide an overview of modeling techniques such as regression analysis.
Explain the techniques that are commonly used to analyze time series data.
Cover techniques used to forecast the future values of a dataset.
Provide a brief overview of software packages and how they can be used to analyze statistical data.
Because this is a For Dummies book, the chapters are written so you can pick and
choose whichever topics interest you the most and dive right in. There’s no need to
read the chapters in sequential order, although you certainly could. We do suggest,
though, that you make sure you’re comfortable with the ideas developed in Chapters 4
and 5 before proceeding to the later chapters in the book. Each chapter also contains
several tips, reminders, and other tidbits, and in several cases there are links to websites
you can use to further pursue the subject. There’s also an online Cheat Sheet that
includes a summary of key equations for ease of reference.
As mentioned, this is a big topic and a fairly new field. Space constraints make
possible only an introduction to the statistical concepts that underlie big data. But we
hope it is enough to get you started in the right direction.
Foolish Assumptions
We make some assumptions about you, the reader. Hopefully, one of the following
descriptions fits you:
You’ve heard about big data and would like to learn more about it.
You’d like to use big data in an application but don’t have sufficient background in statistical modeling.
You don’t know how to implement statistical models in a software package.
Possibly all of these are true. This book should give you a good starting point for
advancing your interest in this field. Clearly, you are already motivated.
This book does not assume any particularly advanced knowledge of mathematics and
statistics. The ideas are developed from fairly mundane mathematical operations. But it
may, in many places, require you to take a deep breath and not get intimidated by the
formulas.
Icons Used in This Book
Throughout the book, we include several icons designed to point out specific kinds of
information. Keep an eye out for them:
A Tip points out especially helpful or practical information about a topic. It
may be hard-won advice on the best way to do something or a useful insight that
may not have been obvious at first glance.
A Warning is used when information must be treated carefully. These icons
point out potential problems or trouble you may encounter. They also highlight
mistaken assumptions that could lead to difficulties.
Technical Stuff points out stuff that may be interesting if you’re really curious
about something, but which is not essential. You can safely skip these if you’re in
a hurry or just looking for the basics.
Remember is used to indicate stuff that may have been previously encountered
in the book or that you will do well to stash somewhere in your memory for future
benefit.
Beyond the Book
Besides the pages or pixels you’re presently perusing, this book comes with even more
goodies online. You can check out the Cheat Sheet at
www.dummies.com/cheatsheet/statisticsforbigdata.
We’ve also written some additional material that wouldn’t quite fit in the book. If this
book were a DVD, these would be on the Bonus Content disc. This handful of extra
articles on various mini-topics related to big data is available at
www.dummies.com/extras/statisticsforbigdata.
Where to Go From Here
You can approach this book from several different angles. You can, of course, start with
Chapter 1 and read straight through to the end. But you may not have time for that, or
maybe you are already familiar with some of the basics. We suggest checking out the
table of contents to see a map of what’s covered in the book and then flipping to any
particular chapter that catches your eye. Or if you’ve got a specific big data issue or
topic you’re burning to know more about, try looking it up in the index.
Once you’re done with the book, you can further your big data adventure (where else?)
on the Internet. Instructional videos are available on websites such as YouTube. Online
courses, many of them free, are also becoming available. Some are produced by private
companies such as Coursera; others are offered by major universities such as Yale and
M.I.T. Of course, many new books are being written in the field of big data due to its
increasing importance.
If you’re even more ambitious, you will find specialized courses at the college
undergraduate and graduate levels in subject areas such as statistics, computer science,
information technology, and so forth. In order to satisfy the expected future demand for
big data specialists, several schools are now offering a concentration or a full degree in
Data Science.
The resources are there; you should be able to take yourself as far as you want to go in
the field of big data. Good luck!
Part I
Introducing Big Data Statistics
Visit www.dummies.com for Great Dummies content online.
In this part …
Introducing big data and stuff it’s used for
Exploring the three Vs of big data
Checking out the hot big data applications
Discovering probabilities and other basic statistical ideas
Chapter 1
What Is Big Data and What Do You Do with It?
In This Chapter
Understanding what big data is all about
Seeing how data may be analyzed using Exploratory Data Analysis (EDA)
Gaining insight into some of the key statistical techniques used to analyze big data
Big data refers to sets of data that are far too massive to be handled with traditional
hardware. Big data is also problematic for software such as database systems, statistical
packages, and so forth. In recent years, data-gathering capabilities have experienced
explosive growth, so that storing and analyzing the resulting data has become
progressively more challenging.
Many fields have been affected by the increasing availability of data, including finance,
marketing, and e-commerce. Big data has also revolutionized more traditional fields
such as law and medicine. Of course, big data is gathered on a massive scale by search
engines such as Google and social media sites such as Facebook. These developments
have led to the evolution of an entirely new profession: the data scientist, someone
who can combine the fields of statistics, math, computer science, and engineering with
knowledge of a specific application.
This chapter introduces several key concepts that are discussed throughout the book.
These include the characteristics of big data, applications of big data, key statistical
tools for analyzing big data, and forecasting techniques.
Characteristics of Big Data
The three factors that distinguish big data from other types of data are volume, velocity,
and variety.
Clearly, with big data, the volume is massive. In fact, new terminology must be used to
describe the size of these datasets. For example, one petabyte of data consists of
$10^{15}$ bytes of data. That’s 1,000 trillion bytes!
A byte is a single unit of storage in a computer’s memory. A byte is used to
represent a single number, character, or symbol. A byte consists of eight bits, each
consisting of either a 0 or a 1.
Velocity refers to the speed at which data is gathered. Big datasets consist of data that’s
continuously gathered at very high speeds. For example, it has been estimated that
Twitter users generate more than a quarter of a million tweets every minute. This
requires a massive amount of storage space as well as real-time processing of the data.
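The volume and velocity figures above can be made concrete with a little arithmetic. The following Python sketch is only an illustration: the variable names are ours, and the tweet rate is the rough estimate quoted in the text, assumed here to stay constant over a full day.

```python
# Volume: one petabyte expressed in bytes and in bits.
BYTES_PER_PETABYTE = 10**15          # 1,000 trillion bytes
BITS_PER_BYTE = 8                    # a byte consists of eight bits
bits_per_petabyte = BYTES_PER_PETABYTE * BITS_PER_BYTE

# Velocity: a quarter of a million tweets per minute, scaled up to one day.
# Assumption: the per-minute rate is constant for all 24 hours.
tweets_per_minute = 250_000
tweets_per_day = tweets_per_minute * 60 * 24

print(f"Bits in one petabyte: {bits_per_petabyte:,}")
print(f"Tweets per day at this rate: {tweets_per_day:,}")
```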
Variety refers to the fact that the contents of a big dataset may consist of a number of
different formats, including spreadsheets, videos, music clips, email messages, and so
on. Storing a huge quantity of these incompatible types is one of the major challenges
of big data.
Chapter 2 covers these characteristics in more detail.
Exploratory Data Analysis (EDA)
Before you apply statistical techniques to a dataset, it’s important to examine the data
to understand its basic properties. You can use a series of techniques that are
collectively known as Exploratory Data Analysis (EDA) to analyze a dataset. EDA
helps ensure that you choose the correct statistical techniques to analyze and forecast
the data. The two basic types of EDA techniques are graphical techniques and
quantitative techniques.
Graphical EDA techniques
Graphical EDA techniques show the key properties of a dataset in a convenient format.
It’s often easier to understand the properties of a variable and the relationships between
variables by looking at graphs rather than looking at the raw data. You can use several
graphical techniques, depending on the type of data being analyzed. Chapters 11 and 12
explain how to create and use the following:
Box plots
Histograms
Normal probability plots
Scatter plots
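To give a feel for what two of these graphs summarize, here is a minimal Python sketch that computes the five numbers a box plot is drawn from and prints a rough text histogram. It uses only the standard library; the data are simulated, and the bin width of 10 and the scaling of one `#` per five observations are arbitrary choices made for illustration.

```python
import random
import statistics
from collections import Counter

random.seed(1)
# Hypothetical dataset: 500 simulated values centered near 50.
data = [random.gauss(50, 10) for _ in range(500)]

# Box plot ingredients: minimum, first quartile, median, third quartile, maximum.
q1, median, q3 = statistics.quantiles(data, n=4)
five_number = (min(data), q1, median, q3, max(data))
print("Five-number summary:", [round(v, 1) for v in five_number])

# Histogram ingredients: count how many values fall into each bin.
bin_width = 10
bins = Counter(int(x // bin_width) * bin_width for x in data)
for left in sorted(bins):
    bar = "#" * (bins[left] // 5)   # one '#' per five observations
    print(f"{left:>4} to {left + bin_width:<4} {bar}")
```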
Quantitative EDA techniques
Quantitative EDA techniques provide a more rigorous method of determining the key
properties of a dataset. Two of the most important of these techniques are
Interval estimation (discussed in Chapter 11).
Hypothesis testing (introduced in Chapter 5).
Interval estimates are used to create a range of values within which a variable is likely
to fall. Hypothesis testing is used to test various propositions about a dataset, such as
The mean value of the dataset.
The standard deviation of the dataset.
The probability distribution the dataset follows.
Hypothesis testing is a core technique in statistics and is used throughout the chapters
in Part III of this book.
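As a rough illustration of both ideas, the Python sketch below builds a 95 percent interval estimate for a mean and tests a hypothesis about that mean. It is a sketch under stated assumptions, not the procedure from the later chapters: the sample is simulated, the hypothesized mean of 70 is made up, and the large-sample normal approximation is used in place of a small-sample method.

```python
import math
import random
import statistics

random.seed(0)
# Hypothetical sample: 200 simulated measurements.
sample = [random.gauss(70, 3) for _ in range(200)]

n = len(sample)
mean = statistics.fmean(sample)
std_error = statistics.stdev(sample) / math.sqrt(n)

# Interval estimate: mean plus or minus z standard errors (z is about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)
ci_low, ci_high = mean - z * std_error, mean + z * std_error

# Hypothesis test: is the population mean equal to 70? (two-sided z-test)
hypothesized_mean = 70
z_stat = (mean - hypothesized_mean) / std_error
p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z_stat)))

print(f"95% interval for the mean: ({ci_low:.2f}, {ci_high:.2f})")
print(f"z statistic: {z_stat:.2f}, p-value: {p_value:.3f}")
```

A small p-value would count as evidence against the hypothesized mean; a large one means the sample is consistent with it.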
Statistical Analysis of Big Data
Gathering and storing massive quantities of data is a major challenge, but ultimately
the biggest and most important challenge of big data is putting it to good use.
For example, a massive quantity of data can be helpful to a company’s marketing
research department only if it can identify the key drivers of the demand for the
company’s products. Political polling firms have access to massive amounts of
demographic data about voters; this information must be analyzed intensively to find
the key factors that can lead to a successful political campaign. A hedge fund can
develop trading strategies from massive quantities of financial data by finding obscure
patterns in the data that can be turned into profitable strategies.
Many statistical techniques can be used to analyze data to find useful patterns:
Probability distributions are introduced in Chapter 4 and explored at greater length in Chapter 13.
Regression analysis is the main topic of Chapter 15.
Time series analysis is the primary focus of Chapter 16.
Forecasting techniques are discussed in Chapter 17.
Probability distributions
You use a probability distribution to compute the probabilities associated with the
elements of a dataset. The following distributions are described and applied in this
book:
Binomial distribution: You would use the binomial distribution to analyze
variables that can assume only one of two values. For example, you could
determine the probability that a given percentage of members at a sports club are
left-handed. See Chapter 4 for details.
Poisson distribution: You would use the Poisson distribution to describe the
likelihood of a given number of events occurring over an interval of time. For
example, it could be used to describe the probability of a specified number of hits
on a website over the coming hour. See Chapter 13 for details.
Normal distribution: The normal distribution is the most widely used probability
distribution in most disciplines, including economics, finance, marketing, biology,
psychology, and many others. One of the characteristic features of the normal
distribution is symmetry: the probability of a variable being a given distance
below the mean of the distribution equals the probability of it being the same
distance above the mean. For example, if the mean height of all men in the United
States is 70 inches, and heights are normally distributed, a randomly chosen man is
equally likely to be between 68 and 70 inches tall as he is to be between 70 and 72
inches tall. See Chapter 4 and the chapters in Parts III and IV for details.
The normal distribution works well with many applications. For example, it’s often
used in the field of finance to describe the returns to financial assets. Due to its ease
of interpretation and implementation, the normal distribution is sometimes used
even when the assumption of normality is only approximately correct.
The Student’s t-distribution: The Student’s t-distribution is similar to the normal
distribution, but with the Student’s t-distribution, extremely small or extremely
large values are much more likely to occur. This distribution is often used in
situations where a variable exhibits too much variation to be consistent with the
normal distribution. This is true when the properties of small samples are being
analyzed. With small samples, the variation among samples is likely to be quite
considerable, so the normal distribution shouldn’t be used to describe their
properties. See Chapter 13 for details.
Note: The Student’s t-distribution was developed by W. S. Gosset while employed
at the Guinness brewing company. He was attempting to describe the properties of
small sample means.
The chi-square distribution: The chi-square distribution is appropriate for several
types of applications. For example, you can use it to determine whether a
population follows a particular probability distribution. You can also use it to test
whether the variance of a population equals a specified value, and to test for the
independence of two datasets. See Chapter 13 for details.
The F-distribution: The F-distribution is derived from the chi-square distribution.
You use it to test whether the variances of two populations equal each other. The
F-distribution is also useful in applications such as regression analysis (covered next).
See Chapter 14 for details.
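The first three distributions in this list are simple enough to compute with Python’s standard library, and the sketch below mirrors the examples in the text. The specific numbers are assumptions chosen for illustration: a club of 20 members with a 10 percent chance that any one member is left-handed, a website averaging 5 hits per hour, and a standard deviation of 3 inches for heights. The helper functions `binomial_pmf` and `poisson_pmf` are our own names, not part of any library.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, rate):
    """Probability of exactly k events when `rate` events are expected."""
    return math.exp(-rate) * rate**k / math.factorial(k)

# Binomial: probability that exactly 2 of 20 club members are left-handed.
p_two_lefties = binomial_pmf(2, 20, 0.10)

# Poisson: probability of exactly 3 hits in the coming hour.
p_three_hits = poisson_pmf(3, 5.0)

# Normal: symmetry around a mean height of 70 inches (sigma is assumed).
heights = NormalDist(mu=70, sigma=3)
p_68_to_70 = heights.cdf(70) - heights.cdf(68)
p_70_to_72 = heights.cdf(72) - heights.cdf(70)

print(f"P(2 left-handed members) = {p_two_lefties:.4f}")
print(f"P(3 hits in an hour)     = {p_three_hits:.4f}")
print(f"P(68 to 70 inches) = {p_68_to_70:.4f}")
print(f"P(70 to 72 inches) = {p_70_to_72:.4f}")
```

The last two printed probabilities match, which is the symmetry property described for the normal distribution. The t, chi-square, and F distributions are not in the standard library, so they are left out of this sketch.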
Regression analysis
Regression analysis is used to estimate the strength and direction of the relationship
between variables that are linearly related to each other. Chapter 15 discusses this topic
at length.
Two variables X and Y are said to be linearly related if the relationship between
them can be written in the form
$Y = mX + b$
where
m is the slope, or the change in Y due to a one-unit change in X
b is the intercept, or the value of Y when X = 0
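To see how a slope and an intercept can be estimated from data, here is a minimal Python sketch using ordinary least squares, a standard way to fit a straight line. The five data points are made up for illustration, and the variable names `s_xy` and `s_xx` are our own.

```python
# Hypothetical observations that lie close to a straight line.
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sum of cross-deviations and sum of squared deviations of X.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)

m = s_xy / s_xx            # slope: estimated change in Y per unit change in X
b = mean_y - m * mean_x    # intercept: estimated value of Y when X = 0

print(f"Estimated line: Y = {m:.2f}X + {b:.2f}")
```

A positive slope indicates that Y tends to rise as X rises; the size of the slope indicates by how much.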
As an example of regression analysis, suppose a corporation wants to determine