Statistics For Big Data For Dummies, Alan Anderson

Statistics For Big Data For Dummies®
Published by: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774,
www.wiley.com

Copyright © 2015 by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system or
transmitted in any form or by any means, electronic, mechanical, photocopying,
recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the
1976 United States Copyright Act, without the prior written permission of the
Publisher. Requests to the Publisher for permission should be addressed to the
Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ
07030, (201) 748-6011, fax (201) 748-6008, or online at
Trademarks: Wiley, For Dummies, the Dummies Man logo, Dummies.com, Making
Everything Easier, and related trade dress are trademarks or registered trademarks of
John Wiley & Sons, Inc., and may not be used without written permission. All other
trademarks are the property of their respective owners. John Wiley & Sons, Inc., is not
associated with any product or vendor mentioned in this book.
LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: WHILE THE
PUBLISHER AND AUTHOR HAVE USED THEIR BEST EFFORTS IN
PREPARING THIS BOOK, THEY MAKE NO REPRESENTATIONS OR
WARRANTIES WITH RESPECT TO THE ACCURACY OR
COMPLETENESS OF THE CONTENTS OF THIS BOOK AND
SPECIFICALLY DISCLAIM ANY IMPLIED WARRANTIES OF
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. NO
WARRANTY MAY BE CREATED OR EXTENDED BY SALES
REPRESENTATIVES OR WRITTEN SALES MATERIALS. THE ADVICE
AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR
YOUR SITUATION. YOU SHOULD CONSULT WITH A PROFESSIONAL
WHERE APPROPRIATE. NEITHER THE PUBLISHER NOR THE AUTHOR
SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM.
For general information on our other products and services, please contact our
Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at
317-572-3993, or fax 317-572-4002. For technical support, please visit
www.wiley.com/techsupport.
Wiley publishes in a variety of print and electronic formats and by print-on-demand.
Some material included with standard print versions of this book may not be included
in e-books or in print-on-demand. If this book refers to media such as a CD or DVD
that is not included in the version you purchased, you may download this material at
. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2015943222
ISBN 978-1-118-94001-3 (pbk); ISBN 978-1-118-94002-0 (ePub); ISBN 978-1-118-94003-7 (ePDF)


Table of Contents
Cover
Introduction
About This Book
Foolish Assumptions
Icons Used in This Book
Beyond the Book
Where to Go From Here


Part I: Introducing Big Data Statistics
Chapter 1: What Is Big Data and What Do You Do with It?
Characteristics of Big Data
Exploratory Data Analysis (EDA)
Statistical Analysis of Big Data

Chapter 2: Characteristics of Big Data: The Three Vs
Characteristics of Big Data
Traditional Database Management Systems (DBMS)

Chapter 3: Using Big Data: The Hot Applications
Big Data and Weather Forecasting
Big Data and Healthcare Services
Big Data and Insurance
Big Data and Finance
Big Data and Electric Utilities
Big Data and Higher Education
Big Data and Retailers
Big Data and Search Engines
Big Data and Social Media

Chapter 4: Understanding Probabilities
The Core Structure: Probability Spaces
Discrete Probability Distributions
Continuous Probability Distributions
Introducing Multivariate Probability Distributions

Chapter 5: Basic Statistical Ideas
Some Preliminaries Regarding Data
Summary Statistical Measures
Overview of Hypothesis Testing
Higher-Order Measures

Part II: Preparing and Cleaning Data
Chapter 6: Dirty Work: Preparing Your Data for Analysis
Passing the Eye Test: Does Your Data Look Correct?
Being Careful with Dates
Does the Data Make Sense?
Frequently Encountered Data Headaches
Other Common Data Transformations

Chapter 7: Figuring the Format: Important Computer File Formats
Spreadsheet Formats
Database Formats

Chapter 8: Checking Assumptions: Testing for Normality
Goodness of fit test
Jarque-Bera test

Chapter 9: Dealing with Missing or Incomplete Data
Missing Data: What’s the Problem?
Techniques for Dealing with Missing Data

Chapter 10: Sending Out a Posse: Searching for Outliers
Testing for Outliers
Robust Statistics
Dealing with Outliers


Part III: Exploratory Data Analysis (EDA)
Chapter 11: An Overview of Exploratory Data Analysis (EDA)
Graphical EDA Techniques
EDA Techniques for Testing Assumptions
Quantitative EDA Techniques

Chapter 12: A Plot to Get Graphical: Graphical Techniques
Stem-and-Leaf Plots
Scatter Plots
Box Plots
Histograms
Quantile-Quantile (QQ) Plots
Autocorrelation Plots

Chapter 13: You’re the Only Variable for Me: Univariate Statistical Techniques
Counting Events Over a Time Interval: The Poisson Distribution
Continuous Probability Distributions

Chapter 14: To All the Variables We’ve Encountered: Multivariate Statistical Techniques
Testing Hypotheses about Two Population Means
Using Analysis of Variance (ANOVA) to Test Hypotheses about Population Means
The F-Distribution
F-Test for the Equality of Two Population Variances
Correlation

Chapter 15: Regression Analysis
The Fundamental Assumption: Variables Have a Linear Relationship
Defining the Population Regression Equation
Estimating the Population Regression Equation
Testing the Estimated Regression Equation
Using Statistical Software
Assumptions of Simple Linear Regression
Multiple Regression Analysis
Multicollinearity

Chapter 16: When You’ve Got the Time: Time Series Analysis
Key Properties of a Time Series
Forecasting with Decomposition Methods
Smoothing Techniques
Seasonal Components
Modeling a Time Series with Regression Analysis
Comparing Different Models: MAD and MSE

Part IV: Big Data Applications
Chapter 17: Using Your Crystal Ball: Forecasting with Big Data
ARIMA Modeling
Simulation Techniques

Chapter 18: Crunching Numbers: Performing Statistical Analysis on Your Computer
Excelling at Excel
Programming with Visual Basic for Applications (VBA)
R, Matey!

Chapter 19: Seeking Free Sources of Financial Data
Yahoo! Finance
Federal Reserve Economic Data (FRED)
Board of Governors of the Federal Reserve System
U.S. Department of the Treasury
Other Useful Financial Websites

Part V: The Part of Tens
Chapter 20: Ten (or So) Best Practices in Data Preparation
Check Data Formats
Verify Data Types
Graph Your Data
Verify Data Accuracy
Identify Outliers
Deal with Missing Values
Check Your Assumptions about How the Data Is Distributed
Back Up and Document Everything You Do

Chapter 21: Ten (or So) Questions Answered by Exploratory Data Analysis (EDA)
What Are the Key Properties of a Dataset?
What’s the Center of the Data?
How Much Spread Is There in the Data?
Is the Data Skewed?
What Distribution Does the Data Follow?
Are the Elements in the Dataset Uncorrelated?
Does the Center of the Dataset Change Over Time?
Does the Spread of the Dataset Change Over Time?
Are There Outliers in the Data?
Does the Data Conform to Our Assumptions?

About the Authors
Cheat Sheet
Advertisement Page
Connect with Dummies
End User License Agreement


Introduction
Welcome to Statistics For Big Data For Dummies! Every day, what has come to be
known as big data is making its influence felt in our lives. Some of the most useful
innovations of the past 20 years have been made possible by the advent of massive
data-gathering capabilities combined with rapidly improving computer technology.
For example, we have become accustomed to finding almost any information
we need through the Internet. You can locate nearly anything under the sun
immediately by using a search engine such as Google or DuckDuckGo. Finding
information this way has become so commonplace that Google has slowly become a
verb, as in “I don’t know where to find that restaurant; I’ll just Google it.” Just think
how much more efficient our lives have become as a result of search engines. But how
does Google work? Google couldn’t exist without the ability to process massive
quantities of information at an extremely rapid speed, and its software has to be
extremely efficient.
Another area that has changed our lives forever is e-commerce, of which the classic
example is Amazon.com. People can buy virtually every product they use in their daily
lives online (and have it delivered promptly, too). Often online prices are lower than in
traditional “brick-and-mortar” stores, and the range of choices is wider. Online
shopping also lets people find the best available items at the lowest possible prices.
Another huge advantage to online shopping is the ability of the sellers to provide
reviews of products and recommendations for future purchases. Reviews from other
shoppers can give extremely important information that isn’t available from a simple
product description provided by manufacturers. And recommendations for future
purchases are a great way for consumers to find new products that they might not
otherwise have known about. Recommendations are enabled by one application of big
data: the use of highly sophisticated programs that analyze shopping data and
identify items that tend to be purchased by the same consumers.
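
To make the recommendation idea concrete, here is a minimal sketch that counts how often pairs of items turn up in the same shopping basket. It illustrates only the general principle. The baskets, the item names, and the helper functions `co_purchase_counts` and `recommend` are all hypothetical, and real recommendation systems are far more elaborate.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; each set holds the items one customer bought.
baskets = [
    {"tent", "sleeping bag", "lantern"},
    {"tent", "sleeping bag"},
    {"lantern", "batteries"},
    {"tent", "lantern", "batteries"},
]


def co_purchase_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives every pair a canonical order, so (a, b) and (b, a) match.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def recommend(item, baskets, top_n=2):
    """Return up to top_n items most often bought together with `item`."""
    related = Counter()
    for (a, b), n in co_purchase_counts(baskets).items():
        if a == item:
            related[b] += n
        elif b == item:
            related[a] += n
    return [name for name, _ in related.most_common(top_n)]


pair_counts = co_purchase_counts(baskets)
print(pair_counts.most_common(3))
print(recommend("tent", baskets))
```

With real data the same counting has to run over millions of baskets, which is where the processing demands described in this introduction come from.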
Although online shopping is now second nature for many consumers, the reality is that
e-commerce has only come into its own in the last 15–20 years, largely thanks to the
rise of big data. A website such as Amazon.com must process quantities of information
that would have been unthinkably gigantic just a few years ago, and that processing
must be done quickly and efficiently. Thanks to rapidly improving technology, many
traditional retailers now also offer the option of making purchases online; failure to do
so would put a retailer at a huge competitive disadvantage.
In addition to search engines and e-commerce, big data is making a major impact in a
surprising number of other areas that affect our daily lives:
Social media
Online auction sites
Insurance
Healthcare
Energy
Political polling
Weather forecasting
Education
Travel
Finance


About This Book

This book is intended as an overview of the field of big data, with a focus on the
statistical methods used. It also provides a look at several key applications of big data.
Big data is a broad topic; it includes quantitative subjects such as math, statistics,
computer science, and data science. Big data also covers many applications, such as
weather forecasting, financial modeling, political polling methods, and so forth.
Our intentions for this book specifically include the following:
Provide an overview of the field of big data.
Introduce many useful applications of big data.
Show how data may be organized and checked for bad or missing information.
Show how to handle outliers in a dataset.
Explain how to identify assumptions that are made when analyzing data.
Provide a detailed explanation of how data may be analyzed with graphical techniques.
Cover several key univariate (involving only one variable) statistical techniques for analyzing data.
Explain widely used multivariate (involving more than one variable) statistical techniques.
Provide an overview of modeling techniques such as regression analysis.
Explain the techniques that are commonly used to analyze time series data.
Cover techniques used to forecast the future values of a dataset.
Provide a brief overview of software packages and how they can be used to analyze statistical data.
Because this is a For Dummies book, the chapters are written so you can pick and
choose whichever topics interest you the most and dive right in. There’s no need to
read the chapters in sequential order, although you certainly could. We do suggest,
though, that you make sure you’re comfortable with the ideas developed in Chapters 4
and 5 before proceeding to the later chapters in the book. Each chapter also contains
several tips, reminders, and other tidbits, and in several cases there are links to websites
you can use to further pursue the subject. There’s also an online Cheat Sheet that
includes a summary of key equations for ease of reference.

As mentioned, this is a big topic and a fairly new field. Space constraints make
possible only an introduction to the statistical concepts that underlie big data. But we
hope it is enough to get you started in the right direction.


Foolish Assumptions
We make some assumptions about you, the reader. Hopefully, one of the following
descriptions fits you:
You’ve heard about big data and would like to learn more about it.
You’d like to use big data in an application but don’t have sufficient background in statistical modeling.
You don’t know how to implement statistical models in a software package.
Possibly all of these are true. This book should give you a good starting point for
advancing your interest in this field. Clearly, you are already motivated.
This book does not assume any particularly advanced knowledge of mathematics and
statistics. The ideas are developed from fairly mundane mathematical operations. But it
may, in many places, require you to take a deep breath and not get intimidated by the
formulas.


Icons Used in This Book
Throughout the book, we include several icons designed to point out specific kinds of
information. Keep an eye out for them:

A Tip points out especially helpful or practical information about a topic. It
may be hard-won advice on the best way to do something or a useful insight that
may not have been obvious at first glance.

A Warning is used when information must be treated carefully. These icons
point out potential problems or trouble you may encounter. They also highlight
mistaken assumptions that could lead to difficulties.

Technical Stuff points out stuff that may be interesting if you’re really curious
about something, but which is not essential. You can safely skip these if you’re in
a hurry or just looking for the basics.

Remember is used to indicate stuff that may have been previously encountered
in the book or that you will do well to stash somewhere in your memory for future
benefit.


Beyond the Book
Besides the pages or pixels you’re presently perusing, this book comes with even more
goodies online. You can check out the Cheat Sheet at
www.dummies.com/cheatsheet/statisticsforbigdata.
We’ve also written some additional material that wouldn’t quite fit in the book. If this
book were a DVD, these would be on the Bonus Content disc. This handful of extra
articles on various mini-topics related to big data is available at
www.dummies.com/extras/statisticsforbigdata.


Where to Go From Here
You can approach this book from several different angles. You can, of course, start with
Chapter 1 and read straight through to the end. But you may not have time for that, or
maybe you are already familiar with some of the basics. We suggest checking out the
table of contents to see a map of what’s covered in the book and then flipping to any
particular chapter that catches your eye. Or if you’ve got a specific big data issue or
topic you’re burning to know more about, try looking it up in the index.
Once you’re done with the book, you can further your big data adventure (where else?)
on the Internet. Instructional videos are available on websites such as YouTube. Online
courses, many of them free, are also becoming available. Some are produced by private
companies such as Coursera; others are offered by major universities such as Yale and
M.I.T. Of course, many new books are being written in the field of big data due to its
increasing importance.
If you’re even more ambitious, you will find specialized courses at the college
undergraduate and graduate levels in subject areas such as statistics, computer science,
information technology, and so forth. In order to satisfy the expected future demand for
big data specialists, several schools are now offering a concentration or a full degree in
Data Science.
The resources are there; you should be able to take yourself as far as you want to go in
the field of big data. Good luck!


Part I

Introducing Big Data Statistics

Visit www.dummies.com for Great Dummies content online.

In this part …
Introducing big data and stuff it’s used for
Exploring the three Vs of big data
Checking out the hot big data applications
Discovering probabilities and other basic statistical ideas


Chapter 1

What Is Big Data and What Do You Do with It?
In This Chapter
Understanding what big data is all about
Seeing how data may be analyzed using Exploratory Data Analysis (EDA)
Gaining insight into some of the key statistical techniques used to analyze big data

Big data refers to sets of data that are far too massive to be handled with traditional
hardware. Big data is also problematic for software such as database systems, statistical
packages, and so forth. In recent years, data-gathering capabilities have experienced
explosive growth, so that storing and analyzing the resulting data has become
progressively more challenging.
Many fields have been affected by the increasing availability of data, including finance,
marketing, and e-commerce. Big data has also revolutionized more traditional fields
such as law and medicine. Of course, big data is gathered on a massive scale by search
engines such as Google and social media sites such as Facebook. These developments
have led to the evolution of an entirely new profession: the data scientist, someone
who can combine the fields of statistics, math, computer science, and engineering with
knowledge of a specific application.
This chapter introduces several key concepts that are discussed throughout the book.
These include the characteristics of big data, applications of big data, key statistical
tools for analyzing big data, and forecasting techniques.


Characteristics of Big Data
The three factors that distinguish big data from other types of data are volume, velocity,
and variety.
Clearly, with big data, the volume is massive. In fact, new terminology must be used to
describe the size of these datasets. For example, one petabyte of data consists of
$10^{15}$ bytes of data. That’s 1,000 trillion bytes!

A byte is a single unit of storage in a computer’s memory. A byte is used to
represent a single number, character, or symbol. A byte consists of eight bits, each
consisting of either a 0 or a 1.
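
The arithmetic behind these units is easy to check. The short sketch below assumes the decimal (SI) definition of a petabyte, 10^15 bytes, which matches the figure of 1,000 trillion bytes. The variable names are illustrative only.

```python
BITS_PER_BYTE = 8
PETABYTE_IN_BYTES = 10**15  # decimal (SI) petabyte, i.e. 1,000 trillion bytes

trillion = 10**12
petabyte_in_trillions = PETABYTE_IN_BYTES // trillion
petabyte_in_bits = PETABYTE_IN_BYTES * BITS_PER_BYTE

# Eight bits, each either 0 or 1, give 2**8 distinct patterns per byte.
values_per_byte = 2**BITS_PER_BYTE

print(f"{PETABYTE_IN_BYTES:,} bytes = {petabyte_in_trillions:,} trillion bytes")
print(f"{petabyte_in_bits:,} bits in one petabyte")
print(f"{values_per_byte} distinct values per byte")
```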
Velocity refers to the speed at which data is gathered. Big datasets consist of data that’s
continuously gathered at very high speeds. For example, it has been estimated that
Twitter users generate more than a quarter of a million tweets every minute. This
requires a massive amount of storage space as well as real-time processing of the data.
Variety refers to the fact that the contents of a big dataset may consist of a number of
different formats, including spreadsheets, videos, music clips, email messages, and so
on. Storing a huge quantity of these incompatible types is one of the major challenges
of big data.
Chapter 2 covers these characteristics in more detail.


Exploratory Data Analysis (EDA)
Before you apply statistical techniques to a dataset, it’s important to examine the data
to understand its basic properties. You can use a series of techniques that are
collectively known as Exploratory Data Analysis (EDA) to analyze a dataset. EDA
helps ensure that you choose the correct statistical techniques to analyze and forecast
the data. The two basic types of EDA techniques are graphical techniques and
quantitative techniques.

Graphical EDA techniques
Graphical EDA techniques show the key properties of a dataset in a convenient format.
It’s often easier to understand the properties of a variable and the relationships between
variables by looking at graphs rather than looking at the raw data. You can use several
graphical techniques, depending on the type of data being analyzed. Chapters 11 and 12
explain how to create and use the following:
Box plots
Histograms
Normal probability plots
Scatter plots
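
As a rough illustration of what two of these graphs summarize, the sketch below computes the five numbers a box plot is drawn from and prints a crude text histogram. It uses only Python's standard library. The sample data, the quartile method, the bin width, and the function names are assumptions chosen for the example and are not taken from the book.

```python
import statistics
from collections import Counter

# Hypothetical sample, e.g. page-load times in seconds.
data = [1.2, 1.9, 2.1, 2.4, 2.4, 2.8, 3.0, 3.3, 3.9, 7.5]


def five_number_summary(values):
    """Minimum, quartiles, and maximum: the numbers a box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def text_histogram(values, bin_width=1.0):
    """Map each bin's lower edge to a bar of '*' characters, one per observation."""
    bins = Counter(int(v // bin_width) * bin_width for v in values)
    return {edge: "*" * bins[edge] for edge in sorted(bins)}


summary = five_number_summary(data)
histogram = text_histogram(data)

print("min, Q1, median, Q3, max:", summary)
for edge, bar in histogram.items():
    print(f"{edge:4.1f} | {bar}")
```

Even in this tiny sample, the lone bar far to the right hints at a possible outlier, which is the kind of property a graph reveals faster than a table of raw numbers.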

Quantitative EDA techniques
Quantitative EDA techniques provide a more rigorous method of determining the key
properties of a dataset. Two of the most important of these techniques are
Interval estimation (discussed in Chapter 11).
Hypothesis testing (introduced in Chapter 5).
Interval estimates are used to create a range of values within which a variable is likely
to fall. Hypothesis testing is used to test various propositions about a dataset, such as
The mean value of the dataset.
The standard deviation of the dataset.
The probability distribution the dataset follows.
Hypothesis testing is a core technique in statistics and is used throughout the chapters
in Part III of this book.
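
The two ideas can be sketched in a few lines of standard-library Python. This is a simplified illustration, not the book's procedure. It assumes a large sample, so the normal distribution is used for both the interval and the test. The simulated data and the function names `mean_confidence_interval` and `z_test_mean` are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist

# Simulated sample (an assumption for illustration): 200 draws near a mean of 70.
rng = random.Random(42)
sample = [rng.gauss(70, 3) for _ in range(200)]


def mean_confidence_interval(values, confidence=0.95):
    """Large-sample (normal-based) interval estimate for the population mean."""
    mean = statistics.fmean(values)
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * std_err, mean + z * std_err


def z_test_mean(values, hypothesized_mean):
    """Two-sided large-sample z test of H0: population mean == hypothesized_mean."""
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    z = (statistics.fmean(values) - hypothesized_mean) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


lower, upper = mean_confidence_interval(sample)
z_stat, p_value = z_test_mean(sample, hypothesized_mean=70)
print(f"95% interval for the mean: ({lower:.2f}, {upper:.2f})")
print(f"z = {z_stat:.3f}, p-value = {p_value:.3f}")
```

A small p-value would count as evidence against the hypothesized mean. With small samples, the Student's t-distribution would normally replace the normal distribution in both functions.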


Statistical Analysis of Big Data
Gathering and storing massive quantities of data is a major challenge, but ultimately
the biggest and most important challenge of big data is putting it to good use.
For example, a massive quantity of data can be helpful to a company’s marketing
research department only if it can identify the key drivers of the demand for the
company’s products. Political polling firms have access to massive amounts of
demographic data about voters; this information must be analyzed intensively to find
the key factors that can lead to a successful political campaign. A hedge fund can
develop trading strategies from massive quantities of financial data by finding obscure
patterns in the data that can be turned into profitable strategies.
Many statistical techniques can be used to analyze data to find useful patterns:
Probability distributions are introduced in Chapter 4 and explored at greater length in Chapter 13.
Regression analysis is the main topic of Chapter 15.
Time series analysis is the primary focus of Chapter 16.
Forecasting techniques are discussed in Chapter 17.

Probability distributions
You use a probability distribution to compute the probabilities associated with the
elements of a dataset. The following distributions are described and applied in this
book:
Binomial distribution: You would use the binomial distribution to analyze
variables that can assume only one of two values. For example, you could
determine the probability that a given percentage of members at a sports club are
left-handed. See Chapter 4 for details.
Poisson distribution: You would use the Poisson distribution to describe the
likelihood of a given number of events occurring over an interval of time. For
example, it could be used to describe the probability of a specified number of hits
on a website over the coming hour. See Chapter 13 for details.
Normal distribution: The normal distribution is the most widely used probability
distribution in most disciplines, including economics, finance, marketing, biology,
psychology, and many others. One of the characteristic features of the normal
distribution is symmetry: the probability of a variable being a given distance
below the mean of the distribution equals the probability of it being the same
distance above the mean. For example, if the mean height of all men in the United
States is 70 inches, and heights are normally distributed, a randomly chosen man is
equally likely to be between 68 and 70 inches tall as he is to be between 70 and 72
inches tall. See Chapter 4 and the chapters in Parts III and IV for details.


The normal distribution works well with many applications. For example, it’s often
used in the field of finance to describe the returns to financial assets. Due to its ease
of interpretation and implementation, the normal distribution is sometimes used
even when the assumption of normality is only approximately correct.
The Student’s t-distribution: The Student’s t-distribution is similar to the normal
distribution, but with the Student’s t-distribution, extremely small or extremely
large values are much more likely to occur. This distribution is often used in
situations where a variable exhibits too much variation to be consistent with the
normal distribution. This is true when the properties of small samples are being
analyzed. With small samples, the variation among samples is likely to be quite
considerable, so the normal distribution shouldn’t be used to describe their
properties. See Chapter 13 for details.
Note: The Student’s t-distribution was developed by W. S. Gosset while employed
at the Guinness brewing company. He was attempting to describe the properties of
small sample means.
The chi-square distribution: The chi-square distribution is appropriate for several
types of applications. For example, you can use it to determine whether a
population follows a particular probability distribution. You can also use it to test
whether the variance of a population equals a specified value, and to test for the
independence of two datasets. See Chapter 13 for details.
The F-distribution: The F-distribution is derived from the chi-square distribution.
You use it to test whether the variances of two populations equal each other. The
F-distribution is also useful in applications such as regression analysis (covered next).
See Chapter 14 for details.
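
Three of the distributions above can be tried out with nothing more than Python's standard library. The sketch below is illustrative only. The 10% left-handedness rate, the average of 5 website hits per hour, and the 3-inch standard deviation of heights are assumed values chosen for the example. Only the 70-inch mean height comes from the text.

```python
import math
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success rate p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, rate):
    """Probability of exactly k events when `rate` events are expected on average."""
    return math.exp(-rate) * rate**k / math.factorial(k)


# Binomial: exactly 2 left-handed members in a club of 20 (assumed 10% rate).
p_two_lefties = binomial_pmf(2, 20, 0.10)

# Poisson: exactly 3 website hits in the coming hour (assumed average of 5 per hour).
p_three_hits = poisson_pmf(3, 5.0)

# Normal symmetry: mean height 70 inches, assumed standard deviation of 3 inches.
heights = NormalDist(mu=70, sigma=3)
p_68_to_70 = heights.cdf(70) - heights.cdf(68)
p_70_to_72 = heights.cdf(72) - heights.cdf(70)

print(f"P(2 of 20 left-handed) = {p_two_lefties:.4f}")
print(f"P(3 hits in an hour)   = {p_three_hits:.4f}")
print(f"P(68 to 70 inches)     = {p_68_to_70:.4f}")
print(f"P(70 to 72 inches)     = {p_70_to_72:.4f}")
```

The last two printed probabilities agree, which is the symmetry property described for the normal distribution.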

Regression analysis
Regression analysis is used to estimate the strength and direction of the relationship
between variables that are linearly related to each other. Chapter 15 discusses this topic
at length.

Two variables X and Y are said to be linearly related if the relationship between
them can be written in the form

$$Y = mX + b$$

where
m is the slope, or the change in Y due to a one-unit change in X
b is the intercept, or the value of Y when X = 0
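
One common way to estimate the slope m and intercept b of a line Y = mX + b from data is ordinary least squares. The excerpt does not say at this point which estimation method the book uses, so treat the following as a minimal sketch under that assumption. The function name `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimates of slope m and intercept b in Y = mX + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how X and Y vary together, divided by how much X varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    b = mean_y - m * mean_x
    return m, b


# Hypothetical data lying exactly on the line Y = 2X + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

m, b = fit_line(xs, ys)
print(f"slope m = {m}, intercept b = {b}")
```

Real data rarely falls exactly on a line. In that case the same formulas return the line that minimizes the sum of squared vertical distances to the points.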
As an example of regression analysis, suppose a corporation wants to determine

