
Statistics for Business: Decision Making and Analysis, Robert Stine and Dean Foster, Chapter 5



Chapter 5

Association between
Categorical Variables

Copyright © 2011 Pearson Education, Inc.


5.1 Contingency Tables
Which hosts send more buyers to
Amazon.com?


To answer this question we must gather data on
two categorical variables: Host and Purchase



Host identifies the originating site: MSN,
RecipeSource, or Yahoo; Purchase indicates
whether or not the visit results in a sale


5.1 Contingency Tables
Consider Two Categorical Variables
Simultaneously



Contingency table: a table that shows the counts of cases for one
categorical variable contingent on the value of
another (one cell for every combination of the two variables)



Cells in a contingency table are mutually
exclusive
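
As a minimal sketch of how such a table is tallied, the Python below counts cases for every combination of Host and Purchase. The `visits` list is invented for illustration and is not the data behind the slides; the names `cell_counts` and `table` are likewise assumptions.

```python
from collections import Counter

# Hypothetical visits as (host, purchase) pairs; NOT the data from the slides.
visits = [
    ("MSN", "No"), ("MSN", "Yes"), ("MSN", "No"),
    ("RecipeSource", "No"), ("RecipeSource", "No"),
    ("Yahoo", "Yes"), ("Yahoo", "No"), ("Yahoo", "No"),
]

# Each case lands in exactly one (purchase, host) cell,
# which is what "mutually exclusive" means here.
cell_counts = Counter((purchase, host) for host, purchase in visits)

hosts = sorted({host for host, _ in visits})
purchases = sorted({purchase for _, purchase in visits})

# Rows are Purchase levels, columns are Host levels.
# A Counter returns 0 for combinations that never occur.
table = {p: {h: cell_counts[(p, h)] for h in hosts} for p in purchases}

for p in purchases:
    print(p, [table[p][h] for h in hosts])

# Because no case is counted twice, the cells sum to the number of cases.
total = sum(table[p][h] for p in purchases for h in hosts)
```

Empty combinations (here, a purchase from RecipeSource) still get a cell with a count of zero.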


5.1 Contingency Tables
Contingency Table for Web Shopping



5.1 Contingency Tables
Marginal and Conditional Distributions


Marginal distributions appear in the “margins” of a
contingency table and represent the totals
(frequencies) for each categorical variable
separately




Conditional distributions refer to counts within a
row or column of a contingency table (restricted to
cases satisfying a condition)
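
The two definitions above can be sketched in a few lines of Python. The counts are hypothetical (chosen only so the arithmetic is easy to follow), and the helper name `purchase_given_host` is an assumption, not something from the slides.

```python
# Hypothetical counts (rows: Purchase, columns: Host); NOT the slide data.
table = {
    "No":  {"MSN": 90, "RecipeSource": 98, "Yahoo": 180},
    "Yes": {"MSN": 10, "RecipeSource": 2,  "Yahoo": 20},
}
hosts = ["MSN", "RecipeSource", "Yahoo"]

# Marginal distributions: totals for each variable taken separately.
purchase_margin = {p: sum(row.values()) for p, row in table.items()}
host_margin = {h: sum(table[p][h] for p in table) for h in hosts}


def purchase_given_host(host):
    """Conditional distribution of Purchase, restricted to one Host column."""
    return {p: table[p][host] / host_margin[host] for p in table}


for h in hosts:
    print(h, purchase_given_host(h))
```

Dividing by the column total rather than the grand total is what makes the result conditional: each column's proportions sum to 1.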


5.1 Contingency Tables
Conditional Distribution of Purchase for each
Host (Column Counts and Percentages)



5.1 Contingency Tables
Conditional Distribution


Shows that the percentage of visits ending in a
purchase is much lower for visitors from
RecipeSource than for those from MSN or Yahoo



Because these conditional distributions differ,
Host and Purchase are associated




5.1 Contingency Tables
Segmented Bar Charts


Used to display conditional distributions



Divides the bars in a bar chart into
segments that are proportional to the
percentage in each category of a second
variable
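
A rough sketch of the arithmetic behind a segmented bar chart: each bar is scaled to 100%, and each segment starts where the previous one ends. The region names and counts are hypothetical, and `bar_segments` is an illustrative helper, not a library function.

```python
# Hypothetical counts (one bar per Region, segments by Purchase); NOT the slide data.
counts = {
    "Northeast": {"No": 150, "Yes": 50},
    "South": {"No": 90, "Yes": 10},
}


def bar_segments(column):
    """Return (category, bottom, height) triples, in percent, for one bar."""
    total = sum(column.values())
    segments, bottom = [], 0.0
    for category, count in column.items():
        height = 100.0 * count / total  # segment size = conditional percentage
        segments.append((category, bottom, height))
        bottom += height  # the next segment stacks on top of this one
    return segments


segments = {region: bar_segments(col) for region, col in counts.items()}
for region, segs in segments.items():
    print(region, segs)
```

If a plotting library is available, these `bottom` and `height` values are the kind of inputs a stacked bar call expects (for example the `bottom` argument of matplotlib's `Axes.bar`).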


5.1 Contingency Tables
Contingency Table of Purchase by Region



5.1 Contingency Tables
Segmented Bar Chart Shows Association




5.1 Contingency Tables
Mosaic Plots


Alternative to segmented bar chart



A plot in which the size of each “tile” is
proportional to the count in a cell of a
contingency table
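
One way to see why tile size tracks the cell count: give each column a width equal to its share of all cases and each tile a height equal to its share within the column, so the area is the cell's share of the grand total. The sketch below assumes that layout; the style names, counts, and the `mosaic_tiles` helper are all hypothetical.

```python
# Hypothetical counts (one column per Style, tiles by Size); NOT the slide data.
counts = {
    "Button-down": {"Small": 20, "Medium": 50, "Large": 30},
    "Polo": {"Small": 10, "Medium": 20, "Large": 70},
}


def mosaic_tiles(counts):
    """Return {(column, row): (x, y, width, height)} laid out on the unit square."""
    grand_total = sum(sum(cells.values()) for cells in counts.values())
    tiles, x = {}, 0.0
    for column, cells in counts.items():
        column_total = sum(cells.values())
        width = column_total / grand_total  # marginal share of the column
        y = 0.0
        for row, count in cells.items():
            height = count / column_total  # conditional share within the column
            tiles[(column, row)] = (x, y, width, height)
            y += height
        x += width
    return tiles


tiles = mosaic_tiles(counts)
for key, (x, y, w, h) in tiles.items():
    print(key, "area =", round(w * h, 3))
```

Unlike a segmented bar chart scaled to 100%, the column widths here also show the marginal distribution of the column variable.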



5.1 Contingency Tables
Contingency Table of Shirt Size by Style



5.1 Contingency Tables
Mosaic Plot Shows Association




4M Example 5.1: CAR THEFT
Motivation
Should insurance companies vary the
premiums for different car models (are
some cars more likely to be stolen than
others)?



4M Example 5.1: CAR THEFT
Method
Data obtained from the National Highway Traffic
Safety Administration (NHTSA) on car theft for
seven popular models (two categorical variables:
type of car and whether the car was stolen).



4M Example 5.1: CAR THEFT
Mechanics




4M Example 5.1: CAR THEFT
Mechanics



4M Example 5.1: CAR THEFT
Message
The Dodge Intrepid is more likely to be stolen than
other popular models. The data suggest that
higher premiums for theft insurance should be
charged for models that are more likely to be
stolen.



5.2 Lurking Variables
and Simpson’s Paradox
Association Is Not Necessarily Causation


Lurking Variable: a concealed variable that
affects the apparent relationship between two
other variables




Simpson’s Paradox: a change in the association
between two variables when data are separated
into groups defined by a third variable
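
To make the reversal concrete, here is a small sketch with invented counts in which one airline has the better on-time rate at every destination yet the worse rate overall. The airline and destination labels, the numbers, and the `on_time_rate` helper are all hypothetical and are not the data used in the slides.

```python
# Hypothetical (on_time, flights) counts by airline and destination;
# invented for illustration, NOT the data from the slides.
flights = {
    "Airline A": {"Dest X": (95, 100), "Dest Y": (600, 900)},
    "Airline B": {"Dest X": (810, 900), "Dest Y": (60, 100)},
}


def on_time_rate(pairs):
    """Pooled on-time rate over a collection of (on_time, flights) pairs."""
    pairs = list(pairs)
    return sum(o for o, _ in pairs) / sum(n for _, n in pairs)


# Aggregated over destinations (the lurking variable is ignored).
overall = {a: on_time_rate(d.values()) for a, d in flights.items()}

# Separated by destination (the lurking variable is held fixed).
by_dest = {
    a: {dest: on_time_rate([pair]) for dest, pair in d.items()}
    for a, d in flights.items()
}

print("overall:", overall)
print("by destination:", by_dest)
```

In these made-up numbers Airline A flies mostly to the destination with more delays, so its pooled rate is dragged down even though it does better than Airline B at each destination.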


4M Example 5.2: AIRLINE ARRIVALS
Motivation
Does it matter which of two airlines a
corporate CEO chooses when flying to
meetings if he wants to avoid delays?



4M Example 5.2: AIRLINE ARRIVALS
Method
Data obtained from the US Bureau of
Transportation Statistics on flight delays for
two airlines (two categorical variables:
airline and whether the flight arrived on
time).



4M Example 5.2: AIRLINE ARRIVALS

Mechanics



4M Example 5.2: AIRLINE ARRIVALS
Mechanics –
Is destination a lurking variable?



4M Example 5.2: AIRLINE ARRIVALS
Mechanics –
This is Simpson’s Paradox


