Chapter 2
Data
Copyright © 2011 Pearson Education, Inc.
2.1 Data Tables
Some Basic Ideas
Data are a collection of numbers, labels, or
symbols with context
A data table is a rectangular arrangement of data
with rows and columns
Observations or cases form the rows; common
attributes or variables form the columns
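As a rough illustration of these definitions, a data table might be sketched in Python as a list of rows, one per case, with the same variables in every row. The customers, brands, and amounts below are hypothetical values made up for this sketch.

```python
# Hypothetical bike-shop purchases; every value is invented for illustration.
# Each dictionary is one case (row); each key is one variable (column).
purchases = [
    {"customer": "A", "brand": "Brand X", "size_cm": 54, "amount_usd": 450.0},
    {"customer": "B", "brand": "Brand Y", "size_cm": 58, "amount_usd": 1200.0},
    {"customer": "C", "brand": "Brand X", "size_cm": 52, "amount_usd": 800.0},
]

# The variables are the keys shared by every row.
variables = list(purchases[0].keys())

# The number of cases is the number of rows.
n_cases = len(purchases)

# A column is one variable read across all cases.
brand_column = [row["brand"] for row in purchases]
```

Putting the unit in the variable name (size_cm, amount_usd) is one simple way to keep the context attached to the numbers.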
Disorganized Data
Same Data in a Data Table
2.1 Data Tables
Organize data to yield meaningful information
Provide context (e.g., who, what, when)
Improve interpretability with meaningful names,
formatting and units
2.2 Categorical and Numerical Data
Categorical Data
Also called qualitative variables (sometimes loosely called nominal,
although ordinal data are categorical too)
Identify group membership
Type of purchase made and Brand of bike are
examples
Numerical Data
Also called quantitative variables (sometimes loosely called continuous,
although counts and other discrete values are numerical too)
Describe numerical properties of cases
Have measurement units
Size of bike (cm) and Amount spent ($) are
examples
Measurement Scales
Nominal – name categories without implying
order (categorical)
Ordinal – name categories that can be ordered
(categorical)
Interval – numerical values that can be added or
subtracted (no absolute zero)
Ratio – numerical values that can be added,
subtracted, multiplied or divided (makes ratio
comparisons possible)
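The difference between interval and ratio scales can be checked with a small sketch. Temperature in degrees Celsius is used here as an assumed example of an interval scale (it is not taken from the slides): differences survive a change of units, but ratios do not, because 0 °C is not an absolute zero.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15

# Two illustrative temperatures in degrees Celsius.
warm_c, cool_c = 20.0, 10.0

# Differences are meaningful on an interval scale: the gap is the
# same 10 degrees whether measured in Celsius or in Kelvin.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are not meaningful on an interval scale: "twice as hot" in
# Celsius disappears once the zero point is moved to absolute zero.
ratio_c = warm_c / cool_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)
```

An amount spent in dollars, by contrast, has a true zero, so saying one purchase cost twice as much as another is a valid ratio comparison.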
Likert Scale (Ordinal – 5 to 7 Categories)
2.3 Recoding and Aggregation
Recode: build a new variable from another
(e.g., recode price as expensive or inexpensive)
Aggregate: reduce rows in a data table by
counting or summing values within categories
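Both operations can be sketched in a few lines of Python. The purchases, the brand names, and the 500-dollar cutoff for "expensive" are all assumptions made up for this illustration; the slides do not specify a threshold.

```python
from collections import Counter, defaultdict

# Assumed cutoff for the recode; chosen only for illustration.
EXPENSIVE_THRESHOLD_USD = 500.0

# Hypothetical purchases, one row per case.
purchases = [
    {"brand": "Brand X", "price_usd": 450.0},
    {"brand": "Brand Y", "price_usd": 1200.0},
    {"brand": "Brand X", "price_usd": 800.0},
    {"brand": "Brand Y", "price_usd": 300.0},
]

def recode_price(price_usd):
    """Recode a numerical price into a two-level categorical variable."""
    if price_usd >= EXPENSIVE_THRESHOLD_USD:
        return "expensive"
    return "inexpensive"

# Recode: add a new column derived from an existing one.
for row in purchases:
    row["price_level"] = recode_price(row["price_usd"])

# Aggregate by counting: one row per price level instead of one per purchase.
counts_by_level = Counter(row["price_level"] for row in purchases)

# Aggregate by summing: total spent within each brand.
total_by_brand = defaultdict(float)
for row in purchases:
    total_by_brand[row["brand"]] += row["price_usd"]
```

The four original rows collapse to two rows in each aggregated table, one per category.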
An Example of Aggregation
4M Example 2.1: MEDICAL ADVICE
Motivation
Are patients from one HMO more likely to visit the
doctor than those from another HMO?
Method
Gather data and organize them in a data table. The
cases that make up the rows are office visits. Three
variables make up the columns: Patient ID, HMO Plan,
and Duration of the patient’s office visit.
Mechanics
Message
Aggregate the duration of office visits to learn
whether patients from one plan are consuming
most of the doctor’s office time.
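A minimal sketch of this aggregation, assuming a table with the three columns named in the Method step. The patient IDs, plan names, and durations (in minutes) are hypothetical and are not the data from the example.

```python
from collections import defaultdict

# Hypothetical office visits, one row per visit.
visits = [
    {"patient_id": 101, "hmo_plan": "Plan A", "duration_min": 15},
    {"patient_id": 102, "hmo_plan": "Plan B", "duration_min": 30},
    {"patient_id": 101, "hmo_plan": "Plan A", "duration_min": 20},
    {"patient_id": 103, "hmo_plan": "Plan B", "duration_min": 25},
    {"patient_id": 104, "hmo_plan": "Plan A", "duration_min": 10},
]

def total_duration_by_plan(rows):
    """Sum visit durations within each HMO plan."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["hmo_plan"]] += row["duration_min"]
    return dict(totals)

# Aggregated table: one row per plan instead of one row per visit.
minutes_by_plan = total_duration_by_plan(visits)

# Share of total office time consumed by each plan.
grand_total = sum(minutes_by_plan.values())
share_by_plan = {
    plan: minutes / grand_total for plan, minutes in minutes_by_plan.items()
}
```

Comparing the shares, rather than the raw totals, makes it easier to see whether one plan accounts for most of the office time.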
2.4 Time Series
Some Definitions
Time series – data recorded over time
Timeplot – graph of a time series showing values
in chronological order
Frequency – regular time spacing of data in a
time series (e.g., daily or monthly)
Cross-sectional – data observed at the same time
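These definitions can be illustrated with a short sketch: put the observations in chronological order, as a timeplot would, and check that the spacing is regular. The dates and rates below are made-up values, not actual unemployment figures.

```python
from datetime import date

# Hypothetical monthly observations (date, rate in percent), deliberately
# stored out of order. The values are invented for illustration.
observations = [
    (date(2010, 3, 1), 5.1),
    (date(2010, 1, 1), 5.0),
    (date(2010, 2, 1), 5.2),
]

# A time series is read in chronological order, so sort by date.
time_series = sorted(observations, key=lambda obs: obs[0])

def month_index(d):
    """Number months consecutively so that gaps can be compared."""
    return d.year * 12 + d.month

# Spacing between consecutive observations, in months.
gaps = [
    month_index(later[0]) - month_index(earlier[0])
    for earlier, later in zip(time_series, time_series[1:])
]

# The frequency is monthly if every gap is exactly one month.
is_monthly = all(gap == 1 for gap in gaps)
```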
Timeplot of Monthly Unemployment Rate
2.5 Further Attributes of Data
Useful to Know
When and where the data were collected
Source of the data (available online?)
How the data were collected
4M Example 2.2: CUSTOMER FOCUS
Motivation
How do customers in a focus group react to a
new product design?
Method
Gather data and organize them in a data table. The
cases that make up the rows are participants in
the focus group. One of the variables that make
up the columns is each participant’s rating of the
product.
Mechanics
In addition to product ratings, the columns should
include characteristics of the participants such as
name, age (in years), sex, and income.
Message
Determine who likes the design (younger or more
affluent members of the focus group, for example)
and choose advertising that appeals to this group.
Best Practices
Provide a context for your data.
Use clear names for your variables.
Distinguish numerical data from categorical data.
Track down the details when you get the data.
Keep track of the source of data.
Pitfalls
Don’t assume that a list of numbers provides
numerical data.
Don’t trust all of the data that you get from the
Internet.
Don’t believe every claim based on survey data.
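The first pitfall can be illustrated with ZIP codes, used here as an assumed example that does not come from the slides: they are written with digits, but they identify groups, so they are categorical. A minimal sketch with hypothetical values:

```python
from collections import Counter

# Hypothetical customer ZIP codes, stored as text because they are labels.
zip_codes = ["19104", "10027", "19104", "60637"]

# Counting cases per category is a sensible summary of categorical data.
counts = Counter(zip_codes)
most_common_zip, n_customers = counts.most_common(1)[0]

# Treating a ZIP code as a number can silently damage the label:
# converting "02139" to an integer drops the leading zero.
as_number = int("02139")
round_trip = str(as_number)
```

Averaging the codes would run without error but would produce a number with no meaning, which is why the variable type should be decided before any arithmetic is done.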