Chapter One
What is Statistics?
What is Statistics?
“Statistics is a way to get information from data”
[Diagram: Data → Statistics → Information]
Statistics is a tool for creating new understanding from a set of
numbers.
Definitions: Oxford English Dictionary
Copyright © 2009 Cengage Learning
Descriptive Statistics
Descriptive statistics deals with methods of organizing,
summarizing, and presenting data in a convenient and
informative way.
One form of descriptive statistics uses graphical techniques.
Descriptive Statistics
Another form of descriptive statistics uses numerical
techniques to summarize data.
The mean and median are popular numerical techniques to
describe the location of the data.
The range, variance, and standard deviation measure the
variability of the data.
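The numerical techniques named above can be sketched with Python's standard `statistics` module. The seven marks are used purely for illustration, and the choice of the sample variance (divisor n - 1) is an assumption, since the slide does not say which form is meant.

```python
import statistics

# Illustrative data only: seven student marks.
marks = [67, 74, 71, 83, 93, 55, 48]

# Measures of location
mean = statistics.mean(marks)      # arithmetic average
median = statistics.median(marks)  # middle value once the data are sorted

# Measures of variability
data_range = max(marks) - min(marks)   # largest value minus smallest value
variance = statistics.variance(marks)  # sample variance (assumed: divisor n - 1)
std_dev = statistics.stdev(marks)      # square root of the sample variance

print(f"mean = {mean:.2f}, median = {median}")
print(f"range = {data_range}, variance = {variance:.2f}, std dev = {std_dev:.2f}")
```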
Inferential Statistics
Inferential statistics is a body of methods used to draw
conclusions or inferences about characteristics of populations
based on sample data.
Key Statistical Concepts
Population
— a population is the group of all items of interest to
a statistics practitioner.
— frequently very large; sometimes infinite.
E.g. all the people living in Vietnam
Sample
— A sample is a set of data randomly drawn from the
population.
— Potentially large, but less than the population.
E.g. A sample of citizens living in Hanoi
Key Statistical Concepts
Parameter
— A descriptive measure of a population.
Statistic
— A descriptive measure of a sample.
Key Statistical Concepts
[Diagram: the sample is a subset of the population; the population has a
parameter, the sample has a statistic.]
Populations have Parameters,
Samples have Statistics.
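A minimal sketch of the distinction, assuming a made-up population of 10,000 ages (the data, the sample size of 100, and the seed are all hypothetical). The mean of the whole population is a parameter; the mean of a random sample drawn from it is a statistic.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 made-up ages between 18 and 90.
population = [random.randint(18, 90) for _ in range(10_000)]

# A sample: 100 items drawn at random, without replacement.
sample = random.sample(population, k=100)

parameter = statistics.mean(population)  # descriptive measure of the population
statistic = statistics.mean(sample)      # descriptive measure of the sample

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

The two values will usually be close but not equal, which is why inference from a statistic to a parameter involves uncertainty.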
Statistical Inference
Statistical inference is the process of making an estimate,
prediction, or decision about a population based on a sample.
[Diagram: a statistic computed from the sample is used to make an inference
about the parameter of the population.]
What can we infer about a Population’s Parameters
based on a Sample’s Statistics?
Chapter Two
Graphical and Tabular Descriptive Techniques
Definitions…
A variable is some characteristic of a population or sample.
E.g. student grades.
The values of the variable are the range of possible values
for a variable.
E.g. student marks (0..100)
Data are the observed values of a variable.
E.g. student marks: {67, 74, 71, 83, 93, 55, 48}
Interval Data…
• Real numbers, e.g. heights, weights, prices, etc.
• Also referred to as quantitative or numerical.
Arithmetic operations can be performed on interval data.
Nominal Data…
• The values of nominal data are categories.
E.g. responses to questions about marital status, coded as:
Single = 1, Married = 2, Divorced = 3, Widowed = 4
Arithmetic operations don’t make any sense (e.g. does
Widowed ÷ 2 = Married?!)
Nominal data are also called qualitative or categorical.
Ordinal Data…
Ordinal Data appear to be categorical in nature, but their
values have an order; a ranking to them:
E.g. College course rating system:
poor = 1, fair = 2, good = 3, very good = 4, excellent = 5
It is still not meaningful to do arithmetic on these data.
We can say:
excellent > poor or fair < very good
Only the order matters: any numeric values may be assigned to the
categories, as long as they preserve the ranking.
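A small sketch of this point: the first coding is the one given for the course rating system, while the second is a hypothetical alternative with different numbers in the same order. Sorting by either coding gives the same ranking.

```python
# Five ratings in arbitrary order.
ratings = ["good", "poor", "excellent", "fair", "very good"]

# The coding from the course rating example.
coding_a = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 5}

# Hypothetical alternative: different numbers, same ranking.
coding_b = {"poor": 10, "fair": 20, "good": 35, "very good": 70, "excellent": 100}

# Rank the ratings from lowest to highest under each coding.
ranked_a = sorted(ratings, key=coding_a.get)
ranked_b = sorted(ratings, key=coding_b.get)

print(ranked_a)
print("same ranking under both codings:", ranked_a == ranked_b)
```

An average of the codes, by contrast, would change with the coding, which is why arithmetic on ordinal data is not meaningful.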
Hierarchy of Data…
Interval
Values are real numbers.
All calculations are valid.
Data may be treated as ordinal or nominal.
Ordinal
Values must represent the ranked order of the data.
Data may be treated as nominal but not as interval.
Nominal
Values are the arbitrary numbers that represent categories.
Data may not be treated as ordinal or interval.
Graphical & Tabular Techniques for Nominal Data…
The only allowable calculation on nominal data is to count
the frequency of each value of the variable.
We can summarize the data in a table that presents the
categories and their counts called a frequency distribution.
A relative frequency distribution lists the categories and the
proportion with which each occurs.
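As a sketch, both distributions can be built with `collections.Counter`. The eight responses below are hypothetical and exist only to show the counting.

```python
from collections import Counter

# Hypothetical responses to a question about marital status.
responses = [
    "Single", "Married", "Married", "Divorced",
    "Single", "Married", "Widowed", "Married",
]

# Frequency distribution: each category with its count.
frequency = Counter(responses)

# Relative frequency distribution: each category with its proportion.
total = len(responses)
relative = {category: count / total for category, count in frequency.items()}

for category, count in frequency.items():
    print(f"{category:<9} {count:>2}  {relative[category]:.3f}")
```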
Nominal Data (Frequency)
Bar Charts show frequencies…
Nominal Data (Relative Frequency)
Pie Charts show relative frequencies…
Nominal Data
It is all the same information
(based on the same data),
just presented differently.
Ordinal Data
Same as for nominal data, but:
• The bars in the bar chart are arranged in ascending or descending
order.
• The wedges in the pie chart are arranged clockwise in ascending
or descending order.
Graphical Techniques for Interval Data
There are several graphical methods that are used when the
data are interval.
The most important of these graphical methods is the
histogram.
Example
For nominal data we created a frequency distribution of the
categories.
For interval data, we create a frequency distribution by
counting the number of observations that fall into a series of
intervals, called classes.
Example
These data are stored in file Xm0204.
The classes are defined as follows:
Classes
Amounts that are less than or equal to 15
Amounts that are more than 15 but less than or equal to 30
Amounts that are more than 30 but less than or equal to 45
Amounts that are more than 45 but less than or equal to 60
Amounts that are more than 60 but less than or equal to 75
Amounts that are more than 75 but less than or equal to 90
Amounts that are more than 90 but less than or equal to 105
Amounts that are more than 105 but less than or equal to 120
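The class definitions above can be turned into a frequency distribution with a short sketch. The amounts below are hypothetical, not the contents of file Xm0204, and the code assumes no amount exceeds 120.

```python
from bisect import bisect_left

# Upper limits of the eight classes listed above.
upper_limits = [15, 30, 45, 60, 75, 90, 105, 120]

# Hypothetical amounts, chosen to include values exactly on class limits.
amounts = [0.0, 15.0, 15.01, 29.99, 30.0, 42.19, 75.0, 89.5, 104.88, 119.63]

frequencies = [0] * len(upper_limits)
for amount in amounts:
    # bisect_left finds the first class whose upper limit is >= amount,
    # so an amount exactly on a limit (e.g. 15) falls in the lower class.
    # Assumption: every amount is at most 120, the last upper limit.
    frequencies[bisect_left(upper_limits, amount)] += 1

lower = 0
for upper, count in zip(upper_limits, frequencies):
    print(f"more than {lower:>3}, up to {upper:>3}: {count}")
    lower = upper
```

Using `bisect_left` rather than `bisect_right` is what implements "less than or equal to" at each upper limit.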
Interpret…
About half of the bills (71 + 37 = 108 of 200) are “small”,
i.e. $30 or less.
(18 + 28 + 14 = 60) ÷ 200 = 30%,
i.e. nearly a third of the phone bills
are more than $75.
There are only a few telephone
bills in the middle range.
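The arithmetic behind this interpretation can be checked with a few lines. The counts are the ones quoted above; the middle-range count is derived on the assumption that all 200 bills fall into one of the eight classes.

```python
total_bills = 200

lowest_two = 71 + 37          # counts in the two lowest classes
highest_three = 18 + 28 + 14  # counts in the three highest classes

# Assumption: every bill belongs to one of the eight classes,
# so the remaining bills are in the three middle classes.
middle = total_bills - lowest_two - highest_three

share_low = lowest_two / total_bills
share_high = highest_three / total_bills
share_middle = middle / total_bills

print(f"low: {share_low:.0%}, middle: {share_middle:.0%}, high: {share_high:.0%}")
```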