Lecture 1.
Chapter 1
What is statistics?
• We are constantly being bombarded with
statistics and statistical information. For
example:
– customer surveys
– economic predictions
– marketing information
– political polls (proportion of people voting
for candidate A or policy B)
• How can we make sense out of all this data?
“Statistics is a way to get information from data”
Data → Statistics → Information
• Data: Facts, especially numerical facts, collected together for
reference or information.
• Information: Knowledge communicated concerning some particular
fact.
(Definitions: Oxford English Dictionary)
“Statistics is a tool for creating new understanding from a set of
numbers”
Key statistical concepts
Population
• A population is the group of all items of interest to a
statistics practitioner.
• Frequently very large; sometimes infinite.
Example 1.4, page 4: An Australian automobile club
and a new life insurance policy. The population is all
current members (a million or so) of the automobile club.
Sample
• A sample is a set of data drawn from the population.
• Potentially very large, but smaller than the population.
Example 1.4 (contd.): A sample of 500 members of
the club is selected.
Parameter
• A descriptive measure of a population.
Example 1.4 (contd.): the proportion of all
members who would purchase the new life
insurance policy.
Statistic
• A descriptive measure of a sample.
Example 1.4 (contd.): the proportion of the 500
selected members who would purchase the
new life insurance policy.
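The distinction between a parameter and a statistic can be sketched in Python. This is a minimal illustration rather than part of Example 1.4: the population of 1 000 members, its 12% purchase rate and the sample size of 100 are all hypothetical values chosen for the sketch.

```python
import random

# Hypothetical population: 1 000 members, 1 = would buy the policy, 0 = would not.
# The 12% rate is an assumed value for illustration only.
population = [1] * 120 + [0] * 880

# Parameter: a descriptive measure of the whole population.
parameter = sum(population) / len(population)

# Statistic: the same measure computed on a sample drawn from the population.
random.seed(1)  # fixed seed so the run is repeatable
sample = random.sample(population, 100)
statistic = sum(sample) / len(sample)

print(f"population proportion (parameter): {parameter:.3f}")
print(f"sample proportion (statistic):     {statistic:.3f}")
```

The parameter is fixed, whereas the statistic changes from one sample to the next (try a different seed).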
[Diagram: the sample is a subset of the population. A parameter describes the population; a statistic describes the sample.]
• Populations have parameters.
• Samples have statistics.
• Can you give examples of populations with their parameters,
and of samples with their statistics?
Descriptive statistics
• Methods of organizing, summarizing, and
presenting data in a convenient and informative
way. These methods include:
– graphical techniques (Chapters 3 and 4), and
– numerical techniques (Chapter 5).
• The actual method used depends on what
information you would like to extract. Are you
interested in:
– measure(s) of central location and/or
– measure(s) of variability (dispersion)?
Descriptive statistics helps to answer these questions.
Inferential statistics
• Descriptive statistics describes the data set
that is being analysed, but does not allow us
to draw any conclusions or make any
inferences beyond that data set. Hence we need
another branch of statistics: inferential
statistics.
• Inferential statistics is also a set of methods,
but it is used to draw conclusions or inferences
about characteristics of populations based on
data from a sample.
Statistical inference
• Statistical inference is the process of making
an estimate, prediction, or decision about a
population based on a sample.
[Diagram: a sample is drawn from the population; inference runs from the sample statistic to the population parameter.]
What can we infer about a population’s parameters
based on a sample’s statistics?
• We use the sample statistics to make inferences
about the population parameters.
• Therefore, we can make an estimate, prediction,
or decision about a population based on sample
data.
• Thus, we can apply what we know about a sample
to the larger population from which it was drawn.
Example 1.4 (contd.): Suppose 60 out of the 500
selected members (12%) want to purchase the new life
insurance policy. A statistical inference
may then be made: about 12% (or at least 10%) of all
one million members would purchase the new
policy.
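The arithmetic behind this inference can be sketched as follows. The figures come from Example 1.4; the variable names are illustrative, and the population size is taken as exactly one million for simplicity.

```python
# Figures taken from Example 1.4.
sample_size = 500
buyers_in_sample = 60
population_size = 1_000_000  # "a million or so" members, rounded for the sketch

# Sample statistic: proportion of sampled members who would buy.
sample_proportion = buyers_in_sample / sample_size

# Point estimate of the number of buyers in the whole population.
estimated_buyers = round(sample_proportion * population_size)

print(f"sample proportion: {sample_proportion:.0%}")
print(f"estimated buyers in population: {estimated_buyers}")
```

This is only a point estimate; it says nothing yet about how reliable the figure is.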
• Rationale
– Large populations make investigating each
member impractical and expensive.
– It is easier and cheaper to take a sample
and make estimates about the population
from the sample.
• However
– Such conclusions and estimates are not
always going to be correct.
– For this reason, we build into the statistical
inference 'measures of reliability', namely
confidence level and significance level.
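How a confidence level attaches a measure of reliability to an estimate can be sketched with the Example 1.4 figures. The normal-approximation interval and the 95% level (z ≈ 1.96) are assumptions made for this illustration; the slides do not specify a method at this point.

```python
import math

n = 500   # sample size (Example 1.4)
x = 60    # sampled members who would buy (Example 1.4)
z = 1.96  # assumed: standard normal critical value for 95% confidence

p_hat = x / n                                   # sample proportion
std_error = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error
lower = p_hat - z * std_error
upper = p_hat + z * std_error

print(f"95% interval for the population proportion: {lower:.3f} to {upper:.3f}")
```

Under these assumptions the interval runs from roughly 9% to 15%, which shows how much uncertainty surrounds the 12% point estimate.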
Statistical applications in
economics & business
• Statistical analysis plays an important role in virtually all
aspects of business and economics.
• Throughout this course, we will see applications of
statistics in:
– accounting
– economics
– finance
– human resources management
– marketing
– operations management.
(See also cases on pages 7, 8, 9)
Chapter 2
Types of data, data collection
and sampling
2.1 Types of data
Optional reading:
2.2 Method of collecting data
2.3 & 2.4 Sampling and sampling plans
2.5 Sampling and non-sampling errors
Re-cap
Descriptive statistics involves arranging,
summarizing, and presenting a set of data in such a
way that useful information is produced.
Its methods make use of graphical techniques and
numerical descriptive measures (such as averages)
to summarize and present the data.
Some useful definitions
A variable is some characteristic of a population
or sample. E.g. marks of IB2020A on the math
exam (example, page 19)
Typically denoted with a capital letter: X, Y, Z…
The values of a variable are the possible
observations of that variable.
E.g. student marks 0, 1, 2, …, 100
Data are the observed values of a variable.
E.g. student marks: {67, 74, 71, 83, 93, 55, 48}
2.1 Types of data
Data (at least for purposes of statistics) fall into
three main groups:
• Numerical (interval or quantitative) data
• Nominal (categorical or qualitative) data
• Ordinal (ranked) data
Types of data (pages 19, 20, 21)
Numerical data
   age   income          weight gain
   55    75 000          +10
   42    68 000          +5
   .     .               .
Nominal data
   person  married       computer  brand
   1       yes           1         IBM
   2       no            2         Dell
   3       no            3         Compaq
   .       .             4         IBM
Ordinal data
   exam grade            food quality
   HD                    Excellent
   D                     Good
   C                     Satisfactory
   P                     Poor
   F                     .
With nominal data, all we can calculate is the proportion of data that
falls into each category.
   brand        IBM   Dell  Compaq  other  total
   frequency    25    11    8       6      50
   percentage   50%   22%   16%     12%
With ordinal data, all we can use is computations involving the
ordering process.
Calculations for Types of Data
As mentioned above,
• All calculations are permitted on numerical
data.
• No calculations are allowed for nominal data,
except counting the number of observations in
each category and calculating their proportions.
• Only calculations involving a ranking process
are allowed for ordinal data.
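The three rules above can be sketched in Python. The marks and the computer-brand frequencies are taken from the earlier slides; the list of exam grades is a hypothetical sample invented for this illustration, and the choice of the median as the ranking-based calculation is likewise an assumption.

```python
from collections import Counter
from statistics import mean, median_low

# Numerical data: all calculations are permitted, e.g. the mean.
marks = [67, 74, 71, 83, 93, 55, 48]  # student marks from the earlier slide
average_mark = mean(marks)

# Nominal data: only counts and proportions per category.
# Rebuilt from the computer-brand frequencies on the earlier slide.
brands = ["IBM"] * 25 + ["Dell"] * 11 + ["Compaq"] * 8 + ["other"] * 6
counts = Counter(brands)
proportions = {brand: count / len(brands) for brand, count in counts.items()}

# Ordinal data: only calculations based on the ordering, e.g. the median rank.
grade_order = ["F", "P", "C", "D", "HD"]       # lowest to highest
grades = ["C", "HD", "P", "D", "C", "F", "D"]  # hypothetical sample of grades
ranks = [grade_order.index(g) for g in grades]
median_grade = grade_order[median_low(ranks)]

print(f"mean mark: {average_mark:.1f}")
print(f"brand proportions: {proportions}")
print(f"median grade: {median_grade}")
```

Averaging the grade ranks would treat the gaps between grades as equal, which ordinal data do not guarantee; the median uses only the ordering.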
Chapter 3
Graphical descriptive
techniques – Nominal data
3.1 Graphical techniques to describe
nominal data
Optional reading
3.2 Selecting the appropriate chart
3.3 Graphical techniques to describe
ordinal data
3.4 Describing the relationship between
nominal variables
3.1 Graphical techniques to
describe nominal data
The only allowable calculation on nominal data is
to count the frequency of each value of the
variable.
We can summarize the data in a table that
presents the categories and their counts, called a
frequency distribution.
A relative frequency distribution lists the
categories and the proportion with which each
occurs.
Example 3.1, page 45
• To determine the approximate market share of
various women’s magazines in New Zealand, a
women’s magazine readership survey was
conducted using a sample of 300 readers.
• Data was collected and the count of the
occurrences (frequencies) was recorded for each
magazine.
• The frequencies were presented in a bar chart.
• Then the frequencies were converted to
proportions and the results were presented in a
pie chart.
Example 3.1 (contd.)
1 = Australian Women’s Weekly (NZ Edition); 2 = NZ
Women’s Weekly; 3 = NZ Woman’s Day ; 4 = NZ New Idea;
5 = Next; and 6 = That’s Life.
Magazine                          Frequency  Percentage
Australian Women's Weekly (1)        59        19.7
NZ Women's Weekly (2)                58        19.3
NZ Woman's Day (3)                   88        29.3
NZ New Idea (4)                      39        13.0
Next (5)                             35        11.7
That's Life (6)                      21         7.0
Total                               300       100
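The percentage column can be reproduced from the frequencies with a short Python sketch. The frequencies are those of Example 3.1; the variable names and the rounding to one decimal place are choices made for this illustration.

```python
# Frequencies from the Example 3.1 table (sample of 300 readers).
frequencies = {
    "Australian Women's Weekly": 59,
    "NZ Women's Weekly": 58,
    "NZ Woman's Day": 88,
    "NZ New Idea": 39,
    "Next": 35,
    "That's Life": 21,
}

total = sum(frequencies.values())

# Relative frequency of each category, as a percentage to one decimal place.
percentages = {
    magazine: round(100 * count / total, 1)
    for magazine, count in frequencies.items()
}

for magazine, pct in percentages.items():
    print(f"{magazine:<28}{frequencies[magazine]:>5}{pct:>8.1f}")
print(f"{'Total':<28}{total:>5}")
```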
Example 3.1 (contd.): Excel representation
[Bar chart: Women's magazine readership, NZ, 2015. Vertical axis: frequency, 0 to 100. Bars: Australian Women's Weekly 59, NZ Women's Weekly 58, NZ Woman's Day 88, NZ New Idea 39, Next 35, That's Life 21.]
The size of each slice in a pie chart is proportional
to the percentage corresponding to the category it
represents.
[Pie chart: Women's magazine readership, NZ, 2015. Slices: Australian Women's Weekly 19.7%, NZ Women's Weekly 19.3%, NZ Woman's Day 29.3%, NZ New Idea 13.0%, Next 11.7%, That's Life 7.0%.]
E.g. the slice for NZ Women's Weekly spans (58/300)(360°) = 69.6°.
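The slice sizes can be computed for every category with a short sketch. The frequencies are those of Example 3.1; the variable names are illustrative.

```python
# Frequencies from Example 3.1 (sample of 300 readers).
frequencies = {
    "Australian Women's Weekly": 59,
    "NZ Women's Weekly": 58,
    "NZ Woman's Day": 88,
    "NZ New Idea": 39,
    "Next": 35,
    "That's Life": 21,
}

total = sum(frequencies.values())

# Each slice's central angle is its share of the full 360 degrees.
angles = {
    magazine: 360 * count / total
    for magazine, count in frequencies.items()
}

for magazine, angle in angles.items():
    print(f"{magazine:<28}{angle:>7.1f} degrees")
```

The angles add up to 360 degrees because the relative frequencies add up to 1.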
3.4 Describing the relationship
between two nominal variables
Now we will investigate the relationship
between two nominal variables using either
tabular or graphical techniques.
A cross-classification table is used to
describe the relationship between two nominal
variables.
A cross-classification table lists the frequency
of each combination of the values of the two
variables…
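Building such a table amounts to counting pairs of values. The sketch below assumes two nominal variables, occupation and newspaper; the six observations are invented for illustration only and are not data from the textbook.

```python
from collections import Counter

# Hypothetical coded responses: (occupation, newspaper) for each respondent.
# These six observations are invented for illustration only.
responses = [
    ("blue-collar", "N1"),
    ("blue-collar", "N3"),
    ("white-collar", "N2"),
    ("professional", "N2"),
    ("professional", "N2"),
    ("white-collar", "N1"),
]

occupations = ["blue-collar", "white-collar", "professional"]
newspapers = ["N1", "N2", "N3", "N4"]

# Count how often each combination of the two variables occurs.
pair_counts = Counter(responses)

# Lay the counts out as a table: one row per occupation, one column per newspaper.
table = {
    occ: {paper: pair_counts[(occ, paper)] for paper in newspapers}
    for occ in occupations
}

print(f"{'':<14}" + "".join(f"{p:>4}" for p in newspapers))
for occ in occupations:
    print(f"{occ:<14}" + "".join(f"{table[occ][p]:>4}" for p in newspapers))
```

Combinations that never occur get a count of zero, so every cell of the table is filled.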
Example 3.7, page 67
• In a major Australian city there are four
competing newspapers: N1, N2, N3 and N4.
• To help design advertising campaigns, the
advertising managers of the newspapers
need to know which segments of the
newspaper market are reading their papers.
• A survey was conducted to analyze the
relationship between newspapers read and
occupation.
Example 3.7 (contd.)
A sample of newspaper readers was asked to report
which newspaper they read (N1, N2, N3 or N4) and to
indicate whether they were a blue-collar worker (1),
a white-collar worker (2), or a professional (3). By
counting the number of times each of the 12
combinations occurs, we produced Table 3.9.
Interpretation: The relative frequencies in rows 2 and 3
(white-collar workers and professionals) are similar, but
there are large differences between rows 1 (blue-collar
workers) and 2, and between rows 1 and 3.
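Comparing rows in this way relies on row relative frequencies, that is, each count divided by its row total. A minimal sketch follows; the counts are hypothetical and are not the values of Table 3.9.

```python
# Hypothetical cross-classification counts (NOT the values of Table 3.9).
# Rows: occupation; columns: newspapers N1, N2, N3, N4.
counts = {
    "blue-collar":  [30, 10, 40, 20],
    "white-collar": [25, 45, 20, 10],
    "professional": [26, 44, 18, 12],
}

# Row relative frequencies: each count divided by its row total.
row_relative = {
    occupation: [count / sum(row) for count in row]
    for occupation, row in counts.items()
}

for occupation, shares in row_relative.items():
    print(f"{occupation:<14}" + "".join(f"{share:>7.2f}" for share in shares))
```

Dividing by the row total removes the effect of unequal group sizes, so the rows can be compared directly.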
Use the data from the cross-classification table to
create bar charts.
[Bar chart: newspaper read (N1, N2, N3, N4) by occupation (blue collar, white collar, professional). Vertical axis: frequency, 0 to 60. Data labels: 51, 43, 38, 37, 33, 29, 27, 18, 22, 21, 20, 15.]
- Professionals tend to read newspaper N2 more than
twice as often as newspaper N3.
- Blue collar workers tend to read different
newspapers from both white collar workers and
professionals, while white collar workers and
professionals are quite similar in their newspaper
choice.
Summary:
Chapter 2 page 40, Chapter 3 page 73
Home assignment:
- Section 2.1 Exercises, pages 23-24: 2.3, 2.5, 2.8
- Section 3.1 Exercises, pages 60-61: 3.2, 3.3, 3.4
- Section 3.4 Exercises, pages 71-72: 3.29, 3.30