business statistics using Excel®



business statistics
using Excel®
Second edition
Glyn Davis & Branko Pecar


Great Clarendon Street, Oxford, OX2 6DP,
United Kingdom
Oxford University Press is a department of the University of Oxford.
It furthers the University’s objective of excellence in research, scholarship,
and education by publishing worldwide. Oxford is a registered trade mark of
Oxford University Press in the UK and in certain other countries
© Glyn Davis and Branko Pecar 2013
The moral rights of the authors have been asserted
First Edition copyright 2010
Impression: 1
All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by licence or under terms agreed with the appropriate reprographics
rights organization. Enquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above
You must not circulate this work in any other form
and you must impose this same condition on any acquirer
British Library Cataloguing in Publication Data
Data available
ISBN 978–0–19–965951–7
Printed in Italy by
L.E.G.O. S.p.A.—Lavis TN
Links to third party websites are provided by Oxford in good faith and
for information only. Oxford disclaims any responsibility for the materials
contained in any third party website referenced in this work.


Preface
Aims of the book
It has long been recognized that the development of modular undergraduate programmes
coupled with a dramatic increase in student numbers has led to a reconsideration of
teaching practices. This statement is particularly true in the teaching of statistics and, in
response, a more supportive learning process has been developed. A classic approach to
teaching statistics, unless one is teaching a class of future professional statisticians, can be
difficult and is often met with very little enthusiasm by the majority of students. A more
supportive learning process based on method application rather than method derivation
is clearly needed. The authors thought that by relying on some commonly available tools,
Microsoft Excel 2010 in particular, such an approach would be possible. To this effect, a
new programme relying on the integration of workbook based open learning materials
with information technology tools has been adopted. The current learning and assessment structure may be defined as follows:
(a) To help students ‘bridge the gap’ between school and university
(b) To enable students to be confident in handling numerical data
(c) To enable students to appreciate the role of statistics as a business decision-making
tool
(d) To provide students with the knowledge to use Excel 2010 to solve a range of
statistical problems.
This book is aimed at students who require a general introduction to business statistics
that would normally form a foundation-level business school module. The learning material
in this book requires minimal input from a lecturer and can be used as a self-instruction
guide. Furthermore, three online workbooks are available: two to help students with
Excel and to practise numerical skills, and an advanced workbook to help students undertake
factorial experiment analysis using Excel 2010.
The growing importance of spreadsheets in business is emphasized throughout the text
by the use of the Excel spreadsheet. The use of software in statistics modules is more or
less mandatory at both diploma and degree level, and the emphasis within the text is on
the use of Excel 2010 to undertake the required calculations.

How to use the book effectively
The sequence of chapters has been arranged so that there is a progressive accumulation
of knowledge. Each chapter guides students step by step through the theoretical and
spreadsheet skills required. Chapters also contain exercises that give students the chance
to check their progress.



Hints on using the book
(a) Be patient and work slowly and methodically, especially in the early stages when
progress may be slow.
(b) Do not omit or ‘jump around’ between chapters; each chapter builds upon
knowledge and skills gained previously. You may also find that the Excel
applications described earlier in the book are required to develop applications in
later chapters.
(c) Try not to compare your progress with others too much. Fastest is not always best!
(d) Don’t try to achieve too much in one session. Time for rest and reflection is
important.
(e) Mistakes are part of learning. Do not worry about them. The more you repeat
something, the fewer mistakes you will make.
(f) Make time to complete the exercises, especially if you are learning on your own.
They are your best guide to your progress.
(g) The visual walkthroughs have been developed to solve a particular statistical
problem using Excel. If you are not sure about the Excel solution then use the visual
walkthrough (Flash movies) as a reminder.


Brief contents
How to use this book
How to use the Online Resource Centre
1 Visualizing and presenting data
2 Data descriptors
3 Introduction to probability
4 Probability distributions
5 Sampling distributions and estimating
6 Introduction to parametric hypothesis testing
7 Chi-square and non-parametric hypothesis testing
8 Linear correlation and regression analysis
9 Time series data and analysis
Glossary
Index


Detailed contents
How to use this book
How to use the Online Resource Centre

1 Visualizing and presenting data
  Overview
  Learning objectives
  1.1 The different types of data variable
  1.2 Tables
    1.2.1 What a table looks like
    1.2.2 Creating a frequency distribution
    1.2.3 Types of data
    1.2.4 Creating a table using Excel PivotTable
    1.2.5 Principles of table construction
  1.3 Graphical representation of data
    1.3.1 Bar charts
    1.3.2 Pie charts
    1.3.3 Histograms
    1.3.4 Histograms with unequal class intervals
    1.3.5 Frequency polygon
    1.3.6 Scatter and time series plots
    1.3.7 Superimposing two sets of data onto one graph
  Techniques in practice
  Summary
  Key terms
  Further reading

2 Data descriptors
  Overview
  Learning objectives
  2.1 Measures of central tendency
    2.1.1 Mean, median, and mode
    2.1.2 Percentiles and quartiles
    2.1.3 Averages from frequency distributions
    2.1.4 Weighted averages
  2.2 Measures of dispersion
    2.2.1 The range
    2.2.2 The interquartile range and semi-interquartile range (SIQR)
    2.2.3 The standard deviation and variance
    2.2.4 The coefficient of variation
    2.2.5 Measures of skewness and kurtosis
  2.3 Exploratory data analysis
    2.3.1 Five-number summary
    2.3.2 Box plots
    2.3.3 Using the Excel ToolPak add-in
  Techniques in practice
  Summary
  Key terms
  Further reading

3 Introduction to probability
  Overview
  Learning objectives
  3.1 Basic ideas
  3.2 Relative frequency
  3.3 Sample space
  3.4 The probability laws
  3.5 The general addition law
  3.6 Conditional probability
  3.7 Statistical independence
  3.8 Probability tree diagrams
  3.9 Introduction to probability distributions
  3.10 Expectation and variance for a probability distribution
  Techniques in practice
  Summary
  Key terms
  Further reading

4 Probability distributions
  Overview
  Learning objectives
  4.1 Continuous probability distributions
    4.1.1 Introduction
    4.1.2 The normal distribution
    4.1.3 The standard normal distribution (Z distribution)
    4.1.4 Checking for normality
    4.1.5 Other continuous probability distributions
    4.1.6 Probability density function and cumulative distribution function
  4.2 Discrete probability distributions
    4.2.1 Introduction
    4.2.2 Binomial probability distribution
    4.2.3 Poisson probability distribution
    4.2.4 Poisson approximation to the binomial distribution
    4.2.5 Normal approximation to the binomial distribution
    4.2.6 Normal approximation to the Poisson distribution
    4.2.7 Other discrete probability distributions
  Techniques in practice
  Summary
  Key terms
  Further reading

5 Sampling distributions and estimating
  Overview
  Learning objectives
  5.1 Introduction to the concept of a sample
    5.1.1 Why sample?
    5.1.2 Sampling terminology
    5.1.3 Types of samples
    5.1.4 Types of error
  5.2 Sampling from a population
    5.2.1 Introduction
    5.2.2 Population versus sample
    5.2.3 Sampling distributions
    5.2.4 Sampling distribution of the mean
    5.2.5 Sampling from a normal population
    5.2.6 Sampling from a non-normal population
    5.2.7 Sampling distribution of the proportion
    5.2.8 Using Excel to generate a sample from a sampling probability distribution
  5.3 Population point estimates
    5.3.1 Introduction
    5.3.2 Types of estimate
    5.3.3 Criteria of a good estimator
    5.3.4 Point estimate of the population mean and variance
    5.3.5 Point estimate for the population proportion and variance
    5.3.6 Pooled estimates
  5.4 Population confidence intervals
    5.4.1 Introduction
    5.4.2 Confidence interval estimate of the population mean, µ (σ known)
    5.4.3 Confidence interval estimate of the population mean, µ (σ unknown, n < 30)
    5.4.4 Confidence interval estimate of the population mean, µ (σ unknown, n ≥ 30)
    5.4.5 Confidence interval estimate of a population proportion
  5.5 Calculating sample size
  Techniques in practice
  Summary
  Key terms
  Further reading

6 Introduction to parametric hypothesis testing
  Overview
  Learning objectives
  6.1 Hypothesis testing rationale
    6.1.1 Hypothesis statements H0 and H1
    6.1.2 Parametric versus non-parametric tests of difference
    6.1.3 One and two sample tests
    6.1.4 Choosing an appropriate statistical test
    6.1.5 Significance level
    6.1.6 Sampling distributions
    6.1.7 One and two tail tests
    6.1.8 Check t-test model assumptions
    6.1.9 Types of error
    6.1.10 P-values
    6.1.11 Critical test statistic
  6.2 One sample z-test for the population mean
  6.3 One sample t-test for the population mean
  6.4 Two sample z-test for the population mean
  6.5 Two sample z-test for the population proportion
  6.6 Two sample t-test for population mean (independent samples, equal variances)
  6.7 Two sample tests for population mean (independent samples, unequal variances)
    6.7.1 Two sample tests for independent samples (unequal variances)
    6.7.2 Equivalent non-parametric test: Mann–Whitney U test
  6.8 Two sample tests for population mean (dependent or paired samples)
    6.8.1 Two sample tests for dependent samples
    6.8.2 Equivalent non-parametric test: Wilcoxon matched pairs test
  6.9 F test for two population variances (variance ratio test)
  6.10 Calculating the size of the type II error and the statistical power
  Techniques in practice
  Summary
  Key terms
  Further reading

7 Chi-square and non-parametric hypothesis testing
  Overview
  Learning objectives
  7.1 Chi-square tests
    7.1.1 Chi-square test of association
    7.1.2 Chi-square test for independent samples
    7.1.3 McNemar’s test for matched (or dependent) pairs
    7.1.4 Chi-square goodness-of-fit test
  7.2 Non-parametric (or distribution-free) tests
    7.2.1 Sign test
    7.2.2 Wilcoxon signed rank sum test for dependent samples (or matched pairs)
    7.2.3 Mann–Whitney U test for two independent samples
  Techniques in practice
  Summary
  Key terms
  Further reading

8 Linear correlation and regression analysis
  Overview
  Learning objectives
  8.1 Linear correlation analysis
    8.1.1 Scatter plots
    8.1.2 Covariance
    8.1.3 Pearson’s correlation coefficient, r
    8.1.4 Testing the significance of linear correlation between the two variables
    8.1.5 Spearman’s rank correlation coefficient
    8.1.6 Testing the significance of Spearman’s rank correlation coefficient, rs
  8.2 Linear regression analysis
    8.2.1 Construct scatter plot to identify model
    8.2.2 Fit line to sample data
    8.2.3 Sum of squares defined
    8.2.4 Regression assumptions
    8.2.5 Test model reliability
    8.2.6 The use of t-test to test whether the predictor variable is a significant contributor
    8.2.7 The use of F test to test whether the predictor variable is a significant contributor
    8.2.8 Confidence interval estimate for slope β1
    8.2.9 Prediction interval for an estimate of Y
    8.2.10 Excel data analysis regression solution
  8.3 Some advanced topics in regression analysis
    8.3.1 Introduction to non-linear regression
    8.3.2 Introduction to multiple regression analysis
  Techniques in practice
  Summary
  Key terms
  Further reading

9 Time series data and analysis
  Overview
  Learning objectives
  9.1 Introduction to time series data
    9.1.1 Stationary and non-stationary time series
    9.1.2 Seasonal time series
    9.1.3 Univariate and multivariate methods
    9.1.4 Scaling the time series
  9.2 Index numbers
    9.2.1 Simple indices
    9.2.2 Aggregate indices
    9.2.3 Deflating values
  9.3 Trend extrapolation
    9.3.1 A trend component
    9.3.2 Fitting a trend to a time series
    9.3.3 Types of trends
    9.3.4 Using a trend chart function to forecast time series
    9.3.5 Trend parameters and calculations
  9.4 Moving averages and time series smoothing
    9.4.1 Forecasting with moving averages
    9.4.2 Exponential smoothing concept
    9.4.3 Forecasting with exponential smoothing
  9.5 Forecasting seasonal series with exponential smoothing
  9.6 Forecasting errors
    9.6.1 Error measurement
    9.6.2 Types of errors
    9.6.3 Interpreting errors
    9.6.4 Error inspection
  9.7 Confidence intervals
    9.7.1 Population and sample standard errors
    9.7.2 Standard errors in time series
  Techniques in practice
  Summary
  Key terms
  Further reading

Glossary
Index


How to use this book

Learning objectives
Each chapter opens with a series of learning objectives outlining what you can expect
to learn as you progress through the chapter.
These also serve as helpful recaps of important concepts when revising.

» Learning objectives «
On successful completion of the module, you will be able to:
» understand the concept of an average;
» recognize that three possible averages exist (mean, mode, and median) and calculate them
using a variety of graphical and formula methods in number and frequency distribution form;
» recognize when to use different measures of average;
» understand the concept of dispersion;
» recognize that different measures of dispersion exist (range, quartile range, SIQR, standard
deviation, and variance), and calculate them using a variety of graphical and formula methods
in number and frequency distribution form;
» recognize when to use different measures of dispersion;
» understand the idea of distribution shape, and calculate a value for symmetry and
peakedness;

Step-by-step Excel guidance
Excel screenshots are fully integrated
throughout the text and visually demonstrate
the Excel formulas, functions, and solutions to
provide you with clear step-by-step guidance
on how to solve the statistical problems posed.

Figure 2.4


Example boxes
Detailed worked examples run throughout
each chapter to show you how the theory
relates to practice. The authors break concepts
down into clear step-by-step phases, which
are often accompanied by a series of Excel
screenshots, enabling you to assess your
progress.

Example 2.4
To illustrate the use of the Select Formulas > Select Insert Function method consider the problem of calculating the mean value in Example 2.1. In Figures 2.1 and 2.2 the mean value is
located in cell E12. To insert the correct Excel function into cell E12 we would click on cell E12
and then Select Formulas > Select Insert Function as illustrated in Figures 2.3 and 2.4.

Note boxes
Note boxes draw your attention to key points,
areas where extra care should be taken, or
certain exceptions to the rules.

Note According to Table 2.3, a number of claims corresponding to ‘one’ occurs three
times, which will contribute three to the total, ‘two’ claims occur four times contributing eight
to the sum, and so on. This can be written as follows:

$$\text{Mean}(X) = \frac{(3 \times 1) + (4 \times 2) + \cdots + (1 \times 10)}{3+4+4+5+5+7+5+3+3+1} = \frac{206}{40} = 5.15$$

As already pointed out, as we are dealing with discrete data we would indicate a mean as
approximately five claims. Equation (2.3) can now be used to calculate the mean for a frequency distribution data set:

$$\bar{X} = \frac{\sum fX}{\sum f} \qquad (2.3)$$

Interpretation boxes
Interpretation boxes appear throughout
the chapters, providing you with further
explanations to aid your understanding of the
concepts being discussed.

❉ Interpretation Twenty-five per cent of all the values in the data set are equal to or
below 430 miles, while 75% are equal to or below 470 miles.
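
The frequency-distribution mean in the Note excerpt (Equation 2.3) can be checked with a few lines of Python. This is only an illustrative sketch: the frequencies are copied from the denominator of the worked sum, while the claim counts 1 to 10 are an assumption read off the pattern "(3*1) + (4*2) + ... + (1*10)", since Table 2.3 itself is not reproduced here. The names `claims`, `frequencies`, and `frequency_mean` are hypothetical.

```python
# Assumed claim counts X = 1..10 (Table 2.3 is not reproduced in this excerpt).
claims = list(range(1, 11))
# Frequencies f, copied from the denominator of the worked sum in the Note.
frequencies = [3, 4, 4, 5, 5, 7, 5, 3, 3, 1]


def frequency_mean(values, freqs):
    """Mean of a frequency distribution: sum(f * X) / sum(f), as in Equation (2.3)."""
    total_fx = sum(f * x for x, f in zip(values, freqs))
    total_f = sum(freqs)
    return total_fx / total_f


mean_claims = frequency_mean(claims, frequencies)
print(sum(frequencies))   # total number of observations: 40
print(mean_claims)        # 206 / 40 = 5.15
```

In Excel the same calculation can be written with the SUMPRODUCT and SUM functions, for example =SUMPRODUCT(A2:A11,B2:B11)/SUM(B2:B11), where the cell ranges are a hypothetical layout with values in column A and frequencies in column B.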


Student exercises
Throughout each chapter you are regularly
given the chance to test your knowledge
and understanding of the topics covered
through student exercises at the end of each
section. You can then monitor your progress
by checking the answers at the back of the
textbook and online.

X2.14 The manager at BIG JIMS restaurant is concerned about the time it takes to process
credit card payments at the counter by counter staff. The manager has collected the
following processing time data (time in minutes/seconds) (Table 2.21) and requested
that summary statistics be calculated.
(a) Calculate a five-number summary for this data set.
(b) Do we have any evidence for a symmetric distribution?
(c) Use the Excel Analysis-ToolPak to calculate descriptive statistics.
(d) Which measures would you use to provide a measure of average and spread?

Techniques in practice
Techniques in practice exercises appear at the
end of each chapter and reinforce learning by
presenting questions to test the knowledge
and skills covered in that unit. You can use
these to check your understanding of a topic
before moving on to the next chapter.

TP1 CoCo S. A. is concerned about the time taken to react to customer complaints and has
implemented a new set of procedures for its support centre staff. The customer service director plans to reduce the mean time for responding to customer complaints to 28 days and has
collected the sample data given in Table 4.12 after implementation of the new procedures to
assess the time to react to complaints (days).

20 33 33 29 24 30 40 33 20 39 32 37 32 50
36 31 38 29 15 33 27 29 43 33 31 35 19 39 22
21 28 22 26 42 30 17 32 34 39 39 32 38

Chapter summary
Each chapter ends with an overview of the
techniques covered and serves as an ideal
tool for you to check your understanding of
the skills you should have acquired in that
chapter.

Summary
In this chapter we have provided an introduction to the important statistical concept of parametric hypothesis testing for one and two samples. What is important in hypothesis testing is
that you are able to recognize the nature of the problem and should be able to convert this into
two appropriate hypothesis statements (H0 and H1) that can be measured.
If you are comparing more than two samples then you would need to employ advanced
statistical parametric hypothesis tests. These tests are called analysis of variance (ANOVA),
which are described in the online workbook ‘Factorial experiments’.
In this chapter we have described a simple five-step procedure to aid the solution process
and have focused on the application of Excel to solve the data problems. The main emphasis is placed on the use of the p-value, which is the probability, assuming the
null hypothesis (H0) is true, of obtaining a result at least as extreme as the one observed. Thus, if the measured p-value < α (alpha) then we would
reject H0 and call the result statistically significant; if the p-value > α then we would not reject H0. Remember the value of the p-value will depend on
whether we are dealing with a two or one tail test. So take extra care with this concept as this
is where most students slip up.

Key terms
Key terms are highlighted in green where
they first appear in the text, along with their
definition in the margin. You can also find
these terms at the end of each chapter for
quick reference.

Key Terms
Alpha
Alternative hypothesis
Beta, β
Central Limit Theorem
Critical test statistic
F distribution
F test
F test for two population variances (variance ratio test)
Hypothesis test procedure
Level of significance
Lower one tail test
Mann–Whitney U test
Non-parametric
Null hypothesis
One sample t-test for the population mean
One sample test
One sample z-test for the population mean
One tail tests
Parametric
P-value
Region of rejection
Robust test
Significance level, α
Statistical power
Two sample t-test for population mean (dependent or paired samples)

Further reading
A list of recommended reading is included
to allow you to explore a particular subject
area in more depth. Annotated web links are
also provided throughout the text to help you
locate further statistical resources.

Further Reading
Textbook Resources
1. Whigham, D. (2007) Business Data Analysis using Excel. Oxford: Oxford University Press.
2. Lindsey, J. K. (2003) Introduction to Applied Statistics: A Modelling Approach (2nd edn).
Oxford: Oxford University Press.

Web Resources
1. StatSoft Electronic Textbook (accessed 25 May 2012).
2. HyperStat Online Statistics Textbook (accessed 25 May 2012).
3. Eurostat: website is updated daily and provides direct access to the latest and most complete statistical information available on the European Union (EU), the EU Member States,
the Euro-zone and other countries (accessed 25 May 2012).
4. Economagic: contains international economic data sets (accessed 25 May 2012).


How to use the Online Resource Centre

www.oxfordtextbooks.co.uk/orc/davis_pecar2e/

For students
Numerical skills workbook
The authors have provided you with a numerical skills
refresher, packed with examples and exercises, to equip you
with the skills needed to confidently approach every topic in
the textbook.

Introduction to Excel workbook
This workbook serves as an introductory guide or refresher
course which will guide you through the features of Microsoft
Excel 2010.

Factorial experiments workbook
This workbook has been devised to offer you specific guidance
on how to identify and solve factorial experiments. The authors
have provided a wealth of exercises, solutions, and suggested
reading to help you further your understanding of this topic.

Self-test multiple-choice questions
Multiple-choice questions for each chapter of the book help
you to test your understanding of a topic.

Online glossary
The glossary of terms, along with their definitions from the
book, can now be found online for ease of reference.



Revision tips
The authors have provided you with revision tips to help
consolidate your learning and to assist you when preparing for
your exams.

Visual walkthroughs
Visual walkthroughs, complete with audio explanations, are provided
for each statistical process in the text to help guide you through the
techniques and Excel solutions.


For registered adopters
Instructor's manual
This resource includes a chapter-by-chapter guide to
structuring lectures and seminars as well as teaching tips and
solutions from the techniques and exercises in the text.

PowerPoint lecture slides
A suite of fully customizable PowerPoint slides has been
designed by the authors to assist you in your lectures and
presentations.

Test bank
Each chapter of the book is accompanied by a bank of assorted
questions, covering a variety of techniques for the topics
covered.

Excel data and solutions from the book
Excel spreadsheets and solutions can be found online for all
of the exercises and techniques in practice problems posed in
the book.



Visualizing and presenting data

The display of various types of data or information in the form of tables, graphs, and diagrams
is a common sight these days. Newspapers, magazines, and television
all use these types of display to try to convey information in an easy-to-assimilate way.
In a nutshell, what these forms of display aim to do is to summarize large sets of raw data
such that we can see, at a glance, the ‘behaviour’ of the data. Figures 1.1 and 1.2 provide
examples of tables published in an English newspaper.
THE WORST OFFENDERS
Bank            Account        Cut       Mortgage rate cut
A&L             Direct Saver   –4.95%    –2.20%
Abbey           50+            –4.65%    –2.85%
Halifax         Web Saver      –4.65%    –3.50%
Nationwide      e-Savings      –4.60%    –3.99%
Northern Rock   E-Saver        –4.00%    –2.70%
SOURCE: Moneyfacts.co.uk

Figure 1.1
‘No better off after rate cuts’. Elizabeth Colman, The Sunday Times—Money, 12 April 2009, p. 6

This chapter and the next will use a variety of techniques that can be used to present the
data in a form that will make sense to people. In this chapter we will look at using tables
and graphical forms to represent the raw data, and in Chapter 2 we will explore methods
that can put a summary number to the raw data.


» Overview «
In this chapter we shall look at methods to summarize data using tables and charts:
» tabulating data;
» graphing data.

Rising attacks
Increase in robberies over past three months compared to previous year
Staffordshire         56%
North Yorkshire       47%
Lincolnshire          46%
Cambridgeshire        33%
Nottinghamshire       26%
Merseyside            14%
Greater Manchester    10%
South Wales           No increase
Metropolitan police   −14%
SOURCE: Police figures

[Chart: Robberies (000s); vertical axis marked at 40, 80, and 120; horizontal axis years 98 to 08. SOURCE: Home Office]

Figure 1.2
‘Muggings soar as recession bites’. David Leppard, The Sunday Times, 12 April 2009, p. 11


» Learning objectives «
On successful completion of the module you will be able to:
» understand the different types of data variables that can be used to represent a specific
measurement;
» know how to present data in table form;
» present data in a variety of graphical forms;
» construct frequency distributions from raw data;
» distinguish between discrete and continuous data;
» construct histograms for equal and unequal class widths;
» understand what we mean by a frequency polygon;
» solve problems using Microsoft Excel.

Variable: A variable is a symbol that can take on any of a specified set of values.
Quantitative: Variables can be classified using numbers.
Qualitative: Variables can be classified as descriptive or categorical.
Categorical variables: A set of data is said to be categorical if the values or
observations belonging to it can be sorted according to category.

1.1 The different types of data variable

A variable is any measured characteristic or attribute that differs for different subjects.
For example, if the height of 1000 subjects was measured, then height would be a variable.
Variables can be quantitative or qualitative (sometimes called categorical variables).
Quantitative variables (or numerical variables) are measured on one of three different
scales: interval, ratio, or ordinal.
Qualitative variables are measured on a nominal scale. If a group of business students
was asked to name their favourite browser to browse the Web, then the variable would
be qualitative. If the time spent on the computer to research a topic was measured, then
the variable would be quantitative. Nominal measurement consists of assigning items
to groups or categories. No quantitative information is conveyed and no ordering of
the items is implied. Nominal scales are therefore qualitative rather than quantitative.
Football club allegiance, sex or gender, degree type, and courses studied are all examples
of nominal scales.
Frequency distributions, described in Chapter 2, are used to analyse data measured
on a nominal scale. The main statistic computed is the mode. Variables measured on a
nominal scale are often referred to as categorical or qualitative variables. It is very important
that you understand the type of data variable that you have, as the type of graph or
summary statistic calculated will be dependent upon the type of data variable that you
are handling.
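
As a minimal sketch of how nominal data are summarized, the Python snippet below builds a frequency table and finds the mode for the favourite-browser example. The responses are invented purely for illustration, and the variable names are hypothetical; the book itself works in Excel rather than Python.

```python
from collections import Counter

# Hypothetical survey responses: favourite web browser (nominal data).
responses = [
    "Chrome", "Firefox", "Chrome", "Safari", "Edge",
    "Chrome", "Firefox", "Safari", "Chrome", "Edge",
]

# Frequency table: how often each category occurs.
frequency_table = Counter(responses)

# The mode is the category with the highest frequency.
mode_label, mode_count = frequency_table.most_common(1)[0]

for label, count in frequency_table.most_common():
    print(f"{label:<8} {count}")
print("Mode:", mode_label)
```

Counting and the mode are the only summaries used here because the categories have no order or magnitude; an average of browser names would be meaningless.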
Measurements with ordinal scales are ordered in the sense that higher numbers represent higher values. However, the intervals between the numbers are not necessarily equal.
For example, on a five-point rating scale measuring student satisfaction, the difference
between a rating of 1 (‘very poor’) and a rating of 2 (‘poor’) may not represent the same
difference as the difference between a rating of 4 (‘good’) and a rating of 5 (‘very good’).
The lowest point on the rating scale in the example was arbitrarily chosen to be 1 and this
scale does not have a ‘true’ zero point. The only conclusion you can make is that one is
better than the other (or even worse), but you cannot say that one is twice as good as the
other.
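
The ordinal case can be sketched in the same way. The ratings below are invented, and the label for a rating of 3 is an assumption (the text only names ratings 1, 2, 4, and 5). The sketch uses `statistics.median_low` from the Python standard library, which always returns a value that actually occurs in the data, so the result is a genuine point on the rating scale.

```python
from statistics import median_low

# Rating labels; the label for 3 is assumed, the others follow the text.
labels = {1: "very poor", 2: "poor", 3: "average", 4: "good", 5: "very good"}

# Hypothetical student satisfaction ratings on the five-point scale.
ratings = [4, 2, 5, 3, 4, 1, 4, 5, 2, 4]

# Ordinal data can be ranked, so a median (middle rank) is meaningful.
median_rating = median_low(ratings)
print(median_rating, labels[median_rating])

# Ordering comparisons are valid: a rating of 4 ranks above a rating of 2 ...
print(4 > 2)
# ... but 4 / 2 == 2.0 does not mean 'twice as satisfied', because the
# intervals between adjacent ratings are not necessarily equal.
```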
On interval measurement scales, one unit on the scale represents the same magnitude
of the characteristic being measured across the whole range of the scale. For example, if
student stress was being measured on an interval scale, then a difference between a score
of 5 and a score of 6 would represent the same difference in stress as would a difference
between a score of 9 and a score of 10. Interval scales do not have a ‘true’ zero point,
however; therefore it is not possible to make statements about how many times higher
one score is than another. For the stress measurement, it would not be valid to say that a
person with a score of 6 was twice as stressed as a person with a score of 3.
Ratio scales are like interval scales except they have true zero points. For example, a
weight of 100 g is twice as much as 50 g. Interval and ratio measurements are also called
continuous variables. Table 1.1 summarizes the different measurement scales with
examples provided of these different scales.
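
The distinction between interval and ratio scales can be illustrated numerically. The sketch below uses temperature (listed in Table 1.1 as interval data) and weight (ratio data); the specific temperatures and the helper function names are chosen only for illustration. Changing the unit of an interval scale changes the ratio between two readings, whereas on a ratio scale the ratio survives a change of unit.

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius reading to Fahrenheit (both are interval scales)."""
    return c * 9 / 5 + 32


def grams_to_ounces(g):
    """Convert grams to avoirdupois ounces (both are ratio scales)."""
    return g / 28.349523125


# Interval scale: the zero point is arbitrary, so ratios depend on the unit.
ratio_c = 20.0 / 10.0                                              # 2.0
ratio_f = celsius_to_fahrenheit(20.0) / celsius_to_fahrenheit(10.0)  # 68 / 50
print(ratio_c, ratio_f)

# Differences do make sense: a 10 degree Celsius step is the same size in
# Fahrenheit wherever it sits on the scale.
step_low = celsius_to_fahrenheit(20.0) - celsius_to_fahrenheit(10.0)
step_high = celsius_to_fahrenheit(30.0) - celsius_to_fahrenheit(20.0)
print(step_low, step_high)

# Ratio scale: a true zero, so 100 g is twice 50 g in any unit.
ratio_g = 100.0 / 50.0
ratio_oz = grams_to_ounces(100.0) / grams_to_ounces(50.0)
print(ratio_g, ratio_oz)
```

So 20 degrees is not "twice as hot" as 10 degrees in any unit-independent sense, while 100 g is twice 50 g whatever unit is used.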

Interval scale: An interval scale is a scale of measurement where the distance between
any two adjacent units of measurement (or ‘intervals’) is the same, but the zero
point is arbitrary.
Ratio scale: A ratio scale consists not only of equidistant points but also has a
meaningful zero point.
Ordinal scale: An ordinal scale is a scale where the values/observations belonging to it
can be ranked (put in order) or have a rating scale attached. You can count and order,
but not measure, ordinal data.
Nominal scale: A set of data is said to be categorical if the values or observations
belonging to it can be sorted according to category.
Frequency distributions: Systematic method of showing the number of occurrences of
observational data in order from least to greatest.
Statistic: A statistic is a quantity that is calculated from a sample of data.
Graph: A graph is a picture designed to express words, particularly the connection
between two or more quantities.
Continuous variable: A set of data is said to be continuous if the values belong to a
continuous interval of real values.
Table: A table shows the number of times that items occur.
Classes: Classes provide several convenient intervals into which the values of the
variable of a frequency distribution may be grouped.

Table 1.1
Measurement scale: Recognizing a measurement scale

Nominal data
1. Classification data, e.g. male or female, red or black car.
2. Arbitrary labels, e.g. m or f, r or b, 0 or 1.
3. No ordering, e.g. it makes no sense to state that r > b.

Ordinal data
1. Ordered list, e.g. student satisfaction scale of 1, 2, 3, 4, and 5.
2. Differences between values are not important, e.g. political parties
can be given labels: far left, left, mid, right, far right, etc. and student
satisfaction scale of 1, 2, 3, 4, and 5.

Interval data
1. Ordered, constant scale, with no natural zero, e.g. temperature, dates.
2. Differences make sense, but ratios do not, e.g. temperature difference.

Ratio data
1. Ordered, constant scale, and a natural zero, e.g. length, height, weight,
and age.

1.2 Tables

Presenting data in tabular form can make even the most comprehensive descriptive narrative
of data more readily intelligible. Apart from taking up less room, a table enables
figures to be located more quickly and easy comparisons between different classes to be made,
and may reveal patterns that cannot otherwise be deduced. The simplest form of table
indicates the frequency of occurrence of objects within a number of defined categories.
Microsoft Excel provides a number of tables that can be constructed using raw data or
data that is already in summary form.

1.2.1 What a table looks like

Tables come in a variety of formats, from simple tables to frequency distributions, and
summarize data sets in a form that gives users ready access to the important
information. The table presented in Figure 1.1 compares the interest rate and mortgage
rate cuts for five leading bank accounts that appeared in The Sunday Times newspaper on
12 April 2009. The table shows the lender, account, interest
rate cut, and mortgage rate cut. This table will have been created from a data set collected
by the researcher.

Example 1.1
When asked the question ‘If there was a general election tomorrow, which party would you
vote for?’, 1110 students responded as follows: 400 said Conservative, 510 Labour, 78 Liberal
Democrats, 55 Green, and the rest some other party. We can put this information in table form
indicating the frequency within each category, either as a raw score or as a percentage of the
total number of responses (Table 1.2).

| Party | Frequency |
|---|---|
| Conservative | 400 |
| Labour | 510 |
| Democrat | 78 |
| Green | 55 |
| Other | 67 |
| Total | 1110 |

or

| Party | Frequency, % |
|---|---|
| Conservative | 36 |
| Labour | 46 |
| Democrat | 7 |
| Green | 5 |
| Other | 6 |
| Total | 100 |

Table 1.2 Proposed voting behaviour by 1110 university students
(source: University Student Survey June 2012)

Raw data: Raw data is data collected in original form.



Note
• When a secondary data source is used it is acknowledged.
• The title of the table is given.
• The total of the frequencies is given.
• When percentages are used for frequencies this is indicated together with the sample size, N.
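The percentage column of Table 1.2 can be reproduced from the raw frequencies. The following is a minimal sketch in Python, used here only as an illustration alongside the book's Excel approach; the variable names are assumptions, and the figures are those given in Example 1.1.

```python
# Frequencies from Table 1.2 (University Student Survey June 2012).
frequencies = {
    "Conservative": 400,
    "Labour": 510,
    "Democrat": 78,
    "Green": 55,
    "Other": 67,
}

total = sum(frequencies.values())  # the sample size, N

# Each frequency as a percentage of N, rounded to the nearest whole number.
percentages = {party: round(100 * f / total) for party, f in frequencies.items()}

for party, f in frequencies.items():
    print(f"{party:<13}{f:>6}{percentages[party]:>5}%")
print(f"{'Total':<13}{total:>6}{sum(percentages.values()):>5}%")
```

Here the rounded percentages happen to sum to exactly 100. That is not guaranteed in general, which is one reason for reporting the sample size N alongside any percentage table.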

Sometimes categories can be subdivided and tables can be constructed to convey this
information together with the frequency of occurrence within the subcategories. For
example, Table 1.3 indicates the half-yearly sales of two cars produced by a
large company, with the sales split by month.

Example 1.2
Half-yearly sales of XBAR Ltd

| Month | January | February | March | April | May | June | Total |
|---|---|---|---|---|---|---|---|
| Pink | 5200 | 4100 | 6000 | 6900 | 6050 | 7000 | 35250 |
| Blue | 2100 | 1050 | 2950 | 5000 | 6300 | 5200 | 22600 |
| Total | 7300 | 5150 | 8950 | 11900 | 12350 | 12200 | 57850 |

Table 1.3 Half-yearly sales of XBAR Ltd
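The row and column totals in Table 1.3 can be verified with a few lines of code. This is a minimal sketch in Python, offered as an illustration rather than as the book's Excel method; the variable names are assumptions, and the sales figures are those of Example 1.2.

```python
months = ["January", "February", "March", "April", "May", "June"]

# Monthly sales for each car, taken from Table 1.3.
sales = {
    "Pink": [5200, 4100, 6000, 6900, 6050, 7000],
    "Blue": [2100, 1050, 2950, 5000, 6300, 5200],
}

# Row totals: half-yearly sales of each car.
row_totals = {car: sum(values) for car, values in sales.items()}

# Column totals: combined sales of both cars in each month.
column_totals = [sum(column) for column in zip(*sales.values())]

# The grand total must agree whichever way it is summed.
grand_total = sum(row_totals.values())
assert grand_total == sum(column_totals)

for month, monthly_total in zip(months, column_totals):
    print(f"{month:<9}{monthly_total:>7}")
print(f"{'Total':<9}{grand_total:>7}")
```

Summing the table in both directions and comparing the two grand totals is a quick check against transcription errors in a subdivided table.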

Further subdivisions of categories may also be displayed, as indicated in Table 1.4,
which shows the television viewing behaviour of a sample of adult males.

Example 1.3
Tabulated results from a survey undertaken to measure the television viewing habits of adult
males by marital status and age.
| | Single, under 30 years | Single, 30+ years | Married, under 30 years | Married, 30+ years |
|---|---|---|---|---|
| Less than 15 hours per week | 330 | 358 | 1162 | 484 |
| 15 hours or more per week | 1719 | 241 | 643 | 1521 |
| Total | 2049 | 599 | 1805 | 2005 |

Table 1.4 Viewing habits of adult males


1.2.2 Creating a frequency distribution

When data is collected by survey or by some other means we have, initially, a set of unorganized
raw data which, when viewed, would convey little information. A first step would
be to organize the set into a frequency distribution such that ‘like’ quantities are collected
and the frequency of occurrence of the quantities determined.

Example 1.4
Consider the set of data that represents the number of insurance claims processed each day by
an insurance firm over a period of 40 days: 3, 5, 9, 6, 4, 7, 8, 6, 2, 5, 10, 1, 6, 3, 6, 5, 4, 7, 8, 4, 5,
9, 4, 2, 7, 6, 1, 3, 5, 6, 2, 6, 4, 8, 3, 1, 7, 9, 7 and 2.
The frequency distribution can be used to show on how many days one claim was
processed, on how many days two claims were processed, and so on. The simplest way of doing
this is by creating a tally chart.
Write down the range of values from the lowest (1) to the highest (10) then go through the
data set recording each score in the table with a tally mark. It’s a good idea to cross out figures
in the data set as you go through it to prevent double counting. Table 1.5 illustrates the
frequency distribution for the data set given in Example 1.4.
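| Score | Tally | Frequency, f |
|---|---|---|
| 1 | III | 3 |
| 2 | IIII | 4 |
| 3 | IIII | 4 |
| 4 | IIIII | 5 |
| 5 | IIIII | 5 |
| 6 | IIIII II | 7 |
| 7 | IIIII | 5 |
| 8 | III | 3 |
| 9 | III | 3 |
| 10 | I | 1 |
| | | Σf = 40 |

Table 1.5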
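The tally procedure described for Example 1.4 can also be carried out programmatically. The sketch below is a minimal illustration in Python rather than Excel, using the standard library's `collections.Counter`; the variable names are assumptions, and the tally is printed as plain strokes.

```python
from collections import Counter

# Number of insurance claims processed on each of the 40 days (Example 1.4).
claims = [3, 5, 9, 6, 4, 7, 8, 6, 2, 5, 10, 1, 6, 3, 6, 5, 4, 7, 8, 4,
          5, 9, 4, 2, 7, 6, 1, 3, 5, 6, 2, 6, 4, 8, 3, 1, 7, 9, 7, 2]

counts = Counter(claims)

# List every score from the lowest to the highest, as in a tally chart.
frequency = {score: counts[score]
             for score in range(min(claims), max(claims) + 1)}

print("Score  Tally    Frequency, f")
for score, f in frequency.items():
    print(f"{score:>5}  {'I' * f:<8}{f:>3}")
print(f"Σf = {sum(frequency.values())}")
```

Checking that the frequencies sum to the number of observations (here 40) plays the same role as crossing out figures by hand: it confirms that no value has been missed or counted twice.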


Tally chart: A tally chart is a method of counting frequencies, according to some classification, in a set of data.
Grouped frequency distribution: Data arranged in intervals to show the frequency with which the possible values of a variable occur.

In this example there were relatively few cases. Had we extended the survey period to one
year, however, the number of claims per day might have ranged from 0 to 30. As our
aim is to summarize information, we may find it better to group ‘likes’ into classes to form
a grouped frequency distribution. The next example illustrates this point.
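Ahead of that example, the grouping step itself can be sketched on the 40 claim counts from Example 1.4. This is a minimal Python illustration, not the book's Excel procedure; the class width of 2 is an assumption chosen only to show the idea, and the variable names are likewise hypothetical.

```python
# Number of insurance claims processed on each of the 40 days (Example 1.4).
claims = [3, 5, 9, 6, 4, 7, 8, 6, 2, 5, 10, 1, 6, 3, 6, 5, 4, 7, 8, 4,
          5, 9, 4, 2, 7, 6, 1, 3, 5, 6, 2, 6, 4, 8, 3, 1, 7, 9, 7, 2]

class_width = 2  # assumed for illustration; not a value from the text
lowest = min(claims)

# Assign each value to the class (lower limit, upper limit) that contains it.
grouped = {}
for value in claims:
    lower = lowest + ((value - lowest) // class_width) * class_width
    upper = lower + class_width - 1
    grouped[(lower, upper)] = grouped.get((lower, upper), 0) + 1

for (lower, upper), f in sorted(grouped.items()):
    print(f"{lower:>2}-{upper:<2}  {f:>3}")
print(f"Σf = {sum(grouped.values())}")
```

With this width the ten individual scores collapse into five classes. A wider range of values, such as 0 to 30, would call for a larger class width to keep the table short.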

Example 1.5
Consider the following data set of miles travelled by 120 salesmen in one week
(Table 1.6).

