CHAPTER 5
Good visibility – pictorial presentation of data
Chapter objectives
This chapter will help you to:
■ illustrate qualitative data using pictographs, bar charts and
pie charts
■ portray quantitative data using histograms, cumulative frequency
charts and stem and leaf displays
■ present bivariate quantitative data using scatter diagrams
■ display time series data using time series charts
■ use the technology: data presentation in EXCEL, MINITAB
and SPSS
■ become acquainted with business uses of pictorial data
presentation
In the last chapter we looked at arranging and tabulating data, taking
the first steps in transforming raw data into information, bringing
meaning to the apparently meaningless. In this chapter we will continue
this theme by considering various ways of portraying data in visual form.
Used appropriately the diagrams and charts you will find here are very
effective means of communicating the patterns and meaning contained
in data, specifically the patterns and sequences in distributions.
136 Quantitative methods for business Chapter 5
These are techniques that are very common in business documents, so
being able to understand what they mean is an important skill.
There are many different diagrams and charts that can be used to do
this, so it is important to know when to use them. The techniques we
use depend on the type of data we want to present, in the same way as
the suitability of the methods of arranging data featured in the last
chapter depended on the type of data. Essentially, the simpler the data,
the simpler the presentational tools that can be used to represent them:
simple nominal data restricted to a few categories can be shown effectively
in the form of a simple bar chart whereas ratio data require the
more rigorous scaling of something like a histogram.
5.1 Displaying qualitative data
Section 4.4.1 of Chapter 4 covered the arrangement of qualitative data
in the form of summary tables. As well as being a useful way of displaying
qualitative data, compiling a summary table is an essential preliminary to
preparing a diagram to portray the data.
A diagram is usually a much more effective way of communicating
data because it is easier for the eye to digest than a table. This will be
important when you have to include data in a report or presentation
because you want your audience to focus their attention on what you
are saying. They can do that more easily if they don’t have to work too
hard to understand the form in which you have presented your data.
Displaying qualitative data is fairly simple if there are few categories
of the attribute or characteristic being investigated. With more categories,
the task can be simplified by merging categories.
There are three types of diagram that you can use to show qualitative
data: pictographs, pie charts and bar charts. We will deal with them in this
section in order of increasing sophistication.
5.1.1 Pictographs
A pictograph is little more than a simple extension of a summary table.
The categories of the attribute are listed as they are in a summary table,
and we use symbols to represent the number of things in each category.
The symbols you use in a pictograph should have a simple and direct
visual association with the data.
A pictograph like Figure 5.1 can be an effective way of presenting a simple
set of qualitative data. The symbols are a simple way of representing
the number in each category and have the extra advantage of emphasizing
the context of the data.
Pictographs do have some drawbacks that may put you off using
them. Unless you are artistically gifted and can create appropriate
images by hand, you will probably have to rely on computer software to
produce them for you. Creating a pictograph using a PC can be a laborious
process. Spreadsheet and statistical packages cannot produce a
pictograph for you directly from data, so symbols have to be grafted
alongside text in a word processing package.
If you do use pictographs you need to choose the symbols carefully.
They should be easy to associate with the context of the data and not
so elaborate that the symbols themselves become the focus of attention
rather than the data they are supposed to represent.
Example 5.1
The table below lists four racehorse trainers and the number of horses they trained that
won races at a horse race meeting.
Show this set of data in the form of a pictograph.
Trainer            Number of winners
Nadia Amazonka     5
Freddie Conn       3
Lavinia Loshart    1
Victor Sedlow      2
Figure 5.1
Pictograph of the number of winners by each trainer
[Pictograph: a row of symbols beside each trainer's name; each symbol represents 1 winner.]
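If no graphics software is to hand, a rough text version of a pictograph can be sketched in a few lines of Python. The example below uses the winners data from Example 5.1. The function name `pictograph_rows` and the asterisk used as the symbol are illustrative choices, not part of the original example.

```python
# Winners per trainer, copied from the table in Example 5.1.
winners = {
    "Nadia Amazonka": 5,
    "Freddie Conn": 3,
    "Lavinia Loshart": 1,
    "Victor Sedlow": 2,
}


def pictograph_rows(counts, symbol="*"):
    """Return one text row per category, with one symbol per unit counted."""
    # Pad every name to the length of the longest one so the symbols line up.
    width = max(len(name) for name in counts)
    return [f"{name:<{width}}  {symbol * n}" for name, n in counts.items()]


for row in pictograph_rows(winners):
    print(row)
print("Each symbol represents 1 winner.")
```

A plain character keeps the sketch portable; a symbol with a more direct visual association with horse racing would suit the advice given above rather better.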
You may occasionally see a pictograph in academic and business docu-
ments; you are more likely to see them on television and in newspapers.
The computer graphics software reporters and editors use is much
more sophisticated than any that you are likely to have access to during
your studies.
5.1.2 Pie charts
The second method of displaying qualitative data that we will look at is
the pie chart. Pie charts are used much more than pictographs in part
because they can be produced using widely available computer software.
A pie chart, like a pictograph, is designed to show how many things
belong to each category of an attribute. It does this by representing the
entire set of data as a circle or ‘pie’ and dividing the circle into segments
or ‘slices’. Each segment represents a category, and the size of the segment
reflects the number of things in the category.
Just about every spreadsheet or statistical package can produce a pie
chart like Figure 5.2 either from the original data or from a summary
table. You will find guidance on doing this using EXCEL, MINITAB and
SPSS in the final section of this chapter. These packages provide various
ways of enhancing pie charts: colour and shading patterns, 3D effects,
and detached or ‘exploded’ slices to emphasize a particular segment.
With practice you will be able to use these options in creating pie
charts, but don’t overdo it. Remember that the pattern of the data is
what you want to convey not your ability to use every possible gimmick
in the package.
Pie charts are so widely used and understood that it is very tempting
to regard them as an almost universal means of displaying qualitative
Example 5.2
The Steeralny Appliance Repair Service has depots in Crewe, Doncaster, Exeter and Frome.
The numbers of call-outs from each depot on one day are given in the following table:
Depot        Call-outs
Crewe         36  (26.1%)
Doncaster     57  (41.3%)
Exeter        28  (20.3%)
Frome         17  (12.3%)
Total        138 (100.0%)
data. In many cases they are appropriate and effective, but in some situations
they are not.
Because the role of a pie chart is to show how different components
make up a whole, you should not use one when you cannot or do not
want to show the whole. This may be because there are some values
missing from the data or perhaps there is an untidy ‘Other’ category
for data that do not fit in the main categories. In leaving out any data,
either for administrative or aesthetic reasons, you would not be presenting
the whole, which is exactly what pie charts are designed to do.
One reason that people find pie charts accessible is that the analogy
of cutting up a pie is quite an obvious one. As long as the pie chart
looks like a pie it works. However if you produce a pie chart that has
too many categories it can look more like a bicycle wheel than a pie,
and confuses rather than clarifies the data. If you have a lot of categories
to present, say more than ten, either merge some of the categories
in order to reduce the number of segments in the pie chart or
consider an alternative way of presenting your data.
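The size of each segment follows directly from the share of the total that its category accounts for. As a minimal sketch, the percentages quoted in Example 5.2 can be reproduced as shown below. The segment angles (the same share applied to the 360 degrees of the circle) are an addition for illustration, and the function name `pie_segments` is hypothetical.

```python
# Call-outs per depot, copied from the table in Example 5.2.
call_outs = {"Crewe": 36, "Doncaster": 57, "Exeter": 28, "Frome": 17}


def pie_segments(counts):
    """Return {category: (percentage of total, segment angle in degrees)}."""
    total = sum(counts.values())
    return {
        name: (100 * n / total, 360 * n / total)
        for name, n in counts.items()
    }


for depot, (percentage, angle) in pie_segments(call_outs).items():
    print(f"{depot:<10}{percentage:5.1f}%  {angle:6.1f} degrees")
```

Because every figure is a share of the same total, the percentages sum to 100 and the angles to 360, which is why a pie chart only makes sense when the whole set of data is shown.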
5.1.3 Bar charts
Another method of portraying qualitative data is the bar chart. Like
pie charts, bar charts are widely used, fairly simple to interpret, and
These data are presented in the form of a pie chart in Figure 5.2.
Figure 5.2
Number of call-outs by depot
[Pie chart segments: Crewe (36, 26.1%), Doncaster (57, 41.3%), Exeter (28, 20.3%), Frome (17, 12.3%)]
can be produced using spreadsheet and statistical packages. However
because there are several different varieties of bar charts, they are
more flexible tools. We can use bar charts to portray not only simple
categorizations but also two-way classifications.
The basic function of a bar chart is the same as that of a pie chart,
and for that matter a pictograph: to show the number or frequency
of things in each of a succession of categories of an attribute. It represents
the frequencies as a series of bars. The height of each bar is in
direct proportion to the frequency of the category; the taller the
bar that represents a category, the more things there are in that
category.
The type of bar chart shown in Figure 5.3 is called a simple bar chart
because it represents only one attribute. If we had two attributes to display
we might use a more sophisticated type of bar chart, either a component
bar chart or a cluster bar chart.
The type of bar chart shown in Figure 5.4 is called a component
bar chart because each bar is divided into parts or components. The
Example 5.3
Produce a bar chart to display the data from Example 5.2.
Figure 5.3
A bar chart of call-outs by depot
[Bar chart: horizontal axis Depot (Crewe, Doncaster, Exeter, Frome); vertical axis Number of call-outs, scaled 0 to 60]
alternative name for it, a stacked bar chart, reflects the way in which
the components of each bar are stacked on top of one another.
A component bar chart is particularly useful if you want to emphasize
the relative proportions of each category, in other words to show the
balance within the categories of one attribute (in the case of Example 5.4
the depot) between the categories of another attribute (in Example 5.4
the type of call-out).
Example 5.4
The call-outs data in Example 5.2 have been scrutinized to establish how many call-outs
from each depot concerned washing machines and how many concerned other appli-
ances. The numbers of the two call-out types from each depot are:
             Washing machine    Other appliance
Depot        call-outs          call-outs
Crewe        21                 15
Doncaster    44                 13
Exeter       13                 15
Frome        10                  7
Display these data as a component bar chart.
Figure 5.4
A component bar chart of call-outs by depot and appliance type
[Component bar chart: horizontal axis Depot (Crewe, Doncaster, Exeter, Frome); vertical axis Number of call-outs, scaled 0 to 60; components Washing machine and Other appliance]
If you want to focus on this balance exclusively and are not too concerned
about the absolute frequencies in each category you could use a component
bar chart in which each bar is subdivided in percentage terms.
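Subdividing each bar in percentage terms means dividing every component by the total for its own bar. The sketch below does this for the call-outs data in Example 5.4. The function name `component_percentages` is an illustrative choice.

```python
# (washing machine, other appliance) call-outs by depot, from Example 5.4.
call_outs = {
    "Crewe": (21, 15),
    "Doncaster": (44, 13),
    "Exeter": (13, 15),
    "Frome": (10, 7),
}


def component_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    return {
        row: tuple(100 * n / sum(counts) for n in counts)
        for row, counts in table.items()
    }


for depot, (washing, other) in component_percentages(call_outs).items():
    print(f"{depot:<10} washing machine {washing:5.1f}%   other {other:5.1f}%")
```

Every bar then has the same height, 100 per cent, so the eye compares only the balance between the components and not the size of the depots.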
If you want to emphasize the absolute differences between the categories
of one attribute (in Example 5.4 the depots) within the categories
of another (in Example 5.4 the types of call-out) you may find a
cluster bar chart more useful.
The type of bar chart shown in Figure 5.6 is called a cluster bar chart
because it uses a group or cluster of bars to show the composition of
each category of one characteristic by categories of a second characteristic.
For instance in Figure 5.6 the bars for Crewe show how the call-outs
from the Crewe depot are composed of call-outs for washing
machines and call-outs for other appliances.
At this point you may find it useful to try Review Questions 5.1 to 5.3
at the end of the chapter.
Example 5.5
Produce a component bar chart for the data in Example 5.4 in which the sections of the
bars represent the percentages of call-outs by appliance type.
Figure 5.5
A component bar chart of percentages of call-outs by depot and appliance type
[Component bar chart: horizontal axis Depot (Crewe, Doncaster, Exeter, Frome); vertical axis Percentage of call-outs, scaled 0 to 100; components Washing machine and Other appliance]
Example 5.6
Produce a cluster bar chart to portray the data from Example 5.4.
Figure 5.6
A cluster bar chart of call-outs by depot and appliance type
[Cluster bar chart: horizontal axis Depot (Crewe, Doncaster, Exeter, Frome); vertical axis Number of call-outs, scaled 0 to 50; bars for Washing machine and Other appliance]
Example 5.7
In Example 4.6 the numbers of free refills taken by 20 customers visiting the UREA
department store cafe were tabulated as follows:
Number of refills    Number of customers
0                    6
1                    7
2                    5
3                    2
5.2 Displaying quantitative data
Quantitative data are more sophisticated data than qualitative data and
therefore the methods used to present quantitative data are generally
more elaborate. The exception to this is where you want to represent
the simplest type of quantitative data, discrete quantitative variables
that have very few feasible values. You can treat the values in these data
as you would categories in qualitative data, using them to construct a
bar chart or pie chart.
5.2.1 Histograms
In general quantitative data consist of a rather larger variety of values
than the data portrayed in Figure 5.7. In section 4.4.2 of Chapter 4
we saw how grouped frequency distributions could be used to arrange
quantitative data. Here we will look at what is probably the most widely
used way of displaying data arranged in a grouped frequency distribution,
the histogram. This is a special type of bar chart where each bar or block
represents the frequency of a class of values rather than the frequency
of a single value. Because they are composed in this way histograms are
sometimes called block diagrams.
You can see that in Figure 5.8 there are no gaps between the blocks
in the histogram. The classes on which it is based start with ‘0–19’ then
‘20–39’ and so on. When plotting such classes you may be tempted to
leave gaps to reflect the fact that there is a numerical gap between the
end of the first class and the beginning of the next but this would be
Figure 5.7 shows these data in the form of a bar chart.
Figure 5.7
Number of customers by number of refills
[Bar chart: horizontal axis Number of refills (0, 1, 2, 3); vertical axis Number of customers, scaled 0 to 8]
wrong because the gap would be meaningless as it is simply not possible
to receive say 19.2 messages.
A histogram is a visual tool that displays the pattern or distribution
of observed values of a variable. The larger the size of the block that
represents a class, the greater the number of values that has occurred
in that class. Because the connection between the size of the blocks
and the frequencies of the classes is the key feature of the diagram the
scale along the vertical or ‘Y’ axis must start at zero, as in Figure 5.8.
Example 5.8
In Example 4.7 the numbers of email messages received by 22 office workers were
arranged in the following grouped frequency distribution.
Number of messages received    Frequency
0–19                           11
20–39                           4
40–59                           4
60–79                           2
80–99                           1
Total frequency                22
Show this grouped frequency distribution as a histogram.
Figure 5.8
Histogram of email messages received by 22 office workers
[Histogram: horizontal axis Messages, scaled 0 to 100; vertical axis Frequency]
As long as the classes in a grouped frequency distribution are of the
same width it is simply the heights of the blocks of the histogram that
reflect the frequencies of observed values in the classes. If the classes
have different widths it is important that the areas of the blocks are
proportional to the frequencies of the classes. The best way of ensuring
this is to represent the frequency density rather than the frequency of the
classes. The frequency density is the frequency of values in a class
divided by the width of the class. It expresses how densely the values
are packed in the class to which they belong.
Example 5.9
The table below shows the distribution of ages of customers opening accounts at a
bank:
Age range    Frequency
Under 15      0
15–24         5
25–44        20
45–64        18
Over 64       7
Calculate frequency density figures for the classes in the distribution and use them to
produce a histogram to portray the distribution.
In this distribution the classes have different widths, but an additional complication
is that the first class has no numerical beginning and the last class has no numerical
end; they are both ‘open-ended’ classes.
Before we can proceed we need to ‘close’ these classes. In the case of the first class
this is straightforward; we can simply express it as ‘0 to 14’. The last class poses more of
a problem. If we knew the age of the oldest person we could use that as the end of the
class, but as we don’t we have to select an arbitrary yet plausible end of the class. In
keeping with the style of some of the other classes we could use ‘65 to 84’.
The amended classes with their frequency densities are:
Age range    Frequency    Frequency density
0–14          0           0/15 = 0.00
15–24         5           5/10 = 0.50
25–44        20           20/20 = 1.00
45–64        18           18/20 = 0.90
65–84         7           7/20 = 0.35
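The frequency density calculation for Example 5.9 can be sketched in Python as follows. The class widths are taken as the number of whole-year ages each closed class covers (upper limit minus lower limit plus one), which reproduces the divisors 15, 10 and 20 used in the example. The function name `frequency_densities` is illustrative.

```python
# (lower limit, upper limit, frequency) for the closed classes in Example 5.9.
classes = [(0, 14, 0), (15, 24, 5), (25, 44, 20), (45, 64, 18), (65, 84, 7)]


def class_width(lower, upper):
    """Width of a class of whole-year ages, e.g. 15-24 covers ten ages."""
    return upper - lower + 1


def frequency_densities(classes):
    """Frequency divided by class width, for each class in turn."""
    return [freq / class_width(lower, upper) for lower, upper, freq in classes]


for (lower, upper, freq), density in zip(classes, frequency_densities(classes)):
    print(f"{lower:>2}-{upper:<2}  frequency {freq:>2}  density {density:.2f}")
```

Multiplying each density by its class width gives back the frequency, which is the sense in which the area of each block, not its height, represents the frequency.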
Using frequency densities in Figure 5.9 means that the height of the
block representing the ‘15 to 24’ class is increased to reflect the fact that
it is narrower than the other classes. Despite having only one quarter of
the frequency of the ‘25–44’ class the height of the block representing the
‘15–24’ class is half the height of the block representing the ‘25–44’ class.
The class is half the width of the classes to the right of it, so to keep the area
in proportion to the frequency the height of the block has to be doubled.
In Figure 5.9 there are no gaps between the classes, although it might
be tempting to insert them as each class finishes on the number before
the next class begins. This would be wrong because, for instance, people
are considered to be 14 years old right up until the day before their
fifteenth birthday.
The pattern of the distribution shown in Figure 5.9 is broadly balanced
or symmetrical. There are two large blocks in the middle and
smaller blocks to the left and right of the ‘bulge’. From this we would
conclude that the majority of observed values occur towards the middle
of the age range, with only a few relatively young and old customers.
In contrast, if you look back at Figure 5.8, the histogram showing the
numbers of email messages received, you will see an asymmetrical or
skewed pattern. The block on the left-hand side is the largest and the size
of the blocks gets smaller to the right of it. It could be more accurately
described as right or positively skewed. From Figure 5.8 we can conclude
Figure 5.9
Histogram of ages of customers opening bank accounts
[Histogram: horizontal axis Ages, marked at 0, 15, 25, 45, 65 and 85; vertical axis Frequency density, scaled 0.00 to 1.25]
that the majority of office workers receive a relatively modest number
of email messages and only a few office workers receive large numbers
of email messages.
You may come across distributions that are left or negatively skewed.
In these the classes on the left-hand side have smaller frequencies and
those on the right-hand side have larger frequencies.
In Figure 5.10 there are no gaps between the classes because there
are no numerical gaps between the classes; they are seamless.
Example 5.10
Raketa Airlines say they allow their passengers to take up to 5 kg of baggage with them
into the cabin. The weights of cabin baggage taken onto one flight were recorded and
the following grouped frequency distribution compiled from the data:
Weight of cabin baggage (kg)    Number of passengers
0 and under 1                    2
1 and under 2                    3
2 and under 3                    8
3 and under 4                   11
4 and under 5                   20
5 and under 6                   18
Portray this distribution in the form of a histogram.
Figure 5.10
Histogram of weights of cabin baggage
[Histogram: horizontal axis Baggage weight (kg), scaled 0 to 6; vertical axis Frequency]
5.2.2 Cumulative frequency graphs
An alternative method of presenting data arranged in a grouped frequency
distribution is the cumulative frequency graph. This diagram
shows the way in which the data accumulate through the distribution
from the first to the last class in the grouped frequency distribution. It
uses the same horizontal axis as you would use to construct a histogram
to present the same data, but you have to make sure that the vertical axis,
which must begin at zero, extends far enough to cover the total frequency
of the distribution.
To plot a cumulative frequency graph you must begin by working
out the cumulative frequency of each class in the grouped frequency
distribution. The cumulative frequency of a class is the frequency of the
class itself added to the cumulative, or combined frequency of all the
preceding classes.
The cumulative frequency of the first class is simply the frequency of
the first class because it has no preceding classes. The cumulative frequency
of the second class is the frequency of the second class added to
the frequency of the first class. The cumulative frequency of the third
class is the frequency of the third class added to the cumulative frequency
of the second class, and so on.
Note that the cumulative frequency of the last class in the distribution
in Example 5.11 is 22, the total frequency of values in the distribution.
This should always be the case. Once we have included the values in
the final class in the cumulative total we should have included every
value in the distribution.
Example 5.11
Find the cumulative frequencies of each class in the grouped frequency distribution in
Example 5.8.
Number of messages received    Frequency    Cumulative frequency
0–19                           11           11
20–39                           4           15
40–59                           4           19
60–79                           2           21
80–99                           1           22
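The running total described above is simple to compute. As a minimal sketch, the cumulative frequencies in Example 5.11 can be obtained with `accumulate` from Python's standard `itertools` module; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies from the grouped frequency distribution in Example 5.8.
classes = ["0-19", "20-39", "40-59", "60-79", "80-99"]
frequencies = [11, 4, 4, 2, 1]

# Each cumulative frequency is the class frequency added to the
# cumulative frequency of the preceding class.
cumulative = list(accumulate(frequencies))

for label, freq, cum in zip(classes, frequencies, cumulative):
    print(f"{label:<6} {freq:>3} {cum:>3}")

# The last cumulative frequency should equal the total frequency.
assert cumulative[-1] == sum(frequencies)
```

The final check mirrors the point made in the text: once the last class has been included, every value in the distribution has been counted.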
The cumulative frequency figures represent the number of values
that have been accumulated by the end of a class. A cumulative frequency
graph is a series of single points each of which represents the
cumulative frequency of its class plotted above the very end of its class.
There should be one plotted point for every class in the distribution.
The final step is to connect the points with straight lines.
If you look carefully at Figure 5.11 you will see that the line begins at
zero on the horizontal axis, which is the beginning of the first class,
and zero on the vertical axis. This is a logical starting point. It signifies
that no values have been accumulated before the beginning of the first
class. The line then climbs steeply before flattening off. The steep
climb represents the concentration of values in the first class, which
contains half of the values in the distribution. The flatter sections to
the right represent the very few values in the later classes.
The line in Figure 5.12 starts with a gentle slope then rises more
steeply before finishing with a gentle slope. This signifies that the first
classes contain few values, the middle classes contain many values, and
the final classes contain few values. This is a symmetrical distribution,
whereas the distribution shown in Figure 5.11 is a skewed distribution.
Example 5.12
Plot a cumulative frequency graph for the data in Example 5.8.
Figure 5.11
Cumulative frequency graph of email messages received by 22 office workers
[Line graph: horizontal axis Messages, scaled 0 to 100; vertical axis Cumulative frequency]
It may be more convenient to plot a cumulative relative frequency
graph, in which the points represent the proportions of the total number
of values that occur in and prior to each class. This is particularly useful
if the total number of values in the distribution is an awkward number.
You will find further discussion of cumulative frequency graphs in the
next chapter because they offer an easy way of finding the approximate
values of medians, quartiles and other order statistics.
At this point you may find it useful to try Review Questions 5.4 to 5.9
at the end of the chapter.
Example 5.13
Plot a cumulative frequency graph for the distribution of contents of bottles of ‘Nogat’
nail polish in Example 4.8.
Nail polish (ml)    Frequency    Cumulative frequency
9.80–9.89            3            3
9.90–9.99            7           10
10.00–10.09         11           21
10.10–10.19          5           26
10.20–10.29          3           29
10.30–10.39          1           30
Figure 5.12
Cumulative frequency graph of contents of nail varnish bottles
[Line graph: horizontal axis Volume (ml), scaled 9.8 to 10.4; vertical axis Cumulative frequency, scaled 0 to 30]
Example 5.14
The size of cash payments made by 119 customers at a petrol station is summarized in
the following grouped frequency distribution. Plot a cumulative relative frequency graph.
5.2.3 Stem and leaf displays
Histograms and cumulative frequency graphs are effective and widely
used means of presenting quantitative data. Until fairly recently they
could be described as unrivalled. However, there is an alternative way of
presenting quantitative data in visual form, the stem and leaf display.
This is one of a number of newer techniques known collectively as
Exploratory Data Analysis (EDA). If you want to know more about EDA,
the books by Tukey (1977), and Velleman and Hoaglin (1981) provide
a thorough introduction.
Payment (£)    Frequency    Relative frequency    Cumulative relative frequency
5.00–9.99      15           15/119 = 0.126        0.126
10.00–14.99    37           37/119 = 0.311        0.437
15.00–19.99    41           41/119 = 0.344        0.781
20.00–24.99    22           22/119 = 0.185        0.966
25.00–29.99     4           4/119 = 0.034         1.000
Figure 5.13
Cumulative relative frequency graph of cash payments at a petrol station
[Line graph: horizontal axis Payment (£), scaled 10 to 30; vertical axis Cumulative relative frequency, scaled 0.0 to 1.0]
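The cumulative relative frequencies for the cash payments data in Example 5.14 can be sketched as below. This version accumulates the frequencies first and divides by the total afterwards, which is an assumption about method rather than the procedure used in the example; figures obtained this way can differ in the third decimal place from those found by adding relative frequencies that have already been rounded.

```python
from itertools import accumulate

# Class frequencies for the cash payments in Example 5.14.
frequencies = [15, 37, 41, 22, 4]
total = sum(frequencies)  # 119 customers

# Proportion of all values falling in each class.
relative = [freq / total for freq in frequencies]

# Proportion of all values falling in or before each class.
cumulative_relative = [cum / total for cum in accumulate(frequencies)]

for rel, cum_rel in zip(relative, cumulative_relative):
    print(f"{rel:.3f}  {cum_rel:.3f}")
```

Whatever the total frequency, the last cumulative relative frequency is exactly 1, so the vertical axis of the graph always runs from 0 to 1.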
The role of a stem and leaf display is the same as the role of a
histogram, namely to show the pattern of a distribution. But unlike a
histogram, a stem and leaf display is constructed using the actual data
as building blocks, so as well as showing the pattern of a distribution it is
also a list of the observations that make up that distribution. It is a very
useful tool for making an initial investigation of a set of data as it portrays
the shape of the distribution, identifies unusual observations and
provides the basis for judging the suitability of different types of averages.
The basis of a stem and leaf display is the structure of numbers, the
fact that a number is made up of units, tens, hundreds and so on. For
instance the number 45 is composed of two digits, the 4 tens and the
5 units. Using the analogy of a plant, the stem of the number 45 is the
number on the left-hand side, 4 (the number of tens) and the leaf is
the number on the right hand side, 5 (the number of units). A stem on
a plant can have different leaves; in the same way the numerical stem 4
can have different numerical leaves. The number 48 has the same stem
as the number 45, but a different leaf, 8.
To produce a stem and leaf display for a set of data we have to list the
set of stem digits that appear in the data and then record each observation
by putting its leaf digit alongside its stem digit. When we have done
this for every observed value in the set of data the result is a series of ‘stem
lines’ each of which consists of a stem digit and the leaf digits of all the
observations sharing that particular stem. The final stage in the process
is to arrange the leaf digits on each stem line in order of magnitude.
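The procedure just described can be sketched in Python using the restaurant seating capacities from Example 5.15. This is an illustrative sketch for two-digit whole numbers only, and the function name `stem_and_leaf` is hypothetical; statistical packages have their own built-in routines.

```python
from collections import defaultdict

# Seating capacities of the 27 East Anglia restaurants (Example 5.15).
capacities = [
    53, 38, 59, 62, 51, 51, 28, 45, 61, 39,
    59, 50, 48, 74, 52, 41, 73, 68, 47, 48,
    52, 56, 52, 55, 47, 52, 41,
]


def stem_and_leaf(values):
    """Return (stem, sorted leaves) pairs for two-digit whole numbers."""
    lines = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)  # tens digit, units digit
        lines[stem].append(leaf)
    # Stems in ascending order, leaves on each stem line in order of magnitude.
    return [(stem, sorted(lines[stem])) for stem in sorted(lines)]


for stem, leaves in stem_and_leaf(capacities):
    print(stem, "|", "".join(str(leaf) for leaf in leaves))
print("Leaf unit = 1")
```

Note that a stem digit with no observations would simply be omitted by this sketch; in a hand-drawn display you would normally keep the empty stem line so that gaps in the distribution remain visible.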
The message ‘leaf unit = 1’ on the final version of the stem and leaf
display in Example 5.15 has the same role as the scale on the horizontal
or ‘X’ axis of a histogram, in that it specifies the order of magnitude
Example 5.15
Musor Burgers operate fast food restaurants. The seating capacities of the 27 restaurants
they operate in East Anglia are:
53 38 59 62 51 51 28 45 61 39
59 50 48 74 52 41 73 68 47 48
52 56 52 55 47 52 41
Produce a stem and leaf display for these data.
Every value consists of two digits: tens and units. The tens are the stem digits and the
units are the leaf digits. The lowest value is 28 and the highest is 74 so the first stem line
will be for the stem digit 2, and the last one for the stem digit 7. The first stem line will
have a leaf digit for the lowest value, the 8 from 28. The second stem line, for the stem
digit 3, will have two leaf digits, the 8 from 38 and the 9 from 39, and so on.
of the data. Without this message someone might look at the display,
see that the highest value in the distribution has the stem digit 7 and
the leaf digit 4, but be unclear whether the value is 0.74, 7.4, 74, 740,
7400, or any other number with a 7 followed by a 4. It is only when you
know that the leaf digits are units in this display that you can be sure
the stem digit 7 and the leaf digit 4 represent the number 74.
Although a stem and leaf display may appear a little strange it is a
tool that is well worth learning to use because it has two advantages
over a histogram: particular values can be highlighted and two distributions
can be shown in one display. A histogram simply cannot do the
former because it consists of blocks rather than data. It is possible to
Example 5.16
Five of the Musor restaurants whose seating capacities are given in Example 5.15 are in
city centre locations. The seating capacities for these five restaurants are shown in bold
type below:
53 38 59 62 51 51 28 45 61 39
59 50 48 74 52 41 73 68 47 48
52 56 52 55 47 52 41
We can embolden these values in the stem and leaf display.
Stem  Leaves
2     8
3     89
4     8711578
5     392069251212
6     281
7     43
This is a stem and leaf display, but it is not yet finished. We need to rearrange the leaf
digits so they are listed from the smallest to the largest.
Stem  Leaves
2     8
3     89
4     1157788
5     011222235699
6     128
7     34
Leaf unit = 1
plot a histogram showing two distributions but the result is cumbersome
and you would do better to plot two separate histograms.
To show two distributions in one stem and leaf display you simply list
the leaf digits for one distribution to the left of the list of stem digits and
the leaf digits for the other distribution to the right of the stem digits.
By looking at the display in Example 5.17 you can see that in general
the restaurants in the Bristol area have larger seating capacities than
those in East Anglia.
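A two-sided display of this kind can be sketched as follows, using the East Anglia data from Example 5.15 and the Bristol data from Example 5.17. The function names are illustrative. The left-hand leaves are listed in reverse so that, on both sides, the smallest leaves sit nearest the stem.

```python
# Seating capacities from Examples 5.15 (East Anglia) and 5.17 (Bristol).
east_anglia = [
    53, 38, 59, 62, 51, 51, 28, 45, 61, 39,
    59, 50, 48, 74, 52, 41, 73, 68, 47, 48,
    52, 56, 52, 55, 47, 52, 41,
]
bristol = [
    61, 54, 73, 78, 59, 49, 51, 58, 75, 67,
    60, 87, 61, 70, 52, 56, 86, 91, 55, 76,
    69, 82,
]


def leaves_by_stem(values):
    """Map each tens digit to its units digits, in ascending order."""
    lines = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        lines.setdefault(stem, []).append(str(leaf))
    return lines


def back_to_back(left, right):
    """Rows of 'left leaves | stem | right leaves' over the shared stems."""
    left_lines, right_lines = leaves_by_stem(left), leaves_by_stem(right)
    stems = set(left_lines) | set(right_lines)
    width = max(len(leaves) for leaves in left_lines.values())
    rows = []
    for stem in range(min(stems), max(stems) + 1):
        left_leaves = "".join(reversed(left_lines.get(stem, [])))
        right_leaves = "".join(right_lines.get(stem, []))
        rows.append(f"{left_leaves:>{width}} | {stem} | {right_leaves}")
    return rows


print("\n".join(back_to_back(east_anglia, bristol)))
print("Leaf unit = 1")
```

Sharing one column of stems is what makes the comparison possible: the two sets of leaves are plotted against the same scale, much as two histograms would be if they shared an axis.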
Stem  Leaves
2     8
3     89
4     1157788
5     0 1122223 5699
6     12 8
7     3 4
Leaf unit = 1
You can see from this display that the city centre restaurants are among
those with larger seating capacities.
Example 5.17
The seating capacities for the 22 Musor restaurants in the Bristol area are:
61 54 73 78 59 49 51 58 75 67
60 87 61 70 52 56 86 91 55 76
69 82
Produce a stem and leaf display to show these data alongside the seating capacity data
from the Musor restaurants in East Anglia given in Example 5.15.
 East Anglia  Stem  Bristol
           8  2
          98  3
     8877511  4     9
996532222110  5     1245689
         821  6     01179
          43  7     03568
              8     267
              9     1
Leaf unit = 1
On the left-hand side of the stem and leaf display in Example 5.17
stem line 5 is heavily loaded with leaf digits. You can modify stem and
leaf displays to reduce long rows of leaf digits by stretching the stems.
In Example 5.18 the stem and leaf display contains two stem lines for
each stem digit, except 2 and 7. The first stem line for a stem digit contains
leaf digits 0 to 4 inclusive. The second stem line contains leaf digits
5 to 9 inclusive. There is only one stem line for stem digit 2 because the
stem digit 2 has no leaf digits less than 5. There is only one stem line
for stem digit 7 because the stem digit 7 has no leaf digits more than 4.
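Stretching the stems amounts to splitting each stem line at the leaf digit 5. A minimal sketch, again using the East Anglia seating capacities, is given below; the function name `stretched_stem_and_leaf` is illustrative. Stem lines before the first observation and after the last are left out, as they are for stem digits 2 and 7 in the example.

```python
# Seating capacities of the 27 East Anglia restaurants (Example 5.15).
east_anglia = [
    53, 38, 59, 62, 51, 51, 28, 45, 61, 39,
    59, 50, 48, 74, 52, 41, 73, 68, 47, 48,
    52, 56, 52, 55, 47, 52, 41,
]


def stretched_stem_and_leaf(values):
    """Two stem lines per stem digit: leaves 0 to 4, then leaves 5 to 9."""
    lines = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        half = 0 if leaf <= 4 else 1  # 0 = lower half, 1 = upper half
        lines.setdefault((stem, half), []).append(leaf)
    first, last = min(lines), max(lines)
    rows = []
    for stem in range(first[0], last[0] + 1):
        for half in (0, 1):
            # Keep empty lines inside the range, drop those outside it.
            if first <= (stem, half) <= last:
                rows.append((stem, lines.get((stem, half), [])))
    return rows


for stem, leaves in stretched_stem_and_leaf(east_anglia):
    print(stem, "|", "".join(str(leaf) for leaf in leaves))
print("Leaf unit = 1")
```

Empty stem lines inside the range are kept deliberately, because a gap in the display is itself information about the shape of the distribution.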
The data we have used so far to construct stem and leaf displays have
consisted of two-digit numbers, which makes it fairly easy: the left-hand
digit is the stem and the right-hand digit is the leaf. But what if you
have to deal with more complex figures? In the same way as we can
experiment with different classes to produce a suitable histogram, we
can try rounding, dividing stem lines, having longer stems or longer
leaves to produce a suitable stem and leaf display. Just as we can have
too many or too few classes in a histogram, we can have too many or
too few stem lines in a stem and leaf display. The important thing is to
construct the display so that it is an effective way of presenting the data
we have.
You can see by looking at the stem and leaf display in Example 5.19
that there are many cheaper but few expensive quotations. This is an
example of a positively skewed distribution.
Example 5.18
Produce a stem and leaf display to show the seating capacities of the Musor restaurants
in East Anglia. Use two stems for each stem digit.
Stem  Leaves
2     8
3
3     89
4     11
4     778
5     01122223
5     5699
6     12
6     8
7     34
Leaf unit = 1
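The stretching rule can also be sketched in Python. The example below is an illustration under stated assumptions, not a prescribed method: the function name is invented, and it is applied to the Bristol seating capacities from Example 5.17 because that raw list is printed in full. Leaf digits 0 to 4 go on the first line for a stem digit and leaf digits 5 to 9 on the second; empty lines are kept only when they fall between the lowest and highest occupied lines.

```python
# Bristol capacities as listed in Example 5.17.
bristol = [61, 54, 73, 78, 59, 49, 51, 58, 75, 67,
           60, 87, 61, 70, 52, 56, 86, 91, 55, 76, 69, 82]


def stretched_stem_and_leaf(values):
    """Return (stem, leaves) rows with up to two lines per stem digit."""
    rows = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        half = 0 if leaf <= 4 else 1  # 0: leaf digits 0-4, 1: leaf digits 5-9
        rows.setdefault((stem, half), []).append(leaf)

    first, last = min(rows), max(rows)  # lowest and highest occupied lines
    result = []
    for stem in range(first[0], last[0] + 1):
        for half in (0, 1):
            key = (stem, half)
            # Skip empty lines before the first or after the last occupied line.
            if first <= key <= last:
                leaves = "".join(str(d) for d in rows.get(key, []))
                result.append((stem, leaves))
    return result


stretched = stretched_stem_and_leaf(bristol)
for stem, leaves in stretched:
    print(stem, leaves)
```

For the Bristol data this gives one line for stem digit 4 (no leaf digits below 5) and one line for stem digit 9 (no leaf digits above 4), with two lines for each stem digit in between.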
Example 5.19
The prices in £s of 16 different motor insurance quotations received by a motorist were:
448 423 284 377 502 459 278 268
374 344 256 228 380 286 219 352
Produce a stem and leaf display to show these data.
There are two ways to approach this task. You could try longer, two-digit stems and
one-digit leaves, so for instance the value 448 will have a stem of 44 and a leaf of 8. This
means that your list of stem lines will start with 21 (the stem of 219, the lowest value)
and finish with 50 (the stem of 502, the highest value). You would have a very long list
of stem lines (30) with only 16 leaf digits shared between them.
Alternatively you might try one-digit stems and longer, two-digit leaves, so 448 will
have a stem of 4 and a leaf of 48. This is much more promising in this case.
Stem  Leaves
2     19 28 56 68 78 84 86
3     44 52 74 77 80
4     23 48 59
5     02
Leaf unit = 1.0
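You can compare the two approaches described in this example with a short Python sketch. The function name and its `leaf_digits` parameter are invented for illustration; the prices are those listed above.

```python
# Motor insurance quotations (in pounds) from Example 5.19.
prices = [448, 423, 284, 377, 502, 459, 278, 268,
          374, 344, 256, 228, 380, 286, 219, 352]


def stem_and_leaf(values, leaf_digits):
    """Split each value into a stem and a leaf of `leaf_digits` digits."""
    divisor = 10 ** leaf_digits
    rows = {}
    for value in sorted(values):
        stem, leaf = divmod(value, divisor)
        # Zero-pad the leaf so that, for example, 502 gives the leaf "02".
        rows.setdefault(stem, []).append(f"{leaf:0{leaf_digits}d}")
    # Include every stem line from the lowest to the highest, even if empty.
    return {stem: rows.get(stem, []) for stem in range(min(rows), max(rows) + 1)}


long_stems = stem_and_leaf(prices, leaf_digits=1)   # stems 21 to 50
short_stems = stem_and_leaf(prices, leaf_digits=2)  # stems 2 to 5

print(len(long_stems), "stem lines with two-digit stems")
for stem, leaves in short_stems.items():
    print(stem, " ".join(leaves))
```

The first approach produces 30 stem lines for only 16 values, whereas the second produces the four-line display shown above.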
Although a stem and leaf display is essentially an alternative to a histo-
gram, it can be used instead of a grouped frequency distribution as a
way of sorting the data before plotting a histogram.
At this point you may find it useful to try Review Questions 5.10 to
5.13 at the end of the chapter.
5.3 Presenting bivariate quantitative data
The techniques for presenting quantitative data that you have met so
far in this chapter have one thing in common: they are all designed to
portray the observed values of one variable. They are sometimes called
tools of univariate analysis.
But what if you want to present values of two variables in one diagram
in order to illustrate a connection (or maybe a lack of connection)
between them? In that case you need another type of graph, the scatter
diagram, which is a tool of bivariate, that is, two-variable, analysis. The
word scatter is used because the intention of the diagram is to show
how the observed values of one variable are distributed or scattered in
relation to the observed values of another variable.
A set of bivariate data consists of two sets of observed values, a pair of
values for each item or thing or person that has been studied. A scatter
diagram is constructed by plotting a point for each pair of observed val-
ues in the set of data. One value in the pair is plotted against the hori-
zontal axis, the other against the vertical axis. The result is a scatter of
points that will form some pattern if there is a connection between the
variables.
Typically when we plot a scatter diagram we do so because we have
a specific notion about the possible connection between the two vari-
ables. If we believe that one variable depends in some way on the other
variable we refer to one of the variables as the dependent variable
whose values we think depend on the values of the other, which is
called the independent variable. The dependent variable is known as
the Y variable and its observed values are plotted against the Y, or verti-
cal, axis. The independent variable is known as the X variable and its
values are plotted against the X, or horizontal, axis.
Example 5.20
Produce a histogram to portray the data in Example 5.19.
To do this we can use each stem line in the stem and leaf display as a class, which will
be represented as a block in the histogram. The first stem line will be the class ‘200 and
under 300’ and so on.
[Histogram: Price on the horizontal axis, scaled from 200 to 600; Frequency on the vertical axis, scaled from 0 to 7]
Figure 5.14
Histogram of prices of motor insurance quotations
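The class frequencies behind this histogram can be checked with a few lines of Python. This is a minimal sketch that assumes classes 100 pounds wide starting at 200, as described in Example 5.20; the variable names are illustrative, and the rows of `#` characters are only a rough text stand-in for the blocks of the histogram.

```python
# Motor insurance quotations (in pounds) from Example 5.19.
prices = [448, 423, 284, 377, 502, 459, 278, 268,
          374, 344, 256, 228, 380, 286, 219, 352]

class_width = 100  # each stem line becomes one class, e.g. '200 and under 300'

counts = {}
for price in prices:
    lower = (price // class_width) * class_width  # e.g. 448 falls in the class starting at 400
    counts[lower] = counts.get(lower, 0) + 1

for lower in sorted(counts):
    print(f"{lower} and under {lower + class_width}: {'#' * counts[lower]}")
```

The frequencies match the number of leaves on each stem line of the display in Example 5.19.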
In Figure 5.15 you can see 13 points in the diagram, one for each of the
13 days in the set of data. Each point represents both the temperature
and the amount of barbecue fuel sold for a particular day. The position
of the point along the vertical or Y axis tells you the quantity sold on
the day and the position of the point along the horizontal or X axis
tells you the temperature on the day, for instance the point on the bot-
tom left of the diagram represents the day when the temperature was
15 degrees Celsius and the quantity sold was 10 kg.
The diagram shows us that there appears to be a clear connection
between the temperature and the quantity of barbecue fuel sold. The
Example 5.21
The maximum daytime temperatures (in degrees Celsius) and the quantities of barbecue
fuel (in kg) sold at a service station on 13 days were:
Temperature (°C)   15  17  18  18  19  20  21  22   24   25   27   27   28
Fuel sold (kg)     10  15  25  20  45  50  40  85  130  135  170  195  180
Decide which is the dependent (Y) variable and which is the independent (X) variable
then produce a scatter diagram to portray this set of data.
Fuel sold is the dependent (Y) variable because logically it depends on the temperature,
which is the independent (X) variable.
[Scatter diagram: Temperature on the horizontal axis, scaled from 15 to 25; Fuel sold (kg) on the vertical axis, scaled from 0 to 200]
Figure 5.15
A scatter diagram of temperature and barbecue fuel sold
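A scatter diagram of this kind can also be produced outside the packages covered later in the chapter. The Python sketch below is one possible approach and makes two assumptions: that the matplotlib plotting library is installed, and that a file name such as `scatter.png` is acceptable for the output. The function name `plot_scatter` is invented for illustration.

```python
# Data from Example 5.21: one (temperature, fuel sold) pair per day.
temperature = [15, 17, 18, 18, 19, 20, 21, 22, 24, 25, 27, 27, 28]
fuel_sold = [10, 15, 25, 20, 45, 50, 40, 85, 130, 135, 170, 195, 180]

# Pair the values so that each point carries both observations for a day.
points = list(zip(temperature, fuel_sold))


def plot_scatter(points, path="scatter.png"):
    """Plot X (temperature) against Y (fuel sold) and save the diagram.

    Assumes matplotlib is installed; it is imported here so that the
    pairing above works without it.
    """
    import matplotlib
    matplotlib.use("Agg")  # draw to a file rather than a window
    import matplotlib.pyplot as plt

    xs, ys = zip(*points)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel("Temperature (°C)")  # independent (X) variable
    ax.set_ylabel("Fuel sold (kg)")    # dependent (Y) variable
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `plot_scatter(points)` writes the diagram to a file. The independent variable goes on the horizontal axis and the dependent variable on the vertical axis, as in Figure 5.15.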