Descriptive Statistics:
Numerical Methods
4.1 Measures of Central Location
❧ The central data point reflects the locations of all the actual data points.
❧ How?
With one data point, the central location is clearly at the point itself.
With two data points, the central location should fall in the middle between them (in order to reflect the location of both of them).
If the third data point appears in the center, the measure of central location will remain in the center.
But if the third data point appears on the left-hand side of the midrange, it should “pull” the central location to the left.
As more and more data points are added, the
central location moves (left and right) as required
in order to reflect the effects of all the points.
The Arithmetic Mean (average)
• This is the most popular and useful measure of central location
$$\text{Mean} = \frac{\text{Sum of the measurements}}{\text{Number of measurements}}$$
Sample mean (n is the sample size):
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Population mean (N is the population size):
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
Example 1
Find the mean rate of return for a portfolio equally invested in five stocks having the following annual rates of return:
11.2%, 8.07%, 5.55%, 13.7%, 21%.
Solution
$$\bar{x} = \frac{11.2 + 8.07 + 5.55 + 13.7 + 21}{5} = \frac{59.52}{5} = 11.904\%$$
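As a quick check of the arithmetic in Example 1, the mean can be computed in a few lines of Python. This is only an illustrative sketch; the variable names are made up here and are not part of the course material.

```python
# Annual rates of return (in percent) for the five stocks in Example 1.
returns = [11.2, 8.07, 5.55, 13.7, 21.0]

# Sample mean: sum of the measurements divided by the number of measurements.
mean_return = sum(returns) / len(returns)

print(f"Mean rate of return: {mean_return:.3f}%")
```

The five returns sum to 59.52, so the mean is 59.52 / 5 = 11.904%.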
3. Geometric mean
• A specialized measure, used to find the average growth rate, or rate of change of a variable over time
• Example:
The number of students attending the music class last Tuesday was 160. This Tuesday, the number is expected to increase by 15%. How many students are likely to attend this Tuesday?
Here the number of students is 160 and the growth rate (rate of change) is 15%, or 0.15.
The number of students likely to attend this Tuesday
= 160 × (100 + 15)%
= 160 × (1 + 0.15) = 184 (students)
• Formula:
- Step 1: Express each rate of change (R) as a growth factor (1 + R)
- Step 2: Calculate the geometric mean using the formula:
(i) Simple geometric mean: applied when each rate of change appears only once
$$R_g = \sqrt[n]{(1+R_1)(1+R_2)\cdots(1+R_n)} - 1$$
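The two steps for the simple geometric mean can be sketched in Python as follows. The function name `geometric_mean_rate` is a hypothetical helper chosen for this illustration, and rates are assumed to be given as decimals (0.15 for 15%).

```python
import math


def geometric_mean_rate(rates):
    """Average rate of change for a list of per-period rates (as decimals)."""
    if not rates:
        raise ValueError("at least one rate is required")
    # Step 1: express each rate R as a growth factor (1 + R).
    factors = [1.0 + r for r in rates]
    # Step 2: take the n-th root of the product of the factors, then subtract 1.
    n = len(factors)
    return math.prod(factors) ** (1.0 / n) - 1.0


# A quantity that doubles (+100%) and then halves (-50%) ends where it started.
print(geometric_mean_rate([1.0, -0.5]))
```

In this usage example the geometric mean rate is 0, which matches the fact that the quantity is unchanged after two periods, whereas the arithmetic mean of the two rates would be 25%.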
- Step 2: Calculate the geometric mean
(ii) Weighted geometric mean: applied when rates of change appear repeatedly (rate R_i appears f_i times)
$$R_g = \sqrt[n]{(1+R_1)^{f_1}(1+R_2)^{f_2}\cdots(1+R_k)^{f_k}} - 1 = \sqrt[n]{\prod_{i=1}^{k}(1+R_i)^{f_i}} - 1, \qquad n = \sum_{i=1}^{k} f_i$$
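A minimal Python sketch of the weighted formula is given here, assuming the input is a dictionary that maps each distinct rate (as a decimal) to the number of periods in which it occurs. The function name and the input layout are assumptions made for this illustration.

```python
import math


def weighted_geometric_mean_rate(rate_counts):
    """Weighted geometric mean rate.

    rate_counts maps each distinct rate R_i (as a decimal) to its frequency f_i.
    """
    n = sum(rate_counts.values())  # n is the sum of the frequencies f_i
    if n <= 0:
        raise ValueError("total frequency must be positive")
    # Product of (1 + R_i) raised to the power f_i over all distinct rates.
    product = math.prod((1.0 + r) ** f for r, f in rate_counts.items())
    return product ** (1.0 / n) - 1.0


# A rate of 10% that occurs in two periods averages to 10%.
print(round(weighted_geometric_mean_rate({0.10: 2}), 6))
```

When every frequency equals 1, this reduces to the simple geometric mean.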
Example
The number of employees in a small bank over the period 2000-2006 is presented in the table below:
Year              2000  2001  2002  2003  2004  2005  2006
No. of employees   200   220   250   262   284   300   312
What is the average rate of change in the number of employees?
Step 1: express each year's change as a growth factor (1 + R) relative to the previous year:
Year              2000   2001   2002   2003   2004   2005   2006
No. of employees   200    220    250    262    284    300    312
(1 + R)              -    1.1  1.136  1.048  1.084  1.056   1.04
The average rate of change:
$$R_g = \sqrt[6]{1.1 \times 1.136 \times 1.048 \times 1.084 \times 1.056 \times 1.04} - 1 \approx 0.077 = 7.7\%$$
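The employee example can be reproduced with a short Python sketch. It also illustrates that the product of consecutive growth factors collapses to the ratio of the last value to the first, so the same rate follows from the endpoints alone. Variable names are chosen here for illustration.

```python
# Number of employees for the years 2000 through 2006.
employees = [200, 220, 250, 262, 284, 300, 312]

# Step 1: growth factor (1 + R) for each year relative to the previous year.
factors = [later / earlier for earlier, later in zip(employees, employees[1:])]

# Step 2: n-th root of the product of the growth factors, minus 1.
n = len(factors)  # six year-to-year changes
product = 1.0
for factor in factors:
    product *= factor
rg = product ** (1.0 / n) - 1.0

# Shortcut: the product of the factors equals last / first.
rg_shortcut = (employees[-1] / employees[0]) ** (1.0 / n) - 1.0

print([round(f, 3) for f in factors])
print(round(rg, 3), round(rg_shortcut, 3))
```

Both calculations give about 0.077, that is, 7.7% per year.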
Example
Year             2000  2001  2002  2003  2004  2005  2006  2007  2008  2009
Growth rate (%)    10    25    15    10    10    10    10    15    25    15
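Average growth rate (10% appears 5 times, 15% appears 3 times, and 25% appears 2 times):
$$R_g = \sqrt[\sum_{i=1}^{k} f_i]{\prod_{i=1}^{k}(1+R_i)^{f_i}} - 1 = \sqrt[5+3+2]{1.10^{5} \times 1.15^{3} \times 1.25^{2}} - 1 \approx 0.14 = 14\%$$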
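The sketch below, which assumes Python 3.8 or later for `math.prod`, counts how often each growth rate occurs in the 2000-2009 table and then checks that the weighted formula agrees with the simple formula applied to all ten yearly rates. The variable names are illustrative only.

```python
import math
from collections import Counter

# Growth rates in percent for 2000 through 2009, in year order.
rates_pct = [10, 25, 15, 10, 10, 10, 10, 15, 25, 15]

# Frequency f_i of each distinct rate R_i.
freq = Counter(rates_pct)
n = sum(freq.values())

# Weighted geometric mean: each factor (1 + R_i) is raised to its frequency f_i.
weighted = math.prod((1 + r / 100) ** f for r, f in freq.items()) ** (1 / n) - 1

# Simple geometric mean over all ten yearly factors, for comparison.
simple = math.prod(1 + r / 100 for r in rates_pct) ** (1 / len(rates_pct)) - 1

print(dict(freq))
print(round(weighted, 2), round(simple, 2))
```

The two results coincide because the weighted formula only groups identical factors; both round to 0.14, or 14%.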
Characteristics of the mean
• It is a representative value of a data set.
• It takes every single value into account, so it is likely to be affected by extreme values.
• It can be used to compare data sets of different sizes.
The Median
• The median of a set of measurements is the value that falls in the middle when the measurements are arranged in order of magnitude.
• When determining the median, pay attention to the number of observations (k).
• When k is odd: Median = the number at the ((k+1)/2)th location of the ordered array.
• When k is even: Median = the average of the two numbers in the middle (the numbers at the (k/2)th and the [(k/2)+1]th locations of the ordered array).
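The odd and even rules above can be written as a small Python function. This is a sketch for illustration, with a hypothetical function name; note that Python lists are indexed from 0, so the 1-based location (k+1)/2 corresponds to index k // 2.

```python
def median_of(values):
    """Median of a non-empty collection of numbers, following the odd/even rule."""
    ordered = sorted(values)  # arrange the measurements in order of magnitude
    k = len(ordered)
    if k == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = k // 2
    if k % 2 == 1:
        # k odd: the value at the ((k+1)/2)th location (0-based index k // 2).
        return ordered[mid]
    # k even: average of the values at the (k/2)th and ((k/2)+1)th locations.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([3, 1, 2]))     # odd number of observations
print(median_of([4, 1, 3, 2]))  # even number of observations
```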
Example 2
The salaries of seven employees were recorded (in 1000s): 28, 60, 26, 32, 30, 26, 29. Find the median salary.
Odd number of observations: 26, 26, 28, 29, 30, 32, 60
There are seven salaries (k = 7). The ((k+1)/2)th salary of the ordered array is the number at the (7+1)/2 = 4th location. The median is 29.
Suppose an additional salary of $31,000 is added to the group of salaries recorded before. Find the median salary.
Even number of observations: 26, 26, 28, 29, 30, 31, 32, 60
There are eight salaries (k = 8). The two salaries in the middle are 29 (in the (k/2)th = 4th location) and 30 (in the [(k/2)+1]th = 5th location). The median is their average, 29.5.
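Both parts of Example 2 can be checked with the `median` function from Python's standard `statistics` module, which sorts the data internally. The variable names are illustrative.

```python
from statistics import median

# Salaries of the seven employees, in thousands of dollars.
salaries = [28, 60, 26, 32, 30, 26, 29]
median_seven = median(salaries)

# Add the eighth salary of $31,000 and recompute.
salaries_with_extra = salaries + [31]
median_eight = median(salaries_with_extra)

print(sorted(salaries), median_seven)
print(sorted(salaries_with_extra), median_eight)
```

The medians are 29 for the seven salaries and 29.5 once the eighth salary is included.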
The Mode
• The Mode of a set of measurements is the value that occurs most frequently.
• A set of data may have one mode (or modal class), or two or more modes.
• For large data sets the modal class is much more relevant than a single-value mode.
• Example 3
The manager of a men’s clothing store observes the waist size (in inches) of trousers sold last week: 31, 34, 36, 33, 28, 34, 30, 34, 32, 40.
The mode of this data set is 34 in.
This information seems to be valuable (for example, for the design of a new display in the store), much more than “the median is 33.5 in.”
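Example 3 can be verified with a short Python sketch. It uses `Counter` to show the frequency of each waist size and `statistics.multimode` (available from Python 3.8), which returns every value tied for the highest frequency and therefore also handles data sets with two or more modes.

```python
from collections import Counter
from statistics import median, multimode

# Waist sizes (in inches) of trousers sold last week.
waist_sizes = [31, 34, 36, 33, 28, 34, 30, 34, 32, 40]

counts = Counter(waist_sizes)     # frequency of each size
modes = multimode(waist_sizes)    # all most-frequent values
middle = median(waist_sizes)      # average of the 5th and 6th ordered values

print(counts.most_common(1))
print(modes, middle)
```

Size 34 occurs three times and every other size occurs once, so the only mode is 34, while the median is 33.5.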
Relationship among Mean, Median, and Mode
• If a distribution is symmetrical, the mean, median and mode coincide.
❧ If a distribution is not symmetrical, and is skewed to the left or to the right, the three measures differ.
[Figure: a positively skewed distribution (“skewed to the right”), with the mode, median, and mean marked from left to right.]
[Figure: a negatively skewed distribution (“skewed to the left”), with the mean, median, and mode marked from left to right.]
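The typical ordering of the three measures under skewness can be illustrated with a small made-up data set. The numbers below are invented purely for this illustration and are not from the course material; the left-skewed set is obtained by mirroring the right-skewed one. The ordering shown is the usual pattern for unimodal skewed data, not a guarantee for every data set.

```python
from statistics import fmean, median, mode

# Hypothetical right-skewed data: most values are small, with a long right tail.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]

# Mirror image of the same data, which is skewed to the left.
left_skewed = [15 - x for x in right_skewed]

for name, data in [("right", right_skewed), ("left", left_skewed)]:
    print(name, "mode:", mode(data), "median:", median(data), "mean:", fmean(data))
```

For the right-skewed set the mode (2) is below the median (3), which is below the mean (4.5); for the mirrored set the order is reversed, with mean 10.5, median 12, and mode 13.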
Using the Mean, Median, and Mode
• The mean is very sensitive to extreme values, and is used in most statistical analyses.
• The median is not affected by extreme values; however, it does not reflect all the values included in the data set, but rather the location of the observation in the middle.
• The mode should be used mainly for categorical data.
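The different sensitivity of the mean and the median to extreme values can be seen with the seven salaries from Example 2, where 60 is far above the rest. The sketch below compares both measures with and without that salary; the variable names are illustrative.

```python
from statistics import fmean, median

# Salaries from Example 2, in thousands of dollars; 60 is an extreme value.
salaries = [28, 60, 26, 32, 30, 26, 29]
without_extreme = [s for s in salaries if s != 60]

mean_all, median_all = fmean(salaries), median(salaries)
mean_trimmed, median_trimmed = fmean(without_extreme), median(without_extreme)

print("with 60:   ", mean_all, median_all)
print("without 60:", mean_trimmed, median_trimmed)
```

Removing the single extreme salary moves the mean from 33 to 28.5 but moves the median only from 29 to 28.5.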
4.2 Measures of Variability
• Measures of central location fail to tell the whole story about the distribution.
• A question of interest still remains unanswered:
How much are the values of a given set spread
out around the mean value?