Lecture notes: Principles of Statistics, Chapter 3, Numerical Measures, Part B (student version)

Chapter 3

Statistical measures
Measures of center and location
Measures of variation/dispersion


Summary
Statistical measures

Center and location
- Mean (arithmetic, weighted, geometric)
- Mode, Median
- Percentile, Quartile

Variation/Dispersion
- Range
- Variance
- Standard deviation
- Coefficient of variation


Part B
Measures of variation/dispersion
1. Range
2. Mean deviation
3. Variance
4. Standard deviation
5. Coefficient of variation


1. The range




The range is defined as the numerical difference between the smallest and largest values of the items in a set or distribution.
Formula:
R = largest value – smallest value


Example

Ages of two groups of people in a survey:

Group A    20   30   40   50   60
Group B    38   39   40   41   42
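The range of each group can be checked with a few lines of code. This is only a minimal sketch: the function name `value_range` is a hypothetical helper, and the two lists are the ages from the example table.

```python
# Ages of the two surveyed groups, taken from the example table.
group_a = [20, 30, 40, 50, 60]
group_b = [38, 39, 40, 41, 42]


def value_range(values):
    """Range R = largest value - smallest value."""
    return max(values) - min(values)


print("Range of Group A:", value_range(group_a))
print("Range of Group B:", value_range(group_b))
```

Both groups are centred on the same age, yet their ranges differ by a factor of ten, which is the kind of difference a measure of dispersion is meant to expose.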


Advantages and disadvantages
of the range


Advantages:




Disadvantages:


Implication


2. The mean deviation

The mean deviation is a measure of dispersion that gives the average absolute difference (i.e. ignoring '-' signs) between each item and the mean.
Formula:
- For a data set

$$d = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$



Formulae
- For a frequency distribution

$$d = \frac{\sum_{i=1}^{k} f_i \, |x_i - \bar{x}|}{\sum_{i=1}^{k} f_i}$$


Example

Group A    20   30   40   50   60
Group B    38   39   40   41   42

$$d_A = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$


Example

Group A    20   30   40   50   60
Group B    38   39   40   41   42

$$d_B = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$
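As a way to check hand calculations, the mean deviation of each group might be computed as follows. This is a sketch only: `mean_deviation` is a hypothetical helper that follows the data-set formula (sum of absolute deviations from the mean, divided by n).

```python
# Ages of the two groups from the example table.
group_a = [20, 30, 40, 50, 60]
group_b = [38, 39, 40, 41, 42]


def mean_deviation(values):
    """Average absolute difference between each item and the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


d_a = mean_deviation(group_a)
d_b = mean_deviation(group_b)
print("Mean deviation of Group A:", d_a)
print("Mean deviation of Group B:", d_b)
```

Both groups have a mean of 40, so the absolute deviations are 20, 10, 0, 10, 20 for Group A and 2, 1, 0, 1, 2 for Group B.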


Example

The data in the table below relate to the productivity (kg/person) of 100 workers in a small factory.
Mean deviation?

Productivity (kg/person)    Number of workers
<10                           7
10 – 20                      18
20 – 30                      25
30 – 35                      20
35 – 40                      18
≥ 40                         12
Total                       100
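One possible way to work this example is to represent each class by its midpoint and apply the frequency-distribution formula. The sketch below rests on assumptions that are not stated on the slide: the open class "<10" is treated as 0 – 10 (midpoint 5) and the open class "≥ 40" as 40 – 45 (midpoint 42.5, the same width as the preceding class). Other conventions for open-ended classes would give different answers. The function names are hypothetical.

```python
# (midpoint, frequency) pairs for the productivity table.
# Midpoints 15, 25, 32.5 and 37.5 follow from the closed classes.
# ASSUMPTION: "<10" is treated as 0-10 (midpoint 5.0) and
# ">= 40" as 40-45 (midpoint 42.5); the slide does not specify this.
classes = [(5.0, 7), (15.0, 18), (25.0, 25), (32.5, 20), (37.5, 18), (42.5, 12)]


def grouped_mean(classes):
    """Weighted mean of the class midpoints."""
    total = sum(f for _, f in classes)
    return sum(x * f for x, f in classes) / total


def grouped_mean_deviation(classes):
    """Frequency-weighted average absolute deviation from the mean."""
    total = sum(f for _, f in classes)
    mean = grouped_mean(classes)
    return sum(f * abs(x - mean) for x, f in classes) / total


print("Mean productivity:", round(grouped_mean(classes), 4))
print("Mean deviation:", round(grouped_mean_deviation(classes), 4))
```

Under these assumed midpoints the mean is 27.65 kg/person and the mean deviation is 9.05 kg/person.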


Characteristics of the mean deviation






- A better measure of dispersion than the range
- Useful for comparing the variability between distributions
- Can be complicated to calculate in practice if the mean is anything other than a whole number


3. Variance






Variance is another statistical measure of dispersion.
It is defined as the average of the squared deviations of the data values from their mean.
Formula:

For a set of values

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$

or

$$\sigma^2 = \frac{\sum_{i=1}^{n} x_i^2}{n} - (\bar{x})^2 = \overline{x^2} - (\bar{x})^2$$

The mean of the squares less the square of the mean
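The two forms of the variance formula should give the same value. A small sketch that compares them on an illustrative list of values (the Group A ages from the earlier example); both function names are hypothetical.

```python
# Illustrative data: the Group A ages used in the earlier examples.
values = [20, 30, 40, 50, 60]


def variance_definition(values):
    """Average of the squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def variance_shortcut(values):
    """Mean of the squares less the square of the mean."""
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(x * x for x in values) / n
    return mean_of_squares - mean ** 2


print("Definition form:", variance_definition(values))
print("Shortcut form:  ", variance_shortcut(values))
```

Here the mean of the squares is 1800 and the square of the mean is 1600, so both forms give 200. With floating-point data the shortcut form can lose precision when the mean is large relative to the spread, which is one reason to prefer the definition form in code.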



For a frequency distribution

$$\sigma^2 = \frac{\sum_{i=1}^{k} (x_i - \bar{x})^2 f_i}{\sum_{i=1}^{k} f_i}$$

or

$$\sigma^2 = \frac{\sum_{i=1}^{k} x_i^2 f_i}{\sum_{i=1}^{k} f_i} - (\bar{x})^2 = \overline{x^2} - (\bar{x})^2$$

The mean of the squares less the square of the mean


Example

Group A    20   30   40   50   60
Group B    38   39   40   41   42

$$\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$$
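To check the variance of each group, Python's standard `statistics` module can be used. Note that the slide's formula divides by n, which corresponds to `statistics.pvariance` (population variance); `statistics.variance` divides by n - 1 and would give a different result.

```python
import statistics

# Ages of the two groups from the example table.
group_a = [20, 30, 40, 50, 60]
group_b = [38, 39, 40, 41, 42]

# pvariance divides by n, matching the formula on the slide.
var_a = statistics.pvariance(group_a)
var_b = statistics.pvariance(group_b)

print("Variance of Group A:", var_a)
print("Variance of Group B:", var_b)
```

The squared deviations are 400, 100, 0, 100, 400 for Group A and 4, 1, 0, 1, 4 for Group B, giving variances of 200 and 2 respectively.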


Example

The data in the table below relate to the productivity (kg/person) of 100 workers in a small factory.
Variance?

Productivity (kg/person)    Number of workers
<10                           7
10 – 20                      18
20 – 30                      25
30 – 35                      20
35 – 40                      18
≥ 40                         12
Total                       100
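A possible way to compute the variance for this table is to use class midpoints in the frequency-distribution formula. As a loudly stated assumption, the first class is treated as 0 – 10 (midpoint 5) and the last as 40 – 45 (midpoint 42.5); the slide does not say how open-ended classes should be handled, so this is a sketch rather than the expected solution. The function name is hypothetical.

```python
# (midpoint, frequency) pairs for the productivity table.
# ASSUMPTION: the open-ended first and last classes are given
# midpoints 5.0 and 42.5; the slide does not specify them.
classes = [(5.0, 7), (15.0, 18), (25.0, 25), (32.5, 20), (37.5, 18), (42.5, 12)]


def grouped_variance(classes):
    """Frequency-weighted average of squared deviations from the mean."""
    total = sum(f for _, f in classes)
    mean = sum(x * f for x, f in classes) / total
    return sum(f * (x - mean) ** 2 for x, f in classes) / total


print("Variance:", round(grouped_variance(classes), 4))
```

Under these assumed midpoints the mean is 27.65 and the variance is 115.1025, expressed in (kg/person) squared.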


Characteristics of the variance




- A better measure of dispersion than the range
- More complicated to calculate, since the deviations are squared
- The variance is expressed in squared units (e.g. kg² per person²), which are not directly meaningful


4. Standard deviation




Standard deviation is defined as the square root
of the variance.
Formula

For a set of values

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$$

or

$$\sigma = \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n} - (\bar{x})^2} = \sqrt{\overline{x^2} - (\bar{x})^2}$$


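For a frequency distribution

$$\sigma = \sqrt{\frac{\sum_{i=1}^{k} (x_i - \bar{x})^2 f_i}{\sum_{i=1}^{k} f_i}}$$

or

$$\sigma = \sqrt{\frac{\sum_{i=1}^{k} x_i^2 f_i}{\sum_{i=1}^{k} f_i} - (\bar{x})^2} = \sqrt{\overline{x^2} - (\bar{x})^2}$$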


Example

Group A    20   30   40   50   60
Group B    38   39   40   41   42

$$\sigma = \sqrt{\sigma^2}$$
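The standard deviation of each group can be checked with `statistics.pstdev`, which takes the square root of the population variance (division by n, as in the slide's formula). This is a small sketch; `statistics.stdev` would instead divide by n - 1.

```python
import math
import statistics

# Ages of the two groups from the example table.
group_a = [20, 30, 40, 50, 60]
group_b = [38, 39, 40, 41, 42]

# pstdev = square root of the population variance.
sd_a = statistics.pstdev(group_a)
sd_b = statistics.pstdev(group_b)

print("Standard deviation of Group A:", round(sd_a, 4))
print("Standard deviation of Group B:", round(sd_b, 4))
```

These are the square roots of the variances 200 and 2, roughly 14.14 and 1.41 years. Unlike the variance, the result is in the same unit as the data.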


Example

The data in the table below relate to the productivity (kg/person) of 100 workers in a small factory.
Standard deviation?

Productivity (kg/person)    Number of workers
<10                           7
10 – 20                      18
20 – 30                      25
30 – 35                      20
35 – 40                      18
≥ 40                         12
Total                       100
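A sketch of how the standard deviation for this table might be computed from class midpoints. It assumes, beyond what the slide states, that the open-ended classes are represented by midpoints 5 (for "<10", read as 0 – 10) and 42.5 (for "≥ 40", read as 40 – 45). The function name is hypothetical.

```python
import math

# (midpoint, frequency) pairs for the productivity table.
# ASSUMPTION: midpoints 5.0 and 42.5 for the open-ended classes
# are illustrative choices, not given on the slide.
classes = [(5.0, 7), (15.0, 18), (25.0, 25), (32.5, 20), (37.5, 18), (42.5, 12)]


def grouped_std(classes):
    """Square root of the frequency-weighted variance."""
    total = sum(f for _, f in classes)
    mean = sum(x * f for x, f in classes) / total
    variance = sum(f * (x - mean) ** 2 for x, f in classes) / total
    return math.sqrt(variance)


print("Standard deviation:", round(grouped_std(classes), 4))
```

With these midpoints the variance is 115.1025, so the standard deviation is about 10.73 kg/person.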


Characteristics of Standard Deviation



Can be regarded as one of the most useful and appropriate measures of dispersion.
For distributions that are not too skewed:
- about 99.7% of the data items should lie within three standard deviations of the mean
- about 95% of the data items should lie within two standard deviations of the mean
- about 68% of the data items should lie within one standard deviation of the mean
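These percentages are the values for a normal distribution, which is why they hold only approximately for real data that are roughly symmetric. A short sketch that reproduces them with `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"Within {k} standard deviation(s): {share_within(k):.1%}")
```

The shares come out at roughly 68.3%, 95.4% and 99.7%, which the slide rounds to 68%, 95% and 99.7%.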

