Discriminant Analysis
What Is It?
Discriminant analysis is used when the dependent variable is categorical (nominal) and the independent variables are metric.
It discriminates between, or classifies, individuals into groups on the basis of the independent variables.
It involves deriving a variate that is a linear combination of the independent variables.
This variate is obtained by maximizing the between-group variance relative to the within-group variance.
The linear combination is called the discriminant function:
Z = a + W1X1 + W2X2 + ... + WnXn
where Z is the discriminant score, a is the intercept, and Wi is the discriminant weight of independent variable Xi.
It can also be used to test the hypothesis that the group means of a set of independent variables are equal for 2 or more groups.
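As a rough illustration of how such a variate can be derived for two groups, the sketch below uses Fisher's solution, in which the weights are proportional to the inverse pooled within-group scatter matrix times the difference in group means. The data are synthetic and the names (`group_a`, `group_b`, `separation`) are assumptions made for this example; they are not part of the slides or of SPSS.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-group data (illustrative only, not the data used in these slides).
group_a = rng.normal(loc=[2.0, 8.0], scale=1.0, size=(30, 2))
group_b = rng.normal(loc=[4.0, 6.0], scale=1.0, size=(30, 2))

mean_a, mean_b = group_a.mean(axis=0), group_b.mean(axis=0)

# Pooled within-group scatter matrix.
dev_a, dev_b = group_a - mean_a, group_b - mean_b
s_within = dev_a.T @ dev_a + dev_b.T @ dev_b

# Fisher's solution: weights proportional to S_w^{-1} (mean_b - mean_a).
weights = np.linalg.solve(s_within, mean_b - mean_a)


def separation(w):
    """Between-group relative to within-group variation of the scores X @ w."""
    z_a, z_b = group_a @ w, group_b @ w
    between = (z_a.mean() - z_b.mean()) ** 2
    within = ((z_a - z_a.mean()) ** 2).sum() + ((z_b - z_b.mean()) ** 2).sum()
    return between / within


print("weights:", weights)
print("separation, Fisher weights:", separation(weights))
print("separation, first variable only:", separation(np.array([1.0, 0.0])))
```

The weights here are determined only up to scale, so they will not match the scaling of coefficients in SPSS output; the point is that no other linear combination gives a larger ratio.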
Muhamad Jantan & T. Ramayah
Discriminant Analysis
2
Objectives
♦ Profile Analysis
♦ Predictive Technique
♦ Test differences between groups on average score
profiles of a set of variables
♦ Determine the impact of independent variables on the
differences in average score profiles of two or more
groups.
♦ Classify units on the basis of their scores on a set of
independent variables.
♦ Establish the number and composition of the
dimensions of discrimination between groups formed
from the set of independent variables
Assumptions
♦ Independent variables are distributed as multivariate normal with unknown but equal covariance matrices across the groups.
♦ Non-normality affects estimation of the discriminant function. If this assumption is violated, use logistic regression instead.
♦ Inequality of covariance matrices across groups affects the classification process. If the sample size is small, the estimation process is also affected, and groups with larger covariances will be overclassified. Rectify by using larger sample sizes or quadratic classification techniques.
♦ Multicollinearity of the independent variables: a particular concern when the stepwise procedure is used, and especially critical when the analysis is used for explanation. When interpreting the results, always be aware of the level of collinearity.
♦ Linearity of relationships
♦ Outliers may have a substantial impact on the results
Estimation And Assessment
♦ Estimation
Simultaneous Method, or
Stepwise Method
♦ Assessment
Statistical significance of the discriminant function
Wilks' λ, Hotelling's trace, Pillai's criterion - evaluate the discriminatory power of the function(s)
Roy's maximum root - evaluates the first discriminant function only
For the stepwise procedure: Mahalanobis D² and Rao's V; D² uses distances and adjusts for unequal covariances
With 3 or more groups - evaluate the overall significance and the significance of each individual function
SPSS Commands
Dividing the Sample into Estimation and Holdout (Split) Samples: Random Selection Commands:
TRANSFORM ⇒ RANDOM NUMBER SEED
TRANSFORM ⇒ COMPUTE Randz = UNIFORM(1) > 0.65
⇒ Randz = 0 for ≈ 65% of respondents (the estimation sample) and Randz = 1 for the remainder (the holdout sample)
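The same split can be sketched outside SPSS. The snippet below is a minimal Python analogue under stated assumptions: the seed value and the number of respondents are hypothetical, and `randz` simply mirrors the 0/1 flag that the COMPUTE step produces (1 when the uniform draw exceeds 0.65, 0 otherwise).

```python
import random

random.seed(12345)   # hypothetical seed, playing the role of RANDOM NUMBER SEED
n_cases = 1000       # hypothetical number of respondents

# Mirrors COMPUTE Randz = UNIFORM(1) > 0.65: flag is 1 when the draw exceeds 0.65.
randz = [int(random.random() > 0.65) for _ in range(n_cases)]

# Randz = 0 marks the estimation sample, Randz = 1 the holdout sample.
estimation_idx = [i for i, flag in enumerate(randz) if flag == 0]
holdout_idx = [i for i, flag in enumerate(randz) if flag == 1]

print(f"estimation: {len(estimation_idx)} cases, holdout: {len(holdout_idx)} cases")
```

Because the assignment is random, the estimation share is only approximately 65% in any one run.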
Estimating the Discriminant Function(s):
ANALYZE ⇒ CLASSIFY ⇒ DISCRIMINANT:
This opens the Discriminant dialogue box. Select the grouping (dependent) variable and the independent variables. Also use the SELECT option to identify the units in the estimation sample: in this case use Randz with SET VALUE at 0.
Options Available:
• Method: Stepwise or Simultaneous
• Classify: Provides options for Prior Probabilities, the Covariance Matrix to use, Plots and Display
SPSS: Discriminant Analysis
SPSS Command: Analyze ⇒ Classify ⇒ Discriminant
Dialogue boxes: Select Cases, Statistics, Classify
SPSS: Results for Two Groups
Wilks' Lambda

Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1                     .283            67.544       7    .000

Wilks' Lambda is significant (Sig. = .000), indicating that we have a significant discriminant function.
SPSS: Results for Two Groups
Eigenvalues

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          2.534 a      100.0           100.0          .847

a. First 1 canonical discriminant functions were used in the analysis.

The squared canonical correlation, (0.847)² ≈ 0.72, indicates that about 72% of the variance in the dependent variable is explained by the independent variables.
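The link between the eigenvalue, the canonical correlation and Wilks' Lambda can be checked with a few lines of arithmetic. This is a minimal sketch using the standard single-function identities (canonical correlation = sqrt(eigenvalue / (1 + eigenvalue)), Wilks' Lambda = 1 / (1 + eigenvalue)); only the eigenvalue is taken from the table, and the variable names are chosen for this example.

```python
import math

eigenvalue = 2.534  # Function 1, from the Eigenvalues table

# Canonical correlation implied by the eigenvalue.
canonical_corr = math.sqrt(eigenvalue / (1.0 + eigenvalue))

# Share of variance in the grouping variable explained by the function.
explained = canonical_corr ** 2

# With a single function, Wilks' Lambda is the unexplained share.
wilks_lambda = 1.0 - explained

print(f"canonical correlation         = {canonical_corr:.3f}")
print(f"squared canonical correlation = {explained:.3f}")
print(f"Wilks' Lambda                 = {wilks_lambda:.3f}")
```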
Descriptives
Group Statistics

                                                                      Valid N (listwise)
Articulation Of Needs   Variable             Mean    Std. Deviation   Unweighted   Weighted
Specification Buying    Delivery Speed       2.404   1.0772           28           28.000
                        Price Level          2.946   1.1971           28           28.000
                        Price Flexibility    6.707    .8602           28           28.000
                        Manufacturer Image   5.343    .9155           28           28.000
                        Service              2.646    .9913           28           28.000
                        Salesforce Image     2.693    .6248           28           28.000
                        Product Quality      8.421    .8883           28           28.000
Total Value Analysis    Delivery Speed       4.242   1.0810           31           31.000
                        Price Level          2.048   1.0491           31           31.000
                        Price Flexibility    8.613   1.1433           31           31.000
                        Manufacturer Image   5.123   1.3732           31           31.000
                        Service              3.132    .5369           31           31.000
                        Salesforce Image     2.587    .8628           31           31.000
                        Product Quality      6.023   1.2672           31           31.000
Total                   Delivery Speed       3.369   1.4149           59           59.000
                        Price Level          2.475   1.2004           59           59.000
                        Price Flexibility    7.708   1.3935           59           59.000
                        Manufacturer Image   5.227   1.1738           59           59.000
                        Service              2.902    .8163           59           59.000
                        Salesforce Image     2.637    .7547           59           59.000
                        Product Quality      7.161   1.6302           59           59.000
Assessment Of Overall Fit
♦ Why?
The classification matrix plays the role that R² plays in regression: it measures the predictive ability of the discriminant functions.
Hit ratio - how well the functions classify the units; the % of correct classifications
The chi-square test of D² is analogous to the F test of R²
♦ Cutting Score: the score used for constructing the classification matrix. The optimal cutting score depends on the sizes of the groups. If the group sizes are equal, it is halfway between the two group centroids. If they are unequal, it is the weighted average
$$Z_{CU} = \frac{N_A Z_B + N_B Z_A}{N_A + N_B}$$
where N_A and N_B are the sizes of groups A and B, and Z_A and Z_B are their centroids.
♦ Probabilities of Classification: need to be specified by the researcher.
The default is equal probabilities, used when unsure whether the sample proportions are representative.
Proportional to group size: used when the sample is drawn randomly from the population.
Measures of Predictive Accuracy
♦ How good is the Hit Ratio? Compute the Hit Ratio for the split (holdout) sample and compare it against:
Maximum Chance Criterion: this is just the proportion of cases in the largest group. It is the minimum criterion to be met by the Hit Ratio.
Proportional Chance Criterion: should be used when group sizes are unequal. For two groups it is given as follows:
$$C_{pro} = p^2 + (1 - p)^2$$
where p = proportion of cases in one of the groups
Press's Q: compares the number of correct classifications (n) against the total sample size (N) and the number of groups (k):
$$\text{Press's } Q = \frac{[N - (n \cdot k)]^2}{N(k - 1)}$$
Q ∼ χ² with 1 degree of freedom.
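The three criteria are simple enough to compute by hand, and a short sketch makes the arithmetic explicit. The function names below are chosen for this example. The proportional chance criterion is written as the sum of squared group proportions, which reduces to p² + (1 - p)² for two groups. The example counts are those of the three-group hold-out sample reported later in these slides (group sizes 11, 18 and 12; 26 cases correctly classified), and 6.63 is the usual χ² critical value at the .01 level with 1 degree of freedom.

```python
def maximum_chance(group_sizes):
    """Proportion of cases in the largest group."""
    return max(group_sizes) / sum(group_sizes)


def proportional_chance(group_sizes):
    """Sum of squared group proportions; p^2 + (1 - p)^2 for two groups."""
    total = sum(group_sizes)
    return sum((size / total) ** 2 for size in group_sizes)


def press_q(n_total, n_correct, n_groups):
    """Press's Q = [N - n*k]^2 / [N(k - 1)]."""
    return (n_total - n_correct * n_groups) ** 2 / (n_total * (n_groups - 1))


group_sizes = [11, 18, 12]   # hold-out group sizes, three-group example
n_correct = 26               # correctly classified hold-out cases
n_total = sum(group_sizes)

hit_ratio = n_correct / n_total
c_max = maximum_chance(group_sizes)
c_pro = proportional_chance(group_sizes)
q = press_q(n_total, n_correct, len(group_sizes))

print(f"hit ratio                 = {hit_ratio:.3f}")
print(f"maximum chance criterion  = {c_max:.3f}")
print(f"proportional chance       = {c_pro:.3f}")
print(f"1.25 x proportional chance = {1.25 * c_pro:.3f}")
print(f"Press's Q = {q:.2f} (significant at .01 if above 6.63)")
```

Values recomputed this way may differ slightly from rounded figures shown on the slides.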
SPSS Results: Assessment
Classification Results (a, b)

                                                           Predicted Group Membership
                                                           Specification   Total Value
Sample               Original   Articulation Of Needs      Buying          Analysis      Total
Cases Selected       Count      Specification Buying       27              1             28
                                Total Value Analysis       4               27            31
                     %          Specification Buying       96.4            3.6           100.0
                                Total Value Analysis       12.9            87.1          100.0
Cases Not Selected   Count      Specification Buying       10              2             12
                                Total Value Analysis       4               25            29
                     %          Specification Buying       83.3            16.7          100.0
                                Total Value Analysis       13.8            86.2          100.0

a. 91.5% of selected original grouped cases correctly classified.
b. 85.4% of unselected original grouped cases correctly classified.

Cases not selected are used for validation purposes.
Hit ratio = % of correct classifications.
When comparing the hit ratio with the chance criteria, use the hold-out sample; the model accuracy should be at least 25% better than chance.
For the hold-out sample (cases not selected), the hit ratio is compared to:
Maximum Chance Criterion: 70.7%
Proportional Chance Criterion: 58.4%
Press Q = 99.90
SPSS Results: Assessment
Structure Matrix

Variable             Function 1
Product Quality      -.693
Price Flexibility     .597
Delivery Speed        .544
Price Level          -.256
Service               .197
Manufacturer Image   -.060
Salesforce Image     -.044

Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions.
Variables ordered by absolute size of correlation within function.

Each entry is the linear correlation between the variable and the discriminant function; for example, -.693 is the correlation between Product Quality (PQ) and the function.
SPSS Results: Assessment
Canonical Discriminant Function Coefficients

Variable             Function 1
Delivery Speed         .419
Price Level            .116
Price Flexibility      .560
Manufacturer Image    -.049
Service               -.141
Salesforce Image       .342
Product Quality       -.623
(Constant)           -1.788

Unstandardized coefficients

These coefficients are used in the discriminant function to calculate the discriminant scores, which are used to classify the individuals.
Discriminant Function:
Z = -1.788 + .419X1 + .116X2 + .560X3 - .049X4 - .141X5 + .342X6 - .623X7
where X1 to X7 are the variables in the order listed in the table (X1 = Delivery Speed, ..., X7 = Product Quality).
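To see how the coefficients turn into discriminant scores, the sketch below evaluates the function at the group means from the Group Statistics table. Doing so should roughly reproduce the group centroids SPSS reports (about -1.646 and 1.487), up to rounding of the published coefficients. The dictionary layout and function name are choices made for this example.

```python
# Unstandardized coefficients from the Canonical Discriminant Function Coefficients table.
coefficients = {
    "Delivery Speed": 0.419,
    "Price Level": 0.116,
    "Price Flexibility": 0.560,
    "Manufacturer Image": -0.049,
    "Service": -0.141,
    "Salesforce Image": 0.342,
    "Product Quality": -0.623,
}
constant = -1.788

# Group means from the Group Statistics table (estimation sample).
group_means = {
    "Specification Buying": [2.404, 2.946, 6.707, 5.343, 2.646, 2.693, 8.421],
    "Total Value Analysis": [4.242, 2.048, 8.613, 5.123, 3.132, 2.587, 6.023],
}


def discriminant_score(values):
    """Z = constant + sum of coefficient * value, in table order."""
    return constant + sum(w * x for w, x in zip(coefficients.values(), values))


centroids = {group: discriminant_score(means) for group, means in group_means.items()}

for group, z in centroids.items():
    print(f"{group}: Z = {z:.3f}")
```

The same function applied to an individual respondent's seven scores gives that respondent's discriminant score.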
SPSS Results: Assessment
Functions at Group Centroids

Articulation Of Needs   Function 1
Specification Buying    -1.646
Total Value Analysis     1.487

Unstandardized canonical discriminant functions evaluated at group means

These are the discriminant scores evaluated at the means of (x1,x2,x3,x4,x5,x6,x7) for the two groups.

Cutting Score:
$$Z_{CU} = \frac{31(-1.646) + 28(1.487)}{28 + 31} = -0.15915$$
This means all respondents with Z scores less than -0.15915 will be classified into Specification Buying, and into Total Value Analysis otherwise.
Note: No. in Group 0 (Specification) = 27+1 = 28 and Group 1 (Total) = 27+4 = 31
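The cutting-score calculation and the resulting classification rule can be sketched as follows. The group sizes and centroids are the ones reported above; the function names and the two test scores are illustrative assumptions.

```python
def cutting_score(n_a, z_a, n_b, z_b):
    """Weighted cutting score: (N_A * Z_B + N_B * Z_A) / (N_A + N_B)."""
    return (n_a * z_b + n_b * z_a) / (n_a + n_b)


# Group A = Specification Buying, Group B = Total Value Analysis.
z_cu = cutting_score(n_a=28, z_a=-1.646, n_b=31, z_b=1.487)


def classify(z):
    """Assign a discriminant score to a group using the cutting score."""
    return "Specification Buying" if z < z_cu else "Total Value Analysis"


print(f"cutting score = {z_cu:.5f}")
for score in (-1.0, 0.5):   # hypothetical discriminant scores
    print(score, "->", classify(score))
```

Note that each centroid is weighted by the size of the other group, which pulls the cutting score toward the centroid of the smaller group.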
Interpretation Of Results
Relative importance of each independent variable in discriminating between the groups:
• Discriminant weight (or coefficient): the relative contribution of the variable to the function; equivalent to a beta weight in regression, but the weights are unstable, so interpret them with caution
• Discriminant loading (or structure correlation): measures the simple linear correlation between the independent variable and the discriminant function
• Partial F-values: available only when the stepwise procedure is used. Large F values indicate a large contribution
• Potency index: a relative measure among variables; the composite contribution of a variable to all the significant discriminant functions; used when there is more than one significant discriminant function
SPSS Results – 3 groups
Interpretation
♦ Use Low, Moderate and High Satisfaction
♦ How to assess the results?
Significance of the discriminant functions
Wilks' λ, Hotelling's trace, Pillai's criterion - evaluate the discriminatory power of the functions
♦ Predictive Accuracy: Classification Table – summary and
individual
Hits Ratio
Classification Results;
Determinant Function
Cutoff Points – Territorial Map
♦ Relative Importance of Variables
Discriminant Weights
Discriminant Loadings
Potency Index;
Discriminant Analysis – 3 group example
Profile Analysis: Which companies treat each purchase from HATCO as a straight rebuy, a modified rebuy, or a new task?
Method: Enter
SPSS: Results for 3 Groups
Eigenvalues

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          3.952 a      80.7            80.7           .893
2           .948 a      19.3            100.0          .698

a. First 2 canonical discriminant functions were used in the analysis.

Wilks' Lambda

Test of Function(s)   Wilks' Lambda   Chi-square   df   Sig.
1 through 2           .104            120.131      14   .000
2                     .513            35.342       6    .000

Both discriminant functions are significant.
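The two tables are tied together by the eigenvalues. The sketch below recomputes the percentage of variance, the canonical correlations and the sequential Wilks' Lambda tests from the two eigenvalues. It assumes Bartlett's chi-square approximation, -(N - 1 - (p + g)/2) ln Λ, with N = 59 cases, p = 7 variables and g = 3 groups; that this is the formula behind the SPSS output is an assumption, though the recomputed values agree with the table to within rounding.

```python
import math

eigenvalues = [3.952, 0.948]          # from the Eigenvalues table
n_cases, n_vars, n_groups = 59, 7, 3  # estimation sample, predictors, groups

total = sum(eigenvalues)
results = []
for k, eig in enumerate(eigenvalues):
    pct_variance = 100.0 * eig / total
    canonical_corr = math.sqrt(eig / (1.0 + eig))
    # Wilks' Lambda for functions k+1 through the last one.
    wilks = 1.0 / math.prod(1.0 + e for e in eigenvalues[k:])
    # Bartlett's chi-square approximation (assumed) and its degrees of freedom.
    chi_square = -(n_cases - 1 - (n_vars + n_groups) / 2.0) * math.log(wilks)
    df = (n_vars - k) * (n_groups - 1 - k)
    results.append((pct_variance, canonical_corr, wilks, chi_square, df))
    print(f"functions {k + 1}+: %var={pct_variance:.1f}, Rc={canonical_corr:.3f}, "
          f"Lambda={wilks:.3f}, chi2={chi_square:.2f}, df={df}")
```

The first row tests both functions together; the second tests what remains after the first function is removed.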
Descriptives
Group Statistics

                                                                         Valid N (listwise)
Type of Buying Situation   Variable             Mean    Std. Deviation   Unweighted   Weighted
New Task                   Delivery Speed       2.213    .9593           23           23.000
                           Price Level          2.183    .8239           23           23.000
                           Price Flexibility    6.843    .9154           23           23.000
                           Manufacturer Image   4.991   1.0004           23           23.000
                           Service              2.148    .5720           23           23.000
                           Salesforce Image     2.591    .6230           23           23.000
                           Product Quality      7.983   1.2561           23           23.000
Modified Rebuy             Delivery Speed       3.371    .9770           14           14.000
                           Price Level          3.879   1.1081           14           14.000
                           Price Flexibility    7.007   1.2332           14           14.000
                           Manufacturer Image   5.814   1.0399           14           14.000
                           Service              3.607    .6522           14           14.000
                           Salesforce Image     2.886    .7461           14           14.000
                           Product Quality      7.900   1.2521           14           14.000
Straight Rebuy             Delivery Speed       4.577    .9904           22           22.000
                           Price Level          1.886    .8593           22           22.000
                           Price Flexibility    9.059    .6967           22           22.000
                           Manufacturer Image   5.100   1.3342           22           22.000
                           Service              3.241    .3996           22           22.000
                           Salesforce Image     2.527    .8751           22           22.000
                           Product Quality      5.832   1.3275           22           22.000
Total                      Delivery Speed       3.369   1.4149           59           59.000
                           Price Level          2.475   1.2004           59           59.000
                           Price Flexibility    7.708   1.3935           59           59.000
                           Manufacturer Image   5.227   1.1738           59           59.000
                           Service              2.902    .8163           59           59.000
                           Salesforce Image     2.637    .7547           59           59.000
                           Product Quality      7.161   1.6302           59           59.000
SPSS Results: Assessment
Classification Results (a, b)

                                                              Predicted Group Membership
                                                                          Modified   Straight
Sample               Original   Type of Buying Situation      New Task    Rebuy      Rebuy      Total
Cases Selected       Count      New Task                      21          1          1          23
                                Modified Rebuy                0           11         3          14
                                Straight Rebuy                0           2          20         22
                     %          New Task                      91.3        4.3        4.3        100.0
                                Modified Rebuy                .0          78.6       21.4       100.0
                                Straight Rebuy                .0          9.1        90.9       100.0
Cases Not Selected   Count      New Task                      7           2          2          11
                                Modified Rebuy                3           7          8          18
                                Straight Rebuy                0           0          12         12
                     %          New Task                      63.6        18.2       18.2       100.0
                                Modified Rebuy                16.7        38.9       44.4       100.0
                                Straight Rebuy                .0          .0         100.0      100.0

a. 88.1% of selected original grouped cases correctly classified.
b. 63.4% of unselected original grouped cases correctly classified.

Hit ratio = % of correct classifications, compared to:
For Hold-out Sample (Unselected Cases):
Maximum Chance Criterion: 43.90%
Proportional Chance Criterion: 35.10%
$$\text{Press's } Q = \frac{[41 - (26 \times 3)]^2}{41(3 - 1)} = 16.70$$
SPSS Results: Assessment
Structure Matrix

Variable             Function 1   Function 2
Delivery Speed        .545*       -.081
Price Level          -.040         .915*
Service               .487         .703*
Price Flexibility     .520        -.523*
Product Quality      -.365         .395*
Manufacturer Image    .032         .297*
Salesforce Image     -.012         .196*

Pooled within-groups correlations between discriminating variables and standardized canonical discriminant functions.
Variables ordered by absolute size of correlation within function.
*. Largest absolute correlation between each variable and any discriminant function

Note:
• The "*" indicates the variables that are likely to dominate the particular function
• There are 2 functions because there are 3 groups (the number of functions is at most the number of groups minus one)
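The "*" flags follow a simple rule: each variable is assigned to the function on which its absolute loading is largest. A minimal sketch of that rule, using the loadings from the Structure Matrix (the dictionary layout and variable names are choices made for this example):

```python
# Structure matrix loadings: variable -> (Function 1, Function 2).
loadings = {
    "Delivery Speed": (0.545, -0.081),
    "Price Level": (-0.040, 0.915),
    "Service": (0.487, 0.703),
    "Price Flexibility": (0.520, -0.523),
    "Product Quality": (-0.365, 0.395),
    "Manufacturer Image": (0.032, 0.297),
    "Salesforce Image": (-0.012, 0.196),
}

# For each variable, the 1-based index of the function with the largest |loading|.
dominant_function = {
    variable: max(range(len(values)), key=lambda i: abs(values[i])) + 1
    for variable, values in loadings.items()
}

for variable, function in dominant_function.items():
    print(f"{variable}: Function {function}")
```

Several variables, such as Price Flexibility (.520 versus -.523), load almost equally on both functions, so the flag alone should not be over-interpreted.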
SPSS Results: Assessment
Canonical Discriminant Function Coefficients

Variable             Function 1   Function 2
Delivery Speed         -.324        1.055
Price Level            -.431        1.770
Price Flexibility       .900        -.144
Manufacturer Image      .285         .162
Service                2.259       -1.463
Salesforce Image       -.323        -.149
Product Quality        -.255         .111
(Constant)           -10.145       -3.830

Unstandardized coefficients

These coefficients are used in the discriminant functions to calculate the discriminant scores, which are used to classify the individuals.
Discriminant Functions:
Z1 = -10.145 - .324X1 - .431X2 + .900X3 + .285X4 + 2.259X5 - .323X6 - .255X7
Z2 = -3.830 + 1.055X1 + 1.770X2 - .144X3 + .162X4 - 1.463X5 - .149X6 + .111X7
SPSS Results: Assessment
Territorial Map
[Figure: territorial map. Horizontal axis: Canonical Discriminant Function 1 (-6.0 to 6.0); vertical axis: Canonical Discriminant Function 2 (-6.0 to 6.0). The plot is divided into classification regions for Group 1, Group 2 and Group 3; "*" marks a group centroid.]