
Hindawi Publishing Corporation
EURASIP Journal on Advances in Signal Processing
Volume 2009, Article ID 203790, 13 pages
doi:10.1155/2009/203790
Research Article
Linear Classifier with Reject Option for the Detection of
Vocal Fold Paralysis and Vocal Fold Edema
Constantine Kotropoulos (EURASIP Member)¹,² and Gonzalo R. Arce²
¹ Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki 54124, Box 451, Greece
² Department of Electrical and Computer Engineering, University of Delaware, 140 Evans Hall, Newark, DE 19716, USA
Correspondence should be addressed to Constantine Kotropoulos,
Received 1 November 2008; Revised 19 May 2009; Accepted 30 July 2009
Recommended by Juan I. Godino-Llorente
Two distinct two-class pattern recognition problems are studied, namely, the detection of male subjects who are diagnosed with vocal fold paralysis against male subjects who are diagnosed as normal, and the detection of female subjects who suffer from vocal fold edema against female subjects who do not suffer from any voice pathology. To this end, utterances of the sustained vowel “ah” from the Massachusetts Eye and Ear Infirmary database of disordered speech are employed. Linear prediction coefficients extracted from these utterances are used as features. The receiver operating characteristic curve is derived for the linear classifier that stems from the Bayes classifier when Gaussian class-conditional probability density functions with equal covariance matrices are assumed. The optimal operating point of the linear classifier is specified with and without reject option. First results using utterances of the “rainbow passage” are also reported for completeness. The reject option is shown to yield statistically significant improvements in the accuracy of detecting the voice pathologies under study.
Copyright © 2009 C. Kotropoulos and G. R. Arce. This is an open access article distributed under the Creative Commons
Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
1. Introduction


Vocal pathologies arise due to accident, disease, misuse of the voice, or surgery affecting the vocal folds, and they have a profound impact on patients’ lives. The modeling of normal and pathological voice sources and the analysis of healthy and pathological voices have gained increasing interest recently [1]. Among the most interesting works are those concerned with Parkinson’s Disease (PD) and multiple sclerosis, which belong to a class of neurodegenerative diseases that affect patients’ speech, motor, and cognitive capabilities [2, 3].
People with neurological conditions causing disability often have associated dysarthria, which is the most common acquired speech disorder, affecting 170 per 100 000 population [4]. Several studies explore the main voice characteristics (i.e., the fundamental frequency and the vocal tract resonance frequencies) together with their deviation from nominal conditions in persons who exhibit voice disorders. Although the majority of techniques analyze the speech signal, the video modality offers complementary information [5, 6].
For example, three-dimensional (3D) magnetic resonance imaging could be used to build a 3D numerical model of the vocal tract, and videokymography could overcome the transmission speed and volume limitations of 2D imaging (i.e., stroboscopy) for severely dysphonic patients with an aperiodic signal, allowing the movements of the vocal folds to be registered with a high time resolution on a line perpendicular to the glottis [1]. Furthermore, irregular vocal fold oscillations can be observed by means of a digital high-speed camera, using image processing techniques to extract the vocal fold edges, estimate the minimum glottal area defined by the vocal fold positions, and compute, in real time, the distance between the glottal midline and the vocal fold edges extracted at the medial position [7]. The time series of such displacements can drive an inversion procedure that adjusts the parameters of a biomechanical model of the vocal folds for both pathological and healthy vocal fold oscillations. All the aforementioned techniques aim at evaluating the performance of special treatments, such as the Lee Silverman Voice Treatment [3], and at assisting the e-inclusion of people with physical disabilities and disordered speech by offering better access to telecommunication services [8] or more efficient environmental control systems [9]. Thus, it is of great significance to develop systems able to classify incoming voice samples as normal or pathological before further procedures are applied.
Voice pathologies may be assessed either by perceptual judgments or by an objective assessment. Perceptual judgment qualifies and quantifies the vocal pathology by listening to the patient’s speech. Although this is the method most commonly used by clinicians, it suffers from several drawbacks. First, the perceptual judgment has to be performed by an expert jury in order to increase its reliability. Second, due to the lack of universal assessment scales and the dependence on the experts’ professional background and experience, or on their knowledge of the patient’s history, the perceptual judgment may involve large intra- and inter-judge variability. Third, perceptual analysis is very costly in time and human resources and cannot be planned regularly.
Nowadays, an increasing use of objective measurement-based analysis as a non-invasive technique for supporting diagnosis in laryngeal pathology has been observed [8–11]. Objective measurement-based analysis qualifies and quantifies the voice pathology by analyzing acoustical, aerodynamic, and physiological measurements. These measurements may be extracted directly from the patient’s speech utterance using a simple computer-based system or may require special instruments. Typical techniques, such as fundamental frequency and jitter estimation, should be carefully adapted in order to take into account the significant cycle-to-cycle variations of the fundamental frequency as well as the presence of subharmonic and aperiodic components in the pathological voice [12–14]. Very useful insight into the production of disordered speech can be obtained through simulation studies [15–17]. Although objective analysis alleviates the subjectivity of perceptual judgments, it has certain limitations as well. First, objective analysis often relies on pattern recognition techniques, such as linear discriminant analysis or correlation estimation, whose performance depends on the measurements being analyzed. Second, objective analysis is frequently confined to the study of sustained vowels only, which are not representative of continuous speech [18]. In the medical literature, agreement between the perceptual judgments and the findings of objective analysis is generally sought [19, 20].
Several techniques for the detection and classification of voice pathologies by means of acoustic analysis, parametric and non-parametric feature extraction, and pattern recognition are reviewed in [21]. In all these techniques, descriptive features are first extracted from the speech signal. A number of so-called classical parameters quantify pitch perturbations (jitter) and amplitude perturbations (shimmer), and estimate the harmonics-to-noise ratio at different frequency bands and the critical-band energy spectrum by employing either the short-term discrete Fourier transform and cepstral analysis [22–24] or the singularities in the power spectral density of the vocal cord cover wave (also referred to as the mucosal wave correlate) [25]. Alternatively, features stemming from the 1-D bicoherence index derived from the bispectrum [22] or from nonlinear dynamical system theory, such as statistics of the correlation dimension and the largest Lyapunov exponent [26] or the return period density entropy [27], have been extracted. Features could also be obtained by applying the continuous wavelet transform to each speech frame and averaging neighboring wavelet coefficients on the time-frequency scale [28]. Frequently, feature vectors undergo dimensionality reduction by applying principal component analysis (PCA) [29–31] before classification, or a subset of features is selected by applying either a wrapper or a filter.
Next, the features are either clustered into a number of predefined classes, for example, by a K-means algorithm [30], or fed to a classifier designed to solve a two-class pattern recognition problem, that is, to verify a specific pathology in a test utterance or to decide whether a test utterance is pathological or not. Commonly used classifiers resort to linear discriminant analysis (LDA) [23, 27, 29, 32], nearest neighbors [24, 26, 29], vector quantization [33], or support vector machines (SVMs) [28, 31, 34]. It is worth noting that the detection of voice pathology is closely related to speaker verification. In particular, pathological class models can be derived from generic Gaussian mixture models by employing the maximum a posteriori adaptation technique [35] and adapting only the means [34]. While a sustained phonation can be classified as normal or pathological with an accuracy greater than 90% when speech is recorded in laboratory conditions [21], telephone-quality speech can be classified as normal or pathological with a much lower accuracy, that is, 74.15% [23].
In this paper, we are concerned with vocal fold paralysis and vocal fold edema, which are both associated with communication deficits that affect the perceptual characteristics of pitch, loudness, quality, and intonation, and which have symptoms similar to those of PD and other neurodegenerative diseases [36]. We are interested in detecting male subjects who are diagnosed with vocal fold paralysis against male subjects who are diagnosed as normal. Similarly, we would like to distinguish female subjects who are diagnosed with vocal fold edema from female subjects who are diagnosed as normal. Utterances from the Massachusetts Eye & Ear Infirmary (MEEI) Voice Disorders Database, which is distributed by Kay Elemetrics [37], are employed, because the MEEI database is a benchmark annotated speech corpus.
A review of several voice pathology detection approaches using the MEEI database can be found in [21]. However, the majority of these approaches aim at identifying whether an utterance is pathological or not without addressing which speech pathology is observed. Although a direct comparison between these methods is not possible, because different data subsets and different performance criteria have been employed, one can roughly claim that the state-of-the-art accuracy in detecting whether an utterance is pathological or not exceeds 98% [38, 39]. In the following, we confine ourselves to vocal fold paralysis and edema detection. The identification of vocal fold paralysis using the normalized energy across various scaling factors of the wavelet transform and a multilayer neural network trained by back-propagation was proposed in [40]. For 50 data samples of the MEEI database, an average classification accuracy of 90% was reported. The performance of Fisher’s linear classifier, the K-nearest neighbor classifier, and the nearest mean classifier for detecting vocal fold paralysis in male utterances and vocal fold edema in female utterances was assessed in [29]. The subjects were asked to articulate the sustained vowel “ah” (/a/). From each recording, two central frames were selected among those belonging to the most stationary portion of the sustained speech signal, as proposed in [41, 42]. Fourteenth-order linear prediction coefficients (LPCs) were extracted from each frame. The dimensionality of the raw feature vector was then reduced to 2 by PCA. Receiver operating characteristic (ROC) curves for the Fisher linear classifier were demonstrated. It was shown that a probability of detection close to 85% could be achieved for a probability of false alarm of 10% in the case of vocal fold paralysis in male utterances, while the probability of detection for vocal fold edema in female utterances was found to be approximately 73% at the same probability of false alarm. The nearest mean classifier was found to outperform K-nearest neighbor classifiers for K = 1, 2, 3 in both experiments. Two linear classifiers were examined in [32]. The first one is based on a sample-based optimal linear classifier design [43], while the second one is based on dual-space linear discriminant analysis [44]. Again, 14 LPCs were extracted by processing utterances of the sustained vowel “ah.” Both the rectangular and the Hamming window were used to extract the speech frames [45]. The classifiers studied in [32] were assessed by estimating the probability of false alarm and the probability of detection using the leave-one-out method. The parametric classifier was found to be more accurate than the dual-space linear discriminant classifier. In particular, a slightly higher probability of detection for vocal fold paralysis in men was measured, approximately equal to 90% for a probability of false alarm of 10%. For vocal fold edema in women, the probability of detection was 20% higher than that achieved by the Fisher linear discriminant in [29]. LPCs, LPC-derived cepstral coefficients, and mel-frequency cepstral coefficients were extracted for vocal fold edema detection in [33]. A vector quantizer was trained based on the distance between the feature vectors. Experiments were conducted using 53 normal speakers and another 67 speakers who were diagnosed with voice pathologies including vocal fold edema. Only a single operating point was reported, which yields a probability of detection of approximately 73% for a probability of false alarm of 4% [33]. For the same probability of false alarm, a probability of detection between 80.95% (rectangular window) and 90.47% (Hamming window) was reported in [32].
Two distinct two-class pattern recognition problems are studied, namely, the detection of male subjects who are diagnosed with vocal fold paralysis against male subjects who are diagnosed as normal, and the detection of female subjects who suffer from vocal fold edema against female subjects who do not suffer from any voice pathology. The rationale for gender-dependent voice pathology detection lies in the inherent differences of the speech production system between male and female speakers and in the higher accuracy that gender-dependent models offer over gender-independent ones in speech emotion recognition, speaker indexing, speaker recognition, and so forth. The ROC curve is derived for the linear classifier that stems from the Bayes classifier when Gaussian class-conditional probability density functions with equal covariance matrices are assumed. The optimal operating point of the linear classifier is specified with and without reject option. The contribution of this paper lies in assessing the impact of the reject option on the ROC curve of the linear classifier for the two-class pattern recognition problems under study. Although sustained vowels are not representative of continuous speech, utterances of the sustained vowel “ah” from the MEEI database are employed here due to their wide use in medical practice and, primarily, in order to maintain direct compatibility with previously reported results [29, 32] and minimal problem complexity, so that we can focus on the role of the reject option. However, first experimental results using continuous speech utterances are reported for completeness. A reject region in classifier design was also proposed in [27], but without demonstrating its impact on the ROC curve.
The motivation behind the introduction of a reject option in classifier design is twofold. First, when the conditional error given a feature vector due to the decision rule (also known as classification risk) is high, the classifier should postpone making any decision and rather request an expert’s advice. Second, new classes that were not present during training may appear during the test phase, or some classes may be poorly sampled during training, leading to inaccurate class models [46]. The introduction of a reject option in the design of two-class classifiers (also known as dichotomizers) and its impact on the ROC curve have recently attracted the attention of the pattern recognition community [46–49]. Linear prediction coefficients extracted from the utterances are used as features. The reject option is shown to yield statistically significant improvements in the accuracy of detecting the voice pathologies under study.
The outline of the paper is as follows. Section 2 briefly describes the Bayes classifier for both minimum-error and minimum-cost classification in a two-class pattern recognition problem without a reject option and discusses the motivation behind the adoption of a linear classifier. Section 2.1 defines the ROC curve and its use to derive the optimal operating point for a two-class classifier. The introduction of a reject option in a dichotomizer is addressed in Section 3. The dataset used is presented in Section 4 along with feature extraction. Experimental results are reported in Section 5, and conclusions are drawn in Section 6.
2. The Bayes and the Linear Classifiers without Reject Option
Let $X$ denote a sample (i.e., a feature vector). Let the class $\Omega_1$ comprise samples from healthy subjects and the class $\Omega_2$ comprise samples from subjects diagnosed with certain pathologies. The Bayes rule for minimum error assigns $X$ to the class $\Omega_i$ having the maximum a posteriori probability given $X$ [43]. That is,

$$\ell(X) = \frac{p_1(X)}{p_2(X)} \underset{\Omega_2}{\overset{\Omega_1}{\gtrless}} \frac{P_2}{P_1}, \tag{1}$$
where $p_i(X)$ are the class-conditional probability density functions (pdfs) and $P_i$ are the a priori probabilities of the classes $\Omega_i$, $i = 1, 2$. The term $\ell(X)$ on the left-hand side of (1) is known as the likelihood ratio, and the fraction on the right-hand side of (1) is called the threshold value of the likelihood ratio for decision [43]. Frequently, the decision is expressed in terms of the minus log-likelihood ratio $h(X) = -\ln \ell(X)$, which is known as the discriminant function. Let us assume that the class-conditional pdfs are normal densities with mean vectors $M_i$ and covariance matrices $\Sigma_i$, $i = 1, 2$. Then, the discriminant function becomes a quadratic function of $X$, that is,
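$$h(X) = \frac{1}{2}(X - M_1)^T \Sigma_1^{-1} (X - M_1) - \frac{1}{2}(X - M_2)^T \Sigma_2^{-1} (X - M_2) + \frac{1}{2}\ln\frac{|\Sigma_1|}{|\Sigma_2|} \underset{\Omega_2}{\overset{\Omega_1}{\lessgtr}} \ln\frac{P_1}{P_2}. \tag{2}$$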
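To make (1) and (2) concrete, here is a minimal NumPy sketch, assuming Gaussian class-conditional pdfs whose parameters are known; the function names `quadratic_discriminant` and `bayes_min_error` are hypothetical and introduced only for illustration. It evaluates the minus log-likelihood ratio $h(X)$ of (2) and applies the minimum-error rule by comparing $h(X)$ with $\ln(P_1/P_2)$.

```python
import numpy as np


def quadratic_discriminant(x, m1, s1, m2, s2):
    """Minus log-likelihood ratio h(X) = -ln(p1(X)/p2(X)) of eq. (2)
    for Gaussian class-conditional pdfs N(m1, s1) and N(m2, s2)."""
    d1 = x - m1
    d2 = x - m2
    q1 = d1 @ np.linalg.solve(s1, d1)  # (X - M1)^T Sigma1^{-1} (X - M1)
    q2 = d2 @ np.linalg.solve(s2, d2)  # (X - M2)^T Sigma2^{-1} (X - M2)
    _, logdet1 = np.linalg.slogdet(s1)  # ln|Sigma1| (positive definite covariance)
    _, logdet2 = np.linalg.slogdet(s2)  # ln|Sigma2|
    return 0.5 * q1 - 0.5 * q2 + 0.5 * (logdet1 - logdet2)


def bayes_min_error(x, m1, s1, m2, s2, p1, p2):
    """Minimum-error Bayes rule: 1 (normal) if h(X) < ln(P1/P2), else 2 (pathological).

    A tie h(X) == ln(P1/P2) is assigned to class 2 here (an arbitrary choice).
    """
    h = quadratic_discriminant(x, m1, s1, m2, s2)
    return 1 if h < np.log(p1 / p2) else 2
```

For two unit-covariance classes centred at the origin and at (2, 2) with equal priors, the origin gives $h(X) = -4$ and is therefore assigned to $\Omega_1$.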
The minimization of the probability of classification error treats misclassifications of $\Omega_1$ and $\Omega_2$ samples equally. However, a higher decision cost should be assigned whenever a patient is misclassified as normal than whenever a normal subject is misclassified as a patient. By introducing the cost $c_{ij}$ of deciding $X \in \Omega_i$ although $X$ actually belongs to $\Omega_j$ according to the ground truth, the Bayes test for minimum cost is obtained:
$$\frac{p_1(X)}{p_2(X)} \underset{\Omega_2}{\overset{\Omega_1}{\gtrless}} \frac{(c_{12} - c_{22})P_2}{(c_{21} - c_{11})P_1}. \tag{3}$$
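Taking the negative logarithm of both sides of (3) expresses the minimum-cost test through the discriminant function $h(X)$, in the same form as (2). The short derivation below assumes that every misclassification costs more than the corresponding correct decision, that is, $c_{12} > c_{22}$ and $c_{21} > c_{11}$, so that both sides of (3) are positive and the logarithm is defined.

```latex
h(X) = -\ln\frac{p_1(X)}{p_2(X)}
\;\underset{\Omega_2}{\overset{\Omega_1}{\lessgtr}}\;
-\ln\frac{(c_{12}-c_{22})P_2}{(c_{21}-c_{11})P_1}
= \ln\frac{(c_{21}-c_{11})P_1}{(c_{12}-c_{22})P_2}
```

For instance, with equal priors, $c_{11} = c_{22} = 0$, $c_{21} = 1$, and $c_{12} = 5$ (illustrative numbers only, expressing that missing a patient is five times as costly as a false alarm), the threshold becomes $\ln(1/5) \approx -1.61$ instead of $0$, so fewer samples are assigned to the normal class.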
The comparison of (3) with (1) reveals that only the threshold on the right-hand side of the likelihood ratio test has changed. Clearly, for a symmetric cost function, that is, $c_{12} - c_{22} = c_{21} - c_{11}$, the two likelihood ratio tests coincide. Hereafter, we will employ the linear classifier that stems from the quadratic one (2) if equal covariance matrices $\Sigma_1 = \Sigma_2 = \hat{\Sigma}$ are assumed, that is,
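$$h(X) = (\hat{M}_2 - \hat{M}_1)^T \hat{\Sigma}^{-1} X + \frac{1}{2}\left(\hat{M}_1^T \hat{\Sigma}^{-1} \hat{M}_1 - \hat{M}_2^T \hat{\Sigma}^{-1} \hat{M}_2\right) \underset{\Omega_2}{\overset{\Omega_1}{\lessgtr}} t, \tag{4}$$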
where $\hat{M}_i$ is the sample mean for $\Omega_i$, $i = 1, 2$, $t$ denotes the threshold, which admits a value in the range of the discriminant function, and $\hat{\Sigma}$ is the gross sample covariance matrix estimated from the design set without making any distinction between normal and pathological samples. That is, $\hat{\Sigma} = (1/N)\sum_{l=1}^{N}(X_l - \hat{M})(X_l - \hat{M})^T$, where $X_l$, $l = 1, 2, \ldots, N$, are the feature vectors in the design set of cardinality $N$ and $\hat{M}$ is the gross sample mean feature vector. In the Bayes sense, the linear classifier is optimum only for normal distributions with equal covariance matrices [43].
Although the assumption of equal covariance matrices might not be plausible in reality, the simplicity of the classifier compensates for any potential loss in accuracy relative to what other classifiers (e.g., SVMs) might deliver. Indeed, (4) requires only $\hat{\Sigma}$ and $\hat{M}_i$, $i = 1, 2$, to be estimated from the design set. However, it should be stressed that no linear classifier performs well when the distributions are separated not by the mean difference but by the covariance difference. In the latter case, one has to adopt a more complex classifier, for example, a quadratic one.
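The quantities that (4) needs from the design set can be estimated in a few lines. Below is a minimal NumPy sketch, assuming the design samples of each class are stored row-wise in arrays; `fit_linear_classifier` and `discriminant` are hypothetical names. It uses the $1/N$ gross covariance defined above and returns a weight vector and bias such that $h(X) = w^T X + b$, which is then compared with the threshold $t$.

```python
import numpy as np


def fit_linear_classifier(x1, x2):
    """Estimate the terms of eq. (4) from the design set.

    x1: (N1, d) array of normal samples (class Omega_1)
    x2: (N2, d) array of pathological samples (class Omega_2)
    Returns (w, b) such that h(X) = w @ X + b.
    """
    m1 = x1.mean(axis=0)                        # sample mean of Omega_1
    m2 = x2.mean(axis=0)                        # sample mean of Omega_2
    x = np.vstack([x1, x2])                     # design set, classes pooled
    centered = x - x.mean(axis=0)               # subtract the gross sample mean
    sigma = centered.T @ centered / x.shape[0]  # gross covariance, 1/N normalization
    sigma_inv = np.linalg.inv(sigma)
    w = sigma_inv @ (m2 - m1)                   # Sigma^{-1}(M2 - M1); Sigma is symmetric
    b = 0.5 * (m1 @ sigma_inv @ m1 - m2 @ sigma_inv @ m2)
    return w, b


def discriminant(x, w, b):
    """h(X) for a single sample of shape (d,) or a batch of shape (n, d)."""
    return x @ w + b
```

A sample is then assigned to $\Omega_1$ when `discriminant(x, w, b) < t` and to $\Omega_2$ otherwise; only $d$ means per class and one $d \times d$ covariance are estimated, which reflects the simplicity argument above.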
2.1. ROC Curve without Reject Option. The decisions taken
by the linear classifier (4) for all test samples yield the
following measures, which are functions of the threshold t:

(i) true positive rate (TP), also called sensitivity or probability of detection $P_D$, which is defined as the ratio between the number of pathological samples correctly classified and the total number of pathological samples;
(ii) false negative rate (FN), also called probability of miss, which is defined as the ratio between the number of pathological samples wrongly classified and the total number of pathological samples;
(iii) true negative rate (TN), also called specificity, which is defined as the ratio between the number of normal samples correctly classified and the total number of normal samples;
(iv) false positive rate (FP), also known as probability of false alarm $P_{FA}$, which is defined as the ratio between the number of normal samples wrongly classified and the total number of normal samples.
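The four rates above can be estimated directly from the discriminant values of labeled test samples. A minimal sketch, assuming the values of $h(X)$ for normal and pathological test samples are available as separate arrays; the function name `rates` is hypothetical, and a sample with $h(X)$ exactly equal to $t$ is counted as normal, a tie-breaking choice that (4) leaves open.

```python
import numpy as np


def rates(h_normal, h_path, t):
    """TP, FN, TN and FP rates of rule (4) at threshold t.

    A sample is declared pathological when h(X) > t; ties go to the normal class.
    """
    h_normal = np.asarray(h_normal, dtype=float)
    h_path = np.asarray(h_path, dtype=float)
    tp = float(np.mean(h_path > t))    # P_D: pathological samples correctly classified
    fn = 1.0 - tp                      # probability of miss
    fp = float(np.mean(h_normal > t))  # P_FA: normal samples classified as pathological
    tn = 1.0 - fp                      # specificity
    return tp, fn, tn, fp
```

Since TP + FN = 1 and TN + FP = 1, only $P_D$ and $P_{FA}$ carry independent information, which is why the ROC curve uses just these two.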
By varying the threshold, we obtain several operating points of the classifier, which can be represented through the receiver operating characteristic (ROC) curve, that is, the plot of $P_D$ (TP) versus $P_{FA}$ (FP) with $t$ as an implicit parameter. The ROC curve of a likelihood ratio test is concave, that is, it bows towards the upper-left corner of the $(P_{FA}, P_D)$ plane [50]. If a single figure of merit is sought from a ROC curve, the most commonly used one is the area under the ROC curve. An ideal classifier would have a unit area under the ROC curve.
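Sweeping $t$ and collecting the pairs $(P_{FA}(t), P_D(t))$ yields the empirical ROC curve, and the area under it can be approximated by the trapezoidal rule. The sketch below is one possible implementation under the same assumptions as before (separate arrays of $h(X)$ values per class, ties assigned to the normal class); `roc_curve` and `auc` are hypothetical helpers defined here, not the authors' code or a library API. The number of thresholds is an assumed parameter; a coarse grid can skip operating points.

```python
import numpy as np


def roc_curve(h_normal, h_path, num_thresholds=200):
    """Empirical ROC: sweep t uniformly over [h_min, h_max] and record (P_FA, P_D)."""
    h_normal = np.asarray(h_normal, dtype=float)
    h_path = np.asarray(h_path, dtype=float)
    all_h = np.concatenate([h_normal, h_path])
    # Start just below h_min so that the corner (1, 1) is part of the curve;
    # t = h_max gives the corner (0, 0).
    thresholds = np.linspace(all_h.min() - 1e-9, all_h.max(), num_thresholds)
    p_fa = np.array([np.mean(h_normal > t) for t in thresholds])
    p_d = np.array([np.mean(h_path > t) for t in thresholds])
    return p_fa, p_d, thresholds


def auc(p_fa, p_d):
    """Area under the ROC curve by the trapezoidal rule."""
    order = np.lexsort((p_d, p_fa))  # sort by P_FA, breaking ties by P_D
    x, y = p_fa[order], p_d[order]
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))
```

Perfectly separated classes give an area of 1, while identical score distributions for both classes give 0.5 (the chance diagonal).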

Besides visualizing classifier performance, the ROC curve can be used to select the most appropriate decision threshold for a particular application [47]. In this case, one has to resort to the costs $c_{ij}$, $i, j = 1, 2$, shown in the upper two rows of Table 1. Clearly, $c_{12}$ and $c_{21}$ are related to a false negative and a false positive classification, respectively, while $c_{11}$ and $c_{22}$ refer to the costs of true negative and true positive classifications. A particular operating point $(P_{FA}(t), P_D(t))$ at threshold $t$ is associated with the expected cost [47]:
$$EC(t) = P_1(c_{21} - c_{11})P_{FA}(t) + P_2(c_{22} - c_{12})P_D(t) + P_1 c_{11} + P_2 c_{12}, \tag{5}$$

which defines a set of straight lines with slope

$$\alpha = -\frac{P_1}{P_2}\,\frac{c_{21} - c_{11}}{c_{22} - c_{12}} \tag{6}$$

on the $(P_{FA}(t), P_D(t))$ plane. Among these lines, the one that touches the ROC curve determines the best operating point, that is, the threshold that minimizes the expected cost. If the ROC curve has been obtained by means of a parametric model, it is a smooth curve, and the best operating point is where the line is tangent to the ROC curve [50]. When the ROC curve is defined with respect to a finite number of experimental measurements connected with straight lines, the optimal operating point can be determined as the point where a line with slope $\alpha$, moving downwards from the top-left corner of the $(P_{FA}, P_D)$ plane, first touches the ROC curve [51]. Such a point lies on the ROC convex hull, that is, the smallest convex set containing the points of the ROC curve [47].

Table 1: Costs for voice pathology detection with reject option.

| Detector’s decision | Actual diagnosis: Normal (1) | Actual diagnosis: Pathological (2) |
| Normal (1)          | $c_{11}$                     | $c_{12}$                           |
| Pathological (2)    | $c_{21}$                     | $c_{22}$                           |
| Reject              | $c_{R1}$ (CRN)               | $c_{R2}$ (CRP)                     |
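Because (5) is linear in $(P_{FA}, P_D)$, its minimum over a finite set of ROC points is attained at a vertex of their convex hull. Assuming $c_{22} < c_{12}$, evaluating (5) at every point and keeping the smallest value therefore selects the same point that a line of slope $\alpha$ from (6) touches first when moved down from the top-left corner (up to ties). A minimal sketch, with the costs of Table 1 passed as a dictionary whose keys (such as `"c12"`) are hypothetical, as are the function names:

```python
import numpy as np


def expected_cost(p_fa, p_d, p1, p2, c):
    """Eq. (5); c maps 'c11', 'c12', 'c21', 'c22' to the costs of Table 1."""
    return (p1 * (c["c21"] - c["c11"]) * p_fa
            + p2 * (c["c22"] - c["c12"]) * p_d
            + p1 * c["c11"] + p2 * c["c12"])


def iso_cost_slope(p1, p2, c):
    """Slope alpha of the iso-cost lines, eq. (6)."""
    return -(p1 / p2) * (c["c21"] - c["c11"]) / (c["c22"] - c["c12"])


def best_operating_point(p_fa, p_d, thresholds, p1, p2, c):
    """Index, threshold and expected cost of the ROC point minimizing eq. (5)."""
    ec = expected_cost(np.asarray(p_fa, dtype=float),
                       np.asarray(p_d, dtype=float), p1, p2, c)
    k = int(np.argmin(ec))
    return k, thresholds[k], float(ec[k])
```

With equal priors and unit misclassification costs, $\alpha = 1$, and the selected point is the one maximizing $P_D - P_{FA}$.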
3. Dichotomizers with Reject Option
Given $X$, the conditional error (or risk) of the Bayes classifier for minimum error (1) is

$$r(X) = \frac{\min\{P_1 p_1(X),\, P_2 p_2(X)\}}{p(X)}, \qquad p(X) = P_1 p_1(X) + P_2 p_2(X). \tag{7}$$
When $r(X)$ is close to 0.5, decision-making can be postponed by introducing a reject test. By setting a threshold $\theta$ for $r(X)$, the reject region is defined as [43]

$$r(X) \geq \theta \iff -\ln\frac{1-\theta}{\theta} + \ln\frac{P_1}{P_2} \leq h(X) \leq \ln\frac{1-\theta}{\theta} + \ln\frac{P_1}{P_2}. \tag{8}$$
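For a given $\theta$, the two bounds in (8) are easy to evaluate, and doing so shows how the reject region shrinks as $\theta$ approaches 0.5. A minimal sketch; `reject_interval` is a hypothetical helper, and $\theta$ is restricted to $(0, 0.5]$ because the conditional error $r(X)$ of a two-class problem never exceeds 0.5.

```python
import math


def reject_interval(theta, p1, p2):
    """Bounds (low, high) of the reject region (8) on the discriminant h(X).

    theta is the threshold on the conditional error r(X), with 0 < theta <= 0.5;
    p1 and p2 are the a priori probabilities of Omega_1 and Omega_2.
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    half_width = math.log((1.0 - theta) / theta)  # ln((1 - theta) / theta) >= 0
    center = math.log(p1 / p2)                    # ln(P1 / P2)
    return center - half_width, center + half_width
```

With equal priors and $\theta = 0.2$, the interval is approximately $(-1.386, 1.386)$, i.e., $\pm\ln 4$; at $\theta = 0.5$ it collapses to a single point, so the reject option effectively disappears.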
Thus, whenever (8) is satisfied, the sample $X$ is rejected. That is, no decision is taken by the classifier, and further advice is requested from a medical doctor in the context of the application discussed in this paper. Samples in $\Omega_1$ satisfying $h(X) > \ln((1-\theta)/\theta) + \ln(P_1/P_2)$ are misclassified (FP). Similarly, samples in $\Omega_2$ satisfying $h(X) < -\ln((1-\theta)/\theta) + \ln(P_1/P_2)$ are misclassified (FN). Equation (8) suggests modifying the linear classifier decision rule (4) by introducing two thresholds $t_1$ and $t_2$ with $t_1 \leq t_2$ as follows:
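$$\begin{cases} X \in \Omega_1 \ (\text{N}) & \text{if } h(X) < t_1, \\ X \in \Omega_2 \ (\text{P}) & \text{if } h(X) > t_2, \\ X \text{ is rejected} & \text{if } t_1 \leq h(X) \leq t_2. \end{cases} \tag{9}$$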
Obviously, (9) implies that, while the probability of rejection is a fraction of all test samples, the probability of false alarm and the probability of detection are now fractions of the test samples that are not rejected. That is, the denominators in the estimates of these probabilities now differ from those without rejection.
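The change of denominators can be made explicit in code. A minimal sketch of rule (9) and of the resulting rates, assuming separate arrays of $h(X)$ values for normal and pathological test samples; `classify_with_reject` and `rates_with_reject` are hypothetical names, and returning NaN when every sample of a class is rejected is an assumed convention.

```python
import numpy as np


def classify_with_reject(h, t1, t2):
    """Rule (9): 1 = normal (h < t1), 2 = pathological (h > t2), 0 = rejected."""
    h = np.asarray(h, dtype=float)
    labels = np.zeros(h.shape, dtype=int)  # t1 <= h <= t2: rejected
    labels[h < t1] = 1
    labels[h > t2] = 2
    return labels


def rates_with_reject(h_normal, h_path, t1, t2):
    """P_FA and P_D over the non-rejected samples, plus the overall reject rate."""
    d_normal = classify_with_reject(h_normal, t1, t2)
    d_path = classify_with_reject(h_path, t1, t2)
    kept_normal = d_normal != 0
    kept_path = d_path != 0
    # Denominators count only the samples that were not rejected; the rate is
    # undefined (NaN) if every sample of a class is rejected.
    p_fa = np.mean(d_normal[kept_normal] == 2) if kept_normal.any() else float("nan")
    p_d = np.mean(d_path[kept_path] == 2) if kept_path.any() else float("nan")
    n_rejected = np.count_nonzero(~kept_normal) + np.count_nonzero(~kept_path)
    p_reject = n_rejected / (d_normal.size + d_path.size)
    return float(p_fa), float(p_d), float(p_reject)
```

For example, if one of four pathological samples is rejected and two of the remaining three are detected, $P_D = 2/3$ rather than $2/4$.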
In a sample-based approach, we may set $t_1 = t - \vartheta$ and $t_2 = t + \vartheta$, where $t$ admits values uniformly spaced in the interval $[h_{\min}, h_{\max}]$ with $h_{\min} = \min_{X \in (\Omega_1 \cup \Omega_2)} \{h(X)\}$ and $h_{\max} = \max_{X \in (\Omega_1 \cup \Omega_2)} \{h(X)\}$, while $\vartheta = \gamma \Delta t$, where $\Delta t$ is the step increment of $t$ and $\gamma$ is a small integer. This symmetric choice does not harm the validity of the analysis that follows, which holds for generic (asymmetric) thresholds $t_1$ and $t_2$ [47]. Let $\mathcal{T}$ be the set of discrete thresholds determined by the procedure just described for $t$. One may set $t_1 \in \mathcal{T}$ and $t_2 \in \mathcal{T}$ so that $t_2 > t_1$.
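The threshold set $\mathcal{T}$ and the symmetric pairs $(t_1, t_2)$ described above can be generated as follows. This is a minimal sketch, assuming the discriminant values of all samples are pooled in one array; `threshold_grid` and `symmetric_thresholds` are hypothetical names, and keeping both $t_1$ and $t_2$ inside $\mathcal{T}$ is an assumed constraint that simply drops pairs running off the grid.

```python
import numpy as np


def threshold_grid(h_values, num_steps):
    """Uniformly spaced thresholds T over [h_min, h_max] and the step Delta t."""
    h_values = np.asarray(h_values, dtype=float)
    grid, step = np.linspace(h_values.min(), h_values.max(), num_steps, retstep=True)
    return grid, float(step)


def symmetric_thresholds(grid, gamma):
    """All (t, t1, t2) with t1 = t - gamma*dt and t2 = t + gamma*dt inside T.

    Working with grid indices keeps t1 and t2 exactly on the grid T.
    """
    triples = []
    for i in range(gamma, len(grid) - gamma):
        triples.append((float(grid[i]), float(grid[i - gamma]), float(grid[i + gamma])))
    return triples
```

Setting `gamma = 0` gives $t_1 = t_2 = t$, which recovers the classifier without reject option.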
3.1. ROC Curve with Reject Option. When a reject option is introduced in the classifier design, the costs for rejection should be inserted in the last row of Table 1. The optimal values of $t$ and $\vartheta$ (or $\gamma$) should be determined so that two conflicting requirements are fulfilled, namely, classification error reduction and a limited reject region, in order to preserve as many correct classifications as possible. Following similar lines to [47], it can be shown that the expected cost associated with the classification (9) is now a function of two variables and is given by
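$$EC(t, \vartheta) = \Phi_2(t + \vartheta) - \Phi_1(t - \vartheta) + P_2 c_{12} + P_1 c_{11}, \tag{10}$$

where

$$\begin{aligned} \Phi_1(t - \vartheta) &= P_2(c_{12} - c_{R2})P_D(t - \vartheta) + P_1(c_{11} - c_{R1})P_{FA}(t - \vartheta), \\ \Phi_2(t + \vartheta) &= P_2(c_{22} - c_{R2})P_D(t + \vartheta) + P_1(c_{21} - c_{R1})P_{FA}(t + \vartheta). \end{aligned} \tag{11}$$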
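Equations (10) and (11) can be evaluated directly on the empirical rates, which gives a brute-force way to locate a good pair $(t, \vartheta)$ on the grid. The sketch below is a swapped-in exhaustive search, not the hull-based procedure of this section; it assumes separate arrays of $h(X)$ values per class, and `expected_cost_reject`, `best_reject_thresholds`, and the cost-dictionary keys are hypothetical names. With $t_1 = t_2$ it reduces to (5).

```python
import numpy as np


def expected_cost_reject(h_normal, h_path, t1, t2, p1, p2, c):
    """Eqs. (10)-(11) with t1 = t - vartheta and t2 = t + vartheta.

    P_D(t) and P_FA(t) are the fractions of pathological and normal test samples
    with h(X) > t, taken over all test samples. c maps 'c11', 'c12', 'c21',
    'c22', 'cR1', 'cR2' to the costs of Table 1. A sample with h(X) exactly
    equal to t1 is priced as a normal decision here.
    """
    h_normal = np.asarray(h_normal, dtype=float)
    h_path = np.asarray(h_path, dtype=float)

    def p_d(t):
        return np.mean(h_path > t)

    def p_fa(t):
        return np.mean(h_normal > t)

    phi1 = p2 * (c["c12"] - c["cR2"]) * p_d(t1) + p1 * (c["c11"] - c["cR1"]) * p_fa(t1)
    phi2 = p2 * (c["c22"] - c["cR2"]) * p_d(t2) + p1 * (c["c21"] - c["cR1"]) * p_fa(t2)
    return float(phi2 - phi1 + p2 * c["c12"] + p1 * c["c11"])


def best_reject_thresholds(h_normal, h_path, grid, p1, p2, c, max_gamma):
    """Exhaustive search over t in T and gamma = 0..max_gamma; returns (EC, t, gamma)."""
    best = None
    for gamma in range(max_gamma + 1):
        for i in range(gamma, len(grid) - gamma):
            t1, t2 = grid[i - gamma], grid[i + gamma]
            ec = expected_cost_reject(h_normal, h_path, t1, t2, p1, p2, c)
            if best is None or ec < best[0]:
                best = (ec, float(grid[i]), gamma)
    return best
```

Because `gamma = 0` is part of the search, the returned cost can never exceed the best cost without rejection on the same grid.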
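The optimal $t$ and $\vartheta$ satisfy $\nabla_{t,\vartheta} EC(t, \vartheta) = 0$. This is equivalent to

$$\begin{aligned}
P_2(c_{22} - c_{R2})\frac{\partial P_D(t_2)}{\partial t_2} + P_1(c_{21} - c_{R1})\frac{\partial P_{FA}(t_2)}{\partial t_2} - P_2(c_{12} - c_{R2})\frac{\partial P_D(t_1)}{\partial t_1} - P_1(c_{11} - c_{R1})\frac{\partial P_{FA}(t_1)}{\partial t_1} &= 0, \\
P_2(c_{22} - c_{R2})\frac{\partial P_D(t_2)}{\partial t_2} + P_1(c_{21} - c_{R1})\frac{\partial P_{FA}(t_2)}{\partial t_2} + P_2(c_{12} - c_{R2})\frac{\partial P_D(t_1)}{\partial t_1} + P_1(c_{11} - c_{R1})\frac{\partial P_{FA}(t_1)}{\partial t_1} &= 0,
\end{aligned} \tag{12}$$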
where the following change of variables has been made: $t_1 = t - \vartheta$ and $t_2 = t + \vartheta$. By adding and subtracting the two equations in (12), we arrive at

$$\begin{aligned}
P_2(c_{22} - c_{R2})\frac{\partial P_D(t_2)}{\partial t_2} + P_1(c_{21} - c_{R1})\frac{\partial P_{FA}(t_2)}{\partial t_2} &= 0, \\
P_2(c_{12} - c_{R2})\frac{\partial P_D(t_1)}{\partial t_1} + P_1(c_{11} - c_{R1})\frac{\partial P_{FA}(t_1)}{\partial t_1} &= 0.
\end{aligned} \tag{13}$$
Figure 1: (a) Experimental ROC curves ($P_D$ versus $P_{FA}$) of the linear classifier tested for vocal fold paralysis detection in men without reject option (dashed line) and with reject option (solid line). (b) Zoom-in of the ROC curves.
The set of equations (13) defines two straight lines with slopes

$$\alpha_1 = -\frac{P_1}{P_2}\,\frac{c_{21} - c_{R1}}{c_{22} - c_{R2}}, \tag{14}$$

$$\alpha_2 = -\frac{P_1}{P_2}\,\frac{c_{11} - c_{R1}}{c_{12} - c_{R2}} \tag{15}$$

on the $(P_{FA}, P_D)$ plane. Equations (14) and (15) are valid for generic $t_1$ and $t_2$. The set of equations (13) suggests that the straight lines of slopes $\alpha_1$ and $\alpha_2$ should touch the convex hull of the ROC curve without reject option at two distinct points: the line of slope $\alpha_1$ at the point with implicit parameter $t_2$ and the line of slope $\alpha_2$ at the point with implicit parameter $t_1$, such that $t_1 < t_2$. Each of these points can be found by a simple search of the edges of the ROC convex hull derived without the reject option [47]. Having found $t_1$ and $t_2$, the equations $t_1 = t - \vartheta$ and $t_2 = t + \vartheta$ are then solved for $t$ and $\vartheta$, that is, $t = (t_1 + t_2)/2$ and $\vartheta = (t_2 - t_1)/2$. Clearly, the estimates of $t$ and $\vartheta$ just derived are initial ones, because they depend on the resolution of the convex hull of the ROC curve without rejection, which is estimated from the threshold values $t \in \mathcal{T}$. The initial estimates of $t$ and $\vartheta$ can be corrected when the operating point they define lies inside the convex hull of the ROC curve with rejection. Since the probability of false alarm and the probability of detection in the latter ROC curve are fractions of the test samples that are not rejected, the lines of slope $\alpha$ given by (6) should touch the convex hull of the ROC curve with rejection at the optimal operating point. The values of $t$ and $\vartheta$ at this optimal operating point are better estimates than the initial ones. If the initial estimates of $t$ and $\vartheta$ define an operating point outside the convex hull of the ROC curve with rejection, then no further correction is needed, because such an operating point defines a new vertex of the convex hull, linked by two new edges to the nearest vertices already included in the available convex hull. Obviously, the new vertex will be the point where the lines of slope $\alpha$ touch the updated convex hull.
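The initial estimation step can be sketched as follows, under stated assumptions: the ROC curve without rejection is given as arrays of $(P_{FA}, P_D)$ points together with their thresholds, both slopes are positive, and the costs are passed as a dictionary with hypothetical keys. Rather than walking the hull edges explicitly, the sketch uses the fact that the point first touched by a line of slope $\alpha$ moved down from the top-left corner maximizes $P_D - \alpha P_{FA}$, and such a maximizer is always a vertex of the ROC convex hull. The function names are hypothetical, and the correction step on the ROC curve with rejection is not implemented.

```python
import numpy as np


def slopes_with_reject(p1, p2, c):
    """alpha_1 and alpha_2 of eqs. (14) and (15)."""
    a1 = -(p1 / p2) * (c["c21"] - c["cR1"]) / (c["c22"] - c["cR2"])
    a2 = -(p1 / p2) * (c["c11"] - c["cR1"]) / (c["c12"] - c["cR2"])
    return a1, a2


def touching_threshold(p_fa, p_d, thresholds, alpha):
    """Threshold of the ROC point first touched by a line of slope alpha swept
    downwards from the top-left corner, i.e. the argmax of P_D - alpha * P_FA."""
    score = np.asarray(p_d, dtype=float) - alpha * np.asarray(p_fa, dtype=float)
    return float(thresholds[int(np.argmax(score))])


def initial_t_and_vartheta(p_fa, p_d, thresholds, p1, p2, c):
    """Initial estimates t = (t1 + t2)/2 and vartheta = (t2 - t1)/2."""
    a1, a2 = slopes_with_reject(p1, p2, c)
    t2 = touching_threshold(p_fa, p_d, thresholds, a1)  # first equation of (13)
    t1 = touching_threshold(p_fa, p_d, thresholds, a2)  # second equation of (13)
    if t1 > t2:
        raise ValueError("these costs give t1 > t2, i.e. no reject region")
    return (t1 + t2) / 2.0, (t2 - t1) / 2.0
```

With equal priors, unit misclassification costs, zero costs for correct decisions, and a rejection cost of 0.2 for both classes, the slopes are $\alpha_1 = 4$ and $\alpha_2 = 0.25$: the steep line touches the curve near its low-$P_{FA}$ end (large $t_2$), the shallow one near its high-$P_{FA}$ end (small $t_1$).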
4. Datasets and Feature Extraction
The MEEI database was released in 1994 [37]. It contains over 1400 voice signals from approximately 700 subjects. Two different kinds of recordings were collected: in each session, the subjects were asked to articulate the sustained vowel “ah” (/a/) and to read the “rainbow passage”. The database contains recordings of the vowel “ah” (53 normal and 657 pathological utterances) and of continuous speech (53 normal and 661 pathological utterances). The discussion focuses on the sustained vowel recordings, while first results on the “rainbow passage” recordings are also reported. The recordings were performed under matching acoustic conditions using Kay’s Computerized Speech Lab. Each subject was asked to produce a sustained phonation of the vowel “ah” at a comfortable pitch and loudness for at least 3 seconds. The process was repeated three times for each subject, and a speech pathologist chose the best sample for the database. The recordings of the sustained vowel were made at a sampling rate of 25 kHz for patients and 50 kHz for healthy subjects. In the latter case, the sampling rate was reduced to 25 kHz by down-sampling. The normal voice recordings are about 5 seconds long, whereas the pathological ones are about 3 seconds long. The major asset of the MEEI database is the clinical assessment of the subjects as well as the availability of the subjects’ personal details. However, there are several drawbacks, which are carefully identified in [21].
Figure 2: (a) Convex hull of the experimental ROC curve ($P_D$ versus $P_{FA}$) of the linear classifier without reject option (solid line) with the level lines of slope $\alpha$ (dashed lines) overlaid. (b) Zoom-in of (a): the arrow points to the optimal operating point $(P_{FA}, P_D) = (0.0252, 0.9296)$.

Due to the inherent differences in the speech production system of male and female subjects, it makes sense to deal with disordered speech detection separately for each gender.
Two experiments are conducted. The first experiment con-
cerns vocal fold paralysis detection and the dataset comprises
recordings from 21 males aged 26 to 60 years, who were
medically diagnosed as normal, and another 21 males aged
20 to 75 years, who were medically diagnosed with vocal
fold paralysis. The second experiment concerns vocal fold
edema detection, where 21 females aged 22 to 52 years,
who were medically diagnosed as normal, and another 21
females aged 18 to 57 years, who were medically diagnosed
with vocal fold edema served as subjects. The subjects
might suffer from other diseases too, such as hyperfunction,
ventricular compression, atrophy, teflon granuloma, and
so forth. Although a multi-label classification framework
would be more appropriate, we will assume a sort of
tying in this paper by ignoring the other connotations, so
that enough design and test samples are available for our
study. Multi-label classification is left for future research.
However, the linear classifier studied in the paper requires
only the estimation of the class-conditional mean vectors
and the gross dispersion matrix. Accordingly, the number of
adjustable parameters is not high.
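The following NumPy sketch illustrates how few parameters such a classifier needs. It assumes that (4) is the usual Bayes rule for two Gaussian classes with a common covariance matrix, that is, a linear score $h(x)$ compared against a threshold $t$, with larger scores favoring the pathological class. The pooled within-class covariance used here is an assumption standing in for the gross dispersion matrix, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def fit_linear_classifier(X1, X2):
    """Estimate the class means and a shared covariance matrix.

    X1: (n1, d) design samples of class 1 (normal);
    X2: (n2, d) design samples of class 2 (pathological).
    Assumption: the shared covariance is the pooled within-class estimate.
    """
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    S = ((n1 - 1) * np.cov(X1, rowvar=False)
         + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    w = np.linalg.solve(S, m2 - m1)      # S^{-1} (m2 - m1)
    b = -0.5 * (m1 + m2) @ w             # bias of the Gaussian log-likelihood ratio
    return w, b

def score(X, w, b):
    """h(x) = w^T x + b, the plug-in log-likelihood ratio ln p(x|2)/p(x|1)."""
    return X @ w + b

# Usage on synthetic 14-dimensional data (hypothetical, not MEEI features).
rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1.0, size=(500, 14))
X2 = rng.normal(1.0, 1.0, size=(200, 14))
w, b = fit_linear_classifier(X1, X2)
h1, h2 = score(X1, w, b), score(X2, w, b)
# Threshold t = 0 here; cost- and prior-dependent thresholds shift it.
accuracy = (np.sum(h1 <= 0) + np.sum(h2 > 0)) / (len(h1) + len(h2))
```

With 14 LPCs this amounts to two 14-dimensional mean vectors and one symmetric 14 × 14 matrix, i.e., 28 + 105 = 133 free parameters.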
As in [29, 32], 14 LPCs are extracted for each speech frame. The speech frames have a duration of 20 ms, and neighboring frames do not overlap. A rectangular window is used to extract the speech frames. By varying the number of LPCs from 14 to 30, we found that the probability of correct classification for both voice pathologies does not improve enough to justify linear prediction analysis of order higher than 14. On the contrary, using more than 14 LPCs was found to frequently deteriorate the probability of correct classification.
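A minimal NumPy sketch of the front end just described: non-overlapping 20 ms rectangular frames and 14 LPCs per frame. The use of the autocorrelation method with the Levinson-Durbin recursion, the sign convention $x[n] \approx \sum_k a_k x[n-k]$, the absence of pre-emphasis, and dropping a trailing partial frame are assumptions; the section does not specify them.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for a[1..order] from autocorrelation r.

    Convention: x[n] is predicted as sum_k a[k] x[n-k], k = 1..order.
    """
    a = np.zeros(order + 1)
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient of stage i.
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        a[i] = k
        a[1:i] = a_prev[1:i] - k * a_prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return a[1:]

def lpc_features(x, fs, order=14, frame_ms=20.0):
    """One LPC vector per non-overlapping rectangular frame of frame_ms."""
    n = int(round(fs * frame_ms / 1000.0))   # 500 samples at 25 kHz
    n_frames = len(x) // n                   # trailing partial frame dropped
    feats = np.zeros((n_frames, order))
    for f in range(n_frames):
        frame = x[f * n:(f + 1) * n]         # rectangular window = plain slice
        # Biased autocorrelation for lags 0..order.
        r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
        if r[0] <= 0.0:                      # silent frame: leave zeros
            continue
        feats[f] = levinson_durbin(r, order)
    return feats
```

Applied to a 25 kHz signal, `lpc_features(x, 25000)` returns an array of shape (number of frames, 14), one feature vector (sample) per frame.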
In the first experiment, the sample set consists of 4236 14-dimensional feature vectors (i.e., samples), of which 3171 samples were extracted from normal utterances of the sustained vowel “ah” and the remaining 1065 samples were extracted from pathological speech uttered by male speakers. In the second experiment, the sample set consists of 4199 14-dimensional feature vectors, of which 3096 samples were extracted from normal utterances of the sustained vowel “ah” and the remaining 1103 samples were extracted from pathological speech uttered by female speakers. For each experiment, first experimental results using utterances of the “rainbow passage” are also reported.
Table 2: Arithmetic values of the costs employed for voice pathology detection with reject option.
Detector's decision | Actual diagnosis: Normal (1) | Actual diagnosis: Pathological (2)
Normal (1)          | −1 | 10
Pathological (2)    | 5  | −1
Reject              | 1  | 2
The entry in row $i$ and column $j$ is the cost $c_{ij}$ of deciding $i$ when the actual diagnosis is $j$; the reject row gives $c_{R1}$ and $c_{R2}$.
5. Experimental Results

The assessment of the linear classifier for detecting vocal fold paralysis in men and vocal fold edema in women, with or without reject option, is based on the ROC curve. 80% of the samples have been used in classifier design, and the remaining 20% have been used for testing the classifier. Classifier design aims at estimating the parameters appearing in (4). The costs listed in Table 2 have been used in the study of ROC curves. The negative sign for true positives and true negatives should be interpreted as a gain. The assignment of a higher cost to false negatives (misses) than to false positives (false alarms) is easily understood, since a missed pathology is more harmful than a false alarm. The costs $c_{R2}$ (CRP, the cost of rejecting a pathological sample) and $c_{R1}$ (CRN, the cost of rejecting a normal sample) are chosen so that the inequality
$$\frac{c_{11} - c_{R1}}{c_{12} - c_{R2}} > \frac{c_{21} - c_{R1}}{c_{22} - c_{R2}} \qquad (16)$$
holds [47]. A design strategy is as follows.
Figure 3: Probability of rejection in vocal fold paralysis detection as a function of (a) $t$ and $\vartheta$, (b) $t_1, t_2 \in \mathcal{T}$ with $t_2 \ge t_1$.
Figure 4: (a) Zoom in the convex hull of the ROC ($P_D$ versus $P_{FA}$) without reject option (solid line); the level lines of slope $\alpha_1$ (dashed lines) are overlaid. The arrow points to the optimal operating point $(P_{FA}(t_2), P_D(t_2)) = (0.0252, 0.9296)$. (b) Zoom in the convex hull of the ROC without reject option (solid line); the level lines of slope $\alpha_2$ (dashed lines) are overlaid. The arrow points to the optimal operating point $(P_{FA}(t_1), P_D(t_1)) = (0.0472, 0.9531)$.
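(1) Choose $c_{22} < c_{R2} < c_{12}$, for example, $c_{R2} = 2$.
(2) Let $\eta = (c_{12} - c_{R2})/(c_{R2} - c_{22}) > 0$; for the costs of Table 2 and $c_{R2} = 2$, $\eta = 8/3$.
(3) Then, (16) holds if and only if $c_{R1} < (c_{21}\eta + c_{11})/(\eta + 1)$; for the costs of Table 2, $c_{R1} < 37/11 \approx 3.36$.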
In addition, $c_{R1}$ should be chosen so that the straight lines of slopes $\alpha_1$ and $\alpha_2$ touch the convex hull of the ROC curve without reject option at two distinct points, in order for the reject option to be meaningful. The choice $c_{R1} = 1$ satisfies both requirements. However, any other assignment stemming from the strategy just described could also be used.
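The arithmetic of steps (1)-(3) and the check of (16) can be reproduced with exact fractions, as in the sketch below. The slopes (14) and (15) are not reproduced in this section, so the `slopes` function uses forms derived under an assumed convention: a sample is called pathological when its score exceeds $t_2$, normal when it falls below $t_1$, and rejected otherwise, with class priors $p_1, p_2$. Writing the expected cost as $p_1[c_{11}(1 - P_{FA}(t_1)) + c_{R1}(P_{FA}(t_1) - P_{FA}(t_2)) + c_{21}P_{FA}(t_2)] + p_2[c_{12}(1 - P_D(t_1)) + c_{R2}(P_D(t_1) - P_D(t_2)) + c_{22}P_D(t_2)]$ and collecting the terms in $t_2$ and in $t_1$ gives the slopes $\alpha_1$ and $\alpha_2$ below; under this assumption (16) is equivalent to $\alpha_2 < \alpha_1$. All function names are hypothetical.

```python
from fractions import Fraction

# Table 2: c[i, j] = cost of deciding i when the actual class is j;
# cR[j] = cost of rejecting a class-j sample (1 = normal, 2 = pathological).
c = {(1, 1): Fraction(-1), (1, 2): Fraction(10),
     (2, 1): Fraction(5), (2, 2): Fraction(-1)}
cR = {1: Fraction(1), 2: Fraction(2)}

def inequality_16(c, cR):
    """True if (c11 - cR1)/(c12 - cR2) > (c21 - cR1)/(c22 - cR2), i.e. (16)."""
    lhs = (c[1, 1] - cR[1]) / (c[1, 2] - cR[2])
    rhs = (c[2, 1] - cR[1]) / (c[2, 2] - cR[2])
    return lhs > rhs

def cR1_upper_bound(c, cR2):
    """Steps (1)-(3): return eta and the bound that cR1 must stay below."""
    if not c[2, 2] < cR2 < c[1, 2]:
        raise ValueError("step (1) requires c22 < cR2 < c12")
    eta = (c[1, 2] - cR2) / (cR2 - c[2, 2])
    return eta, (c[2, 1] * eta + c[1, 1]) / (eta + 1)

def slopes(c, cR, p1, p2):
    """Level-line slopes under the assumed threshold convention."""
    r = p1 / p2
    alpha = r * (c[2, 1] - c[1, 1]) / (c[1, 2] - c[2, 2])   # without rejection
    alpha1 = r * (c[2, 1] - cR[1]) / (cR[2] - c[2, 2])      # locates t2
    alpha2 = r * (cR[1] - c[1, 1]) / (c[1, 2] - cR[2])      # locates t1
    return alpha, alpha1, alpha2

eta, bound = cR1_upper_bound(c, cR[2])
print(eta, bound)                  # 8/3 37/11
print(inequality_16(c, cR))        # True
```

Using the design-set class proportions as priors (an assumption), e.g. `slopes(c, cR, 3171 / 4236, 1065 / 4236)` for the paralysis experiment, yields $\alpha_2 < \alpha < \alpha_1$, so the two touching points straddle the operating point chosen without rejection.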
5.1. Vocal Fold Paralysis in Men. The experimental ROC curves of the linear classifier without reject option (4) and with reject option (9), which were derived by counting classifier decisions, are shown in Figure 1.
Figure 5: Zoom in the ROC convex hulls ($P_D$ versus $P_{FA}$) with reject option (solid line) and without reject option (dashed line).
In order to obtain better insight into the detection, the convex hull of the ROC curve without the reject option is first plotted in Figure 2(a). In the same figure, several parallel level lines $P_D(t) = \alpha P_{FA}(t) + \beta(t)$ are overlaid. Clearly, one of these lines passes through the ideal operating point $(P_{FA}(t), P_D(t)) = (0, 1)$. The intercept of this line is $\beta(t)\big|_{\{t:\, P_{FA}(t)=0,\, P_D(t)=1\}} = 1$. Accordingly, to produce the set of parallel lines, one has to vary $\beta$ uniformly in $[0, 1]$. Inspection of Figure 2(b) reveals the optimal operating point $(P_{FA}(t), P_D(t)) = (0.0252, 0.9296)$, where the level lines touch the ROC convex hull. Indeed, the line above the one touching the ROC curve does not determine any feasible point for the classifier, although it exhibits a lower expected cost, while the line below intersects the ROC curve in at least two points, but at a greater expected cost. The easiest method to identify the optimal point is visual inspection of the graph. However, since the vertices of the convex hull have already been determined, one can insert the associated pairs $(P_{FA}(t), P_D(t))$ into (5), sort the vertices in increasing order of expected cost, and read off the operating point that yields the minimum expected cost. Alternatively, one may search the edges of the ROC convex hull as suggested in [47]. All these methods have been successfully tested in all experiments conducted.
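The vertex-sorting procedure just described can be sketched as follows. The expected-cost expression is an assumption (the standard Bayes risk of a single-threshold rule with priors $p_1, p_2$ and the costs of Table 2), since (5) is not reproduced in this section; the score arrays and priors are synthetic, hypothetical stand-ins for the test-set outputs of the linear classifier.

```python
import numpy as np

def roc_points(h1, h2, thresholds):
    """Empirical (P_FA(t), P_D(t)) for the rule 'pathological if h > t'."""
    pfa = np.array([np.mean(h1 > t) for t in thresholds])
    pd = np.array([np.mean(h2 > t) for t in thresholds])
    return pfa, pd

def upper_hull(pfa, pd):
    """Vertices of the ROC convex hull (upper hull incl. (0, 0) and (1, 1))."""
    pts = sorted(set(zip(pfa.tolist(), pd.tolist())) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last vertex while it lies on or below the chord hull[-2] -> p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def expected_cost(pfa, pd, c, p1, p2):
    """Assumed Bayes risk of a threshold rule; c[i, j] = cost(decide i | j)."""
    return (p1 * (c[1, 1] * (1 - pfa) + c[2, 1] * pfa)
            + p2 * (c[1, 2] * (1 - pd) + c[2, 2] * pd))

def best_vertex(hull, c, p1, p2):
    """Sort the hull vertices by expected cost and return the cheapest one."""
    ranked = sorted(hull, key=lambda v: expected_cost(v[0], v[1], c, p1, p2))
    return ranked[0]

# Usage with synthetic scores (hypothetical stand-ins for test-set outputs).
rng = np.random.default_rng(1)
h1 = rng.normal(-1.0, 1.0, 3000)      # class 1 (normal)
h2 = rng.normal(1.5, 1.0, 1000)       # class 2 (pathological)
ts = np.linspace(min(h1.min(), h2.min()), max(h1.max(), h2.max()), 100)
pfa, pd = roc_points(h1, h2, ts)
hull = upper_hull(pfa, pd)
costs = {(1, 1): -1, (1, 2): 10, (2, 1): 5, (2, 2): -1}   # Table 2
p1, p2 = 0.75, 0.25                                       # assumed priors
v = best_vertex(hull, costs, p1, p2)
```

Because the expected cost is linear in $(P_{FA}, P_D)$, its minimum over the ROC points is always attained at a hull vertex, which is why sorting the vertices suffices.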
The introduction of the reject option in (9) induces a probability of rejection, which is plotted in Figure 3 as a function of $t_1$ and $t_2$ when the costs shown in Table 2 are used. Figure 3(a) depicts the probability of rejection as a function of $t$ and $\vartheta$. In particular, $t \in \mathcal{T}$ and 10 equally spaced values of $\vartheta \in [0, 3\Delta t]$ were defined. As expected, the largest probability of rejection (i.e., 0.1804) occurs for $t = -0.7330$ and $\vartheta = 0.2434$, yielding thresholds $t_1$ and $t_2$ in the middle of their domain $\mathcal{T}$. The probability of rejection for $t_1, t_2 \in \mathcal{T}$ with $t_2 \ge t_1$ is plotted in Figure 3(b). It is seen that the generic rejection region may yield large probabilities of rejection, leaving very few test samples to be processed by the classifier. In contrast, far fewer test samples need to be submitted to a clinician for further screening if $t_1, t_2$ are set equal to $t \pm \vartheta$.
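A minimal sketch of how a rejection-probability surface such as Figure 3(a) can be computed from classifier scores with $t_1 = t - \vartheta$ and $t_2 = t + \vartheta$. The decision convention (pathological above $t_2$, normal below $t_1$, reject in between), the reading of $\Delta t$ as the spacing of the $t$ grid, and the synthetic scores are assumptions; the function names are hypothetical.

```python
import numpy as np

def decide_with_reject(h, t1, t2):
    """1 = normal (h < t1), 2 = pathological (h > t2), 0 = reject otherwise."""
    if t2 < t1:
        raise ValueError("need t2 >= t1")
    return np.where(h > t2, 2, np.where(h < t1, 1, 0))

def rejection_grid(h, ts, thetas):
    """Probability of rejection for t1 = t - theta, t2 = t + theta."""
    grid = np.empty((len(thetas), len(ts)))
    for i, th in enumerate(thetas):
        for j, t in enumerate(ts):
            grid[i, j] = np.mean(decide_with_reject(h, t - th, t + th) == 0)
    return grid

# Usage with synthetic scores (hypothetical, not the MEEI test set).
rng = np.random.default_rng(2)
hs = np.concatenate([rng.normal(-1.0, 1.0, 800), rng.normal(1.5, 1.0, 200)])
ts = np.linspace(hs.min(), hs.max(), 100)            # 100 values of t
thetas = np.linspace(0.0, 3 * (ts[1] - ts[0]), 10)   # 10 values in [0, 3*dt]
g = rejection_grid(hs, ts, thetas)                   # shape (10, 100)
```

Since the reject intervals are nested in $\vartheta$, each row of `g` dominates the previous one, and for a fixed $\vartheta$ the rejection peaks where the scores are densest.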
Figure 6: Zoom in the experimental ROC curves ($P_D$ versus $P_{FA}$) of the linear classifier applied to vocal fold edema detection in women without reject option (dashed line) and with reject option (solid line).
In Figure 4(a), the convex hull of the ROC without rejection is plotted along with the level lines having slope $\alpha_1$ given by (14). The points that define the ROC convex hull are indicated by markers. The level lines touch the ROC convex hull at the operating point $(P_{FA}(t_2), P_D(t_2)) = (0.0252, 0.9296)$. The level lines having slope $\alpha_2$ given by (15) touch the convex hull of the ROC without rejection at the operating point $(P_{FA}(t_1), P_D(t_1)) = (0.0472, 0.9531)$, as can be seen in Figure 4(b). The implicit thresholds associated with the two operating points are $t_1 = -0.2822$ and $t_2 = 0.1920$. Indeed, the reject option is useful in the middle of the domain of thresholds $\mathcal{T}$. By applying the procedure described in Section 3.1, the associated probabilities of false alarm and detection with reject option at the optimal operating point are found to be 0.01904 and 0.99484, respectively. It is seen that the introduction of rejection has improved the probability of detection by 6.59% for a probability of false alarm fixed to approximately 2%. The classification accuracy with reject option at the operating point under discussion is 98.47%, that is, 2.13% higher than that measured without rejection. The confidence interval for the classification accuracy can be estimated as in [21], that is,
$$\mathrm{CI} = \pm z_{1-\delta/2}\sqrt{\frac{q\,(1-q)}{N}}, \qquad (17)$$
where $z_{1-\delta/2}$ is the standard Gaussian percentile for confidence level $100(1-\delta)\%$ (e.g., for $\delta = 0.05$, $z_{1-\delta/2} = z_{0.975} \approx 1.96$), $q$ is the experimentally measured classification accuracy, and $N$ is the number of samples. In our case, for $N = 847$ and $q = 0.9847$, (17) yields 0.83%, which indicates that the aforementioned improvement of 2.13% is statistically significant at the 95% confidence level. If $c_{R1}$ is set equal to $-1$ (i.e., a gain is introduced for rejecting normal subjects), which is a permissible policy according to the cost assignment methodology described previously, and all other costs are left intact, the probability of correct classification at the best operating point increases to 98.59%, which yields a statistically significant improvement at the same confidence level (CI = 0.7954%). At the latter operating point, we have $P_{FA} = 0.0172$ and $P_D = 0.994709$ when the reject option is enabled.
Figure 7: (a) Convex hull of the experimental ROC curve ($P_D$ versus $P_{FA}$) of the linear classifier without reject option (solid line) with the level lines of slope $\alpha$ (dashed lines) overlaid. (b) Zoom in (a): the arrow points to the optimal operating point $(P_{FA}, P_D) = (0.0629, 0.7955)$.
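The half-width in (17) is the usual normal-approximation interval for a proportion and can be evaluated with the Python standard library alone, as in the sketch below (the function name is illustrative). For $N = 840$ and $q = 0.94316$ it reproduces the 1.57% reported for vocal fold edema, and it confirms that $z_{0.975} \approx 1.96$.

```python
from math import sqrt
from statistics import NormalDist

def accuracy_ci(q, n, delta=0.05):
    """Half-width of (17): z_{1-delta/2} * sqrt(q * (1 - q) / n)."""
    z = NormalDist().inv_cdf(1 - delta / 2)   # about 1.96 for delta = 0.05
    return z * sqrt(q * (1 - q) / n)

print(f"{accuracy_ci(0.94316, 840):.4f}")     # prints 0.0157
```

An improvement in accuracy is then declared significant when it exceeds this half-width.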
The superiority of the linear classifier with reject option is demonstrated in Figure 5, where only the convex hulls of the ROC curves with reject option (solid line) and without reject option (dashed line) are plotted. It is evident that the area of the convex hull for the ROC with reject option is greater than that without reject option. The area of the convex hull is correlated with the area under the ROC curve, which is frequently used as an objective figure of merit. In particular, the area under the ROC curve was measured to be 0.9868 without rejection and 0.9951 with rejection, when $t_1 = t - \vartheta$ and $t_2 = t + \vartheta$.
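For piecewise-linear ROC curves or lists of hull vertices, such as those compared in Figures 5 and 10, the area under the curve can be computed with the trapezoidal rule. This is a generic sketch (the function name is hypothetical); this section does not state how the reported areas were computed.

```python
import numpy as np

def auc_trapezoid(pfa, pd):
    """Area under the polyline through the (P_FA, P_D) points plus (0,0), (1,1)."""
    pts = sorted(set(zip(pfa, pd)) | {(0.0, 0.0), (1.0, 1.0)})
    x = np.array([p[0] for p in pts])
    y = np.array([p[1] for p in pts])
    # Sum of trapezoids between consecutive points sorted by P_FA.
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

print(auc_trapezoid([0.5], [0.5]))   # chance-level diagonal: 0.5
```

Applied to the vertices of a convex hull, it gives the area under that hull, which can be no smaller than the area under the empirical ROC curve it encloses.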
The same procedure has been applied to a set of 5049 test feature vectors extracted from utterances of the “rainbow passage.” At the optimal operating point with respect to the costs of Table 2, the classifier without reject option yields $P_{FA} = 0.477227$ and $P_D = 0.9358$, and its accuracy is 72.93%. The introduction of the reject option yields $P_{FA} = 0.0686$ and $P_D = 0.91875$ at the optimal operating point, while the probability of correct classification increases to 92.45%. That is, the reject option drastically reduces the probability of false alarm, by approximately 41 percentage points, at a comparable probability of detection. Needless to say, the improvement in classification accuracy is statistically significant.
5.2. Vocal Fold Edema in Women. The experimental ROC curves of the linear classifier without reject option (4) and with reject option (9), derived by counting classifier decisions with the cost assignment shown in Table 2, are plotted in Figure 6.
The convex hull of the ROC curve without reject option is plotted in Figure 7. In the same figure, a set of parallel level lines having the slope given by (6) is overlaid, and the points that define the ROC convex hull are indicated by markers. If the costs shown in Table 2 are employed, the minimum expected cost is found for the threshold that yields the operating point $(P_{FA}(t), P_D(t)) = (0.0629, 0.7955)$, where the level lines touch the ROC convex hull.
The introduction of the reject option in (9) induces a probability of rejection, which is plotted in Figure 8 as a function of $t$ and $\vartheta$. As previously in vocal fold paralysis, 100 equally spaced values in the range $[h_{\min}, h_{\max}]$ were taken for $t$, and 10 equally spaced values of $\vartheta \in [0, 3\Delta t]$ were defined. As expected, the largest probability of rejection occurs in the middle of the domain of $t \pm \vartheta$.
Figure 8: Probability of rejection as a function of $(t_1, t_2)$ for vocal fold edema detection.
In Figure 9(a), the convex hull of the ROC without rejection is plotted along with the level lines having slope $\alpha_1$ given by (14). The points that define the ROC convex hull are indicated by markers. The level lines touch the ROC convex hull at the operating point $(P_{FA}(t_2), P_D(t_2)) = (0.0177, 0.7227)$. The level lines of slope $\alpha_2$ given by (15) touch the convex hull of the ROC without rejection at the operating point $(P_{FA}(t_1), P_D(t_1)) = (0.1322, 0.8590)$, as shown in Figure 9(b). These operating points correspond to $t_1 = -0.2643$ and $t_2 = 0.2937$. By applying the procedure described in Section 3.1, the associated probabilities of false alarm and detection with reject option are found to be 0.02003 and 0.836842, respectively. The classification accuracy with reject option at the best operating point, when the costs of Table 2 are used, is 94.316%, that is, 4.316% higher than that measured without rejection. The confidence interval for the classification accuracy predicted by (17) for $N = 840$ and $q = 0.94316$ is 1.57%, which indicates that the improvement of 4.316% is statistically significant at the 95% confidence level. By fixing the probability of detection to 83.64%, the reject option is found to reduce the probability of false alarm by 9.12%.
Figure 9: (a) Zoom in the convex hull of the ROC ($P_D$ versus $P_{FA}$) without reject option (solid line); the level lines of slope $\alpha_1$ (dashed lines) are overlaid. The arrow points to the optimal operating point $(P_{FA}(t_2), P_D(t_2)) = (0.0177, 0.7227)$. (b) Zoom in the convex hull of the ROC without reject option (solid line); the level lines of slope $\alpha_2$ (dashed lines) are overlaid. The arrow points to the optimal operating point $(P_{FA}(t_1), P_D(t_1)) = (0.1322, 0.8590)$.
Figure 10: Zoom in the ROC convex hulls ($P_D$ versus $P_{FA}$) with reject option (solid line) and without reject option (dashed line).
The superiority of the linear classifier with reject option is demonstrated in Figure 10, where only the convex hulls of the ROC curves with reject option (solid line) and without reject option (dashed line) are plotted. It is evident that the area of the convex hull for the ROC with reject option is greater than that without reject option. In particular, the area under the ROC curve increases from 0.9458 to 0.96 with the introduction of the reject option.
The same procedure has been applied to a set of 3365 test feature vectors extracted from utterances of the “rainbow passage.” At the optimal operating point with respect to the costs of Table 2, the classifier without reject option yields $P_{FA} = 0.5965$ and $P_D = 0.8959$, and its probability of correct classification is 64.96%. The introduction of the reject option yields $P_{FA} = 0.5228$ and $P_D = 0.8853$ at the optimal operating point, while the accuracy increases to 68.8%. That is, the reject option reduces the probability of false alarm by approximately 7.4 percentage points at a comparable probability of detection. The improvement of 3.9% in classification accuracy is statistically significant at the 95% confidence level (CI = 1.57%).
6. Conclusions
The reject option has been shown to improve the accuracy of a linear classifier, compared with that obtained without reject option, in detecting vocal fold paralysis in male patients as well as vocal fold edema in female patients. Moreover, the reported improvements have been shown to be statistically significant at the 95% confidence level. In addition, the linear classifier with reject option outperforms the classifiers previously employed in [29, 32] to detect the aforementioned voice pathologies under exactly the same experimental protocol. Future research will address the introduction of the reject option in the design of the Bayes classifier when Gaussian mixture models approximate the class-conditional probability density functions of the linear prediction coefficients extracted from continuous speech.
References
[1] C. Manfredi, “Voice models and analysis for biomedical applications,” Biomedical Signal Processing and Control, vol. 1, no. 2, pp. 99–101, 2006.
[2] F. Quek, M. Harper, Y. Haciahmetoglou, L. Chen, and L. O. Ramig, “Speech pauses and gestural holds in Parkinson’s disease,” in Proceedings of the 7th International Conference on Spoken Language Processing (ICSLP ’02), pp. 2485–2488, Denver, Colo, USA, September 2002.
[3] L. Will, L. O. Ramig, and J. L. Spielman, “Application of Lee Silverman voice treatment (LSVT) to individuals with multiple sclerosis, ataxic dysarthria, and stroke,” in Proceedings of the 7th International Conference on Spoken Language Processing (ICSLP ’02), pp. 2497–2500, Denver, Colo, USA, September 2002.
[4] P. Enderby and L. Emerson, Does Speech and Language Therapy Work? Singular Publications, 1995.
[5] R. P. Schumeyer and K. E. Barner, “Effect of visual information on word initial consonant perception of dysarthric speech,” in Proceedings of the 4th International Conference on Spoken Language Processing (ICSLP ’96), vol. 1, pp. 46–49, Philadelphia, Pa, USA, October 1996.
[6] K. Mády, R. Sader, A. Zimmermann, et al., “Assessment of consonant articulation in glossectomee speech by dynamic MRI,” in Proceedings of the 7th International Conference on Spoken Language Processing (ICSLP ’02), pp. 961–964, Denver, Colo, USA, September 2002.
[7] R. Schwarz, U. Hoppe, M. Schuster, T. Wurzbacher, U. Eysholdt, and J. Lohscheller, “Classification of unilateral vocal fold paralysis by endoscopic digital high-speed recordings and inversion of a biomechanical model,” IEEE Transactions on Biomedical Engineering, vol. 53, no. 6, pp. 1099–1108, 2006.
[8] V. Parsa and D. G. Jamieson, “Interactions between speech coders and disordered speech,” Speech Communication, vol. 40, no. 7, pp. 365–385, 2003.
[9] M. S. Hawley, P. Green, P. Enderby, S. Cunningham, and R. K. Moore, “Speech technology for e-inclusion of people with physical disabilities and disordered speech,” in Proceedings of the 9th European Conference on Speech Communication and Technology (INTERSPEECH ’05), pp. 445–448, Lisbon, Portugal, September 2005.
[10] F. Plante, H. Kessler, B. Cheetham, and J. Earis, “Speech monitoring of infective laryngitis,” in Proceedings of the 4th International Conference on Spoken Language Processing (ICSLP ’96), vol. 2, pp. 749–752, Philadelphia, Pa, USA, October 1996.
[11] E. J. Wallen and J. H. L. Hansen, “Screening test for speech pathology assessment using objective quality measures,” in Proceedings of the 4th International Conference on Spoken Language Processing (ICSLP ’96), vol. 2, pp. 776–779, Philadelphia, Pa, USA, October 1996.
[12] M. N. Vieira, F. R. McInnes, and M. A. Jack, “Robust F0 and jitter estimation in pathological voices,” in Proceedings of the 4th International Conference on Spoken Language Processing (ICSLP ’96), vol. 2, pp. 745–748, Philadelphia, Pa, USA, October 1996.
[13] P. Mitev and S. Hadjitodorov, “Fundamental frequency estimation of voice of patients with laryngeal disorders,” Information Sciences, vol. 156, no. 1-2, pp. 3–19, 2003.
[14] H. Weiping, W. Xiuxin, and P. Gómez, “Robust pitch extraction in pathological voice based on wavelet and cepstrum,” in Proceedings of the 12th European Signal Processing Conference (EUSIPCO ’04), pp. 297–300, Vienna, Austria, September 2004.
[15] L. Deng, X. Shen, D. Jamieson, and J. Till, “Simulation of disordered speech using a frequency-domain vocal tract model,” in Proceedings of the 4th International Conference on Spoken Language Processing (ICSLP ’96), vol. 2, pp. 768–771, Philadelphia, Pa, USA, October 1996.
[16] B. Gabelman and A. Alwan, “Analysis by synthesis of FM modulation and aspiration noise components in pathological voices,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’02), vol. 1, pp. 449–452, Orlando, Fla, USA, May 2002.
[17] J. Hanquinet, F. Grenez, and J. Schoentgen, “Synthesis of disordered speech,” in Proceedings of the 9th European Conference on Speech Communication and Technology (INTERSPEECH ’05), pp. 1077–1080, Lisbon, Portugal, September 2005.
[18] V. Parsa and D. G. Jamieson, “Acoustic discrimination of pathological voice: sustained vowels versus continuous speech,” Journal of Speech, Language, and Hearing Research, vol. 44, no. 2, pp. 327–339, 2001.
[19] A. McAllister, “Acoustic, perceptual and physiological studies of ten-year-old children’s voices,” Speech, Music and Hearing Quarterly Progress and Status Report, vol. 38, no. 1, 1997.
[20] V. Uloza, V. Saferis, and I. Uloziene, “Perceptual and acoustic assessment of voice pathology and the efficacy of endolaryngeal phonomicrosurgery,” Journal of Voice, vol. 19, no. 1, pp. 138–145, 2005.
[21] N. Sáenz-Lechón, J. I. Godino-Llorente, V. Osma-Ruiz, and P. Gómez-Vilda, “Methodological issues in the development of automatic systems for voice pathology detection,” Biomedical Signal Processing and Control, vol. 1, no. 2, pp. 120–128, 2006.
[22] J. B. Alonso, J. de Leon, I. Alonso, and M. A. Ferrer, “Automatic detection of pathologies in the voice by HOS based parameters,” EURASIP Journal on Applied Signal Processing, vol. 2001, no. 4, pp. 275–284, 2001.
[23] R. B. Reilly, R. Moran, and P. Lacy, “Voice pathology assessment based on a dialogue system and speech analysis,” in Proceedings of the AAAI Fall Symposium on Dialogue Systems for Health Communication, pp. 104–109, Washington, DC, USA, October 2004.
[24] K. Shama, A. Krishna, and N. U. Cholayya, “Study of harmonics-to-noise ratio and critical-band energy spectrum of speech as acoustic indicators of laryngeal and voice pathology,” EURASIP Journal on Advances in Signal Processing, vol. 2007, Article ID 85286, 9 pages, 2007.
[25] P. Gómez, J. I. Godino, F. Rodríguez, et al., “Evidence of vocal cord pathology from the mucosal wave cepstral contents,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’04), vol. 5, pp. 437–440, Montreal, Canada, May 2004.
[26] J. B. Alonso, F. D. de Maria, C. M. Trevieso, and M. A. Ferrer, “Using nonlinear features for voice disorder detection,” in Proceedings of the 3rd International Conference on Non-Linear Speech Processing (NOLISP ’05), pp. 94–106, Barcelona, Spain, 2005.
[27] M. Little, P. McSharry, I. Moroz, and S. Roberts, “Nonlinear, biophysically-informed speech pathology detection,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’06), vol. 2, pp. 1080–1083, Toulouse, France, May 2006.
[28] P. Kukharchik, I. Kheidorov, E. Bovbel, and D. Ladeev, “Speech signal processing based on wavelets and SVM for vocal tract pathology detection,” in Proceedings of the 3rd International Conference on Image and Signal Processing (ICISP ’08), vol. 5099 of Lecture Notes in Computer Science, pp. 192–199, Springer, Cherbourg-Octeville, France, July 2008.
[29] M. Marinaki, C. Kotropoulos, I. Pitas, and N. Maglaveras, “Automatic detection of vocal fold paralysis and edema,” in Proceedings of the International Conference on Spoken Language Processing (ICSLP ’04), pp. 537–540, Jeju, South Korea, October 2004.
[30] P. Gómez, F. Díaz, A. Álvarez, et al., “Principal component analysis of spectral perturbation parameters for voice pathology detection,” in Proceedings of the 18th IEEE Symposium on Computer-Based Medical Systems (CBMS ’05), pp. 41–46, Dublin, Ireland, June 2005.
[31] C. Peng, W. Chen, and B. Wan, “A preliminary study of pathological voice classification,” in Proceedings of the 7th IEEE International Conference on Computer and Information Technology (CIT ’07), pp. 1106–1110, October 2007.
[32] E. Ziogas and C. Kotropoulos, “Detection of vocal fold paralysis and edema using linear discriminant classifiers,” in Proceedings of the 4th Hellenic Conference on Advances in Artificial Intelligence (SETN ’06), vol. 3955 of Lecture Notes in Computer Science, pp. 454–464, Springer, Heraklion, Greece, May 2006.
[33] B. G. A. Aguiar Neto, J. M. Fechine, S. C. Costa, and M. Muppa, “Feature estimation for vocal fold edema detection using short-term cepstral analysis,” in Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering (BIBE ’07), pp. 1158–1162, October 2007.
[34] C. Fredouille, G. Pouchoulin, J.-F. Bonastre, M. Azzarello, A. Giovanni, and A. Ghio, “Application of automatic speaker recognition techniques to pathological voice assessment (dysphonia),” in Proceedings of the 9th European Conference on Speech Communication and Technology (EUROSPEECH ’05), pp. 149–152, Lisbon, Portugal, September 2005.
[35] D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, “Speaker verification using adapted Gaussian mixture models,” Digital Signal Processing, vol. 10, no. 1–3, pp. 19–41, 2000.
[36]
[37] Massachusetts Eye and Ear Infirmary, Voice Disorders Database, Version 1.03, Kay Elemetrics Corp., Lincoln Park, NJ, USA, 1994, CD-ROM.
[38] A. A. Dibazar, S. Narayanan, and T. W. Berger, “Feature analysis for automatic detection of pathological speech,” in Proceedings of the 25th IEEE Annual International Conference of the Engineering in Medicine and Biology, vol. 1, pp. 182–183, 2002.
[39] V. Parsa, D. G. Jamieson, K. Stenning, and H. A. Leeper, “On the estimation of signal-to-noise ratio in continuous speech for abnormal voices,” in Proceedings of the 7th International Conference on Spoken Language Processing (ICSLP ’02), pp. 2505–2508, Denver, Colo, USA, September 2002.
[40] J. Nayak and P. S. Bhat, “Identification of voice disorders using speech samples,” in Proceedings of the 10th IEEE International Conference on Convergent Technologies for Asia-Pacific Region (TENCON ’03), vol. 3, pp. 951–953, 2003.
[41] R. A. Prosek, A. A. Montgomery, B. E. Walden, and D. B. Hawkins, “An evaluation of residue features as correlates of voice disorders,” Journal of Communication Disorders, vol. 20, pp. 105–107, 1987.
[42] M. De Oliveira Rosa, J. C. Pereira, and M. Grellet, “Adaptive estimation of residue signal for voice pathology diagnosis,” IEEE Transactions on Biomedical Engineering, vol. 47, no. 1, pp. 96–104, 2000.
[43] K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, San Diego, Calif, USA, 2nd edition, 1990.
[44] X. Tang and W. Wang, “Dual-space linear discriminant analysis for face recognition,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’04), vol. 2, pp. 1064–1068, 2004.
[45] J. R. Deller, J. G. Proakis, and J. H. L. Hansen, Discrete Time Processing of Speech Signals, MacMillan Publishing Company, New York, NY, USA, 1993.
[46] T. C. W. Landgrebe, D. M. J. Tax, P. Paclík, and R. P. W. Duin, “The interaction between classification and reject performance for distance-based reject-option classifiers,” Pattern Recognition Letters, vol. 27, no. 8, pp. 908–917, 2006.
[47] F. Tortorella, “A ROC-based reject rule for dichotomizers,” Pattern Recognition Letters, vol. 26, no. 2, pp. 167–180, 2005.
[48] C. M. Santos-Pereira and A. M. Pires, “On optimal reject rules and ROC curves,” Pattern Recognition Letters, vol. 26, no. 7, pp. 943–952, 2005.
[49] C. Marrocco, M. Molinara, and F. Tortorella, “An empirical comparison of ideal and empirical ROC-based reject rules,” in Proceedings of the 5th International Conference on Machine Learning and Data Mining (MLDM ’07), vol. 4571 of Lecture Notes in Computer Science, pp. 47–60, 2007.
[50] H. L. Van Trees, Detection, Estimation and Modulation Theory, Part I, John Wiley & Sons, New York, NY, USA, 1968.
[51] M. H. Zweig and G. Campbell, “Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine,” Clinical Chemistry, vol. 39, no. 4, pp. 561–577, 1993.