Combining
Pattern Classifiers
Methods and Algorithms, Second Edition
Ludmila Kuncheva

Copyright © 2014 by John Wiley & Sons, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or
by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as
permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior
written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to
the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400,
fax (978) 646-8600, or on the web at www.copyright.com. Requests to the Publisher for permission
should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street,
Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in
preparing this book, they make no representations or warranties with respect to the accuracy or
completeness of the contents of this book and specifically disclaim any implied warranties of
merchantability or fitness for a particular purpose. No warranty may be created or extended by sales
representatives or written sales materials. The advice and strategies contained herein may not be suitable
for your situation. You should consult with a professional where appropriate. Neither the publisher nor
author shall be liable for any loss of profit or any other commercial damages, including but not limited to
special, incidental, consequential, or other damages.
For general information on our other products and services, please contact our Customer Care
Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print,
however, may not be available in electronic format.
MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does
not warrant the accuracy of the text or exercises in this book. This book’s use or discussion of
MATLAB® software or related products does not constitute endorsement or sponsorship by The
MathWorks of a particular pedagogical approach or particular use of the MATLAB® software.
Library of Congress Cataloging-in-Publication Data
Kuncheva, Ludmila I. (Ludmila Ilieva), 1959–
Combining pattern classifiers : methods and algorithms / Ludmila I. Kuncheva. – Second edition.
pages cm
Includes index.
ISBN 978-1-118-31523-1 (hardback)
1. Pattern recognition systems. 2. Image processing–Digital techniques. I. Title.
TK7882.P3K83 2014
006.4–dc23
2014014214
Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1


To Roumen, Diana and Kamelia

CONTENTS

Preface, xv
Acknowledgements, xxi

1 Fundamentals of Pattern Recognition, 1
1.1 Basic Concepts: Class, Feature, Data Set, 1
1.1.1 Classes and Class Labels, 1
1.1.2 Features, 2
1.1.3 Data Set, 3
1.1.4 Generate Your Own Data, 6
1.2 Classifier, Discriminant Functions, Classification Regions, 9
1.3 Classification Error and Classification Accuracy, 11
1.3.1 Where Does the Error Come From? Bias and Variance, 11
1.3.2 Estimation of the Error, 13
1.3.3 Confusion Matrices and Loss Matrices, 14
1.3.4 Training and Testing Protocols, 15
1.3.5 Overtraining and Peeking, 17
1.4 Experimental Comparison of Classifiers, 19
1.4.1 Two Trained Classifiers and a Fixed Testing Set, 20
1.4.2 Two Classifier Models and a Single Data Set, 22
1.4.3 Two Classifier Models and Multiple Data Sets, 26
1.4.4 Multiple Classifier Models and Multiple Data Sets, 27
1.5 Bayes Decision Theory, 30
1.5.1 Probabilistic Framework, 30
1.5.2 Discriminant Functions and Decision Boundaries, 31
1.5.3 Bayes Error, 33
1.6 Clustering and Feature Selection, 35
1.6.1 Clustering, 35
1.6.2 Feature Selection, 37
1.7 Challenges of Real-Life Data, 40
Appendix, 41
1.A.1 Data Generation, 41
1.A.2 Comparison of Classifiers, 42
1.A.2.1 MATLAB Functions for Comparing Classifiers, 42
1.A.2.2 Critical Values for Wilcoxon and Sign Test, 45
1.A.3 Feature Selection, 47

2 Base Classifiers, 49
2.1 Linear and Quadratic Classifiers, 49
2.1.1 Linear Discriminant Classifier, 49
2.1.2 Nearest Mean Classifier, 52
2.1.3 Quadratic Discriminant Classifier, 52
2.1.4 Stability of LDC and QDC, 53
2.2 Decision Tree Classifiers, 55
2.2.1 Basics and Terminology, 55
2.2.2 Training of Decision Tree Classifiers, 57
2.2.3 Selection of the Feature for a Node, 58
2.2.4 Stopping Criterion, 60
2.2.5 Pruning of the Decision Tree, 63
2.2.6 C4.5 and ID3, 64
2.2.7 Instability of Decision Trees, 64
2.2.8 Random Trees, 65
2.3 The Naïve Bayes Classifier, 66
2.4 Neural Networks, 68
2.4.1 Neurons, 68
2.4.2 Rosenblatt's Perceptron, 70
2.4.3 Multi-Layer Perceptron, 71
2.5 Support Vector Machines, 73
2.5.1 Why Would It Work?, 73
2.5.2 Classification Margins, 74
2.5.3 Optimal Linear Boundary, 76
2.5.4 Parameters and Classification Boundaries of SVM, 78
2.6 The k-Nearest Neighbor Classifier (k-nn), 80
2.7 Final Remarks, 82
2.7.1 Simple or Complex Models?, 82
2.7.2 The Triangle Diagram, 83
2.7.3 Choosing a Base Classifier for Ensembles, 85
Appendix, 85
2.A.1 MATLAB Code for the Fish Data, 85
2.A.2 MATLAB Code for Individual Classifiers, 86
2.A.2.1 Decision Tree, 86
2.A.2.2 Naïve Bayes, 89
2.A.2.3 Multi-Layer Perceptron, 90
2.A.2.4 1-nn Classifier, 92

3 An Overview of the Field, 94
3.1 Philosophy, 94
3.2 Two Examples, 98
3.2.1 The Wisdom of the "Classifier Crowd", 98
3.2.2 The Power of Divide-and-Conquer, 98
3.3 Structure of the Area, 100
3.3.1 Terminology, 100
3.3.2 A Taxonomy of Classifier Ensemble Methods, 100
3.3.3 Classifier Fusion and Classifier Selection, 104
3.4 Quo Vadis?, 105
3.4.1 Reinventing the Wheel?, 105
3.4.2 The Illusion of Progress?, 106
3.4.3 A Bibliometric Snapshot, 107

4 Combining Label Outputs, 111
4.1 Types of Classifier Outputs, 111
4.2 A Probabilistic Framework for Combining Label Outputs, 112
4.3 Majority Vote, 113
4.3.1 "Democracy" in Classifier Combination, 113
4.3.2 Accuracy of the Majority Vote, 114
4.3.3 Limits on the Majority Vote Accuracy: An Example, 117
4.3.4 Patterns of Success and Failure, 119
4.3.5 Optimality of the Majority Vote Combiner, 124
4.4 Weighted Majority Vote, 125
4.4.1 Two Examples, 126
4.4.2 Optimality of the Weighted Majority Vote Combiner, 127
4.5 Naïve-Bayes Combiner, 128
4.5.1 Optimality of the Naïve Bayes Combiner, 128
4.5.2 Implementation of the NB Combiner, 130
4.6 Multinomial Methods, 132
4.7 Comparison of Combination Methods for Label Outputs, 135
Appendix, 137
4.A.1 Matan's Proof for the Limits on the Majority Vote Accuracy, 137
4.A.2 Selected MATLAB Code, 139

5 Combining Continuous-Valued Outputs, 143
5.1 Decision Profile, 143
5.2 How Do We Get Probability Outputs?, 144
5.2.1 Probabilities Based on Discriminant Scores, 144
5.2.2 Probabilities Based on Counts: Laplace Estimator, 147
5.3 Nontrainable (Fixed) Combination Rules, 150
5.3.1 A Generic Formulation, 150
5.3.2 Equivalence of Simple Combination Rules, 152
5.3.3 Generalized Mean Combiner, 153
5.3.4 A Theoretical Comparison of Simple Combiners, 156
5.3.5 Where Do They Come From?, 160
5.4 The Weighted Average (Linear Combiner), 166
5.4.1 Consensus Theory, 166
5.4.2 Added Error for the Weighted Mean Combination, 167
5.4.3 Linear Regression, 168
5.5 A Classifier as a Combiner, 172
5.5.1 The Supra Bayesian Approach, 172
5.5.2 Decision Templates, 173
5.5.3 A Linear Classifier, 175
5.6 An Example of Nine Combiners for Continuous-Valued Outputs, 175
5.7 To Train or Not to Train?, 176
Appendix, 178
5.A.1 Theoretical Classification Error for the Simple Combiners, 178
5.A.1.1 Set-up and Assumptions, 178
5.A.1.2 Individual Error, 180
5.A.1.3 Minimum and Maximum, 180
5.A.1.4 Average (Sum), 181
5.A.1.5 Median and Majority Vote, 182
5.A.1.6 Oracle, 183
5.A.2 Selected MATLAB Code, 183

6 Ensemble Methods, 186
6.1 Bagging, 186
6.1.1 The Origins: Bagging Predictors, 186
6.1.2 Why Does Bagging Work?, 187
6.1.3 Out-of-bag Estimates, 189
6.1.4 Variants of Bagging, 190
6.2 Random Forests, 190
6.3 AdaBoost, 192
6.3.1 The AdaBoost Algorithm, 192
6.3.2 The arc-x4 Algorithm, 194
6.3.3 Why Does AdaBoost Work?, 195
6.3.4 Variants of Boosting, 199
6.3.5 A Famous Application: AdaBoost for Face Detection, 199
6.4 Random Subspace Ensembles, 203
6.5 Rotation Forest, 204
6.6 Random Linear Oracle, 208
6.7 Error Correcting Output Codes (ECOC), 211
6.7.1 Code Designs, 212
6.7.2 Decoding, 214
6.7.3 Ensembles of Nested Dichotomies, 216
Appendix, 218
6.A.1 Bagging, 218
6.A.2 AdaBoost, 220
6.A.3 Random Subspace, 223
6.A.4 Rotation Forest, 225
6.A.5 Random Linear Oracle, 228
6.A.6 ECOC, 229

7 Classifier Selection, 230
7.1 Preliminaries, 230
7.2 Why Classifier Selection Works, 231
7.3 Estimating Local Competence Dynamically, 233
7.3.1 Decision-Independent Estimates, 233
7.3.2 Decision-Dependent Estimates, 238
7.4 Pre-Estimation of the Competence Regions, 239
7.4.1 Bespoke Classifiers, 240
7.4.2 Clustering and Selection, 241
7.5 Simultaneous Training of Regions and Classifiers, 242
7.6 Cascade Classifiers, 244
Appendix: Selected MATLAB Code, 244
7.A.1 Banana Data, 244
7.A.2 Evolutionary Algorithm for a Selection Ensemble for the Banana Data, 245

8 Diversity in Classifier Ensembles, 247
8.1 What Is Diversity?, 247
8.1.1 Diversity for a Point-Value Estimate, 248
8.1.2 Diversity in Software Engineering, 248
8.1.3 Statistical Measures of Relationship, 249
8.2 Measuring Diversity in Classifier Ensembles, 250
8.2.1 Pairwise Measures, 250
8.2.2 Nonpairwise Measures, 251
8.3 Relationship Between Diversity and Accuracy, 256
8.3.1 An Example, 256
8.3.2 Relationship Patterns, 258
8.3.3 A Caveat: Independent Outputs ≠ Independent Errors, 262
8.3.4 Independence Is Not the Best Scenario, 265
8.3.5 Diversity and Ensemble Margins, 267
8.4 Using Diversity, 270
8.4.1 Diversity for Finding Bounds and Theoretical Relationships, 270
8.4.2 Kappa-error Diagrams and Ensemble Maps, 271
8.4.3 Overproduce and Select, 275
8.5 Conclusions: Diversity of Diversity, 279
Appendix, 280
8.A.1 Derivation of Diversity Measures for Oracle Outputs, 280
8.A.1.1 Correlation ρ, 280
8.A.1.2 Interrater Agreement κ, 281
8.A.2 Diversity Measure Equivalence, 282
8.A.3 Independent Outputs ≠ Independent Errors, 284
8.A.4 A Bound on the Kappa-Error Diagram, 286
8.A.5 Calculation of the Pareto Frontier, 287

9 Ensemble Feature Selection, 290
9.1 Preliminaries, 290
9.1.1 Right and Wrong Protocols, 290
9.1.2 Ensemble Feature Selection Approaches, 294
9.1.3 Natural Grouping, 294
9.2 Ranking by Decision Tree Ensembles, 295
9.2.1 Simple Count and Split Criterion, 295
9.2.2 Permuted Features or the "Noised-up" Method, 297
9.3 Ensembles of Rankers, 299
9.3.1 The Approach, 299
9.3.2 Ranking Methods (Criteria), 300
9.4 Random Feature Selection for the Ensemble, 305
9.4.1 Random Subspace Revisited, 305
9.4.2 Usability, Coverage, and Feature Diversity, 306
9.4.3 Genetic Algorithms, 312
9.5 Nonrandom Selection, 315
9.5.1 The "Favorite Class" Model, 315
9.5.2 The Iterative Model, 315
9.5.3 The Incremental Model, 316
9.6 A Stability Index, 317
9.6.1 Consistency Between a Pair of Subsets, 317
9.6.2 A Stability Index for K Sequences, 319
9.6.3 An Example of Applying the Stability Index, 320
Appendix, 322
9.A.1 MATLAB Code for the Numerical Example of Ensemble Ranking, 322
9.A.2 MATLAB GA Nuggets, 322
9.A.3 MATLAB Code for the Stability Index, 324

10 A Final Thought, 326

References, 327

Index, 353


PREFACE

Pattern recognition is everywhere. It is the technology behind automatically identifying fraudulent bank transactions, giving verbal instructions to your mobile phone,
predicting oil deposit odds, or segmenting a brain tumour within a magnetic resonance
image.
A decade has passed since the first edition of this book. Combining classifiers,
also known as “classifier ensembles,” has flourished into a prolific discipline. Viewed
from the top, classifier ensembles reside at the intersection of engineering, computing, and mathematics. Zoomed in, classifier ensembles are fuelled by advances in
pattern recognition, machine learning and data mining, among others. An ensemble aggregates the “opinions” of several pattern classifiers in the hope that the new
opinion will be better than the individual ones. Vox populi, vox Dei.
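To make the idea concrete, here is a minimal sketch of such an aggregation in MATLAB (an assumed toy example, not code from the chapter appendices): three classifiers suggest a label for the same object, and the ensemble goes with the most frequent one.

% A toy majority (plurality) vote over hypothetical label outputs of three classifiers.
suggested_labels = [1 2 1];               % class labels proposed by the three classifiers
ensemble_label = mode(suggested_labels);  % the ensemble adopts the most frequent label: 1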
The interest in classifier ensembles received a welcome boost due to the high-profile Netflix contest. The world’s research creativeness was challenged using a
difficult task and a substantial reward. The problem was to predict whether a person
will enjoy a movie based on their past movie preferences. A Grand Prize of $1,000,000
was to be awarded to the team who first achieved a 10% improvement on the classification accuracy of the existing system Cinematch. The contest was launched in
October 2006, and the prize was awarded in September 2009. The winning solution
was nothing else but a rather fancy classifier ensemble.
What is wrong with the good old single classifiers? Jokingly, I often put up a slide
in presentations with a multiple-choice question. The question is “Why classifier
ensembles?” and the three possible answers are:
(a) because we like to complicate entities beyond necessity (anti-Occam’s
razor);
(b) because we are lazy and stupid and cannot be bothered to design and train one
single sophisticated classifier; and
(c) because democracy is so important to our society, it must be important to
classification.
Funnily enough, the real answer hinges on choice (b). Of course, it is not a matter
of laziness or stupidity, but the realization that a complex problem can be elegantly
solved using simple and manageable tools. Recall the invention of the error backpropagation algorithm followed by the dramatic resurfacing of neural networks in
the 1980s. Neural networks were proved to be universal approximators with unlimited flexibility. They could approximate any classification boundary in any number
of dimensions. This capability, however, comes at a price. Large structures with
a vast number of parameters have to be trained. The initial excitement cooled
down as it transpired that massive structures cannot be easily trained with sufficient guarantees of good generalization performance. Until recently, a typical neural
network classifier contained one hidden layer with a dozen neurons, sacrificing the so
acclaimed flexibility but gaining credibility. Enter classifier ensembles! Ensembles
of simple neural networks are among the most versatile and successful ensemble
methods.

But the story does not end here. Recent studies have rekindled the excitement
of using massive neural networks drawing upon hardware advances such as parallel
computations using graphics processing units (GPU) [75]. The giant data sets necessary for training such structures are generated by small distortions of the available set.
These conceptually different rival approaches to machine learning can be regarded
as divide-and-conquer and brute force, respectively. It seems that the jury is still out
about their relative merits. In this book we adopt the divide-and-conquer approach.
THE PLAYING FIELD
Writing the first edition of the book felt like the overwhelming task of bringing
structure and organization to a hoarder’s attic. The scenery has changed markedly
since then. The series of workshops on Multiple Classifier Systems (MCS), run
since 2000 by Fabio Roli and Josef Kittler [338], served as a beacon, inspiration,
and guidance for experienced and new researchers alike. Excellent surveys shaped
the field, among which are the works by Polikar [311], Brown [53], and Valentini
and Re [397]. Better still, four recent texts together present accessible, in-depth,
comprehensive, and exquisite coverage of the classifier ensemble area: Rokach [335],
Zhou [439], Schapire and Freund [351], and Seni and Elder [355]. This gives me the
comfort and luxury to be able to skim over topics which are discussed at length and
in-depth elsewhere, and pick ones which I believe deserve more exposure or which I
just find curious.
As in the first edition, I have no ambition to present an accurate snapshot of the
state of the art. Instead, I have chosen to explain and illustrate some methods and
algorithms, giving sufficient detail so that the reader can reproduce them in code.

Although I venture an opinion based on general consensus and examples in the text,
this should not be regarded as a guide for preferring one method to another.
SOFTWARE
A rich set of classifier ensemble methods is implemented in WEKA [167], a collection of machine learning algorithms for data-mining tasks. PRTools is a MATLAB
toolbox for pattern recognition developed by the Pattern Recognition Research Group
of the TU Delft, The Netherlands, led by Professor R. P. W. (Bob) Duin. An industry-oriented spin-off toolbox, called “perClass”, was designed later. Classifier ensembles
feature prominently in both packages.
PRTools and perClass are instruments for advanced MATLAB programmers and
can also be used by practitioners after a short training. The recent edition of MATLAB
Statistics toolbox (2013b) includes a classifier ensemble suite as well.
Snippets of MATLAB DIY (do-it-yourself) code for illustrating methodologies
and concepts are given in the chapter appendices. MATLAB was seen as a suitable
language for such illustrations because it often looks like executable pseudo-code.
A programming language is like a living creature—it grows, develops, changes, and
breeds. The code in the book is written using today’s versions, styles, and conventions.
It does not, by any means, measure up to the richness, elegance, and sophistication
of PRTools and perClass. Aimed at simplicity, the code is not fool-proof, nor is it
optimized for time or other efficiency criteria. Its sole purpose is to enable the reader
to grasp the ideas and run their own small-scale experiments.
STRUCTURE AND WHAT IS NEW IN THE SECOND EDITION
The book is organized as follows.
Chapter 1, Fundamentals, gives an introduction to the main concepts in pattern
recognition, Bayes decision theory, and experimental comparison of classifiers. A
new treatment of the classifier comparison issue is offered (after Demšar [89]). The
discussion of the bias and variance decomposition of the error, which was given in
greater detail in the former Chapter 7 (bagging and boosting), is now briefly
introduced and illustrated in Chapter 1.
Chapter 2, Base Classifiers, contains methods and algorithms for designing the
individual classifiers. In this edition, a special emphasis is put on the stability of the
classifier models. To aid the discussions and illustrations throughout the book, a toy
two-dimensional data set, called the fish data, was created. The Naïve Bayes classifier
and the support vector machine classifier (SVM) are brought to the fore as they are
often used in classifier ensembles. In the final section of this chapter, I introduce the
triangle diagram that can enrich the analyses of pattern recognition methods.
Chapter 3, Multiple Classifier Systems, discusses some general questions in combining classifiers. It has undergone a major makeover. The new final section, “Quo
Vadis?,” asks questions such as “Are we reinventing the wheel?” and “Has the progress
thus far been illusory?” It also contains a bibliometric snapshot of the area of classifier
ensembles as of January 4, 2013 using Thomson Reuters’ Web of Knowledge (WoK).
Chapter 4, Combining Label Outputs, introduces a new theoretical framework
which defines the optimality conditions of several fusion rules by progressively
relaxing an assumption. The Behavior Knowledge Space method is trimmed down
and illustrated better in this edition. The combination method based on singular value
decomposition (SVD) has been dropped.
Chapter 5, Combining Continuous-Valued Outputs, summarizes classifier fusion
methods such as simple and weighted average, decision templates and a classifier used
as a combiner. The division of methods into class-conscious and class-independent
in the first edition was regarded as surplus and was therefore abandoned.
Chapter 6, Ensemble Methods, grew out of the former Bagging and Boosting
chapter. It now accommodates on an equal keel the reigning classics in classifier
ensembles: bagging, random forest, AdaBoost and random subspace, as well as a
couple of newcomers: rotation forest and random oracle. The Error Correcting Output
Code (ECOC) ensemble method is included here, having been cast as “Miscellanea”
in the first edition of the book. Based on the interest in this method, as well as its
success, ECOC’s rightful place is together with the classics.
Chapter 7, Classifier Selection, explains why this approach works and how classifier competence regions are estimated. The chapter contains new examples and
illustrations.
Chapter 8, Diversity, gives a modern view on ensemble diversity, raising at the
same time some old questions, which are still puzzling the researchers in spite of
the remarkable progress made in the area. There is a frighteningly large number of
possible “new” diversity measures, lurking as binary similarity and distance measures (take for example Choi et al.’s study [74] with 76, s-e-v-e-n-t-y s-i-x, such
measures). And we have not even touched the continuous-valued outputs and the
possible diversity measured from those. The message in this chapter is stronger now:
we hardly need any more diversity measures; we need to pick a few and learn how
to use them. In view of this, I have included a theoretical bound on the kappa-error
diagram [243] which shows how much space is still there for new ensemble methods
with engineered diversity.
Chapter 9, Ensemble Feature Selection, considers feature selection by the ensemble
and for the ensemble. It was born from a section in the former Chapter 8, Miscellanea.
The expansion was deemed necessary because of the surge of interest in ensemble
feature selection from a variety of application areas, notably so from bioinformatics
[346]. I have included a stability index between feature subsets or between feature
rankings [236].
I picked a figure from each chapter to create a small graphical guide to the contents
of the book as illustrated in Figure 1.
FIGURE 1    The book chapters at a glance. [The original figure shows nine thumbnail panels, one per chapter: 1. Fundamentals; 2. Base classifiers; 3. Ensemble overview; 4. Combining labels; 5. Combining continuous; 6. Ensemble methods; 7. Classifier selection; 8. Diversity; 9. Feature selection.]

The former Theory chapter (Chapter 9) was dissolved; parts of it are now blended
with the rest of the content of the book. Lengthier proofs are relegated to the respective
chapter appendices. Some of the proofs and derivations were dropped altogether, for
example, the theory behind the magic of AdaBoost. Plenty of literature sources can
be consulted for the proofs and derivations left out.
The differences between the two editions reflect the fact that the classifier ensemble
research has made a giant leap; some methods and techniques discussed in the first
edition did not withstand the test of time, others were replaced with modern versions.
The dramatic expansion of some sub-areas forced me, unfortunately, to drop topics
such as cluster ensembles and stay away from topics such as classifier ensembles for:
adaptive (on-line) learning, learning in the presence of concept drift, semi-supervised

learning, active learning, handling imbalanced classes and missing values. Each of
these sub-areas will likely see a bespoke monograph in a not so distant future. I look
forward to that.

I am humbled by the enormous volume of literature on the subject, and the
ingenious ideas and solutions within. My sincere apology to those authors whose
excellent research into classifier ensembles went without citation in this book because
of lack of space or because of unawareness on my part.

WHO IS THIS BOOK FOR?
The book is suitable for postgraduate students and researchers in computing and
engineering, as well as practitioners with some technical background. The assumed
level of mathematics is minimal and includes a basic understanding of probabilities
and simple linear algebra. Beginner’s MATLAB programming knowledge would be
beneficial but is not essential.
Ludmila I. Kuncheva
Bangor, Gwynedd, UK
December 2013


ACKNOWLEDGEMENTS

I am most sincerely indebted to Gavin Brown, Juan Rodríguez, and Kami Kountcheva
for scrutinizing the manuscript and returning to me their invaluable comments, suggestions, and corrections. Many heartfelt thanks go to my family and friends for their
constant support and encouragement. Last but not least, thank you, my reader, for
picking up this book.
Ludmila I. Kuncheva
Bangor, Gwynedd, UK
December 2013


1 FUNDAMENTALS OF PATTERN RECOGNITION

1.1 BASIC CONCEPTS: CLASS, FEATURE, DATA SET

A wealth of literature in the 1960s and 1970s laid the grounds for modern pattern
recognition [90,106,140,141,282,290,305,340,353,386]. Faced with the formidable
challenges of real-life problems, elegant theories still coexist with ad hoc ideas,
intuition, and guessing.
Pattern recognition is about assigning labels to objects. Objects are described by
features, also called attributes. A classic example is recognition of handwritten digits
for the purpose of automatic mail sorting. Figure 1.1 shows a small data sample. Each
15×15 image is one object. Its class label is the digit it represents, and the features
can be extracted from the binary matrix of pixels.
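As a minimal sketch (with a made-up image and an arbitrary choice of features, not the book's own code), the MATLAB lines below build a hypothetical 15×15 binary image and extract a feature vector from its pixel matrix by counting the ink per row and per column:

% A minimal sketch with a hypothetical 15x15 binary digit image.
% The image and the choice of features are illustrative assumptions.
I = zeros(15,15);              % binary pixel matrix (1 = ink, 0 = background)
I(3:12,8) = 1;                 % a crude vertical stroke, e.g., the digit "1"

row_ink = sum(I,2)';           % ink count per row    (15 features)
col_ink = sum(I,1);            % ink count per column (15 features)
total_ink = sum(I(:));         % overall amount of ink (1 feature)

x = [row_ink, col_ink, total_ink];   % feature vector describing the object
y = 1;                               % class label: the digit the image represents

Any such vector of measurements computed from the pixel matrix can serve as the object's description; the features above are only one of many possible choices.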
1.1.1 Classes and Class Labels

Intuitively, a class contains similar objects, whereas objects from different classes
are dissimilar. Some classes have a clear-cut meaning, and in the simplest case are
mutually exclusive. For example, in signature verification, the signature is either
genuine or forged. The true class is one of the two, regardless of what we might
deduce from the observation of a particular signature. In other problems, classes
might be difficult to define, for example, the classes of left-handed and right-handed
people or ordered categories such as “low risk,” “medium risk,” and “high risk.”
