Trịnh Tấn Đạt
Khoa CNTT – Đại Học Sài Gòn
Email:
Website:
Contents
Introduction
Voting
Bagging
Boosting
Stacking and Blending
Introduction
Definition
An ensemble of classifiers is a set of classifiers whose individual decisions
are combined in some way (typically, by weighted or un-weighted voting)
to classify new examples
Ensembles are often much more accurate than the individual classifiers that
make them up.
Learning Ensembles
Learn multiple alternative definitions of a concept using different training
data or different learning algorithms.
Combine decisions of multiple definitions, e.g. using voting.
[Diagram: Training Data → Data 1, Data 2, …, Data K → Learner 1, Learner 2, …, Learner K → Model 1, Model 2, …, Model K → Model Combiner → Final Model]
Necessary and Sufficient Condition
For the idea to work, the classifiers should be
Accurate
Diverse
Accurate: Has an error rate better than random guessing on new instances
Diverse: They make different errors on new data points
Why Do They Work?
Suppose there are 25 base classifiers
Each classifier has an error rate ε = 0.35
Assume classifiers are independent
Probability that the ensemble classifier makes a wrong prediction:
P(ensemble wrong) = Σ_{i=13}^{25} C(25, i) ε^i (1 − ε)^{25−i} ≈ 0.06
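As a quick numeric check of this sum (a sketch assuming SciPy is available; the majority of 25 classifiers is wrong only when at least 13 of them err):

from scipy.stats import binom

# 25 independent classifiers, each wrong with probability 0.35.
# The majority vote is wrong when 13 or more classifiers are wrong.
ensemble_error = 1 - binom.cdf(12, 25, 0.35)
print(round(ensemble_error, 3))  # ≈ 0.06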
Marquis de Condorcet (1785)
Majority vote of n independent voters (n odd), each correct with probability p > 1/2, is wrong with probability Σ_{k=⌈n/2⌉}^{n} C(n, k) (1 − p)^k p^{n−k}, which tends to 0 as n grows.
Value of Ensembles
When combining multiple independent and diverse decisions, each of which is at least more accurate than random guessing, random errors cancel each other out and correct decisions are reinforced.
Human ensembles are demonstrably better
How many jelly beans in the jar?: Individual estimates vs. group average.
A Motivating Example
Suppose that you are a patient with a set of symptoms
Instead of taking the opinion of just one doctor (classifier), you decide to take the opinions of several doctors!
Is this a good idea? Indeed it is.
Consult many doctors and then, based on their diagnoses, you can get a fairly accurate idea of the true diagnosis.
The Wisdom of Crowds
The collective knowledge of a diverse and independent body of people
typically exceeds the knowledge of any single individual and can be
harnessed by voting
When Ensembles Work?
Ensemble methods work better with ‘unstable classifiers’
Classifiers that are sensitive to minor perturbations in the training set
Examples:
Decision trees
Rule-based
Artificial neural networks
Ensembles
Homogeneous Ensembles: all individual models are obtained with the same learning
algorithm, on slightly different datasets
Use a single, arbitrary learning algorithm but manipulate training data to make it
learn multiple models.
Data1 Data2 … Data K
Learner1 = Learner2 = … = Learner K
Different methods for changing training data:
Bagging: Resample training data
Boosting: Reweight training data
Heterogeneous Ensembles: individual models are obtained with different algorithms
Stacking and Blending
The combining mechanism: the outputs of the classifiers (level-0 classifiers) are used as training data for another classifier (the level-1 classifier)
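A minimal sketch of this level-0 / level-1 idea with scikit-learn's StackingClassifier (the base models and the synthetic dataset are illustrative, not a prescribed setup):

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=1)  # illustrative data

# Level 0: heterogeneous base classifiers; Level 1: logistic regression
# trained on the cross-validated outputs of the level-0 models.
stack = StackingClassifier(
    estimators=[('dt', DecisionTreeClassifier()), ('knn', KNeighborsClassifier())],
    final_estimator=LogisticRegression())
stack.fit(X, y)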
Methods of Constructing Ensembles
1. Manipulate training data set
2. Cross-validated Committees
3. Weighted Training Examples
4. Manipulating Input Features
5. Manipulating Output Targets
6. Injecting Randomness
Methods of Constructing Ensembles - 1
1. Manipulate training data set
Bagging (bootstrap aggregation)
On each run, bagging presents the learning algorithm with a training set drawn randomly, with replacement, from the original training data. This process is called bootstrapping.
Each bootstrap sample contains, on average, about 63.2% of the original training data, with several examples appearing multiple times.
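A minimal bagging sketch with scikit-learn's BaggingClassifier (the synthetic dataset and parameter choices are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=1)  # illustrative data
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Each tree is fit on a bootstrap sample drawn with replacement from x_train
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                            bootstrap=True, random_state=1)
bagging.fit(x_train, y_train)
print(bagging.score(x_test, y_test))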
Methods of Constructing Ensembles - 2
2. Cross-validated Committees
Construct training sets by leaving out disjoint subsets of the training data
Idea similar to k-fold cross-validation
3. Weighted Training Examples
Maintain a set of weights over the training examples. At each iteration the weights are changed to place more emphasis on misclassified examples (AdaBoost)
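A minimal AdaBoost sketch with scikit-learn (the weak learner, number of rounds, and dataset are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=1)  # illustrative data

# Decision stumps as weak learners; after each round, misclassified examples
# receive larger weights so the next stump focuses on them.
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=50, random_state=1)
ada.fit(X, y)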
Methods of Constructing Ensembles - 3
4. Manipulating Input Features
Train each model on a different subset of the input features; works well when the features are highly redundant (e.g., down-sampled FFT bins). A random-subspace sketch follows after this list.
5. Manipulating Output Targets
6. Injecting Randomness
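As an illustration of item 4 above, the random-subspace idea can be sketched with BaggingClassifier by sampling features instead of examples (settings and data are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=40, n_informative=10,
                           random_state=1)  # deliberately redundant feature set

# Each tree sees all training examples but only a random 50% of the features
subspace = BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                             bootstrap=False, max_features=0.5, random_state=1)
subspace.fit(X, y)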
Variance and Bias
Bias is due to differences between the model and the true function.
Variance represents the sensitivity of the model to individual data points.
Bias-Variance tradeoff: simple models tend to have high bias and low variance, while complex models tend to have low bias and high variance; ensembles aim to reduce variance (e.g., bagging) or bias (e.g., boosting) without increasing the other too much.
Voting
Simple Ensemble Techniques
Max Voting: multiple models are used to make predictions for each data point.
The prediction of each model is counted as a 'vote', and the class predicted by
the majority of the models is used as the final prediction.
from sklearn.linear_model import LogisticRegression
from sklearn import tree
from sklearn.ensemble import VotingClassifier

model1 = LogisticRegression(random_state=1)
model2 = tree.DecisionTreeClassifier(random_state=1)
# 'hard' voting: each model casts one vote and the majority class wins
model = VotingClassifier(estimators=[('lr', model1), ('dt', model2)], voting='hard')
model.fit(x_train, y_train)
model.score(x_test, y_test)
Simple Ensemble Techniques
Averaging: multiple models make a prediction for each data point, and the average of these
predictions is used as the final prediction. Averaging can be used for regression problems
or when calculating probabilities for classification problems.
from sklearn import tree
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

model1 = tree.DecisionTreeClassifier()
model2 = KNeighborsClassifier()
model3 = LogisticRegression()

model1.fit(x_train, y_train)
model2.fit(x_train, y_train)
model3.fit(x_train, y_train)

# Average the predicted class probabilities of the three models
pred1 = model1.predict_proba(x_test)
pred2 = model2.predict_proba(x_test)
pred3 = model3.predict_proba(x_test)
finalpred = (pred1 + pred2 + pred3) / 3
Simple Ensemble Techniques
Weighted Average: All models are assigned different weights defining
the importance of each model for prediction
# Give the third model more influence; the weights sum to 1
finalpred = (pred1*0.3 + pred2*0.3 + pred3*0.4)