
Data Mining Lecture Notes: Ensemble Models


Trịnh Tấn Đạt
Faculty of Information Technology – Saigon University
Email:
Website:

Contents
 Introduction
 Voting
 Bagging

 Boosting
 Stacking and Blending


Introduction


Definition
 An ensemble of classifiers is a set of classifiers whose individual decisions are combined in some way (typically, by weighted or un-weighted voting) to classify new examples.
 Ensembles are often much more accurate than the individual classifiers that make them up.


Learning Ensembles
 Learn multiple alternative definitions of a concept using different training data or different learning algorithms.
 Combine the decisions of the multiple definitions, e.g., using voting.
[Diagram: the Training Data is resampled into Data 1 … Data K; each Data i is given to Learner i, which produces Model i; a Model Combiner merges Model 1 … Model K into the Final Model.]



Necessary and Sufficient Condition
 For the idea to work, the classifiers should be
 Accurate
 Diverse
 Accurate: Has an error rate better than random guessing on new instances
 Diverse: They make different errors on new data points


Why do they Work?
 Suppose there are 25 base classifiers
 Each classifier has an error rate ε = 0.35
 Assume classifiers are independent
 Probability that the ensemble classifier makes a wrong prediction:

 25  i
25 − i





= 0.06
(
1
)

 i 
i =13 

25


Marquis de Condorcet (1785): with L independent voters, each wrong with probability p, the majority vote is wrong with probability

$$P_{\text{majority wrong}} = \sum_{k=\lfloor L/2 \rfloor + 1}^{L} \binom{L}{k}\, p^{k} (1-p)^{L-k}$$
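A quick way to check the 0.06 figure above (and the Condorcet formula in general) is to evaluate the binomial sum numerically. A minimal sketch in Python; the function name majority_error is just an illustrative choice, not part of the lecture code:

from math import comb

def majority_error(n_classifiers, eps):
    # Probability that a strict majority of n independent classifiers,
    # each with error rate eps, is wrong at the same time
    k_min = n_classifiers // 2 + 1
    return sum(comb(n_classifiers, k) * eps**k * (1 - eps)**(n_classifiers - k)
               for k in range(k_min, n_classifiers + 1))

print(majority_error(25, 0.35))   # ≈ 0.06, matching the slide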


Value of Ensembles
 When combining multiple independent and diverse decisions, each of which is at least more accurate than random guessing, random errors cancel each other out and correct decisions are reinforced.
 Human ensembles are demonstrably better.
 How many jelly beans in the jar? Individual estimates vs. the group average.


A Motivating Example
 Suppose that you are a patient with a set of symptoms.
 Instead of taking the opinion of just one doctor (classifier), you decide to take the opinions of a few doctors!
 Is this a good idea? Indeed it is.
 By consulting many doctors and combining their diagnoses, you can get a fairly accurate idea of the true diagnosis.


The Wisdom of Crowds
 The collective knowledge of a diverse and independent body of people typically exceeds the knowledge of any single individual, and can be harnessed by voting.



When Ensembles Work?
 Ensemble methods work better with ‘unstable classifiers’
 Classifiers that are sensitive to minor perturbations in the training set
 Examples:
 Decision trees
 Rule-based
 Artificial neural networks


Ensembles
 Homogeneous Ensembles: all individual models are obtained with the same learning algorithm, on slightly different datasets.
 Use a single, arbitrary learning algorithm, but manipulate the training data so that it learns multiple models.



Data1 ≠ Data2 ≠ … ≠ Data K
Learner1 = Learner2 = … = Learner K

 Different methods for changing training data:
 Bagging: Resample training data
 Boosting: Reweight training data

 Heterogeneous Ensembles: individual models are obtained with different learning algorithms.
 Stacking and Blending: the combining mechanism is that the outputs of the classifiers (Level 0 classifiers) are used as training data for another classifier (the Level 1 classifier), as sketched below.
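As an illustration of this Level 0 / Level 1 idea, here is a minimal sketch using scikit-learn's StackingClassifier; the iris dataset and the particular base classifiers are only assumptions for the example, not the course's reference code:

from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Level 0: diverse base classifiers; Level 1: a meta-classifier trained on their outputs
level0 = [('dt', DecisionTreeClassifier(random_state=1)),
          ('knn', KNeighborsClassifier())]
model = StackingClassifier(estimators=level0, final_estimator=LogisticRegression())
model.fit(x_train, y_train)
print(model.score(x_test, y_test))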


Methods of Constructing Ensembles
1. Manipulate training data set
2. Cross-validated Committees
3. Weighted Training Examples
4. Manipulating Input Features
5. Manipulating Output Targets
6. Injecting Randomness

Methods of Constructing Ensembles - 1
1. Manipulate training data set
 Bagging (bootstrap aggregation)
 On each run, Bagging presents the learning algorithm with a training set drawn randomly, with replacement, from the original training data. This process is called bootstrapping.
 Each bootstrap sample contains, on average, about 63.2% of the original training data, with several examples appearing multiple times (see the sketch below).
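The 63.2% figure can be checked empirically. A minimal sketch using NumPy; the dataset size of 1000 is an arbitrary assumption:

import numpy as np

rng = np.random.default_rng(0)
n = 1000
indices = np.arange(n)

# One bootstrap sample: n draws with replacement from the original indices
bootstrap = rng.choice(indices, size=n, replace=True)
unique_fraction = len(np.unique(bootstrap)) / n
print(unique_fraction)   # ≈ 0.632, i.e. 1 - (1 - 1/n)^n ≈ 1 - 1/e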


Methods of Constructing Ensembles - 2
2. Cross-validated Committees
 Construct training sets by leaving out disjoint subsets of the training data (a committee sketch follows below).
 The idea is similar to k-fold cross-validation.
3. Weighted Training Examples
 Maintain a set of weights over the training examples. At each iteration the weights are changed to place more emphasis on misclassified examples (AdaBoost).
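A minimal sketch of a cross-validated committee, assuming scikit-learn, NumPy arrays x_train and y_train with integer class labels, and k = 5 folds; the decision-tree base learner and the helper name cv_committee are illustrative choices:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

def cv_committee(x_train, y_train, x_test, k=5):
    # Train one member per fold, each on the data with one disjoint fold left out
    members = []
    for keep_idx, _ in KFold(n_splits=k, shuffle=True, random_state=1).split(x_train):
        member = DecisionTreeClassifier(random_state=1)
        member.fit(x_train[keep_idx], y_train[keep_idx])
        members.append(member)
    # Combine the k models by unweighted majority vote over the test points
    votes = np.stack([m.predict(x_test) for m in members]).astype(int)
    return np.array([np.bincount(col).argmax() for col in votes.T])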


Methods of Constructing Ensembles - 3
4. Manipulating Input Features
 Works if the input features are highly redundant (e.g., down sampling FFT
bins)
5. Manipulating Output Targets
6. Injecting Randomness


Variance and Bias
 Bias is due to differences between the model and the true function.
 Variance represents the sensitivity of the model to individual data points.
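Both quantities can be estimated empirically by refitting the same model on many resampled training sets. A minimal sketch for a 1-D regression problem with a known true function; the sine target, the noise level and the depth-3 tree are all illustrative assumptions:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
x_grid = np.linspace(0, 2 * np.pi, 50).reshape(-1, 1)

# Refit the same model on many noisy training sets drawn around the true function sin(x)
preds = []
for _ in range(200):
    x = rng.uniform(0, 2 * np.pi, 30).reshape(-1, 1)
    y = np.sin(x).ravel() + rng.normal(0, 0.3, 30)
    preds.append(DecisionTreeRegressor(max_depth=3).fit(x, y).predict(x_grid))
preds = np.array(preds)

bias2 = np.mean((preds.mean(axis=0) - np.sin(x_grid).ravel()) ** 2)  # systematic error of the average model
variance = np.mean(preds.var(axis=0))  # sensitivity to the particular training sample
print(bias2, variance)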



Bias-Variance tradeoff
 Increasing model complexity typically lowers bias but raises variance; ensemble methods try to reduce one without increasing the other (bagging mainly reduces variance, boosting mainly reduces bias).


Voting


Simple Ensemble Techniques
 Max Voting: multiple models are used to make predictions for each data point. The prediction made by each model is counted as a ‘vote’, and the class predicted by the majority of the models is used as the final prediction.

# Hard (majority) voting over two base classifiers
from sklearn.linear_model import LogisticRegression
from sklearn import tree
from sklearn.ensemble import VotingClassifier

model1 = LogisticRegression(random_state=1)
model2 = tree.DecisionTreeClassifier(random_state=1)
model = VotingClassifier(estimators=[('lr', model1), ('dt', model2)], voting='hard')
model.fit(x_train, y_train)
model.score(x_test, y_test)


Simple Ensemble Techniques
 Averaging: multiple models make a prediction for each data point, and the average of these predictions is used as the final prediction. Averaging can be used for regression problems, or for averaging class probabilities in classification problems.
# Average the predicted class probabilities of three different models
from sklearn import tree
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

model1 = tree.DecisionTreeClassifier()
model2 = KNeighborsClassifier()
model3 = LogisticRegression()
model1.fit(x_train, y_train)
model2.fit(x_train, y_train)
model3.fit(x_train, y_train)
pred1 = model1.predict_proba(x_test)
pred2 = model2.predict_proba(x_test)
pred3 = model3.predict_proba(x_test)
finalpred = (pred1 + pred2 + pred3) / 3


Simple Ensemble Techniques
 Weighted Average: all models are assigned different weights that define the importance of each model for the prediction.

# Weighted average of the class probabilities; the weights should sum to 1
finalpred = (pred1 * 0.3 + pred2 * 0.3 + pred3 * 0.4)
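To turn the averaged (or weighted) probabilities into a final class label, one can take the arg-max over the class axis; a minimal follow-up sketch, assuming finalpred holds class probabilities as above:

import numpy as np

final_class = np.argmax(finalpred, axis=1)   # index of the most probable class for each test point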


