Rapid Object Detection using a Boosted Cascade of Simple Features


Object Recognition
Using the Adaptive Boosting algorithm

Original authors: Paul Viola & Michael Jones
Presenter: Nguyễn Đăng Bình

Rapid: moving or acting with great speed
Boosting: increasing the strength or value of something
Outline

- Introduction
- The Boosting algorithm for classifier learning
- Feature selection
- The weak learner constructor
- The strong classifier
- A tremendously difficult problem
- Results
- Conclusion
What have we done?

- A machine learning approach for visual object detection and recognition
- Capable of processing images extremely rapidly
- Achieving high detection rates
Three key contributions

- A new image representation, the Integral Image, which speeds up feature evaluation
- A learning algorithm (based on AdaBoost [5]) that selects a small number of visual features from a larger set, yielding efficient classifiers
- A method for combining classifiers in a cascade, which discards the background regions of the image quickly
- The detector works with a single grey-scale image only
A demonstration on face detection

- A frontal face detection system
- The detector runs at 15 frames per second without resorting to image differencing (in video sequences) or skin color detection
- 384 x 288 images on a Pentium III 700 MHz
The broad practical applications of an extremely fast face detector

- User interfaces, image databases, teleconferencing
- The system can be implemented on small low-power devices
  - Compaq iPaq: about 2 frames/sec
Training process for the classifier

- The attentional operator is trained to detect examples of a particular class through a supervised training process
- In the domain of face detection:
  - < 1% false negatives
  - < 40% false positives
- A face classifier is constructed
Cascaded detection process

- The sub-windows are processed by a sequence of classifiers, each slightly more complex than the last
- If any classifier rejects the sub-window, no further processing is performed
- The process is essentially that of a degenerate decision tree
Our object detection framework

- Original image → Integral image, in order to compute features rapidly at many scales
- Haar basis functions → Feature evaluation
- Modified AdaBoost procedure → Feature selection: a small set of critical features is chosen from the large set of features
- Cascaded classifier structure
Feature Selection

- The detection process is based on features rather than pixels directly, for two reasons:
  - Ad-hoc domain knowledge is difficult to learn using a finite quantity of training data
  - A feature-based system operates much faster
- Simple features are used: the Haar basis functions, which have been used by Papageorgiou et al. [9]
- Three kinds of features are used
Feature Selection

- Two-rectangle feature
  - The difference between the sums of the pixels within two rectangular regions
  - The regions have the same size and shape and are horizontally or vertically adjacent
  - The base resolution is 24x24; the exhaustive set of rectangle features is large, over 180,000
- Three-rectangle feature
  - The sum within two outside rectangles subtracted from the sum in a center rectangle
- Four-rectangle feature
  - The difference between diagonal pairs of rectangles
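To make the feature definition concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper) that evaluates a horizontally adjacent two-rectangle feature directly from pixel sums; the placement, size and sign convention are arbitrary choices for the example.

import numpy as np

def two_rectangle_feature(window, x, y, w, h):
    """Horizontally adjacent two-rectangle feature on a grey-scale patch.

    The left w x h rectangle has its top-left corner at column x, row y;
    the right rectangle is the adjacent w x h region. The feature value
    is the difference of the two pixel sums (sign convention is arbitrary).
    """
    left = window[y:y + h, x:x + w].sum()
    right = window[y:y + h, x + w:x + 2 * w].sum()
    return left - right

# Example: one feature placed at (4, 6) with 6x8 rectangles in a 24x24 patch.
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(24, 24))
print(two_rectangle_feature(patch, x=4, y=6, w=6, h=8))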
Integral Image

- An intermediate representation for rapidly computing the rectangle features
- The integral image at (x, y) is the sum of the original image over all pixels above and to the left of (x, y), inclusive:

  ii(x, y) = \sum_{x' \le x,\, y' \le y} i(x', y'),

  where ii(x, y) is the integral image and i(x, y) is the original image.
- It can be computed in one pass over the original image with the pair of recurrences

  s(x, y) = s(x, y - 1) + i(x, y)
  ii(x, y) = ii(x - 1, y) + s(x, y),

  where s(x, y) is the cumulative row sum, s(x, -1) = 0 and ii(-1, y) = 0.
Example on a 3x3 image:

Original image i:
1 2 5
3 4 6
7 8 9

Cumulative row sums s:
1 2 5
4 6 11
11 14 20

Integral image ii:
1 3 8
4 10 21
11 25 45
Calculating any rectangle sum with the integral image

- The integral image value at point 1 is the sum over region A; at point 2 it is A + B; at point 3 it is A + C; at point 4 it is A + B + C + D
- Rectangle sum over D = 4 - 3 - 2 + 1 (four array references)
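The recurrences and the four-corner trick translate directly into code. The sketch below (my own, with my own function names) builds the integral image as a double cumulative sum and evaluates a rectangle sum; it reproduces the 3x3 example from the slides.

import numpy as np

def integral_image(i):
    """One-pass construction: cumulative row sums, then cumulative column sums,
    which is exactly s(x, y) = s(x, y-1) + i(x, y) followed by
    ii(x, y) = ii(x-1, y) + s(x, y)."""
    return i.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of the original pixels in rows top..bottom and columns left..right
    (inclusive) using four lookups: D = 4 - 3 - 2 + 1."""
    total = ii[bottom, right]                    # point 4: A + B + C + D
    if top > 0:
        total -= ii[top - 1, right]              # point 2: A + B
    if left > 0:
        total -= ii[bottom, left - 1]            # point 3: A + C
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]           # point 1: A (added back)
    return total

img = np.array([[1, 2, 5],
                [3, 4, 6],
                [7, 8, 9]])
ii = integral_image(img)
print(ii)                        # [[1 3 8] [4 10 21] [11 25 45]], as on the slide
print(rect_sum(ii, 1, 1, 2, 2))  # 4 + 6 + 8 + 9 = 27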
AdaBoost learning algorithm

- AdaBoost is used to do the feature selection task, i.e. to learn the classification functions
- Learning process: a feature set and a training set (1. positive and 2. negative images) are fed into a variant AdaBoost procedure, which outputs a face / non-face classifier
- Over 180,000 rectangle features are associated with each 24x24 sub-image
- The selected weak learners (Weak Learner 1, Weak Learner 2, ...) are combined into the final strong classifier
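The slide does not spell the combination out; for reference, the final strong classifier in the Viola-Jones paper is the weighted majority vote

h(x) = 1 if \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t, and 0 otherwise, where \alpha_t = \log(1/\beta_t).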
The Boost algorithm for classifier learning

- Step 1: Given example images (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), where y_i = 1 for positive and y_i = 0 for negative examples.
- Step 2: Initialize the weights w_{1,i} = \frac{1}{2m} for y_i = 0 and w_{1,i} = \frac{1}{2l} for y_i = 1, where m and l are the numbers of negatives and positives respectively.
For t = 1, ..., T:

1. Normalize the weights,
   w_{t,i} \leftarrow \frac{w_{t,i}}{\sum_{j=1}^{n} w_{t,j}},
   so that w_t is a probability distribution.

2. For each feature j, train a classifier h_j which is restricted to using a single feature. The error is evaluated with respect to w_t:
   \epsilon_j = \sum_i w_i \, |h_j(x_i) - y_i|.
   Choose the classifier h_t with the lowest error \epsilon_t.

3. Update the weights:
   w_{t+1,i} = w_{t,i} \, \beta_t^{1 - e_i},
   where e_i = 0 if example x_i is classified correctly, e_i = 1 otherwise, and \beta_t = \epsilon_t / (1 - \epsilon_t).
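As a concrete reading of the loop above, here is a short Python sketch (my own illustration, not the authors' code). The helper train_single_feature_classifier stands in for the weak learner constructor described on the next slides; its interface, returning a classifier together with its 0/1 predictions, is an assumption made for this sketch.

import numpy as np

def adaboost_rounds(X, y, features, T, train_single_feature_classifier):
    """Hedged sketch of the boosting loop from the slides.

    X: training examples, y: labels (1 = face, 0 = non-face),
    features: candidate feature extractors, T: number of rounds.
    train_single_feature_classifier(f, X, y, w) is assumed to return
    (classifier, predictions) for the best threshold/parity of feature f.
    """
    m = np.sum(y == 0)              # number of negatives
    l = np.sum(y == 1)              # number of positives
    w = np.where(y == 0, 1.0 / (2 * m), 1.0 / (2 * l))   # step 2: initial weights

    strong = []                     # list of (alpha_t, h_t) for the strong classifier
    for t in range(T):
        w = w / w.sum()             # 1. normalize so that w is a distribution

        best = None
        for f in features:          # 2. one single-feature classifier per feature
            h, pred = train_single_feature_classifier(f, X, y, w)
            eps = np.sum(w * np.abs(pred - y))
            if best is None or eps < best[0]:
                best = (eps, h, pred)
        eps_t, h_t, pred_t = best   # the classifier with the lowest error

        beta_t = eps_t / (1.0 - eps_t)
        e = (pred_t != y).astype(float)          # e_i = 0 if correct, 1 if missed
        w = w * beta_t ** (1.0 - e)              # 3. update the weights
        strong.append((np.log(1.0 / beta_t), h_t))
    return strong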
Weak learner constructor (diagram, explained)

- Given the current weights w_1, w_2, ..., w_n over the training set, each of the over 180,000 features f_j per sub-image is used to train one candidate classifier h_j.
- The error of each candidate is \epsilon_j = \sum_i w_i |h_j(x_i) - y_i|, giving errors \epsilon_1, \epsilon_2, \epsilon_3, ..., \epsilon_{180000}.
- Among h_1, h_2, h_3, ..., h_{180000}, choose the classifier h_t with the lowest error \epsilon_{min} = \epsilon_t.
Normalize and update the weights (diagram, explained)

- The weights w_1, w_2, ..., w_n are divided by their sum \sum_i w_i so that they again form a probability distribution.
- Correctly classified examples ("correct") have their weights scaled down, w_{t+1,i} = w_{t,i} \cdot \epsilon_t / (1 - \epsilon_t), while misclassified examples ("miss") keep their weights, so after normalization the weight shifts towards the missed examples.
- The weights are then updated and the next weak learner is trained.
Training the weak learner (diagram, explained)

- For the training set X, the single-feature values f_j(x) of the face examples and the non-face examples are laid out along one axis, and a threshold θ is chosen: if f_j(x) > θ (for the appropriate parity), x is declared a face.
- Examples on the wrong side of the threshold become false positives (h_j(x_i) = 1 for a non-face) or false negatives; the threshold is chosen to minimize \epsilon_j = \sum_i w_i |h_j(x_i) - y_i|.
- A weak classifier h_j(x) therefore consists of a feature f_j, a threshold \theta_j and a parity p_j indicating the direction of the inequality sign:

  h_j(x) = 1 if p_j f_j(x) < p_j \theta_j, and 0 otherwise.
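As an illustration of how such a single-feature weak classifier could be trained, here is a small hedged sketch (my own; the paper does not prescribe this exact search): it scans candidate thresholds over the observed feature values and keeps the threshold/parity pair with the lowest weighted error.

import numpy as np

def train_threshold_classifier(fvals, y, w):
    """Pick (theta, parity) for one feature so that
    h(x) = 1 if p * f(x) < p * theta minimizes sum_i w_i |h(x_i) - y_i|.

    fvals: feature values f_j(x_i) for every training example,
    y: labels (1 = face, 0 = non-face), w: current example weights.
    The brute-force scan over observed values is an illustrative choice.
    """
    best = (np.inf, None, None)                     # (error, theta, parity)
    for theta in np.unique(fvals):
        for p in (+1, -1):
            pred = (p * fvals < p * theta).astype(float)
            err = np.sum(w * np.abs(pred - y))
            if err < best[0]:
                best = (err, theta, p)
    return best

# Toy example: faces tend to have larger feature values than non-faces.
fvals = np.array([0.9, 0.8, 0.7, 0.2, 0.3, 0.1])
y     = np.array([1,   1,   1,   0,   0,   0  ])
w     = np.full(6, 1.0 / 6)
print(train_threshold_classifier(fvals, y, w))   # weighted error 0.0 at theta 0.3, parity -1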
AdaBoosting

- Places the most weight on the examples most often misclassified by the preceding weak rules
- This forces the base learner to focus its attention on the "hardest" examples
The Boost algorithm for classifier learning (revisited)

- The same procedure as before: given the example images and the initial weights, each round t normalizes the weights, trains a single-feature classifier h_j for every feature, picks the one with the lowest error \epsilon_t = \min_j \sum_i w_i |h_j(x_i) - y_i|, and updates the weights with \beta_t = \epsilon_t / (1 - \epsilon_t).
- Round after round this yields the selected weak classifiers h_t, h_{t+1}, h_{t+2}, h_{t+3}, ..., which together form the final strong classifier.
The big picture of the testing process

- Stage 1: an AdaBoosting learner with a single weak classifier h_1 performs feature selection & classification on each sub-window; sub-windows it rejects (False) are discarded, the rest pass on.
- Stage 2: an AdaBoosting learner with weak classifiers h_1, h_2, ..., h_10; False → Reject, Pass → Stage 3.
- Stage 3: an AdaBoosting learner with h_1, h_2 and more weak classifiers; False → Reject, Pass → later stages.
- Each stage rejects as many negatives as possible while minimizing false negatives, e.g. a 100% detection rate at a 50% false positive rate per stage.
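The degenerate-decision-tree behaviour of this cascade is easy to express in code. Below is a minimal, hedged sketch (the stage representation and names are my own assumptions, not the paper's API) showing how a sub-window is rejected as soon as any stage fails it.

def evaluate_cascade(stages, subwindow):
    """Each stage is a (weak_classifiers, alphas, threshold) triple produced
    by a boosted learner; a sub-window must pass every stage to be reported
    as a face. Rejection at any stage ends processing immediately, which is
    what makes background sub-windows cheap to discard.
    """
    for weak_classifiers, alphas, threshold in stages:
        score = sum(a * h(subwindow) for a, h in zip(alphas, weak_classifiers))
        if score < threshold:        # this stage says "non-face"
            return False             # reject: no further stages are evaluated
    return True                      # survived every stage: report a face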
A tremendously difficult problem

How to determine:
- The number of classifier stages
- The number of features in each stage
- The threshold of each stage

Training picture: for Stage 1, an AdaBoosting learner with h_1 performs feature selection & classification on the training examples, separating face from non-face at roughly a 100% detection rate and a 50% false positive rate; rejected sub-windows stop there, while the ones that pass are used to train Stage 2 with h_1, h_2, ..., h_10, and so on. One possible stage-growing loop is sketched below.
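One common way to settle these three questions, sketched here under my own assumptions (the helper functions and the stage's predict method are hypothetical, and this is only an approximation of the procedure described in the paper), is to fix per-stage targets for the detection and false positive rates and to keep adding features to a stage until those targets are met, feeding the surviving false positives to the next stage as its negative set.

def train_cascade(train_stage, eval_rates, pos, neg,
                  d_min=0.99, f_max=0.5, f_target=1e-6):
    """Hedged sketch of a cascade-training driver.

    train_stage(pos, neg, n_features) is assumed to return a boosted stage
    (with a predict method) whose threshold has been lowered until it reaches
    detection rate d_min; eval_rates(stage, pos, neg) is assumed to return
    (detection_rate, false_positive_rate). Stages are added until the overall
    false positive rate estimate falls below f_target.
    """
    stages, overall_f = [], 1.0
    while overall_f > f_target and len(neg) > 0:
        n_features = 0
        while True:
            n_features += 1                      # grow this stage one feature at a time
            stage = train_stage(pos, neg, n_features)
            d, f = eval_rates(stage, pos, neg)
            if d >= d_min and f <= f_max:        # stage meets its per-stage targets
                break
        stages.append(stage)
        overall_f *= f
        # keep only the negatives the cascade so far still (wrongly) accepts
        neg = [x for x in neg if all(s.predict(x) for s in stages)]
    return stages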
Results

- A 38-layer cascaded classifier was trained to detect frontal upright faces
- Training set:
  - Faces: 4916 hand-labeled faces with resolution 24x24
  - Non-faces: 9544 images containing no faces (350 million sub-windows within these non-face images)
- Features:
  - The first five layers of the detector use 1, 10, 25, 25 and 50 features
  - Total number of features over all layers: 6061
Results

- Each classifier in the cascade was trained with:
  - Faces: the 4916 faces plus their vertical mirror images, 9832 images in total
  - Non-face sub-windows: 10,000 (size 24x24)
Outline

- Results
  - Speed of the final detector
  - Image processing
  - Scanning the detector
  - Integration of multiple detections
  - Experiments on a real-world test set
Speed of the final detector (Results)

- The speed is directly related to the number of features evaluated per scanned sub-window
- On the MIT+CMU test set, an average of 10 features out of a total of 6061 are evaluated per sub-window
- On a 700 MHz Pentium III, a 384 x 288 pixel image is processed in about 0.067 seconds (using a starting scale of 1.25 and a step size of 1.5)