
The 16th National Conference: Selected Issues of Information and Communication Technology, Da Nang, 14-15/11/2013

Object Recognition using Sparse Features of Color Images

T.T. Quyen Bui¹, Keum-Shik Hong², Dang-Chung Nguyen¹, Anh-Tuan Do¹, Thanh-Long Nguyen¹, Ngoc-Minh Pham¹, Quang-Vinh Thai¹

¹Department of Automation Technology, Institute of Information Technology, Hanoi, Vietnam
²Department of Cogno-Mechatronics Engineering and School of Mechanical Engineering, Pusan National University, Busan, Korea

Abstract—In this paper, we propose a new framework for the extraction of sparse features of color images. The framework is based on the structure of the standard model of the visual cortex; however, in place of symmetric Gabor filters, Gabor energy filters are utilized. Color information is taken into account in calculating the sparse features of objects. The learning stage yields a set of prototype patches of color components that are chosen simultaneously over spatial position, spatial size, and multiple scales. A set of sparse features is obtained by means of the localized pooling method, after which it is passed to a classifier for object recognition and classification. The experimental results confirm the significant contribution of our framework to object recognition.

Keywords-object recognition; Gabor energy filter; sparse features; color-based features
I. INTRODUCTION
Object recognition is one of the most difficult challenges in the
field of computer vision. Among the vast variety of existing
approaches to object recognition, methods using a deep,
biologically inspired architecture have proven remarkably
successful. A computational model of object recognition in
the visual cortex is introduced in [1]. The model consists of
five levels, starting with a grey-scale image layer I and
proceeding, in the higher levels, to alternating single S and
complex C units. The S units mix their inputs according to a
bell-shaped tuning function to increase selectivity, while the
C units pool their inputs by a maximum operation in order
to increase invariance. A cortex-like mechanism that uses a
symmetric Gabor filter with input grey images for
recognition of complex visual scenes is presented in detail
in [2]. The use of sparse features with limited receptive
fields for multiclass object recognition is introduced in [3].

In Mutch and Lowe’s approach [4], symmetric Gabor filters are applied at all grey-image positions and scales; by means of alternating template matching and max pooling operations, feature complexity and position/scale invariance are developed. Sparsity is increased by constraining the number of feature inputs, by lateral inhibition, and by feature selection. As a result, images are reduced to feature vectors that are computed by using a localized pooling method and classified by a support vector machine (SVM).
Recently, Zhang et al. [5] proposed a framework in which
two functional classes of color-sensitive neurons [6], single-
opponent and double-opponent neurons, are used as the
inputs of the model of Serre et al. [2]. Thériault et al. [7]
improved the architecture of the visual cortex model by
allowing filters (prototype patches) extracted from level C1
to combine multiple scales inside the same spatial
neighborhood. This provides a better match to the local
scale of image structure. Multi-resolution spatial pooling at level C2 is also performed, as a result of which both local and global spatial information are encoded to produce discriminative image signatures. Thériault et al. [7]
demonstrated that their model outperforms several previous
models, for example, those based on biologically inspired
architecture [2][4][8] or bag-of-words (BoW) architecture
[9-10].
Gabor filters and color information have been usefully
exploited in several object recognition studies owing to the
effectiveness of Gabor filters as (localization) detectors of
lines and edges [11] and the high discriminative power of
color. The discerning ability of color information is important in many pattern recognition and image processing applications [12-13]. Indeed, the additional information provided by color often yields better results than methods that use only grey-scale information [14-16]. Bui and Hong
[15] examined the use of color cues in the CIELAB color
space in a color-based active basis model incorporating a
template-based approach and the local power spectrums
(LPSs) of color components combined with gradient-based
features for object recognition. Martin et al. [17] explored
the utility of the local brightness, color, and texture cues in
detecting natural image boundaries.
Based on the structure of the model of the visual cortex,
the characteristics of Gabor filters, and the color cues of
LAB color images, we propose a new framework in which
the color information, Gabor energy filters, and the
architecture of the visual cortex model are utilized for object
recognition. Color information, unlike grey images as the
input of the visual cortex model, is taken into account in
calculating the sparse features of the objects. The CIELAB
color space was chosen, as it is designed to approximate
human vision. We endeavored to maintain the structure of
the generic framework of alternating convolution and max
pooling operations, which enhances selectivity and
invariance. An image represented in RGB color space is converted to LAB color space. Then, in place of symmetric Gabor filters, a set of spatial Gabor energy filters is applied to each color component. In the learning stage, we adapt the learning strategy presented in [7]. A set of discriminative prototype patches of color components is selected randomly over spatial position, spatial size, and several scales
simultaneously and is extracted by the local maximum over
scales and orientations. After alternating layers of feature
mapping (convolution) and of feature pooling, a set of
sparse features is computed by a localized pooling method
and is exploited by an SVM classifier [18] for object
recognition.
The research contributions presented in this paper can be
summarized as follows: As substitutes for the symmetric
Gabor filters employed in previous models, Gabor energy
filters are utilized in our framework; color information is
taken into account in calculating sparse features of objects
based on the structure of the model of the visual cortex;
experimental validations of our proposed framework with
respect to previous models are provided. For the purposes of
experiments and in order to compare our model with
previous ones, we use the leaves dataset [19], along with
airplane, motorcycle, and rear-car datasets [20], as well as
subsets of the CalTech101 dataset [21].
This paper is organized as follows: A standard model of
the visual cortex for object recognition is overviewed briefly
in Section 2. The methodology of our proposed framework
for extraction of sparse features of color images is presented
in Section 3. The results of experiments conducted on
several image datasets for verification of our object
recognition approach are reviewed in Section 4. Finally,
concluding remarks are drawn in Section 5.
II. THE STANDARD VISUAL CORTEX MODEL
A standard object recognition model based on a theory of the ventral visual stream of the visual cortex was introduced
in [1]. The structure of the model is kept in modified
versions [2][4][7]. Serre et al. [2] used this model structure,
with Gabor filter sets at eight bands with 16 scales, as an
alternative to the original Gaussian filter bank; they added a
learning step, and investigated the contributions of the C1
and C2 features to object recognition. The basic network
calculation in [2] is summarized as follows.
Layer S1: The two-dimensional Gabor function $g_{\sigma,\lambda,\theta,\varphi}(x,y)$, where $(x,y)\in\mathbb{R}^{2}$, is centered at the origin and is given by [22]

$$g_{\sigma,\lambda,\theta,\varphi}(x,y)=\exp\left(-\frac{\tilde{x}^{2}+\gamma^{2}\tilde{y}^{2}}{2\sigma^{2}}\right)\cos\left(2\pi\frac{\tilde{x}}{\lambda}+\varphi\right) \quad (1)$$

where $\tilde{x}=x\cos\theta+y\sin\theta$, $\tilde{y}=-x\sin\theta+y\cos\theta$, and $\gamma$ is the spatial aspect ratio that determines the ellipticity of the support of the Gabor function. Further, $\sigma$ is the standard deviation of the Gaussian envelope and determines the size of the receptive field; $\lambda$ is the wavelength of the cosine factor, where $f=1/\lambda$ is the spatial frequency; $\theta$, where $\theta\in[0,\pi)$, represents the orientation of the normal to the parallel stripes of the Gabor function; and finally, $\varphi$, where $\varphi\in(-\pi,\pi]$, is the phase offset that determines the symmetry of $g_{\sigma,\lambda,\theta,\varphi}(x,y)$ with respect to the origin.

Layer S1 is the response of the convolution of the input image $I(x,y)$ with a set of spatial symmetric Gabor filters $g_{\sigma,\lambda,\theta,0}(x,y)$ of orientation $\theta\in\{\theta_{1},\ldots,\theta_{N_{\theta}}\}$ and scale $\sigma\in\{\sigma_{1},\ldots,\sigma_{N_{\sigma}}\}$, where $N_{\theta}$ and $N_{\sigma}$ are the numbers of orientations and scales, respectively:

$$\mathrm{S1}_{\theta,\sigma}=I(x,y)*g_{\sigma,\lambda,\theta,0}(x,y) \quad (2)$$
The S1 stage resembles an edge detector, since symmetric
Gabor filters are active only near image edges. The S1 unit
is a four-dimensional matrix (x/y/θ/σ).
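For concreteness, the sketch below builds the symmetric Gabor bank of equation (1) and the S1 responses of equation (2) in Python with NumPy/SciPy. The aspect ratio value γ = 0.3 and the helper names are our assumptions for illustration, not part of the model description.

```python
# A minimal sketch of equations (1)-(2), assuming gamma = 0.3.
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(size, sigma, lam, theta, gamma=0.3, psi=0.0):
    """Gabor kernel of equation (1); psi = 0 gives the symmetric filter."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)     # x~
    y_t = -x * np.sin(theta) + y * np.cos(theta)    # y~
    envelope = np.exp(-(x_t**2 + (gamma * y_t)**2) / (2.0 * sigma**2))
    return envelope * np.cos(2.0 * np.pi * x_t / lam + psi)

def s1_layer(image, bank, n_orientations=4):
    """S1 of equation (2): one response map per (scale, orientation).

    `bank` is a list of (filter size, sigma, lambda) tuples."""
    thetas = [k * np.pi / n_orientations for k in range(n_orientations)]
    return [[convolve2d(image, gabor_kernel(n, s, l, th), mode='same')
             for th in thetas]
            for (n, s, l) in bank]
```

For example, `s1_layer(grey, [(7, 2.8, 3.5), (11, 4.5, 5.6)])` would use the first two rows of Table 1.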
Layer C1: Layer C1 is a result of the selection of the
maxima over the local spatial neighborhood and the down-
sampling of the result. This pooling increases the tolerance
to two-dimensional transformation from layer S1 to C1,
thereby providing robustness to scale and translation.
Similarly, the C1 unit is a four-dimensional matrix (x/y/θ/σ).
Layer S2: At every position and scale in the C1 layer, layer S2 is generated by template matching between the patch of C1 units centered at that position/scale and each of the $N_{p}$ prototype patches. The S2 unit is calculated as

$$\mathrm{S2}=\exp\left(-\beta\left\|X-P\right\|^{2}\right) \quad (3)$$

where $\beta$ is a tunable parameter, $X$ is an image patch from the C1 layer at scale $\sigma$, and $P$ is one of the $N_{p}$ features. Both $X$ and $P$ are of size $n\times n$, where $n\in\{4,8,12,16\}$. S2 maps are calculated across all positions for each scale $\sigma_{i}$, $i=1,\ldots,N_{\sigma}$. Here, the $N_{p}$ prototype patches (small image patches of dimensionality $n\times n\times N_{\theta}$) are randomly sampled from the C1 layers of training images.
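A hedged one-liner for the S2 tuning of equation (3); β and the patch shapes are free parameters here, not values fixed by the paper.

```python
import numpy as np

def s2_response(X, P, beta=1.0):
    """Equation (3): Gaussian radial basis tuning between a C1 patch X
    and a prototype patch P of the same shape."""
    d = np.asarray(X, float) - np.asarray(P, float)
    return float(np.exp(-beta * np.sum(d * d)))
```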
Layer C2: A set of C2 features that is shift- and scale-invariant is generated by applying the global maximum over all scales and positions of the S2 maps. A vector of $N_{p}$ C2 values is obtained for each image.
III. SPARSE FEATURES OF COLOR IMAGES
In the visual cortex models in which symmetric Gabor filters are used [2][4-5][7], sparse features are actually values related to the edges of objects after a certain number of mappings, since the Gabor filters in the first layer resemble edge-detecting filters and are active only near image edges. According to the phase offset value of the Gabor function in equation (1), there are two filter types: anti-symmetric and symmetric. A filter that deploys an anti-symmetric Gabor function (phase offset 90 or -90 degrees) yields a maximum response exactly at an edge; however, due to the ripples of the Gabor function, there are also flanking responses. A filter that deploys a symmetric Gabor function (phase offset 0 or 180 degrees) yields a maximum that is shifted from the edge. There are actually two maxima: one to the left and the other to the right of the edge. This can cause problems in selecting maxima over local neighborhoods for layer C1, since the symmetric Gabor filter has been used in layer S1 in previous research [2][4-5][7]. Moreover, it adversely affects the calculation of the set of sparse features in layer C2, which uses the localized pooling method whereby the maximum output for each test image is taken in the neighborhood of the training position of the prototype patch extracted from layer C1 [4][7].

When a Gabor energy filter is used, the responses to all lines and edges in the input image are equally strong. This filter provides a smooth response to an edge or a line of appropriate width, with a local maximum exactly at the edge or at the center of the line. If we apply thinning to this response, we obtain one thin line that follows the corresponding edge or line [23]. By contrast, a simple linear Gabor filter (anti-symmetric or symmetric) produces flanking responses owing to the ripples of the Gabor function. Therefore, the use of Gabor energy filters improves the precision of the local maximum calculation and the localized pooling operations.
We propose a new framework for object recognition in which color information is taken into account in the calculation of sparse features of objects, as shown in Fig. 1. Layers in the framework alternate between “sensitivity” (feature detection) and “invariance” (to position, scale, and orientation). An input natural image represented in RGB color space is converted to LAB color space. In addition, rather than symmetric Gabor filters, we use Gabor energy filters in the calculation of the first layer L1. Each color component is convolved with Gabor energy filters at scales $\sigma\in\{\sigma_{1},\ldots,\sigma_{N_{\sigma}}\}$ and orientations $\theta\in\{\theta_{1},\ldots,\theta_{N_{\theta}}\}$. At scale $\sigma$ and orientation $\theta$, the Gabor energy $E_{\sigma,\theta}(x,y,c)$ of a color component is computed from the superposition of phases, as follows:

$$E_{\sigma,\theta}(x,y,c)=\sqrt{\left(I(x,y,c)*g_{\sigma,\lambda,\theta,0}(x,y)\right)^{2}+\left(I(x,y,c)*g_{\sigma,\lambda,\theta,-\pi/2}(x,y)\right)^{2}} \quad (4)$$

where $c$ indicates the index of a color component, $c\in\{c_{1},\ldots,c_{N_{C}}\}$, and $N_{C}$ is the number of color components. We use eight scales, $N_{\sigma}=8$, and the parameters of the Gabor energy filters are listed in Table 1. The superposition of the Gabor energy filter outputs over the color components is computed as

$$E_{Sum,\sigma,\theta}(x,y)=\sum_{i=1}^{N_{C}}E_{\sigma,\theta}(x,y,c_{i}) \quad (5)$$
The normalized response $\bar{E}_{Sum,\sigma,\theta_{i}}(x,y)$ of the output superposition is calculated by using a divisive normalization nonlinearity, in which each non-normalized response (cell) is suppressed by the pooled activity of a large number of non-normalized responses [24], as follows:

$$\bar{E}_{Sum,\sigma,\theta_{i}}(x,y)=K\,\frac{E_{Sum,\sigma,\theta_{i}}(x,y)^{2}}{\rho^{2}+\sum_{j}E_{Sum,\sigma,\theta_{j}}(x,y)^{2}} \quad (6)$$

where $K$ is a constant scale factor and $\rho^{2}$ is the semi-saturation constant. The summation $\sum_{j}E_{Sum,\sigma,\theta_{j}}(x,y)^{2}$ is taken over a large number of non-normalized responses with different tunings and contains the term $E_{Sum,\sigma,\theta_{i}}(x,y)^{2}$ that appears in the numerator. Since the value of $\rho$ is non-zero, the normalized response has a value from 0 to $K$, saturating for high contrasts. This normalization step is necessary, since the model preserves the essential features of linearity in the face of apparently contradictory behavior [24]. Here, the normalized responses are pooled over all orientations. Parameters $K$ and $\rho$ were 1 and 0.225, respectively, in our experiments.

Figure 1. The proposed scheme of object recognition.

TABLE I. THE PARAMETERS OF THE GABOR FILTERS, AS USED IN [2].

Scale   Filter size   Gabor σ   Wavelength λ
1       7 × 7          2.8        3.5
2       11 × 11        4.5        5.6
3       15 × 15        6.7        7.9
4       19 × 19        8.2       10.3
5       23 × 23       10.2       12.7
6       27 × 27       12.3       15.5
7       31 × 31       14.6       18.2
8       35 × 35       17.0       21.2
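To make the L1 computation concrete, here is a sketch of equations (4)-(6) on the three LAB components, reusing the gabor_kernel helper from the earlier sketch; skimage's rgb2lab for the color conversion and the loop structure are our assumptions.

```python
import numpy as np
from scipy.signal import convolve2d
from skimage.color import rgb2lab

def gabor_energy(channel, size, sigma, lam, theta):
    """Equation (4): quadrature pair (phase offsets 0 and -pi/2)."""
    even = convolve2d(channel, gabor_kernel(size, sigma, lam, theta, psi=0.0),
                      mode='same')
    odd = convolve2d(channel, gabor_kernel(size, sigma, lam, theta, psi=-np.pi / 2),
                     mode='same')
    return np.sqrt(even**2 + odd**2)

def l1_layer(rgb_image, bank, thetas, K=1.0, rho=0.225):
    lab = rgb2lab(rgb_image)                      # N_C = 3 color components
    out = []
    for (n, s, l) in bank:                        # e.g. the rows of Table 1
        # Equation (5): superpose energies over the color components.
        e_sum = [sum(gabor_energy(lab[..., c], n, s, l, th) for c in range(3))
                 for th in thetas]
        # Equation (6): divisive normalization pooled over orientations.
        pool = rho**2 + sum(e**2 for e in e_sum)
        out.append([K * e**2 / pool for e in e_sum])
    return out                                    # out[scale][orientation]
```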
+ Layer L2: Layer L2 is a reduced version of L1, obtained by taking the maxima over a local spatial neighborhood and over two adjacent scales. This pooling increases the tolerance to position and scale, providing robustness to translation and scaling. We use spatial pooling sizes proportional to the scale $\sigma$, as presented in [2]. The sizes are $g\times g$, where $g\in\{8,10,12,14,16,18,22,24\}$.
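A sketch of the L2 pooling follows, under our own assumptions about the band pairing and the downsampling stride (half the pool size), which the text does not spell out.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def l2_layer(l1, pool_sizes=(8, 10, 12, 14, 16, 18, 22, 24)):
    """Max over a g x g spatial neighborhood and two adjacent scales."""
    l2 = []
    for i, g in enumerate(pool_sizes):
        j = min(i + 1, len(l1) - 1)               # adjacent scale band
        band = []
        for e_i, e_j in zip(l1[i], l1[j]):        # same orientation, two scales
            m = np.maximum(maximum_filter(e_i, size=g),
                           maximum_filter(e_j, size=g))
            band.append(m[::g // 2, ::g // 2])    # downsample (assumed stride)
        l2.append(band)
    return l2
```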
+ Layer L3: The layer L3 unit is obtained by the convolution product of layer L2 with a prototype patch P centered at scale σ. Both components of the convolution operation are normalized to unit length. The purpose of this step is to maintain the features’ geometrical similarities under variations in light intensity. Before calculating layer L3 of an image, we need to conduct the learning stage to extract a set of prototype patches collected from all training images.
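At a single position, the L3 unit reduces to a normalized dot product; a minimal sketch (the ε guard against division by zero is ours):

```python
import numpy as np

def l3_response(l2_patch, prototype, eps=1e-8):
    """Dot product of an L2 patch and a prototype, both normalized to
    unit length, so the response is insensitive to light intensity."""
    x = np.ravel(l2_patch)
    p = np.ravel(prototype)
    return float(np.dot(x, p) /
                 ((np.linalg.norm(x) + eps) * (np.linalg.norm(p) + eps)))
```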
+ Layer L4: For each image, the number of sparse features obtained is $N_{p}$ if the number of prototype patches in the learning stage is $N_{p}$. We use the localized pooling method for calculating layer L4. The training position of each prototype patch is recorded. Instead of pooling over the entire image, the L4 unit is the maximum output for a test image taken in the neighborhood of the training position. This approach retains some geometric information at the L4 level while gaining global invariance. Thériault et al. [7] considered both cases, localized pooling [4] and multi-resolution pooling, and their results show that multi-resolution pooling at the L4 stage yields better performance (an additional 2% increase when testing on CalTech101 with 30 training examples and 4,080 prototype patches). Here, we used the multi-resolution pooling presented in [7]. The local pooling regions are circles centered at the training position of each prototype patch.
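A sketch of the localized pooling step at L4: the maximum of an L3 response map inside a circle around the recorded training position. Multi-resolution pooling, as in [7], would repeat this at several radii and concatenate the results; the radius handling here is illustrative.

```python
import numpy as np

def l4_feature(l3_map, center, radius):
    """Max response inside a circular region around the prototype's
    recorded training position (cy, cx)."""
    h, w = l3_map.shape
    cy, cx = center
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - cy)**2 + (xx - cx)**2 <= radius**2
    return float(l3_map[mask].max()) if mask.any() else 0.0
```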
+ Learning stage: We adapted the training process presented in [7], in which the learned local scale and orientation fit the local image structure. Training the prototype patches with a lower fitness error increases the network’s invariance to basic local geometrical transformations, and these prototype patches are less sensitive to local perturbations around the axes of relevant image structures. Additionally, local scaling in the network strikes the necessary balance between discriminative power and invariance for classification. Unlike the learning in [2][4-5], instead of selecting prototype patches at a single scale, the learning selects a large pool of prototypes of different sizes $n\times n$, where $n\in\{4,8,12,16\}$, at random positions and multiple scales simultaneously from the L2 layer of random training images. In training, the coefficients that correspond to “weak” scales and orientations are set to zero. Let $N_{P}$ be the number of prototype patches extracted by learning. Let $S=\{\sigma_{1},\sigma_{3},\sigma_{5},\sigma_{7}\}$ and $\Theta=\{k\pi/N_{\theta},\ k=0,\ldots,N_{\theta}-1\}$. The rule for selecting prototype patches is

$$P(x_{i},y_{i},\theta^{*},\sigma^{*})=\begin{cases}B(x_{i},y_{i},\theta^{*},\sigma^{*}) & \text{if } B(x_{i},y_{i},\theta^{*},\sigma^{*})=\max\limits_{\theta\in\Theta,\,\sigma\in S}B(x_{i},y_{i},\theta,\sigma)\\0 & \text{otherwise}\end{cases} \quad (7)$$

where $B(x_{i},y_{i},\theta,\sigma)$ is an image patch of size $n\times n$, $n\in\{4,8,12,16\}$, at a random position $p_{i}(x_{i},y_{i},\sigma_{s})$, $s\in\{1,\ldots,N_{\sigma}\}$, on the layer L2 of a random training image, with $\theta\in\Theta$ and $\sigma\in S$. The patch $B(x_{i},y_{i},\theta^{*},\sigma^{*})$ is obtained from the local maximum over orientations and scales. This rule makes the set of prototype patches more discriminative, whereby weaker scales and orientations are ignored during the testing process. In the learning stage, the images are selected randomly, but prototype patches are picked out equally from the training dataset of each class. The learning procedure is carried out as Algorithm 1.
+ Classification stage: Two sets of sparse features, for the images in the training and testing datasets, are calculated by our framework and passed to the LibSVM classifier [18] for training and classification. For multi-class classification, we used the one-against-all (OAA) SVM approach first introduced by Vapnik [25].

Performance measures: In order to obtain performance measures, decision functions are applied to predict the labels (target values) of the testing data. The prediction result is evaluated as follows:

$$\text{accuracy}=\frac{\#\,\text{correctly predicted data}}{\#\,\text{total testing data}}\times 100\% \quad (8)$$
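For instance, the classification stage could be reproduced with scikit-learn’s wrapper around LIBSVM; the explicit one-against-all wrapper and the linear kernel below are our choices for illustration, and the last line implements equation (8).

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

def train_and_score(train_feats, train_labels, test_feats, test_labels):
    """Fit one-against-all SVMs on L4 features; return accuracy in %."""
    clf = OneVsRestClassifier(SVC(kernel='linear'))
    clf.fit(train_feats, train_labels)
    pred = clf.predict(test_feats)
    return 100.0 * np.mean(pred == np.asarray(test_labels))
```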
 



IV. EXPERIMENTAL RESULTS
Here, we illustrate the use of our proposed framework in object recognition. All images were rescaled to 140 pixels on the shortest side, and the other side was rescaled accordingly to preserve the image aspect ratio. The two color-based sparse feature sets for the training and testing datasets, obtained at the L4 stage, were converted to the LibSVM data format and then passed to LibSVM for recognition and classification. Our object recognition and classification process was executed on static images, not in real time. The reported results are the means and standard deviations of the performance measures after five runs with random splits. We re-implemented the model in [7], because it is the model most closely related to ours. We did not focus on improving the SVM classifiers. In order to make fair comparisons between the methods, we used the same values of the parameters of the algorithms for learning and recognizing objects (e.g., the number of prototype patches, fixed splits on the datasets, etc.), as well as the same SVM parameters.
TABLE II. A COMPARISON OF RECOGNITION PERFORMANCE MEASURES OBTAINED FOR THE DATASETS. THE NUMBER OF PROTOTYPE PATCHES IN THE LEARNING STAGE IS 1,000; THE NUMBER OF NEGATIVE IMAGES IS 200. THE RESULTS IN THE “BENCHMARK” AND “SERRE ET AL. [2]” COLUMNS ARE THOSE ACHIEVED BY THEIR OWN IMPLEMENTATIONS.

Datasets (Unit: %)   Benchmark   Serre et al. [2]   Thériault et al. [7]   Our model
Leaves               84.0        95.9               97.43 ± 0.10           98.85 ± 0.10
Motorcycles          95.0        97.4               98.46 ± 0.09           99.60 ± 0.06
Airplanes            94.0        94.9               96.50 ± 0.15           98.36 ± 0.12
Car rears            84.8        –                  95.32 ± 0.12           98.93 ± 0.16


Figure 2. Sample images from the leaves, car-rear, motorcycle, and airplane datasets [19-20].

Algorithm 1. Selection of prototype patches
For i = 1 : N_P
  + Select one random training image
  + Convert the image to LAB color space
  + Calculate layer L2 of the image
  + Select a random position p_i(x_i, y_i, σ_s), s ∈ {1, …, N_σ}, on layer L2
  + Extract a random patch B(x_i, y_i, θ, σ_s) of size n × n at position p_i
  + If B(x_i, y_i, θ*, σ*) = max_{θ∈Θ, σ∈S} B(x_i, y_i, θ, σ)
        P(x_i, y_i, θ*, σ*) = B(x_i, y_i, θ*, σ*)
    else
        P(x_i, y_i, θ*, σ*) = 0
  + Record prototype patch P(x_i, y_i, θ*, σ*) and position p_i(x_i, y_i, σ_s)
End for i
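A runnable reading of Algorithm 1 follows, simplified to take the maximum within the sampled scale band; compute_l2 is assumed to implement the L1/L2 stages sketched above, and the response-strength criterion is our interpretation of the local-maximum rule.

```python
import random
import numpy as np

def select_prototypes(training_images, n_prototypes, compute_l2,
                      patch_sizes=(4, 8, 12, 16)):
    """Sample random L2 patches; keep only the dominant orientation
    (coefficients of 'weak' orientations are set to zero)."""
    prototypes = []
    for _ in range(n_prototypes):
        image = random.choice(training_images)   # random training image
        l2 = compute_l2(image)                   # assumed: RGB -> LAB -> L1 -> L2
        n = random.choice(patch_sizes)
        s = random.randrange(len(l2))            # random scale band sigma_s
        band = np.stack(l2[s])                   # (orientations, H, W)
        y = random.randrange(band.shape[1] - n)
        x = random.randrange(band.shape[2] - n)
        patch = band[:, y:y + n, x:x + n]        # B(x_i, y_i, theta, sigma)
        strengths = patch.reshape(patch.shape[0], -1).sum(axis=1)
        best = int(np.argmax(strengths))         # theta* with maximal response
        sparse = np.zeros_like(patch)
        sparse[best] = patch[best]               # zero out weak orientations
        prototypes.append((sparse, (x, y, s)))   # patch + recorded position
    return prototypes
```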


Figure 3. Sample images of objects from the CalTech101 dataset (sunflowers, starfishes, crabs, cougar faces, dragonflies, crayfishes).
TABLE III. RECOGNITION PERFORMANCE OBTAINED ON A SUBSET OF 94 CATEGORIES (7,687 IMAGES) FROM THE CALTECH101 DATASET. THE USE OF 15 AND 30 TRAINING IMAGES PER CLASS CORRESPONDS TO COLUMNS 2 AND 3, RESPECTIVELY. THE NUMBER OF PROTOTYPE PATCHES IS 4,000.

Models (Unit: %)        15 images/cat.   30 images/cat.
Thériault et al. [7]    60.80 ± 0.45     69.63 ± 0.38
Our model               65.40 ± 0.40     75.76 ± 0.32


A. Single-class object recognition
Here, the use of our framework for single-class object recognition is demonstrated, and the experimental results are compared with those of previous works, such as the benchmark systems (the constellation models [20]) and the grey-based sparse features in [2][7]. Each object class is recognized independently, and a set of sparse features for each object is extracted from each positive training image dataset. We considered datasets of the following objects: leaves, car rears, airplanes, and motorcycles from [19-20]. Fig. 2 displays sample images from the leaves, car-rear, motorcycle, and airplane datasets [19-20].
In the testing on the car-rear, leaves, airplane, and motorcycle datasets, we used the same fixed splits as in [20] in all of the experiments: each dataset was randomly split into two separate sets of equal size. One set was used for the learning stage, and the other for testing. Table 2 provides a summary of the results achieved by the compared methods. The values in the Benchmark and Serre et al. [2] columns are the results published by the benchmark systems [19-20] and by Serre et al. [2], respectively. Both the model of Serre et al. [2] and that of Thériault et al. [7] use grey images as the input of a deep, biologically inspired architecture, but the performance obtained by the model of Thériault et al. [7] is better, owing to its incorporation of a learning stage in which prototype patches are extracted over spatial position, spatial orientation, and multiple scales. In addition, in place of pooling over an entire image as in [2], Thériault et al. [7] used the localized pooling method for the calculation of sparse feature sets. Nevertheless, our model yielded the highest performance because, along with the learning strategy presented in [7], Gabor energy filters are utilized to increase the precision of the local maximum and localized pooling operations, and, further, color information is taken into account in calculating the sparse features of objects. The experimental results of single-class object recognition show that the use of color information in our model imparts significant improvements to object recognition.
B. Multi-class object recognition

Here, we illustrate the use of our framework for multi-class object recognition. Unlike the case of single-class object recognition, universal sparse features are extracted from random training image datasets and shared by several object classes. We used subsets of the CalTech101 dataset, in which most objects of interest are central and dominant. In CalTech101, objects appear against either a cluttered background or a plain natural scene, and the dataset is composed of both grey and color images. CalTech101 comprises 101 object classes plus a background class, totaling 9,144 images. However, because our framework works on color images, we collected subsets of the dataset for our experiments. Fig. 3 displays sample images of objects from the CalTech101 dataset.
We employed fixed splits as follows: either 15 or 30 images were randomly selected from each object dataset for the training set, and the testing set was collected from the remaining images of each object dataset. Table 3 displays the recognition performance achieved on a subset of 94 categories (7,687 images) from the CalTech101 dataset, corresponding to the cases of 15 and 30 training images per class, with 4,000 prototype patches in the learning stage. The recognition performance for 30 training images per class was better than that for 15 training images per class, even though the number of prototype patches was the same. Moreover, our model yielded better results in both cases (15 and 30 training images per class), specifically an improvement in classification score of around 6%. These results confirm that the use of Gabor energy filters and color information in our deep, biologically inspired architecture yields significant improvements in object recognition and classification.
V. CONCLUSIONS
In this paper, we presented a new framework in which a combination of Gabor energy filters, the structure of the visual cortex model, and color information is used for the extraction of sparse features of color images in the process of object recognition. In the learning stage, a set of prototype patches of color components is selected over spatial position, spatial size, and multiple scales simultaneously, and is extracted by the local maximum over scales and orientations. A set of sparse features of objects is computed by the localized pooling method, after which it is passed to an SVM classifier for object recognition and classification. The utility of our framework in recognizing objects was illustrated on various datasets. The experimental results show that our framework yields significant improvement in object recognition.
REFERENCES
[1] M. Riesenhuber & T. Poggio, “Hierarchical models of object
recognition in cortex,” Nature Neuroscience, 2(11), 1019–1025,
1999.



[2] T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, & T. Poggio, “Robust
object recognition with cortex-like mechanisms,” IEEE Transactions
on Pattern Analysis and Machine Intelligence, 29(3), 411–426, 2007.
[3] J. Mutch & D.G. Lowe, “Multiclass object recognition with sparse
localized features,” In IEEE computer society conference on
computer vision and pattern recognition, New York: CVPR, 2006, pp.
11–18.
[4] J. Mutch & D.G. Lowe, “Object class recognition and localization
using sparse features with limited receptive fields,” International
Journal of Computer Vision, 80(1), 45–57, 2008.
[5] J. Zhang, Y. Barhomi, & T. Serre, “A new biologically inspired color image descriptor,” In Computer vision–ECCV 2012, proceedings of the 12th European conference on computer vision, part V, Heidelberg: Springer-Verlag Berlin, 2012, pp. 312–324.
[6] R. Shapley & M. Hawken, “Color in the cortex: Single- and double-opponent cells,” Vision Research, 51(7), 701–717, 2011.
[7] C. Thériault, N. Thome, & M. Cord, “Extended coding and pooling in
the HMAX model,” IEEE Transactions on Image Processing, 22(2),
764–777, 2013.
[8] Y. Huang, K. Huang, D. Tao, T. Tan, & X. Li, “Enhanced
biologically inspired model for object recognition,” IEEE
Transactions on Systems Man and Cybernetics, part B, 41(6), 1668–
1680, 2011.
[9] S. Lazebnik, C. Schmid, & J. Ponce, “Beyond bags of features: Spatial
pyramid matching for recognizing natural scene categories,” In IEEE
computer society conference on computer vision and pattern
recognition, New York: CVPR, 2006, pp. 2169–2178.
[10] J. Yang, K. Yu, Y. Gong, & T. Huang, “Linear spatial pyramid

matching using sparse coding for image classification,” In IEEE
computer society conference on computer vision and pattern
recognition, Miami: CVPR, 2009, pp. 1794–1801.
[11] N. Petkov & P. Kruizinga, “Computational models of visual neurons
specialized in the detection of periodic and aperiodic oriented visual
stimuli: Bar and grating cells,” Biological Cybernetics, 76(2), 83–96,
1997.
[12] T.T.Q. Bui & K.S. Hong, “Supervised learning of a color-based
active basis model for object recognition,” Proceedings of the 2nd
International Conference on Knowledge and Systems Engineering,
2010, pp. 69-74.
[13] Y.S. Heo, K.M. Lee, & S.U. Lee, “Joint depth map and color
consistency estimation for stereo images with different illuminations
and cameras,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, 2012, doi:

[14] K.E. van de Sande, T. Gevers, & C.G.M. Snoek, “Evaluating color
descriptors for object and scene recognition,” IEEE Transactions on
Pattern Analysis and Machine Intelligence, 32(9), 1582–1596, 2010.
[15] T.T.Q. Bui & K.S. Hong, “Evaluating a color-based active basis
model for object recognition,” Computer Vision and Image
Understanding, 116(11), 1111–1120, 2012.
[16] F.S. Khan, J. van de Weijer, & M. Vanrell, “Modulating shape
features by color attention for object recognition,” International
Journal of Computer Vision, 98(1), 49–64, 2012.
[17] D.R. Martin, C.C. Fowlkes, & J. Malik, “Learning to detect natural
image boundaries using local brightness, color, and texture cues,”
IEEE Transactions on Pattern Analysis and Machine Intelligence,
26(5), 530–549, 2004.
[18] C.C. Chang & C.J. Lin, “LIBSVM: A library for support vector

machines,” ACM Transactions on Intelligent Systems and
Technology, 2(27), 1–27, 2011.
[19] M. Weber, M. Welling, & P. Perona, “Unsupervised learning of
models for recognition,” In Computer vision – ECCV 2000,
proceedings of the 6th European conference on computer vision, part
I, Heidelberg: Springer-Verlag Berlin, 2000, pp. 18–32.
[20] R. Fergus, P. Perona, & A. Zisserman, “Object class recognition by
unsupervised scale-invariant learning,” In Proceedings of the IEEE
computer society conference on computer vision and pattern
recognition, Madison: IEEE Computer Society, 2003, pp. 264–271.
[21] L. Fei-Fei, R. Fergus, & P. Perona, “Learning generative visual
models from few training examples: An incremental Bayesian
approach tested on 101 object categories,” In Proceedings of the
IEEE computer society conference on computer vision and pattern
recognition workshops, Washington: IEEE Computer Society, 2004,
pp. 178–187.
[22] J.G. Daugman, “Uncertainty relation for resolution in space, spatial-
frequency, and orientation optimized by two-dimensional visual
cortical filters,” Journal of the Optical Society of America A-Optics
Image Science and Vision, 2(7), 1160–1169, 1985.
[23] N. Petkov, “Biologically motivated computationally intensive
approaches to image pattern recognition,” Future Generation
Computer Systems, 11(4-5), 451–465, 1995.
[24] D.J. Heeger, “Modeling simple-cell direction selectivity with
normalized, half-squared, linear operators,” Journal of
Neurophysiology, 70(5), 1885–1898, 1993.
[25] V. Vapnik, The nature of statistical learning theory, London:
Springer-Verlag, 1995.
