
J Ambient Intell Human Comput (2012) 3:281–292
DOI 10.1007/s12652-012-0141-z

ORIGINAL RESEARCH

Cluster-based relevance feedback for CBIR: a combination
of query point movement and query expansion
Nhu-Van Nguyen · Alain Boucher · Jean-Marc Ogier · Salvatore Tabbone



Received: 30 June 2011 / Accepted: 5 June 2012 / Published online: 21 June 2012
© Springer-Verlag 2012

Abstract This paper presents a cluster-based relevance feedback method, which combines two popular techniques of relevance feedback: query point movement and query expansion. Inspired by text retrieval, these two techniques give good results for image retrieval. However, query point movement is limited by a unimodality constraint when taking the user feedback into account. Query expansion gives better results than query point movement, but it cannot take into account irrelevant images from the user feedback. We combine the two techniques to profit from their advantages and to cope with their limitations. From a single point initial query, query expansion provides a multiple point query, which is then enhanced using query point movement. To learn the multiple point queries, the irrelevant feedback images are classified into query points which are clustered from relevant images using the query expansion technique. The experiments show that our method gives better results than either of the two relevance feedback techniques taken individually.

N.-V. Nguyen (✉) · J.-M. Ogier
L3i, University of La Rochelle, La Rochelle, France

N.-V. Nguyen · A. Boucher
IFI, MSI team; IRD, UMI 209 UMMISCO; Vietnam National University, Hanoi, Vietnam

N.-V. Nguyen · S. Tabbone
QGAR-LORIA, University of Lorraine, Nancy, France

Keywords Image retrieval · Relevance feedback · Query point movement · Query expansion

1 Introduction

There are two reasons for the limited performance of all Content-Based Image Retrieval (CBIR) systems. The first is that it is impossible to fully express the user intent in a simple retrieval query. The second is the semantic gap, which can be defined as the difference between the user's interpretation of an image and the computer's description of it. To address these problems, several researchers (Zhou and Huang 2003; Nguyen et al. 2009; Apostol et al. 2005; Kim et al. 2005; Ritendra et al. 2008; Ortega and Mehrotra 2004; Yoshiharu et al. 1998) have applied relevance feedback (RF) techniques to CBIR over the last decade. Significant performance improvements had already been witnessed in the application of RF techniques in the traditional text retrieval domain. Nowadays, RF has become an essential component of a CBIR system.
RF is an interactive strategy that is effective at improving the accuracy of information retrieval systems. The basic idea of RF is that the user is involved in the retrieval process so that the final result set is improved. In particular, the user gives feedback on the relevance of documents in an initial set of results, which adapts the retrieval process to a specific user and a specific query. The user first submits a query (an image, in our case), then receives some results. After that, the user interacts with the system by labeling some images as relevant or irrelevant with respect to the given query. The system, in turn, computes a better, revised set of retrieval results based on the user feedback. RF has a short-term memory, which means that the system remembers the results during the interaction process for the given query. Once the process is finished, the system clears its memory and the next user starts from scratch.
Various relevance feedback techniques have been proposed to improve retrieval performance: feature weight learning (Yoshiharu et al. 1998), query modification (Ortega and Mehrotra 2004), and classifier learning (Tao et al. 2006). Among them, query representation modification is the most popular technique and is widely used in both image retrieval and text retrieval. Query modification includes two different techniques: query expansion and query point movement. The first technique, query point movement (Ortega and Mehrotra 2004; Yoshiharu et al. 1998), refers to retrieval by a single point query (as represented in the feature space) which is modified via relevant and irrelevant images, representing positive and negative feedback from the user. It works under the assumption of the unimodality of relevant images (Yimin and Aidong 2004). Unimodality means that all relevant images are similar to each other and form a cluster distinct from the other images in the feature space. Query point movement tries to obtain the ideal query point by moving it towards relevant images and away from irrelevant ones. The second technique, query expansion (Ortega and Mehrotra 2004; Kim et al. 2005), refers to retrieval by multiple point queries. Instead of assuming a single unimodal distribution, query expansion assumes many smaller unimodal distributions and constructs multiple point queries from the relevant images. Query expansion is arguably one of the most effective approaches to relevance feedback.
In this paper, a novel method for combining these two
techniques is proposed for query by example in CBIR.
Query expansion is used to construct multiple point queries
by clustering the relevant images. Query point movement is
used to improve the representation of the multiple point
queries by applying the Rocchio technique (Salton 1971)
on the relevant and the irrelevant images. Our contribution
is a cluster-based relevance feedback technique which uses
the query point movement technique and the irrelevant
examples to enhance the efficiency of query expansion.

This paper is divided into six sections. Section 2 describes the related work, and the remaining problems are discussed in Sect. 3. Section 4 presents our method. Section 5 discusses the evaluation and presents experimental results on a large dataset of 30K images. Section 6 concludes the paper and gives some directions for future work.

2 Related work

Because of the difficulty of fully expressing the user intent in a simple query and the problem of the semantic gap, there have been many works focusing on relevance feedback. Various relevance feedback techniques have been proposed: feature weight learning (Yoshiharu et al. 1998), query modification (Ortega and Mehrotra 2004), and classifier learning (Tao et al. 2006). Feature weight learning improves the distance function, query modification looks for the ideal query point, and classifier learning uses the relevant/irrelevant images as training data to construct a probabilistic classifier. Among the relevance feedback techniques, query modification is based on the text retrieval approach and is often considered the best approach to relevance feedback in image retrieval systems. This traditional type of approach is still very efficient compared to all other techniques in both fields, text retrieval and image retrieval. In the general context of the image retrieval process and the development of relevance feedback techniques, a recognized problem is the small number of available examples. We state the hypothesis that a user can label at most about 20 images, while most learning techniques require many more. If we compare the Rocchio algorithm for query modification with learning algorithms (classifier optimization), such as neural networks, it can be understood that the popularity of query modification is related to the fact that it requires very few training examples.
To detail these two techniques for query modification, we must first define the concept of unimodality of an image group. Unimodality is a concept used by some authors in the field of relevance feedback (Karthik and Jawahar 2006; Yimin and Aidong 2004) to characterize the fact that the images closest to a query in the feature space are not all relevant to the query. However, there is no clear definition of this concept, so we define it as:

Definition The concept of unimodality of an image group means that all images in this group are similar and form a group distinct from the other images in the feature space. In relevance feedback, images in a group are similar in the sense of their relevance to the given query. The relevance can be estimated using an arbitrary threshold or function, or, in the case of our work, indicated by the user who labels some images in the retrieval results as relevant or irrelevant. Relevance is then a subjective notion, meaning that an image satisfies the query as judged by the human user. An image group is defined as centered on the query in the feature space, or in other words, as the closest retrieval results for the given query.
For example, in Fig. 1, the left group is unimodal while the right group is not.
The query modification technique, which we focus on in this paper, can be achieved using either of two approaches: query point movement and query expansion. In both approaches, the input is a single point query (or a vector in the feature space). Query point movement aims at moving the single point query in the feature space (adjusting the feature vector of the query point, Fig. 2). Query expansion aims at replacing the single point query by a multiple point query (replacing a feature vector by multiple feature vectors, Fig. 3). Each technique uses the incremental information from the interactions with the user, or in other terms, the relevant/irrelevant images returned (labeled) by the user.

Fig. 1 Unimodality of an image group based on the user feedback: relevant (+) and irrelevant (−) result images compared with the given query. A non-unimodal image group (a group that includes irrelevant images, as judged by the user given the query) may contain some unimodal subgroups, as in the right group, which contains 3 unimodal subgroups (not centered on the query). In our work, we try to identify these unimodal subgroups within a non-unimodal image group

Fig. 2 Query point movement. a The initial query and the user feedback (relevant "+" and irrelevant "−" result images). b The query moves toward the relevant images. c The query moves toward the relevant images until it is positioned at the center of the relevant images

Fig. 3 Query expansion: a a single point query is replaced by b a multiple point query, using the user feedback (relevant "+" example images only)

2.1 Query point movement

In the query point movement approach (Ortega and Mehrotra 2004; Yoshiharu et al. 1998) for query by example in CBIR, a query is represented by a single point in the feature space, and the refinement process attempts to reformulate the query vector to move it closer to the area containing relevant images (see Fig. 2). Under the assumption of the unimodality of relevant images, the optimal query maximizes the similarity to relevant images and minimizes the similarity to irrelevant ones (Kim et al. 2005). The Rocchio technique (Salton 1971) is often used to compute the optimal query:

q_{i+1} = \alpha q_i + \frac{\beta}{|D_r|} \sum_{d \in D_r} d - \frac{\gamma}{|D_n|} \sum_{d \in D_n} d \qquad (1)

where q_i is the query at iteration i of the relevance feedback process, D_r is the relevant set, D_n is the irrelevant set, and \alpha, \beta and \gamma give the relative weights of q, D_r and D_n. In experiments, the parameter setting \alpha = \beta = \gamma = 1 is widely used for image retrieval.

2.2 Query expansion

In the query expansion approach (Kim et al. 2005; Ortega and Mehrotra 2004), the query is modified by selectively adding new relevant points to the query representation. A single point query is replaced by a multiple point query (see Fig. 3). Instead of assuming a unimodal distribution as in query point movement, query expansion assumes many smaller unimodal distributions and constructs multiple local clusters from the relevant images. The representatives of the local clusters are used to perform multiple point querying. The clustering of relevant images is repeated for each relevance feedback iteration. Querying by multiple points is investigated in (Xiangyu and James 2003; Natsev and Smith 2003; Thijs and de P. Vries Arjen 2004; Tahaghoghi et al. 2002; Apostol et al. 2005; Danzhou et al. 2009), which focus on the similarity function and the fusion of multiple single point queries. Experimental evaluation in (Ortega and Mehrotra 2004) shows that query expansion outperforms query point movement in retrieval effectiveness.
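As a concrete illustration of this expansion step, the sketch below clusters the relevant images with a plain k-means and uses the centroids as the multiple point query. This is only a stand-in: the systems discussed here use adaptive clustering that chooses the number of clusters automatically (see Sect. 4.1), whereas k is fixed by hand in this sketch.

```python
import numpy as np

def expand_query(relevant, k, iters=20, seed=0):
    """Sketch of query expansion: cluster the relevant images with a
    plain k-means (a stand-in for the adaptive clustering used in the
    paper) and use the cluster centroids as the multiple point query."""
    rng = np.random.default_rng(seed)
    X = np.asarray(relevant, dtype=float)
    # initialize centers on k distinct relevant images
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each relevant image to its nearest center
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers
```

Each returned centroid then plays the role of one single point query inside the multiple point query.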
Recently, new approaches have aimed to improve the query modification technique. The QCluster system (Kim et al. 2005) uses a new adaptive classification and cluster-merging method to find multiple regions. The clustering step is not repeated as in query expansion: QCluster classifies relevant examples into the previous clusters or creates a new cluster. The number of clusters is limited to a fixed maximum by using a cluster-merging method. But this complex approach is unable to make effective use of irrelevant examples. All the above methods still have drawbacks such as local maximum traps and slow convergence. In (Danzhou et al. 2009), the authors propose a fast query point movement technique to get rid of these drawbacks. However, their work aims at specific target search using relevance feedback, which differs from the category search done in classical CBIR. Target search in CBIR systems refers to finding a specific (target) image, such as a particular registered logo or a specific historical photograph.
2.3 Multiple point query

Query expansion requires support for multiple point querying. Querying by multiple points is investigated in (Thijs and de P. Vries Arjen 2004; Tahaghoghi et al. 2002; Apostol et al. 2005), which are concerned with the similarity function and the fusion of multiple single point queries. The similarity of images for each single point query is determined independently. The result for a single point query is an ordered list. The lists from all single point queries must be combined to determine the final ranking for the multiple point query. A combining function is therefore required to reduce multiple similarity values to a single value. When this reduction has been performed for all images in the collection, the user is presented with a list of the images in decreasing order of similarity. All combining functions can be grouped into three types: MINIMUM, MAXIMUM and SUM. These types define the distance of an image from the specified multiple point query to be, respectively, the minimum, the maximum, or the (weighted) sum of the distances to each single point query. In our experiments, the MINIMUM function is found to be the best combining function in terms of robustness, which is also confirmed by Tahaghoghi et al. (2002).
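A minimal sketch of the three combining functions (an illustration, not the authors' implementation; Euclidean distances and uniform weights are assumed):

```python
import numpy as np

def combined_distance(image_vec, query_points, mode="min"):
    """Reduce the distances from one image to each single point query
    of a multiple point query into one value (MINIMUM/MAXIMUM/SUM)."""
    dists = [np.linalg.norm(image_vec - q) for q in query_points]
    if mode == "min":
        return min(dists)
    if mode == "max":
        return max(dists)
    return sum(dists)  # "sum" combining

def rank(collection, query_points, mode="min"):
    """Rank the whole collection by increasing combined distance."""
    scored = [(combined_distance(v, query_points, mode), i)
              for i, v in enumerate(collection)]
    return [i for _, i in sorted(scored)]
```

Under the MINIMUM rule, an image is ranked by its closest sub-query, which is what lets a multiple point query cover several distinct relevant regions at once.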

3 Remaining problems

The main disadvantage of query point movement is the constraint of unimodality (see the definition in Sect. 2) on relevant examples. The main problem for query expansion is its difficulty in using irrelevant images effectively. In query point movement, the query point is moved closer to the relevant examples and away from the irrelevant ones in the feature space. When the relevant images are grouped in distinct subsets in the feature space (that is to say, the distribution of the relevant examples is not unimodal), the problem arises from the need to cover multiple clusters with a single query. In these cases, the ideal query point includes irrelevant examples. Figure 4 shows the ellipse representing the line equidistant from a new query; we can see some irrelevant examples included in the relevant ellipses.

Fig. 4 Remaining problems with query point movement and query expansion. a In query point movement, the ideal query point can include some irrelevant examples (−) due to the non-unimodality of the relevant examples. b In query expansion, ideal query points converge slowly when irrelevant examples (−) are not used. Both techniques can result in a local maximum trap
Query expansion and its best improved version, QCluster (Kim et al. 2005), only use relevant examples to form multiple point queries. The query expansion technique does not use irrelevant examples because clustering relevant and irrelevant examples together would give false groups. Our analysis suggests that without irrelevant examples, convergence towards the ideal query point can be very slow, and the risk of falling into a local maximum is not insignificant. Indeed, a false ideal query point can be reached when the local group is close to some relevant examples but also located near many irrelevant ones (see Fig. 4). We can see from this figure that irrelevant examples may be included in local groups, because these are constructed based only on relevant examples, regardless of the presence of irrelevant ones. In general, relevance feedback techniques mostly use relevant feedback examples. The management of irrelevant feedback examples remains a major improvement factor, and thus a very open scientific question (Xuanhui et al. 2008).

4 Cluster-based relevance feedback for CBIR

In this section, we present our approach, which attempts to provide precise answers to the questions previously identified. This approach exploits irrelevant examples and combines query point movement and query expansion.

Fig. 5 Combination of query point movement and query expansion, where ideal query points are achieved more efficiently and quickly and irrelevant examples are not present in local clusters. a The initial single point query and the feedback (relevant "+" and irrelevant "−") given by the user. b The multiple point query obtained by query expansion. c The multiple point query is moved towards relevant feedback and away from irrelevant feedback using query point movement

A combination of query point movement and query expansion is proposed to overcome the problems related to both techniques. The main drawback of query point movement is the constraint of unimodality on relevant examples, which cannot always be verified. We solve this problem by using a clustering technique to build multiple local clusters that satisfy local unimodality, using relevant examples. The main drawback of query expansion is its inability to make effective use of irrelevant examples. In our approach, we propose a sequential combination of the two techniques: first query expansion (Fig. 5b), then query point movement (Fig. 5c). We take advantage of irrelevant examples by applying the query point movement technique to the multiple local clusters created by query expansion. We believe this sequential combination is the best among all possible combinations because it ensures the unimodality constraint and makes use of irrelevant examples (Fig. 5c) to effectively reach the ideal query. The opposite combination (first query point movement, then query expansion) is not as good, since query expansion cannot profit from the irrelevant examples used in query point movement.
The purpose of this technique is to reach the ideal query through interaction with the user and to overcome the identified problems of both query point movement and query expansion. The first relevance feedback interaction loop is shown in Fig. 6. Initially, a single point query is formalized using the feature vector of a query image q: Q = (f_1, f_2, ..., f_n), an n-dimensional vector in the feature space. Then images are retrieved and the first N images are shown to the user (who has a limited view due to screen interface constraints). The user identifies and labels relevant/irrelevant images in the RF interaction process, without assuming that the relevant examples in the result are unimodal (Fig. 6, steps 1 and 2). Based (only) on the relevant/irrelevant images returned by the user, the technique replaces and improves the single point query q with a multiple point query q_i, i > 1 (a query with multiple feature vectors), using two main processes: query expansion and query point movement.
Fig. 6 Main steps of the cluster-based relevance feedback

First, the single point query q is expanded into a multiple point query to ensure the unimodality (of each sub-query), which addresses the problem of query point movement (Fig. 6, step 3): the relevant examples are clustered into c groups C_1, C_2, ..., C_c. The number of clusters c is selected automatically using an adaptive clustering technique and is limited to a maximum value. In this step, we try to obtain clusters/groups that are always unimodal. The two clustering algorithms used in our system are presented at the end of this section. Second, in order to find the ideal points of the c relevant groups, the query point movement technique is used: the irrelevant examples are classified into these c groups (Fig. 6, step 4) to identify the irrelevant examples present in each local group (in contrast with query expansion, where only relevant examples are used). The relevant and irrelevant examples in each group are then used to build the multiple point query with Eq. 1 (Fig. 6, step 5), in which we try to move the query points closer to the relevant images and away from the irrelevant images. The k Nearest Neighbors (k-NN) classifier is used in step 4 for the classification of irrelevant examples because of its efficiency and simplicity; the parameter k of the classifier is selected as follows:

k = \min(|C_i|, \; i = 1, \ldots, c) \qquad (2)

and the query point \vec{q}_i of cluster i is calculated using Rocchio's formula (Salton 1971):

\vec{q}_i = \frac{1}{m} \sum_{j=1}^{m} \vec{R}_j - \frac{1}{n} \sum_{j=1}^{n} \vec{I}_j \qquad (3)

where I_1, I_2, ..., I_n are the n irrelevant examples and R_1, R_2, ..., R_m the m relevant examples of the local cluster C_i. These c query points form the final multiple point query.
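Steps 4 and 5 can be sketched as follows (an illustrative sketch, not the authors' code: a simple majority-vote k-NN with k chosen by Eq. 2, followed by the per-cluster Rocchio rule of Eq. 3):

```python
import numpy as np

def knn_assign(point, clusters, k):
    """Assign an irrelevant example to the cluster owning the majority
    of its k nearest relevant neighbours (step 4; Eq. 2 chooses k)."""
    labelled = [(np.linalg.norm(point - r), ci)
                for ci, cluster in enumerate(clusters) for r in cluster]
    labelled.sort(key=lambda t: t[0])
    votes = [ci for _, ci in labelled[:k]]
    return max(set(votes), key=votes.count)

def build_multipoint_query(clusters, irrelevant):
    """Steps 4-5: classify the irrelevant examples into the c relevant
    clusters, then apply Rocchio's rule (Eq. 3) inside each cluster."""
    k = min(len(c) for c in clusters)                  # Eq. 2
    assigned = [[] for _ in clusters]
    for x in irrelevant:
        assigned[knn_assign(x, clusters, k)].append(x)
    query_points = []
    for rel, irr in zip(clusters, assigned):
        q = np.mean(rel, axis=0)                       # (1/m) sum of R_j
        if irr:
            q = q - np.mean(irr, axis=0)               # minus (1/n) sum of I_j
        query_points.append(q)
    return query_points
```

The c returned points are the final multiple point query, ready to be ranked with a combining function such as MINIMUM (Sect. 2.3).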
As discussed above, in the first interaction loop, the initial query (one single point) is replaced by a multiple point query by building local groups (clustering step). For the following interaction loops, there are two choices to improve the multiple point query. The first choice does not rely on the first multiple point query (the clustering step of the first iteration), but re-clusters the relevant examples at each iteration. This method attempts to add relevant query points and to remove irrelevant points from the same query, based on all relevant/irrelevant examples from each interaction loop. Clustering and classification are repeated at each iteration in this method. The second choice is to move the points of the first query to ideal points based on the new relevant/irrelevant examples from the following interactions. This method assumes that one can reach ideal query points from the first constructed query points. Since we do not rebuild the local groups, the clustering step is performed only once at the beginning (during the first interaction loop); in the following interactions the query is built based on the multiple point query from the first iteration.
We can observe that the first choice is more influenced by query expansion than by query point movement, because it re-creates the query points by clustering at each iteration. In contrast, the second choice is more influenced by query point movement, because it attempts to move the multiple point query towards the ideal query. We call these two methods Clustering-Repeat (CR) and Clustering-No-Repeat (CNR). The two corresponding algorithms are described below.
Clustering-Repeat (CR) In this approach, the clustering step on relevant examples, the classification step on irrelevant examples and the multiple point query construction step are repeated at each iteration of relevance feedback. Thus, the system performs the same process at every iteration. The query of the previous iteration does not directly affect the new query of the current iteration. Examples from the previous iterations are also included in the current iteration. Implicitly, relevant points are added and irrelevant ones are dropped as we move from one iteration to the next.
Clustering-No-Repeat (CNR) In this approach, the previous query directly affects the new query. The clustering step on relevant examples is performed only once, at the beginning (first iteration). Then, during subsequent iterations, instead of performing a new clustering as in the CR method, both relevant and irrelevant examples are classified into the points of the previous query, thus taking advantage of the previous query. The new query points are then refined from the relevant/irrelevant examples using the query point movement technique.

In these two algorithms, we can observe that the difference lies in steps 3, 4 and 5. In the CNR method, step 3 is performed only once (at the first iteration), while it is repeated at every iteration in the CR method. In step 4, only the irrelevant set is classified into the clusters in the CR algorithm, while both sets (relevant and irrelevant) are classified into the clusters in the CNR algorithm. In step 5 of the CR algorithm, the relevant set is used to rebuild the local groups (step 3 is repeated). Finally, the formula used to construct the multiple point query differs between the two algorithms.
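The CNR update for one interaction loop can be sketched as follows (illustrative only: a nearest-point assignment stands in for the classification step, and Eq. 1 is applied with \alpha = \beta = \gamma = 1):

```python
import numpy as np

def assign_to_nearest(points, centers):
    """Group each feedback example with its nearest query point (a
    simple stand-in for the classification step of this sketch)."""
    groups = [[] for _ in centers]
    for x in points:
        i = int(np.argmin([np.linalg.norm(x - c) for c in centers]))
        groups[i].append(x)
    return groups

def cnr_iteration(query_points, relevant, irrelevant):
    """Clustering-No-Repeat: no re-clustering; both feedback sets are
    classified into the existing query points, each of which is then
    moved by Rocchio's rule (Eq. 1, alpha = beta = gamma = 1)."""
    pos = assign_to_nearest(relevant, query_points)
    neg = assign_to_nearest(irrelevant, query_points)
    new_points = []
    for q, P, N in zip(query_points, pos, neg):
        q_new = np.asarray(q, dtype=float)
        if P:
            q_new = q_new + np.mean(P, axis=0)   # pull towards relevant
        if N:
            q_new = q_new - np.mean(N, axis=0)   # push away from irrelevant
        new_points.append(q_new)
    return new_points
```

A CR iteration would instead re-run the clustering of all relevant examples seen so far before this per-point update.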
Discussion In this section, we have presented our approach with two variants for relevance feedback. Our approach combines two techniques of query modification, query point movement and query expansion, to take advantage of irrelevant examples, to address the problem of unimodality, and to try to eliminate all irrelevant examples from the result. Both variants of our approach (the Clustering-Repeat and Clustering-No-Repeat methods) aim at finding the ideal query points as we move from one interaction loop to another. The first method (Clustering-Repeat) aims to replace irrelevant query points by relevant query points. The second method (Clustering-No-Repeat) aims to move query points to ideal points. The first method (CR) is more dependent on the performance of the clustering method used than the second, because in the CR method the clustering is repeated at every iteration. The second method (CNR) is more dependent on the construction of the initial points. For example, if all the possible relevant examples can be represented in n distinct groups but the relevant examples labeled by the user and used to construct the initial points belong to c < n distinct groups, this can produce a loss in the result.

The computational complexity of the two algorithms is the sum of the complexities of the clustering and classification methods used. In our case, Competitive Agglomeration is a fuzzy clustering method with a computational complexity of O(CDN), C being the number of prototypes, D the dimensionality of the data points and N the number of data points to cluster. The k-NN classification method has a complexity of O(DN), where D is the dimensionality and N the total number of points. The total complexity, O(CDN) + O(DN), is suitable for retrieval in large image datasets, remembering that, by our assumption, the number of samples processed at each interaction (relevant/irrelevant examples) is very small, estimated at 20 at most (limited by the quantity of images that the user can label).

4.1 Selection of clustering method

In our relevance feedback approach, an important step concerns the clustering of the user feedback. Clustering is used to group the relevant images into separate groups. In our system, the number of groups is unknown. We are therefore interested in clustering methods capable of automatically determining the optimal number of groups. We have experimented with two methods: Adaptive K-Means (Kothari and Pitts 1999) and Competitive Agglomeration (Frigui and Krishnapuram 1997). These two methods are chosen for their ability to automatically determine the number of groups, and they are representative of two known types of clustering methods in the literature: hierarchical methods and partitional methods.

Adaptive K-Means The best known algorithm for clustering is the k-means method. For p patterns

\{x^l : l = 1, 2, \ldots, p\}, \quad x^l \in \mathbb{R}^n \qquad (4)

the k-means method obtains the positions of the k cluster centers y^m by minimizing the cost function

\sum_{l=1}^{p} \sum_{m=1}^{k} I(y^m \mid x^l) \, \|x^l - y^m\|^2 \qquad (5)

where \|\cdot\| denotes a distance metric and I(y^m \mid x^l) is an indicator function which equals 1 if m = \arg\min_{\mu} \|x^l - y^{\mu}\|^2 and 0 otherwise.

In the Adaptive K-Means method (Kothari and Pitts 1999), the proposed cost function is

\sum_{l=1}^{p} \sum_{m=1}^{k} I(y^m \mid x^l) \, \|x^l - y^m\|^2 + \text{extra term} \qquad (6)

\text{extra term} = \sum_{l=1}^{p} \sum_{m=1}^{k} \lambda_m \tilde{I}(y^m \mid x^l) \, \|y^m - y^x\|^2 \qquad (7)

where \tilde{I}(y^m \mid x^l) is an indicator function which equals 1 if y^m \in N_{y^x}, with x = \arg\min_{\mu} \|x^l - y^{\mu}\|^2, and N_{y^x} is the neighborhood of the cluster center y^x.

There are two terms in the cost function: the first is the k-means cost, the second is an extra term that minimizes the sum of squared distances from a cluster center to the cluster centers nearby. Smaller values of the neighborhood encourage the formation of several centers in separate clusters, while larger values of the neighborhood encourage the formation of fewer distinct cluster centers. The Adaptive K-Means method treats the neighborhood as a scale parameter and provides the number of cluster centers at different values of this scale parameter. The number of cluster centers in the data is then obtained from the stability of the clusters as the scale parameter is varied.
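The inner k-means cost of Eq. 5 reduces, for each pattern, to the squared distance to its nearest center, since the indicator I(y^m | x^l) selects exactly that center. A minimal sketch of this base cost (the adaptive extra term of Eqs. 6-7 is not included):

```python
import numpy as np

def kmeans_cost(X, centers):
    """Eq. 5: sum over the p patterns of the squared distance to the
    nearest of the k cluster centers (the indicator I selects it)."""
    X = np.asarray(X, dtype=float)
    C = np.asarray(centers, dtype=float)
    # squared distance of every pattern to every center, shape (p, k)
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())
```

Minimizing this quantity over the center positions is what the k-means iterations do; Adaptive K-Means adds the regularizing extra term before minimizing.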
Competitive agglomeration This second clustering method, by Frigui and Krishnapuram (1997), minimizes an objective function that integrates the advantages of hierarchical and partitional clustering techniques. The Competitive Agglomeration algorithm produces a sequence of partitions with a decreasing number of groups. Competitive Agglomeration begins by partitioning the data into a specified number of groups, and finally provides the "best" number of groups. During the clustering phase, adjacent groups play against each other to capture data points; groups that gradually lose the competition shrink and disappear, until only groups with large cardinality survive. The algorithm can incorporate different distance measures in the objective function to find groups of various shapes.
Discussion on clustering methods In our experiments, different clustering methods were studied to compute the local groups. Taking advantage of the benefits of both hierarchical and partitional clustering, Competitive Agglomeration (Frigui and Krishnapuram 1997) seems to produce the best performance in our extensive testing. Another advantage of this clustering method is the automatic selection of the number of groups. Our experiments have shown that the choice of the clustering and classification methods does not influence the final result much, because the total number of samples (relevant/irrelevant) is very small. Let us recall here that the user marks only a few examples as relevant or irrelevant during the relevance feedback process. We present the experiment comparing these clustering methods in the result section of this paper.

5 Evaluation
We have presented our contribution on relevance feedback for content-based image retrieval with two methods. These methods are based on a combination of two popular techniques: query point movement and query expansion. The main idea of our approach is to avoid the problems associated with query point movement and query expansion in order to enhance the search results, providing a good tool for improving the performance of image retrieval. In this section we present the experiments used to evaluate our relevance feedback methods.
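To fix ideas, the two query-modification techniques and their combination can be sketched on plain feature vectors. The Rocchio weights, the round-robin split that stands in for real clustering, and the use of all irrelevant examples for every query point are illustrative simplifications, not the exact formulation of our method (which clusters the relevant images and classifies the irrelevant ones into the resulting query points).

```python
def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def rocchio(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Query point movement: shift the single query point towards the
    relevant mean and away from the irrelevant mean (Rocchio-style)."""
    r, s = mean(relevant), mean(irrelevant)
    return [alpha * q + beta * ri - gamma * si for q, ri, si in zip(query, r, s)]

def query_expansion(relevant, n_points=2):
    """Query expansion: group the relevant examples; each group centre
    becomes one point of a multiple-point query. A trivial round-robin
    split stands in for real clustering here."""
    groups = [relevant[i::n_points] for i in range(n_points)]
    return [g for g in groups if g]

def combined(relevant, irrelevant, n_points=2):
    """The combination: expand into several query points, then move each
    point with the irrelevant examples (here all of them are used)."""
    return [rocchio(mean(g), g, irrelevant)
            for g in query_expansion(relevant, n_points)]

relevant = [[1.0, 0.0], [1.2, 0.0], [0.0, 1.0], [0.0, 1.2]]
irrelevant = [[-1.0, -1.0]]
print(combined(relevant, irrelevant))  # two moved query points
```

Each expanded query point is pushed away from the irrelevant region while staying anchored to its own group of relevant examples, which is the intuition behind the CR and CNR variants evaluated below.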
5.1 Experimental protocol
For our experiments, we use three different databases: the Corel 30K image database (Gustavo et al. 2007), the Caltech256 database (Griffin et al. 2007) and the Pascal VOC2011 database (Everingham et al. 2007). User interactions are simulated using external knowledge corresponding to the manual annotations in these databases. Three methods of relevance feedback are evaluated in this experiment: query point movement, query expansion, and our proposed method with its two variants, Clustering-Repeat (CR) and Clustering-No-Repeat (CNR).
The content-based image retrieval system used in the experiments is based on the state-of-the-art Bag of Words model (Sivic and Zisserman 2008). Visual words are built using the SIFT feature, computed as in (Sivic and Zisserman 2008). All the results presented in this section evaluate the improvement between the initial response of the system (after the initial query) and the one obtained after relevance feedback (in percent of improvement for the precision and recall measures).
5.1.1 Experimental databases
The Corel 30K image database contains 30,000 images divided into different categories by experts, with 100 images in each class. The Caltech256 database contains about 30,000 images divided into 256 categories by experts, with about 100 images in each class. The Pascal VOC2011 database contains about 15,000 images, each image belonging to one or several of its 23 categories (multiple-class images).
We rely on a simulation of human interaction, using the annotations already present in Corel30K, Caltech256 and PascalVOC2011 to play a role similar to that of a human. A technique of pseudo-relevance feedback is used to automatically simulate human interactions during relevance feedback. Our approach relies on the textual annotations given for the images in these databases, which offer various possibilities for specifying a ground truth for validation.




5.1.2 Discussion on the protocols used for other systems
For the MARS system (Ortega and Mehrotra 2004), the images relevant to a query image are selected as follows. A query image Q is selected at random from the database and the first 50 images are retrieved. This set of 50 images is referred to as the set relevant(Q). New queries are then constructed by moving around Q (these queries are close to Q in the feature space), considering Q as the ideal query. Queries are chosen around Q in the hope that they will reach the ideal query Q through relevance feedback. The first 100 images are then retrieved, and they become the set retrieved(Q). In MARS, precision and recall are computed from the relevant(Q) and retrieved(Q) sets using the classical formulas below:
precision = |relevant(Q) ∩ retrieved(Q)| / |retrieved(Q)|    (8)

recall = |relevant(Q) ∩ retrieved(Q)| / |relevant(Q)|    (9)
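These two measures can be computed directly from the two sets; the image identifiers in the example below are arbitrary.

```python
def precision_recall(relevant, retrieved):
    """Precision and recall of Eqs. (8) and (9), on sets of image ids."""
    hits = len(set(relevant) & set(retrieved))
    return hits / len(retrieved), hits / len(relevant)

# e.g. 100 relevant images, 100 retrieved images, 40 in common
relevant = set(range(100))
retrieved = set(range(60, 160))
p, r = precision_recall(relevant, retrieved)
print(p, r)  # 0.4 0.4
```

Note that when |relevant(Q)| = |retrieved(Q)|, as in the example, recall equals precision; this is exactly the situation discussed later for the Corel30K and Caltech256 experiments, where both sets have about 100 images.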

For the MARS system (Ortega and Mehrotra 2004), the relevant set is selected by ensuring unimodality, since all images are visually similar to a query image. The authors assume that all the relevant images form a unimodal distribution, an assumption which is not entirely realistic and creates an implicit limitation of the approach. In addition, this work averages all measures over only about 100 queries, which is very small compared to the number of images in the database. In another example, the QCluster system (Kim et al. 2005), the ground truth is relatively simple because the high-level category information from the Corel database is used as ground truth for simulating the relevance feedback. Images of the same class are considered the most relevant, and related categories (such as flowers and plants) are also considered relevant. This assumption creates an easy condition for the relevance feedback, because the number of relevant images is then higher compared with other approaches [e.g. MARS (Ortega and Mehrotra 2004)], which explains the good quality of the results reported for the QCluster system.

5.1.3 Our experimental protocol
For our experiment, we take as ground truth the classes of the images in Corel30K, Caltech256 and PascalVOC2011, which yields a wide variety of classes but seems representative of real-life conditions. We measure the retrieval performance with the classical recall/precision criteria over the first 100 retrieved responses (we assume that the user can see only 100 results on the screen interface). Most studies (Huiskes and Lew 2008; Yimin and Aidong 2004; Faria et al. 2010) on relevance feedback use only a sub-database (10, 20 or 50 categories) for experiments on Corel30K and Caltech256, because of the great number of images in these databases (30,000) while the number of images in each category is small (100). This is done to stress the effect of relevance feedback in the validation process. Following a similar protocol, we divide the whole database into five different experiment sets to ensure that there are relevant images among the first 100 images retrieved. The PascalVOC2011 database has 14,961 images with 275 to 1,366 images in each class (except for one class, which has 7,419 images), so there is no need to divide this database. For the experimentation, we use about 5,000 queries for each experiment set.
One parameter of relevance feedback is the number of feedbacks given by the user at each iteration. This number of training examples is usually small. In our experiments, we rely on the assumption that a maximum of 20 images can be selected by the user. These images are chosen as the first P relevant examples and the first N irrelevant examples among the first 100 responses, where P + N ≤ 20. These examples are automatically returned by the system using the ground truth, as we use a technique of pseudo-relevance feedback to simulate human interaction automatically. We propose two strategies for the number of examples:
1. Ten relevant examples and 10 irrelevant examples in the case of query point movement, CR and CNR; and 20 relevant examples in the case of query expansion. We recall that query expansion does not use irrelevant examples, because this technique attempts to combine the relevant examples to form the multiple-point query.
2. Five relevant examples and 5 irrelevant examples in the case of query point movement, CR and CNR; and 10 relevant examples in the case of query expansion.
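The selection of the P relevant and N irrelevant examples from the ranked list can be sketched as follows. The `is_relevant` predicate stands for the ground-truth annotations, and the toy ranking is illustrative; the limits in the example correspond to strategy 2 above with P = N = 5.

```python
def simulate_feedback(ranking, is_relevant, max_pos=10, max_neg=10):
    """Pseudo-relevance feedback: the ground truth plays the user,
    marking the first relevant and irrelevant images in the top 100."""
    positives, negatives = [], []
    for image in ranking[:100]:  # the user sees at most 100 results
        if is_relevant(image) and len(positives) < max_pos:
            positives.append(image)
        elif not is_relevant(image) and len(negatives) < max_neg:
            negatives.append(image)
    return positives, negatives

# toy ranking where even ids belong to the query class
ranking = list(range(30))
pos, neg = simulate_feedback(ranking, lambda i: i % 2 == 0, 5, 5)
print(pos, neg)  # [0, 2, 4, 6, 8] [1, 3, 5, 7, 9]
```

The positives feed all four techniques, while the negatives are used only by query point movement, CR and CNR, as explained above.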

5.2 Results and discussion
5.2.1 Retrieval performance over 3 image databases
In this section, the four relevance feedback techniques are compared according to the protocol described above. As mentioned above, we compute the classical recall/precision criteria over the first 100 retrieved responses. As the number of images of each class in the Corel30K and Caltech256 databases is about 100 (thus the number of relevant images is equal to the number of retrieved images), the recall over the first 100 retrieved images is equal to the precision.
For the Corel30K database, in the case of experiments based on 10 sample images (Fig. 7), our methods are better than query expansion and query point movement, and the CNR method is slightly better than the CR method. After two iterations of relevance feedback, query point movement has the worst performance, while the other three methods perform equivalently. During subsequent iterations, both the CR and CNR methods become better than the traditional techniques. The average precision of the traditional techniques is approximately 0.244 after five iterations, while the CNR method has an average accuracy of 0.288 and the CR method an average accuracy of 0.279. From these results, the improvement in accuracy of our methods over the traditional techniques is 18 %.

Fig. 7 Corel30K: average accuracy for the first 100 retrieved images for the four techniques of relevance feedback with 10 feedback examples for each iteration. QE, query expansion; QPM, query point movement; CR, Clustering-Repeat; CNR, Clustering-No-Repeat. Both the CR and CNR methods show very good performance compared to existing query modification techniques
In the case of experiments with 20 feedback images (Fig. 8), the CNR method outperforms all the other methods. Our methods have better performance in the early iterations, but the accuracy of the CR method is not better than query point movement in the following iterations. In this case, query expansion gives the worst performance; query point movement and the CR method have the same performance, with an average accuracy of about 0.305, while the CNR method has the best average accuracy, 0.39. The improvement in accuracy of the CNR method compared with the traditional techniques is 28 % in this experiment.
Our methods give better results than the query modification techniques used in MARS (Ortega and Mehrotra 2004). Both also provide a significant improvement in average accuracy compared to QCluster (Kim et al. 2005). They show improvements of 18 and 28 % (respectively for 10 and 20 examples of relevance feedback in the first 100 retrieved images) compared with the traditional techniques. QCluster has an improvement of 20 % compared with the traditional techniques, but for this approach the number of examples is the maximum number of relevant images in the first 100 result images. This number is greater than the number of examples in our proposed methods (20 at most). In reality, the approach proposed by QCluster seems unrealistic in terms of usage, because it is difficult to ask for so many interactions from the user; a system asking the user for 20 interactions seems more realistic than one asking for 100. In addition, QCluster and MARS are evaluated on only 100 queries, and their ground truths are selected solely for their own methods. Our method is evaluated on 5,000 queries, which makes its evaluation much more generic than those of QCluster and MARS.

Fig. 8 Corel30K: average accuracy for the first 100 images for the four techniques with 20 examples of relevance feedback per iteration. QE, query expansion; QPM, query point movement; CR, Clustering-Repeat; CNR, Clustering-No-Repeat. The CNR method gives the best result
For the Caltech256 database, based on 20 sample images (Fig. 9), query expansion is the worst, and query point movement and the CR method are equivalent. At the first iteration, all methods have the same performance; in the following iterations CR is better than query point movement, but at the 5th iteration query point movement is better than CR. Only the CNR method is always better than the other methods. The average precision of the best traditional technique is 0.308 after 5 iterations, while the CNR method has an average accuracy of 0.368 and the CR method an average accuracy of 0.296. The improvement in accuracy of the CNR method over the traditional techniques is about 20 %.
For the PascalVOC2011 database, based on 20 sample images (Fig. 10), query expansion is also the worst, and query point movement is better than the CR method. For the first iteration, the two traditional techniques have better performance than our methods. During the subsequent iterations, query point movement is better than the CR method, but the CNR method always outperforms all the other methods. The average precision of the best traditional technique is about 0.393 after 5 iterations, while the CNR method has an average accuracy of 0.464 and the CR method an average accuracy of 0.370. The improvement in accuracy of the CNR
method over the traditional techniques is about 18 % in this experiment.

Fig. 9 Caltech256: average accuracy for the first 100 images for the four techniques with 20 examples of relevance feedback per iteration. QE, query expansion; QPM, query point movement; CR, Clustering-Repeat; CNR, Clustering-No-Repeat. The CNR method gives the best result

Fig. 10 PascalVOC2011: average accuracy for the first 100 images for the four techniques with 20 examples of relevance feedback per iteration. QE, query expansion; QPM, query point movement; CR, Clustering-Repeat; CNR, Clustering-No-Repeat. The CNR method gives the best result

5.2.2 Comparison of clustering methods

Our algorithms are mainly based on the clustering of the sample images. We presented our selection of clustering approaches in Sect. 4: adaptive K-means and competitive agglomeration. In this section, these two methods are compared on the basis of image retrieval performance.

Figure 11 illustrates the average accuracy for the first 100 retrieved images. In both the CR and CNR cases, Competitive Agglomeration is slightly better than Adaptive K-means. We can see that the choice of the clustering method does not much influence the results, because the total number of samples (relevant/irrelevant examples) is very low. Note that in our system, the user labels only a few examples (20 at most) as relevant or irrelevant during an interaction.

Fig. 11 Average accuracy for the first 100 retrieved images for our two techniques of RF with 20 feedback examples for each iteration, with different clustering methods: adaptive K-means and competitive agglomeration. CR, Clustering-Repeat; CNR, Clustering-No-Repeat. In both the CR and CNR cases, competitive agglomeration is slightly better than adaptive K-means, the difference being relatively small

6 Conclusion

In this article, we propose a new method for relevance feedback called cluster-based relevance feedback. It is inspired by two existing relevance feedback techniques: query point movement and query expansion. Taking advantage of the irrelevant images and of the strengths of both traditional techniques, our method gives better results.
The cluster-based relevance feedback is proposed with two different variants: CR and CNR. By combining the two query modification techniques, query point movement and query expansion, these two approaches can benefit from the irrelevant examples. In all cases, CNR gives the best result; Clustering-Repeat gives good results when the number of feedback examples is low. Our method does not require complex computations, but offers very significant improvements in accuracy compared to the traditional techniques.
As the relevance feedback methods presented here are valid for both text and image retrieval, we are planning, in the near future, to extend our cluster-based relevance feedback by combining text-based and content-based image retrieval. To achieve this, a text/image learning model is needed and can be built onto the same relevance feedback model. This learning model would be considered as long-term memory relevance feedback, because knowledge would be learnt and stored in the system for long-term use, as opposed to the short-term memory relevance feedback presented in this article.
Acknowledgments This project is supported in part by the ICT-Asia IDEA project from the French Ministry of Foreign Affairs (MAE), the DRI INRIA and the DRI CNRS.

References
Apostol N, Milind N, Jelena T (2005) Learning the semantics of
multimedia queries and concepts from a small number of
examples. In: MULTIMEDIA ’05: Proceedings of the 13th
annual ACM international conference on Multimedia, ACM,
New York, NY, USA, pp 598–607
Danzhou L, Hua A, Vu K, Yu N (2009) Fast query point movement
techniques for large cbir systems. IEEE Trans Knowl and Data
Eng 21(5):729–743
Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A
(2007) The PASCAL Visual Object Classes Challenge 2011
(VOC2011) Results. />VOC/voc2011/workshop/index.html
Faria FF, Veloso A, Almeida HM, Valle E, Torres RdS, Gonc¸alves
MA, Meira W Jr (2010) Learning to rank for content-based
image retrieval. In: Proceedings of the international conference
on Multimedia information retrieval, MIR ’10, ACM, New York,
NY, USA, pp 285–294
Frigui H, Krishnapuram R (1997) Clustering by competitive agglomeration. Pattern Recogn 30(7):1109–1119
Griffin G, Holub A, Perona P (2007) Caltech-256 object category
dataset. Tech. Rep. 7694, California Institute of Technology
Gustavo C, Chan B, Moreno J, Vasconcelos N (2007) Supervised
learning of semantic classes for image annotation and retrieval.
IEEE Trans Pattern Anal Mach Intell 29(3):394–410
Huiskes J, Lew S (2008) Performance evaluation of relevance
feedback methods. In: CIVR ’08: Proceedings of the 2008
international conference on Content-based image and video
retrieval, ACM, New York, NY, USA, pp 239–248
Karthik PS, Jawahar CV (2006) Analysis of relevance feedback in content based image retrieval. In: Ninth international conference on control, automation, robotics and vision, pp 1–6
Kim D, Chung C, Barnard K (2005) Relevance feedback using
adaptive clustering for image similarity retrieval. J Syst Softw
78(1):9–23
Kothari R, Pitts D (1999) On finding the number of clusters. Pattern
Recogn Lett 20(4):405–416
Natsev A, Smith J (2003) Active selection for multi-example querying by content. In: ICME ’03: proceedings of the 2003 international conference on multimedia and expo, IEEE Computer Society, Washington, DC, USA, pp 445–448
Nguyen NV, Ogier JM, Tabbone S, Boucher A (2009) Text retrieval
relevance feedback techniques for bag of words model in cbir.
In: International conference on machine learning and pattern
recognition (ICMLPR), Paris, France, pp 541–546
Ortega M, Mehrotra S (2004) Relevance feedback techniques in the
mars image retrieval system. Multimed Syst 9:535–547
Ritendra D, Dhiraj J, Jia L, James ZW (2008) Image retrieval: Ideas,
influences, and trends of the new age. ACM Comput Surv 40(2):
1–60
Salton G (ed) (1971) The SMART retrieval system—experiments in
automatic document processing. Prentice Hall, Englewood,
Cliffs
Sivic J, Zisserman A (2008) Efficient visual search for objects in
videos. Proc IEEE 96(4):548–566
Tahaghoghi M, Thom A, Williams E (2002) Multiple example queries in content-based image retrieval. In: SPIRE 2002: proceedings of the ninth international symposium on string processing and information retrieval, Springer-Verlag, London, pp 227–240
Tao D, Tang X, Li X, Wu X (2006) Asymmetric bagging and random
subspace for support vector machines-based relevance feedback
in image retrieval. IEEE Trans Pattern Anal Mach Intell 28(7):
1088–1099
Thijs W, de P Vries Arjen (2004) Multimedia retrieval using multiple
examples. In: Image and video retrieval, lecture notes in
computer science, vol 3115, Springer, Berlin, pp 2048–2049
Xiangyu J, James CF (2003) Improving image retrieval effectiveness
via multiple queries. In: MMDB ’03: Proceedings of the 1st
ACM international workshop on Multimedia databases, ACM,
New York, NY, USA, pp 86–93
Xuanhui W, Hui F, ChengXiang Z (2008) A study of methods for
negative relevance feedback. In: Proceedings of the 31st annual
international ACM SIGIR conference on Research and development in information retrieval, ACM, New York, NY, USA,
SIGIR ’08, pp 219–226
Yimin W, Aidong Z (2004) Interactive pattern analysis for relevance
feedback in multimedia information retrieval. Multimedia Syst
10:41–55
Yoshiharu I, Ravishankar S, Christos F (1998) Mindreader: Querying databases through multiple examples. In: VLDB ’98: proceedings of the 24th international conference on very large data bases, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp 218–227
Zhou S, Huang S (2003) Relevance feedback in image retrieval: a
comprehensive review. Multimedia Syst 8(6):536–544



