Mobile Netw Appl (2014) 19:618–625
DOI 10.1007/s11036-014-0526-7

Content-Based Image Retrieval Using Moments of Local
Ternary Pattern
Prashant Srivastava & Nguyen Thanh Binh &
Ashish Khare

Published online: 18 July 2014
© Springer Science+Business Media New York 2014

Abstract Due to the availability of a large number of digital images, an efficient content-based indexing and retrieval method is required. The emergence of smartphones and modern PDAs has further increased the need for such systems. This paper proposes a combination of Local Ternary Pattern (LTP) and moments for Content-Based Image Retrieval. The image is divided into blocks of equal size and the LTP codes of each block are computed. Geometric moments of the LTP codes of each block are then computed, followed by the distance between the moments of the LTP codes of the query and database images. A threshold on the distance values is then applied to retrieve images similar to the query image. The performance of the proposed method is compared with other state-of-the-art methods on the Corel-1,000 database. The comparison shows that the proposed method gives better results in terms of precision and recall than other state-of-the-art image retrieval methods.

Keywords Image retrieval · Content-based image retrieval · Local ternary pattern · Geometric moments
P. Srivastava · A. Khare (*)


Department of Electronics and Communication, University of
Allahabad, Allahabad, Uttar Pradesh, India
e-mail:
A. Khare
e-mail:
P. Srivastava
e-mail:
N. T. Binh
Faculty of Computer Science and Engineering, Ho Chi Minh City
University of Technology, Ho Chi Minh, Vietnam
e-mail:

1 Introduction
With the advent of numerous digital image libraries containing huge numbers of images of different types, it has become necessary to develop systems capable of efficient browsing and retrieval of images. With the emergence of mobile phones and smartphones, the number of images is also increasing day by day. Pure text-based image retrieval systems are prevalent but are unable to retrieve visually similar images. It is also impractical to annotate a large number of images manually. Hence, a pure text-based approach is insufficient for image retrieval.
Content-Based Image Retrieval (CBIR), the retrieval of images on the basis of features present in the image, is an important problem in Computer Vision. Instead of keywords and text, CBIR uses visual features such as colour, texture and shape to search for an image in a large database [1,2]. These features form a feature set which acts as an indexing scheme for searching an image database. The feature sets of query images are compared with those of database images to retrieve visually similar images. Since retrieval is based on the contents of the image, arranging and classifying images is easier, as it does not require manual annotation. Automatic classification groups similar images together and makes them easier for users to access.
Early image retrieval systems were based on primitive features such as colour, texture and shape. The field of image retrieval has witnessed substantial work on the colour feature. Colour is a visible property of an object and a powerful descriptor. Colour-based CBIR systems use the conventional colour histogram to perform retrieval. Texture is another feature that has been used extensively for image retrieval. Texture features represent the structural arrangement of a region and describe characteristics such as smoothness, coarseness and roughness. One such feature is the Local Binary Pattern (LBP) [3], which is applied to gray-level images. LBP is a very powerful descriptor as it is easy to compute and is invariant to gray-level transformations. However, being based on the bit values 0 and 1, the LBP operator fails to discriminate between multiple patterns. The LBP operator is also highly sensitive to noise in the image. Tan et al. [4] extended LBP to the Local Ternary Pattern (LTP). LTP thresholds neighbourhood pixels to three values and is less sensitive to noise than LBP. However, LTP is not invariant to gray-level transformations.
Content-based retrieval methods based on shape features have been used extensively. Shape here does not mean the shape of the whole image but the shape of a particular object or region in the image. Shape features generally act as global features. Global features consider the whole image when extracting features; however, they do not capture local variations in the image. Unlike colour and texture, shape features are generally used after segmentation of objects from images [5]. Since segmentation is a difficult problem, shape features have not been exploited much. Still, shape is considered a powerful descriptor. A single feature is insufficient to construct an efficient feature vector, which is essential for efficient image retrieval. Combining more than one feature attempts to solve this problem. The combinations of colour and texture [6], colour and shape [7], and colour, texture and shape [8] have been widely used for this purpose.
Modern image retrieval methods combine local and global features of an image to perform efficient retrieval. The combination exploits the advantages of both kinds of feature. This property has motivated us to combine the local feature LTP with the global feature moments, in the form of moments of LTP. Grayscale images are divided into blocks of equal size and the LTP codes of each block are computed. Geometric moments of these LTP codes are then computed to form the feature vector. The Euclidean distance between blocks of the query image and database images is computed to measure similarity, followed by computation of threshold values to find images similar to the query image.
The rest of the paper is organized as follows. Section 2 discusses related work in the field of image retrieval. Section 3 describes the fundamentals of LTP and image moments along with their properties. Section 4 presents the proposed method. Section 5 discusses experimental results and Section 6 concludes the paper.

2 Related work
Over the past few decades the field of image retrieval has witnessed a number of approaches to improving retrieval performance. Text-based approaches are still in use and almost all web search engines follow this approach. Early CBIR systems were based on colour features; later, colour-based techniques adopted colour histograms. Texture features then caught the attention of researchers and were used extensively for image retrieval. Texture features such as LBP and LTP are considered powerful descriptive features and have been used for various applications.
Pietikäinen et al. [9] proposed a block-based method for image retrieval using LBP. Murala et al. [10] proposed two new features, namely Local Tetra Patterns (LTrP) and Directional Local Extrema Pattern (DLEP) [11], based on the concept of Local Binary Pattern (LBP), as features for image retrieval. Liu et al. [12] proposed the Multi-texton Histogram (MTH), which is considered an improvement of the Texton Co-occurrence Matrix (TCM) [13]. MTH works for natural images. The Microstructure Descriptor (MSD) is described in [14]. This feature computes local features by identifying colours that have similar edge orientations.
Shape has also been exploited as a single feature as well as in combination with other features. Zhang et al. [15] proposed a region-based shape descriptor, namely the Generic Fourier Descriptor (GFD). A two-dimensional Fourier transform was applied to a polar raster sampled shape image in order to extract the GFD, which was then used to determine the shape of the object. Lin et al. [16] proposed a rotation, translation and scale invariant method for shape identification which is also applicable to objects with a modest level of deformation. Yoo et al. [17] proposed a histogram of edge directions, called edge angles, to perform shape-based retrieval. Srivastava et al. [18] used moments for CBIR. The method divided images into blocks and computed geometric moments of each block. The Euclidean distance between blocks of the query image and database image was computed, followed by computation of a threshold to retrieve visually similar images.
However, these features have been exploited as single features, which are not sufficient for constructing a powerful feature vector. The combination of two or more features therefore emerged as a promising direction in the field of image retrieval, as it combines the advantages of all the features. Yu et al. [19] proposed the combination of SIFT, LBP and HOG descriptors in a bag-of-features model in order to exploit both local and global features of the image. The combination of wavelets with other features has also been exploited for image retrieval. A combination of the Gabor filter and Zernike moments has been proposed in [20]. The Gabor filter performs texture extraction while the Zernike moment performs shape extraction. This method has been applied to face recognition, fingerprint recognition and shape recognition. Wavelets have also been used with colour as the wavelet correlogram in [21]. Wavelets have the powerful characteristic of multiresolution analysis, and it is because of this property that they have been used extensively for image retrieval. The combination of the á trous wavelet with the microstructure descriptor (MSD), as the á trous gradient structure descriptor, has been proposed in [22]. Wang et al. [8] incorporated colour, texture and shape features for image retrieval. The colour feature is exploited using fast colour quantization, texture features are extracted using filter decomposition and shape features are exploited using pseudo-Zernike moments. Li et al. [23] proposed the use of the phase and magnitude of Zernike moments for image retrieval. Deselaers et al. [24] compared a number of features for image retrieval on different databases.

Image moments and various types of moment-based invariants play an important role in object recognition and shape analysis. The (p+q)th order geometric moment $M_{pq}$ of a gray-level image f(x,y) is defined as

$$ M_{pq} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x^{p} y^{q} f(x,y)\, dx\, dy \tag{2} $$

In the discrete case [25], the integral in Eq. (2) reduces to a summation and Eq. (2) becomes

$$ M_{pq} = \sum_{x=1}^{n} \sum_{y=1}^{m} x^{p} y^{q} f(x,y) \tag{3} $$

3 Features used and their properties
3.1 Local ternary patterns

Local Ternary Pattern (LTP) is an extension of Local Binary Pattern (LBP). Whereas the LBP operator thresholds a pixel to the 2-valued codes 0 and 1, LTP thresholds a pixel to 3-valued codes. Gray levels in a zone of width ±t around the centre pixel value c are quantized to 0, those above this zone are quantized to +1 and those below it are quantized to −1. That is,

$$ LTP(p, c, t) = \begin{cases} 1, & p \ge c + t \\ 0, & |p - c| < t \\ -1, & p \le c - t \end{cases} \tag{1} $$

where p is the gray level of a neighbouring pixel, c is the gray level of the centre pixel and t is a user-specified threshold.
In order to eliminate negative values, the LTP values are divided into two channels, the upper LTP (ULTP) and the lower LTP (LLTP). The ULTP is obtained by replacing the negative values by 0. The two channels of LTP are treated as separate entities, for which separate histograms and similarity metrics are computed and combined at the end. An example of the computation of LTP, with t=5, is shown in Fig. 1.
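As a minimal sketch of Eq. (1) and the two-channel split, the following NumPy function computes upper and lower LTP codes for the interior pixels of a grayscale image. The 3×3 neighbourhood, the clockwise bit ordering, the packing of each channel into an 8-bit code and the name `ltp_codes` are assumptions made for illustration. The lower channel follows the usual convention of Tan and Triggs [4] (−1 mapped to 1, everything else to 0), which the text above does not spell out.

```python
import numpy as np

def ltp_codes(gray, t=5):
    """Return (upper, lower) LTP code maps for the interior of a 2-D image.

    Assumptions: 8 neighbours of a 3x3 window, bits ordered clockwise
    starting at the top-left neighbour.
    """
    g = np.asarray(gray).astype(np.int32)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    # (row offset, column offset) of each neighbour; bit k belongs to offsets[k]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    upper = np.zeros_like(center)
    lower = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        diff = neigh - center
        # Eq. (1): +1 when p >= c + t, -1 when p <= c - t, 0 otherwise
        upper |= (diff >= t).astype(np.int32) << bit   # keeps the +1 entries
        lower |= (diff <= -t).astype(np.int32) << bit  # keeps the -1 entries
    return upper, lower

# Hypothetical 3x3 patch with centre value 50 and t = 5
patch = np.array([[60, 50, 40],
                  [70, 50, 44],
                  [55, 46, 30]])
up, lo = ltp_codes(patch, t=5)
print(up[0, 0], lo[0, 0])  # 193 28
```

Keeping the two channels as separate non-negative code maps is what allows moments to be computed on each of them independently later on.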
3.2 Properties of LTP
LTP has the following important properties:
1. LTP is less sensitive to noise than LBP.
2. LTP is not invariant to gray-level transformations.

3.3 Moments
A moment is a measure of the shape of an object. Image moments are useful for describing objects after segmentation.

In Eq. (3), n × m is the size of the gray-level image f(x,y).
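The discrete geometric moment of Eq. (3) can be sketched in a few lines of NumPy. The function name `geometric_moment`, the choice of letting x index rows and y index columns, and the use of 1-based coordinates (matching the summation limits of Eq. (3)) are assumptions for illustration.

```python
import numpy as np

def geometric_moment(f, p, q):
    """Discrete geometric moment M_pq of a 2-D array f, following Eq. (3)."""
    f = np.asarray(f, dtype=np.float64)
    n, m = f.shape
    x = np.arange(1, n + 1, dtype=np.float64)[:, None]  # x = 1..n (rows)
    y = np.arange(1, m + 1, dtype=np.float64)[None, :]  # y = 1..m (columns)
    return float(np.sum((x ** p) * (y ** q) * f))

# A uniform 2x3 image: M_00 is the area, M_10/M_00 and M_01/M_00 the centroid
f = np.ones((2, 3))
m00 = geometric_moment(f, 0, 0)
print(m00)                                # 6.0
print(geometric_moment(f, 1, 0) / m00)    # 1.5
print(geometric_moment(f, 0, 1) / m00)    # 2.0
```

Because the terms x^p y^q grow quickly with the order, float64 accumulation is used here rather than integer arithmetic.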
Simple properties of an image that can be found via image moments include its area, its centroid and information about its orientation. Moment features are invariant to geometric transformations. Such features are useful for identifying objects with unique shapes regardless of their size and orientation. Being invariant under linear coordinate transformations, the moment invariants are useful features in pattern recognition problems. Moments have been used for distinguishing between the shapes of different aircraft, character recognition and scene matching applications. The following properties of image moments are very useful in image retrieval:
1. Moment features are invariant to geometric transformations.
2. Moment features provide enough discriminating power to distinguish among objects of different shapes.
3. Moment features provide efficient local descriptors for identifying the shape of objects.
4. An infinite sequence of moments uniquely identifies an object.

3.4 Local ternary patterns and moments
A single feature fails to capture the complete information of an image. A combination of features is required to incorporate the fine details of an image when constructing a feature vector. Combining local and global features is one such approach. Local features help in capturing local variations, whereas global features capture a holistic view of the image, so the approach combines the advantages of both. The combination of LTP and moments fulfils these criteria. LTP, a local feature, captures texture details and acts as a powerful classifier. Moment, a global feature, determines the shape of an object in the image and is invariant to geometric transformation. The advantages of this combination are summarized as follows:
1. LTP is less sensitive to noise than LBP, and hence the combination of LTP with moments is less affected by the presence of noise.
2. The use of geometric moments as a single feature creates numerical instabilities, as higher-order moments take very large values [26]. The combination of LTP and moments overcomes this disadvantage, as the moment values of LTP are not very high.
3. Geometric moments are invariant to geometric transformations. Hence their combination with LTP incorporates this advantage in the LTP-Moment feature vector.

4 The proposed method

The proposed method consists of three steps:
1. The first step divides the image into blocks and computes the LTP codes of each block.
2. The second step computes geometric moments of the LTP codes of the query image and database images.
3. The third step computes a threshold to perform retrieval.
The schematic diagram of the proposed method is shown in Fig. 2.

Fig. 1 Computation of LTP

4.1 Computation of LTP codes
The algorithm for computation of LTP codes is as follows:
1. Convert the image to grayscale.
2. Rescale the image to 252×252.
3. Divide the image into blocks of 84×84 and compute the LTP codes of each block.
4. Computation of LTP yields two values: upper LTP (ULTP) and lower LTP (LLTP).

4.2 Computation of moments
Geometric moments of the ULTP and LLTP codes are computed using Eq. (3). The sequence of moments chosen here is 0 to 15. The moment values of ULTP and LLTP are computed separately.
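The block division of Section 4.1 and the per-block moments of Section 4.2 can be sketched as follows for one LTP channel. The input is assumed to be a 252×252 array of ULTP or LLTP codes (grayscale conversion, rescaling and LTP coding are taken as already done). Reading "0 to 15" as the 16 moments with 0 ≤ p, q ≤ 3 is only one possible interpretation and is exposed as the hypothetical parameter `max_pq`; the function names are likewise illustrative.

```python
import numpy as np

def split_into_blocks(img, block=84):
    """Split a 2-D array into non-overlapping block x block tiles (row-major)."""
    h, w = img.shape
    if h % block or w % block:
        raise ValueError("image size must be a multiple of the block size")
    return (img.reshape(h // block, block, w // block, block)
               .swapaxes(1, 2)
               .reshape(-1, block, block))

def block_moment_features(codes, block=84, max_pq=3):
    """Geometric moments M_pq (0 <= p, q <= max_pq) of every block.

    Returns an array of shape (number_of_blocks, (max_pq + 1) ** 2).
    The set of orders is an assumption, not taken from the paper.
    """
    blocks = split_into_blocks(np.asarray(codes, dtype=np.float64), block)
    coords = np.arange(1, block + 1, dtype=np.float64)
    powers = np.stack([coords ** k for k in range(max_pq + 1)])  # (K, block)
    # m[b, p, q] = sum_x sum_y x^p * y^q * f_b(x, y), i.e. Eq. (3) per block
    m = np.einsum("px,bxy,qy->bpq", powers, blocks, powers)
    return m.reshape(len(blocks), -1)

codes = np.ones((252, 252))           # stand-in for a ULTP or LLTP code map
features = block_moment_features(codes)
print(features.shape)                 # (9, 16)
```

The same function would be called once on the ULTP codes and once on the LLTP codes, since the two channels are handled separately.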
4.3 Distance measurement
Let the moments of the LTP codes for the different blocks of the query image be represented as $m_Q = (m_{Q1}, m_{Q2}, \ldots, m_{Qn})$, and let the moments of the LTP codes for the different blocks of a database image be represented as $m_{DB} = (m_{DB1}, m_{DB2}, \ldots, m_{DBn})$. Then the Euclidean distance between the block LTP moments of the query and database image is given as

$$ D(m_Q, m_{DB}) = \sqrt{\sum_{i=1}^{n} \left( m_{Qi} - m_{DBi} \right)^{2}} \tag{4} $$
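A minimal sketch of the Euclidean distance of Eq. (4) between two moment vectors is given below. The function name `moment_distance` is hypothetical, and the vectors are assumed to have already been flattened to the same length.

```python
import numpy as np

def moment_distance(m_q, m_db):
    """Euclidean distance between query and database moment vectors, Eq. (4)."""
    m_q = np.asarray(m_q, dtype=np.float64).ravel()
    m_db = np.asarray(m_db, dtype=np.float64).ravel()
    if m_q.shape != m_db.shape:
        raise ValueError("moment vectors must have the same length")
    return float(np.sqrt(np.sum((m_q - m_db) ** 2)))

print(moment_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```

Since moments of different orders differ greatly in magnitude, the highest-order terms tend to dominate this distance unless the vectors are normalized; whether any normalization is applied is not stated in the text.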
4.4 Computation of threshold
A threshold is used to perform retrieval. Using a threshold improves the retrieval results compared with those obtained without one. The basic idea behind threshold computation is to find the range of distance values that returns images similar to the query image. The Euclidean distance values computed using Eq. (4) are sorted in ascending order so that images are arranged according to their similarity to the query image, that is, the most similar image first and the others after it. The indices of similar images are stored along with their distance values to identify the minimum and maximum values of the range. This determines the range of similarity to a query image. The procedure is repeated for every image of the database to find the range of similarity. Finally, the minimum and maximum over all ranges are determined. These values determine the threshold for the entire category of similar images. This is done for all categories of images in the database. The threshold values for upper LTP and lower LTP are computed separately. To compute the threshold, let
(i) N be the total number of relevant images in the database and NDB be the total number of images in the database.
(ii) sortmat be the matrix of distance values sorted in ascending order and minix be the first N indices of images in sortmat.
(iii) start_range and end_range be the range of relevant images in the database.
(iv) maxthreshold and minthreshold be, respectively, the maximum and minimum distance values for each query image.
(v) mthreshmat be the maximum of all the values of maxthreshold.
The threshold is then computed from these quantities.
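The description above can be read as the following procedure. This is only a sketch of one interpretation of the prose: the function name `category_threshold`, the precomputed distance matrix `dist`, and the 0-based inclusive category range are assumptions, and how the resulting range is finally used to accept or reject database images is not specified here.

```python
import numpy as np

def category_threshold(dist, start_range, end_range, n_relevant):
    """Distance range (min, max) for one image category.

    dist[i, j] is the Eq. (4) distance between query i and database image j.
    Images start_range..end_range (inclusive) are assumed to form the category.
    """
    dist = np.asarray(dist, dtype=np.float64)
    min_per_query, max_per_query = [], []
    for q in range(start_range, end_range + 1):
        order = np.argsort(dist[q], kind="stable")  # ordering used for sortmat
        minix = order[:n_relevant]                   # first N indices
        in_category = minix[(minix >= start_range) & (minix <= end_range)]
        if in_category.size == 0:
            continue
        d = dist[q, in_category]
        min_per_query.append(d.min())                # minthreshold for query q
        max_per_query.append(d.max())                # maxthreshold for query q
    if not max_per_query:
        raise ValueError("no relevant images among the top matches")
    # the maximum over queries plays the role of mthreshmat
    return float(min(min_per_query)), float(max(max_per_query))

# Toy example: images 0-1 form one category, images 2-3 another
dist = [[0, 1, 5, 6],
        [1, 0, 4, 7],
        [5, 4, 0, 2],
        [6, 7, 2, 0]]
print(category_threshold(dist, 0, 1, n_relevant=2))  # (0.0, 1.0)
```

In the method described here, such a range would be computed separately for the distances obtained from ULTP moments and from LLTP moments.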



Fig. 2 Schematic diagram of the
proposed method

5 Experiment and results
To evaluate the proposed method, images from the Corel-1K database [27] have been used. The images in this database are classified into ten categories, namely Africans, Beaches, Buildings, Buses, Dinosaurs, Elephants, Flowers, Horses, Mountains and Food. Each image is of size either 256×384 or 384×256. Each category consists of 100 images. Each image has been rescaled to 252×252 to ease the computation. Sample images from each category are shown in Fig. 3.
Each image of this database is taken as a query image. If a retrieved image belongs to the same category as the query image, the retrieval is considered successful; otherwise the retrieval fails.



5.1 Performance evaluation
Performance of the proposed method has been measured in terms of precision and recall. Precision is defined as the ratio of the total number of relevant images retrieved to the total number of images retrieved. Mathematically, precision can be formulated as

$$ \text{Precision} = \frac{I_R}{T_R} \tag{5} $$

where $I_R$ denotes the total number of relevant images retrieved and $T_R$ denotes the total number of images retrieved.

Recall is defined as the ratio of the total number of relevant images retrieved to the total number of relevant images in the database. Mathematically, recall can be formulated as

$$ \text{Recall} = \frac{I_R}{C_R} \tag{6} $$

where $I_R$ denotes the total number of relevant images retrieved and $C_R$ denotes the total number of relevant images in the database. In this experiment, $T_R = 10$ and $C_R = 100$.

5.2 Retrieval results

Fig. 3 Sample images from Corel-1,000 database

For the experimentation purpose, each image is divided into blocks of size 84×84. Local Ternary Pattern codes of each block are computed followed by computation of geometric moments of LTP codes. Distance between block moments of query image and database image is determined. Then the retrieval is performed using threshold obtained by using threshold algorithm.

Table 1 Average precision and recall values for each category of image

Category     Precision (%)   Recall (%)
Africans     41.50           66.12
Beaches      33.70           51.82
Buildings    33.40           66.60
Buses        54.80           74.80
Dinosaurs    94.50           91.82
Elephants    42.50           74.56
Flowers      87.60           85.88
Horses       79.30           83.46
Mountains    27.90           53.42
Food         41.80           72.20
Average      53.70           72.09

Table 2 Comparison of the proposed method with other methods

Methods                       Precision (%)
Block-based LBP [9]           23.00
CBIR using moments [18]       35.94
Gabor histogram [24]          41.30
Image-based HOG-LBP [19]      46.00
LF SIFT histogram [24]        48.20
Color histogram [24]          50.50
The proposed method           53.70
The computation of the local ternary pattern yields two values, namely upper LTP and lower LTP, which are treated as separate entities. Separate moment distances and threshold values are computed for each and combined at the end. After the distances for the two sets of moment values have been computed, the threshold is computed for the purpose of retrieval. This produces two sets of similar images, and their union is taken to produce the final set of similar images. Recall is computed by counting the total number of relevant images in the final set. Similarly, for precision, the top n matches in each image set are taken and the union of these two sets produces the final set. Mathematically, this can be formulated as follows. Let $f_{ULTP}$ be the set of similar images obtained from moments of upper LTP codes and $f_{LLTP}$ be the set of similar images obtained from moments of lower LTP codes. Then the final set of similar images, denoted by $f_{RS}$, is given by

$$ f_{RS} = f_{ULTP} \cup f_{LLTP} \tag{7} $$

Similarly, let $f^{n}_{ULTP}$ and $f^{n}_{LLTP}$ be the sets of top n images obtained from moments of upper LTP codes and moments of lower LTP codes respectively. Then the final set of top n images, denoted by $f^{n}_{PS}$, is given as

$$ f^{n}_{PS} = f^{n}_{ULTP} \cup f^{n}_{LLTP} \tag{8} $$

Fig. 4 a Precision vs. Category plot b Recall vs. Category plot
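Equations (7) and (8) amount to plain set unions, as the short sketch below illustrates. The function name `combine_channels` and the representation of each channel's result as a list of image indices ranked by increasing distance are assumptions for illustration.

```python
def combine_channels(ranked_ultp, ranked_lltp, n=10):
    """Combine the ULTP and LLTP retrieval results by set union.

    ranked_ultp / ranked_lltp: image indices accepted for each channel,
    ordered from most to least similar (an assumed representation).
    """
    f_rs = set(ranked_ultp) | set(ranked_lltp)             # Eq. (7)
    f_n_ps = set(ranked_ultp[:n]) | set(ranked_lltp[:n])   # Eq. (8)
    return f_rs, f_n_ps

f_rs, f_n_ps = combine_channels([3, 7, 1, 9], [7, 2, 3, 8], n=2)
print(sorted(f_rs))    # [1, 2, 3, 7, 8, 9]
print(sorted(f_n_ps))  # [2, 3, 7]
```

Note that the union of two top-n lists contains between n and 2n distinct images, depending on how much the two channels agree.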

Fig. 5 Comparison of the proposed method (PM) with other methods in
terms of average precision



Retrieval is considered good if the values of precision and recall are high. Table 1 shows the performance of the proposed method for each category of image in the database in terms of precision and recall. Fig. 4 plots the precision and recall values for the different image categories.
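The precision and recall measures of Eqs. (5) and (6) can be computed per query as in the sketch below. The function name `precision_recall` and the representation of images as integer identifiers are assumptions; the example numbers are made up and are not results from this paper.

```python
def precision_recall(retrieved, relevant, total_relevant):
    """Precision (Eq. 5) and recall (Eq. 6) for a single query.

    retrieved: identifiers of the images returned for the query (T_R of them)
    relevant: identifiers of all images in the query's category
    total_relevant: C_R, the number of relevant images in the database
    """
    relevant = set(relevant)
    i_r = sum(1 for img in retrieved if img in relevant)  # I_R
    t_r = len(retrieved)                                  # T_R
    precision = i_r / t_r if t_r else 0.0
    recall = i_r / total_relevant
    return precision, recall

# Hypothetical query: 10 images retrieved, 7 of them from a category of 100
retrieved = list(range(10))
category = set(range(3, 103))
print(precision_recall(retrieved, category, total_relevant=100))  # (0.7, 0.07)
```

Per-category figures such as those in Table 1 would then be averages of these per-query values over all queries of a category.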
The proposed method is compared with other state-of-the-art methods such as the block-based LBP method [9], image-based HOG-LBP [19] and LF SIFT histogram [24]. Table 2 shows the performance comparison of the proposed method with other methods in terms of average precision, and Fig. 5 plots precision against method. Values of precision and recall were computed on the same Corel-1K image database. From Table 2 and Fig. 5 it can be observed that the proposed method outperforms, in terms of precision, Block-based LBP [9] by 30.70 percentage points, CBIR using Moments [18] by 17.76, Gabor Histogram [24] by 12.4, Image-based HOG-LBP [19] by 7.7, LF SIFT Histogram [24] by 5.5 and Color Histogram [24] by 3.2.


6 Conclusion
In this paper, we have presented a combination of LTP and moments for content-based image retrieval. Local Ternary Pattern codes of blocks of a gray-level image are computed, and geometric moments of the resulting LTP codes are then computed. The method then computes the distance between blocks of the query and database images, and finally retrieval is performed on the basis of a threshold. The method combines the low noise sensitivity of LTP with the invariance of moments to geometric transformations. It also exploits the advantages of fusing local and global features of an image.
The performance of the proposed method was measured in terms of precision and recall. The experimental results showed that the proposed method outperformed the other methods compared in terms of average precision on the Corel-1K database. The results may be further improved by dividing the moments into a larger number of sequences.

References
1. Long H, Zhang H, Feng DD (2003) Fundamentals of content-based
image retrieval. Multimedia information retrieval and management.
Springer Berlin, Heidelberg, pp 1–26
2. Rui Y, Huang TS, Chang S (1999) Image retrieval: current techniques, promising directions, and open issues. J Vis Commun Image
Represent 10:39–62
3. Ojala T, Pietikainen M, Maenpaa T (2002) Multiresolution gray-scale
and rotation invariant texture classification with local binary patterns.
IEEE Trans Pattern Anal Mach Intell 24(7):971–987


4. Tan X, Triggs B (2010) Enhanced local texture feature sets for face
recognition under difficult lighting conditions. IEEE Trans Image
Process 19(6):1635–1650
5. Khare M, Srivastava R K, Khare A (2013) Moving object segmentation in daubechies complex wavelet domain. Signal, Image and
Video Processing. Accepted, doi: 10.1007/s11760-013-0496-4,
Springer
6. Wang X, Zhang B, Yang H (2002) Content-based image retrieval by
integrating color and texture features. Multimedia Tools Appl 1–25
7. Gevers T, Smeulders AW (2000) Pictoseek: combining color and
shape invariant features for image retrieval. IEEE Trans Image
Process 33(1):102–119
8. Wang X, Yu Y, Yang H (2011) An effective image retrieval scheme
using color, texture and shape features. Comput Stand Interfaces
33(1):59–68
9. Pietikäinen M, Takala V, Ahonen T (2005) Block-based methods for
image retrieval using local binary patterns. 14th Scandinavian
Conference on Image Analysis 882–891
10. Murala S, Maheshwari RP, Balasubramanian R (2012) Local tetra
patterns: a new descriptor for content-based image retrieval. IEEE
Trans Image Process 21(5):2874–2886
11. Murala S, Maheshwari RP, Balasubramanian R (2012) Directional
local extrema patterns: a new descriptor for content-based image
retrieval. Int J Multimedia Inf Retrieval 1(3):191–203
12. Liu G, Zhang L, Hou Y, Yang J (2008) Image retrieval based on
multi-texton histogram. Pattern Recogn 43(7):2380–2389
13. Liu G, Yang Y (2008) Image retrieval based on texton co-occurrence
matrix. Pattern Recogn 41(12):3521–3527
14. Liu G, Li Z, Zhang L, Xu Y (2011) Image retrieval based on
microstructure descriptor. Pattern Recogn. doi:10.1016/j.patcog.
2011.02.003

15. Zhang D, Lu G (2002) Shape-based image retrieval using generic
fourier descriptor. Signal Process-Image Commun 17(10):825–848
16. Lin H, Kao Y, Yen S, Wang C (2004) A study of shape-based image
retrieval. In Proc. 24th International Conference on Distributed
Computing Workshops 118–123
17. Yoo H, Jang D, Jung S, Park J, Song K (2002) Visual information
retrieval via content-based approach. J Pattern Recognit Soc 35:749–
769
18. Srivastava P, Binh N T, Khare A (2013) Content-based image
retrieval using moments. In Proc. 2nd International Conference on
Context-Aware Systems and Applications 228–237
19. Yu J, Qin Z, Wan T, Zhang X (2013) Feature integration analysis of
bag-of-features model for image retrieval. Neurocomputing 120:
355–364
20. Fu X, Li Y, Harrison R, Belkasim S (2006) Content-based image
retrieval using gabor-zernike features. 18th International Conference
on Pattern Recognition, Hong Kong 2:417–420
21. Moghaddam HA, Khajoie TT, Rouhi AH, Tarzjan MS (2005)
Wavelet correlogram: a new approach for image indexing and retrieval. Pattern Recogn 38:2506–2518
22. Agarwal M, Maheshwari RP (2012) Á trous gradient structure descriptor for content based image retrieval. Int J Multimedia Inf Retr
1(2):129–138
23. Li S, Lee MC, Pun CM (2009) Complex Zernike moments shapebased image retrieval. IEEE Trans Syst Man Cybern Part A: Syst
Hum 39(1):227–237
24. Deselaers T, Keysers D, Ney H (2008) Features for image retrieval:
an experimental comparison. Inf Retr 11:77–107
25. Flusser J (2005) Moment invariants in image analysis. Enformatika
11
26. Kotoulas L, Andreadis I (2005) Image analysis using moments. 5th
International Conference on Technology and Automation,
Thessaloniki, Greece 360–364

27. />

