
UNIVERSITY OF ENGINEERING AND TECHNOLOGY
VIETNAM NATIONAL UNIVERSITY, HANOI

NGUYEN TIEN DUNG

TRADEMARK IMAGE RETRIEVAL BASED ON
SCALE, ROTATION, TRANSLATION
INVARIANT FEATURES

MASTER THESIS: INFORMATION TECHNOLOGY

Hanoi - 2014


UNIVERSITY OF ENGINEERING AND TECHNOLOGY
VIETNAM NATIONAL UNIVERSITY, HANOI

NGUYEN TIEN DUNG

TRADEMARK IMAGE RETRIEVAL BASED
ON SCALE, ROTATION, TRANSLATION
INVARIANT FEATURES
Major: Computer Science
Code: 60480101

MASTER THESIS: INFORMATION TECHNOLOGY
Supervised by: Dr. Le Thanh Ha

Hanoi - 2014


Originality Statement
‘I hereby declare that this submission is my own work and to the best of my knowledge
it contains no materials previously published or written by another person, or substantial
proportions of material which have been accepted for the award of any other degree or
diploma at University of Engineering and Technology (UET) or any other educational
institution, except where due acknowledgement is made in the thesis. I also declare that
the intellectual content of this thesis is the product of my own work, except to the extent
that assistance from others in the project’s design and conception or in style, presentation
and linguistic expression is acknowledged.’
Signed ........................................................................



TABLE OF CONTENTS
Originality Statement ........................................................................................ 2
ABBREVIATION ................................................................................................. 6
Abstract ........................................................................................................... 7
Chapter 1: Introduction ..................................................................................... 8
Chapter 2: Related work .................................................................................. 11
Chapter 3: Background .................................................................................... 14
  3.1. Pre-processing ........................................................................................ 14
  3.2. Object description ................................................................................... 19
  3.3. Feature vectors extraction ....................................................................... 20
    3.3.1. Discrete Fourier Transform (DFT) ...................................................... 20
    3.3.2. Log-polar transform .......................................................................... 21
  3.4. Measure of similarity .............................................................................. 22
    3.4.1. Euclidean distance ............................................................................ 22
    3.4.2. Mahalanobis distance ........................................................................ 23
    3.4.3. Chord distance ................................................................................. 23
Chapter 4: Proposed method ............................................................................ 24
  4.1. Pre-processing ........................................................................................ 24
  4.2. Visual shape objects extraction ............................................................... 24
  4.3. Scale, rotation, translation invariant features .......................................... 25
  4.4. Measure of similarity .............................................................................. 28
Chapter 5: Experiments and results ................................................................. 30
  5.1. Implementation ....................................................................................... 30
  5.2. Test results for exact copy actions .......................................................... 32
  5.3. Test results for scaling action ................................................................. 33
  5.4. Test results for rotating actions .............................................................. 34
  5.5. Test results for mirror actions ................................................................ 35
  5.6. Test results for partial copy actions ........................................................ 36
  5.7. Test results for random query trademark ................................................ 38
  5.8. Testing summary .................................................................................... 38
Chapter 6: Conclusion ...................................................................................... 40
REFERENCES .................................................................................................... 41
APPENDIX ........................................................................................................ 45
  Pre-processing ............................................................................................... 45
  Visual shape objects extraction ...................................................................... 45
  Scale, rotation, translation invariant features extraction ................................ 47
  Matching by measure of similarity and retrieval of Trademark Images ........... 49


List of Figures
Fig. 1. Some trademark image samples ............................................................... 8
Fig. 2. The log-polar transform maps (x, y) into (log(r), θ) ................................ 21
Fig. 3. Log-polar transform of rotated and scaled squares: scaling maps to a shift on the log(r) axis and rotation to a shift on the θ axis ............................................ 22
Fig. 4. Contour filter algorithm ......................................................................... 25
Fig. 5. Illustration of three stages of the proposed method ............................... 28
Fig. 6. Samples of the collected trademark images for testing ........................... 30
Fig. 7. Results for exact copy tests ................................................................... 32
Fig. 8. Results for scaling tests ......................................................................... 33
Fig. 9. Results for translation and scaling tests ................................................. 34
Fig. 10. Results for rotation tests ...................................................................... 35
Fig. 11. Results for mirror tests ........................................................................ 36
Fig. 12. Results for partial copy tests ................................................................ 37
Fig. 13. Results for random tests ...................................................................... 38



ABBREVIATION
DFT: Discrete Fourier Transform
CBIR: Content Based Image Retrieval
SIFT: Scale-invariant feature transform




Abstract
Trademark registration offices and authorities have been inundated with applications from enterprises. These authorities face great difficulty in protecting enterprises' rights, such as the copyright, license, or uniqueness of a logo or trademark, since they rely only on conventional manual classification. An effective automatic trademark image retrieval system is therefore an urgent need, and one entirely worth thorough research. In this thesis, a novel trademark image retrieval method is proposed in which the input trademark image is first separated into dominant visual shape images; a scale-, rotation-, and translation-invariant feature vector is then created for each shape image. Finally, a similarity measure between two trademarks is calculated based on these feature vectors. Given a query trademark image, retrieval is carried out by taking the five most similar trademark images from a predefined trademark database. Various experiments are conducted to mimic the many types of trademark copying actions, and the experimental results exhibit the robustness of our retrieval method under these copy actions.



Chapter 1: Introduction
From an economic perspective, a trademark is clearly understood as a word, a design, a picture, a complex symbol, or even a combination of these, which is put on a product or stands for a service of a particular company. In [2], four types of popular trademarks are
listed in order of visual complexity: word-in-mark (only characters or words in the mark),
device-mark (graphical or figurative elements), composite-mark (characters or words and
graphical elements) and complex-mark (complex image). Fig. 1 offers some trademark
samples.


Fig. 1. Some trademark image samples

Every company or financial organization desires to own a distinctive, meaningful, and descriptive logo to which it holds exclusive rights. Drawing consumers' attention to its products or services, and indeed market viability, actually depends not only on designing an intelligent and attractive trademark, but also on preventing consumer confusion.
World markets have remarkably expanded and grown as trade-related practices converge at the international level, and a great number of businesses have been established. As a result, millions of trademarks submitted to trademark offices around the world for registration must be checked for distinctiveness against existing trademarks, as per the definitions and trade practices of different countries, and this volume is likely to increase in the years to come. The millions of trademarks already registered and the millions of applications filed aggravate the problem of issuing trademark certificates. The trademark registration authorities receive many trademark protection applications from enterprises, and finding similar trademarks has become a real challenge because these authorities still rely on the traditional (i.e., manual) way of classification. It is obvious that trademark registration with manual searching is a very arduous task for the officials. It is really hard for them to make sure whether a trademark is duplicated: whether a particular trademark is registered or not, whether a trademark resembles another registered trademark in any way, or whether the copyright or license of a trademark is infringed. Thus, there is an urgent need for an alternative automatic technology.
In [33], the different techniques and approaches currently in use for trademark distinctness checking are surveyed. The most popular and best-regarded image processing techniques for this purpose are Content Based Image Retrieval (CBIR) techniques, which are widely used for it; other approaches, such as shape- and texture-based similarity techniques, are also used. Image processing tools and techniques can solve different problems related to image, text, graphics, color, and so on. Since a trademark can be a combination of text, graphics, image, and colored texture, one can divide it into these components to find the similarity among trademarks retrieved from a trademark database. Most recent image retrieval techniques have mainly utilized features such as color, texture, and shape. Many use existing CBIR systems to retrieve images based on such visual features; in these systems, color features are typically extracted with the color histogram technique, and the shape feature is also considered because it is important in CBIR applications. Many other techniques have been utilized for image retrieval: some are based on improved pattern matching algorithms, some take a much broader approach such as searching only text files, some are based on shape and color features, and some have attempted morphological pattern based image matching and retrieval using a database. A shape based technique for logo retrieval reported in the literature is also inadequate to solve the problem amicably.
In this thesis, a novel trademark image retrieval method is proposed in which the input trademark image is first separated into dominant visual shape images; a scale-, rotation-, and translation-invariant feature vector is then created for each shape image. Finally, a similarity measure between two trademarks is calculated based on these feature vectors. The manuscript entitled "Trademark Image Retrieval Based on Scale, Rotation, Translation, Invariant Features", related to this thesis, was published in Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2013 IEEE RIVF International Conference, 10-13 Nov. 2013.


My thesis is organized as follows: Chapter 1 is the introduction. Chapter 2 presents some related work. Chapter 3 provides background on the related problems. Chapter 4 presents the proposed method in detail. Chapter 5 describes the implementation, using Visual Studio 2010 with OpenCV 2.4.2 on Windows 7, and presents the experimental results. Chapter 6 concludes the thesis. Additionally, the Appendix lists the whole source code of the thesis for the reader's convenience.



Chapter 2: Related work
In recent years, researchers have proposed a wide range of solutions in a bid to alleviate the workload of trademark registration offices. Chen, Sun, and Yang [1] suggested two main steps for computing feature vectors. Initially, the object region, extracted from a principal-orientation-rotated image, is equally partitioned into 16 regions. Then an entropy vector is constructed as the image's feature vector by computing the information entropy of each partitioned region. This automatic shape-based retrieval technique achieves the desired performance: good invariance under rotation, translation, scale, noise, and degree of thickness, and satisfaction of human visual perception. However, this single-feature retrieval system does not seem to meet multiple aspects of appreciation. To improve this, among others, the single features of Zernike Moments (ZM) in [4, 10] and invariant moments in [3, 5, 6] are each combined with other features. Experimental results presented in [4] showed that this method has steady performance and good invariance under translation, rotation and scale. Moreover, the low noise sensitivity of Zernike moments makes the method more robust to noise. However, because different users understand image similarity differently, the present trademark image retrieval methods still have shortcomings in aspects such as the retrieval of geometrically deformed images, retrieval accuracy, and the consistency between the image and human visual perception. Yet the Zernike-moment retrieval in [10] shows that trademarks can be retrieved rapidly. A new method is proposed in [3] based on cosine distance and
normalized distance measures. The cosine distance metric normalizes all feature vectors to unit length and makes them invariant against relative in-plane scaling of the image content. The normalized distance combines two distance measures, cosine distance and Euclidean distance, and is more accurate than either method alone. The proposed measures take into account the integration of global features (invariant moments and eccentricity) and local features (entropy histogram and distance histogram). The method first indexes the trademark image database (DB) in order to narrow the search and reduce computation time, and then calculates similarities between feature vectors to obtain the total similarity. An alternative solution worth mentioning is that four shape features, global (invariant moments and eccentricity) and local (entropy histogram and distance histogram) [16], are exploited by [3].


Recently, [5] combined nine feature vectors of low-order color moments in HSV color space with low-order Hu moments and eccentricity, extracted from the gray shape-region image by Rui and Huang's (1998) technique. Gauss normalization is applied to those features, and the weight of every feature can be adjusted flexibly [17]. Good results have been obtained in the experiments, which prove that the multi-feature combination way is better than single-feature ways. [6] employed 10 invariant moments, improved by [20], as shape features of trademark images. These features are input to an ensemble of RBFNNs trained via a minimization of the localized generalization error to recognize the trademark images. In the current system the trademark images are black-and-white; extending the system to color trademark images is left as further study.
In [2, 7], the proposed combinations of features are definitely different, and each of them is reported to perform well. Equidistant partitioning based on concentric circles [14] is used as the first step in both [4] and [2]; they differ in the implementation of the second step: [4] calculated a feature vector F composed of the corresponding region ZM, while [2] combined region feature vectors of 200 values with contour features, namely the corner-to-centroid triangulations detected by Hong and Jiang's improved SUSAN algorithm [15]. Iwanaga et al. [7] put forward a modified angle-distance pair-wise histogram based on the angle histogram and distance histogram of the trademark object. This system outperforms both moment-based methods and independent histograms (angle, distance, color). Experiments were conducted on registered trademark databases, and impressive results were shown to demonstrate the robustness of the proposed approach. Moreover, the distance-angle pair-wise histogram of a trademark object is quite simple to construct.
As the state-of-the-art method, [10] integrated ZM with SIFT features. In this approach, the Zernike moments of the images are first extracted and sorted according to similarity to form a set of candidate images. The SIFT features are then used to match the query image accurately against the candidate images. This method not only keeps the high precision-recall of SIFT features and is superior to the method based on Zernike moments alone, but also improves the retrieval speed compared to SIFT features alone, and it can be applied effectively to trademark image retrieval systems. This newly proposed approach enhances the retrieval performance. Tuan N.G. et al. [27] presented a new method, based on the discriminative properties of trademark images, for text recognition in trademark images. The experimental results show a significant gain in text recognition accuracy of the proposed method in comparison with traditional text recognition methods. This contribution addresses one part of trademark image recognition.
However, those approaches seem to ignore not only partial trademark comparison, but also mirrored trademarks. Furthermore, they have concentrated either on original trademarks without removing noise elements, or on standard databases that contain no noise. Additionally, these approaches treat the trademark image as a single complete object and do not consider the detailed visual shapes within it; therefore, they cannot detect partial similarity between trademark images. Nonetheless, calculating the distance between two features also plays an extremely important part in measuring the degree of similarity among images, and for this reason each of the mentioned solutions endeavours to propose an appropriate measure to some extent.
To overcome the above-mentioned drawbacks, a novel content-based trademark recognition method is proposed with these four main stages: (i) pre-process, i.e. scale down the trademark images and convert them into binary images; (ii) extract dominant shape objects from the binary images; (iii) apply the RBRC algorithm to extract rotation-, scale-, and translation-invariant features from the shape objects; and (iv) use Euclidean distance to measure the similarity of two images and then retrieve the 10 trademark images most similar to the query trademark image. The thesis focuses on handling a Vietnamese composite-mark database.



Chapter 3: Background
3.1. Pre-processing
Converting gray scale to binary image
In [31], segmentation involves separating an image into regions (or their contours) corresponding to objects. We usually try to segment regions by identifying common properties or, similarly, identify contours by identifying differences between regions. The simplest property that pixels in a region can share is intensity, so a natural way to segment such regions is through thresholding: the separation of light and dark regions. Thresholding creates a binary image from a grey-level one by turning all pixels below some threshold to zero and all pixels above that threshold to one. If g(x, y) is a thresholded version of f(x, y) at some global threshold T:

g(x, y) = 1 if f(x, y) ≥ T, and 0 otherwise        (1)
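The rule in Eq. (1) amounts to a single comparison per pixel. As a minimal illustrative sketch (written here in Python with NumPy rather than the thesis's own OpenCV/C++ code; the function name is only for illustration):

```python
import numpy as np

def global_threshold(image, T):
    """Binarize a grey-level image per Eq. (1): 1 where f(x, y) >= T, else 0."""
    return (image >= T).astype(np.uint8)

# Toy 3x3 "image" with a bright object on a dark background.
img = np.array([[ 10,  20, 200],
                [ 15, 210, 220],
                [ 12,  18, 205]])
binary = global_threshold(img, T=128)
```

With T = 128, the three dark columns map to 0 and the bright pixels map to 1.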


Problems with thresholding
The major problem with thresholding is that we consider only the intensity, not any relationships between the pixels. There is no guarantee that the pixels identified by the thresholding process are contiguous. We can easily include extraneous pixels that aren't part of the desired region, and we can just as easily miss isolated pixels within the region (especially near its boundaries). These effects get worse as the noise gets worse, simply because it is then more likely that a pixel's intensity does not represent the normal intensity in the region. When we use thresholding, we typically have to play with it, sometimes losing too much of the region and sometimes getting too many extraneous background pixels. (Shadows of objects in the image are also a real pain: not just where they fall across another object, but where they are mistakenly included as part of a dark object on a light background.)
Local thresholding
Another problem with global thresholding is that changes in illumination across the scene may cause some parts to be brighter (in the light) and some parts darker (in shadow) in ways that have nothing to do with the objects in the image.

We can deal, at least in part, with such uneven illumination by determining thresholds locally. That is, instead of having a single global threshold, we allow the threshold itself to vary smoothly across the image.
Automated methods for finding thresholds
To set a global threshold, or to adapt a local threshold to an area, we usually look at the histogram to see if we can find two or more distinct modes: one for the foreground and one for the background. Recall that a histogram can be treated as a probability distribution:

p(g) = n_g / n        (2)

That is, the number of pixels n_g having greyscale intensity g as a fraction of the total number of pixels n.
Known distribution
If we know that the object we are looking for is brighter than the background and occupies a certain fraction 1/p of the image, we can set the threshold by simply finding the intensity level such that the desired percentage of the image pixels are below this value. This is easily extracted from the cumulative histogram:

c(g) = Σ_{g'=0}^{g} p(g')        (3)

Simply set the threshold T such that c(T) = 1/p (or, if we are looking for a dark object on a light background, c(T) = 1 − 1/p).
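This percentile rule can be sketched as follows (Python/NumPy, a hypothetical helper name; it assumes a bright object occupying the fraction `object_fraction` of an 8-bit image, so the background fraction 1 − 1/p of pixels falls below T):

```python
import numpy as np

def threshold_from_fraction(image, object_fraction, levels=256):
    """Pick T so that roughly (1 - object_fraction) of the pixels lie below it,
    assuming the object is brighter than the background (the c(T) = 1 - 1/p case)."""
    hist = np.bincount(image.ravel(), minlength=levels)
    c = np.cumsum(hist) / image.size              # cumulative histogram c(g)
    # smallest grey level whose cumulative fraction exceeds the background share
    return int(np.searchsorted(c, 1.0 - object_fraction, side='right'))
```

For example, on an image where a quarter of the pixels form a bright object at grey levels 200-210, `threshold_from_fraction(img, 0.25)` lands just below the object's intensities.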
Finding peaks and valleys
One extremely simple way to find a suitable threshold is to find each of the modes (local maxima) and then find the valley (minimum) between them. While this method appears simple, there are two main problems with it: the histogram may be noisy, causing many spurious local minima and maxima (to get around this, the histogram is usually smoothed before trying to find separate modes); and the sum of two separate distributions, each with its own mode, may not produce a distribution with two distinct modes.
Clustering (K-Means Variation)
Another way to look at the problem is that we have two groups of pixels, one with one range of values and one with another. What makes thresholding difficult is that these ranges usually overlap. What we want to do is to minimize the error of classifying a background pixel as a foreground one or vice versa. To do this, we try to minimize the area under the histogram for one region that lies on the other region's side of the threshold. The problem is that we do not have the histograms for each region, only the histogram for the combined regions. Understand that the place of minimum overlap (where the misclassified areas of the two distributions are equal) is not necessarily where the valley occurs in the combined histogram. This occurs, for example, when one cluster has a wide distribution and the other a narrow one. One way to proceed is to consider the values in the two regions as two clusters. In other words, let μ_B(T) be the mean of all pixels less than the threshold and μ_O(T) be the mean of all pixels greater than the threshold. We want to find a threshold such that the following holds:

∀g ≥ T : |g − μ_B(T)| > |g − μ_O(T)|        (4)

and

∀g < T : |g − μ_B(T)| < |g − μ_O(T)|        (5)

The basic idea is to start by estimating μ_B(T) as the average of the four corner pixels (assumed to be background) and μ_O(T) as the average of everything else. Set the threshold to be halfway between μ_B(T) and μ_O(T) (thus separating the pixels according to how close their intensities are to μ_B(T) and μ_O(T) respectively). Now update the estimates of μ_B(T) and μ_O(T) by actually calculating the means of the pixels on each side of the threshold. This process repeats until the algorithm converges. The method works well if the spreads of the distributions are approximately equal, but it does not handle well the case where the distributions have differing variances.
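The iterative scheme just described can be sketched as follows (Python/NumPy, illustrative only; the corner-pixel seeding and the halfway threshold follow the description above):

```python
import numpy as np

def isodata_threshold(image, max_iters=100):
    """Iterative (k-means style) thresholding sketch: the four corner pixels seed
    the background mean, everything else seeds the object mean; the threshold is
    placed halfway between the means, which are then re-estimated from the
    pixels on each side until the threshold stops moving."""
    img = image.astype(float)
    corners = np.array([img[0, 0], img[0, -1], img[-1, 0], img[-1, -1]])
    mu_b = corners.mean()
    mu_o = (img.sum() - corners.sum()) / (img.size - 4)
    T = (mu_b + mu_o) / 2.0
    for _ in range(max_iters):
        below, above = img[img < T], img[img >= T]
        if below.size == 0 or above.size == 0:
            break
        mu_b, mu_o = below.mean(), above.mean()
        new_T = (mu_b + mu_o) / 2.0
        if new_T == T:          # converged
            break
        T = new_T
    return T
```

On a synthetic image with a flat background at 10 and a small object at 200, the threshold converges to the midpoint 105.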
Clustering (The Otsu Method)
Another way of accomplishing similar results is to set the threshold so as to make each cluster as tight as possible, thus minimizing their overlap. Obviously, we cannot change the distributions, but we can adjust where we separate them (the threshold). As we adjust the threshold one way, we increase the spread of one cluster and decrease the spread of the other. The goal is then to select the threshold that minimizes the combined spread. We can define the within-class variance as the weighted sum of the variances of each cluster:

σ²_within(T) = n_B(T) σ²_B(T) + n_O(T) σ²_O(T)        (6)

where

n_B(T) = Σ_{i=0}^{T−1} p(i)        (7)

n_O(T) = Σ_{i=T}^{N−1} p(i)        (8)

σ²_B(T) is the variance of the pixels in the background (below the threshold), σ²_O(T) is the variance of the pixels in the foreground (above the threshold), and [0, N − 1] is the range of intensity levels.
Computing this within-class variance for each of the two classes, for each possible threshold, involves a lot of computation, but there is an easier way. If we subtract the within-class variance from the total variance of the combined distribution, we get something called the between-class variance:

σ²_between(T) = σ² − σ²_within(T)        (9)

= n_B(T) [μ_B(T) − μ]² + n_O(T) [μ_O(T) − μ]²        (10)

where σ² is the combined variance and μ is the combined mean. Notice that the between-class variance is simply the weighted variance of the cluster means themselves around the overall mean. Substituting μ = n_B(T) μ_B(T) + n_O(T) μ_O(T) and simplifying, we get

σ²_between(T) = n_B(T) n_O(T) [μ_B(T) − μ_O(T)]²        (11)

So, for each potential threshold T we
 Separate the pixels into two clusters according to the threshold.
 Find the mean of each cluster.
 Square the difference between the means.
 Multiply by the number of pixels in one cluster times the number in the other.
This depends only on the difference between the means of the two clusters, thus avoiding having to calculate differences between individual intensities and the cluster means. The optimal threshold is the one that maximizes the between-class variance (or, conversely, minimizes the within-class variance).
This still sounds like a lot of work, since we have to do it for each possible threshold, but it turns out that the computations are not independent as we change from one threshold to another. We can update n_B(T), n_O(T), and the respective cluster means μ_B(T) and μ_O(T) as pixels move from one cluster to the other as T increases. Using simple recurrence relations we can update the between-class variance as we successively test each threshold:
n_B(T + 1) = n_B(T) + n(T)        (12)

n_O(T + 1) = n_O(T) − n(T)        (13)

μ_B(T + 1) = [μ_B(T) n_B(T) + n(T) T] / n_B(T + 1)        (14)

μ_O(T + 1) = [μ_O(T) n_O(T) − n(T) T] / n_O(T + 1)        (15)

where n(T) = p(T) is the fraction of pixels with intensity T.

This method is called the Otsu method.
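A straightforward sketch of the Otsu criterion of Eq. (11) follows (Python/NumPy, illustrative; it scans every candidate threshold of an 8-bit histogram directly, whereas an efficient implementation would apply the recurrences (12)-(15)):

```python
import numpy as np

def otsu_threshold(image, levels=256):
    """Exhaustive Otsu sketch: for every candidate T, split the histogram and
    maximize the between-class variance n_B * n_O * (mu_B - mu_O)^2, Eq. (11)."""
    hist = np.bincount(image.ravel(), minlength=levels).astype(float)
    p = hist / hist.sum()                       # p(g), Eq. (2)
    g = np.arange(levels, dtype=float)
    best_T, best_var = 0, -1.0
    for T in range(1, levels):
        n_b, n_o = p[:T].sum(), p[T:].sum()     # class weights, Eqs. (7)-(8)
        if n_b == 0 or n_o == 0:
            continue                            # one class is empty
        mu_b = (g[:T] * p[:T]).sum() / n_b      # background mean
        mu_o = (g[T:] * p[T:]).sum() / n_o      # foreground mean
        var_between = n_b * n_o * (mu_b - mu_o) ** 2
        if var_between > best_var:
            best_T, best_var = T, var_between
    return best_T
```

On a bimodal image the returned T falls between the two modes and separates them exactly.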
Mixture modeling
Another way to minimize the classification error in the threshold is to suppose that each group is Gaussian-distributed. Each of the distributions has a mean (μ_B and μ_O respectively) and a standard deviation (σ_B and σ_O respectively) independent of the threshold we choose:

h_model(g) = n_B e^{−(g − μ_B)² / 2σ_B²} + n_O e^{−(g − μ_O)² / 2σ_O²}        (16)

Whereas the Otsu method separates the two clusters according to the threshold and tries to optimize some statistical measure, mixture modeling assumes that there already exist two distributions and we must find them. Once we know the parameters of the distributions, it is easy to determine the best threshold. Unfortunately, we have six unknown parameters (n_B, n_O, μ_B, μ_O, σ_B, and σ_O), so we need to make some estimates of these quantities. If the two distributions are reasonably well separated (some overlap, but not too much), we can choose an arbitrary threshold T and assume that the mean and standard deviation of each group approximate the mean and standard deviation of the two underlying populations. We can then measure how well a mix of the two distributions approximates the overall distribution:

F = Σ_{g=0}^{N−1} [h_model(g) − h_image(g)]²        (17)

Choosing the optimal threshold thus becomes a matter of finding the parameters that cause the mixture of the two estimated Gaussian distributions to best approximate the actual histogram (minimize F). Unfortunately, the solution space is too large to search exhaustively, so most methods use some form of gradient descent. Such gradient descent methods depend heavily on the accuracy of the initial estimate, but the Otsu method or similar clustering methods can usually provide reasonable initial estimates. Mixture modeling also extends to models with more than two underlying distributions (more than two types of regions).
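The fitting criterion F of Eq. (17) can be sketched as a simple helper (Python/NumPy, hypothetical function name); an optimizer such as gradient descent would then minimize this value over the six parameters:

```python
import numpy as np

def mixture_fit_error(hist, n_b, mu_b, sigma_b, n_o, mu_o, sigma_o):
    """Eq. (17): squared error between the two-Gaussian model h_model(g) of
    Eq. (16) and the observed histogram h_image(g)."""
    g = np.arange(hist.size, dtype=float)
    h_model = (n_b * np.exp(-(g - mu_b) ** 2 / (2 * sigma_b ** 2))
               + n_o * np.exp(-(g - mu_o) ** 2 / (2 * sigma_o ** 2)))
    return ((h_model - hist) ** 2).sum()
```

If the histogram is itself generated by the model, the correct parameters give F = 0, and any perturbed parameter set gives a strictly larger error.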
Multispectral thresholding
A technique for segmenting images with multiple components (color images, Landsat images, or MRI images with T1, T2, and proton-density bands) works by estimating the optimal threshold in one channel and then segmenting the overall image based on that threshold. Each of these regions is then subdivided independently using properties of the second channel. This is repeated for the third channel, and so on, running through all channels repeatedly until each region in the image exhibits a distribution indicative of a coherent region (a single mode).
Thresholding along boundaries
If we want our thresholding method to stay fairly true to the boundaries of the
object, we can first apply some boundary-finding method (such as edge detection
techniques) and then sample the pixels only where the boundary probability is high. Thus,
our threshold method based on pixels near boundaries will cause separations of the pixels
in ways that tend to preserve the boundaries. Other scattered distributions within the object
or the background are of no relevance. However, if the characteristics change along the
boundary, we are still in trouble. And, of course, there is still no guarantee that we will not
have extraneous pixels or holes.
3.2. Object description
In [33], objects are represented as a collection of pixels in an image. Thus, for
purposes of recognition we need to describe the properties of groups of pixels. The
description is often just a set of numbers: the object's descriptors. From these, we can
compare and recognize objects by simply matching the descriptors of objects in an image
against the descriptors of known objects. However, to be useful for recognition,
descriptors should have four important properties. First, they should define a complete set.
That is, two objects must have the same descriptors if and only if they have the same
shape. Second, they should be congruent, so that we can recognize similar objects when
they have similar descriptors. Third, it is convenient that they have invariant properties.
For example, rotation-invariant descriptors will be useful for recognizing objects whatever
their orientation. Other important invariance properties include scale and position, and
also invariance to affine and perspective changes. These last two properties are very
important when recognizing objects observed from different viewpoints. In addition to
these three properties, the descriptors should be a compact set. Namely, a descriptor
should represent the essence of an object in an efficient way. That is, it should only
contain information about what makes an object unique, or different from
the other objects. The quantity of information used to describe this characterization
should be less than the information necessary to have a complete description of the object
itself. Unfortunately, there is no set of complete and compact descriptors to characterize
general objects. Thus, the best recognition performance is obtained by carefully selected
properties. As such, the process of recognition is strongly related to each particular
application with a particular type of objects. Here, the characterization of objects is
presented by two forms of descriptors. Region and shape descriptors characterize the
arrangement of pixels within the area and the arrangement of pixels along the perimeter or
boundary, respectively. This region versus perimeter kind of representation is common in
image analysis. For example, edges can be located by region growing (to label area) or by
differentiation (to label perimeter). There are many techniques that can be used to obtain
descriptors of an object's boundary.
3.3. Feature vectors extraction
3.3.1. Discrete Fourier Transform (DFT)

In [34], the Fourier Transform decomposes an image into its sine and cosine
components. In other words, it transforms an image from its spatial domain to its
frequency domain. The idea is that any function may be approximated exactly by the
sum of infinitely many sine and cosine functions; the Fourier Transform is a way to do
this. Mathematically, the Fourier transform of a two-dimensional N × N image is:

𝐹(𝑘, 𝑙) = Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} 𝑓(𝑖, 𝑗) 𝑒^{−i2π(ki/N + lj/N)}		(18)

𝑒^{ix} = cos 𝑥 + i sin 𝑥		(19)

Here f is the image value in the spatial domain and F in the frequency domain. The
result of the transformation consists of complex numbers. Displaying this is possible either via a
real image and an imaginary image, or via a magnitude and a phase image. However,
throughout the image processing algorithms only the magnitude image is interesting, as
it contains all the information we need about the image's geometric structure.
Nevertheless, if we intend to make some modifications of the image in these forms and
then retransform it, we will need to preserve both of them.
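Eq. (18) can be evaluated directly for a small image. The naive O(N⁴) sketch below is for illustration only (real systems use the FFT); `f` is a hypothetical 2 × 2 test image:

```python
import cmath

def dft2(f):
    """Direct 2D DFT of an N x N image, following Eq. (18)."""
    N = len(f)
    return [[sum(f[i][j] * cmath.exp(-2j * cmath.pi * (k * i + l * j) / N)
                 for i in range(N) for j in range(N))
             for l in range(N)]
            for k in range(N)]

f = [[1, 2], [3, 4]]
F = dft2(f)
# The magnitude image mentioned above is simply abs() of each coefficient.
magnitude = [[abs(c) for c in row] for row in F]
```

Note that F(0, 0) is just the sum of all pixel values (the DC component), a quick sanity check for any DFT implementation.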



3.3.2. Log-polar transform

In [32], for two-dimensional images, the log-polar transform [Schwartz80] is a
change from Cartesian to polar coordinates: (𝑥, 𝑦) ↔ 𝑟𝑒^{iθ}, where 𝑟 = √(𝑥² + 𝑦²)
and exp(iθ) = exp(i·arctan(𝑦/𝑥)). To separate out the polar coordinates into a (𝜌, 𝜃)
space that is relative to some center point (𝑥_c, 𝑦_c), we take the log so that
𝜌 = log √((𝑥 − 𝑥_c)² + (𝑦 − 𝑦_c)²) and 𝜃 = arctan((𝑦 − 𝑦_c)/(𝑥 − 𝑥_c)). For image
purposes - when we need to “fit” the interesting stuff into the available image memory - we
typically apply a scaling factor m to ρ. Fig. 2 shows a square object on the left and its
encoding in log-polar space.

Fig. 2. The log-polar transform maps (𝐱, 𝐲) into (𝐥𝐨𝐠(𝐫), 𝛉)

The log-polar transform can be used to create two-dimensional invariant
representations of object views by shifting the transformed image's center of mass to a
fixed point in the log-polar plane; see Fig. 3. On the left are three shapes that we want to
recognize as “square”. The problem is, they look very different: one is much larger than
the others and another is rotated. The log-polar transform appears on the right of Fig. 3.
Observe that size differences in the (x, y) plane are converted to shifts along the log(r)
axis of the log-polar plane, and that rotation differences are converted to shifts along
the θ-axis. If we take the center of each transformed square in the log-polar plane and
then recenter that point to a certain fixed position, then all the squares will show up
identically in the log-polar plane. This yields a type of invariance to two-dimensional
rotation and scaling.
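The shift property is easy to verify for a single point. The helper `log_polar` below is a hypothetical per-point mapping (an actual image transform would resample a whole pixel grid): scaling a point by a factor s shifts ρ by log s, and rotating it by an angle φ shifts θ by φ:

```python
import math

def log_polar(x, y, xc=0.0, yc=0.0):
    """Map (x, y), relative to center (xc, yc), to (rho, theta) = (log r, angle)."""
    dx, dy = x - xc, y - yc
    return math.log(math.hypot(dx, dy)), math.atan2(dy, dx)

rho1, th1 = log_polar(3, 4)           # original point, r = 5
rho2, th2 = log_polar(6, 8)           # same point scaled by 2

phi = 0.5                             # rotate the original point by 0.5 rad
rho3, th3 = log_polar(3 * math.cos(phi) - 4 * math.sin(phi),
                      3 * math.sin(phi) + 4 * math.cos(phi))
```

Here `rho2 - rho1` equals log 2 with θ unchanged, while `th3 - th1` equals φ with ρ unchanged, which is the basis of the scale/rotation invariance claimed above (angles near ±π would additionally need wrap-around handling).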



Fig. 3. Log-polar transform of rotated and scaled squares: size goes to a shift on the log(r) axis and rotation to a shift on the θ-axis

3.4. Measure of similarity
In [30], in the digital multimedia area, content-based image retrieval (CBIR)
research establishes a database composed of images, each represented as a vector
of features derived from color, shape, and/or texture information. When a query is
requested, a similarity measurement between the user-provided image and those prestored
in the database is computed and compared in order to report a few of the most similar images.
A similarity measurement must be selected to decide how close a vector is to
another vector. The problem can be converted to computing the discrepancy between two
vectors 𝑥, 𝑦 ∈ 𝑅^d. Several distance measurements are presented as follows.

3.4.1. Euclidean distance
The Euclidean distance between 𝑥, 𝑦 ∈ 𝑅^d is computed by

𝛿₁(𝑥, 𝑦) = ‖𝑥 − 𝑦‖₂ = √( Σ_{j=1}^{d} (𝑥_j − 𝑦_j)² )		(20)

A similar measurement called the cityblock distance, which takes fewer operations,
is computed by

𝜏₁(𝑥, 𝑦) = ‖𝑥 − 𝑦‖₁ = Σ_{j=1}^{d} |𝑥_j − 𝑦_j|		(21)

Another distance measurement, called the supremum norm, is computed by

𝜏₂(𝑥, 𝑦) = max_{1≤j≤d} |𝑥_j − 𝑦_j|		(22)
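These three measures, Eqs. (20)-(22), translate directly into code; a minimal sketch over plain Python sequences:

```python
import math

def euclidean(x, y):
    """delta_1, Eq. (20): the L2 norm of x - y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cityblock(x, y):
    """tau_1, Eq. (21): the L1 norm of x - y (cheaper: no squares or sqrt)."""
    return sum(abs(a - b) for a, b in zip(x, y))

def sup_norm(x, y):
    """tau_2, Eq. (22): the largest coordinate-wise difference."""
    return max(abs(a - b) for a, b in zip(x, y))

x, y = (1, 2, 3), (4, 6, 3)
d2, d1, dinf = euclidean(x, y), cityblock(x, y), sup_norm(x, y)
```

For this pair the three distances are 5, 7, and 4 respectively, illustrating the usual ordering ‖·‖_∞ ≤ ‖·‖₂ ≤ ‖·‖₁.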


3.4.2. Mahalanobis distance
The Mahalanobis distance between two vectors 𝑥 and 𝑦 with respect to the training
patterns {𝑥_i} is computed by

𝛿₂(𝑥, 𝑦) = √( (𝑥 − 𝑦)ᵗ 𝑆⁻¹ (𝑥 − 𝑦) )		(23)

where the mean vector 𝑢 and the sample covariance matrix 𝑆 of the sample
{𝑥_i | 1 ≤ 𝑖 ≤ 𝑛} of size n are computed by

𝑆 = (1/𝑛) Σ_{i=1}^{n} (𝑥_i − 𝑢)(𝑥_i − 𝑢)ᵗ,		𝑢 = (1/𝑛) Σ_{i=1}^{n} 𝑥_i		(24)
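A minimal sketch of Eqs. (23) and (24) for the two-dimensional case (the 2 × 2 inverse is written out explicitly to keep the example self-contained; a general implementation would use a linear-algebra library):

```python
import math

def mean_and_cov(samples):
    """Sample mean u and covariance S (Eq. (24)) for 2-D training patterns."""
    n = len(samples)
    u = [sum(s[k] for s in samples) / n for k in range(2)]
    S = [[sum((s[a] - u[a]) * (s[b] - u[b]) for s in samples) / n
          for b in range(2)] for a in range(2)]
    return u, S

def mahalanobis(x, y, S):
    """delta_2(x, y) = sqrt((x - y)^t S^{-1} (x - y)), Eq. (23), 2-D case."""
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    Sinv = [[ S[1][1] / det, -S[0][1] / det],
            [-S[1][0] / det,  S[0][0] / det]]
    d = [x[0] - y[0], x[1] - y[1]]
    q = sum(d[a] * Sinv[a][b] * d[b] for a in range(2) for b in range(2))
    return math.sqrt(q)

# Toy training set: variance 1 along the first axis, 4 along the second,
# so differences in the second coordinate are down-weighted.
samples = [(0, 0), (2, 0), (0, 4), (2, 4)]
u, S = mean_and_cov(samples)
dist = mahalanobis((0, 0), (2, 4), S)
```

With this S the distance is √(4/1 + 16/4) = √8, showing how each axis is rescaled by its own variance before the Euclidean-style sum.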

3.4.3. Chord distance
The chord distance between two vectors x and y measures the distance between
the projections of x and y onto the unit sphere, and can be computed by

𝛿₃(𝑥, 𝑦) = ‖ 𝑥/𝑟 − 𝑦/𝑠 ‖₂

where 𝑟 = ‖𝑥‖₂ and 𝑠 = ‖𝑦‖₂.
A simple computation leads to 𝛿₃(𝑥, 𝑦) = 2 sin(𝛼/2), with 𝛼 being the angle
between the vectors 𝑥 and 𝑦.
A similar measurement based on the angle between vectors 𝑥 and 𝑦 is defined as

𝜏₃(𝑥, 𝑦) = 1 − cos(𝛼),		cos(𝛼) = (𝑥 · 𝑦) / (‖𝑥‖₂ ‖𝑦‖₂)		(25)
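Both angle-based measures, and the 2 sin(α/2) identity that links them, can be checked with a short sketch:

```python
import math

def chord(x, y):
    """delta_3: Euclidean distance between x/||x|| and y/||y||."""
    r = math.sqrt(sum(a * a for a in x))
    s = math.sqrt(sum(b * b for b in y))
    return math.sqrt(sum((a / r - b / s) ** 2 for a, b in zip(x, y)))

def one_minus_cos(x, y):
    """tau_3 = 1 - cos(alpha), Eq. (25)."""
    dot = sum(a * b for a, b in zip(x, y))
    r = math.sqrt(sum(a * a for a in x))
    s = math.sqrt(sum(b * b for b in y))
    return 1 - dot / (r * s)

# Orthogonal vectors: alpha = 90 degrees, so the chord is 2*sin(45°) = sqrt(2)
# and tau_3 = 1 - cos(90°) = 1.
d_orth = chord((1, 0), (0, 1))
t_orth = one_minus_cos((1, 0), (0, 1))
```

Because both measures depend only on the angle α, they ignore vector magnitudes entirely, which is why they suit feature vectors whose overall scale is uninformative.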


Chapter 4: Proposed method
4.1. Pre-processing
In this initial stage, images are scaled down so that the smaller side is 300 pixels and
converted into gray scale. The images are then converted into binary trademark images using
Otsu's method [11], which minimizes the weighted within-class variance, or equivalently
maximizes the between-class variance. Otsu's algorithm is one of the automated methods for
finding a threshold, alongside finding peaks and valleys, clustering (a K-means variation),
mixture modeling, and multispectral thresholding.
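A standard histogram-based formulation of Otsu's method (a sketch of the textbook algorithm, not necessarily the exact implementation used in [11]) scans every candidate threshold and keeps the one maximizing the between-class variance:

```python
def otsu_threshold(hist):
    """Return the threshold T maximizing between-class variance of `hist`.

    Pixels in bins 0..T are 'background', bins T+1.. are 'foreground'.
    """
    total = sum(hist)
    sum_all = sum(g * hist[g] for g in range(len(hist)))
    wB = sumB = 0
    best_t, best_var = 0, -1.0
    for t in range(len(hist)):
        wB += hist[t]                      # background pixel count
        if wB == 0:
            continue
        wF = total - wB                    # foreground pixel count
        if wF == 0:
            break
        sumB += t * hist[t]
        muB = sumB / wB                    # background mean
        muF = (sum_all - sumB) / wF        # foreground mean
        var_between = wB * wF * (muB - muF) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Bimodal toy histogram with peaks at gray levels 1 and 7.
t = otsu_threshold([5, 8, 5, 0, 0, 0, 5, 8, 5])
```

For the toy histogram the returned threshold sits in the empty valley between the two peaks, which is what binarizing a trademark against its background requires.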
4.2. Visual shape objects extraction
In this stage, shape objects in the form of connected contours are retrieved from the
binary image using Suzuki's algorithm in [12]. Each detected contour is stored as
a vector of points. All of the vectors are organized in a hierarchical structure, which
contains information about the image topology. The number of contours depends on
the texture of the image.
There are four options processed in [12] for contour retrieval: (i) retrieve only the
extreme outer contours; (ii) retrieve all of the contours without establishing any
hierarchical relationships; (iii) retrieve all of the contours organized into a two-level
hierarchy; and (iv) retrieve all of the contours and reconstruct a full hierarchy of nested
contours.
In the present research, we adopted the second option for shape object extraction.
From a binary trademark image, we extracted a number of shape object images. However,
due to noise present in the input trademark image, many noise contours were extracted as
shape objects. To prevent this problem, we applied a filter so that the noise contours were
removed. Observations showed that the dominant shape contours usually have a much
larger area than the noise contours. Furthermore, due to the characteristics
of trademarks in our database, most trademarks consist of one or two dominant shape
objects which play a primary role in a company's reputation. For this reason, we propose
an algorithm to extract up to two dominant shape objects out of a binary image. The
algorithm comprises four main steps and one function named FilterContours (see Fig. 4)
responsible for taking out the two dominant shape objects. The FilterContours operation was

