
BUILDING A BOOK RECOGNITION PROGRAM ON ANDROID SMARTPHONE




VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY

ENTRY FOR THE "STUDENT SCIENTIFIC RESEARCH" AWARD
2012

Project title:
BUILDING A BOOK RECOGNITION PROGRAM ON ANDROID
SMARTPHONE

Students:
Hoàng Thanh Tùng, class K53CA-KHMT
Nguyễn Hữu Cường, class K53CA-KHMT
Đỗ Tất Thắng, class K53CA-KHMT
Faculty: Information Technology
Supervisor: Dr. Nguyễn Phương Thái



HANOI, 2012

Contents
Abstract
Chapter 1: Introduction
1. Objective
2. Related works
Chapter 2: Approaches to image retrieval
1. Image meta search
2. Content based image retrieval
3. Approach for our system
Chapter 3: Image retrieval with OpenCV
1. Overview of OpenCV library
2. Interest point in Image Retrieval
3. Speeded Up Robust Feature
3.1. SURF's properties
4. Image search in our system
Chapter 4: Building the system
1. Overview of system architecture
2. Handling client's requests
2.1. Search for book information
2.2. Search for related book
2.3. Search for nearby bookshop
2.4. Rate a book
Chapter 5: Experimental result
Chapter 6: Future works
Chapter 7: Conclusion
References
Links to software and sites



Abstract
Information is an essential human need. Searching for information using search engines
such as Google, Bing and Yahoo is familiar to almost everyone. However, searching with
text is tedious and sometimes does not return the desired information. Recently, Google
has introduced a new image search engine which can find images similar to an image
uploaded by the user. We have developed a system which allows people to use images of
book covers as queries for information about the books. Our system provides an easier and
more interesting way of searching. The system was built for users with camera-equipped
mobile devices. It applies modern Content Based Image Retrieval techniques to provide a
fast and reliable search engine. Experiments on our database show that the system has
high accuracy and is robust to many kinds of graphical deformations.



Chapter 1: Introduction
1. Objective
Currently, the rapid development of the Internet is leading to an exponential growth in the
amount of data. Automatically searching and retrieving data from large databases is one
of the most important research fields. Image retrieval (IR) is the problem of finding and
retrieving images from a digital image database. Traditional methods utilize the metadata
associated with images, such as captions and keywords, to classify images and perform
the retrieval task. The metadata are often created manually; these methods therefore
cannot be applied to large databases. Content based image retrieval (CBIR) is a whole
new approach to the IR problem. In CBIR, images are classified and retrieved based on
their actual content, such as lines, colors, shapes, textures and any other information
which can be derived from the image. CBIR, thus, can provide better classification and
more reliable search results. A CBIR system also eliminates the need for human effort in
annotating the images.
Using CBIR on computers has become familiar to many people, while there are only a
few CBIR programs for mobile devices. We aim at a simple but helpful CBIR system for
portable device users. Our program, however, is not a trivial CBIR system where only the
closest matches of the input images are returned. At the time this report is written, a user
of the system can take an image of a book cover and send it to the server to receive
information about the book such as its title, author, publisher, price and reviews. Users
can also search for related books and bookshops and give their opinions about books.
The experimental results show that the accuracy of the system is high for clear, large
input images. The results are still acceptable when the image is noisy or small, or when
only a part of the cover is captured.
2. Related works
Google Goggles allows users to search for information about scenes, books and other
objects they see just by taking and uploading photos of them. The program is accurate for
high quality input images, but the accuracy decreases dramatically when the image is
noisy or is taken from a different view or in poor lighting conditions. Goggles is also
much slower than the traditional search engine provided by Google (it can take 10
seconds to complete a search on a phone with a 3G connection).


Chapter 2: Approaches to image retrieval
As we discussed in chapter 1, there are two main approaches to the IR problem. The
traditional approach uses metadata to perform the search, while CBIR uses information
extracted from the image itself. In this chapter we look more closely at each approach to
see its advantages and disadvantages.
1. Image meta search
In a meta search system, metadata are usually in text form and are indexed and stored in a
database. The data are external to the images and are added to them to make meta search
possible. Image search in these systems is performed in the same way as in other text
search engines. The input to the system is a description of the query image (the
description may be created by the user or derived from the context of the image). The
search engine compares the description with the metadata of images in the database to
find the closest matches and returns the results in descending order of relevance.

One advantage of meta search systems is that we can reuse powerful text search engines
to perform image retrieval. Because indexing and searching in a text database is much
faster than in a multimedia database, this approach has better time performance than the
CBIR approach. The most common search engines today, including Google, Bing and
Yahoo, use this approach to provide image search.
A big disadvantage of this approach is that the metadata are external to the image and may
not precisely describe its actual content. Poor metadata will produce a large number of
irrelevant images in the search results. Although many methods for creating the metadata
automatically have been proposed (e.g., LDA for image retrieval, see [2], [3], [4]), the
results achieved so far do not satisfy users with high expectations. The quality of search
results relies largely on the quality of the input descriptions, which are often created by
users. Users may not always give good descriptions of their images; the accuracy of the
system therefore decreases accordingly. Furthermore, requiring users to describe the
images makes the search more complex and less interesting. Thus, a more accurate and
friendly search engine is desirable.
2. Content based image retrieval
The CBIR approach makes use of modern Computer Vision (CV) techniques to solve the
image retrieval problem. Unlike meta search systems, CBIR systems do not store the
metadata of images but information derived from the images themselves, including color,
intensity, shapes, textures, lines, interest points and other useful information. Different
CBIR systems select different features to store and use different algorithms for
classifying and searching images. When users want to search for some images, they just
need to provide an image, and the system automatically detects the relevant features,
comparing that information with the database images to find the best matches. The
results, thus, are graphically related to the input. This helps CBIR systems remove a large
number of garbage results which meta search systems normally produce. CBIR systems
also allow people to draw an approximation of their image and use that as the input to the
search engine. This breaks the limit of traditional IR systems, where images can only be
described by words.
CBIR systems, however, cannot completely replace the old meta search systems. Current
algorithms for extracting visual features from images and searching a database of those
features are still very expensive in both time and space. As a result, CBIR is not efficient
for huge databases or systems with a large number of queries per time interval. Besides,
searching for visually related images does not always give good results. When users want
to find different images related to some event or person, CBIR is not suitable, because
graphically similar images may not relate to that event or person at all.
3. Approach for our system
As we have discussed, each approach has its own advantages and drawbacks. We have
selected CBIR as the method for developing our system. There are a number of reasons
for this decision.
Firstly, our primary goal is to create an image search program for mobile users, so we
need an interactive way of searching and sharing information. Searching with text is very
common and somewhat boring. With our program, people can take images with their
smartphones or digital cameras and use those images to search for the information they
need.
Secondly, we want to create a system which can give users information about things they
do not know or cannot describe. This circumstance occurs when people travel to strange
places and see things they have never encountered before. Our system can provide
reliable information by searching for similar images and returning the information
associated with those images to the user.
Finally, while there are many meta search programs, there are only a few CBIR programs
for mobile devices. Hence, developing a CBIR program for those devices is promising.
Android is currently the most popular operating system for mobile devices such as
smartphones and tablets, so we have chosen Android as the platform for the client
program.


Chapter 3: Image retrieval with OpenCV

1. Overview of OpenCV library
OpenCV [9] is an open source library for real time computer vision developed by Intel
and currently supported by Willow Garage. OpenCV offers many advanced functions for
computer vision and image processing and is released under the BSD license. The library
is available for the Linux, Windows, Mac OS and Android platforms. It was originally
written in C, but C#, Java, Python and Ruby wrappers for OpenCV are now available.
According to Willow Garage, OpenCV has over 500 functions with more than 2500
optimized algorithms. OpenCV's functions can be categorized as follows:
- General Image Processing Functions
- Image Pyramid
- Geometric descriptors
- Camera calibration, Stereo, 3D
- Fitting
- Tracking
- Machine learning: detection and recognition
- Transforms
- Segmentation
- Utilities and Data structures
- Features


Figure 1: Overview of OpenCV’s functions
Because of this rich collection of functions, OpenCV is used by more than 40,000 people
for both academic and commercial purposes.
We have used OpenCV for detecting and describing interest points in images and for
matching those sets of interest points. A set of keypoints carries information about the
image, and we can expect two similar images to have two similar sets of keypoints.
Therefore, by comparing the two sets, we can measure the difference between the two
images. While OpenCV can detect many types of interest points, we have selected
Speeded Up Robust Features (SURF). The main properties of SURF are given in the next
part of this chapter.
2. Interest point in Image Retrieval
According to Herbert Bay et al. [1], the process of finding similar images in a database
consists of three steps. First, we detect interest points at distinctive locations in the image;
these points could be corners, blobs or T-junctions. The property we value most in an
interest point detector is its repeatability: a good detector should be able to reliably find
the same physical interest points under different viewing conditions. The next step is
describing the neighborhood of each detected interest point by a feature vector. The two
most important properties of this feature vector are distinctiveness and robustness.
Distinctiveness means that the feature vectors of two different images are different. At the
same time, the feature vector computed from a noisy, transformed version of an image
should not be too different from the vector of the original image; this property is called
the robustness of the descriptor. The last step is matching the descriptor vectors of
different images. We measure the dissimilarity between two vectors by the distance
between them (for example, the Mahalanobis or Euclidean distance). Because the
distance between the vectors only roughly reflects the distance between the images, we
need some other mechanism to refine the results and then rank the images in the database.
Due to the curse of dimensionality, matching high dimensional vectors is a time-consuming
task, and various techniques have been developed for it. OpenCV provides an
approximate but fast algorithm for this problem called Best Bin First [5, 7], which we
have used in our program.
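
As a small illustration of the dissimilarity measure, the sketch below compares two
descriptor vectors by their Euclidean distance using the OpenCV 2.x C++ API (the
function name is ours, and the arguments are assumed to be single rows of CV_32F
descriptor matrices such as those produced by SURF):

#include <opencv2/core/core.hpp>

// Dissimilarity between two descriptor vectors, measured as the Euclidean
// (L2) distance. desc1 and desc2 are single rows of CV_32F descriptor
// matrices; a smaller distance means more similar neighborhoods.
double descriptorDistance(const cv::Mat& desc1, const cv::Mat& desc2)
{
    return cv::norm(desc1, desc2, cv::NORM_L2);
}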
3. Speeded Up Robust Feature
Because we focus on using SURF for image retrieval, not on studying how to detect and
describe it, we do not give details about its mathematical foundation or other specialized
knowledge. For a complete description of SURF, please consult Herbert Bay et al. [1].
3.1. SURF’s properties
SURF was proposed by Bay et al. in 2008 and since then it has been used in a wide range
of CV applications. The performance of SURF is comparable to that of state-of-the-art
detectors and descriptors, while SURF is much faster. SURF builds on the best detectors
and descriptors to date (a Hessian-matrix-based detector and a distribution-based
descriptor) and simplifies them to achieve high speed while keeping the performance
essentially unchanged. As stated by the authors, SURF has a high repeatability score, is
distinctive, and is robust to image deformations. The following figures, taken from
Herbert Bay's paper, show the performance of SURF on some typical benchmark
databases.


Figure 2: Repeatability score for image rotation of up to 180 degrees.
Fast-Hessian is the more accurate detector and is used for the SURF detector in OpenCV.


Figure 3: Repeatability score for three different images when
the scale and blur of those images are changed
4. Image search in our system
OpenCV provides a strong implementation of SURF with optimized algorithms for
detecting, describing and matching SURF keypoints. In OpenCV, keypoints of all types
are represented by instances of the KeyPoint class. The classes related to detecting,
describing and matching SURF keypoints are SurfFeatureDetector,
SurfDescriptorExtractor, FlannBasedMatcher and DMatch.
We first resize all images to 200 by 300 pixels, then run SurfFeatureDetector on every
image to detect keypoints. For each image, we try several settings of the detector
parameters to keep the number of keypoints between 150 and 200. Experiments show that
150 to 200 keypoints per image give good matching results while taking less time to
compute. The descriptor vectors for each image are then computed using
SurfDescriptorExtractor. We store the keypoints, descriptor vectors and associated
information of the image (author, publisher, price, etc.) in the database. Figures 4, 5 and 6
show the SURF keypoints detected on an image and the results of matching the keypoints
of two images.
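
A minimal sketch of this detection and description step is given below, using the OpenCV
2.x C++ API (SURF lives in the nonfree module in OpenCV 2.4; the starting Hessian
threshold and the adjustment loop are our illustration of "trying several settings", not the
exact code of our engine):

#include <opencv2/opencv.hpp>
#include <opencv2/nonfree/features2d.hpp> // SurfFeatureDetector in OpenCV 2.4

#include <vector>

// Detect roughly 150-200 SURF keypoints on a normalized 200x300 image and
// compute their descriptors. The threshold loop is a simple heuristic.
void extractFeatures(const cv::Mat& input,
                     std::vector<cv::KeyPoint>& keypoints,
                     cv::Mat& descriptors)
{
    cv::Mat image;
    cv::resize(input, image, cv::Size(200, 300));

    double hessianThreshold = 400.0; // assumed starting value
    for (int attempt = 0; attempt < 10; ++attempt) {
        cv::SurfFeatureDetector detector(hessianThreshold);
        detector.detect(image, keypoints);
        if (keypoints.size() > 200)      hessianThreshold *= 1.5; // too many points
        else if (keypoints.size() < 150) hessianThreshold /= 1.5; // too few points
        else break;
    }

    cv::SurfDescriptorExtractor extractor;
    extractor.compute(image, keypoints, descriptors);
}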


Figure 4: SURF keypoints of an image


Figure 5: Matching keypoints of two images.
On the left: The bottom image is the larger, rotated version of the top
image. The program finds hundreds of corresponding points between them.
On the right: The two images are very different. The program can only
find 2 corresponding keypoints between them.
This result suggests that the number of correspondences reflects the
similarity between the images.



Figure 6: Matching keypoints between two different images.
Although the box in the bottom image is partially covered by other boxes,
we still obtain a very good matching result.

When we want to look for images similar to a given image, we extract its keypoints and
descriptor vectors. We then use FlannBasedMatcher to match the keypoints and
descriptors of the given image against those of each image in the database. For every
keypoint in the first image, FlannBasedMatcher finds the most similar keypoint in the
second image. The result of matching the keypoints and descriptors of two images is
stored in a vector of DMatch instances. Each DMatch instance contains the indices of the
two keypoints and the distance between them. The smaller the distance is, the more
similar the two keypoints are. Figure 5 also shows that the number of matched pairs can
be used to measure the similarity between images. The distance and the number of pairs,
however, cannot be used directly to judge the similarity between two images. To refine
the result, we remove pairs with long distances; specifically, we only keep pairs with a
distance of less than 0.2. The average distance and the number of remaining pairs are then
used to compute the difference between two images:

difference = (average distance of remaining pairs) / (number of remaining pairs)
After computing this score for all images in the database, we order the images in
ascending order of difference. The first returned image, therefore, is the closest match to
the user's image. To provide the book search function, we extract the information
associated with the best matches we have just obtained and return it to the user.
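
The sketch below illustrates this matching and scoring step with the OpenCV 2.x C++
API (the 0.2 threshold is the one stated above; the exact combination of average distance
and pair count is our reading of the description, so treat the final formula as an
approximation):

#include <opencv2/opencv.hpp>

#include <limits>
#include <vector>

// Match a query descriptor matrix against one database image and compute the
// difference score: pairs with distance >= 0.2 are discarded, and the score
// combines the average remaining distance with the number of remaining pairs
// (a smaller score means a more similar image).
double differenceScore(const cv::Mat& queryDescriptors,
                       const cv::Mat& trainDescriptors)
{
    cv::FlannBasedMatcher matcher;
    std::vector<cv::DMatch> matches;
    matcher.match(queryDescriptors, trainDescriptors, matches);

    double sum = 0.0;
    int kept = 0;
    for (size_t i = 0; i < matches.size(); ++i) {
        if (matches[i].distance < 0.2f) { // keep only close pairs
            sum += matches[i].distance;
            ++kept;
        }
    }
    if (kept == 0)
        return std::numeric_limits<double>::max(); // no good pairs at all

    return (sum / kept) / kept; // average distance divided by pair count
}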


Chapter 4: Building the system
1. Overview of system architecture
The system has a client-server architecture. The server is written in Java and the search
engine in C++ on the Linux platform, while the client program is written in Java on the
Android platform. We use the free MySQL database management system to manage our
database. The main components of the system are shown in figure 7.



Figure 7: System architecture
A typical scenario in the system is described below:
A user sends a request to our system. The controller receives the request and passes it to
an idle thread. The thread checks the type of the request to see whether it can handle the
request by itself or has to use results from the search engine. If the thread can process the
request by itself, it executes the suitable functions. Otherwise, the thread waits for data
from the search engine and uses that data to create the final result. In both cases the result
is returned to the controller, which sends it back to the user.
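
The dispatch logic of this scenario can be sketched as follows. The sketch is in C++ for
consistency with the search-engine listings, although our actual server is written in Java;
the request-type code and the handler names are illustrative only:

#include <iostream>
#include <string>
#include <thread>

// Hypothetical request/result types; the real server defines its own protocol.
struct Request { int type; std::string payload; };
struct Result  { std::string data; };

const int SEARCH_BOOK_INFO = 0; // assumed request-type code

// Requests the thread can answer on its own (rate a book, list bookshops, ...).
Result handleLocally(const Request& r)     { return Result{"local: " + r.payload}; }
// Requests that must wait for the search engine's answer.
Result querySearchEngine(const Request& r) { return Result{"search: " + r.payload}; }

Result dispatch(const Request& request)
{
    return (request.type == SEARCH_BOOK_INFO) ? querySearchEngine(request)
                                              : handleLocally(request);
}

int main()
{
    Request request{SEARCH_BOOK_INFO, "book-cover.jpg"};
    Result result;
    std::thread worker([&] { result = dispatch(request); }); // idle worker thread
    worker.join();
    std::cout << result.data << std::endl; // controller returns result to user
    return 0;
}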
While the core function of the system is still image retrieval, we have added many
functions to the system to give users a whole new experience of searching for book
information. Users can send different requests to the system, including: search for book
information, search for related books, search for nearby bookshops, and rate a book.
Details about these requests are given in the next part of this chapter.
2. Handling client’s requests
As mentioned in the previous part, the server has to handle the following types of requests:
2.1. Search for book information
A user uploads an image of a book cover to search for information about the book. The
server receives the request and passes it to an idle thread. The thread extracts the image
and passes it to the search engine, which looks for the best matches of that image. The
thread then looks up the information corresponding to those best matches and returns it
to the user.

Figure 8: Realization of use case “Search for book information”


Figure 9: Some images of book covers used in our test

Figure 10: Result of searching for the book "Artificial Intelligence: A Modern Approach"
in our database. Note that 4 of the 5 books in the result are the book we want to find.
2.2. Search for related book
After receiving information about a book, the user may want to find related books in the
database. The server searches for books which have the same author or tags and returns
them to the user. Books are tagged manually; the tags of a book represent its main content.


Figure 11: Realization of use case “Search for related book”



2.3. Search for nearby bookshop
The server gives the user information about bookshops that sell the book the user wants
to buy. If there is no shop selling the book, the server returns a default list of shops.

Figure 12: Realization of use case “Search for nearby shop”



Figure 13: Result of finding bookshops. The user can either view the position
of the shop on Google Maps or make a call to the shop.
2.4. Rate a book
Users can give their opinion about a book by rating it. The server updates the rating of
the book right after receiving a rate request.

Figure 14: Interface for rating a book


Chapter 5: Experimental result
We have run a number of tests to assess our system's functionality and performance. The
hardware configuration of our system is as follows:
- The server has an E2180 dual-core processor, 2 GB of RAM and runs Ubuntu 10.10.
- The client is a Samsung Galaxy GIO smartphone with Android 2.2 installed.
Figures 15 to 17 show the results of different kinds of tests on our system.

Figure 15: Retrieval time at client side for 100 queries.
This includes the time to upload the image and download the result. The speed
at client side is mainly affected by the speed of the network connection.



Figure 16: Retrieval time at server side for 100 queries.

Figure 17: Results of 100 tests. The images in the tests were taken by the 2.0 MP camera
of our Galaxy GIO. The tests include both clear and noisy images of rotated, folded and
resized book covers in various lighting conditions and backgrounds. The complete test
data can be downloaded from the project home page [10].







[Bar chart: number of tests (out of 100) in which the desired book is the #1 result and in which the desired book is in the top 5 results.]

Chapter 6: Future works
When the database is large, comparing the input image to every image in the database is
impractical. We therefore propose a method for quickly finding the best matches of an
input image. Because images in the database are characterized by their feature vectors,
we can use the K-means algorithm to group those feature vectors into K different groups,
each represented by its center vector. When looking for the best matches of an image, we
compare its feature vector to the center vectors of all K groups and select the closest
groups. We can then apply exhaustive search for the best matches within those groups, as
in the current system. We can also further divide each group into smaller groups to
achieve better time performance.
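
A sketch of this clustering step is shown below, using OpenCV's kmeans (the termination
criteria, the number of attempts and the one-feature-vector-per-image representation are
illustrative assumptions, since this method is future work):

#include <opencv2/opencv.hpp>

// Group per-image feature vectors into K clusters. 'samples' has one CV_32F
// row per database image; 'centers' receives one row per cluster center.
void clusterDatabase(const cv::Mat& samples, int K,
                     cv::Mat& labels, cv::Mat& centers)
{
    cv::kmeans(samples, K, labels,
               cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT,
                                100, 1e-4),   // assumed termination criteria
               5,                             // attempts with different seeds
               cv::KMEANS_PP_CENTERS,         // k-means++ initialization
               centers);
}

// Index of the cluster whose center vector is closest to the query vector.
int closestCluster(const cv::Mat& query, const cv::Mat& centers)
{
    int best = 0;
    double bestDist = cv::norm(query, centers.row(0), cv::NORM_L2);
    for (int i = 1; i < centers.rows; ++i) {
        double d = cv::norm(query, centers.row(i), cv::NORM_L2);
        if (d < bestDist) { bestDist = d; best = i; }
    }
    return best;
}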

Figure 18: Example of K-means algorithm with K = 4 for 2D vectors

Chapter 7: Conclusion
In this report we have described modern technologies in Content Based Image Retrieval
and a method of applying CBIR to an image search system. We have built a complete
system with many advanced functions, and we have also had the chance to apply some
software engineering principles. The system has achieved positive results on a sample
database, and a method for extending the system has also been proposed.


References
[1] Herbert Bay, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool. Speeded-Up
Robust Features (SURF).
[2] Xiaogang Wang and Eric Grimson. Spatial Latent Dirichlet Allocation.
[3] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet Allocation.
[4] Eva Horster, Rainer Lienhart, and Malcolm Slaney. Image Retrieval on Large-Scale
Image Databases.
[5] David Marshall. Nearest Neighbour Searching in High Dimensional Metric Space.
[6] David Lowe. Scale Invariant Feature Transform.
[7] Haifeng Liu, M. Deng, and Chuangbai Xiao. An Improved Best Bin First Algorithm
for Fast Image Registration.
[8] K. Mikolajczyk and C. Schmid. Indexing Based on Scale Invariant Interest Points.
In ICCV, volume 1, pages 525-531, 2001.
Links to software and sites
[9] OpenCV home page:
[10] Project home page:
[11] Android home page:
[12] Android developer page:
