A complete fingerprint matching algorithm on GPU for a large scale identification system


A Complete Fingerprint Matching Algorithm
on GPU for a Large Scale Identification System
Hong Hai Le, Ngoc Hoa Nguyen and Tri Thanh Nguyen

Abstract Fingerprints are the most widely used biometric feature for identification.
Although state-of-the-art algorithms are very accurate, databases with millions of
fingerprints place heavy demands on processing speed. GPU devices are widely used
in parallel computing tasks for their efficiency and low cost. Most approaches use
the GPU only for the filtering process in a multi-stage matching system. In this
paper, we present a complete fingerprint matching algorithm on GPU. Our approach
uses the minutia cylinder-code (MCC) representation with a global consolidation
stage and a careful design that makes it suitable for the GPU architecture. Results
obtained on a GTX 680 device show that the proposed algorithm can perform
1.8 million matches per second, making it applicable to real-time identification
systems with databases of millions of fingerprints.
Keywords Fingerprint identification · Matching · Minutiae · MCC · GPU · CUDA

1 Introduction

Fingerprint matching algorithms, which compare two given fingerprints and return
a degree of similarity, are often classified into three types: correlation-based
matching, minutiae-based matching, and ridge feature-based matching. The
Fingerprint Verification Competitions (FVC) [2] show that minutiae-based matching
is the most popular approach. Minutiae are the points where ridge continuity
breaks, and each is typically represented as a triplet (x, y, θ), where x and y are
the point coordinates and θ is the ridge direction at that point. The task in the
minutiae-based matching approach is to find the maximum number of matching
minutiae pairs in two given fingerprints. Figure 1 shows matches between two
fingerprints based on minutiae.
H.H. Le() · N.H. Nguyen · T.T. Nguyen
Vietnam National University of Hanoi, 144 Xuan Thuy, Hanoi, Vietnam
e-mail: {hailh,hoann,ntthanh}@vnu.edu.vn
© Springer Science+Business Media Singapore 2016
K.J. Kim and N. Joukov (eds.), Information Science and Applications (ICISA) 2016,
Lecture Notes in Electrical Engineering 376,
DOI: 10.1007/978-981-10-0557-2_67


Most minutiae-based matching algorithms consist of two steps: local structure
matching followed by a consolidation stage. Local structure matching quickly
finds pairs of minutiae that match locally and can serve as candidates for
aligning the two fingerprints. Local structures are normally invariant to
fingerprint rotation and translation. Local structures of minutiae are typically
represented by neighboring minutiae [3,11], ridges [7,8], orientations [9,10], or a
combination of these [6]. Recently, the Minutia Cylinder-Code (MCC)
representation [4] has shown good performance in both accuracy and speed of
fingerprint matching. The aim of the consolidation stage is to check whether the
locally matched minutiae pairs still agree at the global alignment level of the two
fingerprints.
There are basically two ways to increase the speed of a fingerprint
identification system: reducing the total number of fingerprint comparisons
(through fingerprint classification [16,17], pre-filtering, or multistage matching
[18,19]), or using parallel architectures [14].
The Graphics Processing Unit (GPU) has proven to be a very useful tool for
accelerating computationally intensive algorithms. These devices introduce
massive parallelism in the calculations and have been applied successfully in
fields such as artificial intelligence [21,22], simulation [23], and bioinformatics
[24]. Recently, GPUs have been applied to MCC fingerprint matching, as in the
works of Gutierrez et al. [12] and Cappelli et al. [13]. Most approaches use the
GPU for the filtering process. After that, more accurate matching algorithms are
run on the CPU for the remaining fingerprint candidates.
In this paper, we propose a different approach that adapts all stages of the
MCC-based fingerprint matching algorithm to the GPU. The proposal fits the GPU
computing architecture well, which makes it easy to implement.

Fig. 1 Fingerprint matching based on minutiae.

The rest of the paper is organized as follows. In Section 2, we review the stages
of the fingerprint matching algorithm based on MCC. The GPU programming model
is briefly described in Section 3. Section 4 describes our adaptation of MCC to the
GPU. Section 5 details the experimental results on the FVC 2002 DB1 database.

2 Matching Algorithm Based on MCC

2.1 Local Structure Matching

The Minutia Cylinder-Code (MCC) representation [4] shows good performance in
both accuracy and speed of fingerprint matching. Each minutia is represented by a
cylinder feature. This cylinder, which is centered at the minutia, has a fixed
radius R and a height of 2π. Each cylinder is divided into $N_S \times N_S \times N_D$
cells, as shown in Figure 2. $N_S$ defines the resolution of the discretized 2D space
around the minutia ($N_S \times N_S$), and $N_D$ is the number of divisions applied to
the height of the cylinder (2π), which represents the angular distance.
The contribution of each minutia $m_t$ to a cell (of the cylinder corresponding to
a given minutia $m$) depends on both spatial information (how close $m_t$ is to the
center of the cell) and directional information (how similar the directional
difference between $m_t$ and $m$ is to the directional difference associated with the
section where the cell lies). In other words, the value of a cell represents the
likelihood of finding minutiae that are close to the cell and whose directional
difference with respect to $m$ is similar to that of the cell's section.

Fig. 2 Structure of a cylinder [12].

Once a cylinder is built for a minutia $m$, it can be treated as a single feature
vector. With a negligible loss of accuracy [4], each element of the feature vector
can be stored as a bit. A simple but effective similarity measure between the bit
vector $v_a$ of cylinder $c_a$ and the bit vector $v_b$ of cylinder $c_b$ is given in
Formula 1 [4]:

$$
\gamma(a, b) =
\begin{cases}
1 - \dfrac{\lVert v_a \oplus v_b \rVert}{\lVert v_a \rVert + \lVert v_b \rVert} & \text{if } d_\theta(a, b) \le \delta_\theta \\
0 & \text{otherwise}
\end{cases}
\tag{1}
$$

where
- $\oplus$ represents the bitwise XOR operator;
- $\lVert \cdot \rVert$ represents the Euclidean norm;
- $d_\theta(a, b)$ is the difference between the angles of the two minutiae $a$ and $b$;
- $\delta_\theta$ is the maximum rotation allowed between two fingerprints.
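As an illustration of Formula 1, the following is a minimal Python sketch, assuming each cylinder is packed into a Python integer used as a bit vector. The function name `cylinder_similarity`, the default threshold of π/2, and the handling of two empty cylinders are assumptions made for this example and are not taken from the paper.

```python
import math


def cylinder_similarity(v_a: int, v_b: int, theta_a: float, theta_b: float,
                        delta_theta: float = math.pi / 2) -> float:
    """Bit-based similarity of Formula 1 for cylinders packed into ints.

    delta_theta is a placeholder value chosen for this sketch.
    """
    # Smallest absolute difference between the two directions, in [0, pi].
    diff = abs(theta_a - theta_b) % (2 * math.pi)
    diff = min(diff, 2 * math.pi - diff)
    if diff > delta_theta:
        return 0.0

    # For a bit vector, the Euclidean norm is the square root of its popcount.
    norm_xor = math.sqrt(bin(v_a ^ v_b).count("1"))
    norm_sum = math.sqrt(bin(v_a).count("1")) + math.sqrt(bin(v_b).count("1"))
    if norm_sum == 0.0:
        return 0.0  # assumption: two empty cylinders are treated as non-matching
    return 1.0 - norm_xor / norm_sum
```

Identical bit vectors with compatible directions give a similarity of 1, and the value decreases as more bits differ. On a GPU the popcount would typically map to a hardware instruction, which is one reason the bit-based form is attractive.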



Given the cylinder sets of the two fingerprints to be matched, a local matching
process is run on every pair of cylinders using Formula 1, and the results are
stored in a matrix. After that, most GPU approaches [12,13] use the Local
Similarity Sort (LSS) technique, which sorts all values of the matrix and computes
the average of the top values. This does not guarantee that the top minutiae pairs
are consistent with each other. The consolidation stage presented in the next
section is used to check whether the locally matched minutiae pairs still agree at
the global alignment level of the two fingerprints.
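The LSS step described above can be sketched in a few lines of Python. The function name `local_similarity_sort` and the parameter `n_top` (how many top values are averaged) are hypothetical; the text does not state the value used in [12,13].

```python
def local_similarity_sort(sim_matrix, n_top):
    """Average of the n_top largest values in a similarity matrix (LSS sketch)."""
    values = sorted((s for row in sim_matrix for s in row), reverse=True)
    top = values[:n_top]
    return sum(top) / len(top) if top else 0.0


# Rows index minutiae of one fingerprint, columns those of the other.
example = [[0.9, 0.8],
           [0.1, 0.2]]
score = local_similarity_sort(example, n_top=2)
```

In this example both top values come from row 0, so the same minutia is counted twice against two different partners. This is the kind of inconsistency that the consolidation stage is meant to rule out.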

2.2 Consolidation Stage

The simplest consolidation approach uses the locally matched minutiae pair with
the maximum similarity value to align the two fingerprints for the global matching
step. After the alignment, all locally matched pairs are checked to see whether
they still match under the following constraints:
- The Euclidean distance between the two minutiae does not exceed the
  threshold ts.
- The difference between the two minutiae angles does not exceed the threshold tθ.

The two parameters ts and tθ define the tolerance window, and their values can be
determined by experiments. For example, in the TK algorithm [10], the distance
threshold ts = 12 and the angle threshold tθ = π/6 gave good fingerprint matching
results.
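The single-reference consolidation described above can be sketched as follows, assuming minutiae are (x, y, θ) tuples and the alignment is a rigid rotation plus translation that maps the reference minutia of one fingerprint onto its partner. The helper names `angle_diff`, `align`, and `count_consistent_pairs` are hypothetical; only the thresholds ts = 12 and tθ = π/6 come from the text.

```python
import math


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def align(m, ref_src, ref_dst):
    """Apply to minutia m the rigid transform that maps ref_src onto ref_dst."""
    rot = ref_dst[2] - ref_src[2]
    dx, dy = m[0] - ref_src[0], m[1] - ref_src[1]
    x = ref_dst[0] + dx * math.cos(rot) - dy * math.sin(rot)
    y = ref_dst[1] + dx * math.sin(rot) + dy * math.cos(rot)
    return (x, y, (m[2] + rot) % (2 * math.pi))


def count_consistent_pairs(pairs, ref_pair, t_s=12.0, t_theta=math.pi / 6):
    """Count locally matched pairs that still match after global alignment."""
    ref_a, ref_b = ref_pair
    count = 0
    for m_a, m_b in pairs:
        x, y, theta = align(m_b, ref_b, ref_a)
        near = math.hypot(x - m_a[0], y - m_a[1]) <= t_s
        if near and angle_diff(theta, m_a[2]) <= t_theta:
            count += 1
    return count


# Second fingerprint is the first one shifted by (50, -20); the last pair is wrong.
pairs = [((10, 10, 0.0), (60, -10, 0.0)),
         ((30, 40, 1.0), (80, 20, 1.0)),
         ((100, 100, 0.5), (300, 300, 0.5))]
consistent = count_consistent_pairs(pairs, ref_pair=pairs[0])
```

The reference pair itself always passes the check, so the count includes it. Whether it should be counted is a design choice that this sketch leaves open.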
However, the transformation derived from the minutiae pair with the maximum
similarity value may not be the best transformation at the global level. Several
authors have therefore adopted multiple candidate transformations for the
alignment, and finally choose the transformation that maximizes the number of
globally matching minutiae pairs. For instance, Medina-Pérez et al. [11] reduced
the number of locally matching minutiae pairs by keeping, for each minutia p and
each minutia q, only the pair that maximizes the similarity value, and then
performed the transformation for each minutiae pair in the reduced set. Feng [6]
sorted minutiae pairs by descending similarity value and chose the top minutiae
pairs for the transformation. These approaches normally achieve better accuracy
than the single-transformation approach.

3 Graphics Processing Units

The Compute Unified Device Architecture (CUDA) is one of the most widely
adopted frameworks for GPU computing. CUDA is a hardware and software
architecture that enables NVIDIA GPUs to execute parallel kernels written in C.
The physical architecture of a CUDA-enabled GPU consists of a set of Streaming
Multiprocessors (SMs), each containing a group of cores that operate in SIMD
(Single Instruction, Multiple Data) fashion. In the CUDA programming model, a
kernel is executed in parallel across a set of threads, which are organized into
blocks. All threads of the same block are executed on the same SM and share the
limited memory resources of that multiprocessor. The number of threads in a block
is bounded (1024 on the GPU used in this work). However, a kernel can be executed
by multiple, equally sized blocks, forming a grid: the total number of threads is
then equal to the number of blocks times the number of threads per block (Fig. 3).
Each SM schedules and executes threads in groups of 32 parallel threads called
warps. A warp executes one common instruction at a time, so full efficiency is
achieved when all 32 threads of a warp follow the same execution path. If threads
of the same warp take different paths (due to flow control instructions), they
have to wait for each other. Note also that GPU threads are extremely lightweight.
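The thread arithmetic above can be checked with a small Python sketch. The function name `launch_shape` and the returned field names are made up for this illustration; the warp size of 32 and the limit of 1024 threads per block are the values quoted in the text.

```python
WARP_SIZE = 32


def launch_shape(num_blocks, threads_per_block, max_threads_per_block=1024):
    """Summarize how a grid of equally sized blocks maps onto warps."""
    if not 1 <= threads_per_block <= max_threads_per_block:
        raise ValueError("threads_per_block outside the supported range")
    warps_per_block = -(-threads_per_block // WARP_SIZE)  # ceiling division
    return {
        "total_threads": num_blocks * threads_per_block,
        "warps_per_block": warps_per_block,
        # Lanes of the last warp that have no thread assigned to them.
        "idle_lanes_per_block": warps_per_block * WARP_SIZE - threads_per_block,
    }


shape = launch_shape(num_blocks=200000, threads_per_block=32)
```

With 32 threads per block, each block is exactly one warp and no lanes are idle, whereas a block of 40 threads would occupy two warps and leave 24 lanes unused.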
CUDA threads have access to various memory types (Fig. 3). Each thread has its
own registers, which are the fastest memory, and its private local memory (which
is slower). Each block has a small shared memory that is accessible to all threads
of the block and has the same lifetime as the block. All threads have access to the
global memory, the largest and slowest memory, which is used for communication
between different blocks and with the host. When a warp executes an instruction
that accesses global memory, it coalesces the memory accesses of the threads
within the warp into one or more memory transactions, depending on the size of
the word accessed by each thread and the distribution of the memory addresses
across the threads [15]. A very important optimization in CUDA is therefore to
ensure that global memory accesses are coalesced as much as possible.
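To make the effect of coalescing concrete, the following Python sketch counts how many memory segments a warp touches under a deliberately simplified model in which each transaction serves one aligned segment. The function name `count_transactions` and the 128-byte segment size are assumptions for this example; actual transaction sizes depend on the GPU architecture.

```python
def count_transactions(addresses, word_size=4, segment_size=128):
    """Number of aligned segments touched by the byte addresses of one warp."""
    segments = set()
    for addr in addresses:
        # A misaligned word can straddle two neighbouring segments.
        segments.add(addr // segment_size)
        segments.add((addr + word_size - 1) // segment_size)
    return len(segments)


coalesced = [4 * t for t in range(32)]   # thread t reads consecutive 4-byte words
strided = [128 * t for t in range(32)]   # thread t reads one word per segment

n_coalesced = count_transactions(coalesced)
n_strided = count_transactions(strided)
```

Under this model the consecutive pattern needs a single transaction, while the strided pattern needs one transaction per thread. This is why data layouts in which neighbouring threads read neighbouring words are preferred.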

Fig. 3 CUDA: grid, blocks, threads, and the various memory spaces [15].

4 Adapting Complete Fingerprint Matching to GPU

To identify a query fingerprint $Q$ in a database of $N$ template fingerprints
$\{T_1, \dots, T_N\}$ using the MCC-based matching algorithm, the local structure
matching stage first calculates $N$ similarity matrices, after which the similarity
score set $S = \{s_1, s_2, \dots, s_N\}$ is calculated. Figure 4 illustrates these steps.
When adapting the algorithm to the GPU, the aim is to maximize the number of
active threads. Threads are grouped into warps, each of which contains 32 threads.
Because the number of minutiae varies between fingerprints, the approaches in
[12,13] avoid divergence between the threads of a warp by dividing the algorithm
into two separate GPU kernel calls. The first kernel call calculates all similarity
matrices. The second kernel call calculates the scores from the similarity matrices.
When the algorithm is divided into separate calls, data must be transferred
between kernel calls, and some advantages of the GPU architecture, such as shared
memory, are not exploited. Cappelli et al. [13] used a very carefully designed
algorithm and an ad hoc technique to convert the similarity matrix to a fixed size.

Fig. 4 Calculating for fingerprint identification process using MCC [13]

The approaches in [12,13] do not use a consolidation stage to calculate the score
set $S = \{s_1, s_2, \dots, s_N\}$. Instead, they use the Local Similarity Sort (LSS)
technique, which sorts all values of the matrix and computes the average of the top
values. This does not guarantee that the top minutiae pairs are consistent with
each other. After that, more accurate matching algorithms are run on the CPU for
the remaining fingerprint candidates.
Our approach is based on the observation that 32 minutiae per fingerprint are
enough for the matching process. According to statistics from the FVC 2002
fingerprint databases, the average number of minutiae per fingerprint is 30, and
the average number of matching minutiae in a genuine match is 6. In our algorithm,
we use all minutiae to calculate the cylinders of the fingerprint, and then choose
the 32 minutiae whose cylinders have the largest number of bits set to 1. A minutia
whose cylinder has few bits set to 1 tends to lie near the border of the fingerprint.
Because all minutiae are used to calculate the cylinders, the bit vectors of the
cylinders are not affected by the selection.
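The selection rule in the paragraph above can be sketched in Python as follows, assuming each cylinder is already available as an integer bit vector computed from all minutiae. The function name `select_minutiae` is hypothetical. The text does not say how fingerprints with fewer than 32 minutiae are handled, so this sketch simply returns fewer indices in that case.

```python
def select_minutiae(cylinders, k=32):
    """Indices of the k cylinders with the most bits set, in original order."""
    ranked = sorted(range(len(cylinders)),
                    key=lambda i: bin(cylinders[i]).count("1"),
                    reverse=True)
    return sorted(ranked[:k])


# Toy example with 4 cylinders whose popcounts are 1, 3, 0 and 2.
kept = select_minutiae([0b1, 0b111, 0b0, 0b11], k=2)
```

Fixing the number of minutiae per fingerprint is what gives every similarity matrix the same shape, so that one warp can process one template without divergence caused by varying loop lengths.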

Input:
  − Template fingerprints {T_1, T_2, …, T_N}
  − A query fingerprint Q
Output:
  − Matching score set S = {s_1, s_2, …, s_N}
Kernel execution configuration:
  − 32 threads per block, N blocks
Variables:
  maxMatching     Shared memory
  maxSim[32]      Shared memory
  maxIdx[32]      Shared memory
  matching        Local counter of the current thread
  i, t            Block and thread index of the current thread

//Local structure matching stage
1.  c_t: the cylinder of minutia t of template T_i
2.  θ_t: the angle of minutia t of template T_i
3.  For j = 1 to 32
4.    c_j: the cylinder of minutia j of query fingerprint Q
5.    θ_j: the angle of minutia j of query fingerprint Q
6.    If (d_θ(θ_t, θ_j) ≤ δ_θ)
7.      s = γ(c_t, c_j)
8.      updateMax(maxSim[t], s)
        updateMax(maxIdx[t], j)
9.    End If
10. End For
11. __syncthreads()
//Consolidation stage
12. matching = 0
13. For j = 1 to 32
14.   If j <> t and pair (t, j) of T_i is consistent with
      pair (maxIdx[t], maxIdx[j]) of Q
15.     matching++
16.   End If
17. End For
18. atomicMax(maxMatching, matching)
19. __syncthreads()
20. If (t == 0) s_i = maxMatching / (32 ∗ 32)

Fig. 5 Complete fingerprint matching on GPU

With our approach, all the similarity matrices in Figure 4 have the same size of
32×32. We use one block for matching the query $Q$ with each template $T_i$, and
each block has 32 threads. Each thread of the block calculates one column of the
similarity matrix and finds the maximum value in that column. The minutiae pairs
with maximum values are then checked against each other in the consolidation
stage to guarantee that they match. The details of the algorithm are presented in
Figure 5.

- All the cylinders and angles of the database templates are loaded into the
  global memory of the GPU beforehand.
- The GPU block with index $i$ calculates the similarity between template $T_i$ of
  the fingerprint database and the query fingerprint $Q$. Each thread with index
  $t$ in the block calculates the maximum value of column $t$ of the similarity
  matrix using the loop at line 3 and stores that value in a shared array.
- The similarity in line 7 is calculated using Formula 1.
- The __syncthreads() function in line 11 is a barrier for all threads of the
  block; after it, the results of all threads of the block are available for the
  consolidation stage.
- The atomicMax() function in line 18 avoids the race condition that occurs
  when two or more threads of the block update the shared variable maxMatching
  at the same time.
- The similarity score $s_i$ is calculated in line 20 by the first thread of the
  block, using the maximum matching value found by all threads of the block.
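The per-block logic described in these points can be emulated sequentially on the CPU, which may help when checking a GPU implementation against a reference. The Python sketch below is one possible reading, not the authors' kernel: minutiae are (x, y, θ, bits) tuples, each loop iteration over t plays the role of one thread, and the pairwise check in `consistent` (alignment on the thread's own best pair, then the ts and tθ thresholds of Section 2.2) is an assumption about how the consolidation test is carried out. All function names and default thresholds are hypothetical.

```python
import math

K = 32  # minutiae kept per fingerprint, as in the text


def angle_diff(a, b):
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def similarity(v_a, v_b, th_a, th_b, delta_theta):
    """Formula 1 on integer bit vectors."""
    if angle_diff(th_a, th_b) > delta_theta:
        return 0.0
    den = math.sqrt(bin(v_a).count("1")) + math.sqrt(bin(v_b).count("1"))
    if den == 0.0:
        return 0.0
    return 1.0 - math.sqrt(bin(v_a ^ v_b).count("1")) / den


def consistent(t_ref, t_other, q_ref, q_other, t_s, t_theta):
    """Assumed check: align Q on (q_ref -> t_ref), then apply both thresholds."""
    rot = t_ref[2] - q_ref[2]
    dx, dy = q_other[0] - q_ref[0], q_other[1] - q_ref[1]
    x = t_ref[0] + dx * math.cos(rot) - dy * math.sin(rot)
    y = t_ref[1] + dx * math.sin(rot) + dy * math.cos(rot)
    near = math.hypot(x - t_other[0], y - t_other[1]) <= t_s
    return near and angle_diff(q_other[2] + rot, t_other[2]) <= t_theta


def match_block(template, query, delta_theta=math.pi / 2,
                t_s=12.0, t_theta=math.pi / 6):
    """Sequential emulation of one GPU block (one template against the query)."""
    n = len(template)
    best_sim, best_idx = [0.0] * n, [-1] * n
    # Local structure matching: "thread" t scans its column of the matrix.
    for t in range(n):
        for j, q in enumerate(query):
            s = similarity(template[t][3], q[3], template[t][2], q[2], delta_theta)
            if s > best_sim[t]:
                best_sim[t], best_idx[t] = s, j
    # The barrier would sit here; sequential code needs none.
    max_matching = 0
    for t in range(n):
        if best_idx[t] < 0:
            continue
        matching = 0
        for j in range(n):
            if j != t and best_idx[j] >= 0 and consistent(
                    template[t], template[j],
                    query[best_idx[t]], query[best_idx[j]], t_s, t_theta):
                matching += 1
        max_matching = max(max_matching, matching)  # stands in for atomicMax
    return max_matching / (K * K)


template = [(10, 10, 0.0, 0b11110000), (30, 40, 1.0, 0b00001111),
            (60, 20, 2.0, 0b10101010), (80, 70, 0.5, 0b01010101)]
shifted = [(x + 5, y - 3, th, bits) for x, y, th, bits in template]
score = match_block(template, shifted)
```

In this toy case the query is the template shifted by a few pixels, so every candidate alignment confirms the three other pairs. Each iteration of the second loop evaluates one candidate transformation, which corresponds to the multiple-transformation consolidation of Section 2.2 with the maximum taken across threads.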

5 Experimental Results

To evaluate the proposed approach, we used the FVC 2002 DB1 database to carry
out experiments. Minutiae extraction and MCC cylinder creation used the tools
from Medina-Pérez et al. [25]. The MCC templates of the fingerprints were stored
on disk and used for the experiments.
To evaluate the accuracy of the proposed algorithm, its result is compared with
that of the MCC baseline, in which all minutiae are used for the matching process.
We achieved an EER of 1.34% against 1.26% for the MCC baseline. This is a minor
difference and is acceptable in real-world applications.
To evaluate the speed of the proposed algorithm, we carried out all experiments
on an NVIDIA GeForce GTX 680 GPU with 1536 CUDA cores, the Kepler
architecture, and 2 GB of memory. The FVC 2002 database was scaled to different
database sizes (ranging from 10 000 to 200 000) to study how the GPU-based
algorithm scales with the database size. Ten input fingerprints were randomly
selected to be identified. Table 1 shows the results of the experiments with
databases of different sizes.

Table 1 Execution time of the first ten queries with different database sizes

DB size    Time (ms)    Throughput (KMPS)
10000      58           1724
50000      284          1760
100000     567          1763
150000     850          1764
200000     1105         1809
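The throughput column can be reproduced from the other two columns with a short Python calculation, assuming that the time column is the total time for the ten queries and that KMPS stands for thousands of matches per second. The function name `throughput_kmps` is made up for this sketch.

```python
def throughput_kmps(db_size, total_time_ms, num_queries=10):
    """Thousands of matches per second for num_queries full database scans."""
    matches = db_size * num_queries
    # Matches per millisecond equals thousands of matches per second.
    return matches / total_time_ms


rows = [(10000, 58), (50000, 284), (100000, 567), (150000, 850), (200000, 1105)]
computed = [int(throughput_kmps(size, ms)) for size, ms in rows]
```

Under these assumptions the truncated values agree with all five rows of Table 1, which suggests that the reported throughput counts one match per template per query.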

At larger database sizes, the throughput of the proposed algorithm is stable at
about 1.8 million matches per second, and no scalability issues were found. This is
higher than the result reported for the previously published GPU algorithm [12],
which achieves 55.7 KMPS on a single GPU device of the same model as ours.
Although [12] used a different fingerprint database in its experiments, the
average number of minutiae per fingerprint is quite stable across databases.

6 Conclusions

This paper proposes an approach for adapting all stages of the MCC-based
fingerprint matching algorithm to the GPU. Our approach uses all minutiae to
calculate the cylinders but selects 32 minutiae for the matching process, which
makes it fit the GPU computing architecture well. The proposed method has only a
minor effect on the accuracy of the original algorithm (an EER of 1.34% against
1.26%), and the speed of the adapted algorithm reaches a state-of-the-art result.
The proposed approach can be scaled up easily, which makes it possible to
implement large-scale fingerprint identification systems with inexpensive
hardware.

References
1. Maltoni, D., Maio, D., Jain, A.K., Prabhakar, S.: Handbook of Fingerprint
Recognition. Springer, London (2009)
2. Cappelli, R., Maio, D., Maltoni, D., Wayman, J.L., Jain, A.K.: Performance evaluation
of fingerprint verification systems. IEEE Trans. Pattern Anal. Mach. Intell. 28, 3–18
(2006)
3. Chikkerur, S., Cartwright, A.N., Govindaraju, V.: K-plet and CBFS: a graph based
fingerprint representation and matching algorithm. In: International Conference on
Biometrics, pp. 309–315 (2006)
4. Cappelli, R., Ferrara, M., Maltoni, D.: Minutia cylinder-code: A new representation
and matching technique for fingerprint recognition. IEEE Trans. Pattern. Anal. Mach.
Intell. 32, 2128–2141 (2010)
5. Xu, W., Chen, X., Feng, J.: A robust fingerprint matching approach: growing and
fusing of local structures. In: Proceedings of the 2nd International Conference on
Biometrics, Seoul, Korea, 27–29 August 2007. LNCS, vol. 4642, pp. 134–143 (2007)


6. Feng, J.: Combining minutiae descriptors for fingerprint matching. Pattern Recognition
41, 342–352 (2008)
7. Wang, X., Li, J., Niu, Y.: Fingerprint matching using Orientation Codes and
PolyLines. Pattern Recognition 40, 3164–3177 (2007)
8. Feng, J., Ouyang, Z., Cai, A.: Fingerprint matching using ridges. Pattern Recognition
39, 2131–2140 (2006)
9. Qi, J., Yang, S., Wang, Y.: Fingerprint matching combining the global orientation field
with minutia. Pattern Recognition Letters 26, 2424–2430 (2005)
10. Tico, M., Kuosmanen, P.: Fingerprint matching using an orientation-based minutia
descriptor. IEEE Trans. Pattern Anal. Mach. Intell. 25, 1009–1014 (2003)
11. Medina-Pérez, M.A., García-Borroto, M., Gutierrez-Rodriguez, A.E., Altamirano-Robles,
L.: Robust fingerprint verification using m-triplets. In: International Conference on
Hand-Based Biometrics (ICHB 2011), Hong Kong, pp. 1–5 (2011)
12. Gutierrez, P.D., Lastra, M., Herrera, F., Benitez, J.M.: A high performance fingerprint
matching system for large databases based on GPU. IEEE Trans. Inf. Forensics Secur.
9(1), 62–71 (2014)

13. Cappelli, R., Ferrara, M., Maltoni, D.: Large-scale fingerprint identification on GPU.
Inf. Sci. 306, 1–20 (2015)
14. Peralta, D., Triguero, I., Sanchez-Reillo, R., Herrera, F., Benitez, J.M.: Fast fingerprint
identification for large databases. Pattern Recogn. 47(2), 588–602 (2014)
15. Luebke, D., et al.: GPGPU: general-purpose computation on graphics hardware. In: SC
2006 Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (2006)
16. Cappelli, R., Maio, D.: State-of-the-art in fingerprint classification. In: Ratha, N.,
Bolle, R. (eds.) Automatic Fingerprint Recognition Systems, pp. 183–205. Springer,
New York (2004)
17. Hong, J.H., Min, J.K., Cho, U.K., Cho, S.B.: Fingerprint classification using one-vs-all
support vector machines dynamically ordered with naive Bayes classifiers. Pattern
Recognition 41(2), 662–671 (2008)
18. Cappelli, R., Ferrara, M., Maltoni, D.: Fingerprint indexing based on minutia cylinder
code. IEEE Trans. Pattern Anal. Mach. Intell. 33(5), 1051–1057 (2011)
19. Bhanu, B., Tan, X.: A triplet based approach for indexing of fingerprint database for
identification. In: Proc. Int. Conf. on Audio- and Video-Based Biometric Person
Authentication (3rd), pp. 205–210 (2001)
20. Unique Identification Authority of India, Role of Biometric Technology in Aadhaar
Enrollment (2012)
21. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep
convolutional neural networks. In: NIPS 2012, pp. 1106–1114 (2012)
22. Zhang, Y., Yi, D., Wei, B., Zhuang, Y.: A GPU-accelerated non-negative sparse latent
semantic analysis algorithm for social tagging data. Inform Sci., May 2014
23. Friedrichs, M., Eastman, P., Vaidyanathan, V., Houston, M., Legrand, S., Beberg, A.,
et al.: Accelerating molecular dynamic simulation on graphics processing units. J.
Comput. Chem. 30(6), 864–872 (2009)
24. Schatz, M., Trapnell, C., Delcher, A., Varshney, A.: High-throughput sequence
alignment using graphics processing units. BMC Bioinformat. 8, 474 (2007)
25. Medina-Pérez, M.A., Loyola-González, O., Gutierrez-Rodríguez, A.E., García-Borroto, M.,
Altamirano-Robles, L.: Introducing an experimental framework in C# for fingerprint
recognition. In: LNCS, vol. 8495 (2014)
