Proceedings of the 8th Student Scientific Research Conference, University of Danang, 2012
HANDWRITTEN NUMBER RECOGNITION AND ITS APPLICATION AT
DANANG UNIVERSITY OF TECHNOLOGY
Authors: Duong Thi Kim Cuc, Dinh Quang Huy, Tran Hoang An, Nguyen Van Trong
Da Nang University of Technology, Center of Excellence, ECE08
Advisors: Hoang Le Uyen Thuc, M.S., Pham Van Tuan, Ph.D.
Da Nang University of Technology, Department of Electronics and Telecommunications
ABSTRACT
This paper presents the results of handwritten digit recognition on image databases using
state-of-the-art feature extraction and classification techniques. The test databases are
MNIST [1] and a collection of digits handwritten by teachers at Da Nang University of
Technology. For feature extraction, two features are chosen: Hu’s seven moments and image
averaging (resizing each image to one with fewer pixels for easier comparison). These features
are paired with two classifiers: a Neural Network classifier and Euclidean Distance. So far,
with the matching dictionary collected at Da Nang University of Technology, the combination of
the image averaging feature and the Euclidean Distance gives the best accuracy (more than 93%),
which can be further improved with a more comprehensive database.
1. Introduction
One of the most troublesome and tedious tasks that teachers at Da Nang University of
Technology generally face is manually entering exam grades into computers. This project
aims to spare them from copying the grades by hand by presenting a method of
automatically importing grades into computers. This technique employs a well-known
procedure in pattern recognition called OCR (optical character recognition).
The performance of character recognition largely depends on the feature extraction
approach and the classification/learning scheme. For feature extraction in character
recognition, various approaches have been proposed. Hu’s seven moments have been
extensively employed as invariant global features of images in pattern recognition.
Averaging is a rather simple process of representing a square of pixels by a single pixel,
leading to an image being expressed by a smaller image.
An artificial neural network (ANN) consists of an interconnected group of artificial
neurons, and it processes information using a connectionist approach to computation.
When the network structure is appropriately designed and the training sample size is large,
neural networks are able to give high classification accuracy on unseen test data. OCR
using template matching is a prototype system that recognizes a character or digit by
comparing two images. We implement a template-matching technique, which involves
minimizing the Euclidean Distance between the pattern to be recognized and the sample
patterns provided by the dictionary.
2. Main process
2.1. Proposed System
The scanned image is first preprocessed to give the normalized image. To ease the
classification process, the normalized image is represented by a set of features for
comparison. Finally, the conversion from the JPEG file to xls takes place.
Figure 1. Proposed System Overview.
2.2. Preprocessing
There are four main steps in this stage. First, the scanned RGB image is converted to a
gray-scale image using the formula Intensity = 0.2989*red + 0.5870*green + 0.1140*blue [2].
Second, the image is thresholded to obtain a binary image. The threshold level, which is
chosen to be 70 in this case, depends on the quality of the scanned image and the background
noise. The output binary image has a value of 1 (white) for all pixels in the input image with
luminance greater than 70 and 0 (black) for all other pixels. Third, the “Opening” morphology
method [3] is applied to smooth the digit and eliminate small noise regions. Finally,
normalization is used to regulate the size, position and shape of the image so that the
differences between samples in one class are reduced. The key idea behind normalization
involves bilinear interpolation [4]. All of these steps are depicted in Figure 2.
(a) (b) (c) (d) (e)
Figure 2. Preprocessing Steps. (a) RGB Image. (b) Gray-scale Image. (c) Binary Image.
(d) Image after Morphology. (e) Normalized Image.
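The first two preprocessing steps, gray-scale conversion and thresholding, can be sketched in Python as follows. This is a minimal illustration and not the code used in the project: the function names to_gray and to_binary are hypothetical, and the image is assumed to be a list of rows of (red, green, blue) tuples with 8-bit values.

```python
def to_gray(rgb_image):
    """Convert rows of (red, green, blue) tuples to luminance values."""
    return [
        [0.2989 * r + 0.5870 * g + 0.1140 * b for (r, g, b) in row]
        for row in rgb_image
    ]


def to_binary(gray_image, level=70):
    """Return 1 (white) where luminance exceeds the level, else 0 (black)."""
    return [[1 if value > level else 0 for value in row] for row in gray_image]
```

The default level of 70 mirrors the value quoted in the text; a different scanner or paper quality would likely need a different level.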
2.3. Feature extraction
The features used in our experiment are Hu’s seven moments and image averaging.
2.3.1 Hu’s seven moments (SM)
An essential issue in the field of pattern analysis is the recognition of objects and
characters regardless of their position, size and orientation, as illustrated in Figure 1. The
idea of using moments in shape recognition gained prominence when Hu (1962) [5] derived
a set of moment invariants using the theory of algebraic invariants. The two-dimensional
(p + q)th-order moments of an image with density function f(x, y) are defined in terms of
Riemann integrals as:
$$m_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^p y^q f(x, y)\,dx\,dy \qquad (1)$$
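The central moments are defined as:
$$\mu_{pq} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \bar{x})^p (y - \bar{y})^q f(x, y)\,dx\,dy \qquad (2)$$
where the centroid of the image is obtained from the moments $m_{pq}$ of order (p + q) as
$$\bar{x} = \frac{m_{10}}{m_{00}}, \quad \bar{y} = \frac{m_{01}}{m_{00}} \qquad (3)$$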
In particular, Hu (1962) defines seven values, computed from normalized central
moments through order three, that are invariant to object scale, position, and orientation.
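As an illustration, the seven values can be computed for a digital image by replacing the integrals with sums over pixels. The sketch below is not the implementation used in the experiments. It assumes the usual normalization of central moments, eta_pq = mu_pq / mu_00^(1 + (p + q)/2), and the standard form of the seven invariants. The function name hu_moments is hypothetical, and the image is assumed to be a list of rows of non-negative intensities.

```python
def hu_moments(image):
    """Return Hu's seven moment invariants for a 2-D list of intensities."""
    m00 = sum(v for row in image for v in row)
    if m00 == 0:
        raise ValueError("image has no non-zero pixels")
    # Centroid from the first-order moments.
    x_bar = sum(x * v for row in image for x, v in enumerate(row)) / m00
    y_bar = sum(y * v for y, row in enumerate(image) for v in row) / m00

    def eta(p, q):
        """Normalized central moment of order (p + q)."""
        mu = sum(
            (x - x_bar) ** p * (y - y_bar) ** q * v
            for y, row in enumerate(image)
            for x, v in enumerate(row)
        )
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03          # recurring sums
    c, d = n30 - 3 * n12, 3 * n21 - n03  # recurring differences
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]
```

On a pixel grid, the values are unchanged by translation and by rotation through multiples of 90 degrees up to rounding error. Invariance to scale and to arbitrary rotation angles holds only approximately because of resampling.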
2.3.2 Image Averaging
Since the pixel matrices of the ten digits from 0 to 9 differ considerably, it is
reasonable to recognize them by checking each ‘pseudo pixel’, which represents a 4x4
block of a digit image. Each 4x4 block is replaced by its average value and treated as a
single ‘pixel’. Choosing 4x4 blocks reduces the complexity of the recognition process
while still preserving the shape of the digit. Figure 3 shows an image of the digit zero
before and after averaging.
(a) (b)
Figure 3. Example of Image Averaging. (a) Initial Image. (b) Average Image
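The averaging step can be sketched as follows. This is a minimal illustration under stated assumptions, not the project code: the function name block_average is hypothetical, the image is a list of rows of numeric pixel values, and its height and width are assumed to be multiples of the block size.

```python
def block_average(image, block=4):
    """Replace each block x block tile by its mean, giving a smaller image."""
    if not image or not image[0]:
        raise ValueError("image is empty")
    height, width = len(image), len(image[0])
    if height % block or width % block:
        raise ValueError("image size must be a multiple of the block size")
    averaged = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            total = sum(
                image[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block)
            )
            row.append(total / (block * block))
        averaged.append(row)
    return averaged
```

With 4x4 blocks, the number of values to compare per image drops by a factor of 16.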
2.4. Classification algorithm
2.4.1 Artificial neural network
Artificial neural networks [6], which are inspired by studies of biological nervous
systems, are composed of many simple nonlinear computational elements (neurons or
nodes) which are connected by links with variable weights. The inherent parallelism
of these networks allows rapid pursuit of many hypotheses in parallel, resulting in
high computation rates. Moreover, they provide a greater degree of robustness or fault
tolerance than conventional computers because of the many processing nodes, each of
which is responsible for a small portion of the task. Damage to a few nodes or
links thus does not impair overall performance significantly. Neural networks can
perform different tasks, one of which is supervised classification.
This is a decision-making process which requires the net to identify the class or
category which best represents an input pattern. It is assumed that the net has
already adapted to the classes it is expected to recognize through a learning process
using labeled training prototypes from each category.
Figure 4. General structure of a neural network [6]
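To make the idea of nodes connected by weighted links concrete, the sketch below runs one forward pass through a very small network and picks the class with the highest output. It is an illustration only. The layer sizes, the sigmoid activation and the hand-picked weights are all assumptions, since the paper does not specify the architecture, and the function names are hypothetical. In practice the weights would be learned from labeled training samples.

```python
import math


def sigmoid(z):
    """Nonlinear activation applied at each node."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """Each node outputs the activation of its weighted input sum plus bias."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def classify(inputs, hidden_w, hidden_b, output_w, output_b):
    """Return (index of the strongest output node, all output scores)."""
    hidden = layer(inputs, hidden_w, hidden_b)
    scores = layer(hidden, output_w, output_b)
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores


# Hand-picked illustrative weights (not trained, not taken from the paper).
HIDDEN_W = [[5.0, -5.0], [-5.0, 5.0]]
HIDDEN_B = [0.0, 0.0]
OUTPUT_W = [[5.0, -5.0], [-5.0, 5.0]]
OUTPUT_B = [0.0, 0.0]
```

For digit recognition the output layer would have ten nodes, one per digit, and the input would be a feature vector such as the seven moments or the averaged image.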
2.4.2. Template matching using Euclidean Distance
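The Euclidean Distance classifier [7] selects the entry of a dictionary, built up in
advance, that has the smallest ‘distance’ or error to the testing sample:
$$d = \min_{1 \le n \le N} \left( \sum_{j=1}^{k} \big( \mathrm{feature}(j) - \mathrm{dictionary}(j, n) \big)^2 \right) \qquad (4)$$
where k is the number of feature values per sample and N is the number of samples in the
dictionary.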
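A minimal sketch of this matching step is given below. The function name nearest_template and the representation of the dictionary as a list of (label, template) pairs are assumptions made for illustration, and the feature vectors are assumed to be flat lists of equal length.

```python
def nearest_template(feature, dictionary):
    """Return (label, distance) of the dictionary entry closest to feature.

    The distance is the sum of squared differences over all feature values.
    """
    best_label, best_distance = None, float("inf")
    for label, template in dictionary:
        if len(template) != len(feature):
            raise ValueError("feature and template lengths differ")
        distance = sum((f - t) ** 2 for f, t in zip(feature, template))
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label, best_distance
```

The square root is omitted because it does not change which entry is selected.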
3. Experimental results
Combining the two features with the two classifiers gives four experiments: Hu’s SM and
Neural Network, SM and Euclidean Distance, Image Averaging and Neural Network, and
Image Averaging and Euclidean Distance.
Table 1: Error rates for data from MNIST (over 1000 samples)
                      Hu’s seven moments    Image averaging
Neural Network        48.18%                14.3%
Euclidean Distance    7.2%                  8.4%
Table 2: Error rates for data from DUT (over 90 samples)
                      Hu’s seven moments    Image averaging
Neural Network        83.6%                 4.2%
Euclidean Distance    8.2%                  10%
Figure 6 shows the actual result from the Graphical User Interface (GUI).
Figure 6. The experimental results of DUT and MNIST databases. (a) DUT database.
Figure 7 introduces the GUI. The user types the name of the JPEG file and the
corresponding number of students, and then clicks the Convert button to get an xls file
containing the extracted scores. To view the xls file, the user clicks the “Open to view xls
file” button.
Figure 7. Graphical User Interface
4. Conclusion
The experimental results indicate that the combination of Image Averaging and Euclidean
Distance gives smaller and more stable errors than the combination of Neural Network and
SM, while the best performance is obtained using the Neural Network classifier. Based on
these statistics, we decided to implement Image Averaging and Euclidean Distance in the
final Number Recognition Product. In the future, this program will be upgraded to recognize
scores written as decimal numbers (such as 9.5 or 10). A system that recognizes scores
written in words will also be added to cross-check the result.
REFERENCES
[1] Yann LeCun, Corinna Cortes, The MNIST Database of Handwritten Digits,
[2]
[3] Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, second
edition (2001), pp. 528-532.
[4]
[5] Ming-Kuei Hu, “Visual Pattern Recognition by Moment Invariants” (1962).
[6] V. Venugopal, W. Baets, Neural Networks and Statistical Techniques in Marketing
Research: A Conceptual Comparison (1994), Vol. 12 Iss: 7, pp.30 – 38.
[7] Cheng-Lin Liu, “Handwritten Digit Recognition: Benchmarking of State-of-the-Art
Techniques”, Elsevier Ltd, (2002).