HANDWRITING RECOGNITION
BASED ON CONVOLUTIONAL NEURAL NETWORK
PHAM TUAN DAT, LE THE ANH
Faculty of Information Technology, Vietnam Maritime University
Abstract
In recent years, a new research approach has emerged based on the convolutional neural network, which is known as one of the most advanced deep learning models. In fact, the convolutional neural network has been widely applied to many artificial intelligence problems such as object recognition, feature detection, and text classification. To meet the requirements of these problems, applications must be built on an efficient algorithm. In this paper, the authors present an overview of the convolutional neural network model, including the major function of each layer and the transformations performed in this network. Based on that theory, the authors have designed an application for English handwriting recognition using a convolutional neural network and have carried out comparative tests between several recognition algorithms.
Keywords: Convolutional, perceptron, nearest neighbor, feature map, pooling, receptive field, backpropagation, cross-entropy.
1. Introduction


The convolutional neural network (CNN) is an advanced deep learning model that can support a variety of artificial intelligence problems. For instance, major problems such as feature detection and object recognition in digital images can be solved by applying a CNN model. Currently, many applications for face recognition or natural language processing have been deployed on different computer and mobile platforms.
Compared with other recognition algorithms, a neural network has the ability to learn, to tolerate faults, and to classify samples into different classes. The CNN is an improvement over the traditional neural network, so it retains these advantages while also offering high recognition accuracy. Furthermore, the training algorithm of a neural network can generate parameters to be used as input for other models such as the support vector machine. All network models can be applied to artificial intelligence fields, but only some of them are efficient enough for recognition problems. Therefore, in this paper the authors present the overview model and comparative experiments for English handwriting recognition based on the CNN, the multilayer perceptron network (MLP), and the nearest neighbor algorithm.
2. Theoretical Background
A CNN differs from an MLP in several respects. Firstly, a CNN consists of hidden layers that are linked together through convolution operations and non-linear functions. Secondly, in a CNN, each neuron of a hidden layer is connected only to some neurons in a local region of the input layer. Lastly, a CNN is built on basic concepts such as the local receptive field, shared weights, and pooling.
Local receptive field: let the input of the network be a digital image of size 28*28 that is scanned by a 5*5 window, as depicted in Figure 1. If the window is moved one pixel at a time (from left to right and from top to bottom of the image), then 24*24 regions of the image are generated, corresponding to 24*24 neurons at the convolutional layer; this mapping is known as a feature map.


Figure 1. A neuron is created from a local region

Shared weights: each feature map has 26 weights or parameters, including 25 shared weights and 1 bias parameter, so 6 feature maps produce 156 weights in total. In practice, recognition applications often have dozens of feature maps.

Figure 2. Max-pooling procedure

Pooling: its task is to reduce the number of neurons; each region of 2*2 neurons creates 1 neuron at the next layer. Two helpful pooling procedures are max-pooling and l2-pooling.
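As a small illustration of the max-pooling procedure described above, the following Python sketch (the array values are invented for the example and NumPy is assumed to be available) reduces each non-overlapping 2*2 region of a feature map to a single neuron:

import numpy as np

def max_pool_2x2(feature_map):
    # Reduce each non-overlapping 2x2 region to its maximum value
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# Example: a 4x4 feature map becomes a 2x2 map after pooling
fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [3, 2, 4, 6]], dtype=float)
print(max_pool_2x2(fm))   # [[4. 5.]
                          #  [3. 7.]]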
Figure 3 describes the overview model of a CNN. The structure of the network consists of an input layer, an output layer, 2 convolutional layers, 2 pooling layers, and 1 fully connected layer. The input size is 28*28, the number of feature maps is 6, and the output layer recognizes labels from 0 to 9. Of course, if the input size or the number of classes is larger, then the network must have more layers.

Figure 3. The overview model of one CNN

Transformations from local regions to 1st convolutional layer:

C^1_p(i, j) = \sigma\left( \sum_{u=-2}^{2} \sum_{v=-2}^{2} I(i+u, j+v)\, k^1_{1,p}(u, v) + b^1_p \right), \quad p = 1..6; \; i, j = 1..24 \qquad (1)

where \sigma(x) = \frac{1}{1 + e^{-x}}.

The activation function in this case is the sigmoid, although a CNN may also use another activation function such as tanh.
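A minimal NumPy sketch of the transformation in Equation (1), assuming a 28*28 input array I, a single 5*5 kernel k, and a scalar bias b (the variable names and the random test data are illustrative; boundary handling follows the valid-convolution reading of the equation):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_feature_map(I, k, b):
    # Equation (1): 5x5 convolution plus bias, followed by the sigmoid activation
    n = I.shape[0] - k.shape[0] + 1          # 28 - 5 + 1 = 24 output positions
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            region = I[i:i + 5, j:j + 5]     # local receptive field
            C[i, j] = np.sum(region * k) + b
    return sigmoid(C)

# A 28x28 image produces a 24x24 feature map, matching the indices in Equation (1)
I = np.random.rand(28, 28)
k = np.random.randn(5, 5) * 0.1
print(conv_feature_map(I, k, 0.0).shape)     # (24, 24)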
Transformations from 1st convolutional layer to 1st pooling layer:

S^1_p(i, j) = \frac{1}{4} \sum_{u=0}^{1} \sum_{v=0}^{1} C^1_p(2i+u, 2j+v), \quad p = 1..6; \; i, j = 1..12 \qquad (2)
Transformations from 1st pooling layer to 2nd convolutional layer:

C^2_q(i, j) = \sigma\left( \sum_{p=1}^{6} \sum_{u=-2}^{2} \sum_{v=-2}^{2} S^1_p(i+u, j+v)\, k^2_{p,q}(u, v) + b^2_q \right), \quad q = 1..12; \; i, j = 1..8 \qquad (3)

Transformations from 2nd convolutional layer to 2nd pooling layer:

S^2_q(i, j) = \frac{1}{4} \sum_{u=0}^{1} \sum_{v=0}^{1} C^2_q(2i+u, 2j+v), \quad q = 1..12; \; i, j = 1..4 \qquad (4)

In Equations (2) and (4), the procedure used is l2-pooling.
Transformations from 2nd pooling layer to fully connected layer:

f = F(\{S^2_q\}_{q=1..12})
y = \sigma(W f + b) \qquad (5)

where F flattens the 12 pooled feature maps into a single vector f, and W and b are the weights and bias of the fully connected layer.

Estimation error between the true label and the prediction: there are two choices, the cross-entropy cost and the mean squared error cost. Recent research has shown that applying the cross-entropy cost is better than applying the mean squared error cost in a neural network [4]. Furthermore, to increase the accuracy of the recognition problem, the CNN must be trained with the backpropagation algorithm, whose details are described in [1], [3].
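As a brief illustration of the two cost functions, here is a NumPy sketch for a single sample with a one-hot label vector t and a predicted probability vector y (the example values and the small epsilon guard are illustrative choices, not taken from the paper):

import numpy as np

def cross_entropy(y, t, eps=1e-12):
    # Cross-entropy cost between prediction y and one-hot label t
    y = np.clip(y, eps, 1.0 - eps)      # avoid log(0)
    return -np.sum(t * np.log(y))

def mean_squared_error(y, t):
    # Mean squared error cost, shown for comparison
    return 0.5 * np.sum((y - t) ** 2)

# Example: the true class is index 2
t = np.array([0.0, 0.0, 1.0, 0.0])
y = np.array([0.1, 0.2, 0.6, 0.1])
print(cross_entropy(y, t))              # about 0.51
print(mean_squared_error(y, t))         # about 0.11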
3. Experimental Work
The authors have carried out experiments on English handwriting recognition with three recognition algorithms based on the CNN [2], the MLP, and nearest neighbor. The toolkit for the application is Python together with the relevant libraries.
Patterns were collected from "www.ee.surrey.ac.uk/CVSSP/demos/chars74k/" and cover 62 classes (0-9, a-z, A-Z). Of all the patterns, the number of digits is 484, while upper-case letters and lower-case letters account for 1297 and 1139 patterns respectively. The patterns were split between training and testing at a ratio of 7:3.
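A minimal sketch of the 7:3 split described above, written with scikit-learn's train_test_split (the paper does not state which utility was used, and the placeholder arrays below merely stand in for the Chars74K patterns):

import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the 2920 Chars74K patterns with 62 classes
X = np.random.rand(2920, 28 * 40)           # flattened 28*40 image patterns
y = np.random.randint(0, 62, size=2920)     # class labels 0..61

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)   # 7:3 train/test ratio
print(X_train.shape, X_test.shape)          # (2044, 1120) (876, 1120)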
The CNN structure used in the experiments is the same as the model above but with some differences: the size of the input data is 28*40, the kernel numbers are 16 and 32 for the two convolutional layers, the learning rate is 0.01, the procedure used in the transformations from the convolutional layers to the pooling layers is max-pooling, and the estimation function is the cross-entropy.
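A sketch of a network with roughly this configuration, written with the Keras API of TensorFlow (the paper builds its model with TensorFlow [2] but gives no code; the 5*5 kernel size, the width of the dense layer, and the plain SGD optimizer below are assumptions made for illustration):

import tensorflow as tf
from tensorflow.keras import layers, models

# Approximation of the experimental CNN: 28*40 input, 16 and 32 kernels,
# max-pooling after each convolution, and a softmax output over 62 classes
model = models.Sequential([
    layers.Input(shape=(28, 40, 1)),
    layers.Conv2D(16, (5, 5), activation='sigmoid', padding='same'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (5, 5), activation='sigmoid', padding='same'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(1024, activation='sigmoid'),   # assumed fully connected width
    layers.Dense(62, activation='softmax'),     # 62 classes: 0-9, a-z, A-Z
])

# Cross-entropy cost and learning rate 0.01, as stated in the paper
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()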
The MLP network has 3 hidden layers with 1024 features each and, like the CNN, it also uses the cross-entropy function, while the estimation function of the nearest neighbor algorithm is the Manhattan distance (l1 norm) [5].
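A hedged scikit-learn sketch of the two baseline classifiers (the paper does not state which library was used for them; the value k = 1 for the nearest neighbor and the iteration limit are assumptions):

from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Nearest neighbor with the Manhattan (l1) distance described in the paper
knn = KNeighborsClassifier(n_neighbors=1, metric='manhattan')

# MLP with 3 hidden layers of 1024 units; scikit-learn trains classification
# MLPs with the cross-entropy (log) loss by default
mlp = MLPClassifier(hidden_layer_sizes=(1024, 1024, 1024),
                    learning_rate_init=0.01, max_iter=200)

# With the split from the previous sketch:
# knn.fit(X_train, y_train); print(knn.score(X_test, y_test))
# mlp.fit(X_train, y_train); print(mlp.score(X_test, y_test))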
The process of training and testing upper-case letters on the CNN:

Figure 4. Training and testing upper-case letters on the CNN

The experiment detects handwritten letters in a digital image according to the labels of the above model:

Figure 5. Recognizing lower-case letters

Figure 6 shows the recognition results of the algorithms. The CNN model recognizes upper-case letter patterns well and the MLP network also gives rather good results, while nearest neighbor achieves the highest accuracy. Nevertheless, for lower-case letters, none of the algorithms gives the recognition results that were expected.
                                  Digits    Lower letters    Upper letters
Convolutional neural network      89.6%         78.9%            90.3%
Multilayer perceptron network     89.6%         75.1%            87.0%
Nearest neighbor                  87.6%         83.7%            94.4%

Figure 6. The accuracy of recognition results

4. Conclusion
Nowadays, the problem of English text recognition is not a new subject. In fact, many applications have been implemented using the multilayer perceptron network approach. For a difficult problem such as English handwriting recognition, the authors present a new approach based on the CNN model and compare it with some other methods in terms of recognition accuracy. Through the experimental work, the authors obtained training and testing results on the CNN that are more accurate than those of the other networks. In addition, one remarkable advantage of neural networks is the ability to change the number of layers in the network structure to improve accuracy. However, the recognition result based on the CNN is not really higher when compared with the nearest neighbor algorithm. On the other hand, the limitations of the CNN have not been completely resolved: if the patterns have complex shapes or the quality of the input data is not good, the recognition efficiency will decrease.
REFERENCES
[1] Gavin Hackeling, Mastering Machine Learning with scikit-learn, Packt Publishing, October 2014.
[2] Rodolfo Bonnin, Building Machine Learning Projects with TensorFlow, Packt Publishing, November 2016.
[3] Zhifei Zhang, "Derivation of Backpropagation in Convolutional Neural Network (CNN)", October 2016.
[4] Michael Nielsen, Neural Networks and Deep Learning, Determination Press, 2015.
[5] Deepak Sinwar and Rahul Kaushik, "Study of Euclidean and Manhattan Distance Metrics using Simple K-Means Clustering", International Journal for Research in Applied Science and Engineering Technology, Vol. 2, Issue V, May 2014.
Received: 11 January 2018
Revised: 22 January 2018
Accepted: 26 January 2018
