
F. Li et al. (Eds.): ICWL 2008, LNCS 5145, pp. 344–355, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Automated Chinese Handwriting Error Detection Using
Attributed Relational Graph Matching
Zhihui Hu (1,2,3), Howard Leung (2,3), and Yun Xu (1,2)

(1) Department of Computer Science and Technology,
University of Science & Technology of China, Hefei, China
(2) Joint Research Lab of Excellence, CityU-USTC Advanced Research Institute,
Suzhou, China
(3) Department of Computer Science, City University of Hong Kong, Hong Kong S.A.R.
Abstract. Due to the complex shapes and various writing styles of Chinese
characters, it is a challenge to automatically detect the errors in people's
handwriting. In this paper, we use an attributed relational graph to represent a
Chinese character. To model the spatial relationships between the strokes in a
Chinese character, a refined interval relationship that considers more granular
levels is proposed. A novel interval neighborhood graph is also proposed to compute
the distances among the refined interval relationships. Error-tolerant graph
matching is used to locate stroke production errors, stroke sequence errors and
spatial relationship errors. We also propose a pruning strategy in order to speed
up the graph matching. Experimental results show that our proposed method
outperforms existing approaches in terms of accuracy as well as its ability to
handle more kinds of handwriting errors in less computational time.
Keywords: Chinese handwriting error detection, attributed relational graph,
stroke spatial relationship error, error-tolerant graph matching.
1 Introduction
A Chinese character is an ideogram composed of many strokes. Correct handwriting
should follow the correct position, proportion and order of each stroke. Law et al.
[1] show that children often make the following handwriting errors: 1) stroke
production errors, which include missing, extra, broken, and concatenated strokes;
2) stroke sequence errors. In addition, there are other handwriting errors such as
spatial relationship errors, which result from problems in the relative length or
position of strokes. When students make a handwriting mistake, they often do not
even realize it. It is thus essential for students to receive feedback about their
handwriting in order to correct any mistakes.
Traditionally, the teacher helps students find their handwriting errors in class;
however, the teacher's available time for each student is limited. As a result, we
are motivated to build a Chinese handwriting education system that assists the
teacher and remains available when the teacher is absent. In this system, a student
first writes a Chinese character by following a template character from the teacher;
the system then automatically checks the handwriting and gives feedback to indicate
whether and where there are any errors.
The existing handwriting education systems can be divided into two categories.
The first is the view-only system: students can see how a Chinese character should
be written, but they cannot practice handwriting through the system [2, 3]. The
other category allows students to practice handwriting and gives some feedback to
indicate whether there are errors in their handwriting. These systems can be further
divided into four main streams. The first focuses on locating the stroke production
errors [4, 5, 6]. The second can only evaluate the stroke sequence errors [7]. The
third can detect the spatial relationship errors among strokes [8]. The last is a
combination of the previous types: the system in [9] can find both the stroke
production and sequence errors, but it does not consider the spatial relationship
errors. As a result, we are motivated to explore a method that can identify the
stroke sequence, production and spatial relationship errors at the same time.
In this paper, we propose a method that, given an input online Chinese handwriting,
can identify not only the stroke production errors and sequence errors but also the
spatial relationship errors between strokes. This is achieved by using attributed
relational graph (ARG) matching. The attributed relational graph is a powerful tool
for representing the relational structure of a pattern. It has been used in 2D
recognition [10, 11] as well as in Chinese handwriting education [6]. In our
application, the Chinese character is represented by a complete ARG. The nodes in
the ARG describe the strokes of the character and the edges denote the relations
between any two strokes. As the relations between the strokes in Chinese characters
are rather complex, we propose to extend the existing interval relationship to
refine its granularity. The optimal detailed matching between the two ARGs is the
mapping between corresponding strokes. In order to find this detailed matching,
error-tolerant graph matching [13, 14] is used with the graph edit operations:
deletion, insertion, substitution, merging and splitting of the nodes and the
edges. The A* algorithm is applied to perform the state-space search of such a
graph matching. The resulting node operations reflect the graph distortions, while
the edge operations show the spatial relationships between strokes. However, the
computational complexity of graph matching cannot be ignored, so we propose a
pruning strategy to reduce the matching time.
The main contributions of this paper are as follows: 1) we propose an algorithm that
can analyze an input online Chinese handwriting and determine stroke production
errors, stroke sequence errors and stroke spatial relationship errors at the same
time; 2) we define a refined interval relationship to model the spatial relationship
between strokes and extend the interval neighborhood graph to obtain the distance
measures for the refined interval relationships; 3) we propose a pruning strategy in
order to reduce the state-space search time when we apply the error-tolerant graph
matching. The remainder of this paper is organized as follows: In Section 2, the
proposed ARG matching method incorporating the spatial relationships is described.
Experiments and results are discussed in Section 3. Conclusions and future work are
provided in Section 4.
2 Our Proposed Method
2.1 Overview
The flowchart of our method is illustrated in Figure 1. First, the sample handwriting
input by the student and the template character that the student should follow are
both represented as ARGs. Then error-tolerant graph matching is applied to the two
ARGs in order to find the stroke production and sequence errors in the sample
handwriting. Afterwards, post-processing detects the stroke relationship errors.
Finally, feedback that locates all the errors is provided to the student.
[Figure 1: flowchart. Sample handwriting and Template handwriting each pass through
Representation, followed by Character matching, Post processing and Feedback.]

Fig. 1. Flowchart of our method
2.2 Spatial Relationship in Chinese Character
A Chinese character consists of many strokes that form a particular structure unique
to that Chinese character. The spatial relationship between strokes is one important
factor in determining whether a student's Chinese handwriting is written correctly.
In object recognition, the spatial relationship between objects has been studied.
Allen first introduced 13 interval relationships in [15], and the spatial
relationships between objects have been described in [16, 17]. Nevertheless, these
interval relationships are not sufficient to fully describe the spatial relationship
between strokes. This can be illustrated by the example in Figure 2. The strokes in
Figure 2(a), (b) and (c) all have the same 'during' (d) relation as defined in
Allen's interval relationships, meaning that the interval spanned by stroke a lies
within that of stroke b. However, only Figure 2(b) shows the standard handwriting of
this character. The handwritings in Figure 2(a) and (c) are non-standard because
stroke a in Figure 2(a) is too long whereas the one in Figure 2(c) is too short.
(a) Non-standard handwriting (b) Standard handwriting (c) Non-standard handwriting

Fig. 2. Example of spatial relationships in Chinese character
As illustrated in Figure 2, the relationship between the strokes involves not only
the topological relationship but also the relative distance between the strokes.
A more granular definition of the interval relationship is able to distinguish among
the three cases in Figure 2. In particular, we propose to further refine the interval
relationship into three levels (f, m, l) by considering the distance information. The
refined interval relationships of the strokes in Figure 2(a), (b) and (c) become
'dl', 'dm' and 'df' respectively. The refinement into three additional levels based
on the distance can also be applied to the other existing interval relationships. The
resulting refined relationships are summarized in Figure 3.
[Figure 3: table with columns Relation, Symbol, Symbol for inverse, Example]

Fig. 3. Refined interval relationships with more granular levels
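As background for the base relations that are refined above, the following is a minimal Python sketch of classifying Allen's 13 interval relationships for the projections of two strokes on one axis. The symbols follow the standard Allen notation and are assumed, not confirmed, to match Figure 3. The function names are hypothetical, strokes whose projection has zero length are not handled, and the distance-based levels (f, m, l) are left out because their thresholds are not given in this section.

```python
def allen_relation(a, b):
    """Return the Allen relation symbol of interval a with respect to interval b.

    a and b are (start, end) pairs with start < end.
    """
    a0, a1 = a
    b0, b1 = b
    if a0 >= a1 or b0 >= b1:
        raise ValueError("intervals must have positive length")
    if a1 < b0:
        return "<"    # a lies entirely before b
    if b1 < a0:
        return ">"    # a lies entirely after b
    if a1 == b0:
        return "m"    # a meets b
    if b1 == a0:
        return "mi"   # a is met by b
    if a0 == b0 and a1 == b1:
        return "="    # a equals b
    if a0 == b0:
        return "s" if a1 < b1 else "si"   # a starts b / is started by b
    if a1 == b1:
        return "f" if a0 > b0 else "fi"   # a finishes b / is finished by b
    if b0 < a0 and a1 < b1:
        return "d"    # a during b
    if a0 < b0 and b1 < a1:
        return "di"   # a contains b
    return "o" if a0 < b0 else "oi"       # a overlaps b / is overlapped by b


def axis_interval(points, axis):
    """Project a stroke, given as (x, y) points, onto axis 0 (x) or axis 1 (y)."""
    values = [p[axis] for p in points]
    return (min(values), max(values))


# Made-up coordinates: a short stroke a drawn inside the horizontal extent of b.
stroke_a = [(3.0, 1.0), (5.0, 2.0)]
stroke_b = [(1.0, 4.0), (8.0, 5.0)]
x_relation = allen_relation(axis_interval(stroke_a, 0), axis_interval(stroke_b, 0))
```

With these coordinates the x-axis projection of stroke a lies strictly inside that of stroke b, which is the 'during' case discussed for Figure 2.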
2.3 Complete ARG Representation of Chinese Character
The ARG was first described in [10] to represent the structural information of a
pattern as $g = (V, E, \alpha, \beta)$. In our application, the set of nodes $V$
describes the strokes of the Chinese character, and the set of edges $E$ describes
the relationships between any two strokes as defined in Figure 3. The ARG
representation is given as follows.
Nodes in the ARG. Each node stores the x and y coordinates of a stroke. The node
labeling function $\alpha : V \rightarrow L_V$ returns n data points for each
stroke [6].
Edges in the ARG. Each edge stores the relation of the two nodes (strokes) which are
connected by this edge. The edge labeling function $\beta : E \rightarrow L_E$
returns $(\mu, \lambda)$, where $\mu$ and $\lambda$ are the refined interval
relationships along the x-axis and y-axis respectively.
As an example, a Chinese character and its stroke spatial relationships are shown in
Figure 4(a). The ARG representation of this character is shown in Figure 4(b). The
strokes a, b and c in the character are represented by the nodes a, b and c in the
ARG. The term $r_{s_1 s_2}$ is the relationship between strokes $s_1$ and $s_2$,
where $s_1, s_2 \in \{a, b, c\}$ and $s_1 \neq s_2$.
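The representation described above can be sketched as a small Python data structure. This is an illustrative layout under our own assumptions, not the implementation used in the paper: the class name `ARG` and its methods are hypothetical, and the stroke coordinates are placeholders. Only the edge labels are taken from the example in Figure 4.

```python
from dataclasses import dataclass, field
from itertools import permutations


@dataclass
class ARG:
    """Complete attributed relational graph for one character (illustrative)."""
    nodes: dict = field(default_factory=dict)  # stroke name -> list of (x, y) points
    edges: dict = field(default_factory=dict)  # (s1, s2) -> (mu, lam)

    def add_stroke(self, name, points):
        self.nodes[name] = list(points)

    def set_relation(self, s1, s2, mu, lam):
        """Store the refined x-axis (mu) and y-axis (lam) relation of s1 to s2."""
        if s1 == s2:
            raise ValueError("an edge must connect two different strokes")
        if s1 not in self.nodes or s2 not in self.nodes:
            raise KeyError("both strokes must be added before relating them")
        self.edges[(s1, s2)] = (mu, lam)

    def is_complete(self):
        """True when every ordered pair of distinct strokes has an edge label."""
        return all(pair in self.edges for pair in permutations(self.nodes, 2))


arg = ARG()
for name in ("a", "b", "c"):
    arg.add_stroke(name, [(0.0, 0.0), (1.0, 1.0)])  # placeholder coordinates

# Edge labels as listed for Figure 4.
arg.set_relation("a", "c", "df", "mi")
arg.set_relation("c", "a", "dif", "m")
arg.set_relation("a", "b", "df", "dif")
arg.set_relation("b", "a", "dif", "df")
arg.set_relation("b", "c", "dm", ">m")
arg.set_relation("c", "b", "dim", "<m")
```

Storing both directions of every pair keeps the graph complete in the sense used here, at the cost of some redundancy, since each reverse edge can be derived from the forward one.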

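[Figure 4: (a) a Chinese character with strokes a, b and c; (b) the corresponding ARG
with nodes a, b and c and edge labels $r_{ac}$: (df, mi); $r_{ca}$: (dif, m);
$r_{ab}$: (df, dif); $r_{ba}$: (dif, df); $r_{bc}$: (dm, >m); $r_{cb}$: (dim, <m)]

Fig. 4. ARG representation of a Chinese character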
In this example, $r_{ac}$ is denoted by (df, mi), $r_{ab}$ is denoted by (df, dif),
and $r_{bc}$ is denoted by (dm, >m). Note that $r_{ca}$ is formed simply by taking
the inverse of each component of the relationship used to represent $r_{ac}$ and is
denoted by (dif, m).
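The inversion rule can be illustrated with a short Python sketch. Two points are assumptions on our part: the base symbols follow the standard Allen notation, and the level (f, m, l) is carried over unchanged when a relation is inverted, which agrees with the labels of Figure 4. Each relation is stored as a hypothetical (base, level) pair rather than as a string, so that 'm' (meets) cannot be confused with the level m.

```python
# Inverse of each base Allen symbol (standard notation, assumed to match Figure 3).
BASE_INVERSE = {
    "<": ">", ">": "<",
    "m": "mi", "mi": "m",
    "o": "oi", "oi": "o",
    "s": "si", "si": "s",
    "d": "di", "di": "d",
    "f": "fi", "fi": "f",
    "=": "=",
}


def inverse_relation(rel):
    """Invert one refined relation given as (base, level); level may be None."""
    base, level = rel
    return (BASE_INVERSE[base], level)


def inverse_edge_label(label):
    """Invert both the x-axis and the y-axis component of an edge label."""
    mu, lam = label
    return (inverse_relation(mu), inverse_relation(lam))


def show(rel):
    """Format (base, level) in the compact notation of the text, e.g. 'dif'."""
    base, level = rel
    return base + (level or "")


r_ac = (("d", "f"), ("mi", None))   # written (df, mi) in the text
r_ca = inverse_edge_label(r_ac)
print(tuple(show(r) for r in r_ca))  # prints ('dif', 'm')
```

Applying the same function to (dm, >m) gives (dim, <m), and applying it to (df, dif) gives (dif, df), matching the remaining edge labels of the example.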
2.4 Error-Tolerant Graph Matching
As illustrated in Figure 1, the input (sample) handwriting is represented as an ARG
$g_1 = (V_1, E_1, \alpha_1, \beta_1)$ and the template handwriting is represented as
another ARG $g_2 = (V_2, E_2, \alpha_2, \beta_2)$. In order to decide whether the two
ARGs differ, we find an error-tolerant graph matching from $g_1$ to $g_2$, which is a
transformation denoted by the function f [13, 14]. This function f consists of many
edit operations performed on both nodes and edges. The node operations have been
defined by the authors in [6] as node substitution, merging, splitting, deletion and
insertion. We extend the work in [6] by adding the edge operations defined as
follows: 1) edge substitution, implying that both nodes sharing this edge are
correct; 2) edge deletion, implying that one or both of the nodes sharing this edge
are extra or broken strokes; 3) edge insertion, implying that one or both of the
nodes sharing this edge are missing or concatenated strokes.
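To make the idea of an edit-operation search concrete, the following Python sketch runs A* over node operations only. It is a simplified illustration under several assumptions and not the matching procedure of this paper: merging, splitting and all edge operations are left out, the stroke cost (mean point-to-point distance) and the constant deletion and insertion costs are hypothetical, and the pruning strategy is not modelled.

```python
import heapq
from itertools import count
from math import dist


def stroke_cost(p, q):
    """Mean point-to-point distance between two strokes with equal point counts."""
    return sum(dist(u, v) for u, v in zip(p, q)) / len(p)


def match_strokes(sample, template, del_cost=1.0, ins_cost=1.0):
    """Return (total cost, operations) of a cheapest node-level edit sequence."""
    n, m = len(sample), len(template)

    def h(i, used):
        # Admissible bound: surplus strokes on either side must be deleted/inserted.
        rs, rt = n - i, m - len(used)
        return max(0, rs - rt) * del_cost + max(0, rt - rs) * ins_cost

    tie = count()  # tie-breaker so the heap never compares states directly
    start = (0, frozenset())
    heap = [(h(*start), next(tie), 0.0, start, ())]
    best = {start: 0.0}
    while heap:
        _, _, g, (i, used), ops = heapq.heappop(heap)
        if g > best.get((i, used), float("inf")):
            continue  # a cheaper path to this state was found later
        if i == n:
            # All sample strokes handled: unmatched template strokes are missing.
            missing = [t for t in range(m) if t not in used]
            return g + ins_cost * len(missing), list(ops) + [("insert", t) for t in missing]
        moves = [(del_cost, used, ("delete", i))]
        for t in range(m):
            if t not in used:
                cost = stroke_cost(sample[i], template[t])
                moves.append((cost, used | {t}, ("substitute", i, t)))
        for c, new_used, op in moves:
            state = (i + 1, new_used)
            if g + c < best.get(state, float("inf")):
                best[state] = g + c
                heapq.heappush(heap, (g + c + h(*state), next(tie), g + c, state, ops + (op,)))
    return float("inf"), []


# Made-up strokes: the template has three strokes; the sample omits the second
# one and writes the remaining two in the wrong order.
template = [[(0, 0), (10, 0)], [(0, 5), (10, 5)], [(5, -2), (5, 8)]]
sample = [[(5, -2), (5, 9)], [(0, 0), (10, 1)]]
total, ops = match_strokes(sample, template)
```

In the returned operations, a substitution that pairs sample stroke 0 with template stroke 2 exposes the out-of-order writing, and the insertion of template stroke 1 marks the missing stroke.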

Edge substitution. The cost of the edge substitution is the matching cost between
an edge in the sample character and an edge in the template. We use $Rt$ to denote
the set of edges in the template and $Rs$ to denote the set of edges in the sample.
Note that an edge represents the spatial relationship between two strokes in a
handwriting. The i-th template edge $Rt_i$ can be denoted by
$(\mu t_i, \lambda t_i)$ and the j-th sample edge $Rs_j$ can be denoted by
$(\mu s_j, \lambda s_j)$. The dissimilarity between $(\mu t_i, \lambda t_i)$ and
$(\mu s_j, \lambda s_j)$ is defined as $D(Rt_i, Rs_j)$, which is derived from the
idea of the interval neighborhood graph [16]. Two interval relationships are
neighbors if they can be transformed into one another by continuous deformation
(shortening, lengthening, and moving) [17]. We construct a new interval neighborhood
graph in Figure 5 which considers our proposed refined relationship with three levels
(f, m, l) in each relationship defined in Figure 3. Note that the three levels of the
same interval relationship are close to each other in the refined interval
neighborhood graph, since they can be transformed from one to another by shortening
or lengthening the distance between the two strokes.
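One plausible way to turn a neighborhood graph into a distance is to count the hops on the shortest path between two relationships. The Python sketch below does this with breadth-first search. The small graph is a hypothetical fragment chosen for illustration and is not the graph of Figure 5, and adding the x-axis and y-axis distances to obtain the edge dissimilarity is likewise our assumption.

```python
from collections import deque

# Hypothetical fragment of a refined neighborhood graph (not Figure 5).
NEIGHBORS = {
    "df": ["dm"],
    "dm": ["df", "dl"],
    "dl": ["dm", "s", "f"],
    "s": ["dl"],
    "f": ["dl"],
}


def relation_distance(r1, r2, graph=NEIGHBORS):
    """Number of hops on the shortest path between two relationships."""
    if r1 not in graph or r2 not in graph:
        raise KeyError("relationship not present in the graph")
    seen = {r1}
    queue = deque([(r1, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == r2:
            return hops
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    raise ValueError("the two relationships are not connected")


def edge_dissimilarity(rt, rs, graph=NEIGHBORS):
    """Assumed combination rule: sum of the x-axis and y-axis distances."""
    return relation_distance(rt[0], rs[0], graph) + relation_distance(rt[1], rs[1], graph)
```

In this fragment the levels of 'during' form a chain, so a template edge (dm, df) and a sample edge (dl, df) differ by one hop on the x-axis and none on the y-axis.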
