Chapter 8
A SURVEY OF ALGORITHMS FOR KEYWORD
SEARCH ON GRAPH DATA
Haixun Wang
Microsoft Research Asia
Beijing, China 100190

Charu C. Aggarwal
IBM T. J. Watson Research Center
Hawthorne, NY 10532

Abstract In this chapter, we survey methods that perform keyword search on graph data.
Keyword search provides a simple, user-friendly interface for retrieving
information from complicated data structures. Since many real-life datasets are
represented by trees and graphs, keyword search has become an attractive mechanism
for data of a variety of types. In this survey, we discuss methods of keyword
search on schema graphs, which are abstract representations of XML data and
relational data, and methods of keyword search on schema-free graphs. In our
discussion, we focus on three major challenges of keyword search on graphs.
First, what is the semantics of keyword search on graphs, or, what qualifies as
an answer to a keyword search; second, what constitutes a good answer, or, how
to rank the answers; third, how to perform keyword search efficiently. We also
discuss some unresolved challenges and propose some new research directions
on this topic.
Keywords: Keyword Search, Information Retrieval, Graph Structured Data,
Semi-Structured Data
C.C. Aggarwal and H. Wang (eds.), Managing and Mining Graph Data,
Advances in Database Systems 40, DOI 10.1007/978-1-4419-6045-0_8,
© Springer Science+Business Media, LLC 2010
1. Introduction
Keyword search is the de facto information retrieval mechanism for data on
the World Wide Web. It also proves to be an effective mechanism for querying
semi-structured and structured data, because of its user-friendly query inter-
face. In this survey, we focus on keyword search problems for XML documents
(semi-structured data), relational databases (structured data), and all kinds of
schema-free graph data.
Recently, query processing over graph-structured data has attracted increasing
attention, as a myriad of applications are driven by and produce
graph-structured data [14]. For example, in the Semantic Web, two major W3C
standards, RDF and OWL, conform to node-labeled and edge-labeled graph models.
In bioinformatics, many well-known projects, e.g., BioCyc, build
graph-structured databases. In social network analysis, much interest centers
on all kinds of personal interconnections. In other applications, raw data
might not appear graph-structured at first glance, but there are many implicit
connections among data items; restoring these connections often allows more
effective and intuitive querying. For example, a number of projects
[1, 18, 3, 26, 8] enable keyword search over relational databases. In personal
information management (PIM) systems [10, 5], objects such as emails,
documents, and photos are interwoven into a graph using manually or
automatically established connections among them. The list of examples of
graph-structured data goes on.
For data with relational and XML schema, specific query languages, such
as SQL and XQuery, have been developed for information retrieval. In order
to query such data, the user must master a complex query language and
understand the underlying data schema. In relational databases, information
about an object is often scattered across multiple tables due to normalization
considerations. In XML datasets, the schema is often complicated, and embedded
XML structures make it difficult to express queries that are forced to
traverse tree structures. Furthermore, many applications work on
graph-structured data with no obvious, well-structured schema, so information
retrieval based on query languages is not an option.
Both relational databases and XML databases can be viewed as graphs.
Specifically, XML datasets can be regarded as graphs when IDREF/ID links
are taken into consideration, and a relational database can be regarded as a data
graph that has tuples and keywords as nodes. In the data graph, for example,
two tuples are connected by an edge if they can be joined using a foreign key;
a tuple and a keyword are connected if the tuple contains the keyword. Thus,
traditional graph search algorithms, which extract features (e.g., paths [27],
frequent-patterns [30], sequences [20]) from graph data, and convert queries
into searches over feature spaces, can be used for such data.
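The data-graph view of a relational database described above can be sketched in a few lines of Python. This is only an illustrative sketch: the two tables, their columns, and the helper name `build_data_graph` are hypothetical and do not come from any of the surveyed systems. Tuples and keywords become nodes, a foreign-key join yields a tuple-tuple edge, and containment yields a tuple-keyword edge.

```python
from collections import defaultdict

# Hypothetical toy database: table -> {primary key: row}.
# Table names, columns, and values are assumptions made for illustration.
authors = {1: {"name": "Alice Smith"}, 2: {"name": "Bob Jones"}}
papers = {
    10: {"title": "Keyword search on graphs", "author_id": 1},
    11: {"title": "Graph indexing", "author_id": 2},
}


def build_data_graph(authors, papers):
    """Return an undirected adjacency map over tuple nodes and keyword nodes."""
    adj = defaultdict(set)

    def connect(u, v):
        adj[u].add(v)
        adj[v].add(u)

    # Tuple-keyword edges: a tuple is linked to every keyword it contains.
    for pk, row in authors.items():
        for word in row["name"].lower().split():
            connect(("author", pk), ("kw", word))
    for pk, row in papers.items():
        for word in row["title"].lower().split():
            connect(("paper", pk), ("kw", word))
        # Tuple-tuple edge: the two tuples can be joined on a foreign key.
        connect(("paper", pk), ("author", row["author_id"]))
    return adj


graph = build_data_graph(authors, papers)
```

In this toy graph, the keyword node `("kw", "graph")` is adjacent to the paper tuple `("paper", 11)`, which in turn is adjacent to the author tuple `("author", 2)` through the foreign key.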
However, traditional graph search methods usually focus more on the structure
of the graph than on its semantic content. In XML and relational data graphs,
nodes contain keywords, and sometimes nodes and edges are labeled. The problem
of keyword search requires us to determine a group of densely linked nodes in
the graph that satisfy a particular keyword-based query. Thus, the keyword
search problem makes use of both the content and the linkage structure. These
two sources of information reinforce each other and improve the overall
quality of the results. This makes keyword search a preferred information
retrieval method. Keyword search allows users to query the databases quickly,
with no need to know the schema of the respective databases. In addition,
keyword search can help discover unexpected answers that are often difficult
to obtain via rigid-format SQL queries. It is for these reasons that keyword
search over tree- and graph-structured data has attracted much attention
[1, 18, 3, 6, 13, 16, 2, 28, 21, 26, 24, 8].

Keyword search over graph data presents many challenges. The first question
we must answer is what constitutes an answer to a keyword query. For
information retrieval on the Web, answers are simply Web documents that
contain the keywords. In our case, the entire dataset is considered as a
single graph, so the algorithms must work at a finer granularity and decide
which subgraphs qualify as answers. Furthermore, since many subgraphs may
satisfy a query, we must design ranking strategies to find the top answers.
The definition of answers and the design of their ranking strategies must
match users' intent. For example, several papers [16, 2, 12, 26] adopt
IR-style answer-tree ranking strategies to enhance the semantics of answers.
Finally, a major challenge for keyword search over graph data is query
efficiency, which to a large extent hinges on the semantics of the query and
the ranking strategy. For instance, some ranking strategies score an answer by
the sum of edge weights. In this case, finding the top-ranked answer is
equivalent to the group Steiner tree problem [9], which is NP-hard. Thus,
finding the exact top $k$ answers is inherently difficult. To improve search
efficiency, many systems, such as BANKS [3], propose ways to reduce the search
space. As another example, BLINKS [14] avoids the inherent difficulty of the
group Steiner tree problem by proposing an alternative scoring mechanism,
which lowers complexity and enables effective indexing and pruning.
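To make the sum-of-edge-weights objective concrete, the following Python sketch exhaustively searches a tiny graph for the minimum-weight tree that touches at least one node from every keyword group, which is the group Steiner tree formulation mentioned above. It is not the method of BANKS, BLINKS, or any other surveyed system; the function names and the example graph are assumptions, and the running time is exponential in the number of edges, which is exactly why practical systems avoid this approach.

```python
from itertools import combinations


def is_tree(edges):
    """True if the non-empty edge set is connected and acyclic (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge would close a cycle
        parent[ru] = rv
    # Connected means every node ends up under the same root.
    return len({find(x) for x in list(parent)}) == 1


def best_answer_tree(weights, groups):
    """Brute-force minimum-weight tree touching every keyword group.

    weights: {(u, v): w} for undirected edges.
    groups:  list of node sets, one per keyword.
    Returns (cost, edges), or None if no tree covers all groups.
    """
    # A single node that contains every keyword is a zero-cost answer.
    if set.intersection(*groups):
        return 0, ()
    best = None
    edge_list = list(weights)
    for size in range(1, len(edge_list) + 1):
        for subset in combinations(edge_list, size):
            nodes = {n for edge in subset for n in edge}
            if not all(nodes & group for group in groups):
                continue  # some keyword is not covered
            if not is_tree(subset):
                continue
            cost = sum(weights[edge] for edge in subset)
            if best is None or cost < best[0]:
                best = (cost, subset)
    return best


# Hypothetical example: keyword x occurs in node a, keyword y in nodes c and d.
weights = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 5, ("c", "d"): 1}
groups = [{"a"}, {"c", "d"}]
answer = best_answer_tree(weights, groups)
```

Here the cheapest answer connects `a` to `c` through `b` at total weight 3, rather than using the direct edge of weight 5.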
Before we delve into the details of various keyword search problems for
graph data, we briefly summarize the scope of this survey chapter. We classify
the algorithms we survey into three categories based on the schema constraints
in the underlying graph data.
Keyword Search on XML Data:
Keyword search on XML data [11, 6, 13, 23, 25] is a simpler problem than
keyword search on schema-free graphs. XML data is essentially constrained to
tree structures, where each node has only a single incoming path. This
property provides great optimization opportunities [28]. Connectivity
information can also be efficiently encoded and indexed. For example, in
XRank [13], the Dewey inverted list is used to index paths so that a keyword
query can be evaluated without tree traversal.
Keyword Search over Relational Databases:
Keyword search on relational databases [1, 3, 18, 16, 26] has attracted
much interest. Conceptually, a database is viewed as a labeled graph
where tuples in different tables are treated as nodes connected via
foreign-key relationships. Note that a graph constructed this way usually
has a regular structure because the schema restricts node connections.
Unlike the graph-search approach of BANKS [3], DBXplorer [1] and
DISCOVER [18] construct join expressions and evaluate them, relying heavily
on the database schema and the query processing techniques of the RDBMS.
Keyword Search on Graphs:
A great deal of work on keyword querying of structured and semi-structured
data has been proposed in recent years. Well-known algorithms include
backward expanding search [3], bidirectional search [21], the dynamic
programming technique DPBF [8], and BLINKS [14]. Recently, work that extends
keyword search to graphs in external memory has also been proposed [7].
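The Dewey-based indexing mentioned above for XML data can be illustrated with a small Python sketch. It is written only in the spirit of a Dewey inverted list and is not XRank's actual data structure: the tree, the `(tag, text, children)` node layout, and the function names are assumptions. Each node is identified by the sequence of child positions on its path from the root, so the lowest common ancestor of two nodes is the longest common prefix of their identifiers and can be found without traversing the tree.

```python
from collections import defaultdict

# Hypothetical XML-like tree; each node is (tag, text, children).
tree = ("library", "", [
    ("book", "", [("title", "xml search", []), ("author", "wang", [])]),
    ("book", "", [("title", "graph search", []), ("author", "aggarwal", [])]),
])


def build_dewey_index(node, dewey=(0,), index=None):
    """Map each keyword to the Dewey IDs of the nodes whose text contains it."""
    if index is None:
        index = defaultdict(list)
    _tag, text, children = node
    for word in text.split():
        index[word].append(dewey)
    # A child's ID extends the parent's ID with the child's position.
    for position, child in enumerate(children):
        build_dewey_index(child, dewey + (position,), index)
    return index


def common_prefix(a, b):
    """Dewey ID of the lowest common ancestor: the longest shared prefix."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)


index = build_dewey_index(tree)
same_book = common_prefix(index["xml"][0], index["wang"][0])
different_books = common_prefix(index["xml"][0], index["aggarwal"][0])
```

In this example `same_book` identifies the first `book` element, while `different_books` identifies the root, because the two keyword occurrences only meet at the `library` element.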
The rest of this chapter is organized as follows. We first discuss keyword
search methods for schema graphs. In Section 2, we focus on keyword search
for XML data, and in Section 3, we focus on keyword search for relational
data. In Section 4, we introduce several algorithms for keyword search on
schema-free graphs. Section 5 contains a discussion of future directions and
the conclusion.
2. Keyword Search on XML Data
Sophisticated query languages such as XQuery have been developed for
querying XML documents. Although XQuery can express many queries precisely
and effectively, it is by no means a user-friendly interface for accessing
XML data: users must master a complex query language, and in order to use
it, they must fully understand the schema of the underlying XML data.
Keyword search, on the other hand, offers a simple and user-friendly
interface. Furthermore, the tree structure of XML data lends natural
semantics to the query and enables efficient query processing.
2.1 Query Semantics
In the most basic form, as in XRank [13] and many other systems, a keyword
search query consists of $n$ keywords: $Q = \{k_1, \cdots, k_n\}$. XSEarch [6]
extends the syntax to allow users to specify which keywords must appear in a
satisfying document, and which may or may not appear (although the appearance
of such keywords is desirable, as indicated by the ranking function).
Syntax aside, one important question is, what qualifies as an answer to a
keyword search query? In information retrieval, we simply return documents
that contain all the keywords. For keyword search on an XML document, we
want to return meaningful snippets of the document that contain the keywords.
One interpretation of meaningful is to find the smallest subtrees that contain
all the keywords.
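[Figure 8.1: an XML tree with nodes A, B, C, and D and keyword occurrences x and y, annotated with the exclusive LCA node and the minimal LCA node.]
Figure 8.1. Query Semantics for Keyword Search $Q = \{x, y\}$ on XML Data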
Specifically, for each keyword $k_i$, let $L_i$ be the list of nodes in the
XML document that contain keyword $k_i$. Clearly, subtrees formed by at least
one node from each $L_i$, $i = 1, \cdots, n$, contain all the keywords. Thus,
an answer to the query can be represented by $\mathit{lca}(n_1, \cdots, n_n)$,
the lowest common ancestor (LCA) of nodes $n_1, \cdots, n_n$ where
$n_i \in L_i$. In other words, answering the query is equivalent to finding:
\[
\mathit{LCA}(k_1, \cdots, k_n) = \{ \mathit{lca}(n_1, \cdots, n_n) \mid n_1 \in L_1, \cdots, n_n \in L_n \}
\]
Moreover, we are only interested in the "smallest" answer, that is,
\[
\mathit{SLCA}(k_1, \cdots, k_n) = \{ v \mid v \in \mathit{LCA}(k_1, \cdots, k_n) \wedge \forall v' \in \mathit{LCA}(k_1, \cdots, k_n),\ v \nprec v' \} \qquad (8.1)
\]
where $\prec$ denotes the ancestor relationship between two nodes in an XML
document. As an example, in Figure 8.1, we assume the keyword query
is $Q = \{x, y\}$. We have $C \in \mathit{SLCA}(x, y)$, while
$A \in \mathit{LCA}(x, y)$ but $A \notin \mathit{SLCA}(x, y)$.
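The two definitions can be checked directly on a small example. The Python sketch below evaluates the LCA set and Equation 8.1 by brute force over all combinations of keyword nodes. The tree is a hypothetical one, chosen only to agree with the statements above about nodes A and C; it is not a transcription of Figure 8.1, and the node names `x1`, `x2`, `y1`, `y2` and all function names are assumptions. The algorithms cited in the text are far more efficient than this enumeration.

```python
from itertools import product

# Hypothetical tree given as child -> parent (None marks the root).
parent = {
    "A": None, "x1": "A", "B": "A", "C": "B", "D": "B",
    "x2": "C", "y1": "C", "y2": "D",
}
# Keyword lists L_i: the nodes that contain each keyword.
keyword_nodes = {"x": ["x1", "x2"], "y": ["y1", "y2"]}


def ancestors_or_self(v):
    """Path from v up to the root, starting with v itself."""
    path = []
    while v is not None:
        path.append(v)
        v = parent[v]
    return path


def lca(nodes):
    """Lowest common ancestor of a non-empty tuple of nodes."""
    common = set(ancestors_or_self(nodes[0]))
    for n in nodes[1:]:
        common &= set(ancestors_or_self(n))
    # Walking up from the first node, the first common node is the deepest.
    return next(a for a in ancestors_or_self(nodes[0]) if a in common)


def lca_set(keywords):
    """LCA(k_1, ..., k_n): one lca per choice of a node from each L_i."""
    lists = [keyword_nodes[k] for k in keywords]
    return {lca(combo) for combo in product(*lists)}


def is_ancestor(u, v):
    """True if u is a proper ancestor of v (u ≺ v)."""
    return u != v and u in ancestors_or_self(v)


def slca_set(keywords):
    """SLCA(k_1, ..., k_n): LCA nodes that are not ancestors of other LCA nodes."""
    candidates = lca_set(keywords)
    return {v for v in candidates
            if not any(is_ancestor(v, w) for w in candidates)}


all_lcas = lca_set(["x", "y"])
smallest = slca_set(["x", "y"])
```

On this tree, `all_lcas` contains A, B, and C, while `smallest` contains only C, since A and B are both ancestors of another LCA node.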
Several algorithms including [28, 17, 29] are based on the SLCA semantics.
However, SLCA is by no means the only meaningful semantics for keyword
