570 MANAGING AND MINING GRAPH DATA
from the perspective of mining a single (large) network in the presence
of noise and uncertainty.
Both data mining and bioinformatics are young and vibrant fields, and
thus there are ample opportunities for interesting lines of future research
at their intersection. Keeping to the theme of this article (graph mining in
bioinformatics), we list several such opportunities below. The list is by no
means comprehensive, but it highlights some of the opportunities that
researchers may pursue.
Scalable algorithms for analyzing time-varying networks: A large majority
of the work to date in this field has focused on the analysis of static
networks. While there have been some recent efforts to analyze dynamic
biological networks, research in this area is in its infancy. With
anticipated advances in technology, much more temporal data is likely to
become available, and temporal analysis of such networks is likely to be an
important area of future research. Underpinning this effort, given the
size and dynamics of the data involved, is the need to develop scalable
algorithms for processing and analyzing such data.
Discovering anomalous structures in graph data: Most of the work to date
has focused on the discovery of frequent or modular structures within such
data, yet the discovery of anomalous substructures often has a crucial role
to play in such domains. Defining what constitutes an anomaly, and computing
it efficiently while leveraging the ambient knowledge of the domain in
question, are some of the challenges to be addressed.
Integrating data from multiple, possibly conflicting sources: A fundamental
challenge in bioinformatics in general is that of data integration.
Data is available in many formats, and the sources often conflict. For
example, protein interaction data produced by various methods (mass
spectrometry, yeast two-hybrid, in silico) are often in conflict.
Research into methods that are capable of resolving such conflicts while
still discovering useful patterns is needed.
Incorporating domain information: It has been our observation that we, as
data mining researchers, often under-utilize available domain information.
This may arise out of ignorance (the field of bioinformatics is vast), or
the information may simply be omitted from the training phase so that it can
be used to confirm the utility of the proposed methods (to maintain the
sanctity of the validation procedure). We believe that a fresh look is needed
at how domain knowledge can be embedded in existing approaches, and at better
validation methodologies developed in close conjunction with domain experts.
A Survey of Graph Mining Techniques for Biological Datasets 571
Uncertainty-aware and noise-tolerant methods: While this has certainly
been an active area of research in the bioinformatics community in general,
and in graph mining for bioinformatics in particular, there are still many
open problems here. Incorporating uncertainty is necessarily a
domain-dependent issue, and probabilistic approaches offer exciting
possibilities. Additionally, effectively leveraging topological, relational,
and other semantic characteristics of the data is an interesting
topic for future research. A related challenge is to model trust and
provenance information.
Ranking and summarizing harvested patterns: While ranking and summarizing
patterns has been the subject of much research in the data mining and
network science communities, the role of such methods in bioinformatics
has been much less researched. We expect this to be a very important and
active area of research, especially since evaluating and validating
discovered patterns can often be an expensive and time-consuming process.
In this context, research into ranking algorithms for bioinformatics that
leverage domain knowledge, and into mechanisms for summarizing harvested
patterns, is an exciting opportunity.
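
As a purely illustrative sketch of the integration and uncertainty items above, one simple probabilistic baseline is a noisy-OR combination: each source that reports an interaction is assumed to be correct with some reliability, and the interaction is treated as absent only if every reporting source is wrong. This is not a method proposed in this chapter. The function name, the source labels, and the reliability values below are hypothetical, and the sketch assumes that sources err independently and that only positive reports are available (it does not model explicit negative evidence).

```python
from collections import defaultdict


def combine_edge_evidence(observations, reliability):
    """Combine per-source interaction reports into one confidence per edge.

    observations: iterable of (source, node_u, node_v) positive reports.
    reliability:  dict mapping source -> assumed probability that a report
                  from that source is correct (hypothetical values).
    Returns a dict mapping frozenset({u, v}) -> noisy-OR confidence.
    """
    # Probability that every source reporting the edge is wrong.
    all_wrong = defaultdict(lambda: 1.0)
    for source, u, v in observations:
        edge = frozenset((u, v))  # undirected: (u, v) and (v, u) coincide
        all_wrong[edge] *= 1.0 - reliability[source]
    return {edge: 1.0 - p for edge, p in all_wrong.items()}


# Hypothetical reliabilities and reports, for illustration only.
reliability = {"mass_spec": 0.7, "y2h": 0.5, "in_silico": 0.3}
observations = [
    ("mass_spec", "P1", "P2"),
    ("y2h", "P2", "P1"),        # same interaction, reported by a second source
    ("in_silico", "P2", "P3"),
]

confidence = combine_edge_evidence(observations, reliability)

# Rank interactions by combined confidence, highest first.
ranked = sorted(confidence.items(), key=lambda kv: kv[1], reverse=True)
for edge, score in ranked:
    print(sorted(edge), round(score, 3))
```

The resulting confidences could serve as edge weights for a downstream graph mining step, or as a first-pass ranking of interactions for expert validation. The independence assumption is a strong one, and a realistic treatment would need domain-specific error models.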
