REVIEW ARTICLE
From functional genomics to systems biology
Meeting report based on the presentations at the 3rd EMBL Biennial
Symposium 2006 (Heidelberg, Germany)
Sergii Ivakhno
Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, UK
Keywords
DNA microarray medical applications; functional genomics; genetic
interaction networks; networks biology; signalling networks; systems biology
Correspondence
S. Ivakhno, Institute for Adaptive and Neural Computation, School of
Informatics, University of Edinburgh, E4, 5 Forrest Hill,
Edinburgh EH1 2QL, UK
Fax: +44 (0) 131 6506899
Tel: +44 (0) 131 6676000, ext. 0131 6684266
E-mail:
(Received 30 January 2007, revised 1 March 2007, accepted 12 March 2007)
doi:10.1111/j.1742-4658.2007.05794.x
This review discusses the talks presented at the third EMBL Biennial
Symposium, From functional genomics to systems biology, held in Heidelberg,
Germany, 14–17 October 2006. Current issues and trends in various subfields
of functional genomics and systems biology are considered, including
analysis of regulatory elements, signalling networks, transcription networks,
protein–protein interaction networks, genetic interaction networks, medical
applications of DNA microarrays, and metagenomics. Several technological
advances in the fields of DNA microarrays, identification of regulatory
elements in the genomes of higher eukaryotes, and MS for detection of
protein interactions are introduced. Major directions of future systems
biology research are also discussed.
Abbreviations
RNAi, RNA interference; SGA, synthetic genetic array; TF, transcription factor; Y1H, yeast one-hybrid; Y2H, yeast two-hybrid.
FEBS Journal 274 (2007) 2439–2448 © 2007 The Author Journal compilation © 2007 FEBS 2439
Introduction
The third EMBL Biennial Symposium, From functional genomics to systems
biology, was held in Heidelberg, Germany, 14–17 October 2006. The title of
the conference clearly states the major challenges and issues that were
addressed by the speakers – how to combine different ‘omics’ technologies
and bioinformatics/computational methodologies to address increasingly
complex biological questions. The main conference was divided into five
separate sessions, which discussed different functional genomic approaches
in systems biology: (a) A global view of transcriptional regulation,
(b) Genomics of development and disease, (c) Protein–protein interaction
networks and beyond, (d) Towards functional interaction networks,
(e) Systems level analysis: from organisms to communities.
Table 1 gives a broad overview of topics presented at the meeting according
to the systems biology applications, types of high-throughput techniques,
and biological networks. From the sheer number of various high-throughput
genomic approaches described at the meeting, it becomes clear that
‘postgenome’ science has already entered the most exciting period of
analyzing biological functions at the systems-wide level. Chromatin
immunoprecipitation arrays (chip-on-chip), tiling arrays, DNA microarrays,
synthetic genetic arrays, high-content fluorescent microscopy, protein
microarrays, RNA interference (RNAi) screens, and high-throughput
metagenomic sequencing are some of the technologies discussed by the
speakers. Computational methods and algorithms were also an integral part
of the conference, with various systems biology applications of machine
learning, algorithmic network theory, differential equation modelling, and
simulation being introduced. In the following, I will discuss some of the
talks representing different areas of functional genomics, networks and
systems biology.
Analysis of regulatory elements in the
genomes of higher eukaryotes
The first session of the conference began with a talk
by E. Birney from the European Bioinformatics Insti-
tute (Hinxton, Cambridgeshire, UK). Birney described
recent efforts of the ENCODE project (Encyclopaedia
of DNA Elements), a multi-institutional collaboration
supported by NIH and the Wellcome Trust that
attempts to map all functional elements in the human
genome: promoters, enhancers, repressors ⁄ silencers,
exons, origins of replication, sites of replication ter-
mination, transcription factor (TF)-binding sites,
methylation sites, deoxyribonuclease I-hypersensitive
sites, chromatin modifications, and multispecies conserved
sequences of as yet unknown function [1]. The pilot phase,
which began in September 2003, is less
ambitious and targets 44 uniformly distributed regions
that comprise 1% of the genome. Birney’s talk empha-
sized the problem of mapping TF-binding sites and
other elements that regulate transcription. Standardiza-
tion of the protocols and comparison of different tech-
niques was one of the major challenges encountered in
the pilot phase. Another big problem concerned the
annotation of the transcription regulatory elements. In
contrast with genomes of simple eukaryotes such as
yeast, in which the regulatory elements occur upstream
of the genes that they regulate, in the human genome
they are widely dispersed and occur between and
within introns, making them very hard to map. This
will probably be the next big challenge for computa-
tional biologists, who need to develop new algorithms
for detecting regulatory elements with varying position
in the genome. Comparative genomic approaches
previously gave the best results for finding regulatory
elements in eukaryotes [2]; however, additional devel-
opments will be required to detect elements dispersed
throughout the genome.
L. Steinmetz from EMBL (Heidelberg, Germany) described the application of
tiling arrays for detection of new transcripts and refinement of boundary,
structure, and expression level of coding and noncoding transcripts in the
yeast genome [3]. Although the concept of using tiling arrays and gene
expression to find functional transcribed elements is not new (a review of
the topic can be found in [4]), Steinmetz’s group and collaborators
developed a new and more sensitive oligonucleotide array which contains
6.5 million probes and interrogates both strands of the full genomic
sequence (accomplishing 8 nucleotide resolution for double-stranded
targets). Significant expression above background was detected for 5104
ORFs (90%) during exponential growth in rich medium. Remarkably, 16% of the
transcribed base pairs had not been annotated before, which is rather
surprising considering more than 10 years of intensive analysis of the
yeast genome.
Table 1. Overview of the topics covered in the meeting report according to systems biology applications, types of high-throughput techniques, and biological networks.
Type of biological networks / area of functional genomics | High-throughput functional genomic techniques | Systems biology applications
Cancer diagnosis | DNA microarray | Head and neck cancer [9,10], leukaemia AmpliChip
Analysis of regulatory elements | Tiling arrays, chromosome conformation capture | ENCODE project [1], expression in the yeast genome [3], analysis of globin locus enhancers [5]
Transcription regulatory networks | Chip-on-chip, Y1H system, DNA microarray | Transcription regulatory network during muscle development in Drosophila [19]
Epistasis gene networks | High-throughput RNAi screens [23,24] | Wnt pathway [25]
Genetic interaction networks | Yeast SGA [29,31], epistatic mini-array profiles [35] | Yeast genetic interaction network
Chemical–genetic interaction networks | Yeast SGA [31,32] | Yeast chemical–genetic interaction networks
Protein interaction networks | Y2H screens, MS-based analysis of protein complexes [39] | Coverage and false positives in protein interaction networks [37]
Signalling networks | ODE modelling | Epigenetic inheritance of gene-expression dynamics in single cells using [48]
Networks of networks: metagenomics | High-throughput DNA sequencing | Bacterial communities [50]
As already mentioned, in many cases regulatory ele-
ments are located at distances up to several megabases
from their target genes, in which case control of gene
expression cannot be mediated through direct physical
interaction between genes and their regulatory elements.
The development of techniques to detect long-distance
interactions was the topic of the talk by J. Dekker from
the University of Massachusetts Medical School
(Worcester, MA, USA). He described the chromosome
conformation capture methodology which uses formal-
dehyde cross-linking to covalently link interacting
chromatin segments in intact cells [5]. Cross-linked
chromatin is then solubilized and digested with an
appropriate restriction enzyme, which is then followed
by intramolecular ligation of cross-linked fragments.
The resulting template therefore contains a large col-
lection of ligation products that reflect interaction
between two genomic loci and can be detected by
quantitative PCR using specific primers. The abun-
dance of each ligation product can be used in a quan-
titative manner to measure the frequency with which
the two loci in the genome interact with each other.
Dekker’s group applied this technique to the analysis
of globin locus enhancers and showed that chromo-
some conformation capture has a similar or better sen-
sitivity than the chip-on-chip approach. The advantage
of chromosome conformation capture is that it can
detect regulatory elements that are active only in a
particular cellular state, developmental stage or cell
type.
Functional genomics approaches
to diagnosis of diseases
One of the goals of systems biology and high-through-
put functional genomics is to develop better diagnostic
tools that would allow adoption of personalized medicine
approaches in clinical settings [6]. Medical applications
of systems biology and functional genomics
were widely discussed at the conference, with several
talks devoted to the use of DNA microarrays for can-
cer diagnosis and prognosis. For instance, leukaemia
comprises more than 20 subgroups, which may require
different approaches for successful treatment. Cur-
rently, the diagnosis and classification of leukaemia
rely on the simultaneous application of multiple tech-
niques, such as cytomorphology, histomorphology,
cytochemistry and multiparameter flow cytometry,
often supplemented by fluorescence in situ hybridi-
zation and molecular techniques, such as PCR. These
high-cost and time-consuming approaches have
encouraged the development of more effective diagnos-
tic techniques. The use of DNA microarrays for can-
cer diagnosis was proposed more than 10 years ago,
yet not a single microarray diagnostic kit has been
approved by the FDA. One of the key challenges in
using DNA microarrays for cancer diagnosis is the
reproducibility of signature genes characterized by dif-
ferent groups [7,8]. This issue was addressed by
F. Holstege from the Genomics Laboratory (UMC
Utrecht, The Netherlands) in his talk on signatures for
detection of lymph node metastasis in patients with
head and neck cancer. It can often be very difficult to
detect lymph node metastases reliably, but their early
detection is crucial for the appropriate treatment.
Using DNA microarrays, Holstege’s group and collaborators
built a 102-gene classifier from 82 tumours, which
outperformed current clinical diagnosis techniques
in its predictive accuracy when independently
validated [9]. However, further examination revealed
that, when the oldest tumour samples were excluded,
the predictive accuracy remained high but the overlap
between two signature gene sets found was limited to
49 genes [10]. This is a typical example that led many
researchers to question the validity of DNA micro-
array approaches for cancer classification [11]. Hols-
tege proposed an alternative explanation for such a
discrepancy: incomplete overlap may be caused by the
presence of a large number of genes with similar pat-
terns of expression across samples. This suggests that
many predictive genes can be interchanged without
influencing the predictive outcome and that multiple,
different gene sets can be used for accurate prediction
[10]. Holstege described how, through repeated sampling,
they found that 3000 different signature gene sets
(comprising 825 unique genes, each occurring in at least
one set) can classify tumour samples with similarly high
accuracy. Holstege concluded that there is no single
set of genes with optimal predictive accuracy and that
various signatures can be identified by different insti-
tutes or simply by using different samples. This study
also exposes the flaw behind common attempts to
make signature gene lists as small as possible, the
argument being that molecular signatures based on
more genes will be less prone to biases towards specific
samples.
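The repeated-sampling argument can be illustrated with a toy simulation. The following is only a minimal sketch on synthetic data, assuming that many genes track the same class label with independent noise; the function names (make_dataset, top_genes, resample_signatures) and all parameter values are hypothetical and do not reproduce the analysis reported in [10].

```python
import random
from itertools import combinations

def make_dataset(n_samples=60, n_genes=200, n_informative=80, seed=0):
    """Toy expression matrix: the first n_informative genes all track the class label."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_samples)]
    data = [[(y if g < n_informative else 0) + rng.gauss(0, 1.0)
             for g in range(n_genes)] for y in labels]
    return data, labels

def top_genes(data, labels, sample_idx, k):
    """Select the k genes with the largest class-mean difference on a sample subset."""
    scores = []
    for g in range(len(data[0])):
        pos = [data[i][g] for i in sample_idx if labels[i] == 1]
        neg = [data[i][g] for i in sample_idx if labels[i] == 0]
        scores.append((abs(sum(pos) / len(pos) - sum(neg) / len(neg)), g))
    scores.sort(reverse=True)
    return {g for _, g in scores[:k]}

def resample_signatures(data, labels, k=20, n_rounds=30, seed=1):
    """Build one signature per random two-thirds subset of the samples."""
    rng = random.Random(seed)
    n = len(data)
    return [top_genes(data, labels, rng.sample(range(n), 2 * n // 3), k)
            for _ in range(n_rounds)]

data, labels = make_dataset()
signatures = resample_signatures(data, labels)
pairs = list(combinations(signatures, 2))
mean_shared = sum(len(a & b) for a, b in pairs) / len(pairs)
unique_genes = set().union(*signatures)
print(f"mean genes shared by two signatures: {mean_shared:.1f} of 20")
print(f"unique genes across all signatures: {len(unique_genes)}")
```

In this toy setting the individual signatures differ from one resampling round to the next because many genes carry nearly the same information, which is the kind of interchangeability Holstege described.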
Next, T. Haferlach from Ludwig-Maximilians-University
(Munich, Germany) described the progress in building the
first commercial AmpliChip DNA microarray for testing
leukaemia, which will be released by
Roche. The major challenge facing clinical trials is the
large number of tumour samples that must be analyzed
to ensure the high accuracy of signature gene lists,
which often results in high costs and time delay.
Reporting on the preliminary screens, Haferlach des-
cribed a DNA microarray study of 937 bone marrow
and peripheral blood samples from 892 patients with
all clinically relevant leukaemia subtypes. They were
used to build a classifier with overall prediction accuracy
of 95.1%. In the follow-up round of clinical trials
carried out by Microarray Innovations in Leukaemia
(MILE, an international initiative with 11 centres from
Europe, USA and Singapore), DNA microarrays are
being used to analyze samples from more than 2000
patients. The results from this and other studies will
help to restrict the number of genes on the AmpliChip
to about 500 of the most predictive ones. Although the
AmpliChip is not the first array to enter the market
(the MammaPrint 70-gene signature for diagnosis of
breast cancer based on a study by van ’t Veer et al. [12]
is already available through Agendia), it could be the
first one to obtain FDA approval for clinical tests.
Haferlach estimates that, once the AmpliChip is avail-
able, it will provide a more accurate, faster and cost-
saving strategy for diagnosis of leukaemia.
From systems to networks biology
Talks related to networks biology covered a large por-
tion of the meeting. Among many different types of
biological networks discussed were gene regulatory net-
works, protein interaction networks, genetic networks,
signalling networks and networks of bacterial commu-
nities. For interested readers, comprehensive surveys of
networks biology principles can be found in [13,14].
One of the reasons why networks biology receives such
close attention is that the network-based representation
of high-throughput biological data can serve as a core
around which more comprehensive information about
biological models can be arranged. It also provides a
natural method for integration of different biological
data.
Transcription regulatory networks
I begin by describing talks that addressed analysis of
transcription regulatory networks. Transcription regu-
latory networks, first described for Escherichia coli [15]
and yeast [16], consist of physical and functional
interactions between TFs and their target genes, represented
as a graph [17]. The systematic mapping of TF–target
gene interactions has been very successful in unicellular
systems using ‘TF-centred’ approaches, such as combi-
nation of chromatin immunoprecipitation (ChIP) with
promoter DNA microarrays (known as chip-on-chip),
which identifies a list of direct target genes for a partic-
ular transcription factor under a given set of conditions.
However, as suggested by M. Walhout from the University
of Massachusetts Medical School, metazoan systems
are less amenable to application of chip-on-chip
methods. First, TFs that are expressed at low levels, in
a few cells, or during a narrow developmental interval
are not suitable for ‘TF-centred’ experiments. Secondly,
antibodies are only available for a very limited num-
ber of metazoan TFs, restricting the applicability of
chip-on-chip. Walhout described an alternative
‘gene-centred’ approach for elucidating transcription
regulatory networks, which uses a high-throughput
Gateway-compatible yeast one-hybrid (Y1H) system
[18]. Y1H is a genetic system based on the reporter gene
expression in yeast that detects interactions between a
‘DNA bait’ (e.g. cis-regulatory DNA elements or gene
promoters) and ‘protein prey’ (e.g. TFs). When a prey
protein binds to the DNA bait, the heterologous activa-
tion domain activates reporter gene expression. Thus,
physical interactions between repressors ⁄ activators and
their DNA targets can be identified.
Walhout described an application of the Y1H sys-
tem in Caenorhabditis elegans in which her group iden-
tified 283 interactions between 72 digestive tract genes
and 117 proteins, providing the first set of putative tar-
get genes for nearly 10% of all predicted worm TFs.
Detailed analysis found that more than 70% of the
promoters are bound by at least one of the top 10%
most highly connected TFs. In addition, 82% of the
promoters are bound by at least one of the other less-
well-connected interactors, and more than half of the
target promoters bind both. Summarizing these observations,
Walhout described a model of the transcription
regulatory network in C. elegans, where genes are
subjected to three or more layers of transcriptional
control. The first layer consists of global regulators
which control the expression of many genes in many
different systems. The second layer involves ‘master
regulators’ which control the expression of multiple
genes involved in specific cellular processes. Finally,
the third layer constitutes ‘specifiers’ which fine-tune
the expression of a relatively small number of genes.
The description of the layered architecture for the
C. elegans transcription regulatory network provides
an additional level of network hierarchy to previously
described network motifs. Quite interestingly, the
layered architecture of the C. elegans network resembles
the dense overlapping regions of the E. coli transcription
network [15], although in the latter case such a
coherent division into different levels of global
regulation was not observed.
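Connectivity statistics of the kind quoted above (the share of promoters bound by the most highly connected TFs) can be computed directly from a list of TF–promoter interactions. The following is a minimal sketch, assuming a made-up edge list; the function name hub_coverage and the toy data are hypothetical and are not taken from the Y1H data set.

```python
from collections import defaultdict
import math

def hub_coverage(edges, top_fraction=0.10):
    """Return (hubs, fraction of promoters bound by at least one hub).

    edges: iterable of (tf, promoter) pairs.
    Hubs are the top `top_fraction` of TFs ranked by number of bound promoters.
    """
    targets = defaultdict(set)
    for tf, promoter in edges:
        targets[tf].add(promoter)
    # Rank TFs by degree (ties broken alphabetically for reproducibility).
    ranked = sorted(targets, key=lambda tf: (-len(targets[tf]), tf))
    n_hubs = max(1, math.ceil(top_fraction * len(ranked)))
    hubs = set(ranked[:n_hubs])
    promoters = {p for bound in targets.values() for p in bound}
    bound_by_hub = {p for tf in hubs for p in targets[tf]}
    return hubs, len(bound_by_hub) / len(promoters)

# Toy example: TF "A" binds three of four promoters, every other TF binds one.
edges = [("A", "p1"), ("A", "p2"), ("A", "p3"), ("B", "p3"), ("C", "p4"),
         ("D", "p1"), ("E", "p2"), ("F", "p4"), ("G", "p1"), ("H", "p2"),
         ("I", "p3"), ("J", "p4")]
hubs, frac = hub_coverage(edges)
print(hubs, frac)  # {'A'} 0.75
```

With ten TFs in the toy list, the top 10% is the single TF "A", which binds three of the four promoters.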
E. Furlong from EMBL devoted her talk to the
recent study of the transcription regulatory network
during muscle development in Drosophila. The main
approach adopted by Furlong’s group is a combination
of chip-on-chip arrays with DNA microarrays and
immunohistochemistry. Using a combination of these
techniques, they obtained a temporal regulatory net-
work of Mef2 activity, the key myogenesis regulator
during Drosophila embryonic development [19]. Two
novel ideas behind this approach are worth mentioning.
First, they used gene expression profiling of Mef2 mutant
embryos during the time course and found
genes requiring Mef2 for their correct expression at var-
ious stages of development. This provided functional
validation of the chip-on-chip results and distinguished
between direct and indirect regulation. Second, the
chip-on-chip was itself performed over the time course,
which identified temporal patterns of Mef2 target gene
regulation. Although most of the reported transcription
networks based on chip-on-chip data are static, Fur-
long described one of the first examples of a dynamic
transcription network which is relevant in the context
of developmental biology [20]. This example also
reveals other crucial themes in biological networks ana-
lysis: integration of different data types for reconstruc-
tion of temporal and spatial relations in the networks.
As different high-throughput techniques become more
established and widespread, we can expect much wider
utilization of data integration approaches for building
more complex biological networks. Ultimately, this will
lead to fusion of different biological networks, such as
signalling, transcription and metabolic, into the cellular
super network. Several early attempts in this direction
have already produced interesting results. For example,
Zhang et al. [21] assembled an integrated yeast network
in which nodes represent genes (or their protein prod-
ucts) and edges represent various biological interac-
tions, such as protein–protein interactions, genetic
interactions, transcriptional regulation, sequence
homology, and expression correlation. A search for
significantly enriched motifs in this integrated network
found specific ‘network themes’, higher-order network
structures that correspond to various biological phe-
nomena, such as ‘compensatory complexes’. Another
similar study found that ‘action’ networks (metabolic,
co-expression, and interaction) share the same scaffold-
ing of hubs, whereas the regulatory network uses differ-
ent regulatory hubs [22].
Networks derived from synthetic genetic
interactions and RNAi screens
Other approaches to the construction of biological net-
works focus on functional relations between different
genes. RNAi and synthetic lethal screens that are used
for building epistatic and genetic networks were also
covered at the meeting. N. Perrimon from Harvard
Medical School (Boston, MA, USA) described how
high-throughput RNAi screens can be used to analyze
information flow in Drosophila signal-transduction
pathways. One of the key considerations in such
screens is the choice of appropriate read-out assays
that can accurately assess the effect of gene knock-
down on the pathway of interest [23]. Whereas more
proximal assays that measure activity near receptors
would identify fewer regulators and may miss compo-
nents of input branches from other receptors, distal
readouts (e.g. transcriptional reporters or morpho-
logical outputs through ‘high-content screening’ micro-
scopy [24]) may integrate more pathways than is
desirable. Therefore, for the comprehensive analysis of
a particular signalling pathway, several approaches
should be combined to accurately identify corresponding
phenotypes. Perrimon described one example
where 22 000 duplex RNAs were used for identification
of new Wnt pathway targets [25]. The screening
method relied on sensitive reporter genes containing
T-cell factor-binding sites fused to a minimal promoter
upstream of the luciferase gene. This set-up led to
the identification of 238 potential Wnt pathway genes.
In the other RNAi screen, DNA microarrays were
used as phenotypes to infer epistatic interactions or
epistasis gene networks [26,27]. Interestingly, similar
approaches were independently developed for the ana-
lysis of signalling networks, where kinase inhibitors
and multiparameter flow cytometry are used in place
of RNAi and DNA microarrays [28]. In this case,
availability of the single-cell data from flow cytometry
allows accurate de novo reconstruction of signalling
networks using machine learning algorithms. However,
disadvantages of this approach are the limited availa-
bility of phospho-specific antibodies and the difficulty
in scaling up the flow cytometry for simultaneous ana-
lysis of multiple kinases.
C. Boone from The University of Toronto, Canada
described two recent extensions to the synthetic genetic
array (SGA) technology developed at his laboratory,
which are based on detecting synthetic genetic inter-
action of essential genes and chemical–genetic inter-
actions. The idea behind the original technique is that
most yeast genes are nonessential and therefore
their knockdowns do not produce any observable
phenotypic defects [29]. However, the combination of
mutations in two genes that cause cell death or
reduced fitness provides a means of mapping genetic
interactions. Genetic interactions among essential
genes were not examined systematically because of the
inherent difficulty in creating and working with hypomorphic
(partial loss-of-function) alleles. Boone described the use of
temperature-sensitive conditional alleles based on the
tetracycline (tet) promoter that overcomes this chal-
lenge. A mutation in a particular query gene is first
crossed to an input array of single mutants, and then a
series of robotic pinning steps generates the array of
double mutants, which is then scored for fitness defects
relative to either of the single mutants. With this
approach, Boone’s laboratory conducted 30 SGA
screens of 575 essential genes and built the correspond-
ing genetic network [30]. This network resembles the
genetic network of nonessential genes: both have a
scale-free topology and most of the interactions do not
overlap with protein–protein interactions. However,
the most notable property of the essential gene genetic
network is its density (median frequency of interac-
tions is 3%), which is five times higher than the net-
work density for nonessential genes. These results
indicate that essential genes are well connected hubs
on the genetic interaction network, and that essential
pathways are also highly buffered compared with the
network of nonessential genes. Interestingly, analogous
results were recently reported for the yeast transcription
network [31]. Similar results obtained from the
analysis of different biological networks suggest that
scale-free architecture is not the only way to produce
biological robustness and that distributed architecture
may also contribute to the robustness in the same net-
work (although it may apply to different nodes in the
network, e.g. TFs versus housekeeping genes).
SGA can also be used in combination with chemical
treatments to identify genes involved in mediating the
response to drug compounds [32]. The approach is
based on the premise that, if a small molecule disrupts
the function of its target protein, then cells with a
smaller amount of that target protein would be more
sensitive to the compound. In the second part of the
talk, Boone described a new screen with 82 compounds
against the Saccharomyces cerevisiae viable deletion set
to generate chemical–genetic interaction profiles [32].
The clustering of the resulting data matrix identified
sets of compounds with similar biological effects and
genes that show sensitivity to similar compounds [33].
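The idea of grouping compounds by the similarity of their chemical–genetic profiles can be sketched as follows. This is an illustrative stand-in, assuming Pearson correlation as the similarity measure and a simple nearest-neighbour step in place of the clustering used in the study; the compound names and sensitivity scores are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def most_similar(profiles):
    """For each compound, return the other compound with the most correlated profile."""
    best = {}
    for name, prof in profiles.items():
        others = [(pearson(prof, p), other)
                  for other, p in profiles.items() if other != name]
        best[name] = max(others)[1]
    return best

# Hypothetical sensitivity scores of five deletion strains to four compounds.
profiles = {
    "compound_1": [0.9, 0.8, 0.1, 0.0, 0.2],
    "compound_2": [0.8, 0.9, 0.0, 0.1, 0.1],
    "compound_3": [0.1, 0.0, 0.9, 0.8, 0.7],
    "compound_4": [0.0, 0.2, 0.8, 0.9, 0.9],
}
print(most_similar(profiles))
```

Compounds whose profiles pair up in this way would be candidates for having similar biological effects; transposing the matrix gives the analogous grouping of genes by the compounds to which they are sensitive.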
Several other talks also discussed analysis of genetic
networks. For instance, one limitation of SGA is that
only negative interactions can be identified. Conse-
quently, interactions that are detected generally involve
genes that have unrelated functions, which obscures
the biological relevance and interpretation. To overcome
this limitation, N. Krogan (University of Toronto,
Canada) described a new technique, epistatic
mini-array profiles, which consists of arrays of all
double-mutant combinations for the genes involved in
a specific process [34]. This approach involves measuring
quantitative effects on colony growth, which,
unlike looking for viability, can detect both positive
and negative interactions.
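A quantitative interaction score of this kind can be sketched as the deviation of the double-mutant fitness from what the two single mutants would predict. The example below assumes a multiplicative null model, which is a common convention but is not stated in the talk summary; the function names, the threshold and the fitness values are hypothetical.

```python
def interaction_score(w_a, w_b, w_ab):
    """Deviation of double-mutant fitness from the multiplicative expectation.

    w_a, w_b: fitness of the single mutants relative to wild type (wild type = 1.0).
    w_ab: fitness of the double mutant.
    The multiplicative null model (expected w_ab = w_a * w_b) is an assumption.
    """
    return w_ab - w_a * w_b

def classify(score, threshold=0.1):
    """Label an interaction as negative, positive or neutral (threshold is arbitrary)."""
    if score <= -threshold:
        return "negative"
    if score >= threshold:
        return "positive"
    return "neutral"

# Double mutant much sicker than expected: negative (synthetic sick/lethal) interaction.
print(classify(interaction_score(0.9, 0.8, 0.2)))   # negative
# Double mutant no worse than either single mutant: positive interaction.
print(classify(interaction_score(0.7, 0.7, 0.7)))   # positive
# Double mutant as expected from the single mutants: no interaction.
print(classify(interaction_score(0.9, 0.8, 0.72)))  # neutral
```

A viability read-out can only register the first case, whereas a continuous growth measurement distinguishes all three.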
Protein interaction networks
Protein interaction networks were also extensively dis-
cussed at the meeting. These networks usually repre-
sent either direct or indirect (a part of a protein
complex) physical interactions between proteins and
are typically derived from yeast two-hybrid (Y2H)
screens or MS-based analysis of protein complexes (co-
AP ⁄ MS) [35]. In most cases, protein interaction net-
works are static and represent only a small subset of
the true biological interactions. M. Vidal from Har-
vard Medical School devoted his talk to the issues of
network coverage and the effect of false negatives on
the accuracy of the protein interaction network. The
small overlap between different Y2H maps is often
attributed to low data accuracy. However, Vidal
argued that each map covers only 3–9% of the total
interactome, so limited overlap should be expected. To
test this assumption, Vidal’s group developed a samp-
ling algorithm for generation of many low coverage
networks with properties similar to the current Y2H
maps. In almost 23 000 such comparisons, the interac-
tome that was common to each pair comprised only
2.1%, which suggests that it is possible to observe per-
fectly accurate samples (without false positives) that
have very limited overlap solely because of the low
coverage of their maps [36]. Drawing from examples in
the genome sequencing community, Vidal proposed a
solution to this problem. As any single study cannot
possibly cover all the protein interactions, he suggested
that individual research groups should continually con-
tribute small subnetworks to the global interactome
repository in the way it was done during sequencing of
the human genome.
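The coverage argument can be checked with a small simulation. The sketch below is not the sampling algorithm from [36]; it assumes a hypothetical interactome of 20 000 interactions and two error-free maps that each detect every true interaction independently with probability 6%, a value chosen from within the 3–9% range quoted above.

```python
import random

def sample_map(n_interactions, coverage, rng):
    """Error-free map: each true interaction is detected with probability `coverage`."""
    return {e for e in range(n_interactions) if rng.random() < coverage}

def mean_overlap(n_interactions=20000, coverage=0.06, n_pairs=20, seed=0):
    """Average shared fraction (intersection over union) between pairs of maps."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_pairs):
        a = sample_map(n_interactions, coverage, rng)
        b = sample_map(n_interactions, coverage, rng)
        fractions.append(len(a & b) / len(a | b))
    return sum(fractions) / len(fractions)

overlap = mean_overlap()
print(f"mean overlap between two perfectly accurate maps: {overlap:.1%}")
```

For independent sampling at coverage c, the expected shared fraction is roughly c / (2 − c), about 3% for c = 0.06, so an overlap of a few percent is compatible with maps that contain no false positives at all.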
The incompleteness of protein interaction networks
might raise concerns about such well-established con-
cepts as scale-free architecture, as it becomes unclear
whether extrapolation of network topology from the
currently limited data to the whole network can be
achieved accurately and with high confidence. Current
interactome networks are often described as having a power-law
degree distribution, in which most proteins interact
with a few partners, whereas a few proteins, ‘hubs’,
interact with many partners [37]. In the biological con-
text, power law topology might relate to the generic
robustness of protein interaction networks, and the
hubs may be considered the most suitable targets for
drugs. Vidal described a recent study by his group that
attempted to relate interactome network coverage to
the observable degree distribution [36]. By sampling
from random networks with different degree distribu-
tions, they created multiple subnetworks of different
size (relating to the original random networks). For
instance, at 10% coverage, random networks that
did not have a power-law distribution started exhibiting
scale-free behaviour. Although more detailed comparison
with real Y2H and co-AP/MS networks suggested
that a complete protein interactome map is still more
likely to be scale-free, other possibilities cannot be
ruled out, especially considering that many technical
false positives are auto-activators or sticky proteins
(creating nodes of artificially high degree).
Affinity purification methods allow macromolecules
physically associated with a tagged bait to be
retrieved and identified by MS. These methods have
been used as large-scale screens in prokaryotic and
eukaryotic cells, leading to the construction of many
protein interaction maps [38]. However, without
genome-wide coverage, assignment of a protein to a
particular complex relies heavily on experimental
stringency and arbitrary thresholds. A.-C. Gavin from
EMBL described the first genome-wide screen for
protein complexes in budding yeast based on tandem
affinity purification coupled to MS [39]. This method
identified 491 complexes, of which 257 were novel.
Commenting on the data analysis, Gavin pointed out
that complexes can be partitioned into the core and
attachment proteins, which provide diversity to the
core and allow execution of functions under different
conditions. Using the ‘guilt by association’ principle,
Gavin and collaborators also identified functions for
several novel modules involved in ribosome biogenesis
and RNA metabolism. The functional association was
further aided by integration of protein interaction
data with data on gene expression, localization, function,
evolutionary conservation, protein structure and
binary interactions. Finally, Gavin reported the development
of a new scoring system for measuring the
propensity of proteins to form associations, the
‘socioaffinity index’. The socioaffinity index represents
the tendency of proteins to associate under different
conditions and therefore could be used to analyze the
yeast interactome network from the dynamic perspec-
tive. The socioaffinity index is similar to several meth-
ods developed for detecting community structures in
social networks and therefore could be extended with
algorithms proposed in that context [40,41]. It would
be interesting to compare the ‘community structures’
in protein interaction networks obtained by different
algorithms.
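One way to compare community structures obtained by different algorithms is to score each candidate partition with the modularity measure used in the community-detection literature [40,41]. The sketch below computes modularity Q for a given partition of an undirected, unweighted network; the toy network and protein labels are hypothetical, and this is not the socioaffinity index itself.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  e_c / m - (d_c / (2 m))^2,
    where m is the number of edges, e_c the number of edges inside c,
    and d_c the summed degree of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the same community
    degree_sum = defaultdict(int)  # summed node degrees per community
    for a, b in edges:
        ca, cb = community_of[a], community_of[b]
        degree_sum[ca] += 1
        degree_sum[cb] += 1
        if ca == cb:
            internal[ca] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

# Toy network: two three-protein "complexes" joined by a single interaction.
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"),
         ("C", "D")]
two_modules = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
one_module = {p: 0 for p in "ABCDEF"}

q_two = modularity(edges, two_modules)  # partition into the two triangles
q_one = modularity(edges, one_module)   # everything in one community
print(q_two, q_one)
```

A higher Q indicates that the partition places more interactions inside communities than expected by chance, so partitions produced by different algorithms can be ranked on the same network.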
Signalling networks
The issue of modelling signalling networks was also
discussed at the meeting. Signalling networks differ
from protein interaction or transcription networks
in that they are by nature temporal and therefore
amenable to modelling of signal propagation in the
network. Stochastic and deterministic differential equations
(e.g. ODEs), process algebra and Boolean kinetics
have been used to analyze signalling networks [42].
These approaches attain the highest level of modelling
accuracy by incorporating kinetic parameters directly
into the network. However, they require availability of
complete information about the structure of the signal-
ling network and the values of kinetic parameters.
Unfortunately, this information is not available in many cases,
especially when large cascades of 30 or more proteins
are considered. Excellent reviews on mechanistic modelling
of signalling networks can be found in Kholodenko
[43] and Mogilner et al. [42]; the motif-based/dynamic
systems approach is covered in [44]. At
least two distinct levels of modelling signalling net-
works can be described (although many examples lie
between these two extremes). In one approach, com-
prehensive ODE modelling of all the species deemed to
participate in a particular signal-transduction cascade
is attempted with numerical methods (an example of
this approach can be found in [45]). An alternative
‘hypothesis-driven’ approach starts by introducing
some prior assumptions into the model to simplify it
to a few equations that can then be solved analytically.
Although the resulting model becomes a highly
abstract representation of the signalling network, it
can be very powerful in addressing specific questions
([46] contains typical examples).
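The contrast between the two levels of modelling can be sketched on a deliberately small example. Assuming a single signalling protein whose active fraction x obeys dx/dt = k_act·S·(1 − x) − k_deact·x (a hypothetical one-equation model with arbitrary parameter values, not taken from any of the cited studies), the same equation can be integrated numerically or solved in closed form:

```python
import math

def simulate_euler(k_act, k_deact, signal, t_end, dt=1e-3):
    """Numerical route: forward-Euler integration from x(0) = 0."""
    x = 0.0
    for _ in range(int(round(t_end / dt))):
        x += dt * (k_act * signal * (1.0 - x) - k_deact * x)
    return x

def analytic(k_act, k_deact, signal, t):
    """Analytical route: closed-form solution of the same linear ODE."""
    rate = k_act * signal + k_deact
    return (k_act * signal / rate) * (1.0 - math.exp(-rate * t))

# Arbitrary illustrative parameters (assumptions, not measured values).
numeric = simulate_euler(k_act=1.0, k_deact=0.5, signal=2.0, t_end=3.0)
exact = analytic(k_act=1.0, k_deact=0.5, signal=2.0, t=3.0)
print(numeric, exact)
```

For a model this small the closed form is available and shows directly how the steady state and response time depend on the parameters; for comprehensive models with many species only the numerical route is usually practical.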
A. van Oudenaarden from Massachusetts Institute
of Technology devoted his talk to the epigenetic
inheritance of gene-expression dynamics in single cells
using a ‘hypothesis-driven’ modelling approach. van
Oudenaarden described how, on induction of cell dif-
ferentiation, distinct cell phenotypes can be encoded
by complex signalling networks that prevent pheno-
type reversion even in the presence of significant
environmental fluctuations [47]. To explore the key
parameters that determine the stability of cellular
memory, the galactose network of yeast was used
as a model system. One of the advantages of this
system over the networks of prokaryotes is that it
contains multiple nested feedback loops that bring
different functionalities to the complete network.
Using fluorescent microscopy and computational
energy landscape approaches [48], the van Oude-
naarden group revealed intricate combinations of
signalling circuits. One of the findings was that the
core positive-feedback loop through GAL3 is neces-
sary for this cellular memory, whereas a negative-
feedback loop through GAL80 competes with the
positive GAL3 loop and reduces the potential for
memory storage. Consistently, when the negative
feedback loop is opened and Gal80p levels are con-
trolled constitutively, the memory persistence can be
tuned from hours to months. Such observations pro-
vide a quantitative understanding of the stability and
reversibility of cellular differentiation states. It should
be noted that the definition of epigenetic inheritance
in this talk was not restricted to nonmutational
changes in the chromatin, but comprised all possible
sources of inheritance unrelated to DNA sequence,
such as distribution and concentration of key regula-
tory proteins in the cytoplasm.
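How a positive-feedback loop can store memory of a transient signal may be illustrated with a generic one-variable model. The sketch below is an assumption-laden toy (Hill-type self-activation, arbitrary parameters, forward-Euler integration) and is not the galactose-network model of [47]; it only shows that sufficiently strong positive feedback lets the system remain in the induced state after the inducing signal is removed.

```python
def rate(x, signal, feedback, basal=0.05, K=1.0, n=4, decay=1.0):
    """dx/dt for a protein that activates its own production (Hill kinetics)."""
    return basal + signal + feedback * x**n / (K**n + x**n) - decay * x

def run(x, signal, feedback, t_end=50.0, dt=0.01):
    """Forward-Euler integration of the model for t_end time units."""
    for _ in range(int(round(t_end / dt))):
        x += dt * rate(x, signal, feedback)
    return x

def memory_experiment(feedback):
    """Expression level before a transient signal and long after its removal."""
    before = run(0.0, signal=0.0, feedback=feedback)     # uninduced state
    during = run(before, signal=1.0, feedback=feedback)  # signal applied
    after = run(during, signal=0.0, feedback=feedback)   # signal removed
    return before, after

strong = memory_experiment(feedback=2.0)  # strong positive feedback
weak = memory_experiment(feedback=1.0)    # weakened positive feedback
print("strong feedback (before, after):", strong)
print("weak feedback   (before, after):", weak)
```

With the stronger feedback the level after signal removal stays well above the uninduced level, whereas with the weaker feedback it returns to the uninduced level, which is the qualitative distinction between a network with and without memory.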
Networks of networks: metagenomics
applications in systems biology
Other avenues in systems and networks biology
were also briefly introduced by several speakers. For
instance, in the talk ‘Metagenomics of organisms and
the air’, E. Rubin from Lawrence Berkeley National
Laboratory (Berkeley, CA, USA) showed how high-
throughput DNA sequencing approaches can be used
to study and characterize organisms that are imposs-
ible to grow in the laboratory-controlled environment
[49]. Metagenomic approaches rely on sequencing as a
tool to characterize microbial communities. Rubin des-
cribed a study that investigated the composition of
organisms in the air harvested from two densely popu-
lated urban buildings. Comparison of air samples with
each other and with nearby terrestrial and aquatic
environments suggested that indoor air microbes are
not random transients from surrounding environments,
but rather originate from indoor niches including
human occupants. In another study described by
Rubin, an approach called ‘reverse genomics’ was used
to characterize a symbiotic microbial community in
the worm, Olavius algarvensis, which lacks mouth, gut,
and nephridia [50]. This worm lives in several sediment
layers and forms species-specific associations with
extracellular bacterial endosymbionts located just
below the worm cuticle. As the symbionts have not
been grown in culture, their phylogeny has only been
accessible through 16S ribosomal RNA analysis and
fluorescence in situ hybridization; reverse genomics
instead aims to decipher the organisms’ functions from their
sequence. By shotgun sequencing, Rubin’s group was
able to reconstruct the symbiotic relationship
between the worm and four different microbes that
accounts for the loss of digestive and excretory systems
in O. algarvensis. In one plausible model, the selective
advantage of harbouring multiple symbionts lies in
their ability to supply the worm with energy from the
diverse range of reducing and oxidizing compounds
needed for the worm to survive in various environ-
ments of different oxidized and reduced sediment
layers.
The third EMBL Biennial Symposium brought
together researchers from several fields to discuss cur-
rent issues and trends in various subfields of func-
tional genomics and systems biology. The overall
meeting and the talks of the individual speakers out-
lined several important directions in which systems
biology may significantly progress over the next few
years. First, analysis of regulatory elements in the
human genome could yield novel results with the
availability of new technologies, such as the chromosome
conformation capture technique discussed in this report,
supplemented by new computational algorithms
for detection of functional elements. In the latter
respect, advanced probabilistic graphical modelling
approaches that extend hidden Markov models might
produce the best results. Secondly, the networks biology
paradigm will probably gain a more central role
in systems biology research and produce many interesting
research directions in the areas of algorithmic
networks theory (e.g. various topological and clustering
measures), flow of biological information (e.g.
maximum flow in biological networks), ODE-based
modelling of signalling networks, and network
integration through algorithmic and machine
learning approaches. Finally, systems biology should
progress from its promise to direct examples of medically
relevant research projects. DNA microarrays
may be the first successful systems biology/functional
genomics application for diagnosis and treatment of
patients with cancer.
Another important trend that was noticeable at the
symposium was the methodology and scope of systems
and networks biology research. The meeting was no
longer a place for computer scientists, physicists and
biologists who wanted to apply their individual expertise
to solve complex systems-wide biological problems.
It was a meeting of systems biologists who
understand the methodologies and paradigms of com-
puter science, physics and biology and recognize the
limitations of each individual discipline and its role in
systems biology research. More significantly, there was
a clear trend towards a general understanding of what
constitutes an important problem in systems biology
and how it should be resolved by application of rele-
vant methods and techniques.
References
1 ENCODE Project Consortium (2004) The ENCODE
(ENCyclopedia of DNA Elements) Project. Science 306,
636–640.
2 Xie X, Lu J, Kulbokas EJ, Golub TR, Mootha V,
Lindblad-Toh K, Lander ES & Kellis M (2005) Systematic
discovery of regulatory motifs in human promoters
and 3′ UTRs by comparison of several mammals.
Nature 434, 338–345.
3 David L, Huber W, Granovskaia M, Toedling J, Palm
CJ, Bofkin L, Jones T, Davis RW & Steinmetz LM
(2006) A high-resolution map of transcription in the
yeast genome. Proc Natl Acad Sci USA 103, 5320–5325.
4 Royce TE, Rozowsky JS, Bertone P, Samanta M, Stolc
V, Weissman S, Snyder M & Gerstein M (2005) Issues
in the analysis of oligonucleotide tiling microarrays for
transcript mapping. Trends Genet 21, 466–475.
5 Dekker J (2006) The three ‘C’s of chromosome conformation
capture: controls, controls, controls. Nat
Methods 3, 17–21.
6 Hood L, Heath JR, Phelps ME & Lin B (2004) Systems
biology and new technologies enable predictive and
preventative medicine. Science 306, 640–643.
7 Ein-Dor L, Kela I, Getz G, Givol D & Domany E
(2005) Outcome signature genes in breast cancer: is
there a unique set? Bioinformatics 21, 171–178.
8 Novak K (2006) News feature: where the chips fall.
Nat Med 12, 158–159.
9 Roepman P, Wessels LFA, Kettelarij N, Kemmeren P,
Miles AJ, Lijnzaad P, Tilanus MGJ, Koole R, Hordijk
G-J, van der Vliet PC et al. (2005) An expression profile
for diagnosis of lymph node metastases from primary
head and neck squamous cell carcinomas. Nat Genet 37,
182–186.
10 Roepman P, Kemmeren P, Wessels LFA, Slootweg PJ
& Holstege FCP (2006) Multiple robust signatures for
detecting lymph node metastasis in head and neck cancer.
Cancer Res 66, 2361–2366.
11 Michiels S, Koscielny S & Hill C (2005) Prediction of
cancer outcome with microarrays: a multiple random
validation strategy. Lancet 365, 488–492.
12 van ’t Veer LJ, Dai H, van de Vijver MJ, He YD,
Hart AAM, Mao M, Peterse HL, van der Kooy K,
Marton MJ, Witteveen AT et al. (2002) Gene expression
profiling predicts clinical outcome of breast cancer.
Nature 415, 530–536.
13 Dobrin R, Beg Q, Barabasi A-L & Oltvai Z (2004)
Aggregation of topological motifs in the E. coli tran-
scriptional regulatory network. BMC Bioinformatics
5, 10.
14 Tong AHY, Lesage G, Bader GD, Ding H, Xu H, Xin
X, Young J, Berriz GF, Brost RL, Chang M et al.
(2004) Global mapping of the yeast genetic interaction
network. Science 303, 808–813.
15 Shen-Orr SS, Milo R, Mangan S & Alon U (2002) Net-
work motifs in the transcriptional regulation network of
Escherichia coli. Nat Genet 31, 64–68.
16 Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph
Z, Gerber GK, Hannett NM, Harbison CT, Thompson
CM, Simon I, et al. (2002) Transcriptional regulatory
networks in Saccharomyces cerevisiae. Science 298,
799–804.
17 Blais A & Dynlacht BD (2005) Constructing
transcriptional regulatory networks. Genes Dev 19,
1499–1511.
18 Deplancke B, Mukhopadhyay A, Ao W, Elewa AM,
Grove CA, Martinez NJ, Sequerra R, Doucette-Stamm
L, Reece-Hoyes JS & Hope IA (2006) A gene-centered
C. elegans protein–DNA interaction network. Cell 125,
1193–1205.
19 Sandmann T, Jensen LJ, Jakobsen JS, Karzynski MM,
Eichenlaub MP, Bork P & Furlong EEM (2006) A tem-
poral map of transcription factor activity: Mef2 directly
regulates target genes at all stages of muscle develop-
ment. Dev Cell 10, 797–807.
20 Furlong EE (2004) Integrating transcriptional and sig-
nalling networks during muscle development. Curr Opin
Genet Dev 14, 343–350.
21 Zhang L, King O, Wong S, Goldberg D, Tong A,
Lesage G, Andrews B, Bussey H, Boone C & Roth F
(2005) Motifs, themes and thematic maps of an inte-
grated Saccharomyces cerevisiae interaction network.
J Biol 4, 6.
22 Qi Y & Ge H (2006) Modularity and dynamics of cellu-
lar networks. PLoS Comput Biol 2, 174.
23 Friedman A & Perrimon N (2006) High-throughput
approaches to dissecting MAPK signaling pathways.
Methods 40, 262–271.
24 Pepperkok R & Ellenberg J (2006) High-throughput
fluorescence microscopy for systems biology. Nat Rev
Mol Cell Biol 7, 690–696.
25 DasGupta R, Kaykas A, Moon RT & Perrimon N
(2005) Functional genomic analysis of the Wnt-Wingless
signaling pathway. Science 308, 826–833.
26 Boutros M, Agaisse H & Perrimon N (2002) Sequential
activation of signaling pathways during innate immune
responses in Drosophila. Dev Cell 3, 711–722.
27 Markowetz F, Bloch J & Spang R (2005) Non-transcrip-
tional pathway features reconstructed from secondary
effects of RNA interference. Bioinformatics 21, 4026–
4032.
28 Sachs K, Perez O, Pe’er D, Lauffenburger DA & Nolan
GP (2005) Causal protein-signaling networks derived
from multiparameter single-cell data. Science 308,
523–529.
29 Tong AHY, Evangelista M, Parsons AB, Xu H, Bader
GD, Page N, Robinson M, Raghibizadeh S, Hogue
CWV, Bussey H et al. (2001) Systematic genetic analysis
with ordered arrays of yeast deletion mutants. Science
294, 2364–2368.
30 Mnaimneh S (2004) Exploration of essential gene func-
tions via titratable promoter alleles. Cell 118, 31–44.
31 Balaji S, Iyer LM, Aravind L & Babu MM (2006)
Uncovering a hidden distributed architecture behind
scale-free transcriptional regulatory networks. J Mol
Biol 360, 204–212.
32 Parsons AB, Brost RL, Ding H, Li Z, Zhang C, Sheikh
B, Brown GW, Kane PM, Hughes TR & Boone C
(2004) Integration of chemical-genetic and genetic inter-
action data links bioactive compounds to cellular target
pathways. Nat Biotech 22, 62–69.
33 Dueck D, Morris QD & Frey BJ (2005) Multi-way
clustering of microarray data using probabilistic sparse
matrix factorization. Bioinformatics 21, i144–151.
34 Schuldiner M, Collins SR, Thompson NJ, Denic V,
Bhamidipati A, Punna T, Ihmels J, Andrews B, Boone C,
Greenblatt JF, et al. (2005) Exploration of the function
and organization of the yeast early secretory pathway
through an epistatic miniarray profile. Cell 123, 507–519.
35 Cusick ME, Klitgord N, Vidal M & Hill DE (2005)
Interactome: gateway into systems biology. Hum Mol
Genet 14 Spec No. 2, R171–81.
36 Han J-DJ, Dupuy D, Bertin N, Cusick ME & Vidal M
(2005) Effect of sampling on topology predictions of
protein–protein interaction networks. Nat Biotechnol
23, 839–844.
37 Roverato A (2005) A unified approach to the characteri-
zation of equivalence classes of DAGs, chain graphs with
no flags and chain graphs. Scand J Stat 32, 295–312.
38 Bouwmeester T, Bauch A, Ruffner H, Angrand P-O,
Bergamini G, Croughton K, Cruciat C, Eberhard D,
Gagneur J, Ghidelli S et al. (2004) A physical and
functional map of the human TNF-α/NF-κB signal
transduction pathway. Nat Cell Biol 6, 97–105.
39 Gavin A-C, Aloy P, Grandi P, Krause R, Boesche M,
Marzioch M, Rau C, Jensen LJ, Bastuck S, Dumpelfeld
B, et al. (2006) Proteome survey reveals modularity of
the yeast cell machinery. Nature 440, 631–636.
40 Girvan M & Newman MEJ (2002) Community struc-
ture in social and biological networks. Proc Natl Acad
Sci USA 99, 7821–7826.
41 Newman MEJ (2006) From the cover: modularity and
community structure in networks. Proc Natl Acad Sci
USA 103, 8577–8582.
42 Mogilner A, Wollman R & Marshall WF (2006) Quan-
titative modeling in cell biology: what is it good for?
Dev Cell 11, 279–287.
43 Kholodenko BN (2006) Cell-signalling dynamics in time
and space. Nat Rev Mol Cell Biol 7, 165–176.
44 Tyson JJ, Chen KC & Novak B (2003) Sniffers, buzzers,
toggles and blinkers: dynamics of regulatory and signal-
ing pathways in the cell. Curr Opin Cell Biol 15,
221–231.
45 Elowitz MB & Leibler S (2000) A synthetic oscillatory
network of transcriptional regulators. Nature 403,
335–338.
46 Amonlirdviman K, Khare NA, Tree DRP, Chen W-S,
Axelrod JD & Tomlin CJ (2005) Mathematical model-
ing of planar cell polarity to understand domineering
nonautonomy. Science 307, 423–426.
47 Acar M, Becskei A & van Oudenaarden A (2005)
Enhancement of cellular memory by reducing stochastic
transitions. Nature 435, 228–232.
48 Becskei A, Kaufmann BB & van Oudenaarden A (2005)
Contributions of low molecule number and chromo-
somal positioning to stochastic gene expression. Nat
Genet 37, 937–944.
49 Tringe SG & Rubin EM (2005) Metagenomics: DNA
sequencing of environmental samples. Nat Rev Genet 6,
805–814.
50 Woyke T, Teeling H, Ivanova NN, Huntemann M,
Richter M, Gloeckner FO, Boffelli D, Anderson IJ,
Barry KW, Shapiro HJ, et al. (2006) Symbiosis insights
through metagenomic analysis of a microbial consortium.
Nature 443, 950–955.
Functional genomics and systems biology S. Ivakhno
2448 FEBS Journal 274 (2007) 2439–2448 © 2007 The Author Journal compilation © 2007 FEBS
