Identification of small scale biochemical networks based
on general type system perturbations
Henning Schmidt1, Kwang-Hyun Cho2,3 and Elling W. Jacobsen1
1 Signals, Sensors and Systems, Royal Institute of Technology – KTH, Stockholm, Sweden
2 College of Medicine, Seoul National University, Chongno-gu, Seoul, Korea
3 Korea Bio-MAX Institute, Seoul National University, Gwanak-gu, Korea

Keywords
biochemical networks; identification;
Jacobian; time-series measurements
Correspondence
E. W. Jacobsen, Department of Automatic
Control, Royal Institute of Technology –
KTH, Osquldasvag 10, S-10044 Stockholm,
Sweden
Fax: +46 8790 7329
Tel: +46 8790 7325
E-mail:
K.-H. Cho, College of Medicine, Seoul
National University, Chongno-gu, Seoul,
110–799, Korea, and Korea Bio-MAX
Institute, Seoul National University,
Gwanak-gu, Seoul, 151–818, Korea
Fax: +82 2887 2692
Tel: +82 2887 2650
E-mail:
(Received 22 December 2004, accepted
8 February 2005)
doi:10.1111/j.1742-4658.2005.04605.x

New technologies enable acquisition of large data-sets containing genomic,
proteomic and metabolic information that describe the state of a cell.
These data-sets call for systematic methods enabling relevant information
about the inner workings of the cell to be extracted. One important issue
at hand is the understanding of the functional interactions between genes,
proteins and metabolites. We here present a method for identifying the
dynamic interactions between biochemical components within the cell, in
the vicinity of a steady-state. Key features of the proposed method are that
it can deal with data obtained under perturbations of any system parameter, not only concentrations of specific components, and that the direct
effect of the perturbations does not need to be known. This is important as
concentration perturbations are often difficult to perform in biochemical
systems and the specific effects of general type perturbations are usually
highly uncertain, or unknown. The basis of the method is a linear least-squares estimation, using time-series measurements of concentrations and
expression profiles, in which system states and parameter perturbations are
estimated simultaneously. An important side-effect of also employing estimation of the parameter perturbations is that knowledge of the system’s
steady-state concentrations, or activities, is not required and that deviations
from steady-state prior to the perturbation can be dealt with. Time derivatives are computed using a zero-order hold discretization, shown to yield
significant improvements over the widely used Euler approximation. We
also show how network interactions with dynamics that are too fast to be
captured within the available sampling time can be determined and
excluded from the network identification. Known and unknown moiety
conservation relationships can be processed in the same manner. The
method requires that the number of samples equals at least the number of
network components and, hence, is at present restricted to relatively small-scale networks. We demonstrate herein the performance of the method on
two small-scale in silico genetic networks.

New high-throughput experimental technologies, e.g.
for monitoring the expression levels of large gene sets
and the concentrations of metabolites, are evolving
rapidly. These data sets contain the information
required to uncover the organization of biological

systems on a genetic, proteomic, and metabolic level.
However, in order to realize the translation of data
FEBS Journal 272 (2005) 2141–2151 ª 2005 FEBS

into a system level understanding of cell functions,
methods that can construct quantitative mathematical
models from data are needed. In particular, determination of the quantitative interactions between the
components within and across these levels is an
important issue. These interactions lead to the notion
of networks that can be represented by weighted,
Identification of biochemical network structures

directed graphs, where the nodes correspond to the
biochemical components and the edges, represented as
arrows with weights attached, indicate the direct
quantitative effect that a change in a certain component has on another component. The weights are in
general nonlinear functions that represent reaction
kinetics. Determination of these network structures
will provide insight into the functional relationships
between the involved components, so as to better
understand the functions of biological systems, and
will eventually lead to knowledge concerning how
these systems can be manipulated in order to achieve
a certain desired behavior.
Due to the fact that the reaction kinetics in general
are unknown, and because of the large number of
parameters involved, it is in most cases unfeasible to

determine directly the nonlinear weights from experimental data. Herein, however, a distinction has to be
made between gene and metabolic networks. For metabolic networks, a good initial guess of the network
structure is usually available from databases, such as
KEGG [1], while the structures of gene networks usually are largely unknown in advance. Therefore, the
approach presented in this paper probably has its
greatest value for gene networks, but can be applied
equally well to signaling and metabolic networks
where, e.g. model validation and the determination of
new, previously unknown, connections between intermediates is needed.
A common approach in structural identification is to
consider the biochemical network behavior around
some steady-state and assume that it behaves linearly
for small deviations from this steady-state [2–4]. With
this assumption, the network weights become constants, quantifying the interactions between the
components in the neighborhood of the steady-state.
Grouping these constant weights into a matrix yields
an interaction matrix, the Jacobian, which quantifies
the mutual effects of deviations from the steady-state
on the various components of the system.
Several approaches to the determination of interaction matrices of biochemical systems have been published recently. These can be divided roughly into
methods focusing on the determination of the qualitative structure of the interactions and those aimed at
determining quantitative information about the interactions. Ross [5] reviews two approaches to determine
the structure of reaction pathways from time-series
measurements of metabolites and proteins. The first
approach is based on small pulses of concentration
changes applied to the different species around a stable
steady-state. Depending on the relative behavior of the
measured responses, the considered metabolic pathway
H. Schmidt et al.

can be determined [6,7]. The second approach is based
on correlations between different species when periodically forcing the system by changing some input species over time. Using correlation and multidimensional
scaling analysis, the structure of the considered pathway can be unravelled [8].
Kholodenko et al., Gardner et al. and Vance et al.
propose methods for determining quantitative interaction matrices based on steady-state responses of perturbed genetic networks [2,3,9]. As the responses to the
applied perturbations can often become relatively large
in steady-state, these methods are potentially limited
depending on the nonlinearity of the considered systems. Furthermore, the fact that Kholodenko et al.
and Vance et al. determine the $n^2$ elements of the interaction matrix from $n^2$ measurements suggests that the
results are potentially sensitive to measurement uncertainty [2,9].
In contrast to methods based on steady-state measurements, methods based on time-series measurements
can cope better with the issue of nonlinearity and
measurement uncertainty. Monitoring time-series also
enables significantly more information to be extracted
in each experiment.
A widespread method in the identification of reaction
networks using time-series measurements is a least-squares estimation of the Jacobian. An interesting
method is presented by Mihaliuk et al., in which the
idea is to apply perturbations to all components in the
network and to determine the Jacobian by measuring
only one component, or a linear combination of components [10]. A drawback of this method, for the application to biological systems, is the fact that the
perturbations are assumed to be instant changes of the
concentrations of intermediates in the network. Furthermore, the magnitude of these perturbations is assumed
to be known. The use of concentration shift experiments, that is, adding specific components to the system
as pulses or steps, is a typical assumption in many previously proposed methods. However, while such perturbations are mainly feasible in chemical systems, they
are usually hard to realize in vitro or in vivo [11].
To overcome the restriction to concentration shift
experiments, Sontag et al. derive a method based on

parameter perturbations, in which a separate experiment is performed for each network component so that
the perturbation has no direct effect on this component, that is, the designed perturbation only works indirectly through other components in the network [4].
However, this requires substantial a priori structural
knowledge, and furthermore causes problems with rank
deficient measurement matrices. (The latter is discussed
in more detail in the supplementary material.)
Herein, we study biochemical networks involving
genes, proteins, and/or metabolites, and consider
determination of the Jacobian using least-squares estimation from time-series measurements obtained in the
vicinity of some steady-state. The method is able to
deal with very general types of system perturbations.
In this paper we assume the use of constant parameter
perturbations, such as gene knockouts and inhibitor
additions. For these types of perturbations, the exact
size as well as the direct effect of the perturbations will
in general be largely unknown. We therefore also consider incorporating a determination of the perturbation
itself from the available data. However, the method
can be applied equally when pulse perturbations are
realizable for a given network, and in the case of
known or unknown time-varying parameter perturbations. Furthermore, it is possible to combine pulse and
parameter perturbations. (The use of the method for
other types of perturbations is discussed in the
supplementary material.)
Furthermore, we show that the effect of unsteady-state initial conditions can be considered an unknown
perturbation and hence can be estimated in the same
manner. Due to the latter feature, the proposed
method does not, in contrast to most other methods,
require the system to be in a steady-state when the
perturbations are applied, nor does it require knowledge of the steady-state activities and concentrations.
However, as the method is based on the assumption
that the network is behaving linearly around the same
steady-state for all experiments, the initial states of all
experiments should, in general, not be too far from
the steady-state at which the Jacobian is to be determined.
Network modelling based on time-series data
requires estimation of time derivatives of the states.
These are commonly calculated through the use of
some Euler type finite difference approximation.
Herein we employ a representation of time derivatives,
commonly used in systems theory, that avoids any
approximations, thereby leading to significantly
improved estimation results. Finally, we address the
issue of using dynamics that are significantly faster
than the sampling time, and show how such interactions can be identified and extracted from the data-sets
prior to the network identification. Thus, a reduced
network, with the fast dynamics replaced by algebraic
relationships, can be identified. As we show, the same
approach is also applicable in the case of moiety conservations in metabolic and signaling networks, and
thus it is possible to determine the Jacobian expressed
only in terms of the independent intermediates of the
network.
The proposed method will in general uncover only a
phenomenological interaction topology of the network
as not all intermediates can be measured. That is, we
assume that only the measured components are part of
the network to be modelled. This is a commonly used
assumption [4], which is relaxed
somewhat by Mihaliuk et al. [10]. However, they
assume that all components are known and can be perturbed. The case of unknown and unmeasurable components is, of course, a highly relevant topic,
but outside the scope of this paper.
The outline of the paper is as follows. We first present the problem formulation and briefly outline the
method used for network identification from measurement samples. The proposed method is then applied
to in silico models of two small scale gene networks.
Following the conclusions, we present a detailed
description of the method for least-squares identification of the Jacobian, and discuss the impact of the
sampling time and moiety conservations.

Results and Discussion
Problem formulation
We consider metabolic reactions, signaling networks,
and gene networks that can be described by a system
of nonlinear differential equations of the form in Eqn
(1).
$$\dot{x}(t) = f(x(t), p), \tag{1}$$
where $x = [x_1, \ldots, x_n]^T$ is the state vector containing
the concentrations, activities, or expressions, of all
components in the network and $p = [p_1, \ldots, p_q]^T$ is a
vector of adjustable parameters within the considered
biological system, such as kinetic rate constants and
genes whose expression levels can be perturbed. The
vector valued function f determines the dynamics of
the biochemical network given the states and parameters. The definition in Eqn (1) also incorporates the
typical form of kinetic models, that is, $\dot{s} = N V(s, p)$
[12]. In cases of small molecular concentrations and/or
low levels of diffusion, partial differential and stochastic equations may be required, but this is outside the
scope of this paper.
Due to largely unknown reaction kinetics, and the
large number of involved parameters, it is in general
unfeasible to determine the nonlinear functions $f_i(x, p)$
using a ‘top-down’ approach, that is, determining all
reaction mechanisms and involved parameters, such as
rate constants, from measured responses of the perturbed network. We therefore consider the system
(Eqn 1) in the neighborhood of some steady-state
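$(x_0, p_0)$ and assume that it behaves linearly for small
variations around this state. This assumption allows us
to represent the system as a linear time invariant system (Eqn 2)
$$\Delta\dot{x}(t) = \left.\frac{\partial f}{\partial x}\right|_{x_0, p_0} \Delta x(t) + \left.\frac{\partial f}{\partial p}\right|_{x_0, p_0} \Delta p(t) = A\,\Delta x(t) + B\,\Delta p(t), \tag{2}$$
where $\Delta x(t) = x(t) - x_0$ and $\Delta p(t) = p(t) - p_0$ denote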
deviations from the considered steady-state. Equation
(2) is obtained by truncating the Taylor expansion of
Eqn (1) after the linear terms. The constant matrix A
is the Jacobian matrix of the nonlinear system and represents the network connectivity and the interactions
between the network components around the considered steady-state. For example, in the case of gene networks, a zero element $A_{ij}$ indicates that the expression
level of gene j does not directly affect the expression of
gene i. Positive and negative elements within A imply
activation and inhibition, respectively, of the corresponding components.
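The linearization in Eqn (2) can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical two-gene toy model (the function `f`, its parameters, and the helper `numerical_jacobian` are illustrative and do not come from the paper); it approximates the Jacobian $A$ at a steady-state by central differences.

```python
import numpy as np

def f(x, p):
    """Hypothetical two-gene toy model (not from the paper)."""
    x1, x2 = x
    return np.array([
        p[0] / (1.0 + x2**2) - x1,  # gene 1: repressed by gene 2, first-order decay
        p[1] * x1 - x2,             # gene 2: activated by gene 1, first-order decay
    ])

def numerical_jacobian(func, x0, p0, h=1e-6):
    """Central-difference approximation of A = df/dx evaluated at (x0, p0)."""
    n = x0.size
    A = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        A[:, j] = (func(x0 + step, p0) - func(x0 - step, p0)) / (2.0 * h)
    return A

p0 = np.array([2.0, 1.0])
x0 = np.array([1.0, 1.0])  # steady state: 2/(1 + 1) - 1 = 0 and 1*1 - 1 = 0
A = numerical_jacobian(f, x0, p0)
```

In this toy case the negative entry `A[0, 1]` reflects the inhibition of gene 1 by gene 2, and the positive entry `A[1, 0]` the activation of gene 2 by gene 1, in line with the sign convention described above.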
The aim here is to determine the Jacobian, or interaction matrix A, based on time-series measurements.
We assume that the measurements are collected using
a fixed sampling time $\Delta T$, and that at each sample the
concentrations, or activity levels, of all n components
in x are measured. Furthermore we assume that the
perturbations are constant between two sampling
instants. Due to the discrete nature of the measurements, we reformulate the continuous time system
(Eqn 2) as a discrete time system (Eqn 3)
$$\Delta x_{k+1} = A_d\,\Delta x_k + B_d\,\Delta p_k, \tag{3}$$
where $\Delta x_k = \Delta x(k \Delta T)$ and $\Delta p_k = \Delta p(k \Delta T)$. Using
Eqn (3) we will, in the following, show how an estimation $\hat{A}_d$ for the discrete time Jacobian $A_d$ can be
determined. An estimation $\hat{A}$ for the continuous time
Jacobian $A$ can then be calculated through a reverse
transformation to continuous time using the Euler
approximation or the so-called zero-order hold discretization.
The commonly used Euler approximation for the
time derivatives of the states implies replacing the continuous derivatives by the finite difference $\Delta\dot{x}(t) = (\Delta x_{k+1} - \Delta x_k) / \Delta T$. The reverse transformation from
discrete time $A_d$ then yields the following approximation for the continuous time Jacobian (Eqn 4)
$$A_{\mathrm{euler}} = \frac{1}{\Delta T}\,(A_d - I). \tag{4}$$

The Euler discretization method is approximate, and
the goodness of the approximation is in general highly
sensitive to the choice of the sampling time $\Delta T$. This
‘approximate’ relationship between the continuous and
discrete time models can be avoided completely under
the assumption that the perturbations $\Delta p$ are constant
between sampling instants. Then, an analytical solution
for $\Delta x(t)$ can be derived and hence also the exact relationship between $\Delta x_{k+1}$ and $\Delta x_k$. This leads to the
zero-order hold discretization [13] (Eqn 5)
$$A_{\mathrm{zoh}} = \frac{1}{\Delta T}\,\operatorname{logm}(A_d), \tag{5}$$

where $\operatorname{logm}(A_d)$ denotes the matrix logarithm. Note
there are no approximations involved in this transformation provided the parameter perturbations are constant between samples. (A more detailed discussion of
the zero-order hold discretization and a comparison to
the Euler discretization can be found in part 1 of the
supplementary material.)
Having determined an estimation $\hat{A}_d$ for $A_d$, an estimation for the continuous time Jacobian $A$ can be
obtained using the above transformations. We will
demonstrate that Eqn (5), in general, leads to a significantly better estimation of the Jacobian than Eqn (4).
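The difference between the two reverse transformations can be checked numerically. The sketch below assumes NumPy and SciPy; it discretizes a known stable matrix exactly with `scipy.linalg.expm`, which is what a zero-order hold yields when perturbations are constant between samples, and then maps the result back with Eqns (4) and (5). The matrix values are those of the four-gene example introduced below, with signs taken to agree with the estimated Jacobians reported there; all variable names are illustrative.

```python
import numpy as np
from scipy.linalg import expm, logm

# Values of the four-gene example; signs assumed to match the reported estimates.
A = np.array([
    [-6.45, -2.92,   0.00,  2.54],
    [ 0.00, -8.17,   0.00,  3.93],
    [-2.31,  2.80, -14.46,  0.00],
    [ 0.00,  0.00,  10.22, -9.74],
])
dT = 0.01  # sampling time in hours

Ad = expm(A * dT)                    # exact discrete-time matrix for this linear system

A_euler = (Ad - np.eye(4)) / dT      # Eqn (4): Euler reverse transformation
A_zoh = np.real(logm(Ad)) / dT       # Eqn (5): zero-order hold reverse transformation

err_euler = np.abs(A_euler - A).max()
err_zoh = np.abs(A_zoh - A).max()
print(f"max abs error, Euler: {err_euler:.3g}; zero-order hold: {err_zoh:.3g}")
```

Because the discrete matrix is generated without noise or nonlinearity, the zero-order hold transformation recovers the matrix up to floating-point error, while the Euler transformation carries a residual of order $A^2 \Delta T / 2$.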
In-Silico four gene network example
We consider a genetic network containing four genes,
which has been used previously ([2,4]) as a test case
for identification of interaction matrices and Jacobians. The motivation behind choosing such a small
scale network for illustration of the method is to
keep the exposition complete and reasonably compact. (The model equations and parameters are given
in part 6 of the supplementary material.) The nominal Jacobian at the considered steady-state is given
by Eqn (6)
$$A = \begin{bmatrix} -6.45 & -2.92 & 0 & 2.54 \\ 0 & -8.17 & 0 & 3.93 \\ -2.31 & 2.80 & -14.46 & 0 \\ 0 & 0 & 10.22 & -9.74 \end{bmatrix} \tag{6}$$
and the corresponding network is illustrated in Fig. 1.
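To make the link between the Jacobian and the weighted, directed graph concrete, the following sketch converts the off-diagonal elements of a Jacobian into an edge list. It assumes the sign pattern of Eqn (6) as implied by the estimates reported in this section; the helper `jacobian_to_edges` is hypothetical, and its output has not been checked against Fig. 1.

```python
import numpy as np

A = np.array([
    [-6.45, -2.92,   0.00,  2.54],
    [ 0.00, -8.17,   0.00,  3.93],
    [-2.31,  2.80, -14.46,  0.00],
    [ 0.00,  0.00,  10.22, -9.74],
])

def jacobian_to_edges(A, tol=0.0):
    """List (source, target, weight, kind) for off-diagonal A[i, j] with |A[i, j]| > tol.

    Element A[i, j] describes the direct effect of component j on component i,
    so the edge runs from j to i. Components are numbered from 1.
    """
    edges = []
    n = A.shape[0]
    for i in range(n):        # affected component (row)
        for j in range(n):    # acting component (column)
            if i != j and abs(A[i, j]) > tol:
                kind = "activation" if A[i, j] > 0 else "inhibition"
                edges.append((j + 1, i + 1, float(A[i, j]), kind))
    return edges

edges = jacobian_to_edges(A)
for source, target, weight, kind in edges:
    print(f"gene {source} -> gene {target}: {weight:+.2f} ({kind})")
```

For estimated Jacobians, a nonzero `tol` would be needed to suppress small spurious entries; how to choose it is not addressed here.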
From system identification theory it is well known
that a good estimation result requires a sufficient excitation of the system. (In particular, part 3 of the supplementary material shows that perturbations have to
be chosen such that the complete space of the network
states is perturbed.) In the following we consider
time-series data obtained from constant parameter perturbation experiments. The perturbed parameters correspond to the maximal enzyme rates involved in the
transcription of the genes (part 6 of the supplementary
material). Furthermore, the magnitudes, as well as the
direct effects of the perturbations, are assumed to be
unknown and thus not used in the identification
algorithm. The estimations of $\hat{A}_d$ are obtained using
absolute measurements and applying Eqn 15. Unless
otherwise stated, the zero-order hold discretization is
used in the following.

Fig. 1. Structure of the four-gene network. The interconnections
represent the direct interactions between the genes. An arrow indicates a positive effect on the gene transcription, and a bar indicates
a negative (inhibitory) effect.

Estimation of the Jacobian
We first performed an in silico experiment in which the
maximal enzyme rate corresponding to the transcription of gene number one is perturbed by 1%. The
sampling time is chosen as $\Delta T = 0.01$ h, and we collect
six samples, the minimal number required for estimating the Jacobian when the size of the perturbation is
unknown. The first sample is taken one time-step after
the perturbation has been applied to the system. It
should be noted that the sampling time is chosen
sufficiently small to enable the fastest dynamics of the
system to be captured.
Applying the method proposed above, we obtain the
following estimate for the Jacobian:
$$\hat{A} = \begin{bmatrix} -6.45 & -2.90 & 0.01 & 2.52 \\ 0.00 & -8.17 & 0.00 & 3.93 \\ -2.31 & 2.77 & -14.40 & 0.01 \\ 0.00 & -0.09 & 10.22 & -9.77 \end{bmatrix}$$
Except for the (4,2) element, the estimated Jacobian is
very close to the nominal Jacobian, the largest relative error in the nonzero elements being less than 1%.
This is not surprising, as the perturbation to the system was chosen so small that the nonlinearity of the
system played a relatively modest role. The fact that
the (4,2) element is relatively poorly estimated is
probably explained by more severe nonlinear effects
for this specific relationship with the chosen parameter perturbation. Note that the nonlinear effects in
general will depend on the parameter chosen for perturbation.


In real experiments, inhibition efficiencies by small
interfering RNA (siRNA) or chemical inhibitors are
much higher than in the experiment above. A more
realistic experimental setting is to assume perturbations
of 50%, and to do several experiments, in which different parameters are perturbed. We performed four
experiments and combined the obtained measurements.
In each experiment, one of the maximal enzyme rates
of the four genes is perturbed by 50%, and the minimum required number of samples is taken: three
samples in each experiment. The sampling time is
$\Delta T = 0.01$ h. The result of the estimation is given by:
$$\hat{A} = \begin{bmatrix} -6.45 & -2.69 & -0.01 & 2.59 \\ 0.00 & -8.14 & -0.02 & 4.06 \\ -2.19 & 2.90 & -14.38 & -0.01 \\ 0.07 & 0.08 & 8.51 & -9.72 \end{bmatrix}$$
Comparing with the ‘true’ linear Jacobian in Eqn (6),
we see that the network has been identified with reasonable accuracy, the largest relative error in the
nonzero elements being less than 20%. That we
obtain such good results even for relatively large
perturbations is partly explained by the fact that
measurements from different experiments have been
combined. This allows for different perturbations of
the system and a reduction in the number of samples
required in each experiment. The use of different perturbed parameters in each experiment leads to a better excitation of the system, which is beneficial for
the estimation result. The reduction in the time span
of each experiment reduces the deviation from the
initial state, thereby reducing the effects of nonlinearities.
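The idea of combining short experiments with unknown perturbations can be sketched as a single least-squares problem. The code below is an illustration under stated assumptions, not the paper's Eqn 15 (which is not reproduced in this part of the text): it assumes absolute measurements follow $x_{k+1} = A_d x_k + c_e$, where the constant offset $c_e$ of experiment $e$ absorbs both the unknown steady-state and the unknown perturbation. The data come from an exactly linear simulation, and the steady-state `x_ss` and the direct effects `d` are hypothetical values.

```python
import numpy as np
from scipy.linalg import expm, logm

A_true = np.array([
    [-6.45, -2.92,   0.00,  2.54],
    [ 0.00, -8.17,   0.00,  3.93],
    [-2.31,  2.80, -14.46,  0.00],
    [ 0.00,  0.00,  10.22, -9.74],
])
n, dT = 4, 0.01
Ad_true = expm(A_true * dT)

x_ss = np.array([1.0, 2.0, 1.5, 0.5])  # hypothetical steady state, never given to the estimator
n_exp, n_samples = 4, 3                # four experiments, three samples each

regressors, targets = [], []
for e in range(n_exp):
    d = np.zeros(n)
    d[e] = 0.05                               # hypothetical direct effect of perturbation e
    c = (np.eye(n) - Ad_true) @ x_ss + d      # constant offset in absolute coordinates
    x = x_ss.copy()                           # state when the perturbation is applied
    samples = []
    for _ in range(n_samples):                # first sample one step after the perturbation
        x = Ad_true @ x + c
        samples.append(x)
    indicator = np.zeros(n_exp)
    indicator[e] = 1.0                        # selects the offset of experiment e
    for k in range(n_samples - 1):
        regressors.append(np.concatenate([samples[k], indicator]))
        targets.append(samples[k + 1])

Phi = np.array(regressors)   # rows [x_k, indicator], shape (8, n + n_exp)
Y = np.array(targets)        # rows x_{k+1}, shape (8, n)

# Least-squares solution of Y = Phi @ Theta, with Theta stacking Ad^T over the offsets.
Theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
Ad_hat = Theta[:n].T
A_hat = np.real(logm(Ad_hat)) / dT   # zero-order hold reverse transformation, Eqn (5)
```

With four states and four unknown offsets there are eight unknowns per row of $A_d$, and four experiments with three samples each provide exactly eight transitions, which matches the minimum sample count quoted above. Since the simulated data are exactly linear and noise-free, the recovered matrix agrees with `A_true` up to numerical error; it does not reproduce the nonlinear-model estimates reported in the text.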
The above results demonstrate that it is theoretically possible to determine the Jacobian from one
experiment only, but that in practice usually more
than one experiment will be preferable. How to
choose the perturbations in an optimal way is outside
the scope of this paper and a topic for future work.
Instead we will, in the experiments below, consistently
perform four experiments. (In each experiment the
transcription rate of a different gene is perturbed
using the parameters given in part 6 of the supplementary materials.)
Effect of discretization method
In order to illustrate the importance of the method
employed for determination of time derivatives, we
herein perform estimations for different sampling times
DT and perturbation magnitudes, using two discretization methods (Eqns 4 and 5). The relative estimation
error, e, is calculated as (Eqn 7)
$$e = \frac{1}{N} \sum_{i=1}^{n} \sum_{j=1}^{n} |a_{ij}|, \qquad a_{ij} = \begin{cases} \dfrac{\hat{A}_{ij} - A_{ij}}{A_{ij}}, & A_{ij} \neq 0 \\ 0, & A_{ij} = 0 \end{cases} \tag{7}$$

where N denotes the number of nonzero elements in
the nominal Jacobian A. The results are shown in
Table 1. In the table, we also show the error introduced in the derivatives by using the Euler approximation. The error is determined using the nominal
Jacobian A and is computed as
$$g(\Delta T) = \frac{\| e^{A \Delta T} - (I + A \Delta T) \|_{\mathrm{sum}}}{\| e^{A \Delta T} \|_{\mathrm{sum}}}.$$

The results clearly demonstrate that the zero-order
hold discretization in Eqn (5) leads to a considerable
improvement in the network identification, compared
to the commonly used Euler approximation.
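Both error measures used here are straightforward to compute. The sketch below evaluates the relative estimation error of Eqn (7) for the estimate obtained from the 1% perturbation experiment, and the Euler approximation error $g(\Delta T)$ for the three sampling times of Table 1. It assumes that $\| \cdot \|_{\mathrm{sum}}$ denotes the sum of the absolute values of all matrix entries, which is an interpretation rather than a definition given in this part of the text; the function names are illustrative.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([            # nominal Jacobian, signs assumed to match the estimates
    [-6.45, -2.92,   0.00,  2.54],
    [ 0.00, -8.17,   0.00,  3.93],
    [-2.31,  2.80, -14.46,  0.00],
    [ 0.00,  0.00,  10.22, -9.74],
])
A_hat = np.array([        # estimate reported for the 1% perturbation experiment
    [-6.45, -2.90,   0.01,  2.52],
    [ 0.00, -8.17,   0.00,  3.93],
    [-2.31,  2.77, -14.40,  0.01],
    [ 0.00, -0.09,  10.22, -9.77],
])

def relative_error(A_hat, A):
    """Eqn (7): mean of |A_hat_ij - A_ij| / |A_ij| over the nonzero nominal elements."""
    mask = A != 0
    return np.abs((A_hat[mask] - A[mask]) / A[mask]).sum() / mask.sum()

def euler_approximation_error(A, dT):
    """g(dT), reading the 'sum' norm as the sum of absolute entries (an assumption)."""
    E = expm(A * dT)
    return np.abs(E - (np.eye(A.shape[0]) + A * dT)).sum() / np.abs(E).sum()

e = relative_error(A_hat, A)
g = [euler_approximation_error(A, dT) for dT in (0.001, 0.01, 0.1)]
print(f"e = {100 * e:.2f} %")
print("g(dT) in %:", [round(100 * value, 3) for value in g])
```

Note that Eqn (7) ignores entries whose nominal value is zero, so spurious nonzero estimates such as the (4,2) element do not contribute to `e`.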
Table 1. Comparison of estimation errors. We compared the estimation errors obtained for different sampling times, discretization
methods, and magnitudes of the parameter perturbation. (Euler),
the Euler approximation; (ZOH), a zero-order hold discretization.
The last column displays the relative error introduced by using the
Euler approximation. All errors are given in %.

| $\Delta T$ | e (ZOH), 50% perturbation | e (Euler), 50% perturbation | e (ZOH), 10% perturbation | e (Euler), 10% perturbation | Approximation error g($\Delta T$) |
|---|---|---|---|---|---|
| 0.001 | 0.48 | 1.02 | 0.10 | 0.83 | 0.013 |
| 0.01 | 3.95 | 9.48 | 0.84 | 7.92 | 1.3 |
| 0.1 | 22.77 | 56.98 | 6.22 | 52.82 | 120 |

Impact of measurement uncertainty
We consider herein the effect of measurement uncertainty on the estimation of the Jacobian. The uncertainty is simulated in silico by adding noise to the
absolute measurements $x_k$ as follows
$$x_k^{\mathrm{noise}} = x_k + W x_0$$
Here, W denotes a diagonal matrix in which the entries
are uniformly distributed random variables between
$-0.02$ and $0.02$. These values may appear small compared to the uncertainty in realistic biological experiments. However, the noise levels relative to the
measured deviations $\Delta x$ correspond to over 50% for
some samples. This should also be seen in relation to
the fact that measurements of gene expressions are
often carried out in a relative manner, corresponding
to the measurement of $\Delta x$.
The sampling time is $\Delta T = 0.01$ h as before, and the
considered magnitudes of parameter perturbations are
20, 50 and 100%. The results for different numbers of
measured time-steps per experiment can be seen in
Fig. 2. In order to display the mean value and the
standard deviation of the relative estimation error in
the nonzero elements of the Jacobian, one hundred
Monte-Carlo simulations have been conducted at each
point. The results show that the relative estimation
error (Eqn 7) and its standard deviation decrease for
increasing numbers of measured time-steps. It is interesting to note that the estimation error also decreases
for increasing perturbation magnitudes. This is
explained by the fact that the signal-to-noise ratio
improves for larger perturbations,
which is also reasonable in practice. This serves to
illustrate that in general, there will exist a trade-off, in
terms of effects of measurement uncertainty on the one
hand and the effects of nonlinearities on the other
hand, when choosing the size of parameter perturbations.
Impact of sampling time
In order to illustrate the problems occurring in networks with dynamic modes that are too fast to capture
with the available sampling time, we considered identification of a network consisting of five genes. The network is a modification of the four gene network used
in the previous example, obtained by adding a fifth
gene with relatively fast dynamics. (The equations,
parameters, and the nominal Jacobian are given in
part seven of the supplementary material.) The structure of the five gene network is shown in Fig. 3.

Fig. 2. Mean value and standard deviation of the relative estimation
error (7) in the nonzero elements of the Jacobian, obtained from
100 Monte-Carlo simulations. [Plot not reproduced: mean relative error and standard deviation (%) on a logarithmic axis against measured time-steps per experiment (3 to 21), with curves for 20%, 50% and 100% perturbations and a level marked "21% Error".]

Fig. 3. Structure of the five-gene network. The interconnections
determine the direct interactions between the genes. An arrow
indicates a positive effect on the gene transcription, and a bar indicates a negative effect.

In the considered network, the degradation rate of
the mRNA of gene five has been chosen to be much
faster than the degradation rates for the other
mRNAs, thereby introducing a relatively fast dynamic
mode. The sampling time we employ is too large to
capture this fast dynamic mode.
Data for the estimation of the Jacobian of the system is generated in silico in the following way: (a) five
experiments are performed, in each of which a 50% repression of one of the
genes is simulated. In in silico implementations, this
corresponds to a parameter perturbation of $-50\%$ in
the maximal enzyme rate. We stress that the magnitude of the perturbation is assumed unknown when
we apply the identification algorithm; (b) in each
experiment the mRNA concentrations, corresponding
to all five genes, are measured at four consecutive
time-steps. The first sample is taken one time-step
after the perturbation is applied to the system; (c) the
perturbation is applied while the system is not in the
steady-state. In in silico environments, this is simulated by introducing the perturbation while all mRNA
concentrations are 5% below their steady-state values.
This reflects the fact that a biological system in general will not be in a steady-state when perturbations
are applied in a real experiment. Furthermore, the
steady-state is assumed to be unknown and thus not
used in the identification; (d) the sampling time is
chosen to be $\Delta T = 0.01$ h.
Following the approach discussed above, we collect
the measurements and find that the smallest singular
value of the measurement matrix M is $\sigma_1 = 0.00026$,
which is relatively close to 0 compared to the other
singular values. Thus, we conclude that the chosen
sampling time was too large with respect to the fastest
dynamics of the system.
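The singular value test can be illustrated on synthetic data. The sketch below is not the paper's five-gene model and does not use its measurement matrix M, whose definition is given in the Method section and is not reproduced here. It assumes a hypothetical linear system in which a fifth component relaxes quickly (rate 500 per hour) towards a linear combination of two slow components, and it stacks sampled deviations from the steady-state as columns of a matrix `X`.

```python
import numpy as np
from scipy.linalg import expm

A_slow = np.array([
    [-6.45, -2.92,   0.00,  2.54],
    [ 0.00, -8.17,   0.00,  3.93],
    [-2.31,  2.80, -14.46,  0.00],
    [ 0.00,  0.00,  10.22, -9.74],
])
k = np.array([0.3, -0.2, 0.0, 0.0])  # hypothetical: x5 tracks 0.3*x1 - 0.2*x2
rate = 500.0                         # hypothetical fast turnover rate of component 5

A = np.zeros((5, 5))
A[:4, :4] = A_slow
A[4, :4] = rate * k
A[4, 4] = -rate

dT = 0.01
Ad = expm(A * dT)                    # exp(-rate * dT) is below 1%, so the fast mode is not resolved

columns = []
for j in range(4):                   # four experiments, each displacing one slow component
    x = np.zeros(5)
    x[j] = 1.0
    for _ in range(4):               # four samples, the first one step after the displacement
        x = Ad @ x
        columns.append(x)
X = np.array(columns).T              # 5 x 16 matrix of sampled deviations

U, s, _ = np.linalg.svd(X, full_matrices=False)
u_min = U[:, -1]                     # left singular vector of the smallest singular value
dominant = int(np.argmax(np.abs(u_min))) + 1
print("singular values:", np.round(s, 4))
print("dominant component in u_min:", dominant)
```

In this toy setting the smallest singular value is far below the others, and the corresponding singular vector approximates the algebraic relation that ties the fast component to the slow ones, which is why its largest entry points to component 5.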
Using the zero-order hold discretization to determine $\hat{A}$ from $\hat{A}_d$, ignoring the fact that some modes
have not been captured in the data, the following
result is obtained:
$$\hat{A} = \begin{bmatrix} -5.27 & -4.29 & -0.03 & 2.49 & -5.57 \\ 0.68 & -8.58 & 0.02 & 3.76 & -3.33 \\ 1.19 & 2.27 & -14.32 & 0.02 & -9.48 \\ -19.32 & 15.60 & 9.50 & -9.72 & 92.46 \\ 30.52 & -25.11 & 0.03 & 0.07 & -149.4 \end{bmatrix}.$$
As can be verified easily, this result does not capture
the structure of the network in Fig. 3 correctly. For
example, the estimate of the Jacobian shows a large
direct effect of gene 1 on gene 4, which is incorrect.
The singular vector $u_1$, corresponding to $\sigma_1$, shows
that the fifth component in x, that is, the mRNA concentration corresponding to gene five, is the most
dominant with respect to the singularity of the measurement matrix. Using the approach outlined in the
Method section, neglecting the measurements of the
fifth component, the following Jacobian for the reduced
network is obtained:
$$\hat{A}_{1,2,3,4} = \begin{bmatrix} -6.43 & -3.34 & -0.01 & 2.49 \\ -0.02 & -8.01 & 0.03 & 3.76 \\ -0.78 & 3.89 & -14.28 & 0.02 \\ -0.04 & -0.17 & 9.12 & -9.72 \end{bmatrix}$$
The identified Jacobian is close to the true Jacobian
for the reduced network, with relative errors in all
nonzero elements being smaller than 20%, and all zero
elements being identified as close to zero. It is important to point out that the Jacobian of the reduced network is not supposed to be equal to the Jacobian of
the four gene network in the previous example. The
dynamics of the reduced Jacobian also correspond reasonably well to the slow dynamics of the five gene network, as can be seen from the computed eigenvalues
in Table 2. The structure of the identified reduced
Jacobian reflects well the structure of the network in
Fig. 3 when gene five is taken out. For instance, gene
one directly affects gene three when the dynamics of
gene five are neglected, or assumed to be infinitely
fast.
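The eigenvalue comparison can be repeated from the printed matrix. The sketch below assumes NumPy and uses the reduced Jacobian as transcribed from the text; since the entries are rounded to two decimals, the computed eigenvalues may differ slightly from those listed in Table 2.

```python
import numpy as np

A_reduced = np.array([   # estimated reduced Jacobian, as printed in the text
    [-6.43, -3.34,  -0.01,  2.49],
    [-0.02, -8.01,   0.03,  3.76],
    [-0.78,  3.89, -14.28,  0.02],
    [-0.04, -0.17,   9.12, -9.72],
])

eigvals = np.linalg.eigvals(A_reduced)
eigvals = eigvals[np.argsort(eigvals.real)]   # fastest (most negative real part) first
time_constants_h = -1.0 / eigvals.real        # time constants in hours

for lam, tau in zip(eigvals, time_constants_h):
    print(f"eigenvalue {lam:.2f}, time constant {tau:.3f} h")
```

All time constants of the reduced model are well above the sampling time of 0.01 h, whereas the fast mode of the full five-gene network (eigenvalue near −571.7 in Table 2) has a time constant of roughly 0.002 h and therefore cannot be resolved.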
The results presented above show that it is indeed
possible to obtain a useful identification result even in
the case that fast dynamics are not captured correctly.
Moreover, one can obtain the information on which
components are involved in the fast reactions, and
their static relationship with the other components of
the network.


Table 2. Comparison between the eigenvalues of the nominal Jacobian A of the five gene network and the eigenvalues of the estimated reduced Jacobian $\hat{A}_{1,2,3,4}$; $i = \sqrt{-1}$.

Network                            Eigenvalues
A (nominal)                        −571.7    −13.28 ± i 3.16    −6.93    −5.15
$\hat{A}_{1,2,3,4}$ (estimated)    None      −13.27 ± i 3.36    −7.12    −4.80


Conclusions
In this paper we have discussed the qualitative and
quantitative identification of network interactions
based on time-series measurements obtained from perturbation experiments and least-squares estimation.
The proposed method is equally applicable to identification of gene, protein, and metabolic networks. Because the method requires at least n + 1 samples, where n is the number of network components, it is relatively costly for large-scale networks and thus so far limited to the identification of smaller networks. However, as high-throughput techniques are evolving fast, it is probable that high-frequency sampling can be obtained in the near future. Wet-lab based experimental verification of the proposed method remains a subject for future study.
The proposed approach has several advantages over
other approaches: the steady-state of the system does
not need to be known nor achieved prior to the perturbation; general type perturbations can be used; dynamics relatively fast compared to the sampling time can
be detected and removed from the identification; linear
dependencies due to moiety conservations can be identified and processed; samples from any number of
experiments can be combined in the identification, as
long as these experiments have been carried out
around the same steady-state.
We have shown that measurement uncertainty can
have a large effect on the identification result. Possible
solutions for uncertainty and noise are to collect and
use more measurement data, and to make use of available a priori structural knowledge. In addition, methods
from identification theory on estimating and filtering
noise can be incorporated. Furthermore, the signal-to-noise ratio can be increased by choosing larger perturbations. However, the latter can lead to increased nonlinear effects, so a trade-off between the two effects has to be taken into consideration.
Instead of using the widely accepted Euler discretization, we have shown that the zero-order hold discretization, in general, results in a significantly improved estimation and should be used in all methods aimed at identifying dynamic biochemical networks.
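
As an illustration of the difference between the two discretizations, the following sketch recovers a continuous-time Jacobian from a noise-free discrete-time Jacobian in both ways. The 2 × 2 matrix, the sampling time and the helper `matrix_function` are hypothetical and chosen only for illustration. The zero-order hold relation is taken as $A_d = e^{A \Delta T}$ and the Euler relation as $A_d \approx I + A \Delta T$; the helper assumes a diagonalizable matrix and, for the logarithm, eigenvalues off the negative real axis.

```python
import numpy as np

def matrix_function(X, f):
    """Apply a scalar function f to a diagonalizable matrix X = V diag(w) V^-1."""
    w, V = np.linalg.eig(X)
    return (V @ np.diag(f(w.astype(complex))) @ np.linalg.inv(V)).real

# Hypothetical stable continuous-time Jacobian and sampling time.
A_true = np.array([[-1.0, 0.5],
                   [0.2, -2.0]])
dT = 0.5

# Exact sampled dynamics under a zero-order hold: A_d = exp(A dT).
A_d = matrix_function(A_true * dT, np.exp)

# Euler discretization assumes A_d = I + A dT.
A_euler = (A_d - np.eye(2)) / dT
# Zero-order hold inverts the exponential: A = log(A_d) / dT.
A_zoh = matrix_function(A_d, np.log) / dT

err_euler = np.linalg.norm(A_euler - A_true)
err_zoh = np.linalg.norm(A_zoh - A_true)
print(f"Euler error: {err_euler:.3f}, zero-order hold error: {err_zoh:.2e}")
```

In this noise-free setting the zero-order hold conversion is exact up to round-off, whereas the Euler error grows with the sampling time. With noisy data both conversions inherit the error in the estimated $A_d$.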


We have not discussed explicitly the effect of autoregulation of biological systems by self-negative feedback. For example, certain components might be regulated by homeostatic effects and a response to
perturbations might not be visible in the measurement
data. However, under the assumption that these
effects are significantly slower than the sampling time
it is reasonable to assume that the proposed method
will lead to an acceptable result. Furthermore, we
have only considered the case of the estimation
around a stable steady-state of the network. In the
case of oscillations, created within the network or
affecting the network, one would have to deal with
time-varying Jacobians, which is outside the scope of
this paper.

Experimental procedures
Method
In this section, we present a method for the determination of an estimate $\hat{A}_d$ of the discrete Jacobian $A_d$ based on minimization of a least-squares criterion. Some related issues, such as the choice of the sampling time and how to deal with moiety conservations in metabolic and signaling networks, are also discussed. Least-squares based estimation is used widely within many areas of science and engineering, an important reason being that it is applicable even in the case where no statistical information about the measurements is available [14]; this is typically the case with measurement data from biological systems.
Excitation of a biochemical system is usually performed as a constant parameter perturbation, e.g. gene knockouts or the alteration of gene transcription rates. Especially in vivo, it is not possible to quantify the applied perturbations, meaning that the magnitude of the applied perturbations is unknown. Furthermore, for gene networks, it is
usually also unknown which components the perturbations
affect in a direct manner. Previously proposed methods
often assume this information to be available, at least partially. Sontag et al., for example, assume that the magnitude
of the perturbations is unknown but the genes that are
directly affected by the perturbed parameters are known
[4]. In the following we consider both the magnitude and
the direct effects of perturbations to be unknown. To keep
the exposition relatively simple, we assume, however, that
the parameter perturbations are constant over time.
(However, in part 2 of the supplementary material we show
how this assumption can be relaxed to take time varying
parameter perturbations, known or unknown, and pulse
perturbations into account.)
We assume that the network response to the applied perturbations is sufficiently small such that, in the time range of the measurements, the system can be regarded as linear. Equation (3) then describes the behavior of the system (Eqn 1) for variations around the steady-state $(x_0, p_0)$. In the following we will replace the time-dependent perturbation vector $\Delta p_k$ by a constant perturbation vector $\Delta p$. As $B_d$ and $\Delta p$ are both unknown, we can replace the corresponding term by a constant unknown perturbation $\Delta u$, as follows:

$$\Delta x_{k+1} = A_d \Delta x_k + B_d \Delta p = A_d \Delta x_k + \Delta u = [A_d, \Delta u] \begin{bmatrix} \Delta x_k \\ 1 \end{bmatrix}. \tag{8}$$
Here, $[A_d, \Delta u]$ is a matrix, consisting of the discrete time Jacobian and the unknown perturbation vector $\Delta u$, representing the unknowns in the equation. The vectors $\Delta x_{k+1}$ and $[\Delta x_k^T, 1]^T$ are given by the measurements. Assume now that the system (Eqn 1) at time $k = k_0$ is in steady-state ($\Delta x_0 = 0$) and that an unknown perturbation, corresponding to a nonzero perturbation vector $\Delta u$, is applied to the system at $k = k_0$ and held constant. Without loss of generality we can assume herein $k_0 = 0$. The response of the network to this perturbation is measured at the following time-steps. The column vector $\Delta x_k$ represents the concentrations of the network components relative to the steady-state concentrations obtained at time step k > 0. Measuring the response of the network until time-step n + 2, where n corresponds to the number of involved components in the network, and arranging these concentration vectors into matrices we obtain the following matrix version of Eqn (8):
$$R = [\Delta x_{n+2}, \Delta x_{n+1}, \ldots, \Delta x_2] = [A_d, \Delta u] \begin{bmatrix} \Delta x_{n+1} & \Delta x_n & \cdots & \Delta x_1 \\ 1 & 1 & \cdots & 1 \end{bmatrix} = [A_d, \Delta u] M. \tag{9}$$

Under the assumption that the (n + 1) × (n + 1) measurement matrix M on the right hand side in (Eqn 9) can be inverted, the unknown matrix $[A_d, \Delta u]$ can be determined from:

$$[A_d, \Delta u] = [\Delta x_{n+2}, \Delta x_{n+1}, \ldots, \Delta x_2] \begin{bmatrix} \Delta x_{n+1} & \Delta x_n & \cdots & \Delta x_1 \\ 1 & 1 & \cdots & 1 \end{bmatrix}^{-1} = R M^{-1}.$$

Invertibility of M can be guaranteed under a controllability condition from linear systems theory (see proof in part 3 of the supplementary material), and $A_d$ and the unknown perturbation $\Delta u$ can then be determined from a single experiment.
The determination of $A_d$ and $\Delta u$ is exact only in the case where the system is linear and no measurement uncertainty is present. In the case of noisy measurements and a nonlinear biochemical network, only estimates $\hat{A}_d$ and $\Delta\hat{u}$ of the unknowns can be obtained. It is then also important to measure and use more time-steps than the minimum required. In the case of more than n + 2 measured time-steps, the matrices M and R are constructed as above, but with more columns, corresponding to the measurements in the additional time-steps. Thus, M will no longer be a square matrix and the pseudoinverse needs to be used instead:

$$[\hat{A}_d, \Delta\hat{u}] = R M^T (M M^T)^{-1}; \tag{10}$$

this corresponds to a least-squares solution for $\hat{A}_d$ and $\Delta\hat{u}$.

As the identification of the overall network structure requires a relatively large number of measurement samples, we consider combining data from several experiments. It has to be pointed out that these experiments should be performed around the same steady-state, as only then an averaging effect in the determination of $A_d$ can be avoided. Small variations of the initial state around the steady-state are admissible, as long as the system still can be seen as behaving linearly.
If r experiments are performed, the result matrix R can be constructed as Eqn (11):

$$R = [R_1, \ldots, R_r], \tag{11}$$

where $R_i$ is the result matrix corresponding to the i-th experiment. For each experiment, $R_i$ is constructed as Eqn (12):

$$R_i = [\Delta x^i_{m_i}, \Delta x^i_{m_i - 1}, \ldots, \Delta x^i_2], \tag{12}$$

where $m_i$ determines the number of measured time-steps in experiment i. Note that the measurements are assumed to start at time k = 1 and end at time k = $m_i$.
The measurement matrix M is constructed as Eqn (13):

$$M = \begin{bmatrix}
M_1 & M_2 & \cdots & M_r \\
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}, \tag{13}$$

where $M_i$ is the matrix containing the measurements corresponding to the i-th experiment, shifted by one time-step relative to the measurements in $R_i$. For each experiment $M_i$ is constructed as Eqn (14):

$$M_i = [\Delta x^i_{m_i - 1}, \Delta x^i_{m_i - 2}, \ldots, \Delta x^i_1]. \tag{14}$$

The 1 and 0 elements in Eqn (13) denote row vectors with unity and zero entries, respectively. These vectors have the same width as the corresponding measurement matrices $M_i$. The 1-vectors have the same origin as those in Eqn (9). However, in the case of several experiments, one has to take into account that the perturbation $\Delta u_i$ in the i-th experiment can be different from the perturbation in the other experiments, and thus for each experiment one perturbation vector needs to be taken into account. (The construction of the matrices M and R is illustrated for a simple example in part 4 of the supplementary material.)


Estimates of the discrete time Jacobian $A_d$ and the unknown perturbations $\Delta u_1, \ldots, \Delta u_r$ can now be determined from:

$$[\hat{A}_d, \Delta\hat{u}_1, \ldots, \Delta\hat{u}_r] = R M^T (M M^T)^{-1}. \tag{15}$$

In the case of combined experiments, the total number of columns of M should at least equal n + r. For the construction of R and M at least n + 2r measured time-steps are required. Note that Eqn (15) involves the pseudoinverse of M, corresponding to a least-squares estimation of $A_d$ and $\Delta u_i$.
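
The estimation in Eqn (15) can be sketched numerically as follows. This is a minimal illustration on simulated, noise-free linear data, assuming NumPy; the Jacobian, the perturbations and all variable names are hypothetical. Columns are stored in ascending time order, which differs from the ordering in Eqns (12) and (14) but gives the same estimate as long as R and M are aligned column by column. `np.linalg.pinv` is used in place of the explicit expression $M^T (M M^T)^{-1}$, to which it is equal when M has full row rank.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, steps = 3, 2, 8   # components, experiments, measured time-steps per experiment

# Hypothetical stable discrete-time Jacobian and one perturbation per experiment.
A_d = np.array([[0.8, 0.1, 0.0],
                [0.0, 0.7, 0.2],
                [0.1, 0.0, 0.6]])
du = rng.normal(size=(r, n))

R_blocks, M_blocks = [], []
for i in range(r):
    # Simulate dx_{k+1} = A_d dx_k + du_i starting from steady state (dx_0 = 0).
    dx = [np.zeros(n)]
    for _ in range(steps):
        dx.append(A_d @ dx[-1] + du[i])
    X = np.column_stack(dx[1:])            # dx_1 ... dx_steps
    R_blocks.append(X[:, 1:])              # dx_2 ... dx_steps
    indicator = np.zeros((r, steps - 1))
    indicator[i] = 1.0                     # row of ones selecting perturbation i
    M_blocks.append(np.vstack([X[:, :-1], indicator]))   # dx_1 ... dx_{steps-1}

R = np.hstack(R_blocks)
M = np.hstack(M_blocks)

# Least-squares estimate of [A_d, du_1, ..., du_r].
theta = R @ np.linalg.pinv(M)
A_d_hat, du_hat = theta[:, :n], theta[:, n:].T
```

Because the simulated data are exactly linear and noise-free, the estimates reproduce `A_d` and `du` up to round-off; with measurement noise the same expression returns the least-squares fit.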

An important side effect of incorporating estimation of the applied perturbations using measurement data is that nonzero (non-steady-state) initial conditions can also be handled. This follows from the fact that initial unknown deviations from the steady-state can be represented as an unknown perturbation. Thus, the proposed method can be used even in cases where the steady-state $x_0$ is unknown. To see this, Eqn (8) is reformulated using the relations $\Delta x_{k+1} = x_{k+1} - x_0$ and $\Delta x_k = x_k - x_0$ to obtain:

$$x_{k+1} = A_d x_k + u,$$

where u is now given by $u = \Delta u + (I - A_d) x_0$, representing a lumped perturbation, consisting of the unknown steady-state and the unknown perturbation $\Delta u$. Thus, rather than using relative measurements $\Delta x_k$, the absolute measurements $x_k$ can be used directly for the estimation, and the steady-state of the system does not need to be known. In order to use this approach, it is sufficient to replace the $\Delta x^i_k = x^i_k - x_0$ in Eqns (12) and (14) by the corresponding absolute measurements $x^i_k$. Equation (15) then becomes

$$[\hat{A}_d, \hat{u}_1, \ldots, \hat{u}_r] = R M^T (M M^T)^{-1},$$

where the only difference lies in the fact that now the lumped perturbations $u_i$, instead of $\Delta u_i$, are estimated. Note, however, that the method is still based on the assumption that the network is behaving linearly around the same steady-state for all experiments. Hence, the initial states in all experiments should in general not be too far from the steady-state at which the Jacobian is to be determined.
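
The use of absolute measurements can be checked on a small example. The sketch below assumes NumPy and a hypothetical two-component linear system observed in a single experiment with exactly n + 2 time-steps, so the square form $[A_d, u] = R M^{-1}$ applies. The numerical values of `A_d`, `du` and `x0` are illustrative assumptions; `x0` is used only to simulate the data and to verify the lumped perturbation, not by the estimator.

```python
import numpy as np

n = 2
A_d = np.array([[0.9, 0.05],
                [-0.1, 0.8]])   # hypothetical discrete-time Jacobian
du = np.array([0.2, 0.1])       # hypothetical constant perturbation
x0 = np.array([2.0, 1.0])       # steady state, treated as unknown by the estimator

# Absolute concentrations: x_{k+1} - x0 = A_d (x_k - x0) + du, starting at x0.
x = [x0]
for _ in range(n + 2):
    x.append(x0 + A_d @ (x[-1] - x0) + du)
X = np.column_stack(x[1:])                    # x_1 ... x_{n+2}

R = X[:, 1:]                                  # x_2 ... x_{n+2}
M = np.vstack([X[:, :-1], np.ones(n + 1)])    # x_1 ... x_{n+1} plus a row of ones

# Square case: [A_d, u] = R M^{-1}, with u the lumped perturbation.
theta = R @ np.linalg.inv(M)
A_d_hat, u_hat = theta[:, :n], theta[:, n]

# Lumped perturbation predicted by u = du + (I - A_d) x0.
u_expected = du + (np.eye(n) - A_d) @ x0
```

The Jacobian estimate is unaffected by the unknown steady-state; only the last column of the estimate changes, from $\Delta u$ to the lumped perturbation $u$.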

The advantage of the approach proposed above is that
very general types of perturbations can be applied to the
system, and that information about the perturbations is not
required. (As mentioned earlier in the text, in part 2 of the
supplementary material we relax the assumption of constant parameter perturbations.)

Choice of sampling time and dealing with moiety conservations
Biochemical networks generally contain dynamic modes with a wide range of time constants. In order to identify the full Jacobian from time-series measurements, the sampling time $\Delta T$ needs to be chosen so small that even the fastest dynamics are captured. Due to experimental limitations, it may, however, not be possible to realize the required sampling time. Furthermore, as the dynamics of the system in general are unknown in advance, it is hard to determine the required sampling time in advance. Herein we will consider how interactions with dynamics significantly faster than the sampling time can be identified a priori from the collected data, and how these interactions then can be extracted from the data prior to identification of the network Jacobian. We also show that the same approach can be used to deal with moiety conservations within the considered network.
Assume the fastest mode of the linearized system (Eqn 2) corresponds to an eigenvalue $\lambda_f$ or a time-constant $\tau_f = 1/|\lambda_f|$. In order to obtain a reasonable estimate of the corresponding dynamics, a sampling time $\Delta T$ smaller than $\tau_f$ should be used [15]. If the sampling time is chosen to be significantly larger than $\tau_f$, then the transients of this mode will essentially disappear between samples. This implies that there exists an almost linear dependency between the measurements of the sampled states, and hence that the measurement matrix M will be (almost) rank deficient. In general, assuming the perturbations fulfil the controllability condition discussed above, the deficiency will be equal to the number of modes with time-constant significantly smaller than the sampling time. The linear dependency, corresponding to the interactions with dynamics significantly faster than the sampling time, can be determined directly from the collected measurements using a singular value decomposition (SVD) of the measurement matrix, that is, $M = U \Sigma V^H$. The vectors $u_i$ corresponding to singular values $\sigma_i$ close to zero will correspond to the singular directions.
A possible solution to the problem with too slow sampling is to identify the components taking part in the fast dynamics, that is, components corresponding to nonzero elements in the singular vector $u_i$, and to remove one of them for each fast mode. Any component can in principle be chosen, but a reasonable choice is to neglect the one being most dominant with respect to the singularity, that is, corresponding to the largest element in the vector $u_i$. Repeating this procedure for every singular value of M close to zero will lead to a measurement matrix with full rank, allowing determination of the Jacobian of the network, reduced by one component for each fast mode.
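
The SVD-based reduction described above can be sketched as follows, assuming NumPy. The three-component network, the perturbation vector `b` and the sampling time are hypothetical; component 3 is constructed to relax roughly 100 times faster than the others, so that its transient has vanished at every sampling instant. Indices in the code are zero-based, so the third component has index 2.

```python
import numpy as np

# Hypothetical 3-component network: component 3 is much faster than 1 and 2.
A = np.array([[  -1.0,  0.5,    0.0],
              [   0.0, -2.0,    1.0],
              [ 100.0,  0.0, -200.0]])
b = np.array([1.0, 0.0, 0.0])   # direct effect of the perturbation (not used by the method)
dT, n, steps = 0.5, 3, 8        # sampling time far above the fast time constant

# Exact zero-order hold discretization via the eigendecomposition of A.
lam, V = np.linalg.eig(A)
A_d = (V @ np.diag(np.exp(lam * dT)) @ np.linalg.inv(V)).real
du = np.linalg.solve(A, (A_d - np.eye(n)) @ b)

# Step response from steady state, sampled at k = 1 ... steps.
dx = [np.zeros(n)]
for _ in range(steps):
    dx.append(A_d @ dx[-1] + du)
X = np.column_stack(dx[1:])
M = np.vstack([X[:, :-1], np.ones(steps - 1)])

# The left singular vector of the smallest singular value marks the dependency.
U, s, _ = np.linalg.svd(M)
u_min = U[:n, -1]                          # entries belonging to the n components
dominant = int(np.argmax(np.abs(u_min)))   # component to neglect

# Re-identify the network without the dominant component.
keep = [i for i in range(n) if i != dominant]
M_reduced = np.vstack([X[keep, :-1], np.ones(steps - 1)])
R_reduced = X[keep, 1:]
theta = R_reduced @ np.linalg.pinv(M_reduced)
A_d_reduced = theta[:, :len(keep)]
```

In this constructed case the smallest singular value of M is numerically zero and the fast component is flagged as dominant. The reduced discrete-time Jacobian then carries only the two slow modes.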
The presence of moiety conservations in metabolic or
signaling networks has the same effect on the estimation of
the Jacobian Ad as modes with dynamics significantly faster
than the sampling time. In other words, some of the concentrations of the intermediates in the networks will be linearly
dependent, resulting in a measurement matrix without full
row rank. Thus, the same approach as presented above for
dealing with linear dependencies due to a too large sampling
time, can be used to determine the components involved in the moiety conservations, and a reduced Jacobian can then
be determined for a reduced network containing only the
independent components. The algebraic relations corresponding to moiety conservations are obtained directly from
the SVD of the measurement matrix, M.
Note that if the linear dependencies due to moiety conservations and/or fast dynamic modes are not eliminated from M prior to the determination of $\hat{A}_d$ and $\hat{A}$, the resulting Jacobian will contain gross errors.
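
How the algebraic relation of a moiety conservation appears in the SVD of M can be illustrated with a small constructed example, assuming NumPy. The discrete-time model below is hypothetical and built so that $x_1 + x_2$ is conserved: the first two rows of `A_d` sum column-wise to [1, 1, 0] and the perturbation has no net effect on $x_1 + x_2$. Absolute measurements are used, so the constant of the conservation relation shows up in the entry belonging to the row of ones.

```python
import numpy as np

# Hypothetical model with the conserved moiety x1 + x2.
A_d = np.array([[0.8, 0.1,  0.05],
                [0.2, 0.9, -0.05],
                [0.1, 0.0,  0.70]])
du = np.array([0.1, -0.1, 0.05])   # entries for x1 and x2 cancel
x0 = np.array([1.0, 2.0, 0.5])     # initial steady state, so x1 + x2 = 3 throughout
n, steps = 3, 10

# Simulate deviations dx_{k+1} = A_d dx_k + du and convert to absolute values.
dx = [np.zeros(n)]
for _ in range(steps):
    dx.append(A_d @ dx[-1] + du)
X = x0[:, None] + np.column_stack(dx[1:])        # x_1 ... x_steps
M = np.vstack([X[:, :-1], np.ones(steps - 1)])

# Left singular vector of the (numerically) zero singular value.
U, s, _ = np.linalg.svd(M)
relation = U[:, -1] / U[0, -1]   # scaled so the coefficient of x1 equals 1
print(np.round(relation, 6))
```

The scaled vector should come out close to [1, 1, 0, −3], which reads as $x_1 + x_2 - 3 = 0$. Either $x_1$ or $x_2$ can then be removed before the reduced Jacobian is estimated.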

Acknowledgments
Henning Schmidt and Elling W. Jacobsen acknowledge
financial support from the Swedish Research Council.
Kwang-Hyun Cho acknowledges the support received
by a grant from the Korea Ministry of Science and
Technology (Korean Systems Biology Research Grant,
M10309000006–03B5000-00211) and also by The 21C
Frontier Microbial Genomics and Application Center
Program, Ministry of Science and Technology (Grant
M605-0204-3-0), Republic of Korea.

References
1 Kanehisa M & Goto S (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28, 27–30.
2 Kholodenko BN, Kiyatkin A, Bruggeman FJ, Sontag E, Westerhoff HV & Hoek JB (2002) Untangling the wires: a strategy to trace functional interactions in signaling and gene networks. Proc Natl Acad Sci USA 99, 12841–12846.
3 Gardner T, di Bernardo D, Lorenz D & Collins J (2003) Inferring genetic networks and identifying compound mode of action via expression profiling. Science 301, 102–105.
4 Sontag E, Kiyatkin A & Kholodenko B (2004) Inferring dynamic architecture of cellular networks using time series of gene expression, protein and metabolite data. Bioinformatics 20, 1877–1886.

FEBS Journal 272 (2005) 2141–2151 © 2005 FEBS

5 Ross J (2003) New approaches to the deduction of complex reaction mechanisms. Acc Chem Res 36, 839–847.
6 Vance W, Arkin A & Ross J (2002) Determination of causal connectivities of species in reaction networks. Proc Natl Acad Sci USA 99, 5816–5821.
7 Torralba A, Yu K, Shen P, Oefner P & Ross J (2003) Experimental test of a method for determining causal connectivities of species in reactions. Proc Natl Acad Sci USA 100, 1494–1498.
8 Arkin A & Ross J (1995) Statistical construction of chemical reaction mechanisms from measured time-series. J Phys Chem 99, 970–979.
9 de la Fuente A, Brazhnik P & Mendes P (2002) Linking the genes: inferring quantitative gene networks from microarray data. Trends Genet 18, 395–398.
10 Mihaliuk E, Skodt H, Hynne F, Sorensen PG & Showalter K (1999) Normal modes for chemical reactions from time series analysis. J Phys Chem 103, 8246–8251.
11 Crampin EJ, Schnell S & McSharry PE (2004) Mathematical and computational techniques to deduce complex biochemical reaction mechanisms. Prog Biophys Mol Biol 86, 77–112.
12 Siddhartha J & van Schuppen JH (2001) Modelling and control of cell reaction networks. PNA-R0116, CWI, Amsterdam.
13 Rugh W (1996) Linear System Theory. Prentice Hall, Upper Saddle River, NJ, USA.
14 Kay S (1993) Fundamentals of Statistical Signal Processing. Prentice Hall, Upper Saddle River, NJ, USA.
15 Ljung L (1999) System Identification – Theory for the User, 2nd edn. Prentice Hall, Upper Saddle River, NJ, USA.

Supplementary material
The following material is available from http://www.blackwellpublishing.com/products/journals/suppmat/EJB/EJB4605/EJB4605sm.htm
Appendix S1. Additional proofs and models.
