

Model-Based Design for Embedded Systems

In addition, in many cases, the same simulation environment can be used for both function and performance verification. However, most simulation-based performance estimation methods suffer from insufficient corner-case coverage. This means that they are typically not able to provide worst-case performance guarantees. Moreover, accurate simulations are often computationally expensive.
In other works [5,6], hybrid performance estimation methods have been
presented that combine simulation and analytic techniques. While these
approaches considerably shorten the simulation run-times, they still cannot
guarantee full coverage of corner cases.
To determine guaranteed performance limits, analytic methods must be
adopted. These methods provide hard performance bounds; however, they
are typically not able to model complex interactions and state-dependent
behaviors, which can result in pessimistic performance bounds.
Several models and methods for the analytic performance verification of distributed platforms have been presented so far. These approaches are based on essentially different abstraction concepts. The first idea was to extend well-known results of classical scheduling theory to distributed systems. This implies the consideration of communication delays, which cannot be neglected in a distributed system. Such a combined analysis of processor and bus scheduling is often referred to as holistic scheduling analysis.
Rather than a specific performance analysis method, holistic scheduling is
a collection of techniques for the analysis of distributed platforms, each of
which is tailored toward a particular combination of an event stream model,
a resource-sharing policy, and communication arbitration (see [10,11,15] as
examples). Several holistic analysis techniques are aggregated and implemented in the modeling and analysis suite for real-time applications (MAST)
[3].∗
In [12], a more general approach to extend the concepts of the classical
scheduling theory to distributed systems was presented. In contrast to holistic approaches that extend the monoprocessor scheduling analysis to special
classes of distributed systems, this compositional method applies existing
analysis techniques in a modular manner: the single components of a distributed system are analyzed with classical algorithms, and the local results
are propagated through the system by appropriate interfaces relying on a limited set of event stream models.
In this chapter, we will describe a different analytic and modular
approach for performance prediction that does not rely on the classical
scheduling theory. The method uses real-time calculus [13] (RTC), which
extends the basic concepts of network calculus [7]. The corresponding modular performance analysis (MPA) framework [1] analyzes the flow of event
streams through a network of computation and communication resources.

∗ Available as Open Source software at


Performance Prediction of Distributed Platforms

1.2 Application Scenario

In this section, we introduce the reader to system-level performance
analysis by means of a concrete application scenario from the area of video
processing. Intentionally, this example is extremely simple in terms of the
underlying hardware platform and the application model. On the other
hand, it allows us to introduce the concepts that are necessary for a compositional performance analysis (see Section 1.4).
The example system that we consider is a digital set-top box for the
decoding of video streams. The architecture of the system is depicted in
Figure 1.2. The set-top box implements a picture-in-picture (PiP) application
that decodes two concurrent MPEG-2 video streams and displays them on
the same output device. The upper stream, VHR , has a higher frame resolution and is displayed in full screen whereas the lower stream, VLR , has a
lower frame resolution and is displayed in a smaller window at the bottom
left edge of the screen.

The MPEG-2 video decoding consists of the following tasks: variable
length decoding (VLD), inverse quantization (IQ), inverse discrete cosine
transformation (IDCT), and motion compensation (MC). In the considered
set-top box, the decoding application is partitioned onto three processors:
CPU1 , CPU2 , and CPU3 . The tasks VLD and IQ are mapped onto CPU1
for the first video stream (process P1 ) and onto CPU2 for the second video
stream (process P3 ). The tasks IDCT and MC are mapped onto CPU3 for both
video streams (processes P2 and P4 ). A pre-emptive fixed priority scheduler
is adopted for the sharing of CPU3 between the two streams, with the upper
stream having higher priority than the lower stream. This reflects the fact
that the decoder gives a higher quality of service (QoS) to the stream with a
higher frame resolution, VHR .
As shown in the figure, the video streams arrive over a network and enter
the system after some initial packet processing at the network interface. The
inputs to P1 and P3 are compressed bitstreams and their outputs are partially decoded macroblocks, which serve as inputs to P2 and P4 . The fully
decoded video streams are then fed into two traffic-shaping components S1
and S2 , respectively. This is necessary because the outputs of P2 and P4 are
potentially bursty and need to be smoothed out in order to make sure that
no packets are lost by the video interface, which cannot handle more than a
certain packet rate per stream.
We assume that the arrival patterns of the two streams, VHR and VLR ,
from the network as well as the execution demands of the various tasks in
the system are known. The performance characteristics that we want to analyze are the worst-case end-to-end delays for the two video streams from
the input to the output of the set-top box. Moreover, we want to analyze the
memory demand of the system in terms of worst-case packet buffer occupation for the various tasks.


FIGURE 1.2
A PiP application decoding two MPEG-2 video streams on a multiprocessor architecture.

In Section 1.3, we will first formally describe the above system in the concrete time domain. In principle, this formalization could directly be used to perform a simulation; in our case, it will be the basis for the MPA described in Section 1.4.

1.3 Representation in the Time Domain

As can be seen from the example described in Section 1.2, the basic model of computation consists of component networks, that is, sets of components that communicate via infinite FIFO (first-in first-out) buffers denoted as channels. Components receive streams of tokens via their input channels, operate on the arriving tokens, and produce output tokens that are sent to the output channels. We also assume that the components need resources in order to actually perform operations. Figure 1.3 represents the simple component network corresponding to the video decoding example.
Examples of components are tasks that are executed on computing
resources or data communication via buses or interconnection networks.
Therefore, the token streams that are present at the inputs or outputs of a
component could be of different types; for example, they could represent
simple events that trigger tasks in the corresponding computation component or they could represent data packets that need to be communicated.

1.3.1 Arrival and Service Functions
In order to describe this model in greater detail, we will first describe
streams in the concrete time domain. To this end, we define the concept of
arrival functions: R(s, t) ∈ R≥0 denotes the amount of tokens that arrive in
the time interval [s, t) for all time instances, s, t ∈ R, s < t, and R(t, t) = 0.
Depending on the interpretation of a token stream, an arrival function may
be integer valued, i.e., R(s, t) ∈ Z≥0 . In other words, R(s, t) “counts” the


number of tokens in a time interval.

FIGURE 1.3
Component networks corresponding to the video decoding example in Section 1.2: (a) without resource interaction, and (b) with resource interaction.

Note that we are taking a very liberal
definition of a token here: It just denotes the amount of data or events that
arrive in a channel. Therefore, a token may represent bytes, events, or even
demanded processing cycles.
In the component network semantics, tokens are stored in channels that connect inputs and outputs of components. Let us suppose that we had determined the arrival function R′(s, t) corresponding to a component output (that writes tokens into a channel) and the arrival function R(s, t) corresponding to a component input (that removes tokens from the channel); then we can easily determine the buffer fill level, B(t), of this channel at some time t: B(t) = B(s) + R′(s, t) − R(s, t).
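This bookkeeping is easy to make concrete. The following Python sketch uses hypothetical helper names and represents an arrival function as a closure over event timestamps (one of many possible encodings):

```python
def arrivals(timestamps):
    """Build an arrival function R(s, t) that counts the tokens
    arriving in the half-open interval [s, t)."""
    def R(s, t):
        return sum(1 for x in timestamps if s <= x < t)
    return R

def buffer_level(B_s, R_writer, R_reader, s, t):
    """Channel fill level at time t from the level B(s) at time s:
    B(t) = B(s) + R'(s, t) - R(s, t), where R' (R_writer) puts tokens
    into the channel and R (R_reader) removes them."""
    return B_s + R_writer(s, t) - R_reader(s, t)
```

For instance, a writer emitting at times 0, 1, 2, 3 and a reader consuming at 1.5 and 2.5 leaves two tokens in the channel at t = 4.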
As has been described above, one of the major elements of the model is
that components can only advance in their operation if there are resources
available. As resources are the first-class citizens of the performance analysis, we define the concept of service functions: C(s, t) ∈ R≥0 denotes the
amount of available resources in the time interval [s, t) for all time instances s, t ∈ R, s < t, and C(t, t) = 0. Depending on the type of the underlying
resource, C(s, t) may denote the accumulated time in which the resource is
fully available for communication or computation, the amount of processing
cycles, or the amount of information that can be communicated in [s, t).

1.3.2 Simple and Greedy Components
Using the above concept of arrival functions, we can describe a set of very
simple components that only perform data conversions and synchronization.
• Tokenizer: A tokenizer receives fractional tokens at the input that may correspond to a partially transmitted packet or a partially executed task. A discrete output token is only generated once the whole processing or communication of the predecessor component is finished. With the input and output arrival functions R(s, t) and R′(s, t), respectively, we obtain the transfer function R′(s, t) = ⌊R(s, t)⌋.
• Scaler: Sometimes, the units of arrival and service curves do not match.
For example, the arrival function, R, describes a number of events and
the service function, C, describes resource units. Therefore, we need to
introduce the concept of scaling: R′(s, t) = w · R(s, t), with the positive
scaling factor, w. For example, w may convert events into processor
cycles (in case of computing) or into number of bytes (in case of communication). A much more detailed view on workloads and their modeling can be found in [8], for example, modeling time-varying resource
usage or upper and lower bounds (worst-case and best-case resource
demands).
• AND and OR: As a last simple example, let us suppose a component
that only produces output tokens if there are tokens on all inputs
(AND). Then the relation between the arrival functions at the inputs


Performance Prediction of Distributed Platforms

11


R1(s, t) and R2(s, t), and the output R′(s, t) is R′(s, t) = min{B1(s) + R1(s, t), B2(s) + R2(s, t)}, where B1(s) and B2(s) denote the buffer levels in the input channels at time s. If the component produces an output token for every token at any input (OR), we find R′(s, t) = R1(s, t) + R2(s, t).
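As a sketch (with hypothetical names), these two relations translate directly into code; the arrival functions are passed in as callables returning token counts over [s, t):

```python
def and_output(B1_s, R1, B2_s, R2, s, t):
    """AND component: an output token requires a token on both inputs,
    so the output over [s, t) is limited by the scarcer input:
    R'(s, t) = min{B1(s) + R1(s, t), B2(s) + R2(s, t)}."""
    return min(B1_s + R1(s, t), B2_s + R2(s, t))

def or_output(R1, R2, s, t):
    """OR component: one output token per token on either input:
    R'(s, t) = R1(s, t) + R2(s, t)."""
    return R1(s, t) + R2(s, t)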
The elementary components described above do not interact with the
available resources at all. On the other hand, it would be highly desirable
to express the fact that a component may need resources in order to operate on the available input tokens. A greedy processing component (GPC) takes
an input arrival function, R(s, t), and produces an output arrival function, R′(s, t), by means of a service function, C(s, t). It is defined by the input/output relation

R′(s, t) = min{ inf_{s≤λ≤t} {R(s, λ) + C(λ, t) + B(s)}, C(s, t) }

where B(s) denotes the initial buffer level in the input channel. The service function of the remaining resource is given by

C′(s, t) = C(s, t) − R′(s, t)
The above definition can be related to the intuitive notion of a greedy component as follows: The output between some time λ and t cannot be larger than C(λ, t), and, therefore, R′(s, t) ≤ R′(s, λ) + C(λ, t), and also R′(s, t) ≤ C(s, t). As the component cannot output more than what was available at the input, we also have R′(s, λ) ≤ R(s, λ) + B(s), and, therefore, R′(s, t) ≤ min{R(s, λ) + C(λ, t) + B(s), C(s, t)}. Let us suppose that there is some last time λ∗ before t when the buffer was empty. At λ∗, we clearly have R′(s, λ∗) = R(s, λ∗) + B(s). In the interval from λ∗ to t, the buffer is never empty and all available resources are used to produce output tokens: R′(s, t) = R(s, λ∗) + B(s) + C(λ∗, t). If the buffer is never empty, we clearly have R′(s, t) = C(s, t), as all available resources are used to produce output tokens. As a result, we obtain the mentioned input–output relation of a GPC.
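The derivation can also be checked numerically. The sketch below (hypothetical names) evaluates the GPC relation on a discrete grid of candidate λ values, which is exact only when events and resource availability change on that grid:

```python
def gpc_output(R, C, B0, s, t, step=1):
    """Greedy processing component in the concrete time domain:
    R'(s, t) = min( inf_{s<=lam<=t} {R(s, lam) + C(lam, t) + B(s)}, C(s, t) ),
    with the infimum taken over the grid points s, s+step, ..., t."""
    n = int(round((t - s) / step))
    lams = [s + k * step for k in range(n + 1)]
    inner = min(R(s, lam) + C(lam, t) + B0 for lam in lams)
    return min(inner, C(s, t))
```

For a burst of three tokens at t = 0 and a unit-rate resource, only two tokens have left the component by t = 2, and all three by t = 5, matching the intuition of greedy processing.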
Note that the above resource and timing semantics model almost all practically relevant processing and communication components (e.g., processors
that operate on tasks and use queues to keep ready tasks, communication
networks, and buses). As a result, we are not restricted to model the processing time with a fixed delay. The service function can be chosen to represent

a resource that is available only in certain time intervals (e.g., time division
multiple access [TDMA] scheduling), or which is the remaining service after
a resource has performed other tasks (e.g., fixed priority scheduling). Note
that a scaler can be used to perform the appropriate conversions between
token and resource units. Figure 1.4 depicts the concrete components considered so far. Note that further models of computation can be described as well, for example, (greedy) shapers that limit the amount of output tokens to a given shaping function, σ, according to R′(s, t) ≤ σ(t − s) (see Section 1.4 and also [19]).


FIGURE 1.4
Examples of component types as described in Section 1.3.2.

1.3.3 Composition
The components shown in Figure 1.4 can now be combined to form a

component network that not only describes the flow of tokens but also the
interaction with the available resources. Figure 1.3b shows the component
network that corresponds to the video decoding example. Here, the components, as introduced in Section 1.3.2, are used. Note that necessary scaler and
tokenizer components are not shown for simplicity, but they are needed to
relate the different units of tokens and resources, and to form tokens out of
partially computed data.
For example, the input events described by the arrival function, RLR , trigger the tasks in the process P3 , which runs on CPU2 whose availability is
described by the service function, C2. The output drives the task in the process P4, which runs on CPU3 at the lower priority. This is modeled by feeding the GPC component with the remaining resources from the process P2.
We can conclude that the flow of event streams is modeled by connecting
the “arrival” ports of the components and the scheduling policy is modeled
by connecting their “service” ports. Other scheduling policies, like nonpreemptive fixed priority, earliest deadline first, TDMA, generalized processor sharing, and various servers, as well as any hierarchical composition of these policies, can be modeled as well (see Section 1.4).

1.4 Modular Performance Analysis with Real-Time Calculus

In the previous section, we have presented the characterization of event and
resource streams, and their transformation by elementary concrete processes.
We denote these characterizations as concrete, as they represent components,
event streams, and resource availabilities in the time domain and work on
concrete stream instances only. However, event and resource streams can
exhibit a large variability in their timing behavior because of nondeterminism and interference. The designer of a real-time system has to provide performance guarantees that cover all possible behaviors of a distributed system


Performance Prediction of Distributed Platforms

13


and its environment. In this section, we introduce the abstraction of the MPA with the RTC [1] (MPA-RTC), which provides the means to capture all possible interactions of event and resource streams in a system and permits the derivation of safe bounds on best-case and worst-case behaviors.
This approach was first presented in [13] and has its roots in network calculus [7]. It permits the analysis of the flow of event streams through a network of heterogeneous computation and communication resources in an embedded platform, and the derivation of hard bounds on its performance.

1.4.1 Variability Characterization
In the MPA, the timing characterization of event streams and of the resource
availability is based on the abstractions of arrival curves and service curves,
respectively. Both models belong to the general class of variability characterization curves (VCCs), which allow one to precisely quantify the best-case and worst-case variabilities of wide-sense-increasing functions [8]. For simplicity, in the rest of the chapter we will use the term VCC to refer to either arrival or service curves.
In the MPA framework, an event stream is described by a tuple of arrival
curves, α(Δ) = [αl (Δ), αu (Δ)], where αl : R≥0 → R≥0 denotes the lower
arrival curve and αu : R≥0 → R≥0 the upper arrival curve of the event
stream. We say that a tuple of arrival curves, α(Δ), conforms to an event
stream described by the arrival function, R(s, t), denoted as α |= R iff for all
t > s we have αl (t − s) ≤ R(s, t) ≤ αu (t − s). In other words, there will be
at least αl (Δ) events and at most αu (Δ) events in any time interval [s, t) with
t − s = Δ.
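Conformance α |= R can be tested on a finite trace by brute force. The sketch below (hypothetical names) checks the bounds on a sampled grid of interval endpoints, which is an approximation of the quantification over all real s < t:

```python
def conforms(timestamps, alpha_l, alpha_u, horizon, step=0.5):
    """Check alpha_l(t - s) <= R(s, t) <= alpha_u(t - s) for all grid
    points 0 <= s < t <= horizon, where R counts events in [s, t)."""
    def R(s, t):
        return sum(1 for x in timestamps if s <= x < t)
    n = int(horizon / step)
    for i in range(n):
        for j in range(i + 1, n + 1):
            s, t = i * step, j * step
            if not alpha_l(t - s) <= R(s, t) <= alpha_u(t - s):
                return False
    return True
```

For example, a strictly periodic trace with period 2 conforms to αl(Δ) = ⌊Δ/2⌋ and αu(Δ) = ⌈Δ/2⌉, whereas a trace with two simultaneous events violates the upper curve.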
In contrast to arrival functions, which describe one concrete trace of an
event stream, a tuple of arrival curves represents all possible traces of a stream.
Figure 1.5a shows an example tuple of arrival curves. Note that any event
stream can be modeled by an appropriate pair of arrival curves, which means
that this abstraction substantially expands the modeling power of standard
event arrival patterns such as sporadic, periodic, or periodic with jitter.
Similarly, the availability of a resource is described by a tuple of service
curves, β(Δ) = [βl (Δ), βu (Δ)], where βl : R≥0 → R≥0 denotes the lower
service curve and βu : R≥0 → R≥0 the upper service curve. Again, we say

that a tuple of service curves, β(Δ), conforms to a resource stream described by the service function, C(s, t), denoted as β |= C iff for all t > s we have
βl (t − s) ≤ C(s, t) ≤ βu (t − s). Figure 1.5b shows an example tuple of service
curves.
Note that, as defined above, the arrival curves are expressed in terms
of events while the service curves are expressed in terms of workload/
service units. However, the component model described in Section 1.4.2
requires the arrival and service curves to be expressed in the same unit.
The transformation of event-based curves into resource-based curves and
vice versa is done by means of so-called workload curves which are VCCs


FIGURE 1.5
Examples of arrival and service curves.
FIGURE 1.6
(a) Abstract and (b) concrete GPCs.

themselves. Basically, these curves define the minimum and maximum
workloads imposed on a resource by a given number of consecutive events,
i.e., they capture the variability in execution demands. More details about
workload transformations can be found in [8]. In the simplest case of a constant workload w for all events, an event-based curve is transformed into a
resource-based curve by simply scaling it by the factor w. This can be done
by an appropriate scaler component, as described in Section 1.3.

1.4.2 Component Model
Distributed embedded systems typically consist of computation and communication elements that process incoming event streams and are mapped
onto several different hardware resources. We denote such event-processing
units as components. For instance, in the system depicted in Figure 1.2, we
can identify six components: the four tasks, P1 , P2 , P3 and P4 , as well as the
two shaper components, S1 and S2 .
In the MPA framework, an abstract component is a model of the processing semantics of a concrete component, for instance, an application task or
a concrete dedicated HW/SW unit. An abstract component models the execution of events by a computation or communication resource and can be


Performance Prediction of Distributed Platforms

15

seen as a transformer of abstract event and resource streams. As an example,
Figure 1.6 shows an abstract and a concrete GPC.
Abstract components transform input VCCs into output VCCs, that is,
they are characterized by a transfer function that relates input VCCs to output VCCs. We say that an abstract component conforms to a concrete component if the following holds: Given any set of input VCCs, let us choose an
arbitrary trace of concrete component inputs (event and resource streams)
that conforms to the input VCCs. Then, the resulting output streams must
conform to the output VCCs as computed using the abstract transfer function. In other words, for any input that conforms to the corresponding input
VCCs, the output must also conform to the corresponding output VCCs.
In the case of the GPC depicted in Figure 1.6, the transfer function Φ of the abstract component is specified by a set of functions that relate the incoming arrival and service curves to the outgoing arrival and service curves. In this case, we have Φ = [fα, fβ] with α′ = fα(α, β) and β′ = fβ(α, β).

1.4.3 Component Examples
In the following, we describe the abstract components of the MPA framework that correspond to the concrete components introduced in Section 1.3:
scaler, tokenizer, OR, AND, GPC, and shaper.
Using the above relation between concrete and abstract components, we
can easily determine the transfer functions of the simple components, tokenizer, scaler, and OR, which are depicted in Figure 1.4.
• Tokenizer: The tokenizer outputs only integer tokens and is characterized by R′(s, t) = ⌊R(s, t)⌋. Using the definition of arrival curves, we simply obtain the abstract transfer function α′u(Δ) = ⌈αu(Δ)⌉ and α′l(Δ) = ⌊αl(Δ)⌋.
• Scaler: As R′(s, t) = w · R(s, t), we get α′u(Δ) = w · αu(Δ) and α′l(Δ) = w · αl(Δ).
• OR: The OR component produces an output for every token at any input: R′(s, t) = R1(s, t) + R2(s, t). Therefore, we find α′u(Δ) = α1u(Δ) + α2u(Δ) and α′l(Δ) = α1l(Δ) + α2l(Δ).
The derivation of the AND component is more complex and its corresponding transfer functions can be found in [4,17].
As described in Section 1.3, a GPC models a task that is triggered by
the events of the incoming event stream, which queue up in a FIFO buffer.
The task processes the events in a greedy fashion while being restricted
by the availability of resources. Such a behavior can be modeled with the
following internal relations that are proven in [17]:∗

∗ The deconvolutions in min-plus and max-plus algebra are defined as (f ⊘ g)(Δ) = sup_{λ≥0} {f(Δ + λ) − g(λ)} and (f ⊘̄ g)(Δ) = inf_{λ≥0} {f(Δ + λ) − g(λ)}, respectively. The convolution in min-plus algebra is defined as (f ⊗ g)(Δ) = inf_{0≤λ≤Δ} {f(Δ − λ) + g(λ)}.


α′u(Δ) = min{(αu ⊗ βu) ⊘ βl, βu}

α′l(Δ) = min{(αl ⊘̄ βu) ⊗ βl, βl}

β′u(Δ) = max{inf_{λ≥Δ} {βu(λ) − αl(λ)}, 0}

β′l(Δ) = sup_{0≤λ≤Δ} {βl(λ) − αu(λ)}

In the example system of Figure 1.2, the processing semantics of the tasks
P1 , P2 , P3 , and P4 can be modeled with abstract GPCs.
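These relations can be evaluated numerically on curves sampled at integer values of Δ. The following sketch is not the RTC toolbox: curve values are plain lists indexed by Δ, and the unbounded supremum of the deconvolution is truncated at a finite slack, which is only an approximation:

```python
def conv(f, g, n):
    """Min-plus convolution (f ⊗ g)(Δ) = inf_{0<=λ<=Δ} {f(Δ-λ) + g(λ)}
    on sample points Δ = 0..n."""
    return [min(f[d - l] + g[l] for l in range(d + 1)) for d in range(n + 1)]

def deconv(f, g, n, slack):
    """Min-plus deconvolution (f ⊘ g)(Δ) = sup_{λ>=0} {f(Δ+λ) - g(λ)},
    truncated at λ <= slack; f must be sampled up to n + slack."""
    return [max(f[d + l] - g[l] for l in range(slack + 1)) for d in range(n + 1)]
```

Convolving two unit-rate curves leaves the rate unchanged, while deconvolving a burst-2, rate-1 curve against a plain rate-1 curve returns the curve shifted up by the burst.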
Finally, let us consider a component that is used for event stream shaping.
A greedy shaper component (GSC) with a shaping curve σ delays events of
an input event stream such that the output event stream has σ as an upper
arrival curve. Additionally, a greedy shaper guarantees that no events are
delayed longer than necessary. Typically, greedy shapers are used to reshape
bursty event streams and to reduce global buffer requirements. If the abstract
input event stream of a GSC with the shaping curve, σ, is represented by the

tuple of arrival curves, [αl , αu ], then the output of the GSC can be modeled
as an abstract event stream with arrival curves:
α′uGSC = αu ⊗ σ

α′lGSC = αl ⊗ (σ ⊘ σ)

Note that a greedy shaper does not need any computation or communication resources. Thus, the transfer function of an abstract GSC considers only
the ingoing and the outgoing event stream, as well as the shaping curve, σ.
More details about greedy shapers in the context of MPA can be found in [19].
In the example system of Figure 1.2, the semantics of the shapers, S1 and
S2 , can be modeled with abstract GSCs.
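On the same sampled representation, the upper output curve of a greedy shaper is a single min-plus convolution. A sketch with a hypothetical name, where σ is given as a list of samples:

```python
def shaper_output_upper(alpha_u, sigma, n):
    """Upper arrival curve at a greedy shaper's output: α'u = αu ⊗ σ,
    evaluated on the sample points Δ = 0..n."""
    return [min(alpha_u[d - l] + sigma[l] for l in range(d + 1))
            for d in range(n + 1)]
```

A stream with burst 4 and rate 2 pushed through a shaper with burst 2 and rate 1 leaves at the shaper's rate, illustrating how the shaper smooths out bursts.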

1.4.4 System Performance Model
In order to analyze the performance of a distributed embedded platform, it
is necessary to build a system performance model. This model has to represent the hardware architecture of the platform. In particular, it has to reflect
the mapping of tasks to computation or communication resources and the
scheduling policies adopted by these resources.
To obtain a performance model of a system, we first have to model the
event streams that trigger the system, the computation and communication
resources that are available, and the processing components. Then, we have
to interconnect the arrival and service inputs and outputs of all these elements so that the architecture of the system is correctly represented.
Figure 1.7 depicts the MPA performance model for the example system
described in Figure 1.2. Note that the outgoing abstract service stream of
GPC2 is used as the ingoing abstract service stream for GPC4 , i.e., GPC4
gets only the resources that are left by GPC2 . This represents the fact that
the two tasks share the same processor and are scheduled according to a



pre-emptive fixed priority scheduling policy with GPC2 having a higher priority than GPC4.

FIGURE 1.7
Performance model for the example system in Figure 1.2.
In general, scheduling policies for shared resources can be modeled by
the way the abstract resources β are distributed among the different abstract tasks. For some scheduling policies, such as earliest deadline first (EDF) [16],
TDMA [20], nonpreemptive fixed priority scheduling [4], various kinds of
servers [16], or any hierarchical composition of these elementary policies,
abstract components with appropriate transfer functions have been introduced. Figure 1.8 shows some examples of how to model different scheduling policies within the MPA framework.

1.4.5 Performance Analysis
The performance model provides the basis for the performance analysis
of a system. Several performance characteristics such as worst-case end-toend delays of events or buffer requirements can be determined analytically
within the MPA framework.
The performance of each abstract component can be determined as a
function of the ingoing arrival and service curves by the formulas of the RTC.
For instance, the maximum delay, dmax , experienced by an event of an event
stream with arrival curves, [αl , αu ], that is processed by a GPC on a resource
with service curves, [βl , βu ], is bounded by
dmax ≤ sup_{λ≥0} inf{τ ≥ 0 : αu(λ) ≤ βl(λ + τ)} =: Del(αu, βl)

The maximum buffer space, bmax , that is required to buffer an event stream
with arrival curves, [αl , αu ], that is processed by a GPC on a resource with
service curves, [βl , βu ], is bounded by
bmax ≤ sup_{λ≥0} {αu(λ) − βl(λ)} =: Buf(αu, βl)
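Both bounds are easy to evaluate on sampled curves. The sketch below (hypothetical names; exact only on the grid, and the delay search saturates at the grid end) computes the maximum vertical and horizontal distances between αu and βl:

```python
def buf_bound(alpha_u, beta_l, n):
    """Buf(αu, βl) = sup_λ {αu(λ) - βl(λ)}: maximum vertical distance
    between the curves on the sample points 0..n."""
    return max(alpha_u[l] - beta_l[l] for l in range(n + 1))

def del_bound(alpha_u, beta_l, n):
    """Del(αu, βl) = sup_λ inf{τ >= 0 : αu(λ) <= βl(λ + τ)}: maximum
    horizontal distance, searched on an integer grid up to n."""
    worst = 0
    for lam in range(n + 1):
        tau = 0
        while lam + tau <= n and beta_l[lam + tau] < alpha_u[lam]:
            tau += 1
        worst = max(worst, tau)
    return worst
```

For a burst-2, rate-1 arrival curve on a rate-2 resource, the backlog bound is the burst (2) and the delay bound is one time unit.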


FIGURE 1.8
Modeling scheduling policies in the MPA framework: (a) preemptive fixed priority, (b) EDF, (c) TDMA, and (d) generalized processor sharing.
Figure 1.9 shows the graphical interpretation of the maximum delay experienced by an event at a GPC and the maximum buffer requirement of the
GPC: dmax corresponds to the maximum horizontal distance between αu
and βl , and bmax corresponds to the maximum vertical distance between αu
and βl .
In order to compute the end-to-end delay of an event stream over several consecutive GPCs, one can simply add the single delays at the various components. Besides this strictly modular approach, one can also use a holistic delay analysis that takes into consideration that in a chain of tasks the worst-case burst cannot appear simultaneously in all tasks. (This phenomenon is described as “pay burst only once” [7].) For such a task chain the total delay can be tightened to

dmax ≤ Del(αu, β1l ⊗ β2l ⊗ · · · ⊗ βnl)
For an abstract GSC, the maximum delay and the maximum backlog are bounded by

dmax = Del(αu, σ)
bmax = Buf(αu, σ)


FIGURE 1.9
Graphical interpretation of dmax and bmax.

Let us come back to the example of Figure 1.2. By applying the above reasoning, the worst-case end-to-end delay for the packets of the two video streams can be analytically bounded by

dHR ≤ Del(αuHR, β1l ⊗ β3l ⊗ σ1)

dLR ≤ Del(αuLR, β2l ⊗ β′3l ⊗ σ2)


1.4.6 Compact Representation of VCCs
The performance analysis method presented above relies on computations
on arrival and service curves. While the RTC provides compact mathematical representations for the different operations on curves, their computation
in practice is typically more involved. The main issue is that the VCCs are
defined for the infinite range of positive real numbers. However, any computation on these curves requires a finite representation.
To overcome this problem, we introduce a compact representation for
special classes of VCCs. In particular, we consider piecewise linear VCCs
that are finite, periodic, or mixed.
• Finite piecewise linear VCCs consist of a finite set of linear segments.
• Periodic piecewise linear VCCs consist of a finite set of linear segments
that are repeated periodically with a constant offset between consecutive repetitions.
• Mixed piecewise linear VCCs consist of a finite set of linear segments
that are followed by a second finite set of linear segments that are
repeated periodically, again with a constant offset between consecutive repetitions.
Figure 1.10a through c shows examples of these three classes of curves.
Many practically relevant arrival and service curves are piecewise linear.
FIGURE 1.10
(a) A finite piecewise linear VCC, (b) a periodic piecewise linear VCC, and (c) a mixed piecewise linear VCC.

For example, if a stream consists of discrete tokens, the corresponding arrival curve is integer valued and can be represented as a piecewise constant
function. For the service curves, one could use the same reasoning, as the basic
resource units (number of clock cycles, number of bytes, etc.) are typically
also atomic. However, these units are often too fine-grained for a practical
analysis and hence it is preferable to use a continuous model. In most practical
applications, the fluid resource availability is piecewise constant over time,
that is, practically relevant service curves are also piecewise linear.
Here, we want to note that there are also piecewise linear VCCs that are
not covered by the three classes of curves that we have defined above. In
particular, we have excluded irregular VCCs, that is, VCCs with an infinite
number of linear segments that do not eventually show periodicity.
However, most practically relevant timing specifications for event streams
and availability specifications for resources can be captured by either finite,



periodic, or mixed piecewise linear VCCs. In addition, note that VCCs only
describe bounds on token or resource streams, and, therefore, one can always
safely approximate an irregular VCC by a mixed piecewise linear VCC.
In the following, we describe how these three classes of curves can be represented by means of a compact data structure. First, we note that a single linear segment of a curve can be represented by a triple ⟨x, y, s⟩ with x ∈ R≥0 and y, s ∈ R, which specifies a straight line in the Cartesian coordinate system that starts at the point (x, y) and has slope s. Further, a piecewise linear VCC can be represented as a (finite or infinite) sequence ⟨x1, y1, s1⟩, ⟨x2, y2, s2⟩, . . . of such triples with xi < xi+1 for all i. To obtain the curve defined by such a sequence, the single linear segments are simply extended with their slopes until the x-coordinate of the starting point of the next segment is reached.
The key property of the three classes of VCCs defined above is that these
VCCs can be represented with a finite number of segments, which is fundamental for practical computations: Let ρ be a lower or an upper VCC belonging to a set of finite, periodic, or mixed VCCs. Then ρ can be represented with
a tuple
νρ = ⟨ΣA, ΣP, px, py, xp0, yp0⟩
where
ΣA is a sequence of linear segments describing a possibly existing irregular
initial part of ρ
ΣP is a sequence of linear segments describing a possibly existing regularly
repeated part of ρ
If ΣP is not an empty sequence, then the regular part of ρ is defined by
the period px and the vertical offset py between two consecutive repetitions
of ΣP , and the first occurrence of the regular sequence ΣP starts at (xp0 , yp0 ).
In this compact representation, we call ΣA the aperiodic curve part and ΣP
the periodic curve part.
In the compact representation, a finite piecewise linear VCC has ΣP = {},
that is, it consists of only the aperiodic part, ΣA , with xA,1 = 0. A periodic
piecewise linear VCC can be described with ΣA = {}, xP,1 = 0, and xp0 = 0,
that is, it has no aperiodic part. And finally, a mixed piecewise linear VCC is
characterized by xA,1 = 0, xP,1 = 0, and xp0 > 0.
As an example, consider the mixed piecewise linear VCC depicted in Figure 1.10c. Its compact representation according to the definition above is given by the tuple

νC = ⟨⟨⟨0, 1, 0⟩, ⟨0.2, 2, 0⟩, ⟨0.4, 3, 0⟩, ⟨0.6, 4, 0⟩⟩, ⟨⟨0, 0, 0⟩⟩, 2, 1, 2, 5⟩
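This tuple can be turned into an executable data structure. The following Python sketch (the class and field names are our own illustration; the RTC Toolbox implements this representation in Java) evaluates such a curve at a point Δ by first searching the aperiodic segments and then unfolding the periodic part:

```python
from dataclasses import dataclass

@dataclass
class Curve:
    """Compact representation ⟨ΣA, ΣP, px, py, xp0, yp0⟩ of a mixed
    piecewise linear VCC; each segment is a triple (x, y, s)."""
    seg_a: list            # aperiodic part ΣA
    seg_p: list            # periodic part ΣP (may be empty)
    px: float = 0.0        # period of the periodic part
    py: float = 0.0        # vertical offset between repetitions
    xp0: float = 0.0       # the first repetition starts at (xp0, yp0)
    yp0: float = 0.0

    def __call__(self, delta):
        if not self.seg_p or delta < self.xp0:
            segs, base = self.seg_a, 0.0
        else:
            k = int((delta - self.xp0) // self.px)   # repetition index
            delta -= self.xp0 + k * self.px          # position inside ΣP
            segs, base = self.seg_p, self.yp0 + k * self.py
        # take the last segment starting at or before delta, extend its slope
        x, y, s = max((t for t in segs if t[0] <= delta), key=lambda t: t[0])
        return base + y + s * (delta - x)

# The curve of Figure 1.10c:
vC = Curve(seg_a=[(0, 1, 0), (0.2, 2, 0), (0.4, 3, 0), (0.6, 4, 0)],
           seg_p=[(0, 0, 0)], px=2, py=1, xp0=2, yp0=5)
# vC(0) → 1, vC(1) → 4, vC(2) → 5, vC(9) → 8
```

The evaluation reproduces the staircase of Figure 1.10c: the aperiodic steps up to Δ = 2 ms, then one additional event every 2 ms.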

The described compact representation of VCCs is used as a basis for practical computations in the RTC framework. All the curve operators adopted
in the RTC (minimum, maximum, convolutions, deconvolutions, etc.) are

closed on the set of mixed piecewise linear VCCs. This means that the result
of the operators, when applied to finite, periodic, or mixed piecewise linear



VCCs, is again a mixed piecewise linear VCC. Further details about the compact representation of VCCs and, in particular, on the computation of the
operators can be found in [17].

1.5 RTC Toolbox
The framework for the MPA with the RTC that we have described in this
chapter has been implemented in the RTC Toolbox for MATLAB® [21], which is available online.
The RTC Toolbox is a powerful instrument for system-level performance
analysis of distributed embedded platforms. At its core, the toolbox provides
a MATLAB type for the compact representation of VCCs (see details in Section 1.4) and an implementation of a set of the RTC curve operations. Built
around this core, the RTC Toolbox provides libraries to perform the MPA,
and to visualize VCCs and the related data.
Figure 1.11 shows the underlying software architecture of the toolbox.
The RTC toolbox internally consists of a kernel that is implemented in Java,
and a set of MATLAB libraries that connect the Java kernel to the MATLAB command line interface. The kernel consists of classes for the compact
representation of VCCs and classes that implement the RTC operators. These
two principal components are supported by classes that provide various
utilities. On top of these classes, the Java kernel provides APIs that provide
methods to create compact VCCs, compute the RTC operations, and access
parts of the utilities.

[Figure 1.11 depicts the layer stack, top to bottom: MATLAB command line; RTC Toolbox (MPA library, VCC library, RTC operators); MATLAB/Java interface; Java API (min-plus/max-plus algebra and utilities; compact representation of VCCs).]

FIGURE 1.11
Software architecture of the RTC toolbox.



The Java kernel is accessed from MATLAB via the MATLAB Java Interface. However, this access is completely hidden from the user, who only
uses the MATLAB functions provided by the RTC libraries. The MATLAB
libraries of the RTC Toolbox provide functions to create VCCs, plot VCCs,
and apply operators of the RTC on VCCs. From the point of view of the user,
the VCCs are MATLAB data types, even if internally they are represented
as Java objects. Similarly, the MATLAB functions for the RTC operators are
wrapper functions for the corresponding methods that are implemented in
the Java kernel.
On top of the VCC and the RTC libraries, there is the MPA library. It
provides a set of functions that facilitate the use of the RTC Toolbox for the
MPA. In particular, it contains functions to create commonly used arrival

and service curves, as well as functions to conveniently compute the outputs
of the various abstract components of the MPA framework.

1.6 Extensions

In the previous sections, we have introduced the basics of the MPA approach
based on the RTC. Recently, several extensions have been developed to refine
the analysis method.
In [4], the existing methods for analyzing heterogeneous multiprocessor
systems are extended to nonpreemptive scheduling policies. In this work,
more complex task-activation schemes are investigated as well. In particular,
components with multiple inputs and AND- or OR-activation semantics are
introduced.
The MPA approach also supports the modeling and analysis of systems
with dynamic scheduling policies. In [16], a component for the modeling of
the EDF scheduling is presented. This work also extends the ability of the
MPA framework to model and analyze hierarchical scheduling policies by
introducing appropriate server components. The TDMA policies have been
modeled using the MPA as well [20].
In Section 1.4, we have briefly described the GSC. More details about
traffic shaping in the context of multiprocessor embedded systems and the
embedding of the GSC component into the MPA framework can be found
in [19].
In many embedded systems, the events of an event stream can have
various types and impose different workloads on the systems depending on
their types. Abstract stream models for the characterization of streams with
different event types are introduced in [18]. In order to get more accurate
analysis results, these models make it possible to capture and exploit the knowledge

about correlations and dependencies between different event types in a stream.
Further, in distributed embedded platforms, there often exist correlations
in the workloads imposed by events of a given type on different system



components. In [22], a model is introduced to capture and characterize such
workload correlations in the framework of the MPA. This work shows that
the exploitation of workload correlations can lead to considerably improved
analysis results.
The theory of real-time interfaces is introduced in [14]. It connects the
principles of the RTC and the interface-based embedded system design [2].
The real-time interfaces represent a powerful extension of the MPA framework. They permit an abstraction of the component behavior into interfaces.
This means that a system designer does not need to understand the details
of a component’s implementation, but only needs to know its interface in
order to ensure that the component will work properly in the system. Before
the introduction of the real-time interfaces, the MPA method was limited
to the a posteriori analysis of component-based real-time system designs.
With the real-time interfaces, it is possible to compose systems that are
correct by construction.

1.7 Concluding Remarks
In this chapter, we have introduced the reader to the system-level performance prediction of distributed embedded platforms in the early design
stages. We have defined the problem and given a brief overview of
approaches to performance analysis.
Starting from a simple application scenario, we have presented a formal
system description method in the time domain. We have described its usefulness for the simulation of concrete system executions, but at the same time

we have pointed out that the method is inappropriate for worst-case analysis, as in general it cannot guarantee the coverage of corner cases.
Driven by the need to provide hard performance bounds for distributed
embedded platforms, we have generalized the formalism to an abstraction
in the time interval domain based on the VCCs and the RTC. We have presented the essential models underlying the resulting framework for the MPA
and we have demonstrated its application. Finally, we have described a compact representation of the VCCs that enables an efficient computation of RTC
curve operations in practice, and we have presented the RTC Toolbox for
MATLAB, the implementation of the MPA analysis framework.

Acknowledgments
The authors would like to thank Ernesto Wandeler for contributing to some
part of this chapter and Nikolay Stoimenov for helpful comments on an earlier version.



References
1. S. Chakraborty, S. Künzli, and L. Thiele. A general framework
for analysing system properties in platform-based embedded system
designs. In Design Automation and Test in Europe (DATE), pp. 190–195,
Munich, Germany, March 2003. IEEE Press.
2. L. de Alfaro and T. A. Henzinger. Interface theories for component-based
design. In EMSOFT ’01: Proceedings of the First International Workshop on
Embedded Software, pp. 148–165, London, U.K., 2001. Springer-Verlag.
3. M. G. Harbour, J. J. Gutiérrez García, J. C. Palencia Gutiérrez, and J. M. Drake Moyano. MAST: Modeling and analysis suite for real-time applications. In Proceedings of the 13th Euromicro Conference on Real-Time Systems, pp. 125–134, Delft, the Netherlands, 2001. IEEE Computer Society.
4. W. Haid and L. Thiele. Complex task activation schemes in system level
performance analysis. In 5th International Conference on Hardware/Software

Codesign and System Synthesis (CODES+ISSS’07), pp. 173–178, Salzburg,
Austria, October 2007.
5. S. Künzli, F. Poletti, L. Benini, and L. Thiele. Combining simulation and
formal methods for system-level performance analysis. In Design Automation and Test in Europe (DATE), pp. 236–241, Munich, Germany, 2006. IEEE
Computer Society.
6. K. Lahiri, A. Raghunathan, and S. Dey. System-level performance analysis for designing on-chip communication architectures. IEEE Transactions
on CAD of Integrated Circuits and Systems, 20(6):768–783, 2001.
7. J.-Y. Le Boudec and P. Thiran. Network Calculus: A Theory of Deterministic
Queuing Systems for the Internet. Springer-Verlag, New York, Inc., 2001.
8. A. Maxiaguine, S. Künzli, and L. Thiele. Workload characterization
model for tasks with variable execution demand. In Design Automation
and Test in Europe (DATE), pp. 1040–1045, Paris, France, February 2004.
IEEE Computer Society.
9. The Open SystemC Initiative (OSCI).
10. J. C. Palencia Gutiérrez and M. G. Harbour. Schedulability analysis for
tasks with static and dynamic offsets. In Proceedings of the 19th Real-Time
Systems Symposium, Madrid, Spain, 1998. IEEE Computer Society.
11. T. Pop, P. Eles, and Z. Peng. Holistic scheduling and analysis of mixed
time/event-triggered distributed embedded systems. In CODES ’02:


Proceedings of the Tenth International Symposium on Hardware/Software
Codesign, pp. 187–192, New York, 2002. ACM.

12. K. Richter, M. Jersak, and R. Ernst. A formal approach to MPSoC performance verification. IEEE Computer, 36(4):60–67, 2003.
13. L. Thiele, S. Chakraborty, and M. Naedele. Real-time calculus for
scheduling hard real-time systems. In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), volume 4, pp. 101–104, Geneva, Switzerland, 2000.
14. L. Thiele, E. Wandeler, and N. Stoimenov. Real-time interfaces for composing real-time systems. In International Conference on Embedded Software (EMSOFT '06), pp. 34–43, Seoul, Korea, 2006.
15. K. Tindell and J. Clark. Holistic schedulability analysis for distributed
hard real-time systems. Microprocessing and Microprogramming, 40(2–3):117–134,
1994.
16. E. Wandeler and L. Thiele. Interface-based design of real-time systems
with hierarchical scheduling. In 12th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), pp. 243–252, San Jose, CA,
April 2006.
17. E. Wandeler. Modular performance analysis and interface-based design for embedded real-time systems. PhD thesis, ETH Zürich, 2006.
18. E. Wandeler, A. Maxiaguine, and L. Thiele. Quantitative characterization
of event streams in analysis of hard real-time applications. Real-Time Systems, 29(2):205–225, March 2005.
19. E. Wandeler, A. Maxiaguine, and L. Thiele. Performance analysis of
greedy shapers in real-time systems. In Design, Automation and Test in
Europe (DATE), pp. 444–449, Munich, Germany, March 2006.
20. E. Wandeler and L. Thiele. Optimal TDMA time slot and cycle length
allocation. In Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 479–484, Yokohama, Japan, January 2006.
21. E. Wandeler and L. Thiele. Real-Time Calculus (RTC) Toolbox, 2006.
22. E. Wandeler and L. Thiele. Workload correlations in multi-processor
hard real-time systems. Journal of Computer and System Sciences, 73(2):207–
224, March 2007.


2
SystemC-Based Performance Analysis of
Embedded Systems
Jürgen Schnerr, Oliver Bringmann, Matthias Krause, Alexander Viehl, and Wolfgang Rosenstiel


CONTENTS
2.1 Introduction ............................................................... 28
2.2 Performance Analysis of Distributed Embedded Systems ..................... 29
    2.2.1 Analytical Approaches .............................................. 29
    2.2.2 Simulative Approaches .............................................. 30
    2.2.3 Hybrid Approaches .................................................. 31
2.3 Transaction-Level Modeling ............................................... 32
    2.3.1 Accuracy and Speed Trade-Off during Refinement Process ............. 33
          2.3.1.1 Communication Refinement ................................... 33
          2.3.1.2 Computation Refinement of Software Applications ............ 34
2.4 Proposed Hybrid Approach for Accurate Software Timing Simulation ......... 35
    2.4.1 Back-Annotation of WCET/BCET Values ................................ 36
    2.4.2 Annotation of SystemC Code ......................................... 38
    2.4.3 Static Cycle Calculation of a Basic Block .......................... 40
    2.4.4 Modeling of Pipeline for a Basic Block ............................. 40
          2.4.4.1 Modeling with the Help of Reservation Tables ............... 41
          2.4.4.2 Calculation of Pipeline Overlapping ........................ 42
    2.4.5 Dynamic Correction of Cycle Prediction ............................. 43
          2.4.5.1 Branch Prediction .......................................... 43
          2.4.5.2 Instruction Cache .......................................... 43
          2.4.5.3 Cache Model ................................................ 44
          2.4.5.4 Cache Analysis Blocks ...................................... 44
          2.4.5.5 Cycle Calculation Code ..................................... 45
    2.4.6 Consideration of Task Switches ..................................... 46
    2.4.7 Preemption of Software Tasks ....................................... 46
2.5 Experimental Results ..................................................... 47
2.6 Outlook .................................................................. 50
2.7 Conclusions .............................................................. 50
References ................................................................... 51

This chapter presents a methodology for SystemC-based performance analysis of embedded systems. This methodology is based on a cycle-accurate
simulation approach for the embedded software that also allows the integration of abstract SystemC models. Compared to existing simulation-based
approaches, a hybrid method is presented that resolves performance issues

by combining the advantages of simulation-based and analytical approaches.
In the first step, cycle-accurate static execution time analysis is applied at
each basic block of a cross-compiled binary program using static processor models. After that, the determined timing information is back-annotated
into SystemC for a fast simulation of all effects that cannot be resolved statically. This allows the consideration of data dependencies during runtime,
and the incorporation of branch prediction and cache models by efficient
source-code instrumentation. The major benefit of our approach is that the
generated code can be executed very efficiently on the simulation host with
approximately 90% of the speed of the untimed software without any code
instrumentation.
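The back-annotation idea outlined above can be sketched in a few lines of Python. The cycle counts and the misprediction penalty below are invented for illustration; the actual approach instruments cross-compiled SystemC code with statically analyzed WCET/BCET values per basic block:

```python
class TimedTask:
    """Accumulates statically determined per-basic-block cycle counts and
    applies a dynamic correction for mispredicted branches."""
    MISPREDICT_PENALTY = 3  # illustrative pipeline-flush cost in cycles

    def __init__(self):
        self.cycles = 0

    def consume(self, block_cycles):
        # statically analyzed cost of one basic block, back-annotated here
        self.cycles += block_cycles

    def branch(self, taken, predicted_taken):
        # dynamic correction: penalty only when the prediction was wrong
        if taken != predicted_taken:
            self.cycles += self.MISPREDICT_PENALTY

def run(task, data):
    # each "basic block" of the instrumented program carries a cycle count
    task.consume(5)                  # entry block
    for x in data:
        task.consume(4)              # loop body block
        task.branch(taken=x > 0, predicted_taken=True)  # predictor: taken
    task.consume(2)                  # exit block
    return task.cycles

# run(TimedTask(), [1, -1, 2]) accumulates 5 + 3*4 + 1*3 + 2 = 22 cycles
```

Because the annotations are plain additions on the host, the instrumented program runs at nearly native speed, which is the key to the simulation performance reported later in the chapter.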

2.1 Introduction
In the future, new system functionality will be realized less by the sum of single components and more by the cooperation, interconnection, and distribution of these components, thereby leading to distributed embedded systems.
Furthermore, new applications and innovations arise more and more from
a distribution of functionality as well as from a combination of previously
independent functions. Therefore, in the future, this distribution will play an
important part in the increase of the product value.

The system responsibility of the supplier is also currently increasing. This
is because the supplier is not only responsible for the designed subsystem,
but additionally for the integration of the subsystem in the context of the
entire system. This integration is becoming more complex: today, the requirements of single components are validated; in the future, the requirements validation of the entire system has to be achieved with regard to the designed
component.
What this means is that changes in the product area will lead to a paradigm shift in the design: even in the design stage, the impact of a component on the entire system has to be considered. This calls for a comprehensive modeling of distributed systems together with an early analysis and simulation of the system integration.
Therefore, a methodical design process of distributed embedded systems
has to be established, taking into account the timing behavior of the embedded software very early in the design process. This methodical design process can be implemented by using a comprehensive modeling of distributed
systems and by using a platform-independent development of the application software (UML [6], MATLAB®/Simulink® [24], and C++).
What is also important is the early inclusion of the intended target platform in the model-based system design (UML), the mapping of function
blocks on platform components, and the use of virtual prototypes for the
abstract modeling of the target architecture.



An early evaluation of the target platform means that the application
software can be evaluated while considering the target platform. Hence, an
optimization of the target platform under consideration of the application
software, performance requirements, power dissipation, and reliability can
take place.
An early analysis of the system integration is provided by an early verification and exposure of integration faults using virtual prototypes. After that,
a seamless transition to the physical prototype can take place.

2.2 Performance Analysis of Distributed Embedded Systems

The main question of performance analysis of distributed embedded systems
is: What is the global timing behavior of a system and how can it be determined? The central issue is that computation has no timing behavior as long
as the target platform is not known because the target platform has a major
effect on timing.
The specification, however, can contain global performance requirements. The fulfillment of these requirements depends on local timing behaviors of system parts. A solution for determining local timing properties is an
early inclusion of the target architecture.
Several analytical and simulative approaches for performance analysis
have previously been proposed. In this chapter, a hybrid approach for performance analysis will be presented.

2.2.1 Analytical Approaches
Analytical approaches perform a formal analysis of pessimistic corner cases
based on a system model. Corner cases are hard bounds of the temporal system behavior. The approaches can be divided into two categories: black-box
approaches and white-box approaches. Furthermore, both approaches can
be categorized depending on the level of system abstraction and with regard
to the model of computation that is employed.
Black-box approaches consider functional system components as black
boxes and abstract from their internal behavior.
Black-box abstraction commonly uses a task model [33] with abstract task
activation and event streams representing activation patterns [34] at the task
level. Using event stream propagation, fixed points are calculated. For this,
no modification of the event streams is necessary. Examples for black-box
approaches are the real-time calculus (see Chapter 1 or [44]), the system-level composition by event stream propagation as it is used in SymTA/S (see
Chapter 3 or [11]), the MAST framework [9], and the framework proposed by
Pop et al. [31].
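Such abstract event streams are commonly parameterized by a period P, a jitter J, and a minimum inter-arrival distance d. The resulting bounds on the number of events in any time interval Δ can be sketched as follows (a common textbook formulation; the cited frameworks may define the functions slightly differently):

```python
from math import ceil, floor

def eta_plus(delta, P, J, d=0):
    """Upper bound on the number of events in any interval of length delta
    for a periodic event stream with period P, jitter J, min. distance d."""
    if delta <= 0:
        return 0
    bound = ceil((delta + J) / P)
    if d > 0:
        # the minimum distance caps short-term bursts
        bound = min(bound, ceil(delta / d))
    return bound

def eta_minus(delta, P, J):
    """Lower bound on the number of events in any interval of length delta."""
    return max(0, floor((delta - J) / P))

# period 10 ms, jitter 2 ms, min. distance 1 ms:
# eta_plus(10, 10, 2, 1) → 2, eta_minus(10, 10, 2) → 0
```

Propagating such bounds from component to component until they no longer change is exactly the fixed-point iteration mentioned above.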




White-box approaches include an abstract control-flow representation of
each process within the system model. Then, a global performance and communication analysis considering (data-dependent) control structures of all
processes can take place. For this analysis, an extraction of the control flow
from the application software or from UML models [47] is required. Then,
the environment can be modeled using event models or processes. Examples
for white-box approaches are the communication dependency analysis [41],
the control-flow-based extraction of hierarchical event streams [1], and timed
automata [27].
Analytical approaches that only rely on best-case and worst-case timing
estimates are very often too pessimistic, hence risk estimation for concrete
scenarios is difficult to carry out. Different probabilistic analytic approaches
attempt to tackle this issue by considering probabilities of timing quantities
in white-box system analysis.
Timed Petri nets [49] are able to represent the internal behavior of a
system. Although there exist stochastic extensions by generalized stochastic Petri nets (GSPN) [23], these do not consider execution times of the actual
system components. Furthermore, synchronization by communication and
the specification of communication protocols have to be modeled explicitly and cannot be extracted from executable functional implementations of
a design.
System-level performance and power estimation based on stochastic
automata networks (SAN) are introduced in [22]. The system including
probabilities of execution times is modeled explicitly in SAN. The actual
execution behavior of the components related to timing and control flow of a
functional implementation is not considered. Stochastic automata [3] extend
the model of communicating I/O automata [42] by general probability distributions for verifying performance requirements of systems. The system and
timing probabilities have to be modeled explicitly and no bottom-up evaluation of a functional system implementation is given.

2.2.2 Simulative Approaches
Simulative approaches perform a simulation of the entire communication
infrastructure and the processing elements. If necessary, this simulation includes hardware IP.

Depending on the underlying model of computation, a network simulator such as OPNET [28], Simulink, or SystemC [14] can be employed
to simulate a network between communicating C/C++ processes. Timing
annotation of such a network simulation is possible, but the exact timing
behavior of the software is missing. To obtain this timing behavior, it is necessary to simulate the software execution on the target processor. For this
simulation, the binary code for the target platform component is required.
This binary code can run on an instruction set simulator (ISS). An ISS is
an abstract model for executing instructions at the binary level and can be
implemented either as an interpreter or as a binary code translator. It does

