Model-Based Design for Embedded Systems

Nicolescu/Model-Based Design for Embedded Systems 67842_C001 Finals Page 26 2009-10-1
2 SystemC-Based Performance Analysis of Embedded Systems
Jürgen Schnerr, Oliver Bringmann, Matthias Krause, Alexander Viehl,
and Wolfgang Rosenstiel
CONTENTS
2.1 Introduction
2.2 Performance Analysis of Distributed Embedded Systems
2.2.1 Analytical Approaches
2.2.2 Simulative Approaches
2.2.3 Hybrid Approaches
2.3 Transaction-Level Modeling
2.3.1 Accuracy and Speed Trade-Off during Refinement Process
2.3.1.1 Communication Refinement
2.3.1.2 Computation Refinement of Software Applications
2.4 Proposed Hybrid Approach for Accurate Software Timing Simulation
2.4.1 Back-Annotation of WCET/BCET Values
2.4.2 Annotation of SystemC Code
2.4.3 Static Cycle Calculation of a Basic Block
2.4.4 Modeling of Pipeline for a Basic Block
2.4.4.1 Modeling with the Help of Reservation Tables
2.4.4.2 Calculation of Pipeline Overlapping
2.4.5 Dynamic Correction of Cycle Prediction
2.4.5.1 Branch Prediction
2.4.5.2 Instruction Cache
2.4.5.3 Cache Model
2.4.5.4 Cache Analysis Blocks
2.4.5.5 Cycle Calculation Code
2.4.6 Consideration of Task Switches
2.4.7 Preemption of Software Tasks
2.5 Experimental Results
2.6 Outlook
2.7 Conclusions
References
This chapter presents a methodology for SystemC-based performance anal-
ysis of embedded systems. This methodology is based on a cycle-accurate
simulation approach for the embedded software that also allows the inte-
gration of abstract SystemC models. Compared to existing simulation-based
approaches, a hybrid method is presented that resolves performance issues
by combining the advantages of simulation-based and analytical approaches.
In the first step, cycle-accurate static execution time analysis is applied at
each basic block of a cross-compiled binary program using static proces-
sor models. After that, the determined timing information is back-annotated
into SystemC for a fast simulation of all effects that cannot be resolved stat-
ically. This allows the consideration of data dependencies during runtime,
and the incorporation of branch prediction and cache models by efficient
source-code instrumentation. The major benefit of our approach is that the
generated code can be executed very efficiently on the simulation host with
approximately 90% of the speed of the untimed software without any code
instrumentation.
2.1 Introduction
In the future, new system functionality will be realized less by the sum of
single components and more by the cooperation, interconnection, and distribution
of these components, thereby leading to distributed embedded systems.
Furthermore, new applications and innovations arise more and more from
a distribution of functionality as well as from a combination of previously
independent functions. Therefore, in the future, this distribution will play an
important part in the increase of the product value.
The system responsibility of the supplier is also currently increasing. This
is because the supplier is not only responsible for the designed subsystem,
but additionally for the integration of the subsystem in the context of the
entire system. This integration is becoming more complex: today, requirements
of single components are validated; in the future, the requirements validation
of the entire system has to be achieved with regard to the designed
component.
What this means is that changes in the product area will lead to a para-
digm shift in the design. Even in the design stage, the impact of a compo-
nent on an entire system has to be considered. A comprehensive modeling
of distributed systems, and an early analysis and simulation of the system
integration have to be considered.
Therefore, a methodical design process of distributed embedded systems
has to be established, taking into account the timing behavior of the embedded
software very early in the design process. This methodical design process
can be implemented by using a comprehensive modeling of distributed
systems and by using a platform-independent development of the application
software (UML [6], MATLAB®/Simulink® [24], and C++).

What is also important is the early inclusion of the intended target plat-
form in the model-based system design (UML), the mapping of function
blocks on platform components, and the use of virtual prototypes for the
abstract modeling of the target architecture.
An early evaluation of the target platform means that the application
software can be evaluated while considering the target platform. Hence, an
optimization of the target platform under consideration of the application
software, performance requirements, power dissipation, and reliability can
take place.
An early analysis of the system integration is provided by an early verifi-
cation and exposure of integration faults using virtual prototypes. After that,
a seamless transition to the physical prototype can take place.
2.2 Performance Analysis of Distributed Embedded Systems
The main question of performance analysis of distributed embedded systems
is: What is the global timing behavior of a system and how can it be deter-
mined? The central issue is that computation has no timing behavior as long
as the target platform is not known because the target platform has a major
effect on timing.
The specification, however, can contain global performance require-
ments. The fulfillment of these requirements depends on local timing behav-
iors of system parts. A solution for determining local timing properties is an
early inclusion of the target architecture.
Several analytical and simulative approaches for performance analysis
have previously been proposed. In this chapter, a hybrid approach for per-
formance analysis will be presented.
2.2.1 Analytical Approaches
Analytical approaches perform a formal analysis of pessimistic corner cases
based on a system model. Corner cases are hard bounds of the temporal system
behavior. The approaches can be divided into two categories: black-box
approaches and white-box approaches. Furthermore, both approaches can
be categorized depending on the level of system abstraction and with regard
to the model of computation that is employed.
Black-box approaches consider functional system components as black
boxes and abstract from their internal behavior.
Black-box abstraction commonly uses a task model [33] with abstract task
activation and event streams representing activation patterns [34] at the task
level. Using event stream propagation, fixed points are calculated. For this,
no modification of the event streams is necessary. Examples of black-box
approaches are the real-time calculus (see Chapter 1 or [44]), the system-level
composition by event stream propagation as used in SymTA/S (see
Chapter 3 or [11]), the MAST framework [9], and the framework proposed by
Pop et al. [31].
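As an illustration of what an event stream at the task level can look like, the following minimal C++ sketch bounds the number of task activations in a time window. It assumes the common periodic-with-jitter event model, which the text above does not spell out; the function name eta_plus and its parameters are hypothetical and not taken from any of the cited frameworks.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on the number of activations in any time window of length dt
// for a periodic event stream with the given period and jitter
// (all values in the same time unit; periodic-with-jitter model is an assumption).
// Computes ceil((dt + jitter) / period) in integer arithmetic.
std::uint64_t eta_plus(std::uint64_t dt, std::uint64_t period, std::uint64_t jitter) {
    assert(period > 0);
    if (dt == 0) return 0;  // an empty window contains no events
    return (dt + jitter + period - 1) / period;
}
```

A black-box analysis would use such a bound as the input activation pattern of a task and derive an output bound from it, iterating over all tasks until the bounds no longer change.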
White-box approaches include an abstract control-flow representation of
each process within the system model. Then, a global performance and com-
munication analysis considering (data-dependent) control structures of all
processes can take place. For this analysis, an extraction of the control flow
from the application software or from UML models [47] is required. Then,
the environment can be modeled using event models or processes. Examples
of white-box approaches are the communication dependency analysis [41],
the control-flow-based extraction of hierarchical event streams [1], and timed
automata [27].
Analytical approaches that rely only on best-case and worst-case timing
estimates are very often too pessimistic; hence, risk estimation for concrete
scenarios is difficult to carry out. Different probabilistic analytic approaches
attempt to tackle this issue by considering probabilities of timing quantities
in white-box system analysis.
Timed Petri nets [49] are able to represent the internal behavior of a
system. Although there exist stochastic extensions by generalized stochas-
tic Petri nets (GSPN) [23], these do not consider execution times of the actual
system components. Furthermore, synchronization by communication and
the specification of communication protocols have to be modeled explic-
itly and cannot be extracted from executable functional implementations of
a design.
System-level performance and power estimation based on stochastic
automata networks (SAN) are introduced in [22]. The system including
probabilities of execution times is modeled explicitly in SAN. The actual
execution behavior of the components related to timing and control flow of a
functional implementation is not considered. Stochastic automata [3] extend
the model of communicating I/O automata [42] by general probability distri-
butions for verifying performance requirements of systems. The system and
timing probabilities have to be modeled explicitly and no bottom-up evalua-
tion of a functional system implementation is given.
2.2.2 Simulative Approaches
Simulative approaches perform a simulation of the entire communication
infrastructure and the processing elements. If necessary, this simulation
includes a hardware IP.
Depending on the underlying model of computation, a network simulator
such as OPNET [28], Simulink, or SystemC [14] can be employed
to simulate a network between communicating C/C++ processes. Timing
annotation of such a network simulation is possible, but the exact timing
behavior of the software is missing. To obtain this timing behavior, it is nec-
essary to simulate the software execution on the target processor. For this
simulation, the binary code for the target platform component is required.
This binary code can run on an instruction set simulator (ISS). An ISS is
an abstract model for executing instructions at the binary level and can be
implemented either as an interpreter or as a binary code translator. It does
not consider modeling of the bus behavior. The binary code translation can
be realized in two different ways: either as a static or as a dynamic compila-
tion, also called the just-in-time (JIT) compilation [26]. An ISS is used in sev-
eral commercial solutions, like the CoWare Processor Designer [5], CoMET
from VaST Systems Technology [45], or Synopsys Virtual Platforms [43].
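The interpretive variant of an ISS can be sketched in a few lines of C++. The instruction set, register count, and cycle costs below are invented for illustration and do not correspond to any real processor or to the commercial tools named above; the sketch only shows the fetch, decode, execute loop with a cycle counter.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy instruction set; opcodes and cycle costs are illustrative only.
enum class Op : std::uint8_t { Addi, Add, Bnez, Halt };

struct Insn {
    Op op;
    int rd;   // destination register (tested register for Bnez)
    int rs;   // source register
    int imm;  // immediate value, or relative branch offset for Bnez
};

struct IssResult {
    std::int64_t regs[4];
    std::uint64_t cycles;
};

// Interpret the program one instruction at a time and count cycles.
IssResult run_iss(const std::vector<Insn>& program) {
    IssResult r{{0, 0, 0, 0}, 0};
    std::int64_t pc = 0;
    while (pc >= 0 && pc < static_cast<std::int64_t>(program.size())) {
        const Insn& i = program[static_cast<std::size_t>(pc)];
        assert(i.rd >= 0 && i.rd < 4 && i.rs >= 0 && i.rs < 4);
        switch (i.op) {
        case Op::Addi:  // rd = rs + imm
            r.regs[i.rd] = r.regs[i.rs] + i.imm; r.cycles += 1; ++pc; break;
        case Op::Add:   // rd = rd + rs
            r.regs[i.rd] += r.regs[i.rs]; r.cycles += 1; ++pc; break;
        case Op::Bnez:  // branch if rd != 0; a taken branch costs more cycles
            if (r.regs[i.rd] != 0) { pc += i.imm; r.cycles += 3; }
            else { ++pc; r.cycles += 1; }
            break;
        case Op::Halt:
            r.cycles += 1; return r;
        }
    }
    return r;
}
```

Every simulated instruction passes through the decode switch, which is the per-instruction overhead that a binary code translator avoids by translating the code once.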
Furthermore, the binary code can be executed using a processor model
that captures the complete processor (functional units, pipelines, caches,
registers, counters, I/Os, etc.). Such a model can have several levels of accuracy.
For example, it can be a transaction-level model or a register transfer
model. Since our approach uses transaction-level modeling (TLM), we will
describe the different levels of abstraction of TLM models in more detail in
Section 2.3.
In addition to simulating the processor, peripheral components and cus-
tom hardware have to be simulated as well, either by a co-simulation with
HDL (hardware description language) simulators or by using SystemC.
An abstract processor model with an integrated RTOS (real-time operat-
ing system) model using task scheduling was presented in [35]. Addition-
ally, a processor model using neural networks for execution-cycle estimation
was presented in [30]. A transaction-level approach for the performance eval-
uation of SoC (System-on-Chip) architectures was presented in [48]. This
approach is trace-based, and, therefore, cannot guarantee a sufficient path
coverage of control-flow-dominated applications.
Furthermore, the integration of a so-called cycle-approximate retargetable
processor model for software performance estimation at the transaction
level was presented in [13]. The major drawback of this approach
is that microarchitecture-dependent properties are measured on the target
platform and are included probabilistically during execution. The comparatively
low deviation from on-board measurements of only 8% results from the
fact that the reference measurements used the same examples and input data
that the models were built from. It is likely that data-dependent effects will
lead to larger accuracy errors.
2.2.3 Hybrid Approaches
Hybrid approaches combine the advantages of analytical and simulative
approaches. A hybrid approach for combining simulation and formal anal-
ysis for tightening bounds of system-level performance analysis was pre-
sented in [20]. The objectives are to determine timing characteristics of
nonformally specified components by simulation and to integrate simula-
tion results into a framework for formal performance analysis. In compari-
son to the approach shown in [20], we focus on a fast timing simulation of
the embedded software. The results determined using our approach may be
included in system-level performance methodologies with the benefit of high
accuracy and large time savings in the simulation stage.
Analytic performance risk quantification based on profiled execution
times is presented in [46]. The model is derived from physical
implementations. Although it is able to represent the temporal behavior of
communication, computation, and synchronization, data-dependent timing
effects cannot be detected reliably.
A hybrid model for fast simulation that allows switching between
native code execution and ISS-based simulation was presented in [17].
Another approach using a hybrid model was shown in [38] and [36]. This
approach is based on the translation of object code into annotated
binary code for the target processor. For the cycle-accurate execution of the
annotated code on this processor, special hardware is needed.
2.3 Transaction-Level Modeling
TLM is a high-level approach to modeling systems in which computation and
communication between system modules are separated for each module of
the proposed target architecture. Components that are described at different
levels of abstraction can be integrated and exchanged in one common sys-
tem model using standardized interfaces. Furthermore, an exploration and a
refinement of components and their implementation in the global architec-
ture can be performed.
Transaction-level models address the problem of designing increasingly
complex systems by raising the level of design abstraction above the register
transfer level (RTL). The Open SystemC Initiative (OSCI) Transaction-Level
Working Group has defined different levels of abstraction. Of these abstrac-
tion levels, transaction-level models apply at the levels between the Algo-
rithmic Level (AL) and the RTL. These levels are introduced in [2] and also
are briefly presented here.
• Algorithmic Level (AL): Purely behavioral, no architectural detail
whatsoever.
• Untimed (UT) Modeling: Notion of simulation time is not required,
each process runs up to the next explicit synchronization point before
yielding.
• Loosely Timed (LT) Modeling: The simulation time is used, but pro-
cesses are temporally decoupled from the simulation time. Each pro-
cess keeps a tally of the time it consumes, and may yield because it
reaches an explicit synchronization point or because it has consumed
its time quantum.
• Approximately Timed (AT) Modeling: Processes run in lockstep with
the SystemC simulation time. Delays of process interactions are anno-
tated by using timeouts (wait) or timed event notifications.
• Register Transfer Level (RTL): Describes the registers and the
combinational logic.
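The temporal decoupling of loosely timed modeling can be sketched in plain C++ without the SystemC kernel. The class LocalTimeTally and the function run_decoupled are hypothetical names chosen for this sketch; they only mimic the idea of a per-process time tally and a time quantum and are not the SystemC or TLM API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-process time tally (not a SystemC/TLM class).
class LocalTimeTally {
public:
    explicit LocalTimeTally(std::uint64_t quantum_ns) : quantum_ns_(quantum_ns) {
        assert(quantum_ns > 0);
    }
    // Account for consumed time locally, without involving the global clock.
    void consume(std::uint64_t ns) { local_ns_ += ns; }
    // True once the process has used up its time quantum and should yield.
    bool must_yield() const { return local_ns_ >= quantum_ns_; }
    // Yield: hand the accumulated local time to the global clock and reset the tally.
    void sync(std::uint64_t& global_ns) {
        global_ns += local_ns_;
        local_ns_ = 0;
        ++syncs_;
    }
    std::uint64_t syncs() const { return syncs_; }
private:
    std::uint64_t quantum_ns_;
    std::uint64_t local_ns_ = 0;
    std::uint64_t syncs_ = 0;
};

// Run `steps` computation steps of `step_ns` each; return the number of yields.
std::uint64_t run_decoupled(std::uint64_t steps, std::uint64_t step_ns,
                            std::uint64_t quantum_ns, std::uint64_t& global_ns) {
    LocalTimeTally tally(quantum_ns);
    for (std::uint64_t i = 0; i < steps; ++i) {
        tally.consume(step_ns);
        if (tally.must_yield()) tally.sync(global_ns);  // quantum exhausted
    }
    tally.sync(global_ns);  // explicit synchronization point at the end
    return tally.syncs();
}
```

With ten steps of 30 ns and a quantum of 100 ns, this process yields three times, whereas a process running in lockstep with the simulation time would synchronize after every step. The total simulated time is the same in both cases.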

2.3.1 Accuracy and Speed Trade-Off during Refinement Process
The proposed approach allows for an early incorporation of the effects of
the underlying target platform into the embedded software design. Platform
architectures are not limited to single-core processors with simple communi-
cation architectures. The approach also applies to multi-core architectures
and distributed embedded systems with complex network architectures,
for instance, networks of interconnected electronic control units (ECUs) in
the automotive domain. This flexibility requires a seamless refinement flow
for the embedded software beginning at the platform-independent software
down to the platform-specific target software. By stepwise refinement of the
system model, a design at lower levels of abstraction, where the simulation
is more accurate at the expense of increasing the simulation time, can be
obtained. Two different refinement strategies have to be distinguished: com-
putation refinement and communication refinement. Computation refine-
ment is especially applicable for single-processor embedded systems without
a special focus on communication aspects. In this case, the complexity of exe-
cuting a cross-compiled binary code may be acceptable. But with an increas-
ing number of processing units and network complexity (e.g., hierarchical
automotive networks consisting of FlexRay, CAN, LIN, and MOST buses),
the simulation speed for analyzing the timing influences of the embed-
ded software on the distributed system becomes unacceptable. This issue
is addressed by a highly scalable performance simulation approach for net-
worked embedded systems because the integration of the ISSs with a high
simulation time into each processing element becomes obsolete. A decreas-
ing simulation time is specifically enabled by keeping computation at a
high level of abstraction whereas communication is refined to a lower level
or vice versa. During the refinement flow, different levels of abstraction
are traversed. This strategy is supported by the TLM in SystemC. More
detailed information about the modeling and refinement of SystemC simulation
models within the scope of the automotive embedded software and
AUTOSAR [10] is presented in [19].
2.3.1.1 Communication Refinement
As shown in Figure 2.1, there exists a communication scheme at the UT
level that is called point-to-point communication. The point-to-point com-
munication can be timed or untimed. A timed representation means that
an abstract timing behavior is provided by use of wait(T) statements,
which are allowed to be introduced within the point-to-point communica-
tion. However, only certain cases can be considered during simulation. The
consideration of all cases possibly results in an infinite or at least in an unac-
ceptable simulation time. This is a general problem of simulation, and only a
formal analysis can solve this problem to cover each corner case of the system
behavior. Such a method is also introduced in [39] and [40].
[Figure 2.1: refinement flow with the stages UT (untimed/timed p-2-p communication), UT/LT (untimed/timed structural communication), timing-approximate communication, and AT (cycle-accurate communication); bus labels in the figure: CDMA, CAN.]
FIGURE 2.1
The communication refinement flow. (From Krause, M. et al., Des. Automat.
Embed. Syst., 10, 237, 2005. With permission.)
The refinement from untimed modeling to loosely timed modeling intro-
duces abstract or dedicated buses respectively. The ports and interfaces
of the untimed modeling remain and only the channel implementation is
replaced. Figure 2.1 illustrates the communication refinement process for a
CAN bus.
Refinement from the TLM to the RTL description means replacing trans-
actions by signals. This refinement technique is described in [8] in detail.
2.3.1.2 Computation Refinement of Software Applications
Considering computation, the design is transformed to a structural represen-
tation by specifying the desired target architecture. Using untimed modeling,
processes are still simulated as parallel processes by the SystemC simulation
kernel. The most important impact on a software realization is the implemented
scheduling of threads that are assigned to the same processing elements.
The refinement from an unstructured to a structured execution order is
done by introducing a scheduler model to the system description, or, for
more detailed modeling, an abstract RTOS model. However, this requires the
specification of preemption points. Together with such preemption points,
the timing information of the runtime is annotated. This chapter presents an
approach on how to obtain and integrate the accurate timing information.
Figure 2.2 illustrates the computation refinement process. Detailed informa-
tion about refinement is presented in [18].
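The role of preemption points in a scheduler model can be sketched as follows. This is a simplified, assumed model rather than the abstract RTOS model of the chapter: each task is a list of segments with annotated runtimes, the end of every segment is a preemption point, and a fixed-priority scheduler decides only at these points. The Task structure and the schedule function are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Task {
    int priority;                         // larger value = higher priority (assumption)
    std::uint64_t release;                // time at which the task becomes ready
    std::vector<std::uint64_t> segments;  // annotated runtime between preemption points
    std::size_t next;                     // index of the next segment to run
    std::uint64_t finish;                 // completion time, filled in by schedule()
};

// Fixed-priority scheduling in which a task switch can occur only at preemption points.
void schedule(std::vector<Task>& tasks) {
    std::uint64_t now = 0;
    for (;;) {
        Task* pick = nullptr;
        bool pending = false;
        std::uint64_t next_release = UINT64_MAX;
        for (Task& t : tasks) {
            if (t.next >= t.segments.size()) continue;  // task already finished
            pending = true;
            if (t.release <= now) {
                if (!pick || t.priority > pick->priority) pick = &t;
            } else if (t.release < next_release) {
                next_release = t.release;
            }
        }
        if (!pending) return;                          // all tasks done
        if (!pick) { now = next_release; continue; }   // idle until the next release
        assert(pick->next < pick->segments.size());
        now += pick->segments[pick->next++];           // run to the next preemption point
        if (pick->next == pick->segments.size()) pick->finish = now;
    }
}
```

In the example used for the test below, a high-priority task released at time 50 has to wait until the running low-priority task reaches its preemption point at time 80. Placing preemption points more densely reduces this blocking at the cost of more scheduler invocations.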
[Figure 2.2: refinement flow with the stages UT (untimed/timed parallel processes), UT/LT (untimed/timed scheduled processes), scheduled processes with approximate timing, and AT (cycle-accurate computation); component labels in the figure: RTOS, CPU, CAN.]
FIGURE 2.2
The computation refinement flow. (From Krause, M. et al., Des. Automat.
Embed. Syst., 10, 238, 2005. With permission.)
2.4 Proposed Hybrid Approach for Accurate Software Timing Simulation
In this section, a hybrid approach for the performance simulation of the
embedded software [37] will be presented. Hybrid approaches consist of
a combination of analytic and simulative approaches with the objective of
gaining simulation speed while maintaining sufficient accuracy.
Integration into a global refinement flow for the software, down to
the cycle-approximate level, is enabled by the automated generation of the TLM
interfaces.
The static worst-case/best-case execution time (WCET/BCET) analysis
abstracts the influence of data dependencies on the software execution time.
Because of this, the WCET/BCET analysis delivers very good results for
entire basic blocks, but it is too pessimistic across basic block boundaries.
Furthermore, the effects of a concurrent cache usage of different applications
on multi-core architectures lead to even wider bounds. An analytic solution
for this issue is still unknown. The objective of the presented approach is the
reduction of pessimism that is contained in the WCET/BCET boundaries.
Simulative techniques that consider an application with concrete input
data and the target architecture can be used to determine the timing behavior
of the software on the underlying architecture. The proposed approach tries to
prevent repeated time-consuming interpretation and repeated timing deter-
mination of all executed binary code instructions on the target architecture.
The hybrid approach provided in this chapter applies back-annotation
of the WCET/BCET values. These values are determined statically at the
basic block level using the binary code that was generated from the C source
code. Additionally, the timing impact of data-dependent architectural prop-
erties such as branch prediction is also considered effectively. The tool that
implements the proposed methodology generates the SystemC code. This
code can be compiled for any host machine to be used for a target platform-
independent simulation.
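The idea of back-annotation with dynamic correction can be sketched in plain C++. This is a minimal illustration under stated assumptions and not the code generated by the tool described in this chapter: the per-block cycle counts, the misprediction penalty, and the trivial branch predictor are invented placeholders, and the names CycleCounter and sum_positive are hypothetical. In a SystemC model, the accumulated cycles would eventually be converted into simulation time, which is not shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Cycle counter updated by the instrumented source code.
struct CycleCounter {
    std::uint64_t cycles = 0;
    bool last_taken = false;  // trivial "same as last time" predictor (assumption)

    // Static part: cycles determined per basic block from the target binary.
    void block(std::uint64_t static_cycles) {
        assert(cycles + static_cycles >= cycles);  // guard against overflow
        cycles += static_cycles;
    }
    // Dynamic correction: add a penalty whenever the predictor guessed wrong.
    void branch(bool taken, std::uint64_t mispredict_penalty) {
        if (taken != last_taken) cycles += mispredict_penalty;
        last_taken = taken;
    }
};

// Instrumented version of a loop that sums all positive elements.
// The cycle values 2, 3, 1, 1 and the penalty 4 are invented placeholders.
int sum_positive(const std::vector<int>& v, CycleCounter& cc) {
    cc.block(2);              // basic block 0: function entry, sum = 0
    int sum = 0;
    for (int x : v) {
        cc.block(3);          // basic block 1: load element, compare, branch
        bool taken = x > 0;
        cc.branch(taken, 4);  // data-dependent effect resolved at simulation time
        if (taken) {
            cc.block(1);      // basic block 2: sum += x
            sum += x;
        }
    }
    cc.block(1);              // basic block 3: return
    return sum;
}
```

The function runs natively on the simulation host and computes the same result as the uninstrumented code. Only the counter updates are added, which is why this style of simulation can stay close to the speed of the untimed software.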
Communication calls in the automatically created SystemC models are
encapsulated in the TLM [7] communication primitives. In this way, a clean
and standardized ability to integrate the timed embedded software in virtual
SystemC prototypes is provided.
One major advantage of the presented methodology is in the area of
multi-core processors with shared caches. Whereas static analysis has no
knowledge of concurrent cache usage of different applications and the
impact on execution time, the presented methodology is able to handle these
issues. How this is done will be described in more detail in Section 2.4.6.
Another possibility would be a translation of the binary code into the
annotated SystemC code. One of the main advantages of such an approach
is that no source code is needed, as the binary code is used for determining
cycle counts and for generating the SystemC code. Another advantage is that
