
Model-Based Design for Embedded Systems- P41 doc


Nicolescu/Model-Based Design for Embedded Systems 67842_C012 Finals Page 366 2009-10-1
366 Model-Based Design for Embedded Systems
[Figure 12.7 block diagram; labels: PLBv46 slave, PLBv46 master burst, slave buffer interface, LocalLink write buffer, LocalLink read buffer, reset logic, interrupt generation, bridge control logic, DCR slave, control bus, bridge status signals, interrupt request, reset, reset request, bus macro enable, PLB bus (two), DCR bus, reconfigured region, reconfigurable socket.]
FIGURE 12.7
Reconfigurable socket abstraction based on the “PLBv46 PLBv46 bridge”
architecture. The “PLBv46 slave” and “PLBv46 master burst” blocks are stan-
dard IP components and all blocks except the DCR slave block are part of the
bridge. Bus macros are implicitly present on all signals crossing the bound-
ary of the reconfigured region.
An alternative is to architect the interface around a bus bridge, with inde-
pendent busses in the static region and in the reconfigurable region. The
design of the socket is based on partitioning the Xilinx “PLBv46 PLBv46
bridge” IP [23], as shown in the block diagram in Figure 12.7. Internally this
core is based around 32-bit fixed-width data FIFOs and a small number of
control signals. Most of the bridge is treated as part of the static region, with
only a small amount of logic required in the reconfigurable region to com-
plete the bridge. In addition to the bus interface, which is primarily used to
interface to the reconfigured region, the socket core also contains a control
interface (based on the DCR protocol [7]) which is used to generate an inde-
pendent reset signal to the reconfigurable region and to force signals driven
by the reconfigurable module to stable values during reconfiguration.
12.5.3 Direct Memory Access Interfaces
The bus interface above is a generic and flexible interface, which can be used
to communicate with the reconfigured portion of the system in different
ways. For instance, it may be used by the processor to both send and receive
data from the reconfigured region or as a control interface to set parame-
ter values of IP cores executing in the reconfigured region. However, it does
have several disadvantages. First, the bandwidth of data to or from the
FPGA Platforms for Embedded Systems 367
processor is limited because of the overhead of bus arbitration and because
accesses to this memory range are treated as uncached I/O transactions. Although
performance could be improved somewhat for large transactions by using
DMA engines or treating data transfer regions as cached and manually man-
aging cache coherency, this would significantly increase the complexity of
the processor software. Secondly, many FPGA algorithms require access to
external memory for buffering data until it can be processed. For instance, in
a network router, packet data may need to be stored until a routing decision
can be made, or in a streaming video system, several frames of video data
may need to be stored to analyze object motion between frames.
Because of these limitations, it is best to consider the bus interface above
as primarily an interface used for low-bandwidth control and configuration
information. In systems that require higher bandwidth communication, or
direct access to external memory, the control interface can be augmented
with additional interfaces to memory. Although it may seem straightforward
to include a complementary bus bridge that can be driven by the reconfig-
ured region to provide this functionality, this tends not to be the highest
bandwidth option since performance can be limited by the arbitration logic
of the PLB bus. This logic is heavily pipelined in order to maximize the bus
throughput under a wide variety of usage, typically incurring three cycles of
latency before a slave can respond to a bus access.
One solution is to provide an interface connected directly to the native
port interface (NPI) of the Xilinx MPMC IP core, as shown in Figure 12.8.

[Figure 12.8 block diagram; labels: external memory (e.g., DDR/DDR2), physical interface, multiported memory controller, arbiter, FIFO (two), PIM (two).]
FIGURE 12.8
Architecture of the Xilinx MPMC.
Typically, this interface exhibits both lower latency and higher bandwidth
than the PLB bus. Although the MPMC must still arbitrate between different
ports attempting to use the memory controller, this arbitration can be per-
formed locally within the memory controller and concurrently with the data
being provided. The only disadvantage of connecting directly to the mem-
ory controller is that other IP cores in the static region cannot be accessed
from the reconfigured region. However, since in the SRP usage model these
IP cores are likely being managed by device drivers in the operating system
of the processor, it is questionable whether such access should be allowed
anyway.
12.5.4 External Interfaces
In addition to communicating with the static region, a reconfigurable mod-
ule may also communicate with other interfaces external to the FPGA. In
order to accomplish this, a reconfigurable region may include external I/O
pins and/or high-speed serial transceivers. For the most part, these resources
can be treated like any other FPGA primitives and can be placed and routed
as usual.
However, there is some complexity with regard to external I/O pins,
since in many FPGA designs, the input/output buffer (IOB) primitives rep-
resenting external I/O pins are not explicitly instantiated in a user design but
are inferred in the synthesis process. Normally in a hierarchical design, the
netlist can be synthesized using a special option to disable inference of these
primitives, since they will be inferred or instantiated during synthesis of the
top-level design. However, when building a generic FPGA platform, relying
on this may not be desirable, since the reconfigured region may require more
control over the configuration of these primitives. In other cases, exactly
which IOB primitives are explicitly instantiated in a reconfigurable module
and which ones are not may not be known when the static design is synthe-
sized and implemented. One way to solve this is to not expose any I/O pins
of the reconfigurable region as external signals of the static region, implying
that synthesis of the static design will never include IOB primitives for these
pins. When a reconfigurable module is synthesized, signals interfacing with
the static region are individually tagged with the constraint BUFFER_TYPE
set to NONE, indicating that no IOB primitives should be inferred for those
signals.
High-speed serial transceivers also have additional design complexity,
since each transceiver is associated with specialized clock resources in the
FPGA. These clock resources typically include phase-locked loops for clock
synchronization and dedicated clock distribution paths and may be shared
between transceivers. From the perspective of building FPGA platforms, this
resource sharing combined with how transceivers are grouped into configu-
ration frames may need to be considered during the floorplanning stage in
order to gain maximum usage of the available transceivers.
[Figure 12.9 flow diagram. Static design flow, tools: EDK base system builder + hand design, EDK BSP generator, Linux design flow, EDK platgen, floorplanning, PR-enabled NGDBuild/Map/PAR, PR-enabled bitgen, EDK genace.tcl; files: .mhs, .ucf, .ngc, .dts, .elf, .ncd, .bit, static.used, static.ace. Module design flow, tools: UCF merge, EDK platgen, EDK hand design, hand design, PR-enabled NGDBuild/Map/PAR, PRMergeDesign + PR-enabled bitgen, gcc + objcopy, EDK genace.tcl; files: .mhs, .ucf, .ngc, .ncd, meta-information, C code, partial.bit, .bit, merged.ace, configure.elf.]
FIGURE 12.9
Design flow for PR systems based on EDK.
12.5.5 Implementation Flow

The implementation flow for the system is shown in Figure 12.9. The static
design is implemented first, as shown in the left-hand side of the figure,
using the EA PR tools. During this sequence, no netlist for the reconfigurable
region is present, and the place and route tools only implement logic for the
static region. Design constraints are provided in a .ucf file and must include
the required floorplanning constraints for the PR flow. After routing is com-
pleted, the routing resources used by the static logic are saved in the file
static.used for later use. Since by default the interface with the reconfigured
region is driven to an idle state, the resulting bitstream can be used in a sys-
tem without programming the remainder of the FPGA. The device tree for
a particular design is generated from the EDK design, and after being con-
verted to a binary device tree blob, can be included in the Linux kernel image,
or stored as the initial value of a BRAM in the bitstream. Lastly, EDK is used
to package the FPGA bitstream with the Linux kernel binary in a bootable
image that can be used with Xilinx SystemACE [24] to boot the kernel.
The right-hand side of Figure 12.9 shows a second pass for the imple-
mentation of a reconfigurable module. During this pass, the logic of the
reconfigurable module is implemented together with a small portion of the
static logic called the “context logic.” The context logic is necessary to pro-
vide the context of the reconfigurable module, so that hierarchical names
in the design and location constraints for clock signals and bus macros
can be preserved. The design constraints for implementation are created by
merging the design constraints from the static design with any additional
design constraints specific to the reconfigurable module, such as pin loca-
tion constraints. During this pass, the routing resources in the file static.used
are excluded from use, since these resources are already used in the static
design. The final bitstream for the reconfigurable module is generated by
first merging the design database (contained in an .ncd file) from both
passes, ensuring that the configuration bits used in the static design are
programmed correctly. In addition, design rule checks and timing analysis can
be applied to the merged design database, to ensure that individual passes
were implemented correctly. From the merged design database, it is possi-
ble to generate both a partial bitstream that can be used after configuration
with the static bitstream and a merged bitstream which can be used as an ini-
tial configuration bitstream, with the reconfigurable module already loaded.
To enable reconfiguration in a Linux system, the partial bitstream is encapsu-
lated with the Linux code for performing PR and the meta-information about
the reconfigurable module, to generate a Linux executable, as described in
Section 12.6.
12.6 Managing Partial Reconfiguration in Linux
Two device drivers are used to manage the reconfiguration process. The first,
the device driver for the ICAP device, performs the actual reconfiguration.
When a partial bitstream is written to this device (for instance, using
the cp command or the write() system call), the bytes are transferred to
the ICAP. Since the device driver does not inspect or modify the stream of
bytes, the data being written must include the appropriate control words, as
expected by the configuration interface [26]. The device driver also includes
simple locking of the ICAP resource, in order to prevent different processes
from unexpectedly interleaving accesses to the ICAP. Readback is also possi-
ble using this device driver by writing the correct readback request bitstream
to the ICAP and subsequently reading data (using the read() system call).
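
As a rough illustration of the write path described above, the following sketch copies a partial bitstream file to a device node byte for byte, as the cp command would. The function name, the chunk size, and whatever device path is passed in are assumptions made for illustration; the driver's actual device node name is not given here.

```python
import os


def write_bitstream(bitstream_path, device_path, chunk_size=4096):
    """Copy a partial bitstream, unmodified, to an already existing device node.

    Returns the number of bytes written. The data is passed through as-is,
    so the file must already contain the control words the configuration
    interface expects.
    """
    total = 0
    with open(bitstream_path, "rb") as src:
        fd = os.open(device_path, os.O_WRONLY)
        try:
            while True:
                chunk = src.read(chunk_size)
                if not chunk:
                    break
                # os.write may accept fewer bytes than offered, so loop
                # until the whole chunk has been handed to the driver.
                view = memoryview(chunk)
                while view:
                    written = os.write(fd, view)
                    view = view[written:]
                    total += written
        finally:
            os.close(fd)
    return total
```

A caller would pass the path of the ICAP device node (something like /dev/icap0, a hypothetical name) as device_path. Because the driver locks the ICAP, two such writers should not be able to interleave their data.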
The second device driver used to manage reconfiguration is associated
with the reconfigurable socket core. This driver exports a character interface
to which meta-information about a reconfigurable module can be written. A
simple way of representing this meta-information is in the form of an array
of struct platform_device, a data structure which is used internally
by Linux to represent devices. A more complex, but perhaps more robust
[Figure 12.10 flow diagram; steps as labeled: reconfigure FPGA, notify kernel of devices, load kernel modules, enable bus macros, reset reconfigurable module, processing, unload kernel modules, release devices, disable bus macros.]
FIGURE 12.10
The reconfiguration process.
representation of meta-information could be an additional device tree blob.
This meta-information is parsed and checksummed and, if valid, is used to
notify the Linux kernel of the presence of new devices, which can then be
bound to other device drivers. An invalid checksum is interpreted as an indi-
cation to unbind any previously loaded devices and release ownership of the
reconfigured region. Secondarily, this device driver also enables and disables
the bus macros between the static region and the reconfigured region, and
controls the reset of the reconfigured region. As with the ICAP device driver,
the socket device driver includes a simple locking mechanism in order to
prevent a process from unexpectedly reconfiguring an active region in use
by another process.
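
The checksum-driven behavior of the socket driver can be sketched in user-space terms as follows. The wire format (a 4-byte big-endian CRC-32 followed by the payload), the function names, and the list standing in for the kernel's bound devices are all assumptions for illustration; the source does not specify the format or the checksum algorithm.

```python
import struct
import zlib


def pack_meta(payload: bytes) -> bytes:
    """Prefix the payload with its CRC-32 (hypothetical wire format)."""
    return struct.pack(">I", zlib.crc32(payload) & 0xFFFFFFFF) + payload


def parse_meta(blob: bytes):
    """Return the payload if the checksum matches, otherwise None."""
    if len(blob) < 4:
        return None
    (expected,) = struct.unpack(">I", blob[:4])
    payload = blob[4:]
    if (zlib.crc32(payload) & 0xFFFFFFFF) != expected:
        return None
    return payload


def handle_write(blob: bytes, bound_devices: list) -> str:
    """Model of one write to the socket driver's character interface."""
    payload = parse_meta(blob)
    if payload is None:
        # Invalid checksum: unbind everything and release the region.
        bound_devices.clear()
        return "released"
    # Valid checksum: announce the new device description.
    bound_devices.append(payload)
    return "registered"
```

In this model, writing any blob with a bad checksum (for example, an empty write) serves as the request to release the region, which mirrors the convention described above.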
The complete process of reconfiguration is shown in Figure 12.10. In the
initial state, we assume no module is loaded in the reconfigured region. Next,
a reconfigurable module is loaded into the FPGA through the ICAP device
driver. Next, meta-information about the reconfigurable module is sent to the
socket device driver, which registers the presence of any new devices, resets
the newly loaded module, and enables the interface between the static region
and the reconfigurable module. At this point, although Linux is aware of the
presence of the reconfigured devices, it may not have device drivers
appropriate to those devices. Next, device drivers for new devices are provided
by loading the appropriate kernel modules and the Linux kernel binds those
device drivers to the reconfigured devices. At this point, application code
may use the device drivers to communicate with the reconfigured region.
A similar sequence of steps in reverse order occurs to unbind the device
drivers and release the reconfigured region so that different processing
may occur.
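
One way to picture the load sequence and its reverse-order teardown is an undo stack: each setup step that needs undoing pushes its counterpart, and teardown pops them. The sketch below only records step names in a log; the step order follows the prose above, and the function names and the split of the socket driver's work into separate steps are assumptions for illustration.

```python
from typing import Callable, List, Optional, Tuple

# A step is (action, undo-action or None when nothing needs undoing).
Step = Tuple[Callable[[], None], Optional[Callable[[], None]]]


def run_module(steps: List[Step], work: Callable[[], None]) -> None:
    """Run setup steps in order, do the work, then undo in reverse order."""
    undo_stack = []
    try:
        for do, undo in steps:
            do()
            if undo is not None:
                undo_stack.append(undo)
        work()
    finally:
        # Teardown mirrors setup: the last step set up is the first undone.
        while undo_stack:
            undo_stack.pop()()


def demo() -> List[str]:
    """Record the order of operations for one load/process/unload cycle."""
    log: List[str] = []

    def note(msg: str) -> Callable[[], None]:
        return lambda: log.append(msg)

    steps: List[Step] = [
        (note("write partial bitstream to ICAP driver"), None),
        (note("register devices via socket driver"), note("release devices")),
        (note("reset reconfigurable module"), None),
        (note("enable bus macros"), note("disable bus macros")),
        (note("load kernel modules"), note("unload kernel modules")),
    ]
    run_module(steps, note("processing"))
    return log
```

Because teardown runs in a finally clause, the region is still released if the processing step raises an exception, which is the property a long-running user process would want.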
Since the ICAP device and the control interface of the socket are exposed
through device drivers, it is relatively straightforward to implement recon-
figuration through a regular user process. One possibility for implementing
this involves linking the bitstream and meta-information into a single exe-
cutable along with the code for reconfiguration. The process created when
this executable is executed can be controlled through any operating system
mechanism (such as POSIX signals) to manage the life cycle of the module
loaded in the FPGA. The executable can also be linked together with other
application code, resulting in a familiar processor-centric usage model for
the FPGA fabric. This approach is similar in spirit, but greatly different in
implementation from that proposed in [18], which performs essentially the
same processes using the Linux kernel’s ability to implement new executable
formats.
It is important to recognize that although the reconfiguration process
is managed by a user process, it must be treated as a privileged opera-
tion executed as the root user, since there are many places where both
unintended errors and malicious attacks may result in unintended behav-
ior. Some of these places are not specific to the PR process, such as loading
kernel modules, whereas others are more subtle vulnerabilities. For instance,
as noted before, partial bitstreams have significant constraints on how they
are constructed and are specific to a particular implementation of the static
system. More directly, it is possible to trigger reconfiguration of the FPGA
through the ICAP interface, resulting in the loss of the current state of the
system. If the bus macros are enabled during PR, then it is likely that glitch-
ing on the interface signals will result in unintended behavior of the static
system.
One particularly common usage error is simply attempting to load a par-
tial bitstream that does not correspond to the current implementation of the
static design. This may happen during development when a modification is
made to the static region, but a designer neglects to reimplement a recon-
figured module. One way of avoiding such errors is to prepend each partial
bitstream with a hash generated from the static design. This hash can also
be stored in the static design, possibly in the device tree blob, and compared
with the prepended hash before the partial bitstream is loaded into the FPGA.
If the hashes do not match, the reconfiguration process can be halted without affecting
the operation of the static design. This technique can be simply applied to
prevent unintended errors, or adapted using more cryptographically secure
techniques to prevent malicious attacks [2,4].
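
A minimal sketch of the hash-prefix check might look like the following. It assumes the hash is SHA-256 computed over the static bitstream bytes and that it occupies the first 32 bytes of the tagged file; the source specifies neither the hash function nor what exactly is hashed, and all function names are hypothetical.

```python
import hashlib

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def static_design_hash(static_bitstream: bytes) -> bytes:
    """Hash identifying one implementation of the static design (assumed input)."""
    return hashlib.sha256(static_bitstream).digest()


def tag_partial(partial_bitstream: bytes, static_hash: bytes) -> bytes:
    """Prepend the static-design hash to a partial bitstream."""
    return static_hash + partial_bitstream


def check_and_strip(tagged: bytes, expected_hash: bytes) -> bytes:
    """Return the raw partial bitstream, or raise if it targets another static design."""
    if len(tagged) < HASH_LEN or tagged[:HASH_LEN] != expected_hash:
        raise ValueError("partial bitstream does not match the static design")
    return tagged[HASH_LEN:]
```

As written, this only guards against accidental mismatches during development. Anyone can recompute the prefix, so resisting deliberate attacks would need a keyed or signed scheme instead.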
12.7 Putting It All Together
This section illustrates an SRP design targeted at a variant of the WARP
Software-defined Radio hardware built by Rice University [12]. Since the
original hardware is based on an older Virtex 2 Pro FPGA, we present
a design based on an updated Virtex 4 FX 100 device in order to better
[Figure 12.11 block diagram; labels: PPC405 (ppc_virtex4 v2.00.b), interrupt controller (xps_intc v1.00a), multiported memory controller (mpmc v3.00b) with ports plb, plb, sdma, and npi, Ethernet MAC (xps_ll_temac v1.01a), RS232 UART (xps_uartlite v1.00a), ICAP interface (opb_hwicap v1.00b, using bridge), BRAM interface (xps_bram_if_cntlr v1.00a), reconfigurable socket, dcr (using bridge), PLB bus, reconfigured region.]
FIGURE 12.11
Architecture of a reconfigurable platform. Some signals and standard cores
have not been shown.
represent the PR capabilities of newer FPGA architectures. In particular, we
focus on a MIMO OFDM reference design for this board, which implements
a bridge from wired Ethernet to a two-radio MIMO system. The design uses
a processor to manage the packet headers and to perform configuration man-
agement of the radios, while packet payloads are communicated directly
between the wired and wireless network interfaces using direct memory
access to a processor-managed memory buffer. In the reference design, the
packet payload buffer is implemented in BRAM and accessed through
a PLB bus. In the reconfigurable design, we assume that the packet payload
buffer is implemented in external DRAM, which must be accessed from the
reconfigurable region through a separate port of the memory controller. As a
nonreconfigurable system, this design uses approximately 50% of the device
(21294 of 42176 slices).
The design of the static subsystem is shown in Figure 12.11. This design is
architected around the PowerPC 405 processor core and was largely gener-
ated using the Base System Builder capability in Xilinx EDK. Standard serial
port and Ethernet IP cores provide external connectivity. Access to external
64-bit-wide DDR2 SDRAM, including DMA access for the Ethernet core, is
provided by the Xilinx MPMC IP core. In this system, the processor, memory
bus, and memory controller are designed to be “quasi-synchronous,” mean-
ing that clocks must be edge-aligned. Based on the speeds of the individ-
ual components, a design point was chosen targeting a slow speed grade
FPGA (−10) with the memory bus clocked at 83.3 MHz, the memory con-
troller clocked twice as fast (166.6 MHz), and the processor clocked three
times as fast (250 MHz).
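
The clock ratios quoted above can be checked with a line of arithmetic. The symbols below are introduced here for illustration, and the processor clock is assumed to be the exact reference, in which case the 83.3 and 166.6 MHz figures are truncated values of 250/3 and 500/3 MHz:

```latex
\begin{align*}
f_{\text{cpu}} &= 250\ \text{MHz},\\
f_{\text{bus}} &= f_{\text{cpu}} / 3 \approx 83.3\ \text{MHz},\\
f_{\text{mem}} &= 2 f_{\text{bus}} = \tfrac{2}{3} f_{\text{cpu}} \approx 166.7\ \text{MHz}.
\end{align*}
```

Integer ratios of 1:2:3 are what allow the three clocks to share aligned edges, as the quasi-synchronous requirement demands.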
[Figure 12.12 layout plot; labels: reconfigured region, static region, ICAP interface, control interface bus macros, memory interface bus macros, utilized PowerPC core.]
FIGURE 12.12
Placed and routed design of an FPGA processor platform, targeting a Virtex
4 FX 100.
The FPGA layout of the design is shown in Figure 12.12, overlaid with the
PR floorplanning constraints. The static region is at the south of the chip, and
is exactly two configuration frames tall. This layout provides approximately
8600 slices and 128 external I/O pins, which accommodates both the logic
requirements of a simple processor design, and the I/O pin requirements of
a 64-bit DDR2 memory interface. A significantly smaller region would fail to
provide enough logic cells for the static design, while a larger region would
allocate too many pins to the static region, which would be difficult to access
from the reconfigurable region.
Note that the majority of the routed signals are contained within the floor-
planned area for the static region. The routes entering the top region connect
primarily to external I/O pins and FPGA resources, such as clock buffers and
the ICAP, located in the center column of the FPGA. Some routes into the top
region also connect to the PowerPC cores. Although only one PowerPC is
actually used in the static design, current versions of the EA PR tools do not
allow PowerPC cores to be part of the reconfigured portion of the design.
Hence, this design instantiates both PowerPC cores in the static region, in
order to enable use of the JTAG chain, which is assumed to connect through
both cores.
The device tree for this design is shown in Figure 12.13. Since the targeted
board includes Xilinx SystemACE, this is used to configure the FPGA and
initialize external memory with the kernel image. The compressed device
tree blob is initialized in the BRAM at address 0xfffff800 and decompressed
by the Linux bootwrapper executing out of external memory. The
root filesystem is stored on an external file server and loaded over the net-
work interface using the NFS protocol.
12.8 Conclusion
Although high-level algorithmic modeling offers significant promise for
increasing design productivity, a common problem with many approaches
is representing the environment in which a model exists in a system. A solu-
tion to this problem is often to provide platforms that abstract lower level
details, provide standardized interfaces, and can be targeted by a high-level
design tool. Although this difficulty exists in any embedded system, it is par-
ticularly apparent in FPGA systems, which include complex IP blocks, such
as processor cores, and where physical interfaces to the rest of the system are
highly flexible and incorporate many features that cannot be easily modeled
even at the circuit and gate level.
However, using the architectural features of some FPGAs, such as PR,
higher level platforms can be constructed that abstract many of these details
and are more appropriate for mapping from a high-level design tool. This
chapter has particularly shown how this technique can abstract the complex-
ities associated with including a control processor and operating system as
part of an FPGA platform.
