
A Survey of Logic Block Architectures For Digital Signal Processing Applications




Presentation Outline


Considerations in Logic Block Design
- Computation requirements
- Why inefficiencies?

Representative Logic Block Architectures
- Proposed
- Commercial

Conclusions: What Is Suitable Where?


Why DSP??? The Context







- Representative of a computationally intensive class of applications: datapath-oriented and arithmetic-oriented
- Increasingly large use of FPGAs for DSP: multimedia signal processing, communications, and much more
- To study the “issues” in reconfigurable fabric design for compute-intensive applications: what is involved in designing a fabric that makes reconfigurable acceleration of multimedia computing possible?


Elements of a Reconfigurable Architecture

- Logic block/processing element: differing grains (fine >> coarse >> ALUs)
- Routing
- Dynamic reconfiguration



So What’s Wrong with the Typical FPGA?
- Meant to be general purpose, which lowers risk
- Too flexible! Result: an efficiency gap
- Higher implementation cost, larger delay, and larger power consumption than ASICs
- Performance vs. flexibility tradeoff: flexibility buys postponed mapping and silicon reuse



Solution? See How FPGAs Are Used
- FPGAs are being used for “classes” of applications: encryption, DSP, multimedia, etc.
- Here lies the key: design FPGAs for a class of applications
- Application domain characterization, followed by application domain tuning



Domain Specialization

COMPUTATION defines ARCHITECTURE




Target application characteristics known beforehand? Yes:

1. Characterize the application domain
2. Determine a balance between flexibility and efficiency
3. Tune the architecture accordingly


Categorizing the “Computation”
- Control: random logic implementation
- Datapath: processing of multi-bit data
- Conflicting requirements?



Datapath Element Requirements
- Operates on word slices or bit slices
- Produces multi-bit outputs
- Requires many smaller elements, each producing one output bit, i.e. multiple small LUTs
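To make “one small element per output bit” concrete, here is a minimal Python sketch, assuming a plain ripple-carry adder mapped onto 3-input LUTs: each bit slice uses one LUT for the sum bit and one for the carry-out. The helper names `make_lut`, `lut_eval`, and `ripple_add` are hypothetical and only illustrate the mapping; they are not part of any FPGA tool flow.

```python
from typing import Callable, List


def make_lut(k: int, func: Callable[..., int]) -> List[int]:
    """Fill the 2**k-entry truth table (configuration memory) of a k-input LUT."""
    table = []
    for addr in range(2 ** k):
        bits = [(addr >> i) & 1 for i in range(k)]  # input i is address bit i
        table.append(func(*bits) & 1)
    return table


def lut_eval(table: List[int], *inputs: int) -> int:
    """Look up the LUT output for the given input bits."""
    addr = sum(bit << i for i, bit in enumerate(inputs))
    return table[addr]


# Hypothetical per-bit-slice configuration: one 3-LUT for sum, one for carry.
SUM_LUT = make_lut(3, lambda a, b, c: a ^ b ^ c)
CARRY_LUT = make_lut(3, lambda a, b, c: (a & b) | (a & c) | (b & c))


def ripple_add(x: int, y: int, width: int) -> int:
    """Add two width-bit words using 2 * width small LUTs (ripple carry)."""
    carry, result = 0, 0
    for i in range(width):
        a, b = (x >> i) & 1, (y >> i) & 1
        result |= lut_eval(SUM_LUT, a, b, carry) << i   # one LUT per output bit
        carry = lut_eval(CARRY_LUT, a, b, carry)        # carry into next slice
    return result | (carry << width)  # carry-out becomes the MSB


print(ripple_add(11, 6, 4))  # 17
```

The multi-bit result is assembled from one small-LUT output per bit, which is why datapath logic tends to favor several small LUTs over one large one.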



Control Logic Requirements
- Produces a single output from many single-bit inputs
- Benefits from large-grain LUTs, since the number of logic levels is reduced
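The logic-level argument can be checked with a short calculation. This sketch assumes the wide control function decomposes as a tree (e.g. a wide AND or OR), so each k-input LUT merges up to k signals; arbitrary functions may need more LUTs. `lut_levels` and `lut_count` are illustrative names.

```python
def lut_levels(n_inputs: int, k: int) -> int:
    """Depth of a tree of k-input LUTs computing an n-input AND (or OR)."""
    levels, signals = 0, n_inputs
    while signals > 1:
        signals = -(-signals // k)  # ceiling division: each LUT merges up to k signals
        levels += 1
    return levels


def lut_count(n_inputs: int, k: int) -> int:
    """Number of k-input LUTs used by the same tree."""
    total, signals = 0, n_inputs
    while signals > 1:
        signals = -(-signals // k)
        total += signals
    return total


for k in (2, 4, 6):
    print(f"32-input AND with {k}-LUTs: {lut_levels(32, k)} levels, {lut_count(32, k)} LUTs")
```

For a 32-input AND, the depth drops from 5 levels (2-LUTs) to 3 (4-LUTs) to 2 (6-LUTs), and the LUT count drops from 31 to 11 to 7.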



Logic Block Design: Considerations
- “How much” of “what kinds” of computations to support?
- Tradeoff: generality vs. specialization



How much of What? Applications
benchmarking


So what do we have to support?







- Datapath functionality, in particular arithmetic, is dominant in DSP.
- The datapath functions have different bit-widths.
- DSP designs heavily use multiplexers of various sizes. Thus, an efficient mapping of multiplexers should be supported.
- DSP functions do contain random logic. The amount of random logic varies per design.
- Some DSP designs use wide Boolean functions.
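A sketch of why multiplexer mapping matters, assuming each 2:1 multiplexer occupies one small LUT (three inputs: select and two data lines): an n:1 multiplexer built as a tree needs n - 1 such muxes arranged in log2(n) levels. `mux2` and `mux_tree` are hypothetical names, and the tree assumes n is a power of two.

```python
from typing import List, Tuple


def mux2(sel: int, d0: int, d1: int) -> int:
    """A 2:1 multiplexer; fits in one 3-input LUT (sel, d0, d1)."""
    return d1 if sel else d0


def mux_tree(data: List[int], sel: int) -> Tuple[int, int]:
    """Select data[sel] with a tree of 2:1 muxes.

    Returns (output, number of 2:1 muxes used). Select bit i drives tree level i.
    Assumes len(data) is a power of two.
    """
    level, used, bit = list(data), 0, 0
    while len(level) > 1:
        s = (sel >> bit) & 1
        level = [mux2(s, level[j], level[j + 1]) for j in range(0, len(level), 2)]
        used += len(level)
        bit += 1
    return level[0], used


print(mux_tree([10, 11, 12, 13, 14, 15, 16, 17], 5))  # (15, 7)
```

When such a 2:1 mux is placed in a 4-input LUT, one LUT input goes unused, so multiplexer-heavy DSP designs can waste a noticeable share of LUT capacity unless the logic block maps multiplexers efficiently.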


DSP Building Blocks




Some techniques widely used to achieve area- and speed-efficient DSP implementations:
- Bit-serial computation: routing-efficient; bit-level pipelining increases throughput even more
- Digit-serial computation: combines the “area efficiency” of bit-serial with the “time efficiency” of bit-parallel
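The area/time tradeoff between these styles can be seen in the number of clock cycles needed per word. This sketch treats bit-serial as digit width N = 1 and bit-parallel as N = word width; it deliberately ignores clock-frequency differences and pipeline latency, and `cycles_per_word` is an illustrative name.

```python
import math


def cycles_per_word(word_bits: int, digit_bits: int) -> int:
    """Clock cycles to stream one word through a digit-serial operator.

    digit_bits = 1 models bit-serial; digit_bits = word_bits models bit-parallel.
    """
    return math.ceil(word_bits / digit_bits)


for n in (1, 4, 16):
    print(f"N={n:2d}: {cycles_per_word(16, n)} cycles per 16-bit word")
```

For a 16-bit word this gives 16, 4, and 1 cycles. The hardware per operator grows with N in the other direction (for example, an N-bit digit adder needs N full-adder slices), which is the balance digit-serial designs aim for.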


Classes of DSP-Optimized FPGA Architectures

1. Architectures with dedicated DSP logic
   - Homogeneous
   - Heterogeneous
   - Globally homogeneous, locally heterogeneous
2. Architectures of coarser granularity
3. Architectures with DSP-specific improvements (e.g. carry chains, input sharing, CBS)


Some Representative Architectures


Bit-Serial FPGA with SR LUT
- The bit-serial paradigm suits the existing FPGA, so why not optimize the FPGA for it?
- A logic block that supports efficient implementation of bit-serial datapaths and bit-level pipelining
- LUTs can be used for combinational logic as well as for shift registers
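A minimal behavioral model of a LUT that doubles as a shift register, similar in spirit to the SRL16 primitive in Xilinx devices. The shifting order (new bit enters position 0) and the use of the LUT inputs as a tap address are assumptions for illustration, not the exact circuit of the proposed block; `SRLut` is a hypothetical name.

```python
class SRLut:
    """Hypothetical 4-input LUT whose 16-bit configuration memory can also shift.

    LUT mode: the 4 inputs address one of 16 configuration bits.
    SR mode (assumed): each clock shifts `din` into bit 0, and reading
    address a returns the bit shifted in a clocks before the most recent one.
    """

    def __init__(self, config: int = 0):
        self.mem = config & 0xFFFF  # 16 configuration bits

    def read(self, addr: int) -> int:
        """Combinational read: output for the 4-bit input address."""
        return (self.mem >> (addr & 0xF)) & 1

    def clock_shift(self, din: int) -> None:
        """Shift-register mode: move every bit up one position, insert din at bit 0."""
        self.mem = ((self.mem << 1) | (din & 1)) & 0xFFFF


# Usage 1: configure as a 4-input XOR (parity) function.
xor_cfg = sum((bin(a).count("1") & 1) << a for a in range(16))
print(SRLut(xor_cfg).read(0b0111))  # 1 (three inputs high)

# Usage 2: the same memory used as a delay line tapped at address 2.
sr = SRLut()
for bit in [1, 0, 1, 1]:
    sr.clock_shift(bit)
    print(sr.read(2))
```

Reusing the configuration memory as storage is what lets the same LUT serve either as combinational logic or as the short delay lines that bit-serial datapaths need.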



A Bit-Serial Adder

A bit-serial adder which processes two bits at a time

Interface Block Diagram
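As a point of reference, here is a behavioral sketch of a generic bit-serial adder: one full adder plus a carry flip-flop, consuming one bit of each operand per clock, least-significant bit first. This is the textbook structure, assumed for illustration rather than taken from the figure; `bit_serial_add`, `to_bits`, and `from_bits` are hypothetical helpers.

```python
from typing import Iterable, List


def bit_serial_add(a_bits: Iterable[int], b_bits: Iterable[int]) -> List[int]:
    """Bit-serial addition, LSB first; each loop iteration models one clock cycle."""
    carry = 0  # carry flip-flop, cleared at the start of the word
    out = []
    for a, b in zip(a_bits, b_bits):
        out.append(a ^ b ^ carry)                    # sum bit leaves this cycle
        carry = (a & b) | (a & carry) | (b & carry)  # stored for the next cycle
    return out


def to_bits(x: int, width: int) -> List[int]:
    """Serialize x into width bits, LSB first."""
    return [(x >> i) & 1 for i in range(width)]


def from_bits(bits: List[int]) -> int:
    """Reassemble an LSB-first bit stream into an integer."""
    return sum(b << i for i, b in enumerate(bits))


print(from_bits(bit_serial_add(to_bits(11, 5), to_bits(6, 5))))  # 17
```

The stream width must leave room for the final carry (here 5 bits for 4-bit operands); otherwise the result wraps modulo 2 to the power of the width.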


A Bit-Serial Multiplier Cell


The Proposed Bit-Serial Logic Block Architecture


Four 4-input LUTs and six flip-flops.



The two multiplexers in front of the LUTs are targeted mainly at carry-save operations, which are frequently used in bit-serial computations.
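A word-level sketch of the carry-save idea: three operands are reduced to a sum word and a carry word without propagating carries (a 3:2 compression), and only the final result needs a carry-propagate addition. In a bit-serial block the same per-bit XOR and majority logic would be applied one bit position per cycle; the block's actual multiplexer wiring is not described here, so this shows only the arithmetic. `carry_save_add` and `csa_sum` are illustrative names.

```python
from typing import Iterable, Tuple


def carry_save_add(x: int, y: int, z: int) -> Tuple[int, int]:
    """3:2 compression: x + y + z == sum_word + carry_word, with no carry chain."""
    sum_word = x ^ y ^ z                                # per-bit sum, ignoring carries
    carry_word = ((x & y) | (x & z) | (y & z)) << 1     # per-bit majority, moved up one bit
    return sum_word, carry_word


def csa_sum(values: Iterable[int]) -> int:
    """Accumulate many operands in carry-save form; resolve carries only once at the end."""
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v)
    return s + c  # single carry-propagate addition


print(carry_save_add(13, 7, 9))  # (3, 26)
print(csa_sum([3, 5, 7, 11]))    # 26
```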



There are 18 signal inputs and 6 signal outputs, plus a clock input.



Feedback inputs c2, c3, c4, c5 can be connected to GND, to VDD, or to one of the 4 outputs d0, d1, d2, d3.
Therefore, each LUT can implement any 4-input function controlled by inputs a0, a1, a2, a3 or b0, b1, b2, b3.



Programmable switches connected to inputs a4 and b4 control the functionality of the four multiplexers at the output of the LUTs.
As a result, two LUTs can implement any 5-input function.
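Why two 4-input LUTs plus an output multiplexer suffice for any 5-input function: by Shannon expansion on the fifth input e, f = e ? f(a, b, c, d, 1) : f(a, b, c, d, 0). The sketch assumes the fifth input (a4 or b4 in the block) acts as the multiplexer select; `lut4_table`, `split_5input`, and `eval_5input` are hypothetical names.

```python
from typing import Callable, List, Tuple


def lut4_table(f: Callable[[int, int, int, int], int]) -> List[int]:
    """Truth table of a 4-input LUT; address bit i is input i."""
    return [f(*[(a >> i) & 1 for i in range(4)]) & 1 for a in range(16)]


def split_5input(f5: Callable[[int, int, int, int, int], int]) -> Tuple[List[int], List[int]]:
    """Shannon expansion on input e: one LUT holds f with e=0, the other with e=1."""
    lut_lo = lut4_table(lambda a, b, c, d: f5(a, b, c, d, 0))
    lut_hi = lut4_table(lambda a, b, c, d: f5(a, b, c, d, 1))
    return lut_lo, lut_hi


def eval_5input(lut_lo: List[int], lut_hi: List[int], a: int, b: int, c: int, d: int, e: int) -> int:
    """Both LUTs see a..d; the output multiplexer, selected by e, picks one result."""
    addr = a | (b << 1) | (c << 2) | (d << 3)
    return lut_hi[addr] if e else lut_lo[addr]


def maj5(a: int, b: int, c: int, d: int, e: int) -> int:
    """Example 5-input function: majority vote."""
    return 1 if a + b + c + d + e >= 3 else 0


lo, hi = split_5input(maj5)
print(eval_5input(lo, hi, 1, 1, 0, 0, 1))  # 1
```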



The final outputs d0, d1, d2, d3 can be either the direct outputs of the multiplexers or the outputs of the flip-flops. All bit-serial operators use the flip-flop outputs, so for them the attached programmable switches are actually unnecessary. The switches are present only to implement logic functions other than bit-serial datapath circuits.



Two flip-flops are added (inputs c0 and c1) to implement shift registers, which are frequently used in bit-serial operations.



The Modified LUT Implementing a Shift Register


Performance Results


Digit-Serial Logic Block Architecture
- Digit-serial architectures process one digit (N = 4 bits) at a time
- They offer area efficiency similar to bit-serial architectures and time efficiency close to bit-parallel architectures
- N = 4 bits can serve as an optimal granularity for processing larger digit sizes (N = 8, 16, etc.)
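A behavioral sketch of digit-serial addition with N = 4: each clock cycle an N-bit adder adds one digit of each operand plus the carry held in a flip-flop from the previous cycle, least-significant digit first. This is the generic scheme, assumed for illustration rather than the exact circuit in the figures; `digit_serial_add`, `to_digits`, and `from_digits` are hypothetical helpers.

```python
from typing import List


def digit_serial_add(a_digits: List[int], b_digits: List[int], n: int = 4) -> List[int]:
    """Digit-serial addition, least-significant digit first; one iteration = one clock."""
    mask = (1 << n) - 1
    carry = 0  # carry flip-flop between cycles
    out = []
    for a, b in zip(a_digits, b_digits):
        total = a + b + carry
        out.append(total & mask)  # n-bit result digit produced this cycle
        carry = total >> n        # carry-out (0 or 1) kept for the next cycle
    return out


def to_digits(x: int, n_digits: int, n: int = 4) -> List[int]:
    """Split x into n_digits digits of n bits, least significant first."""
    return [(x >> (n * i)) & ((1 << n) - 1) for i in range(n_digits)]


def from_digits(digits: List[int], n: int = 4) -> int:
    """Reassemble a least-significant-first digit stream into an integer."""
    return sum(d << (n * i) for i, d in enumerate(digits))


print(hex(from_digits(digit_serial_add(to_digits(0x3A7, 4), to_digits(0x1F9, 4)))))  # 0x5a0
```

A 16-bit addition takes 4 cycles here instead of 16 bit-serial cycles, while the hardware is a 4-bit adder plus one carry flip-flop rather than a full 16-bit adder.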



Digit-Serial Building Blocks

A Digit-Serial Adder

A Digit-Serial Unsigned Multiplier
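A behavioral sketch of a generic serial-parallel digit multiplier, assuming operand X arrives one 4-bit digit per cycle (least-significant first) while operand Y is held in parallel; reading Y as the parallel operand is an assumption based on the “Y = 8 bits” label. Each cycle adds digit × Y to an accumulator, emits the lowest 4 bits as one finished product digit, and shifts the rest right. `digit_serial_multiply` is a hypothetical name, not the circuit in the figure.

```python
from typing import List


def digit_serial_multiply(x_digits: List[int], y: int, y_bits: int, n: int = 4) -> List[int]:
    """Unsigned serial-parallel multiply; returns product digits, least significant first."""
    mask = (1 << n) - 1
    acc = 0
    out = []
    for xd in x_digits:              # one clock cycle per serial digit of x
        acc += xd * y                # partial product: one digit times parallel y
        out.append(acc & mask)       # lowest n bits are now final
        acc >>= n                    # high part carries into the next cycle
    for _ in range(-(-y_bits // n)): # flush cycles: acc < 2**y_bits at this point
        out.append(acc & mask)
        acc >>= n
    return out


# x = 0xB7 fed as digits [0x7, 0xB]; y = 0xD5 (8 bits); product 0x9843.
print(digit_serial_multiply([0x7, 0xB], 0xD5, 8))  # [3, 4, 8, 9]
```

The fixed number of flush cycles, ceil(y_bits / n), is enough because after the last serial digit the accumulator holds less than 2 to the power of y_bits.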


Digit-Serial Building Blocks

A Pipelined Digit-Serial Unsigned Multiplier for Y = 8 Bits


