A framework to explore low power architecture and variability aware timing estimation of FPGAs


A FRAMEWORK TO EXPLORE LOW-POWER
ARCHITECTURE AND VARIABILITY-AWARE
TIMING ESTIMATION OF FPGAS
LEE CHEE SING
(B.Eng.(Hons.), NUS)
A THESIS SUBMITTED
FOR THE DEGREE OF MASTER OF ENGINEERING
DEPARTMENT OF ELECTRICAL & COMPUTER ENGINEERING
NATIONAL UNIVERSITY OF SINGAPORE
2007
Acknowledgements
My sincere thanks go to my advisor, Assistant Professor Ha Yajun. Without his
help, this work would never have been possible. I have enjoyed a wonderful research
experience under his supervision, as he has gone beyond the duties of a supervisor to
act as a mentor as well as a supporter.

I would also like to give special thanks to Professor Ben Chen (M.Eng./Ph.D.
Program Coordinator), who provided the impetus for the project, laid down the initial
specifications and gave advice. I would also like to give a special acknowledgment to
Professor Jonathan Rose and Vaughn Betz (creators of the VPR tool) from the University
of Toronto, as well as Professor Jorge Stolfi (creator of the affine arithmetic model), for
their help in formulating the technical aspects of this work. Their contribution of
ideas and software greatly aided the development of my research.

In addition, during this Master's program, I have gained wonderful experience
working with different groups of people. Special thanks to Dr Heng Chun Huat for
his valuable contribution to the project on the design of the reconfigurable buffer
for a low-power FPGA architecture. Thanks to Pu Yu and Kumaran, who allowed me
to gain more insight into VLSI circuit design in this project.
Next, thanks to my hardware timing analysis project team (Zhang Wenjuan, Chen
Xiaolei and Loke Wei Ting), who have worked closely with me on the research on
timing estimation in FPGAs. Also, thanks to my fellow colleagues, Shakith, Teo Jenn
Yue, Li Yanhui, Shefali, Zhang Wenjuan, Chen Xiaolei, Loke Wei Ting and Yu Heng,
for the various knowledge-enriching mini-seminars organized by our supervisor.

Last but not least, I would like to give special thanks to my family, friends and
anyone who is not mentioned here but has helped in one way or another.
Contents
Acknowledgements
Table of Contents
Abstract
List of Figures
List of Tables
List of Abbreviations
1 Introduction
    1.1 FPGA Architecture
    1.2 Process variation
        1.2.1 Traditional corner-based timing method
    1.3 Problem definition
        1.3.1 Limitation of CAD tools
        1.3.2 Limitation of power reduction in interconnects
        1.3.3 Limitation of SSTA techniques
    1.4 Proposed research approach
        1.4.1 Proposed CAD framework
        1.4.2 Proposed low power FPGA architecture
        1.4.3 Proposed variability-aware timing estimation
    1.5 Contributions
    1.6 Thesis organization
2 Background and Related Works
    2.1 FPGA routing architecture
    2.2 CAD flow for FPGA design
    2.3 Existing power estimation techniques
    2.4 Existing SSTA techniques
3 Modeling of the CAD Framework
    3.1 Framework design approach
    3.2 Framework implementation approach
        3.2.1 Initializing the architecture template
        3.2.2 Editing the architecture template
        3.2.3 CAD tool interface
    3.3 Routing resource graph
    3.4 Placement and routing processes
        3.4.1 Placement process
        3.4.2 Routing process
4 Framework Experimental Results and Analysis
    4.1 Display of generic FPGA architecture
    4.2 Display of edited FPGA architecture
    4.3 Display of architecture after placement and routing
    4.4 Placement and routing results
5 Case Study 1: A Low-power FPGA Architecture
    5.1 Conventional switch block
    5.2 Reconfigurable switch block
    5.3 Proposed switch block and FPGA architecture
    5.4 EDA support
    5.5 Power analysis
6 Case Study 2: An Interval-based FPGA Timing Estimator
    6.1 Deterministic timing estimation
    6.2 Modeling of process variation
    6.3 Introduction to interval arithmetic
    6.4 Introduction to affine arithmetic
    6.5 Interval-based timing estimation
        6.5.1 Modeling of Variation
        6.5.2 Comparison with Statistical modeling
        6.5.3 Complexity
    6.6 Design methodology
    6.7 Timing delay analysis
7 Conclusions and Future Work
    7.1 Conclusion
    7.2 Future work
Bibliography
Abstract
This thesis is organized in three main parts. First, a new CAD framework is designed.
As semiconductor technology scales down, more transistors can be fabricated on a
single chip, so a new tool is needed to handle the building of larger FPGAs.
Heterogeneity is brought into the development phase to improve the quality of
FPGAs. We propose a framework that allows researchers to design arbitrary
architectures with the help of a graphical user interface. It enables the initialization
of essential circuit parameters to obtain a basic architectural layout. The initial
design can then be edited to create an arbitrary architectural design. The framework
has built-in placement and routing capabilities to test the feasibility of the newly
designed architecture. Different arbitrary architectures are tested using a set of
MCNC benchmarks. Furthermore, the resource graph of the designed architecture
can be ported to the current state-of-the-art VPR tool for more complete testing.
Second, we use the developed framework to investigate an alternative approach to
minimizing the short-circuit power of FPGA global interconnects without the luxury
of a dual supply. A reconfigurable buffer, with programmable driving strength, is
designed and integrated into the FPGA switch block. EDA support is built into our
framework to test this new architecture. With our methodology, interconnect buffers
can choose the right driving strength based on the exact wire load after detailed
routing. Our simulation results show that, by applying larger driving strength along
the critical paths and relaxing the driving strength along the non-critical paths, the
proposed FPGA architecture can reduce the overall dynamic power by 6.10% to 10.05%
compared with the conventional FPGA architecture. Our approach is complementary to
the existing dual supply voltage solution. Both techniques can be combined to further
reduce the overall dynamic power consumption.
Third, we use an existing framework, VPR, to explore a fast and accurate
interval-based timing estimator for variability-aware FPGA physical synthesis tools.
Process variations in deep sub-micron technologies create significant timing
uncertainty, which generates the need for a new generation of variability-aware
physical synthesis tools for FPGAs. Ideally, variability-aware tools should be able
to perform both timing variability estimation during synthesis and timing variability
analysis after synthesis. SSTA methods are being developed to perform timing
variability analysis after synthesis, but they are computationally expensive and not
fast enough to provide timing variability estimation during synthesis. Hence, we
propose a fast and accurate interval-based method for timing variability estimation.
This method uses correlation-aware affine intervals instead of probability density
distributions to model timing uncertainties. Compared with Monte Carlo simulations,
our method estimates the mean of the timing variation to within 1%, with an average
looseness of the range of about 22.6% and 4.5% for the Uniform and Gaussian
distributions respectively, and achieves a 1000X simulation speed-up. This work can
be easily extended to ASIC flows. Furthermore, using our developed framework, this
case study can be extended to non-regular architectures.
List of Figures
1.1 Corner-based timing analysis: 2^n corners for n parameters
2.1 Types of FPGA architecture
2.2 An island-style FPGA
2.3 Typical FPGA CAD flow
2.4 Complexity problem in path-based approach
3.1 Interface for initialization
3.2 Logic block pins location
3.3 Types of connection block connectivity
3.4 Types of switch block connectivity
3.5 FPGA routing architecture template
3.6 Edit CLB's pin orientation
3.7 Edit track information
3.8 Edit connection box
3.9 Edit switch box connectivity
3.10 Program interface
3.11 Modeling FPGA routing as a directed graph
3.12 Pseudo-code for the simulated-annealing algorithm used in the placement step
3.13 Half-perimeter wirelength model
3.14 Swapping between two logic blocks
3.15 Sample placement file
3.16 Coordinate system used
3.17 Pseudo-code for the Pathfinder negotiated congestion algorithm used in the routing step
3.18 Sample route file
4.1 Graphical view of a sample of FPGA routing architecture
4.2 Segmentation view of a sample of FPGA routing architecture
4.3 An edited FPGA architecture with heterogeneity
4.4 An architecture after placement and routing
4.5 A selected CLB with its connectivity
4.6 A modified FPGA routing architecture template
5.1 Conventional switch blocks
5.2 Reconfigurable buffer schematic
5.3 Candidate circuits for a reconfigurable buffer cell
5.4 Circuit implementation of a reconfigurable buffer
5.5 Equivalent circuits of configurable buffer
5.6 Switch point integrated with reconfigurable buffer
5.7 EDA flow for proposed FPGA routing architecture
6.1 Geometry of wiring
6.2 Joint range of two partially dependent quantities in Affine Arithmetic
6.3 The grid-based model to model correlations
6.4 Design flow chart
6.5 Variation initialization interface
6.6 Pseudo-code for AA timing analysis
6.7 MC initialization interface
6.8 Frequency distribution of des using Gaussian distribution and single stream for 10000 iterations (MC)
6.9 Frequency distribution of des using Uniform distribution and single stream for 10000 iterations (MC)
6.10 Max no. of noise symbols on an AA variable to illustrate that complexity does not grow with circuit's size
List of Tables
1.1 CMOS technology roadmap
3.1 Menu bar Options and descriptions
3.2 Temperature update schedule
4.1 Minimum channel widths required to place and route 20 large benchmark circuits
4.2 Minimum channel widths required to place and route 20 large benchmark circuits using modified architecture
5.1 New FPGA architecture energy consumption for 20 large benchmark circuits
6.1 Parameter and its variation
6.2 Comparison of bounds of critical path (ns) - Uniform
6.3 Comparison of bounds of critical path (ns) - Gaussian
List of Abbreviations
AA      Affine Arithmetic
ASIC    Application-Specific Integrated Circuit
CAD     Computer-Aided Design
CLB     Configurable Logic Block
CMOS    Complementary Metal-Oxide-Semiconductor
DLL     Delay-Lock Loop
EDA     Electronic Design Automation
FPGA    Field-Programmable Gate Array
GUI     Graphical User Interface
HDL     Hardware Description Language
I/O     Input/Output
IA      Interval Arithmetic
IOB     Input/Output Block
LE      Logic Element
LUT     Look-Up Table
MPGA    Mask-Programmable Gate Array
MCNC    Microelectronics Corporation of North Carolina
PLD     Programmable Logic Device
RRG     Routing Resource Graph
RTL     Register Transfer Level
STA     Static Timing Analysis
SSTA    Statistical Static Timing Analysis
SOC     System-On-a-Chip
VPR     Versatile Placement and Routing tool for FPGAs
VLSI    Very Large Scale Integration
TTL     Transistor-Transistor Logic
Chapter 1
Introduction
For the past few decades, microelectronics has been the technology in demand
for the development of both hardware and software systems. With the continuous
increase in the level of integration of electronic devices, this technology has
improved tremendously. The trend towards higher integration brings about more
sophisticated and faster systems to meet increasing market demand. As a result,
the final products become better and cheaper.

Field-programmable gate arrays (FPGAs) were first introduced in the mid-1980s.
At that time, FPGAs were made up only of transistor-transistor logic (TTL)
equivalent logic gates. With enhancements in very-large-scale integration (VLSI)
processing technology, FPGAs have evolved into systems-on-a-chip (SOC) with millions
of logic gates packed together. Since then, the FPGA has become a widely adopted
device at the heart of most electronic systems, owing to its abundance of resources
and its efficiency.

Moreover, with the discovery of new processing techniques in recent years,
semiconductor technology has scaled down as predicted by Moore's Law. This allows
more transistors to be fabricated on a single chip and opens up more opportunities
for researchers to build larger and more sophisticated FPGAs than ever. Furthermore,
new features are continuously being developed and added to these FPGAs to cater for
different design needs. For example, power-efficient FPGAs are being developed for
portable electronic devices, for which low power consumption is a key requirement.
To date, numerous research efforts with innovative ideas have led to the development
of FPGA architectures of higher quality and efficiency.
1.1 FPGA Architecture
An FPGA architecture is made up of several million logic gates integrated together.
Developing an optimized and efficient architecture is not an easy task. A good
starting point, however, is to implement an architecture instance in each of the
selected classes of FPGAs and evaluate its performance. The architecture displaying
the best combination of placement and routing results in terms of timing, area or
power is deemed the best. Previous research [1–5] has shown that a proper design of
the routing architecture plays a major role in determining its quality. The
description of the architecture also plays an important role in determining the
overall efficiency of the FPGA.
Different approaches to describing an FPGA architecture have been adopted in
many of the existing frameworks. One brute-force method is to specify manually all
the interconnections between the logic blocks through a routing resource graph (RRG).
This gives researchers the flexibility to describe different forms of architecture.
However, this method is not practical, as the RRG of a typical FPGA can run to
megabytes or more. Because of this inefficiency and impracticality, such a low-level
and detailed specification is not used.
A more practical approach is to design a basic tile and its interconnections
manually, and then use a program to replicate that basic structure automatically
into an array to form a complete architecture. This technique is applied by George
in [5] to design low-energy FPGA architectures. This method is not only
time-consuming but also limited in flexibility, as the whole architecture is a
replica of the basic tile.
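
The tile-replication idea can be sketched in a few lines of Python. This is only an
illustration and not the program used in [5]: the tile description format, the node
names (clb_out, wire_h, clb_in) and the function replicate_tile are all assumptions
made for this example.

```python
from itertools import product


def replicate_tile(tile_edges, cols, rows):
    """Stamp a basic tile's connections across a cols x rows array.

    tile_edges: list of (src, dst, dx, dy) tuples. src and dst are node
    names inside a tile; (dx, dy) is the offset of the tile that owns dst.
    Returns a dict mapping (x, y, src) -> list of (x2, y2, dst).
    """
    graph = {}
    for x, y in product(range(cols), range(rows)):
        for src, dst, dx, dy in tile_edges:
            x2, y2 = x + dx, y + dy
            # Drop connections that would leave the array.
            if 0 <= x2 < cols and 0 <= y2 < rows:
                graph.setdefault((x, y, src), []).append((x2, y2, dst))
    return graph


# Hypothetical tile: a logic block output drives a local horizontal wire,
# which continues into the tile to the east and can feed the local input.
TILE = [
    ("clb_out", "wire_h", 0, 0),
    ("wire_h", "wire_h", 1, 0),
    ("wire_h", "clb_in", 0, 0),
]

rrg = replicate_tile(TILE, cols=3, rows=2)
edge_count = sum(len(targets) for targets in rrg.values())
print(len(rrg), "driving nodes,", edge_count, "edges")
```

Because every tile is stamped from the same edge list, a connection pattern that
differs from one tile to the next cannot be expressed, which is the flexibility
limitation noted above.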
1.2 Process variation
With the continuous scaling of technology into the deep sub-micron region, the
variability in the process parameters that has to be accounted for increases
significantly. For example, variations of more than 35% in the gate length are cited
for 90 nm processes, and they are even larger for 65 nm processes [6]. Also, as shown
in Table 1.1 [7], the magnitude of the parameter variations does not scale down as
fast as the nominal values. As such, the parameter variation, as a percentage of the
nominal value, gets larger as the technology scales down.
                      Nominal values                     3σ values
Parameter    1997  1999  2002  2005  2006    1997  1999  2002  2005  2006
Leff [nm]     250   180   130   100    70      80    60    50    40    33
Tox [nm]        5   4.5     4   3.5     3     0.4  0.36  0.39  0.42  0.48
Vdd [V]       2.5   1.8   1.5   1.2   0.9    0.25  0.18  0.15  0.12  0.09
Vth [mV]      500   450   400   350   300      50    45    40    40    40
W [µm]        0.8  0.55   0.5   0.4   0.3     0.2  0.17  0.14  0.12   0.1
H [µm]        1.2     1   0.9   0.8   0.7     0.3   0.3  0.27  0.27  0.25
ρ [mΩ]         45    50    55    60    75      10    12    15    19    25
Table 1.1: CMOS technology roadmap
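
The trend in Table 1.1 can be checked by dividing each 3σ value by its nominal
value. The short Python sketch below does this for three of the parameters. The
numbers are transcribed from the table; the helper name relative_variation is an
illustrative choice and not part of any tool discussed in this thesis.

```python
# Values transcribed from Table 1.1: (nominal, 3-sigma) per roadmap year.
YEARS = [1997, 1999, 2002, 2005, 2006]
TABLE = {
    "Leff [nm]": ([250, 180, 130, 100, 70], [80, 60, 50, 40, 33]),
    "Tox [nm]": ([5, 4.5, 4, 3.5, 3], [0.4, 0.36, 0.39, 0.42, 0.48]),
    "Vth [mV]": ([500, 450, 400, 350, 300], [50, 45, 40, 40, 40]),
}


def relative_variation(nominal, three_sigma):
    """Return the 3-sigma variation as a percentage of the nominal value."""
    return [100.0 * s / n for n, s in zip(nominal, three_sigma)]


for name, (nominal, three_sigma) in TABLE.items():
    percentages = relative_variation(nominal, three_sigma)
    row = ", ".join(f"{year}: {pct:.1f}%" for year, pct in zip(YEARS, percentages))
    print(f"{name:10s} {row}")
```

For Leff, the ratio rises from 32% in 1997 to about 47% in 2006, even though the
absolute 3σ value falls from 80 nm to 33 nm over the same period.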
Process variations [8, 9] can be classified as inter-die variations, which affect the
entire chip, and intra-die variations, which result from layout-specific effects.
These variations normally come with a complex spatial or temporal correlation
structure. They create significant timing uncertainty and yield degradation.
This growing problem brings about the need to build the next generation of
variability-aware electronic design automation (EDA) tools.
The above observation is especially important for FPGA vendors because they
are almost always the first to use the most advanced technologies. For example,
Xilinx was the first in the whole semiconductor industry to fabricate its Virtex-2
FPGAs in a 130 nm process, Virtex-4 in 90 nm and Virtex-5 in 65 nm. As the
process shrinks, variations in effective channel length, threshold voltage and gate
oxide thickness become more prominent. This greatly influences the timing
performance of FPGAs. Hence, FPGA physical synthesis tools need to consider the
impact of process variations on timing in order to guide timing-driven optimizations.
1.2.1 Traditional corner-based timing method
Process variations and their correlations have been studied over the years. Their
importance grows as technology continues to scale down. Traditionally, parameter
variations and correlations are handled using corner-based deterministic static
timing analysis, as shown in Figure 1.1 [7]. In Figure 1.1(a), two corners, known as
the worst case and the best case, are timed individually for a single varying
parameter. If two parameter variations are significant, however, four corners have
to be timed individually, as shown in Figure 1.1(b). Hence, as the number of varying
parameters increases, the number of corners that need to be examined individually
grows exponentially. This makes the approach cumbersome and inefficient. In addition,
the corner-based approach only tells whether the circuit is able to function at the
extreme corners; it does not provide quantitative yield information, which is more
critical.
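
The exponential growth is easy to see by enumerating the corners explicitly. The
following Python sketch is illustrative only: the parameter ranges, the toy delay
model and the function names are assumptions, not values or code taken from a real
timing tool.

```python
from itertools import product


def enumerate_corners(params):
    """Yield every combination of per-parameter extremes.

    params maps a parameter name to its (best, worst) value pair, so n
    parameters give 2**n corners.
    """
    names = list(params)
    for combo in product(*(params[name] for name in names)):
        yield dict(zip(names, combo))


def worst_corner_delay(params, delay_fn):
    """Time each corner separately and keep the slowest result."""
    return max(delay_fn(corner) for corner in enumerate_corners(params))


# Hypothetical normalized parameter extremes, for illustration only.
PARAMS = {"Leff": (0.9, 1.1), "Vth": (0.95, 1.05), "Tox": (0.97, 1.03)}


def toy_delay(corner):
    """Toy delay model: nominal delay of 1.0 scaled by each parameter."""
    return 1.0 * corner["Leff"] * corner["Vth"] * corner["Tox"]


corners = list(enumerate_corners(PARAMS))
print(len(corners), "corners for", len(PARAMS), "parameters")
print("worst toy delay:", worst_corner_delay(PARAMS, toy_delay))
```

Each additional parameter doubles the number of timing runs, and the result is still
only a pass or fail at the extremes rather than a yield estimate.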
1.3 Problem deﬁnition
Although existing works are efficient at describing FPGA architectures, reducing
power usage and handling process variations in FPGAs, many problems still need to
be solved to improve them.

[Figure 1.1: Corner-based timing analysis: 2^n corners for n parameters.
(a) Two corners for a single parameter. (b) Four corners for two parameters.]
1.3.1 Limitation of CAD tools
Currently, there are several promising computer-aided design (CAD) tools [1–5]
capable of describing routing architectures with enhanced design complexity, better
cost savings or improved efficiency. For example, Emerald [1] uses WireC schematics
to describe its routing architecture. This method requires inputs such as a routing
architecture description, a logic block architecture description and
architecture-specific metrics in order to provide the basic features needed by
placement and routing tools. In another example, the Versatile Placement and Routing
tool (VPR) [2, 3] uses an FPGA architecture description language to describe its
routing architecture. An "architecture generator" converts this specification into a
detailed and complete architecture for later optimization and visualization.
However, the Emerald and VPR CAD tools share a common limitation: their architecture
description techniques restrict the range of architectures to a selected class of
templates. This limitation prevents the design of heterogeneous architectures.
1.3.2 Limitation of power reduction in interconnects
Among the routing resources in an FPGA architecture, switch buffers are the most
important components in determining its performance. A buffer not only acts as an
intermediate repeater to regenerate the signal, it also breaks up a long RC network
to minimize the interconnect delay. Therefore, the buffer chosen must be large enough
to drive its downstream circuits. While buffers can be fully customized for various
applications in application-specific integrated circuit (ASIC) design, FPGAs do not
have such freedom because they are pre-fabricated. Sizing for the worst-case load
normally results in unnecessarily large buffers within an FPGA chip. These oversized
buffers can cause undesirable problems.
First, due to the non-zero rise and fall times of the input signal, a larger
buffer results in larger peak and average short-circuit currents during the
transition period, and hence in higher short-circuit power. Simulations show that
the short-circuit power accounts for roughly 10% of the total dynamic power,
depending on the actual synthesized circuits. As a result, the dynamic power, which
consists of both the switching power and the short-circuit power, is increased.
Second, a larger buffer creates more ground-bounce noise. In custom ASIC design,
large transient currents are avoided by using the minimum required buffer sizes.
This minimizes the ground-bounce noise introduced by L di/dt, where L is the
inductance associated with the package pins, bonding wires and on-chip metal lines
for power routing. Ground-bounce noise reduces the available noise margin of the
digital circuits [10]. In addition, it degrades the performance of sensitive analog
circuits on the chip, such as the delay-lock loop (DLL), which is crucial for the
functioning of large digital circuits. If oversized buffers are used within an FPGA,
large transient currents, and thus large ground-bounce noise, are inevitable.
1.3.3 Limitation of SSTA techniques
In relation to process variation, several works [11–21] have considered the impact
of variations on circuit performance using statistical static timing analysis
(SSTA). These approaches fall into various categories, such as block-based,
path-based and incremental. In [12, 14], the authors propose techniques to obtain
bounds on the delay distributions instead of calculating the exact distributions
using path-based or block-based analysis techniques. In [16], the proposed approach
makes an estimate based on a generic path analysis rather than evaluating every path
statistically. However, many of these works rely on complicated SSTA techniques,
primarily because correlation and path reconvergence have to be handled during the
MAX operation that is fundamental to static timing analysis (STA). This leads to
undesirably high computational complexity and large CPU overhead. Furthermore, most
of these statistical analysis techniques assume that the circuit parameters are
independent random variables with a Gaussian distribution, which is not true in most
cases.
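
Why correlation matters at a reconvergence point can be seen in a small numerical
sketch. Two paths share a common segment and then rejoin; comparing their arrival
times with plain intervals ignores the shared segment, while an affine form (a
central value plus noise symbols, the representation named in the abstract) keeps
track of it. The class Affine, the noise-symbol names and all delay values below are
hypothetical, and the sketch covers only addition and subtraction, not the MAX
operation or the estimator developed in this thesis.

```python
class Affine:
    """Affine form x0 + sum(xi * eps_i), with each eps_i in [-1, 1]."""

    def __init__(self, center, terms=None):
        self.center = center
        self.terms = dict(terms or {})  # noise-symbol name -> coefficient

    def _combine(self, other, sign):
        terms = dict(self.terms)
        for name, coeff in other.terms.items():
            terms[name] = terms.get(name, 0.0) + sign * coeff
        return Affine(self.center + sign * other.center, terms)

    def __add__(self, other):
        return self._combine(other, 1.0)

    def __sub__(self, other):
        return self._combine(other, -1.0)

    def interval(self):
        """Enclosing interval: center plus or minus the sum of |coefficients|."""
        radius = sum(abs(c) for c in self.terms.values())
        return (self.center - radius, self.center + radius)


def interval_sub(a, b):
    """Plain interval subtraction, which treats a and b as unrelated."""
    return (a[0] - b[1], a[1] - b[0])


# Hypothetical delays (ns): one shared segment feeding two branches.
shared = Affine(2.0, {"e_shared": 0.5})
path_a = shared + Affine(1.0, {"e_a": 0.25})
path_b = shared + Affine(1.5, {"e_b": 0.25})

# The shared noise symbol cancels in the affine difference.
print("affine difference:  ", (path_b - path_a).interval())
print("interval difference:", interval_sub(path_b.interval(), path_a.interval()))
```

With these values, the affine difference stays within [0, 1], so the second path
never arrives before the first, whereas the plain interval difference is the looser
[-1, 2].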
1.4 Proposed research approach
To address the problems defined above, we propose three approaches, one for each
problem. First, a CAD framework capable of designing heterogeneous
architectures is developed. Second, an FPGA architecture with reconfigurable buffer