Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C046 Finals Page 982 9-10-2008 #27
982 Handbook of Algorithms for Physical Design Automation
FIGURE 46.19 Three critical path configurations and delay variations of a switch matrix: (a) Config 1,
(b) Config 2, (c) Config 3, and (d) delay variation. (Based on Matsumoto, Y. et al., Proceedings of the
2007 ACM/SIGDA 15th International Symposium on Field Programmable Gate Arrays, ACM Press,
New York, 2007. With permission.)
where $Y_1(\mathrm{Target})$ is defined as

$$Y_1(\mathrm{Target}) = \int_{-\infty}^{T_{\mathrm{Target}}} f_{\mathrm{crit}}(t)\,dt \qquad (46.13)$$
In Equation 46.12, the likelihood that all n configurations fail is subtracted from 1. In their
work, they assume complete independence between critical paths in different configurations, which
enables them to analytically evaluate Equations 46.12 and 46.13. This assumption is not valid, because
spatial correlations exist between circuit elements, and critical paths across different
configurations might share routing resources, especially close to the source and sink nodes.
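As a rough numerical illustration of the yield computation described above, the following Python sketch evaluates the per-configuration yield and then subtracts from 1 the probability that every configuration fails. It assumes, purely for illustration, that each configuration's critical path delay is normally distributed and that the configurations are independent (the assumption questioned above). The function names and the mean and sigma parameters are hypothetical and do not come from the original work.

```python
import math


def config_yield(t_target, mean, sigma):
    """Probability that one configuration's critical delay is <= t_target.

    Assumption: the critical delay is normally distributed, so the integral
    of its density up to t_target is the normal CDF.
    """
    return 0.5 * (1.0 + math.erf((t_target - mean) / (sigma * math.sqrt(2.0))))


def combined_yield(per_config_yields):
    """Yield with several configurations, assuming they fail independently."""
    all_fail = 1.0
    for y in per_config_yields:
        all_fail *= 1.0 - y  # probability that this configuration also fails
    return 1.0 - all_fail


if __name__ == "__main__":
    yields = [config_yield(1.0, m, 0.1) for m in (1.0, 1.05, 0.95)]
    print(combined_yield(yields))
```

If the critical paths are positively correlated, the true yield would be lower than this independent estimate suggests.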
They propose a routing algorithm that keeps track of the usage of routing resources by critical
paths and tries to avoid them in consecutive configurations that are generated. The method is similar
to the congestion avoidance procedure used in VPR, that is, resources that are used by critical paths
in other configurations are penalized so that the router avoids them if other paths with the same
delay exist.
REFERENCES
1. J. Cong and K. Minkovich, Optimality study of logic synthesis for Lut-based FPGAs, IEEE Transactions
on Computer-Aided Design of Integr ated Circuits and Systems, 26(2): 230–239, 2007.
2. D. Chen and J. Cong, Daomap: A depth-optimal area optimization mapping algorithm for FPGA designs,
in ICCAD ’04: Proceedings of the 2004 IEEE/ACM International Conference on Computer-Aided Design,
pp. 752–759, IEEE Computer Society, Washington DC, 2004.
3. B. L. Synthesis and V. Group, A bc: A system for sequential synthesis a nd verification. Available at
/>4. Alan, S. Chatterjee, and R. Brayton, Improv ements to technology mapping for Lut-based FPGAs, in FPGA
’06: Proceedings of the 2006 ACM/SIGDA 14th International Symposium on Field Programmable Gate
Arrays, pp. 41–49, ACM Press, New York, 2006.
5. J. Cong and Y. Ding, Flo wmap: An optimal technology mapping algorithm for delay optimization in
lookup-table based FPGA designs, IEEE Transactions on Computer-Aided Design of Integr ated Circuits
and Systems (TCAD), 13(1): 1–12, 1994.
Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C046 Finals Page 983 9-10-2008 #28
FPGA Technology Mapping, Placement, and Routing 983
6. V. Betz and J. Rose, VPR: A n ew packing, placement and routing tool for FPGA research, in Field-

Programmable Logic and Applications (W. Luk, P. Y. Cheung, and M. Glesner, eds.), pp. 213–222,
Springer-Verlag, Berlin, Germany, 1997.
7. A. S. Marquardt, V. Betz, and J. Rose, Using cluster-based logic blocks and timing-driven packing to
improve FPGA speed and density, in Proceedings of the ACM/SIGDA International Symposium on Field
Pr ogrammable Gate Arrays, Monterey, CA, pp. 37–46, 1999.
8. E. Bozorgzadeh, S. Ogrenci-Memik, and M. Sarrafzadeh, Rpack: Routability-driven packing for cluster-
based FPGAs, in Pr oceedings of the Asia-South Pacific Design Automation Confer ence, Yokohama, Japan,
2001, pp. 629–634.
9. A. Singh and M. Marek-Sadowska, Efficient circuit clustering for area and power reduction in FPGAs, in
Pr oceedings of the ACM/SIGDA International Symposium on Field Pro grammable Gate Arrays, Monterey,
CA, pp. 59–66, 2002.
10. A. DeHon, B alancing interconnect and computation in a reconfiguable computing array (or , why you don’t
really want 100% LUT utilization), in Proceedings of the ACM/SIGDA International Symposium on Field
Pr ogrammable Gate Arrays, Monterey, CA, pp. 69–78, 1999.
11. L. Cheng a nd M. D. F. Wong, Floorplan design for multi-million gate FPGAs, in Proceedings of the
IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, pp. 292–299, 2004.
12. Y. Sankar and J. Rose, Trading quality for compile time: Ultra-fast placement for FPGAs, i n Proceed-
ings of the ACM/SIGDA International Symposium o n Field Pro grammable Gate Arrays, San Jose, CA,
pp. 157–166, 1999.
13. J.M.Emmert andD. Bhatia, Amethodology forfast FPGA floorplanning, inProceedings of the ACM/SIGDA
International Symposium on Field Prog rammable Gate Arrays, Monterey, CA, pp. 47–56, 1999.
14. K. Bazargan, R. Kastner, and M. Sarrafzadeh, Fast template placement for reconfigurable computing
systems, IEEE Design and Test—Special Issue on Reconfigur able Computing, 17: 68–83, January 2000.
15. E. L. Horta, J. W. Lockwood, D. E. Taylor, and D. Parlour, Dynamic hardware plugins in an FPGA with
partial runtime reconfiguration, in Proceedings of the ACM/IEEE Design Automation Conference,New
Orleans, LA, pp. 343–347, 2002.
16. J. Chen, J. Moon, and K. Bazargan, A reconfigurable FPGA-based readback signal generator for hard-drive
read channel simulator, in Proceedings of the ACM/IEEE Design Automation Conference, New Orleans,
LA, pp. 349–354, 2002.
17. M. Handa and R. Vemuri, An efficient algorithm for finding empty space for online FPGA placement, in

Pr oceedings of the ACM/IEEE Design Automation Conference, San Diego, CA, pp. 960–965, 2004.
18. L. Singhal and E. Bozorgzadeh, M ulti-layer floorplanning on a sequence of reconfigurable designs, in
FPL’06: Proceedings of the 2006 International Conference on Field Pro grammable Logic and Applications,
Madrid, 2006.
19. J. Cong, M. Romesis, and M. Xie, Optimality and stability study of timing-driven placement algorithms,
in Proceedings of the I EEE/ACM International Conference on Computer-Aided Design, San Jose, CA,
p. 472, 2003.
20. C. -L. E. Cheng, Risa: Accurate and efficient placement routability modeling, in Proceedings of the
IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, pp. 690–695, 1994.
21. A. Marquardt, V. Betz, and J.Rose,Timing-drivenplacement for FPGAs, in Proceedings of the ACM/SIGDA
International Symposium on Field Prog rammable Gate Arrays, Monterey, CA, pp. 203–213, 2000.
22. S. Nag and R. A. Rutenbar, Performance-driven simultaneous placement and routing for FPGA’s IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 17(6): 499–518, 1998.
23. P. Maidee, C. Ababei, and K. Bazargan, Timing-driven partitioning-based placement for island style
FPGAs, IEEE Transactions on Co mputer-Aided Design of Inte grated Circuits and Systems (TCAD), 24(3):
395–406, 2005.
24. S. A. Senouci, A. Amoura, H. Krupnova, and G. Saucier, Timing driven floorplanning on programmable
hierarchical targets, in Proceedings of the ACM/SIGDA International Symposium on Field P rogrammable
Gate Arrays
, Monterey, CA, pp. 85–92, 1998.
25.
M.
Hutton, K. Adibsamii, and A. Leaver, Timing-driven placement for hierarchical programmable logic
devices, in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays,
Monterey, C A, pp. 3–11, 2001.
26. G. Chen and J. Cong, Simultaneous timing-driven placement and duplication, in Proceedings of the
ACM/SIGDA International Symposiumon Field Programmable Gate Arrays,Monterey, CA,pp. 51–59,2005.
Alpert/Handbook of Algorithms for Physical Design Automation AU7242_C046 Finals Page 984 9-10-2008 #29
984 Handbook of Algorithms for Physical Design Automation
27. D. P. Singh and S. D. Brown, Incremental placement for layout-driven optimizations on FPGAs, in

Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA,
pp. 752–759, 2002.
28. S. -W. Hur and J. Lillis, Mongrel: Hybrid techniques for standard cell placement, in Proceedings of the
IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, pp. 165–170, 2000.
29. T. J. Callahan, P. Chong, A. DeHon, and J. Wawrzynek, Fast module mapping and placement for datapaths in
FPGAs, in Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays,
Monterey, C A, pp. 123–132, 1998.
30. C. Ababei and K. Bazargan, Non-contiguous linear placement for reconfigurable fabrics, International
Journal of Embedded S ystems (IJES)—esp. issue on Reconfigurable Architectures Workshop (RAW),
2(1/2): 86–94, 2006.
31. M. Hutton, Y. Lin, and L. He, Placement and timing for FPGAs considering variations, in FPL’06: Pro-
ceedings of the 2006 International Confer ence on Field Programmable Logic and Applications,Madrid,
2006.
32. L. Cheng, J. Xiong, L. He, and M. Hutton, FPGA performance optimization via chipwise placement
considering process variations, in FPL’06: P roceedings of the 2006 International Confer ence on Field
Programmable Logic and Applications, Madrid, 2006.
33. C. Visweswariah, K. Ravindran, K. Kalafala, S. G. Walker, S. Narayan, D. K. Beece, J. Piaget, N.
Venkateswaran, and J. G. Hemmett, First-order incremental block-based statistical timing analysis, IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems, 25: 2170–2180, October 2006.
34. Y. Lin and L. He, Stochastic physical synthesis for FPGAs with pre-routing interconnect uncertainty and
process variation, in FPGA ’07: Proceedings of the 2007 ACM/SIGDA 15th International Symposium on
Field Programmable Gate Arrays, pp. 80–88, ACM Press, New York, 2007.
35. A. Gayasen, Y. Tsai, N. V ijaykrishnan, M. Kandemir, M. J. Irwin, and T. Tuan, Reducing leakage energy
in fpgas using region-constrained placement, in Proceedings of the ACM/SIGDA International Symposium
on Field Programmable Gate Arrays, M onterey, CA, pp. 51–58, 2004.
36. Y. Lin and L. He, Leakage efficient chip-level dual-vdd assignment with time slack allocation for FPGA
power reduction, in Proceedings of the ACM/IEEE Design Automation Conference, Anaheim, CA, pp. 720–
725, 2005.
37. L. McMuchie and C. Ebeling, Pathfinder: A negotiation-based performance-driven router for FPGAs, in
Pr oceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Monterey,

CA, pp. 473–482, 1995.
38. Y. -W. C hang, K. Zhu, and D. F. Wong, Timing-driven routing for symmetrical array-based FPGAs, ACM
Transactions on Design Automation of Electronic Systems, 5(3): 433–450, 2000.
39. G. -J. Nam, K. A . Sakallah, and R. A. Rutenbar, A new FPGA detailed routing approach via search-based
Boolean satisfiability, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
(TCAD), 21(6): 674–684, 2002.
40. J. -M. Lin, S. -R. Pan, and Y. -W. Chang, Graph matching-based algorithms for array-based FPGA seg-
mentation design and routing, in Proceedings of the Asia-South Pacific Design Automation Conference,
Kitakyushu, Japan, pp. 851–854, 2003.
41. N. Sherwani, Algorithms for VLSI Physical D esign Automation, 2 edn. Kluwer Academic Publishers,
Boston, MA, 1995.
42. K. Eguro and S. H auck, Armada: Timing-driven pipeline-aw are routing for FPGAs, in Proceedings of
the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, Monterey, CA, pp. 169–
178, 2006.
43. P. Kannan, S. Balachandran, and D. Bhatia, On metrics for comparing routability estimation methods for
FPGAs, in Proceedings of the ACM/IEEE Design A u tomation Confer ence, New Orleans, LA, pp. 70–
75, 2002.
44. S. Sivaswamy and K. Bazargan, Variation-aware routing for FPGAs, in FPGA ’07: Proceedings of the 2007
ACM/SIGD A 15th International Symposium on Field Progra mmable G ate Arrays, pp. 71–79, ACM Press,
New York 2007.
45. Y. Matsumoto, M. Hioki, T. Kawanami, T. Tsutsumi, T. Nakagawa, T. Sekigawa, and H. K oike, Performance
and yield enhancement of FPGAs with within-die variation using multiple configurations, in FPGA ’07:
Pr oceedings of the 2007 ACM/SIGDA 15th International Symposium on Field Pro grammable Gate Arrays,
pp. 169–177, ACM Press, New York 2007.
47 Physical Design for Three-Dimensional Circuits

Kia Bazargan and Sachin S. Sapatnekar

CONTENTS
47.1 Introduction
47.2 Standard Cell-Based Designs
47.2.1 Thermal Vias
47.2.2 3D Floorplanning
47.2.3 3D Placement
47.2.4 Routing Algorithms
47.3 3D FPGA Designs
47.3.1 Estimation Methods
47.3.2 Placement and Routing Algorithms
47.3.2.1 Partitioning the Circuit between Tiers
47.3.2.2 Partitioning-Based Placement within Tiers
47.3.2.3 Simulated Annealing Placement Phase
References
47.1 INTRODUCTION
Recent advances in process technology have brought three-dimensional (3D) circuits to the realm of
reality. This new design paradigm will require a major change from contemporary design
methodologies, because an optimal 3D design has very different characteristics from an optimal 2D design.
The move from conventional 2D to 3D is inherently a topological change, and therefore, many of
the problems that are unique to 3D circuits lie in the domain of physical design.
The essential idea of a 3D circuit is to place multiple tiers of active devices (transistors) above
each other, as opposed to a conventional 2D circuit where all transistors and gates lie in a single tier.
An example of a 3D circuit is shown in Figure 47.1.
One of the primary motivators for 3D technologies is related to the dominant effects of
interconnects in nanoscale technologies, and the addition of a third dimension provides significant relief in
this respect. This is achieved by reductions in the average interconnect lengths (in comparison with
2D implementations, for the same circuit size), by lower wire congestion, and by denser
integration, which results in the replacement of chip-to-chip interconnections by intrachip connections. In
addition, the increased packing density improves the computation per unit volume.
FIGURE 47.1 Schematic of a 3D integrated circuit. (Figure labels: Tier 1 to Tier 4, devices, intratier
wires, intertier via, silicon substrate.)

For instance, Figure 47.2 shows a 2D layout on a chip of dimension 2L × 2L on the left, where
the longest (nondetoured) wire, going from one end of the layout to the other, has a length of 4L.
If this design is built on four tiers, as shown at right, assuming the same total silicon area and a
square aspect ratio for each tier, the silicon area in each tier is L × L. Therefore, the longest possible
undetoured wirelength, going from one end in the lowest tier to the other end in the uppermost tier,
is approximately 2L (because the intertier thickness is negligible). Because, for a buffered two-pin
interconnect, the delay of a wire is proportional to its length, this implies that the delay is halved.
Moreover, the reduced wire lengths also reduce the likelihood of congestion bottlenecks, potentially
reducing the need to detour wires. A more precise distribution of the wirelength has been reported
in Ref. [1], which shows that the histogram of wirelength distributions moves progressively to the
left as the number of tiers is increased.
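The arithmetic of this example can be checked with a short Python sketch. It generalizes the four-tier case to any number of equal, square tiers, which is an extrapolation for illustration only: the helper name and the assumptions of equal tier area, a square aspect ratio, Manhattan routing, and negligible intertier thickness are taken from the example above rather than from any tool.

```python
import math


def max_undetoured_wirelength(side_2d, tiers):
    """Longest corner-to-corner Manhattan wire after splitting a square 2D
    layout of side `side_2d` into `tiers` square tiers of equal total area.

    Assumption: the intertier thickness is negligible, so only the lateral
    distance within one tier footprint counts.
    """
    tier_side = side_2d / math.sqrt(tiers)  # same total silicon area
    return 2.0 * tier_side                  # one tier side in x plus one in y


if __name__ == "__main__":
    L = 1.0
    for n in (1, 2, 4):
        print(n, max_undetoured_wirelength(2 * L, n))
```

With L = 1, one tier gives 4L and four tiers give 2L, matching the halving of the delay of a buffered wire argued in the text.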
In addition, 3D designs can result in new paradigms, for example, heterogeneous integration,
where each tier could be a different material (e.g., a silicon-based circuit on one tier and a GaAs-based
circuit on another). Even for purely silicon-based circuits, 3D designs permit analog/RF and digital
circuits to be built on different tiers, which improves their noise behavior; additionally, it is possible
to construct shielding structures such as Faraday cages between tiers for enhanced noise reduction.
Various flavors of 3D technologies have been proposed and are in use. One of the simplest forms
involves wafer stacking, where the distance between active devices in the third dimension (or the
"z dimension") equals the thickness of a wafer. However, the thickness of a wafer is of the order of
several hundreds of microns, and the full potential of 3D is not achieved by this approach due to
the long distance that a wire must traverse in the z dimension. Further progress has resulted in the
development of integrated 3D circuits in industrial [2], government [3], and academic [4] settings,
which have demonstrated 3D designs with intertier separations of the order of a few microns.
Today, it is only possible to build a few tiers in the third dimension, as a result of which many
of these technologies are often referred to as 2.5D rather than fully 3D. Nevertheless, even the half
dimension can provide the potential for substantial performance improvements, and perhaps future
technological improvements will enable truly 3D integration.
In this chapter, we present an overview of physical design technologies for 3D circuits. We begin
with a brief overview of a typical 3D technology, and then discuss physical design problems in the
custom/ASIC design as well as the FPGA paradigms. Generally speaking, the number of tiers is
taken in as a technology input by the 3D tools described in this chapter.
FIGURE 47.2 Comparison of the maximum wirelength in a 2D layout of dimension 2L × 2L (left) and in
its 3D counterpart with tiers of dimension L × L (right). For clarity, the intertier thicknesses in the
3D circuit are shown to be exaggeratedly large.
47.2 STANDARD CELL-BASED DESIGNS
A typical cell-based flow begins with a floorplanning step, where the system is laid out at the level
of macroblocks, followed by detailed placement of the cells in the layout, and routing. In the 3D
context, each of these must be modified to adapt to the constraints imposed by 3D circuits. In addition
to conventional metrics, 3D-specific geometrical considerations must be used, for example, for
wirelength metrics. In addition, temperature is treated as a first-class citizen during these
optimizations. Moreover, intertier via reduction is considered to be a desirable goal, because the number of
available vias is restricted and must be shared between signal nets and supply and clock nets.
In addition to floorplanning, placement, and routing, a 3D-specific optimization that makes the
temperature distribution more uniform is the judicious positioning of thermal vias within the layout.
These vias correspond to intertier metal connections that have no electrical function, but instead
constitute a passive cooling technology that draws heat from the problem areas to the heat sink.
Thermal via positioning can be built into each of these steps or performed as an independent
postprocessing step, depending on the design methodology.
It is instructive to view the result of a typical 3D thermally aware placement [5]: a layout for
the benchmark circuit, IBM01, in a four-tier 3D process, is displayed in Figure 47.3. The cells are
positioned in ordered rows on each tier, and the layout in each individual tier looks similar to a 2D
standard cell layout. The heat sink is placed at the bottom of the 3D chip, and the lighter shaded
regions are hotter than the darker shaded regions. The coolest cells are those in the bottom tier, next
to the heat sink, and the temperature increases as we move to higher tiers. The thermal placement
method consciously mitigates the temperature by making the upper tiers sparser, in terms of the
percentage of area populated by the cells, than the lower tiers.
47.2.1 THERMAL VIAS
Although silicon is a good thermal conductor, with half or more of the conductivity of typical metals,
many of the materials used in 3D technologies are strong insulators that place severe restrictions on
the amount of heat that can be removed, even under the best placement solution. These materials include
the epoxy bonding materials used to attach 3D tiers, field oxide, and the insulator in an SOI technology.
Therefore, the use of deliberate metal lines that serve as heat-removing channels, called thermal
vias, is an important ingredient of the total thermal solution. The second step in the flow determines
the optimal positions of thermal vias in the placement that provide an overall improvement in the
temperature distribution. In realistic 3D technologies, the footprints of these intertier vias are of the
order of 5 µm × 5 µm.

FIGURE 47.3 Placement for the benchmark ibm01 in a four-tier 3D technology; the shading runs from
cool to hot. (From Ababei, C., et al., IEEE Design and Test, 22, 520, 2005. Copyright IEEE. With
permission.)

Note: A description of techniques for thermal analysis is provided in Section 3.4.
In principle, the problem of placing thermal vias can be viewed as one of determining one of two
conductivities (corresponding to the presence or absence of metal) at every candidate point where a
thermal via may be placed in the chip. However, in practice, it is easy to see that such an approach
could lead to an extremely large search space that is exponential in the number of possible positions;
note that the set of possible positions in itself is extremely large.
Quite apart from the size of the search space, such an approach is unrealistic for several other
reasons. First, the wanton addition of thermal vias in any arbitrary region of the layout would lead
to nightmares for a router, which would have to navigate around these blockages. Second, from a
practical standpoint, it is unreasonable to perform full-chip thermal analysis, particularly in the inner
loop of an optimizer, at the granularity of individual thermal vias. At this level of detail, individual
elements would have to correspond to the size of a thermal via, and the size of the thermal simulation
matrix would become extremely large.
Fortunately, there are reasonable ways to overcome each of these issues. The blockage problem
may be controlled by enforcing discipline within the design, designating a specific set of areas within
the chip as potential thermal via sites. These could be chosen as specific interrow regions in the
cell-based layout, and the optimizer would determine the density with which these are filled with thermal
vias. The advantage to the router is obvious, because only these regions are potential blockages, which
is much easier to handle. To control the finite element analysis (FEA) stiffness matrix size, one could
work with a two-level scheme with relatively large elements, where the average thermal conductivity
of each region is a design variable. Once this average conductivity is chosen, it could be translated back
into a precise distribution of thermal vias within the element that achieves that average conductivity.
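One simple way to picture the second level of this scheme is a linear mixing model, sketched below in Python. This is an assumption made for illustration, not the translation used in the published methods: it treats the vertical conductivity of a region as the area-weighted average of the via metal and the background material, and the function names and all numeric values are hypothetical.

```python
import math


def via_density_for_conductivity(k_target, k_background, k_via):
    """Fraction of a region's area that must be via metal so that the
    area-weighted average conductivity equals k_target.

    Assumed model: k_avg = d * k_via + (1 - d) * k_background.
    """
    if k_via <= k_background:
        raise ValueError("via material must conduct better than the background")
    if not k_background <= k_target <= k_via:
        raise ValueError("target conductivity is not reachable by mixing")
    return (k_target - k_background) / (k_via - k_background)


def via_count(density, region_area, via_footprint_area):
    """Number of vias of a given footprint needed to reach the area density."""
    return math.ceil(density * region_area / via_footprint_area)


if __name__ == "__main__":
    # Hypothetical numbers: a 100 x 100 um via region and 5 x 5 um vias.
    d = via_density_for_conductivity(k_target=100.75, k_background=1.0, k_via=400.0)
    print(d, via_count(d, region_area=100.0 * 100.0, via_footprint_area=5.0 * 5.0))
```

Working with one average conductivity per region keeps the number of design variables equal to the number of regions rather than the number of candidate via positions.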
Various published methods take different approaches to thermal via insertion. We now describe
an algorithm for post facto thermal via insertion [6]; other procedures that perform thermal via
insertion during floorplanning, placement, or routing are discussed in the appropriate sections.
For a given placed 3D circuit, an iterative method was developed in which, during each iteration,
the thermal conductivities of certain FEA elements (thermal via regions) are incrementally modified
so that thermal problems are reduced or eliminated. Thermal vias are generically added to elements
to achieve the desired thermal conductivities. The goal of this method is to satisfy given thermal
requirements using as few thermal vias as possible, that is, keeping the thermal conductivities as low
as possible.
The approach uses the finite element equations to determine a target thermal conductivity.
A key observation in this work is that the insertion of thermal vias is most useful in areas with
a high thermal gradient, rather than areas with a high temperature. Effectively, the thermal via acts
as a pipe that allows the heat to be conducted from the higher temperature region to the lower
temperature region; this, in turn, leads to temperature reductions in areas of high temperature.
This is illustrated in Figure 47.4, which shows the 3D layout of the benchmark struct, before
and after the addition of thermal vias. The hottest region is the center of the uppermost tier, and a
major reason for its elevated temperature is that the tier below it is hot. Adding thermal vias to
remove heat from the second tier, therefore, effectively also significantly reduces the temperature
of the top tier. For this reason, the regions where the insertion of thermal vias is most effective are
those that have high thermal gradients.
Therefore, the method in Ref. [6] employs an iterative update formula of the type

$$K_i^{\mathrm{new}} = K_i^{\mathrm{old}} \left( \frac{\left| g_i^{\mathrm{old}} \right|}{g_{i,\mathrm{ideal}}} \right), \quad i = x, y, z \qquad (47.1)$$

where $K_i^{\mathrm{new}}$ and $K_i^{\mathrm{old}}$ are, respectively, the new and old thermal conductivities in each
direction, after and before each iteration, $g_i^{\mathrm{old}}$ is the old thermal gradient, and $g_{i,\mathrm{ideal}}$ is a
heuristically selected ideal thermal gradient.
FIGURE 47.4 Thermal profile of struct before (left) and after (right) thermal via insertion, plotted over
the x, y, and z axes. The top four layers of the figure at right correspond to the four layers in the
figure at left. (From Goplen, B. and Sapatnekar, S. S., IEEE Transactions on Computer-Aided Design,
26, 692, 2006. Copyright IEEE. With permission.)
Each iteration begins with a distribution of the thermal vias; this distribution is corrected using the
above update formula, and the $K_i^{\mathrm{new}}$ value is then translated to a thermal via density, and then to a
precise layout of thermal vias, using precharacterization. The iterations end when the desired
temperature profile is achieved. This general iterative framework has also been used in several other
published techniques that insert thermal vias either concurrently during another optimization
(floorplanning, placement, or routing, as described in succeeding sections) or as an independent step.
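The iteration can be sketched in Python on a deliberately simplified model. The sketch assumes a single direction and a toy thermal model in which the gradient in each element is the heat flux through it divided by its conductivity; a real flow would obtain the gradients from an FEA solve instead. The clamping bounds, the function names, and the stopping rule are all hypothetical choices for illustration.

```python
def update_conductivity(k_old, g_old, g_ideal, k_min, k_max):
    """One application of the update K_new = K_old * |g_old| / g_ideal,
    clamped to the range of conductivities that vias can realize."""
    k_new = k_old * abs(g_old) / g_ideal
    return min(max(k_new, k_min), k_max)


def insert_vias_1d(flux, k_init, g_ideal, k_min, k_max, max_iters=20, tol=1e-6):
    """Iterate the update on a toy model where gradient = flux / conductivity."""
    k = list(k_init)
    for _ in range(max_iters):
        grads = [q / ki for q, ki in zip(flux, k)]  # stand-in for thermal analysis
        k_next = [update_conductivity(ki, gi, g_ideal, k_min, k_max)
                  for ki, gi in zip(k, grads)]
        if max(abs(a - b) for a, b in zip(k, k_next)) < tol:
            break  # conductivities have stopped changing
        k = k_next
    return k


if __name__ == "__main__":
    print(insert_vias_1d([100.0, 400.0, 50.0], [1.0, 1.0, 1.0],
                         g_ideal=100.0, k_min=1.0, k_max=3.0))
```

Elements whose gradient exceeds the ideal value receive a higher conductivity (more vias), while elements with a low gradient are driven back toward the minimum, which reflects the goal of using as few thermal vias as possible.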
47.2.2 3D FLOORPLANNING
The 3D floorplanning problem is analogous to the 2D problem discussed in Chapters 8 through 13,
with all the constraints and opportunities that arise with the move to the third dimension. Typical
cost functions include a mix of the conventional wirelength and total area costs, and the temperature
and the number of intertier vias.
The work in Ref. [7] presented one of the first approaches to 3D floorplanning, and used
the transitive closure graph (TCG) representation [8], described in Section 11.7, for each tier, and a
bucket structure for the third dimension. Each bucket represents a 2D region over all tiers, and stores,
for each tier, the indices of the blocks that intersect that bucket. Together, the TCG and this
bucket structure can quickly determine any adjacency information. A simulated annealing engine is
then utilized, with the moves corresponding to perturbations within a tier and across tiers; in each
such case, the corresponding TCGs and buckets are updated, as necessary.
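A bucket structure of this kind might look like the following Python sketch. It is a minimal illustration under stated assumptions, not the data structure of Ref. [7]: the class name, the uniform grid of square buckets, and the method names are hypothetical, and the TCG bookkeeping within each tier is omitted.

```python
import math
from collections import defaultdict


class TierBuckets:
    """Uniform 2D grid; each bucket stores, per tier, the blocks intersecting it."""

    def __init__(self, bucket_size, num_tiers):
        self.bucket_size = bucket_size
        self.buckets = defaultdict(lambda: [set() for _ in range(num_tiers)])
        self.where = {}  # block id -> (tier, bucket keys it occupies)

    def _keys(self, x, y, w, h):
        s = self.bucket_size
        xs = range(math.floor(x / s), math.ceil((x + w) / s))
        ys = range(math.floor(y / s), math.ceil((y + h) / s))
        return [(bx, by) for bx in xs for by in ys]

    def insert(self, block, tier, x, y, w, h):
        keys = self._keys(x, y, w, h)
        for key in keys:
            self.buckets[key][tier].add(block)
        self.where[block] = (tier, keys)

    def remove(self, block):
        tier, keys = self.where.pop(block)
        for key in keys:
            self.buckets[key][tier].discard(block)

    def candidates_in_other_tiers(self, block):
        """Blocks on other tiers that share at least one bucket with `block`."""
        tier, keys = self.where[block]
        found = set()
        for key in keys:
            for other_tier, blocks in enumerate(self.buckets[key]):
                if other_tier != tier:
                    found |= blocks
        return found


if __name__ == "__main__":
    grid = TierBuckets(bucket_size=10.0, num_tiers=2)
    grid.insert("A", 0, 0, 0, 10, 10)
    grid.insert("B", 1, 5, 5, 10, 10)
    print(grid.candidates_in_other_tiers("A"))
```

An annealing move that relocates a block would call remove followed by insert, so the cost of an update depends on the number of buckets the block covers rather than on the total number of blocks.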
A simple thermal analysis procedure is built into this solution, using a finite difference
approximation of the thermal network to build an RC thermal network. Under the assumption that heat
flows purely in the z direction and there is no lateral heat conduction, the RC model obtained from a
finite difference approximation has a tree structure, and Elmore-like computations (Section 47.3.1)
can be performed to determine the temperature. The optimization heuristically attempts to make
this a self-fulfilling assumption, by discouraging lateral heat conduction, introducing a cost function
parameter that discourages strong horizontal gradients. A hybrid approach performs an exact thermal
analysis once every 20 iterations or so and uses the approximate approach for the other iterations.
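Under the purely vertical heat flow assumption, the temperature of one column of the chip can be computed in a single pass, much like an Elmore delay sum. The Python sketch below is a steady-state, resistance-only simplification written for illustration: the function name, the choice of tier 0 as the tier adjacent to the heat sink, and the numeric values are assumptions, and thermal capacitances are ignored.

```python
def column_temperatures(powers, resistances, t_sink=0.0):
    """Steady-state temperature of each tier in one vertical column.

    powers[i]      -- power dissipated in tier i (tier 0 is next to the heat sink)
    resistances[i] -- thermal resistance between tier i and the tier below it
                      (or the heat sink, for tier 0)

    Assumption: heat flows only downward, so the heat crossing resistance i is
    the total power of tier i and every tier above it.
    """
    temps = []
    heat_through = sum(powers)
    t = t_sink
    for p, r in zip(powers, resistances):
        t += r * heat_through  # temperature rise across this resistance
        temps.append(t)
        heat_through -= p      # tier i's power does not cross the next resistance
    return temps


if __name__ == "__main__":
    print(column_temperatures([2.0, 3.0], [1.0, 1.0]))
```

For two tiers with powers 2 and 3 and unit resistances, the temperatures rise to 5 and 8 above the sink, which is consistent with upper tiers running hotter than tiers near the heat sink.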
The work in Ref. [9] expands the idea of thermally driven floorplanning by integrating thermal
via insertion into the simulated annealing procedure. A thermal analysis procedure based on random
walks [10] is built into the method, and an iterative formula, similar to Ref. [6], is used in a thermal-via
insertion step between successive simulated annealing iterations.
47.2.3 3D PLACEMENT
In the placement step, the precise positions of cells in a layout are determined, and they are arranged
in rows within the tiers of the 3D circuit. Because thermal considerations are particularly important
in 3D cell-based circuits, this procedure must spread the cells to achieve a reasonable temperature
distribution, while also capturing traditional placement requirements.
Several approaches to 3D placement have been proposed in the literature. The work in Ref. [11]
embeds the netlist hypergraph into the layout area. A recursive bipartitioning procedure is used to
assign nodes of the hypergraph to partitions, using mincut as the primary objective, under partition
capacity constraints. Partitioning in the z direction corresponds to tier assignment, and xy partitions
to assigning standard cells to rows. No thermal considerations are taken into account.
The procedure in Ref. [5] presents a 3D-specific force-directed placer that incorporates thermal
objectives directly into the placer. Instead of the finite difference method that is used in many
floorplanners, this approach employs FEA, which discretizes the design space into regions known
as elements. For rectangular structures of the type encountered in integrated circuits, a rectangular
cuboidal element can simulate heat conduction in the lateral directions without aberrations in the
prime directions. As described in Chapter 3, FEA results in a matrix of the type
KT = P (47.2)
where T is the vector of nodal temperatures and P is the vector of nodal heat sources and
boundary-condition terms. The left-hand-side matrix, K, known as the global stiffness matrix, can be
constructed using stamps for the finite elements and the boundary conditions. The FEA equations are solved
rapidly using an iterative linear solver, with clever adjustments of the convergence criteria to achieve
greater or lesser accuracy, as required at different stages of the iterative placement process.
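The effect of adjusting the convergence criterion can be seen in a small Python sketch. The source does not name the iterative solver, so Gauss-Seidel is used here only as a simple stand-in; the function name, the tolerance values, and the 2 × 2 matrix are hypothetical.

```python
def gauss_seidel(K, P, tol=1e-8, max_iters=10000):
    """Solve K T = P iteratively; stop when no entry of T moves by more than tol.

    Returns the solution estimate and the number of sweeps performed.
    """
    n = len(P)
    T = [0.0] * n
    for sweep in range(1, max_iters + 1):
        max_change = 0.0
        for i in range(n):
            off_diag = sum(K[i][j] * T[j] for j in range(n) if j != i)
            new_value = (P[i] - off_diag) / K[i][i]
            max_change = max(max_change, abs(new_value - T[i]))
            T[i] = new_value
        if max_change < tol:
            return T, sweep
    return T, max_iters


if __name__ == "__main__":
    K = [[2.0, -1.0], [-1.0, 2.0]]  # small symmetric positive definite example
    P = [1.0, 1.0]
    print(gauss_seidel(K, P, tol=1e-2))   # coarse answer, few sweeps
    print(gauss_seidel(K, P, tol=1e-10))  # accurate answer, more sweeps
```

A loose tolerance may be adequate in early placement iterations, when cell positions are still changing substantially, with a tight tolerance reserved for the later iterations.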
The placement engine is based on a force-directed approach, the key idea of which is described
in Chapter 18. Attractive forces are created between interconnected cells, and these are proportional
to the quadratic function of the cell coordinates that represents the Euclidean distance between the
blocks. The constants of proportionality are chosen to be higher in the z direction to discourage
intertier vias.
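A minimal Python sketch of such a direction-weighted quadratic attraction is given below. It assumes two-pin connections only, and the weight of 4 in the z direction is an arbitrary illustrative value; the function names are hypothetical and the formulation in Ref. [5] may differ in detail.

```python
def attraction_energy(connections, pos, weights=(1.0, 1.0, 4.0)):
    """Sum of weighted squared distances over all two-pin connections.

    weights holds the constants for x, y, and z; a larger z weight makes
    separating connected cells across tiers more expensive.
    """
    total = 0.0
    for a, b in connections:
        total += sum(w * (pa - pb) ** 2
                     for w, pa, pb in zip(weights, pos[a], pos[b]))
    return total


def attraction_force(cell, connections, pos, weights=(1.0, 1.0, 4.0)):
    """Force on `cell`: the negative gradient of the energy above."""
    force = [0.0, 0.0, 0.0]
    for a, b in connections:
        if a == b or cell not in (a, b):
            continue
        other = b if cell == a else a
        for k in range(3):
            force[k] += 2.0 * weights[k] * (pos[other][k] - pos[cell][k])
    return tuple(force)


if __name__ == "__main__":
    pos = {"a": (0.0, 0.0, 0.0), "b": (1.0, 1.0, 1.0)}
    nets = [("a", "b")]
    print(attraction_energy(nets, pos), attraction_force("a", nets, pos))
```

In this example the pull on cell a is four times stronger in z than in x or y for the same separation, so the minimum-energy solution tends to keep connected cells on the same tier.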
Apart from design criteria such as cell overlap, in the 3D context, thermal criteria are also used
to generate repulsive forces, to prevent hot spots. The temperature gradient (which itself can be
related to the stiffness matrix and its derivative) is used to determine the magnitudes and directions
of these forces.
Once the entire system of attractive and repulsive forces is generated, the system is solved for
the minimum-energy state, that is, the equilibrium location. Ideally, this
minimizes the wirelengths while at the same time satisfying the other design criteria such as the
temperature distribution. The main loop of the iterative force-directed approach proceeds as
follows. First, forces are updated based on the previous placement. Using these new forces,
the cell positions are then calculated. These two steps of calculating forces and finding cell positions
are repeated until the exit criteria are satisfied. The specifics of the force-directed approach to thermal
placement, including the mathematical details, are presented in Ref. [5]. Once the iterations converge,
a final postprocessing step is used to legalize the placement. Even though forces have been added
to discourage overlaps, the force-directed engine solves the problem in the continuous domain, and
the task of legalization is to align cells to tiers, and to rows within each tier.
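
The two-step loop and the final snapping step can be sketched as follows. This is a simplified stand-in, not the procedure of Ref. [5]: it assumes two-pin nets, uses only attractive spring forces, moves cells by small gradient steps instead of solving a linear system, and legalizes by rounding without resolving overlaps within a row. The names place, legalize, step, tol, and row_height are hypothetical.

```python
def place(positions, nets, fixed, step=0.1, tol=1e-6, max_iter=10000):
    """Alternate force updates and position updates until cells stop moving."""
    for _ in range(max_iter):
        # Step 1: update forces from the previous placement.
        forces = {c: [0.0, 0.0, 0.0] for c in positions}
        for a, b in nets:
            for k in range(3):
                d = positions[b][k] - positions[a][k]
                forces[a][k] += d
                forces[b][k] -= d
        # Step 2: compute new cell positions from the forces.
        moved = 0.0
        updated = {}
        for c, p in positions.items():
            if c in fixed:
                updated[c] = p
                continue
            q = tuple(p[k] + step * forces[c][k] for k in range(3))
            moved = max(moved, max(abs(q[k] - p[k]) for k in range(3)))
            updated[c] = q
        positions = updated
        if moved < tol:           # exit criterion
            break
    return positions


def legalize(positions, num_tiers, row_height):
    """Snap continuous z to the nearest tier and y to the nearest row."""
    legal = {}
    for c, (x, y, z) in positions.items():
        tier = min(num_tiers - 1, max(0, round(z)))
        row = round(y / row_height)
        legal[c] = (x, row * row_height, tier)
    return legal


cells = {"a": (0.0, 0.0, 0.0), "b": (4.0, 4.0, 2.0), "c": (1.0, 0.0, 0.0)}
placed = place(cells, nets=[("a", "c"), ("c", "b")], fixed={"a", "b"})
legal = legalize(placed, num_tiers=3, row_height=1.2)
```

In this toy instance the movable cell settles midway between its two fixed neighbors, and legalization then assigns it to the middle tier and the nearest row.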
Another method in Ref. [12] maps an existing 2D placement to a 3D placement through
transformations based on dividing the layout into 2^k regions, for integer values of k, and then defining
local transformations to heuristically refine the layout.
More recent work in Ref. [13] observes that because 3D layouts have very limited flexibility in
the third dimension (with a small number of layers and a fixed set of discrete locations), partitioning
works better than a force-directed method. Accordingly, this work performs global placement using
recursive bisectioning. Thermal effects are incorporated through thermal resistance reduction nets,
which are attractive forces that induce high-power nets to remain close to the heat sink. The global
placement step is followed by coarse legalization, in which a novel cell-shifting approach is proposed.
This generalizes the methods in FastPlace, described in Chapter 18, by allowing shift moves to adjust
the boundaries of both sparsely and densely populated cells using a computationally simple method.
Finally, detailed legalization generates a final nonoverlapping layout. The approach is shown to
provide excellent trade-offs between parameters such as the number of interlayer vias, wirelength,
and temperature.
47.2.4 ROUTING ALGORITHMS
During routing, several objectives and constraints must be taken into consideration, including
avoiding blockages due to areas occupied by thermal vias, incorporating the effect of temperature on the
delays of the routed wires, and of course, traditional objectives such as wirelength, timing, congestion,
and routing completion.
Once the cells have been placed and the locations of the thermal vias determined, the routing
stage finds the optimal interconnections between the wires. As in 2D routing, it is important to
optimize the wirelength, the delay, and the congestion. In addition, several 3D-specific issues come
into play. First, the delay of a wire increases with its temperature, so that more critical wires should
avoid the hottest regions, as far as possible. Second, intertier vias are a valuable resource that must
be optimally allocated among the nets. Third, congestion management and blockage avoidance are
more complex with the addition of a third dimension. For instance, a signal via or thermal via that
spans two or more tiers constitutes a blockage that wires must navigate around.
Consider the problem of routing in a three-tier technology, as illustrated in Figure 47.5. The
layout is gridded into rectangular tiles, each with a horizontal and vertical capacity that determines
the number of wires that can traverse the tile, and an intertier via capacity that determines the number
of free vias available in that tile. These capacities account for the resources allocated for nonsignal
wires (e.g., power and clock wires) as well as the resources used by thermal vias. For a single net,
as shown in the figure, the degrees of freedom that are available are in choosing the locations of the
intertier vias, and selecting the precise routes within each tier. The locations of intertier vias will
depend on the resource contention for vias within each grid. Moreover, critical wires should avoid
the high-temperature tiles, as far as possible.
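
One possible way to represent the tile model described above is sketched below. The Tile class, the tile_cost function, and the cost formula (a base cost plus a congestion term plus a temperature term scaled by net criticality) are assumptions made for illustration; the text does not specify a cost function.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Tile:
    # Capacities left for signal routing after nonsignal wires and thermal
    # vias are accounted for, e.g. {"h": 4, "v": 4, "via": 2}.
    capacity: dict
    temperature: float
    used: dict = field(default_factory=lambda: {"h": 0, "v": 0, "via": 0})

    def free(self, kind):
        """Remaining capacity of one resource kind: 'h', 'v', or 'via'."""
        return self.capacity[kind] - self.used[kind]

    def occupy(self, kind):
        """Consume one unit of a resource, refusing to exceed capacity."""
        if self.free(kind) <= 0:
            raise ValueError(f"no free {kind} resource in this tile")
        self.used[kind] += 1


def tile_cost(tile, kind, criticality, thermal_weight=0.01):
    """Hypothetical routing cost: full tiles are blocked, hot tiles cost more
    for critical nets (criticality in [0, 1])."""
    if tile.free(kind) <= 0:
        return math.inf
    congestion = tile.used[kind] / tile.capacity[kind]
    return 1.0 + congestion + thermal_weight * criticality * tile.temperature


hot = Tile(capacity={"h": 2, "v": 2, "via": 1}, temperature=90.0)
cost_critical = tile_cost(hot, "h", criticality=1.0)
cost_noncritical = tile_cost(hot, "h", criticality=0.0)
hot.occupy("via")                 # the single free intertier via is now taken
cost_blocked = tile_cost(hot, "via", criticality=0.0)
```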
The work in Ref. [14] presents a thermally conscious router, using a multilevel routing paradigm
similar to Refs. [15,16], with integrated intertier via planning and incorporating thermal
considerations. An initial routing solution is constructed by building a 3D minimum spanning tree (MST) for
each multipin net, and using maze routing to avoid obstacles.
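
The MST construction step can be illustrated with Prim's algorithm over the pins of one net. The sketch assumes a Manhattan distance in each tier plus a fixed penalty per tier crossing; the function mst_3d and the parameter via_cost are hypothetical, and the maze routing that follows in Ref. [14] is not shown.

```python
def mst_3d(pins, via_cost=3.0):
    """Prim's algorithm on pins given as (x, y, tier) tuples.

    Returns a list of edges (parent_index, child_index, length).
    """
    def dist(a, b):
        # Lateral Manhattan distance plus an assumed cost per tier crossing.
        return abs(a[0] - b[0]) + abs(a[1] - b[1]) + via_cost * abs(a[2] - b[2])

    # best[i] = (cheapest known connection of pin i to the tree, tree pin used)
    best = {i: (dist(pins[0], pins[i]), 0) for i in range(1, len(pins))}
    edges = []
    while best:
        i = min(best, key=lambda k: best[k][0])
        d, parent = best.pop(i)
        edges.append((parent, i, d))
        # Pins outside the tree may now connect more cheaply through pin i.
        for j in best:
            dj = dist(pins[i], pins[j])
            if dj < best[j][0]:
                best[j] = (dj, i)
    return edges


pins = [(0, 0, 0), (2, 0, 0), (2, 1, 1), (0, 3, 0)]
tree = mst_3d(pins)
total_length = sum(length for _, _, length in tree)
```

Raising via_cost biases the tree toward connecting pins within a tier before crossing tiers, which is one simple way to reflect the scarcity of intertier vias.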
At each level of the multilevel scheme, the intertier via planning problem assigns vias in a given
region at level k − 1 of the multilevel hierarchy to tiles at level k. The problem is formulated as
FIGURE 47.5 Example route for a net in a three-tier 3D technology, with tiers labeled Tier 1, Tier 2, and
Tier 3. (From Ababei, C., et al., IEEE Design and Test, 22, 520, 2005. Copyright IEEE. With permission.)
