VLSI Design Automation
Maurizio Palesi
The Inverted Pyramid
Electronic Systems > $1 Trillion
Semiconductor > $220 B
CAD $3 B
IC Products
Processors
• CPU, DSP, controllers
Memory chips
• RAM, ROM, EEPROM
Analog
• Mobile communication, audio/video processing
Programmable
• PLA, FPGA
Embedded systems
• Used in cars, factories
• Network cards
System-on-chip (SoC)
Integrated Circuit Revolution
1958: First integrated circuit (germanium)
Built by Jack Kilby at Texas Instruments:
Contained five components: transistors, resistors,
and capacitors
Integrated Circuit Revolution
1971: Intel 4004 Microprocessor
Clock speed: 108 kHz
# Transistors: 2,300
# I/O pins: 16
Technology: 10µm
Integrated Circuit Revolution
2000: Intel Pentium 4 Processor
Clock speed: 1.5 GHz
# Transistors: 42 million
Technology: 0.18µm CMOS
Integrated Circuit Revolution
2006: Intel Core 2 Duo
Clock speed: up to 2.66 GHz
# Transistors: 291 million
Technology: 65nm CMOS
Moore’s Law
Gordon Moore predicted in 1965 that the number of transistors that can
be integrated on a die would double every year; in 1975 he revised the
period to two years. The 18-month figure often quoted is a later popularization.
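As a rough illustration of how sensitive the prediction is to the assumed doubling period (quoted variously as one year, 18 months, or two years), the Python sketch below compares the growth factor over a fixed horizon. The function name growth_factor and the 10-year horizon are illustrative assumptions, not part of the original slides.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Factor by which the transistor count grows after `years`
    if it doubles every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    horizon = 10  # years; an arbitrary horizon chosen for illustration
    for period in (1.0, 1.5, 2.0):
        factor = growth_factor(horizon, period)
        print(f"doubling every {period} years -> x{factor:.0f} in {horizon} years")
```

Over ten years the three periods give growth factors of about 1024, 102 and 32, so the choice of doubling period changes the projection by more than an order of magnitude.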
Semiconductor Growth
What Makes it Happen
Processor Power (Watts)
Intel Microprocessor Performance
Increasing Device and Context Complexity
Exponential increase in device complexity
• Increasing with Moore's law (or faster)!
More complex system contexts
• System contexts in which devices are deployed (e.g. cellular radio) are increasing in complexity
Require exponential increases in design productivity
We have exponentially more transistors!
Deep Submicron Effects
Smaller geometries are causing a wide variety of effects that we have largely ignored in the past:
• Cross-coupled capacitances
• Signal integrity
• Resistance
• Inductance
Design of each transistor is getting more difficult!
Heterogeneity on Chip
Greater diversity of on-chip elements
• Processors
• Software
• Memory
• Analog
More transistors doing different things!
Stronger Market Pressures
Decreasing design window
Less tolerance for design revisions
A Quadruple
Complexity
Time-to-market
Heterogeneity
DSM Effects
Exponentially more complex, greater design risk, greater variety, and a smaller design window!
How Are We Doing?
[Figure: logic transistors per chip (K) and productivity (transistors per staff-month) versus year, 1981 to 2009, on logarithmic axes. Complexity grows at a 58% per year compound rate, productivity at a 21% per year compound rate; the widening distance between the two curves is the productivity gap.]
Role of EDA: close the productivity gap
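To make the two compound rates on this slide concrete, the Python sketch below computes how fast the gap between complexity (58% per year) and productivity (21% per year) widens. The function names are illustrative assumptions; only the two rates come from the slide.

```python
import math

COMPLEXITY_GROWTH = 0.58    # 58% per year, from the slide
PRODUCTIVITY_GROWTH = 0.21  # 21% per year, from the slide

# Yearly ratio by which complexity outpaces productivity.
YEARLY_RATIO = (1 + COMPLEXITY_GROWTH) / (1 + PRODUCTIVITY_GROWTH)


def gap_factor(years: float) -> float:
    """Factor by which the complexity/productivity ratio grows after `years`."""
    return YEARLY_RATIO ** years


def gap_doubling_time() -> float:
    """Number of years after which the gap has doubled."""
    return math.log(2) / math.log(YEARLY_RATIO)


if __name__ == "__main__":
    print(f"gap after 10 years: x{gap_factor(10):.1f}")
    print(f"gap doubles every {gap_doubling_time():.1f} years")
```

Under these rates the gap doubles roughly every 2.6 years and grows by about a factor of 14 in a decade, which is the shortfall design automation is expected to absorb.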
The Evolution of Design Methodology
We are now entering the era of block-based design (BBD)
Yesterday: ASIC/ASSP design, system-board integration
• Bus standards, predictable, preverified
Today: IP/block authoring, system-chip integration
• VSI-compatible standards, predictable, preverified
Trends
An inexpensive 200 mm² die has 75M logic transistors, or 375M SRAM transistors
A 1000 mm² die would have 400M logic transistors
Core                    Transistor count
486DX4                  0.7 million
Pentium/MMX             2.8 million
MPEG-2 encoder          1.5 million
MPEG-2 decoder          0.5 million
8051 microcontroller    0.05 million
Up to 7500 8051 microcontrollers on a single chip!
150 – 750 cores on a single chip
Pentium/MMX
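The core counts on this slide are simple divisions of a transistor budget by a core size. The Python sketch below reproduces that arithmetic from the figures in the table; the dictionary and function names are illustrative assumptions, and the calculation ignores wiring, I/O and memory overhead.

```python
# Transistor budgets quoted on the slide (in transistors).
DIE_BUDGETS = {
    "200 mm2, logic": 75_000_000,
    "200 mm2, SRAM": 375_000_000,
    "1000 mm2, logic": 400_000_000,
}

# Core sizes from the slide's table (in transistors).
CORE_SIZES = {
    "486DX4": 700_000,
    "Pentium/MMX": 2_800_000,
    "MPEG-2 encoder": 1_500_000,
    "MPEG-2 decoder": 500_000,
    "8051 microcontroller": 50_000,
}


def cores_per_die(budget: int, core_size: int) -> int:
    """Number of whole cores that fit in a transistor budget."""
    return budget // core_size


if __name__ == "__main__":
    for die, budget in DIE_BUDGETS.items():
        for core, size in CORE_SIZES.items():
            print(f"{die:16s} {core:22s} {cores_per_die(budget, size):6d}")
```

Dividing the 375M budget by the 8051's 0.05 million transistors gives the 7500 quoted above. The 150 to 750 range is consistent with a 0.5-million-transistor core measured against the 75M and 375M budgets, although the slide does not state which core it refers to.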
The Evolution of SoC Platforms
General-purpose scalable RISC processor
• 50 to 300+ MHz
• 32-bit or 64-bit
Scalable VLIW media processor
• 100 to 300+ MHz
• 32-bit or 64-bit
Library of device IP blocks
• Image coprocessors
• DSPs
• UART
• 1394
• USB
System buses
• 32-128 bit
2 cores: Philips' Nexperia™ PNX8850 SoC platform for high-end digital video (2001)
Running Forward…
Four 350/400 MHz StarCore SC140 DSP extended cores
16 ALUs: 5600/6400 MMACS
1436 KB of internal SRAM & multilevel memory hierarchy
Internal DMA controller supports 16 TDM unidirectional channels
Two internal coprocessors (TCOP and VCOP) to provide special-purpose processing capability in parallel with the core processors
6 cores: Motorola's MSC8126 SoC platform for 3G base stations (late 2003)
What's Happening in SoCs?
Technology: no slow-down in sight!
• Faster and smaller transistors: 90 → 65 → 45 nm
• … but slower wires, lower voltage, more noise!
◦ 80% or more of the delay of critical paths will be due to interconnects
Design complexity: from 2 to 10 to 100 cores!
• Design reuse is essential
• … but differentiation/innovation is key for winning on the market!
Performance and power: GOPS for mWs!
• Performance requirements keep going up
• … but power budgets don't!
The Deep Submicron Effects
Communication Architectures
Shared bus
• Low area
• Poor scalability
• High energy consumption
Network-on-Chip
• Scalability and modularity
• Low energy consumption
• Increase of design complexity
IC Design Steps
Specifications
→ High-level description (behavioral VHDL, C)
→ Functional description (structural VHDL)
IC Design Steps
Specifications
→ High-level description
→ Functional description
→ (synthesis) Logic description, e.g. X = (AB*CD) + (A+D) + (A(B+C)), Y = (A(B+C) + AC + D + A(BC+D))
→ (technology mapping) Gate-level design
→ (physical design) Placed & routed design
→ Fabrication
→ Packaging
Design of Microelectronic Circuits
Design
• Idea
• Modeling
• Synthesis and optimization
• Validation
Fabrication
• Mask fabrication
• Wafer fabrication
Testing
• Tester
Packaging
• Slicing
• Packaging
Circuit Models
A model of a circuit is an abstraction
• A representation that shows relevant features without associated details
Circuit model (few details) → Synthesis → Circuit model (many details)
Model Classification
Models are classified by:
• Levels of abstraction: architectural, logic, geometrical
• Circuit views: behavioral, structural, physical
Levels of Abstraction
Architectural
• A circuit performs a set of operations, such as data computation or transfer
◦ HDL models, flow diagrams, …
Logic
• A circuit evaluates a set of logic functions
◦ FSMs, schematics, …
Geometrical
• A circuit is a set of geometrical entities
◦ Floor plans, layouts, …
Levels of Abstraction
Design consists of refining the abstract specification of the architectural model into the detailed geometrical-level model
[Figure: the three levels of abstraction (architectural, logic, geometrical); the architectural level is illustrated by the fragment PC = PC + 1; Fetch(PC); Decode(Inst);]
Views of a Model
Behavioral
• Describe the function of a circuit regardless of its implementation
Structural
• Describe a model as an interconnection of components
Physical
• Relate to the physical object (e.g., transistors) of a design
The Y-chart
[Figure: Gajski and Kuhn's Y-chart (Silicon Compilers, Addison-Wesley, 1987). Three axes (behavioral view, structural view, physical view) are crossed by the architectural, logic and geometrical levels.]
The Y-chart
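[Figure: the Y-chart annotated with examples. At the architectural level, the behavioral view is the fragment PC = PC + 1; Fetch(PC); Decode(Inst); and the structural view is an interconnection of MULT, ADD, RAM and CTRL blocks. At the logic level, the example is a state diagram with states S0 to S3. The geometrical level lies on the physical-view axis.]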
Synthesis
High-level synthesis (or architectural synthesis): from the behavioral view to the structural view at the architectural level
• Assignment to resources
• Interconnection
• Scheduling
Logic synthesis: at the logic level
• Interconnection of instances of library cells (technology mapping)
Physical design: at the geometrical level
• Physical layout of the chip (placement, routing)
Architectural Synthesis
Identify the resources that can implement the operations
Schedule the execution time of the operations
Bind the operations to the resources
Behavioral architectural-level circuit model → Architectural synthesis → Control unit + Datapath
Architectural Synthesis (Example)
Solve numerically (by means of the forward Euler method) the
differential equation y’’+3xy’+3y=0 in the interval [0,a] with step-size dx
and initial values x(0)=x; y(0)=y; y’(0)=u.
diffeq {
    read (x, y, u, dx, a);
    repeat {
        x1 = x + dx;
        u1 = u - (3*x*u*dx) - (3*y*dx);
        y1 = y + u*dx;
        c = x1 < a;
        x = x1; u = u1; y = y1;
    }
    until(c);
    write(y);
}
Behavioral View
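The behavioral description above can be tried out directly in software. The following Python sketch is a minimal transcription of the diffeq pseudocode; it assumes the loop is meant to keep stepping while x is still below a, which is an interpretation of the termination test rather than something the slide states. The function signature is likewise an illustrative choice.

```python
def diffeq(x: float, y: float, u: float, dx: float, a: float) -> float:
    """Forward Euler integration of y'' + 3*x*y' + 3*y = 0 from x up to a.

    u holds the current value of y'. Returns the final value of y.
    ASSUMPTION: the loop repeats while x < a.
    """
    while x < a:
        x1 = x + dx
        u1 = u - (3 * x * u * dx) - (3 * y * dx)
        y1 = y + u * dx
        x, u, y = x1, u1, y1
    return y


if __name__ == "__main__":
    # With y(0) = 1 and y'(0) = 0 the exact solution is y = exp(-1.5 * x**2),
    # so the printed value should approach exp(-1.5) as dx shrinks.
    print(diffeq(x=0.0, y=1.0, u=0.0, dx=1e-3, a=1.0))
```

Each loop iteration contains the same multiplications, additions, subtractions and comparison that architectural synthesis must later map onto hardware resources.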
Architectural Synthesis (Example)
Let us assume that the data path of the circuit contains two resources:
• 1 multiplier
• 1 ALU (add, sub, comparison)
[Figure: structural view. The data path (MUL, ALU, steering & memory) is driven by the control unit.]
Logic Synthesis
[Figure: logic synthesis takes an HDL description (PC = PC + 1; Fetch(PC); Decode(Inst);), an FSM (states S0 to S3) or a schematic (MULT, ADD, RAM, CTRL) and produces a gate-level netlist.]
Logic Synthesis (Example)
[Figure: state diagram of the control unit for the diffeq example, with states S1 to S9. S1 is the reset state (reading the data) and one state is annotated "writing the data". Transitions are labelled with the reset input r and the loop condition c (r, r', cr', c'r'). The data path (MUL, ALU, steering & memory) and the diffeq listing are shown alongside.]
Logic Synthesis (Example)
[Figure: logic synthesis turns the state diagram (S1 to S9, transitions labelled r, r', cr', c'r') into a sequential circuit: a state register with next-state function δ and output function λ, inputs r and c, with signals to and from the steering & memory module.]
Optimization
Quality measures
• Performance
◦ Combinational logic circuits
– Propagation delay through the critical path [sec]
◦ Synchronous sequential circuits
– Cycle time [sec]
– Latency [clock cycles]
– Execution time = Latency*Cycle time
– Throughput (for pipeline organization)
• Area
◦ Logic gates, registers, wiring
• Power
◦ Energy
– Battery life, system weight, ...
◦ Power
– Packaging, reliability, cost, ...
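The execution-time relation in the list above is a single product, sketched below in Python. The function name and the 10 ns cycle time are hypothetical values chosen only for illustration.

```python
def execution_time_ns(latency_cycles: int, cycle_time_ns: float) -> float:
    """Execution time = latency (clock cycles) * cycle time (ns)."""
    return latency_cycles * cycle_time_ns


if __name__ == "__main__":
    CYCLE_TIME_NS = 10  # hypothetical cycle time, not from the slides
    for latency in (6, 4):
        print(f"{latency} cycles -> {execution_time_ns(latency, CYCLE_TIME_NS)} ns")
```

Because the two factors multiply, a schedule with fewer clock cycles only pays off if the extra hardware does not lengthen the cycle time by a comparable amount.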
Optimization
[Figure: data-flow graph of the diffeq loop body scheduled on a data path with one MUL and one ALU (plus steering & memory and the control unit). Inputs: x, y, u, dx, a and the constant 3. Multiplications: 3x, udx, 3xudx, 3y, 3ydx. ALU operations: x1, c, y1 and the two subtractions that produce u1.]
6 clock cycles
Optimization
[Figure: data-flow graph of the diffeq loop body scheduled on a data path with two MULs and two ALUs (plus steering & memory and the control unit). Inputs: x, y, u, dx, a and the constant 3. Multiplications: 3x, udx, 3xudx, 3y, 3ydx. ALU operations: x1, c, y1, u-3xudx, u1.]
4 clock cycles
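The two schedule lengths shown on the Optimization slides (6 cycles with one multiplier and one ALU, 4 cycles with two of each) can be reproduced with a resource-constrained list scheduler. The Python sketch below is one standard heuristic, not necessarily the method used to draw the figures. It assumes every operation takes one clock cycle and that u*dx is computed once and shared, giving five multiplications; the name s1 for the intermediate u - 3xudx is hypothetical.

```python
from functools import lru_cache

# One diffeq loop iteration: name -> (resource type, predecessors).
# Names follow the figure labels; "s1" (= u - 3xudx) is a hypothetical name.
OPS = {
    "3x":    ("mul", []),
    "udx":   ("mul", []),
    "3xudx": ("mul", ["3x", "udx"]),
    "3y":    ("mul", []),
    "3ydx":  ("mul", ["3y"]),
    "x1":    ("alu", []),
    "c":     ("alu", ["x1"]),
    "y1":    ("alu", ["udx"]),
    "s1":    ("alu", ["3xudx"]),
    "u1":    ("alu", ["s1", "3ydx"]),
}


@lru_cache(maxsize=None)
def priority(op: str) -> int:
    """Length (in cycles) of the longest dependency chain starting at `op`."""
    successors = [s for s, (_, preds) in OPS.items() if op in preds]
    return 1 + max((priority(s) for s in successors), default=0)


def list_schedule(n_mul: int, n_alu: int) -> dict:
    """Return {operation: cycle} for a list schedule with unit-latency ops."""
    limit = {"mul": n_mul, "alu": n_alu}
    schedule = {}
    cycle = 0
    while len(schedule) < len(OPS):
        cycle += 1
        used = {"mul": 0, "alu": 0}
        # Ready: unscheduled ops whose predecessors ran in an earlier cycle.
        ready = [op for op, (_, preds) in OPS.items()
                 if op not in schedule
                 and all(p in schedule and schedule[p] < cycle for p in preds)]
        # Serve the ops on the longest remaining chain first.
        for op in sorted(ready, key=priority, reverse=True):
            kind = OPS[op][0]
            if used[kind] < limit[kind]:
                schedule[op] = cycle
                used[kind] += 1
    return schedule


def latency(n_mul: int, n_alu: int) -> int:
    """Number of clock cycles needed for one loop iteration."""
    return max(list_schedule(n_mul, n_alu).values())


if __name__ == "__main__":
    print("1 MUL, 1 ALU:", latency(1, 1), "cycles")
    print("2 MUL, 2 ALU:", latency(2, 2), "cycles")
```

With one multiplier the five multiplications are serialized and the final subtraction must wait for the last product, which gives 6 cycles; with two multipliers and two ALUs the schedule shrinks to the 4-cycle critical path 3x, 3xudx, u - 3xudx, u1.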
Design Space
Parameters
• # of ALUs (max 2), # of multipliers (max 2)
Design space
• System A = 1 ALU, 1 multiplier
• System B = 1 ALU, 2 multipliers
• System C = 2 ALUs, 1 multiplier
• System D = 2 ALUs, 2 multipliers
Let us assume
• Multiplier = 5 units of area
• ALU = 1 unit of area
• Control unit + steering logic = 1 unit of area
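The area of each system follows from the unit costs above, and the Pareto points are the systems that no other system beats on both area and latency. The Python sketch below works this out. The latencies of A and D are the 6 and 4 clock cycles shown on the Optimization slides; the latencies used for B and C are assumptions derived by hand (five ALU operations on a single ALU need at least 5 cycles, and five multiplications on a single multiplier followed by the last subtraction need at least 6), not values stated in the source.

```python
AREA = {"alu": 1, "mul": 5, "control": 1}  # units of area, from the slide

# (number of ALUs, number of multipliers) for each system, from the slide.
SYSTEMS = {"A": (1, 1), "B": (1, 2), "C": (2, 1), "D": (2, 2)}

# Latency in clock cycles. A and D come from the Optimization slides;
# B and C are ASSUMED (hand-derived), not given in the source.
LATENCY = {"A": 6, "B": 5, "C": 6, "D": 4}


def area(n_alu: int, n_mul: int) -> int:
    """Total area: ALUs + multipliers + control unit and steering logic."""
    return n_alu * AREA["alu"] + n_mul * AREA["mul"] + AREA["control"]


def pareto_points(points: dict) -> list:
    """Names of the points that no other point dominates.

    `points` maps name -> (area, latency); smaller is better on both axes.
    """
    def dominates(p, q):
        return p[0] <= q[0] and p[1] <= q[1] and p != q

    return sorted(name for name, p in points.items()
                  if not any(dominates(q, p) for q in points.values()))


POINTS = {name: (area(*cfg), LATENCY[name]) for name, cfg in SYSTEMS.items()}

if __name__ == "__main__":
    for name, (a, lat) in POINTS.items():
        print(f"System {name}: area {a}, latency {lat}")
    print("Pareto points:", pareto_points(POINTS))
```

Under these assumptions the areas are 7, 12, 8 and 13 units for A, B, C and D. System C costs more area than A without reducing latency, so it is dominated, while A, B and D each offer a distinct area/latency trade-off.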
[Figure: area versus latency (clock cycles) for systems A, B, C and D, marking the Pareto points and the dominated design.]