CS 704
Advanced Computer Architecture
Lecture 9
Computer Hardware Design
(Multi Cycle and Pipeline - Datapath and Control Design)
Prof. Dr. M. Ashraf Chughtai
Today’s Topics
Recap: multi cycle datapath and control
Features of Multi cycle design
Multi Cycle Control Design
Introduction to Pipeline datapath
Summary
MAC/VU-Advanced
Computer Architecture
Lecture 9 – Computer Hardware
Design (3)
2
Recap: Lecture 8
Information flow and control signals for the
single-cycle datapath to execute:
– Add/Subtract Instruction
– Immediate Instruction
– Load/Store Instructions
– Control Instructions
Analysis of single cycle data path
How effectively are different sections used?
How effectively are the different sections used?
– Memory is used twice, at different times
(i.e., for Instruction Fetch and for Load or Store)
– The adders in the IF section are used only once, for a fraction
of the time (Fetch phase)
– The ALU is used for executing R-type
instructions and for memory address calculation
Conclusion:
We can reduce hardware without hurting
performance by adding extra control
Multiple Cycle Approach
[Figure: clock (Clk) timing diagram of one instruction over five cycles: Cycle 1 I fetch, Cycle 2 ID/Reg, Cycle 3 Exec, Cycle 4 Mem, Cycle 5 Wr]
The operations of the single cycle are split into five steps:
Instruction Fetch
Instruction Decode and Register Read
Execute (R-type or I-type operation, or address calculation for Load/Store/Branch)
Memory (read/write)
Write (to register file)
Multiple Cycle Approach
In the single-cycle implementation, the cycle time
is set to accommodate the longest instruction, the
Load instruction.
In the multi-cycle implementation, the cycle
time is set to accommodate the longest step, the
memory read/write.
Consequently, the cycle time of the single-cycle
implementation can be up to five times longer than that of the
multi-cycle implementation.
For example, if T = 5 µs for the single-cycle implementation, then
T = 1 µs for the multi-cycle implementation.
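The relationship between the two cycle times can be sketched as follows. This is only an illustration: the per-step latencies are hypothetical values (1 µs each), chosen so that the numbers match the 5 µs and 1 µs example, and the function names are not from the lecture.

```python
# Hypothetical per-step latencies in microseconds (assumed equal so that
# the result matches the 5 us vs 1 us example on the slide).
STEP_LATENCY_US = {
    "IFetch": 1.0,
    "ID/Reg": 1.0,
    "Exec": 1.0,
    "Mem": 1.0,
    "Wr": 1.0,
}


def single_cycle_time(latencies):
    # One long cycle must cover every step of the longest instruction (Load),
    # which passes through all five steps.
    return sum(latencies.values())


def multi_cycle_time(latencies):
    # Each cycle only has to cover the slowest individual step.
    return max(latencies.values())


print(single_cycle_time(STEP_LATENCY_US))  # 5.0
print(multi_cycle_time(STEP_LATENCY_US))   # 1.0
```

If the steps are not equally long, the ratio between the two cycle times is smaller than five, which is why the slide says "can be" five times longer.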
Single Cycle vs. Multiple Cycle
[Figure: timing comparison. Single-cycle implementation: Load occupies Cycle 1 and Store occupies Cycle 2, with the unused tail of the Store cycle marked "Waste". Multi-cycle implementation (Cycles 1 to 10): Load takes five cycles (I fetch, ID/Reg, Exec, Mem, Wr), Store takes four cycles (I fetch, ID/Reg, Exec, Mem), and the R-type instruction begins with its I fetch in Cycle 10]
Single Cycle vs. Multiple Cycle: Explanation
Depending on the instruction class, the multi-cycle
implementation takes 3, 4 or 5 cycles to fetch
and execute an instruction.
To compare the performance of the
single-cycle and multi-cycle implementations,
consider a program segment of three
instructions, in this sequence:
Load
Store
R-type (say Add)
Single Cycle vs. Multiple Cycle: Explanation
The execution time for these three instructions
on the single-cycle implementation, with a cycle
length of 5 µs, is:
T_exe = 3 × 5 µs = 15 µs
Note that the cycle time is just long enough for
the Load instruction, but longer than needed for the Store
and R-type instructions.
So the last (write) part of the cycle is wasted for the Store,
and the 4th (memory) part is wasted for the R-type
instruction.
Single Cycle vs. Multiple Cycle: Explanation
In the multi-cycle implementation, Load completes
in 5 cycles, and Store and R-type each take 4
cycles.
Thus, these three instructions take 5 + 4 + 4 = 13
cycles. If the cycle length is 1 µs, the
execution time for the three instructions is:
T_exe = 13 × 1 µs = 13 µs
Conclusion:
The multi-cycle implementation is 15/13 ≈ 1.15 times faster
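The comparison for the Load, Store, R-type sequence can be checked with a short calculation. This is a sketch under the slide's stated assumptions (5 µs single-cycle clock, 1 µs multi-cycle clock, 5/4/4 cycles per instruction); the dictionary and function names are illustrative only.

```python
# Cycles per instruction class in the multi-cycle implementation,
# as stated on the slide.
CYCLES = {"load": 5, "store": 4, "rtype": 4}


def single_cycle_exec_time(program, cycle_us=5.0):
    # Every instruction takes exactly one (long) cycle.
    return len(program) * cycle_us


def multi_cycle_exec_time(program, cycle_us=1.0):
    # Each instruction takes only as many (short) cycles as it needs.
    return sum(CYCLES[instr] for instr in program) * cycle_us


program = ["load", "store", "rtype"]
t_single = single_cycle_exec_time(program)
t_multi = multi_cycle_exec_time(program)
speedup = t_single / t_multi
print(t_single, t_multi, round(speedup, 2))  # 15.0 13.0 1.15
```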
Next: High-level view of the multi-cycle datapath
High Level View of Multiple Cycle Datapath
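[Figure: high-level multi-cycle datapath. Blocks: PC; shared Memory (Address in, Instruction or Data out); Instruction Register; Data Register; Register File (two read register numbers, write register number, write data); operand registers A and B; ALU; ALUout register]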
High level view of Multiple Cycle Datapath: Explanation
A single shared memory is used, because instruction fetch and
data read/write are performed in different cycles
A single ALU is shared among instruction fetch (PC increment), execution of
arithmetic and logic instructions, and address calculation, in
different cycles
Sharing a functional unit (the ALU) requires additional
multiplexers or wider multiplexers
New temporary registers (the Instruction Register, the Memory Data Register,
operand registers A and B, and ALUout) are included to hold
information for use in a later cycle
E.g., data read from memory in cycle 4 is written to the register file in cycle 5 (Load); operand
registers A and B, loaded in cycle 2, may be used in cycle 3 or 4;
and so on
Multiple Cycle Datapath Design
[Figure: detailed multi-cycle datapath.
Components: PC; Ideal Memory (RAdr, WrAdr, Din, Dout); IR; MDR; Reg File (Ra, Rb, Rw, busA, busB, busW; register fields Rs, Rt, Rd); Extend (Imm16 to 32 bits); shift left by 2 (<< 2); ALU with Zero output; ALU Out register; Target register; ALU Control; Mux 1 to Mux 6.
Control signals: PCWr, PCWrCond, IRWr, IorD, MemWr, PCSrc, RegDst, RegWr, MemtoReg, ExtOp, ALUSelA, ALUSelB, BrWr, ALUOp.]
Multiple Cycle Datapath Architecture
Minimized hardware: 1 memory, 1 adder (the ALU)
Cycle 1 - [Instruction Fetch]:
First, the select input of MUX-1 is set to IorD = 0, so the PC is
connected to the memory read address input
RAdr; the instruction is fetched from memory at
Dout and is placed in the Instruction Register by
asserting IRWr [Yellow Path]
Second, the select input ALUSelA of MUX-3 is
set to 0 and ALUSelB of MUX-5 is set
to 00 to add 4 to the PC; then PCSrc of MUX-2 is set to
0 and PCWr is asserted to load PC+4 into the PC as the
address of the next instruction
Multiple Cycle Datapath Architecture
Cycle 2 – [ID and Reg. Rd.]
First, the instruction is decoded; the Rs, Rt,
Rd and Imm16 fields are made available on their
respective lines (shown in orange)
Second, the registers addressed by Rs and Rt are read
onto buses A and B, respectively
Multiple Cycle Datapath Architecture
Cycle 3 - [Exe]
The select inputs ALUSelA and ALUSelB of
MUX-3 and MUX-5, respectively, are set according to the instruction
in hand, whose operation is presented at the ALUOp input of the ALU
Control unit
- For R-type instructions:
ALUSelA = 1 and ALUSelB = 01 to connect bus
A and bus B to the ALU to perform the operation
[Green Path]
Multiple Cycle Datapath Architecture
- For I-type and memory instructions:
ALUSelA = 1 and ALUSelB = 11 to connect bus
A and the sign-extended Imm16 to the ALU, which performs
the operation on the immediate data [Red Path]
The ALU output is held in the ALU Out register: it is the
result of the ALU operation in the case of an I-type
instruction, and the memory address in the case of the
memory instructions Load/Store
Multiple Cycle Datapath Architecture
- For branch (control) instructions:
1: Condition test: ALUSelA = 1 and
ALUSelB = 01; ALUOp = SUB
If the ALU output Zero = 1, then
assert PCWrCond, and
2: PC <= PC + 4 + (sign-extended Imm16 shifted left by 2 bits)
ALUSelA = 0; ALUSelB = 10
Assert BrWr, and set PCSrc of MUX-2 = 1 to
pass the target address to the PC [Blue Path]
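The ALU operand selections used in Cycle 1 and Cycle 3 can be summarized in a small sketch. It assumes, following the text, that ALUSelA chooses between the PC (0) and bus A (1), and that ALUSelB chooses among the constant 4 (00), bus B (01), the sign-extended Imm16 shifted left by 2 (10), and the sign-extended Imm16 (11). The function names are hypothetical, and ExtOp is ignored (sign extension is always assumed).

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def alu_operands(alu_sel_a, alu_sel_b, pc, bus_a, bus_b, imm16):
    """Return the two ALU inputs chosen by MUX-3 (ALUSelA) and MUX-5 (ALUSelB)."""
    # MUX-3: 0 selects the PC, 1 selects bus A.
    op_a = pc if alu_sel_a == 0 else bus_a
    # MUX-5: 00 -> constant 4, 01 -> bus B,
    #        10 -> sign-extended Imm16 << 2, 11 -> sign-extended Imm16.
    ext = sign_extend16(imm16)
    op_b = {0b00: 4, 0b01: bus_b, 0b10: ext << 2, 0b11: ext}[alu_sel_b]
    return op_a, op_b


# Instruction fetch: PC + 4
print(alu_operands(0, 0b00, pc=100, bus_a=7, bus_b=9, imm16=0))       # (100, 4)
# R-type: bus A op bus B
print(alu_operands(1, 0b01, pc=100, bus_a=7, bus_b=9, imm16=0))       # (7, 9)
# I-type / Load / Store: bus A op sign-extended immediate
print(alu_operands(1, 0b11, pc=100, bus_a=7, bus_b=9, imm16=0xFFFC))  # (7, -4)
# Branch target: PC + (sign-extended immediate << 2)
print(alu_operands(0, 0b10, pc=100, bus_a=7, bus_b=9, imm16=0xFFFC))  # (100, -16)
```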
Multiple Cycle Datapath Architecture
Cycle 4 - [Memory Instruction Load/Store]
- Load instruction:
IorD = 1 to pass the ALUout register to RAdr
(read address input) of the memory, so that the
data is read at Dout [Dark Green Path]
- Store instruction:
MemWr is asserted; the ALUout register
output is wired to WrAdr (write address input)
[Dark Green Path] and bus B of the register file
is wired to Din (data in) of the memory [Dark Blue Path]
Multiple Cycle Datapath Architecture
Cycle 5 - [Write Back]
- R-type instruction:
RegDst of MUX-4 = 1 to select Rd as the
destination address; MemtoReg = 0 to connect
ALUout to bus W; RegWr is asserted
- I-type instruction:
RegDst of MUX-4 = 0 to select Rt as the
destination address; MemtoReg = 0 to connect
ALUout to bus W; RegWr is asserted
Multiple Cycle Datapath Architecture
Cycle 5 - [Write Back]
- Load instruction:
RegDst of MUX-4 = 0 to select Rt as the
destination address; MemtoReg = 1 to connect
Dout of the memory to bus W of the register
file; RegWr is asserted
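The write-back settings for the three instruction classes can be collected into a small table-driven sketch. The signal values come from the text above; the dictionary layout, the function name and the use of a Python dict as the register file are illustrative assumptions.

```python
# Write-back control settings per instruction class (RegWr is asserted in all three).
WRITE_BACK = {
    "rtype": {"RegDst": 1, "MemtoReg": 0},  # destination Rd, data from ALUout
    "itype": {"RegDst": 0, "MemtoReg": 0},  # destination Rt, data from ALUout
    "load":  {"RegDst": 0, "MemtoReg": 1},  # destination Rt, data from memory
}


def write_back(kind, rt, rd, alu_out, mem_data, regs):
    """Model Cycle 5: choose destination (MUX-4) and bus W source, then write."""
    ctl = WRITE_BACK[kind]
    dest = rd if ctl["RegDst"] == 1 else rt
    bus_w = mem_data if ctl["MemtoReg"] == 1 else alu_out
    regs[dest] = bus_w  # RegWr asserted: the register file is written
    return dest, bus_w


regs = {}
print(write_back("rtype", rt=2, rd=3, alu_out=10, mem_data=99, regs=regs))  # (3, 10)
print(write_back("load", rt=2, rd=3, alu_out=10, mem_data=99, regs=regs))   # (2, 99)
print(regs)  # {3: 10, 2: 99}
```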
Multi Cycle Control design
The control may be designed starting from either of the
following initial representations:
Finite State Machine
- Here, the sequence control is defined by explicit
next-state functions, the logic is represented by logic
equations, and PLAs are usually used to
implement the machine
Micro-program
- Here, a micro-program counter and a dispatch
ROM define the sequence control, the logic is
represented by truth tables, and the control is
implemented using ROM
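The micro-program style of sequencing can be sketched as a micro-program counter that either increments, dispatches on the opcode through a dispatch ROM, or returns to fetch. This is a minimal illustration only: the ROM addresses, the tuple layout and the names below are hypothetical, and only the R-type and LW sequences are included.

```python
# Sequencing actions a control-ROM word can request (hypothetical encoding).
SEQ_NEXT, SEQ_DISPATCH, SEQ_FETCH = "next", "dispatch", "fetch"

# Control ROM: one word per micro-instruction (register transfer, sequencing).
CONTROL_ROM = [
    ("IR <= MEM[PC]; PC <= PC + 4", SEQ_NEXT),      # 0: instruction fetch
    ("A <= R[rs]; B <= R[rt]",      SEQ_DISPATCH),  # 1: decode, then dispatch
    ("S <= A fun B",                SEQ_NEXT),      # 2: R-type execute
    ("R[rd] <= S",                  SEQ_FETCH),     # 3: R-type write-back
    ("S <= A + SX",                 SEQ_NEXT),      # 4: LW address calculation
    ("M <= MEM[S]",                 SEQ_NEXT),      # 5: LW memory read
    ("R[rt] <= M",                  SEQ_FETCH),     # 6: LW write-back
]

# Dispatch ROM: opcode -> address of the first micro-instruction of its sequence.
DISPATCH_ROM = {"R-type": 2, "LW": 4}


def run_instruction(opcode):
    """Return the register transfers performed for one macroinstruction."""
    upc = 0  # micro-program counter starts at the fetch micro-instruction
    trace = []
    while True:
        transfer, seq = CONTROL_ROM[upc]
        trace.append(transfer)
        if seq == SEQ_NEXT:
            upc += 1                      # adder: micro-PC + 1
        elif seq == SEQ_DISPATCH:
            upc = DISPATCH_ROM[opcode]    # dispatch ROM lookup
        else:
            return trace                  # back to fetch: instruction done


print(len(run_instruction("R-type")))  # 4
print(len(run_instruction("LW")))      # 5
```

The trace lengths match the cycle counts quoted earlier in the lecture: four cycles for an R-type instruction and five for a load.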
Multi Cycle Controller FSM Specifications
[Figure: controller state diagram. States, with state code and register transfers:]
Instruction fetch (0000): IR <= MEM[PC]; PC <= PC + 4
Decode (0001): A <= R[rs]; B <= R[rt]
Execute:
– R-type (0100): S <= A fun B
– ORi (0110): S <= A op ZX
– LW (1000): S <= A + SX
– SW (1011): S <= A + SX
– BEQ (0010): S <= A - B
Memory:
– LW (1001): M <= MEM[S]
– SW (1100): MEM[S] <= B
– BEQ, Equal (0011): PC <= PC + SX || 00 (on ~Equal the branch is not taken)
Write-back:
– R-type (0101): R[rd] <= S
– ORi (0111): R[rt] <= S
– LW (1010): R[rt] <= M
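The explicit next-state function of the FSM controller can be sketched using the state codes from the FSM specification. Two points are assumptions of this sketch rather than statements from the slide: every final state returns to instruction fetch, and BEQ returns directly to fetch when the operands are not equal. The opcode strings and function names are illustrative.

```python
# State codes as given in the FSM specification.
IFETCH, DECODE = 0b0000, 0b0001
BEQ_EXEC, BEQ_TAKEN = 0b0010, 0b0011
R_EXEC, R_WB = 0b0100, 0b0101
ORI_EXEC, ORI_WB = 0b0110, 0b0111
LW_EXEC, LW_MEM, LW_WB = 0b1000, 0b1001, 0b1010
SW_EXEC, SW_MEM = 0b1011, 0b1100

# Decode dispatches on the instruction class.
DISPATCH = {"R-type": R_EXEC, "ORi": ORI_EXEC, "LW": LW_EXEC,
            "SW": SW_EXEC, "BEQ": BEQ_EXEC}

# States with a single fixed successor; any state not listed returns to fetch.
SEQUENTIAL = {R_EXEC: R_WB, ORI_EXEC: ORI_WB,
              LW_EXEC: LW_MEM, LW_MEM: LW_WB, SW_EXEC: SW_MEM}


def next_state(state, opcode, equal=False):
    if state == IFETCH:
        return DECODE
    if state == DECODE:
        return DISPATCH[opcode]
    if state == BEQ_EXEC:
        # Assumption: an untaken branch goes straight back to fetch.
        return BEQ_TAKEN if equal else IFETCH
    return SEQUENTIAL.get(state, IFETCH)


def cycles(opcode, equal=False):
    """Count the states visited for one instruction, starting at fetch."""
    state, count = IFETCH, 0
    while True:
        count += 1
        state = next_state(state, opcode, equal)
        if state == IFETCH:
            return count


print(cycles("LW"), cycles("SW"), cycles("R-type"), cycles("BEQ"))  # 5 4 4 3
```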
Micro program Controller
[Figure: microprogrammed controller. The control logic drives the multicycle datapath through its outputs; its inputs come from the state register. The next state is chosen by an adder (state + 1) and by address select logic driven by the opcode]
“Macroinstruction” Interpretation
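[Figure: main memory holds the user program (ADD, SUB, AND, ...) plus data, and this content can change. Each macroinstruction in main memory is mapped into a microsequence (e.g., the AND microsequence) in the control memory of the CPU, which drives the execution unit. Example microsequence steps: Fetch, Calc Operand Addr, Fetch Operand(s), Calculate, Save Answer(s)]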