Advanced Computer Architecture - Lecture 9: Computer hardware design

CS 704
Advanced Computer Architecture

Lecture 9
Computer Hardware Design
(Multi Cycle and Pipeline - Datapath and Control Design)

Prof. Dr. M. Ashraf Chughtai


Today’s Topics
Recap: multi cycle datapath and control
Features of Multi cycle design
Multi Cycle Control Design
Introduction to Pipeline datapath
Summary

MAC/VU-Advanced
Computer Architecture

Lecture 9 – Computer Hardware
Design (3)

2


Recap: Lecture 8
Information flow and control signals for the
single-cycle datapath to execute:
– Add/Subtract instructions
– Immediate instructions
– Load/Store instructions
– Control instructions
Analysis of the single-cycle datapath
How effectively are the different sections used?


How effectively are the different sections used?
– Memory is used twice, at different times
(i.e., Instruction Fetch and Load or Store)
– The adders in the IF section are used only once, for a fraction
of the time (Fetch phase)
– The ALU is used for the execution of R-type
instructions and for memory address calculation

Conclusion:
We can reduce hardware without hurting
performance by adding extra control


Multiple Cycle Approach
[Figure: clock (Clk) timing diagram with one step per cycle: Cycle 1 = I fetch, Cycle 2 = ID/Reg, Cycle 3 = Exec, Cycle 4 = Mem, Cycle 5 = Wr]


The operations of the single-cycle design are performed in five steps:
Instruction Fetch
Instruction Decode and Register Read
Execute (R-type or I-type operation, or address calculation for Load/Store/Branch)
Memory (Read/Write)
Write (to register file)


Multiple Cycle Approach
In the single-cycle implementation, the cycle time
is set to accommodate the longest instruction, the
Load instruction.
In the multi-cycle implementation, the cycle
time is set to accommodate the longest step, the
memory read/write.
Consequently, the cycle time for the single-cycle
implementation can be five times longer than that of the
multi-cycle implementation.
As an example, if T = 5 µs for the single-cycle implementation, then
T = 1 µs for the multi-cycle implementation.
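
The relationship between the two cycle times can be sketched as follows. This is a minimal illustration, assuming each of the five steps takes exactly 1 µs; the slide gives only the totals (5 µs and 1 µs), so the per-step values and the names `STEP_LATENCY_US`, `single_cycle_period`, and `multi_cycle_period` are hypothetical.

```python
# Assumed latency of each step in microseconds (illustrative values only;
# the slides state just the resulting 5 us and 1 us cycle times).
STEP_LATENCY_US = {
    "ifetch": 1.0,
    "decode": 1.0,
    "execute": 1.0,
    "memory": 1.0,
    "writeback": 1.0,
}


def single_cycle_period(latencies):
    """One long cycle must fit all steps of the longest instruction (Load)."""
    return sum(latencies.values())


def multi_cycle_period(latencies):
    """Each cycle performs one step, so the slowest step sets the period."""
    return max(latencies.values())


print(single_cycle_period(STEP_LATENCY_US))  # 5.0
print(multi_cycle_period(STEP_LATENCY_US))   # 1.0
```

If the steps were not equally long, the multi-cycle period would still be set by the slowest one, so the ratio between the two cycle times would be less than five.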


Single Cycle vs. Multiple Cycle
[Figure: timing comparison.
Single Cycle Implementation: Load occupies Cycle 1 and Store occupies Cycle 2; part of the Store cycle is wasted.
Multiple Cycle Implementation (Cycle 1 to Cycle 10): Load (I fetch, ID/Reg, Exec, Mem, Wr), followed by Store (I fetch, ID/Reg, Exec, Mem), followed by the I fetch of the R-type instruction]



Single Cycle vs. Multiple Cycle: Explanation
For different classes of instructions, the multi-cycle
implementation may take 3, 4 or 5 cycles to fetch
and execute an instruction.
To compare the performance of the
single-cycle and multi-cycle implementations, let
us consider a program segment comprising three
instructions, in the sequence:
Load
Store
R-type (say Add)



Single Cycle vs. Multiple Cycle: Explanation
The execution time for these three instructions
using the single-cycle implementation, with a cycle
length of 5 µs, is:

T_exe = 3 × 5 µs = 15 µs
Note that here the cycle time is just long enough for
the Load instruction, but it is too long for the Store
and R-type instructions.
So the last part of the cycle in the case of Store,
and the 4th (memory) part in the case of the R-type instruction,
is wasted.


Single Cycle vs. Multiple Cycle: Explanation
In the multi-cycle implementation, Load is completed
in 5 cycles, and Store and R-type each take 4
cycles to complete.
Thus, these three instructions take 5 + 4 + 4 = 13
cycles; if the cycle length is 1 µs, then the
execution time for the three instructions is:

T_exe = 13 × 1 µs = 13 µs
Conclusion:
The multi-cycle implementation is 15/13 ≈ 1.15 times faster
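
The comparison for the Load, Store, R-type sequence can be reproduced with a few lines of arithmetic. This is a minimal sketch using the cycle counts and cycle times stated on the slides; the table and function names are hypothetical.

```python
# Cycles per instruction class in the multi-cycle design, as stated on the
# slide (Load = 5, Store = 4, R-type = 4).
MULTI_CYCLE_COUNT = {"load": 5, "store": 4, "rtype": 4}

SINGLE_CYCLE_PERIOD_US = 5  # one long cycle per instruction
MULTI_CYCLE_PERIOD_US = 1   # one short cycle per step


def single_cycle_time(program):
    """Every instruction takes exactly one long cycle."""
    return len(program) * SINGLE_CYCLE_PERIOD_US


def multi_cycle_time(program):
    """Each instruction takes only as many short cycles as it needs."""
    return sum(MULTI_CYCLE_COUNT[op] for op in program) * MULTI_CYCLE_PERIOD_US


program = ["load", "store", "rtype"]
t_single = single_cycle_time(program)
t_multi = multi_cycle_time(program)
print(t_single, t_multi, round(t_single / t_multi, 2))  # 15 13 1.15
```

The speedup depends on the instruction mix: a program made only of Load instructions would see no gain under these numbers, since each Load still needs 5 µs.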
Next: high-level view of the multi-cycle datapath



High Level View of Multiple Cycle Datapath

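[Figure: high-level view of the multi-cycle datapath: PC; a shared Memory (address in; instruction or data out); Instruction Register (Inst. Reg.); memory Data Register (Data Reg.); Register File with two read register numbers (Rreg #) and one write register number (Wreg #); operand registers A and B; ALU; ALUout]
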


High level view of Multiple Cycle Datapath: Explanation
Here, a shared memory is used, as the instruction fetch and
the data read/write are performed in different cycles
The single ALU is shared among instruction fetch, execution of
arithmetic and logic instructions, and address calculation, in
different cycles
The use of a shared functional unit (ALU) requires additional
multiplexers or wider multiplexers
New temporary registers (Instruction register, Memory data register,
operand registers A and B, and ALUout) are included to hold
information for use in a later cycle
E.g., data read from memory in cycle 4 is written to the register file in cycle 5 (Load); operand
registers A and B, loaded in cycle 2, may be used in cycle 3 or 4;
and so on


Multiple Cycle Datapath Design
[Figure: multi-cycle datapath.
Components: PC; Ideal Memory (RAdr, WrAdr, Din, Dout); IR; MDR; Reg File (Ra, Rb, Rw, busA, busB, busW) addressed by the Rs, Rt and Rd fields; Extend (Imm 16 to 32 bits); shift left by 2 (<< 2); ALU with Zero output; ALU Control; ALU Out and Target registers; multiplexers Mux 1 to Mux 6.
Control signals: PCWr, PCWrCond, IRWr, IorD, MemWr, PCSrc, RegDst, MemtoReg, ExtOp, ALUSelA, ALUSelB, ALUOp, RegWr, BrWr]



Multiple Cycle Datapath Architecture
Minimized hardware: 1 memory, 1 adder
Cycle 1 - [Instruction Fetch]:
First, the select input IorD of MUX-1 is set to 0, so the PC is
connected to the memory read address input
RAdr; the instruction is fetched from memory at
Dout and is placed in the Instruction Register by
asserting IRWr [Yellow Path]
Second, the select input ALUSelA of MUX-3 is
set to 0 and ALUSelB of MUX-5 is set
to 00 to add 4 to the PC; then PCSrc of MUX-2 is set to
0 and PCWr is asserted to load PC+4 into the PC as the
address of the next instruction
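
The instruction-fetch cycle can be sketched in software as below. This is only an illustrative model, not the hardware itself: memory is assumed to be a Python dict keyed by byte address, and the helper names `mux` and `instruction_fetch` are hypothetical. The multiplexer input assignments follow the select values quoted on the slides (IorD = 1 selects ALUout, PCSrc = 1 selects the Target register).

```python
def mux(select, *inputs):
    """Return the multiplexer input chosen by the select value."""
    return inputs[select]


def instruction_fetch(pc, memory, alu_out=0, target=0):
    """Model cycle 1: IR <= MEM[PC]; PC <= PC + 4."""
    # Control settings for this cycle, as named on the slide.
    IorD, ALUSelA, ALUSelB, PCSrc = 0, 0, 0b00, 0

    # MUX-1: IorD = 0 routes the PC to the memory read address (RAdr).
    radr = mux(IorD, pc, alu_out)
    ir = memory[radr]  # IRWr asserted: Dout is latched into IR

    # MUX-3 (ALUSelA = 0) selects the PC; MUX-5 (ALUSelB = 00) selects 4.
    # The other inputs are not used in this cycle, so they are left as None.
    a = mux(ALUSelA, pc, None)
    b = mux(ALUSelB, 4, None, None, None)
    alu_result = a + b

    # MUX-2: PCSrc = 0 passes the ALU result; PCWr asserted loads it into PC.
    next_pc = mux(PCSrc, alu_result, target)
    return ir, next_pc


ir, next_pc = instruction_fetch(0x40, {0x40: 0x8C010004})
print(hex(ir), hex(next_pc))  # 0x8c010004 0x44
```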


Multiple Cycle Datapath Architecture
Cycle 2 – [ID and Reg. Rd.]

First, the instruction is decoded; the Rs, Rt,
Rd and Imm16 fields are made available on their
respective lines (shown in orange)
Second, the registers addressed by Rs and Rt are read
onto buses A and B, respectively



Multiple Cycle Datapath Architecture
Cycle 3 - [Exe]
The select inputs ALUSelA and ALUSelB of
MUX-3 and MUX-5, respectively, are set for the instruction
in hand; the operation is supplied at the ALUOp input of the ALU
Control unit

- For R-type instructions:
ALUSelA = 1 and ALUSelB = 01 to connect bus
A and bus B to the ALU to perform the operation
[Green Path]



Multiple Cycle Datapath Architecture
- For I-type and memory instructions:
ALUSelA = 1 and ALUSelB = 11 to connect bus
A and the sign-extended Imm16 to the ALU to perform
the operation on immediate data [Red Path]
The ALU output is kept in the ALU Out register as the
result of the ALU operation in the case of an I-type
instruction, and as the memory address in the case of the
memory instructions Load/Store



Multiple Cycle Datapath Architecture
- For control (branch) instructions:
1: Condition test: ALUSelA = 1 and
ALUSelB = 01; ALUOp = SUB
If the ALU output Zero = 1, then
assert PCWrCond
2: PC <= PC + 4 + [sign-extended Imm16 shifted left 2 bits]
ALUSelA = 0; ALUSelB = 10
Assert BrWr, and set PCSrc of MUX-2 = 1 to
pass the target address to the PC [Blue Path]
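
The operand selection performed by MUX-3 and MUX-5 in the execute cycle can be summarized in a short sketch. The select encodings are the ones quoted on the slides (ALUSelB: 00 = constant 4, 01 = bus B, 10 = shifted immediate, 11 = extended immediate); the function names are hypothetical, and the extender is assumed to sign-extend here.

```python
def sign_extend16(imm16):
    """Interpret a 16-bit field as a signed two's-complement value."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def alu_operands(alu_sel_a, alu_sel_b, pc, bus_a, bus_b, imm16):
    """Return the two ALU inputs chosen by MUX-3 and MUX-5."""
    ext = sign_extend16(imm16)
    a = (pc, bus_a)[alu_sel_a]                # MUX-3: 0 = PC, 1 = bus A
    b = (4, bus_b, ext << 2, ext)[alu_sel_b]  # MUX-5: 00, 01, 10, 11
    return a, b


# R-type: bus A and bus B.
print(alu_operands(1, 0b01, 100, 7, 9, 3))       # (7, 9)
# I-type / memory: bus A and the extended immediate.
print(alu_operands(1, 0b11, 100, 7, 9, 0xFFFF))  # (7, -1)
# Branch target: PC (already PC + 4 after cycle 1) and the shifted immediate.
print(alu_operands(0, 0b10, 100, 7, 9, 3))       # (100, 12)
```

Because the PC was already incremented in cycle 1, selecting the PC with ALUSelA = 0 in the branch case yields PC + 4 plus the shifted offset.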



Multiple Cycle Datapath Architecture
Cycle 4 - [Memory Instruction Load/Store]
- Load instruction:
IorD = 1 to pass the ALUout register as the RAdr
(read address) input to the memory, to read
the data at Dout [Dark Green Path]
- Store instruction:
MemWr is asserted; the ALUout register
output is wired to WrAdr (write address input)
[Dark Green Path] and bus B of the register file
is wired to Din (data in) [Dark Blue Path] of the
memory


Multiple Cycle Datapath Architecture
Cycle 5 - [Write Back]
- R-type instruction:
RegDst of MUX-4 = 1 to select Rd as the
destination address; MemtoReg = 0 to connect
ALUout to Bus-W, and RegWr is asserted
- I-type instruction:
RegDst of MUX-4 = 0 to select Rt as the
destination address; MemtoReg = 0 to connect
ALUout to Bus-W, and RegWr is asserted


Multiple Cycle Datapath Architecture
Cycle 5 - [Write Back]
- Load instruction:
RegDst of MUX-4 = 0 to select Rt as the
destination address; MemtoReg = 1 to connect
Dout of the memory to Bus-W of the register
file, and RegWr is asserted
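
The write-back settings for the three instruction classes can be collected into a small table. This is a minimal sketch: the control values are those given on the write-back slides, while the dict-based register file and the names `WRITE_BACK_CONTROL` and `write_back` are hypothetical.

```python
# Write-back (cycle 5) control settings per instruction class.
WRITE_BACK_CONTROL = {
    "rtype": {"RegDst": 1, "MemtoReg": 0, "RegWr": 1},
    "itype": {"RegDst": 0, "MemtoReg": 0, "RegWr": 1},
    "load":  {"RegDst": 0, "MemtoReg": 1, "RegWr": 1},
}


def write_back(kind, regs, rt, rd, alu_out, mem_data):
    """Apply the cycle-5 register write for one instruction class."""
    ctl = WRITE_BACK_CONTROL[kind]
    dest = (rt, rd)[ctl["RegDst"]]                # MUX-4: 0 = Rt, 1 = Rd
    bus_w = (alu_out, mem_data)[ctl["MemtoReg"]]  # 0 = ALUout, 1 = memory Dout
    if ctl["RegWr"]:
        regs[dest] = bus_w
    return regs


print(write_back("rtype", {}, rt=2, rd=3, alu_out=10, mem_data=99))  # {3: 10}
print(write_back("load", {}, rt=2, rd=3, alu_out=10, mem_data=99))   # {2: 99}
```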



Multi Cycle Control design
Control may be designed using one of the following
initial representations:
Finite State Machine
- Here, the sequence control is defined by explicit
next-state functions, the logic is represented by logic
equations, and PLAs are usually used to
implement the machine
Micro-program
- Here, a micro-program counter and a dispatch
ROM define the sequence control, the logic is
represented by a truth table, and the control is
implemented using ROM
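
As a rough illustration of the micro-programmed style, the sketch below steps a micro-program counter through a small control ROM and uses a dispatch ROM to branch on the instruction class. The micro-instruction layout, the sequencing labels (`next`, `dispatch`, `fetch`), the ROM addresses, and the choice of only two instruction classes are all hypothetical; the slides do not specify them.

```python
# Control ROM: address -> (register transfers performed, sequencing action).
CONTROL_ROM = {
    0: ("IR <= MEM[PC]; PC <= PC + 4", "next"),
    1: ("A <= R[rs]; B <= R[rt]", "dispatch"),
    2: ("S <= A fun B", "next"),   # R-type execute
    3: ("R[rd] <= S", "fetch"),    # R-type write-back
    4: ("S <= A + SX", "next"),    # LW address calculation
    5: ("M <= MEM[S]", "next"),    # LW memory read
    6: ("R[rt] <= M", "fetch"),    # LW write-back
}

# Dispatch ROM: instruction class -> address of its first micro-instruction.
DISPATCH_ROM = {"rtype": 2, "lw": 4}


def next_upc(upc, opcode):
    """Address select logic: increment, dispatch on opcode, or restart."""
    action = CONTROL_ROM[upc][1]
    if action == "next":
        return upc + 1
    if action == "dispatch":
        return DISPATCH_ROM[opcode]
    return 0  # "fetch": go back to the first micro-instruction


def run_instruction(opcode):
    """Return the sequence of control ROM addresses visited."""
    upc, trace = 0, []
    while True:
        trace.append(upc)
        upc = next_upc(upc, opcode)
        if upc == 0:
            return trace


print(run_instruction("rtype"))  # [0, 1, 2, 3]
print(run_instruction("lw"))     # [0, 1, 4, 5, 6]
```

In this style, changing the control behaviour means changing ROM contents rather than rewriting next-state logic equations.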


Multi Cycle Controller FSM Specifications
[Figure: controller state diagram; each state is listed with its code and register transfers]
0000 "instruction fetch": IR <= MEM[PC]; PC <= PC + 4
0001 "decode": A <= R[rs]; B <= R[rt]
Execute:
  R-type 0100: S <= A fun B
  ORi    0110: S <= A op ZX
  LW     1000: S <= A + SX
  SW     1011: S <= A + SX
  BEQ    0010: S <= A - B
Memory:
  LW     1001: M <= MEM[S]
  SW     1100: MEM[S] <= B
  BEQ    0011 (taken on Equal; ~Equal returns to instruction fetch): PC <= PC + SX || 00
Write-back:
  R-type 0101: R[rd] <= S
  ORi    0111: R[rt] <= S
  LW     1010: R[rt] <= M
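
The controller FSM on this slide can be sketched as a next-state table. This is an illustrative model, assuming the pairing of state codes and register transfers read off the state diagram, and assuming that a not-taken BEQ returns directly to instruction fetch; the names `DISPATCH`, `NEXT`, `next_state`, and `cycles` are hypothetical.

```python
FETCH, DECODE = 0b0000, 0b0001
BEQ_COMPARE, BEQ_TAKEN = 0b0010, 0b0011

# First execute state for each instruction, chosen in the decode state.
DISPATCH = {"beq": BEQ_COMPARE, "rtype": 0b0100, "ori": 0b0110,
            "lw": 0b1000, "sw": 0b1011}

# States whose successor does not depend on the opcode or the ALU flags.
NEXT = {
    FETCH: DECODE,
    BEQ_TAKEN: FETCH,  # PC <= PC + SX || 00
    0b0100: 0b0101,    # S <= A fun B  ->  R[rd] <= S
    0b0101: FETCH,
    0b0110: 0b0111,    # S <= A op ZX  ->  R[rt] <= S
    0b0111: FETCH,
    0b1000: 0b1001,    # S <= A + SX   ->  M <= MEM[S]
    0b1001: 0b1010,    # M <= MEM[S]   ->  R[rt] <= M
    0b1010: FETCH,
    0b1011: 0b1100,    # S <= A + SX   ->  MEM[S] <= B
    0b1100: FETCH,
}


def next_state(state, opcode, equal=False):
    """Explicit next-state function of the controller."""
    if state == DECODE:
        return DISPATCH[opcode]
    if state == BEQ_COMPARE:  # S <= A - B
        return BEQ_TAKEN if equal else FETCH
    return NEXT[state]


def cycles(opcode, equal=False):
    """Count the states visited from fetch until the next fetch."""
    state, count = FETCH, 0
    while True:
        count += 1
        state = next_state(state, opcode, equal)
        if state == FETCH:
            return count


print(cycles("lw"), cycles("sw"), cycles("rtype"), cycles("beq"))  # 5 4 4 3
```

The counts agree with the earlier statement that instructions take 3, 4 or 5 cycles in the multi-cycle implementation.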


Micro program Controller
[Figure: micro-program controller. The Control Logic sends Outputs to, and receives Inputs from, the Multicycle Datapath. Sequencing is provided by a State Reg, an Adder that adds 1 to the state, and Address Select Logic driven by the Opcode]



“Macroinstruction” Interpretation
[Figure: Main Memory holds the user program (ADD, SUB, AND, ...) plus DATA; this can change. The CPU contains the execution unit and the control memory. Each macroinstruction (e.g., AND) is mapped into a microsequence in control memory, e.g.: Fetch, Calc Operand Addr, Fetch Operand(s), Calculate, Save Answer(s)]