
dce
2013

COMPUTER ARCHITECTURE
CSE Fall 2013

BK
TP.HCM

Faculty of Computer Science and
Engineering
Department of Computer Engineering

Vo Tan Phuong


Chapter 4.1
Single Cycle Processor Design


Computer Architecture – Chapter 4.1

© Fall 2013, CS





Presentation Outline


 Designing a Processor: Step-by-Step
 Datapath Components and Clocking

 Assembling an Adequate Datapath
 Controlling the Execution of Instructions
 The Main Controller and ALU Controller
 Drawback of the single-cycle processor design





The Performance Perspective
 Recall, performance is determined by:
 Instruction count (I-Count)
 Clock cycles per instruction (CPI)
 Clock cycle time (Cycle)

 Processor design will affect
 Clock cycles per instruction
 Clock cycle time

 Single cycle datapath and control design:
 Advantage: One clock cycle per instruction
 Disadvantage: long cycle time
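
The three factors above multiply to give execution time (CPU time = I-Count × CPI × Cycle). A minimal sketch of that product follows; the function name and all numbers are hypothetical and chosen only for illustration.

```python
def cpu_time(instruction_count, cpi, cycle_time_ns):
    """Execution time in ns = I-Count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ns

# Hypothetical program of 1,000,000 instructions on a single-cycle design.
# CPI is 1, so execution time depends only on how long the cycle is.
slow_ns = cpu_time(1_000_000, 1, 8)  # assumed 8 ns cycle
fast_ns = cpu_time(1_000_000, 1, 6)  # assumed 6 ns cycle
print(slow_ns, fast_ns)
```

With CPI fixed at 1, any reduction in cycle time translates directly into a proportional reduction in execution time.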





Designing a Processor: Step-by-Step
 Analyze instruction set => datapath requirements
 The meaning of each instruction is given by the register transfers
 Datapath must include storage elements for ISA registers

 Datapath must support each register transfer

 Select datapath components and clocking methodology

 Assemble datapath meeting the requirements
 Analyze implementation of each instruction
 Determine the setting of control signals for register transfer

 Assemble the control logic



Review of MIPS Instruction Formats
 All instructions are 32-bit wide
 Three instruction formats: R-type, I-type, and J-type
R-type:  Op6 | Rs5 | Rt5 | Rd5 | sa5 | funct6
I-type:  Op6 | Rs5 | Rt5 | immediate16
J-type:  Op6 | immediate26

 Op6: 6-bit opcode of the instruction
 Rs5, Rt5, Rd5: 5-bit source and destination register numbers

 sa5: 5-bit shift amount used by shift instructions
 funct6: 6-bit function field for R-type instructions
 immediate16: 16-bit immediate value or address offset
 immediate26: 26-bit target address of the jump instruction
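
The field layout can be checked by slicing a 32-bit instruction word. This is a sketch that assumes the standard MIPS packing, with Op6 in the most significant bits and fields following left to right; the function name `decode` and the dictionary keys are illustrative, not part of the slides.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into all possible fields.

    Which fields are meaningful depends on the format (R, I, or J).
    """
    return {
        "op":    (word >> 26) & 0x3F,      # bits 31..26
        "rs":    (word >> 21) & 0x1F,      # bits 25..21
        "rt":    (word >> 16) & 0x1F,      # bits 20..16
        "rd":    (word >> 11) & 0x1F,      # bits 15..11 (R-type)
        "sa":    (word >> 6) & 0x1F,       # bits 10..6  (R-type)
        "funct": word & 0x3F,              # bits 5..0   (R-type)
        "imm16": word & 0xFFFF,            # bits 15..0  (I-type)
        "imm26": word & 0x3FFFFFF,         # bits 25..0  (J-type)
    }

# add $3, $1, $2 encoded by hand: op = 0, rs = 1, rt = 2, rd = 3, funct = 0x20
fields = decode(0x00221820)
print(fields["op"], fields["rs"], fields["rt"], fields["rd"], hex(fields["funct"]))
```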




MIPS Subset of Instructions
 Only a subset of the MIPS instructions is considered
 ALU instructions (R-type): add, sub, and, or, xor, slt
 Immediate instructions (I-type): addi, slti, andi, ori, xori
 Load and Store (I-type): lw, sw
 Branch (I-type): beq, bne

 Jump (J-type): j

 This subset does not include all the integer instructions
 But sufficient to illustrate design of datapath and control
 Concepts used to implement the MIPS subset are used to construct a broad spectrum of computers



Details of the MIPS Subset


Instruction           Meaning             Format
add   rd, rs, rt      addition            op6 = 0   rs5   rt5   rd5   0   0x20
sub   rd, rs, rt      subtraction         op6 = 0   rs5   rt5   rd5   0   0x22
and   rd, rs, rt      bitwise and         op6 = 0   rs5   rt5   rd5   0   0x24
or    rd, rs, rt      bitwise or          op6 = 0   rs5   rt5   rd5   0   0x25
xor   rd, rs, rt      exclusive or        op6 = 0   rs5   rt5   rd5   0   0x26
slt   rd, rs, rt      set on less than    op6 = 0   rs5   rt5   rd5   0   0x2a
addi  rt, rs, im16    add immediate       0x08      rs5   rt5   im16
slti  rt, rs, im16    slt immediate       0x0a      rs5   rt5   im16
andi  rt, rs, im16    and immediate       0x0c      rs5   rt5   im16
ori   rt, rs, im16    or immediate        0x0d      rs5   rt5   im16
xori  rt, rs, im16    xor immediate       0x0e      rs5   rt5   im16
lw    rt, im16(rs)    load word           0x23      rs5   rt5   im16
sw    rt, im16(rs)    store word          0x2b      rs5   rt5   im16
beq   rs, rt, im16    branch if equal     0x04      rs5   rt5   im16
bne   rs, rt, im16    branch not equal    0x05      rs5   rt5   im16
j     im26            jump                0x02      im26



Register Transfer Level (RTL)


 RTL is a description of data flow between registers
 RTL gives a meaning to the instructions
 All instructions are fetched from memory at address PC
Instruction   RTL Description
ADD           Reg(Rd) ← Reg(Rs) + Reg(Rt);                 PC ← PC + 4
SUB           Reg(Rd) ← Reg(Rs) – Reg(Rt);                 PC ← PC + 4
ORI           Reg(Rt) ← Reg(Rs) | zero_ext(Im16);          PC ← PC + 4
LW            Reg(Rt) ← MEM[Reg(Rs) + sign_ext(Im16)];     PC ← PC + 4
SW            MEM[Reg(Rs) + sign_ext(Im16)] ← Reg(Rt);     PC ← PC + 4
BEQ           if (Reg(Rs) == Reg(Rt))
                  PC ← PC + 4 + 4 × sign_ext(Im16)
              else PC ← PC + 4
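
The RTL descriptions can be read as a tiny interpreter. The sketch below is one possible rendering, not the course's reference model: the function `execute`, its argument names, and the use of a Python dict as a word-granular memory keyed by byte address are all assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

def sign_ext(imm16):
    """Interpret a 16-bit field as a signed two's complement integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def zero_ext(imm16):
    """Interpret a 16-bit field as an unsigned integer."""
    return imm16 & 0xFFFF

def execute(name, reg, mem, pc, rs=0, rt=0, rd=0, imm16=0):
    """Apply the RTL of one instruction to reg/mem and return the next PC."""
    if name == "ADD":
        reg[rd] = (reg[rs] + reg[rt]) & MASK32
    elif name == "SUB":
        reg[rd] = (reg[rs] - reg[rt]) & MASK32
    elif name == "ORI":
        reg[rt] = reg[rs] | zero_ext(imm16)
    elif name == "LW":
        reg[rt] = mem.get((reg[rs] + sign_ext(imm16)) & MASK32, 0)
    elif name == "SW":
        mem[(reg[rs] + sign_ext(imm16)) & MASK32] = reg[rt]
    elif name == "BEQ":
        if reg[rs] == reg[rt]:
            return (pc + 4 + 4 * sign_ext(imm16)) & MASK32
    else:
        raise ValueError("unsupported instruction: " + name)
    reg[0] = 0                      # R0 is always zero
    return (pc + 4) & MASK32

reg, mem = [0] * 32, {}
reg[1], reg[2] = 5, 7
pc = execute("ADD", reg, mem, 0, rs=1, rt=2, rd=3)   # Reg(3) <- 5 + 7
pc = execute("SW", reg, mem, pc, rs=0, rt=3, imm16=16)
print(pc, reg[3], mem[16])
```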




Instructions are Executed in Steps
 R-type
Fetch instruction:    Instruction ← MEM[PC]
Fetch operands:       data1 ← Reg(Rs), data2 ← Reg(Rt)
Execute operation:    ALU_result ← func(data1, data2)
Write ALU result:     Reg(Rd) ← ALU_result
Next PC address:      PC ← PC + 4

 I-type
Fetch instruction:    Instruction ← MEM[PC]
Fetch operands:       data1 ← Reg(Rs), data2 ← Extend(imm16)
Execute operation:    ALU_result ← op(data1, data2)
Write ALU result:     Reg(Rt) ← ALU_result
Next PC address:      PC ← PC + 4

 BEQ
Fetch instruction:    Instruction ← MEM[PC]
Fetch operands:       data1 ← Reg(Rs), data2 ← Reg(Rt)
Equality:             zero ← subtract(data1, data2)
Branch:               if (zero) PC ← PC + 4 + 4×sign_ext(imm16)
                      else PC ← PC + 4




Instruction Execution – cont’d


 LW
Fetch instruction:    Instruction ← MEM[PC]
Fetch base register:  base ← Reg(Rs)
Calculate address:    address ← base + sign_extend(imm16)
Read memory:          data ← MEM[address]
Write register Rt:    Reg(Rt) ← data
Next PC address:      PC ← PC + 4

 SW
Fetch instruction:    Instruction ← MEM[PC]
Fetch registers:      base ← Reg(Rs), data ← Reg(Rt)
Calculate address:    address ← base + sign_extend(imm16)
Write memory:         MEM[address] ← data
Next PC address:      PC ← PC + 4

 Jump
Fetch instruction:    Instruction ← MEM[PC]
Target PC address:    target ← PC[31:28] || Imm26 || ‘00’    (|| denotes concatenation)
Jump:                 PC ← target
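
The jump target concatenation can be expressed with masks and shifts. A minimal sketch, assuming PC and Imm26 are held as Python integers; the function name `jump_target` is illustrative.

```python
def jump_target(pc, imm26):
    """target = PC[31:28] || Imm26 || '00'."""
    upper4 = pc & 0xF0000000              # keep PC[31:28] in place
    middle = (imm26 & 0x3FFFFFF) << 2     # Imm26 followed by two zero bits
    return upper4 | middle

print(hex(jump_target(0x40000008, 0x10)))
```

Appending ‘00’ is the same as multiplying Imm26 by 4, which keeps the target word aligned.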





Requirements of the Instruction Set
 Memory
 Instruction memory where instructions are stored
 Data memory where data is stored

 Registers
 32 × 32-bit general purpose registers, R0 is always zero
 Read source register Rs
 Read source register Rt
 Write destination register Rt or Rd

 Program counter PC register and Adder to increment PC
 Sign and Zero extender for immediate constant
 ALU for executing instructions





Components of the Datapath

 Combinational Elements
 ALU, Adder
 Immediate extender
 Multiplexers

 Storage Elements
 Instruction memory
 Data memory
 PC register
 Register file

 Clocking methodology
 Timing of writes

[Figure: block symbols of the datapath components. ALU: two 32-bit inputs, ALU control, 32-bit ALU result, zero and overflow outputs. Extender: 16-bit input, 32-bit output, ExtOp control. Multiplexer: 32-bit inputs 0 and 1, select. PC: 32-bit register with clk. Instruction Memory: 32-bit Address, 32-bit Instruction. Data Memory: 32-bit Address, Data_in, Data_out, with MemRead, MemWrite, clk. Registers: 5-bit RA, RB, RW; 32-bit BusA, BusB, BusW; RegWrite, clk.]




Register Element


 Register
 Similar to the D-type Flip-Flop
 n-bit input and output

 Write Enable (WE):
 Enable / disable writing of register
 Negated (0): Data_Out will not change
 Asserted (1): Data_Out will become Data_In after clock edge

 Edge triggered Clocking
 Register output is modified at clock edge

[Figure: n-bit Register with Data_In (n bits), Data_Out (n bits), Write Enable (WE), and Clock]




MIPS Register File



 Register File consists of 32 × 32-bit registers
 BusA and BusB: 32-bit output busses for reading 2 registers

 BusW: 32-bit input bus for writing a register when RegWrite is 1
 Two registers read and one written in a cycle

 Registers are selected by:
 RA selects register to be read on BusA
 RB selects register to be read on BusB
 RW selects the register to be written

 Clock input
 The clock input is used ONLY during write operation
 During read, register file behaves as a combinational logic block
 RA or RB valid => BusA or BusB valid after access time

[Figure: Register File block with 5-bit RA, RB, RW inputs, 32-bit BusA and BusB outputs, 32-bit BusW input, and RegWrite and Clock inputs]
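
The read and write behavior described above can be mimicked in a small behavioral model. This is a sketch under stated assumptions: the class and method names are hypothetical, reads are modeled as plain lookups (combinational), and a write takes effect only when `clock_edge` is called with RegWrite asserted.

```python
class RegisterFile:
    """Behavioral model of a 32 x 32-bit register file with R0 fixed at zero."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: returns (BusA, BusB) for register numbers RA, RB."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, reg_write):
        """Clocked write: BusW is stored in register RW only if RegWrite is 1."""
        if reg_write and rw != 0:          # writes to R0 are ignored
            self.regs[rw] = bus_w & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(rw=5, bus_w=42, reg_write=1)
rf.clock_edge(rw=6, bus_w=99, reg_write=0)   # RegWrite negated: no change
print(rf.read(5, 6))
```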




Details of the Register File

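[Figure: register file internals. 5-bit RA, RB, and RW each feed a Decoder with 32 outputs. Registers R1, R2, ..., R31 share the 32-bit BusW input and Clock; their write enables (WE) come from the RW decoder together with RegWrite. Tri-state buffers, enabled by the RA and RB decoders, drive the 32-bit BusA and BusB. R0 is not used; "0" is driven onto the bus in its place.]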



Tri-State Buffers


 Allow multiple sources to drive a single bus
 Two Inputs:
 Data_in
 Enable (to enable output)

 One Output: Data_out
 If (Enable) Data_out = Data_in
else Data_out = High Impedance state (output is disconnected)

 Tri-state buffers can be used to build multiplexors

[Figure: tri-state buffer with Data_in, Enable, and Data_out; 2-input multiplexor built from two tri-state buffers with Data_0, Data_1, Select, and Output]
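
A software analogy of the multiplexor built from tri-state buffers is sketched below. The `HIGH_Z` sentinel and both function names are illustrative assumptions; real high-impedance behavior is electrical and is only approximated here by "no driver present".

```python
HIGH_Z = object()   # stands in for the high-impedance (disconnected) state

def tri_state(data_in, enable):
    """Data_out = Data_in when Enable is asserted, otherwise high impedance."""
    return data_in if enable else HIGH_Z

def mux2(data_0, data_1, select):
    """2-input multiplexor: exactly one buffer drives the shared output."""
    drivers = [tri_state(data_0, select == 0),
               tri_state(data_1, select == 1)]
    active = [d for d in drivers if d is not HIGH_Z]
    if len(active) != 1:
        raise ValueError("bus must have exactly one active driver")
    return active[0]

print(mux2(10, 20, 0), mux2(10, 20, 1))
```

The check on the number of active drivers reflects the rule that only one source may drive the bus at a time.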





Building a Multifunction ALU

[Figure: multifunction ALU. 32-bit inputs A and B and a 5-bit Shift Amount feed a Shifter, an Adder (with carry-in c0), and a Logic Unit; a 4-input multiplexer picks the 32-bit ALU Result. The adder provides sign and overflow; zero is derived from the ALU Result.]

Shift/Rotate Operation (2 bits): SLL = 00, SRL = 01, SRA = 10, ROR = 11
Arithmetic Operation (1 bit): ADD = 0, SUB = 1
Logical Operation (2 bits): AND = 00, OR = 01, NOR = 10, XOR = 11
ALU Selection (2 bits): Shift = 00, SLT = 01, Arith = 10, Logic = 11

SLT: the ALU does a SUB and checks the sign and overflow
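
The arithmetic, logic, and SLT paths of this ALU can be sketched behaviorally. The code below is an illustration under assumptions, not the slide's circuit: function names are hypothetical, the shifter is omitted, and SLT is computed as sign XOR overflow of A – B, which is one common way to "check the sign and overflow".

```python
MASK32 = 0xFFFFFFFF

def add_sub(a, b, sub):
    """32-bit adder. SUB adds the complement of b with carry-in c0 = 1."""
    b_in = (~b & MASK32) if sub else b
    result = (a + b_in + (1 if sub else 0)) & MASK32
    sign = result >> 31
    # Overflow: both adder inputs have the same sign but the result differs.
    overflow = int((a >> 31) == (b_in >> 31) and (a >> 31) != sign)
    return result, sign, overflow

def logic(a, b, op):
    """Logic unit: AND = 00, OR = 01, NOR = 10, XOR = 11."""
    return [a & b, a | b, ~(a | b) & MASK32, a ^ b][op]

def alu(a, b, selection, op):
    """Return (ALU result, zero flag). selection follows the slide's encoding."""
    if selection == 0b01:                       # SLT
        _, sign, overflow = add_sub(a, b, 1)
        result = sign ^ overflow                # 1 if a < b (signed)
    elif selection == 0b10:                     # Arith: op 0 = ADD, 1 = SUB
        result, _, _ = add_sub(a, b, op)
    elif selection == 0b11:                     # Logic
        result = logic(a, b, op)
    else:
        raise NotImplementedError("shifter omitted from this sketch")
    return result, int(result == 0)

print(alu(0xFFFFFFFF, 1, 0b01, 0))   # is -1 < 1 ?
```

Using overflow as well as sign matters when A – B does not fit in 32 bits, for example when comparing the largest positive value with –1.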




Instruction and Data Memories
 Instruction memory needs only provide read access
 Because datapath does not write instructions
 Behaves as combinational logic for read
 Address selects Instruction after access time

 Data Memory is used for load and store
 MemRead: enables output on Data_out
 Address selects the word to put on Data_out
 MemWrite: enables writing of Data_in
 Address selects the memory word to be written
 The Clock synchronizes the write operation

 Separate instruction and data memories
 Later, we will replace them with caches

[Figure: Instruction Memory with 32-bit Address and 32-bit Instruction; Data Memory with 32-bit Address, Data_in, and Data_out, plus MemRead, MemWrite, and Clock]




Clocking Methodology


 Clocks are needed in sequential logic to decide when a state element (register) should be updated
 To ensure correctness, a clocking methodology defines when data can be written and read

 We assume edge-triggered clocking
 All state changes occur on the same clock edge
 Data must be valid and stable before arrival of clock edge
 Edge-triggered clocking allows a register to be read and written during the same clock cycle

[Figure: Register 1 → Combinational logic → Register 2, both registers driven by the same clock; rising edge and falling edge marked on the clock waveform]




Determining the Clock Cycle

 With edge-triggered clocking, the clock cycle must be long enough to accommodate the path from one register through the combinational logic to another register

Tcycle ≥ Tclk-q + Tmax_comb + Ts

 Tclk-q : clock to output delay through register
 Tmax_comb : longest delay through combinational logic
 Ts : setup time that input to a register must be stable before arrival of clock edge
 Th: hold time that input to a register must hold after arrival of clock edge
 Hold time (Th) is normally satisfied since Tclk-q > Th

[Figure: Register 1 → Combinational logic → Register 2 with a timing diagram marking Tclk-q, Tmax_comb, Ts, and Th relative to the writing edge of the clock]



Clock Skew


 Clock skew arises because the clock signal uses different
paths with slightly different delays to reach state elements

 Clock skew is the difference in absolute time between
when two storage elements see a clock edge
 With a clock skew, the clock cycle time is increased


Tcycle ≥ Tclk-q + Tmax_comb + Ts + Tskew
 Clock skew is reduced by balancing the clock delays
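
The cycle time bound, with and without skew, can be evaluated numerically. A minimal sketch follows; the function name and all delay values (in picoseconds) are hypothetical and serve only to show how the terms add up.

```python
def min_cycle_time(t_clk_q, t_max_comb, t_setup, t_skew=0):
    """Smallest Tcycle satisfying Tcycle >= Tclk-q + Tmax_comb + Ts + Tskew."""
    return t_clk_q + t_max_comb + t_setup + t_skew

# Assumed delays in ps: clock-to-output 50, longest logic path 900, setup 30.
no_skew = min_cycle_time(50, 900, 30)
with_skew = min_cycle_time(50, 900, 30, t_skew=20)
print(no_skew, with_skew)
```

In this made-up case, skew lengthens the minimum cycle by the full 20 ps, which is why balancing the clock delays pays off.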





Instruction Fetching Datapath


 We can now assemble the datapath from its components
 For instruction fetching, we need …
 Program Counter (PC) register

 Instruction Memory
 Adder for incrementing PC
The least significant 2 bits of the PC are ‘00’ since PC is a multiple of 4
This datapath does not handle branch or jump instructions
Improved datapath increments upper 30 bits of PC by 1

[Figure: basic fetch datapath. 32-bit PC register (clk) drives the Address of the Instruction Memory, which outputs the 32-bit Instruction; an adder computes next PC = PC + 4.]
[Figure: improved datapath. 30-bit PC register (clk) with a +1 incrementer producing next PC; ‘00’ is appended to the 30 bits to form the 32-bit Address of the Instruction Memory.]
