dce
2013
COMPUTER ARCHITECTURE
CSE Fall 2013
BK
TP.HCM
Faculty of Computer Science and Engineering
Department of Computer Engineering
Vo Tan Phuong
Chapter 4.1
Single Cycle Processor Design
Computer Architecture – Chapter 4.1
© Fall 2013, CS
Presentation Outline
Designing a Processor: Step-by-Step
Datapath Components and Clocking
Assembling an Adequate Datapath
Controlling the Execution of Instructions
The Main Controller and ALU Controller
Drawback of the single-cycle processor design
The Performance Perspective
Recall, performance is determined by:
  Instruction count (I-Count)
  Clock cycles per instruction (CPI)
  Clock cycle time
Processor design will affect:
  Clock cycles per instruction
  Clock cycle time
Single cycle datapath and control design:
  Advantage: One clock cycle per instruction
  Disadvantage: long cycle time
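The three factors above multiply to give execution time. The following is a minimal sketch of that product; the function name and all numbers are hypothetical and chosen only for illustration, not taken from the course.

```python
def execution_time_ns(instruction_count: int, cpi: float, cycle_time_ns: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ns

# Hypothetical single-cycle design: CPI is exactly 1, but the cycle is long.
single_cycle = execution_time_ns(1_000_000, cpi=1.0, cycle_time_ns=8.0)

# Hypothetical alternative design: shorter cycle, several cycles per instruction.
alternative = execution_time_ns(1_000_000, cpi=3.5, cycle_time_ns=2.0)

print(single_cycle, alternative)
```

With these assumed numbers the lower CPI of the single-cycle design does not compensate for its longer cycle time.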
Designing a Processor: Step-by-Step
Analyze instruction set => datapath requirements
The meaning of each instruction is given by the register transfers
Datapath must include storage elements for ISA registers
Datapath must support each register transfer
Select datapath components and clocking methodology
Assemble datapath meeting the requirements
Analyze implementation of each instruction
Determine the setting of control signals for register transfer
Assemble the control logic
Review of MIPS Instruction Formats
All instructions are 32 bits wide
Three instruction formats: R-type, I-type, and J-type
R-type:  Op6 | Rs5 | Rt5 | Rd5 | sa5 | funct6
I-type:  Op6 | Rs5 | Rt5 | immediate16
J-type:  Op6 | immediate26
Op6: 6-bit opcode of the instruction
Rs5, Rt5, Rd5: 5-bit source and destination register numbers
sa5: 5-bit shift amount used by shift instructions
funct6: 6-bit function field for R-type instructions
immediate16: 16-bit immediate value or address offset
immediate26: 26-bit target address of the jump instruction
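The field layout above can be sketched as shifts and masks on a 32-bit word. The function name `decode` and the dictionary keys are illustrative assumptions; every field is extracted regardless of format, and the opcode decides which ones are meaningful.

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into the fields of all three formats."""
    return {
        "op":    (instr >> 26) & 0x3F,     # bits 31..26
        "rs":    (instr >> 21) & 0x1F,     # bits 25..21
        "rt":    (instr >> 16) & 0x1F,     # bits 20..16
        "rd":    (instr >> 11) & 0x1F,     # bits 15..11 (R-type)
        "sa":    (instr >> 6) & 0x1F,      # bits 10..6  (R-type)
        "funct": instr & 0x3F,             # bits 5..0   (R-type)
        "imm16": instr & 0xFFFF,           # bits 15..0  (I-type)
        "imm26": instr & 0x3FFFFFF,        # bits 25..0  (J-type)
    }

# add $3, $1, $2 encodes as op=0, rs=1, rt=2, rd=3, sa=0, funct=0x20
r_fields = decode(0x00221820)
# lw $8, -4($9) encodes as op=0x23, rs=9, rt=8, imm16=0xFFFC
i_fields = decode(0x8D28FFFC)
```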
MIPS Subset of Instructions
Only a subset of the MIPS instructions is considered
ALU instructions (R-type): add, sub, and, or, xor, slt
Immediate instructions (I-type): addi, slti, andi, ori, xori
Load and Store (I-type): lw, sw
Branch (I-type): beq, bne
Jump (J-type): j
This subset does not include all the integer instructions
But sufficient to illustrate design of datapath and control
Concepts used to implement the MIPS subset are used
to construct a broad spectrum of computers
Details of the MIPS Subset
Instruction           Meaning            Format
add   rd, rs, rt      addition           op6 = 0   rs5   rt5   rd5   0   0x20
sub   rd, rs, rt      subtraction        op6 = 0   rs5   rt5   rd5   0   0x22
and   rd, rs, rt      bitwise and        op6 = 0   rs5   rt5   rd5   0   0x24
or    rd, rs, rt      bitwise or         op6 = 0   rs5   rt5   rd5   0   0x25
xor   rd, rs, rt      exclusive or       op6 = 0   rs5   rt5   rd5   0   0x26
slt   rd, rs, rt      set on less than   op6 = 0   rs5   rt5   rd5   0   0x2a
addi  rt, rs, im16    add immediate      0x08      rs5   rt5   im16
slti  rt, rs, im16    slt immediate      0x0a      rs5   rt5   im16
andi  rt, rs, im16    and immediate      0x0c      rs5   rt5   im16
ori   rt, rs, im16    or immediate       0x0d      rs5   rt5   im16
xori  rt, rs, im16    xor immediate      0x0e      rs5   rt5   im16
lw    rt, im16(rs)    load word          0x23      rs5   rt5   im16
sw    rt, im16(rs)    store word         0x2b      rs5   rt5   im16
beq   rs, rt, im16    branch if equal    0x04      rs5   rt5   im16
bne   rs, rt, im16    branch not equal   0x05      rs5   rt5   im16
j     im26            jump               0x02      im26
Register Transfer Level (RTL)
RTL is a description of data flow between registers
RTL gives a meaning to the instructions
All instructions are fetched from memory at address PC
Instruction   RTL Description
ADD           Reg(Rd) ← Reg(Rs) + Reg(Rt);                PC ← PC + 4
SUB           Reg(Rd) ← Reg(Rs) – Reg(Rt);                PC ← PC + 4
ORI           Reg(Rt) ← Reg(Rs) | zero_ext(Im16);         PC ← PC + 4
LW            Reg(Rt) ← MEM[Reg(Rs) + sign_ext(Im16)];    PC ← PC + 4
SW            MEM[Reg(Rs) + sign_ext(Im16)] ← Reg(Rt);    PC ← PC + 4
BEQ           if (Reg(Rs) == Reg(Rt))
                  PC ← PC + 4 + 4 × sign_ext(Im16)
              else PC ← PC + 4
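The RTL above relies on zero_ext, sign_ext, and the BEQ next-PC rule. A minimal sketch of these, assuming registers and the PC are modeled as unsigned 32-bit Python integers (the helper name `beq_next_pc` is hypothetical):

```python
MASK32 = 0xFFFFFFFF

def zero_ext(imm16: int) -> int:
    """Extend a 16-bit value to 32 bits by filling the upper half with zeros."""
    return imm16 & 0xFFFF

def sign_ext(imm16: int) -> int:
    """Extend a 16-bit value to 32 bits by copying bit 15 into the upper half."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16

def beq_next_pc(pc: int, rs_value: int, rt_value: int, imm16: int) -> int:
    """BEQ: PC + 4 + 4 x sign_ext(Im16) when the registers match, else PC + 4."""
    if rs_value == rt_value:
        return (pc + 4 + 4 * sign_ext(imm16)) & MASK32
    return (pc + 4) & MASK32

# Im16 = 0xFFFC is -4 words, so a taken branch at 0x100 goes back to 0xF4.
taken = beq_next_pc(0x100, 5, 5, 0xFFFC)
not_taken = beq_next_pc(0x100, 5, 6, 0xFFFC)
```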
Instructions are Executed in Steps
R-type
  Fetch instruction:   Instruction ← MEM[PC]
  Fetch operands:      data1 ← Reg(Rs), data2 ← Reg(Rt)
  Execute operation:   ALU_result ← func(data1, data2)
  Write ALU result:    Reg(Rd) ← ALU_result
  Next PC address:     PC ← PC + 4
I-type
  Fetch instruction:   Instruction ← MEM[PC]
  Fetch operands:      data1 ← Reg(Rs), data2 ← Extend(imm16)
  Execute operation:   ALU_result ← op(data1, data2)
  Write ALU result:    Reg(Rt) ← ALU_result
  Next PC address:     PC ← PC + 4
BEQ
  Fetch instruction:   Instruction ← MEM[PC]
  Fetch operands:      data1 ← Reg(Rs), data2 ← Reg(Rt)
  Equality:            zero ← subtract(data1, data2)
  Branch:              if (zero) PC ← PC + 4 + 4×sign_ext(imm16)
                       else PC ← PC + 4
Instruction Execution – cont’d
LW
  Fetch instruction:     Instruction ← MEM[PC]
  Fetch base register:   base ← Reg(Rs)
  Calculate address:     address ← base + sign_extend(imm16)
  Read memory:           data ← MEM[address]
  Write register Rt:     Reg(Rt) ← data
  Next PC address:       PC ← PC + 4
SW
  Fetch instruction:     Instruction ← MEM[PC]
  Fetch registers:       base ← Reg(Rs), data ← Reg(Rt)
  Calculate address:     address ← base + sign_extend(imm16)
  Write memory:          MEM[address] ← data
  Next PC address:       PC ← PC + 4
Jump
  Fetch instruction:     Instruction ← MEM[PC]
  Target PC address:     target ← PC[31:28] || Imm26 || ‘00’   (|| is concatenation)
  Jump:                  PC ← target
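The execution steps for R-type, I-type, BEQ, LW, SW, and Jump can be sketched as one function that performs a whole instruction per call. This is an illustrative software model under stated assumptions, not the hardware design: memories are Python dictionaries keyed by byte address, only add, sub, ori, lw, sw, beq, and j are handled, and the names `step` and `write_reg` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def sign_ext(imm16):
    """Interpret a 16-bit field as a signed integer."""
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def write_reg(regs, number, value):
    """Write a 32-bit value to a register; R0 always stays zero."""
    if number != 0:
        regs[number] = value & MASK32

def step(pc, regs, imem, dmem):
    """Execute the instruction at pc and return the next PC."""
    instr = imem[pc]                                   # Fetch instruction
    op, funct = instr >> 26, instr & 0x3F
    rs, rt, rd = (instr >> 21) & 31, (instr >> 16) & 31, (instr >> 11) & 31
    imm16, imm26 = instr & 0xFFFF, instr & 0x3FFFFFF
    next_pc = (pc + 4) & MASK32                        # Default next PC
    if op == 0 and funct == 0x20:                      # add
        write_reg(regs, rd, regs[rs] + regs[rt])
    elif op == 0 and funct == 0x22:                    # sub
        write_reg(regs, rd, regs[rs] - regs[rt])
    elif op == 0x0D:                                   # ori (zero-extended)
        write_reg(regs, rt, regs[rs] | imm16)
    elif op == 0x23:                                   # lw
        address = (regs[rs] + sign_ext(imm16)) & MASK32
        write_reg(regs, rt, dmem.get(address, 0))
    elif op == 0x2B:                                   # sw
        address = (regs[rs] + sign_ext(imm16)) & MASK32
        dmem[address] = regs[rt]
    elif op == 0x04:                                   # beq
        if regs[rs] == regs[rt]:
            next_pc = (pc + 4 + 4 * sign_ext(imm16)) & MASK32
    elif op == 0x02:                                   # j: PC[31:28] || Imm26 || '00'
        next_pc = (pc & 0xF0000000) | (imm26 << 2)
    else:
        raise ValueError(f"unsupported instruction {instr:#010x}")
    return next_pc

# Small hand-assembled program (encodings follow the format table of the subset).
imem = {
    0x00: 0x34010005,  # ori $1, $0, 5
    0x04: 0xAC010008,  # sw  $1, 8($0)
    0x08: 0x8C020008,  # lw  $2, 8($0)
    0x0C: 0x00221820,  # add $3, $1, $2
    0x10: 0x08000004,  # j   0x10 (jump to itself)
}
regs, dmem, pc = [0] * 32, {}, 0
for _ in range(5):
    pc = step(pc, regs, imem, dmem)
```

After the five steps, register 3 holds the sum of the stored and reloaded value, and the PC rests on the jump at address 0x10.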
Requirements of the Instruction Set
Memory
Instruction memory where instructions are stored
Data memory where data is stored
Registers
32 × 32-bit general-purpose registers, R0 is always zero
Read source register Rs
Read source register Rt
Write destination register Rt or Rd
Program counter PC register and Adder to increment PC
Sign and Zero extender for immediate constant
ALU for executing instructions
Next . . .
Designing a Processor: Step-by-Step
Datapath Components and Clocking
Assembling an Adequate Datapath
Controlling the Execution of Instructions
The Main Controller and ALU Controller
Drawback of the single-cycle processor design
Components of the Datapath
Combinational Elements
  ALU, Adder
  Immediate extender
  Multiplexers
Storage Elements
  Instruction memory
  PC register
  Register file
  Data memory
Clocking methodology
  Timing of writes
[Figure: block symbols of the datapath components. ALU: two 32-bit inputs, ALU control, 32-bit ALU result, zero and overflow outputs. Immediate extender: 16-bit input, ExtOp, 32-bit output. Multiplexer: two inputs (0 and 1) and select. PC register: 32-bit input and output, clk. Instruction memory: 32-bit Address in, 32-bit Instruction out. Register file: 5-bit RA, RB, RW; 32-bit BusA, BusB, BusW; RegWrite; clk. Data memory: 32-bit Address, Data_in, Data_out; MemRead; MemWrite; clk.]
Register Element
Register
  Similar to the D-type Flip-Flop
  n-bit input and output
Write Enable (WE):
  Enable / disable writing of register
  Negated (0): Data_Out will not change
  Asserted (1): Data_Out will become Data_In after clock edge
Edge triggered Clocking
  Register output is modified at clock edge
[Figure: n-bit Register symbol with inputs Data_In, Write Enable (WE), and Clock, and output Data_Out]
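The write-enable behaviour can be sketched as a small class in which each call to `clock_edge` stands for one active clock edge. The class and method names are assumptions made for this illustration.

```python
class Register:
    """n-bit edge-triggered register with a write enable (WE) input."""

    def __init__(self, width: int = 32):
        self.mask = (1 << width) - 1
        self.data_out = 0

    def clock_edge(self, data_in: int, write_enable: bool) -> None:
        # WE asserted: Data_Out becomes Data_In after the edge.
        # WE negated: Data_Out does not change.
        if write_enable:
            self.data_out = data_in & self.mask

reg = Register(width=8)
reg.clock_edge(0x1FF, write_enable=True)    # only the low 8 bits are kept
after_write = reg.data_out
reg.clock_edge(0x03, write_enable=False)    # ignored, WE is negated
after_hold = reg.data_out
```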
MIPS Register File
Register File consists of 32 × 32-bit registers
  BusA and BusB: 32-bit output busses for reading 2 registers
  BusW: 32-bit input bus for writing a register when RegWrite is 1
  Two registers read and one written in a cycle
Registers are selected by:
  RA selects register to be read on BusA
  RB selects register to be read on BusB
  RW selects the register to be written
Clock input
  The clock input is used ONLY during write operation
  During read, register file behaves as a combinational logic block
    RA or RB valid => BusA or BusB valid after access time
[Figure: Register File symbol with 5-bit inputs RA, RB, and RW, 32-bit outputs BusA and BusB, 32-bit input BusW, and control inputs Clock and RegWrite]
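A minimal behavioural sketch of the register file described above: reads are plain function calls (combinational), while writes happen only in `clock_edge`. The class and method names are hypothetical, and R0 is treated as hardwired to zero.

```python
class RegisterFile:
    """32 x 32-bit register file with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra: int, rb: int):
        """Combinational read: returns (BusA, BusB) for register numbers RA, RB."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw: int, bus_w: int, reg_write: bool) -> None:
        """Clocked write: register RW takes BusW only when RegWrite is 1."""
        if reg_write and rw != 0:          # writes to R0 are discarded
            self.regs[rw] = bus_w & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(rw=5, bus_w=42, reg_write=True)
rf.clock_edge(rw=6, bus_w=99, reg_write=False)   # RegWrite = 0, no change
rf.clock_edge(rw=0, bus_w=7, reg_write=True)     # R0 stays zero
bus_a, bus_b = rf.read(ra=5, rb=6)
```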
Details of the Register File
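[Figure: internal structure of the register file. RA and RB each drive a 5-to-32 decoder whose outputs enable tri-state buffers that place the selected register on the 32-bit BusA and BusB. RW drives a third 5-to-32 decoder that, together with RegWrite, asserts the write enable (WE) of one of the registers R1 to R31; all registers share Clock and take their input from the 32-bit BusW. R0 is not used: the value "0" is driven instead.]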
Tri-State Buffers
Allow multiple sources to drive a single bus
Two Inputs:
  Data_in
  Enable (to enable output)
One Output: Data_out
  If (Enable) Data_out = Data_in
  else Data_out = High Impedance state (output is disconnected)
Tri-state buffers can be used to build multiplexors
[Figure: a tri-state buffer with Data_in, Enable, and Data_out; and a multiplexor built from two tri-state buffers with inputs Data_0 and Data_1, controlled by Select, driving a shared Output]
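The tri-state behaviour and the multiplexor built from it can be sketched in software. In this illustration `None` stands for the high-impedance state, and the helper names `tri_state`, `bus`, and `mux2` are assumptions.

```python
from typing import Optional

def tri_state(enable: bool, data_in: int) -> Optional[int]:
    """Drive data_in when enabled; otherwise return None (high impedance)."""
    return data_in if enable else None

def bus(*drivers: Optional[int]) -> Optional[int]:
    """Resolve a shared bus: at most one driver may be active at a time."""
    active = [value for value in drivers if value is not None]
    if len(active) > 1:
        raise ValueError("bus contention: more than one driver enabled")
    return active[0] if active else None

def mux2(select: int, data_0: int, data_1: int) -> Optional[int]:
    """2-to-1 multiplexor: Select enables exactly one of the two buffers."""
    return bus(tri_state(select == 0, data_0), tri_state(select == 1, data_1))

out_0 = mux2(0, 7, 9)
out_1 = mux2(1, 7, 9)
```

Because Select enables exactly one buffer, the two outputs can share a wire without contention.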
Building a Multifunction ALU
[Figure: multifunction ALU. The 32-bit inputs A and B feed a shifter (with a 5-bit Shift Amount), an adder with carry-in c0, and a logic unit. ALU Selection chooses which unit drives the 32-bit ALU Result. Other outputs: sign, overflow, and zero.]
Shift/Rotate Operation (2 bits): SLL = 00, SRL = 01, SRA = 10, ROR = 11
Arithmetic Operation (1 bit): ADD = 0, SUB = 1
Logical Operation (2 bits): AND = 00, OR = 01, NOR = 10, XOR = 11
ALU Selection (2 bits): Shift = 00, SLT = 01, Arith = 10, Logic = 11
SLT: ALU does a SUB and checks the sign and overflow
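The adder, logic unit, and SLT rule can be sketched as follows. This is an illustrative model, assuming SUB is done by inverting B and setting the carry-in c0 to 1, and that SLT is the sign of the difference corrected by overflow; the function names are hypothetical and the shifter is left out.

```python
MASK32 = 0xFFFFFFFF

def add_sub(a: int, b: int, sub: bool):
    """32-bit adder. For SUB, B is inverted and the carry-in c0 is 1."""
    b_in = (~b & MASK32) if sub else b
    result = (a + b_in + (1 if sub else 0)) & MASK32
    sign = result >> 31
    # Signed overflow: both adder inputs share a sign that the result lacks.
    overflow = (a >> 31) == (b_in >> 31) and (a >> 31) != sign
    zero = result == 0
    return result, sign, overflow, zero

def slt(a: int, b: int) -> int:
    """Set on less than (signed): do a SUB, then combine sign and overflow."""
    _, sign, overflow, _ = add_sub(a, b, sub=True)
    return sign ^ int(overflow)

def logic(a: int, b: int, op: int) -> int:
    """Logic unit with the slide's encoding: AND=00, OR=01, NOR=10, XOR=11."""
    return [a & b, a | b, ~(a | b) & MASK32, a ^ b][op]

less = slt(0xFFFFFFFF, 1)              # -1 < 1
wrapped = slt(0x80000000, 1)           # most negative value < 1, SUB overflows
not_less = slt(0x7FFFFFFF, 0xFFFFFFFF) # largest positive value is not < -1
```

The overflow correction matters in the last two cases, where the sign bit of the difference alone would give the wrong answer.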
Instruction and Data Memories
Instruction memory needs only provide read access
  Because datapath does not write instructions
  Behaves as combinational logic for read
  Address selects Instruction after access time
Data Memory is used for load and store
  MemRead: enables output on Data_out
    Address selects the word to put on Data_out
  MemWrite: enables writing of Data_in
    Address selects the memory word to be written
    The Clock synchronizes the write operation
Separate instruction and data memories
  Later, we will replace them with caches
[Figure: Instruction Memory with a 32-bit Address input and 32-bit Instruction output; Data Memory with 32-bit Address and Data_in inputs, 32-bit Data_out output, and control inputs MemRead, MemWrite, and Clock]
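The data memory interface can be sketched in the same style: a combinational read gated by MemRead and a clocked write gated by MemWrite. The class name, the dictionary-based storage, and the use of `None` for a disabled output are assumptions of this illustration.

```python
class DataMemory:
    """Word-addressed data memory model with MemRead and MemWrite controls."""

    def __init__(self):
        self.words = {}                      # address -> 32-bit word

    def read(self, address: int, mem_read: bool):
        """Data_out is driven only when MemRead is asserted."""
        return self.words.get(address, 0) if mem_read else None

    def clock_edge(self, address: int, data_in: int, mem_write: bool) -> None:
        """Data_in is written at the clock edge only when MemWrite is asserted."""
        if mem_write:
            self.words[address] = data_in & 0xFFFFFFFF

mem = DataMemory()
mem.clock_edge(0x20, 1234, mem_write=True)
mem.clock_edge(0x24, 5678, mem_write=False)   # MemWrite = 0, nothing stored
loaded = mem.read(0x20, mem_read=True)
untouched = mem.read(0x24, mem_read=True)
```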
Clocking Methodology
Clocks are needed in sequential logic to decide when a state element (register) should be updated
To ensure correctness, a clocking methodology defines when data can be written and read
We assume edge-triggered clocking
  All state changes occur on the same clock edge
  Data must be valid and stable before arrival of clock edge
  Edge-triggered clocking allows a register to be read and written during same clock cycle
[Figure: Register 1 feeds Combinational logic, which feeds Register 2; a clock waveform marks the rising edge and falling edge]
Determining the Clock Cycle
With edge-triggered clocking, the clock cycle must be long enough to accommodate the path from one register through the combinational logic to another register
Tcycle ≥ Tclk-q + Tmax_comb + Ts
  Tclk-q : clock to output delay through register
  Tmax_comb : longest delay through combinational logic
  Ts : setup time that input to a register must be stable before arrival of clock edge
  Th : hold time that input to a register must hold after arrival of clock edge
Hold time (Th) is normally satisfied since Tclk-q > Th
[Figure: Register 1 feeds Combinational logic, which feeds Register 2; the timing diagram marks the writing edge and the intervals Tclk-q, Tmax_comb, Ts, and Th]
Clock Skew
Clock skew arises because the clock signal uses different
paths with slightly different delays to reach state elements
Clock skew is the difference in absolute time between
when two storage elements see a clock edge
With a clock skew, the clock cycle time is increased
Tcycle ≥ Tclk-q + Tmax_comb + Ts + Tskew
Clock skew is reduced by balancing the clock delays
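The cycle-time bound, with and without skew, is a simple sum. The sketch below evaluates it for hypothetical delays in nanoseconds; the function name and every number are assumptions made only to show the arithmetic.

```python
def min_cycle_time(t_clk_q: float, t_max_comb: float,
                   t_setup: float, t_skew: float = 0.0) -> float:
    """Smallest clock period allowed by the setup constraint."""
    return t_clk_q + t_max_comb + t_setup + t_skew

# Hypothetical delays in ns.
no_skew = min_cycle_time(t_clk_q=0.5, t_max_comb=6.0, t_setup=0.25)
with_skew = min_cycle_time(t_clk_q=0.5, t_max_comb=6.0, t_setup=0.25, t_skew=0.25)

# Maximum clock frequency in MHz for a period given in ns.
max_freq_mhz = 1000.0 / with_skew
```

With these assumed values the skew lengthens the minimum period from 6.75 ns to 7.0 ns, which lowers the achievable clock frequency.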
Next . . .
Designing a Processor: Step-by-Step
Datapath Components and Clocking
Assembling an Adequate Datapath
Controlling the Execution of Instructions
The Main Controller and ALU Controller
Drawback of the single-cycle processor design
Instruction Fetching Datapath
We can now assemble the datapath from its components
For instruction fetching, we need …
  Program Counter (PC) register
  Instruction Memory
  Adder for incrementing PC
The least significant 2 bits of the PC are ‘00’ since PC is a multiple of 4
Datapath does not handle branch or jump instructions
Improved datapath increments upper 30 bits of PC by 1
[Figure: two instruction fetch datapaths. First: a 32-bit PC register (clk) drives the Instruction Memory Address and an adder that adds 4 to produce next PC. Improved Datapath: a 30-bit PC register (clk) drives a +1 incrementer to produce next PC, and ‘00’ is appended to the 30 bits to form the 32-bit Instruction Memory Address.]
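The equivalence between adding 4 to a 32-bit PC and adding 1 to its upper 30 bits can be checked with a short sketch. The function names are hypothetical, and the PC is modeled as an unsigned integer that wraps around.

```python
def next_pc_32(pc: int) -> int:
    """Original datapath: 32-bit PC incremented by 4."""
    return (pc + 4) & 0xFFFFFFFF

def next_pc_30(pc30: int) -> int:
    """Improved datapath: upper 30 bits of the PC incremented by 1."""
    return (pc30 + 1) & 0x3FFFFFFF

def fetch_address(pc30: int) -> int:
    """Append '00' to the 30-bit PC to form the 32-bit memory address."""
    return pc30 << 2

# Both datapaths produce the same fetch address for word-aligned PCs.
results = [
    (fetch_address(next_pc_30(pc >> 2)), next_pc_32(pc))
    for pc in (0x00000000, 0x00000004, 0x00400000, 0xFFFFFFFC)
]
```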