Advanced Computer Architecture - Lecture 44: Putting it all together

CS 704
Advanced Computer Architecture

Lecture 44
Putting It All Together
(Case Studies)

Prof. Dr. M. Ashraf Chughtai


Today’s Topics
Case Studies
– PowerPC 750 Architecture
– PowerPC 970 Architecture
– Intel Pentium IV Architecture
Summary

MAC/VU-Advanced
Computer Architecture

Lecture 44 Putting it all together (1)

2


PowerPC 750 - General
The PowerPC 750 is an implementation of the PowerPC
family of reduced instruction set computer (RISC)
microprocessors
The 750 implements the 32-bit portion of the
PowerPC architecture


It provides:
– 32-bit effective addresses
– Integer data types of 8, 16, and 32 bits
– Floating-point data types of 32 and 64 bits


PowerPC 750 – General …cont’d
It is a high-performance, superscalar microprocessor with six execution
units and two register files
It can:
– fetch from the instruction cache as many as
four instructions per cycle
– dispatch as many as two instructions per clock
– execute as many as six instructions per clock



PowerPC Instructions
Instructions are encoded as single-word (32-bit) opcodes
Instruction formats are consistent among all
instruction types, permitting efficient decoding
to occur in parallel with operand accesses
This fixed instruction length and consistent
format greatly simplify instruction pipelining
Integer instructions are:
– Integer arithmetic, integer compare, logical,
rotate and shift
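
As a rough illustration of why a fixed 32-bit format is cheap to decode, the Python sketch below extracts the fields of one D-form instruction word using only shifts and masks. The function name decode_d_form is made up for this example; the field layout assumed here (6-bit primary opcode, two 5-bit register fields, 16-bit signed immediate) is the standard PowerPC D-form.

```python
def decode_d_form(word):
    """Split a 32-bit PowerPC D-form instruction word into its fields."""
    opcode = (word >> 26) & 0x3F   # primary opcode: top 6 bits
    rd = (word >> 21) & 0x1F       # destination register field
    ra = (word >> 16) & 0x1F       # source/base register field
    imm = word & 0xFFFF            # 16-bit immediate
    if imm & 0x8000:               # sign-extend the immediate
        imm -= 0x10000
    return {"opcode": opcode, "rD": rd, "rA": ra, "imm": imm}


# 0x38640001 encodes "addi r3, r4, 1" (primary opcode 14)
fields = decode_d_form(0x38640001)
```

Because every field sits at a fixed bit position, the register fields can be read before the opcode has been fully interpreted, which is what allows decoding to overlap with operand access.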


PowerPC Instructions … Cont’d
Floating-point instructions are:
– Floating-point arithmetic, multiply/add,
rounding and conversion, compare, and
status and control instructions
Load/store instructions are:
– Integer and floating-point load and store, and
atomic memory operations (lwarx and stwcx.)
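
The lwarx/stwcx. pair works as a load-reserve followed by a conditional store. The Python sketch below is a simplified, single-reservation model of that idea and not a description of the 750 hardware; the class ReservationMemory and the method store_from_other (standing in for a store by another processor) are hypothetical names used only for illustration.

```python
class ReservationMemory:
    """Toy word memory with one reservation, modeling lwarx/stwcx. semantics."""

    def __init__(self):
        self.words = {}
        self.reservation = None  # address currently reserved, if any

    def lwarx(self, addr):
        # Load the word and establish a reservation on its address
        self.reservation = addr
        return self.words.get(addr, 0)

    def store_from_other(self, addr, value):
        # A store by another processor cancels a matching reservation
        if self.reservation == addr:
            self.reservation = None
        self.words[addr] = value

    def stwcx(self, addr, value):
        # Store only if the reservation is still held; report success
        ok = self.reservation == addr
        if ok:
            self.words[addr] = value
        self.reservation = None
        return ok


def atomic_increment(mem, addr):
    # Retry loop: repeat until the conditional store succeeds
    while True:
        old = mem.lwarx(addr)
        if mem.stwcx(addr, old + 1):
            return old + 1
```

If another store hits the reserved address between the lwarx and the stwcx., the conditional store fails and the loop simply retries with the fresh value.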


PowerPC Instructions … Cont’d

Flow control instructions are:
branching, condition register logical, trap, and
other instructions that affect the instruction
flow
Processor control instructions are used for
synchronizing memory accesses and
management of caches, TLBs, and the segment
registers
Memory control instructions provide control of
caches, TLBs, and SRs


PowerPC 750 Block Diagram
[Figure: PowerPC 750 block diagram. Recoverable labels: instruction fetch (IF) and branch processing; dispatch; registers and rename buffers; reservation stations; execution (EXE); completion (COM); instruction cache (L1); data cache (L1); L2 cache interface]


PowerPC 750 – Instruction Flow
Now let us discuss the instruction flow in the
PowerPC 750, which includes:
– Instruction fetch
– Instruction decode
– Instruction dispatch

The instruction flow in the PowerPC 750 is
illustrated here with the help of a block diagram
The PowerPC 750 fetches a maximum of four
instructions per clock cycle


PowerPC 750: Instruction Flow (decode/dispatch)
[Figure: instruction flow diagram. Recoverable labels: fetch (maximum 4 instructions per cycle); instruction queue; branch processing unit (BPU); dispatch unit (maximum 2 instructions per cycle, 1 instruction per unit); completion queue assignment; reservation stations; store queue; completion queue; complete]


PowerPC 750 – Instruction Fetch … Cont’d
However, the number of clock cycles
needed to fetch instructions from the
memory system depends on where the
instructions reside:
1. Branch target instruction cache
2. On-chip L1 instruction cache
3. L2 cache
Having covered instruction fetch, let us
discuss how the PowerPC 750 decodes and
dispatches instructions


PowerPC 750 – Decode/Dispatch
Refer to the instruction flow diagram again and
note that:
– Instructions can be dispatched only from the
two lowest instruction queue entries, IQ0 and
IQ1
– A maximum of two instructions can be
dispatched per clock cycle (although an
additional branch instruction can be handled
by the Branch Processing Unit, BPU)
– Only one instruction can be dispatched to each
execution unit per clock cycle


PowerPC 750 – Decode/Dispatch
Note that to facilitate dispatch:
– There must be a vacancy in the specified
execution unit
– A rename register must be available for each
destination operand specified by the
instruction
– There must be an open position in the
completion queue; If no entry is available, the
instruction remains in the IQ.
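
The dispatch conditions listed on these two slides can be sketched as a per-cycle check. The Python below is a minimal model under stated assumptions: the function dispatch_cycle and its dictionary fields are hypothetical, rename registers are lumped into a single pool rather than separate GPR/FPR pools, and dispatch is assumed to be in order, so a stalled IQ0 also blocks IQ1.

```python
def dispatch_cycle(iq, free_stations, free_renames, free_cq_slots):
    """Return names of instructions dispatched this cycle from IQ0 and IQ1.

    iq: list of dicts with "name", "unit", "dests", oldest first (iq[0] is IQ0).
    free_stations: dict mapping unit name to free reservation-station entries.
    """
    free_stations = dict(free_stations)   # do not mutate the caller's state
    busy_units = set()
    dispatched = []
    for instr in iq[:2]:                  # only IQ0 and IQ1 are eligible
        unit, dests = instr["unit"], instr["dests"]
        ok = (unit not in busy_units                # one instruction per unit
              and free_stations.get(unit, 0) > 0    # vacancy in execution unit
              and free_renames >= dests             # rename register per destination
              and free_cq_slots > 0)                # open completion queue entry
        if not ok:
            break   # assumption: in-order dispatch, the instruction stays in the IQ
        busy_units.add(unit)
        free_stations[unit] -= 1
        free_renames -= dests
        free_cq_slots -= 1
        dispatched.append(instr["name"])
    return dispatched


iq = [
    {"name": "add", "unit": "IU1", "dests": 1},
    {"name": "lwz", "unit": "LSU", "dests": 1},
    {"name": "fadd", "unit": "FPU", "dests": 1},
]
```

With all resources free, only the first two entries dispatch; the third waits for the next cycle even though its unit is idle.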


PowerPC 750: Superscalar Pipeline
– Fetch: maximum of four instructions per clock cycle
– Dispatch: maximum of three instructions per clock cycle
(two from the instruction queue plus one branch handled by the BPU)
– Completion: maximum of two instructions per clock cycle



PowerPC 750 – Execution Units
Refer to the PowerPC 750 superscalar pipeline
shown here and note that it contains two
integer units (IUs):
– IU1 can execute any integer instruction
– IU2 can execute all integer instructions except
multiply and divide
The two IUs share thirty-two GPRs for integer
operands, and each has a single-entry
reservation station


PowerPC 750 – Execution Units
Furthermore, there exist:
– One three-stage floating-point unit (FPU) that
supports both single- and double-precision
operations
– Hardware support for denormalized numbers
and a single-entry reservation station
– Thirty-two 64-bit FPRs for single- or
double-precision operands



PowerPC 750 – Execution Units … Cont’d

Two-stage LSU (Load/Store Unit) contains
– Two-entry reservation station
– Single-cycle, pipelined cache access
– Three-entry store queue

Supports both big- and little-endian modes
Its dedicated adder performs effective address
(EA) calculations
It performs alignment and precision conversion
for floating-point data and sign extension for
integer data
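
A minimal sketch of the two LSU operations named above, assuming the displacement addressing form in which the EA is a base register value plus a sign-extended 16-bit displacement, wrapped to 32 bits. The helper names sign_extend and effective_address are illustrative only.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def effective_address(base, disp16):
    """EA = base + sign-extended 16-bit displacement, wrapped to 32 bits."""
    return (base + sign_extend(disp16, 16)) & MASK32
```

For example, a displacement field of 0xFFFC represents -4, so a base of 0x1000 yields an EA of 0x0FFC. The same sign_extend helper models what the LSU does when loading a signed halfword into a 32-bit register.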



PowerPC 750: Completion Unit
The completion unit retires an instruction from the
six-entry reorder buffer (completion queue)
when:
1. All instructions ahead of it have been completed, and
2. The instruction has finished execution, and
3. No exceptions are pending
The completion unit guarantees a sequential
programming model (precise exception model)


PowerPC 750 Completion Unit
Monitors all dispatched instructions and retires
them in order
Tracks unresolved branches and flushes
instructions fetched along a mispredicted path
Retires as many as two instructions per clock
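
The retirement rules can be sketched as a small in-order queue. The Python below is an illustrative model only: the class CompletionQueue and its methods are hypothetical names, the six-entry size and two-per-clock retire width come from the slides, and a pending exception simply stops retirement here rather than invoking a handler.

```python
from collections import deque


class CompletionQueue:
    """In-order retirement from a fixed-size completion queue."""

    def __init__(self, size=6, retire_width=2):
        self.size = size
        self.retire_width = retire_width
        self.entries = deque()            # oldest instruction first

    def dispatch(self, name):
        if len(self.entries) >= self.size:
            return False                  # queue full: dispatch must stall
        self.entries.append({"name": name, "finished": False, "exception": False})
        return True

    def finish(self, name, exception=False):
        # Execution units may finish out of order
        for entry in self.entries:
            if entry["name"] == name:
                entry["finished"] = True
                entry["exception"] = exception

    def retire(self):
        # Retire from the head only, at most retire_width per clock
        retired = []
        while self.entries and len(retired) < self.retire_width:
            head = self.entries[0]
            if not head["finished"] or head["exception"]:
                break
            retired.append(self.entries.popleft()["name"])
        return retired
```

An instruction that finishes early still waits behind an unfinished older instruction, which is what keeps the architected state consistent with program order.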




PowerPC 750 Rename Buffers
The 750 provides rename registers for holding
instruction results before the completion unit
commits them to the architected registers
Refer to the instruction flow diagram again
and note that there are six GPR rename
registers, six FPR rename registers, and one
each for the CR, LR, and CTR
When an instruction is dispatched to its
execution unit, a rename register for the
results of that instruction is assigned


PowerPC 750 Rename Buffers
The dispatcher also provides a tag to the execution
unit identifying the rename register that
forwards the required data for an instruction
When the source data reaches the rename
register, execution can begin
Results are transferred from the rename
registers to the architected registers by the
completion unit when an instruction is retired
from the completion queue
Results of squashed instructions are flushed
from the rename registers
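
The life cycle of a rename register (allocate at dispatch, receive the result, then commit or flush) can be sketched as follows. This is a simplified model with hypothetical names (RenameFile, allocate, write_result, commit, flush) and a single pool of six rename registers, matching the GPR count given above.

```python
class RenameFile:
    """Rename registers that buffer results until the instruction retires."""

    def __init__(self, num_rename=6):
        self.arch = {}                        # architected registers, e.g. "r3"
        self.free = list(range(num_rename))   # free rename register tags
        self.pending = {}                     # tag -> [architected name, value]

    def allocate(self, arch_reg):
        # Called at dispatch; None means no rename register is available
        if not self.free:
            return None
        tag = self.free.pop(0)
        self.pending[tag] = [arch_reg, None]
        return tag

    def write_result(self, tag, value):
        # Execution unit writes its result to the rename register
        self.pending[tag][1] = value

    def commit(self, tag):
        # Completion unit copies the result to the architected register
        arch_reg, value = self.pending.pop(tag)
        self.arch[arch_reg] = value
        self.free.append(tag)

    def flush(self, tag):
        # Squashed instruction: discard the result, free the register
        self.pending.pop(tag)
        self.free.append(tag)
```

Until commit is called, the architected register keeps its old value, so a flushed instruction leaves no visible trace.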


PowerPC 750 Branch Prediction Unit
The 750 features both static and dynamic branch
prediction, but only one is used at any given time
Static branch prediction
– It is defined by the PowerPC architecture and
involves encoding the branch instructions
– The PowerPC architecture provides a field in
branch instructions (the BO field) to allow
software to hint whether a branch is likely to be
taken


PowerPC 750 Branch Prediction Unit
– Rather than delaying instruction processing
until the condition is known, the 750 uses the
instruction encoding to predict whether the
branch is likely to be taken and begins fetching
and executing along that path
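
A small sketch of static prediction from the instruction encoding, assuming the usual PowerPC convention for conditional branches with a displacement: by default a backward branch (negative displacement) is predicted taken and a forward branch not taken, and the hint bit in the BO field (often called the y bit) reverses that default. The function name predict_taken_static is illustrative.

```python
def predict_taken_static(displacement, y_bit):
    """Static prediction from the branch displacement and the BO hint bit."""
    default_taken = displacement < 0      # backward branches (loops) default to taken
    return default_taken != bool(y_bit)   # y = 1 reverses the default
```

A loop-closing branch with y = 0 is therefore predicted taken, which suits loops that iterate many times.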
Dynamic branch prediction:
– The 750 uses a 512-entry branch history table
(BHT) with two bits per entry
– The two bits encode four prediction states: not-taken, strongly not-taken, taken, strongly taken
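
The two-bit scheme can be sketched as a table of saturating counters. In the Python below, the 512 entries and the two bits per entry come from the slide; the indexing by word-aligned address bits, the initial counter value, and the class name BranchHistoryTable are assumptions made for illustration.

```python
class BranchHistoryTable:
    """512 two-bit saturating counters.

    Counter values: 0 = strongly not-taken, 1 = not-taken,
    2 = taken, 3 = strongly taken.
    """

    def __init__(self, entries=512):
        self.entries = entries
        self.counters = [1] * entries      # assumption: start at not-taken

    def _index(self, pc):
        return (pc >> 2) % self.entries    # assumption: index by word address bits

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

Because the counter must move through two states to flip the prediction from a strong state, a single atypical outcome (such as a loop exit) does not change the prediction for the next run of the loop.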


PowerPC 750 – Branch Target Cache (BTC)

The 750 uses the BTC to reduce the time required for
fetching target instructions when a branch is
predicted to be taken

Branch Target Instruction Cache (BTIC)
– 64-entry (16-set, four-way set-associative)
– Cache of branch instructions that have been
encountered in branch/loop code sequences
– BTIC hit: instructions are fetched into the
instruction queue a cycle sooner than they can be
made available from the instruction cache
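
The 64-entry, 16-set, four-way organization can be sketched as a small set-associative lookup. Only the geometry comes from the slide; how the real BTIC forms its index and tag and which entry it evicts are not stated here, so the address split and the evict-oldest policy below are assumptions, and the class name BTIC with its methods is illustrative.

```python
class BTIC:
    """64-entry cache organized as 16 sets of 4 ways."""

    SETS, WAYS = 16, 4

    def __init__(self):
        # Each set holds up to WAYS (tag, instructions) pairs, oldest first
        self.sets = [[] for _ in range(self.SETS)]

    def _split(self, addr):
        word = addr >> 2                   # assumption: word-aligned addresses
        return word % self.SETS, word // self.SETS

    def lookup(self, addr):
        index, tag = self._split(addr)
        for entry_tag, instrs in self.sets[index]:
            if entry_tag == tag:
                return instrs              # hit: instructions available early
        return None                        # miss: fall back to the instruction cache

    def insert(self, addr, instrs):
        index, tag = self._split(addr)
        ways = [e for e in self.sets[index] if e[0] != tag]
        if len(ways) == self.WAYS:
            ways.pop(0)                    # assumption: evict the oldest entry
        ways.append((tag, instrs))
        self.sets[index] = ways
```

Addresses that are 64 bytes apart map to the same set in this model, so a fifth such entry displaces the oldest of the four already present.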

