CS 704
Advanced Computer Architecture
Lecture 6
Instruction Set Principles
(ISA Performance Analysis, Fallacies and Pitfalls)
Prof. Dr. M. Ashraf Chughtai
MAC/VU-Advanced
Computer Architecture
Lecture 6- Instruction Set Principles (3)
Today’s Topics
Recap Lecture 5
DSP Media Operations
ISA Performance
Putting it all Together
Summary
Recap: Lecture 5
Instruction encoding
- Essential elements of a computer instruction word:
  - Type of operands
  - Places of sources and destinations
  - Place of next instruction
- Instruction word length
  - Variable length
  - Fixed length
  - Hybrid (variable and fixed)
- Categories of hybrid length: 4-, 3-, 2-, 1- and 0-address formats
Recap: Lecture 5 (cont'd)
- Comparison of hybrid instruction word formats
  - The 1-address (accumulator) format requires the fewest memory bytes
  - The 4-address format requires the most
- MIPS instruction word format
  - MIPS is a RISC, fixed-length, 64-bit load/store architecture
  - It supports:
    - 8-, 16-, 32- and 64-bit operands
    - R-type, I-type and J-type instruction formats
    - Arithmetic and logic operations
    - Data transfer operations
    - Control flow operations
Media and Signal Processing Operands
- Graphics applications deal with 2D and 3D images
- The 3D data type is called a vertex
- A vertex structure has 4 components:
  - x-coordinate
  - y-coordinate
  - z-coordinate
  - w-coordinate
- The first three components give the position; the fourth (w) helps with color and hidden surfaces
- Three vertices specify a graphics primitive, such as a triangle
- Vertex values are usually 32-bit floating-point values
- DSPs add fixed point to the data types, with the binary point just to the right of the sign bit
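
A minimal sketch of the fixed-point format described above, assuming a 16-bit word (often called Q15): one sign bit followed by 15 fraction bits, so values lie in [-1, 1). The 16-bit width and the function names are illustrative assumptions, not taken from the lecture.

```python
def float_to_q15(x: float) -> int:
    """Encode x in [-1, 1) as a 16-bit two's-complement Q15 bit pattern."""
    scaled = int(round(x * 32768))            # 2**15 steps per unit
    scaled = max(-32768, min(32767, scaled))  # clamp to the 16-bit signed range
    return scaled & 0xFFFF


def q15_to_float(bits: int) -> float:
    """Decode a 16-bit Q15 bit pattern back into a Python float."""
    if bits & 0x8000:      # sign bit set, so the value is negative
        bits -= 0x10000
    return bits / 32768


print(hex(float_to_q15(0.5)))  # 0x4000
print(q15_to_float(0xC000))    # -0.5
```

Because the binary point sits right after the sign bit, the product of two such values never exceeds 1 in magnitude, which is one reason the format suits signal-processing arithmetic.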
3D Data Type
- A triangle is visible when it is depicted as filled with pixels
- Pixels are typically 32 bits, usually consisting of four 8-bit channels:
  - R: red
  - G: green
  - B: blue
  - A: transparency of the pixel when it is depicted
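
As an illustration of the pixel layout above, the sketch below packs four 8-bit channels into one 32-bit value and unpacks them again. The channel order inside the word (R in the top byte, A in the bottom byte) is an assumption made for the example; real formats differ.

```python
def pack_rgba(r: int, g: int, b: int, a: int) -> int:
    """Pack four 8-bit channels into one 32-bit pixel (assumed order: R, G, B, A)."""
    for channel in (r, g, b, a):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must fit in 8 bits")
    return (r << 24) | (g << 16) | (b << 8) | a


def unpack_rgba(pixel: int) -> tuple:
    """Split a 32-bit pixel back into its (R, G, B, A) channels."""
    return ((pixel >> 24) & 0xFF, (pixel >> 16) & 0xFF,
            (pixel >> 8) & 0xFF, pixel & 0xFF)


print(hex(pack_rgba(255, 128, 0, 255)))  # 0xff8000ff
```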
Media and Signal Processing Operations
- Data for multimedia operations is usually much narrower than the 64-bit data word of modern processors
- Thus, a 64-bit word may be partitioned into four 16-bit data values so that the 64-bit ALU can perform four 16-bit operations (say, additions) in a single clock cycle
Media and Signal Processing Operations
- Here, extra hardware is added to prevent carries from propagating between the four 16-bit partitions of the 64-bit ALU
- These operations are called Single-Instruction Multiple-Data (SIMD) or vector operations
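
The effect of blocking the carry between partitions can be sketched in software. The code below models a 64-bit word holding four 16-bit lanes and adds lane by lane, discarding each lane's carry-out. The loop only models the result; the hardware described above does all four additions in parallel. All names are illustrative.

```python
LANE_BITS = 16
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1


def pack_lanes(values):
    """Place four 16-bit values side by side in one 64-bit word (lane 0 lowest)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack_lanes(word):
    """Extract the four 16-bit lanes from a 64-bit word."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def simd_add16(a, b):
    """Add two 64-bit words as four independent 16-bit lanes."""
    result = 0
    for i in range(LANES):
        shift = i * LANE_BITS
        lane_sum = ((a >> shift) & LANE_MASK) + ((b >> shift) & LANE_MASK)
        result |= (lane_sum & LANE_MASK) << shift  # drop the carry out of this lane
    return result


a = pack_lanes([0xFFFF, 1, 2, 3])
b = pack_lanes([1, 1, 1, 1])
print(unpack_lanes(simd_add16(a, b)))             # [0, 2, 3, 4]
print(unpack_lanes((a + b) & ((1 << 64) - 1)))    # [0, 3, 3, 4]
```

The second line of output shows what an ordinary 64-bit add would do: the carry out of lane 0 leaks into lane 1 and corrupts it.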
Multimedia Operations
- Most graphics multimedia applications use 32-bit floating-point operations
- Some computers allow a single instruction to launch two 32-bit operations on operands found side by side in a double-precision register
- The table shown here summarizes SIMD instructions found in recent computers
Summary of SIMD instructions in recent computers
(Table: Fig. 2.17, page 110)
Multimedia Operations
- You may note that there is very little in common across the five architectures
- All are fixed-width operations, performing multiple narrow operations on either a 64-bit or a 128-bit ALU
- The narrow operations are shown as:
  - B: byte
  - H: half word
  - W: word
  - A leading count gives the number of elements, e.g. 8B is an operation on eight bytes (one double word)
Digital Signal Processing Issues
- Saturating add/subtract
  - Too-large results and overflow
- Result rounding
  - Choose from the IEEE 754 rounding modes
- Multiply-accumulate
  - Vector and matrix dot product operations
DSP Operations
Saturating Add/Sub
- DSPs cannot ignore overflow; otherwise they may miss an event. Therefore, they use saturating arithmetic.
- Here, if the result is too large to be represented, it is set to the largest representable number of the matching sign (the most positive or the most negative value)
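
The difference between ordinary wraparound and saturation can be sketched as follows, assuming 16-bit signed operands. The width and the function names are illustrative choices, not something the lecture specifies.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def wrapping_add16(a: int, b: int) -> int:
    """Ordinary two's-complement add: overflow silently wraps around."""
    s = (a + b) & 0xFFFF
    return s - 0x10000 if s & 0x8000 else s


def saturating_add16(a: int, b: int) -> int:
    """Saturating add: an out-of-range result is clamped to the nearest limit."""
    return max(INT16_MIN, min(INT16_MAX, a + b))


print(wrapping_add16(30000, 10000))      # -25536
print(saturating_add16(30000, 10000))    # 32767
print(saturating_add16(-30000, -10000))  # -32768
```

With wraparound, a large positive signal turns into a large negative one; with saturation it merely clips, which is usually the less harmful error for audio or video samples.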
DSP Operations
Result Rounding
- IEEE 754 defines several rounding modes; DSPs select the appropriate mode to round the wider accumulator into a narrower data word
Multiply-Accumulate (MAC)
- MAC operations are the key to the dot product operations of vector and matrix multiplication, which need to accumulate a series of products
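
A hedged sketch of how multiply-accumulate and result rounding fit together in a dot product. It assumes 16-bit fixed-point inputs with 15 fraction bits, a wide accumulator, and round-half-up as the rounding rule; these are illustrative assumptions, and a real DSP may offer other rounding modes.

```python
def mac_dot_q15(xs, ys):
    """Dot product of two vectors of signed 16-bit fixed-point integers."""
    acc = 0                      # wide accumulator (Python ints never overflow)
    for x, y in zip(xs, ys):
        acc += x * y             # one multiply-accumulate step; product has 30 fraction bits
    # Round the wide accumulator to 15 fraction bits: add half an LSB, then shift.
    rounded = (acc + (1 << 14)) >> 15
    # Saturate to the 16-bit range in case the sum is too large.
    return max(-32768, min(32767, rounded))


# 0.5 * 0.5 + 0.25 * 0.5 = 0.375, and 0.375 * 32768 = 12288
print(mac_dot_q15([16384, 8192], [16384, 16384]))  # 12288
```

Keeping the running sum in a wide accumulator and rounding only once at the end avoids accumulating a rounding error on every product.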
ISA Performance
Role of Compiler
- The interaction of the compiler and the high-level language significantly affects how a program uses an ISA
- Optimizations performed by compilers can be classified as follows:
Classification of Performance Optimization
- High-level optimization: often done on the source, with the output fed to the later optimization passes
- Local optimization: done within a straight-line code fragment (basic block)
- Global optimization: extends the optimization across branches
- Register allocation: associates registers with operands
- Processor-dependent optimization: exploits knowledge of the specific architecture
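
To make the "local optimization" category concrete, the sketch below shows one well-known example, common subexpression elimination, applied by hand inside a single basic block. The choice of this particular optimization is an illustration and is not named in the lecture; `before` and `after` are hypothetical function names.

```python
def before(a, b, c):
    """Straight-line code as written: (a + b) is computed twice."""
    x = (a + b) * c
    y = (a + b) - c
    return x, y


def after(a, b, c):
    """Same block after the optimization: (a + b) is computed once and reused."""
    t = a + b   # a compiler would typically keep this temporary in a register
    x = t * c
    y = t - c
    return x, y


print(before(2, 3, 4), after(2, 3, 4))  # (20, 1) (20, 1)
```

The transformation is safe here because nothing between the two uses can change `a` or `b`; across branches the same reasoning requires global analysis.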
Impact of Compiler Technology
- The interaction of the compiler and the high-level language affects how a program uses an ISA
- Here, two important questions are:
  1: How are variables allocated?
  2: How many registers are needed to allocate variables appropriately?
- These questions are addressed by looking at the three areas in which high-level languages allocate data
Three areas of data allocation
1: Local Variable Area: Stack
- It is used to allocate local variables; it grows or shrinks on procedure call or return
- Objects on the stack are primarily scalars (single variables rather than arrays) and are addressed relative to the stack pointer
- Register allocation is much more effective for stack-allocated objects
Three areas of data allocation (cont'd)
2: Global Data Area
- It is used to allocate statically declared objects such as global variables and constants
- These objects are mostly arrays and other aggregate data structures
- Register allocation is relatively less effective for global variables
- Global variables may be aliased: there are multiple ways to refer to their address, which makes it illegal to keep them in registers
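
Why aliasing blocks register allocation can be shown with a small model. Here a dictionary stands in for memory, a local variable stands in for a register, and a second name for the same location plays the role of a pointer alias. The whole setup is a hypothetical illustration.

```python
def run(keep_in_register: bool) -> int:
    """Read g, store to it through an alias, then use g again."""
    memory = {"g": 10}     # global variable g lives in memory
    alias = "g"            # a pointer that happens to refer to the same location
    g_reg = memory["g"]    # load g into a "register"
    memory[alias] = 99     # store through the alias changes g in memory
    if keep_in_register:
        return g_reg + 1   # stale copy: the store above is not seen
    return memory["g"] + 1 # reload from memory: the correct value


print(run(keep_in_register=True))   # 11
print(run(keep_in_register=False))  # 100
```

Unless the compiler can prove that no alias exists, it must take the second path and reload the variable, which is why keeping aliased variables in registers is not allowed.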
Three areas of data allocation (cont'd)
3: Dynamic Object Allocation: Heap
- It is used to allocate objects that do not adhere to a stack discipline
- Objects in the heap are accessed with pointers and are typically not scalars
- Most heap variables are aliased, so register allocation is almost impossible for the heap
ISA Performance (cont'd)
MIPS Floating-Point Operations
- These instructions manipulate the floating-point registers
- They indicate whether the operation is to be performed in single precision or double precision
  - MOV.S copies a single-precision register to another of the same type
  - MOV.D copies a double-precision register to another of the same type
MIPS Floating-Point Operations (cont'd)
- To get greater performance for graphics routines, MIPS64 offers paired-single instructions
- These instructions perform two 32-bit floating-point operations, one on each half of the 64-bit floating-point register
- Examples:
  - ADD.PS
  - SUB.PS
  - MUL.PS
  - DIV.PS
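
The behavior of a paired-single add can be modeled in software. The sketch below treats a 64-bit integer as a floating-point register holding two 32-bit single-precision values and adds the halves independently. It models only the arithmetic result, not the actual MIPS64 encoding or exception behavior; `pack_ps`, `unpack_ps` and `add_ps` are hypothetical helper names.

```python
import struct


def pack_ps(upper: float, lower: float) -> int:
    """Store two single-precision values in one 64-bit register image."""
    return int.from_bytes(struct.pack(">ff", upper, lower), "big")


def unpack_ps(reg: int) -> tuple:
    """Read back the (upper, lower) single-precision halves of a register image."""
    return struct.unpack(">ff", reg.to_bytes(8, "big"))


def add_ps(fs: int, ft: int) -> int:
    """Model of a paired-single add: each half is added independently."""
    a_hi, a_lo = unpack_ps(fs)
    b_hi, b_lo = unpack_ps(ft)
    # struct.pack rounds each sum back to single precision.
    return pack_ps(a_hi + b_hi, a_lo + b_lo)


result = add_ps(pack_ps(1.5, 2.0), pack_ps(0.25, -4.0))
print(unpack_ps(result))  # (1.75, -2.0)
```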
Putting it All Together
- The instruction sets of the earliest architectures were limited by the hardware technology of that time
- In the 1960s, stack architectures became popular, viewed as being a good match for high-level languages
- In the 1970s, the main concern of architects was to reduce software cost, which produced high-level architectures such as the VAX
Putting it All Together (cont'd)
- In the 1980s, a return to simpler architectures took place, made possible by sophisticated compiler technology
- In the 1990s, new architectural features were introduced; these include:
Putting it All Together (cont'd)
1990s Architectures
1: Address size doubled from 32 bits to 64 bits
2: Optimization of conditional branches via conditional execution, e.g. conditional move
3: Optimization of cache performance via prefetch, reflecting the increased role of the memory hierarchy in computer performance
4: Multimedia support
5: Faster floating-point instructions
6: Long instruction word