Texts in Computer Science
Editors
David Gries
Fred B. Schneider


Daniel Page

A Practical Introduction to
Computer Architecture



Dr. Daniel Page
University of Bristol
Dept. Computer Science
Room 3.10, Merchant Venturers Building
Woodland Road, Bristol
United Kingdom, BS8 1UB


Series Editors
David Gries
Department of Computer Science
Upson Hall
Cornell University
Ithaca, NY 14853-7501, USA



Fred B. Schneider
Department of Computer Science
Upson Hall
Cornell University
Ithaca, NY 14853-7501, USA

ISBN 978-1-84882-255-9
e-ISBN 978-1-84882-256-6
DOI 10.1007/978-1-84882-256-6
Springer Dordrecht Heidelberg London New York
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Control Number: 2009922086
© Springer-Verlag London Limited 2009
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as
permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced,
stored or transmitted, in any form or by any means, with the prior permission in writing of the
publishers, or in the case of reprographic reproduction in accordance with the terms of licenses issued by
the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent
to the publishers.
The use of registered names, trademarks, etc., in this publication does not imply, even in the absence of a
specific statement, that such names are exempt from the relevant laws and regulations and therefore free
for general use.
The publisher makes no representation, express or implied, with regard to the accuracy of the information
contained in this book and cannot accept any legal responsibility or liability for any errors or omissions
that may be made.

Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)



To Kate ♥


Foreword

It is a great pleasure to write a preface to this book. In my view, the content is
unique in that it blends traditional teaching approaches with the use of mathematics
and a mainstream Hardware Design Language (HDL) as formalisms to describe
key concepts. The book keeps the “machine” separate from the “application” by
strictly following a bottom-up approach: it starts with transistors and logic gates and
only introduces assembly language programs once their execution by a processor is
clearly defined.
Using a HDL, Verilog in this case, rather than static circuit diagrams is a big
deviation from traditional books on computer architecture. Static circuit diagrams
cannot be explored in a hands-on way like the corresponding Verilog model can. In
order to understand why I consider this shift so important, one must consider how
computer architecture, a subject that has been studied for more than 50 years, has
evolved.
In the pioneering days computers were constructed by hand. An entire computer
could (just about) be described by drawing a circuit diagram. Initially, such diagrams consisted mostly of analogue components before later moving toward digital logic gates. The advent of digital electronics led to more complex cells, such
as half-adders, flip-flops, and decoders being recognised as useful building blocks.
However, miniaturisation of devices and hence computers has led to the design of
single circuits containing millions or even billions of components. As a result, hand lay-out is only used for specific modules, and circuit diagrams are less useful as a
mechanism for describing functionality for real circuits.
Instead, two formalisms are used in industry: HDLs and mathematics. A HDL
allows us to tell the component-layout and simulation tools how we would like to
implement our circuit; mathematics tells us what the circuit ought to do. In order
to verify whether the circuit does what we want it to, we can (partly mechanically)
compare the mathematical description with the HDL description. This represents
increased use of abstraction to cope with complexity, and an engineer can now be
productive by simply understanding and using high-level circuit design (e.g., multiplier design or pipelined processors) and formalisms (e.g., HDLs and mathematics).
Circuit diagrams are still used in the design flow, but mostly to sketch the physical
layout, in order to predict whether a circuit can be laid out sensibly.
Dealing with gaps in understanding between such a wide range of concepts and
techniques is often off-putting for people new to the subject. The best way to approach the problem is by placing it within a practical context that enables students
to experiment with ideas and discover for themselves the advantages and disadvantages
of a particular technique.
In this book, Dan does just that by giving an excellent overview of key concepts
and an introduction to formalisms with which they can be explored. I hope this book
will inspire many readers to follow a career in this fascinating subject.

Henk Muller.
Principal Technologist, XMOS.


Preface

In theory there is no difference between theory and practice. In practice there is.
– L.P. “Yogi” Berra


Overview
In my (limited) experience, and although there are a number of genuinely excellent textbooks on the subject, two main challenges exist in relation to delivery of
University level taught modules in computer architecture:
1. Such modules are often regarded as unpopular and irrelevant by students who
have not been exposed to the subject before, and who view a computer system
from the applications level. This is compounded by the prevalence of technologies such as Java which place a further layer between the student and actual
computer hardware. In short, and no matter how one tries to persuade them otherwise, students often see no point in learning about the internals of a computer
system because they cannot see the benefit.
2. Conventional textbooks teach the subject in a different way than in other modules
students are exposed to at the same time. For example, conventional wisdom says
that one cannot “teach” programming, one has to “do” programming in order
to learn. This is in stark contrast to textbooks on computer architecture where
students are often forced to learn in a more theoretical way, learning by taking
facts for granted rather than experimenting to arrive at their own conclusions. For
example, because of the difficulty in working with large logic designs on paper,
any practical work is often limited and hence detached from the more challenging
content.
I would argue that this is a shame: computer architecture represents a broad spectrum of fundamental and exciting topics that underpin computer science in general. Aside from the technical challenges and sense of achievement that stem from
understanding exactly how high-level programs are actually executed on devices
built from simple building blocks, historical developments in computer architecture
neatly capture and explain many design decisions that have shaped a landscape we
now take for granted. The representation of strings in C is a great example: the null-terminated ASCIIZ approach was not adopted for any real reason other than the
PDP-7 computer included instructions ideal for processing strings in this form, and
yet we still live with this decision years after the PDP-7 became obsolete. Seemingly frivolous anecdotes and examples like this are increasingly being consigned
to history whereas from an Engineering perspective, one would like to learn and
understand previous approaches so as to potentially improve in the future.
International experts regularly debate tools and techniques for delivering
University-level modules in computer architecture; the Workshop on Computer Architecture Education (WCAE), currently held in conjunction with the International
Symposium on Computer Architecture (ISCA), is the premier research conference
in this area. This book represents an attempt at translating my personal philosophy,
that theoretical concepts should be accessible for practical experimentation, into a
form suitable for use in such modules. Put simply, I see computer architecture as a
subject in which “getting things done” is paramount; the ability to understand tradeoffs before selecting between and implementing well considered design options is
often as important as the study of those options at a more theoretical level. This focus is underlined by the book sub-title: a “practical” approach is the aim throughout.
To enable this, a key feature of this book is inclusion and use of a hardware description language (i.e., Verilog), and a concrete processor (i.e., MIPS32) as practical
vehicles for modelling and experimenting with digital logic and processor design.

Target Audience
The content is organised into three parts which contain a total of thirteen core chapters. Although some slight disagreement about inclusion of specific topics is inevitable, the chapters represent a compromise between my informal opinion, interests and experience, and more formal curriculum guidelines such as that developed
by the UK Quality Assurance Agency (QAA).
Of course, international equivalents exist; examples include those developed jointly
by the IEEE and ACM, leading professional bodies within this domain.
The general aim of this book is to cover topics every computer science student
should have at least a basic grasp of, and equip said students with enough knowledge to read and understand more advanced textbooks. In this respect, the core target audience is first-year Undergraduate students with a rudimentary knowledge of
programming in C. More generally, the book content is pitched at a level which satisfies most of the demands that have resulted from our degree programmes at the
University of Bristol. In particular, the more advanced material has proved useful as
a bridge toward, or in support of, more specialised textbooks that cover later-year
Undergraduate and Postgraduate modules.

Organisation
The book chapters are described briefly below. Very roughly the three parts of the
book can be viewed as somewhat self-contained, representing three layers or levels
of abstraction: the digital logic layer, the instruction set and micro-architecture
layer, and the hardware/software interface. Part 1 deals with basic tools and techniques which underpin the rest of the book:
Chapter 1 Introduces the theoretical background including logic, sets, and number representation.
Chapter 2 Uses the theory developed in the previous chapter to describe the basics of digital logic including logic gates and their construction using transistors,
combinatorial and clocked circuits and their optimisation.
Chapter 3 As a means of realising the digital logic designs presented in the previous chapter, Verilog is presented in an introductory manner; this content is
written with a reader who is a proficient C programmer in mind.
Part 2 deals with the broad topic of processor design and implementation. The content takes a step-by-step approach, starting with a functional description of a computer processor and gradually expanding on the details, issues and techniques that
have resulted in modern, high-performance processor designs:
Chapter 4 The first step in introducing the design of processors (as an extension
to the study of general circuits) is to track historical developments and use them
as a means to explain central concepts such as the fetch-decode-execute cycle.
Chapter 5 Once the core concepts are introduced, a concrete realisation in the
form of MIPS32 is discussed; this discussion includes details such as addressing
modes and instruction encoding for example.
Chapter 6 This chapter outlines some basic methods for evaluating processors
in terms of various metrics which can be used to define quality, focusing on
performance in particular.
Chapter 7 As a vital component in any processor, the design of efficient circuits
for arithmetic (e.g., addition and multiplication) is introduced and demonstrated
using Verilog.
Chapter 8 In the same way as arithmetic, the design of efficient components in
the memory hierarchy is introduced and demonstrated using Verilog.
Chapter 9 Finally, a number of more advanced topics in processor design are
investigated including approaches such as superscalar and vector processors.
Finally, Part 3 attempts to bridge the gap between hardware and software by examining the programming tools and operating system concepts that support the development and execution of programs:
Chapter 10 As a first step, this chapter presents a detailed description of a development tool-chain, starting with linkers and assemblers. This material is supported by and links to Appendix A, which provides a stand-alone tutorial on using SPIM, a MIPS32 simulator. As such, it provides a concrete means of writing
programs for the processor design introduced earlier in the book.
Chapter 11 Following the previous chapter, compilers are now introduced by focusing on those aspects which are most closely tied to the processor they target.
For example, register allocation, instruction selection and scheduling are all covered.
Chapter 12 With a means of writing programs established, the operating system layer is introduced in a practical way by using SPIM as a platform for real
implementations of concepts such as scheduling and interrupt handling.
Chapter 13 Finally, a somewhat novel chapter examines the concept of efficient
programming. This is written from the point of view of a programmer who wants
(or needs) to capitalise on the behaviour and characteristics of the concepts presented in previous chapters to improve their programs.
Each chapter concludes with a set of example questions which are largely at a level
one might encounter in a degree-level examination. These questions are numbered
consecutively (rather than relative to the chapter number) and a set of example solutions can be found in Appendix B.
Although this content might clearly contain accidental omissions, some topics
have been avoided by design. Optimistically, one might view them as ideal for inclusion in future versions of the book; more realistically they represent topics that
are important but not vital at the level the book is aimed at:
• Perhaps the largest omission, at least the one which seems likely to prompt the
loudest outcry, is that of floating point arithmetic. Although the book covers
floating point representation briefly, the circuits for arithmetic and instruction
sets that use them are integer only. The rationale for this decision was purely
space: floating point represents a fairly self-contained topic which could be left
out without too negative an impact on core topics covered by the rest of the
content.
• A similar situation exists with the topic of design verification: although testing
the digital logic designs one can produce with Verilog is important, one could
easily dedicate an entire book to the art of good design verification.
• The main emphasis of the book is processors and as such, they are examined
somewhat in isolation. This contrasts with reality where such processors typically
form part of a larger system including peripherals, communication networks and
so on. As such, the broad topic of system-level design, which is central to other
books, is not covered here.
• At the time of writing, the “hot topic” in computer architecture is the advent of
multi-core devices, i.e., many processors on a single chip. Although this has been
a research area for many years in an attempt to cope with design complexity and
effective use of increased transistor counts, commodity multi-core devices have
breathed new life into the subject. Again, because the emphasis is more intra-processor than inter-processor, we omit the topic here although among all the
options for future material this is perhaps the most compelling.
• The book covers only traditional logic styles. However, for specific application
areas it has become attractive to investigate alternatives; the idea is basically that
one should be aware of available alternatives and use the right one in the right
place. Specifically, secure logic styles (which help to prevent certain types of
passive attack) and asynchronous logic (which avoids the need for global clock
signals) represent interesting avenues for future content.

Contact
As people who know me will (too) willingly attest, I am far from perfect. As a result,
this book is sure to include problems in the shape of minor errors and mistakes. If
you find such a problem, or have a more general comment, I would be glad to hear
about it.

Acknowledgements
This book was typeset with LaTeX, originally developed by Leslie Lamport and
based on TeX by Donald Knuth; the listings package by Carsten Heinz, the
algorithm2e package by Christophe Fiorio and karnaugh by Andreas Wieland
were all used in addition to the basic system. The various figures and source code
listings were generated with the help of ModelSim, a Verilog simulator by Mentor
Graphics; GTKWave, a VCD waveform viewer by Anthony Bybell; xfig, a general
vector drawing package originally written by Supoj Sutanthavibul and maintained
by many others; xcircuit, a circuit vector drawing package by Tim Edwards;
asymptote, a scripted vector drawing package by Andy Hammerlindl, John Bowman, and Tom Prince; SPIM, a MIPS32 simulator written by James Larus; Dinero,
a cache simulator written by Jan Edler and Mark Hill; and finally gcc, the GNU C
compiler, originally by Richard Stallman and now maintained by many others.
Throughout the book, images from other sources are reproduced under specific
licenses; each image of this type carefully notes the source and the license in question. Specific details of the GNU Free Documentation License and Creative Commons Licenses can be found online.
Of course, as with any project, numerous people contributed in other ways. I would
like to thank the (extended) Page, Symonds, Gould and Hunkin families and my
friends Stan and Paul, Gavin and Heather and Fry’s Hockey Club for their support,
and providing an escape from work; Maisie, you still owe me a “Busy Bee” for
helping with your homework. The book could not have been completed without the
help and guidance of Wayne Wheeler, Catherine Brett and Simon Rees at Springer-Verlag. It probably would not have been started at all without the encouragement
and tutelage of Nigel Smart, Henk Muller, David May and James Irwin within the
Computer Science Department at the University of Bristol. Staff and students in the
Cryptography and Information Security Group have provided a constant sounding
board for ideas; I would like to thank Andrew Moss, Rob Granger, Philipp Grabher
and Johann Großschädl in particular. Like all good Engineers, students at the University
of Bristol have never been shy to say when I am talking nonsense or present
something in a particularly boring way; my thanks to them and many anonymous
reviewers for improving the editorial quality throughout.
Most of all I thank Kate for making it all worthwhile.


Contents

Part I Tools and Techniques
1 Mathematical Preliminaries
1.1 Propositions and Predicates
1.1.1 Connectives
1.1.2 Quantifiers
1.1.3 Manipulation
1.2 Sets and Functions
1.2.1 Construction
1.2.2 Operations
1.2.3 Numeric Sets
1.2.4 Functions
1.2.5 Relations
1.3 Boolean Algebra
1.3.1 Boolean Functions
1.3.2 Normal Forms
1.4 Number Systems
1.4.1 Converting Between Bases
1.4.2 Bits, Bytes and Words
1.4.3 Representing Numbers and Characters
1.5 Toward a Digital Logic
1.6 Further Reading
1.7 Example Questions


2 Basics of Digital Logic
2.1 Switches and Transistors
2.1.1 Basic Physics
2.1.2 Building and Packaging Transistors
2.2 Combinatorial Logic
2.2.1 Basic Logic Gates
2.2.2 3-state Logic
2.2.3 Designing Circuits
2.2.4 Simplifying Circuits
2.2.5 Physical Circuit Properties
2.2.6 Basic Building Blocks
2.3 Clocked and Stateful Logic
2.3.1 Clocks
2.3.2 Latches
2.3.3 Flip-Flops
2.3.4 State Machines
2.4 Implementation and Fabrication Technologies
2.4.1 Silicon Fabrication
2.4.2 Programmable Logic Arrays
2.4.3 Field Programmable Gate Arrays
2.5 Further Reading
2.6 Example Questions

3 Hardware Design Using Verilog
3.1 Introduction
3.1.1 The Problem of Design Complexity
3.1.2 Design Automation as a Solution
3.2 Structural Design
3.2.1 Modules
3.2.2 Wires
3.2.3 Values and Constants
3.2.4 Comments
3.2.5 Basic Instantiation
3.2.6 Nested Instantiation
3.2.7 User-Defined Primitives
3.3 Higher-level Constructs
3.3.1 Continuous Assignments
3.3.2 Selection and Concatenation
3.3.3 Reduction
3.3.4 Timing and Delays
3.4 State and Clocked Design
3.4.1 Registers
3.4.2 Processes and Triggers
3.4.3 Procedural Assignments
3.4.4 Timing and Delays
3.4.5 Further Behavioural Statements
3.4.6 Tasks and Functions
3.5 Effective Development
3.5.1 System Tasks
3.5.2 Using the Pre-processor
3.5.3 Parameters
3.5.4 Named Port Lists
3.5.5 Generate Statements
3.5.6 Simulation and Stimuli
3.6 Further Reading
3.7 Example Questions

Part II Processor Design
4 A Historical and Functional Perspective
4.1 Introduction
4.2 Special-Purpose Computers
4.3 General-Purpose Computers
4.4 Stored Program Computers
4.5 Toward Modern Computers
4.5.1 The von Neumann Bottleneck
4.5.2 Data-Dependent Control-Flow
4.5.3 Self-Modifying Programs
4.6 Further Reading
4.7 Example Questions


5 Basic Processor Design
5.1 A Concrete Stored Program Architecture
5.1.1 Major Data-path Components
5.1.2 Describing Instruction Behaviour
5.1.3 The Fetch-Decode-Execute Cycle
5.1.4 Controlling the Data-path
5.2 Buses
5.2.1 Synchronous Buses
5.2.2 Asynchronous Buses
5.3 Addressing Modes
5.3.1 Immediate Addressing
5.3.2 Register Addressing
5.3.3 Memory Addressing
5.4 Instruction Encoding
5.4.1 Instruction Selection
5.4.2 Instruction Formats
5.4.3 Basic Encoding and Decoding
5.4.4 More Complicated Encoding Issues
5.5 Control-Flow
5.5.1 Predicated Execution
5.5.2 Function Calls
5.6 Some Design Philosophy
5.6.1 Moore’s Law
5.6.2 RISC versus CISC
5.7 Putting It All Together
5.8 Further Reading
5.9 Example Questions




6 Measuring Performance
6.1 Measuring Performance
6.1.1 Estimating Execution Time
6.1.2 Measuring Execution Time
6.1.3 Benchmark Programs
6.1.4 Measuring Improvement
6.2 Further Reading
6.3 Example Questions

7

Arithmetic and Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
7.2 Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
7.2.1 Unsigned Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
7.2.2 Signed Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
7.3 Addition and Subtraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
7.3.1 Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
7.3.2 Subtraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
7.4 Shift and Rotate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
7.4.1 Bit-Serial Shifter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
7.4.2 Logarithmic Shifter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
7.5 Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
7.5.1 Bit-Serial Multiplier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 245
7.5.2 Tree Multiplier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
7.5.3 Digit-Serial Multiplier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
7.5.4 Early Termination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 251
7.5.5 Wallace and Dadda Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
7.5.6 Booth Recoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
7.6 Putting It All Together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
7.6.1 Comparison ALU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259
7.6.2 Arithmetic ALU . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
7.7 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
7.8 Example Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266

8 Memory and Storage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
8.1.1 Historical Memory and Storage . . . . . . . . . . . . . . . . . . . . . . . . 271
8.1.2 A Modern Memory Hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . 277
8.1.3 Basic Organisation and Implementation . . . . . . . . . . . . . . . . . 279
8.1.4 Memory Banking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
8.1.5 Access Locality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
8.2 Memory and Storage Specifics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 291
8.2.1 Static RAM (SRAM) and Dynamic RAM (DRAM) . . . . . . . 291
8.2.2 Non-volatile RAM and ROM . . . . . . . . . . . . . . . . . . . . . . . . . . 293
8.2.3 Magnetic Disks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
8.2.4 Optical Disks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 296
8.2.5 Error Correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 297
8.3 Basic Cache Memories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
8.3.1 Fetch Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
8.3.2 Write Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
8.3.3 Direct-Mapped Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 305
8.3.4 Fully-Associative Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
8.3.5 Set-Associative Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
8.3.6 Cache Organisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
8.4 Advanced Cache Memories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
8.4.1 Victim Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
8.4.2 Gated and Drowsy Caches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
8.5 Putting It All Together . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
8.5.1 Register File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
8.5.2 Main Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
8.5.3 Cache Memory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
8.6 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329
8.7 Example Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 329

9 Advanced Processor Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
9.1.1 A Taxonomy of Parallelism . . . . . . . . . . . . . . . . . . . . . . . . . . . . 332
9.1.2 Instruction-Level Parallelism (ILP) . . . . . . . . . . . . . . . . . . . . . 335
9.2 Pipelined Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
9.2.1 Pipelined Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
9.2.2 Pipelined Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347
9.2.3 Pipeline Hazards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
9.2.4 Stalls and Hazard Resolution . . . . . . . . . . . . . . . . . . . . . . . . . . 352
9.3 Superscalar Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
9.3.1 Basic Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
9.3.2 Step 1: Scoreboard-based Design . . . . . . . . . . . . . . . . . . . . . . . 361
9.3.3 Step 2: Reservation Station-based Design . . . . . . . . . . . . . . . . 370
9.3.4 Further Improvements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
9.4 Vector Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
9.4.1 Basic Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
9.4.2 A Dedicated Vector Processor . . . . . . . . . . . . . . . . . . . . . . . . . . 382
9.4.3 SIMD Within A Register (SWAR) . . . . . . . . . . . . . . . . . . . . . . 384
9.4.4 Issues of Vectorisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
9.5 VLIW Processors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
9.5.1 Basic Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
9.6 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
9.7 Example Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390



Part III The Hardware/Software Interface

10 Linkers and Assemblers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
10.2 The Memory Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 400
10.2.1 Stack Section . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
10.2.2 Static Data Section . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 402
10.2.3 Dynamic Data Section . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
10.3 Executable Versus Object Files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 407
10.4 Linkers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410
10.4.1 Static and Dynamic Linkage . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
10.4.2 Boot-strap Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
10.4.3 Symbol Relocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 416
10.4.4 Symbol Resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
10.5 Assemblers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
10.5.1 Basic Assembly Language Statements . . . . . . . . . . . . . . . . . . . 418
10.5.2 Using Machine Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 420
10.5.3 Using Assembler Aliases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
10.5.4 Using Assembler Directives . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
10.5.5 Peephole Optimisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
10.5.6 Some Short Example Programs . . . . . . . . . . . . . . . . . . . . . . . . 435
10.5.7 The Forward Referencing Problem . . . . . . . . . . . . . . . . . . . . . . 439
10.5.8 An Example Assembler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441
10.6 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
10.7 Example Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449

11 Compilers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451
11.2 Compiler Bootstrapping and Re-Hosting . . . . . . . . . . . . . . . . . . . . . . . 453
11.3 Intermediate Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
11.4 Register Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
11.4.1 An Example Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
11.4.2 Uses for Pre-colouring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 463
11.4.3 Avoiding Error Cases via Spilling . . . . . . . . . . . . . . . . . . . . . . 464
11.5 Instruction Selection and Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . 467
11.5.1 Instruction Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
11.5.2 Instruction Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
11.5.3 Scheduling Basic Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 470
11.5.4 Scheduling Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
11.6 “Template” Code Generation for High-Level Statements . . . . . . . . . . 473
11.6.1 Conditional Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475
11.6.2 Loop Statements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
11.6.3 Multi-way Branch Statements . . . . . . . . . . . . . . . . . . . . . . . . . . 478
11.7 “Template” Code Generation for High-Level Function Calls . . . . . . . 479
11.7.1 Basic Stack Frames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
11.7.2 Advanced Stack Frames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
11.8 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 491
11.9 Example Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492

12 Operating Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
12.2 The Hardware/Software Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
12.2.1 MIPS32 Co-processor Registers . . . . . . . . . . . . . . . . . . . . . . . . 497
12.2.2 MIPS32 Processor Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 499
12.2.3 MIPS32 Assembly Language . . . . . . . . . . . . . . . . . . . . . . . . . . 500
12.3 Boot-Strapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 501
12.4 Event Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502
12.4.1 Handling Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503
12.4.2 Handling Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 505
12.4.3 Handling Traps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
12.4.4 An Example Exception Handler . . . . . . . . . . . . . . . . . . . . . . . . 510
12.5 Memory Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
12.5.1 Basic Concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512
12.5.2 Pages and Frames . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515
12.5.3 Address Translation and Memory Access . . . . . . . . . . . . . . . . 518
12.5.4 Page Eviction and Replacement . . . . . . . . . . . . . . . . . . . . . . . . 520
12.5.5 Translation Look-aside Buffer (TLB) . . . . . . . . . . . . . . . . . . . 521
12.6 Process Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 522
12.6.1 Storing and Switching Process Context . . . . . . . . . . . . . . . . . . 523
12.6.2 Process Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 524
12.6.3 An Example Scheduler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 529
12.7 Further Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 533
12.8 Example Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534

13 Efficient Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535
13.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535
13.2 “Space” Conscious Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536
13.2.1 Reducing Register Pressure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536
13.2.2 Reducing Memory Allocation . . . . . . . . . . . . . . . . . . . . . . . . . . 537
13.3 “Time” Conscious Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 540
13.3.1 Effective Short-circuiting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 540
13.3.2 Branch Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
13.3.3 Loop Fusion and Fission . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 542
13.3.4 Loop Unrolling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
13.3.5 Loop Hoisting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 546
13.3.6 Loop Interchange . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548
13.3.7 Loop Blocking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 549
13.3.8 Function Inlining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
13.3.9 Software Pipelining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554
13.4 Example Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556



Part IV Appendices
SPIM: A MIPS32 Simulator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561
A.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 561
A.2 Configuring SPIM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 562
A.3 Controlling SPIM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563
A.4 Example Program Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 565
A.5 Using System Calls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 570
Example Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 573
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 629
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 633


Part I


Tools and Techniques


Chapter 1

Mathematical Preliminaries

In mathematics you don’t understand things. You just get used to them.
– J. von Neumann

Abstract The goal of this chapter is to give a fairly comprehensive overview of the
theory that underpins the rest of the book. On first reading, it may seem a little dry
and is often excluded in other similar books. However, without a solid understanding of logic and representation of numbers it seems clear that constructing digital
circuits to put this theory into practice would be much harder. The theory here will
present an introduction to propositional logic, sets and functions, number systems
and Boolean algebra. These four main areas combine to produce a basis for formal
methods to describe, manipulate and implement digital systems such as computer
processors. Those with a background in mathematics or computer science might
skip this material and use it simply for reference; those approaching the subject
from another background would be advised to read the material in more detail.

1.1 Propositions and Predicates
Definition 1. A proposition is a statement whose meaning, termed the truth value,
is either true or false. Less formally, we say the statement is true if it has a truth
value of true and false if it has a truth value of false.
A predicate is a proposition which contains one or more variables; only when
concrete values are assigned to each of the variables can the predicate be called a
proposition.
Since we use them so naturally, it almost seems too formal to define what a proposition is. However, by doing so we can start to use them as a building block to describe
what logic is and how it works. The statement
“the temperature is 90°C”


is a proposition since it is definitely either true or false. When we take a proposition
and decide whether it is true or false, we say we have evaluated it. However, there
are clearly a lot of statements that are not propositions because they do not state any
proposal. For example,
“turn off the heat”
is a command or request of some kind; it does not evaluate to a truth value. Propositions must also be well defined in the sense that they are definitely either true or
false, i.e., there are no “gray areas” in between. The statement
“90°C is too hot”
is not a proposition since it could be true or false depending on the context, or
your point of view: 90°C probably is too hot for body temperature but probably not
for a cup of coffee. Finally, some statements look like propositions but cannot be
evaluated because they are paradoxical. The most famous example of this situation
is the liar paradox, usually attributed to the Greek philosopher Eubulides, who stated
it as
“a man says that he is lying, is what he says true or false?”
although a clearer version is the more commonly referenced
“this statement is false”.
If the man is telling the truth, everything he says must be true, which means he is
lying and hence everything he says is false. Conversely, if the man is lying, everything he says is false, so he cannot be lying since he said he was! In terms of the
statement, we cannot be sure of the truth value so this is not normally classed as a
proposition.
As stated above, a predicate is just a proposition that contains variables. By assigning the variable a value we can turn the predicate into a proposition and evaluate
the corresponding truth value. For example, consider the predicate
“x°C equals 90°C”
where x is a variable. By assigning x a value we get a proposition; setting x = 10,
for example, gives
“10°C equals 90°C”
which clearly evaluates to false. Setting x = 90 gives
“90°C equals 90°C”
which evaluates to true. In some sense, a predicate is an abstract or general proposition which is not well defined until we assign values to all the variables.
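One way to picture the difference is to treat a predicate as a function of its variables: it has no truth value of its own, but calling it with a concrete value yields one. The following is only an illustrative sketch; the function name `equals_90` is hypothetical and not taken from the text.

```python
def equals_90(x):
    """Predicate "x°C equals 90°C"; it has no truth value until x is assigned."""
    return x == 90

# Assigning a concrete value to x turns the predicate into a proposition
# that can be evaluated.
print(equals_90(10))  # the proposition "10°C equals 90°C"
print(equals_90(90))  # the proposition "90°C equals 90°C"
```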



1.1.1 Connectives
Definition 2. A connective is a statement which binds single propositions into a
compound proposition. For brevity, we use symbols to denote common connectives:

• “not x” is denoted ¬x.
• “x and y” is denoted x ∧ y.
• “x or y” is denoted x ∨ y; this is usually called an inclusive-or.
• “x or y but not x and y” is denoted x ⊕ y; this is usually called an exclusive-or.
• “x implies y” is denoted x → y, which is sometimes written as “if x then y”.
• “x is equivalent to y” is denoted x ↔ y, which is sometimes written as “x if and only if y” or further shortened to “x iff. y”.

Note that we group statements using parentheses when there could be some confusion about the order in which they are applied; hence (x ∧ y) is the same as x ∧ y.
A proposition or predicate involving connectives is built from terms; the connective
joins together these terms into an expression. For example, the expression
“the temperature is less than 90°C ∧ the temperature is greater than 10°C”
contains two terms that propose
“the temperature is less than 90°C”
and
“the temperature is greater than 10°C”.
These terms are joined together using the ∧ connective so that the whole expression
evaluates to true if both of the terms are true, otherwise it evaluates to false. In a
similar way we might write a compound predicate
“the temperature is less than x°C ∧ the temperature is greater than y°C”
which can only be evaluated when we assign values to the variables x and y.
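As a rough sketch, the compound predicate can be written as a function whose result is the conjunction of its two terms. The name `in_range` and the explicit `temperature` parameter are assumptions made for illustration; the text treats the temperature as given.

```python
def in_range(temperature, x, y):
    """Compound predicate: (temperature < x°C) ∧ (temperature > y°C)."""
    less_than_x = temperature < x      # first term
    greater_than_y = temperature > y   # second term
    return less_than_x and greater_than_y

# With x = 90 and y = 10 this is the compound proposition from the text,
# evaluated here for a few hypothetical temperatures.
for t in (5, 50, 95):
    print(t, in_range(t, 90, 10))
```

The expression is true only when both terms are true, which matches the description of the ∧ connective above.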
Definition 3. The meaning of connectives is usually described in a tabular form
which enumerates the possible values each term can take and what the resulting
truth value is; we call this a truth table.

x      y      ¬x     x ∧ y  x ∨ y  x ⊕ y  x → y  x ↔ y
false  false  true   false  false  false  true   true
false  true   true   false  true   true   true   false
true   false  false  false  true   true   false  false
true   true   false  true   true   false  true   true
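The table can be regenerated mechanically by enumerating every assignment of x and y. The sketch below is one possible encoding in Python, not the book's own notation: exclusive-or is written as `x != y`, implication as `(not x) or y`, and equivalence as `x == y`, choices made here because they reproduce the corresponding columns.

```python
from itertools import product

# Binary connectives, in the same column order as the truth table.
CONNECTIVES = {
    "x ∧ y": lambda x, y: x and y,
    "x ∨ y": lambda x, y: x or y,
    "x ⊕ y": lambda x, y: x != y,
    "x → y": lambda x, y: (not x) or y,
    "x ↔ y": lambda x, y: x == y,
}

def truth_table():
    """Return one row (x, y, ¬x, x∧y, x∨y, x⊕y, x→y, x↔y) per assignment."""
    rows = []
    for x, y in product([False, True], repeat=2):
        row = (x, y, not x) + tuple(f(x, y) for f in CONNECTIVES.values())
        rows.append(row)
    return rows

print(("x", "y", "¬x") + tuple(CONNECTIVES))
for row in truth_table():
    print(row)
```

Enumerating with `product` visits the assignments in the same order as the rows of the table, so the output can be compared against it line by line.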
The ¬ connective negates the truth value of an expression so considering
¬(x > 10)


