In Praise of Computer Organization and Design: The Hardware/Software Interface
“Textbook selection is often a frustrating act of compromise—pedagogy, content
coverage, quality of exposition, level of rigor, cost. Computer Organization and
Design is the rare book that hits all the right notes across the board, without
compromise. It is not only the premier computer organization textbook, it is a
shining example of what all computer science textbooks could and should be.”
—Michael Goldweber, Xavier University
“I have been using Computer Organization and Design for years, from the very first
edition. This new edition is yet another outstanding improvement on an already
classic text. The evolution from desktop computing to mobile computing to Big
Data brings new coverage of embedded processors such as the ARM, new material
on how software and hardware interact to increase performance, and cloud
computing. All this without sacrificing the fundamentals.”
—Ed Harcourt, St. Lawrence University
“To Millennials: Computer Organization and Design is the computer architecture
book you should keep on your (virtual) bookshelf. The book is both old and new,
because it develops venerable principles—Moore’s Law, abstraction, common case
fast, redundancy, memory hierarchies, parallelism, and pipelining—but illustrates
them with contemporary designs.”
—Mark D. Hill, University of Wisconsin-Madison
“The new edition of Computer Organization and Design keeps pace with advances
in emerging embedded and many-core (GPU) systems, where tablets and
smartphones will/are quickly becoming our new desktops. This text acknowledges
these changes, but continues to provide a rich foundation of the fundamentals
in computer organization and design which will be needed for the designers of
hardware and software that power this new class of devices and systems.”
—Dave Kaeli, Northeastern University
“Computer Organization and Design provides more than an introduction to computer
architecture. It prepares the reader for the changes necessary to meet the ever-increasing performance needs of mobile systems and big data processing at a time that difficulties in semiconductor scaling are making all systems power constrained.
In this new era for computing, hardware and software must be co-designed and
system-level architecture is as critical as component-level optimizations.”
—Christos Kozyrakis, Stanford University
“Patterson and Hennessy brilliantly address the issues in ever-changing computer
hardware architectures, emphasizing the interactions among hardware and software
components at various abstraction levels. By interspersing I/O and parallelism concepts
with a variety of mechanisms in hardware and software throughout the book, the new
edition achieves an excellent holistic presentation of computer architecture for the
post-PC era. This book is an essential guide for hardware and software professionals
facing energy efficiency and parallelization challenges, from Tablet PC to Cloud computing.”
—Jae C. Oh, Syracuse University




RISC-V EDITION

Computer Organization and Design
THE HARDWARE/SOFTWARE INTERFACE


David A. Patterson is the Pardee Professor of Computer Science, Emeritus at the
University of California at Berkeley, which he joined after graduating from UCLA in
1977. His teaching has been honored by the Distinguished Teaching Award from the
University of California, the Karlstrom Award from ACM, and the Mulligan Education
Medal and Undergraduate Teaching Award from IEEE. Patterson received the IEEE
Technical Achievement Award and the ACM Eckert-Mauchly Award for contributions
to RISC, and he shared the IEEE Johnson Information Storage Award for contributions
to RAID. He also shared the IEEE John von Neumann Medal and the C & C Prize
with John Hennessy. Like his coauthor, Patterson is a Fellow of the American Academy
of Arts and Sciences, the Computer History Museum, ACM, and IEEE, and he was
elected to the National Academy of Engineering, the National Academy of Sciences,
and the Silicon Valley Engineering Hall of Fame. He served on the Information
Technology Advisory Committee to the US President, as chair of the CS division in the
Berkeley EECS department, as chair of the Computing Research Association, and as
President of ACM. This record led to Distinguished Service Awards from ACM, CRA,
and SIGARCH.

At Berkeley, Patterson led the design and implementation of RISC I, likely the first
VLSI reduced instruction set computer, and the foundation of the commercial SPARC
architecture. He was a leader of the Redundant Arrays of Inexpensive Disks (RAID) project,
which led to dependable storage systems from many companies. He was also involved in the
Network of Workstations (NOW) project, which led to cluster technology used by Internet
companies and later to cloud computing. These projects earned four dissertation awards
from ACM. His current research projects are Algorithm-Machine-People and Algorithms
and Specializers for Provably Optimal Implementations with Resilience and Efficiency. The
AMP Lab is developing scalable machine learning algorithms, warehouse-scale-computer-friendly programming models, and crowd-sourcing tools to gain valuable insights quickly
from big data in the cloud. The ASPIRE Lab uses deep hardware and software co-tuning
to achieve the highest possible performance and energy efficiency for mobile and rack
computing systems.
John L. Hennessy is a Professor of Electrical Engineering and Computer Science at
Stanford University, where he has been a member of the faculty since 1977 and was,
from 2000 to 2016, its tenth President. Hennessy is a Fellow of the IEEE and ACM; a
member of the National Academy of Engineering, the National Academy of Sciences,
and the American Philosophical Society; and a Fellow of the American Academy of
Arts and Sciences. Among his many awards are the 2001 Eckert-Mauchly Award for
his contributions to RISC technology, the 2001 Seymour Cray Computer Engineering
Award, and the 2000 John von Neumann Award, which he shared with David Patterson.
He has also received seven honorary doctorates.
In 1981, he started the MIPS project at Stanford with a handful of graduate students.
After completing the project in 1984, he took a leave from the university to cofound
MIPS Computer Systems (now MIPS Technologies), which developed one of the first
commercial RISC microprocessors. As of 2006, over 2 billion MIPS microprocessors have
been shipped in devices ranging from video games and palmtop computers to laser printers
and network switches. Hennessy subsequently led the DASH (Directory Architecture
for Shared Memory) project, which prototyped the first scalable cache coherent
multiprocessor; many of the key ideas have been adopted in modern multiprocessors.
In addition to his technical activities and university responsibilities, he has continued to work with numerous start-ups, both as an early-stage advisor and an investor.


RISC-V EDITION

Computer Organization and Design
THE HARDWARE/SOFTWARE INTERFACE

David A. Patterson
University of California, Berkeley
John L. Hennessy
Stanford University

RISC-V updates and contributions by
Andrew S. Waterman, SiFive, Inc.
Yunsup Lee, SiFive, Inc.

Additional contributions by
Perry Alexander, The University of Kansas
Peter J. Ashenden, Ashenden Designs Pty Ltd
Jason D. Bakos, University of South Carolina
Javier Diaz Bruguera, Universidade de Santiago de Compostela
Jichuan Chang, Google
Matthew Farrens, University of California, Davis
David Kaeli, Northeastern University
Nicole Kaiyan, University of Adelaide
David Kirk, NVIDIA
Zachary Kurmas, Grand Valley State University
James R. Larus, School of Computer and Communications Science at EPFL
Jacob Leverich, Stanford University
Kevin Lim, Hewlett-Packard
Eric Love, University of California, Berkeley
John Nickolls, NVIDIA
John Y. Oliver, Cal Poly, San Luis Obispo
Milos Prvulovic, Georgia Tech
Partha Ranganathan, Google
Mark Smotherman, Clemson University


Morgan Kaufmann is an imprint of Elsevier
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
Copyright © 2018 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including
photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how
to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the
Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted
herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in
research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods,
compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the
safety of others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/
or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods,
products, instructions, or ideas contained in the material herein.
RISC-V and the RISC-V logo are registered trademarks managed by the RISC-V Foundation, used under permission of the RISC-V
Foundation. All rights reserved.
This publication is independent of the RISC-V Foundation, which is not affiliated with the publisher and the RISC-V Foundation does not
authorize, sponsor, endorse or otherwise approve this publication.
All material relating to ARM® technology has been reproduced with permission from ARM Limited, and should only be used for education
purposes. All ARM-based models shown or referred to in the text must not be used, reproduced or distributed for commercial purposes, and
in no event shall purchasing this textbook be construed as granting you or any third party, expressly or by implication, estoppel or otherwise,
a license to use any other ARM technology or know how. Materials provided by ARM are copyright © ARM Limited (or its affiliates).
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
ISBN: 978-0-12-812275-4
For information on all Morgan Kaufmann publications, visit our website at www.elsevier.com.
Publisher: Katey Birtcher
Acquisition Editor: Steve Merken
Development Editor: Nate McFadden
Production Project Manager: Lisa Jones
Designer: Victoria Pearson Esser
Typeset by MPS Limited, Chennai, India


To Linda,

who has been, is, and always will be the love of my life


ACKNOWLEDGMENTS

Figures 1.7, 1.8 Courtesy of iFixit (www.ifixit.com).

Figure 1.9 Courtesy of Chipworks (www.chipworks.com).

Figure 1.13 Courtesy of Intel.

Figures 1.10.1, 1.10.2, 4.15.2 Courtesy of the Charles Babbage
Institute, University of Minnesota Libraries, Minneapolis.

Figures 1.10.3, 4.15.1, 4.15.3, 5.12.3, 6.14.2 Courtesy of IBM.

Figure 1.10.4 Courtesy of Cray Inc.

Figure 1.10.5 Courtesy of Apple Computer, Inc.

Figure 1.10.6 Courtesy of the Computer History Museum.

Figures 5.17.1, 5.17.2 Courtesy of Museum of Science, Boston.

Figure 5.17.4 Courtesy of MIPS Technologies, Inc.

Figure 6.15.1 Courtesy of NASA Ames Research Center.


Contents
Preface xv


CHAPTERS

1 Computer Abstractions and Technology  2
1.1 Introduction 3
1.2 Eight Great Ideas in Computer Architecture  11
1.3 Below Your Program  13
1.4 Under the Covers  16
1.5 Technologies for Building Processors and Memory  24
1.6 Performance 28
1.7 The Power Wall  40
1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors  43
1.9 Real Stuff: Benchmarking the Intel Core i7  46
1.10 Fallacies and Pitfalls  49
1.11 Concluding Remarks  52
1.12 Historical Perspective and Further Reading  54
1.13 Exercises  54

2 Instructions: Language of the Computer  60
2.1 Introduction 62
2.2 Operations of the Computer Hardware  63
2.3 Operands of the Computer Hardware  67
2.4 Signed and Unsigned Numbers  74
2.5 Representing Instructions in the Computer  81
2.6 Logical Operations  89
2.7 Instructions for Making Decisions  92

2.8 Supporting Procedures in Computer Hardware  98
2.9 Communicating with People  108
2.10 RISC-V Addressing for Wide Immediates and Addresses  113
2.11 Parallelism and Instructions: Synchronization  121
2.12 Translating and Starting a Program  124
2.13 A C Sort Example to Put It All Together  133
2.14 Arrays versus Pointers  141
2.15 Advanced Material: Compiling C and Interpreting Java  144



2.16 Real Stuff: MIPS Instructions  145
2.17 Real Stuff: x86 Instructions  146
2.18 Real Stuff: The Rest of the RISC-V Instruction Set  155
2.19 Fallacies and Pitfalls  157
2.20 Concluding Remarks  159
2.21 Historical Perspective and Further Reading  162
2.22 Exercises  162

3 Arithmetic for Computers  172
3.1 Introduction 174
3.2 Addition and Subtraction  174
3.3 Multiplication 177
3.4 Division 183
3.5 Floating Point  191
3.6 Parallelism and Computer Arithmetic: Subword Parallelism  216
3.7 Real Stuff: Streaming SIMD Extensions and Advanced Vector Extensions in x86  217
3.8 Going Faster: Subword Parallelism and Matrix Multiply  218
3.9 Fallacies and Pitfalls  222
3.10 Concluding Remarks  225
3.11 Historical Perspective and Further Reading  227
3.12 Exercises  227

4 The Processor  234
4.1 Introduction 236
4.2 Logic Design Conventions  240
4.3 Building a Datapath  243
4.4 A Simple Implementation Scheme  251
4.5 An Overview of Pipelining  262
4.6 Pipelined Datapath and Control  276
4.7 Data Hazards: Forwarding versus Stalling  294
4.8 Control Hazards  307
4.9 Exceptions 315
4.10 Parallelism via Instructions  321
4.11 Real Stuff: The ARM Cortex-A53 and Intel Core i7 Pipelines  334
4.12 Going Faster: Instruction-Level Parallelism and Matrix Multiply  342
4.13 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations  345
4.14 Fallacies and Pitfalls  345
4.15 Concluding Remarks  346
4.16 Historical Perspective and Further Reading  347
4.17 Exercises  347



5 Large and Fast: Exploiting Memory Hierarchy  364
5.1 Introduction 366
5.2 Memory Technologies  370
5.3 The Basics of Caches  375
5.4 Measuring and Improving Cache Performance  390
5.5 Dependable Memory Hierarchy  410
5.6 Virtual Machines  416
5.7 Virtual Memory  419
5.8 A Common Framework for Memory Hierarchy  443
5.9 Using a Finite-State Machine to Control a Simple Cache  449
5.10 Parallelism and Memory Hierarchy: Cache Coherence  454
5.11 Parallelism and Memory Hierarchy: Redundant Arrays of Inexpensive Disks  458
5.12 Advanced Material: Implementing Cache Controllers  459
5.13 Real Stuff: The ARM Cortex-A53 and Intel Core i7 Memory Hierarchies  459
5.14 Real Stuff: The Rest of the RISC-V System and Special Instructions  464
5.15 Going Faster: Cache Blocking and Matrix Multiply  465
5.16 Fallacies and Pitfalls  468
5.17 Concluding Remarks  472
5.18 Historical Perspective and Further Reading  473
5.19 Exercises  473

6 Parallel Processors from Client to Cloud  490
6.1 Introduction 492
6.2 The Difficulty of Creating Parallel Processing Programs  494
6.3 SISD, MIMD, SIMD, SPMD, and Vector  499
6.4 Hardware Multithreading  506
6.5 Multicore and Other Shared Memory Multiprocessors  509
6.6 Introduction to Graphics Processing Units  514
6.7 Clusters, Warehouse Scale Computers, and Other Message-Passing Multiprocessors  521
6.8 Introduction to Multiprocessor Network Topologies  526
6.9 Communicating to the Outside World: Cluster Networking  529
6.10 Multiprocessor Benchmarks and Performance Models  530
6.11 Real Stuff: Benchmarking and Rooflines of the Intel Core i7 960 and the NVIDIA Tesla GPU  540
6.12 Going Faster: Multiple Processors and Matrix Multiply  545
6.13 Fallacies and Pitfalls  548
6.14 Concluding Remarks  550
6.15 Historical Perspective and Further Reading  553
6.16 Exercises  553



APPENDIX

A The Basics of Logic Design  A-2

A.1 Introduction  A-3
A.2 Gates, Truth Tables, and Logic Equations  A-4
A.3 Combinational Logic  A-9
A.4 Using a Hardware Description Language  A-20
A.5 Constructing a Basic Arithmetic Logic Unit  A-26
A.6 Faster Addition: Carry Lookahead  A-37
A.7 Clocks  A-47
A.8 Memory Elements: Flip-Flops, Latches, and Registers  A-49
A.9 Memory Elements: SRAMs and DRAMs  A-57
A.10 Finite-State Machines  A-66
A.11 Timing Methodologies  A-71
A.12 Field Programmable Devices  A-77
A.13 Concluding Remarks  A-78
A.14 Exercises  A-79

Index  I-1

ONLINE CONTENT

B Graphics and Computing GPUs  B-2
B.1 Introduction  B-3
B.2 GPU System Architectures  B-7
B.3 Programming GPUs  B-12
B.4 Multithreaded Multiprocessor Architecture  B-25
B.5 Parallel Memory System  B-36

B.6 Floating Point Arithmetic  B-41
B.7 Real Stuff: The NVIDIA GeForce 8800  B-46
B.8 Real Stuff: Mapping Applications to GPUs  B-55
B.9 Fallacies and Pitfalls  B-72
B.10 Concluding Remarks  B-76
B.11 Historical Perspective and Further Reading  B-77

C Mapping Control to Hardware  C-2
C.1 Introduction  C-3
C.2 Implementing Combinational Control Units  C-4
C.3 Implementing Finite-State Machine Control  C-8
C.4 Implementing the Next-State Function with a Sequencer  C-22
C.5 Translating a Microprogram to Hardware  C-28
C.6 Concluding Remarks  C-32
C.7 Exercises  C-33


D A Survey of RISC Architectures for Desktop, Server, and Embedded Computers  D-2
D.1 Introduction  D-3
D.2 Addressing Modes and Instruction Formats  D-5
D.3 Instructions: the MIPS Core Subset  D-9
D.4 Instructions: Multimedia Extensions of the Desktop/Server RISCs  D-16

D.5 Instructions: Digital Signal-Processing Extensions of the Embedded RISCs  D-19
D.6 Instructions: Common Extensions to MIPS Core  D-20
D.7 Instructions Unique to MIPS-64  D-25
D.8 Instructions Unique to Alpha  D-27
D.9 Instructions Unique to SPARC v9  D-29
D.10 Instructions Unique to PowerPC  D-32
D.11 Instructions Unique to PA-RISC 2.0  D-34
D.12 Instructions Unique to ARM  D-36
D.13 Instructions Unique to Thumb  D-38
D.14 Instructions Unique to SuperH  D-39
D.15 Instructions Unique to M32R  D-40
D.16 Instructions Unique to MIPS-16  D-40
D.17 Concluding Remarks  D-43
Glossary G-1
Further Reading  FR-1





Preface
The most beautiful thing we can experience is the mysterious. It is the
source of all true art and science.
Albert Einstein, What I Believe, 1930

About This Book
We believe that learning in computer science and engineering should reflect the current state of the field, as well as introduce the principles that are shaping
computing. We also feel that readers in every specialty of computing need
to appreciate the organizational paradigms that determine the capabilities,
performance, energy, and, ultimately, the success of computer systems.
Modern computer technology requires professionals of every computing
specialty to understand both hardware and software. The interaction between
hardware and software at a variety of levels also offers a framework for understanding
the fundamentals of computing. Whether your primary interest is hardware or
software, computer science or electrical engineering, the central ideas in computer
organization and design are the same. Thus, our emphasis in this book is to show
the relationship between hardware and software and to focus on the concepts that
are the basis for current computers.
The recent switch from uniprocessor to multicore microprocessors confirmed
the soundness of this perspective, given since the first edition. While programmers
could ignore the advice and rely on computer architects, compiler writers, and silicon
engineers to make their programs run faster or be more energy-efficient without
change, that era is over. For programs to run faster, they must become parallel.
While the goal of many researchers is to make it possible for programmers to be
unaware of the underlying parallel nature of the hardware they are programming,
it will take many years to realize this vision. Our view is that for at least the next
decade, most programmers are going to have to understand the hardware/software
interface if they want programs to run efficiently on parallel computers.
The audience for this book includes those with little experience in assembly
language or logic design who need to understand basic computer organization as
well as readers with backgrounds in assembly language and/or logic design who
want to learn how to design a computer or understand how a system works and
why it performs as it does.




About the Other Book
Some readers may be familiar with Computer Architecture: A Quantitative
Approach, popularly known as Hennessy and Patterson. (This book in turn is
often called Patterson and Hennessy.) Our motivation in writing the earlier book
was to describe the principles of computer architecture using solid engineering
fundamentals and quantitative cost/performance tradeoffs. We used an approach
that combined examples and measurements, based on commercial systems, to
create realistic design experiences. Our goal was to demonstrate that computer
architecture could be learned using quantitative methodologies instead of a
descriptive approach. It was intended for the serious computing professional who
wanted a detailed understanding of computers.
A majority of the readers for this book do not plan to become computer
architects. The performance and energy efficiency of future software systems will
be dramatically affected, however, by how well software designers understand the
basic hardware techniques at work in a system. Thus, compiler writers, operating
system designers, database programmers, and most other software engineers
need a firm grounding in the principles presented in this book. Similarly,
hardware designers must understand clearly the effects of their work on software
applications.
Thus, we knew that this book had to be much more than a subset of the material
in Computer Architecture, and the material was extensively revised to match the
different audience. We were so happy with the result that the subsequent editions
of Computer Architecture were revised to remove most of the introductory
material; hence, there is much less overlap today than with the first editions of
both books.

Why RISC-V for This Edition?
The choice of instruction set architecture is clearly critical to the pedagogy of a
computer architecture textbook. We didn’t want an instruction set that required describing unnecessary baroque features for someone’s first instruction set, no
matter how popular it is. Ideally, your initial instruction set should be an exemplar,
just like your first love. Surprisingly, you remember both fondly.
Since there were so many choices at the time, for the first edition of Computer
Architecture: A Quantitative Approach we invented our own RISC-style instruction
set. Given the growing popularity and the simple elegance of the MIPS instruction
set, we switched to it for the first edition of this book and to later editions of the
other book. MIPS has served us and our readers well.
It’s been 20 years since we made that switch, and while billions of chips that use
MIPS continue to be shipped, they are typically found in embedded devices where
the instruction set is nearly invisible. Thus, for a while now it’s been hard to find a
real computer on which readers can download and run MIPS programs.
The good news is that an open instruction set that adheres closely to the RISC
principles has recently debuted, and it is rapidly gaining a following. RISC-V, which
was developed originally at UC Berkeley, not only cleans up the quirks of the MIPS instruction set, but it offers a simple, elegant, modern take on what instruction sets
should look like in 2017.
Moreover, because it is not proprietary, there are open-source RISC-V simulators,
compilers, debuggers, and so on easily available, and even open-source RISC-V implementations written in hardware description languages. In addition,
there will soon be low-cost hardware platforms on which to run RISC-V programs.
Readers will not only benefit from studying these RISC-V designs, they will be able
to modify them and go through the implementation process in order to understand
the impact of their hypothetical changes on performance, die size, and energy.
This is an exciting opportunity for the computing industry as well as for
education, and thus at the time of this writing more than 40 companies have joined the RISC-V Foundation. This sponsor list includes virtually all the major players
except for ARM and Intel, including AMD, Google, Hewlett Packard Enterprise,
IBM, Microsoft, NVIDIA, Oracle, and Qualcomm.
It is for these reasons that we wrote a RISC-V edition of this book, and we are
switching Computer Architecture: A Quantitative Approach to RISC-V as well.
Given that RISC-V offers both 32-bit address instructions and 64-bit address
instructions with essentially the same instruction set, we could have switched
instruction sets but kept the address size at 32 bits. Our publisher polled the faculty
who used the book and found that 75% either preferred larger addresses or were
neutral, so we increased the address space to 64 bits, which may make more sense
today than 32 bits.
The only changes for the RISC-V edition from the MIPS edition are those
associated with the change in instruction sets, which primarily affects Chapter 2,
Chapter 3, the virtual memory section in Chapter 5, and the short VMIPS example
in Chapter 6. In Chapter 4, we switched to RISC-V instructions, changed several
figures, and added a few “Elaboration” sections, but the changes were simpler than
we had feared. Chapter 1 and the rest of the appendices are virtually unchanged.
The extensive online documentation of RISC-V, combined with its magnitude, makes it difficult to come up with a replacement for the MIPS version of Appendix A (“Assemblers, Linkers, and the SPIM Simulator” in the MIPS Fifth Edition).
Instead, Chapters 2, 3, and 5 include quick overviews of the hundreds of RISC-V
instructions outside of the core RISC-V instructions that we cover in detail in the
rest of the book.
Note that we are not (yet) saying that we are permanently switching to RISC-V. For
example, in addition to this new RISC-V edition, there are ARMv8 and MIPS versions
available for sale now. One possibility is that there will be a demand for all versions for
future editions of the book, or for just one. We’ll cross that bridge when we come to it.
For now, we look forward to your reaction to and feedback on this effort.

Changes for the Fifth Edition

We had six major goals for the fifth edition of Computer Organization and Design: demonstrate the importance of understanding hardware with a running example; highlight main themes across the topics using margin icons that are introduced early; update examples to reflect the changeover from the PC era to the post-PC era; spread
the material on I/O throughout the book rather than isolating it into a single
chapter; update the technical content to reflect changes in the industry since the
publication of the fourth edition in 2009; and put appendices and optional sections
online instead of including a CD to lower costs and to make this edition viable as
an electronic book.
Before discussing the goals in detail, let’s look at the table on the next page. It
shows the hardware and software paths through the material. Chapters 1, 4, 5, and
6 are found on both paths, no matter what the experience or the focus. Chapter 1
discusses the importance of energy and how it motivates the switch from single
core to multicore microprocessors and introduces the eight great ideas in computer
architecture. Chapter 2 is likely to be review material for the hardware-oriented,
but it is essential reading for the software-oriented, especially for those readers
interested in learning more about compilers and object-oriented programming
languages. Chapter 3 is for readers interested in constructing a datapath or in
learning more about floating-point arithmetic. Some will skip parts of Chapter 3,
either because they don’t need them or because they offer a review. However, we
introduce the running example of matrix multiply in this chapter, showing how
subword parallelism offers a fourfold improvement, so don’t skip Sections 3.6 to 3.8.
Chapter 4 explains pipelined processors. Sections 4.1, 4.5, and 4.10 give overviews,
and Section 4.12 gives the next performance boost for matrix multiply for those with a software focus. Those with a hardware focus, however, will find that this
chapter presents core material; they may also, depending on their background,
want to read Appendix A on logic design first. The last chapter, on multicores,
multiprocessors, and clusters, is mostly new content and should be read by
everyone. It was significantly reorganized in this edition to make the flow of
ideas more natural and to include much more depth on GPUs, warehouse-scale
computers, and the hardware–software interface of network interface cards that
are key to clusters.



[Reading-path chart: the original table marks each group of sections below with one of five recommendations (read carefully; read if you have time; review or read; read for culture; reference), given separately for readers with a software focus and readers with a hardware focus. The chapters and section groupings are:]

1. Computer Abstractions and Technology: 1.1 to 1.11; 1.12 (History)
2. Instructions: Language of the Computer: 2.1 to 2.14; 2.15 (Compilers & Java); 2.16 to 2.20; 2.21 (History)
D. RISC Instruction-Set Architectures: D.1 to D.17
3. Arithmetic for Computers: 3.1 to 3.5; 3.6 to 3.8 (Subword Parallelism); 3.9 to 3.10 (Fallacies); 3.11 (History)
A. The Basics of Logic Design: A.1 to A.13
4. The Processor: 4.1 (Overview); 4.2 (Logic Conventions); 4.3 to 4.4 (Simple Implementation); 4.5 (Pipelining Overview); 4.6 (Pipelined Datapath); 4.7 to 4.9 (Hazards, Exceptions); 4.10 to 4.12 (Parallel, Real Stuff); 4.13 (Verilog Pipeline Control); 4.14 to 4.15 (Fallacies); 4.16 (History)
C. Mapping Control to Hardware: C.1 to C.6
5. Large and Fast: Exploiting Memory Hierarchy: 5.1 to 5.10; 5.11 (Redundant Arrays of Inexpensive Disks); 5.12 (Verilog Cache Controller); 5.13 to 5.17; 5.18 (History)
6. Parallel Processors from Client to Cloud: 6.1 to 6.8; 6.9 (Networks); 6.10 to 6.14; 6.15 (History)
B. Graphics Processor Units: B.1 to B.13



The first of the six goals for this fifth edition was to demonstrate the importance
of understanding modern hardware to get good performance and energy efficiency
with a concrete example. As mentioned above, we start with subword parallelism
in Chapter 3 to improve matrix multiply by a factor of 4. We double performance
in Chapter 4 by unrolling the loop to demonstrate the value of instruction-level
parallelism. Chapter 5 doubles performance again by optimizing for caches using
blocking. Finally, Chapter 6 demonstrates a speedup of 14 from 16 processors by
using thread-level parallelism. All four optimizations in total add just 24 lines of C
code to our initial matrix multiply example.
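
To make the starting point concrete, here is a minimal sketch of an unoptimized C matrix multiply kernel of the kind the running example begins with. The book’s actual listings and their optimized variants appear in Sections 3.8, 4.12, 5.15, and 6.12; the function name and the column-major array layout below are illustrative assumptions, not the book’s exact code.

/* A minimal, unoptimized double-precision matrix multiply,
   C = C + A * B, for n-by-n matrices stored column-major in flat
   arrays (an illustrative sketch, not the book's listing). Each of
   the four optimizations named above (subword parallelism, loop
   unrolling, cache blocking, and threads) rewrites this kernel. */
void dgemm_unoptimized(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double cij = C[i + j * n];              /* cij = C[i][j] */
            for (int k = 0; k < n; ++k)
                cij += A[i + k * n] * B[k + j * n]; /* A[i][k] * B[k][j] */
            C[i + j * n] = cij;                     /* C[i][j] = cij */
        }
}

Because each optimization is a local rewrite of these three loops, the cumulative source change stays small, which is what makes it practical to follow the example across four chapters.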
The second goal was to help readers separate the forest from the trees by
identifying eight great ideas of computer architecture early and then pointing out
all the places they occur throughout the rest of the book. We use (hopefully) easy-to-remember margin icons and highlight the corresponding word in the text to remind readers of these eight themes. There are nearly 100 citations in the book. No chapter has fewer than seven examples of great ideas, and no idea is cited fewer than five times. Performance via parallelism, pipelining, and prediction are the three most
popular great ideas, followed closely by Moore’s Law. Chapter 4, The Processor, is
the one with the most examples, which is not a surprise since it probably received
the most attention from computer architects. The one great idea found in every chapter is performance via parallelism, which is a pleasant observation given the recent emphasis on parallelism in the field and in editions of this book.
The third goal was to recognize the generation change in computing from the
PC era to the post-PC era by this edition with our examples and material. Thus,
Chapter 1 dives into the guts of a tablet computer rather than a PC, and Chapter 6
describes the computing infrastructure of the cloud. We also feature the ARM,
which is the instruction set of choice in the personal mobile devices of the post-PC era, as well as the x86 instruction set that dominated the PC era and (so far)
dominates cloud computing.
The fourth goal was to spread the I/O material throughout the book rather
than have it in its own chapter, much as we spread parallelism throughout all the
chapters in the fourth edition. Hence, I/O material in this edition can be found in
Sections 1.4, 4.9, 5.2, 5.5, 5.11, and 6.9. The thought is that readers (and instructors)
are more likely to cover I/O if it’s not segregated to its own chapter.
This is a fast-moving field, and, as is always the case for our new editions, an
important goal is to update the technical content. The running example is the ARM
Cortex A53 and the Intel Core i7, reflecting our post-PC era. Other highlights
include a tutorial on GPUs that explains their unique terminology, more depth on
the warehouse-scale computers that make up the cloud, and a deep dive into 10
Gigabit Ethernet cards.
To keep the main book short and compatible with electronic books, we placed
the optional material as online appendices instead of on a companion CD as in
prior editions.
Finally, we updated all the exercises in the book.
While some elements changed, we have preserved useful book elements from
prior editions. To make the book work better as a reference, we still place definitions
of new terms in the margins at their first occurrence. The book element called “Understanding Program Performance” helps readers understand the
performance of their programs and how to improve it, just as the “Hardware/Software
Interface” book element helped readers understand the tradeoffs at this interface.
“The Big Picture” section remains so that the reader sees the forest despite all the
trees. “Check Yourself” sections help readers to confirm their comprehension of the
material on the first time through with answers provided at the end of each chapter.
This edition still includes the green RISC-V reference card, which was inspired by
the “Green Card” of the IBM System/360. This card has been updated and should be
a handy reference when writing RISC-V assembly language programs.

Instructor Support
We have collected a great deal of material to help instructors teach courses using
this book. Solutions to exercises, figures from the book, lecture slides, and other
materials are available to instructors who register with the publisher. In addition,
the companion Web site provides links to free RISC-V software. Check the
publisher’s Web site for more information:
textbooks.elsevier.com/9780128122754

Concluding Remarks
If you read the following acknowledgments section, you will see that we went to
great lengths to correct mistakes. Since a book goes through many printings, we
have the opportunity to make even more corrections. If you uncover any remaining,
resilient bugs, please contact the publisher by electronic mail at codRISCVbugs@mkp.com or by low-tech mail using the address found on the copyright page.
This edition is the third break in the long-standing collaboration between
Hennessy and Patterson, which started in 1989. The demands of running one of
the world’s great universities meant that President Hennessy could no longer make
the substantial commitment to create a new edition. The remaining author felt
once again like a tightrope walker without a safety net. Hence, the people in the
acknowledgments and Berkeley colleagues played an even larger role in shaping the contents of this book. Nevertheless, this time around there is only one author
to blame for the new material in what you are about to read.

Acknowledgments
With every edition of this book, we are very fortunate to receive help from many
readers, reviewers, and contributors. Each of these people has helped to make this
book better.
We are grateful for the assistance of Khaled Benkrid and his colleagues at
ARM Ltd., who carefully reviewed the ARM-related material and provided helpful
feedback.
Chapter 6 was so extensively revised that we did a separate review for ideas and
contents, and I made changes based on the feedback from every reviewer. I’d like to
thank Christos Kozyrakis of Stanford University for suggesting using the network interface for clusters to demonstrate the hardware–software interface of I/O and
for suggestions on organizing the rest of the chapter; Mario Flagsilk of Stanford
University for providing details, diagrams, and performance measurements of the
NetFPGA NIC; and the following for suggestions on how to improve the chapter:
David Kaeli of Northeastern University, Partha Ranganathan of HP Labs,
David Wood of the University of Wisconsin, and my Berkeley colleagues Siamak
Faridani, Shoaib Kamil, Yunsup Lee, Zhangxi Tan, and Andrew Waterman.
Special thanks goes to Rimas Avizenis of UC Berkeley, who developed the
various versions of matrix multiply and supplied the performance numbers as well.
As I worked with his father while I was a graduate student at UCLA, it was a nice
symmetry to work with Rimas at UCB.

I also wish to thank my longtime collaborator Randy Katz of UC Berkeley, who
helped develop the concept of great ideas in computer architecture as part of the
extensive revision of an undergraduate class that we did together.
I’d like to thank David Kirk, John Nickolls, and their colleagues at NVIDIA
(Michael Garland, John Montrym, Doug Voorhies, Lars Nyland, Erik Lindholm,
Paulius Micikevicius, Massimiliano Fatica, Stuart Oberman, and Vasily Volkov)
for writing the first in-depth appendix on GPUs. I’d like to express again my
appreciation to Jim Larus, recently named Dean of the School of Computer and
Communications Science at EPFL, for his willingness in contributing his expertise
on assembly language programming, as well as for welcoming readers of this book
with regard to using the simulator he developed and maintains.
I am also very grateful to Zachary Kurmas of Grand Valley State University,
who updated and created new exercises, based on originals created by Perry
Alexander (The University of Kansas); Jason Bakos (University of South Carolina);
Javier Bruguera (Universidade de Santiago de Compostela); Matthew Farrens
(University of California, Davis); David Kaeli (Northeastern University); Nicole
Kaiyan (University of Adelaide); John Oliver (Cal Poly, San Luis Obispo); Milos
Prvulovic (Georgia Tech); Jichuan Chang (Google); Jacob Leverich (Stanford);
Kevin Lim (Hewlett-Packard); and Partha Ranganathan (Google).
Additional thanks goes to Peter Ashenden for updating the lecture slides.
I am grateful to the many instructors who have answered the publisher’s surveys,
reviewed our proposals, and attended focus groups. They include the following
individuals: Focus Groups: Bruce Barton (Suffolk County Community College), Jeff
Braun (Montana Tech), Ed Gehringer (North Carolina State), Michael Goldweber
(Xavier University), Ed Harcourt (St. Lawrence University), Mark Hill (University
of Wisconsin, Madison), Patrick Homer (University of Arizona), Norm Jouppi
(HP Labs), Dave Kaeli (Northeastern University), Christos Kozyrakis (Stanford
University), Jae C. Oh (Syracuse University), Lu Peng (LSU), Milos Prvulovic (Georgia
Tech), Partha Ranganathan (HP Labs), David Wood (University of Wisconsin),
Craig Zilles (University of Illinois at Urbana-Champaign). Surveys and Reviews: Mahmoud Abou-Nasr (Wayne State University), Perry Alexander (The University
of Kansas), Behnam Arad (Sacramento State University), Hakan Aydin (George
Mason University), Hussein Badr (State University of New York at Stony Brook),
Mac Baker (Virginia Military Institute), Ron Barnes (George Mason University),


Preface

Douglas Blough (Georgia Institute of Technology), Kevin Bolding (Seattle Pacific
University), Miodrag Bolic (University of Ottawa), John Bonomo (Westminster
College), Jeff Braun (Montana Tech), Tom Briggs (Shippensburg University), Mike
Bright (Grove City College), Scott Burgess (Humboldt State University), Fazli Can
(Bilkent University), Warren R. Carithers (Rochester Institute of Technology),
Bruce Carlton (Mesa Community College), Nicholas Carter (University of Illinois
at Urbana-Champaign), Anthony Cocchi (The City University of New York), Don
Cooley (Utah State University), Gene Cooperman (Northeastern University),
Robert D. Cupper (Allegheny College), Amy Csizmar Dalal (Carleton College),
Daniel Dalle (Université de Sherbrooke), Edward W. Davis (North Carolina State
University), Nathaniel J. Davis (Air Force Institute of Technology), Molisa Derk
(Oklahoma City University), Andrea Di Blas (Stanford University), Derek Eager
(University of Saskatchewan), Ata Elahi (Southern Connecticut State University),
Ernest Ferguson (Northwest Missouri State University), Rhonda Kay Gaede (The
University of Alabama), Etienne M. Gagnon (L’Université du Québec à Montréal),
Costa Gerousis (Christopher Newport University), Paul Gillard (Memorial
University of Newfoundland), Michael Goldweber (Xavier University), Georgia
Grant (College of San Mateo), Paul V. Gratz (Texas A&M University), Merrill Hall
(The Master’s College), Tyson Hall (Southern Adventist University), Ed Harcourt
(St. Lawrence University), Justin E. Harlow (University of South Florida), Paul F.
Hemler (Hampden-Sydney College), Jayantha Herath (St. Cloud State University),
Martin Herbordt (Boston University), Steve J. Hodges (Cabrillo College), Kenneth Hopkinson (Cornell University), Bill Hsu (San Francisco State University), Dalton
Hunkins (St. Bonaventure University), Baback Izadi (State University of New
York—New Paltz), Reza Jafari, Robert W. Johnson (Colorado Technical University),
Bharat Joshi (University of North Carolina, Charlotte), Nagarajan Kandasamy
(Drexel University), Rajiv Kapadia, Ryan Kastner (University of California,
Santa Barbara), E.J. Kim (Texas A&M University), Jihong Kim (Seoul National
University), Jim Kirk (Union University), Geoffrey S. Knauth (Lycoming College),
Manish M. Kochhal (Wayne State), Suzan Koknar-Tezel (Saint Joseph’s University),
Angkul Kongmunvattana (Columbus State University), April Kontostathis (Ursinus
College), Christos Kozyrakis (Stanford University), Danny Krizanc (Wesleyan
University), Ashok Kumar, S. Kumar (The University of Texas), Zachary Kurmas
(Grand Valley State University), Adrian Lauf (University of Louisville), Robert N.
Lea (University of Houston), Alvin Lebeck (Duke University), Baoxin Li (Arizona
State University), Li Liao (University of Delaware), Gary Livingston (University of
Massachusetts), Michael Lyle, Douglas W. Lynn (Oregon Institute of Technology),
Yashwant K Malaiya (Colorado State University), Stephen Mann (University of
Waterloo), Bill Mark (University of Texas at Austin), Ananda Mondal (Claflin
University), Alvin Moser (Seattle University),
Walid Najjar (University of California, Riverside), Vijaykrishnan Narayanan
(Penn State University), Danial J. Neebel (Loras College), Victor Nelson (Auburn
University), John Nestor (Lafayette College), Jae C. Oh (Syracuse University),
Joe Oldham (Centre College), Timour Paltashev, James Parkerson (University of
Arkansas), Shaunak Pawagi (SUNY at Stony Brook), Steve Pearce, Ted Pedersen (University of Minnesota), Lu Peng (Louisiana State University), Gregory D. Peterson (The University of Tennessee), William Pierce (Hood College), Milos
Prvulovic (Georgia Tech), Partha Ranganathan (HP Labs), Dejan Raskovic
(University of Alaska, Fairbanks), Brad Richards (University of Puget Sound),
Roman Rozanov, Louis Rubinfield (Villanova University), Md Abdus Salam
(Southern University), Augustine Samba (Kent State University), Robert Schaefer
(Daniel Webster College), Carolyn J. C. Schauble (Colorado State University),
Keith Schubert (CSU San Bernardino), William L. Schultz, Kelly Shaw (University
of Richmond), Shahram Shirani (McMaster University), Scott Sigman (Drury
University), Shai Simonson (Stonehill College), Bruce Smith, David Smith, Jeff W.
Smith (University of Georgia, Athens), Mark Smotherman (Clemson University),
Philip Snyder (Johns Hopkins University), Alex Sprintson (Texas A&M), Timothy
D. Stanley (Brigham Young University), Dean Stevens (Morningside College),
Nozar Tabrizi (Kettering University), Yuval Tamir (UCLA), Alexander Taubin
(Boston University), Will Thacker (Winthrop University), Mithuna Thottethodi
(Purdue University), Manghui Tu (Southern Utah University), Dean Tullsen (UC
San Diego), Steve VanderLeest (Calvin College), Christopher Vickery (Queens
College of CUNY), Rama Viswanathan (Beloit College), Ken Vollmar (Missouri
State University), Guoping Wang (Indiana-Purdue University), Patricia Wenner
(Bucknell University), Kent Wilken (University of California, Davis), David Wolfe
(Gustavus Adolphus College), David Wood (University of Wisconsin, Madison),
Ki Hwan Yum (University of Texas, San Antonio), Mohamed Zahran (City College
of New York), Amr Zaky (Santa Clara University), Gerald D. Zarnett (Ryerson
University), Nian Zhang (South Dakota School of Mines & Technology), Jiling
Zhong (Troy University), Huiyang Zhou (North Carolina State University), Weiyu
Zhu (Illinois Wesleyan University).
A special thanks also goes to Mark Smotherman for making multiple passes to
find technical and writing glitches that significantly improved the quality of this
edition.
We wish to thank the extended Morgan Kaufmann family for agreeing to
publish this book again under the able leadership of Katey Birtcher, Steve Merken, and Nate McFadden: I certainly couldn’t have completed the book without them.
We also want to extend thanks to Lisa Jones, who managed the book production
process, and Victoria Pearson Esser, who did the cover design. The cover cleverly
connects the post-PC era content of this edition to the cover of the first edition.
Finally, I owe a huge debt to Yunsup Lee and Andrew Waterman for taking on
this conversion to RISC-V in their spare time while founding a startup company.
Kudos to Eric Love as well, who made RISC-V versions of the exercises in this
edition while finishing his Ph.D. We’re all excited to see what will happen with
RISC-V in academia and beyond.
The contributions of the nearly 150 people we mentioned here have helped
make this new edition what I hope will be our best book yet. Enjoy!
David A. Patterson

