Praise for Embedded Computing: A VLIW
Approach to Architecture, Compilers
and Tools
There is little doubt that embedded computing is the new frontier of computer research.
There is also a consensus that VLIW technology is extremely powerful in this domain.
This book speaks with an authoritative voice on VLIW for embedded with true technical
depth and deep wisdom from the pioneering experiences of the authors. This book will
find a place on my shelf next to the classic texts on computer architecture and compiler
optimization. It is simply that good.
Tom Conte, Center for Embedded Systems Research, North Carolina State University
Written by one of the field’s inventors with his collaborators, this book is the first complete
exposition of the VLIW design philosophy for embedded systems. It can be read as a
stand-alone reference on VLIW — a careful treatment of the ISA, compiling and program
analysis tools needed to develop a new generation of embedded systems — or as a series
of design case studies drawn from the authors’ extensive experience. The authors’ style
is careful yet informal, and the book abounds with “flames,” debunked “fallacies” and
other material that engages the reader in the lively interplay between academic research
and commercial development that has made this aspect of computer architecture so
exciting. Embedded Computing: A VLIW Approach to Architecture, Compilers, and
Tools will certainly be the definitive treatment of this important chapter in computer
architecture.
Richard DeMillo, Georgia Institute of Technology
This book does a superb job of laying down the foundations of VLIW computing and con-
veying how the VLIW principles have evolved to meet the needs of embedded computing.
Due to the additional attention paid to characterizing a wide range of embedded appli-
cations and development of an accompanying toolchain, this book sets a new standard
both as a reference and a text for embedded computing.
Rajiv Gupta, The University of Arizona
A wealth of wisdom on a high-performance and power-efficient approach to embedded
computing. I highly recommend it for both engineers and students.


Norm Jouppi, HP Labs
Praise for Embedded Computing continued
Josh, Paolo, and Cliff have devoted most of their professional lives to developing and
advancing the fundamental research and use of VLIW architectures and instruction
level parallelism. They are also system-builders in the best and broadest sense of the
term. This book offers deep insights into the field, and highlights the power of these
technologies for use in the rapidly expanding field of high performance embedded com-
puting. I believe this book will become required reading for anyone working in these
technologies.
Dick Lampman, HP Labs
Embedded Computing is a fabulous read, engagingly styled, with generous research
and practical perspective, and authoritative, since Fisher has been responsible for this
paradigm of simultaneously engineering the compiler and processor. Practicing engi-
neers — both architects and embedded system designers — will find the techniques they
will need to achieve the substantial benefits of VLIW-based systems. Instructors will value
the rare juxtaposition of advanced technology with practical deployment examples, and
students will enjoy the unusually interesting and mind-expanding chapter exercises.
Richard A. Lethin, Reservoir Labs and Yale University
One of the strengths of this book is that it combines the perspectives of academic
research, industrial development, as well as tool building. While its coverage of embed-
ded architectures and compilers is very broad, it is also deep where necessary. Embedded
Computing is a must-have for any student or practitioner of embedded computing.
Walid Najjar, University of California, Riverside
Embedded Computing
A VLIW Approach to Architecture, Compilers and Tools
Joseph A. Fisher
Paolo Faraboschi
Cliff Young
AMSTERDAM • BOSTON • HEIDELBERG • LONDON
NEW YORK • OXFORD • PARIS • SAN DIEGO
SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Morgan Kaufmann is an imprint of Elsevier
Publisher Denise E. M. Penrose
Publishing Services Manager Simon Crump
Senior Production Editor Angela Dooley
Editorial Assistant Valerie Witte
Cover Design Hannus Design
Cover Image Santiago Calatrava’s Alamillo Bridge
Text Design Frances Baca Design
Composition CEPHA
Technical Illustration Dartmouth Publishing
Copyeditor Daril Bentley
Proofreader Phyllis Coyne & Associates
Indexer Northwind Editorial
Interior printer The Maple-Vail Manufacturing Group
Cover printer Phoenix Color, Inc.
Morgan Kaufmann Publishers is an imprint of Elsevier. 500 Sansome Street, Suite 400, San Francisco, CA 94111
This book is printed on acid-free paper.
© 2005 by Elsevier Inc. All rights reserved.
Designations used by companies to distinguish their products are often claimed as trademarks or registered trademarks.
In all instances in which Morgan Kaufmann Publishers is aware of a claim, the product names appear in initial capital or
all capital letters. Readers, however, should contact the appropriate companies for more complete information regarding
trademarks and registration.
Cover image: Santiago Calatrava’s Alamillo Bridge blends art and engineering to make architecture. While his design
remains a modern, cable-stayed bridge, it simultaneously reinvents the category, breaking traditional assumptions and
rearranging structural elements into a new form that is efficient, powerful, and beautiful. The authors chose this cover
image for a number of reasons. Compiler engineering, which is at the heart of modern VLIW design, is similar to bridge
engineering: both must be built to last for decades, to withstand changes in usage and replacement of components, and
to weather much abuse. The VLIW design philosophy was one of the first computer architectural styles to bridge the
software and hardware communities, treating them as equals and partners. And this book is meant as a bridge between
the VLIW and embedded communities, which had historically been separate, but which today have complementary
strengths and requirements.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any
means — electronic, mechanical, photocopying, scanning, or otherwise — without prior written permission of the
publisher.
Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+44)
1865 843830, fax: (+44) 1865 853333, e-mail: You may also complete your request on-line
via the Elsevier homepage () by selecting “Customer Support” and then “Obtaining Permissions.”
ADVICE, PRAISE, AND ERRORS: Any correspondence related to this publication or intended for the authors should be
addressed to Information regarding error sightings is also encouraged and can be sent to
Library of Congress Cataloging-in-Publication Data
ISBN: 1-55860-766-8
For information on all Morgan Kaufmann publications,
visit our Web site at www.mkp.com or www.books.elsevier.com.
Printed in the United States of America
04 05 06 07 08    5 4 3 2 1
To my wife Elizabeth, our children David and Dora,
and my parents, Harry and the late Susan Fisher.
And to my friend and mentor, Martin Davis.
Josh Fisher

To the memory of my late parents Silvio and Gina,
to my wife Tatiana and our daughter Silvia.
Paolo Faraboschi
To the women of my family:
Yueh-Jing, Dorothy, Matilda, Joyce, and Celeste.
Cliff Young
To Bob Rau, a VLIW pioneer and true visionary,
and a wonderful human being.
We were privileged to know and work with him.
The Authors
About the Authors
JOSEPH A. FISHER is a Hewlett-Packard Senior Fellow at HP Labs, where he has
worked since 1990 in instruction-level parallelism and in custom embedded VLIW pro-
cessors and their compilers. Josh studied at the Courant Institute of NYU (B.A., M.A.,
and then Ph.D. in 1979), where he devised the trace scheduling compiler algorithm
and coined the term instruction-level parallelism. As a professor at Yale University, he
created and named VLIW architectures and invented many of the fundamental tech-
nologies of ILP. In 1984, he started Multiflow Computer with two members of his Yale
team. Josh won an NSF Presidential Young Investigator Award in 1984, was the 1987
Connecticut Eli Whitney Entrepreneur of the Year, and in 2003 received the ACM/IEEE
Eckert-Mauchly Award.
PAOLO FARABOSCHI is a Principal Research Scientist at HP Labs. Before joining
Hewlett-Packard in 1994, Paolo received an M.S. (Laurea) and Ph.D. (Dottorato di
Ricerca) in electrical engineering and computer science from the University of Genoa
(Italy) in 1989 and 1993, respectively. His research interests skirt the boundary of
hardware and software, including VLIW architectures, compilers, and embedded sys-
tems. More recently, he has been looking at the computing aspects of demanding
content-processing applications. Paolo is an active member of the computer architecture
community, has served on many program committees, and was Program Co-chair
for MICRO (2001) and CASES (2003).
CLIFF YOUNG works for D. E. Shaw Research and Development, LLC, a member of the
D. E. Shaw group of companies, on projects involving special-purpose, high-performance
computers for computational biochemistry. Before his current position, he was a Member
of Technical Staff at Bell Laboratories in Murray Hill, New Jersey. He received A.B., S.M.,
and Ph.D. degrees in computer science from Harvard University in 1989, 1995, and 1998,
respectively.
Foreword
Bob Colwell, R&E Colwell & Assoc. Inc.
There are two ways to learn more about your country: you can study it directly
by traveling around in it or you can study it indirectly by leaving it. The first
method yields facts and insights directly in context, and the second by contrast.
Our tradition in computer engineering has been to seldom leave our neighborhood.
If you want to learn about operating systems, you read an OS book. For multiprocessor
systems, you get a book that maps out the MP space.
The book you are holding in your hands can serve admirably in that direct sense. If
the technology you are working on is associated with VLIWs or “embedded computing,”
clearly it is imperative that you read this book.
But what pleasantly surprised me was how useful this book is, even if one’s work
is not VLIW-related or has no obvious relationship to embedded computing. I had long
felt it was time for Josh Fisher to write his magnum opus on VLIWs, so when I first heard
that he and his coauthors were working on a book with VLIW in the title I naturally and
enthusiastically assumed this was it. Then I heard the words “embedded computing”
were also in the title and felt considerable uncertainty, having spent most of my profes-
sional career in the general-purpose computing arena. I thought embedded computing
was interesting, but mostly in the same sense that studying cosmology was interesting:
intellectually challenging, but what does it have to do with me?
I should have known better. I don’t think Josh Fisher can write boring text. He
doesn’t know how. (I still consider his “Very Long Instruction Word Architectures and
the ELI-512” paper from ISCA-10 to be the finest conference publication I have ever read.)
And he seems to have either found like-minded coauthors in Faraboschi and Young or
has taught them well, because Embedded Computing: A VLIW Approach to Architecture,
Compilers and Tools is enthralling in its clarity and exhilarating in its scope. If you are
involved in computer system design or programming, you must still read this book,
because it will take you to places where the views are spectacular, including those
looking over to where you usually live. You don’t necessarily have to agree with every
point the authors make, but you will understand what they are trying to say, and they
will make you think.
One of the best legacies of the classic Hennessy and Patterson computer architecture
textbooks is that the success of their format and style has encouraged more books like
theirs. In Embedded Computing: A VLIW Approach to Architecture, Compilers and
Tools, you will find the pitfalls, controversies, and occasional opinion sidebars that made
H&P such a joy to read. This kind of technical exposition is like vulcanology done while
standing on an active volcano. Look over there, and see molten lava running under a
new fissure in the rocks. Feel the heat; it commands your full attention. It’s immersive,
it’s interesting, and it’s immediate. If your Vibram soles start melting, it’s still worth it.
You probably needed new shoes anyway.
I first met Josh when I was a grad student at Carnegie-Mellon in 1982. He spent an
hour earnestly describing to me how a sufficiently talented compiler could in principle
find enough parallelism, via a technique he called trace scheduling, to keep a really
wild-looking hardware engine busy. The compiler would speculatively move code all
over the place, and then invent more code to fix up what it got wrong. I thought to myself
“So this is what a lunatic looks like up close. I hope he’s not dangerous.” Two years later
I joined him at Multiflow and learned more in the next five years than I ever have, before
or since.
It was an honor to review an early draft of this book, and I was thrilled to be asked to
contribute this foreword. As the book makes clear, general-purpose computing has tra-
ditionally gotten the glory, while embedded computing quietly keeps our infrastructure
running. This is probably just a sign of the immaturity of the general-purpose com-
puting environment (even though we “nonembedded” types don’t like to admit that).
With general-purpose computers, people “use the computer” to do something. But with
embedded computers, people accomplish some task, blithely and happily unaware that
there’s a computer involved. Indeed, if they had to be conscious of the computer, their
embedded computers would have already failed: antilock brakes and engine controllers,
for instance. General-purpose CPUs have a few microarchitecture performance tricks to
show their embedded brethren, but the embedded space has much more to teach the
general computing folks about the bigger picture: total cost of ownership, who lives in
the adjacent neighborhoods, and what they need for all to live harmoniously. This book
is a wonderful contribution toward that evolution.
Contents
About the Authors ix
Foreword xi
Preface xxvii
Content and Structure xxviii
The VEX (VLIW Example) Computing System xxx
Audience xxx
Cross-cutting Topics xxxi
How to Read This Book xxxi
Figure Acknowledgments xxxiv
Acknowledgments xxxv
CHAPTER 1
An Introduction to Embedded Processing 1
1.1 What Is Embedded Computing? 3
1.1.1 Attributes of Embedded Devices 4
1.1.2 Embedded Is Growing 5
1.2 Distinguishing Between Embedded and General-Purpose Computing 6
1.2.1 The “Run One Program Only” Phenomenon 8
1.2.2 Backward and Binary Compatibility 9
1.2.3 Physical Limits in the Embedded Domain 10
1.3 Characterizing Embedded Computing 11
1.3.1 Categorization by Type of Processing Engine 12
Digital Signal Processors 13
Network Processors 16
1.3.2 Categorization by Application Area 17
The Image Processing and Consumer Market 18
The Communications Market 20
The Automotive Market 22
1.3.3 Categorization by Workload Differences 22
1.4 Embedded Market Structure 23
1.4.1 The Market for Embedded Processor Cores 24
1.4.2 Business Model of Embedded Processors 25
1.4.3 Costs and Product Volume 26
1.4.4 Software and the Embedded Software Market 28
1.4.5 Industry Standards 28
1.4.6 Product Life Cycle 30
1.4.7 The Transition to SoC Design 31
Effects of SoC on the Business Model 34
Centers of Embedded Design 35

1.4.8 The Future of Embedded Systems 36
Connectivity: Always-on Infrastructure 36
State: Personal Storage 36
Administration 37
Security 37
The Next Generation 37
1.5 Further Reading 38
1.6 Exercises 40
CHAPTER 2
An Overview of VLIW and ILP 45
2.1 Semantics and Parallelism 46
2.1.1 Baseline: Sequential Program Semantics 46
2.1.2 Pipelined Execution, Overlapped Execution, and
Multiple Execution Units 47
2.1.3 Dependence and Program Rearrangement 51
2.1.4 ILP and Other Forms of Parallelism 52
2.2 Design Philosophies 54
2.2.1 An Illustration of Design Philosophies: RISC
Versus CISC 56
2.2.2 First Definition of VLIW 57
2.2.3 A Design Philosophy: VLIW 59
VLIW Versus Superscalar 59
VLIW Versus DSP 62
2.3 Role of the Compiler 63
2.3.1 The Phases of a High-Performance Compiler 63
2.3.2 Compiling for ILP and VLIW 65
2.4 VLIW in the Embedded and DSP Domains 69
2.5 Historical Perspective and Further Reading 71
2.5.1 ILP Hardware in the 1960s and 1970s 71
Early Supercomputer Arithmetic Units 71

Attached Signal Processors 72
Horizontal Microcode 72
2.5.2 The Development of ILP Code Generation in the 1980s 73
Acyclic Microcode Compaction Techniques 73
Cyclic Techniques: Software Pipelining 75
2.5.3 VLIW Development in the 1980s 76
2.5.4 ILP in the 1990s and 2000s 77
2.6 Exercises 78
CHAPTER 3
An Overview of ISA Design 83
3.1 Overview: What to Hide 84
3.1.1 Architectural State: Memory and Registers 84
3.1.2 Pipelining and Operational Latency 85
3.1.3 Multiple Issue and Hazards 86
Exposing Dependence and Independence 86
Structural Hazards 87
Resource Hazards 89
3.1.4 Exception and Interrupt Handling 89
3.1.5 Discussion 90
3.2 Basic VLIW Design Principles 91
3.2.1 Implications for Compilers and Implementations 92
3.2.2 Execution Model Subtleties 93
3.3 Designing a VLIW ISA for Embedded Systems 95
3.3.1 Application Domain 96
3.3.2 ILP Style 98
3.3.3 Hardware/Software Tradeoffs 100
3.4 Instruction-set Encoding 101
3.4.1 A Larger Definition of Architecture 101
3.4.2 Encoding and Architectural Style 105
RISC Encodings 107
CISC Encodings 108
VLIW Encodings 109
Why Not Superscalar Encodings? 109
DSP Encodings 110
Vector Encodings 111
3.5 VLIW Encoding 112
3.5.1 Operation Encoding 113
3.5.2 Instruction Encoding 113
Fixed-overhead Encoding 115
Distributed Encoding 115
Template-based Encoding 116
3.5.3 Dispatching and Opcode Subspaces 117
3.6 Encoding and Instruction-set Extensions 119
3.7 Further Reading 121
3.8 Exercises 121
CHAPTER 4
Architectural Structures in ISA Design 125
4.1 The Datapath 127
4.1.1 Location of Operands and Results 127
4.1.2 Datapath Width 127
4.1.3 Operation Repertoire 129
Simple Integer and Compare Operations 131
Carry, Overflow, and Other Flags 131
Common Bitwise Utilities 132
Integer Multiplication 132

Fixed-point Multiplication 133
Integer Division 135
Floating-point Operations 136
Saturated Arithmetic 137
4.1.4 Micro-SIMD Operations 139
Alignment Issues 141
Precision Issues 141
Dealing with Control Flow 142
Pack, Unpack, and Mix 143
Reductions 143
4.1.5 Constants 144
4.2 Registers and Clusters 144
4.2.1 Clustering 145
Architecturally Invisible Clustering 147
Architecturally Visible Clustering 147
4.2.2 Heterogeneous Register Files 149
4.2.3 Address and Data Registers 149
4.2.4 Special Register File Features 150
Indexed Register Files 150
Rotating Register Files 151
4.3 Memory Architecture 151
4.3.1 Addressing Modes 152
4.3.2 Access Sizes 153
4.3.3 Alignment Issues 153
4.3.4 Caches and Local Memories 154
Prefetching 154
Local Memories and Lockable Caches 156
4.3.5 Exotic Addressing Modes for Embedded Processing 156
4.4 Branch Architecture 156
4.4.1 Unbundling Branches 158
Two-step Branching 159
Three-step Branching 159
4.4.2 Multiway Branches 160
4.4.3 Multicluster Branches 161
4.4.4 Branches and Loops 162
4.5 Speculation and Predication 163
4.5.1 Speculation 163
Control Speculation 164
Data Speculation 167
4.5.2 Predication 168
Full Predication 169
Partial Predication 170
Cost and Benefits of Predication 171
Predication in the Embedded Domain 172
4.6 System Operations 173
4.7 Further Reading 174
4.8 Exercises 175
CHAPTER 5
Microarchitecture Design 179
5.1 Register File Design 182
5.1.1 Register File Structure 182
5.1.2 Register Files, Technology, and Clustering 183
5.1.3 Separate Address and Data Register Files 184
5.1.4 Special Registers and Register File Features 186
5.2 Pipeline Design 186
5.2.1 Balancing a Pipeline 187

5.3 VLIW Fetch, Sequencing, and Decoding 191
5.3.1 Instruction Fetch 191
5.3.2 Alignment and Instruction Length 192
5.3.3 Decoding and Dispersal 194
5.3.4 Decoding and ISA Extensions 195
5.4 The Datapath 195
5.4.1 Execution Units 197
5.4.2 Bypassing and Forwarding Logic 200
5.4.3 Exposing Latencies 202
5.4.4 Predication and Selects 204
5.5 Memory Architecture 206
5.5.1 Local Memory and Caches 206
5.5.2 Byte Manipulation 209
5.5.3 Addressing, Protection, and Virtual Memory 210
5.5.4 Memories in Multiprocessor Systems 211
5.5.5 Memory Speculation 213
5.6 The Control Unit 214
5.6.1 Branch Architecture 214
5.6.2 Predication and Selects 215
5.6.3 Interrupts and Exceptions 216
5.6.4 Exceptions and Pipelining 218
Drain and Flush Pipeline Models 218
Early Commit 219
Delayed Commit 220
5.7 Control Registers 221
5.8 Power Considerations 221

5.8.1 Energy Efficiency and ILP 222
System-level Power Considerations 224
5.9 Further Reading 225
5.10 Exercises 227
CHAPTER 6
System Design and Simulation 231
6.1 System-on-a-Chip (SoC) 231
6.1.1 IP Blocks and Design Reuse 232
A Concrete SoC Example 233
Virtual Components and the VSIA Alliance 235
6.1.2 Design Flows 236
Creation Flow 236
Verification Flow 238
6.1.3 SoC Buses 239
Data Widths 240
Masters, Slaves, and Arbiters 241
Bus Transactions 242
Test Modes 244
6.2 Processor Cores and SoC 245
6.2.1 Nonprogrammable Accelerators 246
Reconfigurable Logic 248
6.2.2 Multiprocessing on a Chip 250
Symmetric Multiprocessing 250
Heterogeneous Multiprocessing 251
Example: A Multicore Platform for Mobile Multimedia 252
6.3 Overview of Simulation 254
6.3.1 Using Simulators 256
6.4 Simulating a VLIW Architecture 257
6.4.1 Interpretation 258

6.4.2 Compiled Simulation 259
Memory 262
Registers 263
Control Flow 263
Exceptions 266
Analysis of Compiled Simulation 267
Performance Measurement and Compiled Simulation 268
6.4.3 Dynamic Binary Translation 268
6.4.4 Trace-driven Simulation 270
6.5 System Simulation 271
6.5.1 I/O and Concurrent Activities 272
6.5.2 Hardware Simulation 272
Discrete Event Simulation 274
6.5.3 Accelerating Simulation 275
In-Circuit Emulation 275
Hardware Accelerators for Simulation 276
6.6 Validation and Verification 276
6.6.1 Co-simulation 278
6.6.2 Simulation, Verification, and Test 279
Formal Verification 280
Design for Testability 280
Debugging Support for SoC 281
6.7 Further Reading 282
6.8 Exercises 284
CHAPTER 7
Embedded Compiling and Toolchains 287
7.1 What Is Important in an ILP Compiler? 287
7.2 Embedded Cross-Development Toolchains 290

7.2.1 Compiler 291
7.2.2 Assembler 292
7.2.3 Libraries 294
7.2.4 Linker 296
7.2.5 Post-link Optimizer 297
7.2.6 Run-time Program Loader 297
7.2.7 Simulator 299
7.2.8 Debuggers and Monitor ROMs 300
7.2.9 Automated Test Systems 301
7.2.10 Profiling Tools 302
7.2.11 Binary Utilities 302
7.3 Structure of an ILP Compiler 302
7.3.1 Front End 304
7.3.2 Machine-independent Optimizer 304
7.3.3 Back End: Machine-specific Optimizations 306
7.4 Code Layout 306
7.4.1 Code Layout Techniques 306
DAG-based Placement 308
The “Pettis-Hansen” Technique 310
Procedure Inlining 310
Cache Line Coloring 311
Temporal-order Placement 311
7.5 Embedded-Specific Tradeoffs for Compilers 311
7.5.1 Space, Time, and Energy Tradeoffs 312
7.5.2 Power-specific Optimizations 315
Fundamentals of Power Dissipation 316
Power-aware Software Techniques 317

7.6 DSP-Specific Compiler Optimizations 320
7.6.1 Compiler-visible Features of DSPs 322
Heterogeneous Registers 322
Addressing Modes 322
Limited Connectivity 323
Local Memories 323
Harvard Architecture 324
7.6.2 Instruction Selection and Scheduling 325
7.6.3 Address Computation and Offset Assignment 327
7.6.4 Local Memories 327
7.6.5 Register Assignment Techniques 328
7.6.6 Retargetable DSP and ASIP Compilers 329
7.7 Further Reading 332
7.8 Exercises 333
CHAPTER 8
Compiling for VLIWs and ILP 337
8.1 Profiling 338
8.1.1 Types of Profiles 338
8.1.2 Profile Collection 341
8.1.3 Synthetic Profiles (Heuristics in Lieu of Profiles) 341
8.1.4 Profile Bookkeeping and Methodology 342
8.1.5 Profiles and Embedded Applications 342
8.2 Scheduling 343
8.2.1 Acyclic Region Types and Shapes 345
Basic Blocks 345
Traces 345
Superblocks 345
Hyperblocks 347
Treegions 347
Percolation Scheduling 348

8.2.2 Region Formation 350
Region Selection 351
Enlargement Techniques 353
Phase-ordering Considerations 356
8.2.3 Schedule Construction 357
Analyzing Programs for Schedule Construction 359
Compaction Techniques 362
Compensation Code 365
Another View of Scheduling Problems 367
8.2.4 Resource Management During Scheduling 368
Resource Vectors 368
Finite-state Automata 369
8.2.5 Loop Scheduling 371
Modulo Scheduling 373
8.2.6 Clustering 380
8.3 Register Allocation 382
8.3.1 Phase-ordering Issues 383
Register Allocation and Scheduling 383
8.4 Speculation and Predication 385
8.4.1 Control and Data Speculation 385
8.4.2 Predicated Execution 386
8.4.3 Prefetching 389
8.4.4 Data Layout Methods 390
8.4.5 Static and Hybrid Branch Prediction 390
8.5 Instruction Selection 390
8.6 Further Reading 391
8.7 Exercises 395
CHAPTER 9
The Run-time System 399
9.1 Exceptions, Interrupts, and Traps 400
9.1.1 Exception Handling 400
9.2 Application Binary Interface Considerations 402
9.2.1 Loading Programs 404
9.2.2 Data Layout 406
9.2.3 Accessing Global Data 407
9.2.4 Calling Conventions 409
Registers 409
Call Instructions 409
Call Sites 410
Function Prologues and Epilogues 412
9.2.5 Advanced ABI Topics 412
Variable-length Argument Lists 412
Dynamic Stack Allocation 413
Garbage Collection 414
Linguistic Exceptions 414
9.3 Code Compression 415
9.3.1 Motivations 416
9.3.2 Compression and Information Theory 417
9.3.3 Architectural Compression Options 417
Decompression on Fetch 420
Decompression on Refill 420
Load-time Decompression 420
9.3.4 Compression Methods 420
Hand-tuned ISAs 421
Ad Hoc Compression Schemes 421
RAM Decompression 422

Dictionary-based Software Compression 422
Cache-based Compression 422
Quantifying Compression Benefits 424
9.4 Embedded Operating Systems 427
9.4.1 “Traditional” OS Issues Revisited 427
9.4.2 Real-time Systems 428
Real-time Scheduling 429
9.4.3 Multiple Flows of Control 431
Threads, Processes, and Microkernels 432
9.4.4 Market Considerations 433
Embedded Linux 435
9.4.5 Downloadable Code and Virtual Machines 436
9.5 Multiprocessing and Multithreading 438
9.5.1 Multiprocessing in the Embedded World 438
9.5.2 Multiprocessing and VLIW 439
9.6 Further Reading 440
9.7 Exercises 441
CHAPTER 10
Application Design and Customization 443
10.1 Programming Language Choices 443
10.1.1 Overview of Embedded Programming Languages 444
10.1.2 Traditional C and ANSI C 445
10.1.3 C++ and Embedded C++ 447
Embedded C++ 449
10.1.4 Matlab 450
10.1.5 Embedded Java 452
The Allure of Embedded Java 452
Embedded Java: The Dark Side 455
10.1.6 C Extensions for Digital Signal Processing 456

Restricted Pointers 456
Fixed-point Data Types 459
Circular Arrays 461
Matrix Referencing and Operators 462
10.1.7 Pragmas, Intrinsics, and Inline Assembly Language Code 462
Compiler Pragmas and Type Annotations 462
Assembler Inserts and Intrinsics 463
10.2 Performance, Benchmarking, and Tuning 465
10.2.1 Importance and Methodology 465
10.2.2 Tuning an Application for Performance 466
Profiling 466
Performance Tuning and Compilers 467
Developing for ILP Targets 468
10.2.3 Benchmarking 473
10.3 Scalability and Customizability 475
10.3.1 Scalability and Architecture Families 476
10.3.2 Exploration and Scalability 477
10.3.3 Customization 478
Customized Implementations 479
10.3.4 Reconfigurable Hardware 480
Using Programmable Logic 480
10.3.5 Customizable Processors and Tools 481
Describing Processors 481
10.3.6 Tools for Customization 483
Customizable Compilers 485
10.3.7 Architecture Exploration 487
Dealing with the Complexity 488
Other Barriers to Customization 488

Wrapping Up 489
10.4 Further Reading 489
10.5 Exercises 490
CHAPTER 11
Application Areas 493
11.1 Digital Printing and Imaging 493
11.1.1 Photo Printing Pipeline 495
JPEG Decompression 495
Scaling 496
Color Space Conversion 497
Dithering 499
11.1.2 Implementation and Performance 501
Summary 505
11.2 Telecom Applications 505
11.2.1 Voice Coding 506
Waveform Codecs 506
Vocoders 507
Hybrid Coders 508
11.2.2 Multiplexing 509
11.2.3 The GSM Enhanced Full-rate Codec 510
Implementation and Performance 510
11.3 Other Application Areas 514
11.3.1 Digital Video 515
MPEG-1 and MPEG-2 516
MPEG-4 518
11.3.2 Automotive 518
Fail-safety and Fault Tolerance 519

Engine Control Units 520
In-vehicle Networking 520
11.3.3 Hard Disk Drives 522
Motor Control 524
Data Decoding 525
Disk Scheduling and On-disk Management Tasks 526
Disk Scheduling and Off-disk Management Tasks 527
11.3.4 Networking and Network Processors 528
Network Processors 531
11.4 Further Reading 535
11.5 Exercises 537
APPENDIX A
The VEX System 539
A.1 The VEX Instruction-set Architecture 540
A.1.1 VEX Assembly Language Notation 541
A.1.2 Clusters 542
A.1.3 Execution Model 544
A.1.4 Architecture State 545
A.1.5 Arithmetic and Logic Operations 545
Examples 547
A.1.6 Intercluster Communication 549
A.1.7 Memory Operations 550
A.1.8 Control Operations 552
Examples 553
A.1.9 Structure of the Default VEX Cluster 554
Register Files and Immediates 555
A.1.10 VEX Semantics 556
A.2 The VEX Run-time Architecture 558
A.2.1 Data Allocation and Layout 559
A.2.2 Register Usage 560

A.2.3 Stack Layout and Procedure Linkage 560
Procedure Linkage 563