
High performance embedded computing handbook



High Performance
Embedded Computing Handbook
A Systems Perspective

7197.indb 1

5/14/08 12:15:10 PM





Edited by

David R. Martinez
Robert A. Bond
M. Michael Vai
Massachusetts Institute of Technology
Lincoln Laboratory
Lexington, Massachusetts, U.S.A.





The U.S. Government is reserved a royalty-free, non-exclusive license to use or have others use or copy the work for government purposes. MIT and MIT Lincoln Laboratory are reserved a license to use and distribute the work for internal
research and educational use purposes.
MATLAB® is a trademark of The MathWorks, Inc. and is used with permission. The MathWorks does not warrant the
accuracy of the text or exercises in this book. This book’s use or discussion of MATLAB® software or related products
does not constitute endorsement or sponsorship by The MathWorks of a particular pedagogical approach or particular
use of the MATLAB® software.

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2008 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1
International Standard Book Number-13: 978-0-8493-7197-4 (Hardcover)
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright
holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this
form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may
rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the
publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/)
or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923,
978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Library of Congress Cataloging‑in‑Publication Data
High performance embedded computing handbook : a systems perspective / editors, David R.
Martinez, Robert A. Bond, M. Michael Vai.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-8493-7197-4 (hardback : alk. paper)
1. Embedded computer systems--Handbooks, manuals, etc. 2. High performance
computing--Handbooks, manuals, etc. I. Martinez, David R. II. Bond, Robert A. III. Vai, M. Michael.
IV. Title.
TK7895.E42H54 2008
004.16--dc22

2008010485

Visit the Taylor & Francis Web site at

and the CRC Press Web site at




Dedication
This handbook is dedicated to MIT Lincoln Laboratory for providing the opportunities to work
on exciting and challenging hardware and software projects leading to the demonstration of high
performance embedded computing systems.






Contents
Preface.............................................................................................................................................xix
Acknowledgments............................................................................................................................xxi
About the Editors.......................................................................................................................... xxiii
Contributors....................................................................................................................................xxv

Section I  Introduction
Chapter 1 A Retrospective on High Performance Embedded Computing....................................3
David R. Martinez, MIT Lincoln Laboratory
1.1 Introduction..............................................................................................................................3
1.2 HPEC Hardware Systems and Software Technologies............................................................7
1.3 HPEC Multiprocessor System..................................................................................................9
1.4 Summary................................................................................................................................ 13
References......................................................................................................................................... 13
Chapter 2 Representative Example of a High Performance Embedded Computing
System......................................................................................................................... 15
David R. Martinez, MIT Lincoln Laboratory
2.1 Introduction............................................................................................................................ 15

2.2 System Complexity................................................................................................................ 16
2.3 Implementation Techniques...................................................................................................20
2.4 Software Complexity and System Integration....................................................................... 23
2.5 Summary................................................................................................................................26
References......................................................................................................................................... 27
Chapter 3 System Architecture of a Multiprocessor System....................................................... 29
David R. Martinez, MIT Lincoln Laboratory
3.1 Introduction............................................................................................................................ 29
3.2 A Generic Multiprocessor System......................................................................................... 30
3.3 A High Performance Hardware System................................................................................. 32
3.4 Custom VLSI Implementation............................................................................................... 33
3.4.1 Custom VLSI Hardware........................................................................................... 36
3.5 A High Performance COTS Programmable Signal Processor.............................................. 37
3.6 Summary................................................................................................................................ 39
References......................................................................................................................................... 39
Chapter 4 High Performance Embedded Computers: Development Process and
Management Perspectives........................................................................................... 41
Robert A. Bond, MIT Lincoln Laboratory
4.1 Introduction............................................................................................................................ 41
4.2 Development Process............................................................................................................. 42
4.3 Case Study: Airborne Radar HPEC System..........................................................................46
4.3.1 Programmable Signal Processor Development........................................................ 52
4.3.2 Software Estimation, Monitoring, and Configuration Control................................ 57
4.3.3 PSP Software Integration, Optimization, and Verification......................................60
4.4 Trends.....................................................................................................................................66
References......................................................................................................................................... 69

Section II  Computational Nature of High
Performance Embedded Systems
Chapter 5 Computational Characteristics of High Performance Embedded Algorithms
and Applications.......................................................................................................... 73
Masahiro Arakawa and Robert A. Bond, MIT Lincoln Laboratory
5.1 Introduction............................................................................................................................ 73
5.2 General Computational Characteristics of HPEC................................................................. 76
5.3 Complexity of HPEC Algorithms.......................................................................................... 88
5.4 Parallelism in HPEC Algorithms and Architectures.............................................................96
5.5 Future Trends....................................................................................................................... 109
References....................................................................................................................................... 112

Chapter 6 Radar Signal Processing: An Example of High Performance Embedded
Computing................................................................................................................. 113
Robert A. Bond and Albert I. Reuther, MIT Lincoln Laboratory
6.1 Introduction.......................................................................................................................... 113
6.2 A Canonical HPEC Radar Algorithm................................................................................. 116
6.2.1 Subband Analysis and Synthesis............................................................................ 120
6.2.2 Adaptive Beamforming.......................................................................................... 122
6.2.3 Pulse Compression................................................................................................. 131
6.2.4 Doppler Filtering.................................................................................................... 132
6.2.5 Space-Time Adaptive Processing........................................................................... 132
6.2.6 Subband Synthesis Revisited.................................................................................. 136
6.2.7 CFAR Detection..................................................................................................... 136
6.3 Example Architecture of the Front-End Processor.............................................................. 138
6.3.1 A Discussion of the Back-End Processing............................................................. 140
6.4 Conclusion............................................................................................................................ 143
References....................................................................................................................................... 144

Section III  Front-End Real-Time Processor Technologies
Chapter 7 Analog-to-Digital Conversion................................................................................... 149
James C. Anderson and Helen H. Kim, MIT Lincoln Laboratory
7.1 Introduction.......................................................................................................................... 149
7.2 Conceptual ADC Operation................................................................................................. 150
7.3 Static Metrics....................................................................................................................... 150
7.3.1 Offset Error............................................................................................................ 150

7.3.2 Gain Error............................................................................................................... 152
7.3.3 Differential Nonlinearity........................................................................................ 152
7.3.4 Integral Nonlinearity.............................................................................................. 152
7.4 Dynamic Metrics................................................................................................................. 152
7.4.1 Resolution............................................................................................................... 152
7.4.2 Monotonicity.......................................................................................................... 153
7.4.3 Equivalent Input-Referred Noise (Thermal Noise)................................................ 153
7.4.4 Quantization Error.................................................................................................. 153
7.4.5 Ratio of Signal to Noise and Distortion................................................................. 154
7.4.6 Effective Number of Bits........................................................................................ 154
7.4.7 Spurious-Free Dynamic Range.............................................................................. 154
7.4.8 Dither...................................................................................................................... 155
7.4.9 Aperture Uncertainty............................................................................................. 155
7.5 System-Level Performance Trends and Limitations............................................................ 156
7.5.1 Trends in Resolution............................................................................................... 156
7.5.2 Trends in Effective Number of Bits........................................................................ 157
7.5.3 Trends in Spurious-Free Dynamic Range.............................................................. 158
7.5.4 Trends in Power Consumption............................................................................... 159
7.5.5 ADC Impact on Processing Gain........................................................................... 160

7.6 High-Speed ADC Design..................................................................................................... 160
7.6.1 Flash ADC.............................................................................................................. 161
7.6.2 Architectural Techniques for Power Saving........................................................... 165
7.6.3 Pipeline ADC......................................................................................................... 168
7.7 Power Dissipation Issues in High-Speed ADCs.................................................................. 170
7.8 Summary.............................................................................................................................. 170
References....................................................................................................................................... 171
Chapter 8 Implementation Approaches of Front-End Processors.............................................. 173
M. Michael Vai and Huy T. Nguyen, MIT Lincoln Laboratory
8.1 Introduction.......................................................................................................................... 173
8.2 Front-End Processor Design Methodology.......................................................................... 174
8.3 Front-End Signal Processing Technologies.......................................................................... 175
8.3.1 Full-Custom ASIC.................................................................................................. 176
8.3.2 Synthesized ASIC................................................................................................... 176
8.3.3 FPGA Technology.................................................................................................. 177
8.3.4 Structured ASIC..................................................................................................... 179
8.4 Intellectual Property............................................................................................................ 179
8.5 Development Cost................................................................................................................ 179
8.6 Design Space........................................................................................................................ 182
8.7 Design Case Studies............................................................................................................. 183
8.7.1 Channelized Adaptive Beamformer Processor...................................................... 183
8.7.2 Radar Pulse Compression Processor...................................................................... 187
8.7.3 Co-Design Benefits................................................................................................. 189
8.8 Summary.............................................................................................................................. 190
References....................................................................................................................................... 190
Chapter 9 Application-Specific Integrated Circuits................................................................... 191

M. Michael Vai, William S. Song, and Brian M. Tyrrell, MIT Lincoln Laboratory

9.1 Introduction.......................................................................................................................... 191
9.2 Integrated Circuit Technology Evolution............................................................................. 192
9.3 CMOS Technology............................................................................................................... 194
9.3.1 MOSFET................................................................................................................ 195
9.4 CMOS Logic Structures....................................................................................................... 196
9.4.1 Static Logic............................................................................................................. 196
9.4.2 Dynamic CMOS Logic........................................................................................... 198
9.5 Integrated Circuit Fabrication.............................................................................................. 198
9.6 Performance Metrics............................................................................................................200
9.6.1 Speed......................................................................................................................200
9.6.2 Power Dissipation...................................................................................................202
9.7 Design Methodology............................................................................................................202
9.7.1 Full-Custom Physical Design.................................................................................203

9.7.2 Synthesis Process...................................................................................................203
9.7.3 Physical Verification...............................................................................................205
9.7.4 Simulation...............................................................................................................206
9.7.5 Design for Manufacturability.................................................................................206
9.8 Packages...............................................................................................................................207
9.9 Testing..................................................................................................................................208
9.9.1 Fault Models...........................................................................................................209
9.9.2 Test Generation for Stuck-at Faults........................................................................209
9.9.3 Design for Testability............................................................................................. 210
9.9.4 Built-in Self-Test..................................................................................................... 211
9.10 Case Study............................................................................................................................ 212
9.11 Summary.............................................................................................................................. 215
References....................................................................................................................................... 215
Chapter 10 Field Programmable Gate Arrays............................................................................. 217
Miriam Leeser, Northeastern University
10.1 Introduction.......................................................................................................................... 217
10.2 FPGA Structures.................................................................................................................. 218
10.2.1 Basic Structures Found in FPGAs.......................................................................... 218
10.3 Modern FPGA Architectures............................................................................................... 222
10.3.1 Embedded Blocks................................................................................................... 222
10.3.2 Future Directions.................................................................................................... 223
10.4 Commercial FPGA Boards and Systems.............................................................................224
10.5 Languages and Tools for Programming FPGAs..................................................................224
10.5.1 Hardware Description Languages.......................................................................... 225
10.5.2 High-Level Languages........................................................................................... 225
10.5.3 Library-Based Solutions......................................................................................... 226
10.6 Case Study: Radar Processing on an FPGA........................................................................ 227
10.6.1 Project Description................................................................................................. 227
10.6.2 Parallelism: Fine-Grained versus Coarse-Grained................................................ 228
10.6.3 Data Organization.................................................................................................. 228

10.6.4 Experimental Results............................................................................................. 229
10.7 Challenges to High Performance with FPGA Architectures............................................... 229
10.7.1 Data: Movement and Organization........................................................................ 229
10.7.2 Design Trade-Offs.................................................................................................. 230
10.8 Summary.............................................................................................................................. 230
Acknowledgments........................................................................................................................... 230
References....................................................................................................................................... 231


Chapter 11 Intellectual Property-Based Design.......................................................................... 233
Wayne Wolf, Georgia Institute of Technology
11.1 Introduction.......................................................................................................................... 233
11.2 Classes of Intellectual Property........................................................................................... 234
11.3 Sources of Intellectual Property.......................................................................................... 235
11.4 Licenses for Intellectual Property........................................................................................ 236
11.5 CPU Cores............................................................................................................................ 236
11.6 Busses................................................................................................................................... 237
11.7 I/O Devices.......................................................................................................................... 238
11.8 Memories............................................................................................................................. 238
11.9 Operating Systems............................................................................................................... 238
11.10 Software Libraries and Middleware.................................................................................... 239
11.11 IP-Based Design Methodologies.......................................................................................... 239

11.12 Standards-Based Design......................................................................................................240
11.13 Summary.............................................................................................................................. 241
References....................................................................................................................................... 241
Chapter 12 Systolic Array Processors......................................................................................... 243
M. Michael Vai, Huy T. Nguyen, Preston A. Jackson, and William S. Song,
MIT Lincoln Laboratory
12.1 Introduction.......................................................................................................................... 243
12.2 Beamforming Processor Design..........................................................................................244
12.3 Systolic Array Design Approach......................................................................................... 247
12.4 Design Examples.................................................................................................................. 255
12.4.1 QR Decomposition Processor................................................................................ 255
12.4.2 Real-Time FFT Processor...................................................................................... 259
12.4.3 Bit-Level Systolic Array Methodology................................................................... 262
12.5 Summary.............................................................................................................................. 263
References....................................................................................................................................... 263

Section IV  Programmable High Performance Embedded
Computing Systems
Chapter 13 Computing Devices................................................................................................... 267
Kenneth Teitelbaum, MIT Lincoln Laboratory
13.1 Introduction.......................................................................................................................... 267
13.2 Common Metrics................................................................................................................. 268
13.2.1 Assessing the Required Computation Rate............................................................ 268
13.2.2 Quantifying the Performance of COTS Computing Devices................................ 269
13.3 Current COTS Computing Devices in Embedded Systems................................................. 270

13.3.1 General-Purpose Microprocessors......................................................................... 271
13.3.1.1 Word Length......................................................................................... 271
13.3.1.2 Vector Processing Units........................................................................ 271
13.3.1.3 Power Consumption versus Performance............................................. 271
13.3.1.4 Memory Hierarchy................................................................................ 272
13.3.1.5 Some Benchmark Results..................................................................... 273
13.3.1.6 Input/Output.......................................................................................... 274


13.3.2 Digital Signal Processors....................................................................................... 274
13.4 Future Trends....................................................................................................................... 274
13.4.1 Technology Projections and Extrapolating Current Architectures........................ 275
13.4.2 Advanced Architectures and the Exploitation of Moore’s Law............................. 276
13.4.2.1 Multiple-Core Processors..................................................................... 276
13.4.2.2 The IBM Cell Broadband Engine......................................................... 277
13.4.2.3 SIMD Processor Arrays........................................................................ 277
13.4.2.4 DARPA Polymorphic Computing Architectures.................................. 278
13.4.2.5 Graphical Processing Units as Numerical Co-processors.................... 278
13.4.2.6 FPGA-Based Co-processors................................................................. 279
13.5 Summary..............................................................................................................................280
References.......................................................................................................................................280
Chapter 14 Interconnection Fabrics............................................................................................. 283

Kenneth Teitelbaum, MIT Lincoln Laboratory
14.1 Introduction.......................................................................................................................... 283
14.1.1 Anatomy of a Typical Interconnection Fabric........................................................284
14.1.2 Network Topology and Bisection Bandwidth......................................................... 285
14.1.3 Total Exchange....................................................................................................... 285
14.1.4 Parallel Two-Dimensional Fast Fourier Transform—A Simple Example............. 286
14.2 Crossbar Tree Networks....................................................................................................... 287
14.2.1 Network Formulas.................................................................................................. 289
14.2.2 Scalability of Network Bisection Width.................................................................290
14.2.3 Units of Replication................................................................................................ 291
14.2.4 Pruning Crossbar Tree Networks........................................................................... 292
14.3 VXS: A Commercial Example............................................................................................. 295
14.3.1 Link Essentials....................................................................................................... 295
14.3.2 VXS-Supported Topologies................................................................................... 297
14.4 Summary.............................................................................................................................. 298
References....................................................................................................................................... 301
Chapter 15 Performance Metrics and Software Architecture..................................................... 303
Jeremy Kepner, Theresa Meuse, and Glenn E. Schrader, MIT Lincoln Laboratory
15.1 Introduction.......................................................................................................................... 303
15.2 Synthetic Aperture Radar Example Application.................................................................304
15.2.1 Operating Modes....................................................................................................306
15.2.2 Computational Workload.......................................................................................307
15.3 Degrees of Parallelism......................................................................................................... 310
15.3.1 Parallel Performance Metrics (no communication)............................................... 311
15.3.2 Parallel Performance Metrics (with communication)............................................ 313
15.3.3 Amdahl’s Law........................................................................................................ 314
15.4 Standard Programmable Multi-Computer........................................................................... 315
15.4.1 Network Model......................................................................................................... 317
15.5 Parallel Programming Models and Their Impact................................................................ 319
15.5.1 High-Level Programming Environment with Global Arrays................................ 320

15.6 System Metrics..................................................................................................................... 323
15.6.1 Performance........................................................................................................... 323
15.6.2 Form Factor............................................................................................................ 324


15.6.3 Efficiency................................................................................................................ 325
15.6.4 Software Cost......................................................................................................... 327
References....................................................................................................................................... 329
Appendix A: A Synthetic Aperture Radar Algorithm................................................................... 330
A.1 Scalable Data Generator...................................................................................................... 330
A.2 Stage 1: Front-End Sensor Processing................................................................................. 330
A.3 Stage 2: Back-End Knowledge Formation........................................................................... 333
Chapter 16 Programming Languages.......................................................................................... 335
James M. Lebak, The MathWorks
16.1 Introduction.......................................................................................................................... 335
16.2 Principles of Programming Embedded Signal Processing Systems.................................... 336
16.3 Evolution of Programming Languages................................................................................ 337
16.4 Features of Third-Generation Programming Languages.................................................... 338
16.4.1 Object-Oriented Programming.............................................................................. 338
16.4.2 Exception Handling................................................................................................ 338
16.4.3 Generic Programming............................................................................................ 339
16.5 Use of Specific Languages in High Performance Embedded Computing........................... 339
16.5.1 C............................................................................................................................. 339
16.5.2 Fortran....................................................................................................................340
16.5.3 Ada.........................................................................................................................340
16.5.4 C++......................................................................................................................... 341
16.5.5 Java......................................................................................................................... 342
16.6 Future Development of Programming Languages............................................................... 342
16.7 Summary: Features of Current Programming Languages.................................................. 343
References....................................................................................................................................... 343

Chapter 17 Portable Software Technology.................................................................................. 347
James M. Lebak, The MathWorks
17.1 Introduction.......................................................................................................................... 347
17.2 Libraries............................................................................................................................... 349
17.2.1 Distributed and Parallel Programming.................................................................. 349
17.2.2 Surveying the State of Portable Software Technology........................................... 350
17.2.2.1 Portable Math Libraries........................................................................ 350
17.2.2.2 Portable Performance Using Math Libraries........................................ 350
17.2.3 Parallel and Distributed Libraries.......................................................................... 351
17.2.4 Example: Expression Template Use in the MIT Lincoln Laboratory Parallel Vector Library........................................................................................................ 353
17.3 Summary.............................................................................................................................. 356
References....................................................................................................................................... 357
Chapter 18 Parallel and Distributed Processing.......................................................................... 359
Albert I. Reuther and Hahn G. Kim, MIT Lincoln Laboratory
18.1 Introduction.......................................................................................................................... 359

18.2 Parallel Programming Models.............................................................................................360
18.2.1 Threads...................................................................................................................360
18.2.1.1 Pthreads................................................................................................ 362
18.2.1.2 OpenMP................................................................................................ 362


18.2.2 Message Passing..................................................................................................... 363
18.2.2.1 Parallel Virtual Machine...................................................................... 363
18.2.2.2 Message Passing Interface....................................................................364
18.2.3 Partitioned Global Address Space.......................................................................... 365
18.2.3.1 Unified Parallel C................................................................................. 366
18.2.3.2 VSIPL++............................................................................................... 366
18.2.4 Applications............................................................................................................ 368
18.2.4.1 Fast Fourier Transform......................................................................... 369
18.2.4.2 Synthetic Aperture Radar..................................................................... 370
18.3 Distributed Computing Models............................................................................................ 371
18.3.1 Client-Server........................................................................................................... 372
18.3.1.1 SOAP.................................................................................................... 373
18.3.1.2 Java Remote Method Invocation........................................................... 374
18.3.1.3 Common Object Request Broker Architecture..................................... 374
18.3.2 Data Driven............................................................................................................ 375
18.3.2.1 Java Messaging Service........................................................................ 376

18.3.2.2 Data Distribution Service..................................................................... 376
18.3.3 Applications............................................................................................................ 377
18.3.3.1 Radar Open Systems Architecture....................................................... 377
18.3.3.2 Integrated Sensing and Decision Support............................................. 378
18.4 Summary.............................................................................................................................. 379
References....................................................................................................................................... 379
Chapter 19 Automatic Code Parallelization and Optimization................................................... 381
Nadya T. Bliss, MIT Lincoln Laboratory
19.1 Introduction.......................................................................................................................... 381
19.2 Instruction-Level Parallelism versus Explicit-Program Parallelism.................................... 382
19.3 Automatic Parallelization Approaches: A Taxonomy.......................................................... 384
19.4 Maps and Map Independence.............................................................................................. 385
19.5 Local Optimization in an Automatically Tuned Library..................................................... 386
19.6 Compiler and Language Approach...................................................................................... 388
19.7 Dynamic Code Analysis in a Middleware System.............................................................. 389
19.8 Summary.............................................................................................................................. 391
References....................................................................................................................................... 392

Section V  High Performance Embedded Computing Application Examples
Chapter 20 Radar Applications.................................................................................................... 397
Kenneth Teitelbaum, MIT Lincoln Laboratory
20.1 Introduction.......................................................................................................................... 397
20.2 Basic Radar Concepts.......................................................................................................... 398
20.2.1 Pulse-Doppler Radar Operation............................................................................. 398
20.2.2 Multichannel Pulse-Doppler................................................................................... 399
20.2.3 Adaptive Beamforming..........................................................................................400
20.2.4 Space-Time Adaptive Processing........................................................................... 401
20.3 Mapping Radar Algorithms onto HPEC Architectures.......................................................402



20.3.1 Round-Robin Partitioning......................................................................................403
20.3.2 Functional Pipelining.............................................................................................403
20.3.3 Coarse-Grain Data-Parallel Partitioning................................................................403
20.3.4 Fine-Grain Data-Parallel Partitioning....................................................................404
20.4 Implementation Examples....................................................................................................405
20.4.1 Radar Surveillance Processor................................................................................405
20.4.2 Adaptive Processor (Generation 1).........................................................................406
20.4.3 Adaptive Processor (Generation 2).........................................................................406
20.4.4 KASSPER...............................................................................................................407
20.5 Summary..............................................................................................................................409
References.......................................................................................................................................409
Chapter 21 A Sonar Application.................................................................................................. 411
W. Robert Bernecky, Naval Undersea Warfare Center
21.1 Introduction.......................................................................................................................... 411
21.2 Sonar Problem Description.................................................................................................. 411
21.3 Designing an Embedded Sonar System............................................................................... 412
21.3.1 The Sonar Processing Thread................................................................................ 412
21.3.2 Prototype Development.......................................................................................... 413
21.3.3 Computational Requirements................................................................................. 414
21.3.4 Parallelism.............................................................................................................. 414

21.3.5 Implementing the Real-Time System..................................................................... 415
21.3.6 Verify Real-Time Performance.............................................................................. 415
21.3.7 Verify Correct Output............................................................................................ 415
21.4 An Example Development................................................................................................... 415
21.4.1 System Attributes................................................................................................... 416
21.4.2 Sonar Processing Thread Computational Requirements....................................... 416
21.4.3 Sensor Data Collection........................................................................................... 416
21.4.4 Two-Dimensional Fast Fourier Transform............................................................. 417
21.4.5 Covariance Matrix Formation................................................................................ 418
21.4.6 Covariance Matrix Inversion.................................................................................. 418
21.4.7 Adaptive Beamforming.......................................................................................... 418
21.4.8 Broadband Formation............................................................................................. 419
21.4.9 Normalization......................................................................................................... 420
21.4.10 Detection................................................................................................................ 420
21.4.11 Display Preparation and Operator Controls........................................................... 420
21.4.12 Summary of Computational Requirements............................................................ 421
21.4.13 Parallelism.............................................................................................................. 421
21.5 Hardware Architecture........................................................................................................ 422
21.6 Software Considerations...................................................................................................... 422
21.7 Embedded Sonar Systems of the Future.............................................................................. 423
References....................................................................................................................................... 423
Chapter 22 Communications Applications.................................................................................. 425
Joel I. Goodman and Thomas G. Macdonald, MIT Lincoln Laboratory
22.1 Introduction.......................................................................................................................... 425
22.2 Communications Application Challenges............................................................................ 425
22.3 Communications Signal Processing..................................................................................... 427
22.3.1 Transmitter Signal Processing................................................................................ 427


22.3.2 Transmitter Processing Requirements.................................................................... 431
22.3.3 Receiver Signal Processing.................................................................................... 431
22.3.4 Receiver Processing Requirements........................................................................ 434
22.4 Summary.............................................................................................................................. 435
References....................................................................................................................................... 436
Chapter 23 Development of a Real-Time Electro-Optical Reconnaissance System................... 437
Robert A. Coury, MIT Lincoln Laboratory
23.1 Introduction.......................................................................................................................... 437
23.2 Aerial Surveillance Background.......................................................................................... 437
23.3 Methodology........................................................................................................................ 441
23.3.1 Performance Modeling........................................................................................... 442
23.3.2 Feature Tracking and Optic Flow...........................................................................444
23.3.3 Three-Dimensional Site Model Generation...........................................................446
23.3.4 Challenges..............................................................................................................448
23.3.5 Camera Model........................................................................................................448
23.3.6 Distortion................................................................................................................ 450
23.4 System Design Considerations............................................................................................. 451
23.4.1 Altitude................................................................................................................... 451
23.4.2 Sensor..................................................................................................................... 451
23.4.3 GPS/IMU................................................................................................................ 452
23.4.4 Processing and Storage........................................................................................... 452
23.4.5 Communications..................................................................................................... 453
23.4.6 Cost......................................................................................................................... 453

23.4.7 Test Platform........................................................................................................... 453
23.5 Transition to Target Platform............................................................................................... 455
23.5.1 Payload................................................................................................................... 456
23.5.2 GPS/IMU................................................................................................................ 456
23.5.3 Sensor..................................................................................................................... 456
23.5.4 Processing............................................................................................................... 457
23.5.5 Communications and Storage................................................................................. 458
23.5.6 Altitude................................................................................................................... 459
23.6 Summary.............................................................................................................................. 459
Acknowledgments........................................................................................................................... 459
References....................................................................................................................................... 459

Section VI  Future Trends
Chapter 24 Application and HPEC System Trends..................................................................... 463
David R. Martinez, MIT Lincoln Laboratory
24.1 Introduction.......................................................................................................................... 463
24.1.1 Sensor Node Architecture Trends.......................................................................... 467
24.2 Hardware Trends..................................................................................................................469
24.3 Software Trends................................................................................................................... 473
24.4 Distributed Net-Centric Architecture.................................................................................. 475
24.5 Summary.............................................................................................................................. 478
References....................................................................................................................................... 479


Chapter 25 A Review on Probabilistic CMOS (PCMOS) Technology: From Device Characteristics to Ultra-Low-Energy SOC Architectures........................................ 481
Krishna V. Palem, Lakshmi N. Chakrapani, Bilge E. S. Akgul, and Pinar Korkmaz, Georgia Institute of Technology
25.1 Introduction.......................................................................................................................... 481
25.2 Characterizing the Behavior of a PCMOS Switch............................................................... 483
25.2.1 Inverter Realization of a Probabilistic Switch........................................................ 483
25.2.2 Analytical Model and the Three Laws of a PCMOS Inverter................................ 486
25.2.3 Realizing a Probabilistic Inverter with Limited Available Noise.......................... 489
25.3 Realizing PCMOS-Based Low-Energy Architectures........................................................ 490
25.3.1 Metrics for Evaluating PCMOS-Based Architectures........................................... 490
25.3.2 Experimental Methodology.................................................................................... 491
25.3.3 Metrics for Analysis of PCMOS-Based Implementations..................................... 492
25.3.4 Hyperencryption Application and PCMOS-Based Implementation...................... 493
25.3.5 Results and Analysis............................................................................................... 494
25.3.6 PCMOS-Based Architectures for Error-Tolerant Applications.............................. 495
25.4 Conclusions.......................................................................................................................... 496
References....................................................................................................................................... 497
Chapter 26 Advanced Microprocessor Architectures.................................................................. 499
Janice McMahon and Stephen Crago, University of Southern California, Information Sciences Institute
Donald Yeung, University of Maryland
26.1 Introduction.......................................................................................................................... 499
26.2 Background..........................................................................................................................500
26.2.1 Established Instruction-Level Parallelism Techniques..........................................500
26.2.2 Parallel Architectures............................................................................................. 501
26.3 Motivation for New Architectures.......................................................................................504
26.3.1 Limitations of Conventional Microprocessors.......................................................504

26.4 Current Research Microprocessors...................................................................................... 505
26.4.1 Instruction-Level Parallelism................................................................................. 505
26.4.1.1 Tile-Based Organization.......................................................................506
26.4.1.2 Explicit Parallelism Model...................................................................507
26.4.1.3 Scalable On-Chip Networks.................................................................508
26.4.2 Data-Level Parallelism...........................................................................................509
26.4.2.1 SIMD Architectures.............................................................................509
26.4.2.2 Vector Architectures............................................................................. 511
26.4.2.3 Streaming Architectures....................................................................... 513
26.4.3 Thread-Level Parallelism....................................................................................... 513
26.4.3.1 Multithreading and Granularity............................................................ 514
26.4.3.2 Multilevel Memory............................................................................... 515
26.4.3.3 Speculative Execution........................................................................... 517
26.5 Real-Time Embedded Applications..................................................................................... 518
26.5.1 Scalability............................................................................................................... 518
26.5.2 Input/Output Bandwidth......................................................................................... 519
26.5.3 Programming Models and Algorithm Mapping..................................................... 519
26.6 Summary.............................................................................................................................. 519
References....................................................................................................................................... 520


Glossary of Acronyms and Abbreviations.................................................................................. 523

Index............................................................................................................................................... 531



Preface
Over the past several decades, advances in digital signal processing have permeated many applications, providing unprecedented growth in capabilities. Complex military systems, for example,
evolved from primarily analog processing during the 1960s and 1970s to primarily digital processing in the last decade. MIT Lincoln Laboratory pioneered some of the early applications of digital
signal processing by developing dedicated processing performed in hardware to implement application-specific functions. Through the advent of programmable computing, many of these digital
processing algorithms were implemented in more general-purpose computing while still preserving
compute-intensive functions in dedicated hardware. As a result of the wide range of computing
environments and the growth in the requisite parallel processing, MIT Lincoln Laboratory recognized the need to assemble the embedded community in a yearly national event. In 2006, this
event, the High Performance Embedded Computing (HPEC) Workshop, marked its tenth anniversary of providing a forum for current advances in HPEC. This handbook is an outgrowth of the
many advances made in the last decade and, in several instances, builds on knowledge originally
discussed and presented by the handbook authors at HPEC Workshops. The editors and contributing authors believe it is important to bring together in the form of a handbook the lessons learned
from a decade of advances in high performance embedded computing.
This HPEC handbook is best suited to systems engineers and computational scientists working
in the embedded computing field. The emphasis is on a systems perspective, complemented by
specific implementations starting with analog-to-digital converters, continuing with front-end signal
processing addressing compute-intensive operations, and progressing through back-end processing
requiring intensive parallel and programmable processing. Hardware and software engineers will
also benefit from this handbook since the chapters present their subject areas by starting with fundamental principles and exemplifying those via actual developed systems. The editors together with
the contributing authors bring a wealth of practical experience acquired through working in this
field for a span of several decades. Therefore, the approach taken in each of the chapters is to cover
the respective system components found in today’s HPEC systems by addressing design trade-offs,
implementation options, and techniques of the trade and then solidifying the concepts through specific HPEC system examples. This approach provides a more valuable learning tool since the reader
will learn about the different subject areas by way of factual implementation cases developed in the
course of the editors’ and contributing authors’ work in this exciting field.

Since a complex HPEC system consists of many subsystems and components, this handbook
covers every segment based on a canonical framework. The canonical framework is shown in the
following figure. This framework is used across the handbook as a road map to help the reader navigate logically through the handbook.
The introductory chapters present examples of complex HPEC systems representative of actual
prototype developments. The reader will get an appreciation of the key subsystems and components by first covering these chapters. The handbook then addresses each of the system components
shown in the aforementioned figure. After the introductory chapters, the handbook covers computational characteristics of high performance embedded algorithms and applications to help the reader
understand the key challenges and recommended approaches. The handbook then proceeds with
a thorough description of analog-to-digital converters typically found in today’s HPEC systems.
The discussion continues into front-end implementation approaches followed by back-end parallel
processing techniques. Since the front-end processing is typically very compute-intensive, this part
of the system is best suited for VLSI hardware and/or field programmable gate arrays. Therefore,
these subject areas are addressed in great detail.

[Figure: Canonical framework illustrating key subsystems and components of a high performance embedded computing (HPEC) system. Blocks shown: Application Architecture; HW Module; ADC; SW Module; Computation HW IP; Computation Middleware; Communication HW IP; Communication Middleware; Application-Specific Architecture (ASIC, FPGA, I/O, Memory); Programmable Architecture (Multi-Proc., Uni-Proc., I/O, Memory); Interconnection Architecture (fabric, point-to-point, etc.).]

The handbook continues with several chapters discussing candidate back-end implementation
techniques. The back-end of an HPEC system is often implemented using a parallel set of high
performing programmable chips. Thus, parallel processing technologies are discussed in significant depth. Computing devices, interconnection fabrics, software architectures and metrics, plus
middleware and portable software, are covered at a level that practicing engineers and HPEC computational practitioners can learn and adapt to suit their own implementation requirements. More
and more of the systems implemented today require an open system architecture, which depends on
adopted standards targeted at parallel processing. These standards are also covered in significant
detail, illustrating the benefits of this open architecture trend.
The handbook concludes with several chapters presenting application examples ranging from
electro-optics, sonar surveillance, and communications systems to advanced radar systems. This last
section of the handbook also addresses future trends in high performance embedded computing
and presents advances in microprocessor architectures since these processors are at the heart of any
future HPEC system.
The HPEC handbook, by leveraging the contributors’ many years of experience in embedded
computing, provides readers with the requisite background to effectively work in this field. It may
also serve as a reference for an advanced undergraduate course or a specialized graduate course in
high performance embedded computing.
David R. Martinez
Robert A. Bond
M. Michael Vai




Acknowledgments
This handbook is the product of many hours of dedicated effort by the editors, authors, and production personnel. It has been a very rewarding experience. This book would not have been possible
without the technical contributions from all the authors. Being leading experts in the field of high
performance embedded computing, they bring a wealth of experience not found in any other book
dedicated to this subject area.
We would also like to thank the editors’ employer, MIT Lincoln Laboratory; many of the subjects and fundamental principles discussed in the handbook stemmed from research and development projects performed at the Laboratory in the past several years. The Lincoln Laboratory
management wholeheartedly supported the production of this handbook from its start. We are especially grateful for the valuable support we received during the preparation of the manuscript. In
particular, we would like to thank Mr. David Granchelli and Ms. Dorothy Ryan. Dorothy Ryan
patiently edited every single chapter of this book. David Granchelli coordinated the assembling of
the book. Also, many thanks are due to the graphics artists—Mr. Chet Beals, Mr. Henry Palumbo,
Mr. Art Saarinen, and Mr. Newton Taylor. The graphics work flow was supervised by Mr. John
Austin. Many of the chapters were proofread by Mrs. Barbra Gottschalk. Finally, we would like to
thank the publisher, Taylor & Francis/CRC Press, for working with us in completing this handbook.
The MIT Lincoln Laboratory Communications Office, editorial personnel, graphics artists, and the
publisher are the people who transformed a folder of manuscript files into a complete book.



About the Editors
Mr. David R. Martinez is Head of the Intelligence, Surveillance, and Reconnaissance (ISR) Systems and Technology Division at MIT Lincoln Laboratory.
He oversees more than 300 people and has direct line management responsibility for the division’s programs in the development of advanced techniques and
prototypes for surface surveillance, laser systems, active and passive adaptive
array processing, integrated sensing and decision support, undersea warfare,
and embedded hardware and software computing.
Mr. Martinez joined MIT Lincoln Laboratory in 1988 and was responsible
for the development of a large prototype space-time adaptive signal processor.
Prior to joining the Laboratory, he was Principal Research Engineer at ARCO
Oil and Gas Company, responsible for a multidisciplinary company project to demonstrate the viability of real-time adaptive signal processing techniques. He received the ARCO special achievement award for the planning and execution of the 1986 Cuyama Project, which provided a superior
and cost-effective approach to three-dimensional seismic surveys. He holds three U.S. patents.
Mr. Martinez is the founder, and served from 1997 to 1999 as chairman, of a national workshop on high performance embedded computing. He has also served as keynote speaker at multiple
national-level workshops and symposia including the Tenth Annual High Performance Embedded
Computing Workshop, the Real-Time Systems Symposium, and the Second International Workshop
on Compiler and Architecture Support for Embedded Systems. He served on the Army Science Board from 1999 to 2004. From 1994 to 1998, he was Associate Editor of the IEEE Signal Processing Magazine. He was elected an IEEE Fellow in 2003, and in 2007 he served on the Defense
Science Board ISR Task Force.
Mr. Martinez earned a bachelor’s degree from New Mexico State University in 1976, an M.S.
degree from the Massachusetts Institute of Technology (MIT), and an E.E. degree jointly from MIT
and the Woods Hole Oceanographic Institution in 1979. He completed an M.B.A. at Southern
Methodist University in 1986. He has attended the Program for Senior Executives in National and
International Security at the John F. Kennedy School of Government, Harvard University.
Mr. Robert A. Bond is Leader of the Embedded Digital Systems Group at MIT
Lincoln Laboratory. In his career, he has focused on the research and development of high performance embedded processors, advanced signal processing
technology, and embedded middleware architectures. Prior to coming to the
Laboratory, Mr. Bond worked at CAE Ltd. on radar, navigation, and Kalman
filter applications for flight simulators, and then at Sperry, where he developed
simulation systems for a Naval command and control application.
Mr. Bond joined MIT Lincoln Laboratory in 1987. In his first assignment,
he was responsible for the development of the Mountaintop RSTER radar software architecture and was coordinator for the radar system integration. In the
early 1990s, he was involved in seminal studies to evaluate the use of massively parallel processors
(MPP) for real-time signal and image processing. Later, he managed the development of a 200 billion operations-per-second airborne processor, consisting of a 1000-processor MPP for performing
radar space-time adaptive processing and a custom processor for performing high-throughput radar
signal processing. In 2001, he led a team in the development of the Parallel Vector Library, a novel
middleware technology for the portable and scalable development of high performance parallel
signal processors.

In 2003, Mr. Bond was one of two researchers to receive the Lincoln Laboratory Technical
Excellence Award for his “technical vision and leadership in the application of high-performance
embedded processing architectures to real-time digital signal processing systems.” He earned a B.S.
degree (honors) in physics from Queen’s University, Ontario, Canada, in 1978.
Dr. M. Michael Vai is Assistant Leader of the Embedded Digital Systems
Group at MIT Lincoln Laboratory. He has been involved in the area of high
performance embedded computing for over 20 years. He has worked and published extensively in very-large-scale integration (VLSI), application-specific
integrated circuits (ASICs), field programmable gate arrays (FPGAs), design
methodology, and embedded digital systems. He has published more than 60
technical papers and a textbook (VLSI Design, CRC Press, 2001). His current
research interests include advanced signal processing algorithms and architectures, rapid prototyping methodologies, and anti-tampering techniques.
Until July 1999, Dr. Vai was on the faculty of the Electrical and Computer
Engineering Department, Northeastern University, Boston, Massachusetts. At Northeastern University, he developed and taught the VLSI Design and VLSI Architecture courses. He also established
and supervised a VLSI CAD laboratory. In May 1999, the Electrical and Computer Engineering
students presented him with the Outstanding Professor Award. During his tenure at Northeastern
University, he conducted research programs funded by the National Science Foundation (NSF),
Defense Advanced Research Projects Agency (DARPA), and industry.
After joining MIT Lincoln Laboratory in 1999, Dr. Vai led the development of several notable
real-time signal processing systems incorporating high-density VLSI chips and FPGAs. He coordinated and taught a VLSI Design course at Lincoln Laboratory in 2002, and in April 2003, he
delivered a lecture entitled “ASIC and FPGA DSP Implementations” in the IEEE lecture series,
“Current Topics in Digital Signal Processing.” Dr. Vai earned a B.S. degree from National Taiwan
University, Taipei, Taiwan, in 1979, and M.S. and Ph.D. degrees from Michigan State University,
East Lansing, Michigan, in 1985 and 1987, respectively, all in electrical engineering. He is a senior
member of IEEE.
