ARM System Developer’s Guide
Designing and Optimizing
System Software


About the Authors
Andrew N. Sloss
Andrew Sloss received a B.Sc. in Computer Science from the University of Herefordshire (UK)
in 1992 and was certified as a Chartered Engineer by the British Computer Society (C.Eng, MBCS).
He has worked in the computer industry for over 16 years and has been involved with the ARM
processor since 1987. He has gained extensive experience developing a wide range of applications
running on the ARM processor. He designed the first editing systems for both Chinese and Egyptian
Hieroglyphics executing on the ARM2 and ARM3 processors for Emerald Publishing (UK). Andrew
Sloss has worked at ARM Inc. for over six years. He is currently a Technical Sales Engineer advising
and supporting companies developing new products. He works within the U.S. Sales Organization
and is based in Los Gatos, California.
Dominic Symes
Dominic Symes is currently a software engineer at ARM Ltd. in Cambridge, England, where
he has worked on ARM-based embedded software since 1995. He received his B.A. and D.Phil. in
Mathematics from Oxford University. He first programmed the ARM in 1989 and is particularly
interested in algorithms and optimization techniques. Before joining ARM, he wrote commercial and
public domain ARM software.
Chris Wright
Chris Wright began his embedded systems career in the early 80s at Lockheed Advanced Marine
Systems. While at Advanced Marine Systems he wrote small software control systems for use on
the Intel 8051 family of microcontrollers. He has spent much of his career working at the Lockheed
Palo Alto Research Laboratory and in a software development group at Dow Jones Telerate. Most
recently, Chris Wright spent several years in the Customer Support group at ARM Inc., training and
supporting partner companies developing new ARM-based products. Chris Wright is currently the
Director of Customer Support at Ultimodule Inc. in Sunnyvale, California.


John Rayfield
John Rayfield, an independent consultant, was formerly Vice President of Marketing, U.S., at
ARM. In this role he was responsible for setting ARM’s strategic marketing direction in the U.S.,
and identifying opportunities for new technologies to serve key market segments. John joined ARM
in 1996 and held various roles within the company, including Director of Technical Marketing and
R&D, which were focused around new product/technology development. Before joining ARM, John
held several engineering and management roles in the field of digital signal processing, software,
hardware, ASIC and system design. John holds an M.Sc. in Signal Processing from the University of
Surrey (UK) and a B.Sc.Hons. in Electronic Engineering from Brunel University (UK).


ARM System
Developer’s Guide
Designing and Optimizing
System Software

Andrew N. Sloss
Dominic Symes
Chris Wright
With a contribution by John Rayfield

AMSTERDAM • BOSTON • HEIDELBERG • LONDON
NEW YORK • OXFORD • PARIS • SAN DIEGO
SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO
Morgan Kaufmann is an imprint of Elsevier


Senior Editor: Denise E.M. Penrose
Publishing Services Manager: Simon Crump
Project Manager: Sarah M. Hajduk
Developmental Editor: Belinda Breyer
Editorial Assistant: Summer Block
Cover Design: Dick Hannus
Cover Image: Red Wing No.6 by Charles Biederman. Collection Walker Art Center, Minneapolis. Gift of the artist through the Ford Foundation Purchase Program, 1964
Technical Illustration: Dartmouth Publishing
Composition: Cepha Imaging, Ltd.
Copyeditor: Ken Dellapenta
Proofreader: Jan Cocker
Indexer: Ferreira Indexing
Interior printer: The Maple-Vail Book Manufacturing Group
Cover printer: Phoenix Color


Morgan Kaufmann Publishers is an imprint of Elsevier.
500 Sansome Street, Suite 400, San Francisco, CA 94111
This book is printed on acid-free paper.
© 2004 by Elsevier Inc. All rights reserved.
The programs, examples, and applications presented in this book and on the publisher’s Web site have been included for their instructional
value. The publisher and the authors offer no warranty implied or express, including but not limited to implied warranties of fitness or merchantability
for any particular purpose and do not accept any liability for any loss or damage arising from the use of any information in this book, or any error or
omission in such information, or any incorrect use of these programs, procedures, and applications.
Designations used by companies to distinguish their products are often claimed as trademarks or registered trademarks. In all instances in
which Morgan Kaufmann Publishers is aware of a claim, the product names appear in initial capital or all capital letters. Readers, however, should
contact the appropriate companies for more complete information regarding trademarks and registration.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical,
photocopying, scanning, or otherwise—without prior written permission of the publisher.
Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone: (+44) 1865 843830, fax:
(+44) 1865 853333, e-mail: You may also complete your request on-line via the Elsevier homepage () by
selecting “Customer Support” and then “Obtaining Permissions.”
Library of Congress Cataloging-in-Publication Data
Sloss, Andrew N.
ARM system developer’s guide: designing and optimizing system software/Andrew N.
Sloss, Dominic Symes, Chris Wright.
p. cm.
Includes bibliographical references and index.
ISBN 1-55860-874-5 (alk. paper)
1. Computer software–Development. 2. RISC microprocessors. 3. Computer
architecture. I. Symes, Dominic. II. Wright, Chris, 1953- III. Title.
QA76.76.D47S565 2004
005.1–dc22
2004040366
ISBN: 1-55860-874-5
For information on all Morgan Kaufmann publications,
visit our Web site at www.mkp.com.
Printed in the United States of America
08 07 06 05 04
5 4 3 2 1


Contents

About the Authors
Preface

Chapter 1 ARM Embedded Systems
1.1 The RISC Design Philosophy
1.2 The ARM Design Philosophy
1.3 Embedded System Hardware
1.4 Embedded System Software
1.5 Summary

Chapter 2 ARM Processor Fundamentals
2.1 Registers
2.2 Current Program Status Register
2.3 Pipeline
2.4 Exceptions, Interrupts, and the Vector Table
2.5 Core Extensions
2.6 Architecture Revisions
2.7 ARM Processor Families
2.8 Summary

Chapter 3 Introduction to the ARM Instruction Set
3.1 Data Processing Instructions
3.2 Branch Instructions
3.3 Load-Store Instructions
3.4 Software Interrupt Instruction
3.5 Program Status Register Instructions
3.6 Loading Constants
3.7 ARMv5E Extensions
3.8 Conditional Execution
3.9 Summary

Chapter 4 Introduction to the Thumb Instruction Set
4.1 Thumb Register Usage
4.2 ARM-Thumb Interworking
4.3 Other Branch Instructions
4.4 Data Processing Instructions
4.5 Single-Register Load-Store Instructions
4.6 Multiple-Register Load-Store Instructions
4.7 Stack Instructions
4.8 Software Interrupt Instruction
4.9 Summary

Chapter 5 Efficient C Programming
5.1 Overview of C Compilers and Optimization
5.2 Basic C Data Types
5.3 C Looping Structures
5.4 Register Allocation
5.5 Function Calls
5.6 Pointer Aliasing
5.7 Structure Arrangement
5.8 Bit-fields
5.9 Unaligned Data and Endianness
5.10 Division
5.11 Floating Point
5.12 Inline Functions and Inline Assembly
5.13 Portability Issues
5.14 Summary

Chapter 6 Writing and Optimizing ARM Assembly Code
6.1 Writing Assembly Code
6.2 Profiling and Cycle Counting
6.3 Instruction Scheduling
6.4 Register Allocation
6.5 Conditional Execution
6.6 Looping Constructs
6.7 Bit Manipulation
6.8 Efficient Switches
6.9 Handling Unaligned Data
6.10 Summary

Chapter 7 Optimized Primitives
7.1 Double-Precision Integer Multiplication
7.2 Integer Normalization and Count Leading Zeros
7.3 Division
7.4 Square Roots
7.5 Transcendental Functions: log, exp, sin, cos
7.6 Endian Reversal and Bit Operations
7.7 Saturated and Rounded Arithmetic
7.8 Random Number Generation
7.9 Summary

Chapter 8 Digital Signal Processing
8.1 Representing a Digital Signal
8.2 Introduction to DSP on the ARM
8.3 FIR Filters
8.4 IIR Filters
8.5 The Discrete Fourier Transform
8.6 Summary

Chapter 9 Exception and Interrupt Handling
9.1 Exception Handling
9.2 Interrupts
9.3 Interrupt Handling Schemes
9.4 Summary

Chapter 10 Firmware
10.1 Firmware and Bootloader
10.2 Example: Sandstone
10.3 Summary

Chapter 11 Embedded Operating Systems
11.1 Fundamental Components
11.2 Example: Simple Little Operating System
11.3 Summary

Chapter 12 Caches
12.1 The Memory Hierarchy and Cache Memory
12.2 Cache Architecture
12.3 Cache Policy
12.4 Coprocessor 15 and Caches
12.5 Flushing and Cleaning Cache Memory
12.6 Cache Lockdown
12.7 Caches and Software Performance
12.8 Summary

Chapter 13 Memory Protection Units
13.1 Protected Regions
13.2 Initializing the MPU, Caches, and Write Buffer
13.3 Demonstration of an MPU System
13.4 Summary

Chapter 14 Memory Management Units
14.1 Moving from an MPU to an MMU
14.2 How Virtual Memory Works
14.3 Details of the ARM MMU
14.4 Page Tables
14.5 The Translation Lookaside Buffer
14.6 Domains and Memory Access Permission
14.7 The Caches and Write Buffer
14.8 Coprocessor 15 and MMU Configuration
14.9 The Fast Context Switch Extension
14.10 Demonstration: A Small Virtual Memory System
14.11 The Demonstration as mmuSLOS
14.12 Summary

Chapter 15 The Future of the Architecture, by John Rayfield
15.1 Advanced DSP and SIMD Support in ARMv6
15.2 System and Multiprocessor Support Additions to ARMv6
15.3 ARMv6 Implementations
15.4 Future Technologies beyond ARMv6
15.5 Summary

Appendix A ARM and Thumb Assembler Instructions
A.1 Using This Appendix
A.2 Syntax
A.3 Alphabetical List of ARM and Thumb Instructions
A.4 ARM Assembler Quick Reference
A.5 GNU Assembler Quick Reference

Appendix B ARM and Thumb Instruction Encodings
B.1 ARM Instruction Set Encodings
B.2 Thumb Instruction Set Encodings
B.3 Program Status Registers

Appendix C Processors and Architecture
C.1 ARM Naming Convention
C.2 Core and Architectures

Appendix D Instruction Cycle Timings
D.1 Using the Instruction Cycle Timing Tables
D.2 ARM7TDMI Instruction Cycle Timings
D.3 ARM9TDMI Instruction Cycle Timings
D.4 StrongARM1 Instruction Cycle Timings
D.5 ARM9E Instruction Cycle Timings
D.6 ARM10E Instruction Cycle Timings
D.7 Intel XScale Instruction Cycle Timings
D.8 ARM11 Cycle Timings

Appendix E Suggested Reading
E.1 ARM References
E.2 Algorithm References
E.3 Memory Management and Cache Architecture (Hardware Overview and Reference)
E.4 Operating System References

Index

Preface

Increasingly, embedded systems developers and system-on-chip designers select specific
microprocessor cores and a family of tools, libraries, and off-the-shelf components to
quickly develop new microprocessor-based products. A major player in this industry is
ARM. Over the last 10 years, the ARM architecture has become the most pervasive 32-bit
architecture in the world, with more than 2 billion ARM-based processors shipped at the
time of this writing. ARM processors are embedded in products ranging from cell/mobile
phones to automotive braking systems. A worldwide community of ARM partners and
third-party vendors has developed among semiconductor and product design companies,
including hardware engineers, system designers, and software developers. To date, no book
has directly addressed their need to develop the system and software for an ARM-based
embedded design. This text fills that gap.
Our goal has been to describe the operation of the ARM core from a product developer’s
perspective with a clear emphasis on software. Because we have written this book specifically
for engineers who are experienced with embedded systems development but who may be
unfamiliar with the ARM architecture, we have assumed no previous ARM experience.
To help our readers become productive as quickly as possible, we have included a suite
of ARM software examples that can be integrated into commercial products or used as
templates for the quick creation of productive software. The examples are numbered so
that readers can easily locate the source code on the publisher’s Web site. The examples are
also valuable to people with ARM design experience who want to make the most efficient
use of an ARM-based embedded system.

Organization of the Book
The book begins by briefly noting the ARM processor design philosophy and discussing how
and why it differs from the traditional RISC philosophy. The first chapter also introduces a
simple embedded system based on the ARM processor.
Chapter 2 digs more deeply into the hardware, focusing on the ARM processor core and
presenting an overview of the ARM cores currently in the marketplace.
The ARM and Thumb instruction sets are the focus of Chapters 3 and 4, respectively,
and form the fundamental basis for the rest of the book. Explanations of key instructions
include complete examples, so these chapters also serve as a tutorial on the instruction sets.
Chapters 5 and 6 demonstrate how to write efficient code with scores of examples that we
have developed while working with ARM customers. Chapter 5 teaches proven techniques
and rules for writing C code that will compile efficiently on the ARM architecture, and it
helps determine which code should be optimized. Chapter 6 details best practices for writing
and optimizing ARM assembly code—critical for improving performance by reducing
system power consumption and clock speed.
Because primitives are basic operations used in a wide range of algorithms, it’s worthwhile to learn how they can be optimized. Chapter 7 discusses how to optimize primitives
for specific ARM processors. It presents optimized reference implementations of common primitives as well as of more complicated mathematical operations for those who
wish to take a quick reference approach. We have also included the theory behind each
implementation for those who wish to dig deeper.
Audio and video embedded systems applications are increasingly in demand. They
require digital signal processing (DSP) capability that until recently would have been provided by a separate DSP processor. Now, however, the ARM architecture offers higher
memory bandwidths and faster multiply accumulate operations, permitting a single ARM
core design to support these applications. Chapter 8 examines how to maximize the performance of the ARM for digital processing applications and how to implement DSP
algorithms.
At the heart of an embedded system lie the exception handlers. Efficient handlers
can dramatically improve system performance. Chapter 9 covers the theory and practice of handling exceptions and interrupts on the ARM processor through a set of detailed
examples.
Firmware, an important part of any embedded system, is described in Chapter 10 by
means of a simple firmware package we designed, called Sandstone. The chapter also reviews
popular industry firmware packages that are available for the ARM.
Chapter 11 demonstrates the implementation of embedded operating systems through
an example operating system we designed, called Simple Little Operating System.
Chapters 12, 13, and 14 focus on memory issues. Chapter 12 examines the various
cache technologies that surround the ARM cores, demonstrating routines for controlling
the cache on specific cache-enabled ARM processors. Chapter 13 discusses the memory
protection unit, and Chapter 14 discusses the memory management unit.
Finally, in Chapter 15, we consider the future of the ARM architecture, highlighting
new directions in the instruction set and new technologies that ARM is implementing in
the next few years.
The appendices provide detailed references on the instruction sets, cycle timing, and
specific ARM products.

Examples on the Web
As we noted earlier, we have created an extensive set of tested practical examples to
reinforce concepts and methods. These are available on the publisher’s Web site at
www.mkp.com/companions/1558608745.



Acknowledgments
First, of course, are our wives—Shau Chin Symes and Yulian Yang—and families who have
been very supportive and have put up with us spending a large proportion of our home
time on this project.
This book has taken many years to complete, and many people have contributed with
encouragement and technical advice. We would like to personally thank all the people
involved. Writing a technical book involves a lot of painstaking attention to detail, so a big
thank you to all the reviewers who spent time and effort reading and providing feedback—a
difficult activity that requires a special skill. Reviewers who worked with the publisher during
the developmental process were Jim Turley (Silicon-Insider), Peter Maloy (CodeSprite),
Chris Larsen, Peter Harrod (ARM, Ltd.), Gary Thomas (MLB Associates), Wayne Wolf
(Princeton University), Scott Runner (Qualcomm, Inc.), Niall Murphy (PanelSoft), and
Dominic Sweetman (Algorithmics, Ltd.).
A special thanks to Wilco Dijkstra, Edward Nevill, and David Seal for allowing us to
include selected examples within the book. Thanks also to Rod Crawford, Andrew Cummins, Dave Flynn, Jamie Smith, William Rees, and Anne Rooney for helping throughout
with advice. Thanks to the ARM Strategic Support Group—Howard Ho, John Archibald,
Miguel Echavarria, Robert Allen, and Ian Field—for reading and providing quick local
feedback.
We would like to thank John Rayfield for initiating this project and contributing
Chapter 15. We would also like to thank David Brash for reviewing the manuscript and
allowing us to include ARMv6 material in this book.
Lastly, we wish to thank Morgan Kaufmann Publishers, especially Denise Penrose and
Belinda Breyer for their patience and advice throughout the project.




1.1 The RISC Design Philosophy
1.2 The ARM Design Philosophy
1.2.1 Instruction Set for Embedded Systems

1.3 Embedded System Hardware
1.3.1 ARM Bus Technology
1.3.2 AMBA Bus Protocol
1.3.3 Memory
1.3.4 Peripherals

1.4 Embedded System Software
1.4.1 Initialization (Boot) Code
1.4.2 Operating System
1.4.3 Applications

1.5 Summary


Chapter 1
ARM Embedded Systems

The ARM processor core is a key component of many successful 32-bit embedded systems.
You probably own one yourself and may not even realize it! ARM cores are widely used in
mobile phones, handheld organizers, and a multitude of other everyday portable consumer
devices.
ARM’s designers have come a long way from the first ARM1 prototype in 1985. Over
one billion ARM processors had been shipped worldwide by the end of 2001. The ARM
company bases its success on a simple and powerful original design, which continues
to improve today through constant technical innovation. In fact, the ARM core is not
a single core, but a whole family of designs sharing similar design principles and a common
instruction set.
For example, one of ARM’s most successful cores is the ARM7TDMI. It provides up to
120 Dhrystone MIPS¹ and is known for its high code density and low power consumption,
making it ideal for mobile embedded devices.
In this first chapter we discuss how the RISC (reduced instruction set computer) design
philosophy was adapted by ARM to create a flexible embedded processor. We then introduce
an example embedded device and discuss the typical hardware and software technologies
that surround an ARM processor.

1. Dhrystone MIPS version 2.1 is a small benchmarking program.

1.1 The RISC Design Philosophy
The ARM core uses a RISC architecture. RISC is a design philosophy aimed at delivering
simple but powerful instructions that execute within a single cycle at a high clock speed.
The RISC philosophy concentrates on reducing the complexity of instructions performed
by the hardware because it is easier to provide greater flexibility and intelligence in software
rather than hardware. As a result, a RISC design places greater demands on the compiler.
In contrast, the traditional complex instruction set computer (CISC) relies more on the
hardware for instruction functionality, and consequently the CISC instructions are more
complicated. Figure 1.1 illustrates these major differences.
The RISC philosophy is implemented with four major design rules:
1. Instructions—RISC processors have a reduced number of instruction classes. These
classes provide simple operations that can each execute in a single cycle. The compiler
or programmer synthesizes complicated operations (for example, a divide operation)
by combining several simple instructions. Each instruction is a fixed length to allow
the pipeline to fetch future instructions before decoding the current instruction. In
contrast, in CISC processors the instructions are often of variable size and take many
cycles to execute.

2. Pipelines—The processing of instructions is broken down into smaller units that can be
executed in parallel by pipelines. Ideally the pipeline advances by one step on each cycle
for maximum throughput. Instructions can be decoded in one pipeline stage. There is
no need for an instruction to be executed by a miniprogram called microcode as on
CISC processors.
3. Registers—RISC machines have a large general-purpose register set. Any register can
contain either data or an address. Registers act as the fast local memory store for all data
processing operations. In contrast, CISC processors have dedicated registers for specific
purposes.

Figure 1.1 CISC vs. RISC. CISC emphasizes hardware complexity. RISC emphasizes compiler complexity.

4. Load-store architecture—The processor operates on data held in registers. Separate load
and store instructions transfer data between the register bank and external memory.
Memory accesses are costly, so separating memory accesses from data processing provides an advantage because you can use data items held in the register bank multiple
times without needing multiple memory accesses. In contrast, with a CISC design the
data processing operations can act on memory directly.
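Rule 1 above mentions that a divide operation can be synthesized from several simple instructions. The following is a minimal sketch of that idea in C, using the classic shift-and-subtract method; the function name `udiv32` and its interface are illustrative assumptions, not code from this book, and a production routine would normally be tuned for a specific core.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Unsigned 32-bit division built only from shifts, compares, and subtracts.
 * Assumption: the caller guarantees divisor != 0.
 * The remainder is written to *remainder when that pointer is not NULL.
 */
uint32_t udiv32(uint32_t dividend, uint32_t divisor, uint32_t *remainder)
{
    uint32_t quotient = 0;
    uint32_t partial = 0;   /* running remainder, always < divisor */

    for (int bit = 31; bit >= 0; bit--) {
        /* Bring down the next bit of the dividend. */
        partial = (partial << 1) | ((dividend >> bit) & 1u);

        /* Subtract the divisor when it fits and record a 1 in the quotient. */
        if (partial >= divisor) {
            partial -= divisor;
            quotient |= UINT32_C(1) << bit;
        }
    }

    if (remainder != NULL) {
        *remainder = partial;
    }
    return quotient;
}
```

Each loop iteration uses only operations that a simple RISC instruction set provides directly, which is why a compiler or library can supply division even when the hardware has no divide instruction.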
These design rules allow a RISC processor to be simpler, and thus the core can operate
at higher clock frequencies. In contrast, traditional CISC processors are more complex
and operate at lower clock frequencies. Over the course of two decades, however, the
distinction between RISC and CISC has blurred as CISC processors have implemented
more RISC concepts.
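The load-store rule has a visible effect on how C code is best written. The sketch below contrasts two hypothetical functions (the names are assumptions made for illustration): one updates its result through a pointer on every pass, the other keeps the running total in a local variable that a compiler can hold in a register. Whether a given compiler produces different code for the two depends on the compiler and its optimization settings.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * The running total lives in memory. Because total might point into data,
 * the compiler may have to load and store *total on every iteration.
 */
void sum_through_memory(const int32_t *data, size_t count, int32_t *total)
{
    *total = 0;
    for (size_t i = 0; i < count; i++) {
        *total += data[i];
    }
}

/*
 * The running total is a local variable, so it can stay in a register.
 * Only one store to memory is needed, after the loop finishes.
 */
void sum_in_register(const int32_t *data, size_t count, int32_t *total)
{
    int32_t acc = 0;
    for (size_t i = 0; i < count; i++) {
        acc += data[i];
    }
    *total = acc;
}
```

Both functions compute the same result when `total` does not alias `data`; the second simply makes it easy for the compiler to reuse a value held in the register bank instead of going back to memory.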

1.2 The ARM Design Philosophy
There are a number of physical features that have driven the ARM processor design. First,
portable embedded systems require some form of battery power. The ARM processor has
been specifically designed to be small to reduce power consumption and extend battery
operation—essential for applications such as mobile phones and personal digital assistants
(PDAs).
High code density is another major requirement since embedded systems have limited memory due to cost and/or physical size restrictions. High code density is useful for
applications that have limited on-board memory, such as mobile phones and mass storage
devices.

In addition, embedded systems are price sensitive and use slow and low-cost memory
devices. For high-volume applications like digital cameras, every cent has to be accounted
for in the design. The ability to use low-cost memory devices produces substantial savings.
Another important requirement is to reduce the area of the die taken up by the embedded
processor. For a single-chip solution, the smaller the area used by the embedded processor,
the more available space for specialized peripherals. This in turn reduces the cost of the
design and manufacturing since fewer discrete chips are required for the end product.
ARM has incorporated hardware debug technology within the processor so that software
engineers can view what is happening while the processor is executing code. With greater
visibility, software engineers can resolve issues faster, which has a direct effect on the time
to market and reduces overall development costs.
The ARM core is not a pure RISC architecture because of the constraints of its primary
application—the embedded system. In some sense, the strength of the ARM core is that
it does not take the RISC concept too far. In today’s systems the key is not raw processor
speed but total effective system performance and power consumption.



1.2.1 Instruction Set for Embedded Systems
The ARM instruction set differs from the pure RISC definition in several ways that make
the ARM instruction set suitable for embedded applications:


Variable cycle execution for certain instructions—Not every ARM instruction executes
in a single cycle. For example, load-store-multiple instructions vary in the number
of execution cycles depending upon the number of registers being transferred. The
transfer can occur on sequential memory addresses, which increases performance since
sequential memory accesses are often faster than random accesses. Code density is also
improved since multiple register transfers are common operations at the start and end
of functions.



Inline barrel shifter leading to more complex instructions—The inline barrel shifter is
a hardware component that preprocesses one of the input registers before it is used
by an instruction. This expands the capability of many instructions to improve core
performance and code density. We explain this feature in more detail in Chapters 2, 3,
and 4.



Thumb 16-bit instruction set—ARM enhanced the processor core by adding a second
16-bit instruction set called Thumb that permits the ARM core to execute either
16- or 32-bit instructions. The 16-bit instructions improve code density by about
30% over 32-bit fixed-length instructions.



Conditional execution—An instruction is only executed when a specific condition has
been satisfied. This feature improves performance and code density by reducing branch
instructions.



Enhanced instructions—The enhanced digital signal processor (DSP) instructions were
added to the standard ARM instruction set to support fast 16×16-bit multiplier operations and saturation. These instructions allow a faster-performing ARM processor in
some cases to replace the traditional combinations of a processor plus a DSP.

These additional features have made the ARM processor one of the most commonly
used 32-bit embedded processor cores. Many of the top semiconductor companies around
the world produce products based around the ARM processor.
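Three of the features listed above can be illustrated with small C functions. This is a hedged sketch: the function names are assumptions, and the instructions named in the comments are what an ARM compiler could plausibly emit, not a guarantee of what any particular compiler will emit.

```c
#include <stdint.h>

/*
 * Barrel shifter: the shift of index can be folded into the add, for
 * example as a single instruction of the form ADD r0, r0, r1, LSL #2.
 */
uint32_t word_address(uint32_t base, uint32_t index)
{
    return base + (index << 2);
}

/*
 * Conditional execution: the two subtractions can be issued as
 * conditionally executed instructions (such as SUBGT and SUBLT), so the
 * loop body needs no branch around either one.
 * Assumption: both arguments are greater than zero.
 */
uint32_t gcd(uint32_t a, uint32_t b)
{
    while (a != b) {
        if (a > b) {
            a -= b;
        } else {
            b -= a;
        }
    }
    return a;
}

/*
 * Saturation: the result is clamped to the 32-bit signed range instead of
 * wrapping. Cores with the enhanced DSP instructions can do this with a
 * single saturating add (QADD); this portable version uses a wider type.
 */
int32_t saturating_add(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) {
        return INT32_MAX;
    }
    if (sum < INT32_MIN) {
        return INT32_MIN;
    }
    return (int32_t)sum;
}
```

In each case the C source stays portable, while the ARM-specific feature lets the same work be done in fewer instructions.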

1.3 Embedded System Hardware
Embedded systems can control many different devices, from small sensors found on
a production line, to the real-time control systems used on a NASA space probe. All
these devices use a combination of software and hardware components. Each component
is chosen for efficiency and, if applicable, is designed for future extension and expansion.


Figure 1.2 An example of an ARM-based embedded device, a microcontroller.
(Diagram labels: ARM processor; memory controller; interrupt controller; AHB arbiter; AHB-external bridge; AHB-APB bridge; external bus; ROM, SRAM, FLASHROM, and DRAM; Ethernet and Ethernet physical driver; real-time clock; counter/timers; console; serial UARTs. Legend: ARM, controllers, peripherals, bus.)


Figure 1.2 shows a typical embedded device based on an ARM core. Each box represents
a feature or function. The lines connecting the boxes are the buses carrying data. We can
separate the device into four main hardware components:


The ARM processor controls the embedded device. Different versions of the ARM processor are available to suit the desired operating characteristics. An ARM processor
comprises a core (the execution engine that processes instructions and manipulates
data) plus the surrounding components that interface it with a bus. These components
can include memory management and caches.



Controllers coordinate important functional blocks of the system. Two commonly
found controllers are interrupt and memory controllers.



The peripherals provide all the input-output capability external to the chip and are
responsible for the uniqueness of the embedded device.



A bus is used to communicate between different parts of the device.



1.3.1 ARM Bus Technology
Embedded systems use different bus technologies than those designed for x86 PCs. The most
common PC bus technology, the Peripheral Component Interconnect (PCI) bus, connects
such devices as video cards and hard disk controllers to the x86 processor bus. This type
of technology is external or off-chip (i.e., the bus is designed to connect mechanically and
electrically to devices external to the chip) and is built into the motherboard of a PC.
In contrast, embedded devices use an on-chip bus that is internal to the chip and that
allows different peripheral devices to be interconnected with an ARM core.
There are two different classes of devices attached to the bus. The ARM processor core is
a bus master—a logical device capable of initiating a data transfer with another device across
the same bus. Peripherals tend to be bus slaves—logical devices capable only of responding
to a transfer request from a bus master device.
A bus has two architecture levels. The first is a physical level that covers the electrical
characteristics and bus width (16, 32, or 64 bits). The second level deals with protocol—the
logical rules that govern the communication between the processor and a peripheral.
ARM is primarily a design company. It seldom implements the electrical characteristics
of the bus, but it routinely specifies the bus protocol.
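The master and slave roles can be made concrete with a small software model. The sketch below is not AMBA signalling and is not taken from this book; every type, function name, and address in it is a hypothetical chosen for illustration. It shows only the division of responsibility: the master initiates a transfer, and the slave can do nothing except respond to one.

```c
#include <stdbool.h>
#include <stdint.h>

#define SLAVE_REG_COUNT 4u   /* hypothetical number of 32-bit registers */

/* A bus slave owns some registers and answers to one address range. */
typedef struct {
    uint32_t base;                    /* first address the slave decodes */
    uint32_t regs[SLAVE_REG_COUNT];
} bus_slave;

/* One transfer request, as issued by a bus master. */
typedef struct {
    uint32_t addr;
    uint32_t data;     /* value to write, or value returned by a read */
    bool     is_write;
} bus_transfer;

/* Slave side: respond to a request. Returns false if it is not ours. */
bool slave_respond(bus_slave *slave, bus_transfer *transfer)
{
    uint32_t offset = transfer->addr - slave->base;

    /* Reject addresses outside the range and accesses that are not
       word aligned. An address below base wraps to a large offset. */
    if (offset >= SLAVE_REG_COUNT * 4u || (offset & 3u) != 0u) {
        return false;
    }
    if (transfer->is_write) {
        slave->regs[offset / 4u] = transfer->data;
    } else {
        transfer->data = slave->regs[offset / 4u];
    }
    return true;
}

/* Master side: only the master starts a transfer. */
bool master_write(bus_slave *slave, uint32_t addr, uint32_t value)
{
    bus_transfer transfer = { addr, value, true };
    return slave_respond(slave, &transfer);
}

bool master_read(bus_slave *slave, uint32_t addr, uint32_t *value)
{
    bus_transfer transfer = { addr, 0u, false };
    if (!slave_respond(slave, &transfer)) {
        return false;
    }
    *value = transfer.data;
    return true;
}
```

The rules encoded in `slave_respond` (which addresses are valid, what a read returns) play the part of the protocol level described above; the physical level has no counterpart in a software model.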

1.3.2 AMBA Bus Protocol
The Advanced Microcontroller Bus Architecture (AMBA) was introduced in 1996 and has
been widely adopted as the on-chip bus architecture used for ARM processors. The first
AMBA buses introduced were the ARM System Bus (ASB) and the ARM Peripheral Bus
(APB). Later ARM introduced another bus design, called the ARM High Performance Bus
(AHB). Using AMBA, peripheral designers can reuse the same design on multiple projects.
Because there are a large number of peripherals developed with an AMBA interface, hardware designers have a wide choice of tested and proven peripherals for use in a device.
A peripheral can simply be bolted onto the on-chip bus without having to redesign an interface for each different processor architecture. This plug-and-play interface for hardware
developers improves availability and time to market.
AHB provides higher data throughput than ASB because it is based on a centralized
multiplexed bus scheme rather than the ASB bidirectional bus design. This change allows
the AHB bus to run at higher clock speeds and to be the first ARM bus to support widths
of 64 and 128 bits. ARM has introduced two variations on the AHB bus: Multi-layer AHB
and AHB-Lite. In contrast to the original AHB, which allows a single bus master to be
active on the bus at any time, the Multi-layer AHB bus allows multiple active bus masters.
AHB-Lite is a subset of the AHB bus and it is limited to a single bus master. This bus was
developed for designs that do not require the full features of the standard AHB bus.
AHB and Multi-layer AHB support the same protocol for master and slave but have
different interconnects. The new interconnects in Multi-layer AHB are good for systems
with multiple processors. They permit operations to occur in parallel and allow for higher
throughput rates.



The example device shown in Figure 1.2 has three buses: an AHB bus for the high-performance peripherals, an APB bus for the slower peripherals, and a third bus for external
peripherals, proprietary to this device. This external bus requires a specialized bridge to
connect with the AHB bus.

1.3.3 Memory
An embedded system has to have some form of memory to store and execute code. You
have to compare price, performance, and power consumption when deciding upon specific
memory characteristics, such as hierarchy, width, and type. If memory has to run twice as
fast to maintain a desired bandwidth, then the memory power requirement may be higher.

1.3.3.1 Hierarchy

All computer systems have memory arranged in some form of hierarchy. Figure 1.2 shows
a device that supports external off-chip memory. Internal to the processor there is an option
of a cache (not shown in Figure 1.2) to improve memory performance.
Figure 1.3 shows the memory trade-offs: the fastest memory cache is physically located
nearer the ARM processor core and the slowest secondary memory is set further away.
Generally the closer memory is to the processor core, the more it costs and the smaller its
capacity.

Figure 1.3 Storage trade-offs. [Plot of performance/costs against memory size, with the size axis marked at 1 MB and 1 GB, showing cache, main memory, and secondary storage.]

The cache is placed between main memory and the core. It is used to speed up data
transfer between the processor and main memory. A cache provides an overall increase in
performance but with a loss of predictable execution time. Although the cache increases the
general performance of the system, it does not help real-time system response. Note that
many small embedded systems do not require the performance benefits of a cache.
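The trade-off between average performance and predictability can be made concrete with a rough calculation. The sketch below assumes a deliberately simple model that is not taken from the text: a cache hit costs a fixed number of cycles, a miss costs a fixed larger number, and the hit rate is known in advance. The function names and any cycle counts you pass in are hypothetical.

```c
/* Average access time in cycles under a simple two-level model
 * (an assumption for illustration): a fraction hit_rate of accesses
 * costs hit_cycles and the remainder costs miss_cycles. */
static double average_access_cycles(double hit_rate,
                                    double hit_cycles,
                                    double miss_cycles)
{
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles;
}

/* Bound a real-time design has to budget for under the same model:
 * every access is assumed to miss, so the cache gives no benefit. */
static double worst_case_access_cycles(double miss_cycles)
{
    return miss_cycles;
}
```

With a hypothetical 1-cycle hit, 10-cycle miss, and 90% hit rate, the average works out to about 1.9 cycles while the worst case stays at 10 cycles. The average improves with the cache, but the figure a real-time deadline depends on does not.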
The main memory is large—around 256 KB to 256 MB (or even greater), depending on
the application—and is generally stored in separate chips. Load and store instructions access
the main memory unless the values have been stored in the cache for fast access. Secondary
storage is the largest and slowest form of memory. Hard disk drives and CD-ROM drives
are examples of secondary storage. These days secondary storage may vary from 600 MB
to 60 GB.

1.3.3.2 Width

The memory width is the number of bits the memory returns on each access—typically
8, 16, 32, or 64 bits. The memory width has a direct effect on the overall performance and
cost ratio.
If you have an uncached system using 32-bit ARM instructions and 16-bit-wide memory
chips, then the processor will have to make two memory fetches per instruction, because each
32-bit instruction fetch requires two 16-bit loads. This reduces system performance,
but the benefit is that 16-bit memory is less expensive.
In contrast, if the core executes 16-bit Thumb instructions, it will achieve better
performance with a 16-bit memory. The higher performance is a result of the core making
only a single fetch to memory to load an instruction. Hence, using Thumb instructions
with 16-bit-wide memory devices provides both improved performance and reduced cost.
Table 1.1 summarizes theoretical cycle times on an ARM processor using different
memory width devices.
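The theoretical figures follow from a simple division, which the sketch below works through. It assumes an idealized uncached system in which every memory access takes one cycle and returns exactly one memory-width unit of data. The function name is hypothetical.

```c
/* Number of memory accesses (and, under the one-access-per-cycle
 * assumption, cycles) needed to fetch a single instruction.
 * Returns 0 for an invalid zero-width memory. */
static unsigned fetch_cycles(unsigned instr_bits, unsigned mem_width_bits)
{
    if (mem_width_bits == 0) {
        return 0;
    }
    if (mem_width_bits >= instr_bits) {
        return 1; /* the whole instruction arrives in one access */
    }
    /* Round up: a partial unit still needs a full access. */
    return (instr_bits + mem_width_bits - 1) / mem_width_bits;
}
```

For a 32-bit ARM instruction this gives 4, 2, and 1 cycles on 8-, 16-, and 32-bit memory, and for a 16-bit Thumb instruction it gives 2, 1, and 1 cycles.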

1.3.3.3 Types


There are many different types of memory. In this section we describe some of the more
popular memory devices found in ARM-based embedded systems.
Read-only memory (ROM) is the least flexible of all memory types because it contains an
image that is permanently set at production time and cannot be reprogrammed. ROMs are
used in high-volume devices that require no updates or corrections. Many devices also use
a ROM to hold boot code.

Table 1.1 Fetching instructions from memory.

Instruction size    8-bit memory    16-bit memory    32-bit memory
ARM 32-bit          4 cycles        2 cycles         1 cycle
Thumb 16-bit        2 cycles        1 cycle          1 cycle




Flash ROM can be written to as well as read, but it is slow to write so you shouldn’t use
it for holding dynamic data. Its main use is for holding the device firmware or storing long-term data that needs to be preserved after power is off. The erasing and writing of flash ROM
are completely software controlled with no additional hardware circuitry required, which
reduces the manufacturing costs. Flash ROM has become the most popular of the read-only
memory types and is currently being used as an alternative for mass or secondary storage.
Dynamic random access memory (DRAM) is the most commonly used RAM for devices.
It has the lowest cost per megabyte compared with other types of RAM. DRAM is dynamic—
it needs to have its storage cells refreshed and given a new electronic charge every few
milliseconds, so you need to set up a DRAM controller before using the memory.
Static random access memory (SRAM) is faster than the more traditional DRAM, but
requires more silicon area. SRAM is static—the RAM does not require refreshing. The
access time for SRAM is considerably shorter than the equivalent DRAM because SRAM
does not require a pause between data accesses. Because of its higher cost, it is used mostly
for smaller high-speed tasks, such as fast memory and caches.
Synchronous dynamic random access memory (SDRAM) is one of many subcategories
of DRAM. It can run at much higher clock speeds than conventional memory. SDRAM
synchronizes itself with the processor bus because it is clocked. Internally the data is fetched
from memory cells, pipelined, and finally brought out on the bus in a burst. The old-style
DRAM is asynchronous, so it does not burst as efficiently as SDRAM.

1.3.4 Peripherals
Embedded systems that interact with the outside world need some form of peripheral
device. A peripheral device performs input and output functions for the chip by connecting
to other devices or sensors that are off-chip. Each peripheral device usually performs a single
function and may reside on-chip. Peripherals range from a simple serial communication
device to a more complex 802.11 wireless device.
All ARM peripherals are memory mapped—the programming interface is a set of
memory-addressed registers. The address of these registers is an offset from a specific
peripheral base address.
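A minimal sketch of how software typically reaches such registers is shown below. The register names and offsets belong to an imaginary serial peripheral and are assumptions for illustration only; the real values come from the data sheet of the device you are using. The helper names reg_write32 and reg_read32 are also hypothetical.

```c
#include <stdint.h>

/* Hypothetical byte offsets from the peripheral base address. */
#define SERIAL_DATA_OFFSET    0x00u
#define SERIAL_STATUS_OFFSET  0x04u
#define SERIAL_CONTROL_OFFSET 0x08u

/* Write a 32-bit value to the register at base + offset. The volatile
 * qualifier tells the compiler that the access has side effects, so it
 * must not be cached in a register, reordered away, or removed. */
static inline void reg_write32(uintptr_t base, uint32_t offset, uint32_t value)
{
    *(volatile uint32_t *)(base + offset) = value;
}

/* Read the 32-bit register at base + offset. */
static inline uint32_t reg_read32(uintptr_t base, uint32_t offset)
{
    return *(volatile uint32_t *)(base + offset);
}
```

On real hardware the base argument would be the peripheral's address in the memory map. To try the helpers on a host machine, you can pass the address of an ordinary uint32_t array as a stand-in for the register block.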
Controllers are specialized peripherals that implement higher levels of functionality
within an embedded system. Two important types of controllers are memory controllers
and interrupt controllers.

1.3.4.1 Memory Controllers

Memory controllers connect different types of memory to the processor bus. On power-up
a memory controller is configured in hardware to allow certain memory devices to be active.
These memory devices allow the initialization code to be executed. Some memory devices
must be set up by software; for example, when using DRAM, you first have to set up the
memory timings and refresh rate before it can be accessed.
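As a rough illustration of the arithmetic behind a refresh rate setting, the sketch below computes how many clock cycles may pass between successive row refreshes. It assumes a hypothetical controller that is programmed with a refresh interval in clock cycles, and a DRAM in which every row must be refreshed once within a fixed retention period. The function name, parameters, and example values are not from any particular device.

```c
#include <stdint.h>

/* Clock cycles between successive row refreshes, assuming each of
 * `rows` rows must be refreshed once every `retention_us` microseconds
 * and the controller counts cycles of a `clock_hz` clock.
 * Returns 0 if `rows` is 0. */
static uint32_t refresh_interval_cycles(uint32_t clock_hz,
                                        uint32_t retention_us,
                                        uint32_t rows)
{
    if (rows == 0) {
        return 0;
    }
    /* Use 64-bit arithmetic so clock_hz * retention_us cannot overflow.
     * Integer division rounds down, so refreshes are never scheduled
     * later than the retention period allows. */
    uint64_t cycles = (uint64_t)clock_hz * retention_us / 1000000u / rows;
    return (uint32_t)cycles;
}
```

For example, with an assumed 100 MHz clock, a 64 ms retention period, and 4096 rows, the function returns 1562 cycles. The timing parameters of a real part come from its data sheet.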

