

Chapter 2: The Selection Process
Overview
Embedded systems represent target platforms that are usually specific to a single
task. This specificity means the system design can be highly optimized because the
range of tasks the device must perform is well bounded. In other words, you
wouldn’t use your PC to run your coffee machine (you might, but that’s beside the
point). Unlike your desktop processor, the 4-bit microcontroller that runs your
coffee machine costs less than $1 in large quantities. It does exactly what it’s
supposed to do — make your coffee. It doesn’t play Zelda, nor does it exchange
data with an Internet service provider (ISP), although that might change soon.
Because the functionality of the device is so narrowly defined, you must find the
optimal processing element (CPU) for the design. Given the several hundred
choices available and the many variations within those choices, choosing the right
CPU can be a daunting task.
Although choosing a processor is a complex task that defies simple “optimization”
(see Figure 2.1) in all but the simplest projects, the final choice must pass four
critical tests:


Figure 2.1: Choosing the right processor.

Considerations for choosing the right microprocessor for an embedded
application.

- Is it available in a suitable implementation?
- Is it capable of sufficient performance?
- Is it supported by a suitable operating system?
- Is it supported by appropriate and adequate tools?


Is the Processor Available in a Suitable Implementation? Cost-sensitive
projects might require an off-the-shelf, highly integrated part. High-performance
applications might require gate-to-gate delays that are only practical when the
entire design is fabricated on a single chip. What good is choosing the highest
performing processor if the cost of goods makes your product noncompetitive in
the marketplace? For example, industrial control equipment manufacturers that
commonly provide product support and replacement parts with a 20-year lifetime
won’t choose a microprocessor from a vendor that can’t guarantee product
availability over a reasonable span of time. Similarly, if a processor isn’t available
in a military version, you wouldn’t choose it for a missile guidance system, no
matter how good the specs are. In many cases, packaging and implementation
technology issues significantly limit the choice of architecture and instruction set.

Is the Processor Capable of Sufficient Performance? Ultimately, the
processor must be able to do the job on time. Unfortunately, as embedded
systems become more complex, characterizing “the job” becomes more difficult.
As the mix of tasks managed by the processor becomes more diverse (not just
button presses and motor encoding but now also Digital Signal Processor [DSP]
algorithms and network processing), the bottlenecks that limit performance often
have less to do with computational power than with the “fit” between the
architecture and the device’s more demanding tasks. For this reason, it can be
difficult to correlate benchmark results with how a processor will perform in a
particular device.

Is the Processor Supported by an Appropriate Operating System? With
today’s 32-bit microprocessors, it’s natural to see an advantage in choosing a
commercial RTOS. You might prefer one vendor’s RTOS, such as VxWorks or pSOS
from Wind River Systems. Porting the RTOS kernel to a new or different
microprocessor architecture and having it specifically optimized to take advantage
of the low-level performance features of that microprocessor is not a task for the
faint-hearted. So, the microprocessor selection also might depend on having
support for the customer’s preferred RTOS.

Is the Processor Supported by Appropriate and Adequate Tools? Good tools
are critical to project success. The specific toolset necessary depends on the
nature of the project to a certain extent. At a minimum, you’ll need a good cross-
compiler and good debugging support. In many situations, you’ll need far more,
such as in-circuit emulators (ICE), simulators, and so on.
Although these four considerations must be addressed in every processor-
selection process, in many cases, the optimal fit to these criteria isn’t necessarily
the best choice. Other organizational and business issues might limit your choices
even further. For example, time-to-market constraints might make it imperative
that you choose an architecture with which the design team is already familiar. A
corporate commitment or industry preference for a particular vendor or family also
can be an important factor.

Packaging the Silicon
Until recently, designers were limited to the choice of microprocessor versus
microcontroller. Recent advances in semiconductor technology have increased the
designer’s choices. Now, at least for mass-market products, it might make sense
to consider a system-on-a-chip (SOC) implementation, either using a standard
part or using a semi-custom design compiled from licensed intellectual property.
The following section begins the discussion of these issues by looking at the
traditional microprocessor versus microcontroller trade-offs. Later sections explore
some of the issues relating to more highly integrated solutions.
Microprocessor versus Microcontroller

Most embedded systems use microcontrollers instead of microprocessors.
Sometimes the distinction is blurry, but in general, a microprocessor is the CPU
without any additional peripheral or support devices. Microcontrollers are designed
to need a minimum complement of external parts. Figure 2.2 illustrates the
difference. The diagram on the left side of the figure shows a typical
microprocessor system constructed of discrete components. The diagram on the
right shows the same system but now integrated within a single package.


Figure 2.2: Microcontrollers versus microprocessors.

In a microprocessor-based system, the CPU and the various I/O functions
are packaged as separate ICs. In a microcontroller-based system many, if
not all, of the I/O functions are integrated into the same package with the
CPU.
The advantages of the microcontroller’s higher level of integration are easy to see:

- Lower cost — One part replaces many parts.
- More reliable — Fewer packages, fewer interconnects.
- Better performance — System components are optimized for their environment.
- Faster — Signals can stay on the chip.
- Lower RF signature — Fast signals don’t radiate from a large PC board.
Thus, it’s obvious why microcontrollers have become so prevalent and even
dominate the entire embedded world. Given that these benefits derive directly
from the higher integration levels in microcontrollers, it’s only reasonable to ask
“why not integrate even more on the main chip?” A quick examination of the
economics of the process helps answer this question.
Silicon Economics
For most of the major silicon vendors in the United States, Japan, and Europe,
high-performance processors also mean high profit margins. Thus, the newest CPU
designs tend to be introduced into applications in which cost isn’t the all-consuming
factor as it is in embedded applications. Not surprisingly, a new CPU
architecture first appears in desktop or other high-performance applications.
As the family of products continues to evolve, the newer design takes its place as
the flagship product. The latest design is characterized by having the highest
transistor count, the lowest yield of good dies, the most advanced fabrication
process, the fastest clock speeds, and the best performance. Many customers pay
a premium to access this advanced technology in an attempt to gain an advantage
in their own markets. Many other customers won’t pay the premium, however.

As the silicon vendor continues to improve the process, its yields begin to rise, and
its profit margins go up. The earlier members of the family can now be
re-engineered in this new process (silicon
vendors call this a shrink), and the resulting part can be sold at a reduced cost
because the die size is now smaller, yielding many more parts for a given wafer
size. Also, because the R&D costs have been recovered by selling the
microprocessor version at a premium, a lower price becomes acceptable for the
older members of the family.
Using the Core As the Basis of a Microcontroller
The silicon vendor also can take the basic microprocessor core and use it as the
basis of a microcontroller. Cost-reducing the microprocessor core almost inevitably
leads to a family of microcontroller devices, all based on a core architecture that
once was a stand-alone microprocessor. For example, Intel’s 8086 processor led to
the 80186 family of devices. Motorola’s 68000 and 68020 CPUs led to the 68300
family of devices. The list goes on.
System-on-Silicon (SoS)

Today, it’s common for a customer with reasonable volume projections to
completely design an application-specific microcontroller containing multiple CPU
elements and multiple peripheral devices on a single silicon die. Typically, the
individual elements are not designed from scratch but are licensed (in the form of
“synthesizable” VHDL[1] or Verilog specifications) from various IC design houses.
Engineers connect these modules with custom interconnect logic, creating a chip
that contains the entire design. Condensing these elements onto a single piece of
silicon is called system-on-silicon (SoS) or SOC. Chapter 3, on hardware and
software partitioning, discusses this trend. The complexity of modern SOCs is
going far beyond the relatively “simple” microcontrollers in use today.
[1] VHDL stands for VHSIC (very high-speed IC) hardware description language.
Adequate Performance
Although performance is only one of the considerations when selecting processors,
engineers are inclined to place it above the others, perhaps because performance
seems to be a tangible metric, both in absolute terms and relative to other
processors. However, as you’ll see in the following sections, this is not the
case.
Performance-Measuring Tools
For many professionals, benchmarking is almost synonymous with Dhrystones and
MIPS. Engineers tend to expect that if processor A benchmarks at 1.5 MIPS, and
processor B benchmarks at 0.8 MIPS, then processor A is a better choice. This
inference is so wrong that some have suggested MIPS should mean: Meaningless
Indicator of Performance for Salesmen.
MIPS were originally defined in terms of the VAX 11/780 minicomputer. This was
the first machine that could run 1 million instructions per second (1 MIPS). An
instruction, however, is a one-dimensional metric that might not have anything to
do with the way work scales on different machine architectures. With that in mind,
which accounts for more work, executing 1,500 instructions on a RISC architecture
or executing 1,000 instructions on a CISC architecture? Unless you are comparing
VAX to VAX, MIPS doesn’t mean much.
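
To make the point concrete, here is a small worked example in C, using made-up
instruction counts and cycles-per-instruction figures rather than data from any
real part, showing why a raw instruction count (and therefore MIPS) doesn't
translate directly into execution time across architectures:

#include <stdio.h>

int main(void)
{
    /* Hypothetical numbers for illustration only. */
    double clock_hz   = 50e6;                     /* assume both parts run at 50 MHz */
    double risc_insns = 1500.0, risc_cpi = 1.2;   /* more, but simpler, instructions */
    double cisc_insns = 1000.0, cisc_cpi = 3.0;   /* fewer, but more complex, ones   */

    /* execution time = (instruction count x cycles per instruction) / clock rate */
    printf("RISC job: %.1f us\n", risc_insns * risc_cpi / clock_hz * 1e6);
    printf("CISC job: %.1f us\n", cisc_insns * cisc_cpi / clock_hz * 1e6);
    return 0;
}

With these assumed numbers, the 1,500-instruction RISC job finishes in 36
microseconds and the 1,000-instruction CISC job in 60, which is exactly why
counting instructions alone tells you so little.
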
The Dhrystone benchmark is a simple C program that compiles to about 2,000
lines of assembly code and is independent of operating system services. The
Dhrystone benchmark was also calibrated to the venerable VAX. Because a VAX
11/780 could execute 1,757 loops through the Dhrystone benchmark in 1 second,
1,757 loops per second became the reference rate of 1 Dhrystone MIPS. The
problem with the Dhrystone test is that a
crafty compiler designer can optimize the compiler to blast through the Dhrystone
benchmark and do little else well.
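
The scoring itself is easy to reproduce. The following sketch is a stand-in
harness, not the actual Dhrystone source: one_pass() is a placeholder that
merely does a little strcpy/strcmp work in the spirit of the real loop, and a
real harness must also keep the compiler from optimizing the loop away.

#include <stdio.h>
#include <string.h>
#include <time.h>

/* Placeholder for one pass through the real Dhrystone loop. */
static void one_pass(void)
{
    char a[32], b[32];
    strcpy(a, "DHRYSTONE PROGRAM, SOME STRING");
    strcpy(b, a);
    if (strcmp(a, b) != 0)
        puts("unexpected");
}

int main(void)
{
    const long runs = 1000000L;
    clock_t start = clock();
    for (long i = 0; i < runs; i++)
        one_pass();
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    double loops_per_sec = (double)runs / seconds;

    /* 1,757 loops per second on the reference VAX 11/780 defines 1 Dhrystone MIPS. */
    printf("%.0f loops/s = %.1f Dhrystone MIPS\n", loops_per_sec, loops_per_sec / 1757.0);
    return 0;
}

Notice how much of the measured time lives inside the C library's string
routines; that is precisely the lever discussed next.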

Distorting the Dhrystone Benchmark
Daniel Mann and Paul Cobb[5] provide an excellent analysis of the shortcomings of
the Dhrystone benchmark. They analyze the Dhrystone and other benchmarks and
point out the problems inherent in using the Dhrystone to compare embedded
processor performance. The Dhrystone often misrepresents expected performance
because the benchmark doesn’t always use the processor in ways that parallel
typical application use. For example, a particular problem arises because of the
presence of on-chip instruction and data caches. If a significant amount (or all) of
a benchmark can fit in an on-chip cache, this can skew the performance results.
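
A quick way to see the cache effect on a particular target is to time the same
number of memory accesses over a working set that fits in the on-chip cache and
over one that does not. The sketch below is illustrative only; the 4KB and 4MB
working-set sizes are assumptions that must be adjusted to the cache of the
part being evaluated.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Read 'accesses' pseudo-randomly chosen words from a working set of 'words' ints. */
static long touch(volatile int *buf, size_t words, long accesses)
{
    uint64_t x = 1;
    long sum = 0;
    for (long i = 0; i < accesses; i++) {
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;  /* simple LCG */
        sum += buf[(size_t)(x >> 33) % words];
    }
    return sum;
}

static double time_working_set(size_t words, long accesses)
{
    volatile int *buf = calloc(words, sizeof *buf);
    if (buf == NULL)
        return -1.0;
    clock_t t0 = clock();
    long sum = touch(buf, words, accesses);
    double s = (double)(clock() - t0) / CLOCKS_PER_SEC;
    (void)sum;                        /* value unused; volatile reads keep the loop honest */
    free((void *)buf);
    return s;
}

int main(void)
{
    const long accesses = 10000000L;
    printf("4 KB working set: %.3f s\n", time_working_set(1024, accesses));        /* likely cache-resident */
    printf("4 MB working set: %.3f s\n", time_working_set(1024 * 1024, accesses)); /* likely not */
    return 0;
}

A benchmark whose entire working set behaves like the first case will flatter
any processor with a large on-chip cache, whether or not the real application
ever enjoys that luxury.
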
Figure 2.3 compares the performance of three microprocessors for the Dhrystone
benchmark on the left side of the chart and for the Link Access Protocol-D (LAPD)
benchmark on the right side. The LAPD benchmark is more representative of
communication applications. LAPD is the signaling protocol for the D-channel of
ISDN. The benchmark is intended to measure a processor’s capability to process a
typical layered protocol stack.


Figure 2.3: Dhrystone comparison chart.

Comparing microprocessor performance for two benchmarks (courtesy of
Mann and Cobb)[5].
Furthermore, Mann and Cobb point out that developers usually compile the
Dhrystone benchmark using the string manipulation functions that are part of the
C run-time library, which is normally part of the compiler vendor’s software
package. The compiler vendor usually optimizes these library functions as a good
compromise between speed and code size. However, the compiler vendor could
create optimized versions of these string-handling functions to yield more
favorable Dhrystone results. This practice isn’t necessarily dishonest, as long as a
full disclosure is made to the end user.
A manufacturer can further abuse benchmark data by benchmarking its processor
on a board with fast static RAM (SRAM) and then comparing the results to a
competitor’s board that contains slower, but more economical, DRAM.
Meaningful Benchmarking
Real benchmarking involves carefully balancing system requirements and variables.
How a processor runs in your application might be very different from its
performance in a different application. You must consider many things when
determining how well or poorly a processor might perform in benchmarking tests.
In particular, it’s important to analyze the real-time behavior of the processor.
Because most embedded processors must deal with real-time events, you might
assume that the designers have factored this into their performance requirements
for the processor. This assumption might or might not be correct because, once
again, how to optimize for real-time problems isn’t as obvious as you might expect.
Real-time performance can be generally categorized into two buckets: interrupt
handling and task switching. Both relate to the general problem of switching the
context of the processor from one operation to another. Registers must be saved,
variables must be pushed onto the stack, memory spaces must be swapped, and
other housekeeping events must take place in both instances. How easily and how
quickly this can be accomplished are important factors in evaluating a
processor that must be interfaced to events in the real world.
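
As a concrete illustration, one common way to get a first-order
interrupt-latency figure on a bare-metal target is to pend an interrupt in
software and read a free-running timer inside the handler. Everything
hardware-specific below is a hypothetical placeholder: TIMER_COUNT stands for
whatever free-running counter the part provides (the address shown is
invented), and trigger_irq() stands for the vendor's mechanism for pending the
interrupt. This is a sketch of the technique, not code for any particular
device.

#include <stdint.h>

/* Hypothetical memory-mapped, free-running timer; substitute the real register. */
#define TIMER_COUNT (*(volatile uint32_t *)0x40001004u)

extern void trigger_irq(void);            /* hypothetical: pends the measured interrupt */

static volatile uint32_t t_trigger;
static volatile uint32_t latency_ticks;

/* Installed in the vector table for the interrupt being measured. */
void timer_irq_handler(void)
{
    latency_ticks = TIMER_COUNT - t_trigger;
}

/* Returns the trigger-to-handler delay in timer ticks. */
uint32_t measure_interrupt_latency(void)
{
    latency_ticks = 0;
    t_trigger = TIMER_COUNT;
    trigger_irq();
    while (latency_ticks == 0)            /* spin until the handler has run */
        ;
    return latency_ticks;                 /* multiply by the timer period to get time */
}

Running such a measurement while the rest of the system is busy, rather than
idle, is what exposes the worst-case number that actually matters for
real-time work.
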
Predicting performance isn’t easy. Many companies that blindly relied (sometimes
with fervent reassurance from vendors) on overly simplistic benchmarking data
have suffered severe consequences. The semiconductor vendors were often just as
guilty as the compiler vendors of aggressively tweaking their processors to
perform well in the Dhrystone tests.

From the Trenches
When you base early decisions on simplistic measures, such as benchmarks and
throughput, you risk disastrous late surprises, as this story illustrates:

A certain embedded controller manufacturer, who shall remain nameless, was
faced with a dilemma. The current product family was running out of gas, and it
was time to re-evaluate the architecture. There was a strong
desire to stay with the same processor family that they used in the previous design.
The silicon manufacturer claimed that the newest member of the family
benchmarked at twice the throughput of the previous version of the device (The
clue here is benchmarked. What was the benchmark? How did it relate to the
application code being used by this product team?). Since one of the design
requirements was to double the throughput of the product, the design team opted
to replace the existing embedded processor with the new one.
At first, the project progressed rapidly, since the designers could reuse much of
their C and assembly code, as well as many of the software tools they had already
purchased or developed. The problems became apparent when they finally began
to run their own performance metrics on the new prototype hardware. Instead of
the expected two-fold performance boost, their new design gave them only a 15-
percent performance improvement, far less than what they needed to stay
competitive in their market space.
The post-mortem analysis showed that the performance boost they expected could
not be achieved by simply doubling the clock frequency or by using a more
powerful processor. Their system design had bottlenecks liberally sprinkled
throughout the hardware and software design. The processor could have been
infinitely fast, and they still would not have gotten much better than a 15-percent
boost.

EEMBC

Clearly, MIPS and Dhrystone measurements aren’t adequate; designers still need
something more tangible than marketing copy to use as a basis for their processor
selection. To address this need, representatives of the semiconductor vendors, the
compiler vendors, and their customers met under the leadership of Markus Levy
(who was then the technical editor of EDN magazine) to create a more meaningful
benchmark. The result is the EDN Embedded Microprocessor Benchmark
Consortium, or EEMBC (pronounced “Embassy”).

The EEMBC benchmark consists of industry-specific tests. Version 1.0 currently has
46 tests divided into five application suites. Table 2.1 shows the benchmark tests
that make up Version 1.0 of the test suite.

Table 2.1: EEMBC tests list.

The 46 tests in the EEMBC benchmark are organized as five industry-specific suites.

Automotive/Industrial Suite (16 tests):
Angle-to-time conversion, Basic floating point, Bit manipulation, Cache buster,
CAN remote data request, Fast-Fourier transform (FFT), Finite Impulse Response
(FIR) filter, Infinite Impulse Response (IIR) filter, Inverse discrete cosine
transform, Inverse Fast-Fourier transform (FFT) filter, Matrix arithmetic,
Pointer chasing, Pulse-width modulation, Road speed calculation, Table lookup
and interpolation, Tooth-to-spark calculation

Consumer Suite (5 tests):
Compress JPEG, Decompress JPEG, High-pass grayscale filter, RGB-to-CMYK
conversion, RGB-to-YIQ conversion

Networking Suite (5 tests):
OSPF/Dijkstra routing, Lookup/Patricia algorithm, Packet flow (512B),
Packet flow (1MB), Packet flow (2MB)

Office Automation Suite (4 tests):
Bezier-curve calculation, Dithering, Image rotation, Text processing

Telecommunications Suite (16 tests):
Autocorrelation (3 tests), Convolution encoder (3 tests), Fixed-point bit
allocation (3 tests), Fixed-point complex FFT (3 tests), Viterbi GSM decoder
(4 tests)


Unlike the Dhrystone benchmarks, the benchmarks developed by the EEMBC
technical committee represent real-world algorithms against which the processor
can be measured. Looking at the Automotive/Industrial suite of tests, for example,
it’s obvious that any embedded microprocessor involved in an engine-management
system should be able to calculate a tooth-to-spark time interval efficiently.
The EEMBC benchmark produces statistics on the number of times per second the
algorithm executes and the size of the compiled code. Because the compiler could
have a dramatic impact on the code size and efficiency, each benchmark must
contain a significant amount of information about the compiler and the settings of
the various optimization switches.
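
In the same spirit, even an in-house benchmark report should carry its build
information with it. The fragment below is a hedged sketch and not EEMBC's
harness: run_kernel() is a placeholder workload, and CFLAGS_STRING is an
assumed macro that the makefile would pass in so the optimization switches
travel with the score.

#include <stdio.h>
#include <time.h>

#ifndef CFLAGS_STRING
#define CFLAGS_STRING "(flags not recorded)"
#endif

static volatile int sink;

/* Placeholder workload standing in for one iteration of the benchmark algorithm. */
static void run_kernel(void)
{
    int acc = 0;
    for (int i = 0; i < 1000; i++)
        acc += i * i;
    sink = acc;
}

int main(void)
{
    const long runs = 100000L;
    clock_t t0 = clock();
    for (long i = 0; i < runs; i++)
        run_kernel();
    double seconds = (double)(clock() - t0) / CLOCKS_PER_SEC;

    printf("iterations/sec : %.0f\n", (double)runs / seconds);
    printf("compiler       : %s\n", __VERSION__);      /* predefined by GCC and Clang */
    printf("flags          : %s\n", CFLAGS_STRING);
    return 0;
}

Code size, the other half of an EEMBC-style report, comes from the toolchain's
size or map-file output for the same build.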
