Papamichalis, P. “Introduction to the TMS320 Family of Digital Signal Processors”
Digital Signal Processing Handbook
Ed. Vijay K. Madisetti and Douglas B. Williams
Boca Raton: CRC Press LLC, 1999
© 1999 by CRC Press LLC
77
Introduction to the TMS320 Family
of Digital Signal Processors
Panos Papamichalis
Texas Instruments
77.1 Introduction
77.2 Fixed-Point Devices: TMS320C25 Architecture and
Fundamental Features
77.3 TMS320C25 Memory Organization and Access
77.4 TMS320C25 Multiplier and ALU
77.5 Other Architectural Features of the TMS320C25
77.6 TMS320C25 Instruction Set
77.7 Input/Output Operations of the TMS320C25
77.8 Subroutines, Interrupts, and Stack on the TMS320C25
77.9 Introduction to the TMS320C30 Digital Signal Processor
77.10 TMS320C30 Memory Organization and Access
77.11 Multiplier and ALU of the TMS320C30
77.12 Other Architectural Features of the TMS320C30
77.13 TMS320C30 Instruction Set
77.14 Other Generations and Devices in the TMS320 Family
References
This article discusses the architecture and the hardware characteristics of the TMS320
family of Digital Signal Processors. The TMS320 family includes several generations
of programmable processors with several devices in each generation. Since the
programmable processors are split between fixed-point and floating-point devices, both
categories are examined in some detail. The TMS320C25 serves here as a simple example
for the fixed-point processor family, while the TMS320C30 is used for the floating-point
family.
77.1 Introduction
Since its introduction in 1982 with the TMS32010 processor, the TMS320 family of DSPs has been
exceedingly popular. Different members of this family were introduced to address the existing
needs for real-time processing, and designers then capitalized on the features of the devices to create
solutions and products in ways never imagined before. In turn, these innovations fed the architectural
and hardware configurations of newer generations of devices.
Digital Signal Processing encompasses a variety of applications, such as digital filtering, speech
and audio processing, image and video processing, and control. All DSP applications share some
common characteristics:
• The algorithms used are mathematically intensive. A typical example is the computation
of an FIR filter, implemented as a sum of products. This operation involves many
multiplications combined with additions.
• DSP algorithms must typically run in real time: i.e., the processing of a segment of the
arriving signal must be completed before the next segment arrives, or else data will be
lost.
• DSP techniques are under constant development. This implies that DSP systems should
be flexible to support changes and improvements in the state of the art. As a result,
programmable processors have been the preferred way of implementation. In recent
times, though, fixed-function devices have also been introduced to address high-volume
consumer applications with low-cost requirements.
These needs are addressed in the TMS320 family of DSPs by using appropriate architecture, instruction
sets, and I/O capabilities, as well as the raw speed of the devices. However, it should be kept
in mind that these features do not cover all the aspects describing a DSP device, especially a
programmable one. The availability and quality of software and hardware development tools (such as
compilers, assemblers, linkers, simulators, hardware emulators, and development systems), application
notes, third-party products and support, hot-line support, etc. play an important role in how
easy it will be to develop an application on the DSP processor. The TMS320 family has very extensive
support of this kind, but its description goes beyond the scope of this article. The interested reader should
contact the TI DSP hotline (Tel. 713-274-2320).
For the purposes of this article, two devices have been selected to be highlighted from the Texas
Instruments TMS320 family of digital signal processors. One is the TMS320C25, a 16-bit, fixed-point
DSP, and the other is the TMS320C30, a 32-bit, floating-point DSP. As a short-hand notation, they
will be called ‘C25 and ‘C30, respectively. The choice was made so that both fixed-point and
floating-point issues are considered.
There have been newer (and more sophisticated) generations added to the TMS320 family but,
since the objective of this article is to be more tutorial, they will be discussed as extensions of the
‘C25 and the ‘C30. Such examples are other members of the ‘C2x and the ‘C3x generations, as well
as the TMS320C5x generation (‘C5x for short) of fixed-point devices, and the TMS320C4x (‘C4x) of
floating-point devices. Customizable and fixed-function extensions of this family of processors will
also be discussed.
Texas Instruments, like all vendors of DSP devices, publishes detailed User’s Guides that explain at
great length the features and the operation of the devices. Each of these User’s Guides is a pretty thick
book, so it is not possible (or desirable) to repeat all this information here. Instead, the objective of
this article is to give an overview of the basic features for each device. If more detail is necessary for
an application, the reader is expected to refer to the User’s Guides. If the User’s Guides are needed,
it is very easy to obtain them from Texas Instruments.
77.2 Fixed-Point Devices: TMS320C25 Architecture and
Fundamental Features
The Texas Instruments TMS320C25 is a fast, 16-bit, fixed-point digital signal processor. The speed
of the device is 10 MHz, which corresponds to a cycle time of 100 ns. Since the majority of the
instructions execute in a single cycle, the figure of 100 ns also indicates how long it takes to execute
one instruction. Alternatively, we can say that the device can execute 10 million instructions per
second (MIPS). The actual signal from the external oscillator or crystal has a frequency four times
higher, at 40 MHz. This frequency is then divided on-chip to generate the internal clock with a
period of 100 ns. Figure 77.1 shows the relationship between the input clock CLKIN from the
external oscillator, and the output clock CLKOUT. CLKOUT is the same as the clock of the device,
and it is related to CLKIN by the equation CLKOUT = CLKIN /4. Note that in Fig. 77.1 the shape
of the signal is idealized ignoring rise and fall times.
FIGURE 77.1: Clock timing of the TMS320C25. CLKIN = external oscillator; CLKOUT = clock of
the device.
Newer versions of the TMS320C25 operate at higher frequencies. For instance, there is a spinoff
that has a cycle time of 80 ns, resulting in a 12.5 MIPS operation. There are also slower (and cheaper)
versions for applications that do not need this computational power.
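As a quick check of the clock arithmetic above, the following Python sketch computes CLKOUT, the machine-cycle time, and the instruction rate from CLKIN. It assumes, as the text does, that CLKOUT = CLKIN/4 and that one instruction completes per machine cycle; the 50-MHz input used for the 80-ns spinoff is inferred from that ratio, not taken from a data sheet. The function name `device_timing` is hypothetical.

```python
def device_timing(clkin_hz: float) -> tuple[float, float, float]:
    """Return (CLKOUT in Hz, machine-cycle time in ns, MIPS) for a given CLKIN.

    Assumes CLKOUT = CLKIN / 4 and one single-cycle instruction per machine cycle.
    """
    clkout_hz = clkin_hz / 4.0       # on-chip divide-by-4
    cycle_ns = 1e9 / clkout_hz       # duration of one machine cycle
    mips = clkout_hz / 1e6           # millions of single-cycle instructions per second
    return clkout_hz, cycle_ns, mips


if __name__ == "__main__":
    print(device_timing(40e6))   # standard part: 10 MHz, 100 ns, 10 MIPS
    print(device_timing(50e6))   # inferred input for the 80-ns spinoff: 12.5 MIPS
```

Real instruction throughput is lower whenever multicycle instructions or slow external memory are involved, so the MIPS figure is an upper bound.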
Figure 77.2 shows in a simplified form the key features of the TMS320C25. The major parts of the
DSP processor are the memory, the Central Processing Unit (CPU), the ports, and the peripherals.
Each of these parts will be examined in more detail later. The on-chip memory consists of 544
words of RAM (read/write memory) and 4K words of ROM (read-only memory). In the notation
used here, 1K = 1024 words, and 4K = 4 × 1024 = 4096 words. Each word is 16 bits wide and,
when some memory size is given, it is measured in 16-bit words, and not in bytes (as is the custom
in microprocessors). Of the 544 words of RAM, 256 words can be used as either program or data
memory, while the rest is only data memory. All 4K of on-chip ROM is program memory. Overall,
the device can address 64K words of data memory and 64K words of program memory. Except for
what resides on-chip, the rest of the memory is external, supplied by the designer.
The CPU is the heart of the processor. Its most important feature, distinguishing it from the
traditional microprocessors, is a hardware multiplier that is capable of performing a 16 × 16 bit
multiplication in a single cycle. To preserve higher intermediate accuracy of results, the full 32-
bit product is saved in a product register. The other important part of the CPU is the Arithmetic
Logic Unit (ALU) that performs additions, subtractions, and logical operations. Again, for increased
intermediate accuracy, there is a 32-bit accumulator to handle all the ALU operations.
All the arithmetic and logical functions are accumulator-based. In other words, these operations
have two operands, one of which is always the accumulator. The result of the operation is stored in
the accumulator.
Because of this approach, the form of the instructions is very simple, indicating only what the other
operand is. This architectural philosophy is very popular, but it is not universal. For instance, as is
discussed later, the TMS320C30 takes a different approach, where there are several “accumulators”
in what is called a register file.
Other components of the TMS320C25 CPU are several shifters to facilitate manipulation of the
data and increase the throughput of the device by performing shifting operations in parallel with
other functions. As part of the CPU, there are also eight auxiliary registers that can be used as memory
pointers or loop counters. There are two status registers and an 8-deep hardware stack. The stack
is used to store the memory address where the program will continue execution after a temporary
diversion to a subroutine.
FIGURE 77.2: Key architectural features of the TMS320C25.
To communicate with external devices, the TMS320C25 has 16 input and 16 output parallel ports.
It also has a serial port that can serve the same purpose. The serial port is one of the peripherals that
have been implemented on chip. Other peripherals include the interrupt mask, the global memory
capability, and a timer. The above components of the TMS320C25 are examined in more detail
below.
The device has 68 pins that are designated to perform certain functions, and to communicate
with other devices on the same board. The names of the signals and the corresponding definitions
appear in Table 77.1. The first column of the table gives the pin names. Note that a bar over the
name indicates that the pin is in the active position when it is electrically low. For instance, if the
pins take the voltage levels of 0 V and 5 V, a pin indicated with an overbar is asserted when it is set
at 0 V. Otherwise, assertion occurs at 5 V. The second column indicates if the pin is used for input
to the device or output from the device or both. The third column gives a description of the pin
functionality.
Understanding the functionality of the device pins is as important as understanding the internal
architecture because it provides the designer with the tools available to communicate with the external
world. The DSP device needs to receive data and, often, instructions from the external sources, and
send the results back to the external world. Depending on the paths available for such transactions,
the design of a program can take very different forms. Within this framework, it is up to the designer
to generate implementations that are ingenious and elegant.
The TMS320C25 is programmed in its own assembly language. This assembly language
consists of 133 instructions that perform general-purpose and DSP-specific functions. Familiarity
with the instruction set and with the device architecture are the two components of efficient program
implementation. High-level-language compilers have also been developed that make the writing of
programs an easier task. For the TMS320C25, there is a C compiler available. However, there is
always a loss of efficiency when programming in high-level languages, and this may not be acceptable
in computation-bound real-time systems. Besides, for a complete understanding of the device it is
necessary to consider the assembly language.
TABLE 77.1 Names and Functionality of the 68 Pins of the TMS320C25
Signal            I/O/Z(a)   Definition
VCC               I          5-V supply pins
VSS               I          Ground pins
X1                O          Output from internal oscillator for crystal
X2/CLKIN          I          Input to internal oscillator from crystal or external clock
CLKOUT1           O          Master clock output (crystal or CLKIN frequency/4)
CLKOUT2           O          A second clock output signal
D15-D0            I/O/Z      16-bit data bus D15 (MSB) through D0 (LSB). Multiplexed between program, data, and I/O spaces.
A15-A0            O/Z        16-bit address bus A15 (MSB) through A0 (LSB)
PS*, DS*, IS*     O/Z        Program, data, and I/O space select signals
R/W*              O/Z        Read/write signal
STRB*             O/Z        Strobe signal
RS*               I          Reset input
INT2*-INT0*       I          External user interrupt inputs
MP/MC*            I          Microprocessor/microcomputer mode select pin
MSC*              O          Microstate complete signal
IACK*             O          Interrupt acknowledge signal
READY             I          Data ready input. Asserted by external logic when using slower devices to indicate that the current bus transaction is complete.
BR*               O          Bus request signal. Asserted when the TMS320C25 requires access to an external global data memory space.
XF                O          External flag output (latched software-programmable signal)
HOLD*             I          Hold input. When asserted, the TMS320C25 goes into an idle mode and places the data, address, and control lines in the high-impedance state.
HOLDA*            O          Hold acknowledge signal
SYNC*             I          Synchronization input
BIO*              I          Branch control input. Polled by the BIOZ instruction.
DR                I          Serial data receive input
CLKR              I          Clock for receive input for serial port
FSR               I          Frame synchronization pulse for receive input
DX                O/Z        Serial data transmit output
CLKX              I          Clock for transmit output for serial port
FSX               I/O/Z      Frame synchronization pulse for transmit. Configurable as either an input or an output.
(a) I/O/Z denotes input/output/high-impedance state.
* Active-low signal (printed with an overbar in the original table).
Note: The first column is the pin name; the second column indicates if it is an input or an output pin; the third
column gives a description of the pin functionality.
A very important characteristic of the device is its Harvard architecture. In Harvard architecture
(see Fig. 77.3), the program and data memory spaces are separated and they are accessed by different
buses. One bus accesses the program memory space to fetch the instructions, while another bus is
used to bring operands from the data memory space and store the results back to memory. The
objective of this approach is to increase the throughput by bringing instructions and data in parallel.
An alternative philosophy is the von Neumann architecture. The von Neumann architecture (see Fig. 77.4)
uses a single bus and a unified memory space. Unification of the memory space is convenient for
partitioning it between program and data, but it presents a bottleneck since both data and program
instructions must use the same path and, hence, they must be multiplexed. The Harvard architecture
of multiple buses is used in digital signal processors because the increased throughput is of paramount
importance in real-time systems.

The difference between the architectures is important because it influences the programming style. In
the Harvard architecture, two memory locations can have the same address, as long as one of them is
in the data space and the other is in the program space. Hence, when the programmer uses an
address label, he has to be alert as to which space he is referring to. Another restriction of the Harvard
architecture is that the data memory cannot be initialized during loading because loading refers
only to placing the program in memory (and the program memory is separate from the data
memory). Data memory can be initialized during execution only. The programmer must incorporate
such initialization in his program code. As will be seen later, such restrictions have been removed
from the TMS320C30 while retaining the convenient feature of multiple buses.
FIGURE 77.3: Simplified block diagram of the Harvard architecture.
FIGURE 77.4: Simplified block diagram of the von Neumann architecture.
Figure 77.5 shows a functional block diagram of the TMS320C25 architecture. The Harvard
architecture of the device is immediately apparent from the separate program and data buses. What
is not apparent is that the architecture has been modified to permit communication between the
two buses. Through such communication, it is possible to transfer data between the program and
data memory spaces. As a result, the program memory space can also be used to store tables. The transfer
takes place by using special instructions such as TBLR (Table Read), TBLW (Table Write), and BLKP
(Block transfer from Program memory).
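The toy Python model below (class and method names are hypothetical) illustrates two points made above: in a Harvard memory the same address can hold different words in program space and in data space, and table transfers in the spirit of TBLR and TBLW move words between the two spaces. It is a sketch only; how the real instructions obtain their program and data addresses is described in the TMS320C25 User's Guide and is simplified here to plain arguments.

```python
class HarvardMemory:
    """Toy model of separate 64K-word program and data spaces (16-bit words)."""

    SIZE = 1 << 16

    def __init__(self) -> None:
        self.program = [0] * self.SIZE   # program space
        self.data = [0] * self.SIZE      # data space, same address range

    def table_read(self, pma: int, dma: int) -> None:
        """TBLR-like transfer: program word at pma -> data word at dma."""
        self.data[dma] = self.program[pma]

    def table_write(self, pma: int, dma: int) -> None:
        """TBLW-like transfer: data word at dma -> program word at pma."""
        self.program[pma] = self.data[dma]


if __name__ == "__main__":
    mem = HarvardMemory()
    mem.program[0x0100] = 0x7F00   # e.g., a table entry kept in program space
    mem.data[0x0100] = 0x1234      # an unrelated operand at the SAME address in data space
    mem.table_read(0x0100, 0x0300)
    print(hex(mem.data[0x0300]), hex(mem.data[0x0100]))  # 0x7f00 0x1234
```

Keeping the two spaces as separate arrays is what makes an address label ambiguous unless the space is known, which is exactly the programming hazard noted above.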
As shown in the block diagram, the program ROM is linked to the program bus, while data RAM
blocks B1 and B2 are linked to the data bus. The RAM block B0 can be configured either as program
or data memory (using the instructions CNFP and CNFD), and it is multiplexed with both buses.
The different segments, such as the multiplier, the ALU, and the memories, are examined in more
detail below.
77.3 TMS320C25 Memory Organization and Access
Besides the on-chip memory (RAM and ROM), the TMS320C25 can access external memory through
the external bus. This bus consists of the 16 address pins A0-A15 and the 16 data pins D0-D15.
The address pins carry the address to be accessed, while the data pins carry the instruction word or
the operand, depending on whether program or data memory is accessed. The bus can access either
program or data memory, the difference being indicated by which of the pins PS and DS (with overbars,
i.e., active low) becomes active. The activation is done automatically when, during execution, an
instruction or a piece of data needs to be fetched. Since the address is 16 bits wide, the maximum memory
space is 64K words for program and 64K words for data.
FIGURE 77.5: Functional block diagram of the TMS320C25 architecture.
FIGURE 77.6: Memory maps for program and data memory of the TMS320C25.
The device starts execution after a reset signal, i.e., after the RS pin is pulled low for a short
period of time. The execution always begins at program memory location 0, where there should
be an instruction to direct the program execution to the appropriate location. This direction is
accomplished by a branch instruction.
B PROG
which loads the program counter with the program memory address that has the label PROG (or
any other label you choose). Then, execution continues from the address PROG, where, presumably,
a useful program has been placed.
It is clear that the program memory location 0 is very important, and you need to know where
it is physically located. The TMS320C25 gives you the flexibility to use as location 0 either the first
location of the on-chip ROM or the first location of the external memory. In the first case, we say that
the device operates in the microcomputer mode, while in the second one it is in the microprocessor
mode. In the microprocessor mode, the on-chip ROM is ignored altogether. You can choose between
the two modes by pulling the MP/MC pin high (microprocessor mode) or low (microcomputer mode).
The microcomputer mode is useful for production purposes, while for laboratory and development
work the microprocessor mode is used exclusively.
Figure 77.6 shows the memory configuration of the TMS320C25, where the microprocessor and
microcomputer configurations of the program memory are depicted separately. The data memory
is partitioned into 512 sections, called pages, of 128 words each. The reason for the partitioning is
addressing, as will be discussed below. Memory boundaries of the 64K memory space are
shown in both decimal and hexadecimal notation (hexadecimal notation is indicated by an “h” or “H”
at the end). Compare this map with the block diagram in Fig. 77.5.
As mentioned earlier, in two-operand operations, one of the operands resides in the accumulator,
and the result is also placed in the accumulator. (The only exception is the multiplication operation,
examined later.) The other operand can either reside in memory or be part of the instruction. In the
latter case, the value to be combined with the accumulator is explicitly specified in the instruction, and
this addressing mode is called the immediate addressing mode. In the TMS320C25 assembly language,
the immediate addressing mode instructions are indicated by a “K” at the end of the instruction.
For example, the instruction
ADDK 5
increments the contents of the accumulator by 5.
If the value to be operated upon resides in memory, there are two ways to access it: either by
specifying the memory address directly (direct addressing) or by using a register that holds the
address of that number (indirect addressing).
As a general rule, it is desirable to describe an instruction as briefly as possible so that the whole
description can be held in one 16-bit word. Then, when the program is executed, only one word
needs to be fetched before all the information from the instruction is available for execution. This
is not always possible and there are two-word instructions as well, but the chip architects always
strive to achieve one-word instructions. In the direct addressing mode, a full description of a memory
address would require a 16-bit word by itself because the memory space is 64K words. To reduce
that requirement, the memory space is divided into 512 pages of 128 words each. An instruction using
direct addressing contains 7 bits indicating which word you want to access within a page. The
page number (9 bits) is stored in a separate register (actually, part of a register), called the Data Page
pointer (DP). You store the page number in the DP by using the instructions LDP (Load Data
Page pointer) or LDPK (Load Data Page pointer immediate).
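The page/offset split amounts to a bit concatenation, sketched below in Python (function names are hypothetical): the 9-bit page number supplies the upper address bits and the 7-bit field from the instruction word supplies the lower ones. For example, after LDPK 6, a direct-addressed instruction whose offset field is 65h reaches data address 0365h.

```python
def direct_address(dp: int, offset: int) -> int:
    """16-bit data address from the 9-bit DP and the 7-bit offset in the instruction word."""
    if not 0 <= dp < 512:
        raise ValueError("DP must fit in 9 bits")
    if not 0 <= offset < 128:
        raise ValueError("offset must fit in 7 bits")
    return (dp << 7) | offset   # page selects the 128-word block, offset the word in it


def page_and_offset(address: int) -> tuple[int, int]:
    """Inverse mapping: which page (for LDPK) and which offset reach a given address."""
    return address >> 7, address & 0x7F


if __name__ == "__main__":
    page, offset = page_and_offset(0x0365)
    print(page, hex(offset))                   # 6 0x65
    print(hex(direct_address(page, offset)))   # 0x365
```

Because 9 + 7 = 16 bits, every one of the 64K data words is reachable, but only after the DP has been set to the right page.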
In the indirect addressing mode, the data memory address is held in a register that acts as a memory
pointer. There are eight such registers available, called auxiliary registers, AR0-AR7. The auxiliary
registers can also be used for other functions, such as loop counters. To save bits in the instruction,
the auxiliary register used as memory pointer is not indicated explicitly; instead, its number is stored in
a separate register (actually, part of a register), the auxiliary register pointer (ARP). In other words, there is
the concept of the “current register”. In an operation using indirect addressing, the contents of the
current auxiliary register point to the desired memory location. The current AR is specified by the
contents of the ARP as shown in Fig. 77.7. In an instruction, indirect addressing is indicated by an
asterisk.
FIGURE 77.7: Example of indirect addressing mode.
A “+” sign after the asterisk in an instruction using indirect addressing means “after the present memory
access, increment the contents of the current auxiliary register by 1”. This is done in parallel with
the load-accumulator operation. The above autoincrementing of the auxiliary register is an optional
operation that offers additional flexibility to the programmer, and it is not the only one available.
The TMS320C25 has an auxiliary register arithmetic unit (ARAU, see Fig. 77.5) that can execute
such operations in parallel with the CPU, and increase the throughput of the device in this way.
Table 77.2 summarizes the different operations that can be done while using indirect addressing.
As seen from this table, the contents of an auxiliary register can be incremented or decremented by
1, incremented or decremented by the contents of AR0, and incremented or decremented by AR0
in a bit-reversed fashion. The last operation is useful when doing Fast Fourier Transforms. The
bit-reversed addressing is implemented by adding AR0 with reverse carry propagation, an operation
explained in the TMS320C25 User’s Guide. Additionally, it is possible to load the ARP with a new
value at the same time, thus saving an extra instruction.
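To make the reverse-carry idea concrete, here is a Python sketch (function names are hypothetical) that adds AR0 to an address with the carry propagating from the most significant bit toward the least significant one. It assumes a buffer of N words (N a power of 2) starting at an address that is a multiple of N, and AR0 = N/2; under those assumptions, repeated reverse-carry additions visit the buffer in bit-reversed order, which is the reordering an FFT needs. The exact setup conventions are in the User's Guide.

```python
def reverse_carry_add(ar: int, ar0: int, width: int = 16) -> int:
    """Add ar0 to ar, propagating carries from the MSB toward the LSB."""
    result = 0
    carry = 0
    for bit in range(width - 1, -1, -1):      # MSB first
        total = ((ar >> bit) & 1) + ((ar0 >> bit) & 1) + carry
        result |= (total & 1) << bit
        carry = total >> 1                    # carry goes into the next LOWER bit
    return result                             # carry out of bit 0 is discarded


def bit_reversed_walk(base: int, n: int) -> list[int]:
    """Addresses visited by n reverse-carry steps of AR0 = n // 2 from an aligned base."""
    addresses = []
    ar = base
    for _ in range(n):
        addresses.append(ar)
        ar = reverse_carry_add(ar, n // 2)
    return addresses


if __name__ == "__main__":
    print([hex(a) for a in bit_reversed_walk(0x200, 8)])
    # ['0x200', '0x204', '0x202', '0x206', '0x201', '0x205', '0x203', '0x207']
```

Because carries only move downward, the bits above the buffer size are never disturbed, which is why the aligned base address survives every step.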
TABLE 77.2 Operations That Can Be Performed in Parallel with Indirect Addressing
Notation         Operation
ADD *            No manipulation of AR or ARP
ADD *,Y          Y → ARP
ADD *+           AR(ARP) + 1 → AR(ARP)
ADD *+,Y         AR(ARP) + 1 → AR(ARP); Y → ARP
ADD *-           AR(ARP) - 1 → AR(ARP)
ADD *-,Y         AR(ARP) - 1 → AR(ARP); Y → ARP
ADD *0+          AR(ARP) + AR0 → AR(ARP)
ADD *0+,Y        AR(ARP) + AR0 → AR(ARP); Y → ARP
ADD *0-          AR(ARP) - AR0 → AR(ARP)
ADD *0-,Y        AR(ARP) - AR0 → AR(ARP); Y → ARP
ADD *BR0+        AR(ARP) + rcAR0 → AR(ARP)
ADD *BR0+,Y      AR(ARP) + rcAR0 → AR(ARP); Y → ARP
ADD *BR0-        AR(ARP) - rcAR0 → AR(ARP)
ADD *BR0-,Y      AR(ARP) - rcAR0 → AR(ARP); Y → ARP
Note: Y = 0,...,7 is the new “current” AR. AR(ARP) is the AR pointed to by the ARP. BR = bit reversed,
rc = reverse carry.
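The following Python sketch (names are hypothetical) models the post-access updates of Table 77.2: the current AR supplies the address, is then modified according to the operand form, and the optional ",Y" loads a new ARP. Here the reverse-carry operations are computed by bit-reversing both operands, doing an ordinary 16-bit addition or subtraction, and reversing the result back; that is arithmetically equivalent to propagating the carry (or borrow) from MSB to LSB, but it is a modeling choice, not a description of the ARAU hardware.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1


def bit_reverse(value: int, width: int = WIDTH) -> int:
    """Mirror the low `width` bits of value (bit 0 <-> bit width-1)."""
    out = 0
    for _ in range(width):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def rc_add(a: int, b: int) -> int:
    """Reverse-carry addition: an ordinary add carried out on bit-reversed operands."""
    return bit_reverse((bit_reverse(a) + bit_reverse(b)) & MASK)


def rc_sub(a: int, b: int) -> int:
    """Reverse-borrow subtraction, the counterpart used by *BR0-."""
    return bit_reverse((bit_reverse(a) - bit_reverse(b)) & MASK)


# How the current AR is modified after the access, for each operand form in Table 77.2.
UPDATES = {
    "*": lambda ar, ar0: ar,
    "*+": lambda ar, ar0: (ar + 1) & MASK,
    "*-": lambda ar, ar0: (ar - 1) & MASK,
    "*0+": lambda ar, ar0: (ar + ar0) & MASK,
    "*0-": lambda ar, ar0: (ar - ar0) & MASK,
    "*BR0+": rc_add,
    "*BR0-": rc_sub,
}


def indirect(ars: list[int], arp: int, operand: str) -> tuple[int, int]:
    """Model one indirectly addressed access, e.g. the operand of ADD *0+,3.

    Returns (data address used, new ARP); the auxiliary registers in `ars`
    (indices 0..7) are updated in place.
    """
    mode, _, new_arp = operand.partition(",")
    address = ars[arp]                          # the current AR supplies the address
    ars[arp] = UPDATES[mode](ars[arp], ars[0])  # post-access update by the ARAU
    return address, (int(new_arp) if new_arp else arp)


if __name__ == "__main__":
    ars = [4, 0x200, 0, 0, 0, 0, 0, 0]   # AR0 = 4, AR1 points at a buffer
    arp = 1
    for _ in range(3):
        addr, arp = indirect(ars, arp, "*BR0+")
        print(hex(addr))                   # 0x200, then 0x204, then 0x202
    addr, arp = indirect(ars, arp, "*+,2")
    print(hex(addr), arp)                  # 0x206 2
```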
77.4 TMS320C25 Multiplier and ALU
The heart of the TMS320C25 is the CPU consisting, primarily, of the multiplier and the arithmetic
logic unit (ALU). The hardware multiplier can perform a 16 bit × 16 bit multiplication in a single
machine cycle. This capability is probably the major distinguishing feature of digital signal processors
because it permits high throughput in numerically intensive algorithms.
Associated with the multiplier, there are two registers that hold operands and results. The
T-register (for temporary register) holds one of the two factors. The other factor comes from a memory
location. Again, this construct, with one implied operand residing in the T-register, permits more
compact instruction words. When the multiplier and multiplicand (two 16-bit words) are multiplied
together, the result is 32 bits long. In traditional microprocessors, this product would have been
truncated to 16 bits, and presented as the final result. In DSP applications, though, this product
is only an intermediate result in a long stream of multiply-adds, and if truncated at this point, too
much computational noise would be introduced to the final result. To preserve higher final accuracy,
the full 32-bit result is held in the P-register (for product register). This configuration is shown in
Fig. 77.8 which depicts the multiplier and the ALU of the TMS320C25.
Actually, the P-register is viewed as two 16-bit registers concatenated. This viewpoint is convenient
if you need to save the product using the instructions SPH (store product high) and SPL (store
product low). Otherwise, the product can operate on the accumulator, which is also 32-bits wide.
The contents of the product register can be loaded on the accumulator, overwriting whatever was
there, using the PAC (product to accumulator) instruction. It can also be added to or subtracted
from the accumulator using the instructions APAC or SPAC.
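The Python sketch below (function names are hypothetical stand-ins for the instructions) models the data path just described, assuming the product shifter is set for no shift: MPY forms the full 32-bit signed product of the T-register and a memory word, SPH and SPL expose its two 16-bit halves, and PAC, APAC, and SPAC combine it with the 32-bit accumulator. For simplicity the accumulator wraps around on overflow; the device's actual overflow handling is not modeled.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mpy(t_reg: int, operand: int) -> int:
    """16 x 16 signed multiply; returns the full 32-bit P-register pattern."""
    return (to_signed(t_reg, 16) * to_signed(operand, 16)) & 0xFFFFFFFF


def sph(p_reg: int) -> int:
    """Upper 16 bits of P (store product high)."""
    return (p_reg >> 16) & 0xFFFF


def spl(p_reg: int) -> int:
    """Lower 16 bits of P (store product low)."""
    return p_reg & 0xFFFF


def pac(acc: int, p_reg: int) -> int:
    """Load P into the accumulator; the old accumulator value is discarded."""
    return p_reg


def apac(acc: int, p_reg: int) -> int:
    """Add P to the 32-bit accumulator (wraps on overflow in this model)."""
    return (acc + p_reg) & 0xFFFFFFFF


def spac(acc: int, p_reg: int) -> int:
    """Subtract P from the 32-bit accumulator (wraps on overflow in this model)."""
    return (acc - p_reg) & 0xFFFFFFFF


if __name__ == "__main__":
    p = mpy(0xFFFE, 3)                          # (-2) * 3 = -6
    print(hex(p), hex(sph(p)), hex(spl(p)))     # 0xfffffffa 0xffff 0xfffa
    acc = pac(0, p)
    acc = apac(acc, mpy(10, 10))                # -6 + 100
    print(to_signed(acc, 32))                   # 94
```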
FIGURE 77.8: Diagram of the TMS320C25 multiplier and ALU.
When moving the contents of the P-register to the accumulator, you can shift this number using the
built-in shifters. For instance, you can shift the result left by 1 or 4 locations (essentially multiplying
it by 2 or 16), or you can shift it right by 6 (essentially dividing it by 64). These operations are done
automatically, without spending any extra machine cycles, simply by setting the appropriate product
mode with the SPM instruction. Why would you want to do such shifting? The main purpose of the left
shifts is to eliminate the extra sign bits that would otherwise appear in the products. The right shift
scales down the result and permits accumulation of several products before you start worrying about
overflowing the accumulator.
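As a sketch of why these shifts are useful, the Python model below applies the four product-shift options named in the text to a 32-bit P-register value. The mode names are hypothetical labels (the SPM encoding is not reproduced here), and reading 16-bit words as fractions scaled by 2^15 is an illustrative assumption: under it, the product of two such words is scaled by 2^30 and carries one redundant sign bit, which the left shift by 1 removes. The second part checks the right-shift claim: without it, two worst-case products already overflow a signed 32-bit accumulator, while with it 64 of them still fit.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


# Shift applied on the way from P to the ALU; the mode names are hypothetical labels.
PRODUCT_SHIFTS = {"none": 0, "left1": 1, "left4": 4, "right6": -6}


def product_to_alu(p_reg: int, mode: str) -> int:
    """Signed value presented to the ALU for a 32-bit P-register pattern."""
    p = to_signed(p_reg, 32)
    shift = PRODUCT_SHIFTS[mode]
    shifted = p << shift if shift >= 0 else p >> -shift  # Python >> is arithmetic
    return to_signed(shifted, 32)                      # bits pushed past bit 31 are lost


if __name__ == "__main__":
    half = 0x4000                             # 0.5 when scaled by 2**15 (assumption)
    p = (half * half) & 0xFFFFFFFF            # 0x10000000: 0.25 scaled by 2**30
    aligned = product_to_alu(p, "left1")      # drop the redundant sign bit
    print(hex(aligned >> 16))                 # 0x2000, i.e. 0.25 scaled by 2**15

    worst = 0x40000000                        # (-32768) * (-32768)
    total = sum(product_to_alu(worst, "right6") for _ in range(64))
    print(total < 2**31)                      # True
```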
At this point, it is appropriate to discuss the data formats supported on the TMS320C25. This
device, like most fixed-point processors, uses two’s-complement notation to represent negative
numbers. In two’s-complement notation, to form the negative of a given number, you take the
complement of that number (invert every bit) and add 1. In two’s-complement notation, the most
significant bit (MSB, the left-most bit) of a positive number is zero, while the MSB of a negative number
is one. In the ‘C25, two’s-complement numbers are sign-extended, which means that, if the absolute value
of the number is not large enough to fill all the bits of the word, there will be more than one sign bit.
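The Python sketch below (helper names are hypothetical) shows both rules on 16-bit words: negation by inverting the bits and adding 1, and the observation that a number of small magnitude carries several copies of its sign bit at the top of the word.

```python
def negate16(x: int) -> int:
    """Two's-complement negation of a 16-bit word: invert every bit, then add 1."""
    return (~x + 1) & 0xFFFF


def to_signed16(word: int) -> int:
    """Value represented by a 16-bit two's-complement word."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def sign_bits(word: int, bits: int = 16) -> int:
    """Count the leading bits that equal the MSB (the sign bit and its copies)."""
    msb = (word >> (bits - 1)) & 1
    count = 0
    for i in range(bits - 1, -1, -1):
        if ((word >> i) & 1) != msb:
            break
        count += 1
    return count


if __name__ == "__main__":
    minus5 = negate16(5)
    print(hex(minus5), to_signed16(minus5))   # 0xfffb -5
    print(sign_bits(5), sign_bits(minus5))    # 13 13
    print(sign_bits(0x4000))                  # 1: a large value leaves a single sign bit
```

Note that negating the most negative word, 8000h, returns 8000h again, since +32768 is not representable in 16 bits.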
As seen from Fig. 77.8, the multiplier path is not the only way to access the accumulator. Actually,
the ALU and the accumulator support a wealth of arithmetic (ADD, SUB, etc.) and logical (OR,
AND, XOR, etc.) instructions, in addition to load and store instructions for the accumulator (LAC,
ZALH, SACL, SACH, etc.).
FIGURE 77.9: Partial memory configuration of the TMS320C25 after the CNFD and the CNFP
instructions.
An interesting characteristic of the TMS320C25 architecture is the existence of several shifters that
can perform shifts in parallel with other operations. Except for the right shifter at the multiplier,
all the other shifters are left shifters. An input shifter to the ALU and the accumulator can shift the
input value to the left by up to 16 locations, while output shifters from the accumulator can shift
either the high or the low part of the accumulator by up to 7 locations to the left.
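The following Python sketch (names are hypothetical) models the shifters just described: an input shift of 0 to 16 places applied to a sign-extended 16-bit word on its way into the 32-bit accumulator, and an output shift of 0 to 7 places applied when the high or low half of the accumulator is stored. It mirrors only the shift ranges stated above; which instructions use which shifter is left to the User's Guide.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def input_shift(word: int, shift: int) -> int:
    """Sign-extend a 16-bit word and shift it left by 0..16 into a 32-bit ACC pattern."""
    if not 0 <= shift <= 16:
        raise ValueError("input shift must be 0..16")
    return (to_signed(word, 16) << shift) & 0xFFFFFFFF


def store_high(acc: int, shift: int = 0) -> int:
    """High half of ACC after a left shift of 0..7 (the ACC itself is unchanged)."""
    if not 0 <= shift <= 7:
        raise ValueError("output shift must be 0..7")
    return ((acc << shift) >> 16) & 0xFFFF


def store_low(acc: int, shift: int = 0) -> int:
    """Low half of ACC after a left shift of 0..7 (the ACC itself is unchanged)."""
    if not 0 <= shift <= 7:
        raise ValueError("output shift must be 0..7")
    return (acc << shift) & 0xFFFF


if __name__ == "__main__":
    acc = input_shift(0xFFF0, 4)              # -16 shifted left by 4 = -256
    print(hex(acc))                           # 0xffffff00
    print(hex(store_high(0x00018000, 1)))     # 0x3
```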
A construct that appears very often in mathematical computations is the sum of products. Sums
of products appear in the computation of dot products, in matrix multiplication, and in convolution
sums for filtering, among other applications. Since it is important to carry out this computation as
fast as possible for real-time operation, all digital signal processors have special instructions to speed
up this particular function.
The TMS320C25 has the instruction LTA which loads the T-register and, in parallel with that, adds
the previous product (which already resides in the P-register) to the accumulator. LTS subtracts the
product from the accumulator. Another instruction, LTD, does the same thing as LTA, but it also
moves the value that was just loaded on the T-register to the next higher location in memory. This
move realizes the delay line that is needed in filtering applications. LTA, when combined with the
MPY instruction, can implement the sum of products very efficiently.
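A behavioral Python model of an LTD/MPY loop is sketched below (the function name and memory layout are assumptions for illustration, not TI code). It assumes the samples are stored with x(n-k) at word k of a buffer with one spare word at the end, that P is cleared before the loop, and that the taps are processed from the oldest sample to the newest, so that each LTD copies a sample one word higher only after that word's previous contents have been used. A final accumulate step adds the last product. Arithmetic uses unbounded Python integers, so 32-bit overflow is ignored.

```python
def fir_ltd_mpy(mem: list[int], h: list[int], new_sample: int) -> int:
    """One output of an N-tap FIR filter, modeled on an LTD/MPY instruction loop.

    mem[k] holds x(n-k) for k = 0..N-1 and mem[N] is a spare word that receives
    the sample shifted out of the delay line; h[k] is the k-th coefficient.
    """
    n_taps = len(h)
    if len(mem) != n_taps + 1:
        raise ValueError("delay line needs N + 1 words")
    mem[0] = new_sample                  # newest sample x(n)
    acc = 0                              # accumulator cleared
    p = 0                                # P register cleared before the loop
    for k in range(n_taps - 1, -1, -1):  # oldest sample first
        t = mem[k]                       # LTD: T <- x(n-k) ...
        acc += p                         # ... ACC <- ACC + previous product ...
        mem[k + 1] = mem[k]              # ... and the delay-line move
        p = t * h[k]                     # MPY: P <- T * h(k)
    acc += p                             # final accumulate of the last product
    return acc


if __name__ == "__main__":
    h = [1, 2, 3]
    mem = [0] * (len(h) + 1)
    print([fir_ltd_mpy(mem, h, x) for x in [1, 0, 0, 0]])   # [1, 2, 3, 0]
```

Feeding a unit impulse returns the coefficients in order, which is a quick sanity check that the delay line and the product pipeline line up.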
For even higher efficiency, there is a MAC instruction that combines LTA and MPY. An additional
MACD instruction combines LTD and MPY. The increased efficiency is achieved by using both the
data and the program buses to bring in the operands of the multiplication. The data coming from
the data bus can be addressed in memory by an AR, using indirect addressing. The data coming from the
program bus are addressed by the program counter (actually, the prefetch counter, PFC) and, hence,
they must reside in consecutive locations of program memory. To be able to modify the data and
then use them in such multiply-add operations, the TMS320C25 permits reconfiguration of block B0
in the on-chip memory. B0 can be configured either as program or as data memory, as shown in
Fig. 77.9, using the CNFD and CNFP instructions.
77.5 Other Architectural Features of the TMS320C25
The TMS320C25 has many interesting features and capabilities that can be found in the user’s
guide [1]. Here, we present briefly only the most important of them.
The program counter is a 16-bit register, hidden from the user, which contains the address of
the next instruction word to be fetched and executed. Occasionally, the program execution may be
redirected, for instance, through a subroutine call. In this case, it is necessary to save the contents
of the program counter so that the program flow continues from the correct instruction after the
completion of the subroutine call. For this purpose, a hardware stack is provided to save and recover
the contents of the program counter.
The hardware stack is a set of eight registers, of which only the top one is accessible to the user.
Upon a subroutine call, the address after the subroutine call is pushed on the stack, and it is reinstated
in the program counter when the execution returns from the subroutine call. The programmer has
control over the stack by using the PUSH, PSHD, POP, and POPD instructions. The PUSH and
POP operations push the accumulator on the stack or pop the top of the stack to the accumulator
respectively. PSHD and POPD do the same functions but with memory locations instead of the
accumulator.
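The Python class below is a minimal sketch of an eight-level stack of this kind (the class name is hypothetical). What happens on a ninth push or on popping an empty stack is an assumption here: the model lets the deepest entry fall off on overflow and duplicates the deepest entry on underflow, as a simple shift-register implementation would. Consult the User's Guide for the device's actual behavior.

```python
class HardwareStack:
    """Eight-level stack of 16-bit words; only the top level is visible."""

    DEPTH = 8

    def __init__(self) -> None:
        self.levels = [0] * self.DEPTH   # levels[0] is the top of the stack

    def push(self, value: int) -> None:
        """Shift every level down one place; the deepest entry is lost (assumed)."""
        self.levels = [value & 0xFFFF] + self.levels[:-1]

    def pop(self) -> int:
        """Return the top; levels shift up and the deepest entry is duplicated (assumed)."""
        top = self.levels[0]
        self.levels = self.levels[1:] + [self.levels[-1]]
        return top


if __name__ == "__main__":
    stack = HardwareStack()
    stack.push(0x0123)      # e.g., return address saved by a subroutine call
    stack.push(0x0456)      # a nested call
    print(hex(stack.pop()), hex(stack.pop()))   # 0x456 0x123
```

The fixed depth is the practical point: more than eight levels of nested calls and interrupts, or unbalanced PUSH/POP pairs, silently corrupt the return path, so software stacks in data memory are needed for deeper nesting.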
Occasionally, the program execution in a processor must be interrupted in order to take care
of urgent functions, such as receiving data from external sources. In these cases, a special signal
goes to the processor, and an interrupt occurs. The interrupts can be internal or external. During
an interrupt, the processor stops execution, wherever it may be, pushes the address of the next
instruction on the stack, and starts executing from a predetermined location in memory. The
interrupt approach is appropriate when there are functions or devices that need immediate attention.
On the TMS320C25, there are several internal and external interrupts, which are prioritized; i.e., when
several of the interrupts occur at the same time, the one with the highest priority is serviced first.
Typically, the memory location where the execution is directed during an interrupt contains a
branch instruction. This branch instruction directs the program execution to an area in the program
memory where an interrupt service routine exists. The interrupt service routine will perform the tasks
that the interrupt has been designed for, and then return to the execution of the original program.
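Prioritized servicing can be sketched in a few lines of Python. The interrupt names and their order in the list below are assumptions made for illustration (consult the User's Guide for the device's actual list and priorities), the `enabled` set stands in for the interrupt mask, and clearing the pending flag on service is likewise part of the model rather than a statement from the text.

```python
from typing import Optional

# Assumed priority order, highest first; names are illustrative.
PRIORITY = ["INT0", "INT1", "INT2", "TINT", "RINT", "XINT"]


def select_interrupt(pending: set[str], enabled: set[str]) -> Optional[str]:
    """Return the highest-priority interrupt that is both pending and unmasked."""
    for name in PRIORITY:
        if name in pending and name in enabled:
            return name
    return None


def service(next_pc: int, stack: list[int], pending: set[str], enabled: set[str]) -> Optional[str]:
    """Push the return address and clear the chosen interrupt's pending flag."""
    chosen = select_interrupt(pending, enabled)
    if chosen is not None:
        stack.append(next_pc)     # address of the next instruction of the interrupted program
        pending.discard(chosen)
    return chosen


if __name__ == "__main__":
    print(select_interrupt({"RINT", "INT1"}, set(PRIORITY)))   # INT1
    print(select_interrupt({"RINT", "INT1"}, {"RINT"}))        # RINT (INT1 masked)
```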
Besides the external hardware interrupts (for which there are dedicated pins on the device), there
are internal interrupts generated by the serial port and the timer. The serial port provides direct
communication with serial devices, such as codecs, serial analog-to-digital converters, etc. In these
devices, the data are transmitted serially, one bit at a time, and not in parallel, which would require
several parallel lines. When 16 bits have been input, the 16-bit word can be retrieved from the register
DRR (data receive register). Conversely, to transmit a word, you put it in the DXR (data transmit
register). These two registers occupy data memory locations 0 and 1, respectively, and they can be
treated like any other memory location.
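The following Python sketch models the receive side as a shift register that assembles incoming bits and, after 16 of them, copies the word into DRR at data memory location 0 and flags a receive interrupt. The class, shift-register, and flag names are hypothetical, and the assumption that bits arrive most significant first is made for illustration only.

```python
DRR, DXR = 0, 1   # data memory locations of the receive and transmit registers


class SerialReceiver:
    """Toy receive path: bits shift in, each complete 16-bit word lands in DRR."""

    def __init__(self, data_memory: list[int]) -> None:
        self.mem = data_memory
        self.shift_reg = 0
        self.bit_count = 0
        self.receive_interrupt = False

    def clock_in(self, bit: int) -> None:
        """Accept one bit (assumed MSB first) on a receive clock edge."""
        self.shift_reg = ((self.shift_reg << 1) | (bit & 1)) & 0xFFFF
        self.bit_count += 1
        if self.bit_count == 16:
            self.mem[DRR] = self.shift_reg   # word now readable like any memory location
            self.bit_count = 0
            self.receive_interrupt = True


if __name__ == "__main__":
    memory = [0] * 16
    rx = SerialReceiver(memory)
    for i in range(15, -1, -1):
        rx.clock_in((0xA5C3 >> i) & 1)
    print(hex(memory[DRR]), rx.receive_interrupt)   # 0xa5c3 True
```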
The timer consists of a period register and a timer register. At the beginning of the operation, the
contents of the period register are loaded on the timer register, which is then decremented at every
machine cycle. When the value of the timer register reaches zero, it generates a timer interrupt, the
period register is loaded again on the timer register, and the whole operation is repeated.
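A cycle-by-cycle Python sketch of this reload behavior follows (the function name is hypothetical). The assumption that the reload itself occupies one cycle is part of the model, not a statement from the text; under it, interrupts are spaced PRD + 1 cycles apart, so at a 100-ns cycle a period value of 1249 would give an interrupt every 1250 cycles, i.e., every 125 µs (8 kHz). Check the User's Guide for the exact timing before relying on it.

```python
def timer_interrupts(prd: int, cycles: int) -> list[int]:
    """Machine cycles (1-based) on which the timer register reaches zero.

    The timer register starts at the period value, decrements once per cycle,
    and is reloaded from the period register on the cycle after it hits zero
    (that one-cycle reload is an assumption of this model).
    """
    if prd < 1:
        raise ValueError("period must be at least 1 in this model")
    tim = prd
    hits = []
    for cycle in range(1, cycles + 1):
        if tim == 0:
            tim = prd                # reload from the period register
        else:
            tim -= 1
            if tim == 0:
                hits.append(cycle)   # timer interrupt requested
    return hits


if __name__ == "__main__":
    print(timer_interrupts(3, 12))   # [3, 7, 11]: one interrupt every PRD + 1 = 4 cycles
```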
77.6 TMS320C25 Instruction Set
The TMS320C25 has an instruction set consisting of 133 instructions. Some of these assembly
language instructions perform general purpose operations, while others are more specific to DSP
applications. This section discusses examples of instructions selected from different groups. For a
detailed description of each instruction, the reader is referred to the TMS320C25 User’s Guide [1].
Each instruction is represented by one or two 16-bit words. Part of the instruction is a unique code