ARM Architecture Reference Manual- P26

VFP Addressing Modes

5.1.3

Scalar operations
If the destination register lies in the first bank of eight registers, the instruction specifies a scalar operation:
if d_bank == 0 then
    vec_len = 1
    Sd[0] = d_num
    Sn[0] = n_num
    Sm[0] = m_num


Note
Source operands
The source operands are always scalars, regardless of which bank they are in. This
allows individual elements of vectors to be used as scalars.

ARM DDI 0100E

Copyright © 1996-2000 ARM Limited. All rights reserved.

5.1.4

Mixed vector/scalar operations
If the destination register specified in the instruction does not lie in the first bank of eight registers, but the
second source register does, then the destination register and first source register specify vectors and the
second source register specifies a scalar:
if d_bank != 0 and m_bank == 0 then
    vec_len = vector length specified by FPSCR
    for i = 0 to vec_len-1
        Sd[i] = (d_bank << 3) | d_index
        Sn[i] = (n_bank << 3) | n_index
        Sm[i] = m_num
        d_index = d_index + (vector stride specified by FPSCR)
        if d_index > 7 then
            d_index = d_index - 8
        n_index = n_index + (vector stride specified by FPSCR)
        if n_index > 7 then
            n_index = n_index - 8

Notes
First source operand
The first operand is always a vector, regardless of which bank it is in. This allows a set of
consecutive registers in the first bank to be treated as a vector.
Vector wrap-around
A vector operand must not wrap around so that it re-uses its first element. Otherwise, the
results of the instruction are UNPREDICTABLE. When the FPSCR specifies a vector stride of
1, this is not a restriction, because the vector length is at most 8. When the FPSCR specifies
a vector stride of 2, it implies that the vector length must be at most 4.
Operand overlap
If two operands overlap, they must be identical both in terms of which registers are accessed
and the order in which they are accessed. Otherwise, the results of the instruction are
UNPREDICTABLE. This implies that:


If the set of register numbers generated in Sd[i] overlaps the set of register numbers
generated in Sn[i], then d_num and n_num must be identical.



If the set of register numbers generated in Sn[i] includes m_num, the vector length
must be 1.

It is impossible for the set of register numbers generated in Sd[i] to include m_num,
because they lie in different banks.
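
The wrap-around restriction can be checked mechanically. The following Python sketch is illustrative only and is not part of the architecture definition. The function name vector_wraps and its bank_size parameter are assumptions made for this example. It steps an index through a bank in the same way as the pseudo-code and reports whether an element would re-use a register that an earlier element already used.

```python
def vector_wraps(vec_len: int, stride: int, bank_size: int = 8) -> bool:
    """Return True if a vector would re-use a register it has already used.

    bank_size is the number of registers per bank (8 for the
    single-precision banks described in this section).
    """
    index = 0          # the starting index does not change the answer
    visited = set()
    for _ in range(vec_len):
        if index in visited:
            return True
        visited.add(index)
        # Same effect as "if index > 7 then index = index - 8" for small strides.
        index = (index + stride) % bank_size
    return False
```

With a stride of 2 in an eight-register bank, the sketch accepts a length of 4 and rejects a length of 5, matching the limit given in the note.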


5.1.5

Vector operations
If neither the destination register nor the second source register lies in the first bank of eight registers, then
all register operands specify vectors:
if d_bank != 0 and m_bank != 0 then
    vec_len = vector length specified by FPSCR
    for i = 0 to vec_len-1
        Sd[i] = (d_bank << 3) | d_index
        Sn[i] = (n_bank << 3) | n_index
        Sm[i] = (m_bank << 3) | m_index
        d_index = d_index + (vector stride specified by FPSCR)
        if d_index > 7 then
            d_index = d_index - 8
        n_index = n_index + (vector stride specified by FPSCR)
        if n_index > 7 then
            n_index = n_index - 8
        m_index = m_index + (vector stride specified by FPSCR)
        if m_index > 7 then
            m_index = m_index - 8

Notes
Vector wrap-around
A vector operand must not wrap around so that it re-uses its first element.
Otherwise, the results of the instruction are UNPREDICTABLE. When the FPSCR
specifies a vector stride of 1, this is not a restriction, since the vector length is at
most 8. When the FPSCR specifies a vector stride of 2, it implies that the vector
length must be at most 4.
Operand overlap
If two operands overlap, they must be identical both in terms of which registers are
accessed and the order in which they are accessed. Otherwise, the results of the
instruction are UNPREDICTABLE. This implies that:

If the set of register numbers generated in Sd[i] overlaps the set of register
numbers generated in Sm[i], then d_num and m_num must be identical.

If the set of register numbers generated in Sd[i] overlaps the set of register
numbers generated in Sn[i], then d_num and n_num must be identical.

If the set of register numbers generated in Sn[i] overlaps the set of register
numbers generated in Sm[i], then n_num and m_num must be identical.
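
The three single-precision forms (scalar, mixed vector/scalar, and vector) can be combined into one routine. The Python sketch below is a hedged illustration rather than a reference implementation. The function name sp_registers and the parameters fpscr_len and fpscr_stride are assumptions standing in for the vector length and stride read from the FPSCR. The sketch uses a modulo operation in place of the "if index > 7 then index = index - 8" step, which gives the same result for the strides discussed here.

```python
def sp_registers(d_num, n_num, m_num, fpscr_len, fpscr_stride):
    """Return (Sd, Sn, Sm): the register numbers used by each individual operation."""
    d_bank, d_index = d_num >> 3, d_num & 7
    n_bank, n_index = n_num >> 3, n_num & 7
    m_bank, m_index = m_num >> 3, m_num & 7

    # Scalar form: destination in the first bank, exactly one operation.
    if d_bank == 0:
        return [d_num], [n_num], [m_num]

    sd, sn, sm = [], [], []
    for _ in range(fpscr_len):
        sd.append((d_bank << 3) | d_index)
        sn.append((n_bank << 3) | n_index)
        if m_bank == 0:
            # Mixed form: the second source stays fixed as a scalar.
            sm.append(m_num)
        else:
            # Vector form: the second source scans through its bank as well.
            sm.append((m_bank << 3) | m_index)
            m_index = (m_index + fpscr_stride) % 8
        d_index = (d_index + fpscr_stride) % 8
        n_index = (n_index + fpscr_stride) % 8
    return sd, sn, sm
```

The sketch does not check the wrap-around or operand overlap restrictions, so a caller would need to reject UNPREDICTABLE combinations separately.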

5.2

Addressing Mode 2 - Double-precision vectors (non-monadic)

Bits    31-28   27-24     23   22   21-20   19-16   15-12   11-8      7   6    5   4   3-0
Field   cond    1 1 1 0   Op   0    Op      Dn      Dd      1 0 1 1   0   Op   0   0   Dm


When the vector length indicated by the FPSCR is greater than 1, the double-precision two-operand
instructions FADDD, FDIVD, FMULD, FNMULD, and FSUBD can specify three different types of behavior:

One arithmetic operation between two scalar values, yielding a scalar:
ScalarA op ScalarB → ScalarD

When this case is selected (see Scalar operations on page C5-11), it causes just one operation to be
performed, overriding the vector length specified in the FPSCR. This allows scalar operations and
vector operations to be mixed without the need to reprogram the FPSCR between them.

A set of N arithmetic operations, where N is the vector length specified in the FPSCR, with the first
operand scanning through a vector, the second operand remaining constant and the destination
scanning through a vector:
VectorA[0] op ScalarB → VectorD[0]
VectorA[1] op ScalarB → VectorD[1]
...
VectorA[N-1] op ScalarB → VectorD[N-1]

This can be abbreviated to:
VectorA op ScalarB → VectorD

A set of N arithmetic operations, where N is the vector length specified in the FPSCR, with both
operands and the destination scanning through vectors:
VectorA[0] op VectorB[0] → VectorD[0]
VectorA[1] op VectorB[1] → VectorD[1]
...
VectorA[N-1] op VectorB[N-1] → VectorD[N-1]

This can be abbreviated to:
VectorA op VectorB → VectorD

The double-precision three-operand instructions FMACD, FMSCD, FNMACD and FNMSCD each use the same
register for their addition/subtraction operand as for their destination. So they have three forms
corresponding to the above three:


A pure scalar form:
± (ScalarA * ScalarB) ± ScalarD → ScalarD



A form in which the second multiplication operand is a scalar and everything else scans through
vectors:
± (VectorA[0] * ScalarB) ± VectorD[0] → VectorD[0]
± (VectorA[1] * ScalarB) ± VectorD[1] → VectorD[1]
...
± (VectorA[N-1] * ScalarB) ± VectorD[N-1] → VectorD[N-1]

This can be abbreviated to:
± (VectorA * ScalarB) ± VectorD → VectorD



A form in which everything scans through a vector:
± (VectorA[0] * VectorB[0]) ± VectorD[0] → VectorD[0]
± (VectorA[1] * VectorB[1]) ± VectorD[1] → VectorD[1]
...
± (VectorA[N-1] * VectorB[N-1]) ± VectorD[N-1] → VectorD[N-1]

This can be abbreviated to:
± (VectorA * VectorB) ± VectorD → VectorD
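
The element-by-element behavior of the three-operand forms can be modelled in a few lines of Python. This is a sketch under stated assumptions: the function name multiply_accumulate is hypothetical, the register file is modelled as a plain list of 16 floats, and only the sign combination +(A * B) + D is shown, since the choice of signs depends on the individual instruction. The lists dd, dn and dm hold the register numbers used by each individual operation.

```python
def multiply_accumulate(regs, dd, dn, dm):
    """Perform regs[d] = regs[d] + regs[n] * regs[m] for each element in turn."""
    for d, n, m in zip(dd, dn, dm):
        regs[d] = regs[d] + regs[n] * regs[m]


# Mixed vector/scalar form with N = 2:
# VectorD = d4-d5, VectorA = d8-d9, ScalarB = d0 (repeated for every element).
regs = [0.0] * 16
regs[0] = 2.0
regs[8], regs[9] = 1.0, 3.0
regs[4], regs[5] = 10.0, 20.0
multiply_accumulate(regs, dd=[4, 5], dn=[8, 9], dm=[0, 0])
```

Passing single-element lists models the pure scalar form, and passing a scanning list for dm models the form in which everything scans through a vector.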

5.2.1

Register banks
To allow these various forms to be specified, the set of 16 double-precision registers is split into four banks,
each of four registers. The form used by an instruction depends on which operands are in the first bank. The
general principle behind the rules is that the first bank must be used to hold scalar operands while the other
banks are used to hold vector operands. All destination register writes and many source register reads adhere
to this principle, but some source register reads can result in scalar access to vector elements or vector
accesses to groups of scalars.
A vector operand consists of 2-4 registers from a single bank, with the number of registers being specified
by the vector length field of the FPSCR (see Vector length/stride control on page C2-22). The register
number in the instruction specifies the register that contains the first element of the vector. Each successive
element of the vector is formed by incrementing the register number by the value specified by the vector
stride field of the FPSCR. If this causes the register number to overflow the top of the register bank, the
register number wraps around to the bottom of the bank, as shown in Figure 5-2.

Scalar bank   Vector bank   Vector bank   Vector bank
d0            d4            d8            d12
d1            d5            d9            d13
d2            d6            d10           d14
d3            d7            d11           d15

Figure 5-2 Double-precision register banks
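
The wrap-around within a double-precision bank can be illustrated with a short Python sketch. The function name dp_vector is an assumption made for this example, and vec_len and stride stand in for the values held in the FPSCR. The sketch lists the registers that make up a vector operand, given the register number in the instruction.

```python
def dp_vector(first_reg, vec_len, stride):
    """Return the double-precision register numbers forming a vector operand."""
    bank, index = first_reg >> 2, first_reg & 3   # four banks of four registers
    registers = []
    for _ in range(vec_len):
        registers.append((bank << 2) | index)
        index = (index + stride) % 4              # wrap to the bottom of the bank
    return registers
```

For example, a vector of length 3 and stride 1 starting at d6 consists of d6, d7 and d4, because the register number wraps within the second bank.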

5.2.2

Operation

The following pages describe each of the three possible forms of the addressing mode:

Scalar operations on page C5-11

Mixed vector/scalar operations on page C5-12

Vector operations on page C5-13.

In each case, the following values are generated:
vec_len
The number of individual operations specified by the instruction.
Dd[0] ... Dd[vec_len-1]
Destination registers of the individual operations.
Dn[0] ... Dn[vec_len-1]
First source registers of the individual operations.
Dm[0] ... Dm[vec_len-1]
Second source registers of the individual operations.
The register numbers specified in the instruction are broken up into bank numbers and indices within the
banks as follows:
d_bank = Dd[3:2]
d_index = Dd[1:0]
n_bank = Dn[3:2]
n_index = Dn[1:0]
m_bank = Dm[3:2]
m_index = Dm[1:0]

Note
The case where the FPSCR specifies a vector length of 1 is not in fact a special case, since the rules for all
three forms of the addressing mode simplify to the following when the vector length is 1:
vec_len = 1
Dd[0] = Dd
Dn[0] = Dn
Dm[0] = Dm

5.2.3

Scalar operations
If the destination register lies in the first bank of four registers, the instruction specifies a scalar operation:
if d_bank == 0 then
    vec_len = 1
    Dd[0] = Dd
    Dn[0] = Dn
    Dm[0] = Dm

Notes
Source operands
The source operands are always scalars, regardless of which bank they are in. This
allows individual elements of vectors to be used as scalars.

5.2.4

Mixed vector/scalar operations
If the destination register specified in the instruction does not lie in the first bank of four registers, but the
second source register does, then the destination register and first source register specify vectors and the
second source register specifies a scalar:

if d_bank != 0 and m_bank == 0 then
    vec_len = vector length specified by FPSCR
    for i = 0 to vec_len-1
        Dd[i] = (d_bank << 2) | d_index
        Dn[i] = (n_bank << 2) | n_index
        Dm[i] = Dm
        d_index = d_index + (vector stride specified by FPSCR)
        if d_index > 3 then
            d_index = d_index - 4
        n_index = n_index + (vector stride specified by FPSCR)
        if n_index > 3 then
            n_index = n_index - 4

Notes
First source operand
The first operand is always a vector, regardless of which bank it is in. This allows a
set of consecutive registers in the first bank to be treated as a vector.
Vector wrap-around
A vector operand must not wrap around so that it re-uses its first element.
Otherwise, the results of the instruction are UNPREDICTABLE. When the FPSCR
specifies a vector stride of 1, this implies that the vector length must be at most 4.
When the FPSCR specifies a vector stride of 2, it implies that the vector length must
be at most 2.
Operand overlap
If two operands overlap, they must be identical both in terms of which registers are
accessed and the order in which they are accessed. Otherwise, the results of the
instruction are UNPREDICTABLE. This implies that:

If the set of register numbers generated in Dd[i] overlaps the set of register
numbers generated in Dn[i], then Dd and Dn must be identical.

If the set of register numbers generated in Dn[i] includes Dm, then the vector
length must be 1.

It is impossible for the set of register numbers generated in Dd[i] to include Dm,
because they lie in different banks.

5.2.5

Vector operations
If neither the destination register nor the second source register lies in the first bank of four registers, then
all register operands specify vectors:
if d_bank != 0 and m_bank != 0 then
    vec_len = vector length specified by FPSCR
    for i = 0 to vec_len-1
        Dd[i] = (d_bank << 2) | d_index
        Dn[i] = (n_bank << 2) | n_index
        Dm[i] = (m_bank << 2) | m_index
        d_index = d_index + (vector stride specified by FPSCR)
        if d_index > 3 then
            d_index = d_index - 4
        n_index = n_index + (vector stride specified by FPSCR)
        if n_index > 3 then
            n_index = n_index - 4
        m_index = m_index + (vector stride specified by FPSCR)
        if m_index > 3 then
            m_index = m_index - 4

Notes
Vector wrap-around
A vector operand must not wrap around so that it re-uses its first element.
Otherwise, the results of the instruction are UNPREDICTABLE. When the FPSCR
specifies a vector stride of 1, this implies that the vector length must be at most 4.
When the FPSCR specifies a vector stride of 2, it implies that the vector length must
be at most 2.
Operand overlap
If two operands overlap, they must be identical both in terms of which registers are
accessed and the order in which they are accessed. Otherwise, the results of the
instruction are UNPREDICTABLE. This implies that:

If the set of register numbers generated in Dd[i] overlaps the set of register
numbers generated in Dm[i], then Dd and Dm must be identical.

If the set of register numbers generated in Dd[i] overlaps the set of register
numbers generated in Dn[i], then Dd and Dn must be identical.

If the set of register numbers generated in Dn[i] overlaps the set of register
numbers generated in Dm[i], then Dn and Dm must be identical.
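
The operand overlap rule can be expressed as a simple predicate over the generated register lists. The following Python sketch is illustrative only. The function name overlap_is_valid is an assumption, and each argument is taken to be the ordered list of register numbers generated for one operand (for example Dd[0] ... Dd[vec_len-1]).

```python
def overlap_is_valid(operand_a, operand_b):
    """Return True if two operands either do not overlap or are identical.

    Operands that share any register must access the same registers
    in the same order; otherwise the instruction is UNPREDICTABLE.
    """
    if set(operand_a) & set(operand_b):
        return list(operand_a) == list(operand_b)
    return True
```

A tool that generates or checks VFP vector code could apply this predicate to each pair of operands before accepting an instruction.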

5.3

Addressing Mode 3 - Single-precision vectors (monadic)
Bits    31-28   27-20             19-16   15-12   11-8      7    6   5   4   3-0
Field   cond    1 1 1 0 1 D 1 1   Op      Fd      1 0 1 0   Op   1   M   0   Fm


When the vector length indicated by the FPSCR is greater than 1, the single-precision one-operand
instructions FABSS, FCPYS, FNEGS, and FSQRTS can specify three different types of behavior:


An operation on a scalar value, yielding a scalar:
Op(ScalarB) → ScalarD

When this case is selected (see Scalar-to-scalar operations on page C5-16), it causes just one
operation to be performed, overriding the vector length specified in the FPSCR. This allows scalar
operations and vector operations to be mixed without the need to reprogram the FPSCR between
them.


An operation on a scalar value, whose result is written to each of the N elements of a vector, where
N is the vector length specified in the FPSCR:
Op(ScalarB) → VectorD[0]
Op(ScalarB) → VectorD[1]
...
Op(ScalarB) → VectorD[N-1]

This can be abbreviated to:
Op(ScalarB) → VectorD



A set of N operations, where N is the vector length specified in the FPSCR, with both the operand and
the destination scanning through vectors:
Op(VectorB[0]) → VectorD[0]
Op(VectorB[1]) → VectorD[1]
...
Op(VectorB[N-1]) → VectorD[N-1]

This can be abbreviated to:
Op(VectorB) → VectorD

To allow these various forms to be specified, the set of 32 single-precision registers is split into four banks,
each of eight registers. For a description of this, see Register banks on page C5-3.

5.3.1

Operation
The following pages describe each of the three possible forms of the addressing mode:

Scalar-to-scalar operations on page C5-16

Scalar-to-vector operations on page C5-17

Vector-to-vector operations on page C5-18.
In each case, the following values are generated:
vec_len

The number of individual operations specified by the instruction.

Sd[0] ... Sd[vec_len-1]
Destination registers of the individual operations.
Sm[0] ... Sm[vec_len-1]
Source registers of the individual operations.
In all cases, the registers specified by the instruction are determined by concatenating the Fd and Fm fields
of the instruction with the D and M bits respectively:

d_num = (Fd << 1) | D
m_num = (Fm << 1) | M

These register numbers are then broken up into bank numbers and indices within the banks as follows:
d_bank = d_num[4:3]
d_index = d_num[2:0]
m_bank = m_num[4:3]
m_index = m_num[2:0]
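
As a worked illustration of the two steps above, the following Python sketch forms a single-precision register number from a 4-bit field and its extra bit, then splits it into a bank number and an index. The function name split_register is hypothetical. The parameters field and low_bit stand for Fd and D, or equally for Fm and M.

```python
def split_register(field, low_bit):
    """Return (register number, bank, index) for a 4-bit field plus its low bit."""
    num = (field << 1) | low_bit   # for example d_num = (Fd << 1) | D
    bank = num >> 3                # bits [4:3] of the register number
    index = num & 0b111            # bits [2:0] of the register number
    return num, bank, index
```

With Fd = 0b0100 and D = 1, the register number is 9, which lies in bank 1 at index 1.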

Note
The case where the FPSCR specifies a vector length of 1 is not in fact a special case, since the rules for all
three forms of the addressing mode simplify to the following when the vector length is 1:
vec_len = 1
Sd[0] = d_num
Sm[0] = m_num

5.3.2

Scalar-to-scalar operations

If the destination register lies in the first bank of eight registers, the instruction specifies a scalar operation:
if d_bank == 0 then
    vec_len = 1
    Sd[0] = d_num
    Sm[0] = m_num

Notes
Source operands
The source operand is always a scalar, regardless of which bank it lies in. This
allows individual elements of vectors to be used as scalars.

5.3.3


Scalar-to-vector operations
If the destination register specified in the instruction does not lie in the first bank of eight registers, but the
source register does, then the destination register specifies a vector and the source register specifies a scalar:
if d_bank != 0 and m_bank == 0 then
    vec_len = vector length specified by FPSCR
    for i = 0 to vec_len-1
        Sd[i] = (d_bank << 3) | d_index
        Sm[i] = m_num
        d_index = d_index + (vector stride specified by FPSCR)
        if d_index > 7 then
            d_index = d_index - 8

Notes
Vector wrap-around
A vector operand must not wrap around so that it re-uses its first element.
Otherwise, the results of the instruction are UNPREDICTABLE. When the FPSCR
specifies a vector stride of 1, this is not a restriction, because the vector length is at
most 8. When the FPSCR specifies a vector stride of 2, it implies that the vector
length must be at most 4.
Operand overlap
If the source and destination overlap, they must be identical both in terms of which
registers are accessed and the order in which they are accessed. This implies that if
the set of register numbers generated in Sn[i] includes m_num, the vector length
must be 1.

5.3.4

Vector-to-vector operations
If neither the destination register nor the source register lies in the first bank of eight registers, then both
register operands specify vectors:
if d_bank != 0 and m_bank != 0 then
    vec_len = vector length specified by FPSCR
    for i = 0 to vec_len-1
        Sd[i] = (d_bank << 3) | d_index
        Sm[i] = (m_bank << 3) | m_index
        d_index = d_index + (vector stride specified by FPSCR)
        if d_index > 7 then
            d_index = d_index - 8
        m_index = m_index + (vector stride specified by FPSCR)
        if m_index > 7 then
            m_index = m_index - 8

Notes
Vector wrap-around
A vector operand must not wrap around so that it re-uses its first element.
Otherwise, the results of the instruction are UNPREDICTABLE. When the FPSCR
specifies a vector stride of 1, this is not a restriction, since the vector length is at
most 8. When the FPSCR specifies a vector stride of 2, it implies that the vector
length must be at most 4.
Operand overlap
If the source and destination overlap, they must be identical both in terms of which
registers are accessed and the order in which they are accessed. Otherwise, the
results of the instruction are UNPREDICTABLE. This implies that if the set of register
numbers generated in Sd[i] overlaps the set of register numbers generated in
Sm[i], d_num and m_num must be identical.

5.4

Addressing Mode 4 - Double-precision vectors (monadic)
Bits    31-28   27-20             19-16   15-12   11-8      7    6   5   4   3-0
Field   cond    1 1 1 0 1 0 1 1   Op      Dd      1 0 1 1   Op   1   0   0   Dm


When the vector length indicated by the FPSCR is greater than 1, the double-precision one-operand
instructions FABSD, FCPYD, FNEGD, and FSQRTD can specify three different types of behavior:

An operation on a scalar value, yielding a scalar:
Op(ScalarB) → ScalarD

When this case is selected (see Scalar-to-scalar operations on page C5-21), it causes just one
operation to be performed, overriding the vector length specified in the FPSCR. This allows scalar
operations and vector operations to be mixed without the need to reprogram the FPSCR between
them.

An operation on a scalar value, whose result is written to each of the N elements of a vector, where
N is the vector length specified in the FPSCR:
Op(ScalarB) → VectorD[0]
Op(ScalarB) → VectorD[1]
...
Op(ScalarB) → VectorD[N-1]

This can be abbreviated to:
Op(ScalarB) → VectorD

A set of N operations, where N is the vector length specified in the FPSCR, with both the operand and
the destination scanning through vectors:
Op(VectorB[0]) → VectorD[0]
Op(VectorB[1]) → VectorD[1]
...
Op(VectorB[N-1]) → VectorD[N-1]

This can be abbreviated to:
Op(VectorB) → VectorD

To allow these various forms to be specified, the set of 16 double-precision registers is split into four banks,
each of four registers. For a description of this, see Register banks on page C5-9.

5.4.1

Operation
The following pages describe each of the three possible forms of the addressing mode:

Scalar-to-scalar operations on page C5-21

Scalar-to-vector operations on page C5-22

Vector-to-vector operations on page C5-23.
In each case, the following values are generated:
vec_len
The number of individual operations specified by the instruction.
Dd[0] ... Dd[vec_len-1]
Destination registers of the individual operations.
Dm[0] ... Dm[vec_len-1]
Source registers of the individual operations.

The register numbers specified in the instruction are broken up into bank numbers and indices within the
banks as follows:
d_bank = Dd[3:2]
d_index = Dd[1:0]
m_bank = Dm[3:2]
m_index = Dm[1:0]

Note
The case where the FPSCR specifies a vector length of 1 is not in fact a special case, since the rules for all
three forms of the addressing mode simplify to the following when the vector length is 1:
vec_len = 1
Dd[0] = Dd
Dm[0] = Dm

5.4.2

Scalar-to-scalar operations
If the destination register lies in the first bank of four registers, the instruction specifies a scalar operation:
if d_bank == 0 then
    vec_len = 1
    Dd[0] = Dd
    Dm[0] = Dm

Notes
Source operands
The source operand is always a scalar, regardless of which bank it lies in. This
allows individual elements of vectors to be used as scalars.

5.4.3

Scalar-to-vector operations
If the destination register specified in the instruction does not lie in the first bank of four registers, but the
source register does, then the destination register specifies a vector and the source register specifies a scalar:
if d_bank != 0 and m_bank == 0 then
    vec_len = vector length specified by FPSCR
    for i = 0 to vec_len-1
        Dd[i] = (d_bank << 2) | d_index
        Dm[i] = Dm
        d_index = d_index + (vector stride specified by FPSCR)
        if d_index > 3 then
            d_index = d_index - 4

Notes
Vector wrap-around
A vector operand must not wrap around so that it re-uses its first element.
Otherwise, the results of the instruction are UNPREDICTABLE. When the FPSCR
specifies a vector stride of 1, this implies that the vector length must be at most 4.
When the FPSCR specifies a vector stride of 2, it implies that the vector length must
be at most 2.
Operand overlap
If the source and destination overlap, they must be identical both in terms of which
registers are accessed and the order in which they are accessed. This implies that if
the set of register numbers generated in Dn[i] includes Dm, the vector length must
be 1.

5.4.4

Vector-to-vector operations
If neither the destination register nor the source register lies in the first bank of four registers, then both
register operands specify vectors:
if d_bank != 0 and m_bank != 0 then
    vec_len = vector length specified by FPSCR
    for i = 0 to vec_len-1
        Dd[i] = (d_bank << 2) | d_index
        Dm[i] = (m_bank << 2) | m_index
        d_index = d_index + (vector stride specified by FPSCR)
        if d_index > 3 then
            d_index = d_index - 4
        m_index = m_index + (vector stride specified by FPSCR)
        if m_index > 3 then
            m_index = m_index - 4

Notes
Vector wrap-around
A vector operand must not wrap around so that it re-uses its first element.
Otherwise, the results of the instruction are UNPREDICTABLE. When the FPSCR
specifies a vector stride of 1, this implies that the vector length must be at most 4.
When the FPSCR specifies a vector stride of 2, it implies that the vector length must
be at most 2.
Operand overlap
If the source and destination overlap, they must be identical both in terms of which
registers are accessed and the order in which they are accessed. Otherwise, the
results of the instruction are UNPREDICTABLE. This implies that if the set of register
numbers generated in Dd[i] overlaps the set of register numbers generated in Dm[i],
then Dd and Dm must be identical.
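
The three monadic double-precision forms can be modelled end to end as follows. This Python sketch is a hedged illustration and not a definitive implementation. The names dp_monadic_registers and run_monadic are hypothetical, fpscr_len and fpscr_stride stand in for the FPSCR vector length and stride, and the register file is modelled as a list of 16 floats. The sketch does not detect UNPREDICTABLE combinations.

```python
def dp_monadic_registers(dd, dm, fpscr_len, fpscr_stride):
    """Return (destination registers, source registers) for one instruction."""
    d_bank, d_index = dd >> 2, dd & 3
    m_bank, m_index = dm >> 2, dm & 3
    if d_bank == 0:
        return [dd], [dm]                      # scalar-to-scalar
    dst, src = [], []
    for _ in range(fpscr_len):
        dst.append((d_bank << 2) | d_index)
        d_index = (d_index + fpscr_stride) % 4
        if m_bank == 0:
            src.append(dm)                     # scalar-to-vector
        else:
            src.append((m_bank << 2) | m_index)  # vector-to-vector
            m_index = (m_index + fpscr_stride) % 4
    return dst, src


def run_monadic(regs, op, dd, dm, fpscr_len, fpscr_stride):
    """Apply op to each source element and write the result to its destination."""
    dst, src = dp_monadic_registers(dd, dm, fpscr_len, fpscr_stride)
    for d, s in zip(dst, src):
        regs[d] = op(regs[s])
```

For instance, calling run_monadic with a negation function, Dd = 4, Dm = 1, a length of 3 and a stride of 1 writes the negated value of d1 to d4, d5 and d6.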

5.5

Addressing Mode 5 - VFP load/store multiple
Bits    31-28   27-25   24   23   22   21   20   19-16   15-12   11-8     7-0
Field   cond    1 1 0   P    U    D    W    L    Rn      Fd      cp_num   offset


The VFP load multiple instructions (FLDMD, FLDMS, FLDMX) are examples of ARM LDC instructions,
whose addressing modes are described in Addressing Mode 5 - Load and Store Coprocessor on page A5-56.
Similarly, the VFP store multiple instructions (FSTMD, FSTMS, FSTMX) are examples of ARM STC
instructions, which have the same addressing modes. However, the full range of LDC/STC addressing
modes is not available for the VFP load multiple and store multiple instructions. This is partly because the
FLDD, FLDS, FSTD and FSTS instructions use some of the options, and partly because the
8_bit_offset field in the LDC/STC instruction is used for additional purposes in the VFP instructions.
This section gives details of the LDC/STC addressing modes that are allowed for the VFP load multiple and
store multiple instructions, and the assembler syntax for each option.

5.5.1

Summary
Whether an LDC/STC addressing mode is allowed for the VFP load multiple and store multiple instructions
can be determined by looking at the P, U and W bits of the instruction. Table 5-1 shows details of this.

Table 5-1 VFP load/store addressing modes

P   U   W   Instructions                                 Mode
0   0   0   UNDEFINED                                    See Note
0   0   1   UNDEFINED                                    See Note
0   1   0   FLDMD, FLDMS, FLDMX, FSTMD, FSTMS, FSTMX     Unindexed
0   1   1   FLDMD, FLDMS, FLDMX, FSTMD, FSTMS, FSTMX     Increment
1   0   0   FLDD, FLDS, FSTD, FSTS                       (Negative offset)
1   0   1   FLDMD, FLDMS, FLDMX, FSTMD, FSTMS, FSTMX     Decrement
1   1   0   FLDD, FLDS, FSTD, FSTS                       (Positive offset)
1   1   1   UNDEFINED                                    See following note
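
The lookup described by Table 5-1 can be sketched in Python as a small dictionary keyed on the P, U and W bits. This is an illustration only. The names TABLE_5_1, puw_bits and classify are hypothetical, and the bit positions used (P in bit 24, U in bit 23, W in bit 21) are taken from the instruction format at the start of this section.

```python
MULTIPLE = "FLDMD, FLDMS, FLDMX, FSTMD, FSTMS, FSTMX"
SINGLE = "FLDD, FLDS, FSTD, FSTS"

# (P, U, W) -> (instructions, addressing mode); None marks UNDEFINED entries.
TABLE_5_1 = {
    (0, 0, 0): ("UNDEFINED", None),
    (0, 0, 1): ("UNDEFINED", None),
    (0, 1, 0): (MULTIPLE, "Unindexed"),
    (0, 1, 1): (MULTIPLE, "Increment"),
    (1, 0, 0): (SINGLE, "Negative offset"),
    (1, 0, 1): (MULTIPLE, "Decrement"),
    (1, 1, 0): (SINGLE, "Positive offset"),
    (1, 1, 1): ("UNDEFINED", None),
}


def puw_bits(instruction):
    """Extract the P, U and W bits from a 32-bit instruction word."""
    return (instruction >> 24) & 1, (instruction >> 23) & 1, (instruction >> 21) & 1


def classify(instruction):
    """Return the Table 5-1 entry selected by the instruction's P, U and W bits."""
    return TABLE_5_1[puw_bits(instruction)]
```

The sketch looks only at P, U and W. It does not check that the word is actually a VFP load or store, which a real decoder would also need to do.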

Note
For a hardware coprocessor implementation of the VFP instruction set, the UNDEFINED entries in Table 5-1
mean that the coprocessor does not respond to the instruction, which causes the ARM’s Undefined Instruction
exception to occur (see Undefined Instruction exception on page A2-15).
For a software implementation, the UNDEFINED entries mean that such instructions must be passed to the
system’s normal mechanism for dealing with non-coprocessor undefined instructions. The exact details of
this are system-dependent.

5.5.2

VFP load/store multiple - Unindexed
Bits    31-28   27-20             19-16   15-12   11-8     7-0
Field   cond    1 1 0 0 1 D 0 L   Rn      Fd      cp_num   offset



This addressing mode is for VFP load multiple and store multiple instructions, and forms a range of
addresses. The first address formed is the start_address, and is the value of the base register Rn. Subsequent
addresses are formed by incrementing the previous address by four.


For the FLDMS and FSTMS instructions, the offset in the instruction is equal to the number of
single-precision registers to be transferred. One address is generated for each register, so the
end_address is four less than the value of the base register Rn plus four times the offset.



For the FLDMD and FSTMD instructions, the offset in the instruction is equal to twice the number of
double-precision registers to be transferred. Two addresses are generated for each register, so the
end_address is four less than the value of the base register Rn plus four times the offset.



For the FLDMX and FSTMX instructions, the offset in the instruction is one more than twice the
number of double-precision registers to be transferred. The number of addresses generated is at most
equal to the offset, but can be a smaller number (decided by the implementor) provided the FLDMX and
FSTMX instructions function correctly (see FLDMX on page C4-425 and FSTMX on page C4-100).
Accordingly, the end_address is the value of the base register Rn plus four times the offset, minus
an IMPLEMENTATION DEFINED amount which is at least four.
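
The relationship between the number of registers transferred and the offset field can be summarized in a short Python sketch. The function name offset_field and the use of the letters S, D and X to select the instruction variant are assumptions made for this example.

```python
def offset_field(precision, reg_count):
    """Return the offset value encoded for a load/store multiple of reg_count registers."""
    if precision == "S":          # FLDMS / FSTMS: one word per register
        return reg_count
    if precision == "D":          # FLDMD / FSTMD: two words per register
        return 2 * reg_count
    if precision == "X":          # FLDMX / FSTMX: two words per register, plus one
        return 2 * reg_count + 1
    raise ValueError("precision must be 'S', 'D' or 'X'")
```

Transferring all 16 double-precision registers with the X variant gives an offset of 33, and the D variant of the same transfer gives 32.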

Instruction syntax
<opcode>IA<precision>{<cond>} <Rn>, <registers>

where:
<opcode>

Is FLDM or FSTM, and controls the value of the L bit.

<precision>

Is D, S or X, and controls the values of cp_num and offset[0].

<cond>

Is the condition under which the instruction is executed. The conditions are defined
in The condition field on page A3-5. If <cond> is omitted, the AL (always)
condition is used.

<Rn>

Specifies the base register. If R15 is specified for <Rn>, the value used is the
address of the instruction plus 8.

<registers>

Specifies the list of registers loaded or stored by the instruction. See the individual
instructions for details of which registers are specified and how Fd, D and offset are
set in the instruction.

Architecture version
All



Operation
if (offset[0] == 1) and (cp_num == 0b1011) then    /* FLDMX or FSTMX */
    word_count = IMPLEMENTATION DEFINED value (<= offset)
else                                               /* Others */
    word_count = offset
start_address = Rn
end_address = start_address + 4 * word_count - 4

Usage
For FLDMD, FLDMS, FSTMD and FSTMS, this addressing mode is typically used to load or store a short
vector. For example, to load a graphics point consisting of four single-precision coordinates into s8-s11, the
following code might be used:
ADR     Rn, Point
FLDMIAS Rn, {s8-s11}


For FLDMX and FSTMX, this addressing mode is typically used as part of loading and saving the VFP state
in process swap code, in sequences like:
; Assume Rp points to the process block
ADD     Rn, Rp, #Offset_to_VFP_register_dump
FSTMIAX Rn, {d0-d15}

Notes
Offset restrictions

The offset value must be at least 1 and at most 33. If the offset is 0 or greater than
33, the instruction is always UNPREDICTABLE. Each instruction also imposes further
restrictions on the offset, depending on the values of Fd and D. See the individual
instruction descriptions for details of these.


5.5.3

VFP load/store multiple - Increment
 31    28 27 26 25 24 23 22 21 20 19    16 15    12 11     8 7          0
|  cond  | 1  1  0| 0| 1| D| 1| L|   Rn   |   Fd   | cp_num |   offset   |

This addressing mode is for VFP load multiple and store multiple instructions, and forms a range of
addresses. The first address formed is the start_address, and is the value of the base register Rn. Subsequent
addresses are formed by incrementing the previous address by four.



For the FLDMS and FSTMS instructions, the offset in the instruction is equal to the number of
single-precision registers to be transferred. One address is generated for each register, so the
end_address is four less than the value of the base register Rn plus four times the offset.



For the FLDMD and FSTMD instructions, the offset in the instruction is equal to twice the number of
double-precision registers to be transferred. Two addresses are generated for each register, so the
end_address is four less than the value of the base register Rn plus four times the offset.



For the FLDMX and FSTMX instructions, the offset in the instruction is one more than twice the
number of double-precision registers to be transferred.

The number of addresses generated is at most equal to the offset, but can be a smaller number (decided by
the implementor) provided the FLDMX and FSTMX instructions function correctly (see FLDMX on
page C4-42 and FSTMX on page C4-100). Accordingly, the end_address is the value of the base register Rn
plus four times the offset, minus an IMPLEMENTATION DEFINED amount which is at least four.

For all instructions, if the condition specified in the instruction matches the condition code status (see The
condition field on page A3-5), Rn is incremented by four times the offset specified in the instruction.
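
The field layout in the encoding diagram for this addressing mode can be unpacked with a few shifts and masks. The Python sketch below is illustrative only: the function names and the dictionary keys bits_27_25, bit24, bit23 and bit21 are hypothetical labels for the fixed bits shown in the diagram, not names taken from this section.

```python
def decode_fields(instr: int) -> dict:
    """Split a 32-bit instruction word into the fields of the diagram."""
    return {
        "cond":       (instr >> 28) & 0xF,
        "bits_27_25": (instr >> 25) & 0x7,   # fixed pattern 1 1 0
        "bit24":      (instr >> 24) & 0x1,   # fixed 0 in this addressing mode
        "bit23":      (instr >> 23) & 0x1,   # fixed 1 in this addressing mode
        "D":          (instr >> 22) & 0x1,
        "bit21":      (instr >> 21) & 0x1,   # fixed 1 in this addressing mode
        "L":          (instr >> 20) & 0x1,
        "Rn":         (instr >> 16) & 0xF,
        "Fd":         (instr >> 12) & 0xF,
        "cp_num":     (instr >> 8) & 0xF,
        "offset":     instr & 0xFF,
    }


def is_increment_mode(instr: int) -> bool:
    """True if the fixed bits match the Increment addressing mode diagram."""
    f = decode_fields(instr)
    return (f["bits_27_25"] == 0b110 and f["bit24"] == 0
            and f["bit23"] == 1 and f["bit21"] == 1)
```

The check covers only the fixed bits in the diagram. It does not validate cp_num or apply the offset restrictions described in the Notes.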

Instruction syntax
<opcode>IA<precision>{<cond>}  <Rn>!, <registers>

where:
<opcode>

Is FLDM or FSTM, and controls the value of the L bit.

<precision>

Is D, S or X, and controls the values of cp_num and offset[0].

<cond>

Is the condition under which the instruction is executed. The conditions are defined
in The condition field on page A3-5. If <cond> is omitted, the AL (always)
condition is used.

<Rn>

Is the base register. If R15 is specified for <Rn>, the instruction is UNPREDICTABLE.

!

Indicates the base register writeback that occurs in this addressing mode. If it is
omitted, this is the Unindexed addressing mode (see VFP load/store multiple - Unindexed
on page C5-26) instead.

<registers>

Specifies the list of registers loaded or stored by the instruction. For details of which
registers are specified and how Fd, D and offset are set, see individual instructions.


Architecture version
All

Operation
if (offset[0] == 1) and (cp_num == 0b1011) then    /* FLDMX or FSTMX */
    word_count = IMPLEMENTATION DEFINED value (<= offset)
else                                               /* Others */
    word_count = offset
start_address = Rn
end_address = start_address + 4 * word_count - 4
if ConditionPassed(cond) then
    Rn = Rn + 4 * offset
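
The Operation pseudocode can be transcribed into Python as a rough model. This is a sketch under stated assumptions: the function name increment_addressing and the parameter impl_word_count are hypothetical, the IMPLEMENTATION DEFINED word count defaults to its upper bound (offset) when the caller does not supply one, and addresses are plain integers with no 32-bit wrap-around.

```python
def increment_addressing(rn, offset, cp_num, condition_passed,
                         impl_word_count=None):
    """Return (start_address, end_address, new_rn) for the Increment mode."""
    is_x_form = (offset & 1) == 1 and cp_num == 0b1011   # FLDMX or FSTMX
    if is_x_form:
        # IMPLEMENTATION DEFINED value; assumed equal to offset if not given.
        word_count = offset if impl_word_count is None else impl_word_count
        if word_count > offset:
            raise ValueError("word_count must not exceed offset")
    else:
        word_count = offset
    start_address = rn
    end_address = start_address + 4 * word_count - 4
    # Writeback happens only when the condition passes, and always uses the
    # full offset, even if fewer words were transferred.
    new_rn = rn + 4 * offset if condition_passed else rn
    return start_address, end_address, new_rn
```

Note that the writeback uses offset rather than word_count, so for the X forms the base register advances by the same amount on every implementation.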

Usage
For FLDMD, FLDMS, FSTMD and FSTMS, this addressing mode can be used to load or store an element of
an array of short vectors and advance the pointer to the next element. For example, if Rn points to an element
of an array of graphics points, each consisting of four single-precision co-ordinates, then:
FSTMIAS Rn!, {s16-s19}

stores the single-precision registers s16, s17, s18 and s19 to the current element of the array and advances
Rn to point to the next element.

A related use occurs with long vectors of floating-point data. If Rn points to a long vector of single-precision
values, the same instruction stores s16, s17, s18 and s19 to the next four elements of the vector and advances
Rn to point to the next element after them.

For FSTMD, FSTMS and FSTMX, this addressing mode is useful for pushing register values on to an Empty
Ascending stack. Use FSTMD or FSTMS respectively when it is known that the registers contain only
double-precision data or only single-precision data. Use FSTMX when the precision of the data held in the
registers is unknown, and nothing needs to be done with the stored data apart from reloading it with a
matching FLDMX instruction (for instance, for callee-save registers in procedure entry sequences).

If multiple registers holding values of known but different precisions need to be pushed on to a stack,
FSTMX can be used if nothing needs to be done with the stored data apart from reloading it with a matching
FLDMX instruction. Otherwise, a sequence of FSTMD and FSTMS instructions needs to be used.

For FLDMD, FLDMS and FLDMX, this addressing mode is useful for popping data from a Full Descending
stack. The choice of which instruction to use follows the same principles as above.
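
The stack behaviour described above can be illustrated with a small Python model. This is a simplified sketch, not a description of any implementation: memory is a dictionary from word address to value, only the single-precision case (one word per register) is modelled, and the function names fstmia_push and fldmia_pop are hypothetical.

```python
def fstmia_push(memory, rn, words):
    """Model a store with Increment writeback on an Empty Ascending stack.

    Rn points at the next free word. Values are stored at ascending
    addresses starting at Rn, and the updated Rn is returned.
    """
    for i, word in enumerate(words):
        memory[rn + 4 * i] = word
    return rn + 4 * len(words)


def fldmia_pop(memory, rn, count):
    """Model a load with Increment writeback on a Full Descending stack.

    Rn points at the most recently pushed word (the lowest address in use).
    Values are loaded from ascending addresses starting at Rn, and the
    updated Rn is returned along with them.
    """
    values = [memory[rn + 4 * i] for i in range(count)]
    return values, rn + 4 * count
```

In both cases the pointer moves upward by four bytes per word. The two stack disciplines differ in what Rn points at before the transfer, which is why the same addressing mode pushes on to one kind of stack and pops from the other.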

Notes
Offset restrictions

The offset value must be at least 1 and at most 33. If the offset is 0 or greater than 33,
the instruction is always UNPREDICTABLE. Each instruction also imposes further
restrictions on the offset, depending on the values of Fd and D. See the individual
instruction descriptions for details of these.
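
The general range check stated in this note could be expressed as follows. The function name offset_in_general_range is hypothetical, and the sketch deliberately does not model the further per-instruction restrictions that depend on Fd and D.

```python
def offset_in_general_range(offset: int) -> bool:
    """True if 1 <= offset <= 33.

    An offset outside this range makes the instruction UNPREDICTABLE.
    A True result is necessary but not sufficient: each instruction adds
    its own restrictions, which are not checked here.
    """
    return 1 <= offset <= 33
```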
