mAgicV DSP Architecture Document

Feature Summary
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
1.0 GFLOPS - 1.5 GOPS at 100 MHz
AHB Master Port, integrated DMA Engine and AHB Slave Port
VLIW Architecture with five Independent Execution Units
Up to 10 Arithmetic Operations per Cycle (4 Multiply, 2 Add/Subtract, 1 Add, 1 Subtract
40-bit Floating Point and 32-bit Integer) performing a Single Cycle FFT Butterfly
Native Support for Complex Arithmetic and Vectorial SIMD Operations: One Complex
Multiply with Dual Add/Sub per Clock Cycle or Two Real Multiply and Two Add/Sub or
Simple Scalar Operations
32-bit Integer and IEEE® 40-bit Extended Precision Floating Point Numeric Format
16-port Data Register File: 256 Registers organized in two 128-register Banks
5-issue predicated VLIW Architecture with Orthogonal ISA, Code Compression and
Hardware Support for Code Efficient Software Pipeline Loops
4 Data Accesses per Cycle supported
2 Independent Address Generation Units Operating on a 16-register Address Register
File Supporting DSP features: Programmable Stride and Circular Buffers
1.7 Mbits of On-chip SRAM: 2 x 8K x 40-bit Data Memory Locations, 8 K x 128-bit
Program Memory Locations, Equivalent to ~50K DSP Assembler Instructions thanks to
Code Compression and SW Pipelining
Program Management Unit with HW Page-replacing Algorithm
DMA access to AHB SOC Peripherals and Memories
Hardware support for Debug
Support for Multi-core Integration: Mutex, Cross-triggering
mAgicV DSP
Architecture
Document
7011A–DSP–12/08
1. About this manual
This manual describes mAgicV DSP high-level architecture and the programming user interface.
mAgicV VLIW assembly language is described in the Assembly Reference Manual.
2. References
• Parallel Assembly Syntax User Guide - doc7013.pdf
• mAgicV C Compiler User Guide - doc7016.pdf
3. mAgicV VLIW DSP Architecture
mAgicV is a high performance Very Long Instruction Word (VLIW) DSP delivering 1.0 Giga floating-point operations per second (GFLOPS) and 1.5 GOPS at a clock rate of 100 MHz. It is
equipped with an AHB master port and an AHB slave port for system-on-chip integration. It has
256x40-bit data registers, 16x64-bit multi-field address registers to support DSP oriented
addressing modes like circular and stride accesses, 10 arithmetic operating units, two independent AGUs (Address Generation Unit) and a DMA engine. To sustain the internal parallelism,
the data bandwidth through the Register File is 80 bytes/cycle. The architecture is optimized to
work in the complex domain. When activating all the computing units, mAgicV can produce a
complete FFT butterfly per cycle. It also supports native 2D vectorial arithmetic operations. mAgicV operates on IEEE 754 40-bit extended precision floating-point and 32-bit integer numeric
format for numerical computations.
Figure 3-1.
mAgicV Block Diagram
AHB
External interrupts
2-port 8Kx128 Program Memory
32
Interrupt
controller
AHB
Master
13
pma
Decompressor
16
pc
Flow Controller
Debug
logic
DMA
Engine
MMU
AHB
Slave
40
port3
(master)
40
4x16x16-bit
Address
Register File
16
RF 256x40 (128x80)
4R+4W
128x40
4R+4W
128x40
RF0
RF1
40
40
Operator block
10-float 40bit
ops/cycle
2
32
32
64
48
AGU0
16
48
AGU1
14
add0
14
add1
40
port2
(slave)
4-port 16Kx40(8Kx80)
Data Memory
Data Memory
Bank0
8Kx40
Data Memory
Bank1
8Kx40
(agu0)
port0
(agu1)
port1
80
80
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
The Harvard memory architecture is composed of an on-chip 2x8Kx40-bit data memory and an
on-chip 8Kx128-bit program memory. Efficient usage of the program memory is achieved
through a mechanism of program compression performed by the software tool chain and supported by a hardware decompression engine. A program memory management unit supports a
virtual program space of 64Kx128-bit locations. Interrupts are vectorized to minimize the interrupt service latency.
3.1
VLIW overview
VLIW processors execute parallel operations based on a fixed schedule determined at compiletime. The processor does not need any hardware support for scheduling because the operations
execution schedule is handled by the compiler (including the operations to be executed simultaneously). As a result, VLIW CPUs offer significant computational power with less hardware
complexity (but greater compiler complexity) than most superscalar CPUs.
The rows in mAgicV program memory are 128 bit wide. When the “default” decoding scheme is
applied, the program word (composed of 120 bits) drives five execution units through five operation fields named issues. Eight additional bits drive the program decompression engine. mAgicV
issues are named: FLOW, AGU0, MUL, AGU1, ADD.
Table 3-1.
FLOW
Conceptual Representation of Issues in the Default VLIW Decoding Scheme
AGU0
MUL
AGU1
ADD
Two issues are associated to the pair of independent AGUs. The ADD and the MUL issues drive
respectively the add/subtract unit and the multiplier operators unit. The FLOW issue manages
the program flow unit. Each issue is predicated by a predication register for conditional execution without pipeline breaking penalties.
3.1.1
Description of the Issues
Every issue describes the operation to be performed by the associated execution unit. Each
issue is composed of one or more VLIW fields. Operations on the FLOW issue have the maximum priority and can inhibit the execution of other issues or change their default format.
Table 3-2 shows the main fields that constitute a VLIW program word. Issues can either be composed of these fields (default issues) or of completely different fields. Default issues are
orthogonal and grant maximum parallelism (five operations per cycle). The following paragraphs
show the default format of the five issues, further details about all possible issues and their formats can be found in the Assembly Reference Manual.
3.1.2
MGCVLIW Register
The decompressed VLIW word ready for the decoding stage is visible to the external AHB masters in the MGCVLIW register. The default structure of the VLIW word after decoding is shown in
Table 3-2.
Table 3-2.
MGCVLIW Register. Default Decoding Scheme.
127
126
125
124
123
used in compression
122
121
120
119
spare
118
117
116
114
113
112
RF add port 7
115
FLOW code
3
7011A–DSP–12/08
111
110
109
108
107
RF add port 7
106
105
104
103
102
101
100
99
RF add port 5
98
97
96
95
94
93
92
AGU0 code
91
90
89
88
ARF add0
87
86
ARF add0
85
84
83
82
RF add port 4
81
80
79
78
RF add port 4
77
76
75
74
RF add port 0
73
72
71
70
RF add port 0
69
68
67
66
RF add port 1
65
64
63
62
RF add port 1
61
60
59
58
MUL code
57
56
55
54
53
52
predicate MUL
47
46
45
44
AGU1 code
43
42
41
40
ARF add1
39
38
ARF add1
37
36
35
34
RF add port 6
33
32
31
30
RF add port 6
29
28
27
26
RF add port 2
25
24
23
22
RF add port 2
21
20
19
18
RF add port 3
17
16
15
14
RF add port 3
13
12
11
10
ADD code
9
8
7
6
5
4
3
2
predicate FLOW
MUL code
ADD code
4
predicate ADD
51
50
predicate FLOW write
49
48
predicate AGU1
1
0
predicate AGU0
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
3.2
Program Memory
The Program Memory system contains 8K*128-bit on-chip memory locations supporting up to 2
accesses/cycle. One-access/cycle is reserved to the core to fetch the program, while the other
access is used either by the internal AHB master access (i.e: DMA) or by the internal AHB slave
access (e.g.: debug or accesses executed by an external AHB master).
The read latency during program fetch is 1-cycle. While write and read latencies through AHB
are shown in Table 3-26 on page 39.
An efficient usage of the Program Memory is achieved through a program memory decompressor engine that is able to decompress, in a single clock cycle, words that are stored using a
compression format. Thus the total latency per program fetch, including the compression, is 2
cycles.
3.2.1
Decompressor
The mAgicV VLIW architecture is natively designed for optimal program density. Moreover, the
program compression scheme allows an average compression of 165%. Therefore, about eight
issues are stored for each 128-bit program memory locations. A higher Program Memory density
is achieved thanks to the combined effect of Program Compression and HW support for software pipelining (see Section 3.9.7 ”Software Pipeline HW Support” on page 55). Many
applications can be implemented on mAgicV using only the internal program memory. In fact,
the 8Kx128bit program memory provides, through code compression, about 50K DSP assembler instructions stored on-chip (typically).
This device is able to decompress at each cycle a 128-bit Compressed Program Word (CPW)
coming from the Program Memory into one or more 120-bit mAgicV VLIW words, by using a patented compression/decompression scheme.
3.2.1.1
Compressed Program Word
The CPW is composed of an 8-bit Super-Header (SH), an optional 16-bit Field-Header (FH), 7
optional 16-bit fields and an optional 8-bit field.
The Super-Header field is used to drive the decompression of the CPW. Table 3-3 shows an
example of 4 VLIWs compressed into a single 128-bit CPW. The SH field must always exist
(there is a SH for each compressed VLIW), all others fields are optional. The FH drives eight
decompressing units, named dyprodes, with 16 control bits (2-bit for each dyprode).
Table 3-3.
Compressed Program Word
127
24
FIELDS3
FH3
SH3
FH2
SH2
SH1
23
FIELDS0
FH0
8
0
SH0
The SH format is described in Table 3-4.
Table 3-4.
7
SH Format
6
5
4
3
word length
2
SH code
1
0
parity
fetch
• fetch
FETCH=’1’, fetches another CPW, PMA is incremented.
FETCH=’0’, doesn’t fetch, PMA is not modified.
5
7011A–DSP–12/08
• parity
It’s the parity bit of the decompressed VLIW word.
NOTE: When a program is compressed many parity bits can exist in the same Compressed Program Word.
• code
2-bit of global decompressing code.
Table 3-5.
SH Code
SH Codes
Mnemonic
0x0
NOP
0x1
SHIFTALL
0x2
CTRL
0x3
STORE
Description
a NOP is generated, no FH needed
a realign request (i.e. branch). The next CPW word will start with a SH
the decompressing units are driven by the Field Header
All the decompressing units are requested to store the contents of the
current CPW, no FH needed, see Table 3-6 on page 6 and Table 3-7
on page 7
• word length
The length of the current compressed VLIW is a multiple of 8 bits.
NOTE: A compressed VLIW can be stored in two successive CPWs.
3.2.1.2
Dyprodes
The dyprodes are the decompressing units. There are two types of dyprodes:
• derivative dyprodes: which decompress RF addresses
• value dyprodes: which decompress operation microcodes and other bits
Both dyprode types are driven by a 2-bit control word and operate either on a 16-bit input data
producing a 16-bit output or on an 8-bit input data producing an 8-bit output. Actually each 120bit decompressed program word is produced by 7 dyprodes operating on 16-bit plus one
dyprode operating on 8-bit. Accordingly, the Field Header is the control word (8x2 bits) of eight
dyprodes. Table 3-8 on page 7 shows the mapping between the decompressed VLIW and the
dyprodes outputs.
Each dyprode has two 16-bit registers: SAMEREG and SWAPREG that hold respectively the
current value (or current derivative) and the last value (last derivative). The derivative dyprode
has a third register used as an accumulator that holds the sum between one of the previous registers and the old accumulated sum. Table 3-6 on page 6 shows the operations performed by
the dyprodes.
Table 3-6.
Code
0x0
6
Value Dyprode Control Codes
Mnemonic
NOP
Operation
Description
output=constant
a constant output is generated
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
Code
Mnemonic
Operation
Description
0x1
SAME
output=samereg
output is not changed
0x2
SWAP
output=swapreg
swapreg=samereg
output is a previously stored input, the
current output is stored
STORE
output=input
samereg=input
swapreg=samereg
input is stored and put in output
0x3
Table 3-7.
Derivative Dyprode Control Codes
Code
Mnemonic
Operation
Description
output=constant
acc=constant
a constant output is generated, accumulator
reset
0x0
NOP
0x1
SAME
output= acc + samereg
output is the accumulator value plus the
stored derivative
0x2
SWAP
output=acc + swapreg
swapreg=samereg
output is the accumulator value plus the last
stored derivative
STORE
output=input
samereg=input
swapreg=samereg
input is stored and put in output
0x3
Table 3-8.
VLIW to Dyprode Mapping
VLIW
Dyprode Output
vliw[3:0]
Value dyprode #4 [3:0]
vliw[12:4]
Value dyprode #5 [8:0]
vliw[28:13]
Derivative dyprode #0 [15:0]
vliw[36:29]
Derivative dyprode #1 [7:0]
vliw[52:37]
Value dyprode #6 [15:0]
vliw[60:53]
Value dyprode #7 [7:0]
vliw[76:61]
Derivative dyprode #2 [15:0]
vliw[84:77]
Derivative dyprode #1 [15:8]
vliw[95:85]
Value dyprode #4 [14:4]
vliw[111:96]
Derivative dyprode #3 [15:0]
vliw[112]
vliw[119:113]
Value dyprode #4 [15]
Value dyprode #5 [15:9]
7
7011A–DSP–12/08
3.3
Register File
In order to provide optimal data bandwidth and to give the best support to the RISC-like programming model, mAgicV arithmetic computations are supported by a 16-port 256x40-bit entries
Register File (RF). The registers are numbered from RF0 to RF255 and they can be accessed
either individually for scalar operations or in pairs, aligned to even addresses, for operations in
the complex or vectorial domain.
3.3.1
Description
The RF is a multi-port memory containing 256 registers (RF0-RF255). They are arranged in two
blocks of 128x40-bit registers each. BLOCK0 holds even registers (real part of a complex number) while the BLOCK1 holds odd registers (imaginary part of a complex number). At each cycle
eight addresses are delivered to both register arrays in order to move up to eight in/out data
pairs. Both the odd and the even side of the register file are 9-ported (4-read ports and 4-write
ports for computing/move operations + 1 port for independent debug access), making a total of
16 I/O ports available for the data move to and from the operators block and the memory, plus
ports for debug accesses.
The software tool chain must avoid concurrent double write with different data on the same RF
address.
NOTE: In case of a simultaneous (i.e same cycle) write and read at the same location there is a
bypass logic that allows to read the currently written data. This can be used for reducing the loop
length because the read after the write data latency takes one cycle instead of two cycles.
Figure 3-2.
Register File
4 data input (right)
4 data input (left)
BLOCK0
left block (128 regs)
real
even addresses
4 data output (left)
controls
control
address/control
input
BLOCK1
right block (128 regs)
immaginary
odd addresses
4 data output (right)
Debug circuit not drawn
8
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
3.4
Operators Block
The Operators Block performs arithmetical operations. It works on IEEE 754 extended precision
40-bit floating-point and 32-bit signed integers data. The 16-bit unsigned and signed integers are
managed by the AGUs (see Section 3.6.3 ”Arithmetic” on page 36). The operators are arranged
in order to support:
• arithmetic on complex domain (throughput of one complex multiply, add or multiply and add
per cycle)
• fast FFT (throughput of one complete butterfly computation per cycle)
• vectorial arithmetic acting on operands constituted of data pairs. The operators block is able
to launch a vectorial multiply plus a vectorial add at every cycle
• scalar arithmetic acting on scalar data. The operator block is able to launch a scalar
multiplication and a scalar addition at every cycle
The peak performance of mAgicV is achieved during single cycle FFT butterfly execution, when
mAgicV delivers 10 floating-point or 32-bit signed integer operations per clock cycle.
The operands manipulated by the operators block are specified by the RF addresses. The RF
addresses for scalar domain operations can be either odd or even. Vectorial and Complex operand pairs need even RF addresses.
3.4.1
Description
The Operators Block is composed of 4 integer/floating point multipliers, 4 integer/floating point
adders, 2 shift/logic units and 2 seed generators (see Figure 3-3 on page 10).
The complex product is hardwired in the operators block. This is obtained by using 4 multipliers
and 2 adders with integer and floating-point capabilities.
Vector operations are performed using 2 multipliers, 2 adders, 2 logic units and 2 seed generators with floating point and integer capabilities, on the left and right path simultaneously.
Scalar operations (floating or integer) operate on one path only (left or right depending on the
parity of the register address involved).
The inputs of the 4 multipliers are RF Port 0 and RF Port 1, while the inputs of the 2 independent
adders are RF Port 2 and RF Port 3. In output, the adder returns its result in RF Port 6 and the
multiplier returns it in RF Port 4. RF Port 5 collects the SUB output of the adder unit in simultaneous ADD/SUB instructions only.
9
7011A–DSP–12/08
Figure 3-3.
Operators Block
TORF5
TORF5 TORF7
TORF7
P6_0
P6_1
P5_0
IN ports:
P4_0
OUT port:
Conv1 FP/I
Div1
Sh/Log1
4
0
5
6
RF0
1
2
7
IN ports:
3
OUT port:
Mul2
FP/I
Mul1
*
*
4
5
6
RF1
0
1
2
Mul3
FP/I
7
P4_1
3
P5_1
FP/I
Mul4 Conv2
Div2
Sh/Log2
*
*
FP/I Cadd1
FP/I Cadd2
-
+
Min
Max1 FP/I
-
Add1
+
Operator path left(0)
Min
Add2 Max2
FP/I
+
-
Operator path right(1)
Arithmetic operations performed by the operators block can be classified either as “ADD operations” or “MUL operations”. ADD and MUL operations can be performed in parallel, because
they are driven by two orthogonal issues in the VLIW program word.
The ADD operations involve solely adders/subtractors. All other operations are classified as
MUL operations (i.e a complex multiplication is classified in the group of MUL operations even if
adders are involved in the calculation).
3.4.2
ADD/SUB Issue
The ADD/SUB issue drives a pair of adder/subtractor and minmax operators (ADD1, ADD2,
MIN/MAX1, MIN/MAX2 in Figure 3-3 above). It takes two scalar, either vectorial or complex,
input operands from the Register File. Usually, a scalar result, either vectorial or complex, is
returned to the Register File. Only the ADDSUB operations produce two results. Compare operations produce no result at all (only condition flags are updated).
There are two possible formats for an ADD/SUB issue:
Table 3-9.
vliw[5:4]
ADD
predication
ADD/SUB Issue Format 1
vliw[12:6]
vliw[36:29]
vliw[20:13]
vliw[28:21]
ADD code
RF add 6 (D)
RF add3 (S1)
RF add2 (S2)
• RF add 2: Register File address port 2
It specifies one of the 256 RF registers as input operand0
10
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
• RF add 3: Register File address port 3
It specifies one of the 256 RF registers as input operand1
• RF add 6: Register File address port 6
It specifies one of the 256 RF registers as output operand
• ADD opcode: ADD microcode
It specifies the ADD operation to be performed
• ADD predication
It specifies one of the four predication registers, if the condition of the predication register is false
the issue will not be executed
The second format is reserved to operations that generate two results like: minmax or addsub.
In this case one more output address must be specified:
Table 3-10.
vliw[5:4]
ADD
predication
ADD/SUB Issue Format 2
vliw[12:6]
vliw[102:96]
vliw[36:29]
vliw[20:13]
vliw[28:21]
ADD opcode
RF add5 (D2)
RF add6 (D1)
RF add3 (S1)
RF add2 (S2)
• RF add 5: Register File address port 5
It specifies one of the 256 RF registers as second output operand
Table 3-11.
ADD/SUB Operations
ADD/SUB
Opcode
Mnemonic
Description
Complex Floating Point Addition
0x0
CFADD D S1 S2
0x1
CFADD D S1 -S2
0x2
CFADD D -S1 -S2
0x3
CFADD D -S1 S2
( D 0, D 1 ) = – S 1 c + S2 c = ( ( – S 1 r + S2 r ), ( – S 1 i + S2 i ) )
0x4
CFJADD D S1 S2
( D 0, D 1 ) = S1 c + S2 c = ( ( S1 r + S2 r ), ( S1 i – S2 i ) )
0x5
CFJADD D S1 -S2
( D 0, D 1 ) = S1 c – S2 c = ( ( S1 r – S2 r ), ( S1 i + S2 i ) )
( D 0, D 1 ) = S1 c + S2 c = ( ( S1 r + S2 r ), ( S1 i + S2 i ) )
Complex Floating Point Subtraction
( D 0, D 1 ) = S1 c – S2 c = ( ( S1 r – S2 r ), ( S1 i – S2 i ) )
Complex Floating Point Addition
( D 0, D 1 ) = – S 1 c – S2 c = ( ( – S 1 r – S2 r ), ( – S 1 i – S2 i ) )
Complex Floating Point Addition
Complex Conjugate Floating Point Addition
Complex Conjugate Floating Point Subtraction
11
7011A–DSP–12/08
ADD/SUB
Opcode
Mnemonic
Description
Complex Conjugate Floating Point Addition
0x6
CFJADD D -S1 -S2
0x7
CFJADD D -S1 S2
0x8
CJJADD D S1 S2
0x9
CJJADD D S1 -S2
0xA
CJJADD D -S1 -S2
0xB
CJJADD D -S1 S2
0xC
CFRADD D S1 S2
Complex with Real Floating Point Addition
( D 0, D 1 ) = S1 c + S2 r = ( ( S1 r + S2 r ), S1 i )
0xD
CFRADD D S1 -S2
Complex with Real Floating Point Subtraction
( D 0, D 1 ) = S1 c – S2 r = ( ( S1 r – S2 r ), S1 i )
0xE
CFRADD D -S1 -S2
0xF
CFRADD D -S1 S2
0x10
CIADD D S1 S2
Integer Complex Integer Addition
( D 0, D 1 ) = S1 c + S2 c = ( ( S1 r + S2 r ), ( S1 i + S2 i ) )
0x11
CIADD D S1 -S2
Complex Integer Subtraction
( D 0, D 1 ) = S1 c – S2 c = ( ( S1 r – S2 r ), ( S1 i – S2 i ) )
0x12
CIADD D -S1 -S2
0x13
CIADD D -S1 S2
0x14
CIJADD D S1 S2
0x15
CIJADD D S1 -S2
0x16
CIJADD D -S1 -S2
( D 0, D 1 ) = – S 1 c – S2 c = ( ( – S 1 r – S2 r ), ( – S 1 i + S2 i ) )
Complex Conjugate Floating Point Addition
( D 0, D 1 ) = – S 1 c + S2 c = ( ( – S 1 r + S2 r ), ( – S 1 i – S2 i ) )
Complex Double-Conjugate Floating Point Addition
( D 0, D 1 ) = S1 1 + S2 2 = ( ( S1 r + S2 r ), ( – S1 i – S2 i ) )
Complex Double-Conjugate Floating Point Subtraction
( D 0, D 1 ) = S1 1 – S2 2 = ( ( S1 r – S2 r ), ( – S1 i + S2 i ) )
Complex Double-Conjugate Floating Point Addition
( D 0, D 1 ) = – S 1 1 – S2 2 = ( ( – S 1 r – S2 r ), ( S1 i + S2 i ) )
Complex Double-Conjugate Floating Point Addition
( D 0, D 1 ) = – S 1 1 + S2 2 = ( ( – S 1 r + S2 r ), ( S1 i – S2 i ) )
Complex with Real Floating Point Addition
( D 0, D 1 ) = – S 1 c – S2 r = ( ( – S 1 r – S2 r ), – S 1 i )
Complex with Real Floating Point Addition
( D 0, D 1 ) = – S 1 c + S2 r = ( ( – S 1 r + S2 r ), – S 1 i )
Complex Integer Addition
( D 0, D 1 ) = – S 1 c – S2 c = ( ( – S 1 r – S2 r ), ( – S 1 i – S2 i ) )
Complex Integer Addition
( D 0, D 1 ) = – S 1 c + S2 c = ( ( – S 1 r + S2 r ), ( – S 1 i + S2 i ) )
Complex Conjugate Integer Addition
( D 0, D 1 ) = S1 c + S2 c = ( ( S1 r + S2 r ), ( S1 i – S2 i ) )
Complex Conjugate Integer Subtraction
( D 0, D 1 ) = S1 c – S2 c = ( ( S1 r – S2 r ), ( S1 i + S2 i ) )
Complex Conjugate Integer Addition
12
( D 0, D 1 ) = – S 1 c – S2 c = ( ( – S 1 r – S2 r ), ( – S 1 i + S2 i ) )
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
ADD/SUB
Opcode
Mnemonic
Description
Complex Conjugate Integer Addition
0x17
CIJADD D -S1 S2
0x18
CIJJADD D S1 S2
( D 0, D 1 ) = S1 1 + S2 2 = ( ( S1 r + S2 r ), ( – S1 i – S2 i ) )
0x19
CJIJADD D S1 -S2
( D 0, D 1 ) = S1 1 – S2 2 = ( ( S1 r – S2 r ), ( – S1 i + S2 i ) )
0x1A
CIJJADD D -S1 -S2
( D 0, D 1 ) = – S 1 1 – S2 2 = ( ( – S 1 r – S2 r ), ( S1 i + S2 i ) )
0x1B
CIJJADD D -S1 S2
( D 0, D 1 ) = – S 1 1 + S2 2 = ( ( – S 1 r + S2 r ), ( S1 i – S2 i ) )
0x1C
CIRADD D S1 S2
0x1D
CIRADD D S1 -S2
( D 0, D 1 ) = S1 c – S2 r = ( ( S1 r – S2 r ), S1 i )
0x1E
CIRADD D -S1 -S2
( D 0, D 1 ) = – S 1 c – S2 r = ( ( – S 1 r – S2 r ), – S 1 i )
0x1F
CIRADD D -S1 S2
0x20
VFABSADD D S1 S2
0x21
VFABSADD D S1 -S2
0x22
VFABSADD D -S1 -S2
( D 0, D 1 ) = – S 1 – S 2 = ( ( – S 1 0 – S2 0 ) , ( – S 1 1 – S2 1 ) )
0x23
VFABSADD D -S1 S2
( D 0, D 1 ) = – S 1 + S2 = ( ( – S 1 0 + S2 0 ) , ( – S1 1 + S2 1 ) )
0x24
VIABSADD D S1 S2
( D 0, D 1 ) = S1 + S2 = ( ( S1 0 + S2 0 ) , ( S1 1 + S2 1 ) )
0x25
VIABSADD D S1 -S2
0x26
VIABSADD D -S1 -S2
( D 0, D 1 ) = – S 1 – S 2 = ( ( – S 1 0 – S2 0 ) , ( – S 1 1 – S2 1 ) )
0x27
VIABSADD D -S1 S2
( D 0, D 1 ) = – S 1 + S2 = ( ( – S 1 0 + S2 0 ) , ( – S1 1 + S2 1 ) )
( D 0, D 1 ) = – S 1 c + S2 c = ( ( – S 1 r + S2 r ), ( – S 1 i – S2 i ) )
Complex Double-Conjugate Integer Addition
Complex Double-Conjugate Integer Subtraction
Complex Double-Conjugate Integer Addition
Complex Double-Conjugate Integer Addition
Complex with Real Integer Addition
( D 0, D 1 ) = S1 c + S2 r = ( ( S1 r + S2 r ), S1 i )
Complex with Real Integer Subtraction
Complex with Real Integer Addition
Complex with Real Integer Addition
( D 0, D 1 ) = – S 1 c + S2 r = ( ( – S 1 r + S2 r ), – S 1 i )
Vectorial Absolute Floating Point Addition
( D 0, D 1 ) = S1 + S2 = ( ( S1 0 + S2 0 ) , ( S1 1 + S2 1 ) )
Vectorial Absolute Floating Point Subtraction
( D 0, D 1 ) = S1 – S 2 = ( ( S1 0 – S2 0 ) , ( S1 1 – S2 1 ) )
Vectorial Absolute Floating Point Addition
Vectorial Absolute Floating Point Addition
Integer Vectorial Absolute Integer Addition
Vectorial Absolute Integer Subtraction
( D 0, D 1 ) = S1 – S 2 = ( ( S1 0 – S2 0 ) , ( S1 1 – S2 1 ) )
Vectorial Absolute Integer Addition
Vectorial Absolute Integer Addition
13
7011A–DSP–12/08
ADD/SUB
Opcode
Mnemonic
Description
Vectorial Floating Point Addition and Subtraction
0x28
VFADDSUB D1D2 S1 S2
( D1 0, D1 1 ) = ( S1 0 + S2 0, S1 1 + S2 1 )
( D2 0, D2 1 ) = ( S1 0 – S2 0, S1 1 – S2 1 )
0x29
FADD D S1 S2
Floating Point Addition
D = S1 + S2
0x2A
FADD D S1 -S2
Floating Point Addition
D = S1 – S2
0x2B
FADD D -S1 -S2
Floating Point Addition
D = – S1 – S2
0x2C
FADD D -S1 S2
Floating Point Addition
D = – S1 + S2
Floating Point Addition and Subtraction
0x2D
FADDSUB D1D2 S1 S2
D1 = ( S1 + S2 )
D2 = ( S1 – S2 )
0x2E
IADD D S1 S2
Integer Addition
D = S1 + S2
0x2F
IADD D S1 -S2
Integer Addition
D = S1 – S2
0x30
IADD D -S1 -S2
Integer Addition
D = – S1 – S2
0x31
IADD D -S1 S2
Integer Addition
D = – S1 + S2
Integer Addition and Subtraction
0x32
IADDSUB D1D2 S1 S2
D1 = ( S1 + S2 )
D2 = ( S1 – S2 )
14
0x33
FABSADD D S1 S2
Floating Point Absolute Addition
D = S1 + S2
0x34
FABSADD D S1 -S2
Floating Point Absolute Addition
D = S1 – S2
0x35
FABSADD D -S1 -S2
Floating Point Absolute Addition
D = – S1 – S2
0x36
FABSADD D -S1 S2
Floating Point Addition
D = – S1 + S2
0x37
IABSADD D S1 S2
Integer Addition
D = S1 + S2
0x38
IABSADD D S1 -S2
Integer Addition
D = S1 – S2
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
ADD/SUB
Opcode
Mnemonic
Description
0x39
IABSADD D -S1 -S2
Integer Addition
D = – S1 – S2
0x3A
IABSADD D -S1 S2
Integer Addition
D = – S1 + S2
0x3B
VFMIN D S1 S2
Vectorial Floating Point Minimum
( D 0, D 1 ) = ( MIN ( S1 0, S2 0 ), MIN ( S1 1, S2 1 ) )
0x3C
VIMIN D S1 S2
Vectorial Integer Minimum
( D 0, D 1 ) = ( MIN ( S1 0, S2 0 ), MIN ( S1 1, S2 1 ) )
0x3D
VFMAX D S1 S2
0x3E
VIMAX D S1 S2
Vectorial Floating Point Maximum
( D 0, D 1 ) = ( MAX ( S1 0, S2 0 ), MAX ( S1 1, S2 1 ) )
Vectorial Integer Maximum
( D 0, D 1 ) = ( MAX ( S1 0, S2 0 ), MAX ( S1 1, S2 1 ) )
Vectorial Floating Point Min And Max
0x3F
VFMINMAX D1D2 S1 S2
( D1 0, D1 1 ) = ( MIN ( S1 0, S2 0 ), MIN ( S1 1, S2 1 ) )
( D2 0, D2 1 ) = ( MAX ( S1 0, S2 0 ), MAX ( S1 1, S2 1 ) )
Vectorial Integer Min And Max
0x40
VIMINMAX D1D2 S1 S2
( D1 0, D1 1 ) = ( MIN ( S1 0, S2 0 ), MIN ( S1 1, S2 1 ) )
( D2 0, D2 1 ) = ( MAX ( S1 0, S2 0 ), MAX ( S1 1, S2 1 ) )
Vectorial swap (vectorially conditioned)
fpucond0 = 1 ⇒S1 0 ←S2 0 ;S1 0 →S2 0
0x41
VSWAP S1 S2
fpucond1 = 1 ⇒S1 1 ←S2 1 ;S1 1 →S2 1
fpucond0 = 0 ⇒S1 0 ←S1 0 ;S2 0 →S2 0
fpucond1 = 0 ⇒S1 1 ←S1 1 ;S2 1 →S2 1
Floating Point Minimum
0x42
FMIN D S1 S2
0x43
IMIN D S1 S2
0x44
FMAX D S1 S2
Floating Point Maximum
D = MAX ( S 1, S 2 )
0x45
IMAX D S1 S2
Integer Maximum
D = MAX ( S 1, S 2 )
D = MIN ( S 1, S 2 )
Integer Minimum
D = MIN ( S 1, S 2 )
Floating Point Min And Max
0x46
FMINMAX D1D2 S1 S2
D1 = MIN ( S1, S2 )
D1 = MAX ( S1, S2 )
15
7011A–DSP–12/08
ADD/SUB
Opcode
Mnemonic
Description
Integer Min And Max
0x47
IMINMAX D1D2 S1 S2
D1 = MIN ( S1, S2 )
D1 = MAX ( S1, S2 )
Floating Point Selection (conditioned)
0x48
FSEL D S1 S2
fpucond0 = 1 ⇒D = S1
fpucond0 = 0 ⇒D = S2
Integer Selection (conditioned)
0x49
ISEL D S1 S2
fpucond0 = 1 ⇒D = S1
fpucond0 = 0 ⇒D = S2
Swap (conditioned)
0x5A
SWAP S1 S2
fpucond0 = 1 ⇒S1 ←S2 ;S1 →S2
fpucond0 = 0 ⇒S1 ←S1 ;S2 →S2
16
0x61
CFEQ S1 S2
Complex Floating Point Equal
0x62
CFNE S1 S2
Complex Floating Point not Equal
0x7B
CIEQ S1 S2
Complex Integer Equal
0x7C
CINE S1 S2
Complex Integer not Equal
0x63
VFEQ S1 S2
Vectorial Floating Point Equal
0x64
VFLT S1 S2
Vectorial Floating Point Less Than
0x65
VFGT S1 S2
Vectorial Floating Point Greater Than
0x66
VFLE S1 S2
Vectorial Floating Point Less or Equal
0x67
VFGE S1 S2
Vectorial Floating Point Greater or Equal
0x68
VFNE S1 S2
Vectorial Floating Point Not Equal
0x69
VIEQ S1 S2
Vectorial Integer Equal
0x6A
VILT S1 S2
Vectorial Signed Integer Less Than
0x6B
VIGT S1 S2
Vectorial Signed Integer Greater Than
0x6C
VILE S1 S2
Vectorial Signed Integer Less or Equal
0x6D
VIGE S1 S2
Vectorial Signed Integer Greater or Equal
0x6E
VINE S1 S2
Vectorial Integer Not Equal
0x6F
FEQ S1 S2
Floating Point Equal
0x70
FLT S1 S2
Floating Point Less Than
0x71
FGT S1 S2
Floating Point Greater Than
0x72
FLE S1 S2
Floating Point Less or Equal
0x73
FGE S1 S2
Floating Point Greater or Equal
0x74
FNE S1 S2
Floating Point Not Equal
0x75
IEQ S1 S2
Integer Equal
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
ADD/SUB
Opcode
Mnemonic
Description
0x76
ILT S1 S2
Integer Signed Less Than
0x77
IGT S1 S2
Integer Signed Greater Than
0x78
ILE S1 S2
Integer Signed Less or Equal
0x79
IGE S1 S2
Integer Signed Greater or Equal
0x7A
INE S1 S2
Integer Not Equal
Vectorial Integer Addition and Subtraction
0x7D
VIADDSUB D1D2 S1 S2
( D1 0, D1 1 ) = ( S1 0 + S2 0, S1 1 + S2 1 )
( D2 0, D2 1 ) = ( S1 0 – S2 0, S1 1 – S2 1 )
0x7F
3.4.3
NOP
Not OPeration
ADD/SUB Operators
The ADD/SUB operators allow computing different types of additions and subtractions on integer and floating point data types: complex, vectorial and single additions. They also allow to
explicitly execute an addition and a subtraction concurrently on the same data and also execute
a certain number of miscellaneous instructions such as absolute value computation, min/max,
conditioned selection of input data, swap, etc. In complex additions, it is feasible to conjugate
the second or both the operands as well as add a complex number with a real one.
ADD/SUB instructions on floating point or 32-bit signed integers include:
• Complex addition/subtraction
– standard,
– standard with conjugation of input operands
– between real and complex operands
• Vectorial and scalar addition, subtraction, add/sub
– addition, subtraction
– add/sub
– absolute value
– min, max, min/max
– swap input operands
– selection and compare
The ADD/SUB floating-point execution unit operates on 40-bit floating-point operands and outputs 40-bit floating-point results. The ADD/SUB integer device operates on 32-bit integer
operands and outputs 32-bit integer results. In the case of integer instructions, only 32 LSBs are
treated, while the 8 MSBs remain unchanged see Section 4.2 ”Data Organization” on page 63.
3.4.3.1
Latency
Each ADD/SUB computation takes two pipeline cycles to be completed. During the first cycle the
input data (read from the Register File at the previous cycle) enters the adder and a first half of
the instruction is executed. During the second cycle the last part of the instruction is executed
and the result is ready to be written in the Register File. At the next cycle, it is actually written in
the Register File. From this cycle on the result can be read for the subsequent operations. So,
17
7011A–DSP–12/08
the total adder pipeline is four cycles long, two cycles for arithmetic computation plus two cycles
necessary for the input data read from and the output data write to the Register File.
The Table 3-12 illustrates this pipeline scheme for a simple addition operation:
Z=X+Y
where X and Y are the adder input data already present in the Register File and Z is the result
(adder output data) that will be written to the Register File.
Table 3-12.
ADD/SUB Pipeline
Cycle
Operation
Note
0
Data X and Y are read from the Register File
decoding and RF
read
1
Data X and Y enter the adder - First half instruction execution
computation
2
Second half instruction execution - Data Z (result) output the
adder
computation
3
Data Z written to the Register File
write back to RF
As shown in the above table, cycle 1 and cycle 2 are the two adder pipeline cycles (computation
cycles), while cycle 0 and cycle 3 are used to access the Register File. From cycle 3 on the
result is already available in the Register File for a read access.
This bypass logic allows reading the written data saving a pipeline cycle (latency is still 4 cycles,
but loops on accumulated data can be reduced to 3 cycles only).
3.4.4
MUL Issue
The MUL issue drives:
• the block containing 4 multiply arithmetic operators plus 2 adders for complex domain
multiplies (MUL1, MUL2, MUL3, MUL4, CADD1, CADD2 in Figure 3-3 on page 10);
• a pair of shifters (SH1, SH2);
• a pair of float integer converters (CONV1, CONV2);
• seed generators for division and trascendent functions (DIV1, DIV2);
The MUL issue takes two scalar, vectorial or complex input operands from the RF. A scalar, vectorial or complex result is returned to the Register File.
Table 3-13.
vliw[53:52]
MUL
predication
MUL Issue
vliw[60:54]
vliw[84:77]
vliw[68:61]
vliw[76:69]
MUL opcode
RF add4 (D)
RF add1 (S1)
RF add0 (S2)
• RF add 0: Register File address port 0
It specifies one of the 256 RF registers as input operand0.
• RF add 1: Register File address port 1
It specifies one of the 256 RF registers as input operand1.
• RF add 4: Register File address port 6
It specifies one of the 256 RF registers as output operand.
18
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
• MUL opcode: MUL microcode
It specifies the MUL operation to perform.
• MUL predication
It specifies one of the four predication registers, if the condition of the predication register is false
the issue will not be executed.
Table 3-14.
MUL
Code
Mnemonic
0x0
CFMUL D S1 S2
MUL Opcodes
Description
Complex Floating Point Multiplication
( D 0, D 1 ) = S1 c × S2 c = ( ( S1 r ⋅ S2 r – S1 i ⋅ S2 i ), ( S1 i ⋅ S2 r + S2 i ⋅ S1 r ) )
Complex Floating Point Multiplication
0x1
CFMUL D S1 -S2
( D 0, D 1 ) = S1 c × – S 2 c = ( ( – S 1 r ⋅ S2 r + S1 i ⋅ S2 i ), ( – S 1 i ⋅ S2 r – S2 i ⋅ S1 r ) )
Complex Floating Point Conjugated Multiplication
0x2
CFJMUL D S1 S2
( D 0, D 1 ) = S1 c × S2 c = ( ( S1 r ⋅ S2 r + S1 i ⋅ S2 i ), ( S1 i ⋅ S2 r – S2 i ⋅ S1 r ) )
Complex Floating Point Conjugated Multiplication
0x3
CFJMUL D S1 -S2
0x4
CFJJMUL D S1S2
0x5
CFJJMUL D S1 S2
0x6
CFRMUL D S1 S2
0x7
CFRMUL D S1 -S2
( D 0, D 1 ) = S1 c × – S 2 c = ( ( – S 1 r ⋅ S2 r – S1 i ⋅ S2 i ), ( – S 1 i ⋅ S2 r + S2 i ⋅ S1 r ) )
Complex Double-Conjugate Floating Point Multiplication
( D 0, D 1 ) = S1 c × S2 c = ( ( S1 r ⋅ S2 r – S1 i ⋅ S2 i ), ( – S 1 i ⋅ S2 r – S2 i ⋅ S1 r ) )
Complex Double-Conjugate Floating Point Multiplication
( D 0, D 1 ) = S1 c × – S 2 c = ( ( – S 1 r ⋅ S2 r + S1 i ⋅ S2 i ), ( S1 i ⋅ S2 r + S2 i ⋅ S1 r ) )
Complex with Real Floating Point Multiplication
( D 0, D 1 ) = S1 c × S2 r = ( S1 r ⋅ S2 r, S1 i ⋅ S2 r )
Complex Integer with Real Integer Multiplication
( D 0, D 1 ) = S1 c × – S 2 r = ( – S 1 r ⋅ S2 r, – S 1 i ⋅ S2 r )
Complex Integer Multiplication
0x8
CIMUL D S1 S2
0x9
CIMUL D S1 -S2
0xA
CIJMUL D S1 S2
( D 0, D 1 ) = S1 c × S2 c = ( ( S1 r ⋅ S2 r – S1 i ⋅ S2 i ), ( S1 i ⋅ S2 r + S2 i ⋅ S r1 ) )
Complex Integer Multiplication
( D 0, D 1 ) = S1 c × – S 2 c = ( ( – S 1 r ⋅ S2 r + S1 i ⋅ S2 i ), ( – S 1 i ⋅ S2 r – S2 i ⋅ S1 r ) )
Complex Integer Conjugated Multiplication
( D 0, D 1 ) = S1 c × S2 c = ( ( S1 r ⋅ S2 r + S1 i ⋅ S2 i ), ( S1 i ⋅ S2 r – S2 i ⋅ S1 r ) )
19
7011A–DSP–12/08
MUL
Code
Mnemonic
0xB
CIJMUL D S1 -S2
0xC
CIJJMUL D S1 S2
0xD
CIJJMUL D S1 -S2
0xE
CIRMUL D S1 S2
0xF
CIRMUL D S1 -S2
0x20
VFMUL D S1 S2
0x21
VFMUL D S1 -S2
0x22
VIMUL D S1 S2
0x23
VIMUL D S1 -S2
0x28
FMUL D S1 S2
0x29
FMUL D S1 -S2
0x2A
IMUL D S1 S2
0x2B
IMUL D S1 -S2
0x30
VSH D S1 S2
Vectorial Shift
Store in (D0,D1) the operand (S10,S11) shifted by MSBs of (S20,S21).
See Section 3.4.6.1 ”Shift and Bitwise Operations” on page 24
VSHAND D S1 S2
Vectorial Shift and bitwise AND
Store in (D0,D1) the operand (S10,S11) shifted and bitwise ANDed of (S20,S21).
See Section 3.4.6.1 ”Shift and Bitwise Operations” on page 24
Description
Complex Integer Conjugated Multiplication
( D 0, D 1 ) = S1 c × – S 2 c = ( ( – S 1 r ⋅ S2 r – S1 i ⋅ S2 i ), ( – S 1 i ⋅ S2 r + S2 i ⋅ S1 r ) )
Complex Double-Conjugate Integer Multiplication
( D 0, D 1 ) = S1 c × S2 c = ( ( S1 r ⋅ S2 r – S1 i ⋅ S2 i ), ( – S 1 i ⋅ S2 r – S2 i ⋅ S1 r ) )
Complex Double-Conjugate Integer Multiplication
( D 0, D 1 ) = S1 c × – S 2 c = ( ( – S 1 r ⋅ S2 r + S1 i ⋅ S2 i ), ( S1 i ⋅ S2 r + S2 i ⋅ S1 r ) )
Complex with Real Integer Multiplication
( D 0, D 1 ) = S1 c × S2 r = ( S1 r ⋅ S2 r, S1 i ⋅ S2 r )
Complex with Real Integer Multiplication
( D 0, D 1 ) = S1 c × – S 2 r = ( – S 1 r ⋅ S2 r, – S 1 i ⋅ S2 r )
Vectorial Floating Point Multiplication
( D 0, D 1 ) = ( S1 0 × S2 0, S1 1 × S2 1 )
Vectorial Floating Point Multiplication
( D 0, D 1 ) = ( S1 0 × – S 2 0, S1 1 × – S 2 1 )
Vectorial Integer Multiplication
( D 0, D 1 ) = ( S1 0 × S2 0, S1 1 × S2 1 )
Vectorial Integer Multiplication
( D 0, D 1 ) = ( S1 0 × – S 2 0, S1 1 × – S 2 1 )
Floating Point Multiplication
D = S1 × S2
Floating Point Multiplication
D = S1 × – S2
Integer Multiplication
D = S1 × S2
Integer Multiplication
0x31
20
D = S1 × – S2
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
MUL
Code
Mnemonic
Description
0x32
VSHOR D S1 S2
Vectorial Shift and bitwise OR
Store in (D0,D1) the operand (S10,S11) shifted and bitwise ORed of (S20,S21).
0x33
VSHXOR D S1 S2
Vectorial Shift and bitwise XOR
Store in (D0,D1) the operand (S10,S11) shifted and bitwise XORed of (S20,S21).
0x34
VBITSET D S1 S2
Vectorial Bit Set
Store in (D0,D1) the operand (S10,S11) with the N-th bit (value 0 to 31, specified into the (S20,S21)
MSB) set.
See Section 3.4.6.2 ”Bit Manipulation” on page 25
0x35
VBITCLR D S1 S2
Vectorial Bit Clear
Store in (D0,D1) the operand (S10,S11) with the N-th bit (value 0 to 31, specified into the (S20,S21)
MSB) cleared.
0x36
VBITTGL D S1 S2
Vectorial Bit Toggle
Store in (D0,D1) the operand (S10,S11) with the N-th bit (value 0 to 31, specified into the (S20,S21)
MSB) inverted.
0x37
VBITTST D S1 S2
Vectorial Bit Test
Store in (D0,D1) the value of the N. bit (value 0 to 31, specified into the (S20,S21) MSB) of
operand (S10,S11).
0x38
SH D S1 S2
0x39
SHAND D S1 S2
Shift and bitwise AND
See Section 3.4.6.1 ”Shift and Bitwise Operations” on page 24
0x3A
SHOR D S1 S2
Shift and bitwise OR
0x3B
SHXOR D S1 S2
Shift and bitwise XOR
0x3C
BITSET D S1 S2
Bit Set
Store in D the operand S1 with the N. bit (value 0 to 31, specified into the S2 MSB) set.
0x3D
BITCLR D S1 S2
Bit Clear
Store in D the operand S1 with the N. bit (value 0 to 31, specified into the S2 MSB) cleared.
0x3E
BITTGL D S1 S2
Bit Toggle
Store in D the operand S1 with the N. bit (value 0 to 31, specified into the S2 MSB) inverted.
0x3F
BITTST D S1 S2
Bit Test
Store in D the value of the N. bit (value 0 to 31, specified into the S2 MSB) of operand S1.
0x40
VFTOI D S1
Vectorial Floating Point To Integer Conversion
0x41
VITOF D S1
Vectorial Integer To Floating Point Conversion
0x42
FTOI D S1
Floating Point To Integer Conversion
0x43
ITOF D S1
Integer To Floating Point Conversion
0x44
VRF2RF D S1
0x45
RF2RF D S1
Shift
See Section 3.4.6.1 ”Shift and Bitwise Operations” on page 24
Vectorial Register File To Register File Movement
( D 0, D 1 ) = ( S1 0, S1 1 )
Register File To Register File Movement
D = S1
21
7011A–DSP–12/08
MUL
Code
Mnemonic
Description
Inversion Seed Generation
0x46
INVSEED D S1
0x47
INVSQRTSEED D
S1
1D = ----S1
Inversion Square Root Seed Generation
1
D = ---------S1
Vectorial Inversion Seed Generation
0x48
VINVSEED D S1
0x49
VINVSQRTSEED
D S1
0x7F
NOP
1
1
( D 0, D 1 ) =  ---------, ---------
 S1 0 S1 1
Vectorial Inversion Square Root Seed Generation
3.4.4.1
1
1
( D 0, D 1 ) =  -------------, -------------
 S1

S1
0
1
No Operation
Latency of MUL Issue
Two cycles are necessary to fetch the input operands and store the result. The arithmetic operation is performed in two additional cycles (scalar or vectorial case) or 4 cycles (complex domain
operations). So, the total multiplier pipeline is 4 cycles long (vectorial and scalar instructions) or
6 cycles long (complex instructions).
As an example of these two different pipeline cycles, two cases are described:
1. Vectorial or single product with two pipeline cycles
Table 3-15 shows a product between two real numbers.
The operation is
Z=X×Y
where X and Y are the multiplier input data already present in the Register File and Z is the
result (multiplier output data) that will be written to the Register File. X, Y and Z are real numbers
(not complex numbers). The pipeline scheme is shown in Table 3-15.
Table 3-15.
Cycle
MUL Pipeline for Scalar or Vectorial Operations
Operation
Note
0
Data X and Y are read from the Register File
decoding plus RF read
1
Data X and Y enter the multiplier - First half instruction
execution
computation
2
Second half instruction execution - Data Z (result) output
the multiplier
3
Data Z written to the Register File
write back to RF
As shown in the table above, cycle 1 and cycle 2 are the two multiplier pipeline cycles (computation cycles), while cycle 0 and cycle 3 are used to access the Register File.
22
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
The bypass logic implemented in the Register File allows to read the written data starting from
cycle 3 thus saving a pipeline cycle (so latency is still 4 cycles, but loops on accumulated data
can be reduced to 3 cycles).
2. Complex product with four pipeline cycles
Table 3-16 on page 23 shows a product between two complex numbers.
The operation is always the same:
Z=X×Y
but in this case X, Y and Z are complex values. Note that when dealing with complex numbers,
the product is equivalent to
ZRe = ( XRe × YRe ) - ( XIm × YIm )
ZIm = ( XIm × YRe ) + ( XRe × YIm )
where the subscript Re identifies the real part of the complex number and the suffix Im identifies
the imaginary part of the complex number.
The pipeline scheme is shown in Table 3-16 on page 23.
Table 3-16.
Cycle
MUL Complex Pipeline
Operation
0
Data XRe, XIm, YRe and YIm are read from the Register File
1
Data XRe, XIm, YRe and YIm enter the multiplier - First half of the
four multiplication instructions execution
2
Second half of the four multiplication instructions execution Intermediate results output the four multipliers
3
Intermediate results enter the subtractor and the adder - First half
of the addition and of the subtraction instructions execution
4
Second half of the addition and of the subtraction instructions
execution - Data Z (ZRe and ZIm) outputs the adder and the
subtractor
5
Data Z (ZRe and ZIm) written to the Register File
Note
decoding plus
RF read
computation
write back to
RF
Cycles 1, 2, 3 and 4 are the four-computation pipeline cycles (cycles 1 and 2 are used for the
four products and cycles 3 and 4 are used for the addition and the subtraction), while cycle 0 and
cycle 5 are used to access the Register File.
3.4.5
Multiplier Operator
The multiplier operator executes products between 40-bit floating-point numbers or 32-bit signed
integers. In the case of integer instructions, only 32 LSBs are treated, while the 8 MSBs remain
unchanged see Section 4.2 ”Data Organization” on page 63.
Products are allowed between scalar, vectorial and complex numbers. In particular, the complex
product is hardwired with the use of four multipliers, one adder and one subtractor, so that the
complex multiply can deliver a new result at every cycle.
23
7011A–DSP–12/08
The pipeline depth of the multiply operation is four cycles. Besides, in complex products it is feasible to conjugate the second or both the operands as well as to multiply a complex number with
a real one.
Multiplier instructions on floating point or 32-bit signed integers include:
• Complex products
– standard,
– standard with conjugation of input operands
– between real and complex operands
• Vectorial and scalar products
3.4.6
Shifter Operator
The shifter operator performs scalar and vectorial instructions.
Shifter operations include:
• Arithmetic left and right shifts on 32-bit signed integers
• Left rotate of the 32 Least Significant Bits of a 40-bit data (see “Data Organization” on page
63)
• Left rotate module 40-bit
• Logical Shift Right
• Shift Left inserting "1"
• Bit manipulation operations, including set, clear, toggle, and test bits
• Encode of "0" or "1" starting from either left or right
The shifter operator belongs to the MUL issue (see Section 3.4.4 ”MUL Issue” on page 18) with
a total latency of 4 cycles like in Section 3.4.4.1 ”Latency of MUL Issue” on page 22.
3.4.6.1
Shift and Bitwise Operations
The Shifter operator can combine a shift/rotate operation with a bitwise operation (see (V)SH,
(V)SHAND, (V)SHOR, (V)SHXOR Table 3-11 on page 11). The first operand S1 is the data to be
shifted. The 8 MSBs of the second operand S2 encode the shift operation, see Table 3-18 on
page 25. The remaining 32 LSBs of the operand S2 (mask data) are used as second operands
for bitwise operations: (V)SHAND, (V)SHOR and (V)XOR.
Table 3-17.
39
24
38
Second Operand Format (S2)
37
36
35
shift type
34
33
32
31
0
mask data
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
Table 3-18.
Shift Encoding
39
38
37
36
35
34
33
32
0
0
0
N
N
N
N
N
Left arithmetic shift and N. of shifts (0 to +31)
0
0
1
N
N
N
N
N
Right arithmetic shift and N. of shifts (0 to
+31)
0
1
N
N
N
N
N
N
Left rotate mod. 40 and N. of shifts (0 to +39)
1
0
0
N
N
N
N
N
"1" Insert Left Shift and N. of shifts (0 to 31).
Shift left and insert ‘1’.
1
0
1
N
N
N
N
N
Logic shift Right (0 to 31)
1
1
0
N
N
N
N
N
Left rotate mod. 32 and N. of shifts (0 to 31)
0|1
Encode 0|1 starting from Left(0)|Right(1). It
returns in D an integer between 0 and 31
corresponding to the bit position of the first
occurrence of a ‘1’ or ‘0’ (bit 32), starting from
left or right (bit 33)
1
1
1
-
-
-
L|R
NOTE Rev A specific: The sign bit of Arithmetic Shift Left is not shifted out (arithmetic shift left
and logic shift left are not equal).
3.4.6.2
Bit Manipulation
(V)BITSET, (V)BITCLR, (V)BITTGL and (V)BITTST use the second operands bits from 32 to 37
(S2) to specify the bit position, the 32 LSBs are unused.
Table 3-19.
39
38
not used
3.4.7
Bit Manipulation Format
37
36
35
34
33
32
N. bit position (values from 0 to 39)
31
0
not used
Converter Operator
The converter operator performs scalar and vectorial conversion from 40-bit floating point to 32bit signed integer and from 32-bit signed integer to 40-bit floating-point numbers.
The conversion is ANSI C standard compliant, i.e. it truncates the floating point decimal part.
The converter operator belongs to the MUL issue (see Section 3.4.4 ”MUL Issue” on page 18)
with a total latency of 4 cycles like in Section 3.4.4.1 ”Latency of MUL Issue” on page 22.
3.4.8
Mathematic Seed Generator
This device generates seeds that can be used in algorithms for the implementation of 1/x and
1/SQRT(x) 40-bit floating-point functions.
It takes one input floating-point operand from the Register File and returns one floating-point
result (the appropriate seed read from a look-up table) to the Register File.
The seed generator belongs to the MUL issue (see Section 3.4.4 ”MUL Issue” on page 18) with
a total latency of 4 cycles like in Section 3.4.4.1 ”Latency of MUL Issue” on page 22.
3.4.9
Operator Status Flags
The mAgicV DSP is IEEE 754 compliant for most aspects. As a consequence, there are five
types of exceptions:
25
7011A–DSP–12/08
•
inexact result
•
underflow
•
overflow
•
divide by zero
•
invalid operation
Rounding mode is round to nearest (the number is rounded to the nearest representable value;
this mode has the smallest errors associated with it because statistically rounding up and rounding down occur with the same frequency).
There are however a few differences from the IEEE 754 standard:
•
No traps are implemented. When an exception is detected, a status flag is set.
•
Denormalized numbers are not implemented.
Notes:
• computed result - means the result which would have been computed if both exponents
range and precision were unbounded (infinitely precise result).
• rounded result - means the result after rounding to nearest.
• delivered result - means the results that is returned by the operation and which fits with the
format.
3.4.9.1
Special Floating Point Values
Table 3-20 shows a set of special values as coded in the 40-bit extension to the IEEE 754 floating point format used by mAgicV.
Table 3-20.
Number
Sign
Exponent
Mantissa
+NaN
0
255
any configuration
-NaN
1
255
any configuration
+infinity
0
255
0
0x7F80000000
-infinity
1
255
0
0xFF80000000
+zero
0
0
0
0x0000000000
-zero
1
0
0
0x8000000000
0.5
0
126
0
0x3F00000000
1.0
0
127
0
0x3F80000000
1.5
0
127
2-1
0x3FC0000000
2.0
0
128
0
0x4000000000
3.0
3.4.9.2
Special Floating Point Values
0
128
-1
2
Representation
0x4040000000
Invalid Operation (Inop)
Only invalid operands for the operation to be performed can cause invalid operation flag to be
set.
NaN is signaled by the following operations:
• (+ ∞) + (- ∞)
• 0× ∞
26
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
• 1/
( ≤0 )
• ∞or NaN input operand for floating-point to integer conversion
a quiet NaN is delivered when an invalid operation exception occurs.
3.4.9.3
Division by Zero (Div by Zero)
If input operand is zero, a division by zero flag is set.
The delivered result is ∞ with correct sign when a division by zero exception occurs.
3.4.9.4
Overflow (Ovf)
If the magnitude of the rounded result exceeds 2128, an overflow exception is signaled.
The delivered result is ∞, with the sign of the computed result when an overflow exception
occurs.
3.4.9.5
Underflow (Udf)
Two events can cause underflow: tininess and loss of accuracy.
As no traps are implemented, an underflow is signaled only when both tininess and loss of accuracy occur.
Tininess:
computed result lies between ± 2-126 and is not zero
Ex: 2-130 (1.x..x)
Loss of accuracy: delivered result differs from computed result (which means the result is
inexact)
Ex:
computed result=2-130 (1.x..x1100)
delivered result = 2-126 (0.0001x..x1)
As denormalized numbers are not implemented the delivered result is zero when an underflow
exception occurs.
3.4.9.6
Inexact (Inex)
If rounded result differs from computed result or if an overflow exception occurs, an inexact
exception is signaled.
The delivered result is the rounded result or the overflowed result (∞ with the right sign), or the
underflowed result (0 with the right sign).
Note that if the underflow or overflow flag is high, the inexact flag will be high.
3.4.9.7
32-bit Signed Integer Flags
When the input operands are in 32-bit signed integer format the following rules apply:
• Flags Inop, Udf, and Inex are cleared
• Flag overflow has the following meaning: if the result of the operation is not between 231 and
–(231+1) the overflow flag is set. The delivered result is the overflowed result;
There is one more flag for the ALU: the Carry flag can be used to extend the result with one
more bit.
3.4.9.8
mAgicV DSP Flags Summary
Each operator in Figure 3-3 on page 10 sets its own flag:
27
7011A–DSP–12/08
MUL1,MUL2,MUL3,MUL4:
• Inop, Ovf, Udf, Inex
CADD1,CADD2:
• Inop, Ovf, Udf, Inex, Carry
ADD1, ADD2:
• Inop, Ovf, Udf, Inex, Carry for Add output
• Inop, Ovf, Udf, Inex, Carry for Sub output
CONV1, CONV2:
• Integer to floating-point: (Inex, Inop, Ovf are all cleared)
• Floating-point to integer: Inex, Inop, Ovf
SH/LOG1, SH/LOG2:
• No flag
MIN/MAX1, MIN/MAX2:
• Inop
DIV1, DIV2:
• Inop, Ovf, Udf, Inex, Div by Zero
the total amount of flags is 64 (32 for operator path0 and 32 for operator path1), stored in the
MGCSTIKY0 and MGCSTIKY1 registers.
The MGCEXCEPTION register (see Section 5-16 ”MGCEXCEPTION Register” on page 87) has
dedicated bits which collect:
• A logical OR of all inop flags (BADIN bit);
• A logical OR of all overflow flags (BADOUT bit);
• A logical OR of all divzero flags (DIVZERO bit)
3.4.9.9
MGCSTIKY0 and MGCSTIKY1
Operators flags are accumulated in two 32 bit Sticky Status Registers (MGCSTKY0 for the
Operator path0 and MGCSTKY1 for the path1) that are reset after reading.
Flags refer to the units of Figure 3-3 on page 10.
State flags are meaningful only if they assume the logic value ‘1’. These registers are cleared
upon read.
Table 3-21.
28
Format of MGCSTIKY0 and MGCSTIKY1 Registers
31
inopminmax
30
inopconv
29
ovfconv
28
inexconv
27
inopdiv
26
divbyzero
25
ovfdiv
24
udfdiv
23
inexdiv
22
inopadd1
21
ovfadd1
20
carryadd1
19
udfadd1
18
inexadd1
17
inopsub
16
ovfsub
15
carrysub
14
udfsub
13
inexsub
12
inopadd2
11
ovfadd2
10
carryadd2
9
udfadd2
8
inexadd2
7
inopmul2
6
ovfmul2
5
udfmul2
4
inexmul2
3
inopmul1
2
ovfmul1
1
udfmul1
0
inexmul1
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
• inexmul1: inexact primary multiplier
inexact from primary multiplier. Section 3.4.9.6 ”Inexact (Inex)” on page 27
• udfmul1: underflow primary multiplier
underflow from primary multiplier. Section 3.4.9.5 ”Underflow (Udf)” on page 27
• ovfmul1: overflow primary multiplier
overflow from primary multiplier. 3.4.9.4 ”Overflow (Ovf)” on page 27
• inopmul1: inexact primary multiplier
invalid operation from primary multiplier. Section 3.4.9.2 ”Invalid Operation (Inop)” on page 26
• inexmul2: inexact secondary multiplier
inexact from secondary multiplier (used in complex products).
• udfmul2: underflow secondary multiplier
underflow from secondary multiplier.
• ovfmul2: overflow secondary multiplier
overflow from secondary multiplier.
• inopmul2: inexact secondary multiplier
invalid operation from secondary multiplier.
• inexadd2: inexact secondary adder
inexact from secondary adder (used in complex products).
• udfadd2: underflow secondary adder
underflow from secondary adder.
• carryadd2: carry secondary adder
carry from secondary adder. Section 3.4.9.7 ”32-bit Signed Integer Flags” on page 27
• ovfdd2: overflow secondary adder
overflow from secondary adder.
• inopadd2: inexact secondary adder
invalid operation from secondary adder.
• inexsub: inexact subtractor
inexact from primary subtractor (used in complex products).
• udfsub: underflow subtractor
underflow from subtractor.
• carrysub: carry subtractor
carry from subtractor.
• ovfsub: overflow subtractor
overflow from subtractor.
29
7011A–DSP–12/08
• inopsub: inexact subtractor
invalid operation from subtractor.
• inexadd1: inexact primary adder
inexact from primary adder (used in complex products).
• udfadd1: underflow primary adder
underflow from primary adder.
• carryadd1: carry primary adder
carry from primary adder.
• ovfdd1: overflow primary adder
overflow from primary adder.
• inopadd1: inexact primary adder
invalid operation from primary adder.
• inexdiv: inexact divider
inexact from divider.
• udfdiv: underflow divider
underflow from divider.
• ovfdiv: overflow divider
overflow from divider.
• divbyzero: division by zero divider
division by zero operation from divider. Section 3.4.9.3 ”Division by Zero (Div by Zero)” on page
27
• inopdiv: inexact divider
invalid operation from divider.
• inexconv: inexact converter
inexact from converter.
• pvfconv: overflow converter
overflow from converter.
• inopconv: invalid operand converter
invalid operation from converter.
• inopminmax: invalid operand minmax
invalid operation from minmax.
30
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
3.5
On-Chip Data Memory
The on-chip Data Memory System contains 2 banks of 8K 40-bit words. It provides a maximum
throughput of 6 words/cycle and it can be simultaneously accessed by three devices: the computational data path, the AHB master and the AHB slave. Simultaneously, the computational
data-path can fetch and store a maximum of four 40-bit data per cycle, the AHB master can
drive a single access of 32-bit word per cycle and the AHB slave can support single accesses of
32-bit word per cycle (for the management of the 32-bit accesses inside the 40-bit word see the
alias in Section 3.7.2 ”Data Memory Accesses” on page 40). The simultaneous activity of the
AHB master and slave requires an external multi-layer bus matrix implementation.
Each access through P0B (and/or through P1B) can either transfer a single 40-bit data (scalar
access) or access a pair of consecutive memory locations aligned to even address (for operation
on complex or vectorial data types). Accesses through P0B and P1B are reserved to the computational data-path and their addresses are generated by AGU0 and AGU1. See Figure 3-4 for
the Data Memory system and Section 3.6 ”Address Generation Units” on page 32.
Figure 3-4.
Quad Port Data Memory
A H B m aster
P0A
A H B slave
P1A
32-bit
32-bit
Dual to Quad Port logic
PA
Dual Port RAM
2x8K x 40
PB
Dual to Quad Port logic
40 or 80-bit
Core
P0B
3.5.1
40 or 80-bit
Core
P1B
Latency
This memory is a static RAM with 2-cycles of latency for read accesses and 1-cycle to write.
Write and read latencies through the AHB are shown on Table 3-26 on page 39.
3.5.2
Access rules
To grant correct application functionality, simultaneous multiple Read and Write, or multiple
Write accesses (see Section 3.5.2.2 on page 32 and Section 3.5.2.3 on page 32).
Simultaneous multiple read accesses are allowed (see Section 3.5.2.1 on page 32).
31
7011A–DSP–12/08
3.5.2.1
Simultaneous Multiple Read Accesses to Identical Locations
Simultaneous read accesses are allowed. This means that any operation like
• AHB master (P0A), AHB slave (P1A), Core (P0B and P1B) contemporary Read
can be executed without causing unpredictable results.
3.5.2.2
3.5.2.3
Simultaneous Read/Write and Write/Write Accesses to Identical Locations producing Undefined Results
1.
AHB master (P0A) and Core (P0B) contemporary Write
2.
AHB slave (P1A) and Core (P1B) contemporary Write
3.
AHB master (P0A) and Core (P0B) contemporary Write and Read
4.
AHB master (P0A) and Core (P0B) contemporary Read and Write
5.
AHB slave (P1A) and Core (P1B) contemporary Write and Read
6.
AHB slave (P1A) and Core (P1B) contemporary Read and Write
Simultaneous Read/Write and Write/Write Accesses to Identical Locations producing Defined Results
1.
AHB master (P0A) and AHB slave (P1A) contemporary Write (P0A writes, P1A Write fails)
2.
Core (P0B and P1B) contemporary Write (P0B writes, P1B Write fails)
3.
AHB master (P0A) and Core (P1B) contemporary Write and Read (P0A writes, P1B reads
old value)
4.
AHB master (P0A) and Core (P1B) contemporary Read and Write (P0A reads new value,
P1B writes)
5.
AHB slave (P1A) and Core (P0B) contemporary Write and Read (P1A writes, P0B reads
new value)
6.
AHB slave (P1A) and Core (P0B) contemporary Read and Write (P1A reads old value,
P0B writes)
7.
AHB slave (P1A) and Core (P0B) contemporary Write (P1A Write fails, P0B writes)
8.
AHB master (P0A) and Core (P1B) contemporary Write (P0A writes, P1B Write fails)
Only these simultaneous accesses at the same address are allowed
3.6
Address Generation Units
There are two identical Address Generation Units in mAgicV (see Figure 3-1 on page 2) named
AGU0 and AGU1. Each AGU is driven by a dedicated VLIW issue (see Table 3-1 on page 3) and
Section 3.6.1 below.
The AGU can generate complex/vectorial and scalar accesses. In complex/vectorial mode two
words are accessed instead of one (scalar mode). The AGU supports linear addressing and
DSP oriented features like circular buffers. The address generation unit is supported by a multi
field Address Register File (ARF) composed of 4x16x16-bit registers, for a total of 64 16-bit integer registers. Registers named A0-A15 are used to manage 16-bit integers/pointers, while M0M15 registers are for the 16-bit integer/pointer modifiers. When circular buffers are used, S0S15 store the start addresses of the buffers, while L0-L15 store their lengths (zero length means
no circular buffer). Each AGU contains also a private 16-bit TMP register (TMP0 and TMP1)
which can be used by the AGU arithmetic and addressing operations. The AGU is able to per-
32
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
form 16-bit signed/unsigned integer arithmetic operations in parallel to the activities of the 40-bit
floating point and the 32-bit signed integer operators block.
Table 3-22.
63
64-bit ARF Register
48
47
S
32
31
L
16
15
A
0
M
At every clock cycle each AGU can perform addressing (addressing mode) or arithmetic operations (arithmetic mode).
The output of both arithmetic and addressing operations are written either in the A field of an
ARF register or in an internal AGU register named TMP.
The compiler, generating addressing or arithmetic AGU operations, can exploit different solutions in terms of AGU issue generation.
The most compact and orthogonal solution is to generate issues that select a single 64-bit ARFx
(see Section 3.6.1.1 ”Default AGU Issue” on page 34); but sometimes it is convenient to use the
My 16-bit field from a different 64-bit ARF. When two different ARFs are used, some other
issues are inhibited because of the need for additional coding bits, thus creates overlapping on
other issues.
3.6.1
AGU Issues
AGU issues are composed of at least an AGU code, an AGU predication field and 4 bits generally used to address a 64-bit ARF register. See Table 3-22 above.
The format of the AGU issue depends on the AGU code. Table 3-23 shows the AGU code
format:
Table 3-23.
6
0
1
AGU Code Format
5
write
4
vector
3
2
1
Addressing opcode
arithmetic opcode
0
• Bit 6: mode
MODE=’1’, selects the addressing mode
MODE=’0’, selects the arithmetic mode
• write
In addressing mode:
WRITE=’1’, selects a write access
WRITE=’0’, selects a read access
• vector
In addressing mode:
VECTOR=’1’, selects a vector access
VECTOR=’0’, selects a scalar access
• Addressing Opcode
In addressing mode specifies the addressing operation
33
7011A–DSP–12/08
• Arithmetic Opcode
In arithmetic mode specifies the arithmetic operation
3.6.1.1
Default AGU Issue
This AGU issue is used in the arithmetic/addressing operations that use only one 64-bit ARF
bundle composed of four 16-bit registers (Sx, Ax, Mx, Lx). The result is in Ax or TMP registers
and it uses Ax, Mx, Sx, Lx or TMP as sources.
The format of the AGU0 issue is:
vliw[1:0]
AGU0
predication
vliw[95:89]
vliw[88:85]
AGU0 code
ARF add0
vliw[47:41]
vliw[40:37]
AGU1 code
ARF add1
of the AGU1:
vliw[49:48]
AGU1
predication
• ARF addx: the ARF source/destination address
It specifies one of the 16 boundled ARF registers. Note that each ARF register is composed of
four sub-fields S,L,A,M for a total of 64-bit.
• AGUx code: AGU microcode
It specifies the AGU operation to perform.
• AGUx predication
It specifies one of the four predication registers. If the condition of the predication register is false
the issue will not be executed.
3.6.2
Addressing
The addressing can be:
• circular, if the L field of the involved ARF is not zero
• normal, if L=0
In the case of non circular addressing, only the A and the M fields are used to generate the
address. S and L are ignored and no modular addressing is performed.
The S field is considered only in circular addressing and it contains the base address of the
addressed circular vector.
By executing the circular address operation S + (A-S+/-M)% L the A field will be modified as
follows:
• If M>=0
A=A+M
if A+M-S<L
A=A+M-L
if A+M-S>=L
• If M<0
34
A=A+M
if A+M>=S
A=A+M+L
if A+M<S
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
In circular addressing, a Boundary Flag with the following rule is generated:
Bnd flag:
1
if Ax+Mx>=Sx+Lx
1
if Ax+Mx<Sx
0
otherwise
The boundary flag is registered into the MGCCONDITION register (see Section 3.9.2 ”Conditions and Status Flags” on page 47).
All the operations are in twos complement.
Addressing operations can generate the following two exceptions:
Addr_ovfl: signal address memory overflow when address > 0x3FFF (size of the “OnChip Data Memory), counted in 40-bit words (see Section 3.5 ”On-Chip Data Memory” on page
31).
Parity_err: signal memory address odd in vectorial addressing
In case of exception the addressing operation is disabled. An addressing operation can modify
the A field of an ARF (A-ARF) register.
Table 3-24 on page 35 shows the possible addressing modes. For further details see Section 8.
”Revision History” on page 101.
Table 3-24.
Addressing Modes
Addressing
Opcode
Updated Value of Ax
(when L=0, S is ignored
and % is not performed)
Generated Address
(when L=0, S is ignored
and % is not performed)
Description
0x0
Ax=Ax
Addr=Ax
direct register
0x1
Ax=Sx+(Ax-Sx+Mx)%Lx
Addr=Ax
post-auto-increment
0x2
Ax=Sx+(Ax-Sx-Mx)%Lx
Addr=Ax
post-auto-decrement
0x3
Ax=Sx+(Ax-Sx+Mx)%Lx
Addr=Sx+(Ax-Sx+Mx)%Lx
pre-auto-increment
0x4
Ax=Sx+(Ax-Sx-Mx)%Lx
Addr=Sx+(Ax-Sx-Mx)%Lx
pre-auto-decrement
0x5
Ax=Sx+(Ax-Sx+My)%Lx
Addr=Ax
post-auto-increment,
uses two ARFs
0x6
Ax=Sx+(Ax-Sx-My)%Lx
Addr=Ax
post-auto-decrement,
uses two ARFs
0x7
Ax=Sx+(Ax-Sx+My)%Lx
Addr=Sx+(Ax-Sx+My)%Lx
pre-auto-increment,
using two ARFs
0x8
Ax=Ax
Addr=Sx+(Ax-Sx+Mx)%Lx
base plus index regs
0x9
Ax=Ay+simm16
Addr=Ay+simm16
base reg plus signed
immediate, with reg
update, using two ARFs
0xA
Ax=Ax+simm16
Addr=Ax+simm16
base reg plus signed
immediate, with reg
update,
0xB
Ax=Ax
Addr=Ax+simm16
base reg plus signed
immediate
0xC
Ax=Ax
Addr=imm16
16-bit immediate
35
7011A–DSP–12/08
3.6.3
Addressing
Opcode
Updated Value of Ax
(when L=0, S is ignored
and % is not performed)
Generated Address
(when L=0, S is ignored
and % is not performed)
0xD
Ax=simm16
Addr=simm16
16-bit immediate, plus
reg loading
0xE
Ax=Ax
Addr=tmp
direct TMP register
0xF
Ax=Ax
Addr=Ax+tmp
base reg plus TMP
register
Description
Arithmetic
The outputs of arithmetic operations are written either in the A field of an ARF register or in the
TMP register.
Almost all arithmetic operations generate the following registered flags: Neg(negative),
Zero(zero), V(overflow).
• Zero: 1 if res=0; 0 otherwise;
• Neg: 1 in signed operation if res[15]=1;
• V: overflow result of a signed ADD or signed SUB or signed MUL
Compare instructions generate flag C, while instructions involving Lx and Sx generate the
boundary flag B.
All these flags are registered into the MGCCONDITION register and can be used for branches,
predication or pushed into the MGCSTKIQ stack condition register (see Section 3.9.4 ”Condition
Stack Manipulation” on page 50).
Table 3-25.
Arithmetic
Opcode
Arithmetic Operations
Output
0x0
36
Notes
Flag
NOP
0x1
Ax=Ay
copy two different A-ARF
registers
0x2
Ax=Ax-TMP
signed
N,Z,V
0x3
Ax=Ax+TMP
signed
N,Z,V
0x4
Ax=Ay-TMP
needs two A-ARF
registers, signed
N,Z,V
0x5
Ax=Ay+TMP
needs two A-ARF
registers, signed
N,Z,V
0x6
TMP=Ax-TMP
signed
N,Z,V
0x7
TMP=Ax+TMP
signed
N,Z,V
0x8
Ax=Ax+simm16
signed
N,Z,V
0x9
Ax=Ay+simm16
needs two A-ARF
registers, signed
N,Z,V
0xA
TMP=AX+simm16
signed
N,Z,V
0xB
TMP=TMP+simm16
signed
N,Z,V
0xC
TMP=Ax*simm16
signed
N,Z,V
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
Arithmetic
Opcode
Output
Notes
Flag
0xD
TMP=TMP*simm16
signed
N,Z,V
0xE
TMP=Ax*My
needs two ARF
registers,signed
N,Z,V
0xF
EQ(Ax,TMP)
compare equal Ax==TMP
signed/usigned
C
0x10
NEQ(Ax,TMP)
compare not equal
Ax!=TMP signed/usigned
C
0x11
GE(Ax,TMP)
compare greater equal
Ax>=TMP signed
C
0x12
GT(Ax,TMP)
compare greater
Ax > TMP signed
C
0x13
LE(Ax,TMP)
compare less equal
Ax<=TMP signed
C
0x14
LT(Ax,TMP)
compare less
Ax <TMP signed
C
0x15
Ax=TMP
copy
0x16
TMP=Ax
copy
0x17
TMP=Mx
copy
0x18
TMP=Sx
copy
0x19
TMP=Lx
copy
0x1A
Ax=(Ax+Mx)<Lx
generates condition true if
A+M<L, signed
N,Z,V,B
0x1B
Ax=Ax-Mx
generates condition true if
A-M>0, signed
N,Z,V,B
0x1C
Ax=Ax-TMP
generates condition true if
A-TMP>0, signed
N,Z,V,B
0x1D
Ax=(Ax+TMP)<Lx
generates condition true if
A+TMP<L, signed
N,Z,V,B
0x1E
Ax=Ax-simm16
generates condition true if
A-simm16>0 (signed
operation)
N,Z,V,B
0x1F
Ax=simm16
0x20
TMP=0
0x21
Ax=ABS(Ax)
Absolute value of Ax
0x22
TMP=ABS(Ax)
Absolute value of Ax
0x23
TMP=-Mx
sign inversion
0x24
TMP=uimm4
use four bits of ARF
addressing (default issue)
as 4-bit immediate
0x25
BITTEST(Ax,TMP)
BIT TEST Ax&(1<<TMP)
C
37
7011A–DSP–12/08
38
Arithmetic
Opcode
Output
Notes
Flag
0x26
Ax=Ax<<TMP
logical shift left of TMP
positions
N,Z
0x27
Ax=Ax>>TMP
logical shift right of TMP
positions
N,Z
0x28
Ax=Ax>>TMP
arithmetic shift right of
TMP positions
N,Z
0x29
Ax=Ax & TMP
bitwise AND
N,Z
0x2A
Ax=Ax | TMP
bitwise OR
N,Z
0x2B
Ax=Ax ^ TMP
bitwise XOR
N,Z
0x2C
Ax=Ax - TMP
unsigned
Z
0x2D
Ax=Ax + TMP
unsigned
Z
0x2E
Ax=Ay-TMP
needs two A-ARF
registers, unsigned
Z
0x2F
Ax=Ay+TMP
needs two A-ARF
registers, unsigned
Z
0x30
TMP=Ax-TMP
unsigned
Z
0x31
TMP=Ax+TMP
unsigned
Z
0x32
Ax=Ax+imm16
unsigned
Z
0x33
Ax=Ay+imm16
needs two A-ARF
registers, unsigned
Z
0x34
TMP=Ax+imm16
unsigned
Z
0x35
TMP=TMP+imm16
unsigned
Z
0x36
TMP=Ax*imm16
unsigned
Z
0x37
TMP=TMP*imm16
unsigned
Z
0x38
TMP=Ax*My
needs two ARF registers,
unsigned
Z
0x39
GE(Ax,TMP)
compare greater equal
Ax>=TMP unsigned
C
0x3A
GT(Ax,TMP)
compare greater
Ax > TMP unsigned
C
0x3B
LE(Ax,TMP)
compare less equal
Ax<=TMP unsigned
C
0x3C
LT(Ax,TMP)
compare less
Ax <TMP unsigned
C
0x3D
Ax=(Ax+Mx)<Lx
generates condition true if
A+M<L, unsigned
Z,B
0x3E
Ax=Ax-Mx
generates condition true if
A-M>0, unsigned
Z,B
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
3.7
AHB Slave Port
The AHB slave is AMBA® rev 2.0 compliant, and it is directly pluggable into an AHB-lite system.
It can give only “OK” or “ERROR” responses to the AMBA AHB transactions, but it never issues
a “RETRY” or “SPLIT”.
Errors are revealed in the following cases:
1.
wrong address space (address out of space or non existent)
2.
data size not 32-bit (i.e. byte and half-word accesses are non allowed)
3.
address not 32-bit aligned (i.e. 2 LSBs need to be "00")
In case of error a pulse signal is raised and registered into the MGCEXCEPTION register (Section 5.3.1.1 ”MGCEXCEPTION” on page 87).
The slave decoder receives 2 clocks, one for the AHB side and the other related to the core side.
There must be an integer ratio between the 2 clock frequencies (i.e. 1:2, 3:1, etc. etc.); skew
between rising edges of clocks need to be carefully controlled and the relative phase must be
stable.
See mAgicV DSP Implementation Manual for details on how to insert clock-tree for the IP inside
a SoC.
Slave accesses are not pipelined; each access is decoded and issued to a slave decoder block
running at core frequency and when it is completed a new access can be processed. During all
the processing time the AHB slave emits a “WAIT” answer.
The slave decodes three DSP addressing regions: program memory, data memory and registers, with different access times.
Table 3-26.
Start Address
End Address
Size
Access
Write
Latency
Read
Latency
PM
0x00600000
0x0061FFFF
128Kbyte
4 x word32
5
6
DM_I
0x00620000
0x0062FFFF
64Kbyte
word32
5
7
DM_F
0x00640000
0x0064FFFF
64Kbyte
word32
5
7
DM_D
0x00660000
0x0067FFFF
128Kbyte
2 x word32
5
7
REGS
0x00680000
0x00681FFF
8Kbyte
word32
5
6
RESERVED
0x00682000
0x006FFFFF
632Kbyte
word32
5
6
Resource
3.7.1
Addressing Regions
Program Memory Accesses
Internal Program Memory has a size of 8K 128-bit words. A word is made of 4 consecutive
increasing word addresses, so a complete 128-bit word is allocated in the slave addresses x,
x+4, x+8, x+12 where x is a 128-bit aligned address. x stores the lowest part of the program
memory line [31:0]; x+12 stores its highest part [127:96].
39
7011A–DSP–12/08
Table 3-27.
PM Alias
AHB
Address
AHB Data
WA
data0[31:0]
WA+4
data1[31:0]
WA+8
data2[31:0]
WA+12
data3[31:0]
RA
VLIW[31:0]
RA+4
VLIW[63:32]
RA+8
VLIW[95:64]
RA+12
VLIW[127:96]
Operation
PM[127:0]
write to PM
3.7.2
3.7.2.1
data3
data2
read from
PM
data1
data0
VLIW[127:0]
Data Memory Accesses
Internal Data memory has a size of 16Kx40 bit words. It can contain integers or single extended
precision floating data. There are three different aliased memory regions to access this memory
according to the content of the data itself.
DM-I memory alias
This memory alias is used for the 32-bit integer accesses (32 LSBs of the 40-bit word):
• A 32-bit WRITE in the internal data memory writes 32 LSBs of the 40-bit word and it clears
the remaining 8 MSBs.
• A 32-bit READ in the internal data memory reads only 32 LSBs of the 40-bit word.
Table 3-28.
3.7.2.2
DM-I Alias
AHB Data
Operation
integer[31:0]
write to DM-I
data40[31:0]
read from DM-I
DM[39:0]
00000000
integer[31:0]
data40[39:0]
DM-F Memory Alias
This memory alias is used for the 32-bit floating point accesses (32 MSBs of the 40-bit word):
• A 32-bit WRITE in the internal data memory writes 32 MSBs of the 40 bit word and it clears
the remaining 8 LSBs.
• A 32-bit READ in the internal data memory reads only the 32 MSBs of the 40-bit word.
Using this alias it is possible to convert a 40-bit floating point into a 32-bit floating point simply by
cutting the 8 LSBs of the 32-bit mantissa.
Table 3-29.
40
DM-F Alias
AHB Data
Operation
float32[31:0]
write to DM-F
float40[39:8]
read from DM-F
DM[39:0]
float32[31:0]
00000000
float40[39:0]
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
3.7.2.3
DM-D Memory Alias
This memory alias is used to access the full 40-bit data word, without considering the data type.
In this case a data word is made of two consecutive increasing word addresses, so a complete
40-bit word is stored in addresses: x, x+4, where x is a 64-bit word aligned address. x stores 32
LSBs of memory data [31:0], x+4 stores its 8 MSBs part.
Table 3-30.
DM-D Alias
AHB
Address
AHB Data
Operation
WA
data0[31:0]
write to DM-D
WA+4
data1[7:0]
RA
data40[31:0]
RA+4
data40[39:32]
DM[39:0]
data1[7:0]
read from DM-D
3.8
data0[31:0]
data40[39:0]
AHB Master Port
The AHB master is AMBA rev 2.0 compliant, and it is directly pluggable into an AHB system.
It does not implement protection (a default value is issued).
It supports only 32 bit accesses.
It issues only incremental bursts of unspecified length, even in case of single transfers.
It does not emit wait states.
Master grant is always asserted (no arbitration is present, the on-going of AHB transfer modulating HREADY signal is under the addressed slave's responsibility)
Figure 3-5 below indicates the main parts of the AHB master and the DMA engine.
When a DMA channel is ready to start a transfer it turns on the AHB master FSM for data move
to/from the DSP core memories.
FIFOs are controlled by the AHB signals on one side and a decoder interface that either transmits or receives data to or from the memories through a master decoder block that is
responsible for the correctness check and the data dispatching.
41
7011A–DSP–12/08
Figure 3-5.
AHB Master and DMA Engine
C
O
Write data fifo
A
H
AHB
R
interface
E
B
Read data fifo
Decoder
interface
M
E
M
DMA engine
O
R
I
E
AHB slave interface
mAgicV Core and MMU interfaces
S
The AHB master is activated by a DMA engine companion (see Section 7.1 ”DMA interface” on
page 92).
The AHB master first chooses the next winning DMA channel according to a fixed priority algorithm, then it copies the transfer parameters and starts the AHB cycles as soon as possible.
A programmable length up to 64K words burst is then managed directly by the AHB master: a
core engine asks for or delivers data to the internal memories and the AHB bus side manages
the AHB protocol. Between the AHB part and the core part there are 2 FIFOs, one for transmitting (10 locations) and the other for receiving data (16 locations).
The two sides can be clocked by different clock frequencies, but with a fixed ratio and with a
fixed relative phase (ratios like 2:1, 4:1, 1:3 etc. are allowed). The two different clocked words
are separated by the FIFO.
Whenever a transfer write from the internal DSP memories to the AHB bus is running out of data
the transfer is interrupted after current data transfer completion and then continued after re-gaining bus grant, without the need for busy issuing that would occupy the bus and waste useful
bandwidth.
Whenever a transfer read from the AHB bus to the DSP memories is filling up the FIFO, the
transfer is interrupted after current data transfer completion; the master then will transfer the
FIFO content to the internal memories and only when the FIFO is empty the transfer will continue after re-gaining bus grant, without the need for busy issuing that would waste bus cycles.
42
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
3.9
FLOW Control Block
The FLOW control block performs the following tasks:
• Registers movement
• Programs flow control
• Condition management
The FLOW issue has many formats and the FLOW code can change the default format of other
issues.
The basic default format of the FLOW is shown in Table 3-31.
Table 3-31.
vliw[3:2]
FLOW
predication
Default FLOW Issue
vliw[118:113]
vliw[51:50]
FLOW code
FLOW predication write
• FLOW predication
It specifies one of the four predication registers, if the condition of the pointed predication register is “false” (logic ‘0’) the issue will not be executed.
NOTE: Not all FLOW codes are predicated.
• FLOW code
It specifies the FLOW operation to be performed.
• FLOW predication write
It specifies the predication register when the FLOW code has a value comprised between 0x2E
and 0x36
For further details see Section 8. ”Revision History” on page 101.
3.9.1
Register Movements
There are four types of register movements:
1. immediate to register movement
2. register to register movement
3. memory to register
4. register to memory
Movements of type 1 and 2 are performed using explicit FLOW issues, while 3 and 4 are performed using AGU issues.
Register movements use two additional bits of RFA7 field (VLIW[112:104]) to decode the transfer type as specified in Table 3-32 on page 44.
43
7011A–DSP–12/08
Table 3-32.
Transfer Type
RFA7[8]
RFA7[7]
Description
0
0
Movement toward an ARF, flower, dma registers. Destination address
RFA7[6:0] using FLOW
0
1
Movement of ARF, flower, dma registers toward memory. Source address
RFA7[6:0] using AGU0
1
X
Movement toward a RF register destination address RFA7[7:0], using AGU0
Movements can be vectorial or scalar. ARF, RF registers and data memory allow vector movements. Vectorial ARF movements have multiplicity 4: they move four 16-bit registers. RF
vectorial movements have multiplicity 2: they move two 40 bit registers.
NOTE: Only movements with multiplicity 2 can be vectorially predicated (see Section 3.9.5
”Predication” on page 52).
3.9.1.1
Immediate Movements
The immediate movements occupy the flow code to specify the immediate operation, the RFA7
field to specify the destination address and zero (when loading the following integer or floating
point constants: 0,1,-1, 2, -2), one or more VLIW fields to specify the 16/32/64/80-bit immediate
values. Table 3-33 shows the immediate opcodes.
NOTE: Loading large constants reduces parallelism because VLIW fields are used to accommodate constants.
Table 3-33.
Immediate FLOW Opcodes
FLOW
Opcode
Mnemonic
0x1
Description
Latency
immv
Vector Immediate loading, to specify 64 or 80 bits values
depending on the destination register (ARF register 4x16-bit, RF
2x40-bit)
1
0x2
immvc
Vector Conjugate Float loading uses 40-bit immediate for the real
part and its negate for the imaginary part
1
0x3
immsi
Scalar Integer immediate loading uses a 32-bit immediate
1
0x4
immsf
Scalar Floating immediate loading uses a 40-bit immediate
1
0x5
immsic
Scalar integer constant loading uses the FLOW predication write
field of default issue to specify the integer constant to be loaded
see Table 3-36 on page 45
1
0x6
immsfc
Scalar floating constant loading uses the FLOW predication write
field of default issue to specify the floating constant to be loaded
1
0x7
imm0v
Vector zero constant loading
1
0x8
imm0s
Scalar zero constant loading
1
Table 3-34.
vliw[3:2]
FLOW
predication
44
Immediate Issue (imm0,imm0v)
vliw[118:113]
vliw[112:104]
imm0/imm0v
destination addr
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
Table 3-35.
Immediate Issue (immsic, immsfc)
vliw[3:2]
FLOW
predication
Table 3-36.
vliw[112:104]
vliw[51:50]
immsic/immsfc
destination addr
constant sel
Constant Selection
Constant Sel
Scalar Float
Scalar Integer
0x0
1.0
1
0x1
-1.0
-1
0x2
2.0
2
0x3
-2.0
-2
Table 3-37.
vliw[3:2]
FLOW
predication
Table 3-38.
vliw[3:2]
FLOW
predication
3.9.1.2
vliw[118:113]
Immediate Issue (immsi, immsf,immvc)
vliw[118:113]
immsi,
immsf,immvc
vliw[112:104]
vliw[92:53]
destination addr
imm
Immediate Issue (immv)
vliw[118:113]
vliw[112:104]
vliw[92:53]
vliw[43:4]
immv
destination addr
immh
imml
Register to Register Movements
Register movements can be:
• movements that have a RF register as source (require a dedicated FLOW opcode)
• other register to register movements (use RFA5 and RFA7 fields to identify respectively
source and destination addresses)
The Table 3-39 resumes the possible movements.
Table 3-39.
FLOW
Opcode
Register to Register FLOW Opcodes
Mnemonic
Description
Latency
0x9
movv
Vectorial movement
1
0xA
movs
Scalar movement
1
45
7011A–DSP–12/08
FLOW
Opcode
Mnemonic
Description
Latency
0xB
movvf
Vectorial movement (RF as source)
3
0xC
movsf
Scalar movement (RF as source)
3
0xD
movss
Signed extended scalar register movement: 16-bit ARF
register into RF register
1
NOTE: Signed extension movements only from the ARF to Register File registers
NOTE: Only RF and ARF registers classes support movement toward registers of the same
class (i.e. DMA registers can't be moved into other DMA registers)
NOTE: dedicated MUL opcodes can be used to move the RF registers into registers of the same
class.
Table 3-40.
vliw[3:2]
FLOW
predication
Table 3-41.
vliw[3:2]
FLOW
predication
3.9.1.3
Register to Register Issue (movv,movs,movss)
vliw[118:113]
movv,movs,
movss
vliw[112:104]
vliw[103:96]
destination addr
source addr
Register to Register Issue (movvf,movsf)
vliw[118:113]
vliw[112:104]
vliw[76:69]
movvf,movsf
destination addr
source addr
Register to Memory
Register to memory movements can be:
• movements that have a RF register as source
• movements that have no RF register as source
• RF register as source:
RF registers can't be transferred directly to the memory but only through MUL/ADD issue operations. MUL issue results are written in the memory by setting the appropriate write operation on
the AGU0 issue. The ADD issue results are written in the memory by setting the appropriate
write operations on the AGU1 issue.
For instance, to transfer a RF register into the memory the compiler can generate an integer
multiplication with “1” plus an AGU0 addressing issue or an integer addition with “0” plus an
AGU1 addressing issue.
NOTE: Normally the compiler is able to write RF registers to the memory during intensive calculation without adding dummy operations.
The write latency is the ADD/MUL operation latency plus one cycle.
No RF register as source:
In this case only the AGU0 issue can be used and the registers are transferred to the memory by
setting bit 8 and bit 7 of the RFA7 field according the transfer types see Table 3-32 on page 44.
The write latency is one cycle.
46
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
3.9.1.4
Memory to Register
Memory to register operations can be activated both on AGU0 or AGU1 issues specifying a read
operation.
The AGU0 issues uses also the RFA7 field (see Table 3-32 on page 44) to specify the destination register.
The AGU1 issue performs only operations toward the RF registers and uses the RFA5 field to
specify the RF destination address.
The read latency is three cycles long.
3.9.2
Conditions and Status Flags
Condition and status flags are registered in the MGCONDITION register. They are used to perform Condition Stack Manipulation (see Section 3.9.4 ”Condition Stack Manipulation” on page
50), conditional branches and predication.
These flags can be generated by: compare instructions, AGU addressing and arithmetic instructions or by writing Write And Test (WAT) registers.
Conditional branches and calls use condition flags.
3.9.2.1
MGCCONDITION Register
This register is reset to zero at every program start (i.e.setting the START bit on the MGCCTRL
register).
Table 3-42.
MGCCONDITION
15
-
14
-
13
sirq3
12
sirq2
11
rptloop
10
WATcond
9
fpucond1
8
fpucond0
7
agu1neg
6
agu1zero
5
agu1bnd
4
agu1cond
3
agu0neg
2
agu0zero
1
agu0bnd
0
agu0cond
• agu0cond: AGU0 condition
condition flag generated by the AGU0 compare instructions or by the AGUFLAG FLOW code
that converts one of the following status flags: AGU0BND, AGU0ZERO, AGU0NEG into the condition flag AGU0COND.
• agu0bnd: AGU0 boundary
status flag generated by the AGU0 circular addressing mode (see Section 3.6.2 ”Addressing” on
page 34).
• agu0zero: AGU0 zero
status flag generated by the AGU0 arithmetic operations, when the result is zero.
• agu0neg: AGU0 negative
status flag generated by the AGU0 arithmetic signed operations, when the result is negative.
• agu1cond: AGU1 condition
condition flag generated by the AGU1 compare instructions or by the aguflag FLOW code that
converts one of the following status flags: AGU1BND, AGU1ZERO, AGU1NEG into the condition flag AGU1COND.
47
7011A–DSP–12/08
• agu1bnd: AGU1 boundary
status flag generated by the AGU1 circular addressing.
• agu1zero: AGU1 zero
status flag generated by the AGU1 arithmetic operations, when the result is zero.
• agu1neg: AGU1 negative
status flag generated by the AGU1 arithmetic signed operations, when the result is negative.
• fpucond0: FPU left condition
condition left (or real) flag generated by the ADD issue compare instructions. In scalar and complex compare left and right conditions are equal. In a vectorial compare left and right conditions
may be different.
• fpucond1: FPU right condition
condition right (or imaginary) flag generated by the ADD issue compare instructions. In scalar
and complex compare left and right conditions are equal. In a vectorial compare left and right
conditions may be different.
• WATcond: Write And Test condition
condition flag generated by writing WAT registers. The written value is the mask of bits to test in
the register. WATcond=’1’ if the bits tested are ‘1’.
• rptloop: repeat loop
condition flag generated if there is an active hardware loop.
RPTLOOP=’1’, when MGCLOOPCNT register is > 0.
RPTLOOP=’0’, when MGCLOOPCNT == ’0’.
• sirq2
status flag generated by the FLOW SIRQ2 code.
• sirq3
status flag generated by the FLOW SIRQ3 code.
3.9.3
Conditioned and Unconditioned Jumps
mAgicV supports PC-relative and Indirect Register Jumps:
• PC-relative Jumps use a 16-bit signed immediate offset, allowing forward and backward
branches of up to 32K locations. The size of the physical on-chip program memory is 8K
locations. When the PMU is enabled the total addressable space for program memory is
extended to 64K locations. Starting from any 16-bit PC value, the addition or subtraction of a
32K value allows to span the full addressable 16-bit space. For this purpose, no exception
overflow exception is generated by relative address computation.
• Indirect Register Jumps use the 16-bit MGCBRANCH register. This register must contain an
absolute Program Memory address.
Jumps can be either unconditioned or conditioned. The branch instruction specifies the condition
source to be used, selecting among five possible sources (AGU0COND, AGU1COND,
FPUCOND0, WATCOND, TOP0) using an 8-bit JUMP TYPE field (see Section 3.9.3.1 ”Selec-
48
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
tion of Jump Condition” on page 49). The value of the condition must have been previously
stored in the MCGCONDITION register.
The pair of CALL and RETurn FLOW instructions provide a fast jump and link mechanism to
enter into and exit from leaf functions. The FLOW code (CALL) preserves the address of the
instruction after the branch in the MGCSTK register (link register). The RET FLOW code
replaces the program counter MGCPC with the MGCSTK register content.
Branches and calls can be predicated (see Section 3.9.5 ”Predication” on page 52).
NOTE: To return from an Interrupt Service Routine the RETI code must be used.
Table 3-43.
FLOW
Opcode
Mnemonic
Description
0x26
br
0x27
br_reg
0x28
call
0x29
call_reg
0x2C
ret
RETURN from a CALL, absolute jump to MGCSTK register
0x2D
reti
RETUN from an INTERRUPT service routine
Table 3-44.
vliw[3:2]
FLOW
predication
Table 3-45.
vliw[3:2]
FLOW
predication
Table 3-46.
3.9.3.1
Branch Operations
PC-Relative Jump (Immediate Offset)
Indirect Register Jump to MGCBRANCH
PC-Relative CALL (Immediate Offset), saves return address to MGCSTK
Indirect Reg CALL to MGCBRANCH, return address in MGCSTK
Opcodes for PC-relative Jumps and Calls
vliw[118:113]
vliw[84:77]
vliw[76:61]
br,call
jump type
offset
Opcodes for Indirect Registers Jumps and Calls
vliw[118:113]
vliw[84:77]
br_reg,call_reg
jump type
Opcodes for RET and RETI
vliw[3:2]
vliw[118:113]
FLOW
predication
ret,reti
Selection of Jump Condition
Jumps have an 8-bit type field to specify the type of condition used (see Section 3.9.2 ”Conditions and Status Flags” on page 47). For unconditioned Jumps this field must be set to 0.
NOTE: For conditioned Jumps only one bit must be set.
Table 3-47.
7
reserved
Jump Condition Selector
6
reserved
5
reserved
4
agu1cond
3
agu0cond
2
top0
1
fpucond0
0
WATcond
49
7011A–DSP–12/08
• WATcond
WATcond=’1’, jump conditioned by WATcond.
WATcond=’0’, jump not conditioned by this flag.
• fpucond0
fpucond0=’1’, jump conditioned by FPU compare fpucondflag0 (vector conditions are not used).
fpucond0=’0’, jump not conditioned by this flag.
• top0
top0=’1’, jump conditioned by the top of the condition stack0.
top0=’0’, jump not conditioned by this flag.
• agucond0
agucond0=’1’, jump conditioned by aguflag0.
agucond0=’0’, jump not conditioned by this flag.
• agucond1
agucond1=’1’, jump conditioned by aguflag1.
agucond1=’0’, jump not conditioned by this flag.
3.9.4
Condition Stack Manipulation
In addition to the mechanism of conditions direct generation performed by AGU0, AGU1, and
Operators Block (directly stored in MGCCONDITION), two 16-bit condition stack registers stored
in the 32-bit MGCSTKIQ register can be used to push and manipulate condition flags previously
generated.
Table 3-48 Lists the FLOW codes that operate on the stack registers.
Table 3-48.
FLOW
Opcode
50
Condition Stack Operations
Mnemonic
Description
0xE
push_op
Pushes vector conditions generated by ADD
0xF
push_agu0
Pushes scalar condition generated by AGU0 in both STACK0 and STACK1
0x10
push_agu1
Pushes scalar condition generated by AGU1 in both STACK0 and STACK1
0x11
pop
Pops out conditions from both STACK0 and STACK1
0x12
true
Pushes true ‘1’ on both STACK0 and STACK1
0x13
false
Pushes false ‘0’ on both STACK0 and STACK1
0x14
fsnot
inverts the TOPs of the STACKs
0x15
fsand
Makes the logic AND between the first and second elements of the
STACKs and replaces TOPs with the results
0x16
fsor
Makes the logic OR between the first and second elements of the STACKs
and replaces TOPs with the results
0x17
fsxor
Makes the logic XOR between the first and second elements of the
STACKs and replaces TOPs with the results
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
FLOW
Opcode
Mnemonic
Description
0x18
repush
Pushes TOPs into the STACKs
0x19
fsswap
Swaps the first and the second value of the STACKs
0x1A
fsand2
Makes the logic AND between the first and second value of the STACKs
and pushes the results in TOPs
0x1B
fsi2q
pushes TOP0 condition on the STACKs
0x1C
fsq2i
pushes TOP1 condition on the STACKs
The default FLOW issue shown in Table 3-31 on page 43 is used for condition stack
manipulation.
3.9.4.1
MGCSTKIQ
The stack registers are accessed as the LSB and MSB parts of the MGCSTKIQ 32-bit register
and they accommodate respectively the right and the left part of a vectorial flag.
Table 3-49.
MGCSTKIQ 32-bit Register
31
30
STACK1
29
28
27
26
25
24
23
22
21
20
19
18
17
16
top1
15
14
STACK0
13
12
11
10
9
8
7
6
5
4
3
2
1
0
top0
STACK0 field stores the right condition vector flag, while STACK1 field stores left condition vector flag. Stack0[0] is named TOP0, while STACK1[16] is named TOP1.
While a push operation on an AGU condition flag (PUSH_AGU0, PUSH_AGU1) copies the
same flag value on the top of both the I and Q stacks, the PUSH_OP code, which receives the
conditions from the Operators Block, can push different values on the top of the I and Q stacks
when executing a Vectorial compare.
The TOP0 can be used in conditional branches, while both TOPs can be used in some ADD
issue operations (SWAP,VSWAP,SEL).
51
7011A–DSP–12/08
3.9.5
Predication
mAgicV has 4 predication registers, packed into a single 16-bit MGCPRED register.
Each VLIW issue (ADD, MUL, AGU0, AGU1, FLOW) is associated with individual fields of 2 bits
in the program word. Those 2-bit fields specify the number of the predication register to be used
to predicate the execution of the associated issue.
If the value of the predication register is false (logic ‘0’) the operation of the associated issue is
not executed.
NOTE: predication is always enabled. This means that every operation which must be actually
executed must be associated to a predication register set to a TRUE value. At reset, the predication register #0 is set to TRUE and the operations referring to it will be executed.
The value of the 4 predication registers can be set by the predication operations specified by
Table 3-50. The VLIW coding for these operations is described by Table 3-51 and Table 3-52
below.
NOTE: If the HW support for the software pipeline is enabled, all issues are forced to use the
predication register #0.
Table 3-50.
FLOW
Opcode
Mnemonic
Description
0x2E
clrp
clears predication register
0x2F
setp
sets true predication register, OPERATION NOT PREDICATED
0x30
setpv
sets predication register to immediate, OPERATION NOT PREDICATED
0x31
settop
sets predication register with the TOPs of condition stacks
0x32
setop
sets predication register with the fpucond0 and fpucond1 condition flags
0x33
setagu0
sets predication register with agu0cond
0x34
setagu1
sets predication register with agu1cond
0x35
setnotp
sets predication register <reg+1> equal to the not of <reg>
0x36
setnot
inverts condition of predication register
Table 3-51.
vliw[118:113]
setpv
Table 3-52.
vliw[3:2]
FLOW
predication
3.9.5.1
Opcodes for setting the Values of the Predication Registers.
VLIW Coding for the SETPV Operation
vliw[51:50]
destination addr
vliw[3:2]
2-bit imm
VLIW Coding for other Predication Setting Instructions
vliw[118:113]
vliw[51:50]
clrp,settop,setop,setagu0,setagu1,setnot
destination addr
MGCPRED
The MGCPRED register packs the 4 predication registers plus a PREMODE field which specifies a modality of behavior for the operations setting the values of the predication registers listed
in Table 3-50 above.
52
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
Each predication register is 3-bit wide. A predication register with all the 3 bits set to 1 enables
the execution of any issue predicated by it (TRUE value). C0 and C1 fields accommodate vector
condition flags and they are used to enable/disable vector operations producing vectorial results
to be stored in the RF and in the memory. All scalar operations must have C0=C1, otherwise
undefined results may be generated. The third bit G0 is updated as shown in Table 3-50 on
page 52 depending on the PREDMODE field as shown in Table 3-55 below.
Table 3-53.
MGCPRED Register
15
7
14
6
13
predmode
5
predreg2
Table 3-54.
2
G
12
11
4
predreg1
3
10
predreg3
2
9
1
predreg0
8
predreg2
0
Predication Register
1
C1
0
C0
• C0: condition flag0
Right vector condition.
• C1: condition flag1
Left vector condition.
• G: global predication flag
Used to predicate:
• AGU addressing and arithmetic operations
• data movement towards the ARF, DMA and FLOW registers
• flow control operations.
Table 3-55.
Predmode
Global Predication Flag Update
Description
0x0
G=C0, default
0x1
G=(C0 OR C1)
0x2
G=(C0 AND C1)
0x3
G=G, not update
53
7011A–DSP–12/08
3.9.6
Hardware Loop Support
mAgicV has hardware loop support to optimize loop overhead. Repeats can be combined with
hardware software-pipeline to enhance code density and register reuse.
Table 3-56.
FLOW
Opcode
Repeat Operations MGCLOOPCNT
Mnemonic
Description
0x37
repb
repeat block
0x39
reserved
reserved
REPB opcode requires the initialization of the 32-bit loop counter MGCLOOPCNT register with
the length of the loop. Repeats always execute at least one loop-cycle (their logic is similar to a
do-loop C-statement) the counter is decremented by one at the end of the loop.
3.9.6.1
MGCREPEAT
REPB opcode loads the start and end addresses of the block to be repeated into the 32-bit
MGCREPEAT register. The start of the block PMA is set to the current PC + 3, while the end of
the block PMA is set to current PC + block len -1 + 3.
Table 3-57.
3.9.6.2
MGCREPEAT Register
31
30
29
28
27
START OF BLOCK PMA
26
25
24
23
22
21
20
19
START OF BLOCK PMA
18
17
16
15
14
13
12
11
END OF BLOCK PMA
10
9
8
7
6
5
4
3
END OF BLOCK PMA
2
1
0
Repeat Block
The repeat block is used mainly to support hardware loops, with no jump overhead. The repeating block starts at PMA+3, where PMA is the location of the repeat instruction and it needs a
BLOCK LENGTH field to specify the length of the block to be repeated.
Table 3-58.
vliw[3:2]
FLOW
predication
Repeat Block Issue (repb)
vliw[118:113]
vliw[76:61]
repb
block length
NOTE: If the block is compressed, the repeat block parameter must change to reflect the final
length (after compression) of the block. To maintain logic coherency the compressor can't compress the code between the repeat block and the start of the repeating block.
54
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
3.9.7
Software Pipeline HW Support
The hardware support to the software pipeline allows eliminating prologues and epilogues of
software-pipelined loops, saving program memory space. The drawback can be an increased
execution time, due to more loop-cycles needed to complete the original loop.
In software pipeline mode, each computational issue (AGU0, AGU1, MUL, ADD) has an associated number (named STAGE) that represents the loop iteration K from which the issue will be
enabled. The issue will remain enabled up to K=MGCLOOPLEN + STAGE. The number of iterations of the loop will be MGCLOOPLEN + MAXSTAGE, where MAXSTAGE is the maximum
stage declared inside a software pipelined loop, and MGCLOOPLEN is the register that holds
the loop length.
The SWPIPE FLOW code is used to enable or to disable a software pipelined loop. This microinstruction needs two parameters: MAXSTAGE a 3-bit field and ON/OFF (ON=’1’) flag (see Table
3-59 below). MAXSTAGE is the maximum stage declared inside a software pipelined loop.
ON/OFF bit flags specify if we are entering or exiting a software pipelined loop.
Table 3-59.
vliw[118:113]
swpipe=0x3B
Table 3-60.
vliw[53:51]
STAGE MUL
SW Pipeline Issue
vliw[51]
ON/OFF
vliw[50]
vliw[3:2]
MAXSTAGE
STAGE Fields in SW Pipeline Mode
vliw[50:49]
STAGE AGU1
vliw[5:3]
STAGE ADD
vliw[2:0]
STAGE AGU0
NOTE: If HW support for software pipeline is enabled, all predication addresses point to zero.
This means that only one level of predication is supported (predreg0 register).
NOTE: Software pipeline is disabled when entering in interrupt routine. It is not possible to use it
inside an Interrupt Service Routine.
3.9.7.1
MGCLOOPLEN
This 32-bit register is used only within HW software pipelined loops. It takes into account the
number of iterations that the loop must do, while the MGCLOOPCNT is the loop counter. By writing this register the MGCLOOPCNT will be written as well (to save a write operation: at the
beginning of the loop MGCLOOPCNT=MGCLOOPLEN).
NOTE: The command swpipe (ON) will add automatically MAXSTAGE to MGCLOOPLEN and
will copy the result into MGCLOOPCNT and MGCLOOPLEN registers.
NOTE: The command swpipe (OFF) will subtract MAXSTAGE to MGCLOOPLEN and will copy
the result into MGCLOOPCNT and MGCLOOPLEN registers.
55
7011A–DSP–12/08
3.10
Program Management Unit
The mAgicV architecture specifies a 16-bit virtual program memory space (64K 128-bit words).
This virtual space is mapped into a physical 13-bit physical program memory space by a PMU.
The pm word (program word) is composed of 128 bits, the PMU maps 64K pm words of external
program memory in 8K pm words of internal memory. The external program memory space is
divided into 64 pages of 1K pm words. Each 1K pm word page is divided into sixteen chunks,
each one composed of 64 pm words, as described in Table 3-61 below.
Table 3-61.
15
14
Virtual Address
13
12
virtual page
11
10
9
8
7
chunk
6
5
4
3
2
offset
1
0
An efficient page replacement algorithm is realized in hardware to avoid software overhead.
It is possible to instruct the PMU to fix a set of physical pages, excluding them from the replacement algorithm. Each of the 8 physical pages has an associated PMUMAPPEDVIRT register
used to specify the virtual page (each one described by one of the 64 PMUVIRT registers) and
the chunks already loaded on the internal memory.
Two types of faults can be generated at every cycle:
• Page fault
• Chunk fault
A page fault is generated when the virtual page isn't physically mapped into one of the eight
internal physical pages. In this case the PMU finds a physical page to host the new virtual page.
If all physical pages are allocated, the PMU will replace the last recently used page with the new
one, using an hardware replacement algorithm (see Section 3.10.2 ”Replacement Algorithm” on
page 60) which operates on the PMU register described by Table 3-62 below.
3.10.1
PMU Registers
PMU registers can be accessed only through the AHB slave accesses. Each PMU register is
described later in a specific section. Table 3-62 contains the list of PMU registers.
Table 3-62.
56
PMU Registers
AHB Offset
Name
Type
Reset Value
0x2000x2FC
PMUVIRT
RW
NA
0x3000x31C
PMUMAPPEDVIRT
RW
0x00000000
0x320
PMUFAULTADD
RW
NA
0x324
PMUEXTADD
RW
0x200
0x328
PMUCTL
WO
NA
0x328
PMUSTAT
RO
0x00000000
0x32C
PMUMISSCNT
RW
0x00000000
0x330
PMUPHYSPNT
RW
0x7
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
3.10.1.1
PMUVIRT
The PMU has 64 6-bit registers (PMUVIRT), one for each virtual page. PMUVIRT descriptors
are used to translate virtual pages into physical pages to detect page miss and to perform pagereplacement algorithm. Table 3-63 describes the format of PMUVIRT registers.
Table 3-63.
5
Virtual Page Descriptor Register
4
phys page
3
2
ref
1
fixed
0
map
• map
MAP=’1’, the virtual address corresponding to the virtual page descriptor register is currently
mapped to PHYS PAGE.
MAP=’0’, the virtual address isn’t mapped.
• fixed
FIXED=’1’, the virtual page is fixed and can’t be replaced.
FIXED=’0’, the virtual page can be replaced.
• ref: reference (Read Only)
REF=’1’, the virtual page has been recently accessed. See replacement algorithm.
REF=’0’, the virtual page has not been recently accessed.
• phys page: physical page
It’s the 3-bit value identifying one of the eight PMUMAPPEDVIRT describing the physical page
that hosts the virtual page.
3.10.1.2
PMUMAPPEDVIRT
The PMU has eight 32-bit descriptor registers, (PMUMAPPEVIRT0 - PMUMAPPEDVIRT7).
Each PMUMAPPEVIRTx is associated to one of the 8 available on-chip physical pages.
Each PMUMAPPEVIRTx register holds a pointer to the virtual page which is mapped on the
physical page #x and a bitmap that specifies the chunks loaded. Table 3-64 describes the format
of PMUMAPPEDVIRT registers.
Table 3-64.
PMUMAPPEDVIRT Register
31
chunk15
30
chunk14
29
chunk13
28
chunk12
27
chunk11
26
chunk10
25
chunk9
24
chunk8
23
chunk7
22
chunk6
21
chunk5
20
chunk4
19
chunk3
18
chunk2
17
chunk1
16
chunk0
15
14
13
12
11
10
9
8
1
0
00000000
7
0
6
valid
5
4
3
2
mapped virtual page
• mapped virtual page
Describes the virtual page number mapped to the physical page associated with the PMUMAPPEDVIRTx register (0<=x<7)
57
7011A–DSP–12/08
• valid
VALID=’0’, the associated physical page doesn’t host any virtual page.
VALID=’1’, the associated physical page maps the specified virtual page.
• chunk15-chunk0
it’s a bitmap that specifies what chunks are loaded into the associated physical page
3.10.1.3
PMUFAULT: PMU Fault Address
This 16-bit register contains the virtual address that generated the last miss.
3.10.1.4
PMUEXTADD
This 12-bit register contains the MSB of the address generated by the PMU to access the program residing in the external memory.
This register allows handling of several program images on the 4-GB of external memory space.
The external address generated by the PMU is shown in Table 3-65.
Table 3-65.
External Address
31
30
23
22
21
pmuextaddreg
15
14
7
29
28
27
pmuextaddreg
20
19
13
12
pmufaultadd[15:6]
6
5
11
26
25
24
18
17
pmufaultadd[15:6]
10
16
9
8
0
4
3
2
1
0
0x00
3.10.1.5
PMUCTL
It is the PMU control register (Write Only access).
Table 3-66.
PMUCTL
31
-
30
-
29
-
28
-
27
-
26
-
25
-
24
-
23
-
22
-
21
-
20
-
19
-
18
-
17
-
16
-
15
-
14
-
13
-
12
-
11
-
10
-
9
-
8
-
7
6
5
4
3
-
-
-
-
1
prefetch
mask
0
-
2
prefetch
unmask
softreset
• softreset: software reset
SOFTRESET=’1’, resets the PMU state machine.
SOFTRESET=’0’, No effect.
NOTE: Only the PMUSTAT register is reset to the value shown in Table 3-62 on page 56. Other
registers are left unchanged.
58
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
• prefetch mask
PREFETCH MASK=’1’, prefetch instructions are not performed.
PREFETCH MASK=’0’, No effect.
• prefetch unmask
PREFETCH UNMASK = ‘1’, allows execution of prefetch instructions.
PREFETCH UNMASK = ‘0’, No effect.
3.10.1.6
PMUSTAT: PMU Status Register
It is the PMU status register (Read Only Access).
Table 3-67.
PMUSTAT
31
-
30
-
29
-
28
-
27
-
26
-
25
-
24
-
23
-
22
-
21
-
20
-
19
-
18
-
17
-
16
-
15
14
13
12
11
10
9
-
-
-
-
clkdis
fix page
-
8
prefetch
chunk
7
prefetch
page
6
chunk
fault
5
4
double
fault
3
mask
prefetch
2
1
0
waitmst
waitendtx
busy
Page fault
• busy
BUSY = ‘1’ , PMU state machine is working.
BUSY = ‘0’, PMU state machine is idle.
• waitendtx: wait end transfer
WAITENDTX = ‘1’, PMU is waiting for the completion of a DMA request.
WAITENDTX = ‘0’, PMU is not waiting for a DMA transfer.
• waitmst: wait master ready
WAITMST = ‘1’, PMU is waiting for the AHB master to be ready.
WAITMST = ‘0’, PMU is not waiting for the AHB master.
• mask prefetch
MASKPREFETCH = ‘1’, prefetched are masked.
MASKPREFETCH = ‘0’, prefetches are not masked.
• double fault
DOUBLEFAULT = ‘1’, a double miss has been detected.
DOUBLEFAULT = ‘0’, no double fault.
• page fault
PAGEFAULT = ’1’, a page miss has been detected.
59
7011A–DSP–12/08
PAGEFAULT = ’0’, no page fault.
• chunk fault
CHUNKFAULT = ‘1’, a chunk miss has been detected (no need of page replacing).
CHUNKFAULT = ‘0’, no chunk fault.
• prefetch page
PREFETCHPAGE = ‘1’, a page must be prefetched.
PREFETCHPAGE = ‘0’, no prefetch page.
• prefetch chunck
PREFETCHCHUNK = ‘1’, a chunk must be prefetched.
PREFETCHCHUNK = ‘0’, no prefetch chunk.
• fix page
FIXPAGE = ‘1’, a page is temporary fixed by the PMU during a double fault.
FIXPAGE = ‘0’, no fixing needed.
• clkdis: clock disable
CLKDIS = ‘1’, clock is disabled to allow transparent code fetching.
CLKDIS = ‘0’, no clock disable.
3.10.1.7
PMUMISSCNT
This is a 32-bit register that counts the number of misses done by the program. It can be used to
optimize the code by minimizing the misses (through page fixing and/or prefetches).
3.10.1.8
PMUPHYSPNT
This is a 3-bit register used to realize the replacement algorithm. It points to the physical page
descriptor (PMUMAPPEDVIRT) that represents the virtual page likely to be replaced.
3.10.2
Replacement Algorithm
The PMU register PMUPHYSPNT points to the physical page next to the last loaded by a page
miss.
The PMU will search starting from PMUPHYSPNT the first virtual page with “FIXED”=0 and
“REF”=0. During the search, the PMU resets the “REF” bit of every virtual page checked. If no
page is found (because each virtual page has the “REF”=1) the algorithm is repeated. If no page
is found the exception of MISSPAGEFREE will be raised. This exception could be raised if there
are at least seven pages fixed.
A chunk fault is generated when the virtual page is already mapped into a physical page, but the
chunk isn't present in the physical page yet. In this case the PMU loads 64 pm words corresponding to the missing chunk, without replacing any page.
The PMU supports prefetching instructions in the FLOW issue to improve program page hits.
3.10.3
60
Translation from Virtual Address to Physical Address
The virtual page field (see Table 3-61 on page 56) is used to select one of the 64 PMUVIRT registers. If the virtual page descriptor is “MAP”=1 the 13-bit PMA (Program Memory Address) is
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
built by joining the “PHYS PAGE” field with the 10 LSBs of virtual address else a miss will be
generated (see Table 3-63 on page 57).
Table 3-68.
PMA
12
11
10
PHYS_PAGE[2:0]
3.10.4
8
7
6
5
4
3
virtual address[9:0]
2
1
0
PMU Operations
The program flow can drive PMU to prefetch, fix, unfix virtual pages. Through these commands
it is possible to dynamically optimize a program by reducing its misses.
Table 3-69.
PMU Operations
FLOW
Opcode
Mnemonic
Description
0x2B
prefetch_reg
prefetch the page pointed by MGCBRANCH register
0x3A
prefetch_off
prefetch the page corresponding to the PC relative offset
0x3D
pmucmd
fix, unfix virtual page specified by MGCBRANCH
Table 3-70.
vliw[3:2]
FLOW
predication
Table 3-71.
vliw[3:2]
FLOW
predication
Table 3-72.
vliw[3:2]
FLOW
predication
3.10.4.1
9
prefetch_reg
vliw[118:113]
prefetch_reg
prefetch_off
vliw[118:113]
vliw[76:61]
prefetch_off
offset
pmucmd
vliw[118:113]
vliw[51:50]
pmucmd
fix=1, unfix=2
Prefetching
By prefetching a page the PMU will check if the page is already mapped, otherwise it will start a
DMA to load the requested page in parallel to the program execution.
3.10.4.2
Fixing and Unfixing
By issuing this commands it is possible to fix and unfix a virtual page.
61
7011A–DSP–12/08
4. Programming Model
This chapter describes the programming model and the set of registers accessible to the user.
4.1
Data Formats
mAgicV supports the data types shown in Table 4-1.
Table 4-1.
Data Types
Type
Data Width
half-word
16-bits
used for signed/unsigned 16-bit integers
word
32-bits
used for signed 32-bit integers
32-bits
used either for external memory storage of 32-bit
standard precision floating-point data or for 32-bit
data communication through AHB AMBA
interface
40-bits
used for internal floating point computation
(extended IEEE754 format)
64-bits
used either for external memory storage of
extended precision floating-point data or for
extended precision data communication through
AHB AMBA interface
extended-word
4.1.1
Description
16-bit Integer Format
The 16-bit unsigned integers represent numbers in the range between 0 and 65535.
The twos complement format is used for 16-bit signed integers, in the range between -215 and
215-1.
4.1.2
32-bit Signed Integer Format
mAgicV uses the twos complement for 32-bit signed integers, in the range between -231 and 2311.
No HW support for 32-bit unsigned integers which must be emulated by software.
mAgicV supports vector operations on 32 bit signed integers.
4.1.3
Standard Precision 32-bit Floating-Point Format
Standard 32-bit floating point format is used for data storage in external memory and can be
used for data exchange through AHB AMBA interface. All mAgicV internal floating point computations are performed on 40-bit floating point extended precision data.
4.1.4
Extended Precision 40-bit Floating-Point Format
mAgicV supports 40-bit floating point format, as an extension of the IEEE Standard 754 floating
point, having 1-bit sign, 8-bit exponent and 31-bit mantissa instead of 23-bit mantissa.
Table 4-2.
39
sign
62
40-bit Floating Point Format
38
32
exponent
30
0
mantissa
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
As in the standard IEEE 754 representation of the floating point numbers, the most significant bit
of the mantissa, also known as the “hidden bit”, is not represented.
There are however differences from the IEEE 754 standard:
• No traps are implemented. When an exception is detected, a status flag is set (see Section
3.4.9 ”Operator Status Flags” on page 25).
• De-normalized numbers are not implemented.
• Rounding mode is round to nearest (the number is rounded to the nearest representable
value; this mode has the smallest errors associated with it because statistically rounding up
and rounding down occur with the same frequency).
mAgicV supports vector operations on 40 bit floating point numbers.
4.2
Data Organization
In the memory and in the RF the data is stored as 40-bit quantities (extended-word). Integers
quantities have the 8 MSB padded with zero. Vector accesses occupy two consecutive
addresses (a vector memory access with odd addresses generates exception). The “right” part
of a 2-D vector quantity is contained at a lower address.
The tables below show the representation of the data types listed in Table 4-1 on page 62.
Table 4-3.
half-word unsigned
39
16
15
000000000000000000000000
Table 4-4.
39
32
00000000
Table 4-5.
39
32
00000000
0
halfword
half-word signed extended
31
16
15
1111111111111111
0
halfword
word
31
0
word
A full 64-bit ARF register (see Table 3-22) is packed in the memory and in the RF by using two
consecutive words.
Table 4-6.
39
32
00000000
Table 4-7.
39
32
00000000
even word
31
0
A field
M field
S field
L field
odd word
31
0
63
7011A–DSP–12/08
4.2.1
Register Classes
mAgicV registers are divided into three main classes having well defined data types and
functionalities:
1.
FLOW registers: it groups registers used to control and monitor program flow, to manage
interrupts and DMA. No arithmetic operations are defined for these registers and they can’t be
copied directly in other FLOW registers.
2.
ARF registers: it groups registers involved in address calculation and 16-bit
signed/unsigned arithmetic.
3.
Register File registers: it groups registers involved in 40-bit floating point arithmetic and in
signed integer 32-bit arithmetic. This class of registers can’t be accessed in write mode by an
external AHB master.
4.3
DSP States
mAgicV supports two main modes: run and debug. When the processor is in run mode three
other states are possible: step, sleep, interrupt.
Mode changes can be either caused by software control (i.e. FLOW opcodes or accesses from
external masters through the AHB slave interfaces, both writing on the MGCCTRL register), or
activated by external interrupts or exception processing. A mode can be interrupted by a higher
priority mode, but never by a lower priority. An AHB external master can change any mAgicV
state. Nested interrupts are not supported.
Table 4-8.
4.3.1
DSP States
Priority
State
Description
4
debug
All core pipelines are frozen, it’s safe to access internal memories and
registers through the AHB slave interface. Pending DMA are completed.
3
sleep
All pipelines are frozen, but the DSP is running waiting for some events.
This mode is used mainly in combination with write/read DMA
operations to wait the end of the transfer (EOT).
2
step
Causes one cycle of run state followed by the debug state
1
interrupt
0
run
mAgicV executing an ISR. All pipelines are running, interrupts arriving
on other lines are stored and will be served after execution of the RETI
instruction. Hardware support for SW pipeline is disabled.
All pipelines are running. Interrupt will be served on execution of
branches.
Debug
In debug mode, all DSP pipelines are frozen, only the DMA and the PMU are active, completing
all pending I/O operations. An AHB master controller can access mAgicV internal memories
either in write or in read, through the AHB slave port, without interfering with core pipelines. All
mAgicV internal registers are accessible from external masters in read/write (see rev A Note
below).
NOTE rev.A specific: with the exception of the RF registers which can be read only.
64
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
4.3.2
Sleep
See Section 5.2 ”Sleep and Wakeup” on page 84.
4.3.3
Step
See Section 6.2.4 ”Step Mode Support” on page 91.
4.3.4
Interrupt
See Section 5.1 ”Interrupt Handling” on page 78.
4.3.5
Run
In run mode, mAgicV is executing a program. When the HW support for software pipeline is
enabled an interrupt service is deferred. An AHB master controller can access both mAgicV
internal memories and mAgicV registers. In this mode any access to an internal resource should
be protected by using mutexes (see Section 4.5.1 ”Mutex Support” on page 75). As in debug
mode, an external AHB master can access the RF registers through the AHB slave only in read
mode (see rev A Note above on page 71).
4.3.6
Transitions between States
The Table 4-9 illustrates how different states and modes are both entered and exited.
Table 4-9.
State
Transition between States
Entry Cause
• Induced by FLOW opcodes:
– non-masked exceptions,
– HALT code.
• Caused by external AHB masters:
debug
– by writing STOP or
GBREAKON MGCCTRL
registers bits,
Exit Cause
• induced by external AHB master:
– Clearing of exceptions or
break conditions.
– By writing START or
CONTINUE or
GBREAKOFF MGCCTRL
registers bits.
– breakpoints,
– watchpoints,
– step mode.
• induced by Cross/triggering line:
– external debug request
(Cross Trigger),
• Induced by FLOW opcodes:
– SLEEP FLOW code
• Caused by external AHB masters:
sleep
– by writing SLEEPON bit in
MGCCTRL register
• induced by the events reported in
the MGCWAKECTRL reg:
– Interrupts
– DMA end of transfers.
• caused by external AHB masters:
– by writing SLEEPOFF bit in
MGCCTRL register.
65
7011A–DSP–12/08
State
step
interrupt
run
4.3.7
Entry Cause
Exit Cause
• Caused by external AHB masters:
• Caused by external AHB masters:
– by writing STEPON bit in
MGCCTRL register.
• Induced by non-masked interrupt
on internal/external lines, which are
served on the first occurrence of a
BR_REG,BR or WATCHINT FLOW
opcodes
• Any of the Exit Causes of higher
priority states, which are listed in
the right column above.
– by writing STEPOFF bit in
MGCCTRL register.
• RETI FLOW opcode
• Any of the Entry Causes of higher
priority states, which are listed in
the left column above.
Control and Status Registers
mAgicV control registers are write only. They allow to access the single bit of the register by setting ‘1’ in the respective position of the write mask, ‘0’ has no effect. In this way it is possible to
save several cycles of bit manipulation. Two distinct control bits are needed to either set or clear
a status/mode bit, one to set it and the other to clear it. Normally each control register has an
associated status register that reflects the changes made by writing the control register. They
often have the same address because they are selected by the operation type (write or read).
Status registers may support the Write And Test (WAT) optimization (see Section 3.9.3 ”Conditioned and Unconditioned Jumps” on page 48).
The most important mAgicV control register is the MGCCTRL which allows to start, stop and
debug mAgicV programs.
4.3.7.1
MGCCTRL
Table 4-10.
MGCCTRL Register
31
swreset
30
nofataloff
29
nofatalon
28
allfataloff
27
allfatalon
26
lockoff
25
lockon
24
tickoff
23
tickon
22
pmuoff
21
pmuon
20
pmchkoff
19
pmchkon
18
intovfoff
17
intovfon
16
breakoff
15
breakon
14
watchoff
13
watchon
12
decopff
11
decopon
10
triggoff
9
triggon
8
stepoff
7
stepon
6
gbreakoff
5
gbreakon
4
start
3
stop
2
continue
1
sleepoff
0
sleepon
• sleepon
SLEEPON=’1’, it forces mAgicV into sleep mode.
SLEEPON=’0’, no effect.
• sleepoff
SLEEPOFF=’1’, it forces mAgicV to exit from sleep mode.
SLEEPOFF=’0’, no effect.
66
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
• continue
CONTINUE=’1’, it puts mAgicV in run mode from the debug state (by continuing a previous interrupted program, i.e. step mode).
CONTINUE=’0’, no effect.
• stop
STOP=’1’, it puts mAgicV in debug mode by terminating a running program.
STOP=’0’, no effect.
• start
START=’1’, it starts a new program and puts mAgicV in run mode from the debug state.
START=’0’, no effect.
• gbreakon
GBREAKON=’1’, it puts mAgicV in debug mode by setting a “software break point”.
GBREAKON=’0’, no effect.
• gbreakoff
GBREAKOFF=’1’, it clears the “software break point” and it returns to run mode if no other break
sources are active.
GBREAKOFF=’0’, no effect.
• stepon
STEPON=’1’, it enables the step mode. A break request (STEPREQ bit of MGCSTAT) will be
generated at each cycle forcing mAgicV into debug mode.
STEPON=’0’, no effect.
• stepoff
STEPOFF=’1’, it disables the step mode.
STEPOFF=’0’, no effect.
• triggon
TRIGGON=’1’, it enables Cross Triggering (see Section 4.5.3 ”Programmable Output Lines” on
page 77).
TRIGGON=’0’, no effect.
• triggoff
TRIGGOFF=’1’, disables Cross Triggering.
TRIGGOFF=’0’, no effect
• decopon
DECOMPON=’1’, it enables code decompressor. The internal Program Memory must contain a
valid Compressed Program VLIW.
DECOMPON=’0’, no effect.
67
7011A–DSP–12/08
• decopoff
DECOMPOFF=’1’, it disables code decompressor.
DECOMPOFF=’0’, no effect.
• watchon
WATCHON=’1’, it enables data watchpoints. Write accesses to the internal data memory at the
address MGCWATCH will generate a break (WATCH bit of MGCSTAT) forcing mAgicV into
debug mode.
WATCHON=’0’, no effect.
• watchoff
WATCHOFF=’1’, it disables data watchpoints.
WATCHOFF=’0’, no effect.
• breakon
BREAKON=’1’, it enables program breakpoints, parity errors will be interpreted as breakpoints.
BREAKON=’0’, no effect
• breakoff
BREAKOFF=’1’, it disables program breakpoints.
BREAKOFF=’0’, no effect.
• intovfon
INTOVFON=’1’, it enables signed integer overflow detection.
INTOVFON=’0’, no effect.
• intovfoff
INTOVFOFF=’1’, it disables signed integer overflow detection.
INTOVFOFF=’0’, no effect.
• pmuon
PMUON=’1’, it enables PMU.
PMUON=’0’, no effect.
• pmuoff
PMUOFF=’1’, it disables PMU, virtual = physical address (8K physical address).
PMUOFF=’0’, no effect.
• tickon
TICKON=’1’, it enables clock tick counter (MGCSTEP register).
TICKON=’0’, no effect.
68
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
• tickoff
TICKOFF=’1’, it disables clock tick counter.
TICKOFF=’0’, no effect.
• lockon
LOCKON=’1’, it enables write lock on the MGCCTRL register. When the lock is active only an
external AHB master can write the MGCCTRL register. If mAgicV core tries to write the MGCCTRL with this lock active the write fails and an exception is generated.
LOCKON=’0’, no effect.
• lockoff
LOCKOFF=’1’, it removes the write lock on the MGCCTRL register.
LOCKOFF=’0’, no effect.
• allfatalon
ALLFATALON=’1’, all exceptions are fatal and put mAgicV in debug mode.
ALLFATALON=’0’, no effect.
• allfataloff
ALLFATALOFF=’1’, only some exceptions are fatal and put mAgicV in debug mode (see Section
5.3.1.1 ”MGCEXCEPTION” on page 87).
ALLFATALOFF=’0’, no effect.
• nofatalon
NOFATALON=’1’, all the exceptions are non-fatal and they can be recovered by an appropriate
exception service routine.
NOFATALON=’0’, no effect.
• nofataloff
NOFATALOFF=’1’, only some exceptions are fatal and put mAgicV in debug mode.
NOFATALOFF=’0’, no effect.
• swreset: software reset
SWRESET=’1’, mAgicV state is reset and put in debug mode. To make an hard reset use the
MGCRESET control register.
SWRESET=’0’, no effect.
4.3.7.2
MGCSTAT Register
State flags are meaningful only if they assume the logic value ‘1’.
Table 4-11.
MGCSTAT Register
31
-
30
-
29
stepreq
28
clken
27
ticken
26
watchen
25
intovfen
24
tickovf
23
nonefatal
22
allfatal
21
mgctrlock
20
rpt single
19
pmcheck
18
pty2break
17
softpipe
16
pmuen
69
7011A–DSP–12/08
15
decopen
14
intdbgreq
13
gbreak
12
watch
11
break
10
extdbgreq
9
stop
8
CT
7
halt
6
interrupt
5
exception
4
step
3
start
2
debug
1
run
0
sleep
• sleep
mAgicV is in sleep mode.
• run
mAgicV is in run mode.
• debug
mAgicV is in debug mode.
• start
mAgicV is starting a program (enabled by setting the START bit of the MGCCTRL).
• step
mAgicV is in step mode (enabled by setting the STEPON bit of the MGCCTRL).
• exception
An exception occurred.
• interrupt
mAgicV is serving an interrupt service routine.
• halt
An HALT FLOW has been executed.
• CT: Cross Triggering
Cross Triggering enabled (enabled by setting the TRIGGON bit of the MGCCTRL).
• stop
mAgicV was stopped (enabled by setting the STOP bit of the MGCCTRL).
• extdbgreq: external debug request
This is an external debug request. The request is meaningful only if Cross Triggering is enabled.
• break
a breakpoint occurred (reset by the START and the CONTINUE bit of the MGCCTRL).
• watch
a data watchpoint occurred (reset by the START and the CONTINUE bit of the MGCCTRL).
• gbreak: gui break
a “software breakpoint” occurred (enabled by setting the GBREAKON bit of the MGCCTRL).
70
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
• intdbgreq: internal debug request
debug request sent to an external AHB controller. The request is meaningful only if Cross Triggering is enabled.
• decopen: decompressor enabled
program memory decompressor is enabled (enabled by setting the DECOPON bit of the
MGCCTRL).
• pmuen: PMU enabled
PMU enabled (enabled by setting the PMUON bit of the MGCCTRL).
• softpipe: HW software pipeline support enabled
HW software pipeline support enabled (enabled by the program).
• pty2break: Parity to break
Parity errors are seen as breakpoints (enabled by setting the BREAKON bit of the MGCCTRL).
• pmchecken: Program Memory check enabled
Parity check on Program Memory enabled (enabled by setting the PMCHECKEN bit of the
MGCCTRL).
• rpt single: repeat single
mAgicV is in repeat single mode (enabled by the program).
• mgctrlock: mgcctrl lock
MGCCTRL is locked (enabled by setting the LOCKON bit of the MGCCTRL). Only an external
AHB master controller can write on MGCCTRL register.
• allfatal: all exceptions fatal
all exceptions put mAgicV in debug mode (enabled by setting the ALLFATALON bit of the
MGCCTRL)
• nonefatal: all exceptions non-fatal
all exception are considered non-fatal (enabled by setting the NOFATALON bit of the MGCCTRL), they can be handled by an interrupt handler.
• tickovf: tick counter overflow
32 bits MGCSTEP counter overflow (reset by writing the MGCSTEP register).
• intovfen: integer overflow enabled
integer overflow generate exception (enabled by setting the INTOVFON bit of the MGCCTRL).
• watchen: watch point enabled
watch point enabled (enabled by setting the WATCHON bit of the MGCCTRL). Write accesses
to the internal data memory at the address MGCWATCH set the WATCH status flag and put
mAgicV in debug mode.
• ticken: tick counter enabled
cycle counting enabled (enabled by setting the TICKON bit of the MGCCTRL).
71
7011A–DSP–12/08
• stepreq: step mode request
in step mode this flag goes to ‘1’ at each run cycles forcing mAgicV into debug mode.
• clken: clock enabled
this flag shows if core pipelines are running.
4.3.7.3
4.4
MGCRESET Register
A hard reset is done by writing the value 0xAC1CDEAD in this register.
Register Map
mAgicV registers address mapping can be divided into two banks. The first bank is composed of
128 registers that group: FLOW registers, ARF registers and DMA registers.
The second bank is composed of 256 Register File registers.
In register movements a 9-bit destination address RFA7 is used to access these registers. Bank
selection is done through the most significant RFA7 address bit (see Section 3.9.1 ”Register
Movements” on page 43).
Table 4-13 on page 73 illustrates the internal and AHB external mapping of mAgicV internal
registers.
4.4.1
Register Access Modes
mAgicV registers are classified in Table 4-12 below as:
• Read Only registers
• Write Only registers
• Read/Write registers
• Read and Clear
• Write And Test (see Section 4.4.1.2 ”Write And Test” on page 72)
4.4.1.1
Read and Clear
Some of the status and exception registers are cleared upon read.
4.4.1.2
Write And Test
Some of mAgicV registers support the Write And Test (WAT) optimization that allows to test one
or more bits of a register by writing a mask ‘1’ in the positions to be tested, and ‘0’ in other positions. The WAT condition flag is set to ‘1’ if all the positions tested are ‘1’.
Table 4-12.
Register Access Modes
Register Type
72
Description
WAT
Write And Test support
RC
Read and Clear after read (exception register)
RW
Read and Write register
RO
Read Only register (status register)
WO
Write Only register (control register).
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
4.4.2
Map of mAgicV Registers
The following table summarizes the name, access mode, reset value, internal address
(accesses performed by mAgicV core) and offset for accesses performed by external AHB
masters.
Table 4-13.
mAgicV Register Map
mAgicV Address
AHB Offset
Name
Access Mode
Reset Value
0x0
0x0
MGCCTRL
WO
NA
0x0
0x0
MGCEXCEPTION
RO
0x00000000
0x1
0x4
MGCSTAT
RO,WAT
0x00000204
0x2
0x8
MGCMASK
RW
0x00000000
0x3
0xC
MGCEXCEPTION
RC
0x00000000
0x4
0x10
MGCSTIKY0
RC
0x00000000
0x5
0x14
MGCSTIKY1
RC
0x00000000
0x6
0x18
MGCCONDITION
RW
0x00000000
0x7
0x1C
MGCREPEAT
RW
NA
0x8
0x20
MGCPC
RW
0x0000
0x9
0x24
MGCSTK
RW
0x0000
0xA
0x28
MGCBRANCH
RW
NA
0xB
0x2C
MGCPREDICATION
RW
0x007
0xC
0x30
MGCSTEP
RW
0x00000000
0xD
0x34
MGCWATCH
RW
NA
0xE
0x38
MGCMUTXCTRLSTAT
WAT
0x00000000
0xF
0x3C
MGCLOOPCNT
RW
0x00000000
0x10
0x40
MGCLOOPLEN
RW
0x00000000
0x11
0x44
General Purpose
RW
NA
0x12
0x48
General Purpose
RW
NA
0x13
0x4C
General Purpose
RW
NA
0x14-0x23
0x50-0x8C
ARF-AM
RW
NA
0x24-0x33
0x90-0xCC
ARF-M
RW
NA
0x34-0x43
0xD0-0x10C
ARF-A
RW
NA
0x44-0x53
0x110-0x14C
ARF-L
RW
NA
0x54-0x63
0x150-0x18C
ARF-S
RW
NA
0x64
0x190
MGCDMAEXTADD
RW
NA
0x65
0x194
MGCDMAEXTCIRCLEN
RW
NA
0x66
0x198
MGCDMAEXTMOD
RW
NA
0x67
0x19C
MGCDMAINTADD
RW
NA
0x68
0x1A0
MGCDMAINTCIRCLEN
RW
NA
0x69
0x1A4
MGCDMAINTMOD
RW
NA
73
7011A–DSP–12/08
74
mAgicV Address
AHB Offset
Name
Access Mode
Reset Value
0x6A
0x1A8
MGCDMALEN
RW
NA
0x6B
0x1AC
MGCDMAINTSEG
RW
NA
0x6C
0x1B0
MGCDMACURLEN
RO
0x0000
0x6D
0x1B4
MGCDMACUREXTADD
RO
NA
0x6E
0x1B8
MGCDMAreserved
RO
NA
0x6F
0x1BC
MGCDMASTAT
WAT
0x00000000
0x6F
0x1BC
MGCDMACTRL
WO
NA
0x70
0x1C0
MGCSTKIQ
RW
0x00000000
0x71-0x78
0x1C4-0x1E0
MGCINTSVR
RW
NA
0x79
0x1E4
MGCINTMASK
RW
0xFFFF
0x7A
0x1E8
MGCINTSTAT
RO
0x00000000
0x7B
0x1EC
MGCINTGSTAT
RO
0x0000
0x7C
0x1F0
MGCINTCTRL
WO
NA
0x7D
0x1F4
MGCWAKECTRL
WO
NA
0x7D
0x1F4
MGCWAKESTAT
RO
0x00000000
0x7E
0x1F8
MGCINTSETRESET
WO
NA
0x7E
0x1F8
MGCINTRET
RO
NA
0x7F
0x1FC
MGCINTPRIORITY
RW
0x0000000
NA
0x200-0x2FC
PMUVIRT
RW
NA
NA
0x300-0x31C
PMUMAPPEDVIRT
RW
NA
NA
0x320
PMUFAULTADD
RW
NA
NA
0x324
PMUEXTADD
RW
NA
NA
0x328
PMUCTL
WO
NA
NA
0x328
PMUSTAT
RO
0x00
NA
0x32C
PMUMISSCNT
RW
0x00000000
NA
0x330
PMUPHYSPNT
RW
0x7
0x100-0x1FF
0x400-0xBFC
RF
RO(AHB)
RW(Core)
NA
NA
0xC00-0xC0C
MGCVLIW
RO
NA
NA
0xFC0
MGCRESET
WO
NA
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
4.5
4.5.1
Multicore Synchronization Support
Mutex Support
mAgicV provides 16 mutexes to safely manage resources sharing between an external AHB
master controller and the mAgicV core. There is no predefined meaning for the mutex registers.
The association between mutex and shared resources is driven by software that must add control code to manage the access to the shared resources. The hardware guarantees an atomic
write and test operation to lock mutexes and a fixed priority (external AHB master first) for contemporaneous write accesses.
4.5.1.1
MGCMUTEXCTRLSTAT
The interface is based upon a 32-bit control and status register MGCMUTEXCTRLSTAT. When
this register is read it reports the lock status and the owner of each mutex. In write it is used to
lock and unlock mutexes. The status of the lock/unlock operation is reported into the WAT condition flag.
Table 4-14.
MGCMUTEXCTRLSTAT (Read)
31
owner15
30
owner14
29
owner13
28
owner12
27
owner11
26
owner10
25
owner9
24
owner8
23
owner7
22
owner6
21
owner5
20
owner4
19
owner3
18
owner2
17
owner1
16
owner0
15
locked15
14
locked14
13
locked13
12
locked12
11
locked11
10
locked10
9
locked9
8
locked8
7
locked7
6
locked6
5
locked5
4
locked4
3
locked3
2
locked2
1
locked1
0
locked0
• locked0-locked15
lockedx=’1’ mutex ‘x’ is locked, owner ‘x’ is the owner of the mutex.
lockedx=’0’ mutex ‘x’ is free.
• owner0-owner15
These fields are meaningful only if the relative locked ‘x’ =’1’.
onwerx=’1’, an external AHB master controller is the owner of the mutex ‘x’.
ownerx=’0’, mAgicV is the owner of the mutex ‘x’.
Table 4-15.
MGCMUTEXCTRLSTAT (Write)
31
unlock15
30
unlock14
29
unlock13
28
unlock12
27
unlock11
26
unlock10
25
unlock9
24
unlock8
23
unlock7
22
unlock6
21
unlock5
20
unlock4
19
unlock3
18
unlock2
17
unlock1
16
unlock0
15
lock15
14
lock14
13
lock13
12
lock12
11
lock11
10
lock10
9
lock9
8
lock8
7
lock7
6
lock6
5
lock5
4
lock4
3
lock3
2
lock2
1
lock1
0
lock0
75
7011A–DSP–12/08
• lock0-lock15
Either an external AHB master controller or mAgicV try to set a mutex lock by writing ‘1’ in these
fields. If the lock succeeds (mutex free) the owner and the lock fields of the mutex status register
of Table 4-14 on page 75 are set accordingly.
By writing ‘0’ no effect.
• unlock0-unlock15
Either an external AHB master controller or mAgicV try to remove a mutex lock by writing ‘1’ in
these fields. If the unlock succeeds (the writer is the owner of the mutex) the lock field of the
mutex status register of Table 4-14 on page 75 is cleared.
By writing ‘0’ no effect.
NOTE: by writing more than one bit in either the lock or the unlock field more than one mutex at
a time can be respectively locked or unlocked.
NOTE: in case of a contemporary write by mAgicV and by an external AHB master controller on
the mutex control register, the AHB master will perform the operation.
NOTE: the WAT condition flag is automatically updated when mAgicV writes on the MGCMUTEXCTRLSTAT, if the mutex control operation succeeds WAT=’1’ otherwise WAT=’0’. This flag
can be checked directly by jumps to implement fast lock and unlock routines.
4.5.2
Lock and Unlock Software Procedures
The following pseudo C-code implements the pair of SW mutex lock/unlock (blocking) procedures to be executed by the mAgicV core and by an external AHB master (e.g. an ARM ®
processor).
do {/// LOCK procedure
u32 res;
// write the lock field
WRITE32(AT91_MAGIC_REG_BASE,AT91_REG_MGCMUTEXCTL,(1<<x));
// read the status register
READ32(AT91_MAGIC_REG_BASE,AT91_REG_MGCMUTEXCTL,res);
#ifdef ARM
// ARM loops until the mutex is not acquired (lock=’1’ and owner=’1’)
} while(((res & (1<<x)) && (res & (1<<(x+16))))==0);
#else
// mAgicV loops until the mutex is not acquired (lock=’1’ and owner=’0’)
} while(((res & (1<<x)) && (!(res & (1<<(x+16)))))==0);
#endif
The unlock procedure does not have to test the owner; need only test if the mutex is unlocked.
do {/// UNLOCK procedure
u32 res;
// write the unlock field
WRITE32(AT91_MAGIC_REG_BASE,AT91_REG_MGCMUTEXCTL,(1<<(x+16)));
// read the status register
READ32(AT91_MAGIC_REG_BASE,AT91_REG_MGCMUTEXCTL,res);
// loop until the mutex is not released lock=’0’)
} while(((res & (1<<x))==0);
76
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
4.5.3
Programmable Output Lines
mAgicV has 5 output lines named: SIRQ0,SIRQ1,SIRQ2,SIRQ3, HALT, which can be used to
generate waveforms (SIRQ2,SIRQ3) or external interrupts (SIRQ0,SIRQ1,HALT). The status of
these lines is directly driven by an explicit FLOW code.
Each SIRQ2 or SIRQ3 FLOW opcodes causes a toggle of the corresponding line.
Table 4-16.
vliw[3:2]
FLOW
predication
Table 4-17.
Sirq Generation Issue
vliw[118:113]
vliw[51:50]
sirq=0x1f
sirqnum
Sirq Description
Sirqnum
Description
0x0
cycle pulse on line SIRQ0
0x1
cycle pulse on line SIRQ1
0x2
toggle value on line SIRQ2, reset
0x3
toggle value on line SIRQ3, reset value 0
NOTE: SIRQ2 and SIRQ3 values are contained into the MGCCOND register so that they can
also be modified by writing the MGCCOND register.
The HALT code, puts mAgicV in debug mode by setting the HALTED bit in the MGCSTAT register, this value is reported in the HALT output line.
Table 4-18.
vliw[3:2]
FLOW
predication
Halt Generation Issue
vliw[118:113]
halt=0x23
77
7011A–DSP–12/08
5. Event Handling
When an event occurs the execution of the instruction stream can be:
1. passed to an event handler at an address specified by one of the 8 MGCINTSVR registers.
See Section 5.1 below.
2. resumed by a previous state of sleep (see Section 5.2 ”Sleep and Wakeup” on page 84).
3. halted and then pass in debug mode (see Section 5.3 ”Exceptions” on page 86).
5.1
Interrupt Handling
mAgicV allows very fast interrupt handling by treating interrupts as a routine processor instruction (branch, call, ret). Interrupts don’t break pipelines and save only return program counter into
the read only MGCINTRET register. mAgicV doesn’t cross protection domains to take an interrupt. Since the protection domain remains unchanged on a interrupt, the Interrupt Service
Routine is called as a normal function call.
There are 8 prioritized interrupt lines. Line0 and line1 multiplex four lines each (named shared
lines), so that the number of interrupt lines is 14. Each interrupt line is associated to a 16-bit
interrupt vector register (MGCINTSVR) that must be set to a valid program address, corresponding to the handler interrupt routine. An interrupt, on a previously enabled and not masked
interrupt line (via the MGCINTCTRL register), is registered into the PEND field of the MGCINTSTAT interrupt status register and served in a synchronous way in correspondence of the FLOW
codes shown in Table 5-1.
Table 5-1.
FLOW
Opcode
interrupt Service FLOW Codes
Mnemonic
Description
0x1D
watchint
0x26
br
0x27
br_reg
branch register
0x2C
ret
return from call
0x37
repb
Table 5-2.
vliw[3:2]
FLOW
predication
Table 5-3.
vliw[3:2]
FLOW
predication
check for pending interrupts
branch
repeat block
Watchint Issue
vliw[118:113]
watchint=0x1D
Reti Issue
vliw[118:113]
reti=0x2D
Interrupts can be masked using the MGCINTMASK. A masked interrupt is always registered as
a pending interrupt, but it won't be served until it's unmasked.
78
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
When the program jumps to an Interrupt Service Routine the ISVR MGCSTAT bit will be set,
indicating that no more interrupts will be served until a return from interrupt instruction (RETI) is
executed. The user code return address is saved into the MGCINTRET register and it's automatically restored in the MGCPC register when a RETI issue is executed.
In case more than one interrupt is pending, the line with a greater priority will be served. In case
of equal priority the interrupt line with a lower number will be served.
The priority register MGCINTPRIO is a 24-bit register that allows to associate three bits of priority to each line.
Pending interrupts can be set and cleared by using the MGCINTSETRESET; this feature can be
used to generate or clear interrupts by software over each line.
5.1.1
Interrupt Sources
Interrupts line0 and line1 are special lines that group 4 interrupt lines each (LINESHx). It's a way
to extend the number of interrupts without adding additional resources in terms of interrupt vector registers and prioritization logic. These shared lines can be connected to relatively slow and
homogenous devices as for instance SPIs and SSCs.
An interrupt on line 0 or line 1 is the OR of interrupts on the shared lines.
INT0 = LINESH0 OR LINESH1 OR LINESH2 OR LINESH3.
INT1 = LINESH4 OR LINESH5 OR LINESH6 OR LINESH7.
Table 5-4 shows the interrupt sources and the associated interrupt Service Vector Routine
registers.
Table 5-4.
INT #
Interrupt Sources
Interrupt Line
Source
Description
SVR
LINESH0
external
D940 implementation: SSC0
LINESH1
external
D940 implementation: SSC1
LINESH2
external
D940 implementation: SSC2
LINESH3
external
D940 implementation: SSC3
LINESH4
external
D940 implementation: EXT0
LINESH5
external
D940 implementation: EXT1
LINESH6
external
D940 implementation: EXT2
LINESH7
external
D940 implementation: EXT3
2
line2
external
D940 implementation: Timer channel
A1
MGCINTSVR2
3
line3
external
D940 implementation: SPI0
MGCINTSVR3
4
line4
internal
DMA End Of Transfer (one of the four
channels raises an EOT)
MGCINTSVR4
0
MGCINTISVR0
1
MGCINTISVR1
79
7011A–DSP–12/08
INT #
Interrupt Line
Source
Description
SVR
5
line5
internal
Watch point. There is a write done in
the data memory location specified by
mgcwatch. Often used to wake up
processor sleeping. (see Section
5.1.1.2)
6
line6
internal
non-fatal exception, MGCEXCEPTION
register
7
line7
internal
MGCINTSVR5
MGCINTSVR6
MGCSTEP counter overflow (see
5.1.1.1
Section 6. ”Profiling and Debug
support” on page 90)
MGCINTSVR7
EOT Interrupt
The EOT interrupt (INT#4) is generated when one of the four DMA channels generates an EOT.
NOTE: EOTs generated by the DMA and launched by the PMU are not registered.
5.1.1.2
Watch Interrupt
This interrupt (INT#5) is generated when a write operation is performed in the internal data
memory at the location pointed by the MGCWATCH register.
NOTE: if the WATCHEN bit of the MGCSTAT is set, mAgicV will go in debug mode because the
register is used as watch point register (see Section 6.2 ”Debug” on page 90).
5.1.1.3
Exception Interrupt
The interrupt exception signal (INT #6) is the logic OR of all non fatal exceptions.
NOTE: if the NONEFATAL flag of the MGCSTAT is set, it is the logic OR of all exception
sources.
5.1.1.4
5.1.2
5.1.2.1
Tick overflow interrupt
This interrupt (INT #7) is generated when the TICKOVF bit of the MGCSTAT is set (MGCSTEP=0xFFFFFFFF). The overflow bit is cleared by writing the MGCSTEP register.
Interrupt Registers
MGCINTCTRL
The MGCINTCTRL is a control register that allows to enable/disable an interrupt line x by writing
'1' in the appropriate ENx or DISx field. The SHxNEG and SHxPOS fields allow to choose the
edge level sensitivity of the 8 shared lines (used for external interrupts).
Table 5-5.
80
MGCINTCTRL Register
31
sh7neg
30
sh7pos
29
sh6neg
28
sh6pos
27
sh5neg
26
sh5pos
25
sh4neg
24
sh4pos
23
sh3neg
22
sh3pos
21
sh2neg
20
sh2pos
19
sh1neg
18
sh1pos
17
sh0neg
16
sh0pos
15
dis7
14
en7
13
dis6
12
en6
11
dis5
10
en5
9
dis4
8
en4
7
dis3
6
en3
5
dis2
4
en2
3
dis1
2
en1
1
dis0
0
en0
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
• en0-en7
ENx=‘1’, it enables the interrupt of the corresponding x line.
ENx=’0’, no effect.
• DIS0-DIS7
DISx=‘1’, it disables the interrupt of the corresponding x line.
DISx=’0’, no effect.
• sh0pos-sh7pos
SHxPOS=‘1’, it sets the sensitivity of the corresponding shared x line to be positive edge
sensitive.
SHxPOS=’0’, no effect.
• sh0neg-sh7neg
SHxNEG=‘1’ , it sets the sensitivity of the corresponding shared x line to be negative edge sensitive.SHxNEG=’0’, no effect.
5.1.2.2
MGCINTSETRESET
The MGCINTSETRESET is a control register that can be used to set and clear interrupt pending
bits into the MGCINTSTAT by writing '1' in the corresponding bit field.
NOTE: this register can be used to realize SWI.
Table 5-6.
MGCINTCTRL Register
15
clrint7
14
setint7
13
clrint6
12
setint6
11
clrint5
10
setint5
9
clrint4
8
setint4
7
clrint3
6
setint3
5
clrint2
4
setint2
3
clrint1
2
setint1
1
clrint0
0
setint0
• setint0-seting7
SETINTx=‘1’, it arises an interrupt in line x (PENDx=’1’ in the MGCINTSTAT).
SETINTx=’0’, no effect.
• clrint0-clrint7
CLRINTx=‘1’, it clears an interrupt in line x (PENDx=’0’ in the MGCINTSTAT).
CLRINTx=’0’, no effect.
5.1.2.3
MGCINTSTAT
The MGCINTSTAT contains the information about the enabled interrupt lines ENx, the pending
lines PENDx and the edge sensitivity of the 8 shared lines.
Table 5-7.
MGCINTSTAT Register
31
-
30
-
29
-
28
-
27
-
26
-
25
-
24
-
23
sh7edge
22
sh6edge
21
sh5edge
20
sh4edge
19
sh3edge
18
sh2edge
17
sh1edge
16
sh0edge
81
7011A–DSP–12/08
15
en7
14
en6
13
en5
12
en4
11
en3
10
en2
9
en1
8
en0
7
pend7
6
pend6
5
pend5
4
pend4
3
pend3
2
pend2
1
pend1
0
pend0
• pend0-pend7
PENDx=‘1’, pending interrupt on line x.
PENDx=’0’, no interrupt on line x.
• en0-en7
ENx=‘1’, enable interrupt line x.
ENx=’0’, no interrupt enabled on line x. MGCINSTAT does not register interrupts not enabled.
• sh0edge-sh7edge
SHxEDGE=‘1’, the corresponding shared x line is positive edge sensitive.
SHxEDGE=‘0’, the corresponding shared x line is negative edge sensitive.
5.1.2.4
MGCINTGSTAT
The MGCINTGSTAT register is used to save the status of pending shared interrupts when the
associated interrupt service routine is called. When an INT0 or INT1 interrupt is served, the status of the relative pending shared interrupts is registered in the saved shared field of the
MGCINTGSTAT, while the pending shared interrupt field is cleared. In this way the SVR0 or
SVR1 interrupt service routine can check the shared line that generated the interrupt, and the
HW will continue to register new shared interrupts.
Table 5-8.
MGCINTGSTAT Register
15
shsave7
14
shsave6
13
shsave5
12
shsave4
11
shpend7
10
shpend6
9
shpend5
8
shpend4
7
shsave3
6
shsave2
5
shsave1
4
shsave0
3
shpend3
2
shpend2
1
shpend1
0
shpend0
• shpend0-shpend3
SHPENDx=’1’, pending interrupt on shared line x (belonging to INT 0).
SHPENDx=’0’, no interrupt on shared line x.
• shsave0-shsave3
SHSAVEx, status of the interrupt line x when the service routine is served.
• shpend4-shpend7
SHPENDx=’1’, pending interrupt on shared line x (belonging to INT 1).
SHPENDx=’0’, no interrupt on shared line x.
• shsave4-shsave7
SHSAVEx, status of interrupt line x when the service routine is served.
82
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
5.1.2.5
MGCINTMASK
The MGCINTMASK is a 16-bit register that is used to mask enabled interrupts MSKx and shared
interrupts lines MSKSHx. Setting '1' means interrupt masked. An enabled interrupt is registered
into the pending register even if it is masked.
Table 5-9.
MGCINTMASK Register
15
msksh7
14
msksh6
13
msksh5
12
msksh4
11
msksh3
10
msksh2
9
msksh1
8
msksh0
7
msk7
6
msk6
5
msk5
4
msk4
3
msk3
2
msk2
1
msk1
0
msk0
• msk0-msk7
MSKx=‘1’, interrupt line x is masked.
MSKx=’0’, interrupt line x is not masked.
• msksh0-msksh7
MSKSHx=‘1’, interrupt line SHx is masked.
MSKSHx=’0’, interrupt line SHx is not masked.
5.1.2.6
MGCINTPRIO
In case of more than one pending interrupt, the line with a greater priority will be served, in case
of equal priority the line with a lower number will be served.
The MGCINTPRIO is a 24-bit register, PRIOx[2:0] allows to define an eight level priority for line
x. Lower values have higher priority (0: max priority-7=min priority).
Table 5-10.
MGCINTPRIO Register
31
-
30
-
29
-
28
-
27
-
26
-
25
-
23
22
PRIO7
21
20
19
PRIO6
18
17
15
PRIO5
14
13
PRIO4
12
11
10
PRIO3
9
8
PRIO2
7
6
5
4
PRIO1
3
2
1
PRIO0
0
PRIO2
24
16
PRIO5
• prio0-prio7
priority level of associated interrupt.
5.1.2.7
MGCINTRET
This read only register contains the return address of the interrupted user program. A RETI
FLOW code moves this register into the MGCPC and clears the INTERRUPT bit of the
MGCSTAT.
NOTE: a double RETI causes exception.
83
7011A–DSP–12/08
5.2
Sleep and Wakeup
mAgicV can go to sleep mode by writing the MGCCTRL register or by using the explicit FLOW
codes shown in Table 5-11 below. The processor will be woken up by one of the interrupts
shown in Table 5-4 on page 79 or by four EOT (End of Transfer) events coming from the DMA.
The events that can wake mAgicV up from a sleep state are selected using the MGCWAKECTRL control register.
NOTE Rev A specific: Sleep instruction must be scheduled at least three PM 128-bit locations
before the 1K boundary of each PM virtual page.
Table 5-11.
FLOW
Opcode
Description
sleepon
Put mAgicV in sleep mode waiting events enabled in MGCWAKESTAT
0x24
writedma
Start a pre-initialized dma write toward external resources, put the system
in sleep mode and set wakeup EOT event to the currently used channel
0x38
readdma
Readma start a pre-initialized dma toward mAgicV internal resources, put
the system in sleep mode and set wakeup eot event to the currently used
channel
vliw[3:2]
FLOW
predication
5.2.1.1
Mnemonic
0x1E
Table 5-12.
5.2.1
Sleep FLOW codes
Sleep Issue
vliw[118:113]
sleepon, writedma,
readdma
Wakeup Registers
MGCWAKECTRL
Wake Up control register. By writing this register wake up events registered in the MGCWAKESTAT are cleared.
Table 5-13.
MGCWAKECTRL Register
23
eotdis3
22
eoten3
21
eotdis2
20
eoten2
19
eotdis1
18
eoten1
17
eotdis0
16
eoten0
15
intdis7
14
inten7
13
intdis6
12
inten6
11
intdis5
10
inten5
9
intdis4
8
inten4
7
intdis3
6
inten3
5
intdis2
4
inten2
3
intdis1
2
inten1
1
intdis0
0
inten0
• inten0-inten7
INTENx=‘1’, it enables wakeup on interrupt line x.
INTENx=’0’, no effect.
• intdis0-intdis7
INTDISx=‘1’, it disables wake up on interrupt line x.
INTDISx=’0’, no effect.
84
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
• eoten0-eoten7
EOTENx=‘1’, it enables wakeup on the EOT of channel x.
EOTENx=’0’, no effect.
• eotdis0-eotdis7
EOTDISx=‘1’, it disables wakeup on the EOT of channel x.
EOTDISx=’0’, no effect.
5.2.1.2
MGCWAKESTAT
This register reports the wake up events currently enabled and the wakeup events occurred
from the last MGCWAKECTRL write access.
Table 5-14.
MGCWAKESTAT Register
23
eot3ev
22
eot2ev
21
eot1ev
20
eot0ev
19
int7ev
18
int6ev
17
int5ev
16
int4ev
15
int3ev
14
int2ev
13
int1ev
12
int0ev
11
eot3wk
10
eot2wk
9
eot1wk
8
eot0wk
7
int7wk
6
int6wk
5
int5wk
4
int4wk
3
int3wk
2
int2wk
1
int1wk
0
int0wk
• int0wk-int7wk
INTxWK=‘1’, it wakes up on interrupt line x.
INTxWK=‘0’, no wake up on interrupt line x.
• eot0wk-eot3wk
EOTxWK=‘1’, it wakes up on the EOT of channel x.
EOTxWK=‘0’, no wake up on the EOT of channel x.
• int0ev-int7ev
INTxEV=‘1’, an event raised on interrupt line x.
INTxEV=‘0’, no event on interrupt line x.
• eot0ev-eot3ev
EOTxEV=‘1’, an event arose on EOT of channel x.
EOTxEV=‘0’, no event on EOT of channel x.
85
7011A–DSP–12/08
5.3
Exceptions
mAgicV exceptions are divided into fatal and non fatal exceptions (see Table 5-15 below). Non
masked fatal exceptions cause the processor to stop immediately and to enter in debug mode.
Other exceptions can be handled in run mode by the exception interrupt routine number 6 (see
Table 5-4 on page 79). Exception register MGCEXCEPTION collects exceptions.
5.3.1
Fatal and non Fatal Exceptions
The non fatal exceptions can be considered fatal if the ALLFATAL bit of the MGCSTAT (see
Table 4-11 on page 69) is enabled. Conversely, all fatal exceptions can be considered as non
fatal if the NOFATAL bit of the MGSTAT is enabled.
Table 5-15.
Exception
86
Fatal Exceptions
Type
badin
not fatal
badout
not fatal
divbyzero
not fatal
addunkn
fatal
agu0ovf
not fatal
agu0pty
not fatal
agu0aovf
not fatal
agu1ovf
not fatal
agu1pty
not fatal
agu1aovf
not fatal
slverr
fatal
msterr
fatal
softex
not fatal
ctrlockex
fatal
rwagu0
fatal
rwagu1
fatal
write RF7
fatal
write RF5
fatal
write ARF
fatal
write FLOW
fatal
write DMA
fatal
agu0unkn
fatal
agu1unkn
fatal
ptyerr
fatal
rtierr
fatal
dblwrite
fatal
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
Exception
Type
pagefree
fatal
mulunkn
fatal
dmabusy
fatal
NOTE: The exception signal that halts the program execution is the logic OR of all non fatal
exceptions.
5.3.1.1
MGCEXCEPTION
Exception flags assume the logic value ‘1’ in case of non masked exception.
NOTE: an exception is considered masked if the corresponding bit into the MGCMASK register
is '1'.
This register has two different addresses:
• address 0x0, reads the register without clearing it.
• address 0x3 (0xC AHB offset), reads register and clears it.
Table 5-16.
MGCEXCEPTION Register
31
-
30
-
29
-
28
dmabusy
27
mulunkn
26
pagefree
25
dblwrite
24
rtierr
23
ptyerr
22
agu1ukn
21
agu0ukn
20
writedma
19
writeflow
18
writearf
17
writerf5
16
writerf7
15
rwagu1
14
rwagu0
13
ctrlockex
12
softex
11
msterr
10
slverr
9
agu1aovf
8
agu1pty
7
agu1ovf
6
agu0aovf
5
agu0pty
4
agu0ovf
3
addunkn
2
divzero
1
badout
0
badin
• badin: FPU bad input operand
A logical OR of all inop flags (see Section 3.4.9.2 ”Invalid Operation (Inop)” on page 26). Non
Fatal.
• badout: FPU bad output operand
A logical OR of all overflow flags. Non Fatal.
• divzero: FPU division by zero
Section 3.4.9.3 ”Division by Zero (Div by Zero)” on page 27. Non Fatal.
• addunkn: ADD unknown
ADD issue unknown code. Fatal.
• agu0ovf: AGU0 overflow
AGU0 address out of bounds (max address 16383). Non Fatal.
• agu0pty: AGU0 parity
AGU0 address parity error (a vector access with odd address). Non Fatal.
87
7011A–DSP–12/08
• agu0aovf: AGU0 arithmetic overflow
AGU0 arithmetic overflow. Non Fatal.
• agu1ovf: AGU1 overflow
AGU1 address out of bounds (max address 16383). Non Fatal.
• agu1pty: AGU1 parity
AGU1 address parity error (a vector access with odd address). Non Fatal.
• agu1aovf: AGU1 arithmetic overflow
AGU1 arithmetic overflow. Non Fatal.
• slverr: AHB Slave error
an error is detected by AHB Slave. Fatal.
• msterr: AHB Master error
an error is detected by AHB Master. Fatal.
• softex: software exception
generated by a SEX FLOW code. Non Fatal.
• ctrlockex: mgcctrl locked exception
a write operation has been issued by mAgicV core, but the mgcctrl register was locked. Fatal.
• rwagu0: AGU0 read write memory conflict
AGU0 read conflicts with a previous write (SW scheduling problem). Fatal.
• rwagu1: AGU1 read write memory conflict
AGU1 read conflicts with a previous write (SW scheduling problem). Fatal.
• write RF7: write conflict on RF port7
More than one write was scheduled on the RF port7 (SW scheduling problem). Fatal.
• write RF5: write conflict on RF port5
More than one write was scheduled on the RF port5 (SW scheduling problem). Fatal.
• write ARF: write conflict on ARF IO port
More than one write was scheduled on the ARF IO port (SW scheduling problem). Fatal
• write FLOW: write conflict on FLOW IO port
More than one write was scheduled on the FLOW (SW scheduling problem). Fatal
• write DMA: write conflict on FLOW DMA port
More than one write was scheduled on the DMA (SW scheduling problem). Fatal.
• agu0unkn: AGU0 unknown
AGU0 unknown code. Fatal.
• agu1unkn: AGU1 unknown
AGU1 unknown code. Fatal.
88
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
• ptyerr: parity error
Program memory parity error. Fatal.
• rtierr: RTI error
Unexpected return from interrupt. Fatal.
• dblwrite: double write
double write at the same data memory address. Fatal.
• pagefree
PMU can’t find a physical page to accommodate a new page. Fatal.
• mulunkn: MUL unknown
MUL unknown code. Fatal.
• dmabusy
started a new DMA on a busy channel. Fatal.
89
7011A–DSP–12/08
6. Profiling and Debug support
6.1
Profiling registers
The user can evaluate the performance of the system by means of two mAgicV 32-bit counter
registers.
6.1.1
MGCSTEP
The MGCSTEP register is used to collect information about the cycles spent in run mode. It
includes the cycles of pipeline stall due to program cache miss or sleep mode. This counter can
be accessed by mAgicV and by an external AHB master controller (see Table 4-13 on page 73).
Accessing TICKON and TICKOFF MGCCTRL control bits is possible to respectively start and
stop the MGCSTEP counter register. An interrupt handler can be installed on INT #7 line, signalling the overflow of this counter. The overflow is registered in the MGCSTAT register and it’s
cleared by write operations on the MGCSTEP register.
NOTE: The cycles reported by the MGCSTEP depends on the program flow (deterministic) and
by the AHB bus occupation (non deterministic).
6.1.2
PMUMISSCNT
The PMUMISSCNT register is used to collect information about the number of program misses
executed. This register can be accessed only by an external AHB master controller.
These miss events can be monitored by reading the PMUSTAT (see Table 3-67 on page 59).
6.2
Debug
All the debug features can be accessed by an external AHB master that can read and write all
mAgicV internal resources (memories and registers). There is a limitation on writing RF
registers.
6.2.1
Breakpoint Support
mAgicV supports breakpoints by toggling a bit of the program VLIW corresponding to the breakpoint PMA. By setting PMCHKON and BREAKON on the MGCCTRL control register, a parity
error is detected and interpreted as a breakpoint (PTY2BREAK flag of the MGCSTAT). The
external debug engine should check if the triggered breakpoint is a break point or a real parity
exception.
In case of Compressed Program Words the parity bit is located into the Super Header. See Section 3.2.1.1 ”Compressed Program Word” on page 5.
6.2.2
Watch Point Support
mAgicV supports watchpoints through a 16-bit watch point register MGCWATCH that must contain the 16-bit internal data address of the watched variable. The watch-point logic detects write
operations upon the specified watch address. To enable watchpoints the WATCHON bit of the
MGCCTRL must be set.
6.2.3
Cross Triggering Support
The main function of the Cross Triggering is to pass debug events from one processor to
another. The CT can communicate debug state information from one core (mAgicV) to another,
so that the program execution on both processors can be stopped at the same time, if required.
90
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
The CT mode is enabled in mAgicV by setting the TRIGGON bit of the MGCCTRL. In this mode
a dedicated mAgicV input line (dbg_req_from_arm) is used to put mAgicV immediately (1 cycle
latency) in debug mode. Vice versa mAgicV has a dedicated output line to communicate to
another core its debug state (dbg_req_to_arm).
6.2.4
Step Mode Support
In this mode a program is executed step by step, in this way it is possible to examine internal
registers at each cycle. An external AHB master controller can activate this mode by setting the
STEPON bit of the MGCCTRL. By setting the CONTINUE bit of the MGCCTRL, the controller
can advance the program execution by one cycle.
NOTE: in this mode the DMA is not interrupted (it continues even if the core is frozen), so that
temporizations are altered with respect to the normal run mode. For example, in presence of the
DMA, the MGCSTEP counts less cycles than in normal run mode.
91
7011A–DSP–12/08
7. DMA
The DMA engine is a single channel with 4 independent programmable set of registers.
The DMA is able to perform the following 32-bit word memory accesses:
• fixed external and/or internal address.
• incremental external and/or internal address.
• incremental address with a fixed external and/or internal modifier ("jump" or "stride").
• incremental address, wrapping around a specified length on external and/or internal address.
• all of the above mixed.
• all of the above, using last accessed external and/or internal addresses or reloading them.
All temporary conditions on the AHB bus, no granted bus or page fault or retry/split condition, do
not change the DMA channel that is currently operating (i.e. no new arbitration).
The DMA channels are serially processed and have fixed priority, the highest being channel
number 3, the lowest number 0.
Highest priority channel 3 is used by the PMU (if enabled). As a good programming rule, the
user application should avoid the usage of this channel when the PMU is enabled, otherwise
unpredictable results could occur. Several PMU DMA parameters (like chunck length, modifiers,
external address) are set at bootstrap and they must be kept fixed during the program execution.
Many parameters could be fixed throughout the entire application; moreover, thanks to the possibility to redo the transfer or continue the transfer with the same parameters and the current
addresses, it could be also convenient to assign a DMA channel to a specific repetitive task,
saving most of programming costs (i.e to access peripheral registers).
NOTE: Only 32-bit word accesses are supported.
7.1
DMA interface
The DMA interface is listed below:
• four sets each composed of 8 channel registers to store source, destination and configuration
parameters (addresses, modifiers, length) of the 4 DMA channels.
• a global control and status register to control the DMA engine.
• three global read only shadow registers that show the parameters (length, internal address,
external address) of the current memory access on the running channel.
The DMA interface can be programmed either from an AHB master controller or from a mAgicV
core access.
7.1.1
DMA Channel Registers
All read/write operations on the DMA channel registers (see Table 7-1 on page 93) are performed on the currently selected channel, which is specified by the ACTIVECHAN field of the
MGCDMASTAT register. To change the currently selected channel a write operation must be
performed to the CHGCHAN field of the MGCDMACTRL register.
The DMA outputs a BUSYx signal to indicate that a DMA operation has been requested on
channel x (by writing the READ or WRITE fields of the MGCDMACTRL register). The DMA generates a RUNx signal when the request on channel x is effectively processed. The DMA raises
92
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
an EOTx signal upon the completion of the whole burst on channel x. Both BUSYx and RUNx
signals are cleared when the transfer on channel x ends (EOTx).
All these signals are registered in the MGCDMASTAT register and can be quickly tested by
using the Write And Test fields of the MGCDMACTRL register.
By reading the BUSYx bit it is possible to know if the channel can be safely programmed.
Each channel can be programmed with new external and/or internal parameters (RLDEXT or
RLDINT fields of MGCDMACTRL register) or with its latest accessed addresses (PRSVEXT or
PRSVINT fields of MGCDMACTRL register).
NOTE: Busy channels can’t be programmed, otherwise a DMABUSY exception is raised and
registered in the MGCEXCEPTION register.
Each DMA channel transfer can be aborted by writing the ABORT or ABORTALL fields of the
MGCDMACTRL register.
In case of abort a following read and write operation in “preserve mode” (that use latest DMA
channel parameters) can lead to unpredictable results.
Table 7-1.
7.1.1.1
DMA Channel Registers
mAgicV
Address
AHB Offset
Name
Type
Reset Value
0x64
0x190
MGCDMAEXTADD
RW
NA
0x65
0x194
MGCDMAEXTCIRCLEN
RW
NA
0x66
0x198
MGCDMAEXTMOD
RW
NA
0x67
0x19C
MGCDMAINTADD
RW
NA
0x68
0x1A0
MGCDMAINTCIRCLEN
RW
NA
0x69
0x1A4
MGCDMAINTMOD
RW
NA
0x6A
0x1A8
MGCDMALEN
RW
NA
0x6B
0x1AC
MGCDMAINTSEG
RW
NA
MGCDMAEXTADD
It is a 32-bit register that specifies an AHB address space. In case of DMA write operations it
contains an AHB destination address, otherwise an AHB source address. The external memory
is byte addressed. However, the external address must be 32-bit word aligned.
NOTE: A misaligned address raises an AHB master error registered in the MASTERR bit of the
MGCEXCEPTION register.
7.1.1.2
MGCDMAEXTCIRCLEN
It is a 24-bit register that specifies the external address bit mask to be applied to the external
modified address. In order to obtain address wrapping around 1Kbyte, for instance, the mask
should be set up to the value: 0x3FF. The biggest mask is 0xFFFFFF. Not using mask features
means that the 0xFFFFFF mask is used (see Section 7.1.3 ”DMA Address Generation” on page
100).
NOTE: Only type 2X-1 masks bring to circular contiguous buffers of size 2X.
93
7011A–DSP–12/08
7.1.1.3
MGCDMAEXTMOD
It is a 16-bit register that specifies the increment to add every AHB cycle to the preceding external address. If it is not 4 the AHB master has to split the burst in a sequence of 1-data transfers
at the calculated addresses, making the DMA transfer less efficient.
NOTE: The modifier must be a multiple of 4 (32-bit accesses), or zero to access the same location. Other values raise a MASTERR exception.
7.1.1.4
MGCDMAINTADD
It is a 16-bit register that contains the internal mAgicV address. In case of DMA write operations
it specifies a source address, otherwise it’s a destination address.
The complete internal address is constituted by decoding the MGCDMAINTSEG register that
specify the internal memory segments (see Table 7-2 on page 95).
NOTE: The mAgicV internal memory is word addressed. The word size depends on the internal
memory segment. Only in case of DM_D memory space the internal mAgicV address must be
multiplied by 2.
7.1.1.5
MGCDMAINTCIRCLEN
It is a 16-bit register that specifies the bit mask to be applied to the internal modified address. In
order to obtain an address wrapping around 512Words, for instance, the mask shall be set up to
the value: 0x1FF. The biggest mask is 0xFFFF. Not using mask features means that the 0xFFFF
mask is used.
NOTE: Only type 2X-1 masks bring to circular contiguous buffers of size 2X.
7.1.1.6
MGCDMAINTMOD
It is a 16 bit register that specifies the increment to add every AHB cycle to preceding internal
address. The default is 1.
NOTE: In case of DM_D memory space the increment must be multiple of 2. A zero value
accesses the same location.
7.1.1.7
MGCDMALEN
It is a 16-bit register that holds the number of 32-bit word transfers. It does not express the number of bytes to transmit. For example a MGCDMALEN=128 refers to a transfer of 32 program
memory 128-bit wide VLIWs, or 128 integer data word, or 128 single precision floating point data
words, or finally 64 double precision (see Section 4.1 ”Data Formats” on page 62) floating point
data words.
7.1.1.8
MGCDMASEG
It is a 4-bit register that selects one of the 4 internal memory regions (PM, DM_I, DM_F and
DM_D) according to the mapping shown in Table 7-2 on page 95 (see Section 3.7.1 ”Program
Memory Accesses” on page 39 and Section 3.7.2 ”Data Memory Accesses” on page 40).
94
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
Table 7-2.
Memory Segment Regions
AHB Start
Address
AHB End
Address
Access
Word
Size
MGCDMASEG
PM
0x00600000
0x0061FFFF
4 x word32
128-bit
0x1
DM_I
0x00620000
0x0062FFFF
word32
32-bit
0x2
DM_F
0x00640000
0x0064FFFF
word32
32-bit
0x4
DM_D
0x00660000
0x0067FFFF
2 x word32
40-bit
0x8
Resource
7.1.2
7.1.2.1
DMA Control and Status Register
The DMA control and status registers have the same address. Write operations access the
MGCDMACTRL control register, while read operations access the MGCDMASTAT register.
MGCDMACTRL
The MGCDMACTRL is a 32-bit control register and it is used to start and configure DMA channel transfers. This register can also be used to generate WAT condition by writing in the WAT
fields (see Section 3.9.3 ”Conditioned and Unconditioned Jumps” on page 48).
All DMA operations (write, read, abort) are performed on the currently selected channel
(ACTIVECHAN field of MGCDMASTAT register). By writing CHGCHAN and ACTIVECHAN
fields in the MGCDMACTRL it is possible to change the currently selected channel.
Table 7-3.
MGCDMACTRL Register
31
-
30
-
29
wateot3
28
wateot2
27
wateot1
26
wateot0
25
-
24
waterror
23
unlock
22
lock
21
watrun
20
watbusy
19
watrun3
18
watrun2
17
watrun1
16
watrun0
15
watbusy3
14
watbusy2
13
watbusy1
12
watbusy0
11
abortall
10
9
8
chgchan
7
prsvext
6
rldext
5
prsvint
4
rldint
3
clreot
2
abort
1
read
0
write
activechan
• write
WRITE=’1’, starts a DMA write operation on the selected channel.
WRITE=’0’, no effect.
• read
READ=’1’, starts a DMA read operation on the selected channel.
READ=’0’, no effect
• abort
ABORT=’1’, aborts the DMA operation on the selected channel.
ABORT=’0’, no effect.
• clreot: clear EOTs
95
7011A–DSP–12/08
CLREOT=’1’, clears EOTs registered into MGCDMASTAT.
CLREOT=’0’, no effect.
• rldint: reload internal
RLDINT=’1’, the internal DMA parameters of the selected channel are re-loaded from channel
registers.
RLDINT=’0’, no effect.
• prsvint: preserve internal
PRSVINT=’1’, the internal DMA parameters of the selected channel are not reloaded from channel registers. The DMA continues from the latest accessed (not included) internal memory
location.
PRSVINT=’0’, no effect.
• rldext: reload external
RLDEXT=’1’, the external DMA parameters of the selected channel are re-loaded from the channel registers.
RLDEXT=’0’, no effect.
• prsvext: preserve external
PRSVEXT=’1’, the external DMA parameters of the selected channel are not reloaded from the
channel registers. The DMA continues from the latest accessed (not included) external memory
location.
PRSVEXT=’0’, no effect.
• chgchan: change channel
CHGCHAN=’1’, it changes the active channel in the MGCDMASTAT to the value specified in the
field ACTIVECHAN of the MGCDMACTRL register. For instance, to set channel 2 as an active
channel the MGCDMACTRL register must be written with 0x500 (CHGCHAN=’1’ and
ACTIVECHAN=’2’).
CHGCHAN=’0, no effect.
• activechan: active channel
two bit parameters that selects the current channel.
• abortall
ABORTALL=’1’, aborts all the current and pending DMAs (can corrupt program, if PMU is
enabled).
ABORTALL=’0’, no effect.
• watbusy0: Write And Test on BUSY0
WATBUSY0=’1’, tests if channel 0 is busy.
WATBUSY0=’0’, no effect.
• watbusy1: Write And Test on BUSY1
WATBUSY1=’1’, tests if channel 1 is busy.
96
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
WATBUSY1=’0’, no effect.
• watbusy2: Write And Test on BUSY2
WATBUSY2=’1’, tests if channel 2 is busy.
WATBUSY2=’0’, no effect.
• watbusy3: Write And Test on BUSY3
WATBUSY3=’1’, tests if channel 3 is busy.
WATBUSY3=’0’, no effect.
• watrun0: Write And Test on RUN0
WATRUN0=’1’, tests if channel 0 is running.
WATRUN0=’0’, no effect.
• watrun1: Write And Test on RUN1
WATRUN1=’1’, tests if channel 1 is running.
WATRUN1=’0’, no effect.
• watrun2: Write And Test on RUN2
WATRUN2=’1’, tests if channel 2 is running.
WATRUN2=’0’, no effect.
• watrun3: Write And Test on RUN3
WATRUN3=’1’, tests if channel 3 is running.
WATRUN3=’0’, no effect.
• watbusy: Write And Test on BUSY
WATBUSY=’1’, tests if any channel is busy.
WATBUSY=’0’, no effect.
• watrun: Write And Test on RUN
WATRUN=’1’, tests if any channel is running.
WATRUN=’0’, no effect.
• lock
LOCK=’1’, forces the exclusive use of the AHB bus (use with care), until unlock.
LOCK=’0’, no effect.
• unlock
UNLOCK=’1’, unlocks the AHB bus (default).
UNLOCK=’0’, no effect.
• waterror: Write And Test on error
WATERROR=’1’, tests for master error.
WATERROR=’0’, no effect.
97
7011A–DSP–12/08
• wateot0: Write And Test on End Of Transfer 0
WATEOT0=’1’, tests for channel 0 EOT.
WATEOT0=’0’, no effect.
• wateot1: Write And Test on End Of Transfer 1
WATEOT1=’1’, tests for channel 1 EOT.
WATEOT1=’0’, no effect.
• wateot2: Write And Test on End Of Transfer 2
WATEOT2=’1’, tests for channel 2 EOT.
WATEOT2=’0’, no effect.
• wateot3: Write And Test on End Of Transfer 3
WATEOT3=’1’, tests for channel 3 EOT.
WATEOT3=’0’, no effect.
7.1.2.2
MGCDMASTAT
It is a 32-bit register that reports the status of the 4 DMA channels.
Table 7-4.
MGCDMASTAT Register
31
-
30
-
29
-
28
-
27
-
26
-
25
-
24
-
23
-
22
-
21
-
20
lock
19
eot3
18
eot2
17
eot1
16
eot0
15
-
14
dmaerr
13
prsvext
12
prsvint
9
run
8
busy
7
run3
6
run2
5
run1
4
run0
1
busy1
0
busy0
11
10
activechan
3
busy3
2
busy2
• busy0
BUSY0=’1’, channel 0 busy.
BUSY0=’0’, channel 0 free.
• busy1
BUSY1=’1’, channel 1 busy.
BUSY1=’1’, channel 1 free.
• busy2
BUSY2=’1’, channel 2 busy.
BUSY2=’0’, channel 2 free.
• busy3
BUSY3=’1’, channel 3 busy.
98
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
BUSY3=’0’, channel 3 free.
• run0
RUN0=’1’, channel 0 running.
RUN0=’0’, channel 0 not running.
• run1
RUN1=’1’, channel 1 running.
RUN1=’0’, channel 1 not running.
• run2
RUN2=’1’, channel 2 running.
RUN2=’0’, channel 2 not running.
• run3
RUN3=’1’, channel 3 running.
RUN3=’0’, channel 3 not running.
• busy
BUSY=’1’, some channels are busy (useful for WAT generation).
BUSY=’0’, all channels are free.
• run
RUN=’1’, some channels are running (useful for WAT generation).
RUN=’0’, all channels are not running
• activechan: active channel
currently selected channel.
• prsvint:preserve internal
PRSVINT=’1’, channel internal parameters are preserved. The DMA continues from the last
accessed (not included) internal memory location.
PRSVINT=’0’, channel internal parameters are re-loaded.
• prsvext:preserve external
PRSVEXT=’1’, channel external parameters are preserved. The DMA continues from the last
accessed (not included) external memory location.
PRSVEXT=’0’, channel external parameters are re-loaded.
• dmaerr
DMAERR=’1’, an error occured during a DMA.
• eot0
EOT0=’1’, End Of Transfer signal on channel 0.
• eot1
99
7011A–DSP–12/08
EOT1=’1’, End Of Transfer signal on channel 1.
• eot2
EOT2=’1’, End Of Transfer signal on channel 2.
• eot3
EOT3=’1’, End Of Transfer signal on channel 3. (PMU EOT are not signalled).
• lock
LOCK=’1’, mAgicV DMA transfers lock the AHB bus.
LOCK=’0’, AHB bus not locked by mAgicV.
7.1.2.3
DMA Shadow Registers
The registers listed in Table 7-5 below show the addresses that the DMA engine is generating
for the current access. They are updated during the transfer and they are used for re-starting a
new transfer from the last point (“preserve mode”).
Table 7-5.
7.1.3
DMA Shadow Registers
mAgicV
Address
AHB Offset
Name
Type
Reset Value
0x6C
0x1B0
MGCDMACURRLEN
RO
NA
0x6D
0x1B4
MGCDMACURREXTADD
RO
NA
0x6E
0x1B8
MGCDMACURRINTADD
RO
NA
DMA Address Generation
When a DMA operation starts by writing the WRITE/READ field of MGCDMACTRL register or by
FLOW code WRITEDMA/READDMA (see Section 5.2 ”Sleep and Wakeup” on page 84), the
channel parameters are copied in the DMA shadow registers.
reload mode:
mgcdmaextadd=> mgcdmaextcuradd
mgcdmaintseg[3:0] & mgcdmaintadd[15:0] => mgcdmacurintadd[19:0]
mgcdmalen => mgccurlen
preserve mode:
mgcdmalen => mgccurlen
At each AHB cycle the DMA engine performs the following operation:
External address = mgcdmacurextadd = (mgcdmacurext BITWISEAND
BITWISENOT(mgcdmaextcirc)) + ((mgcdmacurext + mgcdmaextmod) BITWISEAND
mgcdmaextcirc)
Internal address = mgcdmacurrintadd = (mgcdmacurint BITWISEAND
BITWISENOT(mgcdmaintcirc)) + ((mgcdmacurint + mgcdmaintmod) BITWISEAND
mgcdmaintcirc)
mgccurlen = mgccurlen - 1
EOT=1 when mgccurlen==0
100
mAgicV DSP
7011A–DSP–12/08
mAgicV DSP
8. Revision History
Doc. Rev.
Date
7011A
12/08
Comments
• Initial document release
101
7011A–DSP–12/08
9. Table of Contents
Feature Summary...................................................................................... 1
1
About this manual .................................................................................... 2
2
References ................................................................................................ 2
3
mAgicV VLIW DSP Architecture ............................................................. 2
3.1VLIW overview ..........................................................................................................3
3.2Program Memory .......................................................................................................5
3.3Register File ..............................................................................................................8
3.4Operators Block .........................................................................................................9
3.5On-Chip Data Memory ............................................................................................31
3.6Address Generation Units .......................................................................................32
3.7AHB Slave Port .......................................................................................................39
3.8AHB Master Port .....................................................................................................41
3.9FLOW Control Block ................................................................................................43
3.10Program Management Unit ...................................................................................56
4
Programming Model .............................................................................. 62
4.1Data Formats ...........................................................................................................62
4.2Data Organization ...................................................................................................63
4.3DSP States ..............................................................................................................64
4.4Register Map ...........................................................................................................72
4.5Multicore Synchronization Support ..........................................................................75
5
Event Handling ....................................................................................... 78
5.1Interrupt Handling ....................................................................................................78
5.2Sleep and Wakeup ..................................................................................................84
5.3Exceptions ...............................................................................................................86
6
Profiling and Debug support ................................................................ 90
6.1Profiling registers .....................................................................................................90
6.2Debug ......................................................................................................................90
7
DMA ......................................................................................................... 92
7.1DMA interface ..........................................................................................................92
8
102
Revision History ................................................................................... 101
mAgicV DSP
7011A–DSP–12/08
Headquarters
International
Atmel Corporation
2325 Orchard Parkway
San Jose, CA 95131
USA
Tel: 1(408) 441-0311
Fax: 1(408) 487-2600
Atmel Asia
Room 1219
Chinachem Golden Plaza
77 Mody Road Tsimshatsui
East Kowloon
Hong Kong
Tel: (852) 2721-9778
Fax: (852) 2722-1369
Atmel Europe
Le Krebs
8, Rue Jean-Pierre Timbaud
BP 309
78054 Saint-Quentin-enYvelines Cedex
France
Tel: (33) 1-30-60-70-00
Fax: (33) 1-30-60-71-11
Atmel Japan
9F, Tonetsu Shinkawa Bldg.
1-24-8 Shinkawa
Chuo-ku, Tokyo 104-0033
Japan
Tel: (81) 3-3523-3551
Fax: (81) 3-3523-7581
Technical Support
[email protected]
Sales Contact
www.atmel.com/contacts
Product Contact
Web Site
www.atmel.com
Literature Requests
www.atmel.com/literature
Disclaimer: The information in this document is provided in connection with Atmel products. No license, express or implied, by estoppel or otherwise, to any
intellectual property right is granted by this document or in connection with the sale of Atmel products. EXCEPT AS SET FORTH IN ATMEL’S TERMS AND CONDITIONS OF SALE LOCATED ON ATMEL’S WEB SITE, ATMEL ASSUMES NO LIABILITY WHATSOEVER AND DISCLAIMS ANY EXPRESS, IMPLIED OR STATUTORY
WARRANTY RELATING TO ITS PRODUCTS INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
PURPOSE, OR NON-INFRINGEMENT. IN NO EVENT SHALL ATMEL BE LIABLE FOR ANY DIRECT, INDIRECT, CONSEQUENTIAL, PUNITIVE, SPECIAL OR INCIDENTAL DAMAGES (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF PROFITS, BUSINESS INTERRUPTION, OR LOSS OF INFORMATION) ARISING OUT OF
THE USE OR INABILITY TO USE THIS DOCUMENT, EVEN IF ATMEL HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Atmel makes no
representations or warranties with respect to the accuracy or completeness of the contents of this document and reserves the right to make changes to specifications
and product descriptions at any time without notice. Atmel does not make any commitment to update the information contained herein. Unless specifically provided
otherwise, Atmel products are not suitable for, and shall not be used in, automotive applications. Atmel’s products are not intended, authorized, or warranted for use
as components in applications intended to support or sustain life.
© 2008 Atmel Corporation. All rights reserved. Atmel®, Atmel logo and combinations thereof, DIOPSIS® and others are registered trademarks,
Magic DSP® and others are trademarks of Atmel Corporation or its subsidiaries. ARM ®, Thumb ® and others are the registered trademarks or
trademarks of ARM Ltd. Other terms and product names may be trademarks of others.
7011A–DSP–12/08