ATMEL AT572D740

Features
• Dual Core System Integrating an ARM7TDMI ARM Thumb Processor Core and a
mAgic DSP for Audio, Communication and Beam-forming Applications
• High Performance DSP Operating at 100 MHz
– 1 GFLOPS - 1.5 Gops
– 10 Arithmetic Operations per Cycle (4 Multiply, 2 Add/subtract, 1 Add, 1 Subtract
Floating and Fixed Point) Allowing Single Cycle FFT Butterfly
– Native Support for Complex Arithmetic and Vectorial SIMD Operations: One
Complex Multiply with Dual Add/sub per Clock Cycle or Two Real Multiply and Two
Add/sub or Simple Scalar Operations
– 32-bit Integer and IEEE 40-bit Extended Precision Floating Point Numeric Format
– Large Multi-port Data Register File: 512 Registers Organized in Two 4-input 4-output 256-register Banks
– Orthogonal VLIW Architecture, Code Compression for Code Size Reduction
– Flexible Addressing Capability: 2 Independent Address Generation Units
Operating on a 16 Registers Address Register File Supporting Programmable
Stride, Circular Pointers and Bit Reversal
– 1.7 Mbits of On-chip SRAM:
17 K x 40-bit Data Memory Locations
8 K x 128-bit Program Memory Location, Equivalent to 24K Instructions
– DMA Access to the External Program and Data Memory
– Two Main Operating Modes: Run and System Mode
– Efficient Optimizing Assembler: Allows Easy Exploitation of the Available
Hardware Resources Parallelism
• Utilizes the ARM7TDMI Processor Core with 32 K Byte of Integrated SRAM,
Operating at 50 MHz
– Fully-programmable External Bus Interface (EBI)
Maximum External Address Space of 4 M Bytes
Up to 4 Chip Selects
Software-programmable 8/16-bit External Data Bus
– 8-channel Peripheral Data Controller (PDC)
– 8-level Priority, Individually Maskable Vectored Interrupt Controller
4 External, 20 Internal Interrupt Sources, Including a High-priority, Low-latency
Interrupt Request
– 28 Programmable I/O Lines
– 8-channel 11-bit Programmable Clock Prescaler Feeding the Timer, Watchdog,
USARTs, SPIs
– 3-channel 16-bit Timer/Counter
5 Internal Clock Sources and 3 Configurable Sources (External Source or
Cascaded Timer Configuration)
2 Multi-purpose Output Pins plus 1 Output Dedicated to the ADDA Interface plus
3 Outputs Dedicated to the mAgic DSP
– 2 USARTs
2 Dedicated Peripheral Data Controller (PDC) Channels per USART
1 USART Supporting Full Modem Interface
– 2 Master/Slave SPI Interfaces
2 Dedicated Peripheral Data Controller (PDC) Channels per SPI
8- to 16-bit Programmable Data Length
4 External Slave Chip Selects for each SPI
– Programmable Watchdog Timer
– ADDA (A/D and D/A Converters) Interface Supporting up to 4 Analog to Digital and
4 Digital to Analog, Stereo 24-bit Converters
– IEEE 1149.1 JTAG Boundary Scan on all Active Pins
• Efficient ARM - DSP Interface Based on 1K x 40-bit Dual Ported Shared Memory,
Memory Mapped Register Access, and Interrupt Lines
• 1.8 V Core Operating Voltage, 3.3 V I/O Operating Voltage
• On-chip PLL for 100 MHz Operation from 25 MHz Reference Clock
• 352-ball PBGA Package
DIOPSIS 740
Dual Core DSP
AT572D740
Summary
7001AS–DSP–03/04
Note: This is a summary document. A complete document
is not available at this time. For more information, please
contact your local Atmel sales office.
Description
DIOPSIS 740 is a Dual CPU Processor integrating a mAgic DSP and an ARM7TDMI™
RISC MCU, plus a total of 245 Kbytes SRAM. The system combines the flexibility of the
ARM7TDMI RISC controller with the very high performance of the DSP.
mAgic is a high performance VLIW DSP delivering 1 Giga floating-point operations per
second (GFLOPS) at a clock rate of 100 MHz. It has 512 data registers, 16 address registers, 10 independent operating units and 2 independent address generation units. For
instance, activating all the computing units, it can produce one complete FFT butterfly
per cycle. mAgic operates on 32-bit fixed-point and IEEE 754 40-bit extended precision
floating-point numeric format. It also has 17K x 40-bit on-chip data memory locations
and 8K x 128-bit program memory locations. Efficient usage of the internal program
memory is achieved through a code compression mechanism.
An optimizing assembler frees the user from the burden of dealing with the parallelism
of the processor resources and drastically simplifies the code development.
The ARM7TDMI™ embedded micro controller core is a member of the Advanced RISC
Machines (ARM®) family of general purpose 32-bit microprocessors, which offer high
performance and very low power consumption. The ARM architecture is based on
Reduced Instruction Set Computer (RISC) principles, and the instruction set and the
related decode mechanism are much simpler than those of microprogrammed Complex
Instruction Set Computers.
This simplicity results in a high instruction throughput and impressive real-time interrupt
response. The ARM7TDMI™ supports the 16-bit Thumb® subset of the most commonly
used 32-bit instructions. These are expanded at run time with no degradation of system
performance. This gives 16-bit code density (saving memory area and cost) coupled
with 32-bit processor performance.
A rich set of peripherals and 32 Kbytes of internal memory provide a highly flexible and
integrated system solution.
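The headline figures quoted above (1.7 Mbits of DSP SRAM, 245 Kbytes of total SRAM, 1 GFLOPS) can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check; it assumes binary prefixes (1K = 1024) and the constant names are illustrative, not taken from the datasheet.

```python
# Consistency check of the memory and performance figures (assumes 1K = 1024).
K = 1024

DATA_MEMORY_BITS = 17 * K * 40      # 17K x 40-bit data memory locations
PROGRAM_MEMORY_BITS = 8 * K * 128   # 8K x 128-bit program memory locations
DSP_SRAM_BITS = DATA_MEMORY_BITS + PROGRAM_MEMORY_BITS

DSP_SRAM_MBITS = DSP_SRAM_BITS / (K * K)    # about 1.66, quoted as 1.7 Mbits
DSP_SRAM_KBYTES = DSP_SRAM_BITS / 8 / K     # DSP on-chip SRAM in Kbytes
ARM_SRAM_KBYTES = 32                        # ARM7TDMI integrated SRAM
TOTAL_SRAM_KBYTES = DSP_SRAM_KBYTES + ARM_SRAM_KBYTES

OPS_PER_CYCLE = 10                  # arithmetic operations per cycle
DSP_CLOCK_HZ = 100_000_000          # 100 MHz
PEAK_FLOPS = OPS_PER_CYCLE * DSP_CLOCK_HZ

print(f"DSP SRAM: {DSP_SRAM_MBITS:.2f} Mbits, total SRAM: {TOTAL_SRAM_KBYTES:.0f} Kbytes")
print(f"Peak: {PEAK_FLOPS / 1e9:.1f} GFLOPS")
```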
Pin Configuration
Table 1. D740 Ball Assignment (243 I/O)
Name             Ball  Name             Ball  Name             Ball  Name             Ball
ADDA_BRCK        C21   ARM_D[6]         W25   PIO[8]           AD23  SPI0_NSS[1]      A17
ADDA0_IN         B21   ARM_D[7]         Y24   PIO[9]           AE24  SPI0_NSS[2]      D17
ADDA1_IN         A22   ARM_D[8]         Y26   PIO[10]          AD22  SPI0_NSS[3]      B16
ADDA2_IN         C22   ARM_D[9]         Y25   PIO[11]          AC22  SPI0_SCK         D18
ADDA3_IN         D22   ARM_D[10]        AA26  PIO[12]          AE23  SPI1_MISO        B19
ADDA0_OUT        B22   ARM_D[11]        AA24  PIO[13]          AD21  SPI1_MOSI        A20
ADDA1_OUT        A23   ARM_D[12]        Y23   PIO[14]          AF22  SPI1_NSS         C18
ADDA2_OUT        C23   ARM_D[13]        AA25  PIO[15]          AE22  SPI1_NSS[1]      C19
ADDA3_OUT        B23   ARM_D[14]        AB26  PIO[16]          AD20  SPI1_NSS[2]      A18
ADDA_TOPLL       A24   ARM_D[15]        AB24  PIO[17]          AF21  SPI1_NSS[3]      B17
ADDA_WCK         B24   ARM_NCS0         H25   PIO[18]          AC20  SPI1_SCK         A19
ARM_A[0]         A25   ARM_NCS1         J26   PIO[19]          AE21  TEST_CLK (dnc)   M25
ARM_A[1]         D24   ARM_NCS2         K24   PIO[20]          AD19  USART0_RXD       AE17
ARM_A[2]         C25   ARM_NCS3         J25   PIO[21]          AF20  USART0_SCK       AF17
ARM_A[3]         E24   ARM_NRD          K23   PIO[22]          AC19  USART0_TXD       AE18
ARM_A[4]         D26   ARM_NWEB0        K26   PIO[23]          AE20  USART1_CTS       AD12
ARM_A[5]         D25   ARM_NWEB1        L24   PIO[24]          AD18  USART1_DCD       AE14
ARM_A[6]         F24   BIST_RES (dnc)   H1    PIO[25]          AE19  USART1_DSR       AC14
ARM_A[7]         E26   BIST_RUN (dnc)   H3    PIO[26]          AF18  USART1_DTR       AF14
ARM_A[8]         E25   FPU_EXC          AD15  PIO[27]          AD17  USART1_RI        AF15
ARM_A[9]         G24   FPU_HALT         AD13  PLL_CLKIN        N24   USART1_RTS       AF16
ARM_A[10]        F26   FPU_MODE         AE15  PLL_CLKOUT       N25   USART1_RXD       AC15
ARM_A[11]        G23   ICE_NTRST        K25   PLL_DIV (dnc)    P24   USART1_SCK       AD16
ARM_A[12]        F25   ICE_TCK          M23   PLL_DN (dnc)     T25   USART1_TXD       AC17
ARM_A[13]        H24   ICE_TDI          L26   PLL_EN           L25   XM_A[0]          AC12
ARM_A[14]        G26   ICE_TDO          N23   PLL_LFT          T24   XM_A[1]          AE13
ARM_A[15]        H23   ICE_TMS          M24   PLL_LOCK         R24   XM_A[2]          AD11
ARM_A[16]        G25   JCFG             M26   PLL_TST (dnc)    N26   XM_A[3]          AD10
ARM_A[17]        J24   PIO[0]           AB23  PLL_UP (dnc)     U23   XM_A[4]          AE11
ARM_A[18]        H26   PIO[1]           AB25  RESET            AD14  XM_A[5]          AC10
ARM_D[0]         V24   PIO[2]           AC26  SCAN_EN (dnc)    G2    XM_A[6]          AD9
ARM_D[1]         U25   PIO[3]           AC24  SCAN_TEST (dnc)  F1    XM_A[7]          AE10
ARM_D[2]         V26   PIO[4]           AC25  SINGLE           AE16  XM_A[8]          AF9
ARM_D[3]         V25   PIO[5]           AD26  SPI0_MISO        C20   XM_A[9]          AE9
ARM_D[4]         W24   PIO[6]           AD25  SPI0_MOSI        B20   XM_A[10]         AD8
ARM_D[5]         V23   PIO[7]           AE26  SPI0_NSS         C17   XM_A[11]         AF8
XM_A[12]         AC9   XM_D[14]         U3    XM_D[39]         C14   XM_CLKOUT[0]     J4
XM_A[13]         AE8   XM_D[15]         V2    XM_D[40]         U4    XM_CLKOUT[1]     H2
XM_A[14]         AD7   XM_D[16]         L1    XM_D[41]         U1    XM_CLKOUT[2]     G1
XM_A[15]         AF7   XM_D[17]         K3    XM_D[42]         T3    XM_D[0]          AD2
XM_A[16]         AE7   XM_D[18]         L2    XM_D[43]         U2    XM_D[64]         B7
XM_A[17]         AF6   XM_D[19]         K4    XM_D[44]         R4    XM_D[65]         C9
XM_A[18]         AC7   XM_D[20]         K1    XM_D[45]         R3    XM_D[66]         A8
XM_A[19]         AE6   XM_D[21]         K2    XM_D[46]         T2    XM_D[67]         A9
XM_A[20]         AF5   XM_D[22]         J1    XM_D[47]         R1    XM_D[68]         C10
XM_A[21]         AD5   XM_D[23]         J2    XM_D[48]         P3    XM_D[69]         B9
XM_A[22]         AC5   XM_D[24]         E3    XM_D[49]         R2    XM_D[70]         D10
XM_A[23]         AE5   XM_D[25]         E4    XM_D[50]         N3    XM_D[71]         A10
XM_D[1]          AB3   XM_D[26]         E2    XM_D[51]         P1    XM_D[72]         A13
XM_D[2]          AC1   XM_D[27]         D1    XM_D[52]         N1    XM_D[73]         B13
XM_D[3]          AA3   XM_D[28]         D3    XM_D[53]         M4    XM_D[74]         A14
XM_D[4]          AB1   XM_D[29]         D2    XM_D[54]         N2    XM_D[75]         D15
XM_D[5]          AB2   XM_D[30]         C1    XM_D[55]         M2    XM_D[76]         B14
XM_D[6]          AA1   XM_D[31]         D5    XM_D[56]         C6    XM_D[77]         A15
XM_D[7]          Y4    XM_D[32]         C11   XM_D[57]         A5    XM_D[78]         B15
XM_D[8]          AA2   XM_D[33]         D12   XM_D[58]         C7    XM_D[79]         A16
XM_D[9]          Y1    XM_D[34]         A11   XM_D[59]         A6    XM_GNT           F2
XM_D[10]         W4    XM_D[35]         C12   XM_D[60]         D7    XM_NCS           E1
XM_D[11]         Y2    XM_D[36]         B11   XM_D[61]         C8    XM_NWE           F3
XM_D[12]         W1    XM_D[37]         A12   XM_D[62]         A7    XM_REQ           G4
XM_D[13]         V1    XM_D[38]         C13   XM_D[63]         D8
Note:
dnc = do not connect pins. These pins are reserved for test use only and are not
described in Table 6.
Table 2. D740 Ball Assignment (VDD = 3.3V)
D6    F4    L4    AC6   D11   F23   T4    D21
AC16  AA23  T23   AC21  L23   AC11  D16   AA4
Table 3. D740 Ball Assignment (VDDI = 1.8V)
B18   B12   B6    T1    W3    AD6   AF11  AF19
AF23  W26   E23
Table 4. D740 Ball Assignment (VDDPLL = 1.8V)
P25   R26
Table 5. D740 Ball Assignment (GND)
A1    C3    D23   W23   AD3   AF25  A2    C24
AC4   AD24  H4    AF26  A26   D4    AC8   J23
AE1   B2    D9    N4    AC13  AE2   B25   AE25
AC18  P23   D14   B26   AF1   AC23  D19   V4
All balls not listed in Tables 1 to 5 are “not connected”.
Pin name conventions
Pin names are built using the following structure:
(functional block name) _ (activity level) (line name) (bus index)
where:
– functional block name = name of the functional block to which the pin belongs
– activity level = “n” for low active lines; blank for high active lines
– line name = name of the function of the pin line
– bus index = number (in [ ]) corresponding to the index when the pin line is an
element of a bus
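As an illustration of the naming structure above, the following sketch splits a pin name into its four parts. It is a heuristic, not an official decoder: the function name `parse_pin_name` and the `PinName` type are hypothetical, and the sketch assumes that any line name starting with "n" or "N" after the underscore marks a low active line (which holds for names such as ARM_NCS0 or XM_NWE in Table 1, but would misread a line name that genuinely begins with N). Names without a block prefix, such as PIO[27:0] or RESET, are returned with no block.

```python
import re
from typing import NamedTuple, Optional


class PinName(NamedTuple):
    block: Optional[str]   # functional block name, None if the name has no prefix
    active_low: bool       # True when the line name carries the "n"/"N" marker
    line: str              # line name without the activity marker
    index: Optional[str]   # bus index or range as written, e.g. "39" or "18:0"


# block_ (optional), line name, optional [index]; tolerates a space before "[".
_PIN_RE = re.compile(
    r"^(?:(?P<block>[A-Za-z0-9]+)_)?(?P<rest>[A-Za-z0-9]+)\s*(?:\[(?P<index>[0-9:]+)\])?$"
)


def parse_pin_name(name: str) -> PinName:
    match = _PIN_RE.match(name.strip())
    if match is None:
        raise ValueError(f"unrecognized pin name: {name!r}")
    rest = match.group("rest")
    # Assumption: the marker only applies to names that have a block prefix.
    active_low = (
        match.group("block") is not None and len(rest) > 1 and rest[0] in "nN"
    )
    line = rest[1:] if active_low else rest
    return PinName(match.group("block"), active_low, line, match.group("index"))


print(parse_pin_name("ARM_NCS0"))
print(parse_pin_name("XM_D[39]"))
```

Returning the index as a string keeps ranges such as "18:0" intact; a caller that needs integers can split on the colon.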
Pin Description
Table 6. D740 Pin Description
Module
Name
Function
Type
Active Level
Notes
ADDA
ADDA_BRCK
ADDA Bit rate clock
in
digital serial audio stream bit rate clock
(64 x F sampling)
ADDA
ADDA0_IN
ADDA 0 input channel
in
24 bit Left + 24 bit right digital serial
stereo audio stream
ADDA
ADDA1_IN
ADDA 1 input channel
in
24 bit Left + 24 bit right digital serial
audio stream
ADDA
ADDA2_IN
ADDA 2 input channel
in
24 bit Left + 24 bit right digital serial
audio stream
ADDA
ADDA3_IN
ADDA 3 input channel
in
24 bit Left + 24 bit right digital serial
audio stream
ADDA
ADDA0_OUT
ADDA 0 output channel
in
24 bit Left + 24 bit right digital serial
stereo audio stream
ADDA
ADDA1_OUT
ADDA 1 output channel
in
24 bit Left + 24 bit right digital serial
audio stream
ADDA
ADDA2_OUT
ADDA 2 output channel
in
24 bit Left + 24 bit right digital serial
audio stream
ADDA
ADDA3_OUT
ADDA 3 output channel
out-02
24 bit Left + 24 bit right digital serial
audio stream
ADDA
ADDA_TOPLL
ADDA clock generator Strobe
out-02
F Sampling toward an external PLL for
ADCs/DACs synchronism generation
ADDA
ADDA_WCK
ADDA Word clock
out-03
F Sampling clock toward ADCs/DACs
ARM
ARM_A[18:0]
ARM external memory address
bus
out-02
ARM
ARM_D[15:0]
ARM external memory data bus
bi-02
ARM
ARM_NCS0
ARM external memory Chip select
command 0
out-02
low
ARM
ARM_NCS1
ARM external memory Chip select
command 1
out-02
low
ARM
ARM_NCS2
ARM external memory Chip select
command 2
out-02
low
ARM
ARM_NCS3
ARM external memory Chip select
command 3
out-02
low
ARM
ARM_NRD
ARM external Memory Read
enable
bi-02
low
ARM
ARM_NWEB0
ARM external memory Low Byte
Write enable
bi-03
low
data byte d[7:0]
ARM
ARM_NWEB1
ARM external memory High Byte
Write enable
bi-03
low
data byte d[15:8]
mAgic
FPU_HALT
ARM Fast IRQ from mAgic “halt”
out-02
high
To be used for monitoring
(internal Pull-Down)
mAgic
FPU_EXC
ARM IRQ15 from mAgic
“exception”
out-02
high
To be used for monitoring
mAgic
FPU_MODE
ARM IRQ25 from mAgic “mode”
out-02
To be used for monitoring
0 = mAgic in system mode
1 = mAgic in run mode
JTAG
ICE_NTRST
JTAG Test reset
in
JTAG
ICE_TCK
JTAG Test clock
in
JTAG
ICE_TDI
JTAG Test data input
in
JTAG
ICE_TDO
JTAG Test data output
out-02
JTAG
ICE_TMS
JTAG Test mode
in
low
(internal Pull-Up)
(internal Pull-Up)
low
(internal Pull-Up)
D740
JCFG
ARM JTAG / D740 Boundary Scan
selection
in
0 -> D740 Boundary Scan
1 -> ARM JTAG
PIO
PIO[27:0]
Parallel Input/Output
bi-02
general purpose programmable I/Os or
ARM peripheral I/Os
PLL
PLL_CLKIN
Reference clock
in
25MHz (max) if PLL_EN = 1
100MHz (max) if PLL_EN = 0
PLL
PLL_CLKOUT
PLL Clock output
out-02
100MHz (max) if PLL_EN = 1
fixed low if PLL_EN = 0
PLL
PLL_EN
PLL enable (PLL_CLKIN x4
multiply)
in
high
1 -> system clock = PLL_CLKIN x 4
0 -> system clock = PLL_CLKIN
PLL
PLL_LFT
PLL lowpass filter input
in
PLL
PLL_LOCK
PLL lock condition
out-02
high
To be used for monitoring
D740
RESET
System reset
in
low
asynchronous
mAgic
SINGLE
Single user on mAgic external
memory
in
high
(internal Pull-Up)
0 -> Not default user of shared XM
1 -> Single user of not shared XM or
default user of shared XM
SPI
SPI0_MOSI
SPI 0 Master Out/Slave In data
bi-02
SPI SLV -> data input
SPI MST -> data output
SPI
SPI0_MISO
SPI 0 Master In/Slave Out data
bi-02
SPI SLV -> data output
SPI MST -> data input
SPI
SPI0_NSS
SPI 0 Input/Output Chip select
bi-02
SPI SLV -> CS Input
SPI MST -> CS 0 Output
SPI
SPI0_NSS[3:1]
SPI 0 Output Chip Selects
out-02
SPI SLV -> n.a.
SPI MST -> CS 3, 2, 1 Outputs
SPI
SPI0_SCK
SPI 0 Serial clock
bi-03
SPI SLV -> clock input
SPI MST -> clock output
SPI
SPI1_MOSI
SPI 1 Master Out/Slave In data
bi-02
SPI SLV -> data input
SPI MST -> data output
SPI
SPI1_MISO
SPI 1 Master In/Slave Out data
bi-02
SPI SLV -> data output
SPI MST -> data input
SPI
SPI1_NSS
SPI 1 Input/Output Chip select
bi-02
SPI SLV -> CS Input
SPI MST -> CS 0 Output
SPI
SPI1_NSS[3:1]
SPI 1 Output Chip Selects
out-02
SPI SLV -> n.a.
SPI MST -> CS 3, 2, 1 Outputs
SPI
SPI1_SCK
SPI 1 Serial clock
bi-03
SPI SLV -> clock input
SPI MST -> clock output
USART
USART0_RXD
USART 0 Data in
in
(internal Pull-Down)
USART
USART0_SCK
USART 0 Serial clock
bi-03
for synchronous mode only
USART
USART0_TXD
USART 0 Data out
bi-02
used as output
USART
USART1_CTS
USART 1 Clear to send
in
USART
USART1_DCD
USART 1 Data carrier detect
in
USART
USART1_DSR
USART 1 Data set ready
in
USART
USART1_DTR
USART 1 Data terminal ready
out-02
USART
USART1_RI
USART 1 Ring indicator
in
USART
USART1_RTS
USART 1 Request to send
out-02
USART
USART1_RXD
USART 1 Data in
in
(internal Pull-Down)
USART
USART1_SCK
USART 1 Serial clock
bi-03
for synchronous mode only
USART
USART1_TXD
USART 1 Data out
bi-02
used as output
mAgic
XM_A[23:0]
mAgic external Memory address
bus
out-03
mAgic
XM_CLKOUT[2:0]
mAgic external Memory clocks
out-03
100MHz (max). One line for up to three
mAgic XM chips.
mAgic
XM_D[39:0]
mAgic external Memory data bus
bi-03
Right bank (internal Pull-Down)
mAgic
XM_D[79:40]
mAgic external Memory data bus
bi-03
Left bank (internal Pull-Down)
mAgic
XM_GNT
mAgic shared external memory
bus grant
out-02
high
mAgic
XM_NCS
mAgic external Memory Chip
select
out-03
low
mAgic
XM_NWE
mAgic external Memory Write
enable
out-03
low
Power
VDD
IO power supply
Power
3.3 V nominal supply
Power
VDDI
Core power supply
Power
1.8 V nominal supply
Power
VDDPLL
PLL power supply
Power
1.8 V nominal supply
Ground
GND
D740 ground reference
Ground
common to all supplies
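The clock relationships stated in the PLL_CLKIN, PLL_EN and ADDA_BRCK rows of Table 6 can be summarized in a short sketch. The function names are hypothetical and the code only restates the table notes (x4 multiply when PLL_EN is high, input limits of 25 MHz and 100 MHz, bit rate clock of 64 x the sampling frequency); it is not a description of any register interface.

```python
def system_clock_hz(pll_clkin_hz: float, pll_en: bool) -> float:
    """System clock derived from PLL_CLKIN according to the PLL_EN note."""
    # Table 6: 25 MHz (max) input if PLL_EN = 1, 100 MHz (max) if PLL_EN = 0.
    limit_hz = 25e6 if pll_en else 100e6
    if pll_clkin_hz > limit_hz:
        raise ValueError(
            f"PLL_CLKIN {pll_clkin_hz / 1e6:g} MHz exceeds {limit_hz / 1e6:g} MHz"
        )
    # PLL_EN = 1: system clock = PLL_CLKIN x 4; PLL_EN = 0: PLL_CLKIN directly.
    return pll_clkin_hz * 4 if pll_en else pll_clkin_hz


def adda_bit_clock_hz(sampling_hz: float) -> float:
    """ADDA_BRCK frequency: 64 x the sampling frequency."""
    return 64 * sampling_hz


print(system_clock_hz(25e6, pll_en=True) / 1e6, "MHz system clock")
print(adda_bit_clock_hz(48_000) / 1e6, "MHz bit clock at 48 kHz sampling")
```

The 48 kHz sampling rate in the usage lines is only an example value; the datasheet does not specify supported sampling rates in this summary.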
Block Diagram
Figure 1. D740 Architecture
[Block diagram. Blocks shown: ARM7TDMI; 32K ARM Memory; AMBA ASB; ASB / APB Bridge; EBI; SPI0; SPI1; USART0; USART1; TIMER; Watchdog; PIO; PDC; ADDA; Clock Gen; IRQ Ctrl; MAAR; Shared Memory; Program Bus Mux / Demux; Data Bus Mux / Demux; Data / Program Bus Mux; mAgic DSP core; 8K x 128-bit Program Memory; Data Memory (6K + 6K) x 40-bit, double bank, double port; Data Buffer 2K + 2K words, double bank, double port. Legend: Run Mode data paths, System Mode data paths, ARM exclusive data paths.]
Architectural Overview
DIOPSIS 740 (also named D740) is a high performance dual-core processing platform
for audio, communication and beam-forming applications, integrating a floating-point
DSP (mAgic DSP) and an ARM7TDMI™ Reduced Instruction Set Computer (RISC).
The D740 is optimally suited for floating point applications with a significant need for
complex domain computations like FFT and frequency domain phase-shift algorithms,
requiring high dynamic range and maximum numerical precision.
The D740 combines the flexibility of the ARM7 RISC controller with the very high performance of the DSP oriented VLIW architecture of mAgic.
System management
The availability of a standard RISC on-chip lowers software development effort for non
critical and control segments of the application. ARM7TDMI supports the usage of light
RTOS and has efficient interrupt management, leaving mAgic fully available for the
numerically intensive part of the application. The synchronization between the two processors can be either based on software polling on semaphores or on interrupts.
The ARM is the D740 master processor. The bootstrap sequence of the D740 starts
from the bootstrap of the ARM from its external non-volatile memory. The ARM then
boots mAgic from a non-volatile memory. After bootstrap the D740 can start its normal
operations. The DSP side of many applications can be implemented on the D740 using
only the internal memory. In fact the program memory size of 8K by 128-bit coupled with
the availability of the code compression, gives an equivalent on-chip program memory
size of about 24K instructions (typical).
The ARM standard In-Circuit Emulation debug interface is supported via the ICE port.
mAgic DSP Processor
The mAgic DSP is the VLIW numeric processor of the D740. It operates on IEEE 754
40-bit extended precision floating-point and 32-bit integer numeric format. The main
components of the DSP subsystem are the core processor, the on-chip memories and
the interfaces to and from the ARM subsystem. The operators block, the register file, the
address generation unit and the program decoding and sequencing unit compose the
core processor. A short description of each block is given in the following paragraphs.
Core processor
mAgic is a VLIW engine, but from a user's point of view it works like a RISC machine,
implementing triadic computing operations on data coming from the register file, and
data move operations between the local memories and the register file. The operators
are pipelined for maximum performance. The pipeline depth depends on the operator
used. Operation scheduling and parallelism are automatically defined and managed at compile time by the assembler-optimizer, allowing efficient code execution. In
order to give the best support to the RISC-like programming model, mAgic is equipped
with a complex 256-entry register file. It can be used as a complex register file (real +
imaginary part), or as a dual register file for vectorial operations. When performing single instructions the register file can be used as an ordinary 512-register file. Both the left
and right sides of the register file are 8-ported, making a total of 16 I/O ports available for
data moves to and from the operator block and the memory. The total data bandwidth
between the register file and the operator block is 70 bytes per clock cycle, avoiding bottlenecks in the data flow between the two units.
Figure 2. mAgic DSP Block Diagram
[Block diagram. Blocks shown: mAgic - ARM I/F; VLIW Program Memory; Local Controller and VLIW Decoder; Instruction Decoder; Condition Generation; Status Register; Program Counter; Data Register File; Multiple Address Generation Unit; Address Register File; Operator Block; PARM Memory Left 512x40 and Right 512x40; Data Memory Left 6Kx40 and Right 6Kx40; Buffer Data Memory Left 2Kx40 and Right 2Kx40; DMA Controller; External Memory I/F.]
The operators block, the register file, the address generation unit and the program-sequencing unit compose the core processor. The Operators Block contains the hardware that performs arithmetical operations. It works on 32-bit integers and IEEE 754
extended precision 40-bit floating-point data.
The Operators Block is composed of four integer/floating point multipliers, an adder, a
subtractor and two add-subtract integer/floating point units; moreover, it has two
shift/logic units, a Min/Max operator and two seed generators for efficient division and
inverse square root computation. The operators block is arranged in order to natively
support complex arithmetic (single cycle complex multiply or multiply and add), fast FFT
(single cycle butterfly computation) and vectorial computations. The peak performance
of mAgic is achieved during single cycle FFT butterfly execution, when mAgic delivers
10 floating-point operations per clock cycle.
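The count of 10 floating-point operations for one FFT butterfly can be reproduced by writing a radix-2 butterfly with explicit real arithmetic. The sketch below is a plain software illustration, not mAgic code; the mapping of the operations to the multipliers, the adder, the subtractor and the two add-subtract units is an assumption based on the unit list above, and the function name is hypothetical.

```python
def butterfly(a, b, w):
    """Radix-2 butterfly on complex values given as (re, im) tuples.

    Returns (a + w*b, a - w*b, operation_count).
    """
    ar, ai = a
    br, bi = b
    wr, wi = w

    # Complex multiply t = w * b: four multiplies (the four multipliers) ...
    m0 = wr * br
    m1 = wi * bi
    m2 = wr * bi
    m3 = wi * br
    # ... plus one subtract and one add (the subtractor and the adder).
    tr = m0 - m1
    ti = m2 + m3

    # Two add-subtract units: each produces a sum and a difference (4 operations).
    xr, yr = ar + tr, ar - tr
    xi, yi = ai + ti, ai - ti

    operation_count = 4 + 2 + 4
    return (xr, xi), (yr, yi), operation_count


x, y, ops = butterfly((1, 2), (3, -1), (0, -1))
print(x, y, ops)
print(ops * 100_000_000, "operations per second at 100 MHz")
```

With one butterfly issued per cycle at 100 MHz, the 10 operations per cycle correspond to the 1 GFLOPS peak figure quoted in the features list.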
mAgic is equipped with two independent address generation units. It is able to generate
up to two pairs of addresses, one to access the left and the right memory for reading
and one to access the left and the right memory for writing. It is also used in the loop
control to test if the end of a loop is reached. The Multiple Address Generation Unit
(MAGU) supports linear addressing with stride, circular addressing and bit reversed
addressing. The address generation unit has 16 registers.
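The three addressing modes named above can be modelled in software as follows. This is a behavioural sketch only: the function names and parameters are illustrative, and the actual MAGU register layout and update rules are not described in this summary.

```python
def linear_stride(base, stride, count):
    """Linear addressing: base, base + stride, base + 2*stride, ..."""
    return [base + i * stride for i in range(count)]


def circular(base, length, start, stride, count):
    """Circular addressing: offsets wrap around inside a buffer of `length` words."""
    return [base + (start + i * stride) % length for i in range(count)]


def bit_reversed(base, bits):
    """Bit-reversed addressing over 2**bits locations, as used to reorder FFT data."""
    addresses = []
    for i in range(1 << bits):
        reversed_index = 0
        for b in range(bits):
            if i & (1 << b):
                reversed_index |= 1 << (bits - 1 - b)
        addresses.append(base + reversed_index)
    return addresses


print(linear_stride(0x100, 2, 4))
print(circular(0, 4, 2, 1, 6))
print(bit_reversed(0, 3))
```

Circular pointers suit delay lines in FIR filters, while bit reversal yields the input or output ordering of a radix-2 FFT without an explicit reordering pass.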
The Program Address Generation Unit is devoted to control the correct Program
Counter generation according to the program flow. It generates addresses for linear
code execution as well as for non-sequential program flow. The Condition Generation
Unit combines the flags generated by the operators to produce complex conditions flags
used to control the program execution. Predicated instruction execution is supported for
different groups of instructions: arithmetical instructions, memory write, immediate load,
or all of them. The Program Address Generation Unit also supports conditional
and unconditional branch instructions, loops, calls to subroutines and returns from subroutines.
Internal memories, External memories and DMA
mAgic has four on-chip memory blocks: the Program Memory, the Data Memory, the
Data Buffer, and the dual ported memory shared with the ARM processor.
An External Memory Interface multiplexes the Data accesses and the Program
accesses to and from the External Memory.
The Program Memory stores the VLIW program to be executed by mAgic. It is 8K
words by 128-bit single port memory. When mAgic is in System Mode the ARM can
modify the content of the mAgic Program Memory in two different ways. The ARM can
directly write a Program Memory location by accessing the memory address space
assigned to the mAgic Program Memory in the ARM memory map. In this access mode
the ARM writes four 32-bit words to four consecutive addresses at correct address
boundaries, in order to properly complete a single VLIW word write cycle. The ARM can
also modify the content of the mAgic Program Memory by initiating a DMA transfer from
the External Memory to the mAgic Program Memory. In this access mode a single VLIW
word is transferred from the mAgic External Memory to the mAgic Program Memory at 64 bits per cycle, that is, one complete word every two clock cycles. Due to the program compression scheme used, which allows an average program compression between 2 and
3, the code accessing capability of mAgic from its External Memory is greater than an
instruction per clock cycle. When mAgic is in Run Mode, the ARM cannot get access to
the mAgic Program Memory. When in Run Mode mAgic can initiate a DMA transfer
from the External Memory to the mAgic Program Memory to load a new code segment.
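The program loading figures in this paragraph can be worked through numerically. The sketch below assumes that a compression ratio of r means one stored 128-bit word expands to r instructions on average; the helper names are hypothetical. It also shows how a 128-bit VLIW word splits into the four 32-bit words that the ARM writes, where the least-significant-word-first order is purely an assumption, since the summary does not state the ordering.

```python
VLIW_WORD_BITS = 128
DMA_BITS_PER_CYCLE = 64
PROGRAM_MEMORY_WORDS = 8 * 1024


def instructions_per_cycle(compression_ratio):
    """Instructions fetched per clock cycle over the 64-bit DMA path."""
    cycles_per_word = VLIW_WORD_BITS / DMA_BITS_PER_CYCLE  # two cycles per word
    return compression_ratio / cycles_per_word


def equivalent_instructions(compression_ratio):
    """On-chip program capacity expressed in instructions."""
    return int(PROGRAM_MEMORY_WORDS * compression_ratio)


def split_vliw_word(word):
    """Split a 128-bit word into four 32-bit words (assumed order: LSW first)."""
    if not 0 <= word < (1 << VLIW_WORD_BITS):
        raise ValueError("word does not fit in 128 bits")
    return [(word >> (32 * i)) & 0xFFFFFFFF for i in range(4)]


for ratio in (2, 3):
    print(ratio, instructions_per_cycle(ratio), equivalent_instructions(ratio))
print([hex(w) for w in split_vliw_word(0x00112233445566778899AABBCCDDEEFF)])
```

Under this reading, a compression ratio between 2 and 3 gives between 1 and 1.5 instructions per clock cycle, and a ratio of 3 gives the 24K-instruction equivalent capacity quoted for the 8K-word Program Memory.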
The mAgic internal Data Memory is made of three memory pages of 2K words by 40-bit
for the left data memory and three pages of 2K words by 40-bit for the right data memory, giving a total
of 6K words for each of the left and right memory banks (a total of 12K words). Each
Data Memory bank is a dual port memory that allows four simultaneous accesses, two
read and two write. The core can access vectorial and single data stored in the Data
Memory. Accessing complex data is equivalent to accessing vectorial data. During
simultaneous read and write memory accesses, the MAGU generates two independent
read and write addresses common to both the left and the right memory banks. The total
available bandwidth between the Register File and the Data Memory is 20 bytes per
clock cycle, allowing full speed implementation of numerically intensive algorithms (e.g.
complex FFT and FIR).
The Buffer Memory is 2K words by 40-bit for both the left and the right memory. The
Buffer Memory is a dual port memory. A port is connected to the core processor. The
MAGU generates the Buffer Memory addresses for transferring data to and from the
core. The second port of the Buffer Memory is connected to the External Memory Interface. The Buffer Memory does not support dual read and write accesses, either from
the core or from the External Memory Interface. The available bandwidth between the
core processor and the Buffer Memory is equal to the available bandwidth between the
External Memory Interface and the Buffer Memory: 10 bytes per clock cycle. The maximum External Memory size of mAgic is 16 Mword Left and Right (equivalent to 32
Mword or 160 Mbytes; 24-bit address bus). A DMA controller manages the data transfer
between the External Memory and the Buffer Memory. The DMA controller can generate
accesses with stride for the External Memory. The DMA transfers to and from the Buffer
Memory can be executed in parallel with the full speed core instructions execution with
zero-overhead and without the intervention of the core processor, except for initiating it.
The last memory block in the address space of the mAgic DSP is the memory shared
(PARM) between mAgic and the ARM processor. It is a dual port memory 512 words by
40-bit for both the left and the right bank (total 1K by 40-bit). This memory can be used
to efficiently transfer data between the two processors. The available bandwidth
between the core processor and the shared memory is 10 bytes per clock cycle. On the
ARM side the available bandwidth is limited by the bus size of the ARM processor (32
bits) giving a bandwidth of 4 bytes per ARM clock cycle.
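The memory sizes and bandwidths quoted in this section follow from the 40-bit (5-byte) word size. The sketch below recomputes them; the number of accesses per cycle assigned to each memory (four for the Data Memory, two for the Buffer and shared memories) is an interpretation of the text, chosen because it reproduces the stated 20 and 10 bytes per cycle, and all names are illustrative.

```python
K = 1024
WORD_BYTES = 40 // 8  # one 40-bit word = 5 bytes


def bandwidth_bytes_per_cycle(accesses_per_cycle):
    return accesses_per_cycle * WORD_BYTES


# Assumed access counts: read + write on both left and right banks for the
# Data Memory; one access on each of the left and right banks otherwise.
DATA_MEMORY_BW = bandwidth_bytes_per_cycle(4)
BUFFER_MEMORY_BW = bandwidth_bytes_per_cycle(2)
SHARED_MEMORY_BW = bandwidth_bytes_per_cycle(2)

# On-chip data words: Data Memory, Buffer Memory and shared memory (left + right).
ON_CHIP_DATA_WORDS = 2 * 6 * K + 2 * 2 * K + 2 * 512


def external_memory_size(address_bits=24):
    """Return (words per bank, total words, total bytes) for left + right banks."""
    words_per_bank = 1 << address_bits
    total_words = 2 * words_per_bank
    return words_per_bank, total_words, total_words * WORD_BYTES


per_bank, total, size_bytes = external_memory_size()
print(DATA_MEMORY_BW, BUFFER_MEMORY_BW, SHARED_MEMORY_BW)
print(ON_CHIP_DATA_WORDS // K, "K on-chip data words")
print(per_bank // (K * K), "Mword per bank,", size_bytes // (K * K), "Mbytes total")
```

The on-chip total of 17K words matches the 17K x 40-bit data memory locations listed in the features, and the 24-bit address bus gives the stated 16 Mword per bank, or 160 Mbytes for both banks.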
ARM interface (mAAr)
The D740 master is the ARM7 RISC processor. mAgic behaves as a standard AMBA
ASB slave device, allowing access to different resources depending on the operating
mode (Run or System).
In System Mode, mAgic halts its execution and the ARM takes control of it. When mAgic
is in System mode the ARM can access many mAgic internal devices. The ability of the
ARM to access internal mAgic resources in System Mode can be used for initialization
and debugging purposes. By accessing the Command Register, the ARM can change
the operating status of the DSP (Run/System Mode), initiate DMA transactions, force
single or multiple step execution, or simply read the DSP operating status.
In Run Mode, mAgic works under direct control of its own VLIW program and the ARM
has access only to the 1K x 40-bit dual ported shared memory (PARM) and to the mAgic
Command Register.
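The access rules described above can be captured in a small model. This is a sketch under stated assumptions: the resource names are hypothetical labels, and only the resources the text names explicitly are listed (the shared memory and Command Register are reachable in both modes, the Program Memory only in System Mode); the full list of devices visible in System Mode is not given in this summary.

```python
from enum import Enum


class MagicMode(Enum):
    RUN = "run"
    SYSTEM = "system"


# Resources the ARM can reach in both operating modes, per the text above.
ALWAYS_VISIBLE = frozenset({"shared_memory", "command_register"})
# Resources the text describes as reachable by the ARM only in System Mode.
SYSTEM_MODE_ONLY = frozenset({"program_memory"})


def arm_can_access(mode, resource):
    """Return True if the ARM may access `resource` while mAgic is in `mode`."""
    if resource in ALWAYS_VISIBLE:
        return True
    if resource in SYSTEM_MODE_ONLY:
        return mode is MagicMode.SYSTEM
    raise ValueError(f"resource not covered by this sketch: {resource!r}")


print(arm_can_access(MagicMode.RUN, "shared_memory"))
print(arm_can_access(MagicMode.RUN, "program_memory"))
print(arm_can_access(MagicMode.SYSTEM, "program_memory"))
```

Raising an error for unlisted resources, rather than guessing, keeps the model from asserting anything the summary does not state.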
In order to allow a tight coupling between the operations of mAgic and the ARM at run
time, they can exchange synchronization signals, based on interrupts.
ARM System: ARM7TDMI Processor and Peripherals
The ARM7TDMI is a 32-bit RISC microprocessor; it is a member of the Advanced RISC
Machines (ARM) family of general-purpose 32-bit microprocessors, offering high performance
and very low power consumption.
The ARM architecture is based on Reduced Instruction Set Computer (RISC) principles,
and the instruction set and related decode mechanism are much simpler than those of
microprogrammed Complex Instruction Set Computers. This simplicity results in a high
instruction throughput and a real-time interrupt response. Pipelining is employed so that
all parts of the processing and memory systems can operate continuously. The typical
operating scheme of the ARM7TDMI is the sequence fetch-decode-execute.
The ARM7TDMI processor employs the architectural strategy known as THUMB.
THUMB instructions operate with the standard ARM register configuration, allowing
excellent interoperability between ARM and THUMB states. Each 16-bit THUMB
instruction has a corresponding 32-bit ARM instruction with the same effect on the processor model. The 16-bit instructions are expanded at run time with no degradation of
the system performance. This provides far better performance than a 16-bit architecture,
with better code density than a 32-bit architecture.
The ARM7TDMI processor is built around a bank of 37 32-bit registers, comprising 31 general-purpose registers and six status
registers. The ARM7TDMI supports seven operation modes:
1. User (usr): The normal ARM program execution state
2. FIQ (fiq): Fast Interrupt reQuest; it is connected to the mAgic Halt signal
3. IRQ (irq): Used for general-purpose interrupt handling
4. Supervisor (svc): Protected mode for the operating system
5. Abort mode (abt): Entered after data or instruction prefetch abort
6. System (sys): A privileged user mode for the operating system
7. Undefined (und): Entered when an undefined instruction is executed
Mode changes can be made under software control or can be brought about by external
interrupts or exception processing. Most application programs execute in User mode.
The non-user modes - known as privileged modes – are entered in order to service
interrupts or exceptions, or to access protected resources. Each operating mode has
dedicated banked registers for fast exception handling. The FIQ mode has five additional
banked working registers, r8_fiq to r12_fiq, to enhance interrupt processing
speed.
The ARM7TDMI processor operates in little-endian mode.
To speed-up critical routine execution or critical data segment access, the ARM7 is
equipped with 32 Kbyte of zero wait states on-chip memory.
The ARM system has two buses. The main bus is the ASB (ARM System Bus). The
APB (ARM Peripheral Bus) is designed for accesses to on-chip peripherals. The AMBA
Bridge provides an interface between the ASB and the APB.
The D740 is equipped with a set of peripherals controlled by the ARM. An on-chip
Peripheral Data Controller (PDC) transfers data between the on-chip USARTs/SPI and
the on- and off-chip memories by DMA, without the intervention of the processor.
Most importantly, the PDC removes the processor interrupt handling overhead and significantly reduces the number of clock cycles required for data transfer.
Each peripheral has a 16K-byte address space allocated in the upper 3M bytes of the
4Gbyte address space. The peripheral register set is composed of control, mode, data,
status, and interrupt registers. To maximize the efficiency of bit manipulation, frequently
written registers are mapped into three memory locations.
A short description of the available peripherals is given in the following.
14
• EBI (External Bus Interface): the EBI generates the signals that control access
to external memory or peripheral devices.
• ADDA (Analog to Digital and Digital to Analog interface): the ADDA provides a
4-channel serial interface to stereo audio 24-bit ADCs and DACs.
• PDC (Peripheral Data Controller): the PDC provides 8 communication channels
dedicated to the two USARTs and the two SPIs. For each peripheral, one PDC channel
is connected to the receiving channel and one to the transmitting channel.
• USART (Universal Synchronous / Asynchronous Receiver / Transmitter): two full-duplex
universal synchronous/asynchronous receiver/transmitters provide a simple,
standard communication path managed by the Peripheral Data Controller.
• SPI (Serial Peripheral Interface): two four-wire serial interfaces provide a simple,
industry-standard communication path managed by the Peripheral Data Controller.
• AIC (Advanced Interrupt Controller): the AIC is an 8-level priority, individually
maskable, vectored interrupt controller. The interrupt controller is connected to the
NFIQ (fast interrupt request) and the NIRQ (standard interrupt request) inputs of the
ARM7TDMI processor.
• PIO (Parallel I/O Controller): the PIO features 32 programmable I/O lines. 28 PIO
lines are available on D740 pads, while the remaining 4 are internal only.
• TC (Timer Counter): the TC contains three identical 16-bit timer/counter channels.
• WD (Watchdog Timer): the WD can be used to guard against system lock-up if the
software becomes trapped in a deadlock. If an overflow occurs, the watchdog timer
generates a processor interrupt via the Advanced Interrupt Controller (AIC) and an
external low pulse through the PIO.
• CLKGEN (Clock Generator): the clock generator provides divided clocks for several
peripherals: the Timer Counter, the Watchdog, the USARTs and the SPIs.
Figure 3. Armsystem Architecture
Development Tools
The D740 is supported by a complete set of software and hardware development tools.
MADE
The D740 is supported by a set of development tools integrated into a visual development environment called MADE (Multicore Application Development Environment).
MADE provides the user with an integrated environment for producing applications for
both D740 cores, the ARM7TDMI and the mAgic DSP, through common
project management and support for the MARMOS Minimal Bios.
Code generation tools for the ARM include the GNU Code Development Chain for
ARM7 (C/C++ compiler, assembler, linker and utilities) and the ARM SDT Code Development Chain (C/C++ compiler, assembler, linker and utilities).
Code generation tools for mAgic include a C compiler (GNU gcc based, ANSI compliant),
a VLIW assembler-optimizer, a code compressor, a linker and utilities.
MADE supports the MARMOS Minimal Bios, a set of helper functions for ARM-mAgic
intercommunication and D740 peripheral management. MARMOS gives the
user the basic APIs for building an integrated ARM-mAgic application.
MADE also provides the user with a simulation engine and an emulation kernel: a cycle-accurate simulator and support for the D740 emulator board.
JTAG-ICE
The ARM standard In-Circuit Emulation (ICE) debug interface is supported via the JTAG-ICE
port of the D740.
When the ARM ICE configuration is selected, the usual debug capabilities for the ARM
system are supported, while support for the mAgic core is limited to inspection of memory and
status registers.
The five JTAG pins are shared between the ARM7TDMI ICE functionality and the DIOPSIS 740
chip Boundary Scan Logic (BSL). The "JCFG" pin acts as the ARM JTAG / D740 BSL selector:
when "JCFG" is high, the ARM ICE is selected; when "JCFG" is low, the DIOPSIS 740 BSL
is selected.
JTST
JTST is a low-cost, general-purpose module that provides the resources needed to
test the DIOPSIS 740. JTST provides the following resources to the DIOPSIS 740:
– mAgic SSRAM, ARM FLASH and SRAM
– 4 Stereo Audio 20-bit CODECs
– 1 USB 2.0 Full Speed (12 Mbps)
– 2 RS232/LVTTL a/synchronous serial I/O lines
– 2 SPI serial I/O lines
– Reset Logic (Power ON, Push Button, WDG)
– I/O connectors (USART, SPI, USB, PIO, AUDIO)
– PLL-Clock Logic (25 MHz oscillator + CLK connector)
– DIP Switch & Status 7-segment Display
– Voltage Regulators 5V/3.3V & 5V/1.8V
– M-ICE JTAG
Mechanical Drawing
Table 7. D740 Dimensions (mm)
Symbol      Values (Min / Nom / Max where three are given)
A1          0.50 / 0.60 / 0.70
∅b          0.60 / 0.75 / 0.90
aaa         0.30
bbb         0.25
ccc         0.35
ddd         0.30
eee         0.15
A           2.12 / 2.33 / 2.56
Dim "B"     0.44 / 0.52 / 0.60
e REF       1.27
D/E         34.8 / 35.0 / 35.2
D1/E1       30.0 / 30.7
f REF       11.0
J/L REF     1.62
Power Dissipation
The D740 has three kinds of power supply pins:
• VDDCORE pins, which power the chip core (1.8V)
• VDDIO pins, which power the I/O lines (3.3V)
• VDDPLL pins, which power the oscillator and PLL cells (1.8V)
The total power dissipation is the sum of two basic contributions:
PD = PIO + PCORE
PIO represents the contribution due to the I/O pad current and the output load current.
PCORE represents the contribution due to the internal activity current.
The following table defines the current consumption under different conditions:
Table 8. Power Dissipation
                Typical conditions                 Worst conditions
Parameters      Idd IO (3.3V)   Idd CORE (1.8V)    Idd IO (3.3V)   Idd CORE (1.8V)
                mA              mA                 mA              mA
Idd peak        330             460                425             600
Idd high        120             400                155             520
Idd no ext      25              390                35              500
Idd sys mode    25              100                35              135
Idd rst         10              160                15              205
• Idd peak = mAgic FFT; both mAgic and ARM ext mem written 100% with continuous
toggling data
• Idd high = mAgic FFT; both mAgic and ARM ext mem read and written alternately
100% with 50% toggling data
• Idd no ext = mAgic FFT; ARM FLASH access 100%; no mAgic ext mem access
• Idd sys mode = mAgic in system mode; ARM FLASH access 100%
• Idd rst = D740 under reset
• typical condition = typical process; Tj = 25°C; Vdd = nom
• worst condition = worst process; Tj = 100°C; Vdd = nom + 10%
To estimate power consumption for a specific application, use the following equations,
where % is the fraction of time the program spends in each state and each "Idd" term is taken from the corresponding "IO" or "CORE" column of Table 8:
PCORE = ((%peak × Idd peak) + (%high × Idd high) + (%no ext × Idd no ext) + (%sys mode × Idd sys mode) + (%rst × Idd rst)) × 1.8
PIO = ((%peak × Idd peak) + (%high × Idd high) + (%no ext × Idd no ext) + (%sys mode × Idd sys mode) + (%rst × Idd rst)) × 3.3
Note: Idd peak represents worst-case processor operation (for Idd IO in particular) and is not
representative even of demanding applications, in which all data bits do not toggle every cycle.
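As an illustration of the estimate above, the following sketch applies the PCORE and PIO equations to the Table 8 currents and sums them as PD = PIO + PCORE. It is not an Atmel tool: the function name `estimate_power_mw`, the state keys, and the example time profile are hypothetical, and the check that the time fractions sum to 1 is an assumption about how the percentages are meant to be used. Because the currents are in mA and the supplies in volts, the results are in mW. The nominal 1.8 V and 3.3 V values are used for both columns, as in the equations, even though the worst-case currents were characterized at Vdd = nom + 10%.

```python
# Idd values in mA transcribed from Table 8, stored as (IO, CORE) pairs.
TABLE_8_MA = {
    "typical": {
        "peak": (330, 460),
        "high": (120, 400),
        "no_ext": (25, 390),
        "sys_mode": (25, 100),
        "rst": (10, 160),
    },
    "worst": {
        "peak": (425, 600),
        "high": (155, 520),
        "no_ext": (35, 500),
        "sys_mode": (35, 135),
        "rst": (15, 205),
    },
}

VDD_IO_V = 3.3    # nominal I/O supply used in the PIO equation
VDD_CORE_V = 1.8  # nominal core supply used in the PCORE equation


def estimate_power_mw(time_fractions, conditions="typical"):
    """Return PIO, PCORE and PD in mW for a mapping of state -> fraction of time."""
    currents = TABLE_8_MA[conditions]
    unknown = set(time_fractions) - set(currents)
    if unknown:
        raise ValueError("unknown states: %s" % sorted(unknown))
    # Assumption: the fractions describe the whole run time, so they sum to 1.
    if abs(sum(time_fractions.values()) - 1.0) > 1e-6:
        raise ValueError("time fractions must sum to 1")
    # Time-weighted average current for each supply, in mA.
    idd_io = sum(f * currents[s][0] for s, f in time_fractions.items())
    idd_core = sum(f * currents[s][1] for s, f in time_fractions.items())
    p_io = idd_io * VDD_IO_V
    p_core = idd_core * VDD_CORE_V
    return {"PIO": p_io, "PCORE": p_core, "PD": p_io + p_core}


if __name__ == "__main__":
    # Hypothetical profile: 50% "high", 40% "no ext", 10% "sys mode".
    profile = {"high": 0.5, "no_ext": 0.4, "sys_mode": 0.1}
    print(estimate_power_mw(profile, "typical"))
    print(estimate_power_mw(profile, "worst"))
```

States omitted from the profile are treated as taking 0% of the time.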
Reliability Data
The following table summarizes some basic data that can be used in reliability
calculations.
Table 9. Silicon Block Size
Parameters                              Data   Unit            Data   Unit
Logic Gates                             585    Kgates          10.5   mm²
Memories                                12     M transistors   18     mm²
Register File                           0.3    M transistors   5.1    mm²
Total Device Die Size (pad excluded)                           45     mm²
Ordering Guide
Table 10. Ordering Information
Part Number   Temperature Range   Working Frequency   Operating Supplies         Package
AT572D740     0°C - 70°C          100 MHz             3.3V (I/O) & 1.8V (core)   352PBGA
Atmel Corporation
2325 Orchard Parkway
San Jose, CA 95131, USA
Tel: 1(408) 441-0311
Fax: 1(408) 487-2600
Regional Headquarters
Europe
Atmel Sarl
Route des Arsenaux 41
Case Postale 80
CH-1705 Fribourg
Switzerland
Tel: (41) 26-426-5555
Fax: (41) 26-426-5500
Asia
Room 1219
Chinachem Golden Plaza
77 Mody Road Tsimshatsui
East Kowloon
Hong Kong
Tel: (852) 2721-9778
Fax: (852) 2722-1369
Japan
9F, Tonetsu Shinkawa Bldg.
1-24-8 Shinkawa
Chuo-ku, Tokyo 104-0033
Japan
Tel: (81) 3-3523-3551
Fax: (81) 3-3523-7581
Atmel Operations
Memory
2325 Orchard Parkway
San Jose, CA 95131, USA
Tel: 1(408) 441-0311
Fax: 1(408) 436-4314
Microcontrollers
2325 Orchard Parkway
San Jose, CA 95131, USA
Tel: 1(408) 441-0311
Fax: 1(408) 436-4314
La Chantrerie
BP 70602
44306 Nantes Cedex 3, France
Tel: (33) 2-40-18-18-18
Fax: (33) 2-40-18-19-60
ASIC/ASSP/Smart Cards
RF/Automotive
Theresienstrasse 2
Postfach 3535
74025 Heilbronn, Germany
Tel: (49) 71-31-67-0
Fax: (49) 71-31-67-2340
1150 East Cheyenne Mtn. Blvd.
Colorado Springs, CO 80906, USA
Tel: 1(719) 576-3300
Fax: 1(719) 540-1759
Biometrics/Imaging/Hi-Rel MPU/
High Speed Converters/RF Datacom
Avenue de Rochepleine
BP 123
38521 Saint-Egreve Cedex, France
Tel: (33) 4-76-58-30-00
Fax: (33) 4-76-58-34-80
Zone Industrielle
13106 Rousset Cedex, France
Tel: (33) 4-42-53-60-00
Fax: (33) 4-42-53-60-01
1150 East Cheyenne Mtn. Blvd.
Colorado Springs, CO 80906, USA
Tel: 1(719) 576-3300
Fax: 1(719) 540-1759
Scottish Enterprise Technology Park
Maxwell Building
East Kilbride G75 0QR, Scotland
Tel: (44) 1355-803-000
Fax: (44) 1355-242-743
Literature Requests
www.atmel.com/literature
Disclaimer: Atmel Corporation makes no warranty for the use of its products, other than those expressly contained in the Company’s standard
warranty which is detailed in Atmel’s Terms and Conditions located on the Company’s web site. The Company assumes no responsibility for any
errors which may appear in this document, reserves the right to change devices or specifications detailed herein at any time without notice, and
does not make any commitment to update the information contained herein. No licenses to patents or other intellectual property of Atmel are
granted by the Company in connection with the sale of Atmel products, expressly or by implication. Atmel’s products are not authorized for use
as critical components in life support devices or systems.
© Atmel Corporation 2003. All rights reserved. Atmel® and combinations thereof are the registered trademarks
of Atmel Corporation or its subsidiaries. Other terms and product names may be the trademarks of others.
Printed on recycled paper.
7001AS–DPS–03/04