
KeyStone C66x Multicore SoC
Overview
MMI Applications Team
October 2011
KeyStone Overview
• KeyStone Architecture
• CorePac & Memory Subsystem
• Interfaces and Peripherals
• Coprocessors and Accelerators
• Debug
Performance Improvement
Enhanced DSP Core

C66x ISA
• 100% upward object-code compatible
• 4x performance improvement for multiply operations: 32 16-bit MACs
• Improved support for complex arithmetic and matrix computation

C674x
• 100% upward object-code compatible with C64x, C64x+, C67x, and C67x+
• Best of the fixed-point and floating-point architectures for better system performance and faster time-to-market

Floating-point value
• C67x: advanced VLIW architecture; native instructions for IEEE 754, SP & DP
• C67x+: 2x registers; enhanced floating-point add capabilities

Fixed-point value
• C64x: advanced fixed-point instructions; four 16-bit or eight 8-bit MACs; two-level cache
• C64x+: SPLOOP and 16-bit instructions for smaller code size; flexible level-one memory architecture; iDMA for rapid data transfers between local memories
KeyStone Device Features
Memory Subsystem
– Up to 1 MB Local L2 memory per core
– Up to 4 MB Multicore Shared Memory (MSM)
– Multicore Shared Memory Controller (MSMC)
– Boot ROM, DDR3-1600 MHz (64-bit)
Application-Specific Coprocessors
– 2x TCP3d: Turbo Decoder
– TCP3e: Turbo Encoder
– 2x FFT (FFT/IFFT and DFT/IDFT) Coprocessor
– 4x VCP2 for voice channel decoding
Multicore Navigator
– Queue Manager
– Packet DMA
[Block diagram: C66x CorePac (1 to 8 cores @ up to 1.25 GHz, each with L1 P-Cache, L1 D-Cache, and L2 Cache), Memory Subsystem (MSM SRAM, MSMC, 64-bit DDR3 EMIF), Multicore Navigator, TeraNet, HyperLink, EDMA, Network Coprocessor, peripherals, Debug & Trace, Boot ROM, Semaphore, Power Management, and PLL.]
Network Coprocessor
– Packet Accelerator
– Security Accelerator
Embedded Trace Buffer (ETB) & System Trace Buffer (STB)
Smart Reflex Enabled
40 nm High-Performance Process
Interfaces
– High-speed Hyperlink bus
– 4x Serial RapidIO Rev 2.1
– 2x 10/100/1000 Ethernet SGMII ports w/ embedded switch
– 2x PCIe Generation II
– Six-lane Antenna Interface (AIF2) for Wireless Applications
o WCDMA, WiMAX, LTE, GSM, TD-SCDMA, TD-LTE
o Up to 6.144-Gbps
– Additional Serials: I2C, SPI, GPIO, UART
C66x CorePac
– 1 to 8 C66x Fixed/Floating-Point CorePac DSP Cores at up
to 1.25 GHz
– Backward-compatible with C64x+ and C67x+ cores
– Fixed and Floating Point Operations
– RSA instruction set extensions
– Chip-rate processing (downlink & uplink)
– Reed-Muller decoding (CorePac 1 and 2 only)
CorePac & Memory Subsystem
• 1 to 8 C66x CorePac DSP cores operating at up to 1.25 GHz
  – Fixed- and floating-point operations
  – Code compatible with other C64x+ and C67x+ devices
• L1 memory can be partitioned as cache or SRAM
  – 32KB L1P per core
  – 32KB L1D per core
  – Error detection for L1P
  – Memory protection
• Dedicated and shared L2 memory
  – 512 KB to 1 MB local L2 per core
  – 2 to 4 MB Multicore Shared Memory (MSM)
  – Multicore Shared Memory Controller (MSMC)
  – Error detection and correction for all L2 memory
  – MSM available to all cores; can hold either program or data
• Boot ROM
Memory Expansion
• Multicore Shared Memory Controller (MSMC)
  – Arbitrates CorePac and SoC master access to shared memory
  – Provides a direct connection to the DDR3 EMIF
  – Provides CorePac access to coprocessors and I/O peripherals
  – Memory protection and address extension to 64 GB (36 bits)
  – Provides multi-stream pre-fetching capability
• DDR3 External Memory Interface (EMIF)
  – Support for 1x 16-bit, 1x 32-bit, and 1x 64-bit modes
  – Supports up to 1600 MHz
  – Supports power-down of unused pins when using 16-bit or 32-bit width
  – Support for 8 GB memory address space
  – Error detection and correction
• EMIF-16 (Media Applications Only)
  – Three modes: NAND flash, NOR flash, and asynchronous SRAM
  – Can be used to connect asynchronous memory (e.g., NAND flash) up to 256 MB.
Multicore Navigator
Queue Manager and Packet DMA
• Low-overhead processing and routing of packet traffic
• Simplified resource management
• Effective inter-processor communication
• Abstracts physical implementation from application host software
• Virtualization to enable dynamic load balancing and provide seamless access to resources on different cores
• 8K hardware queues and 16K descriptors
  – More descriptors can reside in any shared memory
• 10 Gbps pre-fetching capability
Network Coprocessor
• Packet Accelerator (PA)
  – Support for single or multiple IP addresses
  – 1 Gbps wire-speed throughput at 1.5 Mpps
  – UDP checksum processing
  – IPSec ESP and AH tunnels with fully offloaded fast path
  – L2 support: Ethernet, Ethertype, and VLAN
  – L3/L4 support: IPv4/IPv6 and UDP port-based raw Ethernet, or IPv4/IPv6 and SCTP port-based routing
  – Multicast to multiple queues
  – QoS capability: per-channel/flow routing to individual queues toward the DSP cores, and TX traffic shaping per device
• Security Accelerator (SA)
  – Support for IPSec, SRTP, 3GPP and WiMAX air interface, and SSL/TLS security
  – Support for simultaneous wire-speed security processing on 1 Gbps Ethernet transmit and receive traffic
  – Encryption/authentication modes: ECB, CBC, CTR, F8, A5/3, CCM, GCM, HMAC, CMAC, and GMAC
  – Cryptographic algorithms: AES, DES, 3DES, Kasumi, SNOW 3G, SHA-1, SHA-2, and MD5
External Interfaces
• SGMII allows two 10/100/1000 Ethernet interfaces
• Four high-bandwidth Serial RapidIO (SRIO) lanes for inter-DSP applications
• SPI for boot operations
• UART for development/testing
• PCIe with two lanes at 5 Gbps
• I2C for EPROM at 400 kbps
• Application-specific interfaces:
  – Antenna Interface 2 (AIF2) for wireless applications
  – Telecommunications Serial Port (TSIP) x2 for media applications
TeraNet Switch Fabric
• TeraNet acts as the on-chip traffic controller:
  – Channel Controller
  – Transfer Controller
• TeraNet provides a hardware-managed way to handle traffic queues and ensure that priority jobs get accomplished while minimizing the involvement of the DSP cores.
• TeraNet facilitates high-bandwidth communication between CorePac cores, subsystems, peripherals, and memory.
Diagnostic Enhancements
• Embedded Trace Buffers (ETB) enhance the diagnostic capabilities of the CorePac.
• The CP Monitor enables diagnostic capabilities on data traffic through the TeraNet switch fabric:
  – Automatic, non-intrusive statistics collection and exporting
  – Monitoring of individual events for better debugging
  – Monitoring of transactions to both memory endpoints and MMRs (memory-mapped registers)
  – Configurable monitor filtering based on address and transaction type
HyperLink Bus
• Provides the capability to expand the C66x device to include hardware acceleration or other auxiliary processors
• Four lanes with up to 12.5 Gbps per lane
Miscellaneous Elements
• Semaphore2 provides atomic access to shared chip-level resources.
• Boot ROM
• Power Management
• Eight 64-bit timers
• Three on-chip PLLs:
  – PLL1 for the CorePacs
  – PLL2 for DDR3
  – PLL3 for packet acceleration
• Three EDMA channel controllers
Device-Specific: Wireless Applications
KeyStone Device Architecture for Wireless Applications
[Block diagram: 4 C66x CorePac cores @ 1.0 GHz / 1.2 GHz, each with 32KB L1 P-Cache, 32KB L1 D-Cache, and 1024KB L2; 2MB MSM SRAM; 64-bit DDR3 EMIF; wireless coprocessors: 2x RSA, 4x VCP2, 2x TCP3d, TCP3e, 2x FFTC, BCP; 6x AIF2; Multicore Navigator, TeraNet, HyperLink, EDMA, Network Coprocessor, SRIO, SGMII, PCIe, SPI, UART, I2C.]
• Wireless-specific coprocessors: FFTC, TCP3 decoder/encoder, VCP2, BCP, 2x Rake Search Accelerator (RSA)
• Wireless-specific interfaces: AIF2 x6
• Characteristics:
  – Package size: 24x24 mm
  – Process node: 40 nm
  – Pin count: 841
  – Core voltage: 0.9-1.1 V
Device-Specific: Media Applications
KeyStone Device Architecture for Media Applications
[Block diagram: 1 to 8 C66x CorePac cores @ up to 1.25 GHz, each with 32KB L1 P-Cache, 32KB L1 D-Cache, and 512KB L2; 4MB MSM SRAM; 64-bit DDR3 EMIF; media-specific interfaces: 2x TSIP, EMIF16, GPIO; Multicore Navigator, TeraNet, HyperLink, EDMA, Network Coprocessor, SRIO, SGMII, PCIe, SPI, UART, I2C.]
• Media-specific interfaces: TSIP x2, EMIF16 (EMIF-A)
• Characteristics:
  – Package size: 24x24 mm
  – Process node: 40 nm
  – Pin count: 841
  – Core voltage: 0.9-1.1 V
KeyStone Overview
• KeyStone Architecture
• CorePac & Memory Subsystem
• Interfaces and Peripherals
• Coprocessors and Accelerators
• Debug
MSMC Block Diagram
[MSMC block diagram: CorePacs 0-3 connect through their XMC and MPAX (Memory Protection and Extension Unit) to dedicated 256-bit CorePac slave ports; TeraNet masters enter through the system slave port for shared SRAM (SMS) and the system slave port for external memory (SES), each with its own MPAX; the MSMC datapath arbitration connects these ports to the 2048 KB shared RAM (with EDC) and to the 64-bit DDR3 EMIF via the MSMC EMIF master port; an MSMC system master port connects back to TeraNet, and MSMC core events are reported to the system.]
C66 TeraNet Data Connections
[TeraNet connection diagram: the CPUCLK/2 256-bit TeraNet switch interconnects the CorePac slave and master ports, shared L2/MSMC, DDR3, HyperLink, EDMA_0 (TPCC with TC0/TC1, 16-channel QDMA), the Network Coprocessor, TAC/RAC, the FFTC and AIF packet DMAs, SRIO, PCIe, TCP3e/TCP3d, VCP2 (x4), QM_SS, and DebugSS; the CPUCLK/3 128-bit TeraNet switch interconnects EDMA_1/EDMA_2 (TPCCs with TC2-TC9, 64-channel QDMA), MSMC/XMC, DDR3, SRIO, QMSS, and PCIe.]
• The C6616 TeraNet facilitates high-bandwidth communication links between DSP cores, subsystems, peripherals, and memories.
• TeraNet supports parallel, orthogonal communication links.
• To evaluate the potential throughput of a communication link, consider the peripheral bit-width and the speed of TeraNet.
• Note that while most communication links are possible, some are not, or are supported only by particular Transfer Controllers. Details are provided in the C6616 Data Manual.
TeraNet Switch Fabric
[TeraNet switch fabric diagram: CPU/2 256-bit, CPU/3 128-bit, CPU/3 32-bit, and CPU/6 32-bit TeraNet SCRs, linked by bridges, interconnect the CorePacs, the MSMC subsystem (M3_DDR, M3_SL2), the EDMA_0/1/2 transfer controllers, HyperLink, SRIO, PCIe, PA/SA, QMSS, AIF, FFTC, TCP3d/TCP3e, VCP2 (x4), TSIP, EMIF16, the DebugSS (STM, TETB, DAP), CP Tracers (x5, x8, x7), MPUs, and the slow peripherals (SPI, UART, I2C, GPIO, timers, semaphore, INTC, boot configuration, PLL and security control, Boot ROM). Wireless-only and media-only endpoints are marked in the legend, and a global timestamp is distributed across the fabric.]
Multicore Navigator Overview
• Multicore Navigator
  – Purpose: seamless "fire and forget" communication between cores, IP blocks, and peripherals (a conceptual sketch follows below)
  – Supports synchronization between cores, moving data between cores, and moving data to and from peripherals
  – Consists of a Queue Manager and multiple dedicated Packet DMA engines
  – Data transfer architecture designed to minimize host interaction while maximizing memory and bus efficiency
  – Moves descriptors and buffers (or pointers to them) between different parts of the chip
• Navigator hardware:
  – Queue Manager Subsystem (QMSS)
  – Multiple Packet DMA (PKTDMA) instances
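The "fire and forget" model can be pictured as the sender pushing a descriptor pointer into a hardware queue with a single store and moving on, while the receiver pops it later. The following is a minimal conceptual C sketch, assuming a hypothetical queue-register base address (QUEUE_MGMT_BASE), register stride, and simplified descriptor header; the real queue-management register map and descriptor formats are defined in the Multicore Navigator documentation.

#include <stdint.h>

/* Hypothetical queue-management region: each hardware queue is assumed to
 * expose a push/pop register at a fixed stride (illustrative only). */
#define QUEUE_MGMT_BASE   0x34020000u   /* assumed address */
#define QUEUE_REG_STRIDE  0x10u

typedef struct {
    uint32_t desc_info;    /* type, length, ... (simplified)         */
    uint32_t packet_info;  /* flags, return queue, ... (simplified)  */
    uint32_t buffer_len;   /* payload length in bytes                */
    uint32_t buffer_ptr;   /* physical address of the payload buffer */
} nav_descriptor_t;        /* illustrative host-descriptor header    */

static inline volatile uint32_t *queue_reg(uint32_t queue_num)
{
    return (volatile uint32_t *)(QUEUE_MGMT_BASE + queue_num * QUEUE_REG_STRIDE);
}

/* Sender: "fire and forget" -- one store posts the work, then the core moves on. */
static void send_packet(uint32_t tx_queue, nav_descriptor_t *desc)
{
    *queue_reg(tx_queue) = (uint32_t)(uintptr_t)desc;
}

/* Receiver (core or accelerator driver): pop a descriptor when one is ready. */
static nav_descriptor_t *receive_packet(uint32_t rx_queue)
{
    return (nav_descriptor_t *)(uintptr_t)*queue_reg(rx_queue); /* NULL when empty */
}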
Navigator Architecture
[Navigator architecture diagram: the QMSS contains the Queue Manager, configuration RAM and register interfaces, internal Link RAM, two APDSPs (accumulator and monitor), an interrupt distributor, timers, and an internal (infrastructure) PKTDMA. Each PKTDMA contains Rx/Tx channel control and FIFOs, Rx/Tx streaming interfaces, an Rx coherency unit, a Tx DMA scheduler, and a register interface; an external Tx scheduling interface exists for AIF2 only. Descriptor RAMs, buffer memory, the second Link RAM, and accumulation memory reside in L2 or DDR; queue-pend signals and queue interrupts connect the Queue Manager, the PKTDMAs, and host application software over VBUS.]
Queue Manager Subsystem (QMSS)
• Features:
– 8192 total hardware queues
– Up to 20 Memory regions for descriptor storage (LL2, MSMC, DDR)
– Up to 2 Linking RAMs for queue linking/management
• Up to 16K descriptors can be handled by internal Link RAM.
• Second Link RAM can be placed in L2 or DDR.
– Up to 512K descriptors supported in total.
– Can copy descriptor pointers of transferred data to destination core’s local
memory to reduce access latency
• Major hardware components:
– Queue Manager
– PKTDMA (Infrastructure DMA)
– 2 PDSPs (Packed Data Structure Processors) for:
• Descriptor Accumulation / Queue Monitoring
• Load Balancing and Traffic Shaping
– Interrupt Distributor (INTD) module
Packet DMA Topology
[Packet DMA topology diagram: PKTDMA instances in the QMSS (infrastructure), SRIO, AIF, FFTC (A and B), and Network Coprocessor all connect to the 8192 queues of the Queue Manager.]
• Multiple Packet DMA instances in KeyStone devices:
— QMSS, PA and SRIO instances for all KeyStone devices.
— AIF2 and FFTC (A and B) instances are only in KeyStone devices for wireless applications.
• Transfer engine interface between peripherals/accelerators and QMSS
• Autonomously determines memory buffers to fill and queues to post based on initial setup and buffer descriptor
Queues/Descriptors/Packets
[Diagram: a hardware queue holds pointers to descriptors. A host packet descriptor carries a length and a pointer to its data buffer and can link to host buffer descriptors for additional buffers; a monolithic descriptor carries its length and data inline. A simplified C sketch follows.]
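To make the two descriptor styles concrete, the sketch below contrasts a linked host descriptor with a monolithic descriptor in plain C. The field names and layout are simplified placeholders, not the exact descriptor formats documented for the Multicore Navigator.

#include <stdint.h>

/* Simplified host descriptor: the payload lives in a separate buffer, and
 * additional buffers chain through next_desc (illustrative layout only). */
typedef struct host_desc {
    uint32_t          packet_info;  /* type, flags, return queue (simplified) */
    uint32_t          packet_len;   /* total packet length in bytes           */
    uint32_t          buffer_len;   /* length of the buffer pointed to below  */
    uint8_t          *buffer_ptr;   /* pointer to the data buffer             */
    struct host_desc *next_desc;    /* next host buffer descriptor, or NULL   */
} host_desc_t;

/* Simplified monolithic descriptor: header and payload share one allocation,
 * so a single contiguous read brings in both control information and data. */
typedef struct {
    uint32_t packet_info;
    uint32_t packet_len;
    uint8_t  data[];                /* payload immediately follows the header */
} mono_desc_t;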
XMC – External Memory Controller
The XMC is responsible for:
1. Address extension/translation
2. Memory protection for addresses outside the C66x CorePac
3. Shared memory access path
4. Cache and pre-fetch support

User control of the XMC:
1. MPAX registers – Memory Protection and Extension registers
2. MAR registers – Memory Attributes registers

Each core has its own set of MPAX and MAR registers.
The MPAX Registers
• Translate between logical and physical addresses
• 16 registers (64 bits each) control up to 16 memory segments.
• Each register translates logical memory into physical memory for its segment.
• Segment definition in the MPAX registers (see the sketch below):
  – Segment size: 5 bits, power of 2; smallest segment size 4 KB, largest 4 GB
  – Logical base address
  – Physical (replacement) base address
  – Permissions: access types allowed in this address range
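A minimal sketch of how an MPAX segment might be programmed is shown below. It follows the field scheme just described (logical base and 5-bit segment size in the high word; replacement address and permission bits in the low word), but the register base address, bit positions, size encoding, and permission values are assumptions to be checked against the CorePac user guide.

#include <stdint.h>

/* Assumed XMC configuration base and MPAXL/MPAXH register pair layout. */
#define XMC_BASE  0x08000000u
#define MPAXL(n)  (*(volatile uint32_t *)(XMC_BASE + 8u * (n)))
#define MPAXH(n)  (*(volatile uint32_t *)(XMC_BASE + 8u * (n) + 4u))

/* Map a logical window onto a 36-bit physical address using segment 'n'.
 * seg_size_pow2: segment size as a power of two (e.g., 20 for 1 MB);
 * perm: permission bits allowing the desired access types.           */
static void mpax_map(unsigned n, uint32_t logical_base,
                     uint64_t physical_base, unsigned seg_size_pow2,
                     uint32_t perm)
{
    /* High word: logical base address plus 5-bit segment size code (assumed encoding). */
    uint32_t segsz_code = (uint32_t)(seg_size_pow2 - 1u) & 0x1Fu;
    MPAXH(n) = (logical_base & 0xFFFFF000u) | segsz_code;

    /* Low word: replacement (physical) address bits 35:12 plus permission bits. */
    MPAXL(n) = (uint32_t)((physical_base >> 12) << 8) | (perm & 0xFFu);
}

/* Example (hypothetical): map 16 MB at logical 0x80000000 onto physical 0x8_2000_0000. */
/* mpax_map(3, 0x80000000u, 0x820000000ull, 24, 0x3Fu); */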
The MAR Registers
• MAR = Memory Attributes Registers
• 256 registers (32 bits each) control 256 memory segments.
  – Each segment is 4 MB, covering the logical address space from 0x00000000 to 0xFFFFFFFF.
  – The first 16 registers are read-only; they control the internal memory of the core.
• Each register controls the cacheability of its segment (bit 0) and its prefetchability (bit 3); all other bits are reserved and set to 0 (see the sketch below).
• All MAR bits are set to zero after reset.
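For example, making an external memory region cacheable and prefetchable from one core could look roughly like the following. The bit positions (bit 0 for cacheability, bit 3 for prefetchability) follow the description above, but the MAR register base address is an assumption to be confirmed in the CorePac user guide.

#include <stdint.h>

/* Assumed MAR register file base (one 32-bit register per 4 MB segment). */
#define MAR_BASE  0x01848000u
#define MAR(n)    (*(volatile uint32_t *)(MAR_BASE + 4u * (n)))

#define MAR_PC    (1u << 0)   /* bit 0: segment is cacheable    */
#define MAR_PFX   (1u << 3)   /* bit 3: segment is prefetchable */

/* Enable caching and prefetching for every 4 MB segment in [start, end). */
static void enable_caching(uint32_t start_addr, uint32_t end_addr)
{
    for (uint32_t seg = start_addr >> 22; seg < (end_addr >> 22); seg++) {
        /* Segments 0-15 are read-only (internal memory); skip them. */
        if (seg >= 16u)
            MAR(seg) = MAR_PC | MAR_PFX;
    }
}

/* Example (hypothetical): make the DDR3 window 0x80000000..0x90000000 cacheable. */
/* enable_caching(0x80000000u, 0x90000000u); */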
KeyStone Overview
• KeyStone Architecture
• CorePac & Memory Subsystem
• Interfaces and Peripherals
• Coprocessors and Accelerators
• Debug
EDMA
Three EDMA Channel Controllers
• One controller in the CPU/2 domain:
  – 2 transfer controllers/queues with 1KB channel buffer
  – 8 QDMA channels
  – 16 interrupt channels
  – 128 PaRAM entries
• Two controllers in the CPU/3 domain, each with:
  – 4 transfer controllers/queues with 1KB or 512B channel buffer
  – 8 QDMA channels
  – 64 interrupt channels
  – 512 PaRAM entries
• Flexible transfer definition:
  – Linking mechanism allows automatic PaRAM set update (see the sketch below)
  – Chaining allows multiple transfers to execute with one event
• Interrupt generation:
  – Transfer completion
  – Error conditions
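The PaRAM-based transfer definition can be illustrated with the standard EDMA3 parameter-entry layout below. The structure mirrors the documented eight-word PaRAM set; the linking helper is only an assumption-laden sketch (the PaRAM base address must come from the device data manual), not a complete driver.

#include <stdint.h>

/* EDMA3 PaRAM set layout (eight 32-bit words per transfer definition). */
typedef struct {
    uint32_t opt;            /* transfer options (TCC, interrupt/chain enables) */
    uint32_t src;            /* source address                                  */
    uint32_t a_b_cnt;        /* ACNT (bits 15:0) and BCNT (bits 31:16)          */
    uint32_t dst;            /* destination address                             */
    uint32_t src_dst_bidx;   /* source/destination B-dimension indexes          */
    uint32_t link_bcntrld;   /* link address (bits 15:0) and BCNT reload        */
    uint32_t src_dst_cidx;   /* source/destination C-dimension indexes          */
    uint32_t ccnt;           /* C-dimension count                               */
} edma_param_t;

/* Link PaRAM set 'first' to set 'next': when 'first' completes, the EDMA
 * reloads the channel's PaRAM set from 'next' automatically (linking).
 * 'param_base' is the channel controller's PaRAM array; its address is
 * device-specific and taken from the data manual.                       */
static void edma_link(volatile edma_param_t *param_base,
                      unsigned first, unsigned next)
{
    uint16_t link_offset = (uint16_t)(next * sizeof(edma_param_t));
    param_base[first].link_bcntrld =
        (param_base[first].link_bcntrld & 0xFFFF0000u) | link_offset;
}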
Interfaces Overview
Common Interfaces
• One PCI Express (PCIe) Gen II port
  – Two lanes running at 5 GBaud
  – Support for root complex (host) mode and end point mode
  – Single Virtual Channel (VC) and up to eight Traffic Classes (TC)
  – Hot plug
• Universal Asynchronous Receiver/Transmitter (UART)
– 2.4, 4.8, 9.6, 19.2, 38.4, 56, and 128 K baud rate
• Serial Peripheral Interface (SPI)
  – Operates at up to 66 MHz
  – Two chip selects
  – Master mode
• Inter IC Control Module (I2C)
– One for connecting EPROM (up to 4Mbit)
– 400 Kbps throughput
– Full 7-bit address field
• General Purpose IO (GPIO) module
– 16-bit operation
– Can be configured as interrupt pin
– Interrupt can select either rising edge or falling edge
• Serial RapidIO (SRIO)
– RapidIO 2.1 compliant
– Four lanes @ 5 Gbps
• 1.25/2.5/3.125/5 Gbps operation per lane
• Configurable as four 1x, two 2x, or one 4x
– Direct I/O and message passing (VBUSM slave)
– Packet forwarding
– Improved support for dual-ring daisy-chain
– Reset isolation
– Upgrades for inter-operation with packet accelerator
• Two SGMII ports with embedded switch
  – Supports IEEE 1588 timing over Ethernet
  – Supports 1G/100 Mbps full duplex
  – Supports 10/100 Mbps half duplex
  – Inter-working with RapidIO messages
  – Integrated with the packet accelerator for efficient IPv6 support
  – Supports jumbo packets (9 KB)
  – Three-port embedded Ethernet switch with packet forwarding
  – Reset isolation for the SGMII ports and embedded Ethernet switch
• HyperLink bus
– Hardware hooks for analog device or customer ASIC
Application-Specific Interfaces
For Wireless Applications
• Antenna Interface 2 (AIF2)
– Multiple-standard support (WCDMA, LTE, WiMAX, GSM/Edge)
– Generic packet interface (~12Gbits/sec ingress & egress)
– Frame Sync module (adapted for WiMAX, LTE & GSM
slots/frames/symbols boundaries)
– Reset Isolation
For Media Gateway Applications
• Telecommunications Serial Port (TSIP)
– Two TSIP ports for interfacing TDM applications
– Supports 2/4/8 lanes at 32.768/16.384/8.192 Mbps per lane & up
to 1024 DS0s
Ethernet Switch: Overview
• 3-Port Ethernet Switch
  – Port 0: CPPI port
  – Port 1: SGMII 0 port
  – Port 2: SGMII 1 port
• Ethernet Switch Modules
  – 2 EMAC modules
  – Address Lookup Engine (ALE) module
  – 2 Statistics modules
  – CPTS (Common Platform Time Sync) module
• The PA will be discussed later
Serial RapidIO (SRIO)
• SRIO (RapidIO) provides a 3-layered architecture:
  – Physical: defines electrical characteristics and link flow control (CRC)
  – Transport: defines the addressing scheme (8b/16b device IDs)
  – Logical: defines packet format and operational protocol
• 2 basic modes of logical-layer operation:
  – DirectIO
    • Transmit device needs knowledge of the memory map of the receiving device
    • Includes NREAD, NWRITE_R, NWRITE, SWRITE
    • Functional units: LSU, MAU, AMU
  – Message Passing
    • Transmit device does not need knowledge of the memory map of the receiving device
    • Includes Type 11 messages and Type 9 packets
    • Functional units: TXU, RXU
• Gen 2 implementation, supporting up to 5 Gbps
PCIe Interface
• KeyStone incorporates a single PCIe interface with the following characteristics:
– Two SERDES lanes running at 5 GBaud/2.5GBaud
– Gen2 compliant
– Three different operational modes (default defined by pin inputs at power up;
can be overwritten by software):
• Root Complex (RC)
• End Point (EP)
• Legacy End Point
– Single Virtual Channel (VC)
– Single Traffic Class (TC)
– Maximum Payloads
• Egress – 128 bytes
• Ingress – 256 bytes
– Configurable BAR filtering, IO filtering, and configuration filtering
HyperLink Bus
[Diagram: HyperLink connects the TeraNet switch fabric of Device #1 to the TeraNet switch fabric of Device #2.]
• Provides a high-speed interface between devices through the TeraNet switch fabric
• A single 4x bus operating at up to 12.5 Gbps per lane
• Connections are point-to-point.
AIF 2.0
• AIF2 is a peripheral module that supports data transfers between uplink and downlink baseband processors through a high-speed serial interface. AIF2 directly supports the following standards:
  – WCDMA/FDD
  – LTE FDD
  – LTE TDD
  – WiMAX
  – TD-SCDMA
  – GSM/EDGE (OBSAI only)
• Autonomous DMA
  – PKTDMA or AIF VBUS master
  – More efficient data transfer for OFDM standards
  – FIFO-based buffer provides flexible support for various sampling frequencies
Other Peripherals & System Elements (1/3)
• TSIP
– Supports 1024 DS0s per TSIP
– Supports 2/4/8 lanes at 32.768/16.384/8.192 Mbps per lane
• UART Interface – Operates at up to 128,000
baud
• I2C Interface
– Supports 400Kbps throughput
– Supports full 7-bit address field
– Supports EEPROM size of 4 Mbit
• SPI Interface
– Operates at up to 66 MHz
– Supports two chip selects
– Supports master mode
• GPIO Interface
– 16 GPIO pins
– Can be configured as interrupt pins
– Interrupt can select either rising edge or falling edge
Other Peripherals & System Elements (2/3)
• EMIF16
– Used for booting, logging, announcements, etc.
– Supports NAND flash memory, up to 256MB
– Supports NOR flash, up to 16MB
– Supports asynchronous SRAM mode, up to 1MB
• 64-Bit Timers
– Total of 16 64-bit timers
• One 64-bit timer per core is dedicated to serve as a watchdog (or may be used as a general
purpose timer)
• Eight 64-bit timers are shared for general purpose timers
– Each 64-bit timer can be configured as two individual 32-bit timers
– Timer Input/Output pins
• Two timer input pins
• Two timer output pins
• Timer input pins can be used as GPI
• Timer output pins can be used as GPO
• On-Chip PLLs
– Core
– Packet & Security CoProcessors
– DDR
Other Peripherals & System Elements (3/3)
• Hardware Semaphores
• Power Management
• Support to assert NMI input for each core –
separate hardware pins for NMI and core
selector
• Support for local reset for each core –
separate hardware pins for local reset and
core selector
KeyStone Overview
• KeyStone Architecture
• CorePac & Memory Subsystem
• Interfaces and Peripherals
• Coprocessors and Accelerators
• Debug
Network and Security
Coprocessor Overview
• Packet Accelerator (PA)
  – Parses protocol headers from received packets and adds protocol headers to transmitted packets
  – Routing for standard protocols and limited user-defined protocols
• Security Accelerator (SA)
  – Encrypts and decrypts packets
Network Coprocessor (Logical)
[Logical diagram: the Ethernet RX MAC and SRIO message RX feed the Packet Accelerator (lookup engine with 16 IPSec, 32 IP, and 16 Ethernet entries; Classify Pass 1 and Pass 2; Modify stages) and the Security Accelerator ingress and egress paths (cp_ace). RX and TX PKTDMAs move packets between the accelerators, the QMSS FIFO queues, and the CorePac DSPs; transmit traffic exits through the Ethernet TX MAC or SRIO message TX.]
Session Identification
• Hardware lookup identifies the session.
• First-pass lookup:
  – IPv4, IPv6, or Ethernet only
  – 64 entries (16 Ethernet, 32 IP up to IPv6, 16 up to IPSec)
  – IP with ESP or AH as the next protocol, plus SPI
• Second-pass lookup:
  – 8192 entries
  – UDP, SCTP, etc., or a proprietary identifier of up to 32 bits within the first 128 bytes of the packet
IP/UDP or Raw Ethernet/Flow ID flow:
1. No IPSec detected; IPv6 address matched.
2. UDP port or proprietary session ID number matched.
3. UDP checksum verified and result set in the descriptor.
[Diagram: the packet enters through the Ethernet RX MAC or SRIO message RX, passes Classify Pass 1 and Pass 2 in the Packet Accelerator, and is delivered by the RX PKTDMA to the destination CorePac queue.]
IPSec Flow (IP/UDP in IP/ESP)
1. IPSec detected; SPI matched against configured security contexts.
2. Authentication, decryption, and replay protection performed.
3. IPv6 address and UDP port or proprietary session ID number matched.
4. UDP checksum verified and result set in the descriptor.
[Diagram: the ingress packet passes Classify Pass 1, the Security Accelerator ingress path (cp_ace), and Classify Pass 2 before the RX PKTDMA delivers it to the destination CorePac queue.]
IPSec Transmit Flow
1. Host SW builds the payload and IP/ESP header.
2. UDP checksum calculated and the result stored in the IP header.
3. Payload is encrypted; the authentication tag is computed and stored in the trailer.
[Diagram: the TX PKTDMA feeds the Packet Accelerator Modify stage and the Security Accelerator egress path before transmission through the Ethernet TX MAC or SRIO message TX.]
What is FFTC?
• The FFTC is an accelerator that can be used to
perform FFT and Inverse FFT (IFFT) on data.
• The FFTC has been designed to be compatible
with various OFDM-based wireless standards like
WiMAX and LTE.
• The Packet DMA (PKTDMA) is used to move data
in and out of the FFTC module.
• The FFTC supports four input (Tx) queues that are
serviced in a round-robin fashion.
• Using the FFTC to perform computations that
otherwise would have been done in software
frees up CPU cycles for other tasks.
FFTC Features
• Provides algorithms for both FFT and IFFT
• Multiple block sizes:
– Maximum 8192
– All LTE DFT (Long Term Evolution Discrete Fourier Transform) sizes
• LTE 7.5 kHz frequency shift
• 16 bits I/ 16 bits Q input and output – block floating point output
• Dynamic and programmable scaling modes
– Dynamic scaling mode returns block exponent
• Support for left-right FFT shift (switch the left/right halves)
• Support for variable FFT shift
– For OFDM (Orthogonal Frequency Division Multiplexing) downlink,
supports data format with DC subcarrier in the middle of the subcarriers
• Support for cyclic prefix
– Addition and removal
– Any length supported
• Three-buffer design allows for back-to-back computations
• 128-bit, CPU/3, full-duplex VBUS connection
• Input data scaling with shift eliminates the need for front-end digital AGC
(Automatic Gain Control)
• Output data scaling
Turbo CoProcessor 3 Decoder (TCP3D)
• TCP3D is a programmable peripheral for decoding of 3GPP (WCDMA, HSUPA, HSPA+, TD-SCDMA), LTE, and WiMAX turbo codes.
• Turbo decoding is a part of bit processing.
[LTE bit processing flow: per transport block, soft bits are de-scrambled; per code block, channel de-interleaving, de-rate matching, and LLR combining produce LLR data (systematic, parity 0, parity 1) for the TCP3D; the decoder's hard decisions yield the decoded bits, which are checked against the transport block CRC.]
TCP3D Key Features (1/2)
• Supports 3GPP Rel-7 and older (WCDMA), LTE, and WiMAX turbo decoding
• Native code rate: 1/3
• Radix-4 binary and duo-binary MAP decoders
• Dual MAP decoders for non-contentious interleavers
• Split decoder mode: TCP3D works as two independent, single-MAP decoders
• Max Star and Max log-MAP algorithms
• Double-buffered input memory for lower latency transfers (except in split mode)
• 128-bit data bus for reduced latency transfers
• Input data bit width: 6 bits
• Programmable hard-decision bit ordering within a 128-bit word: 0-127 or 127-0
• Soft output information for systematic and parity bits: 8 bits
• Extrinsic scaling per MAP for up to eight iterations (both Max and Max Star)
TCP3D Key Features (2/2)
• Block sizes supported: 40 to 8192
• Programmable sliding window sizes: {16, 32, 48, 64, 96, 128}
• Max number of iterations: 1 to 15
• Min number of iterations: 1 to 15
• SNR stopping criterion: 0 to 20 dB threshold
• LTE CRC stopping criterion
• LTE, WCDMA, and WiMAX hardware interleaver generators
• Channel quality indication
• Emulation support
• Low DSP pre-processing load
• Runs in parallel with the CorePac
• Targets the base station environment
Turbo CoProcessor 3 Encoder (TCP3E)
• TCP3E = Turbo CoProcessor 3 Encoder
– No previous versions, but came out at same time as third version of
decoder co-processor (TCP3D)
– Runs in parallel with DSP
• Performs Turbo Encoding for forward error correction of
transmitted information (downlink for basestation)
– Adds redundant data to transmitted message
– Turbo Decoder in handset uses the redundant data to correct errors
– Often avoids retransmission due to a noisy channel
[Diagram: the Turbo Encoder (TCP3E) encodes the downlink; the turbo decoder in the handset decodes it.]
TCP3E Features Supported
• 3GPP, WiMAX and LTE encoding
– 3GPP includes: WCDMA, HSDPA, and TD-SCDMA
• Code rate: 1/3
• Can achieve a throughput of 250 Mbps in all three modes
• On-the-fly interleaver table generation
• Dual encode engines with input and output memories for increased throughput
• Programmable input and output format within a 32-bit word
• Block sizes supported: 40 to 8192
• Tail biting for WiMAX
• CRC encoding for LTE
TCP3E Block Diagram
[Block diagram: ping and pong paths, each with its own config registers, input memory, encode engine, and output memory.]
• Internally, TCP3E has dual (ping/pong) encode engines, config registers, and input and output memories.
• Externally, TCP3E looks like a single set of config registers and input/output buffers.
• Routing to ping/pong is handled internally.
• TCP3E alternates between ping and pong from one code block to the next.
Bit Rate Coprocessor (BCP)
The Bit Rate Coprocessor (BCP) is a programmable peripheral for baseband bit
processing. Integrated into the Texas Instruments DSP, it supports FDD LTE,
TDD LTE, WCDMA, TD-SCDMA, HSPA, HSPA+, WiMAX 802.16-2009
(802.16e), and monitoring/planning for LTE-A.
Primary functionalities of the BCP peripheral include the following:
• CRC
• Turbo / convolutional encoding
• Rate matching (hard and soft) / rate de-matching
• LLR combining
• Modulation (hard and soft)
• Interleaving / de-interleaving
• Scrambling / de-scrambling
• Correlation (final de-spreading for WCDMA RX and PUCCH correlation)
• Soft slicing (soft demodulation)
• 128-bit Navigator interface
• Two 128-bit direct I/O interfaces
• Runs in parallel with the DSP
• Internal debug logging
Viterbi Decoder Coprocessor (VCP2)
KeyStone Overview
• KeyStone Architecture
• CorePac & Memory Subsystem
• Interfaces and Peripherals
• Coprocessors and Accelerators
• Debug
Emulation Features (1/2)
• Host tooling can halt any or all of the cores on the device.
– Each core supports a direct connection to the JTAG interface.
– Emulation has full visibility of the CorePac memory map.
• Real-Time Emulation allows the user to debug application code
while interrupts designated as real-time continue to be serviced.
– Normal code execution runs code in the absence of a debug
event halting execution with the peripheral operating in a
continuous fashion.
– Secondary code execution runs code related to the service of
a real-time interrupt after a debug event has halted code
execution.
– No code execution does not run code because a debug event
halts code execution, and no real-time interrupt is serviced
after code execution is halted.
Emulation Features (2/2)
• Advanced Event Triggering (AET) allows the user to identify events of interest:
– Utilize instruction and data bus comparators, auxiliary event detection,
sequencers/state machines, and event counters
– Manage breakpoints, trace acquisition, data collection via an interrupt, timing
measurement, and generate external triggers
– Control a state machine and the counters used to create the intermediate events
(loop counts and state machines)
– Allow event combining to create simple or complex triggers using modules called trigger builders
• AET logic is provided for monitoring program, memory bus, system event activity,
remembering event sequences, counting event occurrences, or measuring the
interval between events.
– Perform range and identity comparisons
– Detect exact transactions
– Detect touching of a byte or range of bytes by memory references
• External event detectors allow monitoring of external triggers or internal states of
interest (i.e., cache miss).
– Enables four states for the identification of a sequence of triggers
– Allow specific system activity to generate breakpoints, an interrupt used for the
collection of system data, or the identification of program activity that is observed
through trace
• Any system event routed to a C66x core can be routed (through software selection) to
the AET.
Trace Subsystem (Simplified)
[Simplified trace subsystem diagram: each CorePac (0..n) has its own Embedded Trace Buffer (ETB0..ETBn-1), and trace streams can optionally be exported off-chip. One CP_MONITOR per monitored slave endpoint watches the VBUS command signals on the TeraNet switch fabric and generates trace logs through a dedicated SCR to the STM, which feeds an additional ETB for system trace.]
Trace Features
• Trace Pin Support for XDS560T Trace
• On-Chip Embedded Trace Buffers
– 4 KB (Core) /32 KB (STM) on-chip receiver
– One ETB per core for Trace and one for STM
– Snapshot and circular buffer mode
– Simultaneous write (sink) and read (drain) capability
– Can be used in CoreSight ETB mode
• C66x CPU Trace:
– Trace targets the debug of unstable code:
• Provides for the recording of program flow, memory references, cache statistics, and application
specific data with a time stamp, performance analysis, and quality assurance.
• Bus snoopers to collect and export trace data using hardware dedicated to the trace function.
• All or a percentage of the debug port pins can be allocated to trace for any of the cores (or a mix).
– Program flow and timing can be traced at the same rate generated by the CPU.
– Event trace provides a log of user-selectable system events. Event trace can also be used in conjunction
with profiling tools.
– Data references must be restricted, however, because the export mechanism is limited to a number of pins that is insufficient to sustain tracing of all memory references.
• The Advanced Event Triggering facilities provide a means to restrict the trace data exported to data
of interest to maintain the non-intrusive aspect of trace.
• Error indications are embedded in the debug stream in the event the export logic is unable to keep
up with the data rate generated by the collection logic.
• The user can optionally select the export of all specified trace data.
– In this case, the CPU is stalled to avoid the loss of trace data
– The user is notified that trace stalls have occurred although the number of stalls and their
location is not recorded.
For more information on these features, please refer to Debug/Trace User Guide for your selected KeyStone device.
KeyStone CP Tracer Modules
[Diagram: placement of the CP Tracer (CPT) modules across the TeraNet switch fabric — in the MSMC subsystem, four CPTs monitor the shared SRAM banks and one monitors the EMIF_DDR3 (36-bit) path; additional CP Tracer groups (x5, x8, x7; x4 for wireless devices, x8 for media devices) sit on the 128-bit and 32-bit TeraNet SCRs and monitor transactions from the AIF, SRIO, CorePacs, and EDMA transfer controllers. Trace output is routed to the STM and TETBs in the DebugSS; wireless-only and media-only endpoints are marked in the legend, and a global timestamp is distributed across the fabric.]
CP Tracer Module Features (1/2)
• Transaction trace (output to STM)
• Ability to 'see' the transactions for each master to selected slave interfaces through tracing of key
transaction points:
– Arbitration Won (Event B)
– Transaction Complete (Event C, E)
• Two filtering functions for transaction traces to bring out the specific transactions:
– Transaction-qualifier-filtering: read/write
– Address-range-based filtering
• Statistics counters:
– Throughput counts represent the total number of bytes forwarded to the target slave during a
specified time duration.
• Counter accumulates the byte-count presented at the initiation of a new transfer.
• Can be used to calculate the effective throughput in terms of bytes-per-second at a given memory
slave interface.
• Can be used to track the bandwidth consumed by the system masters. (#bytes/time)
– Each CP Tracer provides two independent throughput counters.
• Each can be used to track the total number of bytes forwarded from a group of masters.
• Each system master can be assigned to either, both, or none of the two master groups for throughput collection.
• The CP Tracer also provides address-range-based filtering and transaction-qualifier-based filtering functions to further narrow down the transactions of interest.
– Accumulated Wait time counter
• Provides an indication of how busy the bus is and how many cycles elapsed with at least one bus
master waiting for access to the bus
– Num Grant counter
• Provides an indication of the number of bus grants. The average transaction size can be determined from throughput / num grants (a worked sketch follows below).
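As a small worked example of using these counters, the sketch below derives bytes-per-second and average transaction size from hypothetical counter snapshots; the variable names and the sliding-window length are illustrative, not CP Tracer register names.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Hypothetical values read from a CP Tracer at the end of one
     * sliding time window (names and numbers are illustrative only). */
    uint64_t throughput_bytes = 12u * 1024u * 1024u; /* bytes forwarded to the slave */
    uint64_t num_grants       = 98304u;              /* number of bus grants         */
    double   window_seconds   = 0.010;               /* sliding-window duration      */

    /* Effective bandwidth at the monitored slave interface: #bytes / time. */
    double bandwidth_mbps = (throughput_bytes / window_seconds) / 1.0e6;

    /* Average transaction size: throughput / number of grants. */
    double avg_transaction_bytes = (double)throughput_bytes / (double)num_grants;

    printf("bandwidth: %.1f MB/s, average transaction: %.1f bytes\n",
           bandwidth_mbps, avg_transaction_bytes);
    return 0;
}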
CP Tracer Module Features (2/2)
• Sliding Time Window:
– Specifies the measurement interval for all the CBA statistic counters implemented in the
CP Tracer module.
– When the sliding window timer expires, the counter values are loaded into the respective
registers and the count starts again.
– If enabled, an interrupt is also generated when the sliding time window expires.
– The host CPU and/or EDMA can read the statistics counters upon assertion of the
interrupt.
– If enabled, the counter values can also be exported to STM automatically after the sliding
time window is expired.
• Cross-trigger generation: can assert EMU0/1 when a qualified event occurs
– External trigger to start/stop monitoring.
– The EMU0 trigger line is coupled to trace start. The EMU1 trigger line is coupled to trace
stop.
– Both EMU0 and EMU1 are sourced from any of the CorePac cores.
– It can also be controlled from an external source via the EMU0 and EMU1 pins on the
device.
– The EMU0 trigger enables the EMU01_TraceEnableStatus bit of the Transaction Qualifier
register, the EMU1 trigger disables this bit.
• STM Trace Export Enables
– Status message
– Event message
– Statistics message
For More Information
• For more information, refer to the
C66x Getting Started page to locate the data
manual for your KeyStone device.
• View the complete C66x Multicore SOC Online
Training for KeyStone Devices, including
details on the individual modules.
• For questions regarding topics covered in this
training, visit the support forums at the
TI E2E Community website.