KeyStone Architecture
Agenda
• KeyStone Overview
• KeyStone I Architecture
– CorePac & Memory Subsystem
– Internal Communications and Transport
– External Interfaces
– Coprocessors and Accelerators
– Miscellaneous
• KeyStone Platform
– Debug
– Device-Specific Offerings
TI KeyStone DSP Applications
• Media Gateways & Networking
• High Performance & Cloud Computing
• Imaging
• Video Surveillance
• Mission Critical
• SDR/BTS
• Test and Automation
• Radio Network Controller
• Session Border Controller
• LTE/SAE Gateway
• Multimedia Gateway
• Video & Audio Infrastructure
TI multicore innovation progression
5 generations of multicore
• Lowers development effort
• Speeds time to market
• Leverages TI’s investment
• Optimal software reuse
Roadmap (status legend: Concept, Development, Sampling, Production)
• 2003: 6414/6416, 130nm
– C64x single core DSP
• 2006: 6474/6455, 65nm
– C64x+ fixed point only
• 2011: KeyStone I, 40nm
– C66x fixed AND floating point DSP
– Networking + wireless acceleration
• 2012/13: KeyStone II, 28nm
– 32 bit ARM A15
– 10G networking
• 2014/2015: KeyStone III
– 64 bit ARM v8
– C66x+ DSP
– 40G networking
Unmatched Performance
[Figure: BDTImark2000™ bar charts, "BDTI Score for Floating Point Processors" and "BDTI Score for Fixed Point Processors", comparing TI TMS320C67x, TMS320C64x+, and TMS320C66xx against ADI 2116x/2126x/213xx (SHARC), ADI TS201S and TS202S/203S (TigerSHARC), ADI BF5xx (Blackfin), NEC uPD77050, Freescale MSC81xx (SC140), MSC814x (SC3400), and MSC815x (SC3850), Intel Pentium III, and Renesas SH77xx (SH-4)]
Algorithm | C67x @ 300 MHz | C64x+ @ 1.2 GHz | C66x @ 1.25 GHz | Gain
Single Precision Floating Point FFT, 2048 pt, Radix 4 | 86.84 us | - | 17.90 us | ~600%
Fixed Point FFT, 2048 pt, Radix 4 | - | 8.23 us | 4.46 us | ~200%
FIR Filter, 40 samples, 40 taps | - | 0.69 us | 0.34 us | ~200%
Matrix Multiply 32 x 32 | - | 17.92 us | 6.16 us | ~300%
Matrix Inverse 4 x 4 | - | 0.53 us | 0.13 us | ~400%
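The benchmark times listed above can be turned into raw speedup ratios with a few lines of arithmetic. The sketch below is illustrative only: the pairing of each time with a baseline core (C67x for the floating-point FFT, C64x+ for the remaining rows) is an assumption read from the slide layout, and the slide's rounded "Gain" column may be computed on a different basis than a plain time ratio.

```python
# Execution times in microseconds, as listed on the slide.
# Each entry: (baseline core, baseline time, C66x @ 1.25 GHz time).
# The baseline-core pairing is an assumption based on the slide layout.
BENCHMARKS = {
    "SP floating-point FFT, 2048 pt, radix 4": ("C67x @ 300 MHz", 86.84, 17.90),
    "Fixed-point FFT, 2048 pt, radix 4": ("C64x+ @ 1.2 GHz", 8.23, 4.46),
    "FIR filter, 40 samples, 40 taps": ("C64x+ @ 1.2 GHz", 0.69, 0.34),
    "Matrix multiply 32 x 32": ("C64x+ @ 1.2 GHz", 17.92, 6.16),
    "Matrix inverse 4 x 4": ("C64x+ @ 1.2 GHz", 0.53, 0.13),
}


def speedup(baseline_us: float, c66x_us: float) -> float:
    """Return how many times faster the C66x run is than the baseline run."""
    return baseline_us / c66x_us


if __name__ == "__main__":
    for name, (core, base_us, c66x_us) in BENCHMARKS.items():
        print(f"{name}: {speedup(base_us, c66x_us):.2f}x vs {core}")
```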
KeyStone I CorePac
• 1 to 8 C66x CorePac DSP cores operating at up to 1.25 GHz
– Fixed- and floating-point operations
– Code compatible with other C64x+ and C67x+ devices
• L1 Memory
– Can be partitioned as cache and/or RAM
– 32 KB L1P per core
– 32 KB L1D per core
– Error detection for L1P
– Memory protection
• Dedicated L2 Memory
– Can be partitioned as cache and/or RAM
– 512 KB to 1 MB local L2 per core
– Error detection and correction for all L2 memory
• Direct connection to memory subsystem
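To make the cache/RAM partitioning concrete, here is a minimal sketch of the bookkeeping involved. The lists of allowed cache sizes are an assumption (they reflect the options commonly documented for the C66x CorePac, not anything stated on this slide), and `split_memory` is a hypothetical helper, not a TI API.

```python
# Memory sizes from the slide (per core).
L1_SIZE_KB = 32

# Assumed selectable cache sizes; check the CorePac documentation
# for the exact options on a given device.
L1_CACHE_OPTIONS_KB = (0, 4, 8, 16, 32)
L2_CACHE_OPTIONS_KB = (0, 32, 64, 128, 256, 512, 1024)


def split_memory(total_kb: int, cache_kb: int, allowed: tuple) -> dict:
    """Split a local memory into a cache part and an addressable RAM part."""
    if cache_kb not in allowed:
        raise ValueError(f"{cache_kb} KB is not a selectable cache size")
    if cache_kb > total_kb:
        raise ValueError("cache cannot be larger than the memory itself")
    return {"cache_kb": cache_kb, "ram_kb": total_kb - cache_kb}


if __name__ == "__main__":
    # Example: a 512 KB L2 with 128 KB used as cache leaves 384 KB of RAM.
    print(split_memory(512, 128, L2_CACHE_OPTIONS_KB))
    # Example: L1D used entirely as cache leaves no L1D RAM.
    print(split_memory(L1_SIZE_KB, 32, L1_CACHE_OPTIONS_KB))
```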
KeyStone I Memory Subsystem
• Multicore Shared Memory (MSM SRAM)
– 1 to 4 MB
– Available to all cores
– Can contain program and data
– All devices except C6654
• Multicore Shared Memory Controller (MSMC)
– Arbitrates access of CorePac and SoC masters to shared memory
– Provides a connection to the DDR3 EMIF
– Provides CorePac access to coprocessors and IO peripherals
– Provides error detection and correction for all shared memory
– Memory protection and address extension to 64 GB (36 bits)
– Provides multi-stream pre-fetching capability
• DDR3 External Memory Interface (EMIF)
– Support for 16-bit, 32-bit, and (for C667x devices) 64-bit modes
– Specified at up to 1600 MT/s
– Supports power down of unused pins when using 16-bit or 32-bit width
– Support for 8 GB memory address
– Error detection and correction
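Two of the figures above follow from simple arithmetic: the theoretical peak DDR3 bandwidth at 1600 MT/s for each bus width, and the 64 GB reach of a 36-bit address. The sketch below works both out; it gives upper bounds only and ignores refresh, bus turnaround, and ECC overhead. The function names are illustrative.

```python
def ddr3_peak_bandwidth_gb_s(transfers_mt_s: int, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_width_bits // 8
    return transfers_mt_s * bytes_per_transfer / 1000


def addressable_gib(address_bits: int) -> float:
    """Size of a byte-addressed space in GiB (2^30 bytes)."""
    return 2 ** address_bits / 2 ** 30


if __name__ == "__main__":
    for width in (16, 32, 64):
        peak = ddr3_peak_bandwidth_gb_s(1600, width)
        print(f"{width}-bit DDR3 @ 1600 MT/s: {peak:.1f} GB/s peak")
    print(f"32-bit address: {addressable_gib(32):.0f} GiB")
    print(f"36-bit address: {addressable_gib(36):.0f} GiB")
```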
KeyStone I Multicore Navigator
• Provides seamless, "fire and forget" inter-core communications (messages and data exchanges) between cores, IP blocks, and peripherals
• Low-overhead processing and routing of packet traffic to and from peripherals and cores
• Supports dynamic load optimization
• Data transfer architecture designed to minimize host interaction while maximizing memory and bus efficiency
• Consists of a Queue Manager Subsystem (QMSS) and multiple, dedicated Packet DMA (PKTDMA) engines
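The "fire and forget" idea can be illustrated with a toy software model: a sender pops a free descriptor, fills it, pushes it onto a transmit queue, and moves on without waiting. This is a conceptual sketch only. The queue numbers, class names, and functions are hypothetical and do not correspond to the QMSS register interface or to any TI driver API.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

# Hypothetical queue numbers chosen for this sketch only.
FREE_Q, TX_Q, RX_Q = 0, 1, 2


@dataclass
class Descriptor:
    payload: bytes = b""


class QueueManager:
    """Toy stand-in for the QMSS: a set of numbered FIFO queues."""

    def __init__(self, num_queues: int) -> None:
        self.queues = [deque() for _ in range(num_queues)]

    def push(self, qnum: int, desc: Descriptor) -> None:
        self.queues[qnum].append(desc)

    def pop(self, qnum: int) -> Optional[Descriptor]:
        queue = self.queues[qnum]
        return queue.popleft() if queue else None


def send(qm: QueueManager, data: bytes) -> bool:
    """Sender side: queue the data and return immediately."""
    desc = qm.pop(FREE_Q)
    if desc is None:
        return False  # no free descriptor available
    desc.payload = data
    qm.push(TX_Q, desc)
    return True


def pktdma_step(qm: QueueManager) -> bool:
    """Stand-in for a PKTDMA engine: move one packet from TX to RX."""
    desc = qm.pop(TX_Q)
    if desc is None:
        return False
    qm.push(RX_Q, desc)
    return True


def receive(qm: QueueManager) -> Optional[bytes]:
    """Receiver side: take one packet and recycle its descriptor."""
    desc = qm.pop(RX_Q)
    if desc is None:
        return None
    data = desc.payload
    desc.payload = b""
    qm.push(FREE_Q, desc)
    return data
```

In this model the sender never blocks on the receiver; the only back-pressure is running out of free descriptors.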
Multicore Navigator architecture
[Figure: block diagram of the Queue Manager Subsystem (Queue Manager with queue interfaces Q0, Q1, ... Qx, Descriptor RAM, Link RAM, two APDSPs for queue interrupt accumulation, and an internal Packet DMA) connected through TeraNet to the DSP cores, accumulation memory, buffer memory, and the Packet DMA engines in SRIO, NetCP, FFTC, BCP, and AIF2. Queue events signal the Packet DMA engines; queue interrupts signal the DSP cores.]
KeyStone I Network Coprocessor
• Provides hardware accelerators to perform L2, L3, and L4 processing and encryption that was previously done in software
• Packet Accelerator (PA)
– Single or multiple IP address option
– UDP (and TCP) checksum and selected CRCs
– L2/L3/L4 support
– Quality of Service (QoS)
– Multicast to multiple destinations inside the device
– Timestamps
• Security Accelerator (SA)
– Hardware encryption, decryption, and authentication
– Supports IPsec ESP, IPsec AH, SRTP, and 3GPP protocols
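For a sense of the per-packet work that checksum offload removes from software, here is a minimal sketch of the 16-bit ones' complement Internet checksum (RFC 1071) that IPv4, UDP, and TCP use. It shows the arithmetic only; it is not the Packet Accelerator's programming interface, and a real UDP or TCP checksum also covers a pseudo-header that is omitted here.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit ones' complement checksum of data (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


if __name__ == "__main__":
    # Sample IPv4 header with its checksum field set to zero.
    header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
    print(hex(internet_checksum(header)))
```

A receiver can verify a header by running the same function over the bytes with the checksum field filled in; a result of zero means the header is intact.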
KeyStone I External Interfaces
• 2x SGMII ports support 10/100/1000 Ethernet
• 4x high-bandwidth Serial RapidIO (SRIO) lanes
• 2x PCIe lanes at 5 Gbps per lane
• SPI for boot operations
• UART for development/testing
• I2C for EEPROM at 400 kbps
• GPIO
• Device-specific Interfaces
– Wireless Applications
– General Purpose Applications
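Line rates on serial links are not the same as payload rates. As a rough sketch, the snippet below applies 8b/10b line-coding efficiency to the PCIe figure above and to an SGMII port. The use of 8b/10b for PCIe Gen 2 and the 1.25 Gbaud SGMII line rate are standard properties of those interfaces rather than statements from this slide, and packet and protocol overhead is ignored.

```python
EFFICIENCY_8B10B = 8 / 10  # 8 payload bits carried in every 10 line bits


def serdes_payload_gbps(line_rate_gbaud: float, lanes: int, efficiency: float) -> float:
    """Aggregate payload rate in Gbps, per direction, after line coding."""
    return line_rate_gbaud * lanes * efficiency


if __name__ == "__main__":
    pcie = serdes_payload_gbps(5.0, 2, EFFICIENCY_8B10B)
    sgmii = serdes_payload_gbps(1.25, 1, EFFICIENCY_8B10B)
    print(f"PCIe, 2 lanes at 5 Gbps: {pcie:.1f} Gbps payload ({pcie / 8:.1f} GB/s)")
    print(f"SGMII, 1 port at 1.25 Gbaud: {sgmii:.1f} Gbps payload")
```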
TeraNet Switch Fabric
• A non-blocking switch fabric that enables fast and contention-free internal data movement
• Provides a configured way, within hardware, to manage traffic queues and ensure that priority jobs are completed while minimizing the involvement of the CorePac cores
• Facilitates high-bandwidth communications between CorePac cores, subsystems, peripherals, and memory
KeyStone I TeraNet Data Connections
[Figure: master (M) and slave (S) connections on two switch fabrics. The 256-bit TeraNet runs at CPUCLK/2 and connects EDMA_0 (TPCC with 16-channel QDMA, TC0 and TC1), DDR3, HyperLink, shared L2 through the MSMC, and the cores (cores 0-3 with XMC and L2). The 128-bit TeraNet runs at CPUCLK/3 and connects EDMA_1,2 (TPCCs with 64-channel QDMA, TC2 to TC9), SRIO, the Network Coprocessor, QMSS, PCIe, DebugSS, and the wireless coprocessors (TAC_FE, TAC_BE, RAC_FE, RAC_BE0,1, FFTC / PktDMA, AIF / PktDMA, TCP3d, TCP3e_W/R, VCP2 x4).]
• Facilitates high-bandwidth communication links between DSP cores, subsystems, peripherals, and memories.
• Supports parallel orthogonal communication links
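The two fabric configurations named in the diagram (256-bit at CPUCLK/2 and 128-bit at CPUCLK/3) translate into a peak per-connection bandwidth once a CPU clock is chosen. The sketch below works this out for a 1.25 GHz CPU clock; the result is a theoretical ceiling for a single link and ignores arbitration and transaction overhead. The function name is illustrative.

```python
def teranet_link_gb_s(cpu_clk_ghz: float, width_bits: int, clk_divider: int) -> float:
    """Peak bandwidth of one TeraNet connection in GB/s."""
    fabric_clk_ghz = cpu_clk_ghz / clk_divider
    return fabric_clk_ghz * width_bits / 8


if __name__ == "__main__":
    wide = teranet_link_gb_s(1.25, 256, 2)
    narrow = teranet_link_gb_s(1.25, 128, 3)
    print(f"256-bit TeraNet @ CPUCLK/2: {wide:.2f} GB/s per link")
    print(f"128-bit TeraNet @ CPUCLK/3: {narrow:.2f} GB/s per link")
```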
KeyStone I HyperLink Bus
• Provides the capability to expand the device to include hardware acceleration or other auxiliary processors
• Supports four lanes with up to 12.5 Gbaud per lane
KeyStone I Miscellaneous Elements
• Boot ROM
• Semaphore module provides atomic access to shared chip-level resources.
• Power Management
• Three on-chip PLLs:
– PLL1 for CorePacs, except
– PLL2 for DDR3
– PLL3 for Packet Acceleration
• Three EDMA controllers
• Eight 64-bit timers
• Inter-Processor Communication (IPC) Registers
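The role of the semaphore module can be illustrated with a software analogy: each core tries to claim a semaphore in a single atomic step, uses the shared resource only if the claim succeeded, and then releases it. In the sketch below, Python threads stand in for cores and a lock emulates the atomicity that the hardware provides. The class and method names are hypothetical and do not describe the module's register interface.

```python
import threading
import time


class HwSemaphore:
    """Toy model of one hardware semaphore shared by several cores."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # emulates hardware atomicity
        self._owner = None

    def try_acquire(self, core_id: int) -> bool:
        """Atomically claim the semaphore; return False if it is taken."""
        with self._guard:
            if self._owner is None:
                self._owner = core_id
                return True
            return False

    def release(self, core_id: int) -> None:
        with self._guard:
            if self._owner != core_id:
                raise RuntimeError("only the owner may release")
            self._owner = None


def worker(core_id: int, sem: HwSemaphore, shared: dict, iterations: int) -> None:
    for _ in range(iterations):
        while not sem.try_acquire(core_id):
            time.sleep(0)  # yield while another "core" holds the semaphore
        shared["count"] += 1  # critical section on the shared resource
        sem.release(core_id)


def run(cores: int = 4, iterations: int = 200) -> int:
    sem = HwSemaphore()
    shared = {"count": 0}
    threads = [
        threading.Thread(target=worker, args=(i, sem, shared, iterations))
        for i in range(cores)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return shared["count"]


if __name__ == "__main__":
    print(run())
```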
Diagnostic Enhancements
• Embedded Trace Buffers (ETB) enhance the diagnostic capabilities of the CorePac.
• CP Monitor enables diagnostic capabilities on data traffic through the TeraNet switch fabric.
– Automatic statistics collection and exporting (non-intrusive)
– Monitor individual events for better debugging
– Monitor transactions to both memory end point and Memory-Mapped Registers (MMR)
– Configurable monitor filtering capability based on address and transaction type
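Filtering by address and transaction type can be pictured as a predicate applied to each observed transaction before it is counted. The sketch below is a software analogy only: the data structures, field names, and example addresses are hypothetical and are not taken from the CP Monitor hardware or its configuration registers.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    address: int
    kind: str  # "read" or "write"


@dataclass(frozen=True)
class MonitorFilter:
    start: int        # first address of the monitored window
    end: int          # last address of the monitored window (inclusive)
    kinds: frozenset  # transaction types to count

    def matches(self, txn: Transaction) -> bool:
        return self.start <= txn.address <= self.end and txn.kind in self.kinds


def collect_stats(transactions, flt: MonitorFilter) -> Counter:
    """Count the transactions that pass the filter, grouped by type."""
    stats = Counter()
    for txn in transactions:
        if flt.matches(txn):
            stats[txn.kind] += 1
    return stats


if __name__ == "__main__":
    # Hypothetical traffic and address window, for illustration only.
    traffic = [
        Transaction(0x1000, "read"),
        Transaction(0x1004, "write"),
        Transaction(0x9000, "write"),
        Transaction(0x1008, "write"),
    ]
    writes_in_window = MonitorFilter(0x1000, 0x1FFF, frozenset({"write"}))
    print(collect_stats(traffic, writes_in_window))
```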
Device-Specific: C6670 for Wireless Apps
[Block diagram: C6670 with 4 cores @ 1.0 GHz / 1.2 GHz; 32 KB L1P, 32 KB L1D, and 1024 KB L2 cache/RAM per core; 2 MB MSM SRAM; 64-bit DDR3 EMIF]
Device-specific Coprocessors:
• 2x FFT Coprocessor (FFTC)
• Turbo Decoder/Encoder Coprocessor (TCP3d/3e)
• 4x Viterbi Coprocessor (VCP2)
• Bit-rate Coprocessor (BCP)
• 2x Rake Search Accelerator (RSA)
Device-specific Interfaces:
• 6x Antenna Interface 2 (AIF2)
Device-Specific: C667x General Purpose
[Block diagram: C6671/C6672/C6674/C6678 with 1 to 8 cores @ up to 1.25 GHz; 32 KB L1P, 32 KB L1D, and 512 KB L2 cache/RAM per core; 4 MB MSM SRAM; 64-bit DDR3 EMIF]
Device-specific Interfaces:
• 2x Telecommunications Serial Port (TSIP)
• Asynchronous Memory Interface (EMIF16):
– Connects memory up to 256 MB
– Three modes: Synchronized SRAM, NAND flash, NOR flash
Device-Specific: C665x General Purpose
[Block diagram: C6655/57 with 1 or 2 cores @ up to 1.25 GHz (2nd core, C6657 only); 32 KB L1P, 32 KB L1D, and 1024 KB L2 cache per core; 1 MB MSM SRAM; 32-bit DDR3 EMIF; Ethernet MAC with SGMII]
Device-specific Coprocessors:
• Turbo Decoder Coprocessor (TCP3d)
• 2x Viterbi Coprocessor (VCP2)
Device-specific Interfaces:
• Asynchronous Memory Interface (EMIF16)
• Universal Parallel Port (UPP)
• 2x Multichannel Buffered Serial Ports (McBSP)
Device-specific Memory:
• 1 MB Multicore Shared Memory (MSM SRAM)
• 32-bit DDR3 Interface
Device-Specific: C665x Power Optimized
[Block diagram: C6654 with 1 core @ 850 MHz; 32 KB L1P, 32 KB L1D, and 1024 KB L2 cache; 32-bit DDR3 EMIF; Ethernet MAC with SGMII]
Device-specific Memory:
• 32-bit DDR3 Interface
Device-specific Interfaces:
• Asynchronous Memory Interface (EMIF16)
• Universal Parallel Port (UPP)
• 2x Multichannel Buffered Serial Ports (McBSP)
KeyStone C665x: Key HW Variations
HW Feature | C6654 | C6655 | C6657
CorePac Frequency (GHz) | 0.85 | 1 @ 1.0, 1.25 | 2 @ 0.85, 1.0, 1.25
Multicore Shared Memory (MSM) | No | 1024KB SRAM | 1024KB SRAM
DDR3 Maximum Data Rate (MT/s) | 1066 | 1333 | 1333
Serial Rapid I/O Lanes | No | 4x | 4x
HyperLink | No | Yes | Yes
Viterbi Coprocessor (VCP) | No | 2x | 2x
Turbo Coprocessor Decoder (TCP3d) | No | Yes | Yes
Network Coprocessor (NETCP) | No | No | No
C66xx (Multicore) Device Comparison
Feature | C6670 | C6657 | C6672 | C6674 | C6678
MHz per Core | 1GHz – 1.2GHz | 1GHz – 1.25GHz | 1GHz – 1.25GHz | 1GHz – 1.25GHz | 1GHz – 1.25GHz
Number of Cores | 4 | 2 | 2 | 4 | 8
Fixed & Floating | Yes | Yes | Yes | Yes | Yes
Max GMACs | 153 (@1.2GHz) | 80 (@1.25GHz) | 80 (@1.25GHz) | 160 (@1.25GHz) | 320 (@1.25GHz)
L1 KB per core | 32D/32P | 32D/32P | 32D/32P | 32D/32P | 32D/32P
L2 Dedicated per Core | 1MB | 1MB | 512kB | 512kB | 512kB
L2 Shared | 2 MB | 1 MB | 4 MB | 4 MB | 4 MB
DDR (with ECC) MHz | 64b 1600 MHz | 32b 1600 MHz | 64b 1600 MHz | 64b 1600 MHz | 64b 1600 MHz
10/100/1000 EMAC | 2x SGMII | 1x SGMII | 2x SGMII | 2x SGMII | 2x SGMII
PCI Express Gen 2 | x2 | x2 | x2 | x2 | x2
Hyperlink | Yes | Yes | Yes | Yes | Yes
Serial RapidIO 2.1 | x4 | x4 | x4 | x4 | x4
AIF 2 (Antenna Interface) | Yes | No | No | No | No
Network Co-processor | Yes | No | Yes | Yes | Yes
Security Co-processor | Yes/Optional | No | No | No | Yes/Optional
Comms. Coprocessors | 4x VCP2; 3x TCP3d & 1x TCP3e; 3x FFTC; RAC, TAC, 1x BCP | 4x VCP2 & 1x TCP3d | No | No | No
Extended Case Temp | -40C to 100C | -55C to 100C | -40C to 100C | -40C to 100C | -40C to 100C
Typ. Power (75C) @1GHz | 10W+ | 3.5W | 6W | 8W | 10W
Samples Availability | Now! | Now! | Now! | Now! | Now!
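The Max GMACs figures in this comparison are consistent with a simple product of core count, clock rate, and MACs per cycle. The sketch below assumes 32 16x16-bit fixed-point MACs per cycle per C66x core; that constant is not stated on the slide, but it reproduces the listed values (with the C6670 figure truncated to 153).

```python
# Assumption: 32 16x16-bit fixed-point MACs per cycle per C66x core.
MACS_PER_CYCLE_PER_CORE = 32


def max_gmacs(cores: int, clock_ghz: float) -> float:
    """Peak giga multiply-accumulates per second for the whole device."""
    return cores * clock_ghz * MACS_PER_CYCLE_PER_CORE


if __name__ == "__main__":
    devices = {
        "C6670": (4, 1.2),
        "C6657": (2, 1.25),
        "C6672": (2, 1.25),
        "C6674": (4, 1.25),
        "C6678": (8, 1.25),
    }
    for name, (cores, ghz) in devices.items():
        print(f"{name}: {max_gmacs(cores, ghz):.1f} GMACs peak")
```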
C66xx (Single Core) Device Comparison
Feature | C6655 | C6671
MHz per Core | 1GHz – 1.25GHz | 1GHz – 1.25GHz
Number of Cores | 1 | 1
Fixed & Floating | Yes | Yes
Max GMACs | 40 (@1.25GHz) | 40 (@1.25GHz)
L1 KB per core | 32D/32P | 32D/32P
L2 Dedicated per Core | 1MB | 512kB
L2 Shared | 1 MB | 4 MB
DDR (with ECC) MHz | 32b 1600 MHz | 64b 1600 MHz
10/100/1000 EMAC | 1x SGMII | 2x SGMII
PCI Express Gen 2 | x2 | x2
Hyperlink, 50 Gbauds | Yes | Yes
Serial RapidIO 2.1 | x4 | x4
AIF 2 (Antenna Interface) | No | No
Network Co-processor | No | Yes
Security Co-processor | No | No
Comms. Coprocessors | 4x VCP2 & 1x TCP3d | No
Extended Case Temp | -55C to 100C | -40C to 100C
Typ. Power (75C) @1GHz | 2.5W | 4.5W
Samples Availability | Now! | Now!
For More Information
• Multicore articles, tools, and software are available
at Embedded Processors Wiki for the KeyStone
Device Architecture.
• View the complete C66x Multicore SOC Online
Training for KeyStone Devices, including details on
the individual modules.
• For questions regarding topics covered in this
training, visit the support forums at the
TI E2E Community and 德州仪器中文社区.