KeyStone Architecture

Agenda
• KeyStone Overview
• KeyStone I Architecture
  – CorePac & Memory Subsystem
  – Internal Communications and Transport
  – External Interfaces
  – Coprocessors and Accelerators
  – Miscellaneous
• KeyStone Platform
  – Debug
  – Device-Specific Offerings

TI KeyStone DSP Applications
• Media Gateways & Networking
• High Performance & Cloud Computing
• Imaging Applications
• Video Surveillance
• Mission Critical
• SDR/BTS
• Test and Automation
• Radio Network Controller
• Session Border Controller
• LTE/SAE Gateway
• Multimedia Gateway
• Video & Audio Infrastructure

TI Multicore Innovation Progression
5 generations of multicore:
• Lowers development effort
• Speeds time to market
• Leverages TI’s investment
• Optimal software reuse
[Figure: roadmap of TI multicore DSP generations. Timeline axis: 2003, 2006, 2011, 2012/13, 2014/2015. Status legend: Concept, Sampling, Development, Production.]
• 6414/6416 (130nm): C64x single-core DSP
• 6474/6455 (65nm): C64x+, fixed point only
• KeyStone I (40nm): C66x fixed- AND floating-point DSP; Networking + Wireless Acceleration
• KeyStone II (28nm): 32-bit ARM A15; 10G Networking
• KeyStone III: 64-bit ARM v8; C66x+ DSP; 40G Networking

Unmatched Performance
[Figure: BDTImark2000™ bar charts, "BDTI Score for Floating Point Processors" (axis 0 to 14000) and "BDTI Score for Fixed Point Processors" (axis 0 to 25000). Processors compared: ADI 2116x, 2126x, and 213xx (SHARC); ADI BF5xx (Blackfin); ADI TS201S and TS202S/203S (TigerSHARC); NEC uPD77050; Freescale MSC81xx (SC140), MSC814x (SC3400), and MSC815x (SC3850); Intel Pentium III; Renesas SH77xx (SH-4); TMS320C67x, TMS320C64x+, and TMS320C66xx.]

Algorithm                                               Earlier device   C66x @ 1.25GHz   Gain
Single Precision Floating Point FFT, 2048 pt, Radix 4   86.84 us         17.90 us         ~600%
Fixed Point FFT, 2048 pt, Radix 4                       8.23 us          4.46 us          ~200%
FIR Filter, 40 samples, 40 taps                         0.69 us          0.34 us          ~200%
Matrix Multiply 32 x 32                                 17.92 us         6.16 us          ~300%
Matrix Inverse 4 x 4                                    0.53 us          0.13 us          ~400%
Earlier-device columns on the original slide: C67x @ 300MHz and C64x+ @ 1.2GHz.

KeyStone I CorePac
[Figure: KeyStone I block diagram showing 1 to 8 C66x™ CorePacs @ up to 1.25 GHz (L1P, L1D, and L2 cache/RAM), Memory Subsystem, Application-Specific Coprocessors, Multicore Navigator, Network Coprocessor, TeraNet, HyperLink, External Interfaces, and Miscellaneous. The same diagram recurs on the slides that follow.]
• 1 to 8 C66x CorePac DSP cores operating at up to 1.25 GHz
  – Fixed- and floating-point operations
  – Code compatible with other C64x+ and C67x+ devices
• L1 Memory
  – Can be partitioned as cache and/or RAM
  – 32KB L1P per core
  – 32KB L1D per core
  – Error detection for L1P
  – Memory protection
• Dedicated L2 Memory
  – Can be partitioned as cache and/or RAM
  – 512 KB to 1 MB local L2 per core
  – Error detection and correction for all L2 memory
• Direct connection to memory subsystem

KeyStone I Memory Subsystem
• Multicore Shared Memory (MSM SRAM)
  – 1 to 4 MB
  – Available to all cores
  – Can contain program and data
  – All devices except C6654
• Multicore Shared Memory Controller (MSMC)
  – Arbitrates access of CorePac and SoC masters to shared memory
  – Provides a connection to the DDR3 EMIF
  – Provides CorePac access to coprocessors and IO peripherals
  – Provides error detection and correction for all shared memory
  – Memory protection and address extension to 64 GB (36 bits)
  – Provides multi-stream pre-fetching capability
• DDR3 External Memory Interface (EMIF)
  – Support for 16-bit, 32-bit, and (for C667x devices) 64-bit modes
  – Specified at up to 1600 MT/s
  – Supports power down of unused pins when using 16-bit or 32-bit width
  – Support for an 8 GB memory address space
  – Error detection and correction

KeyStone I Multicore Navigator
• Provides seamless "fire and forget" inter-core communications (messages and data exchanges) between cores, IP, and peripherals
• Low-overhead processing and routing of packet traffic to and from peripherals and cores
• Supports dynamic load optimization
• Data transfer architecture designed to minimize host interaction while maximizing memory and bus efficiency
• Consists of a Queue Manager Subsystem (QMSS) and multiple, dedicated Packet DMA (PKTDMA) engines

Multicore Navigator Architecture
[Figure: Multicore Navigator architecture. DSP cores and Packet DMA engines (SRIO, NetCP, FFTC, BCP, AIF2) connect over TeraNet to the Queue Manager Subsystem, which contains the Queue Manager (queues Q0 to Qx), Descriptor RAM, Link RAM, an internal Packet DMA, and APDSPs, and which raises queue interrupts and queue events. Accumulation memory and buffer memory are also shown.]

KeyStone I Network Coprocessor
• Provides hardware accelerators to perform L2, L3, and L4 processing and encryption that was previously done in software
• Packet Accelerator (PA)
  – Single or multiple IP address option
  – UDP (and TCP) checksum and selected CRCs
  – L2/L3/L4 support
  – Quality of Service (QoS)
  – Multicast to multiple destinations inside the device
  – Timestamps
• Security Accelerator (SA)
  – Hardware encryption, decryption, and authentication
  – Supports IPsec ESP, IPsec AH, SRTP, and 3GPP protocols

KeyStone I External Interfaces
• 2x SGMII ports support 10/100/1000 Ethernet
• 4x high-bandwidth Serial RapidIO (SRIO) lanes
• 2x PCIe lanes at 5 Gbps
• SPI for boot operations
• UART for development/testing
• I2C for EEPROM at 400 Kbps
• GPIO
• Device-specific Interfaces
  – Wireless Applications
  – General Purpose Applications

TeraNet Switch Fabric
• A non-blocking switch fabric that enables fast and contention-free internal data movement
• Provides a configured way, within hardware, to manage traffic queues and ensure priority jobs are completed while minimizing the involvement of the CorePac cores
• Facilitates high-bandwidth communications between CorePac cores, subsystems, peripherals, and memory

KeyStone I TeraNet Data Connections
[Figure: TeraNet data connections. Two fabrics, a 256-bit TeraNet at CPUCLK/2 and a 128-bit TeraNet at CPUCLK/3, link the master (M) and slave (S) ports of the cores, XMC, shared L2 (MSMC), DDR3, HyperLink, SRIO, PCIe, QMSS, Network Coprocessor, DebugSS, EDMA_0 (TPCC, 16-channel QDMA, TC0 and TC1), EDMA_1,2 (TPCC, 64-channel QDMA, TC2 to TC9), and the FFTC, AIF, RAC, TAC, TCP3d, TCP3e, and VCP2 (x4) blocks.]
• Facilitates high-bandwidth communication links between DSP cores, subsystems, peripherals, and memories.
• Supports parallel orthogonal communication links

KeyStone I HyperLink Bus
• Provides the capability to expand the device to include hardware acceleration or other auxiliary processors
• Supports four lanes with up to 12.5 Gbaud per lane

KeyStone I Miscellaneous Elements
• Boot ROM
• Semaphore module provides atomic access to shared chip-level resources.
• Power Management
• Three on-chip PLLs:
  – PLL1 for CorePacs, except
  – PLL2 for DDR3
  – PLL3 for Packet Acceleration
• Three EDMA controllers
• Eight 64-bit timers
• Inter-Processor Communication (IPC) Registers

Diagnostic Enhancements
• Embedded Trace Buffers (ETB) enhance the diagnostic capabilities of the CorePac.
• CP Monitor enables diagnostic capabilities on data traffic through the TeraNet switch fabric.
  – Automatic statistics collection and exporting (non-intrusive)
  – Monitor individual events for better debugging
  – Monitor transactions to both memory end points and Memory-Mapped Registers (MMR)
  – Configurable monitor filtering capability based on address and transaction type

Device-Specific: C6670 for Wireless Apps
[Figure: C6670 block diagram: 4 cores @ 1.0 GHz / 1.2 GHz; 32KB L1P, 32KB L1D, and 1024KB L2 cache/RAM; 2MB MSM SRAM; 64-bit DDR3 EMIF.]
Device-specific Coprocessors:
• 2x FFT Coprocessor (FFTC)
• Turbo Decoder/Encoder Coprocessor (TCP3d/3e)
• 4x Viterbi Coprocessor (VCP2)
• Bit-rate Coprocessor (BCP)
• 2x Rake Search Accelerator (RSA)
Device-specific Interfaces:
• 6x Antenna Interface 2 (AIF2)

Device-Specific: C667x General Purpose
C6671/C6672/C6674/C6678
[Figure: C667x block diagram: 1 to 8 cores @ up to 1.25 GHz; 32KB L1P, 32KB L1D, and 512KB L2 cache/RAM; 4MB MSM SRAM; 64-bit DDR3 EMIF.]
Device-specific Interfaces:
• 2x Telecommunications Serial Port (TSIP)
• Asynchronous Memory Interface (EMIF16):
  – Connects memory up to 256 MB
  – Three modes:
    • Synchronized SRAM
    • NAND flash
    • NOR flash

Device-Specific: C665x General Purpose
C6655/57
[Figure: C6655/57 block diagram: 1 or 2 cores @ up to 1.25 GHz (2nd core, C6657 only); 32KB L1P and 32KB L1D cache/RAM; 1024KB L2 cache; 1MB MSM SRAM; 32-bit DDR3 EMIF.]
Device-specific Coprocessors:
• Turbo Decoder Coprocessor (TCP3d)
• 2x Viterbi Coprocessor (VCP2)
Device-specific Interfaces:
• Asynchronous Memory Interface (EMIF16)
• Universal Parallel Port (UPP)
• 2x Multichannel Buffered Serial Ports (McBSP)
Device-specific Memory:
• 1 MB Multicore Shared Memory (MSM SRAM)
• 32-bit DDR3 Interface

Device-Specific: C665x Power Optimized
C6654
[Figure: C6654 block diagram: 1 core @ 850 MHz; 32KB L1P and 32KB L1D cache/RAM; 1024KB L2 cache; 32-bit DDR3 EMIF.]
Device-specific Memory:
• 32-bit DDR3 Interface
Device-specific Interfaces:
• Asynchronous Memory Interface (EMIF16)
• Universal Parallel Port (UPP)
• 2x Multichannel Buffered Serial Ports (McBSP)

KeyStone C665x: Key HW Variations
HW Feature                          C6654    C6655            C6657
CorePac Frequency (GHz)             0.85     1 @ 1.0, 1.25    2 @ 0.85, 1.0, 1.25
Multicore Shared Memory (MSM)       No       1024KB SRAM      1024KB SRAM
DDR3 Maximum Data Rate              1066     1333             1333
Serial Rapid I/O Lanes              No       4x               4x
HyperLink                           No       Yes              Yes
Viterbi Coprocessor (VCP)           No       2x               2x
Turbo Coprocessor Decoder (TCP3d)   No       Yes              Yes
Network Coprocessor (NETCP)         No       No               No

C66xx (Multicore) Device Comparison
Feature                    C6670              C6657              C6672              C6674              C6678
Clock per Core             1.0 to 1.2 GHz     1.0 to 1.25 GHz    1.0 to 1.25 GHz    1.0 to 1.25 GHz    1.0 to 1.25 GHz
Number of Cores            4                  2                  2                  4                  8
Fixed & Floating           Yes                Yes                Yes                Yes                Yes
Max GMACs                  153 (@1.2GHz)      80 (@1.25GHz)      80 (@1.25GHz)      160 (@1.25GHz)     320 (@1.25GHz)
L1 KB per Core             32D/32P            32D/32P            32D/32P            32D/32P            32D/32P
L2 Dedicated per Core      1MB                1MB                512kB              512kB              512kB
L2 Shared                  2 MB               1 MB               4 MB               4 MB               4 MB
DDR3 (with ECC)            64-bit, 1600 MT/s  32-bit, 1600 MT/s  64-bit, 1600 MT/s  64-bit, 1600 MT/s  64-bit, 1600 MT/s
10/100/1000 EMAC           2x SGMII           1x SGMII           2x SGMII           2x SGMII           2x SGMII
PCI Express Gen 2          x2                 x2                 x2                 x2                 x2
HyperLink                  Yes                Yes                Yes                Yes                Yes
Serial RapidIO 2.1         x4                 x4                 x4                 x4                 x4
AIF 2 (Antenna Interface)  Yes                No                 No                 No                 No
Network Co-processor       Yes                No                 Yes                Yes                Yes
Security Co-processor      Yes/Optional       No                 No                 No                 Yes/Optional
Comms. Coprocessors        (see note)         4xVCP2 & 1xTCP3d   No                 No                 No
Extended Case Temp         -40C to 100C       -55C to 100C       -40C to 100C       -40C to 100C       -40C to 100C
Typ. Power (75C) @1GHz     10W+               3.5W               6W                 8W                 10W
Samples Availability       Now!               Now!               Now!               Now!               Now!
Note: C6670 Comms. Coprocessors: 4x VCP2; 3x TCP3d & 1x TCP3e; 3x FFTC; RAC, TAC, 1x BCP.

C66xx (Single Core) Device Comparison
Feature                    C6655              C6671
Clock per Core             1.0 to 1.25 GHz    1.0 to 1.25 GHz
Number of Cores            1                  1
Fixed & Floating           Yes                Yes
Max GMACs                  40 (@1.25GHz)      40 (@1.25GHz)
L1 KB per Core             32D/32P            32D/32P
L2 Dedicated per Core      1MB                512kB
L2 Shared                  1 MB               4 MB
DDR3 (with ECC)            32-bit, 1600 MT/s  64-bit, 1600 MT/s
10/100/1000 EMAC           1x SGMII           2x SGMII
PCI Express Gen 2          x2                 x2
HyperLink, 50 Gbaud        Yes                Yes
Serial RapidIO 2.1         x4                 x4
AIF 2 (Antenna Interface)  No                 No
Network Co-processor       No                 Yes
Security Co-processor      No                 No
Comms. Coprocessors        4xVCP2 & 1xTCP3d   No
Extended Case Temp         -55C to 100C       -40C to 100C
Typ. Power (75C) @1GHz     2.5W               4.5W
Samples Availability       Now!               Now!

For More Information
• Multicore articles, tools, and software are available at Embedded Processors Wiki for the KeyStone Device Architecture.
• View the complete C66x Multicore SOC Online Training for KeyStone Devices, including details on the individual modules.
• For questions regarding topics covered in this training, visit the support forums at the TI E2E Community and 德州仪器中文社区 (the TI Chinese-language community).