PLX Overview
► PCI Express & storage – two fast-growing markets
► Revenue split by product line:
  • PCI Express switches & bridges – now over 50% of the company's revenue
  • Connectivity – USB & PCI bridges, controllers & UARTs
  • Storage controllers
► Public NASDAQ company (PLXT); financially solid with zero debt, cash-flow positive

Market Leader – #1 Supplier of PCI Express Interconnect
► PCI Express switches & bridges
  • Over 55% market share*
  • Designs with all market leaders
  • 4 million units shipped
  • Broadest offering of switches & bridges
► Served markets: embedded, communications, server, storage, PC/consumer
* 55% for bridges & switches combined; over 65% for switches.

PCIe Switch Is a Basic Building Block
► A PCI Express switch provides the fabric between x86 processors, SPARC CPUs with native PCIe, graphics processors, network/security co-processors, chip sets, ASICs/logic/FPGAs, and communication & storage devices
► Cleaner, lower cost, lower power – the switch is a basic building block

Product Summary & Road Maps
► PLX products – no NDA required

PCIe Gen 1 Switch Road Map
► Covers <5 to 48 lanes; approximate timeframes 2005–2007; devices are marked Shipping Now, In Development, or Planned/Concept (discuss with PLX)
► PEX 8548 – 48 lanes, 9 ports, 1 KB payloads, 3 HPC, 110 ns cut-through; introduced in 2004; server & storage
► PEX 8547 – 48 lanes, 3 x16 ports, for graphics apps
► PEX 8533 – 32 lanes, 6 ports, 1 KB payloads, 3 HPC, 115 ns cut-through
► PEX 8532 – 32 lanes, 8 ports, NT, 8 HPC, 2 VC, peer-to-peer
► PEX 8525 – 24 lanes, 5 ports, 1 KB payloads, 3 HPC, 115 ns cut-through
► PEX 8524 – 24 lanes, 6 ports, NT, 6 HPC, 2 VC, peer-to-peer
► PEX 8516 / 8517 / 8518 – 16-lane, 4- and 5-port devices (NT, 2 VC/peer-to-peer or 150 ns cut-through, depending on device); added cut-through with 150 ns latency for HBAs & NICs
► PEX 8512 – 12 lanes, 5 ports, NT, 5 HPC, 150 ns cut-through
► PEX 8508 – 8 lanes, 5 ports, NT, 5 HPC, 150 ns cut-through
► PEX 8509 – 8 lanes, 8 ports, 1 KB payloads, 118 ns, 15 x 15 mm
► PEX 8505 – 5 lanes, 5 ports, 1 KB payloads, 138 ns, 15 x 15 mm; industry's 1st control-plane switch, for communications
► Family highlights: pin-compatible migration*, lowest latency (110 ns) and power with highest performance; later devices add graphics & embedded support
* Pin compatible with some feature changes.
PCIe Gen 2 Switch Road Map
► Covers 4 to 96 lanes; approximate timeframes 2008–2010; devices are marked Shipping Now, In Development, or Planned/Concept
► Application targets noted on the roadmap: server, storage & dual graphics; blade & rack servers, storage & networking; control-plane networking, embedded & storage; low lane count, general purpose
► Multi-root/host & multicast devices:
  • PEX 8696 – 96 lanes, 24 ports, NT, multi-root/host & multicast
  • PEX 8680 – 80 lanes, 20 ports, NT, multi-root/host
  • PEX 8664 – 64 lanes, 16 ports, NT, multi-root/host
  • PEX 8649 – 48 lanes, 12 ports, NT, multi-root (MR)/host & multicast
► 1st Gen 2 family:
  • PEX 8648 – 48 lanes, 12 ports, NT, 2 KB, 3 HPC, DC, RP, DB
  • PEX 8647 – 48 lanes, 3 x16 ports
  • PEX 8632 – 32 lanes, 12 ports, NT, 2 KB, 3 HPC, DC, RP, DB
  • PEX 8624 – 24 lanes, 6 ports, NT, 2 KB, 3 HPC, DC, RP, DB
  • PEX 8616 – 16 lanes, 4 ports, NT, 2 KB, 3 HPC, DC, RP, DB
  • PEX 8617 – 16 lanes, 4 ports, NT
  • PEX 8612 – 12 lanes, 3 ports, NT, 2 KB, 3 HPC, DC, RP, DB
  • PEX 8613 – 12 lanes, 3 ports, NT
► High-port-count devices with 2 VC, DMA & SSC for the control plane:
  • PEX 8618 – 16 lanes, 16 ports, NT, 2 VC, DC, RP, SSC, DB
  • PEX 8619 – 16 lanes, 16 ports, NT, 2 VC, DC, RP, SSC, DB, DMA
  • PEX 8614 – 12 lanes, 12 ports, NT, 2 VC, DC, RP, SSC, DB
  • PEX 8615 – 12 lanes, 12 ports, NT, 2 VC, DC, RP, SSC, DB, DMA
  • PEX 8608 – 8 lanes, 8 ports, NT, 2 VC, DC, RP, SSC, DB
  • PEX 8609 – 8 lanes, 8 ports, NT, 2 VC, DC, RP, SSC, DB, DMA
► Low-lane-count devices:
  • PEX 8606 – 6 lanes, 6 ports, NT
  • PEX 8604 – 4 lanes, 4 ports, NT, 2 VC, DC, RP, SSC, DB

PLX Exclusive Features
► The exclusive features covered in this presentation:
► visionPAK™ Suite
  • Extraction of receive-data "eye width"
  • PCIe packet generator
  • Performance monitoring
  • Error injection
  • SerDes loopback modes
► performancePAK™ Suite
  • Read Pacing
  • Multicast
  • Dynamic Buffer Allocation
► All valuable in both Gen 2 and Gen 1 modes!

visionPAK™ Suite
System debug features exclusive to PLX 8600 switches

Best-in-Class Gen 2 SerDes
► Best-in-class SerDes (ARM) can support up to 60 inches of trace length for backplanes
► Optional programmable features
  • Transition amplitude and non-transition amplitude
  • Pre-emphasis, de-emphasis, and drive-strength granularity to 50 mV
  • Receiver Detect and Electrical Idle bits
  • SerDes BIST and AC JTAG
  • Automatic impedance calibration
(Eye diagrams: non-transitional eye and transitional eye.)

Extraction of Receiver Eye Width – What Is It?
► PLX-exclusive pre-system design tool
► Used on a PLX RDK to test how clean a link is
  • Early indicator of potential link issues
  • A very open eye means a very reliable link; a tight eye indicates a weak link
  • Supported by all PEX 86xx switches

Extraction of Receiver Eye Width – How Does It Work?
1. User selects port, lane, Rx equalization, dwell time, etc.
2. Software finds the center of the eye
3. Software steps through the eye left & right until data errors occur
4. Software determines the eye width and displays it on the screen
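To make the four steps above concrete, here is a minimal C sketch of such a sweep. It is not the PLX SDK: the helpers rx_set_phase_offset() and rx_count_errors() are hypothetical stand-ins for whatever mechanism the silicon exposes for moving the receiver sampling point and checking for data errors, and the stubs below simply simulate an eye that is clean within ±20 steps.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_STEP 64   /* assumed limit on phase steps in either direction */

/* Hypothetical hardware hooks, stubbed to simulate an eye clean within +/-20 steps. */
static void rx_set_phase_offset(int port, int lane, int step)
{
    (void)port; (void)lane; (void)step;   /* would write a SerDes register */
}

static unsigned rx_count_errors(int port, int lane, int step, unsigned dwell_ms)
{
    (void)port; (void)lane; (void)dwell_ms;
    return abs(step) > 20 ? 1u : 0u;      /* errors appear outside the eye */
}

/* Step outward from the eye center in one direction until data errors occur. */
static int sweep_direction(int port, int lane, int dir, unsigned dwell_ms)
{
    int step = 0;
    while (step <= MAX_STEP) {
        rx_set_phase_offset(port, lane, dir * step);
        if (rx_count_errors(port, lane, dir * step, dwell_ms) != 0)
            break;
        step++;
    }
    return step;                          /* number of error-free steps */
}

/* Steps 2-4: start at the eye center, sweep left and right, report the width. */
int main(void)
{
    int port = 0, lane = 0;
    unsigned dwell_ms = 100;              /* dwell time selected by the user */

    int left  = sweep_direction(port, lane, -1, dwell_ms);
    int right = sweep_direction(port, lane, +1, dwell_ms);
    rx_set_phase_offset(port, lane, 0);   /* restore the centered setting */

    printf("port %d lane %d: eye width = %d steps (%d left + %d right)\n",
           port, lane, left + right, left, right);
    return 0;
}
```

The same loop, run against a pass/fail threshold per lane, is essentially the Minimum Eye Width Test described next.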
(Diagram: eye width = steps left + steps right from the center of the eye.)

Extraction of Receiver Eye Width – Minimum Eye Width Test
► Customer selects port, lanes, and minimum eye width
► Software tests all selected lanes against the minimum eye width
► Returns "Pass" or "Fail" for each selected lane
► Extremely convenient and pain-free tool for customers

Extraction of Receiver Eye Width – Auto-Calibrate Feature (Rx = receiver)
► Customer selects port, lane, and a range of Rx equalization
► Software steps through each eye and finds the optimal setting
► Returns the Rx equalization value that gives the best eye width
► Extremely convenient and useful tool for customers

Extraction of Receiver Eye Width – Customer Benefits
► Lets customers see how much margin their link has at the PEX 86xx receiver
► Convenient features: Minimum Eye Width Test, Auto-Calibrate
► Identify potential link issues earlier … GET TO MARKET FASTER!

PCIe Packet Generator – What Is It?
► PLX-exclusive feature that allows customers to create their own traffic patterns using a PLX switch
► Enables high-density traffic – up to x16 port saturation (not easy to achieve)
► Can create error messages – see how software/the system reacts to errors
► PLX RDKs can be used as PCIe packet generators
  • Great alternative to expensive PCIe exercisers
  • All-in-one solution

PCIe Packet Generator – Capabilities
► User-programmable traffic: memory reads/writes, payload size, PCI address
► View the command list
► Create looped traffic

PCIe Packet Generator – Customer Benefits
► Convenient, inexpensive way of stress-testing the system
  • Programmable traffic
  • Ability to fully saturate the links
  • Test system software
  • No external equipment needed
► …SAVES $$$!

Performance Monitor – What Is It?
► PLX-exclusive feature that allows customers to monitor PEX 86xx switch performance in real time
► Displays performance for each individual port (ingress and egress)
► Completely passive – does not impact system performance

Performance Monitor – How Does It Work?
► Customer selects a port to monitor
► Software reads PEX 86xx registers and displays the data (a small calculation sketch follows this section)
  • Link utilization % indicates unused link potential
  • Displays total rate, payload rate, reads vs. writes, etc.

Performance Monitor – Runtime View
► Graphically 'see' the traffic on each port during runtime: total byte rate, % link utilization, payload byte rate, average payload size (bytes)

Performance Monitor – TLP & DLLP Counters
► Count ingress & egress TLPs for every port
  • Extensive granularity: posted header & dword, non-posted dword, completion header & dword
  • Filter for various types of posted and non-posted packets (for example, count just MWr64)
► Count ingress & egress DLLPs for every port
  • Filter for ACKs, NAKs, and UpdateFCs (posted, non-posted, or completions)

Performance Monitor – Customer Benefits
► Allows customers to track real-time link utilization
► Helps find weak links & potential bottlenecks
► Gives additional visibility into traffic patterns
► Convenient, inexpensive system bring-up tool … THE COMPLETE SOLUTION!
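As a rough illustration of how the displayed statistics relate to raw port counters, the sketch below derives total rate, payload rate, link utilization, and average payload size for an x8 Gen 2 port. The counter-read helpers are hypothetical stand-ins (stubbed here with fixed sample values), not the PEX 86xx register interface.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical per-port counter reads, stubbed with fixed sample values. */
static uint64_t read_egress_bytes(int port)         { (void)port; return 1800000000ull; }
static uint64_t read_egress_payload_bytes(int port) { (void)port; return 1600000000ull; }
static uint64_t read_egress_tlps(int port)          { (void)port; return 12500000ull; }

int main(void)
{
    int port = 0;
    int lanes = 8;                      /* x8 link                          */
    double interval_s = 1.0;            /* sampling window                  */

    /* Gen 2 line rate: 5 GT/s * 8b/10b = 4 Gb/s = 500 MB/s per lane per direction */
    double link_capacity = lanes * 500e6 * interval_s;

    double total   = (double)read_egress_bytes(port);
    double payload = (double)read_egress_payload_bytes(port);
    double tlps    = (double)read_egress_tlps(port);

    printf("total rate        : %.1f MB/s\n", total / interval_s / 1e6);
    printf("payload rate      : %.1f MB/s\n", payload / interval_s / 1e6);
    printf("link utilization  : %.1f %%\n", 100.0 * total / link_capacity);
    printf("avg payload size  : %.0f B\n", payload / tlps);
    return 0;
}
```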
Error Injection
► What is it?
  • Software development tool
  • Inject ECC errors into PCIe packets
  • Inject PCIe error messages
► Customer benefits
  • Allows a customer to see how their software would react to these errors … SPEEDS UP SOFTWARE DEVELOPMENT!

Loopback – Customer Benefits
► Four convenient ways to test the SerDes and logic of the PEX 86xx and/or a connected device
  • Internal Tx
  • External Tx
  • Recovered clock
  • Recovered data
► …FOUR LOOPBACK MODES!

Internal Transmitter Loopback – used to test the PLX SerDes and logic
External Transmitter Loopback – used to test the link and the logic of the connected device
Recovered Clock Loopback – the External Tx test plus the PLX recovered-clock circuit
Recovered Data Loopback – the External Tx test plus the PLX logic

Easy Debug via I2C
► Customers are recommended to design in an I2C connector
  • Allows easy connection to a PLX RDK via a laptop or PC (USB)
  • Enables systems without a Windows or Linux OS to run the SDK through a remote system (e.g., a laptop)
  • Convenient for debug purposes – ideal for on-site FAE support
(Board diagram: PLX RDK with a PEX 8548; Port 0 x16 upstream port, Port 8 x16, Port 12 x16, port status LEDs, manual reset, HD power connector, config DIP switch, EEPROM, JTAG, I2C, RefClk, PERST#, USB link to the system chassis.)

performancePAK™ Suite
Performance-enhancing features exclusive to PLX 8600 switches

PEX 8600 Performance
► PLX leadership in Gen 2 performance
► Achieving >99% of theoretical max throughput in host-centric and peer-to-peer environments (a worked example of the theoretical-max arithmetic follows below)
► Proven in simulations & actual measurements
► Featuring performancePAK™: Read Pacing™, Dual Cast™ & Multicast, Dynamic Buffer Allocation, and a DMA engine inside
► Stay ahead of the pack with PLX!
(Charts: x16 Gen 2 host-centric throughput and % of theoretical max vs. packet size, 64 B–2048 B, bidirectional; 10GE throughput, native vs. with a PEX 8624, one thread vs. multi-thread, and % of native.)
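For context on the ">99% of theoretical max" claim, the small program below works through the usual back-of-the-envelope numbers for an x16 Gen 2 link: 16 lanes × 5 GT/s with 8b/10b encoding gives 8 GB/s of raw capacity per direction, and per-TLP framing overhead (assumed here to be 20 bytes: STP, sequence number, 3DW header, LCRC, END) sets the payload-efficiency ceiling at each packet size. This illustrates the arithmetic only, not PLX's measurement methodology, and it ignores DLLP traffic such as ACKs and flow-control updates.

```c
#include <stdio.h>

int main(void)
{
    const int lanes = 16;                       /* x16 link                 */
    const double lane_rate = 5e9 * 8.0 / 10.0;  /* bits/s after 8b/10b      */
    const double raw_GBps  = lanes * lane_rate / 8.0 / 1e9;  /* = 8 GB/s    */
    const double overhead  = 20.0;              /* assumed bytes per TLP    */

    printf("x16 Gen 2 raw capacity: %.1f GB/s per direction\n\n", raw_GBps);
    printf("payload(B)  efficiency  payload throughput(GB/s)\n");
    for (int payload = 64; payload <= 2048; payload *= 2) {
        double eff = payload / (payload + overhead);
        printf("%9d  %9.1f%%  %8.2f\n", payload, 100.0 * eff, raw_GBps * eff);
    }
    return 0;
}
```

At a 2 KB payload the TLP efficiency works out to roughly 99%, which is the ceiling the measured curves approach.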
Read Pacing™
► Problem → reduced endpoint performance caused by:
  • Unbalanced upstream/downstream link widths
  • An uneven number of read requests made by endpoints
  • One endpoint dominating the Root Complex queue, so other endpoints get starved
► Solution → PLX Read Pacing*
  • Read Pacing queues manage incoming read requests
  • Prevents one endpoint from dominating the Root Complex queue
  • Ensures no endpoint is starved, allowing optimized endpoint performance
* Patents pending

Without Read Pacing
► Performance bottleneck caused by mixing a slow Root Complex with fast I/Os (x8 upstream link; x4 FC HBA sending 2 KB read requests; x4 Ethernet NIC sending 1 KB read requests)
1. The FC HBA makes multiple 2 KB read requests
2. The Root Complex queues the FC HBA requests
3. The Ethernet NIC makes one 1 KB read request
4. The Root Complex queues the Ethernet NIC request
5. The Ethernet NIC must wait for the Root Complex to service the FC HBA requests before its own request is served – its packets sit at the end of the line
6. The Ethernet NIC is starved → reduced Ethernet NIC performance!

With PLX Read Pacing
► Increased performance due to fair allocation of bandwidth to downstream ports (same topology, with Read Pacing queues on the PEX 8600 downstream ports)
1. The FC HBA makes multiple 2 KB read requests
2. The switch allows one FC HBA request at a time to pass through, based on programmable thresholds
3. The Ethernet NIC makes one 1 KB read request
4. The switch allows the request to pass through, based on programmable thresholds
5. The switch continues to let Ethernet NIC requests pass through in front of large FC HBA requests, based on programmable settings
6. The Ethernet NIC gets serviced more often, with no impact to FC HBA performance – its packets are fairly queued
7. Neither endpoint is starved → optimized performance!

Read Pacing Measured
► Setup: x8 link from the switch to the Root Complex; x4 PLX PCIe packet generator as the "port hog"; x4 GE NIC as the "starved port"
► PLX PCIe packet generator used to mimic a "fast" I/O (e.g., an FC HBA), sending back-to-back memory read requests to the host
► Measured throughput (MB/s):
  Endpoint               Standalone   Read Pacing Off   Read Pacing On
  GE NIC                 112.15       20.87             112.24
  PLX packet generator   498.355      496.315           496.125
► As % of standalone performance:
  Endpoint               Read Pacing OFF   Read Pacing ON
  GE NIC                 18.6              100.1
  PLX packet generator   99.6              99.6

PCIe Multicast
► Address-based: the Multicast BAR creates a multicast space; posted packets that hit in the Multicast BAR are multicast
► Reliability: PCIe has a low error rate and hop-by-hop error-free transmission
► Supports legacy: any source can send posted packets into the multicast space; legacy devices can be multicast targets
► Multicast ECN
  • Specifies how switches, Root Complexes, and endpoints implement multicast
  • Improves flexibility and protection for endpoints participating in multicast
  • Also allows use of legacy endpoints

Multicast – Example
► Support for 64 multicast (MC) groups
► All ports can be programmed as an MC source
► One source port can have multiple MC groups
► One destination port can be part of multiple MC groups
► MC can be done across an NT port
(Diagram: CPU/chipset as the source port on a PEX 8680, with destination ports to I/Os and GPUs – the CPU sends a single command to multiple I/Os (group 1) or to multiple GPUs (group 2).)

Multicast Memory Space
► The multicast region starts at Multicast_Base and is divided into equal windows of 2^Multicast_Index_Position bytes: Multicast Group 0 memory space, Group 1, Group 2, Group 3, … Group n

Multicast and Address Routing
► A request that hits a multicast address range is routed unchanged to the ports that are part of the multicast group derived from the request address (see the decode sketch below)
► PCIe standard address routing is not used for multicast, including the default upstream route

Peer-to-Peer & Peer-Plus-Host Multicast
(Diagram: a root above a PCIe switch built from P2P bridges on a virtual PCI bus, with four endpoints below – multicast can target peer endpoints and/or the root.)
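The address decode implied by the two multicast slides above can be summarized in a few lines of C: a posted write whose address falls inside the multicast BAR is assigned to the group selected by the address bits above Multicast_Index_Position, and everything else follows standard address routing. The base address, index position, and helper below are illustrative values, not real register settings.

```c
#include <stdio.h>
#include <stdint.h>

#define MC_BASE           0x80000000ull  /* Multicast_Base (example value)    */
#define MC_INDEX_POSITION 20             /* each group window = 2^20 = 1 MB   */
#define MC_NUM_GROUPS     64             /* switch supports 64 MC groups      */

/* Returns the multicast group for an address, or -1 if it is not multicast. */
static int mc_group_for_address(uint64_t addr)
{
    uint64_t window = 1ull << MC_INDEX_POSITION;
    uint64_t limit  = MC_BASE + (uint64_t)MC_NUM_GROUPS * window;

    if (addr < MC_BASE || addr >= limit)
        return -1;                       /* falls back to standard routing    */
    return (int)((addr - MC_BASE) >> MC_INDEX_POSITION);
}

int main(void)
{
    uint64_t samples[] = { 0x80000000ull, 0x80150000ull, 0x84000000ull };
    for (int i = 0; i < 3; i++) {
        int g = mc_group_for_address(samples[i]);
        if (g < 0)
            printf("0x%llx: unicast (standard address route)\n",
                   (unsigned long long)samples[i]);
        else
            printf("0x%llx: multicast group %d\n",
                   (unsigned long long)samples[i], g);
    }
    return 0;
}
```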
MC in Graphics & Floating-Point Acceleration
► Dual-headed graphics: each GPU paints ½ the screen
  • Multicast commands downstream (e.g., a vector list)
  • Use peer-to-peer to transfer the bit map from GPU2 to GPU1
► General floating-point acceleration: some GPUs need to see the same data
  • Push data or commands downstream to multiple GPUs/FPUs
(Diagram: CPU/Root Complex connected x16 to a PCIe switch, which connects x16 to GPU1 and GPU2.)

MC in Communication Systems
► A 40G (GE) line card may need to split processing over 4 NPUs via MC
► A service card on an AMC may need to MC packets to an FPGA & RegEx engine
(Diagram: PEX 8696 connecting a CPU (RGMII), FPGA, RegEx, NPU 1–NPU 4, and an AMC.)

MC in Storage
(Diagram: up to 8 processor boards – CPU, chip set, memory, PEX 8624 – connected over backplanes to PEX 8664 MR switches with MC-enabled ports feeding I/O drawers of I/O cards.)

PLX PEX 8600 Buffer Allocation
► Shared memory pool per 16 lanes
► The user assigns buffers per port width
  • Sets minimum buffers per port
  • Also creates a common pool
► Ports dynamically grab buffers as needed
  • Grab when utilization of the assigned buffers exceeds a user-assigned threshold (25% by default)
  • Return empty buffers to the pool
(Diagram: assigned buffers for x4/x4/x2/x2 ports plus a common buffer pool.)

Dynamic Allocation → Appropriate Buffers
► Static buffers per port: fixed 5 packet buffers per port for all port widths
  • Unused buffers; cannot assign based on traffic load; cannot move buffers between ports
  • …LOWER PERFORMANCE
► PLX shared memory pool: buffers assigned as needed
  • All buffers usable; assign based on traffic load; move buffers between ports
  • …HIGHER PERFORMANCE
(Diagrams: static per-port buffers for x8/x1/x4/x8 ports vs. the shared memory pool.)

Multi-Root Architecture

What Is Multi-Root?
► A root (upstream) port connects in the direction of the CPU
► A multi-root device has more than one upstream port, providing connections for two or more CPUs
► Note: endpoint sharing between CPUs is not supported
(Diagrams: a single-root switch with one upstream port vs. a multi-root switch with two upstream ports.)

Multi-Root Benefits in a PEX Switch
► Up to eight upstream ports – eight independent subsystems
► Efficient use of interconnect: mix-and-match the number of endpoints per CPU based on performance needs; up to 24 ports supported
► Failover/redundancy: re-assign the endpoints of a failed host
► Smaller footprint, lower power
(Diagram: a host manager and hosts, each with its own Root Complex, sharing a PLX MR switch with PCIe endpoints and downstream switches.)

Transparent and Multi-Root Modes
► PEX MR switches support two functional modes
  • Mode 0 – transparent switch: same as today's switches
  • Mode 1 – multi-root switch: advanced architecture with multiple upstream ports
(Diagram: Mode 0 with one CPU vs. Mode 1 with two CPUs above the switch ["Cygnus"].)

Generic Rack-Mount Server
(Diagram: CPU and chip set above a PEX 8696 fanning out to I/Os.)

Storage Server with Failover
(Diagram: two CPU/chip-set pairs connected to a PEX 8696 through an NT port, sharing I/Os.)

Servers Sharing a PCIe MR Switch
► Each CPU (up to 8 supported) has its own dedicated I/Os, isolated from the other CPUs
(Diagram: three CPU/chip-set pairs sharing a PEX 8696, each with dedicated I/Os.)

Use of Mode 2 for Fail-over
(Diagram: two CPU/chip-set pairs and endpoints cross-connected through two PEX 8664 devices over x4 & x8 links.)

Packet-Ahead™
Two-virtual-channel implementation through Traffic Class re-mapping

Traffic Classes and Virtual Channels
► Traffic Class (TC)
  • Specifies the priority of a given PCI Express packet
  • PCI Express supports eight TCs, TC0–TC7 (TC7 → highest priority)
  • The TC for a request/completion pair must be the same
► Virtual Channel (VC)
  • Buffer entity used to queue PCIe packets (1 VC = 1 buffer entity, 2 VCs = 2 buffer entities)
  • VC assignment for a given TC is by priority: low-priority traffic shares one VC, high-priority traffic has its own VC

One Wire, Multiple Traffic Flows
► Traffic Class determines the priority of packets
► Packets are mapped to VCs according to priority: highest priority TC7 → VC2; lower priority TC6–TC4 → VC1; lowest priority TC3–TC0 → VC0 (see the mapping sketch below)
► One VC → the same priority for ALL packets
(Diagram: one PCI Express wire carrying TC7/VC2, TC6–TC4/VC1, and TC3–TC0/VC0 flows.)
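A minimal sketch of the TC-to-VC grouping on this slide (TC7 → VC2, TC6–TC4 → VC1, TC3–TC0 → VC0). In a real system the mapping is negotiated per link through the VC capability structures rather than hard-coded; the function below only illustrates the priority grouping.

```c
#include <stdio.h>

/* Map a Traffic Class to a Virtual Channel per the three-VC example above. */
static int vc_for_tc(int tc)
{
    if (tc == 7)            return 2;   /* highest priority                */
    if (tc >= 4 && tc <= 6) return 1;   /* mid priority                    */
    return 0;                           /* TC3-TC0: lowest priority        */
}

int main(void)
{
    for (int tc = 0; tc <= 7; tc++)
        printf("TC%d -> VC%d\n", tc, vc_for_tc(tc));
    return 0;
}
```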
TC–VC Mapping in the PCIe Hierarchy
► Each device supports a different number of VCs
► TC/VC mapping is set according to device capabilities, on a per-link basis
► VC arbitration schemes enable QoS
► No ordering between VCs; independent buffers

Packet-Ahead Feature
► Allows the NT port to modify the original Traffic Class (TC) of a PCIe packet, from TC0 to TCx (where TCx is TC1–TC7)
► Benefits
  • Provides two separate data paths for memory traffic: low priority and high priority
  • Enhanced QoS regardless of the CPU's single-VC limitation – differentiation of traffic in single-VC systems
► Available in the PEX 8618, PEX 8614, and PEX 8608

Example Without Packet-Ahead
► The CPU supports VC0 and TC0 only – no differentiation of traffic
► The endpoints and switch support two VCs and at least two TCs
► Single path to the CPU
► The system is limited by the CPU's capabilities
(Diagram: CPU above a PEX 8618 with an NT port and three ASIC endpoints.)

Example With Packet-Ahead
► Same CPU limitations: VC0 and TC0 only
► Same endpoint and switch capabilities: VC0–VC1, TC0–TC1
► Two paths to the CPU: via the upstream port and via the NT port
► For packets received on the NT port, the TC is changed from TC0 to TC1 and the packets are mapped to the high-priority VC1
► Packets received on the upstream port are unaffected
(Diagram: CPU above a PEX 8618 with an NT port and three ASIC endpoints.)

Packet-Ahead Transaction Details
► Posted traffic (memory writes)
  • The CPU generates a posted packet with TC0
  • The NT port modifies the packet to TC1
► Non-posted traffic (read requests)
  • The CPU generates a read request with TC0
  • The NT port modifies the packet to TC1
  • The endpoint sinks the request and provides a completion with TC1
  • The NT port modifies the completion packet back to the original TC0

Direct Memory Access (DMA) Inside PLX PCIe Switches

DMA Benefits
► Independent data mover
  • Can transfer small and large blocks of data with no CPU involvement
  • Can transfer data between all switch ports
► Centralized DMA engine
  • The processor/chipset no longer needs to support DMA → more selection → lower cost
  • Software consolidation across multiple platforms – software code for one DMA engine
► Improves system performance – low-latency transfers while sustaining Gen 2 speeds

PCIe Switch with DMA
► The PCIe switch is a multi-function device
  • Function 0 → PCI-to-PCI bridge: transparent switch model, no driver required
  • Function 1 → DMA endpoint: Type 0 configuration header, memory-mapped registers, requires a DMA driver (provided by PLX)
(Diagram: upstream port containing the DMA function and a P-P bridge, a virtual bus, and P-P bridges to the downstream ports.)
► Available now: PEX 8619 (16-lane/16-port), PEX 8615 (12-lane/12-port), PEX 8609 (8-lane/8-port)

DMA Implementation
► Four DMA channels – each channel:
  • Works on one descriptor at a time
  • Has a unique Requester ID (RID)
  • Has a programmable Traffic Class (TC) for QoS
► DMA descriptor: specifies source address, destination address, transfer size, and control; internal or external
► DMA read function: initiates read requests and collects completions by matching RID and tag number
► DMA write function: converts read-completion streams into memory-write streams

DMA Descriptor Overview
► Descriptors are instructions for the DMA
  • Written by the CPU
  • Stored in a ring in host memory OR stored internal to the PCIe switch
► 16 B standard format: supports 32-bit addressing; control/status information
► 16 B extended format: supports 64-bit addressing (SrcAddrH, DstAddrH); control/status information
(Diagram: descriptor ring, descriptor 0 through descriptor N–1; descriptor fields: DstAddr, SrcAddr, SrcAddrH/DstAddrH, transfer size, control. A field-layout sketch follows below.)
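The slide names the descriptor fields (destination address, source address, upper address bits for the extended format, transfer size, control/status) but not their exact packing, so the C layout below is an assumption made purely for illustration — the device data book defines the real bit assignments.

```c
#include <stdint.h>

/* Illustrative 16-byte descriptor layout -- field order and widths assumed. */
struct plx_dma_desc {
    uint32_t size_ctrl;   /* transfer size plus control/status bits (e.g. a   */
                          /* valid flag the engine can clear when done)       */
    uint32_t dst_addr;    /* destination address, lower 32 bits               */
    uint32_t src_addr;    /* source address, lower 32 bits                    */
    uint32_t addr_hi;     /* SrcAddrH/DstAddrH bits used by the extended      */
                          /* 64-bit format; unused in the 32-bit format       */
};

/* Descriptors live in a ring in host memory (or inside the switch). */
#define RING_ENTRIES 256
static struct plx_dma_desc desc_ring[RING_ENTRIES] __attribute__((aligned(16)));
```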
DMA Descriptor Prefetch
► A DMA channel prefetches 1 to 4 descriptors at a time when in external-descriptor mode
► Internal buffer support for up to 256 descriptors:
  Active channels   Descriptors per channel
  1                 256
  2                 128
  4                 64
► Descriptors are prefetched into the internal buffer until it is filled
  • Control in place for the number of descriptors to be prefetched: 1, 4, 8, or the maximum per channel
► Invalid descriptors are dropped
  • No further descriptor fetch until software clears the status
  • Interrupt optionally enabled

DMA Runtime Flow – Host to I/O
(Diagram: CPU, Root Complex memory, switch, and FPGA/ASIC endpoints; CPU tasks shown in orange.)
1. The CPU programs descriptors in RAM
2. The CPU enables the DMA
3. The DMA reads the descriptors in RAM
   a. The DMA prefetches 1–256 descriptors
4. The DMA works on one descriptor at a time
   a. The DMA reads the source
   b. Completions arrive in the switch
   c. Completions are converted to writes
   d. The DMA writes to the destination
   e. (There can be multiple reads/writes per descriptor)
   f. Clears the valid bit on the descriptor after the last write (optional)
   g. Interrupts the CPU after the descriptor (optional)
   h. Starts the next descriptor
5. End of ring (DMA done)
6. The CPU receives and handles interrupts

DMA Performance
► Ordering is enforced within a DMA channel only
  • Descriptors are read in order from host memory
  • Data within a descriptor is moved in order
    – Read Requests (MRd) are strictly ordered
    – Partial completions per MRd follow the PCIe spec; out-of-order tags are re-ordered on chip
    – Write Requests (MWr) are strictly ordered
► Full line-rate throughput
  • One channel can saturate one link in one direction (x8 at 5 GT/s with 64 B read completions)
  • Two channels can saturate both directions (x8 at 5 GT/s with 64 B read completions)
► Programmable interrupt control: fewer interrupts → less CPU utilization
► Data-rate controls in place to cap the maximum read bandwidth: transfer the Max Read Size every X clocks (programmable)

Data Integrity

Data Integrity and Error Isolation
► PLX supports protection against PCIe errors, providing a robust system through data integrity & error isolation
► PCIe error types
  • Malformed packets
  • EP (poisoned TLPs) & ECRC errors
  • 1-bit ECC & 2-bit ECC
  • LCRC
  • PHY errors: disparity, 8b/10b encoding, scrambler & framing
  • Receiver overflow, flow-control protocol errors
  • PLX device-specific: ECC, UR overflow

Data Integrity
► Internal data-path protection from ingress to egress
  • Complete data protection through ECC
  • ECRC on ingress and egress ports
► Higher performance by reducing re-transmission

Error Isolation – Fatal Errors
► User-selectable behavior on fatal errors
► Malformed-packet or internal fatal-error handling
  • Mode 1 (default): assert the FATAL_ERR# pin and send an error message
  • Mode 2: generate an internal reset (equivalent to an in-band hot reset)
  • Mode 3: block all packet transmission; cancel packets in transit with EDB
  • Mode 4: block all packet transmission; bring the upstream link down to cause a surprise-down

Error Isolation – EP/ECRC
► User-selectable behavior on packet errors
► Poisoned-packet (EP) & ECRC error handling
  • EP/ECRC Mode 1 (default): forwarded, with appropriate logging
  • EP/ECRC Mode 2: drop the EP/ECRC packet – not forwarded, only logged
  • EP/ECRC Mode 3: block the violating device – drop and block EP/ECRC packets
Gen 1 Electrical & Mechanical
► Voltages – 3 sources: 1.0 V core, 1.5 V SerDes I/O, 3.3 V I/O
► Thermal: industrial temperature available on most products

  Device     Package size*     Typical power   Max. power
  PEX 8505   15 x 15 mm        0.8 W           1.4 W
  PEX 8508   19 x 19 mm        1.6 W           2.5 W
  PEX 8509   15 x 15 mm        1.2 W           1.8 W
  PEX 8511   15 x 15 mm        1.0 W           1.6 W
  PEX 8512   23 x 23 mm        2.2 W           3.1 W
  PEX 8513   19 x 19 mm        1.3 W           2.6 W
  PEX 8516   27 x 27 mm        3.2 W           4.3 W
  PEX 8517   27 x 27 mm        2.6 W           3.6 W
  PEX 8518   23 x 23 mm        2.6 W           3.6 W
  PEX 8519   19 x 19 mm        1.7 W           2.6 W
  PEX 8524   31 x 31 mm        3.9 W           6.1 W
  PEX 8525   31 x 31 mm        2.6 W           3.8 W
  PEX 8532   35 x 35 mm        5.7 W           7.4 W
  PEX 8533   35 x 35 mm        3.3 W           4.8 W
  PEX 8547   37.5 x 37.5 mm    4.9 W           7.1 W
  PEX 8548   37.5 x 37.5 mm    4.9 W           7.1 W

* All PBGA, 1.0 mm pitch.
Typical: 35% lane utilization, typical voltages, 25°C ambient temperature.
Maximum: 85% lane utilization, max operating voltages, across industrial temperature.

Gen 2 Electrical & Mechanical
► Voltages – 2 sources: 1.0 V core & SerDes I/O, 2.5 V I/O
► Thermal: commercial temperature

  Device     Package size   Typ. (Gen 2)   Typ. (Gen 1**)   Max. (Gen 2)   Max. (Gen 1**)
  PEX 8648   27 x 27 mm     3.7 W          3.0 W            8.6 W          7.9 W
  PEX 8647   27 x 27 mm     2.8 W          2.1 W            7.4 W          6.7 W
  PEX 8632   27 x 27 mm     2.7 W          2.2 W            6.4 W          5.9 W
  PEX 8624   19 x 19 mm     1.9 W          1.5 W            4.6 W          4.3 W
  PEX 8616   19 x 19 mm     1.7 W          1.5 W            4.3 W          4.1 W
  PEX 8619   19 x 19 mm     1.80 W         1.58 W           4.53 W         4.08 W
  PEX 8618   19 x 19 mm     1.75 W         1.54 W           4.47 W         4.04 W
  PEX 8617   19 x 19 mm     1.6 W          1.4 W            4.26 W         3.85 W
  PEX 8612   19 x 19 mm     1.6 W          1.4 W            4.2 W          4.0 W
  PEX 8615   19 x 19 mm     1.6 W          1.37 W           4.01 W         3.65 W
  PEX 8614   19 x 19 mm     1.5 W          1.33 W           3.96 W         3.60 W
  PEX 8613   19 x 19 mm     1.4 W          1.2 W            3.80 W         3.4 W
  PEX 8609   15 x 15 mm     1.33 W         1.16 W           3.51 W         3.20 W
  PEX 8608   15 x 15 mm     1.31 W         1.15 W           3.51 W         3.20 W
  PEX 8606   15 x 15 mm     1.25 W         1.09 W           3.32 W         3.04 W
  PEX 8604   15 x 15 mm     1.18 W         1.03 W           3.13 W         2.89 W

** Preliminary estimates.
Typical: 35% lane utilization, typical voltages, 25°C ambient, L0s mode.
Maximum: 85% lane utilization, L0 mode, max operating voltages.

End of Presentation
Thank You
www.plxtech.com