Arria 10 External Memory Interface Design Guidelines
Quartus II Software v13.1 Arria 10 Edition

Arria 10 design guidelines are preliminary and subject to change.

Contents
• Introduction
• Software requirements
• Generating interface IP and example design project
• Generating example design files
• Generating simulation design example files
• Simulation guidelines
• Timing closure guidelines
• Fitter guidelines
• DDR4 / DDR3 interface pin guidelines

Introduction
• Altera EMIF IPs have an optional example design to demonstrate a complete interface solution
• This design can be used by customers for initial interface validation
• Arria 10 example design improvements
  • Faster generation
  • Automatic pin assignments: the pin_assignments.tcl script is neither created nor needed

Software Requirements
• Quartus II software version 13.1 Arria 10 Edition

Generating External Memory Interface IP and Example Design Project

New Features: Arria 10 External Memory Interface (EMIF) IP
• All memory protocols are generated through a single IP
  • Select your protocol in the Arria 10 External Memory Interface MegaWizard GUI
• Fast generation mechanism
  • Faster IP and example design generation
• Automatic pin assignments
  • I/O standard and pin termination assignments are created during generation
  • No pin_assignments.tcl file
• Synthesis and simulation filesets are identical
• Ability to create a memory configuration preset that can be used in different designs
• Select the PLL reference clock frequency and FPGA termination settings directly in the MegaWizard GUI

Generating Interface IP and Example Design Project
The following steps and slides demonstrate how to create the memory interface IP and the example design project.
1. Open Quartus and launch MegaWizard Plug-in Manager from the Tools menu.

Select Create New Megafunction
2. Select ‘Create a new megafunction variation’ and click Next.

Establish New MegaCore Type and Name
3. Select the ‘Arria 10 External Memory Interfaces v13.1’ IP under Interfaces->External Memory.
4. Select VHDL or Verilog HDL.
5. Enter the IP variation name for the memory IP. This name is used for the <variation_name>_example_design directory and for the IP files in your workspace.
6. Click Next to configure the memory IP.

Configuring the Interface IP
7. Select the memory protocol from the drop-down list.
8. Set the desired interface frequency.
9. Configure the memory IP by selecting the appropriate settings under the different tabs on this page.
Note: Predefined configurations are available for various memory devices.
• Pick the desired memory device preset from the list and click Apply to populate all the fields with the vendor-specified settings
• A custom preset can also be created by clicking ‘New’ and then entering the configuration data

New Options in the MegaWizard
• Select the PLL reference clock frequency from the drop-down menu
  • The allowed PLL reference clock values are calculated based on the interface frequency
  • The on-board oscillator frequency must be one of these values for the memory interface to function properly
• Select the FPGA on-chip termination settings directly in the GUI

Creating the Example Design
10. After configuring the IP, click Finish. A window pops up asking whether to create an example design.
11. Ensure the ‘Generate Example Design’ option is selected and click Generate.

Interface IP and Example Design Output
• After IP generation is complete, the <variation_name>_example_design directory is created in your project directory
• In this example, the variation name is ddr3, and the script files needed to create an example design are available in ddr3_example_design

Two TCL Scripts Created
<variation_name>_example_design contains two TCL scripts:
I. make_qii_design.tcl
   Generates a synthesizable design example along with a Quartus project, ready for compilation
II.
make_sim_design.tcl
   Generates a simulation design example along with tool-specific scripts to compile and elaborate the necessary files

Generating the Example Design Files
To generate a synthesizable design example, run the make_qii_design.tcl script in the Nios II command shell or from a command line:
12. Open the Nios II command shell and browse to the <variation_name>_example_design directory, or change directory to <variation_name>_example_design from a command line.
13. Run the make_qii_design.tcl script by executing the following command:
    quartus_sh -t make_qii_design.tcl
    Optionally, run the make_qii_design.tcl script for a specific device:
    quartus_sh -t make_qii_design.tcl 10AX115R3F40I2SGES

Example Design Script Output
• The script runs for a few seconds and produces a qii directory containing a project called ed_synth.qpf
• Open and compile this project with the Quartus II software v13.1 Arria 10 Edition
  • A Qsys file is also generated
  • Open this file in Qsys to add, remove, or modify IPs in the example design

Generating Simulation Files for Example Design Project
Overview (details are in the simulation guidelines section)

Generating the Simulation Design Example Files
• To generate a simulation design example, run the following script in the Nios II command shell or from the command line, with an option of VERILOG or VHDL:
    quartus_sh -t make_sim_design.tcl VERILOG
• The simulation design example is made of a driver connected to the generated IP (device under test, or DUT) and to the memory model
• The driver generates random traffic and internally checks the legality of the outgoing data

[Figure: Example Design testbench. Blocks: Driver (Traffic Generator) with Pass/Fail, Arria 10 EMIF IP core (Controller, AFI, PHY), Memory Model. Interface labels: Avalon, Memory.]

Simulation Example Design Script Output
• The script creates a sim directory containing one subdirectory for each supported simulation tool
• Each subdirectory contains the specific scripts to run simulation with the corresponding tool

Arria 10 Simulation
Guidelines

Arria 10 Simulation Guidelines are preliminary and subject to change.

Simulation
Users will be able to choose between two simulation models:
• Skip Calibration
  • Fastest simulation
  • Loads the settings calculated from the memory configuration and enters user mode
• Full Calibration (not supported in Quartus II 13.1)
  • Performs all stages of memory calibration: calibration phases, delay sweeps, and centering of all data bits

Skip Calibration mode                              Full Calibration mode
System-level simulation focusing on user logic     Memory interface simulation focusing on calibration
Details of calibration are not captured            Details of calibration are captured (i.e., stages)
Enables users to store and retrieve data           Includes leveling, per-bit deskew, etc.
Efficiency accurate                                No board skews are taken into account

Simulation: Supported & Not Supported
Supported:
• Functional Verification
• Skip Calibration (default)
Not Supported:
• Timing Verification
• Nativelink
• Memory Vendor Models
• Full Calibration*
• Post-Fit Simulation*
• Multi-Rank*
• Multiple-CS memory interface*
• Memory frequency < 400 MHz*
• RDIMM & LRDIMM configurations*
*Available in a future Quartus II version
Note: Validating the timing of your design requires using Altera’s TimeQuest Timing Analyzer.

Simulation
To simulate your design, you need the following components:
• Altera-supported simulator
• Design using the Altera External Memory Interfaces IP
• An example driver (Altera or user)
• Testbench (Altera or user)
• Altera’s memory simulation model (simulation with memory vendor models is not supported)

Information About Simulation Filesets & Directory
• Core simulation filesets are identical to core synthesis filesets
  • Addresses simulation v synthesis fileset concerns of the past
  • Ensures users are simulating the blocks they are synthesizing
    <dut>/*
    <dut>_sim/altera_emif_arch_nf/*
• Any changes made in the synthesis directory should also be made in the simulation directory to reflect similar IP behavior
• Fewer files to compile for simulation compared
to the External Memory Interfaces IP of the past (UniPHY, ALTMEMPHY)
  • 64 files for the example design
  • 7 are unique for each interface
• Users can modify the BIST in the example design if needed

Important Assumptions Made by Simulation
Altera library simulation atoms assume the following:
• RTL simulation assumes an ideal layout, including:
  • Interfaces are unaware of each other
    • An interface assumes it is the only interface in the column
    • An interface believes it has its own IOAUX and Hard Nios
  • The interface is at the bottom of the column, nearest to the physical IOAUX block location
    • The Fitter may actually place an interface at the top of the column if left unconstrained, but there are no drawbacks between an interface at the top of the column and an interface at the bottom of the column
• One PLL per interface
  • At Post-Fit, it is possible for interfaces to share the same bank PLL
• PLL reset occurs only during power-up
  • Issue a recalibration request per-EMIF* interface in place of a PLL reset
*EMIF = External Memory Interfaces

RTL Simulation v Post-Fit Implementation
• There may be a discrepancy between the simulated latency and the Post-Fit latency
• Do not rely on the simulated interface latency

[Figure: bank 0 and bank 1 (Lane 0 to Lane 3 each) shown in RTL Simulation and, after Fitter Operations, in Post-Fit Implementation.]

RTL Simulation v Post-Fit Implementation
• Introduction of AFI Clock Cycle penalty:
  • For wide, multi-bank interfaces and/or ultra-low latency interfaces
  • The Fitter can detect this penalty and will issue a warning accordingly
  • Only an issue when the requested write latency is less than the latency accrued by the farthest-away bank

[Figure: AFI Clock Cycle Penalty across bank 0, bank 1, and bank 2 (Lane 0 to Lane 3 each).]

RTL Simulation v Post-Fit Implementation
• RTL Simulation: NIOS initialization and calibration code executes in parallel for all
interfaces
  • Interfaces might assert ‘cal_done’ (calibration done) simultaneously in simulation
  • Do not rely on this behavior shown in simulation
• Post-Fit Implementation: NIOS initialization and calibration code executes sequentially
  • The order of calibration is determined by Fitter operations
  • Calibration is complete when all interfaces in a column assert ‘cal_done’
• You must sample all cal_done signals in a column to determine when calibration is complete

Generating the Example Design

Simulating the Example Design
Step 1: Generate the Design

Generation Output
File/Directory         Description
dut/                   Actual IP for your project
dut_example_design/    Example design subfolder
dut_sim/               Simulation fileset only (no driver, etc.)
dut.cmp                Component Declaration File (text file containing port definitions that can be used in VHDL Design Files)
dut.qip                Quartus IP File (contains paths for all files needed for the IP core)
dut.spd                Simulation Package Descriptor File (lists the required simulation files for the IP core or Qsys system)
dut.sip                Simulation IP File (contains information assignments that specify IP simulation source files)
dut.ppf                Pin Planner File (XML file that stores the port and node assignments for use with the Pin Planner)
dut.v                  Variation File of the IP core (contains the IP settings used to generate the IP core)

Core files in each subdirectory are identical:
• dut_example_design/*: Contains driver and memory model
• dut_sim/* and dut/*: Identical

Example Design Output Files
In dut_example_design:
File                   Description
ed_sim.qsys            Qsys file capturing the simulation example design
ed_synth.qsys          Qsys file capturing the example design for synthesis
make_qii_design.tcl    Script to generate the example design project for synthesis
make_sim_design.tcl    Script to generate the example design for simulation
params.tcl             Support file for the generation scripts
params.txt             XML file created that holds user-chosen IP settings
readme.txt             Instructions for user

Generating the Example Design Simulation Files
• Run “quartus_sh -t make_sim_design.tcl VERILOG” to generate the simulation files for the example design in Verilog
• Run “quartus_sh -t make_sim_design.tcl VHDL” to generate the simulation files for the example design in VHDL

Modelsim Example Flow
• You can create your own .do file in order to view signals in Modelsim’s waveform viewer
  • Example: run.do
  • Execute ‘do run.do’ in the Modelsim console to run the simulation with signal waveforms
• To see only a successful simulation result in the console messages:
  • Execute ‘source msim_setup.tcl’ in the Modelsim console
  • Execute ‘ld_debug’ after msim_setup.tcl is loaded
  • Execute ‘run -all’ after ld_debug finishes

Modelsim Example Guide
• To store the entire log of the simulation data and results, edit msim_setup.tcl and add ‘-l ed_sim.log’ to the vsim line

Simulating the Example Design
Supported simulators:
Supported                      Not Supported
Mentor Graphics Modelsim       Aldec Riviera-PRO*
Synopsys VCS and VCS-MX        Cadence NCSIM*
*Available in a future Quartus II version

Arria 10 EMIF Timing Closure Guidelines

Arria 10 timing closure guidelines are preliminary and subject to change.

Arria 10 EMIF Timing Paths
• User Logic (Core)
• Periphery
• IO, including read, write, write leveling, etc.

Timing Closure Guidelines
• Timing closure in any core transfers includes:
  • From the last set of registers in core to the first set of registers in periphery (C2P)
  • From the last set of registers in periphery to the first set of registers in core (P2C)
  • Note that C2P/P2C paths are cut and not analyzed in the current Quartus II release
  • Core timing analysis will not include user logic timing nor user logic timing to/from the EMIF block
• Arria 10 timing analysis will not show any periphery-to-periphery timing in TimeQuest
• Timing closure in any of the IO transfers:
  • Not dependent on Quartus II compile
  • Dependent on customer memory, FPGA speed grade parameters, and channel effects
  • For accurate timing analysis, simulate correct board parameters and channel effects
  • Board skews must be simulated using a board tool (not estimated or calculated via trace length)
  • Channel effects (ISI and crosstalk) can only be determined by a board simulator
  • Include simulated board skew and channel effects in the MegaWizard GUI during IP generation
  • Refer to the board guidelines for more details on board skew and channel effects
  • “ReportDDR” will run automatically as part of signoff timing analysis
• The user has to check “ReportDDR” as part of signoff timing analysis to make sure the EMIF has closed timing

Estimate Early IO Timing without Quartus II Compilation
• Users can see IO margins without compiling the EMIF design
  • Early IO timing looks like a spreadsheet-type analysis shown in a TimeQuest panel
  • Provides a breakdown of margin loss between receiver, transmitter, and channel
• Flow
  • Generate the EMIF IP with the configuration of interest, including memory and board parameters
  • Create Quartus II project files (QPF, QSF) with the selected Arria 10 device part
  • Run TimeQuest with the <name>_report_io_timing.tcl script that is generated as part of the EMIF IP generation
  • Details are in the next slide

[Figure: Early IO estimate flow. Flow labels: EMIF source (.tcl files); Quartus project creation (including part selection); Quartus project files (.qpf/.qsf); Run _report_io_timing.tcl in TimeQuest. TimeQuest panel labels: Ideal Window, Channel Effects, Memory (Receiver) Effects, FPGA (Transmitter) Effects, Final Margin.]

Running TimeQuest for Early IO Estimates
1. Start TimeQuest.
2. Open the project.
3. Pick “Script -> Run TCL script”.
4. Pick the <name>_report_io_timing.tcl file.
5. TimeQuest prints out a summary and creates a “ReportDDR” panel.
   • Same level of detail for IOs
   • Produces a warning mentioning that core timing is not included
   • A similar type of analysis is available for all IO transfers (read capture, DQS gating, A/C, and write leveling)
Note: To generate early IO timing reports, run <name>_report_io_timing.tcl before running any Quartus II compilation.

Early IO Estimates: Other Execution Methods
• Instead of “Script -> Run TCL script”, type the following in the TCL console:
    source <name>_report_io_timing.tcl
• Or, at the command prompt, type:
    quartus_sta -t <core_name>_report_io_timing.tcl <project_name>
Note: Positive margins in the early IO timing estimate do not guarantee that signoff timing analysis will pass.

Arria 10 Fitter Guidelines

Arria 10 fitter guidelines are preliminary and subject to change.

Fitter Behaviors
• Multi-bank interface: multiple, contiguous banks that make up one interface
• Introduction of AFI Clock Cycle penalty:
  • For wide, multi-bank interfaces and/or ultra-low latency interfaces
  • The Fitter can detect this penalty and will issue a warning accordingly
  • Only an issue when the requested write latency is less than the latency accrued by the farthest-away bank in a multi-bank interface

Clocking
• Since multi-bank interfaces use multiple PLLs and, in turn, multiple PHY clock trees, a reference clock tree is used to route a common reference clock signal to all PLLs
  • Not all pins can drive the PLL reference clock tree
• The Quartus II software restricts PLL reference clock frequencies depending on the memory frequency
  • Use the Arria 10 External Memory Interfaces IP MegaWizard GUI to determine the valid PLL reference clock frequencies
• The Fitter merges PLLs when a bank is shared by different interfaces
• The Fitter duplicates the PLL for multi-bank interfaces

Clocking
Example of the reference clock tree driving multiple PLLs which are, in turn, driving multiple PHY clock trees
[Figure: Balanced Reference Clock Network feeding four PLLs, each driving a PHY clock tree. Jitter is lowered with this balanced structure.]

Sharing Resources
In Arria 10, the following resources can be shared and in some cases are forced to be shared:
Resource                     Implication
I/O bank                     Ability to fit more interfaces in a single column
Hard Nios II                 Cannot rely on one cal_done signal as representative of all interfaces passing the calibration stage
Core Clock Network           Shared PLL reference network. Users should place interfaces in consecutive banks
PLL reference clock pins*    Shared PLL reference clock and network trees
OCT block and RZQ pin*       None
Address/Command pins*        Shared for Ping Pong PHY
*More in Pin Guidelines

Certain resources are forced to be shared:
• IOAUX & Hard Nios II CPU for all interfaces in a column
• A bank shared by two interfaces
• PLL/DLL do not need to be shared, as each bank has one

Sharing an I/O Bank
• The Fitter can place interfaces in a shared bank if the interfaces share the same:
  • Protocol
  • Rate
  • Phase
  • Frequency
• Users can fit even more interfaces in a column
• Interfaces cannot share the same controller nor sequencer
• The Fitter will not allow users to have a lane shared by two interfaces
  • One DQS-in tree can only talk to one controller
• Unused pins can be used by the customer as GPIO
  • Must be the same voltage standard

[Figure: Example of two x16 interfaces sharing a bank, across Bank N+1, Bank N, and Bank N-1. Labels: Fixed Address/Command pinout (two instances), Data path (three instances), Unused (free for GPIO, but not LVDS) (two instances).]

Sharing Hard Nios / IOAUX
• Interfaces placed within the same column by the Fitter will share the same IOAUX and Hard Nios II
• The Hard Nios II calibrates each interface sequentially
  • You must sample all cal_done signals in a column to determine when calibration is complete
• RTL simulation behaves as if every interface has its own Hard Nios II
  • More on this in the Simulation Design Guidelines

Sharing
Hard Nios II Processor
• The Arria 10 External Memory Interfaces IP contains one Hard Nios II and IOAUX per interface, but the Fitter merges them all into a single instance
• You must use the same IOAUX clock and reset for all interfaces in the same column, or the Fitter will generate an IOAUX error

[Figure: one column of four banks, bank 0 to bank 3 (Lane 0 to Lane 3 each).]

Sharing Core Clock Networks
• The Fitter can use one core clock domain to synchronously access all interfaces in a column
• Users can share core clock networks through the master & slave setting in the IP generation GUI
  • Connect core_clks_master_out from the master to the core_clks_slave_in of every slave
• Must use the same column, PLL reference clock, rate, and frequency
  • Interfaces in different columns cannot use this feature
• Place interfaces in consecutive banks, as the PLL reference clocks are forced to be shared when choosing to share core clock networks

Fitter Relationship to Pin Assignments
Pin assignments:
• The Fitter can reallocate banks based on user pin assignments
• The Fitter can rotate pins within a lane based on user pin assignments but cannot move pins across lanes away from their DQS group
• Users can constrain a DQS pin to a lane, and the Fitter will place all DQ signals in their respective DQS group in the same lane

Arria 10 Interface Pin Guidelines

Arria 10 Pin Guidelines are preliminary and subject to change.

Overview
• Pin Guidelines
  • Guidelines
  • Rules for constraining pins
  • Determine IO bank requirements for DDR3, DDR4
  • Interface placement
  • Find pin names for A/C and data pins
  • Example for constraining DDR3 x8 and x72
  • Alternate methods for constraining interface pin assignments
• Pin guidelines for sharing multiple interfaces
  • Constraints for sharing multiple interfaces
  • Step by step guidelines

Pin Guidelines
1. Determine the number of banks based on interface width
2. Pick the CK0 pin based on the desired interface location in the FPGA
3.
Constrain the CK0 pin to the selected pin name or A/C bank
4. Constrain one DQS pin for each DQS group either to a pin name or to a bank
5. Constrain the PLL reference clock pin and RZQ pin to pin names

Rules for Constraining Pins
• A/C
  • All A/C pins should be in a single bank
  • A/C and data pins cannot share a lane (12 IOs)
    • But unused A/C pins in a lane can be used by GPIOs
  • A/C pins must follow predefined locations within a bank
  • A/C and data pins can share a bank
• DQ pins
  • DQ signals from two different DQS groups cannot be constrained to the same IO_12_LANE
• DQS pins
  • Related DQ pins must be in the same IO_12_LANE(s)
  • A read data group must be assigned based on the DQSin grouping in the pin table

Rules for Constraining Pins (contd)
• PLL reference clock pins
  • For a given interface speed, there is a restriction on the possible PLL reference clock frequencies
  • You must use the Arria 10 Interface v13.1 MegaWizard to determine the possible PLL reference clock frequencies for the on-board crystal oscillator
  • Altera recommends using the default PLL reference clock frequency from the MegaWizard
  • The crystal clock frequency is generally the lowest clock frequency of the memory interface divided by an integer N, where N = 1, 2, 3, 4, 5
• Find detailed step-by-step guidelines for constraining pins in the next few slides

DDR3: Determine Number of Banks Required
• Find the number of banks required based on whether there is any IO bank sharing or not
• Pin count for DDR3 8/16/32-bit is based on 1CS
• When using multiple CS for DDR3 8/16/32-bit:
  • Add 5 more pins for 2CS to the total IO count
  • Add 15 more pins for 4CS to the total IO count
• Calculate the number of banks required with multiple CS pins:
  No. of IO Banks = (Total No. of IOs for 1CS + Additional IOs for multiple CS) / 48

Interface width and          Number    IO Banks         IO Banks sharing with other
memory configuration         of IOs    (non-sharing)    interfaces (1/4 granularity)
8-bit w/o ECC 1CS            43        1                1
16-bit w/o ECC 1CS           55        2                1.25
16-bit with ECC 1CS          67        2                1.5
32-bit w/o ECC 1CS           79        2                1.75
32-bit with ECC 1CS          91        2                2
72-bit UDIMM 1-Rank          139       3                3
72-bit UDIMM 2-Rank          144       3                3
72-bit UDIMM 4-Rank          154       4                3.25
72-bit SO-DIMM 1-Rank        127       3                2.75
72-bit SO-DIMM 2-Rank        132       3                2.75
72-bit SO-DIMM 4-Rank        142       3                3

DDR4: Determine Number of Banks Required
• Find the number of banks required based on whether there is any IO bank sharing or not
• Pin count for DDR3 8/16/32-bit is based on 1CS
• When using multiple CS for DDR3 8/16/32-bit:
  • Add 5 more pins for 2CS to the total IO count
  • Add 15 more pins for 4CS to the total IO count
• Calculate the number of banks required with multiple CS pins:
  No. of IO Banks = (Total No. of IOs for 1CS + Additional IOs for multiple CS) / 48

Interface width and          Number    IO Banks         IO Banks sharing with other
memory configuration         of IOs    (non-sharing)    interfaces (1/4 granularity)
8-bit w/o ECC 1CS            49        1                1
16-bit w/o ECC 1CS           61        2                1.25
16-bit with ECC 1CS          73        2                1.5
32-bit w/o ECC 1CS           85        2                1.75
32-bit with ECC 1CS          97        2                2
72-bit UDIMM 1-Rank          142       3                3
72-bit UDIMM 2-Rank          147       4                3.25
72-bit UDIMM 4-Rank          157       4                3.25
72-bit UDIMM 1-Rank          145       3                3
72-bit UDIMM 2-Rank          150       4                3.25
72-bit UDIMM 4-Rank          160       4                3.5

Plan Interface Placement in Column
• Select consecutive banks out of the 8 banks in a column and select the middle bank for the Address/Command (A/C) pins (must)
  • In case of an even number of banks, pick either one of the middle two
• A/C pins can take 3 or 4 IO lanes depending on memory topology and protocol
  • When A/C requires only 3 IO lanes, only the bottom 3 lanes (A/C 0, 1, 2) must be used

DDR3: Find Pin Number for CK0 Pin
• Find the pin number for the CK0 pin based on the A/C lanes selected
• Examples:
  • Pin 8 corresponds to the CK0 pin
  • Pin 24 corresponds to PLL_clockin[0]
  • Pin 26 corresponds to RZQ
• The pin number
for A/C pins is also generated in “project”_readme.txt in the “Project”/submodules/ folder
• Altera recommends using the “project”_readme.txt file to find the pin numbers for A/C pins

Pin No.   Component, UDIMM, SO-DIMM
47        -
46        -
45        CK3#
44        CK3
43        CK2#
42        CK2
41        CKE[3]
40        CKE[2]
39        ODT3
38        ODT2
37        CS[3]
36        CS[2]
35        BA[2]
34        BA[1]
33        BA[0]
32        CAS#
31        RAS#
30        A[15]
29        A[14]
28        A[13]
27        A[12]
26        RZQ
25        PLL_clockin[1]
24        PLL_clockin[0]
23        A[11]
22        A[10]
21        A[9]
20        A[8]
19        A[7]
18        A[6]
17        A[5]
16        A[4]
15        A[3]
14        A[2]
13        A[1]
12        A[0]
11        CK1#
10        CK1
9         CK0#
8         CK0
7         CKE[1]
6         CKE[0]
5         ODT1
4         ODT0
3         CS[1]
2         CS[0]
1         RESET#
0         WE#

DDR4: Find Pin Number for CK0 Pin
• Find the pin number for the CK0 pin based on the A/C lanes selected
• Examples:
  • Pin 8 corresponds to the CK0 pin
  • Pin 24 corresponds to PLL_clockin[0]
  • Pin 26 corresponds to RZQ
• The pin number for A/C pins is also generated in “project”_readme.txt in the “Project”/submodules/ folder
• Altera recommends using the “project”_readme.txt file to find the pin numbers for A/C pins

Pin No.   Component         UDIMM
47        -                 CK1#
46        -                 CK1
45        -                 CK3#
44        -                 CK3
43        -                 CK2#
42        -                 CK2
41        -                 CKE[3]
40        -                 CKE[2]
39        -                 ODT3
38        -                 ODT2
37        -                 CS[3]
36        -                 CS[2]
35        BG[0]             BG[0]
34        BA[1]             BA[1]
33        BA[0]             BA[0]
32        A[17]             A[17]
31        A[16]             A[16]
30        A[15]             A[15]
29        A[14]             A[14]
28        A[13]             A[13]
27        A[12]             A[12]
26        RZQ               RZQ
25        PLL_clockin[1]    PLL_clockin[1]
24        PLL_clockin[0]    PLL_clockin[0]
23        A[11]             A[11]
22        A[10]             A[10]
21        A[9]              A[9]
20        A[8]              A[8]
19        A[7]              A[7]
18        A[6]              A[6]
17        A[5]              A[5]
16        A[4]              A[4]
15        A[3]              A[3]
14        A[2]              A[2]
13        A[1]              A[1]
12        A[0]              A[0]
11        PAR_in            PAR_in
10        C2                CS[1]
9         CK0#              CK0#
8         CK0               CK0
7         C1                CKE[1]
6         CKE[0]            CKE[0]
5         C0                ODT1
4         ODT0              ODT0
3         ACT_n             ACT_n
2         CS[0]             CS[0]
1         RESET#            RESET#
0         BG[1]             BG[1]
Alert_n is also listed for Component, above BG[0]; its pin number cannot be determined from this table.

Find Pin Name for CK0 and Constrain the Pin
• Based on the A/C bank selected, identify the column index and bank index for the A/C pins
  • Column index ranges from 0-1 and bank index from 0-7
• Find the pin name for the A/C pin from the pin table and constrain the pin to the pin name in the QSF
  • Example: The pin name for CK0 in Column0 Bank 6 is L23
    set_location_assignment PIN_L23 -to CK0
OR
• Find the bank name for the A/C pin and constrain the pin to the selected IO48 bank in the QSF
  • Example: The selected IO bank for CK0 in Column0 Bank 6 is 2K
    set_location_assignment IOBANK_2K -to CK0
  • Effectively locks all A/C signals

[Figure: Arria 10 pin table excerpt with the I/O Bank and Pin name columns called out.]
P = Pin index (0-47), X = Column index, Y = Bank index
Use the Arria 10 pin table to find the pin index, column index, and bank index.

Constrain DQS and PLL Reference Clock Pins
• Constrain the PLL reference clock pin and RZQ pin to the pin names
• Constrain one DQS pin for each DQS group either to the pin names or to the selected IO48 banks
• Follow the same method as for CK0 to find pin names for the DQS and PLL reference clock pins
• Constraining the CK0 pin, one DQS pin per group, and the PLL reference clock effectively locks the entire interface
  • Good compromise between fully automatic and manual placement
  • Requires minimal effort
  • Altera recommends this method for constraining pin assignments

Example for Constraining DDR3 x8
• Requires 1 bank: 3 lanes for A/C pins and 1 lane for DQ and DQS pins
• Picked IO lanes 0, 1, 2 for A/C pins
• Constrain pin CK0 to pin 8 of Bank0 (P8X0Y0)
    set_location_assignment PIN_AG16 -to CK0
• Constrain pin DQS0 to Bank0 (2A)
    set_location_assignment IOBANK_2A -to DQS0
• Constrain PLL_refclk to pin 24 of Bank0 (P24X0Y0)
    set_location_assignment PIN_AM15 -to PLL_clockin
• Constrain rzqpin to pin 26 of Bank0 (P26X0Y0)
    set_location_assignment PIN_AK18 -to rzqpin

Example for Constraining DDR3 x72
• DDR3 x72 w/ Hard Controller
  • Requires 3 banks
  • 3 lanes for A/C pins
  • 9 lanes for data
• Constraining DDR3 x72
  • Constrain pin CK0 to pin 8 of Bank1 (P8X0Y1)
  • Constrain PLL refclk to pin 24 of Bank1 (P24X0Y1)
  • Constrain rzqpin to pin 26 of Bank1 (P26X0Y1)
  • Constrain DQS groups:
    • DQS0 to Bank2 (2G)
    • DQS1 to Bank2 (2G)
    • DQS2 to Bank2 (2G)
    • DQS3 to Bank2 (2G)
    • DQS4 to Bank1 (2F)
    • DQS5 to Bank0 (2A)
    • DQS6 to Bank0 (2A)
    • DQS7 to Bank0 (2A)
    • DQS8 to Bank0 (2A)

Alternate Methods: Constraining Interface Pin Assignments
1. Let the Fitter assign all interface signals (A/C, DQS, DQ pins) automatically
   a) Run the design through the Fitter without any constraints
   b) Save the post-fit netlist, or back-annotate the pin assignments
   • Requires the least effort but a longer compilation time
   • This method works well for small designs (one interface per column)
   • Do not use this method for large designs with multiple IPs (interfaces, HSSI, GPIOs, LVDS, etc.)
2. Manually constrain all interface signals (A/C, DQS, DQ pins) to pin locations
   a) Plan the interface placement in a column (i.e., which IO48 banks to use)
   b) Use the pin table to find a legal position for each interface pin
   c) Use QSF assignments to lock down the pins
   • Fast periphery placement
   • Can be a lengthy/tedious process (especially with multiple IPs)

Overview
• Pin Guidelines
  • Step by step guidelines
  • Rules for constraining A/C, DQ, DQS and CLK pins
  • Determine IO bank requirements for DDR3, DDR4
  • Interface placement
  • Find pin names for A/C and data pins
  • Constraining interface pin assignments
• Pin Guidelines for sharing multiple interfaces
  • Constraints for sharing multiple interfaces
  • Step by step guidelines

Constraints for Sharing Multiple Interfaces
• When sharing a bank across multiple interfaces, the following criteria should be followed:
  • Must use identical clocks (rate, frequency, PLL reference clock)
  • Same protocol
  • Same voltage settings (VCCIO, VREF)
• When sharing a PLL reference clock pin between interfaces, the banks must be consecutive
• Interfaces using the same IO standard can share the OCT and RZQ pin
• A bank cannot be used as the A/C bank for two or more interfaces
  • Reason: the hard controller and sequencer cannot be shared
• A lane cannot be shared
  • Reason: only one DQSin tree per lane; a lane can only talk to one controller

Pin Guidelines for Sharing Multiple Interfaces (Steps)
1. Determine the total number of interfaces required
2. Determine the number of banks based on interface width and number of interfaces
3. Ensure that the interfaces meet the criteria for sharing
4.
Plan interface placement in column(s)
   a. Select the middle bank for sharing DQ pins between two interfaces
5. For each interface, constrain the CK pin to the selected A/C bank or pin name
6. Constrain the PLL reference clock to the A/C bank for one of the interfaces only
7. For each interface, constrain one DQS pin in each DQS group to a pin name or bank

Arria 10 Board Design Guidelines

Arria 10 board design guidelines are preliminary and subject to change.

Guidelines
• The following guidelines are covered in the subsequent slides, and they apply to both DDR3 and DDR4:
  • Generic guidelines
  • Length
  • DQ to DQS delay
  • Address command (should include the package delay?)
  • Delay within the group
  • DQS to CK guideline
• Length matching guidelines are recommendations, and they should not be considered hard guidelines
• Customers must perform the necessary board-level simulation to make sure there are no signal integrity, ISI, or crosstalk related issues
• Customers must also enter accurate information in the ‘Board Timing’ tab of the memory MegaCore and compile the design to ensure there are no timing violations

Generic Guideline
• Trace impedance plays an important role in signal integrity
  • Users must perform board-level simulation to determine the best characteristic impedance for their PCB
  • For example, it is possible that for multi-rank systems 40 ohm would yield a better result than a traditional 50 ohm characteristic impedance
• To minimize PCB layer propagation variance, Altera recommends that you route signals from the same net group on the same layer
  • Use 45° angles (not 90° corners)
  • Disallow critical signals across split planes
  • Route over appropriate VCC and GND planes
  • Keep signal routing layers close to GND and power planes
  • Avoid routing memory signals closer than 0.025 inch (0.635 mm) to memory clocks

Maximum Lengths
• For DIMM
  • From the FPGA to the DIMM connector, the maximum allowed trace length is 4.5 inches.
• The maximum DIMM-to-DIMM distance is 0.425 inches
For discrete components
• 7 inches maximum for address/command signals
• 5 inches maximum for DQ/DQS/DM

DQ-DQS Delay
Match the (package + board) trace delays to within 20 ps of skew for the DQ/DQS/DM signals within a DQS group.
Details on how to perform package deskew are available in EMIF HB Vol2 Chapter 4.

Address/Command/Control Skew
All address, command and control signals should match the mem_clk trace delay to within +/- 20 ps
• For example, if the mem_clk trace delay is 500 ps, the allowed range for any address/command/control signal is 480 ps to 520 ps
For discrete components: make sure the above recommendation is met at each component in the fly-by chain
For DIMMs: for single or multiple DIMM configurations, make sure this guideline is met at each DIMM connector

Address/Command/Control Skew
Matching at successive points in the fly-by chain (x and y terms are the trace segment delays labelled in the slide figure):
x = y +/- 20 ps
x + x1 = y + y1 +/- 20 ps
x + x1 + x2 = y + y1 + y2 +/- 20 ps
x + x1 + x2 + x3 = y + y1 + y2 + y3 +/- 20 ps

DQS-CLK Guideline
The timing between the DQS and clock signals on each device calibrates dynamically to meet tDQSS. To make sure the skew is not too large for the leveling circuit's capability:
1. The propagation delay of the clock signal must not be shorter than the propagation delay of the DQS signal at every device: CKi - DQSi > 0; 0 < i < number of components - 1
2. The total skew of the CLK and DQS signals between groups must be less than one clock cycle: (CKi + DQSi) max - (CKi + DQSi) min < 1 × tCK
If you are using a DIMM topology, your delay and skew must take into consideration the values for the actual DIMM.

Simulation Guideline
1. Using a board simulation tool such as HyperLynx, set up a trace model that includes IBIS/HSPICE buffer models for the FPGA and memory, a DIMM connector model (if applicable) and an accurate board stack-up
2. Use this setup to extract accurate trace delay and ISI information
3. Export the memory interface layout into the board simulation tool to run the board-level simulation
4.
Use PDN analysis tools to simulate the power supply noise

Board Skew Parameters
Users must enter accurate information about the various delays and skews in the MegaWizard
Timing analysis
• The DDR timing analysis scripts take board skews into account when generating the timing analysis report
• Inaccurate board skew parameters will result in inaccurate timing analysis of the memory interface
Delay chain settings
• Physical delays are applied to delay chains to compensate for the skew mismatch between various signals
• Board skew parameters affect the initial value applied to the delay chains
Altera recommends that you simulate your interface in HyperLynx (or a similar tool) to acquire trace delays
• Once you have acquired the trace delays, you can use the Board Skew Parameters Tool available on the Altera website to calculate the parameters

Slew Rates, ISI and Crosstalk
As operating frequencies push beyond 1 GHz, it is increasingly important that users enter accurate slew rates and ISI/crosstalk information
Customers should perform board simulation on the external memory interfaces, acquire all the necessary slew rate and ISI/crosstalk information, and enter that information into the MegaWizard
• Do not use the default values
• Accurate slew rate and ISI/crosstalk information will result in accurate timing analysis of the interface
Refer to EMIF HB Vol2 Chapter 9 for further information about the ‘Board Timing’ parameters

Thank You
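
Supplementary example: the Address/Command/Control Skew and DQS-CLK rules from the board design slides are simple arithmetic checks, so they can be scripted against delays extracted from board simulation. The Python sketch below is only an illustration under stated assumptions: the function names, the signal names and the delay values are hypothetical, all delays are in picoseconds, and the DQS-CLK check is applied to every component in the list that is passed in.

```python
# Illustrative sketch only: helper names, signal names and delay values are
# hypothetical and are not part of any Altera tool. Delays are in picoseconds.

AC_SKEW_LIMIT_PS = 20.0  # +/- 20 ps window from the Address/Command/Control Skew slide


def ac_skew_violations(clk_delays_ps, ac_delays_ps, limit_ps=AC_SKEW_LIMIT_PS):
    """Return (signal, component_index, skew_ps) for every address/command/control
    delay that differs from the mem_clk delay at the same component by more
    than limit_ps.

    clk_delays_ps: cumulative mem_clk delay at each component in the fly-by chain.
    ac_delays_ps:  dict of signal name -> cumulative delays, in the same order.
    """
    violations = []
    for name, delays in ac_delays_ps.items():
        for index, (clk, sig) in enumerate(zip(clk_delays_ps, delays)):
            skew = sig - clk
            if abs(skew) > limit_ps:
                violations.append((name, index, skew))
    return violations


def dqs_ck_ok(ck_delays_ps, dqs_delays_ps, tck_ps):
    """Return (rule1_ok, rule2_ok) for per-component CK and DQS delays."""
    # Rule 1: CKi - DQSi > 0 at each component.
    rule1 = all(ck - dqs > 0 for ck, dqs in zip(ck_delays_ps, dqs_delays_ps))
    # Rule 2: (CKi + DQSi) max - (CKi + DQSi) min < 1 x tCK.
    sums = [ck + dqs for ck, dqs in zip(ck_delays_ps, dqs_delays_ps)]
    rule2 = (max(sums) - min(sums)) < tck_ps
    return rule1, rule2


# Two components in a fly-by chain; mem_clk arrives at 500 ps and 620 ps.
clk = [500.0, 620.0]
ac = {"mem_a[0]": [510.0, 615.0], "mem_ras_n": [525.0, 630.0]}
print(ac_skew_violations(clk, ac))  # [('mem_ras_n', 0, 25.0)]
print(dqs_ck_ok(clk, [300.0, 310.0], tck_ps=1000.0))  # (True, True)
```

With a 500 ps mem_clk delay the allowed window is 480 ps to 520 ps, so the hypothetical mem_ras_n delay of 525 ps at the first component is reported as a violation. A script like this does not replace the timing analysis run by the Quartus II software; it is only a quick sanity check on the numbers before they are entered in the ‘Board Timing’ tab.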