Arria 10 External Memory Interface Design Guidelines

Quartus II Software v13.1 Arria 10 Edition
Arria 10 design guidelines are preliminary and subject to change

Contents
Introduction
Software requirements
Generating interface IP and example design project
Generating example design files
Generating simulation design example files
Simulation guidelines
Timing closure guidelines
Fitter guidelines
DDR4 / DDR3 interface pin guidelines
Introduction

Altera EMIF IP cores include an optional example design that
demonstrates a complete interface solution.

Customers can use this design for initial interface validation.

Arria 10 example design improvements:
• Faster generation
• Automatic pin assignments
• The pin_assignments.tcl script is neither created nor needed
Software Requirements

Quartus II software version 13.1 Arria 10 Edition

Generating External Memory Interface IP and Example Design Project
New Features
Arria 10 External Memory Interface (EMIF) IP
All memory protocols are generated through a single IP
• Select your protocol in the Arria 10 External Memory Interface MegaWizard GUI
Fast generation mechanism
• Faster IP and example design generation
Automatic pin assignments
• I/O standard and pin termination assignments are created during generation
• No pin_assignments.tcl file
Synthesis and simulation filesets are identical
Memory configuration presets can be created and reused in different designs
PLL reference clock frequency and FPGA termination settings are selected directly in the
MegaWizard GUI
Generating Interface IP and Example Design Project

The following steps and slides demonstrate how to create the memory
interface IP and the example design project.

1. Open Quartus II and launch the MegaWizard Plug-In Manager from the Tools menu.

Select Create New Megafunction
2. Select ‘Create a new megafunction variation’ and click Next.

Establish New MegaCore Type and Name
3. Select the ‘Arria 10 External Memory Interfaces v13.1’ IP under Interfaces->External Memory.
4. Select VHDL or Verilog HDL.
5. Enter the IP variation name for the memory IP. This name is used for the
<variation_name>_example_design directory, which is created along with the IP files in your workspace.
6. Click Next to configure the memory IP.
Configuring the Interface IP
7. Select the memory protocol from the drop-down list.
8. Set the desired interface frequency.
9. Configure the memory IP by selecting the appropriate settings under the different tabs on this page.

Note:
• Predefined configurations are available for various memory devices.
• Pick the desired memory device preset from the list and click Apply to populate all the fields with the vendor-specified settings.
• A custom preset can also be created by clicking ‘New’ and then entering the configuration data.
New Options in the MegaWizard
• Select the PLL reference clock frequency from the drop-down menu
• The allowed PLL reference clock values are calculated from the interface frequency
• The on-board oscillator frequency must be one of these values for the memory interface to function properly
• Select the FPGA on-chip termination settings directly in the GUI
Creating the Example Design
10. After configuring the IP, click Finish. A window pops up asking whether to create an example design.
11. Ensure the ‘Generate Example Design’ option is selected and click Generate.
Interface IP and Example Design Output

After IP generation is complete, a <variation_name>_example_design
directory is created in your project directory.

In this example, the variation name is ddr3, and the script files
needed to create an example design are available in
ddr3_example_design.

Two Tcl Scripts Created

<variation_name>_example_design contains two Tcl scripts:
I. make_qii_design.tcl
Generates a synthesizable design example along with a Quartus II project, ready for compilation.
II. make_sim_design.tcl
Generates a simulation design example along with tool-specific scripts to compile and elaborate the necessary files.
Generating the Example Design Files
To generate the synthesizable design example, run the make_qii_design.tcl
script in the Nios II command shell or from a command line:

12. Open the Nios II command shell (or any command line) and change to the
<variation_name>_example_design directory.
13. Run the make_qii_design.tcl script by executing the following command:
quartus_sh -t make_qii_design.tcl
Optionally, run the make_qii_design.tcl script for a specific device:
quartus_sh -t make_qii_design.tcl 10AX115R3F40I2SGES
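The two invocations above can also be scripted. The following is a minimal Python sketch, assuming quartus_sh is on the PATH. The helper names make_qii_design_cmd and run_in_example_design are hypothetical and not part of the Quartus II software; they only assemble and launch the command lines shown above.

```python
import subprocess
from typing import List, Optional


def make_qii_design_cmd(device: Optional[str] = None) -> List[str]:
    """Build the quartus_sh argument list for make_qii_design.tcl."""
    cmd = ["quartus_sh", "-t", "make_qii_design.tcl"]
    if device:
        # Optional device part, such as the one used in the example above.
        cmd.append(device)
    return cmd


def run_in_example_design(example_design_dir: str,
                          device: Optional[str] = None) -> int:
    """Run the script from the <variation_name>_example_design directory."""
    result = subprocess.run(make_qii_design_cmd(device), cwd=example_design_dir)
    return result.returncode
```

For example, run_in_example_design("ddr3_example_design") would correspond to changing into that directory and running the first command.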
Example Design Script Output
This script runs for a few seconds and produces a qii directory containing a project
called ed_synth.qpf.
Open and compile this project with the Quartus II software v13.1 Arria 10 Edition.
• A Qsys file is also generated
• Open this file in Qsys to add, remove, or modify IPs in the example design
Generating Simulation Files for Example Design Project
Overview
Details are in the simulation guidelines section
Generating the Simulation Design Example Files

To generate a simulation design example, run the following script in the Nios II
command shell or from the command line, specifying either VERILOG or VHDL:
quartus_sh -t make_sim_design.tcl VERILOG

The simulation design example consists of a driver connected to the generated
IP (device under test, or DUT) and to the memory model.

The driver generates random traffic and internally checks the legality of the outgoing data.

[Figure: Example testbench block diagram. Blocks: driver (traffic generator) with pass/fail output, Arria 10 EMIF IP core (controller and PHY), and memory model. Interfaces labeled Avalon, AFI, and Memory.]
Simulation Example Design Script Output

The script creates a sim directory containing one subdirectory for each supported
simulation tool.

Each subdirectory contains the specific scripts to run simulation with the
corresponding tool.
Arria 10 Simulation Guidelines
Arria 10 Simulation Guidelines are preliminary and subject to change
Simulation

Users will be able to choose between two simulation models:
• Skip Calibration
  • Fastest simulation
  • Loads the settings calculated from the memory configuration and enters user mode
• Full Calibration (not supported in Quartus II 13.1)
  • Performs all stages of memory calibration: calibration phases, delay sweeps, and centering of all data bits

Skip Calibration mode | Full Calibration mode
System-level simulation focusing on user logic | Memory interface simulation focusing on calibration
Details of calibration are not captured | Details of calibration are captured (i.e. stages)
Enables users to store and retrieve data | Includes leveling, per-bit deskew, etc.
Efficiency accurate | No board skews are taken into account
Simulation: Supported & Not Supported
Supported:
• Functional Verification
• Skip Calibration (default)
Not Supported:
• Timing Verification
• NativeLink
• Memory Vendor Models
• Full Calibration*
• Post-Fit Simulation*
• Multi-Rank*
• Multiple-CS memory interface*
• Memory frequency < 400 MHz*
• RDIMM & LRDIMM configurations*
*Available in a future Quartus II version
Note: Validating the timing of your design requires using Altera’s TimeQuest Timing Analyzer
Simulation

To simulate your design, you need the following components:
• An Altera-supported simulator
• A design using the Altera External Memory Interfaces IP
• An example driver (Altera or user)
• A testbench (Altera or user)
• Altera’s memory simulation model (simulation with memory vendor models is not supported)
Information About Simulation Filesets & Directory

Core simulation filesets are identical to core synthesis filesets
• Addresses the simulation versus synthesis fileset concerns of the past
• Ensures users are simulating the blocks they are synthesizing
• <dut>/*
• <dut>_sim/altera_emif_arch_nf/*

Any changes made in the synthesis directory should also be made in the
simulation directory so that the IP behaves the same way in both.

Fewer files to compile for simulation compared to the External Memory
Interfaces IP of the past (UniPHY, ALTMEMPHY)
• 64 files for the example design
• 7 are unique for each interface
• Users can modify the BIST in the example design if needed
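Because edits to the synthesis fileset must be mirrored in the simulation fileset, a quick content comparison can catch a forgotten change. The sketch below is one way to do this in Python using only the standard library. The function name mismatched_core_files is hypothetical, and which two directories to pass (for example the synthesis and simulation copies of the core files) is an assumption left to the user.

```python
import filecmp
import os
from typing import List


def mismatched_core_files(synth_dir: str, sim_dir: str) -> List[str]:
    """Return names of files present in both directories whose contents differ."""
    synth_files = {f for f in os.listdir(synth_dir)
                   if os.path.isfile(os.path.join(synth_dir, f))}
    sim_files = {f for f in os.listdir(sim_dir)
                 if os.path.isfile(os.path.join(sim_dir, f))}
    common = sorted(synth_files & sim_files)
    # shallow=False forces a byte-by-byte comparison rather than os.stat only.
    _match, mismatch, errors = filecmp.cmpfiles(
        synth_dir, sim_dir, common, shallow=False)
    return sorted(mismatch + errors)
```

An empty list means every file found in both directories is byte-identical.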
Important Assumptions Made by Simulation

Altera library simulation atoms assume the following:
• RTL simulation assumes an ideal layout, including:
  • Interfaces are unaware of each other
  • Each interface assumes it is the only interface in the column
  • Each interface believes it has its own IOAUX and Hard Nios II
  • The interface is at the bottom of the column, nearest to the physical IOAUX block location
  • The Fitter may actually place an interface at the top of the column if left unconstrained, but there is no drawback to an interface at the top of the column compared with one at the bottom
• One PLL per interface
  • At post-fit, it is possible for interfaces to share the same bank PLL
• PLL reset only occurs during power-up
  • Issue a recalibration request per EMIF interface in place of a PLL reset
*EMIF = External Memory Interfaces
RTL Simulation v Post-Fit Implementation


There may be a discrepancy between the simulated latency and the post-fit latency.
Do not rely on the simulated interface latency.

[Figure: Lane placement (Lanes 0-3 in bank 0 and bank 1) in RTL simulation versus post-fit implementation after Fitter operations, showing the AFI clock cycle penalty.]
RTL Simulation v Post-Fit Implementation

Introduction of the AFI clock cycle penalty:
• Applies to wide, multi-bank interfaces and/or ultra-low latency interfaces
• The Fitter can detect this penalty and issues a warning accordingly
• Only an issue when the requested write latency is less than the latency accrued by the farthest bank

[Figure: Three-bank interface (banks 0-2, Lanes 0-3 each) illustrating the AFI clock cycle penalty.]
RTL Simulation v Post-Fit Implementation

RTL simulation: Nios II initialization and calibration code
executes in parallel for all interfaces
• Interfaces might assert ‘cal_done’ (calibration done) simultaneously in simulation
• Do not rely on this behavior shown in simulation

Post-fit implementation: Nios II initialization and
calibration code executes sequentially
• The order of calibration is determined by Fitter operations
• Calibration is complete when all interfaces in a column assert ‘cal_done’
• You must sample all cal_done signals in a column to determine when
calibration is complete
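The rule that every cal_done in a column must be sampled can be written as a simple reduction. This is a small Python model of the check, not RTL and not part of the IP. The function name and the mapping from interface name to sampled cal_done value are illustrative assumptions.

```python
from typing import Mapping


def column_calibration_complete(cal_done: Mapping[str, bool]) -> bool:
    """True only when every interface in the column has asserted cal_done."""
    if not cal_done:
        # No interfaces sampled: do not report completion.
        return False
    return all(cal_done.values())
```

In hardware the equivalent is an AND of all cal_done signals in the column; a single asserted cal_done says nothing about the interfaces that calibrate later in the sequence.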
Generating the Example Design

Step 1: Generate the Design
Simulating the example design

Generation Output
File/Directory | Description
dut/ | Actual IP for your project
dut_example_design/ | Example design subfolder
dut_sim/ | Simulation fileset only (no driver, etc.)
dut.cmp | Component Declaration File (text file containing port definitions that can be used in VHDL Design Files)
dut.qip | Quartus IP File (contains paths for all files needed for the IP core)
dut.spd | Simulation Package Descriptor File (lists the required simulation files for the IP core or Qsys system)
dut.sip | Simulation IP File (contains information assignments that specify IP simulation source files)
dut.ppf | Pin Planner File (XML file that stores the port and node assignments for use with the Pin Planner)
dut.v | Variation File of the IP core (contains the IP settings used to generate the IP core)

Core files in each subdirectory are identical
• dut_example_design/*: Contains driver and memory model
• dut_sim/* and dut/*: Identical
Example Design Output Files

In dut_example_design:
File | Description
ed_sim.qsys | Qsys file capturing the simulation example design
ed_synth.qsys | Qsys file capturing the example design for synthesis
make_qii_design.tcl | Script to generate the example design project for synthesis
make_sim_design.tcl | Script to generate the example design for simulation
params.tcl | Support file for the generation scripts
params.txt | XML file that holds the user-chosen IP settings
readme.txt | Instructions for the user
Generating the Example Design Simulation Files


Run “quartus_sh -t make_sim_design.tcl VERILOG” to generate the simulation
files for the example design in Verilog.
Run “quartus_sh -t make_sim_design.tcl VHDL” to generate the simulation
files for the example design in VHDL.
ModelSim Example Flow

You can create your own .do file to view signals in ModelSim’s waveform viewer
• Example: run.do
• Execute ‘do run.do’ in the ModelSim console to run the simulation with signal waveforms
To see only the simulation result in the console messages
• Execute ‘source msim_setup.tcl’ in the ModelSim console
• Execute ‘ld_debug’ after msim_setup.tcl is loaded
• Execute ‘run -all’ after ld_debug finishes

ModelSim Example Guide

To store the entire log of the simulation data and results
• Edit msim_setup.tcl and add ‘-l ed_sim.log’ to the vsim line
Simulating the example design

Supported simulators:
• Mentor Graphics ModelSim
• Synopsys VCS and VCS-MX
Not supported:
• Aldec Riviera-PRO*
• Cadence NCSIM*
*Available in a future Quartus II version
Arria 10 EMIF Timing Closure Guidelines
Arria 10 timing closure guidelines are preliminary and subject to change
Arria 10 EMIF Timing Paths
[Figure: EMIF timing paths across user logic (core), periphery, and IO (including read, write, write leveling, etc.)]
Timing Closure Guidelines

Timing closure in core transfers includes:
• From the last set of registers in the core to the first set of registers in the periphery (C2P)
• From the last set of registers in the periphery to the first set of registers in the core (P2C)
• Note that C2P/P2C paths are cut and not analyzed in the current Quartus II release
• Core timing analysis does not include user logic timing or user logic timing to/from the EMIF block
• Users must check “ReportDDR” as part of signoff timing analysis to make sure the EMIF has closed timing
• Arria 10 timing analysis does not show any periphery-to-periphery timing in TimeQuest

Timing closure in IO transfers:
• Not dependent on the Quartus II compile
• Dependent on customer memory, FPGA speed grade parameters, and channel effects
• For accurate timing analysis, simulate correct board parameters and channel effects
• Board skews must be simulated using a board simulation tool (not estimated or calculated from trace length)
• Channel effects (ISI and crosstalk) can only be determined by a board simulator
• Include the simulated board skew and channel effects in the MegaWizard GUI during IP generation
• Refer to the board guidelines for more details on board skew and channel effects
• “ReportDDR” runs automatically as part of signoff timing analysis
Estimate Early IO Timing without Quartus II Compilation


Users can see IO margins without compiling the EMIF design
• Early IO timing appears as a spreadsheet-style analysis in a TimeQuest panel
• Provides a breakdown of margin loss between receiver, transmitter, and channel

Flow
• Generate the EMIF IP with the configuration of interest, including memory and board parameters
• Create Quartus II project files (.qpf, .qsf) with the selected Arria 10 device part
• Run TimeQuest with the <name>_report_io_timing.tcl script that is generated as part of the IP

[Figure: Early IO estimate flow. EMIF IP generation (EMIF source, .tcl files) and Quartus project creation including part selection (project files, .qpf/.qsf) feed a run of _report_io_timing.tcl in TimeQuest. The TimeQuest panel lists ideal window, channel effects, memory (receiver) effects, FPGA (transmitter) effects, and final margin.]
Running TimeQuest for Early IO Estimates
1. Start TimeQuest
2. Open the project
3. Pick “Script > Run Tcl Script”
4. Pick the <name>_report_io_timing.tcl file
5. TimeQuest prints out a summary and creates a “ReportDDR” panel
• Same level of detail for IOs
• Produces a warning mentioning that core timing is not included
• Similar type of analysis available for all IO transfers (read capture, DQS gating, A/C, and write leveling)
Note: To generate early IO timing reports, run <name>_report_io_timing.tcl
before running any Quartus II compilation
Early IO Estimates – other Execution Methods


Instead of “Script > Run Tcl Script”, type the following in the Tcl console:
source <name>_report_io_timing.tcl

OR, at the command prompt, type:
quartus_sta -t <core_name>_report_io_timing.tcl <project_name>

Note: Positive margins in the early IO timing estimate do not guarantee that signoff timing analysis will pass
Arria 10 Fitter Guidelines
Arria 10 fitter guidelines are preliminary and subject to change
Fitter Behaviors


Multi-bank interface: multiple contiguous banks that make up one interface

Introduction of the AFI clock cycle penalty:
• Applies to wide, multi-bank interfaces and/or ultra-low latency interfaces
• The Fitter can detect this penalty and issues a warning accordingly
• Only an issue when the requested write latency is less than the latency accrued by the farthest bank in a multi-bank interface
Clocking

Because multi-bank interfaces use multiple PLLs, and in turn multiple PHY
clock trees, a reference clock tree is used to route a common reference
clock signal to all PLLs
• Not all pins can drive the PLL reference clock tree
• The Quartus II software restricts PLL reference clock frequencies depending on the memory frequency
• Use the Arria 10 External Memory Interfaces IP MegaWizard GUI to determine the valid PLL reference clock frequencies

The Fitter merges PLLs when a bank is shared by different interfaces
The Fitter duplicates the PLL for multi-bank interfaces

Clocking

Example of the reference clock tree driving multiple PLLs, which are in
turn driving multiple PHY clock trees
Balanced reference clock network
• Jitter is lowered with this balanced structure

[Figure: Reference clock tree driving four PLLs, each driving its own PHY clock tree.]
Sharing Resources

In Arria 10, the following resources can be shared, and in some cases are
forced to be shared:
Resource | Implication
I/O bank | Ability to fit more interfaces in a single column
Hard Nios II | Cannot rely on one cal_done signal as representative of all interfaces passing the calibration stage
Core clock network | Shared PLL reference network. Users should place interfaces in consecutive banks
PLL reference clock pins* | Shared PLL reference clock and network trees
OCT block and RZQ pin* | None
Address/command pins* | Shared for Ping Pong PHY
*More in Pin Guidelines

Certain resources are forced to be shared
• IOAUX and Hard Nios II CPU for all interfaces in a column
• A bank shared by two interfaces

PLL/DLL do not need to be shared, because each bank has one
Sharing an I/O Bank

The Fitter can place interfaces in a shared bank if the interfaces share the same:
• Protocol
• Rate
• Phase
• Frequency

Users can fit even more interfaces in a column
Interfaces cannot share the same controller or sequencer
The Fitter does not allow a lane to be shared by two interfaces
• One DQS-in tree can only talk to one controller

Unused pins can be used by the customer as GPIO
• Must be the same voltage standard
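The four matching conditions for sharing a bank can be expressed as a field-by-field comparison. The following Python sketch is illustrative only. The InterfaceConfig class, its field names, and the way rate and phase are represented as strings are assumptions, not settings taken from the IP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterfaceConfig:
    protocol: str         # e.g. "DDR3" or "DDR4"
    rate: str             # representation is an assumption, e.g. "quarter"
    phase: str            # representation is an assumption
    frequency_mhz: float  # memory interface frequency


def can_share_bank(a: InterfaceConfig, b: InterfaceConfig) -> bool:
    """True when protocol, rate, phase, and frequency all match."""
    return (a.protocol, a.rate, a.phase, a.frequency_mhz) == (
        b.protocol, b.rate, b.phase, b.frequency_mhz)
```

Passing this check is necessary but not sufficient: the lane, controller, and sequencer restrictions listed above still apply.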
Example: 2 x16 interfaces sharing a bank
[Figure: Banks N+1, N, and N-1 showing fixed address/command pinouts, data paths, and unused lanes (free for GPIO, but not LVDS).]
Sharing Hard Nios / IOAUX


Interfaces placed within the same column by the Fitter share the same IOAUX and Hard Nios II
The Hard Nios II calibrates each interface sequentially
• You must sample all cal_done signals in a column to determine when calibration is complete

RTL simulation behaves as if every interface has its own Hard Nios II
• More on this in the Simulation Guidelines
Sharing Hard Nios II Processor


The Arria 10 External Memory Interfaces IP contains one Hard Nios II and
IOAUX per interface, but the Fitter merges them all into a single instance
You must use the same IOAUX clock and reset for all interfaces in the same
column, or the Fitter generates an error

[Figure: One column of four banks (banks 0-3, Lanes 0-3 each) sharing a single IOAUX.]
Sharing Core Clock Networks


The Fitter can use one core clock domain to synchronously access all interfaces in a column
Users can share core clock networks by using the master and slave settings in the IP generation GUI
• Connect core_clks_master_out from the master to core_clks_slave_in of every slave
• Must use the same column, PLL reference clock, rate, and frequency
• Interfaces in different columns cannot use this feature
• Place interfaces in consecutive banks, because the PLL reference clock is forced to be shared when core clock networks are shared
Fitter Relationship to Pin Assignments

Pin assignments
• The Fitter can reallocate banks based on user pin assignments
• The Fitter can rotate pins within a lane based on user pin assignments, but cannot move pins across lanes away from their DQS group
• Users can constrain a DQS pin to a lane, and the Fitter places all DQ signals of the respective DQS group in the same lane
Arria 10 Interface Pin Guidelines
Arria 10 pin guidelines are preliminary and subject to change
Overview

Pin guidelines
• Guidelines
• Rules for constraining pins
• Determine IO bank requirements for DDR3, DDR4
• Interface placement
• Find pin names for A/C and data pins
• Example for constraining DDR3 x8 and x72
• Alternate methods for constraining interface pin assignments

Pin guidelines for sharing multiple interfaces
• Constraints for sharing multiple interfaces
• Step-by-step guidelines
Pin Guidelines
1. Determine the number of banks based on interface width
2. Pick the CK0 pin based on the desired interface location in the FPGA
3. Constrain the CK0 pin to the selected pin name or A/C bank
4. Constrain one DQS pin for each DQS group either to a pin name or A/C bank
5. Constrain the PLL reference clock pin and the RZQ pin to pin names
Rules for constraining pins


A/C pins
• All A/C pins should be in a single bank
• A/C and data pins cannot share a lane (12 IOs)
• Unused A/C pins in a lane can be used as GPIOs
• A/C pins must follow predefined locations within a bank
• A/C and data pins can share a bank

DQ pins
• DQ signals from two different DQS groups cannot be constrained to the same IO_12_LANE

DQS pins
• Related DQ pins must be in the same IO_12_LANE(s)
• A read data group must be assigned based on the DQSin grouping in the pin table
Rules for constraining pins (contd)

PLL reference clock pins
• For a given interface speed, the possible PLL reference clock frequencies are restricted
• Use the Arria 10 External Memory Interfaces v13.1 MegaWizard to determine the possible PLL reference clock frequencies for the on-board crystal oscillator
  • Altera recommends using the default PLL reference clock frequency from the MegaWizard
• The crystal clock frequency is generally the lowest clock frequency of the memory interface divided by an integer N
  • Where N = 1, 2, 3, 4, 5

Detailed step-by-step guidelines for constraining pins follow in the
next few slides
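As a quick illustration of the divide-by-N rule of thumb for the crystal frequency, the Python sketch below lists the candidate values for N = 1 to 5. The function name is hypothetical, and the result is only a starting point: the MegaWizard remains the authority on which reference clock frequencies are actually allowed.

```python
from typing import List


def candidate_refclk_mhz(mem_clk_mhz: float, max_divider: int = 5) -> List[float]:
    """Memory clock frequency divided by N for N = 1..max_divider."""
    return [mem_clk_mhz / n for n in range(1, max_divider + 1)]
```

For an 800 MHz memory clock this yields 800, 400, about 266.67, 200, and 160 MHz.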
DDR3: Determine Number of Banks Required

Find the number of banks required, based on whether there is any IO bank sharing
• Pin count for DDR3 8/16/32-bit is based on 1CS
• When using multiple CS for DDR3 8/16/32-bit:
  • Add 5 more pins for 2CS to the total IO count
  • Add 15 more pins for 4CS to the total IO count
• Calculate the number of banks required with multiple CS pins:
  • No. of IO banks = (Total no. of IOs for 1CS + Additional IOs for multiple CS) / 48

Interface width and memory configuration | IOs | Number of IO banks (non-sharing) | IO banks sharing with other interfaces (1/4 granularity)
8-bit w/o ECC 1CS | 43 | 1 | 1
16-bit w/o ECC 1CS | 55 | 2 | 1.25
16-bit with ECC 1CS | 67 | 2 | 1.5
32-bit w/o ECC 1CS | 79 | 2 | 1.75
32-bit with ECC 1CS | 91 | 2 | 2
72-bit UDIMM 1-Rank | 139 | 3 | 3
72-bit UDIMM 2-Rank | 144 | 3 | 3
72-bit UDIMM 4-Rank | 154 | 4 | 3.25
72-bit SO-DIMM 1-Rank | 127 | 3 | 2.75
72-bit SO-DIMM 2-Rank | 132 | 3 | 2.75
72-bit SO-DIMM 4-Rank | 142 | 3 | 3
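The bank-count formula for DDR3 can be worked through in a few lines of Python. This sketch assumes the division by 48 is rounded up to a whole bank for the non-sharing case and up to the next quarter bank (one 12-pin lane) for the sharing case. That rounding is an inference that matches the DDR3 table values, not a rule stated in the text. The function and constant names are hypothetical.

```python
PINS_PER_BANK = 48
PINS_PER_LANE = 12  # one quarter of a bank
EXTRA_PINS_FOR_CS = {1: 0, 2: 5, 4: 15}  # additional IOs for multiple CS


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return -(-a // b)


def ddr3_io_banks(ios_for_1cs: int, num_cs: int = 1):
    """Return (total IOs, whole banks, banks at 1/4 granularity)."""
    total = ios_for_1cs + EXTRA_PINS_FOR_CS[num_cs]
    whole_banks = ceil_div(total, PINS_PER_BANK)
    quarter_banks = ceil_div(total, PINS_PER_LANE) / 4
    return total, whole_banks, quarter_banks
```

For example, the 16-bit configuration without ECC (55 IOs) needs 2 whole banks, or 1.25 banks when the remaining lanes are shared with another interface.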
DDR4: Determine Number of Banks Required

Find the number of banks required, based on whether there is any IO bank sharing
• Pin count for DDR4 8/16/32-bit is based on 1CS
• When using multiple CS for DDR4 8/16/32-bit:
  • Add 5 more pins for 2CS to the total IO count
  • Add 15 more pins for 4CS to the total IO count
• Calculate the number of banks required with multiple CS pins:
  • No. of IO banks = (Total no. of IOs for 1CS + Additional IOs for multiple CS) / 48

Interface width and memory configuration | IOs | Number of IO banks (non-sharing) | IO banks sharing with other interfaces (with 1/4 granularity)
8-bit w/o ECC 1CS | 49 | 1 | 1
16-bit w/o ECC 1CS | 61 | 2 | 1.25
16-bit with ECC 1CS | 73 | 2 | 1.5
32-bit w/o ECC 1CS | 85 | 2 | 1.75
32-bit with ECC 1CS | 97 | 2 | 2
72-bit UDIMM 1-Rank | 142 | 3 | 3
72-bit UDIMM 2-Rank | 147 | 4 | 3.25
72-bit UDIMM 4-Rank | 157 | 4 | 3.25
72-bit UDIMM 1-Rank | 145 | 3 | 3
72-bit UDIMM 2-Rank | 150 | 4 | 3.25
72-bit UDIMM 4-Rank | 160 | 4 | 3.5
Plan Interface Placement in Column
Select consecutive banks out of the 8 banks in a column, and select the
middle bank for the address/command (A/C) pins (required)
• With an even number of banks, pick either of the middle two
A/C pins can take 3 or 4 IO lanes depending on memory topology and protocol
• When A/C requires only 3 IO lanes, the bottom 3 lanes (A/C 0, 1, 2) must be used
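The middle-bank rule for A/C placement can be sketched as a small helper. This is an illustrative Python function with a hypothetical name; it takes the bank indices chosen for one interface and returns the bank or banks that qualify as the middle.

```python
from typing import List, Sequence


def ac_bank_candidates(banks: Sequence[int]) -> List[int]:
    """Return the middle bank (odd count) or the two middle banks (even count)."""
    ordered = sorted(banks)
    if not ordered:
        raise ValueError("at least one bank is required")
    if any(b - a != 1 for a, b in zip(ordered, ordered[1:])):
        raise ValueError("banks must be consecutive")
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return [ordered[mid]]
    return [ordered[mid - 1], ordered[mid]]
```

With banks 0, 1, and 2 the only candidate is bank 1; with banks 2 through 5 either bank 3 or bank 4 may hold the A/C pins.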

DDR3: Find pin number for CK0 pin
Find the pin number for the CK0 pin based on the A/C lanes selected
• Examples:
  • Pin 8 corresponds to the CK0 pin
  • Pin 24 corresponds to PLL_clockin[0]
  • Pin 26 corresponds to RZQ

The pin numbers for A/C pins are also generated in “project”_readme.txt in
the “Project”/submodules/ folder

Altera recommends using the “project”_readme.txt file to find the pin
numbers for A/C pins

A/C pin numbers within the bank (Component, UDIMM, SO-DIMM), listed from the highest pin number to the lowest:
Pins 47-36 (10 signals listed for 12 pins): CK3#, CK3, CK2#, CK2, CKE[3], CKE[2], ODT3, ODT2, CS[3], CS[2]
Pins 35-24: BA[2], BA[1], BA[0], CAS#, RAS#, A[15], A[14], A[13], A[12], RZQ, PLL_clockin[1], PLL_clockin[0]
Pins 23-12: A[11], A[10], A[9], A[8], A[7], A[6], A[5], A[4], A[3], A[2], A[1], A[0]
Pins 11-0: CK1#, CK1, CK0#, CK0, CKE[1], CKE[0], ODT1, ODT0, CS[1], CS[0], RESET#, WE#
DDR4: Find Pin Number for CK0 Pin
Find the pin number for the CK0 pin based on the A/C lanes selected
• Examples:
  • Pin 8 corresponds to the CK0 pin
  • Pin 24 corresponds to PLL_clockin[0]
  • Pin 26 corresponds to RZQ

The pin numbers for A/C pins are also generated in “project”_readme.txt in
the “Project”/submodules/ folder

Altera recommends using the “project”_readme.txt file to find the
pin numbers for A/C pins

A/C pin numbers within the bank, listed from the highest pin number to the lowest:

Component
Pins 47-36 (1 signal listed for 12 pins): Alert_n
Pins 35-24: BG[0], BA[1], BA[0], A[17], A[16], A[15], A[14], A[13], A[12], RZQ, PLL_clockin[1], PLL_clockin[0]
Pins 23-12: A[11], A[10], A[9], A[8], A[7], A[6], A[5], A[4], A[3], A[2], A[1], A[0]
Pins 11-0: PAR_in, C2, CK0#, CK0, C1, CKE[0], C0, ODT0, ACT_n, CS[0], RESET#, BG[1]

UDIMM
Pins 47-36: CK1#, CK1, CK3#, CK3, CK2#, CK2, CKE[3], CKE[2], ODT3, ODT2, CS[3], CS[2]
Pins 35-24: BG[0], BA[1], BA[0], A[17], A[16], A[15], A[14], A[13], A[12], RZQ, PLL_clockin[1], PLL_clockin[0]
Pins 23-12: A[11], A[10], A[9], A[8], A[7], A[6], A[5], A[4], A[3], A[2], A[1], A[0]
Pins 11-0: PAR_in, CS[1], CK0#, CK0, CKE[1], CKE[0], ODT1, ODT0, ACT_n, CS[0], RESET#, BG[1]
Find Pin Name for CK0 and Constrain the Pin

Based on the A/C bank selected, identify the column index and bank index for the A/C pins
• Column index ranges from 0 to 1 and bank index from 0 to 7
Find the pin name for the A/C pin from the pin table and constrain the pin to the pin name in the QSF
• Example: the pin name for CK0 in Column 0, Bank 6 is L23
• set_location_assignment PIN_L23 -to CK0
OR
Find the bank name for the A/C pin and constrain the pin to the selected IO48 bank in the QSF
• Example: the selected IO bank for CK0 in Column 0, Bank 6 is 2K
• set_location_assignment IOBANK_2K -to CK0
• Effectively locks all A/C signals

Pin index legend:
P = Pin index (0-47)
X = Column index
Y = Bank index
Use the Arria 10 pin table to find the pin index, column index, and bank
index, and from them the pin name and I/O bank
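The P/X/Y legend combines into index strings such as P8X0Y0, which appear in the constraint examples in this guide. The Python sketch below formats and parses that string, using the index ranges given in this section (pin 0-47, column 0-1, bank 0-7). The function names are hypothetical, and looking up the actual pin name (for example L23) still requires the Arria 10 pin table.

```python
import re
from typing import Tuple

_PIN_INDEX_RE = re.compile(r"^P(\d+)X(\d+)Y(\d+)$")


def _check(pin: int, column: int, bank: int) -> None:
    """Validate the index ranges stated in the guidelines."""
    if not 0 <= pin <= 47:
        raise ValueError("pin index must be 0-47")
    if not 0 <= column <= 1:
        raise ValueError("column index must be 0-1")
    if not 0 <= bank <= 7:
        raise ValueError("bank index must be 0-7")


def format_pin_index(pin: int, column: int, bank: int) -> str:
    """Build a string such as 'P8X0Y0' from pin, column, and bank indices."""
    _check(pin, column, bank)
    return f"P{pin}X{column}Y{bank}"


def parse_pin_index(text: str) -> Tuple[int, int, int]:
    """Split a string such as 'P24X0Y1' into (pin, column, bank)."""
    match = _PIN_INDEX_RE.match(text)
    if match is None:
        raise ValueError(f"not a pin index: {text!r}")
    pin, column, bank = (int(g) for g in match.groups())
    _check(pin, column, bank)
    return pin, column, bank
```

For CK0 at pin 8 of bank 0 in column 0, format_pin_index(8, 0, 0) gives the string P8X0Y0.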
Constrain DQS and PLL Reference Clock Pins

Constrain the PLL reference clock pin and the RZQ pin to pin names

Constrain one DQS pin for each DQS group either to a pin name or to the
selected IO48 bank

Follow the same method as for CK0 to find the pin names for the DQS and
PLL reference clock pins

Constraining the CK0 pin, one DQS pin per group, and the PLL reference
clock effectively locks the entire interface
• Good compromise between fully automatic and manual placement
• Requires minimal effort
• Altera recommends this method for constraining pin assignments
Example for constraining DDR3 x8

Requires 1 bank
• 3 lanes for A/C pins and 1 lane for DQ and DQS pins

Picked IO lanes 0, 1, 2 for A/C pins

Constrain pin CK0 to pin 8 of Bank 0 (P8X0Y0)
• set_location_assignment PIN_AG16 -to CK0

Constrain pin DQS0 to Bank 0 (2A)
• set_location_assignment IOBANK_2A -to DQS0

Constrain PLL_refclk to pin 24 of Bank 0 (P24X0Y0)
• set_location_assignment PIN_AM15 -to PLL_clockin

Constrain rzqpin to pin 26 of Bank 0 (P26X0Y0)
• set_location_assignment PIN_AK18 -to rzqpin
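The four assignments for the DDR3 x8 example can be generated from a small table, which helps avoid typing mistakes when an interface has many DQS groups. This Python sketch only formats text lines in the set_location_assignment form used above, with a plain ASCII hyphen in -to. The function name is hypothetical, and the signal names are the ones from this example, which may differ from the port names in a real design.

```python
from typing import Dict, List


def qsf_location_assignments(locations: Dict[str, str]) -> List[str]:
    """Format one set_location_assignment line per signal."""
    return [
        f"set_location_assignment {location} -to {signal}"
        for signal, location in locations.items()
    ]


# Signal -> location, taken from the DDR3 x8 example above.
DDR3_X8_EXAMPLE = {
    "CK0": "PIN_AG16",
    "DQS0": "IOBANK_2A",
    "PLL_clockin": "PIN_AM15",
    "rzqpin": "PIN_AK18",
}

if __name__ == "__main__":
    print("\n".join(qsf_location_assignments(DDR3_X8_EXAMPLE)))
```

The printed lines can be pasted into the project QSF.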
Example for constraining DDR3 x72

DDR3 x72 with hard controller

Requires 3 banks
• 3 lanes for A/C pins
• 9 lanes for data

Constraining DDR3 x72
• Constrain pin CK0 to pin 8 of Bank 1 (P8X0Y1)
• Constrain PLL refclk to pin 24 of Bank 1 (P24X0Y1)
• Constrain rzqpin to pin 26 of Bank 1 (P26X0Y1)
• Constrain DQS groups
  • DQS0 to Bank 2 (2G)
  • DQS1 to Bank 2 (2G)
  • DQS2 to Bank 2 (2G)
  • DQS3 to Bank 2 (2G)
  • DQS4 to Bank 1 (2F)
  • DQS5 to Bank 0 (2A)
  • DQS6 to Bank 0 (2A)
  • DQS7 to Bank 0 (2A)
  • DQS8 to Bank 0 (2A)
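A quick lane budget confirms that the x72 placement fits in three banks. The Python sketch below assumes each x8 DQS group occupies one lane and each bank has four lanes (Lanes 0-3), as in the examples in this guide. The names are hypothetical.

```python
from collections import Counter
from typing import Dict

LANES_PER_BANK = 4

# DQS group -> bank index, from the DDR3 x72 example above.
X72_DQS_BANKS = {
    "DQS0": 2, "DQS1": 2, "DQS2": 2, "DQS3": 2,
    "DQS4": 1,
    "DQS5": 0, "DQS6": 0, "DQS7": 0, "DQS8": 0,
}


def lanes_used_per_bank(dqs_bank: Dict[str, int], ac_bank: int,
                        ac_lanes: int = 3) -> Dict[int, int]:
    """Count lanes per bank: one per DQS group plus the A/C lanes."""
    used = Counter(dqs_bank.values())
    used[ac_bank] += ac_lanes
    return dict(used)
```

With the A/C pins in bank 1, every bank uses exactly four lanes (9 data lanes plus 3 A/C lanes over 3 banks), so the interface fills the three banks with no lane left over.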
Alternate Methods: Constraining Interface Pin assignments
1. Let the Fitter assign all interface signals (A/C, DQS, DQ pins) automatically
a) Run the design through the Fitter without any constraints
b) Save the post-fit netlist, or back-annotate the pin assignments
• Requires the least effort but a longer compilation time
• This method works well for small designs (one interface per column)
• Do not use this method for large designs with multiple IPs (interfaces, HSSI, GPIOs, LVDS, etc.)
2. Manually constrain all interface signals (A/C, DQS, DQ pins) to pin locations
a) Plan the interface placement in a column (i.e., which IO48 banks to use)
b) Use the pin table to find a legal position for each interface pin
c) Use QSF assignments to lock down the pins
• Fast periphery placement
• Can be a lengthy and tedious process (especially with multiple IPs)
Overview

Pin guidelines
• Step-by-step guidelines
• Rules for constraining A/C, DQ, DQS, and CLK pins
• Determine IO bank requirements for DDR3, DDR4
• Interface placement
• Find pin names for A/C and data pins
• Constraining interface pin assignments

Pin guidelines for sharing multiple interfaces
• Constraints for sharing multiple interfaces
• Step-by-step guidelines
Constraints for Sharing Multiple Interfaces

When sharing a bank across multiple interfaces, the following criteria
should be followed:
• Must use identical clocks (rate, frequency, PLL reference clock)
• Same protocol
• Same voltage settings (VCCIO, VREF)

When sharing a PLL reference clock pin between interfaces, the banks must
be consecutive

Interfaces using the same IO standard can share the OCT block and RZQ pin

A bank cannot be used as the A/C bank for two or more interfaces
• Reason: the hard controller and sequencer cannot be shared

A lane cannot be shared
• Reason: only one DQSin tree per lane; a lane can only talk to one controller
Pin Guidelines for Sharing Multiple Interfaces (Steps)
1. Determine the total number of interfaces required
2. Determine the number of banks based on interface width and number of interfaces
3. Ensure that the interfaces meet the criteria for sharing
4. Plan the interface placement in the column(s)
a. Select the middle bank for sharing DQ pins between two interfaces
5. For each interface, constrain the CK pin to the selected A/C bank or pin name
6. Constrain the PLL reference clock to the A/C bank for one of the interfaces only
7. For each interface, constrain one DQS pin in each DQS group to a pin name or bank
Arria 10 Board Design Guidelines
Arria 10 board design guidelines are preliminary and subject to change
Guidelines

The following guidelines are covered in the subsequent slides and
apply to both DDR3 and DDR4
 Generic guidelines
 Length
 DQ to DQS delay
 Address command (should include the package delay?)
 Delay within the group
 DQS to CK guideline



Length matching guidelines are recommendations and should not
be considered hard requirements

Customers must perform the necessary board-level simulation to make sure
there are no signal integrity, ISI, or crosstalk issues

Customers must also enter accurate information in the ‘Board Timing’
tab of the memory MegaCore and compile the design to ensure there
are no timing violations
Generic Guideline
Trace impedance plays an important role in signal integrity

Users must perform board-level simulation to determine the best
characteristic impedance for their PCB
 For example, in multi-rank systems a 40-ohm characteristic impedance may
yield better results than the traditional 50 ohms

To minimize PCB layer propagation variance, Altera recommends that
you route signals from the same net group on the same layer
 Use 45° angles (not 90° corners)
 Disallow critical signals across split planes
 Route over appropriate VCC and GND planes
 Keep signal routing layers close to GND and power planes
 Avoid routing memory signals closer than 0.025 inch (0.635 mm) to memory
clocks
Maximum Lengths

For DIMM
 The maximum allowed trace length from the FPGA to the DIMM connector is 4.5 inches
 Maximum DIMM to DIMM distance is 0.425 inches

For Discrete components
 7 inches maximum for address/command signals
 5 inches maximum for DQ/DQS/DM
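A minimal sketch of checking routed lengths against the maxima listed above. The limits are copied from this slide; the dictionary keys, the function name and the example lengths are hypothetical.

```python
# Maximum trace lengths in inches, copied from this slide.
MAX_LENGTH_IN = {
    ("dimm", "fpga_to_connector"): 4.5,
    ("dimm", "dimm_to_dimm"): 0.425,
    ("discrete", "address_command"): 7.0,
    ("discrete", "dq_dqs_dm"): 5.0,
}


def over_length(topology, routed_in):
    """Return {segment: excess inches} for every segment longer than its maximum."""
    excess = {}
    for segment, length in routed_in.items():
        limit = MAX_LENGTH_IN[(topology, segment)]
        if length > limit:  # a length exactly at the limit is treated as acceptable
            excess[segment] = round(length - limit, 3)
    return excess


# Made-up routed lengths for a discrete-component layout.
print(over_length("discrete", {"address_command": 6.2, "dq_dqs_dm": 5.3}))
```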
DQ-DQS Delay

Match the total (package + board) trace delays of the DQ/DQS/DM signals
within a DQS group to within 20 ps of skew.

Details on how to perform package deskew are available in the EMIF Handbook, Volume 2, Chapter 4.
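As an illustration of the 20 ps guideline, the sketch below adds package and board delays per signal and reports the spread within one DQS group. It assumes that "skew" means the difference between the largest and smallest total delay in the group; the function name and the delay values are hypothetical.

```python
def group_skew_ps(package_ps, board_ps):
    """Spread (max - min) of total package + board delay within one DQS group."""
    totals = {sig: package_ps[sig] + board_ps[sig] for sig in board_ps}
    return max(totals.values()) - min(totals.values())


# Made-up delays in picoseconds for a few signals of one group.
package = {"DQS": 50, "DQ0": 45, "DQ1": 60, "DM": 52}
board = {"DQS": 400, "DQ0": 410, "DQ1": 398, "DM": 401}

skew = group_skew_ps(package, board)
print(skew, "ps:", "OK" if skew <= 20 else "exceeds 20 ps guideline")
```

A signal with a long package trace can still meet the guideline if its board trace is shortened to compensate, which is why both delays are summed.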
Address/Command/Control Skew

All address, command, and control signals should match the
mem_clk trace delay to within +/- 20 ps
 For example, if the mem_clk trace delay is 500 ps, the allowed range for
any address/command/control signal is 480 ps to 520 ps
 For discrete components: make sure the above recommendation is met for each
component in the fly-by chain
 For DIMMs: for single or multiple DIMM configurations, make sure this
guideline is met at each DIMM connector
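The 480 ps to 520 ps example above can be reproduced with a short check. This is a sketch only; the function names and the signal delays are hypothetical.

```python
def ac_window_ps(mem_clk_delay_ps, tolerance_ps=20.0):
    """Allowed delay range for address/command/control signals."""
    return mem_clk_delay_ps - tolerance_ps, mem_clk_delay_ps + tolerance_ps


def out_of_window(mem_clk_delay_ps, ac_delays_ps, tolerance_ps=20.0):
    """Names of signals whose delay falls outside the allowed range."""
    low, high = ac_window_ps(mem_clk_delay_ps, tolerance_ps)
    return sorted(name for name, delay in ac_delays_ps.items()
                  if not low <= delay <= high)


print(ac_window_ps(500))
print(out_of_window(500, {"a0": 495, "ras_n": 525, "cke": 480}))
```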
Address/Command/Control Skew




x = y +/- 20 ps
x + x1 = y + y1 +/- 20 ps
x + x1 + x2 = y + y1 + y2 +/- 20 ps
x + x1 + x2 + x3 = y + y1 + y2 + y3 +/- 20 ps
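The cumulative relations above can be checked at every component of a fly-by chain. The sketch assumes that x, x1, x2, x3 are the successive segment delays of one net and y, y1, y2, y3 those of the net being matched to it; the function name and the delay values are hypothetical.

```python
from itertools import accumulate


def flyby_mismatch(x_segments_ps, y_segments_ps, tolerance_ps=20.0):
    """Return (cumulative x - cumulative y, within tolerance) for each component."""
    if len(x_segments_ps) != len(y_segments_ps):
        raise ValueError("both nets need one segment delay per component")
    results = []
    for x_total, y_total in zip(accumulate(x_segments_ps), accumulate(y_segments_ps)):
        diff = x_total - y_total
        results.append((diff, abs(diff) <= tolerance_ps))
    return results


# Made-up segment delays in picoseconds for a four-component chain.
x = [500, 100, 100, 100]
y = [490, 105, 110, 120]
for component, (diff, ok) in enumerate(flyby_mismatch(x, y)):
    print(component, diff, "OK" if ok else "exceeds +/- 20 ps")
```

Because the sums are cumulative, small per-segment mismatches in the same direction can add up and violate the guideline only at the last components of the chain.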
DQS-CLK guideline

The timing between the DQS and clock signals on each device
calibrates dynamically to meet tDQSS. To make sure the skew is not too
large for the leveling circuit’s capability, follow these rules:
1. The propagation delay of the clock signal must not be shorter than the propagation delay of the DQS signal at
every device: CKi – DQSi > 0; 0 < i < number of components – 1
2. The total skew of the CLK and DQS signals between groups must be less than one clock cycle:
(CKi + DQSi) max – (CKi + DQSi) min < 1 × tCK

If you are using a DIMM topology, your delay and skew must take into
consideration values for the actual DIMM.
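The two DQS-CLK rules can be evaluated from per-device delays as sketched below. The inequalities are taken from this slide as written (rule 1 uses the strict form CKi – DQSi > 0); the function name, the delay values and the 1000 ps clock period are hypothetical.

```python
def dqs_ck_violations(ck_ps, dqs_ps, tck_ps):
    """Check the two DQS-CLK rules for per-device CK and DQS delays (in ps)."""
    if not ck_ps or len(ck_ps) != len(dqs_ps):
        raise ValueError("need one CK delay and one DQS delay per device")
    problems = []
    # Rule 1: CKi - DQSi > 0 at every device.
    for i, (ck, dqs) in enumerate(zip(ck_ps, dqs_ps)):
        if ck - dqs <= 0:
            problems.append(f"device {i}: CK delay {ck} ps is not longer than DQS delay {dqs} ps")
    # Rule 2: (CKi + DQSi) max - (CKi + DQSi) min < 1 x tCK.
    sums = [ck + dqs for ck, dqs in zip(ck_ps, dqs_ps)]
    spread = max(sums) - min(sums)
    if spread >= tck_ps:
        problems.append(f"total CK + DQS skew {spread} ps is not below tCK = {tck_ps} ps")
    return problems


# Made-up delays for three devices in a fly-by chain.
print(dqs_ck_violations(ck_ps=[500, 600, 700], dqs_ps=[450, 460, 470], tck_ps=1000))
```

For a DIMM topology, the delays passed in would need to include the on-DIMM values, as the note above states.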
Simulation Guideline
1. Using a board simulation tool such as HyperLynx, set up a trace model that
includes IBIS/HSPICE buffer models for the FPGA and memory, a DIMM
connector model (if applicable), and an accurate board stack-up
2. Use this setup to extract accurate trace delay and ISI information
3. Export the memory interface layout into the board simulation tool to run the
board-level simulation
4. Use PDN analysis tools to simulate the power supply noise
Board Skew Parameters

Users must enter accurate information about the various delays and skews in the
MegaWizard

Timing analysis
 DDR timing analysis scripts take board skews into account when generating
timing analysis report
 Inaccurate board skew parameters will result in inaccurate timing analysis
of the memory interface

Delay Chain Settings
 Physical delays are applied to delay chains to compensate for the skew
mismatch between various signals
 Board skew parameters affect the initial value applied to the delay chains

Altera recommends that you simulate your interface in HyperLynx (or a similar tool)
to acquire trace delays

You can use the Board Skew Parameters Tool available on the Altera website to calculate
the parameters once you have acquired the trace delays
Slew Rates, ISI and Crosstalk

As operating frequencies push
beyond 1 GHz, it is becoming
increasingly important that
users enter accurate slew rates and
ISI/crosstalk information

Customers should perform board
simulation on the external memory
interfaces and acquire all the
necessary slew rates and
ISI/crosstalk related information, and
enter that information into the
MegaWizard

Do not use the default values.

Accurate information about slew
rates and ISI/crosstalk will result in
accurate timing analysis of the
interface

Refer to the EMIF Handbook, Volume 2, Chapter 9 for
further information about ‘Board
Timing’ parameters
Thank You