TN1037 - ORCA Series 4 Fast DDR Interface

ORCA Series 4
Fast DDR Interface
October 2002
Technical Note TN1037
Introduction
This document specifies the capabilities of the ORCA® Series 4 I/O logic, specifically Double Data Rate (DDR) interface schemes with clock forwarding. It addresses single-ended I/O standards, such as XGMII, and speed capabilities up to 311MHz for single-ended I/Os and 350MHz for differential I/Os. The ORCA Series 4 FPGA receiving interface and transmitting interface are discussed separately. A reference design is included showing the design practices required to meet XGMII specifications with DDR mode output registers using both clock edges. As with the XGMII reference design, precise design practices must be followed to achieve system-level performance using 311MHz single-ended I/Os and 350MHz differential I/Os. The Lattice web site (www.latticesemi.com) should be consulted periodically to obtain the latest reference designs and information. All timing in this document should be used instead of the results from ispLEVER™, because the results described here were obtained from a full range of simulations.
A typical interface between an ORCA FPGA and another device, using DDR with clock forwarding, is shown in
Figure 1. The Data bits are always transmitted and received with a transmit clock and a receive clock, respectively.
Note that for the purposes of this technical note, DATA and CONTROL are both referred to as DATA.
Figure 1. ORCA Chip to Chip Receive Interface
[Block diagram: a customer device and an ORCA FPGA connected by RXCLOCK and RXDATA[n-1:0] (receive logic) and by TXCLOCK and TXDATA[n-1:0] (transmit logic); each data bus is n bits wide.]
Receive Interface
Each receive interface is limited to a region of 12 Programmable Interface Cells (PICs), due to the interface clock distribution scheme. Each receive interface can have a maximum of 47 pins as single-ended data inputs, or 22 LVDS data inputs and two pins for the clock input. The number of data input pins per clock input pin is limited by the bonding constraints of different packages: not all of the potential pins are bonded out in some package types. Some signal interface standards also require reference voltage inputs, so the number of single-ended data input pins is further limited by the number of reference voltage (Vref) pins required. Each PIC has up to four bonded out pads (A, B, C and D).
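The single-ended pin count above can be read as simple bookkeeping. The sketch below, in Python, assumes that the 47-pin maximum is the 48 pads of a 12-PIC region less the one pad C clock input; the function name and the Vref and unbonded-pad counts are hypothetical and must come from the package pinout.

```python
PADS_PER_PIC = 4       # pads A, B, C and D
PICS_PER_REGION = 12   # one Fast Input receive interface region


def single_ended_data_pins(vref_pins=0, unbonded_pads=0):
    """Data pins left in one region after the clock, Vref and unbonded pads."""
    total = PADS_PER_PIC * PICS_PER_REGION
    clock_pins = 1  # the pad C clock input
    available = total - clock_pins - vref_pins - unbonded_pads
    if available < 0:
        raise ValueError("more pins reserved than the region provides")
    return available


print(single_ended_data_pins())             # 47
print(single_ended_data_pins(vref_pins=4))  # 43
```

The LVDS limit of 22 data inputs is not derived here, because it depends on pad pairing that this document does not detail.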
Only C pads can be used as a clock input for the Fast Input DDR interface, and the pad C clock input must be programmed in Fast Input mode. The reference design described in the appendix of this document shows the attributes to be put on the hardware description language (HDL) buffer instantiations to guarantee that these and other hardware restrictions are adhered to when the FPGA/FPSC is configured. Data signals can be on any pad (A, B, C or D). The top and bottom edges of the FPGA have 12-PIC interface regions that include the PIC with the clock input pin, five PICs to the right and six PICs to the left. The left and right edges of the FPGA have 12-PIC interface regions that include the PIC with the clock input pin, five PICs below and six PICs above.
No PLL should be used with this Fast Input interface. The input data has an instantaneous relationship with the
input clock, rather than being smoothed by the PLL. A diagram of the receive interface on the left edge of the
device is shown in Figure 2.
Figure 2. ORCA FPGA Receive Interface
[Diagram of the left-edge PICs: the clock input on pin C of the clock input PIC feeds the Fast Input Clock distribution to the left-edge PICs, covering six PICs above and five PICs below the clock input PIC.]
Each PIC is composed of four pins (A, B, C and D) and four DDR logic blocks. Each DDR logic block can be configured as an input DDR logic block or an output DDR logic block. When pins or PIOs are configured in Fast Input DDR mode, they use a special clock distribution on the edge of the device. The Fast Input clock distribution, the Edge Clock, spans six PICs at a time as shown in Figure 2. Each PIO programmed in Fast Input DDR mode uses two flip-flops: a negative edge triggered flip-flop in the shift register and a positive edge triggered flip-flop in the PIO logic. Diagrams of this are shown in Figure 3 and Figure 4.
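The two-flip-flop capture can be pictured as splitting one double data rate stream into two single data rate streams. The following Python sketch is a behavioral illustration only, not a model of the PIO hardware; the function name and the assumption that the first sample arrives on a rising edge are hypothetical.

```python
def capture_ddr(samples, first_edge="rising"):
    """Split a DDR bit stream into per-edge SDR streams.

    samples: values at the pin on successive clock edges, alternating
             between edges and starting with first_edge.
    Returns (rising_edge_bits, falling_edge_bits).
    """
    if first_edge not in ("rising", "falling"):
        raise ValueError("first_edge must be 'rising' or 'falling'")
    rising, falling = [], []
    for index, bit in enumerate(samples):
        # Even positions belong to first_edge, odd positions to the other edge.
        on_rising = (index % 2 == 0) == (first_edge == "rising")
        (rising if on_rising else falling).append(bit)
    return rising, falling


pos_bits, neg_bits = capture_ddr([1, 0, 0, 1, 1, 1])
print(pos_bits)  # [1, 0, 1]  positive edge flip-flop (PIO logic)
print(neg_bits)  # [0, 1, 1]  negative edge flip-flop (shift register)
```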
Figure 3. Fast Input DDR Flip-Flop Distribution
[Diagram: the Fast Input Clock runs along the Edge clock distribution. The Fast Input DDR pin logic is shown for the RXDATA pin nearest to the clock input and for the RXDATA pin farthest from it; each contains an I/O logic flip-flop and a Shift Reg. flip-flop.]
Figure 4. PIC Input DDR Logic Scheme
[Diagram labels: RXDATA from pin; INFF (I/O logic flip-flop); INSH (Shift Reg. flip-flop); System clock; Edge Clock; ICKINV and CLKINV multiplexers (inputs 0 and 1); Fast Input Mode Select.]
The clock and data bits have a required relationship at the input pins. The input relationship is specified as a setup
time and hold time of the data with respect to the clock edge, either rising or falling as shown in Figure 5.
Figure 5. Fast Input DDR Timing Diagram
[Timing diagram: RXDATA (Data 1, Data 2) against RXCLOCK, with SETUP and HOLD intervals marked around each clock edge.]
The Input interface setup and hold times are a function of the input buffer type and the signal characteristics.
1.5V HSTL Inputs
A common Fast Input DDR interface buffer type is HSTL class 1, as used in XGMII interface schemes. HSTL class 1 signal input characteristic specifications are listed in Table 1. The signal levels are a direct function of the buffer power supply, VDDIO. VREF is a function of VDDIO, and the input signal levels are required to be at least 200 mV above or below the reference voltage. VREF input pins are required for this interface and reduce the number of data input pins available within a 12-PIC region.
Please refer to the XGMII and HSTL standard specifications, IEEE 802.3ae and EIA/JESD8-6 respectively, for more information.
Table 1. XGMII HSTL Receive Signal Specifications
Parameter          Low        Nominal    High
VDDIO              1.40 V     1.50 V     1.60 V
VDD15              1.425 V    1.50 V     1.575 V
VIH (min.)         0.88 V     0.95 V     1.10 V
VREF               0.68 V     0.75 V     0.90 V
VIL (max.)         0.48 V     0.55 V     0.70 V
Input Slew Rate    1 V/ns     1 V/ns     1 V/ns
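The VIH and VIL rows of Table 1 follow from the 200 mV requirement around VREF. A minimal Python sketch of that relationship is shown below; the function names are illustrative and the rounding to millivolts is only there to keep the printed values tidy.

```python
MARGIN_V = 0.200  # minimum input swing above or below VREF, from the text


def hstl_thresholds(vref):
    """Return (VIL max, VIH min) in volts for a given reference voltage."""
    return round(vref - MARGIN_V, 3), round(vref + MARGIN_V, 3)


def classify(v_in, vref):
    """Classify an input level as 'high', 'low' or 'undefined'."""
    vil_max, vih_min = hstl_thresholds(vref)
    if v_in >= vih_min:
        return "high"
    if v_in <= vil_max:
        return "low"
    return "undefined"


for vref in (0.68, 0.75, 0.90):  # low, nominal and high VREF from Table 1
    print(vref, hstl_thresholds(vref))
```

For the nominal VREF of 0.75 V this gives 0.55 V and 0.95 V, matching the nominal VIL and VIH columns.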
A Fast Input DDR interface scheme using an HSTL receiver and 1.5V signaling is shown in Figure 6.
Figure 6. XGMII Fast Input DDR HSTL Receive Interface Diagram
[Diagram: HSTL output drivers in the customer chip drive RXCLOCK and RXDATA to HSTL input receivers in the FPGA; VDDIO, VDD15 and VREF supplies are marked.]
Simulation details and results using a Series 4 HSTL receiver at 156MHz and 1.5V power supply are listed in
Table 2.
Table 2. XGMII Fast Input DDR HSTL Receive Interface Results
Setup (ps)   Hold (ps)   Speed Grade   Minimum Temperature (°C)   Maximum Temperature (°C)
580          550         -1            -40                        125
480          480         -2            -40                        125
480          480         -3            -40                        125
The setup and hold simulation results presented throughout this document are the worst-case delta between the clock and data measured at seven different process corners. Where different speed grades show the same setup and hold numbers, this is a result of these measurements.
1.8V HSTL Inputs - Non-Standard
Simulation signal characteristics and results using a Series 4 ORCA HSTL receiver at 156MHz and a 1.8V power
supply with higher amplitude signal levels are shown in Table 3 and Table 4.
Table 3. 1.8V Fast Input DDR HSTL Receive Interface Signal Specifications
Parameter              Low        Nominal    High
VDDIO                  1.7 V      1.8 V      1.9 V
VDD15                  1.425 V    1.5 V      1.575 V
VIH (minimum level)    1.4 V      1.5 V      1.8 V
VREF                   0.85 V     0.9 V      0.95 V
VIL (maximum level)    0 V        0 V        0.10 V
Input Slew Rate        1 V/ns     1 V/ns     1 V/ns
Table 4. 1.8V Fast Input DDR HSTL Receive Interface Results
Setup (ps)   Hold (ps)   Speed Grade   Minimum Temperature (°C)   Maximum Temperature (°C)
580          550         -1            -40                        125
580          480         -2            -40                        125
580          480         -3            -40                        125
A Fast Input DDR interface scheme using an HSTL receiver driven by an SSTL2 output driver (2.5V signaling) is shown in Figure 7.
Figure 7. 2.5V Fast Input DDR HSTL Receive Interface Diagram
[Diagram: SSTL2 output drivers in the customer chip drive RXCLOCK and RXDATA to HSTL input receivers in the FPGA; VDDIO, VDD15 and VREF supplies are marked.]
2.5V HSTL Inputs with High Non-Centered Reference Voltage Scheme - Non-Standard
Simulation signal characteristics, simulation details and results using an HSTL receiver at 156MHz and a 1.5V
power supply with higher amplitude signal levels are shown in Table 5 and Table 6.
Table 5. 2.5V Fast Input DDR HSTL Receive Interface Signal Specifications
Parameter                          Low        Nominal    High
VDDIO                              1.4 V      1.5 V      1.6 V
VDD15                              1.425 V    1.50 V     1.575 V
Vih (minimum level)                1.45 V     1.55 V     1.65 V
Vref (not standard SSTL2 level)    1.30 V     1.40 V     1.5 V
Vil (maximum level)                0.85 V     0.95 V     1.05 V
Input Slew Rate                    1 V/ns     1 V/ns     1 V/ns
Table 6. 2.5 V Fast Input DDR HSTL Receive Interface Results
Setup (ps)   Hold (ps)   Speed Grade   Minimum Temperature (°C)   Maximum Temperature (°C)
820          625         -1            -40                        125
820          625         -2            -40                        125
820          625         -3            -40                        125
2.5V SSTL Inputs
A Fast Input DDR interface scheme using an SSTL2 receiver driven by an SSTL2 output driver (2.5V signaling) is shown in Figure 8.
Figure 8. 2.5V Fast Input DDR SSTL2 Receive Interface Diagram
[Diagram: SSTL2 output drivers in the customer chip drive RXCLOCK and RXDATA to SSTL2 input receivers in the FPGA; VDDIO, VDD15 and VREF supplies are marked.]
Signal specifications and simulation results using an SSTL2 receiver at 156 MHz and a 2.5V power supply are
shown in Table 7 and Table 8.
Table 7. 2.5 V Fast Input DDR SSTL2 Receive Interface Signal Specifications
Parameter              Low        Nominal    High
VDDIO                  2.30 V     2.50 V     2.70 V
VDD15                  1.425 V    1.50 V     1.575 V
Vih (minimum level)    1.50 V     1.60 V     1.70 V
Vref                   1.15 V     1.25 V     1.35 V
Vil (maximum level)    0.80 V     0.90 V     1.00 V
Input Slew Rate        1 V/ns     1 V/ns     1 V/ns
Table 8. 2.5 V Fast Input DDR SSTL2 Receive Interface Results
Setup (ps)   Hold (ps)   Speed Grade   Minimum Temperature (°C)   Maximum Temperature (°C)
700          700         -1            -40                        125
500          500         -2            -40                        125
500          500         -3            -40                        125
2.5V LVDS Inputs
A Fast Input DDR interface scheme using an LVDS receiver driven by an LVDS output driver (2.5V signaling) is shown in Figure 9.
Figure 9. 2.5 V Fast Input DDR LVDS Receive Interface Diagram
[Diagram: LVDS output drivers in the customer chip drive RXCLOCK and RXDATA[P:N] to LVDS input receivers in the FPGA; VDDIO and VDD15 supplies are marked.]
Signal specifications and simulation results using an LVDS receiver and a 2.5V power supply are shown in Table 9
and Table 10.
Table 9. 2.5V Fast Input DDR LVDS Receive Interface Signal Specifications
Parameter                                   Low        Nominal    High
VDDIO                                       2.3 V      2.5 V      2.7 V
Differential Input Voltage                  0 V        -          2.4 V
Differential Input Threshold                -100 mV    -          100 mV
Output Common-mode (Offset) Voltage         1.125 V    1.25 V     1.357 V
Input Common-mode Voltage                   0.2 V      1.25 V     2.2 V
Integrated Receiver Termination Resistor    95 Ω       100 Ω      105 Ω
Table 10. 2.5V Fast Input DDR LVDS Receive Interface Results
Setup (ps)   Hold (ps)   Speed Grade   Minimum Temperature (°C)   Maximum Temperature (°C)
TBD          TBD         -1            -40                        125
TBD          TBD         -2            -40                        125
TBD          TBD         -3            -40                        125
Additional Hold Considerations
Fast Input DDR interface schemes using I/O standards that require reference voltage inputs become difficult to fit
on the Series 4 ORCA device as the bus width requirement increases and the package size decreases. There are
two possible solutions for these scenarios. The first solution is to use more than one clock input per group of data
bits. The second solution is to extend the Fast Input clock (Edge Clock) in either direction from the clock pin, by
units of six PICs. The first solution can be achieved with manipulation of the board level clock and data traces,
essentially adding more RC delay to the clock and then matching data traces to the clock. The second solution
changes the data to clock setup and hold time specifications for all pins in the extended region of the data bits. For
the pins used in the extended region, an additional amount of hold time must be added to the pin timing specifications. The amount to be added is listed in Table 11.
Table 11. Additional Hold Time Specifications
Additional Sections of Clock Network         Setup Time   Hold Time (-1)   Hold Time (-2)   Hold Time (-3)
1 Group of 6 PICs in each direction          0 ps         +345 ps          +380 ps          +340 ps
2 Groups of 6 PICs (12) in each direction    0 ps         +690 ps          +760 ps          +680 ps
The (-1), (-2), and (-3) in the above table are the speed grades for the Series 4 ORCA devices.
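As an illustration of how the Table 11 adders combine with a base hold time, the Python sketch below adds the per-group value for the chosen speed grade. The function name is hypothetical, and the base value of 480 ps is the -2 speed grade hold time from Table 2, used here only as an example; setup time is unchanged because the adder is 0 ps.

```python
# Additional hold time per extra group of six PICs, from Table 11 (ps).
ADDITIONAL_HOLD_PS = {"-1": 345, "-2": 380, "-3": 340}


def extended_hold_ps(base_hold_ps, speed_grade, extra_groups):
    """Hold time for a pin that sits extra_groups sections away from the clock pin."""
    if extra_groups not in (0, 1, 2):
        raise ValueError("Table 11 only covers 0, 1 or 2 additional groups")
    return base_hold_ps + extra_groups * ADDITIONAL_HOLD_PS[speed_grade]


print(extended_hold_ps(480, "-2", 1))  # 860
print(extended_hold_ps(480, "-2", 2))  # 1240
```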
Transmit Interface
Output DDR interfaces transmit data (and control) signals on both the positive and negative edges of a clock. Clock
forwarding transmits a clock signal along with the data and control signals with a specified relationship between the
data (and control) bits and the clock signal.
All forms of signal skew directly reduce the margin of the specified relationship. Each transmit interface should use
Series 4 ORCA Primary Clock routing resources to distribute clocks with as little skew as possible. The skew of the
Primary Clock distribution is a function of the size of the array, process, voltage and temperature corners and the
relative relationship of the rise and fall edges of the clock as seen at the end points of the distribution.
Unlike the Fast Input DDR interface, there are no constraints limiting the relative distance between Output DDR pins. The architecture of the Primary Clock resources is such that horizontal circuits have less skew than vertical circuits. This is because the primary distribution has a very low skew spine network, and the branches contribute most of the skew. The branches run vertically, so any circuits that run horizontally across the FPGA tap off the branches at the same point. PICs on the top and bottom edges of the FPGA therefore have less skew between them than the PICs on the left or right edges of the FPGA.
To implement an output DDR circuit, program the output pins as output DDR elements. Any or all of the four pins (A, B, C and D) in a PIC can be in Output DDR mode, and they must share the same clock. This implementation is discussed in the following sections, and a reference design is described in the appendix. The following sections also discuss clock forwarding, which is an essential aspect of high-speed Output DDR interface implementation.
Output DDR Interface
Implementing output DDR circuits in the PICs requires using both the PIO logic and Shift Register flip-flops
together. A diagram showing the Output DDR logic is shown in Figure 10.
For clock forwarding, make sure that the clock uses the primary routing distribution for lowest skew. The Output DDR logic can be driven from either the System Clock (SC) or the Edge Clock (EC).
Figure 10. Different Edge Flip-Flops for Output DDR Interface
[Diagram: the OUT FF I/O flip-flop (output IOQ) and the OUT SH Shift Reg. flip-flop (output SHQ) feed a two-input multiplexer (inputs 0 and 1) that drives Tx Data To Pad; MCLK, SCIOREG and CK are the clock-related signals shown.]
Note: Each flip-flop changes value on the opposite edge from which its output is selected by the MUX, reducing skew.
A diagram of the timing for output DDR with different edge flip-flops is shown in Figure 11.
Figure 11. Output DDR Mode Using Flip-Flops Triggered on Different Edge
[Timing diagram: System Clock, SCIOREG, MCLK, SHQ (SH1, SH2), IOQ (IO1, IO2) and Tx Data to Pad, which carries SH1, IO1, SH2, IO2 in sequence.]
One consideration of using flip-flops with different edge triggers in the output DDR mode PICs is the requirement to have a half cycle transfer somewhere in the logic path prior to the output PICs. A half cycle transfer logic path is a path of logic where the data is transmitted from one flip-flop on one edge of the clock cycle and captured in a flip-flop on the opposite edge of the clock cycle. A half cycle transfer essentially doubles the speed requirements of the logic.
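To make the "doubles the speed requirements" point concrete, the Python sketch below compares the time available to a full cycle path and a half cycle path. It is an idealized calculation that ignores setup time, clock skew and duty cycle distortion; the 156.25 MHz value is an assumption chosen because it corresponds to the 3200 ps bit period used later in this document.

```python
def path_budget_ps(clock_mhz, half_cycle=False):
    """Ideal time available between launch and capture edges, in picoseconds."""
    period_ps = 1e6 / clock_mhz
    return period_ps / 2 if half_cycle else period_ps


print(path_budget_ps(156.25))                   # 6400.0
print(path_budget_ps(156.25, half_cycle=True))  # 3200.0
```

A path that must meet the half cycle budget has to be as fast as a full cycle path running at twice the clock frequency.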
It is further suggested that no half cycle transfers be performed across the PLC array to PIC boundary, because this boundary has a tighter timing constraint than PLC to PLC paths. The PLC logic and PIC logic use the same clock distribution for Output DDR registers, and they run at the same frequency and clock phase. In Output DDR mode operation, the output pins transmit data on both edges of the clock. Both the rising and falling edges of the clock distribution are used, inverted and non-inverted, so data is transmitted out on both edges of the clock.
Clock forwarding in Output DDR mode is accomplished by transmitting the clock out in the same mode as the data, with the shift register flip-flop input and the I/O logic flip-flop input signals tied to opposite levels (high and low). A clock from a PLL, phase shifted by +90 degrees, is used to drive the output clock signal so that it is in the center of the data transitions. A diagram of the Output DDR mode with clock forwarding scheme is shown in Figure 12.
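The idea that a forwarded clock is just an output DDR element with constant inputs can be sketched behaviorally. The Python model below is an illustration under stated assumptions: the function name is hypothetical, each list entry is one half clock cycle, the shift register value is assumed to reach the pad before the I/O flip-flop value within a cycle, and the 90 degree phase shift of the forwarded clock is not modeled.

```python
def output_ddr(shift_reg_bits, io_reg_bits):
    """Pad value per half clock cycle for one output DDR element.

    Each clock cycle sends the shift register flip-flop value first and
    the I/O logic flip-flop value second (assumed ordering).
    """
    if len(shift_reg_bits) != len(io_reg_bits):
        raise ValueError("both registers need one value per clock cycle")
    pad = []
    for sh, io in zip(shift_reg_bits, io_reg_bits):
        pad.extend([sh, io])
    return pad


cycles = 3
tx_data = output_ddr([1, 0, 1], [0, 0, 1])          # arbitrary data bits
tx_clock = output_ddr([1] * cycles, [0] * cycles)   # inputs tied high and low
print(tx_data)   # [1, 0, 0, 0, 1, 1]
print(tx_clock)  # [1, 0, 1, 0, 1, 0]
```

Because the clock and data leave through identical logic, their output delays track each other, which is the point of forwarding the clock this way.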
Figure 12. Output DDR with Clock Forwarding
[Block diagram: a PLL (Clock IN, feedback FB) produces Txclk and Txclk +90, each routed through a Primary Clock Distribution. PLC array logic drives OUTFF[1:N] and OUTSH[1:N] into the Output DDR logic for the DATA/CNTL bits, producing TXDATA[1:N] to pad. In the Output DDR logic for the clock bit, the I/O Logic flip-flop D input is tied to 0 and the Shift Reg flip-flop D input is tied to 1, producing TXCLK to pin.]
A timing diagram of the Output DDR circuit using flip-flops triggered on different edges and clock forwarding is
shown in Figure 13.
Figure 13. Output DDR with Clock Forwarding Timing Diagram
[Timing diagram. Data path: SystemClk, OUTSH (SH1, SH2), OUTFF (IO1, IO2), SHQ, IOQ and TXDATA to pin (SH1, IO1, SH2, IO2). Clock path: SystemClk +90, SHQ=1, IOQ=0 and TXCLOCK to pin.]
Clock Skew Considerations for the Transmit Interface
Output DDR in Series 4 ORCA devices should use the Primary Clock distributions. The skew of the Primary Clock distribution for different edge clocks is listed for different process and temperature conditions in Table 12 and for different speed grades in Table 13.
Table 12. Primary Clock Skews
Condition           OR4E02    OR4E04    OR4E06
Slow @ 125 °C       262 ps    283 ps    297 ps
Nominal @ 125 °C    188 ps    207 ps    219 ps
Nominal @ 85 °C     168 ps    184 ps    195 ps
Nominal @ 25 °C     78 ps     92 ps     101 ps
Fast @ -40 °C       65 ps     73 ps     79 ps
Table 13. Primary Clock Skew by Speed Grade
Speed Grade    OR4E02    OR4E04    OR4E06
-1             262 ps    283 ps    297 ps
-2             258 ps    279 ps    293 ps
-3             217 ps    233 ps    243 ps
DDR outputs, including forwarded clocks, from Series 4 ORCA devices have skew due to the non-symmetry of the “Clock to pin” rising output delay versus falling output delay. This delay and skew are a function of the output buffer type, the capacitive load at the pin and the value of the transmit data. Output skews in DDR mode are listed in Table 14 for HSTL and SSTL2 buffer types at a loading of 30 pF. HSTL18 is a non-standard transmit buffer with a 1.8V power supply.
Table 14. Output Circuit Skew by Speed Grade (DDR Mode with 30 pF Load)
Speed Grade    HSTL      HSTL18    SSTL2     LVDS
-1             542 ps    396 ps    186 ps    TBD
-2             463 ps    344 ps    186 ps    TBD
-3             172 ps    184 ps    186 ps    TBD
Capacitive Loading Effects on Transmit Interface
Output slew changes due to capacitive loading will affect the skew of the “Clock to pin” delay. The slew rate versus capacitance is listed for SSTL2 and HSTL in Table 15 and Table 16, respectively.
Table 15. SSTL2 Class 1 Output Slew Rates
(Grade = speed grade)
Active    Grade -1     Grade -1     Grade -2     Grade -2     Grade -3     Grade -3     Worst Case   Worst Case
Load      Rise Slew    Fall Slew    Rise Slew    Fall Slew    Rise Slew    Fall Slew    Fast Rise    Fast Fall
5 pF      0.300044     0.379045     0.226525     0.263872     TBD          TBD          0.116569     0.113527
10 pF     0.379675     0.461125     0.30894      0.340709     TBD          TBD          0.181648     0.177309
15 pF     0.472546     0.554026     0.400628     0.425461     TBD          TBD          0.258786     0.249184
25 pF     0.693088     0.728555     0.61478      0.599365     TBD          TBD          0.415799     0.399804
30 pF     0.809378     0.815816     0.72652      0.692307     TBD          TBD          0.497285     0.474028
35 pF     0.926506     0.903643     0.836652     0.780091     TBD          TBD          0.580173     0.551003
50 pF     1.29856      1.18164      1.18738      1.06974      TBD          TBD          0.827322     0.781925
75 pF     1.92047      1.64757      1.77291      1.54603      TBD          TBD          1.23570      1.16498
100 pF    2.55386      2.13613      2.35006      2.03201      TBD          TBD          1.65066      1.55485
Table 16. HSTL Class 1 Output Slew Rates
(Grade = speed grade)
Active    Grade -1     Grade -1     Grade -2     Grade -2     Grade -3     Grade -3     Worst Case   Worst Case
Load      Rise Slew    Fall Slew    Rise Slew    Fall Slew    Rise Slew    Fall Slew    Fast Rise    Fast Fall
5 pF      0.700712     2.62658      0.465135     1.32322      TBD          TBD          0.202517     0.304563
10 pF     0.931923     2.61930      0.627783     1.31717      TBD          TBD          0.266703     0.323708
15 pF     1.16967      2.62711      0.79414      1.32718      TBD          TBD          0.334558     0.348327
25 pF     1.67313      2.65482      1.14563      1.36522      TBD          TBD          0.479858     0.390572
30 pF     1.93303      2.66708      1.32808      1.39210      TBD          TBD          0.553889     0.410266
35 pF     2.20361      2.68380      1.51216      1.42150      TBD          TBD          0.630547     0.431769
50 pF     3.03033      2.75449      2.07544      1.51824      TBD          TBD          0.86584      0.490902
75 pF     4.42320      2.91914      3.04104      1.70250      TBD          TBD          1.24917      0.582507
100 pF    5.87570      3.10590      3.99798      1.87271      TBD          TBD          1.66138      0.671156
Series 4 DDR Outputs with Clock Forwarding Results
Output DDR modes with clock forwarding use PLLs to align the output data bits with the output clock signal. PLLs have jitter, which adds skew to the output signals. This added skew reduces the margin of the output signals. The three types of jitter that affect the Output DDR modes are duty cycle jitter, period jitter and output jitter. A definition and specification of each of these is listed in Table 17. More information on PLL operation and PLL jitter can be found in Lattice technical note TN1011, ORCA Series 4 I/O Tuning via PLL.
Table 17. PLL Jitter Definitions and Specifications
Jitter Type: Duty Cycle
Maximum: 0.025 UIp-p
Definition: The min/max measurement of the time between successive rising to falling edges, or falling to rising edges. This value does not include the input jitter, a percentage of which must be added to this value to obtain the total output jitter. Values are 6 sigma results based on the UI at the PFD inputs.

Jitter Type: Period
Maximum: 0.015 UIp-p
Definition: The min/max measurement of the time when all rising edges occur versus the ideal edge locations. This value does not include the input jitter, a percentage of which must be added to this value to obtain the total output jitter. Values are 6 sigma results based on the UI at the PFD inputs.

Jitter Type: M Output vs. N Output
Maximum: 20 ps
Definition: The worst case relationship between these two outputs from the PLL when they are programmed to have the same phase.
Typically, two outputs from a single Series 4 PLL are used to create two clock signals which can be used to drive
any of the output DDR circuits discussed previously. The first clock signal is used to drive all data output pins. The
second clock signal also drives an identical output DDR circuit, but it is shifted in phase by 90 degrees in order to
create a centered, forwarded clock.
In order to create a clock signal, the DDR output mode logic selects between a logic "0" and logic "1".
Output DDR Timing in Series 4
The timing relationship, due to PLL jitter, between the forwarded clock and any data bit is shown in Figure 14.
Figure 14. Timing Diagram (with Phase Lock Loop Jitter)
[Timing diagram: TXCLK and the phase-shifted TXCLK, annotated with Duty Cycle Jitter, Period Jitter, Period Jitter + Output Jitter, Period Jitter + Output Jitter + Duty Cycle Jitter, the Phase Shift and the resulting Margin.]
Figure 14 shows how the Output DDR mode data margin can be calculated using the following formula:
MARGIN = PERIOD/4 - PLL jitter - Clock Skew - “Clock to pin” skew     (1)
The following example shows how the margin is calculated for a -3 speed grade OR4E04 array.
Margin = 1600 ps    Maximum ideal setup or hold time (bit period = 3200 ps)
       -  230 ps    Jitter of PLL
       -  206 ps    Clock skew @ 125 degrees Celsius
       -  235 ps    DDR out circuitry R/F skew HSTL @ 1.8 V
       =  929 ps    Total margin
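The margin formula can be written as a small helper. The Python sketch below uses a hypothetical function name and takes the clock period as 6400 ps, which is an inference from the stated 3200 ps bit period (two bits per clock cycle in DDR mode); the three subtracted terms are the values from the example above.

```python
def ddr_margin_ps(period_ps, pll_jitter_ps, clock_skew_ps, clock_to_pin_skew_ps):
    """Output DDR data margin: quarter period less jitter and both skew terms."""
    return period_ps / 4 - pll_jitter_ps - clock_skew_ps - clock_to_pin_skew_ps


# -3 speed grade OR4E04 example: 1600 - 230 - 206 - 235
print(ddr_margin_ps(6400, 230, 206, 235))  # 929.0
```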
A timing diagram which references the data valid timing parameters is shown in Figure 15.
Figure 15. Data Valid Timing Diagram
[Timing diagram: the Forwarded Output Clock against Transmit Data, with the data valid time Tdv marked before and after each clock edge.]
The specifications for XGMII data valid time, relative to the forwarded output clock, are listed in Table 18 through Table 20 for different sized Series 4 arrays. Since the clock is centered on the data, the minimum guaranteed valid times before and after the forwarded clock are the same.
Table 18. 156MHz XGMII Output DDR Timing Specification for Array OR4E02
Symbol: Tdv (Data Valid Window)
Speed Grade    1.8 V HSTL    1.5 V HSTL    2.5 V SSTL2    2.5 V LVDS
XGMII          0.9600 ns     0.9600 ns     0.9600 ns      N/A
-1             0.6882 ns     0.5422 ns     0.8982 ns      TBD
-2             0.7442 ns     0.6252 ns     0.9022 ns      TBD
-3             0.9452 ns     0.9572 ns     0.9432 ns      TBD
Table 19. 156MHz XGMII Output DDR Timing Specification for Array OR4E04
Symbol: Tdv (Data Valid Window)
Speed Grade    1.8 V HSTL    1.5 V HSTL    2.5 V SSTL2    2.5 V LVDS
XGMII          0.9600 ns     0.9600 ns     0.9600 ns      N/A
-1             0.6672 ns     0.5212 ns     0.8772 ns      TBD
-2             0.7232 ns     0.6042 ns     0.8812 ns      TBD
-3             0.9292 ns     0.9412 ns     0.9272 ns      TBD
Table 20. 156MHz XGMII Output DDR Timing Specification for Array OR4E06
Symbol: Tdv (Data Valid Window)
Speed Grade    1.8 V HSTL    1.5 V HSTL    2.5 V SSTL2    2.5 V LVDS
XGMII          0.9600 ns     0.9600 ns     0.9600 ns      N/A
-1             0.6532 ns     0.5072 ns     0.8632 ns      TBD
-2             0.7092 ns     0.5092 ns     0.8672 ns      TBD
-3             0.9192 ns     0.9312 ns     0.9172 ns      TBD
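The Tdv entries in Table 18 through Table 20 appear consistent with Equation 1 when the clock skew is taken from Table 13 and the output circuit skew from Table 14. The Python sketch below makes that cross-check under one loud assumption: a PLL jitter term of 253.8 ps, which is back-calculated from the published entries and is not a value stated in this document. The function and constant names are illustrative.

```python
QUARTER_PERIOD_PS = 1600.0
ASSUMED_PLL_JITTER_PS = 253.8  # inferred by back-calculation, NOT from the source

CLOCK_SKEW_PS = {  # Table 13: Primary Clock skew by speed grade
    "OR4E02": {"-1": 262, "-2": 258, "-3": 217},
    "OR4E04": {"-1": 283, "-2": 279, "-3": 233},
    "OR4E06": {"-1": 297, "-2": 293, "-3": 243},
}
OUTPUT_SKEW_PS = {  # Table 14: output circuit skew at 30 pF
    "HSTL": {"-1": 542, "-2": 463, "-3": 172},     # 1.5 V HSTL
    "HSTL18": {"-1": 396, "-2": 344, "-3": 184},   # 1.8 V HSTL
    "SSTL2": {"-1": 186, "-2": 186, "-3": 186},    # 2.5 V SSTL2
}


def data_valid_ns(array, buffer_type, grade):
    """Data valid window from Equation 1, in nanoseconds."""
    margin_ps = (QUARTER_PERIOD_PS - ASSUMED_PLL_JITTER_PS
                 - CLOCK_SKEW_PS[array][grade]
                 - OUTPUT_SKEW_PS[buffer_type][grade])
    return round(margin_ps / 1000, 4)


print(data_valid_ns("OR4E04", "HSTL18", "-3"))  # 0.9292, as in Table 19
print(data_valid_ns("OR4E02", "HSTL", "-1"))    # 0.5422, as in Table 18
```

Under the same assumption, the OR4E06, -2, 1.5 V HSTL entry computes to 0.5902 ns rather than the 0.5092 ns listed in Table 20. The sketch cannot say which is correct, so that entry should be checked against the latest Lattice documentation.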
Conclusion
This document has specified how high-speed, clock-forwarded, double data rate interfaces are accomplished with ORCA Series 4 FPGA and FPSC devices. Their potential use in applications such as the standard XGMII interface has been shown for the various options, with the results showing that operation is possible at very high speeds.
Appendix A. XGMII Example Code
A structural design example, available for download from the Lattice web site, shows how XGMII specifications are met with DDR mode input and output registers using both edges of the clock. Clock forwarding is also shown in this example design. The design is in Verilog HDL with attributes set for synthesis with the Synplicity Synplify synthesis tool. A preference file for ispLEVER is also included with the design example. This preference file, also known as a constraints file, has the necessary "use primary clock" directives for the input and output clocks.
At the top of the design there are module instantiations for ORCA library elements with attributes that keep the synthesis software from removing these black box modules. These elements include the input buffer (IBM), the output buffer (OB6), the high-speed, programmable, phase-locked loop (HPPLL), and the DDR modules (HIODDR and IODDR). A block diagram showing the dataflow and clocking structure is shown in Figure 16.
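Figure 16. XGMII Example Design Block Diagram
[Block diagram. Receive path: Data and Control In (36 bits) through IBM input buffers into HIODDR (72 bits out) and PFU flip-flops, clocked from Input Clock In through an IBM buffer. The looped data feeds PFU flip-flops (72 bits), then IODDR and OB6 output buffers to Data and Control Out (36 bits). Output Clock In passes through an IBM buffer and the HPPLL; the 90° phase shifted clock drives an IODDR with inputs "01" and an OB6 buffer to Output Clock Out.]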
In this interface design example, the IODDR and HIODDR instantiated elements envelop the shift register and the I/O logic registers discussed in this document. The locate attributes are needed so that the I/O buffer and shift register elements fit the ORLI10G-680 hardware package.
This example design loops the input data back to the output data. Data processing code (e.g. an encoder/decoder) can be inserted into this design example in place of the data loop shown in Figure 16.
The HIODDR and IODDR library elements must be port mapped to match the pin bond-out for a given package. Therefore, care must be taken when porting this design example to any other ORCA device, such as the ORT82G5, which is not pin-for-pin compatible with the ORLI10G-680 device.
For example, the first HIODDR in the design example is instantiated as follows:
// IND0, IND2 and IND3 correspond to pads A, C and D of PIC PB17 (pad B is not bonded out)
HIODDR DDR_B17_inst (
    .IND0(bxgmii_data_in[0]), .IND2(bxgmii_data_in[1]),
    .IND3(bxgmii_data_in[2]), .CK(bxgmii_inclk),
    .IN1Q1(xgmii_indata_64b[32]), .IN1Q0(xgmii_indata_64b[0]),
    .IN3Q1(xgmii_indata_64b[33]), .IN3Q0(xgmii_indata_64b[1]),
    .IN4Q1(xgmii_indata_64b[34]), .IN4Q0(xgmii_indata_64b[2]))
    /* synthesis LOC="SHIFT_B17" */;
With this HIODDR element, the first three external DDR bits of data, bxgmii_data_in[0:2], are converted into six single data rate (SDR) bits: negative-edged xgmii_indata_64b[32:34] and positive-edged xgmii_indata_64b[0:2]. With all HIODDR and IODDR elements, IND[0:3] maps to IN[1:4]Q0 and IN[1:4]Q1.
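The port and bit numbering in this instantiation can be summarized with two small helpers. The Python sketch below is only an index calculator with hypothetical function names; the offset of 32 between the positive-edge and negative-edge bits is extrapolated from the three bits shown in the instantiation and is an assumption for the rest of the bus.

```python
SDR_OFFSET = 32  # assumed: DDR bit i maps to SDR bits i and i + 32


def hioddr_ports(ind_index):
    """Port names for one pad: IND[0:3] maps to IN[1:4]Q0 and IN[1:4]Q1."""
    if not 0 <= ind_index <= 3:
        raise ValueError("a PIC has four pads, IND0 to IND3")
    n = ind_index + 1
    return f"IND{ind_index}", f"IN{n}Q0", f"IN{n}Q1"


def sdr_bit_indices(ddr_bit):
    """(positive-edge Q0 index, negative-edge Q1 index) in the SDR bus."""
    if not 0 <= ddr_bit < SDR_OFFSET:
        raise ValueError("DDR bit index out of range")
    return ddr_bit, ddr_bit + SDR_OFFSET


print(hioddr_ports(2))     # ('IND2', 'IN3Q0', 'IN3Q1')
print(sdr_bit_indices(1))  # (1, 33)
```

The second result matches the instantiation, where bxgmii_data_in[1] on IND2 produces xgmii_indata_64b[1] on IN3Q0 and xgmii_indata_64b[33] on IN3Q1.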
As can be seen from the Verilog attribute above, this HIODDR element is located to the I/O shift-register
SHIFT_B17, which is the 17th I/O shift register on the bottom side of the device (counted left to right). There is one
PIC, with four PIOs, for each I/O shift register. For SHIFT_B17, the corresponding PIC that it can connect to is
named PB17 in pinout tables. The four individual I/Os for this PIC are PB17[A:D], but not all of these four I/Os are
bonded out.
The IND0, IND2, and IND3 pins are used in this port-mapping as they correspond to the bonded out A, C, and D
pads of the PIC PB17. IND1 is not used because pad B is not bonded out for the ORLI10G in the 680 pin package.
The pinout for all PICs is given in the device data sheet pinout table, where the pins are listed in PIC name order under the Pin Description column.
Appendix B. LVDS 8:1 MUX/DEMUX Example Code
A structural design example, available for download from the Lattice web site, shows how 350 MHz LVDS DDR interfaces can be used in ORCA Series 4 devices. The 350 MHz data rates are achieved with the -3 speed grade ORCA devices. This example design is 17 bits wide: 16 bits of data and one bit of control.
There are two designs in Verilog HDL with attributes set for synthesis with the Synplicity Synplify synthesis tool. One design, lvds_ddr_in.v, is for a receive interface and the other design, lvds_ddr_out.v, is for a transmit interface. A preference file for ispLEVER is also included for each example design. This preference file, also known as a constraints file, has the necessary "use primary clock" directives for the input and output clocks, along with the frequency constraints for easily making 311 MHz with -2 speed grade ORCA devices. An orca4_synplify.v source file is also included, which may be needed when running a stand-alone synthesis run with Synplify.
In order to perform data processing at lower speeds inside the array logic, the data needs to be parallelized and
synchronized to a lower speed clock. This is done in two stages, a 2:1 multiplexer/demultiplexer with the DDR elements coupled with a dual 4:1 multiplexer/demultiplexer with PFU based registers.
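The relationship between the two stages can be checked with simple arithmetic. The Python sketch below uses a hypothetical function name and an arbitrary fast clock of 200 MHz chosen only for illustration; it assumes the DDR stage moves two bits per fast clock cycle and that the PFU stage runs the array at one quarter of the fast clock.

```python
def mux_rates(fast_clk_mhz, lanes=17):
    """Rates implied by a 2:1 DDR stage followed by a 4:1 PFU register stage."""
    ddr_ratio = 2  # IODDR/HIODDR stage: two bits per fast clock cycle
    pfu_ratio = 4  # PFU register stage: array runs at fast clock / 4
    mux_ratio = ddr_ratio * pfu_ratio
    return {
        "mux_ratio": mux_ratio,                       # overall 8:1
        "pin_rate_mbps": fast_clk_mhz * ddr_ratio,    # per pin
        "array_clk_mhz": fast_clk_mhz / pfu_ratio,    # low-speed clock
        "array_bus_bits": lanes * mux_ratio,          # parallel width in the array
    }


print(mux_rates(200.0))
```

With 17 lanes and an 8:1 ratio, the array side handles 136 bits per low-speed clock cycle.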
A block diagram showing the transmit dataflow and clocking structure for 1 of the 17 multiplexers is shown in
Figure 17.
Figure 17. LVDS Transmit Data Flow Diagram
[Diagram: array registers holding tldata[0], tldata[16], tldata[32], tldata[48], tldata[64], tldata[80], tldata[96] and tldata[112] feed an IODDR and an OLVDS buffer driving tdat_p[0] and tdat_n[0]; clock and select signals shown are ref_clk_pll, fast_clk and fast_clk_puls.]
The 4:1 multiplexers in the above figure are implemented in the design example as PFU based 4-bit shift registers
(FD1S3AX elements) and 2-input multiplexer registers (FL1S3AY elements). These elements are instantiated in
the code so that they can be easily located in the array via the LOC attribute. The manual placement of these components is key to meeting the high-speed performance of the interface. The fast_clk_puls select signal is created
from a 2-bit sample and decode of the low-speed clock, to manage the transfer from the low-speed reference clock
domain into the high-speed clock domain. For the transmit interface, the reference clock is phase-locked with a
high-speed programmable PLL element (HPPLL) which also multiplies the reference clock by 4 to generate the fast
clock. This fast clock is forwarded off-chip with the high-speed data through an IODDR element to match the data
skew off-chip.
A block diagram showing the receive dataflow and clocking structure for 1 of the 17 demultiplexers is shown in Figure 18.
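Figure 18. LVDS Receive Data Flow Diagram
[Diagram: rdat_p[0] and rdat_n[0] enter an ILVDS buffer and an HIODDR, followed by array registers producing rldata[0], rldata[16], rldata[32], rldata[48], rldata[64], rldata[80], rldata[96] and rldata[112]; clock and select signals shown are fast_clk, fast_clk_puls, fast_clk_npuls and qtr_clk.]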
The 4:1 demultiplexers in the above figure are implemented in the design example as PFU based 4-bit shift registers (FD1S3AX elements) and 2-input multiplexer registers (FL1S3AY elements). These elements are instantiated in the code so that they can be easily located in the array via the LOC attribute. The manual placement of these components is key to meeting the high-speed performance of the interface. The fast_clk_puls and fast_clk_npuls select signals are created from a 2-bit sample and decode of the low-speed clock to manage the transfer from the fast clock domain into the quarter clock domain. Two different pulses are used: one for the rising edge that interfaces to the DDR element and one for the falling edge. For the receive interface, the fast clock is an input and is divided by four by counters with inverted feedback to generate the quarter-speed clock (qtr_clk).
In this interface design example, the IODDR and HIODDR instantiated elements envelop the shift register and the I/O logic registers discussed in this document. The locate attributes are needed so that the I/O buffer and shift register elements fit the OR4E04-680 hardware package.
This example design registers the multiplexed/demultiplexed data in the array registers. An application-dependent design can be connected to these array registers.
The HIODDR and IODDR library elements must be port mapped to match the pin bond-out for a given package. Therefore, care must be taken when porting this design example to any other ORCA device, such as the ORT82G5, which is not pin-for-pin compatible with the OR4E04-680 device. For examples of the IODDR and HIODDR instantiations, please refer to Appendix A.
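One common way to divide a clock by four with inverted feedback is a two-stage Johnson (twisted ring) counter. The Python sketch below simulates that structure edge by edge; it is an assumption about the kind of counter meant here, not a description of the actual reference design, and the function name is hypothetical.

```python
def divide_by_four(fast_clock_edges):
    """Simulate a 2-stage counter with inverted feedback on each rising edge.

    Returns the second-stage output after every fast clock edge; it
    repeats every four edges with a 50% duty cycle.
    """
    q0 = q1 = 0
    qtr_clk = []
    for _ in range(fast_clock_edges):
        # Both stages update together: stage 0 takes the inverted
        # stage 1 output, stage 1 takes the old stage 0 output.
        q0, q1 = 1 - q1, q0
        qtr_clk.append(q1)
    return qtr_clk


print(divide_by_four(8))  # [0, 1, 1, 0, 0, 1, 1, 0]
```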
Technical Support Assistance
Hotline: 1-800-LATTICE (Domestic)
1-408-826-6002 (International)
e-mail: [email protected]