LatticeSC QDRII/II+ SRAM Memory Interface User’s Guide November 2007 Technical Note TN1096 Introduction Among the most daunting challenges faced by the FPGA designer is the efficient transport of data to external memories. Current applications require large I/O channel bandwidths. In response to these demands, the industry has defined several new memory devices with their associated protocols (e.g., QDR, DDR, RLDRAM), each being optimized for a particular segment of the high-bandwidth market. This User Guide discusses a memory interface for a Second-Generation Quad-Data-Rate SRAM (QDRII/II+ SRAM), implemented in the LatticeSC™ FPGA. The two data buses (write and read) share a common address bus. The two addresses are independent, and therefore are time-multiplexed. QDRII/II+ SRAMS offer two architectures to provide this: 4-word burst and 2-word burst. The 4-word burst option is simpler and more robust, since the addresses toggle at single-data-rate (SDR). The 2-word burst option addresses, on the other hand, toggle at double-data-rate (DDR), but provide 2X finer granularity. The LatticeSC family can support both options, but this user’s guide concentrates on the simpler and more popular 4-word burst device, since granularity is not typically an issue. QDRII/II+ SDRAM Description History The following is a brief timeline of the major events in the emergence of the QDRII/II+ SRAM: • QDR Consortium Formed – Feb ‘99 • QDR-I Specifications Released – 2H99 • QDR-II Defined – 2H00 • QDR-I 9Mb Sampled – 1H01 • NEC Joins QDR Consortium – 1Q01 • Samsung Joins QDR Consortium – 2Q01 • Hitachi Signs LOI to Join QDR Consortium – 3Q01 • QDR-II Specifications Released – 4Q01 • QDR-II 18Mb Sampled – 4Q01 Specifications and Performance The QDRII/II+ SRAM is targeted to applications requiring: • Very high bandwidth • Low latency • Ratio of reads to writes of approximately one-to-one Table 1 lists some of the features of the QDRII/II+ SRAM solution: © 2007 Lattice Semiconductor Corp. All Lattice trademarks, registered trademarks, patents, and disclaimers are as listed at www.latticesemi.com/legal. All other brand or product names are trademarks or registered trademarks of their respective holders. The specifications and information herein are subject to change without notice. www.latticesemi.com 1 tn1096_01.1 LatticeSC QDRII/II+ SRAM Memory Interface User’s Guide Lattice Semiconductor Table 1. QDRII/II+ SRAM Features QDRII/II+ SRAM Feature Max Frequency DLL Value Units 300 MHz Yes (optional) — Min DLL Frequency 119 MHz Initial Latency 1.5 Clocks Echo Clocks Max Address Size Yes — 19 (2-word burst) 20 (4-word burst) Bits 18/36 Bits Data Width Supply Voltage I/O Protocol 1.8 Volts HSTL-15, HSTL-18 — No — Refresh Cycles Needed? Implementation Challenges The most difficult aspect of a QDRII/II+ SRAM Memory Interface implementation is the read bus clock design. This clock, CQ/CQ#, is generated by the QDRII/II+ SRAM, and it is in phase with the read data; that is, the clock and the data transitions occur concurrently. The Memory Interface is charged with the task of properly latching the data, based on the CQ clock’s transitions. In order to accomplish this, while achieving maximum timing margins, it is necessary to shift the clock edge to the center of the “eye” (the interval between successive data transitions), which evenly distributes available margin between setup and hold requirements. LatticeSC devices provide PLL/DLL functions that can dynamically perform this operation easily, simply and efficiently. LatticeSC Features That Solve the Implementation Challenges The LatticeSC family provides many features that make high-speed systems easier to build. Among them are: • Per-pin DDR (Double-Data-Rate) capability built in. Usable in both DDR and QDR (Quad-Data-Rate) implementations. • Per-pin DDR (Double-Data-Rate) incorporates hardwired serial-to-parallel and parallel-to-serial conversion. Useful in SPI4 implementations. • I/Os that support SSTL (Stub-Series Terminated Logic), used with high-speed DDR SDRAM devices, and HSTL (High-Speed Transceiver Logic), used with high-speed SRAM devices. • Optional delays built into the I/O paths that can be statically or dynamically set on a per-pin or bus basis. The I/O pin delay blocks match the delay blocks that are internal to the DLLs, so that the DLLs can adaptively determine the optimum delay setting, and then apply that setting to the I/O pin delay blocks. In this way, the DLL can compensate for effects of frequency, voltage, temperature and device-to-device variation. • PLLs that can generate two high-speed clocks that are in quadrature (90° out of phase with each other). • Integrated and calibrated I/O pin input parallel terminations that can be used to match the 50-60 Ohm traces typically found on printed circuit boards over process, voltage and temperature variation. • Integrated and calibrated I/O pin output serial terminations that allow output buffers to match the 50-60 Ohm traces typically found on printed circuit boards over process, voltage and temperature variation. • High-speed FPGA fabric. 2 LatticeSC QDRII/II+ SRAM Memory Interface User’s Guide Lattice Semiconductor LatticeSC QDRII/II+ Memory Interface Implementation Details Block Diagram Figure 1 shows the block diagram for a typical QDRII/II+ Memory Interface implemented in the LatticeSC FPGA. Figure 1. Block Diagram (Typical) ref_clk Clock Manager K KN k_clk qdr_wr_cmd_fifo_we qdr_wr_cmd_fifo_full qdr_write_addr qdr_write_control Clock net push Write Instruction & Control FIFO DIa DOa DIb DOb rst qdr_rd_cmd_fifo_we qdr_rd_cmd_fifo_full qdr_read_addr qdr_read_control ff_rst_n pop push Read Instruction & Control FIFO DIa W DOa DIb R DOb rst Address Mux A W R pop Write State Machine qdr_write_data_ready WN Read State Machine qdr_read_data_valid RN Write Data Path qdr_write_data Read Data Path qdr_read_data FPGA Fabric Interface 3 QDRII/II+ SRAM Interface D QVLD Q CQ LatticeSC QDRII/II+ SRAM Memory Interface User’s Guide Lattice Semiconductor Pinout Figure 2 shows the pinout for a typical QDRII/II+ Memory Interface implemented in the LatticeSC FPGA. Inputs are on the left, and outputs on the right. The internal FPGA interface is on top, and the external HSTL interface is on the bottom. Figure 2. QDRII/II+ SRAM Memory Interface Pinout Interface (Typical) refclk k_clk ff_rst_n qdr_write_addr qdr_write_control qdr_wr_cmd_fifo_we qdr_read_addr qdr_read_control qdr_rd_cmd_fifo_we qdr_write_data ‡A ‡D QDRII/II+ SRAM Memory Interface Logic (FPGA Fabric Interface) qdr_wr_cmd_fifo_full ‡A qdr_rd_cmd_fifo_full ‡D ‡B ‡B CQ ‡E Q ‡C (QDRII/II+ SRAM Interface) (HSTL) QVLD Notes: ‡A – 17-20 bits wide ‡B – 36 or 72 bits wide ‡C – 18 or 36 bits wide ‡D – Number and meaning of these signals is implementation-dependent ‡E – CQ# is typically not used; instead, data is strobed on both the rising and falling edges of CQ. 4 ‡A ‡C qdr_write_data_ready qdr_read_data_valid qdr_read_data K KN A D WN RN LatticeSC QDRII/II+ SRAM Memory Interface User’s Guide Lattice Semiconductor Signal List Table 2 below lists the basic set of interface signals and gives their descriptions. Table 2. Interface Pin Table Pin Name Direction refclk Width Description 1 Input clock. This input feeds a PLL which supplies all clocks to the unit and the QDRII/II+ SRAM. The PLL may provide frequency multiplication/division. qdr_rst_n Active-LOW asynchronous reset qdr_write_addr 17-20 qdr_write_control qdr_wr_cmd_fifo_we In From FPGA qdr_read_addr Write address * Write control signals. *Note: Number and content is implementation-dependent. 1 Accompanies valid data on “qdr_write_addr” and “qdr_write_control”. 17-20 Read address. qdr_read_control * Read control signals. *Note: Number and content is implementation-dependent. qdr_rd_cmd_fifo_we 1 Accompanies valid data on “qdr_read_addr” and “qdr_read_control”. qdr_write_data 36/72 Write data bus. k_clk 1 This is a copy of the unit’s internal clock, derived from input signal “refclk”. qdr_wr_cmd_fifo_full 1 Reports the full state of the Write Instruction FIFO. qdr_rd_cmd_fifo_full 1 Reports the full state of the Read Instruction FIFO. 1 Signals that the Memory Interface is accepting the data currently on bus “qdr_write_data”. qdr_write_data_ready Out to FPGA qdr_read_data_valid 1 qdr_read_data 36/72 K 2 Accompanies valid data on bus “qdr_read_data” Read data bus. QDRII/II+ SRAM’s input clocks “K” and “K#”. K is the Memory Interface unit’s clock, delayed 90º. KN is the inverse of K. KN A Out to SRAM D 17-20 QDRII/II+ SRAM’s address bus “A”. Width is 17 bits in 4-word burst mode, 18 bits in 2-word burst mode. 18/36 QDRII/II+ SRAM’s write data bus “D”. WN 1 QDRII/II+ SRAM’s active-LOW write enable “W#”. RN 1 QDRII/II+ SRAM’s active-LOW read enable “R#”. 1 CQ is the clock for DDR bus “Q”. Note: the CQ# signal from the QDRII/II+ SRAM is not used. Instead, both the rising and falling edges of CQ are used to clock incoming data. 18/36 QDR_II SRAM’s read data bus “Q” and accompanying valid signal "QVLD." CQ In from SRAM Q, QVLD Description of Operation The QDRII/II+ SRAM features independent read and write buses and control. Also, because the memory is static, there is no need to allocate bus cycles for refresh. Together, this means that the memory is always available for both read and write operations. The only interdependences (for 4-word-burst QDR-SRAMS only) are (1) new operations must of course wait for a previous burst of the same type (read vs. write) to complete, and (2) the read and write operations must share the address bus, so they cannot start on the same clock. Otherwise, there is nothing that can impede the fulfillment of a read or write request. The first interdependency is satisfied if access requests skip every other clock cycle. The second is satisfied if the reads and writes alternate, each utilizing the address bus during the clock that the other is inactive. Once this “cadence” is established, data can flow at a constant 100% bandwidth rate, on both the read and write bus. For most applications, the memory must be able to handle sustained maximum-bandwidth transfers, so the memory bandwidth must be equal to or greater than the channel bandwidth. In this case, FIFOs on the read and 5 LatticeSC QDRII/II+ SRAM Memory Interface User’s Guide Lattice Semiconductor write channels are needed only for clock domain transition and buffering of the few data words needed as the channel meshes with the memory’s cadence. A memory interface would typically perform this buffering. Additionally, a memory interface may perform DMA functions, such as extending burst lengths to accommodate larger blocks of read or write data, and buffering multiple block read/write instructions. This instruction buffering also provides the look-ahead necessary to maintain 100% bandwidth operation (i.e., no idle cycles). Write A typical write sequence, as it appears on the pins of the QDRII/II+ SRAM, is shown in Figures 3 (4-word burst) and 7 (2-word burst). The accompanying read operation represents the earliest read that will return the data just written by the write operation. For a 2-word burst device, a write is indicated by signal W# being active LOW during a rising edge on clock K. The first write data word is presented on bus D during this clock edge as well, but the write address is not presented until 1/2 clock later, on the rising edge of clock K#, when the second (final) write data word is also presented on bus D. For a 4-word burst device, a write is also indicated by signal W# being active LOW during a rising edge on clock K. The write address is also presented during that same clock edge. The first of four write data words is received on the subsequent rising edge of clock K, with the final three write data words being received on the following three rising edges on K#, K and K# respectively. In a typical memory interface application, a write operation is initiated by driving signal qdr_wcmd_fifo_wenab active HI, and at the same time providing the address on bus qdr_write_addr and the block length on bus qdr_write_block_length. This operation will be held in the write instruction FIFO until its turn comes up, and then the data will be written. The first write data must be waiting on bus qdr_write_data, ready to be accepted, as early as five cycles after qdr_wcmd_fifo_wenab is asserted. When the data is accepted, as indicated by the assertion of qdr_write_data_ready, the second word must be presented on the following cycle. If there are additional bursts of data, they will be accepted on subsequent cycles without interruption until the block is completed. If multiple write operations are stacked up in the write instruction FIFO, the corresponding data will be taken in the same order. If the write instruction FIFO is kept from emptying or new writes are initiated at the same rate as they are executed, then uninterrupted full-bandwidth writes will continue indefinitely. Read A typical read sequence, as it appears on the pins of the QDRII/II+ SRAM, is also shown in Figures 3 (4-word burst) and 7 (2-word burst). For a 2-word burst device, a read is indicated by signal R# being active LOW during a rising edge on clock K. The read address is presented on bus D during this clock edge as well. The first and second read data words are returned coincident with the falling and rising edges on echo clock CQ (note that the falling edge of CQ occurs at the same time as the rising edge on CQ#), starting with the falling edge of CQ that occurs 21/2 cycles after the rising edge of K that clocks in R# LOW. For a 4-word burst device, a read is also indicated by signal R# being active LOW during a rising edge on clock K. The read address is also presented during that same clock edge. The four read data words are returned coincident with the falling and rising edges on echo clock CQ (note that the falling edge of CQ occurs at the same time as the rising edge on CQ#), starting with the falling edge of CQ that occurs 11/2 cycles after the rising edge of K that clocks in R# LOW. The first of four read data words is received on the subsequent rising edge of clock K, with the final three read data words being received on the following three rising edges on K#, K and K# respectively. A read operation is initiated by driving signal qdr_rcmd_fifo_wenab active HI, and at the same time providing the address on bus qdr_read_addr and the block length on bus qdr_read_block_length. This operation will be held in the read instruction FIFO until its turn comes up, and then the requested data will be read. As the read data 6 LatticeSC QDRII/II+ SRAM Memory Interface User’s Guide Lattice Semiconductor becomes available, it will be presented on bus qdr_read_data, accompanied by asserted qdr_read_data_valid. The data for the entire block will be provided uninterrupted. As with writes, if multiple read operations are stored in the read instruction FIFO, the data will be accessed and presented in the same order as requested. If the read instruction FIFO is kept from emptying or new reads are initiated at the same rate as they are executed, then uninterrupted full-bandwidth reads will continue indefinitely in 4-word burst mode. Fulls and Empties Both the write and read interfaces utilize fulls and empties to facilitate the throttling of the loading of FIFOs. The signals are similar between the two interfaces. A full signal informs the FPGA logic that the respective FIFO cannot accept another instruction. A memory interface implementation may also provide an “almost full” indication that enables the pipelined flow to be stopped in time to avoid overflow, while still allowing full-speed pipelined operation. An empty indication may not be provided, since it is primarily used on the memory interface’s side of the FIFO to control its emptying, but may be provided to allow the FPGA logic to detect the idle condition. Figure 3. QDRII/II+ SRAM Interface Timing (4-Word Burst Mode) 0 1 2 3 4 5 6 7 K K# R# R1 W# Address[19:0] D[35:0] R2 W1 R1 W1 W2 R2 W1a W2 W1b W1c W1d W2a W2b W2c W2d CQ Q[35:0] R1a R1b QVLD 7 R1c R1d R2a R2b R2c R2d 8 9 LatticeSC QDRII/II+ SRAM Memory Interface User’s Guide Lattice Semiconductor Figure 4. QDRII/II+ SRAM Interface Timing (2-Word Burst Mode) 0 1 2 3 4 5 R# R1 R2 R3 R4 W# W1 W2 W3 W4 Address[19:0] R1 W1 R2 W2 R3 W3 R4 W4 W1a W1b W2a W2b W3a W3b W4a W4b 6 7 8 K K# D[35:0] CQ Q[35:0] R1a QVLD 8 R1b R2a R2b R3a R3b R4a R4b 9 9 Q[35:0] CQ D[35:0] A[19:0] RN WN K qdr_rcmd_fifo_empty qdr_read_data[71:0] qdr_read_data_valid qdr_rcmd_fifo_wenab qdr_wcmd_fifo_empty qdr_write_data[71:0] qdr_write_data_ready qdr_wcmd_fifo_wenab k_clk 0 1 2 3 4 5 6 7 WD0 8 WD1 9 10 11 WA RA 13 14 15 RD RD RD RD 0a 0b 1a 1b WD WD WD WD 0a 0b 1a 1b 12 16 RD0 17 RD1 18 19 20 Lattice Semiconductor LatticeSC QDRII/II+ SRAM Memory Interface User’s Guide Figure 5. Complete 4-Word Write/Read Sequence, Reading Back Just-Written Data 10 WD0 6 7 8 Q[35:0] CQ WD 0b 5 WD 0a 4 D[35:0] 3 WA 2 RA 1 A[19:0] RN WN K qdr_rcmd_fifo_empty qdr_read_data[71:0] qdr_read_data_valid qdr_rcmd_fifo_wenab qdr_wcmd_fifo_empty qdr_write_data[71:0] qdr_write_data_ready qdr_wcmd_fifo_wenab k_clk 0 9 RD 0a 10 RD 0b 11 12 13 RD0 14 15 Lattice Semiconductor LatticeSC QDRII/II+ SRAM Memory Interface User’s Guide Figure 6. Complete 2-Word Write/Read Sequence, Reading Back Just-Written Data LatticeSC QDRII/II+ SRAM Memory Interface User’s Guide Lattice Semiconductor Write/Read The timing of a single basic write/read pair of operations having block lengths of one is detailed in Figure 5 (4-word burst) and Figure 6 (2-word burst). Both figures depict a write followed by a read in which the initial state is idle (both the read and write FIFOs are empty), and the read is the earliest that reads back the data just written. QDRII/II+ SRAM Clocks: K/K#, C/C# and CQ/CQ# The QDRII/II+ SRAM utilizes three clock pairs: 1. K/K# – The K input clock accompanies the input signals (A, D, W# and R#) to the SRAM. It clocks these signals into the device and drives the internal logic. This clock is in a quadrature relationship with its data; that is, it is 90° out of phase. As such, the clock edge transitions are centered in the middle of the data “eye”. 2. C/C# – The C input clock is synchronous to the K input clock, but can differ slightly in phase by a constant amount, within specified limits. The read output clock CQ for data bus Q is normally aligned to this C clock. This allows the timing of the Q bus to be adjusted - to align multiple SRAM devices, for example. If this capability is not necessary, as in the present case, the C clock can be disabled by driving both C and C# to a constant HIGH. The CQ clock is then aligned to the K clock. 3. CQ/CQ# – The CQ output clock, also referred to as the echo clock, is generated by the SRAM to accompany the Q output data bus, and its edge timing is similar or identical to that of the Q bus, not in a quadrature relationship as is the case for the K clock. Therefore, the LatticeSC FPGA is required to effectively shift the CQ clock by 90° before using it to capture the data on the Q bus. The current LatticeSC design derives clocks for both phases of the DDR Q data bus from the CQ clock, and does not utilize the CQ# clock. LatticeSC I/O Buffers As can be seen in Figure 8, the QDRII/II+ SRAM Memory Interface utilizes DDR HSTL I/O buffers for both the outputs (for the clock, address, write data and control), and the inputs (for the read data and echo clock). Since the outputs’ parallel to serial conversion and the inputs’ serial to parallel conversion is performed right in the I/O buffers, there is no need for the FPGA fabric to be able to handle data at the doubled clock rate. LatticeSC I/O buffers that are used as inputs (input-only or bi-directional) feature the ability to provide internal termination. Two termination configurations are available (Figure 7): a. Termination directly to VTT via a 60-Ohm impedance. b. Termination via a Thevenin-equivalent 50-Ohm network across VDDIO and VSS. Figure 7. LatticeSC Input Buffer Configurations VTT VDDIO 60 100 100 VSS (a) Termination to VTT (b) Termination to VDDIO/VSS The input buffers on LatticeSC devices can also insert a predetermined delay. This can be used to align the bits on a data bus, or to align the entire bus to its clock. Individual bits can be assigned a constant delay, and all bits in a bank can be assigned a common dynamic delay value. 11 LatticeSC QDRII/II+ SRAM Memory Interface User’s Guide Lattice Semiconductor The constant delay is used in the QDRII/II+ SRAM Memory Interface design to compensate for varying input delays across the bits in the Read Data Bus Q. The dynamic delay is used to shift the clock for that bus, CQ, so that it is centered in the eye of the data bus, as described in the next section. In addition to the input buffer features discussed above, the output buffers are configurable to provide either 50 Ω (HSTL-I) or 25 Ω (HSTL-II) output impedance, which is internally compensated for variations in voltage, temperature and device processing. Clocking Challenges and Solutions Figure 8 illustrates the clocking network. Several unique features of the LatticeSC architecture are utilized in this design. A PLL [1] is used to perform frequency multiplication of the input clock “refclk”, and at the same time to generate a second clock that is 90° lagging, so that the clocks “K” and “K#” to the QDRII/II+ SRAM can transition in the center of the data eye of bus “D”. Both “k_clk” and “k_clk shifted 90°” are typically routed on primary clock nets so that there is very little skew from the ideal 90° offset. The clocks “K” and “K#” are then generated using the same DDR output buffer elements as are used in the buffers for output data and control signals, so that once again very little skew is introduced. These two clocks are generated by simply sending a constant “10” pattern on outputs that are in every other respect identical to the data and control outputs. A Valid Timing Chain [2] generates a data valid signal at the correct time to line up with the returning read data by duplicating the latency in the external QDRII/II+ SRAM and board routing. This is necessary because there is nothing returned from the QDRII/II+ SRAM with the read data to identify the valid data. Note that for 2-word bursts, the valid is asserted for one full clock (two half-clocks), and for 4-word bursts, it is asserted for two full clocks (four halfclocks). Note also that the number of registers in the timing chain varies to match the read latency (1.5, 2.0 or 2.5) of the QDRII/II+ SRAM. The Valid Timing Chain straddles two clock domains having the same frequency but different phases, and performs the clock domain transition between them. The phase difference represents all the cumulative delays in the external path: board trace delays (in both directions), and delay from K/K# to CQ/CQ#. The clocking scheme described here can accommodate and compensate for approximately 1/2 clock cycle of variation in this delay. The input registers for the read data bus “Q” and signal "QVLD" [6] require some special clocking, and this need is handled by special hardware capability. The input bus registers have two clock inputs. The first, ECLK, is fed by the edge clock, to receive the data at the earliest time, since the edge clock net has less delay and skew to the I/O registers than the primary clock net. But if this data were to be sent directly to a register clocked by the primary clock, the receiving register’s input hold time could be violated. Therefore, the input register also takes a second input clock, SCLK, which is fed the primary clock. The register does not output the data until this clock’s edge, avoiding any hold time issues. This clock domain transfer mechanism is built-in to the LatticeSC input buffers, thus allowing operation at highest rates of speed. For the return read data bus “Q” and its accompanying valid signal QVLD, a DLL [3] is employed to dynamically generate a value that determines the proper delay to cause an effective 90° phase shift on CQ’s input buffer [5], so that it too is positioned in the center of the data it captures. This takes advantage of the fact that the DLL and the input buffers contain matching delay blocks, so that the delay selection value generated in the DLL when it generates a 90° shifted clock can also be used in the input buffer to cause the same phase shift. A 9-bit digital bus communicates this delay selection value from the DLL to the “CQ” input buffer. The read data is then typically transferred from the “CQ” clock domain to the internal clock domain with the assistance of a synchronous FIFO. For the “Q” data bus and signal QVLD, the delay elements [4] are also used in an “Edge Clock Injection Match” mode. This compensates for the edge clock routing of the “CQ” input, thereby providing an optimal read data edge. Manual changes to the input delay can also be made to each individual “Q” input pin to compensate for differences in board trace delays. 12 LatticeSC QDRII/II+ SRAM Memory Interface User’s Guide Lattice Semiconductor Figure 8. QDRII/II+ SRAM Memory Interface Clock and Data Paths [1] PLL refclk k_clk CLKOP CLKOS CLKI (primary clock) (primary clock) k_clk 0 shifted 90 CLKFB CLKINTFB ODDRXA K Q DA DB 0 1 ODDRXA K# Q DA DB 1 0 ODDRXA A[19:0], WN, RN Q FIFO DA DB Q D1 D2 D3 D4 D5 D Q D Q D Q D Q D Q ODDRXA D[35:0] Q Q [3] CLKI Control Fixed delay Q[35:0] QVLD D Write Data Data valid CLKOP CLKOS UPDT DCNTL[8:0] IDDRX1A [6] D (edge clock) (primary clock) QA QB ECLK SCLK o 90 phase delay CQ Logic Net Reg DA DB DLL [5] Address [2] RN [4] DA DB (primary clock) (CQ#) 13 FIFO D Q Read Data LatticeSC QDRII/II+ SRAM Memory Interface User’s Guide Lattice Semiconductor Performance Timing Diagrams Unit timing is shown in Figures 3 to 6. Refer to the read and write operation sections above for details and descriptions. Timing Budgets The timing on the signals interconnecting the LatticeSC Memory Interface and the QDRII/II+ SRAM can be difficult to manage, since there are several unusual components to the timing budget. These signals fall into two groups: the signals going to the SRAM under clocking by clock K/K#, and the return signals from the SRAM under clocking by CQ. All LatticeSC timing specifications apply to all speed grades over process/voltage/temperature variation. The values are subject to change. See the LatticeSC Family Data Sheet for the latest values. The first group of signals is depicted in Figure 5-1, and the timing budget is calculated in Table 3. There are seven components to the timing budget: [1] clock period, [2] duty cycle distortion (DCD), [3] K/K# clock jitter, [4] PLL phase error skew, [5] package skew, [6] board trace skew, and [7] SRAM receive register input setup/hold. Parameters marked [a] are LatticeSC device parameters and are valid over all the device’s specified operating conditions and speed grades, [b] is a characteristic of the PC board, and [c] are QDRII/II+ SRAM device parameters. Values given for [b] and [c] are typical and must be adjusted for the specific PCB and QDRII/II+ SRAM chosen. DCD and PLL phase error skew are both significant because they both tend to displace the clocks in the eye, and because of the fact that both edges of the clock are critical for capturing data. 14 LatticeSC QDRII/II+ SRAM Memory Interface User’s Guide Lattice Semiconductor Figure 5-1. Timing Path, DDR Write Note: The data from the register is always sent to the I/O buffer cycle early so that the data and clock paths are the same. FPGA QDRII/II+ SRAM [5] [4] [6] Register Register Clock Fanout [1, 2, 3] [7] DDR I/O Buffer K/K# [7] k_clk PLL DDR I/O Buffer k_clk shifted 90° [5] [4] [6] Table 3. Timing Budget – DDR Write Data Path (Preliminary) Parameter Units Value Long Path Short Path Description fclk MHz 200 Clock Frequency tclk psec 5000 Clock Period tclk-ddr [1] psec 2500 tdcd [2] psec ±125 tcj [3] tphase [4] tpkg [5] DDR Clock Period (1/2 * tclk) +125 -125 [a] Clock Duty Cycle Distortion (45%-55%) psec ±50 +50 -50 [a] Clock Jitter (0.02 UI) psec ±70 +70 -70 [a] PLL Phase Error Skew psec ±50 +50 -50 [a] Primary Clock Skew psec ±40 +40 -40 [a] LatticeSC Package Skew -80 tboard [6] psec ±80 +80 tsd [7] psec +450 +450 thd [7] psec -450 Total -450 +865 -865 [b] Board Trace Skew (K/K# vs. D) [c] QDRII/II+ SRAM Input Setup [c] QDRII/II+ SRAM Input Hold Sum of All Components Note: “Eye” size = (2500 ps) - (865 ps) - (865 ps) = 770 ps The second group of signals is depicted in Figure 6, and the timing budget is calculated in Table 4 (QDRII/II+ SRAM DLL enabled). There are nine components to the timing budget: [1] clock period, [2] QDRII/II+ SRAM duty cycle distortion (DCD), [3] QDRII/II+ SRAM Clock Output Skew, [4] package skew, [5] board trace skew, [6] edge clock skew, [7] LatticeSC receive register input setup/hold, [8] the SRAM output skew, and [9] the quantization error of the DLL. Item [9] represents the smallest step of delay change in the DLL’s internal programmable delay. Parameters marked [a] are LatticeSC device parameters and are valid over all the device’s specified operating conditions and speed grades, [b] is a characteristic of the PC board, and [c] are QDRII/II+ SRAM device parameters. Values given for [b] and [c] are typical and must be adjusted for the specific PCB and QDRII/II+ SRAM chosen. 15 LatticeSC QDRII/II+ SRAM Memory Interface User’s Guide Lattice Semiconductor Figure 6. Timing Path, DDR Read FPGA QDRII/II+ SRAM [6] [4] [5] [7] DDR I/O Buffer/Reg Register Register valid Register SDR in, DDR out [6] Da Db [1] CQ PLL I/O Buffer w/delay DLL [8] [2, 3] [4] Register SDR in, DDR out 1 0 [5] 9 Table 4. Timing Budget - DDR Read Data Path (Preliminary) Parameter fclk tclk Units Value MHz 200 Long Path Short Path Clock Frequency Description psec 5000 Clock Period [1] psec 2500 DDR Clock Period (1/2 * tclk) [2] psec ±125 +125 Tcqhqv [3] psec +350 +350 Tcqhqx [3] psec -350 tpkg [4] psec ±40 tboard [5] psec ±80 tclk-ddr tdcd -125 [b] Clock Duty Cycle Distortion of QDRII/II+ SRAM (45%-55%) [b] QDRII/II+ SRAM CQ High to Q Valid -350 [b] QDRII/II+ SRAM CQ High to Q Invalid +40 -40 [a] LatticeSC Package Skew +80 -80 [b] Board Trace Skew [6] psec +50 +50 [a] Edge Clock Skew tsd [7] psec +363 +363 [a] LatticeSC Input Setup tsd [7] psec +279 +279 [a] LatticeSC Input Hold tramout [8] psec ±40 -40 [c] QDRII/II+ SRAM Output Skew Tdqe [9] psec ±70 +70 -70 [a] +1118 -426 Total +40 DLL Delay Quantization Error Sum of All Components Note: “Eye” size = (2500 ps) - (1118 ps) - (426 ps) = 956 ps Usage Overview Module Manager and I/O Assistant The LatticeSC family is supported by ispLEVER®, a suite of tools that facilitate the implementation of complex designs such as those that employ a QDRII/II+ SRAM Memory Interface. Module Manager is a graphically driven utility that accepts user specifications for pre-optimized logic blocks, and generates all the detailed instructions for synthesizing and simulating that logic. The LatticeSC QDR-SRAM Memory Interface is one of the logic units that Module Manager can generate. I/O Assistant is another utility in ispLEVER, and it allows the designer to specify physical design restraints such as pin placement and I/O bank assignment, and to do so early in the design cycle, even before the detailed FPGA design is complete. This is necessary in complex designs where these allocations can have significant impact on the design flow. 16 LatticeSC QDRII/II+ SRAM Memory Interface User’s Guide Lattice Semiconductor Pinout Selection and Layout Constraint In order to achieve maximum operating speed, the QDRII/II+ protocol requires carefully designed I/O timing on the interface between the LatticeSC device and the QDRII/II+ SRAM. For this reason, Lattice drop-in modules will provide pre-optimized pinout assignments, as well as assignments for critical internal components such as PLLs and DLLs (a QDRII/II+ Memory Interface design typically requires one PLL and one DLL). Where possible, optional alternatives are provided so that the Memory Interface can co-exist with various combinations of other IP. One option, the more usual case, places the Memory Interface’s interface pins directly adjacent to the unit’s internal logic on the left or right side of the chip (see Figure 7). However, in case those pins are unavailable, a second option places the interface pins on the bottom of the chip. There is the third option, of course, of not utilizing preassigned pinouts, but then the timing is not pre-optimized, and will change with each rerouting of the design. Figure 7. QDRII/II+ SRAM Memory Interface Pinout Options LatticeSC FPGA QDR-II SRAM Memory Interface Right Side Option Left Side Option QDR-II SRAM Memory Interface To I/O Pins Via FPGA Fabric Bottom Side Option Bottom Side Option In order to achieve minimum skew between signals, all signals in a bus should be assigned pins within the same I/O bank. If this is not possible, use pins in two banks adjacent to the same corner of the FPGA. These two banks in the LatticeSC device have low skew when driven by a PLL or DLL in that corner. It is a good design practice to explicitly assign the input clocks, but allow the associated data signals to be assigned by the placement tool. Clocks frequently drive directly into specific internal elements such as a PLL or DLL. These elements typically have a dedicated corresponding I/O pin that, when used, provides optimal performance. The I/O Assistant application facilitates the process of clock/data pin assignment. Using I/O Assistant, the designer can easily assign resources (pins, PLLs, DLLs, etc.) with whatever specificity is appropriate (pin, I/O bank, etc.), even before the design is complete. All LatticeSC device pins that are tied to the QDRII/II+ SRAM’s “Q” bus must be accessible by a single edge clock, which is driven by the conditioned “CQ” clock input. The choice of pinout is often made or changed late in the design process, after design components have already been created in Module Manager. For this reason, the appropriate pinout is best selected by adding a precompiled set of preferences to the preference file, which allows the pinout to be defined or redefined at the latest possible point. When the Memory Interface is created by Module Manager, a “readme” file is also created. This file contains 17 LatticeSC QDRII/II+ SRAM Memory Interface User’s Guide Lattice Semiconductor all optional pinouts for the target device, and the designer can then copy and paste the appropriate pinout into the preference file. Board Level Considerations Trace Layout Guidelines Good board layout techniques are important on any high-speed design, but QDRII/II+ SRAM designs have some specific requirements as well. Here are some general guidelines: • All traces are 50 Ohm transmission lines. • Trace lengths on buses must match to each other and their associated clocks to within 0.5 inch. There are two groups of buses, and trace lengths must be matched to others in that group. The first group includes the Address Bus “A”, the Write Data Bus “D”, as well as signals “W#” and “R#” (clock = “K/K#”); the second group includes only the Read Data Bus “Q” (clock = CQ). • All traces are uncoupled; in particular, “K” and “KN” are not a differential pair. • Power rails, such as “VTT”, must be planes, not traces. • Care must be taken to keep reference voltages, such as the QDRII/II+ SRAM’s “VREF”, noise-free. This involves robust, wide-bandwidth decoupling, and isolation of quiet, noise-sensitive signals and references from noise sources. • The physical distance between the LatticeSC device and the QDRII/II+ SRAM needs to be minimized, since trace delays and skews will limit overall speed, as previously discussed. • Simultaneous Switching Outputs (SSO) are a real concern in QDRII/II+ SRAM designs, since there are usually can be no design constraints on how many signals can switch simultaneously. Thus, for example, for the signals from the LatticeSC device to the QDRII/II+ SRAM, it is possible for all 56 bits in the Address and Write Data Buses to switch simultaneously in the same direction. For this reason, be sure to provide robust, high-current power/ground planes, with plenty of high-frequency bypassing. For more details, refer to the LatticeSC application notes in Section 5. References • For more information on the QDRII/II+ standard: – QDR Consortium: www.qdrsram.com • For information on participating vendors: – Cypress Semiconductor Corp. – Samsung Semiconductor – NEC Corporation – IDT Incorporated – Renesas Technology Corp. www.cypress.com www.samsung.com www.nec.com www.idt.com www.renesas.com • For more information on the LatticeSC family: – LatticeSC Family Data Sheet – LatticeSC technical notes www.latticesemi.com Conclusion This user’s guide discusses an implementation of a QDRII/II+ Memory Interface, to assist designers in determining whether the QDRII/II+ SRAM is appropriate for their application, and if so, to provide understanding of its operation and details for design-in. 18 LatticeSC QDRII/II+ SRAM Memory Interface User’s Guide Lattice Semiconductor Technical Support Assistance Hotline: 1-800-LATTICE (North America) +1-503-268-8001 (Outside North America) e-mail: [email protected] Internet: www.latticesemi.com Revision History Date Version February 2006 01.0 November 2007 01.1 Change Summary Initial release. 19