LatticeECP3, LatticeECP2/M and LatticeXP2 7:1 LVDS Video Interface
April 2011
Reference Design RD1030

Introduction

Source synchronous interfaces consisting of multiple data bits and clocks have become a common method for moving image data within electronic systems. A prevalent standard is the 7:1 LVDS interface (employed in Channel Link, Flat Link, and Camera Link), which is used in many electronic products including consumer devices, industrial control, medical, and automotive telematics. In many of these applications, low-cost FPGAs are commonly used for image processing. In particular, LatticeXP2™, LatticeECP2™, LatticeECP2M™ and LatticeECP3™ are well-suited to support the 7:1 LVDS standard.

Note: Since the 7:1 LVDS interface is supported in LatticeECP3 “EA” devices, but not the earlier “E” devices, all references to LatticeECP3 in this document refer to the “EA” devices only.

This document describes the requirements for implementing a 7:1 LVDS interface and the advantages of using these FPGAs in such an interface. By extension, support for the 7:1 LVDS interface in these devices proves the feasibility of hardware implementation for all other LVDS source synchronous requirements as well.

This document discusses two designs. The first design is a simple loopback test that illustrates the use of the 7:1 transmitter and 7:1 receiver. The second design is an example that brings video data into the FPGA device through the 7:1 receiver, processes it and transmits it out via the 7:1 transmitter. Both designs are verified using the Lattice 7:1 LVDS Video Demo Kit.

7:1 LVDS Interface Requirement

The 7:1 LVDS interface is a source synchronous LVDS interface. Seven data bits are serialized for each cycle of the low-speed clock as shown in Figure 1. Typically, the interface consists of four (three data, one clock) or five (four data, one clock) LVDS pairs.
The four pairs translate to 21 parallel data bits and five pairs translate to 28 parallel data bits. Note that there is a 2-bit offset between the clock rising edge and the word boundary. Each word is 7 bits long.

Figure 1. Basic Timing of the 7:1 LVDS Interface
[Timing diagram: Clock and the DataA, DataB, DataC and DataD lanes across the previous, current and next cycles. After the clock rising edge each lane carries D1 and D0 of word (n-1), then D6, D5, D4, D3, D2, D1, D0 of word (n), then word (n+1) starting with D6.]

© 2011 Lattice Semiconductor Corp. All Lattice trademarks, registered trademarks, patents, and disclaimers are as listed at www.latticesemi.com/legal. All other brand or product names are trademarks or registered trademarks of their respective holders. The specifications and information herein are subject to change without notice.
www.latticesemi.com rd1030_01.5

Each channel includes a serial LVDS data pair along with a source synchronous LVDS clock pair. The receiver receives this serial LVDS data, deserializes it and aligns it to the original word boundary to generate seven parallel LVTTL data bits. The 7:1 transmitter serializes the seven LVTTL parallel data bits to a single LVDS data bit and transmits this serial data channel along with an LVDS clock.
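To make the word framing described above concrete, the following minimal Python sketch models one data lane and the clock lane as lists of bits. It assumes, as read from Figure 1, that each 7-bit word is sent D6 first, and it uses the "1100011" clock word that appears in the transmitter block diagram (Figure 7). The function names are illustrative only and are not taken from the reference design files.

```python
# Behavioral sketch of 7:1 word framing on one lane (assumptions noted above).
CLOCK_WORD = [1, 1, 0, 0, 0, 1, 1]  # one word period of the LVDS clock lane


def serialize(words):
    """Flatten 7-bit words into a bit list, D6 first (assumed from Figure 1)."""
    bits = []
    for w in words:
        bits.extend((w >> i) & 1 for i in range(6, -1, -1))
    return bits


def clock_bits(n_words):
    """Clock lane sampled at the bit rate: the clock word repeated per data word."""
    return CLOCK_WORD * n_words


def find_word_boundary(clk):
    """Index of the first word boundary: two bit times after a clock rising edge."""
    for i in range(1, len(clk)):
        if clk[i - 1] == 0 and clk[i] == 1:
            return i + 2
    raise ValueError("no rising edge found in clock samples")


def deserialize(bits, start):
    """Rebuild complete 7-bit words from a bit list, beginning at index start."""
    words = []
    for i in range(start, len(bits) - 6, 7):
        w = 0
        for b in bits[i:i + 7]:
            w = (w << 1) | b
        words.append(w)
    return words
```

A receiver that starts sampling at an arbitrary bit position can therefore locate the word boundary from the clock lane alone: for example, dropping the first three bit times of both lanes still recovers every complete word that follows.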
Figure 2 shows the 7:1 receiver receiving four LVDS data channels. When deserialized, it generates 28-bit wide parallel data. Similarly, the 7:1 transmitter serializes 28-bit parallel data to generate four LVDS data channels.

Figure 2. 7:1 Receiver and Transmitter Function
[Block diagram: the 7:1 receiver converts 4-bit LVDS data and an LVDS clock into 28-bit parallel LVTTL data and an LVTTL clock; the 7:1 transmitter converts 28-bit parallel LVTTL data and an LVTTL clock into 4-bit LVDS data and an LVDS clock.]

The requirements for an FPGA-based solution to the Channel Link and Flat Link style interfaces consist of four key components: high-speed LVDS buffers, a PLL for generating the de-serialization clock, input data capture and gearing, and data formatting.

The data and clock are received by or transmitted from the FPGA in LVDS format, with the data at relatively high speed. The exact speed depends on the resolution, frame rate and color depth used by the display. For example, 800x600 to 1024x768 displays require LVDS clocks of 40 MHz to 78.5 MHz for 60 Hz to 75 Hz refresh rates. At seven bits per clock cycle, this translates to LVDS data rates of 280 Mbps to 549 Mbps. Higher resolution displays, such as 1280x1024 at 60 Hz, require data to be transmitted with 108 MHz LVDS clocks. For this system, data is transmitted at 756 Mbps.

Clock Generation

In a LatticeECP3, LatticeECP2/M or LatticeXP2 implementation, the input capture circuitry uses Double Data Rate (DDR) registers with data captured on both the rising and falling edges of the clock. When operating as a receiver, the low-speed clock that is provided with the data must be multiplied by 3.5 in order to capture the data on both clock edges. If the input capture circuitry operates on only one edge of the clock, a multiplication factor of seven must be used. As an alternative, seven phase-shifted versions of the low-speed clock can be generated and used to capture the input data with seven different registers.
However, the challenges of clock generation and distribution discourage this approach for an FPGA implementation. The clock must have relatively low jitter since its jitter must be accounted for in the overall timing budget. Similarly, the skew of the clock distribution network used to provide this clock to input or output registers must be accounted for in any timing analysis.

In order to transmit high-speed data, a transmitter must multiply the clock used to transfer low-speed parallel data into the interface by 3.5. Again, the jitter of the clock and the skew of its distribution are important as they impact the timing budget for the interface.

Figure 3 shows the PLL clock generation and how the R, G, B bits, Vsync, Hsync, and DE of a pixel on line 2 of a video frame get assigned to the four LVDS data pairs. The data bits are sampled on both rising and falling edges of the eclk clock.

Figure 3. Timings of Video Signals and the 7:1 LVDS Channel Link Interface
[Timing diagram: Vsync, Hsync and DE framing lines 001 to 480 of R/G/B data; within a line, the pixel clock with R[7:0], G[7:0] and B[7:0] for pixels 001 to 640; the PLL outputs CLKOP (RCLK_in x 3.5, not used), CLKOS (RCLK_in x 3.5 + phase shift, eclk) and CLKOK (RCLK_in x 1.75, sclk); and the serial lanes RA_in to RD_in shown against RCLK_in and eclk. For each pixel, the four 7-bit lane words are (G0, R5, R4, R3, R2, R1, R0), (B1, B0, G5, G4, G3, G2, G1), (DE, Vsync, Hsync, B5, B4, B3, B2) and (Rsrv, B7, B6, G7, G6, R7, R6).]

Data Capture

The registers that follow the LVDS input buffer must accurately capture the data. Tight control of the clock and data relationship is important to capture the incoming high-speed data stream. It is also necessary to gear, or reduce, the speed of the data before it is passed on to the FPGA fabric. Take LatticeECP2/M and LatticeECP3 as examples. LatticeECP2/M FPGAs specify the operation of individual circuit elements to around 350 MHz. For LatticeECP3, the figure is around 470 MHz. A practical operating frequency with a reasonable amount of logic is 225 MHz for LatticeECP2/M and 350 MHz for LatticeECP3. Therefore, the greater the gearing that can be done in the I/O structure, the lower the likelihood that the FPGA fabric will be the limit on overall performance. A similar discussion is applicable to the transmit path.

Data Formatting

The final step is to take the data from the I/O cells and format it into the original 7-bit width clocked by the low-speed clock. This logic can easily be constructed within the FPGA fabric.

LatticeECP3, LatticeECP2/M and LatticeXP2 7:1 LVDS Interface

The LatticeECP3, LatticeECP2/M and LatticeXP2 architectures provide an ideal solution for this interface. This section describes implementation of the 7:1 receiver and 7:1 transmitter using the LatticeECP3, LatticeECP2/M and LatticeXP2 device I/O structures.

7:1 Receiver

Figure 4 shows the block diagram of the receive side of an intra-system display interface within a LatticeECP3, LatticeECP2/M or LatticeXP2 device. The receiver receives four LVDS data channels (seven bits each) and one LVDS clock.

Figure 4.
7:1 Receiver Side Block Diagram
[Block diagram: RA_in, RB_in, RC_in, RD_in and RCLK_in each pass through an IDDRX2B* I/O DDR register (2x gearing) and a 4:7 deserializer; an auto alignment module with output select produces the 7-bit RA_out, RB_out, RC_out, RD_out and RCK_out. A sysCLOCK PLL, fed by CLKI, drives ECLK from CLKOS (CLKI x3.5, phase-shifted) and SCLK from CLKOK ((CLKI x3.5)/2, 0 degree phase). The diagram also shows RST, the PLL RESET, DPHASE and LOCK signals, and the reset_sync generation logic.]

The data and clock enter the LatticeECP3, LatticeECP2/M or LatticeXP2 device through the LVDS buffers in the Programmable I/O Cell (PIC) block. When the 2x gearing function is used, these buffers operate at up to 420 MHz (i.e., 840 Mbps) for LatticeXP2 and LatticeECP2/M devices, or 500 MHz (i.e., 1.0 Gbps) for LatticeECP3 devices, supporting most high resolutions and display refresh rates.

The LVDS data is fed to the I/O logic DDR register and the source synchronous LVDS clock is fed into a PLL. The PLL is used to multiply the clock by 3.5 and create a phase shift which is normally 90 degrees. This phase shift places the clock in the middle of the data valid window. This faster phase-shifted clock is then distributed via a low skew edge clock net to double data rate input capture registers. The PLL is also used to generate a slower clock that is half the frequency of the faster edge clock. This clock is fed to the second stage of DDR registers in the I/O logic block using the primary clock tree.

The I/O DDR register with the 2x gearing function (IDDRX2B) is used for the design with LatticeXP2 and LatticeECP2/M FPGAs. A 2x DDR element provides four FPGA side data bits for every I/O side data bit at half the clock rate. The gearing allows muxing/demuxing of the I/O data clocked with the high-speed Edge clock (ECLK) to the slower speed FPGA clock rate (SCLK).
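The clock ratios and the 2x input gearing described above can be checked with a short behavioral sketch in Python. The Q(0) to Q(3) ordering is an assumption read from the example in Figure 6 (rising-edge samples on Q(0) and Q(1), falling-edge samples on Q(2) and Q(3)). The function names are hypothetical and the model ignores the register latency of the real IDDRX2B primitive.

```python
# Behavioral sketch only; names and Q ordering are assumptions, not the
# IDDRX2B port definition.

def clock_plan(rclk_mhz):
    """Return (eclk_mhz, sclk_mhz, lane_rate_mbps) for a given pixel clock."""
    eclk = rclk_mhz * 3.5   # edge clock: DDR capture, two bits per cycle
    sclk = eclk / 2         # system clock after 2x gearing
    lane_rate = eclk * 2    # equals 7 bits per pixel clock cycle
    return eclk, sclk, lane_rate


def iddrx2_gear(ddr_bits):
    """Group a DDR stream [P0, N0, P1, N1, ...] into (Q0, Q1, Q2, Q3) per SCLK.

    P = bit captured on an ECLK rising edge, N = bit captured on a falling edge.
    """
    out = []
    for i in range(0, len(ddr_bits) - 3, 4):
        p0, n0, p1, n1 = ddr_bits[i:i + 4]
        out.append((p0, p1, n0, n1))
    return out


def restore_order(q_words):
    """Put geared samples back into arrival order for the 4:7 deserializer."""
    bits = []
    for q0, q1, q2, q3 in q_words:
        bits.extend([q0, q2, q1, q3])
    return bits
```

For a 108 MHz pixel clock this gives a 378 MHz edge clock, a 189 MHz system clock and 756 Mbps per lane. Four bits per SCLK cycle at 1.75 times the pixel clock rate is exactly seven bits per pixel clock, which is why the deserializer that follows can be a simple 4:7 repacker.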
In the end, all the data is received at the rising edge of SCLK. Figure 5 is a detailed diagram of the IDDRX2B.

Figure 5. IDDRX2B Detailed Block Diagram
[Block diagram: DATA from the true PIO of an LVDS sysIO pair enters the DDR registers clocked by ECLK, followed by synchronization registers and clock transfer registers in both the true and the complementary PIO, producing Q(0) to Q(3) on SCLK.]

The IDDRX2B module inputs the DDR data at both edges of the Edge clock and generates four streams of data, all at the rising edge of the slower FPGA clock. The shaded portion of Figure 5 shows the I/O registers used to do the 2x gearing mode. The I/O registers of the complementary PIO are used in DDR gearing mode. For more information on the DDR registers and the various modes, refer to TN1105, LatticeECP2/M High-Speed I/O Interface, TN1138, LatticeXP2 High-Speed I/O Interface and TN1180, LatticeECP3 High-Speed I/O Interface. Figure 6 shows an example of input gearing using the IDDRX2B block.

Figure 6. Example of Input Gearing Using IDDRX2B
[Timing diagram: DDR data P0, N0, P1, N1, P2, N2, ... at the I/O is captured with ECLK (shifted 90 degrees) and passed through internal stages A to G; on SCLK, Q(0) carries P0, P2, ...; Q(1) carries P1, P3, ...; Q(2) carries N0, N2, ...; Q(3) carries N1, N3, ...]

The four bits of parallel data are then converted to 7-bit data at the correct speed in the 4:7 deserializer module. The deserializer stores the 4-bit output of the IDDRX2B in a 28-bit wide shift register. The incoming LVDS clock is then used as a framing signal to detect the start and end of the 7-bit data frame. The ordering of the 7-bit data can be modified in the design files if required.

7:1 Transmitter

Figure 7 shows the transmit side of the 7:1 implementation.
In this case, the LatticeECP3, LatticeECP2/M or LatticeXP2 device receives four channels of 7-bit parallel data and the slow clock. All 28 bits of parallel data are aligned to the slow clock received. The slow input clock is fed to the PLL. The PLL is used to multiply the clock 3.5 times (ECLK). The PLL is also used to generate a clock at half the frequency of the 3.5x clock. This clock is represented by the SCLK in Figure 7. The DDR register in the I/O Logic module is used to generate the serial data output. LatticeXP2 and LatticeECP2/M FPGAs support output DDR register modules with 2x gearing similar to the input DDR registers. The advantage of using the output DDR registers with 2x gearing (ODDRX2B) over 1x gearing is that the FPGA core can run at half the speed of the clock used by the output DDR registers.

Each 7-bit parallel word needs to be converted to 4-bit groups before it is sent to the corresponding output DDR register. The 7:4 Serializer module is used to do this. The 7-bit parallel words are stored in a 28-bit wide buffer, and four bits of data aligned to the SCLK clock are sent to the ODDRX2B module.

Figure 7. 7:1 Transmitter Side Block Diagram
[Block diagram: TA_in, TB_in, TC_in and TD_in (7 bits each) each pass through a 7:4 serializer and an ODDRX2B* I/O DDR register (2x gearing) to TA_out, TB_out, TC_out and TD_out; the constant "1100011" passes through a fifth 7:4 serializer and ODDRX2B* to generate TCLK_out. A sysCLOCK PLL, fed by CLK_Tx on CLKI and reset by RST_Tx, drives ECLK from CLKOP (CLKI x3.5, 0 degrees) and SCLK from CLKOK ((CLKI x3.5)/2, 0 degrees), and provides LOCK.]

The ODDRX2B also receives the faster ECLK from the PLL and performs the gearing function. The gearing allows multiplexing of the I/O data clocked with the slow-speed FPGA Clock (SCLK) to the high-speed Edge clock. All of the data is transmitted at the rising edge of the ECLK. Figure 8 shows a detailed diagram of the ODDRX2B.
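The rate conversion performed by the 7:4 serializer (and, in reverse, by the receiver's 4:7 deserializer) amounts to repacking 28 bits: four 7-bit words in, seven 4-bit groups out. The Python sketch below illustrates this under two assumptions that are not confirmed by the text: words are taken D6 first, and the earliest bit of each 4-bit group sits in the most significant position. The function names are hypothetical.

```python
# Behavioral sketch of 7:4 and 4:7 repacking; bit ordering is an assumption.

def gear_7_to_4(words):
    """Repack 7-bit words into 4-bit groups (28 bits = 4 words = 7 groups)."""
    if len(words) % 4 != 0:
        raise ValueError("need a multiple of four 7-bit words (28 bits)")
    bits = [(w >> i) & 1 for w in words for i in range(6, -1, -1)]
    return [
        (bits[i] << 3) | (bits[i + 1] << 2) | (bits[i + 2] << 1) | bits[i + 3]
        for i in range(0, len(bits), 4)
    ]


def gear_4_to_7(nibbles):
    """Inverse repacking: seven 4-bit groups back into four 7-bit words."""
    if len(nibbles) % 7 != 0:
        raise ValueError("need a multiple of seven 4-bit groups (28 bits)")
    bits = [(n >> i) & 1 for n in nibbles for i in range(3, -1, -1)]
    words = []
    for i in range(0, len(bits), 7):
        w = 0
        for b in bits[i:i + 7]:
            w = (w << 1) | b
        words.append(w)
    return words
```

Feeding the clock word "1100011" through `gear_7_to_4` four times yields the groups 0xC, 0x7, 0x8, 0xF, 0x1, 0xE, 0x3, which shows why the buffer has to be 28 bits wide: the 4-bit pattern only repeats after four words.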
Figure 8. ODDRX2B Gearing Function
[Block diagram: inputs DA0, DA1, DB0 and DB1 pass through registers A0, B0, A1 and B1 and latches C0 and C1 clocked by SCLK, then through registers D0 and E0 and latch F0 clocked by ECLK, to output Q; the registers are split between the true PIO and the complementary PIO of an LVDS sysI/O pair.]

The ODDRX2B module takes its four input bits from the FPGA fabric at both edges of the slow FPGA Clock (SCLK) and generates a single stream of data at both edges of the faster Edge Clock (ECLK). The shaded portion of Figure 8 shows the I/O registers used in 2x gearing mode. The I/O registers of the complementary PIO are used in DDR 2x Gearing mode. For more information on the DDR registers and various modes, refer to TN1105, LatticeECP2/M High-Speed I/O Interface, TN1138, LatticeXP2 High-Speed I/O Interface and TN1180, LatticeECP3 High-Speed I/O Interface. Figure 9 shows an example of output gearing using the ODDRX2B block.

Figure 9. Example of Output Gearing Using ODDRX2B
[Timing diagram: on SCLK, DA0 carries d0, d4, d8, ...; DA1 carries d2, d6, d10, ...; DB0 carries d1, d5, d9, ...; DB1 carries d3, d7, d11, ...; after registers A0, B0, A1 and B1, latches C0 and C1, Mux0 and Mux1, and the ECLK-clocked registers D0 and E0 and latch F0, output Q carries d0, d1, d2, d3, ... in order.]

The serialized data output of the ODDRX2B is sent out of the device using high-speed LVDS buffers.

The LatticeECP3 I/O structure is different from that of the LatticeXP2 and LatticeECP2/M devices.
The DQSBUF primitive (e.g., DQSBUFE for 2x gearing) has to be used to generate the strobe logic and delay used in the output DDR modules to correctly mux the DDR data. This DQSBUF primitive is required for the outputs of generic DDR implementations such as 7:1 LVDS. Since all I/Os in a DQS group share the same DQSBUF, it is recommended to group as many I/Os of the same 7:1 LVDS bus as possible within one DQS group. Since each DQS group includes only a limited number of True LVDS pins (normally two I/Os per DQS group), if True LVDS I/Os are used for 7:1 LVDS outputs, more DQSBUF primitives will be required to span the True LVDS outputs to adjacent DQS groups. Note that this does not apply to the design using emulated LVDS outputs. Also, the I/O DDR primitives in LatticeECP3 devices (IDDRX2D1/ODDRX2D) are different from those in LatticeXP2 and LatticeECP2/M devices (IDDRX2B/ODDRX2B) in port definitions. For more detailed information, see TN1180, LatticeECP3 High-Speed I/O Interface.

Design Example 1: Loopback Test

The loopback test design included with this document uses the Lattice FPGA to implement both the 7:1 transmitter and receiver. Figure 10 shows the design implementation. For more detailed information about the 7:1 transmitter and receiver, refer to Figures 4 and 7.

28-bit transmit data is generated in the FPGA logic using counter values. This data is then serialized and transmitted as four bits of LVDS data using the 7:1 transmitter logic. The 4-bit LVDS data is then looped back into the LatticeECP3, LatticeECP2/M or LatticeXP2 device receiver side and deserialized using the 7:1 receiver logic. This deserialized data is then fed to the data compare logic module which compares the deserialized receiver data to the original counter values transmitted. The error count is increased at every mismatch detected between the two data values.
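The compare-and-count scheme described above can be summarized in a few lines of Python. This is a simplified sketch: the `channel` argument is a hypothetical stand-in for the whole transmit, loopback and receive path, and the model compares each word with itself immediately, whereas real compare logic must also account for the latency between transmitted and received data.

```python
# Simplified loopback check; `channel` and the function name are illustrative.

def loopback_error_count(n_words, channel=lambda w: w, width=28):
    """Send counter values through `channel` and count mismatched words."""
    mask = (1 << width) - 1
    errors = 0
    for i in range(n_words):
        tx = i & mask                # counter-based 28-bit transmit data
        rx = channel(tx) & mask      # data after serialize, loopback, deserialize
        if rx != tx:
            errors += 1              # one count per mismatching word
    return errors
```

With the default ideal channel the count stays at zero; a channel that corrupts bit 0 of every tenth word, for example `lambda w: w ^ 1 if w % 10 == 0 else w`, produces ten errors over 100 words.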
The 7:1 transmitter and receiver logic is explained in detail in the sections above.

Figure 10. Loopback Test Block Diagram
[Block diagram: a transmit data generator drives 28 bits into the 7:1 transmitter (input CLK_Tx; outputs TDATA_out, 4 bits, and TCLK_out); the 7:1 receiver (inputs RDATA_in, 4 bits, and RCLK_in; output RCLK_out) returns 28 bits to the data compare and error count logic, which outputs Error_Count.]

Loopback Test Implementation Results

The loopback design was tested using the LatticeECP2 Advanced Evaluation Board, the LatticeXP2 Advanced Evaluation Board and the LatticeECP3 Video Protocol Board. Both the transmit and receive sides of the Lattice FPGA ran successfully at a 108 MHz transmit and receive pixel clock for LatticeECP3, LatticeECP2/M and LatticeXP2. For LatticeECP3, the design can run up to 135 MHz. Table 1 shows the resources utilized by the design.

Table 1. Loopback Test Design Performance and Resource Utilization

Device Family       Language  Speed Grade  Utilization (LUTs)  fMAX (MHz)  I/Os  Slices  Registers  sysMEM EBRs  sysDSP Blocks
LatticeECP3 (1)     VHDL      -7           832 (1%)            >108        36    771     910        0 (0%)       0 (0%)
LatticeECP3 (1)     Verilog   -7           819 (1%)            >108        36    766     916        0 (0%)       0 (0%)
LatticeECP2/M (2)   VHDL      -6           858 (2%)            >108        36    794     914        0 (0%)       0 (0%)
LatticeECP2/M (2)   Verilog   -6           834 (2%)            >108        36    778     916        0 (0%)       0 (0%)
LatticeXP2 (3)      VHDL      -6           839 (5%)            >108        36    785     916        0 (0%)       0 (0%)
LatticeXP2 (3)      Verilog   -6           825 (5%)            >108        36    774     915        0 (0%)       0 (0%)

1. Performance and utilization characteristics are generated using LFE3-95EA-7FN1156C with Lattice Diamond™ 1.2 design software. When using this design in a different device, density, speed, or grade, performance and utilization may vary.
2. Performance and utilization characteristics are generated using LFE2-50E-6F672C with Lattice Diamond 1.2 design software. When using this design in a different device, density, speed, or grade, performance and utilization may vary.
3. Performance and utilization characteristics are generated using LFXP2-17E-6F484C with Lattice Diamond 1.2 design software.
When using this design in a different device, density, speed, or grade, performance and utilization may vary.

In this design, the Lattice FPGA functions as both the transmitter and receiver.

Design Example 2: Demonstration of 7:1 LVDS Interface with Video Processing Functions

In order to verify the operation of the 7:1 LVDS interfaces within the Lattice FPGA, Lattice has developed the test system shown in Figures 11 and 12. The test system on the LatticeECP3 Video Protocol Board is the same as the one on the LatticeXP2 Advanced Evaluation Board. Detailed information regarding the test system, including Boards #1, #2, and #3 and the LatticeECP2 Advanced Evaluation Board, is included in TN1134, Lattice 7:1 LVDS Video Demo Kit User's Guide.

This system takes video data supplied in DVI format from a source such as a PC or a DVD player and converts it to the 7:1 LVDS source synchronous format using a National Semiconductor Channel Link Transmitter device. This image data is fed to the Lattice FPGA where the 7:1 Receiver module is used to deserialize the data. This data is then converted back into serial data using the 7:1 Transmitter module within the Lattice FPGA device. It is then transmitted using a source synchronous 7:1 LVDS interface to a National Semiconductor Channel Link Receiver device and ultimately to a display.

Figure 11.
7:1 Interface Test System on LatticeECP2 Advanced Evaluation Board
[Block diagram: a video source (desktop PC, DVD player or ATSC tuner) connects by DVI cable to Board #1 (or #4), which carries a TMDS receiver (TI TFP401A) and a DS90CR287MTD; an MDR-26 Channel-Link cable carries the LVDS signals to Board #3, which connects through a 60-pin connection to the LatticeECP2 Advanced Evaluation Board. Inside the LatticeECP2-50 device: LVDS 7:1 Rx deserializer, R/G/B gain controls, RGB to YCbCr converter, contrast/brightness/hue/saturation adjustments, YCbCr to RGB converter, OSD, and LVDS 7:1 Tx serializer, with video adjustments set by on-board switches. A second MDR-26 Channel-Link cable leads to Board #2, which carries a DS90CR288AMTD and a TMDS driver (TI TFP410) driving an LCD display over a DVI cable. The legend distinguishes TMDS, LVCMOS/LVTTL and LVDS signals.]

Figure 12. 7:1 Interface Test System on LatticeXP2 Advanced Evaluation Board
[Block diagram: the same signal chain as Figure 11, with Board #3 attached through a 60-pin connection to the LatticeXP2 Advanced Evaluation Board and the processing blocks implemented in a LatticeXP2-17 device.]

Figures 11 and 12 show a simplified block diagram of the design inside the FPGA device. Other than the receiver and transmitter modules, the center logic block can be any customized video processing design.
For demonstration purposes, the designs shown in Figures 11 and 12 were created to include the following features.

• R-gain, G-gain, B-gain controls
• Contrast, Brightness, Hue, Saturation controls
• On-Screen-Display controlled by LatticeMico8 microprocessor
• On-Screen-Display opacity control

Figure 13. Video Processing Design Example
[Block diagram: the 7:1 LVDS Receiver (LVDS_7_to_1_RX; inputs RA_in, RB_in, RC_in, RD_in, RCLK_in and reset_sync) outputs the 7-bit rx_a, rx_b, rx_c and rx_d; Rx Signal Mapping produces r_R, r_G and r_B (8 bits each) and r_Vsync, r_Hsync and r_DE; RGB_adj (three gain controls plus a delay for the sync signals) produces rgb_R, rgb_G, rgb_B, rgb_Vsync, rgb_Hsync and rgb_DE; CBHS_adj (contrast/brightness/hue/saturation adjustments plus delay) produces the cbhs_ signals; OSD (On-Screen-Display controlled by the Mico8 microprocessor, plus delay) produces t_R, t_G, t_B, t_Vsync, t_Hsync and t_DE; Tx Signal Mapping produces the 7-bit tx_a, tx_b, tx_c and tx_d for the 7:1 LVDS Transmitter (LVDS_7_to_1_TX; outputs TA_out, TB_out, TC_out, TD_out and TCLK_out). Adjustment signals generation logic, driven from the DIP switch and the pushbutton switch, exchanges RGBO and CBHS adjustment inputs and outputs with the Mico8.]

The block diagram of this design example is shown in Figure 13. The design includes five sub-modules: Receiver, RGB_adj, CBHS_adj, OSD and Transmitter.

On the LatticeECP2 Advanced Evaluation Board, the 8-position DIP switch SW5 is used for adjusting the R, G, B gains, Contrast, Brightness, Hue, Saturation, and OSD opacity. When the specific controls are selected, the push-button SW4 (i.e., Control Switch) needs to be toggled to activate the adjustment. SW5 is also used for enabling and disabling the OSD and the Auto-Demo feature. The functions of SW5 pins are listed in Table 2. On the LatticeXP2 and LatticeECP3 evaluation boards, the corresponding switches and their functions are also listed in Table 2.
Table 2. Switch for Video Color Adjustments, Demo and OSD Controls

LatticeECP2 Board  LatticeXP2 Board  LatticeECP3 Board
SW5 Pin            SW8 Pin           SW1/SW2 Pin        OFF                                                            ON
SW5-8              SW8-8             SW2-4              R-gain or Contrast deselected                                  R-gain or Contrast selected
SW5-7              SW8-7             SW2-3              G-gain or Brightness deselected                                G-gain or Brightness selected
SW5-6              SW8-6             SW2-2              B-gain or Hue deselected                                       B-gain or Hue selected
SW5-5              SW8-5             SW2-1              Opacity or Saturation deselected                               Opacity or Saturation selected
SW5-4              SW8-4             SW1-4              OSD enabled                                                    OSD disabled
SW5-3              SW8-3             SW1-3              Auto-Demo enabled                                              Auto-Demo disabled
SW5-2              SW8-2             SW1-2              Select RGBO group                                              Select CBHS group
SW5-1              SW8-1             SW1-1              Decrease the selected controls when Control Switch is toggled  Increase the selected controls when Control Switch is toggled

Note: Control Switch is SW4 for the LatticeECP2 Advanced Evaluation Board, SW5 for the LatticeXP2 Advanced Evaluation Board, or SW6 for the LatticeECP3 Video Protocol Evaluation Board.

Video Processing Design Implementation Results

The video processing demo design was verified using the Lattice 7:1 LVDS Demo Kit that comes with the LatticeECP3, LatticeECP2 and LatticeXP2 evaluation boards and other daughter boards. The video source was running at 108 MHz at 1280x1024 image resolution. Table 3 shows the resources utilized by the design.

Table 3. Video Processing Design Performance and Resource Utilization

Device              Language  Speed Grade  Utilization (LUTs)  fMAX (MHz)  I/Os  Slices  Registers  sysMEM EBRs  sysDSP Blocks
LatticeECP3 (1)     VHDL      -7           1848 (2%)           >108        35    1420    1347       10 (4%)      4.125 (12%)
LatticeECP3 (1)     Verilog   -7           1852 (2%)           >108        35    1415    1315       10 (4%)      4.125 (12%)
LatticeECP2/M (2)   VHDL      -6           1804 (4%)           >108        35    1428    1293       8 (38%)      4.125 (23%)
LatticeECP2/M (2)   Verilog   -6           1857 (4%)           >108        35    1433    1253       10 (48%)     4.125 (22%)
LatticeXP2 (3)      VHDL      -6           1803 (11%)          >108        35    1492    1292       8 (53%)      4.125 (82%)
LatticeXP2 (3)      Verilog   -6           1848 (11%)          >108        35    1482    1254       10 (67%)     4.125 (82%)

1.
Performance and utilization characteristics are generated using LFE3-95EA-7FN1156C with Lattice Diamond™ 1.2 design software. When using this design in a different device, density, speed, or grade, performance and utilization may vary.
2. Performance and utilization characteristics are generated using LFE2-50E-6F672C with Lattice Diamond 1.2 design software. When using this design in a different device, density, speed, or grade, performance and utilization may vary.
3. Performance and utilization characteristics are generated using LFXP2-17E-6F484C with Lattice Diamond 1.2 design software. When using this design in a different device, density, speed, or grade, performance and utilization may vary.

Module RGB_adj

With the 9x9 multipliers implemented using the sysDSP blocks, the RGB_adj module multiplies the 8-bit R, G, B color data by the R-, G-, B-gain. These gains are real numbers with values between 0 and 1. Nine data bits represent the real number, with bit 8 representing the integer part and the remaining bits representing the fractional part. For the fractional part, bit 7 represents 2^-1 (i.e., 0.5), bit 6 represents 2^-2 (i.e., 0.25), bit 5 represents 2^-3 (i.e., 0.125), and so on. For example, the 9-bit data “011000000” represents the real value 0.5 + 0.25 = 0.75; the 9-bit data “100000000” represents the real value 1.0. A similar method of representing non-integer real values is used in many modules of the design. The number of integer bits and fractional bits may be changed to represent real numbers in a different range.

Module CBHS_adj

For adjusting contrast, brightness, hue and saturation of the video image, the pixel data in the RGB color space needs to be converted to the YCbCr color space. Figure 14 shows the block diagram of the CBHS_adj module.
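The fixed-point gain format introduced for RGB_adj, which the text notes is reused in other modules of the design, can be illustrated with a short Python sketch. It assumes one integer bit and eight fractional bits as described, limits the gain to the stated 0 to 1 range, and truncates the product; whether the actual design truncates or rounds is not stated in this document. The function names are hypothetical.

```python
# Sketch of the 9-bit (1 integer bit, 8 fractional bits) gain format.
GAIN_FRAC_BITS = 8


def gain_to_real(gain9):
    """Interpret a 9-bit gain code as a real number between 0 and 1."""
    if not 0 <= gain9 <= (1 << GAIN_FRAC_BITS):
        raise ValueError("gain code outside the 0 to 1.0 range")
    return gain9 / (1 << GAIN_FRAC_BITS)


def apply_gain(pixel8, gain9):
    """Multiply an 8-bit color value by the gain, as a 9x9 multiplier would."""
    product = pixel8 * gain9            # fits in 17 bits
    return product >> GAIN_FRAC_BITS    # drop the fractional bits (assumed truncation)
```

The two codes quoted in the text behave as expected: `gain_to_real(0b011000000)` is 0.75 and `gain_to_real(0b100000000)` is 1.0, so a color value of 200 scaled by the first code becomes 150, and any value scaled by the second code is unchanged.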
After the adjustment, the pixel data in the YCbCr color space will be converted back to the RGB color space. There are offsets in the YCbCr color space. The offsets of Y, Cb and Cr are 16, 128 and 128 respectively. When performing the contrast, brightness, hue and saturation adjustments, these offsets need to be removed. Therefore, the color space converters CSC1 and CSC2 convert the pixel data between the RGB and the YCbCr color spaces without adding the Y, Cb and Cr offsets.

Figure 14. Contrast, Brightness, Hue and Saturation Adjustments
[Block diagram of the CBHS module: CSC1 converts R_input(7:0), G_input(7:0) and B_input(7:0) to Y-16, Cb-128 and Cr-128; between CSC1 and CSC2 the Contrast(7:0) (0 ~ 1.992), Brightness(5:0) (-32 ~ +31), Hue(7:0) (-30 ~ +30 degrees, using Sin and Cos) and Saturation(7:0) (0 ~ 1.992) adjustments are applied; CSC2 converts the result to R_output(7:0), G_output(7:0) and B_output(7:0). Vsync_input, Hsync_input and DE_input pass through a D-flip-flop delay to Vsync_output, Hsync_output and DE_output.]

The equations used in the CSC1 and CSC2 converters are:

CSC1:
\[
\begin{bmatrix} Y - 16 \\ Cb - 128 \\ Cr - 128 \end{bmatrix}
=
\begin{bmatrix}
0.2567890625 & 0.50412890625 & 0.09790625 \\
-0.14822265625 & -0.2909921875 & 0.43921484375 \\
0.43921484375 & -0.3677890625 & -0.07142578125
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\]

CSC2:
\[
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
=
\begin{bmatrix}
1.1643828125 & 0 & 1.59602734375 \\
1.1643828125 & -0.39176171875 & -0.81296875 \\
1.1643828125 & 2.01723046875 & 0
\end{bmatrix}
\begin{bmatrix} Y - 16 \\ Cb - 128 \\ Cr - 128 \end{bmatrix}
\]

The Sine and Cosine functions are required for the hue adjustment. A lookup table ROM is used for implementing these two functions. A Tcl script is designed to create the memory file used for the Sine/Cosine ROM contents initialization.

Module OSD

The contents of the On-Screen-Display are controlled by the LatticeMico8 microcontroller. Figure 15 shows the block diagram of the OSD module.
The LatticeMico8™ microprocessor, a free 8-bit microcontroller soft core optimized for Lattice FPGAs, updates the dual-port RAM contents in the OSD_main sub-module to reflect the current RGB gain, Contrast, Brightness, Hue, Saturation and OSD Opacity values. It also controls these adjustment values when the Auto-Demo mode is enabled.

Figure 15. On-Screen-Display Module
[Block diagram: the OSD_main sub-module takes R_input(7:0), G_input(7:0), B_input(7:0), Vsync_input, Hsync_input and DE_input and drives R_output(7:0), G_output(7:0), B_output(7:0), Vsync_output, Hsync_output and DE_output. The LatticeMico8 µP core, with its external scratch pad memory, connects to OSD_main through the OSD RAM access signals (used by LatticeMico8 for changing the OSD contents and text color), the OSD location controls (lin_delta, col_delta), Max_line, Max_column, Opaque and OSD_Disable. Its input and output ports also carry the R, G, B, O, C, B, H, S adjustment inputs and outputs, the adjustment select, DIP_switch, phase and seven_seg (7-segment LED decode). LatticeMico8 takes over the adjustment controls when it is in the Auto-Demo mode.]

The block diagram of the OSD_main sub-module is shown in Figure 16. There are two dual-port RAMs that hold the character codes and the character colors displayed on the OSD. The OSD contents change whenever the RAM contents are updated by the LatticeMico8 microcontroller. The character patterns are stored in the Character Generator ROM.

Figure 16.
OSD_main Sub-module
[Block diagram: the 24-bit R_input/G_input/B_input bus and the Vsync_input, Hsync_input and DE_input signals pass through chains of D-type flip-flop delays (DFF) to R_output, G_output, B_output, Vsync_output, Hsync_output and DE_output. The diagram contains the OSD Text Code RAM (RAM_DP 2048x9) and the OSD Text Color RAM (RAM_DP 2048x9), each with its access ports, the Character Generator ROM (ROM 1024x8), the Opacity Adjust block and the Line/Column Tracking Logic. Signals shown include lin(7:3), lin(2:0), col(8:3), col(2:0), Text_Code, Text_Active, text_enable_D1, R_color, G_color, B_color, Bgd_clr, lin_delta, col_delta, Opaque, OSD_Disable, Max_line and Max_column. Figure notes: the Bgd_clr signal is duplicated 24 times; each 3-bit color is concatenated with 5 ones to make an 8-bit color value, giving a 24-bit bus; the Line/Column Tracking Logic generates Max_line and Max_column for showing the current screen resolution on the OSD.]

The Line/Column Tracking Logic block controls the position of the OSD and also tracks the current resolution of the video image. The LatticeMico8 reads back Max_line and Max_column to display the resolution on the OSD.

The OSD opacity control is also implemented in the OSD_main sub-module. The Opaque value is a real number between 0 and 1, with 1 as the default value. When the value is reduced, the OSD becomes semi-transparent. Figure 17 shows the block diagram of the OSD opacity control.

Figure 17. OSD Opacity Control
[Block diagram: a first set of 8-bit inputs (R1_input, G1_input, B1_input), a second set of 8-bit inputs (labeled R2_input and G2_input), the Opaque value, the constant “10000000”, an adder and a subtractor, producing the 8-bit R_output, G_output and B_output.]

Figure 18.
Semitransparent OSD Showing On the Display Screen

Summary
The LatticeECP3, LatticeECP2/M and LatticeXP2 FPGA families are well-suited for high-speed LVDS video applications. In addition to capturing the video data at high speeds, these families are capable of processing video data using the on-chip sysDSP block and the Embedded Block RAM.

Technical Support Assistance
Hotline: 1-800-LATTICE (North America)
         +1-503-268-8001 (Outside North America)
e-mail: [email protected]
Internet: www.latticesemi.com

Revision History
Date            Version  Change Summary
September 2006  01.0     Initial release.
March 2007      01.1     Updated 7:1 Receive Side Block Diagram.
                         Updated IDDRX2B Detailed Block Diagram.
                         Updated Example of Input Gearing Using IDDRX2B diagram.
                         Updated Transmitter Side Block Diagram.
                         Updated ODDRX2B Gearing Function diagram.
                         Updated Example of Output Gearing Using ODDRX2B diagram.
May 2007        01.2     Added the video demo design example that includes the color adjustments, OSD and auto-demo features.
                         Updated the Performance and Resource Utilization tables to include numbers for both VHDL and Verilog version.
                         Updated figures.
September 2007  01.3     Added LatticeXP2 family support and removed the timing analysis section.
September 2009  01.4     Added LatticeECP3 family “E” series support.
April 2011      01.5     Added LatticeECP3 family “EA” series support.