LatticeECP3, LatticeECP2/M, LatticeXP2 7:1 LVDS Video Interface Reference Design

April 2011
Reference Design RD1030
Introduction
Source synchronous interfaces consisting of multiple data bits and clocks have become a common method for
moving image data within electronic systems. A prevalent standard is the 7:1 LVDS interface (employed in Channel
Link, Flat Link, and Camera Link), which is used in many electronic products including
consumer devices, industrial control, medical, and automotive telematics. Many of these applications use low-cost FPGAs for image processing. In particular, LatticeXP2™,
LatticeECP2™, LatticeECP2M™ and LatticeECP3™ are well-suited to support the 7:1 LVDS standard.
Note: Since the 7:1 LVDS interface is supported in LatticeECP3 “EA” devices, but not the earlier “E” devices, all references to LatticeECP3 in this document refer to the “EA” devices only.
This document describes the requirements for implementing a 7:1 LVDS interface and the advantages of using
these FPGAs in such an interface. By extension, support for the 7:1 LVDS interface in these devices proves the feasibility of hardware implementation for all other LVDS source synchronous requirements as well.
Two designs are included in the discussion of this document. The first design is a simple loopback test that illustrates the use of the 7:1 transmitter and 7:1 receiver. The second design is an example that brings video data into
the FPGA device through the 7:1 receiver, processes it and transmits it out via the 7:1 transmitter. Both designs are
verified using the Lattice 7:1 LVDS Video Demo Kit.
7:1 LVDS Interface Requirement
The 7:1 LVDS interface is a source synchronous LVDS interface. Seven data bits are serialized for each cycle of
the low-speed clock as shown in Figure 1. Typically, the interface consists of four (three data, one clock) or five (four
data, one clock) LVDS pairs. The four pairs translate to 21 parallel data bits and five pairs translate to 28 parallel
data bits. Note that there is a 2-bit offset between the clock rising edge and the word boundary. Each word is 7 bits
long.
Figure 1. Basic Timing of the 7:1 LVDS Interface
[Timing diagram: Clock and the serial data pairs DataA, DataB, DataC and DataD, shown across the previous, current and next cycles. Within one clock period each pair carries bits D1 and D0 of word (n-1) followed by bits D6 through D2 of word (n); the following period carries D1 and D0 of word (n) followed by D6 through D2 of word (n+1).]
© 2011 Lattice Semiconductor Corp. All Lattice trademarks, registered trademarks, patents, and disclaimers are as listed at www.latticesemi.com/legal. All other brand
or product names are trademarks or registered trademarks of their respective holders. The specifications and information herein are subject to change without notice.
www.latticesemi.com
rd1030_01.5
LatticeECP3, LatticeECP2/M and LatticeXP2
7:1 LVDS Video Interface
Lattice Semiconductor
Each channel includes a serial LVDS data pair along with a source synchronous LVDS clock pair. The receiver
receives this serial LVDS data, deserializes it and aligns it to the original word boundary to generate seven parallel
LVTTL data bits. The 7:1 transmitter serializes the seven LVTTL parallel data bits to a single LVDS data bit and
transmits this serial data channel along with a LVDS clock.
Figure 2 shows the 7:1 receiver receiving four LVDS data channels. When deserialized, it generates 28-bit wide
parallel data. Similarly, the 7:1 transmitter serializes 28-bit parallel data to generate four LVDS data channels.
Figure 2. 7:1 Receiver and Transmitter Function
[Block diagram: the 7:1 Receiver takes 4-bit LVDS data and an LVDS clock and outputs 28-bit parallel LVTTL data with an LVTTL clock. The 7:1 Transmitter takes 28-bit parallel LVTTL data with an LVTTL clock and outputs 4-bit LVDS data and an LVDS clock.]
The requirements for an FPGA-based solution to the Channel Link and Flat Link style interfaces consist of four key
components: high-speed LVDS buffers, a PLL for generating the de-serialization clock, input data capture and
gearing, and data formatting.
The data and clock are received by or transmitted from the FPGA in LVDS format, with the data at relatively high
speed. The exact speed depends on the resolution, frame rate and color depth used by the display. For example,
800x600 to 1024x768 displays at 60 Hz to 75 Hz refresh rates require LVDS clocks from 40 MHz to 78.5 MHz.
Because seven bits are sent per clock cycle, this translates to LVDS data rates of 280 Mbps to 549 Mbps. Higher resolution displays, such as
1280x1024 at 60 Hz, require 108 MHz LVDS clocks. For this system, data is transmitted at
756 Mbps.
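The data rates quoted above follow directly from the 7:1 ratio. The short Python sketch below reproduces the arithmetic; the function name is illustrative only and is not part of the reference design.

```python
BITS_PER_CLOCK = 7  # seven bits are serialized per LVDS clock cycle


def lvds_data_rate_mbps(lvds_clock_mhz: float) -> float:
    """Serial data rate in Mbps on one LVDS data pair for a given LVDS clock."""
    return lvds_clock_mhz * BITS_PER_CLOCK


# Clock frequencies taken from the examples in the text.
for clock_mhz in (40.0, 78.5, 108.0):
    rate = lvds_data_rate_mbps(clock_mhz)
    print(f"{clock_mhz:6.1f} MHz clock -> {rate:6.1f} Mbps")
```

Note that 78.5 MHz gives 549.5 Mbps, which the text quotes as 549 Mbps.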
Clock Generation
In a LatticeECP3, LatticeECP2/M or LatticeXP2 implementation, the input capture circuitry uses Double Data Rate
(DDR) registers with data captured on both the rising and falling edges of the clock. When operating as a receiver,
the low-speed clock that is provided with the data must be multiplied by 3.5 in order to capture the seven bits of each word on
both clock edges. If the input capture circuitry operates on only one edge of the clock, a multiplication factor of
seven must be used. As an alternative, seven phase-shifted versions of the low-speed clock can be generated and
used to capture the input data with seven different registers. However, the challenges of clock generation and distribution discourage this approach for an FPGA implementation. The clock must have relatively low jitter since its jitter
must be accounted for in the overall timing budget. Similarly, the skew of the clock distribution network used to provide this clock to input or output registers must be accounted for in any timing analysis.
In order to transmit high-speed data, a transmitter must multiply the clock used to transfer low-speed parallel data
into the interface by 3.5. Again, the jitter of the clock and the skew of its distribution are important as they impact
the timing budget for the interface. Figure 3 shows the PLL clock generation and how the R, G, B bits, Vsync,
Hsync, and DE of a pixel on line 2 of a video frame get assigned to the four LVDS data pairs. The data bits are sampled on both rising and falling edges of the eclk clock.
Figure 3. Timings of Video Signals and the 7:1 LVDS Channel Link Interface
[Timing diagram in three parts. Frame level: Vsync, Hsync, DE and R/G/B for lines 001 through 480. Line level: Hsync, DE, Pixel Clock and R[7:0], G[7:0], B[7:0] for pixels 001 through 640. Interface level: from RCLK_in the PLL derives CLKOP (RCLK_in x 3.5, not used), CLKOS (RCLK_in x 3.5 plus phase shift, used as eclk) and CLKOK (RCLK_in x 1.75, used as sclk), and eclk samples the four data pairs. For each pixel word, in the order drawn, RA_in carries Rsrv, B7, B6, G7, G6, R7, R6; RB_in carries DE, Vsync, Hsync, B5, B4, B3, B2; RC_in carries B1, B0, G5, G4, G3, G2, G1; RD_in carries G0, R5, R4, R3, R2, R1, R0.]
Data Capture
The registers that follow the LVDS input buffer must accurately capture the data. A tight control of the clock and
data relationship is important to capture the incoming high-speed data stream. It is also necessary to gear, or
reduce, the speed of the data before it is passed on to the FPGA fabric. Let us take LatticeECP2/M and
LatticeECP3 as examples. LatticeECP2/M FPGAs specify the operation of individual circuit elements up to around
350 MHz; for LatticeECP3, the figure is around 470 MHz. A practical operating frequency with a reasonable amount of
logic is 225 MHz for LatticeECP2/M and 350 MHz for LatticeECP3. Therefore, the greater the gearing that can be
done in the I/O structure, the lower the likelihood that the FPGA fabric will limit overall performance. A
similar discussion applies to the transmit path.
Data Formatting
The final step is to take the data from the I/O cells and format it into the original 7-bit width clocked by the low-speed clock. This logic can easily be constructed within the FPGA fabric.
LatticeECP3, LatticeECP2/M and LatticeXP2 7:1 LVDS Interface
The LatticeECP3, LatticeECP2/M and LatticeXP2 architectures provide an ideal solution for this interface. This
section describes implementation of the 7:1 receiver and 7:1 transmitter using the LatticeECP3, LatticeECP2/M
and LatticeXP2 device I/O structures.
7:1 Receiver
Figure 4 shows the block diagram of the receive side of an intra-system display interface within a LatticeECP3,
LatticeECP2/M or LatticeXP2 device. The receiver receives four LVDS data channels (seven bits each) and one
LVDS clock.
Figure 4. 7:1 Receiver Side Block Diagram
[Block diagram: RA_in, RB_in, RC_in and RD_in each pass through an IDDRX2B I/O DDR register (2x gearing) and a 4:7 Deserializer, then through the Auto Alignment Module and Output Select to the 7-bit outputs RA_out, RB_out, RC_out and RD_out. RCLK_in passes through its own IDDRX2B and 4:7 Deserializer to produce RCK_out. The sysCLOCK PLL takes RCLK_in on CLKI and generates CLKOS (CLKI x3.5, phase-shifted) as ECLK and CLKOK ((CLKI x3.5)/2, 0 degrees phase) as SCLK; its LOCK output feeds the reset_sync generation logic, which drives reset_sync and reset_sync_out. Other signals shown: RST, RESET, DPHASE.]
The data and clock enter the LatticeECP3, LatticeECP2/M or LatticeXP2 device through the LVDS buffers in the
Programmable I/O Cell (PIC) block. When the 2x gearing function is used, these buffers operate at up to 420 MHz
(i.e., 840 Mbps) for LatticeXP2 and LatticeECP2/M devices, or 500 MHz (i.e., 1.0 Gbps) for LatticeECP3 devices,
supporting most high-resolution displays and refresh rates.
The LVDS data is fed to the I/O logic DDR register and the source synchronous LVDS clock is fed into a PLL. The
PLL is used to multiply the clock by 3.5 and create a phase shift which is normally 90 degrees. This phase shift
allows for placing the clock in the middle of the data valid window. This faster phase-shifted clock is then distributed
via a low skew edge clock net to double data rate input capture registers. The PLL is also used to generate a slower
clock that is half the frequency of the faster edge clock. This clock is fed to the second stage of DDR registers in the
I/O logic block using the primary clock tree.
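As a worked example of the clock relationships just described, the Python sketch below computes the edge clock, the half-rate system clock and the time delay that a 90-degree phase shift represents. It is an arithmetic illustration only; the function and key names are hypothetical, and real PLL settings come from the Lattice design tools.

```python
def rx_clock_plan(pixel_clock_mhz: float, phase_deg: float = 90.0) -> dict:
    """Derive receiver clock figures from the low-speed LVDS clock."""
    eclk_mhz = pixel_clock_mhz * 3.5       # fast edge clock (PLL multiplies by 3.5)
    sclk_mhz = eclk_mhz / 2                # slower clock at half the edge clock rate
    eclk_period_ns = 1000.0 / eclk_mhz
    bit_time_ns = eclk_period_ns / 2       # DDR capture: one bit per ECLK edge
    shift_ns = eclk_period_ns * phase_deg / 360.0
    return {
        "eclk_mhz": eclk_mhz,
        "sclk_mhz": sclk_mhz,
        "bit_time_ns": bit_time_ns,
        "phase_shift_ns": shift_ns,
    }


plan = rx_clock_plan(108.0)
for name, value in plan.items():
    print(f"{name:15s} {value:8.3f}")
```

With a 90-degree shift the delay equals half of one bit time, which is why the sampling edge lands in the middle of the data valid window under ideal conditions.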
The I/O DDR register with the 2x gearing function (IDDRX2B) is used for the design with LatticeXP2 and
LatticeECP2/M FPGAs. A 2x DDR element provides four FPGA side data bits for every I/O side data bit at half the
clock rate. The gearing allows muxing/demuxing of the I/O data clocked with the high-speed Edge clock (ECLK) to
the slower speed FPGA clock rate (SCLK). In the end, all the data is received at the rising edge of SCLK. Figure 5
is a detailed diagram of the IDDRX2B.
Figure 5. IDDRX2B Detailed Block Diagram
[Block diagram: inside the IDDRX2B, DATA enters the DDR registers clocked by ECLK, then passes through synchronization registers and clock transfer registers clocked by SCLK to the outputs Q(0), Q(1), Q(2) and Q(3). Internal nodes are labeled A through K. Registers of both the true PIO and the complementary PIO of the LVDS sysIO pair are used.]
The IDDRX2B module captures the DDR data on both edges of the edge clock and generates four streams of data, all
aligned to the rising edge of the slower FPGA clock. The shaded portion of Figure 5 shows the I/O registers used in
2x gearing mode. The I/O registers of the complementary PIO are used in DDR gearing mode. For more information
on the DDR registers and the various modes, refer to TN1105, LatticeECP2/M High-Speed I/O Interface,
TN1138, LatticeXP2 High-Speed I/O Interface and TN1180, LatticeECP3 High-Speed I/O Interface.
Figure 6 shows an example of input gearing using the IDDRX2B block.
Figure 6. Example of Input Gearing Using IDDRX2B
[Waveform diagram: CLK at I/O; DDR DATA at I/O arriving as P0, N0, P1, N1, P2, N2, P3, N3, P4, ...; ECLK shifted by 90 degrees; internal nodes A, B, C, D/E and F/G of the IDDRX2B; and SCLK. On successive SCLK cycles the outputs are Q(0) = P0, P2; Q(1) = P1, P3; Q(2) = N0, N2; Q(3) = N1, N3.]
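The 2x input gearing can be modeled in a few lines of Python. This is a behavioral sketch only, assuming the output ordering read from the Figure 6 waveform labels (Q(0) = P0, Q(1) = P1, Q(2) = N0, Q(3) = N1 in the first SCLK cycle); it does not model register timing, and the function names are hypothetical.

```python
def iddrx2_gear(ddr_samples):
    """Group a DDR sample stream [P0, N0, P1, N1, ...] into one
    (Q0, Q1, Q2, Q3) tuple per SCLK cycle."""
    usable = len(ddr_samples) - len(ddr_samples) % 4  # ignore a trailing partial group
    words = []
    for i in range(0, usable, 4):
        p_a, n_a, p_b, n_b = ddr_samples[i:i + 4]
        words.append((p_a, p_b, n_a, n_b))
    return words


def arrival_order(word):
    """Return the four samples of one SCLK word in the order they arrived."""
    q0, q1, q2, q3 = word
    return [q0, q2, q1, q3]


stream = ["P0", "N0", "P1", "N1", "P2", "N2", "P3", "N3"]
for word in iddrx2_gear(stream):
    print(word, "->", arrival_order(word))
```

Downstream logic such as a deserializer needs the arrival order, so a model like this makes the required reordering of the four outputs explicit.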
The four bits of parallel data are then converted to 7-bit data at the correct speed in the 4:7 deserializer module.
The deserializer stores the 4-bit output of the IDDRX2B in a 28-bit wide shift register. The incoming LVDS clock is
then used as a framing signal to detect the start and end of the 7-bit data frame. The ordering of the 7-bit data can
be modified in the design files if required.
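The framing idea can be illustrated with a small behavioral model. The sketch below is not the RTL from the design files: it assumes bits are handled as strings in arrival order, that the clock channel deserializes to the repeating pattern "1100011" per word (the pattern shown on the clock path in Figure 7), and that the data and clock channels are bit-aligned. All names are hypothetical.

```python
CLOCK_WORD = "1100011"  # assumed clock-channel pattern over one 7-bit word


def to_nibbles(bits: str):
    """Split a bit string into the 4-bit groups delivered per SCLK cycle."""
    return [bits[i:i + 4] for i in range(0, len(bits), 4)]


def find_word_offset(clock_bits: str) -> int:
    """Return the bit offset (0..6) at which 7-bit words start."""
    for offset in range(7):
        # Require two consecutive clock words to confirm the alignment.
        if clock_bits[offset:offset + 14] == CLOCK_WORD * 2:
            return offset
    raise ValueError("framing pattern not found in clock channel")


def deserialize_4_to_7(data_nibbles, clock_nibbles):
    """Rebuild 7-bit words from 4-bit groups, framed by the clock channel."""
    data_bits = "".join(data_nibbles)
    offset = find_word_offset("".join(clock_nibbles))
    n_words = (len(data_bits) - offset) // 7
    return [data_bits[offset + 7 * i: offset + 7 * (i + 1)] for i in range(n_words)]


# Example: the capture starts three bits before a word boundary.
words = ["1010101", "0011100", "1111000", "0000111"]
data_stream = "110" + "".join(words) + "0"
clock_stream = CLOCK_WORD[-3:] + CLOCK_WORD * 4 + CLOCK_WORD[0]
print(deserialize_4_to_7(to_nibbles(data_stream), to_nibbles(clock_stream)))
```

Because the seven rotations of "1100011" are all different, exactly one offset matches, which is what lets the clock channel act as an unambiguous framing signal.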
7:1 Transmitter
Figure 7 shows the transmit side of the 7:1 implementation. In this case, the LatticeECP3, LatticeECP2/M or
LatticeXP2 device receives four channels of 7-bit parallel data and the slow clock. All 28 bits of parallel data are
aligned to the slow clock received.
The slow input clock is fed to the PLL. The PLL is used to multiply the clock by 3.5 (ECLK). The PLL is also used
to generate a clock at half the frequency of the 3.5x clock. This clock is represented by SCLK in Figure 7. The
DDR register in the I/O logic module is used to generate the serial data output. LatticeXP2 and LatticeECP2/M
FPGAs support output DDR register modules with 2x gearing similar to the input DDR registers. The advantage of
using the output DDR registers with 2x gearing (ODDRX2B) over 1x gearing is that the FPGA core can run at
half the speed of the clock used by the output DDR registers.
Each 7-bit parallel word needs to be converted to 4-bit groups before it is sent to the corresponding
output DDR register. The 7:4 Serializer module is used to do this. The 7-bit parallel words are stored
in a 28-bit wide buffer, and four bits of data aligned to the SCLK clock are sent to the ODDRX2B module.
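The 7:4 conversion can be sketched as follows: four 7-bit words fill a 28-bit buffer, which is then read out as seven 4-bit groups. This is a behavioral illustration under the assumption that bits are kept in transmit order as strings; the function name is hypothetical and the actual RTL in the design files may organize the buffer differently.

```python
def serialize_7_to_4(words):
    """Pack 7-bit words (bit strings in transmit order) into 4-bit groups.

    Words are consumed four at a time, since 4 x 7 = 28 bits divide
    evenly into seven 4-bit groups.
    """
    if len(words) % 4:
        raise ValueError("supply words in multiples of four (28 bits)")
    if any(len(w) != 7 or set(w) - {"0", "1"} for w in words):
        raise ValueError("each word must be a 7-character string of 0s and 1s")
    bits = "".join(words)
    return [bits[i:i + 4] for i in range(0, len(bits), 4)]


# The clock channel uses the constant word "1100011" (see Figure 7).
print(serialize_7_to_4(["1100011"] * 4))
```

Four pixel-clock periods correspond to seven SCLK periods, which matches SCLK running at 1.75 times the pixel clock.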
Figure 7. 7:1 Transmitter Side Block Diagram
[Block diagram: the 7-bit inputs TA_in, TB_in, TC_in and TD_in each pass through a 7:4 Serializer and an ODDRX2B I/O DDR register (2x gearing) to TA_out, TB_out, TC_out and TD_out. A fifth 7:4 Serializer, fed with the constant pattern "1100011", drives an ODDRX2B to produce TCLK_out. The sysCLOCK PLL takes CLK_Tx on CLKI and generates CLKOP (CLKI x3.5, 0 degrees) as ECLK and CLKOK ((CLKI x3.5)/2, 0 degrees) as SCLK. Other signals shown: RST, RST_Tx, RESET, LOCK.]
The ODDRX2B also receives the faster ECLK from the PLL and performs the gearing function. The gearing allows
multiplexing of the I/O data clocked with the slow-speed FPGA Clock (SCLK) to the high-speed Edge clock. All of
the data is transmitted at the rising edge of the ECLK.
Figure 8 shows a detailed diagram of the ODDRX2B.
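Figure 8. ODDRX2B Gearing Function
[Block diagram: inside the ODDRX2B, inputs DA0, DA1, DB0 and DB1 are clocked by SCLK into registers A0, B0, A1 and B1 and latches C0 and C1, then transferred to ECLK-clocked registers D0 and E0 and latch F0 to produce the output Q. Registers of both the true PIO and the complementary PIO of the LVDS sysI/O pair are used.]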
The ODDRX2B module receives four bits of data from the FPGA fabric at both edges of the slow
FPGA clock (SCLK) and generates a single stream of data at both edges of the faster edge clock
(ECLK). The shaded portion of Figure 8 shows the I/O registers used in 2x gearing mode. The I/O registers of the
complementary PIO are used in DDR 2x gearing mode. For more information on the DDR registers and various
modes, refer to TN1105, LatticeECP2/M High-Speed I/O Interface, TN1138, LatticeXP2 High-Speed I/O Interface
and TN1180, LatticeECP3 High-Speed I/O Interface.
Figure 9 shows an example of output gearing using the ODDRX2B block.
Figure 9. Example of Output Gearing Using ODDRX2B
[Waveform diagram: on successive SCLK cycles the fabric presents DA0 = d0, d4, d8, d12, ...; DB0 = d1, d5, d9, d13, ...; DA1 = d2, d6, d10, d14, ...; DB1 = d3, d7, d11, d15, .... The data passes through registers A0, B0, A1 and B1, latches C0 and C1 and the Mux0/Mux1 stage (A carries the odd-numbered bits, B the even-numbered bits), then through the ECLK-clocked registers D0 and E0 and latch F0. The output Q carries d0, d1, d2, d3, d4, ... in order.]
The serialized data output of the ODDRX2B is sent out of the device using high-speed LVDS buffers.
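The output gearing order can be captured in a short behavioral model. The sketch below assumes the port-to-bit assignment read from the Figure 9 waveform labels (DA0, DB0, DA1, DB1 carry d0, d1, d2, d3 of each SCLK cycle); it ignores register timing, and the function names are hypothetical.

```python
def split_for_oddrx2(bits):
    """Split a serial sequence into per-SCLK tuples (DA0, DB0, DA1, DB1)."""
    if len(bits) % 4:
        raise ValueError("length must be a multiple of four")
    return [tuple(bits[i:i + 4]) for i in range(0, len(bits), 4)]


def oddrx2_gear(port_words):
    """Return the serial order in which the ODDRX2B model emits its inputs."""
    serial = []
    for da0, db0, da1, db1 in port_words:
        serial.extend([da0, db0, da1, db1])
    return serial


stream = [f"d{i}" for i in range(8)]
port_words = split_for_oddrx2(stream)
print(port_words)
print(oddrx2_gear(port_words))
```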
The LatticeECP3 I/O structure is different from that of the LatticeXP2 and LatticeECP2/M devices. The DQSBUF
primitive (e.g., DQSBUFE for 2x gearing) has to be used to generate the strobe logic and delay used in the output
DDR modules to correctly mux the DDR data. This DQSBUF primitive is required for the outputs of generic DDR
implementations such as 7:1 LVDS. Since all I/Os in a DQS group share the same DQSBUF, it is recommended to
group as many I/Os of the same 7:1 LVDS bus as possible within one DQS group. Since each DQS group includes
only a limited number of True LVDS pins (normally two I/Os per DQS group), if True LVDS I/Os are used for 7:1
LVDS outputs, more DQSBUF primitives will be required to span the True LVDS outputs to adjacent DQS groups.
Note that this does not apply to the design using emulated LVDS outputs. Also, the I/O DDR primitives in
LatticeECP3 devices (IDDRX2D1/ODDRX2D) differ in their port definitions from those in LatticeXP2 and LatticeECP2/M devices
(IDDRX2B/ODDRX2B). For more detailed information, see TN1180, LatticeECP3 High-Speed
I/O Interface.
Design Example 1: Loopback Test
The loopback test design included with this document uses the Lattice FPGA to implement both the 7:1 transmitter
and receiver. Figure 10 shows the design implementation. For more detailed information about the 7:1 transmitter
and receiver, refer to Figures 4 and 7.
28-bit transmit data is generated in the FPGA logic using counter values. This data is then serialized and transmitted as four bits of LVDS data using the 7:1 transmitter logic. The 4-bit LVDS data is then looped back into the
LatticeECP3, LatticeECP2/M or LatticeXP2 device receiver side and deserialized using the 7:1 receiver logic. This
deserialized data is then fed to the data compare logic module which compares the deserialized receiver data to
the original counter values transmitted. The error count is increased at every mismatch detected between the two
data values. The 7:1 transmitter and receiver logic is explained in detail in the sections above.
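The compare-and-count behavior described above can be sketched in software. This Python model is an illustration only: the `channel` argument is a hypothetical stand-in for the transmitter, LVDS loopback and receiver, and the names do not come from the design files.

```python
def run_loopback(n_words, channel=lambda word: word, width=28):
    """Send counter values through `channel` and count mismatches."""
    mask = (1 << width) - 1
    errors = 0
    for count in range(n_words):
        tx = count & mask           # 28-bit transmit data from a counter
        rx = channel(tx) & mask     # data returned by the receive path
        if rx != tx:
            errors += 1             # one error counted per mismatching word
    return errors


def flaky_channel(word):
    """Hypothetical faulty link: flips bit 0 of every tenth word."""
    return word ^ 1 if word % 10 == 0 else word


print("ideal link errors:", run_loopback(1000))
print("faulty link errors:", run_loopback(1000, flaky_channel))
```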
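Figure 10. Loopback Test Block Diagram
[Block diagram: the Transmit Data Generator, clocked by CLK_Tx, feeds 28-bit data to the 7:1 Transmitter, which drives the 4-bit TDATA_out and TCLK_out. These are looped back to RDATA_in (4 bits) and RCLK_in of the 7:1 Receiver, which outputs 28-bit data and RCLK_out to the Data Compare and Error Count Logic. That block produces Error_Count.]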
Loopback Test Implementation Results
The loopback design was tested using the LatticeECP2 Advanced Evaluation Board, the LatticeXP2 Advanced
Evaluation Board and the LatticeECP3 Video Protocol Board. Both the Lattice FPGA transmit and receive sides
were successfully run at 108 MHz transmit and receive pixel clock for LatticeECP3, LatticeECP2/M and
LatticeXP2. For LatticeECP3, it can run up to 135 MHz. Table 1 shows the resources utilized by the design.
Table 1. Loopback Test Design Performance and Resource Utilization
Device Family    Language  Speed Grade  Utilization (LUTs)  fMAX (MHz)  I/Os  Slices  Registers  sysMEM EBRs  sysDSP Blocks
LatticeECP3¹     VHDL      -7           832 (1%)            >108        36    771     910        0 (0%)       0 (0%)
LatticeECP3¹     Verilog   -7           819 (1%)            >108        36    766     916        0 (0%)       0 (0%)
LatticeECP2/M²   VHDL      -6           858 (2%)            >108        36    794     914        0 (0%)       0 (0%)
LatticeECP2/M²   Verilog   -6           834 (2%)            >108        36    778     916        0 (0%)       0 (0%)
LatticeXP2³      VHDL      -6           839 (5%)            >108        36    785     916        0 (0%)       0 (0%)
LatticeXP2³      Verilog   -6           825 (5%)            >108        36    774     915        0 (0%)       0 (0%)
1. Performance and utilization characteristics are generated using LFE3-95EA-7FN1156C with Lattice Diamond™ 1.2 design software. When
using this design in a different device, density, speed, or grade, performance and utilization may vary.
2. Performance and utilization characteristics are generated using LFE2-50E-6F672C with Lattice Diamond 1.2 design software. When using
this design in a different device, density, speed, or grade, performance and utilization may vary.
3. Performance and utilization characteristics are generated using LFXP2-17E-6F484C with Lattice Diamond 1.2 design software. When using
this design in a different device, density, speed, or grade, performance and utilization may vary.
In this design, the Lattice FPGA functions as both the transmitter and receiver.
Design Example 2: Demonstration of 7:1 LVDS Interface with Video Processing Functions
In order to verify the operation of the 7:1 LVDS interfaces within the Lattice FPGA, Lattice has developed the test
system shown in Figures 11 and 12. The test system on the LatticeECP3 Video Protocol Board is the same as the
one on the LatticeXP2 Advanced Evaluation Board. Detailed information regarding the test system, including
Boards #1, #2, and #3 and the LatticeECP2 Advanced Evaluation Board, is included in TN1134, Lattice 7:1 LVDS Video Demo Kit User's Guide. This system takes video data
supplied in DVI format from a source such as a PC or a DVD player and converts it to the 7:1 LVDS source synchronous format using a National Semiconductor Channel Link Transmitter device. This image data is fed to the Lattice
FPGA, where the 7:1 Receiver module is used to deserialize the data. After processing, the data is converted back into serial
data using the 7:1 Transmitter module within the Lattice FPGA device. It is then transmitted using a source synchronous 7:1 LVDS interface to a National Semiconductor Channel Link Receiver device and ultimately to a display.
Figure 11. 7:1 Interface Test System on LatticeECP2 Advanced Evaluation Board
[System diagram: a video source (desktop PC, DVD player or ATSC tuner) connects by DVI cable to a daughter board with a TMDS Receiver (TI TFP401A) and a DS90CR287MTD, which drives an MDR-26 Channel-Link cable through a 26-pin 3M MDR connector. The LatticeECP2 Advanced Evaluation Board connects to the daughter boards through 60-pin connections. Inside the LatticeECP2-50 device the data path is: LVDS 7:1 Rx Deserializer, R/G/B Gain Control, RGB to YCbCr Converter, Contrast/Brightness/Hue/Saturation Adjustments, YCbCr to RGB Converter, OSD, and LVDS 7:1 Tx Serializer, with the V, H, D and M control signals carried alongside and video adjustments set by on-board switches. The output travels over a second MDR-26 Channel-Link cable to a DS90CR288AMTD and a TMDS Driver (TI TFP410), then by DVI cable to an LCD display. Boards are labeled Board #1 (or #4), Board #2 and Board #3; the legend distinguishes TMDS, LVCMOS/LVTTL and LVDS signals.]
Figure 12. 7:1 Interface Test System on LatticeXP2 Advanced Evaluation Board
[System diagram: same signal chain as Figure 11, built around the LatticeXP2 Advanced Evaluation Board and a LatticeXP2-17 device. A video source (desktop PC, DVD player or ATSC tuner) connects by DVI cable to a TMDS Receiver (TI TFP401A) and DS90CR287MTD, then over an MDR-26 Channel-Link cable and 26-pin 3M MDR and VHDM connectors into the FPGA. Inside the FPGA: LVDS 7:1 Rx Deserializer, R/G/B Gain Control, RGB to YCbCr Converter, Contrast/Brightness/Hue/Saturation Adjustments, YCbCr to RGB Converter, OSD, and LVDS 7:1 Tx Serializer, with video adjustments set by on-board switches. The output travels over an MDR-26 Channel-Link cable to a DS90CR288AMTD and a TMDS Driver (TI TFP410), then by DVI cable to an LCD display. Boards are labeled Board #2 and Board #3, attached through 60-pin connections; the legend distinguishes TMDS, LVCMOS/LVTTL and LVDS signals.]
Figures 11 and 12 show a simplified block diagram of the design inside the FPGA device. Other than the receiver
and transmitter modules, the center logic block can be any customized video processing design. For demonstration
purposes, the designs shown in Figures 11 and 12 were created to include the following features.
• R-gain, G-gain, B-gain controls
• Contrast, Brightness, Hue, Saturation controls
• On-Screen-Display controlled by LatticeMico8 microprocessor
• On-Screen-Display opacity control
Figure 13. Video Processing Design Example
[Block diagram: RA_in, RB_in, RC_in, RD_in, RCLK_in and reset_sync enter the 7:1 LVDS Receiver (LVDS_7_to_1_RX), which outputs the 7-bit buses rx_a, rx_b, rx_c and rx_d. Rx Signal Mapping converts these to r_R, r_G, r_B (8 bits each) and r_Vsync, r_Hsync, r_DE. RGB_adj applies a gain control to each color and a delay to the sync signals, producing rgb_R, rgb_G, rgb_B, rgb_Vsync, rgb_Hsync and rgb_DE. CBHS_adj (Contrast/Brightness/Hue/Saturation Adjustments) produces cbhs_R, cbhs_G, cbhs_B, cbhs_Vsync, cbhs_Hsync and cbhs_DE. OSD (On-Screen-Display controlled by the Mico8 microprocessor) produces t_R, t_G, t_B, t_Vsync, t_Hsync and t_DE. Tx Signal Mapping converts these to tx_a, tx_b, tx_c and tx_d for the 7:1 LVDS Transmitter (LVDS_7_to_1_TX), which drives TA_out, TB_out, TC_out, TD_out and TCLK_out. The Adjustment Signals Generation Logic takes inputs from the DIP switch and the pushbutton switch and exchanges RGBO and CBHS adjustment inputs and outputs with the processing modules.]
The block diagram of this design example is shown in Figure 13. The design includes five sub-modules: Receiver,
RGB_adj, CBHS_adj, OSD and Transmitter. On the LatticeECP2 Advanced Evaluation Board, the 8-position DIP switch SW5 is used for adjusting the R, G, B gains, Contrast, Brightness, Hue, Saturation, and OSD opacity. Once
the specific controls are selected, the push-button SW4 (i.e., Control Switch) needs to be toggled to activate the
adjustment. SW5 is also used for enabling and disabling the OSD and the Auto-Demo feature. The functions of
the SW5 pins are listed in Table 2. The corresponding switches on the LatticeXP2 and LatticeECP3 evaluation boards
and their functions are also listed in Table 2.
Table 2. Switch for Video Color Adjustments, Demo and OSD Controls
SW5 Pin (LatticeECP2 Board)  SW8 Pin (LatticeXP2 Board)  SW1/SW2 Pin (LatticeECP3 Board)  OFF                                                            ON
SW5-8                        SW8-8                       SW2-4                            R-gain or Contrast deselected                                  R-gain or Contrast selected
SW5-7                        SW8-7                       SW2-3                            G-gain or Brightness deselected                                G-gain or Brightness selected
SW5-6                        SW8-6                       SW2-2                            B-gain or Hue deselected                                       B-gain or Hue selected
SW5-5                        SW8-5                       SW2-1                            Opacity or Saturation deselected                               Opacity or Saturation selected
SW5-4                        SW8-4                       SW1-4                            OSD enabled                                                    OSD disabled
SW5-3                        SW8-3                       SW1-3                            Auto-Demo enabled                                              Auto-Demo disabled
SW5-2                        SW8-2                       SW1-2                            Select RGBO group                                              Select CBHS group
SW5-1                        SW8-1                       SW1-1                            Decrease the selected controls when Control Switch is toggled  Increase the selected controls when Control Switch is toggled
Note: Control Switch is SW4 for the LatticeECP2 Advanced Evaluation Board, SW5 for the LatticeXP2 Advanced Evaluation Board, or SW6 for
the LatticeECP3 Video Protocol Evaluation Board.
Video Processing Design Implementation Results
The video processing demo design was verified using the Lattice 7:1 LVDS Demo Kit that comes with the
LatticeECP3, LatticeECP2 and LatticeXP2 evaluation boards and other daughter boards. The video source was
running at 108 MHz at 1280x1024 image resolution. Table 3 shows the resources utilized by the design.
Table 3. Video Processing Design Performance and Resource Utilization
Device           Language  Speed Grade  Utilization (LUTs)  fMAX (MHz)  I/Os  Slices  Registers  sysMEM EBRs  sysDSP Blocks
LatticeECP3¹     VHDL      -7           1848 (2%)           >108        35    1420    1347       10 (4%)      4.125 (12%)
LatticeECP3¹     Verilog   -7           1852 (2%)           >108        35    1415    1315       10 (4%)      4.125 (12%)
LatticeECP2/M²   VHDL      -6           1804 (4%)           >108        35    1428    1293       8 (38%)      4.125 (23%)
LatticeECP2/M²   Verilog   -6           1857 (4%)           >108        35    1433    1253       10 (48%)     4.125 (22%)
LatticeXP2³      VHDL      -6           1803 (11%)          >108        35    1492    1292       8 (53%)      4.125 (82%)
LatticeXP2³      Verilog   -6           1848 (11%)          >108        35    1482    1254       10 (67%)     4.125 (82%)
1. Performance and utilization characteristics are generated using LFE3-95EA-7FN1156C with Lattice Diamond™ 1.2 design software. When
using this design in a different device, density, speed, or grade, performance and utilization may vary.
2. Performance and utilization characteristics are generated using LFE2-50E-6F672C with Lattice Diamond 1.2 design software. When using
this design in a different device, density, speed, or grade, performance and utilization may vary.
3. Performance and utilization characteristics are generated using LFXP2-17E-6F484C with Lattice Diamond 1.2 design software. When using
this design in a different device, density, speed, or grade, performance and utilization may vary.
Module RGB_adj
With the 9x9 multipliers implemented using the sysDSP blocks, the RGB_adj module multiplies the 8-bit R, G, B
color data by the R-, G-, B-gains. These gains are real numbers with values between 0 and 1. Nine data bits represent the real number, with bit 8 representing the integer part and the remaining bits representing the fractional
part. For the fractional part, bit 7 represents 2^-1 (i.e., 0.5), bit 6 represents 2^-2 (i.e., 0.25), bit 5
represents 2^-3 (i.e., 0.125), and so on. For example, the 9-bit data “011000000” represents the real value
0.5 + 0.25 = 0.75; the 9-bit data “100000000” represents the real value 1.0. A similar method of representing non-integer real values is used in many modules of the design. The number of integer bits and fractional bits may be changed to represent real numbers in a different range.
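The 9-bit gain format (one integer bit, eight fractional bits) can be checked with a few lines of Python. This is a sketch of the arithmetic only; the function names are hypothetical, and truncating the product by dropping the fractional bits is an assumption, since the text does not say how the design rounds.

```python
FRACTION_BITS = 8  # bits 7..0 hold the fraction, bit 8 the integer part


def gain_to_real(gain_bits: str) -> float:
    """Interpret a 9-character bit string as an unsigned 1.8 fixed-point value."""
    if len(gain_bits) != 9 or set(gain_bits) - {"0", "1"}:
        raise ValueError("gain must be a 9-character string of 0s and 1s")
    return int(gain_bits, 2) / (1 << FRACTION_BITS)


def apply_gain(pixel: int, gain_bits: str) -> int:
    """Multiply an 8-bit color value by a 9-bit gain (integer arithmetic)."""
    product = pixel * int(gain_bits, 2)   # 8-bit x 9-bit product, up to 17 bits
    return product >> FRACTION_BITS       # assumed: drop the fractional bits


print(gain_to_real("011000000"))      # example from the text
print(apply_gain(200, "011000000"))   # 200 scaled by 0.75
```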
Module CBHS_adj
For adjusting contrast, brightness, hue and saturation of the video image, the pixel data in the RGB color space
needs to be converted to the YCbCr color space. Figure 14 shows the block diagram of the CBHS_adj module.
After the adjustment, the pixel data in the YCbCr color space will be converted back to the RGB color space. There
are offsets in the YCbCr color space. The offsets of Y, Cb and Cr are 16, 128 and 128 respectively. When performing the contrast, brightness, hue and saturation adjustments, these offsets need to be removed. Therefore, the
color space converters CSC1 and CSC2 convert the pixel data between the RGB and the YCbCr without adding
the Y, Cb and Cr offsets.
Figure 14. Contrast, Brightness, Hue and Saturation Adjustments
[Block diagram of the CBHS module: R_input(7:0), G_input(7:0) and B_input(7:0) enter CSC1, which outputs Y-16, Cb-128 and Cr-128. Contrast(7:0) (range 0 to 1.992) and Brightness(5:0) (range -32 to +31) act on Y-16. Hue(7:0) (range -30 to +30 degrees) drives the Hue Control block, which uses Sin and Cos terms on Cb-128 and Cr-128, and Saturation(7:0) (range 0 to 1.992) scales the chroma. CSC2 converts the result to R_output(7:0), G_output(7:0) and B_output(7:0). Vsync_input, Hsync_input and DE_input pass through a D-flip-flop delay to Vsync_output, Hsync_output and DE_output.]
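The equations used in the CSC1 and CSC2 converters are:

CSC1:
\[
\begin{pmatrix} Y - 16 \\ Cb - 128 \\ Cr - 128 \end{pmatrix}
=
\begin{pmatrix}
0.2567890625 & 0.50412890625 & 0.09790625 \\
-0.14822265625 & -0.2909921875 & 0.43921484375 \\
0.43921484375 & -0.3677890625 & -0.07142578125
\end{pmatrix}
\begin{pmatrix} R \\ G \\ B \end{pmatrix}
\]

CSC2:
\[
\begin{pmatrix} R \\ G \\ B \end{pmatrix}
=
\begin{pmatrix}
1.1643828125 & 0 & 1.59602734375 \\
1.1643828125 & -0.39176171875 & -0.81296875 \\
1.1643828125 & 2.01723046875 & 0
\end{pmatrix}
\begin{pmatrix} Y - 16 \\ Cb - 128 \\ Cr - 128 \end{pmatrix}
\]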
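The two conversions can be checked numerically. The Python sketch below uses the coefficient magnitudes listed above, with the negative signs in CSC1 placed as in the standard ITU-R BT.601 conversion (an assumption where the printed signs are not legible). It is a floating-point illustration with hypothetical function names, not the fixed-point implementation in the design.

```python
CSC1 = [  # RGB -> (Y-16, Cb-128, Cr-128)
    [0.2567890625, 0.50412890625, 0.09790625],
    [-0.14822265625, -0.2909921875, 0.43921484375],
    [0.43921484375, -0.3677890625, -0.07142578125],
]
CSC2 = [  # (Y-16, Cb-128, Cr-128) -> RGB
    [1.1643828125, 0.0, 1.59602734375],
    [1.1643828125, -0.39176171875, -0.81296875],
    [1.1643828125, 2.01723046875, 0.0],
]


def mat_vec(matrix, vector):
    """Multiply a 3x3 matrix by a 3-element vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rgb_to_ycbcr_no_offset(rgb):
    return mat_vec(CSC1, rgb)


def ycbcr_no_offset_to_rgb(ycc):
    return mat_vec(CSC2, ycc)


pixel = [200, 100, 50]
ycc = rgb_to_ycbcr_no_offset(pixel)
print("Y-16, Cb-128, Cr-128:", [round(v, 3) for v in ycc])
print("round trip RGB:", [round(v, 3) for v in ycbcr_no_offset_to_rgb(ycc)])
```

Applying CSC2 to the output of CSC1 returns the original pixel to within a small rounding error, which is a quick sanity check on the sign placement.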
The hue adjustment requires the Sine and Cosine functions, which are implemented with a lookup table ROM. A Tcl script creates the memory file used to initialize the Sine/Cosine ROM contents.
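The Tcl script itself is not reproduced in this document. The Python sketch below only illustrates how such a Sine/Cosine table might be generated and used. The linear mapping of the 8-bit Hue code onto the -30 ~ +30 degree range from Figure 14, the 7 fractional bits per entry, and the rotation formula (a common formulation of hue adjustment on Cb and Cr) are all assumptions, and every name in the code is hypothetical.

```python
import math

FRAC_BITS = 7     # assumed fractional precision of each table entry
ENTRIES = 256     # assumed: one entry per 8-bit Hue code
MAX_DEG = 30.0    # Hue range shown in Figure 14


def hue_code_to_degrees(code):
    """Map an 8-bit code linearly onto -30..+30 degrees (assumed mapping)."""
    return -MAX_DEG + 2 * MAX_DEG * code / (ENTRIES - 1)


def build_sin_cos_table():
    """Return a list of (sin, cos) pairs scaled by 2**FRAC_BITS."""
    table = []
    for code in range(ENTRIES):
        theta = math.radians(hue_code_to_degrees(code))
        table.append((round(math.sin(theta) * (1 << FRAC_BITS)),
                      round(math.cos(theta) * (1 << FRAC_BITS))))
    return table


def rotate_hue(cb, cr, code, table):
    """Rotate offset-free (Cb, Cr) by the angle selected by the Hue code."""
    s, c = table[code]
    new_cb = (cb * c + cr * s) / (1 << FRAC_BITS)
    new_cr = (cr * c - cb * s) / (1 << FRAC_BITS)
    return new_cb, new_cr


table = build_sin_cos_table()
print(table[0], table[255])           # entries at -30 and +30 degrees
print(rotate_hue(10, 20, 127, table)) # near 0 degrees: values pass through
```

A real ROM initialization file would write these integers in whatever memory-file format the tool flow expects. That format is not specified here.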
Module OSD
The contents of the On-Screen Display (OSD) are controlled by the LatticeMico8™ microcontroller, a free 8-bit soft core optimized for Lattice FPGAs. Figure 15 shows the block diagram of the OSD module. The LatticeMico8 updates the dual-port RAM contents in the OSD_main sub-module to reflect the current RGB gains, Contrast, Brightness, Hue, Saturation and OSD Opacity values. It also controls these adjustment values when the Auto-Demo mode is enabled.
Figure 15. On-Screen-Display Module
[Block diagram. The OSD_main block takes R_input(7:0), G_input(7:0), B_input(7:0), Vsync_input, Hsync_input and DE_input and drives R_output(7:0), G_output(7:0), B_output(7:0), Vsync_output, Hsync_output and DE_output. The LatticeMico8 µP Core connects to OSD_main through its input and output ports: the OSD location controls lin_delta and col_delta, the Max_line and Max_column readback values (11-bit buses), Opaque (8 bits), OSD_Disable (1 bit), and the OSD RAM access signals used by the LatticeMico8 for changing the OSD contents and text color. Other labeled connections: the R,G,B,O,C,B,H,S adjustment inputs, adjustment outputs and Adjustment Select (the LatticeMico8 takes over the adjustment controls when it is in the Auto-Demo mode), DIP_switch, phase, seven_seg (7 bits) through the 7-Seg LED Decode block, and the External Scratch Pad Memory for the LatticeMico8 µP.]
The block diagram of the OSD_main sub-module is shown in Figure 16. There are two dual-port RAMs that hold the character codes and the character colors displayed on the OSD. The OSD contents change whenever the RAM contents are updated by the LatticeMico8 microcontroller. The character patterns are stored in the Character Generator ROM.
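The following Python sketch shows one way the character-cell lookup could work, using the signal names that appear in Figure 16 (lin, col, Text_Code). The bit ordering of the addresses, the 7-bit character code width, the MSB-first pixel order, and the placement of the 3-bit color in the upper bits are assumptions inferred from the 2048-entry RAMs, the 1024x8 ROM and the figure notes. They are not confirmed details of the design, and all function names are hypothetical.

```python
def cell_address(lin, col):
    """11-bit RAM address: lin(7:3) as upper 5 bits, col(8:3) as lower 6 (assumed order)."""
    return (((lin >> 3) & 0x1F) << 6) | ((col >> 3) & 0x3F)


def rom_address(text_code, lin):
    """10-bit ROM address: 7-bit character code above lin(2:0) (assumed order)."""
    return ((text_code & 0x7F) << 3) | (lin & 0x7)


def pixel_on(rom_row, col):
    """Select one bit of the 8-bit ROM row with col(2:0); MSB is the leftmost pixel (assumed)."""
    return (rom_row >> (7 - (col & 0x7))) & 1


def expand_color(color3):
    """Extend a 3-bit color to 8 bits by appending five ones (3 bits assumed to be the MSBs)."""
    return ((color3 & 0x7) << 5) | 0x1F


# Each 8x8 pixel cell maps to one RAM location.
print(cell_address(lin=8, col=0))    # 64: second text row, first column
print(rom_address(0x41, lin=5))      # 525: row 5 of character code 0x41
print(hex(expand_color(0b100)))      # 0x9f
```

Under these assumptions, lin(7:3) and col(8:3) address a grid of 32 rows by 64 columns of character cells, which fills the 2048-entry RAMs exactly.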
Figure 16. OSD_main Sub-module
[Block diagram. Main blocks: OSD Text Color RAM (RAM_DP 2048x9), OSD Text Code RAM (RAM_DP 2048x9), Character Generator ROM (ROM 1024x8), Opacity Adjust, and Line/Column Tracking Logic. Both RAMs are addressed by lin(7:3) and col(8:3), and each has its own access ports (Color RAM Access Ports, Code RAM Access Ports). The Character Generator ROM is addressed by Text_Code and lin(2:0), and col(2:0) selects within the 8-bit ROM output. The Color RAM provides the 3-bit R_color, G_color and B_color values and Bgd_clr. Inputs: R_input(7:0), G_input(7:0), B_input(7:0), Vsync_input, Hsync_input, DE_input, lin_delta, col_delta, OSD_Disable and Opaque. Outputs: R_output(7:0), G_output(7:0), B_output(7:0), Vsync_output, Hsync_output, DE_output, Max_line and Max_column (11 bits each). Internal signals include Text_Active and text_enable_D1. DFF blocks (D-type FlipFlop Delay) align the pixel and sync paths.
Figure notes: Each 3-bit color will be concatenated with 5 ones to make an 8-bit color data. The bus size here is 24-bit. Duplicate Bgd_clr signal 24 times. Line/Column Tracking Logic generates Max_line and Max_column for showing the current screen resolution on OSD.]
The Line/Column Tracking Logic block controls the position of the OSD and also tracks the current resolution of the video image. The Max_line and Max_column values are read back by the LatticeMico8 to display the resolution on the OSD.
The OSD opacity control is implemented in the OSD_main sub-module as well. The Opaque value is a real number
between 0 and 1 with 1 as the default value. When the value is reduced, the OSD will become semi-transparent.
Figure 17 shows the block diagram of the OSD opacity control.
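The opacity control can be modeled as a weighted sum of two pixel sources. The Python sketch below is a minimal model, assuming that the constant “10000000” in Figure 17 represents 1.0 with seven fractional bits, that the first input is the OSD pixel and the second is the underlying video pixel, and that the result is truncated rather than rounded. The names `blend_channel` and `blend_pixel` are hypothetical.

```python
ONE = 0b10000000  # the constant "10000000" in Figure 17, read as 1.0 (assumed 7 fractional bits)


def blend_channel(osd, video, opaque):
    """Blend one 8-bit channel: opaque * osd + (1.0 - opaque) * video, in fixed point."""
    if not 0 <= opaque <= ONE:
        raise ValueError("opaque must be between 0 and 0b10000000")
    return (opaque * osd + (ONE - opaque) * video) >> 7  # drop the 7 fractional bits


def blend_pixel(osd_rgb, video_rgb, opaque=ONE):
    """Apply the same blend to the R, G and B channels."""
    return tuple(blend_channel(o, v, opaque) for o, v in zip(osd_rgb, video_rgb))


print(blend_pixel((255, 255, 255), (0, 0, 0)))           # (255, 255, 255): default, fully opaque
print(blend_pixel((200, 200, 200), (100, 100, 100), 64)) # (150, 150, 150): half transparent
```

With the default value of 1.0 the OSD pixel passes through unchanged. Lower values let more of the underlying video show through.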
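Figure 17. OSD Opacity Control
[Block diagram. Inputs: R1_input, G1_input and B1_input (8 bits each); a second input group labeled R2_input, G2_input and G2_input (8 bits each; the repeated G2_input label is presumably B2_input); and Opaque. A subtractor takes the constant “10000000” and Opaque, and an adder combines the two weighted input groups to produce R_output, G_output and B_output (8 bits each).]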
Figure 18. Semitransparent OSD Showing On the Display Screen
Summary
The LatticeECP3, LatticeECP2/M and LatticeXP2 FPGA families are well-suited for high-speed LVDS video applications. In addition to capturing the video data at high speeds, these families are capable of processing video data
using the on-chip sysDSP block and the Embedded Block RAM.
Technical Support Assistance
Hotline: 1-800-LATTICE (North America)
+1-503-268-8001 (Outside North America)
e-mail: [email protected]
Internet: www.latticesemi.com
Revision History
Date            Version  Change Summary
September 2006  01.0     Initial release.
March 2007      01.1     Updated 7:1 Receive Side Block Diagram.
                         Updated IDDRX2B Detailed Block Diagram.
                         Updated Example of Input Gearing Using IDDRX2B diagram.
                         Updated Transmitter Side Block Diagram.
                         Updated ODDRX2B Gearing Function diagram.
                         Updated Example of Output Gearing Using ODDRX2B diagram.
May 2007        01.2     Added the video demo design example that includes the color
                         adjustments, OSD and auto-demo features. Updated the Performance
                         and Resource Utilization tables to include numbers for both
                         VHDL and Verilog versions. Updated figures.
September 2007  01.3     Added LatticeXP2 family support and removed the timing analysis section.
September 2009  01.4     Added LatticeECP3 family “E” series support.
April 2011      01.5     Added LatticeECP3 family “EA” series support.