Delivering FPGA-Based Pre-engineered IP Using Structured ASIC Technology

A Lattice Semiconductor White Paper
February 2006
Lattice Semiconductor
5555 Northeast Moore Ct.
Hillsboro, Oregon 97124 USA
Telephone: (503) 268-8000
www.latticesemi.com
A Silicon Survey
In the communications and networking markets, designers face a number of competitive
pressures: time-to-market, bandwidth, port density and protocol compliance. Since networking
equipment evolves quickly and continuously (e.g., DSLAM, cable head-end, optical switch,
wireless base station), quick time-to-market is decisive: the proverbial “early bird gets the
worm.” Increasing bandwidth and protocol compliance (e.g., 10GbE, GFP, etc.) are non-negotiable
requirements for all customers. Port density is a factor that directly affects the
economic value-added for a piece of networking equipment. While all these pressures
squeeze designers from different directions, a variety of silicon technologies are available to
help designers carry the load.
Gate arrays have been an inexpensive alternative that provides reasonable time-to-market.
However, gate arrays are typically offered in process technologies that are 2-3 generations old
(0.25µ or 0.18µ CMOS), severely limiting density and performance while consuming considerable
power. These silicon dinosaurs are still useful in cost-reduction situations, but are poorly
suited for high-performance communications systems.
Application-specific standard products (ASSPs) offer designers off-the-shelf availability at the
expense of customization and differentiation. Additionally, they typically require custom logic
to serve as a bridge from one ASSP to another. The standards that these devices manage are
being moved as intellectual property (IP) blocks into ASICs and FPGAs.
Standard cell ASICs have long been the choice for communications systems designers due to
their performance, density and support for intellectual property portfolios. However, time-to-market
is always compromised with ASICs, and design tool suites are complex and expensive.
The real limiting factor at 90nm is the non-recurring engineering (NRE) charges. These
charges include mask set cost, support engineering charges and sample device expenses. It
is estimated that a 90nm design can easily exceed $3 million (US) per design spin. That’s an
expensive risk to take when less expensive alternatives are available.
SRAM-based FPGAs have become a favorite of designers in the communications space
because of their $0 NRE and inexpensive design tool costs. Their biggest advantage over
competing technologies is time-to-market. SRAM-based FPGAs have always been at the
leading edge of the technology curve, with several companies already offering 90nm devices.
FPGAs have steadily gained market share at the expense of all other options as densities and
performance continue to increase.
Structured ASICs have gained in popularity recently due to their density and performance
relative to SRAM-based FPGA devices. Unlike full-custom or standard cell ASICs, structured
ASICs cost far less to design because they require only one to seven metal layer changes to
accomplish their task. This results in significantly lower NRE than full custom or standard cell
ASICs, as well as quicker turn-around time. A failing of structured ASICs is that they are
manufactured in processing technologies that are several generations old. Most structured
ASICs available today are fabricated in 0.18µ or 0.13µ CMOS, limiting their usefulness in
multi-gigabit communication systems.
Recognizing significant bandwidth, time-to-market, port density and protocol pressures in the
communications market, Lattice Semiconductor has introduced a new silicon technology to
meet the challenges for next-generation systems: the LatticeSCM.
LatticeSCM FPGAs
The LatticeSCM family of FPGAs combines a high-performance FPGA fabric, 3.8Gbps
SERDES, high-performance I/Os, large embedded RAM and embedded ASIC blocks (MACO)
in a single industry-leading architecture. This FPGA family is fabricated on a state-of-the-art
90nm technology to provide one of the highest performing FPGAs in the industry.
This family of devices also includes specific features to meet the needs of today’s
communication network systems. These features include SERDES with embedded advanced
PCS (Physical Coding Sublayer), up to 7.8 Mbits of sysMEM embedded block RAM and
dedicated logic to support system level standards such as RapidIO, HyperTransport, SPI4.2,
SFI-4, UTOPIA, XGMII and CSIX. The LatticeSCM devices feature clock multiply, divide and
phase shift PLLs, numerous DLLs and dynamic glitch-free clock MUXs that are required in
today’s high-end system designs. High-speed, high-bandwidth I/O makes this family ideal for
high-throughput systems. And, for higher-performance, higher-density logic, the LatticeSCM
family offers up to 12 embedded MACO blocks per device.
Table 1 - LatticeSC Family Selection Guide
Device    Logic Block   EBR 18Kb      EBR SRAM   SERDES    MACO     Analog   DLL   Max User
          LUT4          SRAM Blocks   Mbits      3.8Gbps   Blocks   PLL            I/O
LFSC15    15,168        56            1.03       8         4        8        12    300
LFSC25    25,424        104           1.92       16        6        8        12    484
LFSC40    40,366        216           3.98       16        10       8        12    562
LFSC80    80,080        308           5.68       32        10       8        12    904
LFSC115   115,200       424           7.8        32        12       8        12    942
Figure 1 illustrates the layout of a LatticeSCM chip, with the MACO blocks available on the
periphery of the device.
Figure 1 - Layout of Lattice LFSC15 with 4 MACO blocks
The major architectural elements listed in Figure 1 are described in greater detail in Table 2.
For a comprehensive view of the LatticeSCM architecture, please refer to the LatticeSCM Data
Sheet, which can be found on the Lattice Semiconductor web site at www.latticesemi.com.
Table 2 - Major FPGA architectural features on the LatticeSC

Feature: SERDES
Description: SERializer-DESerializer
Function: Embedded transceiver that converts parallel data to serial data and vice-versa. The
transmitter section is a parallel-to-serial converter, and the receiver section is a
serial-to-parallel converter. Connects to LVDS PICs on periphery of chip, and PCS on
FPGA side.

Feature: PCS
Description: Physical Coding Sublayer
Function: Embedded block that contains logic to simultaneously perform alignment, coding,
decoding and other functions. May be bypassed to form SERDES-FPGA direct connect.

Feature: PLL
Description: Phase Locked Loops
Function: Provides the ability to synthesize clock frequencies. Each PLL has four dividers
associated with it: input clock divider, feedback divider and two clock output dividers.
The input divider is used to divide the input clock signal, while the feedback divider is
used to multiply the input clock signal.

Feature: DLL
Description: Digital Delay-Locked Loops
Function: Similar to PLLs, DLLs assist in the management of clocks and strobes. DLLs are well
suited to applications where the clock may be stopped or transferring jitter from input to
output is important, for example forward clocked interfaces. Used for clock injection
match, duty cycle correction, and single delay cell.

Feature: EBR
Description: Embedded Block RAM
Function: Large, dedicated, fast memory blocks. They can be configured as RAM, ROM or FIFO.
These blocks have dedicated logic to simplify the implementation of FIFOs.

Feature: PFU
Description: Programmable Function Unit
Function: Primary logic cell that can be programmed to perform Logic, Arithmetic, Distributed
RAM and Distributed ROM functions.

Feature: PIC
Description: Programmable I/O Cell
Function: Each PIC contains four programmable I/O buffers that are then connected to the I/O
pads. The PIO block supplies the output data and the Tri-state control signal to I/O
buffers, and receives input from the buffer. The PIO contains advanced capabilities to
allow the support of speeds up to 2Gbps.

Feature: CIB
Description: Configurable Interface Block
Function: This block serves as the interface between PLC, PIC and EBR blocks. It has a routing
block, and a logic block. The CIB logic block can buffer signals and generate control
signals for other blocks.
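The divider roles described for the PLL can be illustrated with a small calculation. The sketch below assumes the conventional integer-N relationship, f_out = f_in x feedback / (input x output); this form, the function name and the example divider values are assumptions for illustration and are not taken from the LatticeSC data sheet, which remains the authoritative source.

```python
from fractions import Fraction


def pll_output_mhz(f_in_mhz, input_div, feedback_div, output_div):
    """Output frequency of a conventional integer-N PLL (assumed model).

    The input divider divides the reference clock, the feedback divider
    multiplies it, and the output divider scales the result down again.
    """
    if min(input_div, feedback_div, output_div) < 1:
        raise ValueError("dividers must be positive integers")
    return Fraction(f_in_mhz) * feedback_div / (input_div * output_div)


# Hypothetical settings: 100 MHz reference, divide by 2, multiply by 12,
# then divide the output by 2.
print(float(pll_output_mhz(100, 2, 12, 2)))  # 300.0
```

Using exact fractions keeps the result free of rounding error when the divider ratios do not divide evenly.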
The LatticeSC architecture allows FPGA designers to approach performance levels previously
available only with full custom or standard cell ASICs. Table 3 illustrates some commonly
used benchmarks implemented in the LatticeSC FPGA fabric (no MACO gates used).
Table 3 - Performance Benchmarks for LatticeSC FPGA Fabric
Function                                             Performance (MHz)
32-bit Address Decoder                               522
64-bit Address Decoder                               496
32:1 Multiplexer                                     561
64-bit Adder (ripple)                                328
64-bit Counter (up or down counter, non-loadable)    376
Masked Array for Cost Optimization (MACO)
The layout of the LatticeSCM FPGA is a regular and homogeneous array of programmable
logic cells (PFUs) surrounded by programmable I/O cells (PICs). At the top of the device are
embedded SERDES channels that connect to embedded multi-purpose physical coding sublayer (PCS) blocks for managing high-speed serial data transfers. The PCS block can be
bypassed to transfer serial data directly to the FPGA fabric. Rows of embedded block RAM
(EBR) are “striped” across the array for efficient connectivity to the PFUs. Special configurable
interconnect blocks (CIB) contain dedicated resources for routing signals to/from the block
RAM. At the end of each EBR row (see Figure 1) is an area of silicon that Lattice has made
available for a “structured ASIC” block, allowing designers to commit logic to high-performance,
high-density 90nm arrays. Lattice calls this concept the Masked Array for Cost
Optimization (MACO).
The LatticeSCM family provides designers access to pre-engineered, high performance IP
blocks designed in 90nm structured ASIC blocks. Combined with a state-of-the-art FPGA
array and world-class SERDES technology, these blocks offer the most flexible and
high-performance programmable platform available today.
MACO Block Architecture
Figure 2 illustrates a block diagram for each of the MACO sites on a LatticeSCM FPGA. The
major features of each block include three 64x40 RAMs, a MACO interface block (MIB) for
managing FPGA-to-MACO connectivity and a sea of gates for implementing digital logic.
Figure 2 - MACO block diagram with adjacent FPGA features
64x40 RAMs
Each MACO block contains three 64x40 asynchronous RAMs with dual-port addressing that
permit simultaneous reading and writing of data. Dedicated latches are on all address, data,
and enable ports. Synchronous read and write operations are permitted through the read and
write ports, respectively. The 64x40 memory blocks can be initialized during FPGA device
configuration using a pre-load bus that sits between the 64x40 memory blocks and the rest
of the MACO hard IP logic that accesses the memories during normal operation.
These 64x40 memories are intended to augment the EBR in the FPGA fabric, which is
available to MACO logic through the MACO interface block (MIB).
MACO Interface Block
The MACO interface block (MIB) acts as a bridge between the MACO blocks and the rest of
the FPGA fabric. All MACO signals must either originate from or route through the MIB. Table
4 lists the MIB resources accessible to the designer and their function.
Table 4 - MACO Interface Block connections

Connection: MACO ⇔ EBR
Description: MACO is able to directly access up to 10 EBRs through these
connections. If MACO does not connect to EBR, some of these
connections can be used to connect to FPGA logic/routing.
Number of Connections:
• MACO Inputs from EBR: 48x10=480
• MACO Outputs to EBR: 64x10=640
• If no EBR used, up to 480 inputs and outputs can connect to FPGA fabric

Connection: MACO ⇔ PIO
Description: Connects MACO to PIOs for I/O routing. If MACO is unconnected
to I/Os these connections are used as route-throughs, allowing
PIO routing to cross over MACO block.
Number of Connections:
• PIOs above MACO: 256 In, 256 Out
• PIOs below MACO: 256 In, 256 Out

Connection: MACO ⇔ CIB
Description: Connects MACO to PFU logic array. Connections include I/O
data signals, clocks, local resets, and clock-enables. All
connections to/from PFU or routing go through CIBs.
Number of Connections:
• MACO data inputs from CIBs: 70
• MACO data outputs to CIBs: 96
• Clock inputs to MACO: 12
• Clock-enable inputs to MACO: 12
• Local reset inputs to MACO: 10

Connection: Edge Clocks
Description: The LatticeSC devices have 8 edge clocks around the periphery
of each array for the purpose of facilitating high-speed I/O.
Number of Connections:
• Four (4) edge clock inputs that can be driven from any of the eight (8) edge clocks.

Connection: Tri-States
Description: The tri-states are available for connection to the global reset
(GSR).
Number of Connections:
• Two (2) outputs connect to GSR
The primary connectivity between the FPGA fabric and MACO block is through a CIB. This
resource limits the designer to 96 data outputs and 70 data inputs with additional ports
available for resets, clock-enables and clocks. However, if the EBRs adjacent to the MACO block
are unused, then up to 480 MACO ⇔ EBR connections may be used for highly flexible
connectivity to the FPGA PFU array.
For high-speed I/O connections, 512 I/Os on each MACO block are defined for PIO
connectivity. The I/O capabilities on LatticeSC devices operate at up to 2Gbps, and with
ample connectivity to fast MACO gates, this combination of technologies offers the industry’s
highest throughput solution.
MACO Gates
At the heart of the MACO block is a sea of 50,000 usable ASIC gates. A library of cells was
created with Fujitsu’s 90nm CMOS process technology and optimized for speed, power
dissipation and area. The library contains 257 cells for combinatorial, sequential, and
special-purpose functions.
Note that there are no I/O cells in the MACO cell library. They are not required because FPGA
I/Os are used for the MACO core chip-level I/Os. Additionally, there is no memory compiler or
memory cells (other than the three 64x40 RAMs) for MACO because the LatticeSC FPGA
fabric contains ample, flexible embedded memory with excellent connectivity to each MACO
block.
The Advantages of MACO
FPGA vs. MACO Performance
MACO provides multiple advantages: speed, density and lower power dissipation. Because
MACO is implemented in 90nm ASIC gates, the performance increase over SRAM-based
lookup-table architecture can exceed 100%.
The best way to compare the performance advantage of a 90nm cell-based technology to a
90nm LUT-based technology is to take a design fragment and implement it in both. For this
white paper a 32-bit cyclic redundancy check (CRC) calculation was selected that employs a
dense PRBS sequence chosen to produce a large logic tree. The code is written in Verilog
HDL and is included in Appendix A.
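To make the structure of a 32-bit CRC concrete, the following is a minimal bit-serial sketch in Python. It is not the benchmark code from Appendix A: the polynomial (the reflected IEEE 802.3 CRC-32, 0xEDB88320), the initial value and the byte-wise data width are assumptions chosen so the result can be checked against Python's zlib. In hardware, unrolling the inner loop across a wide data word is what yields the wide XOR tree exercised by a benchmark of this kind.

```python
import zlib

# Assumed polynomial: IEEE 802.3 CRC-32 in reflected form. The white paper
# does not state which polynomial its benchmark uses.
POLY_REFLECTED = 0xEDB88320


def crc32_bitwise(data: bytes, crc: int = 0) -> int:
    """Bit-serial CRC-32, one shift-and-conditional-XOR step per input bit."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


message = b"123456789"
# Cross-check against the reference implementation in the standard library.
assert crc32_bitwise(message) == zlib.crc32(message)
print(hex(crc32_bitwise(message)))  # 0xcbf43926
```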
The design was first run through the Lattice ispLEVER v5.1 design tool targeting the Lattice
LFSC25 FPGA, a new, 90nm device that offers the industry’s highest performance. The
design was mapped, placed and routed with minimal effort and a 300MHz timing preference on
the output clock. A static timing report was generated to analyze the worst-case path to
determine a maximum operating frequency. A portion of the report from ispLEVER’s TRACE
tool is shown in Figure 3.
Figure 3 - TRACE static timing report for CRC calculation in LatticeSC FPGA fabric.
================================================================================
Preference: FREQUENCY NET "oclk2" 300.000000 MHz ;
            802 items scored, 0 timing errors detected.
--------------------------------------------------------------------------------
Passed: The following path meets requirements by 0.358ns

 Logical Details:  Cell type  Pin type   Cell name  (clock net +/-)
   Source:         FF         Q          dec/XZ0Z_8  (from oclk2 +)
   Destination:    FF         Data in    dec/X_19  (to oclk2 +)

   Delay:          2.900ns  (21.7% logic, 78.3% route), 6 logic levels.

 Constraint Details:
      2.900ns physical path delay SLICE_7 to SLICE_1 meets
      3.333ns delay constraint less
      0.000ns skew and
      0.075ns DIN_SET requirement (totaling 3.258ns) by 0.358ns

 Physical Path Details:
   Name       Fanout   Delay (ns)   Site                        Resource
   REG_DEL    ---      0.305        R29C8B.CLK to R29C8B.Q0     SLICE_7 (from oclk2)
   ROUTE      5        0.728        R29C8B.Q0 to R28C9A.B0      dec/XZ2Z_8
   CTOF_DEL   ---      0.065        R28C9A.B0 to R28C9A.F0      SLICE_59
   ROUTE      10       0.614        R28C9A.F0 to R28C7C.B1      dec/GZ0Z_3
   CTOF_DEL   ---      0.065        R28C7C.B1 to R28C7C.F1      SLICE_11
   ROUTE      3        0.374        R28C7C.F1 to R27C7C.C1      dec/GZ0Z_17
   CTOF_DEL   ---      0.065        R27C7C.C1 to R27C7C.F1      SLICE_13
   ROUTE      1        0.344        R27C7C.F1 to R27C8A.C1      dec/un921_X_0_1Z0Z_0
   CTOF_DEL   ---      0.065        R27C8A.C1 to R27C8A.F1      SLICE_12
   ROUTE      1        0.210        R27C8A.F1 to R27C8B.C0      dec/un921_X_0Z0Z_0
   CTOF_DEL   ---      0.065        R27C8B.C0 to R27C8B.F0      SLICE_1
   ROUTE      1        0.000        R27C8B.F0 to R27C8B.DI0     dec/XZ1Z_20 (to oclk2)
                       --------
                       2.900   (21.7% logic, 78.3% route), 6 logic levels.

Report: 336.134MHz is the maximum frequency for this preference.
As with all SRAM-based FPGAs, routing delay tends to dominate critical path delay. For this
circuit, actual logic is implemented in 6 levels constituting only 21.7% of the critical path delay.
The routing delay consumes the remaining 78.3%. Nonetheless, for a circuit with a rather
large logic tree, 6 levels of logic resulting in 336MHz performance is outstanding. Note: the
max frequency is calculated as 1/(Path_Delay + Setup_Requirement).
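The figures quoted from the TRACE report can be checked with a few lines of arithmetic. The sketch below uses only the delay values listed in Figure 3; the variable and function names are illustrative and are not part of any Lattice tool.

```python
# Delay values (ns) read from the physical path details in Figure 3.
logic_delays = [0.305, 0.065, 0.065, 0.065, 0.065, 0.065]  # REG_DEL + five CTOF_DEL
route_delays = [0.728, 0.614, 0.374, 0.344, 0.210, 0.000]  # six ROUTE segments
setup_ns = 0.075  # DIN_SET requirement


def fmax_mhz(path_delay_ns: float, setup_requirement_ns: float) -> float:
    """Max frequency = 1 / (Path_Delay + Setup_Requirement), in MHz."""
    return 1000.0 / (path_delay_ns + setup_requirement_ns)


path_ns = sum(logic_delays) + sum(route_delays)
logic_share = sum(logic_delays) / path_ns

print(f"path delay  = {path_ns:.3f} ns")
print(f"logic share = {logic_share:.1%}")
print(f"route share = {1 - logic_share:.1%}")
print(f"fmax        = {fmax_mhz(path_ns, setup_ns):.3f} MHz")
```

The result reproduces the 2.900ns path delay, the 21.7% / 78.3% logic-to-route split and the 336.134MHz maximum frequency reported by the tool.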
The same code was targeted at the MACO technology, also available on the LatticeSC FPGA.
Synopsys Design Compiler (v 2004.06.1) was used to synthesize the design and Cadence
SOC Encounter (2004.1) was utilized for place and route. After parasitic extraction, the design
was analyzed in Synopsys PrimeTime. A portion of the PrimeTime report is shown in Figure 4.
Figure 4 - PrimeTime static timing report for CRC calculation implemented in LatticeSC
MACO.
****************************************
Report : timing
        -path full_clock
        -delay max
        -input_pins
        -nets
        -nworst 4
        -max_paths 5
        -transition_time
        -capacitance
Design : crc_hard
Version: V-2004.06
Date   : Wed Jul 13 16:53:08 2005
****************************************
  Startpoint: X_reg_31_V (rising edge-triggered flip-flop clocked by clk)
  Endpoint: X_reg_3_V (rising edge-triggered flip-flop clocked by clk)
  Path Group: clk
  Path Type: max

  clock clk (rise edge)                             0.000   3333.000    3333.000     (Clock PERIOD)
  clock source latency                                         0.000    3333.000
  clk (in)                                         12.690      4.033 +  3337.033 r
  clk (net)                             1   0.008
  clk_bufi/A (ckx4m4mce)                           13.368      0.131 *  3337.165 r
  clk_bufi/Z (ckx4m4mce)                           61.442     53.255 *  3390.419 r
  clk_buf (net)                         6   0.089
  clk_buf__L1_I0/A (ckx4m4mce)                     79.958     30.227 *  3420.646 r
  clk_buf__L1_I0/Z (ckx4m4mce)                     62.793     67.937 *  3488.583 r
  clk_buf__L1_N0 (net)                 15   0.085
  X_reg_3_V/CK (mxbffprqnx1m4mce)                  74.650     22.550 *  3511.134 r
  library setup time                                         -43.727 *  3467.406
  data required time                                                    3467.406
  ----------------------------------------------------------------------------
  data required time                                                    3467.406
  data arrival time                                                    -1574.362
  ----------------------------------------------------------------------------
  slack (MET)                                                           1893.044

Max Frequency = 1 / (3333-1893.044) = 694 MHz
The MACO version of this implementation resulted in a critical path that contained 17 logic
levels. Minimal effort was applied during the synthesis phase, resulting in the use of many
2-input combinatorial gates. Nonetheless, without much effort, this design was able to achieve
694MHz max frequency, in large part due to the low routing delays associated with a
cell-based design. Note: the max frequency is derived as 1/(Clock_Period – Slack).
Table 5 summarizes the results of the performance comparison. Even with inefficient packing
and almost 3x the number of logic levels in the worst-case path, the 90nm cell-based MACO
technology easily outperformed the 90nm SRAM-based LUT technology by over 100%.
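The slack-based frequency and the resulting speedup can be reproduced from the numbers in the two timing reports. The sketch below assumes the PrimeTime values are in picoseconds, which is consistent with the 3333 period matching the 300MHz constraint; the function name is illustrative.

```python
def fmax_from_slack_mhz(clock_period_ps: float, slack_ps: float) -> float:
    """Max frequency = 1 / (Clock_Period - Slack), converted from ps to MHz."""
    return 1e6 / (clock_period_ps - slack_ps)


maco_mhz = fmax_from_slack_mhz(3333.0, 1893.044)  # values from Figure 4
fpga_mhz = 336.134                                # value from Figure 3

speedup = maco_mhz / fpga_mhz
print(f"MACO fmax   = {maco_mhz:.0f} MHz")
print(f"MACO / FPGA = {speedup:.2f}x ({speedup - 1:.0%} faster)")
```

A ratio slightly above 2x corresponds to the "over 100%" improvement stated in the text.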
Table 5 - Performance comparison for LatticeSC FPGA vs. MACO technologies
Technology         Max Frequency   Logic Levels
LatticeSC FPGA     336.134 MHz     6
LatticeSCM MACO    694 MHz         17
FPGA vs. MACO Area
Many FPGA designs contain intellectual property (IP) designed to handle chip-to-chip
communications standards. For instance, SPI4.2 is a popular packet interface employed on
many MACs and network processors. Since it is an established standard, it is desirable to
implement this function in hard gates (i.e., MACO) rather than use valuable FPGA gates.
Other FPGA vendors offer SPI4.2 solutions as soft IP that consume FPGA resources and
make user designs more difficult to map, place, route and achieve timing.
A sample application using an FPGA plus a SPI4.2 interface is an Ethernet-to-SONET bridge.
In Figure 5 the FPGA is used to manage the bridge, including flow control and packet
buffering, usually with a memory controller and external buffer. If the SPI4.2 interfaces in this
example have to be implemented in soft IP using FPGA gates, the design becomes
considerably more difficult and may require a larger device. In this case, the real value-added
is not the SPI4.2 interface but the bridging functions.
Figure 5 - FPGA as a SONET-to-Ethernet bridge employing SPI4.2
The LatticeSCM FPGA employs MACO technology to implement communication standards
such as SPI4.2. The blocks are pre-engineered, resulting in a significant up-front development
savings for the designer. The FPGA gates are available to the designer for value-added and
differentiating functions. For the savings in this example, refer to Table 6.
Table 6 - Resources required to implement SONET-to-GbE bridge
Block                          LatticeSC FPGA w/MACO   Competitor FPGA
User Design LUTs               20,000                  20,000
2 x Memory Controller LUTs     0                       5,000
2 x SPI4.2 LUTs                3,000                   16,000*
Total LUTs Required            23,000                  41,000
Device Required                LFSC25                  XC4VLX40
Device Utilization             92%                     100%
*Xilinx SPI4.2 v7.4 Core for Virtex4: DS209
The LFSCM25 includes two high-speed memory controllers implemented in MACO and not
requiring many LUTs. The SPI4.2 MACO cores utilize approximately 1500 LUTs each to
manage the SPI4.2 status path and ingress/egress FIFOs in the data path. The FIFOs are
implemented in EBR, since MACO has little embedded RAM. Essentially, the bulk of the
SPI4.2 data path is committed to MACO, while the user-accessible status path consumes just
a handful of LUTs. This is so that the customer is able to modify the status controller if desired.
As a comparison, a competitor device requires over 20K LUTs just to implement this
industry-standard IP.
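The LUT budget in Table 6 can be tallied directly. The sketch below uses only the per-block LUT counts from the table; the dictionary keys and function names are illustrative.

```python
# LUT counts per block, taken from Table 6.
LUTS = {
    "LatticeSC FPGA w/MACO": {"user_design": 20_000, "memory_controllers_x2": 0, "spi42_x2": 3_000},
    "Competitor FPGA": {"user_design": 20_000, "memory_controllers_x2": 5_000, "spi42_x2": 16_000},
}


def total_luts(blocks: dict) -> int:
    """All LUTs the bridge design needs on a given device."""
    return sum(blocks.values())


def standard_ip_luts(blocks: dict) -> int:
    """LUTs spent on industry-standard IP rather than on the user design."""
    return total_luts(blocks) - blocks["user_design"]


for device, blocks in LUTS.items():
    print(f"{device}: total {total_luts(blocks):,} LUTs, "
          f"standard IP {standard_ip_luts(blocks):,} LUTs")
```

The totals match the 23,000 and 41,000 LUT figures in the table, and the competitor's 21,000 LUTs of standard IP is the "over 20K LUTs" cited in the text.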
Lattice studies have found that each MACO core is equivalent to at least 5,000 LUTs, while
occupying only 10% of the area of an equivalent FPGA implementation. With multiple MACO
blocks per device, this results in a substantial cost savings to the designer, both by lowering
the development cost and saving significant silicon area.
FPGA vs. MACO Power
In today’s high performance systems, power dissipation budgets are tightening even as clock
rates and data paths rapidly increase. As a cell-based technology, MACO power consumption
is significantly less for a given design at an equivalent operating frequency. For a comparison
of power dissipation, we will again use a SPI4.2 core, a common packet interface found in
communication systems.
The Lattice SPI4.2 solution using the LatticeSCM FPGA utilizes MACO technology for the
majority of a SPI4.2 core. Only 1500 LUTs and 10 EBRs are used in the LFSC FPGA fabric,
primarily for FIFOs and the SPI4.2 status path.
Competitive SPI4.2 solutions use only FPGA gates to implement entire SPI4.2 cores. A
survey of the power consumption for each implementation reveals the LatticeSC solution
employing MACO uses considerably less power than competitive devices. This is due, in part,
to the power savings offered by MACO. Lattice studies on a variety of cores implemented in
MACO technology show that MACO power, even when running at up to 700MHz, does not
exceed 200mW per MACO block.
Table 7 shows the results of a survey of SPI4.2 core power in competitive SRAM-based
FPGAs. The comparison employs the fastest speed grade in each case. With a SPI4.2
interface running at 1+ Gbps, much power is dissipated by the I/Os. The balance of power is
generated by the core implementation. While Altera’s SPI4.2 core uses fewer LUTs and
embedded memory than Xilinx, it is at the expense of substantial power. While the Xilinx
implementation is less power hungry than Altera’s, 2.1W is a lot of power to expend for a
standard interface. Adding another SPI4.2 core to a Xilinx solution would leave little power
budget margin for value-added functionality.
Table 7 - A survey of power for SPI4.2 cores in competitive FPGAs
Vendor        Max LVDS    Speed           LUT4s   Memory    Total Power @   Total Power @
              Rate        Grade                   (Kbits)   Max Rate        800Mbps
Xilinx (1)    1+ Gbps     -12 (Fastest)   8000    306       2.1 W           1.7 W
Altera (2)    1.04 Gbps   -3 (Fastest)    6100    154       ~4.4 W          3.4 W
Lattice       1+ Gbps     -7 (Fastest)    1500    180       1.05 W          0.85 W
In contrast, the LatticeSCM solution greatly benefits from the low power dissipation of the
MACO technology. In fact, a LatticeSC FPGA could implement four SPI4.2 cores for the same
power budget that Altera requires for a single core.
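The four-cores claim follows from the Table 7 power figures. The sketch below works in integer milliwatts to avoid floating-point rounding at the exact boundary; the Altera max-rate value is given as approximate (~4.4W) in the table, and the function name is illustrative.

```python
# Total power per SPI4.2 core in milliwatts, from Table 7.
POWER_MW = {
    "Xilinx": {"max_rate": 2100, "800mbps": 1700},
    "Altera": {"max_rate": 4400, "800mbps": 3400},  # max-rate figure is approximate
    "Lattice": {"max_rate": 1050, "800mbps": 850},
}


def cores_within_budget(core_mw: int, budget_mw: int) -> int:
    """How many cores of a given power fit inside a power budget."""
    return budget_mw // core_mw


for rate in ("max_rate", "800mbps"):
    n = cores_within_budget(POWER_MW["Lattice"][rate], POWER_MW["Altera"][rate])
    print(f"{rate}: {n} Lattice cores fit in one Altera core's power budget")
```

At both operating points the answer is four Lattice cores per Altera core.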
Overview of System Connectivity
Figure 6 below summarizes the basic roles of FPGAs on a typical communications line card.
At a high level, these basic functions are common across the major industries served by
FPGAs, which include telecom, datacom and storage. Figure 6 also highlights the many
interfaces and standards that FPGAs are called upon to support.
(1) http://www.xilinx.com/xlnx/xebiz/designResources/contentContainer.jsp?key=spi-42_faq
(2) http://www.altera.com/products/devices/stratix2/features/st2-competitive.html
Figure 6 - FPGAs in a typical Communications Line Card
Pre-Designed MACO Blocks
The initial LatticeSC devices will offer pre-designed IP based on industry-standard interfaces.
Included on the initial LFSC80 devices will be memory controllers, SPI4.2 interfaces, and
flexible MAC blocks supporting GbE, 10GbE and PCI Express protocols. Figure 7 illustrates
the MACO block layout for the LFSCM25 FPGA.
Figure 7 - LFSCM25 FPGA with MACO block layout.
[Figure labels: four SERDES quads; MACO sites A through F; flexiMAC (GbE, 10GbE and PCI
Express); SPI4.2; Memory Controllers (DDR1/2, QDR2 and RLDRAM1/2)]
Embedded Memory Controllers
The LatticeSCM devices provide dedicated high-speed memory controllers with MACO
technology supporting the major high-speed memory standards implemented in many
communications systems: DDR 1/2 SDRAM, QDR 1/2 SRAM, and RLDRAM 1/2.
Implementing high performance DDR memory interfaces requires very careful design of the
read and write interface blocks of the memory controller. DDR2 memory devices pose a
greater challenge due to their higher speeds of operation and the bi-directional DQS signal.
The LatticeSC memory controller utilizes on-chip PLLs and DLLs, along with programmable
delay elements at the input buffers to align DQS and DQ signals. These elements work
together to compensate for process, voltage and temperature variations, providing reliable
operation at all operating conditions and various frequencies. The LatticeSC devices also
contain dedicated DDR register structures in the inputs (for read operations) and in the outputs
(for write operations). All of these blocks are critical for implementing reliable high-speed DDR
and DDR2 SDRAM Controllers. Typically, designers have problems implementing high-speed
memory controllers in FPGAs because of the complexity of the DQS logic.
QDR SRAMs are very popular in low latency applications requiring simultaneous reads and
writes. The LatticeSC devices also implement an embedded QDR SRAM memory controller
for interfacing to QDR2 SRAM memory devices. The memory controller supports both 2- and
4-word burst modes.
Reduced Latency DRAM (RLDRAM) provides an SRAM-type interface with non-multiplexed
addresses. RLDRAM2 technology provides minimized latency and reduced row cycle times
that are very suitable for applications requiring critical response time and very fast random
accesses, such as next generation (10 Gbps and beyond) networking applications. The
LatticeSC implements on-chip RLDRAM memory controllers in MACO supporting both
RLDRAM1 and RLDRAM2 devices. Additionally, the LatticeSCM memory controller supports
both types of RLDRAM2 memory: Common I/O (CIO) and Separate I/O (SIO).
Figure 8 - LatticeSCM DDR input for dedicated Memory Controller support
SPI4.2 Interface Cores
Initial LatticeSCM FPGAs will incorporate one or two SPI4.2 cores implemented in MACO.
Each is fully compliant with the OIF-SPI4-02.0 specification, offering up to 256 logical ports,
with transmit/receive data paths that are 16 bits wide with in-band port address, SOP, EOP
indication and error control. Each SPI4.2 MACO core interfaces directly with LatticeSCM
LVDS I/Os that offer source-synchronous, double-edge clocking at 311MHz minimum in
static or dynamic alignment modes. The core supports up to 1 Gbps with dynamic phase
alignment and up to 700 Mbps in static alignment mode. The SPI4.2 MACO core interfaces
with the FPGA via a 128-bit user accessible data path. Only 1500 FPGA LUTs and 10 EBR
are used, primarily for buffering FIFOs and status path management.
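The data rates quoted for the SPI4.2 core imply a rough bandwidth budget, sketched below. The calculation assumes that the per-lane rates apply to each of the 16 data lanes and that the 128-bit user path carries the full line rate with no overhead; neither assumption is stated in the paper, and the function names are illustrative.

```python
DATA_LANES = 16        # SPI4.2 data path width in bits
USER_PATH_BITS = 128   # width of the FPGA-side user data path


def lane_rate_mbps(clock_mhz: float) -> float:
    """Double-edge clocking transfers two bits per lane per clock cycle."""
    return 2 * clock_mhz


def aggregate_gbps(lane_mbps: float, lanes: int = DATA_LANES) -> float:
    return lanes * lane_mbps / 1000


def user_clock_mhz(agg_gbps: float, width_bits: int = USER_PATH_BITS) -> float:
    """Fabric clock needed to carry the aggregate rate on the user path."""
    return agg_gbps * 1000 / width_bits


print(f"311 MHz double-edge clock -> {lane_rate_mbps(311):.0f} Mbps per lane")
for mode, rate in (("dynamic alignment", 1000), ("static alignment", 700)):
    agg = aggregate_gbps(rate)
    print(f"{mode}: {agg:.1f} Gbps aggregate, "
          f"{user_clock_mhz(agg):.1f} MHz on the 128-bit path")
```

Under these assumptions the 128-bit user interface runs at a modest fabric clock even at the 1 Gbps per-lane rate, which is one motivation for widening the data path on the FPGA side.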
Figure 9 - SPI4.2 implemented in LatticeSCM FPGA with MACO
flexiMAC
flexiMAC is a flexible packet framer and parser that can implement Layer 2 (Data Link Layer or
MAC) functionality for various standards. Implemented in MACO technology, the flexiMAC
functionality complements the LatticeSCM SERDES and the Layer 1 (Physical Layer) multiprotocol functionality of the Physical Coding Sublayer (PCS). This results in a complete Layer
1/Layer 2 solution for 1G/10G Ethernet standards and provides customers with integrated
1GE/10GE solutions without using up valuable FPGA gates.
Figure 10 - Functional block diagram of flexiMAC in LatticeSCM FPGA
Conclusion
Lattice has introduced a new silicon technology to meet today’s ever-increasing demand for
performance, density, low power and time-to-market. Called MACO, this technology adds a
90nm structured ASIC capability to an advanced FPGA + SERDES platform. The MACO
gates provide superior density and performance with significantly lower power dissipation.
MACO is ideal for industry-standard IP such as memory controllers, SPI4.2, and MACs that
are necessary for today’s communications design, but cannot be optimized in soft IP. This
results in substantial development cost reduction because standards are committed to
dedicated silicon and the FPGA gates are reserved for value-added and differentiating
features.
MACO conserves power by implementing IP previously targeted for LUT-based architectures
in 90nm cell-based technology. MACO blocks do not exceed 200mW per site, even at
700MHz performance. This type of power dissipation can’t be matched in 90nm LUT
architectures running at similar clock rates.
MACO conserves area by shrinking 5,000 equivalent LUTs into a much smaller silicon area.
With multiple MACO blocks per device, this significantly boosts device density. Since MACO
is ideal for industry-standard IP cores, this means that valuable LUT-based silicon is reserved
for value-added design features.
Finally, MACO improves device performance. Since MACO is a 90nm cell-based technology
like a structured ASIC, it is capable of > 700MHz performance with little design effort. Ample
connectivity is provided to connect each MACO to LatticeSCM I/O technology, as well as
embedded RAM and programmable logic blocks.