CoreFIR Finite Impulse Response Filter Generator

Product Summary

Intended Use
• Finite Impulse Response (FIR) Filter for Actel FPGAs

Key Features
• Two FIR Computation Algorithms for Design Flexibility
• Constant Coefficient (CC) Algorithm for High Throughput
  – Constant Multiplier Computation
  – Low Gate Counts
  – High Speed
• Distributed Arithmetic (DA) Algorithm
  – Multiplier-Free Computation
  – Low Cost
  – Optimized for Actel FPGAs
  – Efficient DA Architecture using Embedded RAM Lookup Tables
  – Folding Architecture with Serialized Computation to Minimize Size for Lower Sample Rates
  – DA Lookup Table ROM Synthesis for FPGA without Embedded RAMs
  – Multiple DA Lookup Tables to Split Large Number of Taps
• Core Generator
  – Executable File Outputs Register Transfer Level (RTL) Code and Testbench Based on Input Parameters
  – Self-Checking: Executable Tests Generate Output against Algorithm
• Actel FPGA-Optimized RTL Code
• Supports 2 to 128 Taps
• 1- to 32-Bit Input Data and Coefficient Precision

Supported Families
• Fusion
• ProASIC®3/E
• ProASICPLUS®
• Axcelerator®
• RTAX-S
• SX-A
• RTSX-S

Core Deliverables
• Evaluation Version
  – RTL Code of a Sample Filter and Compiled RTL Simulation Model Fully Supported in the Actel Libero® Integrated Design Environment (IDE)
• RTL Version
  – Microsoft Windows® Binary Executable of the CoreFIR Generator
  – VHDL FIR Module
  – VHDL Testbench

Synthesis and Simulation Support
• Synthesis: Synplicity®, Synopsys® (Design Compiler® / FPGA Compiler™ / FPGA Express™), Exemplar™
• Simulation: OVI-Compliant Verilog Simulators and Vital-Compliant VHDL Simulators

May 2006 © 2006 Actel Corporation v4.0

Contents
General Description
  Constant Coefficient Mode
  Distributed Arithmetic Mode
Device Utilization and Performance
  Constant Coefficient Mode
  Distributed Arithmetic Mode
Constant Coefficient Mode
  CoreFIR Generator Parameters
Distributed Arithmetic Mode
  FIR Filter Using Distributed Arithmetic Algorithm
  DA Functional Block Description
  CoreFIR Generator Parameters in DA Mode
  DA FIR Filter with Large Number of Taps
I/O Signal Description
Clock and Reset
Input and Output Timing
Appendix I
Appendix II
Ordering Information
List of Changes
Datasheet Categories

General Description
CoreFIR is an Actel FPGA-optimized RTL generator that produces a finite impulse response filter. It has two architectures: constant coefficient (CC) and distributed arithmetic (DA). The constant coefficient architecture is a multiplier-based architecture for higher throughput. The DA architecture uses add and shift operations to perform the calculations in a small size.

Constant Coefficient Mode
CoreFIR implements the constant coefficient algorithm for faster designs as well as for newer Flash family products such as ProASIC®3 and Fusion. The CC implementation derives directly from the definition of convolution in the time domain. Figure 1 shows such a general filter structure. It comprises a filter arithmetic unit and a delay circuit. The filter arithmetic unit is an implementation of the data dependence graph, containing multipliers and adders. The delay circuit furnishes the necessary data input for the filter arithmetic unit. It supplies several samples in parallel: the current sample and N – 1 delayed samples. Figure 2 shows a detailed design implementation diagram.

[Figure 1: input x(i) enters the Delay Circuit, which supplies x(i) through x(i – N + 1) to the Filter Arithmetic Unit; the output is y(i)]
Figure 1 • General FIR Filter Structure

[Figure 2: labels Input, C0, C1, 0, Register, Output]
Figure 2 • Design Diagram

Distributed Arithmetic Mode
In DA mode, CoreFIR implements the DA algorithm to eliminate multiplication for faster and smaller designs. CoreFIR uses embedded RAM blocks in Actel FPGAs as DA lookup tables (when available) to further reduce the size of the design. The generator also reads the user system clock rate and data sample rate to explore using a folding or serial architecture to further reduce size, especially when the system clock rate is much greater than the data sampling rate. The generator automatically switches to the use of multiple DA lookup tables when the requested FIR filter has a large number of taps. Figure 3 shows the functional block diagram of a generated FIR filter design. More complex designs may contain multiple lookup tables, accumulators, or control sections.

[Figure 3: blocks Input Buffers, DA Lookup Tables (RAMs or ROM), DA LUT Generator, Control, and Shifter/Accumulator; inputs datai and Coefficients; output datao]
Figure 3 • Functional Block Diagram

Device Utilization and Performance

Constant Coefficient Mode
CoreFIR generates FIR filters with many configurations. Table 1 provides the typical utilization and performance data for the generated FIR filters implemented with the configurations listed in Table 2 on page 5. Refer to Table 2 on page 5 for the Configuration column in Table 1.
Table 1 • CoreFIR Device Utilization and Performance for Constant Coefficient Mode

Family | Configuration | Cells or Tiles: Combinatorial | Cells or Tiles: Sequential | Cells or Tiles: Total | Utilization: Device | Utilization: Total | Throughput (Msps)
Fusion | 1 | 1,502 | 224 | 1,726 | AFS250 | 28% | 93
Fusion | 2 | 5,348 | 577 | 5,925 | AFS250 | 96% | 83
Fusion | 3 | 8,676 | 1,025 | 9,701 | AFS600 | 70% | 62
Fusion | 4 | 8,676 | 1,025 | 9,701 | AFS600 | 70% | 62
ProASIC3/E | 1 | 1,502 | 224 | 1,726 | A3P250 | 28% | 93
ProASIC3/E | 2 | 5,348 | 577 | 5,925 | A3P250 | 96% | 83
ProASIC3/E | 3 | 8,676 | 1,025 | 9,701 | A3P600 | 70% | 62
ProASIC3/E | 4 | 8,676 | 1,025 | 9,701 | A3P600 | 70% | 62
ProASICPLUS | 1 | 1,451 | 217 | 1,668 | APA075 | 54% | 55
ProASICPLUS | 2 | 5,809 | 577 | 6,386 | APA300 | 78% | 55
ProASICPLUS | 3 | 9,690 | 1,025 | 10,715 | APA1000 | 19% | 41
ProASICPLUS | 4 | 9,690 | 1,025 | 10,715 | APA1000 | 19% | 41
ProASICPLUS | 5 | 28,478 | 2,569 | 31,047 | APA750 | 94% | 46
Axcelerator | 1 | 269 | 200 | 469 | AX125 | 23% | 202
Axcelerator | 2 | 989 | 559 | 1,548 | AX250 | 36% | 165
Axcelerator | 3 | 1,795 | 1,007 | 2,802 | AX250 | 66% | 137
Axcelerator | 4 | 1,795 | 1,007 | 2,802 | AX500 | 35% | 137
Axcelerator | 5 | 4,849 | 2,351 | 7,200 | AX1000 | 40% | 80
Axcelerator | 6 | 10,637 | 4,973 | 15,610 | AX2000 | 39% | 51
RTAX-S | 1 | 269 | 200 | 469 | RTAX1000S | 3% | 151
RTAX-S | 2 | 989 | 559 | 1,548 | RTAX1000S | 8% | 110
RTAX-S | 3 | 1,795 | 1,007 | 2,802 | RTAX1000S | 15% | 105
RTAX-S | 4 | 1,795 | 1,007 | 2,802 | RTAX1000S | 15% | 105
RTAX-S | 5 | 4,849 | 2,351 | 7,200 | RTAX1000S | 40% | 68
RTAX-S | 6 | 10,637 | 4,973 | 15,610 | RTAX2000S | 48% | 44
SX-A | 1 | 484 | 173 | 657 | A54SX16A | 45% | 101
SX-A | 2 | 2,162 | 532 | 2,694 | A54SX72A | 44% | 59
SX-A | 3 | 493 | 983 | 1,476 | A54SX72A | 86% | 67

Note: The data above are obtained by typical synthesis and place-and-route methods. Other core parameters can result in different utilization and performance.
Table 1 • CoreFIR Device Utilization and Performance for Constant Coefficient Mode (Continued)

Family | Configuration | Cells or Tiles: Combinatorial | Cells or Tiles: Sequential | Cells or Tiles: Total | Utilization: Device | Utilization: Total | Throughput (Msps)
RTSX-S | 1 | 502 | 172 | 674 | RT54SX32S | 23% | 26
RTSX-S | 2 | 2,159 | 532 | 2,691 | RT54SX72S | 45% | 36
RTSX-S | 3 | 4,193 | 983 | 5,176 | RT54SX72S | 86% | 39

Note: The data above are obtained by typical synthesis and place-and-route methods. Other core parameters can result in different utilization and performance.

Table 2 • Test Configurations

Configuration | nbits_input | nbits_coef | ntaps | data_signed
1 | 8 | 16 | 8 | 0
2 | 16 | 16 | 16 | 1
3 | 12 | 15 | 32 | 0
4 | 12 | 15 | 32 | 1
5 | 16 | 15 | 64 | 0
6 | 16 | 16 | 128 | 1

Distributed Arithmetic Mode
CoreFIR generates FIR filters with many configurations. Table 3 provides the typical utilization and performance data for the generated FIR filters implemented with the configurations listed in Table 4 on page 6. Refer to Table 4 on page 6 for the Configuration column in Table 3.
Table 3 • CoreFIR Device Utilization and Performance for Distributed Arithmetic Mode

Family | Configuration | Cells or Tiles: Combinatorial | Cells or Tiles: Sequential | Cells or Tiles: Total | RAM Blocks | Utilization: Device | Utilization: Total | Throughput (Msps)
ProASICPLUS | 1 | 558 | 116 | 674 | 0 | APA075 | 22% | 3.63
ProASICPLUS | 2 | 2,054 | 427 | 2,481 | 0 | APA150 | 40% | 1.19
ProASICPLUS | 3 | 3,540 | 661 | 4,201 | 0 | APA1000 | 8% | 1.58
ProASICPLUS | 4 | 6,391 | 872 | 7,271 | 8 | APA1000 | 13% | 1.42
ProASICPLUS | 5 | 8,775 | 1,606 | 10,381 | 0 | APA750 | 32% | 0.81
Axcelerator | 1 | 229 | 148 | 377 | 0 | AX125 | 19% | 21.75
Axcelerator | 2 | 693 | 478 | 1,171 | 0 | AX250 | 28% | 6.88
Axcelerator | 3 | 1,231 | 719 | 1,950 | 0 | AX250 | 46% | 9.25
Axcelerator | 4 | 2,249 | 852 | 3,101 | 4 | AX500 | 38% | 6.17
Axcelerator | 5 | 3,129 | 1,704 | 4,833 | 0 | AX1000 | 27% | 4.56
Axcelerator | 6 | 9,132 | 3,355 | 12,487 | 32 | AX2000 | 39% | 2.88
RTAX-S | 1 | 229 | 148 | 377 | 0 | RTAX1000S | 2% | 14.25
RTAX-S | 2 | 693 | 478 | 1,171 | 0 | RTAX1000S | 6% | 4.75
RTAX-S | 3 | 1,231 | 719 | 1,950 | 0 | RTAX1000S | 11% | 5.50
RTAX-S | 4 | 2,249 | 852 | 3,101 | 4 | RTAX1000S | 17% | 3.42
RTAX-S | 5 | 3,129 | 1,704 | 4,833 | 0 | RTAX1000S | 27% | 2.81
RTAX-S | 6 | 9,132 | 3,355 | 12,487 | 32 | RTAX2000S | 39% | 1.81
SX-A | 1 | 386 | 159 | 545 | 0 | A54SX16A | 38% | 14.00
SX-A | 2 | 1,115 | 480 | 1,595 | 0 | A54SX72A | 26% | 4.00
SX-A | 3 | 1,831 | 727 | 2,558 | 0 | A54SX72A | 42% | 5.25
RTSX-S | 1 | 381 | 159 | 540 | 0 | RT54SX32S | 19% | 6.50
RTSX-S | 2 | 1,115 | 480 | 1,595 | 0 | RT54SX72S | 26% | 2.25
RTSX-S | 3 | 1,831 | 727 | 2,558 | 0 | RT54SX72S | 42% | 3.00

Notes:
1. The data above are obtained by typical synthesis and place-and-route methods. Other core parameters can result in different utilization and performance.
2. Cell (tile) count may vary depending on the actual coefficient values.

Table 4 • Test Configurations

Configuration | nbits_input | nbits_coef | ntaps | fpga_family | coef_fixed
1 | 8 | 16 | 8 | All | 1
2 | 16 | 16 | 16 | All | 1
3 | 12 | 15 | 32 | All | 1
4 | 12 | 15 | 32 | AX, RTAX-S, APA | 0
5 | 16 | 15 | 64 | All | 1
6 | 16 | 16 | 128 | AX, RTAX-S, APA | 0

Constant Coefficient Mode

CoreFIR Generator Parameters
CoreFIR generates the RTL code for FIR filters with a variety of parameters.
These parameters include generic FIR parameters such as number of taps, number of input bits, number of coefficient bits, and data type. CoreFIR supports the variations specified in Table 5.

Table 5 • CoreFIR Generator Configuration Parameters for Constant Coefficient Mode

Parameter Name | Description | Recommended Selection: AX | Recommended Selection: APA | Recommended Selection: SX-A
module_name | Name of generated module | – | – | –
nbits_input | Number of input data bits | 2–24 | 2–24 | 2–24
nbits_coef | Number of coefficient data bits | 2–24 | 2–24 | 2–24
ntaps | Number of taps | 2–128 | 2–64 | 2–32
tap | Array of coefficients | – | – | –
data_signed | Data type; 0 = unsigned, 1 = signed | 0, 1 | 0, 1 | 0, 1

Refer to "Appendix II" on page 17 for a sample usage of the parameters shown in Table 5 in a configuration file for the CoreFIR Generator. Detailed discussion about these parameters follows.

Number of Taps (ntaps)
The CoreFIR generator supports the number of taps specified by the device family in Table 6 on page 11. The variable ntaps in the configuration file specifies the setting of this parameter. Refer to "Appendix I" on page 17 for details.

Module Name (module_name)
The CoreFIR Generator supports a configuration parameter, module_name, that specifies the name of the generated module. The generated testbench has the name <module_name>_tb. Refer to Table 6 on page 11 and "Appendix I" on page 17 for details.

Signed/Unsigned Inputs and Coefficients (data_signed)
The CoreFIR Generator supports signed or unsigned operations. The generator supports two cases: both input and coefficient are unsigned, or both input and coefficient are signed. It supports an unsigned implementation when the configuration parameter data_signed is equal to 0, and a signed implementation when data_signed is equal to 1. Refer to Table 6 on page 11 and "Appendix I" on page 17 for details.

Number of Input (nbits_in) and Coefficient (nbits_coef) Bits
The CoreFIR Generator supports the number of input and coefficient bits for the device families specified in Table 6 on page 11.
These parameters are set with the variables nbits_in and nbits_coef in the configuration file. Refer to "Appendix I" on page 17 for details.

Distributed Arithmetic Mode

FIR Filter Using Distributed Arithmetic Algorithm

Distributed Arithmetic Algorithm Overview
FIR filters are used in applications that require exact linear phase response. Typical applications for an FIR filter include image processing, digital audio, digital communication, and biomedical signal processing. An FIR filter is defined in EQ 1:

\[ y[n] = \sum_{n=0}^{\mathit{ntaps}-1} c[n] \times x[n] \]
EQ 1

where c[n] = h[ntaps – n – 1] and h is the impulse response. The term ntaps is the number of taps. In summary, the direct computation for one point of FIR requires: ntaps multiplications + (ntaps – 1) additions.

Distributed Arithmetic (DA) is a well-known method for saving resources in multiply-and-accumulate (MAC) structures that implement digital signal processing (DSP) functions. DA trades memory for combinatorial elements, resulting in an efficient implementation in FPGAs. Another feature of DA is its easy serialization of the input, which further reduces the cost of operation when the FIR data rate is low compared to the system clock, a common scenario in FIR applications. The input of an FIR can be expressed as the composition of its bits, as shown in EQ 2:

\[ x[n] = \sum_{b=0}^{\mathit{nbits\_in}-1} x[n][b] \times 2^{b} \]
EQ 2

where x[n][b] is the bth bit of x[n] and nbits_in is the number of bits of input. The resulting output of the FIR filter is shown in EQ 3:

\[ y[n] = \sum_{n=0}^{\mathit{ntaps}-1} c[n]\, x[n] = \sum_{n=0}^{\mathit{ntaps}-1} c[n] \sum_{b=0}^{\mathit{nbits\_in}-1} x[n][b]\, 2^{b} \]
EQ 3

Changing the summation order gives the results shown in EQ 4:

\[ y[n] = \sum_{b=0}^{\mathit{nbits\_in}-1} 2^{b} \sum_{n=0}^{\mathit{ntaps}-1} c[n]\, x[n][b] = \sum_{b=0}^{\mathit{nbits\_in}-1} 2^{b}\, T(X[b]) \]
EQ 4

where

\[ T(X[b]) = \sum_{n=0}^{\mathit{ntaps}-1} c[n]\, x[n][b] \]

and X[b] is a collection of the bth bits of ntaps different taps. Note that x[n][b] can only be 0 or 1. There are 2^ntaps different values of T.
If T is pre-calculated and stored inside a RAM or ROM, the FIR computation becomes nbits_in table lookup operations using X[b] and nbits_in – 1 additions. Multiplication operations are eliminated. In summary, the FIR computation using DA for one point of FIR requires: nbits_in table lookups + (nbits_in – 1) additions. The cost to eliminate multiplication is a memory block to store 2^ntaps pre-computed values.

The serialization of table lookup and addition is possible because table T is the same for each b. If one table lookup and one addition can be finished in one cycle, the total computation will finish in nbits_in cycles. The serialization of the FIR introduces further opportunity to reduce the size of the design, which is the key to an efficient FPGA design.

Example Design of an FIR Filter Using DA
An example of an FIR with four taps (ntaps = 4) and four bits for inputs (nbits_in = 4) is shown in Figure 4. The expression x[n][b] represents the bth bit of input x[n]. In the first cycle of the example, all 0th bits of inputs x[n] to x[n – 3] are fed into the lookup table as an input address; in the second cycle, all 1st bits of inputs x[n] to x[n – 3] are fed into the lookup table; in the third cycle, all 2nd bits of inputs x[n] to x[n – 3] are fed into the lookup table; and in the fourth cycle, all 3rd bits of inputs x[n] to x[n – 3] are fed into the lookup table. The shifter shifts the outputs of the lookup table for the inputs of the adder, which accumulates for the final result.

[Figure 4: input bit registers x[n][3..0], x[n – 1][3..0], x[n – 2][3..0], and x[n – 3][3..0] address the Lookup Table, followed by the Shifter, Adder, and Reg, with Flow Control]
Figure 4 • Example Implementation of a Bit-Serialized FIR Filter Using DA

The serialized DA implementation in Figure 4 uses a table lookup with 16 words and takes four clock cycles to finish one FIR point.
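The four-tap, four-bit example above can be modeled in a few lines of software. The following is a minimal Python sketch, not the generated VHDL: the function names (build_da_lut, da_fir_point, direct_fir_point) and the sample values are hypothetical, and it assumes unsigned inputs (data_signed = 0). It precomputes the table T of EQ 4, then performs one lookup and one shift-and-add per input bit, and checks the result against the direct multiply-and-add form of EQ 1.

```python
def build_da_lut(coefs):
    """Precompute T for every address: bit k of the address selects coefs[k]."""
    ntaps = len(coefs)
    return [
        sum(c for k, c in enumerate(coefs) if (addr >> k) & 1)
        for addr in range(1 << ntaps)          # 2^ntaps words
    ]


def da_fir_point(lut, window, nbits_in):
    """Compute one FIR output point bit-serially (unsigned samples only).

    window[k] is the sample paired with coefs[k].
    """
    acc = 0
    for b in range(nbits_in):                  # one lookup per input bit
        addr = 0
        for k, x in enumerate(window):
            addr |= ((x >> b) & 1) << k        # gather bit b of every tap
        acc += lut[addr] << b                  # shift by b, then accumulate
    return acc


def direct_fir_point(coefs, window):
    """Reference: ntaps multiplications and ntaps - 1 additions."""
    return sum(c * x for c, x in zip(coefs, window))


# Hypothetical example values: 4 taps, 4-bit unsigned samples.
coefs = [3, 5, 7, 2]
window = [9, 4, 15, 1]
lut = build_da_lut(coefs)
da_result = da_fir_point(lut, window, nbits_in=4)
direct_result = direct_fir_point(coefs, window)
print(len(lut), da_result, direct_result)
```

Each pass of the outer loop in da_fir_point corresponds to one clock cycle of the serialized hardware, so the model needs nbits_in iterations per output point. Signed data would need the most significant bit weighted negatively, which this sketch does not cover.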
Storage and Large Number of Taps
As seen in the previous section, the size of the lookup table is 2^ntaps, which increases exponentially with ntaps. A design with a large number of taps needs to have several lookup tables. Let ntaps = p × q. If we split the taps into p groups, each group has q taps. Then the FIR becomes as shown in EQ 5:

\[ y[n] = \sum_{b=0}^{\mathit{nbits\_in}-1} 2^{b} \sum_{n=0}^{\mathit{ntaps}-1} c[n]\, x[n][b] = \sum_{b=0}^{\mathit{nbits\_in}-1} 2^{b} \sum_{n=0}^{pq-1} c[n]\, x[n][b] \]
EQ 5

By splitting the sum over ntaps into a two-level summation, we have the result shown in EQ 6:

\[ y[n] = \sum_{b=0}^{\mathit{nbits\_in}-1} 2^{b} \sum_{i=0}^{p-1} \sum_{j=0}^{q-1} c[iq+j]\, x[iq+j][b] \]
EQ 6

Each inner sum over j needs a lookup table of only 2^q words, so the total storage drops from 2^(pq) words to p × 2^q words. Refer to the "DA FIR Filter with Large Number of Taps" section on page 13 for further information.

DA Functional Block Description
The functional blocks shown in Figure 3 on page 3 illustrate the architecture of the generated FIR filter using the DA algorithm.

Input Buffers
The Input Buffers block stores the input data, which contains ntaps data points, where ntaps is the number of taps of the FIR filter. The Input Buffers block also circulates the data bits to address the DA lookup tables (LUTs) required by the DA algorithm. An optional function of the Input Buffers block is to share its storage with the DA LUT generator. The coefficients used in computing the LUT content can be stored in the input buffers when a design uses the embedded RAM blocks for the LUTs.

DA Lookup Tables (LUTs)
The DA LUTs store the LUT contents for the distributed arithmetic algorithm. The generator implements the DA LUTs in two ways: (1) synthesized ROM using FPGA cells and (2) embedded RAM blocks supported by the on-chip DA LUT Generator. The first method is for an FPGA without embedded RAM blocks, intended primarily for a small FIR filter. The latter is for an FPGA with embedded RAM blocks. FIR filters with a large number of taps may require multiple LUTs.
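The use of multiple LUTs follows the two-level summation of EQ 6. The Python sketch below is a software illustration only, under the same assumptions as any such model: the names build_lut and split_da_fir_point and the sample values are hypothetical, inputs are unsigned, and ntaps is taken to be an exact multiple of the group size q. Each group of q taps gets its own table and accumulator, and a final addition combines the group results.

```python
def build_lut(coefs):
    """One DA table: bit j of the address selects coefs[j]."""
    return [
        sum(c for j, c in enumerate(coefs) if (addr >> j) & 1)
        for addr in range(1 << len(coefs))
    ]


def split_da_fir_point(coefs, window, nbits_in, q):
    """One FIR output point using p = ntaps / q separate LUTs (EQ 6).

    Returns the output value and the total number of LUT words used.
    """
    if len(coefs) % q != 0:
        raise ValueError("this sketch assumes ntaps is a multiple of q")
    luts = [build_lut(coefs[i:i + q]) for i in range(0, len(coefs), q)]
    accs = [0] * len(luts)                     # one accumulator per group
    for b in range(nbits_in):                  # bit-serial, as in single-LUT DA
        for i, lut in enumerate(luts):
            addr = 0
            for j in range(q):
                addr |= ((window[i * q + j] >> b) & 1) << j
            accs[i] += lut[addr] << b
    total_words = sum(len(lut) for lut in luts)
    return sum(accs), total_words              # final adder combines groups


# Hypothetical example: 8 taps split into two groups of 4, 4-bit samples.
coefs = [1, 2, 3, 4, 5, 6, 7, 8]
window = [15, 0, 7, 9, 3, 12, 1, 5]
result, words = split_da_fir_point(coefs, window, nbits_in=4, q=4)
expected = sum(c * x for c, x in zip(coefs, window))
print(result, expected, words)
```

With eight taps, a single table would need 2^8 = 256 words, while two four-tap tables need 2 × 2^4 = 32 words in total, at the cost of one extra accumulator and a final adder.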
Shifter and Accumulator
The Shifter and Accumulator block performs additions with LUT outputs and the alignments of LUT outputs required by the DA algorithm. Multiple accumulators and shifters may be needed to implement an FIR filter with a large number of taps.

DA LUT Generator
The DA LUT Generator computes the LUT contents required by the distributed arithmetic algorithm. It reads the coefficients from the Input Buffers block and writes the LUT words into the embedded RAM blocks. This block is available only for designs that use embedded RAM blocks as LUTs. The DA LUT Generator produces LUT contents for multiple LUTs when implementing an FIR filter with variable coefficients. Refer to "DA LUT Generation" on page 14 and "I/O Timing Diagram of LUT Initialization" on page 16 for detailed information on initialization of the DA LUT.

Control
The state machine inside the Control block controls the operations of all other blocks. It controls the input buffers to ensure they operate based on the specified system clock rate and sample rate, monitors input enable and coefficient input enable, and circulates input data bits to address the DA LUTs. It also controls the shifters and accumulators to ensure they operate based on the requested FIR configuration and DA algorithm. The Control block coordinates the initialization of the LUTs by the DA LUT generator when using embedded RAMs. The Control logic is designed to support folding or serialization of computation when the system clock rate is substantially higher than the data sampling rate.

CoreFIR Generator Parameters in DA Mode
CoreFIR generates the RTL code for FIR filters with a variety of parameters. These parameters include generic FIR parameters such as number of taps, number of input bits, number of coefficient bits, and data type, as well as implementation parameters such as FPGA family, use of embedded RAMs, system clock rate, and data sampling rate.
CoreFIR supports the variations specified in Table 6.

Table 6 • CoreFIR Generator Configuration Parameters for Distributed Arithmetic Mode

Parameter Name | Description | Recommended Selection: Axcelerator | Recommended Selection: ProASICPLUS | Recommended Selection: SX-A
module_name | Name of generated module | – | – | –
nbits_input | Number of input data bits | 2–24 | 2–24 | 2–24
nbits_coef | Number of coefficient data bits | 2–24 | 2–24 | 2–24
ntaps | Number of taps | 2–128 | 2–64 | 2–32
tap | Array of coefficients | – | – | –
data_signed | Data type: 0 = unsigned, 1 = signed | 0, 1 | 0, 1 | 0, 1
fpga_family | FPGA family | ax | apa | sxa
coef_fixed | 0 = filter with configurable coefficients; 1 = filter with fixed coefficients | – | – | –
sys_clk_frq | Input clock frequency | – | – | –
sample_ratio | Sampling rate = sys_clk_frq/sample_ratio | ≥ nbits_in | ≥ nbits_in | ≥ nbits_in
module_lang | Reserved; VHDL only | VHDL | VHDL | VHDL

Refer to "Appendix I" on page 17 for a sample usage of the parameters shown in Table 6 in a configuration file for the CoreFIR Generator. Detailed discussion about these parameters follows.

Asymmetric FIR and Symmetric FIR
The CoreFIR generator supports an asymmetric FIR filter only. Symmetric FIR filters will be supported in future releases.

Number of Taps (ntaps)
The CoreFIR generator supports the number of taps specified by the device family in Table 6. The variable ntaps in the configuration file specifies the setting of this parameter. Refer to "Appendix I" on page 17 for details.

Embedded RAM as LUTs (coef_fixed)
The CoreFIR Generator utilizes a switch that determines whether to implement DA LUTs with embedded RAM blocks or with synthesized ROM using FPGA cells. The LUTs are implemented with synthesized ROM using FPGA cells when coef_fixed is equal to 1. The LUTs are implemented with embedded RAM blocks available for Axcelerator, ProASICPLUS, and ProASIC3 devices when coef_fixed is equal to 0. This setting may be set to 1 for a filter design with fixed coefficients for an FPGA device with embedded RAM, such as Axcelerator, RTAX-S, ProASICPLUS, or ProASIC3, since the overhead of the DA LUT Generator outweighs the benefits of using an embedded RAM block as a LUT. The coef_fixed configuration parameter is valid only when the configuration parameter fpga_family is set to ax, apa, or pa3. Refer to Table 6 on page 11 and "Appendix I" on page 17 for details.

Number of Input (nbits_in) and Coefficient (nbits_coef) Bits
The CoreFIR Generator supports the number of input and coefficient bits for the device families specified in Table 6. These parameters are set with the variables nbits_in and nbits_coef in the configuration file. Refer to "Appendix I" on page 17 for details.

Number of Bits of Output (nbits_out)
The CoreFIR Generator supports only full-precision computation. Thus, the number of output bits is determined by the number of input and coefficient bits for the device family, as specified in Table 6. The number of output bits is specified by EQ 7:

\[ \mathit{nbits\_out} = \mathit{nbits\_in} + \mathit{nbits\_coef} + \mathrm{ceil}(\log_2(\mathit{ntaps})) - 1 \]
EQ 7

where ceil is the ceiling function for floating point data.

Signed/Unsigned Inputs and Coefficients (data_signed)
The CoreFIR Generator supports signed or unsigned operations. The generator supports two cases: both input and coefficient are unsigned, or both input and coefficient are signed. It supports an unsigned implementation when the configuration parameter data_signed is equal to 0, and a signed implementation when data_signed is equal to 1. Refer to Table 6 on page 11 and "Appendix I" on page 17 for details.

System Clock Frequency (sys_clk_frq)
The CoreFIR Generator reads in the system clock frequency via configuration parameter sys_clk_frq. The generated testbench assigns this frequency to its clock generation. The generated design runs at this frequency inside the testbench. The configuration parameter should be specified in MHz. Refer to Table 6 on page 11 and "Appendix I" on page 17 for details.

Sample Ratio (sample_ratio)
The CoreFIR Generator supports a configuration parameter, sample_ratio, that specifies the sampling rate against the system clock frequency. The data sampling rate is equal to sys_clk_frq/sample_ratio. This parameter provides guidance in implementing a folding architecture to reduce the size of the design. The configuration parameter sample_ratio can only be a positive integer greater than 1. Refer to Table 6 on page 11 and "Appendix I" on page 17 for details.

Module Name (module_name)
The CoreFIR Generator supports a configuration parameter, module_name, that specifies the name of the generated module. The generated testbench has the name <module_name>_tb. Refer to Table 6 on page 11 and "Appendix I" on page 17 for details.

FPGA Family (fpga_family)
The CoreFIR Generator supports a configuration parameter, fpga_family, that specifies the targeted Actel FPGA device family. The options are ax, apa, pa3, and sxa. The option for RTAX-S is ax. The option for RTSX-S is sxa. Refer to Table 6 on page 11 and "Appendix I" on page 17 for details.

Architecture Variations
The DA algorithm for FIR provides an excellent solution, but also introduces many variations on the design architecture due to FPGA resource limitations.

DA FIR Filter with Large Number of Taps
This section provides an example of CoreFIR in DA mode with a large number of taps.

As shown in the "Distributed Arithmetic Algorithm Overview" section on page 8, the number of words of the DA LUT is 2^ntaps, which increases exponentially with ntaps. A LUT splitting method, as defined in the "Storage and Large Number of Taps" section on page 9, effectively reduces memory usage. The CoreFIR Generator utilizes this method to reduce the memory usage. It usually splits the coefficients into eight or nine taps for each LUT when embedded RAM blocks are available.

An example of the split lookup table implementation of an FIR filter with eight taps (ntaps = 8) and four input bits (nbits_in = 4) is shown in Figure 5. In the example, eight taps have been split into two groups. Each group has four taps and addresses a separate lookup table. This differs from the case in Figure 4 on page 9, which only has one LUT.

[Figure 5: input bit registers x[n][3..0] through x[n – 7][3..0] in two groups of four taps; each group addresses its own Lookup Table followed by a Shifter, Adder, and Reg, with Flow Control; a final Adder combines the two results into Output]
Figure 5 • Example of Split Lookup Table Implementation

Folding
The system clock rate of many FIR filter systems is a multiple of the data rate (or data sampling rate). For typical FPGA implementation, the size of the design is key for efficient implementation. Thus, exploitation of the ratio between the system clock rate and data rate is an effective approach to reduce the size of the design. In other words, folding or serialization of the computation can reduce the size of the design. The DA algorithm for FIR introduces bit-serialization of the operations. This property of DA can be very efficient for exploiting the ratio between system clock rate and data rate. If the number of input bits is nbits_in, it takes nbits_in table lookups and additions to finish one output point of the FIR. If the system clock rate is nbits_in times faster than the data rate, the serialization of table lookup and additions is done with the optimized timing. The parameter sample_ratio defines the ratio between the system clock rate (sys_clk_frq) and the data sampling rate (data_rate), as shown in EQ 8:

\[ \mathit{sample\_ratio} = \mathit{sys\_clk\_frq} / \mathit{data\_rate} \]
EQ 8

CoreFIR supports folding when sample_ratio is greater than or equal to nbits_in. The serialized operations of table lookup and addition are done in nbits_in clock cycles of the system clock, and the design is idle during the remaining sample_ratio – nbits_in cycles. The generator only requires that the sample_ratio be an integer; that is, the system clock rate is an exact multiple of the data rate. Future releases may support a sample_ratio less than nbits_in.

DA LUTs Using FPGA Cells
Some Actel FPGA families, such as SX-A and RTSX-S, do not have an embedded RAM implementation. In this case, the CoreFIR Generator requires that the lookup table be hard-coded as ROM using FPGA cells. This configuration does not need the DA LUT Generator shown in Figure 3 on page 3. The generator selects this configuration when the configuration parameter coef_fixed is set to 1 or the configuration parameter fpga_family is not set to ax, apa, or pa3.

DA LUTs Using Embedded RAM Blocks
Many Actel FPGA families have embedded RAM blocks. The CoreFIR generator takes advantage of these embedded RAM blocks, and the DA LUTs are implemented using embedded RAM. This configuration requires additional overhead in that the embedded RAM blocks must be initialized by a DA LUT Generator as shown in Figure 3 on page 3. The generator selects this configuration when the configuration parameter coef_fixed is set to 0 and the configuration parameter fpga_family is set to ax, apa, or pa3.

DA LUT Generation
The DA LUT Generator computes the LUT contents of the distributed arithmetic algorithm. It reads the coefficients from the Input Buffers block and writes the LUT words into the embedded RAM blocks. This block is only available when using embedded RAM blocks as LUTs (when coef_fixed is set to 0). After the reset is complete, the DA LUT Generator will wait for the Input Buffers block to signal that the coefficients are loaded into the input buffers. Then the DA LUT generator will compute the LUT words and write them into the embedded RAM blocks. The DA LUT Generator produces LUT contents for multiple LUTs when implementing an FIR filter with a large number of taps. The generator has only one computation engine and initializes multiple LUTs sequentially. After the initialization of the RAM blocks, the output ready will go high to let the system know that the FIR filter is ready to accept data.

Input Buffering Scheme
The Input Buffers block always performs the functions defined in the "Input Data" section, but only performs the functions defined in the "Input Coefficients" section when the embedded RAM blocks are used as the DA LUTs (when coef_fixed is set to 0).

Input Data
The input dataflow is designed to use the scheme shown in Figure 6 to reduce the size of registers. The horizontal movement of the input ensures the input bits feed into the lookup table, and that this happens on every cycle. Vertical movement of input data only occurs as the most significant bit (MSB) is fed into the lookup table, when it switches to the next FIR data point.

[Figure 6: 4 × 4 array of input bit registers, rows x[n] through x[n – 3], columns bit 3 through bit 0]
Figure 6 • Example of Input Buffering Scheme

Input Coefficients
The CoreFIR generator shares the input buffers for coefficient input when embedded RAM blocks are used as the DA LUTs (the configuration parameter coef_fixed is set to 0). In this configuration, the width of the input datai is the maximum of nbits_in and nbits_coef. The input datai reads in coefficients when input coefi_en is high. After enough coefficients are fed into the buffer, coefi_en is ignored and the coefficients stay inside the input buffers until the DA LUT Generator finishes the initialization of the embedded RAM blocks.
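The selection rules in the preceding sections (ROM versus embedded RAM LUTs, the folding condition, and the output width of EQ 7) can be collected into a small checker. This is a hedged Python sketch for illustration only; the function names (lut_implementation, folding_cycles, nbits_out) are hypothetical and are not part of the CoreFIR Generator, which is configured solely through its configuration file.

```python
import math

# Families with embedded RAM, using the fpga_family option names from the text.
RAM_FAMILIES = {"ax", "apa", "pa3"}


def lut_implementation(fpga_family, coef_fixed):
    """Return 'ram' when coef_fixed = 0 on a RAM-capable family, else 'rom'."""
    if coef_fixed == 0 and fpga_family in RAM_FAMILIES:
        return "ram"
    return "rom"


def folding_cycles(sample_ratio, nbits_in):
    """Return (busy, idle) system clock cycles per input sample.

    Folding is described as supported only when sample_ratio >= nbits_in.
    """
    if sample_ratio < nbits_in:
        raise ValueError("sample_ratio must be >= nbits_in")
    return nbits_in, sample_ratio - nbits_in


def nbits_out(nbits_in, nbits_coef, ntaps):
    """Full-precision output width according to EQ 7."""
    return nbits_in + nbits_coef + math.ceil(math.log2(ntaps)) - 1


# Values taken from the DA sample configuration in Appendix I.
impl = lut_implementation("ax", coef_fixed=1)
busy, idle = folding_cycles(sample_ratio=16, nbits_in=8)
width = nbits_out(nbits_in=8, nbits_coef=5, ntaps=13)
print(impl, busy, idle, width)
```

For the Appendix I values, the sketch reports a ROM-based LUT, eight busy and eight idle clock cycles per sample, and a 16-bit output.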
User Interface

The generator executable reads one command line parameter, which is the name of the configuration file. It generates RTL code for the module and testbench based on the parameters in the configuration file. Refer to Table 6 on page 11 and "Appendix I" on page 17 for details of the configuration file.

I/O Signal Description

All following sections apply to both constant coefficient and distributed arithmetic architectures. The FIR filter generated by the CoreFIR Generator consists of the I/O signals defined in Table 7 (see Figure 7).

Table 7 • I/O Signal Description

    I/O Signal    Direction  Width          Polarity     Description
    clk           Input      1              N/A          Master clock, positive edge
    rstn          Input      1              Active low   Master reset, asynchronous
    datai_en      Input      1              Active high  Input data enable
    datai (1)     Input      nbits_in       N/A          Input data or coefficients (1)
    datao_valid   Output     1              Active high  Output data valid
    datao         Output     nbits_out (3)  N/A          Output data
    coefi_en (2)  Input      1              Active high  Coefficient input enable
    ready (2)     Output     1              Active high  Ready to input datai

Notes:
1. Input datai is also the input for coefficients for designs using embedded RAMs as DA LUTs. In this case the width can be the maximum of nbits_in and nbits_coef.
2. Ports coefi_en and ready are only available when coef_fixed = 0.
3. Refer to the "Number of Bits of Output (nbits_out)" section on page 11 for details.
4. Refer to Table 6 on page 11 for nbits_in and nbits_coef.

[Block diagram: FIR module with inputs clk, rstn, datai_en, datai, and coefi_en*, and outputs datao_valid, datao, and ready*]

Note: *coefi_en and ready are available when coef_fixed = 0.

Figure 7 • I/O Signals

Clock and Reset

Clock

CoreFIR generates an FIR filter design that uses only positive-edge-triggered registers. The entire design is fully synchronized using the positive edge of the input clock clk, including the embedded RAM blocks (when available).

Reset

CoreFIR generates a design that uses only one active low asynchronous reset.
The entire design is asynchronously reset by the input rstn.

Input and Output Timing

This section applies to both constant coefficient and distributed arithmetic architectures.

I/O Timing Diagram of Normal FIR Operation

The I/O timing under normal FIR operation is given in Figure 8. The labels s0 and s2 refer to the sampling points for input data, and s1 is the sampling point for output data. Because these values vary with the configuration, refer to the comments in the generated module for t0 and t1. These parameters are given in the number of clock cycles of the input clock, clk.

[Timing diagram: signals clk, datai_en, datai, datao_valid, and datao, with sampling points s0, s1, and s2 and intervals t0 and t1]

Figure 8 • I/O Timing Diagram of Normal FIR Operation

I/O Timing Diagram of LUT Initialization

The I/O timing for LUT initialization is given in Figure 9. In this figure, s0 and s1 are the starting and ending points for feeding coefficients, and s2 is the sampling point for the output ready. Because this value varies with the configuration, refer to the comments inside the generated module for t2. These parameters are given in the number of clock cycles of the input clock, clk.

[Timing diagram: signals clk, coefi_en, datai (coefficients 0, 1, 2, through ntaps – 1), and ready, with points s0, s1, and s2 and interval t2]

Figure 9 • I/O Timing Diagram for LUT Initialization

Appendix I

Sample Configuration File for DA Mode

The following is a sample configuration file for Distributed Arithmetic Mode.

    module_name firtest
    nbits_input 8
    nbits_coef 5
    ntaps 13
    tap 8 14 21 27 31 31 27 21 14 8 4 2 1
    data_signed 0
    fpga_family ax
    coef_fixed 1
    sys_clk_frq 25
    sample_ratio 16
    module_lang vhdl

Appendix II

Sample Configuration File for CC Mode

The following is a sample configuration file for Constant Coefficient Mode.
    module_name fir_const
    nbits_input 12
    nbits_coef 16
    ntaps 16
    taps 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
    data_signed 1

Ordering Information

Order CoreFIR through your local Actel sales representative. Use the following naming convention when ordering: CoreFIR-XX, where XX is listed in Table 8.

Table 8 • Ordering Codes

    XX  Description
    EV  Evaluation version
    AR  RTL for unlimited use on Actel devices
    UR  RTL for unlimited use and not restricted to Actel devices

List of Changes

The following table lists critical changes that were made in the current version of the document.

    Previous Version  Changes in Current Version (v4.0)                                      Page
    v3.0              The "Key Features" section was updated to include information for      1
                      the constant coefficient algorithm.
                      Table 3 was updated to include Throughput.                             6
                      Information was added for constant coefficient mode and the            3–13
                      document was reorganized to make clear the differences between
                      these modes.
                      Appendix II was added for Constant Coefficient Mode.                   17
    v2.1              The "Supported Families" section was updated to include Fusion.        1
                      Table 3 was updated to include Fusion data.                            6
    v2.0              The "Supported Families" section was updated.                          1
                      Table 3 was updated.                                                   6

Datasheet Categories

In order to provide the latest information to designers, some datasheets are published before data has been fully characterized. Datasheets are designated as "Product Brief," "Advanced," and "Production." The definitions of these categories are as follows:

Product Brief

The product brief is a summarized version of an advanced or production datasheet containing general product information. This brief summarizes specific device and family information for unreleased products.

Advanced

This datasheet version contains initial estimated information based on simulation, other products, devices, or speed grades. This information can be used as estimates, but not for production.
Unmarked (production)

This datasheet version contains information that is considered to be final.

Actel and the Actel logo are registered trademarks of Actel Corporation. All other trademarks are the property of their owners.

www.actel.com

Actel Corporation
2061 Stierlin Court
Mountain View, CA 94043-4655 USA
Phone 650.318.4200
Fax 650.318.4600

Actel Europe Ltd.
Dunlop House, Riverside Way
Camberley, Surrey GU15 3YL
United Kingdom
Phone +44 (0) 1276 401 450
Fax +44 (0) 1276 401 490

Actel Japan
www.jp.actel.com
EXOS Ebisu Bldg. 4F
1-24-14 Ebisu Shibuya-ku
Tokyo 150 Japan
Phone +81.03.3445.7671
Fax +81.03.3445.7668

Actel Hong Kong
www.actel.com.cn
Suite 2114, Two Pacific Place
88 Queensway, Admiralty
Hong Kong
Phone +852 2185 6460
Fax +852 2185 6488

51700056-3/5.06