CoreFIR Finite Impulse Response (FIR) Filter Generator Product Summary Core Deliverables • Intended Use • – Finite Impulse Response (FIR) Filter for Actel FPGAs • Key Features • – • Self-Checking – Executable Tests Generated Output against Algorithm – A Microsoft Windows® Binary Executable of the CoreFIR Generator – VHDL FIR Module – VHDL Test Harness Synthesis and Simulation Support Distributed Arithmetic (DA) Algorithm – Multiplier-Free Computation – Low Cost – Optimized for Actel FPGAs Folding Architecture to Minimize Design Size – • Executable File Outputs Run-Time Library (RTL) Code and Testbench Based on Input Parameters Serialized Computation when System Clock Rate is Faster than the Data Sample Rate On-Chip DA Lookup Table Generator for FPGA with Embedded RAMs • Embedded RAMs Initialized as DA Lookup Table • DA Lookup Table ROM Synthesis for FPGA without Embedded RAMs • Multiple DA lookup Tables to Split Large Number of Taps • Actel FPGA-Optimized RTL Code • Supports 2 to 128 Taps • 1- to 32-Bit Input Data and Coefficient Precision Synthesis: Synplicity®, Synopsys® (Design ® TM Compiler /FPGA ExpressTM), Compiler /FPGA ExemplarTM • Simulation: OVI-Compliant Verilog Simulators and Vital-Compliant VHDL Simulators Device Utilization and Performance ......................... 2 FIR Filter Using Distributed Arithmetic Algorithm ... 3 General Description ................................................... 5 Functional Block Description ..................................... 5 I/O Signal Description ................................................ 6 CoreFIR Generator Parameters .................................. 7 FIR Filter with Large Number of Taps ....................... 8 Clock and Reset ........................................................ 11 Input and Output Timing ........................................ 11 Appendix I: Sample Configuration File ................... 12 Ordering Information .............................................. 13 List of Changes ......................................................... 13 Datasheet Categories ............................................... 13 Lookup Tables Utilize Embedded RAMs • • Contents Efficient Structure Using Embedded RAMs – RTL Code of a Sample Filter and Compiled RTL Simulation Model Fully Supported in the Actel Libero® Integrated Design Environment (IDE) RTL Version Core Generator – • Evaluation Version Supported Families • Fusion • ProASIC3/E • ProASICPLUS ® • Axcelerator® • RTAX-S • SX-A • RTSX-S December 2005 © 2005 Actel Corporation v 3 .0 1 CoreFIR Device Utilization and Performance The CoreFIR generates FIR filters with many configurations. Table 1 provides the typical utilization and performance data for the generated FIR filters implemented with the configurations listed in Table 2 on page 3. Refer to Table 2 on page 3 for the Configuration column in Table 1. Table 1 • CoreFIR Device Utilization and Performance Cells or Tiles Family Configuration Combinatorial Sequential Utilization Total RAM Blocks Device Total Performance MHz Fusion 1 454 129 583 0 AFS060 38% 69 Fusion 2 1410 375 1784 0 AFS250 29% 56 Fusion 3 3080 679 3759 0 AFS250 61% 52 Fusion 4 5511 935 6446 8 AFS600 47% 45 Fusion 5 7089 1708 8797 0 AFS600 64% 40 Fusion 6 24356 3718 28074 45 AFS1500 73% 31 ProASIC3 1 454 129 583 0 A3P060 38% 69 ProASIC3 2 1410 375 1785 0 A3P125 58% 56 ProASIC3 3 3080 679 3759 0 A3P1000 15% 52 ProASIC3 4 5511 935 6446 8 A3P1000 26% 45 ProASIC3 5 7089 1708 8797 0 A3P1000 36% 40 ProASIC3 6 24356 3718 28074 45 A3P1500 73% 31 ProASICPLUS 1 558 116 674 0 APA075 22% 29 ProASICPLUS 2 2054 427 2481 0 APA150 40% 19 ProASICPLUS 3 3540 661 4201 0 APA1000 8% 19 ProASICPLUS 4 6391 872 7271 8 APA1000 13% 17 ProASICPLUS 5 8775 1606 10381 0 APA750 32% 13 Axcelerator 1 229 148 377 0 AX125 19% 174 Axcelerator 2 693 478 1171 0 AX250 28% 110 Axcelerator 3 1231 719 1950 0 AX250 46% 111 Axcelerator 4 2249 852 3101 4 AX500 38% 74 Axcelerator 5 3129 1704 4833 0 AX1000 27% 73 Axcelerator 6 9132 3355 12487 32 AX2000 39% 46 RTAX-S 1 229 148 377 0 RTAX1000S 2% 114 RTAX-S 2 693 478 1171 0 RTAX1000S 6% 76 RTAX-S 3 1231 719 1950 0 RTAX1000S 11% 66 RTAX-S 4 2249 852 3101 4 RTAX1000S 17% 41 RTAX-S 5 3129 1704 4833 0 RTAX1000S 27% 45 RTAX-S 6 9132 3355 12487 32 RTAX2000S 39% 29 SX-A 1 386 159 545 0 A54SX16A 38% 112 SX-A 2 1115 480 1595 0 A54SX72A 26% 64 SX-A 3 1831 727 2558 0 A54SX72A 42% 63 RTSX-S 1 381 159 540 0 RT54SX32S 19% 52 RTSX-S 2 1115 480 1595 0 RT54SX72S 26% 36 RTSX-S 3 1831 727 2558 0 RT54SX72S 42% 36 Notes: 1. The data above are obtained by typical synthesis and place-and-route methods. Other core parameters can result in different utilization and performance. 2. Cell (tile) count may vary depending on the actual coefficient values. 2 v3.0 CoreFIR Table 2 • Test Configurations Configuration nbits_input nbits_coef ntaps fpga_family coef_fixed 1 8 16 8 All 1 2 16 16 16 All 1 3 12 15 32 All 1 4 12 15 32 AX, RTAX-S, APA 0 5 16 15 64 All 1 6 16 16 128 AX, RTAX-S, APA 0 FIR Filter Using Distributed Arithmetic Algorithm Distributed Arithmetic Algorithm Overview FIR filters are used in applications that require exact linear phase response. Typical applications for a FIR filter include: image processing, digital audio, digital communication, and biomedical signal processing. A FIR filter is defined in EQ 1: ntaps – 1 y[n] = ∑ c[n] × x[n] 0 EQ 1 where: c[n] = h[ntaps - n -1] and h is the impulse response. The term ntaps is short for number of taps. In summary, the direct computation for one point of FIR requires: ntaps multiplications + (ntaps-1) additions. Distributed Arithmetic (DA) is a well-known method for eliminating resources in multiply-and-accumulate structures (MACs) implementing digital signal processing (DSP) functions. DA trades memory for combinatory elements, resulting in an efficient implementation in FPGAs. Another feature of DA is its easy serialization of the input, which further reduces the cost of operation when FIR data rate is low compared to the system clock, a common scenario in FIR applications. The input of a FIR can be expressed in the composition of its bits, as shown in EQ 2: x[n] = nbits_in – 1 ∑ x[n][b] × 2 b 0 EQ 2 where x[n][b] is the bth bit of x[n] and nbits_in is the number of bits of input. The resulting output of the FIR filter is shown in EQ 3: ntaps – 1 y[n] = ∑ nbits_in – 1 ntaps – 1 c[n] x[n] = 0 ∑ c[n] 0 ∑ x[n][b]2 b 0 EQ 3 Changing the summation order gives the results shown in EQ 4: nbits_in – 1 y[n] = ∑ 0 2 b ntaps – 1 ∑ nbits_in – 1 c[n] x[n][b] = 0 ∑ b 2 T(X[b]) 0 EQ 4 v3.0 3 CoreFIR ntaps – 1 where: T(X[b]) = ∑ c[n] x[n][b] and X[b] is a collection of the bth bits of ntaps different taps. 0 Note that the x[n][b] can only be 0 or 1. There are 2ntaps different values of T. If T is pre-calculated and stored inside a RAM or ROM, the FIR computation becomes nbits_in table lookup operations using x[b] and nbits_in–1 additions. Multiplication operations are eliminated. In summary, the FIR computation using DA for one point of FIR requires: nbits_in table lookups + (nbits_in-1) additions. The cost to eliminate multiplication is a memory block to store 2ntaps pre-computed values. The serialization of table lookup and addition is possible because table T is the same for each b. If one table lookup and one addition can be finished in one cycle, the total computation will finish in b cycles. The serialization of the FIR introduces further opportunity to reduce the size of the design, which is the key to an efficient FPGA design. Example Design of a FIR Filter Using DA An example of a FIR with four taps (ntaps = 4) and four bits for inputs (nbits_in = 4) is shown in Figure 1. The expression x[n][b] represents the bth bit of input x[n]. In the example, in the first cycle, all 0th bits of input x[n] to x[n-3] are fed into the lookup table as an input address; in the second cycle, all 1st bits of inputs input x[n] to x[n-3] are fed into the lookup table; in the third cycle, all 2nd bits of inputs input x[n] to x[n-3] are fed into the lookup table; and in the fourth cycle, all 3rd bits of inputs input x[n] to x[n-3] are fed into the lookup table. The shifter shifts the outputs of the lookup table for the inputs of the adder, which accumulates for the final result. x[n][3] x[n][2] x[n][1] x[n][0] x[n-1][3] x[n-1][2] x[n-1][1] x[n-1][0] Lookup Table x[n-2][3] x[n-2][2] x[n-2][1] x[n-2][0] x[n-3][3] x[n-3][2] x[n-3][1] x[n-3][0] Shifter Flow Control Adder Reg Figure 1 • Example Implementation of a Bit-Serialized FIR Using DA The serialized DA implementation in Figure 1 uses a table lookup with 16 words, and takes four clock cycles to finish one FIR point. 4 v3.0 CoreFIR Storage and Large Number of Taps As seen in the previous section, the size of the lookup table is 2ntaps, which is exponentially increased with more ntaps. A design with a large number of taps needs to have several lookup tables. Let ntaps = p × q. If we split taps into p groups, each group has q taps. Then the FIR becomes as shown in EQ 5: nbits_in – 1 y[n] = ∑ 2 b n=ntaps – 1 ∑ 0 c[n] x[n][b] = 0 nbits_in – 1 b n=pq – 1 0 0 ∑ 2 ∑ c[n] x[n][b] EQ 5 By splitting ntaps into two level summations, we have the result shown in EQ 6: y[n] = nbits_in – 1 b ∑ 0 2 i=p-1 j=q-1 ∑ ∑ 0 c[iq + j] x[iq + j][b] 0 EQ 6 Refer to "FIR Filter with Large Number of Taps" on page 8 for further information. General Description The CoreFIR is an Actel FPGA-optimized RTL generator that produces a finite impulse response filter. It implements the DA algorithm to eliminate multiplication for faster and smaller designs. The CoreFIR is a generator which utilizes Actel FPGA’s embedded RAM blocks as DA lookup tables (when available) to further reduce the size of the design. The generator also reads the user system clock rate and data sample rate to explore using a folding or serial architecture to further reduce size, datai Coefficients especially when the system clock rate is much greater than the data sampling rate. The generator automatically switches to the use of multiple DA lookup tables when the requested FIR filter has a large number of taps. Figure 2 shows the functional block diagram of a generated FIR filter design. More complex designs may contain multiple lookup tables, accumulators, or control sections. Input Buffers DA Lookup Tables (RAMs or ROM) DA LUT Generator Shifter Accumulator Control datao Figure 2 • Functional Block Diagram Functional Block Description The functional blocks shown in Figure 2 illustrate the architecture of the generated FIR filter using the DA algorithm. v3.0 5 CoreFIR Input Buffers DA LUT Generator The Input Buffers block stores the input data which contains ntaps data points, where ntaps is the number of taps of the FIR filter. The Input Buffers block also circulates the data bits to address the DA Lookup Tables (LUTs) required by the DA algorithm. An optional function of the input buffers block is to share its storage with the DA LUT generator. The coefficients used in computing the LUT content can be stored in the input buffers when a design uses the embedded RAM blocks for the LUTs. The DA LUT Generator computes the LUT contents required by the distributed arithmetic algorithm. It reads the coefficients from the Input Buffers block and writes the LUT words into the embedded RAM blocks. These blocks are available only for designs that use embedded RAM blocks as LUTs. The DA LUT Generator produces LUT contents for multiple LUTs when implementing a FIR filter with variable coefficients. Refer to "DA LUT Generation" on page 10 and "I/O Timing Diagram of LUT Initialization" on page 11 for detailed information on initialization of the DA LUT. DA Lookup Tables (LUTs) Control The DA LUTs store the LUT contents for the distributed algorithm. The generator implements the DA LUTs in two ways: (1) synthesized ROM using FPGA cells; and (2) embedded RAM blocks supported by the on-chip DA LUT Generator. The first method is for an FPGA without embedded RAM blocks, intended primarily for a small FIR filter. The latter is for an FPGA with embedded RAM blocks. FIR filters with a large number of taps may require multiple LUTs. The state machine inside the Control block controls the operations of all other blocks. It controls the input buffers to ensure they operate based on the specified system clock rate and sample rate, monitors input enable and coefficient input enable, and circulates input data bits to address the DA LUTs. It also controls the shifters and accumulators to ensure they operate based on the requested FIR configuration and DA algorithm. The Control block coordinates the initialization of the LUTs by the DA LUT generator when using embedded RAMs. The Control logic is designed to support folding or serialization of computation when the system clock rate is substantially higher than the data sampling rate. Shifter and Accumulator The Shifter and Accumulator perform additions with LUT outputs and the alignments of LUT outputs required by the DA algorithm. Multiple accumulators and shifters may be needed to implement a FIR filter with a large number of taps. I/O Signal Description The FIR filter generated by the CoreFIR Generator consists of the I/O signals defined in Table 3 (see Figure 3 on page 7). Table 3 • I/O Signal Description I/O Signal Direction Width Polarity Description clk Input 1 N/A Master clock, positive edge rstn Input 1 Active low Master reset, asynchronous datai_en Input 1 Active high Input data enable Input nbits_in1 N/A Input data or coefficients1 Output 1 Active high Output data valid Output nbits_out3 N/A Output data Input 1 Active high Coefficient input enable Output 1 Active high Ready to input datai datai1 datao_valid datao coefi_en2 ready2 Notes: 1. Input datai is also the input for coefficients for design using embedded RAMs as DA LUTs. In this case the width can be the maximum of nbits_in and nbits_coef. 2. Ports coefi_en and ready are only available when coef_fixed = 0. 3. Refer to "Number of Bits of Output (nbits_out)" on page 8 for details. 4. Refer to Table 4 for nbits_in and nbits_coef. 6 v3.0 CoreFIR FIR clk rstn datai_en datai datao_valid coefi_en* datao_valid datao ready* Note: *coefi_en and ready are available when coef_fixed = 0. Figure 3 • I/O Signals CoreFIR Generator Parameters The CoreFIR generates the RTL code for FIR filters with a variety of parameters. These parameters include generic FIR parameters such as number of taps, number of input’s bits, number of coefficients’ bits, and data type, as well as implementation parameters such as FPGA family, use embedded RAMs, system clock rate, and data sampling rate. The CoreFIR supports the variations specified in Table 4. Table 4 • CoreFIR Generator Configuration Parameters Recommended Selection Parameter Name Description AX APA SX-A – – – module_name Name of generated module nbits_input Number of input’s bits of data 2 – 24 2 – 24 2 – 24 nbits_coef Number of coefficients’ bits of data 2 – 24 2 – 24 2 – 24 ntaps Number of taps 2 – 128 2 – 64 2 – 32 tap Array of coefficients – – – data_signed Data type: 0 = unsigned, 1 = signed 0,1 0,1 0,1 fpga_family FPGA family ax apa sxa coef_fixed 0 = filter with configurable coefficients 1= filter with fixed coefficients – – – sys_clk_frq Input clock frequency – – – sample_ratio Sampling rate = sys_clk_frq/sample_ratio ≥ nbits_in ≥ nbits_in ≥ nbits_in Module_lang Reserved. VHDL only. VHDL VHDL VHDL Number of Bits of Inputs (nbits_in) and Coefficients (nbits_coef) Refer to "Appendix I: Sample Configuration File" on page 12 for a sample usage of the parameters shown in Table 4 in a configuration file for the CoreFIR Generator. Detailed discussions about these parameters are in the sections of this datasheet that follow. The FIR Generator supports the number of bits of inputs and coefficients for the device families specified in Table 4. These parameters are set with the variables nbits_in and nbits_coef in the configuration file. Refer to "Appendix I: Sample Configuration File" on page 12 for details. Number of Taps (ntaps) The FIR generator supports the number of taps specified by the device families in Table 4. The variable ntaps in the configuration file specifies the setting of this parameter. Refer to "Appendix I: Sample Configuration File" on page 12 for details. v3.0 7 CoreFIR Number of Bits of Output (nbits_out) The FIR Generator supports only full precision computation. Thus, the number of bits of output is determined by the number of input’s and coefficients’ bits for the device family as specified in Table 4 on page 7. The number of bits of output are specified by EQ 7: specified with the unit of MHz. Refer to Table 4 on page 7 and "Appendix I: Sample Configuration File" on page 12 for details. Sample Ratio (sample_ratio) The FIR generator supports an asymmetric FIR filter only. Symmetric FIR filters will be supported in future releases. The FIR Generator supports a configuration parameter, sample_ratio, which specifies the sampling rate against the system clock frequency. It defines that the data sampling rate is equal to sys_clk_frq/sample_ratio. This parameter provides guidance to implement a folding architecture to reduce the size of the design. The configuration parameter sample_ratio can only be a positive integer greater than 1. Refer to Table 4 on page 7 and "Appendix I: Sample Configuration File" on page 12 for details. Embedded RAM as LUTs (coef_fixed) Module Name (module_name) The FIR Generator utilizes a switch that determines whether to implement DA LUTs by embedded RAM blocks or by synthesized ROM using FPGA cells. The LUTs are implemented by synthesized ROM using FPGA cells when coef_fixed is equal to 1. The LUTs are implemented by embedded RAM blocks available for Axcelerator, ProASICPLUS, and ProASIC3 devices when coef_fixed is equal to 0. This setting may be set to 1 for a filter design with fixed coefficients for an FPGA device with embedded RAM such as AX, RTAX-S, APA, and PA3, since the overhead of the DA LUT Generator overrides the benefits of using an embedded RAM block as a LUT. The coef_fixed configuration parameter is valid only when the configuration parameter fpga_family is set to ax, apa, or pa3. Refer to Table 4 on page 7 and "Appendix I: Sample Configuration File" on page 12 for details. The FIR Generator supports a configuration parameter, module_name, that specifies the name of the generated module. The generated testbench has the name <module_name>_tb. Refer to Table 4 on page 7 and "Appendix I: Sample Configuration File" on page 12 for details. nbits_out = nbits_in + nbits_coef + ceil(log2(ntaps)) - 1 EQ 7 where ceil is the ceiling function of a floating point data. Asymmetric FIR and Symmetric FIR The FIR Generator supports a configuration parameter, fpga_family, that specifies the targeted Actel FPGA device family. The options are ax, apa, pa3, and sxa. The option for RTAX-S is ax. The option for RTSX-S is sxa. Refer to Table 4 on page 7 and "Appendix I: Sample Configuration File" on page 12 for details. Architecture Variations Signed/Unsigned Inputs and Coefficients (data_signed) The FIR Generator supports signed or unsigned operations. The generator supports two cases: both input and coefficient are unsigned, or both input and coefficient are signed. It supports an unsigned implementation when the configuration parameter data_signed is equal to 0, and a signed implementation when the configuration parameter data_signed is equal to 1. Refer to Table 4 on page 7 and "Appendix I: Sample Configuration File" on page 12 for details. System Clock Frequency (sys_clk_frq) The FIR Generator reads in the system clock frequency via configuration parameter sys_clk_frq. The generated testbench assigns this frequency to its clock generation. The generated design runs at this frequency inside the test bench. The configuration parameter should be 8 FPGA Family (fpga_family) v3.0 The DA algorithm for FIR provides an excellent solution, but also introduces many variations on the design architecture due to limitations of the FPGA resource. FIR Filter with Large Number of Taps As illustrated in section "FIR Filter Using Distributed Arithmetic Algorithm" on page 3, the number of words of the DA LUT is 2ntaps, which is exponentially increased with ntaps. A LUT splitting method, as defined in "Storage and Large Number of Taps" on page 5, effectively reduces the memory usage. The CoreFIR Generator utilizes this method to reduce the memory usage. It usually splits the coefficients into eight or nine taps for each LUT when embedded RAM blocks are available. CoreFIR An example of the split lookup table implementation of a FIR with eight taps (ntaps = 8) and four bits for inputs (nbits_in = 4) is shown in Figure 4. In the example, eight taps have been split into two groups. Each has four taps, x[n][3] x[n][2] x[n][1] x[n][0] x[n-1][3] x[n-1][2] x[n-1][1] x[n-1][0] x[n-2][3] x[n-2][2] x[n-2][1] x[n-2][0] x[n-3][3] x[n-3][2] x[n-3][1] x[n-3][0] x[n-4][3] x[n-4][2] x[n-4][1] x[n-4][0] x[n-5][3] x[n-5][2] x[n-5][1] x[n-5][0] x[n-6][3] x[n-6][2] x[n-6][1] x[n-6][0] x[n-7][3] x[n-7][2] x[n-7][1] x[n-7][0] and each group addresses separate lookup tables. This differs from the case in Figure 1 on page 4, which only has one LUT. Lookup Table Lookup Table Shifter Shifter Flow Control Adder Reg Adder Reg Adder Output Figure 4 • Example of Split Lookup Table Implementation Folding rate (sys_clk_frq) and the data sampling rate (data_rate), as shown in EQ 8: The system clock rate of many FIR filter systems is a multiple of the data rate (or data sampling rate). For typical FPGA implementation, the size of the design is key for efficient implementation. Thus, exploitation of the ratio between the system clock rate and data rate is an effective approach to reduce the size of the design. In other words, folding or serialization of the computation can reduce the size of the design. The DA algorithm for FIR introduces bit-serialization of the operations. This property of the DA can be very efficient for exploring the ratio between system clock rate and data rate. If the number of bits of input is nbits_in, it takes nbits_in table lookup and additions to finish one output point of the FIR. If the system clock rate is nbits_in times faster than data rate, the serialization of table lookup and additions is done with the optimized timing. The parameter sample_ratio defines the ratio between the system clock sample_ratio = sys_clk_frq/data_rate EQ 8 CoreFIR supports folding when sample_ratio is greater than or equal to nbits_in. The serialized operations of table lookup and addition are done in nbits_in clock cycles of the system clock, and the design is idle during the rest of sample_ratio and nbits_in cycles. The generator only requires that the sample_ratio be an integer; the system clock rate is an exact multiple of the data rate. Future releases may support a sample_ratio less than nbits_in. v3.0 9 CoreFIR DA LUTs Using FPGA Cells Some Actel FPGA families such as SX-A and RTSX-S do not have an embedded RAM implementation. In this case, the CoreFIR Generator requires that the lookup table be hard-coded as ROM using FPGA cells. This configuration does not need the DA LUT Generator shown in Figure 2 on page 5. The generator selects this configuration when the configurable parameter coef_fixed is set to 1 or the configuration parameter fpga_family is not one of ax, apa, or pa3. DA LUTs Using Embedded RAM Blocks Many Actel FPGA families have embedded RAM blocks. The FIR generator takes advantage of these embedded RAM blocks, and the DA LUTs are implemented using these embedded RAM blocks. This configuration requires additional overhead in that the embedded RAM blocks must be initialized by a DA LUT Generator as shown in Figure 2 on page 5. The generator selects this configuration when the configurable parameter coef_fixed is set to 0 and the configuration parameter fpga_family is set to ax, apa, or pa3. DA LUT Generation The DA LUT Generator computes the LUT contents of the distributed arithmetic algorithm. It reads the coefficients from the Input Buffers block and writes the LUT words into the embedded RAM blocks. This block is only available when using embedded RAM blocks as LUTs – when the configuration parameter coef_fixed is set to 0. After the reset is complete, the DA LUT Generator will wait for the Input Buffers block to signal that the coefficients are loaded into the input buffers. Then the DA LUT generator will compute the LUT words and write them into the embedded RAM blocks. The DA LUT Generator produces LUT contents for multiple LUTs when implementing a FIR filter with a large number of taps. The generator has only one computation engine, and initializes multiple LUTs sequentially. After the initialization of the RAM blocks, the output ready will go high to let the system know that the FIR filter is ready to accept data. Input Buffering Scheme The Input Buffers block always performs the functions defined in "Input Coefficients" on page 10, but only performs functions defined in "Input Data" on page 10 when the embedded RAM blocks are used as the DA LUTs (when coef_fixed is set to 1). Input Data The input dataflow is designed to use the scheme shown in Figure 5 to reduce the size of registers. The horizontal movement of the input ensures the bits of the inputs feed into the lookup table, and that it happens at every cycle. Vertical movement of input data only occurs as the most significant bit (MSB) is fed into lookup table, when it switches to the next FIR data point. x[n][3] x[n][2] x[n][1] x[n][0] x[n-1][3] x[n-1][2] x[n-1][1] x[n-1][0] x[n-2][3] x[n-2][2] x[n-2][1] x[n-2][0] x[n-3][3] x[n-3][2] x[n-3][1] x[n-3][0] Figure 5 • Example of Input Buffering Scheme Input Coefficients User Interface The CoreFIR generator shares the input buffers for coefficients input when embedded RAM blocks are used as the DA LUTs. The configuration parameter coef_fixed is set to 0. In this configuration, the width of the input datai is the maximum of nbits_in and nbits_coef. The input datai reads in coefficients when input coefi_en is high. After enough coefficients are fed into the buffer, coefi_en is ignored and the coefficients stay inside the input buffers until the DA LUT Generator finishes the initialization of the embedded RAM blocks. The generator executable reads one command line parameter, which is the name of the configuration file. It generates RTL code for the module and testbench based on the parameters in the configuration file. Refer to Table 4 on page 7 and "Appendix I: Sample Configuration File" on page 12 for details of the configuration file. 10 v3.0 CoreFIR Clock and Reset Input and Output Timing Clock I/O Timing Diagram of Normal FIR Operation The CoreFIR generates a FIR filter design that uses only positive-edge-triggered registers. The entire design is fully synchronized using the positive edge of the input clock clk, including the embedded RAM blocks (when available). The I/O timing under normal FIR operation is illustrated in Figure 6. The labels s0 and s2 refer to the data sampling point for input data, while s1 is the sampling point for output data. Due to variations of the configuration, you should refer to comments in the generated module for t0 and t1. These parameters are given in the number of the clock cycles of the input clock, clk. Reset The CoreFIR generates a design that uses only one active low asynchronous reset. The entire design is asynchronously reset by the input rstn. s0 s1 t1 s2 t0 clk datai_en datai 0 1 datao_valid datao 0 Figure 6 • I/O Timing Diagram of Normal FIR Operation I/O Timing Diagram of LUT Initialization The I/O timing for LUT initialization is illustrated in Figure 7. In this figure, s0 and s1 are the starting and ending points for feeding coefficients, while s2 is the sampling point for output ready. Due to the variation of the configuration, refer to the comments inside the generated module for t2, which are given in the number of the clock cycles of the input clock clk. s0 t2 s1 s2 clk coefi_en datai 0 1 ntaps-1 2 ready Figure 7 • I/O Timing Diagram for LUT Initialization v3.0 11 CoreFIR Appendix I: Sample Configuration File The following shows a sample configuration file. module_name firtest nbits_input 8 nbits_coef 5 ntaps 13 tap 8 14 21 27 31 31 27 21 14 8 4 2 1 data_signed 0 fpga_family ax coef_fixed 1 sys_clk_frq 25 sample_ratio 16 module_lang vhdl 12 v3.0 CoreFIR Ordering Information Order CoreFIR through your local Actel sales representative. Use the following numbering convention when ordering: CoreFIR-XX, where XX is listed in Table 5. Table 5 • Ordering Codes XX Description EV Evaluation Version AR RTL for unlimited use on Actel devices UR RTL for unlimited use and not restricted to Actel devices List of Changes The following table lists critical changes that were made in the current version of the document. Previous Version Changes in Current Version (v 3 .0 ) v2.1 v2.0 Page The "Supported Families" section was updated to include Fusion. 1 Table 1 was updated to include Fusion data. 2 The "Supported Families" section was updated. 1 Table 1 was updated. 2 Datasheet Categories In order to provide the latest information to designers, some datasheets are published before data has been fully characterized. Datasheets are designated as "Product Brief," "Advanced," and "Production." The definitions of these categories are as follows: Product Brief The product brief is a summarized version of an advanced or production datasheet containing general product information. This brief summarizes specific device and family information for unreleased products. Advanced This datasheet version contains initial estimated information based on simulation, other products, devices, or speed grades. This information can be used as estimates, but not for production. Unmarked (production) This datasheet version contains information that is considered to be final. v3.0 13 Actel and the Actel logo are registered trademarks of Actel Corporation. All other trademarks are the property of their owners. www.actel.com Actel Corporation Actel Europe Ltd. Actel Japan www.jp.actel.com Actel Hong Kong www.actel.com.cn 2061 Stierlin Court Mountain View, CA 94043-4655 USA Phone 650.318.4200 Fax 650.318.4600 Dunlop House, Riverside Way Camberley, Surrey GU15 3YL United Kingdom Phone +44 (0) 1276 401 450 Fax +44 (0) 1276 401 490 EXOS Ebisu Bldg. 4F 1-24-14 Ebisu Shibuya-ku Tokyo 150 Japan Phone +81.03.3445.7671 Fax +81.03.3445.7668 Suite 2114, Two Pacific Place 88 Queensway, Admiralty Hong Kong Phone +852 2185 6460 Fax +852 2185 6488 51700056-2/12.05