CoreFIR datasheet

CoreFIR Finite Impulse Response Filter Generator

Product Summary

Intended Use
• Finite Impulse Response (FIR) Filter for Actel FPGAs

Key Features
• Two FIR Computation Algorithms for Design Flexibility
• Constant Coefficient (CC) Algorithm for High Throughput
  – Constant Multiplier Computation
  – Low Gate Counts
  – High Speed
• Distributed Arithmetic (DA) Algorithm
  – Multiplier-Free Computation
  – Low Cost
  – Optimized for Actel FPGAs
  – Efficient DA Architecture using Embedded RAM Lookup Tables
  – Folding Architecture with Serialized Computation to Minimize Size for Lower Sample Rates
  – DA Lookup Table ROM Synthesis for FPGA without Embedded RAMs
  – Multiple DA Lookup Tables to Split Large Number of Taps
• Core Generator
  – Executable File Outputs Register Transfer Level (RTL) Code and Testbench Based on Input Parameters
  – Self-Checking: Executable Tests Generate Output against Algorithm
• Actel FPGA-Optimized RTL Code
• Supports 2 to 128 Taps
• 1- to 32-Bit Input Data and Coefficient Precision

Supported Families
• Fusion
• ProASIC®3/E
• ProASICPLUS®
• Axcelerator®
• RTAX-S
• SX-A
• RTSX-S

Core Deliverables
• Evaluation Version
  – RTL Code of a Sample Filter and Compiled RTL Simulation Model Fully Supported in the Actel Libero® Integrated Design Environment (IDE)
• RTL Version
  – Microsoft Windows® Binary Executable of the CoreFIR Generator
  – VHDL FIR Module
  – VHDL Testbench

Synthesis and Simulation Support
• Synthesis: Synplicity®, Synopsys® (Design Compiler® / FPGA Compiler™ / FPGA Express™), Exemplar™
• Simulation: OVI-Compliant Verilog Simulators and Vital-Compliant VHDL Simulators

May 2006, v4.0
© 2006 Actel Corporation
Contents
General Description
  Constant Coefficient Mode
  Distributed Arithmetic Mode
Device Utilization and Performance
  Constant Coefficient Mode
  Distributed Arithmetic Mode
Constant Coefficient Mode
  CoreFIR Generator Parameters
Distributed Arithmetic Mode
  FIR Filter Using Distributed Arithmetic Algorithm
  DA Functional Block Description
  CoreFIR Generator Parameters in DA Mode
  DA FIR Filter with Large Number of Taps
I/O Signal Description
Clock and Reset
Input and Output Timing
Appendix I
Appendix II
Ordering Information
List of Changes
Datasheet Categories
General Description
CoreFIR is an Actel FPGA-optimized RTL generator that
produces a finite impulse response filter. It has two
architectures: constant coefficient (CC) and distributed
arithmetic (DA). The constant coefficient architecture is a
multiplier-based architecture for higher throughput. The
DA architecture uses add and shift operations to perform
the calculations in a small size.
Constant Coefficient Mode
CoreFIR implements the constant coefficient algorithm
for faster designs as well as for newer Flash family
products such as ProASIC®3 and Fusion. The CC
implementation derives directly from the definition of
convolution in the time domain. Figure 1 shows such a
general filter structure. It comprises a filter
arithmetic unit and a delay circuit. The filter arithmetic
unit is an implementation of the data dependence
graph, containing multipliers and adders. The delay
circuit furnishes the necessary data input for the filter
arithmetic unit. It supplies several samples in parallel: the
current sample and N – 1 delayed samples. Figure 2
shows a detailed design implementation diagram.
[Figure 1 blocks: a Delay Circuit receives x(i) and passes x(i) through x(i – N + 1) to the Filter Arithmetic Unit, which produces y(i).]
Figure 1 • General FIR Filter Structure
[Figure 2 labels: Input, C0, C1, 0, Register, Output.]
Figure 2 • Design Diagram
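The structure described for constant coefficient mode (a delay circuit feeding multipliers and adders) can be sketched in software. The following Python model is illustrative only and is not the generated RTL; the function name fir_direct and the convention that coeffs[k] multiplies the sample delayed by k are assumptions made for this sketch.

```python
def fir_direct(samples, coeffs):
    """Direct-form FIR model: each output is the sum of the current sample and
    the N - 1 delayed samples, each weighted by one constant coefficient."""
    ntaps = len(coeffs)
    delay_line = [0] * ntaps          # delay_line[k] holds x(i - k); starts cleared
    outputs = []
    for x in samples:
        # The delay circuit shifts in the new sample and drops the oldest one.
        delay_line = [x] + delay_line[:-1]
        # The filter arithmetic unit: one multiply per tap, then an adder tree.
        outputs.append(sum(c * d for c, d in zip(coeffs, delay_line)))
    return outputs


if __name__ == "__main__":
    # Feeding a unit impulse returns the coefficients in order.
    print(fir_direct([1, 0, 0, 0], [3, 2, 1]))
```

Each output point costs one multiplication per tap, which is the cost that the distributed arithmetic mode is designed to avoid.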
Distributed Arithmetic Mode
CoreFIR is an Actel FPGA-optimized RTL generator that
produces a finite impulse response filter. It implements
the DA algorithm to eliminate multiplication for faster
and smaller designs. CoreFIR uses embedded RAM blocks
in Actel FPGAs as DA lookup tables (when available) to
further reduce the size of the design. The generator also
reads the user system clock rate and data sample rate to
explore using a folding or serial architecture to further
reduce size, especially when the system clock rate is
much greater than the data sampling rate. The
generator automatically switches to the use of multiple
DA lookup tables when the requested FIR filter has a
large number of taps.
Figure 3 shows the functional block diagram of a
generated FIR filter design. More complex designs may
contain multiple lookup tables, accumulators, or control
sections.
[Figure 3 blocks: Input Buffers, DA Lookup Tables (RAMs or ROM), DA LUT Generator, Control, Shifter/Accumulator. Signals: datai, Coefficients, datao.]
Figure 3 • Functional Block Diagram
Device Utilization and Performance
Constant Coefficient Mode
CoreFIR generates FIR filters with many configurations. Table 1 provides the typical utilization and performance data
for the generated FIR filters implemented with the configurations listed in Table 2 on page 5. Refer to Table 2 on
page 5 for the Configuration column in Table 1.
Table 1 • CoreFIR Device Utilization and Performance for Constant Coefficient Mode

                              Cells or Tiles                      Utilization         Throughput
Family        Configuration   Combinatorial  Sequential  Total    Device      Total   (Msps)
Fusion        1               1,502          224         1,726    AFS250      28%     93
Fusion        2               5,348          577         5,925    AFS250      96%     83
Fusion        3               8,676          1,025       9,701    AFS600      70%     62
Fusion        4               8,676          1,025       9,701    AFS600      70%     62
ProASIC3/E    1               1,502          224         1,726    A3P250      28%     93
ProASIC3/E    2               5,348          577         5,925    A3P250      96%     83
ProASIC3/E    3               8,676          1,025       9,701    A3P600      70%     62
ProASIC3/E    4               8,676          1,025       9,701    A3P600      70%     62
ProASICPLUS   1               1,451          217         1,668    APA075      54%     55
ProASICPLUS   2               5,809          577         6,386    APA300      78%     55
ProASICPLUS   3               9,690          1,025       10,715   APA1000     19%     41
ProASICPLUS   4               9,690          1,025       10,715   APA1000     19%     41
ProASICPLUS   5               28,478         2,569       31,047   APA750      94%     46
Axcelerator   1               269            200         469      AX125       23%     202
Axcelerator   2               989            559         1,548    AX250       36%     165
Axcelerator   3               1,795          1,007       2,802    AX250       66%     137
Axcelerator   4               1,795          1,007       2,802    AX500       35%     137
Axcelerator   5               4,849          2,351       7,200    AX1000      40%     80
Axcelerator   6               10,637         4,973       15,610   AX2000      39%     51
RTAX-S        1               269            200         469      RTAX1000S   3%      151
RTAX-S        2               989            559         1,548    RTAX1000S   8%      110
RTAX-S        3               1,795          1,007       2,802    RTAX1000S   15%     105
RTAX-S        4               1,795          1,007       2,802    RTAX1000S   15%     105
RTAX-S        5               4,849          2,351       7,200    RTAX1000S   40%     68
RTAX-S        6               10,637         4,973       15,610   RTAX2000S   48%     44
SX-A          1               484            173         657      A54SX16A    45%     101
SX-A          2               2,162          532         2,694    A54SX72A    44%     59
SX-A          3               493            983         1,476    A54SX72A    86%     67
RTSX-S        1               502            172         674      RT54SX32S   23%     26
RTSX-S        2               2,159          532         2,691    RT54SX72S   45%     36
RTSX-S        3               4,193          983         5,176    RT54SX72S   86%     39

Note: The data above are obtained by typical synthesis and place-and-route methods. Other core parameters can result in different
utilization and performance.
Table 2 • Test Configurations

Configuration   nbits_input   nbits_coef   ntaps   data_signed
1               8             16           8       0
2               16            16           16      1
3               12            15           32      0
4               12            15           32      1
5               16            15           64      0
6               16            16           128     1
Distributed Arithmetic Mode
CoreFIR generates FIR filters with many configurations. Table 3 provides the typical utilization and performance data
for the generated FIR filters implemented with the configurations listed in Table 4 on page 6. Refer to Table 4 on
page 6 for the Configuration column in Table 3.
Table 3 • CoreFIR Device Utilization and Performance for Distributed Arithmetic Mode

                              Cells or Tiles                      RAM      Utilization         Throughput
Family        Configuration   Combinatorial  Sequential  Total    Blocks   Device      Total   (Msps)
ProASICPLUS   1               558            116         674      0        APA075      22%     3.63
ProASICPLUS   2               2,054          427         2,481    0        APA150      40%     1.19
ProASICPLUS   3               3,540          661         4,201    0        APA1000     8%      1.58
ProASICPLUS   4               6,391          872         7,271    8        APA1000     13%     1.42
ProASICPLUS   5               8,775          1,606       10,381   0        APA750      32%     0.81
Axcelerator   1               229            148         377      0        AX125       19%     21.75
Axcelerator   2               693            478         1,171    0        AX250       28%     6.88
Axcelerator   3               1,231          719         1,950    0        AX250       46%     9.25
Axcelerator   4               2,249          852         3,101    4        AX500       38%     6.17
Axcelerator   5               3,129          1,704       4,833    0        AX1000      27%     4.56
Axcelerator   6               9,132          3,355       12,487   32       AX2000      39%     2.88
RTAX-S        1               229            148         377      0        RTAX1000S   2%      14.25
RTAX-S        2               693            478         1,171    0        RTAX1000S   6%      4.75
RTAX-S        3               1,231          719         1,950    0        RTAX1000S   11%     5.50
RTAX-S        4               2,249          852         3,101    4        RTAX1000S   17%     3.42
RTAX-S        5               3,129          1,704       4,833    0        RTAX1000S   27%     2.81
RTAX-S        6               9,132          3,355       12,487   32       RTAX2000S   39%     1.81
SX-A          1               386            159         545      0        A54SX16A    38%     14.00
SX-A          2               1,115          480         1,595    0        A54SX72A    26%     4.00
SX-A          3               1,831          727         2,558    0        A54SX72A    42%     5.25
RTSX-S        1               381            159         540      0        RT54SX32S   19%     6.50
RTSX-S        2               1,115          480         1,595    0        RT54SX72S   26%     2.25
RTSX-S        3               1,831          727         2,558    0        RT54SX72S   42%     3.00

Notes:
1. The data above are obtained by typical synthesis and place-and-route methods. Other core parameters can result in different
utilization and performance.
2. Cell (tile) count may vary depending on the actual coefficient values.
Table 4 • Test Configurations

Configuration   nbits_input   nbits_coef   ntaps   fpga_family       coef_fixed
1               8             16           8       All               1
2               16            16           16      All               1
3               12            15           32      All               1
4               12            15           32      AX, RTAX-S, APA   0
5               16            15           64      All               1
6               16            16           128     AX, RTAX-S, APA   0
Constant Coefficient Mode
CoreFIR Generator Parameters
CoreFIR generates the RTL code for FIR filters with a variety of parameters. These parameters include generic FIR
parameters such as number of taps, number of input bits, number of coefficient bits, and data type. CoreFIR supports
the variations specified in Table 5.
Table 5 • CoreFIR Generator Configuration Parameters for Constant Coefficient Mode

                                                        Recommended Selection
Parameter Name   Description                            AX      APA     SX-A
module_name      Name of generated module               –       –       –
nbits_input      Number of input data bits              2–24    2–24    2–24
nbits_coef       Number of coefficient data bits        2–24    2–24    2–24
ntaps            Number of taps                         2–128   2–64    2–32
tap              Array of coefficients                  –       –       –
data_signed      Data type; 0 = unsigned, 1 = signed    0, 1    0, 1    0, 1
Refer to "Appendix II" on page 17 for a sample usage of
the parameters shown in Table 5 in a configuration file
for the CoreFIR Generator. Detailed discussion about
these parameters follows.
Number of Taps (ntaps)
The CoreFIR Generator supports the number of taps
specified by the device family in Table 6 on page 11. The
variable ntaps in the configuration file specifies the
setting of this parameter. Refer to "Appendix I" on
page 17 for details.
Module Name (module_name)
The CoreFIR Generator supports a configuration
parameter, module_name, that specifies the name of the
generated module. The generated testbench has the
name <module_name>_tb. Refer to Table 6 on page 11
and "Appendix I" on page 17 for details.
Signed/Unsigned Inputs and Coefficients
(data_signed)
The CoreFIR Generator supports signed or unsigned
operations. The generator supports two cases: both
input and coefficient are unsigned, or both input and
coefficient are signed. It supports an unsigned
implementation when the configuration parameter
data_signed is equal to 0, and a signed implementation
when data_signed is equal to 1. Refer to Table 6 on
page 11 and "Appendix I" on page 17 for details.
Number of Input (nbits_in) and
Coefficient (nbits_coef) Bits
The CoreFIR Generator supports the number of input
and coefficient bits for the device families specified in
Table 6 on page 11. These parameters are set with the
variables nbits_in and nbits_coef in the configuration
file. Refer to "Appendix I" on page 17 for details.
Distributed Arithmetic Mode
FIR Filter Using Distributed Arithmetic Algorithm
Distributed Arithmetic Algorithm Overview
FIR filters are used in applications that require exact linear phase response. Typical applications for an FIR filter include
image processing, digital audio, digital communication, and biomedical signal processing. An FIR filter is defined in EQ 1:
$$y[n] = \sum_{n=0}^{\mathrm{ntaps}-1} c[n] \times x[n]$$
EQ 1
where:
$$c[n] = h[\mathrm{ntaps} - n - 1]$$
and h is the impulse response. The term ntaps is the number of taps.
In summary, the direct computation for one point of FIR requires:
ntaps multiplications + (ntaps – 1) additions.
Distributed Arithmetic (DA) is a well-known method for eliminating resources in multiply-and-accumulate structures
(MACs) implementing digital signal processing (DSP) functions. DA trades memory for combinatorial elements,
resulting in an efficient implementation in FPGAs. Another feature of DA is its easy serialization of the input, which
further reduces the cost of operation when FIR data rate is low compared to the system clock, a common scenario in
FIR applications.
The input of an FIR can be expressed as the composition of its bits, as shown in EQ 2:
$$x[n] = \sum_{b=0}^{\mathrm{nbits\_in}-1} x[n][b] \times 2^{b}$$
EQ 2
where x[n][b] is the bth bit of x[n] and nbits_in is the number of bits of input. The resulting output of the FIR filter is
shown in EQ 3:
$$y[n] = \sum_{n=0}^{\mathrm{ntaps}-1} c[n]\, x[n] = \sum_{n=0}^{\mathrm{ntaps}-1} c[n] \sum_{b=0}^{\mathrm{nbits\_in}-1} x[n][b]\, 2^{b}$$
EQ 3
Changing the summation order gives the results shown in EQ 4:
$$y[n] = \sum_{b=0}^{\mathrm{nbits\_in}-1} 2^{b} \sum_{n=0}^{\mathrm{ntaps}-1} c[n]\, x[n][b] = \sum_{b=0}^{\mathrm{nbits\_in}-1} 2^{b}\, T(X[b])$$
EQ 4
where
$$T(X[b]) = \sum_{n=0}^{\mathrm{ntaps}-1} c[n]\, x[n][b]$$
and X[b] is the collection of the bth bits of the ntaps different taps.
Note that x[n][b] can only be 0 or 1, so there are 2^ntaps
different values of T. If T is pre-calculated and stored
inside a RAM or ROM, the FIR computation becomes
nbits_in table lookup operations using X[b] and
nbits_in – 1 additions. Multiplication operations are
eliminated.
In summary, the FIR computation using DA for one point
of FIR requires:
nbits_in table lookups + (nbits_in – 1) additions.
The cost to eliminate multiplication is a memory block to
store 2^ntaps pre-computed values.
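The derivation in EQ 2 through EQ 4 can be checked with a small software model. The Python sketch below is a minimal illustration, assuming unsigned input samples; the function names build_da_lut and fir_point_da and the choice to map tap n to address bit n are assumptions of this sketch, not the generator's actual implementation.

```python
def build_da_lut(coeffs):
    """Pre-compute T for every possible address.

    Bit n of the address stands for x[n][b], so each entry is the sum of the
    coefficients whose address bit is 1. The table has 2**ntaps words.
    """
    ntaps = len(coeffs)
    return [
        sum(coeffs[n] for n in range(ntaps) if (addr >> n) & 1)
        for addr in range(1 << ntaps)
    ]


def fir_point_da(lut, taps, nbits_in):
    """Compute one FIR output point with nbits_in table lookups and no multiplies.

    taps holds the ntaps unsigned samples currently in the filter; signed
    inputs are outside the scope of this sketch.
    """
    acc = 0
    for b in range(nbits_in):
        # Gather the b-th bit of every tap into one lookup address, X[b].
        addr = 0
        for n, x in enumerate(taps):
            addr |= ((x >> b) & 1) << n
        # Weight the looked-up T(X[b]) by 2**b with a shift, then accumulate.
        acc += lut[addr] << b
    return acc


if __name__ == "__main__":
    coeffs = [3, -1, 4, 2]
    taps = [5, 9, 0, 15]
    lut = build_da_lut(coeffs)
    # Both lines print the same value: DA result and direct multiply-accumulate.
    print(fir_point_da(lut, taps, nbits_in=4))
    print(sum(c * x for c, x in zip(coeffs, taps)))
```

The table size doubles with every added tap, which is why the lookup table, rather than the arithmetic, dominates the cost for long filters.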
The serialization of table lookup and addition is possible
because table T is the same for each b. If one table
lookup and one addition can be finished in one cycle, the
total computation will finish in nbits_in cycles. The serialization
of the FIR introduces a further opportunity to reduce the
size of the design, which is the key to an efficient FPGA
design.
Example Design of an FIR Filter Using DA
An example of an FIR with four taps (ntaps = 4) and four
bits for inputs (nbits_in = 4) is shown in Figure 4. The
expression x[n][b] represents the bth bit of input x[n]. In
the first cycle of the example, all 0th bits of inputs x[n] to
x[n – 3] are fed into the lookup table as an input address;
in the second cycle, all 1st bits of inputs x[n] to x[n – 3]
are fed into the lookup table; in the third cycle, all 2nd
bits of inputs x[n] to x[n – 3] are fed into the lookup
table; and in the fourth cycle, all 3rd bits of inputs x[n] to
x[n – 3] are fed into the lookup table. The shifter shifts
the outputs of the lookup table for the inputs of the
adder, which accumulates the final result.
[Figure 4 blocks: input bits x[n][3:0] through x[n – 3][3:0] address the Lookup Table, followed by a Shifter, an Adder with a Register, and Flow Control.]
Figure 4 • Example Implementation of a Bit-Serialized FIR Filter Using DA
The serialized DA implementation in Figure 4 uses a lookup table with 16 words and takes four clock cycles to finish one FIR point.
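The four-cycle sequence of the example can be traced in software. The Python sketch below is illustrative only: the coefficient and sample values are made up, the function name da_cycle_trace is hypothetical, and the mapping of tap k to address bit k is an assumption. It records the lookup address and the accumulator value after each cycle.

```python
def da_cycle_trace(coeffs, taps, nbits_in):
    """Return (cycle, lookup address, accumulator) for each bit-serial cycle.

    taps[k] stands for the unsigned sample x[n - k]; cycle b presents the
    b-th bit of every tap to the lookup table, as in the four-tap example.
    """
    ntaps = len(coeffs)
    # Lookup table with 2**ntaps words (16 words for four taps).
    lut = [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << ntaps)
    ]
    acc = 0
    trace = []
    for cycle in range(nbits_in):
        addr = sum(((x >> cycle) & 1) << k for k, x in enumerate(taps))
        acc += lut[addr] << cycle      # shifter aligns the word, adder accumulates
        trace.append((cycle, addr, acc))
    return trace


if __name__ == "__main__":
    for row in da_cycle_trace([1, 2, 3, 4], [0b1010, 0b0110, 0b0001, 0b1111], 4):
        print(row)
```

After the fourth cycle the accumulator holds the same value as the direct sum of products, 1×10 + 2×6 + 3×1 + 4×15 = 85.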
Storage and Large Number of Taps
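As seen in the previous section, the size of the lookup table is 2^ntaps, which increases exponentially with ntaps. A
design with a large number of taps needs to have several lookup tables. Let ntaps = p × q. If we split the taps into p
groups, each group has q taps. Then the FIR becomes as shown in EQ 5:
$$y[n] = \sum_{b=0}^{\mathrm{nbits\_in}-1} 2^{b} \sum_{n=0}^{\mathrm{ntaps}-1} c[n]\, x[n][b] = \sum_{b=0}^{\mathrm{nbits\_in}-1} 2^{b} \sum_{n=0}^{pq-1} c[n]\, x[n][b]$$
EQ 5
By splitting the sum over ntaps into a two-level summation, we have the result shown in EQ 6:
$$y[n] = \sum_{b=0}^{\mathrm{nbits\_in}-1} 2^{b} \sum_{i=0}^{p-1} \sum_{j=0}^{q-1} c[iq + j]\, x[iq + j][b]$$
EQ 6
Each inner sum over j depends on only q input bits, so it can be stored in its own lookup table of 2^q words. The
total storage is then p × 2^q words instead of 2^(pq) words.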
Refer to the "DA FIR Filter with Large Number of Taps" section on page 13 for further information.
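The two-level summation in EQ 6 can be illustrated with a short Python sketch. It assumes unsigned inputs and a tap count that divides evenly into groups of q; the function names build_split_luts and fir_point_split are hypothetical and do not describe the generator's internal structure.

```python
def build_split_luts(coeffs, q):
    """Build one 2**q-word lookup table per group of q consecutive taps."""
    if len(coeffs) % q != 0:
        raise ValueError("this sketch assumes ntaps is a multiple of q")
    luts = []
    for start in range(0, len(coeffs), q):
        group = coeffs[start:start + q]
        luts.append([
            sum(c for j, c in enumerate(group) if (addr >> j) & 1)
            for addr in range(1 << q)
        ])
    return luts


def fir_point_split(luts, taps, nbits_in, q):
    """One FIR output point: per bit, look up every group and add the results."""
    acc = 0
    for b in range(nbits_in):
        partial = 0
        for i, lut in enumerate(luts):
            # Address for group i is built from bit b of taps i*q .. i*q + q - 1.
            addr = sum(((taps[i * q + j] >> b) & 1) << j for j in range(q))
            partial += lut[addr]
        acc += partial << b
    return acc


if __name__ == "__main__":
    coeffs = [1, -2, 3, -4, 5, -6, 7, -8]
    taps = [1, 2, 3, 4, 5, 6, 7, 8]
    luts = build_split_luts(coeffs, q=4)
    print(sum(len(lut) for lut in luts), "words instead of", 1 << len(coeffs))
    print(fir_point_split(luts, taps, nbits_in=4, q=4))
```

For eight taps split into two groups of four, the storage drops from 256 words to 32 words at the cost of one extra addition per bit.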
DA Functional Block Description
The functional blocks shown in Figure 3 on page 3
illustrate the architecture of the generated FIR filter
using the DA algorithm.
Input Buffers
The Input Buffers block stores the input data, which
contains ntaps data points, where ntaps is the number of
taps of the FIR filter. The Input Buffers block also
circulates the data bits to address the DA lookup tables
(LUTs) required by the DA algorithm. An optional
function of the Input Buffers block is to share its storage
with the DA LUT generator. The coefficients used in
computing the LUT content can be stored in the input
buffers when a design uses the embedded RAM blocks
for the LUTs.
DA Lookup Tables (LUTs)
The DA LUTs store the LUT contents for the distributed arithmetic
algorithm. The generator implements the DA LUTs in two
ways: (1) synthesized ROM using FPGA cells and (2)
embedded RAM blocks supported by the on-chip DA LUT
Generator. The first method is for an FPGA without
embedded RAM blocks, intended primarily for a small
FIR filter. The latter is for an FPGA with embedded RAM
blocks. FIR filters with a large number of taps may
require multiple LUTs.
Shifter and Accumulator
The Shifter and Accumulator block performs additions
with LUT outputs and the alignments of LUT outputs
required by the DA algorithm. Multiple accumulators
and shifters may be needed to implement an FIR filter
with a large number of taps.
DA LUT Generator
The DA LUT Generator computes the LUT contents
required by the distributed arithmetic algorithm. It reads
the coefficients from the Input Buffers block and writes
the LUT words into the embedded RAM blocks. This
block is available only for designs that use embedded
RAM blocks as LUTs. The DA LUT Generator produces LUT
contents for multiple LUTs when implementing an FIR
filter with variable coefficients. Refer to "DA LUT
Generation" on page 14 and "I/O Timing Diagram of LUT
Initialization" on page 16 for detailed information on
initialization of the DA LUT.
Control
The state machine inside the Control block controls the
operations of all other blocks. It controls the input
buffers to ensure they operate based on the specified
system clock rate and sample rate, monitors input enable
and coefficient input enable, and circulates input data
bits to address the DA LUTs. It also controls the shifters
and accumulators to ensure they operate based on the
requested FIR configuration and DA algorithm. The
Control block coordinates the initialization of the LUTs
by the DA LUT generator when using embedded RAMs.
The Control logic is designed to support folding or
serialization of computation when the system clock rate
is substantially higher than the data sampling rate.
CoreFIR Generator Parameters in DA Mode
CoreFIR generates the RTL code for FIR filters with a variety of parameters. These parameters include generic FIR
parameters such as number of taps, number of input bits, number of coefficient bits, and data type, as well as
implementation parameters such as FPGA family, use of embedded RAMs, system clock rate, and data sampling rate.
CoreFIR supports the variations specified in Table 6.
Table 6 • CoreFIR Generator Configuration Parameters for Distributed Arithmetic Mode

                                                                  Recommended Selection
Parameter Name   Description                                      Axcelerator   ProASICPLUS   SX-A
module_name      Name of generated module                         –             –             –
nbits_input      Number of input data bits                        2–24          2–24          2–24
nbits_coef       Number of coefficient data bits                  2–24          2–24          2–24
ntaps            Number of taps                                   2–128         2–64          2–32
tap              Array of coefficients                            –             –             –
data_signed      Data type: 0 = unsigned, 1 = signed              0, 1          0, 1          0, 1
fpga_family      FPGA family                                      ax            apa           sxa
coef_fixed       0 = filter with configurable coefficients;       –             –             –
                 1 = filter with fixed coefficients
sys_clk_frq      Input clock frequency                            –             –             –
sample_ratio     Sampling rate = sys_clk_frq/sample_ratio         ≥ nbits_in    ≥ nbits_in    ≥ nbits_in
Module_lang      Reserved; VHDL only                              VHDL          VHDL          VHDL
Refer to "Appendix I" on page 17 for a sample usage of
the parameters shown in Table 6 in a configuration file
for the CoreFIR Generator. Detailed discussion about
these parameters follows.
Asymmetric FIR and Symmetric FIR
The CoreFIR Generator supports asymmetric FIR filters
only. Symmetric FIR filters will be supported in future
releases.
Number of Taps (ntaps)
The CoreFIR Generator supports the number of taps
specified by the device family in Table 6. The variable
ntaps in the configuration file specifies the setting of this
parameter. Refer to "Appendix I" on page 17 for details.
Embedded RAM as LUTs (coef_fixed)
The CoreFIR Generator uses the coef_fixed switch to determine
whether to implement DA LUTs with embedded RAM
blocks or with synthesized ROM using FPGA cells. The
LUTs are implemented with synthesized ROM using FPGA
cells when coef_fixed is equal to 1. The LUTs are
implemented with the embedded RAM blocks available in
Axcelerator, ProASICPLUS, and ProASIC3 devices when
coef_fixed is equal to 0. For a filter design with fixed
coefficients, this parameter may be set to 1 even on an FPGA
device with embedded RAM, such as Axcelerator, RTAX-S,
ProASICPLUS, or ProASIC3, since the overhead of the DA
LUT Generator outweighs the benefits of using an
embedded RAM block as a LUT. The coef_fixed
configuration parameter is valid only when the
configuration parameter fpga_family is set to ax, apa, or
pa3. Refer to Table 6 on page 11 and "Appendix I" on
page 17 for details.
Number of Input (nbits_in) and Coefficient
(nbits_coef) Bits
The CoreFIR Generator supports the number of input
and coefficient bits for the device families specified in
Table 6. These parameters are set with the variables
nbits_in and nbits_coef in the configuration file. Refer to
"Appendix I" on page 17 for details.
Number of Bits of Output (nbits_out)
The CoreFIR Generator supports only full-precision
computation. Thus, the number of output bits is
determined by the number of input bits, the number of
coefficient bits, and the number of taps, within the limits
for the device family specified in Table 6. The number
of output bits is specified by EQ 7:
nbits_out = nbits_in + nbits_coef + ceil(log2(ntaps)) – 1
EQ 7
where ceil is the ceiling function for floating point data.
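As a quick check of EQ 7, the following Python sketch evaluates the output width for a given configuration. The function name nbits_out is taken from the parameter name in the text; computing ceil(log2(ntaps)) with integer arithmetic through bit_length is a choice made for this sketch to avoid floating-point rounding.

```python
def nbits_out(nbits_in, nbits_coef, ntaps):
    """Full-precision output width per EQ 7."""
    if ntaps < 2:
        raise ValueError("ntaps must be at least 2")
    # For ntaps >= 1, (ntaps - 1).bit_length() equals ceil(log2(ntaps)).
    growth = (ntaps - 1).bit_length()
    return nbits_in + nbits_coef + growth - 1


if __name__ == "__main__":
    # 8-bit inputs, 16-bit coefficients, 8 taps: 8 + 16 + 3 - 1 = 26 bits.
    print(nbits_out(8, 16, 8))
    # 16-bit inputs, 16-bit coefficients, 128 taps: 16 + 16 + 7 - 1 = 38 bits.
    print(nbits_out(16, 16, 128))
```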
Signed/Unsigned Inputs and Coefficients
(data_signed)
The CoreFIR Generator supports signed or unsigned
operations. The generator supports two cases: both
input and coefficient are unsigned, or both input and
coefficient are signed. It supports an unsigned
implementation when the configuration parameter
data_signed is equal to 0, and a signed implementation
when data_signed is equal to 1. Refer to Table 6 on
page 11 and "Appendix I" on page 17 for details.
System Clock Frequency (sys_clk_frq)
The CoreFIR Generator reads in the system clock
frequency via configuration parameter sys_clk_frq. The
generated testbench assigns this frequency to its clock
generation. The generated design runs at this frequency
inside the testbench. The configuration parameter
should be specified in MHz. Refer to Table 6 on page 11
and "Appendix I" on page 17 for details.
Sample Ratio (sample_ratio)
The CoreFIR Generator supports a configuration
parameter, sample_ratio, that specifies the sampling rate
against the system clock frequency. The data sampling
rate is equal to sys_clk_frq/sample_ratio. This parameter
provides guidance in implementing a folding
architecture to reduce the size of the design. The
configuration parameter sample_ratio can only be a
positive integer greater than 1. Refer to Table 6 on
page 11 and "Appendix I" on page 17 for details.
Module Name (module_name)
The CoreFIR Generator supports a configuration
parameter, module_name, that specifies the name of the
generated module. The generated testbench has the
name <module_name>_tb. Refer to Table 6 on page 11
and "Appendix I" on page 17 for details.
FPGA Family (fpga_family)
The CoreFIR Generator supports a configuration
parameter, fpga_family, that specifies the targeted Actel
FPGA device family. The options are ax, apa, pa3, and
sxa. The option for RTAX-S is ax. The option for RTSX-S is
sxa. Refer to Table 6 on page 11 and "Appendix I" on
page 17 for details.
Architecture Variations
The DA algorithm for FIR provides an excellent solution,
but also introduces many variations on the design
architecture due to FPGA resource limitations.
DA FIR Filter with Large Number of Taps
This section provides an example of CoreFIR in DA mode
with a large number of taps.
As shown in the "Distributed Arithmetic Algorithm Overview" section
on page 8, the number of words of the DA LUT is 2^ntaps,
which increases exponentially with ntaps. A LUT splitting
method, as defined in the "Storage and Large Number of
Taps" section on page 9, effectively reduces memory
usage. The CoreFIR Generator uses this method and
usually splits the coefficients into groups of eight or nine
taps for each LUT when embedded RAM blocks are available.
An example of the split lookup table implementation of
an FIR filter with eight taps (ntaps = 8) and four input
bits (nbits_in = 4) is shown in Figure 5. In the example,
the eight taps have been split into two groups. Each group
has four taps and addresses a separate lookup table. This
differs from the case in Figure 4 on page 9, which has
only one LUT.
[Figure 5 blocks: input bits x[n][3:0] through x[n – 7][3:0] address two Lookup Tables, each followed by a Shifter and an Adder with a Register; a final Adder combines both paths into Output. Flow Control sequences the blocks.]
Figure 5 • Example of Split Lookup Table Implementation
Folding
The system clock rate of many FIR filter systems is a
multiple of the data rate (or data sampling rate). For a
typical FPGA implementation, the size of the design is
key to efficiency. Thus, exploiting
the ratio between the system clock rate and data rate is
an effective approach to reduce the size of the design. In
other words, folding or serialization of the computation
can reduce the size of the design. The DA algorithm for
FIR introduces bit-serialization of the operations. This
property of DA can be very efficient for exploiting the
ratio between system clock rate and data rate. If the
number of input bits is nbits_in, it takes nbits_in table
lookups and additions to finish one output point of the
FIR. If the system clock rate is nbits_in times faster than
the data rate, the serialization of table lookup and
additions is done with the optimized timing. The
parameter sample_ratio defines the ratio between the
system clock rate (sys_clk_frq) and the data sampling rate
(data_rate), as shown in EQ 8:
sample_ratio = sys_clk_frq/data_rate
EQ 8
CoreFIR supports folding when sample_ratio is greater
than or equal to nbits_in. The serialized operations of
table lookup and addition are done in nbits_in clock
cycles of the system clock, and the design is idle during
the remaining sample_ratio – nbits_in cycles. The
generator only requires that sample_ratio be an
integer; that is, the system clock rate must be an exact
multiple of the data rate. Future releases may support a
sample_ratio less than nbits_in.
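The folding constraint and EQ 8 can be expressed as a small calculation. The Python sketch below is illustrative; the function name folding_schedule and the returned dictionary keys are hypothetical, while the checks follow the stated rules that sample_ratio is an integer greater than 1 and at least nbits_in.

```python
def folding_schedule(sys_clk_frq_mhz, sample_ratio, nbits_in):
    """Derive the data rate and the busy/idle cycle split for one output point."""
    if not isinstance(sample_ratio, int) or sample_ratio <= 1:
        raise ValueError("sample_ratio must be an integer greater than 1")
    if sample_ratio < nbits_in:
        raise ValueError("folding requires sample_ratio >= nbits_in")
    return {
        # EQ 8 rearranged: data_rate = sys_clk_frq / sample_ratio
        "data_rate_msps": sys_clk_frq_mhz / sample_ratio,
        # One table lookup and addition per input bit.
        "active_cycles": nbits_in,
        # The design idles for the rest of the sample period.
        "idle_cycles": sample_ratio - nbits_in,
    }


if __name__ == "__main__":
    # 100 MHz system clock, one sample every 16 clocks, 12-bit inputs.
    print(folding_schedule(100.0, 16, 12))
```

With a 100 MHz clock and sample_ratio = 16, the data rate is 6.25 Msps; a 12-bit input keeps the datapath busy for 12 of every 16 cycles.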
DA LUTs Using FPGA Cells
Some Actel FPGA families, such as SX-A and RTSX-S, do
not have an embedded RAM implementation. In this
case, the CoreFIR Generator requires that the lookup
table be hard-coded as ROM using FPGA cells. This
configuration does not need the DA LUT Generator
shown in Figure 3 on page 3. The generator selects this
configuration when the configuration parameter
coef_fixed is set to 1 or the configuration parameter
fpga_family is not set to ax, apa, or pa3.
DA LUTs Using Embedded RAM Blocks
Many Actel FPGA families have embedded RAM blocks.
The CoreFIR generator takes advantage of these
embedded RAM blocks, and the DA LUTs are
implemented using embedded RAM. This configuration
requires additional overhead in that the embedded RAM
blocks must be initialized by a DA LUT Generator as
shown in Figure 3 on page 3. The generator selects this
configuration when the configuration parameter
coef_fixed is set to 0 and the configuration parameter
fpga_family is set to ax, apa, or pa3.
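The selection rule stated in the two subsections above can be summarized as a short decision function. This Python sketch is illustrative only; the function name lut_implementation and the returned labels are hypothetical, while the parameter names and values (coef_fixed, fpga_family, ax, apa, pa3) come from the text.

```python
# Families whose fpga_family setting allows embedded RAM LUTs, per the text.
RAM_FAMILIES = {"ax", "apa", "pa3"}


def lut_implementation(fpga_family, coef_fixed):
    """Return which DA LUT style the stated rule selects."""
    if coef_fixed == 0 and fpga_family in RAM_FAMILIES:
        # Embedded RAM LUTs, initialized at run time by the DA LUT Generator.
        return "embedded_ram"
    # coef_fixed = 1, or a family without embedded RAM: ROM built from FPGA cells.
    return "synthesized_rom"


if __name__ == "__main__":
    for family, fixed in [("ax", 0), ("ax", 1), ("sxa", 0)]:
        print(family, fixed, lut_implementation(family, fixed))
```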
DA LUT Generation
The DA LUT Generator computes the LUT contents of the
distributed arithmetic algorithm. It reads the coefficients
from the Input Buffers block and writes the LUT words
into the embedded RAM blocks. This block is only
available when using embedded RAM blocks as LUTs
(when coef_fixed is set to 0). After the reset is complete,
the DA LUT Generator waits for the Input Buffers
block to signal that the coefficients are loaded into the
input buffers. The DA LUT Generator then computes
the LUT words and writes them into the embedded RAM
blocks. The DA LUT Generator produces LUT contents for
multiple LUTs when implementing an FIR filter with a
large number of taps. The generator has only one
computation engine and initializes multiple LUTs
sequentially. After the RAM blocks are initialized,
the ready output goes high to let the system know
that the FIR filter is ready to accept data.
Input Buffering Scheme
The Input Buffers block always performs the functions
defined in the "Input Data" section, but only
performs the functions defined in the "Input Coefficients" section
when the embedded RAM blocks are used as the DA LUTs
(when coef_fixed is set to 0).
Input Data
The input dataflow uses the scheme shown
in Figure 6 to reduce the size of registers. Horizontal
movement of the input data feeds input bits into
the lookup table on every cycle.
Vertical movement of input data occurs only when the most
significant bit (MSB) is fed into the lookup table, at which point
the buffer switches to the next FIR data point.
x[n][3]    x[n][2]    x[n][1]    x[n][0]
x[n-1][3]  x[n-1][2]  x[n-1][1]  x[n-1][0]
x[n-2][3]  x[n-2][2]  x[n-2][1]  x[n-2][0]
x[n-3][3]  x[n-3][2]  x[n-3][1]  x[n-3][0]
Figure 6 • Example of Input Buffering Scheme
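To make the bit-serial dataflow of Figure 6 concrete, the Python sketch below forms one lookup-table address per clock cycle by taking the same bit position from every buffered data point. It assumes the least significant bit is fed first and the MSB last, which matches the statement that the buffer moves to the next data point when the MSB is fed. The function name and the mapping of taps to address bits are illustrative assumptions, not the generated hardware.

```python
def lut_addresses(samples, nbits_in):
    """Return one LUT address per clock cycle, LSB first and MSB last.

    samples[k] stands for x[n-k]; address bit k carries the current bit of x[n-k].
    """
    addresses = []
    for bit in range(nbits_in):
        addr = 0
        for tap, x in enumerate(samples):
            addr |= ((x >> bit) & 1) << tap
        addresses.append(addr)
    return addresses


# Four 4-bit data points, as in the 4 x 4 example of Figure 6.
window = [0b1010, 0b0110, 0b0001, 0b1111]   # x[n], x[n-1], x[n-2], x[n-3]
print(lut_addresses(window, nbits_in=4))     # [12, 11, 10, 9]
```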
Input Coefficients
The CoreFIR generator shares the input buffers for
coefficient input when embedded RAM blocks are used
as the DA LUTs (the configuration parameter coef_fixed
is set to 0). In this configuration, the width of the input
datai is the maximum of nbits_in and nbits_coef. The
input datai reads in coefficients when input coefi_en is
high. After enough coefficients are fed into the buffer,
coefi_en is ignored and the coefficients stay inside the
input buffers until the DA LUT Generator finishes the
initialization of the embedded RAM blocks.
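For orientation, the LUT words derived from the loaded coefficients can be sketched with the standard distributed arithmetic construction: the word at a given address is the sum of the coefficients whose address bit is set. The Python below is a minimal model under stated assumptions (unsigned data, a single LUT, hypothetical function names); it does not reproduce the word layout, signed-data handling, or multi-LUT splitting of the actual DA LUT Generator.

```python
def build_da_lut(coefs):
    """LUT[a] = sum of coefs[k] for every bit k that is set in address a."""
    lut = [0] * (1 << len(coefs))
    for addr in range(1, len(lut)):
        low = addr & -addr              # lowest set bit of the address
        k = low.bit_length() - 1        # tap index of that bit
        lut[addr] = lut[addr ^ low] + coefs[k]
    return lut


def da_fir_output(lut, samples, nbits_in):
    """Accumulate one LUT word per input bit, weighted by the bit position."""
    acc = 0
    for bit in range(nbits_in):
        addr = 0
        for tap, x in enumerate(samples):
            addr |= ((x >> bit) & 1) << tap
        acc += lut[addr] << bit
    return acc


coefs = [8, 14, 21, 27]        # first four tap values of the Appendix I sample
samples = [200, 17, 0, 255]    # arbitrary unsigned 8-bit data points
lut = build_da_lut(coefs)
direct = sum(c * x for c, x in zip(coefs, samples))
print(da_fir_output(lut, samples, nbits_in=8), direct)
```

The two printed values agree, which shows that nbits_in lookups and additions reproduce the multiply-accumulate result without a multiplier.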
User Interface
The generator executable reads one command line
parameter, which is the name of the configuration file. It
generates RTL code for the module and testbench based
on the parameters in the configuration file. Refer to
Table 6 on page 11 and "Appendix I" on page 17 for
details of the configuration file.
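The exact configuration file syntax is defined in Table 6 and the appendices. As a hedged illustration only, the Python sketch below reads a file in which each parameter name is followed by one or more whitespace-separated values. The key names are copied from the appendix samples; the parsing rule itself and the function name are assumptions, not the generator's documented behavior.

```python
KNOWN_KEYS = {
    "module_name", "nbits_input", "nbits_coef", "ntaps", "tap", "taps",
    "data_signed", "fpga_family", "coef_fixed", "sys_clk_frq",
    "sample_ratio", "module_lang",
}


def parse_config(text):
    """Map each parameter name to the list of value tokens that follow it."""
    cfg = {}
    key = None
    for token in text.split():
        if token in KNOWN_KEYS:
            key = token
            cfg[key] = []
        elif key is None:
            raise ValueError("value %r appears before any parameter name" % token)
        else:
            cfg[key].append(token)
    return cfg


SAMPLE = """
module_name firtest
nbits_input 8
nbits_coef 5
ntaps 13
tap 8 14 21 27 31 31 27 21 14 8 4 2 1
data_signed 0
fpga_family ax
coef_fixed 1
sys_clk_frq 25
sample_ratio 16
module_lang vhdl
"""

cfg = parse_config(SAMPLE)
taps = [int(t) for t in cfg["tap"]]
print(cfg["module_name"][0], len(taps), int(cfg["ntaps"][0]))
```

Because the sketch splits on any whitespace, it accepts a value on the same line as its parameter name or on the following line.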
I/O Signal Description
All following sections apply to both constant coefficient and distributed arithmetic architectures.
The FIR filter generated by the CoreFIR Generator consists of the I/O signals defined in Table 7 (see Figure 7).
Table 7 • I/O Signal Description
I/O Signal     Direction  Width          Polarity     Description
clk            Input      1              N/A          Master clock, positive edge
rstn           Input      1              Active low   Master reset, asynchronous
datai_en       Input      1              Active high  Input data enable
datai (1)      Input      nbits_in       N/A          Input data or coefficients (1)
datao_valid    Output     1              Active high  Output data valid
datao          Output     nbits_out (3)  N/A          Output data
coefi_en (2)   Input      1              Active high  Coefficient input enable
ready (2)      Output     1              Active high  Ready to input datai
Notes:
1. Input datai is also the input for coefficients for design using embedded RAMs as DA LUTs. In this case the width can be the
maximum of nbits_in and nbits_coef.
2. Ports coefi_en and ready are only available when coef_fixed = 0.
3. Refer to the "Number of Bits of Output (nbits_out)" section on page 11 for details.
4. Refer to Table 6 on page 11 for nbits_in and nbits_coef.
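The port rules in Table 7 and its notes can be condensed into a short Python sketch that lists the ports of a generated module. The function name is hypothetical, nbits_out is passed in rather than computed because its formula is given elsewhere in the datasheet, and the sketch assumes that coef_fixed = 0 always means the embedded RAM configuration in which datai also carries coefficients.

```python
def port_list(nbits_in, nbits_coef, nbits_out, coef_fixed):
    """Return (name, direction, width) tuples following Table 7 and its notes."""
    # Note 1: datai also carries coefficients when embedded RAMs are the DA LUTs.
    datai_width = nbits_in if coef_fixed else max(nbits_in, nbits_coef)
    ports = [
        ("clk", "in", 1),
        ("rstn", "in", 1),
        ("datai_en", "in", 1),
        ("datai", "in", datai_width),
        ("datao_valid", "out", 1),
        ("datao", "out", nbits_out),
    ]
    if coef_fixed == 0:
        # Note 2: these two ports exist only when coef_fixed = 0.
        ports += [("coefi_en", "in", 1), ("ready", "out", 1)]
    return ports


for name, direction, width in port_list(8, 12, 20, coef_fixed=0):
    print(name, direction, width)
```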
FIR block: inputs clk, rstn, datai_en, datai, coefi_en*; outputs datao_valid, datao, ready*
Note: *coefi_en and ready are available when coef_fixed = 0.
Figure 7 • I/O Signals
Clock and Reset
Clock
CoreFIR generates an FIR filter design that uses only
positive-edge-triggered registers. The entire design is
fully synchronized using the positive edge of the input
clock clk, including the embedded RAM blocks (when
available).
Reset
CoreFIR generates a design that uses only one active low
asynchronous reset. The entire design is asynchronously
reset by the input rstn.
Input and Output Timing
This section applies to both constant coefficient and
distributed arithmetic architectures.
I/O Timing Diagram of Normal FIR Operation
The I/O timing under normal FIR operation is given in
Figure 8. The labels s0 and s2 refer to the data sampling
points for input data, and s1 is the sampling point for
output data. Because t0 and t1 vary with the configuration,
refer to the comments in the generated module for their
values. These parameters are given in clock
cycles of the input clock, clk.
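Timing diagram: waveforms for clk, datai_en, datai (samples 0 and 1), datao_valid, and datao (sample 0), with sampling points s0, s1, and s2 and intervals t0 and t1 marked.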
Figure 8 • I/O Timing Diagram of Normal FIR Operation
I/O Timing Diagram of LUT Initialization
The I/O timing for LUT initialization is given in Figure 9.
In this figure, s0 and s1 are the starting and ending
points for feeding coefficients, and s2 is the sampling
point for the output ready. Due to variation of the
configuration, refer to the comments inside the
generated module for t2. These parameters are given in
the number of clock cycles of the input clock clk.
Timing diagram: waveforms for clk, coefi_en, datai (coefficients 0, 1, 2, ..., ntaps – 1), and ready, with points s0, s1, and s2 and interval t2 marked.
Figure 9 • I/O Timing Diagram for LUT Initialization
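The coefficient-loading phase between s0 and s1 can be modeled in a few lines of Python. This is a behavioral sketch only: the class name is hypothetical, one call to clock() stands for one rising edge of clk, and the delay t2 before ready goes high is not modeled because it depends on the configuration.

```python
class CoefficientBuffer:
    """Accepts coefficients while coefi_en is high until ntaps values are stored."""

    def __init__(self, ntaps):
        self.ntaps = ntaps
        self.coefs = []

    @property
    def full(self):
        return len(self.coefs) == self.ntaps

    def clock(self, coefi_en, datai):
        """Model one rising clk edge."""
        if coefi_en and not self.full:
            self.coefs.append(datai)
        # Once the buffer is full, coefi_en is ignored and the contents are held.


taps = [8, 14, 21, 27, 31, 31, 27, 21, 14, 8, 4, 2, 1]   # Appendix I sample taps
buf = CoefficientBuffer(ntaps=len(taps))
for value in taps + [99, 99]:        # two extra writes after the buffer is full
    buf.clock(coefi_en=1, datai=value)
print(buf.full, buf.coefs == taps)
```

The two extra writes leave the stored coefficients unchanged, matching the statement that coefi_en is ignored once enough coefficients have been fed in.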
Appendix I
Sample Configuration File for DA Mode
The following is a sample configuration file for Distributed Arithmetic Mode.
module_name   firtest
nbits_input   8
nbits_coef    5
ntaps         13
tap           8 14 21 27 31 31 27 21 14 8 4 2 1
data_signed   0
fpga_family   ax
coef_fixed    1
sys_clk_frq   25
sample_ratio  16
module_lang   vhdl
Appendix II
Sample Configuration File for CC Mode
The following is a sample configuration file for Constant Coefficient Mode.
module_name   fir_const
nbits_input   12
nbits_coef    16
ntaps         16
taps          1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
data_signed   1
Ordering Information
Order CoreFIR through your local Actel sales representative. Use the following naming convention when ordering:
CoreFIR-XX, where XX is listed in Table 8.
Table 8 • Ordering Codes
XX   Description
EV   Evaluation version
AR   RTL for unlimited use on Actel devices
UR   RTL for unlimited use and not restricted to Actel devices
List of Changes
The following table lists critical changes that were made in the current version of the document.
Previous Version  Page   Changes in Current Version (v4.0)
v3.0              1      The "Key Features" section was updated to include information for the constant coefficient algorithm.
                  6      Table 3 was updated to include Throughput.
                  3–13   Information was added for constant coefficient mode and the document was reorganized to make clear the differences between these modes.
                  17     Appendix II was added for Constant Coefficient Mode.
v2.1              1      The "Supported Families" section was updated to include Fusion.
                  6      Table 3 was updated to include Fusion data.
v2.0              1      The "Supported Families" section was updated.
                  6      Table 3 was updated.
Datasheet Categories
In order to provide the latest information to designers, some datasheets are published before data has been fully
characterized. Datasheets are designated as "Product Brief," "Advanced," and "Production." The definitions of these
categories are as follows:
Product Brief
The product brief is a summarized version of an advanced or production datasheet containing general product
information. This brief summarizes specific device and family information for unreleased products.
Advanced
This datasheet version contains initial estimated information based on simulation, other products, devices, or speed
grades. This information can be used as estimates, but not for production.
Unmarked (production)
This datasheet version contains information that is considered to be final.
Actel and the Actel logo are registered trademarks of Actel Corporation.
All other trademarks are the property of their owners.
www.actel.com

Actel Corporation
2061 Stierlin Court
Mountain View, CA
94043-4655 USA
Phone 650.318.4200
Fax 650.318.4600

Actel Europe Ltd.
Dunlop House, Riverside Way
Camberley, Surrey GU15 3YL
United Kingdom
Phone +44 (0) 1276 401 450
Fax +44 (0) 1276 401 490

Actel Japan
www.jp.actel.com
EXOS Ebisu Bldg. 4F
1-24-14 Ebisu Shibuya-ku
Tokyo 150 Japan
Phone +81.03.3445.7671
Fax +81.03.3445.7668

Actel Hong Kong
www.actel.com.cn
Suite 2114, Two Pacific Place
88 Queensway, Admiralty
Hong Kong
Phone +852 2185 6460
Fax +852 2185 6488
51700056-3/5.06