CoreFFT Fast Fourier Transform Product Summary Synthesis and Simulation Support Intended Use • Fast Fourier Transform (FFT) Function for Actel FPGAs • Forward and Inverse 32-, 64-, 128-, 256-, 512-, 1,024-, and 2,048-Point Complex FFT • Decimation–In-Time (DIT) Radix-2 Implementation Optimized for Actel FPGAs • Selection of Unconditional or Conditional Block Floating-Point Scaling • Embedded Generator • 8- to 16-Bit Configurable Input/Output Data and Twiddle Coefficients Precision • Naturally Ordered Input and Output Data • Two’s-Complement Fixed-Point Arithmetic • Built-In Memory Buffers • CoreFFT Provides Register Transfer Level (RTL) Code and a Behavioral Testbench • ProASICPLUS® • Axcelerator® • RTAX-S RAM-Block-Based Twiddle Factor • CoreFFT RTL Generator; Generates UserDefined FFT Model and Test Harness; Fully Supported in Actel Libero® Integrated Design Environment (IDE) Evaluation Version – Supports FFT Engine and Test Harness Generation with Limited Parameters; Fully Supported in Actel Libero IDE May 2007 © 2007 Actel Corporation • Simulation: OVI-Compliant Verilog Simulators and Vital-Compliant VHDL Simulators CoreFFT is an RTL generator that produces an Actel FPGA–optimized FFT engine. The resulting module computes 32-, 64-, 128-, 256-, 512-, 1,024-, or 2,048-point complex forward or inverse decimation-in-time (DIT) FFTs. The input and output data is represented as bb-bit words comprising b-bit real and imaginary parts (bb = b + b; b = 8 to 16 bits). Both the real and imaginary parts of the input and output data are two’scomplement numbers. The FFT module contains all the necessary memory buffers and butterfly and control logic, as well as a twiddle factor generator. A dual input buffer and a single output buffer support simultaneous input of the new data samples with FFT computation and result output. The module processes frames (bursts) of data with a frame size equal to the transform size of N words. The FFT computational process occurs in a Full Version – Synthesis: Synplicity,® Synopsys® (Design Compiler / FPGA Compiler), Exemplar™ General Description Core Deliverables • • General Description ................................................... 1 CoreFFT Device Requirements ................................... 3 Architecture ................................................................ 4 Buffering Scheme ....................................................... 5 FFT Computation ........................................................ 5 Finite Word Length Considerations .......................... 6 CoreFFT Generator Parameters ................................. 7 I/O Signal Description ................................................ 8 I/O Interface and Timing ............................................ 9 References ................................................................ 11 A Sample Configuration File ................................... 11 Ordering Information .............................................. 11 Appendix I: Fast Fourier Transform ......................... 12 List of Changes ......................................................... 14 Datasheet Categories ............................................... 14 Targeted Devices ProASIC®3/E Actel Libero IDE Contents Key Features • • v 4 .0 1 CoreFFT Fast Fourier Transform sequence of stages with the final result obtained at the last computational stage. As soon as a final output vector is ready, the FFT module puts out an N-word frame of FFT results. A negative nreset signal resets the FFT module. After reset (input nreset taken HIGH), the module enters an initialization state where internal RAM-based lookup tables (LUTs) are initialized. Once initialization is complete, the CoreFFT module automatically switches to a ready state, prepared to receive data samples to be processed. The module input start can be used to bring the module to the ready state at any time after initialization. An FFT-based system (Figure 1) consists of the following: • A host presenting data to the FFT module to be processed • The FFT module • A host accepting processed data Note: The CoreFFT module will discard data collected in its input and output buffers when start is taken HIGH by the host. Note: Signals shown in parentheses are optional. The host may use/generate these optional signals if required by the application. HOST Source of Data to Be FFTed (Keep Supplying Data) FFT b-Bit Imaginary Data d_im load b-Bit Real Data d_re y_im b-Bit Imaginary Data Input Data Validity Bit d_valid y_re b-Bit Real Data (Start FFT) start y_valid FFT Result Validity Bit y_rdy (FFT Results are available) pong (1-Bit Data ID) nreset clk read_y HOST Receiver of FFTed Data (output of another FFTed sample) Global Reset Master Clock Figure 1 • FFT Module System Block Diagram The data-source host supplies the FFT engine with the data to be transformed. Every complex input data sample (i.e., a pair of b-bit imaginary and real words) is accompanied with a validity bit. Upon receiving the validity bit, the module assumes a valid complex data sample is present on both b-bit input data busses. Once the input data buffer is full, CoreFFT automatically starts processing data stored in the buffer. At this time, the host source should stop supplying the data to CoreFFT, thus ending a current burst of input data. The data-source host can do so either by counting the number of input samples transferred or by monitoring 2 v4.0 the state of the module signal, load. The CoreFFT module drives load LOW once N samples (N is a transform size) of the current burst are received into the input buffer. As soon as the dual input buffer is ready for the next data burst, load is asserted again. The module is then ready to receive the next data burst to be processed. The data-source host can supply data at a maximum of every clock cycle, or it may skip an arbitrary number of "empty" clock periods. The host signals to the module that no data is being transferred for a given clock cycle by taking the validity bit d_valid LOW. CoreFFT Fast Fourier Transform CoreFFT Device Requirements Once the CoreFFT module has computed the FFT, the results are written to the module’s output memory buffer, and signal y_rdy is taken HIGH. Every output sample is accompanied by a validity bit, y_valid, to indicate to the receiving host that a valid output sample is ready to be read from both b-bit output busses. The receiving host can control the output sample rate using the read_y input of the module. Asserting read_y indicates to the FFT module that the receiving host is ready to read samples. Deasserting read_y informs the module that the host is not ready. Any unread samples are held in the module’s output buffer until the host is ready. Table 1 and Table 2 on page 4 provide typical utilization and performance data for CoreFFT, which is implemented in various Actel devices with the configurations listed in Table 1 and Table 2 on page 4. Device utilization and performance will vary depending upon the FFT parameters used. The transform size parameter N primarily impacts the number of RAM blocks and the time required for transformation. "FFT Computation" on page 5 provides more details on how the FFT time depends on the transform size. Table 1 • CoreFFT Device Utilization and Performance (bit width b = 16) Cells or Tiles Total Utilization % RAM Blocks Device Speed Grade Clock Rate, MHz FFT Time, µsec 2,039 7,364 29.96% 14 –2 100 11 6,105 2,062 8,167 33.23% 14 –2 92 26 1,024 6,904 2,126 9,030 36.74% 28 –2 90 58 256 6,904 2,026 8,930 15.90% 28 Std 59 19 512 6,901 2,019 8,920 15.80% 28 Std 62 39 1,024 9,091 2,431 11,522 20.50% 56 Std 62 85 256 3,398 2,373 5,771 31.81% 7 –2 130 9 512 3,601 2,380 5,981 32.96% 14 –2 120 20 1,024 3,863 2,404 6,267 34.54% 28 –2 105 50 256 3,407 2,370 5,777 31.84% 7 –1 107 10 512 3,596 2,381 5,977 32.94% 14 –1 90 27 1,024 3,881 2,397 6,278 34.60% 28 –1 76 69 FPGA Family and Device FFT Points Comb. Seq. ProASIC3/E A3P1000 256 5,325 512 PLUS ProASIC APA1000 Axcelerator AX1000 RTAX-S RTAX1000S Notes: 1. Auto-scaling (block floating point) is enabled in all cases. 2. The above data were obtained by typical synthesis and place-and-route methods. Other core parameter settings can result in different utilization and performance values. 3. All memory buffers are RAM-block-based. 4. Timing constraints supplied with CoreFFT were used. 5. Timing-driven layout options were used, effort level 3, with no multiple passes. v4.0 3 CoreFFT Fast Fourier Transform Table 2 • CoreFFT Device Utilization and Performance (bit width b = 8) Cells or Tiles Total Utilization % RAM Blocks Device Speed Grade Clock Rate, MHz FFT Time, µsec 968 3,094 12.6% 7 –2 114 10 2,283 989 3,272 13.3% 7 –2 122 20 1,024 2,509 1,026 3,535 14.4% 14 –2 118 44 256 2,386 949 3,335 5.9% 14 Std 87 13 512 2,569 974 3,543 6.3% 14 Std 82 29 1,024 2,759 1,124 3,883 6.9% 28 Std 82 64 256 1,317 958 2,275 12.5% 7 –2 170 7 512 1,435 986 2,421 13.3% 7 –2 159 15 1,024 1,620 1,016 2,636 14.5% 14 –2 137 38 256 1,317 958 2,275 12.5% 7 –1 132 8 512 1,435 986 2,421 13.3% 7 –1 116 21 1,024 1,611 1,027 2,638 14.5% 14 –1 101 52 FPGA Family and Device FFT Points Comb Seq ProASIC3/E A3P1000 256 2,126 512 ProASICPLUS APA1000 Axcelerator AX1000 RTAX-S RTAX1000S Notes: 1. Auto-scaling (block floating point) is enabled in all cases. 2. The above data were obtained by typical synthesis and place-and-route methods. Other core parameter settings can result in different utilization and performance values. 3. All memory buffers are RAM-block-based. 4. Timing constraints supplied with CoreFFT were used. 5. Timing-driven layout options were used, effort level 3, with no multiple passes. Architecture The CoreFFT module input and output data are stored in on-chip RAM blocks. The input memory buffer is also used by the FFT processor as working memory where the FFT engine stores results obtained at any intermediate FFT stage. This dual memory usage is possible due to the in-place FFT algorithm implemented by the core. The twiddle factors (algorithm coefficients) used by the FFT processor are generated by CoreFFT and stored in a RAM-based LUT. In addition to the FFT processor, the resulting module also contains control logic and a host interface used for entering data and reading the FFT results. See Figure 2. Pong Buffer Mem 0 Read Switch Complex Input Data Radix-2 Butterfly P Q Mem 1 Twiddle LUT Figure 2 • CoreFFT Architecture 4 v4.0 Write Switch Data Buffer Ping Buffer Mem 0 Complex FFT Output Mem 1 Bit-Reversed Write Addr CoreFFT Fast Fourier Transform Buffering Scheme tree. The switch output samples are then routed to the butterfly P and Q inputs. Table 3 shows a 16-point FFT example of how Read Switch rearranges sample indices coming from the Mem 0 and Mem 1 of the input memory bank. The CoreFFT module has one radix-2 butterfly, two input memory banks implementing a ping-pong input buffer, and one output buffer (Figure 2 on page 4). Both of the identical memory banks can store N complex samples. Each bank consists of two memory blocks, each of N / 2 complex words, so it can read/write two complex samples per clock. Thus, the memory bandwidth is (4 × b) bits per clock cycle. Internal logic controls the ping-pong switching between the banks so a data-source host only sees the buffer ready to accept new data. The buffer not accepting data is used by the in-place FFT engine as working memory. Table 3 • Sample Indices before and after Read Switch for 16-Point FFT Example Input From Output Mem 0 Mem 1 14 15 P Q 7 15 Stage 1 This ping-pong buffering architecture increases the efficiency of the FFT engine. While one of the two identical ping-pong input banks is involved in current FFT computation, the other is available for the downloading of the next frame of input data. As a result, the FFT engine does not sit idle waiting for fresh data to fill the input buffer. From the data-source host standpoint, the core is capable of receiving a data burst anywhere within the FFT computational period. When the module has finished processing the current data frame and the input buffer bank has been filled with another data frame, the memory ping-pong banks are swapped, and the data load and computation continues on the alternate memory banks. 6 7 6 14 12 13 5 13 4 5 4 12 10 11 3 11 2 3 2 10 8 9 1 9 0 1 0 8 Stage 2 The last stage of the FFT computation uses an out-ofplace scheme—the FFT final results are routed to the output data buffer. The results appear at the FFT engine output in bit-reversed order. A bit-reversed write address is used when writing the results to the output buffer to restore the data to normal read address order. The last stage results remain valid until the FFT engine is ready to store the results of the next data frame. 14 15 11 15 10 11 10 14 12 13 9 13 8 9 8 12 6 7 3 7 2 3 2 6 4 5 1 5 0 1 0 4 14 15 13 15 12 13 12 14 10 11 9 11 8 9 8 10 6 7 5 7 4 5 4 6 2 3 1 3 0 1 0 2 Stage 3 The CoreFFT generator also calculates the twiddle factors required by the FFT algorithm. At power-up, the twiddle factors generated are written to the twiddle factor lookup table (Twiddle LUT). FFT Computation Stage 4 An FFT computational cycle starts when input data is stored in the active ping-pong buffer bank and the FFT engine has finished processing the previous N data samples. Each memory bank comprises two bb-bit-wide memories (Mem 0 and Mem 1), supplying a data bandwidth of (4 × b) bits per clock cycle (two complex samples per clock cycle). Even input samples Di (i = 0, 2, 4, …, N – ) are stored in Mem 0, odd samples in Mem 1. 14 15 14 15 12 13 12 13 10 11 10 11 8 9 8 9 6 7 6 7 4 5 4 5 2 3 2 3 The Read Switch function is used to rearrange the two sample pairs read from the input bank to match the input sample order required by the DIT FFT algorithm 0 1 0 1 v4.0 5 CoreFFT Fast Fourier Transform The radix-2 butterfly processes the data according to the DIT algorithm,[1] one pair of samples per clock. The Write Switch works similarly to the Read Switch, rearranging the butterfly results prior to being written back to the in-place memory bank. On the last stage of every FFT computational cycle, the results are written into the output memory buffer rather than back to the in-place memory bank. Figure 3 shows the FFT computational sequence. FFT Cycle i + 1 FFT Cycle i 1 2 FFT Stages 3 ... 1 log2N 2 Ping-Pong Input Buffer ... log2N Ping-Pong Input Buffer Ping bank is available for loading input data. Ping bank is busy. Pong bank is busy. Pong bank is available for loading input data. Output Buffer Output Buffer Available for reading results of cycle (i – 1) FFT Stages 3 Accepts FFT result Available for reading results of cycle i Accepts FFT result Figure 3 • FFT Computational Sequence the minimal read sample rate to avoid FFT engine idle time is Every FFT stage takes (N / 2 + L) clock cycles EQ 1 (N / 2 + L) (log2 N – 1) / N ≈ (log2 N – 1) / 2 clock cycles EQ 4 to complete, where N/2 = the number of butterflies performed within a stage to L = an implementation-specific parameter representing the aggregate latency of the memory bank, switches, and butterfly. L is much less than the number of butterflies required (N / 2) and does not depend on transform size N. As a result, the minimal input and output sample rates required to avoid FFT engine idle time depend on the transform size N (Table 4). be Table 4 • Minimal Input and Output Sample Rates Transform Size Input Sample Output Sample N, Points Rate, Clock Cycles Rate, Clock Cycles The full FFT cycle takes (N / 2 + L) log2 N clock cycles. 256 4 3 512 4 4 1,024 5 4 EQ 2 This time is available for the new frame of N data samples to be loaded into the memory bank not involved in the current FFT computation. To provide maximum FFT engine utilization (no idle time, FFT engine full loading), the minimal input sample rate that the host should provide is ((N / 2 + L) log2 N) / N ≈ (log2 N) / 2 clock cycles EQ 3 The host can read the output buffer during the first log2 N – 1 stages of the next FFT computational cycle (the last stage is used to write fresh FFT results). Therefore, 6 v4.0 Finite Word Length Considerations The butterfly calculation involves complex multiplication, addition, and subtraction. These operations can potentially cause the butterfly data width to grow by two bits from input to output.[1], [2] At every stage of the in-place FFT algorithm, the butterfly takes two samples out of the input buffer and returns two processed samples to the same buffer location. Potentially, returning samples may have a larger data width than the CoreFFT Fast Fourier Transform samples picked from the memory. Precautions must be taken to ensure that there are no data overflows. shows the total number of bits the data loses because of bit shifting in the FFT calculation. To avoid risk of overflow, one of three methods can be employed: • Input data scaling • Unconditional block floating-point scaling • Conditional block floating-point scaling log2 N – 1 bits EQ 6 Unconditional block floating-point scaling results in the same number of bits lost as in input data scaling. However, it produces more precise results, as the FFT engine starts with more precise input data. One way to ensure that overflow never occurs is to include enough extra sign bits, called guard bits, in the FFT input data. Data can grow by a maximum factor of 2.4 from butterfly input to output (two bits of growth). However, it is not possible for the data value to grow by this maximum amount in two consecutive stages. In conditional block floating-point scaling, data is shifted only if bit growth actually occurs. If one or more butterfly outputs grow, the entire block of data is shifted to the right. The conditional block floating-point monitor checks every butterfly output for growth. If shifting is necessary, it is performed after the entire stage is complete (at the input of the next stage butterfly). This technique provides the least amount of distortion (noise) caused by finite word length. The number of guard bits necessary to compensate for the maximum possible bit growth for an N-point FFT is log2 N + 1 EQ 5 The CoreFFT module is configured to apply conditional block floating-point scaling by default. In this mode, the input data is checked as well and, if necessary, downscaled by a factor of two prior to the first stage. For example, each of the input samples of a 256-point FFT should contain nine guard bits, leaving only seven bits for actual data. Obviously, the data bit resolution is greatly limited when using the input data scaling technique. The user can optionally select one of the other two scaling modes. To apply unconditional block floatingpoint scaling, the CoreFFT configuration parameter scale needs to be set to 1. To apply input data scaling, the scale configuration parameter has to take the default value of 0, and the FFT input data has to contain the proper number of guard bits. Then the conditional block floating-point scaling will take no effect. Another way to compensate for bit growth is to scale the butterfly outputs down by a factor of two after each stage. Consequently, the final FFT results are scaled down by a factor of 1 / N. This approach is called unconditional block floating-point scaling. Initially, two guard bits are included in the input data to accommodate the maximum bit growth at the very first stage. In each successive butterfly calculation, the data can grow into these guard bits. To prevent overflow in successive stages, the guard bits are replaced before the next stage is executed by shifting the entire block of data (all results of the current stage) one bit to the right. The input data of an unconditional block floating-point FFT can have at most 14 bits (1 sign bit and 13 magnitude bits). EQ 6 CoreFFT Generator Parameters CoreFFT generates RTL code for a few selectable FFT cores that vary depending on parameters set by the user when generating the module. The core generator supports the variations specified in Table 5. Table 5 • Core Generator Parameters Parameter Name Description Recommended Selection inv Forward/inverse FFT 0 (FFT) / 1 (IFFT) scale Unconditional block floating-point scaling 0 (conditional block floating-point) / 1 (unconditional block floating-point) points Transform size 32, 64,128, 256, 512, 1024, 2048 bits FFT engine bit width 8 to 16 fpga_family FPGA family ax (Axcelerator, RTAX-S), apa (ProASICPLUS), pa3 (ProASIC3) lang RTL code language vhdl, verilog v4.0 7 CoreFFT Fast Fourier Transform I/O Signal Description Figure 4 shows the CoreFFT module pinout. FFT load d_im d_re d_valid start pong read_y y_im y_re y_valid nreset y_rdy clk Figure 4 • CoreFFT I/O Signals The CoreFFT module I/O signal functionality is listed in Table 6. It is assumed that the module has been configured to compute an N-point FFT/IFFT. Table 6 • I/O Signal Description Signal Name Direction Description clk Input System clock. Active rising edge. nreset Input System asynchronous reset. Active low. d_im[b – 1:0] Input Input imaginary data bus. The imaginary part of the input complex data should be placed on this bus. Bit b – 1 is the MSB. Data are assumed to be presented in two’s-complement format. The imaginary and real parts should be supplied simultaneously. d_re[b – 1:0] Input Input real data bus. The real part of the input complex data should be placed on this bus. Bit b – 1 is the MSB. Data are assumed to be presented in two’s-complement format. The imaginary and real parts should be supplied simultaneously. d_valid Input Input complex word valid. Active high. The bit accompanies valid input samples coming to input busses d_im and d_re. At any system clock interval where d_valid is active, input busses d_im and d_re are considered to present another input complex sample. load Output The FFT module input buffer accepts data. Active high. The signal is active when the input buffer (either of two banks) is ready to accept data. The signal stays active until the buffer is full. start Input FFT start signal. Active high. start is asserted to begin the transform processing or to return the module to the initial ready state. y_rdy Output FFT results ready. Active high. The signal goes active when the FFT results are ready for the host to read. It stays HIGH during host read. y_im[b – 1:0] Output Output imaginary data bus. The imaginary part of the output complex data appears on this bus. Bit b – 1 is the MSB. Data are presented in two’s-complement format. The imaginary and real parts appear simultaneously. y_re[b – 1:0] Output Output real data bus. The real part of the output complex data appears on this bus. Bit b – 1 is the MSB. Data are presented in two’s-complement format. The imaginary and real parts appear simultaneously. 8 v4.0 CoreFFT Fast Fourier Transform Table 6 • I/O Signal Description (continued) Signal Name Direction Description y_valid Output Output complex word valid. Active high. The bit accompanies valid output samples on output busses y_im and y_re. At any system clock interval while y_valid is active, a complex sample is available on output busses d_im and d_re. read_y Input Read FFT output. Active high. If the signal is active, the module puts out the FFT results in a single burst, one complex word per clock cycle. The host can insert arbitrary breaks into the burst by deactivating the signal any time during the burst. pong Output Pong bank of the input buffer is being used by the FFT engine as a working memory. I/O Interface and Timing Resetting the Module buffer. The CoreFFT module puts out the post-processed complex samples on the two b-bit busses, y_im and y_re. Every valid complex sample is accompanied by a y_valid bit. In the basic mode where the host does not control the FFT output data rate (signal read_y remains permanently active), all N post-processed complex samples from a single burst are available consecutively at each rising system clock edge (Figure 6 on page 10). During this mode, y_valid remains valid for N system clock cycles. Upon reset, the module returns to its initial state with input and output buffer pointers reset to zero. The input buffer is now ready to accept a new data frame; signal load is asserted, and signals y_rdy and y_valid are deasserted. Both nreset and start reset the module. Loading Input Data Input data can be loaded once the signal load is asserted; otherwise, the module ignores any activity on the data loading pins. When the host detects an active load signal, it may begin writing data via the b-bit busses d_im and d_re. Every valid complex sample must be accompanied by an active d_valid signal (Figure 5 on page 10). The module samples d_valid at each rising edge of the system clock. Once an active d_valid signal is detected, the core assumes a new complex sample has been written to the input busses. The module then writes the new sample to the input buffer. By the next system clock edge, the module is ready to accept another input sample. The host can control the input sample rate via d_valid. Once the module has received N complex samples, the input buffer is now full, and the module deasserts the signal load. The host can control the FFT output sample rate via the read_y signal. The input read_y acts similarly to an output clock enable signal: when held HIGH, the module will continue generating FFT results at each clock edge; when held LOW, the module will pause in generating. An example of a controlled output rate mode is depicted in Figure 7 on page 10. Every new output sample is valid for two system clock cycles, as read_y is asserted only every other clock cycle. (Note that there is latency of one clock cycle between the signal read_y and a valid sample output). The CoreFFT module design does not place any restrictions on the duty cycle of the read_y signal. However, for the FFT engine to operate at maximum efficiency (i.e., no idle time), the post-processed results must be read out of the output buffer before the engine needs to write the results of the next data frame (this time is marked as Accept FFT Result in Figure 3 on page 6). Table 4 on page 6 shows the minimal output reading rate that does not impact the efficient use of the FFT engine. Reading Output Data Once the FFT engine completes another FFT computational cycle, it asserts the y_rdy signal. The FFT results are now available for the host in the output v4.0 9 CoreFFT Fast Fourier Transform clk load d_valid d_im, d_re samples 0 don’t care 1 2 don’t care 1 2 3 4 5 N–4 N–3 don’t care don’t care N–2 N–1 Figure 5 • Data Load Timing clk y_rdy read_y y_valid y_im, y_re samples 0 3 4 N–4 N–3 N–2 N–1 5 Figure 6 • FFT Results Output clk y_rdy read_y y_valid y_im, y_re samples 0 1 2 Figure 7 • Host Controls the Output Sample Rate 10 v4.0 N–2 N–1 N CoreFFT Fast Fourier Transform References Ordering Information 1. L. R. Rabiner and B. Gold, Theory and Application of Digital Signal Processing, Prentice Hall, 1975. Order CoreFFT through your local Actel sales representative. Use the following numbering convention when ordering: CoreFFT-XX, where XX is listed in Table 7. 2. Alan V. Oppenheim and Ronald W. Schafer with John R. Buck, Discrete-Time Signal Processing, Second Edition, Prentice Hall, 1998. Table 7 • Ordering Codes A Sample Configuration File The following is an example configuration file: inv 0 scale 0 points 256 bits 16 fpga_family pa3 lang verilog v4.0 XX Description EV Evaluation version AR RTL for unlimited use on Actel devices UR RTL for unlimited use and not restricted to Actel devices 11 CoreFFT Fast Fourier Transform Appendix I: Fast Fourier Transform The FFT is a computationally efficient algorithm for computing a discrete Fourier transform (DFT). The N-point DFT is defined as N–1 X(k) ≡ ∑ x [ n ]e ( – jnk2π ) ⁄ N k = 0, 1, 2, ..., N – 1 n=0 EQ 7 where N is the transform size, or number of points. The inverse N-point DFT is defined as N–1 1 x ( n ) = ⎛ ----⎞ ⎝ N⎠ ∑ X [ k ]e ( jnk2π ) ⁄ N k = 1, 2, 3, ..., N – 1 k=0 EQ 8 It is common practice to call the exponential vector rotating factors above the "twiddle factors." Every twiddle factor contains real and imaginary parts W = W r – jWi = e ( – jnk2π ) ⁄ N EQ 9 P = Pr + jPi outP = outPr + j × outPi outQ = outQr + j × outQi Q = Qr + jQi -1 W = Wr + jWi Figure 8 • Radix-2 DIT Butterfly The butterfly performs the basic FFT computation according to the following equations: outP = P + Q × W EQ 10 outP = P – Q × W outP = Pr + jPi + (Qr cos X + Qi sin X) + j(Qi cos X – Qr sin X) = (Pr + Qr cos X + Qi sin X) + j(Pi + Qi cos X – Qr sin X) EQ 13 EQ 11 The twiddle factor can be expressed as W = cos X – j sin X outQ = Pr + jPi – (Qr cos X + Qi sin X) – j(Qi cos X – Qr sin X) = (Pr – Qr cos X – Qi sin X) + j(Pi – Qi cos X + Qr sin X) EQ 14 EQ 12 12 By substituting the values from EQ 12 and expressing P and Q in terms of their real and imaginary parts, EQ 10 and EQ 11 become v4.0 CoreFFT Fast Fourier Transform Figure 9 depicts an example of an 8-point FFT algorithm. The tree contains log2 8 = 3 stages with 8 / 2 = 4 butterflies calculated at every stage. xe[0] x[0] m x[4] W0 xe[1] –1 W2 W2 xe[3] –1 W6 x[3] W3 xo[0] x[1] x[4] W4 W0 xo[1] –1 x[5] W5 W2 xo[2] x[5] W4 x[7] x[2] m W4 x[3] x[1] W1 xe[2] x[2] x[6] x[0] W0 x[6] W6 xo[3] –1 W6 x[7] W7 Figure 9 • 8-Point FFT Using Decimation-in-Time Algorithm v4.0 13 CoreFFT Fast Fourier Transform List of Changes The following table lists critical changes that were made in the current version of the document. Previous Version Changes in Current Version (v 4 .0 ) v3.0 v2.0 Page Fusion was removed from the "Targeted Devices" section. 1 The "Synthesis and Simulation Support" section was updated. 1 FFT size expanded to include 32-, 64-, 128-, and 2,048-point transforms. 1 Input/output data width changed to 8- to 16-bit (configurable). 1 All input and output data widths throughout document now expressed in terms of configurable bit width b = 8 to 16 bits. N/A Table 1 replaced with Table 1 and Table 2. 3–4 EQ 4 was revised. 6 Table 5 updated: removed module_name parameter, added bits parameter, updated points and fpga_family parameters. 7 Updated the sample in the "A Sample Configuration File" section. 11 The "Targeted Devices" section was updated to include Fusion. 1 Table 1 (now Table 1 and Table 2) was updated to include Fusion data. 3 Datasheet Categories In order to provide the latest information to designers, some datasheets are published before the data has been fully characterized. Datasheets are designated as "Product Brief," "Advanced," and "Production." The definitions of these categories are as follows: Product Brief The product brief is a summarized version of an advanced or production datasheet containing general product information. This brief summarizes specific device and family information for unreleased products. Advanced This datasheet version contains initial estimated information based on simulation, other products, devices, or speed grades. This information can be used as estimates, but not for production. Unmarked (production) This datasheet version contains information that is considered to be final. 14 v4.0 Actel and the Actel logo are registered trademarks of Actel Corporation. All other trademarks are the property of their owners. www.actel.com Actel Corporation Actel Europe Ltd. Actel Japan www.jp.actel.com Actel Hong Kong www.actel.com.cn 2061 Stierlin Court Mountain View, CA 94043-4655 USA Phone 650.318.4200 Fax 650.318.4600 Dunlop House, Riverside Way Camberley, Surrey GU15 3YL United Kingdom Phone +44 (0) 1276 401 450 Fax +44 (0) 1276 401 490 EXOS Ebisu Bldg. 4F 1-24-14 Ebisu Shibuya-ku Tokyo 150 Japan Phone +81.03.3445.7671 Fax +81.03.3445.7668 Suite 2114, Two Pacific Place 88 Queensway, Admiralty Hong Kong Phone +852 2185 6460 Fax +852 2185 6488 51700058-2/5.07