AVR32718: AT32UC3 Series Software Framework DSPLib

32-bit Microcontrollers Application Note

1. Introduction

This application note describes the DSP Library from the AVR32® Software Framework. It details the main functions of the DSP library (prototype, algorithm and benchmark): FFT, convolution, FIR and partial IIR, using the GCC compiler. All the source code (C code and assembly), software examples, and GCC and IAR projects are released in the AVR32 UC3 Software Framework.

1.1 References

• AVR32 UC3 Software Framework: this framework provides software drivers, libraries and application examples to build any application for AVR32 UC3 devices. http://www.atmel.com/dyn/products/tools_card.asp?tool_id=4192
• AVR32 Architecture Manual: http://www.atmel.com/dyn/resources/prod_documents/doc32000.pdf

32076A–AVR32–11/07

2. Radix-4 decimate in time complex FFT

2.1 Description

This function computes a complex FFT from an input signal. It uses the radix-4 "Decimate In Time" algorithm and does not perform the calculation in place, which means that the input buffer has to be different from the output buffer.

Function prototype

    void dspXX_trans_complexfft(
        dspXX_complex_t *vect1,
        dspXX_t *vect2,
        int nlog);

where XX corresponds to the number of bits of a basic data element (i.e. 16 or 32).

2.1.1 Arguments

This function takes three parameters: the output buffer, the input buffer and a value corresponding to the size of those buffers.
The output buffer (vect1) is a pointer to a complex vector of 2^nlog elements.
The input buffer (vect2) is a pointer to a real vector of 2^nlog elements.
The size argument (nlog) is the base-2 logarithm of the size of the input vector (nlog takes its values in [2, 4, 6, …, 28]).

2.1.2 Algorithm

Following is the algorithm used to implement the radix-4 DIT complex FFT.
The optimized version is based on this algorithm but can differ in certain points due to the instruction set of the target:

    size = 1 << nlog
    FOR r FROM 0 TO size-1 STEP 4 DO
        Butterfly_zero_only_real_and_bit_reversing(vect1, vect2, r)
    END
    FOR stage FROM 1 TO nlog/2 DO
        m = 4 ^ stage
        FOR r FROM 0 TO size-1 STEP m DO
            Butterfly_zero(vect1, r)
        END
        FOR j FROM 1 TO m / 4 - 1 DO
            Comput_twiddle_factors(e, e2, e3, j / m)
            FOR r FROM 0 TO size-1 STEP m DO
                Butterfly(vect1, r, j, e, e2, e3)
            END
        END
    END

2.1.3 Notes

• Interruptibility: the code is interruptible.
• In-place computation is not allowed.
• This function uses a static twiddle factors table hard-coded in the file "BASIC/TRANSFORMS/dspXX_twiddle_factors.h". To generate those factors, you can run the script "tf_gen.sci" with Scilab.
• To avoid overflowing values, the amplitude of the resulting vector is scaled down by 2^nlog.
• All the vectors have to be 32-bit aligned.

2.2 Benchmark

2.2.1 Benchmark routine

All these functions have been benchmarked on an avr32-uc3a0512 target. The programs have been compiled with avr32-gcc (4.0.2-atmel.1.0.0) with the -O3 optimization option and have been stored in FLASH memory.
The fixed-point format used is the Q1.15 format for the 16-bit data and the Q1.31 format for the 32-bit data.
The benchmark process has been performed with the same input signal for all those functions and compared with a reference signal computed with a mathematical tool using floating point.
The input signal is a combination of one sine and one cosine: the sine oscillates at 400 Hz and the cosine at 2 kHz. Those signals have been multiplied and sampled at 40 kHz.

[Figure: Input signal > Complex FFT > Signal resulting and formatted (FFT)]

2.2.2 Result

Here are tables of the main values of the benchmark results. All those values correspond to the best performances of the functions and are obtained with different compilation options.
For more information, please refer to the complete benchmark result tables in the annexes.

2.2.2.1 16-bit radix-4 D.I.T. complex FFT: generic

Concerned file path: /BASIC/TRANSFORMS/dsp16_complex_fft_generic.c

                Lowest         Fastest computation   Lowest error       Lowest error    Lowest algorithm's
                cycle count    at 60 MHz             amplitude average  max. amplitude  size in memory
    64-points   6,296          108.2us               1.58e-5            6.53e-5         1.1 Kbytes
    256-points  33,723         578.0us               1.69e-5            8.80e-5         1.3 Kbytes
    1024-points 169,006        2.90ms                1.67e-5            1.23e-4         2.0 Kbytes
    4096-points 812,321        13.90ms               1.52e-5            1.46e-4         5.0 Kbytes

More details in Table 1.1.1 in the annexes.

2.2.2.2 16-bit radix-4 D.I.T. complex FFT: avr32-uc3 optimized

Concerned file path: /BASIC/TRANSFORMS/dsp16_complex_fft_avr32uc3.c

                Lowest         Fastest computation   Lowest error       Lowest error    Lowest algorithm's
                cycle count    at 60 MHz             amplitude average  max. amplitude  size in memory
    64-points   2,611          44.4us                1.63e-5            6.53e-5         710 bytes
    256-points  13,661         232.2us               1.68e-5            7.46e-5         902 bytes
    1024-points 67,671         1.15ms                1.69e-5            1.02e-4         1.6 Kbytes
    4096-points 322,897        5.49ms                1.58e-5            1.18e-4         4.6 Kbytes

Warning: this function is only compatible with Q1.15 numbers.
Note: this function needs 72 bytes of memory for the stack.
More details in Table 1.1.2 in the annexes.

2.2.2.3 32-bit radix-4 D.I.T. complex FFT: generic

Concerned file path: /BASIC/TRANSFORMS/dsp32_complex_fft_generic.c

                Lowest         Fastest computation   Lowest error       Lowest error    Lowest algorithm's
                cycle count    at 60 MHz             amplitude average  max. amplitude  size in memory
    64-points   13,206         225.2us               6.0e-10            5.7e-9          2.0 Kbytes
    256-points  74,297         1.27ms                3.0e-10            4.8e-9          2.4 Kbytes
    1024-points 383,212        6.53ms                3.0e-10            6.1e-9          3.9 Kbytes

More details in Table 1.2.1 in the annexes.

3. Convolution

3.1 Description

This function performs a linear convolution between two discrete sequences.
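As an illustration of the operation described above, the following is a minimal floating-point sketch of a direct linear convolution in C. It is not the DSPLib implementation: the function name linear_conv and the use of double instead of the library's fixed-point types are assumptions made for readability.

```c
#include <assert.h>

/* Hypothetical reference helper, not part of the DSP library.
 * Direct-form linear convolution: out[n] = sum over k of x[k] * h[n - k].
 * out must hold x_len + h_len - 1 elements. */
static void linear_conv(double *out, const double *x, int x_len,
                        const double *h, int h_len)
{
    assert(x_len > 0 && h_len > 0);
    int out_len = x_len + h_len - 1;
    for (int n = 0; n < out_len; n++) {
        double sum = 0.0;
        for (int k = 0; k < x_len; k++) {
            int j = n - k;              /* index into h */
            if (j >= 0 && j < h_len)    /* skip terms outside h */
                sum += x[k] * h[j];
        }
        out[n] = sum;
    }
}
```

For example, convolving the sequence {1, 2, 3} with {1, 1} produces the four samples {1, 3, 5, 3}, which is the expected output length of 3 + 2 - 1 elements.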
3.1.1 Function prototype

    void dspXX_vect_conv(
        dspXX_t *vect1,
        dspXX_t *vect2,
        int vect2_size,
        dspXX_t *vect3,
        int vect3_size);

where XX corresponds to the number of bits of a basic data element (i.e. 16 or 32).

3.1.2 Arguments

This function takes five parameters: the output buffer, the two discrete sequences and their respective sizes.
• The output buffer (vect1) is a pointer to a real vector of (vect2_size + vect3_size - 1) elements.
• The first input buffer (vect2) is a pointer to a real vector of vect2_size elements.
• The first size argument (vect2_size) is the length of the first input buffer (vect2_size ∈ [8, 9, 10, …]).
• The second input buffer (vect3) is a pointer to a real vector of vect3_size elements.
• The second size argument (vect3_size) is the length of the second input buffer (vect3_size ∈ [8, 9, 10, …]).

3.1.3 Requirements

This function requires 3 modules:

    Module name          Function name        Concerned file path
    Zero Padding         dspXX_vect_zeropad   /BASIC/VECTORS/zero_padding.c
    Copy                 dspXX_vect_copy      /BASIC/VECTORS/copy.c
    Partial Convolution  dspXX_vect_convpart  /BASIC/VECTORS/convolution_partial.c

Because of internal computations, the output buffer of the function must be at least N + 2*M - 2 elements long, where N is the length of the largest input buffer and M the length of the smallest input buffer.

3.1.4 Algorithm

Following is the algorithm used to implement the convolution product. The optimized version is based on this algorithm but can differ in certain points due to the instruction set of the target:

    IF vect2_size >= vect3_size THEN
        vect1 = [ 0000…0000 | vect2 | 0000…0000 ]
                (vect3_size - 1 zeros, then vect2_size elements, then vect3_size - 1 zeros)
        Partial_convolution(vect1, vect1, vect2_size + 2*(vect3_size - 1), vect3, vect3_size)
    ELSE
        vect1 = [ 0000…0000 | vect3 | 0000…0000 ]
                (vect2_size - 1 zeros, then vect3_size elements, then vect2_size - 1 zeros)
        Partial_convolution(vect1, vect1, vect3_size + 2*(vect2_size - 1), vect2, vect2_size)
    END

3.1.5 Notes

• Interruptibility: the code is interruptible.
• Due to its implementation, the dsp16-avr32-uc3 optimized version of the FIR requires the largest input discrete sequence to have a length that is a multiple of four (4*m elements), and the output buffer (vect1) to have a length that is a multiple of four (4*n elements), to avoid overflows.
• The input discrete sequences have to be scaled to avoid overflowing values.
• All the vectors have to be 32-bit aligned.

3.2 Benchmark

3.2.1 Benchmark routine

All these functions have been benchmarked on an avr32-uc3a0512 target. The programs have been compiled with avr32-gcc (4.1.2-atmel.1.0.0) with the -O3 optimization option and have been stored in FLASH memory.
The fixed-point format used is the Q1.15 format for the 16-bit data and the Q1.31 format for the 32-bit data.
The benchmark process has been performed with the same input signal and impulse response for all those functions and compared with a reference signal computed with a mathematical tool using floating point.
The first input signal is a sine oscillating at 433 Hz and the second input signal is a cosine oscillating at 2 kHz. Those signals are sampled at 40 kHz.

[Figure: 1st input signal and 2nd input signal > FIR filter > Resulting signal]

3.2.2 Result

Here are tables of the main values of the benchmark results. All those values correspond to the best performances of the functions and are obtained with different compilation options. For more information, please refer to the complete benchmark result tables in the annexes.

Concerned file path: /BASIC/VECTORS/convolution.c

3.2.2.1 16-bit Convolution: generic

Algorithm's size in memory: 2.2 Kbytes.
Length of the first input signal: 64 elements.

    Second input    Lowest         Fastest computation   Lowest error       Lowest error
    signal          cycle count    at 60 MHz             amplitude average  max. amplitude
    32-points       23,524         408.5us               2.0e-5             4.5e-5
    64-points       57,757         1.00ms                1.8e-5             4.7e-5
    128-points      86,752         1.50ms                1.8e-5             4.4e-5
    256-points      144,736        2.51ms                1.5e-5             4.8e-5

More details in Table 2.1.1 in the annexes.

3.2.2.2 16-bit Convolution: avr32-uc3 optimized

Algorithm's size in memory: 950 bytes.
Length of the first input signal: 64 elements.

    Second input    Lowest         Fastest computation   Lowest error       Lowest error
    signal          cycle count    at 60 MHz             amplitude average  max. amplitude
    32-points       8,248          151.2us               2.0e-5             4.5e-5
    64-points       19,087         349.1us               1.8e-5             4.7e-5
    128-points      28,532         521.8us               1.8e-5             4.4e-5
    256-points      47,412         866.9us               1.5e-5             4.8e-5

More details in Table 2.1.2 in the annexes.

3.2.2.3 32-bit Convolution: generic

Algorithm's size in memory: 3.3 Kbytes.
Length of the first input signal: 64 elements.

    Second input    Lowest         Fastest computation   Lowest error       Lowest error
    signal          cycle count    at 60 MHz             amplitude average  max. amplitude
    32-points       42,572         729.8us               0.4e-9             2.1e-9
    64-points       109,179        1.87ms                0.4e-9             1.7e-9
    128-points      163,968        2.81ms                0.5e-9             1.6e-9
    256-points      273,536        4.69ms                0.6e-9             2.7e-9

More details in Table 2.2.1 in the annexes.

3.2.2.4 32-bit Convolution: avr32-uc3 optimized

Algorithm's size in memory: 1.5 Kbytes.
Length of the first input signal: 64 elements.

    Second input    Lowest         Fastest computation   Lowest error       Lowest error
    signal          cycle count    at 60 MHz             amplitude average  max. amplitude
    32-points       19,958         340.2us               0.5e-9             2.1e-9
    64-points       50,501         860.4us               0.5e-9             1.7e-9
    128-points      75,722         1.29ms                0.6e-9             2.4e-9
    256-points      126,154        2.15ms                0.7e-9             2.7e-9

More details in Table 2.2.2 in the annexes.

4. FIR Filter (alias Partial Convolution)

4.1 Description

This function applies a real FIR filter, described by the impulse response of the desired filter, to a fixed-length signal.

4.1.1 Function prototype

    void dspXX_filt_fir(
        dspXX_t *vect1,
        dspXX_t *vect2,
        int size,
        dspXX_t *h,
        int h_size);

    void dspXX_vect_convpart(
        dspXX_t *vect1,
        dspXX_t *vect2,
        int vect2_size,
        dspXX_t *vect3,
        int vect3_size);

where XX corresponds to the number of bits of a basic data element (i.e. 16 or 32).
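To make the role of each argument of the first prototype concrete, here is a minimal sketch of a 16-bit partial convolution in plain C. The name fir_q15_sketch, the macro Q15_FRAC_BITS and the use of int16_t in place of the library's 16-bit type are hypothetical; the sketch assumes Q1.15 data, a 32-bit accumulator and an arithmetic right shift, and it does not reproduce the optimized library code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: number of fractional bits of a Q1.15 value. */
#define Q15_FRAC_BITS 15

/* Illustrative sketch only, not the DSPLib code.
 * out[j] = (sum over i of in[j + i] * h[h_size - 1 - i]) >> 15
 * for j = 0 .. size - h_size, so out holds size - h_size + 1 elements.
 * The coefficients are assumed to be scaled so the 32-bit sum cannot overflow. */
static void fir_q15_sketch(int16_t *out, const int16_t *in, int size,
                           const int16_t *h, int h_size)
{
    assert(h_size > 0 && size >= h_size);
    for (int j = 0; j <= size - h_size; j++) {
        int32_t sum = 0;
        for (int i = 0; i < h_size; i++)
            sum += (int32_t)in[j + i] * h[h_size - 1 - i];
        /* Assumes >> on a negative value is an arithmetic shift,
         * which is implementation-defined in C. */
        out[j] = (int16_t)(sum >> Q15_FRAC_BITS);
    }
}
```

With a constant input of 0.5 (16384 in Q1.15) and two taps of 0.5 each, every output sample is again 0.5, since 2 * 0.5 * 0.5 = 0.5.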
4.1.2 Arguments

This function takes five parameters: the output buffer, the input buffer, its size, the impulse response of the filter and its size.
The output buffer (vect1) is a pointer to a real vector of (size - h_size + 1) elements.
The input buffer (vect2) is a pointer to a real vector of size elements.
The size argument (size) is the length of the input buffer (size takes its values in [4, 8, 12, …]).
The impulse response of the filter (h) is a pointer to a real vector of h_size elements.
The size argument (h_size) is the length of the impulse response of the filter (h_size takes its values in [8, 9, 10, …]).

4.1.3 Requirements

This function requires one module:

    Module name          Function name        Concerned file path
    Partial Convolution  dspXX_vect_convpart  /BASIC/VECTORS/convolution_partial.c

4.1.4 Algorithm

Following is the algorithm used to implement the FIR filter. The optimized version is based on this algorithm but can differ in certain points due to the instruction set of the target:

    FOR j FROM 0 TO size - h_size DO
        sum = 0
        FOR i FROM 0 TO h_size - 1 DO
            sum += vect2[j + i] * h[h_size - i - 1]
        END
        vect1[j] = sum >> DSPXX_QB
    END

4.1.5 Notes

• Interruptibility: the code is interruptible.
• Due to its implementation, for the dsp16-avr32-uc3 optimized version of the FIR, the output buffer (vect1) has to have a length that is a multiple of four (4*n elements) to avoid overflows.
• The impulse response of the filter has to be scaled to avoid overflowing values.
• All the vectors have to be 32-bit aligned.

4.2 Benchmark

4.2.1 Benchmark routine

All these functions have been benchmarked on an avr32-uc3a0512 target. The programs have been compiled with avr32-gcc (4.1.2-atmel.1.0.0) with the -O3 optimization option and have been stored in FLASH memory.
The fixed-point format used is the Q1.15 format for the 16-bit data and the Q1.31 format for the 32-bit data.
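As a reminder of what these formats mean, a Q1.15 value stores a real number in [-1, 1) as a 16-bit integer scaled by 2^15, and a Q1.31 value does the same with 32 bits and 2^31. The conversion helpers below are a minimal sketch; their names (double_to_q15, q15_to_double, double_to_q31) are hypothetical and are not taken from the library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: real value in [-1, 1) to Q1.15, rounded to nearest
 * and saturated at the limits of the 16-bit range. */
static int16_t double_to_q15(double v)
{
    assert(v == v);                       /* NaN cannot be represented */
    double scaled = v * 32768.0;          /* 2^15 */
    if (scaled >= 32767.0) return INT16_MAX;
    if (scaled <= -32768.0) return INT16_MIN;
    return (int16_t)(scaled < 0 ? scaled - 0.5 : scaled + 0.5);
}

/* Hypothetical helper: Q1.15 back to a real value. */
static double q15_to_double(int16_t q)
{
    return q / 32768.0;
}

/* Hypothetical helper: same conversion for Q1.31 (scale factor 2^31). */
static int32_t double_to_q31(double v)
{
    assert(v == v);
    double scaled = v * 2147483648.0;     /* 2^31 */
    if (scaled >= 2147483647.0) return INT32_MAX;
    if (scaled <= -2147483648.0) return INT32_MIN;
    return (int32_t)(scaled < 0 ? scaled - 0.5 : scaled + 0.5);
}
```

One Q1.15 step is 2^-15, about 3.05e-5, and one Q1.31 step is 2^-31, about 4.66e-10; the error figures in the result tables are of the same order of magnitude as these resolutions.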
The benchmark process has been performed with the same input signal and impulse response for all those functions and compared with a reference signal computed with a mathematical tool using floating point.
The input signal is a combination of one sine and one cosine: the sine oscillates at 400 Hz and the cosine at 2 kHz. Those signals have been multiplied and sampled at 40 kHz.
The impulse response describes a low-pass filter with a cutoff frequency equal to 400 Hz.

[Figure: Input signal and Impulse response of the filter > FIR filter > Resulting signal]

4.2.2 Result

Here are tables of the main values of the benchmark results. All those values correspond to the best performances of the functions and are obtained with different compilation options. For more information, please refer to the complete benchmark result tables in the annexes.

4.2.2.1 16-bit FIR filter: generic

Concerned file path: /BASIC/VECTORS/dsp16_convpart_generic.c
Algorithm's size in memory: 2.0 Kbytes.
Number of Taps: 24.

                    Lowest         Fastest computation   Lowest error       Lowest error
                    cycle count    at 60 MHz             amplitude average  max. amplitude
    64-points       7,424          128.0us               2.27e-5            9.46e-5
    256-points      41,793         720.0us               2.22e-5            9.46e-5
    512-points      87,617         1.51ms                2.23e-5            9.46e-5
    1024-points     179,265        3.09ms                2.21e-5            9.46e-5

More details in Table 3.1.1 in the annexes.

4.2.2.2 16-bit FIR filter: avr32-uc3 optimized

Concerned file path: /BASIC/VECTORS/dsp16_convpart_avr32uc3.c
Algorithm's size in memory: 770 bytes.
Number of Taps: 24.

                    Lowest         Fastest computation   Lowest error       Lowest error
                    cycle count    at 60 MHz             amplitude average  max. amplitude
    64-points       2,439          44.3us                2.27e-5            9.46e-5
    256-points      12,712         230.7us               2.22e-5            9.46e-5
    512-points      26,408         479.2us               2.23e-5            9.46e-5
    1024-points     53,800         976.3us               2.21e-5            9.46e-5

More details in Table 3.1.2 in the annexes.

4.2.2.3 32-bit FIR filter: generic

Concerned file path: /BASIC/VECTORS/dsp32_convpart_generic.c
Algorithm's size in memory: 3.1 Kbytes.
Number of Taps: 24.
                    Lowest         Fastest computation   Lowest error       Lowest error
                    cycle count    at 60 MHz             amplitude average  max. amplitude
    64-points       13,984         239.4us               2.1e-9             1.24e-8
    256-points      79,073         1.35ms                2.3e-9             1.74e-8
    512-points      165,857        2.84ms                2.6e-9             2.31e-8
    1024-points     339,425        5.81ms                3.7e-9             2.84e-8

More details in Table 3.2.1 in the annexes.

4.2.2.4 32-bit FIR filter: avr32-uc3 optimized

Concerned file path: /BASIC/VECTORS/dsp32_convpart_avr32uc3.c
Algorithm's size in memory: 1.3 Kbytes.
Number of Taps: 24.

                    Lowest         Fastest computation   Lowest error       Lowest error
                    cycle count    at 60 MHz             amplitude average  max. amplitude
    64-points       6,479          110.2us               2.1e-9             1.24e-8
    256-points      36,432         619.0us               2.3e-9             1.24e-8
    512-points      76,368         1.30ms                2.6e-9             2.31e-8
    1024-points     156,240        2.65ms                3.7e-9             2.84e-8

More details in Table 3.2.2 in the annexes.

5. Partial IIR Filter

5.1 Description

This function computes a real IIR filter using the impulse response of the desired filter onto a fixed-length signal.

5.1.1 Function prototype

    void dspXX_filt_iir(
        dspXX_t *vect1,
        dspXX_t *vect2,
        int size,
        dspXX_t *num,
        int num_size,
        dspXX_t *den,
        int den_size,
        int num_prediv,
        int den_prediv);

where XX corresponds to the number of bits of a basic data element (i.e. 16 or 32).

5.1.2 Arguments

This function takes nine parameters: the output buffer, the input buffer, its size, the numerator and denominator coefficients of the filter, their sizes and the coefficients' predivisors.
The output buffer (vect1) is a pointer to a real vector of (size - num_size + 1) elements.
The input buffer (vect2) is a pointer to a real vector of size elements.
The size argument (size) is the length of the input buffer (size takes its values in [4, 5, 6, 7, …]).
The numerator's coefficients argument of the filter (num) is a pointer to a real vector of num_size elements.
The size argument (num_size) is the length of the numerator's coefficients of the filter (num_size takes its values in [1, 2, 3, …]).
The denominator's coefficients argument of the filter (den) is a pointer to a real vector of den_size elements.
The size argument (den_size) is the length of the denominator's coefficients of the filter (den_size takes its values in [1, 2, 3, …]).
The predivisors (num_prediv and den_prediv) are used to scale down the numerator's and denominator's coefficients of the filter in order to avoid overflowing values. When you use this feature, you have to manually prescale the corresponding coefficients by 2^prediv; otherwise, leave this field at 0.

5.1.3 Algorithm

Following is the algorithm used to implement the IIR filter. The optimized version is based on this algorithm but can differ in certain points due to the instruction set of the target:

    // Initialization of the vect1 coefficients
    FOR n FROM 0 TO den_size - 1 DO
        sum1 = 0
        FOR m FROM 0 TO num_size - 1 DO
            sum1 += num[m] * vect2[n + num_size - m - 1]
        END
        sum2 = 0
        FOR m FROM 1 TO n DO
            sum2 += den[m] * vect1[n - m]
        END
        vect1[n] = (sum1 - (sum2 << prediv)) >> DSPXX_QB
    END
    FOR n FROM n TO size - num_size DO
        sum1 = 0
        FOR m FROM 0 TO num_size - 1 DO
            sum1 += num[m] * vect2[n + num_size - m - 1]
        END
        sum2 = 0
        FOR m FROM 1 TO den_size - 1 DO
            sum2 += den[m] * vect1[n - m]
        END
        vect1[n] = (sum1 - (sum2 << prediv)) >> DSPXX_QB
    END

5.1.4 Notes

• Interruptibility: the code is interruptible.
• Due to its implementation, for the dsp16-avr32-uc3 optimized version of the FIR, the output buffer (vect1) has to have a length that is a multiple of four (4*n elements) to avoid overflows.
• The impulse response of the filter has to be scaled to avoid overflowing values.
• All the vectors have to be 32-bit aligned.
• The first denominator's coefficient has to be equal to 1 / (2^prediv).
• The predivisor (prediv) must be less than or equal to the constant DSPXX_QB.

5.2 Benchmark

5.2.1 Benchmark routine

All these functions have been benchmarked on an avr32-uc3a0512 target. The programs have been compiled with avr32-gcc (4.1.2-atmel.1.0.0) with the -O3 optimization option and have been stored in FLASH memory.
The fixed-point format used is the Q1.15 format for the 16-bit data and the Q1.31 format for the 32-bit data.
The benchmark process has been performed with the same input signal and impulse response for all those functions and compared with a reference signal computed with a mathematical tool using floating point.
The input signal is a combination of one sine and one cosine: the sine oscillates at 400 Hz and the cosine at 4 kHz. Those signals have been added together and sampled at 8 kHz.
The filter used is a low-pass Butterworth filter with a cutoff frequency equal to 2 kHz.

5.2.2 Result

Here are tables of the main values of the benchmark results. All those values correspond to the best performances of the functions and are obtained with different compilation options. For more information, please refer to the complete benchmark result tables in the annexes.

5.2.2.1 16-bit IIR filter: generic

Concerned file path: /BASIC/FILTERING/dsp16_iir_generic.c
Algorithm's size in memory: 266 bytes.
Order of the filter: 7

                    Lowest         Fastest computation   Lowest error       Lowest error
                    cycle count    at 60 MHz             amplitude average  max. amplitude
    72-points       11,469         213.4us               1.40e-5            4.30e-5
    256-points      44,591         829.8us               1.40e-5            4.30e-5
    512-points      90,671         1.69ms                1.40e-5            4.30e-5
    1024-points     182,831        3.40ms                1.40e-5            4.30e-5

More details in Table 4.1.1 in the annexes.

5.2.2.2 16-bit IIR filter: avr32-uc3 optimized

Concerned file path: /BASIC/FILTERING/dsp16_iir_avr32uc3.c
Algorithm's size in memory: 1.0 Kbytes (size optimization), 3.1 Kbytes (speed optimization).
Order of the filter: 7

                    Lowest         Fastest computation   Lowest error       Lowest error
                    cycle count    at 60 MHz             amplitude average  max. amplitude
    72-points       4,332          78.0us                1.90e-5            6.90e-5
    256-points      15,006         270.5us               1.70e-5            6.90e-5
    512-points      29,854         538.2us               1.70e-5            6.90e-5
    1024-points     59,550         1.07ms                1.70e-5            6.90e-5

More details in Table 4.1.2 in the annexes.

5.2.2.3 32-bit IIR filter: generic

Concerned file path: /BASIC/FILTERING/dsp32_iir_generic.c
Algorithm's size in memory: 400 bytes.
Order of the filter: 7

                    Lowest         Fastest computation   Lowest error       Lowest error
                    cycle count    at 60 MHz             amplitude average  max. amplitude
    72-points       14,517         265.4us               1.50e-9            7.00e-9
    256-points      56,471         1.03ms                1.50e-9            7.00e-9
    512-points      114,839        2.10ms                1.50e-9            7.00e-9
    1024-points     231,575        4.23ms                1.50e-9            7.00e-9

More details in Table 4.2.1 in the annexes.

5.2.2.4 32-bit IIR filter: avr32-uc3 optimized

Concerned file path: /BASIC/FILTERING/dsp32_iir_avr32uc3.c
Algorithm's size in memory: 3.0 Kbytes.
Order of the filter: 7

                    Lowest         Fastest computation   Lowest error       Lowest error
                    cycle count    at 60 MHz             amplitude average  max. amplitude
    72-points       8,859          153.9us               1.50e-9            7.00e-9
    256-points      33,333         577.1us               1.50e-9            7.00e-9
    512-points      67,381         1.17ms                1.60e-9            7.00e-9
    1024-points     135,477        2.34ms                1.60e-9            7.00e-9

More details in Table 4.2.2 in the annexes.

Benchmark results for the 16-bit version (with speed optimization)
The benchmark has been performed on a 72-element input signal.

[Figure: Cycles (0 to 25000) versus Number of coefficients (1 to 19); curves: avr32-uc3 optimized, generic]

Benchmark results for the 32-bit version (with speed optimization)
The benchmark has been performed on a 72-element input signal.

[Figure: Cycles (0 to 30000) versus Number of coefficients (1 to 19); curves: avr32uc3 optimized, generic]

Remark: the number of coefficients corresponds to a filter whose order is equal to "Number of coefficients" - 1.

6. Annexes

Table 1.1.1 - Benchmark of 16-bit Radix-4 D.I.T. complex FFT: generic

                      T.F.T.* in SRAM (cycles)                      T.F.T.* in FLASH (cycles)                     Error amplitude    Algorithm's size in memory (bytes)
    Size        Opt.  0 wait-state at 30MHz  1 wait-state at 60MHz  0 wait-state at 30MHz  1 wait-state at 60MHz  Average  Max.      (code size + T.F.T.* size)
                                                                                                                  (x10-5)  (x10-5)
    64-point    (0)   6,296 (209.9us)        6,489 (108.2us)        6,640 (221.3us)        7,012 (116.9us)        2.98     15.63     1K (840 + 194)
                (1)   6,343 (211.4us)        6,538 (109.0us)        6,777 (225.9us)        7,167 (119.5us)        1.58     6.53      1K (844 + 194)
                (2)   6,666 (222.2us)        6,959 (116.0us)        7,092 (236.4us)        7,546 (125.8us)        2.98     15.63     1.1K (1032 + 66)
                (3)   6,747 (224.9us)        6,996 (116.6us)        7,101 (236.7us)        7,536 (125.6us)        1.58     6.53      1K (984 + 66)
    256-point   (0)   33,723 (1124.1us)      34,682 (578.0us)       35,265 (1175.5us)      37,033 (617.2us)       3.02     21.92     1.6K (840 + 770)
                (1)   34,041 (1134.7us)      35,003 (583.4us)       35,988 (1199.6us)      37,836 (630.6us)       1.69     8.80      1.6K (844 + 770)
                (2)   35,304 (1176.8us)      36,675 (611.3us)       37,194 (1239.8us)      39,294 (654.9us)       3.02     21.92     1.3K (1032 + 258)
                (3)   35,826 (1194.2us)      37,032 (617.2us)       37,419 (1247.3us)      39,453 (657.6us)       1.69     8.80      1.2K (984 + 258)
    1024-point  (0)   169,006 (5.63ms)       173,611 (2.90ms)       175,394 (5.85ms)       183,358 (3.06ms)       3.04     25.01     3.8K (840 + 3K)
                (1)   170,795 (5.69ms)       175,404 (2.92ms)       178,863 (5.96ms)       187,161 (3.12ms)       1.67     12.31     3.8K (844 + 3K)
                (2)   175,446 (5.85ms)       181,703 (3.03ms)       183,248 (6.11ms)       192,530 (3.21ms)       3.04     25.01     2K (1032 + 1K)
                (3)   178,153 (5.94ms)       183,772 (3.06ms)       184,761 (6.16ms)       193,802 (3.23ms)       1.67     12.31     2K (984 + 1K)
    4096-point  (0)   812,321 (27.08ms)      833,820 (13.90ms)      838,147 (27.94ms)      873,235 (14.55ms)      3.09     32.90     12.8K (840 + 12K)
                (1)   821,533 (27.38ms)      843,037 (14.05ms)      854,154 (28.47ms)      890,598 (14.84ms)      1.52     14.60     12.8K (844 + 12K)
                (2)   838,212 (27.94ms)      866,315 (14.44ms)      869,718 (28.99ms)      910,054 (15.17ms)      3.09     32.90     5K (1032 + 4K)
                (3)   851,232 (28.37ms)      876,816 (14.61ms)      877,959 (29.27ms)      917,367 (15.29ms)      1.52     14.60     5K (984 + 4K)

*: Twiddle Factors Table.
Opt.: Optimization.
(0): Algorithm optimized for speed.
(1): Algorithm optimized for accuracy.
(2): Algorithm optimized for size.
(3): Algorithm optimized for size and accuracy.
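The times given in parentheses follow from the cycle counts and the CPU clock frequency. A minimal sketch of that conversion, using a hypothetical helper name (cycles_to_us) that is not part of the library:

```c
#include <assert.h>

/* Hypothetical helper: execution time in microseconds for a given
 * cycle count at a CPU clock expressed in MHz. */
static double cycles_to_us(unsigned long cycles, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);
    return (double)cycles / cpu_mhz;   /* cycles / (cycles per microsecond) */
}
```

For instance, 6,489 cycles at 60 MHz give about 108.15 us, reported as 108.2us in the 64-point row (0), and 6,296 cycles at 30 MHz give about 209.87 us, reported as 209.9us.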
Table 1.1.2 - Benchmark of 16-bit Radix-4 D.I.T. complex FFT: avr32-uc3 optimized

                      T.F.T.* in SRAM (cycles)                      T.F.T.* in FLASH (cycles)                     Error amplitude    Algorithm's size in memory (bytes)
    Size        Opt.  0 wait-state at 30MHz  1 wait-state at 60MHz  0 wait-state at 30MHz  1 wait-state at 60MHz  Average  Max.      (code size + T.F.T.* size)
                                                                                                                  (x10-5)  (x10-5)
    64-point    (0)   2,611 (87.0us)         2,661 (44.4us)         2,753 (91.8us)         2,877 (48.0us)         3.00     10.04     894 (700 + 194)
                (1)   2,951 (98.4us)         2,999 (50.0us)         3,097 (103.2us)        3,265 (54.4us)         1.63     6.53      774 (580 + 194)
                (2)   2,833 (94.4us)         2,912 (48.5us)         3,027 (100.9us)        3,221 (53.7us)         2.93     10.04     710 (644 + 66)
                (3)   3,206 (106.9us)        3,311 (55.2us)         3,458 (115.3us)        3,683 (61.4us)         1.63     6.53      766 (700 + 66)
    256-point   (0)   13,661 (455.4us)       13,932 (232.2us)       14,306 (476.9us)       14,904 (248.4us)       2.78     13.86     1.4K (700 + 770)
                (1)   15,777 (525.9us)       16,033 (267.2us)       16,428 (547.6us)       17,242 (287.4us)       1.68     7.46      1.3K (580 + 770)
                (2)   14,651 (488.4us)       15,056 (250.9us)       15,527 (517.6us)       16,442 (274.0us)       2.87     15.82     902 (644 + 258)
                (3)   16,916 (563.9us)       17,469 (291.2us)       18,041 (601.4us)       19,152 (319.2us)       1.68     7.46      958 (700 + 258)
    1024-point  (0)   67,671 (2.26ms)        69,027 (1.15ms)        70,355 (2.35ms)        73,059 (1.22ms)        2.84     19.33     3.7K (700 + 3K)
                (1)   79,195 (2.64ms)        80,475 (1.34ms)        81,887 (2.73ms)        85,507 (1.43ms)        1.69     10.23     3.6K (580 + 3K)
                (2)   71,765 (2.39ms)        73,680 (1.23ms)        75,403 (2.51ms)        79,423 (1.32ms)        2.86     19.33     1.6K (644 + 1K)
                (3)   83,906 (2.80ms)        86,635 (1.44ms)        88,560 (2.95ms)        93,629 (1.56ms)        1.69     10.23     1.7K (700 + 1K)
    4096-point  (0)   322,897 (10.76ms)      329,370 (5.49ms)       333,764 (11.13ms)      345,678 (5.76ms)       2.80     23.69     12.7K (700 + 12K)
                (1)   381,269 (12.71ms)      387,413 (6.46ms)       392,146 (13.07ms)      407,788 (6.80ms)       1.58     11.84     12.6K (580 + 12K)
                (2)   339,439 (11.31ms)      348,176 (5.80ms)       354,159 (11.81ms)      371,396 (6.19ms)       2.81     23.69     4.6K (644 + 4K)
                (3)   400,304 (13.34ms)      413,273 (6.89ms)       419,111 (13.97ms)      441,578 (7.36ms)       1.58     11.84     4.7K (700 + 4K)

Opt.: Optimization.
*: Twiddle Factors Table.
(0): Algorithm optimized for speed.
(1): Algorithm optimized for accuracy.
(2): Algorithm optimized for size.
(3): Algorithm optimized for size and accuracy.

Table 1.2.1 - Benchmark of 32-bit Radix-4 D.I.T. complex FFT: generic

                      T.F.T.* in SRAM (cycles)                      T.F.T.* in FLASH (cycles)                     Error amplitude    Algorithm's size in memory (bytes)
    Size        Opt.  0 wait-state at 30MHz  1 wait-state at 60MHz  0 wait-state at 30MHz  1 wait-state at 60MHz  Average  Max.      (code size + T.F.T.* size)
                                                                                                                  (x10-9)  (x10-9)
    64-point    (0)   13,206 (440.2us)       13,509 (225.2us)       13,580 (452.7us)       14,063 (234.4us)       0.70     5.70      2.2K (1816 + 392)
                (1)   15,323 (510.8us)       15,659 (261.0us)       15,627 (520.9us)       16,113 (268.6us)       0.60     5.70      2.4K (2112 + 392)
                (2)   13,622 (454.1us)       14,011 (233.5us)       13,938 (464.6us)       14,497 (241.6us)       0.70     5.70      2.0K (1952 + 136)
                (3)   15,714 (523.8us)       16,245 (270.8us)       16,058 (535.3us)       16,805 (280.1us)       0.60     5.70      2.3K (2248 + 136)
    256-point   (0)   74,297 (2.48ms)        75,940 (1.27ms)        75,992 (2.53ms)        78,445 (1.31ms)        0.50     4.80      3.3K (1816 + 1.5K)
                (1)   87,791 (2.93ms)        89,605 (1.49ms)        89,165 (2.97ms)        91,636 (1.53ms)        0.30     4.80      3.6K (2112 + 1.5K)
                (2)   76,136 (2.54ms)        78,160 (1.30ms)        77,537 (2.58ms)        80,329 (1.34ms)        0.50     4.80      2.4K (1952 + 520)
                (3)   89,543 (2.98ms)        92,446 (1.54ms)        91,058 (3.04ms)        94,978 (1.59ms)        0.30     4.80      2.7K (2248 + 520)
    1024-point  (0)   383,212 (12.77ms)      391,715 (6.53ms)       390,260 (13.01ms)      402,123 (6.70ms)       0.50     8.80      7.8K (1816 + 6K)
                (1)   457,571 (15.25ms)      466,815 (7.78ms)       463,279 (15.44ms)      475,223 (7.92ms)       0.30     6.10      8.1K (2112 + 6K)
                (2)   390,794 (13.03ms)      400,869 (6.68ms)       396,576 (13.22ms)      409,841 (6.83ms)       0.50     8.80      3.9K (1952 + 2K)
                (3)   464,988 (15.50ms)      479,847 (8.00ms)       471,226 (15.71ms)      490,367 (8.17ms)       0.30     6.10      4.2K (2248 + 2K)

Opt.: Optimization.
*: Twiddle Factors Table.
(0): Algorithm optimized for speed.
(1): Algorithm optimized for accuracy.
(2): Algorithm optimized for size.
(3): Algorithm optimized for size and accuracy.
Table 2.1.1 - Benchmark of 16-bit Convolution: generic

                                    Execution time (cycles)                       Error amplitude
    1st input     2nd input         0 wait-state at 30MHz  1 wait-state at 60MHz  Average  Max.
    signal        signal                                                          (x10-5)  (x10-5)
    32-point      32-points         15,681 (522.7us)       16,344 (272.4us)       1.60     3.90
                  64-points         23,524 (784.1us)       24,508 (408.5us)       2.00     4.50
                  128-points        39,204 (1.31ms)        40,830 (680.5us)       1.90     4.50
                  256-points        70,564 (2.35ms)        73,470 (1.22ms)        1.70     4.10
    64-point      64-points         57,757 (1.93ms)        60,084 (1.00ms)        1.80     4.70
                  128-points        86,752 (2.89ms)        90,234 (1.50ms)        1.80     4.40
                  256-points        144,736 (4.82ms)       150,522 (2.51ms)       1.50     4.80
    128-point     128-points        221,780 (7.39ms)       230,512 (3.84ms)       1.80     5.30
                  256-points        333,018 (11.10ms)      346,097 (5.77ms)       1.70     5.00
    256-point     256-points        869,316 (28.98ms)      903,136 (15.05ms)      1.70     5.40

Table 2.1.2 - Benchmark of 16-bit Convolution: avr32-uc3 optimized

                                    Execution time (cycles)                       Error amplitude
    1st input     2nd input         0 wait-state at 30MHz  1 wait-state at 60MHz  Average  Max.
    signal        signal                                                          (x10-5)  (x10-5)
    32-point      32-points         5,571 (185.7us)        6,127 (102.1us)        1.60     3.90
                  64-points         8,248 (274.9us)        9,070 (151.2us)        2.00     4.50
                  128-points        13,592 (453.1us)       14,944 (249.1us)       1.90     4.50
                  256-points        24,280 (809.3us)       26,688 (444.8us)       1.70     4.10
    64-point      64-points         19,087 (636.2us)       20,947 (349.1us)       1.80     4.70
                  128-points        28,532 (951.1us)       31,308 (521.8us)       1.80     4.40
                  256-points        47,412 (1.58ms)        52,012 (866.9us)       1.50     4.80
    128-point     128-points        70,694 (2.36ms)        77,471 (1.29ms)        1.80     5.30
                  256-points        105,966 (3.53ms)       116,099 (1.93ms)       1.70     5.00
    256-point     256-points        272,214 (9.07ms)       298,031 (4.97ms)       1.70     5.40

Table 2.2.1 - Benchmark of 32-bit Convolution: generic

                                    Execution time (cycles)                       Error amplitude
    1st input     2nd input         0 wait-state at 30MHz  1 wait-state at 60MHz  Average  Max.
    signal        signal                                                          (x10-9)  (x10-9)
    32-point      32-points         28,359 (945.3us)       29,182 (486.4us)       0.40     1.60
                  64-points         42,572 (1.42ms)        43,788 (729.8us)       0.40     2.10
                  128-points        70,988 (2.37ms)        72,990 (1.22ms)        0.30     1.90
                  256-points        127,820 (4.26ms)       131,390 (2.19ms)       0.40     1.70
    64-point      64-points         109,179 (3.64ms)       112,334 (1.87ms)       0.40     1.70
                  128-points        163,968 (5.47ms)       168,678 (2.81ms)       0.50     1.60
                  256-points        273,536 (9.12ms)       281,350 (4.69ms)       0.60     2.70
    128-point     128-points        429,026 (14.30ms)      441,458 (7.36ms)       0.40     2.30
                  256-points        644,074 (21.47ms)      662,677 (11.04ms)      0.40     2.00
    256-point     256-points        1,701,554 (56.72ms)    1,750,962 (29.18ms)    0.50     3.10

Table 2.2.2 - Benchmark of 32-bit Convolution: avr32-uc3 optimized

                                    Execution time (cycles)                       Error amplitude
    1st input     2nd input         0 wait-state at 30MHz  1 wait-state at 60MHz  Average  Max.
    signal        signal                                                          (x10-9)  (x10-9)
    32-point      32-points         13,361 (445.4us)       13,680 (228.0us)       0.60     2.30
                  64-points         19,958 (665.3us)       20,414 (340.2us)       0.50     2.10
                  128-points        33,142 (1.10ms)        33,872 (564.5us)       0.50     1.90
                  256-points        59,510 (1.98ms)        60,784 (1.01ms)        0.50     2.10
    64-point      64-points         50,501 (1.68ms)        51,624 (860.4us)       0.50     1.70
                  128-points        75,722 (2.52ms)        77,376 (1.29ms)        0.60     2.40
                  256-points        126,154 (4.21ms)       128,864 (2.15ms)       0.70     2.70
    128-point     128-points        196,972 (6.57ms)       201,244 (3.35ms)       0.50     2.30
                  256-points        295,540 (9.85ms)       301,887 (5.03ms)       0.60     2.30
    256-point     256-points        778,684 (25.96ms)      795,388 (13.26ms)      0.60     3.10

Table 3.1.1 - Benchmark of 16-bit FIR Filter: generic

                Number   I.R.* in SRAM (cycles)                        I.R.* in FLASH (cycles)                       Error amplitude
    Size        of Taps  0 wait-state at 30MHz  1 wait-state at 60MHz  0 wait-state at 30MHz  1 wait-state at 60MHz  Average  Max.
                                                                                                                     (x10-5)  (x10-5)
    64-point    24       7,424 (247.5us)        7,682 (128.0us)        10,704 (356.8us)       12,352 (205.9us)       2.27     9.46
                48       5,780 (192.7us)        5,996 (99.9us)         8,517 (283.9us)        9,868 (164.5us)        3.09     15.57
    256-point   24       41,793 (1.39ms)        43,202 (720.0us)       60,433 (2.01ms)        69,760 (1.16ms)        2.22     9.46
                48       70,101 (2.34ms)        72,620 (1.21ms)        103,750 (3.46ms)       120,268 (2.00ms)       5.66     27.20
                72       90,921 (3.03ms)        94,262 (1.57ms)        135,691 (4.52ms)       157,528 (2.63ms)       11.25    52.65
                100      105,127 (3.50ms)       108,903 (1.82ms)       156,466 (5.22ms)       185,989 (3.10ms)       14.19    63.71
    512-point   24       87,617 (2.92ms)        90,562 (1.51ms)        126,737 (4.22ms)       146,304 (2.44ms)       2.23     9.46
                48       155,861 (5.20ms)       161,452 (2.69ms)       230,726 (7.69ms)       267,468 (4.46ms)       5.78     27.20
                72       216,617 (7.22ms)       224,566 (3.74ms)       323,339 (10.78ms)      375,384 (6.26ms)       10.79    52.65
                100      276,391 (9.21ms)       286,311 (4.77ms)       411,442 (13.71ms)      489,093 (8.15ms)       14.05    63.71
    1024-point  24       179,265 (5.98ms)       185,282 (3.09ms)       259,345 (8.64ms)       299,392 (4.99ms)       2.21     9.46
                48       327,381 (10.91ms)      339,116 (5.65ms)       484,678 (16.16ms)      561,868 (9.36ms)       5.85     27.20
                72       468,009 (15.60ms)      485,174 (8.09ms)       698,635 (23.29ms)      811,096 (13.52ms)      10.69    52.65
                100      618,919 (20.63ms)      641,127 (10.69ms)      921,394 (30.71ms)      1,095,301 (18.26ms)    13.99    63.71

*: Impulse Response.

Table 3.1.2 - Benchmark of 16-bit FIR Filter: avr32-uc3 optimized

                Number   I.R.* in SRAM (cycles)                        I.R.* in FLASH (cycles)                       Error amplitude
    Size        of Taps  0 wait-state at 30MHz  1 wait-state at 60MHz  0 wait-state at 30MHz  1 wait-state at 60MHz  Average  Max.
                                                                                                                     (x10-5)  (x10-5)
    64-point    24       2,439 (81.3us)         2,657 (44.3us)         2,703 (90.1us)         3,054 (50.9us)         2.27     9.46
                48       2,115 (70.5us)         2,309 (38.5us)         2,355 (78.5us)         2,670 (44.5us)         3.09     15.57
    256-point   24       12,712 (423.7us)       13,841 (230.7us)       14,128 (470.9us)       15,966 (266.1us)       2.22     9.46
                48       21,604 (720.1us)       23,573 (392.9us)       24,148 (804.9us)       27,390 (456.5us)       5.66     27.20
                72       28,192 (939.7us)       30,785 (513.1us)       31,576 (1.05ms)        35,862 (597.7us)       11.25    52.65
                100      32,966 (1.10ms)        36,014 (600.2us)       36,966 (1.23ms)        42,015 (700.3us)       14.19    63.71
    512-point   24       26,408 (880.3us)       28,753 (479.2us)       29,360 (978.7us)       33,182 (553.0us)       2.23     9.46
                48       47,588 (1.59ms)        51,925 (865.4us)       53,204 (1.77ms)        60,350 (1.01ms)        5.78     27.20
                72       66,464 (2.22ms)        72,577 (1.21ms)        74,456 (2.48ms)        84,566 (1.41ms)        10.79    52.65
                100      85,574 (2.85ms)        93,486 (1.56ms)        95,974 (3.20ms)        109,087 (1.82ms)       14.05    63.71
    1024-point  24       53,800 (1.79ms)        58,577 (976.3us)       59,824 (1.99ms)        67,614 (1.13ms)        2.21     9.46
                48       99,556 (3.32ms)        108,629 (1.81ms)       111,316 (3.71ms)       126,270 (2.10ms)       5.85     27.20
                72       143,008 (4.77ms)       156,161 (2.60ms)       160,216 (5.34ms)       181,974 (3.03ms)       10.69    52.65
                100      190,790 (6.36ms)       208,430 (3.47ms)       213,990 (7.13ms)       243,231 (4.05ms)       13.99    63.71

*: Impulse Response.

Number of Taps Table 3.2.1 - Benchmark of 32-bit FIR Filter: generic 64-point 256-point 512-point 1024-point Error I.R.* in SRAM (cycles) I.R.* in FLASH (cycles) wait-state wait-state Amplitude average (x10-9) Max. 
amplitude (x10-9) 0 at 30MHz 1 at 60MHz 0 at 30MHz 1 at 60MHz 24 13,984 (466.1us) 14,365 (239.4us) 16,608 (553.6us) 18,624 (310.4us) 2.10 12.40 48 11,101 (370.0us) 11,419 (190.3us) 13,311 (443.7us) 14,865 (247.8us) 2.50 14.40 17.40 24 79,073 (2.64ms) 81,181 (1.35ms) 93,985 (3.13ms) 105,408 (1.76ms) 2.30 48 135,518 (4.52ms) 139,291 (2.32ms) 162,688 (5.42ms) 181,713 (3.03ms) 2.70 16.60 72 177,131 (5.90ms) 182,137 (3.04ms) 213,391 (7.11ms) 238,002 (3.97ms) 4.60 31.50 100 187,238 (6.24ms) 194,313 (3.24ms) 241,717 (8.06ms) 264,808 (4.41ms) 4.60 31.90 24 165,857 (5.53ms) 170,269 (2.84ms) 197,153 (6.57ms) 221,120 (3.69ms) 2.60 23.10 48 301,406 (10.05ms) 309,787 (5.16ms) 361,856 (12.06ms) 404,177 (6.74ms) 3.20 23.90 72 422,123 (14.07ms) 434,041 (7.23ms) 508,559 (16.95ms) 567,218 (9.45ms) 5.10 35.30 100 492,390 (16.41ms) 510,985 (8.52ms) 635,701 (21.19ms) 696,424 (11.61ms) 5.10 31.90 24 339,425 (11.31ms) 348,445 (5.81ms) 403,489 (13.45ms) 452,544 (7.54ms) 3.70 28.40 48 633,182 (21.11ms) 650,779 (10.85ms) 760,192 (25.34ms) 849,105 (14.15ms) 4.60 29.30 72 912,107 (30.40ms) 937,849 (15.63ms) 1,098,895 (36.63ms) 1,225,650 (20.43ms) 6.50 35.30 100 1,102,694 (36.76ms) 1,144,329 (19.07ms) 1,423,669 (47.46ms) 1,559,656 (26.00ms) 6.40 33.30 *: Impulse Response. 24 AVR32718 32076A–AVR32–11/07 AVR32718 Number of Taps Table 3.2.2 - Benchmark of 32-bit FIR Filter: avr32-uc3 optimized 64-point 256-point 512-point 1024-point Error I.R.* in SRAM (cycles) I.R.* in FLASH (cycles) wait-state wait-state Amplitude average Max. 
amplitude (x10-9) (x10-9) 0 at 30MHz 1 at 60MHz 0 at 30MHz 1 at 60MHz 24 6,479 (216.0us) 48 5,132 (171.1us) 6,613 (110.2us) 9,141 (304.7us) 10,918 (182.0us) 2.10 12.40 5,245 (87.4us) 7,305 (243.5us) 8,866 (147.8us) 2.50 14.40 24 36,432 (1.21ms) 37,140 (619.0us) 51,574 (1.72ms) 61,605 (1.03ms) 2.30 12.40 48 62,157 (2.07ms) 63,420 (1.06ms) 88,906 (2.96ms) 107,937 (1.80ms) 2.70 16.60 72 81,114 (2.70ms) 82,788 (1.38ms) 116,446 (3.88ms) 142,173 (2.37ms) 4.60 31.50 100 94,441 (3.15ms) 96,490 (1.61ms) 140,910 (4.70ms) 166,671 (2.78ms) 4.60 31.90 24 76,368 (2.55ms) 77,844 (1.30ms) 108,150 (3.61ms) 129,189 (2.15ms) 2.60 23.10 48 138,189 (4.61ms) 140,988 (2.35ms) 197,706 (6.59ms) 240,033 (4.00ms) 3.20 23.90 35.30 72 193,242 (6.44ms) 197,220 (3.29ms) 277,470 (9.25ms) 338,781 (5.65ms) 5.10 100 248,297 (8.28ms) 253,674 (4.23ms) 370,542 (12.35ms) 438,287 (7.30ms) 5.10 31.90 24 156,240 (5.21ms) 159,252 (2.65ms) 221,302 (7.38ms) 264,357 (4.41ms) 3.70 28.40 48 290,253 (9.68ms) 296,124 (4.94ms) 415,306 (13.84ms) 504,225 (8.40ms) 4.50 29.30 72 417,498 (13.92ms) 426,084 (7.10ms) 599,518 (19.99ms) 731,997 (12.20ms) 6.50 35.30 100 556,009 (18.53ms) 568,042 (9.47ms) 829,806 (27.66ms) 981,519 (16.36ms) 6.40 33.30 *: Impulse Response. 25 32076A–AVR32–11/07 Order of the filter Table 4.1.1 - Benchmark of 16-bit FIR Filter: generic 72-point 256-point 512-point 1024-point Error Execution time (cycles) wait-state 0 at 30MHz 1 at 60MHz Amplitude average Max. 
amplitude (x10-5) (x10-5) 2 5,294 (176.5us) 5,717 (95.3us) 1.50 4.00 7 11,469 (382.3us) 12,802 (213.4us) 1.40 4.30 12 16,319 (544.0us) 18,387 (306.5us) 2.50 7.60 2 19,096 (636.5us) 20,621 (343.7us) 1.50 4.00 4.30 7 44,591 (1.49ms) 49,786 (829.8us) 1.40 12 68,761 (2.29ms) 77,451 (1.29ms) 3.50 8.90 2 38,296 (1.28ms) 41,357 (689.3us) 1.50 4.00 7 90,671 (3.02ms) 101,242 (1.69ms) 1.40 4.30 12 141,721 (4.72ms) 159,627 (2.66ms) 3.70 8.90 2 76,696 (2.56ms) 82,829 (1.38ms) 1.50 4.00 7 182,831 (6.09ms) 204,154 (3.40ms) 1.40 4.30 12 287,641 (9.59ms) 323,979 (5.40ms) 3.90 8.90 Order of the filter Table 4.1.2 - Benchmark of 16-bit FIR Filter: avr32-uc3 optimized 72-point 256-point 512-point 1024-point 26 Error Execution time (cycles) wait-state 0 at 30MHz 1 at 60MHz Amplitude average Max. amplitude (x10-5) (x10-5) 4.00 2 2,725 (90.8us) 2,871 (47.9us) 1.20 7 4,332 (144.4us) 4,682 (78.0us) 1.90 6.90 12 6,089 (203.0us) 6,534 (108.9us) 3.30 8.40 2 9,167 (305.6us) 9,633 (160.6us) 1.10 4.00 7 15,006 (500.2us) 16,228 (270.5us) 1.70 6.90 12.40 12 22,283 (742.8us) 23,876 (397.9us) 5.10 2 18,127 (604.2us) 19,041 (317.4us) 1.10 4.00 7 29,854 (995.1us) 32,292 (538.2us) 1.70 6.90 12 44,811 (1.49ms) 48,004 (800.1us) 5.60 12.40 2 36,047 (1.20ms) 37,857 (631.0us) 1.10 4.00 7 59,550 (1.99ms) 64,420 (1.07ms) 1.70 6.90 12 89,867 (3.00ms) 96,260 (1.60ms) 5.80 12.40 AVR32718 32076A–AVR32–11/07 AVR32718 Order of the filter Table 4.2.2 - Benchmark of 32-bit FIR Filter: generic 72-point 256-point 512-point 1024-point Error Execution time (cycles) wait-state 0 at 30MHz 1 at 60MHz Amplitude average Max. 
amplitude (x10-9) (x10-9) 2 7,582 (252.7us) 8,072 (134.5us) 1.00 3.80 7 14,517 (483.9us) 15,922 (265.4us) 1.50 7.00 12 19,952 (665.1us) 22,097 (368.3us) 2.60 10.30 2 27,456 (915.2us) 29,232 (487.2us) 0.80 3.80 7 56,471 (1.88ms) 61,922 (1.03ms) 1.50 7.00 12 83,986 (2.80ms) 92,937 (1.55ms) 1.50 10.30 2 55,104 (1.84ms) 58,672 (977.9us) 0.80 3.80 7 114,839 (3.83ms) 125,922 (2.10ms) 1.50 7.00 12 173,074 (5.77ms) 191,497 (3.19ms) 1.40 10.30 2 110,400 (3.68ms) 117,552 (1.96ms) 0.80 3.80 7 231,575 (7.72ms) 253,922 (4.23ms) 1.50 7.00 12 351,250 (11.71ms) 388,617 (6.48ms) 1.30 10.30 Order of the filter Table 4.2.1 - Benchmark of 32-bit FIR Filter: avr32-uc3 optimized 72-point 256-point 512-point 1024-point Error Execution time (cycles) wait-state 0 at 30MHz Amplitude average Max. amplitude 1 at 60MHz (x10-9) (x10-9) 3.80 2 5,297 (176.6us) 5,662 (94.4us) 1.00 7 8,859 (295.3us) 9,233 (153.9us) 1.50 7.00 12 11,670 (389.0us) 12,239 (204.0us) 3.00 10.30 2 18,731 (624.4us) 20,014 (333.6us) 0.80 3.80 7 33,333 (1.11ms) 34,625 (577.1us) 1.50 7.00 12 46,816 (1.56ms) 48,855 (814.3us) 1.80 10.30 2 37,419 (1.25ms) 39,982 (666.4us) 0.80 3.80 7 67,381 (2.25ms) 69,953 (1.17ms) 1.60 7.00 12 95,712 (3.19ms) 99,799 (1.66ms) 1.70 10.30 2 74,795 (2.49ms) 79,918 (1.33ms) 0.80 3.80 7 135,477 (4.52ms) 140,609 (2.34ms) 1.60 7.00 12 193,504 (6.45ms) 201,687 (3.36ms) 1.60 10.30 27 32076A–AVR32–11/07 Headquarters International Atmel Corporation 2325 Orchard Parkway San Jose, CA 95131 USA Tel: 1(408) 441-0311 Fax: 1(408) 487-2600 Atmel Asia Room 1219 Chinachem Golden Plaza 77 Mody Road Tsimshatsui East Kowloon Hong Kong Tel: (852) 2721-9778 Fax: (852) 2722-1369 Atmel Europe Le Krebs 8, Rue Jean-Pierre Timbaud BP 309 78054 Saint-Quentin-enYvelines Cedex France Tel: (33) 1-30-60-70-00 Fax: (33) 1-30-60-71-11 Atmel Japan 9F, Tonetsu Shinkawa Bldg. 
1-24-8 Shinkawa
Chuo-ku, Tokyo 104-0033
Japan
Tel: (81) 3-3523-3551
Fax: (81) 3-3523-7581

Product Contact

Web Site: www.atmel.com
Technical Support: Enter Product Line E-mail
Sales Contact: www.atmel.com/contacts
Literature Requests: www.atmel.com/literature

Disclaimer: The information in this document is provided in connection with Atmel products. No license, express or implied, by estoppel or otherwise, to any intellectual property right is granted by this document or in connection with the sale of Atmel products. EXCEPT AS SET FORTH IN ATMEL’S TERMS AND CONDITIONS OF SALE LOCATED ON ATMEL’S WEB SITE, ATMEL ASSUMES NO LIABILITY WHATSOEVER AND DISCLAIMS ANY EXPRESS, IMPLIED OR STATUTORY WARRANTY RELATING TO ITS PRODUCTS INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT. IN NO EVENT SHALL ATMEL BE LIABLE FOR ANY DIRECT, INDIRECT, CONSEQUENTIAL, PUNITIVE, SPECIAL OR INCIDENTAL DAMAGES (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF PROFITS, BUSINESS INTERRUPTION, OR LOSS OF INFORMATION) ARISING OUT OF THE USE OR INABILITY TO USE THIS DOCUMENT, EVEN IF ATMEL HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Atmel makes no representations or warranties with respect to the accuracy or completeness of the contents of this document and reserves the right to make changes to specifications and product descriptions at any time without notice. Atmel does not make any commitment to update the information contained herein. Unless specifically provided otherwise, Atmel products are not suitable for, and shall not be used in, automotive applications. Atmel’s products are not intended, authorized, or warranted for use as components in applications intended to support or sustain life.

© 2007 Atmel Corporation. All rights reserved. Atmel®, logo and combinations thereof, AVR® and others are registered trademarks or trademarks of Atmel Corporation or its subsidiaries. Other terms and product names may be trademarks of others.