Digital Signal Processing Reference Guide
December 2014

Revision History
Date               Revision     Change
16 December 2014   Revision 2   Second Release
05 June 2014       Revision 1   First Release

Confidentiality Status
This is a non-confidential document.

Table of Contents
DSP Reference Guide
  Introduction
  Advantages of using the SmartFusion2/IGLOO2 Devices for DSP Applications
  SmartFusion2/IGLOO2 Mathblock Architecture
    Key Features of Mathblock
    Mathblocks Resources
    Multiplier
    Adder/Subtractor
    I/O, Control Registers
  Design Methodologies
    RTL Based Design
    Model-Based Design (Synphony ME Compiler)
    C-Based Design
  DSP Applications
    Top-level Directory Structure of the Design Files
    Advanced Math Functions
    Filter Applications
    Transform Applications
Appendix 1 – Design Files
List of Changes
Product Support
  Customer Service
  Customer Technical Support Center
  Technical Support
  Website
  Contacting the Customer Technical Support Center
    Email
    My Cases
    Outside the U.S.
  ITAR Technical Support

DSP Reference Guide

Introduction
SmartFusion®2 system-on-chip (SoC) field programmable gate array (FPGA) and IGLOO®2 FPGA devices have architectural features that address digital signal processing (DSP) applications in high-performance systems. These features include embedded mathblocks for highly efficient arithmetic computations, large static RAM (LSRAM) for bulk data storage, and micro SRAM (uSRAM) for small data storage requirements such as coefficients.
In addition to these features, the flash-based technology and low-power options built into the Microsemi® SoC FPGAs offer unique advantages in low-power DSP-based systems.

The SmartFusion2/IGLOO2 mathblocks facilitate the following signal processing applications:
• Finite Impulse Response (FIR) filters
• Infinite Impulse Response (IIR) filters
• Fast Fourier Transforms (FFT)
• Inverse Fast Fourier Transforms (IFFT)
• Discrete Cosine Transforms (DCT)

The SmartFusion2/IGLOO2 mathblocks have built-in multipliers and adders that minimize the external logic required to implement multiplication, multiply-add, and multiply-accumulate (MACC) functions, resulting in efficient resource usage and improved performance for DSP applications.

This reference guide describes the architectural features of the embedded mathblock, the DSP design methodologies that are supported, and a few widely used DSP applications with performance details.

Advantages of using the SmartFusion2/IGLOO2 Devices for DSP Applications
This section describes the advantages of using SmartFusion2/IGLOO2 devices for DSP applications. It has the following subsections:
• Fabric Performance
• Low Power FPGA
• Embedded Micro Controller

Fabric Performance
• SmartFusion2/IGLOO2 has double the fabric performance of the previous generation of Microsemi FPGAs.
• Built-in mathblocks that support DSP applications at up to 350 MHz
• Built-in fabric clock conditioning circuits (FCCC) that generate clock frequencies of up to 400 MHz
• Fabric memories, LSRAM and uSRAM, operating at 400 MHz
• 16x 5 Gbps serializer/deserializer (SERDES), PCIe, and XAUI/XGXS + Native SERDES
• Densities up to 150 K look-up tables (LUTs), 5 Mbit SRAM, and 4 Mbit eNVM

Low Power FPGA
• 10 mW static power during operation

Embedded Micro Controller
• The SmartFusion2 SoC FPGA has a 166 MHz ARM® Cortex®-M3 processor with on-chip embedded SRAM (eSRAM) and embedded nonvolatile memory (eNVM) for processor-based DSP applications.
• The microcontroller subsystem (MSS) includes peripherals for controller area network (CAN), Tri-Speed Ethernet, and universal serial bus (USB).

SmartFusion2/IGLOO2 Mathblock Architecture
The SmartFusion2/IGLOO2 mathblock architecture is optimized to implement common DSP functions with maximum performance and minimum logic resource utilization. The dedicated routing region around the mathblock and the feedback paths provided in each mathblock maintain high performance by eliminating routing congestion.

Key Features of Mathblock
• High-performance, power-optimized multiplication operations
• Supports 18 × 18 signed multiplication natively
• Supports 17 × 17 unsigned multiplication
• Supports dot-product: The multiplier computes $(A[8:0] \times B[17:9] + A[17:9] \times B[8:0]) \times 2^{9}$.
• Built-in addition, subtraction, and accumulation units to combine multiplication results efficiently
• Adder support: (A × B) + C or (A × B) + D or (A × B) + C + D
• Independent registered third input C with a data width of 44 bits
• Supports both registered and unregistered inputs and outputs (I/Os)
• Supports signed and unsigned operations
• Internal cascade signals (44-bit cascade input (CDIN) and cascade output (CDOUT)) enable cascading of the mathblocks to support a larger accumulator/adder/subtractor without extra logic.
• Supports loopback capability
• Supports up to 350 MHz operation
• Clock-gated input and output registers for power optimization
• Capability to extend the width of the adder/accumulator by implementing extra adders in the FPGA fabric or by using mathblocks

Mathblocks Resources
Table 1 lists the number of mathblocks available in the SmartFusion2/IGLOO2 devices.

Table 1 • Mathblocks Resources
SmartFusion2   IGLOO2     Number of        Number of Mathblocks   Total Number
Device         Device     Mathblock Rows   Per Row                of Mathblocks
M2S005         M2GL005    1                11                     11
M2S010         M2GL010    2                11                     22
M2S025         M2GL025    2                17                     34
M2S050         M2GL050    3                24                     72
M2S090         M2GL090    3                28                     84
M2S150         M2GL150    6                40                     240

Figure 1 shows the functional diagram of the mathblock.

[Figure 1: block diagram showing the A, B, and C input registers, the multiplier, the adder/subtractor with operand D, and the P output register, together with the control registers for SUB, ARSHFT17, CDSEL, and FDBKSEL, the DOTP input, CARRYIN, OVFL_CARRYOUT, and the CDIN/CDOUT cascade ports.]
Figure 1 • Functional Diagram of the SmartFusion2/IGLOO2 Mathblock

Mathblocks can be accessed through the FPGA routing architecture and cascaded in a chain, starting from the left-most block to the right-most block. Each mathblock consists of:
• Multiplier
• Adder/Subtractor
• I/O, Control Registers

Multiplier
The SmartFusion2/IGLOO2 mathblock can be used as a multiplier that accepts two 18-bit inputs (A and B) and generates a 36-bit output.
The mathblock multiplier can be configured in two operating modes:
• Normal Mode
• Dot Product (DOTP) Mode

Normal Mode
In Normal mode, the mathblock implements a single 18 × 18 signed multiplier. The mathblock accepts the A[17:0] and B[17:0] inputs and generates A × B as a 36-bit wide result. Figure 2 shows the functional block diagram of the mathblock in Normal mode.

[Figure 2: A[17:0] and B[17:0] feed the multiplier (36-bit product); the adder/subtractor combines the product with C[43:0], D[43:0], and CARRYIN under control of SUB to produce P[43:0].]
Figure 2 • Functional Block Diagram of the Mathblock in Normal Mode

Dot Product (DOTP) Mode
DOTP mode has two independent 9-bit × 9-bit multipliers followed by an adder, and the product sum is stored in the upper 36 bits of the 44-bit register. In DOTP mode, the mathblock implements the following equation:

$\text{DOTP result} = (A[8:0] \times B[17:9] + A[17:9] \times B[8:0]) \times 2^{9}$   (EQ 1)

DOTP mode can be used to implement 9 × 9 complex multiplications. Figure 3 shows the functional block diagram of the mathblock in DOTP mode.

[Figure 3: A[17:9] × B[8:0] and A[8:0] × B[17:9] are summed into a 36-bit result; the adder/subtractor combines it with C[43:0], D[43:0], and CARRYIN under control of SUB to produce P[43:0].]
Figure 3 • Functional Block Diagram of the Mathblock in DOTP Mode

Math Functions with DOTP
When DOTP is enabled, several mathematical functions can be implemented using a single mathblock. Some of them are listed in Table 2.

Table 2 • Math Functions with DOTP
S.No   Conditions                                          Implemented Equations
1      P = A[8:0] = B[17:9]; M = A[17:9]; N = B[8:0]       Y = P² + M×N
2      P = A[8:0] = B[17:9]; Q = A[17:9] = B[8:0]          Y = P² + Q²
3      A[8:0] = B[17:9] = 1; Q = A[17:9] = B[8:0]          Y = 1 + Q²
4      A[8:0] = B[17:9] = 1; P = A[17:9]; Q = B[8:0]       Y = 1 + P×Q
5      P = A[8:0] = A[17:9]; Q = B[17:9] = B[8:0]          Y = P×Q + P×Q = 2×P×Q

The 9-bit × 9-bit multipliers are extensively used in low-precision video processing applications such as color space converters (YCbCr to RGB, YUV to RGB, and so on) and chroma resamplers (NTSC, PAL, and so on).
In image processing, 9-bit × 9-bit multipliers are used for operations involving 8-bit RGB data, such as 3×3, 5×5, and 7×7 matrix multiplications, image enhancement techniques, scaling, and resizing. The SmartFusion2/IGLOO2 devices address these applications by using the mathblock in DOTP mode.

DOTP Mode Usage Recommendations
When designing with the DOTP multiplier, follow these recommendations to achieve better performance:
• To implement the equation Y = A×B + C×D, instantiate an arithmetic IP core from the Libero catalog with DOTP enabled for 9×9 multiplications. This avoids inferring two 18×18 multipliers.
• Register the I/Os when using arithmetic IP cores (mathblock).
• The registered I/Os must use the same clock.

Use the cascade feature to connect multiple mathblocks. This is achieved by connecting the CDOUT of one mathblock to the CDIN of another mathblock. Refer to the "DSP Applications" section for more information on design examples.

Adder/Subtractor
The adder sums the output from the multiplier, the C input, CARRYIN, and the D input. The final output (P) of the adder is ((A[17:0] × B[17:0]) + C[43:0] + D[43:0] + CARRYIN). The mathblock can be configured as a 2-input or 3-input adder.
• As a 2-input adder, the mathblock computes A × B + C or A × B + D.
• As a 3-input adder, the mathblock computes A × B + C + D.
If the adder is configured as a subtractor, the adder output is ((C[43:0] + D[43:0] + CARRYIN) – (A[17:0] × B[17:0])).

I/O, Control Registers
Mathblocks have built-in registers on the data inputs (A, B, C), the data output (P), and the control signals. If required, these registers can be bypassed. All the registers in the mathblock have clock gating capability to reduce power consumption. Mathblocks do not have a pipeline register at the cascade input (CDIN), so pipeline registers can be added from the fabric when multiple mathblocks are cascaded to implement higher bit-width multiplications.
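The multiplier and adder/subtractor equations given above can be checked with a small behavioral model. The following Python sketch is illustrative only and is not part of the design files: the function names `to_signed`, `pack9`, and `mathblock` are hypothetical, registers and pipelining are ignored, and the 9-bit halves in DOTP mode are assumed to be signed two's-complement values.

```python
def to_signed(value, bits):
    """Wrap an integer into a two's-complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def pack9(hi, lo):
    """Pack two 9-bit fields into one 18-bit operand: {hi[8:0], lo[8:0]}."""
    return ((hi & 0x1FF) << 9) | (lo & 0x1FF)


def mathblock(a, b, c=0, d=0, carryin=0, sub=False, dotp=False):
    """Combinational model of one mathblock (assumption: no registers)."""
    a = to_signed(a, 18)
    b = to_signed(b, 18)
    if dotp:
        # EQ 1: (A[8:0] x B[17:9] + A[17:9] x B[8:0]) x 2^9
        a_lo, a_hi = to_signed(a & 0x1FF, 9), to_signed(a >> 9, 9)
        b_lo, b_hi = to_signed(b & 0x1FF, 9), to_signed(b >> 9, 9)
        product = (a_lo * b_hi + a_hi * b_lo) << 9
    else:
        # Normal mode: single 18 x 18 signed multiplication
        product = a * b
    addend = to_signed(c, 44) + to_signed(d, 44) + (carryin & 1)
    # Subtractor: (C + D + CARRYIN) - (A x B); adder: (A x B) + C + D + CARRYIN
    p = addend - product if sub else addend + product
    return to_signed(p, 44)  # P is 44 bits wide


if __name__ == "__main__":
    print(mathblock(3, 4, c=5, d=6, carryin=1))            # 3-input adder
    print(mathblock(3, 4, c=5, d=6, carryin=1, sub=True))  # subtractor
    # Table 2, row 1: P = 3, M = 2, N = 5 gives (P*P + M*N) * 2^9
    print(mathblock(pack9(2, 3), pack9(3, 5), dotp=True))
```

In this model the DOTP result carries the factor of 2^9 from EQ 1, so a caller would shift the output right by 9 bits to read the sum of products directly.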
A Input Register, B Input Register, and P Output Register
A and B are the input registers with 18-bit data width, and P is the output register with 44-bit data width.

C Input
The C input port allows the formation of many 3-input mathematical functions, such as 3-input addition or 2-input multiplication with an addition. The CARRYIN signal is the carry input of the adder or accumulator. The C input can also be used as a dynamic input to achieve the following functionalities:
• Wrapping around the cascade chain of mathblocks from one row to the next row through the fabric
• Rounding of multiplication outputs
• Trimming of lower-order bits of the final sum, partial sum, or product

Rounding
Rounding can be computed by adding a fixed term and a variable term to the input value to be rounded and then truncating. The fixed term can be fed from the C input of the mathblock, and its value depends on the number of fraction bits to be removed by rounding. The variable term is always a single bit in the least-significant position, whose value is determined from the input value according to the type of rounding. Types of rounding are:
• Round to the adjacent even integer: The variable term is determined from the 2^0 bit of the input value.
• Round towards zero: The variable term is determined from the sign bit of the input value. For example, 1.5 rounds to 1 and -1.5 rounds to -1.
Table 3 provides examples for 6-bit values with three fraction bits, where the variable term, when set, is 000.001.
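The rounding procedure described above (add a fixed term, add a one-bit variable term, then truncate) can be sketched in Python as follows. This is a minimal illustration rather than the guide's implementation: the function name `round_with_terms` and its arguments are hypothetical, values are held as integers scaled by 2^frac_bits, and the fixed term is assumed to be one LSB less than one half, which matches the 0.011 fixed term used for three fraction bits.

```python
def round_with_terms(x, frac_bits, mode):
    """Round a fixed-point integer x (scaled by 2**frac_bits) to an integer.

    mode is "even" (ties go to the adjacent even integer) or
    "zero" (ties go towards zero).
    """
    fixed_term = (1 << (frac_bits - 1)) - 1        # for example 0.011 for 3 bits
    if mode == "even":
        variable_term = (x >> frac_bits) & 1       # the 2^0 bit of the input
    elif mode == "zero":
        variable_term = 1 if x < 0 else 0          # the sign bit of the input
    else:
        raise ValueError("mode must be 'even' or 'zero'")
    total = x + fixed_term + variable_term         # C input plus CARRYIN
    return total >> frac_bits                      # truncate the fraction bits


if __name__ == "__main__":
    for value in (2.5, 1.5, -1.5, -2.5):
        scaled = int(value * 8)                    # three fraction bits
        print(value,
              round_with_terms(scaled, 3, "even"),
              round_with_terms(scaled, 3, "zero"))
```

Python's right shift on negative integers is arithmetic, which is what truncation of a two's-complement sum requires here.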
Table 3 • Rounding Examples
                                 Round To Even                              Round Toward Zero
Input Value        Fixed Term    Variable             Truncated             Variable             Truncated
Decimal  Binary    (C-Input)     Term      Sum        Sum        Decimal    Term      Sum        Sum        Decimal
 2.5     010.100   0.011         000.000   010.111    010         2         000.000   010.111    010         2
 1.5     001.100   0.011         000.001   010.000    010         2         000.000   001.111    001         1
-1.5     110.100   0.011         000.000   110.111    110        -2         000.001   111.000    111        -1
-2.5     101.100   0.011         000.001   110.000    110        -2         000.001   110.000    110        -2

[Figure 4: A[17:0] and B[17:0] feed the multiplier; the fixed term drives the C input, the 1-bit variable term drives CARRYIN, and the adder produces P[43:0].]
Figure 4 • Rounding using C-Input and CARRYIN

Trimming
Trimming of the Final Sum: Applications like IIR and FFT often require the rounding and trimming of the final result (for example, the last output of a cascade chain or the final value read from an accumulator). The addition of the rounding terms can be done as shown in Figure 5, and the final results can be trimmed in the fabric.

[Figure 5: cascaded mathblocks with the fixed term on Cin of the first block and the variable term multiplied by 1 in the last block to produce P.]
Figure 5 • Rounding and Trimming of the Final Sum

Note: The Fixed Term is connected to Cin of the first mathblock in a cascaded chain. The Variable Term is connected to the multiplier input (A or B) of the last mathblock in a cascaded chain, as shown in Figure 5.

Trimming of Grouped Sums: When computing very large dot products (for example, a large, fully enumerated FIR), overflow can be avoided by breaking the sum into a few groups, trimming the sum for each group, and only then combining the groups' sums into a final result. The rounding of each group's sum can be done as shown in Figure 5. The trimming of each group's sum and the summation of the final result can be done in the fabric. Trimming can be done between the output of each cascade and the final fabric adder.

Trimming of Products: Figure 6 shows the implementation of rounding all products towards zero and then trimming the least significant m bits of the product.
As long as there are no additive terms other than the products, it is possible to equivalently trim the partial sums instead of the products. For rounding towards zero, the sign bit of the product (A × B) can be derived from the sign bits of the incoming factors A and B (A[17] and B[17]) using an XOR.

[Figure 6: cascaded mathblocks in which A[17] XOR B[17] supplies the variable term, C[m-1] supplies the fixed term, and P[43:m] of one block feeds C[43:m] of the next.]
Figure 6 • Rounding and Trimming of Products

Cascaded Input, Output, and Selection
Higher-level DSP functions are supported by cascading individual mathblocks in a row. The two data signals, CDIN[43:0] and CDOUT[43:0], provide the cascading capability together with a cascade select input (CDSEL). Table 4 shows the selection of CDSEL for propagating CDIN to the D input of the adder. To cascade mathblocks, the CDOUT of one block must feed the CDIN of another block. CDOUT to CDIN is a hardwired connection between the blocks within a row. Two different rows can be cascaded using the fabric routing between the two rows. Extra pipeline registers may be needed to compensate for the extra delay added by the fabric routing, which in turn increases the latency of the chain.

The ability to cascade mathblocks is useful in filter designs. For example, a finite impulse response (FIR) filter design can use cascading inputs (CDINs) to arrange a series of input data samples and cascading outputs (CDOUTs) to arrange a series of partial output results. The ability to cascade provides a high-performance and low-power implementation of DSP filter functions because the general routing in the fabric is not used. For more details, refer to the "DSP Applications" section.

Overflow Output
Each mathblock has an overflow signal, OVFL_CARRYOUT. This signal indicates any overflow from the addition operation performed by the adder. This signal is also used to extend the adder data width beyond the existing 44 bits using the fabric.
The overflow signal is also used for the implementation of saturation capabilities. Saturation refers to catching an overflow condition and replacing the output with either the maximum (most positive) or minimum (most negative) value that can be represented. In the IGLOO2 mathblocks, this capability is implemented using the adder's output sign bit (the MSB, bit [43] of the P output) and the overflow signal.

Shift Input
For multi-precision arithmetic, mathblocks provide a wired right shift by 17 bits, which is controlled by the ARSHFT17 input. Thus, a partial product from one mathblock can be shifted to the right and added to the next partial product computed in an adjacent mathblock. Using this technique, mathblocks can be used to build bigger multipliers.

Feedback Select Input
For accumulation operations, the mathblock output needs to loop back to the D input of the adder block. Selection of the D input is controlled by the feedback select (FDBKSEL) input. Table 4 shows the selection of FDBKSEL for loopback.

Table 4 • Truth Table for Propagating Operand D of the Adder/Accumulator
CDSEL   FDBKSEL   ARSHFT17   Operand D
0       0         0          0
0       0         1          0
1       X         0          CDIN[43:0]
1       X         1          {{17{CDIN[43]}}, CDIN[43:18]}
0       1         0          P[43:0]
0       1         1          {{17{P[43]}}, P[43:18]}

Design Methodologies
The following design methodologies are available for developing DSP-based applications using Microsemi SoCs or FPGAs:
• RTL Based Design
• Model-Based Design (Synphony ME Compiler)
• C-Based Design

RTL Based Design
Register transfer level (RTL) based design is the conventional approach for developing FPGA-based designs. In DSP applications, the mathblock can be used either by inference from an RTL coding style or through the mathblock primitives available in the Libero® System-on-Chip (SoC) Smart cores. Synplify Pro® automatically infers a mathblock if the design contains multiply, multiply-accumulate, or multiply-add/subtract operators.
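A multiply-accumulate operator of the kind mentioned above uses the adder feedback path, so the operand D selection of Table 4 can be illustrated with a short behavioral sketch. The Python below is an assumption-laden model, not vendor code: the names `wrap44`, `operand_d`, and `macc` are hypothetical, registers are ignored, and the shift is modeled as a plain arithmetic right shift by 17 bits, as the ARSHFT17 name suggests.

```python
def wrap44(value):
    """Wrap an integer into the 44-bit two's-complement range of P."""
    value &= (1 << 44) - 1
    return value - (1 << 44) if value >> 43 else value


def operand_d(cdsel, fdbksel, arshft17, cdin, p):
    """Select operand D of the adder following the priority in Table 4."""
    if cdsel:                 # cascade input wins regardless of FDBKSEL
        source = cdin
    elif fdbksel:             # loopback of the previous output P
        source = p
    else:                     # neither selected: operand D is zero
        return 0
    # Assumption: arithmetic (sign-preserving) right shift by 17 bits
    return source >> 17 if arshft17 else source


def macc(pairs):
    """Accumulate a sequence of (a, b) products using the feedback path."""
    p = 0
    for a, b in pairs:
        d = operand_d(cdsel=0, fdbksel=1, arshft17=0, cdin=0, p=p)
        p = wrap44(a * b + d)
    return p


if __name__ == "__main__":
    print(macc([(2, 3), (4, 5), (-1, 6)]))
```

Setting `cdsel=1` instead of `fdbksel=1` in this sketch would model a cascaded adder chain, where each block adds its product to the CDOUT of the previous block.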
The mathblock primitive available in the Libero SoC IP catalog is called MACC. The mathblock primitive can be used in designs through SmartDesign for schematic-based design entry or by directly instantiating the mathblock wrapper in a hardware description language (HDL) file as a component. For more information on VHDL/Verilog coding styles for inferring mathblocks, refer to Inferring Microsemi SmartFusion2 MACC Blocks.

Smart Cores for Mathblock Configuration
The SmartFusion2/IGLOO2 devices have embedded hard mathblocks for the simple to complex arithmetic functions used in DSP applications. The Libero tool has an Arithmetic Catalog, which includes the following cores to configure the embedded hard mathblock:
• Hard Multiplier Accumulator
• Hard Multiplier AddSub
• Hard Multiplier Signed

Hard Multiplier Accumulator
The hard multiplier accumulator for the SmartFusion2/IGLOO2 devices supports Normal (Figure 7) and Dot Product (Figure 8) mode multiplications. In Figure 7, the control registers are shown in blue and the data registers in brown.

[Figure 7: A0[17:0] × B0[17:0] feeds the adder/subtractor together with C[43:0], CARRYIN, and the optionally right-shifted feedback of P; outputs are P[43:0], CDOUT, and CARRYOUT/OVERFLOW. Pn = Pn-1 + CARRYIN + C +/- (A0 × B0).]
Figure 7 • Hard Multiplier Accumulator - Normal Mode

[Figure 8: two 9 × 9 multipliers (A0 × B0 and A1 × B1) feed the accumulator together with C and CARRYIN; outputs are P, CDOUT, and CARRYOUT/OVERFLOW.]
Figure 8 • Hard Multiplier Accumulator - DOTP Mode

Smart Cores Key Features
The hard multiplier accumulator supports two operating modes: Normal and Dot Product.
• A structural netlist is generated in either Verilog or VHDL.
• Individual inputs and outputs can be optionally registered with:
  – A common rising-edge clock
  – Independent active-low asynchronous and synchronous clear controls
  – Independent active-high enable controls
• An additional cascade output CDOUT can be enabled.
This is the sign-extended 44-bit copy of output P.
• An additional Carry In input can be enabled.
• An additional Carry Out or Overflow output can be enabled.
• Normal mode features:
  – Configurable operand widths for A0 and B0 between 2 and 18
  – Configurable operand width for C between 2 and 44
  – Optional assignment of operand A0 to an 18-bit two's complement constant
  – Optional assignment of operand C to a 44-bit two's complement constant
  – Option to select between multiplier followed by adder, subtractor, or dynamic addsub
  – Optional Arithmetic Right Shift by 17 bits of the feedback input
• DOTP mode features:
  – Configurable operand widths for A0, B0, A1, B1 between 2 and 9
  – Configurable operand width for C between 2 and 35
  – Optional assignment of operands A0 and A1 to a 9-bit two's complement constant
  – Optional assignment of operand C to a 35-bit two's complement constant

For usage and configuration of the Hard Multiplier Accumulator, refer to the IGLOO2/SmartFusion2 Hard Multiplier Accumulator Configuration User Guide.

Hard Multiplier AddSub
The hard multiplier addsub for the SmartFusion2/IGLOO2 devices supports Normal (Figure 9) and Dot Product (Figure 10) mode multiplication. In Figure 9 and Figure 10, the control registers are shown in blue and the data registers in brown.

[Figure 9: A × B feeds the adder/subtractor (SUB) together with C, CARRYIN, and operand D, which is CDIN optionally right-shifted (ARSHIFT); outputs are P, CDOUT, and CARRYOUT/OVERFLOW.]
Figure 9 • Hard Multiplier Addsub - Normal Mode

[Figure 10: same structure as Figure 9 with the multiplier operating in DOTP mode.]
Figure 10 • Hard Multiplier Addsub - DOTP Mode

Smart Core Hard Multiplier AddSub Key Features
The hard multiplier addsub supports two operating modes: Normal and Dot Product.
• A structural netlist is generated in either Verilog or VHDL.
• Individual inputs and outputs can be optionally registered with:
  – A common rising-edge clock
  – Independent active-low asynchronous and synchronous clear controls
  – Independent active-high enable controls
• An additional cascade output CDOUT can be enabled. This is the sign-extended 44-bit copy of output P.
• An additional cascade input CDIN from the previous mathblock can be enabled.
• An additional Carry In input can be enabled.
• An additional Carry Out or Overflow output can be enabled.
• Normal mode features:
  – Configurable operand widths for A0 and B0 between 2 and 18
  – Configurable operand width for C between 2 and 44
  – Optional assignment of operand A0 to an 18-bit two's complement constant
  – Optional assignment of operand C to a 44-bit two's complement constant
  – Option to select between multiplier followed by adder, subtractor, or dynamic addsub
  – Optional Arithmetic Right Shift by 17 bits of the Cascade input
• Dot Product mode features:
  – Configurable operand widths for A0, B0, A1, B1 between 2 and 9
  – Configurable operand width for C between 2 and 35
  – Optional assignment of operands A0 and A1 to a 9-bit two's complement constant
  – Optional assignment of operand C to a 35-bit two's complement constant

For usage and configuration of the hard multiplier accumulator, refer to the IGLOO2/SmartFusion2 Hard Multiplier Accumulator Configuration User Guide.

Hard Multiplier Signed
The hard multiplier for the SmartFusion2/IGLOO2 devices supports two's complement Normal (Figure 11) and Dot Product (Figure 12) mode multiplication.

[Figure 11: a single multiplier, A × B, with outputs P and CDOUT.]
Figure 11 • Hard Multiplier Signed - Normal Mode

[Figure 12: two multipliers whose products are summed, with outputs P and CDOUT.]
Figure 12 • Hard Multiplier Signed - DOTP Mode

Smart Core Key Features
The hard multiplier supports two operating modes: Normal and Dot Product.
• A structural netlist is generated in either Verilog or VHDL.
• Individual inputs and outputs can be optionally registered with:
  – A common rising-edge clock
  – Independent active-low asynchronous and synchronous clear controls
  – Independent active-high enable controls
• An additional cascade output CDOUT can be enabled. This is the sign-extended 44-bit copy of output P.
• Normal mode features:
  – Configurable operand widths for A0 and B0 between 2 and 18
  – Optional assignment of operand A0 to an 18-bit two's complement constant
• Dot Product mode features:
  – Configurable operand widths for A0, B0, A1, B1 between 2 and 9
  – Optional assignment of operands A0 and A1 to a 9-bit two's complement constant

For usage and configuration of the hard multiplier accumulator, refer to the IGLOO2/SmartFusion2 Hard Multiplier Accumulator Configuration User Guide.

Model-Based Design (Synphony ME Compiler)
Synphony ME Compiler is a Synopsys Microsemi Edition DSP tool for FPGA-based designs in the MathWorks model-based design environment (MATLAB/Simulink). It enables the DSP designer to evaluate an algorithm at a higher level of abstraction using MATLAB and Simulink, along with an exhaustive DSP blockset and Microsemi IPs.

The Synphony Simulink blockset is a DSP library that includes DSP functions and algorithms. These functions include common DSP building blocks such as adders, multipliers, and registers. A set of complex DSP building blocks such as forward error correction blocks, FFTs, filters, and memories is also included. Designs are captured in the DSP Simulink modeling environment using the Synphony ME Compiler blockset. Synphony ME Compiler automatically converts the high-level system DSP model to RTL. The RTL can be synthesized to a Microsemi FPGA/SoC using the Synopsys high-level synthesis tool, Synplify Pro ME.
The Synphony ME model compiler provides a system integration platform for DSP designs on FPGAs. It integrates the RTL, Simulink, MATLAB, and C/C++ components of a DSP system and provides a single simulation and implementation environment. The Synphony ME model compiler supports a black box block that allows RTL to be imported into Simulink and co-simulated. Figure 13 shows the DSP design flow using the Synphony ME model compiler and the Libero design tools.

[Figure 13: flow from User Design to MATLAB/Simulink and Synphony Model Compiler ME, then into Libero: Synthesis (Synplify Pro ME), Simulation (ModelSim ME), Place and Route, Generate Programming File, Debug (Identify ME), and finally the Microsemi SoC/FPGA.]
Figure 13 • DSP Design Flow using Synphony ME Model Compiler and Libero Tools

For more information on the DSP design flow using the Synphony Model compiler, refer to the IGLOO2/SmartFusion2 DSP Flow Tutorial.

C-Based Design
The SmartFusion2 device has a built-in Cortex-M3 processor, which can use the Cortex Microcontroller Software Interface Standard (CMSIS)-DSP library to develop DSP-based applications. The CMSIS-DSP library includes vector operations, matrix computing, complex arithmetic, filter functions, control functions (PID), Fourier transforms, and many other frequently used DSP algorithms. Most algorithms are available in floating-point and various fixed-point formats and are optimized for the Cortex-M series processors. For more information on the CMSIS-DSP library and its usage, visit the ARM website www.arm.com/cmsis.

DSP Applications
This section describes how to implement DSP applications and provides example applications that use SmartFusion2/IGLOO2 mathblocks. This section has the following subsections:
• Advanced Math Functions
• Filter Applications
• Transform Applications

Notes:
1. The Resource Utilization and Timing Summary given are specific to the SmartFusion2 devices. These reports also apply to IGLOO2 devices.
2.
The VHDL design files are available as part of the document. The Verilog files will be provided on request.

Top-level Directory Structure of the Design Files
The design files are provided for example applications. Figure 14 shows the top-level structure of the design files:

[Figure 14: directory tree with <download_folder>\DSP_Reference_Guide_DF at the top, one subfolder per example design (for example, BarrelShifter and Shiftreg_SLL_SRL), and a readme file.]
Figure 14 • Top-level Directory Structure

Advanced Math Functions
The SmartFusion2/IGLOO2 mathblock efficiently performs a wide range of math functions, including adders, subtractors, multipliers, dividers, accumulators, multiply and accumulate (MAC), counters, and shifters. The pipeline stages within the embedded mathblock ensure high-performance arithmetic functions. The cascade feature of the mathblock and the associated routing provide fast routing between the SmartFusion2/IGLOO2 mathblocks with less routing congestion in the FPGA fabric. This section describes the realization of some of the math functions using the SmartFusion2/IGLOO2 mathblocks. It has the following subsections:
• Barrel Shift Register
• 18-Bit Shift Register
• Wide-Multiplier
• Extended Addition
• 44-Bit Counter
• 88-Bit Accumulator

Barrel Shift Register
The barrel shift register can be implemented using the SmartFusion2/IGLOO2 mathblocks and is useful for shifting data quickly. Data shifting is required in many operations, such as address generation and other arithmetic functions. Using a barrel shifter, data can be shifted or rotated by any number of bits in a single operation. The barrel shifter shifts the 18-bit value to the left by the value K.
The bits shifted out of the most significant end reappear at the least significant bit (LSB) end of the result, completing the circular shift. Figure 15 shows the 18-bit barrel shifter using two SmartFusion2/IGLOO2 mathblocks. The 18-bit barrel shifter (ROL) can be implemented using the following equation:

Result = A*2^K + A*(2^(K-1) / 2^17)
       = A*2^K + ((A/2)*2^K) / 2^17
       = A*2^K + {(A/2)*2^K} >> 17        EQ 2

The result is the lower 18 bits of this sum. For example, Figure 15 shows input A = 3CCAD rotated left by 5 bits, with the shift value 2^5 applied to input B of the first mathblock. The rotated output is 195BE: the left-shifted part is 195A0 and the five bits shifted out of the MSB end (11110) fill the five LSBs. Here, input A is an unsigned number. For signed numbers, refer to the "Example2: Arithmetic Shift Left/Arithmetic Shift Right" section.

[Figure 15 block diagram: the first mathblock multiplies A[17:0] = "111100110010101101" {3CCAD} by B = 2^5; the second mathblock multiplies A >> 1 by B = 2^5, and its product shifted right by 17 is added to give Result = "011001010110111110" {195BE}.]

Figure 15 • 18-Bit Barrel Shifter using Mathblock

Design Files

For the implementation code, refer to the BarrelShifter_18bit.vhd design file. Download the design files from:
<download_folder>\DSP_Reference_Guide_DF\BarrelShifter\hdl\BarrelShifter_18bit.vhd

Resource Utilization and Timing Summary

Report 1 shows the resources utilized by the m2s050fbga896-1 device for the BarrelShifter_18bit implementation.

Type              Used   Total   Percentage
COMB              73     56340   0.13
SEQ               90     56340   0.16
IO (W/ clocks)    38     375     10.13
Differential IO   0      187     0.00
GLOBAL            2      16      12.50
RGB               2      1088    0.18
RAM64x18          0      72      0.00
RAM1K18           0      69      0.00
MACC              2      72      2.78
RCOSC_25_50MHZ    0      1       0.00
RCOSC_1MHZ        0      1       0.00
XTLOSC            0      1       0.00
CCC               0      6       0.00
MSS               0      1       0.00
FDDR              0      1       0.00
SYSCTRL           0      1       0.00

Report 1 • Resource Utilization for 18-bit Barrel Shifter

Table 5 shows the timing summary for the BarrelShifter_18bit implemented on the m2s050fbga896-1 device.
Table 5 • Timing Summary for 18-bit Barrel Shifter

Clock Domain                 CLK
Period (ns)                  4.094
Frequency (MHz)              244.260
Required Period (ns)         5.000
Required Frequency (MHz)     200.000
External Setup (ns)          1.356
External Hold (ns)           0.814
Min Clock-To-Out (ns)        5.210
Max Clock-To-Out (ns)        9.064

18-Bit Shift Register

The SmartFusion2/IGLOO2 mathblock 18x18 multiplier can be used to perform shift operations. To shift left by K bits and shift right by (18-K) bits:
1. Apply the value to be shifted to port A.
2. Apply 2^K to port B, where K is the required left shift. K can be up to 17.
3. Read the result from the lower 36 bits of P[43:0]: the lower 18 bits (P[17:0]) hold A shifted left by K bits, and the upper 18 bits (P[35:18]) hold A shifted right by (18-K) bits and sign extended.

Example1: Shift Left Logic/Shift Right Logic

Figure 16 shows input A shifted left by 5 bits, with the result stored in P[17:0]. P[35:18] holds input A shifted right by 13 bits (18-K), with the most significant bits (MSBs) as zeros.

[Figure 16 block diagram: mathblock multiplier with
A[17:0] = "001100110010101101" {0CCAD}
B[17:0] = "000000000000100000" {2^5}
P[35:18] = "000000000000000110" {Shift right by 13 bits (18-5)}
P[17:0] = "011001010110100000" {Shift left by 5 bits}]

Figure 16 • Logical Shift Left/Right using Mathblock

Design Files

For the implementation code, refer to the SLL_SRL.vhd design file. Download the design files from:
<download_folder>\DSP_Reference_Guide_DF\Shiftreg_SLL_SRL\hdl\SLL_SRL.vhd

Synthesis and Place-and-Route Results

Report 2 shows the resources utilized by the m2s050fbga896-1 device for the Shift Left logic 18-bit implementation.
Type              Used   Total   Percentage
COMB              37     56340   0.07
SEQ               53     56340   0.09
IO (W/ clocks)    55     375     14.67
Differential IO   0      187     0.00
GLOBAL            2      16      12.50
RGB               2      1088    0.18
RAM64x18          0      72      0.00
RAM1K18           0      69      0.00
MACC              1      72      1.39
RCOSC_25_50MHZ    0      1       0.00
RCOSC_1MHZ        0      1       0.00
XTLOSC            0      1       0.00
CCC               0      6       0.00
MSS               0      1       0.00
FDDR              0      1       0.00
SYSCTRL           0      1       0.00

Report 2 • Resource Utilization for Shift Left/Right Logic

Table 6 shows the timing summary for the Shift Left logic 18-bit implemented on the m2s050fbga896-1 device.

Table 6 • Timing Summary for Shift Left/Right Logic

Clock Domain                 CLK
Period (ns)                  2.189
Frequency (MHz)              456.830
Required Period (ns)         5.000
Required Frequency (MHz)     200.000
External Setup (ns)          0.721
External Hold (ns)           0.808
Min Clock-To-Out (ns)        5.091
Max Clock-To-Out (ns)        9.026

Example2: Arithmetic Shift Left/Arithmetic Shift Right

Figure 17 shows input A shifted left by 5 bits, with the result stored in P[17:0]. P[35:18] holds input A shifted right by 13 bits (18-K), with the sign bit extended.

[Figure 17 block diagram: mathblock multiplier with
A[17:0] = "111100110010101101" {3CCAD}
B[17:0] = "000000000000100000" {2^5}
P[35:18] = "111111111111111110" {Shift right by 13 bits (18-5)}
P[17:0] = "011001010110100000" {Shift left by 5 bits}]

Figure 17 • Arithmetic Shift Left/Right using Mathblock

Design Files

For the implementation code, refer to the ASLL_ASRL.vhd design file. Download the design files from:
<download_folder>\DSP_Reference_Guide_DF\ASLL_ASRL_18bit\hdl\ASLL_ASRL.vhd

Synthesis and Place-and-Route Results

Report 3 shows the resources utilized by the m2s050fbga896-1 device for the Arithmetic Shift Left logic 18-bit implementation.
Type              Used   Total   Percentage
COMB              37     56340   0.07
SEQ               53     56340   0.09
IO (W/ clocks)    55     375     14.67
Differential IO   0      187     0.00
GLOBAL            2      16      12.50
RGB               2      1088    0.18
RAM64x18          0      72      0.00
RAM1K18           0      69      0.00
MACC              1      72      1.39
RCOSC_25_50MHZ    0      1       0.00
RCOSC_1MHZ        0      1       0.00
XTLOSC            0      1       0.00
CCC               0      6       0.00
MSS               0      1       0.00
FDDR              0      1       0.00
SYSCTRL           0      1       0.00

Report 3 • Resource Utilization for 18-bit Arithmetic Shift Left/Right Logic

Table 7 shows the timing summary for the Arithmetic Shift Left logic 18-bit implemented on the m2s050fbga896-1 device.

Table 7 • Timing Summary for 18-bit Arithmetic Shift Left/Right Logic

Clock Domain                 CLK
Period (ns)                  2.189
Frequency (MHz)              456.830
Required Period (ns)         5.000
Required Frequency (MHz)     200.000
External Setup (ns)          0.931
External Hold (ns)           0.830
Min Clock-To-Out (ns)        5.221
Max Clock-To-Out (ns)        9.341

Wide-Multiplier

Wide-multipliers are extensively used in high-precision wireless and medical applications where operands wider than 18×18 bits are used. These applications require high precision at every stage when implementing complex arithmetic functions used in the fast Fourier transform (FFT), filters, and so on. Military and high-performance computing applications also have demanding performance and precision requirements, and sometimes require single-precision and double-precision floating-point calculations for implementing complex matrix operations and signal transforms.

To implement DSP functions that require high precision, the SmartFusion2/IGLOO2 device supports wide-multipliers (that is, operand widths greater than 18×18) built from the SmartFusion2/IGLOO2 mathblock. The wide-multipliers are implemented by cascading multiple SmartFusion2/IGLOO2 mathblocks, using CDOUT and CDIN to propagate the result and to achieve the best performance.

This section describes wide-multiplier guidelines and different implementation methods, with design examples, to achieve the best performance results.
Guidelines

Following are some important recommendations for implementing a wide-multiplier to achieve the best results:
• Register the I/Os with the same clock.
• Add pipeline stages in the RTL so that the synthesis tool can automatically infer the mathblock registers, or register the mathblock I/Os if arithmetic cores (mathblock) are used.
• Connect the CDOUT of one mathblock to the CDIN of the next mathblock.

Design Examples

This section explains the 32×32 multiplier implementation with multiple mathblocks and with a single mathblock. It also shows the performance results for both implementations, using the following design examples:
• Example1: Multiplier 32×32 Implementation Using Multiple Mathblocks
• Example 2: 32×32 Multiplier Implementation Using Single Mathblock

Example1: Multiplier 32×32 Implementation Using Multiple Mathblocks

The following section explains the 32×32 multiplier implementation with multiple mathblocks and shows the performance results. The 32×32 multiplier is implemented using the following algorithm:

A = (AH × 2^17) + AL
B = (BH × 2^17) + BL
A×B = (AH × 2^17 + AL) × (BH × 2^17 + BL)
    = ((AH×BH) × 2^34) + ((AH×BL + AL×BH) × 2^17) + AL×BL

The 32×32 multiplier is implemented efficiently using four mathblocks, without using fabric resources, to produce a 64-bit result as shown in Figure 18 and Figure 19. To achieve the best performance results, use the mathblock I/O registers.
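The decomposition above can be checked with a small behavioral model. The following Python sketch is illustrative only and is not taken from the design files; the function names (split17, fits_signed18, mult32x32) are hypothetical. It assumes signed 32-bit operands, takes AL and BL as the 17 LSBs (always non-negative) and AH and BH as the remaining sign-carrying bits, and confirms that every partial product fits an 18×18 signed multiplier.

```python
def split17(x):
    """Split a signed 32-bit integer so that x == high * 2**17 + low."""
    low = x & 0x1FFFF   # 17 LSBs, always non-negative
    high = x >> 17      # arithmetic shift keeps the sign (15 significant bits)
    return high, low


def fits_signed18(v):
    """True if v is representable as an 18-bit two's complement number."""
    return -(1 << 17) <= v < (1 << 17)


def mult32x32(a, b):
    """Model of A*B built from four 18x18 partial products."""
    ah, al = split17(a)
    bh, bl = split17(b)
    # (operand, operand, bit offset of the partial product)
    partial_products = [(al, bl, 0), (ah, bl, 17), (al, bh, 17), (ah, bh, 34)]
    total = 0
    for x, y, offset in partial_products:
        assert fits_signed18(x) and fits_signed18(y)
        total += (x * y) << offset
    return total


if __name__ == "__main__":
    for a, b in [(123456789, -987654321), (-2**31, 2**31 - 1), (-1, -1)]:
        print(a, b, mult32x32(a, b) == a * b)
```

In hardware the three additions are performed by the mathblock adders along the cascade chain rather than by a software loop; the model only verifies the arithmetic identity.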
[Figure 18 diagram: A[31:0] × B[31:0] is split into AH, AL, BH, and BL. The four partial products AL×BL, AH×BL, AL×BH, and AH×BH are each computed in a mathblock, sign extended, placed at their bit offsets, and summed to form the product P.]

Figure 18 • 32x32 Multiplication

[Figure 19 block diagram: four SF2/IGL2 MACC blocks multiply AL×BL, AL×BH, AH×BL, and AH×BH. The partial sums are passed from block to block, with a >>17 shift where required, to form the 64-bit product.]

Figure 19 • Implementation of 32x32 Multiplier using Mathblock

When implementing in HDL, pipeline stages are added at the input and output so that the synthesis tool infers the mathblock I/O registers and the design achieves maximum throughput. In this design, two pipeline stages are added at the input and at the output. Refer to the design files for information on the implementation of the 32x32 multiplier.

Design Files

For the implementation code, refer to the Mult32x32_multipleMACC.vhd design file. Download the design files from:
<download_folder>\DSP_Reference_Guide_DF\Mult32x32_multiplieMACC\hdl\Mult32x32_multipleMACC.vhd

Hardware Configuration

For 32×32 multiplier using a single mathblock, the mathblock is configured to function as:

Normal Multiplier Accumulator -> Pn = Pn-1 + CARRYIN + C +/- A0×B0
Normal Multiplier Addsub -> Pn = D + CARRYIN + C +/- A0×B0 (if ARSHFT is disabled)
                         -> Pn = (D>>17) + CARRYIN + C +/- A0×B0 (if ARSHFT is enabled)
Normal Multiplier -> P = A0×B0

Resource Utilization and Timing Summary

Report 4 shows the 32×32 multiplier resource utilization when using multiple mathblocks, that is, the resources utilized by the m2s050fbga896std device for the Mult32x32_multipleMACC implementation.
Type              Used   Total   Percentage
COMB              145    56340   0.26
SEQ               290    56340   0.51
IO (W/ clocks)    130    375     34.67
Differential IO   0      187     0.00
GLOBAL            2      16      12.50
RGB               2      1088    0.18
RAM64x18          0      72      0.00
RAM1K18           0      69      0.00
MACC              4      72      5.56
RCOSC_25_50MHZ    0      1       0.00
RCOSC_1MHZ        0      1       0.00
XTLOSC            0      1       0.00
CCC               0      6       0.00
SERDESIF          0      2       0.00
MSS               0      1       0.00
FDDR              0      1       0.00
SYSCTRL           0      1       0.00

Report 4 • Resource Utilization for Multiple Mathblocks

Table 8 shows the timing summary for the Mult32x32_multipleMACC implemented on the m2s050fbga896std device.

Table 8 • Timing Summary for 32×32 With Multiple Mathblock

Clock Domain                 clk
Period (ns)                  2.641
Frequency (MHz)              378.644
Required Period (ns)         2.857
Required Frequency (MHz)     350.018
External Setup (ns)          3.981
External Hold (ns)           0.782
Min Clock-To-Out (ns)        4.782
Max Clock-To-Out (ns)        10.858

Example 2: 32×32 Multiplier Implementation Using Single Mathblock

This section explains the 32×32 multiplier implementation with a single mathblock and also shows the performance results. The 32×32 multiplier is implemented using the same algorithm:

A×B = ((AH×BH) × 2^34) + ((AH×BL + AL×BH) × 2^17) + AL×BL
    = ((AH×BH) × 2^34) + (AH×BL × 2^17) + (AL×BH × 2^17) + AL×BL

In this implementation, the four multiplications are computed sequentially using a single mathblock. The control finite-state machine (FSM) in the design provides the inputs for the four partial products shown in Figure 18 to the mathblock in four consecutive states and enables the shift operation in the appropriate states. The mathblock used in this design is configured as a normal multiplier accumulator, available in the Arithmetic IP core. Refer to the Hard Multiplier Accumulator User Guide for configuration.

The time taken to generate the output = 4 clock cycles for providing inputs + 2 clock cycles because the inputs and output are registered + 2 clock cycles for the mathblock input and output registers = 8 clock cycles.
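The sequential scheme can be modeled in a few lines. The following Python sketch is a behavioral illustration under stated assumptions, not the FSM from the design files: the state order (AL×BL, AH×BL, AL×BH, AH×BH), the states in which the accumulator is shifted right by 17, and the function name mult32x32_sequential are all choices made for this example. Each time the accumulator is shifted, the 17 bits that fall off are captured as final result bits.

```python
MASK17 = (1 << 17) - 1


def mult32x32_sequential(a, b):
    """Model a 32x32 signed multiply on one multiplier-accumulator.

    The accumulator p plays the role of the mathblock output P. In the
    states flagged with shift=True, p is first shifted right by 17 bits
    (as with ARSHFT enabled) and the bits shifted out are kept as
    result bits.
    """
    ah, al = a >> 17, a & MASK17
    bh, bl = b >> 17, b & MASK17
    # (operand A, operand B, shift accumulator before adding?)
    states = [(al, bl, False), (ah, bl, True), (al, bh, False), (ah, bh, True)]
    p = 0
    low_words = []
    for x, y, shift in states:
        if shift:
            low_words.append(p & MASK17)  # 17 result bits leave the accumulator
            p >>= 17                      # arithmetic shift right by 17
        p += x * y
        assert -(1 << 43) <= p < (1 << 43)  # stays within a 44-bit accumulator
    # low_words[0] holds result bits [16:0], low_words[1] holds bits [33:17],
    # and p holds the remaining upper bits.
    return (p << 34) + (low_words[1] << 17) + low_words[0]


if __name__ == "__main__":
    for a, b in [(123456789, -987654321), (-2**31, 2**31 - 1), (-1, -1)]:
        print(a, b, mult32x32_sequential(a, b) == a * b)
```

The model uses four multiply steps, which corresponds to the four input cycles in the latency budget above; the remaining four cycles come from registers that a purely arithmetic model does not represent.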
[Figure 20 block diagram: a Control FSM selects the operand pairs (AL[17:0], BL[17:0]), (AH[17:0], BL[17:0]), (AL[17:0], BH[17:0]), and (AH[17:0], BH[17:0]) from A[31:0] and B[31:0] and applies them to ports A and B of a SmartFusion2/IGLOO2 MACC block. Signals shown: clk, reset_n, mul_en, Curr_State, ARSHFT, Zeros (C, D), P, Result, and mul_result_valid.]

Figure 20 • Multiplier 32×32 using Single Mathblock

Design Files

For the implementation code, refer to the Mult32x32_SingleMACC.vhd design file. Download the design files from:
<download_folder>\DSP_Reference_Guide_DF\Mult32x32\hdl\Mult32x32_SingleMACC.vhd

Resource Utilization and Timing Summary

Report 5 shows the resources utilized by the m2s050fbga896std device for the Mult32x32_SingleMACC implementation.

Type              Used   Total   Percentage
COMB              84     56340   0.15
SEQ               141    56340   0.25
IO (W/ clocks)    132    375     35.20
Differential IO   0      187     0.00
GLOBAL            2      16      12.50
RGB               2      1088    0.18
RAM64x18          0      72      0.00
RAM1K18           0      69      0.00
MACC              1      72      1.39
RCOSC_25_50MHZ    0      1       0.00
RCOSC_1MHZ        0      1       0.00
XTLOSC            0      1       0.00
CCC               0      6       0.00
SERDESIF          0      2       0.00
MSS               0      1       0.00
FDDR              0      1       0.00
SYSCTRL           0      1       0.00

Report 5 • Resource Utilization for 32×32 Multiplier using Single Mathblock

Table 9 shows the timing summary for the Mult32x32_SingleMACC implemented on the m2s050fbga896std device.

Table 9 • Timing Summary for 32×32 Multiplier with Single Mathblock

Clock Domain                 clk
Period (ns)                  2.641
Frequency (MHz)              378.644
Required Period (ns)         2.857
Required Frequency (MHz)     350.018
External Setup (ns)          1.075
External Hold (ns)           0.614
Min Clock-To-Out (ns)        4.679
Max Clock-To-Out (ns)        11.023

Extended Addition

The mathblock has a 3-input adder and supports accumulation of up to 44 bits. In some applications, such as floating-point multiplication, complex FFT, and filters, high-precision data has to be maintained at every stage. These DSP functions require addition wider than 44 bits (extended addition), which can be realized using the SmartFusion2/IGLOO2 mathblock (3-input adder) and fabric logic.
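The idea of splitting a wide addition between a 44-bit adder and a small amount of extra logic can be sketched as follows. This Python model is a simplified illustration and makes its own assumptions: it uses an ordinary unsigned carry out of bit 43 and an explicit sign extension of the narrower operand, and it does not model the encoding of the mathblock CARRYOUT pin or the S term used in the equations of this section. The names extended_add, to_signed, and LOW_BITS are hypothetical.

```python
LOW_BITS = 44                     # width of the mathblock adder output
LOW_MASK = (1 << LOW_BITS) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def extended_add(c, v, total_bits=52):
    """Add a wide signed operand c and a narrower signed operand v.

    c is total_bits wide; v must fit in LOW_BITS bits. The lower 44 bits
    are added as a 44-bit adder would add them; the upper bits are formed
    separately from the upper bits of c, the sign extension of v, and the
    carry out of the lower addition.
    """
    upper_bits = total_bits - LOW_BITS
    upper_mask = (1 << upper_bits) - 1

    # Lower part: 44-bit addition with carry out.
    low_sum = (c & LOW_MASK) + (v & LOW_MASK)
    sum_lower = low_sum & LOW_MASK
    carry = low_sum >> LOW_BITS

    # Upper part: small adder outside the 44-bit datapath.
    c_upper = (c >> LOW_BITS) & upper_mask
    v_sign_ext = upper_mask if v < 0 else 0
    sum_upper = (c_upper + v_sign_ext + carry) & upper_mask

    return to_signed((sum_upper << LOW_BITS) | sum_lower, total_bits)


if __name__ == "__main__":
    print(extended_add(2**50, -5) == 2**50 - 5)
    print(extended_add(2**44 - 1, 1) == 2**44)
```

Only the upper 8 bits of the 52-bit sum need an adder outside the 44-bit datapath, which is why the fabric cost of this approach stays small.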
The extended addition is implemented by dividing the addition into two parts. The lower part (LSBs) of the addition is implemented using the SmartFusion2/IGLOO2 mathblock and the upper part (MSBs) is implemented using minimal fabric adder logic.

For a 2-input addition, the inputs can be chosen from the following:
• CDIN and C input, or
• Multiplier output and CDIN, or
• Multiplier output and C input

For a 3-input addition, the inputs are the multiplier output, CDIN, and the C input. To perform arithmetic additions, the SmartFusion2/IGLOO2 mathblock provides the Carryin and Carryout signals for propagating the carry from one mathblock to another mathblock or from a mathblock to fabric logic. The mathblock is configured in Normal mode to function as a normal multiplier addsub.

Design Examples

This section shows the extended addition using the following design examples:
• Example 1: 2-Input Signed Extended Addition
• Example 2: 3-Input Signed Extended Addition
• Example 3: 4-Input 42-Bit Adder
• Example 4: 88-Bit Adder/Subtractor

Example 1: 2-Input Signed Extended Addition

This section describes a 2-input extended signed addition in which one operand is wider than 44 bits. It also shows the fabric logic that implements the 2-input extended signed addition together with the multiplier adder.

2-Input Addition

To compute the 2-input extended signed addition Z = U + V, with one operand wider than the mathblock output width of 44, the logic shown in Figure 22 should be implemented in fabric.

  U(m-1) U(m-2) ... U(n+2) U(n+1) U(n)   U(n-1) U(n-2) ... U(0)
+ V(n-1) V(n-1) ... V(n-1) V(n-1) V(n-1) V(n-1) V(n-2) ... V(0)
  Z(m-1) Z(m-2) ... Z(n+2) Z(n+1) Z(n)   Z(n-1) Z(n-2) ... Z(0)

Figure 21 • 2-input Extended Signed Addition

U denotes an m-bit value (where m > 44) and V is a sign-extended n-bit value (where n ≤ 44). The 2-input extended signed addition is divided into two parts. The lower part (Sumlower) is computed in the mathblock and the upper part (Sumupper) is computed in the fabric.
Z = {Sumupper, Sumlower}        EQ 4

The lower part of the sum, Z = U + V, is calculated by providing the U[(n-1):0] and V[(n-1):0] inputs to the mathblock, where n = 44 is the output width of the mathblock.

Sumlower = U[(n-1):0] + V[(n-1):0]        EQ 5

The upper part of the sum, Z = U + V, is calculated as follows:

Sumupper = U[m:n] + V[m:n] (where U[m:n] and V[m:n] are the MSBs)        EQ 6

V[m:n] = {S, S, ..., S, X}
S = P[n-1] AND X

Where:
P[n-1] is the MSB of Sumlower.
X is the carryout of Sumlower (from the mathblock).
S is the sign bit of the adder; (m-n-1) copies of S are placed in the MSBs of V[m:n].

Hardware Implementation

Figure 22 shows the implementation of the 2-input extended signed addition with a 52-bit C operand. For 2-input addition, the mathblock is configured as a multiplier addsub in Normal mode. For the 52-bit, 2-input extended signed addition, the upper and lower parts of the sum are:

Sumlower = C[43:0] + A[17:0]×B[17:0]
Sumupper = C[51:44] + {S, S, S, CARRYOUT}
Z[51:0] = {Sumupper, Sumlower}
Z[51:0] = {C[51:44] + {S, S, S, CARRYOUT}, P[43:0]}

Where S = P[43] AND CARRYOUT

[Figure 22 block diagram: an SF2 MACC block with inputs A, B, and C produces P and CARRYOUT. Fabric logic for the 2-input adder forms X and S from P and CARRYOUT, builds {S, ..., S, X}, and adds it to the upper bits of C to produce the upper bits of Z.]

Figure 22 • Fabric Logic for 2-input Extended Addition using Mathblock

Design Files

For the implementation code, refer to the Extended_adder_2_input.vhd design file. Download the design files from:
<download_folder>\DSP_Reference_Guide_DF\Extended_adder_2_input\hdl\Extended_adder_2_input.vhd

Resource Utilization and Timing Summary

Report 6 shows the resources utilized by the m2s050fbga896std device for the Extended_adder_2_input implementation.
Type              Used   Total   Percentage
COMB              45     56340   0.08
SEQ               88     56340   0.16
IO (W/ clocks)    142    375     37.87
Differential IO   0      187     0.00
GLOBAL            2      16      12.50
RGB               2      1088    0.18
RAM64x18          0      72      0.00
RAM1K18           0      69      0.00
MACC              1      72      1.39
RCOSC_25_50MHZ    0      1       0.00
RCOSC_1MHZ        0      1       0.00
XTLOSC            0      1       0.00
CCC               0      6       0.00
SERDESIF          0      2       0.00
MSS               0      1       0.00
FDDR              0      1       0.00
SYSCTRL           0      1       0.00

Report 6 • Resource Utilization for 2-input Extended Addition with Fabric Resources

Table 10 shows the timing summary for Extended_adder_2_input implemented on the m2s050fbga896std device.

Table 10 • Timing Summary for 2-input Extended Addition with Fabric Resources

Clock Domain                 clk
Period (ns)                  2.641
Frequency (MHz)              378.644
Required Period (ns)         3.333
Required Frequency (MHz)     300.030
External Setup (ns)          2.567
External Hold (ns)           0.595
Min Clock-To-Out (ns)        5.017
Max Clock-To-Out (ns)        10.947

Example 2: 3-Input Signed Extended Addition

This section explains the 3-input extended signed addition in which one or more operands are wider than 44 bits. It shows the 3-input extended signed addition implementation logic with fabric resources.

3-Input Extended Addition

To perform the 3-input extended addition Z = T + U + V, with two operands wider than the mathblock input width of 44, the logic shown in Figure 24 should be implemented in fabric.

  T(m-1) T(m-2) ... T(n+2) T(n+1) T(n)   T(n-1) T(n-2) ... T(0)
  U(m-1) U(m-2) ... U(n+2) U(n+1) U(n)   U(n-1) U(n-2) ... U(0)
+ V(n-1) V(n-1) ... V(n-1) V(n-1) V(n-1) V(n-1) V(n-2) ... V(0)
  Z(m-1) Z(m-2) ... Z(n+2) Z(n+1) Z(n)   Z(n-1) Z(n-2) ... Z(0)

Figure 23 • 3-input Extended Signed Addition

T and U are m-bit values (where m > 44) and V is a sign-extended n-bit value (where n ≤ 44). The 3-input extended signed addition is divided into two parts. The lower part (Sumlower) is computed in the mathblock and the upper part (Sumupper) is computed in the fabric.
Z = {Sumupper, Sumlower}        EQ 7

The lower part of the sum, Z = T + U + V, is calculated by providing the {'0', T[(n-2):0]}, {'0', U[(n-2):0]}, and V[(n-1):0] inputs to the mathblock, where n = 44 is the output width of the mathblock.

Sumlower = {'0', T[(n-2):0]} + {'0', U[(n-2):0]} + V[(n-1):0]        EQ 8

The upper part of the sum, Z = T + U + V, is calculated as follows:

Sumupper = T[m:n-1] + U[m:n-1] + V[m:n]        EQ 9

Where T[m:n], U[m:n], and V[m:n] are the MSBs, and:

V[m:n] = {S, S, ..., S, X, P[n-1]}
S = P[n-1] AND X

Where:
P[n-1] is the MSB of Sumlower.
X is the overflow of Sumlower (from the mathblock).
(m-n-2) copies of S are placed in the MSBs of V[m:n].

Hardware Implementation

Figure 24 shows the implementation of the 3-input extended signed addition with 52-bit C and D operands. For 3-input addition, the mathblock is configured as a multiplier addsub in Normal mode. For the 52-bit, 3-input extended signed addition, Sumlower and Sumupper are calculated as follows:

Sumlower = P[43:0] = {'0', C[42:0]} + {'0', D[42:0]} + A[17:0]×B[17:0]
Sumupper = C[51:43] + D[51:43] + {S, S, S, S, S, S, S, CARRYOUT, P[43]}
Z[51:0] = {Sumupper, P[42:0]}
Z[51:0] = {C[51:43] + D[51:43] + {S, S, S, S, S, S, S, CARRYOUT, P[43]}, P[42:0]}

Where S = P[43] AND CARRYOUT

[Figure 24 block diagram: an SF2 MACC block with inputs A, B, C, and D produces P and CARRYOUT. Fabric logic for the 3-input adder forms X and S, builds {S, ..., S, X, P[43]}, and adds it to the upper bits of C and D to produce the upper bits of Z.]

Figure 24 • 3-input Extended Addition using Mathblock and Fabric Logic

Design Files

For the implementation code, refer to the Extended_adder_3_input.vhd design files.
Download the design files from:
<download_folder>\DSP_Reference_Guide_DF\Extended_adder_3_input\hdl\Extended_adder_3_input.vhd

Resource Utilization and Timing Summary

Report 7 shows the resources utilized by the m2s050fbga896std device for the Extended_adder_3_input implementation.

Type              Used   Total   Percentage
COMB              92     56340   0.16
SEQ               120    56340   0.21
IO (W/ clocks)    194    375     51.73
Differential IO   0      187     0.00
GLOBAL            2      16      12.50
RGB               2      1088    0.18
RAM64x18          0      72      0.00
RAM1K18           0      69      0.00
MACC              2      72      2.78
RCOSC_25_50MHZ    0      1       0.00
RCOSC_1MHZ        0      1       0.00
XTLOSC            0      1       0.00
CCC               0      6       0.00
SERDESIF          0      2       0.00
MSS               0      1       0.00
FDDR              0      1       0.00
SYSCTRL           0      1       0.00

Report 7 • Resource Utilization for 3-input Extended Addition with Fabric Resources

Example 3: 4-Input 42-Bit Adder

Two SmartFusion2/IGLOO2 mathblocks can be used to build a 4-input adder, as shown in Figure 25. The first mathblock adder acts as a 2-input adder (that is, the 36-bit multiplier output and the 44-bit C input), and its output is connected to the CDIN of the second mathblock. The second mathblock adder functions as a 3-input adder (that is, the 36-bit multiplier output, the 44-bit CDIN, and the 44-bit C input). The sum obtained is a 44-bit adder result. In this case, the CARRYOUT is unused. For proper pipeline balancing at the inputs, external registers are added to the inputs of the second mathblock. Figure 25 shows the 4-input 42-bit adder.

[Figure 25 block diagram: two cascaded MultAddSub mathblocks. Each block has A, B, and C inputs; the first block's result feeds the CDIN of the second block, which produces P. Signals shown: A, B, C, D, E, F, CDIN, and P.]

Figure 25 • 4-Input 42-Bit Adder using Mathblocks

Design Files

For the implementation code, refer to the Fourinput_42bit_Adder.vhd design file. Download the design files from:
<download_folder>\DSP_Reference_Guide_DF\Fourinput_42bit_Adder\hdl\Fourinput_42bit_Adder.vhd

Resource Utilization and Timing Summary

Report 8 shows the resources utilized by the m2s050fbga896std device for the Fourinput_42bit_Adder implementation.
Type              Used   Total   Percentage
COMB              73     56340   0.13
SEQ               152    56340   0.27
IO (W/ clocks)    251    375     66.93
Differential IO   0      187     0.00
GLOBAL            2      16      12.50
RGB               2      1088    0.18
RAM64x18          0      72      0.00
RAM1K18           0      69      0.00
MACC              2      72      2.78
RCOSC_25_50MHZ    0      1       0.00
RCOSC_1MHZ        0      1       0.00
XTLOSC            0      1       0.00
CCC               0      6       0.00
MSS               0      1       0.00
FDDR              0      1       0.00
SYSCTRL           0      1       0.00

Report 8 • Resource Utilization for 4-Input 42-Bit Adder

Table 11 shows the timing summary for Fourinput_42bit_Adder implemented on the m2s050fbga896-1 device.

Table 11 • Timing Summary for 4-Input 42-Bit Adder

Clock Domain                 CLK
Period (ns)                  2.294
Frequency (MHz)              435.920
Required Period (ns)         5.000
Required Frequency (MHz)     200.000
External Setup (ns)          1.974
External Hold (ns)           0.777
Min Clock-To-Out (ns)        5.269
Max Clock-To-Out (ns)        10.102

Example 4: 88-Bit Adder/Subtractor

The SmartFusion2/IGLOO2 mathblocks can be cascaded together to implement a large add/subtract function. Figure 26 shows the implementation of an 88-bit adder/subtractor, or 88-bit extended addition, performed using two mathblocks. The LSB addition is performed using the MultAddSub0 mathblock and the MSB addition is performed using the MultAddSub1 mathblock. The CARRYOUT signal of the MultAddSub0 mathblock is cascaded to the CARRYIN of the MultAddSub1 mathblock.

[Figure 26 block diagram: two MultAddSub mathblocks, each with A, B, C, and SUB inputs, connected through CARRYOUT. Operand signals shown: P, Q, R, W, and X; the outputs form the two halves of Result.]

Figure 26 • 88-Bit Adder/Subtractor using Mathblocks

Design Files

For the implementation code, refer to the Adder_Sub_88bit.vhd design file. Download the design files from:
<download_folder>\DSP_Reference_Guide_DF\Adder_Sub_88bit\hdl\Adder_Sub_88bit.vhd

Resource Utilization and Timing Summary

Report 9 shows the resources utilized by the m2s050fbga896-1 device for the Adder_Sub_88bit implementation.
Type              Used   Total   Percentage
COMB              73     56340   0.13
SEQ               232    56340   0.41
IO (W/ clocks)    250    375     66.67
Differential IO   0      187     0.00
GLOBAL            2      16      12.50
RGB               2      1088    0.18
RAM64x18          0      72      0.00
RAM1K18           0      69      0.00
MACC              2      72      2.78
RCOSC_25_50MHZ    0      1       0.00
RCOSC_1MHZ        0      1       0.00
XTLOSC            0      1       0.00
CCC               0      6       0.00
MSS               0      1       0.00
FDDR              0      1       0.00
SYSCTRL           0      1       0.00

Report 9 • Resource Utilization for 88-bit Adder/Subtractor

Table 12 shows the timing summary for Adder_Sub_88bit implemented on the m2s050fbga896-1 device.

Table 12 • Timing Summary for 88-bit Adder/Subtractor

Clock Domain                 CLK
Period (ns)                  2.842
Frequency (MHz)              351.865
Required Period (ns)         5.000
Required Frequency (MHz)     200.000
External Setup (ns)          1.819
External Hold (ns)           0.770
Min Clock-To-Out (ns)        5.233
Max Clock-To-Out (ns)        9.914

44-Bit Counter

The counter is one of the most widely used functions in digital applications. The SmartFusion2/IGLOO2 mathblock can be used as a high-speed counter, at speeds of up to 350 MHz. Figure 27 shows an implementation of a 44-bit binary counter using the mathblock. Up counters, down counters, and loadable counters can be implemented using the SmartFusion2/IGLOO2 mathblock. Figure 28 shows the implementation of a count-by-M, 44-bit counter using the mathblock.

[Figure 27 block diagram: a mathblock with the A and B inputs tied to zeros, a Carryin input, and the count on output P.]

Figure 27 • Binary Counter using Mathblock

[Figure 28 block diagram: a mathblock with the A and B inputs tied to zeros, the step value M applied to the C input, and the count on output P.]

Figure 28 • Count by M 44-Bit Counter using Mathblock

Design Files

For the implementation code, refer to the Counter_44bit.vhd design file. Download the design files from:
<download_folder>\DSP_Reference_Guide_DF\Counter_44bit\hdl\Counter_44bit.vhd

Synthesis and Place-and-Route Results

Report 10 shows the resources utilized by the m2s050fbga896-1 device for the Counter_44bit implementation.
Type              Used   Total   Percentage
COMB              83     56340   0.15
SEQ               89     56340   0.16
IO (W/ clocks)    93     375     24.80
Differential IO   0      187     0.00
GLOBAL            2      16      12.50
RGB               2      1088    0.18
RAM64x18          0      72      0.00
RAM1K18           0      69      0.00
MACC              1      72      1.39
RCOSC_25_50MHZ    0      1       0.00
RCOSC_1MHZ        0      1       0.00
XTLOSC            0      1       0.00
CCC               0      6       0.00
MSS               0      1       0.00
FDDR              0      1       0.00
SYSCTRL           0      1       0.00

Report 10 • Resource Utilization for 44-bit Counter

Table 13 shows the timing summary for Counter_44bit implemented on the m2s050fbga896-1 device.

Table 13 • Timing Summary for 44-bit Counter

Clock Domain                 CLK
Period (ns)                  3.030
Frequency (MHz)              330.033
Required Period (ns)         5.000
Required Frequency (MHz)     200.000
External Setup (ns)          1.804
External Hold (ns)           0.690
Min Clock-To-Out (ns)        5.119
Max Clock-To-Out (ns)        9.160

88-Bit Accumulator

Two SmartFusion2/IGLOO2 mathblocks can be cascaded together to implement an 88-bit accumulator. Here, the CARRYOUT signal is used to cascade the mathblocks. MultACC0 provides the lower 44 bits of the result and MultACC1 provides the upper 44 bits. The initial input value to the accumulator is the C input of the mathblock. Figure 29 shows the implementation of an 88-bit accumulator function.

Algorithm:

Result[87:0] = Result[87:0] + C[87:0]
             = Result[87:44] * 2^44 + Result[43:0] + C[87:44] * 2^44 + C[43:0]
             = (Result[87:44] + C[87:44]) * 2^44 + (C[43:0] + Result[43:0])        EQ 10
               {MultACC1}                          {MultACC0}

[Figure 29 block diagram: two MultACC mathblocks with the A and B inputs tied to zeros. Each block takes one half of C on its C input, the blocks are linked through CARRYOUT, and the outputs form the two halves of Result.]

Figure 29 • 88-Bit Accumulator using Mathblocks

Design Files

For the implementation code, refer to the Accumulator_88bit.vhd design files.
Download the design files from:
<download_folder>\DSP_Reference_Guide_DF\Accumulator_88bit\hdl\Accumulator_88bit.vhd

Synthesis and Place-and-Route Results

Report 11 shows the resources utilized by the m2s050fbga896-1 device for the Accumulator_88bit implementation.

Type              Used   Total   Percentage
COMB              73     56340   0.13
SEQ               248    56340   0.44
IO (W/ clocks)    178    375     47.47
Differential IO   0      187     0.00
GLOBAL            2      16      12.50
RGB               2      1088    0.18
RAM64x18          0      72      0.00
RAM1K18           0      69      0.00
MACC              2      72      2.78
RCOSC_25_50MHZ    0      1       0.00
RCOSC_1MHZ        0      1       0.00
XTLOSC            0      1       0.00
CCC               0      6       0.00
MSS               0      1       0.00
FDDR              0      1       0.00
SYSCTRL           0      1       0.00

Report 11 • Resource Utilization for 88-bit Accumulator

Table 14 shows the timing summary for Accumulator_88bit implemented on the m2s050fbga896-1 device.

Table 14 • Timing Summary for 88-bit Accumulator

Clock Domain                 CLK
Period (ns)                  2.750
Frequency (MHz)              363.636
Required Period (ns)         5.000
Required Frequency (MHz)     200.000
External Setup (ns)          1.109
External Hold (ns)           0.751
Min Clock-To-Out (ns)        4.729
Max Clock-To-Out (ns)        10.041

Filter Applications

A wide variety of filter architectures can be implemented using the Microsemi FPGAs. The SmartFusion2/IGLOO2 mathblock architecture is simple to use and easy to adapt for building finite impulse response (FIR) filters that match the requirements of the application. A FIR filter is a convolution of an input signal and an impulse response, as shown in EQ 11.

Yn = Σ (k = 0 to N-1) X(n-k) * h(k)        EQ 11

Where:
X(n-k) represents the input signal.
h(k) represents the impulse response, or coefficients.
N is the filter length, or the filter order.

In EQ 11, a set of N coefficients is multiplied by the respective N data samples and the products are summed together to form an individual result. The coefficients determine the characteristics of the filter (for example, low-pass filter, band-pass filter, high-pass filter).
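EQ 11 can be written directly as a short reference model. The following Python sketch is for illustration only; the function name fir and the convention that samples before the start of the input are treated as zero are assumptions of this example.

```python
def fir(samples, coeffs):
    """Direct evaluation of Yn = sum over k of X(n-k) * h(k), k = 0..N-1.

    Samples before the start of the input are treated as zero.
    """
    n_taps = len(coeffs)
    output = []
    for n in range(len(samples)):
        acc = 0
        for k in range(n_taps):
            if n - k >= 0:
                acc += samples[n - k] * coeffs[k]
        output.append(acc)
    return output


if __name__ == "__main__":
    # A unit impulse at the input reproduces the coefficients at the output.
    print(fir([1, 0, 0, 0], [1, 2, 3]))
    print(fir([1, 1, 1, 1], [1, 2, 3]))
```

Feeding a unit impulse returns the coefficient set itself, which is a convenient first check for any of the hardware architectures described in this section.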
The FIR equation can be implemented using different architectures (sequential, parallel, and semi-parallel). This section describes the architecture of the following FIR filters and their implementation using the SmartFusion2/IGLOO2 mathblock:
• MAC FIR Filters
• Parallel FIR Filters
• Semi-Parallel FIR Filters

MAC FIR Filters

The MAC FIR is a DSP filter structure that uses a single multiplier with an accumulator to implement a sequential FIR filter. The MAC FIR filter can be implemented using a single SmartFusion2/IGLOO2 mathblock as a multiplier-accumulator. This architecture is suitable for applications with slow sample rates and many coefficients. Following are the two MAC FIR filter architectures and their implementation using the SmartFusion2/IGLOO2 mathblock:
• Single MAC FIR Filter
• Symmetric MAC FIR Filter

Single MAC FIR Filter

The single-MAC FIR filter is useful when the ratio of clock rate to sample rate is greater than or equal to the filter order/filter length [(fclk/fsample) >= Filter order]. Figure 30 shows the general form of a multiply-accumulate (MAC) based FIR filter structure utilizing a single MAC engine.

[Figure 30 block diagram: an input memory of delayed samples (Xn through a chain of Z delays) supplies Xi, and a coefficient memory (circular buffer of N entries, C0 to Cn) supplies Ci, to a single MAC that produces Yn.]

Figure 30 • N-Tap Single MAC FIR Structure

Figure 31 shows the implementation of an N-tap single MAC FIR filter using a mathblock, with uSRAM used for coefficient and input data storage. The control logic generates the control signals required to perform the filter operations. Thus, the single-MAC FIR filter architecture reduces the math resources by a factor equal to the number of filter taps.
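The sequential structure can be sketched as a small behavioral model: one multiplier-accumulator visits every tap in turn for each new input sample. The Python code below is an illustration, not the MAC_FIR design; the class name SingleMacFir, the mac_cycles counter, and the helper can_use_single_mac are hypothetical, and memory and control latency are not modeled.

```python
class SingleMacFir:
    """Behavioral model of an N-tap FIR computed on one MAC unit."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)            # coefficient memory
        self.delay = [0] * len(self.coeffs)   # input sample memory
        self.mac_cycles = 0                   # multiply-accumulate steps used

    def push(self, sample):
        """Store a new sample and compute one output with N MAC steps."""
        self.delay = [sample] + self.delay[:-1]
        acc = 0                               # accumulator cleared per output
        for x, c in zip(self.delay, self.coeffs):
            acc += x * c                      # one MAC step per tap
            self.mac_cycles += 1
        return acc


def can_use_single_mac(clock_hz, sample_hz, n_taps):
    """Check the condition fclk / fsample >= filter length."""
    return clock_hz / sample_hz >= n_taps


if __name__ == "__main__":
    filt = SingleMacFir([1, 2, 3])
    print([filt.push(x) for x in [1, 0, 0, 0]])
    print(filt.mac_cycles)
    print(can_use_single_mac(200e6, 1e6, 100))
```

The cycle counter makes the trade-off visible: each output costs one MAC step per tap, so the single multiplier is reused N times instead of instantiating N multipliers.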
[Figure 31: block diagram showing the control logic, coefficient memory (uSRAM), input memory (uSRAM), and a mathblock in multiply-accumulate mode. Signals: clk, reset_n, Xin, Coef_addr, Coef, Data_addr, Data, Load_sig, Yn_out.]
Figure 31 • N-Tap Single MAC FIR Filter using Mathblock

Number of clock cycles required to compute one output of an N-tap single MAC FIR filter = N + 1
Equivalently, maximum input sample rate = Clock speed / (N + 1)
For example, for a 100-tap MAC FIR filter, maximum input sample rate = Clock speed / (100 + 1).
Therefore, the maximum input sample rate of a single MAC FIR filter is inversely proportional to the number of coefficients in the filter.

Design Files
For the implementation code, refer to the MAC_FIR.vhd design file. Download the design files from:
<download_folder>\DSP_Reference_Guide_DF\MAC FIR 16-tap\MAC_FIR\hdl\MAC_FIR.vhd

Resource Utilization and Timing Summary
Report 12 shows the resources utilized by the m2s050fbga896-1 device for the MAC_FIR implementation.

Type               Used   Total   Percentage
COMB                362   56340    0.64
SEQ                 396   56340    0.70
IO (W/ clocks)       67     375   17.87
Differential IO       0     187    0.00
GLOBAL                2      16   12.50
RGB                   2    1088    0.18
RAM64x18              4      72    5.56
RAM1K18               0      69    0.00
MACC                  1      72    1.39
RCOSC_25_50MHZ        0       1    0.00
RCOSC_1MHZ            0       1    0.00
XTLOSC                0       1    0.00
CCC                   0       6    0.00
MSS                   0       1    0.00
FDDR                  0       1    0.00
SYSCTRL               0       1    0.00
Report 12 • MAC FIR

Table 15 shows the timing summary for MAC_FIR implemented on the m2s050fbga896-1 device.

Table 15 • MAC FIR Timing Summary
Clock Domain               clk
Period (ns)                4.000
Frequency (MHz)            250.000
Required Period (ns)       5.000
Required Frequency (MHz)   200.000
External Setup (ns)        1.569
External Hold (ns)         0.879
Min Clock-To-Out (ns)      4.633
Max Clock-To-Out (ns)      8.240

Symmetric MAC FIR Filter
Figure 32 shows the general form of a MAC based symmetric FIR filter structure that uses a single MAC engine.
[Figure 32: block diagram showing the input memory, a pre-adder, the coefficient memory (circular buffer), and a single MAC unit producing Yn.]
Figure 32 • Symmetric MAC FIR Filter using Mathblock

For an N-tap symmetric FIR filter implementation, an extra adder sums the pairs of samples that share a symmetric coefficient; the remaining logic is the same as in a single MAC FIR filter, as shown in Figure 33. In addition, only one uSRAM is required for coefficient memory instead of the two used in the single MAC FIR filter, which saves memory resources.

$$Y_n = X_0 C_0 + X_1 C_1 + \dots + X_{n-1} C_{n-1} + X_n C_n$$

For a symmetric filter, $C_0 = C_n$, $C_1 = C_{n-1}$, $C_2 = C_{n-2}$, and so on, so that:

$$Y_n = (X_0 + X_n) C_0 + (X_1 + X_{n-1}) C_1 + \dots + (X_m + X_{n-m}) C_m + \dots$$

[Figure 33: block diagram showing the control logic, input memory (uSRAM), coefficient memory (uSRAM), and a mathblock in multiply-accumulate mode. Signals: clk, reset_n, Xin, Data_addr, Coef_addr, sel, Load_sig, Yn_out.]
Figure 33 • Symmetric MAC FIR using Mathblock

Number of clock cycles required to compute one output of an N-tap symmetric MAC FIR filter = (N/2) + 1
Equivalently, maximum input sample rate = Clock speed / ((N/2) + 1)
For example, for a 100-tap symmetric MAC FIR filter, maximum input sample rate = Clock speed / (50 + 1).
Therefore, the input sample rate of a symmetric MAC FIR filter is approximately 2 × the input sample rate of a single MAC FIR filter.

Note: There is a limitation in using the symmetric MAC FIR filter. Due to the 1-bit growth from the pre-adder, the input data to the filter must be less than 18 bits wide to fit into a mathblock; that is, the maximum input data width is 17 bits for unsigned and 16 bits for signed. The pre-adder can also be implemented using another mathblock, which saves fabric resources and avoids the fabric adder limiting the maximum clock frequency.

Design Files
For the implementation code, refer to the Symmetric_MAC_FIR.vhd design file.
Download the design files from:
<download_folder>\DSP_Reference_Guide_DF\Symmetric_MAC_FIR_Filter\hdl\Symmetric_MAC_FIR.vhd

Resource Utilization and Timing Summary
Report 13 shows the resources utilized by the m2s050fbga896-1 device for the Symmetric_MAC_FIR implementation.

Type               Used   Total   Percentage
COMB                424   56340    0.75
SEQ                 373   56340    0.66
IO (W/ clocks)       67     375   17.87
Differential IO       0     187    0.00
GLOBAL                2      16   12.50
RGB                   2    1088    0.18
RAM64x18              3      72    4.17
RAM1K18               0      69    0.00
MACC                  1      72    1.39
RCOSC_25_50MHZ        0       1    0.00
RCOSC_1MHZ            0       1    0.00
XTLOSC                0       1    0.00
CCC                   0       6    0.00
MSS                   0       1    0.00
FDDR                  0       1    0.00
SYSCTRL               0       1    0.00
Report 13 • Symmetric MAC FIR

Table 16 shows the timing summary for Symmetric_MAC_FIR implemented on the m2s050fbga896-1 device.

Table 16 • Symmetric MAC FIR Timing Summary
Clock Domain               clk
Period (ns)                4.305
Frequency (MHz)            232.288
Required Period (ns)       5.000
Required Frequency (MHz)   200.000
External Setup (ns)        0.631
External Hold (ns)         0.861
Min Clock-To-Out (ns)      4.678
Max Clock-To-Out (ns)      9.008

Parallel FIR Filters
The parallel FIR filter uses N multipliers and N-1 adders, as shown in Figure 34. The parallel FIR filter can be realized using the SmartFusion2/IGLOO2 mathblocks and is well suited to applications with high sample rate requirements.

[Figure 34: direct form structure. Input Xn passes through a chain of unit delays; each tap is multiplied by a coefficient (C0 to CN-1) and the products are summed to produce Yn.]
Figure 34 • Direct Form FIR Filter Structure

The following parallel FIR filter architectures and their implementation using the SmartFusion2/IGLOO2 mathblock are described in this section:
• Transpose - Non-symmetry
• Transpose - Symmetry
• Systolic - Non-Symmetry
• Systolic - Symmetry

Transpose - Non-symmetry
The transpose FIR architecture is used in high performance filter applications. This architecture is derived from the Direct Form I structure, as shown in Figure 35. In the transpose architecture, the same input is fed to all the multipliers, which increases the fan-out at the input.
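To illustrate how the transpose structure broadcasts one input to every multiplier, the following Python sketch models the multiplier-adder chain behaviorally. It is an assumption-laden model, not Transpose_FIR.vhd: the function name fir_transpose is hypothetical, and pipeline registers inside the mathblocks (and the latency they add) are deliberately not modeled.

```python
def fir_transpose(x, h):
    """Behavioral model of a transposed FIR filter (illustrative only)."""
    n = len(h)
    regs = [0] * (n - 1)      # one register between adjacent multiplier-adder stages
    y = []
    for sample in x:
        # The same input sample drives every multiplier in the same cycle.
        products = [c * sample for c in h]
        y.append(products[0] + (regs[0] if regs else 0))
        # Each stage adds its product to the value arriving from the previous stage.
        for i in range(n - 1):
            upstream = regs[i + 1] if i + 1 < n - 1 else 0
            regs[i] = products[i + 1] + upstream
    return y


print(fir_transpose([1, 0, 0, 0], [3, -1, 2]))  # prints [3, -1, 2, 0]
```

The model produces the same output sequence as a direct evaluation of EQ 11; what changes in hardware is that the adder chain is registered between stages, so the critical path is one multiplier plus one adder regardless of N.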
For an N-tap transpose FIR filter, the total initial latency = (N – 1) clock cycles.

[Figure 35: transpose structure. Input Xn is fed to all multipliers (coefficients CN-1 to C0); the products are summed through a chain of adders and unit delays to produce Yn.]
Figure 35 • N-Tap Non-Symmetric Transpose FIR Filter Structure

In the transpose FIR filter, each multiplier-adder block is realized using one mathblock. Hence, an N-tap transpose FIR filter uses only N mathblocks. Figure 36 shows the implementation of the 16-tap non-symmetric transpose FIR filter using the SmartFusion2/IGLOO2 mathblocks.

Figure 36 • 16-Tap Non-Symmetric Transpose FIR Filter using Mathblock

Design Files
For the implementation code, refer to the Transpose_FIR.vhd design file. Download the design files from:
<download_folder>\DSP_Reference_Guide_DF\Transpose_FIR_w_macc\hdl\Transpose_FIR.vhd

Resource Utilization and Timing Summary
Report 14 shows the resources utilized by the m2s050fbga896-1 device for the Transpose_FIR implementation.

Type               Used   Total   Percentage
COMB                577   56340    1.02
SEQ                 594   56340    1.05
IO (W/ clocks)       64     375   17.07
Differential IO       0     187    0.00
GLOBAL                2      16   12.50
RGB                   2    1088    0.18
RAM64x18              0      72    0.00
RAM1K18               0      69    0.00
MACC                 16      72   22.22
RCOSC_25_50MHZ        0       1    0.00
RCOSC_1MHZ            0       1    0.00
XTLOSC                0       1    0.00
CCC                   0       6    0.00
MSS                   0       1    0.00
FDDR                  0       1    0.00
SYSCTRL               0       1    0.00
Report 14 • Transpose FIR Filter

Table 17 shows the timing summary for Transpose_FIR implemented on the m2s050fbga896-1 device.

Table 17 • Transpose FIR Filter Timing Summary
Clock Domain               clk
Period (ns)                3.635
Frequency (MHz)            275.103
Required Period (ns)       4.000
Required Frequency (MHz)   250.000
External Setup (ns)        0.066
External Hold (ns)         0.829
Min Clock-To-Out (ns)      4.652
Max Clock-To-Out (ns)      8.236

Transpose - Symmetry
The symmetric transpose architecture is the same as the transpose architecture, except that each coefficient is shared between the inputs of its symmetric taps. Figure 37 and Figure 38 show the odd-tap and even-tap symmetric transpose FIR filter architectures.
[Figure 37: odd-tap symmetric transpose structure with input Xn, shared coefficient multipliers, an adder and unit-delay chain, and output Yn.]
Figure 37 • Odd-Tap Symmetric Transpose FIR Filter Structure

[Figure 38: even-tap symmetric transpose structure with input Xn, shared coefficient multipliers, an adder and unit-delay chain, and output Yn.]
Figure 38 • Even-Tap Symmetric Transpose FIR Filter Structure

In the symmetric transpose FIR filter, each multiplier-adder block is realized using one mathblock. Hence, an N-tap symmetric transpose FIR filter uses only N mathblocks. Figure 39 shows the implementation of a 16-tap symmetric transpose FIR filter using the mathblocks.

Figure 39 • Symmetric Transpose FIR Filter using Mathblock

Design Files
For the implementation code, refer to the Transpose_Sym_FIR.vhd design file. Download the design files from:
<download_folder>\DSP_Reference_Guide_DF\Transpose_Sym_FIR\hdl\Transpose_Sym_FIR.vhd

Resource Utilization and Timing Summary
Report 15 shows the resources utilized by the m2s050fbga896-1 device for the Transpose_Sym_FIR implementation.

Type               Used   Total   Percentage
COMB                577   56340    1.02
SEQ                 594   56340    1.05
IO (W/ clocks)       64     375   17.07
Differential IO       0     187    0.00
GLOBAL                2      16   12.50
RGB                   2    1088    0.18
RAM64x18              0      72    0.00
RAM1K18               0      69    0.00
MACC                 16      72   22.22
RCOSC_25_50MHZ        0       1    0.00
RCOSC_1MHZ            0       1    0.00
XTLOSC                0       1    0.00
CCC                   0       6    0.00
MSS                   0       1    0.00
FDDR                  0       1    0.00
SYSCTRL               0       1    0.00
Report 15 • Symmetric Transpose FIR Filter

Table 18 shows the timing summary for Transpose_Sym_FIR implemented on the m2s050fbga896-1 device.

Table 18 • Symmetric Transpose FIR Filter Timing Summary
Clock Domain               clk
Period (ns)                3.635
Frequency (MHz)            275.103
Required Period (ns)       4.000
Required Frequency (MHz)   250.000
External Setup (ns)        0.066
External Hold (ns)         0.829
Min Clock-To-Out (ns)      4.652
Max Clock-To-Out (ns)      8.236

Systolic - Non-Symmetry
The systolic FIR architecture is used in high performance filter applications.
The systolic FIR structure is derived from the direct form structure by adding an extra pipeline register to each MAC stage (that is, a register at the input and a register at the output). As a result, this architecture achieves the highest performance. Figure 40 shows the non-symmetric systolic FIR filter architecture. For an N-tap non-symmetric systolic FIR filter, the total initial latency = 2*(N – 1) clock cycles.

[Figure 40: chain of systolic blocks. Each block contains input delay registers, a coefficient multiplier (C0 to CN-1), an adder, and an output register; the input is Xn and the output is Yn.]
Figure 40 • Non-Symmetric Systolic FIR Filter Structure

In the systolic FIR filter, each systolic block is realized using a mathblock and two fabric registers at the input stage. Hence, an N-tap systolic FIR filter uses N mathblocks and a minimum of 2*(N-1) fabric registers. Figure 41 shows the implementation of a 16-tap non-symmetric systolic FIR filter using mathblocks.

Figure 41 • 16-Tap Non-Symmetric Systolic Parallel FIR Filter using Mathblock

Design Files
For the implementation code, refer to the Systolic_FIR_Filter.vhd design file. Download the design files from:
<download_folder>\DSP_Reference_Guide_DF\SystolicFIR_Filter\Systolic_FIR_Filter\hdl\Systolic_FIR_Filter.vhd

Resource Utilization and Timing Summary
Report 16 shows the resources utilized by the m2s050fbga896-1 device for the Systolic_FIR_Filter implementation.

Type               Used   Total   Percentage
COMB                577   56340    1.02
SEQ                1134   56340    2.01
IO (W/ clocks)       64     375   17.07
Differential IO       0     187    0.00
GLOBAL                2      16   12.50
RGB                   2    1088    0.18
RAM64x18              0      72    0.00
RAM1K18               0      69    0.00
MACC                 16      72   22.22
RCOSC_25_50MHZ        0       1    0.00
RCOSC_1MHZ            0       1    0.00
XTLOSC                0       1    0.00
CCC                   0       6    0.00
MSS                   0       1    0.00
FDDR                  0       1    0.00
SYSCTRL               0       1    0.00
Report 16 • Systolic FIR Filter

Table 19 shows the timing summary for Systolic_FIR_Filter implemented on the m2s050fbga896-1 device.
Table 19 • Systolic FIR Filter Timing Summary
Clock Domain               Clk
Period (ns)                2.245
Frequency (MHz)            445.434
Required Period (ns)       4.000
Required Frequency (MHz)   250.000
External Setup (ns)        -0.024
External Hold (ns)         0.970
Min Clock-To-Out (ns)      5.292
Max Clock-To-Out (ns)      9.414

Systolic - Symmetry
In the symmetric systolic architecture, each systolic block can be realized using a fabric pre-adder and a mathblock. The pre-adder sums the two inputs that share a symmetric coefficient (for example, C0 = CN-1). An N-tap symmetric FIR filter uses N/2 multipliers, N/2 - 1 adders, and N/2 pre-adders. Figure 42 and Figure 43 show the even-tap and odd-tap symmetric systolic FIR filters.

[Figure 42: chain of systolic blocks for the even-tap case. Each block contains delay registers, a pre-adder, a coefficient multiplier, and an adder; the input is Xn and the output is Yn.]
Figure 42 • Even-Tap Symmetric Systolic FIR Filter Structure

[Figure 43: chain of systolic blocks for the odd-tap case. Each block contains delay registers, a pre-adder, a coefficient multiplier, and an adder; the input is Xn and the output is Yn.]
Figure 43 • Odd-Tap Symmetric Systolic FIR Filter Structure

In the symmetric systolic FIR filter, each systolic block is realized using a mathblock, a fabric pre-adder, and two fabric registers at the input stage. Hence, an N-tap symmetric systolic FIR filter uses N/2 mathblocks and N/2 fabric pre-adders. This architecture therefore uses half the mathblock resources of the symmetric transpose architecture. Figure 44 and Figure 45 show the implementation of the 16-tap symmetric systolic FIR filter using the SmartFusion2/IGLOO2 mathblocks.

Figure 44 • 16-Tap Even-Tap Symmetric Systolic FIR Filter using Mathblock

Figure 45 • 16-Tap Odd-Tap Symmetric Systolic FIR Filter using Mathblock

Design Files
For the implementation code, refer to the Systolic_Symmetric_FIR.vhd design file.
Download the design files from:
<download_folder>\DSP_Reference_Guide_DF\Systolic_Symmetric_FIR\hdl\Systolic_Symmetric_FIR.vhd

Resource Utilization and Timing Summary
Report 17 shows the resources utilized by the m2s050fbga896-1 device for the Systolic_Symmetric_FIR implementation.

Type               Used   Total   Percentage
COMB                433   56340    0.77
SEQ                 722   56340    1.28
IO (W/ clocks)       64     375   17.07
Differential IO       0     187    0.00
GLOBAL                2      16   12.50
RGB                   2    1088    0.18
RAM64x18              0      72    0.00
RAM1K18               0      69    0.00
MACC                  8      72   11.11
RCOSC_25_50MHZ        0       1    0.00
RCOSC_1MHZ            0       1    0.00
XTLOSC                0       1    0.00
CCC                   0       6    0.00
MSS                   0       1    0.00
FDDR                  0       1    0.00
SYSCTRL               0       1    0.00
Report 17 • 16-Tap Even-Tap Systolic FIR Filter

Table 20 shows the timing summary for Systolic_Symmetric_FIR implemented on the m2s050fbga896-1 device.

Table 20 • 16-Tap Even-Tap Systolic FIR Filter Timing Summary
Clock Domain               clk
Period (ns)                2.729
Frequency (MHz)            366.435
Required Period (ns)       4.000
Required Frequency (MHz)   250.000
External Setup (ns)        0.940
External Hold (ns)         0.818
Min Clock-To-Out (ns)      4.927
Max Clock-To-Out (ns)      8.506

CoreFIR, available in the Libero catalog, supports the transpose and systolic architectures under its fully enumerated (parallel FIR) option. Refer to the CoreFIR Handbook for more information on usage and configuration.

Semi-Parallel FIR Filters
The semi-parallel (folded) single-rate FIR is a hybrid architecture that is commonly chosen to use the mathblocks efficiently according to the design requirements. It uses the minimum number of MAC blocks needed to keep up with the average input sample rate. In a semi-parallel FIR filter, the folding factor determines the number of coefficients handled by each MAC. For example, for a filter of length L and folding factor m, the semi-parallel FIR architecture uses (L/m) multipliers, (L/m) adders, and 1 accumulator. Figure 46 shows the semi-parallel FIR filter architecture.
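The folding described above can be sketched behaviorally in Python. This is an illustrative model under stated assumptions, not the guide's VHDL: the function name fir_semi_parallel, the way coefficients are split into consecutive banks of m, and the requirement that L be a multiple of m are choices made for this example.

```python
def fir_semi_parallel(x, h, m):
    """Model a length-L FIR folded by factor m onto L/m MAC units (illustrative)."""
    length = len(h)
    if m <= 0 or length % m != 0:
        raise ValueError("filter length must be a multiple of the folding factor")
    n_macs = length // m
    # Each MAC owns a bank of m consecutive coefficients.
    banks = [h[i * m:(i + 1) * m] for i in range(n_macs)]
    history = [0] * length                    # history[j] holds x[n-j]
    y = []
    for sample in x:
        history = [sample] + history[:-1]
        acc = 0                               # the single shared accumulator
        for cycle in range(m):                # m clock cycles per output sample
            # All L/m multipliers work in parallel during this cycle.
            partial = sum(banks[i][cycle] * history[i * m + cycle]
                          for i in range(n_macs))
            acc += partial
        y.append(acc)
    return y


# 8 taps folded by m = 4 gives two multipliers, as in an 8-tap two-MAC filter.
print(fir_semi_parallel([1, 0, 0, 0, 0, 0, 0, 0], [1, 2, 3, 4, 5, 6, 7, 8], 4))
```

Setting m = 1 gives the fully parallel case (one multiplier per tap, one cycle per output) and m = L gives the single MAC case, which is why the folding factor is the knob that trades mathblocks against sample rate.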
[Figure 46: block diagram. Input Xn passes through delay lines; each MAC reads its m coefficients from its own coefficient memory (Coef Mem), and the MAC outputs are summed.]
Figure 46 • Semi-parallel FIR Filter Structure

In this design, each MACC is realized using a mathblock, and uSRAM is used for coefficient and input data storage. Coefficients and input samples are read from uSRAM and fed to the corresponding mathblocks, and the intermediate sums are accumulated to generate the final result. Figure 47 shows the implementation of an 8-tap FIR filter using two multiplier mathblocks.

[Figure 47: block diagram of the 8-tap two-MACC FIR filter showing the control logic, input memories (uSRAM), coefficient memories (uSRAM), two MultAdd mathblocks, and a MultACC mathblock. Signals: Clk, resetn, Xin, Coef_addr, Yn_out.]
Figure 47 • 8-Tap Semi-Parallel FIR Filter Structure using Mathblock

In a similar manner, a 16-tap four-MACC FIR filter can be implemented as shown in Figure 48. In this implementation, five mathblocks are required to compute the final result.

[Figure 48: block diagram of the 16-tap four-MACC FIR filter showing the control logic, input memories (uSRAM), coefficient memories (uSRAM), four MultAdd mathblocks, and a MultACC mathblock. Signals: Clk, resetn, Xin, Coef_addr, Yn_out.]
Figure 48 • 16-Tap Semi-Parallel FIR Filter using Mathblock

Design Files
For the implementation code, refer to the TwoMult_8_Tap_FIR.vhd design file. Download the design files from:
<download_folder>\DSP_Reference_Guide_DF\TwoMultiplier8-Tap FIR\TwoMult_8Tap_FIR\hdl\TwoMult_8_Tap_FIR.vhd

Resource Utilization and Timing Summary
Report 18 shows the resources utilized by the m2s050fbga896-1 device for the TwoMult_8_tap_FIR implementation.
Type               Used   Total   Percentage
COMB                201   56340    0.36
SEQ                 293   56340    0.52
IO (W/ clocks)       65     375   17.33
Differential IO       0     187    0.00
GLOBAL                2      16   12.50
RGB                   2    1088    0.18
RAM64x18              2      72    2.78
RAM1K18               0      69    0.00
MACC                  3      72    4.17
RCOSC_25_50MHZ        0       1    0.00
RCOSC_1MHZ            0       1    0.00
XTLOSC                0       1    0.00
CCC                   0       6    0.00
MSS                   0       1    0.00
FDDR                  0       1    0.00
SYSCTRL               0       1    0.00
Report 18 • 8-Tap Two Mult FIR Filter

Table 21 shows the timing summary for TwoMult_8_tap_FIR implemented on the m2s050fbga896-1 device.

Table 21 • 8-Tap Two Mult FIR Filter Timing Summary
Clock Domain               clk
Period (ns)                4.000
Frequency (MHz)            250.000
Required Period (ns)       4.000
Required Frequency (MHz)   250.000
External Setup (ns)        0.882
External Hold (ns)         0.887
Min Clock-To-Out (ns)      4.646
Max Clock-To-Out (ns)      8.176

Transform Applications

Fast Fourier Transform
The fast Fourier transform (FFT) is used in DSP applications such as digital communication, video/audio processing, industrial control, and biomedical processing. The transform can be implemented in the SmartFusion2/IGLOO2 devices using the built-in mathblocks and memory blocks (LSRAM and uSRAM). This section describes the basic theory of the FFT and the FFT IP core available in the Libero SoC software.

The complex FFT transforms two N-point time domain signals into two N-point frequency domain signals. The complex signal has two parts: a real part and an imaginary part.

The FFT of N complex data points x(n) is defined as:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}$$
EQ 12

Where k = 0, 1, 2, …, N-1 and $W_N = e^{-j 2\pi / N}$. $W_N$ is the twiddle factor, or coefficient.

Radix-2 is the most widely used architecture for FFT implementation. A radix-2 FFT is built from a butterfly structure that consists of a complex adder, a complex subtractor, and a multiplier for the twiddle factors. Figure 49 shows the basic radix-2 butterfly structure, and Figure 50 shows the Microsemi mathblock architecture for the complex multiplier used in butterfly computations.
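As a numerical illustration of EQ 12 and the butterfly operation, the following Python sketch compares a direct evaluation of the transform with a recursive radix-2 decimation-in-time FFT. It is a floating-point software model only and does not represent CoreFFT or the mathblock implementation; the function names, the use of the standard twiddle factor W_N = e^(-j2π/N), and the four-multiplier form of the complex product are assumptions made for this example.

```python
import cmath


def twiddle(n_points, r):
    """Twiddle factor W_N^r = exp(-j * 2 * pi * r / N)."""
    return cmath.exp(-2j * cmath.pi * r / n_points)


def dft(x):
    """Direct evaluation of EQ 12 (N squared complex multiplications)."""
    n_points = len(x)
    return [sum(x[n] * twiddle(n_points, n * k) for n in range(n_points))
            for k in range(n_points)]


def complex_mult(a_re, a_im, b_re, b_im):
    """Complex product written as four real multiplications and two additions."""
    return a_re * b_re - a_im * b_im, a_re * b_im + a_im * b_re


def butterfly(p, q, w):
    """Radix-2 butterfly: one twiddle multiplication, one add, one subtract."""
    t = q * w
    return p + t, p - t


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of 2."""
    n_points = len(x)
    if n_points == 1:
        return list(x)
    if n_points % 2:
        raise ValueError("length must be a power of 2")
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0] * n_points
    half = n_points // 2
    for k in range(half):
        out[k], out[k + half] = butterfly(even[k], odd[k], twiddle(n_points, k))
    return out


samples = [1, 2, 3, 4, 0, -1, -2, -3]
error = max(abs(a - b) for a, b in zip(fft_radix2(samples), dft(samples)))
print(error < 1e-9)  # prints True
```

The butterfly is the unit that maps naturally onto multiplier hardware: its only multiplication is the complex product with the twiddle factor, which complex_mult breaks into real multiply and add operations.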
[Figure 49: radix-2 butterfly with two inputs, the twiddle factor W_N^r, and two outputs.]
Figure 49 • Radix-2 Butterfly Structure

[Figure 50: complex multiplier built from mathblocks; the real output (A_RE) and imaginary output (A_IM) are formed from products of x_RE, x_IM, y_RE, and y_IM.]
Figure 50 • Radix-2 Butterfly using Mathblock

Microsemi provides the CoreFFT IP in the Libero SoC IP catalog. CoreFFT is a highly parameterizable, area-efficient, high-performance MAC based FFT optimized for SmartFusion2/IGLOO2 devices. CoreFFT has two implementations: a radix-2 decimation-in-time in-place architecture and a radix-2^2 decimation-in-frequency streaming architecture. CoreFFT supports both forward and inverse transforms with a length of 2^n, where 5 ≤ n ≤ 13. The main features of CoreFFT are shown in Table 22.

Table 22 • CoreFFT Features
Feature                                                                In-Place                                                          Streaming
Transform sizes                                                        32-, 64-, 128-, 256-, 512-, 1024-, 2048-, 4096-, and 8192-point   16-, 32-, 64-, 128-, 256-, 512-, and 1024-point
Forward and inverse FFT                                                Yes                                                               Yes
Input data bit width                                                   8–32                                                              8–32
Twiddle factor bit width                                               8–32                                                              8–32
Input/output data format                                               Two's complement                                                  Two's complement
Natural output sample order                                            Yes                                                               Optional
Conditional block floating point scaling                               Yes                                                               No
Pre-defined scaling schedule                                           Yes                                                               No
Optional minimal or buffered memory configurations                     Yes                                                               No
Embedded RAM-block based twiddle look-up table (LUT)                   Yes                                                               Yes
Support for refreshing twiddle look-up tables                          Yes                                                               Yes
Handshake signals to facilitate easy interface to the user circuitry   Yes                                                               Yes
Run-time forward/inverse transform configuration                       No                                                                Yes

Resource Utilization and Timing Summary
For more information on CoreFFT, refer to the CoreFFT Handbook.

Appendix 1 – Design Files
The design files (DF) can be downloaded from the Microsemi® SoC Products Group website:
www.microsemi.com/soc/download/rsc/?f=DSP_Reference_Guide_DF
The design files consist of example projects in VHDL.
Refer to the readme.txt file included with the design files for the directory structure.

List of Changes
The following table lists critical changes that were made in each revision of the reference guide.

Revision                     Changes                                                                                              Page
Revision 2 (December 2014)   Removed all instances of and references to M2S100 and M2GL100 device from Table 1 (SAR 62858).       5
Revision 1 (June 2014)       First Release                                                                                        NA

Product Support
Microsemi SoC Products Group backs its products with various support services, including Customer Service, Customer Technical Support Center, a website, electronic mail, and worldwide sales offices. This appendix contains information about contacting Microsemi SoC Products Group and using these support services.

Customer Service
Contact Customer Service for non-technical product support, such as product pricing, product upgrades, update information, order status, and authorization.
From North America, call 800.262.1060
From the rest of the world, call 650.318.4460
Fax, from anywhere in the world, 408.643.6913

Customer Technical Support Center
Microsemi SoC Products Group staffs its Customer Technical Support Center with highly skilled engineers who can help answer your hardware, software, and design questions about Microsemi SoC Products. The Customer Technical Support Center spends a great deal of time creating application notes, answers to common design cycle questions, documentation of known issues, and various FAQs. So, before you contact us, please visit our online resources. It is very likely we have already answered your questions.

Technical Support
Visit the Customer Support website (www.microsemi.com/soc/support/search/default.aspx) for more information and support. Many answers available on the searchable web resource include diagrams, illustrations, and links to other resources on the website.

Website
You can browse a variety of technical and non-technical information on the SoC home page, at www.microsemi.com/soc.
Contacting the Customer Technical Support Center
Highly skilled engineers staff the Technical Support Center. The Technical Support Center can be contacted by email or through the Microsemi SoC Products Group website.

Email
You can communicate your technical questions to our email address and receive answers back by email, fax, or phone. Also, if you have design problems, you can email your design files to receive assistance. We constantly monitor the email account throughout the day. When sending your request to us, please be sure to include your full name, company name, and your contact information for efficient processing of your request. The technical support email address is [email protected].

My Cases
Microsemi SoC Products Group customers may submit and track technical cases online by going to My Cases.

Outside the U.S.
Customers needing assistance outside the US time zones can either contact technical support via email ([email protected]) or contact a local sales office. Sales office listings can be found at www.microsemi.com/soc/company/contact/default.aspx.

ITAR Technical Support
For technical support on RH and RT FPGAs that are regulated by International Traffic in Arms Regulations (ITAR), contact us via [email protected]. Alternatively, within My Cases, select Yes in the ITAR drop-down list. For a complete list of ITAR-regulated Microsemi FPGAs, visit the ITAR web page.

Microsemi Corporate Headquarters
One Enterprise, Aliso Viejo CA 92656 USA
Within the USA: +1 (800) 713-4113
Outside the USA: +1 (949) 380-6100
Sales: +1 (949) 380-6136
Fax: +1 (949) 215-4996
E-mail: [email protected]

Microsemi Corporation (Nasdaq: MSCC) offers a comprehensive portfolio of semiconductor and system solutions for communications, defense and security, aerospace, and industrial markets.
Products include high-performance and radiation-hardened analog mixed-signal integrated circuits, FPGAs, SoCs, and ASICs; power management products; timing and synchronization devices and precise time solutions, setting the world's standard for time; voice processing devices; RF solutions; discrete components; security technologies and scalable anti-tamper products; Power-over-Ethernet ICs and midspans; as well as custom design capabilities and services. Microsemi is headquartered in Aliso Viejo, Calif. and has approximately 3,400 employees globally. Learn more at www.microsemi.com. © 2014 Microsemi Corporation. All rights reserved. Microsemi and the Microsemi logo are trademarks of Microsemi Corporation. All other trademarks and service marks are the property of their respective owners. 50200442-2/12.14