PDSP16116/A/MC
16 by 16 Bit Complex Multiplier

DS3858    ISSUE 3.0    June 2000

The PDSP16116A will multiply two complex (16 + 16) bit words every 50ns and can be configured to output the complete complex (32 + 32) bit result within a single cycle. The data format is fractional two's complement.

The PDSP16116/A contains four 16 x 16 Array Multipliers, two 32 bit Adder/Subtractors and all the control logic required to support Block Floating Point Arithmetic as used in FFT applications.

In combination with a PDSP16318, the PDSP16116A forms a two chip 10MHz Complex Multiplier Accumulator with 20 bit accumulator registers and output shifters. The PDSP16116 in combination with two PDSP16318s and two PDSP1601s forms a complete 10MHz Radix 2 DIT FFT Butterfly solution which fully supports Block Floating Point Arithmetic.

The PDSP16116/A has an extremely high throughput that is suited to recursive algorithms as all calculations are performed with a single pipeline delay (two cycle fall-through).

Ordering Information
PDSP16116 MC GC1R     10MHz    MIL-883 screened ceramic QFP
PDSP16116 MC AC1R     10MHz    MIL-883 screened PGA package
PDSP16116A MC GC1R    20MHz    MIL-883 screened ceramic QFP
PDSP16116A MC AC1R    20MHz    MIL-883 screened PGA package

FEATURES
■ Complex Number (16 + 16) X (16 + 16) Multiplication
■ Full 32 bit Result
■ 20MHz Clock Rate
■ Block Floating Point FFT Butterfly Support
■ -1 times -1 Trap
■ Two's Complement Fractional Arithmetic
■ TTL Compatible I/O
■ Complex Conjugation
■ 2 Cycle Fall Through
■ 144 pin PGA or QFP packages

APPLICATION
■ Fast Fourier Transforms
■ Digital Filtering
■ Radar and Sonar Processing
■ Instrumentation
■ Image Processing

ASSOCIATED PRODUCTS
PDSP16318/A    Complex Accumulator
PDSP16112/A    (16 + 16) X (12 + 12) Complex Multiplier
PDSP16330/A    Pythagoras Processor
PDSP1601/A     ALU and Barrel Shifter
PDSP16350      Precision Digital Modulator
PDSP16256      Programmable FIR Filter
PDSP16510      Single Chip FFT Processor

[Fig.1 Simplified Block Diagram: inputs XR, XI, YR, YI; input registers; four multipliers; multiplier output registers; two adder/subtractors (+/-); two shifters; output registers; outputs PR, PI]

CHANGE NOTIFICATION
The change notification requirements of MIL-M-38510 will be implemented on this device type. Known customers will be notified of any changes since last buy when ordering further parts if significant changes have been made.

Rev    Date
A      JULY 1993
B      OCT 1998
C      JUN 2000
D

The PDSP16116 has a number of features tailored for System applications.

-1 x -1 Trap
In multiply operations utilising Twos Complement Fractional notation, the -1 x -1 operation forms an invalid result as +1 is not representable in the fractional number range. The PDSP16116/A eliminates this problem by trapping the -1 x -1 operation and forcing the Multiplier result to become the most positive representable number.

Complex Conjugation
Many algorithms utilising complex arithmetic require conjugation of a complex data stream. This operation has traditionally required an additional ALU to multiply the imaginary component by -1. The PDSP16116 eliminates the requirement for the extra ALU by offering on chip complex conjugation of either of the two incoming complex data words with no loss in throughput.

Easy Interfacing
As with all PDSP family members the PDSP16116 has registered I/O for data and control. Data inputs have independent clock enables and data outputs have independent three state output enables.

Signal      Type      Description                                        Normal mode configuration
XR15:0      INPUT     16 bit input for real x data
XI15:0      INPUT     16 bit input for imag x data
YR15:0      INPUT     16 bit input for real y data
YI15:0      INPUT     16 bit input for imag y data
PR15:0      OUTPUT    16 bit output for real p data
PI15:0      OUTPUT    16 bit output for imag p data
CLK         INPUT     Clock, new data is loaded on rising edge of CLK
CEX         INPUT     Clock enable, X-port input register
CEY         INPUT     Clock enable, Y-port input register
CONX        INPUT     Conjugate X data
CONY        INPUT     Conjugate Y data
ROUND       INPUT     Rounds the real & imag results
MBFP        INPUT     Mode select (BFP/Normal)                           Tie Low
SOBFP       INPUT     Start of BFP operations **                         Tie Low
EOPSS       INPUT     End of pass **                                     Tie Low
AR15:13     INPUT     3 MSBs from real part of A-word **                 Tie Low
AI15:13     INPUT     3 MSBs from imag part of A-word **                 Tie Low
WTA1:0      INPUT     Word tag from A-word                               Tie Low
WTB1:0      INPUT     Word tag from B-word / shift control *
WTOUT1:0    OUTPUT    Word tag output **
SFTA1:0     OUTPUT    Shift control for A-word / overflow flag *
SFTR2:0     OUTPUT    Shift control for accumulator result **
GWR4:0      OUTPUT    Global weighting register contents **
OSEL1:0     INPUT     Selects the desired output configuration
OER, OEI    INPUT     Output enables
VDD         POWER     +5V Supply (all supply pins must be connected)
GND         POWER     0V Supply (all supply pins must be connected)

* Indicates pin performs different functions in BFP / Normal modes.
** Indicates pin is used only in BFP mode.

Table.1 Signal Descriptions

[Figure 2 - Block Diagram: inputs XR, XI, YR, YI with clock enables CEX, CEY; input registers with complementers (COMP) controlled by CONX and CONY; four 16X16 multipliers; multiplexers and registers; two adder/subtractors with ROUND input and overflow decode (OVR); control logic with inputs WTA, WTB, AR15:13, AI15:13, SOBFP, EOPSS and outputs SFTR, SFTA, GWR4:0, WTOUT; two shifters; output registers; output multiplexers controlled by OSEL; output enables OER, OEI; outputs PR, PI]

[Figure 3 - Pin connection diagrams (not to scale): AC144 (Power), pin connections for the 144 I/O power pin grid array package (bottom view), rows A to R and columns 1 to 15; GC144, pin connections for the 144 I/O ceramic quad flatpack (top view), pin 1 to pin 144 with pin 1 ident (see note 2)]

GC AC Signal GC AC 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 D3 C2 B1 D2 E3 C1 E2 D1 F2 F3 E1 G2 G3 F1 G1 H2 H1 H3 J3 J1 K1 J2 K2 K3 L1 L2 M1 N1 M2 L3 N2 P1 M3 N3 B2 A1 PI14 PI15 WTOUT1 WTOUT0 SFTR0 SFTR1 SFTR2 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 N4 P3 R2 P4 N5 R3 P5 R4 N6 P6 R5 P7 N7 R6 R7 P8 R8 N8 N9 R9 R10 P9 P10 N10 R11 P11 R12 R13 P12 N11 P13 R14 N12 N13 P14 R15 OEI CONX CONY ROUND AI13 AI14 AI15 AR13 AR14 AR15 YI15 YI14 YI13 YI12 YI11 YI10 YI9 YI8 YI7 YI6 YI5 YI4 YI3 YI2 YI1 YI0 XI0 GND VDD Signal GC XI1 XI2 XI3 XI4 XI5 XI6 XI7 XI8 XI9 XI10 XI11 XI12 XI13 XI14 XI15 CEY CEX XR15 XR14 XR13 XR12 XR11 XR10 XR9 XR8 XR7 XR6 XR5 XR4 XR3 XR2 XR1 XR0 YR15 YR14 YR13 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 Signal GC AC Signal GND P2 VDD R1 P15 YR12 M14 YR11 L13 YR10 YR9 N15 YR8 L14 YR7 M15 YR6 K13 YR5 K14 YR4 L15 YR3 J14 YR2 J13 YR1 K15 YR0 J15 H14 EOPSS VDD H15 H13 SOBFP G13 WTB1 G15 WTB0 F15 WTA1 G14 WTA0 F14 MBFP CLK F13 E15 OSEL1 E14 OSEL0 OER D15 C15 SFTA0 D14 SFTA1
E13 GWR0 C14 GWR1 B15 GWR2 D13 GWR3 C13 GWR4 B14 PR15 A15 PR14 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 N14 M13 A14 B12 C11 A13 B11 A12 C10 B10 A11 B13 C12 A10 A9 B8 A8 C8 C7 A7 A6 B7 B6 C6 A5 B5 A4 A3 B4 C5 B3 A2 C4 C3 B9 C9 VDD GND PR13 PR12 PR11 PR10 PR9 PR8 PR7 PR6 PR5 GND VDD PR4 PR3 PR2 PR1 PR0 PI0 PI1 PI2 PI3 PI4 VDD PI5 GND PI6 PI7 PI8 PI9 PI10 PI11 PI12 PI13 GND VDD AC

NOTE. All GND and VDD pins must be used.

Figure 3A - Pin connections for AC144 (Power) and GC144 packages

NORMAL MODE OPERATION
When the MBFP mode select input is held low the 'Normal' mode of operation is selected. This mode supports all Complex Multiply operations that do not require Block Floating Point arithmetic. Complex two's complement fractional data is loaded into the X and Y input registers via the X and Y Ports on the rising edge of CLK. The Real and Imaginary components of the fractional data are each assumed to have the following format:

BIT NUMBER   15  14    13    12    11    10    9     8     7     6     5      4      3      2      1      0
WEIGHTING    S   2^-1  2^-2  2^-3  2^-4  2^-5  2^-6  2^-7  2^-8  2^-9  2^-10  2^-11  2^-12  2^-13  2^-14  2^-15

Where S = sign bit, which has an effective weighting of -2^0.

The value of the 16 bit two's complement word is
Value = (-1 x S) + (bit14 x 2^-1) + (bit13 x 2^-2) + (bit12 x 2^-3) + ...

Multiplier Stage
The X & Y port registers are individually enabled by the CEX & CEY signals respectively. If the registers are required to be permanently enabled, then these signals may be tied to ground. On each clock cycle the contents of the input registers are passed to the four multipliers to start a new Complex Multiply operation. Each Complex Multiply operation requires four partial products (Xr x Yr), (Xr x Yi), (Xi x Yr), (Xi x Yi), all of which are calculated in parallel by the four 16 x 16 Multipliers. Only one clock cycle is required to complete the multiply stage before the Multiplier results are loaded into the Multiplier output registers for passing on to the Adder/Subtractors in the next cycle. Each multiplier produces a 31 bit result with the duplicate sign bit eliminated. The format of the output data from the Multipliers is:

BIT NUMBER   30  29    28    27    26    25    24    ...  7      6      5      4      3      2      1      0
WEIGHTING    S   2^-1  2^-2  2^-3  2^-4  2^-5  2^-6  ...  2^-23  2^-24  2^-25  2^-26  2^-27  2^-28  2^-29  2^-30

The effective weighting of the sign bit is -2^0.

Result Correction
Due to the nature of the fractional twos complement representation it is possible to represent -1 exactly but not +1. With conventional multipliers this causes a problem when -1 is multiplied by -1, as the multiplier produces an incorrect result. The PDSP16116 includes a trap to ensure that the most positive number (value = 1 - 2^-30, hex = 7FFFFFFF) is substituted for the incorrect result. The multiplier result is therefore always a (correct) fractional value.

Complex Conjugation
Either the X or Y input data may be complex conjugated by asserting the CONX or CONY signals respectively. Asserting either of these signals has the effect of inverting (multiplying by -1) the imaginary component of the respective input. Table 3 shows the effect of CONX and CONY on the X and Y inputs.

OPERATION     FUNCTION                   CONX    CONY
X x Y         (XR + jXI) x (YR + jYI)    low     low
Conj X x Y    (XR - jXI) x (YR + jYI)    high    low
X x Conj Y    (XR + jXI) x (YR - jYI)    low     high
Invalid       Invalid                    high    high

Table 3 Conjugate Functions

Adder / Subtractor Stage
The 31 bit Real and Imaginary results from the Multipliers are passed to two 32 bit Adder/Subtractors. The Adder calculates the Imaginary result ((Xr x Yi) + (Xi x Yr)) and the Subtractor calculates the Real result ((Xr x Yr) - (Xi x Yi)). Each Adder/Subtractor produces a 32 bit result with the following format:

BIT NUMBER   31  30   29    28    27    26    ...  8      7      6      5      4      3      2      1      0
WEIGHTING    S   2^0  2^-1  2^-2  2^-3  2^-4  ...  2^-22  2^-23  2^-24  2^-25  2^-26  2^-27  2^-28  2^-29  2^-30

The effective weighting of the sign bit is -2^1.

Rounding
The ROUND control when asserted rounds the most significant 16 bits of the full 32 bit result from the Adder/Subtractor. If the ROUND signal is active (High), then bit 16 is set to a one, rounding the most significant 16 bits of the Adder/Subtractor result. (The least significant 16 bits are unaffected.) Inserting a one ensures that the rounding error is never greater than 1 LSB, and that no DC bias is introduced as a result of the rounding process. The format of the Rounded result is:

BIT NUMBER   31  30   29    28    27    ...  18     17     16     15     14     13     ...  2      1      0
WEIGHTING    S   2^0  2^-1  2^-2  2^-3  ...  2^-12  2^-13  2^-14  2^-15  2^-16  2^-17  ...  2^-28  2^-29  2^-30
             |<------------ ROUNDED VALUE ------------>|  |<------------------ LSBs ------------------>|

The effective weighting of the sign bit is -2^1.

Shifter
Each of the two Adder/Subtractors is followed by a Shifter controlled via the WTB control input. These shifters can each apply four different shifts; however, the same shift is applied to both real and imaginary components. The four shift options are:

i) WTB1:0 = 11
Shift complex product one place to the left, giving a shifter output format:

BIT NUMBER   31  30    29    28    27    26    25    ...  7      6      5      4      3      2      1      0
WEIGHTING    S   2^-1  2^-2  2^-3  2^-4  2^-5  2^-6  ...  2^-24  2^-25  2^-26  2^-27  2^-28  2^-29  2^-30  2^-31

The effective weighting of the sign bit is -2^0.

Part No: PDSP16116/A/MC 16 By 16 Bit Complex Multiplier
VDD max = +5.5V = V1
Package Type: AC144
N/C = not connected

Pin No. Con. Pin No. Con. Pin No. Con. Pin No. Con.
A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 B11 B12 B13 B14 B15 C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 D11 D12 D13 D14 D15 V1 N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C 0V N/C N/C 0V N/C N/C N/C 0V N/C N/C N/C 0V N/C N/C N/C N/C N/C N/C N/C V1 N/C N/C V1 N/C N/C V1 N/C N/C N/C V1 100Ω N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C V1 100Ω E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 E11 E12 E13 E14 E15 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12 F13 F14 F15 G1 G2 G3 G4 G5 G6 G7 G8 G9 G10 G11 G12 G13 G14 G15 H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 H11 H12 H13 H14 H15 0V 100Ω N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C N/C V1 100Ω V1 100Ω 0V 100Ω 0V 100Ω V1 100Ω N/C N/C N/C N/C N/C N/C N/C N/C N/C 0V 100Ω 0V 100Ω 0V 100Ω 0V 100Ω 0V 100Ω 0V 100Ω N/C N/C N/C N/C N/C N/C N/C N/C N/C V1 100Ω 0V 100Ω V1 100Ω 0V 100Ω 0V 100Ω V1 100Ω N/C N/C N/C N/C N/C N/C N/C N/C N/C 0V 100Ω 0V 100Ω V1 J1 J2 J3 J4 J5 J6 J7 J8 J9 J10 J11 J12 J13 J14 J15 K1 K2 K3 K4 K5 K6 K7 K8 K9 K10 K11 K12 K13 K14 K15 L1 L2 L3 L4 L5 L6 L7 L8 L9 L10 L11 L12 L13 L14 L15 M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 M15 V1 100Ω V1 100Ω V1 100Ω N/C N/C N/C N/C N/C N/C N/C N/C N/C 0V 100Ω 0V 100Ω 0V 100Ω V1 100Ω V1 100Ω V1 100Ω N/C N/C N/C N/C N/C N/C N/C N/C N/C 0V 100Ω 0V 100Ω 0V 100Ω V1 100Ω V1 100Ω V1 100Ω N/C N/C N/C N/C N/C N/C N/C N/C N/C 0V 100Ω 0V 100Ω 0V 100Ω V1 100Ω V1 100Ω V1 100Ω N/C N/C N/C N/C N/C N/C N/C N/C N/C 0V 0V 100Ω 0V 100Ω N1 N2 N3 N4 N5 N6 N7 N8 N9 N10 N11 N12 N13 N14 N15 P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 P14 P15 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R13 R14 R15 V1 100Ω V1 100Ω 0V 100Ω 0V 100Ω 0V 100Ω 0V 100Ω 0V 100Ω V1 100Ω V1 100Ω V1 100Ω V1 100Ω V1 100Ω 0V 100Ω V1 0V 100Ω V1 100Ω 0V 0V 100Ω 0V 100Ω 0V 100Ω 0V 100Ω 0V 100Ω V1 100Ω V1 100Ω V1 100Ω V1 100Ω V1 100Ω V1 100Ω 0V 100Ω 0V 100Ω V1 0V 100Ω 0V 100Ω 0V 100Ω 0V 100Ω 0V 100Ω 0V 100Ω V1 100Ω V1 100Ω V1 100Ω V1 100Ω V1 100Ω V1 100Ω V1 100Ω 0V 100Ω 
Figure 4(a) - Life Test/Burn-in connections
NOTE: PDA is 5% and based on groups 1 and 7

Part No: PDSP16116/A/MC 16 By 16 Bit Complex Multiplier
Package Type: GC144

Pin No.    Con.
1-7        N/C
8          V1
9-17       0V
18-33      V1
34-35      0V
36         V1
37-51      0V
52-69      V1
70-73      0V
74         V1
75-88      0V
89         V1
90         0V
91-92      V1
93-95      0V
96         V1
97-98      0V
99         V1
100-119    N/C
120        0V
121        V1
122-131    N/C
132        V1
133        N/C
134        0V
135-142    N/C
143        0V
144        V1

VDD max = +5.0V = V1
N/C = not connected

Figure 4(b) Life Test/Burn-in connections
NOTE: PDA is 5% and based on groups 1 and 7

ii) WTB1:0 = 00
No shift applied, giving a shifter output format:

BIT NUMBER   31  30   29    28    27    26    ...  8      7      6      5      4      3      2      1      0
WEIGHTING    S   2^0  2^-1  2^-2  2^-3  2^-4  ...  2^-22  2^-23  2^-24  2^-25  2^-26  2^-27  2^-28  2^-29  2^-30

The effective weighting of the sign bit is -2^1.

iii) WTB1:0 = 01
Shift complex product one place to the right, giving a shifter output format:

BIT NUMBER   31  30   29   28    27    26    25    24    ...  6      5      4      3      2      1      0
WEIGHTING    S   2^1  2^0  2^-1  2^-2  2^-3  2^-4  2^-5  ...  2^-23  2^-24  2^-25  2^-26  2^-27  2^-28  2^-29

The effective weighting of the sign bit is -2^2.

iv) WTB1:0 = 10
Shift complex product two places to the right, giving a shifter output format:

BIT NUMBER   31  30   29   28   27    26    25    24    ...  6      5      4      3      2      1      0
WEIGHTING    S   2^2  2^1  2^0  2^-1  2^-2  2^-3  2^-4  ...  2^-22  2^-23  2^-24  2^-25  2^-26  2^-27  2^-28

The effective weighting of the sign bit is -2^3.

Overflow
If the left shift option is selected and the Adder/Subtractor contains a full 32 bit word, then an invalid result will be passed to the output. An invalid output arising from this combination of events will be flagged by the SFTA0 flag output. The SFTA0 flag will go high if either the real or imaginary result is invalid.

Output Select
The output from the Shifters is passed to the Output Select Mux, which is controlled via the OSEL inputs. These inputs are not registered and hence allow the output combination to be changed within each cycle. The full complex 64 bit result from the multiplier may therefore be output within a single cycle. The OSEL control selects four different output combinations as summarised in Table 4.

OSEL1    OSEL0    PR     PI
0        0        MSR    MSI
0        1        LSR    LSI
1        0        MSR    LSR
1        1        MSI    LSI

Table 4 - Output Selection
(Where MSR and LSR are the most and least significant 16 bit words of the Real Shifter output, MSI and LSI are the most and least significant 16 bit words of the Imaginary Shifter output.)

The output select options allow two different modes for extracting the full 32 bit result from the PDSP16116. The first mode treats the two 16 bit outputs as real and imaginary ports, allowing the real and imaginary results to be output in two halves on the real and imaginary output ports. The second mode treats the two 16 bit outputs as one 32 bit output and allows the real and imaginary results to be output as 32 bit words.

PIN DESCRIPTIONS

XR, XI, YR, YI
Data inputs, 16 bits: Data is loaded into the input registers from these ports on the rising edge of CLK. The data format is Twos Complement Fractional, where the MSB (sign bit) is bit 15. In normal mode the weighting of the MSB is -2^0, i.e. -1.

PR, PI
Data outputs, 16 bits: Data is clocked into the output registers and passed to the PR and PI outputs on the rising edge of CLK. The data format is Twos Complement Fractional. The field of the internal result selected for output via PR and PI is controlled by signals OSEL1:0 (see Table 4).

CLK
Common clock to all internal registers.

CEX, CEY
Clock enables for X and Y input ports: When low, these inputs enable the CLK signal to the X or Y input registers, allowing new data to be clocked into the Multiplier.

CONX, CONY
If either of these inputs is high on the rising edge of CLK, then the data in the associated input has its imaginary component inverted (multiplied by -1), see Table 3. CONX and CONY affect data input on the same clock rising edge.

ROUND
The ROUND control is used to round the most significant 16 bits of the Adder/Subtractor result prior to being passed to the output register. The rounding operation takes place one cycle after the ROUND input is taken high. The ROUND input is not latched and is intended to be tied high or low depending upon the application.

MBFP
Mode select: When high, Block Floating Point (BFP) mode is selected. This allows the device to maintain the dynamic range of the data using a series of word tags. This is especially useful in FFT applications. When low, the chip operates in normal mode for more general applications. This pin is intended to be tied high or low, depending on application.

SOBFP (BFP MODE ONLY)
Start of BFP: This input should be held low for the first cycle of the first pass of the BFP calculations (see Fig.8). It serves to reset the internal registers associated with BFP control. When operating in normal mode this input should be tied low.

GWR4:0 (BFP MODE ONLY)
Contents of the global weighting register: This stores the weighting of the largest word present with respect to the weighting of the original input words.
Hence, if the contents of the GWR are 00010, this indicates that the largest word currently being processed has its binary point two bits to the right of the original data at the start of the BFP calculations. The contents of this register are updated at the end of each pass, according to the largest value of WTOUT occurring during that pass (i.e. if WTOUT = 11, then GWR will be increased by 2). The GWR is presented in two's complement format. These outputs are superfluous in normal mode.

EOPSS (BFP MODE ONLY)
End of pass: This input should be held low for the last cycle of each pass and for the lay time between passes. It instructs the control logic to update the value of the global weighting register and prepare the BFP circuitry for the next pass. When operating in normal mode this input should be tied low.

AR15:13 (BFP MODE ONLY)
Three MSBs of the real part of the A-word: These are used in the FFT butterfly application to determine the magnitude of the real part of the A-word and, hence, to determine if there will be any chance of word growth in the PDSP16318 Complex Accumulator. When operating in normal mode, these inputs are not used and may be tied low.

AI15:13 (BFP MODE ONLY)
Three MSBs of the imaginary part of the A-word: used in the same fashion as AR.

SFTR2:0 (BFP MODE ONLY)
Accumulator result shift control. These pins should be linked directly to the S2:0 pins on the PDSP16318 Complex Accumulator. They control the accumulator's barrel shifter (see Table 5). The purpose of this shift is to minimise sign extension in the multiplier or accumulator ALUs. When operating in normal mode, these outputs are superfluous.

SFTR2:0    FUNCTION
000        Reserved
001        Reserved
010        Reserved
011        Shift right by one
100        No shift
101        Shift left by one
110        Shift left by two
111        Reserved

Table 5 - Accumulator Shifts (BFP mode)

WTOUT1:0 (BFP MODE ONLY)
Word tag output. This tag records the weighting of the output words from the current cycle relative to the current global weighting register (see Table 6). It should be stored along with the A' and B' words as it will form the input word tags, WTA and WTB, for each complex word during the next pass. These outputs are superfluous in normal mode.

WTOUT1:0    Weighting of the output relative to the current global weighting register
00          One less
01          The same
10          One more
11          Two more

Table 6 - Word Tag Weightings

WTA1:0 (BFP MODE ONLY)
Word tag from the A-word. This word records the weighting of the A-word relative to the global weighting register on the previous pass. Although the A-word itself is not processed in the PDSP16116, this information is required by the control logic for the radix-2 butterfly FFT application. These inputs should be tied low in normal mode.

WTB1:0 (BFP & NORMAL MODES)
In BFP mode, this is the word tag from the B-word. This is operated in the same manner as WTA but for the B-word. The values of the word tags are used to ensure that the binary weighting of the A-word and the product of the complex multiplier are the same at the inputs to the complex accumulator. Depending on which word is the larger, the weighting adjustment is performed using either the internal shifter or an external shifter controlled by SFTA. The word tags are also used to maintain the weighting of the final result to within plus two and minus one binary points relative to the new GWR. (On the first pass all word tags will be ignored.)

In normal mode, these inputs perform a different function. They directly control the internal shifter at the output port as shown in Table 7.

WTB1:0    FUNCTION
11        Shift complex product one place to the left
00        No shift applied
01        Shift complex product one place to the right
10        Shift complex product two places to the right

Table 7 - Normal Mode Shift Control

SFTA1:0 (BFP & NORMAL MODES)
In BFP mode, these signals act as the A-word shift control. They allow shifting from one to four places to the right, see Table 8. Depending on the relative weightings of the A-words and the complex product, the A-word may have to be shifted to the right to ensure compatible weightings at the inputs to the PDSP16318 complex accumulator. (The two words must have the same weighting if they are to be added.)

In normal mode, SFTA0 performs a different function. If WTB1:0 is set to implement a left shift, then overflow will occur if the data is fully 32 bits wide. This pin is used to flag such an overflow. SFTA1 is not used in normal mode.

SFTA1:0    FUNCTION
00         Shift A-word 1 place to the right
01         Shift A-word 2 places to the right
10         Shift A-word 3 places to the right
11         Shift A-word 4 places to the right

Table 8 - External A-word shift control

OSEL1:0
The outputs from the device are selected by the OSEL0 & OSEL1 instruction bits. These controls allow selection of the output combination during the current cycle. (They are not registered.) There are four possible output configurations that allow either complex outputs of the most or least significant 16 bit words, or real or imaginary outputs of the full 32 bit word (see Table 4). OSEL0 and OSEL1 should both be tied low when in BFP mode.

BFP MODE FFT APPLICATION
The PDSP16116 may be used as the main arithmetic unit of the butterfly processor, which will allow the following FFT benchmarks:

1024 point complex radix-2 transform in 517µs
512 point complex radix-2 transform in 235µs
256 point complex radix-2 transform in 106µs

In addition, with pin MBFP tied high, the BFP circuitry within the PDSP16116 can be used to adaptively rescale data throughout the course of the FFT so as to give high-resolution results. The BFP system on the PDSP16116 can be used with any variation of the Radix-2 Decimation-In-Time FFT, e.g. the Constant Geometry algorithm, the In-Place algorithm etc.

An N-point Radix-2 DIT FFT is split into log2(N) passes.
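The benchmark figures quoted above can be cross-checked with a rough cycle count. The sketch below is illustrative only and rests on assumptions drawn from elsewhere in this data sheet: a 10MHz clock, one butterfly started per cycle, N/2 butterflies per pass, and the minimum of five lay cycles between passes. The function names are hypothetical and the timing model itself is an assumption, not a published formula.

```python
def fft_cycles(n_points: int, lay_cycles: int = 5) -> int:
    """Estimated clock cycles for an N-point radix-2 FFT.

    Assumed model: log2(N) passes, each taking N/2 butterfly cycles
    plus a fixed number of lay cycles to clear the pipeline.
    """
    if n_points < 2 or n_points & (n_points - 1):
        raise ValueError("n_points must be a power of two")
    passes = n_points.bit_length() - 1  # log2(N) for a power of two
    return passes * (n_points // 2 + lay_cycles)


def fft_time_us(n_points: int, clock_mhz: float = 10.0, lay_cycles: int = 5) -> float:
    """Estimated transform time in microseconds at the given clock rate."""
    return fft_cycles(n_points, lay_cycles) / clock_mhz


if __name__ == "__main__":
    for n in (1024, 512, 256):
        print(n, fft_cycles(n), fft_time_us(n))
```

Under these assumptions the model gives 517.0µs, 234.9µs and 106.4µs for 1024, 512 and 256 points, in line with the quoted figures.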
Each pass consists of N/2 'butterflies', each performing the operation:

A' = A + B.W
B' = A - B.W

where W is the complex coefficient and A & B are the complex data. Fig.5 illustrates how a single PDSP16116 may be combined with two PDSP1601s and two PDSP16318s to form a complete BFP butterfly processor. The PDSP16318s are used to perform the complex addition and subtraction of the butterfly operation, while the PDSP1601s are used to match the data path of the A-word to the pipelining and shifting operations within the PDSP16116. For more information on the theory and construction of this butterfly processor, refer to application note AN59.

BFP MODE OPERATION
The BFP mode on the PDSP16116 is intended for use in the FFT application described above, i.e. it is intended to prevent data degradation during the course of an FFT calculation. The operation of the PDSP16116 based BFP butterfly processor (see Fig.5) is described below.

The Block Floating Point System
A block floating point system is essentially an ordinary integer arithmetic system with some clever logic bolted on. The object of the extra logic is to lend the system some of the enormous dynamic range afforded by a true floating point system without suffering the corresponding loss in performance. The initial data used by the FFT should all have the same binary arithmetic weighting, i.e. the binary point should occupy the same position in every data word, as is normal in integer arithmetic. However, during the course of the FFT, a variety of weightings are used in the data words to increase the dynamic range available. This situation is similar to that within a true floating point system, though the range of numbers representable is more limited.

In the BFP system used in the PDSP16116, there are, within any one pass of the FFT, four possible positions of the binary point within the integer words. To record the position of its binary point, each word has a 2 bit word tag associated with it. By way of example, in a particular pass we may have the following four positions of binary point available, each denoted by a certain value of word tag:

XX.XXXXXXXXXXXX     word tag = 00
XXX.XXXXXXXXXXX     word tag = 01
XXXX.XXXXXXXXXX     word tag = 10
XXXXX.XXXXXXXXX     word tag = 11

[Figure 5 - FFT Butterfly Processor: one PDSP16116/A, two PDSP1601/As and two PDSP16318/As; inputs AR, AI, BR, BI, WR, WI, WTA, WTB, SOBFP, EOPSS; the PDSP16116/A takes B on XR, XI and W on YR, YI, receives AR15:13 and AI15:13, and drives SFTA to the PDSP1601/As and SFTR to the PDSP16318/As; the PDSP1601/As produce the delayed A-word DAR, DAI; outputs A'R, A'I, B'R, B'I, WTOUT, GWR]

At the end of each constituent pass of the FFT, the positions of the binary point supported may change to reflect the trend of data increases or decreases in magnitude. Hence, in the pass following that of the above example, the four positions of binary point supported may be changed to:

XX.XXXXXXXXXXXX     word tag = 00
XXX.XXXXXXXXXXX     word tag = 01
XXXX.XXXXXXXXXX     word tag = 10
XXXXX.XXXXXXXXX     word tag = 11

This variation in the range of binary points supported from pass to pass (i.e. the movement of the binary point relative to its position in the original data) is recorded in the GWR. Thus we can determine the position of the binary point relative to its initial position by modifying the value of GWR by WTOUT for a given word as shown in Table 6. As an example, if GWR = 01001 and WTOUT = 10 then the binary point has moved 10 places to the right of its original position.

The butterfly operation
The butterfly operation is the arithmetic operation which is repeated many times to produce an FFT. The PDSP16116A based butterfly processor performs this operation in a low power, high accuracy chip set.

[Figure 6 - Butterfly Operation: inputs A, B and coefficient W; outputs A' = A + B.W and B' = A - B.W]

A new butterfly operation is commenced each cycle, requiring a new set of data for A, B, W, WTA and WTB. Five cycles later, the corresponding results A' and B' are produced along with their associated WTOUT. In between, the signals SFTA and SFTR are produced and acted upon by the shifters in the PDSP1601/A and PDSP16318/A. The timing of the data and control signals is shown in Fig.7.

The results (A' and B') of each butterfly calculation in a pass must be stored away to be used later as the input data (A and B) in the next pass. Each result must be stored together with its associated word tag, WTOUT. Although WTOUT is common to both A' and B', it must be stored separately with each word as the words are used on different cycles during the next pass. At the inputs, the word tag associated with the A-word is known as WTA and the word tag associated with the B-word is known as WTB. Hence, the WTOUTs from one pass will become the WTAs and WTBs for the following pass. It should be noted that the first pass is unique in that word tags need not be input into the butterfly as all data initially has the same weighting. Hence, during the first pass alone, the inputs WTA and WTB are ignored.

[Figure 7 - Butterfly Data and Control Signals: timing of CLK against the inputs Br, Bi, Wr, Wi, WTA, WTB, Ar, Ai and the outputs SFTA, SFTR, Pr, Pi, DAr, DAi, WTOUT, A'r, A'i, B'r, B'i. While the inputs for butterfly n are presented, SFTA and Pr, Pi are shown for butterfly n-2, SFTR and DAr, DAi for butterfly n-3, and WTOUT, A' and B' for butterfly n-5]

Control of the FFT
To enable the block floating point hardware to keep track of the data, the following signals are provided:

SOBFP - start of the FFT
EOPSS - end of current pass

These inform the PDSP16116/A when an FFT is starting and when each pass is complete. Fig.8 shows how these signals should be used and a commentary is provided below.

To commence the FFT, the signal EOPSS should be set high (where it will remain for the duration of the pass). SOBFP should be pulled low during the initial cycle when the first data words A and B are presented to the inputs of the butterfly processor. On the following cycle SOBFP must be pulled high, where it should remain for the duration of the FFT. New data is presented to the processor each successive cycle until the end of the first pass of the FFT. On the last cycle of the pass, the signal EOPSS should be pulled low and remain low for a minimum of five cycles *, the time required to clear the pipeline of the butterfly processor so that all the results from one pass are obtained before commencing the following pass. On the initial cycle of each new pass, the signal EOPSS should be pulled high and it should remain high until the final cycle of that pass, when it is pulled low again.

* Should a longer pause be required between passes, to arrange the data for the next pass, for example, then EOPSS may be kept low as long as necessary; the next pass cannot commence until it is brought high again.

[Figure 8 - Use of the BFP Control Signals: CLK, SOBFP, EOPSS, the inputs A, B, W, WTA, WTB, the outputs A', B', WTOUT, and GWR, shown across the start of the first pass and the end of the first pass / start of the next pass (minimum number of lay cycles shown); the period between other intermediate passes is similar. 1 = first cycle of data in pass, n = last cycle of data in pass]

FFT Output Normalisation
When an FFT system outputs a series of FFT results for display, storage or transmission, it is essential that all results are compatible, i.e. with the binary point in the same position. However, in order to preserve the dynamic range of the data in the FFT calculation, the PDSP16116/A employs a range of different weightings. Therefore, data must be re-formatted at the end of the FFT to a pre-determined common weighting. This can be done by comparing the exponent of a given data word with the pre-determined Universal Exponent and then shifting the data word by the difference.
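The exponent of a data word referred to here is formed from the 5 bit GWR (two's complement) and the 2 bit WTOUT, decoded according to Table 6. A minimal sketch of that decoding follows; the function and constant names are hypothetical, and the code models only the arithmetic, not the hardware.

```python
# Table 6: weighting of a word relative to the global weighting register.
WTOUT_OFFSET = {0b00: -1, 0b01: 0, 0b10: 1, 0b11: 2}


def gwr_to_int(gwr_bits: int) -> int:
    """Interpret the 5 bit GWR4:0 field as a two's complement integer."""
    gwr_bits &= 0x1F
    return gwr_bits - 32 if gwr_bits & 0x10 else gwr_bits


def word_exponent(gwr_bits: int, wtout: int) -> int:
    """Places the binary point has moved to the right of its original position."""
    return gwr_to_int(gwr_bits) + WTOUT_OFFSET[wtout & 0b11]


if __name__ == "__main__":
    # Example from the text: GWR = 01001, WTOUT = 10
    print(word_exponent(0b01001, 0b10))
```

With GWR = 01001 and WTOUT = 10 this evaluates to 10, matching the example given for the global weighting register.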
The PDSP1601/A, with its multifunction 16 bit barrel shifter, is ideally suited to this task.

What value should the Universal Exponent take? According to theory, the largest possible data result from an FFT is N times the largest input data. This means that the binary point can move a maximum of log2(N) places to the right. Hence, if the Universal Exponent is chosen to be log2(N), this should give sufficient range to represent all data points faithfully. In practice, data output may never approach the theoretical maximum. Hence, it may be worthwhile to try various Universal Exponents and choose the one best suited to the particular application.

Data is output from the butterfly processor with a two-part exponent: the 5-bit GWR applicable to all data words from a given FFT and a 2-bit WTOUT associated with each individual data word. To find the complete exponent for a given word, the GWR for that FFT must be modified by its WTOUT as shown in Table 6. The result is the number of places the binary point has shifted to the right during the course of the FFT. This value must be compared with the Universal Exponent to determine the shift required. This is done by subtracting it from the Universal Exponent. The number of places to be shifted is equal to the difference between the two exponents.

The shift can be implemented in a PDSP1601/A. The shift value is fed into the SV port. As FFT data consists of real and imaginary parts, either two PDSP1601/As must be used (controlled by the same logic) or a single PDSP1601/A could be used, handling real and imaginary data on alternate cycles (using the same instructions for both cycles).

N.B. It is easier to simply add the word tag to the exponent for the purpose of determining the shift required, instead of modifying it according to Table 6. To compensate for this, the Universal Exponent may be increased by one.

An example of an output normalisation circuit is shown in Fig.9.
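The shift calculation described above, using the N.B. shortcut, might be sketched as follows. This is a minimal illustration under stated assumptions and not the data sheet's circuit: the function names are hypothetical, GWR is assumed to be a 5-bit two's complement field, the Universal Exponent is assumed to be log2(N), and the shift is assumed to saturate at 15 places. The Table 6 mapping itself is not reproduced here.

```python
import math


def signed5(raw: int) -> int:
    """Interpret a 5-bit field as two's complement (assumed GWR format)."""
    raw &= 0x1F
    return raw - 32 if raw & 0x10 else raw


def shift_value(gwr_raw: int, wtout: int, n_points: int) -> int:
    """Return the right-shift count for one output word.

    Follows the N.B. shortcut: the word tag is simply added to GWR and
    the Universal Exponent, taken here as log2(N), is increased by one.
    """
    if not 0 <= wtout <= 3:
        raise ValueError("WTOUT is a 2-bit field")
    universal = int(math.log2(n_points)) + 1   # log2(N), plus one to compensate
    exponent = signed5(gwr_raw) + wtout        # shortcut form of the exponent
    shift = universal - exponent
    if shift < 0:
        # Assumption: data weighted above the Universal Exponent is not handled.
        raise ValueError("exponent exceeds the Universal Exponent")
    return min(shift, 15)                      # a 16-bit word shifts at most 15 places


# 1024-point FFT: log2(1024) = 10, raised to 11 by the shortcut.
sv_typical = shift_value(gwr_raw=6, wtout=1, n_points=1024)          # 11 - (6 + 1)
sv_saturated = shift_value(gwr_raw=0b10000, wtout=0, n_points=1024)  # GWR = -16
```

In a system built around the PDSP1601/A, the returned count would play the role of the value presented to the SV port for both the real and the imaginary word.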
In the circuit of Fig.9, only 4 bit data paths are used in calculating the shift. This means that very small (negative) values of GWR must be trapped and a 15-bit right shift forced in such cases.

Fig.9 Output Normalisation Circuitry (block diagram; labels: WTOUT, GWR, 16-bit data, sign bit, 4-bit adder, Universal Exponent, 4-bit subtractor, constant 1111, 4-bit mux, PDSP1601 with SV port, B port, C port and IS = ASRSV, normalised output data)

ABSOLUTE MAXIMUM RATINGS (Note 1)

| Parameter | Rating |
|---|---|
| Supply voltage VCC | -0.5V to 7.0V |
| Input voltage VIN | -0.5V to VCC +0.5V |
| Output voltage VOUT | -0.5V to VCC +0.5V |
| Clamp diode current per pin IK (see Note 2) | 18mA |
| Static discharge voltage (HBM) | 500V |
| Storage temperature range TS | -65°C to +150°C |
| Ambient temperature with power applied TAMB, Military | -55°C to +125°C |
| Ambient temperature with power applied TAMB, Industrial | -40°C to +85°C |
| Junction temperature | 150°C |
| Package power dissipation | 1000mW |
| Thermal resistance, junction to case θJC | 12°C/W |
| Thermal resistance, junction to ambient θJA | 29°C/W |

NOTES
1. Exceeding these ratings may cause permanent damage. Functional operation under these conditions is not implied.
2. Maximum dissipation for 1 second should not be exceeded; only one output to be tested at any one time.
3. Exposure to absolute maximum ratings for extended periods may affect device reliability.

ELECTRICAL CHARACTERISTICS

Operating conditions (unless otherwise stated):
Industrial: TAMB = -40°C to +85°C, VCC = 5.0V ± 10%, GND = 0V
Military: TAMB = -55°C to +125°C, VCC = 5.0V ± 10%, GND = 0V

Static Characteristics

| Characteristic | Symbol | Min. | Typ. | Max. | Units | Conditions |
|---|---|---|---|---|---|---|
| Output high voltage | VOH | 2.4 | - | - | V | IOH = 8mA |
| Output low voltage | VOL | - | - | 0.4 | V | IOL = -8mA |
| Input high voltage | VIH | 3.0 | - | - | V | CLK input only |
| Input high voltage | VIH | 2.2 | - | - | V | All other inputs |
| Input low voltage | VIL | - | - | 0.8 | V | |
| Input leakage current | IIN | -10 | - | +10 | µA | GND < VIN < VCC |
| Input capacitance | CIN | - | 10 | - | pF | |
| Output leakage current | IOZ | -50 | - | +50 | µA | GND < VIN < VCC |
| Output S/C current | IOS | 10 | - | 300 | mA | VCC = Max |

Switching Characteristics

| Characteristic | PDSP16116 Min. | PDSP16116 Max. | PDSP16116A Min. | PDSP16116A Max. | Units | Conditions |
|---|---|---|---|---|---|---|
| CLK rising edge to P-PORTS | 5 | 45 | 5 | 23 | ns | 2 x LSTTL + 20pF |
| CLK rising edge to WTOUT1:0 | 5 | 30 | 5 | 20 | ns | 2 x LSTTL + 20pF |
| CLK rising edge to GWR4:0 | 5 | 30 | 5 | 20 | ns | 2 x LSTTL + 20pF |
| CLK rising edge to SFTA1:0 | 5 | 60 | 5 | 30 | ns | 2 x LSTTL + 20pF |
| CLK rising edge to SFTR2:0 | 5 | 50 | 5 | 28 | ns | 2 x LSTTL + 20pF |
| Setup CEX or CEY to CLK rising edge | 11 | - | 8 | - | ns | |
| Hold CEX or CEY to CLK rising edge | - | 0 | - | 0 | ns | |
| Setup X or Y port inputs to CLK rising edge | 11 | - | 8 | - | ns | |
| Hold X or Y port inputs to CLK rising edge | - | 2 | - | 0 | ns | |
| Setup WTA1:0, WTB1:0, SOBFP or EOPSS inputs to CLK rising edge | 14 | - | 8 | - | ns | |
| Hold WTA1:0, WTB1:0, SOBFP or EOPSS inputs to CLK rising edge | - | 0 | - | 0 | ns | |
| Setup CONX or CONY inputs to CLK rising edge | 14 | - | 8 | - | ns | |
| Hold CONX or CONY inputs to CLK rising edge | - | 0 | - | 0 | ns | |
| Setup AR15:13 or AI15:13 to CLK rising edge | 14 | - | | - | ns | |
| Hold AR15:13 or AI15:13 to CLK rising edge | - | 0 | - | 0 | ns | |
| OPSEL to valid P-PORTS | - | 35 | - | 20 | ns | 2 x LSTTL + 20pF |
| OER or OEI rising PR-PORT or PI-PORT high to Z | - | 35 | - | 25 | ns | see Fig.10 |
| OER or OEI rising PR-PORT or PI-PORT low to Z | - | 45 | - | 25 | ns | see Fig.10 |
| OER or OEI falling PR-PORT or PI-PORT Z to high | - | 22 | - | 18 | ns | see Fig.10 |
| OER or OEI falling PR-PORT or PI-PORT Z to low | - | 24 | - | 18 | ns | see Fig.10 |
| Clock period | 100 | - | 50 | - | ns | |
| Clock high time | 30 | - | 12 | - | ns | |
| Clock low time | 20 | - | 12 | - | ns | |
| Vcc Current (CMOS input levels) | - | 60 | - | 80 | mA | see Note 4 |
| Vcc Current (TTL input levels) | - | 100 | - | 130 | mA | see Note 4 |

NOTE 4: VCC = Max, outputs unloaded, clock freq = Max

Fig.10 Three state delay measurement load (load circuit: DUT, 1.5kΩ, 30pF, VT with VT = 0V or VT = VCC; tests: delay from output high to output high impedance, delay from output low to output high impedance, delay from output high impedance to output high, delay from output high impedance to output low; waveform measurement levels of 0.5V and 1.5V)
VH - Voltage reached when output driven high
VL - Voltage reached when output driven low

For more information about all Zarlink products visit our Web Site at www.zarlink.com

Information relating to products and services furnished herein by Zarlink Semiconductor Inc. or its subsidiaries (collectively “Zarlink”) is believed to be reliable. However, Zarlink assumes no liability for errors that may appear in this publication, or for liability otherwise arising from the application or use of any such information, product or service or for any infringement of patents or other intellectual property rights owned by third parties which may result from such application or use. Neither the supply of such information or purchase of product or service conveys any license, either express or implied, under patents or other intellectual property rights owned by Zarlink or licensed from third parties by Zarlink, whatsoever. Purchasers of products are also hereby notified that the use of product in certain ways or in combination with Zarlink, or non-Zarlink furnished goods or services may infringe patents or other intellectual property rights owned by Zarlink.
This publication is issued to provide information only and (unless agreed by Zarlink in writing) may not be used, applied or reproduced for any purpose nor form part of any order or contract nor to be regarded as a representation relating to the products or services concerned. The products, their specifications, services and other information appearing in this publication are subject to change by Zarlink without notice. No warranty or guarantee express or implied is made regarding the capability, performance or suitability of any product or service. Information concerning possible methods of use is provided as a guide only and does not constitute any guarantee that such methods of use will be satisfactory in a specific piece of equipment. It is the user’s responsibility to fully determine the performance and suitability of any equipment using such information and to ensure that any publication or data used is up to date and has not been superseded. Manufacturing does not necessarily include testing of all functions or parameters. These products are not suitable for use in any medical products whose failure to perform may result in significant injury or death to the user. All products and materials are sold and services provided subject to Zarlink’s conditions of sale which are available on request.

Purchase of Zarlink’s I2C components conveys a licence under the Philips I2C Patent rights to use these components in an I2C System, provided that the system conforms to the I2C Standard Specification as defined by Philips.

Zarlink, ZL and the Zarlink Semiconductor logo are trademarks of Zarlink Semiconductor Inc. Copyright Zarlink Semiconductor Inc. All Rights Reserved.

TECHNICAL DOCUMENTATION - NOT FOR RESALE