CoreCORDIC CORDIC RTL Generator Product Summary • – Intended Use • COordinate Rotation DIgital Computer (CORDIC) Rotator Function for Actel FPGAs Vector Rotation – Conversion of Polar Coordinates to Rectangular Coordinates • Vector Translation – Conversion of Rectangular Coordinates to Polar Coordinates • Sine and Cosine Calculation 2 2 • Vector (X, Y) Magnitude (arctan[X/Y]) Calculation • 8-Bit to 48-Bit Configurable Word Size • 8 to 48 Configurable Number of Iterations • Parallel Pipelined Architecture for the Fastest Calculation • Bit-Serial Architecture for the Smallest Area • Word-Serial Architecture for Moderate Speed and Area • Word Parallel Data I/Os X +Y Fusion • ProASIC®3/E • ProASICPLUS ® • Axcelerator® • RTAX-S • SX-A • RTSX-S Synthesis: Synplicity®, Synopsys® Compiler/FPGA Compiler), Exemplar™ • Simulation: OVI-Compliant Verilog Simulators and Vital-Compliant VHDL Simulators. (Design General Description CoreCORDIC is an RTL generator that produces an Actel FPGA–optimized CORDIC engine. The CORDIC algorithm by J. Volder provides an iterative method of performing vector rotations using shifts and adds only. The articles listed in "References" on page 12 present a detailed description of the algorithm. CoreCORDIC RTL Generator. Generates UserDefined CORDIC Model and Test Harness. Fully Supported in the Actel Libero® Integrated Design Environment (IDE) March 2006 © 2006 Actel Corporation • General Description ................................................... 1 CoreCORDIC Device Requirements ........................... 4 Architectures .............................................................. 5 I/O Formats ................................................................. 7 CoreCORDIC Configuration Parameters ................... 9 I/O Signal Description ................................................ 9 I/O Interface and Timing .......................................... 11 References ................................................................ 12 A Sample Configuration File ................................... 13 Ordering Information .............................................. 13 Datasheet Categories ............................................... 13 Appendix I ................................................................ 14 Appendix II ............................................................... 15 Full Version – Libero IDE and Phase Core Deliverables • • Table of Contents Supported Families • Supports CORDIC Engine and Test Harness Generation with Limited Parameters. Fully Supported in Libero IDE. Synthesis and Simulation Support Key Features • Evaluation Version Depending on the configuration defined by the user, the resulting module implements pipelined parallel, wordserial, or bit-serial architecture in one of two major modes: rotation or vectoring. In rotation mode, the CORDIC rotates a vector by a specified angle. This mode is used to convert polar coordinates to Cartesian v 2 .0 1 CoreCORDIC CORDIC RTL Generator coordinates, for general vector rotation, and also to calculate sine and cosine functions (see Figure 1). "Appendix I" on page 14 presents mathematical coordinate conversion formulae, and "Appendix II" on page 15 describes examples of a few of the most used CORDIC modes. Magnitude r Phase θ x CORDIC Engine y The gain can be compensated for elsewhere in many applications when the system includes the CORDIC engine. To assist a user in doing so, the CoreCORDIC software computes the precise value of the gain and displays it on a screen. In the cases when only relative magnitude is of importance—for example, spectrum analysis and AM demodulation—the constant gain can be neglected. When calculating sine/cosine, the CORDIC gets initialized with a constant reciprocal value of the processing gain r = 1/K. EQ 1 and EQ 2 become X = cos θ Figure 1 • CORDIC Engine in Rotation Mode Y = sin θ In vectoring mode, the CORDIC rotates the input vector towards the x axis while accumulating a rotation angle. Vectoring mode is used to convert Cartesian vector coordinates to polar coordinates; i.e., to calculate the magnitude and phase of the input vector (Figure 2). Thus, the gain does not impact the sine/cosine results or the phase output. To perform the conversions, the CORDIC processor implements the iterative CORDIC equations EQ 5 through EQ 7. xi + 1 = xi – yi × di × 2 x CORDIC Engine y –i EQ 5 Magnitude r Phase yi + 1 = yi + xi × di × 2 θ –i EQ 6 –i a i + 1 = a i – d i × arctan ( 2 ) Figure 2 • CORDIC Engine in Vectoring Mode The CORDIC results, such as x, y, and r, are scaled by the inherent processing gain, K, which depends on number of iterations and converges to about 1.647 after a few iterations. The gain is constant for a given number of iterations. When performing Cartesian/polar coordinate conversion, the CORDIC computes the results shown in EQ 1 and EQ 2 in rotation mode. EQ 1 In rotation mode di = –1 if ai < 0, otherwise di = 1 EQ 8 In vectoring mode di = 1 if yi < 0, otherwise di = –1 EQ 9 Y = K ⋅ r ⋅ sin θ EQ 2 EQ 3 and EQ 4 show the CORDIC results in vectoring mode. 2 • • X = K ⋅ r ⋅ cos xθ r = K⋅ X +Y EQ 7 The sign-controlling function di takes the values shown in EQ 8 and EQ 9: 2 EQ 3 θ = arctan ( Y ⁄ X ) The input and output data is represented as n-bit words, where n is a user-defined number in the range from 8 to 48. The number of iterations is also defined by a user in the same range. The CORDIC result accuracy improves when the number of iterations is increased, as long as the number of iterations does not exceed data bit width. In other words, the bit width limits the number of meaningful iterations. EQ 4 A system that utilizes the CORDIC engine (Figure 3 on page 3) consists of the following: 2 • A data source generating the vector data to be converted by the CORDIC • The CORDIC module configured to work in either rotation or vectoring mode • A data receiver accepting the newly converted vector data v2.0 CoreCORDIC CORDIC RTL Generator Master Clock clkEn x0 Data Source xn CORDIC Engine y0 a0 ldData yn an OR Data Receiver rdyOut rst Global Reset nGrst Figure 3 • CORDIC-Based System The negative nGrst signal resets the CORDIC engine and, optionally, the entire system. After the reset (input nGrst taken high), the CORDIC module is ready to receive data samples to be processed. The module synchronous reset input rst can be used to bring the CORDIC unit to the ready state at any time after the initial global reset. Note: The CORDIC module will lose half-processed data when rst is taken high by the system. The data source supplies the CORDIC engine with the data to be converted. Depending on the mode (rotation or vectoring), the system uses different CORDIC inputs and outputs to enter and obtain the data. Table 1 shows the input/ output signals used in each mode. Table 1 • CORDIC Connection to the System Input Data CORDIC Input Output Data CORDIC Output Common Rotation Modes Input vector magnitude x0 Output vector coordinate X xn Constant 0 y0 Output vector coordinate Y yn Input vector phase a0 N/A an Rotation Mode: Sine/Cosine Table Generator Constant reciprocal value of the processing gain r = 1/K x0 sin(θ) xn Constant 0 y0 cos(θ) yn Sine/cosine argument θ a0 N/A an Vectoring Mode Input vector coordinate X x0 Output vector magnitude r xn Input vector coordinate Y y0 N/A yn Constant 0 a0 Output vector phase θ an The system accompanies every new pair of the input data samples with the one-bit ldData signal. Upon receiving the ldData bit, the module assumes the vector coordinates are present on input data busses. Once the CORDIC results are ready, the engine puts these out, accompanied by the one-bit rdyOut signal. Upon receiving the rdyOut bit, the system can supply a new pair of input data and generate another ldData signal. CoreCORDIC can generate three different CORDIC core implementation architectures and an appropriate testbench: • Parallel pipelined • Word-serial • Bit-serial The parallel pipelined architecture provides the fastest speed, whereas the bit-serial architecture provides the smallest area. The word-serial architecture provides the trade-off of moderate speed and area. v2.0 3 CoreCORDIC CORDIC RTL Generator CoreCORDIC Device Requirements Table 2 provides typical utilization and performance data for CoreCORDIC, implemented in various Actel devices with the CORDIC engine bit resolution set to 24 bits and the number of iterations set to 24. Device utilization and performance will vary depending upon the architecture chosen and the configuration parameters used. Time-driven settings were used when synthesizing parallel architectures; area optimization settings were used in other cases. The CORDIC core does not utilize on-chip RAM blocks. Table 2 • CoreCORDIC Device Utilization and Performance Cells or Tiles Device Engine Architecture Mode Comb Fusion AFS600 AFS600 AFS600 A3P250 A3P1000 Total Transform Time, nsec Speed Grade –2 Bit-serial Word- serial Parallel Rotate 297 110 407 3% 88 6,568 Vector 293 108 401 3% 87 6644 Rotate 668 103 771 6% 30 833 Vector 660 101 761 6% 27 926 Rotate 11,810 1,884 13,694 99% 46 21.7 ProASIC3/E A3P250 Seq Utilization Clock Rate, % MHz Speed Grade –2 Bit-serial Word-serial Parallel Rotate 297 110 407 7% 83 6,964 Vector 296 108 404 7% 93 6,215 Rotate 664 103 767 12% 30 833 Vector 658 101 759 12% 26 962 Rotate 12,541 1,906 14,447 59% 46 21.7 Vector 14,832 1,981 16,813 68% 62 16.1 393 8% 61 9,475 ProASICPLUS Speed Grade STD APA150 Bit-serial Rotate Vector 394 107 501 8% 63 9,175 APA150 Word-serial Rotate 824 114 938 15% 20 1,250 Vector 822 114 936 15% 19 1,316 APA1000 Parallel Rotate 14,301 1,889 16,190 29% 32 31.3 Vector 16,594 1,936 18,530 33% 37 27.0 Axcelerator AX125 AX125 AX500 108 501 Speed Grade –2 Bit-serial Word-serial Parallel Rotate 196 106 302 15% 113 5,115 Vector 185 105 290 14% 115 5,026 Rotate 413 124 537 27% 103 243 Vector 405 133 538 27% 109 229 Rotate 4,633 1,832 6,465 80% 130 7.7 Vector 4,617 1,835 6,452 80% 124 8.1 196 302 8% 92 6,283 RTAX-S Speed Grade –1 RTAX250S Bit-serial Rotate 106 Vector 185 105 290 7% 100 5,780 RTAX250S Word-serial Rotate 413 124 537 14% 74 338 Vector 405 133 538 14% 75 333 RTAX1000S Parallel Rotate 4,633 1,832 6,465 36% 89 11.2 Vector 4,617 1,835 6,452 36% 81 12.3 Note: The above data were obtained by typical synthesis and place-and-route methods. Other core parameter settings can result in different utilization and performance values. 4 v2.0 CoreCORDIC CORDIC RTL Generator Table 2 • CoreCORDIC Device Utilization and Performance (Continued) Cells or Tiles Engine Architecture Mode Comb 54SX72A Bit-serial Rotate 190 Vector 195 105 300 54SX72A Word-serial Rotate 656 132 788 Vector 643 124 767 13% Device 54SX-A Seq Total Utilization Clock Rate, % MHz Transform Time, nsec Speed Grade –2 105 RT54SX-S 295 5% 67 8,627 5% 71 8,141 13% 55 455 50 500 Speed Grade –1 RT54SX72S RT54SX72S Bit-serial Word-serial Rotate 189 104 293 5% 55 10,509 Vector 190 104 294 5% 55 10,509 Rotate 677 132 809 13% 33 758 Vector 664 125 789 13% 34 735 Note: The above data were obtained by typical synthesis and place-and-route methods. Other core parameter settings can result in different utilization and performance values. Architectures Word-Serial Architecture Direct implementation of the CORDIC iterative equations (see "References" on page 12) yields the block diagram shown in Figure 4. The vector coordinates to be converted, or initial values, are loaded via multiplexers into registers RegX, RegY, and RegA. RegA, along with an adjacent adder/subtractor, multiplexer, and a small arctan LUT, is often called an angle accumulator. Then on each of the following clock cycles, the registered values are passed through adders/subtractors and shifters. The results described by EQ 5 through EQ 7 on page 2 are loaded back to the same registers. Every iteration takes one clock cycle, so that in n clock cycles, n iterations are performed and the converted coordinates are stored in the registers. >> i x0 y0 >> i di +/– arctan LUT di –/+ RegX +/– RegY Mode: Rotation/Vectoring RegA Sign ai Sign yi xn a0 yn Sign Controlling Logic an di Figure 4 • Word-Serial CORDIC Block Diagram v2.0 5 CoreCORDIC CORDIC RTL Generator Depending on the CORDIC mode (rotation or vectoring), the sign-controlling logic watches either the RegY or the RegA sign bit. Based on EQ 8 and EQ 9 on page 2, it decides what type of operation (addition or subtraction) needs to be performed at every iteration. The arctan LUT keeps a pre-computed table of the arctan(2-i) values. The number of entries in the arctan LUT equals the desirable number of iterations, n. The word-serial CORDIC engine takes n + 1 clock cycles to complete a single vector coordinate conversion. Parallel Pipelined Architecture This architecture presents an unrolled version of the sequential CORDIC algorithm above. Instead of reusing the same hardware for all iteration stages, the parallel architecture has a separate hardware processor for every CORDIC iteration. An example of the parallel CORDIC architecture configured for rotation mode is shown in Figure 5. Each of the n processors performs a specific iteration, and a particular processor always performs the same iteration. This leads to a simplification of the hardware. All the shifters perform the fixed shift, which means these can be implemented in the FPGA wiring. Every processor utilizes a particular arctan value that can also be hardwired to the input of every angle accumulator. Yet another simplification is an absence of a state machine. The parallel architecture is obviously faster than the sequential architecture described in the "Word-Serial Architecture" section on page 5. It accepts new input data and puts out the results at every clock cycle. The architecture introduces a latency of n clock cycles. x0 a0 y0 d0 >> 0 +/– >> 0 d0 –/+ +/– d0 Reg Reg Reg x1 y1 a1 d1 >> 1 +/– >> 1 d1 –/+ +/– d1 Reg Reg x2 y2 a2 d2 >> 2 +/– >> 2 d2 –/+ Reg Reg xn–1 yn–1 an–1 >> n-1 dn–1 –/+ Reg Reg xn yn dn–1 Figure 5 • Parallel CORDIC Architecture v2.0 d2 Reg dn–1 +/– d1 arctan (2–2) +/– d2 d0 arctan (2–1) Reg >> n–1 6 arctan (20) arctan (2n–1) +/– Reg an dn–1 CoreCORDIC CORDIC RTL Generator I/O Formats Bit-Serial Architecture Whenever the CORDIC conversion speed is not an issue, this architecture provides the smallest FPGA implementation. For example, in order to initialize a Sine/Cosine LUT, the bit-serial CORDIC is the solution. Figure 6 depicts the simplified block diagram of the bitserial architecture. The shift registers get loaded with initial data presented in bit-parallel form, i.e., all bits at once. The data then shifts to the right, before arriving the serial adders/subtractors. Every iteration takes m clock cycles, where m is the CORDIC bit resolution. Serial shifters are implemented by properly tapping the bits of the shift registers. The control circuitry (not shown in Figure 6) provides sign-padding of the shifted serial data to realize its correct sign extension. The results from the serial adders return back to the shift registers, so that in m clock cycles the results of another iteration are stored in the shift registers. Q Format Fixed-Point Numbers CoreCORDIC, as virtually any FPGA DSP core does, utilizes fixed-point arithmetic. In particular, the numbers the core operates with are presented as two’s complement signed fractional numbers. To identify the position of a binary point separating the integer and fractional portions of the number, the Q format is commonly used. An mQn format number is an (n+1)-bit signed two’s complement fixed-point number: a sign bit followed by n significant bits with the binary point placed immediately to the right of the m most significant bits. The m MSBs represent the integer part, and (n–m) LSBs represent the fractional part of the number, called the mantissa. Table 3 depicts an example of a 1Qn format number. A single full CORDIC conversion takes n×m+2 clock cycles. x0 Table 3 • 1Qn Format Number msb 3 2 1 0 Integer bit Sign +/– Position of the Binary Point Bits [2n–2 : 20] Mantissa 3 2 1 Shift Reg yn 0 Position of the Binary Point Bits [2n–1 : 20] Mantissa The following sections explain in detail the formats of the input and output signals. The linear and angular values are explained separately. The linear signals include Cartesian coordinates and a vector magnitude. These come to the CORDIC engine inputs x0 and y0, or appear on its outputs xn and yn. Since the sine and cosine functions the CORDIC calculates are essentially the Cartesian coordinates of the vector, the angular signals include the vector phase that comes to the CORDIC engine input a0, or appears on its output an. Both linear and angular signals utilize mQn formats and appropriate conversion rules from floating-point to the mQn formats. signY +/– lsb y0 msb Sign Bit 2n lsb signX n–2 Bit 2n–1 Table 4 • Qn Format Number Shift Reg xn n–2 Bit 2n a0 I/O Linear Format msb The CoreCORDIC engine utilizes the 1Qn format shown in Table 3. Though the 1Qn format numbers are capable of expressing fixed-point numbers in the range from (–2n) to (2n – 2m–n), the input linear data must be limited to fit the smaller range from (–2n–1) to (2n–1). In terms of floating-point numbers, the input must fit the range from –1.0 to +1.0. For example, the 1Q9 format input data range is limited by the following 10-bit numbers: lsb Shift Reg an +/– arctan Serial ROM Figure 6 • Bit-Serial CORDIC Architecture Max input negative number of –1.0: 1100000000 ⇔ 11.00000000 v2.0 7 CoreCORDIC CORDIC RTL Generator Max input positive number of +1.0: Here it is assumed the floating-point data are presented in the range from –1.0 to 1.0. The product on the right-hand side of EQ 10 contains integer and fractional parts. The fractional part has to be truncated or rounded. Table 5 shows a few examples of converting the floating-point numbers to the 1Q9 format. 0100000000 ⇔ 01.00000000 This precaution is taken to prevent the data overflow that otherwise could occur as a result of the CORDIC inherent processing gain. The output data obviously do not have to fit the limited range. To convert the 1Qn format back to the floating-point format, use EQ 11. To convert floating-point linear input data to the 1Qn format, follow the simple rule in EQ 10: n–1 1Qn Fixed-Point Data = 2 Floating-Point Data = 1Qn Fixed-Point Data/2n–1 × Floating-Point Data EQ 11 EQ 10 Table 5 • Floating-Point to 1Q9 Format Conversion Floating-Point Number X P = X × 2(n–1) P Rounded Common Binary Format 1Q9 Format 256 256 0100000000 01.00000000 0.678915 173.80224 174 0010101101 00.10101101 0.047216 12.087296 12 0000001100 00.00001100 –256 –256 1100000000 11.00000000 –0.678915 –173.80224 –174 1101010011 11.01010011 –0.047216 –12.087296 –12 1111110100 11.11110100 1.00 –1.00 I/O Angular Format The conversion formulae (EQ 12 and EQ 13) support an important feature that greatly simplifies sine and cosine table calculations. Such tables usually have power of two entries (lines). At the same time, they often span angular values from –π/2 to π/2 radians. Therefore, it is beneficial to represent the angle of π/2 radians with the power of two fixed-point number. In particular, when having the CORDIC engine calculate the sin(θ) and cos(θ) table, it is sufficient to increment the fixed-point angular argument θ at each cycle. The angle (phase) signals are a0 and an. They are presented in Qn format, as shown in Table 4 on page 7. The relation between the floating-point angular value expressed in radians and the Qn format is shown in EQ 12. 1 Qn Fixed-Point Angle = 2n-1 × Floating-Point Angle/π EQ 12 In EQ 12, the floating-point angle is measured in radians. The product on the right-hand side of EQ 10 contains integer and fractional parts. The fractional part must be truncated or rounded. The angular value range is from –π/2 to π/2, or in Q9 format: Max input negative number of –π/2: EQ 13 presents a rule for the conversion from the Qn format back to the floating-point radian measure. Floating-Point Angle = Qn Fixed-Point Angle × π/2 1100000000 ⇔ .1100000000 n–1 Max input positive number of +π/2: 0100000000 ⇔ .0100000000 EQ 13 Table 6 shows a few examples of converting floating-point numbers to Q9 format. Table 6 • Examples of Angular Value to Fixed-Point Conversion Floating-Point Angle A (rad) P = A × 2n Common Binary Format Q9 Format (sign.mantissa) π/2 1.5707963268 256 0100000000 0.100000000 π/4 0.7853981634 128 0010000000 0.010000000 π/256 0.0122718463 2 0000000010 0.000000010 1. This format means, literally, the angle of π radians is expressed as the floating-point value of 1.0. 8 v2.0 CoreCORDIC CORDIC RTL Generator Table 6 • Examples of Angular Value to Fixed-Point Conversion –π/2 –1.5707963268 –256 1100000000 1.100000000 –π/4 –0.7853981634 –128 1110000000 1.110000000 –π/256 –0.0122718463 –2 1111111110 1.111111110 CoreCORDIC Configuration Parameters CoreCORDIC generates the CORDIC engine RTL code based on parameters set by the user when generating the module. The core generator supports the variations specified in Table 7. Table 7 • Core Generator Parameters Parameter Name Description Values module_name Name of the generated RTL code module – architecture Bit-serial, word-serial, or word parallel architecture mode Vector rotation (polar to rectangular coordinate conversion and sine/ 0 (vector rotation), 1 (vector cosine calculation) or vector translation (rectangular to polar translation). Default value = 0. conversion) bit_width I/O data bit width 8–48. Default value = 16. iterations Number of iterations 8–48. Default value = bit_width.* fpga_family Family of the Actel FPGA device ax (Axcelerator), apa (ProASICPLUS), pa3 (ProASIC3), sx (SX-A), af (Fusion) lang RTL code language vhdl, verilog 0 (bit-serial), 1 (word-serial), (parallel). Default value = 0. 2 Note: *A warning is issued if the number of iterations is set greater than the bit width. I/O Signal Description Figure 7 shows the CoreCORDIC module pinout. CORDIC x0 xn y0 yn a0 an IdData rst rdyOut clkEn clk nGrst Figure 7 • CoreCORDIC I/O Signals v2.0 9 CoreCORDIC CORDIC RTL Generator The CoreCORDIC module I/O signal functionality is listed in Table 8. Table 8 • I/O Signal Descriptions Signal Name Direction Description x0 [bit_width – 1 : 0] Input Input data bus x0. The abscissa of the input vector in the vectoring mode or the magnitude of the input vector in rotation mode should be placed on this bus. Bit [bit_width – 1] is the MSB. Data are assumed to be presented in two’s complement format. The other vector coordinates are to be supplied simultaneously. y0 [bit_width – 1 : 0] Input Input data bus y0. The ordinate of the input vector in the vectoring mode should be placed on this bus. In rotation mode, the bus should be grounded or left idle. Bit [bit_width – 1] is the MSB. Data are assumed to be presented in two’s complement format. The other vector coordinates are to be supplied simultaneously. a0 [bit_width – 1 : 0] Input Input angle data bus a0. The phase of the input vector in the rotation mode should be placed on this bus. In vectoring mode, the bus should be grounded or left idle. Bit [bit_width – 1] is the MSB. Data are assumed to be presented in two’s complement format. The other vector coordinates are to be supplied simultaneously. clk Input System clock. Active rising edge. nGrst Input System asynchronous reset. Active low. rst Input System/module synchronous reset. Active high. Valid in parallel architecture only. Resets all registers of the core. clkEn Input Clock enable signal. Active high. Valid in word-serial and bit-serial architectures. ldData Input Load input data. Indicates that input vector coordinates are ready for the CORDIC engine to be processed. Active high. Valid in word-serial and bit-serial architectures. rdyOut Output Output data (vector coordinates or sine/cosine values) are ready for the data receiver to read. Active high. Valid in word-serial and bit-serial architectures. xn [bit_width-1 : 0] Output Output data bus xn. The abscissa of the output vector in rotation mode or the magnitude of the output vector in the vectoring mode appears on this bus. Bit [bit_width – 1] is the MSB. Data are presented in two’s complement format. The other vector coordinates emerge on their respective output busses simultaneously. yn [bit_width-1 : 0] Output Output data bus yn. The ordinate of the output vector in rotation mode. Bit [bit_width – 1] is the MSB. Data are presented in two’s complement format. The other vector coordinates emerge on their respective output busses simultaneously. an [bit_width-1 : 0] Output Output data bus an. The phase of the output vector in vectoring mode. Bit [bit_width – 1] is the MSB. Data are presented in two’s complement format. The other vector coordinates emerge on their respective output busses simultaneously. 10 v2.0 CoreCORDIC CORDIC RTL Generator I/O Interface and Timing computation cycle and discards the incomplete results of the interrupted cycle. Upon reset, the CORDIC core returns to its initial state. Signal nGrst asynchronously resets any architecture. Other I/O interfaces and timing depend on core architecture. Once the CORDIC engine completes calculating the result, it generates rdyOut signal one clock period in width. The result on the output busses (an, xn, and yn) is valid while the rdyOut signal is active. The next ldData signal can coincide with the rdyOut signal. Obviously a valid, fresh set of input data, shown as In1 in Figure 8, must be ready by then. Bit-Serial Architecture Interface and Timing One cycle of CORDIC computation = (bit_width × iterations + 2) clock cycles. Figure 8 depicts a typical timing diagram for the bitserial architecture. Signal ldData resets the bit-serial CORDIC module and loads a set of data present on the a0, x0, and y0 input busses. The set of input data is shown in Figure 8 as In0. Normally, a next ldData signal has to come after the end of a current CORDIC cycle; i.e., after the rdyOut signal appears on the module output. In the case that the next ldData signal is issued prior to the end of the current cycle, the CORDIC engine starts a new Signal clkEn can be manipulated as desired. While this signal is low, the CORDIC engine retains all the data it has collected or processed so far. Normally, the bit-serial CORDIC engine is used to fill up the LUT on a power-on event. Once the CORDIC fulfills this function, a high-level state machine may disable the clkEn signal. CORDIC Cycle clk IdData x0, y0, a0 In0 In1 rdyOut xn, yn, an Out0 Figure 8 • Bit-Serial Architecture Timing Diagram Word-Serial Architecture Interface and Timing Once the CORDIC engine completes calculating the result, it generates a rdyOut signal one clock period in width. The result on the output busses (an, xn, and yn) is valid while the rdyOut signal is active. The next ldData signal can immediately follow the rdyOut signal. Obviously a valid, fresh set of input data, shown as In1, must be ready by then. Figure 9 on page 12 depicts a timing diagram for the word-serial architecture. It is very similar to the bit-serial timing diagram. Signal ldData resets the word-serial CORDIC module and loads the set of data present on the a0, x0, and y0 input busses. The set of input data is shown in Figure 9 on page 12 as In0. Normally the next ldData signal must come after the end of the current CORDIC cycle; i.e., after the rdyOut signal appears on the module output. In the case that the next ldData signal is issued prior to the end of a current cycle, the CORDIC engine starts a new computation cycle and discards the incomplete results of the interrupted cycle. One cycle of CORDIC computation = (iterations + 1) clock cycles. v2.0 11 CoreCORDIC CORDIC RTL Generator Signal clkEn can be manipulated as desired. While this signal is low, the CORDIC engine retains all the data it has collected or processed so far. As an example, the word-serial CORDIC engine is used to fill up the LUT on a power-on event. Once the CORDIC completes the task, a high-level state machine may disable the clkEn signal. CORDIC Cycle clk IdData In0 x0, y0, a0 In1 rdyOut xn, yn, an Out0 Figure 9 • Word-Serial Architecture Timing Diagram Parallel Architecture Interface and Timing Figure 10 depicts a timing diagram for the parallel architecture. At the beginning of every clock cycle, a fresh set of input arguments a0, x0, and y0 enters the CORDIC engine. No control signals accompany the input data. The CORDIC engine puts out the results at the beginning of every clock cycle with the latency of iterations clock cycles. Signal rst synchronously resets the parallel architecture; i.e., resets all the registers of the parallel engine. CORDIC Latency clk x0, y0, a0 In0 In1 In2 In3 xn, yn, an Out0 Out1 Out2 Out3 Figure 10 • Parallel Architecture Timing Diagram References J.E. Volder. 1959. "The CORDIC Trigonometric Computing Technique." IRE Transaction on Electronic Computers, EC8:330-334. http://lap.epfl.ch/courses/comparith/Papers/3-Volder_CORDIC.pdf Ray Andraka, "A Survey of CORDIC Algorithms for FPGA Based Computers," http://www.fpga-guru.com/files/crdcsrvy.pdf, 1998. Norbert Lindlbauer, "The CORDIC-Algorithm for Computing a Sine," http://www.cnmat.berkeley.edu/~norbert/cordic/ node4.html, 2000. Grant R. Griffin, "CORDIC FAQ," http://www.dspguru.com/info/faqs/cordic.htm. 12 v2.0 CoreCORDIC CORDIC RTL Generator A Sample Configuration File The following is an example of the configuration file: module_name Cordic_test architecture 0 mode 0 bit_width 16 iterations 16 fpga_family pa3 lang verilog Ordering Information Order CoreCORDIC through your local Actel sales representative. Use the following numbering convention when ordering: CoreCORDIC-XX, where XX is listed in Table 9. Table 9 • Ordering Codes XX Description EV Evaluation version AR RTL for unlimited use on Actel devices UR RTL for unlimited use and not restricted to Actel devices Datasheet Categories In order to provide the latest information to designers, some datasheets are published before data has been fully characterized. Datasheets are designated as "Product Brief," "Advanced," and "Production." The definitions of these categories are as follows: Product Brief The product brief is a summarized version of an advanced or production datasheet containing general product information. This brief summarizes specific device and family information for unreleased products. Advanced This datasheet version contains initial estimated information based on simulation, other products, devices, or speed grades. This information can be used as estimates, but not for production. Unmarked (production) This datasheet version contains information that is considered to be final. v2.0 13 CoreCORDIC CORDIC RTL Generator Appendix I Polar and Rectangular Coordinate Relations Y θ r X Figure 11 • Cartesian Coordinate Definition The Cartesian coordinates (X, Y) are defined in terms of the polar coordinates r (vector magnitude, or radial coordinate) and θ (vector phase, or polar angle), as given in EQ 14 and EQ 15. X = r cos θ EQ 14 Y = r sin θ EQ 15 In terms of Cartesian coordinates, the polar coordinates are expressed as given in EQ 16 and EQ 17. r = 2 X +Y 2 EQ 16 θ = arctan ( Y ⁄ X ) EQ 17 14 v2.0 CoreCORDIC CORDIC RTL Generator Appendix II Examples of CORDIC Modes Polar to Cartesian Coordinate Conversion The CORDIC engine is in rotation mode. Input data represent magnitude r and phase θ of the vector whose polar coordinates are to be converted to Cartesian coordinates. The CORDIC engine puts out a pair of Cartesian coordinates (X*K, Y*K) scaled by processing gain K (Figure 12). Polar to Cartesian θ Y*K r*K r X*K Figure 12 • Polar to Cartesian Vector Conversion General Rotation The CORDIC engine is in rotation mode. Input data (X0 , Y0 , Angle) represent initial vector Cartesian coordinates, as well as an angle to rotate the vector. The CORDIC engine puts out a pair of Cartesian coordinates (X*K, Y*K) of the resulting rotated vector scaled by processing gain K (Figure 13). General Rotation Y*K Y0 r*K M X*K Angle to Rotate X0 Figure 13 • CORDIC General Vector Rotation v2.0 15 CoreCORDIC CORDIC RTL Generator Sine and Cosine CORDIC Calculator The CORDIC engine is in rotation mode. Input data r = 1/K and phase θ represent initial vector polar coordinates. The CORDIC engine puts out a pair of Cartesian coordinates equal to (cosθ, sinθ ), as shown in Figure 14. Sin/Cos sinθ θ 1 1/K cosθ Figure 14 • Sine and Cosine CORDIC Computation Cartesian to Polar Coordinate Conversion The CORDIC engine is in vectoring mode. Input data represent Cartesian coordinates (X0, Y0) of the input vector. The CORDIC engine puts out a pair of polar coordinates: magnitude r*K and phase θ of the input vector (Figure 15). Cartesian to Polar θ Y0 r*K X0 Figure 15 • Cartesian to Polar Coordinate Conversion 16 v2.0 CoreCORDIC CORDIC RTL Generator CORDIC Square Root Calculator The CORDIC engine is in vectoring mode. Input data represent Cartesian coordinates (X0, Y0) of the input vector. The 2 CORDIC engine puts out a pair of polar coordinates: magnitude r = K x 0 + y 0 2 and phase θ of the input vector (Figure 16). Square Root Calculator Y0 r*K X0 Figure 16 • CORDIC Square Root Calculator CORDIC Arctan Calculator The CORDIC engine is in vectoring mode. Input data represent Cartesian coordinates (X0, Y0) of the input vector. The CORDIC engine puts out a pair of polar coordinates: magnitude r and phase θ = arctan(Y0 / X0) of the input vector. Arctan Calculator arctan(Y0/X0) Y0 X0 Figure 17 • CORDIC Arctan Phase Calculator v2.0 17 Actel and the Actel logo are registered trademarks of Actel Corporation. All other trademarks are the property of their owners. www.actel.com Actel Corporation Actel Europe Ltd. Actel Japan www.jp.actel.com Actel Hong Kong www.actel.com.cn 2061 Stierlin Court Mountain View, CA 94043-4655 USA Phone 650.318.4200 Fax 650.318.4600 Dunlop House, Riverside Way Camberley, Surrey GU15 3YL United Kingdom Phone +44 (0) 1276 401 450 Fax +44 (0) 1276 401 490 EXOS Ebisu Bldg. 4F 1-24-14 Ebisu Shibuya-ku Tokyo 150 Japan Phone +81.03.3445.7671 Fax +81.03.3445.7668 Suite 2114, Two Pacific Place 88 Queensway, Admiralty Hong Kong Phone +852 2185 6460 Fax +852 2185 6488 51700064-0/3.06