DFPMU Floating Point Coprocessor ver 2.05

OVERVIEW

DFPMU is a floating point coprocessor designed to assist the CPU in performing floating point mathematical computations. It directly replaces C software functions with equivalent, very fast hardware operations, which significantly accelerates system performance. It requires no programming, so no modifications to the main software are needed: everything is done automatically during software compilation by the DFPMU C driver.

DFPMU was designed to operate with DCD's DP8051, but it can also operate with any other 8-, 16- or 32-bit processor. Drivers for all popular 8051 C compilers are delivered together with the DFPMU package.

DFPMU uses specialized CORDIC and standard algorithms to compute math functions. It supports addition, subtraction, multiplication, division, square root, comparison, absolute value, change of sign, and the trigonometric functions sine, cosine, tangent and arctangent. It has built-in instructions for conversion from integer type to floating point type and vice versa. The input number format conforms to the IEEE-754 standard. DFPMU supports single precision real numbers and 16-bit and 32-bit integers.

Each floating point function can be turned on or off at the configuration level, which makes the DFPMU module flexibly scalable. This saves silicon area and provides exactly the configuration required by a given application.

DFPMU is a technology independent design that can be implemented in a variety of process technologies.
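The overview names CORDIC as one of the algorithms used for the math functions. As a rough illustration of the general technique only, the C sketch below runs CORDIC in rotation mode to obtain sine and cosine. It is not the DFPMU implementation: the iteration count, the use of `double` in place of hardware registers, and all identifiers (`cordic_sincos`, `cordic_atan_table`, `CORDIC_INV_GAIN`) are assumptions made for this example.

```c
/* Illustrative CORDIC (rotation mode). NOT the DFPMU design: the
   iteration count and the use of double arithmetic are assumptions. */
#define CORDIC_ITERATIONS 16

/* atan(2^-i) for i = 0..15, in radians (a small lookup table, as a
   hardware implementation would typically keep in ROM). */
static const double cordic_atan_table[CORDIC_ITERATIONS] = {
    0.7853981633974483,    0.4636476090008061,
    0.24497866312686414,   0.12435499454676144,
    0.06241880999595735,   0.031239833430268277,
    0.015623728620476831,  0.007812341060101111,
    0.0039062301319669718, 0.0019531225164788188,
    0.0009765621895593195, 0.0004882812111948983,
    0.00024414062014936177, 0.00012207031189367021,
    6.103515617420877e-05, 3.0517578115526096e-05
};

/* Reciprocal of the CORDIC gain, 1 / prod(sqrt(1 + 2^(-2i))).
   Starting x at this value removes the need for a final scaling step. */
#define CORDIC_INV_GAIN 0.6072529350088813

/* Computes sin(angle) and cos(angle) for |angle| <= pi/2 (radians).
   Each iteration rotates the vector (x, y) by +/- atan(2^-i) so that
   the residual angle z is driven toward zero. The multiplications by
   'scale' are powers of two, i.e. plain shifts in fixed-point hardware. */
static void cordic_sincos(double angle, double *sin_out, double *cos_out)
{
    double x = CORDIC_INV_GAIN;
    double y = 0.0;
    double z = angle;
    double scale = 1.0;                 /* holds 2^-i */

    for (int i = 0; i < CORDIC_ITERATIONS; i++) {
        double x_next, y_next;
        if (z >= 0.0) {                 /* rotate counter-clockwise */
            x_next = x - y * scale;
            y_next = y + x * scale;
            z -= cordic_atan_table[i];
        } else {                        /* rotate clockwise */
            x_next = x + y * scale;
            y_next = y - x * scale;
            z += cordic_atan_table[i];
        }
        x = x_next;
        y = y_next;
        scale *= 0.5;
    }
    *cos_out = x;
    *sin_out = y;
}
```

Calling `cordic_sincos(0.5, &s, &c)` leaves approximations of sin(0.5) and cos(0.5) in `s` and `c`. With 16 iterations the residual angle is at most about 2^-15 radians, so a real design would choose the iteration count to match the precision it needs.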
APPLICATIONS
● Math coprocessors
● DSP algorithms
● Embedded arithmetic coprocessor
● Fast data processing & control

KEY FEATURES
● Direct replacement for C float software functions such as: +, -, *, /, ==, !=, >=, <=, <, >
● Configurability of all available functions
● C interface supplied for all popular compilers: GNU C/C++, 8051 compilers
● No programming required
● IEEE-754 single precision real format support – float type
● 16-bit word and 32-bit short integer formats supported – integer types
● Flexible arguments and result registers location
● Performs the following functions:
  ○ FADD, FSUB – addition, subtraction
  ○ FMUL, FDIV – multiplication, division
  ○ FSQRT – square root
  ○ FCHS, FABS – change of sign, absolute value
  ○ FXAM – examine input data
  ○ FUCOM – comparison
  ○ FSIN, FCOS – sine, cosine
  ○ FTAN – tangent
  ○ FATAN – arctangent
  ○ FILDW, FILD – 16-bit, 32-bit integer to float
  ○ FISTW, FIST – float to 16-bit, 32-bit integer
● Built-in exception routines
● Masks each exception indicator:
  ○ Loss of precision PE
  ○ Underflow result UE
  ○ Overflow result OE
  ○ Invalid operand IE
  ○ Division by zero ZE
  ○ Denormal operand DE
● Fully configurable
● Fully synthesizable, static synchronous design with no internal tri-states

LICENSING
Comprehensible and clearly defined licensing methods without royalty fees make using the IP Core easy and simple.
A Single Design license allows the IP Core to be used in a single FPGA bitstream and ASIC implementation. It also permits FPGA prototyping before ASIC production.
An Unlimited Designs license allows the IP Core to be used in an unlimited number of FPGA bitstreams and ASIC implementations.
In all cases the number of IP Core instantiations within a design and the number of manufactured chips are unlimited. There are no time-of-use limitations.
● Single Design license for
  ○ VHDL, Verilog source code called HDL Source
  ○ Encrypted, or plain text EDIF called Netlist
● Unlimited Designs license for
  ○ HDL Source
  ○ Netlist
● Upgrade from
  ○ Netlist to HDL Source
  ○ Single Design to Unlimited Designs

DELIVERABLES
♦ Source code:
  ◊ VHDL Source Code or/and
  ◊ VERILOG Source Code or/and
  ◊ Encrypted Netlist or/and
  ◊ plain text EDIF netlist
♦ VHDL & VERILOG test bench environment
  ◊ Active-HDL automatic simulation macros
  ◊ NCSim automatic simulation macros
  ◊ ModelSim automatic simulation macros
  ◊ Tests with reference responses
♦ Technical documentation
  ◊ Installation notes
  ◊ HDL core specification
  ◊ Datasheet
♦ Synthesis scripts
♦ Example application
♦ Technical support
  ◊ IP Core implementation support
  ◊ 3 months maintenance
    ● Delivery of IP Core updates, minor and major version changes
    ● Delivery of documentation updates
    ● Phone & email support

All trademarks mentioned in this document are trademarks of their respective owners.
http://www.DigitalCoreDesign.com http://www.dcd.pl
Copyright 1999-2007 DCD – Digital Core Design. All Rights Reserved.

SYMBOL
[Symbol diagram: DFPMU block with ports datai(31:0), datao(31:0), addr(4:2), we, irq]

Control Unit – manages the execution of all instructions and the internal operations required to execute a particular function.

During mantissa shifting, information about the shifted-out bits is stored for the rounding process.
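The remark that shifted-out bits are kept for rounding can be made concrete with a small C sketch. It shows one common way floating point hardware does this in general: a guard bit plus a sticky bit, followed by round-to-nearest-even. Whether DFPMU uses this exact scheme or rounding mode is not stated in this datasheet, so the scheme and all names (`shifted_t`, `shift_right_tracked`, `round_nearest_even`) are assumptions for illustration.

```c
#include <stdint.h>

/* Result of a right shift that remembers what was shifted out.
   Hypothetical structure, for illustration only. */
typedef struct {
    uint32_t value;   /* mantissa after the shift */
    unsigned guard;   /* most significant shifted-out bit */
    unsigned sticky;  /* 1 if any lower shifted-out bit was set */
} shifted_t;

/* Shifts 'mant' right by 'n' bits while recording the lost bits. */
static shifted_t shift_right_tracked(uint32_t mant, unsigned n)
{
    shifted_t r = { 0u, 0u, 0u };

    if (n == 0u) {                       /* nothing is lost */
        r.value = mant;
    } else if (n > 32u) {                /* everything lands in sticky */
        r.sticky = (mant != 0u);
    } else if (n == 32u) {               /* top bit becomes the guard */
        r.guard = (unsigned)(mant >> 31);
        r.sticky = (mant & 0x7FFFFFFFu) != 0u;
    } else {                             /* 1 <= n <= 31 */
        r.value = mant >> n;
        r.guard = (unsigned)((mant >> (n - 1u)) & 1u);
        r.sticky = (mant & ((1u << (n - 1u)) - 1u)) != 0u;
    }
    return r;
}

/* Round to nearest, ties to even (the IEEE-754 default mode; assumed
   here, not confirmed for DFPMU). Rounds up when the lost part is more
   than half an LSB, or exactly half and the kept value is odd. */
static uint32_t round_nearest_even(shifted_t s)
{
    if (s.guard && (s.sticky || (s.value & 1u))) {
        return s.value + 1u;
    }
    return s.value;
}
```

For example, shifting 11 (binary 1011) right by two bits keeps 2 with guard and sticky both set, so the rounded result is 3 (11/4 = 2.75). Shifting 10 (binary 1010) is an exact tie at 2.5 and rounds to the even value 2. A non-zero guard or sticky bit also indicates an inexact result, which is the kind of condition a loss-of-precision indicator reports.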
PINS DESCRIPTION

PIN              TYPE     DESCRIPTION
clk              Input    Global system clock
rst              Input    Global system reset
cs               Input    Chip select for read/write
datai[31:0] (1)  Input    Data bus input
addr[4:2] (2)    Input    Register address to read/write
we               Input    Data write enable
datao[31:0] (1)  Output   Data bus output
irq              Output   Interrupt request indicator

(1) – the data bus can be configured as 8-, 16- or 32-bit, depending on the processor's bus size
(2) – the address bus is aligned to work with 8-bit (3:0), 16-bit (3:1) or 32-bit (4:2) processors

BLOCK DIAGRAM
[Block diagram: Interface, Control Unit, Mantissa, Exponent, Align, Shifter and CORDIC blocks; external signals datai(31:0), datao(31:0), addr(4:2), we, cs, irq, rst, clk]

Mantissa – performs operations on the mantissa part of a number. The addition, subtraction, multiplication, division, square root, comparison and conversion operations are executed in this module. It contains mantissa and work registers.

CORDIC – performs trigonometric operations on input data. The sine, cosine, tangent and arctangent operations are executed in this module. It contains three work registers.

Exponent – performs operations on the exponent part of a number. The addition, subtraction, shifting, comparison and conversion operations are executed in this module. It contains exponent and work registers.

Interface – provides the interface between an external device and the DFPMU internal 32-bit modules. It contains data, control and status registers. It can be configured to work with 8-, 16- and 32-bit processors.

Align – analyzes numbers for compliance with the IEEE-754 standard. Information about the data classes is passed as a result to the appropriate internal module.

Shifter – performs mantissa shifting during normalization and denormalization operations.

PERFORMANCE
The following table summarizes the core area and performance in ALTERA® devices after Place & Route (all key features have been included):

Device        Speed grade   Logic Cells   Fmax
APEX20KE      -1            5150          50 MHz
APEX20KC      -7            5150          58 MHz
APEX-II       -7            5150          73 MHz
CYCLONE       -6            4650          90 MHz
CYCLONE-II    -6            4520          96 MHz
STRATIX       -5            4460          108 MHz
STRATIX-II    -3            3300          168 MHz
Core performance in ALTERA® devices

DFPMU floating point instruction performance has been compared with the standard C library functions delivered with every commercial C compiler. Each program was executed in the same system environment. The number of clock periods was measured between loading the input data into the work registers and storing the output result after the operation. The results are given in the tables below. Improvement has been computed as (CPU clk) divided by (CPU+DFPMU clk), where each term is the number of clock periods required to execute the same operation. More details are available in the core documentation.

The following table compares the performance of DP8051+DFPMU with that of the standard 8051 microcontroller.

Device          Improvement
80C51           1.0
DP8051          7.3
DP8051+DFPMU    162.0
General performance improvements

The table below shows the performance improvements of a NIOS-II and DFPMU based system, compared to the same system without the DFPMU coprocessor.

Device                           Improvement
NIOS-II/s                        1.0
NIOS-II+DFPMU (arithmetic)       7.5
NIOS-II+DFPMU (trigonometric)    49.2
NIOS-II+DFPMU (overall)          28.3
General performance improvements

[Bar charts of the general performance improvements listed in the two tables above]

IEEE-754 FP Instruction    Improvement
Addition                   73
Subtraction                60
Multiplication             65
Division                   182
Square Root                392
Sine                       139
Cosine                     144
Tangent                    222
Arctangent                 182
Average speed improvement: 162
Improvements of particular operations (DP8051+DFPMU)
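The improvement figures above follow from two simple calculations, sketched here in C. The ratio function mirrors the stated definition, (CPU clk) divided by (CPU+DFPMU clk). The averaging function shows that the arithmetic mean of the nine per-instruction figures in the DP8051+DFPMU table is about 162.1, which agrees with the quoted average of 162. The function and array names are hypothetical, and treating the quoted average as a plain arithmetic mean is an observation from the numbers, not something the datasheet states.

```c
#include <stddef.h>

/* Improvement as defined in the text: clock periods without the
   coprocessor divided by clock periods with it.
   The caller must pass a non-zero cpu_dfpmu_clk. */
static double improvement_ratio(unsigned long cpu_clk,
                                unsigned long cpu_dfpmu_clk)
{
    return (double)cpu_clk / (double)cpu_dfpmu_clk;
}

/* Per-instruction improvements copied from the table above, in order:
   add, sub, mul, div, sqrt, sin, cos, tan, atan. */
static const double dp8051_fp_improvement[] = {
    73.0, 60.0, 65.0, 182.0, 392.0, 139.0, 144.0, 222.0, 182.0
};

#define DP8051_FP_COUNT \
    (sizeof dp8051_fp_improvement / sizeof dp8051_fp_improvement[0])

/* Arithmetic mean of 'count' values; 'count' must be non-zero. */
static double mean(const double *values, size_t count)
{
    double sum = 0.0;
    for (size_t i = 0; i < count; i++) {
        sum += values[i];
    }
    return sum / (double)count;
}
```

`mean(dp8051_fp_improvement, DP8051_FP_COUNT)` evaluates to 1459 / 9, roughly 162.1. The individual clock counts behind each ratio are not given here; they are described as available in the core documentation.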
IEEE-754 FP Instruction    Improvement
Addition                   6.4
Subtraction                6.5
Multiplication             5.1
Division                   6.5
Square Root                12.9
Sine                       40.8
Cosine                     41.3
Tangent                    65.0
Arctangent                 49.6
Average speed improvement: 28.3
Improvements of particular operations (NIOS-II+DFPMU)

More details are available in the core documentation.

CONTACTS
For any modification or special request, please contact Digital Core Design or local distributors.

Headquarters:
Wroclawska 94
41-902 Bytom, POLAND
e-mail: info@dcd.pl
tel.: +48 32 282 82 66
fax: +48 32 282 74 37

Distributors:
Please check http://www.dcd.pl/apartn.php