DFPAU IP Core
Floating Point Arithmetic Coprocessor
v. 2.09 (2016)

COMPANY OVERVIEW

Digital Core Design is a leading IP Core provider and a System-on-Chip design house. The company was founded in 1999 and has focused on IP Core architecture improvements from the very beginning. Our innovative, silicon proven solutions have been employed by over 300 customers, with more than 500 licenses sold to companies like Intel, Siemens, Philips, General Electric, Sony and Toyota. Based on more than 70 different architectures, ranging from serial interfaces to advanced microcontrollers and SoCs, we design solutions tailored to your needs.

IP CORE OVERVIEW

The DFPAU is a Floating Point Arithmetic Coprocessor, designed to assist the CPU in performing floating point arithmetic computations. The DFPAU directly replaces C software functions with equivalent, very fast hardware operations, which significantly accelerates system performance. It requires no programming, so the main software needs no modifications. Everything is done automatically during software compilation by the DFPAU C driver. The DFPAU was designed to operate with DCD's DP8051, but it can also operate with any other 8-, 16- or 32-bit processor. Drivers for all popular 8051 C compilers are delivered together with the DFPAU package.

The DFPAU uses specialized algorithms to compute arithmetic functions. It supports addition, subtraction, multiplication, division, square root, comparison, absolute value and change of sign of a number. The input number format complies with the IEEE-754 standard for single precision real numbers. Trigonometric functions are supported indirectly: software subroutines compute them as a sequence of add, multiply and divide operations. Each floating point function can be turned on or off at configuration level, which makes the DFPAU module flexibly scalable. This saves silicon area and provides exactly the configuration a given application requires.

KEY FEATURES

● Direct replacement for C float software functions such as: +, -, *, /, ==, !=, >=, <=, <, >
● C interface supplied for all popular compilers: GNU C/C++, 8051 compilers
● No programming required
● Configurability of all available functions
● IEEE-754 Single precision real format support – float type
● Flexible arguments and result registers location
● Performs the following functions:
    ○ FADD, FSUB – addition, subtraction
    ○ FMUL, FDIV – multiplication, division
    ○ FSQRT – square root
    ○ FCHS, FABS – change of sign, absolute value
    ○ FXAM – examine input data
    ○ FUCOM – comparison
● Exceptions built-in routines
● Masks each exception indicator:
    ○ Precision lack – PE
    ○ Underflow result – UE
    ○ Overflow result – OE
    ○ Invalid operand – IE
    ○ Division by zero – ZE
    ○ Denormal operand – DE
● Fully configurable
● Fully synthesizable, static synchronous design with no internal tri-states

DELIVERABLES

♦ Source code:
    ● VHDL Source Code or/and
    ● VERILOG Source Code or/and
    ● Encrypted, or plain text EDIF
♦ VHDL & VERILOG test bench environment
    ● Active-HDL automatic simulation macros
    ● ModelSim automatic simulation macros
    ● Tests with reference responses
♦ Technical documentation
    ● Installation notes
    ● HDL core specification
    ● Datasheet
♦ Synthesis scripts
♦ Example application
♦ Technical support
    ● IP Core implementation support
    ● 3 months maintenance
        ○ Delivery of the IP Core and documentation updates, minor and major versions changes
        ○ Phone & email support

LICENSING

Comprehensible and clearly defined licensing methods without royalty-per-chip fees make using our IP Cores easy and simple.

Single-Site license option – dedicated to small and middle sized companies, which run their business in one place.

Multi-Site license option – dedicated to corporate customers, who operate at several locations. The licensed product can be used in selected company branches.

In all cases the number of IP Core instantiations within a project and the number of manufactured chips are unlimited. The license is royalty-per-chip free. There are no restrictions regarding the time of use.

There are two formats of the delivered IP Core:
● VHDL or Verilog RTL synthesizable source code, called HDL Source code
● FPGA EDIF/NGO/NGD/QXP/VQM, called Netlist

Copyright © 1999-2016 DCD – Digital Core Design. All Rights Reserved. All trademarks mentioned in this document are the property of their respective owners.

APPLICATIONS

● Math coprocessors
● DSP algorithms
● Embedded arithmetic coprocessor
● Fast data processing & control

SYMBOL

[Figure: core symbol with pins datai(31:0)¹, addr(4:2)², we, cs, rst, clk, datao(31:0)¹, irq]

BLOCK DIAGRAM

[Figure: block diagram with the Interface, Mantissa, Exponent, Align, Shifter and Control Unit blocks, and the pins datai(31:0)¹, addr(4:2)², we, cs, rst, clk, datao(31:0)¹, irq]

PINS DESCRIPTION

PIN             TYPE     DESCRIPTION
clk             Input    Global system clock
rst             Input    Global system reset
cs              Input    Chip select for read/write
datai[31:0]¹    Input    Data bus input
addr[4:2]²      Input    Register address to read/write
we              Input    Data write enable
datao[31:0]¹    Output   Data bus output
int             Output   Interrupt request indicator

1 – the data bus can be configured as 8-, 16- or 32-bit, depending on the processor's bus size
2 – the address bus is aligned to work with 8-bit (3:0), 16-bit (3:1) or 32-bit (4:2) processors

UNITS SUMMARY

Mantissa – performs operations on the mantissa part of a number. The addition, subtraction, multiplication, division, square root, comparison and conversion operations are executed in this module. It contains the mantissa and work registers.

Exponent – performs operations on the exponent part of a number. The addition, subtraction, shifting, comparison and conversion operations are executed in this module. It contains the exponent and work registers.

Align – analyzes the input numbers for compliance with the IEEE-754 standard. Information about the data classes is passed as a result to the appropriate internal module.
Shifter – performs mantissa shifting during normalization and denormalization operations. Information about shifted-out bits is stored for the rounding process.

Control Unit – manages execution of all instructions and the internal operations required to execute a particular function.

Interface – constitutes an interface between an external device and the DFPAU internal 32-bit modules. It contains data, control and status registers. It can be configured to work with 8-, 16- and 32-bit processors.

IMPROVEMENTS

The tables and figures below illustrate the performance improvements of systems with the DFPAU for two typical CPUs. The performance of the DFPAU floating point instructions has been compared to the standard C library functions delivered with every commercial C compiler. Each program was executed in the same system environment. The number of clock periods was measured between loading the input data into the work registers and storing the output result after the operation. The results are placed in the tables below. Improvement has been computed as the number of clock cycles required by the CPU to compute an FP operation, divided by the number of clocks required to compute the same operation by the system of CPU with the DFPAU:

$$\text{improvement} = \frac{\text{CPU\_clocks}}{\text{(CPU+DFPAU)\_clocks}}$$

More details are available in the core documentation.

The following table compares the performance of the DP8051+DFPAU with that of a standard 8051 microcontroller.

Device          Improvement
80C51           1.0
DP8051          7.3
DP8051+DFPAU    91.0

General performance improvements

[Chart: general performance improvements for 80C51 (1), DP8051 (7.3) and DP8051+DFPAU (91)]
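The improvement ratio defined above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of the DFPAU package: the function names and the clock counts 9100 and 100 are hypothetical, and the second helper assumes that every figure in the device table is relative to the same 80C51 baseline, as the table suggests.

```python
def improvement(cpu_clocks: int, cpu_dfpau_clocks: int) -> float:
    """Clocks needed by the CPU alone divided by clocks needed with the DFPAU."""
    if cpu_clocks <= 0 or cpu_dfpau_clocks <= 0:
        raise ValueError("clock counts must be positive")
    return cpu_clocks / cpu_dfpau_clocks


# Figures from the device table above (assumed to share the 80C51 baseline).
DEVICE_IMPROVEMENT = {"80C51": 1.0, "DP8051": 7.3, "DP8051+DFPAU": 91.0}


def relative_gain(device: str, reference: str) -> float:
    """Speed-up of one device over another, derived from the shared baseline."""
    return DEVICE_IMPROVEMENT[device] / DEVICE_IMPROVEMENT[reference]


if __name__ == "__main__":
    # Hypothetical clock counts, chosen only to illustrate the ratio.
    print(f"{improvement(9100, 100):.1f}")                     # 91.0
    # Gain attributable to adding the DFPAU to a DP8051.
    print(f"{relative_gain('DP8051+DFPAU', 'DP8051'):.1f}")    # 12.5
```

Read this way, the table implies that adding the DFPAU to a DP8051 speeds up floating point work by roughly 12.5 times over the DP8051 alone, on top of the 7.3 times the DP8051 already gains over the 80C51.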
IEEE-754 FP Instruction      Improvement
Addition                     73
Subtraction                  60
Multiplication               65
Division                     182
Square Root                  392
Sine                         10
Cosine                       10
Tangent                      12
Arc Tangent                  17
Average speed improvement:   91

Improvements of particular operations

The table below shows performance improvements of the NIOS-II processor with the DFPAU, compared to the same system without the DFPAU coprocessor.

Device                           Improvement
NIOS-II/s                        1.0
NIOS-II+DFPAU (arithmetic)       7.5
NIOS-II+DFPAU (trigonometric)    5.9
NIOS-II+DFPAU (overall)          6.8

General performance improvements

[Chart: general performance improvements for 32-bit NIOS-II/s (1), NIOS-II+DFPAU arithmetic (7.5), trigonometric (5.9) and overall (6.8)]

IEEE-754 FP Instruction      Improvement
Addition                     6.4
Subtraction                  6.5
Multiplication               5.1
Division                     6.5
Square Root                  12.9
Sine                         5.2
Cosine                       5.4
Tangent                      5.8
Arc Tangent                  7.2
Average speed improvement:   6.8

Improvements of particular operations

The table below shows performance improvements of a sample 32-bit RISC CPU with the DFPAU, compared to the same system without the DFPAU coprocessor.

Device                       Improvement
CPU                          1.0
CPU+DFPAU (arithmetic)       7.5
CPU+DFPAU (trigonometric)    5.9
CPU+DFPAU (overall)          6.8

General performance improvements

[Chart: general performance improvements for 32-bit CPU (1), CPU+DFPAU arithmetic (7.5), trigonometric (5.9) and overall (6.8)]

IEEE-754 FP Instruction      Improvement
Addition                     6.4
Subtraction                  6.5
Multiplication               5.1
Division                     6.5
Square Root                  12.9
Sine                         5.2
Cosine                       5.4
Tangent                      5.8
Arc Tangent                  7.2
Average speed improvement:   6.8

Improvements of particular operations

PERFORMANCE

The following table summarizes the core area and performance in ALTERA® devices after Place & Route (all key features included):

Device         Speed grade   Logic Cells   Fmax
CYCLONE        -6            2410          91 MHz
CYCLONE-II     -6            2380+7xDSP    100 MHz
CYCLONE-III    -6            2359+7xDSP    105 MHz
CYCLONE-IV     -6            2364+7xDSP    111 MHz
STRATIX        -5            2210+8xDSP    115 MHz
STRATIX-II     -3            1660+8xDSP    169 MHz
STRATIX-III    -2            1655+4xDSP    240 MHz
STRATIX-IV     -2            1630+4xDSP    225 MHz

Core performance in ALTERA® devices

CONTACT

Digital Core Design Headquarters:
Wroclawska 94, 41-902 Bytom, POLAND

e-mail: [email protected]
tel.: 0048 32 282 82 66
fax: 0048 32 282 74 37

Distributors:
Please check: http://dcd.pl/sales