DFPMU IP Core
Floating Point Coprocessor
v. 2.09 (2016)

COMPANY OVERVIEW

Digital Core Design is a leading IP Core provider and a System-on-Chip design house. The company was founded in 1999 and has focused on IP Core architecture improvements from the very beginning. Our innovative, silicon-proven solutions have been employed by over 300 customers, with more than 500 licenses sold to companies such as Intel, Siemens, Philips, General Electric, Sony and Toyota. Based on more than 70 different architectures, ranging from serial interfaces to advanced microcontrollers and SoCs, we design solutions tailored to your needs.

IP CORE OVERVIEW

The DFPMU is a Floating Point Coprocessor designed to assist a CPU in performing floating point mathematical computations. The DFPMU directly replaces C software functions with equivalent, very fast hardware operations, which significantly accelerates system performance. It requires no programming, so no modifications to the main software are needed: everything is done automatically during software compilation by the DFPMU C driver. The DFPMU was designed to operate with DCD's DP8051, but can also operate with any other 8-, 16- or 32-bit processor. Drivers for all popular 8051 C compilers are delivered together with the DFPMU package.

The DFPMU uses specialized CORDIC and standard algorithms to compute math functions. It supports addition, subtraction, multiplication, division, square root, comparison, absolute value, sign change and the trigonometric functions sine, cosine, tangent and arctangent. It has built-in instructions for conversion from the integer type to the floating point type and vice versa. The input number format is compliant with the IEEE-754 standard. The DFPMU supports single precision real numbers and 16-bit and 32-bit integers.
Each floating point function can be turned on or off at the configuration level, which makes the DFPMU module flexibly scalable. This saves silicon area and yields exactly the configuration required by a given application. The DFPMU is a technology-independent design that can be implemented in a variety of process technologies.

KEY FEATURES

● Direct replacement for C float software functions, such as: +, -, *, /, ==, !=, >=, <=, <, >
● Configurability of all available functions
● C interface supplied for all popular compilers: GNU C/C++, 8051 compilers
● No programming required
● IEEE-754 single precision real format support – float type
● 16-bit word and 32-bit short integers format supported – integer types
● Flexible arguments and result registers location
● Performs the following functions:
  ○ FADD, FSUB – addition, subtraction
  ○ FMUL, FDIV – multiplication, division
  ○ FSQRT – square root
  ○ FCHS, FABS – change of sign, absolute value
  ○ FXAM – examine input data
  ○ FUCOM – comparison
  ○ FSIN, FCOS – sine, cosine
  ○ FTAN – tangent
  ○ FATAN – arctangent
  ○ FILDW, FILD – 16-bit, 32-bit integer to float
  ○ FISTW, FIST – float to 16-bit, 32-bit integer
● Exceptions built-in routines
● Masks each exception indicator:
  ○ Precision lack PE
  ○ Underflow result UE
  ○ Overflow result OE
  ○ Invalid operand IE
  ○ Division by zero ZE
  ○ Denormal operand DE
● Fully configurable
● Fully synthesizable, static synchronous design with no internal tri-states

LICENSING

Clearly defined licensing methods, with no royalty-per-chip fees, make using our IP Cores easy and simple.

● Single-Site license option – dedicated to small and medium-sized companies that run their business in one place.
● Multi-Site license option – dedicated to corporate customers who operate at several locations. The licensed product can be used in selected company branches.

In all cases the number of IP Core instantiations within a project and the number of manufactured chips are unlimited. The license is royalty-per-chip free.
There are no restrictions regarding the time of use. There are two formats of the delivered IP Core:

● VHDL or Verilog RTL synthesizable source code, called HDL Source code
● FPGA EDIF/NGO/NGD/QXP/VQM, called Netlist

APPLICATIONS

● Math coprocessors
● DSP algorithms
● Embedded arithmetic coprocessor
● Fast data processing & control

Copyright © 1999-2016 DCD – Digital Core Design. All Rights Reserved. All trademarks mentioned in this document are the property of their respective owners.

DELIVERABLES

♦ Source code:
  ● VHDL Source Code or/and
  ● VERILOG Source Code or/and
  ● Encrypted, or plain text EDIF
♦ VHDL & VERILOG test bench environment
  ● Active-HDL automatic simulation macros
  ● ModelSim automatic simulation macros
  ● Tests with reference responses
♦ Technical documentation
  ● Installation notes
  ● HDL core specification
  ● Datasheet
♦ Synthesis scripts
♦ Example application
♦ Technical support
  ● IP Core implementation support
  ● 3 months maintenance
    ● Delivery of the IP Core and documentation updates, minor and major versions changes
    ● Phone & email support

SYMBOL

[Figure: DFPMU symbol. Inputs: datai(31:0) (1), addr(4:2) (2), we, cs, rst, clk. Outputs: datao(31:0) (1), irq.]

PINS DESCRIPTION

PIN               TYPE     DESCRIPTION
clk               Input    Global system clock
rst               Input    Global system reset
cs                Input    Chip select for read/write
datai[31:0] (1)   Input    Data bus input
addr[4:2] (2)     Input    Register address to read/write
we                Input    Data write enable
datao[31:0] (1)   Output   Data bus output
irq               Output   Interrupt request indicator

(1) – the data bus can be configured as 8-, 16- or 32-bit, depending on the processor's bus size
(2) – the address bus is aligned to work with 8-bit (3:0), 16-bit (3:1) or 32-bit (4:2) processors

IMPROVEMENTS

The tables and figures below illustrate the performance improvements of a system with the DFPMU for two typical CPUs. The performance of the DFPMU floating point instructions has been compared to the standard C library functions delivered with every commercial C compiler. Each program was executed in the same system environment. The number of clock periods was measured from the loading of input data into the work registers to the storing of the output result after the operation. The results are shown in the tables below. Improvement has been computed as the number of clock cycles required by the CPU alone to compute a floating point operation, divided by the number of clock cycles required by the CPU with the DFPMU to compute the same operation:

    Improvement = CPU_clocks / (CPU+DFPMU)_clocks

The following table gives a survey of the DP8051+DFPMU performance, compared to the standard 8051 microcontroller.

Device          Improvement
80C51           1.0
DP8051          7.3
DP8051+DFPMU    162.0
General performance improvements

[Figure: General performance improvements (bar chart): 80C51 = 1, DP8051 = 7.3, DP8051+DFPMU = 162]

IEEE-754 FP Instruction       Improvement
Addition                      73
Subtraction                   60
Multiplication                65
Division                      182
Square Root                   392
Sine                          139
Cosine                        144
Tangent                       222
Arctangent                    182
Average speed improvement:    162
Improvements of particular operations

More details are available in the core documentation.

The table below shows performance improvements of the NIOS-II and DFPMU based system, compared to the same system without the DFPMU coprocessor.

Device                           Improvement
NIOS-II/s                        1.0
NIOS-II+DFPMU (arithmetic)       7.5
NIOS-II+DFPMU (trigonometric)    49.2
NIOS-II+DFPMU (overall)          28.3
General performance improvements
[Figure: General performance improvements (bar chart): NIOS-II/s = 1, NIOS-II+DFPMU (arithmetic) = 7.5, NIOS-II+DFPMU (trigonometric) = 49.2, NIOS-II+DFPMU (overall) = 28.3]

IEEE-754 FP Instruction       Improvement
Addition                      6.4
Subtraction                   6.5
Multiplication                5.1
Division                      6.5
Square Root                   12.9
Sine                          40.8
Cosine                        41.3
Tangent                       65.0
Arctangent                    49.6
Average speed improvement:    28.3
Improvements of particular operations

More details are available in the core documentation.

The following table shows performance improvements of a sample 32-bit RISC CPU with the DFPMU, compared to the same system without the DFPMU coprocessor.

Device                       Improvement
CPU                          1.0
CPU+DFPMU (arithmetic)       7.5
CPU+DFPMU (trigonometric)    49.2
CPU+DFPMU (overall)          28.3
General performance improvements

[Figure: General performance improvements (bar chart): CPU = 1, CPU+DFPMU (arithmetic) = 7.5, CPU+DFPMU (trigonometric) = 49.2, CPU+DFPMU (overall) = 28.3]

IEEE-754 FP Instruction       Improvement
Addition                      6.4
Subtraction                   6.5
Multiplication                5.1
Division                      6.5
Square Root                   12.9
Sine                          40.8
Cosine                        41.3
Tangent                       65.0
Arctangent                    49.6
Average speed improvement:    28.3
Improvements of particular operations

More details are available in the core documentation.

BLOCK DIAGRAM

[Figure: DFPMU block diagram. Blocks: Interface, Mantissa, Exponent, Align, Shifter, CORDIC, Control Unit. Signals: datai(31:0) (1), datao(31:0) (1), irq, addr(4:2) (2), we, cs, clk, rst.]

PERFORMANCE

The following table gives a survey of the Core area and performance in LATTICE® devices, after Place & Route (all key features included):

Device     Speed grade   LUTs/PFUs    Fmax
ORCA 4     -3            4464/711     30 MHz
ispXPGA    -5            5327/1393    42 MHz
Core performance in LATTICE® devices

CONTACT

Digital Core Design
Headquarters: Wroclawska 94, 41-902 Bytom, POLAND
e-mail: [email protected]
tel.: 0048 32 282 82 66
fax: 0048 32 282 74 37
Distributors: Please check: http://dcd.pl/sales