LATTICE Datasheet

2016
DFPMU IP Core
Floating Point Coprocessor v. 2.09
COMPANY OVERVIEW
Digital Core Design is a leading IP Core provider and a System-on-Chip design house. The company was founded in 1999 and since the very beginning has been focused on IP Core architecture improvements. Our innovative, silicon proven solutions have been employed by over 300 customers, with more than 500 licenses sold to companies like Intel, Siemens, Philips, General Electric, Sony and Toyota. Based on more than 70 different architectures, ranging from serial interfaces to advanced microcontrollers and SoCs, we design solutions tailored to your needs.
KEY FEATURES
● Direct replacement for C float software operations such as +, -, *, /, ==, !=, >=, <=, <, >
● Configurability of all available functions
● C interface supplied for all popular compilers: GNU C/C++ and 8051 compilers
● No programming required
● IEEE-754 single precision real format support – float type
● 16-bit word and 32-bit short integer formats supported – integer types
● Flexible location of argument and result registers
● Performs the following functions:
  ○ FADD, FSUB – addition, subtraction
  ○ FMUL, FDIV – multiplication, division
  ○ FSQRT – square root
  ○ FCHS, FABS – change of sign, absolute value
  ○ FXAM – examine input data
  ○ FUCOM – comparison
  ○ FSIN, FCOS – sine, cosine
  ○ FTAN – tangent
  ○ FATAN – arctangent
  ○ FILDW, FILD – 16-bit, 32-bit integer to float
  ○ FISTW, FIST – float to 16-bit, 32-bit integer
● Built-in exception routines
● Individually maskable exception indicators:
  ○ Precision loss – PE
  ○ Underflow result – UE
  ○ Overflow result – OE
  ○ Invalid operand – IE
  ○ Division by zero – ZE
  ○ Denormal operand – DE
● Fully configurable
● Fully synthesizable, static synchronous design with no internal tri-states
IP CORE OVERVIEW
The DFPMU is a Floating Point Coprocessor designed to assist a CPU in performing floating point mathematical computations. The DFPMU directly replaces C software functions with equivalent, very fast hardware operations, which significantly accelerates system performance. It doesn’t require any programming, so no modifications need to be made to the main software: everything is done automatically during software compilation by the DFPMU C driver. The DFPMU was designed to operate with DCD’s DP8051, but it can also operate with any other 8-, 16- or 32-bit processor. Drivers for all popular 8051 C compilers are delivered together with the DFPMU package.
The DFPMU uses specialized CORDIC and standard algorithms to compute math functions. It supports addition, subtraction, multiplication, division, square root, comparison, absolute value, sign change and the trigonometric functions sine, cosine, tangent and arctangent. It has built-in instructions for conversion from integer to floating point type and vice versa. The input number format is compliant with the IEEE-754 standard. The DFPMU supports single precision real numbers as well as 16-bit and 32-bit integers. Each floating point function can be turned on or off at configuration level, which makes the DFPMU module flexibly scalable. This saves silicon area and provides exactly the configuration a given application requires. The DFPMU is a technology independent design that can be implemented in a variety of process technologies.
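The DFPMU works on IEEE-754 single precision operands, and its instruction set includes FXAM (examine input data) alongside a denormal-operand (DE) indicator. As a software reference only, the sketch below splits a float into the sign, exponent and mantissa fields that such hardware operates on, and sorts it into zero, denormal, normal, infinity or NaN. The function names and class labels are hypothetical: they are not the DFPMU driver API, and the actual FXAM result encoding is not specified in this datasheet.

```python
import struct


def split_float32(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, fraction) of x rounded to IEEE-754 single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # raw 32-bit pattern
    sign = bits >> 31                  # bit 31
    exponent = (bits >> 23) & 0xFF     # bits 30..23, biased by 127
    fraction = bits & 0x7FFFFF         # bits 22..0; the leading 1 of normal numbers is implicit
    return sign, exponent, fraction


def examine(x: float) -> str:
    """Classify x into the categories an examine step distinguishes (labels are hypothetical)."""
    _, exponent, fraction = split_float32(x)
    if exponent == 0:
        return "zero" if fraction == 0 else "denormal"
    if exponent == 0xFF:
        return "infinity" if fraction == 0 else "nan"
    return "normal"


if __name__ == "__main__":
    print(split_float32(-6.5))  # (1, 129, 5242880): -1.625 * 2**2, fraction 0x500000
    print(examine(1e-40))       # denormal: below the smallest normal single (~1.18e-38)
```

Working from the raw bit pattern mirrors how a hardware unit sees an operand: the exponent field alone separates the special cases (all zeros, all ones) from ordinary normal numbers.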
LICENSING
Comprehensible and clearly defined licensing methods without royalty-per-chip fees make using our IP Cores easy and simple.
Single-Site license option – intended for small and medium-sized companies that run their business in one place.
Multi-Site license option – intended for corporate customers that operate at several locations. The licensed product can be used in selected company branches.
In all cases, the number of IP Core instantiations within a project and the number of manufactured chips are unlimited. The license is free of royalty-per-chip fees, and there are no restrictions regarding the time of use.
There are two formats of the delivered IP Core:
● VHDL or Verilog RTL synthesizable source code, called HDL Source code
● FPGA EDIF/NGO/NGD/QXP/VQM, called Netlist
APPLICATIONS
● Math coprocessors
● DSP algorithms
● Embedded arithmetic coprocessor
● Fast data processing & control
Copyright © 1999-2016 DCD – Digital Core Design. All Rights Reserved.
All trademarks mentioned in this document are the property
of their respective owners.
DELIVERABLES
♦ Source code:
  ● VHDL Source Code and/or
  ● VERILOG Source Code and/or
  ● Encrypted or plain text EDIF
♦ VHDL & VERILOG test bench environment
  ● Active-HDL automatic simulation macros
  ● ModelSim automatic simulation macros
  ● Tests with reference responses
♦ Technical documentation
  ● Installation notes
  ● HDL core specification
  ● Datasheet
♦ Synthesis scripts
♦ Example application
♦ Technical support
  ● IP Core implementation support
  ● 3 months maintenance
  ● Delivery of IP Core and documentation updates, minor and major version changes
  ● Phone & email support
SYMBOL
DFPMU symbol – inputs: clk, rst, cs, we, addr(4:2)², datai(31:0)¹; outputs: datao(31:0)¹, irq
PINS DESCRIPTION
PIN             TYPE      DESCRIPTION
clk             Input     Global system clock
rst             Input     Global system reset
cs              Input     Chip select for read/write
datai[31:0]¹    Input     Data bus input
addr[4:2]²      Input     Register address to read/write
we              Input     Data write enable
datao[31:0]¹    Output    Data bus output
irq             Output    Interrupt request indicator
¹ The data bus can be configured as 8-, 16- or 32-bit, depending on the processor’s bus size.
² The address bus is aligned to work with 8-bit (3:0), 16-bit (3:1) or 32-bit (4:2) processors.
IMPROVEMENTS
The tables below illustrate the performance improvements of a system with the DFPMU for two typical CPUs. The performance of the DFPMU floating point instructions has been compared with the standard C library functions delivered with every commercial C compiler. Each program was executed in the same system environment. The number of clock periods was measured from loading the input data into the work registers to storing the output result after the operation. Improvement has been computed as the number of clock cycles required by the CPU to compute an FP operation, divided by the number of clock cycles required by the CPU with the DFPMU to compute the same operation:
$$\text{Improvement} = \frac{\text{CPU}_{\text{clocks}}}{\text{(CPU+DFPMU)}_{\text{clocks}}}$$
The following table compares the DP8051+DFPMU performance with the standard 8051 microcontroller.
Device            Improvement
80C51             1.0
DP8051            7.3
DP8051+DFPMU      162.0
General performance improvements
IEEE-754 FP Instruction       Improvement
Addition                      73
Subtraction                   60
Multiplication                65
Division                      182
Square Root                   392
Sine                          139
Cosine                        144
Tangent                       222
Arctangent                    182
Average speed improvement:    162
Improvements of particular operations
More details are available in the core documentation.
The table below shows the performance improvements of a NIOS-II system with the DFPMU, compared with the same system without the DFPMU coprocessor.
Device                            Improvement
NIOS-II/s                         1.0
NIOS-II+DFPMU (arithmetic)        7.5
NIOS-II+DFPMU (trigonometric)     49.2
NIOS-II+DFPMU (overall)           28.3
General performance improvements
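The summary figures can be cross-checked against the per-operation tables. The datasheet does not state how its averages are formed; assuming plain arithmetic means, the DP8051 figure of 162 matches the mean of its nine per-operation values, while the NIOS-II arithmetic (7.5) and trigonometric (49.2) figures match the means of the five arithmetic and four trigonometric operations in the NIOS-II per-operation table. The overall 28.3 then matches the mean of those two group averages, not the mean of all nine values (about 26.0). The grouping of operations below is an inference from the numbers, and the helper name is hypothetical.

```python
from statistics import mean

# Per-operation improvement factors as listed in this datasheet.
DP8051 = {"Addition": 73, "Subtraction": 60, "Multiplication": 65,
          "Division": 182, "Square Root": 392, "Sine": 139,
          "Cosine": 144, "Tangent": 222, "Arctangent": 182}
NIOS = {"Addition": 6.4, "Subtraction": 6.5, "Multiplication": 5.1,
        "Division": 6.5, "Square Root": 12.9, "Sine": 40.8,
        "Cosine": 41.3, "Tangent": 65.0, "Arctangent": 49.6}

# Assumed split into the "arithmetic" and "trigonometric" groups of the summary table.
ARITHMETIC = ["Addition", "Subtraction", "Multiplication", "Division", "Square Root"]
TRIGONOMETRIC = ["Sine", "Cosine", "Tangent", "Arctangent"]


def group_averages(table):
    """Return (arithmetic, trigonometric, overall); overall is the mean of the two group means."""
    arith = mean(table[k] for k in ARITHMETIC)
    trig = mean(table[k] for k in TRIGONOMETRIC)
    return arith, trig, (arith + trig) / 2


if __name__ == "__main__":
    print(round(mean(DP8051.values()), 1))               # 162.1
    print([round(v, 1) for v in group_averages(NIOS)])   # [7.5, 49.2, 28.3]
    print(round(mean(NIOS.values()), 1))                 # 26.0
```

The difference between 28.3 and 26.0 shows why the averaging method matters when comparing summary figures: weighting the two groups equally gives the four trigonometric operations more influence than a plain mean over all nine would.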
IEEE-754 FP Instruction       Improvement
Addition                      6.4
Subtraction                   6.5
Multiplication                5.1
Division                      6.5
Square Root                   12.9
Sine                          40.8
Cosine                        41.3
Tangent                       65.0
Arctangent                    49.6
Average speed improvement:    28.3
Improvements of particular operations – 32-bit NIOS-II/s with DFPMU
The following table shows the performance improvements of a sample 32-bit RISC CPU with the DFPMU, compared with the same system without the DFPMU coprocessor.
Device                         Improvement
CPU                            1.0
CPU+DFPMU (arithmetic)         7.5
CPU+DFPMU (trigonometric)      49.2
CPU+DFPMU (overall)            28.3
General performance improvements
BLOCK DIAGRAM
DFPMU block diagram – Interface, Control Unit, Mantissa, Exponent, Align, Shifter and CORDIC blocks; external signals datai(31:0)¹, datao(31:0)¹, addr(4:2)², we, cs, irq, clk, rst
More details are available in the core documentation.
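The DFPMU is described as using CORDIC, and its block diagram includes a dedicated CORDIC block. The sketch below is the textbook rotation-mode CORDIC for sine and cosine, included only to show why the technique suits hardware: each iteration needs just a shift by 2^-i, additions, and one entry from a small table of atan(2^-i) constants. It is not DCD’s implementation; the iteration count of 24 (roughly the 24-bit significand of a single precision float), the [-pi/2, pi/2] input range and the function name are assumptions of this sketch.

```python
import math


def cordic_sin_cos(theta: float, iterations: int = 24) -> tuple[float, float]:
    """Rotation-mode CORDIC: return (sin(theta), cos(theta)) for |theta| <= pi/2."""
    if abs(theta) > math.pi / 2:
        raise ValueError("theta must lie in [-pi/2, pi/2]")
    # Constant table of elementary rotation angles atan(2^-i) (a small ROM in hardware).
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i); precompute the inverse gain.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    # Start from (gain, 0) so the final vector has unit length without a closing multiply.
    x, y, z = gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0                             # rotate toward residual angle 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i    # shift-and-add step
        z -= d * angles[i]
    return y, x


if __name__ == "__main__":
    s, c = cordic_sin_cos(math.pi / 6)
    print(f"sin(pi/6) ~ {s:.6f}, cos(pi/6) ~ {c:.6f}")
```

Folding the gain correction into the starting value of x is a common design choice: it removes the final multiplication, leaving only shifts, adds and table lookups inside the loop. The residual angle after n iterations is bounded by roughly atan(2^-(n-1)), which is why the result accuracy grows by about one bit per iteration.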
PERFORMANCE
The following table summarizes the core area and performance in LATTICE® devices after Place & Route (all key features included):
Device      Speed grade    LUTs/PFUs     Fmax
ORCA 4      -3             4464/711      30 MHz
ispXPGA     -5             5327/1393     42 MHz
Core performance in LATTICE® devices
CONTACT
Digital Core Design Headquarters:
Wroclawska 94, 41-902 Bytom, POLAND
e-mail: [email protected]
tel.: 0048 32 282 82 66
fax: 0048 32 282 74 37
Distributors:
Please check: http://dcd.pl/sales