ALTERA Datasheet

2016
DFPAU IP Core
Floating Point Arithmetic Coprocessor v. 2.09
COMPANY OVERVIEW
Digital Core Design is a leading IP Core provider and a System-on-Chip design house. The company was founded in 1999 and has focused on IP Core architecture improvements from the very beginning. Our innovative, silicon proven solutions have been employed by over 300 customers, with more than 500 licenses sold to companies like Intel, Siemens, Philips, General Electric, Sony and Toyota. Based on more than 70 different architectures, ranging from serial interfaces to advanced microcontrollers and SoCs, we design solutions tailored to your needs.
IP CORE OVERVIEW
The DFPAU is a Floating Point Arithmetic Coprocessor, designed to assist the CPU in performing floating point arithmetic computations. It directly replaces C software functions with equivalent, very fast hardware operations, which significantly accelerates system performance. It requires no programming, so no modifications to the main software are needed: everything is done automatically during software compilation by the DFPAU C driver. The DFPAU was designed to operate with DCD's DP8051, but can also operate with any other 8-, 16- or 32-bit processor. Drivers for all popular 8051 C compilers are delivered together with the DFPAU package.
The DFPAU uses specialized algorithms to compute arithmetic functions. It supports addition, subtraction, multiplication, division, square root, comparison, absolute value and change of sign. The input number format complies with the IEEE-754 standard for single precision real numbers. Trigonometric functions are supported indirectly, because software subroutines compute them as a set of add, multiply and divide operations. Each floating point function can be turned on or off at configuration level, which makes the DFPAU module scalable: disabled functions save silicon area, and the core can match the exact configuration a given application requires.
KEY FEATURES
● Direct replacement for C float software functions such as: +, -, *, /, ==, !=, >=, <=, <, >
● C interface supplied for all popular compilers: GNU C/C++, 8051 compilers
● No programming required
● Configurability of all available functions
● IEEE-754 single precision real format support – float type
● Flexible arguments and result registers location
● Performs the following functions:
  ○ FADD, FSUB – addition, subtraction
  ○ FMUL, FDIV – multiplication, division
  ○ FSQRT – square root
  ○ FCHS, FABS – change of sign, absolute value
  ○ FXAM – examine input data
  ○ FUCOM – comparison
● Exceptions built-in routines
● Masks each exception indicator:
  ○ Precision loss (PE)
  ○ Underflow result (UE)
  ○ Overflow result (OE)
  ○ Invalid operand (IE)
  ○ Division by zero (ZE)
  ○ Denormal operand (DE)
● Fully configurable
● Fully synthesizable, static synchronous design with no internal tri-states
DELIVERABLES
♦ Source code:
  ● VHDL Source Code or/and
  ● VERILOG Source Code or/and
  ● Encrypted, or plain text EDIF
♦ VHDL & VERILOG test bench environment
  ● Active-HDL automatic simulation macros
  ● ModelSim automatic simulation macros
  ● Tests with reference responses
♦ Technical documentation
  ● Installation notes
  ● HDL core specification
  ● Datasheet
♦ Synthesis scripts
♦ Example application
♦ Technical support
  ● IP Core implementation support
  ● 3 months maintenance
  ● Delivery of the IP Core and documentation updates, minor and major versions changes
  ● Phone & email support
LICENSING
Clearly defined licensing methods without royalty-per-chip fees make using our IP Cores easy and simple.
Single-Site license option – dedicated to small and medium-sized companies that run their business in one location.
Multi-Site license option – dedicated to corporate customers who operate at several locations. The licensed product can be used in selected company branches.
In all cases the number of IP Core instantiations within a project and the number of manufactured chips are unlimited. The license is royalty-per-chip free. There are no restrictions regarding the time of use.
There are two formats of the delivered IP Core:
● VHDL or Verilog RTL synthesizable source code, called HDL Source code
● FPGA EDIF/NGO/NGD/QXP/VQM, called Netlist
Copyright © 1999-2016 DCD – Digital Core Design. All Rights Reserved.
All trademarks mentioned in this document are the property
of their respective owners.
APPLICATIONS
● Math coprocessors
● DSP algorithms
● Embedded arithmetic coprocessor
● Fast data processing & control
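SYMBOL
[Figure: DFPAU symbol. Inputs: clk, rst, cs, we, addr(4:2), datai(31:0). Outputs: datao(31:0), irq.]
BLOCK DIAGRAM
[Figure: DFPAU block diagram. Blocks: Interface, Mantissa, Exponent, Align, Shifter, Control Unit. Ports: clk, rst, cs, we, addr(4:2), datai(31:0), datao(31:0), irq.]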
PINS DESCRIPTION
PIN              TYPE     DESCRIPTION
clk              Input    Global system clock
rst              Input    Global system reset
cs               Input    Chip select for read/write
datai[31:0] (1)  Input    Data bus input
addr[4:2] (2)    Input    Register address to read/write
we               Input    Data write enable
datao[31:0] (1)  Output   Data bus output
int              Output   Interrupt request indicator
(1) The data bus can be configured as 8-, 16- or 32-bit, depending on the processor's bus size.
(2) The address bus is aligned to work with 8-bit (3:0), 16-bit (3:1) or 32-bit (4:2) processors.
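The pin list above suggests a small memory-mapped register file: on a 32-bit bus, addr(4:2) selects one of up to eight 32-bit words, and datai/datao carry the operand and result bit patterns. The C sketch below shows how a driver might move a float through such an interface. The register names and indices are hypothetical (this datasheet does not publish the register map), and `base` stands for whatever address the integrator maps the core to.

```c
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL register indices: assumptions for illustration only,
 * not taken from the DFPAU documentation. */
enum {
    DFPAU_REG_ARG_A  = 0,
    DFPAU_REG_ARG_B  = 1,
    DFPAU_REG_RESULT = 2
};

/* On a 32-bit bus the register index drives addr(4:2), so the byte
 * offset seen by the CPU is the index shifted left by two. */
static inline uint32_t dfpau_byte_offset(unsigned reg_index)
{
    return (uint32_t)reg_index << 2;
}

/* Reinterpret a float as its IEEE-754 single precision bit pattern.
 * memcpy avoids the undefined behavior of pointer type punning. */
static inline uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static inline float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Word-wide register write and read; 'base' is indexed in 32-bit words. */
static inline void dfpau_write(volatile uint32_t *base, unsigned reg, uint32_t value)
{
    base[reg] = value;
}

static inline uint32_t dfpau_read(volatile uint32_t *base, unsigned reg)
{
    return base[reg];
}
```

On real hardware `base` would be a fixed peripheral address; passing it as a parameter also lets the same functions be exercised against a plain array on a host machine, for example `dfpau_write(regs, DFPAU_REG_ARG_A, float_to_bits(1.5f))`.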
UNITS SUMMARY
Mantissa – performs operations on the mantissa part of a number. The addition, subtraction, multiplication, division, square root, comparison and conversion operations are executed in this module. It contains the mantissa and work registers.
Exponent – performs operations on the exponent part of a number. The addition, subtraction, shifting, comparison and conversion operations are executed in this module. It contains the exponent and work registers.
Align – analyzes input numbers for compliance with the IEEE-754 standard. Information about the data classes is passed as a result to the appropriate internal module.
Shifter – performs mantissa shifting during normalization and denormalization operations. Information about shifted-out bits is stored for the rounding process.
Control Unit – manages execution of all instructions and the internal operations required to execute a particular function.
Interface – provides the interface between an external device and the DFPAU internal 32-bit modules. It contains data, control and status registers. It can be configured to work with 8-, 16- and 32-bit processors.
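The Mantissa, Exponent and Align units described above operate on the separate fields of an IEEE-754 single precision number. As a rough software analogue, the C sketch below splits a float into sign, exponent and mantissa and derives a data class from them. The type and function names are illustrative assumptions; the class set follows the IEEE-754 encoding rules and is not a description of the DFPAU's internal signals.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative names, not DFPAU identifiers. */
typedef enum {
    SP_ZERO,
    SP_DENORMAL,
    SP_NORMAL,
    SP_INFINITY,
    SP_NAN
} sp_class_t;

typedef struct {
    uint32_t sign;      /* 1 bit                         */
    uint32_t exponent;  /* 8 bits, biased by 127         */
    uint32_t mantissa;  /* 23 bits, hidden bit not stored */
} sp_fields_t;

/* Split a single precision value into its three bit fields. */
static inline sp_fields_t sp_split(float f)
{
    uint32_t bits;
    sp_fields_t r;
    memcpy(&bits, &f, sizeof bits);
    r.sign     = bits >> 31;
    r.exponent = (bits >> 23) & 0xFFu;
    r.mantissa = bits & 0x7FFFFFu;
    return r;
}

/* Classify a value from its exponent and mantissa fields. */
static inline sp_class_t sp_classify(float f)
{
    sp_fields_t x = sp_split(f);
    if (x.exponent == 0u)
        return (x.mantissa == 0u) ? SP_ZERO : SP_DENORMAL;
    if (x.exponent == 0xFFu)
        return (x.mantissa == 0u) ? SP_INFINITY : SP_NAN;
    return SP_NORMAL;
}
```

For example, `sp_split(-2.0f)` yields sign 1, biased exponent 128 and mantissa 0, and `sp_classify(1e-40f)` reports a denormal because the value lies below the smallest normal single precision number.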
IMPROVEMENTS
The tables below illustrate the performance improvements of a system with the DFPAU, for two typical CPUs. The performance of the DFPAU floating point instructions has been compared to the standard C library functions delivered with every commercial C compiler. Each program was executed in the same system environment. The number of clock periods was measured from loading the input data into the work registers to storing the output result after the operation. The results are listed in the tables below. Improvement is computed as the number of clock cycles the CPU alone requires to compute an FP operation, divided by the number of clock cycles the system of CPU with the DFPAU requires for the same operation:
$$\text{improvement} = \frac{\text{CPU\_clocks}}{\text{(CPU+DFPAU)\_clocks}}$$
More details are available in the core documentation.
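The improvement ratio defined above is a plain quotient of two cycle counts. A minimal C sketch follows; the function name is an assumption, and the cycle counts used in the usage note are made-up placeholders, not measurements from this datasheet.

```c
/* improvement = CPU_clocks / (CPU+DFPAU)_clocks
 * Returns 0.0 when the divisor is zero, so the caller never divides by zero. */
static inline double dfpau_improvement(unsigned long cpu_clocks,
                                       unsigned long cpu_dfpau_clocks)
{
    if (cpu_dfpau_clocks == 0ul)
        return 0.0;
    return (double)cpu_clocks / (double)cpu_dfpau_clocks;
}
```

With hypothetical counts of 9100 cycles in software and 100 cycles with the coprocessor, `dfpau_improvement(9100, 100)` evaluates to 91.0.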
The following table compares the performance of the DP8051+DFPAU with a standard 8051 microcontroller.
Device          Improvement
80C51           1.0
DP8051          7.3
DP8051+DFPAU    91.0
General performance improvements
IEEE-754 FP Instruction      Improvement
Addition                     73
Subtraction                  60
Multiplication               65
Division                     182
Square Root                  392
Sine                         10
Cosine                       10
Tangent                      12
Arc Tangent                  17
Average speed improvement:   91
Improvements of particular operations
The table below shows performance improvements of the NIOS-II processor with DFPAU, compared to the same system without the DFPAU
coprocessor.
Device                          Improvement
NIOS-II/s                       1.0
NIOS-II+DFPAU (arithmetic)      7.5
NIOS-II+DFPAU (trigonometric)   5.9
NIOS-II+DFPAU (overall)         6.8
General performance improvements
IEEE-754 FP Instruction      Improvement
Addition                     6.4
Subtraction                  6.5
Multiplication               5.1
Division                     6.5
Square Root                  12.9
Sine                         5.2
Cosine                       5.4
Tangent                      5.8
Arc Tangent                  7.2
Average speed improvement:   6.8
Improvements of particular operations
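The summary figures in the improvement tables are consistent with a plain arithmetic mean of the per-instruction values. The datasheet does not state the averaging method, so treat that as an inference. The C sketch below recomputes the averages from the published per-instruction numbers; the array and function names are illustrative.

```c
#include <stddef.h>

/* Per-instruction improvements copied from the tables, in table order:
 * add, sub, mul, div, sqrt (arithmetic), then sin, cos, tan, atan. */
static const double DP8051_IMPROVEMENT[9] = {
    73.0, 60.0, 65.0, 182.0, 392.0, 10.0, 10.0, 12.0, 17.0
};
static const double NIOS_II_IMPROVEMENT[9] = {
    6.4, 6.5, 5.1, 6.5, 12.9, 5.2, 5.4, 5.8, 7.2
};

/* Arithmetic mean of n values; returns 0.0 for an empty range. */
static inline double mean_improvement(const double *values, size_t n)
{
    double sum = 0.0;
    size_t i;
    if (n == 0)
        return 0.0;
    for (i = 0; i < n; ++i)
        sum += values[i];
    return sum / (double)n;
}
```

Taking the first five NIOS-II entries gives about 7.48 (published as 7.5), the last four give about 5.9, and all nine give about 6.78 (published as 6.8). The nine DP8051 entries average to about 91.2, matching the published 91.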
The table below shows performance improvements of a sample 32-bit RISC CPU with the DFPAU, compared to the same system without the DFPAU coprocessor.
Device                       Improvement
CPU                          1.0
CPU+DFPAU (arithmetic)       7.5
CPU+DFPAU (trigonometric)    5.9
CPU+DFPAU (overall)          6.8
General performance improvements
IEEE-754 FP Instruction      Improvement
Addition                     6.4
Subtraction                  6.5
Multiplication               5.1
Division                     6.5
Square Root                  12.9
Sine                         5.2
Cosine                       5.4
Tangent                      5.8
Arc Tangent                  7.2
Average speed improvement:   6.8
Improvements of particular operations
PERFORMANCE
The following table summarizes the core area and performance in ALTERA® devices after Place & Route (all key features included):
Device        Speed grade   Logic Cells   Fmax
CYCLONE       -6            2410          91 MHz
CYCLONE-II    -6            2380+7xDSP    100 MHz
CYCLONE-III   -6            2359+7xDSP    105 MHz
CYCLONE-IV    -6            2364+7xDSP    111 MHz
STRATIX       -5            2210+8xDSP    115 MHz
STRATIX-II    -3            1660+8xDSP    169 MHz
STRATIX-III   -2            1655+4xDSP    240 MHz
STRATIX-IV    -2            1630+4xDSP    225 MHz
Core performance in ALTERA® devices
CONTACT
Digital Core Design Headquarters:
Wroclawska 94, 41-902 Bytom, POLAND
e-mail: [email protected]
tel.: 0048 32 282 82 66
fax: 0048 32 282 74 37
Distributors:
Please check: http://dcd.pl/sales