
C Fast RTS Library
User Guide (Rev 1.0)
Revision History
v. 1.0, 22 Sep 2008: Initial Revision
IMPORTANT NOTICE
Texas Instruments and its subsidiaries (TI) reserve the right to make changes to their products or to discontinue any
product or service without notice, and advise customers to obtain the latest version of relevant information to verify, before
placing orders, that information being relied on is current and complete. All products are sold subject to the terms and
conditions of sale supplied at the time of order acknowledgment, including those pertaining to warranty, patent
infringement, and limitation of liability.
TI warrants performance of its products to the specifications applicable at the time of sale in accordance with TI’s
standard warranty. Testing and other quality control techniques are utilized to the extent TI deems necessary to support
this warranty. Specific testing of all parameters of each device is not necessarily performed, except those mandated by
government requirements.
Customers are responsible for their applications using TI components.
In order to minimize risks associated with the customer’s applications, adequate design and operating safeguards must be
provided by the customer to minimize inherent or procedural hazards.
TI assumes no liability for applications assistance or customer product design. TI does not warrant or represent that any
license, either express or implied, is granted under any patent right, copyright, mask work right, or other intellectual
property right of TI covering or relating to any combination, machine, or process in which such products or services might
be or are used. TI’s publication of information regarding any third party’s products or services does not constitute TI’s
approval, license, warranty or endorsement thereof.
Reproduction of information in TI data books or data sheets is permissible only if reproduction is without alteration and is
accompanied by all associated warranties, conditions, limitations and notices. Representation or reproduction of this
information with alteration voids all warranties provided for an associated TI product or service, is an unfair and deceptive
business practice, and TI is neither responsible nor liable for any such use.
Resale of TI’s products or services with statements different from or beyond the parameters stated by TI for that product
or service voids all express and any implied warranties for the associated TI product or service, is an unfair and deceptive
business practice, and TI is not responsible nor liable for any such use.
Also see: Standard Terms and Conditions of Sale for Semiconductor Products. www.ti.com/sc/docs/stdterms.htm
Mailing Address:
Texas Instruments
Post Office Box 655303
Dallas, Texas 75265
Copyright © 2008, Texas Instruments Incorporated
Contents
1 Introduction
1.1 Introduction
1.2 Release package and directory structure
1.3 FastRTS C functions
1.4 Macros provided
1.5 Usage
1.6 Comparison between “FastRTS” and “C FastRTS”
2 Function Descriptions
2.1 addsp_i: Single precision floating-point addition
2.2 subsp_i: Single precision floating point subtraction
2.3 uintsp_i: Convert 32-bit unsigned integer to single precision floating point
2.4 intsp_i: Convert 32-bit signed integer to single-precision floating-point
2.5 mpysp_i: Single precision floating-point multiplication
2.6 recipsp_i: Single precision floating point reciprocal
2.7 spint_i: Single precision floating point to 32-bit signed integer
2.8 spuint_i: Single precision floating point to 32-bit unsigned integer
3 Benchmarks
3.1 C64x and C64x+ FastRTS C Library Benchmarks
4 Flow Charts
4.1 Single Precision Addition (addsp_i)
4.2 Single Precision Subtraction (subsp_i)
4.3 Single Precision Multiplication (mpysp_i)
4.4 Single Precision Division (divsp_i)
4.5 Single Precision Reciprocal (recipsp_i)
Figures
Figure 1: Directory structure
Figure 2: addsp_i
Figure 3: subsp_i
Figure 4: mpysp_i
Figure 5: divsp_i
Figure 6: recipsp_i
Tables
Table 1: FastRTS C functions
Table 2: Function Performance
1 Introduction
1.1 Introduction
The C62x/C64x/C64x+ FastRTS C library is an optimized floating-point function library.
It provides C implementations of a subset of the functions available in the FastRTS
library. Because the code is written in C, the compiler can inline these functions, which
can improve performance significantly. To learn more about inlining, refer to SPRU187
(TMS320C6000 Optimizing Compiler User's Guide).
1.2 Release package and directory structure
The C package is released as part of the FastRTS library. The directory structure of the
release package is shown in Figure 1.
Figure 1: Directory structure
1.3 FastRTS C functions
Table 1: FastRTS C functions
Function     Description
addsp_i      Single precision floating point addition
divsp_i      Single precision floating point division
intsp_i      32-bit signed integer to single precision floating point number
mpysp_i      Single precision floating point multiplication
recipsp_i    Single precision floating point reciprocal
spint_i      Single precision floating point number to 32-bit signed integer
spuint_i     Single precision floating point number to 32-bit unsigned integer
sqrtsp_i     Single precision floating point square root
subsp_i      Single precision floating point subtraction
uintsp_i     32-bit unsigned integer to single precision floating point number
1.4 Macros provided:
There are two macros used in the code.
• DEBUG – This macro enables the underflow and overflow checks in the code. See the
flow charts and the individual function descriptions for details.
• INLINE_C – This macro enables inlining of the C FastRTS functions.
1.5 Usage:
Follow these steps to use the C FastRTS library:
• Include the “fastrts_i.h” file in your source files.
• Call the appropriate functions in your code.
• Define the macros above as required.
• The rest of the build process is unchanged.
An example project demonstrating the use of the C fast RTS library is provided in the
release.
The C library works for all TI C6x architectures, namely the C62x, the C64x and the
C64x+. Appropriate code for a particular architecture is generated based on the compiler
options selected by the user.
1.6 Comparison between “FastRTS” and “C FastRTS”
The FastRTS library is written in optimized assembly to get maximum performance. The
drawback is that, because the kernels are written in assembly, the compiler cannot inline
them. The FastRTS C library is written entirely in C, so the compiler can inline the kernels
to get the maximum benefit. Unlike the RTS library, both the FastRTS library and the
FastRTS C library trade some accuracy for better performance. These compromises include
omitting underflow and overflow checks; for most use cases, the resulting accuracy loss is
acceptable. Unlike the FastRTS library, the FastRTS C library includes the code for these
checks, guarded by the DEBUG macro. Enable this macro for debugging only, because it
reduces performance.
2 Function Descriptions
2.1 addsp_i: Single precision floating-point addition
Syntax: float addsp_i(float x, float y)
Defined in: addsp_i.h
Description: The sum of two 32-bit floating-point numbers is generated
Special Cases:
• If both inputs are zero, the output is zero
• Underflow and overflow are checked only in DEBUG mode
2.2 subsp_i: Single precision floating point subtraction
Syntax: float subsp_i(float x, float y)
Defined in: subsp_i.h
Description: The difference of two single precision floating point numbers is generated
Special Cases:
• Underflow and overflow are checked only in DEBUG mode
2.3 uintsp_i: Convert 32-bit unsigned integer to single precision floating point
Syntax: float uintsp_i(unsigned int x)
Description: A 32-bit unsigned integer is converted to a single precision floating point number
divsp_i: Single-precision floating-point division
Syntax: float divsp_i(float x, float y)
Defined in: divsp_i.h
Description: The quotient for division of two 32-bit floating-point numbers is generated
Special Cases:
• Underflow and overflow of the quotient are checked only in DEBUG mode
• Zero divided by zero returns NaN (1.#NAN)
• A non-zero value divided by zero returns infinity
2.4 intsp_i: Convert 32-bit signed integer to single-precision floating-point
Syntax: float intsp_i(int x)
Defined in: intsp_i.h
Description: An input 32-bit signed integer is converted to a 32-bit single precision floating point
number
2.5 mpysp_i: Single precision floating-point multiplication
Syntax: float mpysp_i(float x, float y)
Defined in: mpysp_i.h
Description: The product of two 32-bit floating point numbers is generated
2.6 recipsp_i: Single precision floating point reciprocal
Syntax: float recipsp_i(float x)
Defined in: recipsp_i.h
Description: The reciprocal of an input 32-bit floating point number is generated
Special Cases:
• Underflow and overflow are checked only in DEBUG mode
• The reciprocal of zero is infinity
2.7 spint_i: Single precision floating point to 32-bit signed integer
Syntax: int spint_i(float x)
Defined in: spint_i.h
Description: A single precision floating point number is converted to a 32-bit signed integer
2.8 spuint_i: Single precision floating point to 32-bit unsigned integer
Syntax: unsigned int spuint_i(float x)
Defined in: spuint_i.h
Description: A single precision floating point number is converted to 32-bit unsigned integer
Special Cases:
• Numbers less than 1.0 return zero
• Results that do not fit in 32 bits saturate to:
o 0xffff_ffff for positive numbers
o 0x0000_0000 for negative numbers
3 Benchmarks
3.1 C64x and C64x+ FastRTS C Library Benchmarks
Table 2 gives sample execution cycle counts. The times in columns 3 and 5 (function call)
include the overhead of the function call. The benchmarks were taken using the TMS320C64x+
simulator (little endian) with a flat memory architecture and no memory overheads. The code
has been tested for a large number of inputs.
Table 2: Function Performance
Execution cycles, FastRTS optimized C
             C64x            C64x        C64x+           C64x+
Function     Inlined and     Function    Inlined and     Function
             pipelined       call        pipelined       call
addsp_i      21.17           37.12       11.33           36
subsp_i      21.18           38.12       11.33           37.12
mpysp_i      6.041           31.012      5.03            27.01
divsp_i      17.08           63.012      17.08           62.01
recipsp_i    15.07           62.012      15.08           60.01
intsp_i      4.025           22.012      4.02            22.01
spint_i      7.032           20.012      6.01            22.01
spuint_i     8.027           22.012      8.02            22.01
sqrtsp_i     -               559.75      -               545.15
uintsp_i     3.26            16.12       3.21            16.12
*Compiler version used for benchmarking is v6.0.18
4 Flow Charts
4.1 Single Precision Addition (addsp_i):
• If Op1 and Op2 are both 0, set the ZERO FLAG. Otherwise, extract the exponent, the
fraction and the sign, insert the hidden bit, and take the 2’s complement of any operand
that is negative (Op < 0).
• If the ZERO FLAG is set, make the exponent and fraction of the result 0.
• If the ZERO FLAG is not set, shift the fractions to align the radix point and add, then
round and normalize the result (24 bits only). In DEBUG mode, check for overflow and
underflow.
• Assemble the result and return.
Figure 2: addsp_i
4.2 Single Precision Subtraction (subsp_i):
• If Op1 and Op2 are both 0, set the ZERO FLAG. Otherwise, take the 2’s complement of
Op2, extract the exponent, the fraction and the sign, insert the hidden bit, and take the
2’s complement of any operand that is negative (Op < 0).
• If the ZERO FLAG is set, make the exponent and fraction of the result 0.
• If the ZERO FLAG is not set, shift the fractions to align the radix point and add, then
round and normalize the result (24 bits only). In DEBUG mode, check for overflow and
underflow.
• Assemble the result and return.
Figure 3: subsp_i
4.3 Single Precision Multiplication (mpysp_i):
• If either Op1 or Op2 is 0, set the ZERO FLAG. Otherwise, extract the exponent, the
fraction and the sign, insert the hidden bit, and perform a 32-bit multiplication.
• If the ZERO FLAG is set, make the exponent and fraction of the result 0. Otherwise,
round and normalize the result (24 bits only); in DEBUG mode, check for overflow and
underflow.
• Assemble the result and return.
Figure 4: mpysp_i
4.4 Single Precision Division (divsp_i):
• If Op2 is 0, set the INFINITY FLAG. If Op1 is 0, set the ZERO FLAG.
• If the ZERO FLAG is set, make the result 0. Otherwise, extract the exponent, the
fraction and the sign, insert the hidden bit, and perform the division by repeated
subtraction in a loop.
• If the INFINITY FLAG is set, make the result INFNAN (infinity). Otherwise, round and
normalize the result (24 bits only); in DEBUG mode, check for overflow and underflow.
• If both flags are set, make the result NaN.
• Assemble the result and return.
Figure 5: divsp_i
4.5 Single Precision Reciprocal (recipsp_i):
• Set Op1 = 1, so that the reciprocal is computed as Op1 / Op2.
• If Op2 is 0, set the INFINITY FLAG. Otherwise, extract the exponent, the fraction and
the sign, and insert the hidden bit.
• If the INFINITY FLAG is set, make the result INFNAN (infinity). Otherwise, perform the
division by repeated subtraction in a loop, then round and normalize the result (24 bits
only); in DEBUG mode, check for overflow and underflow.
• Assemble the result and return.
Figure 6: recipsp_i