
To all our customers
Regarding the change of names mentioned in the document, such as Hitachi
Electric and Hitachi XX, to Renesas Technology Corp.
The semiconductor operations of Mitsubishi Electric and Hitachi were transferred to Renesas
Technology Corporation on April 1st 2003. These operations include microcomputer, logic, analog
and discrete devices, and memory chips other than DRAMs (flash memory, SRAMs etc.)
Accordingly, although Hitachi, Hitachi, Ltd., Hitachi Semiconductors, and other Hitachi brand
names are mentioned in the document, these names have in fact all been changed to Renesas
Technology Corp. Thank you for your understanding. Except for our corporate trademark, logo and
corporate statement, no changes whatsoever have been made to the contents of the document, and
these changes do not constitute any alteration to the contents of the document itself.
Renesas Technology Home Page: http://www.renesas.com
Renesas Technology Corp.
Customer Support Dept.
April 1, 2003
Cautions
Keep safety first in your circuit designs!
1. Renesas Technology Corporation puts the maximum effort into making semiconductor products better
and more reliable, but there is always the possibility that trouble may occur with them. Trouble with
semiconductors may lead to personal injury, fire or property damage.
Remember to give due consideration to safety when making your circuit designs, with appropriate
measures such as (i) placement of substitutive, auxiliary circuits, (ii) use of nonflammable material or
(iii) prevention against any malfunction or mishap.
Notes regarding these materials
1. These materials are intended as a reference to assist our customers in the selection of the Renesas
Technology Corporation product best suited to the customer's application; they do not convey any
license under any intellectual property rights, or any other rights, belonging to Renesas Technology
Corporation or a third party.
2. Renesas Technology Corporation assumes no responsibility for any damage, or infringement of any
third-party's rights, originating in the use of any product data, diagrams, charts, programs, algorithms, or
circuit application examples contained in these materials.
3. All information contained in these materials, including product data, diagrams, charts, programs and
algorithms represents information on products at the time of publication of these materials, and are
subject to change by Renesas Technology Corporation without notice due to product improvements or
other reasons. It is therefore recommended that customers contact Renesas Technology Corporation
or an authorized Renesas Technology Corporation product distributor for the latest product information
before purchasing a product listed herein.
The information described here may contain technical inaccuracies or typographical errors.
Renesas Technology Corporation assumes no responsibility for any damage, liability, or other loss
arising from these inaccuracies or errors.
Please also pay attention to information published by Renesas Technology Corporation by various
means, including the Renesas Technology Corporation Semiconductor home page
(http://www.renesas.com).
4. When using any or all of the information contained in these materials, including product data, diagrams,
charts, programs, and algorithms, please be sure to evaluate all information as a total system before
making a final decision on the applicability of the information and products. Renesas Technology
Corporation assumes no responsibility for any damage, liability or other loss resulting from the
information contained herein.
5. Renesas Technology Corporation semiconductors are not designed or manufactured for use in a device
or system that is used under circumstances in which human life is potentially at stake. Please contact
Renesas Technology Corporation or an authorized Renesas Technology Corporation product distributor
when considering the use of a product contained herein for any specific purposes, such as apparatus or
systems for transportation, vehicular, medical, aerospace, nuclear, or undersea repeater use.
6. The prior written approval of Renesas Technology Corporation is necessary to reprint or reproduce in
whole or in part these materials.
7. If these products or technologies are subject to the Japanese export control restrictions, they must be
exported under a license from the Japanese government and cannot be imported into a country other
than the approved destination.
Any diversion or reexport contrary to the export control laws and regulations of Japan and/or the
country of destination is prohibited.
8. Please contact Renesas Technology Corporation for further details on these materials or the products
contained therein.
SuperH RISC Engine
SH-DSP Software
Application Note
ADE-502-069
Rev. 1.0
9/21/1999
Hitachi, Ltd.
Cautions
1. Hitachi neither warrants nor grants licenses of any rights of Hitachi’s or any third party’s
patent, copyright, trademark, or other intellectual property rights for information contained in
this document. Hitachi bears no responsibility for problems that may arise with third party’s
rights, including intellectual property rights, in connection with use of the information
contained in this document.
2. Products and product specifications may be subject to change without notice. Confirm that you
have received the latest product standards or specifications before final design, purchase or
use.
3. Hitachi makes every attempt to ensure that its products are of high quality and reliability.
However, contact Hitachi’s sales office before using the product in an application that
demands especially high quality and reliability or where its failure or malfunction may directly
threaten human life or cause risk of bodily injury, such as aerospace, aeronautics, nuclear
power, combustion control, transportation, traffic, safety equipment or medical equipment for
life support.
4. Design your application so that the product is used within the ranges guaranteed by Hitachi
particularly for maximum rating, operating supply voltage range, heat radiation characteristics,
installation conditions and other characteristics. Hitachi bears no responsibility for failure or
damage when used beyond the guaranteed ranges. Even within the guaranteed ranges,
consider normally foreseeable failure rates or failure modes in semiconductor devices and
employ systemic measures such as fail-safes, so that the equipment incorporating Hitachi
product does not cause bodily injury, fire or other consequential damage due to operation of
the Hitachi product.
5. This product is not designed to be radiation resistant.
6. No one is permitted to reproduce or duplicate, in any form, the whole or part of this document
without written approval from Hitachi.
7. Contact Hitachi’s sales office for any questions regarding this document or Hitachi
semiconductor products.
Preface
The SH-DSP is a CPU core belonging to the SuperH RISC engine family. It is a 32-bit RISC
microcontroller based on the SH-2 CPU, optimized for signal processing performance, and
incorporating a DSP unit.
These application notes contain example code that makes use of the special features of the SH-DSP
as well as explanations of how to utilize the hardware. It is hoped that these application notes
will be of use to programmers designing applications that make use of the DSP functions.
Note that although the operation of the example code contained in these application notes has
been verified, it is still necessary to confirm its operation in an actual implementation.
For more information on the hardware, please refer to the hardware manual for the appropriate
product.
Please feel free to contact Hitachi for detailed information on development systems.
Rev.1.0, 09/99, page v of 7
SH-DSP Code Samples
These application notes contain example code written to illustrate the special features of the SH-DSP.
Figure 1 shows the format used for listings of source code in the application notes. The main
program code is transferred to XRAM and the program is executed in XRAM. This format is
compatible with the SH7612. When using other SH-DSP models, the following modifications and
cautions apply:
• XRAM starting address setting ..... (1)
• Vector and stack pointer (YRAM ending address + 1 byte) settings ..... (2)
• Usage of commands with other SH-DSP models ..... (3)
• Because space for the data used by the main program is reserved in XRAM or YRAM,
the XRAM or YRAM address settings must be changed to match the microcontroller used ..... (4)
;***************************************************************************
;*      Symbol definition
;***************************************************************************
;       [ XRAM address (SH7612) ]
XRAM_TOP:       .EQU    H'1000E000      ------------------------------- (1)
;***************************************************************************
;*      Program transfer routine
;***************************************************************************
        .SECTION VECT,CODE,LOCATE=H'0
        .DATA.L _PRES                   ;_PRES ------------------------ (2)
        .DATA.L H'10020000              ;SP
        .SECTION ROM,CODE,LOCATE=H'1000
_PRES:
        MOV.L   #XRAM_TOP,R1
        MOV.L   #MAIN,R10
        MOV.L   #MAIN_E,R11
PRG_MOVE:
        MOV.W   @R10+,R0
        MOV.W   R0,@R1
        ADD     #2,R1
        CMP/GE  R11,R10
        BF      PRG_MOVE
        MOV.L   #XRAM_TOP,R0
        JMP     @R0                     ;Branch to program starting address
        NOP                             ;at transfer destination

        Main program -------------------------------------------------- (3)

        Data ---------------------------------------------------------- (4)
        .END
Figure 1 Source Code Format
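The PRG_MOVE loop in figure 1 can be read as a word-by-word copy from ROM to XRAM. The following is a minimal C sketch of that loop, not part of the original listing. The function name prg_move and its parameters are hypothetical; src, src_end and dst stand in for the MAIN, MAIN_E and XRAM_TOP addresses held in R10, R11 and R1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the PRG_MOVE loop in figure 1.
 *   src     : first 16-bit word of the program image (MAIN,   R10)
 *   src_end : address at which copying stops         (MAIN_E, R11)
 *   dst     : start of the destination area          (XRAM_TOP, R1)
 * Returns the number of 16-bit words copied. */
size_t prg_move(const uint16_t *src, const uint16_t *src_end, uint16_t *dst)
{
    size_t copied = 0;

    /* The assembly copies first and tests afterwards, so at least one
     * word is always transferred; a do/while loop mirrors that. */
    do {
        *dst++ = *src++;        /* MOV.W @R10+,R0 / MOV.W R0,@R1 / ADD #2,R1 */
        copied++;
    } while (src < src_end);    /* CMP/GE R11,R10 / BF PRG_MOVE */

    return copied;
}
```

In this model the word at src_end itself is not copied, which corresponds to the loop ending as soon as R10 reaches MAIN_E.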
Contents
Section 1 Example of Calling Functions (DSP Library) from C Source Code
  1.1 C Source Code Employing Functions (DSP Library)
  1.2 Linking Assignments
    1.2.1 “prglnk1.sub” Subcommand File for Linking
    1.2.2 “ini.bat” Batch File for Creating Absolute Files
    1.2.3 “vect.src” Vector Table for “dsplbr.c” Program, which Uses DSP Library
  1.3 Function Execution Process
Section 2 X/Y Bus Data Access
  2.1 X Memory Read
  2.2 X Memory Write
  2.3 Y Memory Read
  2.4 Y Memory Write
Section 3 16-bit Fixed-point Multiplication
Section 4 Parallel Execution Instruction
Section 5 Repeat Instruction
Section 6 Examples of Arguments Passed Between CPU Instructions and DSP Instructions
Section 7 32-bit Multiplication
Section 8 ..........
Section 9 Matrix Operations
Section 10 Inner Product
Section 11 Square Root
Section 12 Square Mean Error
Section 13 Effects of DSP Instructions on Program Performance
Section 1 Example of Calling Functions (DSP Library) from C Source Code
1.1 C Source Code Employing Functions (DSP Library)
The example code below, “dsplbr.c,” illustrates calling the “Mean” function in the DSP library
(shdsplib.lib) from C source code.
/*
        <<SH-DSP Application Notes>>
        -- DSP library usage example --
        "dsplbr.c"
*/
#include "ensigdsp.h"                       /* Mean value definition */                 ---- (1)
#define N 6                                 /* Input data number */
short dat[6]={45,61,516,3000,-974,10214};   /* Input data */
#pragma section X
static short datx[N];                       /* XRAM address */                          ---- (2)
#pragma section Y
static short daty[N];                       /* YRAM address */                          ---- (3)
#pragma section ANS
static short answer;                        /* Address for storing mean value */
#pragma section
main()
{
    short i,output[1];                      /* output for storing variable i
                                               and Mean function calculation
                                               result */
    int src_x;                              /* Argument specifying storage area
                                               for input data */
    for(i=0;i<N;i++)
    {
        datx[i] = dat[i];                   /* Copy input data to XRAM */
        daty[i] = dat[i];                   /* Copy input data to YRAM */
    }
    src_x = 1;                              /* Use XRAM area for Mean
                                               function calculation */                  ---- (4)  select XRAM *1
    Mean(output,datx,N,src_x);              /* Pass Mean function arguments and
                                               calculate mean value */
    answer = output[0];                     /* Store Mean function calculation
                                               result at answer address */
    while(1);                               /* Processing complete */
}
*1 Refer to 1.3 Function Execution Process for details.
(1) The format of the functions in the library shdsplib.lib is defined in the header file
ensigdsp.h.
(2) To ensure efficient X bus data transfer with the DSP unit, it is necessary to place datx[N] in
XRAM. Section X needs to be set when linking to addresses in XRAM. (See 1.2 Linking
Assignments.)
(3) To ensure efficient Y bus data transfer with the DSP unit, it is necessary to place daty[N] in
YRAM. Section Y needs to be set when linking to addresses in YRAM. (See 1.2 Linking
Assignments.)
(4) If src_x = 1, an area in XRAM is used for Mean function calculations. If src_x = 0, an area in
YRAM is used.
1.2 Linking Assignments
When using the DSP library, the utmost care must be taken to ensure that the section settings are
correct. The example code dsplbr.c shown in section 1.1 has two sections, X and Y. If XRAM and
YRAM addresses are not set for these sections, the functions’ internal calculations cannot be
performed correctly. These addresses are assigned in the subcommand file.
1.2.1 “prglnk1.sub” Subcommand File for Linking
INPUT     vect,dsplbr
START     BX(1000ff00),BANS(1000fff0),BY(1001e000) ------------------ (1)
LIBRARY   shdsplib.lib ---------------------------------------------- (2)
PRINT     dsplbr.map
OUTPUT    dsplbr.abs
FORM      A
DEBUG
EXIT
(1) BX(1000ff00) assigns #pragma section X (section X) of dsplbr.c to address H'1000FF00.
BY(1001e000) assigns #pragma section Y (section Y) of dsplbr.c to address H'1001E000.
(2) This specifies shdsplib.lib, which includes the Mean function, as the library to be edited.
1.2.2 “ini.bat” Batch File for Creating Absolute Files
asmsh vect.src -cpu=shdsp -debug -lis
shc dsplbr.c -cpu=sh2 -lis -debug -include=ensigdsp.h
lnk -subcommand=prglnk1.sub
1.2.3 “vect.src” Vector Table for “dsplbr.c” Program, which Uses DSP Library
;********************************************************
;*      <<SH-DSP Application Notes>>
;*      -- DSP library usage example --
;*      "vect.src"
;********************************************************
        .import  _main
        .section vect,data,locate=h'0
        .data.l  _main
        .data.l  h'10020000
        .end
1.3 Function Execution Process
Excerpts from the example code dsplbr.c shown in section 1.1, and the assembler code resulting
from the function used, are shown below.
        .
        .
        .
    src_x = 1;
    Mean(output,datx,N,src_x);      <- Assembler code resulting from function
    answer = output[0];
        .
        .
        .
Address     Label     Assembler
1001e2fc    _Mean     CMP/PZ   R7
1001e2fe              BF       @1001E322:8
1001e300              MOV      #H'01,R1
1001e302              CMP/GT   R1,R7
   .                     .
   .                     .
   .                     .
1001e486              NEG      R2,R2
1001e488              MOV.W    R2,@R4
1001e48a              RTS
In table 1.1, the input data is arranged starting at address H'1000FF00. It is assumed that the data
in RAM has been cleared to 0. The data remains the same after the function is executed.
Table 1.1 Memory Map
XRAM Memory
H'1000FF00    002D 003D 0204 0BB8
H'1000FF08    FC32 27E6 0000 0000
Table 1.2 Function Execution Process
Excerpt from dsplbr.c Code      Register Contents
Mean(output,datx,N,src_x);      Before execution:
                                R4=H'1001FFFC, R5=H'1000FF00, R6=6, R7=1
                                After execution:
                                R4=H'1001FFFC, R5=H'1000FF0C, R6=6, R7=H'10000
The function arguments are assigned to registers R4 to R7 in the order of declaration, so
output=H'1001FFFC, datx=H'1000FF00, N=6, and src_x=1 are passed to the function. The
calculation result is held in @R4.
Table 1.3 C Source Code Execution Process (Process Inside Memory Map)
Excerpt from dsplbr.c Code      YRAM Memory
answer = output[0];             Before execution:
                                H'1001FF00    0000 0000 0000 0000
                                After execution:
                                H'1001FF00    0860 0000 0000 0000
The C source code then stores the function calculation result from @R4 in answer (H'1001FF00).
Table 1.4 Mean Function Calculation Result
Input Value     Input Value
(decimal)       (hexadecimal)
45              H'2D
61              H'3D
516             H'204
3000            H'BB8
–974            H'FC32
10214           H'27E6

Logical Value (decimal):        2143.666667
Logical Value (hexadecimal):    H'860 (2144 calculated as a decimal value)
Output Value (hexadecimal):     H'860
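The logical value in table 1.4 can be checked with ordinary integer arithmetic. The sketch below is a plain C reference calculation, not the library's Mean function. The name mean_rounded is hypothetical, and rounding to the nearest integer is an assumption chosen because it reproduces the H'860 shown in the table; this note does not state the rounding rule the library actually uses.

```c
#include <stdint.h>

/* Reference mean of n 16-bit values, rounded to the nearest integer.
 * Assumption: round-to-nearest; the rounding rule of the library's
 * Mean function is not given in this application note. */
int16_t mean_rounded(const int16_t *data, int n)
{
    int32_t sum = 0;
    int32_t half = n / 2;
    int i;

    for (i = 0; i < n; i++)
        sum += data[i];                 /* 32-bit sum cannot overflow for small n */

    /* Add (or subtract) half the divisor before the truncating division. */
    if (sum >= 0)
        return (int16_t)((sum + half) / n);
    return (int16_t)((sum - half) / n);
}
```

For the six input values of dsplbr.c the sum is 12862, so the exact mean is 2143.666667 and the rounded result is 2144 (H'860).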
Section 2 X/Y Bus Data Access
2.1 X Memory Read
Overview
The data from the XRAM_ADD address (H'1000FF00) and XRAM_ADD+2 address
(H'1000FF02) is transferred, respectively, to registers X0 and X1.
Description
Table 2.1 shows the types of X memory read instructions and the registers that can be used as
operands. Data can be read from X memory using the commands listed in table 2.1.
When reading data from X memory the transfer data length is 16 bits, so the data is stored as the
upper word of register X0 or X1. When this happens, the lower word of register X0 or X1 is
cleared to 0. Processes (1) and (2) in the flowchart are illustrated below.
Table 2.1 X Memory Read Instruction Types
X Memory Read         Source Register    Destination Register    Index Register
Instruction           (Ax)               (Dx)                    (Ix)
MOVX.W @Ax,Dx         R4, R5             X0, X1                  R8
MOVX.W @Ax+,Dx
MOVX.W @Ax+Ix,Dx
Process (1)
[Figure: XRAM memory map (XRAM_TOP to XRAM_END, bits 31 to 0). The data at XRAM_ADD is read
into bits 31 to 16 of register X0 (stores read data); bits 15 to 0 of X0 are cleared to 0.]
Process (2)
[Figure: XRAM memory map (XRAM_TOP to XRAM_END, bits 31 to 0). The data is read into bits 31
to 16 of register X1 (stores read data); bits 15 to 0 of X1 are cleared to 0.]
*1: Ignored
Flowchart
Start
  Transfer XRAM address (H'1000FF00) to register R4
  After reading data (0.5) from R4 address (H'1000FF00) to register X0,
  increment R4 address ------------------------------------------------ (1)
  Read data (0.25) from R4 address (H'1000FF02) to register X1 -------- (2)
End
Main Program
;**********************************************************************
;*      X memory read
;**********************************************************************
MAIN:
        MOV.L   #XRAM_ADD,R4    ;XRAM_ADD address -> register R4
        MOVX.W  @R4+,X0         ;(H'1000FF00) -> X0
        MOVX.W  @R4,X1          ;(H'1000FF02) -> X1
EXIT:
        BRA     EXIT
        NOP
MAIN_E: NOP
Data
;***************************************************************
;*      Data
;***************************************************************
        .SECTION XRAM,DATA,LOCATE=H'1000FF00
XRAM_ADD:
        .XDATA.W 0.5,0.25
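The effect of an X memory read on the destination register, as described in 2.1, can be modeled in a few lines of C. This is an illustrative sketch only. The helper names to_q15 and movx_read are hypothetical, and the conversion in to_q15 assumes that .XDATA.W stores a fraction as a 16-bit value with 15 fraction bits (so that 0.5 becomes H'4000).

```c
#include <stdint.h>

/* Convert a fraction in [-1.0, 1.0) to 16-bit fixed point with 15
 * fraction bits. Assumption: this matches the encoding generated by
 * .XDATA.W for values such as 0.5 and 0.25. */
int16_t to_q15(double x)
{
    return (int16_t)(x * 32768.0);
}

/* Model of MOVX.W @Ax,Dx: the 16-bit memory word is placed in the
 * upper word of the 32-bit register and the lower word is cleared. */
uint32_t movx_read(int16_t mem_word)
{
    return (uint32_t)(uint16_t)mem_word << 16;
}
```

With the data of the example program, reading 0.5 gives a register value of H'40000000 and reading 0.25 gives H'20000000.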
2.2 X Memory Write
Overview
The data from the XRAM_ADD1 address (H'1000FF00) and XRAM_ADD1+2 address
(H'1000FF02) is transferred to the XRAM_ADD2 address and XRAM_ADD2+2 address.
Description
Table 2.2 shows the types of X memory write instructions and the registers that can be used as
operands. Data can be written to X memory using the commands listed in table 2.2.
When writing data to X memory the transfer data length is 16 bits, so the upper word data from
register A0 or A1, as specified by the instruction, is stored in X memory. When this happens, the
guard bits and lower word of register A0 or A1 are ignored. The X memory write instructions can
use only registers A0 and A1 as source registers (see Table 2.2 X Memory Write Instruction
Types), so single data transfers with register A0 or A1 as the destination operand are used to
load the data into register A0 or A1 first. Processes (1) and (2) in the flowchart are illustrated below.
Table 2.2 X Memory Write Instruction Types
X Memory Write        Source Register    Destination Register    Index Register
Instruction           (Da)               (Ax)                    (Ix)
MOVX.W Da,@Ax         A0, A1             R4, R5                  R8
MOVX.W Da,@Ax+
MOVX.W Da,@Ax+Ix
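Process (1)
[Figure: memory map (XRAM, XRAM_TOP to XRAM_END, bits 31 to 0) and register A0 (bits 39 to 0).
Bits 31 to 16 of A0 are the data written to XRAM at XRAM_ADD2; the guard bits (bits 39 to 32)
and the lower word (bits 15 to 0) of A0 are ignored.]
Process (2)
[Figure: memory map (XRAM, XRAM_TOP to XRAM_END, bits 31 to 0) and register A1 (bits 39 to 0).
Bits 31 to 16 of A1 are the data written to XRAM; the guard bits (bits 39 to 32) and the lower
word (bits 15 to 0) of A1 are ignored.]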
Flowchart
Start
  Transfer XRAM_ADD1 address (H'1000FF00) to register R2
  Transfer XRAM_ADD2 address (H'1000FF04) to register R4
  After transferring data (0.5) from R2 address (H'1000FF00)
  to register A0, increment R2 address -------------------------------- (1)
  Transfer register A0 data to R4 address (H'1000FF04)
  and increment R4 ---------------------------------------------------- (1)
  Transfer data (0.25) from R2 address (H'1000FF02)
  to register A1 ------------------------------------------------------ (2)
  Transfer data from register A1 to R4 address (H'1000FF06) ----------- (2)
End
Main Program
;**********************************************************************
;*      X memory write
;**********************************************************************
MAIN:
        MOV.L   #XRAM_ADD1,R2   ;XRAM_ADD1 -> R2 register
        MOV.L   #XRAM_ADD2,R4   ;XRAM_ADD2 -> R4 register
        MOVS.W  @R2+,A0         ;(H'1000FF00) -> A0 register
        MOVX.W  A0,@R4+         ;A0 register data -> XRAM_ADD2
        MOVS.W  @R2,A1          ;(H'1000FF02) -> A1 register
        MOVX.W  A1,@R4          ;A1 register data -> XRAM_ADD2+2
EXIT:
        BRA     EXIT
        NOP
MAIN_E: NOP
Data
;***************************************************************
;*      Data
;***************************************************************
        .SECTION XRAM,DATA,LOCATE=H'1000FF00
XRAM_ADD1:
        .XDATA.W 0.5,0.25
XRAM_ADD2:
        .RES.W   2
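The write behavior described in 2.2 (only the upper word of A0 or A1 reaches memory) can be sketched in C as follows. This is a simplified model, not vendor code. The name movx_write is hypothetical, and representing the 40-bit register as the low 40 bits of a 64-bit integer is a modeling choice made for this sketch.

```c
#include <stdint.h>

/* Model of MOVX.W Da,@Ax for Da = A0 or A1.
 * The register is held in the low 40 bits of acc40:
 *   bits 39-32 : guard bits  (ignored by the write)
 *   bits 31-16 : upper word  (written to X memory)
 *   bits 15-0  : lower word  (ignored by the write)
 * Returns the 16-bit word stored in memory. */
uint16_t movx_write(uint64_t acc40)
{
    return (uint16_t)((acc40 >> 16) & 0xFFFFu);
}
```

Changing the guard bits or the lower word of the modeled register leaves the stored word unchanged, which is the behavior the processes (1) and (2) illustrate.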
2.3 Y Memory Read
Overview
The data from the YRAM_ADD address (H'1001FF00) and YRAM_ADD+2 address
(H'1001FF02) is transferred, respectively, to registers Y0 and Y1.
Description
Table 2.3 shows the types of Y memory read instructions and the registers that can be used as
operands. Data can be read from Y memory using the commands listed in table 2.3.
When reading data from Y memory the transfer data length is 16 bits, so the data is stored as the
upper word of register Y0 or Y1. When this happens, the lower word of register Y0 or Y1 is
cleared to 0. Processes (1) and (2) in the flowchart are illustrated below.
Table 2.3 Y Memory Read Instruction Types
Y Memory Read         Source Register    Destination Register    Index Register
Instruction           (Ay)               (Dy)                    (Iy)
MOVY.W @Ay,Dy         R6, R7             Y0, Y1                  R9
MOVY.W @Ay+,Dy
MOVY.W @Ay+Iy,Dy
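Process (1)
[Figure: YRAM memory map (YRAM_TOP to YRAM_END, bits 31 to 0). The data at YRAM_ADD is read
into bits 31 to 16 of register Y0 (stores read data); bits 15 to 0 of Y0 are cleared to 0.]
Process (2)
[Figure: YRAM memory map (YRAM_TOP to YRAM_END, bits 31 to 0). The data is read into bits 31
to 16 of register Y1 (stores read data); bits 15 to 0 of Y1 are cleared to 0.]
*1: Ignored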
Flowchart
Start
  Transfer YRAM address (H'1001FF00) to register R6
  After reading data (0.5) from R6 address (H'1001FF00) to register Y0,
  increment R6 address ------------------------------------------------ (1)
  Read data (0.25) from R6 address (H'1001FF02) to register Y1 -------- (2)
End
Main Program
;**********************************************************************
;*      Y memory read
;**********************************************************************
MAIN:
        MOV.L   #YRAM_ADD,R6    ;YRAM_ADD address -> R6 register
        MOVY.W  @R6+,Y0         ;(H'1001FF00) -> Y0
        MOVY.W  @R6,Y1          ;(H'1001FF02) -> Y1
EXIT:
        BRA     EXIT
        NOP
MAIN_E: NOP
Data
;***************************************************************
;*      Data
;***************************************************************
        .SECTION YRAM,DATA,LOCATE=H'1001FF00
YRAM_ADD:
        .XDATA.W 0.5,0.25
2.4 Y Memory Write
Overview
The data from the YRAM_ADD1 address (H'1001FF00) and YRAM_ADD1+2 address
(H'1001FF02) is transferred to the YRAM_ADD2 address and YRAM_ADD2+2 address.
Description
Table 2.4 shows the types of Y memory write instructions and the registers that can be used as
operands. Data can be written to Y memory using the commands listed in table 2.4.
When writing data to Y memory the transfer data length is 16 bits, so the upper word data from
register A0 or A1, as specified by the instruction, is stored in Y memory. When this happens, the
guard bits and lower word of register A0 or A1 are ignored. The Y memory write instructions can
use only registers A0 and A1 as source registers (see Table 2.4 Y Memory Write Instruction
Types), so single data transfers with register A0 or A1 as the destination operand are used to
load the data into register A0 or A1 first. Processes (1) and (2) in the flowchart are illustrated below.
Table 2.4 Y Memory Write Instruction Types
Y Memory Write        Source Register    Destination Register    Index Register
Instruction           (Da)               (Ay)                    (Iy)
MOVY.W Da,@Ay         A0, A1             R6, R7                  R9
MOVY.W Da,@Ay+
MOVY.W Da,@Ay+Iy
Process (1)
[Figure: memory map (YRAM, YRAM_TOP to YRAM_END, bits 31 to 0) and register A0 (bits 39 to 0).
Bits 31 to 16 of A0 are the data written to YRAM at YRAM_ADD2; the guard bits (bits 39 to 32)
and the lower word (bits 15 to 0) of A0 are ignored.]
Process (2)
[Figure: memory map (YRAM, YRAM_TOP to YRAM_END, bits 31 to 0) and register A1 (bits 39 to 0).
Bits 31 to 16 of A1 are the data written to YRAM; the guard bits (bits 39 to 32) and the lower
word (bits 15 to 0) of A1 are ignored.]
*1: Ignored
Flowchart
Start
  Transfer YRAM_ADD1 address (H'1001FF00) to register R3
  Transfer YRAM_ADD2 address (H'1001FF04) to register R6
  After transferring data (0.5) from R3 address (H'1001FF00)
  to register A0, increment R3 address -------------------------------- (1)
  Transfer register A0 data to R6 address (H'1001FF04)
  and increment R6 ---------------------------------------------------- (1)
  Transfer data (0.25) from R3 address (H'1001FF02)
  to register A1 ------------------------------------------------------ (2)
  Transfer data from register A1 to R6 address (H'1001FF06) ----------- (2)
End
Main Program
;**********************************************************************
;*      Y memory write
;**********************************************************************
MAIN:
        MOV.L   #YRAM_ADD1,R3   ;YRAM_ADD1 -> R3 register
        MOV.L   #YRAM_ADD2,R6   ;YRAM_ADD2 -> R6 register
        MOVS.W  @R3+,A0         ;(H'1001FF00) -> A0 register
        MOVY.W  A0,@R6+         ;A0 register data -> YRAM_ADD2
        MOVS.W  @R3,A1          ;(H'1001FF02) -> A1 register
        MOVY.W  A1,@R6          ;A1 register data -> YRAM_ADD2+2
EXIT:
        BRA     EXIT
        NOP
MAIN_E: NOP
Data
;****************************************************************
;*      Data
;****************************************************************
        .SECTION YRAM,DATA,LOCATE=H'1001FF00
YRAM_ADD1:
        .XDATA.W 0.5,0.25
YRAM_ADD2:
        .RES.W   2
Section 3 16-bit Fixed-point Multiplication
Overview
Multiplies the 16-bit data at the XRAM_ADD address (H'1000F000) and the 16-bit data at the
YRAM_ADD address (H'1001F000). The result is stored at the ANS address (H'1001F002).
Description
1. Data Transfer
Transfer of the data from the XRAM_ADD address (H'1000F000) and the YRAM_ADD
address (H'1001F000) is performed using X bus data transfer and Y bus data transfer, as
described in section 2, X/Y Bus Data Access. In process (1) in the flowchart the XRAM and YRAM
data is read simultaneously, but no contention occurs because the X bus and Y bus are
independent of each other. The format is shown below.
The X bus data transfer is written first, followed by the Y bus data transfer. When both are
written in a single step, a read on one bus may also be combined with a write on the other, as
either [X memory read] [Y memory write] or [X memory write] [Y memory read].
Format:
        MOVX.W @R5,X1   MOVY.W @R7,Y1
2. Fixed-point Multiplication
The PMULS instruction is used to perform fixed-point multiplication in process (2) in the
flowchart. The format is shown below. The fixed-point multiplication process is shown in
figure 3.1. Only the upper word data from source 1 and source 2 is valid. For example, if the
longword H'12345678 was read from the source, the portion that would actually be multiplied
would be H'1234.
Format:
        PMULS   Se,Sf,Dg
Source 1 (Se): X0, X1, Y0, A1
Source 2 (Sf): Y0, Y1, X0, A1
Destination (Dg): M0, M1, A0, A1
[Figure: source 1 and source 2 (bits 39 to 0 or bits 31 to 0) are input to the MAC (multiplier);
only the upper word of each source is valid and the remaining bits are ignored. The result is
stored in the destination register; the figure labels the guard bits (sign extension) and a 0 in
bit 0 of the destination.]
Figure 3.1 Fixed-point Multiplication Process
3. Overflow
An overflow can occur during fixed-point multiplication only if the operation is H'8000 (–1.0)
× H'8000 (–1.0), in which case the calculation result is H'8000 0000 (–1.0). This can happen only
when the destination register is a register other than A0 or A1, both of which have guard bits.
If the destination register is A0 or A1, the result of the above calculation is the correct value of
H'00 8000 0000 (+1.0). Refer to table 3.1 for additional fixed-point multiplication execution
examples.
Since the destination register used in the example main program is A0, no overflow problem
occurs.
Table 3.1 Fixed-point Multiplication Execution Examples
Operation Example       State of Operation    Destination    Operation Result
                        Result                Register
H'4000 (0.5) ×          Positive              M0, M1         H'1000 0000 (0.125)
H'2000 (0.25)                                 A0, A1         H'00 1000 0000 (0.125)
H'0800 (0.0625) ×       Negative              M0, M1         H'FFC0 0000 (–1.95×10^–3)
H'FC00 (–0.03125)                             A0, A1         H'FF FFC0 0000 (–1.95×10^–3)
H'8000 (–1.0) ×         Overflow              M0, M1         H'8000 0000 (–1.0)
H'8000 (–1.0)                                 A0, A1         H'00 8000 0000 (1.0)
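The rows of table 3.1 can be reproduced with a small C model of the multiplication. This is a sketch under stated assumptions, not a description of the hardware implementation: the function names are hypothetical, the product is formed as a signed 16 × 16 multiply of the upper words followed by a one-bit left shift, and a 32-bit destination (M0, M1) is modeled by keeping only the low 32 bits of that result.

```c
#include <stdint.h>

/* Signed value of the upper word (bits 31-16) of a source register.
 * The lower word is ignored, as stated in the description of PMULS. */
static int64_t upper_word(uint32_t reg)
{
    int64_t w = (int64_t)(reg >> 16);
    return (w & 0x8000) ? w - 0x10000 : w;
}

/* Exact fixed-point product: 16 x 16 signed multiply, shifted left by 1. */
static int64_t pmuls_exact(uint32_t se, uint32_t sf)
{
    return upper_word(se) * upper_word(sf) * 2;
}

/* Destination without guard bits (modeled as the low 32 bits). */
uint32_t pmuls_32(uint32_t se, uint32_t sf)
{
    return (uint32_t)((uint64_t)pmuls_exact(se, sf) & 0xFFFFFFFFu);
}

/* Destination with guard bits (modeled as the low 40 bits). */
uint64_t pmuls_40(uint32_t se, uint32_t sf)
{
    return (uint64_t)pmuls_exact(se, sf) & 0xFFFFFFFFFFULL;
}
```

In this model (–1.0) × (–1.0) gives H'8000 0000 in 32 bits, which reads back as –1.0, while the 40-bit form H'00 8000 0000 keeps the correct value of +1.0 because the extra upper bits hold the sign.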
Flowchart
Start
  Transfer XRAM_ADD address (H'1000F000) to register R4
  Transfer YRAM_ADD address (H'1001F000) to register R6
  Transfer ANS address (H'1001F002) to register R7
  Transfer data from R4 address (H'1000F000) to register X0
  Transfer data from R6 address (H'1001F000) to register Y0 ----------- (1)
  Multiply upper 16 bits of register X0 data and register Y0 data,
  store result in register A0 ----------------------------------------- (2)
  Transfer data from register A0 to ANS address (H'1001F002)
End
Main Program
;*******************************************************************************************
;*      16-bit fixed-point multiplication routine
;*******************************************************************************************
MAIN:
        MOV.L   #0,R4                           ;Clear register R4
        MOV.L   #0,R6                           ;Clear register R6
        MOV.L   #XRAM_ADD,R4                    ;XRAM address -> register R4
        MOV.L   #YRAM_ADD,R6                    ;YRAM address -> register R6
        MOV.L   #ANS,R7                         ;ANS address -> register R7
        MOVX.W  @R4,X0   MOVY.W @R6,Y0          ;XRAM and YRAM address data ->
                                                ;registers X0 and Y0
        PMULS   X0,Y0,A0                        ;16-bit fixed-point multiplication
        MOVY.W  A0,@R7                          ;Store multiplication result
EXIT:
        BRA     EXIT
        NOP
MAIN_E: NOP
Data
;**************************************************************
;*      Data
;**************************************************************
        .SECTION XRAM,DATA,LOCATE=H'1000F000
XRAM_ADD:
        .XDATA.W 0.0625
        .SECTION YRAM,DATA,LOCATE=H'1001F000
YRAM_ADD:
        .XDATA.W 0.03125
ANS:
        .RES.W   1
Section 4 Parallel Execution Instruction
Overview
Four data values obtained sequentially from the XRAM_ADD address (H'1000FF000) and the
YRAM_ADD address (H'1001FF000) are added and multiplied. The addition result is stored at the
ANS1 address (H'1000FF004) and the multiplication result at the ANS2 address (H'1001FF004).
Description
1. Structure of Parallel Execution Instruction
The parallel execution instruction is used to transfer data between a DSP register and X
memory or Y memory at the same time a DSP operation is being executed. Table 4.1 shows
the data transfer and DSP operation structure. The parallel execution instruction comprises a
DSP operation portion and a data transfer portion. Table 4.2 lists format examples for the
parallel execution instruction. The DSP operation portion is a single instruction like the regular
PAND, PINC, and PSHA instructions. However, as shown in table 4.2, it has a two-instruction
structure in the case of the PADD and PMULS instructions, or the PSUB and PMULS
instructions. The data transfer portion consists of two instructions, one the data transfer
instruction for X memory and the other the data transfer instruction for Y memory. Either one
of these data transfer instructions may be used.
Table 4.1   Data Transfer and DSP Operation Structure

Type | Bus Used | Data Length | Parallel Processing with DSP Operation | Parallel Processing of Data Transfers | Instruction Length
Double data transfer | X bus, Y bus | 16 bits | (1) No | No: One or the other data transfer | 16 bits
Double data transfer | X bus, Y bus | 16 bits | (1) No | Yes: Data transfer with X memory and Y memory at same time | 16 bits
Double data transfer | X bus, Y bus | 16 bits | (2) Yes | No: One or the other data transfer | 32 bits
Double data transfer | X bus, Y bus | 16 bits | (2) Yes | Yes: Data transfer with X memory and Y memory at same time | 32 bits
Single data transfer | C bus *1 | 16 bits, 32 bits | No | - | 16 bits

*1: Note that the name differs depending on the product.
Table 4.2   Parallel Execution Instruction Format Examples

DSP Operation Portion                    Data Transfer Portion
PADD  X0,Y0,A0   PMULS X0,Y0,A1          MOVX.W A0,@R4   MOVY.W A1,@R6
PSUB  X1,Y1,A1   PMULS X0,Y1,A0          MOVX.W @R5,X1   MOVY.W @R7,Y1
PADD  X0,Y0,A0   PMULS X0,Y0,A1          MOVX.W A0,@R4
PINC  X0,Y0,A0                           MOVY.W @R6,Y1
PAND  X0,Y0,A0                           MOVX.W A0,@R5
PSHA  X0,Y0,A0                           MOVX.W @R4,X1   MOVY.W A1,@R7
2. Parallel Processing of Double Data Transfer and DSP Operation
Process (1) in the flowchart on the following page is double data transfer with no DSP
operation instruction parallel processing, which is indicated as (1) in table 4.1, and processes
(2) and (3) are double data transfer with parallel processing of DSP operation instructions,
which is indicated as (2) in table 4.1. Processes (2) and (3) consist of four instructions, which is
the maximum number that can be declared in a single step. In this case, one execution state is
used.
3. Effect of DSP Operation Portion Result on Data Transfer Portion
Table 4.3 shows the effect of the DSP operation portion result on the data transfer portion.
Instruction 2 (process (3)) uses A0 and A1 both as the destination registers of the DSP operation
portion and as the source registers of the data transfer portion. However, the result of the
DSP operation portion of instruction 2 is not the data stored by its data transfer portion.
Instead, the values stored are those already held in A0 and A1, that is, the calculation results
of the DSP operation portion of instruction 1 (process (2)).
Figure 4.1 shows the instruction 2 pipeline flow. When instructions are executed in parallel,
each of the instructions is processed independently, as shown in figure 4.1. The reason the
DSP operation portion result does not become the data stored in the data transfer portion in this
case is that the WB/DSP stage, in which DSP operations are performed using PADD and
PMULS, is later than the MA stage, in which memory access is performed using MOVX.W
and MOVY.W.
Note that after the execution of instruction 2 (process (3)), the X1 and Y1 addition and
multiplication results are stored in registers A0 and A1.
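The ordering effect described above can be illustrated with a minimal Python model of instruction 2, using the register contents from table 4.3. It is a sketch, not a cycle-accurate simulator: the function and key names are hypothetical, the stores are assumed to write the upper 16 bits of A0 and A1, and guard bits and saturation are ignored.

```python
MASK32 = 0xFFFFFFFF


def signed16(word):
    """Interpret a 16-bit word as a two's-complement integer."""
    return word - 0x10000 if word & 0x8000 else word


def execute_instruction_2(regs):
    """Model 'PADD X1,Y1,A0  PMULS X1,Y1,A1  MOVX.W A0,@R5+  MOVY.W A1,@R7+'.

    The stores read A0 and A1 in the MA stage, before the WB/DSP stage
    writes the new addition and multiplication results.
    """
    # MA stage: the data transfer portion samples the OLD A0 and A1.
    stored = {"ANS1": regs["A0"] >> 16, "ANS2": regs["A1"] >> 16}
    # WB/DSP stage: the DSP operation portion produces the new A0 and A1.
    total = (regs["X1"] + regs["Y1"]) & MASK32
    product = ((signed16(regs["X1"] >> 16) * signed16(regs["Y1"] >> 16)) << 1) & MASK32
    after = dict(regs, A0=total, A1=product)
    return after, stored


before = {"X1": 0x10000000, "Y1": 0x08000000, "A0": 0x60000000, "A1": 0x10000000}
after, stored = execute_instruction_2(before)
```

In this model `stored` holds the results left by instruction 1, while `after` holds the X1 and Y1 addition and multiplication results.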
Table 4.3   Effect of DSP Operation Portion Result on Data Transfer Portion

Excerpts from Main Program
;Instruction 1
        PADD    X0,Y0,A0  PMULS X0,Y0,A1  MOVX.W @R4,X1   MOVY.W @R6,Y1
;Instruction 2
        PADD    X1,Y1,A0  PMULS X1,Y1,A1  MOVX.W A0,@R5+  MOVY.W A1,@R7+

Content of Registers
Before execution of instruction 2:
X1=H'1000 0000, Y1=H'0800 0000, A0=H'6000 0000, A1=H'1000 0000
After execution of instruction 2:
X1=H'1000 0000, Y1=H'0800 0000, A0=H'1800 0000, A1=H'0100 0000
Instruction 2 portion      Slot (pipeline stages, left to right)
PADD    X1,Y1,A0           IF  ID  EX  MA  WB/DSP
PMULS   X1,Y1,A1           IF  ID  EX  MA  WB/DSP
MOVX.W  A0,@R5+            IF  ID  EX  MA  WB/DSP
MOVY.W  A1,@R7+            IF  ID  EX  MA  WB/DSP
Figure 4.1 Instruction 2 Pipeline Flow
Flowchart
Start
Transfer XRAM_ADD address (H'1000F000) to
register R4
Transfer ANS1 address (H'1000F004) to register R5
Transfer YRAM_ADD address (H'1001F000) to
register R6
Transfer ANS2 address (H'1001F004) to register R7
After transferring data (0.5) from R4 address
(H'1000F000) to register X0, increment address
After transferring data (0.25) from R6 address
(H'1001F000) to register Y0, increment address
(1)
Add data in registers X0 and Y0, store result in
register A0
Multiply data in registers X0 and Y0, store result in
register A1
After transferring data (0.125) from R4 address
(H'1000F002) to register X1, increment address
After transferring data (0.0625) from R6 address
(H'1001F002) to register Y1, increment address
(2)
Add data in registers X1 and Y1, store result in
register A0
Multiply data in registers X1 and Y1, store result in
register A1
After transferring data register A0 to ANS1 address
(H'1000F004), increment address
After transferring data register A1 to ANS2 address
(H'1001F004), increment address
(3)
After transferring data register A0 to ANS1 address
(H'1000F004), increment address
After transferring data register A1 to ANS2 address
(H'1001F004), increment address
(1)
End
Main Program
;*******************************************************************************************
;*      Parallel data transfer routine
;*******************************************************************************************
MAIN:
        MOV.L   #XRAM_ADD,R4
        MOV.L   #ANS1,R5
        MOV.L   #YRAM_ADD,R6
        MOV.L   #ANS2,R7
        MOVX.W  @R4+,X0 MOVY.W @R6+,Y0                                  ;No parallel processing
        PADD    X0,Y0,A0  PMULS X0,Y0,A1  MOVX.W @R4,X1   MOVY.W @R6,Y1   ;Parallel processing
        PADD    X1,Y1,A0  PMULS X1,Y1,A1  MOVX.W A0,@R5+  MOVY.W A1,@R7+  ;Parallel processing
        MOVX.W  A0,@R5  MOVY.W A1,@R7                                   ;No parallel processing
EXIT:
        BRA     EXIT
        NOP
MAIN_E: NOP
Data
;**********************************************************************
;*      Data (X/YRAM)
;**********************************************************************
        .SECTION XRAM,DATA,LOCATE=H'1000F000
XRAM_ADD:
        .XDATA.W 0.5,0.125                      ;DSP operation data
ANS1:
        .RES.W  2                               ;DSP operation result storage area
        .SECTION YRAM,DATA,LOCATE=H'1001F000
YRAM_ADD:
        .XDATA.W 0.25,0.0625                    ;DSP operation data
ANS2:
        .RES.W  2                               ;DSP operation result storage area
Section 5 Repeat Instruction
Overview
The average of ten data values stored in XRAM and YRAM is obtained. To accomplish this, the
repeat function is used for transferring data from XRAM and YRAM to the DSP unit, and for
adding the ten data values.
Description
1. DSP Repeat Control
Three settings are required in order to perform repeat control: (I) the start address of the
program to be repeated, (II) the end address of the program to be repeated, and (III) the
number of repetitions to be performed. After settings I through III have been
completed, process IV starts the program to be repeated. Note that a minimum of one
instruction is required between the processing of III and IV.
The sequence of processes I through IV is shown below.
I     The LDRS instruction is used to set the repeat start address in the RS register.
II    The LDRE instruction is used to set the repeat end address in the RE register.
III   The SETRC instruction is used to set the number of repetitions in the RC register.
      :   (Minimum of one instruction inserted.)
IV    The program to be repeated is started.
Process (1) in the flowchart on the next page corresponds to I through III above. After the
program to be repeated is started (IV), it is repeated within the scope of process (2). Two main
programs are shown in the example, but their function is the same. In (1) repeat control
instructions (LDRS, LDRE, and SETRC) are used, and in (2) the extended instruction
REPEAT is used. REPEAT automatically generates the CPU instructions (LDRS, LDRE, and
SETRC) used to repeat the instructions between the start and end addresses. In the format
shown below if the number of repetitions is omitted, the SETRC instruction is not generated.
Format:
REPEAT [start address], [end address], [number of repetitions]
In program (1) the repeat start and end addresses differ from the actual addresses
because the address settings change depending on the number of instructions in the
program to be repeated. Table 5.1 shows how the RS and RE settings change depending on the
number of instructions within the range to be repeated. These are the addresses actually
repeated by the program when the repeat start and end addresses are set in RS and RE.
Therefore, it is necessary to label the repeat start and end addresses while keeping the offsets
listed in Table 5.1 in mind. The setting method for RS and RE in program (1) is described on
the next page.
RPT_S0+N:  Address N bytes from the instruction preceding the instruction at the start
           address of the program to be repeated
RPT_S:     Start address of the program to be repeated
RPT_E:     End address of the program to be repeated
RPT_E3+4:  Address 4 bytes from the instruction three instructions before the instruction at
           the end address of the program to be repeated
Table 5.1   RS and RE Setting Values Based on Number of Instructions Within Repeat

Number of Instructions in
Program to be Repeated       1             2             3             4
RS                           RPT_S0 + 8    RPT_S0 + 6    RPT_S0 + 4    RPT_S
RE                           RPT_S0 + 4    RPT_S0 + 4    RPT_S0 + 4    RPT_E3 + 4
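The lookup in table 5.1 can be written as a small Python helper. This is a hypothetical illustration rather than anything the assembler provides: the function name and parameters are invented here, addresses are plain integers, and treating counts above four like the four-instruction column is an assumption, since the table only lists one to four instructions.

```python
def repeat_settings(num_instructions, rpt_s0, rpt_s, rpt_e3):
    """Return the (RS, RE) values of table 5.1.

    rpt_s0: address of the instruction preceding the repeat start
    rpt_s:  address of the first instruction to be repeated
    rpt_e3: address of the instruction three before the repeat end
    """
    if num_instructions < 1:
        raise ValueError("at least one instruction must be repeated")
    if num_instructions >= 4:  # assumption: larger loops follow the "4" column
        return rpt_s, rpt_e3 + 4
    rs_offset = {1: 8, 2: 6, 3: 4}[num_instructions]
    return rpt_s0 + rs_offset, rpt_s0 + 4


# Three repeated instructions, with RPT_S0 at a made-up address of H'1000:
rs, re = repeat_settings(3, rpt_s0=0x1000, rpt_s=0x1002, rpt_e3=0x1000)
```

For a three-instruction loop both RS and RE come out as RPT_S0 + 4, which is why the same value is loaded into both registers in that case.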
2. Repeat Control Using CPU Instructions
Example (a) shows the method for setting addresses in RS and RE. If there are three
instructions in the portion to be repeated, RS and RE must be set to the RPT_S0+4 address, as
indicated in Table 5.1. The double data transfer instructions in lines (1) and (2) of this program
have a 16-bit instruction length, so the RPT_S0+4 address corresponds to the RPT_E0 address.
If RS and RE are set to the address RPT_E0, the result is program (b).
        LDRS    RPT_S0+4 address                ;Repeat start address
        LDRE    RPT_S0+4 address                ;Repeat end address
        SETRC   #5                              ;Repeat counter setting/5 repetitions
RPT_S0: MOVX.W  @R5,X1  MOVY.W @R7,Y1           ;(1) Clear X1, Y1 = 1/10
RPT_S:  MOVX.W  @R4+,X0 MOVY.W @R6+,Y0          ;(2)
RPT_E0: PADD    X0,Y0,M0
RPT_E:  PADD    X1,M0,X1                        ;X1/data total
        PMULS   X1,Y1,A1                        ;A1/average value
(a) RS and RE Address Setting Method

        LDRS    RPT_E0                          ;Repeat start address
        LDRE    RPT_E0                          ;Repeat end address
        SETRC   #5                              ;Repeat counter setting/5 repetitions
RPT_S0: MOVX.W  @R5,X1  MOVY.W @R7,Y1           ;Clear X1, Y1 = 1/10
RPT_S:  MOVX.W  @R4+,X0 MOVY.W @R6+,Y0
RPT_E0: PADD    X0,Y0,M0
RPT_E:  PADD    X1,M0,X1                        ;X1/data total
        PMULS   X1,Y1,A1                        ;A1/average value
(b) RS and RE Address Setting Method
3. Repeat Control Using Extended Instructions
When the extended instruction REPEAT is used there is no need to perform complicated
labeling, as is the case when using CPU instructions for repeat control. The following
explanation is based on the expanded image of a portion of a repeat program shown as (a)
below. With REPEAT one only needs to declare the labels for the start (RPT_S) and end
(RPT_E) addresses of the program to be repeated, and then the assembler automatically
calculates the address values to be used for the RS and RE settings (RPT_E0 if the code to be
repeated contains three instructions), and generates the LDRS, LDRE, and SETRC
instructions. When the extended instruction REPEAT is actually used, the result is the repeat
program shown in example (b) below.
        REPEAT  RPT_S,RPT_E,#5
                Expands to CPU instructions for repeat control:
        LDRS    RPT_E0                          ;RPT_S0+4
        LDRE    RPT_E0                          ;RPT_S0+4
        SETRC   #5
RPT_S0: MOVX.W  @R5,X1  MOVY.W @R7,Y1
RPT_S:  MOVX.W  @R4+,X0 MOVY.W @R6+,Y0
RPT_E0: PADD    X0,Y0,M0
RPT_E:  PADD    X1,M0,X1
        PMULS   X1,Y1,A1
(a) Expanded Image of Repeat Program

        REPEAT  RPT_S,RPT_E,#5
RPT_S0: MOVX.W  @R5,X1  MOVY.W @R7,Y1
RPT_S:  MOVX.W  @R4+,X0 MOVY.W @R6+,Y0
RPT_E0: PADD    X0,Y0,M0
RPT_E:  PADD    X1,M0,X1
        PMULS   X1,Y1,A1
(b) Repeat Program Using Extended Instruction REPEAT
Flowchart
Start
Transfer XRAM_ADD address to R4
Transfer CLR address to R5
Transfer YRAM_ADD address to R6
Transfer DIV address to R7
Set RPT_S address as repeat start address (RS)
Set RPT_E address as repeat end address (RE)
(1)
Set RC counter in register SR to number of
repetitions (5 times)
Clear register X1 by transferring R5 address
(H'1000F00A) data (0) to register X1
Transfer data (0.1) from register R7 (H'1001F00A) to
register Y1
Transfer R4 address data to register X0 and
increment R4 address
Transfer R6 address data to register Y0 and
increment R6 address
Add data from registers X0 and Y0, and store result
in register M0
Repeat program
number of times
indicated by
repetitions setting
(5 times in this
case)
(2)
Add data from registers X1 and M0, and store result
in register X1
Multiply data from registers X1 and Y1, and store
result in register A1
End
Main Program
(1) Repeat Control Using CPU Instructions
;*******************************************************************************************
;*      Repeat routine
;*******************************************************************************************
MAIN:
        MOV.L   #XRAM_ADD,R4
        MOV.L   #CLR,R5
        MOV.L   #YRAM_ADD,R6
        MOV.L   #DIV,R7
        LDRS    RPT_E0                          ;Repeat start address
        LDRE    RPT_E0                          ;Repeat end address
        SETRC   #5                              ;Repeat counter setting/5 repetitions
        MOVX.W  @R5,X1  MOVY.W @R7,Y1           ;Clear X1, Y1 = 1/10
RPT_S:  MOVX.W  @R4+,X0 MOVY.W @R6+,Y0
RPT_E0: PADD    X0,Y0,M0
RPT_E:  PADD    X1,M0,X1                        ;X1/data total
        PMULS   X1,Y1,A1                        ;A1/average value
EXIT:
        BRA     EXIT
        NOP
MAIN_E: NOP
(2) Repeat Control Using Extended Instruction REPEAT
;*******************************************************************************************
;*      Repeat routine
;*******************************************************************************************
MAIN:
        MOV.L   #XRAM_ADD,R4
        MOV.L   #CLR,R5
        MOV.L   #YRAM_ADD,R6
        MOV.L   #DIV,R7
        MOV.L   #5,R0
        REPEAT  RPT_S,RPT_E,R0                  ;CPU instructions for repeat control generated automatically
        MOVX.W  @R5,X1  MOVY.W @R7,Y1           ;Clear X1, Y1 = 1/10
RPT_S:  MOVX.W  @R4+,X0 MOVY.W @R6+,Y0
        PADD    X0,Y0,M0
RPT_E:  PADD    X1,M0,X1                        ;X1/data total
        PMULS   X1,Y1,A1                        ;A1/average value
EXIT:
        BRA     EXIT
        NOP
MAIN_E: NOP
Data
* Same data used by main programs (1) and (2)
;*******************************************************************************************
;*      Data (X/YRAM)
;*******************************************************************************************
        .SECTION XRAM,CODE,LOCATE=H'1000F000
XRAM_ADD:
        .XDATA.W 0.0625,0.125,0.0625,0.0625,0.03125     ;DSP operation data
CLR:
        .DATA.W 0                                       ;DSP operation result storage area
        .SECTION YRAM,CODE,LOCATE=H'1001F000
YRAM_ADD:
        .XDATA.W 0.0625,0.125,0.03125,0.125,0.0625      ;DSP operation data
DIV:
        .XDATA.W 0.1                                    ;DSP operation result storage area
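The result this program should produce can be estimated with a short Python model of the repeat loop, using the data above. This is a sketch under assumptions: the helper names are hypothetical, the assembler is assumed to round 0.1 to the nearest 16-bit fraction, and register widths, guard bits, and saturation are not modelled.

```python
X_DATA = [0.0625, 0.125, 0.0625, 0.0625, 0.03125]   # XRAM_ADD values
Y_DATA = [0.0625, 0.125, 0.03125, 0.125, 0.0625]    # YRAM_ADD values
DIV = 0.1                                           # 1/10, loaded into Y1


def to_q15(value):
    """Encode a non-negative fraction as a 16-bit fixed-point integer (assumed rounding)."""
    return int(round(value * 32768))


def average_q15(x_data, y_data, div):
    x1 = 0                  # running total, cleared via the CLR word
    y1 = to_q15(div)
    for x, y in zip(x_data, y_data):    # one pass per repetition (5 in all)
        m0 = to_q15(x) + to_q15(y)      # PADD X0,Y0,M0
        x1 += m0                        # PADD X1,M0,X1
    a1 = (x1 * y1) << 1                 # PMULS X1,Y1,A1 (result aligned to the MSB)
    return x1, a1 / 2**31               # total as an integer, average as a fraction


total_q15, average = average_q15(X_DATA, Y_DATA, DIV)
```

The average differs slightly from the exact value of 0.075 because 0.1 cannot be represented exactly as a 16-bit fraction.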
Section 6 Examples of Arguments Passed Between CPU
Instructions and DSP Instructions
Overview
The two 16-bit fixed-point data values stored at the XRAM_ADD address (H'1000F000) and
YRAM_ADD address (H'1001F000) are multiplied using DSP instructions and CPU instructions.
Description
When data is passed between CPU instructions and DSP instructions, R4, R5, R6, and R7 are used
as pointers and the data is passed via XRAM and YRAM. The procedure when the result of a
calculation performed by the DSP is used by the CPU is described below.
As can be seen in (2-1), (3-1), and (3-2), both the (2) DSP multiplication routine and (3) CPU
multiplication routine of the example main program read data stored in XRAM and YRAM.
Example arguments:
        PADD    X0,Y0,A0        ; Stores result of adding X0 and Y0 in A0
        MOVX.W  A0,@R4          ; Transfers A0 data to R4 address
        MOV.W   @R4,R0          ; Transfers R4 address data to R0
Some points need to be kept in mind when transferring data. Some of the DSP instructions are for
handling fixed-point data, and when fixed-point multiplication is performed the result is matched
to the MSB. However, when multiplication is performed using CPU instructions, integer
multiplication is performed and the result is matched to the LSB. This means that the calculation result
will differ from that obtained using DSP instructions.
The multiplication process used in (2-1), (3-1), and (3-2) in the (2) DSP multiplication routine and
(3) CPU multiplication routine in the flowchart on the following page is shown in table 6.1. This
shows that the calculation results after execution differ even if the source operand data is identical.
When a DSP instruction (PMULS) is used to multiply integer data, it is necessary to convert the
calculation result from fixed-point format into integer format by performing a bit shift.
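The difference between the two multiplications, and the bit shift that reconciles them, can be sketched in Python with the operand values from table 6.1. The function names are hypothetical and the model covers only the 32-bit results, not the actual register behaviour of the device.

```python
MASK32 = 0xFFFFFFFF


def signed16(word):
    """Interpret a 16-bit word as a two's-complement integer."""
    return word - 0x10000 if word & 0x8000 else word


def dsp_multiply(x_word, y_word):
    """Fixed-point multiply: the result is matched to the MSB (shifted left one bit)."""
    return ((signed16(x_word) * signed16(y_word)) << 1) & MASK32


def cpu_multiply(r0_word, r1_word):
    """Integer multiply: the result is matched to the LSB."""
    return (signed16(r0_word) * signed16(r1_word)) & MASK32


def fixed_to_integer(result):
    """Convert a fixed-point product to integer format with a 1-bit arithmetic right shift."""
    value = result - 0x100000000 if result & 0x80000000 else result
    return (value >> 1) & MASK32


dsp_result = dsp_multiply(0x4000, 0x2000)
cpu_result = cpu_multiply(0x4000, 0x2000)
```

With identical source operands the two results differ by exactly one bit position, which is the shift the conversion has to undo.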
Table 6.1   DSP and CPU Multiplication Process

Routine                          Excerpt from Main Program    Register Contents
(2) DSP multiplication routine   PMULS   X0,Y0,A0             Before execution: X0=H'4000, Y0=H'2000
                                                              After execution:  A0=H'1000 0000
(3) CPU multiplication routine   MULS.W  R0,R1                Before execution: R0=H'4000, R1=H'2000
                                 STS     MACL,R14             After execution:  R14=H'0800 0000
Flowchart
Start
Transfer XRAM_ADD address (H'1000F000) to
register R4
(1-1)
Transfer YRAM_ADD address (H'1001F000) to
register R6
(1-2)
Transfer data (H'4000) from R4 address
(H'1000F000) to register X0
Transfer data (H'2000) from R6 address
(H'1001F000) to register Y0
(2-1)
Multiply data from register X0 and register Y0, store
result in register A0
(2-2)
Transfer data (H'4000) from R4 address
(H'1000F000) to register R0
(3-1)
Transfer data (H'2000) from R6 address
(H'1001F000) to register R1
(3-2)
Multiply data from register R0 and register R1
(3-3)
Transfer data (multiplication result) from register
MACL to register R14
(3-4)
(1)
(2)
(3)
End
Main Program
;*******************************************************************************************
;*      Initial setting routine
;*******************************************************************************************
MAIN:
        MOV.L   #XRAM_ADD,R4
        MOV.L   #YRAM_ADD,R6
;*******************************************************************************************
;*      DSP multiplication routine
;*******************************************************************************************
        MOVX.W  @R4,X0  MOVY.W @R6,Y0           ;Load 0.5,0.25
        PMULS   X0,Y0,A0                        ;A0 = multiplication result
;*******************************************************************************************
;*      CPU multiplication routine
;*******************************************************************************************
        MOV.W   @R4,R0                          ;H'4000 load
        MOV.W   @R6,R1                          ;H'2000 load
        MULS.W  R0,R1
        STS     MACL,R14                        ;R14 = multiplication result
EXIT:
        BRA     EXIT
        NOP
MAIN_E: NOP
Data
;**********************************************************************
;*      Data
;**********************************************************************
        .SECTION XRAM,DATA,LOCATE=H'1000F000
XRAM_ADD:
        .XDATA.W 0.5                            ;DSP operation data
        .SECTION YRAM,DATA,LOCATE=H'1001F000
YRAM_ADD:
        .XDATA.W 0.25                           ;DSP operation data
        .END
Section 7 32-bit Multiplication
Overview
The 32-bit data value stored at the XRAM_ADD address (H'1000F000) and the 32-bit data value
stored at the YRAM_ADD address (H'1001F000) are multiplied, and the result (64-bit) is
transferred from the ANS address (H'1001F100) to the ANS+7 address (H'1001F107), where it is
stored.
Description
1. Overview of Calculation Method
The addresses where the multiplier and multiplicand of a 32-bit multiplication operation are
stored, and the address where the result is stored, are shown in figure 7.1. Figure 7.2 shows an
overview of the calculation method for 32-bit multiplication. The 32-bit data values (the
multiplier and multiplicand) are separated into their upper and lower 16-bit segments (here
provisionally called A, B, C, and D), which are then multiplied to produce the 64-bit operation
result. The top bit (MSB) of the 16-bit data input to the multiplier is interpreted as the sign bit,
and it has a weight of –2^0 = –1. Therefore, in the example program the top bit (MSB) is first
replaced with 0, the product of the various segments is calculated, and correction items are
added according to the top bit in order to obtain the 32-bit multiplication result.
Input
  Multiplicand (32-bit):          bits 31 to 16 at XRAM_ADD, bits 15 to 0 at XRAM_ADD+2
  Multiplier (32-bit):            bits 31 to 16 at YRAM_ADD, bits 15 to 0 at YRAM_ADD+2
Output
  Multiplication result (64-bit): bits 63 to 48 at ANS, bits 47 to 32 at ANS+2,
                                  bits 31 to 16 at ANS+4, bits 15 to 0 at ANS+6
Figure 7.1 32-bit Multiplication
        [A][B]   Multiplicand   A: XRAM_ADD address data, B: XRAM_ADD+2 address data
  ×)    [C][D]   Multiplier     C: YRAM_ADD address data, D: YRAM_ADD+2 address data

  The partial products B×D, A×D, B×C, and A×C are added, each aligned to its
  position within bits 63 to 0 of the result.
Figure 7.2 Overview of Calculation Method for 32-bit Multiplication
2. Double-length Calculation Algorithm
If the single-precision number of bits is n, “double-length” refers to 2n bits. Therefore, 2n bit
numbers can be expressed as shown in figure 7.3.
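Multiplicand E: upper half A (bits 2n–1 to n), lower half B (bits n–1 to 0)
  A = –e_{2n–1} · 2^{2n–1} (upper MSB) + A0, where A0 = Σ_{i=n}^{2n–2} e_i · 2^i
  B = e_{n–1} · 2^{n–1} (lower MSB) + B0, where B0 = Σ_{i=0}^{n–2} e_i · 2^i
Multiplier F: upper half C (bits 2n–1 to n), lower half D (bits n–1 to 0)
  C = –f_{2n–1} · 2^{2n–1} + C0, where C0 = Σ_{i=n}^{2n–2} f_i · 2^i
  D = f_{n–1} · 2^{n–1} + D0, where D0 = Σ_{i=0}^{n–2} f_i · 2^i
*1: e_i, f_i = 0 or 1
Figure 7.3 Structure of 2n-bit Numbers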
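Here, if $\sum_{i=n}^{2n-2} e_i \cdot 2^i = A0$, $\sum_{i=0}^{n-2} e_i \cdot 2^i = B0$, $\sum_{i=n}^{2n-2} f_i \cdot 2^i = C0$, and $\sum_{i=0}^{n-2} f_i \cdot 2^i = D0$, the double-length
multiplication E × F can be expressed as:
\[
\begin{aligned}
E \times F ={}& (-e_{2n-1} \cdot 2^{2n-1} + A0 + e_{n-1} \cdot 2^{n-1} + B0) \times (-f_{2n-1} \cdot 2^{2n-1} + C0 + f_{n-1} \cdot 2^{n-1} + D0) \\
={}& e_{2n-1} \cdot f_{2n-1} \cdot 2^{4n-2} && (1) \\
& - e_{2n-1} \cdot 2^{2n-1} \, (C0 + f_{n-1} \cdot 2^{n-1} + D0) && (2) \\
& - f_{2n-1} \cdot 2^{2n-1} \, (A0 + e_{n-1} \cdot 2^{n-1} + B0) && (3) \\
& + e_{n-1} \cdot 2^{n-1} \, (C0 + f_{n-1} \cdot 2^{n-1} + D0) && (4) \\
& + f_{n-1} \cdot 2^{n-1} \, (A0 + B0) && (5) \\
& + A0 \cdot C0 + A0 \cdot D0 + B0 \cdot C0 + B0 \cdot D0 && (6)
\end{aligned}
\]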
In the above equation, (6) is the product of the segments and (1) through (5) are correction
items.
The correction items involve determining whether the sign bit is “0” or “1” and, if it is “1”,
adding it to or deleting it from the product of the segments.
Figure 7.4 shows a 32-bit double-length multiplication algorithm that uses the above equation.
The whole can be subdivided into the following six parts:
In part (1), in order to clear the sign bits of A, B, C, and D to 0, the logical product with
H'7FFF is obtained, resulting in A0, B0, C0, and D0. In part (2), the product is calculated for
the following four segments: A0 · C0, A0 · D0, B0 · C0, and B0 · D0. In parts (3) through (6),
the sum is obtained for each digit, and the results are stored at the ANS, ANS+2, ANS+4, and
ANS+6 addresses.
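The decomposition into segment products and correction items can be checked numerically with plain integers. The Python sketch below is an assumption-laden illustration: the function names are hypothetical, A0 and C0 are kept weighted by 2^16 as in figure 7.3, and it uses integer arithmetic throughout, so it mirrors the algebra only and not the fixed-point alignment or the word-by-word carry handling of the example program.

```python
N = 16  # single-precision word length n


def split(value):
    """Split a signed 32-bit value into (upper sign bit, A0, lower sign bit, B0)."""
    u = value & 0xFFFFFFFF
    sign_hi = (u >> (2 * N - 1)) & 1
    upper0 = ((u >> N) & 0x7FFF) << N      # bits 2n-2..n, kept weighted
    sign_lo = (u >> (N - 1)) & 1
    lower0 = u & 0x7FFF                    # bits n-2..0
    return sign_hi, upper0, sign_lo, lower0


def multiply_by_segments(e, f):
    """Multiply two signed 32-bit values using items (1) through (6)."""
    e_hi, a0, e_lo, b0 = split(e)
    f_hi, c0, f_lo, d0 = split(f)
    item1 = (e_hi * f_hi) << (4 * N - 2)
    item2 = -(e_hi << (2 * N - 1)) * (c0 + (f_lo << (N - 1)) + d0)
    item3 = -(f_hi << (2 * N - 1)) * (a0 + (e_lo << (N - 1)) + b0)
    item4 = (e_lo << (N - 1)) * (c0 + (f_lo << (N - 1)) + d0)
    item5 = (f_lo << (N - 1)) * (a0 + b0)
    item6 = a0 * c0 + a0 * d0 + b0 * c0 + b0 * d0   # product of the segments
    return item1 + item2 + item3 + item4 + item5 + item6


product = multiply_by_segments(-123456789, 987654321)
```

Item (6) only ever multiplies sign-cleared segments, which is why the sign bits can be masked with H'7FFF first and accounted for afterwards through items (1) to (5).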
[Figure: bit-level diagram of the six parts of the algorithm]
(1) Sign bits (S) of A, B, C, and D cleared to 0, giving A0, B0, C0, and D0 ((1-1) to (1-4))
(2) Segment products A0 × C0, B0 × D0, A0 × D0, and B0 × C0, each 32 bits ((2-1) to (2-4))
(3) ANSWER1: (B0 × D0) Low
(4) ANSWER2: (A0 × D0) Low + (B0 × C0) Low + (B0 × D0) High
    + correction item (4) + correction item (5)
(5) ANSWER3: (A0 × C0) Low + (B0 × C0) High + (A0 × D0) High
    + correction item (2) + correction item (3) + correction item (4) + correction item (5)
(6) ANSWER4: (A0 × C0) High + correction item (2) (–C0) + correction item (3) (–A0)
    + correction item (1) (H'8000)
*1 S: Sign bit
*2: Decimal point position
Figure 7.4 32-bit Double-length Multiplication Algorithm
Flowchart
Start
To clear sign bit of A, obtain logical product of A and
H'7FFF, and designate as A0
Determine sign bit
(1-1)
To clear sign bit of B, obtain logical product of B and
H'7FFF, and designate as B0
Determine sign bit
(1-2)
To clear sign bit of C, obtain logical product of C and
H'7FFF, and designate as C0
Determine sign bit
(1-3)
To clear sign bit of D, obtain logical product of D and
H'7FFF, and designate as D0
Determine sign bit
(1-4)
Multiply A0 and C0, separate upper and lower bits of
result, and store in XRAM
(2-1)
Multiply B0 and D0, separate upper and lower bits of
result, and store in YRAM
(2-2)
Multiply A0 and D0, separate upper and lower bits of
result, and store in XRAM
(2-3)
Multiply B0 and C0, separate upper and lower bits of
result, and store in YRAM
(2-4)
Store lower bits of B0 and D0 multiplication result at
ANS+6 address
(3-1)
Add lower bits of A0 × D0, lower bits of B0 × C0, and
upper bits of B0 × D0
(4-1)
(1)
(2)
(3)
Is B sign bit 1?
(4)
No
(4-2)
Yes
Add lower bits (D) of correction item (4) to result of
(4-1)
I
(4-3)
I
Is D sign bit 1?
(4)
No
(4-4)
Yes
Add lower bits (B0) of correction item (5) to result of
(4-1) or (4-3)
(4-5)
Store result of (4-1), (4-3) or (4-5) at ANS+4 address
(4-6)
Add lower bits of A0 × C0, upper bits of B0 × C0, and
upper bits of A0 × D0
(5-1)
Is A sign bit 1?
No
(5-2)
Yes
Add lower bits (–D) of correction item (2) to result of
(5-1)
Is C sign bit 1?
No
(5-3)
(5-4)
Yes
(5)
Add lower bits (–B) of correction item (3) to result of
(5-1) or (5-3)
Is B sign bit 1?
No
(5-5)
(5-6)
Yes
Add upper bits (C0) of correction item (4) to result of
(5-1), (5-3) or (5-5)
Is D sign bit 1?
No
(5-7)
(5-8)
Yes
Add upper bits (A0) of correction item (5) to result of
(5-3), (5-5) or (5-7)
(5-9)
II
II
(5)
Store result of (5-1), (5-3), (5-5), (5-7) or (5-9) at
ANS+2 address
(5-10)
Add carry to upper bits of result of (2-1)
(6-1)
Is A sign bit 1?
No
(6-2)
Yes
Add upper bits (–C0) of correction item (2) to result
of (6-1)
Is C sign bit 1?
No
(6-3)
(6-4)
Yes
(6)
Add upper bits (–A0) of correction item (3) to result
of (6-1) or (6-3)
Are A and C sign bits both 1?
No
(6-5)
(6-6)
Yes
Add correction item (1) (H'8000) to result of (6-1),
(6-3) or (6-5)
(6-7)
Store result of (6-1), (6-3), (6-5) or (6-7) at ANS
address
(6-8)
End
Main Program
;*******************************************************************************************
;*
32-bit fixed-point multiplication routine
;*
[A][B] × [C][D]
;*
;*
;*******************************************************************************************
MAIN: MOV.L
#XRAM_ADD,R4
MOV.L
#WORKX,R5
MOV.L
#YRAM_ADD,R6
MOV.L
#WORKY,R7
;XRAM for work
;YRAM for work
;Clear sign
MOV.W
#H'7FFF,R0
MOV.W
R0,@R7
PCLR
A1
PAND
X0,Y0,A0
MOV.W
PSHA
DCT
PINC
PAND
MOVX.W @R4+,X0 MOVY.W @R7,Y0 ;A,H'7FFF load
MOVY.W @R6+,Y1 ;A0,C load
R0,@R5
;H'7FFF -> #WORKX
#1,X0
MOVX.W @R5,X1
;A sign check,H'7FFF load
A1,A1
MOVX.W A0,@R5+
;A0 store
X1,Y1,A0
MOVX.W @R4,X0
;C0,B load
MOV.L
R4,@-R15
MOV.L
#SIGNA,R4
PCLR
A1
PSHA
#1,Y1
MOVY.W A0,@R7+ ;C sign check,C0 store
DCT PINC
A1,A1
MOVY.W @R6,Y1 ;B sign check,D load
PAND
X0,Y0,A0
PCLR
A1
PSHA
#1,X0
DCT PINC
A1,A1
PAND
X1,Y1,A0
PCLR
A1
PSHA
#1,Y1
DCT PINC
A1,A1
MOVX.W A1,@R4+
MOVX.W A1,@R4+
;B0
MOVX.W A0,@R5
MOVX.W A1,@R4+
;D0,B0 store
MOVY.W A0,@R7 ;D0 store
MOVX.W A1,@R4
MOV.L
@R15+,R4
;*****************************************************************
;*Segment product calculation routine/
B0×D0,A0×C0,B0×C0,A0×D0
;*****************************************************************
MOV.L
#WORKX,R5
MOV.L
#WORKY,R7
MOVX.W @R5+,X0 MOVY.W @R7+,Y0 ;A0,C0
PMULS
X0,Y0,A1
MOVX.W @R5+,X1 MOVY.W @R7+,Y1 ;A0×C0,B0,D0
PMULS
X1,Y1,A0
MOVX.W A1,@R5+
PSHA
#16,A1
;B0×D0, (A0×C0)H store
MOVY.W A0,@R7+ ;(A0×C0)L, (B0×D0)H store
PSHA
#16,A0
MOVX.W A1,@R5+
;(B0×D0)L, (A0×C0)L store
PMULS
X0,Y1,A1
PSHA
#16,A1
MOVX.W A1,@R5+
MOVY.W A0,@R7+ ;A0×D0, (B0×D0)L store
;(A0×D0)L, (A0×D0)H store
PMULS
X1,Y0,A1
MOVX.W A1,@R5
;B0×C0, (A0×D0)L store
PSHA
#16,A1
MOVY.W A1,@R7+ ;(B0×C0)L, (B0×C0)H store
MOVY.W A1,@R7 ;(B0×C0)L store
;******************
;*ANSWER1 STORE
;******************
MOV.L
R7,@-R15
MOV.L
#ANS,R7
;push R7
ADD
#6,R7
ADD
#-2,R7
MOV.L
R7,R14
;R14=#ANS+2
MOV.L
@R15+,R7
;pop R7
MOVY.W A0,@R7+ ;Store in ANS1
;*******************************************************************************************
;*2-word calculation routine/
R4=#XRAM_ADD+2,R5=#WORKX+10,R6=#YRAM_ADD+2,R7=#WORKY+10
;*******************************************************************************************
PCOPY
X1,M1
MOV.L
#-6,R9
PCLR
A1
PADD
X1,Y1,A0
DCT PINC
PADD
DCT PINC
MOVX.W @R5,X1
MOVY.W @R7+R9,Y1 ;(A0×D0)L load,
(B0×C0)L load
MOVY.W @R7+,Y1
A1,A1
;carry check
A0,Y1,A0
;(A0×D0)L+(B0×C0)
L+(B0×D0)H
A1,A1
;carry check
MOV.W
#H'0,R10
MOV.L
#SIGND,R0
MOV.W
@R0+,R1
CMP/EQ
R10,R1
BT
HOSEI4_L
;Is B negative?
MOVY.W @R6,Y1
PADD
DCT PINC
;(A0×D0)L+(B0×C0)L,
(B0×D0)H load
A0,Y1,A0
;Load D
;Add D
A1,A1
HOSEI4_L:
MOV.W
@R0,R1
CMP/EQ
R10,R1
BT
HOSEI5_L
PADD
DCT PINC
A0,M1,A0
;Is D negative?
;Add B0
A1,A1
HOSEI5_L:
MOV.L
R4,@-R15
;push R4
MOV.L
#CARRY,R4
MOV.L
@R15+,R4
;pop R4
MOV.L
R7,@-R15
;push R7
MOV.L
R14,R7
ADD
#-2,R7
MOV.L
R7,R14
;R14=#ANS+4
MOV.L
@R15+,R7
;pop R7
MOVX.W A1,@R4
;carry store
;******************
;*ANSWER2 STORE
;******************
MOVY.W A0,@R7+
;ANS2 store
;*******************************************************************************************
;*3-word calculation routine/
R4=#XRAM_ADD+2,R5=#WORKX+10,R6=#YRAM_ADD+2,R7=#WORKY+6
;*******************************************************************************************
MOV.L
#-4,R8
PCOPY
X0,A1
MOVX.W @R5+R8,X0 MOVY.W @R7+,Y1 ;dummy load
MOVX.W @R5+,X0
PADD
DCT PINC
PADD
DCT PINC
X0,Y1,M1
MOVX.W @R5,X1
MOVY.W @R7+,Y1 ;(A0×C0)L load,
(B0×C0)H load
;(A0×C0)L+(B0×C0)H,
(A0×D0)H load
M0,M0
;carry check
X1,M1,A0
;(A0×C0)L+(B0×C0)
H+(A0×D0)H
M0,M0
;carry check
;Correction
MOV.W
#H'0,R10
MOV.L
#SIGNA,R0
MOV.W
@R0+,R1
CMP/EQ
R10,R1
BT
HOSEI2_L
PSUB
DCT PDEC
;Is A negative?
A0,Y1,A0
;Subtract D (correction 2)
M0,M0
HOSEI2_L:
MOV.W
@R0+,R1
CMP/EQ
R10,R1
BT
HOSEI3_L
;Is C negative?
MOVX.W @R4,X1
PCOPY
PSUB
DCT PDEC
X1,M1
A0,M1,A0
;Subtract B (correction 3)
M0,M0
HOSEI3_L:
MOV.W
@R0+,R1
CMP/EQ
R10,R1
BT
HOSEI4_H
PADD
A0,Y0,A0
;Is B negative?
;Add C0 (correction 4)
DCT PINC
M0,M0
HOSEI4_H:
MOV.W
@R0+,R1
CMP/EQ
R10,R1
BT
HOSEI5_H
PCOPY
PADD
DCT PINC
;Is D negative?
A1,M1
A0,M1,A0
;Add A0 (correction 5)
M0,M0
HOSEI5_H:
PCOPY
A0,M1
MOV.L
#CARRY,R4
MOVX.W @R4,X1
PADD
DCT PINC
;Load carry
X1,M1,A0
;Add carry
M0,M0
;Check carry
;**************
;*ANSWER3 STORE
;**************
MOV.L
R14,R7
ADD
#-2,R7
MOVY.W A0,@R7+ ;ANS3 store
;*******************************************************************************************
;*4-word calculation routine/
R4=#XRAM_ADD+2,R5=#WORKX+8,R6=#YRAM_ADD+2,R7=#WORKY+10
;*******************************************************************************************
PCLR
Y1
MOVX.W @R5+R8,X1
;dummy load
PCLR
M1
MOVX.W @R5,X1
;(A0×C0)H load
PADD
DCT PINC
X1,M0,A0
M1,M1
;Correction
MOV.L
#SIGNA,R0
MOV.W
@R0+,R1
CMP/EQ
R10,R1
BT
HOSEI3_H
PCOPY
PSUB
DCT PDEC
;Is A negative?
A1,M0
A0,M0,A0
;Subtract C0 (correction 2)
M1,M1
MOV.L
#H'0,R12
ADD
#1,R12
HOSEI2_H:
MOV.W
@R0+,R1
CMP/EQ
R10,R1
BT
HOSEI4_H
PSUB
DCT PDEC
ADD
A0,Y0,A0
;Is C negative?
;Subtract A0 (correction 3)
M1,M1
#1,R12
HOSEI3_H:
MOV.L
#2,R1
CMP/EQ
R1,R12
BF
FIN
MOV.W
#H'8000,R10
MOV.W
R10,@R5
;Are both A and C negative?
MOVX.W @R5,X0
PCOPY
X0,M1
PADD
A0,M1,A0
;Add H'8000 (correction 1)
;**************
;*ANSWER4 STORE
;**************
FIN:
MOVY.W A0,@R7 ;ANS4 store
EXIT: BRA
EXIT
NOP
MAIN_E:
NOP
Data
;*******************************************************************************************
;*      32-bit multiplication data (XRAM/YRAM)
;*******************************************************************************************
        .SECTION XRAM,DATA,LOCATE=H'1000F000
XRAM_ADD:
        .XDATA.L 0.25002500     ;Multiplicand
WORKX:
        .RES.W  6               ;Work area
CARRY:
        .RES.W  1               ;Carry area
SIGNA:
        .RES.W  1               ;For determining sign of multiplicand upper word A
SIGNC:
        .RES.W  1               ;For determining sign of multiplier upper word C
SIGNB:
        .RES.W  1               ;For determining sign of multiplicand lower word B
SIGND:
        .RES.W  1               ;For determining sign of multiplier lower word D
        .SECTION YRAM,DATA,LOCATE=H'1001F000
YRAM_ADD:
        .XDATA.L 0.50005000     ;Multiplier
WORKY:
        .RES.W  6               ;Work area
ANS:
        .RES.W  4               ;Multiplication result storage area
Section 8 Trigonometric Functions
Overview
Calculating the trigonometric functions SIN(X) and COS(X).
Description
1. Performing Trigonometric Functions
Figure 8.1 shows curves for SIN(X) and COS(X). If the angle range is –π ≤ X ≤ π, the
relationships expressed in equation (1) exist.
$$\sin(-X) = -\sin(X), \qquad \cos(-X) = \cos(X) \tag{1}$$
Using the relationships expressed in equation (1), the SIN(X) and COS(X) of –π ≤ X ≤ 0 can
be calculated by obtaining the SIN(X) and COS(X) of 0 ≤ X ≤ π and processing the sign.
Figure 8.2 (a) and (b) show SIN(X) and COS(X) with X = π/2 at the center. SIN(X) is symmetric
about X = π/2 and COS(X) is antisymmetric, as expressed in equation (2).
$$\sin(\pi/2 + X) = \sin(\pi/2 - X), \qquad \cos(\pi/2 + X) = -\cos(\pi/2 - X) \tag{2}$$
Figure 8.1 SIN(X) and COS(X) Curves (horizontal axis: –π to π; vertical axis: –1 to 1)
Figure 8.2 SIN(X) and COS(X) Curves with X = π/2 at Center: (a) SIN(X), (b) COS(X)
(horizontal axis: 0 to π)
Based on equations (1) and (2), the SIN(X) and COS(X) of –π ≤ X ≤ π can be calculated by
obtaining the SIN(X) and COS(X) of 0 ≤ X ≤ π/2 and, finally, processing the sign. The example
program divides 0 ≤ X ≤ π/2 into 128 segments. If X = n · π/256 + ∆X (n = 0, 1, ..., 128),
the result is equation (3), based on the addition theorem of trigonometric functions.
$$\begin{aligned}
\sin(X) &= \sin(n\pi/256 + \Delta X) = \sin(n\pi/256)\cos(\Delta X) + \cos(n\pi/256)\sin(\Delta X) \\
\cos(X) &= \cos(n\pi/256 + \Delta X) = \cos(n\pi/256)\cos(\Delta X) - \sin(n\pi/256)\sin(\Delta X)
\end{aligned} \tag{3}$$
If we assume that in equation (3) ∆X is extremely small and approximate SIN(∆X) = ∆X
and COS(∆X) = 1 – (∆X)²/2, the result is equation (4).
$$\begin{aligned}
\sin(X) &= \sin(n\pi/256)\,\{1 - (\Delta X)^2/2\} + \Delta X \cos(n\pi/256) \\
\cos(X) &= \cos(n\pi/256)\,\{1 - (\Delta X)^2/2\} - \Delta X \sin(n\pi/256)
\end{aligned} \tag{4}$$
In other words, by calculating equation (4) using ∆X and table data (n · π/256), we can obtain
the SIN(X) and COS(X) of 0 ≤ X ≤ π/2. The final result is then obtained by performing sign
processing.
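As a rough numerical check of equation (4), the following Python sketch builds the 129-entry
tables in floating point and evaluates the approximation for 0 ≤ X ≤ π/2. It is only an
illustration: the names SIN_TABLE, COS_TABLE, and sincos_first_quadrant are hypothetical, and
the example program itself works in 16-bit fixed point rather than floating point.

```python
import math

SEGMENTS = 128                 # 0 <= X <= pi/2 divided into steps of pi/256
STEP = math.pi / 256

# Table data sin(n*pi/256) and cos(n*pi/256) for n = 0..128
SIN_TABLE = [math.sin(n * STEP) for n in range(SEGMENTS + 1)]
COS_TABLE = [math.cos(n * STEP) for n in range(SEGMENTS + 1)]


def sincos_first_quadrant(x):
    """Approximate (sin x, cos x) for 0 <= x <= pi/2 using equation (4)."""
    n = min(int(x / STEP), SEGMENTS)      # table address
    dx = x - n * STEP                     # shift from the table value (delta X)
    k = 1.0 - dx * dx / 2.0               # approximation of cos(delta X)
    s = SIN_TABLE[n] * k + dx * COS_TABLE[n]
    c = COS_TABLE[n] * k - dx * SIN_TABLE[n]
    return s, c


if __name__ == "__main__":
    for x in (0.0, 0.3, math.pi / 4, 1.2, math.pi / 2):
        s, c = sincos_first_quadrant(x)
        print(f"x={x:.5f}  sin error={s - math.sin(x):+.2e}  cos error={c - math.cos(x):+.2e}")
```

Because ∆X stays below π/256, the neglected term (∆X)³/6 of SIN(∆X) is about 3 × 10⁻⁷ at most,
which is far smaller than the 2⁻¹⁵ resolution of a 16-bit fixed-point result.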
2. Converting Input Values
Using conversion equation (5), the example program converts the angle X in the range
–π ≤ X ≤ π to the angle parameter a in the range –1 ≤ a < 1, and a is input to the DSP.
$$X = \pi \cdot a, \qquad a = X/\pi \tag{5}$$
X unit: rad
a unit: rad/π
Table 8.1 Relation Between Input Value a and Polarity

                                        Result
Input Value a     Range of X            SIN(X)      COS(X)      | a |
–1 ≤ a < –0.5     –π ≤ X < –π/2         Negative    Negative    | a | > 0.5
–0.5 ≤ a < 0      –π/2 ≤ X < 0          Negative    Positive    | a | ≤ 0.5
0 ≤ a ≤ 0.5       0 ≤ X ≤ π/2           Positive    Positive    | a | ≤ 0.5
0.5 < a < 1       π/2 < X < π           Positive    Negative    | a | > 0.5
Here the range 0 ≤ X ≤ π/2 corresponds to the range 0 ≤ a ≤ 0.5. The input value a is
therefore converted from the range –1 < a ≤ 1 to the range 0 ≤ a' ≤ 0.5. Figure 8.3 shows the
curves | SIN(X) | and | COS(X) |.
Figure 8.3 Curves | SIN(X) | and | COS(X) |: (a) | SIN(X) |, (b) | COS(X) |
(horizontal axis: –π to π; point A and deviation B are marked in each panel)
When obtaining the SIN(X) and COS(X) of point A in figure 8.3, if we assume that A = π/2 +
B, then a = 0.5 + b. Therefore, it is possible to obtain the deviation | b | relative to X = π/2
using equation (6).
$$|b| = \bigl|\,|a| - 0.5\,\bigr| \tag{6}$$
Next, based on deviation | b |, equation (7) is used to convert input value a in the range
–1 < a ≤ 1 to a' in the range 0 ≤ a' ≤ 0.5.
$$a' = \Bigl|\,\bigl|\,|a| - 0.5\,\bigr| - 0.5\,\Bigr| \tag{7}$$
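The folding of equation (7) and the sign rules of table 8.1 can be checked with a short Python
sketch. The function names fold_input and sincos_via_fold are hypothetical, and the sketch
uses floating-point library functions as a reference instead of the table lookup of the example
program.

```python
import math


def fold_input(a):
    """Fold a (= X/pi, -1 <= a <= 1) into a' and the signs given in table 8.1."""
    a_prime = abs(abs(abs(a) - 0.5) - 0.5)   # equation (7)
    sin_negative = a < 0                     # table 8.1: SIN(X) is negative for a < 0
    cos_negative = abs(a) > 0.5              # table 8.1: COS(X) is negative for |a| > 0.5
    return a_prime, sin_negative, cos_negative


def sincos_via_fold(a):
    """Evaluate sin(pi*a), cos(pi*a) through the folded angle pi*a'."""
    a_prime, sin_negative, cos_negative = fold_input(a)
    s = math.sin(math.pi * a_prime)
    c = math.cos(math.pi * a_prime)
    return (-s if sin_negative else s), (-c if cos_negative else c)


if __name__ == "__main__":
    for a in (-0.75, -0.3, 0.2, 0.8):
        print(a, fold_input(a), sincos_via_fold(a))
```

For example, a = 0.75 (X = 3π/4) folds to a' = 0.25, with SIN(X) positive and COS(X) negative.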
3. a' Table Data
The example program uses a table with 128 cells. In other words, the range 0 ≤ a' ≤ 0.5 is
divided into 128 equal segments. The difference in a' due to the angle of each segment is
expressed in equation (8).
0.5/128 = 0.00390625 ------------------------------------------------------------------- (8)
Table 8.2 shows the correspondence between table address n and a' in decimal notation and as
16-bit fixed-point expressions.
Table 8.2 Relationship Between Table Address n and a'

Table        a' = n/256 [rad/π]    16-bit Fixed-point Expression
Address n    Decimal Notation      Bit 15 . Bits 14 to 0
0            0.00000000            0 . 000 0000 0000 0000
1            0.00390625            0 . 000 0000 1000 0000
2            0.00781250            0 . 000 0001 0000 0000
3            0.01171875            0 . 000 0001 1000 0000
4            0.01562500            0 . 000 0010 0000 0000
:            :                     :
127          0.49609375            0 . 011 1111 1000 0000
128          0.50000000            0 . 100 0000 0000 0000
"." : Decimal point position
4. Method of Calculating ∆X
As shown in table 8.2, the upper nine bits of the a' data expressed in fixed-point format
correspond to n, and the lower seven bits to the amount of shift ∆a' from the table data. Figure
8.4 shows the bit structure of a'. By obtaining the value of a', it is possible to calculate the
table data address (the value of n · π/256) used in equation (4) as well as ∆X at the same time.
Finally, table 8.1 is used for sign processing in order to obtain the SIN(X) and COS(X) of
–π ≤ X ≤ π.
Bits 15 to 7: Table address n
Bits 6 to 0: Shift from table ∆a'
Figure 8.4 Bit Structure of a'
Figure 8.5 shows the amount of shift ∆X between table values. The shift ∆X can be obtained
from the ∆a' of a' using equation (9).
$$\Delta X = \Delta a' \cdot \pi \tag{9}$$
Figure 8.5 Relation With Amount of Shift Between Table Values (∆X is measured from the table
value at n · π/256 toward the next table value at (n+1) · π/256)
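The split of a' into table address n and shift ∆a' can be sketched in Python as follows. The
function name split_a_prime is hypothetical, and a' is assumed to be held as a 16-bit value with
the decimal point below bit 15, as in table 8.2. The sketch shifts right by 7 bits to obtain n
itself; the example program's 6-bit shift yields 2n, which would correspond to the byte offset of
a 16-bit table entry (an interpretation, not something the text states).

```python
import math


def split_a_prime(a_prime_q15):
    """Split a 16-bit a' (0 <= a' <= 0.5, i.e. 0x0000..0x4000) into n, delta a', delta X."""
    n = (a_prime_q15 & 0xFF80) >> 7          # upper nine bits: table address n
    delta_a = a_prime_q15 & 0x007F           # lower seven bits: shift from the table value
    delta_x = math.pi * delta_a / 32768.0    # equation (9), in radians
    return n, delta_a, delta_x


if __name__ == "__main__":
    for value in (0x0000, 0x0080, 0x2005, 0x3F80, 0x4000):
        print(hex(value), split_a_prime(value))
```

With this layout, H'3F80 gives n = 127 with no shift, matching the row for n = 127 in table 8.2.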
5. Overflow Processing
If the calculation result is as shown in equation (10), an overflow occurs.
$$|\sin(X)| \ge 1, \qquad |\cos(X)| < 0 \tag{10}$$
In such cases the value is corrected using equation (11).
$$|\sin(X)| = 1 - 2^{-15}, \qquad |\cos(X)| = 0 \tag{11}$$
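A minimal Python sketch of this correction is shown below. The name correct_overflow is
hypothetical, and the magnitudes are modeled as floating-point numbers rather than as the
DSP's 16-bit registers.

```python
Q15_MAX = 1.0 - 2.0 ** -15   # largest value representable below 1, as in equation (11)


def correct_overflow(sin_mag, cos_mag):
    """Apply equation (11) when a magnitude meets the overflow condition of equation (10)."""
    if sin_mag >= 1.0:
        sin_mag = Q15_MAX
    if cos_mag < 0.0:
        cos_mag = 0.0
    return sin_mag, cos_mag


if __name__ == "__main__":
    print(correct_overflow(1.0, -1e-5))
    print(correct_overflow(0.5, 0.25))
```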
6. Algorithm for Calculating Trigonometric Functions
The algorithm for calculating trigonometric functions is as follows.
(1) Make initial settings.
(2) Load input value a, calculate | | | a | –0.5 | –0.5 | to obtain a'.
(3) Obtain logical product of above and #H'FF80 and calculate upper nine bits (n/256) of a'.
Then calculate n and set value in Y bus index register (R9).
(4) Obtain logical product of above and #H'007F and calculate lower seven bits (∆a') of a'.
(5) Calculate π∆a'; calculate ∆X.
(6) Calculate 1 – (∆X)²/2. Load sin(n × π/256) and cos(n × π/256) from data table in YRAM.
(7) Calculate sin(X).
(8) Process sign of sin(X); store sin(X).
(9) Calculate cos(X).
(10) Process sign of cos(X); store cos(X).
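The steps above can be modeled end to end in Python. This is a sketch under stated assumptions,
not the example program: the input is taken as a signed 16-bit integer representing a with
H'4000 = 0.5, the tables and intermediate products are kept in floating point, and the names
SIN_TABLE, COS_TABLE, and sincos_from_a are hypothetical.

```python
import math

SIN_TABLE = [math.sin(n * math.pi / 256) for n in range(129)]
COS_TABLE = [math.cos(n * math.pi / 256) for n in range(129)]


def sincos_from_a(a_q15):
    """Return (sin X, cos X) for X = pi * a, with a given as a 16-bit value (-32768..32767)."""
    # (2) a' = |||a| - 0.5| - 0.5|, where 0.5 is 0x4000
    a_prime = abs(abs(abs(a_q15) - 0x4000) - 0x4000)
    # (3) upper nine bits give the table address n
    n = (a_prime & 0xFF80) >> 7
    # (4) lower seven bits give the shift delta a'
    delta_a = a_prime & 0x007F
    # (5) delta X = pi * delta a'
    dx = math.pi * delta_a / 32768.0
    # (6) 1 - (delta X)^2 / 2
    k = 1.0 - dx * dx / 2.0
    # (7), (9) equation (4)
    s = SIN_TABLE[n] * k + dx * COS_TABLE[n]
    c = COS_TABLE[n] * k - dx * SIN_TABLE[n]
    # overflow correction, equations (10) and (11)
    s = min(s, 1.0 - 2.0 ** -15)
    c = max(c, 0.0)
    # (8), (10) sign processing from table 8.1
    if a_q15 < 0:
        s = -s
    if abs(a_q15) > 0x4000:
        c = -c
    return s, c


if __name__ == "__main__":
    for a in (0, 5461, 8192, -7282, -32768):
        print(a, sincos_from_a(a))
```

In this model the result stays within about 2⁻¹⁵ of the library values of sin and cos for every
16-bit input, the largest deviation coming from the overflow correction at a = ±0.5.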
Execution Example
The sin(X) and cos(X) (OUTPUT) calculation results obtained based on the input value a
(INPUT) are shown in table 8.3.

Table 8.3 sin(X), cos(X) Calculation Results

Angle    Input Value   Theoretical Value      Theoretical Value     Output Value
X°       (a = X/π)     (decimal)              (hexadecimal)         (hexadecimal)
                       sin(X)     cos(X)      sin(X)    cos(X)      sin(X)    cos(X)
0        0             0          1           H'0000    H'7FFF      H'0000    H'7FFF
30       0.16667       0.5        0.86603     H'4000    H'6EDA      H'3FFE    H'6ED9
45       0.25          0.70711    0.70711     H'5A82    H'5A82      H'5A82    H'5A82
89.5     0.49722       0.99996    0.00873     H'7FFE    H'011E      H'7FFD    H'011D
152      0.84444       0.46947    –0.88295    H'3C17    H'8EFC      H'3C19    H'8EFD
179.5    0.99722       0.00873    –0.99996    H'011E    H'8002      H'011C    H'8002
–40      –0.22222      –0.64279   0.76604     H'ADB9    H'620D      H'ADBB    H'620F
–75      –0.41667      –0.96593   0.25882     H'845D    H'2121      H'845D    H'2121
–137     –0.76111      –0.681     –0.73135    H'A8B4    H'A263      H'A8B5    H'A263
–180     –1            0          –1          H'0000    H'8000      H'0002    H'8001
Flowchart
Start
(1-1) Transfer INPUT address to register R4
(1-2) Transfer WORK address to register R5
(1-3) Transfer TABLE_SIN address to register R6
(1-4) Transfer TABLE_COS address to register R7
(2-1) Load input value a
(2-2) Transfer H'FF80 to R5 address (WORK area)
(2-3) To determine sign, copy a and store value in register M1, load 0.5
(2-4) Calculate | | a | –0.5 |
(2-5) Calculate | | | a | –0.5 | –0.5 | to obtain a', load H'FF80 from address R5
(3-1) Obtain logical product of a' and H'FF80, calculate upper 9 bits (n/256) of a'
(3-2) Convert n/256 fixed-point data to integer data by shifting n/256 6 bits to the right
(3-3) Transfer integer data n obtained in (3-2) to R5 address (WORK area)
(3-4) Zero-extend integer data n passed to CPU unit via R5 address to long-word size, set Y
      index register R9
(4-1) Transfer H'007F to R5 address (WORK area)
(4-2) Load H'007F from R5 address
(4-3) Obtain logical product of a' and H'007F, calculate lower seven bits (∆a') of a'
(5-1) Calculate 4∆a' by shifting the ∆a' value obtained in (4-3) 2 bits to the left; load π/4
(5-2) Multiply 4∆a' and π/4 to calculate ∆X
(6-1) Square the ∆X value obtained in (5-2) to obtain ∆X²; load sin(n × π/256) from data table
      in YRAM
(6-2) Shift the ∆X² value obtained in (6-1) 1 bit to the right to obtain ∆X²/2; load –1 from R4
      address
(6-3) Subtract the ∆X²/2 value obtained in (6-2) from the –1 loaded in (6-2) to calculate
      1 – ∆X²/2; load cos(n × π/256) from data table
(7-1) Set operation result status (set using DC bit in register DSR) to overflow mode
(7-2) Multiply ∆X value obtained in (5-2) and cos(n × π/256) value loaded in (6-3)
(7-3) Multiply sin(n × π/256) value obtained in (6-1) and (1 – ∆X²/2) value obtained in (6-3)
(7-4) Add operation results from (7-2) and (7-3) to calculate sin(X)
(7-5) Did (7-4) operation overflow? Yes: go to (7-6). No: go to (8-1).
(7-6) Decrement sin(X) value obtained in (7-4)
(8-1) Copy input value a from register M1 to register X1
(8-2) Set operation result status (set using DC bit in register DSR) to carry/borrow mode
(8-3) Shift by 1 bit input value a stored in register X1 in (8-1)
(8-4) Is the sign bit of a 1 (a < 0)? Yes: go to (8-5). No: go to (8-6).
(8-5) Reverse the sign of the sin(X) value obtained in (7-4)
(8-6) Transfer the OUTPUT address to register R6
(8-7) Store sin(X) at the R6 address (OUTPUT)
(9-1) Set operation result status (set using DC bit in register DSR) to overflow mode
(9-2) Multiply ∆X value obtained in (5-2) and sin(n × π/256) value loaded in (6-1)
(9-3) Multiply 1 – ∆X²/2 and cos(n × π/256) values obtained in (6-3)
(9-4) Add operation results from (9-2) and (9-3) to calculate cos(X)
(9-5) Did (9-4) operation overflow? Yes: go to (9-6). No: go to (10-1).
(9-6) Clear cos(X) value obtained in (9-4) to 0
(10-1) Transfer the DAT address to register R4
(10-2) Load 0.5 from R4 address
(10-3) Calculate absolute value of input value a stored in register M1 to obtain | a |
(10-4) Set operation result status (set using DC bit in register DSR) to negative value mode
(10-5) Is value of | a | greater than 0.5 (| a | > 0.5)? Yes: go to (10-6). No: go to (10-7).
(10-6) Reverse the sign of the cos(X) value obtained in (9-4)
(10-7) Store cos(X) at the R6 address (OUTPUT+2)
End
Main Program
;*******************************************************************************************
;* Trigonometric function routine
;*
;* sinX,cosX
;*
;*******************************************************************************************
;*******************************************************************************************
;* Initial setting routine
;*******************************************************************************************
MAIN:
        MOV.L   #INPUT,R4
        MOV.L   #WORK,R5
        MOV.L   #TABLE_SIN,R6
        MOV.L   #TABLE_COS,R7
;*******************************************************************************************
;* a' calculation routine
;*******************************************************************************************
        MOVX.W  @R4,X0                          ;a load
        MOV.L   #H'FF80,R0                      ;For extracting upper 9 bits of a' (n/256)
        MOV.W   R0,@R5
        MOV.L   #DAT,R4
        PCOPY   X0,M1   MOVX.W @R4+,X1          ;Copy a to M1 for determining sign, load 0.5
        PCOPY   X1,Y1
        PSUB    X0,Y1,M0
        PABS    M0,A0                           ;||a|-0.5|
        PSUB    A0,Y1,M0                        ;|||a|-0.5|-0.5|
        PABS    M0,M0   MOVX.W @R5,X0           ;M0 = a', #H'FF80 load
;*******************************************************************************************
;* n calculation, R6 setting routine
;*******************************************************************************************
        PAND    X0,M0,A0                        ;A0 = n/256
        PSHA    #-6,A0                          ;Convert fixed-point n to integer n
        MOVX.W  A0,@R5                          ;Pass integer n to CPU unit
        MOV.W   @R5,R1
        EXTU.W  R1,R1                           ;
        MOV.L   R1,R9                           ;
;*******************************************************************************************
;* ∆a' calculation routine
;*******************************************************************************************
        MOV.L   #H'007F,R0                      ;For extracting lower 7 bits of a' (∆a')
        MOV.W   R0,@R5
        MOVX.W  @R5,X1                          ;#H'007F load
        PAND    X1,M0,Y1                        ;∆a'
;*******************************************************************************************
;* ∆X calculation routine
;*******************************************************************************************
        PSHA    #2,Y1   MOVX.W @R4+,X1          ;4∆a', π/4 load
        PMULS   X1,Y1,A1                        ;∆a'×π
;*******************************************************************************************
;* 1 – (∆X2)/2 calculation, sin(n × π/256) and cos(n × π/256) loading routine
;*******************************************************************************************
        PCOPY   A1,X0   MOVY.W @R6+R9,Y0                        ;copy, dummy load
        PMULS   A1,X0,M0        MOVY.W @R6,Y0                   ;∆X2, sin(n×π/256) load
        PSHA    #-1,M0  MOVX.W @R4,X1   MOVY.W @R7+R9,Y1        ;∆X2/2, -1 load, dummy load
        PSUB    X1,M0,A1        MOVY.W @R7,Y1                   ;1-∆X2/2, cos(n×π/256) load
;*******************************************************************************************
;* sin(X) calculation routine
;*******************************************************************************************
        MOV.L   #H'6,R0
        LDS     R0,DSR                          ;Set overflow mode
        PMULS   X0,Y1,M0                        ;∆X·cos(n×π/256)
        PMULS   A1,Y0,A0                        ;(1-(∆X2)/2)·sin(n×π/256)
        PABS    A0,A0
        PADD    A0,M0,A0                        ;A0 = sin(X)
        DCT PDEC A0,A0                          ;If overflow occurs, sin(X) – 1
;*******************************************************************************************
;* sin(X) sign processing and storing routine
;*******************************************************************************************
        PCOPY   M1,X1
        MOV.L   #H'0,R0
        LDS     R0,DSR                          ;Carry/borrow mode
        PSHA    #1,X1
        DCT PNEG A0,A0                          ;If a < 0, reverse sign
        MOV.L   #OUTPUT,R6
        MOVY.W  A0,@R6+                         ;Store sin(X)
;*******************************************************************************************
;* cos(X) calculation routine
;*******************************************************************************************
        MOV.L   #H'6,R0
        LDS     R0,DSR                          ;Set overflow mode
        PMULS   X0,Y0,M0                        ;∆X·sin(n×π/256)
        PMULS   A1,Y1,A0                        ;(1-(∆X·∆X)/2)·cos(n×π/256)
        PABS    A0,A0
        PSUB    A0,M0,A0
        DCT PCLR A0                             ;If overflow occurs, clear cos(X) to 0
;*******************************************************************************************
;* cos(X) sign processing and storing routine
;*******************************************************************************************
        MOV.L   #DAT,R4
        MOVX.W  @R4,X0                          ;0.5 load
        PABS    M1,M1                           ;|a|
        MOV.L   #H'2,R0
        LDS     R0,DSR                          ;Set negative value mode
        PCMP    X0,M1
        DCT PNEG A0,A0                          ;If | a | > 0.5, reverse sign
        MOVY.W  A0,@R6
EXIT:
        BRA     EXIT
        NOP
MAIN_E: NOP
Data
;*******************************************************************************************
;* Trigonometric function data routine
;*******************************************************************************************
        .SECTION XRAM,DATA,LOCATE=H'1000FF00
INPUT:
        .RES.W  1                               ;External input data storage area
WORK:
        .RES.W  1
DAT:
        .XDATA.W 0.5,0.78540,-1                 ;For calculating a', π/4, and (1 – ∆X2/2)
        .SECTION YRAM,DATA,LOCATE=H'1001F800
TABLE_SIN:
        .XDATA.W 0,0.01227,0.02454,0.03681,0.04907,0.06132      ;N/0 - 5
        .XDATA.W 0.07356,0.08580,0.09802,0.11022,0.12241        ;N/6 - 10
        .XDATA.W 0.13458,0.14673,0.15886,0.17096,0.18304        ;N/11 - 15
        .XDATA.W 0.19509,0.20711,0.21910,0.23106,0.24298        ;N/16 - 20
        .XDATA.W 0.25487,0.26671,0.27852,0.29028,0.30201        ;N/21 - 25
        .XDATA.W 0.31368,0.32531,0.33689,0.34842,0.35990        ;N/26 - 30
        .XDATA.W 0.37132,0.38268,0.39400,0.40524,0.41643        ;N/31 - 35
        .XDATA.W 0.42756,0.43862,0.44961,0.46054,0.47140        ;N/36 - 40
        .XDATA.W 0.48218,0.49290,0.50354,0.51410,0.52459        ;N/41 - 45
        .XDATA.W 0.53500,0.54532,0.55557,0.56573,0.57581        ;N/46 - 50
        .XDATA.W 0.58580,0.59570,0.60551,0.61523,0.62486        ;N/51 - 55
        .XDATA.W 0.63439,0.64383,0.65317,0.66242,0.67156        ;N/56 - 60
        .XDATA.W 0.68060,0.68954,0.69838,0.70711,0.71573        ;N/61 - 65
        .XDATA.W 0.72425,0.73265,0.74095,0.74914,0.75721        ;N/66 - 70
        .XDATA.W 0.76517,0.77301,0.78074,0.78835,0.79584        ;N/71 - 75
        .XDATA.W 0.80321,0.81046,0.81758,0.82459,0.83147        ;N/76 - 80
        .XDATA.W 0.83822,0.84485,0.85136,0.85773,0.86397        ;N/81 - 85
        .XDATA.W 0.87009,0.87607,0.88192,0.88764,0.89322        ;N/86 - 90
        .XDATA.W 0.89867,0.90399,0.90917,0.91421,0.91911        ;N/91 - 95
        .XDATA.W 0.92388,0.92851,0.93299,0.93734,0.94154        ;N/96 - 100
        .XDATA.W 0.94561,0.94953,0.95331,0.95694,0.96043        ;N/101 - 105
        .XDATA.W 0.96378,0.96700,0.97003,0.97294,0.97570        ;N/106 - 110
        .XDATA.W 0.97832,0.98079,0.98311,0.98528,0.98730        ;N/111 - 115
        .XDATA.W 0.98918,0.99090,0.99248,0.99391,0.99518        ;N/116 - 120
        .XDATA.W 0.99631,0.99729,0.99812,0.99880,0.99932        ;N/121 - 125
        .XDATA.W 0.99970,0.99992,1                              ;N/126 - 128
TABLE_COS:
        .XDATA.W 1,0.99992,0.99970,0.99932,0.99880,0.99812      ;N/0 - 5
        .XDATA.W 0.99729,0.99631,0.99518,0.99391,0.99248        ;N/6 - 10
        .XDATA.W 0.99090,0.98918,0.98730,0.98528,0.98311        ;N/11 - 15
        .XDATA.W 0.98079,0.97832,0.97570,0.97294,0.97003        ;N/16 - 20
        .XDATA.W 0.96700,0.96378,0.96043,0.95694,0.95331        ;N/21 - 25
        .XDATA.W 0.94953,0.94561,0.94154,0.93734,0.93299        ;N/26 - 30
        .XDATA.W 0.92851,0.92388,0.91911,0.91421,0.90917        ;N/31 - 35
        .XDATA.W 0.90399,0.89867,0.89322,0.88764,0.88192        ;N/36 - 40
        .XDATA.W 0.87607,0.87009,0.86397,0.85773,0.85136        ;N/41 - 45
        .XDATA.W 0.84485,0.83822,0.83147,0.82459,0.81758        ;N/46 - 50
        .XDATA.W 0.81046,0.80321,0.79584,0.78835,0.78074        ;N/51 - 55
        .XDATA.W 0.77301,0.76517,0.75721,0.74914,0.74095        ;N/56 - 60
        .XDATA.W 0.73265,0.72425,0.71573,0.70711,0.69838        ;N/61 - 65
        .XDATA.W 0.68954,0.68060,0.67156,0.66242,0.65317        ;N/66 - 70
        .XDATA.W 0.64383,0.63439,0.62486,0.61523,0.60551        ;N/71 - 75
        .XDATA.W 0.59570,0.58580,0.57581,0.56573,0.55557        ;N/76 - 80
        .XDATA.W 0.54532,0.53500,0.52459,0.51410,0.50354        ;N/81 - 85
        .XDATA.W 0.49290,0.48218,0.47140,0.46054,0.44961        ;N/86 - 90
        .XDATA.W 0.43862,0.42756,0.41643,0.40524,0.39400        ;N/91 - 95
        .XDATA.W 0.38268,0.37132,0.35990,0.34842,0.33689        ;N/96 - 100
        .XDATA.W 0.32531,0.31368,0.30201,0.29028,0.27852        ;N/101 - 105
        .XDATA.W 0.26671,0.25487,0.24298,0.23106,0.21910        ;N/106 - 110
        .XDATA.W 0.20711,0.19509,0.18304,0.17096,0.15886        ;N/111 - 115
        .XDATA.W 0.14673,0.13458,0.12241,0.11022,0.09802        ;N/116 - 120
        .XDATA.W 0.08580,0.07356,0.06132,0.04907,0.03681        ;N/121 - 125
        .XDATA.W 0.02454,0.01227,0                              ;N/126 - 128
OUTPUT:
        .RES.W  2                               ;External output data storage area
Section 9 Matrix Operations
Overview
Matrix A (3, 3) and matrix B (3, 3) are multiplied to obtain a 32-bit precision matrix product C (3,
3). Matrixes A and B are set in XRAM and YRAM beforehand. Matrix product C is stored
beginning at YRAM address H'1001FF12 (MATRIXC).
Description
1. Method of Expressing Matrixes
Figure 9.1 shows matrix A (m,n). The element aij is a component of matrix A. Horizontal lines
of components are called rows, which are numbered from the top as row 1, row 2, row 3, ...,
row i, and so on. Vertical lines of components are called columns, which are numbered from
the left as column 1, column 2, column 3, ..., column j, and so on. The component at the
position where row i and column j intersect is called component (i,j). Component (i,j) of
matrix A (m,n) is expressed as ai,j.
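$$A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2n} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ij} & \cdots & a_{in} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mj} & \cdots & a_{mn}
\end{pmatrix}$$
(The row containing ai1 ... ain is row i; the column containing a1j ... amj is column j.)
Figure 9.1 Matrix A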
2. Method of Calculating Matrix Product
Figure 9.2 shows the expression of the components of matrix A × matrix B = matrix product C.
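$$\underbrace{\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}}_{\text{Matrix A}}
\times
\underbrace{\begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{pmatrix}}_{\text{Matrix B}}
=
\underbrace{\begin{pmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{pmatrix}}_{\text{Matrix Product C *1}}$$
*1 ci,j: 32-bit components.
Figure 9.2 Expression of Components of Matrix A × Matrix B = Matrix Product C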
The components cn,m of matrix product C are obtained using the following equation.
$$c_{n,m} = \sum_{i=1}^{3} a_{n,i} \times b_{i,m}$$
That is, each component cn,m is obtained by performing a sum of products calculation on row
components an,i of matrix A and column components bi,m of matrix B.
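The sum of products can be written out directly in Python. This is a floating-point sketch for
illustration only; the function name matrix_product_3x3 is hypothetical, and the values are
the ones used in the data section of the example program, arranged as ordinary row lists.

```python
def matrix_product_3x3(a, b):
    """Return C = A x B for 3 x 3 matrices given as lists of rows."""
    return [
        [sum(a[n][i] * b[i][m] for i in range(3)) for m in range(3)]
        for n in range(3)
    ]


A = [[0.5, 0.125, 0.5],
     [0.125, 0.5, 0.125],
     [0.5, 0.125, 0.5]]
B = [[0.25, 0.0625, 0.25],
     [0.0625, 0.25, 0.0625],
     [0.25, 0.0625, 0.25]]

if __name__ == "__main__":
    for row in matrix_product_3x3(A, B):
        print(row)
```

For these values, c1,1 = 0.5 × 0.25 + 0.125 × 0.0625 + 0.5 × 0.25 = 0.2578125.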
3. Method of Storing Matrix A, Matrix B, and Matrix Product C Components
Each component cn,m is a sum of products of one row of matrix A and one column of matrix B.
To increase the processing speed, the example subroutine therefore stores matrix A row by
row in XRAM and matrix B column by column in YRAM, as shown in figure 9.3.
Matrix A (XRAM)                         Matrix B (YRAM)
Row   Address        Data               Column   Address        Data
A1    #MATRIXA       a1,1               B1       #MATRIXB       b1,1
      #MATRIXA+2     a1,2                        #MATRIXB+2     b2,1
      #MATRIXA+4     a1,3                        #MATRIXB+4     b3,1
A2    #MATRIXA+6     a2,1               B2       #MATRIXB+6     b1,2
      #MATRIXA+8     a2,2                        #MATRIXB+8     b2,2
      #MATRIXA+A     a2,3                        #MATRIXB+A     b3,2
A3    #MATRIXA+C     a3,1               B3       #MATRIXB+C     b1,3
      #MATRIXA+E     a3,2                        #MATRIXB+E     b2,3
      #MATRIXA+10    a3,3                        #MATRIXB+10    b3,3

Matrix Product C (YRAM) *1
Row   Address        Data
C1    #MATRIXC       CH1,1
      #MATRIXC+2     CL1,1
      #MATRIXC+4     CH1,2
      #MATRIXC+6     CL1,2
      #MATRIXC+8     CH1,3
      #MATRIXC+A     CL1,3
C2    #MATRIXC+C     CH2,1
      #MATRIXC+E     CL2,1
      #MATRIXC+10    CH2,2
      #MATRIXC+12    CL2,2
      #MATRIXC+14    CH2,3
      #MATRIXC+16    CL2,3
C3    #MATRIXC+18    CH3,1
      #MATRIXC+1A    CL3,1
      #MATRIXC+1C    CH3,2
      #MATRIXC+1E    CL3,2
      #MATRIXC+20    CH3,3
      #MATRIXC+22    CL3,3
*1 CHi,j: Upper 16 bits of Ci,j
   CLi,j: Lower 16 bits of Ci,j
Figure 9.3 Memory Map with Matrix A, Matrix B, and Matrix Product C
Components Stored
4. Algorithm for Calculating Matrix Product C
Figure 9.4 shows the algorithm for calculating matrix product C. The details of the algorithm
are described below.
(1) Clear the counter registers, set the address of matrix A in the X address register (R4)
and the address of matrix B in the Y address register (R6), and set the address for storing
the components of matrix product C in the Y address register (R7).
(2) Perform sum of products calculation on row components an,i of matrix A and column
components bi,m of matrix B.
(3) Store CHn,m (upper 16 bits of matrix product Cn,m) in MATRIXC+2n address and
CLn,m (lower 16 bits) in MATRIXC+2n+2 address.
(4) Return matrix A column components to first column.
(5) Determine if one row of matrix product Cn,m has been calculated. If n is not 3, return to
process (2). If n is 3, move to process (6).
(6) Shift matrix A row components down one row.
(7) Determine if all three rows of matrix product C have been calculated. If n is not 3, return
to process (2). If n is 3, all of matrix product Cn,m has been calculated and processing
ends.
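The storage order and the loop structure of the algorithm can be modeled in Python as follows.
This is a sketch under assumptions, not the example program: components are assumed to be
16-bit values with 15 fractional bits, each product is assumed to be doubled to give a 32-bit
value with 31 fractional bits, and the names q15, MATRIX_A, MATRIX_B, and
matrix_product_words are hypothetical.

```python
def q15(value):
    """Convert a fraction in [-1, 1) to a 16-bit fixed-point integer (assumed format)."""
    return int(round(value * 32768))


# Matrix A stored row by row (XRAM), matrix B stored column by column (YRAM)
MATRIX_A = [q15(v) for v in (0.5, 0.125, 0.5, 0.125, 0.5, 0.125, 0.5, 0.125, 0.5)]
MATRIX_B = [q15(v) for v in (0.25, 0.0625, 0.25, 0.0625, 0.25, 0.0625, 0.25, 0.0625, 0.25)]


def matrix_product_words(matrix_a, matrix_b):
    """Return the 18 words CH1,1, CL1,1, ..., CH3,3, CL3,3 in storage order."""
    words = []
    for n in range(3):                  # row of matrix A
        for m in range(3):              # column of matrix B
            acc = 0
            for i in range(3):
                # 16-bit x 16-bit product, doubled to align the 32-bit fraction (assumption)
                acc += (matrix_a[3 * n + i] * matrix_b[3 * m + i]) << 1
            acc &= 0xFFFFFFFF           # keep 32 bits (two's complement)
            words.append(acc >> 16)     # CHn,m: upper 16 bits
            words.append(acc & 0xFFFF)  # CLn,m: lower 16 bits
    return words


if __name__ == "__main__":
    print([f"H'{w:04X}" for w in matrix_product_words(MATRIX_A, MATRIX_B)])
```

Because matrix B is stored column by column, both inner-loop indexes advance through
consecutive words, which mirrors the post-increment addressing the algorithm relies on.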
(1) Initial setting
(2) Sum of products calculation on row components an,i of matrix A and column components
    bi,m of matrix B:
    $$c_{n,m} = \sum_{i=1}^{3} a_{n,i} \times b_{i,m}$$
(3) Store CHn,m (upper 16 bits of matrix product Cn,m) in MATRIXC+2n address and CLn,m
    (lower 16 bits) in MATRIXC+2n+2 address
(4) Return matrix A column components to first column
(5) n = 3? No: return to (2). Yes: go to (6).
(6) Shift matrix A row components down one row
(7) n = 3? No: return to (2). Yes: End.
Figure 9.4 Algorithm for Calculating Matrix Product C
Flowchart
Start
(1) Clear register R10
    Clear register R12
    Transfer MATRIXA (H'1000FF00) address to register R4
    Transfer MATRIXB (H'1001FF00) address to register R6
    Transfer MATRIXC (H'1001FF12) address to register R7
(2) Use extended instruction REPEAT to set repeat start address (LOOP_S), repeat end address
    (LOOP_E), and number of repeats (3 times)
    Clear register M0
    Clear register A0
    Repeat the following steps the number of times indicated by the number of repeats setting
    (3 times in the case of the example program):
      After reading 1 component ai,j from matrix A, increment R4 address
      After reading 1 component bi,j from matrix B, increment R6 address
      Multiply matrix A component ai,j by matrix B component bi,j
      Add product of ai,j and bi,j to product from previous repeat; ci,j has been calculated
      once repeat operation finishes
(3) Shift matrix product ci,j obtained in process (2) 16 bits to the left
    Store upper 16 bits of matrix product ci,j (cHi,j) in MATRIXC+2n address
    Store lower 16 bits (cLi,j) in MATRIXC+2n+2 address
(4) Return matrix A column components to first column
    Calculation of 1 component of matrix product C is finished, so increment R12 counter
    register
(5) Is calculation of 1 row of matrix product C finished (R11 = R12)?
    No: return to (2). Yes: go to (6).
(6) Clear register R12 (clear counter)
    Shift matrix A row components down one row
    Calculation of 1 row of matrix product C is finished, so increment R10 counter register
(7) Is calculation of 3 rows of matrix product C finished (R13 = R10)?
    No: return to (2). Yes: End.
Main Program
matrix.src
;*******************************************************************************************
;* Matrix operation routine
;*
;* [A][B]=[C]
;*
;*******************************************************************************************
MAIN:
        MOV.L   #0,R10
        MOV.L   #0,R12
        MOV.L   #MATRIXA,R4
        MOV.L   #MATRIXB,R6
        MOV.L   #MATRIXC,R7
;****************************************
;Calculate all components/R10, R13
;****************************************
        MOV.L   #3,R13          ;Set repeat value (number of rows)
MATORIX:
;**********************************
;Calculate row components of n'th row
;**********************************
        MOV.L   #3,R11          ;Set repeat value (number of columns)
RETSU:
;****************************
;Calculate 1 component
;****************************
        BSR     SEIBUN
        NOP
        BSR     STORE
        NOP
;****************************
        ADD     #-6,R4          ;Return address to first column of row i of matrix A
        ADD     #1,R12          ;Increment counter each time 1 component of 1 row of
                                ;matrix product C is calculated
        CMP/EQ  R11,R12         ;Is sum of products calculation for 1 row of matrix
                                ;product C finished?
        BF      RETSU
        MOV.L   #0,R12          ;Clear counter
;**********************************
        ADD     #6,R4
        MOV.L   #MATRIXB,R6
        ADD     #1,R10          ;Increment counter when sum of products calculation for
                                ;1 row of matrix product C is finished
        CMP/EQ  R13,R10         ;Is sum of products calculation for last row of matrix
                                ;product C finished?
        BF      MATORIX
;****************************************
EXIT:
        BRA     EXIT
        NOP
;*******************************************************************************************
;Matrix C 1 component calculation routine
;*******************************************************************************************
SEIBUN:
        REPEAT  LOOP_S,LOOP_E,#3                        ;Number of rows in matrix [A] is
                                                        ;number of repeats
        PCLR    M0                                      ;Clear for repeat
        PCLR    A0
LOOP_S: MOVX.W  @R4+,X0 MOVY.W @R6+,Y0                  ;aij,bij load
        PMULS   X0,Y0,M0
LOOP_E: PADD    A0,M0,A0
        RTS
        NOP
;*******************************************************************************************
;Matrix C 1 component storage routine
;*******************************************************************************************
STORE:  PSHA    #16,A0  MOVY.W A0,@R7+                  ;Store upper bits of ci,j
        MOVY.W  A0,@R7+                                 ;Store lower bits of ci,j
        RTS
        NOP
;***********************
MAIN_E: NOP
Data
;*********************************************************************************
;* Matrix operation data (XRAM/YRAM)
;*********************************************************************************
        .SECTION XRAM,DATA,LOCATE=H'1000FF00
MATRIXA:
        .XDATA.W 0.5,0.125,0.5,0.125,0.5,0.125,0.5,0.125,0.5
        .SECTION YRAM,DATA,LOCATE=H'1001FF00
MATRIXB:
        .XDATA.W 0.25,0.0625,0.25,0.0625,0.25,0.0625,0.25,0.0625,0.25
MATRIXC:
        .RES.W  18
Section 10 Inner Product
Overview
The inner product (32-bit precision) of two non-zero n-dimensional space vectors, a (16-bit
components) and b (16-bit components), is calculated. The n-dimensional space vectors a and b
are set in XRAM and YRAM beforehand. The inner product of a and b is stored in YRAM at
address H'1001FF0A (IN_PRO).
Description
1. Method of Expressing Space Vectors
Figure 10.1 shows an expression of the components of n-dimensional space vector a. An
n-dimensional space vector can be thought of as a vector consisting of a group of n real numbers.
There are two ways of expressing the components of a vector: as a row vector and as a column
vector.
$$\text{(a) Row vector: } (a_1, a_2, \ldots, a_n) \qquad
\text{(b) Column vector: } \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}$$
*1 ai: 16-bit
Figure 10.1 Expression of Components of n-dimensional Space Vector a
2. Method of Calculating Inner Product
Figure 10.2 shows an expression of the components of the inner product of n-dimensional
space vectors a and b. Here the inner product of vectors a and b is expressed as (a,b).
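$$\underbrace{(a_1, a_2, \ldots, a_i, \ldots, a_n)}_{\text{Row vector } a \text{ *1}}
\times
\underbrace{\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_i \\ \vdots \\ b_n \end{pmatrix}}_{\text{Column vector } b \text{ *1}}
= \underbrace{a_1 b_1 + a_2 b_2 + \cdots + a_i b_i + \cdots + a_n b_n}_{\text{*2}}$$
(a and b are n-dimensional space vectors.)
*1 ai: 16-bit
   bi: 16-bit
*2 32-bit
Figure 10.2 Expression of Components of Inner Product of n-dimensional Space
Vectors a and b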
The inner product (a,b) is obtained using the following equation.
$$(a,b) = \sum_{i=1}^{n} a_i b_i$$
Using the above equation, the inner product (a,b) is obtained by performing a sum of products
calculation on components ai of space vector a and components bi of space vector b.
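As an illustration, the sum of products and the split of the 32-bit result into upper and lower
16-bit words can be sketched in Python. The names q15 and inner_product_words are
hypothetical, and the number format (16-bit components with 15 fractional bits, products
doubled to give 31 fractional bits) is an assumption rather than something stated above.

```python
def q15(value):
    """Convert a fraction in [-1, 1) to a 16-bit fixed-point integer (assumed format)."""
    return int(round(value * 32768))


def inner_product_words(vector_a, vector_b):
    """Return ((a,b)H, (a,b)L): upper and lower 16 bits of the 32-bit inner product."""
    acc = 0
    for ai, bi in zip(vector_a, vector_b):
        acc += (ai * bi) << 1        # doubled 16 x 16 product (assumed alignment)
    acc &= 0xFFFFFFFF                # keep 32 bits (two's complement)
    return acc >> 16, acc & 0xFFFF


if __name__ == "__main__":
    a = [q15(v) for v in (0.5, 0.125, 0.5)]
    b = [q15(v) for v in (0.25, 0.0625, 0.25)]
    high, low = inner_product_words(a, b)
    print(f"H'{high:04X} H'{low:04X}")
```

With the three-dimensional example data (0.5, 0.125, 0.5) and (0.25, 0.0625, 0.25), the inner
product is 0.2578125.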
3. Method of Storing Inner Product (a,b) of n-dimensional Space Vectors a and b
Figure 10.3 shows the method of storing the inner product (a,b) components of n-dimensional
space vectors a and b, which are set in XRAM and YRAM.
XRAM                              YRAM
Address          Data             Address          Data
VECTORA          a1               VECTORB          b1
VECTORA+2        a2               VECTORB+2        b2
VECTORA+4        a3               VECTORB+4        b3
:                :                :                :
VECTORA+2n–4     an–1             VECTORB+2n–4     bn–1
VECTORA+2n–2     an               VECTORB+2n–2     bn

YRAM *1
Address          Data
#IN_PRO          (a,b)H
#IN_PRO+2        (a,b)L
*1 (a,b)H: Upper 16 bits of (a,b)
   (a,b)L: Lower 16 bits of (a,b)
Figure 10.3 Method of Storing Inner Product (a,b) of n-dimensional
Space Vectors a and b
4. Algorithm for Calculating Inner Product
Figure 10.4 shows the algorithm for calculating the inner product (a,b). The details of the
algorithm are described below.
(1) Set the addresses where the space vector a and b components are stored as well as the
address for storing the inner product of a and b in X address register (R4) and Y address
registers (R6, R7).
(2) Perform a sum of products calculation on components ai of space vector a and components
bi of space vector b.
(3) Store (a,b)H, the upper 16 bits of inner product (a,b) at the IN_PRO address and (a,b)L,
the lower 16 bits of inner product (a,b), at the IN_PRO+2 address. This completes the
process.
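The example program repeats its loop n + 2 times and pads each vector with two zeros. One way
to read this is as a pipelined loop in which the accumulate, multiply, and load operations of a
single pass each use the values left by the previous pass. The Python sketch below models that
reading in floating point; the function name pipelined_inner_product is hypothetical, and the
pipelining itself is an interpretation of the listing rather than a statement from the text.

```python
def pipelined_inner_product(vector_a, vector_b, repeats):
    """Model a loop whose add, multiply, and load steps run in parallel in each pass."""
    acc = prod = x = y = 0.0          # accumulator, product, and operand registers cleared
    for k in range(repeats):
        # The right-hand side is evaluated first, so every operation sees the old values.
        acc, prod, x, y = acc + prod, x * y, vector_a[k], vector_b[k]
    return acc


if __name__ == "__main__":
    a = [0.5, 0.125, 0.5, 0, 0]       # three components plus two zeros of padding
    b = [0.25, 0.0625, 0.25, 0, 0]
    print(pipelined_inner_product(a, b, 5))   # all three products accumulated
    print(pipelined_inner_product(a, b, 3))   # only the first product has arrived
```

In this model a component loaded in one pass is multiplied in the next and accumulated in the
pass after that, so the last product needs two extra passes to reach the accumulator.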
(1) Initial setting
(2) Sum of products calculation on components ai of space vector a and components bi of
    space vector b:
    $$(a,b) = \sum_{i=1}^{n} (a_i \times b_i)$$
(3) Store (a,b)H, the upper 16 bits of inner product (a,b), at the IN_PRO address and (a,b)L,
    the lower 16 bits of inner product (a,b), at the IN_PRO+2 address
End
Figure 10.4 Algorithm for Calculating Inner Product
Flowchart
Start
(1) Initial setting
    (1-1) Transfer VECTORA (H'1000FF00) address to register R4
    (1-2) Transfer VECTORB (H'1001FF00) address to register R6
    (1-3) Transfer IN_PRO (H'1001FF0A) address to register R7
(2) Sum of products calculation
    (2-1) Use extended instruction REPEAT to set repeat start address (LOOP_S), repeat end address (LOOP_E), and number of repeats (n + 2 times)
    (2-2) Clear register M0
    (2-3) Clear register A0
    (2-4) After reading 1 component ai of vector a from XRAM, increment R4 address
          After reading 1 component bi of vector b from YRAM, increment R6 address
          Multiply ai by bi
          Calculate $a_i b_i + \sum_{j=1}^{i-1} a_j b_j$
(3) Storage of inner product
    (3-1) Shift obtained inner product (a,b) 16 bits to the left to obtain (a,b)L
          Store (a,b)H, the upper 16 bits of inner product (a,b), at IN_PRO address, increment IN_PRO address
    (3-2) Store (a,b)L, the lower 16 bits of inner product (a,b), at IN_PRO+2 address
End
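The repeat count of n + 2 in step (2-1) can be illustrated with a small Python model of the loop. The sketch assumes that the four operations of one repeat (add, multiply, and the two loads) all read the register values that were present before that repeat started, so a component loaded in one repeat is multiplied in the next and accumulated in the one after. The function name pipelined_mac and the use of floating-point values are assumptions made for this illustration.

```python
def pipelined_mac(a, b, repeats):
    """Model of a loop whose add, multiply and loads run in parallel."""
    x0 = y0 = m0 = a0 = 0.0            # registers cleared before the loop
    for k in range(repeats):
        # Every operation uses the register values from before this repeat.
        new_a0 = a0 + m0               # accumulate the previous product
        new_m0 = x0 * y0               # multiply the previously loaded pair
        new_x0 = a[k]                  # load next component of vector a
        new_y0 = b[k]                  # load next component of vector b
        a0, m0, x0, y0 = new_a0, new_m0, new_x0, new_y0
    return a0


# Three components followed by two zeros, as in the example data section.
a = [0.5, 0.125, 0.5, 0.0, 0.0]
b = [0.25, 0.0625, 0.25, 0.0, 0.0]

full = pipelined_mac(a, b, 5)    # n + 2 repeats: all three products are accumulated
short = pipelined_mac(a, b, 3)   # n repeats: only the first product has reached A0
```

Under this model the two trailing zeros in each vector give the last product time to pass through the multiply and add stages without adding anything further to the sum.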
Main Program
This program calculates the inner product for the three-dimensional space vector {ai, bi (i = 1, 2,
3)}.
in_pro.src
;*******************************************************
;*      Inner product calculation routine
;*
;*      (a,b)=a1b1+a2b2+a3b3
;*
;*******************************************************
;*******************************************************
;*      Initial setting routine
;*******************************************************
MAIN:
            MOV.L   #VECTORA,R4
            MOV.L   #VECTORB,R6
            MOV.L   #IN_PRO,R7
;*******************************************************
;*      Sum of products calculation routine
;*******************************************************
            REPEAT  LOOP_S,LOOP_S,#5        ;Number of components in vector a + 2 is number of repeats
            PCLR    A0
            PCLR    M0
            PCLR    X0
            PCLR    Y0
LOOP_S:     PADD    A0,M0,A0  PMULS X0,Y0,M0  MOVX.W @R4+,X0  MOVY.W @R6+,Y0  ;ai,bi load
;*******************************************************
;*      Inner product storage routine
;*******************************************************
STORE:
            PSHA    #16,A0  MOVY.W A0,@R7+  ;Store upper bits of inner product
            MOVY.W  A0,@R7                  ;Store lower bits of inner product
EXIT:
            BRA     EXIT
            NOP
MAIN_E:     NOP
Data
;*****************************************************************
;*      Inner product calculation data (XRAM/YRAM)
;*****************************************************************
            .SECTION XRAM,DATA,LOCATE=H'1000FF00
VECTORA:    .XDATA.W    0.5,0.125,0.5,0,0
            .SECTION YRAM,DATA,LOCATE=H'1001FF00
VECTORB:    .XDATA.W    0.25,0.0625,0.25,0,0
IN_PRO:     .RES.W      2
Section 11 Square Root
Overview
A 16-bit fixed-point square root calculation is performed and a square root with 15-bit precision is
obtained.
Description
1. I/O Value Data Format
Figure 11.1 shows the data format for I/O values. The value, X, whose square root is to be
determined is input in 16-bit format with its uppermost bit set to 0. However, it is also
necessary to perform normalization on X before calculating the square root.
The square root, √X, is output in 16-bit (1 word) format with the uppermost bit set to 0.
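[Data format diagram. Input value X, whose square root is to be determined: 16-bit word, bits 15 to 0, with bit 15 set to 0. Output value, square root √X: 16-bit word, bits 15 to 0, with bit 15 set to 0. The diagram also marks the decimal point position.]
Figure 11.1 I/O Value Data Format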
2. Method of Calculating Square Root
Figure 11.2 illustrates the square root function. The example program first calculates an approximate value for the square root of X using a polyline (piecewise-linear) graph of the sort shown in figure 11.2. Next, a recurrence formula, referred to in this note as the gradualization equation, is used to converge on a more accurate value. This is the method used to calculate the square root, √X.
Once normalization is performed on X, the range that can be taken by X, the value whose square root is to be calculated, is as follows.
0 ≤ X < 1.0
(H'0000 ≤ X ≤ H'7FFF)
In the square root function shown in figure 11.2, the polyline graph combines a comparatively gentle segment for X greater than 0.1 with a steep segment for X less than or equal to 0.1. These segments correspond to approximation equations (1) and (2). Using these two equations, an approximate square root value (y0) is obtained.
[Graph. Vertical axis: approximate value y0 (0 to 1.0). Horizontal axis: value whose square root is to be determined, X (0 to 1.0). The graph shows the two line segments y0 = 3.16228 × X (for X up to 0.1) and y0 = 0.58579 × X + 0.41422 (intercept 0.41422), together with the points √0.5, √0.7, and √1.0.]
Figure 11.2 Square Root Function
Input value X > 0.1:
    $y_0 = 0.58579 \times X + 0.41422$ ---- (1)
Input value X ≤ 0.1:
    $y_0 = 3.16228 \times X$ ---- (2)
Note that equation (2) cannot be used without modification for fixed-point calculation, because the coefficient 3.16228 lies outside the range of values below 1.0 that the data format can express. Therefore, normalization is performed and the actual program uses $y_0 = 0.79057 \times X \times 2^2$.
Next, the value y0 obtained with approximation equation (1) or (2) is assigned to gradualization equation (3) to obtain a more accurate square root value, √X.
    $y = \frac{1}{2}\left(y_0 + \frac{X}{y_0}\right) \approx \sqrt{X}$ ---- (3)
When equation (3) is applied repeatedly, the result y is used as the next y0. The second term of equation (3), X/y0, must itself be a normalized value (less than 1.0). Since X has been normalized, this requires y0 > X after the calculation of equation (1) or (2). In the sample program gradualization equation (3) is performed three times, resulting in a square root value with 15-bit precision.
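The two-stage method can be tried out with the following Python sketch. It uses ordinary floating-point arithmetic rather than the 16-bit fixed-point format of the sample program, so it only illustrates the shape of the calculation; the function names initial_estimate and refine are hypothetical.

```python
def initial_estimate(x):
    """Approximate square root y0 from the polyline, equations (1) and (2)."""
    if x > 0.1:
        return 0.58579 * x + 0.41422   # equation (1)
    return 3.16228 * x                 # equation (2)


def refine(x, y0, iterations=3):
    """Apply gradualization equation (3) the given number of times."""
    y = y0
    for _ in range(iterations):
        y = 0.5 * (y + x / y)
    return y


x = 0.85
y0 = initial_estimate(x)
y = refine(x, y0)
print(x, y0, y, x ** 0.5)
```

In floating-point arithmetic the estimate improves quickly when y0 is already close to √X. For small inputs such as X = 0.01, where equation (2) gives a y0 far from √X, three iterations leave a visibly larger error.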
3. Algorithm for Fixed-point Square Root Calculation
The algorithm for fixed-point square root calculation is described below.
(1) Initial settings are performed.
(2) It is determined whether X, the value whose square root is to be calculated, is 0. If X is 0, the square root, √X, is given as 0 and processing ends.
(3) It is determined whether X, the value whose square root is to be calculated, is a negative
number. If X is a negative number, the square root, √X, is given as H'FFFF and processing
ends.
(4) X, the value whose square root is to be calculated, is compared to H'7FFB to determine
whether it is larger or smaller. If X > H'7FFB, the square root, √X, is given as √X(=X) and
processing ends.
(5) X, the value whose square root is to be calculated, is compared to 0.1 to determine
whether it is larger or smaller. If X > 0.1, processing continues with (6). If X ≤ 0.1,
processing continues with (6)'.
(6) Equation (1) is used to calculate approximate square root y0. Processing continues with
(7).
(6)' Equation (2) is used to calculate approximate square root y0. Processing continues with
(7).
(7) Approximate square root y0 is compared to X, the value whose square root is being
calculated, to determine whether it is larger or smaller. If y0 = X, approximate square root
y0 is divided by 2, 0.5 (H'4000) is added, the result is given as the square root, √X, and
processing ends.
(8) If the comparison in (7) shows that X, the value whose square root is being calculated, is greater than approximate square root y0, the division X/y0 of gradualization equation (3) is not performed. In this case the square root, √X, is given as H'FFFF and processing ends.
(9) Gradualization equation (3) is used to calculate square root value y, which is given as the
square root, √X, and processing ends.
Figure 11.3 shows the algorithm used for calculating the square root.
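The order of the checks in steps (2) to (9) can be followed in the Python sketch below. It is a simplified integer model, not a translation of the sample program: it assumes that a 16-bit word w represents the value w/32768 and that products and quotients are truncated, and the names sqrt_q15, K1, K2, K3, and LIMIT_0_1 are hypothetical. For step (4) the sketch returns H'7FFF, which is the output shown in table 11.1 for input H'7FFC.

```python
K1 = int(round(0.58579 * 32768))         # slope of equation (1)
K2 = int(round(0.41422 * 32768))         # intercept of equation (1)
K3 = int(round(0.79057 * 32768))         # normalized slope of equation (2)
LIMIT_0_1 = int(round(0.1 * 32768))      # threshold 0.1


def sqrt_q15(x, iterations=3):
    """x is a 16-bit word (0..0xFFFF); returns the 16-bit result word."""
    if x == 0:                           # step (2)
        return 0x0000
    if x & 0x8000:                       # step (3): negative input
        return 0xFFFF
    if x > 0x7FFB:                       # step (4): assumed to saturate to H'7FFF
        return 0x7FFF
    if x > LIMIT_0_1:                    # step (5)
        y0 = ((K1 * x) >> 15) + K2       # step (6): equation (1)
    else:
        y0 = ((K3 * x) >> 15) << 2       # step (6)': 0.79057 * X * 4
    if y0 == x:                          # step (7): y0/2 + 0.5
        return (y0 >> 1) + 0x4000
    if x > y0:                           # step (8): X/y0 would not be normalized
        return 0xFFFF
    y = y0
    for _ in range(iterations):          # step (9): equation (3)
        y = (y + (x << 15) // y) >> 1
    return y


for word in (0x0000, 0xA667, 0x7FFC, 0x6CCD, 0x42F1):
    print(hex(word), hex(sqrt_q15(word)))
```

The number of iterations is left as a parameter so that the effect of additional applications of equation (3) can be examined.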
[Flowchart of steps (1) to (9) above: initial setting (1); X = 0? gives √X = 0 (2); X < 0? gives √X = H'FFFF (3); X > H'7FFB? gives √X = X (4); X > 0.1? (5) selects equation (1), y0 = 0.58579 × X + 0.41422 (6), or equation (2), y0 = 3.16228 × X (6)'; y0 = X? gives y0 = 1/2 (y0 + 1) (7); y0 < X? gives √X = H'FFFF (8); otherwise √X is calculated using equation (3) (9); End.]
Figure 11.3 Algorithm for Calculating Square Root
Flowchart
Start
(1) Initial setting
    (1-1) Transfer INPUT address to register R4
    (1-2) Transfer EX_OUT address to register R5
    (1-3) Transfer DAT address to register R6
    (1-4) Transfer DAT2 address to register R7
(2) Zero check
    (2-1) Load input value X in register R0
    (2-2) Is data value in register R0 (input value X) 0? (X = 0?) If no, go to (3-1). If yes, continue with (2-3).
    (2-3) Load H'0 in register X0
    (2-4) Copy register X0 data (H'0) to register A0
    (2-5) Go to FIN
(3) Negative value check
    (3-1) Exchange lower word of data in register R0 and upper word of data in register R1
    (3-2) Shift data in register R1 (upper word is input value X) 1 bit to the left to determine sign
    (3-3) Is bit 31 of register R1 1? (X < 0?) If no, go to (4-1). If yes, continue with (3-4).
    (3-4) Load H'FFFF in register X0
    (3-5) Copy register X0 data (H'FFFF) to register A0
    (3-6) Go to FIN
(4) Comparison with H'7FFB
    (4-1) Load input value X in register R0
    (4-2) Load H'7FFB in register R1
    (4-3) Is R0 greater than R1? (X > H'7FFB?) If no, go to (5-1). If yes, continue with (4-4).
    (4-4) Transfer EX_OUT2 address to register R5
    (4-5) Load input value X in register X0
    (4-6) Copy register X0 data to register A0, then go to FIN
(5) Selection of approximation equation
    (5-1) Transfer DAT2 address to register R7
    (5-2) Load 0.1 in register R1
    (5-3) Is R0 greater than R1? (X > 0.1?) If yes, continue with (6-1). If no, go to (6'-1).
(6) Approximate square root calculation using equation (1)
    (6-1) Load input value X in register X1
          Load data for approximate square root calculation (0.58579) in register Y0
    (6-2) Load input value X in register R1
    (6-3) Transfer WORK address to register R4
    (6-4) Multiply register X1 and register Y0 (0.58579X)
          Load data for approximate square root calculation (0.41422) in register Y1
    (6-5) Add register A1 and register Y1 (0.58579X + 0.41422), then go to (7-1)
(6)' Approximate square root calculation using equation (2)
    (6'-1) Transfer KINJI2 address to register R6
    (6'-2) Load input value X in register X1
           Load data for approximate square root calculation (0.79057) in register Y0
    (6'-3) Load input value X in register R1
    (6'-4) Transfer WORK address to register R4
    (6'-5) Multiply register X1 and register Y0 (0.79057X)
    (6'-6) Shift 2 bits to left to multiply 0.79057X by 4
(7) Comparison of approximate square root y0 and input value X, part 1
    (7-1) Load approximate square root y0 in register R0 via @R4
    (7-2) Is approximate square root y0 equivalent to input value X? (y0 = X?) If no, go to (8-1). If yes, continue with (7-3).
    (7-3) Shift data in register A0 1 bit to right to multiply approximate square root y0 by 1/2
          Load 0.5 in register Y1
    (7-4) Add register A0 and register Y1 (y0/2 + 0.5), store result in register A0, then go to FIN
(8) Comparison of approximate square root y0 and input value X, part 2
    (8-1) Is input value X greater than approximate square root y0? (X > y0?) If no, go to (9-1). If yes, continue with (8-2).
    (8-2) Load H'FFFF in register X0
    (8-3) Copy register X0 data (H'FFFF) to register A0, then go to FIN
(9) Square root calculation using gradualization equation
    (9-1) Set register R14 to 3 (number of times to perform gradualization equation)
    (9-2) Clear register R13 to 0
    (9-3) Increment register R13 (repeat counter)
    (9-4) Save input value X in register R11
    (9-5) Clear register R12
    (9-6) Use extended instruction REPEAT to set repeat start address (LOOP_S), repeat end address (LOOP_E), and number of repeats (15 times)
    (9-7) Initialize for signless division
    (9-8) Perform 1-step division on X using y0
    (9-9) Store T bit in R12, shift R12 1 bit to left
          Steps (9-8) and (9-9) are repeated the number of times specified as the number of repeats (15 times in case of sample program)
    (9-10) Transfer X/y0 to register X0 via @R4
    (9-11) Copy register X0 to register Y1
    (9-12) Shift data in register A0 1 bit to right to multiply y0 by 1/2
    (9-13) Shift data in register Y1 1 bit to right to multiply X/y0 by 1/2
    (9-14) Add calculation results from (9-12) and (9-13) to obtain square root y (√X). Store calculation result in register A0
    (9-15) Transfer y (√X) to register R0 via @R4
    (9-16) Restore input value X in register R1 from register R11
    (9-17) Is register R13 greater than register R14? If no, return to (9-3). If yes, continue with FIN.
FIN
    (9-18) Store data from register A0 at the address in register R7 (OUTPUT)
End
Main Program
rout.src
;***********************************************************************
;*      Square root calculation routine
;*
;*      √X
;*
;***********************************************************************
;***********************************************************************
;*      Initial setting routine
;***********************************************************************
MAIN:
            MOV.L   #INPUT,R4
            MOV.L   #EX_OUT,R5
            MOV.L   #KINJI1,R6
            MOV.L   #DAT1,R7
;***********************************************************************
;*      Zero check of value to have square root calculated routine
;***********************************************************************
            MOV.W   @R4,R0
            CMP/EQ  #0,R0
            BF      ZERO_CH                 ;If zero, do following processing
            MOVX.W  @R4,X0
            PCOPY   X0,A0
            BRA     FIN                     ;End of processing
            NOP
;***********************************************************************
;*      Negative value check of value to have square root calculated routine
;***********************************************************************
ZERO_CH:
            SWAP    R0,R1
            SHAL    R1
            BF      MINUS_CH                ;If negative, do following processing
            MOVX.W  @R5,X0
            PCOPY   X0,A0
            BRA     FIN                     ;End of processing
            NOP
;***********************************************************************
;*      Comparison of value to have square root calculated and H'7FFB routine
;***********************************************************************
MINUS_CH:
            MOV.W   @R4,R0                  ;X load
            MOV.W   @R7,R1                  ;H'7FFB load
            CMP/GT  R1,R0                   ;R0 > R1 ?
            BF      EQU_SEL                 ;If X > H'7FFB, do following processing
            MOV.L   #EX_OUT2,R5
            MOVX.W  @R5,X0                  ;X load
            PCOPY   X0,A0
            BRA     FIN
            NOP
;***********************************************************************
;*      Approximation equation selection routine
;***********************************************************************
EQU_SEL:
            MOV.L   #DAT2,R7
            MOV.W   @R7,R1
            CMP/GT  R1,R0
            BF      Y0_PRO2                 ;If X ≤ 0.1, jump
;***********************************************************************
;*      Approximation equation (1) y0 calculation routine
;***********************************************************************
Y0_PRO1:
            MOVX.W  @R4,X1  MOVY.W @R6+,Y0  ;Load input value X (value to have square root calculated) for use in calculating approximate square root
            MOV.W   @R4,R1                  ;Keep input value X (value to have square root calculated) in R1
            MOV.L   #WORK,R4
            PMULS   X1,Y0,A1  MOVY.W @R6+,Y1  ;0.58579X,0.41422 load
            PADD    A1,Y1,A0                ;0.58579X+0.41422 -> y0
            BRA     HIKAKU
            NOP
;***********************************************************************
;*      Approximation equation (2) y0 calculation routine
;***********************************************************************
Y0_PRO2:
            MOV.L   #KINJI2,R6
            MOVX.W  @R4,X1  MOVY.W @R6+,Y0  ;Load input value X (value to have square root calculated) for use in calculating approximate square root
            MOV.W   @R4,R1                  ;Keep input value X (value to have square root calculated) in R1
            MOV.L   #WORK,R4
            PMULS   X1,Y0,A0                ;0.79057 × X
            PSHA    #2,A0                   ;(0.79057 × X) × 4
;***********************************************************************
;*      Comparison of approximate square root and value to have square root
;*      calculated routine/Part 1
;***********************************************************************
HIKAKU:
            MOVX.W  A0,@R4                  ;Pass to CPU unit
            MOV.W   @R4,R0
            CMP/EQ  R0,R1                   ;Approximate square root y0 = input value X (value to have square root calculated)?
            BF      NOT_EQ                  ;If y0 = X, do following processing
            PSHA    #-1,A0  MOVY.W @R6,Y1   ;y0/2,0.5 load
            PADD    A0,Y1,A0                ;y0/2+0.5
            BRA     FIN                     ;End of processing
            NOP
;***********************************************************************
;*      Comparison of approximate square root and value to have square root
;*      calculated routine/Part 2
;***********************************************************************
NOT_EQ:
            CMP/GT  R0,R1
            BF      NOT_GT                  ;If y0 < X, do following processing
            MOVX.W  @R5,X0                  ;H'FFFF load
            PCOPY   X0,A0
            BRA     FIN
            NOP
;***********************************************************************
;*      Square root y calculation using gradualization equation routine
;***********************************************************************
NOT_GT:
            MOV.L   #3,R14                  ;Set number of repeats
            MOV.L   #0,R13
LENEAR_LP:
            ADD     #1,R13                  ;Increment counter
            MOV     R1,R11                  ;push X
            MOV.L   #0,R12                  ;Clear register R12
            REPEAT  LOOP_S,LOOP_E,#15
            DIV0U                           ;Signless initialization
LOOP_S:     DIV1    R0,R1                   ;R1/R0
LOOP_E:     ROTCL   R12                     ;Store T bit
            MOV.W   R12,@R4
            MOVX.W  @R4,X0
            PCOPY   X0,Y1
            PSHA    #-1,A0                  ;y0/2
            PSHA    #-1,Y1                  ;(X/y0)/2
            PADD    A0,Y1,A0
            MOVX.W  A0,@R4
            MOV.W   @R4,R0
            MOV     R11,R1                  ;pop X
            CMP/GT  R14,R13
            BF      LENEAR_LP               ;If set number of repeats has been performed, escape
FIN:
            MOV.L   #OUTPUT,R7
            MOVY.W  A0,@R7                  ;Store square root √X
EXIT:
            BRA     EXIT
            NOP
MAIN_E:     NOP
Data
;***********************************************************************
;*      Square root calculation data (XRAM/YRAM)
;***********************************************************************
            .SECTION XRAM,DATA,LOCATE=H'1000FF00
INPUT:      .RES.W      1                       ;External input data storage area
WORK:       .RES.W      1                       ;Work area
EX_OUT:     .DATA.W     H'FFFF                  ;Output value if input value X < 0
EX_OUT2:    .XDATA.W    1                       ;Output value if input value X > H'7FFB
            .SECTION YRAM,DATA,LOCATE=H'1001FF00
KINJI1:     .XDATA.W    0.58579,0.41422,0.5     ;Approximation equation (1)
KINJI2:     .XDATA.W    0.79057                 ;Approximation equation (2)
DAT1:       .DATA.W     H'7FFB
DAT2:       .XDATA.W    0.1
OUTPUT:     .RES.W      1                       ;External output data storage area
Execution Example
The input values for X (INPUT) and the square root √X values calculated (OUTPUT) are shown
in table 11.1.
Table 11.1 Square Root √X Calculation Results (3 Executions of Gradualization Equation)

Input Value X   Input Value X    Logical Value √X   Logical Value √X   Output Value √X
(decimal)       (hexadecimal)    (decimal)          (hexadecimal)      (hexadecimal)
0.9999          H'7FFC           0.99995            H'7FFE             H'7FFF
0.99987         H'7FFB           0.99993            H'7FFD             H'7FFD
0.85            H'6CCD           0.92195            H'7602             H'7602
0.523           H'42F1           0.72319            H'5C91             H'5C90
0.34            H'2BB5           0.5831             H'4AA3             H'4AA2
0.136           H'1168           0.36878            H'2F34             H'2F33
0.087           H'0B23           0.29496            H'25C1             H'25C1
0.01            H'0147           0.1                H'0CCD             H'0CC9
0               H'0000           0                  H'0000             H'0000
–0.7            H'A667           -                  -                  H'FFFF
Section 12 Square Mean Error
Overview
The square mean error of two variables, a[i] (16-bit components) and b[i] (16-bit components), is
calculated.
(i = 1, 2, ..., n)
Description
1. Method of Obtaining Square Mean Error
In order to obtain the square mean error, first the error e[i] for the two variables, a[i] and b[i], must be considered. The relevant equation is given as equation (1) below.
    $e[i] = a[i] - b[i] \quad (i = 1, 2, \ldots, n)$ *1 ---- (1)
Next, the error distribution $S_e^2$ is obtained. The error distribution $S_e^2$ can be calculated by dividing the sum total of the squares of the errors e[i] by the number of components (n). Written out by component, this is as follows.
    $\frac{1}{n} \sum e[i]^2 = \frac{1}{n} \left\{ (a[1] - b[1])^2 + (a[2] - b[2])^2 + \cdots + (a[n] - b[n])^2 \right\}$
The error distribution $S_e^2$ can be obtained using equation (2) below.
    $S_e^2 = \frac{1}{n} \sum_{i=1}^{n} (a[i] - b[i])^2$ ---- (2)
The square mean error $E[S_e^2]$ is expressed as the square root of the error distribution $S_e^2$. The relevant equation for obtaining the square mean error $E[S_e^2]$ is shown as equation (3) below.
    $E[S_e^2] = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (a[i] - b[i])^2}$ ---- (3)
*1 a[i]: 16-bit
   b[i]: 16-bit
   e[i]: 16-bit
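Equations (1) to (3) can be checked numerically with the short Python sketch below. It uses floating-point arithmetic and the library square root instead of the fixed-point routines of the example program, and the function names error_distribution and square_mean_error are hypothetical.

```python
import math


def error_distribution(a, b):
    """Equation (2): mean of the squared errors e[i] = a[i] - b[i]."""
    n = len(a)
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / n


def square_mean_error(a, b):
    """Equation (3): square root of the error distribution."""
    return math.sqrt(error_distribution(a, b))


# Sample components taken from the data section of the example program (n = 3).
a = [0.5, 0.125, 0.5]
b = [0.25, 0.0625, 0.25]
se2 = error_distribution(a, b)
rms = square_mean_error(a, b)
print(se2, rms)
```

With these sample values the errors are 0.25, 0.0625, and 0.25, so the sum of squares is 0.12890625 before division by n.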
2. Method of Storing Components of Variables a[i] and b[i]
In order to obtain the square mean error, it is first necessary to calculate the sum total of the squares of the errors e[i]. To increase processing speed, the components of a[i] and b[i] are stored in XRAM and YRAM ahead of time as shown in figure 12.1. Note that 0 is stored in VECTORA+2n and VECTORA+2n+2 of XRAM and in VECTORB+2n and VECTORB+2n+2 of YRAM. The example program will not run properly if zeros are not stored in these locations.
For division by the number of components n, the numeric value 1/n is stored in XRAM. The actual program does not perform a division, but rather multiplies values by 1/n.
[Memory map, 16-bit words (bits 15 to 0).
XRAM: VECTORA: a[1]; VECTORA+2: a[2]; VECTORA+4: a[3]; ...; VECTORA+2n–4: a[n–1]; VECTORA+2n–2: a[n]; VECTORA+2n: 0; VECTORA+2n+2: 0.
YRAM: VECTORB: b[1]; VECTORB+2: b[2]; VECTORB+4: b[3]; ...; VECTORB+2n–4: b[n–1]; VECTORB+2n–2: b[n]; VECTORB+2n: 0; VECTORB+2n+2: 0.
XRAM also holds the value 1/n.]
Figure 12.1 Memory Map of Storage of Variables a[i] and b[i], Etc.
3. Algorithm for Calculating Square Mean Error
The algorithm used to calculate the square mean error is described below.
(1) Perform initial settings.
(2) Set the number of repeats for items (2) and (3) to the number of components n + 2. Two extra repeats are added since the following four operations run in parallel:
    calculate $e[i]^2 + \sum_{j=1}^{i-1} e[j]^2$, calculate $e[i]^2$, load a[i], load b[i]
(3) Calculate the error e[i] for a[i] and b[i].
(4) Divide $\sum_{i=1}^{n} (a[i] - b[i])^2$, which was obtained using processes (2) and (3), by n.
(5) Calculate the square root of the error distribution $S_e^2$. This yields the square mean error and completes the processing. (For details, see 3. Algorithm for Fixed-point Square Root Calculation in 11. Square Root.)
(1) Initial setting
(2) Execute the following 4 operations in parallel: calculate $e[i]^2 + \sum_{j=1}^{i-1} e[j]^2$, calculate $e[i]^2$, load a[i], load b[i]
    (Number of repeats is number of components n + 2)
(3) Calculate error for a[i] and b[i]: $e[i] = a[i] - b[i]$
(4) Divide $\sum (a[i] - b[i])^2$ by n: $S_e^2 = \frac{1}{n} \sum_{i=1}^{n} (a[i] - b[i])^2$
(5) Calculate square root of $S_e^2$
End
Figure 12.2
Flowchart
Start
(1) Initial setting
    (1-1) Transfer VECTORA address to register R4
    (1-2) Transfer SEIBUN_N address to register R5
    (1-3) Transfer VECTORB address to register R6
(2) Sum of squares of errors
    (2-1) Use extended instruction REPEAT to set repeat start address (LOOP_S), repeat end address (LOOP_E), and number of repeats (5 times)
    (2-2) Clear register A1
    (2-3) Clear register Y0
    (2-4) Clear register A0
    (2-5) Add $e[i]^2$ and $\sum_{j=1}^{i-1} e[j]^2$
          Calculate $e[i]^2$
          After reading a[i] from XRAM, increment R4 address
          After reading b[i] from YRAM, increment R6 address
(3) Error calculation
    (3-1) Calculate error e[i] for a[i] and b[i]
          Steps (2-5) and (3-1) are repeated the number of times specified as the number of repeats (5 times in case of sample program)
(4) Division by n
    (4-1) Copy contents of register Y0 to register A1
          Read 1/n to register X1
    (4-2) Multiply $\sum_{i=1}^{n} e[i]^2$ and 1/n
(5) Square root calculation
    (5-1) Transfer INPUT address to register R4
    (5-2) Store error distribution $S_e^2$ (register A1) at the input address (INPUT) used by the square root calculation
    <Square root calculation routine>
    (See flowchart in section 11, Square Root, for details)
End
Main Program
The example program calculates the square mean error using three components {a[i], b[i] (i = 1, 2,
3)}
squ_ave.src
;***********************************************************************
;*      Square mean routine
;*
;*      a[i],b[i]
;*
;***********************************************************************
;***********************************************************************
;*      Initial setting routine
;***********************************************************************
MAIN:
            MOV.L   #VECTORA,R4
            MOV.L   #SEIBUN_N,R5
            MOV.L   #VECTORB,R6
;***********************************************************************
;*      Error distribution calculation routine
;***********************************************************************
            REPEAT  LOOP_S,LOOP_E,#5        ;Number of repeats is number of vector a components + 2
            PCLR    A1
            PCLR    Y0
            PCLR    A0
LOOP_S:     PADD    A0,Y0,Y0  PMULS A1,A1,A0
LOOP_E:     PSUB    X0,Y1,A1  MOVX.W @R4+,X0  MOVY.W @R6+,Y1  ;a[i],b[i] load
            PCOPY   Y0,A1  MOVX.W @R5,X1    ;1/3 load
            PMULS   X1,A1,A1                ;0.33333 × Σ(a[i] - b[i])2
;***********************************************************************
;*      Value to have square root calculated storage routine
;***********************************************************************
            MOV.L   #INPUT,R4
            MOVX.W  A1,@R4
;***********************************************************************
;*      Square root calculation routine
;***********************************************************************
;***********************************************************************
;*      Initial setting routine
;***********************************************************************
SEMI_MAIN:
            MOV.L   #EX_OUT,R5
            MOV.L   #KINJI1,R6
            MOV.L   #DAT1,R7
;***********************************************************************
;*      Zero check of value to have square root calculated routine
;***********************************************************************
            MOV.W   @R4,R0
            CMP/EQ  #0,R0
            BF      ZERO_CH
            MOVX.W  @R4,X0                  ;H'0 load
            PCOPY   X0,A0
            BRA     FIN                     ;End of processing
            NOP
;***********************************************************************
;*      Negative value check of value to have square root calculated routine
;***********************************************************************
ZERO_CH:
            SWAP    R0,R1
            SHAL    R1
            BF      MINUS_CH                ;If negative, do following processing
            MOVX.W  @R5,X0                  ;H'FFFF load
            PCOPY   X0,A0
            BRA     FIN                     ;End of processing
            NOP
;***********************************************************************
;*      Comparison of value to have square root calculated and H'7FFB routine
;***********************************************************************
MINUS_CH:
            MOV.W   @R4,R0                  ;X load
            MOV.W   @R7,R1                  ;H'7FFB load
            CMP/GT  R1,R0                   ;R0 > R1 ?
            BF      EQU_SEL                 ;If R1 is greater, jump
            MOV.L   #EX_OUT2,R5
            MOVX.W  @R5,X0                  ;X load
            PCOPY   X0,A0
            BRA     FIN
            NOP
;***********************************************************************
;*      Approximation equation selection routine
;***********************************************************************
EQU_SEL:
            MOV.L   #DAT2,R7
            MOV.W   @R7,R1
            CMP/GT  R1,R0
            BF      Y0_PRO2
;***********************************************************************
;*      Approximation equation (1) y0 calculation routine
;***********************************************************************
Y0_PRO1:
            MOVX.W  @R4,X1  MOVY.W @R6+,Y0  ;Load input value X (value to have square root calculated) for use in calculating approximate square root
            MOV.W   @R4,R1                  ;Keep input value X (value to have square root calculated) in R1
            MOV.L   #WORK,R4
            PMULS   X1,Y0,A1  MOVY.W @R6+,Y1  ;0.58579X,0.41422 load
            PADD    A1,Y1,A0                ;0.58579X+0.41422 -> y0
            BRA     HIKAKU
            NOP
;***********************************************************************
;*      Approximation equation (2) y0 calculation routine
;***********************************************************************
Y0_PRO2:
            MOV.L   #KINJI2,R6
            MOVX.W  @R4,X1  MOVY.W @R6+,Y0  ;Load input value X (value to have square root calculated) for use in calculating approximate square root
            MOV.W   @R4,R1                  ;Keep input value X (value to have square root calculated) in R1
            MOV.L   #WORK,R4
            PMULS   X1,Y0,A0                ;0.79057 × X
            PSHA    #2,A0                   ;(0.79057 × X) × 4
;***********************************************************************
;*      Comparison of approximate square root and value to have square root
;*      calculated routine/Part 1
;***********************************************************************
HIKAKU:
            MOVX.W  A0,@R4                  ;Pass to CPU unit
            MOV.W   @R4,R0
            CMP/EQ  R0,R1                   ;Approximate square root = input value X (value to have square root calculated)?
            BF      NOT_EQ
            PSHA    #-1,A0  MOVY.W @R6,Y1   ;y0/2,0.5 load
            PADD    A0,Y1,A0                ;y0/2+0.5
            BRA     FIN
            NOP
;***********************************************************************
;*      Comparison of approximate square root and value to have square root
;*      calculated routine/Part 2
;***********************************************************************
NOT_EQ:
            CMP/GT  R0,R1
            BF      NOT_GT
            MOVX.W  @R5,X0                  ;H'FFFF load
            PCOPY   X0,A0
            BRA     FIN
            NOP
;***********************************************************************
;*      Square root y calculation using gradualization equation routine
;***********************************************************************
NOT_GT:
            MOV.L   #3,R14                  ;Set number of repeats
            MOV.L   #0,R13
LENEAR_LP:
            ADD     #1,R13                  ;Increment counter
            MOV     R1,R11
            MOV.L   #0,R12
            REPEAT  DIV_S,DIV_E,#15
            DIV0U                           ;Signless initialization
DIV_S:      DIV1    R0,R1                   ;R1/R0
            ROTCL   R12                     ;Store T bit
            MOV.W   R12,@R4
DIV_E:      MOVX.W  @R4,X0
            PCOPY   X0,Y1
            PSHA    #-1,A0                  ;y0/2
            PSHA    #-1,Y1                  ;(X/y0)/2
            PADD    A0,Y1,A0
            MOVX.W  A0,@R4
            MOV.W   @R4,R0
            MOV     R11,R1
            CMP/GT  R14,R13
            BF      LENEAR_LP
FIN:
            MOV.L   #OUTPUT,R7
            MOVY.W  A0,@R7                  ;Store square root √X
EXIT:
            BRA     EXIT
            NOP
MAIN_E:     NOP
Data
;***********************************************************************
;*      Square mean calculation data (XRAM/YRAM)
;***********************************************************************
            .SECTION XRAM,DATA,LOCATE=H'1000FF00
VECTORA:    .XDATA.W    0.5,0.125,0.5,0,0
SEIBUN_N:   .XDATA.W    0.33333                 ;1/number of components (n)
;* For calculating square root *
INPUT:      .RES.W      1
WORK:       .RES.W      1
EX_OUT:     .DATA.W     H'FFFF
EX_OUT2:    .XDATA.W    1
            .SECTION YRAM,DATA,LOCATE=H'1001FF00
VECTORB:    .XDATA.W    0.25,0.0625,0.25,0,0
;* For calculating square root *
KINJI1:     .XDATA.W    0.58579,0.41422,0.5     ;Approximation equation (1)
KINJI2:     .XDATA.W    0.79057                 ;Approximation equation (2)
DAT1:       .DATA.W     H'7FFB
DAT2:       .XDATA.W    0.1
OUTPUT:     .RES.W      1
Section 13 Effects of DSP Instructions on Program
Performance
The number of execution cycles required by each function program file is listed in tables 13.1 and
13.2.
The test conditions used for table 13.1 were as follows: an E8000 (SH7612) emulator was used,
the main program of each program file was allocated to XRAM, and the data was allotted to
XRAM and YRAM.
The test conditions used for table 13.2 were as follows: a simulator (SH-DSP) was used, the main
program of each program file was allocated to XROM, and the data was allotted to XRAM and
YRAM.
Table 13.1 Performance of Programs Employing DSP Instructions

Program Filename   Function                  No. of Execution Cycles   Notes
pmuls32.src        32-bit multiplication     116
tri_fun.src        Trigonometric function    62
matrix.src         Matrix operation          238                       3 × 3 matrix operation
in_pro.src         Inner product             15                        3-dimensional space vectors
rout.src           Square root               104
squ_ave.src        Square mean error         114                       n = 3 (3 components)
Table 13.2 Performance of Programs Employing DSP Instructions

Program Filename   Function                  No. of Execution Cycles   Notes
pmuls32.src        32-bit multiplication     172
tri_fun.src        Trigonometric function    80
matrix.src         Matrix operation          378                       3 × 3 matrix operation
in_pro.src         Inner product             21                        3-dimensional space vectors
rout.src           Square root               272
squ_ave.src        Square mean error         292                       n = 3 (3 components)
SH-DSP Software Application Note
Publication Date: 1st Edition, September 1999
Published by:
Electronic Devices Sales & Marketing Group
Semiconductor & Integrated Circuits
Hitachi, Ltd.
Edited by:
Technical Documentation Group
UL Media Co., Ltd.
Copyright © Hitachi, Ltd., 1999. All rights reserved. Printed in Japan.