ETC HD64F7055

To all our customers
Regarding the change of names mentioned in the document, such as Hitachi
Electric and Hitachi XX, to Renesas Technology Corp.
The semiconductor operations of Mitsubishi Electric and Hitachi were transferred to Renesas
Technology Corporation on April 1st 2003. These operations include microcomputer, logic, analog
and discrete devices, and memory chips other than DRAMs (flash memory, SRAMs etc.)
Accordingly, although Hitachi, Hitachi, Ltd., Hitachi Semiconductors, and other Hitachi brand
names are mentioned in the document, these names have in fact all been changed to Renesas
Technology Corp. Thank you for your understanding. Except for our corporate trademark, logo and
corporate statement, no changes whatsoever have been made to the contents of the document, and
these changes do not constitute any alteration to the contents of the document itself.
Renesas Technology Home Page: http://www.renesas.com
Renesas Technology Corp.
Customer Support Dept.
April 1, 2003
Cautions
Keep safety first in your circuit designs!
1. Renesas Technology Corporation puts the maximum effort into making semiconductor products better and more reliable, but
there is always the possibility that trouble may occur with them. Trouble with semiconductors may lead to personal injury, fire
or property damage.
Remember to give due consideration to safety when making your circuit designs, with appropriate measures such as (i)
placement of substitutive, auxiliary circuits, (ii) use of nonflammable material or (iii) prevention against any malfunction or
mishap.
Notes regarding these materials
1. These materials are intended as a reference to assist our customers in the selection of the Renesas Technology Corporation
product best suited to the customer's application; they do not convey any license under any intellectual property rights, or any
other rights, belonging to Renesas Technology Corporation or a third party.
2. Renesas Technology Corporation assumes no responsibility for any damage, or infringement of any third-party's rights,
originating in the use of any product data, diagrams, charts, programs, algorithms, or circuit application examples contained in
these materials.
3. All information contained in these materials, including product data, diagrams, charts, programs and algorithms represents
information on products at the time of publication of these materials, and are subject to change by Renesas Technology
Corporation without notice due to product improvements or other reasons. It is therefore recommended that customers contact
Renesas Technology Corporation or an authorized Renesas Technology Corporation product distributor for the latest product
information before purchasing a product listed herein.
The information described here may contain technical inaccuracies or typographical errors.
Renesas Technology Corporation assumes no responsibility for any damage, liability, or other loss rising from these
inaccuracies or errors.
Please also pay attention to information published by Renesas Technology Corporation by various means, including the
Renesas Technology Corporation Semiconductor home page (http://www.renesas.com).
4. When using any or all of the information contained in these materials, including product data, diagrams, charts, programs, and
algorithms, please be sure to evaluate all information as a total system before making a final decision on the applicability of
the information and products. Renesas Technology Corporation assumes no responsibility for any damage, liability or other
loss resulting from the information contained herein.
5. Renesas Technology Corporation semiconductors are not designed or manufactured for use in a device or system that is used
under circumstances in which human life is potentially at stake. Please contact Renesas Technology Corporation or an
authorized Renesas Technology Corporation product distributor when considering the use of a product contained herein for
any specific purposes, such as apparatus or systems for transportation, vehicular, medical, aerospace, nuclear, or undersea
repeater use.
6. The prior written approval of Renesas Technology Corporation is necessary to reprint or reproduce in whole or in part these
materials.
7. If these products or technologies are subject to the Japanese export control restrictions, they must be exported under a license
from the Japanese government and cannot be imported into a country other than the approved destination.
Any diversion or reexport contrary to the export control laws and regulations of Japan and/or the country of destination is
prohibited.
8. Please contact Renesas Technology Corporation for further details on these materials or the products contained therein.
Hitachi SuperH™ RISC engine
SH-2E
Programming Manual
ADE-602-178
Rev.1.0
3/5/03
Hitachi, Ltd.
Cautions
1. Hitachi neither warrants nor grants licenses of any rights of Hitachi’s or any third party’s
patent, copyright, trademark, or other intellectual property rights for information contained in
this document. Hitachi bears no responsibility for problems that may arise with third party’s
rights, including intellectual property rights, in connection with use of the information
contained in this document.
2. Products and product specifications may be subject to change without notice. Confirm that you
have received the latest product standards or specifications before final design, purchase or
use.
3. Hitachi makes every attempt to ensure that its products are of high quality and reliability.
However, contact Hitachi’s sales office before using the product in an application that
demands especially high quality and reliability or where its failure or malfunction may directly
threaten human life or cause risk of bodily injury, such as aerospace, aeronautics, nuclear
power, combustion control, transportation, traffic, safety equipment or medical equipment for
life support.
4. Design your application so that the product is used within the ranges guaranteed by Hitachi
particularly for maximum rating, operating supply voltage range, heat radiation characteristics,
installation conditions and other characteristics. Hitachi bears no responsibility for failure or
damage when used beyond the guaranteed ranges. Even within the guaranteed ranges,
consider normally foreseeable failure rates or failure modes in semiconductor devices and
employ systemic measures such as fail-safes, so that the equipment incorporating Hitachi
product does not cause bodily injury, fire or other consequential damage due to operation of
the Hitachi product.
5. This product is not designed to be radiation resistant.
6. No one is permitted to reproduce or duplicate, in any form, the whole or part of this document
without written approval from Hitachi.
7. Contact Hitachi’s sales office for any questions regarding this document or Hitachi
semiconductor products.
Introduction
The SH-2E is a new generation of RISC microcomputers that integrate a RISC-type CPU and the
peripheral functions required for system configuration onto a single chip to achieve
high-performance operation. It can operate in a power-down state, which is an essential feature for
portable equipment.
This CPU has a RISC-type instruction set. Basic instructions can be executed in one clock cycle,
improving instruction execution speed. In addition, the CPU has a 32-bit internal architecture for
enhanced data-processing ability.
In addition, the SH-2E supports single-precision floating point calculations as well as entirely
PCAPI compatible emulation of double-precision floating point calculations. The SH-2E
instructions are a subset of the floating point calculations conforming to the IEEE754 standard.
This programming manual describes in detail the instructions for the SH-2E Series and is intended
as a reference on instruction operation and architecture. It also covers the pipeline operation,
which is a feature of the SH-2E Series.
For information on the hardware, please refer to the hardware manual for the product in question.
Contents

Section 1 Features
1.1 SH-2E Features

Section 2 Register Configuration
2.1 General Registers
2.2 Control Registers
2.3 System Registers
2.4 Floating-Point Registers
2.5 Floating-Point System Registers
2.6 Initial Values of Registers

Section 3 Data Formats
3.1 Data Format in Registers
3.2 Data Format in Memory
3.3 Immediate Data Format

Section 4 Floating-Point Unit (FPU)
4.1 Overview
4.2 Floating-Point Registers and Floating-Point System Registers
  4.2.1 Floating-Point Register File
  4.2.2 Floating-Point Communication Register (FPUL)
  4.2.3 Floating-Point Status/Control Register (FPSCR)
4.3 Floating-Point Format
  4.3.1 Floating-Point Format
  4.3.2 Non-Numbers (NaN)
  4.3.3 Denormalized Number Values
  4.3.4 Other Special Values
4.4 Floating-Point Exception Model
  4.4.1 Enable State Exceptions
  4.4.2 Disable State Exceptions
  4.4.3 FPU Exception Event and Code
  4.4.4 Floating-Point Data Arrangement in Memory
  4.4.5 Arithmetic Operations Involving Special Operands
4.5 Synchronization with CPU

Section 5 Instruction Features
5.1 RISC-Type Instruction Set
5.2 Addressing Modes
5.3 Instruction Format

Section 6 Instruction Set
6.1 Instruction Set by Classification
6.2 Instruction Set in Alphabetical Order

Section 7 Instruction Descriptions
7.1 Sample Description (Name): Classification
7.2 CPU Instruction
  7.2.1 ADD (ADD Binary): Arithmetic Instruction
  7.2.2 ADDC (ADD with Carry): Arithmetic Instruction
  7.2.3 ADDV (ADD with V Flag Overflow Check): Arithmetic Instruction
  7.2.4 AND (AND Logical): Logic Operation Instruction
  7.2.5 BF (Branch if False): Branch Instruction
  7.2.6 BF/S (Branch if False with Delay Slot): Branch Instruction
  7.2.7 BRA (Branch): Branch Instruction
  7.2.8 BRAF (Branch Far): Branch Instruction
  7.2.9 BSR (Branch to Subroutine): Branch Instruction
  7.2.10 BSRF (Branch to Subroutine Far): Branch Instruction
  7.2.11 BT (Branch if True): Branch Instruction
  7.2.12 BT/S (Branch if True with Delay Slot): Branch Instruction
  7.2.13 CLRMAC (Clear MAC Register): System Control Instruction
  7.2.14 CLRT (Clear T Bit): System Control Instruction
  7.2.15 CMP/cond (Compare Conditionally): Arithmetic Instruction
  7.2.16 DIV0S (Divide Step 0 as Signed): Arithmetic Instruction
  7.2.17 DIV0U (Divide Step 0 as Unsigned): Arithmetic Instruction
  7.2.18 DIV1 (Divide 1 Step): Arithmetic Instruction
  7.2.19 DMULS.L (Double-Length Multiply as Signed): Arithmetic Instruction
  7.2.20 DMULU.L (Double-Length Multiply as Unsigned): Arithmetic Instruction
  7.2.21 DT (Decrement and Test): Arithmetic Instruction
  7.2.22 EXTS (Extend as Signed): Arithmetic Instruction
  7.2.23 EXTU (Extend as Unsigned): Arithmetic Instruction
  7.2.24 JMP (Jump): Branch Instruction
  7.2.25 JSR (Jump to Subroutine): Branch Instruction (Class: Delayed Branch Instruction)
  7.2.26 LDC (Load to Control Register): System Control Instruction (Class: Interrupt Disabled Instruction)
  7.2.27 LDS (Load to System Register): System Control Instruction
  7.2.28 MAC.L (Multiply and Accumulate Calculation Long): Arithmetic Instruction
  7.2.29 MAC.W (Multiply and Accumulate Calculation Word): Arithmetic Instruction
  7.2.30 MOV (Move Data): Data Transfer Instruction
  7.2.31 MOV (Move Immediate Data): Data Transfer Instruction
  7.2.32 MOV (Move Peripheral Data): Data Transfer Instruction
  7.2.33 MOV (Move Structure Data): Data Transfer Instruction
  7.2.34 MOVA (Move Effective Address): Data Transfer Instruction
  7.2.35 MOVT (Move T Bit): Data Transfer Instruction
  7.2.36 MUL.L (Multiply Long): Arithmetic Instruction
  7.2.37 MULS.W (Multiply as Signed Word): Arithmetic Instruction
  7.2.38 MULU.W (Multiply as Unsigned Word): Arithmetic Instruction
  7.2.39 NEG (Negate): Arithmetic Instruction
  7.2.40 NEGC (Negate with Carry): Arithmetic Instruction
  7.2.41 NOP (No Operation): System Control Instruction
  7.2.42 NOT (NOT—Logical Complement): Logic Operation Instruction
  7.2.43 OR (OR Logical) Logic Operation Instruction
  7.2.44 ROTCL (Rotate with Carry Left): Shift Instruction
  7.2.45 ROTCR (Rotate with Carry Right): Shift Instruction
  7.2.46 ROTL (Rotate Left): Shift Instruction
  7.2.47 ROTR (Rotate Right): Shift Instruction
  7.2.48 RTE (Return from Exception): System Control Instruction
  7.2.49 RTS (Return from Subroutine): Branch Instruction (Class: Delayed Branch Instruction)
  7.2.50 SETT (Set T Bit): System Control Instruction
  7.2.51 SHAL (Shift Arithmetic Left): Shift Instruction
  7.2.52 SHAR (Shift Arithmetic Right): Shift Instruction
  7.2.53 SHLL (Shift Logical Left): Shift Instruction
  7.2.54 SHLLn (Shift Logical Left n Bits): Shift Instruction
  7.2.55 SHLR (Shift Logical Right): Shift Instruction
  7.2.56 SHLRn (Shift Logical Right n Bits): Shift Instruction
  7.2.57 SLEEP (Sleep): System Control Instruction
  7.2.58 STC (Store Control Register): System Control Instruction (Interrupt Disabled Instruction)
  7.2.59 STS (Store System Register): System Control Instruction (Interrupt Disabled Instruction)
  7.2.60 SUB (Subtract Binary): Arithmetic Instruction
  7.2.61 SUBC (Subtract with Carry): Arithmetic Instruction
  7.2.62 SUBV (Subtract with V Flag Underflow Check): Arithmetic Instruction
  7.2.63 SWAP (Swap Register Halves): Data Transfer Instruction
  7.2.64 TAS (Test and Set): Logic Operation Instruction
  7.2.65 TRAPA (Trap Always): System Control Instruction
  7.2.66 TST (Test Logical): Logic Operation Instruction
  7.2.67 XOR (Exclusive OR Logical): Logic Operation Instruction
  7.2.68 XTRCT (Extract): Data Transfer Instruction
7.3 Floating Point Instructions and FPU Related CPU Instructions
  7.3.1 FABS (Floating Point Absolute Value): Floating Point Instruction
  7.3.2 FADD (Floating Point Add): Floating Point Instruction
  7.3.3 FCMP (Floating Point Compare): Floating Point Instruction
  7.3.4 FDIV (Floating Point Divide): Floating Point Instruction
  7.3.5 FLDI0 (Floating Point Load Immediate 0): Floating Point Instruction
  7.3.6 FLDI1 (Floating Point Load Immediate 1): Floating Point Instruction
  7.3.7 FLDS (Floating Point Load to System Register): Floating Point Instruction
  7.3.8 FLOAT (Floating Point Convert from Integer): Floating Point Instruction
  7.3.9 FMAC (Floating Point Multiply Accumulate): Floating Point Instruction
  7.3.10 FMOV (Floating Point Move): Floating Point Instruction
  7.3.11 FMUL (Floating Point Multiply): Floating Point Instruction
  7.3.12 FNEG (Floating Point Negate): Floating Point Instruction
  7.3.13 FSTS (Floating Point Store From System Register): Floating Point Instruction
  7.3.14 FSUB (Floating Point Subtract): Floating Point Instruction
  7.3.15 FTRC (Floating Point Truncate And Convert To Integer): Floating Point Instruction
  7.3.16 LDS (Load to System Register): FPU Related CPU Instruction
  7.3.17 STS (Store from FPU System Register): FPU Related CPU Instruction

Section 8 Pipeline Operation
8.1 Basic Configuration of Pipelines
8.2 Slot and Pipeline Flow
8.3 Number of Instruction Execution Cycles
8.4 Contention between Instruction Fetch (IF) and Memory Access (MA)
8.5 Relationship between Load Instructions and the Instructions that Follow
8.6 FPU Contention
8.7 Programming Guide
8.8 Operation of Instruction Pipelines
  8.8.1 Data Transfer Instructions
  8.8.2 Arithmetic Instructions
  8.8.3 Logic Operation Instructions
  8.8.4 Shift Instructions
  8.8.5 Branch Instructions
  8.8.6 System Control Instructions
  8.8.7 Exception Processing
  8.8.8 Relationship between Floating-point Instructions and FPU-related CPU Instructions

Appendix A Instruction Code
A.1 Instruction Set by Addressing Mode
  A.1.1 No Operand
  A.1.2 Direct Register Addressing
  A.1.3 Indirect Register Addressing
  A.1.4 Post-Increment Indirect Register Addressing
  A.1.5 Pre-Decrement Indirect Register Addressing
  A.1.6 Indirect Register Addressing with Displacement
  A.1.7 Indirect Indexed Register Addressing
  A.1.8 Indirect GBR Addressing with Displacement
  A.1.9 Indirect Indexed GBR Addressing
  A.1.10 PC Relative Addressing with Displacement
  A.1.11 PC Relative Addressing
  A.1.12 Immediate
A.2 Instruction Sets by Instruction Format
  A.2.1 0 Format
  A.2.2 n Format
  A.2.3 m Format
  A.2.4 nm Format
  A.2.5 md Format
  A.2.6 nd4 Format
  A.2.7 nmd Format
  A.2.8 d Format
  A.2.9 d12 Format
  A.2.10 nd8 Format
  A.2.11 i Format
  A.2.12 ni Format
A.3 Instruction Set by Instruction Code
A.4 Operation Code Map

Appendix B Pipeline Operation and Contention
Section 1 Features
1.1 SH-2E Features
The SH-2E CPU has a RISC-type instruction set. Basic instructions are executed in one clock
cycle, which improves instruction execution speed. The CPU also has an internal 32-bit
architecture for enhanced data processing ability. Table 1.1 lists the SH-2E CPU features.
Table 1.1 SH-2E CPU Features

Item                        Feature
Architecture                • Original Hitachi architecture
                            • 32-bit internal data bus
General-register machine    • Sixteen 32-bit general registers
                            • Three 32-bit control registers
                            • Four 32-bit system registers
                            • Sixteen 32-bit floating-point registers
                            • Two 32-bit floating-point system registers
Instruction set             • Instruction length: 16-bit fixed length for improved code efficiency
                            • Load-store architecture (basic arithmetic and logic operations are
                              executed between registers)
                            • Delayed branch system used for reduced pipeline disruption
                            • Instruction set optimized for C language
Instruction execution time  • One instruction/cycle for basic instructions
Address space               • Architecture makes 4 Gbytes available
On-chip multiplier          • Multiplication operations executed in 1 to 2 cycles (16 bits × 16 bits
                              → 32 bits) or 2 to 4 cycles (32 bits × 32 bits → 64 bits), and
                              multiplication/accumulation operations executed in 3/(2)* cycles (16
                              bits × 16 bits + 64 bits → 64 bits) or 3/(2 to 4)* cycles (32 bits × 32
                              bits + 64 bits → 64 bits)
Pipeline                    • Five-stage pipeline
Processing states           • Reset state
                            • Exception processing state
                            • Program execution state
                            • Power-down state
                            • Bus release state
Power-down states           • Sleep mode
                            • Standby mode
FPU                         • Single-precision floating point format
                            • Subset of IEEE754 standard data types
                            • Invalid calculation exception and divide-by-zero exception (in
                              compliance with IEEE754 standard)
                            • Rounding to zero (in compliance with IEEE754 standard)
                            • General purpose register file, sixteen 32-bit floating point registers
                            • Execution pitch for basic instructions: 1 cycle/latency or 2 cycles
                              (FADD, FSUB, FMUL)
                            • FMAC (floating point multiply accumulate)
                              Execution pitch: 1 cycle/latency or 2 cycles
                            • Support for FDIV
                            • Support for FLDI0 and FLDI1 (load constant 0/1)

Note: * The normal minimum number of execution cycles. The number in parentheses is the
number of cycles when there is contention with preceding/following instructions.
Section 2 Register Configuration
The register set consists of sixteen 32-bit general registers, three 32-bit control registers and four
32-bit system registers.
2.1 General Registers
There are 16 general registers (Rn) numbered R0–R15, which are 32 bits in length. General
registers are used for data processing and address calculation. R0 is also used as an index register.
Several instructions use R0 as a fixed source or destination register. R15 is used as the hardware
stack pointer (SP). Saving and recovering the status register (SR) and program counter (PC) in
exception processing is accomplished by referencing the stack using R15.
Each register holds bits 31 to 0:
R0 *1
R1
R2
R3
R4
R5
R6
R7
R8
R9
R10
R11
R12
R13
R14
R15, SP (hardware stack pointer) *2
Notes: 1. R0 functions as an index register in the indirect indexed
register addressing mode and indirect indexed GBR
addressing mode. In some instructions, R0 functions as
a fixed source register or destination register.
2. R15 functions as a hardware stack pointer (SP) during
exception processing.
Figure 2.1 General Registers (SH-1 and SH-2)
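The role of R15 in exception processing can be illustrated with a small host-side model. This is a sketch under stated assumptions, not Hitachi code: the names `cpu_model`, `push_long`, `pop_long`, `enter_exception`, and `return_from_exception` are hypothetical, the stack is a tiny array, and the order shown (SR pushed first, then PC, each with a 4-byte pre-decrement of R15) is assumed here. The real hardware also performs other steps, such as fetching the handler address from the vector table, that are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software model of the stack use described above. */
#define STACK_WORDS 16u

typedef struct {
    uint32_t r15;              /* models R15 (SP) as a byte address    */
    uint32_t sr;               /* models the status register           */
    uint32_t pc;               /* models the program counter           */
    uint32_t mem[STACK_WORDS]; /* models bytes 0 .. 4*STACK_WORDS - 1  */
} cpu_model;

/* Pre-decrement R15 by 4, then store a longword at the new address. */
static void push_long(cpu_model *c, uint32_t value) {
    assert(c->r15 % 4u == 0u && c->r15 >= 4u && c->r15 <= 4u * STACK_WORDS);
    c->r15 -= 4u;
    c->mem[c->r15 / 4u] = value;
}

/* Load a longword from the address in R15, then post-increment by 4. */
static uint32_t pop_long(cpu_model *c) {
    assert(c->r15 % 4u == 0u && c->r15 < 4u * STACK_WORDS);
    uint32_t value = c->mem[c->r15 / 4u];
    c->r15 += 4u;
    return value;
}

/* Assumed order: SR is saved first, then PC, so PC ends up on top. */
static void enter_exception(cpu_model *c, uint32_t handler) {
    push_long(c, c->sr);
    push_long(c, c->pc);
    c->pc = handler;
}

/* Recovery mirrors the save: PC is restored first, then SR. */
static void return_from_exception(cpu_model *c) {
    c->pc = pop_long(c);
    c->sr = pop_long(c);
}
```

With R15 initially 64, calling `enter_exception` leaves R15 at 56, and `return_from_exception` restores PC, SR, and R15 to their earlier values.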
2.2 Control Registers
The 32-bit control registers consist of the 32-bit status register (SR), global base register (GBR),
and vector base register (VBR). The status register indicates processing states. The global base
register functions as a base address for the indirect GBR addressing mode to transfer data to the
registers of on-chip peripheral modules. The vector base register functions as the base address of
the exception processing vector area (including interrupts).
SR: Status register (bits 31 to 0)
Bits 31–10: Reserved
Bit 9: M
Bit 8: Q
Bits 7–4: I3, I2, I1, I0
Bits 3–2: Reserved
Bit 1: S
Bit 0: T
T bit: The MOVT, CMP/cond, TAS, TST, BT (BT/S), BF (BF/S), SETT, and CLRT
instructions use the T bit to indicate true (1) or false (0). The ADDV/C,
SUBV/C, DIV0U/S, DIV1, NEGC, SHAR/L, SHLR/L, ROTR/L, and ROTCR/L
instructions also use the T bit to indicate carry/borrow or overflow/underflow.
S bit: Used by the multiply/accumulate instruction.
Reserved bits: Always read as 0, and should always be written with 0.
Bits I3–I0: Interrupt mask bits.
M and Q bits: Used by the DIV0U/S and DIV1 instructions.
GBR: Global base register (bits 31 to 0)
Indicates the base address of the indirect GBR addressing mode. The indirect GBR
addressing mode is used in data transfer for on-chip peripheral module register
areas and in logic operations.
VBR: Vector base register (bits 31 to 0)
Indicates the base address of the exception processing vector area.
Figure 2.2 Control Registers
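As an illustration of the SR layout in figure 2.2, the following Python sketch splits a 32-bit SR value into its named fields. The function name decode_sr and the dictionary keys are hypothetical, chosen only for this example; the bit positions (T = bit 0, S = bit 1, I3–I0 = bits 7–4, Q = bit 8, M = bit 9) are read from the figure.

```python
def decode_sr(sr):
    """Split a 32-bit SR value into the fields of figure 2.2 (hypothetical helper)."""
    return {
        "T": sr & 1,               # bit 0: true/false, carry/borrow, overflow/underflow
        "S": (sr >> 1) & 1,        # bit 1: used by the multiply/accumulate instruction
        "IMASK": (sr >> 4) & 0xF,  # bits 7-4: interrupt mask bits I3-I0
        "Q": (sr >> 8) & 1,        # bit 8: used by DIV0U/S and DIV1
        "M": (sr >> 9) & 1,        # bit 9: used by DIV0U/S and DIV1
    }


# After reset, I3-I0 are 1111 and reserved bits are 0. The remaining bits are
# undefined, so 0 is used for them here purely for illustration.
print(decode_sr(0x000000F0))
```

Keeping the field extraction in one place makes it easier to check software that saves and restores SR during exception processing.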
2.3
System Registers
System registers consist of four 32-bit registers: high and low multiply and accumulate registers
(MACH and MACL), the procedure register (PR), and the program counter (PC). The multiply
and accumulate registers store the results of multiply and multiply and accumulate operations. The
procedure register stores the return address from the subroutine procedure. The program counter
indicates the address of the program executing and controls the flow of the processing.
MACH (bits 31 to 0): Multiply and accumulate register high
MACL (bits 31 to 0): Multiply and accumulate register low
PR (bits 31 to 0): Procedure register
PC (bits 31 to 0): Program counter
Figure 2.3 Organization of the System Registers
2.4
Floating-Point Registers
There are sixteen 32-bit floating-point registers, designated FR0 to FR15, which are used by
floating-point instructions. FR0 functions as the index register for the FMAC instruction. These
registers are incorporated into the floating-point unit (FPU). For details, see section 4,
Floating-Point Unit.
FR0 to FR15, each 32 bits wide (bits 31 to 0).
FR0 functions as the index register for the FMAC instruction.
Figure 2.4 Floating-Point Registers
2.5
Floating-Point System Registers
There are two 32-bit floating-point system registers: the floating-point communication register
(FPUL) and the floating-point status/control register (FPSCR). FPUL is used for communication
between the CPU and the floating-point unit (FPU). FPSCR indicates and stores status/control
information relating to FPU exceptions.
These registers are incorporated into the floating-point unit (FPU). For details, see section 4,
Floating-Point Unit.
FPUL (bits 31 to 0): Floating-point communication register
Used for communication between the CPU and the FPU.
FPSCR (bits 31 to 0): Floating-point status/control register
Indicates and stores status/control information relating to FPU exceptions.
Figure 2.5 Floating-Point System Registers
2.6
Initial Values of Registers
Table 2.1 lists the values of the registers after reset.
Table 2.1 Initial Values of Registers
Classification                   Register         Initial Value
General registers                R0–R14           Undefined
                                 R15 (SP)         Value of the stack pointer in the vector address table
Control registers                SR               Bits I3–I0 are 1111 (H'F), reserved bits are 0, and other bits are undefined
                                 GBR              Undefined
                                 VBR              H'00000000
System registers                 MACH, MACL, PR   Undefined
                                 PC               Value of the program counter in the vector address table
Floating-point registers         FR0–FR15         Undefined
Floating-point system registers  FPUL             Undefined
                                 FPSCR            H'00040001
Section 3 Data Formats
3.1
Data Format in Registers
Register operands are always longwords (32 bits). When data in memory is loaded to a register
and the memory operand is only a byte (8 bits) or a word (16 bits), it is sign-extended into a
longword when stored into a register.
Longword: bits 31 to 0
Figure 3.1 Data Format in Registers
3.2
Data Format in Memory
Memory data formats are classified into bytes, words, and longwords. Byte data can be accessed
from any address, but an address error will occur if you try to access word data starting from an
address other than 2n or longword data starting from an address other than 4n. In such cases, the
data accessed cannot be guaranteed. The hardware stack area, which is referred to by the hardware
stack pointer (SP, R15), uses only longword data starting from address 4n because this area stores
the program counter (PC) and status register (SR). See the hardware manual for more information
on address errors.
Byte data: addresses m, m + 1, m + 2, and m + 3 hold bits 31–24, 23–16, 15–8, and 7–0 of a longword
Word data: two bytes starting at address 2n
Longword data: four bytes starting at address 4n
Figure 3.2 Data Format in Memory
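The alignment rule above (words at addresses 2n, longwords at addresses 4n, bytes anywhere) can be expressed as a short check. This is a minimal Python sketch; the names ACCESS_SIZE and causes_address_error are hypothetical and are not part of any Hitachi or Renesas tool.

```python
# Access sizes in bytes, as described in section 3.2.
ACCESS_SIZE = {"byte": 1, "word": 2, "longword": 4}


def causes_address_error(address, kind):
    """Return True if accessing `kind` data at `address` violates the alignment rule."""
    return address % ACCESS_SIZE[kind] != 0


for address in (0x1000, 0x1001, 0x1002):
    for kind in ("byte", "word", "longword"):
        print(hex(address), kind, causes_address_error(address, kind))
```

Because the hardware stack holds the PC and SR as longwords, a stack pointer value passed to this check should always use the "longword" kind.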
3.3
Immediate Data Format
Byte immediate data is located in an instruction code. Immediate data accessed by the MOV,
ADD, and CMP/EQ instructions is sign-extended and is handled in registers as longword data.
Immediate data accessed by the TST, AND, OR, and XOR instructions is zero-extended and is
handled as longword data. Consequently, AND instructions with immediate data always clear the
upper 24 bits of the destination register.
Word or longword immediate data is not located in the instruction code but rather is stored in a
memory table. The memory table is accessed by an immediate data transfer instruction (MOV)
using the PC relative addressing mode with displacement. Specific examples are given in 5.1
Immediate Data in Section 5, Instruction Features.
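The difference between sign extension and zero extension of byte immediate data can be checked with a small model. The following Python sketch uses hypothetical helper names (sign_extend8, zero_extend8) and models registers as unsigned 32-bit integers; it only illustrates the rules stated above.

```python
def sign_extend8(imm):
    """Extend an 8-bit immediate as MOV, ADD, and CMP/EQ do (sign extension)."""
    imm &= 0xFF
    return imm | 0xFFFFFF00 if imm & 0x80 else imm


def zero_extend8(imm):
    """Extend an 8-bit immediate as TST, AND, OR, and XOR do (zero extension)."""
    return imm & 0xFF


r0 = 0x12345678
# AND with an immediate: the zero-extended operand clears the upper 24 bits.
and_result = r0 & zero_extend8(0xFF)
# ADD with an immediate: H'FF is sign-extended to H'FFFFFFFF, that is, -1.
add_result = (r0 + sign_extend8(0xFF)) & 0xFFFFFFFF
print(hex(and_result), hex(add_result))
```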
Section 4 Floating-Point Unit (FPU)
4.1
Overview
The SH-2E has an on-chip floating-point unit (FPU). The FPU’s register configuration is shown in
figure 4.1.
Floating-point registers (32 bits each, bits 31 to 0): FR0 to FR15
FR0 functions as the index register for the FMAC instruction.
Floating-point system registers (32 bits each, bits 31 to 0):
FPUL: Floating-point communication register
Serves as a buffer for communication between the CPU and the FPU*.
FPSCR: Floating-point status/control register
Indicates status/control information relating to FPU exceptions*.
Note: * For details, see section 4.2, Floating-Point Registers and Floating-Point System Registers.
Figure 4.1 Overview of Register Configuration
(Floating-Point Registers and Floating-Point System Registers)
4.2
Floating-Point Registers and Floating-Point System Registers
4.2.1
Floating-Point Register File
The SH-2E has sixteen 32-bit single-precision floating-point registers. Register specifications are
always made as 4 bits. In assembly language, the floating-point registers are specified as FR0,
FR1, FR2, and so on. FR0 functions as the index register for the FMAC instruction.
4.2.2
Floating-Point Communication Register (FPUL)
Data is transferred between the FPU and the CPU via the FPUL communication register, which
resembles MACL and MACH in the integer unit. The SH-2E is provided with this
communication register because the integer and floating-point formats are different. The 32-bit FPUL
is a system register, and is accessed by the CPU by means of LDS and STS instructions.
4.2.3
Floating-Point Status/Control Register (FPSCR)
The SH-2E has a floating-point status/control register (FPSCR) that functions as a system register
accessed by means of LDS and STS instructions (figure 4.2). FPSCR can be written to by a user
program. This register is part of the process context, and must be saved when the context is
switched. It may also be necessary to save this register when a procedure call is made.
FPSCR is a 32-bit register that holds detailed information relating to the rounding
mode, asymptotic underflow (denormalized numbers), and FPU exceptions. The module stop bit
that disables the FPU itself is provided in the module standby control register (MSTCR). For
details, refer to the hardware manual. After a reset start, the FPU is enabled.
Table 4.1 shows the flags corresponding to the five kinds of FPU exception. A sixth flag is also
provided as an FPU error flag that indicates a floating-point unit error state not covered by the
other five flags.
Table 4.1 Floating-Point Exception Flags
Flag  Meaning                          Support in SH-2E
E     FPU error                        —
V     Invalid operation                Yes
Z     Division by zero                 Yes
O     Overflow (value not expressed)   —
U     Underflow (value not expressed)  —
I     Inexact (result not expressed)   —
The bits in the cause field indicate the exception cause for the instruction executing at the time.
The cause bits are modified by a floating-point instruction. These bits are set to 1 or cleared to 0
according to whether or not an exception state occurred during execution of a single instruction.
The bits in the enable field specify the kinds of exception to be enabled, allowing the flow to be
changed to exception processing. If the cause bit corresponding to an enable bit is set by the
currently executing instruction, an exception occurs.
The bits in the flag field are used to keep a tally of all exceptions that occur during a series of
instructions. Once one of these bits is set by an instruction, it is not reset by a subsequent
instruction. The bits in this field can only be reset by the explicit execution of a store operation on
FPSCR.
Bits 31–19: Reserved
Bit 18: DN
Bits 17–12: Cause field (CE, CV, CZ, CO, CU, CI)
Bits 11–7: Enable field (EV, EZ, EO, EU, EI)
Bits 6–2: Flag field (FV, FZ, FO, FU, FI)
Bits 1–0: RM
DN:
Denormalized bit
In the SH-2E this bit is always set to 1, and a denormalized source or destination
operand is treated as 0. This bit cannot be modified even by an LDS
instruction.
CV:
Invalid operation cause bit
When 1: Indicates that an invalid operation exception occurred during execution
of the current instruction.
When 0: Indicates that an invalid operation exception has not occurred.
CZ:
Division-by-zero cause bit
When 1: Indicates that a division-by-zero exception occurred during execution
of the current instruction.
When 0: Indicates that a division-by-zero exception has not occurred.
EV:
Invalid operation exception enable
When 1: Enables invalid operation exception generation.
When 0: An invalid operation exception is not generated, and a qNaN is returned
as the result.
EZ:
Division-by-zero exception enable
When 1: Enables exception generation due to division-by-zero during execution
of the current instruction.
When 0: A division-by-zero exception is not generated, and infinity with the sign
(+ or –) of the current expression is returned as the result.
FV:
Invalid operation exception flag bit
When 1: Indicates that an invalid operation exception occurred during instruction
execution.
When 0: Indicates that an invalid operation exception has not occurred.
FZ:
Division-by-zero exception flag bit
When 1: Indicates that a division-by-zero exception occurred during instruction
execution.
When 0: Indicates that a division-by-zero exception has not occurred.
RM: Rounding bits. In the SH-2E, the value of these bits is always 01, meaning that
rounding to zero (RZ mode) is being used. These bits cannot be modified even by
an LDS instruction.
In the SH-2E, the cause field EOUI bits (CE, CO, CU, and CI), enable field OUI bits (EO,
EU, and EI), and flag field OUI bits (FO, FU, and FI), and the reserved area, are preset
to 0, and cannot be modified even by using an LDS instruction.
Figure 4.2 Floating-Point Status/Control Register
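To make the relationship between the cause, enable, and flag fields concrete, the following Python sketch decodes an FPSCR value. The bit positions are taken from the bit scale in figure 4.2 (DN = bit 18 down to RM = bits 1–0); the names FPSCR_BITS, decode_fpscr, and raises_fpu_exception are hypothetical helpers for this example only.

```python
# Single-bit fields of FPSCR, with bit positions as read from figure 4.2.
FPSCR_BITS = {
    "DN": 18,
    "CE": 17, "CV": 16, "CZ": 15, "CO": 14, "CU": 13, "CI": 12,  # cause field
    "EV": 11, "EZ": 10, "EO": 9, "EU": 8, "EI": 7,               # enable field
    "FV": 6, "FZ": 5, "FO": 4, "FU": 3, "FI": 2,                 # flag field
}


def decode_fpscr(value):
    """Return a dict of FPSCR fields for a 32-bit register value."""
    fields = {name: (value >> bit) & 1 for name, bit in FPSCR_BITS.items()}
    fields["RM"] = value & 0b11  # rounding mode, bits 1-0
    return fields


def raises_fpu_exception(value):
    """An exception occurs when a cause bit and its matching enable bit are both set."""
    f = decode_fpscr(value)
    return bool((f["CV"] and f["EV"]) or (f["CZ"] and f["EZ"]))


reset_value = 0x00040001  # initial value of FPSCR from table 2.1
print(decode_fpscr(reset_value))
```

Decoding the reset value gives DN = 1 and RM = 01 with all other fields 0, which agrees with the field descriptions in the figure.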
4.3
Floating-Point Format
4.3.1
Floating-Point Format
The SH-2E supports single-precision floating-point operations, and fully complies with the
IEEE754 floating-point standard.
A floating-point number consists of the following three fields:
• Sign (s)
• Exponent (e)
• Fraction (f)
The exponent is expressed in biased form, as follows:
e = E + bias
The range of unbiased exponent E is Emin – 1 to Emax + 1. The two values Emin – 1 and Emax + 1 are
distinguished as follows. Emin – 1 indicates zero (both positive and negative sign) and a
denormalized number, and Emax + 1 indicates positive or negative infinity or a non-number (NaN).
In a single-precision operation, the bias value is 127, Emin is –126, and Emax is 127.
Bit 31: s (sign)
Bits 30–23: e (biased exponent)
Bits 22–0: f (fraction)
Figure 4.3 Floating-Point Number Format
Floating-point number value v is determined as follows:
If E = Emax + 1 and f != 0, v is a non-number (NaN) irrespective of sign s
If E = Emax + 1 and f = 0, v = (–1)^s × (infinity) [positive or negative infinity]
If Emin <= E <= Emax, v = (–1)^s × 2^E × (1.f) [normalized number]
If E = Emin – 1 and f != 0, v = (–1)^s × 2^Emin × (0.f) [denormalized number]
If E = Emin – 1 and f = 0, v = (–1)^s × 0 [positive or negative zero]
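The five cases above can be followed step by step in code. The Python sketch below decodes a 32-bit pattern into its value using the s, e, and f fields; decode_single is a hypothetical name, and the comparison against Python's struct module is included only as a cross-check of the arithmetic.

```python
import math
import struct

BIAS, EMIN, EMAX = 127, -126, 127


def decode_single(bits):
    """Return the value of a single-precision bit pattern, case by case."""
    s = (bits >> 31) & 1
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    E = e - BIAS                      # unbiased exponent, from e = E + bias
    sign = -1.0 if s else 1.0
    if E == EMAX + 1:
        return math.nan if f != 0 else sign * math.inf
    if E == EMIN - 1:
        # f = 0 gives a signed zero; f != 0 gives a denormalized number (0.f)
        return sign * 2.0 ** EMIN * (f / 2 ** 23)
    return sign * 2.0 ** E * (1 + f / 2 ** 23)  # normalized number (1.f)


def reference(bits):
    """Cross-check using the host's IEEE754 single-precision interpretation."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


for pattern in (0x3FC00000, 0xC0490FDB, 0x00000001, 0x7F800000):
    print(hex(pattern), decode_single(pattern), reference(pattern))
```

Note that this decodes the format only. As described in section 4.3.3, the SH-2E FPU flushes denormalized numbers to 0 in operations that generate a value.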
4.3.2
Non-Numbers (NaN)
With non-number (NaN) representation in a single-precision operation value, at least one of bits
22 to 0 is set. If bit 22 is set, this indicates a signaling NaN (sNaN). If bit 22 is reset, the value is a
quiet NaN (qNaN).
The bit pattern of a non-number (NaN) is shown in the figure below. Bit N in the figure is set for a
signaling NaN and reset for a quiet NaN. x indicates a don’t care bit (with the proviso that at least
one of bits 22 to 0 is set). In a non-number (NaN), the sign bit is a don’t care bit.
Bit 31: x (sign, don’t care)
Bits 30–23: 11111111
Bit 22: N
Bits 21–0: x (don’t care, with at least one of bits 22 to 0 set)
N = 1: sNaN
N = 0: qNaN
Figure 4.4 NaN Bit Pattern
If a signaling NaN (sNaN) is input in an operation that generates a floating-point value:
• When the EV bit in the FPSCR register is reset, the operation result (output) is a quiet NaN
(qNaN).
• When the EV bit in the FPSCR register is set, an invalid operation exception will be generated.
In this case, the contents of the operation destination register do not change.
If a quiet NaN is input in an operation that generates a floating-point value, and a signaling NaN
has not been input in that operation, the output will always be a quiet NaN irrespective of the
setting of the EV bit in the FPSCR register. An exception will not be generated in this case.
Refer to section 7, Instruction Descriptions for details of floating-point operations when a
non-number (NaN) is input.
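The NaN rules in this section can be summarized in a short model. The Python sketch below classifies a bit pattern using the convention stated above (bit 22 set means sNaN) and reports the documented outcome for a value-generating operation. The function names and the outcome strings are hypothetical, and the sketch does not model which qNaN bit pattern the hardware returns.

```python
def classify_nan(bits):
    """Classify a 32-bit pattern as 'sNaN', 'qNaN', or 'not a NaN' (section 4.3.2)."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent != 0xFF or fraction == 0:
        return "not a NaN"
    return "sNaN" if bits & (1 << 22) else "qNaN"


def nan_outcome(operands, ev):
    """Outcome of a value-generating operation given its inputs and the EV bit."""
    kinds = [classify_nan(bits) for bits in operands]
    if "sNaN" in kinds:
        # With EV set, the destination register is left unchanged.
        return "invalid operation exception" if ev else "qNaN result"
    if "qNaN" in kinds:
        return "qNaN result"  # never raises, irrespective of EV
    return "no NaN input"


print(nan_outcome([0x7FC00000, 0x3F800000], ev=1))
```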
4.3.3
Denormalized Number Values
For a denormalized number floating-point value, the biased exponent is expressed as 0, the
fraction as a non-zero value, and the hidden bit as 0. In the SH-2E’s floating-point unit, a
denormalized number (operand source or operation result) is always flushed to 0 in a
floating-point operation that generates a value (an operation other than copy).
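A minimal Python sketch of the flush-to-zero behavior follows. The helper names are hypothetical. One assumption is made that the text above does not state: the flushed zero is given the sign of the denormalized input.

```python
SIGN_MASK = 0x80000000


def is_denormalized(bits):
    """Biased exponent 0 with a non-zero fraction marks a denormalized number."""
    return ((bits >> 23) & 0xFF) == 0 and (bits & 0x7FFFFF) != 0


def flush_to_zero(bits):
    """Replace a denormalized pattern with zero; other patterns pass through.

    Assumption: the sign bit of the input is kept on the resulting zero.
    """
    return bits & SIGN_MASK if is_denormalized(bits) else bits


print(hex(flush_to_zero(0x00000001)), hex(flush_to_zero(0x3F800000)))
```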
4.3.4
Other Special Values
Floating-point value representations include the seven different kinds of special values shown in
table 4.2.
Table 4.2 Representation of Special Values in Single-Precision Floating-Point Operations
Specified by IEEE754 Standard
Value                  Representation
+0.0                   0x00000000
–0.0                   0x80000000
Denormalized number    As described in 4.3.3, Denormalized Number Values
+INF                   0x7F800000
–INF                   0xFF800000
qNaN (quiet NaN)       As described in 4.3.2, Non-Numbers (NaN)
sNaN (signaling NaN)   As described in 4.3.2, Non-Numbers (NaN)
4.4
Floating-Point Exception Model
4.4.1
Enable State Exceptions
Invalid operation and division-by-zero exceptions are both placed in the enable state by setting the
enable bit. All exceptions generated by the FPU are mapped as the same exception event. The
meaning of a particular exception is determined by software by reading system register FPSCR
and analyzing the information held there.
4.4.2
Disable State Exceptions
If the EV enable bit is not set, a qNaN will be generated as the result of an invalid operation
(except for FCMP and FTRC). If the EZ enable bit is not set, division-by-zero will return infinity
with the sign (+ or –) of the current expression. Overflow will generate a finite number which is
the largest value that can be expressed by an absolute value in the format, with the correct sign.
Underflow will generate zero with the correct sign. If the operation result is inexact, the
destination register will store that inexact result.
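The default results described above for division by zero, overflow, and underflow can be written as bit patterns. This Python sketch uses a hypothetical function name and event labels; the infinity pattern matches table 4.2, and 0x7F7FFFFF is the largest finite single-precision magnitude.

```python
# Magnitude returned for each event when no exception is generated.
DEFAULT_MAGNITUDE = {
    "division_by_zero": 0x7F800000,  # infinity
    "overflow": 0x7F7FFFFF,          # largest finite value in the format
    "underflow": 0x00000000,         # zero
}


def disabled_state_result(event, negative):
    """Return the 32-bit result pattern, with the sign of the current expression."""
    sign = 0x80000000 if negative else 0
    return sign | DEFAULT_MAGNITUDE[event]


print(hex(disabled_state_result("division_by_zero", negative=True)))
```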
4.4.3
FPU Exception Event and Code
All FPU exceptions are handled as the same general exception event, that is, an FPU exception,
with a vector table address offset of H'00000034.
4.4.4
Floating-Point Data Arrangement in Memory
Single-precision floating-point data is located in memory at a 4-byte boundary; that is, it is
arranged in the same form as an SH-2E long integer.
4.4.5
Arithmetic Operations Involving Special Operands
All arithmetic operations involving special operands (qNaN, sNaN, +INF, –INF, +0, –0) comply
with the specifications of the IEEE754 standard. Refer to section 7, Instruction Descriptions for
details.
4.5
Synchronization with CPU
Synchronization with CPU: Floating-point instructions and CPU instructions are executed in
turn, according to their order in the program, but in some cases operations may not be completed
in the program order due to a difference in execution cycles. When a floating-point instruction
accesses only FPU resources, there is no need for synchronization with the CPU, and a CPU
instruction following an FPU instruction can finish its operation before completion of the FPU
operation. Consequently, in an optimized program, it is possible to effectively conceal the
execution cycle of a floating-point instruction that requires a long execution cycle, such as a divide
instruction. On the other hand, a floating-point instruction that accesses CPU resources, such as a
compare instruction, must be synchronized to ensure that the program order is observed.
Floating-Point Instructions That Require Synchronization: Load, store, and compare
instructions, and instructions that access the FPUL or FPSCR register, must be synchronized
because they access CPU resources. Load and store instructions access a general register. Post-increment load and pre-decrement store instructions change the contents of a general register. A
compare instruction modifies the T bit. An FPUL or FPSCR access instruction references or
changes the contents of the FPUL or FPSCR register. These references and changes must all be
synchronized with the CPU.
Section 5 Instruction Features
5.1
RISC-Type Instruction Set
All instructions are RISC type. Their features are detailed in this section.
16-Bit Fixed Length: All instructions are 16 bits long, increasing program coding efficiency.
One Instruction/Cycle: Basic instructions can be executed in one cycle using the pipeline system.
At 40 MHz, one cycle is 25 ns.
Data Length: Longword is the standard data length for all operations. Memory can be accessed in
bytes, words, or longwords. Byte or word data accessed from memory is sign-extended and
calculated with longword data. Immediate data is sign-extended for arithmetic operations or
zero-extended for logic operations. It also is calculated with longword data.
Table 5.1 Sign Extension of Word Data
SH-2E CPU                Description                             Example for Other CPU
MOV.W   @(disp,PC),R1    Data is sign-extended to 32 bits,       ADD.W   #H'1234,R0
ADD     R1,R0            and R1 becomes H'00001234. It is
.........                next operated upon by an ADD
.DATA.W H'1234           instruction.
Note: The address of the immediate data is accessed by @(disp, PC).
Load-Store Architecture: Basic operations are executed between registers. For operations that
involve memory access, data is loaded to the registers and executed (load-store architecture).
Instructions such as AND that manipulate bits, however, are executed directly in memory.
Delayed Branch Instructions: Unconditional branch instructions are delayed. Pipeline disruption
during branching is reduced by first executing the instruction that follows the branch instruction,
and then branching (table 5.2). With delayed branching, branching occurs after execution of the
slot instruction. However, instructions such as register changes etc. are executed in the order of
delayed branch instruction, then delay slot instruction. For example, even if the register in which
the branch destination address has been loaded is changed by the delay slot instruction, the branch
will still be made using the value of the register prior to the change as the branch destination
address.
Table 5.2 Delayed Branch Instructions
SH-2E CPU         Description                  Example for Other CPU
BRA     TRGET     Executes an ADD before       ADD.W   R1,R0
ADD     R1,R0     branching to TRGET.          BRA     TRGET
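The ordering rule for delayed branches (the branch destination is taken from the register value before the delay slot instruction runs) can be illustrated with a toy model. The Python sketch below is not a CPU simulator; jump_with_delay_slot and overwrite_r1 are hypothetical names, and registers are modeled as a plain dictionary.

```python
def jump_with_delay_slot(regs, target_reg, delay_slot):
    """Model a register-indirect delayed branch followed by one delay slot instruction."""
    target = regs[target_reg]  # 1. the branch instruction reads its destination
    delay_slot(regs)           # 2. the delay slot instruction executes
    return target              # 3. the branch is taken to the value read in step 1


def overwrite_r1(regs):
    """Delay slot instruction that changes the register holding the destination."""
    regs["R1"] = 0


regs = {"R0": 5, "R1": 0x2000}
destination = jump_with_delay_slot(regs, "R1", overwrite_r1)
print(hex(destination), regs)
```

Even though the delay slot instruction overwrites R1, the branch still goes to the address R1 held beforehand.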
Multiplication/Accumulation Operation: 16-bit × 16-bit → 32-bit multiplication operations are
executed in one to two cycles. 16-bit × 16-bit + 64-bit → 64-bit multiplication/accumulation
operations are executed in two to three cycles. 32-bit × 32-bit → 64-bit multiplication and 32-bit ×
32-bit + 64-bit → 64-bit multiplication/accumulation operations are executed in two to four cycles.
T Bit: The T bit in the status register changes according to the result of the comparison, and in
turn is the condition (true/false) that determines if the program will branch. The number of
instructions that change the T bit is kept to a minimum to improve the processing speed.
Table 5.3 T Bit
SH-2E CPU          Description                            Example for Other CPU
CMP/GE  R1,R0      T bit is set when R0 ≥ R1. The         CMP.W   R1,R0
BT      TRGET0     program branches to TRGET0 when        BGE     TRGET0
BF      TRGET1     R0 ≥ R1 and to TRGET1 when R0 < R1.    BLT     TRGET1

ADD     #-1,R0     T bit is not changed by ADD. T bit     SUB.W   #1,R0
CMP/EQ  #0,R0      is set when R0 = 0. The program        BEQ     TRGET
BT      TRGET      branches if R0 = 0.
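The compare-then-branch pattern in table 5.3 can be modeled in a few lines. In this Python sketch, cmp_ge and branch_target are hypothetical names; registers are modeled as unsigned 32-bit integers, and CMP/GE is treated as a signed comparison, which is an assumption not spelled out in this section.

```python
def to_signed32(value):
    """Interpret a 32-bit register value as a signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def cmp_ge(rm, rn):
    """Model CMP/GE Rm,Rn: return the new T bit, 1 when Rn >= Rm (signed)."""
    return 1 if to_signed32(rn) >= to_signed32(rm) else 0


def branch_target(t):
    """BT TRGET0 branches when T = 1; otherwise BF TRGET1 branches."""
    return "TRGET0" if t == 1 else "TRGET1"


r0, r1 = 0x00000005, 0xFFFFFFFF  # R1 holds -1 as a signed value
t = cmp_ge(r1, r0)               # CMP/GE R1,R0
print(t, branch_target(t))
```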
Immediate Data: Byte immediate data is located in instruction code. Word or longword
immediate data is not input via instruction codes but is stored in a memory table. The memory
table is accessed by an immediate data transfer instruction (MOV) using the PC relative
addressing mode with displacement.
Table 5.4 Immediate Data Accessing
Classification     SH-2E CPU                Example for Other CPU
8-bit immediate    MOV     #H'12,R0         MOV.B   #H'12,R0
16-bit immediate   MOV.W   @(disp,PC),R0    MOV.W   #H'1234,R0
                   .................
                   .DATA.W H'1234
32-bit immediate   MOV.L   @(disp,PC),R0    MOV.L   #H'12345678,R0
                   .................
                   .DATA.L H'12345678
Note: The address of the immediate data is accessed by @(disp, PC).
Absolute Address: When data is accessed by absolute address, the value of the absolute
address is placed in the memory table in advance. When the instruction is executed, that value
is loaded into a register as immediate data, and the data is then accessed in the indirect register
addressing mode.
Table 5.5 Absolute Address
Classification     SH-2E CPU                Example for Other CPU
Absolute address   MOV.L   @(disp,PC),R1    MOV.B   @H'12345678,R0
                   MOV.B   @R1,R0
                   ..................
                   .DATA.L H'12345678
16-Bit/32-Bit Displacement: When data is accessed by 16-bit or 32-bit displacement, the
displacement value is placed in the memory table in advance. When the instruction is executed,
that value is loaded into a register as immediate data, and the data is then accessed in the indirect
indexed register addressing mode.
Table 5.6 Displacement Accessing
Classification        SH-2E CPU                Example for Other CPU
16-bit displacement   MOV.W   @(disp,PC),R0    MOV.W   @(H'1234,R1),R2
                      MOV.W   @(R0,R1),R2
                      ..................
                      .DATA.W H'1234
5.2
Addressing Modes
Addressing modes and effective address calculation by the CPU core are described below.
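Several of the effective address formulas in table 5.7 can be tried out with a short Python sketch. The function names are hypothetical, and the pc argument stands for whatever PC value the formulas refer to; this section does not define it further, so no assumption is made here about its relation to the instruction address.

```python
MASK32 = 0xFFFFFFFF
SCALE = {"B": 1, "W": 2, "L": 4}  # byte, word, longword


def ea_reg_disp(rn, disp, size):
    """@(disp:4, Rn): Rn plus a zero-extended 4-bit displacement, scaled by size."""
    return (rn + (disp & 0xF) * SCALE[size]) & MASK32


def ea_gbr_disp(gbr, disp, size):
    """@(disp:8, GBR): GBR plus a zero-extended 8-bit displacement, scaled by size."""
    return (gbr + (disp & 0xFF) * SCALE[size]) & MASK32


def ea_pc_disp(pc, disp, size):
    """@(disp:8, PC): word or longword only; longword masks the lowest two PC bits."""
    if size == "W":
        return (pc + (disp & 0xFF) * 2) & MASK32
    if size == "L":
        return ((pc & 0xFFFFFFFC) + (disp & 0xFF) * 4) & MASK32
    raise ValueError("PC relative addressing with displacement is word or longword only")


def ea_branch(pc, disp, bits):
    """disp:8 or disp:12: sign-extend the displacement, double it, add it to PC."""
    disp &= (1 << bits) - 1
    if disp & (1 << (bits - 1)):
        disp -= 1 << bits
    return (pc + disp * 2) & MASK32


print(hex(ea_reg_disp(0x1000, 3, "L")), hex(ea_pc_disp(0x1002, 1, "L")))
```

The longword case of ea_pc_disp shows why the lowest two bits of the PC are masked: the resulting address is always a multiple of 4, as longword access requires.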
Table 5.7 Addressing Modes and Effective Addresses

Direct register addressing
  Instruction format: Rn
  Effective address calculation: The effective address is register Rn. (The operand is
  the contents of register Rn.)
  Formula: —

Indirect register addressing
  Instruction format: @Rn
  Effective address calculation: The effective address is the content of register Rn.
  Formula: Rn

Post-increment indirect register addressing
  Instruction format: @Rn+
  Effective address calculation: The effective address is the content of register Rn. A
  constant is added to the content of Rn after the instruction is executed. 1 is added for a
  byte operation, 2 for a word operation, or 4 for a longword operation.
  Formula: Rn
  (After the instruction is executed)
  Byte: Rn + 1 → Rn
  Word: Rn + 2 → Rn
  Longword: Rn + 4 → Rn

Pre-decrement indirect register addressing
  Instruction format: @-Rn
  Effective address calculation: The effective address is the value obtained by
  subtracting a constant from Rn. 1 is subtracted for a byte operation, 2 for a word
  operation, or 4 for a longword operation.
  Formula:
  Byte: Rn – 1 → Rn
  Word: Rn – 2 → Rn
  Longword: Rn – 4 → Rn
  (Instruction executed with Rn after calculation)

Indirect register addressing with displacement
  Instruction format: @(disp:4, Rn)
  Effective address calculation: The effective address is Rn plus a 4-bit displacement
  (disp). The value of disp is zero-extended, and remains the same for a byte operation, is
  doubled for a word operation, or is quadrupled for a longword operation.
  Formula:
  Byte: Rn + disp
  Word: Rn + disp × 2
  Longword: Rn + disp × 4

Indirect indexed register addressing
  Instruction format: @(R0, Rn)
  Effective address calculation: The effective address is the Rn value plus R0.
  Formula: Rn + R0

Indirect GBR addressing with displacement
  Instruction format: @(disp:8, GBR)
  Effective address calculation: The effective address is the GBR value plus an 8-bit
  displacement (disp). The value of disp is zero-extended, and remains the same for a byte
  operation, is doubled for a word operation, or is quadrupled for a longword operation.
  Formula:
  Byte: GBR + disp
  Word: GBR + disp × 2
  Longword: GBR + disp × 4

Indirect indexed GBR addressing
  Instruction format: @(R0, GBR)
  Effective address calculation: The effective address is the GBR value plus R0.
  Formula: GBR + R0

PC relative addressing with displacement
  Instruction format: @(disp:8, PC)
  Effective address calculation: The effective address is the PC value plus an 8-bit
  displacement (disp). The value of disp is zero-extended, and disp is doubled for a word
  operation, or is quadrupled for a longword operation. For a longword operation, the lowest
  two bits of the PC are masked.
  Formula:
  Word: PC + disp × 2
  Longword: PC & H'FFFFFFFC + disp × 4

PC relative addressing
  Instruction format: disp:8
  Effective address calculation: The effective address is the PC value plus an 8-bit
  displacement (disp) that has been sign-extended and doubled.
  Formula: PC + disp × 2

  Instruction format: disp:12
  Effective address calculation: The effective address is the PC value plus a 12-bit
  displacement (disp) that has been sign-extended and doubled.
  Formula: PC + disp × 2

  Instruction format: Rn
  Effective address calculation: The effective address is the register PC plus Rn.
  Formula: PC + Rn

Immediate addressing
  Instruction format: #imm:8
  Effective address calculation: The 8-bit immediate data (imm) for the TST, AND, OR, and
  XOR instructions is zero-extended.
  Formula: —

  Instruction format: #imm:8
  Effective address calculation: The 8-bit immediate data (imm) for the MOV, ADD, and
  CMP/EQ instructions is sign-extended.
  Formula: —

  Instruction format: #imm:8
  Effective address calculation: Immediate data (imm) for the TRAPA instruction is
  zero-extended and is quadrupled.
  Formula: —

5.3
Instruction Format
The instruction format table, table 5.8, refers to the source operand and the destination operand.
The meaning of the operand depends on the instruction code. The symbols are used as follows:
• xxxx: Instruction code
• mmmm: Source register
• nnnn: Destination register
• iiii: Immediate data
• dddd: Displacement
Table 5.8 Instruction Formats

0 format (bits 15 to 0): xxxx xxxx xxxx xxxx
  Source: —; Destination: —; Example: NOP

n format (bits 15 to 0): xxxx nnnn xxxx xxxx
  Source: —; Destination: nnnn: Direct register; Example: MOVT Rn
  Source: Control register or system register; Destination: nnnn: Direct register; Example: STS MACH,Rn
  Source: Control register or system register; Destination: nnnn: Indirect pre-decrement register; Example: STC.L SR,@-Rn

m format (bits 15 to 0): xxxx mmmm xxxx xxxx
  Source: mmmm: Direct register; Destination: Control register or system register; Example: LDC Rm,SR
  Source: mmmm: Indirect post-increment register; Destination: Control register or system register; Example: LDC.L @Rm+,SR
  Source: mmmm: Direct register; Destination: —; Example: JMP @Rm
  Source: mmmm: PC relative using Rm; Destination: —; Example: BRAF Rm

nm format (bits 15 to 0): xxxx nnnn mmmm xxxx
  Source: mmmm: Direct register; Destination: nnnn: Direct register; Example: ADD Rm,Rn
  Source: mmmm: Direct register; Destination: nnnn: Indirect register; Example: MOV.L Rm,@Rn
  Source: mmmm: Indirect post-increment register (multiply/accumulate), nnnn*: Indirect post-increment register (multiply/accumulate); Destination: MACH, MACL; Example: MAC.W @Rm+,@Rn+
  Source: mmmm: Indirect post-increment register; Destination: nnnn: Direct register; Example: MOV.L @Rm+,Rn
  Source: mmmm: Direct register; Destination: nnnn: Indirect pre-decrement register; Example: MOV.L Rm,@-Rn
  Source: mmmm: Direct register; Destination: nnnn: Indirect indexed register; Example: MOV.L Rm,@(R0,Rn)

md format (bits 15 to 0): xxxx xxxx mmmm dddd
  Source: mmmmdddd: Indirect register with displacement; Destination: R0 (Direct register); Example: MOV.B @(disp,Rm),R0

nd4 format (bits 15 to 0): xxxx xxxx nnnn dddd
  Source: R0 (Direct register); Destination: nnnndddd: Indirect register with displacement; Example: MOV.B R0,@(disp,Rn)

nmd format (bits 15 to 0): xxxx nnnn mmmm dddd
  Source: mmmm: Direct register; Destination: nnnndddd: Indirect register with displacement; Example: MOV.L Rm,@(disp,Rn)
  Source: mmmmdddd: Indirect register with displacement; Destination: nnnn: Direct register; Example: MOV.L @(disp,Rm),Rn

d format (bits 15 to 0): xxxx xxxx dddd dddd
  Source: dddddddd: Indirect GBR with displacement; Destination: R0 (Direct register); Example: MOV.L @(disp,GBR),R0
  Source: R0 (Direct register); Destination: dddddddd: Indirect GBR with displacement; Example: MOV.L R0,@(disp,GBR)
  Source: dddddddd: PC relative with displacement; Destination: R0 (Direct register); Example: MOVA @(disp,PC),R0
  Source: dddddddd: PC relative; Destination: —; Example: BF label

d12 format (bits 15 to 0): xxxx dddd dddd dddd
  Source: dddddddddddd: PC relative; Destination: —; Example: BRA label (label = disp + PC)

nd8 format (bits 15 to 0): xxxx nnnn dddd dddd
  Source: dddddddd: PC relative with displacement; Destination: nnnn: Direct register; Example: MOV.L @(disp,PC),Rn

i format (bits 15 to 0): xxxx xxxx iiii iiii
  Source: iiiiiiii: Immediate; Destination: Indirect indexed GBR; Example: AND.B #imm,@(R0,GBR)
  Source: iiiiiiii: Immediate; Destination: R0 (Direct register); Example: AND #imm,R0
  Source: iiiiiiii: Immediate; Destination: —; Example: TRAPA #imm

ni format (bits 15 to 0): xxxx nnnn iiii iiii
  Source: iiiiiiii: Immediate; Destination: nnnn: Direct register; Example: ADD #imm,Rn

Note: * In multiply/accumulate instructions, nnnn is the source register.
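The field layouts in table 5.8 amount to fixed 4-bit slices of a 16-bit instruction word. As a rough illustration, the Python sketch below extracts the fields for the nm, nd8, and d12 formats. The function names are hypothetical, and the sample words are arbitrary bit patterns chosen for the example, not a statement about any particular opcode.

```python
def split_nm(word):
    """nm format: xxxx nnnn mmmm xxxx."""
    return {
        "code_hi": (word >> 12) & 0xF,  # upper instruction code bits
        "n": (word >> 8) & 0xF,         # nnnn: destination register number
        "m": (word >> 4) & 0xF,         # mmmm: source register number
        "code_lo": word & 0xF,          # lower instruction code bits
    }


def split_nd8(word):
    """nd8 format: xxxx nnnn dddd dddd."""
    return {"code": (word >> 12) & 0xF, "n": (word >> 8) & 0xF, "disp": word & 0xFF}


def split_d12(word):
    """d12 format: xxxx dddd dddd dddd."""
    return {"code": (word >> 12) & 0xF, "disp": word & 0xFFF}


print(split_nm(0x312C), split_nd8(0xD105), split_d12(0xA7FF))
```

Because register specifications are always 4 bits, the n and m values map directly to R0 through R15 (0000 to 1111).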
Section 6 Instruction Set
6.1
Instruction Set by Classification
Table 6.1 lists the instructions by classification.
Table 6.1 Classification of Instructions

Data transfer (types: 5, no. of instructions: 39)
  MOV       Data transfer, immediate data transfer, peripheral module data transfer, structure data transfer
  MOVA      Effective address transfer
  MOVT      T bit transfer
  SWAP      Swap of upper and lower bytes
  XTRCT     Extraction of the middle of registers connected

Arithmetic operations (types: 21, no. of instructions: 33)
  ADD       Binary addition
  ADDC      Binary addition with carry
  ADDV      Binary addition with overflow check
  CMP/cond  Comparison
  DIV1      Division
  DIV0S     Initialization of signed division
  DIV0U     Initialization of unsigned division
  DMULS     Signed double-length multiplication
  DMULU     Unsigned double-length multiplication
  DT        Decrement and test
  EXTS      Sign extension
  EXTU      Zero extension
  MAC       Multiply-and-accumulate, double-length multiply-and-accumulate operation
  MUL       Double-length multiply operation
  MULS      Signed multiplication
  MULU      Unsigned multiplication
  NEG       Negation
  NEGC      Negation with borrow
  SUB       Binary subtraction
  SUBC      Binary subtraction with borrow
  SUBV      Binary subtraction with underflow

Logic operations (types: 6, no. of instructions: 14)
  AND       Logical AND
  NOT       Bit inversion
  OR        Logical OR
  TAS       Memory test and bit set
  TST       Logical AND and T bit set
  XOR       Exclusive OR

Shift (types: 10, no. of instructions: 14)
  ROTL      One-bit left rotation
  ROTR      One-bit right rotation
  ROTCL     One-bit left rotation with T bit
  ROTCR     One-bit right rotation with T bit
  SHAL      One-bit arithmetic left shift
  SHAR      One-bit arithmetic right shift
  SHLL      One-bit logical left shift
  SHLLn     n-bit logical left shift
  SHLR      One-bit logical right shift
  SHLRn     n-bit logical right shift

Branch (types: 9, no. of instructions: 11)
  BF        Conditional branch, conditional branch with delay (Branch when T = 0)
  BT        Conditional branch, conditional branch with delay (Branch when T = 1)
  BRA       Unconditional branch
  BRAF      Unconditional branch
  BSR       Branch to subroutine procedure
  BSRF      Branch to subroutine procedure
  JMP       Unconditional branch
  JSR       Branch to subroutine procedure
  RTS       Return from subroutine procedure

System control (types: 11, no. of instructions: 31)
  CLRT      T bit clear
  CLRMAC    MAC register clear
  LDC       Load to control register
  LDS       Load to system register
  NOP       No operation
  RTE       Return from exception processing
  SETT      T bit set
  SLEEP     Transition to power-down mode
  STC       Store control register data
  STS       Store system register data
  TRAPA     Trap exception handling

Floating-point instructions (types: 15, no. of instructions: 22)
  FABS      Floating-point absolute value
  FADD      Floating-point addition
  FCMP      Floating-point comparison
  FDIV      Floating-point division
  FLDI0     Floating-point load immediate 0
  FLDI1     Floating-point load immediate 1
  FLDS      Floating-point load into system register FPUL
  FLOAT     Integer-to-floating-point conversion
  FMAC      Floating-point multiply-and-accumulate operation
  FMOV      Floating-point data transfer
  FMUL      Floating-point multiplication
  FNEG      Floating-point sign inversion
  FSTS      Floating-point store from system register FPUL
  FSUB      Floating-point subtraction
  FTRC      Floating-point conversion with rounding to integer

FPU-related CPU instructions (types: 2, no. of instructions: 8)
  LDS       Load into floating-point system register
  STS       Store from floating-point system register

Total: 79 types, 172 instructions
Table 6.2 shows the format used in tables 6.3 to 6.8, which list instruction codes, operation, and
execution states in order by classification.
Table 6.2
Instruction Code Format
Item
Format
Explanation
Instruction
OP.Sz SRC,DEST
OP: Operation code
Sz: Size (B: byte, W: word, or L: longword)
SRC: Source
DEST: Destination
Rm: Source register
Rn: Destination register
imm: Immediate data
disp: Displacement* 1
Instruction code MSB ↔ LSB
Operation
mmmm: Source register
nnnn: Destination register
0000: R0
0001: R1
⋅
⋅
⋅
1111: R15
iiii: Immediate data
dddd: Displacement
→, ←
Direction of transfer
(xx)
Memory operand
M/Q/T
Flag bits in the SR
&
Logical AND of each bit
|
Logical OR of each bit
^
Exclusive OR of each bit
~
Logical NOT of each bit
<<n
n-bit left shift
>>n
n-bit right shift
Execution
cycles
—
Value when no wait states are inserted*2
T bit
—
Value of T bit after instruction is executed. An em-dash (—)
in the column means no change.
Notes: 1. Depending on the operand size, displacement is scaled ×1, ×2, or ×4. For details, see
section 7, Instruction Descriptions.
2. Instruction execution cycles: The execution cycles shown in the table are minimums.
The actual number of cycles may be increased when (1) contention occurs between
instruction fetches and data access, or (2) when the destination register of the load
instruction (memory → register) and the register used by the next instruction are the
same.
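The field layout in table 6.2 can be illustrated with a short C sketch. It is only an illustration: the helper names are hypothetical, and the bit positions assume the common layout in which nnnn occupies bits 11 to 8 and mmmm occupies bits 7 to 4 (as in 0011nnnnmmmm1100). Formats that place the register field elsewhere, such as 10000000nnnndddd, need different shifts.

```c
#include <stdint.h>

/* Hypothetical helpers: extract operand fields from a 16-bit instruction code. */
static unsigned sh_field_n(uint16_t code)    { return (code >> 8) & 0xFu; } /* nnnn     */
static unsigned sh_field_m(uint16_t code)    { return (code >> 4) & 0xFu; } /* mmmm     */
static unsigned sh_field_imm8(uint16_t code) { return code & 0xFFu; }       /* iiiiiiii */

/* Returns 1 when the code matches ADD Rm,Rn (0011nnnnmmmm1100). */
static int sh_is_add_reg(uint16_t code)
{
    return (code & 0xF00Fu) == 0x300Cu;
}
```

For example, the code H'310C matches ADD Rm,Rn with nnnn = 1 and mmmm = 0, that is, ADD R0,R1.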
Table 6.3
Data Transfer Instructions
Execution
Cycles
T
Bit
Instruction
Instruction Code
Operation
MOV
#imm,Rn
1110nnnniiiiiiii
imm → Sign extension →
Rn
1
—
MOV.W @(disp,PC),Rn
1001nnnndddddddd
(disp × 2 + PC) → Sign
extension → Rn
1
—
MOV.L @(disp,PC),Rn
1101nnnndddddddd
(disp × 4 + PC) → Rn
1
—
MOV
Rm,Rn
0110nnnnmmmm0011
Rm → Rn
1
—
MOV.B Rm,@Rn
0010nnnnmmmm0000
Rm → (Rn)
1
—
MOV.W Rm,@Rn
0010nnnnmmmm0001
Rm → (Rn)
1
—
MOV.L Rm,@Rn
0010nnnnmmmm0010
Rm → (Rn)
1
—
MOV.B @Rm,Rn
0110nnnnmmmm0000
(Rm) → Sign extension →
Rn
1
—
MOV.W @Rm,Rn
0110nnnnmmmm0001
(Rm) → Sign extension →
Rn
1
—
MOV.L @Rm,Rn
0110nnnnmmmm0010
(Rm) → Rn
1
—
MOV.B Rm,@–Rn
0010nnnnmmmm0100
Rn–1 → Rn, Rm → (Rn)
1
—
MOV.W Rm,@–Rn
0010nnnnmmmm0101
Rn–2 → Rn, Rm → (Rn)
1
—
MOV.L Rm,@–Rn
0010nnnnmmmm0110
Rn–4 → Rn, Rm → (Rn)
1
—
MOV.B @Rm+,Rn
0110nnnnmmmm0100
(Rm) → Sign extension →
Rn,Rm + 1 → Rm
1
—
MOV.W @Rm+,Rn
0110nnnnmmmm0101
(Rm) → Sign extension →
Rn,Rm + 2 → Rm
1
—
MOV.L @Rm+,Rn
0110nnnnmmmm0110
(Rm) → Rn,Rm + 4 → Rm
1
—
MOV.B R0,@(disp,Rn)
10000000nnnndddd
R0 → (disp + Rn)
1
—
MOV.W R0,@(disp,Rn)
10000001nnnndddd
R0 → (disp × 2 + Rn)
1
—
MOV.L Rm,@(disp,Rn)
0001nnnnmmmmdddd
Rm → (disp × 4 + Rn)
1
—
MOV.B @(disp,Rm),R0
10000100mmmmdddd
(disp + Rm) → Sign
extension → R0
1
—
MOV.W @(disp,Rm),R0
10000101mmmmdddd
(disp × 2 + Rm) → Sign
extension → R0
1
—
MOV.L @(disp,Rm),Rn
0101nnnnmmmmdddd
(disp × 4 + Rm) → Rn
1
—
MOV.B Rm,@(R0,Rn)
0000nnnnmmmm0100
Rm → (R0 + Rn)
1
—
Rm,Rn
Table 6.3
Data Transfer Instructions (cont)
Instruction
Instruction Code
Operation
Execution
Cycles
MOV.W
Rm,@(R0,Rn)
0000nnnnmmmm0101
Rm → (R0 + Rn)
1
—
MOV.L
Rm,@(R0,Rn)
0000nnnnmmmm0110
Rm → (R0 + Rn)
1
—
MOV.B
@(R0,Rm),Rn
0000nnnnmmmm1100
(R0 + Rm) → Sign
extension → Rn
1
—
MOV.W
@(R0,Rm),Rn
0000nnnnmmmm1101
(R0 + Rm) → Sign
extension → Rn
1
—
MOV.L
@(R0,Rm),Rn
0000nnnnmmmm1110
(R0 + Rm) → Rn
1
—
MOV.B
R0,@(disp,GBR)
11000000dddddddd
R0 → (disp + GBR)
1
—
MOV.W
R0,@(disp,GBR)
11000001dddddddd
R0 → (disp × 2 + GBR)
1
—
MOV.L
R0,@(disp,GBR)
11000010dddddddd
R0 → (disp × 4 + GBR)
1
—
MOV.B
@(disp,GBR),R0
11000100dddddddd
(disp + GBR) → Sign
extension → R0
1
—
MOV.W
@(disp,GBR),R0
11000101dddddddd
(disp × 2 + GBR) → Sign
extension → R0
1
—
MOV.L
@(disp,GBR),R0
11000110dddddddd
(disp × 4 + GBR) → R0
1
—
MOVA
@(disp,PC),R0
11000111dddddddd
disp × 4 + PC → R0
1
—
MOVT
Rn
0000nnnn00101001
T → Rn
1
—
SWAP.B Rm,Rn
0110nnnnmmmm1000
Rm → Swap bottom two
bytes → Rn
1
—
SWAP.W Rm,Rn
0110nnnnmmmm1001
Rm → Swap two
consecutive words → Rn
1
—
XTRCT
Rm,Rn
0010nnnnmmmm1101
Rm: Middle 32 bits of
Rn → Rn
1
—
T
Bit
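The SWAP and XTRCT rows of table 6.3 are terse, so the following C sketch shows one reading of them. The function names are hypothetical, and XTRCT is modeled as taking the middle 32 bits of the 64-bit value formed by Rm (upper half) and Rn (lower half).

```c
#include <stdint.h>

/* SWAP.B: exchange the two lowest-order bytes, keep the upper 16 bits. */
static uint32_t sh_swap_b(uint32_t rm)
{
    return (rm & 0xFFFF0000u) | ((rm & 0xFFu) << 8) | ((rm >> 8) & 0xFFu);
}

/* SWAP.W: exchange the upper and lower 16-bit words. */
static uint32_t sh_swap_w(uint32_t rm)
{
    return (rm << 16) | (rm >> 16);
}

/* XTRCT: middle 32 bits of the 64-bit pair Rm:Rn. */
static uint32_t sh_xtrct(uint32_t rm, uint32_t rn)
{
    return (rm << 16) | (rn >> 16);
}
```

With Rm = H'01234567 and Rn = H'89ABCDEF, the XTRCT sketch yields H'456789AB.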
Table 6.4
Arithmetic Operation Instructions
Instruction
Instruction Code
Operation
Execution
Cycles
ADD
Rm,Rn
0011nnnnmmmm1100
Rn + Rm → Rn
1
—
ADD
#imm,Rn
0111nnnniiiiiiii
Rn + imm → Rn
1
—
ADDC
Rm,Rn
0011nnnnmmmm1110
Rn + Rm + T → Rn,
Carry → T
1
Carry
ADDV
Rm,Rn
0011nnnnmmmm1111
Rn + Rm → Rn,
Overflow → T
1
Overflow
CMP/EQ
#imm,R0
10001000iiiiiiii
If R0 = imm, 1 → T
1
Comparison
result
CMP/EQ
Rm,Rn
0011nnnnmmmm0000
If Rn = Rm, 1 → T
1
Comparison
result
CMP/HS
Rm,Rn
0011nnnnmmmm0010
If Rn ≥ Rm with unsigned
data, 1 → T
1
Comparison
result
CMP/GE
Rm,Rn
0011nnnnmmmm0011
If Rn ≥ Rm with signed
data, 1 → T
1
Comparison
result
CMP/HI
Rm,Rn
0011nnnnmmmm0110
If Rn > Rm with
unsigned data, 1 → T
1
Comparison
result
CMP/GT
Rm,Rn
0011nnnnmmmm0111
If Rn > Rm with signed
data, 1 → T
1
Comparison
result
CMP/PL
Rn
0100nnnn00010101
If Rn > 0, 1 → T
1
Comparison
result
CMP/PZ
Rn
0100nnnn00010001
If Rn ≥ 0, 1 → T
1
Comparison
result
CMP/STR Rm,Rn
0010nnnnmmmm1100
If Rn and Rm have
an equivalent byte,
1→T
1
Comparison
result
DIV1
Rm,Rn
0011nnnnmmmm0100
Single-step division
(Rn ÷ Rm)
1
Calculation
result
DIV0S
Rm,Rn
0010nnnnmmmm0111
MSB of Rn → Q, MSB
of Rm → M, M ^ Q → T
1
Calculation
result
DIV0U
0000000000011001
0 → M/Q/T
1
0
T Bit
Table 6.4
Arithmetic Operation Instructions (cont)
Execution
Cycles
T Bit
Instruction
Instruction Code
Operation
DMULS.L Rm,Rn
0011nnnnmmmm1101
Signed operation of Rn
× Rm → MACH, MACL
32 × 32 → 64 bits
2 to 4*
—
DMULU.L Rm,Rn
0011nnnnmmmm0101
Unsigned operation of
Rn × Rm → MACH,
MACL 32 × 32 → 64 bits
2 to 4*
—
DT
Rn
0100nnnn00010000
Rn – 1 → Rn; when Rn
is 0, 1 → T. When Rn is
nonzero, 0 → T
1
Comparison
result
EXTS.B
Rm,Rn
0110nnnnmmmm1110
Byte in Rm is sign-extended → Rn
1
—
EXTS.W
Rm,Rn
0110nnnnmmmm1111
Word in Rm is sign-extended → Rn
1
—
EXTU.B
Rm,Rn
0110nnnnmmmm1100
Byte in Rm is zero-extended → Rn
1
—
EXTU.W
Rm,Rn
0110nnnnmmmm1101
Word in Rm is zero-extended → Rn
1
—
MAC.L
@Rm+,@Rn+
0000nnnnmmmm1111
Signed operation of
(Rn) × (Rm) + MAC →
MAC 32 × 32 + 64 →
64 bits
3/(2 to
4)*
—
MAC.W
@Rm+,@Rn+
0100nnnnmmmm1111
Signed operation of
(Rn) × (Rm) + MAC →
MAC 16 × 16 + 64 →
64 bits
3/(2)*
—
MUL.L
Rm,Rn
0000nnnnmmmm0111
Rn × Rm → MACL,
32 × 32 → 32 bits
2 to 4*
—
MULS.W
Rm,Rn
0010nnnnmmmm1111
Signed operation of
Rn × Rm → MAC 16 ×
16 → 32 bits
1 to 3*
—
MULU.W
Rm,Rn
0010nnnnmmmm1110
Unsigned operation of
Rn × Rm → MAC 16 ×
16 → 32 bits
1 to 3*
—
NEG
Rm,Rn
0110nnnnmmmm1011
0 – Rm → Rn
1
—
NEGC
Rm,Rn
0110nnnnmmmm1010
0 – Rm – T → Rn,
Borrow → T
1
Borrow
Table 6.4
Arithmetic Operation Instructions (cont)
Instruction
Instruction Code
Operation
Execution
Cycles
SUB
Rm,Rn
0011nnnnmmmm1000
Rn – Rm → Rn
1
—
SUBC
Rm,Rn
0011nnnnmmmm1010
Rn – Rm – T → Rn,
Borrow → T
1
Borrow
SUBV
Rm,Rn
0011nnnnmmmm1011
Rn – Rm → Rn,
Underflow → T
1
Underflow
T Bit
Note: * The normal minimum number of execution cycles. (The number in parentheses is the
number of cycles when there is contention with following instructions.)
Table 6.5
Logic Operation Instructions
Instruction
Instruction Code
Operation
Execution
Cycles
AND
Rm,Rn
0010nnnnmmmm1001
Rn & Rm → Rn
1
—
AND
#imm,R0
11001001iiiiiiii
R0 & imm → R0
1
—
AND.B #imm,@(R0,GBR)
11001101iiiiiiii
(R0 + GBR) & imm →
(R0 + GBR)
3
—
NOT
Rm,Rn
0110nnnnmmmm0111
~Rm → Rn
1
—
OR
Rm,Rn
0010nnnnmmmm1011
Rn | Rm → Rn
1
—
OR
#imm,R0
11001011iiiiiiii
R0 | imm → R0
1
—
OR.B
#imm,@(R0,GBR)
11001111iiiiiiii
(R0 + GBR) | imm →
(R0 + GBR)
3
—
TAS.B @Rn
0100nnnn00011011
If (Rn) is 0, 1 → T; 1 →
MSB of (Rn)
4
Test
result
TST
Rm,Rn
0010nnnnmmmm1000
Rn & Rm; if the result is
0, 1 → T
1
Test
result
TST
#imm,R0
11001000iiiiiiii
R0 & imm; if the result is
0, 1 → T
1
Test
result
TST.B #imm,@(R0,GBR)
11001100iiiiiiii
(R0 + GBR) & imm; if the
result is 0, 1 → T
3
Test
result
XOR
Rm,Rn
0010nnnnmmmm1010
Rn ^ Rm → Rn
1
—
XOR
#imm,R0
11001010iiiiiiii
R0 ^ imm → R0
1
—
XOR.B #imm,@(R0,GBR)
11001110iiiiiiii
(R0 + GBR) ^ imm →
(R0 + GBR)
3
—
T Bit
Table 6.6
Shift Instructions
Instruction
Instruction Code
Operation
Execution
Cycles
ROTL
Rn
0100nnnn00000100
T ← Rn ← MSB
1
MSB
ROTR
Rn
0100nnnn00000101
LSB → Rn → T
1
LSB
ROTCL
Rn
0100nnnn00100100
T ← Rn ← T
1
MSB
ROTCR
Rn
0100nnnn00100101
T → Rn → T
1
LSB
SHAL
Rn
0100nnnn00100000
T ← Rn ← 0
1
MSB
SHAR
Rn
0100nnnn00100001
MSB → Rn → T
1
LSB
SHLL
Rn
0100nnnn00000000
T ← Rn ← 0
1
MSB
SHLR
Rn
0100nnnn00000001
0 → Rn → T
1
LSB
SHLL2
Rn
0100nnnn00001000
Rn<<2 → Rn
1
—
SHLR2
Rn
0100nnnn00001001
Rn>>2 → Rn
1
—
SHLL8
Rn
0100nnnn00011000
Rn<<8 → Rn
1
—
SHLR8
Rn
0100nnnn00011001
Rn>>8 → Rn
1
—
SHLL16
Rn
0100nnnn00101000
Rn<<16 → Rn
1
—
SHLR16
Rn
0100nnnn00101001
Rn>>16 → Rn
1
—
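The arrow notation in table 6.6 (for example T ← Rn ← T for ROTCL) can be read as follows. This is a minimal C sketch of three of the one-bit operations, not a definitive model; the type and function names are hypothetical, and each function returns both the new register value and the new T bit.

```c
#include <stdint.h>

/* Result of a one-bit shift or rotate: new register value and new T bit. */
typedef struct {
    uint32_t rn;
    unsigned t;
} sh_shift_result;

/* ROTCL: T <- Rn <- T. The old MSB goes to T, the old T enters at bit 0. */
static sh_shift_result sh_rotcl(uint32_t rn, unsigned t)
{
    sh_shift_result r;
    r.rn = (rn << 1) | (t & 1u);
    r.t  = rn >> 31;
    return r;
}

/* ROTCR: T -> Rn -> T. The old LSB goes to T, the old T enters at bit 31. */
static sh_shift_result sh_rotcr(uint32_t rn, unsigned t)
{
    sh_shift_result r;
    r.rn = (rn >> 1) | ((uint32_t)(t & 1u) << 31);
    r.t  = rn & 1u;
    return r;
}

/* SHAR: MSB -> Rn -> T. The sign bit is preserved, the old LSB goes to T. */
static sh_shift_result sh_shar(uint32_t rn)
{
    sh_shift_result r;
    r.rn = (rn >> 1) | (rn & 0x80000000u);
    r.t  = rn & 1u;
    return r;
}
```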
T Bit
Table 6.7
Branch Instructions
Execution
Cycles
T Bit
Instruction
Instruction Code
Operation
BF
label
10001011dddddddd
If T = 0, disp × 2 + PC → PC; if T =
1, nop
3/1*
—
BF/S label
10001111dddddddd
Delayed branch, if T = 0, disp × 2 +
PC → PC; if T = 1, nop
2/1*
—
BT
label
10001001dddddddd
If T = 1, disp × 2 + PC → PC; if T =
0, nop
3/1*
—
BT/S label
10001101dddddddd
Delayed branch, if T = 1, disp × 2 +
PC → PC; if T = 0, nop
2/1*
—
BRA label
1010dddddddddddd
Delayed branch, disp × 2 + PC →
PC
2
—
BRAF Rm
0000mmmm00100011
Delayed branch, Rm + PC → PC
2
—
BSR label
1011dddddddddddd
Delayed branch, PC → PR, disp × 2
+ PC → PC
2
—
BSRF Rm
0000mmmm00000011
Delayed branch, PC → PR,
Rm + PC → PC
2
—
JMP
@Rm
0100mmmm00101011
Delayed branch, Rm → PC
2
—
JSR
@Rm
0100mmmm00001011
Delayed branch, PC → PR,
Rm → PC
2
—
RTS
0000000000001011
Delayed branch, PR → PC
2
—
Note: * One state when the program does not branch.
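The branch destination disp × 2 + PC in table 6.7 can be sketched in C as shown below. Two assumptions are made that table 6.7 itself does not state: the displacement field is treated as a signed value (8 bits for BT/BF, 12 bits for BRA/BSR), and PC is taken to be the address of the branch instruction plus 4. The function names are hypothetical.

```c
#include <stdint.h>

/* Destination of BT, BF, BT/S, or BF/S (8-bit displacement field). */
static uint32_t sh_bcond_target(uint32_t insn_addr, uint16_t code)
{
    int32_t disp = code & 0x00FF;
    if (disp & 0x0080) disp -= 0x0100;    /* sign-extend 8 bits (assumed) */
    return insn_addr + 4u + (uint32_t)(disp * 2);
}

/* Destination of BRA or BSR (12-bit displacement field). */
static uint32_t sh_bra_target(uint32_t insn_addr, uint16_t code)
{
    int32_t disp = code & 0x0FFF;
    if (disp & 0x0800) disp -= 0x1000;    /* sign-extend 12 bits (assumed) */
    return insn_addr + 4u + (uint32_t)(disp * 2);
}
```

Under these assumptions, a BT with code H'8902 located at H'00001000 branches to H'00001008, and a displacement of –2 branches back to the instruction itself.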
Table 6.8
System Control Instructions
Instruction
Instruction Code
Operation
Execution
Cycles
CLRT
0000000000001000
0→T
1
0
CLRMAC
0000000000101000
0 → MACH, MACL
1
—
T Bit
LDC
Rm,SR
0100mmmm00001110
Rm → SR
1
LSB
LDC
Rm,GBR
0100mmmm00011110
Rm → GBR
1
—
LDC
Rm,VBR
0100mmmm00101110
Rm → VBR
1
—
LDC.L @Rm+,SR
0100mmmm00000111
(Rm) → SR, Rm + 4 → Rm
3
LSB
LDC.L @Rm+,GBR
0100mmmm00010111
(Rm) → GBR, Rm + 4 → Rm
3
—
LDC.L @Rm+,VBR
0100mmmm00100111
(Rm) → VBR, Rm + 4 → Rm
3
—
LDS
Rm,MACH
0100mmmm00001010
Rm → MACH
1
—
LDS
Rm,MACL
0100mmmm00011010
Rm → MACL
1
—
LDS
Rm,PR
0100mmmm00101010
Rm → PR
1
—
LDS.L @Rm+,MACH
0100mmmm00000110
(Rm) → MACH, Rm + 4 → Rm
1
—
LDS.L @Rm+,MACL
0100mmmm00010110
(Rm) → MACL, Rm + 4 → Rm
1
—
LDS.L @Rm+,PR
0100mmmm00100110
(Rm) → PR, Rm + 4 → Rm
1
—
NOP
0000000000001001
No operation
1
—
RTE
0000000000101011
Delayed branch, stack area
→ PC/SR
4
—
SETT
0000000000011000
1→T
1
1
SLEEP
0000000000011011
Sleep
3*
—
STC
SR,Rn
0000nnnn00000010
SR → Rn
1
—
STC
GBR,Rn
0000nnnn00010010
GBR → Rn
1
—
STC
VBR,Rn
0000nnnn00100010
VBR → Rn
1
—
STC.L
SR,@–Rn
0100nnnn00000011
Rn – 4 → Rn, SR → (Rn)
2
—
STC.L
GBR,@–Rn
0100nnnn00010011
Rn – 4 → Rn, GBR → (Rn)
2
—
STC.L
VBR,@–Rn
0100nnnn00100011
Rn – 4 → Rn, VBR → (Rn)
2
—
STS
MACH,Rn
0000nnnn00001010
MACH → Rn
1
—
STS
MACL,Rn
0000nnnn00011010
MACL → Rn
1
—
STS
PR,Rn
0000nnnn00101010
PR → Rn
1
—
Table 6.8
System Control Instructions (cont)
Instruction
Instruction Code
Operation
Execution
Cycles
STS.L
MACH,@–Rn
0100nnnn00000010
Rn – 4 → Rn, MACH → (Rn)
1
—
STS.L
MACL,@–Rn
0100nnnn00010010
Rn – 4 → Rn, MACL → (Rn)
1
—
STS.L
PR,@–Rn
0100nnnn00100010
Rn – 4 → Rn, PR → (Rn)
1
—
TRAPA
#imm
11000011iiiiiiii
PC/SR → stack area, imm × 4
+ VBR → PC
8
—
T Bit
Note: * The number of execution cycles before the chip enters sleep mode: The execution cycles
shown in the table are minimums. The actual number of cycles may be increased when (1)
contention occurs between instruction fetches and data access, or (2) when the destination
register of the load instruction (memory → register) and the register used by the next
instruction are the same.
Table 6.9
Floating-Point Instructions
Instruction
Instruction Code
Operation
Execution
Cycles T Bit
FABS
FRn
1111nnnn01011101
|FRn| → FRn
1
—
FADD
FRm,FRn
1111nnnnmmmm0000
FRn + FRm → FRn
1
—
FCMP/EQ FRm,FRn
1111nnnnmmmm0100
(FRn = FRm)? 1:0 → T
1
Comparison
result
FCMP/GT FRm,FRn
1111nnnnmmmm0101
(FRn > FRm)? 1:0 → T
1
Comparison
result
FDIV
FRm,FRn
1111nnnnmmmm0011
FRn/FRm → FRn
13
—
FLDI0
FRn
1111nnnn10001101
0x00000000 → FRn
1
—
FLDI1
FRn
1111nnnn10011101
0x3F800000 → FRn
1
—
FLDS
FRm,FPUL
1111mmmm00011101
FRm → FPUL
1
—
FLOAT
FPUL,FRn
1111nnnn00101101
(float) FPUL → FRn
1
—
FMAC
FR0,FRm,FRn
1111nnnnmmmm1110
FR0 × FRm + FRn →
FRn
1
—
FMOV
FRm, FRn
1111nnnnmmmm1100
FRm → FRn
1
—
FMOV.S
@(R0,Rm),FRn
1111nnnnmmmm0110
(R0 + Rm) → FRn
1
—
FMOV.S
@Rm+,FRn
1111nnnnmmmm1001
(Rm) → FRn, Rm+ = 4
1
—
FMOV.S
@Rm,FRn
1111nnnnmmmm1000
(Rm) → FRn
1
—
FMOV.S
FRm,@(R0,Rn)
1111nnnnmmmm0111
FRm → (R0 + Rn)
1
—
FMOV.S
FRm,@-Rn
1111nnnnmmmm1011
Rn– = 4, FRm → (Rn)
1
—
FMOV.S
FRm,@Rn
1111nnnnmmmm1010
FRm → (Rn)
1
—
FMUL
FRm,FRn
1111nnnnmmmm0010
FRn × FRm → FRn
1
—
FNEG
FRn
1111nnnn01001101
–FRn → FRn
1
—
FSTS
FPUL,FRn
1111nnnn00001101
FPUL → FRn
1
—
FSUB
FRm,FRn
1111nnnnmmmm0001
FRn – FRm → FRn
1
—
FTRC
FRm,FPUL
1111mmmm00111101
(long) FRm → FPUL
1
—
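The constants loaded by FLDI0 and FLDI1 in table 6.9 are bit patterns rather than decimal values. The following C sketch, which assumes the host represents float as an IEEE 754 single-precision value, reinterprets a 32-bit pattern as a float to show that 0x3F800000 corresponds to 1.0. The helper name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a single-precision value (host float assumed IEEE 754). */
static float sh_bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```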
Table 6.10 FPU-Related CPU Instructions
Instruction
Instruction Code
Operation
Execution
Cycles
LDS
Rm,FPSCR
0100mmmm01101010
Rm → FPSCR
1
—
LDS
Rm,FPUL
0100mmmm01011010
Rm → FPUL
1
—
LDS.L
@Rm+, FPSCR
0100mmmm01100110
@Rm → FPSCR, Rm+ = 4
1
—
LDS.L
@Rm+, FPUL
0100mmmm01010110
@Rm → FPUL, Rm+ = 4
1
—
STS
FPSCR, Rn
0000nnnn01101010
FPSCR → Rn
1
—
STS
FPUL,Rn
0000nnnn01011010
FPUL → Rn
1
—
STS.L
FPSCR,@-Rn
0100nnnn01100010
Rn– = 4, FPSCR → @Rn
1
—
STS.L
FPUL,@-Rn
0100nnnn01010010
Rn– = 4, FPUL → @Rn
1
—
T Bit
6.2
Instruction Set in Alphabetical Order
Table 6-11 alphabetically lists the instruction codes and number of execution cycles for each
instruction.
Table 6-11 Instruction Set Listed Alphabetically
Instruction
Operation
Code
Cycles
T Bit
ADD
#imm,Rn
Rn + imm → Rn
0111nnnniiiiiiii
1
—
ADD
Rm,Rn
Rn + Rm → Rn
0011nnnnmmmm1100
1
—
ADDC
Rm,Rn
Rn + Rm + T → Rn,
Carry → T
0011nnnnmmmm1110
1
Carry
ADDV
Rm,Rn
Rn + Rm → Rn,
Overflow → T
0011nnnnmmmm1111
1
Overflow
AND
#imm,R0
R0 & imm → R0
11001001iiiiiiii
1
—
AND
Rm,Rn
Rn & Rm → Rn
0010nnnnmmmm1001
1
—
AND.B
#imm,@(R0,GBR)
(R0 + GBR) & imm →
(R0 + GBR)
11001101iiiiiiii
3
—
BF
label
If T = 0, disp + PC →
PC; if T = 1, nop
10001011dddddddd
3/1*1
—
BF/S
label
If T = 0, disp + PC →
PC; if T = 1, nop
10001111dddddddd
2/1*1
—
BRA
label
Delayed branch, disp +
PC → PC
1010dddddddddddd
2
—
BRAF
Rn
Delayed branch, Rn +
PC → PC
0000nnnn00100011
2
—
BSR
label
Delayed branch, PC →
PR, disp + PC → PC
1011dddddddddddd
2
—
BSRF
Rn
Delayed branch, PC →
PR, Rn + PC → PC
0000nnnn00000011
2
—
BT
label
If T = 1, disp + PC →
PC; if T = 0, nop
10001001dddddddd
3/1*1
—
BT/S
label
If T = 1, disp + PC →
PC; if T = 0, nop
10001101dddddddd
2/1*1
—
CLRMAC
0 → MACH, MACL
0000000000101000
1
—
CLRT
0→T
0000000000001000
1
0
Table 6-11 Instruction Set Listed Alphabetically (cont)
Instruction
Operation
Code
Cycles
T Bit
CMP/EQ
#imm,R0
If R0 = imm, 1 → T
10001000iiiiiiii
1
Comparison
result
CMP/EQ
Rm,Rn
If Rn = Rm, 1 → T
0011nnnnmmmm0000
1
Comparison
result
CMP/GE
Rm,Rn
If Rn ≥ Rm with signed
data, 1 → T
0011nnnnmmmm0011
1
Comparison
result
CMP/GT
Rm,Rn
If Rn > Rm with signed
data, 1 → T
0011nnnnmmmm0111
1
Comparison
result
CMP/HI
Rm,Rn
If Rn > Rm with
unsigned data, 1 → T
0011nnnnmmmm0110
1
Comparison
result
CMP/HS
Rm,Rn
If Rn ≥ Rm with
unsigned data, 1 → T
0011nnnnmmmm0010
1
Comparison
result
CMP/PL
Rn
If Rn>0, 1 → T
0100nnnn00010101
1
Comparison
result
CMP/PZ
Rn
If Rn ≥ 0, 1 → T
0100nnnn00010001
1
Comparison
result
CMP/STR
Rm,Rn
If Rn and Rm have an
equivalent byte, 1 → T
0010nnnnmmmm1100
1
Comparison
result
DIV0S
Rm,Rn
MSB of Rn → Q, MSB
of Rm → M, M ^ Q → T
0010nnnnmmmm0111
1
Calculation
result
DIV0U
0 → M/Q/T
0000000000011001
1
0
DIV1
Rm,Rn
Single-step division
(Rn/Rm)
0011nnnnmmmm0100
1
Calculation
result
DMULS.L
Rm,Rn
Signed operation of Rn
× Rm → MACH, MACL
0011nnnnmmmm1101
2 to 4*2
—
DMULU.L
Rm,Rn
Unsigned operation of
Rn × Rm → MACH,
MACL
0011nnnnmmmm0101
2 to 4*2
—
DT
Rn
Rn - 1 → Rn, when Rn
is 0, 1 → T. When Rn
is nonzero, 0 → T
0100nnnn00010000
1
Comparison
result
EXTS.B
Rm,Rn
A byte in Rm is sign-extended → Rn
0110nnnnmmmm1110
1
—
Table 6-11 Instruction Set Listed Alphabetically (cont)
Instruction
Operation
Code
Cycles T Bit
EXTS.W
Rm,Rn
A word in Rm is sign-extended → Rn
0110nnnnmmmm1111
1
—
EXTU.B
Rm,Rn
A byte in Rm is zero-extended → Rn
0110nnnnmmmm1100
1
—
EXTU.W
Rm,Rn
A word in Rm is zero-extended → Rn
0110nnnnmmmm1101
1
—
FABS
FRn
| FRn | → FRn
1111nnnn01011101
1
—
FADD
FRm ,FRn
FRn + FRm → FRn
1111nnnnmmmm0000
1
—
FCMP/EQ FRm ,FRn
(FRn == FRm)?
1:0 → T
1111nnnnmmmm0100
1
Comparison
result
FCMP/GT FRm ,FRn
(FRn > FRm) ?
1:0 → T
1111nnnnmmmm0101
1
Comparison
result
FDIV
FRm ,FRn
FRn /FRm → FRn
1111nnnnmmmm0011
13
—
FLDI0
FRn
H'00000000 → FRn
1111nnnn10001101
1
—
FLDI1
FRn
H'3F800000 → FRn
1111nnnn10011101
1
—
FLDS
FRm ,FPUL
FRm → FPUL
1111mmmm00011101
1
—
FLOAT
FPUL, FRn
(float)FPUL → FRn
1111nnnn00101101
1
—
FMAC
FR0,FRm,FRn
FR0 × FRm + FRn →
FRn
1111nnnnmmmm1110
1
—
FMOV
FRm ,FRn
FRm → FRn
1111nnnnmmmm1100
1
—
FMOV.S
@(R0,Rm),FRn
(R0 + Rm) → FRn
1111nnnnmmmm0110
1
—
FMOV.S
@Rm+,FRn
(Rm) → FRn,Rm + 4
= Rm
1111nnnnmmmm1001
1
—
FMOV.S
@Rm,FRn
(Rm) → FRn
1111nnnnmmmm1000
1
—
FMOV.S
FRm,@(R0,Rn)
(FRm) → (R0 + Rn)
1111nnnnmmmm0111
1
—
FMOV.S
FRm,@-Rn
Rn-4 → Rn, FRm →
(Rn)
1111nnnnmmmm1011
1
—
FMOV.S
FRm,@Rn
FRm → (Rn)
1111nnnnmmmm1010
1
—
FMUL
FRm,FRn
FRn × FRm → FRn
1111nnnnmmmm0010
1
—
FNEG
FRn
–FRn → FRn
1111nnnn01001101
1
—
FSTS
FPUL,FRn
FPUL → FRn
1111nnnn00001101
1
—
FSUB
FRm,FRn
FRn – FRm → FRn
1111nnnnmmmm0001
1
—
FTRC
FRm,FPUL
(long)FRm → FPUL
1111mmmm00111101
1
—
Table 6-11 Instruction Set Listed Alphabetically (cont)
Instruction
Operation
Code
Cycles T Bit
JMP
@Rm
Delayed branch, Rm → PC
0100mmmm00101011
2
—
JSR
@Rm
Delayed branch, PC → PR,
Rm → PC
0100mmmm00001011
2
—
LDC
Rm,GBR
Rm → GBR
0100mmmm00011110
1
—
LDC
Rm,SR
Rm → SR
0100mmmm00001110
1
LSB
LDC
Rm,VBR
Rm → VBR
0100mmmm00101110
1
—
LDC.L @Rm+,GBR
(Rm) → GBR, Rm + 4 → Rm
0100mmmm00010111
3
—
LDC.L @Rm+,SR
(Rm) → SR, Rm + 4 → Rm
0100mmmm00000111
3
LSB
LDC.L @Rm+,VBR
(Rm) → VBR, Rm + 4 → Rm
0100mmmm00100111
3
—
LDS
Rm,FPSCR
Rm → FPSCR
0100mmmm01101010
1
—
LDS
Rm,FPUL
Rm → FPUL
0100mmmm01011010
1
—
LDS
Rm,MACH
Rm → MACH
0100mmmm00001010
1
—
LDS
Rm,MACL
Rm → MACL
0100mmmm00011010
1
—
LDS
Rm,PR
Rm → PR
0100mmmm00101010
1
—
LDS.L @Rm+,FPSCR
@Rm → FPSCR ,
Rm+4
0100mmmm01100110
1
—
LDS.L @Rm+,FPUL
@Rm → FPUL ,
Rm+4
0100mmmm01010110
1
—
LDS.L @Rm+,MACH
(Rm) → MACH,
Rm + 4 → Rm
0100mmmm00000110
1
—
LDS.L @Rm+,MACL
(Rm) → MACL,
Rm + 4 → Rm
0100mmmm00010110
1
—
LDS.L @Rm+,PR
(Rm) → PR,
Rm + 4 → Rm
0100mmmm00100110
1
—
MAC.L @Rm+,@Rn+
Signed operation of (Rn) ×
(Rm) + MAC → MAC
0000nnnnmmmm1111
3/(2 to
4)*2
—
MAC.W @Rm+,@Rn+
Signed operation of (Rn) ×
(Rm) + MAC → MAC
0100nnnnmmmm1111
3/ (2)*2
—
MOV
#imm,Rn
imm → Sign extension → Rn
1110nnnniiiiiiii
1
—
MOV
Rm,Rn
Rm → Rn
0110nnnnmmmm0011
1
—
MOV.B @(disp,GBR),
R0
(disp + GBR) → Sign extension
→ R0
11000100dddddddd
1
—
MOV.B @(disp,Rm),
R0
(disp + Rm) → Sign extension
→ R0
10000100mmmmdddd
1
—
Table 6-11 Instruction Set Listed Alphabetically (cont)
Instruction
Operation
Code
Cycles
T Bit
MOV.B
@(R0,Rm),Rn
(R0 + Rm) → Sign
extension → Rn
0000nnnnmmmm1100
1
—
MOV.B
@Rm+,Rn
(Rm) → Sign extension
→ Rn, Rm + 1 → Rm
0110nnnnmmmm0100
1
—
MOV.B
@Rm,Rn
(Rm) → Sign extension
→ Rn
0110nnnnmmmm0000
1
—
MOV.B
R0,@(disp,GBR)
R0 → (disp + GBR)
11000000dddddddd
1
—
MOV.B
R0,@(disp,Rn)
R0 → (disp + Rn)
10000000nnnndddd
1
—
MOV.B
Rm,@(R0,Rn)
Rm → (R0 + Rn)
0000nnnnmmmm0100
1
—
MOV.B
Rm,@–Rn
Rn–1 → Rn, Rm → (Rn)
0010nnnnmmmm0100
1
—
MOV.B
Rm,@Rn
Rm → (Rn)
0010nnnnmmmm0000
1
—
MOV.L
@(disp,GBR),R0
(disp × 4 + GBR) → R0
11000110dddddddd
1
—
MOV.L
@(disp,PC),Rn
(disp × 4 + PC) → Rn
1101nnnndddddddd
1
—
MOV.L
@(disp,Rm),Rn
(disp × 4 + Rm) → Rn
0101nnnnmmmmdddd
1
—
MOV.L
@(R0,Rm),Rn
(R0 + Rm) → Rn
0000nnnnmmmm1110
1
—
MOV.L
@Rm+,Rn
(Rm) → Rn,
Rm + 4 → Rm
0110nnnnmmmm0110
1
—
MOV.L
@Rm,Rn
(Rm) → Rn
0110nnnnmmmm0010
1
—
MOV.L
R0,@(disp,GBR)
R0 → (disp × 4 + GBR)
11000010dddddddd
1
—
MOV.L
Rm,@(disp,Rn)
Rm → (disp × 4 + Rn)
0001nnnnmmmmdddd
1
—
MOV.L
Rm,@(R0,Rn)
Rm → (R0 + Rn)
0000nnnnmmmm0110
1
—
MOV.L
Rm,@–Rn
Rn–4 → Rn, Rm → (Rn)
0010nnnnmmmm0110
1
—
MOV.L
Rm,@Rn
Rm → (Rn)
0010nnnnmmmm0010
1
—
MOV.W
@(disp,GBR),R0
(disp × 2 + GBR) →
Sign extension → R0
11000101dddddddd
1
—
MOV.W
@(disp,PC),Rn
(disp × 2 + PC) → Sign
extension → Rn
1001nnnndddddddd
1
—
MOV.W
@(disp,Rm),
R0
(disp × 2 + Rm) → Sign
extension → R0
10000101mmmmdddd
1
—
MOV.W
@(R0,Rm),Rn
(R0 + Rm) → Sign
extension → Rn
0000nnnnmmmm1101
1
—
MOV.W
@Rm+,Rn
(Rm) → Sign extension
→ Rn, Rm + 2 → Rm
0110nnnnmmmm0101
1
—
Table 6-11 Instruction Set Listed Alphabetically (cont)
Instruction
Operation
Code
Cycles
T Bit
MOV.W
@Rm,Rn
(Rm) → Sign extension
→ Rn
0110nnnnmmmm0001
1
—
MOV.W
R0,
@(disp,GBR)
R0 → (disp × 2 + GBR)
11000001dddddddd
1
—
MOV.W
R0,
@(disp,Rn)
R0 → (disp × 2 + Rn)
10000001nnnndddd
1
—
MOV.W
Rm,@(R0,Rn)
Rm → (R0 + Rn)
0000nnnnmmmm0101
1
—
MOV.W
Rm,@–Rn
Rn–2 → Rn, Rm → (Rn)
0010nnnnmmmm0101
1
—
MOV.W
Rm,@Rn
Rm → (Rn)
0010nnnnmmmm0001
1
—
MOVA
@(disp,PC),
R0
disp × 4 + PC → R0
11000111dddddddd
1
—
MOVT
Rn
T → Rn
0000nnnn00101001
1
—
MUL.L
Rm,Rn
Rn × Rm → MACL
0000nnnnmmmm0111
2 to 4*2
—
MULS.W
Rm,Rn
Signed operation of Rn × Rm → MACL
0010nnnnmmmm1111
1 to 3*2
—
MULU.W
Rm,Rn
Unsigned operation of Rn × Rm → MACL
0010nnnnmmmm1110
1 to 3*2
—
NEG
Rm,Rn
0–Rm → Rn
0110nnnnmmmm1011
1
—
NEGC
Rm,Rn
0–Rm–T → Rn, Borrow
→T
0110nnnnmmmm1010
1
Borrow
NOP
No operation
0000000000001001
1
—
NOT
Rm,Rn
~Rm → Rn
0110nnnnmmmm0111
1
—
OR
#imm,R0
R0 | imm → R0
11001011iiiiiiii
1
—
OR
Rm,Rn
Rn | Rm → Rn
0010nnnnmmmm1011
1
—
OR.B
#imm,
@(R0,GBR)
(R0 + GBR) | imm →
(R0 + GBR)
11001111iiiiiiii
3
—
ROTCL
Rn
T ← Rn ← T
0100nnnn00100100
1
MSB
ROTCR
Rn
T → Rn → T
0100nnnn00100101
1
LSB
ROTL
Rn
T ← Rn ← MSB
0100nnnn00000100
1
MSB
ROTR
Rn
LSB → Rn → T
0100nnnn00000101
1
LSB
RTE
Delayed branch, stack area
→ PC/SR
0000000000101011
4
LSB
RTS
Delayed branch, PR →
PC
0000000000001011
2
—
SETT
1→T
0000000000011000
1
1
Table 6-11 Instruction Set Listed Alphabetically (cont)
Instruction
Operation
Code
Cycles
T Bit
SHAL
Rn
T ← Rn ← 0
0100nnnn00100000
1
MSB
SHAR
Rn
MSB → Rn → T
0100nnnn00100001
1
LSB
SHLL
Rn
T ← Rn ← 0
0100nnnn00000000
1
MSB
SHLL2
Rn
Rn << 2 → Rn
0100nnnn00001000
1
—
SHLL8
Rn
Rn << 8 → Rn
0100nnnn00011000
1
—
SHLL16 Rn
Rn << 16 → Rn
0100nnnn00101000
1
—
SHLR
Rn
0 → Rn → T
0100nnnn00000001
1
LSB
SHLR2
Rn
Rn>>2 → Rn
0100nnnn00001001
1
—
SHLR8
Rn
Rn>>8 → Rn
0100nnnn00011001
1
—
SHLR16
Rn
Rn>>16 → Rn
0100nnnn00101001
1
—
SLEEP
Sleep
0000000000011011
3
—
STC
GBR,Rn
GBR → Rn
0000nnnn00010010
1
—
STC
SR,Rn
SR → Rn
0000nnnn00000010
1
—
STC
VBR,Rn
VBR → Rn
0000nnnn00100010
1
—
STC.L
GBR,@–Rn
Rn–4 → Rn,
GBR → (Rn)
0100nnnn00010011
2
—
STC.L
SR,@–Rn
Rn–4 → Rn, SR → (Rn)
0100nnnn00000011
2
—
STC.L
VBR,@–Rn
Rn–4 → Rn,
VBR → (Rn)
0100nnnn00100011
2
—
STS
FPSCR, Rn
FPSCR → Rn
0000nnnn01101010
1
—
STS
FPUL, Rn
FPUL → Rn
0000nnnn01011010
1
—
STS
MACH,Rn
MACH → Rn
0000nnnn00001010
1
—
STS
MACL,Rn
MACL → Rn
0000nnnn00011010
1
—
STS
PR,Rn
PR → Rn
0000nnnn00101010
1
—
STS.L
FPSCR,@-Rn
Rn-4 → Rn,
FPSCR → @Rn
0100nnnn01100010
1
—
STS.L
FPUL,@-Rn
Rn-4 → Rn,
FPUL → @Rn
0100nnnn01010010
1
—
STS.L
MACH,@–Rn
Rn–4 → Rn, MACH →
(Rn)
0100nnnn00000010
1
—
STS.L
MACL,@–Rn
Rn–4 → Rn, MACL →
(Rn)
0100nnnn00010010
1
—
STS.L
PR,@–Rn
Rn–4 → Rn, PR → (Rn)
0100nnnn00100010
1
—
Table 6-11 Instruction Set Listed Alphabetically (cont)
Instruction
Operation
Code
Cycles T Bit
SUB
Rm,Rn
Rn–Rm → Rn
0011nnnnmmmm1000
1
—
SUBC
Rm,Rn
Rn–Rm–T → Rn,
Borrow → T
0011nnnnmmmm1010
1
Borrow
SUBV
Rm,Rn
Rn–Rm → Rn, Underflow
→T
0011nnnnmmmm1011
1
Underflow
SWAP.B
Rm,Rn
Rm → Swap the two
lowest-order bytes → Rn
0110nnnnmmmm1000
1
—
SWAP.W
Rm,Rn
Rm → Swap two
consecutive words → Rn
0110nnnnmmmm1001
1
—
TAS.B
@Rn
If (Rn) is 0, 1 → T; 1 →
MSB of (Rn)
0100nnnn00011011
4
Test result
TST
#imm,R0
R0 & imm; if the result is 0,
1 → T
11001000iiiiiiii
1
Test result
TST
Rm,Rn
Rn & Rm; if the result is 0,
1→T
0010nnnnmmmm1000
1
Test result
TST.B
#imm,
@(R0,GBR)
(R0 + GBR) & imm;
if the result is 0, 1 → T
11001100iiiiiiii
3
Test result
XOR
#imm,R0
R0 ^ imm → R0
11001010iiiiiiii
1
—
XOR
Rm,Rn
Rn ^ Rm → Rn
0010nnnnmmmm1010
1
—
XOR.B
#imm,
@(R0,GBR)
(R0 + GBR) ^ imm → (R0
+ GBR)
11001110iiiiiiii
3
—
XTRCT
Rm,Rn
Rm: Middle 32 bits of Rn
→ Rn
0010nnnnmmmm1101
1
—
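Notes: 1. One state when the program does not branch.
2. The normal minimum number of execution cycles. (The number in parentheses is the
number of cycles when there is contention with following instructions.)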
Section 7 Instruction Descriptions
7.1
Sample Description (Name): Classification
This section describes instructions in alphabetical order using the format shown below. The
actual descriptions begin at section 7.2.1.
Class: Indicates if the instruction is a delayed branch instruction or interrupt disabled instruction
Format: Assembler input format; imm and disp are numbers, expressions, or symbols
Abstract: A brief description of operation
Code: Displayed in order MSB ↔ LSB
Cycle: Number of cycles when there is no wait state
T Bit: The value of T bit after the instruction is executed
Description: Description of operation
Notes: Notes on using the instruction
Operation: Operation written in C language. The following resources should be used.
• Reads data of each length from address Addr. An address error will occur if word data is read
from an address other than 2n or if longword data is read from an address other than 4n:
unsigned char
Read_Byte(unsigned long Addr);
unsigned short
Read_Word(unsigned long Addr);
unsigned long
Read_Long(unsigned long Addr);
• Writes data of each length to address Addr. An address error will occur if word data is written
to an address other than 2n or if longword data is written to an address other than 4n:
unsigned char
Write_Byte(unsigned long Addr, unsigned long Data);
unsigned short
Write_Word(unsigned long Addr, unsigned long Data);
unsigned long
Write_Long(unsigned long Addr, unsigned long Data);
• Starts execution from the slot instruction located at an address (Addr – 4). For Delay_Slot (4),
execution starts from an instruction at address 0 rather than address 4. When execution moves
from this function to one of the following instructions and one of the listed instructions
precedes it, it will be considered an illegal slot instruction (the listed instructions become
illegal slot instructions when used as delay slot instructions):
BF, BT, BRA, BSR, JMP, JSR, RTS, RTE, TRAPA, BF/S, BT/S, BRAF, BSRF
Delay_Slot(unsigned long Addr);
If the instruction at address (Addr – 4) is 32-bit, 2 is returned; 0 is returned if it is 16-bit.
• List registers:
unsigned long R[16];
unsigned long SR,GBR,VBR;
unsigned long MACH,MACL,PR;
unsigned long PC;
• Definition of SR structures:
struct SR0 {
unsigned long dummy0:4;
unsigned long RC0:12;
unsigned long dummy1:4;
unsigned long DMY0:1;
unsigned long DMX0:1;
unsigned long M0:1;
unsigned long Q0:1;
unsigned long I0:4;
unsigned long RF10:1;
unsigned long RF00:1;
unsigned long S0:1;
unsigned long T0:1;
};
• Definition of bits in SR:
#define M ((*(struct SR0 *)(&SR)).M0)
#define Q ((*(struct SR0 *)(&SR)).Q0)
#define S ((*(struct SR0 *)(&SR)).S0)
#define T ((*(struct SR0 *)(&SR)).T0)
#define RF1 ((*(struct SR0 *)(&SR)).RF10)
#define RF0 ((*(struct SR0 *)(&SR)).RF00)
• Error display function:
Error( char *er );
The PC should point to the location four bytes after the current instruction. Therefore, PC = 4;
means the instruction starts execution from address 0, not address 4.
Examples: Examples are written in assembler mnemonics and describe status before and after
executing the instruction. Characters in italics such as .align are assembler control instructions
(listed below). For more information, see the Cross Assembler User Manual.
.org          Location counter set
.data.w       Securing integer word data
.data.l       Securing integer longword data
.sdata        Securing string data
.align 2      2-byte boundary alignment
.align 4      4-byte boundary alignment
.arepeat 16   16-repeat expansion
.arepeat 32   32-repeat expansion
.aendr        End of repeat expansion of specified number
Note that the SH series cross assembler version 1.0 does not support the conditional assembler
functions.
Notes: 1. In addressing modes that use the displacements listed below (disp), the assembler
statements in this manual show the value prior to scaling (×1, ×2, and ×4) according to
the operand size. This is done to clarify the LSI operation. Actual assembler statements
should follow the rules of the assembler in question.
@(disp:4, Rn); Indirect register addressing with displacement
@(disp:8, GBR); Indirect GBR addressing with displacement
@(disp:8, PC); Indirect PC addressing with displacement
disp:8, disp:12; PC relative addressing
2. 16-bit instruction code that is not assigned as instructions is handled as an ordinary
illegal instruction and produces illegal instruction exception processing.
Also, if the FPU is put into stop status by the module stop bit, floating-point
instructions and FPU-related CPU instructions are handled as illegal instructions.
3. An ordinary illegal instruction or branched instruction (i.e., an illegal slot instruction)
that follows a BRA, BT/S or another delayed branch instruction will cause illegal
instruction exception processing.
Example 1:
....
BRA     LABEL
.data.w H'FFFF      ← Illegal slot instruction
....
[H'FFFF is an ordinary illegal instruction from the start]
Example 2:
RTE
BT/S    LABEL       ← Illegal slot instruction
7.2
CPU Instruction
7.2.1
ADD (ADD Binary): Arithmetic Instruction
Format
Abstract
Code
Cycle
T Bit
ADD Rm,Rn
Rm + Rn → Rn
0011nnnnmmmm1100
1
—
ADD #imm,Rn
Rn + imm → Rn
0111nnnniiiiiiii
1
—
Description: Adds general register Rn data to Rm data, and stores the result in Rn. 8-bit
immediate data can be added instead of Rm data. Since the 8-bit immediate data is sign-extended
to 32 bits, this instruction can add and subtract immediate data.
Operation:
ADD(long m,long n)
/* ADD Rm,Rn */
{
R[n]+=R[m];
PC+=2;
}
ADDI(long i,long n)
/* ADD #imm,Rn */
{
if ((i&0x80)==0) R[n]+=(0x000000FF & (long)i);
else R[n]+=(0xFFFFFF00 | (long)i);
PC+=2;
}
Examples:
ADD     R0,R1       ;Before execution: R0 = H'7FFFFFFF, R1 = H'00000001
                    ;After execution:  R1 = H'80000000
ADD     #H'01,R2    ;Before execution: R2 = H'00000000
                    ;After execution:  R2 = H'00000001
ADD     #H'FE,R3    ;Before execution: R3 = H'00000001
                    ;After execution:  R3 = H'FFFFFFFF
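The statement that ADD #imm,Rn can both add and subtract follows from the sign extension of the 8-bit immediate. A minimal C sketch of this behavior is given below; the function name is hypothetical, and 32-bit wraparound arithmetic is assumed.

```c
#include <stdint.h>

/* ADD #imm,Rn: sign-extend the 8-bit immediate to 32 bits, then add it to Rn. */
static uint32_t sh_add_imm(uint32_t rn, uint8_t imm)
{
    uint32_t ext = (imm & 0x80u) ? (0xFFFFFF00u | imm) : imm;
    return rn + ext;   /* wraps modulo 2^32 */
}
```

With imm = H'FE the extended value is H'FFFFFFFE (–2), so adding it to H'00000001 gives H'FFFFFFFF, as in the third example above.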
7.2.2
ADDC (ADD with Carry): Arithmetic Instruction
Format
Abstract
Code
Cycle
T Bit
ADDC Rm,Rn
Rn + Rm + T → Rn, carry → T
0011nnnnmmmm1110
1
Carry
Description: Adds Rm data and the T bit to general register Rn data, and stores the result in Rn.
The T bit changes according to the result. This instruction can add data that has more than 32 bits.
Operation:
ADDC (long m,long n)
/* ADDC Rm,Rn */
{
unsigned long tmp0,tmp1;
tmp1=R[n]+R[m];
tmp0=R[n];
R[n]=tmp1+T;
if (tmp0>tmp1) T=1;
else T=0;
if (tmp1>R[n]) T=1;
PC+=2;
}
Examples:
        CLRT            ;R0:R1 (64 bits) + R2:R3 (64 bits) = R0:R1 (64 bits)
        ADDC R3,R1      ;Before execution: T = 0, R1 = H'00000001, R3 = H'FFFFFFFF
                        ;After execution:  T = 1, R1 = H'00000000
        ADDC R2,R0      ;Before execution: T = 1, R0 = H'00000000, R2 = H'00000000
                        ;After execution:  T = 0, R0 = H'00000001
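The carry chaining used for data wider than 32 bits can be modelled in C as a rough sketch. The names addc and add64, and passing the T bit through a pointer, are assumptions made for illustration only.

```c
#include <stdint.h>

/* Hypothetical model of ADDC Rm,Rn: returns Rn + Rm + T and writes the carry to *t. */
uint32_t addc(uint32_t rn, uint32_t rm, unsigned *t)
{
    uint32_t tmp1 = rn + rm;      /* may wrap */
    uint32_t res  = tmp1 + *t;    /* may wrap again when T = 1 */
    *t = (rn > tmp1) || (tmp1 > res);
    return res;
}

/* 64-bit addition built from two ADDC steps,
   mirroring CLRT / ADDC R3,R1 / ADDC R2,R0. */
void add64(uint32_t *hi, uint32_t *lo, uint32_t hi2, uint32_t lo2)
{
    unsigned t = 0;               /* CLRT */
    *lo = addc(*lo, lo2, &t);     /* low words, carry out left in t */
    *hi = addc(*hi, hi2, &t);     /* high words plus the carry in */
}
```

Calling add64 with hi:lo = 0:1 and hi2:lo2 = 0:H'FFFFFFFF leaves hi = 1 and lo = 0.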
7.2.3 ADDV (ADD with V Flag Overflow Check): Arithmetic Instruction
Format          Abstract                        Code                Cycle   T Bit
ADDV Rm,Rn      Rn + Rm → Rn, overflow → T      0011nnnnmmmm1111    1       Overflow
Description: Adds general register Rn data to Rm data, and stores the result in Rn. If an overflow
occurs, the T bit is set to 1.
Operation:
ADDV(long m,long n)
/*ADDV Rm,Rn */
{
long dest,src,ans;
if ((long)R[n]>=0) dest=0;
else dest=1;
if ((long)R[m]>=0) src=0;
else src=1;
src+=dest;
R[n]+=R[m];
if ((long)R[n]>=0) ans=0;
else ans=1;
ans+=dest;
if (src==0 || src==2) {
if (ans==1) T=1;
else T=0;
}
else T=0;
PC+=2;
}
Examples:
        ADDV R0,R1      ;Before execution: R0 = H'00000001, R1 = H'7FFFFFFE, T = 0
                        ;After execution:  R1 = H'7FFFFFFF, T = 0
        ADDV R0,R1      ;Before execution: R0 = H'00000002, R1 = H'7FFFFFFE, T = 0
                        ;After execution:  R1 = H'80000000, T = 1
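The overflow condition can also be expressed with XOR on the sign bits. The sketch below uses that formulation rather than the sign-counting code of the Operation listing; the function name addv and its signature are assumptions.

```c
#include <stdint.h>

/* Hypothetical model of ADDV Rm,Rn: stores the wrapped sum in *rn and
   returns the T bit (1 on signed overflow, otherwise 0). */
unsigned addv(uint32_t *rn, uint32_t rm)
{
    uint32_t sum = *rn + rm;
    /* Overflow: both operands have the same sign and the sum's sign differs. */
    unsigned t = ((~(*rn ^ rm) & (*rn ^ sum)) >> 31) & 1u;
    *rn = sum;
    return t;
}
```

For R1 = H'7FFFFFFE, adding 1 gives T = 0 and adding 2 gives T = 1, which matches the two examples.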
7.2.4 AND (AND Logical): Logic Operation Instruction
Format                  Abstract                        Code                Cycle   T Bit
AND   Rm,Rn             Rn & Rm → Rn                    0010nnnnmmmm1001    1       —
AND   #imm,R0           R0 & imm → R0                   11001001iiiiiiii    1       —
AND.B #imm,@(R0,GBR)    (R0 + GBR) & imm → (R0 + GBR)   11001101iiiiiiii    3       —
Description: Logically ANDs the contents of general registers Rn and Rm, and stores the result in
Rn. The contents of general register R0 can be ANDed with zero-extended 8-bit immediate data.
8-bit memory data pointed to by GBR relative addressing can be ANDed with 8-bit immediate
data.
Note: After AND #imm,R0 is executed, the upper 24 bits of R0 are always cleared to 0.
Operation:
AND(long m,long n)
/* AND Rm,Rn */
{
R[n]&=R[m];
PC+=2;
}
ANDI(long i)
/* AND #imm,R0 */
{
R[0]&=(0x000000FF & (long)i);
PC+=2;
}
ANDM(long i)
/* AND.B #imm,@(R0,GBR) */
{
long temp;
temp=(long)Read_Byte(GBR+R[0]);
temp&=(0x000000FF & (long)i);
Write_Byte(GBR+R[0],temp);
PC+=2;
}
Examples:
        AND   R0,R1             ;Before execution: R0 = H'AAAAAAAA, R1 = H'55555555
                                ;After execution:  R1 = H'00000000
        AND   #H'0F,R0          ;Before execution: R0 = H'FFFFFFFF
                                ;After execution:  R0 = H'0000000F
        AND.B #H'80,@(R0,GBR)   ;Before execution: @(R0,GBR) = H'A5
                                ;After execution:  @(R0,GBR) = H'80
7.2.5 BF (Branch if False): Branch Instruction
Format          Abstract                            Code                Cycle   T Bit
BF label        When T = 0, disp × 2 + PC → PC;     10001011dddddddd    3/1     —
                When T = 1, nop
Description: Reads the T bit, and conditionally branches. If T = 0, it branches to the branch
destination address. If T = 1, BF executes the next instruction. The branch destination is an
address specified by PC + displacement. The PC value used for this address calculation is the
address 4 bytes after this instruction. The 8-bit displacement is sign-extended and doubled.
Consequently, the branch destination lies within –256 to +254 bytes of that PC value. If the
branch destination is outside this range, use BF with the BRA instruction or the like.
Note: When branching, three cycles; when not branching, one cycle.
Operation:
BF(long d) /* BF disp */
{
long disp;
if ((d&0x80)==0) disp=(0x000000FF & (long)d);
else disp=(0xFFFFFF00 | (long)d);
if (T==0) PC=PC+(disp<<1);
else PC+=2;
}
Example:
        CLRT            ;T is always cleared to 0
        BT   TRGET_T    ;Does not branch, because T = 0
        BF   TRGET_F    ;Branches to TRGET_F, because T = 0
        NOP             ;
        NOP             ;← The PC location is used to calculate the branch destination
                        ;  address of the BF instruction
        ..........
TRGET_F:                ;← Branch destination of the BF instruction
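The address calculation for a taken BF can be sketched in C as follows. The helper name bcond_target is hypothetical, and the sketch assumes that insn_addr is the address of the branch instruction itself, so the PC used in the calculation is insn_addr + 4.

```c
#include <stdint.h>

/* Hypothetical helper: destination of BF (or BT) when the branch is taken.
   insn_addr is the address of the branch instruction; d is the raw 8-bit
   displacement field taken from the opcode. */
uint32_t bcond_target(uint32_t insn_addr, uint8_t d)
{
    /* Sign-extend the 8-bit field to 32 bits. */
    int32_t disp = (d & 0x80u) ? (int32_t)d - 256 : (int32_t)d;
    uint32_t pc  = insn_addr + 4u;        /* PC value used for the calculation */
    return pc + (uint32_t)(disp * 2);     /* displacement is doubled */
}
```

The extreme displacement fields H'80 and H'7F give offsets of –256 and +254 bytes from that PC value.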
7.2.6 BF/S (Branch if False with Delay Slot): Branch Instruction
Format          Abstract                            Code                Cycle   T Bit
BF/S label      When T = 0, disp × 2 + PC → PC;     10001111dddddddd    2/1     —
                When T = 1, nop
Description: Reads the T bit and conditionally branches. If T = 0, it branches after executing the
next instruction. If T = 1, BF/S executes the next instruction. The branch destination is an address
specified by PC + displacement. The PC value used for this address calculation is the address
4 bytes after this instruction. The 8-bit displacement is sign-extended and doubled. Consequently,
the branch destination lies within –256 to +254 bytes of that PC value. If the branch destination
is outside this range, use BF with the BRA instruction or the like.
Note: Since this is a delayed branch instruction, the instruction immediately following is executed
before the branch. No interrupts or address errors are accepted between this instruction
and the next instruction. When the instruction immediately following is a branch
instruction, it is recognized as an illegal slot instruction. When branching, this is a
two-cycle instruction; when not branching, one cycle.
Operation:
BFS(long d)
/* BFS disp */
{
long disp;
unsigned long temp;
temp=PC;
if ((d&0x80)==0) disp=(0x000000FF & (long)d);
else disp=(0xFFFFFF00 | (long)d);
if (T==0) {
PC=PC+(disp<<1);
Delay_Slot(temp+2);
}
else PC+=2;
}
Example:
        CLRT              ;T is always 0
        BT/S TRGET_T      ;Does not branch, because T = 0
        NOP               ;
        BF/S TRGET_F      ;Branches to TRGET_F, because T = 0
        ADD  R0,R1        ;Executed before branch.
        NOP               ;← The PC location is used to calculate the branch destination
                          ;  address of the BF/S instruction
        ..........
TRGET_F:                  ;← Branch destination of the BF/S instruction
Note: When a delayed branch instruction is used, the branching operation takes place after the
slot instruction is executed, but the execution of instructions (register update, etc.) takes
place in the sequence delayed branch instruction → delayed slot instruction. For example,
even if a delayed slot instruction is used to change the register where the branch
destination address is stored, the register content previous to the change will be used as the
branch destination address.
7.2.7 BRA (Branch): Branch Instruction
Format          Abstract                Code                Cycle   T Bit
BRA label       disp × 2 + PC → PC      1010dddddddddddd    2       —
Description: Branches unconditionally after executing the instruction following this BRA
instruction. The branch destination is an address specified by PC + displacement. The PC value
used for this address calculation is the address 4 bytes after this instruction. The 12-bit
displacement is sign-extended and doubled. Consequently, the branch destination lies within
–4096 to +4094 bytes of that PC value. If the branch destination is outside this range, this
instruction must be changed to the JMP instruction. Here, a MOV instruction must be used to
transfer the destination address to a register.
Note: Since this is a delayed branch instruction, the instruction after BRA is executed before
branching. No interrupts and address errors are accepted between this instruction and the
next instruction. If the next instruction is a branch instruction, it is acknowledged as an
illegal slot instruction.
Operation:
BRA(long d)
/* BRA disp */
{
unsigned long temp;
long disp;
if ((d&0x800)==0) disp=(0x00000FFF & (long) d);
else disp=(0xFFFFF000 | (long) d);
temp=PC;
PC=PC+(disp<<1);
Delay_Slot(temp+2);
}
Example:
        BRA  TRGET      ;Branches to TRGET
        ADD  R0,R1      ;Executes ADD before branching
        NOP             ;← The PC location is used to calculate the branch destination
                        ;  address of the BRA instruction
        ..........
TRGET:                  ;← Branch destination of the BRA instruction
Note: When a delayed branch instruction is used, the branching operation takes place after the
slot instruction is executed, but the execution of instructions (register update, etc.) takes
place in the sequence delayed branch instruction → delayed slot instruction. For example,
even if a delayed slot instruction is used to change the register where the branch
destination address is stored, the register content previous to the change will be used as the
branch destination address.
7.2.8 BRAF (Branch Far): Branch Instruction
Format          Abstract            Code                Cycle   T Bit
BRAF Rm         Rm + PC → PC        0000mmmm00100011    2       —
Description: Branches unconditionally. The branch destination is PC + the 32-bit contents of the
general register Rm. The PC value used for this address calculation is the address 4 bytes after
this instruction.
Note: Since this is a delayed branch instruction, the instruction after BRAF is executed before
branching. No interrupts and address errors are accepted between this instruction and the
next instruction. If the next instruction is a branch instruction, it is acknowledged as an
illegal slot instruction.
Operation:
BRAF(long m)
/* BRAF Rm */
{
unsigned long temp;
temp=PC;
PC+=R[m];
Delay_Slot(temp+2);
}
Example:
        MOV.L #(TARGET-BRAF_PC),R0  ;Sets displacement.
        BRAF  R0                    ;Branches to TARGET
        ADD   R0,R1                 ;Executes ADD before branching
BRAF_PC:                            ;← The PC location is used to calculate the
                                    ;  branch destination address of the BRAF
                                    ;  instruction
        NOP
        ....................
TARGET:                             ;← Branch destination of the BRAF instruction
Note: When a delayed branch instruction is used, the branching operation takes place after the
slot instruction is executed, but the execution of instructions (register update, etc.) takes
place in the sequence delayed branch instruction → delayed slot instruction. For example,
even if a delayed slot instruction is used to change the register where the branch
destination address is stored, the register content previous to the change will be used as the
branch destination address.
7.2.9 BSR (Branch to Subroutine): Branch Instruction
Format          Abstract                        Code                Cycle   T Bit
BSR label       PC → PR, disp × 2 + PC → PC     1011dddddddddddd    2       —
Description: Branches to the subroutine procedure at a specified address. The PC value is stored
in the PR, and the program branches to an address specified by PC + displacement. The PC value
used for this address calculation is the address 4 bytes after this instruction. The 12-bit
displacement is sign-extended and doubled. Consequently, the branch destination lies within
–4096 to +4094 bytes of that PC value. If the branch destination is outside this range, the JSR
instruction must be used instead. With JSR, the destination address must be transferred to a
register by using the MOV instruction. This BSR instruction and the RTS instruction are used
together for a subroutine procedure call.
Note: Since this is a delayed branch instruction, the instruction after BSR is executed before
branching. No interrupts and address errors are accepted between this instruction and the
next instruction. If the next instruction is a branch instruction, it is acknowledged as an
illegal slot instruction.
Operation:
BSR(long d)
/* BSR disp */
{
long disp;
if ((d&0x800)==0) disp=(0x00000FFF & (long) d);
else disp=(0xFFFFF000 | (long) d);
PR=PC+Is_32bit_Inst(PR+2);
PC=PC+(disp<<1);
Delay_Slot(PR+2);
}
Example:
        BSR  TRGET      ;Branches to TRGET
        MOV  R3,R4      ;Executes the MOV instruction before branching
        ADD  R0,R1      ;← The PC location is used to calculate the branch destination
                        ;  address of the BSR instruction (return address for when the
                        ;  subroutine procedure is completed (PR data))
        .......
        .......
TRGET:                  ;← Procedure entrance
        MOV  R2,R3      ;
        RTS             ;Returns to the above ADD instruction
        MOV  #1,R0      ;Executes MOV before branching
Note: When a delayed branch instruction is used, the branching operation takes place after the
slot instruction is executed, but the execution of instructions (register update, etc.) takes
place in the sequence delayed branch instruction → delayed slot instruction. For example,
even if a delayed slot instruction is used to change the register where the branch
destination address is stored, the register content previous to the change will be used as the
branch destination address.
7.2.10 BSRF (Branch to Subroutine Far): Branch Instruction
Format          Abstract                    Code                Cycle   T Bit
BSRF Rm         PC → PR, Rm + PC → PC       0000mmmm00000011    2       —
Description: Branches to the subroutine procedure at a specified address after executing the
instruction following this BSRF instruction. The PC value is stored in the PR. The branch
destination is PC + the 32-bit contents of the general register Rm. The PC value used for this
address calculation is the address 4 bytes after this instruction. Used as a subroutine
procedure call in combination with RTS.
Note: Since this is a delayed branch instruction, the instruction after BSRF is executed before
branching. No interrupts or address errors are accepted between this instruction and the
next instruction. If the next instruction is a branch instruction, it is acknowledged as an
illegal slot instruction.
Operation:
BSRF(long m)
/* BSRF Rm */
{
PR=PC+Is_32bit_Inst(PR+2);
PC+=R[m];
Delay_Slot(PR+2);
}
Example:
        MOV.L #(TARGET-BSRF_PC),R0  ;Sets displacement.
        BSRF  R0                    ;Branches to TARGET
        MOV   R3,R4                 ;Executes the MOV instruction before
                                    ;  branching
BSRF_PC:                            ;← The PC location is used to calculate the
                                    ;  branch destination with BSRF.
        ADD   R0,R1
        .....
        .....
TARGET:                             ;← Procedure entrance
        MOV   R2,R3                 ;
        RTS                         ;Returns to the above ADD instruction
        MOV   #1,R0                 ;Executes MOV before branching
Note: When a delayed branch instruction is used, the branching operation takes place after the
slot instruction is executed, but the execution of instructions (register update, etc.) takes
place in the sequence delayed branch instruction → delayed slot instruction. For example,
even if a delayed slot instruction is used to change the register where the branch
destination address is stored, the register content previous to the change will be used as the
branch destination address.
7.2.11 BT (Branch if True): Branch Instruction
Format          Abstract                            Code                Cycle   T Bit
BT label        When T = 1, disp × 2 + PC → PC;     10001001dddddddd    3/1     —
                When T = 0, nop
Description: Reads the T bit, and conditionally branches. If T = 1, BT branches. If T = 0, BT
executes the next instruction. The branch destination is an address specified by PC +
displacement. The PC value used for this address calculation is the address 4 bytes after this
instruction. The 8-bit displacement is sign-extended and doubled. Consequently, the branch
destination lies within –256 to +254 bytes of that PC value. If the branch destination is outside
this range, use BT with the BRA instruction or the like.
Note: When branching, requires three cycles; when not branching, one cycle.
Operation:
BT(long d) /* BT disp */
{
long disp;
if ((d&0x80)==0) disp=(0x000000FF & (long)d);
else disp=(0xFFFFFF00 | (long)d);
if (T==1) PC=PC+(disp<<1);
else PC+=2;
}
Example:
        SETT            ;T is always 1
        BF   TRGET_F    ;Does not branch, because T = 1
        BT   TRGET_T    ;Branches to TRGET_T, because T = 1
        NOP             ;
        NOP             ;← The PC location is used to calculate the branch destination
                        ;  address of the BT instruction
        ..........
TRGET_T:                ;← Branch destination of the BT instruction
7.2.12 BT/S (Branch if True with Delay Slot): Branch Instruction
Format          Abstract                            Code                Cycle   T Bit
BT/S label      When T = 1, disp × 2 + PC → PC;     10001101dddddddd    2/1     —
                When T = 0, nop
Description: Reads the T bit and conditionally branches. If T = 1, BT/S branches after the
following instruction executes. If T = 0, BT/S executes the next instruction. The branch
destination is an address specified by PC + displacement. The PC value used for this address
calculation is the address 4 bytes after this instruction. The 8-bit displacement is
sign-extended and doubled. Consequently, the branch destination lies within –256 to +254 bytes
of that PC value. If the branch destination is outside this range, use BT/S with the BRA
instruction or the like.
Note: Since this is a delayed branch instruction, the instruction immediately following is executed
before the branch. No interrupts or address errors are accepted between this instruction
and the next instruction. When the immediately following instruction is a branch
instruction, it is recognized as an illegal slot instruction. When branching, requires two
cycles; when not branching, one cycle.
Operation:
BTS(long d)
/* BTS disp */
{
long disp;
unsigned long temp;
temp=PC;
if ((d&0x80)==0) disp=(0x000000FF & (long)d);
else disp=(0xFFFFFF00 | (long)d);
if (T==1) {
PC=PC+(disp<<1);
Delay_Slot(temp+2);
}
else PC+=2;
}
Example:
        SETT              ;T is always 1
        BF/S TARGET_F     ;Does not branch, because T = 1
        NOP               ;
        BT/S TARGET_T     ;Branches to TARGET_T, because T = 1
        ADD  R0,R1        ;Executes before branching.
        NOP               ;← The PC location is used to calculate the branch destination
                          ;  address of the BT/S instruction
        ..........
TARGET_T:                 ;← Branch destination of the BT/S instruction
Note: When a delayed branch instruction is used, the branching operation takes place after the
slot instruction is executed, but the execution of instructions (register update, etc.) takes
place in the sequence delayed branch instruction → delayed slot instruction. For example,
even if a delayed slot instruction is used to change the register where the branch
destination address is stored, the register content previous to the change will be used as the
branch destination address.
7.2.13 CLRMAC (Clear MAC Register): System Control Instruction
Format          Abstract            Code                Cycle   T Bit
CLRMAC          0 → MACH, MACL      0000000000101000    1       —
Description: Clears the MACH and MACL registers.
Operation:
CLRMAC()
/* CLRMAC */
{
MACH=0;
MACL=0;
PC+=2;
}
Example:
        CLRMAC              ;Clears and initializes the MAC register
        MAC.W @R0+,@R1+     ;Multiply and accumulate operation
        MAC.W @R0+,@R1+     ;
7.2.14 CLRT (Clear T Bit): System Control Instruction
Format          Abstract    Code                Cycle   T Bit
CLRT            0 → T       0000000000001000    1       0
Description: Clears the T bit.
Operation:
CLRT() /* CLRT */
{
T=0;
PC+=2;
}
Example:
        CLRT    ;Before execution: T = 1
                ;After execution:  T = 0
7.2.15 CMP/cond (Compare Conditionally): Arithmetic Instruction
Format             Abstract                                       Code                Cycle   T Bit
CMP/EQ  Rm,Rn      When Rn = Rm, 1 → T                            0011nnnnmmmm0000    1       Comparison result
CMP/GE  Rm,Rn      When signed and Rn ≥ Rm, 1 → T                 0011nnnnmmmm0011    1       Comparison result
CMP/GT  Rm,Rn      When signed and Rn > Rm, 1 → T                 0011nnnnmmmm0111    1       Comparison result
CMP/HI  Rm,Rn      When unsigned and Rn > Rm, 1 → T               0011nnnnmmmm0110    1       Comparison result
CMP/HS  Rm,Rn      When unsigned and Rn ≥ Rm, 1 → T               0011nnnnmmmm0010    1       Comparison result
CMP/PL  Rn         When Rn > 0, 1 → T                             0100nnnn00010101    1       Comparison result
CMP/PZ  Rn         When Rn ≥ 0, 1 → T                             0100nnnn00010001    1       Comparison result
CMP/STR Rm,Rn      When a byte in Rn equals a byte in Rm, 1 → T   0010nnnnmmmm1100    1       Comparison result
CMP/EQ  #imm,R0    When R0 = imm, 1 → T                           10001000iiiiiiii    1       Comparison result
Description: Compares general register Rn data with Rm data, and sets the T bit to 1 if a specified
condition (cond) is satisfied. The T bit is cleared to 0 if the condition is not satisfied. The Rn data
does not change. The following eight conditions can be specified. Conditions PZ and PL are the
results of comparisons between Rn and 0. Sign-extended 8-bit immediate data can also be
compared with R0 by using condition EQ. Here, R0 data does not change. Table 7.2 shows the
mnemonics for the conditions.
Table 7.2 CMP Mnemonics
Mnemonics           Condition
CMP/EQ  Rm,Rn       If Rn = Rm, T = 1
CMP/GE  Rm,Rn       If Rn ≥ Rm with signed data, T = 1
CMP/GT  Rm,Rn       If Rn > Rm with signed data, T = 1
CMP/HI  Rm,Rn       If Rn > Rm with unsigned data, T = 1
CMP/HS  Rm,Rn       If Rn ≥ Rm with unsigned data, T = 1
CMP/PL  Rn          If Rn > 0, T = 1
CMP/PZ  Rn          If Rn ≥ 0, T = 1
CMP/STR Rm,Rn       If a byte in Rn equals a byte in Rm, T = 1
CMP/EQ  #imm,R0     If R0 = imm, T = 1
Operation:
CMPEQ(long m,long n)
/* CMP_EQ Rm,Rn */
{
if (R[n]==R[m]) T=1;
else T=0;
PC+=2;
}
CMPGE(long m,long n)
/* CMP_GE Rm,Rn */
{
if ((long)R[n]>=(long)R[m]) T=1;
else T=0;
PC+=2;
}
CMPGT(long m,long n)
/* CMP_GT Rm,Rn */
{
if ((long)R[n]>(long)R[m]) T=1;
else T=0;
PC+=2;
}
CMPHI(long m,long n)
/* CMP_HI Rm,Rn */
{
if ((unsigned long)R[n]>(unsigned long)R[m]) T=1;
else T=0;
PC+=2;
}
CMPHS(long m,long n)
/* CMP_HS Rm,Rn */
{
if ((unsigned long)R[n]>=(unsigned long)R[m]) T=1;
else T=0;
PC+=2;
}
CMPPL(long n)
/* CMP_PL Rn */
{
if ((long)R[n]>0) T=1;
else T=0;
PC+=2;
}
CMPPZ(long n) /* CMP_PZ Rn */
{
if ((long)R[n]>=0) T=1;
else T=0;
PC+=2;
}
CMPSTR(long m,long n) /* CMP_STR Rm,Rn */
{
unsigned long temp;
long HH,HL,LH,LL;
temp=R[n]^R[m];
HH=(temp>>24)&0x000000FF;
HL=(temp>>16)&0x000000FF;
LH=(temp>>8)&0x000000FF;
LL=temp&0x000000FF;
HH=HH&&HL&&LH&&LL;
if (HH==0) T=1;
else T=0;
PC+=2;
}
CMPIM(long i)
/* CMP_EQ #imm,R0 */
{
long imm;
if ((i&0x80)==0) imm=(0x000000FF & (long)i);
else imm=(0xFFFFFF00 | (long)i);
if (R[0]==imm) T=1;
else T=0;
PC+=2;
}
Example:
        CMP/GE  R0,R1       ;R0 = H'7FFFFFFF, R1 = H'80000000
        BT      TRGET_T     ;Does not branch because T = 0
        CMP/HS  R0,R1       ;R0 = H'7FFFFFFF, R1 = H'80000000
        BT      TRGET_T     ;Branches because T = 1
        CMP/STR R2,R3       ;R2 = “ABCD”, R3 = “XYCZ”
        BT      TRGET_T     ;Branches because T = 1
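The difference between the signed and unsigned conditions, and the byte-wise test made by CMP/STR, can be sketched in C. The function names are hypothetical and each function simply returns the resulting T bit. The sketch assumes a two's complement target for the cast to int32_t.

```c
#include <stdint.h>

/* Hypothetical models of three CMP conditions; each returns the T bit. */

/* CMP/GE Rm,Rn: signed comparison (assumes two's complement conversion). */
unsigned cmp_ge(uint32_t rn, uint32_t rm) { return (int32_t)rn >= (int32_t)rm; }

/* CMP/HS Rm,Rn: unsigned comparison. */
unsigned cmp_hs(uint32_t rn, uint32_t rm) { return rn >= rm; }

/* CMP/STR Rm,Rn: T = 1 if any of the four byte positions holds equal bytes. */
unsigned cmp_str(uint32_t rn, uint32_t rm)
{
    uint32_t x = rn ^ rm;    /* an equal byte position becomes 0x00 */
    return ((x >> 24) & 0xFFu) == 0 || ((x >> 16) & 0xFFu) == 0 ||
           ((x >>  8) & 0xFFu) == 0 || (x & 0xFFu) == 0;
}
```

With Rn = H'80000000 and Rm = H'7FFFFFFF, cmp_ge returns 0 and cmp_hs returns 1. The strings "ABCD" (H'41424344) and "XYCZ" (H'5859435A) share the byte "C" in the same position, so cmp_str returns 1.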
7.2.16 DIV0S (Divide Step 0 as Signed): Arithmetic Instruction
Format          Abstract                                    Code                Cycle   T Bit
DIV0S Rm,Rn     MSB of Rn → Q, MSB of Rm → M, M^Q → T       0010nnnnmmmm0111    1       Calculation result
Description: DIV0S is an initialization instruction for signed division. It finds the quotient by
repeatedly dividing in combination with the DIV1 or another instruction that divides for each bit
after this instruction. See the description given with DIV1 for more information.
Operation:
DIV0S(long m,long n)
/* DIV0S Rm,Rn */
{
if ((R[n]&0x80000000)==0) Q=0;
else Q=1;
if ((R[m]&0x80000000)==0) M=0;
else M=1;
T=!(M==Q);
PC+=2;
}
Example: See DIV1.
7.2.17 DIV0U (Divide Step 0 as Unsigned): Arithmetic Instruction
Format          Abstract        Code                Cycle   T Bit
DIV0U           0 → M/Q/T       0000000000011001    1       0
Description: DIV0U is an initialization instruction for unsigned division. It finds the quotient by
repeatedly dividing in combination with the DIV1 or another instruction that divides for each bit
after this instruction. See the description given with DIV1 for more information.
Operation:
DIV0U() /* DIV0U */
{
M=Q=T=0;
PC+=2;
}
Example: See DIV1.
7.2.18 DIV1 (Divide 1 Step): Arithmetic Instruction
Format          Abstract                        Code                Cycle   T Bit
DIV1 Rm,Rn      1 step division (Rn ÷ Rm)       0011nnnnmmmm0100    1       Calculation result
Description: Uses single-step division to divide one bit of the 32-bit data in general register Rn
(dividend) by Rm data (divisor). It finds a quotient through repetition either independently or used
in combination with other instructions. During this repetition, do not rewrite the specified register
or the M, Q, and T bits.
In one-step division, the dividend is shifted one bit left, the divisor is subtracted and the quotient
bit reflected in the Q bit according to the status (positive or negative). To find the remainder in a
division, first find the quotient using a DIV1 instruction, then find the remainder as follows:
(dividend) – (divisor) × (quotient) = (remainder)
Zero division, overflow detection, and remainder operation are not supported. Check for zero
division and overflow division before dividing.
Find the remainder by first finding the product of the divisor and the quotient obtained and then
subtracting it from the dividend. That is, first initialize with DIV0S or DIV0U. Repeat DIV1 for
each bit of the divisor to obtain the quotient. When the quotient requires 17 or more bits, place
ROTCL before DIV1. For the division sequence, see the following examples.
Operation:
DIV1(long m,long n)
/* DIV1 Rm,Rn */
{
unsigned long tmp0;
unsigned char old_q,tmp1;
old_q=Q;
Q=(unsigned char)((0x80000000 & R[n])!=0);
R[n]<<=1;
R[n]|=(unsigned long)T;
switch(old_q){
case 0:switch(M){
case 0:tmp0=R[n];
R[n]-=R[m];
tmp1=(R[n]>tmp0);
switch(Q){
case 0:Q=tmp1;
break;
case 1:Q=(unsigned char)(tmp1==0);
break;
}
break;
case 1:tmp0=R[n];
R[n]+=R[m];
tmp1=(R[n]<tmp0);
switch(Q){
case 0:Q=(unsigned char)(tmp1==0);
break;
case 1:Q=tmp1;
break;
}
break;
}
break;
case 1:switch(M){
case 0:tmp0=R[n];
R[n]+=R[m];
tmp1=(R[n]<tmp0);
switch(Q){
case 0:Q=tmp1;
break;
case 1:Q=(unsigned char)(tmp1==0);
break;
}
break;
case 1:tmp0=R[n];
R[n]-=R[m];
tmp1=(R[n]>tmp0);
switch(Q){
case 0:Q=(unsigned char)(tmp1==0);
break;
case 1:Q=tmp1;
break;
}
break;
}
break;
}
T=(Q==M);
PC+=2;
}
Example 1:
                        ;R1 (32 bits) / R0 (16 bits) = R1 (16 bits):Unsigned
        SHLL16   R0     ;Upper 16 bits = divisor, lower 16 bits = 0
        TST      R0,R0  ;Zero division check
        BT       ZERO_DIV       ;
        CMP/HS   R0,R1  ;Overflow check
        BT       OVER_DIV       ;
        DIV0U           ;Flag initialization
        .arepeat 16     ;
        DIV1     R0,R1  ;Repeat 16 times
        .aendr          ;
        ROTCL    R1     ;
        EXTU.W   R1,R1  ;R1 = Quotient
Example 2:
                        ;R1:R2 (64 bits)/R0 (32 bits) = R2 (32 bits):Unsigned
        TST      R0,R0  ;Zero division check
        BT       ZERO_DIV       ;
        CMP/HS   R0,R1  ;Overflow check
        BT       OVER_DIV       ;
        DIV0U           ;Flag initialization
        .arepeat 32     ;
        ROTCL    R2     ;Repeat 32 times
        DIV1     R0,R1  ;
        .aendr          ;
        ROTCL    R2     ;R2 = Quotient
Example 3:
                        ;R1 (16 bits)/R0 (16 bits) = R1 (16 bits):Signed
        SHLL16   R0     ;Upper 16 bits = divisor, lower 16 bits = 0
        EXTS.W   R1,R1  ;Sign-extends the dividend to 32 bits
        XOR      R2,R2  ;R2 = 0
        MOV      R1,R3  ;
        ROTCL    R3     ;
        SUBC     R2,R1  ;Decrements if the dividend is negative
        DIV0S    R0,R1  ;Flag initialization
        .arepeat 16     ;
        DIV1     R0,R1  ;Repeat 16 times
        .aendr          ;
        EXTS.W   R1,R1  ;
        ROTCL    R1     ;R1 = quotient (one’s complement)
        ADDC     R2,R1  ;Increments and takes the two’s complement if the MSB of the
                        ;  quotient is 1
        EXTS.W   R1,R1  ;R1 = quotient (two’s complement)
Example 4:
                        ;R2 (32 bits) / R0 (32 bits) = R2 (32 bits):Signed
        MOV      R2,R3  ;
        ROTCL    R3     ;
        SUBC     R1,R1  ;Sign-extends the dividend to 64 bits (R1:R2)
        XOR      R3,R3  ;R3 = 0
        SUBC     R3,R2  ;Decrements and takes the one’s complement if the dividend is
                        ;  negative
        DIV0S    R0,R1  ;Flag initialization
        .arepeat 32     ;
        ROTCL    R2     ;Repeat 32 times
        DIV1     R0,R1  ;
        .aendr          ;
        ROTCL    R2     ;R2 = Quotient (one’s complement)
        ADDC     R3,R2  ;Increments and takes the two’s complement if the MSB of the
                        ;  quotient is 1. R2 = Quotient (two’s complement)
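The zero-division check, the overflow check and the remainder formula (dividend) – (divisor) × (quotient) can be sketched in C for the unsigned 32-bit / 16-bit case of Example 1. This sketch uses C's / operator in place of the DIV1 sequence, so it illustrates the checks and the remainder step only. The names div_status and udiv32_16 are assumptions.

```c
#include <stdint.h>

enum div_status { DIV_OK, DIV_ZERO, DIV_OVERFLOW };

/* Hypothetical wrapper: 32-bit / 16-bit unsigned division with a 16-bit quotient. */
enum div_status udiv32_16(uint32_t dividend, uint16_t divisor,
                          uint16_t *quotient, uint16_t *remainder)
{
    uint32_t shifted = (uint32_t)divisor << 16;     /* SHLL16 R0 */
    if (shifted == 0) return DIV_ZERO;              /* TST R0,R0 / BT ZERO_DIV */
    if (dividend >= shifted) return DIV_OVERFLOW;   /* CMP/HS R0,R1 / BT OVER_DIV */

    uint32_t q = dividend / divisor;                /* stands in for 16 x DIV1 */
    *quotient  = (uint16_t)q;
    /* remainder = dividend - divisor x quotient */
    *remainder = (uint16_t)(dividend - (uint32_t)divisor * q);
    return DIV_OK;
}
```

The overflow test rejects any dividend whose quotient would not fit in 16 bits, which is the same condition the CMP/HS check detects after the divisor has been shifted into the upper half of R0.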
7.2.19 DMULS.L (Double-Length Multiply as Signed): Arithmetic Instruction
Format              Abstract                            Code                Cycle     T Bit
DMULS.L Rm,Rn       With sign, Rn × Rm → MACH, MACL     0011nnnnmmmm1101    2 to 4    —
Description: Performs 32-bit multiplication of the contents of general registers Rn and Rm, and
stores the 64-bit result in the MACH (upper 32 bits) and MACL (lower 32 bits) registers. The
operation is a signed arithmetic operation.
Operation:
DMULS(long m,long n) /* DMULS.L Rm,Rn */
{
unsigned long RnL,RnH,RmL,RmH,Res0,Res1,Res2;
unsigned long temp0,temp1,temp2,temp3;
long tempm,tempn,fnLmL;
tempn=(long)R[n];
tempm=(long)R[m];
if (tempn<0) tempn=0-tempn;
if (tempm<0) tempm=0-tempm;
if ((long)(R[n]^R[m])<0) fnLmL=-1;
else fnLmL=0;
temp1=(unsigned long)tempn;
temp2=(unsigned long)tempm;
RnL=temp1&0x0000FFFF;
RnH=(temp1>>16)&0x0000FFFF;
RmL=temp2&0x0000FFFF;
RmH=(temp2>>16)&0x0000FFFF;
temp0=RmL*RnL;
temp1=RmH*RnL;
temp2=RmL*RnH;
temp3=RmH*RnH;
Res2=0;
Res1=temp1+temp2;
if (Res1<temp1) Res2+=0x00010000;
temp1=(Res1<<16)&0xFFFF0000;
Res0=temp0+temp1;
if (Res0<temp0) Res2++;
Res2=Res2+((Res1>>16)&0x0000FFFF)+temp3;
if (fnLmL<0) {
Res2=~Res2;
if (Res0==0)
Res2++;
else
Res0=(~Res0)+1;
}
MACH=Res2;
MACL=Res0;
PC+=2;
}
Example:
        DMULS.L R0,R1       ;Before execution: R0 = H'FFFFFFFE, R1 = H'00005555
                            ;After execution:  MACH = H'FFFFFFFF, MACL = H'FFFF5556
        STS     MACH,R0     ;Operation result (top)
        STS     MACL,R0     ;Operation result (bottom)
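On a host with 64-bit integers, the signed result can be cross-checked with a much shorter model than the partial-product code in the Operation listing. The function name dmuls_l is an assumption, and the sketch relies on two's complement conversion to int32_t.

```c
#include <stdint.h>

/* Hypothetical model of DMULS.L Rm,Rn using a 64-bit intermediate product. */
void dmuls_l(uint32_t rn, uint32_t rm, uint32_t *mach, uint32_t *macl)
{
    /* Interpret both operands as signed 32-bit values, then widen. */
    int64_t prod = (int64_t)(int32_t)rn * (int64_t)(int32_t)rm;
    *mach = (uint32_t)((uint64_t)prod >> 32);   /* upper 32 bits */
    *macl = (uint32_t)(uint64_t)prod;           /* lower 32 bits */
}
```

For H'FFFFFFFE (–2) and H'00005555 the product is –H'AAAA, that is MACH = H'FFFFFFFF and MACL = H'FFFF5556.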
7.2.20 DMULU.L (Double-Length Multiply as Unsigned): Arithmetic Instruction
Format              Abstract                                Code                Cycle     T Bit
DMULU.L Rm,Rn       Without sign, Rn × Rm → MACH, MACL      0011nnnnmmmm0101    2 to 4    —
Description: Performs 32-bit multiplication of the contents of general registers Rn and Rm, and
stores the 64-bit result in the MACH (upper 32 bits) and MACL (lower 32 bits) registers. The
operation is an unsigned arithmetic operation.
Operation:
DMULU(long m,long n) /* DMULU.L Rm,Rn */
{
unsigned long RnL,RnH,RmL,RmH,Res0,Res1,Res2;
unsigned long temp0,temp1,temp2,temp3;
RnL=R[n]&0x0000FFFF;
RnH=(R[n]>>16)&0x0000FFFF;
RmL=R[m]&0x0000FFFF;
RmH=(R[m]>>16)&0x0000FFFF;
temp0=RmL*RnL;
temp1=RmH*RnL;
temp2=RmL*RnH;
temp3=RmH*RnH;
Res2=0;
Res1=temp1+temp2;
if (Res1<temp1) Res2+=0x00010000;
temp1=(Res1<<16)&0xFFFF0000;
Res0=temp0+temp1;
if (Res0<temp0) Res2++;
Res2=Res2+((Res1>>16)&0x0000FFFF)+temp3;
MACH=Res2;
MACL=Res0;
PC+=2;
}
Example:
        DMULU.L R0,R1       ;Before execution: R0 = H'FFFFFFFE, R1 = H'00005555
                            ;After execution:  MACH = H'00005554, MACL = H'FFFF5556
        STS     MACH,R0     ;Operation result (top)
        STS     MACL,R0     ;Operation result (bottom)
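A short host-side model makes it easy to check an unsigned result. The function name dmulu_l is an assumption. It uses a 64-bit intermediate instead of the four 16-bit partial products of the Operation listing.

```c
#include <stdint.h>

/* Hypothetical model of DMULU.L Rm,Rn using a 64-bit intermediate product. */
void dmulu_l(uint32_t rn, uint32_t rm, uint32_t *mach, uint32_t *macl)
{
    uint64_t prod = (uint64_t)rn * rm;    /* both operands treated as unsigned */
    *mach = (uint32_t)(prod >> 32);       /* upper 32 bits */
    *macl = (uint32_t)prod;               /* lower 32 bits */
}
```

For operands H'FFFFFFFE and H'00005555 the unsigned product is H'00005554FFFF5556. The lower word is the same as in the signed multiplication of these operands, and only the upper word differs.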
7.2.21 DT (Decrement and Test): Arithmetic Instruction
Format      Abstract                                Code                Cycle   T Bit
DT Rn       Rn – 1 → Rn; When Rn is 0, 1 → T,       0100nnnn00010000    1       Comparison result
            when Rn is nonzero, 0 → T
Description: The contents of general register Rn are decremented by 1 and the result compared to
0 (zero). When the result is 0, the T bit is set to 1. When the result is not zero, the T bit is set to 0.
Operation:
DT(long n) /* DT Rn */
{
R[n]--;
if (R[n]==0) T=1;
else T=0;
PC+=2;
}
Example:
        MOV  #4,R5      ;Sets the number of loops.
LOOP:   ADD  R0,R1      ;
        DT   R5         ;Decrements the R5 value and checks whether it has become 0.
        BF   LOOP       ;Branches to LOOP if T = 0. (In this example, loops 4 times.)
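The loop count produced by a DT/BF pair can be checked with a small C model. The names dt and dt_loop_iterations are hypothetical, and the loop body is reduced to a counter.

```c
#include <stdint.h>

/* Hypothetical model of DT Rn: decrements *rn and returns the T bit. */
unsigned dt(uint32_t *rn)
{
    *rn -= 1u;
    return *rn == 0u;
}

/* Counts how often a loop closed by DT/BF runs for a given initial count. */
unsigned dt_loop_iterations(uint32_t count)
{
    unsigned iterations = 0;
    unsigned t;
    do {
        iterations++;       /* loop body, e.g. ADD R0,R1 */
        t = dt(&count);     /* DT R5 */
    } while (t == 0);       /* BF LOOP: repeat while T = 0 */
    return iterations;
}
```

Because the test follows the decrement, an initial count of 4 runs the body four times. An initial count of 0 would wrap to H'FFFFFFFF and loop 2^32 times, so the counter should be checked beforehand if zero is possible.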
7.2.22 EXTS (Extend as Signed): Arithmetic Instruction
Format              Abstract                            Code                Cycle   T Bit
EXTS.B Rm,Rn        Sign-extend Rm from byte → Rn       0110nnnnmmmm1110    1       —
EXTS.W Rm,Rn        Sign-extend Rm from word → Rn       0110nnnnmmmm1111    1       —
Description: Sign-extends general register Rm data, and stores the result in Rn. If byte length is
specified, the bit 7 value of Rm is copied into bits 8 to 31 of Rn. If word length is specified, the bit
15 value of Rm is copied into bits 16 to 31 of Rn.
Operation:
EXTSB(long m,long n)
/* EXTS.B Rm,Rn */
{
R[n]=R[m];
if ((R[m]&0x00000080)==0) R[n]&=0x000000FF;
else R[n]|=0xFFFFFF00;
PC+=2;
}
EXTSW(long m,long n)
/* EXTS.W Rm,Rn */
{
R[n]=R[m];
if ((R[m]&0x00008000)==0) R[n]&=0x0000FFFF;
else R[n]|=0xFFFF0000;
PC+=2;
}
Examples:
        EXTS.B R0,R1    ;Before execution: R0 = H'00000080
                        ;After execution:  R1 = H'FFFFFF80
        EXTS.W R0,R1    ;Before execution: R0 = H'00008000
                        ;After execution:  R1 = H'FFFF8000
7.2.23 EXTU (Extend as Unsigned): Arithmetic Instruction
Format              Abstract                            Code                Cycle   T Bit
EXTU.B Rm,Rn        Zero-extend Rm from byte → Rn       0110nnnnmmmm1100    1       —
EXTU.W Rm,Rn        Zero-extend Rm from word → Rn       0110nnnnmmmm1101    1       —
Description: Zero-extends general register Rm data, and stores the result in Rn. If byte length is
specified, 0s are written in bits 8 to 31 of Rn. If word length is specified, 0s are written in bits 16
to 31 of Rn.
Operation:
EXTUB(long m,long n) /* EXTU.B Rm,Rn */
{
R[n]=R[m];
R[n]&=0x000000FF;
PC+=2;
}
EXTUW(long m,long n) /* EXTU.W Rm,Rn */
{
R[n]=R[m];
R[n]&=0x0000FFFF;
PC+=2;
}
Examples:
        EXTU.B R0,R1    ;Before execution: R0 = H'FFFFFF80
                        ;After execution:  R1 = H'00000080
        EXTU.W R0,R1    ;Before execution: R0 = H'FFFF8000
                        ;After execution:  R1 = H'00008000
7.2.24 JMP (Jump): Branch Instruction
Class: Delayed branch instruction
Format          Abstract        Code                Cycle   T Bit
JMP @Rm         Rm → PC         0100mmmm00101011    2       —
Description: Branches unconditionally to the address specified by register indirect addressing.
The branch destination is an address specified by the 32-bit data in general register Rm.
Note: Since this is a delayed branch instruction, the instruction after JMP is executed before
branching. No interrupts or address errors are accepted between this instruction and the
next instruction. If the next instruction is a branch instruction, it is acknowledged as an
illegal slot instruction.
Operation:
JMP(long m)
/* JMP @Rm */
{
unsigned long temp;
temp=PC;
PC=R[m]+4;
Delay_Slot(temp+2);
}
Example:
            MOV.L   JMP_TABLE,R0    ;Address of R0 = TRGET
            JMP     @R0             ;Branches to TRGET
            MOV     R0,R1           ;Executes MOV before branching
            .align  4
JMP_TABLE:  .data.l TRGET           ;Jump table
            .................
TRGET:      ADD     #1,R1           ;← Branch destination
Note: When a delayed branch instruction is used, the branching operation takes place after the
slot instruction is executed, but the execution of instructions (register update, etc.) takes
place in the sequence delayed branch instruction → delayed slot instruction. For example,
even if a delayed slot instruction is used to change the register where the branch
destination address is stored, the register content previous to the change will be used as the
branch destination address.
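The ordering described in the note above can be illustrated with a small C model of JMP @Rm. Everything here is a sketch under assumptions: the cpu structure, the slot_fn callback type and the function names are hypothetical and serve only to show that the register is read before the delay slot instruction runs.

```c
#include <stdint.h>

/* Hypothetical, minimal CPU state for illustration only. */
struct cpu { uint32_t r[16]; uint32_t pc; };

typedef void (*slot_fn)(struct cpu *);

/* Hypothetical model of JMP @Rm with its delay slot: the branch instruction
   reads Rm first, then the slot instruction runs, then the branch is taken. */
void jmp_delayed(struct cpu *c, unsigned m, slot_fn slot)
{
    uint32_t target = c->r[m];   /* register content before the slot executes */
    slot(c);                     /* the slot instruction may overwrite r[m] */
    c->pc = target;              /* branch uses the earlier value */
}

/* Example slot instruction: MOV #0,R0 */
void slot_clear_r0(struct cpu *c) { c->r[0] = 0; }
```

If R0 holds H'00002000 and the delay slot clears R0, the branch still goes to H'00002000, while R0 reads 0 afterwards.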
7.2.25 JSR (Jump to Subroutine): Branch Instruction (Class: Delayed Branch Instruction)
Format          Abstract                Code                Cycle   T Bit
JSR @Rm         PC → PR, Rm → PC        0100mmmm00001011    2       —
Description: Branches to the subroutine procedure at the address specified by register indirect
addressing. The PC value is stored in the PR. The jump destination is an address specified by the
32-bit data in general register Rm. The stored/saved PC is the address four bytes after this
instruction. The JSR instruction and RTS instruction are used together for subroutine procedure
calls.
Note: Since this is a delayed branch instruction, the instruction after JSR is executed before
branching. No interrupts and address errors are accepted between this instruction and the
next instruction. If the next instruction is a branch instruction, it is acknowledged as an
illegal slot instruction.
Operation:
JSR(long m)
/* JSR @Rm */
{
PR=PC;
PC=R[m]+4;
Delay_Slot(PR+2);
}
Example:
        MOV.L   JSR_TABLE,R0    ;R0 = address of TRGET
        JSR     @R0             ;Branches to TRGET
        XOR     R1,R1           ;Executes XOR before branching
        ADD     R0,R1           ;← Return address for when the subroutine
                                ;  procedure is completed (PR data)
        ...........
        .align  4
JSR_TABLE: .data.l TRGET        ;Jump table
TRGET:  NOP                     ;← Procedure entrance
        MOV     R2,R3           ;
        RTS                     ;Returns to the above ADD instruction
        MOV     #70,R1          ;Executes MOV before RTS
Note: When a delayed branch instruction is used, the branching operation takes place after the
slot instruction is executed, but the execution of instructions (register update, etc.) takes
place in the sequence delayed branch instruction → delayed slot instruction. For example,
even if a delayed slot instruction is used to change the register where the branch
destination address is stored, the register content previous to the change will be used as the
branch destination address.
7.2.26
LDC (Load to Control Register): System Control Instruction (Class: Interrupt
Disabled Instruction)
Format              Abstract                    Code                Cycle   T Bit
LDC   Rm,SR         Rm → SR                     0100mmmm00001110    1       LSB
LDC   Rm,GBR        Rm → GBR                    0100mmmm00011110    1       —
LDC   Rm,VBR        Rm → VBR                    0100mmmm00101110    1       —
LDC.L @Rm+,SR       (Rm) → SR, Rm + 4 → Rm      0100mmmm00000111    3       LSB
LDC.L @Rm+,GBR      (Rm) → GBR, Rm + 4 → Rm     0100mmmm00010111    3       —
LDC.L @Rm+,VBR      (Rm) → VBR, Rm + 4 → Rm     0100mmmm00100111    3       —
Description: Stores the source operand into control register SR, GBR, or VBR.
Note: No interrupts are accepted between this instruction and the next instruction. Address errors
are accepted.
Operation:
LDCSR(long m)
/* LDC Rm,SR */
{
SR=R[m]&0x0FFF0FFF;
PC+=2;
}
LDCGBR(long m) /* LDC Rm,GBR */
{
GBR=R[m];
PC+=2;
}
LDCVBR(long m) /* LDC Rm,VBR */
{
VBR=R[m];
PC+=2;
}
LDCMSR(long m) /* LDC.L @Rm+,SR */
{
SR=Read_Long(R[m])&0x0FFF0FFF;
R[m]+=4;
PC+=2;
}
LDCMGBR(long m)
/* LDC.L @Rm+,GBR */
{
GBR=Read_Long(R[m]);
R[m]+=4;
PC+=2;
}
LDCMVBR(long m)
/* LDC.L @Rm+,VBR */
{
VBR=Read_Long(R[m]);
R[m]+=4;
PC+=2;
}
Examples:
LDC   R0,SR      ;Before execution: R0 = H'FFFFFFFF, SR = H'00000000
                 ;After execution:  SR = H'0FFF0FFF
LDC.L @R15+,GBR  ;Before execution: R15 = H'10000000
                 ;After execution:  R15 = H'10000004, GBR = @H'10000000
7.2.27
LDS (Load to System Register): System Control Instruction
Class: Interrupt disabled instruction
Format              Abstract                    Code                Cycle   T Bit
LDS   Rm,MACH       Rm → MACH                   0100mmmm00001010    1       —
LDS   Rm,MACL       Rm → MACL                   0100mmmm00011010    1       —
LDS   Rm,PR         Rm → PR                     0100mmmm00101010    1       —
LDS.L @Rm+,MACH     (Rm) → MACH, Rm + 4 → Rm    0100mmmm00000110    1       —
LDS.L @Rm+,MACL     (Rm) → MACL, Rm + 4 → Rm    0100mmmm00010110    1       —
LDS.L @Rm+,PR       (Rm) → PR, Rm + 4 → Rm      0100mmmm00100110    1       —
Description: Stores the source operand into the system register MACH, MACL, or PR.
Note: No interrupts are accepted between this instruction and the next instruction. Address errors
are accepted.
Operation:
LDSMACH(long m)
/* LDS Rm,MACH */
{
MACH=R[m];
PC+=2;
}
LDSMACL(long m)
/* LDS Rm,MACL */
{
MACL=R[m];
PC+=2;
}
LDSPR(long m)
/* LDS Rm,PR */
{
PR=R[m];
PC+=2;
}
LDSMMACH(long m)
/* LDS.L @Rm+,MACH */
{
MACH=Read_Long(R[m]);
R[m]+=4;
PC+=2;
}
LDSMMACL(long m)
/* LDS.L @Rm+,MACL */
{
MACL=Read_Long(R[m]);
R[m]+=4;
PC+=2;
}
LDSMPR(long m) /* LDS.L @Rm+,PR */
{
PR=Read_Long(R[m]);
R[m]+=4;
PC+=2;
}
Examples:
LDS   R0,PR       ;Before execution: R0 = H'12345678, PR = H'00000000
                  ;After execution:  PR = H'12345678
LDS.L @R15+,MACL  ;Before execution: R15 = H'10000000
                  ;After execution:  R15 = H'10000004, MACL = @H'10000000
7.2.28
MAC.L (Multiply and Accumulate Calculation Long): Arithmetic Instruction
Format              Abstract                                    Code                Cycle       T Bit
MAC.L @Rm+,@Rn+     Signed operation, (Rn) × (Rm) + MAC → MAC   0000nnnnmmmm1111    3/(2 to 4)  —
Description: Performs signed multiplication of the 32-bit operands read from the addresses held in
general registers Rm and Rn. The 64-bit result is added to the contents of the MAC register, and
the final result is stored in the MAC register. Each time an operand is read, the register used as
its address (Rm or Rn) is incremented by four.
When the S bit is cleared to 0, the 64-bit result is stored in the coupled MACH and MACL
registers. When the S bit is set to 1, addition to the MAC register is a saturation operation of 48 bits
starting from the LSB. For the saturation operation, only the lower 48 bits of the MAC register
are enabled and the result is limited to the range H'FFFF800000000000 (minimum) to
H'00007FFFFFFFFFFF (maximum).
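The 48-bit saturation described above can be sketched with 64-bit integer arithmetic. This is a simplified illustration, not the hardware algorithm: the name mac_l_sat48 and the MAC48_MIN/MAC48_MAX macros are hypothetical, the MAC register is modeled as a single int64_t, and the accumulator is assumed to already lie inside the 48-bit range on entry.

```c
#include <stdint.h>

#define MAC48_MAX ((int64_t)0x00007FFFFFFFFFFFLL)  /* H'00007FFFFFFFFFFF */
#define MAC48_MIN (-MAC48_MAX - 1)                 /* H'FFFF800000000000 */

/* One MAC.L step with S = 1.
   Assumes mac is already within [MAC48_MIN, MAC48_MAX], so the
   64-bit sum below cannot overflow int64_t. */
int64_t mac_l_sat48(int64_t mac, int32_t rn_val, int32_t rm_val)
{
    int64_t prod = (int64_t)rn_val * (int64_t)rm_val;  /* signed 32 x 32 -> 64 */
    int64_t sum  = mac + prod;

    if (sum > MAC48_MAX) return MAC48_MAX;  /* clamp positive overflow */
    if (sum < MAC48_MIN) return MAC48_MIN;  /* clamp negative overflow */
    return sum;
}
```

With S = 0 the hardware keeps the full 64-bit sum instead of clamping; that case is covered by the operation listing that follows.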
Operation:
MACL(long m,long n)
/* MAC.L @Rm+,@Rn+ */
{
unsigned long RnL,RnH,RmL,RmH,Res0,Res1,Res2;
unsigned long temp0,temp1,temp2,temp3;
long tempm,tempn,fnLmL;
tempn=(long)Read_Long(R[n]);
R[n]+=4;
tempm=(long)Read_Long(R[m]);
R[m]+=4;
/* fnLmL records the sign of the product */
if ((long)(tempn^tempm)<0) fnLmL=-1;
else fnLmL=0;
if (tempn<0) tempn=0-tempn;
if (tempm<0) tempm=0-tempm;
temp1=(unsigned long)tempn;
temp2=(unsigned long)tempm;
RnL=temp1&0x0000FFFF;
RnH=(temp1>>16)&0x0000FFFF;
RmL=temp2&0x0000FFFF;
RmH=(temp2>>16)&0x0000FFFF;
/* 16-bit partial products of the two magnitudes */
temp0=RmL*RnL;
temp1=RmH*RnL;
temp2=RmL*RnH;
temp3=RmH*RnH;
Res2=0;
Res1=temp1+temp2;
if (Res1<temp1) Res2+=0x00010000;
temp1=(Res1<<16)&0xFFFF0000;
Res0=temp0+temp1;
if (Res0<temp0) Res2++;
Res2=Res2+((Res1>>16)&0x0000FFFF)+temp3;
/* negate the 64-bit product Res2:Res0 if the signs differed */
if(fnLmL<0){
Res2=~Res2;
if (Res0==0) Res2++;
else Res0=(~Res0)+1;
}
if(S==1){
Res0=MACL+Res0;
if (MACL>Res0) Res2++;
Res2+=(MACH&0x0000FFFF);
if(((long)Res2<0)&&(Res2<0xFFFF8000)){
Res2=0x00008000;
Res0=0x00000000;
}
if(((long)Res2>0)&&(Res2>0x00007FFF)){
Res2=0x00007FFF;
Res0=0xFFFFFFFF;
}
MACH=Res2;
MACL=Res0;
}
else {
Res0=MACL+Res0;
if (MACL>Res0) Res2++;
Res2+=MACH;
MACH=Res2;
MACL=Res0;
}
PC+=2;
}
Example:
        MOVA    TBLM,R0     ;Table address
        MOV     R0,R1       ;
        MOVA    TBLN,R0     ;Table address
        CLRMAC              ;MAC register initialization
        MAC.L   @R0+,@R1+   ;
        MAC.L   @R0+,@R1+   ;
        STS     MACL,R0     ;Store result into R0
        ...............
        .align  2           ;
TBLM    .data.l H'1234ABCD  ;
        .data.l H'5678EF01  ;
TBLN    .data.l H'0123ABCD  ;
        .data.l H'4567DEF0  ;
7.2.29
MAC.W (Multiply and Accumulate Calculation Word): Arithmetic Instruction
Format              Abstract                                Code                Cycle   T Bit
MAC.W @Rm+,@Rn+     With sign, (Rn) × (Rm) + MAC → MAC      0100nnnnmmmm1111    3/(2)   —
MAC   @Rm+,@Rn+
Description: Does signed multiplication of 16-bit operands obtained using the contents of general
registers Rm and Rn as addresses. The 32-bit result is added to contents of the MAC register, and
the final result is stored in the MAC register. Rm and Rn data are incremented by 2 after the
operation.
When the S bit is cleared to 0, the operation is 16 × 16 + 64 → 64-bit multiply and accumulate and
the 64-bit result is stored in the coupled MACH and MACL registers.
When the S bit is set to 1, the operation is 16 × 16 + 32 → 32-bit multiply and accumulate and
addition to the MAC register is a saturation operation. For the saturation operation, only the
MACL register is enabled and the result is limited to a range of H'80000000 (minimum) and
H'7FFFFFFF (maximum).
If an overflow occurs, the LSB of the MACH register is set to 1. The result is stored in the MACL
register. The result is limited to a value between H'80000000 (minimum) for overflows in the
negative direction and H'7FFFFFFF (maximum) for overflows in the positive direction.
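The S = 1 behavior described above (16 × 16 + 32 → 32 bits with saturation) can be sketched as follows. The function name mac_w_sat32 and its overflow out-parameter are hypothetical; the out-parameter stands in for the overflow indication that the description says is recorded in the LSB of MACH.

```c
#include <stdint.h>

/* One MAC.W step with S = 1.
   Returns the new MACL value; *overflow is set to 1 if the result
   had to be clamped, and to 0 otherwise. */
int32_t mac_w_sat32(int32_t macl, int16_t rn_val, int16_t rm_val, int *overflow)
{
    /* Do the multiply-accumulate in 64 bits so it cannot overflow. */
    int64_t sum = (int64_t)macl + (int64_t)rn_val * (int64_t)rm_val;

    *overflow = 0;
    if (sum > INT32_MAX) { *overflow = 1; return INT32_MAX; }  /* H'7FFFFFFF */
    if (sum < INT32_MIN) { *overflow = 1; return INT32_MIN; }  /* H'80000000 */
    return (int32_t)sum;
}
```

Widening to 64 bits before comparing is a design choice for clarity; the operation listing below reaches the same clamped values by tracking the signs of the operands and of the result.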
Operation:
MACW(long m,long n)
/* MAC.W @Rm+,@Rn+ */
{
long tempm,tempn,dest,src,ans;
unsigned long temp1;
tempn=(long)Read_Word(R[n]);
R[n]+=2;
tempm=(long)Read_Word(R[m]);
R[m]+=2;
temp1=MACL;
tempm=((long)(short)tempn*(long)(short)tempm);
if ((long)MACL>=0) dest=0;
else dest=1;
if ((long)tempm>=0) {
src=0;
tempn=0;
}
else {
src=1;
tempn=0xFFFFFFFF;
}
src+=dest;
MACL+=tempm;
if ((long)MACL>=0) ans=0;
else ans=1;
ans+=dest;
if (S==1) {
if (ans==1) {
if (src==0) MACL=0x7FFFFFFF;
if (src==2) MACL=0x80000000;
}
}
else {
MACH+=tempn;
if (temp1>MACL) MACH+=1;
}
PC+=2;
}
Example:
        MOVA    TBLM,R0     ;Table address
        MOV     R0,R1       ;
        MOVA    TBLN,R0     ;Table address
        CLRMAC              ;MAC register initialization
        MAC.W   @R0+,@R1+   ;
        MAC.W   @R0+,@R1+   ;
        STS     MACL,R0     ;Store result into R0
        ...............
        .align  2           ;
TBLM    .data.w H'1234      ;
        .data.w H'5678      ;
TBLN    .data.w H'0123      ;
        .data.w H'4567      ;
7.2.30
MOV (Move Data): Data Transfer Instruction
Format                  Abstract                                    Code                Cycle   T Bit
MOV   Rm,Rn             Rm → Rn                                     0110nnnnmmmm0011    1       —
MOV.B Rm,@Rn            Rm → (Rn)                                   0010nnnnmmmm0000    1       —
MOV.W Rm,@Rn            Rm → (Rn)                                   0010nnnnmmmm0001    1       —
MOV.L Rm,@Rn            Rm → (Rn)                                   0010nnnnmmmm0010    1       —
MOV.B @Rm,Rn            (Rm) → sign extension → Rn                  0110nnnnmmmm0000    1       —
MOV.W @Rm,Rn            (Rm) → sign extension → Rn                  0110nnnnmmmm0001    1       —
MOV.L @Rm,Rn            (Rm) → Rn                                   0110nnnnmmmm0010    1       —
MOV.B Rm,@–Rn           Rn – 1 → Rn, Rm → (Rn)                      0010nnnnmmmm0100    1       —
MOV.W Rm,@–Rn           Rn – 2 → Rn, Rm → (Rn)                      0010nnnnmmmm0101    1       —
MOV.L Rm,@–Rn           Rn – 4 → Rn, Rm → (Rn)                      0010nnnnmmmm0110    1       —
MOV.B @Rm+,Rn           (Rm) → sign extension → Rn, Rm + 1 → Rm     0110nnnnmmmm0100    1       —
MOV.W @Rm+,Rn           (Rm) → sign extension → Rn, Rm + 2 → Rm     0110nnnnmmmm0101    1       —
MOV.L @Rm+,Rn           (Rm) → Rn, Rm + 4 → Rm                      0110nnnnmmmm0110    1       —
MOV.B Rm,@(R0,Rn)       Rm → (R0 + Rn)                              0000nnnnmmmm0100    1       —
MOV.W Rm,@(R0,Rn)       Rm → (R0 + Rn)                              0000nnnnmmmm0101    1       —
MOV.L Rm,@(R0,Rn)       Rm → (R0 + Rn)                              0000nnnnmmmm0110    1       —
MOV.B @(R0,Rm),Rn       (R0 + Rm) → sign extension → Rn             0000nnnnmmmm1100    1       —
MOV.W @(R0,Rm),Rn       (R0 + Rm) → sign extension → Rn             0000nnnnmmmm1101    1       —
MOV.L @(R0,Rm),Rn       (R0 + Rm) → Rn                              0000nnnnmmmm1110    1       —
Description: Transfers the source operand to the destination. When the operand is stored in
memory, the transferred data can be a byte, word, or longword. Loaded data from memory is
stored in a register after it is sign-extended to a longword.
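The sign extension applied to byte and word loads can be sketched in portable C. The helper names load_sext8 and load_sext16 are hypothetical; they take the raw value read from memory and return the 32-bit register content.

```c
#include <stdint.h>

/* Sign-extend an 8-bit memory value to 32 bits.
   XOR with the sign bit followed by subtracting it maps
   H'80..H'FF to H'FFFFFF80..H'FFFFFFFF and leaves H'00..H'7F unchanged. */
uint32_t load_sext8(uint8_t b)
{
    return ((uint32_t)b ^ 0x80u) - 0x80u;
}

/* Sign-extend a 16-bit memory value to 32 bits in the same way. */
uint32_t load_sext16(uint16_t w)
{
    return ((uint32_t)w ^ 0x8000u) - 0x8000u;
}
```

This form avoids the test-and-mask sequence used in the operation listing and does not rely on implementation-defined narrowing casts; both approaches give the same register values.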
Operation:
MOV(long m,long n)
/* MOV Rm,Rn */
{
R[n]=R[m];
PC+=2;
}
MOVBS(long m,long n)
/* MOV.B Rm,@Rn */
{
Write_Byte(R[n],R[m]);
PC+=2;
}
MOVWS(long m,long n)
/* MOV.W Rm,@Rn */
{
Write_Word(R[n],R[m]);
PC+=2;
}
MOVLS(long m,long n)
/* MOV.L Rm,@Rn */
{
Write_Long(R[n],R[m]);
PC+=2;
}
MOVBL(long m,long n)
/* MOV.B @Rm,Rn */
{
R[n]=(long)Read_Byte(R[m]);
if ((R[n]&0x80)==0) R[n]&=0x000000FF;
else R[n]|=0xFFFFFF00;
PC+=2;
}
MOVWL(long m,long n)
/* MOV.W @Rm,Rn */
{
R[n]=(long)Read_Word(R[m]);
if ((R[n]&0x8000)==0) R[n]&=0x0000FFFF;
else R[n]|=0xFFFF0000;
PC+=2;
}
MOVLL(long m,long n)
/* MOV.L @Rm,Rn */
{
R[n]=Read_Long(R[m]);
PC+=2;
}
MOVBM(long m,long n)
/* MOV.B Rm,@–Rn */
{
Write_Byte(R[n]-1,R[m]);
R[n]-=1;
PC+=2;
}
MOVWM(long m,long n)
/* MOV.W Rm,@–Rn */
{
Write_Word(R[n]-2,R[m]);
R[n]-=2;
PC+=2;
}
MOVLM(long m,long n)
/* MOV.L Rm,@–Rn */
{
Write_Long(R[n]-4,R[m]);
R[n]-=4;
PC+=2;
}
MOVBP(long m,long n) /* MOV.B @Rm+,Rn */
{
R[n]=(long)Read_Byte(R[m]);
if ((R[n]&0x80)==0) R[n]&=0x000000FF;
else R[n]|=0xFFFFFF00;
if (n!=m) R[m]+=1;
PC+=2;
}
MOVWP(long m,long n)
/* MOV.W @Rm+,Rn */
{
R[n]=(long)Read_Word(R[m]);
if ((R[n]&0x8000)==0) R[n]&=0x0000FFFF;
else R[n]|=0xFFFF0000;
if (n!=m) R[m]+=2;
PC+=2;
}
MOVLP(long m,long n)
/* MOV.L @Rm+,Rn */
{
R[n]=Read_Long(R[m]);
if (n!=m) R[m]+=4;
PC+=2;
}
MOVBS0(long m,long n)
/* MOV.B Rm,@(R0,Rn) */
{
Write_Byte(R[n]+R[0],R[m]);
PC+=2;
}
MOVWS0(long m,long n)
/* MOV.W Rm,@(R0,Rn) */
{
Write_Word(R[n]+R[0],R[m]);
PC+=2;
}
MOVLS0(long m,long n) /* MOV.L Rm,@(R0,Rn) */
{
Write_Long(R[n]+R[0],R[m]);
PC+=2;
}
MOVBL0(long m,long n) /* MOV.B @(R0,Rm),Rn */
{
R[n]=(long)Read_Byte(R[m]+R[0]);
if ((R[n]&0x80)==0) R[n]&=0x000000FF;
else R[n]|=0xFFFFFF00;
PC+=2;
}
MOVWL0(long m,long n) /* MOV.W @(R0,Rm),Rn */
{
R[n]=(long)Read_Word(R[m]+R[0]);
if ((R[n]&0x8000)==0) R[n]&=0x0000FFFF;
else R[n]|=0xFFFF0000;
PC+=2;
}
MOVLL0(long m,long n) /* MOV.L @(R0,Rm),Rn */
{
R[n]=Read_Long(R[m]+R[0]);
PC+=2;
}
Example:
MOV   R0,R1        ;Before execution: R0 = H'FFFFFFFF, R1 = H'00000000
                   ;After execution:  R1 = H'FFFFFFFF
MOV.W R0,@R1       ;Before execution: R0 = H'FFFF7F80
                   ;After execution:  @R1 = H'7F80
MOV.B @R0,R1       ;Before execution: @R0 = H'80, R1 = H'00000000
                   ;After execution:  R1 = H'FFFFFF80
MOV.W R0,@–R1      ;Before execution: R0 = H'AAAAAAAA, R1 = H'FFFF7F80
                   ;After execution:  R1 = H'FFFF7F7E, @R1 = H'AAAA
MOV.L @R0+,R1      ;Before execution: R0 = H'12345670
                   ;After execution:  R0 = H'12345674, R1 = @H'12345670
MOV.B R1,@(R0,R2)  ;Before execution: R2 = H'00000004, R0 = H'10000000
                   ;After execution:  R1 = @H'10000004
MOV.W @(R0,R2),R1  ;Before execution: R2 = H'00000004, R0 = H'10000000
                   ;After execution:  R1 = @H'10000004
7.2.31
MOV (Move Immediate Data): Data Transfer Instruction
Format                  Abstract                                Code                Cycle   T Bit
MOV   #imm,Rn           imm → sign extension → Rn               1110nnnniiiiiiii    1       —
MOV.W @(disp,PC),Rn     (disp × 2 + PC) → sign extension → Rn   1001nnnndddddddd    1       —
MOV.L @(disp,PC),Rn     (disp × 4 + PC) → Rn                    1101nnnndddddddd    1       —
Description: Stores immediate data, which has been sign-extended to a longword, into general
register Rn.
If the data is a word or longword, table data stored in the address specified by PC + displacement
is accessed. If the data is a word, the 8-bit displacement is zero-extended and doubled.
Consequently, the relative interval from the table can be up to PC + 510 bytes. The PC points to
the starting address of the second instruction after this MOV instruction. If the data is a longword,
the 8-bit displacement is zero-extended and quadrupled. Consequently, the relative interval from
the table can be up to PC + 1020 bytes. The PC points to the starting address of the second
instruction after this MOV instruction, but the lowest two bits of the PC are corrected to B'00.
Note: The best place for the table is at the end of the module or immediately after an
unconditional branch instruction. If this placement is not possible, for example because
there is no unconditional branch instruction within the 510-byte or 1020-byte range, a
BRA instruction must be used to jump past the table. If this instruction is placed
immediately after a delayed branch instruction, the PC used for the address calculation
is the first address of the branch destination + 2.
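The PC-relative address calculation described above can be sketched as follows. The function names movw_pc_ea and movl_pc_ea are hypothetical, disp is the raw 8-bit field from the instruction code (not the byte offset written in assembler source), and the sketch covers only the ordinary case where the MOV is not in a delay slot.

```c
#include <stdint.h>

/* Effective address of MOV.W @(disp,PC),Rn.
   insn_addr is the address of the MOV.W instruction itself. */
uint32_t movw_pc_ea(uint32_t insn_addr, uint8_t disp)
{
    uint32_t pc = insn_addr + 4;            /* second instruction after the MOV */
    return pc + ((uint32_t)disp << 1);      /* zero-extended and doubled */
}

/* Effective address of MOV.L @(disp,PC),Rn. */
uint32_t movl_pc_ea(uint32_t insn_addr, uint8_t disp)
{
    uint32_t pc = (insn_addr + 4) & 0xFFFFFFFCu;  /* lowest two bits forced to B'00 */
    return pc + ((uint32_t)disp << 2);            /* zero-extended and quadrupled */
}
```

For a MOV.W at H'1002 with a displacement field of 4, the sketch gives H'100E; the largest field value, 255, reaches PC + 510 for words and PC + 1020 for longwords.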
Operation:
MOVI(long i,long n)
/* MOV #imm,Rn */
{
if ((i&0x80)==0) R[n]=(0x000000FF & (long)i);
else R[n]=(0xFFFFFF00 | (long)i);
PC+=2;
}
MOVWI(long d,long n)
/* MOV.W @(disp,PC),Rn */
{
long disp;
disp=(0x000000FF & (long)d);
R[n]=(long)Read_Word(PC+(disp<<1));
if ((R[n]&0x8000)==0) R[n]&=0x0000FFFF;
else R[n]|=0xFFFF0000;
PC+=2;
}
MOVLI(long d,long n)
/* MOV.L @(disp,PC),Rn */
{
long disp;
disp=(0x000000FF & (long)d);
R[n]=Read_Long((PC&0xFFFFFFFC)+(disp<<2));
PC+=2;
}
Example:
Address
1000        MOV     #H'80,R1     ;R1 = H'FFFFFF80
1002        MOV.W   IMM,R2       ;R2 = H'FFFF9ABC, IMM means @(H'08,PC)
1004        ADD     #–1,R0       ;
1006        TST     R0,R0        ;← PC location used for address calculation for the
                                 ;  MOV.W instruction
1008        MOVT    R13          ;
100A        BRA     NEXT         ;Delayed branch instruction
100C        MOV.L   @(4,PC),R3   ;R3 = H'12345678
100E IMM    .data.w H'9ABC       ;
1010        .data.w H'1234       ;
1012 NEXT   JMP     @R3          ;Branch destination of the BRA instruction
1014        CMP/EQ  #0,R0        ;← PC location used for address calculation for the
                                 ;  MOV.L instruction
            .align  4            ;
1018        .data.l H'12345678   ;
7.2.32
MOV (Move Peripheral Data): Data Transfer Instruction
Format                  Abstract                                Code                Cycle   T Bit
MOV.B @(disp,GBR),R0    (disp + GBR) → sign extension → R0      11000100dddddddd    1       —
MOV.W @(disp,GBR),R0    (disp × 2 + GBR) → sign extension → R0  11000101dddddddd    1       —
MOV.L @(disp,GBR),R0    (disp × 4 + GBR) → R0                   11000110dddddddd    1       —
MOV.B R0,@(disp,GBR)    R0 → (disp + GBR)                       11000000dddddddd    1       —
MOV.W R0,@(disp,GBR)    R0 → (disp × 2 + GBR)                   11000001dddddddd    1       —
MOV.L R0,@(disp,GBR)    R0 → (disp × 4 + GBR)                   11000010dddddddd    1       —
Description: Transfers the source operand to the destination. This instruction is optimum for
accessing data in the peripheral module area. The data can be a byte, word, or longword, but only
the R0 register can be used.
A peripheral module base address is set to the GBR. When the peripheral module data is a byte,
the only change made is to zero-extend the 8-bit displacement. Consequently, an address within
+255 bytes can be specified. When the peripheral module data is a word, the 8-bit displacement is
zero-extended and doubled. Consequently, an address within +510 bytes can be specified. When
the peripheral module data is a longword, the 8-bit displacement is zero-extended and is
quadrupled. Consequently, an address within +1020 bytes can be specified. If the displacement is
too short to reach the memory operand, the above @(R0,Rn) mode must be used after the GBR
data is transferred to a general register. When the source operand is in memory, the loaded data is
stored in the register after it is sign-extended to a longword.
Note: The destination register of a data load is always R0. R0 cannot be accessed by the next
instruction until the load instruction is finished. The instruction order shown in figure 7.1
will give better results.
MOV.B @(12,GBR),R0          MOV.B @(12,GBR),R0
AND   #80,R0                ADD   #20,R1
ADD   #20,R1                AND   #80,R0
Figure 7.1 Using R0 after MOV
Operation:
MOVBLG(long d) /* MOV.B @(disp,GBR),R0 */
{
long disp;
disp=(0x000000FF & (long)d);
R[0]=(long)Read_Byte(GBR+disp);
if ((R[0]&0x80)==0) R[0]&=0x000000FF;
else R[0]|=0xFFFFFF00;
PC+=2;
}
MOVWLG(long d) /* MOV.W @(disp,GBR),R0 */
{
long disp;
disp=(0x000000FF & (long)d);
R[0]=(long)Read_Word(GBR+(disp<<1));
if ((R[0]&0x8000)==0) R[0]&=0x0000FFFF;
else R[0]|=0xFFFF0000;
PC+=2;
}
MOVLLG(long d) /* MOV.L @(disp,GBR),R0 */
{
long disp;
disp=(0x000000FF & (long)d);
R[0]=Read_Long(GBR+(disp<<2));
PC+=2;
}
MOVBSG(long d) /* MOV.B R0,@(disp,GBR) */
{
long disp;
disp=(0x000000FF & (long)d);
Write_Byte(GBR+disp,R[0]);
PC+=2;
}
MOVWSG(long d) /* MOV.W R0,@(disp,GBR) */
{
long disp;
disp=(0x000000FF & (long)d);
Write_Word(GBR+(disp<<1),R[0]);
PC+=2;
}
MOVLSG(long d) /* MOV.L R0,@(disp,GBR) */
{
long disp;
disp=(0x000000FF & (long)d);
Write_Long(GBR+(disp<<2),R[0]);
PC+=2;
}
Examples:
MOV.L @(2,GBR),R0  ;Before execution: @(GBR + 8) = H'12345670
                   ;After execution:  R0 = H'12345670
MOV.B R0,@(1,GBR)  ;Before execution: R0 = H'FFFF7F80
                   ;After execution:  @(GBR + 1) = H'80
7.2.33
MOV (Move Structure Data): Data Transfer Instruction
Format                  Abstract                                Code                Cycle   T Bit
MOV.B R0,@(disp,Rn)     R0 → (disp + Rn)                        10000000nnnndddd    1       —
MOV.W R0,@(disp,Rn)     R0 → (disp × 2 + Rn)                    10000001nnnndddd    1       —
MOV.L Rm,@(disp,Rn)     Rm → (disp × 4 + Rn)                    0001nnnnmmmmdddd    1       —
MOV.B @(disp,Rm),R0     (disp + Rm) → sign extension → R0       10000100mmmmdddd    1       —
MOV.W @(disp,Rm),R0     (disp × 2 + Rm) → sign extension → R0   10000101mmmmdddd    1       —
MOV.L @(disp,Rm),Rn     (disp × 4 + Rm) → Rn                    0101nnnnmmmmdddd    1       —
Description: Transfers the source operand to the destination. This instruction is optimum for
accessing data in a structure or a stack. The data can be a byte, word, or longword, but when a byte
or word is selected, only the R0 register can be used. When the data is a byte, the only change
made is to zero-extend the 4-bit displacement. Consequently, an address within +15 bytes can be
specified. When the data is a word, the 4-bit displacement is zero-extended and doubled.
Consequently, an address within +30 bytes can be specified. When the data is a longword, the
4-bit displacement is zero-extended and quadrupled. Consequently, an address within +60 bytes
can be specified. If the displacement is too short to reach the memory operand, the aforementioned
@(R0,Rn) mode must be used. When the source operand is in memory, the loaded data is stored in
the register after it is sign-extended to a longword.
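The displacement limits above (+15, +30, and +60 bytes) follow from scaling a 4-bit field by the access size. A small check of whether a given byte offset is encodable might look like this; disp4_fits is a hypothetical helper, not part of any toolchain.

```c
#include <stdint.h>

/* Returns 1 if byte offset `offset` can be encoded in the 4-bit
   displacement of MOV.B/MOV.W/MOV.L @(disp,Rn) for an access of
   `size` bytes (1, 2, or 4); returns 0 otherwise. */
int disp4_fits(uint32_t offset, uint32_t size)
{
    if (size != 1 && size != 2 && size != 4) return 0;
    if (offset % size != 0) return 0;   /* scaled field cannot express it */
    return (offset / size) <= 15;       /* field holds 0 to 15 */
}
```

When the check fails, for instance for a longword at offset 64, the @(R0,Rn) mode mentioned above is the fallback.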
Note: When byte or word data is loaded, the destination register is always R0. R0 cannot be
accessed by the next instruction until the load instruction is finished. The instruction order
in figure 7.2 will give better results.
MOV.B @(2,R1),R0            MOV.B @(2,R1),R0
AND   #80,R0                ADD   #20,R1
ADD   #20,R1                AND   #80,R0
Figure 7.2 Using R0 after MOV
Operation:
MOVBS4(long d,long n) /* MOV.B R0,@(disp,Rn) */
{
long disp;
disp=(0x0000000F & (long)d);
Write_Byte(R[n]+disp,R[0]);
PC+=2;
}
MOVWS4(long d,long n) /* MOV.W R0,@(disp,Rn) */
{
long disp;
disp=(0x0000000F & (long)d);
Write_Word(R[n]+(disp<<1),R[0]);
PC+=2;
}
MOVLS4(long m,long d,long n)
/* MOV.L Rm,@(disp,Rn) */
{
long disp;
disp=(0x0000000F & (long)d);
Write_Long(R[n]+(disp<<2),R[m]);
PC+=2;
}
MOVBL4(long m,long d) /* MOV.B @(disp,Rm),R0 */
{
long disp;
disp=(0x0000000F & (long)d);
R[0]=Read_Byte(R[m]+disp);
if ((R[0]&0x80)==0) R[0]&=0x000000FF;
else R[0]|=0xFFFFFF00;
PC+=2;
}
MOVWL4(long m,long d) /* MOV.W @(disp,Rm),R0 */
{
long disp;
disp=(0x0000000F & (long)d);
R[0]=Read_Word(R[m]+(disp<<1));
if ((R[0]&0x8000)==0) R[0]&=0x0000FFFF;
else R[0]|=0xFFFF0000;
PC+=2;
}
MOVLL4(long m,long d,long n)
/* MOV.L @(disp,Rm),Rn */
{
long disp;
disp=(0x0000000F & (long)d);
R[n]=Read_Long(R[m]+(disp<<2));
PC+=2;
}
Examples:
MOV.L @(2,R0),R1    ;Before execution: @(R0 + 8) = H'12345670
                    ;After execution:  R1 = H'12345670
MOV.L R0,@(H'F,R1)  ;Before execution: R0 = H'FFFF7F80
                    ;After execution:  @(R1 + 60) = H'FFFF7F80
7.2.34
MOVA (Move Effective Address): Data Transfer Instruction
Format                  Abstract                Code                Cycle   T Bit
MOVA @(disp,PC),R0      disp × 4 + PC → R0      11000111dddddddd    1       —
Description: Stores the effective address of the source operand into general register R0. The 8-bit
displacement is zero-extended and quadrupled. Consequently, the operand can be located up to
PC + 1020 bytes away. The PC is the address four bytes after this instruction, but the lowest
two bits of the PC are corrected to B'00.
Note: If this instruction is placed immediately after a delayed branch instruction, the PC must
point to an address specified by (the starting address of the branch destination) + 2.
Operation:
MOVA(long d)
/* MOVA @(disp,PC),R0 */
{
long disp;
disp=(0x000000FF & (long)d);
R[0]=(PC&0xFFFFFFFC)+(disp<<2);
PC+=2;
}
Example:
Address     .org    H'1006
1006        MOVA    STR,R0       ;Address of STR → R0
1008        MOV.B   @R0,R1       ;R1 = "X" ← PC location after correcting the lowest
                                 ;  two bits
100A        ADD     R4,R5        ;← Original PC location for address calculation for
                                 ;  the MOVA instruction
            .align  4
100C STR:   .sdata  "XYZP12"
            ...............
2002        BRA     TRGET        ;Delayed branch instruction
2004        MOVA    @(0,PC),R0   ;Address of TRGET + 2 → R0
2006        NOP                  ;
7.2.35
MOVT (Move T Bit): Data Transfer Instruction
Format      Abstract    Code                Cycle   T Bit
MOVT Rn     T → Rn      0000nnnn00101001    1       —
Description: Stores the T bit value into general register Rn. When T = 1, 1 is stored in Rn, and
when T = 0, 0 is stored in Rn.
Operation:
MOVT(long n)
/* MOVT Rn */
{
R[n]=(0x00000001 & SR);
PC+=2;
}
Example:
XOR    R2,R2  ;R2 = 0
CMP/PZ R2     ;T = 1
MOVT   R0     ;R0 = 1
CLRT          ;T = 0
MOVT   R1     ;R1 = 0
7.2.36
MUL.L (Multiply Long): Arithmetic Instruction
Format          Abstract            Code                Cycle   T Bit
MUL.L Rm,Rn     Rn × Rm → MACL      0000nnnnmmmm0111    2 to 4  —
Description: Performs 32-bit multiplication of the contents of general registers Rn and Rm, and
stores the bottom 32 bits of the result in the MACL register. The MACH register data does not
change.
Operation:
MULL(long m,long n) /* MUL.L Rm,Rn */
{
MACL=R[n]*R[m];
PC+=2;
}
Example:
MUL.L R0,R1    ;Before execution: R0 = H'FFFFFFFE, R1 = H'00005555
               ;After execution:  MACL = H'FFFF5556
STS   MACL,R0  ;Operation result
7.2.37
MULS.W (Multiply as Signed Word): Arithmetic Instruction
Format          Abstract                            Code                Cycle   T Bit
MULS.W Rm,Rn    Signed operation, Rn × Rm → MACL    0010nnnnmmmm1111    1 to 3  —
MULS   Rm,Rn
Description: Performs 16-bit multiplication of the contents of general registers Rn and Rm, and
stores the 32-bit result in the MACL register. The operation is signed and the MACH register data
does not change.
Operation:
MULS(long m,long n)
/* MULS Rm,Rn */
{
MACL=((long)(short)R[n]*(long)(short)R[m]);
PC+=2;
}
Example:
MULS R0,R1    ;Before execution: R0 = H'FFFFFFFE, R1 = H'00005555
              ;After execution:  MACL = H'FFFF5556
STS  MACL,R0  ;Operation result
7.2.38
MULU.W (Multiply as Unsigned Word): Arithmetic Instruction
Format          Abstract                    Code                Cycle   T Bit
MULU.W Rm,Rn    Unsigned, Rn × Rm → MACL    0010nnnnmmmm1110    1 to 3  —
MULU   Rm,Rn
Description: Performs 16-bit multiplication of the contents of general registers Rn and Rm, and
stores the 32-bit result in the MACL register. The operation is unsigned and the MACH register
data does not change.
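The difference between the signed and unsigned 16-bit multiplies can be sketched side by side. The names muls_w and mulu_w are hypothetical, each returns the value that would be placed in MACL, and the narrowing casts to int16_t assume the usual two's complement behavior, as the operation listings do with their (short) casts.

```c
#include <stdint.h>

/* MULS.W: treat the low 16 bits of each register as signed. */
uint32_t muls_w(uint32_t rn, uint32_t rm)
{
    int32_t a = (int16_t)(rn & 0xFFFFu);   /* assumes two's complement narrowing */
    int32_t b = (int16_t)(rm & 0xFFFFu);
    return (uint32_t)(a * b);              /* magnitude is at most 2^30, no overflow */
}

/* MULU.W: treat the low 16 bits of each register as unsigned. */
uint32_t mulu_w(uint32_t rn, uint32_t rm)
{
    return (rn & 0xFFFFu) * (rm & 0xFFFFu);  /* at most H'FFFE0001, fits in 32 bits */
}
```

In both cases the upper 16 bits of Rn and Rm are ignored, which is why R1 = H'FFFFAAAA behaves as H'AAAA in the MULU example.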
Operation:
MULU(long m,long n)
/* MULU Rm,Rn */
{
MACL=((unsigned long)(unsigned short)R[n]
*(unsigned long)(unsigned short)R[m]);
PC+=2;
}
Example:
MULU R0,R1    ;Before execution: R0 = H'00000002, R1 = H'FFFFAAAA
              ;After execution:  MACL = H'00015554
STS  MACL,R0  ;Operation result
7.2.39
NEG (Negate): Arithmetic Instruction
Format          Abstract        Code                Cycle   T Bit
NEG Rm,Rn       0 – Rm → Rn     0110nnnnmmmm1011    1       —
Description: Takes the two’s complement of data in general register Rm, and stores the result in
Rn. This effectively subtracts Rm data from 0, and stores the result in Rn.
Operation:
NEG(long m,long n)
/* NEG Rm,Rn */
{
R[n]=0-R[m];
PC+=2;
}
Example:
NEG R0,R1  ;Before execution: R0 = H'00000001
           ;After execution:  R1 = H'FFFFFFFF
7.2.40
NEGC (Negate with Carry): Arithmetic Instruction
Format          Abstract                        Code                Cycle   T Bit
NEGC Rm,Rn      0 – Rm – T → Rn, Borrow → T     0110nnnnmmmm1010    1       Borrow
Description: Subtracts general register Rm data and the T bit from 0, and stores the result in Rn.
If a borrow is generated, T bit changes accordingly. This instruction is used for inverting the sign
of a value that has more than 32 bits.
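Negating a 64-bit value with two NEGC steps can be sketched as follows. The helper names negc and neg64 are hypothetical; the borrow logic mirrors the two comparisons used in the operation listing, with the T bit passed explicitly.

```c
#include <stdint.h>

/* One NEGC step: returns 0 - rm - *t and stores the borrow in *t. */
uint32_t negc(uint32_t rm, unsigned *t)
{
    uint32_t temp = 0u - rm;
    uint32_t res  = temp - *t;
    /* Borrow if rm was nonzero, or if subtracting T wrapped around. */
    *t = (0u < temp) || (temp < res);
    return res;
}

/* Negate the 64-bit value hi:lo in place: CLRT, NEGC low word, NEGC high word. */
void neg64(uint32_t *hi, uint32_t *lo)
{
    unsigned t = 0;          /* CLRT */
    *lo = negc(*lo, &t);
    *hi = negc(*hi, &t);
}
```

The low word must be negated first so that its borrow propagates into the high word through T.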
Operation:
NEGC(long m,long n)
/* NEGC Rm,Rn */
{
unsigned long temp;
temp=0-R[m];
R[n]=temp-T;
if (0<temp)
T=1;
else T=0;
if (temp<R[n]) T=1;
PC+=2;
}
Examples:
                 ;Sign inversion of R1 and R0 (64 bits)
CLRT
NEGC R1,R1       ;Before execution: R1 = H'00000001, T = 0
                 ;After execution:  R1 = H'FFFFFFFF, T = 1
NEGC R0,R0       ;Before execution: R0 = H'00000000, T = 1
                 ;After execution:  R0 = H'FFFFFFFF, T = 1
7.2.41
NOP (No Operation): System Control Instruction
Format      Abstract        Code                Cycle   T Bit
NOP         No operation    0000000000001001    1       —
Description: Increments the PC to execute the next instruction.
Operation:
NOP()
/* NOP */
{
PC+=2;
}
Example:
NOP  ;Executes in one cycle
7.2.42
NOT (NOT—Logical Complement): Logic Operation Instruction
Format          Abstract        Code                Cycle   T Bit
NOT Rm,Rn       ~Rm → Rn        0110nnnnmmmm0111    1       —
Description: Takes the one’s complement of general register Rm data, and stores the result in Rn.
This effectively inverts each bit of Rm data and stores the result in Rn.
Operation:
NOT(long m,long n)
/* NOT Rm,Rn */
{
R[n]=~R[m];
PC+=2;
}
Example:
NOT R0,R1  ;Before execution: R0 = H'AAAAAAAA
           ;After execution:  R1 = H'55555555
7.2.43
OR (OR Logical): Logic Operation Instruction
Format                  Abstract                        Code                Cycle   T Bit
OR   Rm,Rn              Rn | Rm → Rn                    0010nnnnmmmm1011    1       —
OR   #imm,R0            R0 | imm → R0                   11001011iiiiiiii    1       —
OR.B #imm,@(R0,GBR)     (R0 + GBR) | imm → (R0 + GBR)   11001111iiiiiiii    3       —
Description: Logically ORs the contents of general registers Rn and Rm, and stores the result in
Rn. The contents of general register R0 can also be ORed with zero-extended 8-bit immediate
data, or 8-bit memory data accessed by using indirect indexed GBR addressing can be ORed with
8-bit immediate data.
Operation:
OR(long m,long n) /* OR Rm,Rn */
{
R[n]|=R[m];
PC+=2;
}
ORI(long i)
/* OR #imm,R0 */
{
R[0]|=(0x000000FF & (long)i);
PC+=2;
}
ORM(long i)
/* OR.B #imm,@(R0,GBR) */
{
long temp;
temp=(long)Read_Byte(GBR+R[0]);
temp|=(0x000000FF & (long)i);
Write_Byte(GBR+R[0],temp);
PC+=2;
}
Examples:
OR   R0,R1            ;Before execution: R0 = H'AAAA5555, R1 = H'55550000
                      ;After execution:  R1 = H'FFFF5555
OR   #H'F0,R0         ;Before execution: R0 = H'00000008
                      ;After execution:  R0 = H'000000F8
OR.B #H'50,@(R0,GBR)  ;Before execution: @(R0,GBR) = H'A5
                      ;After execution:  @(R0,GBR) = H'F5
7.2.44
ROTCL (Rotate with Carry Left): Shift Instruction
Format      Abstract        Code                Cycle   T Bit
ROTCL Rn    T ← Rn ← T      0100nnnn00100100    1       MSB
Description: Rotates the contents of general register Rn and the T bit to the left by one bit, and
stores the result in Rn. The bit that is shifted out of the operand is transferred to the T bit (figure
7.3).
Figure 7.3 Rotate with Carry Left (diagram labels: MSB, LSB, T)
Operation:
ROTCL(long n) /* ROTCL Rn */
{
long temp;
if ((R[n]&0x80000000)==0) temp=0;
else temp=1;
R[n]<<=1;
if (T==1) R[n]|=0x00000001;
else R[n]&=0xFFFFFFFE;
if (temp==1) T=1;
else T=0;
PC+=2;
}
Example:
ROTCL R0  ;Before execution: R0 = H'80000000, T = 0
          ;After execution:  R0 = H'00000000, T = 1
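The rotate through the T bit can be sketched compactly in C by treating Rn and T as one 33-bit ring. The function name rotcl is hypothetical and T is passed as a pointer to an unsigned flag rather than read from SR.

```c
#include <stdint.h>

/* Rotate rn left by one bit through T.
   The old T enters at bit 0; the old MSB becomes the new T. */
uint32_t rotcl(uint32_t rn, unsigned *t)
{
    unsigned msb = (rn >> 31) & 1u;          /* bit shifted out of the operand */
    uint32_t res = (rn << 1) | (*t & 1u);    /* old T enters at the LSB */
    *t = msb;
    return res;
}
```

Calling rotcl with R0 = H'80000000 and T = 0 gives R0 = H'00000000 and T = 1, matching the example above; a second call then shifts that 1 back in at bit 0.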
7.2.45
ROTCR (Rotate with Carry Right): Shift Instruction
Format      Abstract        Code                Cycle   T Bit
ROTCR Rn    T → Rn → T      0100nnnn00100101    1       LSB
Description: Rotates the contents of general register Rn and the T bit to the right by one bit, and
stores the result in Rn. The bit that is shifted out of the operand is transferred to the T bit
(figure 7.4).
Figure 7.4 Rotate with Carry Right (diagram labels: MSB, LSB, T)
Operation:
ROTCR(long n) /* ROTCR Rn */
{
long temp;
if ((R[n]&0x00000001)==0) temp=0;
else temp=1;
R[n]>>=1;
if (T==1) R[n]|=0x80000000;
else R[n]&=0x7FFFFFFF;
if (temp==1) T=1;
else T=0;
PC+=2;
}
Examples:
ROTCR R0  ;Before execution: R0 = H'00000001, T = 1
          ;After execution:  R0 = H'80000000, T = 1
7.2.46
ROTL (Rotate Left): Shift Instruction
Format      Abstract        Code                Cycle   T Bit
ROTL Rn     T ← Rn ← MSB    0100nnnn00000100    1       MSB
Description: Rotates the contents of general register Rn to the left by one bit, and stores the result
in Rn (figure 7.5). The bit that is shifted out of the operand is transferred to the T bit.
Figure 7.5 Rotate Left (diagram labels: MSB, LSB, T)
Operation:
ROTL(long n)
/* ROTL Rn */
{
if ((R[n]&0x80000000)==0) T=0;
else T=1;
R[n]<<=1;
if (T==1) R[n]|=0x00000001;
else R[n]&=0xFFFFFFFE;
PC+=2;
}
Examples:
ROTL R0  ;Before execution: R0 = H'80000000, T = 0
         ;After execution:  R0 = H'00000001, T = 1
7.2.47
ROTR (Rotate Right): Shift Instruction
Format      Abstract        Code                Cycle   T Bit
ROTR Rn     LSB → Rn → T    0100nnnn00000101    1       LSB
Description: Rotates the contents of general register Rn to the right by one bit, and stores the
result in Rn (figure 7.6). The bit that is shifted out of the operand is transferred to the T bit.
Figure 7.6 Rotate Right (diagram labels: MSB, LSB, T)
Operation:
ROTR(long n)
/* ROTR Rn */
{
if ((R[n]&0x00000001)==0) T=0;
else T=1;
R[n]>>=1;
if (T==1) R[n]|=0x80000000;
else R[n]&=0x7FFFFFFF;
PC+=2;
}
Examples:
ROTR R0  ;Before execution: R0 = H'00000001, T = 0
         ;After execution:  R0 = H'80000000, T = 1
7.2.48
RTE (Return from Exception): System Control Instruction
Class: Delayed branch instruction
Format      Abstract                                Code                Cycle   T Bit
RTE         Delayed branch, Stack area → PC/SR      0000000000101011    4       LSB
Description: Returns from an interrupt routine. The PC and SR values are restored from the stack,
and the program continues from the address specified by the restored PC value. The T bit is used
as the LSB bit in the SR register restored from the stack area.
Note: Since this is a delayed branch instruction, the instruction after this RTE is executed before
branching. No address errors and interrupts are accepted between this instruction and the
next instruction. If the next instruction is a branch instruction, it is acknowledged as an
illegal slot instruction.
Operation:
RTE()
/* RTE */
{
unsigned long temp;
temp=PC;
PC=Read_Long(R[15])+4;
R[15]+=4;
SR=Read_Long(R[15])&0x0FFF0FFF;
R[15]+=4;
Delay_Slot(temp+2);
}
Example:
RTE          ;Returns to the original routine
ADD #8,R14   ;Executes ADD before branching
Note: When a delayed branch instruction is used, the branching operation takes place after the
slot instruction is executed, but the execution of instructions (register update, etc.) takes
place in the sequence delayed branch instruction → delayed slot instruction. For example,
even if a delayed slot instruction is used to change the register where the branch
destination address is stored, the register content previous to the change will be used as the
branch destination address.
7.2.49
RTS (Return from Subroutine): Branch Instruction (Class: Delayed Branch
Instruction)
Format      Abstract                    Code                Cycle   T Bit
RTS         Delayed branch, PR → PC     0000000000001011    2       —
Description: Returns from a subroutine procedure. The PC value is restored from PR, and
the program continues from the address specified by the restored PC value. This instruction is used
to return to the calling program from a subroutine called by a BSR, BSRF, or JSR instruction.
Note: Since this is a delayed branch instruction, the instruction after this RTS is executed before
branching. No address errors and interrupts are accepted between this instruction and the
next instruction. If the next instruction is a branch instruction, it is acknowledged as an
illegal slot instruction.
Operation:
RTS()
/* RTS */
{
unsigned long temp;
temp=PC;
PC=PR+4;
Delay_Slot(temp+2);
}
Example:
        MOV.L   TABLE,R3    ;R3 = Address of TRGET
        JSR     @R3         ;Branches to TRGET
        NOP                 ;Executes NOP before branching
        ADD     R0,R1       ;← Return address for when the subroutine procedure is completed (PR data)
        .............
TABLE:  .data.l TRGET       ;Jump table
        .............
TRGET:  MOV     R1,R0       ;← Procedure entrance
        RTS                 ;PR data → PC
        MOV     #12,R0      ;Executes MOV before branching
Note: When a delayed branch instruction is used, the branching operation takes place after the
slot instruction is executed, but the execution of instructions (register update, etc.) takes
place in the sequence delayed branch instruction → delayed slot instruction. For example,
even if a delayed slot instruction is used to change the register where the branch
destination address is stored, the register content previous to the change will be used as the
branch destination address.
7.2.50 SETT (Set T Bit): System Control Instruction
Format   Abstract   Code               Cycle   T Bit
SETT     1 → T      0000000000011000   1       1
Description: Sets the T bit to 1.
Operation:
SETT() /* SETT */
{
T=1;
PC+=2;
}
Example:
SETT        ;Before execution: T = 0
            ;After execution:  T = 1
7.2.51 SHAL (Shift Arithmetic Left): Shift Instruction
Format    Abstract     Code               Cycle   T Bit
SHAL Rn   T ← Rn ← 0   0100nnnn00100000   1       MSB
Description: Arithmetically shifts the contents of general register Rn to the left by one bit, and
stores the result in Rn. The bit that is shifted out of the operand is transferred to the T bit
(figure 7.7).
SHAL: T ← [MSB ... LSB] ← 0
Figure 7.7 Shift Arithmetic Left
Operation:
SHAL(long n)
/* SHAL Rn (Same as SHLL) */
{
if ((R[n]&0x80000000)==0) T=0;
else T=1;
R[n]<<=1;
PC+=2;
}
Example:
SHAL R0     ;Before execution: R0 = H'80000001, T = 0
            ;After execution:  R0 = H'00000002, T = 1
7.2.52 SHAR (Shift Arithmetic Right): Shift Instruction
Format    Abstract       Code               Cycle   T Bit
SHAR Rn   MSB → Rn → T   0100nnnn00100001   1       LSB
Description: Arithmetically shifts the contents of general register Rn to the right by one bit, and
stores the result in Rn. The bit that is shifted out of the operand is transferred to the T bit (figure
7.8).
SHAR: MSB → [MSB ... LSB] → T
Figure 7.8 Shift Arithmetic Right
Operation:
SHAR(long n)
/* SHAR Rn */
{
long temp;
if ((R[n]&0x00000001)==0) T=0;
else T=1;
if ((R[n]&0x80000000)==0) temp=0;
else temp=1;
R[n]>>=1;
if (temp==1) R[n]|=0x80000000;
else R[n]&=0x7FFFFFFF;
PC+=2;
}
Example:
SHAR R0     ;Before execution: R0 = H'80000001, T = 0
            ;After execution:  R0 = H'C0000000, T = 1
7.2.53 SHLL (Shift Logical Left): Shift Instruction
Format    Abstract     Code               Cycle   T Bit
SHLL Rn   T ← Rn ← 0   0100nnnn00000000   1       MSB
Description: Logically shifts the contents of general register Rn to the left by one bit, and stores
the result in Rn. The bit that is shifted out of the operand is transferred to the T bit (figure 7.9).
SHLL: T ← [MSB ... LSB] ← 0
Figure 7.9 Shift Logical Left
Operation:
SHLL(long n)
/* SHLL Rn (Same as SHAL) */
{
if ((R[n]&0x80000000)==0) T=0;
else T=1;
R[n]<<=1;
PC+=2;
}
Examples:
SHLL R0     ;Before execution: R0 = H'80000001, T = 0
            ;After execution:  R0 = H'00000002, T = 1
7.2.54 SHLLn (Shift Logical Left n Bits): Shift Instruction
Format      Abstract        Code               Cycle   T Bit
SHLL2 Rn    Rn << 2 → Rn    0100nnnn00001000   1       —
SHLL8 Rn    Rn << 8 → Rn    0100nnnn00011000   1       —
SHLL16 Rn   Rn << 16 → Rn   0100nnnn00101000   1       —
Description: Logically shifts the contents of general register Rn to the left by 2, 8, or 16 bits, and
stores the result in Rn. Bits that are shifted out of the operand are not stored (figure 7.10).
SHLL2:  [MSB ... LSB] ← 0
SHLL8:  [MSB ... LSB] ← 0
SHLL16: [MSB ... LSB] ← 0
Figure 7.10 Shift Logical Left n Bits
Operation:
SHLL2(long n) /* SHLL2 Rn */
{
R[n]<<=2;
PC+=2;
}
SHLL8(long n) /* SHLL8 Rn */
{
R[n]<<=8;
PC+=2;
}
SHLL16(long n) /* SHLL16 Rn */
{
R[n]<<=16;
PC+=2;
}
Examples:
SHLL2 R0    ;Before execution: R0 = H'12345678
            ;After execution:  R0 = H'48D159E0
SHLL8 R0    ;Before execution: R0 = H'12345678
            ;After execution:  R0 = H'34567800
SHLL16 R0   ;Before execution: R0 = H'12345678
            ;After execution:  R0 = H'56780000
7.2.55 SHLR (Shift Logical Right): Shift Instruction
Format    Abstract     Code               Cycle   T Bit
SHLR Rn   0 → Rn → T   0100nnnn00000001   1       LSB
Description: Logically shifts the contents of general register Rn to the right by one bit, and stores
the result in Rn. The bit that is shifted out of the operand is transferred to the T bit (figure 7.11).
SHLR: 0 → [MSB ... LSB] → T
Figure 7.11 Shift Logical Right
Operation:
SHLR(long n)
/* SHLR Rn */
{
if ((R[n]&0x00000001)==0) T=0;
else T=1;
R[n]>>=1;
R[n]&=0x7FFFFFFF;
PC+=2;
}
Examples:
SHLR R0     ;Before execution: R0 = H'80000001, T = 0
            ;After execution:  R0 = H'40000000, T = 1
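The difference between SHAR and SHLR is easiest to see side by side. The following is a minimal sketch in standard C, assuming a 32-bit register modeled as uint32_t; the type shift_result and the function names shar and shlr are illustrative and are not part of the manual's operation listings.

```c
#include <stdint.h>

/* Illustrative result type: shifted register value plus the T bit. */
typedef struct {
    uint32_t r;
    int t;
} shift_result;

/* SHAR: bit 0 goes to T, bit 31 (the sign) is kept after the shift. */
shift_result shar(uint32_t r)
{
    shift_result out;
    out.t = (int)(r & 1u);
    out.r = (r >> 1) | (r & 0x80000000u);
    return out;
}

/* SHLR: bit 0 goes to T, bit 31 is filled with 0. */
shift_result shlr(uint32_t r)
{
    shift_result out;
    out.t = (int)(r & 1u);
    out.r = r >> 1;
    return out;
}
```

With the input H'80000001 used in both examples, shar keeps the sign bit while shlr clears it, and both report T = 1.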
7.2.56 SHLRn (Shift Logical Right n Bits): Shift Instruction
Format      Abstract        Code               Cycle   T Bit
SHLR2 Rn    Rn >> 2 → Rn    0100nnnn00001001   1       —
SHLR8 Rn    Rn >> 8 → Rn    0100nnnn00011001   1       —
SHLR16 Rn   Rn >> 16 → Rn   0100nnnn00101001   1       —
Description: Logically shifts the contents of general register Rn to the right by 2, 8, or 16 bits,
and stores the result in Rn. Bits that are shifted out of the operand are not stored (figure 7.12).
SHLR2:  0 → [MSB ... LSB]
SHLR8:  0 → [MSB ... LSB]
SHLR16: 0 → [MSB ... LSB]
Figure 7.12 Shift Logical Right n Bits
Operation:
SHLR2(long n) /* SHLR2 Rn */
{
R[n]>>=2;
R[n]&=0x3FFFFFFF;
PC+=2;
}
SHLR8(long n) /* SHLR8 Rn */
{
R[n]>>=8;
R[n]&=0x00FFFFFF;
PC+=2;
}
SHLR16(long n) /* SHLR16 Rn */
{
R[n]>>=16;
R[n]&=0x0000FFFF;
PC+=2;
}
Examples:
SHLR2 R0    ;Before execution: R0 = H'12345678
            ;After execution:  R0 = H'048D159E
SHLR8 R0    ;Before execution: R0 = H'12345678
            ;After execution:  R0 = H'00123456
SHLR16 R0   ;Before execution: R0 = H'12345678
            ;After execution:  R0 = H'00001234
7.2.57 SLEEP (Sleep): System Control Instruction
Format   Abstract   Code               Cycle   T Bit
SLEEP    Sleep      0000000000011011   3       —
Description: Sets the CPU into power-down mode. In power-down mode, instruction execution
stops, but the CPU internal status is maintained, and the CPU waits for an interrupt request. If an
interrupt is requested, the CPU exits the power-down mode and begins exception processing.
Note: The number of cycles given is for the transition to sleep mode.
Operation:
SLEEP() /* SLEEP */
{
PC-=2;
wait_for_exception;
}
Example:
SLEEP       ;Enters power-down mode
7.2.58 STC (Store Control Register): System Control Instruction (Interrupt Disabled Instruction)
Format            Abstract                   Code               Cycle   T Bit
STC SR,Rn         SR → Rn                    0000nnnn00000010   1       —
STC GBR,Rn        GBR → Rn                   0000nnnn00010010   1       —
STC VBR,Rn        VBR → Rn                   0000nnnn00100010   1       —
STC.L SR,@-Rn     Rn – 4 → Rn, SR → (Rn)     0100nnnn00000011   2       —
STC.L GBR,@-Rn    Rn – 4 → Rn, GBR → (Rn)    0100nnnn00010011   2       —
STC.L VBR,@-Rn    Rn – 4 → Rn, VBR → (Rn)    0100nnnn00100011   2       —
Description: Stores control register SR, GBR, or VBR data into a specified destination.
Note: No interrupts are accepted between this instruction and the next instruction. Address errors
are accepted.
Operation:
STCSR(long n)
/* STC SR,Rn */
{
R[n]=SR;
PC+=2;
}
STCGBR(long n) /* STC GBR,Rn */
{
R[n]=GBR;
PC+=2;
}
STCVBR(long n) /* STC VBR,Rn */
{
R[n]=VBR;
PC+=2;
}
STCMSR(long n) /* STC.L SR,@-Rn */
{
R[n]-=4;
Write_Long(R[n],SR);
PC+=2;
}
STCMGBR(long n)
/* STC.L GBR,@-Rn */
{
R[n]-=4;
Write_Long(R[n],GBR);
PC+=2;
}
STCMVBR(long n)
/* STC.L VBR,@-Rn */
{
R[n]-=4;
Write_Long(R[n],VBR);
PC+=2;
}
Examples:
STC SR,R0           ;Before execution: R0 = H'FFFFFFFF, SR = H'00000000
                    ;After execution:  R0 = H'00000000
STC.L GBR,@-R15     ;Before execution: R15 = H'10000004
                    ;After execution:  R15 = H'10000000, @R15 = GBR
7.2.59 STS (Store System Register): System Control Instruction (Interrupt Disabled Instruction)
Format             Abstract                    Code               Cycle   T Bit
STS MACH,Rn        MACH → Rn                   0000nnnn00001010   1       —
STS MACL,Rn        MACL → Rn                   0000nnnn00011010   1       —
STS PR,Rn          PR → Rn                     0000nnnn00101010   1       —
STS.L MACH,@-Rn    Rn – 4 → Rn, MACH → (Rn)    0100nnnn00000010   1       —
STS.L MACL,@-Rn    Rn – 4 → Rn, MACL → (Rn)    0100nnnn00010010   1       —
STS.L PR,@-Rn      Rn – 4 → Rn, PR → (Rn)      0100nnnn00100010   1       —
Description: Stores data from system register MACH, MACL, or PR into a specified destination.
Note: No interrupts are accepted between this instruction and the next instruction. Address errors
are accepted.
Operation:
STSMACH(long n)
/* STS MACH,Rn */
{
R[n]=MACH;
PC+=2;
}
STSMACL(long n)
/* STS MACL,Rn */
{
R[n]=MACL;
PC+=2;
}
STSPR(long n)
/* STS PR,Rn */
{
R[n]=PR;
PC+=2;
}
STSMMACH(long n)
/* STS.L MACH,@-Rn */
{
R[n]-=4;
Write_Long(R[n],MACH);
PC+=2;
}
STSMMACL(long n)
/* STS.L MACL,@-Rn */
{
R[n]-=4;
Write_Long(R[n],MACL);
PC+=2;
}
STSMPR(long n) /* STS.L PR,@-Rn */
{
R[n]-=4;
Write_Long(R[n],PR);
PC+=2;
}
Example:
STS MACH,R0         ;Before execution: R0 = H'FFFFFFFF, MACH = H'00000000
                    ;After execution:  R0 = H'00000000
STS.L PR,@-R15      ;Before execution: R15 = H'10000004
                    ;After execution:  R15 = H'10000000, @R15 = PR
7.2.60 SUB (Subtract Binary): Arithmetic Instruction
Format      Abstract       Code               Cycle   T Bit
SUB Rm,Rn   Rn – Rm → Rn   0011nnnnmmmm1000   1       —
Description: Subtracts general register Rm data from Rn data, and stores the result in Rn. To
subtract immediate data, use ADD #imm,Rn.
Operation:
SUB(long m,long n)
/* SUB Rm,Rn */
{
R[n]-=R[m];
PC+=2;
}
Example:
SUB R0,R1   ;Before execution: R0 = H'00000001, R1 = H'80000000
            ;After execution:  R1 = H'7FFFFFFF
7.2.61 SUBC (Subtract with Carry): Arithmetic Instruction
Format       Abstract                         Code               Cycle   T Bit
SUBC Rm,Rn   Rn – Rm – T → Rn, Borrow → T     0011nnnnmmmm1010   1       Borrow
Description: Subtracts Rm data and the T bit value from general register Rn data, and stores the
result in Rn. The T bit changes according to the result. This instruction is used for subtraction of
data that has more than 32 bits.
Operation:
SUBC(long m,long n)
/* SUBC Rm,Rn */
{
unsigned long tmp0,tmp1;
tmp1=R[n]-R[m];
tmp0=R[n];
R[n]=tmp1-T;
if (tmp0<tmp1) T=1;
else T=0;
if (tmp1<R[n]) T=1;
PC+=2;
}
Examples:
CLRT            ;R0:R1(64 bits) – R2:R3(64 bits) = R0:R1(64 bits)
SUBC R3,R1      ;Before execution: T = 0, R1 = H'00000000, R3 = H'00000001
                ;After execution:  T = 1, R1 = H'FFFFFFFF
SUBC R2,R0      ;Before execution: T = 1, R0 = H'00000000, R2 = H'00000000
                ;After execution:  T = 1, R0 = H'FFFFFFFF
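The CLRT / SUBC / SUBC sequence above chains the borrow through the T bit. As a rough illustration, the same chaining can be written in standard C. This is a sketch under the assumption that registers are modeled as uint32_t values; the names subc and sub64 are hypothetical helpers, not functions defined by the manual.

```c
#include <stdint.h>

/* One SUBC step: returns a - b - t_in and writes the borrow to *t_out. */
uint32_t subc(uint32_t a, uint32_t b, int t_in, int *t_out)
{
    uint32_t tmp = a - b;                 /* wraps modulo 2^32 */
    uint32_t res = tmp - (uint32_t)t_in;
    *t_out = (a < tmp) || (tmp < res);    /* borrow from either subtraction */
    return res;
}

/* 64-bit subtraction from two SUBC steps, low word first. */
uint64_t sub64(uint32_t hi_a, uint32_t lo_a,
               uint32_t hi_b, uint32_t lo_b, int *borrow)
{
    int t = 0;                            /* CLRT */
    uint32_t lo = subc(lo_a, lo_b, t, &t);
    uint32_t hi = subc(hi_a, hi_b, t, &t);
    *borrow = t;
    return ((uint64_t)hi << 32) | lo;
}
```

Subtracting 1 from 0 with sub64 reproduces the register values in the example: both words become H'FFFFFFFF and the final borrow is 1.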
7.2.62 SUBV (Subtract with V Flag Underflow Check): Arithmetic Instruction
Format       Abstract                       Code               Cycle   T Bit
SUBV Rm,Rn   Rn – Rm → Rn, underflow → T    0011nnnnmmmm1011   1       Underflow
Description: Subtracts Rm data from general register Rn data, and stores the result in Rn. If an
underflow occurs, the T bit is set to 1.
Operation:
SUBV(long m,long n)
/* SUBV Rm,Rn */
{
long dest,src,ans;
if ((long)R[n]>=0) dest=0;
else dest=1;
if ((long)R[m]>=0) src=0;
else src=1;
src+=dest;
R[n]-=R[m];
if ((long)R[n]>=0) ans=0;
else ans=1;
ans+=dest;
if (src==1) {
if (ans==1) T=1;
else T=0;
}
else T=0;
PC+=2;
}
Examples:
SUBV R0,R1      ;Before execution: R0 = H'00000002, R1 = H'80000001
                ;After execution:  R1 = H'7FFFFFFF, T = 1
SUBV R2,R3      ;Before execution: R2 = H'FFFFFFFE, R3 = H'7FFFFFFE
                ;After execution:  R3 = H'80000000, T = 1
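The sign bookkeeping in the SUBV operation reduces to a single test: the operands have different signs and the result's sign differs from the original Rn. The sketch below expresses that in standard C, assuming 32-bit registers modeled as uint32_t; the function name subv is illustrative only.

```c
#include <stdint.h>

/* Returns a - b (wrapping) and sets *t to 1 when the signed result
   does not fit in 32 bits, otherwise 0. */
uint32_t subv(uint32_t a, uint32_t b, int *t)
{
    uint32_t res = a - b;
    /* Signs of a and b differ, and the sign of res differs from a. */
    *t = (int)(((a ^ b) & (a ^ res)) >> 31);
    return res;
}
```

Both manual examples set T: H'80000001 minus 2 wraps to a positive value, and H'7FFFFFFE minus H'FFFFFFFE (that is, minus –2) wraps to a negative value.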
7.2.63 SWAP (Swap Register Halves): Data Transfer Instruction
Format         Abstract                                                  Code               Cycle   T Bit
SWAP.B Rm,Rn   Rm → Swap upper and lower halves of lower 2 bytes → Rn    0110nnnnmmmm1000   1       —
SWAP.W Rm,Rn   Rm → Swap upper and lower word → Rn                       0110nnnnmmmm1001   1       —
Description: Swaps the upper and lower bytes of the general register Rm data, and stores the
result in Rn. If a byte is specified, bits 0 to 7 of Rm are swapped for bits 8 to 15. The upper 16 bits
of Rm are transferred to the upper 16 bits of Rn. If a word is specified, bits 0 to 15 of Rm are
swapped for bits 16 to 31.
Operation:
SWAPB(long m,long n) /* SWAP.B Rm,Rn */
{
unsigned long temp0,temp1;
temp0=R[m]&0xffff0000;
temp1=(R[m]&0x000000ff)<<8;
R[n]=(R[m]>>8)&0x000000ff;
R[n]=R[n]|temp1|temp0;
PC+=2;
}
SWAPW(long m,long n) /* SWAP.W Rm,Rn */
{
unsigned long temp;
temp=(R[m]>>16)&0x0000FFFF;
R[n]=R[m]<<16;
R[n]|=temp;
PC+=2;
}
Examples:
SWAP.B R0,R1    ;Before execution: R0 = H'12345678
                ;After execution:  R1 = H'12347856
SWAP.W R0,R1    ;Before execution: R0 = H'12345678
                ;After execution:  R1 = H'56781234
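For readers porting code that relies on SWAP, the two forms can be approximated in standard C as follows. This is a sketch assuming uint32_t register values; swap_b and swap_w are illustrative names.

```c
#include <stdint.h>

/* SWAP.B: exchange bits 0-7 with bits 8-15, keep the upper 16 bits. */
uint32_t swap_b(uint32_t rm)
{
    return (rm & 0xFFFF0000u) | ((rm & 0xFFu) << 8) | ((rm >> 8) & 0xFFu);
}

/* SWAP.W: exchange bits 0-15 with bits 16-31. */
uint32_t swap_w(uint32_t rm)
{
    return (rm << 16) | (rm >> 16);
}
```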
7.2.64 TAS (Test and Set): Logic Operation Instruction
Format      Abstract                                   Code               Cycle   T Bit
TAS.B @Rn   When (Rn) is 0, 1 → T, 1 → MSB of (Rn)     0100nnnn00011011   4       Test results
Description: Reads byte data from the address specified by general register Rn, and sets the T bit
to 1 if the data is 0, or clears the T bit to 0 if the data is not 0. Then, data bit 7 is set to 1, and the
data is written to the address specified by Rn. During this operation, the bus is not released.
Operation:
TAS(long n)
/* TAS.B @Rn */
{
long temp;
temp=(long)Read_Byte(R[n]);
/* Bus Lock enable */
if (temp==0) T=1;
else T=0;
temp|=0x00000080;
Write_Byte(R[n],temp);
/* Bus Lock disable */
PC+=2;
}
Example:
_LOOP   TAS.B @R7       ;R7 = 1000
        BF _LOOP        ;Loops until data in address 1000 is 0
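The loop above is the usual way to take a byte-wide lock. As a behavioral sketch only, the read-test-set step can be modeled in standard C as shown below. The function name tas_b is hypothetical, and plain C gives no bus locking, so this models the data effect of the instruction and not its atomicity.

```c
#include <stdint.h>

/* Models TAS.B @Rn on one byte: returns the T bit (1 if the byte was 0)
   and always leaves bit 7 of the byte set. Not atomic in plain C. */
int tas_b(uint8_t *p)
{
    int t = (*p == 0);
    *p |= 0x80u;
    return t;
}
```

The first call on a zero byte returns 1 (lock acquired); later calls return 0 until the owner writes 0 back to the byte.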
7.2.65 TRAPA (Trap Always): System Control Instruction
Format       Abstract                                     Code               Cycle   T Bit
TRAPA #imm   PC/SR → Stack area, (imm × 4 + VBR) → PC     11000011iiiiiiii   8       —
Description: Starts the trap exception processing. The PC and SR values are stored on the stack,
and the program branches to an address specified by the vector. The vector is a memory address
obtained by zero-extending the 8-bit immediate data and then quadrupling it. The PC is the start
address of the next instruction. TRAPA and RTE are both used together for system calls.
Operation:
TRAPA(long i) /* TRAPA #imm */
{
long imm;
imm=(0x000000FF & i);
R[15]-=4;
Write_Long(R[15],SR);
R[15]-=4;
Write_Long(R[15],PC-2);
PC=Read_Long(VBR+(imm<<2))+4;
}
Example:
Address
VBR+H'80    .data.l 10000000    ;
            ..........
            TRAPA #H'20         ;Branches to an address specified by data in address VBR + H'80
            TST #0,R0           ;← Return address from the trap routine (stacked PC value)
            ...........
            ..........
100000000   XOR R0,R0           ;← Trap routine entrance
100000002   RTE                 ;Returns to the TST instruction
100000004   NOP                 ;Executes NOP before RTE
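The example uses TRAPA #H'20 and a vector stored at VBR + H'80 because the immediate is multiplied by 4. A small sketch of that address calculation in standard C follows; trapa_vector_address is an illustrative name and the VBR values in the usage note are arbitrary.

```c
#include <stdint.h>

/* Address of the vector table entry read by TRAPA #imm:
   the 8-bit immediate is zero-extended, multiplied by 4, and added to VBR. */
uint32_t trapa_vector_address(uint32_t vbr, uint32_t imm)
{
    return vbr + ((imm & 0xFFu) << 2);
}
```

For imm = H'20 the offset is H'80, which matches the vector location shown in the example.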
7.2.66 TST (Test Logical): Logic Operation Instruction
Format                   Abstract                                      Code               Cycle   T Bit
TST Rm,Rn                Rn & Rm, when result is 0, 1 → T              0010nnnnmmmm1000   1       Test results
TST #imm,R0              R0 & imm, when result is 0, 1 → T             11001000iiiiiiii   1       Test results
TST.B #imm,@(R0,GBR)     (R0 + GBR) & imm, when result is 0, 1 → T     11001100iiiiiiii   3       Test results
Description: Logically ANDs the contents of general registers Rn and Rm, and sets the T bit to 1
if the result is 0 or clears the T bit to 0 if the result is not 0. The Rn data does not change. The
contents of general register R0 can also be ANDed with zero-extended 8-bit immediate data, or the
contents of 8-bit memory accessed by indirect indexed GBR addressing can be ANDed with 8-bit
immediate data. The R0 and memory data do not change.
Operation:
TST(long m,long n)
/* TST Rm,Rn */
{
if ((R[n]&R[m])==0) T=1;
else T=0;
PC+=2;
}
TSTI(long i)
/* TST #imm,R0 */
{
long temp;
temp=R[0]&(0x000000FF & (long)i);
if (temp==0) T=1;
else T=0;
PC+=2;
}
TSTM(long i)
/* TST.B #imm,@(R0,GBR) */
{
long temp;
temp=(long)Read_Byte(GBR+R[0]);
temp&=(0x000000FF & (long)i);
if (temp==0) T=1;
else T=0;
PC+=2;
}
Examples:
TST R0,R0               ;Before execution: R0 = H'00000000
                        ;After execution:  T = 1
TST #H'80,R0            ;Before execution: R0 = H'FFFFFF7F
                        ;After execution:  T = 1
TST.B #H'A5,@(R0,GBR)   ;Before execution: @(R0,GBR) = H'A5
                        ;After execution:  T = 0
7.2.67 XOR (Exclusive OR Logical): Logic Operation Instruction
Format                   Abstract                          Code               Cycle   T Bit
XOR Rm,Rn                Rn ^ Rm → Rn                      0010nnnnmmmm1010   1       —
XOR #imm,R0              R0 ^ imm → R0                     11001010iiiiiiii   1       —
XOR.B #imm,@(R0,GBR)     (R0 + GBR) ^ imm → (R0 + GBR)     11001110iiiiiiii   3       —
Description: Exclusive ORs the contents of general registers Rn and Rm, and stores the result in
Rn. The contents of general register R0 can also be exclusive ORed with zero-extended 8-bit
immediate data, or 8-bit memory accessed by indirect indexed GBR addressing can be exclusive
ORed with 8-bit immediate data.
Operation:
XOR(long m,long n)
/* XOR Rm,Rn */
{
R[n]^=R[m];
PC+=2;
}
XORI(long i)
/* XOR #imm,R0 */
{
R[0]^=(0x000000FF & (long)i);
PC+=2;
}
XORM(long i)
/* XOR.B #imm,@(R0,GBR) */
{
long temp;
temp=(long)Read_Byte(GBR+R[0]);
temp^=(0x000000FF & (long)i);
Write_Byte(GBR+R[0],temp);
PC+=2;
}
Examples:
XOR R0,R1               ;Before execution: R0 = H'AAAAAAAA, R1 = H'55555555
                        ;After execution:  R1 = H'FFFFFFFF
XOR #H'F0,R0            ;Before execution: R0 = H'FFFFFFFF
                        ;After execution:  R0 = H'FFFFFF0F
XOR.B #H'A5,@(R0,GBR)   ;Before execution: @(R0,GBR) = H'A5
                        ;After execution:  @(R0,GBR) = H'00
7.2.68 XTRCT (Extract): Data Transfer Instruction
Format        Abstract                        Code               Cycle   T Bit
XTRCT Rm,Rn   Rm: Center 32 bits of Rn → Rn   0010nnnnmmmm1101   1       —
Description: Extracts the middle 32 bits from the 64 bits of coupled general registers Rm and Rn,
and stores the 32 bits in Rn (figure 7.13).
Rm [MSB ... LSB] : Rn [MSB ... LSB] → Rn (lower 16 bits of Rm, upper 16 bits of Rn)
Figure 7.13 Extract
Operation:
XTRCT(long m,long n) /* XTRCT Rm,Rn */
{
unsigned long temp;
temp=(R[m]<<16)&0xFFFF0000;
R[n]=(R[n]>>16)&0x0000FFFF;
R[n]|=temp;
PC+=2;
}
Example:
XTRCT R0,R1     ;Before execution: R0 = H'01234567, R1 = H'89ABCDEF
                ;After execution:  R1 = H'456789AB
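The extraction of the middle 32 bits can be checked with a short standard C sketch, assuming uint32_t register values; xtrct is an illustrative name.

```c
#include <stdint.h>

/* XTRCT: the lower 16 bits of Rm become the upper half of the result,
   the upper 16 bits of Rn become the lower half. */
uint32_t xtrct(uint32_t rm, uint32_t rn)
{
    return (rm << 16) | (rn >> 16);
}
```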
7.3 Floating Point Instructions and FPU Related CPU Instructions
The functions used in the descriptions of the operation of FPU calculations are as follows.
long FPSCR;
int T;
int load_long(long *address, long *data)
{
    /* This function is defined in CPU part */
}
int store_long(long *address, long *data)
{
    /* This function is defined in CPU part */
}
int sign_of(long *src)
{
    return(*src >> 31);
}
int data_type_of(long *src)
{
    long abs;
    abs = *src & 0x7fffffff;
    if(abs < 0x00800000) {
        if(sign_of(src) == 0) return(PZERO);
        else return(NZERO);
    }
    else if((0x00800000 <= abs) && (abs < 0x7f800000))
        return(NORM);
    else if(0x7f800000 == abs) {
        if(sign_of(src) == 0) return(PINF);
        else return(NINF);
    }
    else if(0x00400000 & abs)
        return(sNaN);
    else
        return(qNaN);
}
clear_cause_VZ(){ FPSCR &= (~CAUSE_V & ~CAUSE_Z); }
set_V(){ FPSCR |= (CAUSE_V | FLAG_V); }
set_Z(){ FPSCR |= (CAUSE_Z | FLAG_Z); }
invalid(float *dest)
{
    set_V();
    if((FPSCR & ENABLE_V) == 0) qnan(dest);
}
dz(float *dest, int sign)
{
    set_Z();
    if((FPSCR & ENABLE_Z) == 0) inf(dest,sign);
}
zero(float *dest, int sign)
{
    if(sign == 0)
        *dest = 0x00000000;
    else
        *dest = 0x80000000;
}
inf(float *dest, int sign)
{
    if(sign == 0)
        *dest = 0x7f800000;
    else
        *dest = 0xff800000;
}
qnan(float *dest)
{
    *dest = 0x7fbfffff;
}
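The classification performed by data_type_of can be tried out on raw bit patterns. The following is a sketch in standard C that mirrors the thresholds in the listing above, operating on a uint32_t bit pattern. The enum fp_type and the function classify_bits are illustrative names, not part of the manual. Note that, as in the listing, a set bit 22 marks a signaling NaN and denormalized values are classified as zero.

```c
#include <stdint.h>

/* Illustrative data types, one per class returned by data_type_of. */
typedef enum {
    FP_PZERO, FP_NZERO, FP_NORM, FP_PINF, FP_NINF, FP_QNAN, FP_SNAN
} fp_type;

fp_type classify_bits(uint32_t bits)
{
    uint32_t mag = bits & 0x7FFFFFFFu;   /* magnitude without the sign bit */
    int negative = (int)(bits >> 31);

    if (mag < 0x00800000u)               /* zero or denormalized */
        return negative ? FP_NZERO : FP_PZERO;
    if (mag < 0x7F800000u)
        return FP_NORM;
    if (mag == 0x7F800000u)
        return negative ? FP_NINF : FP_PINF;
    return (mag & 0x00400000u) ? FP_SNAN : FP_QNAN;
}
```

The value H'7FBFFFFF written by qnan() classifies as a quiet NaN under this rule because bit 22 is clear.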
7.3.1 FABS (Floating Point Absolute Value): Floating Point Instruction
Format     Abstract      Code               Cycle   T Bit
FABS FRn   |FRn| → FRn   1111nnnn01011101   1       —
Description: Obtains arithmetic absolute value (as a floating point number) of the contents of
floating point register FRn. The calculation result is stored in FRn.
Operation:
FABS(float *FRn) /* FABS FRn */
{
    clear_cause_VZ();
    case(data_type_of(FRn)) {
    NORM :
        if(sign_of(FRn) == 0) *FRn = *FRn;
        else *FRn = -*FRn;
        break;
    PZERO :
    NZERO : zero(FRn,0); break;
    PINF :
    NINF : inf(FRn,0); break;
    qNaN : qnan(FRn); break;
    sNaN : invalid(FRn); break;
    }
    pc += 2;
}
FABS Special Cases
FRn         NORM   +0   –0   +INF   –INF   qNaN   sNaN
FABS(FRn)   ABS    +0   +0   +INF   +INF   qNaN   Invalid
Note: Non-normalized values are treated as zero.
Exceptions: Invalid operation
Examples:
FABS FR2    ; Floating point absolute value
            ; Before execution: FR2=H'C0800000/*–4 in base 10*/
            ; After execution:  FR2=H'40800000/*4 in base 10*/
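For normalized values, the effect of FABS on the bit pattern is simply to clear bit 31. The sketch below shows this in standard C, assuming the host float is a 32-bit IEEE 754 single; fabs_bits is an illustrative name, and the special-case handling from the table (NaN, exception flags) is not modeled.

```c
#include <stdint.h>
#include <string.h>

/* Clears the sign bit of a 32-bit float bit pattern (NORM case only). */
float fabs_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret without aliasing issues */
    bits &= 0x7FFFFFFFu;
    memcpy(&x, &bits, sizeof x);
    return x;
}
```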
7.3.2 FADD (Floating Point Add): Floating Point Instruction
Format         Abstract        Code               Cycles   T Bit
FADD FRm,FRn   FRn+FRm → FRn   1111nnnnmmmm0000   1        —
Description: Arithmetically adds (as floating point numbers) the contents of floating point
registers FRm and FRn. The calculation result is stored in FRn.
Operation:
FADD(float *FRm,*FRn) /* FADD FRm,FRn */
{
    clear_cause_VZ();
    if((data_type_of(FRm) == sNaN) ||
       (data_type_of(FRn) == sNaN)) invalid(FRn);
    else if((data_type_of(FRm) == qNaN) ||
            (data_type_of(FRn) == qNaN)) qnan(FRn);
    else case(data_type_of(FRm)) {
    NORM :
        case(data_type_of(FRn)) {
        PINF : inf(FRn,0); break;
        NINF : inf(FRn,1); break;
        default : *FRn = *FRn + *FRm; break;
        }
        break;
    PZERO :
        case(data_type_of(FRn)) {
        NORM : *FRn = *FRn + *FRm; break;
        PZERO :
        NZERO : zero(FRn,0); break;
        PINF : inf(FRn,0); break;
        NINF : inf(FRn,1); break;
        }
        break;
    NZERO :
        case(data_type_of(FRn)) {
        NORM : *FRn = *FRn + *FRm; break;
        PZERO : zero(FRn,0); break;
        NZERO : zero(FRn,1); break;
        PINF : inf(FRn,0); break;
        NINF : inf(FRn,1); break;
        }
        break;
    PINF :
        case(data_type_of(FRn)) {
        NINF : invalid(FRn); break;
        default : inf(FRn,0); break;
        }
        break;
    NINF :
        case(data_type_of(FRn)) {
        PINF : invalid(FRn); break;
        default : inf(FRn,1); break;
        }
        break;
    }
    pc += 2;
}
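FADD Special Cases
FRm \ FRn   NORM      +0        –0        +INF      –INF      qNaN      sNaN
NORM        ADD       ADD       ADD       +INF      –INF      qNaN      Invalid
+0          ADD       +0        +0        +INF      –INF      qNaN      Invalid
–0          ADD       +0        –0        +INF      –INF      qNaN      Invalid
+INF        +INF      +INF      +INF      +INF      Invalid   qNaN      Invalid
–INF        –INF      –INF      –INF      Invalid   –INF      qNaN      Invalid
qNaN        qNaN      qNaN      qNaN      qNaN      qNaN      qNaN      Invalid
sNaN        Invalid   Invalid   Invalid   Invalid   Invalid   Invalid   Invalid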
Note: Non-normalized values are treated as zero.
Exceptions: Invalid operation
Examples:
FADD FR2,FR3    ; Floating point add
                ; Before execution: FR2=H'40400000/*3 in base 10*/
                ;                   FR3=H'3F800000/*1 in base 10*/
                ; After execution:  FR2=H'40400000
                ;                   FR3=H'40800000/*4 in base 10*/
FADD FR5,FR4    ;
                ; Before execution: FR5=H'40400000/*3 in base 10*/
                ;                   FR4=H'C0000000/*–2 in base 10*/
                ; After execution:  FR5=H'40400000
                ;                   FR4=H'3F800000/*1 in base 10*/
7.3.3 FCMP (Floating Point Compare): Floating Point Instruction
Format            Abstract               Code               Cycle   T Bit
FCMP/EQ FRm,FRn   (FRn==FRm)? 1:0 → T    1111nnnnmmmm0100   1       Comparison result
FCMP/GT FRm,FRn   (FRn>FRm)? 1:0 → T     1111nnnnmmmm0101   1       Comparison result
Description: Arithmetically compares (as floating point numbers) the contents of floating point
registers FRm and FRn. The calculation result (true/false) is written to the T bit.
Operation:
FCMP_EQ(float *FRm,*FRn) /* FCMP/EQ FRm,FRn */
{
    clear_cause_VZ();
    if(fcmp_chk(FRm,FRn) == INVALID) {fcmp_invalid(0);}
    else if(fcmp_chk(FRm,FRn) == EQ) T = 1;
    else T = 0;
    pc += 2;
}
FCMP_GT(float *FRm,*FRn) /* FCMP/GT FRm,FRn */
{
    clear_cause_VZ();
    if((fcmp_chk(FRm,FRn) == INVALID) ||
       (fcmp_chk(FRm,FRn) == UO)) {fcmp_invalid(0);}
    else if(fcmp_chk(FRm,FRn) == GT) T = 1;
    else T = 0;
    pc += 2;
}
fcmp_chk(float *FRm,*FRn)
{
    if((data_type_of(FRm) == sNaN) ||
       (data_type_of(FRn) == sNaN)) return(INVALID);
    else if((data_type_of(FRm) == qNaN) ||
            (data_type_of(FRn) == qNaN)) return(UO);
    else case(data_type_of(FRm)) {
    NORM :
        case(data_type_of(FRn)) {
        PINF : return(GT); break;
        NINF : return(NOTGT); break;
        default : break;
        }
        break;
    PZERO :
    NZERO :
        case(data_type_of(FRn)) {
        PZERO :
        NZERO : return(EQ); break;
        PINF : return(GT); break;
        NINF : return(NOTGT); break;
        default : break;
        }
        break;
    PINF :
        case(data_type_of(FRn)) {
        PINF : return(EQ); break;
        default : return(NOTGT); break;
        }
        break;
    NINF :
        case(data_type_of(FRn)) {
        NINF : return(EQ); break;
        default : return(GT); break;
        }
        break;
    }
    if(*FRn == *FRm) return(EQ);
    else if(*FRn > *FRm) return(GT);
    else return(NOTGT);
}
fcmp_invalid(int cmp_flag)
{
    set_V();
    if((FPSCR & ENABLE_V) == 0) T = cmp_flag;
}
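FCMP Special Cases
FRm \ FRn   NORM      +0        –0        +INF      –INF      qNaN      sNaN
NORM        CMP       CMP       CMP       GT        !GT       UO        Invalid
+0          CMP       EQ        EQ        GT        !GT       UO        Invalid
–0          CMP       EQ        EQ        GT        !GT       UO        Invalid
+INF        !GT       !GT       !GT       EQ        !GT       UO        Invalid
–INF        GT        GT        GT        GT        EQ        UO        Invalid
qNaN        UO        UO        UO        UO        UO        UO        Invalid
sNaN        Invalid   Invalid   Invalid   Invalid   Invalid   Invalid   Invalid
Notes: 1. A UO entry is treated as unordered by FCMP/EQ and as invalid by FCMP/GT.
2. Non-normalized values are treated as zero.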
Exceptions: Invalid operation
Note: Four comparison operations that are independent of each other are defined in the IEEE
standard, but the SH-2E supports FCMP/EQ and FCMP/GT only. However, all
comparison conditions can be supported by using these two FCMP instructions in
combination with the BT and BF instructions.
(FRm == FRn)        fcmp/eq FRm, FRn ; bt
(FRm != FRn)        fcmp/eq FRm, FRn ; bf
(FRm < FRn)         fcmp/gt FRm, FRn ; bt
(FRm <= FRn)        fcmp/gt FRn, FRm ; bf
(FRm > FRn)         fcmp/gt FRn, FRm ; bt
(FRm >= FRn)        fcmp/gt FRm, FRn ; bf
Unorder FRm, FRn    fcmp/eq FRm, FRm ; bf
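The way the six relations are built from only two compare instructions can be sketched in standard C. The functions fcmp_eq and fcmp_gt below model the T bit produced by FCMP/EQ and FCMP/GT for ordinary (non-NaN) operands; all function names are illustrative, and the exception behavior for NaN inputs is not modeled, so the negated forms are only meaningful for ordered operands.

```c
#include <stdbool.h>

/* T bit after "fcmp/eq FRm,FRn" and "fcmp/gt FRm,FRn" (ordered operands). */
bool fcmp_eq(float frm, float frn) { return frn == frm; }
bool fcmp_gt(float frm, float frn) { return frn > frm; }

/* Each relation is one compare followed by BT (use T) or BF (use !T). */
bool fr_eq(float frm, float frn) { return  fcmp_eq(frm, frn); } /* bt */
bool fr_ne(float frm, float frn) { return !fcmp_eq(frm, frn); } /* bf */
bool fr_lt(float frm, float frn) { return  fcmp_gt(frm, frn); } /* bt */
bool fr_le(float frm, float frn) { return !fcmp_gt(frn, frm); } /* bf */
bool fr_gt(float frm, float frn) { return  fcmp_gt(frn, frm); } /* bt */
bool fr_ge(float frm, float frn) { return !fcmp_gt(frm, frn); } /* bf */
```

The operand order matters: FCMP/GT FRm,FRn tests FRn > FRm, so "FRm < FRn" needs no operand swap while "FRm > FRn" does.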
Examples:
FCMP/EQ:
        FLDI1   FR6         ;FR6=H'3F800000/*1 in base 10*/
        FLDI1   FR7         ;FR7=H'3F800000
        CLRT                ;T Bit =0
        FCMP/EQ FR6,FR7     ; Floating point compare, equal
        BF      TRGET_F     ; Don't branch (T=1)
        BT/S    TRGET_T     ; Branch
        FADD    FR6,FR7     ; Delay slot, FR7=H'40000000/*2 in base 10*/
        NOP
        NOP
TRGET_F FCMP/EQ FR6,FR7
        BT/S    TRGET_T     ; Don't branch (T=0)
        FLDI1   FR7         ; Delay slot
TRGET_T FCMP/EQ FR6,FR7     ; T bit = 0
        BF      TRGET_F     ; Branch first time only
        NOP                 ;FR6=FR7=H'3F800000/*1 in base 10*/
        .END
FCMP/GT:
        FLDI1   FR2         ;FR2=H'3F800000/*1 in base 10*/
        FLDI1   FR7
        FADD    FR2,FR7     ;FR7=H'40000000/*2 in base 10*/
        CLRT                ; T bit = 0
        FCMP/GT FR2,FR7     ; Floating point compare, greater than
        BT/S    TRGET_T     ; Branch (T=1)
        FLDI1   FR7
TRGET_T FCMP/GT FR2,FR7     ; T bit = 0
        BT      TRGET_T     ; Don't branch (T=0)
        .END
7.3.4 FDIV (Floating Point Divide): Floating Point Instruction
Format          Abstract        Code               Cycles   T Bit
FDIV FRm, FRn   FRn/FRm → FRn   1111nnnnmmmm0011   13       —
Description: Arithmetically divides (as floating point numbers) the contents of floating point
register FRn by the contents of floating point register FRm. The calculation result is stored in FRn.
Operation:
FDIV(float *FRm,*FRn) /* FDIV FRm,FRn */
{
    clear_cause_VZ();
    if((data_type_of(FRm) == sNaN) ||
       (data_type_of(FRn) == sNaN)) invalid(FRn);
    else if((data_type_of(FRm) == qNaN) ||
            (data_type_of(FRn) == qNaN)) qnan(FRn);
    else case(data_type_of(FRm)) {
    NORM :
        case(data_type_of(FRn)) {
        PINF :
        NINF : inf(FRn,sign_of(FRm)^sign_of(FRn)); break;
        default : *FRn = *FRn / *FRm; break;
        }
        break;
    PZERO :
    NZERO :
        case(data_type_of(FRn)) {
        PZERO :
        NZERO : invalid(FRn); break;
        PINF :
        NINF : inf(FRn,sign_of(FRm)^sign_of(FRn)); break;
        default : dz(FRn,sign_of(FRm)^sign_of(FRn)); break;
        }
        break;
    PINF :
    NINF :
        case(data_type_of(FRn)) {
        PINF :
        NINF : invalid(FRn); break;
        default : zero(FRn,sign_of(FRm)^sign_of(FRn)); break;
        }
        break;
    }
    pc += 2;
}
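FDIV Special Cases
FRm \ FRn   NORM      +0        –0        +INF      –INF      qNaN      sNaN
NORM        DIV       0         0         INF       INF       qNaN      Invalid
+0          DZ        Invalid   Invalid   +INF      –INF      qNaN      Invalid
–0          DZ        Invalid   Invalid   –INF      +INF      qNaN      Invalid
+INF        0         +0        –0        Invalid   Invalid   qNaN      Invalid
–INF        0         –0        +0        Invalid   Invalid   qNaN      Invalid
qNaN        qNaN      qNaN      qNaN      qNaN      qNaN      qNaN      Invalid
sNaN        Invalid   Invalid   Invalid   Invalid   Invalid   Invalid   Invalid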
Note: Non-normalized values are treated as zero.
Exceptions: Invalid operation, divide by zero
Examples:
FDIV FR6, FR5   ; Floating point divide
                ; Before execution: FR5=H'40800000/*4 in base 10*/
                ;                   FR6=H'40400000/*3 in base 10*/
                ; After execution:  FR5=H'3FAAAAAA/*1.33... in base 10*/
                ;                   FR6=H'40400000
7.3.5 FLDI0 (Floating Point Load Immediate 0): Floating Point Instruction
Format      Abstract           Code               Cycles   T Bit
FLDI0 FRn   H'00000000 → FRn   1111nnnn10001101   1        —
Description: Loads the floating point number 0 (0x00000000) in floating point register FRn.
Operation:
FLDI0(float *FRn)
/* FLDI0 FRn */
{
*FRn = 0x00000000;
pc += 2;
}
Exceptions: None
Examples:
FLDI0 FR1   ; Load immediate 0
            ; Before execution: FR1=x (don't care)
            ; After execution:  FR1=00000000
7.3.6 FLDI1 (Floating Point Load Immediate 1): Floating Point Instruction
Format      Abstract           Code               Cycles   T Bit
FLDI1 FRn   H'3F800000 → FRn   1111nnnn10011101   1        —
Description: Loads the floating point number 1 (0x3F800000) into floating point register FRn.
Operation:
FLDI1(float *FRn)
/* FLDI1 FRn */
{
*FRn = 0x3F800000;
pc += 2;
}
Exceptions: None
Examples:
FLDI1 FR2   ; Load immediate 1
            ; Before execution: FR2=x (don't care)
            ; After execution:  FR2=H'3F800000/*1 in base 10*/
7.3.7 FLDS (Floating Point Load to System Register): Floating Point Instruction
Format          Abstract     Code               Cycles   T Bit
FLDS FRm,FPUL   FRm → FPUL   1111nnnn00011101   1        —
Description: Loads the contents of floating point register FRm to system register FPUL.
Operation:
FLDS(float *FRm,*FPUL)
/* FLDS FRm,FPUL */
{
*FPUL = *FRm;
pc += 2;
}
Exceptions: None
Examples:
;Before execution of FLDS and FSTS:
FLDI1 FR6           ;FR6=H'3F800000/*1 in base 10*/
FLDI0 FR2           ;FR2=0
;After execution of FLDS and FSTS:
FLDS FR6, FPUL      ;FPUL=H'3F800000
FSTS FPUL, FR2      ;FR2=H'3F800000
7.3.8 FLOAT (Floating Point Convert from Integer): Floating Point Instruction
Format           Abstract            Code               Cycles   T Bit
FLOAT FPUL,FRn   (float)FPUL → FRn   1111nnnn00101101   1        —
Description: Interprets the contents of FPUL as an integer value and converts it into a floating
point number. The result is stored in floating point register FRn.
Operation:
FLOAT(int *FPUL,float *FRn)
/* FLOAT FPUL,FRn */
{
clear_cause_VZ();
*FRn = (float)*FPUL;
pc += 2;
}
Exceptions: None
Examples:
;Floating Point Convert from Integer
;Before execution of FLOAT instruction:
MOV.L #H'00000003,R1    ; R1=H'00000003
FLDI0 FR2               ; FR2=0
LDS R1, FPUL            ; FPUL=H'00000003
;After execution of FLOAT instruction:
FLOAT FPUL, FR2         ; FR2=H'40400000/*3 in base 10*/
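The bit pattern H'40400000 in the example can be reproduced on a host machine. The sketch below is standard C and assumes the host float is a 32-bit IEEE 754 single; float_from_int_bits is an illustrative name. Rounding of integers too large to be represented exactly is not addressed here.

```c
#include <stdint.h>
#include <string.h>

/* Converts a signed integer (the FPUL contents) to float and
   returns the resulting 32-bit pattern. */
uint32_t float_from_int_bits(int32_t fpul)
{
    float f = (float)fpul;
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```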
7.3.9 FMAC (Floating Point Multiply Accumulate): Floating Point Instruction
Format               Abstract              Code               Cycles   T Bit
FMAC FR0, FRm,FRn    FR0 × FRm+FRn → FRn   1111nnnnmmmm1110   1        —
Description: Arithmetically multiplies (as floating point numbers) the contents of floating point
registers FR0 and FRm. To this calculation result is added the contents of floating point register
FRn, and the result is stored in FRn.
Operation:
FMAC(float *FR0,*FRm,*FRn) /* FMAC FR0,FRm,FRn */
{
    long tmp_FPSCR;
    float *tmp_FMUL = *FRm;
    FMUL(FR0,tmp_FMUL);
    pc -= 2;                /* correct pc */
    tmp_FPSCR = FPSCR;      /* save cause field for FR0*FRm */
    FADD(tmp_FMUL,FRn);
    FPSCR |= tmp_FPSCR;     /* reflect cause field for FR0*FRm */
}
FMAC Special Cases
FRn
FR0
FRm
+NORM –NORM
NORM
NORM
+0
–0
MAC
+INF
+INF
–INF
–INF
–INF
+INF
NORM
MAC
Invalid
+INF
–INF
–INF
+INF
+0
+INF
+INF
–INF
–INF
–INF
+INF
+NORM
MAC
Invalid
Invalid
+INF
–INF
–INF
+INF
+0
–0
+INF
–INF
–0
+0
–INF
+INF
Invalid
+0
+0
–0
+0
–0
–0
–0
+0
–0
+0
+INF
+INF
–INF
Invalid
–INF
–INF
+INF
+NORM
+INF
+INF
–INF
–INF
+INF
Invalid
–NORM
+INF
0
Invalid
+INF
–INF
sNaN
INF
–NORM
+INF
qNaN
Invalid
0
–0
–INF
INF
0
+0
+INF
Invalid
–INF
Invalid
+NORM
–INF
+INF
+INF
+INF
–INF
–NORM
0
qNaN
+INF
Invalid
–INF
–INF
Invalid
–INF
0
INF
–INF
Invalid
Invalid
Invalid
!sNaN
!NaN
qNaN
All types
sNaN
sNaN
All types
qNaN
Invalid
Note: Non-normalized values are treated as zero.
Exceptions: Invalid operation
Examples:
FMAC FR0, FR3, FR5  ;Floating point multiply accumulate
                    ;FR0*FR3+FR5->FR5
                    ;Before execution: FR0=H'40000000/*2 in base 10*/
                    ;                  FR3=H'40800000/*4 in base 10*/
                    ;                  FR5=H'3F800000/*1 in base 10*/
                    ;After execution:  FR0=H'40000000/*2 in base 10*/
                    ;                  FR3=H'40800000/*4 in base 10*/
                    ;                  FR5=H'41100000/*9 in base 10*/
FMAC FR0, FR0, FR5  ;FR0*FR0+FR5->FR5
                    ;Before execution: FR0=H'40000000/*2 in base 10*/
                    ;                  FR5=H'3F800000/*1 in base 10*/
                    ;After execution:  FR0=H'40000000/*2 in base 10*/
                    ;                  FR5=H'40A00000/*5 in base 10*/
FMAC FR0, FR5, FR0  ;FR0*FR5+FR0->FR0
                    ;Before execution: FR0=H'40000000/*2 in base 10*/
                    ;                  FR5=H'40A00000/*5 in base 10*/
                    ;After execution:  FR0=H'41400000/*12 in base 10*/
                    ;                  FR5=H'40A00000/*5 in base 10*/
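The three examples can be checked arithmetically with a one-line model in standard C. The function fmac below is an illustrative name; it only reproduces the arithmetic for ordinary values, not the special-case table or the FPSCR cause handling, and a host compiler may or may not fuse the multiply and add, which makes no difference for the small integer values used here.

```c
/* Arithmetic model of FMAC FR0,FRm,FRn: returns FR0 * FRm + FRn. */
float fmac(float fr0, float frm, float frn)
{
    return fr0 * frm + frn;
}
```

In the third example the destination is FR0 itself, so the result 12 replaces the multiplier rather than FR5.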
7.3.10 FMOV (Floating Point Move): Floating Point Instruction
Format                      Abstract             Code               Cycles   T Bit
1. FMOV FRm,FRn             FRm → FRn            1111nnnnmmmm1100   1        —
2. FMOV.S @Rm,FRn           (Rm) → FRn           1111nnnnmmmm1000   1        —
3. FMOV.S FRm,@Rn           FRm → (Rn)           1111nnnnmmmm1010   1        —
4. FMOV.S @Rm+,FRn          (Rm) → FRn,Rm+=4     1111nnnnmmmm1001   1        —
5. FMOV.S FRm,@-Rn          Rn-=4,FRm → (Rn)     1111nnnnmmmm1011   1        —
6. FMOV.S @(R0,Rm),FRn      (R0+Rm) → FRn        1111nnnnmmmm0110   1        —
7. FMOV.S FRm,@(R0,Rn)      FRm → (R0+Rn)        1111nnnnmmmm0111   1        —
Description:
1. Moves the contents of floating point register FRm to floating point register FRn.
2. Loads the contents of the memory address specified by general-use register Rm to floating
point register FRn.
3. Stores the contents of floating point register FRm in the memory address specified by
general-use register Rn.
4. Loads the contents of the memory address specified by general-use register Rm to floating
point register FRn. After the load completes successfully, increments the value of Rm by 4.
5. Stores the contents of floating point register FRm in the memory address specified by
general-use register Rn – 4. After the store completes successfully, the decremented value
(Rn – 4) becomes the value of Rn.
6. Loads the contents of the memory address specified by general-use registers Rm and R0 to
floating point register FRn.
7. Stores the contents of floating point register FRm in the memory address specified by
general-use registers Rn and R0.
181
Operation:
FMOV(float *FRm,*FRn) /* FMOV FRm,FRn */
{
  *FRn = *FRm;
  pc += 2;
}
FMOV_LOAD(long *Rm,float *FRn) /* FMOV.S @Rm,FRn */
{
  load_long(Rm,FRn);
  pc += 2;
}
FMOV_STORE(float *FRm,long *Rn) /* FMOV.S FRm,@Rn */
{
  store_long(FRm,Rn);
  pc += 2;
}
FMOV_RESTORE(long *Rm,float *FRn) /* FMOV.S @Rm+,FRn */
{
  if(load_long(Rm,FRn) != Address_Error)
    *Rm += 4;
  pc += 2;
}
FMOV_SAVE(float *FRm,long *Rn) /* FMOV.S FRm,@-Rn */
{
  long tmp_address = *Rn - 4;
  if(store_long(FRm,&tmp_address) != Address_Error)
    *Rn = tmp_address;
  pc += 2;
}
FMOV_LOAD_index(long *Rm,long *R0,float *FRn) /* FMOV.S @(R0,Rm),FRn */
{
  long tmp_address = *Rm + *R0;
  load_long(&tmp_address,FRn);
  pc += 2;
}
FMOV_STORE_index(float *FRm,long *R0,long *Rn) /* FMOV.S FRm,@(R0,Rn) */
{
  long tmp_address = *Rn + *R0;
  store_long(FRm,&tmp_address);
  pc += 2;
}
Exceptions: Address error
Examples:
FMOV.S @R1, FR2          ;Load
                         ;Before execution: @R1=H'00ABCDEF
                         ;                  FR2=0
                         ;After execution:  @R1=H'00ABCDEF
                         ;                  FR2=H'00ABCDEF
FMOV.S FR2, @R3          ;Store
                         ;Before execution: @R3=0
                         ;                  FR2=H'40800000
                         ;After execution:  @R3=H'40800000
                         ;                  FR2=H'40800000
FMOV.S @R3+,FR3          ;Restore
                         ;Before execution: R3=H'0C700028
                         ;                  @R3=H'40800000
                         ;                  FR3=0
                         ;After execution:  R3=H'0C70002C
                         ;                  FR3=H'40800000
FMOV.S FR4, @-R3         ;Save
                         ;Before execution: R3=H'0C700044
                         ;                  @R3=0
                         ;                  FR4=H'01234567
                         ;After execution:  R3=H'0C700040
                         ;                  @R3=H'01234567
                         ;                  FR4=H'01234567
FMOV.S @(R0, R3), FR4    ;Load with index
                         ;Before execution: R0=H'00000004
                         ;                  R3=H'0C700040
                         ;                  @H'0C700044=H'00ABCDEF
                         ;                  FR4=0
                         ;After execution:  R0=H'00000004
                         ;                  R3=H'0C700040
                         ;                  FR4=H'00ABCDEF
FMOV.S FR5, @(R0,R3)     ;Store with index
                         ;Before execution: R0=H'00000028
                         ;                  R3=H'0C700040
                         ;                  @H'0C700068=0
                         ;                  FR5=H'76543210
                         ;After execution:  R0=H'00000028
                         ;                  R3=H'0C700040
                         ;                  @H'0C700068=H'76543210
FMOV FR5, FR6            ;Register file contents
                         ;Before execution: FR5=H'76543210
                         ;                  FR6=x(don't care)
                         ;After execution:  FR5=H'76543210
                         ;                  FR6=H'76543210
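The post-increment (@Rm+) and pre-decrement (@-Rn) forms update the address register only when the memory access succeeds. The sketch below illustrates that ordering on a small simulated memory. It is a hypothetical model: the `Memory` type, its 16-word size, the alignment check and the representation of floating point registers as raw 32-bit patterns are all assumptions made for illustration.

```c
#include <stdint.h>

#define MEM_WORDS 16u

/* Hypothetical flat memory of 32-bit words; addresses are byte offsets. */
typedef struct {
    uint32_t word[MEM_WORDS];
} Memory;

/* Returns 0 on success, -1 on an address error (misaligned or out of range). */
static int load_long(const Memory *m, uint32_t addr, uint32_t *dst)
{
    if ((addr & 3u) != 0u || addr / 4u >= MEM_WORDS) return -1;
    *dst = m->word[addr / 4u];
    return 0;
}

static int store_long(Memory *m, uint32_t addr, uint32_t src)
{
    if ((addr & 3u) != 0u || addr / 4u >= MEM_WORDS) return -1;
    m->word[addr / 4u] = src;
    return 0;
}

/* FMOV.S @Rm+,FRn: load, then add 4 to Rm only if the load succeeded. */
static void fmov_restore(const Memory *m, uint32_t *rm, uint32_t *frn)
{
    if (load_long(m, *rm, frn) == 0) *rm += 4u;
}

/* FMOV.S FRm,@-Rn: store at Rn-4, then update Rn only if the store succeeded. */
static void fmov_save(Memory *m, uint32_t frm, uint32_t *rn)
{
    uint32_t tmp_address = *rn - 4u;
    if (store_long(m, tmp_address, frm) == 0) *rn = tmp_address;
}
```

Computing the decremented address in a temporary, as the Save operation does, keeps Rn intact when the store raises an address error.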
7.3.11 FMUL (Floating Point Multiply): Floating Point Instruction

Format           Abstract           Code               Cycles  T Bit
FMUL FRm,FRn     FRn × FRm → FRn    1111nnnnmmmm0010   1       —
Description: Arithmetically multiplies (as floating point numbers) the contents of floating point
registers FRm and FRn. The calculation result is stored in FRn.
Operation:
FMUL(float *FRm,*FRn) /* FMUL FRm,FRn */
{
  clear_cause_VZ();
  if((data_type_of(FRm) == sNaN) ||
     (data_type_of(FRn) == sNaN)) invalid(FRn);
  else if((data_type_of(FRm) == qNaN) ||
          (data_type_of(FRn) == qNaN)) qnan(FRn);
  else case(data_type_of(FRm))
  {
    NORM :
      case(data_type_of(FRn))
      {
        PINF :
        NINF : inf(FRn,sign_of(FRm)^sign_of(FRn)); break;
        default: *FRn=(*FRn)*(*FRm); break;
      }
      break;
    PZERO :
    NZERO :
      case(data_type_of(FRn))
      {
        PINF :
        NINF : invalid(FRn); break;
        default: zero(FRn,sign_of(FRm)^sign_of(FRn)); break;
      }
      break;
    PINF :
    NINF :
      case(data_type_of(FRn))
      {
        PZERO :
        NZERO : invalid(FRn); break;
        default: inf(FRn,sign_of(FRm)^sign_of(FRn)); break;
      }
      break;
  }
  pc += 2;
}
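FMUL Special Cases
                                    FRn
FRm      NORM     +0       –0       +INF     –INF     qNaN     sNaN
NORM     MUL      0        0        INF      INF      qNaN     Invalid
+0       0        +0       –0       Invalid  Invalid  qNaN     Invalid
–0       0        –0       +0       Invalid  Invalid  qNaN     Invalid
+INF     INF      Invalid  Invalid  +INF     –INF     qNaN     Invalid
–INF     INF      Invalid  Invalid  –INF     +INF     qNaN     Invalid
qNaN     qNaN     qNaN     qNaN     qNaN     qNaN     qNaN     Invalid
sNaN     Invalid  Invalid  Invalid  Invalid  Invalid  Invalid  Invalid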
Note: Non-normalized values are treated as zero.
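The zero, infinity and non-normalized cases can be expressed directly on the 32-bit patterns. The sketch below is a hypothetical illustration, not the manual's implementation: the helper names are invented, NaN handling is left out, and a non-normalized input is taken to mean any value whose exponent field is zero.

```c
#include <stdint.h>

#define EXP_MASK  0x7F800000u
#define SIGN_MASK 0x80000000u

/* True for +0, -0 and non-normalized values (exponent field is zero). */
static int is_zero_or_denormal(uint32_t b) { return (b & EXP_MASK) == 0u; }

/* True for +INF and -INF (maximum exponent, zero fraction). */
static int is_inf(uint32_t b) { return (b & ~SIGN_MASK) == EXP_MASK; }

/* Non-normalized inputs are treated as zero: keep the sign, clear the rest. */
static uint32_t flush_denormal(uint32_t b)
{
    return is_zero_or_denormal(b) ? (b & SIGN_MASK) : b;
}

/* 1 when the operands hit the 0 x INF invalid-operation case. */
static int fmul_zero_times_inf(uint32_t frm, uint32_t frn)
{
    return (is_zero_or_denormal(frm) && is_inf(frn)) ||
           (is_inf(frm) && is_zero_or_denormal(frn));
}

/* Sign of a zero or infinite product: sign_of(FRm) ^ sign_of(FRn). */
static uint32_t product_sign(uint32_t frm, uint32_t frn)
{
    return (frm ^ frn) & SIGN_MASK;
}
```

Because the sign of a zero or infinite result is the exclusive OR of the operand signs, –0 × +INF is rejected as invalid while –INF × –INF yields +INF.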
Exceptions: Invalid operation
Examples:
FMUL FR2, FR3            ;Floating point multiply
                         ;Before execution: FR2=H'40000000/*2 in base 10*/
                         ;                  FR3=H'40800000/*4 in base 10*/
                         ;After execution:  FR2=H'40000000
                         ;                  FR3=H'41000000/*8 in base 10*/
7.3.12 FNEG (Floating Point Negate): Floating Point Instruction

Format       Abstract       Code               Cycles  T Bit
FNEG FRn     -FRn → FRn     1111nnnn01001101   1       —
Description: Arithmetically negates (as a floating point number) the contents of floating point
register FRn. The calculation result is stored in FRn.
Operation:
FNEG(float *FRn) /* FNEG FRn */
{
  clear_cause_VZ();
  case(data_type_of(FRn))
  {
    qNaN    : qnan(FRn); break;
    sNaN    : invalid(FRn); break;
    default : *FRn = -(*FRn); break;
  }
  pc += 2;
}
FNEG Special Cases
FRn         NORM   +0   –0   +INF   –INF   qNaN   sNaN
FNEG(FRn)   NEG    –0   +0   –INF   +INF   qNaN   Invalid
Note: Non-normalized values are treated as zero.
Exceptions: Invalid operation
Examples:
FNEG FR2                 ;Floating point negate
                         ;Before execution: FR2=H'40800000/*4 in base 10*/
                         ;After execution:  FR2=H'C0800000/*–4 in base 10*/
7.3.13 FSTS (Floating Point Store From System Register): Floating Point Instruction

Format            Abstract       Code               Cycles  T Bit
FSTS FPUL,FRn     FPUL → FRn     1111nnnn00001101   1       —
Description: Copies the contents of system register FPUL to floating point register FRn.
Operation:
FSTS(float *FRn,*FPUL)
/* FSTS FPUL,FRn */
{
*FRn = *FPUL;
pc += 2;
}
Exceptions: None
Examples:
MOV.L #H'00000002, R2
FLDI0 FR5
LDS   R2,FPUL
FSTS  FPUL, FR5
;Before execution of FSTS instruction: R2=H'00000002
;                                      FR5=0
;After execution of FSTS instruction:  R2=H'00000002
;                                      FR5=H'00000002
7.3.14 FSUB (Floating Point Subtract): Floating Point Instruction

Format            Abstract          Code               Cycles  T Bit
FSUB FRm,FRn      FRn-FRm → FRn     1111nnnnmmmm0001   1       —
Description: Arithmetically subtracts (as floating point numbers) the contents of floating point
register FRm from contents of floating point register FRn. The calculation result is stored in FRn.
Operation:
FSUB(float *FRm,*FRn) /* FSUB FRm,FRn */
{
  clear_cause_VZ();
  if((data_type_of(FRm) == sNaN) ||
     (data_type_of(FRn) == sNaN)) invalid(FRn);
  else if((data_type_of(FRm) == qNaN) ||
          (data_type_of(FRn) == qNaN)) qnan(FRn);
  else case(data_type_of(FRm))
  {
    NORM :
      case(data_type_of(FRn))
      {
        PINF    : inf(FRn,0); break;
        NINF    : inf(FRn,1); break;
        default : *FRn = *FRn - *FRm; break;
      }
      break;
    PZERO :
      case(data_type_of(FRn))
      {
        NORM  : *FRn = *FRn - *FRm; break;
        PZERO : zero(FRn,0); break;
        NZERO : zero(FRn,1); break;
        PINF  : inf(FRn,0); break;
        NINF  : inf(FRn,1); break;
      }
      break;
    NZERO :
      case(data_type_of(FRn))
      {
        NORM  : *FRn = *FRn - *FRm; break;
        PZERO :
        NZERO : zero(FRn,0); break;
        PINF  : inf(FRn,0); break;
        NINF  : inf(FRn,1); break;
      }
      break;
    PINF :
      case(data_type_of(FRn))
      {
        NINF    : invalid(FRn); break;
        default : inf(FRn,1); break;
      }
      break;
    NINF :
      case(data_type_of(FRn))
      {
        PINF    : invalid(FRn); break;
        default : inf(FRn,0); break;
      }
      break;
  }
  pc += 2;
}
FSUB Special Cases
FRm
FRn
NORM
NORM
+0
–0
SUB
+0
+INF
–INF
+INF
–INF
qNaN
–0
–0
+0
+INF
–INF
–INF
+INF
Invalid
Invalid
qNaN
qNaN
sNaN
Invalid
Note: Non-normalized values are treated as zero.
Exceptions: Invalid operation
Examples:
FSUB FR0, FR3            ;Floating point subtract
                         ;Before execution: FR0=H'3F800000/*1 in base 10*/
                         ;                  FR3=H'40E00000/*7 in base 10*/
                         ;After execution:  FR0=H'3F800000/*1 in base 10*/
                         ;                  FR3=H'40C00000/*6 in base 10*/
FSUB FR3, FR2
                         ;Before execution: FR2=H'40800000/*4 in base 10*/
                         ;                  FR3=H'40C00000/*6 in base 10*/
                         ;After execution:  FR2=H'C0000000/*–2 in base 10*/
                         ;                  FR3=H'40C00000/*6 in base 10*/
7.3.15 FTRC (Floating Point Truncate And Convert To Integer): Floating Point Instruction

Format            Abstract             Code               Cycles  T Bit
FTRC FRm,FPUL     (long)FRm → FPUL     1111nnnn00111101   1       —
Description: Interprets the contents of floating point register FRm as a floating point number and
converts it to an integer by truncating everything after the decimal point. The calculation result is
stored in FPUL.
Operation:
#define N_INT_RANGE 0xCF000000
/* 01.000000 * 2^16 */
#define P_INT_RANGE 0x47FFFFFF
/* 1.fffffe * 2^30 */
FTRC(float *FRm,int *FPUL) /* FTRC FRm,FPUL */
{
  clear_cause_VZ();
  case(ftrc_type_of(FRm))
  {
    NORM : *FPUL = (long)(*FRm); break;
    PINF : ftrc_invalid(FPUL,0); break;
    NINF : ftrc_invalid(FPUL,1); break;
  }
  pc += 2;
}
int ftrc_type_of(long *src)
{
  long abs;
  abs = *src & 0x7FFFFFFF;
  if(sign_of(src) == 0)
  {
    if(abs > 0x7F800000) return(NINF);        /* NaN */
    else if(abs > P_INT_RANGE) return(PINF);  /* out of range,+INF */
    else return(NORM);                        /* +0,+NORM */
  }
  else
  {
    if(*src > N_INT_RANGE) return(NINF);      /* out of range,-INF,NaN */
    else return(NORM);                        /* -0,-NORM */
  }
}
ftrc_invalid(long *dest,int sign)
{
  set_V();
  if((FPSCR & ENABLE_V) == 0) {
    if(sign == 0) *dest = 0x7FFFFFFF;
    else          *dest = 0x80000000;
  }
}
FTRC Special Cases
FRm                       FTRC(FRm)
NORM                      TRC
+0                        0
–0                        0
Positive out of range     Invalid, 7FFFFFFF
Negative out of range     Invalid, 80000000
+INF                      Invalid, +MAX
–INF                      Invalid, –MAX
qNaN                      Invalid, –MAX
sNaN                      Invalid, –MAX
Note: Non-normalized values are treated as zero.
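The saturating behaviour described above can be sketched in portable C. This is a hypothetical model for the case where the invalid-operation exception is disabled. The function name `ftrc_model` is invented, and the range limits are assumed to be the full signed 32-bit range rather than being taken from the N_INT_RANGE and P_INT_RANGE definitions.

```c
#include <stdint.h>

/* Hypothetical model of FTRC with the invalid-operation exception disabled.
   Assumption: the valid range is the full int32_t range.
   NaN and negative out-of-range inputs give H'80000000,
   positive out-of-range inputs give H'7FFFFFFF. */
static int32_t ftrc_model(float frm)
{
    if (frm != frm)            return INT32_MIN;  /* NaN compares unequal to itself */
    if (frm >= 2147483648.0f)  return INT32_MAX;  /* too large, including +INF */
    if (frm < -2147483648.0f)  return INT32_MIN;  /* too small, including -INF */
    return (int32_t)frm;                          /* C casts truncate toward zero */
}
```

Checking the range before the cast matters in C, because converting an out-of-range `float` to an integer type is undefined behaviour.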
Exceptions: Invalid operation
Examples:
MOV.L #H'402ED9EB, R2
LDS   R2, FPUL
FSTS  FPUL, FR6          ;FR6=H'402ED9EB/*2.7320 in base 10*/
FTRC  FR6, FPUL
STS   FPUL, R2           ;R2=H'00000002/*2 in base 10*/
;Before execution of FTRC and STS: R2=H'402ED9EB
;                                  FR6=H'402ED9EB
;After execution of FTRC and STS:  R2=H'00000002
;                                  FR6=H'402ED9EB
7.3.16 LDS (Load to System Register): FPU Related CPU Instruction

Format                   Abstract                Code               Cycles  T Bit
1. LDS Rm,FPUL           Rm → FPUL               0100nnnn01011010   1       —
2. LDS.L @Rm+,FPUL       (Rm) → FPUL,Rm+=4       0100nnnn01010110   1       —
3. LDS Rm,FPSCR          Rm → FPSCR              0100nnnn01101010   1       —
4. LDS.L @Rm+,FPSCR      (Rm) → FPSCR,Rm+=4      0100nnnn01100110   1       —
Description:
1. Moves the contents of general-use register Rm to system register FPUL.
2. Loads the contents of the memory address specified by general-use register Rm to system
register FPUL. After the load completes successfully, increments the value of Rm by 4.
3. Moves the contents of general-use register Rm to system register FPSCR. Previously defined
bits in FPSCR are not changed.
4. Loads the contents of the memory address specified by general-use register Rm to system
register FPSCR. After the load completes successfully, increments the value of Rm by 4.
Previously defined bits in FPSCR are not changed.
Operation:
#define FPSCR_MASK 0x00018C60
LDS(long *Rm,*FPUL) /* LDS Rm,FPUL */
{
  *FPUL = *Rm;
  pc += 2;
}
LDS_RESTORE(long *Rm,*FPUL) /* LDS.L @Rm+,FPUL */
{
  if(load_long(Rm,FPUL) != Address_Error) *Rm += 4;
  pc += 2;
}
LDS(long *Rm,*FPSCR) /* LDS Rm,FPSCR */
{
  *FPSCR = *Rm & FPSCR_MASK;
  pc += 2;
}
LDS_RESTORE(long *Rm,*FPSCR) /* LDS.L @Rm+,FPSCR */
{
  long tmp_FPSCR;
  if(load_long(Rm,&tmp_FPSCR) != Address_Error){
    *FPSCR = tmp_FPSCR & FPSCR_MASK;
    *Rm += 4;
  }
  pc += 2;
}
Exceptions: Address error
Examples:
• LDS
Example 1
MOV.L #H'12345678, R2
FLDI0 FR3
LDS   R2, FPUL
FSTS  FPUL, FR3
;Before execution of LDS and FSTS instructions: R2=H'12345678
;                                               FR3=0
;After execution of LDS and FSTS instructions:  R2=H'12345678
;                                               FR3=H'12345678
Example 2
MOV.L #H'00040801, R4
LDS   R4, FPSCR
;After execution of LDS instruction: FPSCR=00040801
• LDS.L
Example 1
FLDI0 FR0
MOV.L #H'87654321, R4
MOV.L #H'0C700128, R8
MOV.L R4,@R8
LDS.L @R8+, FPUL
FSTS  FPUL, FR0
;Before execution of LDS.L and FSTS instructions: FR0=0
;                                                 R8=0C700128
;After execution of LDS.L and FSTS instructions:  FR0=87654321
;                                                 R8=0C70012C
Example 2
MOV.L #H'00040C01, R4
MOV.L #H'0C700134, R8
MOV.L R4,@R8
LDS.L @R8+, FPSCR
;Before execution of LDS.L instruction: R8=0C700134
;After execution of LDS.L instruction:  R8=0C700138
;                                       FPSCR=00040C01
7.3.17 STS (Store from FPU System Register): FPU Related CPU Instruction

Format                   Abstract                   Code               Cycles  T Bit
1. STS FPUL,Rn           FPUL → Rn                  0000nnnn01011010   1       —
2. STS.L FPUL,@-Rn       Rn -= 4,FPUL → @(Rn)       0100nnnn01010010   1       —
3. STS FPSCR,Rn          FPSCR → Rn                 0000nnnn01101010   1       —
4. STS.L FPSCR,@-Rn      Rn -= 4,FPSCR → @(Rn)      0100nnnn01100010   1       —
Description:
1. Moves the contents of system register FPUL to general-use register Rn.
2. Stores the contents of system register FPUL at the memory address specified by
general-use register Rn-4. After the store completes successfully, the decremented value
becomes the value of Rn.
3. Moves the contents of system register FPSCR to general-use register Rn.
4. Stores the contents of system register FPSCR at the memory address specified by
general-use register Rn-4. After the store completes successfully, the decremented value
becomes the value of Rn.
Operation:
STS(long *FPUL,*Rn) /* STS FPUL,Rn */
{
  *Rn = *FPUL;
  pc += 2;
}
STS_SAVE(long *FPUL,*Rn) /* STS.L FPUL,@-Rn */
{
  long tmp_address = *Rn - 4;
  if(store_long(FPUL,&tmp_address) != Address_Error)
    *Rn = tmp_address;
  pc += 2;
}
STS(long *FPSCR,*Rn) /* STS FPSCR,Rn */
{
  *Rn = *FPSCR;
  pc += 2;
}
STS_RESTORE(long *FPSCR,*Rn) /* STS.L FPSCR,@-Rn */
{
  long tmp_address = *Rn - 4;
  if(store_long(FPSCR,&tmp_address) != Address_Error)
    *Rn = tmp_address;
  pc += 2;
}
Exceptions: Address error
Examples:
• STS
Example 1
MOV.L #H'12ABCDEF, R12
LDS   R12, FPUL
STS   FPUL, R13
;After execution of STS instruction: R13 = 12ABCDEF
Example 2
STS   FPSCR, R2
;After execution of STS instruction: Contents of FPSCR at that point stored in R2 register
• STS.L
Example 1
MOV.L #H'0C700148, R7
STS.L FPUL, @-R7
;Before execution of STS.L instruction: R7 = H'0C700148
;After execution of STS.L instruction:  R7 = H'0C700144, contents of FPUL saved at
;                                       address H'0C700144
Example 2
MOV.L #H'0C700154, R8
STS.L FPSCR, @-R8
;After execution of STS.L instruction: Contents of FPSCR saved at address H'0C700150
Section 8 Pipeline Operation
This section describes the operation of the pipelines for each instruction. This information is
provided to allow calculation of the required number of CPU instruction execution states (system
clock cycles).
8.1
Basic Configuration of Pipelines
The Five-Stage Pipeline: Pipelines are composed of the following five stages:
• IF (Instruction fetch)
Fetches instruction from the memory where the program is stored.
• ID (Instruction decode)
Decodes the instruction fetched.
• EX (Instruction execution)
Does data operations and address calculations according to the results of decoding.
• MA (Memory access)
Accesses data in memory. Generated by instructions that involve memory access, with some
exceptions.
• WB (Write back)
Returns the results of the memory access (data) to a register. Generated by instructions that
involve memory loads, with some exceptions.
These stages flow with the execution of the instructions and thereby constitute a pipeline. At a
given instant, five instructions are being executed simultaneously. The basic pipeline flow is as
shown in figure 8.1. The period in which a single stage is operating is called a slot and is indicated
by two-way arrows (←→).
All instructions have at least the 3 stages IF, ID and EX, but not all have stages MA and WB. The
way the pipeline flows also varies with the type of instruction, with some having two MA stages,
some accessing the FPU (mm), and so on. Finally, conflicts can occur, for example between IF
and MA. When such a conflict occurs, the pipeline flow changes.
[Pipeline diagram: instructions 1 to 6 of the instruction stream each pass through IF, ID, EX, MA and WB, with each instruction starting one slot after the previous one. Horizontal axis: time.]
Figure 8.1 Basic Structure of Pipeline Flow
FPU Pipeline: The durations of the stages in the FPU pipeline are the same as those of the stages
in the CPU pipeline. In both pipelines, the first stage is instruction fetch (IF). The FPU pipeline
also has the following four additional stages:
• DF (Decode FPU)
Decodes the fetched instruction.
• E1 (FPU execution stage 1)
Initializes the floating-point operation.
• E2 (FPU execution stage 2)
Completes the floating-point operation.
• SF (Store FPU)
Stores the result in the FPU register.
All instructions pass through both the CPU and the FPU pipelines. Depending on the instruction,
operations are performed either by the CPU pipeline alone or by both pipelines.
In the case of floating-point instructions and FPU-related CPU instructions, the FPU pipeline and
CPU pipeline operate simultaneously in parallel.
In the case of instructions involving the CPU only, the FPU pipeline does not operate; only the
CPU pipeline operates.
Refer to 8.8 Instruction Pipeline Operation for details.
8.2
Slot and Pipeline Flow
The time period in which a single stage operates is called a slot. Slots must follow the rules
described below.
Instruction Execution: Each stage (IF, ID, EX, MA, WB) of an instruction must be executed in
one slot. Two or more stages cannot be executed within one slot (figure 8.2), with the exception of
WB and MA. Since WB is executed immediately after MA, some instructions may
execute MA and WB within the same slot.
[Pipeline diagram: instructions 1 and 2, with the invalid slot marked X.]
Note: ID and EX of instruction 1 are executed in the same slot.
Figure 8.2 Impossible Pipeline Flow 1
Slot Sharing: A maximum of one stage from another instruction may be set per slot, and that
stage must be different from the stage of the first instruction. Identical stages from two different
instructions may never be executed within the same slot (figure 8.3).
[Pipeline diagram: instructions 1 to 5, with the invalid slot marked X.]
Note: Same stage of another instruction is being executed in same slot.
Figure 8.3 Impossible Pipeline Flow 2
Slot Length: The number of states (system clock cycles) S for the execution of one slot is
calculated with the following conditions:
• S = (the cycles of the stage with the highest number of cycles of all instruction stages
contained in the slot). This means that the instruction with the longest stage stalls others with
shorter stages.
• The number of execution cycles for each stage:
  IF   The number of memory access cycles for instruction fetch
  ID   Always one cycle
  EX   Always one cycle
  MA   The number of memory access cycles for data access
  WB   Always one cycle
As an example, figure 8.4 shows the flow of a pipeline in which the IF (memory access for
instruction fetch) of instructions 1 and 2 are two cycles, the MA (memory access for data access)
of instruction 1 is three cycles and all others are one cycle. The dashes indicate the instruction is
being stalled.
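The slot length rule amounts to taking a maximum over the stages that share the slot. A minimal sketch follows; the function name `slot_cycles` and the array representation of a slot are assumptions made for illustration.

```c
/* Cycles needed by one slot: the longest stage in the slot sets the pace.
   stage_cycles holds the cycle count of each stage sharing the slot. */
static unsigned slot_cycles(const unsigned *stage_cycles, unsigned n)
{
    unsigned longest = 0;
    for (unsigned i = 0; i < n; i++) {
        if (stage_cycles[i] > longest) {
            longest = stage_cycles[i];
        }
    }
    return longest;
}
```

For a slot holding a three-cycle MA next to one-cycle ID and EX stages, the function returns 3, so the shorter stages are stalled for two cycles.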
[Pipeline diagram: instructions 1 and 2, with the number of cycles marked above each slot: (2), (2), (1), (3), (1), (1). Dashes mark stalled stages.]
Figure 8.4 Slots Requiring Multiple Cycles
8.3
Number of Instruction Execution Cycles
The number of instruction execution cycles is counted as the interval between execution of EX
stages. The number of cycles between the start of the EX stage for instruction 1 and the start of the
EX stage for the following instruction (instruction 2) is the execution time for instruction 1.
For example, in a pipeline flow like that shown in figure 8.5, the EX stage interval between
instructions 1 and 2 is five cycles, so the execution time for instruction 1 is five cycles. Since the
interval between EX stages for instructions 2 and 3 is one cycle, the execution time of instruction
2 is one cycle.
If a program ends with instruction 3, the execution time for instruction 3 should be calculated as
the interval between the EX stage of instruction 3 and the EX stage of a hypothetical instruction 4,
using a MOV Rm, Rn that follows instruction 3. (In figure 8.5, the execution time of instruction 3
would thus be one cycle.) In this example, the MA of instruction 1 and the IF of instruction 4 are
in contention. For operation during the contention between the MA and IF, see section 8.4,
Contention between Instruction Fetch (IF) and Memory Access (MA).
The total execution time for instructions 1 through 3 in figure 8.5 is seven cycles (5 + 1 + 1).
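The counting method can be sketched as differences between EX start times. The array layout and function names below are hypothetical, and the absolute start cycle used in the usage note is an arbitrary assumption, since only the intervals matter.

```c
/* ex_start[i] is the cycle in which instruction i enters its EX stage.
   The array needs one extra entry for the instruction that follows the
   last one counted (for example the hypothetical trailing MOV Rm,Rn). */
static unsigned execution_time(const unsigned *ex_start, unsigned i)
{
    return ex_start[i + 1] - ex_start[i];
}

/* Sum of the execution times of instructions 0 .. count-1. */
static unsigned total_time(const unsigned *ex_start, unsigned count)
{
    unsigned total = 0;
    for (unsigned i = 0; i < count; i++) {
        total += execution_time(ex_start, i);
    }
    return total;
}
```

With EX stages starting at cycles 4, 9, 10 and 11, the three instructions take 5, 1 and 1 cycles, for a total of seven cycles.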
[Pipeline diagram: instructions 1 to 3 followed by a hypothetical instruction 4 (MOV Rm, Rn), with the number of cycles marked above each slot: (2), (2), (4), (2), (1), (1).]
Figure 8.5 Method for Counting Instruction Execution Cycles
8.4
Contention between Instruction Fetch (IF) and Memory Access (MA)
Basic Operation when IF and MA Are in Contention: The IF and MA stages both access
memory, so they cannot operate simultaneously. When the IF and MA stages both try to access
memory within the same slot, the slot splits as shown in figure 8.6. When there is a WB, it is
executed immediately after the MA ends.
[Pipeline diagram, upper part: instructions 1 to 5 across slots A to G. The MA of instruction 1 and the IF of instruction 4 contend at D; the MA of instruction 2 and the IF of instruction 5 contend at E.]
When MA and IF are in contention, the following occurs:
[Pipeline diagram, lower part: the same instructions across slots A to G, with slot D split and slot E split.]
Figure 8.6 Operation when IF and MA Are in Contention
The slots in which MA and IF contend are split into two cycles. MA is given priority to execute in
the first half (when there is a WB, it immediately follows the MA), and the EX, ID, and IF are
executed simultaneously in the latter half. For example, in figure 8.6 the MA of instruction 1 is
executed in slot D while the EX of instruction 2, the ID of instruction 3 and IF of instruction 4 are
executed simultaneously thereafter. In slot E, the MA of instruction 2 is given priority and the EX
of instruction 3, the ID of instruction 4 and the IF of instruction 5 are executed thereafter.
The number of cycles for a slot in which MA and IF are in contention is the sum of the number of
memory access cycles for the MA and the number of memory access cycles for the IF.
Relationship between Locations of Instructions in Memory and IF Stages: The SH-2E
accesses instructions in memory in the 32-bit mode. Since all of the SH-2E instructions have a
fixed length of 16 bits, it is basically possible to access two instructions per IF stage. Whether the
IF fetches one instruction or two depends on where in memory the instruction(s) are located
(word/longword boundary).
If an instruction is located at a longword boundary, it is possible to fetch two instructions using a
single IF operation. This means that the IF for the next instruction does not generate a separate bus
cycle in order to fetch the instruction. In addition, the IF for the instruction after that fetches two
instructions, and therefore the IF for the instruction which follows again generates no bus cycle.
In other words, IF stages for instructions located in memory at longword boundaries (instructions
for which the bottom two address bits are 00: A1 = 0, A0 = 0) actually fetch two instructions.
Therefore no bus cycle is generated by the IF for the following instruction. These instruction
fetches that do not generate bus cycles are indicated in lower case as "if" rather than IF. An "if" is
always one cycle.
On the other hand, if due to branching or the like an instruction at a word boundary (instructions
for which the bottom two address bits are 10: A1 = 1, A0 = 0) is fetched, only one instruction can
be fetched in the IF bus cycle. Consequently, the IF for the next instruction generates a bus cycle.
Then two instructions are fetched from the subsequent IF onward. Figure 8.7 illustrates the
operations described above.
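The alignment rule can be sketched as a small classifier that marks which fetches generate a bus cycle. It is a hypothetical illustration that assumes straight-line code with no branches after the first instruction; the function name `mark_fetches` and the output array are invented for the example.

```c
#include <stdint.h>

/* Marks which of `count` sequential 16-bit instructions, starting at
   start_addr, generate a bus cycle (IF, stored as 1) and which are
   satisfied by the previous 32-bit fetch ("if", stored as 0). */
static void mark_fetches(uint32_t start_addr, unsigned count, int *is_bus_cycle)
{
    for (unsigned i = 0; i < count; i++) {
        uint32_t addr = start_addr + 2u * i;
        /* The first fetch always needs a bus cycle. After that, only
           instructions at a longword boundary (A1 = 0, A0 = 0) do. */
        is_bus_cycle[i] = (i == 0u) || ((addr & 3u) == 0u);
    }
}
```

Starting at a longword boundary gives the pattern IF, if, IF, if. Starting at a word boundary gives IF, IF, if, IF, because the first fetch delivers only one instruction.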
[Pipeline diagram with memory map: 32-bit wide on-chip ROM/RAM or on-chip cache holding instructions 1 to 6 in pairs. The legend distinguishes slots, fetches that generate a bus cycle (IF) and fetches that generate no bus cycle (if).]
(a) Fetches Beginning with an Instruction (Instruction 1) Located at a Long Word Boundary
(b) Fetches Beginning with an Instruction (Instruction 2) Located at a Word Boundary
Figure 8.7 Relationship between Locations of Instructions in Memory and IF Stages
Relationship between Position of Instructions Located in On-Chip Memory and Contention
between IF and MA: When an instruction is located in on-chip memory, there are instruction
fetch stages (“if”, written in lower case) that do not generate bus cycles. When an if is in
contention with an MA, the slot will not split, as it does when an IF and an MA are in contention,
because ifs and MAs can be executed simultaneously. Such slots execute in the number of cycles
the MA requires for memory access. This is illustrated in Figure 8.8.
When programming, avoid contention of MA and IF whenever possible and pair MAs with ifs to
increase the instruction execution speed. In other words, if an instruction with a four (five) stage
pipeline consisting of IF, ID, EX, MA, (WB) is located at a memory longword boundary (the
instruction's bottom two address bits are 00: A1 = 0, A0 = 0), the MA stage uses the same slot as
the if following it, so no stall occurs.
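The two cases, a split slot for MA with IF and an unsplit slot for MA with if, can be summarized in one helper. The function name and parameters below are hypothetical and only restate the rules given in this section.

```c
/* Cycles for a slot that holds both an MA and an instruction fetch.
   fetch_uses_bus is nonzero for IF (a bus cycle is generated) and zero
   for "if" (no bus cycle, so the slot does not split). */
static unsigned ma_fetch_slot_cycles(unsigned ma_cycles,
                                     unsigned if_cycles,
                                     int fetch_uses_bus)
{
    /* Split slot: MA runs first, then the fetch. Otherwise MA alone. */
    return fetch_uses_bus ? ma_cycles + if_cycles : ma_cycles;
}
```

A one-cycle MA paired with a one-cycle IF costs two cycles, while the same MA paired with an if costs one, which is why placing memory-access instructions at longword boundaries helps.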
[Pipeline diagram with memory map: 32-bit wide memory holding instructions 1 to 6 in pairs. Slot A is marked as not split and slot B as split.]
Note: In slot A there is contention between MA and if, so there is no split. In slot B there is contention
between MA and IF, resulting in a split.
Figure 8.8 Relationship between Position of Instructions Located in On-chip Memory and
Contention between IF and MA
8.5
Effects of Memory Load Instructions on the Pipeline
Instructions that involve loading from memory return data to the destination register during the
WB stage, which comes at the end of the pipeline. The WB stage of such a load instruction (load
instruction 1) will thus not have ended before the EX stage of the instruction that
immediately follows it (instruction 2) begins.
When instruction 2 uses the destination register of load instruction 1, the contents of that
register will not be ready, so any slot containing the MA of instruction 1 and EX of instruction 2
will split. The slot still splits when the destination register of load instruction 1 is the destination,
rather than the source, of instruction 2.
When the destination of load instruction 1 is the status register (SR) and the flag in it is fetched by
instruction 2 (as ADDC does), a split occurs. No split occurs, however, in the following cases:
• When instruction 2 is a load instruction and its destination is the same as that of load
instruction 1
• When instruction 2 is MAC @Rm+,@Rn+ and the destinations of Rm and load instruction 1
were the same
The number of cycles in the slot generated by the split is the number of MA cycles plus the
number of IF (or if) cycles, as shown in figure 8.9. This means the execution speed will be
lowered if the instruction that will use the results of the load instruction is placed immediately
after the load instruction. The instruction that uses the result of the load instruction will not slow
down the program if placed one or more instructions after the load instruction.
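A scheduler or a hand check can flag the load-use case with a simple comparison of register numbers. The sketch below is a simplified, hypothetical model: the `Insn` record is invented, and the status register case and the MAC @Rm+,@Rn+ exception are deliberately left out.

```c
/* Minimal, hypothetical instruction record. Register fields hold a
   general register number 0..15, or -1 when the field is unused. */
typedef struct {
    int is_load;  /* nonzero for a memory load */
    int dest;     /* destination register */
    int src1;     /* first source register */
    int src2;     /* second source register */
} Insn;

/* Nonzero when `next`, placed immediately after load `prev`, touches the
   load's destination register and therefore causes the slot to split.
   A following load into the same destination does not split. */
static int load_use_split(const Insn *prev, const Insn *next)
{
    if (!prev->is_load || prev->dest < 0) return 0;
    if (next->src1 == prev->dest || next->src2 == prev->dest) return 1;
    return next->dest == prev->dest && !next->is_load;
}
```

For MOV.W @R0,R1 followed by ADD R1,R2 the check reports a split. Moving an unrelated instruction between the two removes it.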
[Pipeline diagram: load instruction 1 (MOV.W @R0,R1), instruction 2 (ADD R1,R2), instruction 3 and instruction 4. The EX of instruction 2 is stalled for one slot while the load completes.]
Figure 8.9 Effects of Memory Load Instructions on the Pipeline (1)
8.6
FPU Contention
In addition to the LDS and STS instructions, which move data between the CPU and FPU, loading
and storing floating point numbers also uses the MA stage of the pipeline. Consequently, such
instructions create contention with the IF stage.
If the register (FR0 to FR15, FPUL) to which the result of a floating point arithmetic calculation
instruction, the FMOV instruction, or a floating point number load instruction is stored is read
(used as the source register) by the next instruction, the execution of this instruction (the next
instruction) is delayed by one slot cycle (Figure 8.10).
Floating point arithmetic calculation instruction (FADD FR1, FR2):   IF  ID  E1  E2  SF
Next floating point instruction (FMOV FR2, FR3):                         IF  DF  —   E1  E2  SF
Figure 8.10 FPU Contention 1
If the LDS or LDS.L instruction is used to change the value of FPSCR, the execution of the next
instruction is delayed by two slot cycles (Figure 8.11).
Instruction 1 (LDS R2, FPSCR):    IF  ID  E1  E2  SF
Instruction 2 (FADD FR4, FR5):        IF  DF  —   —   E1  E2  SF
Figure 8.11 FPU Contention 2
If the STS or STS.L instruction is used to read the value of FPSCR, its execution is delayed by
two slot cycles (Figure 8.12).
Instruction 1 (FADD FR6, FR9):    IF  ID  E1  E2  SF
Instruction 2 (STS FPSCR, R3):        IF  DF  —   —   E1  E2  SF
Figure 8.12 FPU Contention 3
The FDIV instruction requires 13 cycles in the E1 stage. During this period, no other floating point
instruction or FPU-related CPU instruction may enter the E1 stage. If another floating point
instruction or FPU-related CPU instruction is encountered before the FDIV instruction has
finished using the E1 stage, the fixed slot duration for the execution of that instruction is delayed,
and the instruction enters the E1 stage only after the FDIV instruction has finished using the SF
stage (Figure 8.13).
Instruction 1 (FDIV FR6, FR7):               IF  ID  E1  ...  E1  E2  SF
Floating point instruction (FMOV FR8, FR10):     IF  DF  ...  ...  ...  ...  E1  E2  SF
Figure 8.13 FPU Contention 4
8.7
Programming Guide
When writing programs, follow the guidelines below in order to increase instruction execution
speed.
• Instructions with memory accesses (MA) should be located in memory at longword boundaries
(position where the instruction's bottom two address bits are 00: A1 = 0, A0 = 0). This will
prevent contention between MA and instruction fetch (IF).
• The instruction immediately following a memory load instruction should not use the same
register as the destination register of the load instruction.
• Instructions that use the FPU should be arranged so that they are not sequential. Also,
instructions that access registers MACH and MACL in order to fetch the results of operations
performed by the FPU should not be situated immediately following instructions that use the
FPU.
• The instruction immediately preceding a floating-point arithmetic operation instruction should
not use the destination register of the floating-point operation instruction.
• As far as possible, avoid placing a floating-point instruction or FPU-related CPU instruction
within the 14 instructions following the FDIV instruction.
8.8
Operation of Instruction Pipelines
This section describes the operation of the instruction pipelines. By combining these with the rules
described so far, the way pipelines flow in a program and the number of instruction execution
cycles can be calculated.
In the following figures, “Instruction A” refers to the instruction being discussed. When “IF” is
written in the instruction fetch stage, it may refer to either “IF” or “if”. When there is contention
between IF and MA, the slot will split, but the manner of the split is not discussed in the tables,
with a few exceptions. When a slot has split, see section 8.4, Contention between Instruction Fetch
(IF) and Memory Access (MA), and work out the pipeline flow from the rules for pipeline
operation given there.
Table 8.1 shows the number of instruction stages and number of execution cycles as follows:
• Type: Given by function
• Category: Categorized by differences in instruction operation
• Stages: The number of stages in the instruction
• Cycles: The number of execution cycles when there is no contention
• Contention: Indicates the contention that occurs
• Instructions: Gives a mnemonic for the instruction concerned
Table 8.1 Number of Instruction Stages and Execution Cycles

Type: Data transfer instructions

Category: Register-register transfer instructions
Stages: 3    Cycles: 1    Contention: —
Instructions:
  MOV     #imm,Rn
  MOV     Rm,Rn
  MOVA    @(disp,PC),R0
  MOVT    Rn
  SWAP.B  Rm,Rn
  SWAP.W  Rm,Rn
  XTRCT   Rm,Rn

Category: Memory load instructions
Stages: 5    Cycles: 1
Contention:
  • Contention occurs when an instruction that uses the same destination register is placed
    immediately after this instruction
  • MA contends with IF
Instructions:
  MOV.W   @(disp,PC),Rn
  MOV.L   @(disp,PC),Rn
  MOV.B   @Rm,Rn
  MOV.W   @Rm,Rn
  MOV.L   @Rm,Rn
  MOV.B   @Rm+,Rn
  MOV.W   @Rm+,Rn
  MOV.L   @Rm+,Rn
  MOV.B   @(disp,Rm),R0
  MOV.W   @(disp,Rm),R0
  MOV.L   @(disp,Rm),Rn
  MOV.B   @(R0,Rm),Rn
  MOV.W   @(R0,Rm),Rn
  MOV.L   @(R0,Rm),Rn
  MOV.B   @(disp,GBR),R0
  MOV.W   @(disp,GBR),R0
  MOV.L   @(disp,GBR),R0
Type: Data transfer instructions (cont)

Category: Memory store instructions
Stages: 4
Cycles: 1
Contention: MA contends with IF
Instructions: MOV.B Rm,@Rn; MOV.W Rm,@Rn; MOV.L Rm,@Rn; MOV.B Rm,@–Rn; MOV.W Rm,@–Rn; MOV.L Rm,@–Rn; MOV.B R0,@(disp,Rn); MOV.W R0,@(disp,Rn); MOV.L Rm,@(disp,Rn); MOV.B Rm,@(R0,Rn); MOV.W Rm,@(R0,Rn); MOV.L Rm,@(R0,Rn); MOV.B R0,@(disp,GBR); MOV.W R0,@(disp,GBR); MOV.L R0,@(disp,GBR)

Type: Arithmetic instructions

Category: Arithmetic instructions between registers (except multiplication instructions)
Stages: 3
Cycles: 1
Contention: —
Instructions: ADD Rm,Rn; ADD #imm,Rn; ADDC Rm,Rn; ADDV Rm,Rn; CMP/EQ #imm,R0; CMP/EQ Rm,Rn; CMP/HS Rm,Rn; CMP/GE Rm,Rn; CMP/HI Rm,Rn; CMP/GT Rm,Rn; CMP/PZ Rn; CMP/PL Rn; CMP/STR Rm,Rn; DIV1 Rm,Rn; DIV0S Rm,Rn; DIV0U
Type: Arithmetic instructions (cont)

Category: Arithmetic instructions between registers (except multiplication instructions) (cont)
Instructions: DT Rn; EXTS.B Rm,Rn; EXTS.W Rm,Rn; EXTU.B Rm,Rn; EXTU.W Rm,Rn; NEG Rm,Rn; NEGC Rm,Rn; SUB Rm,Rn; SUBC Rm,Rn; SUBV Rm,Rn

Category: Multiply/add instructions
Stages: 7
Cycles: 3/(2)*1
Contention:
• If an instruction that uses the FPU follows this instruction, FPU contention occurs.
• MA contends with IF
Instructions: MAC.W @Rm+,@Rn+

Category: Double-length multiply/accumulate instruction
Stages: 9
Cycles: 3/(2 to 4)*1
Contention:
• If an instruction that uses the FPU follows this instruction, FPU contention occurs.
• MA contends with IF
Instructions: MAC.L @Rm+,@Rn+

Category: Multiplication instructions
Stages: 6
Cycles: 1 to 3*1
Contention:
• If an instruction that uses the FPU follows this instruction, FPU contention occurs.
• MA contends with IF
Instructions: MULS.W Rm,Rn; MULU.W Rm,Rn

Category: Double-length multiply/accumulate instruction
Stages: 9
Cycles: 2 to 4*1
Contention:
• If an instruction that uses the FPU follows this instruction, FPU contention occurs.
• MA contends with IF
Instructions: DMULS.L Rm,Rn; DMULU.L Rm,Rn; MUL.L Rm,Rn
Type: Logic operation instructions

Category: Register-register logic operation instructions
Stages: 3
Cycles: 1
Contention: —
Instructions: AND Rm,Rn; AND #imm,R0; NOT Rm,Rn; OR Rm,Rn; OR #imm,R0; TST Rm,Rn; TST #imm,R0; XOR Rm,Rn; XOR #imm,R0

Category: Memory logic operations instructions
Stages: 6
Cycles: 3
Contention: MA contends with IF
Instructions: AND.B #imm,@(R0,GBR); OR.B #imm,@(R0,GBR); TST.B #imm,@(R0,GBR); XOR.B #imm,@(R0,GBR)

Category: TAS instruction
Stages: 6
Cycles: 4
Contention: MA contends with IF
Instructions: TAS.B @Rn

Type: Shift instructions

Category: Shift instructions
Stages: 3
Cycles: 1
Contention: —
Instructions: ROTL Rn; ROTR Rn; ROTCL Rn; ROTCR Rn; SHAL Rn; SHAR Rn; SHLL Rn; SHLR Rn; SHLL2 Rn; SHLR2 Rn; SHLL8 Rn; SHLR8 Rn; SHLL16 Rn; SHLR16 Rn
Type: Branch instructions

Category: Conditional branch instructions
Stages: 3
Cycles: 3/1*2
Contention: —
Instructions: BF label; BT label

Category: Delayed conditional branch instructions
Stages: 3
Cycles: 2/1*2
Contention: —
Instructions: BF/S label; BT/S label

Category: Unconditional branch instructions
Stages: 3
Cycles: 2
Contention: —
Instructions: BRA label; BRAF Rm; BSR label; BSRF Rm; JMP @Rm; JSR @Rm; RTS

Type: System control instructions

Category: System control ALU instructions
Stages: 3
Cycles: 1
Contention: —
Instructions: CLRT; LDC Rm,SR; LDC Rm,GBR; LDC Rm,VBR; LDS Rm,PR; NOP; SETT; STC SR,Rn; STC GBR,Rn; STC VBR,Rn; STS PR,Rn

Category: LDS.L instructions (PR)
Stages: 5
Cycles: 1
Contention:
• Contention occurs when an instruction that uses the same destination register is placed immediately after this instruction
• MA contends with IF
Instructions: LDS.L @Rm+,PR
Type: System control instructions (cont)

Category: STS.L instruction (PR)
Stages: 4
Cycles: 1
Contention: MA contends with IF
Instructions: STS.L PR,@–Rn

Category: LDC.L instructions
Stages: 5
Cycles: 3
Contention:
• Contention occurs when an instruction that uses the same destination register is placed immediately after this instruction
• MA contends with IF
Instructions: LDC.L @Rm+,SR; LDC.L @Rm+,GBR; LDC.L @Rm+,VBR

Category: STC.L instructions
Stages: 4
Cycles: 2
Contention: MA contends with IF
Instructions: STC.L SR,@–Rn; STC.L GBR,@–Rn; STC.L VBR,@–Rn

Category: Register → MAC transfer instruction
Stages: 4
Cycles: 1
Contention:
• Contention occurs with multiplier
• MA contends with IF
Instructions: CLRMAC; LDS Rm,MACH; LDS Rm,MACL

Category: Memory → MAC transfer instructions
Stages: 4
Cycles: 1
Contention:
• Contention occurs with multiplier
• MA contends with IF
Instructions: LDS.L @Rm+,MACH; LDS.L @Rm+,MACL

Category: MAC → register transfer instruction
Stages: 5
Cycles: 1
Contention:
• Contention occurs with multiplier
• Contention occurs when an instruction that uses the same destination register is placed immediately after this instruction
• MA contends with IF
Instructions: STS MACH,Rn; STS MACL,Rn
Type: System control instructions (cont)

Category: MAC → memory transfer instruction
Stages: 4
Cycles: 1
Contention:
• Contention occurs with multiplier
• MA contends with IF
Instructions: STS.L MACH,@–Rn; STS.L MACL,@–Rn

Category: RTE instruction
Stages: 5
Cycles: 4
Contention: —
Instructions: RTE

Category: TRAP instruction
Stages: 9
Cycles: 8
Contention: —
Instructions: TRAPA #imm

Category: SLEEP instruction
Stages: 3
Cycles: 3
Contention: —
Instructions: SLEEP

Type: FPU-related CPU instruction

Category: FPUL load instruction
Stages: 5 (FPU pipeline), 4 (CPU pipeline)
Cycles: 1
Contention:
• Contention occurs if next instruction reads FPUL
• MA in CPU pipeline contends with IF
Instructions: LDS Rm,FPUL; LDS.L @Rm+,FPUL

Category: FPSCR load instruction
Stages: 5 (FPU pipeline), 4 (CPU pipeline)
Cycles: 1
Contention:
• Contention occurs as shown in Figure 8.11
Instructions: LDS Rm,FPSCR; LDS.L @Rm+,FPSCR

Category: FPUL store instruction (STS)
Stages: 4 (FPU pipeline), 5 (CPU pipeline)
Cycles: 1
Contention:
• Contention occurs if next instruction uses Rn
• MA in CPU pipeline contends with IF
Instructions: STS FPUL,Rn

Category: FPUL store instruction (STS.L)
Stages: 4 (FPU pipeline), 4 (CPU pipeline)
Cycles: 1
Contention:
• MA in CPU pipeline contends with IF
Instructions: STS.L FPUL,@-Rn
Type: FPU-related CPU instruction (cont)

Category: FPSCR store instruction (STS)
Stages: 4 (FPU pipeline), 5 (CPU pipeline)
Cycles: 1
Contention:
• Contention occurs as shown in Figure 8.12
• Contention occurs if next instruction uses Rn
• MA in CPU pipeline contends with IF
Instructions: STS FPSCR,Rn

Category: FPSCR store instruction (STS.L)
Stages: 4 (FPU pipeline), 4 (CPU pipeline)
Cycles: 1
Contention:
• Contention occurs as shown in Figure 8.12
• MA in CPU pipeline contends with IF
Instructions: STS.L FPSCR,@-Rn

Type: Floating-point instruction

Category: Floating-point register transfer instruction
Stages: 5 (FPU pipeline), 3 (CPU pipeline)
Cycles: 1
Contention:
• Contention occurs if next instruction reads destination register
Instructions: FLDS FRm,FPUL; FMOV FRm,FRn; FSTS FPUL,FRn

Category: Floating-point register immediate instruction
Stages: 5 (FPU pipeline), 3 (CPU pipeline)
Cycles: 1
Contention:
• Contention occurs if next instruction reads destination register
Instructions: FLDI0 FRn; FLDI1 FRn

Category: Floating-point register load instruction
Stages: 5 (FPU pipeline), 4 (CPU pipeline)
Cycles: 1
Contention:
• Contention occurs if next instruction reads destination register
• MA in CPU pipeline contends with IF
Instructions: FMOV.S @Rm,FRn; FMOV.S @Rm+,FRn; FMOV.S @(R0,Rm),FRn

Category: Floating-point register store instruction
Stages: 4 (FPU pipeline), 4 (CPU pipeline)
Cycles: 1
Contention:
• MA in CPU pipeline contends with IF
Instructions: FMOV.S FRm,@Rn; FMOV.S FRm,@-Rn; FMOV.S FRm,@(R0,Rn)
Type: Floating-point instruction (cont)

Category: Floating-point register operation instruction (other than FDIV)
Stages: 5 (FPU pipeline), 3 (CPU pipeline)
Cycles: 1
Contention:
• Contention occurs if next instruction reads destination register
Instructions: FABS FRn; FADD FRm,FRn; FLOAT FPUL,FRn; FMAC FR0,FRm,FRn; FMUL FRm,FRn; FNEG FRn; FSUB FRm,FRn; FTRC FRm,FPUL

Category: Floating-point register operation instruction (FDIV)
Stages: 17 (FPU pipeline), 3 (CPU pipeline)
Cycles: 13
Contention:
• Contention occurs as shown in Figure 8.13
Instructions: FDIV FRm,FRn

Category: Floating-point register compare instruction
Stages: 3 (FPU pipeline), 3 (CPU pipeline)
Cycles: 1
Contention: —
Instructions: FCMP/EQ FRm,FRn; FCMP/GT FRm,FRn
Notes: 1. The normal minimum number of execution cycles. The number in parentheses is the
number of cycles when there is contention with following instructions.
2. One state when there is no branch.
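As a rough illustration of how the Cycles column of Table 8.1 can be used, the following sketch sums the no-contention cycle counts for a straight-line sequence. It is only a lower-bound estimate under stated assumptions: it ignores IF/MA slot splits, register contention, and multiplier or FPU contention, and it transcribes only a handful of rows. The names `TABLE_8_1` and `no_contention_cycles` are hypothetical helpers, not part of any Hitachi or Renesas tool.

```python
# Hypothetical transcription of a few Table 8.1 rows:
# instruction -> (stages, cycles when there is no contention).
TABLE_8_1 = {
    "MOV Rm,Rn": (3, 1),      # register-register transfer
    "MOV.L @Rm,Rn": (5, 1),   # memory load
    "MOV.L Rm,@Rn": (4, 1),   # memory store
    "ADD Rm,Rn": (3, 1),      # arithmetic between registers
    "TAS.B @Rn": (6, 4),      # TAS instruction
    "RTE": (5, 4),
    "TRAPA #imm": (9, 8),
    "SLEEP": (3, 3),
}


def no_contention_cycles(program):
    """Sum the Cycles column over a straight-line instruction sequence."""
    return sum(TABLE_8_1[insn][1] for insn in program)


program = ["MOV Rm,Rn", "MOV.L Rm,@Rn", "TAS.B @Rn"]
print(no_contention_cycles(program))  # 1 + 1 + 4 = 6
```

Any contention listed in the table adds to this figure, so the result should be read as a starting point for the per-instruction rules that follow.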
8.8.1 Data Transfer Instructions
Register-Register Transfer Instructions
Instruction Types:
• MOV #imm, Rn
• MOV Rm, Rn
• MOVA @(disp, PC), R0
• MOVT Rn
• SWAP.B Rm, Rn
• SWAP.W Rm, Rn
• XTRCT Rm, Rn
Pipeline:
(Each column is one slot.)
Instruction A                 IF  ID  EX
Next instruction                  IF  ID  EX  ......
Third instruction in series           IF  ID  EX  ......
Figure 8.14 Register-Register Transfer Instruction Pipeline
Operation:
The pipeline ends after three stages: IF, ID, and EX. Data is transferred in the EX stage via the
ALU.
Memory Load Instructions
Instruction Types:
• MOV.W @(disp, PC), Rn
• MOV.L @(disp, PC), Rn
• MOV.B @Rm, Rn
• MOV.W @Rm, Rn
• MOV.L @Rm, Rn
• MOV.B @Rm+, Rn
• MOV.W @Rm+, Rn
• MOV.L @Rm+, Rn
• MOV.B @(disp, Rm), R0
• MOV.W @(disp, Rm), R0
• MOV.L @(disp, Rm), Rn
• MOV.B @(R0, Rm), Rn
• MOV.W @(R0, Rm), Rn
• MOV.L @(R0, Rm), Rn
• MOV.B @(disp, GBR), R0
• MOV.W @(disp, GBR), R0
• MOV.L @(disp, GBR), R0
Pipeline:
(Each column is one slot.)
Instruction A                 IF  ID  EX  MA  WB
Next instruction                  IF  ID  EX  .....
Third instruction in series           IF  ID  EX  .....
Figure 8.15 Memory Load Instruction Pipeline
Operation:
The pipeline has five stages: IF, ID, EX, MA, and WB (figure 8.15). If an instruction that uses the
same destination register as this instruction is placed immediately after it, contention will occur.
(See section 8.5 Effects of Memory Load Instructions on the Pipeline)
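The contention rule above can be checked mechanically. The sketch below flags each pair in which a memory load is immediately followed by an instruction that uses the load's destination register. It is a minimal illustration only: the `Insn` record, its fields, and `load_contention_pairs` are hypothetical, the register sets are supplied by hand rather than decoded from machine code, and the sketch does not compute how many cycles the contention costs (see section 8.5 for that).

```python
from typing import List, NamedTuple, Tuple


class Insn(NamedTuple):
    text: str              # assembly text, for reporting only
    is_load: bool          # True for the memory load instructions listed above
    dest: str              # destination register, "" if none
    uses: Tuple[str, ...]  # registers the instruction reads or writes


def load_contention_pairs(program: List[Insn]) -> List[Tuple[int, int]]:
    """Return (i, i + 1) pairs where instruction i is a memory load and
    instruction i + 1 uses the load's destination register."""
    pairs = []
    for i in range(len(program) - 1):
        cur, nxt = program[i], program[i + 1]
        if cur.is_load and cur.dest and cur.dest in nxt.uses:
            pairs.append((i, i + 1))
    return pairs


program = [
    Insn("MOV.L @R1,R2", True, "R2", ("R1", "R2")),
    Insn("ADD R2,R3", False, "R3", ("R2", "R3")),  # uses R2 right after the load
    Insn("MOV.L @R1,R4", True, "R4", ("R1", "R4")),
    Insn("ADD R5,R6", False, "R6", ("R5", "R6")),  # unrelated registers
]
print(load_contention_pairs(program))  # [(0, 1)]
```

Placing an unrelated instruction between the load and its first use, as in the second pair, is the usual way to avoid the flagged case.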
Memory Store Instructions
Instruction Types:
• MOV.B Rm, @Rn
• MOV.W Rm, @Rn
• MOV.L Rm, @Rn
• MOV.B Rm, @–Rn
• MOV.W Rm, @–Rn
• MOV.L Rm, @–Rn
• MOV.B R0, @(disp, Rn)
• MOV.W R0, @(disp, Rn)
• MOV.L Rm, @(disp, Rn)
• MOV.B Rm, @(R0, Rn)
• MOV.W Rm, @(R0, Rn)
• MOV.L Rm, @(R0, Rn)
• MOV.B R0, @(disp, GBR)
• MOV.W R0, @(disp, GBR)
• MOV.L R0, @(disp, GBR)
Pipeline:
(Each column is one slot.)
Instruction A                 IF  ID  EX  MA
Next instruction                  IF  ID  EX  .....
Third instruction in series           IF  ID  EX  .....
Figure 8.16 Memory Store Instructions Pipeline
Operation:
The pipeline has four stages: IF, ID, EX, and MA (figure 8.16). Data is not returned to the register
so there is no WB stage.
8.8.2 Arithmetic Instructions
Arithmetic Instructions between Registers (Except Multiplication Instructions): Include the
following instruction types:
• ADD Rm, Rn
• ADD #imm, Rn
• ADDC Rm, Rn
• ADDV Rm, Rn
• CMP/EQ #imm, R0
• CMP/EQ Rm, Rn
• CMP/HS Rm, Rn
• CMP/GE Rm, Rn
• CMP/HI Rm, Rn
• CMP/GT Rm, Rn
• CMP/PZ Rn
• CMP/PL Rn
• CMP/STR Rm, Rn
• DIV1 Rm, Rn
• DIV0S Rm, Rn
• DIV0U
• DT Rn
• EXTS.B Rm, Rn
• EXTS.W Rm, Rn
• EXTU.B Rm, Rn
• EXTU.W Rm, Rn
• NEG Rm, Rn
• NEGC Rm, Rn
• SUB Rm, Rn
• SUBC Rm, Rn
• SUBV Rm, Rn
(Pipeline diagram. Rows: Instruction A; next instruction; third instruction in series.)
Figure 8.17 Pipeline for Arithmetic Instructions between Registers Except Multiplication
Instructions
The pipeline has three stages: IF, ID, and EX (figure 8.17). The data operation is completed in the
EX stage via the ALU.
Multiply/Accumulate Instruction: Includes the following instruction type:
• MAC.W @Rm+, @Rn+
(Pipeline diagram. Rows: Instruction A; next instruction; third instruction in series.)
Figure 8.18 Multiply/Accumulate Instruction Pipeline
The pipeline has seven stages: IF, ID, EX, MA, MA, mm, and mm. The second MA reads the
memory and accesses the multiplier. mm indicates that the multiplier is operating. mm operates for
two cycles after the final MA ends, regardless of slot. The ID of the instruction after the MAC.W
instruction is stalled for 1 slot. The two MAs of the MAC.W instruction, when they contend with
IF, split the slots as described in Section 8.4, Contention between Instruction Fetch (IF) and
Memory Access (MA).
When an instruction that does not use the multiplier comes after the MAC.W instruction, the
MAC.W instruction may be considered to be a five-stage pipeline instruction of IF, ID, EX, MA,
MA. In such cases, the ID of the next instruction simply stalls one slot and thereafter operates like
a normal pipeline. When an instruction that uses the multiplier comes after the MAC.W
instruction, however, contention occurs with the multiplier, so operation is different from normal.
The following cases are possible:
(a) MAC.W instruction follows immediately after MAC.W instruction
(b) MAC.L instruction follows immediately after MAC.W instruction
(c) MULS.W instruction follows immediately after MAC.W instruction
(d) DMULS.L instruction follows immediately after MAC.W instruction
(e) STS (register) instruction follows immediately after MAC.W instruction
(f) STS.L (memory) instruction follows immediately after MAC.W instruction
(g) LDS (register) instruction follows immediately after MAC.W instruction
(h) LDS.L (memory) instruction follows immediately after MAC.W instruction
(a) MAC.W instruction follows immediately after MAC.W instruction
The second MA of MAC.W instruction does not contend with the mm generated by the preceding
multiply instruction.
(Pipeline diagram. Rows: MAC.W; MAC.W; next instruction in series.)
Figure 8.19 MAC.W Instruction Follows Immediately after MAC.W Instruction (1)
If the MAC.W instruction occurs twice in succession, contention between MA and IF could cause
a delay in instruction execution. Refer to the diagram below. This diagram takes into account the
possibility of contention between MA and IF.
(Pipeline diagram. Rows: four successive MAC.W instructions.)
Figure 8.20 MAC.W Instruction Follows Immediately after MAC.W Instruction (2)
If contention occurs between the second MA of the MAC.W instruction and IF, the slot splits
normally. Refer to the diagram below. This diagram takes into account the possibility of
contention between MA and IF.
(Pipeline diagram. Rows: MAC.W; MAC.W; three other instructions.)
Figure 8.21 MAC.W Instruction Follows Immediately after MAC.W Instruction (3)
(b) MAC.L instruction follows immediately after MAC.W instruction
The second MA of the MAC.W instruction does not contend with the mm generated by the
preceding multiply instruction.
(Pipeline diagram. Rows: MAC.W; MAC.L; next instruction in series.)
Figure 8.22 MAC.L Instruction Follows Immediately after MAC.W Instruction
(c) MULS.W instruction follows immediately after MAC.W instruction
The MULS.W instruction has an MA stage for accessing the multiplier. If the MA of MULS.W
contends with the MAC.W instruction's multiplier operation (mm), that MA is delayed until the
mm finishes (M -- A in the diagram below), thereby forming a single slot. If one or more
instructions that do not use the multiplier are located between MAC.W and MULS.W, no
contention occurs between MAC.W and MULS.W and there is no delay. Note that the slot splits
if there is contention between the MA of MULS.W and IF.
(Pipeline diagrams, two panels. Rows, panel 1: MAC.W; MULS.W; other instruction. Rows, panel 2: MAC.W; branch destination; MULS.W; other instruction.)
Figure 8.23 MULS.W Instruction Follows Immediately after MAC.W Instruction
(d) DMULS.L instruction follows immediately after MAC.W instruction
The DMULS.L instruction has an MA stage for accessing the multiplier, but there is no contention
with the MA of DMULS.L during the MAC.W instruction's multiplier operation (mm). Note that
the slot splits if there is contention between the MA of DMULS.L and IF.
(Pipeline diagram. Rows: MAC.W; DMULS.L; other instruction.)
Figure 8.24 DMULS.L Instruction Follows Immediately after MAC.W Instruction
(e) STS (register) instruction follows immediately after MAC.W instruction
If the STS instruction is used to store the contents of the MAC register to a general-use register,
the STS instruction will include an MA stage for accessing the multiplier, as described below. If
contention with the MA of STS occurs during the multiplier operation (mm), that MA is delayed
until the mm finishes (M -- A in the diagram below), thereby forming a single slot. Also, the MA
of STS contends with IF. This situation is shown in the diagrams below. These diagrams take into
account the possibility of contention between MA and IF.
(Pipeline diagrams, two panels. Rows in each panel: MAC.W; STS; three other instructions.)
Figure 8.25 STS (Register) Instruction Follows Immediately after MAC.W Instruction
(f) STS.L (memory) instruction follows immediately after MAC.W instruction
If the STS instruction is used to store the contents of the MAC register in memory, the STS
instruction will include an MA stage for accessing the multiplier and writing to memory, as
described below. These diagrams take into account the possibility of contention between MA and
IF.
(Pipeline diagrams, two panels. Rows in each panel: MAC.W; STS.L; three other instructions.)
Figure 8.26 STS.L (Memory) Instruction Follows Immediately after MAC.W Instruction
(g) LDS (register) instruction follows immediately after MAC.W instruction
If the LDS instruction is used to load the contents of the MAC register from a general-use register,
the LDS instruction will include an MA stage for accessing the multiplier, as described below. If
contention with the MA of LDS occurs during the multiplier operation (mm), that MA is delayed
until the mm finishes (M -- A in the diagram below), thereby forming a single slot. Also, the MA
of LDS contends with IF. This situation is shown in the diagrams below. These diagrams take into
account the possibility of contention between MA and IF.
(Pipeline diagrams, two panels. Rows in each panel: MAC.W; LDS; three other instructions.)
Figure 8.27 LDS (Register) Instruction Follows Immediately after MAC.W Instruction
(h) LDS.L (memory) instruction follows immediately after MAC.W instruction
If the LDS instruction is used to load the contents of the MAC register from memory, the LDS
instruction will include an MA stage for accessing memory and accessing the multiplier, as
described below. If contention with the MA of LDS occurs during the multiplier operation (mm),
that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single
slot. Also, the MA of LDS contends with IF. This situation is shown in the diagrams below. These
diagrams take into account the possibility of contention between MA and IF.
(Pipeline diagrams, two panels. Rows in each panel: MAC.W; LDS.L; three other instructions.)
Figure 8.28 LDS.L (Memory) Instruction Follows Immediately after MAC.W Instruction
Double-Length Multiply/Accumulate Instruction: Includes the following instruction type:
• MAC.L @Rm+, @Rn+
(Pipeline diagram. Rows: Instruction A; next instruction; third instruction.)
Figure 8.29 Double-Length Multiply/Accumulate Instruction Pipeline
The pipeline has nine stages: IF, ID, EX, MA, MA, mm, mm, mm, and mm (figure 8.29). The
second MA reads the memory and accesses the multiplier. The mm indicates that the multiplier is
operating. The mm operates for four cycles after the final MA ends, regardless of slot. The ID of
the instruction after the MAC.L instruction is stalled for one slot. The two MAs of the MAC.L
instruction, when they contend with IF, split the slots as described in section 8.4, Contention
between Instruction Fetch (IF) and Memory Access (MA).
When an instruction that does not use the multiplier follows the MAC.L instruction, the MAC.L
instruction may be considered to be a five-stage pipeline instruction of IF, ID, EX, MA, MA. In
such cases, the ID of the next instruction simply stalls one slot and thereafter the pipeline operates
normally. When an instruction that uses the multiplier comes after the MAC.L instruction,
contention occurs with the multiplier, so operation is different from normal.
The following cases are possible:
(a) MAC.L instruction follows immediately after MAC.L instruction
(b) MAC.W instruction follows immediately after MAC.L instruction
(c) DMULS.L instruction follows immediately after MAC.L instruction
(d) MULS.W instruction follows immediately after MAC.L instruction
(e) STS (register) instruction follows immediately after MAC.L instruction
(f) STS.L (memory) instruction follows immediately after MAC.L instruction
(g) LDS (register) instruction follows immediately after MAC.L instruction
(h) LDS.L (memory) instruction follows immediately after MAC.L instruction
(a) MAC.L instruction follows immediately after MAC.L instruction
If the second MA of the MAC.L instruction contends with the mm generated by the preceding
multiply instruction, that MA is delayed until the mm finishes (M -- A in the diagram below),
thereby forming a single slot.
If there are two or more instructions that do not use the multiplier located between one MAC.L
instruction and a second MAC.L instruction, no contention occurs between the two MAC.L
instructions and there is no delay.
(Pipeline diagrams, two panels. Rows, panel 1: MAC.L; MAC.L; next instruction in series. Rows, panel 2: MAC.L; two other instructions; MAC.L.)
Figure 8.30 MAC.L Instruction Follows Immediately after MAC.L Instruction (1)
Even if the succession of MAC.L instructions causes delays in execution due to contention
between MA and IF, multiplier contention may be reduced in some cases. Refer to the diagram
below. This diagram takes into account the possibility of contention between MA and IF.
(Pipeline diagram. Rows: four successive MAC.L instructions.)
Figure 8.31 MAC.L Instruction Follows Immediately after MAC.L Instruction (2)
If the second MA of the MAC.L instruction is delayed until the mm finishes, and that MA
contends with IF, the slot splits normally. Refer to the diagram below. This diagram takes into
account the possibility of contention between MA and IF.
(Pipeline diagram. Rows: MAC.L; MAC.L; three other instructions.)
Figure 8.32 MAC.L Instruction Follows Immediately after MAC.L Instruction (3)
(b) MAC.W instruction follows immediately after MAC.L instruction
If the second MA of the MAC.W instruction contends with the mm generated by the preceding
multiply instruction, that MA is delayed until the mm finishes (M -- A in the diagram below),
thereby forming a single slot.
If there are two or more instructions that do not use the multiplier located between the MAC.L
instruction and the MAC.W instruction, no multiplier contention occurs between the MAC.L
instruction and the MAC.W instruction, and there is no delay.
(Pipeline diagrams, two panels. Rows, panel 1: MAC.L; MAC.W; next instruction in series. Rows, panel 2: MAC.L; two other instructions; MAC.W.)
Figure 8.33 MAC.W Instruction Follows Immediately after MAC.L Instruction
(c) DMULS.L instruction follows immediately after MAC.L instruction
The DMULS.L instruction has an MA stage for accessing the multiplier. If contention with the
second MA of DMULS.L occurs during the MAC.L instruction's multiplier operation (mm), that
MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot.
If there are two or more instructions that do not use the multiplier located between the MAC.L
instruction and the DMULS.L instruction, no contention occurs between MAC.L and DMULS.L,
and there is no delay. Note that the slot splits if there is contention between the MA of DMULS.L
and IF.
(Pipeline diagrams, three panels. Rows, panel 1: MAC.L; DMULS.L; other instruction. Rows, panel 2: MAC.L; branch destination; DMULS.L; other instruction. Rows, panel 3: MAC.L; two other instructions; DMULS.L; other instruction.)
Figure 8.34 DMULS.L Instruction Follows Immediately after MAC.L Instruction
(d) MULS.W instruction follows immediately after MAC.L instruction
The MULS.W instruction has an MA stage for accessing the multiplier. If contention with the MA
of MULS.W occurs during the MAC.L instruction's multiplier operation (mm), that MA is delayed
until the mm finishes (M -- A in the diagram below), thereby forming a single slot. If there are
three or more instructions that do not use the multiplier located between MAC.L and MULS.W, no
contention occurs between MAC.L and MULS.W and there is no delay. Note that the slot splits if
there is contention between the MA of MULS.W and IF.
(Pipeline diagrams, four panels. Rows, panel 1: MAC.L; MULS.W; other instruction. Rows, panel 2: MAC.L; branch destination; MULS.W; other instruction. Rows, panel 3: MAC.L; two other instructions; MULS.W; other instruction. Rows, panel 4: MAC.L; three other instructions; MULS.W; other instruction.)
Figure 8.35 MULS.W Instruction Follows Immediately after MAC.L Instruction
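The spacing rules stated in cases (a) to (d) for MAC.W and MAC.L can be collected into a small lookup. The sketch below is only an illustration under stated assumptions: `MIN_GAP` and `multiplier_contention` are hypothetical names, the values are read from the prose of this section (0 where the text reports no contention for adjacent instructions), and only multiplier contention is modelled, not slot splitting between MA and IF.

```python
# Minimum number of intervening instructions that do not use the multiplier
# needed to avoid multiplier contention, per (first, second) instruction pair.
# Values are taken from the prose of this section; 0 means the text reports
# no multiplier contention even when the pair is adjacent.
MIN_GAP = {
    ("MAC.W", "MAC.W"): 0,
    ("MAC.W", "MAC.L"): 0,
    ("MAC.W", "MULS.W"): 1,
    ("MAC.W", "DMULS.L"): 0,
    ("MAC.L", "MAC.L"): 2,
    ("MAC.L", "MAC.W"): 2,
    ("MAC.L", "DMULS.L"): 2,
    ("MAC.L", "MULS.W"): 3,
}


def multiplier_contention(first, second, gap):
    """True if `second`, placed `gap` non-multiplier instructions after
    `first`, may still have its MA delayed by the multiplier (mm)."""
    return gap < MIN_GAP[(first, second)]


print(multiplier_contention("MAC.L", "MULS.W", 2))  # True
print(multiplier_contention("MAC.L", "MULS.W", 3))  # False
print(multiplier_contention("MAC.W", "MAC.W", 0))   # False
```

A scheduler or a manual review can use such a table to decide how many unrelated instructions to place between two multiplier instructions.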
(e) STS (register) instruction follows immediately after MAC.L instruction
If the STS instruction is used to store the contents of the MAC register to a general-use register,
the STS instruction will include an MA stage for accessing the multiplier, as described below. If
contention with the MA of STS occurs during the multiplier operation (mm), that MA is delayed
until the mm finishes (M -- A in the diagram below), thereby forming a single slot. Also, the MA
of STS contends with IF. This situation is shown in the diagrams below. These diagrams take into
account the possibility of contention between MA and IF.
(Pipeline diagrams, two panels. Rows in each panel: MAC.L; STS; three other instructions.)
Figure 8.36 STS (Register) Instruction Follows Immediately after MAC.L Instruction
(f) STS.L (memory) instruction follows immediately after MAC.L instruction
If the STS instruction is used to store the contents of the MAC register in memory, the STS
instruction will include an MA stage for accessing the multiplier and writing to memory, as
described below. Also, the MA of STS contends with IF. This situation is shown in the diagrams
below. These diagrams take into account the possibility of contention between MA and IF.
(Pipeline diagrams, two panels. Rows in each panel: MAC.L; STS.L; three other instructions.)
Figure 8.37 STS.L (Memory) Instruction Follows Immediately after MAC.L Instruction
(g) LDS (register) instruction follows immediately after MAC.L instruction
If the LDS instruction is used to load the contents of the MAC register from a general-use register,
the LDS instruction will include an MA stage for accessing the multiplier, as described below. If
contention with the MA of LDS occurs during the multiplier operation (mm), that MA is delayed
until the mm finishes (M -- A in the diagram below), thereby forming a single slot. Also, the MA
of LDS contends with IF. This situation is shown in the diagrams below. These diagrams take into
account the possibility of contention between MA and IF.
(Pipeline diagrams, two panels. Rows in each panel: MAC.L; LDS; three other instructions.)
Figure 8.38 LDS (Register) Instruction Follows Immediately after MAC.L Instruction
(h) LDS.L (memory) instruction follows immediately after MAC.L instruction
If the LDS instruction is used to load the contents of the MAC register from memory, the LDS
instruction will include an MA stage for accessing memory and accessing the multiplier, as
described below. If contention with the MA of LDS occurs during the multiplier operation (mm),
that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single
slot. Also, the MA of LDS contends with IF. This situation is shown in the diagrams below. These
diagrams take into account the possibility of contention between MA and IF.
(Pipeline diagrams, two panels. Rows in each panel: MAC.L; LDS.L; three other instructions.)
Figure 8.39 LDS.L (Memory) Instruction Follows Immediately after MAC.L Instruction
Multiplication Instructions: Include the following instruction types:
• MULS.W Rm, Rn
• MULU.W Rm, Rn
(Pipeline diagram. Rows: Instruction A; next instruction; third instruction.)
Figure 8.40 Multiplication Instruction Pipeline
The pipeline has six stages: IF, ID, EX, MA, mm, and mm. The MA accesses the multiplier. mm
indicates that the multiplier is operating. mm operates for two cycles after the MA ends,
regardless of slot. The MA of the MULS.W instruction, when it contends with IF, splits the slot as
described in section 8.4, Contention between Instruction Fetch (IF) and Memory Access (MA).
When an instruction that does not use the multiplier comes after the MULS.W instruction, the
MULS.W instruction may be considered to be a four-stage pipeline instruction of IF, ID, EX, and
MA. In such cases, it operates like a normal pipeline. When an instruction that uses the multiplier
comes after the MULS.W instruction, however, contention occurs with the multiplier, so operation
is different from normal.
The following cases are possible:
(a) MAC.W instruction follows immediately after MULS.W instruction
(b) MAC.L instruction follows immediately after MULS.W instruction
(c) MULS.W instruction follows immediately after MULS.W instruction
(d) DMULS.L instruction follows immediately after MULS.W instruction
(e) STS (register) instruction follows immediately after MULS.W instruction
(f) STS.L (memory) instruction follows immediately after MULS.W instruction
(g) LDS (register) instruction follows immediately after MULS.W instruction
(h) LDS.L (memory) instruction follows immediately after MULS.W instruction
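The "may be considered to be" rule given for MAC.W, MAC.L, and MULS.W can be expressed as a small helper: when the following instruction does not use the multiplier, the mm stages can be ignored. This is a sketch under stated assumptions; `STAGES` and `effective_stages` are hypothetical names, and the stage lists are transcribed from the stage descriptions in this section.

```python
# Full stage sequences as described in this section.
STAGES = {
    "MAC.W": ["IF", "ID", "EX", "MA", "MA", "mm", "mm"],
    "MAC.L": ["IF", "ID", "EX", "MA", "MA", "mm", "mm", "mm", "mm"],
    "MULS.W": ["IF", "ID", "EX", "MA", "mm", "mm"],
}


def effective_stages(insn, next_uses_multiplier):
    """Stages that matter to the following instruction: the mm stages are
    dropped when that instruction does not use the multiplier."""
    stages = STAGES[insn]
    if next_uses_multiplier:
        return list(stages)
    return [s for s in stages if s != "mm"]


print(len(effective_stages("MAC.W", False)))   # 5
print(len(effective_stages("MAC.L", True)))    # 9
print(len(effective_stages("MULS.W", False)))  # 4
```

The shortened sequences match the five-stage (MAC.W, MAC.L) and four-stage (MULS.W) views described in the text.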
(a) MAC.W instruction follows immediately after MULS.W instruction
The second MA of the MAC.W instruction does not contend with the mm generated by the
preceding multiply instruction.
[Pipeline diagram: MULS.W, MAC.W, next instruction in series]
Figure 8.41 MAC.W Instruction Follows Immediately after MULS.W Instruction
(b) MAC.L instruction follows immediately after MULS.W instruction
The second MA of the MAC.L instruction does not contend with the mm generated by the
preceding multiply instruction.
[Pipeline diagram: MULS.W, MAC.L, next instruction in series]
Figure 8.42 MAC.L Instruction Follows Immediately after MULS.W Instruction
(c) MULS.W instruction follows immediately after MULS.W instruction
The MULS.W instruction has an MA stage for accessing the multiplier. If the MA of the second
MULS.W contends with the multiplier operation (mm) of the first MULS.W, that
MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot.
If one or more instructions that do not use the multiplier are located between the two
MULS.W instructions, no contention occurs and there is no delay. Note that
the slot splits if there is contention between the MA of MULS.W and IF.
[Pipeline diagrams: (1) MULS.W, MULS.W, other instruction; (2) MULS.W, other instruction, MULS.W, other instruction]
Figure 8.43 MULS.W Instruction Follows Immediately after MULS.W Instruction (1)
If the MA of the MULS.W instruction is delayed until the mm finishes, and that MA contends
with IF, the slot splits normally. Refer to the diagram below. This diagram takes into account the
possibility of contention between MA and IF.
[Pipeline diagram: MULS.W, MULS.W, three other instructions]
Figure 8.44 MULS.W Instruction Follows Immediately after MULS.W Instruction (2)
(d) DMULS.L instruction follows immediately after MULS.W instruction
The second MA of the DMULS.L accesses the multiplier, but there is no contention with the mm
generated by the MULS.W instruction.
[Pipeline diagram: MULS.W, DMULS.L, other instruction]
Figure 8.45 DMULS.L Instruction Follows Immediately after MULS.W Instruction
(e) STS (register) instruction follows immediately after MULS.W instruction
If the STS instruction is used to store the contents of the MAC register to a general-use register,
the STS instruction will include an MA stage for accessing the multiplier, as described below. If
contention with the MA of STS occurs during the multiplier operation (mm), that MA is delayed
until the mm finishes (M -- A in the diagram below), thereby forming a single slot. Also, the MA
of STS contends with IF. This situation is shown in the diagrams below. These diagrams take into
account the possibility of contention between MA and IF.
[Pipeline diagrams (two): MULS.W, STS, three other instructions]
Figure 8.46 STS (Register) Instruction Follows Immediately after MULS.W Instruction
(f) STS.L (memory) instruction follows immediately after MULS.W instruction
If the STS instruction is used to store the contents of the MAC register in memory, the STS
instruction will include an MA stage for accessing the multiplier and writing to memory, as
described below. Also, the MA of STS contends with IF. This situation is shown in the diagrams
below. These diagrams take into account the possibility of contention between MA and IF.
[Pipeline diagrams (two): MULS.W, STS.L, three other instructions]
Figure 8.47 STS.L (Memory) Instruction Follows Immediately after MULS.W Instruction
(g) LDS (register) instruction follows immediately after MULS.W instruction
If the LDS instruction is used to load the contents of the MAC register from a general-use register,
the LDS instruction will include an MA stage for accessing the multiplier, as described below. If
contention with the MA of LDS occurs during the multiplier operation (mm), that MA is delayed
until the mm finishes (M -- A in the diagram below), thereby forming a single slot. Also, the MA
of LDS contends with IF. This situation is shown in the diagrams below. These diagrams take into
account the possibility of contention between MA and IF.
[Pipeline diagrams (two): MULS.W, LDS, three other instructions]
Figure 8.48 LDS (Register) Instruction Follows Immediately after MULS.W Instruction
(h) LDS.L (memory) instruction follows immediately after MULS.W instruction
If the LDS instruction is used to load the contents of the MAC register from memory, the LDS
instruction will include an MA stage for accessing memory and accessing the multiplier, as
described below. If contention with the MA of LDS occurs during the multiplier operation (mm),
that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single
slot. Also, the MA of LDS contends with IF. This situation is shown in the diagrams below. These
diagrams take into account the possibility of contention between MA and IF.
[Pipeline diagrams (two): MULS.W, LDS.L, three other instructions]
Figure 8.49 LDS.L (Memory) Instruction Follows Immediately after MULS.W Instruction
Double-Length Multiplication Instructions: Include the following instruction types:
• DMULS.L Rm, Rn
• DMULU.L Rm, Rn
• MUL.L Rm, Rn
[Pipeline diagram: Instruction A (IF ID EX MA MA mm mm mm mm), Next instruction (IF - ID EX MA WB), Third instruction (IF ID EX MA WB)]
Figure 8.50 Multiplication Instruction Pipeline
The pipeline has nine stages: IF, ID, EX, MA, MA, mm, mm, mm, and mm (figure 8.50). The
second MA accesses the multiplier. The mm indicates that the multiplier is operating. The mm
operates for four cycles after the MA ends, regardless of slot. The ID of the instruction following
the DMULS.L instruction is stalled for 1 slot (see the description of the Multiply/Accumulate
instruction). The two MA stages of the DMULS.L instruction, when they contend with IF, split the
slot as described in section 8.4, Contention between Instruction Fetch (IF) and Memory Access
(MA).
When an instruction that does not use the multiplier comes after the DMULS.L instruction, the
DMULS.L instruction may be considered to be a five-stage pipeline instruction of IF, ID, EX,
MA, and MA. In such cases, it operates like a normal pipeline. When an instruction that uses the
multiplier comes after the DMULS.L instruction, however, contention occurs with the multiplier,
so operation is different from normal.
The following cases are possible:
(a) MAC.L instruction follows immediately after DMULS.L instruction
(b) MAC.W instruction follows immediately after DMULS.L instruction
(c) DMULS.L instruction follows immediately after DMULS.L instruction
(d) MULS.W instruction follows immediately after DMULS.L instruction
(e) STS (register) instruction follows immediately after DMULS.L instruction
(f) STS.L (memory) instruction follows immediately after DMULS.L instruction
(g) LDS (register) instruction follows immediately after DMULS.L instruction
(h) LDS.L (memory) instruction follows immediately after DMULS.L instruction
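The number of intervening instructions needed to avoid a delay after DMULS.L can be checked with a small search over slots. The Python sketch below is an illustration under stated assumptions, not a documented formula: slots are numbered from 0 at the IF of the DMULS.L instruction, the ID of the following instruction is stalled one slot as described above, intervening instructions cause no stalls of their own, IF/MA contention is ignored, and a multiplier-access MA is assumed to be allowed in the final mm slot. The names follower_ma_slots and min_gap_without_delay are hypothetical.

```python
# Illustrative slot model for DMULS.L; one list entry per slot.
DMULSL_STAGES = ["IF", "ID", "EX", "MA", "MA", "mm", "mm", "mm", "mm"]


def follower_ma_slots(ma_stages, gap):
    """Return (requested, actual) slot of the follower's multiplier-access MA.

    ma_stages: 1 for MULS.W; 2 for MAC.L, MAC.W, DMULS.L (second MA accesses
               the multiplier).
    gap:       number of non-multiplier instructions placed in between.
    """
    last_mm_slot = len(DMULSL_STAGES) - 1            # slot 8
    id_slot = DMULSL_STAGES.index("ID") + 2 + gap    # next ID is stalled one slot
    requested = id_slot + 1 + ma_stages              # EX, then the MA stages
    return requested, max(requested, last_mm_slot)


def min_gap_without_delay(ma_stages):
    """Smallest number of intervening instructions for which the MA is not delayed."""
    gap = 0
    while True:
        requested, actual = follower_ma_slots(ma_stages, gap)
        if requested == actual:
            return gap
        gap += 1
```

With these assumptions, min_gap_without_delay(2) evaluates to 2 and min_gap_without_delay(1) evaluates to 3, which agrees with the "two or more" condition stated for MAC.L, MAC.W, and DMULS.L and the "three or more" condition stated for MULS.W.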
(a) MAC.L instruction follows immediately after DMULS.L instruction
If the second MA of the MAC.L instruction contends with the mm generated by the preceding
multiply instruction, the bus cycle of that MA is extended until the mm finishes (M -- A in the
diagram below), thereby forming a single slot.
If there are two or more instructions that do not use the multiplier located between the DMULS.L
instruction and the MAC.L instruction, no contention occurs between DMULS.L and MAC.L, and
there is no delay.
[Pipeline diagrams: (1) DMULS.L, MAC.L, next instruction in series; (2) DMULS.L, two other instructions, MAC.L]
Figure 8.51 MAC.L Instruction Follows Immediately after DMULS.L Instruction
(b) MAC.W instruction follows immediately after DMULS.L instruction
If the second MA of the MAC.W instruction contends with the mm generated by the preceding
multiply instruction, the bus cycle of that MA is extended until the mm finishes (M -- A in the
diagram below), thereby forming a single slot.
If there are two or more instructions that do not use the multiplier located between the DMULS.L
instruction and the MAC.W instruction, no contention occurs between DMULS.L and MAC.W,
and there is no delay.
[Pipeline diagrams: (1) DMULS.L, MAC.W, next instruction in series; (2) DMULS.L, two other instructions, MAC.W]
Figure 8.52 MAC.W Instruction Follows Immediately after DMULS.L Instruction
(c) DMULS.L instruction follows immediately after DMULS.L instruction
The DMULS.L instruction has an MA stage for accessing the multiplier. If contention with the
MA of DMULS.L occurs during the other DMULS.L instruction's multiplier operation (mm), that
MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot.
If there are two or more instructions that do not use the multiplier located between DMULS.L and
DMULS.L, no contention occurs between DMULS.L and DMULS.L and there is no delay. Note
that the slot splits if there is contention between the MA of DMULS.L and IF.
[Pipeline diagrams: (1) DMULS.L, DMULS.L, other instruction; (2) DMULS.L, other instruction, DMULS.L, other instruction; (3) DMULS.L, two other instructions, DMULS.L, other instruction]
Figure 8.53 DMULS.L Instruction Follows Immediately after DMULS.L Instruction (1)
If the MA of the DMULS.L instruction is delayed until the mm finishes, and that MA contends
with IF, the slot splits normally. Refer to the diagram below. This diagram takes into account the
possibility of contention between MA and IF.
[Pipeline diagram: DMULS.L, DMULS.L, three other instructions]
Figure 8.54 DMULS.L Instruction Follows Immediately after DMULS.L Instruction (2)
(d) MULS.W instruction follows immediately after DMULS.L instruction
The MULS.W instruction has an MA stage for accessing the multiplier. If contention with the MA
of MULS.W occurs during the DMULS.L instruction's multiplier operation (mm), that MA is
delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot. If
there are three or more instructions that do not use the multiplier located between DMULS.L and
MULS.W, no contention occurs between DMULS.L and MULS.W and there is no delay. Note
that the slot splits if there is contention between the MA of MULS.W and IF.
[Pipeline diagrams: (1) DMULS.L, MULS.W, other instruction; (2) DMULS.L, three other instructions, MULS.W, other instruction]
Figure 8.55 MULS.W Instruction Follows Immediately after DMULS.L Instruction (1)
If the MA of the MULS.W instruction is delayed until the mm finishes, and that MA contends
with IF, the slot splits normally. Refer to the diagram below. This diagram takes into account the
possibility of contention between MA and IF.
[Pipeline diagram: DMULS.L, MULS.W, three other instructions]
Figure 8.56 MULS.W Instruction Follows Immediately after DMULS.L Instruction (2)
(e) STS (register) instruction follows immediately after DMULS.L instruction
If the STS instruction is used to store the contents of the MAC register to a general-use register,
the STS instruction will include an MA stage for accessing the multiplier, as described below. If
contention with the MA of STS occurs during the multiplier operation (mm), that MA is delayed
until the mm finishes (M -- A in the diagram below), thereby forming a single slot. Also, the MA
of STS contends with IF. This situation is shown in the diagrams below. These diagrams take into
account the possibility of contention between MA and IF.
[Pipeline diagrams (two): DMULS.L, STS, three other instructions]
Figure 8.57 STS (Register) Instruction Follows Immediately after DMULS.L Instruction
(f) STS.L (memory) instruction follows immediately after DMULS.L instruction
If the STS instruction is used to store the contents of the MAC register in memory, the STS
instruction will include an MA stage for accessing the multiplier and writing to memory, as
described below. Also, the MA of STS contends with IF. This situation is shown in the diagrams
below. These diagrams take into account the possibility of contention between MA and IF.
[Pipeline diagrams (two): DMULS.L, STS.L, three other instructions]
Figure 8.58 STS.L (Memory) Instruction Follows Immediately after DMULS.L Instruction
(g) LDS (register) instruction follows immediately after DMULS.L instruction
If the LDS instruction is used to load the contents of the MAC register from a general-use register,
the LDS instruction will include an MA stage for accessing the multiplier, as described below. If
contention with the MA of LDS occurs during the multiplier operation (mm), that MA is delayed
until the mm finishes (M -- A in the diagram below), thereby forming a single slot. Also, the MA
of LDS contends with IF. This situation is shown in the diagrams below. These diagrams take into
account the possibility of contention between MA and IF.
[Pipeline diagrams (two): DMULS.L, LDS, three other instructions]
Figure 8.59 LDS (Register) Instruction Follows Immediately after DMULS.L Instruction
(h) LDS.L (memory) instruction follows immediately after DMULS.L instruction
If the LDS instruction is used to load the contents of the MAC register from memory, the LDS
instruction will include an MA stage for accessing memory and accessing the multiplier, as
described below. If contention with the MA of LDS occurs during the multiplier operation (mm),
that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single
slot. Also, the MA of LDS contends with IF. This situation is shown in the diagrams below. These
diagrams take into account the possibility of contention between MA and IF.
[Pipeline diagrams (two): DMULS.L, LDS.L, three other instructions]
Figure 8.60 LDS.L (Memory) Instruction Follows Immediately after DMULS.L Instruction
8.8.3 Logic Operation Instructions
Register-Register Logic Operation Instructions: Include the following instruction types:
• AND Rm, Rn
• AND #imm, R0
• NOT Rm, Rn
• OR Rm, Rn
• OR #imm, R0
• TST Rm, Rn
• TST #imm, R0
• XOR Rm, Rn
• XOR #imm, R0
: Slot
Instruction A                IF  ID  EX
Next instruction                 IF  ID  EX  ......
Third instruction in series          IF  ID  EX  ......
......
Figure 8.61 Register-Register Logic Operation Instruction Pipeline
The pipeline has three stages: IF, ID, and EX (figure 8.61). The data operation is completed in the
EX stage via the ALU.
Memory Logic Operation Instructions: Include the following instruction types:
• AND.B #imm, @(R0, GBR)
• OR.B #imm, @(R0, GBR)
• TST.B #imm, @(R0, GBR)
• XOR.B #imm, @(R0, GBR)
[Pipeline diagram: Instruction A (IF ID EX MA EX MA), Next instruction (IF - - ID EX .....), Third instruction in series (IF ID EX .....)]
Figure 8.62 Memory Logic Operation Instruction Pipeline
The pipeline has six stages: IF, ID, EX, MA, EX, and MA (figure 8.62). The ID of the next
instruction stalls for 2 slots. The MAs of these instructions contend with IF.
TAS Instruction: Includes the following instruction type:
• TAS.B @Rn
[Pipeline diagram: Instruction A (IF ID EX MA EX MA), Next instruction (IF - - - ID EX .....), Third instruction in series (IF ID EX .....)]
Figure 8.63 TAS Instruction Pipeline
The pipeline has six stages: IF, ID, EX, MA, EX, and MA (figure 8.63). The ID of the next
instruction stalls for 3 slots. The MA of the TAS instruction contends with IF.
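The effect of these ID stalls on a short instruction sequence can be tallied as follows. This Python sketch is only an illustration: the stall values are the ones stated above for the register-register logic operations (no stall), the memory logic operations (2 slots), and TAS.B (3 slots); slots are numbered from 0 at the IF of the first instruction; and MA/IF contention is ignored. The table and function names are hypothetical.

```python
# ID stall (in slots) that each instruction imposes on the next instruction,
# taken from the descriptions of figures 8.61 to 8.63.
ID_STALL_SLOTS = {
    "AND": 0, "NOT": 0, "OR": 0, "TST": 0, "XOR": 0,
    "AND.B": 2, "OR.B": 2, "TST.B": 2, "XOR.B": 2,
    "TAS.B": 3,
}


def id_stage_slots(mnemonics):
    """Return the slot of the ID stage of each instruction (IF of the first is slot 0)."""
    slots = []
    next_id = 1                      # the first ID follows the first IF
    for mnemonic in mnemonics:
        slots.append(next_id)
        # The next ID comes one slot later, plus any stall caused by this instruction.
        next_id += 1 + ID_STALL_SLOTS[mnemonic]
    return slots
```

For example, id_stage_slots(["AND.B", "TAS.B", "XOR"]) places the ID stages in slots 1, 4, and 8 under these assumptions.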
8.8.4 Shift Instructions
General Shift Instructions: Include the following instruction types:
• ROTL Rn
• ROTR Rn
• ROTCL Rn
• ROTCR Rn
• SHAL Rn
• SHAR Rn
• SHLL Rn
• SHLR Rn
• SHLL2 Rn
• SHLR2 Rn
• SHLL8 Rn
• SHLR8 Rn
• SHLL16 Rn
• SHLR16 Rn
: Slot
Instruction A                IF  ID  EX
Next instruction                 IF  ID  EX  .....
Third instruction in series          IF  ID  EX  .....
.....
Figure 8.64 General Shift Instruction Pipeline
The pipeline has three stages: IF, ID, and EX (figure 8.64). The data operation is completed in the
EX stage via the ALU.
8.8.5 Branch Instructions
Conditional Branch Instructions: Include the following instruction types:
• BF label
• BT label
The pipeline has three stages: IF, ID, and EX. Condition verification is performed in the ID stage.
Conditional branch instructions are not delayed branches.
1. When condition is satisfied
The branch destination address is calculated in the EX stage. The two instructions after the
conditional branch instruction (instruction A) are fetched but discarded. The branch destination
instruction begins its fetch from the slot following the slot which has the EX stage of
instruction A (figure 8.65).
[Pipeline diagram: Instruction A (IF ID EX), Next instruction (IF -, fetched but discarded), Third instruction in series (IF -, fetched but discarded), Branch destination (IF ID EX .....)]
Figure 8.65 Branch Instruction when Condition Is Satisfied
2. When condition is not satisfied
If it is determined that conditions are not satisfied at the ID stage, the EX stage proceeds
without doing anything. The next instruction also executes a fetch (figure 8.66).
: Slot
Instruction A                IF  ID  EX
Next instruction                 IF  ID  EX  .....
Third instruction in series          IF  ID  EX  .....
.....                                    IF  ID  EX  .....
Figure 8.66 Branch Instruction when Condition Is Not Satisfied
Note: The SH-2E always performs fetches in longword units. Consequently, in case 1 above
(when the condition is satisfied), the fetch performed for the following instruction covers two
instructions if its address is on a 4n boundary.
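The longword fetch behaviour in the note can be illustrated with a few lines of Python. This is a sketch under the assumption that instructions are 16 bits long and that a fetch returns one aligned longword; the constant and function names are hypothetical and are not taken from this manual.

```python
INSTRUCTION_BYTES = 2   # assumed fixed 16-bit instruction length
FETCH_BYTES = 4         # one longword per fetch, as stated in the note


def instructions_in_fetch(address):
    """Addresses of the instructions, from `address` on, inside the same longword."""
    if address % INSTRUCTION_BYTES:
        raise ValueError("instruction addresses must be even")
    # End of the aligned longword that contains `address`.
    end = (address // FETCH_BYTES + 1) * FETCH_BYTES
    return list(range(address, end, INSTRUCTION_BYTES))
```

An address on a 4n boundary yields two instruction addresses (4n and 4n + 2), while an address of the form 4n + 2 yields only one.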
Delayed Conditional Branch Instructions: Include the following instruction types:
• BF/S label
• BT/S label
The pipeline has three stages: IF, ID, and EX. Condition verification is performed in the ID stage.
1. When condition is satisfied
The branch destination address is calculated in the EX stage. The instruction after the
conditional branch instruction (instruction A) is fetched and executed, but the instruction after
that is fetched and discarded. The branch destination instruction begins its fetch from the slot
following the slot which has the EX stage of instruction A (figure 8.67).
[Pipeline diagram: Instruction A, Next instruction, Third instruction in series (fetched but discarded), Branch destination]
Figure 8.67 Branch Instruction when Condition Is Satisfied
2. When condition is not satisfied
If it is determined that a condition is not satisfied at the ID stage, the EX stage proceeds
without doing anything. The next instruction also executes a fetch (figure 8.68).
: Slot
Instruction A                IF  ID  EX
Next instruction                 IF  ID  EX  .....
Third instruction in series          IF  ID  EX  .....
.....                                    IF  ID  EX  .....
Figure 8.68 Branch Instruction when Condition Is Not Satisfied
Note: The SH-2E always performs fetches in longword units. Consequently, in case 1 above
(when the condition is satisfied), the fetch performed for the following instruction covers two
instructions if its address is on a 4n boundary.
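The fetch behaviour of a taken conditional branch, with and without a delay slot, can be summarized numerically. The Python sketch below is illustrative only: it numbers slots from 0 at the IF of the branch (instruction A), takes the rule that the branch destination fetch begins in the slot after the EX stage of instruction A from the descriptions above, and ignores the longword fetch effect in the note. The function name taken_branch_fetch is hypothetical.

```python
def taken_branch_fetch(delayed):
    """Return (destination IF slot, executed overrun fetches, discarded overrun fetches).

    delayed: False for BF/BT, True for BF/S and BT/S.
    Slots count from 0 at the IF of the branch: IF=0, ID=1, EX=2.
    """
    ex_slot = 2
    destination_if_slot = ex_slot + 1          # slot after the EX of instruction A
    overrun_fetches = destination_if_slot - 1  # fetches started in slots 1 and 2
    executed = 1 if delayed else 0             # only a delay slot instruction is executed
    return destination_if_slot, executed, overrun_fetches - executed
```

For BF and BT this gives two discarded fetches; for BF/S and BT/S it gives one executed instruction and one discarded fetch, with the destination fetch starting in slot 3 in both cases.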
Unconditional Branch Instructions: Include the following instruction types:
• BRA label
• BRAF Rm
• BSR label
• BSRF Rm
• JMP @Rm
• JSR @Rm
• RTS
[Pipeline diagram: Instruction A (IF ID EX), Delay slot (IF - ID EX MA WB), Branch destination (IF ID EX .....)]
Figure 8.69 Unconditional Branch Instruction Pipeline
The pipeline has three stages: IF, ID, and EX (figure 8.69). Unconditional branch instructions
are delayed branches. The branch destination address is calculated in the EX stage. The instruction
following the unconditional branch instruction (instruction A), that is, the delay slot instruction, is
not fetched and discarded as with conditional branch instructions, but is instead executed. Note that
the ID stage of the delay slot instruction stalls for one slot. The branch destination instruction
starts its fetch from the slot after the slot that has the EX stage of instruction A.
8.8.6 System Control Instructions
System Control ALU Instructions: Include the following instruction types:
• CLRT
• LDC Rm,SR
• LDC Rm,GBR
• LDC Rm,VBR
• LDS Rm,PR
• NOP
• SETT
• STC SR,Rn
• STC GBR,Rn
• STC VBR,Rn
• STS PR,Rn
: Slot
Instruction A                IF  ID  EX
Next instruction                 IF  ID  EX  .....
Third instruction in series          IF  ID  EX  .....
.....
Figure 8.70 System Control ALU Instruction Pipeline
The pipeline has three stages: IF, ID, and EX (figure 8.70). The data operation is completed in the
EX stage via the ALU.
LDC.L Instructions: Include the following instruction types:
• LDC.L @Rm+, SR
• LDC.L @Rm+, GBR
• LDC.L @Rm+, VBR
[Pipeline diagram: Instruction A, Next instruction (IF - - ID EX .....), Third instruction in series (IF ID EX .....)]
Figure 8.71 LDC.L Instruction Pipeline
The pipeline has five stages: IF, ID, EX, MA, and EX (figure 8.71). The ID of the following
instruction is stalled two slots.
STC.L Instructions: Include the following instruction types:
• STC.L SR, @–Rn
• STC.L GBR, @–Rn
• STC.L VBR, @–Rn
[Pipeline diagram: Instruction A (IF ID EX MA), Next instruction (IF - ID EX .....), Third instruction in series (IF ID EX .....)]
Figure 8.72 STC.L Instruction Pipeline
The pipeline has four stages: IF, ID, EX, and MA (figure 8.72). The ID of the next instruction is
stalled one slot.
LDS.L Instruction (PR): Includes the following instruction type:
• LDS.L @Rm+, PR
: Slot
Instruction A                IF  ID  EX  MA  WB
Next instruction                 IF  ID  EX  .....
Third instruction in series          IF  ID  EX  .....
.....
Figure 8.73 LDS.L Instructions (PR) Pipeline
The pipeline has five stages: IF, ID, EX, MA, and WB (figure 8.73). It is the same as an ordinary
load instruction.
STS.L Instruction (PR): Includes the following instruction type:
• STS.L PR, @–Rn
: Slot
Instruction A                IF  ID  EX  MA
Next instruction                 IF  ID  EX  .....
Third instruction in series          IF  ID  EX  .....
.....
Figure 8.74 STS.L Instruction (PR) Pipeline
The pipeline has four stages: IF, ID, EX, and MA (figure 8.74). It is the same as an ordinary store
instruction.
Register → MAC Transfer Instructions: Include the following instruction types:
• CLRMAC
• LDS Rm, MACH
• LDS Rm, MACL
: Slot
Instruction A                IF  ID  EX  MA
Next instruction                 IF  ID  EX  .....
Third instruction in series          IF  ID  EX  .....
.....
Figure 8.75 Register → MAC Transfer Instruction Pipeline
The pipeline has four stages: IF, ID, EX, and MA (figure 8.75). MA is a stage for accessing the
multiplier. MA contends with IF. This makes it the same as ordinary store instructions. Since the
multiplier does contend with the MA, however, the items noted for the multiplication,
Multiply/Accumulate, double-length multiplication, and double-length multiply/accumulate
instructions apply.
Memory → MAC Transfer Instructions: Include the following instruction types:
• LDS.L @Rm+, MACH
• LDS.L @Rm+, MACL
: Slot
Instruction A                IF  ID  EX  MA
Next instruction                 IF  ID  EX  .....
Third instruction in series          IF  ID  EX  .....
.....
Figure 8.76 Memory → MAC Transfer Instruction Pipeline
The pipeline has four stages: IF, ID, EX, and MA (figure 8.76). MA contends with IF. MA is a
stage for memory access and multiplier access. This makes it the same as ordinary load
instructions. Since the multiplier does contend with the MA, however, the items noted for the
multiplication, Multiply/Accumulate, double-length multiplication, and double-length
multiply/accumulate instructions apply.
MAC → Register Transfer Instructions: Include the following instruction types:
• STS MACH, Rn
• STS MACL, Rn
: Slot
Instruction A                IF  ID  EX  MA  WB
Next instruction                 IF  ID  EX  .....
Third instruction in series          IF  ID  EX  .....
.....
Figure 8.77 MAC → Register Transfer Instruction Pipeline
The pipeline has five stages: IF, ID, EX, MA, and WB (figure 8.77). MA is a stage for accessing
the multiplier. MA contends with IF. This makes it the same as ordinary load instructions. Since
the multiplier does contend with the MA, however, the items noted for the multiplication,
Multiply/Accumulate, double-length multiplication, and double-length multiply/accumulate
instructions apply.
MAC → Memory Transfer Instructions: Include the following instruction types:
• STS.L MACH, @–Rn
• STS.L MACL, @–Rn
: Slot
Instruction A                IF  ID  EX  MA
Next instruction                 IF  ID  EX  .....
Third instruction in series          IF  ID  EX  .....
.....
Figure 8.78 MAC → Memory Transfer Instruction Pipeline
The pipeline has four stages: IF, ID, EX, and MA (figure 8.78). MA is a stage for accessing the
memory and multiplier. MA contends with IF. This makes it the same as ordinary store
instructions. Since the multiplier does contend with the MA, however, the items noted for the
multiplication, Multiply/Accumulate, double-length multiplication, and double-length
multiply/accumulate instructions apply.
RTE Instruction: RTE
[Pipeline diagram: RTE (IF ID EX MA MA), Delay slot (IF - - - ID EX .....), Branch destination (IF ID EX .....)]
Figure 8.79 RTE Instruction Pipeline
The pipeline has five stages: IF, ID, EX, MA, and MA (figure 8.79). The MAs do not contend
with IF. RTE is a delayed branch instruction. The ID of the delay slot instruction is stalled 3 slots.
The IF of the branch destination instruction starts from the slot following the MA of the RTE.
TRAP Instruction: TRAPA #imm
[Pipeline diagram: Instruction A (IF ID EX EX MA MA MA EX EX), Next instruction (IF), Third instruction in series (IF), Branch destination (IF ID EX .....)]
Figure 8.80 TRAP Instruction Pipeline
The pipeline has nine stages: IF, ID, EX, EX, MA, MA, MA, EX, and EX (figure 8.80). The MAs
do not contend with IF. TRAP is not a delayed branch instruction. The two instructions after the
TRAP instruction are fetched, but they are discarded without being executed. The IF of the branch
destination instruction starts from the slot of the EX in the ninth stage of the TRAP instruction.
SLEEP Instruction: SLEEP
[Pipeline diagram: SLEEP (IF ID EX), Next instruction (IF .....)]
Figure 8.81 SLEEP Instruction Pipeline
The pipeline has three stages: IF, ID and EX (figure 8.81). It is issued until the IF of the next
instruction. After the SLEEP instruction is executed, the CPU enters sleep mode or standby mode.
8.8.7 Exception Processing
Interrupt Exception Processing: The interrupt is received during the ID stage of the instruction
and everything after the ID stage is replaced by the interrupt exception processing sequence. The
pipeline has ten stages: IF, ID, EX, EX, MA, MA, EX, MA, EX, and EX (figure 8.82). Interrupt
exception processing is not a delayed branch. In interrupt exception processing, an overrun fetch
(IF) occurs. In branch destination instructions, the IF starts from the slot that has the final EX in
the interrupt exception processing.
Interrupt sources include external interrupt request pins such as NMI and IRQ, user breaks, and on-chip
peripheral module interrupts.
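The rule that the branch destination fetch starts in the slot holding the final EX stage can be expressed with the stage lists given for the TRAP instruction and for interrupt exception processing. The Python sketch below is an illustration that assumes every stage occupies exactly one slot, counted from 0 at the IF of the sequence; the list and function names are hypothetical.

```python
# Stage sequences as listed in the text for TRAPA and interrupt exception processing.
TRAPA_STAGES = ["IF", "ID", "EX", "EX", "MA", "MA", "MA", "EX", "EX"]
INTERRUPT_STAGES = ["IF", "ID", "EX", "EX", "MA", "MA", "EX", "MA", "EX", "EX"]


def destination_if_slot(stages):
    """Slot (0 = IF of the sequence) holding the final EX, where the destination IF starts."""
    if stages[-1] != "EX":
        raise ValueError("sequence must end with an EX stage")
    return len(stages) - 1
```

Under this one-slot-per-stage assumption, the destination fetch starts in slot 8 for TRAPA and in slot 9 for interrupt exception processing.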
[Pipeline diagram: Interrupt (IF ID EX EX MA MA EX MA EX EX), Next instruction (IF), Branch destination (IF ID EX ......)]
Figure 8.82 Interrupt Exception Processing Pipeline
Address Error Exception Processing: The address error is received during the ID stage of the
instruction and everything after the ID stage is replaced by the address error exception processing
sequence. The pipeline has ten stages: IF, ID, EX, EX, MA, MA, EX, MA, EX, and EX (figure
8.83). Address error exception processing is not a delayed branch. In address error exception
processing, an overrun fetch (IF) occurs. In branch destination instructions, the IF starts from the
slot that has the final EX in the address error exception processing.
Address errors are caused by instruction fetches and by data reads or writes. See the Hardware
Manual for information on the causes of address errors.
[Pipeline diagram: Interrupt (IF ID EX EX MA MA EX MA EX EX), Next instruction (IF), Branch destination (IF ID EX ......)]
Figure 8.83 Address Error Exception Processing Pipeline
Illegal Instruction Exception Processing: The illegal instruction is received during the ID stage
of the instruction and everything after the ID stage is replaced by the illegal instruction exception
processing sequence. The pipeline has nine stages: IF, ID, EX, EX, MA, MA, MA, EX, and EX
(figure 8.84). Illegal instruction exception processing is not a delayed branch. In illegal instruction
exception processing, overrun fetches (IF) occur. Whether there is an IF only in the next
instruction or in the one after that as well depends on the instruction that was to be executed. In
branch destination instructions, the IF starts from the slot that has the final EX in the illegal
instruction exception processing.
Illegal instruction exception processing is caused by ordinary illegal instructions and by
instructions with illegal slots. When undefined code placed somewhere other than the slot directly
after the delayed branch instruction (called the delay slot) is decoded, ordinary illegal instruction
exception processing occurs. When undefined code placed in the delay slot is decoded, or when an
instruction that rewrites the program counter is placed in the delay slot and decoded, illegal slot
instruction exception processing occurs.
[Pipeline diagram by slot. Illegal instruction exception processing: IF ID EX EX MA MA MA EX EX. Next instruction: IF. Third instruction: (IF). Branch destination: IF ID EX ...]
Figure 8.84 Illegal Instruction Exception Processing Pipeline
8.8.8 Relationship between Floating-point Instructions and FPU-related CPU Instructions
FPUL Load Instructions: Include the following instruction types:
• LDS Rm,FPUL
• LDS.L @Rm+,FPUL
[Pipeline diagram by slot. Instruction: CPU pipeline IF ID EX MA; FPU pipeline IF DF E1 E2 SF. Next instruction and third instruction in series: CPU pipeline IF ID EX ...; FPU pipeline (CPU instruction only) IF DF E1 ...]
Figure 8.85 FPUL Load Instruction Pipeline
The CPU pipeline has four stages, IF, ID, EX, and MA (figure 8.85) ; and the FPU pipeline has
five stages, IF, DF, E1, E2, and SF. The CPU MA stage contends with IF. Contention will also
result if an instruction that reads FPUL follows immediately after this instruction.
FPSCR Load Instructions: Include the following instruction types:
• LDS Rm,FPSCR
• LDS.L @Rm+,FPSCR
[Pipeline diagram by slot. Instruction: CPU pipeline IF ID EX MA; FPU pipeline IF DF E1 E2 SF. Next instruction: CPU pipeline IF ID (2-slot stall) EX ...; FPU pipeline IF DF (2-slot stall) E1 ... Third instruction in series: CPU pipeline IF (2-slot stall) ID EX ...; FPU pipeline (CPU instruction only) IF (2-slot stall) DF E1 ...]
Figure 8.86 FPSCR Load Instruction Pipeline
The CPU pipeline has four stages, IF, ID, EX, and MA (figure 8.86) ; and the FPU pipeline has
five stages, IF, DF, E1, E2, and SF. Contention occurs as shown in Figure 8.11, and execution of
the next instruction is delayed by two slots.
FPUL Store Instruction (STS): Includes the following instruction type:
• STS FPUL,Rn
[Pipeline diagram by slot. Instruction: CPU pipeline IF ID EX MA WB; FPU pipeline IF DF E1 E2. Next instruction and third instruction in series: CPU pipeline IF ID EX ...; FPU pipeline (CPU instruction only) IF DF E1 ...]
Figure 8.87 FPUL Store Instruction (STS) Pipeline
The CPU pipeline has five stages, IF, ID, EX, MA, and WB (figure 8.87); and the FPU pipeline
has four stages, IF, DF, E1, and E2. The CPU MA stage contends with IF. Contention will also
result if an instruction that uses the destination of this instruction follows immediately after it.
FPUL Store Instruction (STS.L): Includes the following instruction type:
• STS.L FPUL,@-Rn
[Pipeline diagram by slot. Instruction: CPU pipeline IF ID EX MA; FPU pipeline IF DF E1 E2. Next instruction and third instruction in series: CPU pipeline IF ID EX ...; FPU pipeline (CPU instruction only) IF DF E1 ...]
Figure 8.88 FPUL Store Instruction (STS.L) Pipeline
The CPU pipeline has four stages, IF, ID, EX, and MA (figure 8.88) ; and the FPU pipeline has
four stages, IF, DF, E1, and E2. The CPU MA stage contends with IF.
FPSCR Store Instruction (STS): Includes the following instruction type:
• STS FPSCR,Rn
[Pipeline diagram by slot. Instruction: CPU pipeline IF ID (2-slot stall) EX MA WB; FPU pipeline IF DF (2-slot stall) E1 E2. Next instruction: CPU pipeline IF (2-slot stall) ID EX ...; FPU pipeline (CPU instruction only) IF (2-slot stall) DF E1 ... Third instruction in series: CPU pipeline IF ID EX ...; FPU pipeline (CPU instruction only) IF DF E1 ...]
Figure 8.89 FPSCR Store Instruction (STS) Pipeline
The CPU pipeline has five stages, IF, ID, EX, MA, and WB (figure 8.89); and the FPU pipeline
has four stages, IF, DF, E1, and E2. Contention occurs as shown in Figure 8.12, and execution of
the next instruction is delayed by two slots. The CPU MA stage contends with IF. Contention will
also result if an instruction that uses the destination of this instruction follows immediately after it.
FPSCR Store Instruction (STS.L): Includes the following instruction type:
• STS.L FPSCR,@-Rn
[Pipeline diagram by slot. Instruction: CPU pipeline IF ID (2-slot stall) EX MA; FPU pipeline IF DF (2-slot stall) E1 E2. Next instruction: CPU pipeline IF (2-slot stall) ID EX ...; FPU pipeline (CPU instruction only) IF (2-slot stall) DF E1 ... Third instruction in series: CPU pipeline IF ID EX ...; FPU pipeline (CPU instruction only) IF DF E1 ...]
Figure 8.90 FPSCR Store Instruction (STS.L) Pipeline
The CPU pipeline has four stages, IF, ID, EX, and MA (figure 8.90) ; and the FPU pipeline has
four stages, IF, DF, E1, and E2. Contention occurs as shown in Figure 8.12, and execution of the
next instruction is delayed by two slots. The CPU MA stage contends with IF.
Floating-point Register Transfer Instructions: Include the following instruction types:
• FLDS FRm,FPUL
• FMOV FRm,FRn
• FSTS FPUL,FRn
[Pipeline diagram by slot. Instruction: CPU pipeline IF ID EX; FPU pipeline IF DF E1 E2 SF. Next instruction and third instruction in series: CPU pipeline IF ID EX ...; FPU pipeline (CPU instruction only) IF DF E1 ...]
Figure 8.91 Floating-point Register Transfer Instruction Pipeline
The CPU pipeline has three stages, IF, ID, and EX (figure 8.91) ; and the FPU pipeline has five
stages, IF, DF, E1, E2, and SF. Contention occurs if an instruction that reads from the destination
of this instruction follows immediately after it.
Floating-point Register Immediate Instructions: Include the following instruction types:
• FLDI0 FRn
• FLDI1 FRn
[Pipeline diagram by slot. Instruction: CPU pipeline IF ID EX; FPU pipeline IF DF E1 E2 SF. Next instruction and third instruction in series: CPU pipeline IF ID EX ...; FPU pipeline (CPU instruction only) IF DF E1 ...]
Figure 8.92 Floating-point Register Immediate Instruction Pipeline
The CPU pipeline has three stages, IF, ID, and EX (figure 8.92) ; and the FPU pipeline has five
stages, IF, DF, E1, E2, and SF. Contention occurs if an instruction that reads from the destination
of this instruction follows immediately after it.
Floating-point Register Load Instructions: Include the following instruction types:
• FMOV.S @Rm,FRn
• FMOV.S @Rm+,FRn
• FMOV.S @(R0,Rm),FRn
[Pipeline diagram by slot. Instruction: CPU pipeline IF ID EX MA; FPU pipeline IF DF E1 E2 SF. Next instruction and third instruction in series: CPU pipeline IF ID EX ...; FPU pipeline (CPU instruction only) IF DF E1 ...]
Figure 8.93 Floating-point Register Load Instruction Pipeline
The CPU pipeline has four stages, IF, ID, EX and MA (figure 8.93) ; and the FPU pipeline has
five stages, IF, DF, E1, E2, and SF. The CPU MA stage contends with IF. Contention will also
result if an instruction that reads from the destination of this instruction follows immediately after
it.
Floating-point Register Store Instructions: Include the following instruction types:
• FMOV.S FRm,@Rn
• FMOV.S FRm,@-Rn
• FMOV.S FRm,@(R0,Rn)
[Pipeline diagram by slot. Instruction: CPU pipeline IF ID EX MA; FPU pipeline IF DF E1 E2. Next instruction and third instruction in series: CPU pipeline IF ID EX ...; FPU pipeline (CPU instruction only) IF DF E1 ...]
Figure 8.94 Floating-point Register Store Instruction Pipeline
The CPU pipeline has four stages, IF, ID, EX and MA (figure 8.94) ; and the FPU pipeline has
four stages, IF, DF, E1, and E2. The CPU MA stage contends with IF.
Floating-point Operation Instructions (Excluding FDIV): Include the following instruction types:
• FABS FRn
• FADD FRm,FRn
• FLOAT FPUL,FRn
• FMAC FR0,FRm,FRn
• FMUL FRm,FRn
• FNEG FRn
• FSUB FRm,FRn
• FTRC FRm,FPUL
[Pipeline diagram by slot. Instruction: CPU pipeline IF ID EX; FPU pipeline IF DF E1 E2 SF. Next instruction and third instruction in series: CPU pipeline IF ID EX ...; FPU pipeline (CPU instruction only) IF DF E1 ...]
Figure 8.95 Floating-point Operation Instructions (Excluding FDIV) Pipeline
The CPU pipeline has three stages, IF, ID, and EX (figure 8.95) ; and the FPU pipeline has five
stages, IF, DF, E1, E2, and SF. Contention occurs if an instruction that reads from the destination
of this instruction follows immediately after it.
Floating-point Operation Instruction (FDIV): Includes the following instruction type:
• FDIV FRm,FRn
[Pipeline diagram by slot, in two cases.
Case 1: If next instruction is a floating-point instruction or an FPU-related CPU instruction. Instruction: CPU pipeline IF ID EX; FPU pipeline IF DF E1 E1 ... E1 E2 SF. Next instruction: CPU pipeline IF ID (stall) EX ...; FPU pipeline IF DF (stall) E1 ... Third instruction in series: CPU pipeline IF (stall) ID EX ...; FPU pipeline (CPU instruction only) IF (stall) DF E1 ...
Case 2: If next instruction is a CPU instruction and the following instruction is a floating-point instruction or an FPU-related CPU instruction. Instruction: CPU pipeline IF ID EX; FPU pipeline IF DF E1 E1 ... E1 E2 SF. Next instruction: CPU pipeline IF ID EX ... Third instruction in series: CPU pipeline IF ID (stall) EX ...; FPU pipeline IF DF (stall) E1 ...]
Figure 8.96 Floating-point Operation Instruction (FDIV) Pipeline
The CPU pipeline has three stages, IF, ID, and EX (figure 8.96); and the FPU pipeline has 17
stages: IF, DF, 13 consecutive E1 stages, E2, and SF.
Contention occurs as shown in Figure 8.13. If the FDIV pipeline overlaps with the pipeline of a
floating-point instruction or an FPU-related CPU instruction, all stages from E1 onward are stalled
until execution of FDIV completes, and the following instructions are also stalled. Consequently,
performance can be improved by not placing any floating-point instructions or FPU-related CPU
instructions within the 14 instructions immediately following the FDIV instruction, since CPU
instructions can execute normally.
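As an informal check of the stage count given above, the FDIV FPU pipeline can be written out as a list. This is only a sketch: the function and constant names are invented for illustration, and the stage names and the count of 13 E1 stages are taken from the text.

```python
# Illustrative only: names below are not from the manual.
FDIV_E1_REPEATS = 13  # the text states that 13 E1 stages are repeated in succession


def fdiv_fpu_stages(e1_repeats=FDIV_E1_REPEATS):
    """Return the FPU pipeline stage sequence for FDIV as a list of stage names."""
    return ["IF", "DF"] + ["E1"] * e1_repeats + ["E2", "SF"]


stages = fdiv_fpu_stages()
print(len(stages), stages.count("E1"))
```

The list has 2 + 13 + 2 entries, which matches the 17 stages stated for the FPU pipeline.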
Floating-point Compare Instructions: Include the following instruction types:
• FCMP/EQ FRm,FRn
• FCMP/GT FRm,FRn
[Pipeline diagram by slot. Instruction: CPU pipeline IF ID EX; FPU pipeline IF DF E1. Next instruction and third instruction in series: CPU pipeline IF ID EX ...; FPU pipeline (CPU instruction only) IF DF E1 ...]
Figure 8.97 Floating-point Compare Instruction Pipeline
The CPU pipeline has three stages, IF, ID, and EX (figure 8.97) ; and the FPU pipeline has three
stages, IF, DF, and E1.
Appendix A Instruction Code
A.1 Instruction Set by Addressing Mode
Table A.1 Instruction Set by Addressing Mode
Addressing Mode | Category | Sample Instruction | Types
No operand | — | NOP | 8
Direct register addressing | Destination operand only | MOVT Rn | 22
Direct register addressing | Source and destination operand | ADD Rm,Rn | 42
Direct register addressing | Load and store with control register or system register | LDC Rm,SR; STS MACH,Rn | 18
Indirect register addressing | Source operand only | JMP @Rm | 2
Indirect register addressing | Destination operand only | TAS.B @Rn | 1
Indirect register addressing | Data transfer direct from register | MOV.L Rm,@Rn | 8
Post-increment indirect register addressing | Multiply/accumulate operation | MAC.W @Rm+,@Rn+ | 2
Post-increment indirect register addressing | Data transfer direct from register | MOV.L @Rm+,Rn | 4
Post-increment indirect register addressing | Load to control register or system register | LDC.L @Rm+,SR | 8
Pre-decrement indirect register addressing | Data transfer direct from register | MOV.L Rm,@–Rn | 4
Pre-decrement indirect register addressing | Store from control register or system register | STC.L SR,@–Rn | 8
Indirect register addressing with displacement | Data transfer direct to register | MOV.L Rm,@(disp,Rn) | 6
Indirect indexed register addressing | Data transfer direct to register | MOV.L Rm,@(R0,Rn) | 8
Indirect GBR addressing with displacement | Data transfer direct to register | MOV.L R0,@(disp,GBR) | 6
Indirect indexed GBR addressing | Immediate data transfer | AND.B #imm,@(R0,GBR) | 4
PC relative addressing with displacement | Data transfer direct to register | MOV.L @(disp,PC),Rn | 3
PC relative addressing with Rn | Branch instruction | BRAF Rn | 2
PC relative addressing | Branch instruction | BRA label | 6
Immediate addressing | Load to register | FLDI0 FRn | 2
Immediate addressing | Arithmetic logical operations direct with register | ADD #imm,Rn | 7
Immediate addressing | Specify exception processing vector | TRAPA #imm | 1
Total: 172
Note: Figures not in parentheses ( ) indicate the number of instructions for the SH-3E and figures
in parentheses ( ) indicate the number of instructions for the SH-3.
A.1.1 No Operand
Table A.2 No Operand
Instruction
Operation
Code
Cycles
T Bit
CLRT
0→T
0000000000001000
1
0
CLRMAC
0 → MACH, MACL
0000000000101000
1
—
DIV0U
0 → M/Q/T
0000000000011001
1
0
NOP
No operation
0000000000001001
1
—
RTE
Delayed branching,
Stack area → PC/SR
0000000000101011
4
—
RTS
Delayed branching, PR → PC
0000000000001011
2
—
SETT
1→T
0000000000011000
1
1
SLEEP
Sleep
0000000000011011
3
—
A.1.2 Direct Register Addressing
Table A.3 Destination Operand Only
Instruction
Operation
Code
Cycles
T Bit
CMP/PL Rn
Rn > 0, 1 → T
0100nnnn00010101
1
Comparison
result
CMP/PZ Rn
Rn ≥ 0, 1 → T
0100nnnn00010001
1
Comparison
result
DT Rn
Rn – 1 → Rn, when Rn is 0, 1 → T.
When Rn is nonzero, 0 → T
0100nnnn00010000
1
Comparison
result
FABS FRn
abs(FRn) → FRn
1111nnnn01011101
1
—
FLOAT FPUL,FRn
(float)FPUL → FRn
1111nnnn00101101
1
—
FNEG FRn
–1.0 × FRn → FRn
1111nnnn01001101
1
—
FTRC FRm,FPUL
(int)FRm → FPUL
1111mmmm00111101
1
—
MOVT
Rn
T → Rn
0000nnnn00101001
1
—
ROTL
Rn
T ← Rn ← MSB
0100nnnn00000100
1
MSB
ROTR
Rn
LSB → Rn → T
0100nnnn00000101
1
LSB
ROTCL
Rn
T ← Rn ← T
0100nnnn00100100
1
MSB
ROTCR
Rn
T → Rn → T
0100nnnn00100101
1
LSB
SHAL
Rn
T ← Rn ← 0
0100nnnn00100000
1
MSB
SHAR
Rn
MSB → Rn → T
0100nnnn00100001
1
LSB
SHLL
Rn
T ← Rn ← 0
0100nnnn00000000
1
MSB
SHLR
Rn
0 → Rn → T
0100nnnn00000001
1
LSB
SHLL2
Rn
Rn << 2 → Rn
0100nnnn00001000
1
—
SHLR2
Rn
Rn >> 2 → Rn
0100nnnn00001001
1
—
SHLL8
Rn
Rn << 8 → Rn
0100nnnn00011000
1
—
SHLR8
Rn
Rn >> 8 → Rn
0100nnnn00011001
1
—
SHLL16 Rn
Rn << 16 → Rn
0100nnnn00101000
1
—
SHLR16 Rn
Rn >> 16 → Rn
0100nnnn00101001
1
—
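The rotate entries in table A.3 describe how one bit moves between Rn and the T bit. The following Python sketch models ROTL, ROTCL, and ROTCR on 32-bit values as an informal illustration. The function names and the (new Rn, new T) return convention are assumptions made for this example, and inputs are assumed to be in the range 0 to 0xFFFFFFFF.

```python
MASK32 = 0xFFFFFFFF  # registers are modelled as unsigned 32-bit integers


def rotl(rn):
    """ROTL Rn (T <- Rn <- MSB): rotate left; the old MSB goes to bit 0 and to T."""
    msb = (rn >> 31) & 1
    return ((rn << 1) & MASK32) | msb, msb


def rotcl(rn, t):
    """ROTCL Rn (T <- Rn <- T): rotate left through T; the old T enters bit 0."""
    msb = (rn >> 31) & 1
    return ((rn << 1) & MASK32) | (t & 1), msb


def rotcr(rn, t):
    """ROTCR Rn (T -> Rn -> T): rotate right through T; the old T enters bit 31."""
    lsb = rn & 1
    return (rn >> 1) | ((t & 1) << 31), lsb
```

Each function returns the new register value first and the new T bit second, which mirrors the MSB and LSB entries in the T Bit column.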
Table A.4
Source and Destination Operand
Instruction
Operation
Code
Cycles
T Bit
ADD
Rm,Rn
Rn + Rm → Rn
0011nnnnmmmm1100
1
—
ADDC
Rm,Rn
Rn + Rm + T → Rn,
carry → T
0011nnnnmmmm1110
1
Carry
ADDV
Rm,Rn
Rn + Rm → Rn,
overflow → T
0011nnnnmmmm1111
1
Overflow
AND
Rm,Rn
Rn & Rm → Rn
0010nnnnmmmm1001
1
—
CMP/EQ
Rm,Rn
When Rn = Rm, 1 → T
0011nnnnmmmm0000
1
Comparison
result
CMP/HS
Rm,Rn
When unsigned and Rn ≥
Rm, 1 → T
0011nnnnmmmm0010
1
Comparison
result
CMP/GE
Rm,Rn
When signed and Rn ≥
Rm, 1 → T
0011nnnnmmmm0011
1
Comparison
result
CMP/HI
Rm,Rn
When unsigned and Rn >
Rm, 1 → T
0011nnnnmmmm0110
1
Comparison
result
CMP/GT
Rm,Rn
When signed and Rn >
Rm, 1 → T
0011nnnnmmmm0111
1
Comparison
result
CMP/STR Rm,Rn
When a byte in Rn equals
a byte in Rm, 1 → T
0010nnnnmmmm1100
1
Comparison
result
DIV1
Rm,Rn
1 step division (Rn ÷ Rm)
0011nnnnmmmm0100
1
Calculation
result
DIV0S
Rm,Rn
MSB of Rn → Q, MSB of
Rm → M, M ^ Q → T
0010nnnnmmmm0111
1
Calculation
result
DMULS.L Rm,Rn
Signed operation of Rn ×
Rm → MACH, MACL
0011nnnnmmmm1101
2 to 4*
—
DMULU.L Rm,Rn
Unsigned operation of Rn
× Rm → MACH, MACL
0011nnnnmmmm0101
2 to 4*
—
EXTS.B
Rm,Rn
Sign-extend Rm from
byte → Rn
0110nnnnmmmm1110
1
—
EXTS.W
Rm,Rn
Sign-extend Rm from
word → Rn
0110nnnnmmmm1111
1
—
EXTU.B
Rm,Rn
Zero-extend Rm from
byte → Rn
0110nnnnmmmm1100
1
—
EXTU.W
Rm,Rn
Zero-extend Rm from
word → Rn
0110nnnnmmmm1101
1
—
FADD
FRm,
FRn
FRm + FRn → FRn
1111nnnnmmmm0000
1
—
FCMP/EQ FRm,
FRn
(FRn == FRm)?
1:0 → T
1111nnnnmmmm0100
1
Comparison
result
FCMP/GT FRm,
FRn
(FRn > FRm)?
1:0 → T
1111nnnnmmmm0101
1
Comparison
result
FDIV
FRm,
FRn
FRn/FRm → FRn
1111nnnnmmmm0011
13
—
FMAC
FR0,FRm,FRn
(FR0 × FRm) + FRn → FRn
1111nnnnmmmm1110
1
—
FMOV
FRm,
FRn
FRm → FRn
1111nnnnmmmm1100
1
—
FMUL
FRm,
FRn
FRn × FRm → FRn
1111nnnnmmmm0010
1
—
FSUB
FRm,
FRn
FRn – FRm → FRn
1111nnnnmmmm0001
1
—
MOV
Rm,Rn
Rm → Rn
0110nnnnmmmm0011
1
—
MUL.L
Rm,Rn
Rn × Rm → MAC
0000nnnnmmmm0111
2 to 4*
—
MULS.W
Rm,Rn
With sign, Rn × Rm → MAC
0010nnnnmmmm1111
1 to 3*
—
MULU.W
Rm,Rn
Unsigned, Rn × Rm →
MAC
0010nnnnmmmm1110
1 to 3*
—
NEG
Rm,Rn
0 – Rm → Rn
0110nnnnmmmm1011
1
—
NEGC
Rm,Rn
0 – Rm – T → Rn,
Borrow → T
0110nnnnmmmm1010
1
Borrow
NOT
Rm,Rn
~Rm → Rn
0110nnnnmmmm0111
1
—
OR
Rm,Rn
Rn | Rm → Rn
0010nnnnmmmm1011
1
—
SUB
Rm,Rn
Rn – Rm → Rn
0011nnnnmmmm1000
1
—
SUBC
Rm,Rn
Rn – Rm – T → Rn,
Borrow → T
0011nnnnmmmm1010
1
Borrow
SUBV
Rm,Rn
Rn – Rm → Rn,
Underflow → T
0011nnnnmmmm1011
1
Underflow
SWAP.B
Rm,Rn
Rm → Swap upper and
lower halves of lower 2
bytes → Rn
0110nnnnmmmm1000
1
—
SWAP.W
Rm,Rn
Rm → Swap upper and
lower word → Rn
0110nnnnmmmm1001
1
—
TST
Rm,Rn
Rn & Rm, when result is 0,
1→T
0010nnnnmmmm1000
1
Test results
XOR
Rm,Rn
Rn ^ Rm → Rn
0010nnnnmmmm1010
1
—
XTRCT
Rm,Rn
Rm: Center 32 bits of Rn → Rn
0010nnnnmmmm1101
1
—
Note: * The normal minimum number of execution states.
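The Code column gives each 16-bit instruction code as a bit pattern in which nnnn and mmmm stand for the register numbers of Rn and Rm. As a rough sketch of how such a pattern turns into an instruction word, the Python helper below substitutes the two fields. The helper name `encode` and the pattern constants are made up for this example; only the bit patterns come from table A.4, and patterns with displacement or immediate fields (d, i) are not handled.

```python
def encode(pattern, n=0, m=0):
    """Fill the nnnn and mmmm fields of a 16-character code pattern.

    pattern: a string such as "0011nnnnmmmm1100" copied from the Code column.
    n, m:    register numbers (0 to 15) for Rn and Rm.
    Returns the 16-bit instruction word as an integer.
    """
    if len(pattern) != 16:
        raise ValueError("pattern must be 16 characters long")
    if not (0 <= n <= 15 and 0 <= m <= 15):
        raise ValueError("register numbers must be in the range 0 to 15")
    bits = pattern.replace("nnnn", format(n, "04b"))
    bits = bits.replace("mmmm", format(m, "04b"))
    return int(bits, 2)  # raises ValueError if other field letters remain


ADD_RM_RN = "0011nnnnmmmm1100"  # ADD Rm,Rn
MOV_RM_RN = "0110nnnnmmmm0011"  # MOV Rm,Rn

print(hex(encode(ADD_RM_RN, n=1, m=2)))  # ADD R2,R1
```

Keeping the patterns as strings lets them be copied from the tables character for character, which makes transcription errors easier to spot.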
Table A.5
Load and Store with Control Register or System Register
Instruction
Operation
Code
Cycles
T Bit
FLDS
FRm,FPUL
FRm → FPUL
1111mmmm00011101
1
—
FSTS
FPUL,FRn
FPUL → FRn
1111nnnn00001101
1
—
LDC
Rm,SR
Rm → SR
0100mmmm00001110
1
LSB
LDC
Rm,GBR
Rm → GBR
0100mmmm00011110
1
—
LDC
Rm,VBR
Rm → VBR
0100mmmm00101110
1
—
LDS
Rm,FPSCR
Rm → FPSCR
0100mmmm01101010
1
—
LDS
Rm,FPUL
Rm → FPUL
0100mmmm01011010
1
—
LDS
Rm,MACH
Rm → MACH
0100mmmm00001010
1
—
LDS
Rm,MACL
Rm → MACL
0100mmmm00011010
1
—
LDS
Rm,PR
Rm → PR
0100mmmm00101010
1
—
STC
SR,Rn
SR → Rn
0000nnnn00000010
1
—
STC
GBR,Rn
GBR → Rn
0000nnnn00010010
1
—
STC
VBR,Rn
VBR → Rn
0000nnnn00100010
1
—
STS
FPSCR,Rn
FPSCR → Rn
0000nnnn01101010
1
—
STS
FPUL,Rn
FPUL → Rn
0000nnnn01011010
1
—
STS
MACH,Rn
MACH → Rn
0000nnnn00001010
1
—
STS
MACL,Rn
MACL → Rn
0000nnnn00011010
1
—
STS
PR,Rn
PR → Rn
0000nnnn00101010
1
—
A.1.3 Indirect Register Addressing
Table A.6 Source Operand Only
Instruction | Operation | Code | Cycles | T Bit
JMP @Rm | Delayed branching, Rm → PC | 0100mmmm00101011 | 2 | —
JSR @Rm | Delayed branching, PC → PR, Rm → PC | 0100mmmm00001011 | 2 | —
Table A.7 Destination Operand Only
Instruction | Operation | Code | Cycles | T Bit
TAS.B @Rn | When (Rn) is 0, 1 → T, 1 → MSB of (Rn) | 0100nnnn00011011 | 4 | Test results
Table A.8
Data Transfer Direct to Register
Instruction
Operation
Code
Cycles
T Bit
FMOV.S FRm,@Rn
FRm → (Rn)
1111nnnnmmmm1010
1
—
FMOV.S @Rm,FRn
(Rm) → FRn
1111nnnnmmmm1000
1
—
MOV.B
Rm,@Rn
Rm → (Rn)
0010nnnnmmmm0000
1
—
MOV.W
Rm,@Rn
Rm → (Rn)
0010nnnnmmmm0001
1
—
MOV.L
Rm,@Rn
Rm → (Rn)
0010nnnnmmmm0010
1
—
MOV.B
@Rm,Rn
(Rm) → sign extension → Rn
0110nnnnmmmm0000
1
—
MOV.W
@Rm,Rn
(Rm) → sign extension → Rn
0110nnnnmmmm0001
1
—
MOV.L
@Rm,Rn
(Rm) → Rn
0110nnnnmmmm0010
1
—
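Several entries in table A.8 note that byte and word loads pass through sign extension before reaching Rn. A minimal Python sketch of that step follows; the function name and its arguments are chosen for this example and do not come from the manual.

```python
def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value to a 32-bit register value.

    bits is 8 for MOV.B loads and 16 for MOV.W loads.
    The result is returned as an unsigned 32-bit integer.
    """
    value &= (1 << bits) - 1          # keep only the loaded byte or word
    if value & (1 << (bits - 1)):     # sign bit set: extend with ones
        value -= 1 << bits
    return value & 0xFFFFFFFF


print(hex(sign_extend(0x80, 8)))
```

MOV.L needs no such step because the loaded value already fills the 32-bit register.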
A.1.4 Post-Increment Indirect Register Addressing
Table A.9 Multiply/Accumulate Operation
Instruction
Operation
Code
Cycles
T Bit
MAC.L
@Rm+,@Rn+
Signed operation of (Rn) ×
(Rm) + MAC → MAC
0000nnnnmmmm1111
3/(2 to 4)*
—
MAC.W
@Rm+,@Rn+
Signed operation of (Rn) ×
(Rm) + MAC → MAC
0100nnnnmmmm1111
3/(2)*
—
Note: * Normal minimum number of execution states (the number in parenthesis is the number of
states when there is contention with preceding/following instructions).
Table A.10 Data Transfer Direct from Register
Instruction
Operation
Code
Cycles
T Bit
FMOV.S @Rm+,FRn
(Rm) → FRn, Rm + 4 → Rm
1111nnnnmmmm1001
1
—
MOV.B
@Rm+,Rn
(Rm) → sign extension →
Rn, Rm + 1 → Rm
0110nnnnmmmm0100
1
—
MOV.W
@Rm+,Rn
(Rm) → sign extension →
Rn, Rm + 2 → Rm
0110nnnnmmmm0101
1
—
MOV.L
@Rm+,Rn
(Rm) → Rn, Rm + 4 → Rm
0110nnnnmmmm0110
1
—
Table A.11 Load to Control Register or System Register
Instruction
Operation
Code
Cycles
T Bit
LDC.L @Rm+,SR
(Rm) → SR, Rm + 4 → Rm
0100mmmm00000111
3
LSB
LDC.L @Rm+,GBR
(Rm) → GBR, Rm + 4 → Rm
0100mmmm00010111
3
—
LDC.L @Rm+,VBR
(Rm) → VBR, Rm + 4 → Rm
0100mmmm00100111
3
—
LDS.L @Rm+,FPSCR
(Rm) → FPSCR,
Rm + 4 → Rm
0100mmmm01100110
1
—
LDS.L @Rm+,FPUL
(Rm) → FPUL,
Rm + 4 → Rm
0100mmmm01010110
1
—
LDS.L @Rm+,MACH
(Rm) → MACH,
Rm + 4 → Rm
0100mmmm00000110
1
—
LDS.L @Rm+,MACL
(Rm) → MACL,
Rm + 4 → Rm
0100mmmm00010110
1
—
LDS.L @Rm+,PR
(Rm) → PR, Rm + 4 → Rm
0100mmmm00100110
1
—
A.1.5 Pre-Decrement Indirect Register Addressing
Table A.12 Data Transfer Direct from Register
Instruction
Operation
Code
Cycles
T Bit
FMOV.S FRm,@–Rn
Rn – 4 → Rn, FRm → (Rn)
1111nnnnmmmm1011
1
—
MOV.B
Rm,@–Rn
Rn – 1 → Rn, Rm → (Rn)
0010nnnnmmmm0100
1
—
MOV.W
Rm,@–Rn
Rn – 2 → Rn, Rm → (Rn)
0010nnnnmmmm0101
1
—
MOV.L
Rm,@–Rn
Rn – 4 → Rn, Rm → (Rn)
0010nnnnmmmm0110
1
—
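The pre-decrement store in table A.12 and the post-increment load in table A.10 pair up to give push and pop behavior. The Python sketch below models the two longword forms under simplifying assumptions: registers are a list of 16 integers, memory is a dictionary keyed by longword address, and the source and destination registers are assumed to be different. All names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def mov_l_predec(regs, mem, m, n):
    """MOV.L Rm,@-Rn: Rn - 4 -> Rn, then Rm -> (Rn). Assumes m != n."""
    regs[n] = (regs[n] - 4) & MASK32
    mem[regs[n]] = regs[m]


def mov_l_postinc(regs, mem, m, n):
    """MOV.L @Rm+,Rn: (Rm) -> Rn, then Rm + 4 -> Rm. Assumes m != n."""
    regs[n] = mem[regs[m]]
    regs[m] = (regs[m] + 4) & MASK32


regs = [0] * 16
mem = {}
regs[15] = 0x1000          # R15 used here as a stack pointer
regs[1] = 0xDEADBEEF
mov_l_predec(regs, mem, 1, 15)   # push R1
mov_l_postinc(regs, mem, 15, 2)  # pop into R2
print(hex(regs[2]), hex(regs[15]))
```

After the push and pop, R2 holds the value that was in R1 and R15 is back at its starting address.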
Table A.13 Store from Control Register or System Register
Instruction
Operation
Code
Cycles
T Bit
STC.L
SR,@-Rn
Rn – 4 → Rn, SR → (Rn)
0100nnnn00000011
2
—
STC.L
GBR,@-Rn
Rn – 4 → Rn, GBR → (Rn)
0100nnnn00010011
2
—
STC.L
VBR,@-Rn
Rn – 4 → Rn, VBR → (Rn)
0100nnnn00100011
2
—
STS.L
FPSCR,@–Rn
Rn – 4 → Rn, FPSCR → (Rn)
0100nnnn01100010
1
—
STS.L
FPUL,@–Rn
Rn – 4 → Rn, FPUL → (Rn)
0100nnnn01010010
1
—
STS.L
MACH,@–Rn
Rn – 4 → Rn, MACH → (Rn)
0100nnnn00000010
1
—
STS.L
MACL,@–Rn
Rn – 4 → Rn, MACL → (Rn)
0100nnnn00010010
1
—
STS.L
PR,@–Rn
Rn – 4 → Rn, PR → (Rn)
0100nnnn00100010
1
—
A.1.6 Indirect Register Addressing with Displacement
Table A.14 Indirect Register Addressing with Displacement
Instruction
Operation
Code
Cycles
T Bit
MOV.B
R0,@(disp,Rn)
R0 → (disp + Rn)
10000000nnnndddd
1
—
MOV.W
R0,@(disp,Rn)
R0 → (disp + Rn)
10000001nnnndddd
1
—
MOV.L
Rm,@(disp,Rn)
Rm → (disp + Rn)
0001nnnnmmmmdddd
1
—
MOV.B
@(disp,Rm),R0
(disp + Rm) → sign
extension → R0
10000100mmmmdddd
1
—
MOV.W
@(disp,Rm),R0
(disp + Rm) → sign
extension → R0
10000101mmmmdddd
1
—
MOV.L
@(disp,Rm),Rn
(disp + Rm) → Rn
0101nnnnmmmmdddd
1
—
A.1.7 Indirect Indexed Register Addressing
Table A.15 Indirect Indexed Register Addressing
Instruction
Operation
Code
Cycles
T Bit
MOV.B
Rm,@(R0,Rn)
Rm → (R0 + Rn)
0000nnnnmmmm0100
1
—
MOV.W
Rm,@(R0,Rn)
Rm → (R0 + Rn)
0000nnnnmmmm0101
1
—
MOV.L
Rm,@(R0,Rn)
Rm → (R0 + Rn)
0000nnnnmmmm0110
1
—
FMOV.S FRm,@(R0,Rn)
FRm → (R0 + Rn)
1111nnnnmmmm0111
1
—
MOV.B
@(R0,Rm),Rn
(R0 + Rm) → sign
extension → Rn
0000nnnnmmmm1100
1
—
MOV.W
@(R0,Rm),Rn
(R0 + Rm) → sign
extension → Rn
0000nnnnmmmm1101
1
—
MOV.L
@(R0,Rm),Rn
(R0 + Rm) → Rn
0000nnnnmmmm1110
1
—
FMOV.S @(R0,Rm),FRn
(R0 + Rm) → FRn
1111nnnnmmmm0110
1
—
A.1.8 Indirect GBR Addressing with Displacement
Table A.16 Indirect GBR Addressing with Displacement
Instruction
Operation
Code
Cycles
T Bit
MOV.B
R0,@(disp,GBR)
R0 → (disp + GBR)
11000000dddddddd
1
—
MOV.W
R0,@(disp,GBR)
R0 → (disp + GBR)
11000001dddddddd
1
—
MOV.L
R0,@(disp,GBR)
R0 → (disp + GBR)
11000010dddddddd
1
—
MOV.B
@(disp,GBR),R0
(disp + GBR) → sign
extension → R0
11000100dddddddd
1
—
MOV.W
@(disp,GBR),R0
(disp + GBR) → sign
extension → R0
11000101dddddddd
1
—
MOV.L
@(disp,GBR),R0
(disp + GBR) → R0
11000110dddddddd
1
—
A.1.9 Indirect Indexed GBR Addressing
Table A.17 Indirect Indexed GBR Addressing
Instruction
Operation
Code
Cycles T Bit
AND.B
#imm,@(R0,GBR)
(R0 + GBR) & imm →
(R0 + GBR)
11001101iiiiiiii
3
—
OR.B
#imm,@(R0,GBR)
(R0 + GBR) | imm →
(R0 + GBR)
11001111iiiiiiii
3
—
TST.B
#imm,@(R0,GBR)
(R0 + GBR) & imm,
when result is 0, 1 → T
11001100iiiiiiii
3
Test
results
XOR.B
#imm,@(R0,GBR)
(R0 + GBR) ^ imm →
(R0 + GBR)
11001110iiiiiiii
3
—
A.1.10 PC Relative Addressing with Displacement
Table A.18 PC Relative Addressing with Displacement
Instruction
Operation
Code
Cycles
T Bit
MOV.W
@(disp,PC),Rn
(disp + PC) → sign
extension → Rn
1001nnnndddddddd
1
—
MOV.L
@(disp,PC),Rn
(disp + PC) → Rn
1101nnnndddddddd
1
—
MOVA
@(disp,PC),R0
disp + PC → R0
11000111dddddddd
1
—
A.1.11 PC Relative Addressing
Table A.19 PC Relative Addressing with Rn
Instruction
Operation
Code
Cycles T Bit
BRAF Rm
Delayed branch, Rm + PC → PC
0000nnnn00100011
2
—
BSRF Rm
Delayed branch, PC → PR, Rm + PC →
PC
0000nnnn00000011
2
—
Table A.20 PC Relative Addressing
Instruction | Operation | Code | Cycles | T Bit
BF label | When T = 0, disp + PC → PC; when T = 1, nop | 10001011dddddddd | 3/1* | —
BF/S label | If T = 0, disp + PC → PC; if T = 1, nop | 10001111dddddddd | 2/1* | —
BT label | When T = 1, disp + PC → PC; when T = 0, nop | 10001001dddddddd | 3/1* | —
BT/S label | If T = 1, disp + PC → PC; if T = 0, nop | 10001101dddddddd | 2/1* | —
BRA label | Delayed branching, disp + PC → PC | 1010dddddddddddd | 2 | —
BSR label | Delayed branching, PC → PR, disp + PC → PC | 1011dddddddddddd | 2 | —
Note: * One state when it does not branch.
A.1.12 Immediate
Table A.21 Load to Register
Instruction
Operation
Code
Cycles T Bit
FLDI0
FRn
0x00000000 → FRn
1111nnnn10001101
1
—
FLDI1
FRn
0x3F800000 → FRn
1111nnnn10011101
1
—
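The two constants in table A.21 are 32-bit patterns rather than decimal values. Assuming the FPU registers hold IEEE 754 single-precision values, the Python sketch below reinterprets a 32-bit pattern as a float to show what each constant represents. The function name is chosen for this example.

```python
import struct


def bits_to_float(bits):
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


print(bits_to_float(0x00000000), bits_to_float(0x3F800000))
```

Under that assumption FLDI0 loads 0.0 and FLDI1 loads 1.0.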
Table A.22 Arithmetic Logical Operations Direct with Register
Instruction
Operation
Code
Cycles T Bit
ADD
#imm,Rn
Rn + imm → Rn
0111nnnniiiiiiii
1
—
AND
#imm,R0
R0 & imm → R0
11001001iiiiiiii
1
—
CMP/EQ #imm,R0
When R0 = imm, 1 →
T
10001000iiiiiiii
1
Comparison
result
MOV
#imm,Rn
imm → sign extension
→ Rn
1110nnnniiiiiiii
1
—
OR
#imm,R0
R0 | imm → R0
11001011iiiiiiii
1
—
TST
#imm,R0
R0 & imm, when result
is 0, 1 → T
11001000iiiiiiii
1
Test results
XOR
#imm,R0
R0 ^ imm → R0
11001010iiiiiiii
1
—
Table A.23 Specify Exception Processing Vector
Instruction
Operation
Code
Cycles
T Bit
TRAPA
#imm
PC/SR → Stack area,
(imm × 4 + VBR) → PC
11000011iiiiiiii
8
—
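In the TRAPA entry, (imm × 4 + VBR) → PC means that the new PC is read from memory at address VBR + imm × 4, following the table convention that parentheses denote memory contents. A small Python sketch of the address calculation follows; the function name is an assumption for illustration.

```python
def trapa_vector_address(vbr, imm):
    """Return the address of the vector table entry read by TRAPA #imm."""
    if not 0 <= imm <= 0xFF:
        raise ValueError("imm is an 8-bit field (iiiiiiii)")
    return (vbr + imm * 4) & 0xFFFFFFFF


print(hex(trapa_vector_address(0x00001000, 0x20)))
```

The multiplication by 4 reflects that each vector table entry is one longword.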
A.2 Instruction Sets by Instruction Format
Tables A.24 to A.54 list instruction codes and execution cycles by instruction format.
Table A.24 Instruction Sets by Format
Format | Category | Sample Instruction | Types
0 | — | NOP | 8
n | Direct register addressing | MOVT Rn | 18
n | Direct register addressing (store with control or system registers) | STS MACH,Rn | 8
n | Indirect register addressing | TAS.B @Rn | 1
n | Pre-decrement indirect register addressing | STC.L SR,@–Rn | 8
n | Floating-point instruction | FABS FRn | 6
m | Direct register addressing (load with control or system registers) | LDC Rm,SR | 8
m | PC relative addressing with Rm | BRAF Rm | 2
m | Indirect register addressing | JMP @Rm | 2
m | Post-increment indirect register addressing | LDC.L @Rm+,SR | 8
m | Floating-point instruction | FLDS FRm,FPUL | 2
nm | Direct register addressing | ADD Rm,Rn | 34
nm | Indirect register addressing | MOV.L Rm,@Rn | 6
nm | Post-increment indirect register addressing (multiply/accumulate operation) | MAC.W @Rm+,@Rn+ | 2
nm | Post-increment indirect register addressing | MOV.L @Rm+,Rn | 3
nm | Pre-decrement indirect register addressing | MOV.L Rm,@–Rn | 3
nm | Indirect indexed register addressing | MOV.L Rm,@(R0,Rn) | 6
nm | Floating-point instruction | FADD FRm,FRn | 14
md | Indirect register addressing with displacement | MOV.B @(disp,Rm),R0 | 2
nd4 | Indirect register addressing with displacement | MOV.B R0,@(disp,Rn) | 2
nmd | Indirect register addressing with displacement | MOV.L Rm,@(disp,Rn) | 2
d | Indirect GBR addressing with displacement | MOV.L R0,@(disp,GBR) | 6
d | Indirect PC addressing with displacement | MOVA @(disp,PC),R0 | 1
d | PC relative addressing | BF disp | 4
d12 | PC relative addressing | BRA disp | 2
nd8 | PC relative addressing with displacement | MOV.L @(disp,PC),Rn | 2
i | Indirect indexed GBR addressing | AND.B #imm,@(R0,GBR) | 4
i | Immediate addressing (arithmetic and logical operations direct with register) | AND #imm,R0 | 5
i | Immediate addressing (specify exception processing vector) | TRAPA #imm | 1
ni | Immediate addressing (direct register arithmetic operations and data transfers) | ADD #imm,Rn | 2
Total: 172
A.2.1 0 Format
Table A.25 0 Format
Instruction
Operation
Code
Cycles
T Bit
CLRT
0→T
0000000000001000
1
0
CLRMAC
0 → MACH, MACL
0000000000101000
1
—
DIV0U
0 → M/Q/T
0000000000011001
1
0
NOP
No operation
0000000000001001
1
—
RTE
Delayed branch,
Stack area → PC/SR
0000000000101011
4
LSB
RTS
Delayed branching, PR → PC 0000000000001011
2
—
SETT
1→T
0000000000011000
1
1
SLEEP
Sleep
0000000000011011
3*
—
Note: * The number of execution cycles before the chip enters sleep mode.
A.2.2 n Format
Table A.26 Direct Register
Instruction
Operation
Code
Cycles T Bit
CMP/PL
Rn
Rn > 0, 1 → T
0100nnnn00010101
1
Comparison
result
CMP/PZ
Rn
Rn ≥ 0, 1 → T
0100nnnn00010001
1
Comparison
result
DT
Rn
Rn – 1 → Rn, when Rn is 0, 1 →
T. When Rn is nonzero, 0 → T
0100nnnn00010000
1
Comparison
result
MOVT
Rn
T → Rn
0000nnnn00101001
1
—
ROTL
Rn
T ← Rn ← MSB
0100nnnn00000100
1
MSB
ROTR
Rn
LSB → Rn → T
0100nnnn00000101
1
LSB
ROTCL
Rn
T ← Rn ← T
0100nnnn00100100
1
MSB
ROTCR
Rn
T → Rn → T
0100nnnn00100101
1
LSB
SHAL
Rn
T ← Rn ← 0
0100nnnn00100000
1
MSB
SHAR
Rn
MSB → Rn → T
0100nnnn00100001
1
LSB
SHLL
Rn
T ← Rn ← 0
0100nnnn00000000
1
MSB
SHLR
Rn
0 → Rn → T
0100nnnn00000001
1
LSB
SHLL2
Rn
Rn << 2 → Rn
0100nnnn00001000
1
—
SHLR2
Rn
Rn >> 2 → Rn
0100nnnn00001001
1
—
SHLL8
Rn
Rn << 8 → Rn
0100nnnn00011000
1
—
SHLR8
Rn
Rn >> 8 → Rn
0100nnnn00011001
1
—
SHLL16
Rn
Rn << 16 → Rn
0100nnnn00101000
1
—
SHLR16
Rn
Rn >> 16 → Rn
0100nnnn00101001
1
—
Table A.27 Direct Register (Store with Control and System Registers)
Instruction
Operation
Code
Cycles
T Bit
STC
SR,Rn
SR → Rn
0000nnnn00000010
1
—
STC
GBR,Rn
GBR → Rn
0000nnnn00010010
1
—
STC
VBR,Rn
VBR → Rn
0000nnnn00100010
1
—
STS
FPSCR,Rn
FPSCR→ Rn
0000nnnn01101010
1
—
STS
FPUL,Rn
FPUL→ Rn
0000nnnn01011010
1
—
STS
MACH,Rn
MACH → Rn
0000nnnn00001010
1
—
STS
MACL,Rn
MACL → Rn
0000nnnn00011010
1
—
STS
PR,Rn
PR → Rn
0000nnnn00101010
1
—
Table A.28 Indirect Register
Instruction
Operation
Code
Cycles
T Bit
TAS.B
@Rn
When (Rn) is 0, 1 → T,
1 → MSB of (Rn)
0100nnnn00011011
4
Test results
Table A.29 Indirect Pre-Decrement Register
Instruction
Operation
Code
Cycles
T Bit
STC.L
SR,@-Rn
Rn – 4 → Rn, SR → (Rn)
0100nnnn00000011
1
—
STC.L
GBR,@-Rn
Rn – 4 → Rn, GBR → (Rn)
0100nnnn00010011
1
—
STC.L
VBR,@-Rn
Rn – 4 → Rn, VBR → (Rn)
0100nnnn00100011
1
—
STS.L
FPSCR,@–Rn
Rn – 4 → Rn, FPSCR → (Rn)
0100nnnn01100010
1
—
STS.L
FPUL,@–Rn
Rn – 4 → Rn, FPUL → (Rn)
0100nnnn01010010
1
—
STS.L
MACH,@–Rn
Rn – 4 → Rn, MACH → (Rn)
0100nnnn00000010
1
—
STS.L
MACL,@–Rn
Rn – 4 → Rn, MACL → (Rn)
0100nnnn00010010
1
—
STS.L
PR,@–Rn
Rn – 4 → Rn, PR → (Rn)
0100nnnn00100010
1
—
Note: SH-3E instructions.
Table A.30 Floating-Point Instruction
Instruction
Operation
Code
Cycles
T Bit
FABS
FRn
FRn → FRn
1111nnnn01011101
1
—
FLDI0
FRn
H'00000000 → FRn
1111nnnn10001101
1
—
FLDI1
FRn
H'3F800000 → FRn
1111nnnn10011101
1
—
FLOAT
FPUL,FRn
(float)FPUL → FRn
1111nnnn00101101
1
—
FNEG
FRn
-FRn → FRn
1111nnnn01001101
1
—
FSTS
FPUL,FRn
FPUL → FRn
1111nnnn00001101
1
—
A.2.3 m Format
Table A.31 Direct Register (Load from Control and System Registers)
Instruction
Operation
Code
Cycles
T Bit
LDC
Rm,SR
Rm → SR
0100mmmm00001110
1
LSB
LDC
Rm,GBR
Rm → GBR
0100mmmm00011110
1
—
LDC
Rm,VBR
Rm → VBR
0100mmmm00101110
1
—
LDS
Rm,FPSCR
Rm → FPSCR
0100nnnn01101010
1
—
LDS
Rm,FPUL
Rm → FPUL
0100nnnn01011010
1
—
LDS
Rm,MACH
Rm → MACH
0100mmmm00001010
1
—
LDS
Rm,MACL
Rm → MACL
0100mmmm00011010
1
—
LDS
Rm,PR
Rm → PR
0100mmmm00101010
1
—
Table A.32 Indirect Register
Instruction
Operation
Code
Cycles T Bit
JMP
@Rm
Delayed branch, Rm → PC
0100mmmm00101011
2
—
JSR
@Rm
Delayed branch, PC → PR, Rm → PC
0100mmmm00001011
2
—
Table A.33 Indirect Post-Increment Register
Instruction
Operation
Code
Cycles
T Bit
LDC.L @Rm+,SR
(Rm) → SR, Rm + 4 → Rm
0100mmmm00000111
3
LSB
LDC.L @Rm+,GBR
(Rm) → GBR, Rm + 4 → Rm
0100mmmm00010111
3
—
LDC.L @Rm+,VBR
(Rm) → VBR, Rm + 4 → Rm
0100mmmm00100111
3
—
LDS.L @Rm+,FPSCR
(Rm) → FPSCR,
Rm + 4 → Rm
0100mmmm01100110
1
—
LDS.L @Rm+,FPUL
(Rm) → FPUL,
Rm + 4 → Rm
0100mmmm01010110
1
—
LDS.L @Rm+,MACH
(Rm) → MACH, Rm + 4 → Rm
0100mmmm00000110
1
—
LDS.L @Rm+,MACL
(Rm) → MACL, Rm + 4 → Rm
0100mmmm00010110
1
—
LDS.L @Rm+,PR
(Rm) → PR, Rm + 4 → Rm
0100mmmm00100110
1
—
Table A.34 PC Relative Addressing with Rn
Instruction
Operation
Code
Cycles T Bit
BRAF Rn
Delayed branch, Rn + PC → PC
0000nnnn00100011
2
—
BSRF Rn
Delayed branch, PC → PR,
Rn + PC → PC
0000nnnn00000011
2
—
Table A.35 Floating-Point Instructions
Instruction
Operation
Code
Cycles
T Bit
FLDS
FRm,FPUL
FRm → FPUL
1111mmmm00011101
1
—
FTRC
FRm,FPUL
(long)FRm → FPUL
1111mmmm00111101
1
—
A.2.4
nm Format
Table A.36 Direct Register
Instruction
Operation
Code
Cycles
T Bit
ADD
Rm,Rn
Rm + Rn → Rn
0011nnnnmmmm1100
1
—
ADDC
Rm,Rn
Rn + Rm + T → Rn,
carry → T
0011nnnnmmmm1110
1
Carry
ADDV
Rm,Rn
Rn + Rm → Rn,
overflow → T
0011nnnnmmmm1111
1
Overflow
AND
Rm,Rn
Rn & Rm → Rn
0010nnnnmmmm1001
1
—
CMP/EQ
Rm,Rn
When Rn = Rm, 1 → T
0011nnnnmmmm0000
1
Comparison
result
CMP/HS
Rm,Rn
When unsigned and Rn ≥
Rm, 1 → T
0011nnnnmmmm0010
1
Comparison
result
CMP/GE
Rm,Rn
When signed and Rn ≥
Rm, 1 → T
0011nnnnmmmm0011
1
Comparison
result
CMP/HI
Rm,Rn
When unsigned and Rn >
Rm, 1 → T
0011nnnnmmmm0110
1
Comparison
result
CMP/GT
Rm,Rn
When signed and Rn >
Rm, 1 → T
0011nnnnmmmm0111
1
Comparison
result
CMP/STR Rm,Rn
When a byte in Rn equals
a byte in Rm, 1 → T
0010nnnnmmmm1100
1
Comparison
result
DIV1
Rm,Rn
1 step division (Rn ÷ Rm)
0011nnnnmmmm0100
1
Calculation
result
DIV0S
Rm,Rn
MSB of Rn → Q, MSB of
Rm → M, M ^ Q → T
0010nnnnmmmm0111
1
Calculation
result
DMULS.L Rm,Rn
Signed operation of Rn ×
Rm → MACH, MACL
0011nnnnmmmm1101
2 to 4*
—
DMULU.L Rm,Rn
Unsigned operation of Rn
× Rm → MACH, MACL
0011nnnnmmmm0101
2 to 4*
—
EXTS.B
Rm,Rn
Sign-extend Rm from byte
→ Rn
0110nnnnmmmm1110
1
—
EXTS.W
Rm,Rn
Sign-extend Rm from word
→ Rn
0110nnnnmmmm1111
1
—
EXTU.B
Rm,Rn
Zero-extend Rm from byte
→ Rn
0110nnnnmmmm1100
1
—
EXTU.W
Rm,Rn
Zero-extend Rm from word
→ Rn
0110nnnnmmmm1101
1
—
MOV
Rm,Rn
Rm → Rn
0110nnnnmmmm0011
1
—
Table A.36 Direct Register (cont)
Instruction
Operation
Code
Cycles
T Bit
MUL.L
Rm,Rn
Rn × Rm → MACL
0000nnnnmmmm0111
2 to 4*
—
MULS.W
Rm,Rn
With sign, Rn × Rm → MAC
0010nnnnmmmm1111
1 to 3*
—
MULU.W
Rm,Rn
Unsigned, Rn × Rm → MAC
0010nnnnmmmm1110
1 to 3*
—
NEG
Rm,Rn
0 – Rm → Rn
0110nnnnmmmm1011
1
—
NEGC
Rm,Rn
0 – Rm – T → Rn, Borrow → T
0110nnnnmmmm1010
1
Borrow
NOT
Rm,Rn
~Rm → Rn
0110nnnnmmmm0111
1
—
OR
Rm,Rn
Rn | Rm → Rn
0010nnnnmmmm1011
1
—
SUB
Rm,Rn
Rn – Rm → Rn
0011nnnnmmmm1000
1
—
SUBC
Rm,Rn
Rn – Rm – T → Rn, Borrow → T
0011nnnnmmmm1010
1
Borrow
SUBV
Rm,Rn
Rn – Rm → Rn, Underflow → T
0011nnnnmmmm1011
1
Underflow
SWAP.B
Rm,Rn
Rm → Swap upper and lower
halves of lower 2 bytes → Rn
0110nnnnmmmm1000
1
—
SWAP.W
Rm,Rn
Rm → Swap upper and lower
word → Rn
0110nnnnmmmm1001
1
—
TST
Rm,Rn
Rn & Rm, when result is 0, 1 → T
0010nnnnmmmm1000
1
Test
results
XOR
Rm,Rn
Rn ^ Rm → Rn
0010nnnnmmmm1010
1
—
XTRCT
Rm,Rn
Rm: Center 32 bits of Rn → Rn
0010nnnnmmmm1101
1
—
Note: * The normal minimum number of execution states.
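The T-bit behavior of ADDC and ADDV in Table A.36 is easier to see with concrete 32-bit values. The sketch below is a host-side Python model, not SH-2E code; the function names addc and addv are hypothetical, and registers are modeled as unsigned 32-bit integers.

```python
MASK32 = 0xFFFFFFFF

def addc(rn, rm, t):
    """Model of ADDC Rm,Rn: Rn + Rm + T -> Rn, carry -> T."""
    total = (rn & MASK32) + (rm & MASK32) + (t & 1)
    return total & MASK32, total >> 32

def addv(rn, rm):
    """Model of ADDV Rm,Rn: Rn + Rm -> Rn, signed overflow -> T."""
    rn &= MASK32
    rm &= MASK32
    result = (rn + rm) & MASK32
    # Signed overflow: both operands share a sign and the result's sign differs.
    overflow = ((~(rn ^ rm) & (rn ^ result)) >> 31) & 1
    return result, overflow

carry_case = addc(0xFFFFFFFF, 0, 1)   # wraps to 0 with carry out
overflow_case = addv(0x7FFFFFFF, 1)   # largest positive value plus one
```

Chaining addc from the low word to the high word, passing the returned T value along, gives multi-word addition, which is the usual reason for having a carry-in form.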
Table A.37 Indirect Register
Instruction
Operation
Code
Cycles
T Bit
MOV.B
Rm,@Rn
Rm → (Rn)
0010nnnnmmmm0000
1
—
MOV.W
Rm,@Rn
Rm → (Rn)
0010nnnnmmmm0001
1
—
MOV.L
Rm,@Rn
Rm → (Rn)
0010nnnnmmmm0010
1
—
MOV.B
@Rm,Rn
(Rm) → sign extension → Rn
0110nnnnmmmm0000
1
—
MOV.W
@Rm,Rn
(Rm) → sign extension → Rn
0110nnnnmmmm0001
1
—
MOV.L
@Rm,Rn
(Rm) → Rn
0110nnnnmmmm0010
1
—
Table A.38 Indirect Post-Increment Register (Multiply/Accumulate Operation)
Instruction
Operation
Code
Cycles
T Bit
MAC.L
@Rm+,@Rn+
Signed operation of (Rn) ×
(Rm) + MAC → MAC
0000nnnnmmmm1111
3/(2 to
4)*
—
MAC.W
@Rm+,@Rn+
Signed operation of (Rn) ×
(Rm) + MAC → MAC
0100nnnnmmmm1111
3/(2)*
—
Note: * Normal minimum number of execution states (the number in parentheses is the number of
states when there is contention with preceding/following instructions).
Table A.39 Indirect Post-Increment Register
Instruction
Operation
Code
Cycles
T Bit
MOV.B
@Rm+,Rn
(Rm) → sign extension → Rn,
Rm + 1 → Rm
0110nnnnmmmm0100
1
—
MOV.W
@Rm+,Rn
(Rm) → sign extension → Rn,
Rm + 2 → Rm
0110nnnnmmmm0101
1
—
MOV.L
@Rm+,Rn
(Rm) → Rn, Rm + 4 → Rm
0110nnnnmmmm0110
1
—
Table A.40 Indirect Pre-Decrement Register
Instruction
Operation
Code
Cycles
T Bit
MOV.B
Rm,@–Rn
Rn – 1 → Rn, Rm → (Rn)
0010nnnnmmmm0100
1
—
MOV.W
Rm,@–Rn
Rn – 2 → Rn, Rm → (Rn)
0010nnnnmmmm0101
1
—
MOV.L
Rm,@–Rn
Rn – 4 → Rn, Rm → (Rn)
0010nnnnmmmm0110
1
—
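The pre-decrement and post-increment forms in Tables A.39 and A.40 pair up as a push and a pop. The following Python sketch models the longword forms only; memory is a hypothetical dictionary keyed by address, the function names are illustrative, and the rule that the pointer is not incremented when both operands name the same register is an assumption taken from general SH-family behavior rather than from these tables.

```python
MASK32 = 0xFFFFFFFF

def mov_l_predec(regs, mem, m, n):
    """MOV.L Rm,@-Rn: Rn - 4 -> Rn, then Rm -> (Rn)."""
    regs[n] = (regs[n] - 4) & MASK32
    mem[regs[n]] = regs[m]

def mov_l_postinc(regs, mem, m, n):
    """MOV.L @Rm+,Rn: (Rm) -> Rn, then Rm + 4 -> Rm."""
    regs[n] = mem[regs[m]]
    if m != n:  # assumption: the loaded value wins when m == n
        regs[m] = (regs[m] + 4) & MASK32

regs = [0] * 16
mem = {}
regs[15] = 0x1000          # R15 used here as a stack pointer
regs[1] = 0xDEADBEEF
mov_l_predec(regs, mem, 1, 15)    # push R1
mov_l_postinc(regs, mem, 15, 2)   # pop into R2
```

After the push and pop, R15 is back at its starting value and R2 holds the value that was in R1.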
Table A.41 Indirect Indexed Register
Instruction
Operation
Code
Cycles
T Bit
MOV.B
Rm,@(R0,Rn)
Rm → (R0 + Rn)
0000nnnnmmmm0100
1
—
MOV.W
Rm,@(R0,Rn)
Rm → (R0 + Rn)
0000nnnnmmmm0101
1
—
MOV.L
Rm,@(R0,Rn)
Rm → (R0 + Rn)
0000nnnnmmmm0110
1
—
MOV.B
@(R0,Rm),Rn
(R0 + Rm) → sign
extension → Rn
0000nnnnmmmm1100
1
—
MOV.W
@(R0,Rm),Rn
(R0 + Rm) → sign
extension → Rn
0000nnnnmmmm1101
1
—
MOV.L
@(R0,Rm),Rn
(R0 + Rm) → Rn
0000nnnnmmmm1110
1
—
Table A.42 Floating Point Instructions
Instruction
Operation
Code
Cycles
T Bit
FADD
FRm,FRn
FRn+FRm → FRn
1111nnnnmmmm0000
1
—
FCMP/EQ
FRm,FRn
(FRn=FRm)? 1:0 → T
1111nnnnmmmm0100
1
Comparison
result
FCMP/GT
FRm,FRn
(FRn>FRm)? 1:0 → T
1111nnnnmmmm0101
1
Comparison
result
FDIV
FRm,FRn
FRn/FRm → FRn
1111nnnnmmmm0011
13
—
FMAC
FR0,FRm,FRn
FR0×FRm+FRn → FRn
1111nnnnmmmm1110
1
—
FMOV
FRm,FRn
FRm → FRn
1111nnnnmmmm1100
1
—
FMOV.S
@(R0,Rm),FRn
(R0+Rm) → FRn
1111nnnnmmmm0110
1
—
FMOV.S
@Rm+,FRn
(Rm) → FRn,Rm+4 → Rm
1111nnnnmmmm1001
1
—
FMOV.S
@Rm,FRn
(Rm) → FRn
1111nnnnmmmm1000
1
—
FMOV.S
FRm,@(R0,Rn)
FRm → (R0+Rn)
1111nnnnmmmm0111
1
—
FMOV.S
FRm,@-Rn
Rn-4 → Rn, FRm → (Rn)
1111nnnnmmmm1011
1
—
FMOV.S
FRm,@Rn
FRm → (Rn)
1111nnnnmmmm1010
1
—
FMUL
FRm,FRn
FRn × FRm → FRn
1111nnnnmmmm0010
1
—
FSUB
FRm,FRn
FRn-FRm → FRn
1111nnnnmmmm0001
1
—
A.2.5
md Format
Table A.43 md Format
Instruction
Operation
Code
Cycles
T Bit
MOV.B
@(disp,Rm),R0
(disp + Rm) → sign
extension → R0
10000100mmmmdddd
1
—
MOV.W
@(disp,Rm),R0
(disp × 2 + Rm) →
sign extension → R0
10000101mmmmdddd
1
—
A.2.6
nd4 Format
Table A.44 nd4 Format
Instruction
Operation
Code
Cycles
T Bit
MOV.B
R0,@(disp,Rn)
R0 → (disp + Rn)
10000000nnnndddd
1
—
MOV.W
R0,@(disp,Rn)
R0 → (disp × 2 + Rn)
10000001nnnndddd
1
—
A.2.7
nmd Format
Table A.45 nmd Format
Instruction
Operation
Code
Cycles
T Bit
MOV.L
Rm,@(disp,Rn)
Rm → (disp × 4 + Rn)
0001nnnnmmmmdddd
1
—
MOV.L
@(disp,Rm),Rn
(disp × 4 + Rm) → Rn
0101nnnnmmmmdddd
1
—
A.2.8
d Format
Table A.46 Indirect GBR with Displacement
Instruction
Operation
Code
Cycles
T Bit
MOV.B
R0,@(disp,GBR)
R0 → (disp + GBR)
11000000dddddddd
1
—
MOV.W
R0,@(disp,GBR)
R0 → (disp × 2 + GBR)
11000001dddddddd
1
—
MOV.L
R0,@(disp,GBR)
R0 → (disp × 4 + GBR)
11000010dddddddd
1
—
MOV.B
@(disp,GBR),R0
(disp + GBR) → sign
extension → R0
11000100dddddddd
1
—
MOV.W
@(disp,GBR),R0
(disp × 2 + GBR) →
sign extension → R0
11000101dddddddd
1
—
MOV.L
@(disp,GBR),R0
(disp × 4 + GBR) → R0
11000110dddddddd
1
—
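The displacement scaling in Table A.46 depends only on the operand size. As a minimal sketch in Python (the function name gbr_effective_address and the size letters used as dictionary keys are hypothetical, and the 8-bit displacement is assumed to be zero-extended):

```python
SCALE = {"B": 1, "W": 2, "L": 4}  # byte, word, longword

def gbr_effective_address(gbr, disp, size):
    """Return disp * scale + GBR for MOV.B/W/L with @(disp,GBR)."""
    if not 0 <= disp <= 0xFF:
        raise ValueError("disp must fit in the 8-bit dddddddd field")
    return (gbr + disp * SCALE[size]) & 0xFFFFFFFF
```

For example, with GBR = H'1000 and disp = 3, the byte, word, and longword forms address H'1003, H'1006, and H'100C respectively.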
Table A.47 PC Relative with Displacement
Instruction
Operation
Code
Cycles
T Bit
MOVA
@(disp,PC),R0
disp × 4 + PC → R0
11000111dddddddd
1
—
Table A.48 PC Relative
Instruction
Operation
Code
Cycles
T Bit
BF
label
When T = 0, disp × 2 + PC →
PC; when T = 1, nop
10001011dddddddd
3/1*
—
BF/S
label
If T = 0, disp × 2 + PC → PC;
if T = 1, nop
10001111dddddddd
2/1*
—
BT
label
When T = 1, disp × 2 + PC → PC;
when T = 0, nop
10001001dddddddd
3/1*
—
BT/S
label
If T = 1, disp × 2 + PC → PC;
if T = 0, nop
10001101dddddddd
2/1*
—
Note: * One state when it does not branch.
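The branch target for the conditional branches in Table A.48 is disp × 2 + PC. The Python sketch below shows the arithmetic under two assumptions that are not stated in this table: the 8-bit disp field is sign-extended, and PC is the address of the branch instruction plus 4. Both should be checked against the addressing-mode description elsewhere in this manual. The function names are hypothetical.

```python
def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def conditional_branch_target(opcode, instr_addr):
    """Target of BT/BF/BT/S/BF/S when the branch is taken."""
    disp = sign_extend(opcode & 0xFF, 8)   # dddddddd field
    pc = instr_addr + 4                    # assumption: PC = instruction address + 4
    return (pc + disp * 2) & 0xFFFFFFFF

forward = conditional_branch_target(0x8B02, 0x1000)   # BF with disp = 2
backward = conditional_branch_target(0x89FE, 0x1000)  # BT with disp = -2
```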
A.2.9
d12 Format
Table A.49 d12 Format
Instruction
Operation
Code
Cycles
T Bit
BRA
label
Delayed branching, disp × 2 + PC → PC
1010dddddddddddd
2
—
BSR
label
Delayed branching, PC → PR,
disp × 2 + PC → PC
1011dddddddddddd
2
—
A.2.10 nd8 Format
Table A.50 nd8 Format
Instruction
Operation
Code
Cycles
T Bit
MOV.W
@(disp,PC),Rn
(disp × 2 + PC) → sign
extension → Rn
1001nnnndddddddd
1
—
MOV.L
@(disp,PC),Rn
(disp × 4 + PC) → Rn
1101nnnndddddddd
1
—
A.2.11 i Format
Table A.51 Indirect Indexed GBR
Instruction
Operation
Code
Cycles T Bit
AND.B
#imm,@(R0,GBR)
(R0 + GBR) & imm →
(R0 + GBR)
11001101iiiiiiii
3
—
OR.B
#imm,@(R0,GBR)
(R0 + GBR) | imm →
(R0 + GBR)
11001111iiiiiiii
3
—
TST.B
#imm,@(R0,GBR)
(R0 + GBR) & imm,
when result is 0, 1 → T
11001100iiiiiiii
3
Test
results
XOR.B
#imm,@(R0,GBR)
(R0 + GBR) ^ imm →
(R0 + GBR)
11001110iiiiiiii
3
—
Table A.52 Immediate (Arithmetic Logical Operation with Direct Register)
Instruction
Operation
Code
Cycles T Bit
AND
#imm,R0
R0 & imm → R0
11001001iiiiiiii
1
—
CMP/EQ
#imm,R0
When R0 = imm, 1 →
T
10001000iiiiiiii
1
Comparison
results
OR
#imm,R0
R0 | imm → R0
11001011iiiiiiii
1
—
TST
#imm,R0
R0 & imm, when result
is 0, 1 → T
11001000iiiiiiii
1
Test results
XOR
#imm,R0
R0 ^ imm → R0
11001010iiiiiiii
1
—
Table A.53 Immediate (Specify Exception Processing Vector)
Instruction
Operation
Code
Cycles
T Bit
TRAPA
#imm
PC/SR → stack area,
(imm × 4 + VBR) → PC
11000011iiiiiiii
8
—
A.2.12 ni Format
Table A.54 ni Format
Instruction
Operation
Code
Cycles
T Bit
ADD
#imm,Rn
Rn + imm → Rn
0111nnnniiiiiiii
1
—
MOV
#imm,Rn
imm → sign extension → Rn
1110nnnniiiiiiii
1
—
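The two ni-format instructions in Table A.54 share the same field layout: a 4-bit operation code, a 4-bit register number, and an 8-bit immediate. The Python sketch below decodes and applies them to a register list. The function name exec_ni is hypothetical, and sign extension of the immediate for ADD #imm,Rn is an assumption based on general SH-family behavior (the table states it only for MOV).

```python
MASK32 = 0xFFFFFFFF

def exec_ni(opcode, regs):
    """Apply ADD #imm,Rn (0111nnnniiiiiiii) or MOV #imm,Rn (1110nnnniiiiiiii)."""
    top = (opcode >> 12) & 0xF
    n = (opcode >> 8) & 0xF
    imm = opcode & 0xFF
    if imm & 0x80:          # sign-extend the 8-bit immediate
        imm -= 0x100
    if top == 0b1110:       # MOV #imm,Rn
        regs[n] = imm & MASK32
    elif top == 0b0111:     # ADD #imm,Rn (sign extension assumed)
        regs[n] = (regs[n] + imm) & MASK32
    else:
        raise ValueError("not an ni-format instruction")

regs = [0] * 16
exec_ni(0xE1FF, regs)   # MOV #-1,R1
after_mov = regs[1]
exec_ni(0x7102, regs)   # ADD #2,R1
after_add = regs[1]
```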
A.3
Instruction Set by Instruction Code
Table A.55 lists the instructions and their execution cycles in order of instruction code.
Table A.55 Instruction Set by Instruction Code
Instruction
Operation
Code
Cycles
T Bit
CLRT
0→T
0000000000001000
1
0
NOP
No operation
0000000000001001
1
—
RTS
Delayed branching,
PR → PC
0000000000001011
2
—
SETT
1→T
0000000000011000
1
1
DIV0U
0 → M/Q/T
0000000000011001
1
0
SLEEP
Sleep
0000000000011011
3
—
CLRMAC
0 → MACH, MACL
0000000000101000
1
—
RTE
Delayed branch,
SSR/SPC → SR/PC
0000000000101011
4
—
STC
SR,Rn
SR → Rn
0000nnnn00000010
1
—
BSRF
Rn
Delayed branch, PC →
PR, Rn + PC → PC
0000nnnn00000011
2
—
STS
MACH,Rn
MACH → Rn
0000nnnn00001010
1
—
STC
GBR,Rn
GBR → Rn
0000nnnn00010010
1
—
STS
MACL,Rn
MACL → Rn
0000nnnn00011010
1
—
STC
VBR,Rn
VBR → Rn
0000nnnn00100010
1
—
BRAF
Rn
Delayed branch,
Rn + PC → PC
0000nnnn00100011
2
—
MOVT
Rn
T → Rn
0000nnnn00101001
1
—
STS
PR,Rn
PR → Rn
0000nnnn00101010
1
—
STS
FPUL,Rn
FPUL → Rn
0000nnnn01011010
1
—
STS
FPSCR,Rn
FPSCR → Rn
0000nnnn01101010
1
—
MOV.B
Rm,@(R0,Rn)
Rm → (R0 + Rn)
0000nnnnmmmm0100
1
—
MOV.W
Rm,@(R0,Rn)
Rm → (R0 + Rn)
0000nnnnmmmm0101
1
—
MOV.L
Rm,@(R0,Rn)
Rm → (R0 + Rn)
0000nnnnmmmm0110
1
—
MUL.L
Rm,Rn
Rn × Rm → MACL
0000nnnnmmmm0111
2 to 4*
—
MOV.B
@(R0,Rm),Rn
(R0 + Rm) → sign
extension → Rn
0000nnnnmmmm1100
1
—
MOV.W
@(R0,Rm),Rn
(R0 + Rm) → sign
extension → Rn
0000nnnnmmmm1101
1
—
Table A.55 Instruction Set by Instruction Code (cont)
Instruction
Operation
Code
Cycles T Bit
MOV.L
@(R0,Rm),Rn
(R0 + Rm) → Rn
0000nnnnmmmm1110
1
—
MAC.L
@Rm+,@Rn+
Signed operation of (Rn)
× (Rm) + MAC → MAC
0000nnnnmmmm1111
3/(2 to
4)*
—
MOV.L
Rm,@(disp,Rn)
Rm → (disp × 4 + Rn)
0001nnnnmmmmdddd
1
—
MOV.B
Rm,@Rn
Rm → (Rn)
0010nnnnmmmm0000
1
—
MOV.W
Rm,@Rn
Rm → (Rn)
0010nnnnmmmm0001
1
—
MOV.L
Rm,@Rn
Rm → (Rn)
0010nnnnmmmm0010
1
—
MOV.B
Rm,@-Rn
Rn – 1 → Rn, Rm → (Rn)
0010nnnnmmmm0100
1
—
MOV.W
Rm,@–Rn
Rn – 2 → Rn, Rm → (Rn)
0010nnnnmmmm0101
1
—
MOV.L
Rm,@–Rn
Rn – 4 → Rn, Rm → (Rn)
0010nnnnmmmm0110
1
—
DIV0S
Rm,Rn
MSB of Rn → Q, MSB of
Rm → M, M ^ Q → T
0010nnnnmmmm0111
1
Calculation
result
TST
Rm,Rn
Rn & Rm, when result is
0, 1 → T
0010nnnnmmmm1000
1
Test results
AND
Rm,Rn
Rn & Rm → Rn
0010nnnnmmmm1001
1
—
XOR
Rm,Rn
Rn ^ Rm → Rn
0010nnnnmmmm1010
1
—
OR
Rm,Rn
Rn | Rm → Rn
0010nnnnmmmm1011
1
—
CMP/STR Rm,Rn
When a byte in Rn equals
a byte in Rm, 1 → T
0010nnnnmmmm1100
1
Comparison
result
XTRCT
Rm,Rn
Rm: Center 32 bits of Rn
→ Rn
0010nnnnmmmm1101
1
—
MULU.W
Rm,Rn
Unsigned, Rn × Rm →
MAC
0010nnnnmmmm1110
1 to 3*
—
MULS.W
Rm,Rn
Signed, Rn × Rm → MAC
0010nnnnmmmm1111
1 to 3*
—
CMP/EQ
Rm,Rn
When Rn = Rm, 1 → T
0011nnnnmmmm0000
1
Comparison
result
CMP/HS
Rm,Rn
When unsigned
and Rn ≥ Rm, 1 → T
0011nnnnmmmm0010
1
Comparison
result
CMP/GE
Rm,Rn
When signed and Rn ≥
Rm, 1 → T
0011nnnnmmmm0011
1
Comparison
result
DIV1
Rm,Rn
1 step division (Rn ÷ Rm)
0011nnnnmmmm0100
1
Calculation
result
Table A.55 Instruction Set by Instruction Code (cont)
Instruction
Operation
Code
Cycles T Bit
DMULU.L Rm,Rn
Unsigned operation of Rn
× Rm → MACH, MACL
0011nnnnmmmm0101
2 to 4*
—
CMP/HI
Rm,Rn
When unsigned and
Rn > Rm, 1 → T
0011nnnnmmmm0110
1
Comparison
result
CMP/GT
Rm,Rn
When signed
and Rn > Rm, 1 → T
0011nnnnmmmm0111
1
Comparison
result
SUB
Rm,Rn
Rn – Rm → Rn
0011nnnnmmmm1000
1
—
SUBC
Rm,Rn
Rn – Rm – T → Rn,
Borrow → T
0011nnnnmmmm1010
1
Borrow
SUBV
Rm,Rn
Rn – Rm → Rn,
underflow → T
0011nnnnmmmm1011
1
Underflow
ADD
Rm,Rn
Rm + Rn → Rn
0011nnnnmmmm1100
1
—
DMULS.L
Rm,Rn
Signed operation of Rn ×
Rm → MACH, MACL
0011nnnnmmmm1101
2 to 4*
—
ADDC
Rm,Rn
Rn + Rm + T → Rn, carry
→T
0011nnnnmmmm1110
1
Carry
ADDV
Rm,Rn
Rn + Rm → Rn, overflow
→T
0011nnnnmmmm1111
1
Overflow
SHLL
Rn
T ← Rn ← 0
0100nnnn00000000
1
MSB
SHLR
Rn
0 → Rn → T
0100nnnn00000001
1
LSB
STS.L
MACH,@–Rn
Rn – 4 → Rn,
MACH → (Rn)
0100nnnn00000010
1
—
STC.L
SR,@-Rn
Rn – 4 → Rn,
SR → (Rn)
0100nnnn00000011
2
—
ROTL
Rn
T ← Rn ← MSB
0100nnnn00000100
1
MSB
ROTR
Rn
LSB → Rn → T
0100nnnn00000101
1
LSB
LDS.L
@Rm+,MACH
(Rm) → MACH,
Rm + 4 → Rm
0100mmmm00000110
1
—
LDC.L
@Rm+,SR
(Rm) → SR,
Rm + 4 → Rm
0100mmmm00000111
3
LSB
SHLL2
Rn
Rn << 2 → Rn
0100nnnn00001000
1
—
SHLR2
Rn
Rn >> 2 → Rn
0100nnnn00001001
1
—
LDS
Rm,MACH
Rm → MACH
0100mmmm00001010
1
—
JSR
@Rm
Delayed branching, PC →
PR, Rm → PC
0100mmmm00001011
2
—
Table A.55 Instruction Set by Instruction Code (cont)
Instruction
Operation
Code
Cycles T Bit
LDC
Rm,SR
Rm → SR
0100mmmm00001110
1
LSB
DT
Rn
Rn - 1 → Rn, when Rn is
0, 1 → T. When Rn is
nonzero, 0 → T
0100nnnn00010000
1
Comparison
result
CMP/PZ Rn
Rn ≥ 0, 1 → T
0100nnnn00010001
1
Comparison
result
STS.L
MACL,@–Rn
Rn – 4 → Rn,
MACL → (Rn)
0100nnnn00010010
1
—
STC.L
GBR,@-Rn
Rn – 4 → Rn,
GBR → (Rn)
0100nnnn00010011
2
—
CMP/PL Rn
Rn > 0, 1 → T
0100nnnn00010101
1
Comparison
result
LDS.L
@Rm+,MACL
(Rm) → MACL,
Rm + 4 → Rm
0100mmmm00010110
1
—
LDC.L
@Rm+,GBR
(Rm) → GBR,
Rm + 4 → Rm
0100mmmm00010111
3
—
SHLL8
Rn
Rn << 8 → Rn
0100nnnn00011000
1
—
SHLR8
Rn
Rn >> 8 → Rn
0100nnnn00011001
1
—
LDS
Rm,MACL
Rm → MACL
0100mmmm00011010
1
—
TAS.B
@Rn
When (Rn) is 0, 1 → T,
1 → MSB of (Rn)
0100nnnn00011011
4
Test results
LDC
Rm,GBR
Rm → GBR
0100mmmm00011110
1
—
SHAL
Rn
T ← Rn ← 0
0100nnnn00100000
1
MSB
SHAR
Rn
MSB → Rn → T
0100nnnn00100001
1
LSB
STS.L
PR,@–Rn
Rn – 4 → Rn, PR → (Rn)
0100nnnn00100010
1
—
STC.L
VBR,@-Rn
Rn – 4 → Rn,
VBR → (Rn)
0100nnnn00100011
2
—
ROTCL
Rn
T ← Rn ← T
0100nnnn00100100
1
MSB
ROTCR
Rn
T → Rn → T
0100nnnn00100101
1
LSB
LDS.L
@Rm+,PR
(Rm) → PR,
Rm + 4 → Rm
0100mmmm00100110
1
—
LDC.L
@Rm+,VBR
(Rm) → VBR,
Rm + 4 → Rm
0100mmmm00100111
3
—
SHLL16 Rn
Rn << 16 → Rn
0100nnnn00101000
1
—
SHLR16 Rn
Rn >> 16 → Rn
0100nnnn00101001
1
—
Table A.55 Instruction Set by Instruction Code (cont)
Instruction
Operation
Code
Cycles T Bit
LDS
Rm,PR
Rm → PR
0100mmmm00101010
1
—
JMP
@Rm
Delayed branching,
Rm → PC
0100mmmm00101011
2
—
LDC
Rm,VBR
Rm → VBR
0100mmmm00101110
1
—
STS.L
FPUL,@-Rn
Rn-4 → Rn, FPUL → (Rn)
0100nnnn01010010
1
—
LDS.L
@Rm+,FPUL
(Rm) → FPUL, Rm+4 →
Rm
0100mmmm01010110
1
—
LDS
Rm,FPUL
Rm → FPUL
0100mmmm01011010
1
—
STS.L
FPSCR,@-Rn
Rn-4 → Rn, FPSCR →
(Rn)
0100nnnn01100010
1
—
LDS.L
@Rm+,FPSCR
(Rm) → FPSCR, Rm+4 →
Rm
0100mmmm01100110
1
—
LDS
Rm,FPSCR
Rm → FPSCR
0100mmmm01101010
1
—
MAC.W
@Rm+,@Rn+
With sign, (Rn) × (Rm) +
MAC → MAC
0100nnnnmmmm1111
3/(2)*
—
MOV.L
@(disp,Rm),Rn
(disp × 4 + Rm) → Rn
0101nnnnmmmmdddd
1
—
MOV.B
@Rm,Rn
(Rm) → sign extension →
Rn
0110nnnnmmmm0000
1
—
MOV.W
@Rm,Rn
(Rm) → sign extension →
Rn
0110nnnnmmmm0001
1
—
MOV.L
@Rm,Rn
(Rm) → Rn
0110nnnnmmmm0010
1
—
MOV
Rm,Rn
Rm → Rn
0110nnnnmmmm0011
1
—
MOV.B
@Rm+,Rn
(Rm) → sign extension →
Rn, Rm + 1 → Rm
0110nnnnmmmm0100
1
—
MOV.W
@Rm+,Rn
(Rm) → sign extension →
Rn, Rm + 2 → Rm
0110nnnnmmmm0101
1
—
MOV.L
@Rm+,Rn
(Rm) → Rn, Rm + 4 → Rm
0110nnnnmmmm0110
1
—
NOT
Rm,Rn
~Rm → Rn
0110nnnnmmmm0111
1
—
SWAP.B
Rm,Rn
Rm → Swap upper and
lower halves of lower 2
bytes → Rn
0110nnnnmmmm1000
1
—
SWAP.W
Rm,Rn
Rm → Swap upper and
lower word → Rn
0110nnnnmmmm1001
1
—
Table A.55 Instruction Set by Instruction Code (cont)
Instruction
Operation
Code
Cycles T Bit
NEGC
Rm,Rn
0 – Rm – T → Rn,
Borrow → T
0110nnnnmmmm1010
1
Borrow
NEG
Rm,Rn
0 – Rm → Rn
0110nnnnmmmm1011
1
—
EXTU.B Rm,Rn
Zero-extend Rm
from byte → Rn
0110nnnnmmmm1100
1
—
EXTU.W Rm,Rn
Zero-extend Rm
from word → Rn
0110nnnnmmmm1101
1
—
EXTS.B Rm,Rn
Sign-extend Rm
from byte → Rn
0110nnnnmmmm1110
1
—
EXTS.W Rm,Rn
Sign-extend Rm
from word → Rn
0110nnnnmmmm1111
1
—
ADD
#imm,Rn
Rn + #imm → Rn
0111nnnniiiiiiii
1
—
MOV.B
R0,@(disp,Rn)
R0 → (disp + Rn)
10000000nnnndddd
1
—
MOV.W
R0,@(disp,Rn)
R0 → (disp × 2 + Rn)
10000001nnnndddd
1
—
MOV.B
@(disp,Rm),R0
(disp + Rm) → sign
extension → R0
10000100mmmmdddd
1
—
MOV.W
@(disp,Rm),R0
(disp × 2 + Rm) → sign
extension → R0
10000101mmmmdddd
1
—
CMP/EQ #imm,R0
When R0 = imm,
1→T
10001000iiiiiiii
1
Comparison
results
BT
label
When T = 1,
disp × 2 + PC → PC;
when T = 0, nop
10001001dddddddd
3/1*2
—
BF
label
When T = 0,
disp × 2 + PC → PC;
when T = 1, nop
10001011dddddddd
3/1*2
—
BT/S
label
If T = 1, disp × 2 + PC →
PC; if T = 0, nop
10001101dddddddd
2/1*2
—
BF/S
label
If T = 0, disp × 2 + PC →
PC; if T = 1, nop
10001111dddddddd
2/1*2
—
MOV.W
@(disp,PC),Rn
(disp × 2 + PC) → sign
extension → Rn
1001nnnndddddddd
1
—
BRA
label
Delayed branching,
disp × 2 + PC → PC
1010dddddddddddd
2
—
BSR
label
Delayed branching,
PC → PR,
disp × 2 + PC → PC
1011dddddddddddd
2
—
Table A.55 Instruction Set by Instruction Code (cont)
Instruction
Operation
Code
Cycles T Bit
MOV.B
R0,@(disp,GBR)
R0 → (disp + GBR)
11000000dddddddd
1
—
MOV.W
R0,@(disp,GBR)
R0 → (disp × 2 + GBR)
11000001dddddddd
1
—
MOV.L
R0,@(disp,GBR)
R0 → (disp × 4 + GBR)
11000010dddddddd
1
—
TRAPA
#imm
PC/SR → stack area,
(imm × 4 + VBR) → PC
11000011iiiiiiii
8
—
MOV.B
@(disp,GBR),R0
(disp + GBR) → sign
extension → R0
11000100dddddddd
1
—
MOV.W
@(disp,GBR),R0
(disp × 2 + GBR) →
sign extension → R0
11000101dddddddd
1
—
MOV.L
@(disp,GBR),R0
(disp × 4 + GBR) → R0
11000110dddddddd
1
—
MOVA
@(disp,PC),R0
disp × 4 + PC → R0
11000111dddddddd
1
—
TST
#imm,R0
R0 & imm,
when result is 0, 1 → T
11001000iiiiiiii
1
Test
results
AND
#imm,R0
R0 & imm → R0
11001001iiiiiiii
1
—
XOR
#imm,R0
R0 ^ imm → R0
11001010iiiiiiii
1
—
OR
#imm,R0
R0 | imm → R0
11001011iiiiiiii
1
—
TST.B
#imm,@(R0,GBR)
(R0 + GBR) & imm,
when result is 0, 1 → T
11001100iiiiiiii
3
Test
results
AND.B
#imm,@(R0,GBR)
(R0 + GBR) & imm →
(R0 + GBR)
11001101iiiiiiii
3
—
XOR.B
#imm,@(R0,GBR)
(R0 + GBR) ^ imm →
(R0 + GBR)
11001110iiiiiiii
3
—
OR.B
#imm,@(R0,GBR)
(R0 + GBR) | imm →
(R0 + GBR)
11001111iiiiiiii
3
—
MOV.L
@(disp,PC),Rn
(disp × 4 + PC) → Rn
1101nnnndddddddd
1
—
MOV
#imm,Rn
#imm → sign extension
→ Rn
1110nnnniiiiiiii
1
—
FSTS
FPUL,FRn
FPUL → FRn
1111nnnn00001101
1
—
FLDS
FRm,FPUL
FRm → FPUL
1111mmmm00011101
1
—
FLOAT
FPUL,FRn
(float)FPUL → FRn
1111nnnn00101101
1
—
FTRC
FRm,FPUL
(long)FRm → FPUL
1111mmmm00111101
1
—
FNEG
FRn
-FRn → FRn
1111nnnn01001101
1
—
FABS
FRn
FRn → FRn
1111nnnn01011101
1
—
FLDI0
FRn
H'00000000 → FRn
1111nnnn10001101
1
—
FLDI1
FRn
H'3F800000 → FRn
1111nnnn10011101
1
—
Table A.55 Instruction Set by Instruction Code (cont)
Instruction
Operation
Code
Cycles T Bit
FADD
FRm,FRn
FRn + FRm → FRn
1111nnnnmmmm0000
1
—
FSUB
FRm,FRn
FRn – FRm → FRn
1111nnnnmmmm0001
1
—
FMUL
FRm,FRn
FRn × FRm → FRn
1111nnnnmmmm0010
1
—
FDIV
FRm,FRn
FRn/FRm → FRn
1111nnnnmmmm0011
13
—
FCMP/EQ FRm,FRn
(FRn = FRm)?1:0 → T 1111nnnnmmmm0100
1
Comparison
result
FCMP/GT FRm,FRn
(FRn > FRm)?1:0 → T 1111nnnnmmmm0101
1
Comparison
result
FMOV.S @(R0,Rm),FRn
(R0 + Rm) → FRn
1111nnnnmmmm0110
1
—
FMOV.S FRm,@(R0,Rn)
FRm → (R0 + Rn)
1111nnnnmmmm0111
1
—
FMOV.S @Rm,FRn
(Rm) → FRn
1111nnnnmmmm1000
1
—
FMOV.S @Rm+,FRn
(Rm) → FRn, Rm + 4
→ Rm
1111nnnnmmmm1001
1
—
FMOV.S FRm,@Rn
FRm → (Rn)
1111nnnnmmmm1010
1
—
FMOV.S FRm,@-Rn
Rn – 4 → Rn, FRm →
(Rn)
1111nnnnmmmm1011
1
—
FMOV
FRm,FRn
FRm → FRn
1111nnnnmmmm1100
1
—
FMAC
FR0,FRm,FRn
FR0 × FRm + FRn→
FRn
1111nnnnmmmm1110
1
—
Notes: 1. Normal minimum number of execution states (the number in parentheses is the number
of states when there is contention with preceding/following instructions).
2. One state when it does not branch.
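The Code column of Table A.55 can be read as a bit pattern in which 0 and 1 are fixed bits and the letters n, m, d, and i mark operand fields. The Python sketch below matches a 16-bit instruction code against a few rows copied from the table and extracts the fields. The PATTERNS subset and the function name decode are hypothetical; a full decoder would need every row of the table.

```python
# A small subset of Table A.55 rows, keyed by the Code column.
PATTERNS = {
    "0000000000001001": "NOP",
    "0011nnnnmmmm1100": "ADD Rm,Rn",
    "0110nnnnmmmm0011": "MOV Rm,Rn",
    "0111nnnniiiiiiii": "ADD #imm,Rn",
    "1010dddddddddddd": "BRA label",
}

def decode(opcode):
    """Return (mnemonic, fields) for a matching pattern, or None."""
    bits = format(opcode & 0xFFFF, "016b")
    for pattern, name in PATTERNS.items():
        fields = {}
        for p, b in zip(pattern, bits):
            if p in "01":
                if p != b:
                    break              # fixed bit differs: try next pattern
            else:
                # Accumulate operand bits, most significant first.
                fields[p] = fields.get(p, 0) * 2 + int(b)
        else:
            return name, fields
    return None

add_example = decode(0x312C)   # 0011 0001 0010 1100
```

Field values are returned as raw unsigned integers; any sign extension or scaling of disp and imm is left to the caller.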
A.4
Operation Code Map
Table A.56 shows the operation code map.
Table A.56 Operation Code Map
Instruction Code
MSB
Fx: 0000
LSB MD: 00
0000 Rn
Fx
0000
0000 Rn
Fx
0001
0000 Rn
Fx
0010 STC
0000 Rn
Fx
0011 BSRF
0000 Rn
Rm 01MD MOV.B
Rm,@(R0,Rn)
SR,Rn
Fx: 0001
Fx: 0010
Fx: 0011–1111
MD: 01
MD: 10
MD: 11
STC
Rm
Rm
MOV.W
Rm,@(R0,Rn)
MOV.L
Rm,@(R0,Rn)
CLRMAC
1000 CLRT
SETT
0000 0000 Fx
1001 NOP
DIV0U
0000 0000 Fx
1010
0000 0000 Fx
1011 RTS
0000 Rn
Fx
1000
0000 Rn
Fx
1001
0000 Rn
Fx
1010 STS
MACH,Rn
0000 Rn
Fx
1011
0000 Rn
Rm 11MD MOV.B
@(R0,Rm),Rn
0001 Rn
Rm disp
0010 Rn
Rm 00MD MOV.B
0010 Rn
Rm 01MD MOV.B
Rm,@-Rn
0010 Rn
Rm 10MD TST
0010 Rn
Rm 11MD CMP/STR Rm,Rn
0011 Rn
Rm 00MD CMP/EQ
Rm,Rn
0011 Rn
Rm 01MD DIV1
Rm,Rn
MOV.L
VBR,Rn
BRAF
0000 0000 Fx
GBR,Rn STC
SLEEP
MUL.L Rm,Rn
RTE
MOVT
Rn
STS
MACL,Rn
STS
PR,Rn
MOV.W
@(R0,Rm),Rn
MOV.L
@(R0,Rm),Rn
STS
FPUL,Rn/
STS
FPSCR,Rn
MAC.L
@Rm+,@Rn+
Rm,@(disp:4,Rn)
Rm,@Rn
Rm,Rn
MOV.W
Rm,@Rn MOV.L
Rm,@Rn
MOV.W
Rm,@-Rn
MOV.L
Rm,@-Rn
DIV0S
Rm,Rn
AND
Rm,Rn
XOR
OR
Rm,Rn
XTRCT
Rm,Rn
MULU.W
Rm,Rn
MULS.W Rm,Rn
CMP/HS
Rm,Rn
CMP/GE Rm,Rn
CMP/HI
Rm,Rn
CMP/GT Rm,Rn
DMULU.L
Rm,Rn
Rm,Rn
Table A.56 Operation Code Map (cont)
Instruction Code
MSB
Fx: 0000
LSB MD: 00
Fx: 0001
Fx: 0010
Fx: 0011–1111
MD: 01
MD: 10
MD: 11
0011 Rn
Rm 10MD SUB
Rm,Rn
SUBC
Rm,Rn
SUBV
Rm,Rn
0011 Rn
Rm 11MD ADD
Rm,Rn DMULS.L
Rm,Rn
ADDC
Rm,Rn
ADDV
Rm,Rn
0100 Rn
Fx
0000 SHLL
Rn
DT
Rn
SHAL
Rn
0100 Rn
Fx
0001 SHLR
Rn
CMP/PZ Rn
SHAR
Rn
0100 Rn
Fx
0010 STS.L
MACH,@–Rn
STS.L
MACL,@–Rn
STS.L
PR,@–Rn
0100 Rn
00 0011 STC.L
MD
SR,@–Rn
STC.L
GBR,@–Rn
STC.L
VBR,@–Rn
0100 Rn
Fx
0100 ROTL
Rn
0100 Rn
Fx
0101 ROTR
Rn
0100 Rm
Fx
0100 Rm
STS.L
FPSCR,@-Rn
STS.L
FPUL,@-Rn
ROTCL
Rn
CMP/PL Rn
ROTCR
Rn
0110 LDS.L
@Rm+,MACH
LDS.L
@Rm+,MACL
LDS.L
@Rm+,PR
Fx
0111 LDC.L
@Rm+,SR
LDC.L
@Rm+,GBR
LDC.L
@Rm+,VBR
0100 Rn
Fx
1000 SHLL2
Rn
SHLL8
Rn
SHLL16 Rn
0100 Rn
Fx
1001 SHLR2
Rn
SHLR8
Rn
SHLR16 Rn
0100 Rm
Fx
1010 LDS
Rm,MACH
LDS
Rm,MACL LDS
Rm,PR
0100 Rm/ Fx
Rn
1011 JSR
@Rm
TAS.B
@Rn
JMP
@Rm
0100 Rm
Fx
1100
0100 Rm
Fx
1101
0100 Rm
Fx
1110 LDC
Rm,SR
LDC
Rm,GBR
LDC
Rm,VBR LDC
Rm,SSR
0100 Rn
Rm 1111 MAC.W @Rm+,@Rn+
0101 Rn
Rm disp
0110 Rn
Rm 00MD MOV.B @Rm,Rn
MOV.L
@Rm,Rn MOV
Rm,Rn
LDS.L
@Rm+,FPSCR
LDS.L
@Rm+,FPUL
LDS
Rm,FPSCR
LDS
Rm,FPUL
MOV.L @(disp:4,Rm),Rn
MOV.W
@Rm,Rn
Table A.56 Operation Code Map (cont)
Instruction Code
MSB
Fx: 0000
LSB MD: 00
Fx: 0001
Fx: 0010
Fx: 0011–1111
MD: 01
MD: 10
MD: 11
NOT
Rm,Rn
Rm,Rn NEG
Rm,Rn
0110 Rn
Rm
01MD MOV.B
@Rm+,Rn
MOV.W
@Rm+,Rn
MOV.L
@Rm+,Rn
0110 Rn
Rm
10MD SWAP.B
Rm,Rn
SWAP.W
Rm,Rn
NEGC
0110 Rn
Rm
11MD EXTU.B
Rm,Rn
EXTU.W
Rm,Rn
EXTS.B
Rm,Rn
0111 Rn
1000 00
MD
imm
Rn
disp
ADD
EXTS.W
Rm,Rn
#imm:8,Rn
MOV.B
MOV.W R0,
R0,@(disp:4,
@(disp:4,Rn)
Rn)
1000 01
MD
Rm
disp
MOV.B
@(disp:4,
Rm),R0
MOV.W
@(disp:4,
Rm),R0
1000 10
MD
imm/disp
CMP/EQ
#imm:8,R0
BT
disp:8
BF
1000 10
MD
imm/disp
BT/S
disp:8
BF/S disp:8
1001 Rn
disp
disp:8
MOV.W @(disp:8,PC),Rn
1010
disp
BRA
disp:12
1011
disp
BSR
disp:12
1100 00
MD
imm/disp
MOV.B
R0,@(disp:
8,GBR)
MOV.W
R0,@(disp:
8,GBR)
MOV.L
R0,@(disp:
8,GBR)
TRAPA #imm:8
1100 01
MD
disp
MOV.B
@(disp:8,
GBR),R0
MOV.W
@(disp:8,
GBR),R0
MOV.L
@(disp:8,
GBR),R0
MOVA
@(disp:8,
PC),R0
1100 10
MD
imm
TST
#imm:8,R0
AND
#imm:8,R0
XOR
#imm:8,R0
OR
#imm:8,R0
1100 11
MD
imm
TST.B
#imm:8,
@(R0,GBR)
AND.B
#imm:8,
@(R0,GBR)
XOR.B
#imm:8,
@(R0,GBR)
OR.B
#imm:8,
@(R0,GBR)
1101 Rn
disp
MOV.L @(disp:8,PC),Rn
1110 Rn
imm
MOV
#imm:8,Rn
1111
—
Floating-point instruction
Appendix B Pipeline Operation and Contention
The SH-2E is designed so that basic instructions are executed in one cycle. Two or more cycles
are required for instructions when, for example, the branch destination address is changed by a
branch instruction or when the number of cycles is increased by contention between MA and IF.
Table B.1 gives the number of execution cycles and stages for different types of contention and
their instructions. Instructions without contention and instructions that require 2 or more cycles
even without contention are also shown.
Instructions contend in the following ways:
CPU instructions
• Operations and transfers between registers are executed in one cycle with no contention.
• No contention occurs, but the instruction still requires 2 or more cycles.
• Contention occurs, increasing the number of execution cycles. Contention combinations are:
— MA contends with IF
— MA contends with IF and sometimes with memory loads as well
— MA contends with IF and sometimes with the multiplier as well
— MA contends with IF and sometimes with memory loads and sometimes with the multiplier
Floating-point instructions or FPU-related CPU instructions
• No contention occurs with the FCMP instruction.
• MA contends with IF in the case of store instructions involving FR0 to FR15 and FPUL.
• For floating-point operation instructions other than FDIV, floating-point register transfer
instructions, and floating-point register immediate instructions, contention occurs if an
instruction that reads from the destination of the instruction follows immediately after it.
• MA contends with IF in the case of load instructions involving FR0 to FR15 and FPUL. Also,
contention occurs if an instruction that reads from the destination of the instruction follows
immediately after it.
• Contention occurs if an instruction that uses Rn follows the STS FPUL,Rn or STS FPSCR,Rn
instruction.
• In the case of FPSCR load instructions, contention occurs as shown in Figure 8.11.
• In the case of FPSCR store instructions, contention occurs as shown in Figure 8.12, and MA
contends with IF.
• In the case of the FDIV instruction, contention occurs as shown in Figure 8.13.
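One of the floating-point rules above is that contention occurs when an instruction is immediately followed by one that reads its destination. The Python sketch below flags such adjacent pairs in a short instruction list. It is only an illustration: the tuple representation (mnemonic, destination, sources) and the function name find_read_after_write are hypothetical, and the sketch neither models pipeline stages nor computes how many cycles are lost.

```python
def find_read_after_write(program):
    """Return indices i where instruction i+1 reads the register written by instruction i."""
    hits = []
    for i in range(len(program) - 1):
        _, dest, _ = program[i]
        _, _, next_sources = program[i + 1]
        if dest is not None and dest in next_sources:
            hits.append(i)
    return hits

# Each entry: (mnemonic, destination register, source registers)
program = [
    ("FADD", "FR1", ("FR0", "FR1")),
    ("FMUL", "FR2", ("FR1", "FR2")),  # reads FR1 right after FADD writes it
    ("FMOV", "FR4", ("FR3",)),
    ("FNEG", "FR5", ("FR5",)),
]
hazards = find_read_after_write(program)
```

A check like this could guide instruction reordering, for example by moving an unrelated instruction between the FADD and the FMUL in the list above.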
Table B.1
Instructions and Their Contention Patterns
Contention
Cycles
Stages
Instructions
None
1
3
• Transfers between registers
• Operations between registers (except
when a multiplier is involved)
• Logical operations between registers
• Shift instructions
• System control ALU instructions
MA contends with IF
2
3
Unconditional branches
3/1
3
Conditional branches
3
3
SLEEP instruction
4
5
RTE instruction
8
9
TRAPA instruction
1
4
• Memory store instructions
• STS.L instruction (PR)
2
4
STC.L instruction
3
6
Memory logic operations
4
6
TAS instruction
MA contends with IF and
sometimes with memory loads
as well.
1
5
• Memory load instructions
3
5
LDC.L instruction
MA contends with IF and
sometimes with the multiplier
as well.
1
4
• Register to MAC transfer instructions
MA contends with IF and
sometimes with memory loads
and sometimes with the
multiplier.
• LDS.L instruction (PR)
• Memory to MAC transfer instructions
• MAC to memory transfer instructions
1 to 3*
6
Multiplication instructions
3/(2)*
7
Multiply/accumulate instructions
3/(2 to 4)*
9
Double length multiply/accumulate
instructions (SH-2 CPU only)
2 to 4*
9
Double length multiplication instructions
(SH-2 CPU only)
1
5
MAC to register transfer instructions
Note: * The normal minimum number of execution states. (The number in parentheses is the
number in contention with the preceding/following instructions.)
Table B.2
Types of Contention and Instruction Behavior (Floating-point Instructions or
FPU-related CPU Instructions)
Contention
Cycles
Stages
Instructions
None
1
3 (FPU pipeline)
3 (CPU pipeline)
FCMP/EQ FRm,FRn
FCMP/GT FRm,FRn
• MA in CPU pipeline contends with IF
1
4 (FPU pipeline)
4 (CPU pipeline)
STS.L FPUL,@-Rn
FMOV.S FRm,@Rn
FMOV.S FRm,@-Rn
FMOV.S FRm,@(R0,Rn)
• Contention occurs if next instruction reads destination register
1
5 (FPU pipeline)
3 (CPU pipeline)
FLDS FRm,FPUL
FMOV FRm,FRn
FSTS FPUL,FRn
FLDI0 FRn
FLDI1 FRn
FABS FRn
FADD FRm,FRn
FLOAT FPUL,FRn
FMAC FR0,FRm,FRn
FMUL FRm,FRn
FNEG FRn
FSUB FRm,FRn
FTRC FRm,FPUL
• Contention occurs if next instruction reads destination register
• MA in CPU pipeline contends with IF
1
5 (FPU pipeline)
4 (CPU pipeline)
LDS Rm,FPUL
LDS.L @Rm+,FPUL
FMOV.S @Rm,FRn
FMOV.S @Rm+,FRn
FMOV.S @(R0,Rm),FRn
• Contention occurs if next instruction uses Rn
• MA in CPU pipeline contends with IF
1
4 (FPU pipeline)
5 (CPU pipeline)
STS FPUL,Rn
• Contention occurs as shown in Figure 8.11
1
5 (FPU pipeline)
4 (CPU pipeline)
LDS Rm,FPSCR
LDS.L @Rm+,FPSCR
Table B.2
Types of Contention and Instruction Behavior (Floating-point Instructions or
FPU-related CPU Instructions) (cont)
Contention
Cycles
Stages
Instructions
• Contention occurs as shown in Figure 8.12
• Contention occurs if next instruction uses Rn
• MA in CPU pipeline contends with IF
1
4 (FPU pipeline)
5 (CPU pipeline)
STS FPSCR,Rn
• Contention occurs as shown in Figure 8.12
• MA in CPU pipeline contends with IF
1
4 (FPU pipeline)
4 (CPU pipeline)
STS.L FPSCR,@-Rn
• Contention occurs as shown in Figure 8.13
13
17 (FPU pipeline)
3 (CPU pipeline)
FDIV FRm,FRn