To all our customers

Regarding the change of names mentioned in the document, such as Hitachi Electric and Hitachi XX, to Renesas Technology Corp.

The semiconductor operations of Mitsubishi Electric and Hitachi were transferred to Renesas Technology Corporation on April 1st 2003. These operations include microcomputer, logic, analog and discrete devices, and memory chips other than DRAMs (flash memory, SRAMs etc.). Accordingly, although Hitachi, Hitachi, Ltd., Hitachi Semiconductors, and other Hitachi brand names are mentioned in the document, these names have in fact all been changed to Renesas Technology Corp. Thank you for your understanding. Except for our corporate trademark, logo and corporate statement, no changes whatsoever have been made to the contents of the document, and these changes do not constitute any alteration to the contents of the document itself.

Renesas Technology Home Page: http://www.renesas.com

Renesas Technology Corp.
Customer Support Dept.
April 1, 2003

Cautions

Keep safety first in your circuit designs!

1. Renesas Technology Corporation puts the maximum effort into making semiconductor products better and more reliable, but there is always the possibility that trouble may occur with them. Trouble with semiconductors may lead to personal injury, fire or property damage. Remember to give due consideration to safety when making your circuit designs, with appropriate measures such as (i) placement of substitutive, auxiliary circuits, (ii) use of nonflammable material or (iii) prevention against any malfunction or mishap.

Notes regarding these materials

1. These materials are intended as a reference to assist our customers in the selection of the Renesas Technology Corporation product best suited to the customer's application; they do not convey any license under any intellectual property rights, or any other rights, belonging to Renesas Technology Corporation or a third party.

2. Renesas Technology Corporation assumes no responsibility for any damage, or infringement of any third-party's rights, originating in the use of any product data, diagrams, charts, programs, algorithms, or circuit application examples contained in these materials.

3. All information contained in these materials, including product data, diagrams, charts, programs and algorithms, represents information on products at the time of publication of these materials, and is subject to change by Renesas Technology Corporation without notice due to product improvements or other reasons. It is therefore recommended that customers contact Renesas Technology Corporation or an authorized Renesas Technology Corporation product distributor for the latest product information before purchasing a product listed herein. The information described here may contain technical inaccuracies or typographical errors. Renesas Technology Corporation assumes no responsibility for any damage, liability, or other loss arising from these inaccuracies or errors. Please also pay attention to information published by Renesas Technology Corporation by various means, including the Renesas Technology Corporation Semiconductor home page (http://www.renesas.com).

4. When using any or all of the information contained in these materials, including product data, diagrams, charts, programs, and algorithms, please be sure to evaluate all information as a total system before making a final decision on the applicability of the information and products. Renesas Technology Corporation assumes no responsibility for any damage, liability or other loss resulting from the information contained herein.

5. Renesas Technology Corporation semiconductors are not designed or manufactured for use in a device or system that is used under circumstances in which human life is potentially at stake. Please contact Renesas Technology Corporation or an authorized Renesas Technology Corporation product distributor when considering the use of a product contained herein for any specific purposes, such as apparatus or systems for transportation, vehicular, medical, aerospace, nuclear, or undersea repeater use.

6. The prior written approval of Renesas Technology Corporation is necessary to reprint or reproduce in whole or in part these materials.

7. If these products or technologies are subject to the Japanese export control restrictions, they must be exported under a license from the Japanese government and cannot be imported into a country other than the approved destination. Any diversion or reexport contrary to the export control laws and regulations of Japan and/or the country of destination is prohibited.

8. Please contact Renesas Technology Corporation for further details on these materials or the products contained therein.

Hitachi SuperH™ RISC engine
SH-2E
Programming Manual

ADE-602-178
Rev.1.0
3/5/03
Hitachi, Ltd.

Cautions

1. Hitachi neither warrants nor grants licenses of any rights of Hitachi’s or any third party’s patent, copyright, trademark, or other intellectual property rights for information contained in this document. Hitachi bears no responsibility for problems that may arise with third party’s rights, including intellectual property rights, in connection with use of the information contained in this document.

2. Products and product specifications may be subject to change without notice. Confirm that you have received the latest product standards or specifications before final design, purchase or use.

3. Hitachi makes every attempt to ensure that its products are of high quality and reliability. However, contact Hitachi’s sales office before using the product in an application that demands especially high quality and reliability or where its failure or malfunction may directly threaten human life or cause risk of bodily injury, such as aerospace, aeronautics, nuclear power, combustion control, transportation, traffic, safety equipment or medical equipment for life support.

4. Design your application so that the product is used within the ranges guaranteed by Hitachi, particularly for maximum rating, operating supply voltage range, heat radiation characteristics, installation conditions and other characteristics. Hitachi bears no responsibility for failure or damage when used beyond the guaranteed ranges. Even within the guaranteed ranges, consider normally foreseeable failure rates or failure modes in semiconductor devices and employ systemic measures such as fail-safes, so that the equipment incorporating the Hitachi product does not cause bodily injury, fire or other consequential damage due to operation of the Hitachi product.

5. This product is not designed to be radiation resistant.

6. No one is permitted to reproduce or duplicate, in any form, the whole or part of this document without written approval from Hitachi.

7. Contact Hitachi’s sales office for any questions regarding this document or Hitachi semiconductor products.

Introduction

The SH-2E is a new generation of RISC microcomputers that integrate a RISC-type CPU and the peripheral functions required for system configuration onto a single chip to achieve high-performance operation. It can operate in a power-down state, which is an essential feature for portable equipment.

This CPU has a RISC-type instruction set. Basic instructions can be executed in one clock cycle, improving instruction execution speed. In addition, the CPU has a 32-bit internal architecture for enhanced data-processing ability.
In addition, the SH-2E supports single-precision floating point calculations as well as entirely PCAPI compatible emulation of double-precision floating point calculations. The SH-2E instructions are a subset of the floating point calculations conforming to the IEEE754 standard.

This programming manual describes in detail the instructions for the SH-2E Series and is intended as a reference on instruction operation and architecture. It also covers the pipeline operation, which is a feature of the SH-2E Series. For information on the hardware, please refer to the hardware manual for the product in question.

Contents

Section 1 Features
1.1 SH-2E Features

Section 2 Register Configuration
2.1 General Registers
2.2 Control Registers
2.3 System Registers
2.4 Floating-Point Registers
2.5 Floating-Point System Registers
2.6 Initial Values of Registers

Section 3 Data Formats
3.1 Data Format in Registers
3.2 Data Format in Memory
3.3 Immediate Data Format

Section 4 Floating-Point Unit (FPU)
4.1 Overview
4.2 Floating-Point Registers and Floating-Point System Registers
4.2.1 Floating-Point Register File
4.2.2 Floating-Point Communication Register (FPUL)
4.2.3 Floating-Point Status/Control Register (FPSCR)
4.3 Floating-Point Format
4.3.1 Floating-Point Format
4.3.2 Non-Numbers (NaN)
4.3.3 Denormalized Number Values
4.3.4 Other Special Values
4.4 Floating-Point Exception Model
4.4.1 Enable State Exceptions
4.4.2 Disable State Exceptions
4.4.3 FPU Exception Event and Code
4.4.4 Floating-Point Data Arrangement in Memory
4.4.5 Arithmetic Operations Involving Special Operands
4.5 Synchronization with CPU

Section 5 Instruction Features
5.1 RISC-Type Instruction Set
5.2 Addressing Modes
5.3 Instruction Format

Section 6 Instruction Set
6.1 Instruction Set by Classification
6.2 Instruction Set in Alphabetical Order

Section 7 Instruction Descriptions
7.1 Sample Description (Name): Classification
7.2 CPU Instruction
7.2.1 ADD (ADD Binary): Arithmetic Instruction
7.2.2 ADDC (ADD with Carry): Arithmetic Instruction
7.2.3 ADDV (ADD with V Flag Overflow Check): Arithmetic Instruction
7.2.4 AND (AND Logical): Logic Operation Instruction
7.2.5 BF (Branch if False): Branch Instruction
7.2.6 BF/S (Branch if False with Delay Slot): Branch Instruction
7.2.7 BRA (Branch): Branch Instruction
7.2.8 BRAF (Branch Far): Branch Instruction
7.2.9 BSR (Branch to Subroutine): Branch Instruction
7.2.10 BSRF (Branch to Subroutine Far): Branch Instruction
7.2.11 BT (Branch if True): Branch Instruction
7.2.12 BT/S (Branch if True with Delay Slot): Branch Instruction
7.2.13 CLRMAC (Clear MAC Register): System Control Instruction
7.2.14 CLRT (Clear T Bit): System Control Instruction
7.2.15 CMP/cond (Compare Conditionally): Arithmetic Instruction
7.2.16 DIV0S (Divide Step 0 as Signed): Arithmetic Instruction
7.2.17 DIV0U (Divide Step 0 as Unsigned): Arithmetic Instruction
7.2.18 DIV1 (Divide 1 Step): Arithmetic Instruction
7.2.19 DMULS.L (Double-Length Multiply as Signed): Arithmetic Instruction
7.2.20 DMULU.L (Double-Length Multiply as Unsigned): Arithmetic Instruction
7.2.21 DT (Decrement and Test): Arithmetic Instruction
7.2.22 EXTS (Extend as Signed): Arithmetic Instruction
7.2.23 EXTU (Extend as Unsigned): Arithmetic Instruction
7.2.24 JMP (Jump): Branch Instruction
7.2.25 JSR (Jump to Subroutine): Branch Instruction (Class: Delayed Branch Instruction)
7.2.26 LDC (Load to Control Register): System Control Instruction (Class: Interrupt Disabled Instruction)
7.2.27 LDS (Load to System Register): System Control Instruction
7.2.28 MAC.L (Multiply and Accumulate Calculation Long): Arithmetic Instruction
7.2.29 MAC.W (Multiply and Accumulate Calculation Word): Arithmetic Instruction
7.2.30 MOV (Move Data): Data Transfer Instruction
7.2.31 MOV (Move Immediate Data): Data Transfer Instruction
7.2.32 MOV (Move Peripheral Data): Data Transfer Instruction
7.2.33 MOV (Move Structure Data): Data Transfer Instruction
7.2.34 MOVA (Move Effective Address): Data Transfer Instruction
7.2.35 MOVT (Move T Bit): Data Transfer Instruction
7.2.36 MUL.L (Multiply Long): Arithmetic Instruction
7.2.37 MULS.W (Multiply as Signed Word): Arithmetic Instruction
7.2.38 MULU.W (Multiply as Unsigned Word): Arithmetic Instruction
7.2.39 NEG (Negate): Arithmetic Instruction
7.2.40 NEGC (Negate with Carry): Arithmetic Instruction
7.2.41 NOP (No Operation): System Control Instruction
7.2.42 NOT (NOT - Logical Complement): Logic Operation Instruction
7.2.43 OR (OR Logical): Logic Operation Instruction
7.2.44 ROTCL (Rotate with Carry Left): Shift Instruction
7.2.45 ROTCR (Rotate with Carry Right): Shift Instruction
7.2.46 ROTL (Rotate Left): Shift Instruction
7.2.47 ROTR (Rotate Right): Shift Instruction
7.2.48 RTE (Return from Exception): System Control Instruction
7.2.49 RTS (Return from Subroutine): Branch Instruction (Class: Delayed Branch Instruction)
7.2.50 SETT (Set T Bit): System Control Instruction
7.2.51 SHAL (Shift Arithmetic Left): Shift Instruction
7.2.52 SHAR (Shift Arithmetic Right): Shift Instruction
7.2.53 SHLL (Shift Logical Left): Shift Instruction
7.2.54 SHLLn (Shift Logical Left n Bits): Shift Instruction
7.2.55 SHLR (Shift Logical Right): Shift Instruction
7.2.56 SHLRn (Shift Logical Right n Bits): Shift Instruction
7.2.57 SLEEP (Sleep): System Control Instruction
7.2.58 STC (Store Control Register): System Control Instruction (Interrupt Disabled Instruction)
7.2.59 STS (Store System Register): System Control Instruction (Interrupt Disabled Instruction)
7.2.60 SUB (Subtract Binary): Arithmetic Instruction
7.2.61 SUBC (Subtract with Carry): Arithmetic Instruction
7.2.62 SUBV (Subtract with V Flag Underflow Check): Arithmetic Instruction
7.2.63 SWAP (Swap Register Halves): Data Transfer Instruction
7.2.64 TAS (Test and Set): Logic Operation Instruction
7.2.65 TRAPA (Trap Always): System Control Instruction
7.2.66 TST (Test Logical): Logic Operation Instruction
7.2.67 XOR (Exclusive OR Logical): Logic Operation Instruction
7.2.68 XTRCT (Extract): Data Transfer Instruction
7.3 Floating Point Instructions and FPU Related CPU Instructions
7.3.1 FABS (Floating Point Absolute Value): Floating Point Instruction
7.3.2 FADD (Floating Point Add): Floating Point Instruction
7.3.3 FCMP (Floating Point Compare): Floating Point Instruction
7.3.4 FDIV (Floating Point Divide): Floating Point Instruction
7.3.5 FLDI0 (Floating Point Load Immediate 0): Floating Point Instruction
7.3.6 FLDI1 (Floating Point Load Immediate 1): Floating Point Instruction
7.3.7 FLDS (Floating Point Load to System Register): Floating Point Instruction
7.3.8 FLOAT (Floating Point Convert from Integer): Floating Point Instruction
7.3.9 FMAC (Floating Point Multiply Accumulate): Floating Point Instruction
7.3.10 FMOV (Floating Point Move): Floating Point Instruction
7.3.11 FMUL (Floating Point Multiply): Floating Point Instruction
7.3.12 FNEG (Floating Point Negate): Floating Point Instruction
7.3.13 FSTS (Floating Point Store From System Register): Floating Point Instruction
7.3.14 FSUB (Floating Point Subtract): Floating Point Instruction
7.3.15 FTRC (Floating Point Truncate And Convert To Integer): Floating Point Instruction
7.3.16 LDS (Load to System Register): FPU Related CPU Instruction
7.3.17 STS (Store from FPU System Register): FPU Related CPU Instruction

Section 8 Pipeline Operation
8.1 Basic Configuration of Pipelines
8.2 Slot and Pipeline Flow
8.3 Number of Instruction Execution Cycles
8.4 Contention between Instruction Fetch (IF) and Memory Access (MA)
8.5 Relationship between Load Instructions and the Instructions that Follow
8.6 FPU Contention
8.7 Programming Guide
8.8 Operation of Instruction Pipelines
8.8.1 Data Transfer Instructions
8.8.2 Arithmetic Instructions
8.8.3 Logic Operation Instructions
8.8.4 Shift Instructions
8.8.5 Branch Instructions
8.8.6 System Control Instructions
8.8.7 Exception Processing
8.8.8 Relationship between Floating-point Instructions and FPU-related CPU Instructions

Appendix A Instruction Code
A.1 Instruction Set by Addressing Mode
A.1.1 No Operand
A.1.2 Direct Register Addressing
A.1.3 Indirect Register Addressing
A.1.4 Post-Increment Indirect Register Addressing
A.1.5 Pre-Decrement Indirect Register Addressing
A.1.6 Indirect Register Addressing with Displacement
A.1.7 Indirect Indexed Register Addressing
A.1.8 Indirect GBR Addressing with Displacement
A.1.9 Indirect Indexed GBR Addressing
A.1.10 PC Relative Addressing with Displacement
A.1.11 PC Relative Addressing
A.1.12 Immediate
A.2 Instruction Sets by Instruction Format
A.2.1 0 Format
A.2.2 n Format
A.2.3 m Format
A.2.4 nm Format
A.2.5 md Format
A.2.6 nd4 Format
A.2.7 nmd Format
A.2.8 d Format
A.2.9 d12 Format
A.2.10 nd8 Format
A.2.11 i Format
A.2.12 ni Format
A.3 Instruction Set by Instruction Code
A.4 Operation Code Map

Appendix B Pipeline Operation and Contention

Section 1 Features

1.1 SH-2E Features

The SH-2E CPU has RISC-type instruction sets. Basic instructions are executed in one clock cycle, which dramatically improves instruction execution speed. The CPU also has an internal 32-bit architecture for enhanced data processing ability. Table 1.1 lists the SH-2E CPU features.
Table 1.1 SH-2E CPU Features Item Architecture Feature • Original Hitachi architecture • 32-bit internal data bus General-register machine • Sixteen 32-bit general registers • Three 32-bit control registers • Four 32-bit system registers • Sixteen 32-bit froating-point registers • Two 32-bit froating point system registers Instruction set • Instruction length: 16-bit fixed length for improved code efficiency • Load-store architecture (basic arithmetic and logic operations are executed between registers) • Delayed branch system used for reduced pipeline disruption • Instruction set optimized for C language Instruction execution time • One instruction/cycle for basic instructions Address space • Architecture makes 4 Gbytes available On-chip multiplier • Multiplication operations executed in 1 to 2 cycles (16 bits × 16 bits → 32 bits) or 2 to 4 cycles (32 bits × 32 bits → 64 bits), and multiplication/accumulation operations executed in 3/(2)*cycles (16 bits × 16 bits + 64 bits → 64 bits) or 3/(2 to 4)* cycles (32 bits × 32 bits + 64 bits → 64 bits) Pipeline • Five-stage pipeline Processing states • Reset state • Exception processing state • Program execution state • Power-down state • Bus release state Power-down states • Sleep mode • Standby mode 1 Table 1.1 SH-2E CPU Features (cont) Feature Description FPU • Single-precision floating point format • Subset of IEEE754 standard data types • Invalid calculation exception and divide-by-zero exception (in compliance with IEEE754 standard) • Rounding to zero (in compliance with IEEE754 standard) • General purpose register file, 16 32-bit floating point registers • Execution pitch for basic instructions: 1 cycle/latency or 2 cycles (FADD, FSUB, FMUL) • FMAC (floating point multiply accumulate) Execution pitch: 1 cycle/latency or 2 cycles • Support for FDIV • Support for FLDI0 and FLDI1 (load constant 0/1) Note: The normal minimum number of execution cycles The number in parentheses in the mumber in contention with 
preceding/following instructions. 2 Section 2 Register Configuration The register set consists of sixteen 32-bit general registers, three 32-bit control registers and four 32-bit system registers. 2.1 General Registers There are 16 general registers (Rn) numbered R0–R15, which are 32 bits in length. General registers are used for data processing and address calculation. R0 is also used as an index register. Several instructions use R0 as a fixed source or destination register. R15 is used as the hardware stack pointer (SP). Saving and recovering the status register (SR) and program counter (PC) in exception processing is accomplished by referencing the stack using R15. 31 0 R0 * 1 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R13 R14 R15, SP (hardware stack pointer) *2 Notes: 1. R0 functions as an index register in the indirect indexed register addressing mode and indirect indexed GBR addressing mode. In some instructions, R0 functions as a fixed source register or destination register. 2. R15 functions as a hardware stack pointer (SP) during exception processing. Figure 2.1 General Registers (SH-1 and SH-2) 3 2.2 Control Registers The 32-bit control registers consist of the 32-bit status register (SR), global base register (GBR), and vector base register (VBR). The status register indicates processing states. The global base register functions as a base address for the indirect GBR addressing mode to transfer data to the registers of on-chip peripheral modules. The vector base register functions as the base address of the exception processing vector area (including interrupts). 31 SR 9 8 7 6 5 4 32 1 0 M Q I3 I2 I1 I0 ST SR: Status register T bit: The MOVT, CMP/cond, TAS, TST, BT (BT/S), BF (BF/S), SETT, and CLRT instructions use the T bit to indicate true (1) or false (0). 
The ADDV/C, SUBV/C, DIV0U/S, DIV1, NEGC, SHAR/L, SHLR/L, ROTR/L, and ROTCR/L instructions also use the T bit to indicate carry/borrow or overflow/underflow.

S bit: Used by the multiply/accumulate instruction.

Reserved bits: Always read as 0, and should always be written with 0.

Bits I3–I0: Interrupt mask bits.

M and Q bits: Used by the DIV0U/S and DIV1 instructions.

Global base register (GBR, bits 31 to 0): Indicates the base address of the indirect GBR addressing mode. The indirect GBR addressing mode is used in data transfer for on-chip peripheral module register areas and in logic operations.

Vector base register (VBR, bits 31 to 0): Indicates the base address of the exception processing vector area.

Figure 2.2 Control Registers

2.3 System Registers

System registers consist of four 32-bit registers: high and low multiply and accumulate registers (MACH and MACL), the procedure register (PR), and the program counter (PC). The multiply and accumulate registers store the results of multiply and multiply and accumulate operations. The procedure register stores the return address from the subroutine procedure. The program counter indicates the address of the program executing and controls the flow of the processing.

MACH (bits 31 to 0): Multiply and accumulate register high
MACL (bits 31 to 0): Multiply and accumulate register low
PR (bits 31 to 0): Procedure register
PC (bits 31 to 0): Program counter

Figure 2.3 Organization of the System Registers

2.4 Floating-Point Registers

There are sixteen 32-bit floating-point registers, designated FR0 to FR15, which are used by floating-point instructions. FR0 functions as the index register for the FMAC instruction. These registers are incorporated into the floating-point unit (FPU). For details, see section 4, Floating-Point Unit.

Registers (each bits 31 to 0): FR0 to FR15. FR0 functions as the index register for the FMAC instruction.

Figure 2.4 Floating-Point Registers

2.5 Floating-Point System Registers

There are two 32-bit floating-point system registers: the floating-point communication register (FPUL) and the floating-point status/control register (FPSCR). FPUL is used for communication between the CPU and the floating-point unit (FPU). FPSCR indicates and stores status/control information relating to FPU exceptions. These registers are incorporated into the floating-point unit (FPU). For details, see section 4, Floating-Point Unit.

FPUL (bits 31 to 0): Floating-point communication register. Used for communication between the CPU and the FPU.
FPSCR (bits 31 to 0): Floating-point status/control register. Indicates and stores status/control information relating to FPU exceptions.

Figure 2.5 Floating-Point System Registers

2.6 Initial Values of Registers

Table 2.1 lists the values of the registers after reset.

Table 2.1 Initial Values of Registers

Classification | Register | Initial Value
General registers | R0–R14 | Undefined
General registers | R15 (SP) | Value of the stack pointer in the vector address table
Control registers | SR | Bits I3–I0 are 1111 (H'F), reserved bits are 0, and other bits are undefined
Control registers | GBR | Undefined
Control registers | VBR | H'00000000
System registers | MACH, MACL, PR | Undefined
System registers | PC | Value of the program counter in the vector address table
Floating-point registers | FR0–FR15 | Undefined
Floating-point system registers | FPUL | Undefined
Floating-point system registers | FPSCR | H'00040001

Section 3 Data Formats

3.1 Data Format in Registers

Register operands are always longwords (32 bits). When data in memory is loaded to a register and the memory operand is only a byte (8 bits) or a word (16 bits), it is sign-extended into a longword when stored into the register.

Longword (bits 31 to 0)

Figure 3.1 Data Format in Registers

3.2 Data Format in Memory

Memory data formats are classified into bytes, words, and longwords.
Byte data can be accessed from any address, but an address error will occur if you try to access word data starting from an address other than 2n or longword data starting from an address other than 4n. In such cases, the data accessed cannot be guaranteed. The hardware stack area, which is referred to by the hardware stack pointer (SP, R15), uses only longword data starting from address 4n because this area stores the program counter (PC) and status register (SR). See the hardware manual for more information on address errors.

Memory layout (bits 31 to 0): four bytes at addresses m, m + 1, m + 2, and m + 3; two words, each starting at an address 2n; one longword starting at address 4n.

Figure 3.2 Data Format in Memory

3.3 Immediate Data Format

Byte immediate data is located in an instruction code. Immediate data accessed by the MOV, ADD, and CMP/EQ instructions is sign-extended and is handled in registers as longword data. Immediate data accessed by the TST, AND, OR, and XOR instructions is zero-extended and is handled as longword data. Consequently, AND instructions with immediate data always clear the upper 24 bits of the destination register.

Word or longword immediate data is not located in the instruction code but rather is stored in a memory table. The memory table is accessed by an immediate data transfer instruction (MOV) using the PC relative addressing mode with displacement. Specific examples are given in 5.1, Immediate Data, in section 5, Instruction Features.

Section 4 Floating-Point Unit (FPU)

4.1 Overview

The SH-2E has an on-chip floating-point unit (FPU). The FPU’s register configuration is shown in figure 4.1.

Floating-point registers (each bits 31 to 0): FR0 to FR15. FR0 functions as the index register for the FMAC instruction.

Floating-point system registers (each bits 31 to 0):
FPUL: Floating-point communication register. Specifies buffer as communication register between CPU and FPU*.
FPSCR: Floating-point status/control register. Indicates status/control information relating to FPU exceptions*.

Note: * For details, see section 4.2, Floating-Point Registers and Floating-Point System Registers.

Figure 4.1 Overview of Register Configuration (Floating-Point Registers and Floating-Point System Registers)

4.2 Floating-Point Registers and Floating-Point System Registers

4.2.1 Floating-Point Register File

The SH-2E has sixteen 32-bit single-precision floating-point registers. Register specifications are always made as 4 bits. In assembly language, the floating-point registers are specified as FR0, FR1, FR2, and so on. FR0 functions as the index register for the FMAC instruction.

4.2.2 Floating-Point Communication Register (FPUL)

Data is transferred between the FPU and the CPU via the FPUL communication register, which resembles MACL and MACH in the integer unit. The SH-2E is provided with this communication register since the integer and floating-point formats are different. The 32-bit FPUL is a system register, and is accessed by the CPU by means of LDS and STS instructions.

4.2.3 Floating-Point Status/Control Register (FPSCR)

The SH-2E has a floating-point status/control register (FPSCR) that functions as a system register accessed by means of LDS and STS instructions (figure 4.2). FPSCR can be written to by a user program. This register is part of the process context, and must be saved when the context is switched. It may also be necessary to save this register when a procedure call is made.

FPSCR is a 32-bit register that controls the storage of detailed information relating to the rounding mode, asymptotic underflow (denormalized numbers), and FPU exceptions. The module stop bit that disables the FPU itself is provided in the module standby control register (MSTCR). For details, refer to the hardware manual. After a reset start, the FPU is enabled.

Table 4.1 shows the flags corresponding to the five kinds of FPU exception.
A sixth flag is also provided as an FPU error flag that indicates a floating-point unit error state not covered by the other five flags.

Table 4.1 Floating-Point Exception Flags

Flag | Meaning | Support in SH-2E
E | FPU error | —
V | Invalid operation | Yes
Z | Division by zero | Yes
O | Overflow (value not expressed) | —
U | Underflow (value not expressed) | —
I | Inexact (result not expressed) | —

The bits in the cause field indicate the exception cause for the instruction executing at the time. The cause bits are modified by a floating-point instruction. These bits are set to 1 or cleared to 0 according to whether or not an exception state occurred during execution of a single instruction.

The bits in the enable field specify the kinds of exception to be enabled, allowing the flow to be changed to exception processing. If the cause bit corresponding to an enable bit is set by the currently executing instruction, an exception occurs.

The bits in the flag field are used to keep a tally of all exceptions that occur during a series of instructions. Once one of these bits is set by an instruction, it is not reset by a subsequent instruction. The bits in this field can only be reset by the explicit execution of a store operation on FPSCR.

FPSCR bit layout (bits 31 to 0):
Bits 31 to 19: Reserved
Bit 18: DN
Bits 17 to 12 (cause field): CE, CV, CZ, CO, CU, CI
Bits 11 to 7 (enable field): EV, EZ, EO, EU, EI
Bits 6 to 2 (flag field): FV, FZ, FO, FU, FI
Bits 1 to 0: RM

DN: Denormalized bit
In the SH-2E this bit is always set to 1, and a denormalized source or destination operand is treated as 0. This bit cannot be modified even by an LDS instruction.

CV: Invalid operation cause bit
When 1: Indicates that an invalid operation exception occurred during execution of the current instruction.
When 0: Indicates that an invalid operation exception has not occurred.

CZ: Division-by-zero cause bit
When 1: Indicates that a division-by-zero exception occurred during execution of the current instruction.
When 0: Indicates that a division-by-zero exception has not occurred.
EV: Invalid operation exception enable
When 1: Enables invalid operation exception generation.
When 0: An invalid operation exception is not generated, and a qNaN is returned as the result.

EZ: Division-by-zero exception enable
When 1: Enables exception generation due to division-by-zero during execution of the current instruction.
When 0: A division-by-zero exception is not generated, and infinity with the sign (+ or –) of the current expression is returned as the result.

FV: Invalid operation exception flag bit
When 1: Indicates that an invalid operation exception occurred during instruction execution.
When 0: Indicates that an invalid operation exception has not occurred.

FZ: Division-by-zero exception flag bit
When 1: Indicates that a division-by-zero exception occurred during instruction execution.
When 0: Indicates that a division-by-zero exception has not occurred.

RM: Rounding bits
In the SH-2E, the value of these bits is always 01, meaning that rounding to zero (RZ mode) is being used. These bits cannot be modified even by an LDS instruction.

In the SH-2E, the cause field EOUI bits (CE, CO, CU, and CI), enable field OUI bits (EO, EU, and EI), and flag field OUI bits (FO, FU, and FI), and the reserved area, are preset to 0, and cannot be modified even by using an LDS instruction.

Figure 4.2 Floating-Point Status/Control Register

4.3 Floating-Point Format

4.3.1 Floating-Point Format

The SH-2E supports single-precision floating-point operations, and fully complies with the IEEE754 floating-point standard. A floating-point number consists of the following three fields:
• Sign (s)
• Exponent (e)
• Fraction (f)

The exponent is expressed in biased form, as follows:

$e = E + \mathrm{bias}$

The range of unbiased exponent $E$ is $E_{min} - 1$ to $E_{max} + 1$. The two values $E_{min} - 1$ and $E_{max} + 1$ are distinguished as follows.
$E_{min} - 1$ indicates zero (both positive and negative sign) and a denormalized number, and $E_{max} + 1$ indicates positive or negative infinity or a non-number (NaN). In a single-precision operation, the bias value is 127, $E_{min}$ is –126, and $E_{max}$ is 127.

Bit 31: s; bits 30 to 23: e; bits 22 to 0: f

Figure 4.3 Floating-Point Number Format

Floating-point number value v is determined as follows:
• If $E = E_{max} + 1$ and $f \neq 0$, v is a non-number (NaN) irrespective of sign s
• If $E = E_{max} + 1$ and $f = 0$, $v = (-1)^s \cdot \infty$ [positive or negative infinity]
• If $E_{min} \leq E \leq E_{max}$, $v = (-1)^s \cdot 2^{E} \cdot (1.f)$ [normalized number]
• If $E = E_{min} - 1$ and $f \neq 0$, $v = (-1)^s \cdot 2^{E_{min}} \cdot (0.f)$ [denormalized number]
• If $E = E_{min} - 1$ and $f = 0$, $v = (-1)^s \cdot 0$ [positive or negative zero]

4.3.2 Non-Numbers (NaN)

With non-number (NaN) representation in a single-precision operation value, at least one of bits 22 to 0 is set. If bit 22 is set, this indicates a signaling NaN (sNaN). If bit 22 is reset, the value is a quiet NaN (qNaN).

The bit pattern of a non-number (NaN) is shown in the figure below. Bit N in the figure is set for a signaling NaN and reset for a quiet NaN. x indicates a don’t care bit (with the proviso that at least one of bits 22 to 0 is set). In a non-number (NaN), the sign bit is a don’t care bit.

Bit 31: x; bits 30 to 23: 11111111; bit 22: N; bits 21 to 0: xxxxxxxxxxxxxxxxxxxxxx
N = 1: sNaN
N = 0: qNaN

Figure 4.4 NaN Bit Pattern

If a non-number (sNaN) is input in an operation that generates a floating-point value:
• When the EV bit in the FPSCR register is reset, the operation result (output) is a quiet NaN (qNaN).
• When the EV bit in the FPSCR register is set, an invalid operation exception will be generated. In this case, the contents of the operation destination register do not change.

If a quiet NaN is input in an operation that generates a floating-point value, and a signaling NaN has not been input in that operation, the output will always be a quiet NaN irrespective of the setting of the EV bit in the FPSCR register.
An exception will not be generated in this case.

Refer to section 7, Instruction Descriptions for details of floating-point operations when a non-number (NaN) is input.

4.3.3 Denormalized Number Values

For a denormalized number floating-point value, the biased exponent is expressed as 0, the fraction as a non-zero value, and the hidden bit as 0. In the SH-2E’s floating-point unit, a denormalized number (operand source or operation result) is always flushed to 0 in a floating-point operation that generates a value (an operation other than copy).

4.3.4 Other Special Values

Floating-point value representations include the seven different kinds of special values shown in table 4.2.

Table 4.2 Representation of Special Values in Single-Precision Floating-Point Operations Specified by IEEE754 Standard

Value | Representation
+0.0 | 0x00000000
–0.0 | 0x80000000
Denormalized number | As described in 4.3.3, Denormalized Number Values
+INF | 0x7F800000
–INF | 0xFF800000
qNaN (quiet NaN) | As described in 4.3.2, Non-Numbers (NaN)
sNaN (signaling NaN) | As described in 4.3.2, Non-Numbers (NaN)

4.4 Floating-Point Exception Model

4.4.1 Enable State Exceptions

Invalid operation and division-by-zero exceptions are both placed in the enable state by setting the enable bit. All exceptions generated by the FPU are mapped as the same exception event. The meaning of a particular exception is determined by software by reading system register FPSCR and analyzing the information held there.

4.4.2 Disable State Exceptions

If the EV enable bit is not set, a qNaN will be generated as the result of an invalid operation (except for FCMP and FTRC). If the EZ enable bit is not set, division-by-zero will return infinity with the sign (+ or –) of the current expression. Overflow will generate a finite number which is the largest value that can be expressed by an absolute value in the format, with the correct sign. Underflow will generate zero with the correct sign.
If the operation result is inexact, the destination register will store that inexact result.

4.4.3 FPU Exception Event and Code

All FPU exceptions are handled as the same general exception event, an FPU exception, whose vector table address offset is H'00000034.

4.4.4 Floating-Point Data Arrangement in Memory

Single-precision floating-point data is located in memory at a 4-byte boundary; that is, it is arranged in the same form as an SH-2E long integer.

4.4.5 Arithmetic Operations Involving Special Operands

All arithmetic operations involving special operands (qNaN, sNaN, +INF, –INF, +0, –0) comply with the specifications of the IEEE754 standard. Refer to section 7, Instruction Descriptions for details.

4.5 Synchronization with CPU

Synchronization with CPU: Floating-point instructions and CPU instructions are executed in turn, according to their order in the program, but in some cases operations may not be completed in the program order due to a difference in execution cycles. When a floating-point instruction accesses only FPU resources, there is no need for synchronization with the CPU, and a CPU instruction following an FPU instruction can finish its operation before completion of the FPU operation. Consequently, in an optimized program, it is possible to effectively conceal the execution cycles of a floating-point instruction that takes many cycles to execute, such as a divide instruction. On the other hand, a floating-point instruction that accesses CPU resources, such as a compare instruction, must be synchronized to ensure that the program order is observed.

Floating-Point Instructions That Require Synchronization: Load, store, and compare instructions, and instructions that access the FPUL or FPSCR register, must be synchronized because they access CPU resources. Load and store instructions access a general register. Post-increment load and pre-decrement store instructions change the contents of a general register.
A compare instruction modifies the T bit. An FPUL or FPSCR access instruction references or changes the contents of the FPUL or FPSCR register. These references and changes must all be synchronized with the CPU.

Section 5 Instruction Features

5.1 RISC-Type Instruction Set

All instructions are RISC type. Their features are detailed in this section.

16-Bit Fixed Length: All instructions are 16 bits long, increasing program coding efficiency.

One Instruction/Cycle: Basic instructions can be executed in one cycle using the pipeline system. Instructions are executed in 50 ns at 40 MHz.

Data Length: Longword is the standard data length for all operations. Memory can be accessed in bytes, words, or longwords. Byte or word data accessed from memory is sign-extended and calculated with longword data. Immediate data is sign-extended for arithmetic operations or zero-extended for logic operations. It also is calculated with longword data.

Table 5.1 Sign Extension of Word Data

SH-2E CPU:
    MOV.W @(disp,PC),R1
    ADD R1,R0
    .........
    .DATA.W H'1234
Description: Data is sign-extended to 32 bits, and R1 becomes H'00001234. It is next operated upon by an ADD instruction.
Example for Other CPU:
    ADD.W #H'1234,R0

Note: The address of the immediate data is accessed by @(disp, PC).

Load-Store Architecture: Basic operations are executed between registers. For operations that involve memory access, data is loaded to the registers and executed (load-store architecture). Instructions such as AND that manipulate bits, however, are executed directly in memory.

Delayed Branch Instructions: Unconditional branch instructions are delayed. Pipeline disruption during branching is reduced by first executing the instruction that follows the branch instruction, and then branching (table 5.2). With delayed branching, branching occurs after execution of the slot instruction. However, instructions such as register changes etc.
are executed in the order of delayed branch instruction, then delay slot instruction. For example, even if the register in which the branch destination address has been loaded is changed by the delay slot instruction, the branch will still be made using the value of the register prior to the change as the branch destination address.

Table 5.2 Delayed Branch Instructions

SH-2E CPU:
    BRA TRGET
    ADD R1,R0
Description: Executes an ADD before branching to TRGET.
Example for Other CPU:
    ADD.W R1,R0
    BRA TRGET

Multiplication/Accumulation Operation: 16-bit × 16-bit → 32-bit multiplication operations are executed in one to two cycles. 16-bit × 16-bit + 64-bit → 64-bit multiplication/accumulation operations are executed in two to three cycles. 32-bit × 32-bit → 64-bit multiplication and 32-bit × 32-bit + 64-bit → 64-bit multiplication/accumulation operations are executed in two to four cycles.

T Bit: The T bit in the status register changes according to the result of the comparison, and in turn is the condition (true/false) that determines if the program will branch. The number of instructions that change the T bit in the status register is kept to a minimum to improve the processing speed.

Table 5.3 T Bit

SH-2E CPU:
    CMP/GE R1,R0
    BT TRGET0
    BF TRGET1
Description: T bit is set when R0 ≥ R1. The program branches to TRGET0 when R0 ≥ R1 and to TRGET1 when R0 < R1.
Example for Other CPU:
    CMP.W R1,R0
    BGE TRGET0
    BLT TRGET1

SH-2E CPU:
    ADD #–1,R0
    CMP/EQ #0,R0
    BT TRGET
Description: T bit is not changed by ADD. T bit is set when R0 = 0. The program branches if R0 = 0.
Example for Other CPU:
    SUB.W #1,R0
    BEQ TRGET

Immediate Data: Byte immediate data is located in instruction code. Word or longword immediate data is not input via instruction codes but is stored in a memory table. The memory table is accessed by an immediate data transfer instruction (MOV) using the PC relative addressing mode with displacement.
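The immediate-data rules can be illustrated with a small Python model. This is only a sketch under stated assumptions, not part of the instruction set definition: the function names `sign_extend8`, `zero_extend8`, and `pc_relative_address` are hypothetical, and the `pc` argument is assumed to be whatever PC value the manual's formulas refer to at the time the instruction executes. The formulas themselves (sign extension for MOV, ADD, and CMP/EQ; zero extension for TST, AND, OR, and XOR; PC + disp × 2 for words; PC & H'FFFFFFFC + disp × 4 for longwords) are taken from this manual.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def sign_extend8(imm):
    """Sign-extend an 8-bit immediate to 32 bits (MOV, ADD, CMP/EQ)."""
    imm &= 0xFF
    return (imm | 0xFFFFFF00) if imm & 0x80 else imm


def zero_extend8(imm):
    """Zero-extend an 8-bit immediate to 32 bits (TST, AND, OR, XOR)."""
    return imm & 0xFF


def pc_relative_address(pc, disp, size):
    """Effective address of @(disp:8, PC) for a word (2) or longword (4).

    disp is zero-extended, then doubled or quadrupled. For a longword,
    the lowest two bits of the PC are masked first.
    """
    disp &= 0xFF
    if size == 2:
        return (pc + disp * 2) & MASK32
    if size == 4:
        return ((pc & 0xFFFFFFFC) + disp * 4) & MASK32
    raise ValueError("size must be 2 (word) or 4 (longword)")


# Hypothetical values, chosen only to exercise the formulas.
print(hex(sign_extend8(0xFF)))                   # H'FF as a MOV immediate
print(hex(zero_extend8(0xFF)))                   # H'FF as an AND immediate
print(hex(pc_relative_address(0x1002, 3, 4)))    # longword table entry
```

The longword case shows why the memory table entry must sit on a 4n address: the masked PC plus a multiple of 4 is always longword-aligned.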
Table 5.4 Immediate Data Accessing

8-bit immediate
SH-2E CPU:
    MOV #H'12,R0
Example for Other CPU:
    MOV.B #H'12,R0

16-bit immediate
SH-2E CPU:
    MOV.W @(disp,PC),R0
    .................
    .DATA.W H'1234
Example for Other CPU:
    MOV.W #H'1234,R0

32-bit immediate
SH-2E CPU:
    MOV.L @(disp,PC),R0
    .................
    .DATA.L H'12345678
Example for Other CPU:
    MOV.L #H'12345678,R0

Note: The address of the immediate data is accessed by @(disp, PC).

Absolute Address: When data is accessed by absolute address, the value of the absolute address is placed in the memory table in advance. Loading the immediate data when the instruction is executed transfers that value to the register and the data is accessed in the indirect register addressing mode.

Table 5.5 Absolute Address

Absolute address
SH-2E CPU:
    MOV.L @(disp,PC),R1
    MOV.B @R1,R0
    ..................
    .DATA.L H'12345678
Example for Other CPU:
    MOV.B @H'12345678,R0

16-Bit/32-Bit Displacement: When data is accessed by 16-bit or 32-bit displacement, the displacement value is placed in the memory table in advance. Loading the immediate data when the instruction is executed transfers that value to the register and the data is accessed in the indirect indexed register addressing mode.

Table 5.6 Displacement Accessing

16-bit displacement
SH-2E CPU:
    MOV.W @(disp,PC),R0
    MOV.W @(R0,R1),R2
    ..................
    .DATA.W H'1234
Example for Other CPU:
    MOV.W @(H'1234,R1),R2

5.2 Addressing Modes

Addressing modes and effective address calculation by the CPU core are described below.

Table 5.7 Addressing Modes and Effective Addresses

• Direct register addressing
  Instruction format: Rn
  Effective address calculation: The effective address is register Rn. (The operand is the contents of register Rn.)
  Formula: —

• Indirect register addressing
  Instruction format: @Rn
  Effective address calculation: The effective address is the content of register Rn.
  Formula: Rn

• Post-increment indirect register addressing
  Instruction format: @Rn+
  Effective address calculation: The effective address is the content of register Rn. A constant is added to the content of Rn after the instruction is executed. 1 is added for a byte operation, 2 for a word operation, or 4 for a longword operation.
  Formula: Rn (After the instruction is executed) Byte: Rn + 1 → Rn; Word: Rn + 2 → Rn; Longword: Rn + 4 → Rn

• Pre-decrement indirect register addressing
  Instruction format: @–Rn
  Effective address calculation: The effective address is the value obtained by subtracting a constant from Rn. 1 is subtracted for a byte operation, 2 for a word operation, or 4 for a longword operation.
  Formula: Byte: Rn – 1 → Rn; Word: Rn – 2 → Rn; Longword: Rn – 4 → Rn (Instruction executed with Rn after calculation)

• Indirect register addressing with displacement
  Instruction format: @(disp:4, Rn)
  Effective address calculation: The effective address is Rn plus a 4-bit displacement (disp). The value of disp is zero-extended, and remains the same for a byte operation, is doubled for a word operation, or is quadrupled for a longword operation.
  Formula: Byte: Rn + disp; Word: Rn + disp × 2; Longword: Rn + disp × 4

• Indirect indexed register addressing
  Instruction format: @(R0, Rn)
  Effective address calculation: The effective address is the Rn value plus R0.
  Formula: Rn + R0

• Indirect GBR addressing with displacement
  Instruction format: @(disp:8, GBR)
  Effective address calculation: The effective address is the GBR value plus an 8-bit displacement (disp). The value of disp is zero-extended, and remains the same for a byte operation, is doubled for a word operation, or is quadrupled for a longword operation.
  Formula: Byte: GBR + disp; Word: GBR + disp × 2; Longword: GBR + disp × 4

• Indirect indexed GBR addressing
  Instruction format: @(R0, GBR)
  Effective address calculation: The effective address is the GBR value plus R0.
  Formula: GBR + R0

• PC relative addressing with displacement
  Instruction format: @(disp:8, PC)
  Effective address calculation: The effective address is the PC value plus an 8-bit displacement (disp). The value of disp is zero-extended, and disp is doubled for a word operation, or is quadrupled for a longword operation. For a longword operation, the lowest two bits of the PC are masked.
  Formula: Word: PC + disp × 2; Longword: PC & H'FFFFFFFC + disp × 4

• PC relative addressing
  Instruction format: disp:8
  Effective address calculation: The effective address is the 8-bit displacement (disp), sign-extended and doubled, added to the PC value.
  Formula: PC + disp × 2

  Instruction format: disp:12
  Effective address calculation: The effective address is the 12-bit displacement (disp), sign-extended and doubled, added to the PC value.
  Formula: PC + disp × 2

  Instruction format: Rn
  Effective address calculation: The effective address is the register PC plus Rn.
  Formula: PC + Rn

• Immediate addressing
  Instruction format: #imm:8
  Effective address calculation: The 8-bit immediate data (imm) for the TST, AND, OR, and XOR instructions is zero-extended.
  Formula: —

  Instruction format: #imm:8
  Effective address calculation: The 8-bit immediate data (imm) for the MOV, ADD, and CMP/EQ instructions is sign-extended.
  Formula: —

  Instruction format: #imm:8
  Effective address calculation: Immediate data (imm) for the TRAPA instruction is zero-extended and is quadrupled.
  Formula: —

5.3 Instruction Format

The instruction format table, table 5.8, refers to the source operand and the destination operand. The meaning of the operand depends on the instruction code.
The symbols are used as follows:
• xxxx: Instruction code
• mmmm: Source register
• nnnn: Destination register
• iiii: Immediate data
• dddd: Displacement

Table 5.8 Instruction Formats

Each format is shown with its bit layout (bit 15 to bit 0), followed by rows of: Source Operand | Destination Operand | Example

0 format: xxxx xxxx xxxx xxxx
— | — | NOP

n format: xxxx nnnn xxxx xxxx
— | nnnn: Direct register | MOVT Rn
Control register or system register | nnnn: Direct register | STS MACH,Rn
Control register or system register | nnnn: Indirect pre-decrement register | STC.L SR,@-Rn

m format: xxxx mmmm xxxx xxxx
mmmm: Direct register | Control register or system register | LDC Rm,SR
mmmm: Indirect post-increment register | Control register or system register | LDC.L @Rm+,SR
mmmm: Direct register | — | JMP @Rm
mmmm: PC relative using Rm* | — | BRAF Rm

nm format: xxxx nnnn mmmm xxxx
mmmm: Direct register | nnnn: Direct register | ADD Rm,Rn
mmmm: Direct register | nnnn: Indirect register | MOV.L Rm,@Rn
mmmm: Indirect post-increment register (multiply/accumulate), nnnn*: Indirect post-increment register (multiply/accumulate) | MACH, MACL | MAC.W @Rm+,@Rn+
mmmm: Indirect post-increment register | nnnn: Direct register | MOV.L @Rm+,Rn
mmmm: Direct register | nnnn: Indirect pre-decrement register | MOV.L Rm,@-Rn
mmmm: Direct register | nnnn: Indirect indexed register | MOV.L Rm,@(R0,Rn)

md format: xxxx xxxx mmmm dddd
mmmmdddd: Indirect register with displacement | R0 (Direct register) | MOV.B @(disp,Rm),R0

nd4 format: xxxx xxxx nnnn dddd
R0 (Direct register) | nnnndddd: Indirect register with displacement | MOV.B R0,@(disp,Rn)

nmd format: xxxx nnnn mmmm dddd
mmmm: Direct register | nnnndddd: Indirect register with displacement | MOV.L Rm,@(disp,Rn)
mmmmdddd: Indirect register with displacement | nnnn: Direct register | MOV.L @(disp,Rm),Rn

d format: xxxx xxxx dddd dddd
dddddddd: Indirect GBR with displacement | R0 (Direct register) | MOV.L @(disp,GBR),R0
R0 (Direct register) | dddddddd: Indirect GBR with displacement | MOV.L R0,@(disp,GBR)
dddddddd: PC relative with displacement | R0 (Direct register) | MOVA @(disp,PC),R0
dddddddd: PC relative | — | BF label

d12 format: xxxx dddd dddd dddd
dddddddddddd: PC relative | — | BRA label (label = disp + PC)

nd8 format: xxxx nnnn dddd dddd
dddddddd: PC relative with displacement | nnnn: Direct register | MOV.L @(disp,PC),Rn

i format: xxxx xxxx iiii iiii
iiiiiiii: Immediate | Indirect indexed GBR | AND.B #imm,@(R0,GBR)
iiiiiiii: Immediate | R0 (Direct register) | AND #imm,R0
iiiiiiii: Immediate | — | TRAPA #imm

ni format: xxxx nnnn iiii iiii
iiiiiiii: Immediate | nnnn: Direct register | ADD #imm,Rn

Note: * In multiply/accumulate instructions, nnnn is the source register.

Section 6 Instruction Set

6.1 Instruction Set by Classification

Table 6.1 lists the instructions by classification.

Table 6.1 Classification of Instructions

Data transfer (Types: 5; No. of Instructions: 39)
MOV: Data transfer, immediate data transfer, peripheral module data transfer, structure data transfer
MOVA: Effective address transfer
MOVT: T bit transfer
SWAP: Swap of upper and lower bytes
XTRCT: Extraction of the middle of registers connected

Arithmetic operations (Types: 21; No. of Instructions: 33)
ADD: Binary addition
ADDC: Binary addition with carry
ADDV: Binary addition with overflow check
CMP/cond: Comparison
DIV1: Division
DIV0S: Initialization of signed division
DIV0U: Initialization of unsigned division
DMULS: Signed double-length multiplication
DMULU: Unsigned double-length multiplication
DT: Decrement and test
EXTS: Sign extension
EXTU: Zero extension
MAC: Multiply-and-accumulate, double-length multiply-and-accumulate operation
MUL: Double-length multiply operation
MULS: Signed multiplication
MULU: Unsigned multiplication
NEG: Negation
NEGC: Negation with borrow
SUB: Binary subtraction
SUBC: Binary subtraction with borrow
SUBV: Binary subtraction with underflow

Logic operations (Types: 6; No. of Instructions: 14)
AND: Logical AND
NOT: Bit inversion
OR: Logical OR
TAS: Memory test and bit set
TST: Logical AND and T bit set
XOR: Exclusive OR

Shift (Types: 10; No. of Instructions: 14)
ROTL: One-bit left rotation
ROTR: One-bit right rotation
ROTCL: One-bit left rotation with T bit
ROTCR: One-bit right rotation with T bit
SHAL: One-bit arithmetic left shift
SHAR: One-bit arithmetic right shift
SHLL: One-bit logical left shift
SHLLn: n-bit logical left shift
SHLR: One-bit logical right shift
SHLRn: n-bit logical right shift

Branch (Types: 9; No. of Instructions: 11)
BF: Conditional branch, conditional branch with delay (Branch when T = 0)
BT: Conditional branch, conditional branch with delay (Branch when T = 1)
BRA: Unconditional branch
BRAF: Unconditional branch
BSR: Branch to subroutine procedure
BSRF: Branch to subroutine procedure
JMP: Unconditional branch
JSR: Branch to subroutine procedure
RTS: Return from subroutine procedure

System control (Types: 11; No. of Instructions: 31)
CLRT: T bit clear
CLRMAC: MAC register clear
LDC: Load to control register
LDS: Load to system register
NOP: No operation
RTE: Return from exception processing
SETT: T bit set
SLEEP: Transition to power-down mode
STC: Store control register data
STS: Store system register data
TRAPA: Trap exception handling

Floating-point instructions (Types: 15; No. of Instructions: 22)
FABS: Floating-point absolute value
FADD: Floating-point addition
FCMP: Floating-point comparison
FDIV: Floating-point division
FLDI0: Floating-point load immediate 0
FLDI1: Floating-point load immediate 1
FLDS: Floating-point load into system register FPUL
FLOAT: Integer-to-floating-point conversion
FMAC: Floating-point multiply-and-accumulate operation
FMOV: Floating-point data transfer
FMUL: Floating-point multiplication
FNEG: Floating-point sign inversion
FSTS: Floating-point store from system register FPUL
FSUB: Floating-point subtraction
FTRC: Floating-point conversion with rounding to integer

FPU-related CPU instructions (Types: 2; No. of Instructions: 8)
LDS: Load into floating-point system register
STS: Store from floating-point system register

Total: 79 types, 172 instructions

Table 6.2 shows the format used in tables 6.3 to 6.8, which list instruction codes, operation, and execution states in order by classification.
Table 6.2 Instruction Code Format

Item              Format          Explanation
Instruction       OP.Sz SRC,DEST  OP:   Operation code
                                  Sz:   Size (B: byte, W: word, or L: longword)
                                  SRC:  Source
                                  DEST: Destination
                                  Rm:   Source register
                                  Rn:   Destination register
                                  imm:  Immediate data
                                  disp: Displacement*1
Instruction code  MSB ↔ LSB       mmmm: Source register
                                  nnnn: Destination register
                                        0000: R0
                                        0001: R1
                                          ⋅
                                          ⋅
                                          ⋅
                                        1111: R15
                                  iiii: Immediate data
                                  dddd: Displacement
Operation         →, ←            Direction of transfer
                  (xx)            Memory operand
                  M/Q/T           Flag bits in the SR
                  &               Logical AND of each bit
                  |               Logical OR of each bit
                  ^               Exclusive OR of each bit
                  ~               Logical NOT of each bit
                  <<n             n-bit left shift
                  >>n             n-bit right shift
Execution cycles  —               Value when no wait states are inserted*2
T bit             —               Value of T bit after instruction is executed. An em-dash (—) in the column means no change.

Notes: 1. Depending on the operand size, displacement is scaled ×1, ×2, or ×4. For details, see section 7, Instruction Descriptions.
       2. Instruction execution cycles: The execution cycles shown in the table are minimums. The actual number of cycles may be increased when (1) contention occurs between instruction fetches and data access, or (2) when the destination register of the load instruction (memory → register) and the register used by the next instruction are the same.
34 Table 6.3 Data Transfer Instructions Execution Cycles T Bit Instruction Instruction Code Operation MOV #imm,Rn 1110nnnniiiiiiii imm → Sign extension → Rn 1 — MOV.W @(disp,PC),Rn 1001nnnndddddddd (disp × 2 + PC) → Sign extension → Rn 1 — MOV.L @(disp,PC),Rn 1101nnnndddddddd (disp × 4 + PC) → Rn 1 — MOV 0110nnnnmmmm0011 Rm → Rn 1 — MOV.B Rm,@Rn 0010nnnnmmmm0000 Rm → (Rn) 1 — MOV.W Rm,@Rn 0010nnnnmmmm0001 Rm → (Rn) 1 — MOV.L Rm,@Rn 0010nnnnmmmm0010 Rm → (Rn) 1 — MOV.B @Rm,Rn 0110nnnnmmmm0000 (Rm) → Sign extension → Rn 1 — MOV.W @Rm,Rn 0110nnnnmmmm0001 (Rm) → Sign extension → Rn 1 — MOV.L @Rm,Rn 0110nnnnmmmm0010 (Rm) → Rn 1 — MOV.B Rm,@–Rn 0010nnnnmmmm0100 Rn–1 → Rn, Rm → (Rn) 1 — MOV.W Rm,@–Rn 0010nnnnmmmm0101 Rn–2 → Rn, Rm → (Rn) 1 — MOV.L Rm,@–Rn 0010nnnnmmmm0110 Rn–4 → Rn, Rm → (Rn) 1 — MOV.B @Rm+,Rn 0110nnnnmmmm0100 (Rm) → Sign extension → Rn,Rm + 1 → Rm 1 — MOV.W @Rm+,Rn 0110nnnnmmmm0101 (Rm) → Sign extension → Rn,Rm + 2 → Rm 1 — MOV.L @Rm+,Rn 0110nnnnmmmm0110 (Rm) → Rn,Rm + 4 → Rm 1 — MOV.B R0,@(disp,Rn) 10000000nnnndddd R0 → (disp + Rn) 1 — MOV.W R0,@(disp,Rn) 10000001nnnndddd R0 → (disp × 2 + Rn) 1 — MOV.L Rm,@(disp,Rn) 0001nnnnmmmmdddd Rm → (disp × 4 + Rn) 1 — MOV.B @(disp,Rm),R0 10000100mmmmdddd (disp + Rm) → Sign extension → R0 1 — MOV.W @(disp,Rm),R0 10000101mmmmdddd (disp × 2 + Rm) → Sign extension → R0 1 — MOV.L @(disp,Rm),Rn 0101nnnnmmmmdddd (disp × 4 + Rm) → Rn 1 — MOV.B Rm,@(R0,Rn) 0000nnnnmmmm0100 Rm → (R0 + Rn) 1 — Rm,Rn 35 Table 6.3 Data Transfer Instructions (cont) Instruction Instruction Code Operation Execution Cycles MOV.W Rm,@(R0,Rn) 0000nnnnmmmm0101 Rm → (R0 + Rn) 1 — MOV.L Rm,@(R0,Rn) 0000nnnnmmmm0110 Rm → (R0 + Rn) 1 — MOV.B @(R0,Rm),Rn 0000nnnnmmmm1100 (R0 + Rm) → Sign extension → Rn 1 — MOV.W @(R0,Rm),Rn 0000nnnnmmmm1101 (R0 + Rm) → Sign extension → Rn 1 — MOV.L @(R0,Rm),Rn 0000nnnnmmmm1110 (R0 + Rm) → Rn 1 — MOV.B R0,@(disp,GBR) 11000000dddddddd R0 → (disp + GBR) 1 — MOV.W R0,@(disp,GBR) 11000001dddddddd R0 → (disp × 2 + GBR) 1 — 
MOV.L R0,@(disp,GBR) 11000010dddddddd R0 → (disp × 4 + GBR) 1 — MOV.B @(disp,GBR),R0 11000100dddddddd (disp + GBR) → Sign extension → R0 1 — MOV.W @(disp,GBR),R0 11000101dddddddd (disp × 2 + GBR) → Sign extension → R0 1 — MOV.L @(disp,GBR),R0 11000110dddddddd (disp × 4 + GBR) → R0 1 — MOVA @(disp,PC),R0 11000111dddddddd disp × 4 + PC → R0 1 — MOVT Rn 0000nnnn00101001 T → Rn 1 — SWAP.B Rm,Rn 0110nnnnmmmm1000 Rm → Swap bottom two bytes → Rn 1 — SWAP.W Rm,Rn 0110nnnnmmmm1001 Rm → Swap two consecutive words → Rn 1 — XTRCT 0010nnnnmmmm1101 Rm: Middle 32 bits of Rn → Rn 1 — 36 Rm,Rn T Bit Table 6.4 Arithmetic Operation Instructions Instruction Instruction Code Operation Execution Cycles ADD Rm,Rn 0011nnnnmmmm1100 Rn + Rm → Rn 1 — ADD #imm,Rn 0111nnnniiiiiiii Rn + imm → Rn 1 — ADDC Rm,Rn 0011nnnnmmmm1110 Rn + Rm + T → Rn, Carry → T 1 Carry ADDV Rm,Rn 0011nnnnmmmm1111 Rn + Rm → Rn, Overflow → T 1 Overflow CMP/EQ #imm,R0 10001000iiiiiiii If R0 = imm, 1 → T 1 Comparison result CMP/EQ Rm,Rn 0011nnnnmmmm0000 If Rn = Rm, 1 → T 1 Comparison result CMP/HS Rm,Rn 0011nnnnmmmm0010 If RnRm with unsigned data, 1 → T 1 Comparison result CMP/GE Rm,Rn 0011nnnnmmmm0011 If Rn Rm with signed data, 1 → T 1 Comparison result CMP/HI Rm,Rn 0011nnnnmmmm0110 If Rn > Rm with unsigned data, 1 → T 1 Comparison result CMP/GT Rm,Rn 0011nnnnmmmm0111 If Rn > Rm with signed data, 1 → T 1 Comparison result CMP/PL Rn 0100nnnn00010101 If Rn > 0, 1 → T 1 Comparison result CMP/PZ Rn 0100nnnn00010001 If Rn 0, 1 → T 1 Comparison result CMP/STR Rm,Rn 0010nnnnmmmm1100 If Rn and Rm have an equivalent byte, 1→T 1 Comparison result DIV1 Rm,Rn 0011nnnnmmmm0100 Single-step division (Rn ÷ Rm) 1 Calculation result DIV0S Rm,Rn 0010nnnnmmmm0111 MSB of Rn → Q, MSB of Rm → M, M ^ Q → T 1 Calculation result 0000000000011001 0 → M/Q/T 1 0 DIV0U T Bit 37 Table 6.4 Arithmetic Operation Instructions (cont) Execution Cycles T Bit Instruction Instruction Code Operation DMULS.L Rm,Rn 0011nnnnmmmm1101 Signed operation of Rn × Rm → 
MACH, MACL 32 × 32 → 64 bits   2 to 4*   —
DMULU.L Rm,Rn     0011nnnnmmmm0101  Unsigned operation of Rn × Rm → MACH, MACL 32 × 32 → 64 bits        2 to 4*      —
DT Rn             0100nnnn00010000  Rn – 1 → Rn, when Rn is 0, 1 → T. When Rn is nonzero, 0 → T         1            Comparison result
EXTS.B Rm,Rn      0110nnnnmmmm1110  Byte in Rm is sign-extended → Rn                                     1            —
EXTS.W Rm,Rn      0110nnnnmmmm1111  Word in Rm is sign-extended → Rn                                     1            —
EXTU.B Rm,Rn      0110nnnnmmmm1100  Byte in Rm is zero-extended → Rn                                     1            —
EXTU.W Rm,Rn      0110nnnnmmmm1101  Word in Rm is zero-extended → Rn                                     1            —
MAC.L @Rm+,@Rn+   0000nnnnmmmm1111  Signed operation of (Rn) × (Rm) + MAC → MAC 32 × 32 + 64 → 64 bits   3/(2 to 4)*  —
MAC.W @Rm+,@Rn+   0100nnnnmmmm1111  Signed operation of (Rn) × (Rm) + MAC → MAC 16 × 16 + 64 → 64 bits   3/(2)*       —
MUL.L Rm,Rn       0000nnnnmmmm0111  Rn × Rm → MACL, 32 × 32 → 32 bits                                    2 to 4*      —
MULS.W Rm,Rn      0010nnnnmmmm1111  Signed operation of Rn × Rm → MAC 16 × 16 → 32 bits                  1 to 3*      —
MULU.W Rm,Rn      0010nnnnmmmm1110  Unsigned operation of Rn × Rm → MAC 16 × 16 → 32 bits                1 to 3*      —
NEG Rm,Rn         0110nnnnmmmm1011  0 – Rm → Rn                                                          1            —
NEGC Rm,Rn        0110nnnnmmmm1010  0 – Rm – T → Rn, Borrow → T                                          1            Borrow
SUB Rm,Rn         0011nnnnmmmm1000  Rn – Rm → Rn                                                         1            —
SUBC Rm,Rn        0011nnnnmmmm1010  Rn – Rm – T → Rn, Borrow → T                                         1            Borrow
SUBV Rm,Rn        0011nnnnmmmm1011  Rn – Rm → Rn, Underflow → T                                          1            Underflow

Note: * The normal minimum number of execution cycles. (The number in parentheses is the number of cycles when there is contention with following instructions.)
39 Table 6.5 Logic Operation Instructions Instruction Instruction Code Operation Execution Cycles AND Rm,Rn 0010nnnnmmmm1001 Rn & Rm → Rn 1 — AND #imm,R0 11001001iiiiiiii R0 & imm → R0 1 — AND.B #imm,@(R0,GBR) 11001101iiiiiiii (R0 + GBR) & imm → (R0 + GBR) 3 — NOT Rm,Rn 0110nnnnmmmm0111 ~Rm → Rn 1 — OR Rm,Rn 0010nnnnmmmm1011 Rn | Rm → Rn 1 — OR #imm,R0 11001011iiiiiiii R0 | imm → R0 1 — OR.B #imm,@(R0,GBR) 11001111iiiiiiii (R0 + GBR) | imm → (R0 + GBR) 3 — TAS.B @Rn 0100nnnn00011011 If (Rn) is 0, 1 → T; 1 → MSB of (Rn) 4 Test result TST Rm,Rn 0010nnnnmmmm1000 Rn & Rm; if the result is 0, 1 → T 1 Test result TST #imm,R0 11001000iiiiiiii R0 & imm; if the result is 0, 1 → T 1 Test result TST.B #imm,@(R0,GBR) 11001100iiiiiiii (R0 + GBR) & imm; if the 3 result is 0, 1 → T Test result XOR Rm,Rn 0010nnnnmmmm1010 Rn ^ Rm → Rn 1 — XOR #imm,R0 11001010iiiiiiii R0 ^ imm → R0 1 — XOR.B #imm,@(R0,GBR) 11001110iiiiiiii (R0 + GBR) ^ imm → (R0 + GBR) 3 — 40 T Bit Table 6.6 Shift Instructions Instruction Instruction Code Operation Execution Cycles ROTL Rn 0100nnnn00000100 T ← Rn ← MSB 1 MSB ROTR Rn 0100nnnn00000101 LSB → Rn → T 1 LSB ROTCL Rn 0100nnnn00100100 T ← Rn ← T 1 MSB ROTCR Rn 0100nnnn00100101 T → Rn → T 1 LSB SHAL Rn 0100nnnn00100000 T ← Rn ← 0 1 MSB SHAR Rn 0100nnnn00100001 MSB → Rn → T 1 LSB SHLL Rn 0100nnnn00000000 T ← Rn ← 0 1 MSB SHLR Rn 0100nnnn00000001 0 → Rn → T 1 LSB SHLL2 Rn 0100nnnn00001000 Rn<<2 → Rn 1 — SHLR2 Rn 0100nnnn00001001 Rn>>2 → Rn 1 — SHLL8 Rn 0100nnnn00011000 Rn<<8 → Rn 1 — SHLR8 Rn 0100nnnn00011001 Rn>>8 → Rn 1 — SHLL16 Rn 0100nnnn00101000 Rn<<16 → Rn 1 — SHLR16 Rn 0100nnnn00101001 Rn>>16 → Rn 1 — T Bit 41 Table 6.7 Branch Instructions Execution Cycles T Bit Instruction Instruction Code Operation BF label 10001011dddddddd If T = 0, disp × 2 + PC → PC; if T = 1, nop 3/1* — BF/S label 10001111dddddddd Delayed branch, if T = 0, disp × 2 + PC → PC; if T = 1, nop 3/1* — BT label 10001001dddddddd If T = 1, disp × 2 + PC → PC; if T = 0, nop 3/1* — BT/S 
label 10001101dddddddd Delayed branch, if T = 1, disp × 2 + PC → PC; if T = 0, nop 2/1* — BRA 1010dddddddddddd Delayed branch, disp × 2 + PC → PC 2 — BRAF Rm 0000mmmm00100011 Delayed branch, Rm + PC → PC 2 — BSR 1011dddddddddddd Delayed branch, PC → PR, disp × 2 + PC → PC 2 — BSRF Rm 0000mmmm00000011 Delayed branch, PC → PR, Rm + PC → PC 2 — JMP @Rm 0100mmmm00101011 Delayed branch, Rm → PC 2 — JSR @Rm 0100mmmm00001011 Delayed branch, PC → PR, Rm → PC 2 — 0000000000001011 Delayed branch, PR → PC 2 — RTS label label Note: * One state when the program does not branch. 42 Table 6.8 System Control Instructions Instruction Instruction Code Operation Execution Cycles CLRT 0000000000001000 0→T 1 0 CLRMAC 0000000000101000 0 → MACH, MACL 1 — T Bit LDC Rm,SR 0100mmmm00001110 Rm → SR 1 LSB LDC Rm,GBR 0100mmmm00011110 Rm → GBR 1 — LDC Rm,VBR 0100mmmm00101110 Rm → VBR 1 — LDC.L @Rm+,SR 0100mmmm00000111 (Rm) → SR, Rm + 4 → Rm 3 LSB LDC.L @Rm+,GBR 0100mmmm00010111 (Rm) → GBR, Rm + 4 → Rm 3 — LDC.L @Rm+,VBR 0100mmmm00100111 (Rm) → VBR, Rm + 4 → Rm 3 — LDS Rm,MACH 0100mmmm00001010 Rm → MACH 1 — LDS Rm,MACL 0100mmmm00011010 Rm → MACL 1 — LDS Rm,PR 0100mmmm00101010 Rm → PR 1 — LDS.L @Rm+,MACH 0100mmmm00000110 (Rm) → MACH, Rm + 4 → Rm 1 — LDS.L @Rm+,MACL 0100mmmm00010110 (Rm) → MACL, Rm + 4 → Rm 1 — LDS.L @Rm+,PR 0100mmmm00100110 (Rm) → PR, Rm + 4 → Rm 1 — NOP 0000000000001001 No operation 1 — RTE 0000000000101011 Delayed branch, stack area → PC/SR 4 — SETT 0000000000011000 1→T 1 1 SLEEP 0000000000011011 Sleep 3* — STC SR,Rn 0000nnnn00000010 SR → Rn 1 — STC GBR,Rn 0000nnnn00010010 GBR → Rn 1 — STC VBR,Rn 0000nnnn00100010 VBR → Rn 1 — STC.L SR,@–Rn 0100nnnn00000011 Rn – 4 → Rn, SR → (Rn) 2 — STC.L GBR,@–Rn 0100nnnn00010011 Rn – 4 → Rn, GBR → (Rn) 2 — STC.L VBR,@–Rn 0100nnnn00100011 Rn – 4 → Rn, BR → (Rn) 2 — STS MACH,Rn 0000nnnn00001010 MACH → Rn 1 — STS MACL,Rn 0000nnnn00011010 MACL → Rn 1 — STS PR,Rn 0000nnnn00101010 PR → Rn 1 — 43 Table 6.8 System Control Instructions (cont) 
Instruction Instruction Code Operation Execution Cycles STS.L MACH,@–Rn 0100nnnn00000010 Rn – 4 → Rn, MACH → (Rn) 1 — STS.L MACL,@–Rn 0100nnnn00010010 Rn – 4 → Rn, MACL → (Rn) 1 — STS.L PR,@–Rn 0100nnnn00100010 Rn – 4 → Rn, PR → (Rn) 1 — TRAPA #imm 11000011iiiiiiii PC/SR → stack area, imm × 4 + VBR → PC 8 — T Bit Note: * The number of execution cycles before the chip enters sleep mode: The execution cycles shown in the table are minimums. The actual number of cycles may be increased when (1) contention occurs between instruction fetches and data access, or (2) when the destination register of the load instruction (memory → register) and the register used by the next instruction are the same. 44 Table 6.9 Floating-Point Instructions Instruction Instruction Code Operation Execution Cycles T Bit FABS FRn 1111nnnn01011101 |FRn| → FRn 1 — FADD FRm,FRn 1111nnnnmmmm0000 FRn + FRm → FRn 1 — FCMP/EQ FRm,FRn 1111nnnnmmmm0100 (FRn = FRm)? 1:0 → T 1 Comparison result FCMP/GT FRm,FRn 1111nnnnmmmm0101 (FRn > FRm)? 
1:0 → T 1 Comparison result FDIV FRm,FRn 1111nnnnmmmm0011 FRn/FRm → FRn 13 — FLDI0 FRn 1111nnnn10001101 0x00000000 → FRn 1 — FLDI1 FRn 1111nnnn10011101 0x3F800000 → FRn 1 — FLDS FRm,FPUL 1111mmmm00011101 FRm → FPUL 1 — FLOAT FPUL,FRn 1111nnnn00101101 (float) FPUL → FRn 1 — FMAC FR0,FRm,FRn 1111nnnnmmmm1110 FR0 × FRm + FRn → FRn 1 — FMOV FRm, FRn 1111nnnnmmmm1100 FRm → FRn 1 — FMOV.S @(R0,Rm),FRn 1111nnnnmmmm0110 (R0 + Rm) → FRn 1 — FMOV.S @Rm+,FRn 1111nnnnmmmm1001 (Rm) → FRn, Rm+ = 4 1 — FMOV.S @Rm,FRn 1111nnnnmmmm1000 (Rm) → FRn 1 — FMOV.S FRm,@(R0,Rn) 1111nnnnmmmm0111 FRm → (R0 + Rn) 1 — FMOV.S FRm,@-Rn 1111nnnnmmmm1011 Rn– = 4, FRm → (Rn) 1 — FMOV.S FRm,@Rn 1111nnnnmmmm1010 FRm → (Rn) 1 — FMUL FRm,FRn 1111nnnnmmmm0010 FRn × FRm → FRn 1 — FNEG FRn 1111nnnn01001101 –FRn → FRn 1 — FSTS FPUL,FRn 1111nnnn00001101 FPUL → FRn 1 — FSUB FRm,FRn 1111nnnnmmmm0001 FRn – FRm → FRn 1 — FTRC FRm,FPUL 1111nnnn00111101 (long) FRm → FPUL 1 — 45 Table 6.10 FPU-Related CPU Instructions Instruction Instruction Code Operation Execution Cycles LDS Rm,FPSCR 0100mmmm01101010 Rm → FPSCR 1 — LDS Rm,FPUL 0100mmmm01011010 Rm → FPUL 1 — LDS.L @Rm+, FPSCR 0100mmmm01100110 @Rm → FPSCR, Rm+ = 4 1 — LDS.L @Rm+, FPUL 0100mmmm01010110 @Rm → FPUL, Rm+ = 4 1 — STS FPSCR, Rn 0000nnnn01101010 FPSCR → Rn 1 — STS FPUL,Rn 0000nnnn01011010 FPUL → Rn 1 — STS.L FPSCR,@-Rn 0100nnnn01100010 Rn– = 4, FPCSR → @Rn 1 — STS.L FPUL,@-Rn 0100nnnn01010010 Rn– = 4, FPUL → @Rn 1 — 46 T Bit 6.2 Instruction Set in Alphabetical Order Table 6-11 alphabetically lists the instruction codes and number of execution cycles for each instruction. 
Table 6-11 Instruction Set Listed Alphabetically Instruction Operation Code Cycles T Bit ADD #imm,Rn Rn + imm → Rn 0111nnnniiiiiiii 1 — ADD Rm,Rn Rn + Rm → Rn 0011nnnnmmmm1100 1 — ADDC Rm,Rn Rn + Rm + T → Rn, Carry → T 0011nnnnmmmm1110 1 Carry ADDV Rm,Rn Rn + Rm → Rn, Overflow → T 0011nnnnmmmm1111 1 Over -flow AND #imm,R0 R0 & imm → R0 11001001iiiiiiii 1 — AND Rm,Rn Rn & Rm → Rn 0010nnnnmmmm1001 1 — AND.B #imm,@(R0,GBR) (R0 + GBR) & imm → (R0 + GBR) 11001101iiiiiiii 3 — BF label If T = 0, disp + PC → PC; if T = 1, nop 10001011dddddddd 3/1*1 — BF/S label If T = 0, disp + PC → PC; if T = 1, nop 10001111dddddddd 2/1*1 — BRA label Delayed branch, disp + PC → PC 1010dddddddddddd 2 — BRAF Rn Delayed branch, Rn + PC → PC 0000nnnn00100011 2 — BSR label Delayed branch, PC → PR, disp + PC → PC 1011dddddddddddd 2 — BSRF Rn Delayed branch, PC → PR, Rn + PC → PC 0000nnnn00000011 2 — BT label If T = 1, disp + PC → PC; if T = 0, nop 10001001dddddddd 3/1*1 — BT/S label If T = 1, disp + PC → PC; if T = 0, nop 10001101dddddddd 2/1*1 — CLRMAC 0 → MACH, MACL 0000000000101000 1 — CLRT 0→T 0000000000001000 1 0 47 Table 6-11 Instruction Set Listed Alphabetically (cont) Instruction Operation Code Cycles T Bit CMP/EQ #imm,R0 If R0 = imm, 1 → T 10001000iiiiiiii 1 Comparison result CMP/EQ Rm,Rn If Rn = Rm, 1 → T 0011nnnnmmmm0000 1 Comparison result CMP/GE Rm,Rn If Rn Rm with signed data, 1 → T 0011nnnnmmmm0011 1 Comparison result CMP/GT Rm,Rn If Rn > Rm with signed data, 1 → T 0011nnnnmmmm0111 1 Comparison result CMP/HI Rm,Rn If Rn > Rm with unsigned data, 0011nnnnmmmm0110 1 Comparison result CMP/HS Rm,Rn If Rn Rm with unsigned data, 1 → T 0011nnnnmmmm0010 1 Comparison result CMP/PL Rn If Rn>0, 1 → T 0100nnnn00010101 1 Comparison result CMP/PZ Rn If Rn 0, 1 → T 0100nnnn00010001 1 Comparison result CMP/STR Rm,Rn If Rn and Rm have an equivalent byte, 1 → T 0010nnnnmmmm1100 1 Comparison result DIV0S Rm,Rn MSB of Rn → Q, MSB 0010nnnnmmmm0111 of Rm → M, M ^ Q → T 1 Calculation result 0 → M/Q/T 
0000000000011001 1 0 DIV0U DIV1 Rm,Rn Single-step division (Rn/Rm) 0011nnnnmmmm0100 1 Calculation result DMULS.L Rm,Rn Signed operation of Rn × Rm → MACH, MACL 0011nnnnmmmm1101 2 to 4*2 — DMULU.L Rm,Rn Unsigned operation of Rn × Rm → MACH, MACL 0011nnnnmmmm0101 2 to 4*2 — DT Rn Rn - 1 → Rn, when Rn is 0, 1 → T. When Rn is nonzero, 0 → T 0100nnnn00010000 1 Comparison result EXTS.B Rm,Rn A byte in Rm is signextended → Rn 0110nnnnmmmm1110 1 — 48 Table 6-11 Instruction Set Listed Alphabetically (cont) Instruction Operation Code Cycles T Bit EXTS.W Rm,Rn A word in Rm is signextended → Rn 0110nnnnmmmm1111 1 — EXTU.B Rm,Rn A byte in Rm is zeroextended → Rn 0110nnnnmmmm1100 1 — EXTU.W Rm,Rn A word in Rm is zero- 0110nnnnmmmm1101 extended → Rn 1 — FABS FRn | FRn | → FRn 1111nnnn01011101 1 — FADD FRm ,FRn FRn + FRm → FRn 1111nnnnmmmm0000 1 — FCMP/EQ FRm ,FRn (FRn == FRm)? 1:0 → T 1111nnnnmmmm0100 1 Comparison result FCMP/GT FRm ,FRn (FRn > FRm) ? 1:0 → T 1111nnnnmmmm0101 1 Comparison result FDIV FRm ,FRn FRn /FRm → FRn 1111nnnnmmmm0011 13 — FLDI0 FRn H'00000000 → FRn 1111nnnn10001101 1 — FLDI1 FRn H'3F800000 → FRn 1111nnnn10011101 1 — FLDS FRm ,FPUL FRm → FPUL 1111mmmm00011101 1 — FLOAT FPUL, FRn (float)FPUL → FRn 1111nnnn00101101 1 — FMAC FR0,FRm,FRn FR0 × FRm + FRn → FRn 1111nnnnmmmm1110 1 — FMOV FRm ,FRn FRm → FRn 1111nnnnmmmm1100 1 — FMOV.S @(R0,Rm),FRn (R0 + Rm) → FRn 1111nnnnmmmm0110 1 — FMOV.S @Rm+,FRn (Rm) → FRn,Rm + 4 = Rm 1111nnnnmmmm1001 1 — FMOV.S @Rm,FRn (Rm) → FRn 1111nnnnmmmm1000 1 — FMOV.S FRm,@(R0,Rn) (FRm) → (R0 + Rn) 1111nnnnmmmm0111 1 — FMOV.S FRm,@-Rn Rn-4 → Rn, FRm → (Rn) 1111nnnnmmmm1011 1 — FMOV.S FRm,@Rn FRm → (Rn) 1111nnnnmmmm1010 1 — FMOV.S FRm,FRn FRn × FRm → FRn 1111nnnnmmmm0010 1 — FMUL FRm,FRn FRn × FRm → FRn 1111nnnnmmmm0010 1 — FNEG FRn –FRn → FRn 1111nnnn01001101 1 — FSTS FPUL,FRn FPUL → FRn 1111nnnn00001101 1 — FSUB FRm,FRn FRn – FRm → FRn 1111nnnnmmmm0001 1 — FTRC FRm,FPUL (long)FRm → FPUL 1111mmmm00111101 1 — 49 Table 6-11 Instruction Set 
Listed Alphabetically (cont) Instruction Operation Code Cycles T Bit JMP @Rm Delayed branch, Rm → PC 0100nnnn00101011 2 — JSR @Rm Delayed branch, PC → PR, Rm → PC 0100nnnn00001011 2 — LDC Rm,GBR Rm → GBR 0100mmmm00011110 1 — LDC Rm,SR Rm → SR 0100mmmm00001110 1 LSB LDC Rm,VBR Rm → VBR 0100mmmm00101110 1 — LDC.L @Rm+,GBR (Rm) → GBR, Rm + 4 → Rm 0100mmmm00010111 3 — LDC.L @Rm+,SR (Rm) → SR, Rm + 4 → Rm 0100mmmm00000111 3 LSB LDC.L @Rm+,VBR (Rm) → VBR, Rm + 4 → Rm 0100mmmm00100111 3 — LDS Rm,FPSCR Rm → FPSCR 0100mmmm01101010 1 — LDS Rm,FPUL Rm → FPUL 0100mmmm01011010 1 — LDS Rm,MACH Rm → MACH 0100mmmm00001010 1 — LDS Rm,MACL Rm → MACL 0100mmmm00011010 1 — LDS Rm,PR Rm → PR 0100mmmm00101010 1 — LDS.L @Rm+,FPSCR @Rm → FPSCR , Rm+4 0100mmmm01100110 1 — LDS.L @Rm+,FPUL @Rm → FPUL , Rm+4 0100mmmm01010110 1 — LDS.L @Rm+,MACH (Rm) → MACH, Rm + 4 → Rm 0100mmmm00000110 1 — LDS.L @Rm+,MACL (Rm) → MACL, Rm + 4 → Rm 0100mmmm00010110 1 — LDS.L @Rm+,PR (Rm) → PR, Rm + 4 → Rm 0100mmmm00100110 1 — MAC.L @Rm+,@Rn+ Signed operation of (Rn) × (Rm) + MAC → MAC 0000nnnnmmmm1111 3/(2 to 4)*2 — MAC.W @Rm+,@Rn+ Signed operation of (Rn) × (Rm) + MAC → MAC 0100nnnnmmmm1111 3/ (2)*2 — MOV #imm,Rn imm → Sign extension → Rn 1110nnnniiiiiiii 1 — MOV Rm,Rn Rm → Rn 0110nnnnmmmm0011 1 — MOV.B @(disp,GBR), R0 (disp + GBR) → Sign extension 11000100dddddddd → R0 1 — MOV.B @(disp,Rm), R0 (disp + Rm) → Sign extension → R0 1 — 50 10000100mmmmdddd Table 6-11 Instruction Set Listed Alphabetically (cont) Instruction Operation Code Cycles T Bit MOV.B @(R0,Rm),Rn (R0 + Rm) → Sign extension → Rn 0000nnnnmmmm1100 1 — MOV.B @Rm+,Rn (Rm) → Sign extension → Rn, Rm + 1 → Rm 0110nnnnmmmm0100 1 — MOV.B @Rm,Rn (Rm) → Sign extension → Rn 0110nnnnmmmm0000 1 — MOV.B R0,@(disp,GBR) R0 → (disp + GBR) 11000000dddddddd 1 — MOV.B R0,@(disp,Rn) R0 → (disp + Rn) 10000000nnnndddd 1 — MOV.B Rm,@(R0,Rn) Rm → (R0 + Rn) 0000nnnnmmmm0100 1 — MOV.B Rm,@–Rn Rn–1 → Rn, Rm → (Rn) 0010nnnnmmmm0100 1 — MOV.B Rm,@Rn Rm → (Rn) 0010nnnnmmmm0000 
1 — MOV.L @(disp,GBR),R0 (disp × 4 + GBR) → R0 11000110dddddddd 1 — MOV.L @(disp,PC),Rn (disp × 4 + PC) → Rn 1101nnnndddddddd 1 — MOV.L @(disp,Rm),Rn (disp × 4 + Rm) → Rn 0101nnnnmmmmdddd 1 — MOV.L @(R0,Rm),Rn (R0 + Rm) → Rn 0000nnnnmmmm1110 1 — MOV.L @Rm+,Rn (Rm) → Rn, Rm + 4 → Rm 0110nnnnmmmm0110 1 — MOV.L @Rm,Rn (Rm) → Rn 0110nnnnmmmm0010 1 — MOV.L R0,@(disp,GBR) R0 → (disp × 4 + GBR) 11000010dddddddd 1 — MOV.L Rm,@(disp,Rn) Rm → (disp × 4 + Rn) 0001nnnnmmmmdddd 1 — MOV.L Rm,@(R0,Rn) Rm → (R0 × 4 + Rn) 0000nnnnmmmm0110 1 — MOV.L Rm,@–Rn Rn–4 → Rn, Rm → (Rn) 0010nnnnmmmm0110 1 — MOV.L Rm,@Rn Rm → (Rn) 0010nnnnmmmm0010 1 — MOV.W @(disp,GBR),R0 (disp × 2 + GBR) → Sign extension → R0 11000101dddddddd 1 — MOV.W @(disp,PC),Rn (disp × 2 + PC) → Sign extension → Rn 1001nnnndddddddd 1 — MOV.W @(disp,Rm), R0 (disp × 2 + Rm) → Sign extension → R0 10000101mmmmdddd 1 — MOV.W @(R0,Rm),Rn (R0 + Rm) → Sign extension → Rn 0000nnnnmmmm1101 1 — MOV.W @Rm+,Rn (Rm) → Sign extension → Rn, Rm + 2 → Rm 0110nnnnmmmm0101 1 — 51 Table 6-11 Instruction Set Listed Alphabetically (cont) Instruction Operation Code Cycles T Bit MOV.W @Rm,Rn (Rm) → Sign extension → Rn 0110nnnnmmmm0001 1 — MOV.W R0, @(disp,GBR) R0 → (disp × 2 + GBR) 11000001dddddddd 1 — MOV.W R0, @(disp,Rn) R0 → (disp × 2 + Rn) 10000001nnnndddd 1 — MOV.W Rm,@(R0,Rn) Rm → (R0 + Rn) 0000nnnnmmmm0101 1 — MOV.W Rm,@–Rn Rn–2 → Rn, Rm → (Rn) 0010nnnnmmmm0101 1 — MOV.W Rm,@Rn Rm → (Rn) 0010nnnnmmmm0001 1 — MOVA @(disp,PC), R0 disp × 4 + PC → R0 11000111dddddddd 1 — MOVT Rn T → Rn 0000nnnn00101001 1 Rm,Rn Rn × Rm → MAC MUL.L 0000nnnnmmmm0111 — 2 to 4*2 — 3*2 — MULS.W Rm,Rn Signed operation of Rn × 0010nnnnmmmm1111 Rm → MACL 1 to MULU.W Rm,Rn Unsigned operation of Rn × Rm → MACL 0010nnnnmmmm1110 1 to 3*2 — NEG Rm,Rn 0–Rm → Rn 0110nnnnmmmm1011 1 — NEGC Rm,Rn 0–Rm–T → Rn, Borrow →T 0110nnnnmmmm1010 1 Borrow No operation 0000000000001001 1 — NOP NOT Rm,Rn ~Rm → Rn 0110nnnnmmmm0111 1 — OR #imm,R0 R0 | imm → R0 11001011iiiiiiii 1 — OR Rm,Rn 
Rn | Rm → Rn 0010nnnnmmmm1011 1 — OR.B #imm, @(R0,GBR) (R0 + GBR) | imm → (R0 + GBR) 11001111iiiiiiii 3 — ROTCL Rn T ← Rn ← T 0100nnnn00100100 1 MSB ROTCR Rn T → Rn → T 0100nnnn00100101 1 LSB ROTL Rn T ← Rn ← MSB 0100nnnn00000100 1 MSB ROTR Rn LSB → Rn → T 0100nnnn00000101 1 LSB RTE Delayed branch, SSR/SPC → SR/PC 0000000000101011 4 LSB RTS Delayed branch, PR → PC 0000000000001011 2 — SETT 1→T 0000000000011000 1 1 52 Table 6-11 Instruction Set Listed Alphabetically (cont) Instruction Operation Code Cycles T Bit SHAL Rn T ← Rn ← 0 0100nnnn00100000 1 MSB SHAR Rn MSB → Rn → T 0100nnnn00100001 1 LSB SHLL Rn T ← Rn ← 0 0100nnnn00000000 1 MSB SHLL2 Rn Rn << 2 → Rn 0100nnnn00001000 1 — SHLL8 Rn Rn << 8 → Rn 0100nnnn00011000 1 — SHLL16 Rn Rn << 16 → Rn 0100nnnn00101000 1 — SHLR Rn 0 → Rn → T 0100nnnn00000001 1 LSB SHLR2 Rn Rn>>2 → Rn 0100nnnn00001001 1 — SHLR8 Rn Rn>>8 → Rn 0100nnnn00011001 1 — SHLR16 Rn Rn>>16 → Rn 0100nnnn00101001 1 — Sleep 0000000000011011 3 — SLEEP STC GBR,Rn GBR → Rn 0000nnnn00010010 1 — STC SR,Rn SR → Rn 0000nnnn00000010 1 — STC VBR,Rn VBR → Rn 0000nnnn00100010 1 — STC.L GBR,@–Rn Rn–4 → Rn, GBR → (Rn) 0100nnnn00010011 2 — STC.L SR,@–Rn Rn–4 → Rn, SR → (Rn) 0100nnnn00000011 2 — STC.L VBR,@–Rn Rn–4 → Rn, VBR → (Rn) 0100nnnn00100011 2 — STS FPSCR, Rn FPSCR → Rn 0000nnnn01101010 1 — STS FPUL, Rn FPUL → Rn 0000nnnn01011010 1 — STS MACH,Rn MACH → Rn 0000nnnn00001010 1 — STS MACL,Rn MACL → Rn 0000nnnn00011010 1 — STS PR,Rn PR → Rn 0000nnnn00101010 1 — STS.L FPSCR,@-Rn Rn-4 → Rn, FPSCR → @Rn 0100nnnn01100010 1 — STS.L FPUL,@-Rn Rn-4 → Rn, FPUL → @Rn 0100nnnn01010010 1 — STS.L MACH,@–Rn Rn–4 → Rn, MACH → (Rn) 0100nnnn00000010 1 — STS.L MACL,@–Rn Rn–4 → Rn, MACL → (Rn) 0100nnnn00010010 1 — STS.L PR,@–Rn Rn–4 → Rn, PR → (Rn) 0100nnnn00100010 1 — 53 Table 6-11 Instruction Set Listed Alphabetically (cont) Instruction Operation Code Cycles T Bit SUB Rm,Rn Rn–Rm → Rn 0011nnnnmmmm1000 1 — SUBC Rm,Rn Rn–Rm–T → Rn, Borrow → T 0011nnnnmmmm1010 1 Borrow SUBV Rm,Rn Rn–Rm 
→ Rn, Underflow → T   0011nnnnmmmm1011   1   Underflow
SWAP.B Rm,Rn             Rm → Swap the two lowest-order bytes → Rn   0110nnnnmmmm1000  1  —
SWAP.W Rm,Rn             Rm → Swap two consecutive words → Rn        0110nnnnmmmm1001  1  —
TAS.B @Rn                If (Rn) is 0, 1 → T; 1 → MSB of (Rn)        0100nnnn00011011  4  Test result
TST #imm,R0              R0 & imm; if the result is 0, 1 → T         11001000iiiiiiii  1  Test result
TST Rm,Rn                Rn & Rm; if the result is 0, 1 → T          0010nnnnmmmm1000  1  Test result
TST.B #imm,@(R0,GBR)     (R0 + GBR) & imm; if the result is 0, 1 → T 11001100iiiiiiii  3  Test result
XOR #imm,R0              R0 ^ imm → R0                               11001010iiiiiiii  1  —
XOR Rm,Rn                Rn ^ Rm → Rn                                0010nnnnmmmm1010  1  —
XOR.B #imm,@(R0,GBR)     (R0 + GBR) ^ imm → (R0 + GBR)               11001110iiiiiiii  3  —
XTRCT Rm,Rn              Rm: Middle 32 bits of Rn → Rn               0010nnnnmmmm1101  1  —

Notes: 1. One state when it does not branch.
       2. The normal minimum number of execution cycles.

Section 7 Instruction Descriptions

7.1 Sample Description (Name): Classification

This section describes instructions in alphabetical order using the format shown below. The actual descriptions begin at section 7.2.1.

Class: Indicates if the instruction is a delayed branch instruction or interrupt disabled instruction

Format                             Abstract              Code                 Cycle                   T Bit
Assembler input format; imm and    A brief description   Displayed in order   Number of cycles when   The value of T bit after
disp are numbers, expressions,     of operation          MSB ↔ LSB            there is no wait state  the instruction is executed
or symbols

Description: Description of operation

Notes: Notes on using the instruction

Operation: Operation written in C language. The following resources should be used.

• Reads data of each length from address Addr. An address error will occur if word data is read from an address other than 2n or if longword data is read from an address other than 4n:
    unsigned char Read_Byte(unsigned long Addr);
    unsigned short Read_Word(unsigned long Addr);
    unsigned long Read_Long(unsigned long Addr);
• Writes data of each length to address Addr.
An address error will occur if word data is written to an address other than 2n or if longword data is written to an address other than 4n:
    unsigned char Write_Byte(unsigned long Addr, unsigned long Data);
    unsigned short Write_Word(unsigned long Addr, unsigned long Data);
    unsigned long Write_Long(unsigned long Addr, unsigned long Data);
• Starts execution from the slot instruction located at address (Addr – 4). For Delay_Slot(4), execution starts from the instruction at address 0 rather than address 4. If the instruction executed through this function is one of the following, it is treated as an illegal slot instruction, because these instructions are illegal when used as delay slot instructions: BF, BT, BRA, BSR, JMP, JSR, RTS, RTE, TRAPA, BF/S, BT/S, BRAF, BSRF
    Delay_Slot(unsigned long Addr);
  If the instruction at address (Addr – 4) is 32-bit, 2 is returned; 0 is returned if it is 16-bit.
• List registers:
    unsigned long R[16];
    unsigned long SR,GBR,VBR;
    unsigned long MACH,MACL,PR;
    unsigned long PC;
• Definition of SR structures:
    struct SR0 {
        unsigned long dummy0:4;
        unsigned long RC0:12;
        unsigned long dummy1:4;
        unsigned long DMY0:1;
        unsigned long DMX0:1;
        unsigned long M0:1;
        unsigned long Q0:1;
        unsigned long I0:4;
        unsigned long RF10:1;
        unsigned long RF00:1;
        unsigned long S0:1;
        unsigned long T0:1;
    };
• Definition of bits in SR:
    #define M   ((*(struct SR0 *)(&SR)).M0)
    #define Q   ((*(struct SR0 *)(&SR)).Q0)
    #define S   ((*(struct SR0 *)(&SR)).S0)
    #define T   ((*(struct SR0 *)(&SR)).T0)
    #define RF1 ((*(struct SR0 *)(&SR)).RF10)
    #define RF0 ((*(struct SR0 *)(&SR)).RF00)
• Error display function:
    Error( char *er );

The PC should point to the location four bytes after the current instruction. Therefore, PC = 4; means the instruction starts execution from address 0, not address 4.

Examples: Examples are written in assembler mnemonics and describe status before and after executing the instruction.
Characters in italics such as .align are assembler control instructions (listed below). For more information, see the Cross Assembler User Manual.

    .org          Location counter set
    .data.w       Securing integer word data
    .data.l       Securing integer longword data
    .sdata        Securing string data
    .align 2      2-byte boundary alignment
    .align 4      4-byte boundary alignment
    .arepeat 16   16-repeat expansion
    .arepeat 32   32-repeat expansion
    .aendr        End of repeat expansion of specified number

Note that the SH series cross assembler version 1.0 does not support the conditional assembler functions.

Notes: 1. In addressing modes that use the displacements listed below (disp), the assembler statements in this manual show the value prior to scaling (×1, ×2, and ×4) according to the operand size. This is done to clarify the LSI operation. Actual assembler statements should follow the rules of the assembler in question.
          @(disp:4, Rn)      Indirect register addressing with displacement
          @(disp:8, GBR)     Indirect GBR addressing with displacement
          @(disp:8, PC)      Indirect PC addressing with displacement
          disp:8, disp:12    PC relative addressing
       2. Any 16-bit instruction code that is not assigned to an instruction is handled as an ordinary illegal instruction and causes illegal instruction exception processing. Also, if the FPU is put into stop status by the module stop bit, floating-point instructions and FPU-related CPU instructions are handled as illegal instructions.
       3. An ordinary illegal instruction or a branch instruction (i.e., an illegal slot instruction) that follows a BRA, BT/S, or another delayed branch instruction will cause illegal instruction exception processing.
          Example 1:
              ....
              BRA     LABEL
              .data.w H'FFFF      ← Illegal slot instruction
              ....
              [H'FFFF is an ordinary illegal instruction from the start]
          Example 2:
              RTE
              BT/S    LABEL       ← Illegal slot instruction

7.2 CPU Instruction

7.2.1 ADD (ADD Binary): Arithmetic Instruction

Format          Abstract          Code               Cycle  T Bit
ADD Rm,Rn       Rn + Rm → Rn      0011nnnnmmmm1100   1      —
ADD #imm,Rn     Rn + imm → Rn     0111nnnniiiiiiii   1      —

Description: Adds general register Rn data to Rm data, and stores the result in Rn. 8-bit immediate data can be added instead of Rm data. Since the 8-bit immediate data is sign-extended to 32 bits, this instruction can add and subtract immediate data.

Operation:
    ADD(long m,long n)      /* ADD Rm,Rn */
    {
        R[n]+=R[m];
        PC+=2;
    }

    ADDI(long i,long n)     /* ADD #imm,Rn */
    {
        if ((i&0x80)==0) R[n]+=(0x000000FF & (long)i);
        else R[n]+=(0xFFFFFF00 | (long)i);
        PC+=2;
    }

Examples:
    ADD R0,R1       ;Before execution: R0 = H'7FFFFFFF, R1 = H'00000001
                    ;After execution:  R1 = H'80000000
    ADD #H'01,R2    ;Before execution: R2 = H'00000000
                    ;After execution:  R2 = H'00000001
    ADD #H'FE,R3    ;Before execution: R3 = H'00000001
                    ;After execution:  R3 = H'FFFFFFFF

7.2.2 ADDC (ADD with Carry): Arithmetic Instruction

Format          Abstract                        Code               Cycle  T Bit
ADDC Rm,Rn      Rn + Rm + T → Rn, carry → T     0011nnnnmmmm1110   1      Carry

Description: Adds Rm data and the T bit to general register Rn data, and stores the result in Rn. The T bit changes according to the result. This instruction can add data that has more than 32 bits.
Operation:
    ADDC (long m,long n)    /* ADDC Rm,Rn */
    {
        unsigned long tmp0,tmp1;

        tmp1=R[n]+R[m];
        tmp0=R[n];
        R[n]=tmp1+T;
        if (tmp0>tmp1) T=1;
        else T=0;
        if (tmp1>R[n]) T=1;
        PC+=2;
    }

Examples:
    ;R0:R1 (64 bits) + R2:R3 (64 bits) = R0:R1 (64 bits)
    CLRT
    ADDC R3,R1      ;Before execution: T = 0, R1 = H'00000001, R3 = H'FFFFFFFF
                    ;After execution:  T = 1, R1 = H'00000000
    ADDC R2,R0      ;Before execution: T = 1, R0 = H'00000000, R2 = H'00000000
                    ;After execution:  T = 0, R0 = H'00000001

7.2.3 ADDV (ADD with V Flag Overflow Check): Arithmetic Instruction

Format          Abstract                        Code               Cycle  T Bit
ADDV Rm,Rn      Rn + Rm → Rn, overflow → T      0011nnnnmmmm1111   1      Overflow

Description: Adds general register Rn data to Rm data, and stores the result in Rn. If an overflow occurs, the T bit is set to 1.

Operation:
    ADDV(long m,long n)     /* ADDV Rm,Rn */
    {
        long dest,src,ans;

        if ((long)R[n]>=0) dest=0;
        else dest=1;
        if ((long)R[m]>=0) src=0;
        else src=1;
        src+=dest;
        R[n]+=R[m];
        if ((long)R[n]>=0) ans=0;
        else ans=1;
        ans+=dest;
        if (src==0 || src==2) {
            if (ans==1) T=1;
            else T=0;
        }
        else T=0;
        PC+=2;
    }

Examples:
    ADDV R0,R1      ;Before execution: R0 = H'00000001, R1 = H'7FFFFFFE, T = 0
                    ;After execution:  R1 = H'7FFFFFFF, T = 0
    ADDV R0,R1      ;Before execution: R0 = H'00000002, R1 = H'7FFFFFFE, T = 0
                    ;After execution:  R1 = H'80000000, T = 1

7.2.4 AND (AND Logical): Logic Operation Instruction

Format                      Abstract                          Code               Cycle  T Bit
AND Rm,Rn                   Rn & Rm → Rn                      0010nnnnmmmm1001   1      —
AND #imm,R0                 R0 & imm → R0                     11001001iiiiiiii   1      —
AND.B #imm,@(R0,GBR)        (R0 + GBR) & imm → (R0 + GBR)     11001101iiiiiiii   3      —

Description: Logically ANDs the contents of general registers Rn and Rm, and stores the result in Rn. The contents of general register R0 can be ANDed with zero-extended 8-bit immediate data. 8-bit memory data pointed to by GBR relative addressing can be ANDed with 8-bit immediate data.

Note: After AND #imm,R0 is executed, the upper 24 bits of R0 are always cleared to 0.
Operation:
AND(long m,long n) /* AND Rm,Rn */
{
    R[n]&=R[m];
    PC+=2;
}
ANDI(long i) /* AND #imm,R0 */
{
    R[0]&=(0x000000FF & (long)i);
    PC+=2;
}
ANDM(long i) /* AND.B #imm,@(R0,GBR) */
{
    long temp;
    temp=(long)Read_Byte(GBR+R[0]);
    temp&=(0x000000FF & (long)i);
    Write_Byte(GBR+R[0],temp);
    PC+=2;
}

Examples:
AND R0,R1               ;Before execution: R0 = H'AAAAAAAA, R1 = H'55555555
                        ;After execution:  R1 = H'00000000
AND #H'0F,R0            ;Before execution: R0 = H'FFFFFFFF
                        ;After execution:  R0 = H'0000000F
AND.B #H'80,@(R0,GBR)   ;Before execution: @(R0,GBR) = H'A5
                        ;After execution:  @(R0,GBR) = H'80

7.2.5 BF (Branch if False): Branch Instruction

Format      Abstract                                            Code                Cycle   T Bit
BF label    When T = 0, disp × 2 + PC → PC; When T = 1, nop     10001011dddddddd    3/1     —

Description: Reads the T bit, and conditionally branches. If T = 0, it branches to the branch destination address. If T = 1, BF executes the next instruction. The branch destination is the address PC + displacement, where the PC value used for the address calculation is the address 4 bytes after this instruction. The 8-bit displacement is sign-extended and doubled. Consequently, the branch destination can be –256 to +254 bytes away from that PC value. If the branch destination is outside this range, use BF with the BRA instruction or the like.

Note: When branching, three cycles; when not branching, one cycle.

Operation:
BF(long d) /* BF disp */
{
    long disp;
    if ((d&0x80)==0) disp=(0x000000FF & (long)d);
    else disp=(0xFFFFFF00 | (long)d);
    if (T==0) PC=PC+(disp<<1);
    else PC+=2;
}

Example:
        CLRT                ;T is always cleared to 0
        BT    TRGET_T       ;Does not branch, because T = 0
        BF    TRGET_F       ;Branches to TRGET_F, because T = 0
        NOP                 ;
        NOP
        ..........
TRGET_F:
                            ;← The PC location used to calculate the branch destination address of the BF instruction is the second NOP above
                            ;← TRGET_F is the branch destination of the BF instruction

7.2.6 BF/S (Branch if False with Delay Slot): Branch Instruction

Format        Abstract                                            Code                Cycle   T Bit
BF/S label    When T = 0, disp × 2 + PC → PC; When T = 1, nop     10001111dddddddd    2/1     —

Description: Reads the T bit and conditionally branches. If T = 0, it branches after executing the next instruction. If T = 1, BF/S executes the next instruction. The branch destination is the address PC + displacement, where the PC value used for the address calculation is the address 4 bytes after this instruction. The 8-bit displacement is sign-extended and doubled. Consequently, the branch destination can be –256 to +254 bytes away from that PC value. If the branch destination is outside this range, use BF with the BRA instruction or the like.

Note: Since this is a delayed branch instruction, the instruction immediately following is executed before the branch. No interrupts or address errors are accepted between this instruction and the next instruction. When the instruction immediately following is a branch instruction, it is recognized as an illegal slot instruction. When branching, this is a two-cycle instruction; when not branching, one cycle.

Operation:
BFS(long d) /* BFS disp */
{
    long disp;
    unsigned long temp;
    temp=PC;
    if ((d&0x80)==0) disp=(0x000000FF & (long)d);
    else disp=(0xFFFFFF00 | (long)d);
    if (T==0) {
        PC=PC+(disp<<1);
        Delay_Slot(temp+2);
    }
    else PC+=2;
}

Example:
        CLRT                ;T is always 0
        BT/S  TRGET_T       ;Does not branch, because T = 0
        NOP                 ;
        BF/S  TRGET_F       ;Branches to TRGET_F, because T = 0
        ADD   R0,R1         ;Executed before branch.
        NOP
        ..........
TRGET_F:                    ;← Branch destination of the BF/S instruction
                            ;(The PC location used to calculate the branch destination address of the BF/S instruction is the NOP that follows the ADD above.)

Note: When a delayed branch instruction is used, the branching operation takes place after the slot instruction is executed, but the execution of instructions (register update, etc.) takes place in the sequence delayed branch instruction → delayed slot instruction. For example, even if a delayed slot instruction is used to change the register where the branch destination address is stored, the register content previous to the change will be used as the branch destination address.

7.2.7 BRA (Branch): Branch Instruction

Format       Abstract              Code                Cycle   T Bit
BRA label    disp × 2 + PC → PC    1010dddddddddddd    2       —

Description: Branches unconditionally after executing the instruction following this BRA instruction. The branch destination is the address PC + displacement, where the PC value used for the address calculation is the address 4 bytes after this instruction. The 12-bit displacement is sign-extended and doubled. Consequently, the branch destination can be –4096 to +4094 bytes away from that PC value. If the branch destination is outside this range, the JMP instruction must be used instead. In that case, a MOV instruction must be used to transfer the destination address to a register.

Note: Since this is a delayed branch instruction, the instruction after BRA is executed before branching. No interrupts or address errors are accepted between this instruction and the next instruction. If the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction.

Operation:
BRA(long d) /* BRA disp */
{
    unsigned long temp;
    long disp;
    if ((d&0x800)==0) disp=(0x00000FFF & (long) d);
    else disp=(0xFFFFF000 | (long) d);
    temp=PC;
    PC=PC+(disp<<1);
    Delay_Slot(temp+2);
}

Example:
        BRA   TRGET
        ADD   R0,R1
        NOP
        ..........
TRGET:                      ;← Branch destination of the BRA instruction
                            ;(In the sequence above, BRA branches to TRGET, ADD is executed before branching, and the NOP marks the PC location used to calculate the branch destination address of the BRA instruction.)

Note: When a delayed branch instruction is used, the branching operation takes place after the slot instruction is executed, but the execution of instructions (register update, etc.) takes place in the sequence delayed branch instruction → delayed slot instruction. For example, even if a delayed slot instruction is used to change the register where the branch destination address is stored, the register content previous to the change will be used as the branch destination address.

7.2.8 BRAF (Branch Far): Branch Instruction

Format     Abstract        Code                Cycle   T Bit
BRAF Rm    Rm + PC → PC    0000mmmm00100011    2       —

Description: Branches unconditionally. The branch destination is PC + the 32-bit contents of the general register Rm, where the PC value used for the address calculation is the address 4 bytes after this instruction.

Note: Since this is a delayed branch instruction, the instruction after BRAF is executed before branching. No interrupts or address errors are accepted between this instruction and the next instruction. If the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction.

Operation:
BRAF(long m) /* BRAF Rm */
{
    unsigned long temp;
    temp=PC;
    PC+=R[m];
    Delay_Slot(temp+2);
}

Example:
        MOV.L #(TARGET-BRAF_PC),R0  ;Sets displacement.
        BRAF  R0                    ;Branches to TARGET
        ADD   R0,R1                 ;Executes ADD before branching
BRAF_PC:                            ;← The PC location is used to calculate the branch destination address of the BRAF instruction
        NOP
        ....................
TARGET:                             ;← Branch destination of the BRAF instruction

Note: When a delayed branch instruction is used, the branching operation takes place after the slot instruction is executed, but the execution of instructions (register update, etc.)
takes place in the sequence delayed branch instruction → delayed slot instruction. For example, even if a delayed slot instruction is used to change the register where the branch destination address is stored, the register content previous to the change will be used as the branch destination address.

7.2.9 BSR (Branch to Subroutine): Branch Instruction

Format       Abstract                         Code                Cycle   T Bit
BSR label    PC → PR, disp × 2 + PC → PC      1011dddddddddddd    2       —

Description: Branches to the subroutine procedure at a specified address. The PC value is stored in the PR, and the program branches to the address PC + displacement, where the PC value used for the address calculation is the address 4 bytes after this instruction. The 12-bit displacement is sign-extended and doubled. Consequently, the branch destination can be –4096 to +4094 bytes away from that PC value. If the branch destination is outside this range, the JSR instruction must be used instead. With JSR, the destination address must be transferred to a register by using the MOV instruction. This BSR instruction and the RTS instruction are used together for a subroutine procedure call.

Note: Since this is a delayed branch instruction, the instruction after BSR is executed before branching. No interrupts or address errors are accepted between this instruction and the next instruction. If the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction.

Operation:
BSR(long d) /* BSR disp */
{
    long disp;
    if ((d&0x800)==0) disp=(0x00000FFF & (long) d);
    else disp=(0xFFFFF000 | (long) d);
    PR=PC+Is_32bit_Inst(PR+2);
    PC=PC+(disp<<1);
    Delay_Slot(PR+2);
}

Example:
        BSR   TRGET         ;Branches to TRGET
        MOV   R3,R4         ;Executes the MOV instruction before branching
        ADD   R0,R1         ;← The PC location is used to calculate the branch destination address of the BSR instruction (return address for when the subroutine procedure is completed (PR data))
        .......
        .......
TRGET:                      ;← Procedure entrance
        MOV   R2,R3         ;
        RTS                 ;Returns to the above ADD instruction
        MOV   #1,R0         ;Executes MOV before branching

Note: When a delayed branch instruction is used, the branching operation takes place after the slot instruction is executed, but the execution of instructions (register update, etc.) takes place in the sequence delayed branch instruction → delayed slot instruction. For example, even if a delayed slot instruction is used to change the register where the branch destination address is stored, the register content previous to the change will be used as the branch destination address.

7.2.10 BSRF (Branch to Subroutine Far): Branch Instruction

Format     Abstract                   Code                Cycle   T Bit
BSRF Rm    PC → PR, Rm + PC → PC      0000mmmm00000011    2       —

Description: Branches to the subroutine procedure at a specified address after executing the instruction following this BSRF instruction. The PC value is stored in the PR. The branch destination is PC + the 32-bit contents of the general register Rm, where the PC value used for the address calculation is the address 4 bytes after this instruction. Used as a subroutine procedure call in combination with RTS.

Note: Since this is a delayed branch instruction, the instruction after BSRF is executed before branching. No interrupts or address errors are accepted between this instruction and the next instruction. If the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction.

Operation:
BSRF(long m) /* BSRF Rm */
{
    PR=PC+Is_32bit_Inst(PR+2);
    PC+=R[m];
    Delay_Slot(PR+2);
}

Example:
        MOV.L #(TARGET-BSRF_PC),R0  ;Sets displacement.
        BSRF  R0                    ;Branches to TARGET
        MOV   R3,R4                 ;Executes the MOV instruction before branching
BSRF_PC:                            ;← The PC location is used to calculate the branch destination with BSRF.
        ADD   R0,R1
        .....
        .....
TARGET:                     ;← Procedure entrance
        MOV   R2,R3         ;
        RTS                 ;Returns to the above ADD instruction
        MOV   #1,R0         ;Executes MOV before branching

Note: When a delayed branch instruction is used, the branching operation takes place after the slot instruction is executed, but the execution of instructions (register update, etc.) takes place in the sequence delayed branch instruction → delayed slot instruction. For example, even if a delayed slot instruction is used to change the register where the branch destination address is stored, the register content previous to the change will be used as the branch destination address.

7.2.11 BT (Branch if True): Branch Instruction

Format      Abstract                                            Code                Cycle   T Bit
BT label    When T = 1, disp × 2 + PC → PC; When T = 0, nop     10001001dddddddd    3/1     —

Description: Reads the T bit, and conditionally branches. If T = 1, BT branches. If T = 0, BT executes the next instruction. The branch destination is the address PC + displacement, where the PC value used for the address calculation is the address 4 bytes after this instruction. The 8-bit displacement is sign-extended and doubled. Consequently, the branch destination can be –256 to +254 bytes away from that PC value. If the branch destination is outside this range, use BT with the BRA instruction or the like.

Note: When branching, requires three cycles; when not branching, one cycle.

Operation:
BT(long d) /* BT disp */
{
    long disp;
    if ((d&0x80)==0) disp=(0x000000FF & (long)d);
    else disp=(0xFFFFFF00 | (long)d);
    if (T==1) PC=PC+(disp<<1);
    else PC+=2;
}

Example:
        SETT                ;T is always 1
        BF    TRGET_F       ;Does not branch, because T = 1
        BT    TRGET_T       ;Branches to TRGET_T, because T = 1
        NOP                 ;
        NOP                 ;← The PC location is used to calculate the branch destination address of the BT instruction
        ..........
TRGET_T:                    ;← Branch destination of the BT instruction

7.2.12 BT/S (Branch if True with Delay Slot): Branch Instruction

Format        Abstract                                            Code                Cycle   T Bit
BT/S label    When T = 1, disp × 2 + PC → PC; When T = 0, nop     10001101dddddddd    2/1     —

Description: Reads the T bit and conditionally branches. If T = 1, BT/S branches after the following instruction executes. If T = 0, BT/S executes the next instruction. The branch destination is the address PC + displacement, where the PC value used for the address calculation is the address 4 bytes after this instruction. The 8-bit displacement is sign-extended and doubled. Consequently, the branch destination can be –256 to +254 bytes away from that PC value. If the branch destination is outside this range, use BT/S with the BRA instruction or the like.

Note: Since this is a delayed branch instruction, the instruction immediately following is executed before the branch. No interrupts or address errors are accepted between this instruction and the next instruction. When the immediately following instruction is a branch instruction, it is recognized as an illegal slot instruction. When branching, requires two cycles; when not branching, one cycle.

Operation:
BTS(long d) /* BTS disp */
{
    long disp;
    unsigned long temp;
    temp=PC;
    if ((d&0x80)==0) disp=(0x000000FF & (long)d);
    else disp=(0xFFFFFF00 | (long)d);
    if (T==1) {
        PC=PC+(disp<<1);
        Delay_Slot(temp+2);
    }
    else PC+=2;
}

Example:
        SETT                ;T is always 1
        BF/S  TARGET_F      ;Does not branch, because T = 1
        NOP                 ;
        BT/S  TARGET_T      ;Branches to TARGET_T, because T = 1
        ADD   R0,R1         ;Executes before branching.
        NOP
        ..........
TARGET_T:                   ;← Branch destination of the BT/S instruction
                            ;(The PC location used to calculate the branch destination address of the BT/S instruction is the NOP that follows the ADD above.)

Note: When a delayed branch instruction is used, the branching operation takes place after the slot instruction is executed, but the execution of instructions (register update, etc.) takes place in the sequence delayed branch instruction → delayed slot instruction. For example, even if a delayed slot instruction is used to change the register where the branch destination address is stored, the register content previous to the change will be used as the branch destination address.

7.2.13 CLRMAC (Clear MAC Register): System Control Instruction

Format    Abstract          Code                Cycle   T Bit
CLRMAC    0 → MACH, MACL    0000000000101000    1       —

Description: Clears the MACH and MACL registers.

Operation:
CLRMAC() /* CLRMAC */
{
    MACH=0;
    MACL=0;
    PC+=2;
}

Example:
CLRMAC                  ;Clears and initializes the MAC register
MAC.W @R0+,@R1+         ;Multiply and accumulate operation
MAC.W @R0+,@R1+         ;

7.2.14 CLRT (Clear T Bit): System Control Instruction

Format    Abstract    Code                Cycle   T Bit
CLRT      0 → T       0000000000001000    1       0

Description: Clears the T bit.
Operation:
CLRT() /* CLRT */
{
    T=0;
    PC+=2;
}

Example:
CLRT            ;Before execution: T = 1
                ;After execution:  T = 0

7.2.15 CMP/cond (Compare Conditionally): Arithmetic Instruction

Format            Abstract                                        Code                Cycle   T Bit
CMP/EQ Rm,Rn      When Rn = Rm, 1 → T                             0011nnnnmmmm0000    1       Comparison result
CMP/GE Rm,Rn      When signed and Rn ≥ Rm, 1 → T                  0011nnnnmmmm0011    1       Comparison result
CMP/GT Rm,Rn      When signed and Rn > Rm, 1 → T                  0011nnnnmmmm0111    1       Comparison result
CMP/HI Rm,Rn      When unsigned and Rn > Rm, 1 → T                0011nnnnmmmm0110    1       Comparison result
CMP/HS Rm,Rn      When unsigned and Rn ≥ Rm, 1 → T                0011nnnnmmmm0010    1       Comparison result
CMP/PL Rn         When Rn > 0, 1 → T                              0100nnnn00010101    1       Comparison result
CMP/PZ Rn         When Rn ≥ 0, 1 → T                              0100nnnn00010001    1       Comparison result
CMP/STR Rm,Rn     When a byte in Rn equals a byte in Rm, 1 → T    0010nnnnmmmm1100    1       Comparison result
CMP/EQ #imm,R0    When R0 = imm, 1 → T                            10001000iiiiiiii    1       Comparison result

Description: Compares general register Rn data with Rm data, and sets the T bit to 1 if a specified condition (cond) is satisfied. The T bit is cleared to 0 if the condition is not satisfied. The Rn data does not change. The following eight conditions can be specified. Conditions PZ and PL are the results of comparisons between Rn and 0. Sign-extended 8-bit immediate data can also be compared with R0 by using condition EQ. Here, R0 data does not change. Table 7.2 shows the mnemonics for the conditions.
Table 7.2 CMP Mnemonics

Mnemonics         Condition
CMP/EQ Rm,Rn      If Rn = Rm, T = 1
CMP/GE Rm,Rn      If Rn ≥ Rm with signed data, T = 1
CMP/GT Rm,Rn      If Rn > Rm with signed data, T = 1
CMP/HI Rm,Rn      If Rn > Rm with unsigned data, T = 1
CMP/HS Rm,Rn      If Rn ≥ Rm with unsigned data, T = 1
CMP/PL Rn         If Rn > 0, T = 1
CMP/PZ Rn         If Rn ≥ 0, T = 1
CMP/STR Rm,Rn     If a byte in Rn equals a byte in Rm, T = 1
CMP/EQ #imm,R0    If R0 = imm, T = 1

Operation:
CMPEQ(long m,long n) /* CMP_EQ Rm,Rn */
{
    if (R[n]==R[m]) T=1;
    else T=0;
    PC+=2;
}
CMPGE(long m,long n) /* CMP_GE Rm,Rn */
{
    if ((long)R[n]>=(long)R[m]) T=1;
    else T=0;
    PC+=2;
}
CMPGT(long m,long n) /* CMP_GT Rm,Rn */
{
    if ((long)R[n]>(long)R[m]) T=1;
    else T=0;
    PC+=2;
}
CMPHI(long m,long n) /* CMP_HI Rm,Rn */
{
    if ((unsigned long)R[n]>(unsigned long)R[m]) T=1;
    else T=0;
    PC+=2;
}
CMPHS(long m,long n) /* CMP_HS Rm,Rn */
{
    if ((unsigned long)R[n]>=(unsigned long)R[m]) T=1;
    else T=0;
    PC+=2;
}
CMPPL(long n) /* CMP_PL Rn */
{
    if ((long)R[n]>0) T=1;
    else T=0;
    PC+=2;
}
CMPPZ(long n) /* CMP_PZ Rn */
{
    if ((long)R[n]>=0) T=1;
    else T=0;
    PC+=2;
}
CMPSTR(long m,long n) /* CMP_STR Rm,Rn */
{
    unsigned long temp;
    long HH,HL,LH,LL;
    temp=R[n]^R[m];               /* a byte of temp is 0 where the bytes match */
    HH=(temp>>24)&0x000000FF;
    HL=(temp>>16)&0x000000FF;
    LH=(temp>>8)&0x000000FF;
    LL=temp&0x000000FF;
    HH=HH&&HL&&LH&&LL;
    if (HH==0) T=1;
    else T=0;
    PC+=2;
}
CMPIM(long i) /* CMP_EQ #imm,R0 */
{
    long imm;
    if ((i&0x80)==0) imm=(0x000000FF & (long)i);
    else imm=(0xFFFFFF00 | (long)i);
    if (R[0]==imm) T=1;
    else T=0;
    PC+=2;
}

Example:
CMP/GE R0,R1        ;R0 = H'7FFFFFFF, R1 = H'80000000
BT TRGET_T          ;Does not branch because T = 0
CMP/HS R0,R1        ;R0 = H'7FFFFFFF, R1 = H'80000000
BT TRGET_T          ;Branches because T = 1
CMP/STR R2,R3       ;R2 = "ABCD", R3 = "XYCZ"
BT TRGET_T          ;Branches because T = 1

7.2.16 DIV0S (Divide Step 0 as Signed): Arithmetic Instruction

Format         Abstract                                      Code                Cycle   T Bit
DIV0S Rm,Rn    MSB of Rn → Q, MSB of Rm → M, M^Q → T         0010nnnnmmmm0111    1       Calculation result

Description: DIV0S is an initialization instruction for signed division.
The quotient is found by following this instruction with repeated execution of DIV1, or of DIV1 combined with other instructions, performing one division step per bit. See the description given with DIV1 for more information.

Operation:
DIV0S(long m,long n) /* DIV0S Rm,Rn */
{
    if ((R[n]&0x80000000)==0) Q=0;
    else Q=1;
    if ((R[m]&0x80000000)==0) M=0;
    else M=1;
    T=!(M==Q);
    PC+=2;
}

Example: See DIV1.

7.2.17 DIV0U (Divide Step 0 as Unsigned): Arithmetic Instruction

Format    Abstract       Code                Cycle   T Bit
DIV0U     0 → M/Q/T      0000000000011001    1       0

Description: DIV0U is an initialization instruction for unsigned division. The quotient is found by following this instruction with repeated execution of DIV1, or of DIV1 combined with other instructions, performing one division step per bit. See the description given with DIV1 for more information.

Operation:
DIV0U() /* DIV0U */
{
    M=Q=T=0;
    PC+=2;
}

Example: See DIV1.

7.2.18 DIV1 (Divide 1 Step): Arithmetic Instruction

Format        Abstract                      Code                Cycle   T Bit
DIV1 Rm,Rn    1 step division (Rn ÷ Rm)     0011nnnnmmmm0100    1       Calculation result

Description: Uses single-step division to divide one bit of the 32-bit data in general register Rn (dividend) by Rm data (divisor). It finds a quotient through repetition, either on its own or in combination with other instructions. During this repetition, do not rewrite the specified register or the M, Q, and T bits. In one-step division, the dividend is shifted one bit left, the divisor is subtracted, and the quotient bit is reflected in the Q bit according to the status (positive or negative). To find the remainder in a division, first find the quotient using a DIV1 instruction, then find the remainder as follows:

(dividend) – (divisor) × (quotient) = (remainder)

Zero division, overflow detection, and remainder operation are not supported. Check for zero division and overflow division before dividing. Find the remainder by first finding the product of the divisor and the quotient obtained and then subtracting it from the dividend.
That is, first initialize with DIV0S or DIV0U. Repeat DIV1 for each bit of the divisor to obtain the quotient. When the quotient requires 17 or more bits, place ROTCL before DIV1. For the division sequence, see the following examples.

Operation:
DIV1(long m,long n) /* DIV1 Rm,Rn */
{
    unsigned long tmp0;
    unsigned char old_q,tmp1;
    old_q=Q;
    Q=(unsigned char)((0x80000000 & R[n])!=0);
    R[n]<<=1;
    R[n]|=(unsigned long)T;
    switch(old_q){
    case 0:
        switch(M){
        case 0:
            tmp0=R[n];
            R[n]-=R[m];
            tmp1=(R[n]>tmp0);
            switch(Q){
            case 0: Q=tmp1; break;
            case 1: Q=(unsigned char)(tmp1==0); break;
            }
            break;
        case 1:
            tmp0=R[n];
            R[n]+=R[m];
            tmp1=(R[n]<tmp0);
            switch(Q){
            case 0: Q=(unsigned char)(tmp1==0); break;
            case 1: Q=tmp1; break;
            }
            break;
        }
        break;
    case 1:
        switch(M){
        case 0:
            tmp0=R[n];
            R[n]+=R[m];
            tmp1=(R[n]<tmp0);
            switch(Q){
            case 0: Q=tmp1; break;
            case 1: Q=(unsigned char)(tmp1==0); break;
            }
            break;
        case 1:
            tmp0=R[n];
            R[n]-=R[m];
            tmp1=(R[n]>tmp0);
            switch(Q){
            case 0: Q=(unsigned char)(tmp1==0); break;
            case 1: Q=tmp1; break;
            }
            break;
        }
        break;
    }
    T=(Q==M);
    PC+=2;
}

Example 1:
                    ;R1 (32 bits) / R0 (16 bits) = R1 (16 bits):Unsigned
SHLL16   R0         ;Upper 16 bits = divisor, lower 16 bits = 0
TST      R0,R0      ;Zero division check
BT       ZERO_DIV   ;
CMP/HS   R0,R1      ;Overflow check
BT       OVER_DIV   ;
DIV0U               ;Flag initialization
.arepeat 16         ;
DIV1     R0,R1      ;Repeat 16 times
.aendr              ;
ROTCL    R1         ;
EXTU.W   R1,R1      ;R1 = Quotient

Example 2:
                    ;R1:R2 (64 bits)/R0 (32 bits) = R2 (32 bits):Unsigned
TST      R0,R0      ;Zero division check
BT       ZERO_DIV   ;
CMP/HS   R0,R1      ;Overflow check
BT       OVER_DIV   ;
DIV0U               ;Flag initialization
.arepeat 32         ;
ROTCL    R2         ;Repeat 32 times
DIV1     R0,R1      ;
.aendr              ;
ROTCL    R2         ;R2 = Quotient

Example 3:
                    ;R1 (16 bits)/R0 (16 bits) = R1 (16 bits):Signed
SHLL16   R0         ;Upper 16 bits = divisor, lower 16 bits = 0
EXTS.W   R1,R1      ;Sign-extends the dividend to 32 bits
XOR      R2,R2      ;R2 = 0
MOV      R1,R3      ;
ROTCL    R3         ;
SUBC     R2,R1      ;Decrements if the dividend is negative
DIV0S    R0,R1      ;Flag initialization
.arepeat 16         ;
DIV1     R0,R1      ;Repeat 16 times
.aendr              ;
EXTS.W   R1,R1      ;
ROTCL    R1         ;R1 = quotient (one's complement)
ADDC     R2,R1      ;Increments and takes the two's complement if the MSB of the quotient is 1
EXTS.W   R1,R1      ;R1 = quotient (two's complement)

Example 4:
                    ;R2 (32 bits) / R0 (32 bits) = R2 (32 bits):Signed
MOV      R2,R3      ;
ROTCL    R3         ;
SUBC     R1,R1      ;Sign-extends the dividend to 64 bits (R1:R2)
XOR      R3,R3      ;R3 = 0
SUBC     R3,R2      ;Decrements and takes the one's complement if the dividend is negative
DIV0S    R0,R1      ;Flag initialization
.arepeat 32         ;
ROTCL    R2         ;Repeat 32 times
DIV1     R0,R1      ;
.aendr              ;
ROTCL    R2         ;R2 = Quotient (one's complement)
ADDC     R3,R2      ;Increments and takes the two's complement if the MSB of the quotient is 1. R2 = Quotient (two's complement)

7.2.19 DMULS.L (Double-Length Multiply as Signed): Arithmetic Instruction

Format           Abstract                             Code                Cycle     T Bit
DMULS.L Rm,Rn    With sign, Rn × Rm → MACH, MACL      0011nnnnmmmm1101    2 to 4    —

Description: Performs 32-bit multiplication of the contents of general registers Rn and Rm, and stores the 64-bit result in the MACH and MACL registers. The operation is a signed arithmetic operation.
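The signed 64-bit result described above can be cross-checked with a short C model that uses a native 64-bit type instead of 16-bit partial products. This is a sketch under stated assumptions: the function name dmuls_ref is hypothetical, and the conversion of a register bit pattern to int32_t is assumed to behave as two's complement reinterpretation, which is what common compilers provide.

```c
#include <stdint.h>

/* Hypothetical reference model (not from the manual):
   signed 32 x 32 -> 64-bit multiply, result split into MACH and MACL. */
static void dmuls_ref(uint32_t rn, uint32_t rm,
                      uint32_t *mach, uint32_t *macl)
{
    /* Reinterpret both operands as signed, then widen before multiplying
       so that the full 64-bit product is kept. */
    int64_t product = (int64_t)(int32_t)rn * (int64_t)(int32_t)rm;
    uint64_t bits = (uint64_t)product;    /* well-defined modulo 2^64 */

    *mach = (uint32_t)(bits >> 32);       /* upper 32 bits */
    *macl = (uint32_t)bits;               /* lower 32 bits */
}
```

With the operands H'FFFFFFFE (that is, –2) and H'00005555, the model gives MACH = H'FFFFFFFF and MACL = H'FFFF5556.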
Operation:
DMULS(long m,long n) /* DMULS.L Rm,Rn */
{
    unsigned long RnL,RnH,RmL,RmH,Res0,Res1,Res2;
    unsigned long temp0,temp1,temp2,temp3;
    long tempm,tempn,fnLmL;
    tempn=(long)R[n];
    tempm=(long)R[m];
    if (tempn<0) tempn=0-tempn;
    if (tempm<0) tempm=0-tempm;
    if ((long)(R[n]^R[m])<0) fnLmL=-1;
    else fnLmL=0;
    temp1=(unsigned long)tempn;
    temp2=(unsigned long)tempm;
    RnL=temp1&0x0000FFFF;
    RnH=(temp1>>16)&0x0000FFFF;
    RmL=temp2&0x0000FFFF;
    RmH=(temp2>>16)&0x0000FFFF;
    temp0=RmL*RnL;
    temp1=RmH*RnL;
    temp2=RmL*RnH;
    temp3=RmH*RnH;
    Res2=0;
    Res1=temp1+temp2;
    if (Res1<temp1) Res2+=0x00010000;
    temp1=(Res1<<16)&0xFFFF0000;
    Res0=temp0+temp1;
    if (Res0<temp0) Res2++;
    Res2=Res2+((Res1>>16)&0x0000FFFF)+temp3;
    if (fnLmL<0) {
        Res2=~Res2;
        if (Res0==0) Res2++;
        else Res0=(~Res0)+1;
    }
    MACH=Res2;
    MACL=Res0;
    PC+=2;
}

Example:
DMULS.L R0,R1       ;Before execution: R0 = H'FFFFFFFE, R1 = H'00005555
                    ;After execution:  MACH = H'FFFFFFFF, MACL = H'FFFF5556
STS MACH,R0         ;Operation result (top)
STS MACL,R0         ;Operation result (bottom)

7.2.20 DMULU.L (Double-Length Multiply as Unsigned): Arithmetic Instruction

Format           Abstract                               Code                Cycle     T Bit
DMULU.L Rm,Rn    Without sign, Rn × Rm → MACH, MACL     0011nnnnmmmm0101    2 to 4    —

Description: Performs 32-bit multiplication of the contents of general registers Rn and Rm, and stores the 64-bit result in the MACH and MACL registers. The operation is an unsigned arithmetic operation.
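The unsigned 64-bit result described above can likewise be modeled in C with a native 64-bit type. This is only an illustrative sketch; the function name dmulu_ref is hypothetical and does not appear in this manual.

```c
#include <stdint.h>

/* Hypothetical reference model (not from the manual):
   unsigned 32 x 32 -> 64-bit multiply, result split into MACH and MACL. */
static void dmulu_ref(uint32_t rn, uint32_t rm,
                      uint32_t *mach, uint32_t *macl)
{
    /* Widen one operand first so the multiplication is done in 64 bits. */
    uint64_t product = (uint64_t)rn * rm;

    *mach = (uint32_t)(product >> 32);    /* upper 32 bits */
    *macl = (uint32_t)product;            /* lower 32 bits */
}
```

For the operands H'FFFFFFFE and H'00005555 the model yields MACH = H'00005554 and MACL = H'FFFF5556, since (2^32 – 2) × H'5555 = H'5555 × 2^32 – H'AAAA.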
Operation:
DMULU(long m,long n) /* DMULU.L Rm,Rn */
{
    unsigned long RnL,RnH,RmL,RmH,Res0,Res1,Res2;
    unsigned long temp0,temp1,temp2,temp3;
    RnL=R[n]&0x0000FFFF;
    RnH=(R[n]>>16)&0x0000FFFF;
    RmL=R[m]&0x0000FFFF;
    RmH=(R[m]>>16)&0x0000FFFF;
    temp0=RmL*RnL;
    temp1=RmH*RnL;
    temp2=RmL*RnH;
    temp3=RmH*RnH;
    Res2=0;
    Res1=temp1+temp2;
    if (Res1<temp1) Res2+=0x00010000;
    temp1=(Res1<<16)&0xFFFF0000;
    Res0=temp0+temp1;
    if (Res0<temp0) Res2++;
    Res2=Res2+((Res1>>16)&0x0000FFFF)+temp3;
    MACH=Res2;
    MACL=Res0;
    PC+=2;
}

Example:
DMULU.L R0,R1       ;Before execution: R0 = H'FFFFFFFE, R1 = H'00005555
                    ;After execution:  MACH = H'00005554, MACL = H'FFFF5556
STS MACH,R0         ;Operation result (top)
STS MACL,R0         ;Operation result (bottom)

7.2.21 DT (Decrement and Test): Arithmetic Instruction

Format    Abstract                                                         Code                Cycle   T Bit
DT Rn     Rn – 1 → Rn; When Rn is 0, 1 → T, when Rn is nonzero, 0 → T      0100nnnn00010000    1       Comparison result

Description: The contents of general register Rn are decremented by 1 and the result is compared to 0 (zero). When the result is 0, the T bit is set to 1. When the result is not zero, the T bit is set to 0.

Operation:
DT(long n) /* DT Rn */
{
    R[n]--;
    if (R[n]==0) T=1;
    else T=0;
    PC+=2;
}

Example:
        MOV   #4,R5     ;Sets the number of loops.
LOOP:
        ADD   R0,R1     ;
        DT    R5        ;Decrements the R5 value and checks whether it has become 0.
        BF    LOOP      ;Branches to LOOP if T = 0. (In this example, loops 4 times.)

7.2.22 EXTS (Extend as Signed): Arithmetic Instruction

Format          Abstract                          Code                Cycle   T Bit
EXTS.B Rm,Rn    Sign-extend Rm from byte → Rn     0110nnnnmmmm1110    1       —
EXTS.W Rm,Rn    Sign-extend Rm from word → Rn     0110nnnnmmmm1111    1       —

Description: Sign-extends general register Rm data, and stores the result in Rn. If byte length is specified, the bit 7 value of Rm is copied into bits 8 to 31 of Rn. If word length is specified, the bit 15 value of Rm is copied into bits 16 to 31 of Rn.
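The bit copying described above can be sketched in portable C without branches. This is an illustrative model only; the function names exts_b and exts_w are hypothetical, and the XOR-and-subtract idiom is a common alternative to the if/else form, not the manual's own method.

```c
#include <stdint.h>

/* Hypothetical helper (not from the manual): sign-extend from bit 7.
   XOR flips the sign bit; the subtraction then propagates it upward. */
static uint32_t exts_b(uint32_t rm)
{
    return ((rm & 0xFFu) ^ 0x80u) - 0x80u;
}

/* Hypothetical helper: sign-extend from bit 15, same idiom. */
static uint32_t exts_w(uint32_t rm)
{
    return ((rm & 0xFFFFu) ^ 0x8000u) - 0x8000u;
}
```

For example, exts_b applied to H'00000080 gives H'FFFFFF80, and exts_w applied to H'00008000 gives H'FFFF8000. Values with the sign bit clear, such as H'0000007F, are returned unchanged.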
Operation:
EXTSB(long m,long n) /* EXTS.B Rm,Rn */
{
    R[n]=R[m];
    if ((R[m]&0x00000080)==0) R[n]&=0x000000FF;
    else R[n]|=0xFFFFFF00;
    PC+=2;
}
EXTSW(long m,long n) /* EXTS.W Rm,Rn */
{
    R[n]=R[m];
    if ((R[m]&0x00008000)==0) R[n]&=0x0000FFFF;
    else R[n]|=0xFFFF0000;
    PC+=2;
}

Examples:
EXTS.B R0,R1    ;Before execution: R0 = H'00000080
                ;After execution:  R1 = H'FFFFFF80
EXTS.W R0,R1    ;Before execution: R0 = H'00008000
                ;After execution:  R1 = H'FFFF8000

7.2.23 EXTU (Extend as Unsigned): Arithmetic Instruction

Format          Abstract                          Code                Cycle   T Bit
EXTU.B Rm,Rn    Zero-extend Rm from byte → Rn     0110nnnnmmmm1100    1       —
EXTU.W Rm,Rn    Zero-extend Rm from word → Rn     0110nnnnmmmm1101    1       —

Description: Zero-extends general register Rm data, and stores the result in Rn. If byte length is specified, 0s are written in bits 8 to 31 of Rn. If word length is specified, 0s are written in bits 16 to 31 of Rn.

Operation:
EXTUB(long m,long n) /* EXTU.B Rm,Rn */
{
    R[n]=R[m];
    R[n]&=0x000000FF;
    PC+=2;
}
EXTUW(long m,long n) /* EXTU.W Rm,Rn */
{
    R[n]=R[m];
    R[n]&=0x0000FFFF;
    PC+=2;
}

Examples:
EXTU.B R0,R1    ;Before execution: R0 = H'FFFFFF80
                ;After execution:  R1 = H'00000080
EXTU.W R0,R1    ;Before execution: R0 = H'FFFF8000
                ;After execution:  R1 = H'00008000

7.2.24 JMP (Jump): Branch Instruction
Class: Delayed branch instruction

Format     Abstract    Code                Cycle   T Bit
JMP @Rm    Rm → PC     0100mmmm00101011    2       —

Description: Branches unconditionally to the address specified by register indirect addressing. The branch destination is an address specified by the 32-bit data in general register Rm.

Note: Since this is a delayed branch instruction, the instruction after JMP is executed before branching. No interrupts or address errors are accepted between this instruction and the next instruction. If the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction.
Operation:
JMP(long m) /* JMP @Rm */
{
    unsigned long temp;
    temp=PC;
    PC=R[m]+4;
    Delay_Slot(temp+2);
}

Example:
        MOV.L JMP_TABLE,R0      ;R0 = address of TRGET
        JMP   @R0               ;Branches to TRGET
        MOV   R0,R1             ;Executes MOV before branching
        .align 4
JMP_TABLE: .data.l TRGET        ;Jump table
        .................
TRGET:  ADD   #1,R1             ;← Branch destination

Note: When a delayed branch instruction is used, the branching operation takes place after the slot instruction is executed, but the execution of instructions (register update, etc.) takes place in the sequence delayed branch instruction → delayed slot instruction. For example, even if a delayed slot instruction is used to change the register where the branch destination address is stored, the register content previous to the change will be used as the branch destination address.

7.2.25 JSR (Jump to Subroutine): Branch Instruction (Class: Delayed Branch Instruction)

Format     Abstract             Code                Cycle   T Bit
JSR @Rm    PC → PR, Rm → PC     0100mmmm00001011    2       —

Description: Branches to the subroutine procedure at the address specified by register indirect addressing. The PC value is stored in the PR. The jump destination is an address specified by the 32-bit data in general register Rm. The stored/saved PC is the address four bytes after this instruction. The JSR instruction and RTS instruction are used together for subroutine procedure calls.

Note: Since this is a delayed branch instruction, the instruction after JSR is executed before branching. No interrupts or address errors are accepted between this instruction and the next instruction. If the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction.

Operation:
JSR(long m) /* JSR @Rm */
{
    PR=PC;
    PC=R[m]+4;
    Delay_Slot(PR+2);
}

Example:
        MOV.L JSR_TABLE,R0      ;R0 = address of TRGET
        JSR   @R0               ;Branches to TRGET
        XOR   R1,R1             ;Executes XOR before branching
        ADD   R0,R1             ;← Return address for when the subroutine procedure is completed (PR data)
        ...........
        .align 4
JSR_TABLE: .data.l TRGET        ;Jump table
TRGET:  NOP                     ;← Procedure entrance
        MOV   R2,R3             ;
        RTS                     ;Returns to the above ADD instruction
        MOV   #70,R1            ;Executes MOV before RTS

Note: When a delayed branch instruction is used, the branching operation takes place after the slot instruction is executed, but the execution of instructions (register update, etc.) takes place in the sequence delayed branch instruction → delayed slot instruction. For example, even if a delayed slot instruction is used to change the register where the branch destination address is stored, the register content previous to the change will be used as the branch destination address.

7.2.26 LDC (Load to Control Register): System Control Instruction (Class: Interrupt Disabled Instruction)

Format             Abstract                      Code                Cycle   T Bit
LDC Rm,SR          Rm → SR                       0100mmmm00001110    1       LSB
LDC Rm,GBR         Rm → GBR                      0100mmmm00011110    1       —
LDC Rm,VBR         Rm → VBR                      0100mmmm00101110    1       —
LDC.L @Rm+,SR      (Rm) → SR, Rm + 4 → Rm        0100mmmm00000111    3       LSB
LDC.L @Rm+,GBR     (Rm) → GBR, Rm + 4 → Rm       0100mmmm00010111    3       —
LDC.L @Rm+,VBR     (Rm) → VBR, Rm + 4 → Rm       0100mmmm00100111    3       —

Description: Stores the source operand into control register SR, GBR, or VBR.

Note: No interrupts are accepted between this instruction and the next instruction. Address errors are accepted.
Operation:
    LDCSR(long m) /* LDC Rm,SR */
    {
        SR=R[m]&0x0FFF0FFF;
        PC+=2;
    }

    LDCGBR(long m) /* LDC Rm,GBR */
    {
        GBR=R[m];
        PC+=2;
    }

    LDCVBR(long m) /* LDC Rm,VBR */
    {
        VBR=R[m];
        PC+=2;
    }

    LDCMSR(long m) /* LDC.L @Rm+,SR */
    {
        SR=Read_Long(R[m])&0x0FFF0FFF;
        R[m]+=4;
        PC+=2;
    }

    LDCMGBR(long m) /* LDC.L @Rm+,GBR */
    {
        GBR=Read_Long(R[m]);
        R[m]+=4;
        PC+=2;
    }

    LDCMVBR(long m) /* LDC.L @Rm+,VBR */
    {
        VBR=Read_Long(R[m]);
        R[m]+=4;
        PC+=2;
    }

Examples:
    LDC   R0,SR        ;Before execution: R0 = H'FFFFFFFF, SR = H'00000000
                       ;After execution:  SR = H'0FFF0FFF
    LDC.L @R15+,GBR    ;Before execution: R15 = H'10000000
                       ;After execution:  R15 = H'10000004, GBR = @H'10000000

7.2.27 LDS (Load to System Register): System Control Instruction (Class: Interrupt Disabled Instruction)

    Format              Abstract                      Code               Cycle   T Bit
    LDS   Rm,MACH       Rm → MACH                     0100mmmm00001010   1       —
    LDS   Rm,MACL       Rm → MACL                     0100mmmm00011010   1       —
    LDS   Rm,PR         Rm → PR                       0100mmmm00101010   1       —
    LDS.L @Rm+,MACH     (Rm) → MACH, Rm + 4 → Rm      0100mmmm00000110   1       —
    LDS.L @Rm+,MACL     (Rm) → MACL, Rm + 4 → Rm      0100mmmm00010110   1       —
    LDS.L @Rm+,PR       (Rm) → PR, Rm + 4 → Rm        0100mmmm00100110   1       —

Description: Stores the source operand into the system register MACH, MACL, or PR.

Note: No interrupts are accepted between this instruction and the next instruction. Address errors are accepted.
Operation:
    LDSMACH(long m) /* LDS Rm,MACH */
    {
        MACH=R[m];
        PC+=2;
    }

    LDSMACL(long m) /* LDS Rm,MACL */
    {
        MACL=R[m];
        PC+=2;
    }

    LDSPR(long m) /* LDS Rm,PR */
    {
        PR=R[m];
        PC+=2;
    }

    LDSMMACH(long m) /* LDS.L @Rm+,MACH */
    {
        MACH=Read_Long(R[m]);
        R[m]+=4;
        PC+=2;
    }

    LDSMMACL(long m) /* LDS.L @Rm+,MACL */
    {
        MACL=Read_Long(R[m]);
        R[m]+=4;
        PC+=2;
    }

    LDSMPR(long m) /* LDS.L @Rm+,PR */
    {
        PR=Read_Long(R[m]);
        R[m]+=4;
        PC+=2;
    }

Examples:
    LDS   R0,PR         ;Before execution: R0 = H'12345678, PR = H'00000000
                        ;After execution:  PR = H'12345678
    LDS.L @R15+,MACL    ;Before execution: R15 = H'10000000
                        ;After execution:  R15 = H'10000004, MACL = @H'10000000

7.2.28 MAC.L (Multiply and Accumulate Calculation Long): Arithmetic Instruction

    Format              Abstract                                     Code               Cycle        T Bit
    MAC.L @Rm+,@Rn+     Signed operation, (Rn) × (Rm) + MAC → MAC    0000nnnnmmmm1111   3/(2 to 4)   —

Description: Performs signed multiplication of the 32-bit operands read from the addresses held in general registers Rm and Rn. The 64-bit product is added to the contents of the MAC register, and the final result is stored in the MAC register. Each time an operand is read, the register used as its address (Rm or Rn) is incremented by four. When the S bit is cleared to 0, the 64-bit result is stored in the coupled MACH and MACL registers. When the S bit is set to 1, addition to the MAC register is a saturation operation on the 48 bits starting from the LSB. For the saturation operation, only the lower 48 bits of the MAC register are enabled, and the result is limited to the range from H'FFFF800000000000 (minimum) to H'00007FFFFFFFFFFF (maximum).
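The two accumulation modes described above can be sketched with ordinary 64-bit C arithmetic. This is an illustration of the description only, not a bit-exact replica of the manual's operation listing: the helper name `mac_l_step`, the `MAC48_MAX`/`MAC48_MIN` macros, and the use of a single `int64_t` for the coupled MACH:MACL value are all assumptions made for this sketch. It also assumes that, in saturation mode, the accumulator already lies inside the 48-bit range.

```c
#include <assert.h>
#include <stdint.h>

/* Bounds of the 48-bit saturation range quoted in the description. */
#define MAC48_MAX INT64_C(0x00007FFFFFFFFFFF)
#define MAC48_MIN (-INT64_C(0x00007FFFFFFFFFFF) - 1) /* H'FFFF800000000000 */

/* Hypothetical helper: one MAC.L accumulation step on a 64-bit MAC value.
 * rn_data and rm_data stand for the longwords read from @Rn and @Rm. */
static int64_t mac_l_step(int64_t mac, int32_t rn_data, int32_t rm_data, int s_bit)
{
    int64_t product = (int64_t)rn_data * (int64_t)rm_data;

    if (!s_bit) {
        /* S = 0: plain 64-bit accumulation; unsigned addition wraps around. */
        return (int64_t)((uint64_t)mac + (uint64_t)product);
    }

    /* S = 1: this sketch assumes the accumulator is already a 48-bit value. */
    assert(mac >= MAC48_MIN && mac <= MAC48_MAX);

    /* |product| <= 2^62 and |mac| <= 2^47, so this sum cannot overflow. */
    int64_t sum = mac + product;
    if (sum > MAC48_MAX) return MAC48_MAX;
    if (sum < MAC48_MIN) return MAC48_MIN;
    return sum;
}
```

Calling `mac_l_step` once per operand pair mirrors one MAC.L instruction; the post-increment of Rm and Rn is left to the caller in this sketch.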
Operation:
    MACL(long m,long n) /* MAC.L @Rm+,@Rn+ */
    {
        unsigned long RnL,RnH,RmL,RmH,Res0,Res1,Res2;
        unsigned long temp0,temp1,temp2,temp3;
        long tempm,tempn,fnLmL;

        tempn=(long)Read_Long(R[n]);
        R[n]+=4;
        tempm=(long)Read_Long(R[m]);
        R[m]+=4;
        if ((long)(tempn^tempm)<0) fnLmL=-1;
        else fnLmL=0;
        if (tempn<0) tempn=0-tempn;
        if (tempm<0) tempm=0-tempm;
        temp1=(unsigned long)tempn;
        temp2=(unsigned long)tempm;
        RnL=temp1&0x0000FFFF;
        RnH=(temp1>>16)&0x0000FFFF;
        RmL=temp2&0x0000FFFF;
        RmH=(temp2>>16)&0x0000FFFF;
        temp0=RmL*RnL;
        temp1=RmH*RnL;
        temp2=RmL*RnH;
        temp3=RmH*RnH;
        Res2=0;
        Res1=temp1+temp2;
        if (Res1<temp1) Res2+=0x00010000;
        temp1=(Res1<<16)&0xFFFF0000;
        Res0=temp0+temp1;
        if (Res0<temp0) Res2++;
        Res2=Res2+((Res1>>16)&0x0000FFFF)+temp3;
        if (fnLmL<0) {
            Res2=~Res2;
            if (Res0==0) Res2++;
            else Res0=(~Res0)+1;
        }
        if (S==1) {
            Res0=MACL+Res0;
            if (MACL>Res0) Res2++;
            Res2+=(MACH&0x0000FFFF);
            if (((long)Res2<0)&&(Res2<0xFFFF8000)) {
                Res2=0x00008000;
                Res0=0x00000000;
            }
            if (((long)Res2>0)&&(Res2>0x00007FFF)) {
                Res2=0x00007FFF;
                Res0=0xFFFFFFFF;
            }
            MACH=Res2;
            MACL=Res0;
        }
        else {
            Res0=MACL+Res0;
            if (MACL>Res0) Res2++;
            Res2+=MACH;
            MACH=Res2;
            MACL=Res0;
        }
        PC+=2;
    }

Example:
            MOVA    TBLM,R0       ;Table address
            MOV     R0,R1         ;
            MOVA    TBLN,R0       ;Table address
            CLRMAC                ;MAC register initialization
            MAC.L   @R0+,@R1+     ;
            MAC.L   @R0+,@R1+     ;
            STS     MACL,R0       ;Store result into R0
            ...............
            .align  2             ;
    TBLM    .data.l H'1234ABCD    ;
            .data.l H'5678EF01    ;
    TBLN    .data.l H'0123ABCD    ;
            .data.l H'4567DEF0    ;

7.2.29 MAC.W (Multiply and Accumulate Calculation Word): Arithmetic Instruction

    Format              Abstract                               Code               Cycle   T Bit
    MAC.W @Rm+,@Rn+     With sign, (Rn) × (Rm) + MAC → MAC     0100nnnnmmmm1111   3/(2)   —
    MAC   @Rm+,@Rn+

Description: Performs signed multiplication of the 16-bit operands read from the addresses held in general registers Rm and Rn. The 32-bit product is added to the contents of the MAC register, and the final result is stored in the MAC register. Rm and Rn are each incremented by 2 after the operation.
When the S bit is cleared to 0, the operation is a 16 × 16 + 64 → 64-bit multiply and accumulate, and the 64-bit result is stored in the coupled MACH and MACL registers. When the S bit is set to 1, the operation is a 16 × 16 + 32 → 32-bit multiply and accumulate, and addition to the MAC register is a saturation operation. For the saturation operation, only the MACL register is enabled and the result is stored in the MACL register, limited to H'80000000 (minimum) for overflows in the negative direction and H'7FFFFFFF (maximum) for overflows in the positive direction. If an overflow occurs, the LSB of the MACH register is set to 1.

Operation:
    MACW(long m,long n) /* MAC.W @Rm+,@Rn+ */
    {
        long tempm,tempn,dest,src,ans;
        unsigned long temp1;

        tempn=(long)Read_Word(R[n]);
        R[n]+=2;
        tempm=(long)Read_Word(R[m]);
        R[m]+=2;
        temp1=MACL;
        tempm=((long)(short)tempn*(long)(short)tempm);
        if ((long)MACL>=0) dest=0;
        else dest=1;
        if ((long)tempm>=0) {
            src=0;
            tempn=0;
        }
        else {
            src=1;
            tempn=0xFFFFFFFF;
        }
        src+=dest;
        MACL+=tempm;
        if ((long)MACL>=0) ans=0;
        else ans=1;
        ans+=dest;
        if (S==1) {
            if (ans==1) {
                if (src==0) MACL=0x7FFFFFFF;
                if (src==2) MACL=0x80000000;
            }
        }
        else {
            MACH+=tempn;
            if (temp1>MACL) MACH+=1;
        }
        PC+=2;
    }

Example:
            MOVA    TBLM,R0       ;Table address
            MOV     R0,R1         ;
            MOVA    TBLN,R0       ;Table address
            CLRMAC                ;MAC register initialization
            MAC.W   @R0+,@R1+     ;
            MAC.W   @R0+,@R1+     ;
            STS     MACL,R0       ;Store result into R0
            ...............
            .align  2             ;
    TBLM    .data.w H'1234        ;
            .data.w H'5678        ;
    TBLN    .data.w H'0123        ;
            .data.w H'4567        ;

7.2.30 MOV (Move Data): Data Transfer Instruction

    Format                 Abstract                                    Code               Cycle   T Bit
    MOV   Rm,Rn            Rm → Rn                                     0110nnnnmmmm0011   1       —
    MOV.B Rm,@Rn           Rm → (Rn)                                   0010nnnnmmmm0000   1       —
    MOV.W Rm,@Rn           Rm → (Rn)                                   0010nnnnmmmm0001   1       —
    MOV.L Rm,@Rn           Rm → (Rn)                                   0010nnnnmmmm0010   1       —
    MOV.B @Rm,Rn           (Rm) → sign extension → Rn                  0110nnnnmmmm0000   1       —
    MOV.W @Rm,Rn           (Rm) → sign extension → Rn                  0110nnnnmmmm0001   1       —
    MOV.L @Rm,Rn           (Rm) → Rn                                   0110nnnnmmmm0010   1       —
    MOV.B Rm,@-Rn          Rn - 1 → Rn, Rm → (Rn)                      0010nnnnmmmm0100   1       —
    MOV.W Rm,@-Rn          Rn - 2 → Rn, Rm → (Rn)                      0010nnnnmmmm0101   1       —
    MOV.L Rm,@-Rn          Rn - 4 → Rn, Rm → (Rn)                      0010nnnnmmmm0110   1       —
    MOV.B @Rm+,Rn          (Rm) → sign extension → Rn, Rm + 1 → Rm     0110nnnnmmmm0100   1       —
    MOV.W @Rm+,Rn          (Rm) → sign extension → Rn, Rm + 2 → Rm     0110nnnnmmmm0101   1       —
    MOV.L @Rm+,Rn          (Rm) → Rn, Rm + 4 → Rm                      0110nnnnmmmm0110   1       —
    MOV.B Rm,@(R0,Rn)      Rm → (R0 + Rn)                              0000nnnnmmmm0100   1       —
    MOV.W Rm,@(R0,Rn)      Rm → (R0 + Rn)                              0000nnnnmmmm0101   1       —
    MOV.L Rm,@(R0,Rn)      Rm → (R0 + Rn)                              0000nnnnmmmm0110   1       —
    MOV.B @(R0,Rm),Rn      (R0 + Rm) → sign extension → Rn             0000nnnnmmmm1100   1       —
    MOV.W @(R0,Rm),Rn      (R0 + Rm) → sign extension → Rn             0000nnnnmmmm1101   1       —
    MOV.L @(R0,Rm),Rn      (R0 + Rm) → Rn                              0000nnnnmmmm1110   1       —

Description: Transfers the source operand to the destination. When the operand is stored in memory, the transferred data can be a byte, word, or longword. Data loaded from memory is sign-extended to a longword before it is stored in the register.
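The sign extension applied to byte and word loads can be illustrated in C as follows. This is a minimal sketch; the function names `sign_extend_byte` and `sign_extend_word` are hypothetical and are not part of the manual's operation listings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: widen a loaded byte the way MOV.B @Rm,Rn does.
 * Bit 7 of the byte is copied into bits 8 to 31 of the result. */
static uint32_t sign_extend_byte(uint8_t value)
{
    return (value & 0x80u) ? (0xFFFFFF00u | value) : (uint32_t)value;
}

/* Hypothetical helper: widen a loaded word the way MOV.W @Rm,Rn does.
 * Bit 15 of the word is copied into bits 16 to 31 of the result. */
static uint32_t sign_extend_word(uint16_t value)
{
    return (value & 0x8000u) ? (0xFFFF0000u | value) : (uint32_t)value;
}
```

For instance, a loaded byte of H'80 becomes H'FFFFFF80, while a loaded word of H'7F80 stays H'00007F80 because its bit 15 is clear.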
Operation:
    MOV(long m,long n) /* MOV Rm,Rn */
    {
        R[n]=R[m];
        PC+=2;
    }

    MOVBS(long m,long n) /* MOV.B Rm,@Rn */
    {
        Write_Byte(R[n],R[m]);
        PC+=2;
    }

    MOVWS(long m,long n) /* MOV.W Rm,@Rn */
    {
        Write_Word(R[n],R[m]);
        PC+=2;
    }

    MOVLS(long m,long n) /* MOV.L Rm,@Rn */
    {
        Write_Long(R[n],R[m]);
        PC+=2;
    }

    MOVBL(long m,long n) /* MOV.B @Rm,Rn */
    {
        R[n]=(long)Read_Byte(R[m]);
        if ((R[n]&0x80)==0) R[n]&=0x000000FF;
        else R[n]|=0xFFFFFF00;
        PC+=2;
    }

    MOVWL(long m,long n) /* MOV.W @Rm,Rn */
    {
        R[n]=(long)Read_Word(R[m]);
        if ((R[n]&0x8000)==0) R[n]&=0x0000FFFF;
        else R[n]|=0xFFFF0000;
        PC+=2;
    }

    MOVLL(long m,long n) /* MOV.L @Rm,Rn */
    {
        R[n]=Read_Long(R[m]);
        PC+=2;
    }

    MOVBM(long m,long n) /* MOV.B Rm,@-Rn */
    {
        Write_Byte(R[n]-1,R[m]);
        R[n]-=1;
        PC+=2;
    }

    MOVWM(long m,long n) /* MOV.W Rm,@-Rn */
    {
        Write_Word(R[n]-2,R[m]);
        R[n]-=2;
        PC+=2;
    }

    MOVLM(long m,long n) /* MOV.L Rm,@-Rn */
    {
        Write_Long(R[n]-4,R[m]);
        R[n]-=4;
        PC+=2;
    }

    MOVBP(long m,long n) /* MOV.B @Rm+,Rn */
    {
        R[n]=(long)Read_Byte(R[m]);
        if ((R[n]&0x80)==0) R[n]&=0x000000FF;
        else R[n]|=0xFFFFFF00;
        if (n!=m) R[m]+=1;
        PC+=2;
    }

    MOVWP(long m,long n) /* MOV.W @Rm+,Rn */
    {
        R[n]=(long)Read_Word(R[m]);
        if ((R[n]&0x8000)==0) R[n]&=0x0000FFFF;
        else R[n]|=0xFFFF0000;
        if (n!=m) R[m]+=2;
        PC+=2;
    }

    MOVLP(long m,long n) /* MOV.L @Rm+,Rn */
    {
        R[n]=Read_Long(R[m]);
        if (n!=m) R[m]+=4;
        PC+=2;
    }

    MOVBS0(long m,long n) /* MOV.B Rm,@(R0,Rn) */
    {
        Write_Byte(R[n]+R[0],R[m]);
        PC+=2;
    }

    MOVWS0(long m,long n) /* MOV.W Rm,@(R0,Rn) */
    {
        Write_Word(R[n]+R[0],R[m]);
        PC+=2;
    }

    MOVLS0(long m,long n) /* MOV.L Rm,@(R0,Rn) */
    {
        Write_Long(R[n]+R[0],R[m]);
        PC+=2;
    }

    MOVBL0(long m,long n) /* MOV.B @(R0,Rm),Rn */
    {
        R[n]=(long)Read_Byte(R[m]+R[0]);
        if ((R[n]&0x80)==0) R[n]&=0x000000FF;
        else R[n]|=0xFFFFFF00;
        PC+=2;
    }

    MOVWL0(long m,long n) /* MOV.W @(R0,Rm),Rn */
    {
        R[n]=(long)Read_Word(R[m]+R[0]);
        if ((R[n]&0x8000)==0) R[n]&=0x0000FFFF;
        else R[n]|=0xFFFF0000;
        PC+=2;
    }

    MOVLL0(long m,long n) /* MOV.L @(R0,Rm),Rn */
    {
        R[n]=Read_Long(R[m]+R[0]);
        PC+=2;
    }

Example:
    MOV   R0,R1          ;Before execution: R0 = H'FFFFFFFF, R1 = H'00000000
                         ;After execution:  R1 = H'FFFFFFFF
    MOV.W R0,@R1         ;Before execution: R0 = H'FFFF7F80
                         ;After execution:  @R1 = H'7F80
    MOV.B @R0,R1         ;Before execution: @R0 = H'80, R1 = H'00000000
                         ;After execution:  R1 = H'FFFFFF80
    MOV.W R0,@-R1        ;Before execution: R0 = H'AAAAAAAA, R1 = H'FFFF7F80
                         ;After execution:  R1 = H'FFFF7F7E, @R1 = H'AAAA
    MOV.L @R0+,R1        ;Before execution: R0 = H'12345670
                         ;After execution:  R0 = H'12345674, R1 = @H'12345670
    MOV.B R1,@(R0,R2)    ;Before execution: R2 = H'00000004, R0 = H'10000000
                         ;After execution:  R1 = @H'10000004
    MOV.W @(R0,R2),R1    ;Before execution: R2 = H'00000004, R0 = H'10000000
                         ;After execution:  R1 = @H'10000004

7.2.31 MOV (Move Immediate Data): Data Transfer Instruction

    Format                 Abstract                                  Code               Cycle   T Bit
    MOV   #imm,Rn          imm → sign extension → Rn                 1110nnnniiiiiiii   1       —
    MOV.W @(disp,PC),Rn    (disp × 2 + PC) → sign extension → Rn     1001nnnndddddddd   1       —
    MOV.L @(disp,PC),Rn    (disp × 4 + PC) → Rn                      1101nnnndddddddd   1       —

Description: Stores immediate data, which has been sign-extended to a longword, into general register Rn. If the data is a word or longword, table data stored at the address specified by PC + displacement is accessed. If the data is a word, the 8-bit displacement is zero-extended and doubled. Consequently, the relative interval from the table can be up to PC + 510 bytes. The PC points to the starting address of the second instruction after this MOV instruction. If the data is a longword, the 8-bit displacement is zero-extended and quadrupled. Consequently, the relative interval from the table can be up to PC + 1020 bytes. The PC points to the starting address of the second instruction after this MOV instruction, but the lowest two bits of the PC are corrected to B'00.

Note: The optimum table assignment is at the rear end of the module or one instruction after the unconditional branch instruction.
If the optimum assignment is impossible, because there is no unconditional branch instruction within the 510-byte or 1020-byte range or for some other reason, the BRA instruction must be used to jump past the table. When this instruction is placed immediately after a delayed branch instruction, the PC becomes the first address of the branch destination + 2.

Operation:
    MOVI(long i,long n) /* MOV #imm,Rn */
    {
        if ((i&0x80)==0) R[n]=(0x000000FF & (long)i);
        else R[n]=(0xFFFFFF00 | (long)i);
        PC+=2;
    }

    MOVWI(long d,long n) /* MOV.W @(disp,PC),Rn */
    {
        long disp;
        disp=(0x000000FF & (long)d);
        R[n]=(long)Read_Word(PC+(disp<<1));
        if ((R[n]&0x8000)==0) R[n]&=0x0000FFFF;
        else R[n]|=0xFFFF0000;
        PC+=2;
    }

    MOVLI(long d,long n) /* MOV.L @(disp,PC),Rn */
    {
        long disp;
        disp=(0x000000FF & (long)d);
        R[n]=Read_Long((PC&0xFFFFFFFC)+(disp<<2));
        PC+=2;
    }

Example:
    Address
    1000          MOV     #H'80,R1      ;R1 = H'FFFFFF80
    1002          MOV.W   IMM,R2        ;R2 = H'FFFF9ABC, IMM means @(H'08,PC)
    1004          ADD     #-1,R0        ;
    1006          TST     R0,R0         ;← PC location used for address calculation for the MOV.W instruction
    1008          MOVT    R13           ;
    100A          BRA     NEXT          ;Delayed branch instruction
    100C          MOV.L   @(4,PC),R3    ;R3 = H'12345678
    100E   IMM    .data.w H'9ABC        ;
    1010          .data.w H'1234        ;
    1012   NEXT   JMP     @R3           ;Branch destination of the BRA instruction
    1014          CMP/EQ  #0,R0         ;← PC location used for address calculation for the MOV.L instruction
                  .align  4             ;
    1018          .data.l H'12345678    ;

7.2.32 MOV (Move Peripheral Data): Data Transfer Instruction

    Format                  Abstract                                   Code               Cycle   T Bit
    MOV.B @(disp,GBR),R0    (disp + GBR) → sign extension → R0         11000100dddddddd   1       —
    MOV.W @(disp,GBR),R0    (disp × 2 + GBR) → sign extension → R0     11000101dddddddd   1       —
    MOV.L @(disp,GBR),R0    (disp × 4 + GBR) → R0                      11000110dddddddd   1       —
    MOV.B R0,@(disp,GBR)    R0 → (disp + GBR)                          11000000dddddddd   1       —
    MOV.W R0,@(disp,GBR)    R0 → (disp × 2 + GBR)                      11000001dddddddd   1       —
    MOV.L R0,@(disp,GBR)    R0 → (disp × 4 + GBR)                      11000010dddddddd   1       —

Description: Transfers the source operand to the destination.
This instruction is optimum for accessing data in the peripheral module area. The data can be a byte, word, or longword, but only the R0 register can be used. A peripheral module base address is set to the GBR. When the peripheral module data is a byte, the only change made is to zero-extend the 8-bit displacement. Consequently, an address within +255 bytes can be specified. When the peripheral module data is a word, the 8-bit displacement is zero-extended and doubled. Consequently, an address within +510 bytes can be specified. When the peripheral module data is a longword, the 8-bit displacement is zero-extended and is quadrupled. Consequently, an address within +1020 bytes can be specified. If the displacement is too short to reach the memory operand, the above @(R0,Rn) mode must be used after the GBR data is transferred to a general register. When the source operand is in memory, the loaded data is stored in the register after it is sign-extended to a longword. Note: The destination register of a data load is always R0. R0 cannot be accessed by the next instruction until the load instruction is finished. The instruction order shown in figure 7.1 will give better results. 
    MOV.B @(12,GBR),R0         MOV.B @(12,GBR),R0
    AND   #80,R0          →    ADD   #20,R1
    ADD   #20,R1               AND   #80,R0

    Figure 7.1 Using R0 after MOV

Operation:
    MOVBLG(long d) /* MOV.B @(disp,GBR),R0 */
    {
        long disp;
        disp=(0x000000FF & (long)d);
        R[0]=(long)Read_Byte(GBR+disp);
        if ((R[0]&0x80)==0) R[0]&=0x000000FF;
        else R[0]|=0xFFFFFF00;
        PC+=2;
    }

    MOVWLG(long d) /* MOV.W @(disp,GBR),R0 */
    {
        long disp;
        disp=(0x000000FF & (long)d);
        R[0]=(long)Read_Word(GBR+(disp<<1));
        if ((R[0]&0x8000)==0) R[0]&=0x0000FFFF;
        else R[0]|=0xFFFF0000;
        PC+=2;
    }

    MOVLLG(long d) /* MOV.L @(disp,GBR),R0 */
    {
        long disp;
        disp=(0x000000FF & (long)d);
        R[0]=Read_Long(GBR+(disp<<2));
        PC+=2;
    }

    MOVBSG(long d) /* MOV.B R0,@(disp,GBR) */
    {
        long disp;
        disp=(0x000000FF & (long)d);
        Write_Byte(GBR+disp,R[0]);
        PC+=2;
    }

    MOVWSG(long d) /* MOV.W R0,@(disp,GBR) */
    {
        long disp;
        disp=(0x000000FF & (long)d);
        Write_Word(GBR+(disp<<1),R[0]);
        PC+=2;
    }

    MOVLSG(long d) /* MOV.L R0,@(disp,GBR) */
    {
        long disp;
        disp=(0x000000FF & (long)d);
        Write_Long(GBR+(disp<<2),R[0]);
        PC+=2;
    }

Examples:
    MOV.L @(2,GBR),R0    ;Before execution: @(GBR + 8) = H'12345670
                         ;After execution:  R0 = H'12345670
    MOV.B R0,@(1,GBR)    ;Before execution: R0 = H'FFFF7F80
                         ;After execution:  @(GBR + 1) = H'80

7.2.33 MOV (Move Structure Data): Data Transfer Instruction

    Format                 Abstract                                  Code               Cycle   T Bit
    MOV.B R0,@(disp,Rn)    R0 → (disp + Rn)                          10000000nnnndddd   1       —
    MOV.W R0,@(disp,Rn)    R0 → (disp × 2 + Rn)                      10000001nnnndddd   1       —
    MOV.L Rm,@(disp,Rn)    Rm → (disp × 4 + Rn)                      0001nnnnmmmmdddd   1       —
    MOV.B @(disp,Rm),R0    (disp + Rm) → sign extension → R0         10000100mmmmdddd   1       —
    MOV.W @(disp,Rm),R0    (disp × 2 + Rm) → sign extension → R0     10000101mmmmdddd   1       —
    MOV.L @(disp,Rm),Rn    (disp × 4 + Rm) → Rn                      0101nnnnmmmmdddd   1       —

Description: Transfers the source operand to the destination. This instruction is optimum for accessing data in a structure or a stack. The data can be a byte, word, or longword, but when a byte or word is selected, only the R0 register can be used.
When the data is a byte, the only change made is to zero-extend the 4-bit displacement. Consequently, an address within +15 bytes can be specified. When the data is a word, the 4-bit displacement is zero-extended and doubled. Consequently, an address within +30 bytes can be specified. When the data is a longword, the 4-bit displacement is zero-extended and quadrupled. Consequently, an address within +60 bytes can be specified. If the displacement is too short to reach the memory operand, the aforementioned @(R0,Rn) mode must be used. When the source operand is in memory, the loaded data is stored in the register after it is sign-extended to a longword.

Note: When byte or word data is loaded, the destination register is always R0. R0 cannot be accessed by the next instruction until the load instruction is finished. The instruction order in figure 7.2 will give better results.

    MOV.B @(2,R1),R0         MOV.B @(2,R1),R0
    AND   #80,R0        →    ADD   #20,R1
    ADD   #20,R1             AND   #80,R0

    Figure 7.2 Using R0 after MOV

Operation:
    MOVBS4(long d,long n) /* MOV.B R0,@(disp,Rn) */
    {
        long disp;
        disp=(0x0000000F & (long)d);
        Write_Byte(R[n]+disp,R[0]);
        PC+=2;
    }

    MOVWS4(long d,long n) /* MOV.W R0,@(disp,Rn) */
    {
        long disp;
        disp=(0x0000000F & (long)d);
        Write_Word(R[n]+(disp<<1),R[0]);
        PC+=2;
    }

    MOVLS4(long m,long d,long n) /* MOV.L Rm,@(disp,Rn) */
    {
        long disp;
        disp=(0x0000000F & (long)d);
        Write_Long(R[n]+(disp<<2),R[m]);
        PC+=2;
    }

    MOVBL4(long m,long d) /* MOV.B @(disp,Rm),R0 */
    {
        long disp;
        disp=(0x0000000F & (long)d);
        R[0]=Read_Byte(R[m]+disp);
        if ((R[0]&0x80)==0) R[0]&=0x000000FF;
        else R[0]|=0xFFFFFF00;
        PC+=2;
    }

    MOVWL4(long m,long d) /* MOV.W @(disp,Rm),R0 */
    {
        long disp;
        disp=(0x0000000F & (long)d);
        R[0]=Read_Word(R[m]+(disp<<1));
        if ((R[0]&0x8000)==0) R[0]&=0x0000FFFF;
        else R[0]|=0xFFFF0000;
        PC+=2;
    }

    MOVLL4(long m,long d,long n) /* MOV.L @(disp,Rm),Rn */
    {
        long disp;
        disp=(0x0000000F & (long)d);
        R[n]=Read_Long(R[m]+(disp<<2));
        PC+=2;
    }

Examples:
    MOV.L @(2,R0),R1      ;Before execution: @(R0 + 8) = H'12345670
                          ;After execution:  R1 = H'12345670
    MOV.L R0,@(H'F,R1)    ;Before execution: R0 = H'FFFF7F80
                          ;After execution:  @(R1 + 60) = H'FFFF7F80

7.2.34 MOVA (Move Effective Address): Data Transfer Instruction

    Format               Abstract               Code               Cycle   T Bit
    MOVA @(disp,PC),R0   disp × 4 + PC → R0     11000111dddddddd   1       —

Description: Stores the effective address of the source operand into general register R0. The 8-bit displacement is zero-extended and quadrupled. Consequently, the relative interval from the operand can be up to PC + 1020 bytes. The PC is the address four bytes after this instruction, but the lowest two bits of the PC are corrected to B'00.

Note: If this instruction is placed immediately after a delayed branch instruction, the PC must point to an address specified by (the starting address of the branch destination) + 2.

Operation:
    MOVA(long d) /* MOVA @(disp,PC),R0 */
    {
        long disp;
        disp=(0x000000FF & (long)d);
        R[0]=(PC&0xFFFFFFFC)+(disp<<2);
        PC+=2;
    }

Example:
    Address
                  .org    H'1006
    1006          MOVA    STR,R0        ;Address of STR → R0
    1008          MOV.B   @R0,R1        ;R1 = "X" ← PC location after correcting the lowest two bits
    100A          ADD     R4,R5         ;← Original PC location for address calculation for the MOVA instruction
                  .align  4
    100C   STR:   .sdata  "XYZP12"
                  ...............
    2002          BRA     TRGET         ;Delayed branch instruction
    2004          MOVA    @(0,PC),R0    ;Address of TRGET + 2 → R0
    2006          NOP                   ;

7.2.35 MOVT (Move T Bit): Data Transfer Instruction

    Format     Abstract    Code               Cycle   T Bit
    MOVT Rn    T → Rn      0000nnnn00101001   1       —

Description: Stores the T bit value into general register Rn. When T = 1, 1 is stored in Rn, and when T = 0, 0 is stored in Rn.
Operation:
    MOVT(long n) /* MOVT Rn */
    {
        R[n]=(0x00000001 & SR);
        PC+=2;
    }

Example:
    XOR    R2,R2    ;R2 = 0
    CMP/PZ R2       ;T = 1
    MOVT   R0       ;R0 = 1
    CLRT            ;T = 0
    MOVT   R1       ;R1 = 0

7.2.36 MUL.L (Multiply Long): Arithmetic Instruction

    Format          Abstract            Code               Cycle    T Bit
    MUL.L Rm,Rn     Rn × Rm → MACL      0000nnnnmmmm0111   2 to 4   —

Description: Performs 32-bit multiplication of the contents of general registers Rn and Rm, and stores the lower 32 bits of the result in the MACL register. The MACH register data does not change.

Operation:
    MULL(long m,long n) /* MUL.L Rm,Rn */
    {
        MACL=R[n]*R[m];
        PC+=2;
    }

Example:
    MUL.L R0,R1      ;Before execution: R0 = H'FFFFFFFE, R1 = H'00005555
                     ;After execution:  MACL = H'FFFF5556
    STS   MACL,R0    ;Operation result

7.2.37 MULS.W (Multiply as Signed Word): Arithmetic Instruction

    Format           Abstract                             Code               Cycle    T Bit
    MULS.W Rm,Rn     Signed operation, Rn × Rm → MACL     0010nnnnmmmm1111   1 to 3   —
    MULS   Rm,Rn

Description: Performs 16-bit multiplication of the contents of general registers Rn and Rm, and stores the 32-bit result in the MACL register. The operation is signed and the MACH register data does not change.

Operation:
    MULS(long m,long n) /* MULS Rm,Rn */
    {
        MACL=((long)(short)R[n]*(long)(short)R[m]);
        PC+=2;
    }

Example:
    MULS  R0,R1      ;Before execution: R0 = H'FFFFFFFE, R1 = H'00005555
                     ;After execution:  MACL = H'FFFF5556
    STS   MACL,R0    ;Operation result

7.2.38 MULU.W (Multiply as Unsigned Word): Arithmetic Instruction

    Format           Abstract                     Code               Cycle    T Bit
    MULU.W Rm,Rn     Unsigned, Rn × Rm → MACL     0010nnnnmmmm1110   1 to 3   —
    MULU   Rm,Rn

Description: Performs 16-bit multiplication of the contents of general registers Rn and Rm, and stores the 32-bit result in the MACL register. The operation is unsigned and the MACH register data does not change.
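The difference between the signed (MULS.W) and unsigned (MULU.W) word multiplies lies only in how the lower 16 bits of each register are interpreted before multiplying. The following C sketch illustrates this; the helper names `low_word_signed`, `muls_w`, and `mulu_w` are hypothetical, and each function simply returns the value that would be placed in MACL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: interpret the lower 16 bits of a register as a
 * signed value in the range -32768 to 32767, without relying on
 * implementation-defined narrowing casts. */
static int32_t low_word_signed(uint32_t reg)
{
    int32_t v = (int32_t)(reg & 0xFFFFu);
    return (v & 0x8000) ? v - 0x10000 : v;
}

/* Value stored in MACL by a signed 16 x 16 multiply (MULS.W). */
static uint32_t muls_w(uint32_t rn, uint32_t rm)
{
    return (uint32_t)(low_word_signed(rn) * low_word_signed(rm));
}

/* Value stored in MACL by an unsigned 16 x 16 multiply (MULU.W). */
static uint32_t mulu_w(uint32_t rn, uint32_t rm)
{
    return (rn & 0xFFFFu) * (rm & 0xFFFFu);
}
```

With the register values used in the examples of this section, `muls_w(0x00005555, 0xFFFFFFFE)` gives H'FFFF5556 and `mulu_w(0xFFFFAAAA, 0x00000002)` gives H'00015554. The upper 16 bits of each source register are ignored in both cases.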
Operation:
    MULU(long m,long n) /* MULU Rm,Rn */
    {
        MACL=((unsigned long)(unsigned short)R[n]
             *(unsigned long)(unsigned short)R[m]);
        PC+=2;
    }

Example:
    MULU  R0,R1      ;Before execution: R0 = H'00000002, R1 = H'FFFFAAAA
                     ;After execution:  MACL = H'00015554
    STS   MACL,R0    ;Operation result

7.2.39 NEG (Negate): Arithmetic Instruction

    Format        Abstract        Code               Cycle   T Bit
    NEG Rm,Rn     0 - Rm → Rn     0110nnnnmmmm1011   1       —

Description: Takes the two's complement of the data in general register Rm, and stores the result in Rn. This effectively subtracts the Rm data from 0, and stores the result in Rn.

Operation:
    NEG(long m,long n) /* NEG Rm,Rn */
    {
        R[n]=0-R[m];
        PC+=2;
    }

Example:
    NEG R0,R1    ;Before execution: R0 = H'00000001
                 ;After execution:  R1 = H'FFFFFFFF

7.2.40 NEGC (Negate with Carry): Arithmetic Instruction

    Format         Abstract                        Code               Cycle   T Bit
    NEGC Rm,Rn     0 - Rm - T → Rn, Borrow → T     0110nnnnmmmm1010   1       Borrow

Description: Subtracts the general register Rm data and the T bit from 0, and stores the result in Rn. The T bit is set to 1 if a borrow is generated, and cleared to 0 otherwise. This instruction is used for inverting the sign of a value that has more than 32 bits.

Operation:
    NEGC(long m,long n) /* NEGC Rm,Rn */
    {
        unsigned long temp;
        temp=0-R[m];
        R[n]=temp-T;
        if (0<temp) T=1;
        else T=0;
        if (temp<R[n]) T=1;
        PC+=2;
    }

Examples:
    CLRT             ;Sign inversion of R1 and R0 (64 bits)
    NEGC R1,R1       ;Before execution: R1 = H'00000001, T = 0
                     ;After execution:  R1 = H'FFFFFFFF, T = 1
    NEGC R0,R0       ;Before execution: R0 = H'00000000, T = 1
                     ;After execution:  R0 = H'FFFFFFFF, T = 1

7.2.41 NOP (No Operation): System Control Instruction

    Format    Abstract        Code               Cycle   T Bit
    NOP       No operation    0000000000001001   1       —

Description: Increments the PC to execute the next instruction.
Operation:
    NOP() /* NOP */
    {
        PC+=2;
    }

Example:
    NOP    ;Executes in one cycle

7.2.42 NOT (NOT—Logical Complement): Logic Operation Instruction

    Format        Abstract      Code               Cycle   T Bit
    NOT Rm,Rn     ~Rm → Rn      0110nnnnmmmm0111   1       —

Description: Takes the one's complement of the general register Rm data, and stores the result in Rn. This effectively inverts each bit of the Rm data and stores the result in Rn.

Operation:
    NOT(long m,long n) /* NOT Rm,Rn */
    {
        R[n]=~R[m];
        PC+=2;
    }

Example:
    NOT R0,R1    ;Before execution: R0 = H'AAAAAAAA
                 ;After execution:  R1 = H'55555555

7.2.43 OR (OR Logical): Logic Operation Instruction

    Format                  Abstract                           Code               Cycle   T Bit
    OR   Rm,Rn              Rn | Rm → Rn                       0010nnnnmmmm1011   1       —
    OR   #imm,R0            R0 | imm → R0                      11001011iiiiiiii   1       —
    OR.B #imm,@(R0,GBR)     (R0 + GBR) | imm → (R0 + GBR)      11001111iiiiiiii   3       —

Description: Logically ORs the contents of general registers Rn and Rm, and stores the result in Rn. The contents of general register R0 can also be ORed with zero-extended 8-bit immediate data, or 8-bit memory data accessed by using indirect indexed GBR addressing can be ORed with 8-bit immediate data.

Operation:
    OR(long m,long n) /* OR Rm,Rn */
    {
        R[n]|=R[m];
        PC+=2;
    }

    ORI(long i) /* OR #imm,R0 */
    {
        R[0]|=(0x000000FF & (long)i);
        PC+=2;
    }

    ORM(long i) /* OR.B #imm,@(R0,GBR) */
    {
        long temp;
        temp=(long)Read_Byte(GBR+R[0]);
        temp|=(0x000000FF & (long)i);
        Write_Byte(GBR+R[0],temp);
        PC+=2;
    }

Examples:
    OR   R0,R1                ;Before execution: R0 = H'AAAA5555, R1 = H'55550000
                              ;After execution:  R1 = H'FFFF5555
    OR   #H'F0,R0             ;Before execution: R0 = H'00000008
                              ;After execution:  R0 = H'000000F8
    OR.B #H'50,@(R0,GBR)      ;Before execution: @(R0,GBR) = H'A5
                              ;After execution:  @(R0,GBR) = H'F5

7.2.44 ROTCL (Rotate with Carry Left): Shift Instruction

    Format      Abstract       Code               Cycle   T Bit
    ROTCL Rn    T ← Rn ← T     0100nnnn00100100   1       MSB

Description: Rotates the contents of general register Rn and the T bit to the left by one bit, and stores the result in Rn.
The bit that is shifted out of the operand is transferred to the T bit (figure 7.3).

    ROTCL:  T ← [MSB ... LSB] ← T

    Figure 7.3 Rotate with Carry Left

Operation:
    ROTCL(long n) /* ROTCL Rn */
    {
        long temp;
        if ((R[n]&0x80000000)==0) temp=0;
        else temp=1;
        R[n]<<=1;
        if (T==1) R[n]|=0x00000001;
        else R[n]&=0xFFFFFFFE;
        if (temp==1) T=1;
        else T=0;
        PC+=2;
    }

Example:
    ROTCL R0    ;Before execution: R0 = H'80000000, T = 0
                ;After execution:  R0 = H'00000000, T = 1

7.2.45 ROTCR (Rotate with Carry Right): Shift Instruction

    Format      Abstract       Code               Cycle   T Bit
    ROTCR Rn    T → Rn → T     0100nnnn00100101   1       LSB

Description: Rotates the contents of general register Rn and the T bit to the right by one bit, and stores the result in Rn. The bit that is shifted out of the operand is transferred to the T bit (figure 7.4).

    ROTCR:  T → [MSB ... LSB] → T

    Figure 7.4 Rotate with Carry Right

Operation:
    ROTCR(long n) /* ROTCR Rn */
    {
        long temp;
        if ((R[n]&0x00000001)==0) temp=0;
        else temp=1;
        R[n]>>=1;
        if (T==1) R[n]|=0x80000000;
        else R[n]&=0x7FFFFFFF;
        if (temp==1) T=1;
        else T=0;
        PC+=2;
    }

Examples:
    ROTCR R0    ;Before execution: R0 = H'00000001, T = 1
                ;After execution:  R0 = H'80000000, T = 1

7.2.46 ROTL (Rotate Left): Shift Instruction

    Format     Abstract         Code               Cycle   T Bit
    ROTL Rn    T ← Rn ← MSB     0100nnnn00000100   1       MSB

Description: Rotates the contents of general register Rn to the left by one bit, and stores the result in Rn (figure 7.5). The bit that is shifted out of the operand is transferred to the T bit.

    ROTL:  T ← [MSB ... LSB] ← MSB

    Figure 7.5 Rotate Left

Operation:
    ROTL(long n) /* ROTL Rn */
    {
        if ((R[n]&0x80000000)==0) T=0;
        else T=1;
        R[n]<<=1;
        if (T==1) R[n]|=0x00000001;
        else R[n]&=0xFFFFFFFE;
        PC+=2;
    }

Examples:
    ROTL R0    ;Before execution: R0 = H'80000000, T = 0
               ;After execution:  R0 = H'00000001, T = 1

7.2.47 ROTR (Rotate Right): Shift Instruction

    Format     Abstract         Code               Cycle   T Bit
    ROTR Rn    LSB → Rn → T     0100nnnn00000101   1       LSB

Description: Rotates the contents of general register Rn to the right by one bit, and stores the result in Rn (figure 7.6).
The bit that is shifted out of the operand is transferred to the T bit.

    ROTR:  LSB → [MSB ... LSB] → T

    Figure 7.6 Rotate Right

Operation:
    ROTR(long n) /* ROTR Rn */
    {
        if ((R[n]&0x00000001)==0) T=0;
        else T=1;
        R[n]>>=1;
        if (T==1) R[n]|=0x80000000;
        else R[n]&=0x7FFFFFFF;
        PC+=2;
    }

Examples:
    ROTR R0    ;Before execution: R0 = H'00000001, T = 0
               ;After execution:  R0 = H'80000000, T = 1

7.2.48 RTE (Return from Exception): System Control Instruction (Class: Delayed Branch Instruction)

    Format    Abstract                              Code               Cycle   T Bit
    RTE       Delayed branch, Stack area → PC/SR    0000000000101011   4       LSB

Description: Returns from an interrupt routine. The PC and SR values are restored from the stack, and the program continues from the address specified by the restored PC value. The T bit takes the value of the LSB of the SR value restored from the stack area.

Note: Since this is a delayed branch instruction, the instruction after this RTE is executed before branching. Neither address errors nor interrupts are accepted between this instruction and the next instruction. If the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction.

Operation:
    RTE() /* RTE */
    {
        unsigned long temp;
        temp=PC;
        PC=Read_Long(R[15])+4;
        R[15]+=4;
        SR=Read_Long(R[15])&0x0FFF0FFF;
        R[15]+=4;
        Delay_Slot(temp+2);
    }

Example:
    RTE            ;Returns to the original routine
    ADD #8,R14     ;Executes ADD before branching

Note: When a delayed branch instruction is used, the branch takes place after the slot instruction is executed, but instruction execution (register updates, etc.) follows the sequence delayed branch instruction → delay slot instruction. For example, even if the delay slot instruction changes the register holding the branch destination address, the register contents before the change are used as the branch destination address.
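The ordering rule in the note above, where the branch destination is read before the delay slot instruction runs but the branch itself happens afterwards, can be modelled with a small C sketch. Everything here is an illustrative assumption: the `Cpu` structure, the `SlotInstruction` callback type, and the `jmp_delayed` and `slot_overwrite_r0` functions are hypothetical names, and the model ignores the PC offsets used in the manual's operation listings.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, greatly simplified CPU state for this illustration. */
typedef struct {
    uint32_t r[16];   /* general registers R0 to R15 */
    uint32_t pc;      /* program counter */
} Cpu;

/* A delay slot instruction is modelled as a callback that may change registers. */
typedef void (*SlotInstruction)(Cpu *cpu);

/* Model of a register-indirect delayed branch (such as JMP @Rm). */
static void jmp_delayed(Cpu *cpu, int m, SlotInstruction slot)
{
    assert(m >= 0 && m < 16);
    uint32_t target = cpu->r[m];  /* destination is read first ...          */
    slot(cpu);                    /* ... then the delay slot executes ...   */
    cpu->pc = target;             /* ... and only then the branch is taken. */
}

/* Example slot instruction that overwrites the register holding the target. */
static void slot_overwrite_r0(Cpu *cpu)
{
    cpu->r[0] = 0x00003000u;
}
```

If R0 holds H'00002000 before `jmp_delayed(&cpu, 0, slot_overwrite_r0)` is called, the model ends with the PC at H'00002000 and R0 at H'00003000: the slot's register update is visible afterwards, but it does not redirect the branch.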
7.2.49 RTS (Return from Subroutine): Branch Instruction (Class: Delayed Branch Instruction)

    Format    Abstract                   Code               Cycle   T Bit
    RTS       Delayed branch, PR → PC    0000000000001011   2       —

Description: Returns from a subroutine procedure. The PC value is restored from the PR, and the program continues from the address specified by the restored PC value. This instruction is used to return to the program from a subroutine called by a BSR, BSRF, or JSR instruction.

Note: Since this is a delayed branch instruction, the instruction after this RTS is executed before branching. Neither address errors nor interrupts are accepted between this instruction and the next instruction. If the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction.

Operation:
    RTS() /* RTS */
    {
        unsigned long temp;
        temp=PC;
        PC=PR+4;
        Delay_Slot(temp+2);
    }

Example:
            MOV.L   TABLE,R3    ;R3 = Address of TRGET
            JSR     @R3         ;Branches to TRGET
            NOP                 ;Executes NOP before branching
            ADD     R0,R1       ;← Return address for when the subroutine procedure is completed (PR data)
            .............
    TABLE:  .data.l TRGET       ;Jump table
            .............
    TRGET:  MOV     R1,R0       ;← Procedure entrance
            RTS                 ;PR data → PC
            MOV     #12,R0      ;Executes MOV before branching

Note: When a delayed branch instruction is used, the branch takes place after the slot instruction is executed, but instruction execution (register updates, etc.) follows the sequence delayed branch instruction → delay slot instruction. For example, even if the delay slot instruction changes the register holding the branch destination address, the register contents before the change are used as the branch destination address.

7.2.50 SETT (Set T Bit): System Control Instruction

    Format    Abstract    Code               Cycle   T Bit
    SETT      1 → T       0000000000011000   1       1

Description: Sets the T bit to 1.
Operation:
    SETT() /* SETT */
    {
        T=1;
        PC+=2;
    }

Example:
    SETT    ;Before execution: T = 0
            ;After execution:  T = 1

7.2.51 SHAL (Shift Arithmetic Left): Shift Instruction

    Format     Abstract       Code               Cycle   T Bit
    SHAL Rn    T ← Rn ← 0     0100nnnn00100000   1       MSB

Description: Arithmetically shifts the contents of general register Rn to the left by one bit, and stores the result in Rn. The bit that is shifted out of the operand is transferred to the T bit (figure 7.7).

    SHAL:  T ← [MSB ... LSB] ← 0

    Figure 7.7 Shift Arithmetic Left

Operation:
    SHAL(long n) /* SHAL Rn (Same as SHLL) */
    {
        if ((R[n]&0x80000000)==0) T=0;
        else T=1;
        R[n]<<=1;
        PC+=2;
    }

Example:
    SHAL R0    ;Before execution: R0 = H'80000001, T = 0
               ;After execution:  R0 = H'00000002, T = 1

7.2.52 SHAR (Shift Arithmetic Right): Shift Instruction

    Format     Abstract         Code               Cycle   T Bit
    SHAR Rn    MSB → Rn → T     0100nnnn00100001   1       LSB

Description: Arithmetically shifts the contents of general register Rn to the right by one bit, and stores the result in Rn. The bit that is shifted out of the operand is transferred to the T bit (figure 7.8).

    SHAR:  MSB → [MSB ... LSB] → T

    Figure 7.8 Shift Arithmetic Right

Operation:
    SHAR(long n) /* SHAR Rn */
    {
        long temp;
        if ((R[n]&0x00000001)==0) T=0;
        else T=1;
        if ((R[n]&0x80000000)==0) temp=0;
        else temp=1;
        R[n]>>=1;
        if (temp==1) R[n]|=0x80000000;
        else R[n]&=0x7FFFFFFF;
        PC+=2;
    }

Example:
    SHAR R0    ;Before execution: R0 = H'80000001, T = 0
               ;After execution:  R0 = H'C0000000, T = 1

7.2.53 SHLL (Shift Logical Left): Shift Instruction

    Format     Abstract       Code               Cycle   T Bit
    SHLL Rn    T ← Rn ← 0     0100nnnn00000000   1       MSB

Description: Logically shifts the contents of general register Rn to the left by one bit, and stores the result in Rn. The bit that is shifted out of the operand is transferred to the T bit (figure 7.9).
Figure 7.9 Shift Logical Left

Operation:
SHLL(long n)    /* SHLL Rn (Same as SHAL) */
{
    if ((R[n]&0x80000000)==0) T=0;
    else T=1;
    R[n]<<=1;
    PC+=2;
}

Example:
SHLL R0    ;Before execution: R0 = H'80000001, T = 0
           ;After execution:  R0 = H'00000002, T = 1

7.2.54 SHLLn (Shift Logical Left n Bits): Shift Instruction

Format      Abstract         Code               Cycle   T Bit
SHLL2 Rn    Rn << 2 → Rn     0100nnnn00001000   1       —
SHLL8 Rn    Rn << 8 → Rn     0100nnnn00011000   1       —
SHLL16 Rn   Rn << 16 → Rn    0100nnnn00101000   1       —

Description: Logically shifts the contents of general register Rn to the left by 2, 8, or 16 bits, and stores the result in Rn. Bits that are shifted out of the operand are not stored (figure 7.10).

Figure 7.10 Shift Logical Left n Bits

Operation:
SHLL2(long n)    /* SHLL2 Rn */
{
    R[n]<<=2;
    PC+=2;
}

SHLL8(long n)    /* SHLL8 Rn */
{
    R[n]<<=8;
    PC+=2;
}

SHLL16(long n)    /* SHLL16 Rn */
{
    R[n]<<=16;
    PC+=2;
}

Examples:
SHLL2 R0     ;Before execution: R0 = H'12345678
             ;After execution:  R0 = H'48D159E0
SHLL8 R0     ;Before execution: R0 = H'12345678
             ;After execution:  R0 = H'34567800
SHLL16 R0    ;Before execution: R0 = H'12345678
             ;After execution:  R0 = H'56780000

7.2.55 SHLR (Shift Logical Right): Shift Instruction

Format    Abstract      Code               Cycle   T Bit
SHLR Rn   0 → Rn → T    0100nnnn00000001   1       LSB

Description: Logically shifts the contents of general register Rn to the right by one bit, and stores the result in Rn. The bit that is shifted out of the operand is transferred to the T bit (figure 7.11).
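The one-bit shifts described in this part of the manual differ only in which bit is copied to T and what enters the vacated bit position. The following is a minimal C sketch of that behavior, not the manual's own operation code; the `ShiftResult` struct and the function names `shll1`, `shlr1`, and `shar1` are hypothetical names introduced for illustration.

```c
#include <stdint.h>

/* Hypothetical pair: the shifted register value and the resulting T bit. */
typedef struct {
    uint32_t r;
    int t;
} ShiftResult;

/* SHLL / SHAL: MSB goes to T, 0 enters at the LSB. */
static ShiftResult shll1(uint32_t r)
{
    ShiftResult s = { r << 1, (int)(r >> 31) };
    return s;
}

/* SHLR: LSB goes to T, 0 enters at the MSB. */
static ShiftResult shlr1(uint32_t r)
{
    ShiftResult s = { r >> 1, (int)(r & 1u) };
    return s;
}

/* SHAR: LSB goes to T, the original MSB (sign) is preserved. */
static ShiftResult shar1(uint32_t r)
{
    ShiftResult s = { (r >> 1) | (r & 0x80000000u), (int)(r & 1u) };
    return s;
}
```

With the input H'80000001 used in the examples, `shll1` yields H'00000002, `shlr1` yields H'40000000, and `shar1` yields H'C0000000, each with T = 1.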
Figure 7.11 Shift Logical Right

Operation:
SHLR(long n)    /* SHLR Rn */
{
    if ((R[n]&0x00000001)==0) T=0;
    else T=1;
    R[n]>>=1;
    R[n]&=0x7FFFFFFF;
    PC+=2;
}

Example:
SHLR R0    ;Before execution: R0 = H'80000001, T = 0
           ;After execution:  R0 = H'40000000, T = 1

7.2.56 SHLRn (Shift Logical Right n Bits): Shift Instruction

Format      Abstract         Code               Cycle   T Bit
SHLR2 Rn    Rn >> 2 → Rn     0100nnnn00001001   1       —
SHLR8 Rn    Rn >> 8 → Rn     0100nnnn00011001   1       —
SHLR16 Rn   Rn >> 16 → Rn    0100nnnn00101001   1       —

Description: Logically shifts the contents of general register Rn to the right by 2, 8, or 16 bits, and stores the result in Rn. Bits that are shifted out of the operand are not stored (figure 7.12).

Figure 7.12 Shift Logical Right n Bits

Operation:
SHLR2(long n)    /* SHLR2 Rn */
{
    R[n]>>=2;
    R[n]&=0x3FFFFFFF;
    PC+=2;
}

SHLR8(long n)    /* SHLR8 Rn */
{
    R[n]>>=8;
    R[n]&=0x00FFFFFF;
    PC+=2;
}

SHLR16(long n)    /* SHLR16 Rn */
{
    R[n]>>=16;
    R[n]&=0x0000FFFF;
    PC+=2;
}

Examples:
SHLR2 R0     ;Before execution: R0 = H'12345678
             ;After execution:  R0 = H'048D159E
SHLR8 R0     ;Before execution: R0 = H'12345678
             ;After execution:  R0 = H'00123456
SHLR16 R0    ;Before execution: R0 = H'12345678
             ;After execution:  R0 = H'00001234

7.2.57 SLEEP (Sleep): System Control Instruction

Format   Abstract   Code               Cycle   T Bit
SLEEP    Sleep      0000000000011011   3       —

Description: Sets the CPU into power-down mode. In power-down mode, instruction execution stops, but the CPU internal status is maintained, and the CPU waits for an interrupt request. If an interrupt is requested, the CPU exits the power-down mode and begins exception processing.

Note: The number of cycles given is for the transition to sleep mode.
Operation:
SLEEP()    /* SLEEP */
{
    PC-=2;
    wait_for_exception;
}

Example:
SLEEP    ;Enters power-down mode

7.2.58 STC (Store Control Register): System Control Instruction (Interrupt Disabled Instruction)

Format            Abstract                     Code               Cycle   T Bit
STC SR,Rn         SR → Rn                      0000nnnn00000010   1       —
STC GBR,Rn        GBR → Rn                     0000nnnn00010010   1       —
STC VBR,Rn        VBR → Rn                     0000nnnn00100010   1       —
STC.L SR,@-Rn     Rn – 4 → Rn, SR → (Rn)       0100nnnn00000011   2       —
STC.L GBR,@-Rn    Rn – 4 → Rn, GBR → (Rn)      0100nnnn00010011   2       —
STC.L VBR,@-Rn    Rn – 4 → Rn, VBR → (Rn)      0100nnnn00100011   2       —

Description: Stores control register SR, GBR, or VBR data into a specified destination.

Note: No interrupts are accepted between this instruction and the next instruction. Address errors are accepted.

Operation:
STCSR(long n)    /* STC SR,Rn */
{
    R[n]=SR;
    PC+=2;
}

STCGBR(long n)    /* STC GBR,Rn */
{
    R[n]=GBR;
    PC+=2;
}

STCVBR(long n)    /* STC VBR,Rn */
{
    R[n]=VBR;
    PC+=2;
}

STCMSR(long n)    /* STC.L SR,@-Rn */
{
    R[n]-=4;
    Write_Long(R[n],SR);
    PC+=2;
}

STCMGBR(long n)    /* STC.L GBR,@-Rn */
{
    R[n]-=4;
    Write_Long(R[n],GBR);
    PC+=2;
}

STCMVBR(long n)    /* STC.L VBR,@-Rn */
{
    R[n]-=4;
    Write_Long(R[n],VBR);
    PC+=2;
}

Examples:
STC SR,R0          ;Before execution: R0 = H'FFFFFFFF, SR = H'00000000
                   ;After execution:  R0 = H'00000000
STC.L GBR,@-R15    ;Before execution: R15 = H'10000004
                   ;After execution:  R15 = H'10000000, @R15 = GBR

7.2.59 STS (Store System Register): System Control Instruction (Interrupt Disabled Instruction)

Format             Abstract                      Code               Cycle   T Bit
STS MACH,Rn        MACH → Rn                     0000nnnn00001010   1       —
STS MACL,Rn        MACL → Rn                     0000nnnn00011010   1       —
STS PR,Rn          PR → Rn                       0000nnnn00101010   1       —
STS.L MACH,@-Rn    Rn – 4 → Rn, MACH → (Rn)      0100nnnn00000010   1       —
STS.L MACL,@-Rn    Rn – 4 → Rn, MACL → (Rn)      0100nnnn00010010   1       —
STS.L PR,@-Rn      Rn – 4 → Rn, PR → (Rn)        0100nnnn00100010   1       —

Description: Stores data from system register MACH, MACL, or PR into a specified destination.

Note: No interrupts are accepted between this instruction and the next instruction. Address errors are accepted.
Operation:
STSMACH(long n)    /* STS MACH,Rn */
{
    R[n]=MACH;
    PC+=2;
}

STSMACL(long n)    /* STS MACL,Rn */
{
    R[n]=MACL;
    PC+=2;
}

STSPR(long n)    /* STS PR,Rn */
{
    R[n]=PR;
    PC+=2;
}

STSMMACH(long n)    /* STS.L MACH,@-Rn */
{
    R[n]-=4;
    Write_Long(R[n],MACH);
    PC+=2;
}

STSMMACL(long n)    /* STS.L MACL,@-Rn */
{
    R[n]-=4;
    Write_Long(R[n],MACL);
    PC+=2;
}

STSMPR(long n)    /* STS.L PR,@-Rn */
{
    R[n]-=4;
    Write_Long(R[n],PR);
    PC+=2;
}

Examples:
STS MACH,R0       ;Before execution: R0 = H'FFFFFFFF, MACH = H'00000000
                  ;After execution:  R0 = H'00000000
STS.L PR,@-R15    ;Before execution: R15 = H'10000004
                  ;After execution:  R15 = H'10000000, @R15 = PR

7.2.60 SUB (Subtract Binary): Arithmetic Instruction

Format       Abstract          Code               Cycle   T Bit
SUB Rm,Rn    Rn – Rm → Rn      0011nnnnmmmm1000   1       —

Description: Subtracts general register Rm data from Rn data, and stores the result in Rn. To subtract immediate data, use ADD #imm,Rn with the negated immediate value.

Operation:
SUB(long m,long n)    /* SUB Rm,Rn */
{
    R[n]-=R[m];
    PC+=2;
}

Example:
SUB R0,R1    ;Before execution: R0 = H'00000001, R1 = H'80000000
             ;After execution:  R1 = H'7FFFFFFF

7.2.61 SUBC (Subtract with Carry): Arithmetic Instruction

Format        Abstract                           Code               Cycle   T Bit
SUBC Rm,Rn    Rn – Rm – T → Rn, Borrow → T       0011nnnnmmmm1010   1       Borrow

Description: Subtracts Rm data and the T bit value from general register Rn data, and stores the result in Rn. The T bit is set to 1 if a borrow occurs and cleared to 0 otherwise. This instruction is used for subtraction of data that has more than 32 bits.
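To illustrate how SUBC chains a borrow through the T bit for data wider than 32 bits, here is a minimal C sketch. It is an illustration under stated assumptions rather than the manual's own code: the helper names `subc` and `sub64` are hypothetical, and the T bit is modeled as an `int` passed by pointer.

```c
#include <stdint.h>

/* Model of SUBC: returns rn - rm - T and updates *t with the borrow. */
static uint32_t subc(uint32_t rn, uint32_t rm, int *t)
{
    uint32_t tmp1 = rn - rm;               /* first subtraction */
    uint32_t res  = tmp1 - (uint32_t)*t;   /* then subtract the incoming borrow */
    /* A borrow occurred if either subtraction wrapped around. */
    *t = (rn < tmp1) || (tmp1 < res);
    return res;
}

/* 64-bit subtraction built from two SUBC steps, low word first. */
static uint64_t sub64(uint64_t a, uint64_t b)
{
    int t = 0;  /* corresponds to CLRT before the first SUBC */
    uint32_t lo = subc((uint32_t)a, (uint32_t)b, &t);
    uint32_t hi = subc((uint32_t)(a >> 32), (uint32_t)(b >> 32), &t);
    return ((uint64_t)hi << 32) | lo;
}
```

Subtracting 1 from a 64-bit zero produces H'FFFFFFFF in both words, with the borrow from the low word carried into the high word through T.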
Operation:
SUBC(long m,long n)    /* SUBC Rm,Rn */
{
    unsigned long tmp0,tmp1;
    tmp1=R[n]-R[m];
    tmp0=R[n];
    R[n]=tmp1-T;
    if (tmp0<tmp1) T=1;
    else T=0;
    if (tmp1<R[n]) T=1;
    PC+=2;
}

Examples:
CLRT          ;R0:R1 (64 bits) – R2:R3 (64 bits) = R0:R1 (64 bits)
SUBC R3,R1    ;Before execution: T = 0, R1 = H'00000000, R3 = H'00000001
              ;After execution:  T = 1, R1 = H'FFFFFFFF
SUBC R2,R0    ;Before execution: T = 1, R0 = H'00000000, R2 = H'00000000
              ;After execution:  T = 1, R0 = H'FFFFFFFF

7.2.62 SUBV (Subtract with V Flag Underflow Check): Arithmetic Instruction

Format        Abstract                          Code               Cycle   T Bit
SUBV Rm,Rn    Rn – Rm → Rn, underflow → T       0011nnnnmmmm1011   1       Underflow

Description: Subtracts Rm data from general register Rn data, and stores the result in Rn. If an underflow occurs, the T bit is set to 1; otherwise it is cleared to 0.

Operation:
SUBV(long m,long n)    /* SUBV Rm,Rn */
{
    long dest,src,ans;
    if ((long)R[n]>=0) dest=0;
    else dest=1;
    if ((long)R[m]>=0) src=0;
    else src=1;
    src+=dest;
    R[n]-=R[m];
    if ((long)R[n]>=0) ans=0;
    else ans=1;
    ans+=dest;
    if (src==1) {
        if (ans==1) T=1;
        else T=0;
    }
    else T=0;
    PC+=2;
}

Examples:
SUBV R0,R1    ;Before execution: R0 = H'00000002, R1 = H'80000001
              ;After execution:  R1 = H'7FFFFFFF, T = 1
SUBV R2,R3    ;Before execution: R2 = H'FFFFFFFE, R3 = H'7FFFFFFE
              ;After execution:  R3 = H'80000000, T = 1

7.2.63 SWAP (Swap Register Halves): Data Transfer Instruction

Format          Abstract                                                     Code               Cycle   T Bit
SWAP.B Rm,Rn    Rm → Swap upper and lower halves of lower 2 bytes → Rn       0110nnnnmmmm1000   1       —
SWAP.W Rm,Rn    Rm → Swap upper and lower word → Rn                          0110nnnnmmmm1001   1       —

Description: Swaps the upper and lower halves of the general register Rm data, and stores the result in Rn. If a byte is specified, bits 0 to 7 of Rm are swapped for bits 8 to 15. The upper 16 bits of Rm are transferred to the upper 16 bits of Rn. If a word is specified, bits 0 to 15 of Rm are swapped for bits 16 to 31.
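The two SWAP variants can be expressed compactly with masks and shifts. The sketch below is illustrative only; the function names `swap_b` and `swap_w` are hypothetical and operate on a plain 32-bit value rather than on the register file.

```c
#include <stdint.h>

/* SWAP.B: exchange bits 0-7 with bits 8-15; the upper 16 bits pass through. */
static uint32_t swap_b(uint32_t rm)
{
    return (rm & 0xFFFF0000u) | ((rm & 0x000000FFu) << 8) | ((rm >> 8) & 0x000000FFu);
}

/* SWAP.W: exchange the lower 16 bits with the upper 16 bits. */
static uint32_t swap_w(uint32_t rm)
{
    return (rm << 16) | (rm >> 16);
}
```

For an input of H'12345678, `swap_b` gives H'12347856 and `swap_w` gives H'56781234.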
Operation:
SWAPB(long m,long n)    /* SWAP.B Rm,Rn */
{
    unsigned long temp0,temp1;
    temp0=R[m]&0xffff0000;
    temp1=(R[m]&0x000000ff)<<8;
    R[n]=(R[m]>>8)&0x000000ff;
    R[n]=R[n]|temp1|temp0;
    PC+=2;
}

SWAPW(long m,long n)    /* SWAP.W Rm,Rn */
{
    unsigned long temp;
    temp=(R[m]>>16)&0x0000FFFF;
    R[n]=R[m]<<16;
    R[n]|=temp;
    PC+=2;
}

Examples:
SWAP.B R0,R1    ;Before execution: R0 = H'12345678
                ;After execution:  R1 = H'12347856
SWAP.W R0,R1    ;Before execution: R0 = H'12345678
                ;After execution:  R1 = H'56781234

7.2.64 TAS (Test and Set): Logic Operation Instruction

Format       Abstract                                     Code               Cycle   T Bit
TAS.B @Rn    When (Rn) is 0, 1 → T, 1 → MSB of (Rn)       0100nnnn00011011   4       Test results

Description: Reads byte data from the address specified by general register Rn, and sets the T bit to 1 if the data is 0, or clears the T bit to 0 if the data is not 0. Then, data bit 7 is set to 1, and the data is written to the address specified by Rn. During this operation, the bus is not released.

Operation:
TAS(long n)    /* TAS.B @Rn */
{
    long temp;
    temp=(long)Read_Byte(R[n]);    /* Bus Lock enable */
    if (temp==0) T=1;
    else T=0;
    temp|=0x00000080;
    Write_Byte(R[n],temp);         /* Bus Lock disable */
    PC+=2;
}

Example:
_LOOP  TAS.B @R7    ;R7 = 1000
       BF    _LOOP  ;Loops until data in address 1000 is 0

7.2.65 TRAPA (Trap Always): System Control Instruction

Format        Abstract                                        Code               Cycle   T Bit
TRAPA #imm    PC/SR → Stack area, (imm × 4 + VBR) → PC        11000011iiiiiiii   8       —

Description: Starts the trap exception processing. The PC and SR values are stored on the stack, and the program branches to an address specified by the vector. The vector is located at the memory address obtained by zero-extending the 8-bit immediate data, quadrupling it, and adding the result to VBR. The stacked PC is the start address of the next instruction. TRAPA and RTE are used together for system calls.
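The vector lookup in the TRAPA abstract, (imm × 4 + VBR), amounts to a small address calculation. The following C sketch shows only that calculation; `trapa_vector_address` is a hypothetical helper name, and the stacking of SR and PC is deliberately left out.

```c
#include <stdint.h>

/* Address of the vector table entry read by TRAPA #imm.
 * imm is zero-extended from 8 bits, multiplied by 4, and added to VBR. */
static uint32_t trapa_vector_address(uint32_t vbr, uint32_t imm)
{
    return vbr + ((imm & 0x000000FFu) << 2);
}
```

For example, TRAPA #H'20 reads its branch destination from address VBR + H'80.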
Operation:
TRAPA(long i)    /* TRAPA #imm */
{
    long imm;
    imm=(0x000000FF & i);
    R[15]-=4;
    Write_Long(R[15],SR);
    R[15]-=4;
    Write_Long(R[15],PC-2);
    PC=Read_Long(VBR+(imm<<2))+4;
}

Example:
Address
VBR+H'80    .data.l 10000000    ;
            ..........
            TRAPA   #H'20       ;Branches to an address specified by data in address VBR + H'80
            TST     #0,R0       ;← Return address from the trap routine (stacked PC value)
            ..........
10000000    XOR     R0,R0       ;← Trap routine entrance
10000002    RTE                 ;Returns to the TST instruction
10000004    NOP                 ;Executes NOP before RTE

7.2.66 TST (Test Logical): Logic Operation Instruction

Format                   Abstract                                       Code               Cycle   T Bit
TST Rm,Rn                Rn & Rm, when result is 0, 1 → T               0010nnnnmmmm1000   1       Test results
TST #imm,R0              R0 & imm, when result is 0, 1 → T              11001000iiiiiiii   1       Test results
TST.B #imm,@(R0,GBR)     (R0 + GBR) & imm, when result is 0, 1 → T      11001100iiiiiiii   3       Test results

Description: Logically ANDs the contents of general registers Rn and Rm, and sets the T bit to 1 if the result is 0 or clears the T bit to 0 if the result is not 0. The Rn data does not change. The contents of general register R0 can also be ANDed with zero-extended 8-bit immediate data, or the contents of 8-bit memory accessed by indirect indexed GBR addressing can be ANDed with 8-bit immediate data. The R0 and memory data do not change.
Operation:
TST(long m,long n)    /* TST Rm,Rn */
{
    if ((R[n]&R[m])==0) T=1;
    else T=0;
    PC+=2;
}

TSTI(long i)    /* TST #imm,R0 */
{
    long temp;
    temp=R[0]&(0x000000FF & (long)i);
    if (temp==0) T=1;
    else T=0;
    PC+=2;
}

TSTM(long i)    /* TST.B #imm,@(R0,GBR) */
{
    long temp;
    temp=(long)Read_Byte(GBR+R[0]);
    temp&=(0x000000FF & (long)i);
    if (temp==0) T=1;
    else T=0;
    PC+=2;
}

Examples:
TST R0,R0                ;Before execution: R0 = H'00000000
                         ;After execution:  T = 1
TST #H'80,R0             ;Before execution: R0 = H'FFFFFF7F
                         ;After execution:  T = 1
TST.B #H'A5,@(R0,GBR)    ;Before execution: @(R0,GBR) = H'A5
                         ;After execution:  T = 0

7.2.67 XOR (Exclusive OR Logical): Logic Operation Instruction

Format                   Abstract                              Code               Cycle   T Bit
XOR Rm,Rn                Rn ^ Rm → Rn                          0010nnnnmmmm1010   1       —
XOR #imm,R0              R0 ^ imm → R0                         11001010iiiiiiii   1       —
XOR.B #imm,@(R0,GBR)     (R0 + GBR) ^ imm → (R0 + GBR)         11001110iiiiiiii   3       —

Description: Exclusive ORs the contents of general registers Rn and Rm, and stores the result in Rn. The contents of general register R0 can also be exclusive ORed with zero-extended 8-bit immediate data, or 8-bit memory accessed by indirect indexed GBR addressing can be exclusive ORed with 8-bit immediate data.
Operation:
XOR(long m,long n)    /* XOR Rm,Rn */
{
    R[n]^=R[m];
    PC+=2;
}

XORI(long i)    /* XOR #imm,R0 */
{
    R[0]^=(0x000000FF & (long)i);
    PC+=2;
}

XORM(long i)    /* XOR.B #imm,@(R0,GBR) */
{
    long temp;
    temp=(long)Read_Byte(GBR+R[0]);
    temp^=(0x000000FF & (long)i);
    Write_Byte(GBR+R[0],temp);
    PC+=2;
}

Examples:
XOR R0,R1                ;Before execution: R0 = H'AAAAAAAA, R1 = H'55555555
                         ;After execution:  R1 = H'FFFFFFFF
XOR #H'F0,R0             ;Before execution: R0 = H'FFFFFFFF
                         ;After execution:  R0 = H'FFFFFF0F
XOR.B #H'A5,@(R0,GBR)    ;Before execution: @(R0,GBR) = H'A5
                         ;After execution:  @(R0,GBR) = H'00

7.2.68 XTRCT (Extract): Data Transfer Instruction

Format         Abstract                           Code               Cycle   T Bit
XTRCT Rm,Rn    Rm: Center 32 bits of Rn → Rn      0010nnnnmmmm1101   1       —

Description: Extracts the middle 32 bits from the 64 bits of coupled general registers Rm and Rn, and stores the 32 bits in Rn (figure 7.13).

Figure 7.13 Extract

Operation:
XTRCT(long m,long n)    /* XTRCT Rm,Rn */
{
    unsigned long temp;
    temp=(R[m]<<16)&0xFFFF0000;
    R[n]=(R[n]>>16)&0x0000FFFF;
    R[n]|=temp;
    PC+=2;
}

Example:
XTRCT R0,R1    ;Before execution: R0 = H'01234567, R1 = H'89ABCDEF
               ;After execution:  R1 = H'456789AB

7.3 Floating Point Instructions and FPU Related CPU Instructions

The functions used in the descriptions of the operation of FPU calculations are as follows.
long FPSCR;
int T;

int load_long(long *address, *data)
{
    /* This function is defined in CPU part */
}

int store_long(long *address, *data)
{
    /* This function is defined in CPU part */
}

int sign_of(long *src)
{
    return(*src >> 31);
}

int data_type_of(long *src)
{
    long abs;
    abs = *src & 0x7fffffff;
    if(abs < 0x00800000) {
        if(sign_of(src) == 0) return(PZERO);
        else return(NZERO);
    }
    else if((0x00800000 <= abs) && (abs < 0x7f800000)) return(NORM);
    else if(0x7f800000 == abs) {
        if(sign_of(src) == 0) return(PINF);
        else return(NINF);
    }
    else if(0x00400000 & abs) return(sNaN);
    else return(qNaN);
}

clear_cause_VZ() { FPSCR &= (~CAUSE_V & ~CAUSE_Z); }

set_V() { FPSCR |= (CAUSE_V | FLAG_V); }

set_Z() { FPSCR |= (CAUSE_Z | FLAG_Z); }

invalid(float *dest)
{
    set_V();
    if((FPSCR & ENABLE_V) == 0) qnan(dest);
}

dz(float *dest, int sign)
{
    set_Z();
    if((FPSCR & ENABLE_Z) == 0) inf(dest,sign);
}

zero(float *dest, int sign)
{
    if(sign == 0) *dest = 0x00000000;
    else *dest = 0x80000000;
}

inf(float *dest, int sign)
{
    if(sign == 0) *dest = 0x7f800000;
    else *dest = 0xff800000;
}

qnan(float *dest)
{
    *dest = 0x7fbfffff;
}

7.3.1 FABS (Floating Point Absolute Value): Floating Point Instruction

Format      Abstract         Code               Cycle   T Bit
FABS FRn    |FRn| → FRn      1111nnnn01011101   1       —

Description: Obtains the arithmetic absolute value (as a floating point number) of the contents of floating point register FRn. The calculation result is stored in FRn.

Operation:
FABS(float *FRn)    /* FABS FRn */
{
    clear_cause_VZ();
    case(data_type_of(FRn)) {
        NORM  : if(sign_of(FRn) == 0) *FRn = *FRn;
                else *FRn = -*FRn;
                break;
        PZERO :
        NZERO : zero(FRn,0); break;
        PINF  :
        NINF  : inf(FRn,0); break;
        qNaN  : qnan(FRn); break;
        sNaN  : invalid(FRn); break;
    }
    pc += 2;
}

FABS Special Cases

FRn          NORM   +0   –0   +INF   –INF   qNaN   sNaN
FABS(FRn)    ABS    +0   +0   +INF   +INF   qNaN   Invalid

Note: Non-normalized values are treated as zero.
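The operand classification performed by data_type_of can be restated in standard C as a check on the 32-bit pattern. This is a minimal sketch, not the manual's code: the `FpType` enum and the `classify` function are hypothetical names, and the bit-22 test for sNaN follows the 0x00400000 mask used in data_type_of.

```c
#include <stdint.h>

/* Hypothetical enum mirroring the type names used in the operation code. */
typedef enum { PZERO, NZERO, NORM, PINF, NINF, QNAN, SNAN } FpType;

/* Classify a single-precision bit pattern the way data_type_of does. */
static FpType classify(uint32_t bits)
{
    uint32_t mag = bits & 0x7FFFFFFFu;   /* magnitude without the sign bit */
    int negative = (int)(bits >> 31);    /* 1 if the sign bit is set */

    if (mag < 0x00800000u)               /* zero or non-normalized: treated as zero */
        return negative ? NZERO : PZERO;
    if (mag < 0x7F800000u)               /* normalized number */
        return NORM;
    if (mag == 0x7F800000u)              /* infinity */
        return negative ? NINF : PINF;
    return (mag & 0x00400000u) ? SNAN : QNAN;   /* bit 22 set means sNaN here */
}
```

Under this model the pattern written by qnan(), H'7FBFFFFF, classifies as qNaN because bit 22 is clear, and any non-normalized input such as H'00000001 classifies as +0.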
Exceptions: Invalid operation Examples: FABS FR2 ; Floating point absolute value ; Before execution FR2=H'C0800000/*–4 in base 10*/ ; After execution FR2=H'40800000/*4 in base 10*/ 165 7.3.2 FADD (Floating Point Add): Floating Point Instruction Format Abstract Code Cycles T Bit FADD FRm,FRn FRn+FRm → FRn 1111nnnnmmmm0000 1 — Description: Arithmetically adds (as floating point numbers) the contents of floating point registers FRm and FRn. The calculation result is stored in FRn. Operation: FADD (float *FRm,FRn) /* FADD FRm,FRn */ { clear_cause_VZ(); if((data_type_of(FRm) = = sNaN) || (data_type_of(FRn) = = sNaN)) invalid(FRn); else if((data_type_of(FRm) = = qNaN) || (data_type_of(FRn) = = qNaN)) else case(data_type_of(FRm)) qnan(FRn); { NORM: case(data_type_of(FRn)) { PINF : inf(FRn,0); break; NINF : inf(FRn,1); break; default : *FRn = *FRn + *FRm; break; } break; PZERO: case(data_type_of(FRn)) NORM : PZERO : NZERO { *FRn = *FRn + *FRm; break; : zero(FRn,0); break; PINF : inf(FRn,0); break; NINF : inf(FRn,1); break; } break; NZERO: case(data_type_of(FRn)){ NORM : *FRn = *FRn + *FRm; break; PZERO : zero(FRn,0); break; NZERO : zero(FRn,1); break; PINF : inf(FRn,0); break; NINF : inf(FRn,1); break; } PINF: 166 break; case(data_type_of(FRn)) { NINF : invalid(FRn); break; default : inf(FRn,0); break; } break; NINF: case(data_type_of(FRn)){ PINF : invalid(FRn); break; default : inf(FRn,1); break; } break; } pc += 2; } FADD Special Cases FRm FRn NORM NORM +0 –0 +INF ADD +0 –INF qNaN sNaN –INF +0 –0 –0 +INF –INF –INF +INF Invalid Invalid –INF qNaN qNaN sNaN Invalid Note: Non-normalized values are treated as zero. 
Exceptions: Invalid operation Examples: FADD FR2,FR3 ; Floating point add ; Before execution: ; ; After execution: ; FADD FR5,FR4 FR2=H'40400000/*3 in base 10*/ FR3=H'3F800000/*1 in base 10*/ FR2=H'40400000 FR3=H'40800000/*4 in base 10*/ ; ; Before execution: ; ; After execution: ; FR5=H'40400000/*3 in base 10*/ FR4=H'C0000000/*–2 in base 10*/ FR5=H'40400000 FR4=H'3F800000/*1 in base 10*/ 167 7.3.3 FCMP (Floating Point Compare): Floating Point Instruction Format Abstract Code Cycle T Bit FCMP/ EQ FRm,FRn (FRn==FRm)? 1:0 → T 1111nnnnmmmm0100 1 Comparison result FCMP/GT FRm,FRn (FRn> FRm)? 1:0 → T 1111nnnnmmmm0101 1 Comparison result Description: Arithmetically compares (as floating point numbers) the contents of floating point registers FRm and FRn. The calculation result (true/false) is written to the T bit. Operation: FCMP_EQ(float *FRm,FRn) /* FCMP/EQ FRm,FRn */ { clear_cause_VZ(); if (fcmp_chk(FRm,FRn) = = INVALID) {fcmp_invalid(0); } else if(fcmp_chk(FRm,FRn) = = EQ) T = 1; else T = 0; pc += 2; } FCMP_GT(float *FRm,FRn) /* FCMP/GT FRm,FRn */ { clear_cause_VZ(); if (fcmp_chk(FRm,FRn)==INVALID)||{fcmp_chk(FRm,FRn)==UO)){ fcmp_invalid(0):} else if(fcmp_chk(FRm,FRn) = = GT) else T = 1; T = 0; pc += 2; } fcmp_chk(float *FRm,*FRn) { if((data_type_of(FRm) == sNaN) || (data_type_of(FRn) == sNaN)) else if((data_type_of(FRm) == qNaN) || || (data_type_of(FRn) == qNaN)) else case(data_type_of(FRm)) NORM return(UO); { :case(data_type_of(FRn)) PINF 168 return(INVALID); :return(GT); { break; NINF :return(NOTGT); break; default : break; } break; PZERO NZERO : : case(data_type_of(FRn)) { PZERO : NZERO :return(EQ); break; PINF :return(GT); break; NINF :return(NOTGT); break; default : break; } PINF break; : case(data_type_of(FRn)) { PINF :return(EQ) break; default :return(NOTGT); break; } NINF break; : case(data_type_of(FRn)) { NINF :return(EQ); break; default :return(GT); break; } break; } if(*FRn = = *FRm) return(EQ); else if(*FRn > *FRm) return(GT); else return(NOTGT); } 
fcmp_invalid(int cmp_flag) { set_V(); if((FPSCR & ENABLE_V) = = 0) T = cmp_flag; } 169 FCMP Special Cases FRm FRn NORM NORM +0 CMP +0 –0 +INF –INF GT !GT qNaN sNaN EQ –0 +INF !GT –INF GT EQ EQ qNaN UO sNaN Invalid Notes: 1. UO if result is FCMP/EQ, invalid if result is FCMP/GT. 2. Non-normalized values are treated as zero. Exceptions: Invalid operation Note: Four comparison operations that are independent of each other are defined in the IEEE standard, but the SH-2E supports FCMP/EQ and FCMP/GT only. However, all comparison conditions can be supported by using these two FCMP instructions in combination with the BT and BF instructions. (FRm = = FRn) fcmp/eq FRm, FRn ; bt (FRm ! = FRn) fcmp/eq FRm, FRn ; bf (FRm < FRn) fcmp/gt FRm, FRn ; bt (FRm <= FRn) fcmp/gt FRn, FRm ; bt (FRm > FRn) fcmp/gt FRn, FRm ; bt (FRm >= FRn) fcmp/gt FRm, FRn ; bf Unorder FRm, FRn fcmp/eq FRm, FRm ; bf Examples: FCMP/EQ: FLDI1 FR6 ;FR6=H'3F800000/*1 in base 10*/ FLDI1 FR7 ;FR7=H'3F800000 CLRT ;T Bit =0 FCMP/EQ FR6,FR7 ; Floating point compare, equal BF TRGET_F ; Don't branch (T=1) BT/S TRGET_T ; Branch FADD FR6,FR7 ; Delay slot, FR7=H'40000000/*2 in base 10*/ NOP 170 NOP TRGET_F FCMP/EQ BT/S FR6,FR7 ; Don't branch (T=0) TRGET_T FLDI1 TRGET_T FCMP/EQ FR7 ; Delay slot FR6,FR7 ; T bit = 0 BF TRGET_F ; Branch first time only NOP ;FR6=FR7=H'3F800000/*1 in base 10*/ .END FCMP/GT: FLDI1 FR2 FLDI1 FR7 FADD FR2,FR7 ;FR2=H'3F800000/*1 in base 10*/ ;FR7=H'40000000/*2 in base 10*/ ; T bit = 0 CLRT FCMP/GT FR2,FR7 ; Floating point compare, greater than BT/S TRGET_T ; Branch (T=1) FLDI1 FR7 TRGET_T FCMP/GT BT FR2,FR7 ; T bit = 0 TRGET_T ; Don't branch (T=0) .END 171 7.3.4 FDIV (Floating Point Divide): Floating Point Instruction Format Abstract Code Cycles T Bit FDIV FRm, FRn FRn/FRm → FRn 1111nnnnmmmm0011 13 — Description: Arithmetically divides (as floating point numbers) the contents of floating point register FRn by the contents of floating point register FRm. 
The calculation result is stored in FRn.

Operation:
FDIV(float *FRm,*FRn)    /* FDIV FRm,FRn */
{
    clear_cause_VZ();
    if((data_type_of(FRm) == sNaN) || (data_type_of(FRn) == sNaN)) invalid(FRn);
    else if((data_type_of(FRm) == qNaN) || (data_type_of(FRn) == qNaN)) qnan(FRn);
    else case(data_type_of(FRm)) {
        NORM  : case(data_type_of(FRn)) {
                    PINF    :
                    NINF    : inf(FRn,sign_of(FRm)^sign_of(FRn)); break;
                    default : *FRn = *FRn / *FRm; break;
                }
                break;
        PZERO :
        NZERO : case(data_type_of(FRn)) {
                    PZERO   :
                    NZERO   : invalid(FRn); break;
                    PINF    :
                    NINF    : inf(FRn,sign_of(FRm)^sign_of(FRn)); break;
                    default : dz(FRn,sign_of(FRm)^sign_of(FRn)); break;
                }
                break;
        PINF  :
        NINF  : case(data_type_of(FRn)) {
                    PINF    :
                    NINF    : invalid(FRn); break;
                    default : zero(FRn,sign_of(FRm)^sign_of(FRn)); break;
                }
                break;
    }
    pc += 2;
}

FDIV Special Cases

FRm \ FRn    NORM      +0, –0    +INF, –INF   qNaN      sNaN
NORM         DIV       0         INF          qNaN      Invalid
+0, –0       DZ        Invalid   INF          qNaN      Invalid
+INF, –INF   0         0         Invalid      qNaN      Invalid
qNaN         qNaN      qNaN      qNaN         qNaN      Invalid
sNaN         Invalid   Invalid   Invalid      Invalid   Invalid

The sign of a 0, INF, or DZ result is the exclusive OR of the signs of FRm and FRn.

Note: Non-normalized values are treated as zero.

Exceptions: Invalid operation, divide by zero

Examples:
FDIV FR6,FR5    ;Floating point divide
                ;Before execution: FR5=H'40800000 /*4 in base 10*/
                ;                  FR6=H'40400000 /*3 in base 10*/
                ;After execution:  FR5=H'3FAAAAAA /*1.33... in base 10*/
                ;                  FR6=H'40400000

7.3.5 FLDI0 (Floating Point Load Immediate 0): Floating Point Instruction

Format       Abstract             Code               Cycles   T Bit
FLDI0 FRn    H'00000000 → FRn     1111nnnn10001101   1        —

Description: Loads the floating point number 0 (0x00000000) in floating point register FRn.

Operation:
FLDI0(float *FRn)    /* FLDI0 FRn */
{
    *FRn = 0x00000000;
    pc += 2;
}

Exceptions: None

Examples:
FLDI0 FR1    ;Load immediate 0
             ;Before execution: FR1=x (don't care)
             ;After execution:  FR1=H'00000000

7.3.6 FLDI1 (Floating Point Load Immediate 1): Floating Point Instruction

Format       Abstract             Code               Cycles   T Bit
FLDI1 FRn    H'3F800000 → FRn     1111nnnn10011101   1        —

Description: Loads the floating point number 1 (0x3F800000) in floating point register FRn.
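The hexadecimal constants loaded by FLDI0 and FLDI1 are the single-precision bit patterns of 0 and 1. The sketch below shows one way to convert between a bit pattern and a float value in C. It assumes the host compiler uses IEEE 754 single precision for `float`, which the C standard does not guarantee; `bits_to_float` and `float_to_bits` are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float (assumes IEEE 754 single precision). */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);   /* memcpy avoids aliasing problems */
    return f;
}

/* Reinterpret a float as its 32-bit pattern. */
static uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On such a host, `bits_to_float(0x3F800000u)` is 1.0f and `float_to_bits(3.0f)` is 0x40400000, which matches the H'40400000 /*3 in base 10*/ annotations used in the examples.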
Operation:
FLDI1(float *FRn)    /* FLDI1 FRn */
{
    *FRn = 0x3F800000;
    pc += 2;
}

Exceptions: None

Examples:
FLDI1 FR2    ;Load immediate 1
             ;Before execution: FR2=x (don't care)
             ;After execution:  FR2=H'3F800000 /*1 in base 10*/

7.3.7 FLDS (Floating Point Load to System Register): Floating Point Instruction

Format           Abstract        Code               Cycles   T Bit
FLDS FRm,FPUL    FRm → FPUL      1111nnnn00011101   1        —

Description: Loads the contents of floating point register FRm to system register FPUL.

Operation:
FLDS(float *FRm,*FPUL)    /* FLDS FRm,FPUL */
{
    *FPUL = *FRm;
    pc += 2;
}

Exceptions: None

Examples:
;Before execution of FLDS and FSTS:
FLDI1 FR6         ;FR6=H'3F800000 /*1 in base 10*/
FLDI0 FR2         ;FR2=0
;After execution of FLDS and FSTS:
FLDS  FR6,FPUL    ;FPUL=H'3F800000
FSTS  FPUL,FR2    ;FR2=H'3F800000

7.3.8 FLOAT (Floating Point Convert from Integer): Floating Point Instruction

Format            Abstract              Code               Cycles   T Bit
FLOAT FPUL,FRn    (float)FPUL → FRn     1111nnnn00101101   1        —

Description: Interprets the contents of FPUL as an integer value and converts it into a floating point number. The result is stored in floating point register FRn.

Operation:
FLOAT(int *FPUL,float *FRn)    /* FLOAT FPUL,FRn */
{
    clear_cause_VZ();
    *FRn = (float)*FPUL;
    pc += 2;
}

Exceptions: None

Examples:
;Floating Point Convert from Integer
;Before execution of FLOAT instruction:
MOV.L #H'00000003,R1    ;R1=H'00000003
FLDI0 FR2               ;FR2=0
LDS   R1,FPUL           ;FPUL=H'00000003
;After execution of FLOAT instruction:
FLOAT FPUL,FR2          ;FR2=H'40400000 /*3 in base 10*/

7.3.9 FMAC (Floating Point Multiply Accumulate): Floating Point Instruction

Format               Abstract                  Code               Cycles   T Bit
FMAC FR0,FRm,FRn     FR0 × FRm + FRn → FRn     1111nnnnmmmm1110   1        —

Description: Arithmetically multiplies (as floating point numbers) the contents of floating point registers FR0 and FRm. The contents of floating point register FRn are added to this product, and the result is stored in FRn.
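As a quick illustration of the FMAC data flow, the following C sketch computes FR0 × FRm + FRn as a multiply followed by an add. It is a simplified model under stated assumptions: `fmac` is a hypothetical helper name, the host `float` arithmetic stands in for the FPU, and none of the special cases (zeros, infinities, NaNs) or FPSCR updates are modeled.

```c
/* Simplified FMAC model: multiply first, then accumulate.
 * Special cases and FPSCR cause/flag handling are intentionally omitted. */
static float fmac(float fr0, float frm, float frn)
{
    float product = fr0 * frm;   /* multiply step */
    return product + frn;        /* add step */
}
```

With FR0 = 2, FRm = 4, and FRn = 1, the value written back to FRn is 9.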
Operation: FMAC(float *FR0,*FRm,*FRn) /* FMAC FR0,FRm,FRn */ { long tmp_FPSCR; float *tmp_FMUL = *FRm; FMUL(F0,tmp_FMUL); pc -=2; /* correct pc */ tmp_FPSCR = FPSCR; /* save cause field for FR0*FRm */ FADD(tmp_FMUL,FRn); FPSCR |= tmp_FPSCR; } 178 /* reflect cause field for F0*FRm */ FMAC Special Cases FRn FR0 FRm +NORM –NORM NORM NORM +0 –0 MAC +INF +INF –INF –INF –INF +INF NORM MAC Invalid +INF –INF –INF +INF +0 +INF +INF –INF –INF –INF +INF +NORM MAC Invalid Invalid +INF –INF –INF +INF +0 –0 +INF –INF –0 +0 –INF +INF Invalid +0 +0 –0 +0 –0 –0 –0 +0 –0 +0 +INF +INF –INF Invalid –INF –INF +INF +NORM +INF +INF –INF –INF +INF Invalid –NORM +INF 0 Invalid +INF –INF sNaN INF –NORM +INF qNaN Invalid 0 –0 –INF INF 0 +0 +INF Invalid –INF Invalid +NORM –INF +INF +INF +INF –INF –NORM 0 qNaN +INF Invalid –INF –INF Invalid –INF 0 INF –INF Invalid Invalid Invalid !sNaN !NaN qNaN All types sNaN sNaN All types qNaN Invalid Note: Non-normalized values are treated as zero. 179 Exceptions: Invalid operation Examples: FMAC FR0, FR3, FR5 ;Floating point multiply accumulate FR0*FR3+FR5->FR5 FMAC FR0, FR0, FR5 FMAC FR0, FR5, FR0 180 ;Before execution: FR0=H'40000000/*2 in base 10*/ ; FR3=H'40800000/*4 in base 10*/ ; FR5=H'3F800000/*1 in base 10*/ ;After execution: FR0=H'40000000/*2 in base 10*/ ; FR3=H'40800000/*4 in base 10*/ ; FR5=H'41100000/*9 in base 10*/ ;FR0*FR0+FR5->FR5 ;Before execution: FR0=H'40000000/*2 in base 10*/ ; FR5=H'3F800000/*1 in base 10*/ ;After execution: FR0=H'40000000/*2 in base 10*/ ; FR5=H'40A00000/*5 in base 10*/ ;FR0*FR5+FR0->FR5 ;Before execution: FR0=H'40000000/*2 in base 10*/ ; FR5=H'40A00000/*5 in base 10*/ ;After execution: FR0=H'41400000/*12 in base 10*/ ; FR5=H'40A00000/*5 in base 10*/ 7.3.10 FMOV (Floating Point Move): Floating Point Instruction Format Abstract Code Cycles T Bit 1. FMOV FRm,Frn FRm → FRn 1111nnnnmmmm1100 1 — 2. FMOV.S @Rm,FRn (Rm) → FRn 1111nnnnmmmm1000 1 — 3. FMOV.S FRm, @Rn FRm → (Rn) 1111nnnnmmmm1010 1 — 4. 
FMOV.S @Rm+,FRn (Rm) → FRn,Rm+=4 1111nnnnmmmm1001 1 — 5. FMOV.S FRm,@-Rn Rn-=4,FRm → (Rn) 1111nnnnmmmm1011 1 — 6. FMOV.S @(R0,Rm),FRn (R0+Rm) → FRn 1111nnnnmmmm0110 1 — 7. FMOV.S FRm,@(R0,Rn) FRm → (R0+Rn) 1111nnnnmmmm0111 1 — Description: 1. 2. 3. 4. 5. 6. 7. Moves the contents of floating point register FRm to floating point register FRn. Loads the contents of the memory addresses specified by general-use register Rm to floating point register FRn. Stores the contents of floating point register FRm in the memory address position specified by general-use register Rm. Loads the contents of the memory addresses specified by general-use register Rm to floating point register FRn. After the load completes successfully, increments the value of Rm by 4. Stores the contents of floating point register FRm in the memory address position specified by general-use register Rn-4. After the store completes successfully, the decremented value (Rn4) becomes the value of Rm. Loads the contents of the memory addresses specified by general-use registers Rm and R0 to floating point register FRn. Stores the contents of floating point register FRm in the memory address position specified by general-use registers Rn and R0. 181 Operation: FMOV(float *FRm,*FRn) /* FMOV.S FRm,FRn */ { *FRn = *FRm; pc += 2; } FMOV_LOAD(long *Rm,float *FRn) { if(load_long(Rm,FRn) /* FMOV @Rm,FRn */ !=Address_Error) load_long(Rm,FRn); pc += 2; } FMOV_STORE(float *FRm,long *Rn) { /* FMOV.S FRm,@Rn */ if(store_long(FRm,tmp_address) !=Address_Error) store_long(FRm,Rn); pc += 2; } FMOV_RESTORE(long *Rm,float *FRn) { if(load_long(Rm,FRn) /* FMOV.S @Rm+,FRn */ !=Address_Error) *Rm += 4; pc += 2; } FMOV_SAVE(float *FRm,long *Rn) /*FMOV.S FRm,@-Rn */ { long *tmp_address =*Rn -4; if(store_long(FRm,tmp_address) !=Address_Error) Rn = tmp_address; pc += 2; } FMOV_LOAD_index(long *Rm, long *R0, float *FRn)/* FMOV.S @(R0,Rm),FRn*/ { if (load_long(&(*Rm+*R0),FRn), ! 
= Address_Error); pc += 2; } FMOV_STORE_index(float *FRm,long *R0, long *Rn)/* FMOV.S FRm,@(R0,Rn)*/ 182 { if (store_long(FRm,&((*Rn+*R0)), ! = Address_Error); pc += 2; } Exceptions: Address error Examples: FMOV.S FMOV.S FMOV.S @R1, FR2 FR2, @R3 @R3+,FR3 ;Load ;Before execution: @R1=H'00ABCDEF ; FR2=0 ;After execution: @R1=H'00ABCDEF ; FR2=H'00ABCDEF ;Store ;Before execution: @R3=0 ; FR2=H'40800000 ;After execution: @R3=H'40800000 ; FR2=H'40800000 ;Restore ;Before execution: R3=H'0C700028 ; @R3=H'40800000 ; FR3=0 ;After execution: R3=H'0C70002C ; ; FMOV.S FMOV.S FR4, @-R3 @(R0, R3), FR4 FR3=H'40800000 ;Save ;Before execution: R3=H'0C700044 ; @R3=0 ; FR4=H'01234567 ;After execution: R3=H'0C700040 ; @R3=H'01234567 ; FR4=H'01234567 ;Load with index ;Before execution: R0=H'00000004 183 ; R3=H'0C700040 ; @H'0C700044=H'00ABCDEF ; FR=4 ;After execution: R0=H'00000004 ; R3=H'0C700040 ; ; FMOV.S FR5, @(R0,R3) FR4=H'00ABCDEF ;Store with index ;Before execution: R0=H'00000028 ; R3=H'0C700040 ; @H'0C700068=0 ; FR5=H'76543210 ;After execution: R0=H'00000028 ; R3=H'0C700040 ; @H'0C700068=H'76543210 ; FMOV.S 184 FR5, FR6 ;Register file contents ;Before execution: FR5=H'76543210 ; FR6=x(don't care) ;After execution: FR5=H'76543210 ; FR6=H'76543210 7.3.11 FMUL (Floating Point Multiply): Floating Point Instruction Format Abstract Code Cycles T Bit FMUL FRm,FRn FRn × FRm → FRn 1111nnnnmmmm0010 1 — Description: Arithmetically multiplies (as floating point numbers) the contents of floating point registers FRm and FRn. The calculation result is stored in FRn. 
Operation:
FMUL(float *FRm,*FRn)    /* FMUL FRm,FRn */
{
    clear_cause_VZ();
    if((data_type_of(FRm) == sNaN) || (data_type_of(FRn) == sNaN)) invalid(FRn);
    else if((data_type_of(FRm) == qNaN) || (data_type_of(FRn) == qNaN)) qnan(FRn);
    else case(data_type_of(FRm)) {
        NORM  : case(data_type_of(FRn)) {
                    PINF    :
                    NINF    : inf(FRn,sign_of(FRm)^sign_of(FRn)); break;
                    default : *FRn = (*FRn)*(*FRm); break;
                }
                break;
        PZERO :
        NZERO : case(data_type_of(FRn)) {
                    PINF    :
                    NINF    : invalid(FRn); break;
                    default : zero(FRn,sign_of(FRm)^sign_of(FRn)); break;
                }
                break;
        PINF  :
        NINF  : case(data_type_of(FRn)) {
                    PZERO   :
                    NZERO   : invalid(FRn); break;
                    default : inf(FRn,sign_of(FRm)^sign_of(FRn)); break;
                }
                break;
    }
    pc += 2;
}

FMUL Special Cases

FRm \ FRn    NORM      +0, –0    +INF, –INF   qNaN      sNaN
NORM         MUL       0         INF          qNaN      Invalid
+0, –0       0         0         Invalid      qNaN      Invalid
+INF, –INF   INF       Invalid   INF          qNaN      Invalid
qNaN         qNaN      qNaN      qNaN         qNaN      Invalid
sNaN         Invalid   Invalid   Invalid      Invalid   Invalid

The sign of a 0 or INF result is the exclusive OR of the signs of FRm and FRn.

Note: Non-normalized values are treated as zero.

Exceptions: Invalid operation

Examples:
FMUL FR2,FR3    ;Floating point multiply
                ;Before execution: FR2=H'40000000 /*2 in base 10*/
                ;                  FR3=H'40800000 /*4 in base 10*/
                ;After execution:  FR2=H'40000000
                ;                  FR3=H'41000000 /*8 in base 10*/

7.3.12 FNEG (Floating Point Negate): Floating Point Instruction

Format      Abstract        Code               Cycles   T Bit
FNEG FRn    –FRn → FRn      1111nnnn01001101   1        —

Description: Arithmetically negates (as a floating point number) the contents of floating point register FRn. The calculation result is stored in FRn.

Operation:
FNEG(float *FRn)    /* FNEG FRn */
{
    clear_cause_VZ();
    case(data_type_of(FRn)) {
        qNaN    : qnan(FRn); break;
        sNaN    : invalid(FRn); break;
        default : *FRn = -(*FRn); break;
    }
    pc += 2;
}

FNEG Special Cases

FRn          NORM   +0   –0   +INF   –INF   qNaN   sNaN
FNEG(FRn)    NEG    –0   +0   –INF   +INF   qNaN   Invalid

Note: Non-normalized values are treated as zero.
Exceptions: Invalid operation

Examples:

FNEG  FR2     ;Floating point negate
              ;Before execution: FR2=H'40800000/*4 in base 10*/
              ;After execution:  FR2=H'C0800000/*–4 in base 10*/

7.3.13 FSTS (Floating Point Store From System Register): Floating Point Instruction

Format           Abstract        Code                Cycles   T Bit
FSTS FPUL,FRn    FPUL → FRn      1111nnnn00001101    1        —

Description: Copies the contents of system register FPUL to floating point register FRn.

Operation:

FSTS(float *FRn,*FPUL)   /* FSTS FPUL,FRn */
{
    *FRn = *FPUL;
    pc += 2;
}

Exceptions: None

Examples:

MOV.L  #H'00000002, R2
FLDI0  FR5
LDS    R2, FPUL
FSTS   FPUL, FR5
                  ;Before execution of FSTS instruction:
                  ;R2=H'00000002
                  ;FR5=0
                  ;After execution of FSTS instruction:
                  ;R2=H'00000002
                  ;FR5=H'00000002

7.3.14 FSUB (Floating Point Subtract): Floating Point Instruction

Format            Abstract            Code                Cycles   T Bit
FSUB FRm,FRn      FRn-FRm → FRn       1111nnnnmmmm0001    1        —

Description: Arithmetically subtracts (as floating point numbers) the contents of floating point register FRm from the contents of floating point register FRn. The calculation result is stored in FRn.
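Because the operand order of FSUB is easy to misread (FSUB FRm,FRn computes FRn - FRm, not FRm - FRn), the following sketch evaluates the subtraction on raw H' bit patterns in standard C. It assumes the host float type is IEEE 754 single precision and it models only ordinary values, not the special cases or the exception flags. The helper names bits_to_float, float_to_bits and fsub_bits are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a raw 32-bit pattern as a float (memcpy avoids aliasing problems). */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Reinterpret a float as its raw 32-bit pattern. */
static uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* FSUB FRm,FRn: returns the new FRn pattern, that is FRn - FRm.
   Special cases (NaN, infinities, non-normalized inputs) are not modelled. */
uint32_t fsub_bits(uint32_t frm, uint32_t frn)
{
    return float_to_bits(bits_to_float(frn) - bits_to_float(frm));
}
```

With FRm=H'3F800000 (1) and FRn=H'40E00000 (7), fsub_bits returns H'40C00000 (6). With FRm=H'40C00000 (6) and FRn=H'40800000 (4) it returns H'C0000000 (–2).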
Operation:

FSUB(float *FRm,*FRn)   /* FSUB FRm,FRn */
{
    clear_cause_VZ();
    if((data_type_of(FRm) == sNaN) || (data_type_of(FRn) == sNaN))
        invalid(FRn);
    else if((data_type_of(FRm) == qNaN) || (data_type_of(FRn) == qNaN))
        qnan(FRn);
    else case(data_type_of(FRm)) {
        NORM  :
            case(data_type_of(FRn)) {
                PINF   : inf(FRn,0); break;
                NINF   : inf(FRn,1); break;
                default: *FRn = *FRn - *FRm; break;
            }
            break;
        PZERO :
            case(data_type_of(FRn)) {
                NORM   : *FRn = *FRn - *FRm; break;
                PZERO  : zero(FRn,0); break;
                NZERO  : zero(FRn,1); break;
                PINF   : inf(FRn,0); break;
                NINF   : inf(FRn,1); break;
            }
            break;
        NZERO :
            case(data_type_of(FRn)) {
                NORM   : *FRn = *FRn - *FRm; break;
                PZERO  :
                NZERO  : zero(FRn,0); break;
                PINF   : inf(FRn,0); break;
                NINF   : inf(FRn,1); break;
            }
            break;
        PINF  :                          /* FRn - (+INF) */
            case(data_type_of(FRn)) {
                PINF   : invalid(FRn); break;
                default: inf(FRn,1); break;
            }
            break;
        NINF  :                          /* FRn - (-INF) */
            case(data_type_of(FRn)) {
                NINF   : invalid(FRn); break;
                default: inf(FRn,0); break;
            }
            break;
    }
    pc += 2;
}

FSUB Special Cases

                                   FRn
FRm      NORM     +0       –0       +INF     –INF     qNaN     sNaN
NORM     SUB      SUB      SUB      +INF     –INF     qNaN     Invalid
+0       SUB      +0       –0       +INF     –INF     qNaN     Invalid
–0       SUB      +0       +0       +INF     –INF     qNaN     Invalid
+INF     –INF     –INF     –INF     Invalid  –INF     qNaN     Invalid
–INF     +INF     +INF     +INF     +INF     Invalid  qNaN     Invalid
qNaN     qNaN     qNaN     qNaN     qNaN     qNaN     qNaN     Invalid
sNaN     Invalid  Invalid  Invalid  Invalid  Invalid  Invalid  Invalid

Note: Non-normalized values are treated as zero.

Exceptions: Invalid operation

Examples:

FSUB  FR0, FR3    ;Floating point subtract
                  ;Before execution:
                  ;FR0=H'3F800000/*1 in base 10*/
                  ;FR3=H'40E00000/*7 in base 10*/
                  ;After execution:
                  ;FR0=H'3F800000/*1 in base 10*/
                  ;FR3=H'40C00000/*6 in base 10*/
FSUB  FR3, FR2
                  ;Before execution:
                  ;FR2=H'40800000/*4 in base 10*/
                  ;FR3=H'40C00000/*6 in base 10*/
                  ;After execution:
                  ;FR2=H'C0000000/*–2 in base 10*/
                  ;FR3=H'40C00000/*6 in base 10*/

7.3.15 FTRC (Floating Point Truncate And Convert To Integer): Floating Point Instruction

Format             Abstract               Code                Cycles   T Bit
FTRC FRm,FPUL      (long)FRm → FPUL       1111nnnn00111101    1        —

Description: Interprets the contents of floating point register FRm as a floating point number and converts it to an integer by truncating everything after the decimal point. The calculation result is stored in FPUL.
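The truncation described for FTRC can be sketched in standard C as shown below. This is an illustration under stated assumptions, not the manual's operation listing: it assumes the host float type is IEEE 754 single precision, that the largest convertible positive value is the largest single-precision value below 2^31, and that NaN inputs saturate to the negative limit. The FPSCR enable bit, which can suppress the write to FPUL, is not modelled. The function name ftrc_sketch and the invalid flag parameter are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of FTRC: truncate toward zero, saturating when the value cannot be
   represented as a 32-bit signed integer. *invalid is set to 1 when the
   invalid-operation case is hit, otherwise to 0. */
int32_t ftrc_sketch(uint32_t frm_bits, int *invalid)
{
    float f;
    memcpy(&f, &frm_bits, sizeof f);    /* reinterpret the register pattern */
    *invalid = 0;

    if (f != f) {                       /* NaN (assumed to give the negative limit) */
        *invalid = 1;
        return INT32_MIN;
    }
    if (f >= 2147483648.0f) {           /* +INF or positive out of range */
        *invalid = 1;
        return INT32_MAX;               /* H'7FFFFFFF */
    }
    if (f < -2147483648.0f) {           /* -INF or negative out of range */
        *invalid = 1;
        return INT32_MIN;               /* H'80000000 */
    }
    return (int32_t)f;                  /* C conversion truncates toward zero */
}
```

Called with H'402ED9EB (2.7320), the sketch returns 2 and leaves the invalid flag clear.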
Operation:

#define N_INT_RANGE 0xCF000000   /* -1.000000 * 2^31 */
#define P_INT_RANGE 0x4EFFFFFF   /*  1.fffffe * 2^30 */

FTRC(float *FRm,int *FPUL)   /* FTRC FRm,FPUL */
{
    clear_cause_VZ();
    case(ftrc_type_of(FRm)) {
        NORM : *FPUL = (long)(*FRm); break;
        PINF : ftrc_invalid(FPUL,0); break;
        NINF : ftrc_invalid(FPUL,1); break;
    }
    pc += 2;
}

int ftrc_type_of(long *src)
{
    long abs;
    abs = *src & 0x7FFFFFFF;
    if(sign_of(src) == 0) {
        if(abs > 0x7F800000)
            return(NINF);   /* NaN */
        else if(abs > P_INT_RANGE)
            return(PINF);   /* out of range, +INF */
        else
            return(NORM);   /* +0, +NORM */
    }
    else {
        if(*src > N_INT_RANGE)
            return(NINF);   /* out of range, -INF, NaN */
        else
            return(NORM);   /* -0, -NORM */
    }
}

ftrc_invalid(long *dest,int sign)
{
    set_V();
    if((FPSCR & ENABLE_V) == 0) {
        if(sign == 0) *dest = 0x7FFFFFFF;
        else          *dest = 0x80000000;
    }
}

FTRC Special Cases

FRm                       FTRC(FRm)
NORM                      TRC
+0                        0
–0                        0
Positive out of range     Invalid, H'7FFFFFFF
Negative out of range     Invalid, H'80000000
+INF                      Invalid, +MAX
–INF                      Invalid, –MAX
qNaN                      Invalid, –MAX
sNaN                      Invalid, –MAX

Note: Non-normalized values are treated as zero.

Exceptions: Invalid operation

Examples:

MOV.L  #H'402ED9EB, R2
LDS    R2, FPUL
FSTS   FPUL, FR6      ;FR6=H'402ED9EB/*2.7320 in base 10*/
FTRC   FR6, FPUL
STS    FPUL, R2       ;R2=H'00000002/*2 in base 10*/
                      ;Before execution of FTRC and STS:
                      ; R2=H'402ED9EB
                      ; FR6=H'402ED9EB
                      ;After execution of FTRC and STS:
                      ; R2=H'00000002
                      ; FR6=H'402ED9EB

7.3.16 LDS (Load to System Register): FPU Related CPU Instruction

   Format                 Abstract                  Code                Cycles   T Bit
1. LDS Rm,FPUL            Rm → FPUL                 0100nnnn01011010    1        —
2. LDS.L @Rm+,FPUL        (Rm) → FPUL, Rm+=4        0100nnnn01010110    1        —
3. LDS Rm,FPSCR           Rm → FPSCR                0100nnnn01101010    1        —
4. LDS.L @Rm+,FPSCR       (Rm) → FPSCR, Rm+=4       0100nnnn01100110    1        —

Description:
1. Moves the contents of general-use register Rm to system register FPUL.
2. Loads the contents of the memory address specified by general-use register Rm to system register FPUL. After the load completes successfully, increments the value of Rm by 4.
3. Moves the contents of general-use register Rm to system register FPSCR. Previously defined bits in FPSCR are not changed.
4. Loads the contents of the memory address specified by general-use register Rm to system register FPSCR. After the load completes successfully, increments the value of Rm by 4. Previously defined bits in FPSCR are not changed.

Operation:

#define FPSCR_MASK 0x00018C60

LDS(long *Rm,*FPUL)   /* LDS Rm,FPUL */
{
    *FPUL = *Rm;
    pc += 2;
}

LDS_RESTORE(long *Rm,*FPUL)   /* LDS.L @Rm+,FPUL */
{
    if(load_long(Rm,FPUL) != Address_Error)
        *Rm += 4;
    pc += 2;
}

LDS(long *Rm,*FPSCR)   /* LDS Rm,FPSCR */
{
    *FPSCR = *Rm & FPSCR_MASK;
    pc += 2;
}

LDS_RESTORE(long *Rm,*FPSCR)   /* LDS.L @Rm+,FPSCR */
{
    long tmp_FPSCR;
    if(load_long(Rm,&tmp_FPSCR) != Address_Error) {
        *FPSCR = tmp_FPSCR & FPSCR_MASK;
        *Rm += 4;
    }
    pc += 2;
}

Exceptions: Address error

Examples:

• LDS

Example 1
MOV.L  #H'12345678, R2    ;Before execution of LDS and FSTS instructions:
FLDI0  FR3                ; R2=H'12345678
LDS    R2, FPUL           ; FR3=0
FSTS   FPUL, FR3          ;After execution of LDS and FSTS instructions:
                          ; R2=H'12345678
                          ; FR3=H'12345678

Example 2
MOV.L  #H'00040801, R4    ;After execution of LDS instruction:
LDS    R4, FPSCR          ; FPSCR=00040801

• LDS.L

Example 1
FLDI0  FR0                ;Before execution of LDS.L and FSTS instructions:
MOV.L  #H'87654321, R4    ; FR0=0
MOV.L  #H'0C700128, R8    ; R8=0C700128
MOV.L  R4,@R8             ;After execution of LDS.L and FSTS instructions:
LDS.L  @R8+, FPUL         ; FR0=87654321
FSTS   FPUL, FR0          ; R8=0C70012C

Example 2
MOV.L  #H'00040C01, R4    ;Before execution of LDS.L instruction:
MOV.L  #H'0C700134, R8    ; R8=0C700134
MOV.L  R4,@R8             ;After execution of LDS.L instruction:
LDS.L  @R8+, FPSCR        ; R8=0C700138
                          ; FPSCR=00040C01

7.3.17 STS (Store from FPU System Register): FPU Related CPU Instruction

   Format                 Abstract                    Code                Cycles   T Bit
1. STS FPUL,Rn            FPUL → Rn                   0000nnnn01011010    1        —
2. STS.L FPUL,@-Rn        Rn -= 4, FPUL → @(Rn)       0100nnnn01010010    1        —
3. STS FPSCR,Rn           FPSCR → Rn                  0000nnnn01101010    1        —
4. STS.L FPSCR,@-Rn       Rn -= 4, FPSCR → @(Rn)      0100nnnn01100010    1        —

Description:
1. Moves the contents of system register FPUL to general-use register Rn.
2. Stores the contents of system register FPUL at the memory address given by general-use register Rn minus 4. After the store completes successfully, the decremented value becomes the value of Rn.
3. Moves the contents of system register FPSCR to general-use register Rn.
4. Stores the contents of system register FPSCR at the memory address given by general-use register Rn minus 4. After the store completes successfully, the decremented value becomes the value of Rn.

Operation:

STS(long *FPUL,*Rn)   /* STS FPUL,Rn */
{
    *Rn = *FPUL;
    pc += 2;
}

STS_SAVE(long *FPUL,*Rn)   /* STS.L FPUL,@-Rn */
{
    long tmp_address = *Rn - 4;
    if(store_long(FPUL,tmp_address) != Address_Error)
        *Rn = tmp_address;
    pc += 2;
}

STS(long *FPSCR,*Rn)   /* STS FPSCR,Rn */
{
    *Rn = *FPSCR;
    pc += 2;
}

STS_SAVE(long *FPSCR,*Rn)   /* STS.L FPSCR,@-Rn */
{
    long tmp_address = *Rn - 4;
    if(store_long(FPSCR,tmp_address) != Address_Error)
        *Rn = tmp_address;
    pc += 2;
}

Exceptions: Address error

Examples:

• STS

Example 1
MOV.L  #H'12ABCDEF, R12
LDS    R12, FPUL
STS    FPUL, R13          ;After execution of STS instruction:
                          ; R13 = 12ABCDEF

Example 2
STS    FPSCR, R2          ;After execution of STS instruction:
                          ; Contents of FPSCR at that point stored in R2 register

• STS.L

Example 1
MOV.L  #H'0C700148, R7
STS.L  FPUL, @-R7         ;Before execution of STS.L instruction:
                          ; R7 = H'0C700148
                          ;After execution of STS.L instruction:
                          ; R7 = H'0C700144, contents of FPUL saved at
                          ; address H'0C700144

Example 2
MOV.L  #H'0C700154, R8
STS.L  FPSCR, @-R8        ;After execution of STS.L instruction:
                          ; Contents of FPSCR saved at address H'0C700150

Section 8 Pipeline Operation

This section describes the operation of the pipelines for each instruction.
This information is provided to allow calculation of the required number of CPU instruction execution states (system clock cycles).

8.1 Basic Configuration of Pipelines

The Five-Stage Pipeline: Pipelines are composed of the following five stages:

• IF (Instruction fetch)
  Fetches an instruction from the memory where the program is stored.
• ID (Instruction decode)
  Decodes the instruction fetched.
• EX (Instruction execution)
  Does data operations and address calculations according to the results of decoding.
• MA (Memory access)
  Accesses data in memory. Generated by instructions that involve memory access, with some exceptions.
• WB (Write back)
  Returns the results of the memory access (data) to a register. Generated by instructions that involve memory loads, with some exceptions.

These stages flow with the execution of the instructions and thereby constitute a pipeline. At a given instant, five instructions are being executed simultaneously. The basic pipeline flow is as shown in figure 8.1. The period in which a single stage is operating is called a slot and is indicated by two-way arrows (←→).

All instructions have at least the three stages IF, ID and EX, but not all have the MA and WB stages. The way the pipeline flows also varies with the type of instruction, with some having two MA stages, some accessing the FPU (mm), and so on. Finally, conflicts can occur, for example between IF and MA. When such a conflict occurs, the pipeline flow changes.

                 ←→ : Slot
Instruction 1    IF  ID  EX  MA  WB
Instruction 2        IF  ID  EX  MA  WB
Instruction 3            IF  ID  EX  MA  WB
Instruction 4                IF  ID  EX  MA  WB
Instruction 5                    IF  ID  EX  MA  WB
Instruction 6                        IF  ID  EX  MA  WB

(Vertical axis: instruction stream. Horizontal axis: time.)

Figure 8.1 Basic Structure of Pipeline Flow

FPU Pipeline: The durations of the stages in the FPU pipeline are the same as those of the stages in the CPU pipeline. In both pipelines, the first stage is instruction fetch (IF).
The FPU pipeline also has the following four additional stages:

• DF (Decode FPU)
  Decodes the fetched instruction.
• E1 (FPU execution stage 1)
  Initializes the floating-point operation.
• E2 (FPU execution stage 2)
  Completes the floating-point operation.
• SF (Store FPU)
  Stores the result in the FPU register.

All instructions pass through both the CPU and the FPU pipelines. Depending on the instruction, operations are performed either by the CPU pipeline alone or by both pipelines. In the case of floating-point instructions and FPU-related CPU instructions, the FPU pipeline and the CPU pipeline operate in parallel. In the case of instructions involving the CPU only, the FPU pipeline does not operate; only the CPU pipeline operates. Refer to section 8.8, Operation of Instruction Pipelines, for details.

8.2 Slot and Pipeline Flow

The time period in which a single stage operates is called a slot. Slots must follow the rules described below.

Instruction Execution: Each stage (IF, ID, EX, MA, WB) of an instruction must be executed in one slot. Two or more stages of the same instruction cannot be executed within one slot (figure 8.2). The only exception is MA and WB: since WB is executed immediately after MA, some instructions may execute MA and WB within the same slot.

Note: ID and EX of instruction 1 are executed in the same slot.

Figure 8.2 Impossible Pipeline Flow 1

Slot Sharing: A maximum of one stage from another instruction may be set per slot, and that stage must be different from the stage of the first instruction. Identical stages from two different instructions may never be executed within the same slot (figure 8.3).

Note: Same stage of another instruction is being executed in same slot.
Figure 8.3 Impossible Pipeline Flow 2

Slot Length: The number of states (system clock cycles) S for the execution of one slot is calculated with the following conditions:

• S = (the cycles of the stage with the highest number of cycles of all instruction stages contained in the slot). This means that the instruction with the longest stage stalls others with shorter stages.
• The number of execution cycles for each stage:

  Stage   Number of execution cycles
  IF      The number of memory access cycles for instruction fetch
  ID      Always one cycle
  EX      Always one cycle
  MA      The number of memory access cycles for data access
  WB      Always one cycle

As an example, figure 8.4 shows the flow of a pipeline in which the IF (memory access for instruction fetch) of instructions 1 and 2 are two cycles, the MA (memory access for data access) of instruction 1 is three cycles and all others are one cycle. The dashes indicate the instruction is being stalled.

Number of cycles   (2)     (2)     (1)   (3)        (1)   (1)
Instruction 1      IF IF   ID —    EX    MA MA MA   WB
Instruction 2              IF IF   ID    EX —  —    MA    WB

Figure 8.4 Slots Requiring Multiple Cycles

8.3 Number of Instruction Execution Cycles

The number of instruction execution cycles is counted as the interval between execution of EX stages. The number of cycles between the start of the EX stage for instruction 1 and the start of the EX stage for the following instruction (instruction 2) is the execution time for instruction 1.

For example, in a pipeline flow like that shown in figure 8.5, the EX stage interval between instructions 1 and 2 is five cycles, so the execution time for instruction 1 is five cycles. Since the interval between EX stages for instructions 2 and 3 is one cycle, the execution time of instruction 2 is one cycle.

If a program ends with instruction 3, the execution time for instruction 3 should be calculated as the interval between the EX stage of instruction 3 and the EX stage of a hypothetical instruction 4, using a MOV Rm, Rn that follows instruction 3.
(In figure 8.5, the execution time of instruction 3 would thus be one cycle.) In this example, the MA of instruction 1 and the IF of instruction 4 are in contention. For operation during the contention between the MA and IF, see section 8.4, Contention between Instruction Fetch (IF) and Memory Access (MA). The total execution time for instructions 1 through 3 in figure 8.5 is seven cycles (5 + 1 + 1).

Figure 8.5 Method for Counting Instruction Execution Cycles (instruction 4 is the hypothetical MOV Rm, Rn)

8.4 Contention between Instruction Fetch (IF) and Memory Access (MA)

Basic Operation when IF and MA Are in Contention: The IF and MA stages both access memory, so they cannot operate simultaneously. When the IF and MA stages both try to access memory within the same slot, the slot splits as shown in figure 8.6. When there is a WB, it is executed immediately after the MA ends.

Slot            A    B    C    D    E    F    G
Instruction 1   IF   ID   EX   MA   WB
Instruction 2        IF   ID   EX   MA   WB
Instruction 3             IF   ID   EX
Instruction 4                  IF   ID   EX
Instruction 5                       IF   ID   EX

MA of instruction 1 and IF of instruction 4 contend at D.
MA of instruction 2 and IF of instruction 5 contend at E.

When MA and IF are in contention, the following occurs:

Slot            A    B    C    D          E          F    G
Instruction 1   IF   ID   EX   MA  WB
Instruction 2        IF   ID   —   EX     MA  WB
Instruction 3             IF   —   ID     —   EX
Instruction 4                  —   IF     —   ID     EX
Instruction 5                             —   IF     ID   EX
                               Split at D  Split at E

Figure 8.6 Operation when IF and MA Are in Contention

The slots in which MA and IF contend are split into two cycles. MA is given priority to execute in the first half (when there is a WB, it immediately follows the MA), and the EX, ID, and IF are executed simultaneously in the latter half.
For example, in figure 8.6 the MA of instruction 1 is executed in slot D while the EX of instruction 2, the ID of instruction 3 and the IF of instruction 4 are executed simultaneously thereafter. In slot E, the MA of instruction 2 is given priority and the EX of instruction 3, the ID of instruction 4 and the IF of instruction 5 are executed thereafter.

The number of cycles for a slot in which MA and IF are in contention is the sum of the number of memory access cycles for the MA and the number of memory access cycles for the IF.

Relationship between Locations of Instructions in Memory and IF Stages: The SH-2E accesses instructions in memory in the 32-bit mode. Since all of the SH-2E instructions have a fixed length of 16 bits, it is basically possible to access two instructions per IF stage. Whether the IF fetches one instruction or two depends on where in memory the instruction(s) are located (word/longword boundary).

If an instruction is located at a longword boundary, it is possible to fetch two instructions using a single IF operation. This means that the IF for the next instruction does not generate a separate bus cycle in order to fetch the instruction. In addition, the IF for the instruction after that fetches two instructions, and therefore the IF for the instruction which follows again generates no bus cycle.

In other words, IF stages for instructions located in memory at longword boundaries (instructions for which the bottom two address bits are 00: A1 = 0, A0 = 0) actually fetch two instructions. Therefore no bus cycle is generated by the IF for the following instruction. These instruction fetches that do not generate bus cycles are indicated in lower case as "if" rather than IF. An "if" is always one cycle.

On the other hand, if due to branching or the like an instruction at a word boundary (instructions for which the bottom two address bits are 10: A1 = 1, A0 = 0) is fetched, only one instruction can be fetched in the IF bus cycle.
Consequently, the IF for the next instruction generates a bus cycle. Then two instructions are fetched from the subsequent IF onward. Figure 8.7 illustrates the operations described above.

IF: Bus cycle generated    if: No bus cycle generated
(Instructions are held two per 32-bit longword in on-chip ROM/RAM or on-chip cache.)

Instruction 1   IF   ID   EX
Instruction 2        if   ID   EX
Instruction 3             IF   ID   EX
Instruction 4                  if   ID   EX
Instruction 5                       IF   ID   EX
Instruction 6                            if   ID   EX

(a) Fetches Beginning with an Instruction (Instruction 1) Located at a Long Word Boundary

Instruction 2   IF   ID   EX
Instruction 3        IF   ID   EX
Instruction 4             if   ID   EX
Instruction 5                  IF   ID   EX
Instruction 6                       if   ID   EX

(b) Fetches Beginning with an Instruction (Instruction 2) Located at a Word Boundary

Figure 8.7 Relationship between Locations of Instructions in Memory and IF Stages

Relationship between Position of Instructions Located in On-Chip Memory and Contention between IF and MA: When an instruction is located in on-chip memory, there are instruction fetch stages ("if", written in lower case) that do not generate bus cycles. When an if is in contention with an MA, the slot will not split, as it does when an IF and an MA are in contention, because ifs and MAs can be executed simultaneously. Such slots execute in the number of cycles the MA requires for memory access. This is illustrated in figure 8.8.

When programming, avoid contention of MA and IF whenever possible and pair MAs with ifs to increase the instruction execution speed.
In other words, if an instruction with a four (five) stage pipeline consisting of IF, ID, EX, MA, (WB) is located at a memory longword boundary (the instruction's bottom two address bits are 00: A1 = 0, A0 = 0), the MA stage uses the same slot as the if following it, so no stall occurs.

Note: In slot A there is contention between MA and if, so there is no split. In slot B there is contention between MA and IF, resulting in a split.

Figure 8.8 Relationship between Position of Instructions Located in On-chip Memory and Contention between IF and MA

8.5 Effects of Memory Load Instructions on the Pipeline

Instructions that involve loading from memory return data to the destination register during the WB stage, which comes at the end of the pipeline. The WB stage of such a load instruction (load instruction 1) will thus not have ended before the EX stage of the instruction that immediately follows it (instruction 2) begins.

When instruction 2 uses the destination register of load instruction 1, the contents of that register will not be ready, so any slot containing the MA of instruction 1 and the EX of instruction 2 will split. The slot still splits when the destination register of load instruction 1 is the destination, rather than the source, of instruction 2. When the destination of load instruction 1 is the status register (SR) and the flag in it is fetched by instruction 2 (as ADDC does), a split occurs.
No split occurs, however, in the following cases:

• When instruction 2 is a load instruction and its destination is the same as that of load instruction 1
• When instruction 2 is MAC @Rm+,@Rn+ and the destinations of Rm and load instruction 1 were the same

The number of cycles in the slot generated by the split is the number of MA cycles plus the number of IF (or if) cycles, as shown in figure 8.9. This means the execution speed will be lowered if the instruction that will use the results of the load instruction is placed immediately after the load instruction. The instruction that uses the result of the load instruction will not slow down the program if placed one or more instructions after the load instruction.

Load instruction 1 (MOV.W @R0,R1)   IF   ID   EX   MA   WB
Instruction 2 (ADD R1,R2)                IF   ID   —    EX
Instruction 3                                 IF   —    ID   EX  .....
Instruction 4                                           IF   ID  .....

Figure 8.9 Effects of Memory Load Instructions on the Pipeline (1)

8.6 FPU Contention

In addition to the LDS and STS instructions, which move data between the CPU and FPU, loading and storing floating point numbers also uses the MA stage of the pipeline. Consequently, such instructions create contention with the IF stage.

If the register (FR0 to FR15, FPUL) that receives the result of a floating point arithmetic calculation instruction, an FMOV instruction, or a floating point number load instruction is read (used as the source register) by the next instruction, the execution of the next instruction is delayed by one slot cycle (figure 8.10).

Floating point arithmetic calculation instruction (FADD FR1, FR2)   IF   ID   E1   E2   SF
Next floating point instruction (FMOV FR2, FR3)                          IF   DF   —    E1   E2   SF

Figure 8.10 FPU Contention 1

If the LDS or LDS.L instruction is used to change the value of FPSCR, the execution of the next instruction is delayed by two slot cycles (figure 8.11).
Instruction 1 (LDS R2, FPSCR)    IF   ID   E1   E2   SF
Instruction 2 (FADD FR4, FR5)         IF   DF   —    —    E1   E2   SF

Figure 8.11 FPU Contention 2

If the STS or STS.L instruction is used to read the value of FPSCR, its execution is delayed by two slot cycles (figure 8.12).

Instruction 1 (FADD FR6, FR9)    IF   ID   E1   E2   SF
Instruction 2 (STS FPSCR, R3)         IF   DF   —    —    E1   E2   SF

Figure 8.12 FPU Contention 3

The FDIV instruction requires 13 cycles in the E1 stage. During this period, no other floating point instruction or FPU-related CPU instruction may enter the E1 stage. If another floating point instruction or FPU-related CPU instruction is encountered before the FDIV instruction has finished using the E1 stage, the execution of that instruction is delayed, and the instruction enters the E1 stage only after the FDIV instruction has finished using the SF stage (figure 8.13).

Instruction 1 (FDIV FR6, FR7)                 IF   ID   E1   ...  E1   E2   SF
Floating point instruction (FMOV FR8, FR10)        IF   DF   ...  ...  ...  ...  E1   E2   SF

Figure 8.13 FPU Contention 4

8.7 Programming Guide

When writing programs, follow the guidelines below in order to increase instruction execution speed.

• Instructions with memory accesses (MA) should be located in memory at longword boundaries (positions where the instruction's bottom two address bits are 00: A1 = 0, A0 = 0). This will prevent contention between MA and instruction fetch (IF).
• The instruction immediately following a memory load instruction should not use the same register as the destination register of the load instruction.
• Instructions that use the FPU should be arranged so that they are not sequential. Also, instructions that access registers MACH and MACL in order to fetch the results of operations performed by the FPU should not be situated immediately following instructions that use the FPU.
• The instruction immediately following a floating-point arithmetic operation instruction should not use the destination register of the floating-point operation instruction.
• As far as possible, avoid placing a floating-point instruction or FPU-related CPU instruction within the 14 instructions following the FDIV instruction.

8.8 Operation of Instruction Pipelines

This section describes the operation of the instruction pipelines. By combining these with the rules described so far, the way pipelines flow in a program and the number of instruction execution cycles can be calculated.

In the following figures, "Instruction A" refers to the instruction being discussed. When "IF" is written in the instruction fetch stage, it may refer to either "IF" or "if". When there is contention between IF and MA, the slot will split, but the manner of the split is not discussed in the tables, with a few exceptions. When a slot has split, see section 8.4, Contention between Instruction Fetch (IF) and Memory Access (MA), and work out the pipeline flow from the rules given there.
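The slot rules of sections 8.2 and 8.4 can be combined into a small calculation. The following C sketch is one way to express them, under stated assumptions: execution is sequential except where a branch target is flagged, ID, EX and WB always take one cycle, and FPU and load-use contention are not modelled. The names fetch_kind, fetch_kind_at and slot_cycles are hypothetical and are not taken from this manual.

```c
/* Kind of instruction fetch present in a slot (hypothetical names). */
typedef enum {
    FETCH_NONE,   /* no fetch stage in this slot           */
    FETCH_IF,     /* fetch that generates a bus cycle      */
    FETCH_if      /* fetch that generates no bus cycle     */
} fetch_kind;

/* An instruction at a longword boundary (A1 = 0, A0 = 0) is fetched with a
   bus cycle. The instruction after it, at a word boundary, needs no bus
   cycle unless it is reached by a branch. */
fetch_kind fetch_kind_at(unsigned long address, int is_branch_target)
{
    if ((address & 0x3ul) == 0)
        return FETCH_IF;
    return is_branch_target ? FETCH_IF : FETCH_if;
}

/* Cycles needed by one slot.
   if_cycles: memory access cycles of an IF with a bus cycle.
   ma_cycles: memory access cycles of the MA in the slot, 0 if there is none. */
int slot_cycles(fetch_kind fetch, int if_cycles, int ma_cycles)
{
    int fetch_cycles = 0;
    int longest = 1;                      /* ID, EX and WB take one cycle */

    if (fetch == FETCH_IF)
        fetch_cycles = if_cycles;
    else if (fetch == FETCH_if)
        fetch_cycles = 1;                 /* an "if" is always one cycle */

    if (ma_cycles > 0 && fetch == FETCH_IF)
        return ma_cycles + if_cycles;     /* IF/MA contention: the slot splits */

    if (fetch_cycles > longest)
        longest = fetch_cycles;
    if (ma_cycles > longest)
        longest = ma_cycles;
    return longest;                       /* the slowest stage sets the slot length */
}
```

For instance, a slot that holds a one-cycle MA together with a one-cycle IF costs two cycles, while the same MA paired with an if costs only one, which is the reason for the first programming guideline in section 8.7.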
Table 8.1 shows the number of instruction stages and number of execution cycles as follows: • • • • • • Type: Given by function Category: Categorized by differences in instruction operation Stages: The number of stages in the instruction Cycles: The number of execution cycles when there is no contention Contention: Indicates the contention that occurs Instructions: Gives a mnemonic for the instruction concerned 212 Table 8.1 Number of Instruction Stages and Execution Cycles Type Category RegisterData register transfer instructions transfer instructions Stages Cycles Contention Instruction 3 MOV #imm,Rn MOV Rm,Rn MOVA @(disp,PC),R0 MOVT Rn 1 — SWAP.B Rm,Rn SWAP.W Rm,Rn Memory load 5 instructions 1 • • XTRCT Rm,Rn MOV.W @(disp,PC),Rn MOV.L @(disp,PC),Rn MOV.B Rm,@Rn MOV.W Rm,@Rn MOV.L Rm,@Rn MOV.B @Rm+,Rn MOV.W @Rm+,Rn MA contends with MOV.L IF MOV.B @Rm+,Rn Contention occurs when an instruction that uses the same destination register is placed immediately after this instruction @(disp,Rm),R0 MOV.W @(disp,Rm),R0 MOV.L @(disp,Rm),Rn MOV.B @(R0,Rm),Rn MOV.W @(R0,Rm),Rn MOV.L @(R0,Rm),Rn MOV.B @(disp,GBR),R0 MOV.W @(disp,GBR),R0 MOV.L @(disp,GBR),R0 213 Table 8.1 Number of Instruction Stages and Execution Cycles (cont) Type Category Stages Memory store 4 Data instructions transfer instructions (cont) 3 Arithmetic Arithmetic instructions instructions between registers (except multiplic-ation instruc-tions) Cycles Contention Instruction 1 MOV.B @Rm,Rn MOV.W @Rm,Rn MOV.L @Rm,Rn MOV.B Rm,@–Rn MOV.W Rm,@–Rn MOV.L Rm,@–Rn MOV.B R0,@(disp,Rn) MOV.W R0,@(disp,Rn) MOV.L Rm,@(disp,Rn) MOV.B Rm,@(R0,Rn) MOV.W Rm,@(R0,Rn) MOV.L Rm,@(R0,Rn) MOV.B R0,@(disp,GBR) MOV.W R0,@(disp,GBR) MOV.L R0,@(disp,GBR) ADD Rm,Rn ADD #imm,Rn ADDC Rm,Rn ADDV Rm,Rn 1 MA contends with IF — CMP/EQ #imm,R0 CMP/EQ Rm,Rn CMP/HS Rm,Rn CMP/GE Rm,Rn CMP/HI Rm,Rn CMP/GT Rm,Rn CMP/PZ Rn CMP/PL Rn CMP/STR Rm,Rn DIV1 Rm,Rn DIV0S Rm,Rn DIV0U 214 Table 8.1 Number of Instruction Stages and Execution Cycles (cont) 
Type Category Stages Cycles Contention Arithmetic instructions (cont) Multiply/ add 7 instructions Doublelength multiply/ accumulate instruction 9 Multiplication 6 instructions 3/(2)*1 3/(2 to 4)*1 • If an instruction that uses the FPU follows this instruction, FPU contention occurs. • MA contends with IF • If an instruction that uses the FPU follows this instruction, FPU contention occurs. • MA contends with IF 1 to 3*1 • • Doublelength multiply/ accumulate instruction 9 2 to 4*1 • • If an instruction that uses the FPU follows this instruction, FPU contention occurs. Instruction DT Rn EXTS.B Rm,Rn EXTS.W Rm,Rn EXTU.B Rm,Rn EXTU.W Rm,Rn NEG Rm,Rn NEGC Rm,Rn SUB Rm,Rn SUBC Rm,Rn SUBV Rm,Rn MAC.W @Rm+,@Rn+ MAC.L @Rm+,@Rn+ MULS.W Rm,Rn MULU.W Rm,Rn MA contends with IF If an instruction that uses the FPU follows this instruction, FPU contention occurs. DMULS.L Rm,Rn DMULU.L Rm,Rn MUL.L Rm,Rn MA contends with IF 215 Table 8.1 Number of Instruction Stages and Execution Cycles (cont) Type Category Stages Cycles Contention 3 RegisterLogic register logic operation instructions operation instructions Memory logic 6 operations instructions TAS instruction Shift Shift instructions instructions 1 3 — Instruction AND Rm,Rn AND #imm,R0 NOT Rm,Rn OR Rm,Rn OR #imm,R0 TST Rm,Rn TST #imm,R0 XOR Rm,Rn XOR #imm,R0 MA contends with IF AND.B #imm,@(R0,GBR) OR.B #imm,@(R0,GBR) TST.B #imm,@(R0,GBR) XOR.B #imm,@(R0,GBR) 6 4 MA contends with IF TAS.B @Rn 3 1 — Rn ROTL ROTR Rn ROTCL Rn ROTCR Rn SHAL Rn SHAR Rn SHLL Rn SHLR Rn SHLL2 Rn SHLR2 Rn SHLL8 Rn SHLR8 Rn SHLL16 Rn SHLR16 Rn 216 Table 8.1 Number of Instruction Stages and Execution Cycles (cont) Type Category Stages Cycles Contention Branch Conditional instructions branch instructions 3 Delayed conditional branch instructions 3 3/1*2 2/1*2 — — Instruction BF label BT label BF/S label BT/S label Unconditional 3 branch instructions 2 — BRA label BRAF Rm BSR label BSRF Rm JMP @Rm JSR @Rm RTS System System control control ALU instructions 
instructions 3 1 — CLRT LDC Rm,SR LDC Rm,GBR LDC Rm,VBR LDS Rm,PR NOP SETT LDS.L instructions (PR) 5 1 STC SR,Rn STC GBR,Rn STC VBR,Rn STS PR,Rn • Contention occurs LDS.L when an instruction that uses the same destination register is placed immediately after this instruction • MA contends with IF @Rm+,PR 217 Table 8.1 Number of Instruction Stages and Execution Cycles (cont) Type Category STS.L System instruction control instructions (PR) (cont) LDC.L instructions Stages Cycles Contention Instruction 4 1 MA contends with IF STS.L PR,@–Rn 5 3 • LDC.L @Rm+,SR LDC.L @Rm+,GBR LDC.L @Rm+,VBR STC.L SR,@–Rn STC.L GBR,@–Rn STC.L VBR,@–Rn • STC.L instructions 4 Register → 4 MAC transfer instruction 218 2 1 Memory → 4 MAC transfer instructions 1 MAC → register transfer instruction 1 5 Contention occurs when an instruction that uses the same destination register is placed immediately after this instruction MA contends with IF MA contends with IF • Contention occurs with multiplier LDS Rm,MACH • MA contends with IF LDS Rm,MACL • Contention occurs with multiplier • MA contends with IF • Contention occurs with multiplier • Contention occurs when an instruction that uses the same destination register is placed immediately after this instruction • MA contends with IF CLRMAC LDS.L @Rm+,MACH LDS.L @Rm+,MACL STS MACH,Rn STS MACL,Rn Table 8.1 Number of Instruction Stages and Execution Cycles (cont) Type Category Stages Cycles Contention System control instructions (cont) MAC → memory transfer instruction 4 1 RTE instruction 5 4 — RTE TRAP instruction 9 8 — TRAPA #imm SLEEP instruction 3 3 — SLEEP 5 (FPU pipeline) 4 (CPU pipeline) 1 • Contention occurs LDS if next instruction LDS.L reads FPUL • MA in CPU pipeline contends with IF FPU-related FPUL load CPU instruction instruction • • Instruction Contention occurs STS.L with multiplier STS.L MACH,@–Rn MACL,@–Rn MA contends with IF Rm,FPUL @Rm+,FPUL FPSCR load 5 (FPU instruction pipeline) 4 (CPU pipeline) 1 • Contention occurs LDS LDS.L as 
shown in Figure 8.11 Rm,FPSCR @Rm+,FPSCR FPUL store instruction (STS) 1 • Contention occurs STS if next instruction uses Rn FPUL,Rn • MA in CPU pipeline contends with IF • STS.L MA in CPU pipeline contends with IF FPUL store instruction (STS.L) 4 (FPU pipeline) 5 (CPU pipeline) 4 (FPU pipeline) 4 (CPU pipeline) 1 FPUL,@-Rn 219 Table 8.1 Number of Instruction Stages and Execution Cycles (cont) Type Category FPU-related FPSCR store instruction CPU instruction (STS) (cont) Stages Cycles Contention 4 (FPU pipeline) 5 (CPU pipeline) 1 FPSCR store 4 (FPU instruction pipeline) (STS.L) 4 (CPU pipeline) Floatingpoint instruction • Contention occurs STS as shown in Figure 8.12 • Contention occurs if next instruction uses Rn • MA in CPU pipeline contends with IF • Contention occurs STS.L as shown in Figure 8.12 • MA in CPU pipeline contends with IF FPSCR,Rn FPSCR,@-Rn Floating-point register transfer instruction 5 (FPU pipeline) 3 (CPU pipeline) 1 • Contention occurs FLDS if next instruction FMOV FSTS reads destination register FRm,FPUL FRm,FRn FPUL,FRn Floating-point register immediate instruction 5 (FPU pipeline) 3 (CPU pipeline) 1 • Contention occurs FLDI0 if next instruction FLDI1 reads destination register FRn FRn Floating-point 5 (FPU register load pipeline) instruction 4 (CPU pipeline) 1 • Contention occurs FMOV.S if next instruction FMOV.S FMOV.S reads destination register @Rm,FRn @Rm+,FRn @(R0,Rm),FRn • MA in CPU pipeline contends with IF • FMOV.S MA in CPU pipeline contends FMOV.S FMOV.S with IF Floating-point 4 (FPU register store pipeline) instruction 4 (CPU pipeline) 220 1 Instruction 1 FRm,@Rn FRm,@-Rn FRm,@(R0,Rn) Table 8.1 Number of Instruction Stages and Execution Cycles (cont) Type Category Stages Cycles Contention Floatingpoint instruction (cont) Floating-point register operation instruction (other than FDIV) 5 (FPU pipeline) 3 (CPU pipeline) 1 • Contention occurs if next instruction reads destination register Floating-point register operation instruction 
(FDIV): 17 stages (FPU pipeline), 3 stages (CPU pipeline); 13 cycles. Contention occurs as shown in Figure 8.13. Instruction: FDIV FRm,FRn.

Instructions in the preceding category (floating-point register operation instructions other than FDIV): FABS FRn; FADD FRm,FRn; FLOAT FPUL,FRn; FMAC FR0,FRm,FRn; FMUL FRm,FRn; FNEG FRn; FSUB FRm,FRn; FTRC FRm,FPUL.

Floating-point register compare instruction: 3 stages (FPU pipeline), 3 stages (CPU pipeline); 1 cycle. Instructions: FCMP/EQ FRm,FRn; FCMP/GT FRm,FRn.

Notes: 1. The normal minimum number of execution cycles. The number in parentheses is the number of cycles when there is contention with following instructions.
2. One state when there is no branch.

8.8.1 Data Transfer Instructions

Register-Register Transfer Instructions

Instruction Types:
• MOV #imm, Rn
• MOV Rm, Rn
• MOVA @(disp, PC), R0
• MOVT Rn
• SWAP.B Rm, Rn
• SWAP.W Rm, Rn
• XTRCT Rm, Rn

Pipeline (stages of each instruction in slot order):
Instruction A: IF ID EX
Next instruction: IF ID EX
Third instruction in series: IF ID EX

Figure 8.14 Register-Register Transfer Instruction Pipeline

Operation: The pipeline ends after three stages: IF, ID, and EX. Data is transferred in the EX stage via the ALU.

Memory Load Instructions

Instruction Types:
• MOV.W @(disp, PC), Rn
• MOV.L @(disp, PC), Rn
• MOV.B @Rm, Rn
• MOV.W @Rm, Rn
• MOV.L @Rm, Rn
• MOV.B @Rm+, Rn
• MOV.W @Rm+, Rn
• MOV.L @Rm+, Rn
• MOV.B @(disp, Rm), R0
• MOV.W @(disp, Rm), R0
• MOV.L @(disp, Rm), Rn
• MOV.B @(R0, Rm), Rn
• MOV.W @(R0, Rm), Rn
• MOV.L @(R0, Rm), Rn
• MOV.B @(disp, GBR), R0
• MOV.W @(disp, GBR), R0
• MOV.L @(disp, GBR), R0

Pipeline (stages of each instruction in slot order):
Instruction A: IF ID EX MA WB
Next instruction: IF ID EX .....
Third instruction in series: IF ID EX .....

Figure 8.15 Memory Load Instruction Pipeline

Operation: The pipeline has five stages: IF, ID, EX, MA, and WB (figure 8.15). If an instruction that uses the same destination register as this instruction is placed immediately after it, contention will occur.
(See section 8.5, Effects of Memory Load Instructions on the Pipeline.)

Memory Store Instructions

Instruction Types:
• MOV.B Rm, @Rn
• MOV.W Rm, @Rn
• MOV.L Rm, @Rn
• MOV.B Rm, @-Rn
• MOV.W Rm, @-Rn
• MOV.L Rm, @-Rn
• MOV.B R0, @(disp, Rn)
• MOV.W R0, @(disp, Rn)
• MOV.L Rm, @(disp, Rn)
• MOV.B Rm, @(R0, Rn)
• MOV.W Rm, @(R0, Rn)
• MOV.L Rm, @(R0, Rn)
• MOV.B R0, @(disp, GBR)
• MOV.W R0, @(disp, GBR)
• MOV.L R0, @(disp, GBR)

Pipeline (stages of each instruction in slot order):
Instruction A: IF ID EX MA
Next instruction: IF ID EX .....
Third instruction in series: IF ID EX .....

Figure 8.16 Memory Store Instructions Pipeline

Operation: The pipeline has four stages: IF, ID, EX, and MA (figure 8.16). Data is not returned to the register, so there is no WB stage.

8.8.2 Arithmetic Instructions

Arithmetic Instructions between Registers (Except Multiplication Instructions): Include the following instruction types:
• ADD Rm, Rn
• ADD #imm, Rn
• ADDC Rm, Rn
• ADDV Rm, Rn
• CMP/EQ #imm, R0
• CMP/EQ Rm, Rn
• CMP/HS Rm, Rn
• CMP/GE Rm, Rn
• CMP/HI Rm, Rn
• CMP/GT Rm, Rn
• CMP/PZ Rn
• CMP/PL Rn
• CMP/STR Rm, Rn
• DIV1 Rm, Rn
• DIV0S Rm, Rn
• DIV0U
• DT Rn
• EXTS.B Rm, Rn
• EXTS.W Rm, Rn
• EXTU.B Rm, Rn
• EXTU.W Rm, Rn
• NEG Rm, Rn
• NEGC Rm, Rn
• SUB Rm, Rn
• SUBC Rm, Rn
• SUBV Rm, Rn

Pipeline (stages of each instruction in slot order):
Instruction A: IF ID EX
Next instruction: IF ID EX .....
Third instruction in series: IF ID EX .....

Figure 8.17 Pipeline for Arithmetic Instructions between Registers Except Multiplication Instructions

The pipeline has three stages: IF, ID, and EX (figure 8.17). The data operation is completed in the EX stage via the ALU.

Multiply/Accumulate Instruction: Includes the following instruction type:
• MAC.W @Rm+, @Rn+

Pipeline (stages of each instruction in slot order):
Instruction A: IF ID EX MA MA mm mm
Next instruction: IF — ID EX MA WB
Third instruction in series: IF ID EX MA WB

Figure 8.18 Multiply/Accumulate Instruction Pipeline

The pipeline has seven stages: IF, ID, EX, MA, MA, mm, and mm.
The second MA reads the memory and accesses the multiplier. mm indicates that the multiplier is operating. mm operates for two cycles after the final MA ends, regardless of slot. The ID of the instruction after the MAC.W instruction is stalled for one slot. The two MAs of the MAC.W instruction, when they contend with IF, split the slots as described in section 8.4, Contention between Instruction Fetch (IF) and Memory Access (MA).

When an instruction that does not use the multiplier comes after the MAC.W instruction, the MAC.W instruction may be considered to be a five-stage pipeline instruction of IF, ID, EX, MA, and MA. In such cases, the ID of the next instruction simply stalls one slot and thereafter the pipeline operates normally.

When an instruction that uses the multiplier comes after the MAC.W instruction, however, contention occurs with the multiplier, so operation differs from normal. The following cases are possible:

(a) MAC.W instruction follows immediately after MAC.W instruction
(b) MAC.L instruction follows immediately after MAC.W instruction
(c) MULS.W instruction follows immediately after MAC.W instruction
(d) DMULS.L instruction follows immediately after MAC.W instruction
(e) STS (register) instruction follows immediately after MAC.W instruction
(f) STS.L (memory) instruction follows immediately after MAC.W instruction
(g) LDS (register) instruction follows immediately after MAC.W instruction
(h) LDS.L (memory) instruction follows immediately after MAC.W instruction

(a) MAC.W instruction follows immediately after MAC.W instruction

The second MA of the MAC.W instruction does not contend with the mm generated by the preceding multiply instruction.
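The back-to-back MAC.W case can be laid out slot by slot with a few lines of Python. This is only an illustrative sketch: the helper names `place` and `render` are hypothetical, `--` stands for a stalled slot, and the start slots assume that consecutive instructions are fetched one slot apart and that the ID of the instruction after MAC.W is stalled for one slot, as the text above states. IF/MA contention is ignored.

```python
# Illustrative sketch (not from the manual): put pipeline stages on a slot grid.
# Assumptions: instructions are fetched one slot apart; "--" marks a stalled slot.

def place(start_slot, stages):
    """Map each stage of one instruction to the slot it occupies."""
    return [(start_slot + i, stage) for i, stage in enumerate(stages)]

def render(rows):
    """Render (label, start_slot, stages) rows as a slot-aligned text chart."""
    width = max(len(label) for label, _, _ in rows)
    lines = []
    for label, start, stages in rows:
        cells = ["  "] * (start - 1) + [s.ljust(2) for s in stages]
        lines.append(label.ljust(width) + "  " + " ".join(cells).rstrip())
    return "\n".join(lines)

ROWS = [
    ("MAC.W (first)", 1, ["IF", "ID", "EX", "MA", "MA", "mm", "mm"]),
    # The ID of the instruction after MAC.W is stalled for one slot.
    ("MAC.W (second)", 2, ["IF", "--", "ID", "EX", "MA", "MA", "mm", "mm"]),
]

first = place(*ROWS[0][1:])
second = place(*ROWS[1][1:])
first_mm = [slot for slot, stage in first if stage == "mm"]
second_ma = [slot for slot, stage in second if stage == "MA"]

print(render(ROWS))
print("first MAC.W mm slots:", first_mm)
print("second MAC.W MA slots:", second_ma)
```

Under these assumptions the second MA of the second MAC.W falls in the same slot as the final mm of the first MAC.W, which is the situation the text describes as free of multiplier contention.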
Pipeline (stages of each instruction in slot order):
MAC.W: IF ID EX MA MA mm mm
MAC.W: IF — ID EX MA MA mm mm
Next instruction in series: IF — ID EX MA ······

Figure 8.19 MAC.W Instruction Follows Immediately after MAC.W Instruction (1)

If the MAC.W instruction occurs twice in succession, contention between MA and IF could cause a delay in instruction execution. Refer to the diagram below. This diagram takes into account the possibility of contention between MA and IF.

Pipeline (stages of each instruction in slot order):
MAC.W: if ID EX MA MA mm mm
MAC.W: IF — ID EX MA — MA mm mm
MAC.W: if — — ID EX MA MA mm mm
MAC.W: IF — ID EX MA MA mm ······

Figure 8.20 MAC.W Instruction Follows Immediately after MAC.W Instruction (2)

If contention occurs between the second MA of the MAC.W instruction and IF, the slot splits normally. Refer to the diagram below. This diagram takes into account the possibility of contention between MA and IF.

Pipeline (stages of each instruction in slot order):
MAC.W: IF ID EX MA — MA mm mm
MAC.W: if — — ID EX MA MA mm mm
Other instruction: IF — ID — EX MA ······
Other instruction: if — ID EX ······
Other instruction: IF ······

Figure 8.21 MAC.W Instruction Follows Immediately after MAC.W Instruction (3)

(b) MAC.L instruction follows immediately after MAC.W instruction

The second MA of the MAC.L instruction does not contend with the mm generated by the preceding multiply instruction.

Pipeline (stages of each instruction in slot order):
MAC.W: IF ID EX MA MA mm mm
MAC.L: IF — ID EX MA MA mm mm mm mm
Next instruction in series: IF — ID EX MA ······

Figure 8.22 MAC.L Instruction Follows Immediately after MAC.W Instruction

(c) MULS.W instruction follows immediately after MAC.W instruction

The MULS.W instruction has an MA stage for accessing the multiplier. If contention with the MA of MULS.W occurs during the MAC.W instruction's multiplier operation (mm), that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot.
If one or more instructions that do not use the multiplier are located between MAC.W and MULS.W, no contention occurs between MAC.W and MULS.W and there is no delay. Note that the slot splits if there is contention between the MA of MULS.W and IF.

Pipeline (stages of each instruction in slot order):
MAC.W: IF ID EX MA MA mm mm
MULS.W: IF — ID EX M--A mm mm
Other instruction: IF ID EX — MA ······

Pipeline (stages of each instruction in slot order):
MAC.W: IF ID EX MA MA mm mm
Branch destination: IF — ID EX
MULS.W: IF ID EX MA mm mm
Other instruction: IF ID EX MA ······

Figure 8.23 MULS.W Instruction Follows Immediately after MAC.W Instruction

(d) DMULS.L instruction follows immediately after MAC.W instruction

The DMULS.L instruction has an MA stage for accessing the multiplier, but there is no contention with the MA of DMULS.L during the MAC.W instruction's multiplier operation (mm). Note that the slot splits if there is contention between the MA of DMULS.L and IF.

Pipeline (stages of each instruction in slot order):
MAC.W: IF ID EX MA MA mm mm
DMULS.L: IF — ID EX MA MA mm mm mm mm
Other instruction: IF — ID EX MA ······

Figure 8.24 DMULS.L Instruction Follows Immediately after MAC.W Instruction

(e) STS (register) instruction follows immediately after MAC.W instruction

If the STS instruction is used to store the contents of the MAC register to a general-use register, the STS instruction will include an MA stage for accessing the multiplier, as described below. If contention with the MA of STS occurs during the multiplier operation (mm), that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot. Also, the MA of STS contends with IF. This situation is shown in the diagrams below. These diagrams take into account the possibility of contention between MA and IF.
: Slot MAC.W IF STS ID EX MA — MA mm mm if — — ID EX M A WB IF ID — — EX MA if — — ID EX IF ID Other instruction Other instruction Other instruction EX ······ ·········· : Slot MAC.W if STS Other instruction Other instruction Other instruction ID EX MA MA mm mm IF — ID — EX MA if — ID EX IF ID EX if ID WB EX ······ ·········· Figure 8.25 STS (Register) Instruction Follows Immediately after MAC.W Instruction 229 (f) STS.L (memory) instruction follows immediately after MAC.W instruction If the STS instruction is used to store the contents of the MAC register in memory, the STS instruction will include an MA stage for accessing the multiplier and writing to memory, as described below. These diagrams take into account the possibility of contention between MA and IF. : Slot MAC.W IF STS.L ID EX MA — MA mm mm if — — ID EX M A ID — — EX MA if — — ID EX IF ID Other instruction Other instruction Other instruction EX ······ ·········· : Slot MAC.W if STS.L Other instruction Other instruction Other instruction ID EX MA MA mm mm IF — ID — EX MA if — ID EX IF ID EX if ID EX ······ ·········· Figure 8.26 STS.L (Memory) Instruction Follows Immediately after MAC.W Instruction 230 (g) LDS (register) instruction follows immediately after MAC.W instruction If the LDS instruction is used to load the contents of the MAC register from a general-use register, the LDS instruction will include an MA stage for accessing the multiplier, as described below. If contention with the MA of LDS occurs during the multiplier operation (mm), that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot. Also, the MA of LDS contends with IF. This situation is shown in the diagrams below. These diagrams take into account the possibility of contention between MA and IF. 
: Slot MAC.W IF LDS ID EX MA — MA mm if — — ID EX M A IF ID — — EX MA if — — ID EX IF ID Other instruction Other instruction mm Other instruction EX ······ ·········· : Slot MAC.W if LDS Other instruction Other instruction Other instruction ID EX MA MA mm mm IF — ID — EX MA if — ID EX IF ID EX if ID EX ······ ·········· Figure 8.27 LDS (Register) Instruction Follows Immediately after MAC.W Instruction 231 (h) LDS.L (memory) instruction follows immediately after MAC.W instruction If the LDS instruction is used to load the contents of the MAC register from memory, the LDS instruction will include an MA stage for accessing memory and accessing the multiplier, as described below. If contention with the MA of LDS occurs during the multiplier operation (mm), that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot. Also, the MA of LDS contends with IF. This situation is shown in the diagrams below. These diagrams take into account the possibility of contention between MA and IF. : Slot MAC.W IF LDS.L ID EX MA — MA if — — ID EX M A IF ID — — EX if — — ID EX IF ID Other instruction Other instruction mm mm Other instruction EX ······ ·········· : Slot MAC.W if LDS.L Other instruction Other instruction Other instruction ID EX MA MA mm mm IF — ID — EX MA if — ID EX IF ID EX if ID EX ······ ·········· Figure 8.28 LDS.L (Memory) Instruction Follows Immediately after MAC.W Instruction 232 Double-Length Multiply/Accumulate Instruction: Includes the following instruction type: • MAC.L @Rm+, @Rn+ : Slot Instruction A Next instruction Third instruction ...... IF ID EX MA MA mm mm mm mm IF — ID EX MA WB IF ID EX MA WB Figure 8.29 Double-Length Multiply/Accumulate Instruction Pipeline The pipeline has nine stages: IF, ID, EX, MA, MA, mm, mm, mm, and mm (figure 8.29). The second MA reads the memory and accesses the multiplier. The mm indicates that the multiplier is operating. 
The mm operates for four cycles after the final MA ends, regardless of slot. The ID of the instruction after the MAC.L instruction is stalled for one slot. The two MAs of the MAC.L instruction, when they contend with IF, split the slots as described in section 8.4, Contention between Instruction Fetch (IF) and Memory Access (MA).

When an instruction that does not use the multiplier follows the MAC.L instruction, the MAC.L instruction may be considered to be a five-stage pipeline instruction of IF, ID, EX, MA, and MA. In such cases, the ID of the next instruction simply stalls one slot and thereafter the pipeline operates normally.

When an instruction that uses the multiplier comes after the MAC.L instruction, contention occurs with the multiplier, so operation differs from normal. The following cases are possible:

(a) MAC.L instruction follows immediately after MAC.L instruction
(b) MAC.W instruction follows immediately after MAC.L instruction
(c) DMULS.L instruction follows immediately after MAC.L instruction
(d) MULS.W instruction follows immediately after MAC.L instruction
(e) STS (register) instruction follows immediately after MAC.L instruction
(f) STS.L (memory) instruction follows immediately after MAC.L instruction
(g) LDS (register) instruction follows immediately after MAC.L instruction
(h) LDS.L (memory) instruction follows immediately after MAC.L instruction

(a) MAC.L instruction follows immediately after MAC.L instruction

If the second MA of the MAC.L instruction contends with the mm generated by the preceding multiply instruction, that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot. If two or more instructions that do not use the multiplier are located between the first MAC.L instruction and the second MAC.L instruction, no contention occurs between the two MAC.L instructions and there is no delay.
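The spacing rules quoted in this section (for example, two or more instructions between two MAC.L instructions) are consistent with one simple reading, sketched below in Python. The rule itself is an inference from the stated cases, not a formula given in this manual: it assumes that the following instruction's multiplier-access MA may coincide with the final mm slot of the preceding instruction but may not come earlier. The names `MULT_OPS` and `min_gap` are hypothetical. The sketch also assumes that each intervening instruction is a single-slot instruction that does not use the multiplier, ignores IF/MA contention, and does not model the STS and LDS cases.

```python
# Sketch of an inferred spacing rule; not a formula from the manual.
# name: (number of MA stages, number of mm cycles), as given in this section.
MULT_OPS = {
    "MAC.W": (2, 2),
    "MAC.L": (2, 4),
    "MULS.W": (1, 2),
    "DMULS.L": (2, 4),
}

def min_gap(producer, consumer):
    """Intervening non-multiplier instructions needed to avoid multiplier contention."""
    p_ma, p_mm = MULT_OPS[producer]
    c_ma, _ = MULT_OPS[consumer]
    # Producer fetched in slot 1: IF=1, ID=2, EX=3, MA from slot 4, then mm.
    last_mm = 3 + p_ma + p_mm
    # An instruction with two MA stages stalls the next ID for one slot.
    stall = 1 if p_ma == 2 else 0
    # Consumer fetched in slot 2: its last MA is the one that accesses the multiplier.
    access = 4 + stall + c_ma
    # Assumed rule: the access may share the final mm slot but not precede it.
    return max(0, last_mm - access)

for pair in [("MAC.L", "MAC.L"), ("MAC.L", "MULS.W"), ("MAC.W", "MULS.W")]:
    print(pair, min_gap(*pair))
```

For the pairs whose spacing is stated in this chapter, the sketch gives the same counts: two for MAC.L followed by MAC.L, MAC.W, or DMULS.L, three for MAC.L followed by MULS.W, one for MAC.W or MULS.W followed by MULS.W, and zero where the text says no contention occurs.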
: Slot IF MAC.L MAC.L ID EX MA MA mm IF — ID EX MA M IF — ID Next instruction in series mm mm mm A mm mm EX — — MA ······ mm mm mm mm ·········· : Slot IF MAC.L Other instruction ID EX MA MA mm mm IF — ID EX MA WB IF ID EX MA WB IF ID EX MA Other instruction MAC.L MA mm mm mm mm ·········· Figure 8.30 MAC.L Instruction Follows Immediately after MAC.L Instruction (1) Even if the succession of MAC.L instructions causes delays in execution due to contention between MA and IF, multiplier contention may be reduced in some cases. Refer to the diagram below. This diagram takes into account the possibility of contention between MA and IF. : Slot MAC.L MAC.L MAC.L MAC.L if ID EX MA MA mm mm mm mm IF — ID EX MA — M A mm mm if — — ID EX — MA M IF — — ID EX mm mm A mm — — MA mm mm ·········· Figure 8.31 MAC.L Instruction Follows Immediately after MAC.L Instruction (2) 234 mm If the second MA of the MAC.L instruction is delayed until the mm finishes, and that MA contends with IF, the slot splits normally. Refer to the diagram below. This diagram takes into account the possibility of contention between MA and IF. : Slot MAC.L IF MAC.L ID EX MA — MA mm if — — ID EX MA M IF — ID if Other instruction Other instruction mm mm mm A mm — — — EX — — — ID mm mm mm IF Other instruction ·········· Figure 8.32 MAC.L Instruction Follows Immediately after MAC.L Instruction (3) (b) MAC.W instruction follows immediately after MAC.L instruction If the second MA of the MAC.L contends with the mm generated by the preceding multiply instruction, that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot. If there are two or more instructions that do not use the multiplier located between the MAC.L instruction and the MAC.W instruction, no multiplier contention occurs between the MAC.L instruction and the MAC.W instruction, and there is no delay. 
: Slot MAC.L IF MAC.W ID EX MA MA mm IF — ID EX MA M IF — ID EX Next instruction in series mm mm mm A mm mm — — MA ······ ·········· : Slot MAC.L Other instruction Other instruction MAC.W IF ID EX MA MA mm mm IF — ID EX MA WB mm IF ID EX MA WB IF ID EX MA mm MA mm mm ·········· Figure 8.33 MAC.W Instruction Follows Immediately after MAC.L Instruction 235 (c) DMULS.L instruction follows immediately after MAC.L instruction The DMULS.L instruction has an MA stage for accessing the multiplier. If contention with the second MA of DMULS.L occurs during the MAC.L instruction's multiplier operation (mm), that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot. If there are two or more instructions that do not use the multiplier located between the MAC.L instruction and the DMULS.L instruction, no contention occurs between MAC.L and DMULS.L, and there is no delay. Note that the slot splits if there is contention between the MA of DMULS.L and IF. : Slot MAC.L IF DMULS.L ID EX MA MA mm IF — ID EX MA M IF — ID Other instruction mm mm mm A mm mm — — EX MA ······ mm mm mm mm mm mm ·········· : Slot MAC.L IF Branch destination ID EX MA MA IF — ID EX IF ID EX MA M A mm mm IF — ID — EX MA ······ DMULS.L Other instruction mm mm ·········· : Slot MAC.L IF Other instruction Other instruction DMULS.L Other instruction ID EX MA MA mm mm IF — ID EX MA WB mm mm IF ID EX MA WB IF ID EX MA MA mm mm IF — ID EX MA ······ mm mm ·········· Figure 8.34 DMULS.L Instruction Follows Immediately after MAC.L Instruction 236 (d) MULS.W instruction follows immediately after MAC.L instruction The MULS.W instruction has an MA stage for accessing the multiplier. If contention with the MA of MULS.W occurs during the MAC.L instruction's multiplier operation (mm), that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot. 
If there are three or more instructions that do not use the multiplier located between MAC.L and MULS.W, no contention occurs between MAC.L and MULS.W and there is no delay. Note that the slot splits if there is contention between the MA of MULS.W and IF. : Slot MAC.L IF MULS.W ID EX MA MA IF — ID EX M IF ID EX Other instruction mm mm mm mm A mm mm — — — MA ······ ·········· : Slot MAC.L IF Branch destination ID EX MA MA IF — ID EX IF ID EX M IF ID MULS.W Other instruction mm mm mm mm A mm mm EX — — MA ······ mm mm ·········· : Slot MAC.L IF Other instruction ID EX MA MA mm mm IF — ID EX MA WB IF ID EX MA IF ID EX M A mm mm IF ID EX — MA ······ mm mm Other instruction MULS.W Other instruction WB ·········· : Slot MAC.L Other instruction Other instruction Other instruction MULS.W Other instruction IF ID EX MA MA mm mm IF — ID EX MA WB IF ID EX MA WB IF ID EX MA WB IF ID EX MA mm mm IF ID EX MA ······ ·········· Figure 8.35 MULS.W Instruction Follows Immediately after MAC.L Instruction 237 (e) STS (register) instruction follows immediately after MAC.L instruction If the STS instruction is used to store the contents of the MAC register to a general-use register, the STS instruction will include an MA stage for accessing the multiplier, as described below. If contention with the MA of STS occurs during the multiplier operation (mm), that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot. Also, the MA of STS contends with IF. This situation is shown in the diagrams below. These diagrams take into account the possibility of contention between MA and IF. 
: Slot MAC.L IF STS ID EX MA — MA if — — ID EX M IF ID if Other instruction Other instruction mm mm mm mm A WB — — — — EX MA — — — — ID EX IF ID Other instruction EX ······ ·········· : Slot MAC.L if STS Other instruction Other instruction Other instruction ID EX MA MA mm mm IF — ID — EX M if — ID EX IF ID if mm mm A WB — — EX — — ID EX ······ ·········· Figure 8.36 STS (Register) Instruction Follows Immediately after MAC.L Instruction 238 (f) STS.L (memory) instruction follows immediately after MAC.L instruction If the STS instruction is used to store the contents of the MAC register in memory, the STS instruction will include an MA stage for accessing the multiplier and writing to memory, as described below. Also, the MA of STS contends with IF. This situation is shown in the diagrams below. These diagrams take into account the possibility of contention between MA and IF. : Slot MAC.L IF STS.L ID EX MA — MA mm if — — ID EX M IF ID if Other instruction Other instruction mm mm mm — — — — EX MA — — — — ID EX IF ID A Other instruction EX ······ ·········· : Slot MAC.L if STS.L Other instruction Other instruction Other instruction ID EX MA MA mm mm IF — ID — EX M mm mm if — ID EX IF ID — — EX if — — ID A EX ······ ·········· Figure 8.37 STS.L (Memory) Instruction Follows Immediately after MAC.L Instruction 239 (g) LDS (register) instruction follows immediately after MAC.L instruction If the LDS instruction is used to load the contents of the MAC register from a general-use register, the LDS instruction will include an MA stage for accessing the multiplier, as described below. If contention with the MA of LDS occurs during the multiplier operation (mm), that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot. Also, the MA of LDS contends with IF. This situation is shown in the diagrams below. These diagrams take into account the possibility of contention between MA and IF. 
Pipeline (stages of each instruction in slot order):
MAC.L: IF ID EX MA — MA mm mm mm mm
LDS: if — — ID EX M--A
Other instruction: IF ID — — — — EX MA
Other instruction: if — — — — ID EX
Other instruction: IF ID EX ······

Pipeline (stages of each instruction in slot order):
MAC.L: if ID EX MA MA mm mm mm mm
LDS: IF — ID — EX M--A
Other instruction: if — ID EX
Other instruction: IF ID — — EX
Other instruction: if — — ID EX ······

Figure 8.38 LDS (Register) Instruction Follows Immediately after MAC.L Instruction

(h) LDS.L (memory) instruction follows immediately after MAC.L instruction

If the LDS instruction is used to load the contents of the MAC register from memory, the LDS instruction will include an MA stage for accessing memory and accessing the multiplier, as described below. If contention with the MA of LDS occurs during the multiplier operation (mm), that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot. Also, the MA of LDS contends with IF. This situation is shown in the diagrams below. These diagrams take into account the possibility of contention between MA and IF.

Pipeline (stages of each instruction in slot order):
MAC.L: IF ID EX MA — MA mm mm mm mm
LDS.L: if — — ID EX M--A
Other instruction: IF ID — — — — EX MA
Other instruction: if — — — — ID EX
Other instruction: IF ID EX ······

Pipeline (stages of each instruction in slot order):
MAC.L: if ID EX MA MA mm mm mm mm
LDS.L: IF — ID — EX M--A
Other instruction: if — ID EX
Other instruction: IF ID — — EX
Other instruction: if — — ID EX ······

Figure 8.39 LDS.L (Memory) Instruction Follows Immediately after MAC.L Instruction

Multiplication Instructions: Include the following instruction types:
• MULS.W Rm, Rn
• MULU.W Rm, Rn

Pipeline (stages of each instruction in slot order):
Instruction A: IF ID EX MA mm mm
Next instruction: IF ID EX MA WB
Third instruction: IF ID EX MA WB

Figure 8.40 Multiplication Instruction Pipeline

The pipeline has six stages: IF, ID, EX, MA, mm, and mm. The MA accesses the multiplier. mm indicates that the multiplier is operating. mm operates for two cycles after the MA ends, regardless of slot.
The MA of the MULS.W instruction, when it contends with IF, splits the slot as described in section 8.4, Contention between Instruction Fetch (IF) and Memory Access (MA).

When an instruction that does not use the multiplier comes after the MULS.W instruction, the MULS.W instruction may be considered to be a four-stage pipeline instruction of IF, ID, EX, and MA. In such cases, it operates like a normal pipeline.

When an instruction that uses the multiplier comes after the MULS.W instruction, however, contention occurs with the multiplier, so operation differs from normal. The following cases are possible:

(a) MAC.W instruction follows immediately after MULS.W instruction
(b) MAC.L instruction follows immediately after MULS.W instruction
(c) MULS.W instruction follows immediately after MULS.W instruction
(d) DMULS.L instruction follows immediately after MULS.W instruction
(e) STS (register) instruction follows immediately after MULS.W instruction
(f) STS.L (memory) instruction follows immediately after MULS.W instruction
(g) LDS (register) instruction follows immediately after MULS.W instruction
(h) LDS.L (memory) instruction follows immediately after MULS.W instruction

(a) MAC.W instruction follows immediately after MULS.W instruction

The second MA of the MAC.W instruction does not contend with the mm generated by the preceding multiply instruction.

Pipeline (stages of each instruction in slot order):
MULS.W: IF ID EX MA mm mm
MAC.W: IF ID EX MA MA mm mm
Next instruction in series: IF — ID EX MA ······

Figure 8.41 MAC.W Instruction Follows Immediately after MULS.W Instruction

(b) MAC.L instruction follows immediately after MULS.W instruction

The second MA of the MAC.L instruction does not contend with the mm generated by the preceding multiply instruction.
: Slot ID EX MA mm mm IF ID EX MA MA mm mm Next instruction in series IF — ID EX MA ······ MULS.W MAC.L IF mm mm ·········· Figure 8.42 MAC.L Instruction Follows Immediately after MULS.W Instruction 243 (c) MULS.W instruction follows immediately after MULS.W instruction The MULS.W instruction has an MA stage for accessing the multiplier. If contention with the MA of the other MULS.W occurs during the MULS.W instruction's multiplier operation (mm), that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot. If there is one or more instruction that does not use the multiplier located between MULS.W and MULS.W, no contention occurs between MULS.W and MULS.W and there is no delay. Note that the slot splits if there is contention between the MA of MULS.W and IF. : Slot IF MULS.W MULS.W ID EX MA IF ID EX M A mm mm IF ID EX — MA ······ ID EX MA mm mm IF ID EX IF ID EX MA mm mm IF ID EX MA ······ Other instruction mm mm ·········· : Slot IF MULS.W Other instruction MULS.W Other instruction ·········· Figure 8.43 MULS.W Instruction Follows Immediately after MULS.W Instruction (1) If the MA of the MULS.W instruction is delayed until the mm finishes, and that MA contends with IF, the slot splits normally. Refer to the diagram below. This diagram takes into account the possibility of contention between MA and IF. : Slot MULS.W IF MULS.W Other instruction Other instruction Other instruction ID EX MA mm if ID EX M mm A mm mm IF ID — — EX MA ······ if — — ID EX ······ IF ID ······ ·········· Figure 8.44 MULS.W Instruction Follows Immediately after MULS.W Instruction (2) 244 (d) DMULS.L instruction follows immediately after MULS.W instruction The second MA of the DMULS.L accesses the multiplier, but there is no contention with the mm generated by the MULS.W instruction. 
: Slot MULS.W IF DMULS.L ID EX MA mm mm IF ID EX MA MA mm mm IF — ID EX MA ······ Other instruction mm mm ·········· Figure 8.45 DMULS.L Instruction Follows Immediately after MULS.W Instruction (e) STS (register) instruction follows immediately after MULS.W instruction If the STS instruction is used to store the contents of the MAC register to a general-use register, the STS instruction will include an MA stage for accessing the multiplier, as described below. If contention with the MA of STS occurs during the multiplier operation (mm), that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot. Also, the MA of STS contends with IF. This situation is shown in the diagrams below. These diagrams take into account the possibility of contention between MA and IF. : Slot MULS.W IF STS ID EX MA mm if ID EX M A WB IF ID — — EX MA if — — ID EX IF ID Other instruction Other instruction mm Other instruction EX ······ ·········· : Slot MULS.W if STS Other instruction Other instruction Other instruction ID EX MA mm mm IF ID — EX MA if — ID EX IF ID EX if ID WB EX ······ ·········· Figure 8.46 STS (Register) Instruction Follows Immediately after MULS.W Instruction 245 (f) STS.L (memory) instruction follows immediately after MULS.W instruction If the STS instruction is used to store the contents of the MAC register in memory, the STS instruction will include an MA stage for accessing the multiplier and writing to memory, as described below. Also, the MA of STS contends with IF. This situation is shown in the diagrams below. These diagrams take into account the possibility of contention between MA and IF. 
: Slot MULS.W IF STS.L ID EX MA mm if ID EX M A IF ID — — EX MA if — — ID EX IF ID Other instruction Other instruction mm Other instruction EX ······ ·········· : Slot MULS.W if STS.L Other instruction Other instruction Other instruction ID EX MA mm mm IF ID — EX MA if — ID EX IF ID EX if ID EX ······ ·········· Figure 8.47 STS.L (Memory) Instruction Follows Immediately after MULS.W Instruction 246 (g) LDS (register) instruction follows immediately after MULS.W instruction If the LDS instruction is used to load the contents of the MAC register from a general-use register, the LDS instruction will include an MA stage for accessing the multiplier, as described below. If contention with the MA of LDS occurs during the multiplier operation (mm), that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot. Also, the MA of LDS contends with IF. This situation is shown in the diagrams below. These diagrams take into account the possibility of contention between MA and IF. : Slot MULS.W IF LDS ID EX MA if ID EX M A IF ID — — EX MA if — — ID EX IF ID Other instruction Other instruction mm mm Other instruction EX ······ ·········· : Slot MULS.W if LDS Other instruction Other instruction Other instruction ID EX MA mm mm IF ID — EX MA if — ID EX IF ID EX if ID EX ······ ·········· Figure 8.48 LDS (Register) Instruction Follows Immediately after MULS.W Instruction 247 (h) LDS.L (memory) instruction follows immediately after MULS.W instruction If the LDS instruction is used to load the contents of the MAC register from memory, the LDS instruction will include an MA stage for accessing memory and accessing the multiplier, as described below. If contention with the MA of LDS occurs during the multiplier operation (mm), that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot. Also, the MA of LDS contends with IF. This situation is shown in the diagrams below. 
These diagrams take into account the possibility of contention between MA and IF.

Figure 8.49 LDS.L (Memory) Instruction Follows Immediately after MULS.W Instruction

Double-Length Multiplication Instructions: Include the following instruction types:

• DMULS.L Rm, Rn
• DMULU.L Rm, Rn
• MUL.L Rm, Rn

Figure 8.50 Multiplication Instruction Pipeline

The pipeline has nine stages: IF, ID, EX, MA, MA, mm, mm, mm, and mm (figure 8.50). The second MA accesses the multiplier. The mm indicates that the multiplier is operating. The mm operates for four cycles after the MA ends, regardless of slot. The ID of the instruction following the DMULS.L instruction is stalled for 1 slot (see the description of the Multiply/Accumulate instruction). The two MA stages of the DMULS.L instruction, when they contend with IF, split the slot as described in section 8.4, Contention between Instruction Fetch (IF) and Memory Access (MA).

When an instruction that does not use the multiplier comes after the DMULS.L instruction, the DMULS.L instruction may be considered to be a five-stage pipeline instruction of IF, ID, EX, MA, and MA. In such cases, it operates like a normal pipeline. When an instruction that uses the multiplier comes after the DMULS.L instruction, however, contention occurs with the multiplier, so operation is different from normal.
The following cases are possible:

(a) MAC.L instruction follows immediately after DMULS.L instruction
(b) MAC.W instruction follows immediately after DMULS.L instruction
(c) DMULS.L instruction follows immediately after DMULS.L instruction
(d) MULS.W instruction follows immediately after DMULS.L instruction
(e) STS (register) instruction follows immediately after DMULS.L instruction
(f) STS.L (memory) instruction follows immediately after DMULS.L instruction
(g) LDS (register) instruction follows immediately after DMULS.L instruction
(h) LDS.L (memory) instruction follows immediately after DMULS.L instruction

(a) MAC.L instruction follows immediately after DMULS.L instruction

If the second MA of the MAC.L instruction contends with the mm generated by the preceding multiply instruction, the bus cycle of that MA is extended until the mm finishes (M -- A in the diagram below), thereby forming a single slot. If there are two or more instructions that do not use the multiplier located between the DMULS.L instruction and the MAC.L instruction, no contention occurs between DMULS.L and MAC.L, and there is no delay.

Figure 8.51 MAC.L Instruction Follows Immediately after DMULS.L Instruction

(b) MAC.W instruction follows immediately after DMULS.L instruction

If the second MA of the MAC.W instruction contends with the mm generated by the preceding multiply instruction, the bus cycle of that MA is extended until the mm finishes (M -- A in the diagram below), thereby forming a single slot.
If there are two or more instructions that do not use the multiplier located between the DMULS.L instruction and the MAC.W instruction, no contention occurs between DMULS.L and MAC.W, and there is no delay.

Figure 8.52 MAC.W Instruction Follows Immediately after DMULS.L Instruction

(c) DMULS.L instruction follows immediately after DMULS.L instruction

The DMULS.L instruction has an MA stage for accessing the multiplier. If contention with the MA of DMULS.L occurs during the other DMULS.L instruction's multiplier operation (mm), that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot. If there are two or more instructions that do not use the multiplier located between the two DMULS.L instructions, no contention occurs between them and there is no delay. Note that the slot splits if there is contention between the MA of DMULS.L and IF.

Figure 8.53 DMULS.L Instruction Follows Immediately after DMULS.L Instruction (1)

If the MA of the DMULS.L instruction is delayed until the mm finishes, and that MA contends with IF, the slot splits normally. Refer to the diagram below. This diagram takes into account the possibility of contention between MA and IF.
Figure 8.54 DMULS.L Instruction Follows Immediately after DMULS.L Instruction (2)

(d) MULS.W instruction follows immediately after DMULS.L instruction

The MULS.W instruction has an MA stage for accessing the multiplier. If contention with the MA of MULS.W occurs during the DMULS.L instruction's multiplier operation (mm), that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot. If there are three or more instructions that do not use the multiplier located between DMULS.L and MULS.W, no contention occurs between DMULS.L and MULS.W and there is no delay. Note that the slot splits if there is contention between the MA of MULS.W and IF.

Figure 8.55 MULS.W Instruction Follows Immediately after DMULS.L Instruction (1)

If the MA of the DMULS.L instruction is delayed until the mm finishes, and that MA contends with IF, the slot splits normally. Refer to the diagram below. This diagram takes into account the possibility of contention between MA and IF.
Figure 8.56 MULS.W Instruction Follows Immediately after DMULS.L Instruction (2)

(e) STS (register) instruction follows immediately after DMULS.L instruction

If the STS instruction is used to store the contents of the MAC register to a general-use register, the STS instruction will include an MA stage for accessing the multiplier, as described below. If contention with the MA of STS occurs during the multiplier operation (mm), that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot. Also, the MA of STS contends with IF. This situation is shown in the diagrams below. These diagrams take into account the possibility of contention between MA and IF.

Figure 8.57 STS (Register) Instruction Follows Immediately after DMULS.L Instruction

(f) STS.L (memory) instruction follows immediately after DMULS.L instruction

If the STS instruction is used to store the contents of the MAC register in memory, the STS instruction will include an MA stage for accessing the multiplier and writing to memory, as described below. Also, the MA of STS contends with IF. This situation is shown in the diagrams below. These diagrams take into account the possibility of contention between MA and IF.
Figure 8.58 STS.L (Memory) Instruction Follows Immediately after DMULS.L Instruction

(g) LDS (register) instruction follows immediately after DMULS.L instruction

If the LDS instruction is used to load the contents of the MAC register from a general-use register, the LDS instruction will include an MA stage for accessing the multiplier, as described below. If contention with the MA of LDS occurs during the multiplier operation (mm), that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot. Also, the MA of LDS contends with IF. This situation is shown in the diagrams below. These diagrams take into account the possibility of contention between MA and IF.

Figure 8.59 LDS (Register) Instruction Follows Immediately after DMULS.L Instruction

(h) LDS.L (memory) instruction follows immediately after DMULS.L instruction

If the LDS instruction is used to load the contents of the MAC register from memory, the LDS instruction will include an MA stage for accessing memory and accessing the multiplier, as described below. If contention with the MA of LDS occurs during the multiplier operation (mm), that MA is delayed until the mm finishes (M -- A in the diagram below), thereby forming a single slot.
Also, the MA of LDS contends with IF. This situation is shown in the diagrams below. These diagrams take into account the possibility of contention between MA and IF.

Figure 8.60 LDS.L (Memory) Instruction Follows Immediately after DMULS.L Instruction

8.8.3 Logic Operation Instructions

Register-Register Logic Operation Instructions: Include the following instruction types:

• AND Rm, Rn
• AND #imm, R0
• NOT Rm, Rn
• OR Rm, Rn
• OR #imm, R0
• TST Rm, Rn
• TST #imm, R0
• XOR Rm, Rn
• XOR #imm, R0

Figure 8.61 Register-Register Logic Operation Instruction Pipeline

The pipeline has three stages: IF, ID, and EX (figure 8.61). The data operation is completed in the EX stage via the ALU.

Memory Logic Operations Instructions: Include the following instruction types:

• AND.B #imm, @(R0, GBR)
• OR.B #imm, @(R0, GBR)
• TST.B #imm, @(R0, GBR)
• XOR.B #imm, @(R0, GBR)

Figure 8.62 Memory Logic Operation Instruction Pipeline

The pipeline has six stages: IF, ID, EX, MA, EX, and MA (figure 8.62). The ID of the next instruction stalls for 2 slots. The MAs of these instructions contend with IF.

TAS Instruction: Includes the following instruction type:

• TAS.B @Rn

Figure 8.63 TAS Instruction Pipeline

The pipeline has six stages: IF, ID, EX, MA, EX, and MA (figure 8.63).
The ID of the next instruction stalls for 3 slots. The MA of the TAS instruction contends with IF.

8.8.4 Shift Instructions

General Shift Instructions: Include the following instruction types:

• ROTL Rn
• ROTR Rn
• ROTCL Rn
• ROTCR Rn
• SHAL Rn
• SHAR Rn
• SHLL Rn
• SHLR Rn
• SHLL2 Rn
• SHLR2 Rn
• SHLL8 Rn
• SHLR8 Rn
• SHLL16 Rn
• SHLR16 Rn

Figure 8.64 General Shift Instruction Pipeline

The pipeline has three stages: IF, ID, and EX (figure 8.64). The data operation is completed in the EX stage via the ALU.

8.8.5 Branch Instructions

Conditional Branch Instructions: Include the following instruction types:

• BF label
• BT label

The pipeline has three stages: IF, ID, and EX. Condition verification is performed in the ID stage. Conditional branch instructions are not delayed branches.

1. When condition is satisfied

The branch destination address is calculated in the EX stage. The two instructions after the conditional branch instruction (instruction A) are fetched but discarded. The branch destination instruction begins its fetch from the slot following the slot which has the EX stage of instruction A (figure 8.65).

Figure 8.65 Branch Instruction when Condition Is Satisfied

2. When condition is not satisfied

If it is determined at the ID stage that the condition is not satisfied, the EX stage proceeds without doing anything. The next instruction also executes a fetch (figure 8.66).

Figure 8.66 Branch Instruction when Condition Is Not Satisfied

Note: The SH-2E always fetches data as longwords.
Consequently, in case 1 above ("When condition is satisfied"), a fetch performed by the following instruction will overlap two instructions if the address is at the 4n address boundary.

Delayed Conditional Branch Instructions: Include the following instruction types:

• BF/S label
• BT/S label

The pipeline has three stages: IF, ID, and EX. Condition verification is performed in the ID stage.

1. When condition is satisfied

The branch destination address is calculated in the EX stage. The instruction after the conditional branch instruction (instruction A) is fetched and executed, but the instruction after that is fetched and discarded. The branch destination instruction begins its fetch from the slot following the slot which has the EX stage of instruction A (figure 8.67).

Figure 8.67 Branch Instruction when Condition Is Satisfied

2. When condition is not satisfied

If it is determined at the ID stage that the condition is not satisfied, the EX stage proceeds without doing anything. The next instruction also executes a fetch (figure 8.68).

Figure 8.68 Branch Instruction when Condition Is Not Satisfied

Note: The SH-2E always fetches data as longwords. Consequently, in case 1 above ("When condition is satisfied"), a fetch performed by the following instruction will overlap two instructions if the address is at the 4n address boundary.

Unconditional Branch Instructions: Include the following instruction types:

• BRA label
• BRAF Rm
• BSR label
• BSRF Rm
• JMP @Rm
• JSR @Rm
• RTS
Figure 8.69 Unconditional Branch Instruction Pipeline

The pipeline has three stages: IF, ID, and EX (figure 8.69). Unconditional branch instructions are delayed branches. The branch destination address is calculated in the EX stage. The instruction following the unconditional branch instruction (instruction A), that is, the delay slot instruction, is not fetched and discarded as it is with conditional branch instructions, but is instead executed. Note that the ID stage of the delay slot instruction does stall for one cycle. The branch destination instruction starts its fetch from the slot after the slot that has the EX stage of instruction A.

8.8.6 System Control Instructions

System Control ALU Instructions: Include the following instruction types:

• CLRT
• LDC Rm,SR
• LDC Rm,GBR
• LDC Rm,VBR
• LDS Rm,PR
• NOP
• SETT
• STC SR,Rn
• STC GBR,Rn
• STC VBR,Rn
• STS PR,Rn

Figure 8.70 System Control ALU Instruction Pipeline

The pipeline has three stages: IF, ID, and EX (figure 8.70). The data operation is completed in the EX stage via the ALU.

LDC.L Instructions: Include the following instruction types:

• LDC.L @Rm+, SR
• LDC.L @Rm+, GBR
• LDC.L @Rm+, VBR

Figure 8.71 LDC.L Instruction Pipeline

The pipeline has five stages: IF, ID, EX, MA, and EX (figure 8.71). The ID of the following instruction is stalled two slots.

STC.L Instructions: Include the following instruction types:

• STC.L SR, @–Rn
• STC.L GBR, @–Rn
• STC.L VBR, @–Rn

Figure 8.72 STC.L Instruction Pipeline

The pipeline has four stages: IF, ID, EX, and MA (figure 8.72). The ID of the next instruction is stalled one slot.
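The stall counts given for the LDC.L and STC.L instructions can be turned into a rough slot count for straight-line code. The following is a minimal sketch in Python, not part of the original manual: the `Op` class and `schedule` function are hypothetical names, and the model assumes that the only delay is the stated stall of the next instruction's ID stage. Contention between IF and MA, multiplier contention, and register dependencies are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    stages: tuple       # pipeline stages, always starting with IF, ID
    id_stall: int = 0   # slots by which the NEXT instruction's ID is stalled


# Stage lists and stall counts as stated in the text; the Op class is hypothetical.
ALU = Op("NOP", ("IF", "ID", "EX"))
LDC_L = Op("LDC.L @Rm+,SR", ("IF", "ID", "EX", "MA", "EX"), id_stall=2)
STC_L = Op("STC.L SR,@-Rn", ("IF", "ID", "EX", "MA"), id_stall=1)


def schedule(ops):
    """Return (name, id_slot, last_slot) for each instruction, slots counted from 1."""
    rows = []
    id_slot = 2  # first instruction: IF in slot 1, ID in slot 2
    for op in ops:
        last_slot = id_slot + len(op.stages) - 2  # stages after ID follow one per slot
        rows.append((op.name, id_slot, last_slot))
        id_slot += 1 + op.id_stall  # the next ID follows after any stall
    return rows
```

Under these assumptions, `schedule([LDC_L, ALU])` places the ID of the NOP in slot 5 instead of slot 3, which corresponds to the two-slot stall described for figure 8.71, and `schedule([STC_L, ALU])` places it in slot 4.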
LDS.L Instruction (PR): Includes the following instruction type:

• LDS.L @Rm+, PR

Figure 8.73 LDS.L Instruction (PR) Pipeline

The pipeline has five stages: IF, ID, EX, MA, and WB (figure 8.73). It is the same as an ordinary load instruction.

STS.L Instruction (PR): Includes the following instruction type:

• STS.L PR, @–Rn

Figure 8.74 STS.L Instruction (PR) Pipeline

The pipeline has four stages: IF, ID, EX, and MA (figure 8.74). It is the same as an ordinary store instruction.

Register → MAC Transfer Instructions: Include the following instruction types:

• CLRMAC
• LDS Rm, MACH
• LDS Rm, MACL

Figure 8.75 Register → MAC Transfer Instruction Pipeline

The pipeline has four stages: IF, ID, EX, and MA (figure 8.75). MA is a stage for accessing the multiplier. MA contends with IF. This makes it the same as ordinary store instructions. Since the multiplier does contend with the MA, however, the items noted for the multiplication, Multiply/Accumulate, double-length multiplication, and double-length multiply/accumulate instructions apply.

Memory → MAC Transfer Instructions: Include the following instruction types:

• LDS.L @Rm+, MACH
• LDS.L @Rm+, MACL

Figure 8.76 Memory → MAC Transfer Instruction Pipeline

The pipeline has four stages: IF, ID, EX, and MA (figure 8.76). MA contends with IF. MA is a stage for memory access and multiplier access. This makes it the same as ordinary load instructions.
Since the multiplier does contend with the MA, however, the items noted for the multiplication, Multiply/Accumulate, double-length multiplication, and double-length multiply/accumulate instructions apply.

MAC → Register Transfer Instructions: Include the following instruction types:

• STS MACH, Rn
• STS MACL, Rn

Figure 8.77 MAC → Register Transfer Instruction Pipeline

The pipeline has five stages: IF, ID, EX, MA, and WB (figure 8.77). MA is a stage for accessing the multiplier. MA contends with IF. This makes it the same as ordinary load instructions. Since the multiplier does contend with the MA, however, the items noted for the multiplication, Multiply/Accumulate, double-length multiplication, and double-length multiply/accumulate instructions apply.

MAC → Memory Transfer Instructions: Include the following instruction types:

• STS.L MACH, @–Rn
• STS.L MACL, @–Rn

Figure 8.78 MAC → Memory Transfer Instruction Pipeline

The pipeline has four stages: IF, ID, EX, and MA (figure 8.78). MA is a stage for accessing the memory and multiplier. MA contends with IF. This makes it the same as ordinary store instructions. Since the multiplier does contend with the MA, however, the items noted for the multiplication, Multiply/Accumulate, double-length multiplication, and double-length multiply/accumulate instructions apply.

RTE Instruction:

• RTE

Figure 8.79 RTE Instruction Pipeline

The pipeline has five stages: IF, ID, EX, MA, and MA (figure 8.79). The MAs do not contend with IF. RTE is a delayed branch instruction. The ID of the delay slot instruction is stalled 3 slots.
The IF of the branch destination instruction starts from the slot following the MA of the RTE.

TRAP Instruction:

• TRAPA #imm

Figure 8.80 TRAP Instruction Pipeline

The pipeline has nine stages: IF, ID, EX, EX, MA, MA, MA, EX, and EX (figure 8.80). The MAs do not contend with IF. TRAP is not a delayed branch instruction. The two instructions after the TRAP instruction are fetched, but they are discarded without being executed. The IF of the branch destination instruction starts from the slot of the EX in the ninth stage of the TRAP instruction.

SLEEP Instruction:

• SLEEP

Figure 8.81 SLEEP Instruction Pipeline

The pipeline has three stages: IF, ID and EX (figure 8.81). It is issued until the IF of the next instruction. After the SLEEP instruction is executed, the CPU enters sleep mode or standby mode.

8.8.7 Exception Processing

Interrupt Exception Processing: The interrupt is received during the ID stage of the instruction and everything after the ID stage is replaced by the interrupt exception processing sequence. The pipeline has ten stages: IF, ID, EX, EX, MA, MA, EX, MA, EX, and EX (figure 8.82). Interrupt exception processing is not a delayed branch. In interrupt exception processing, an overrun fetch (IF) occurs. In branch destination instructions, the IF starts from the slot that has the final EX in the interrupt exception processing. Interrupt sources are external interrupt request pins such as NMI, user breaks, IRQ, and on-chip peripheral module interrupts.
Figure 8.82 Interrupt Exception Processing Pipeline

Address Error Exception Processing: The address error is received during the ID stage of the instruction and everything after the ID stage is replaced by the address error exception processing sequence. The pipeline has ten stages: IF, ID, EX, EX, MA, MA, EX, MA, EX, and EX (figure 8.83). Address error exception processing is not a delayed branch. In address error exception processing, an overrun fetch (IF) occurs. In branch destination instructions, the IF starts from the slot that has the final EX in the address error exception processing. Address errors are caused by instruction fetches and by data reads or writes. See the Hardware Manual for information on the causes of address errors.

Figure 8.83 Address Error Exception Processing Pipeline

Illegal Instruction Exception Processing: The illegal instruction is received during the ID stage of the instruction and everything after the ID stage is replaced by the illegal instruction exception processing sequence. The pipeline has nine stages: IF, ID, EX, EX, MA, MA, MA, EX, and EX (figure 8.84). Illegal instruction exception processing is not a delayed branch. In illegal instruction exception processing, overrun fetches (IF) occur. Whether there is an IF only in the next instruction or in the one after that as well depends on the instruction that was to be executed. In branch destination instructions, the IF starts from the slot that has the final EX in the illegal instruction exception processing.

Illegal instruction exception processing is caused by ordinary illegal instructions and by instructions with illegal slots. When undefined code placed somewhere other than the slot directly after the delayed branch instruction (called the delay slot) is decoded, ordinary illegal instruction exception processing occurs.
When undefined code placed in the delay slot is decoded, or when an instruction that rewrites the program counter is placed in the delay slot and decoded, illegal slot instruction exception processing occurs.

Figure 8.84 Illegal Instruction Exception Processing Pipeline

8.8.8 Relationship between Floating-point Instructions and FPU-related CPU Instructions

FPUL Load Instructions: Include the following instruction types:

• LDS Rm,FPUL
• LDS.L @Rm+,FPUL

Figure 8.85 FPUL Load Instruction Pipeline

The CPU pipeline has four stages, IF, ID, EX, and MA (figure 8.85); and the FPU pipeline has five stages, IF, DF, E1, E2, and SF. The CPU MA stage contends with IF. Contention will also result if an instruction that reads FPUL follows immediately after this instruction.

FPSCR Load Instructions: Include the following instruction types:

• LDS Rm,FPSCR
• LDS.L @Rm+,FPSCR

Figure 8.86 FPSCR Load Instruction Pipeline

The CPU pipeline has four stages, IF, ID, EX, and MA (figure 8.86); and the FPU pipeline has five stages, IF, DF, E1, E2, and SF. Contention occurs as shown in Figure 8.11, and execution of the next instruction is delayed by two slots.
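The two-slot delay after an FPSCR load is easier to see when the CPU and FPU stages are laid out slot by slot. The following Python sketch prints such a chart. It is an illustration only: the `render` helper is a hypothetical name, the row contents are reconstructed from the flattened figure 8.86, so the exact slot alignment should be treated as an assumption, and `--` marks a stalled slot.

```python
def render(rows, width=4):
    """rows: (label, first_slot, stages) tuples. Return slot-aligned text lines."""
    label_w = max(len(label) for label, _, _ in rows) + 2
    lines = []
    for label, first_slot, stages in rows:
        # Empty cells pad the row so that each stage lands in its own slot column.
        cells = [""] * (first_slot - 1) + list(stages)
        body = "".join(cell.ljust(width) for cell in cells).rstrip()
        lines.append(label.ljust(label_w) + body)
    return lines


# Assumed layout for an FPSCR load followed by two further instructions.
rows = [
    ("LDS Rm,FPSCR (CPU)", 1, ["IF", "ID", "EX", "MA"]),
    ("LDS Rm,FPSCR (FPU)", 1, ["IF", "DF", "E1", "E2", "SF"]),
    ("Next (CPU)", 2, ["IF", "ID", "--", "--", "EX"]),
    ("Next (FPU)", 2, ["IF", "DF", "--", "--", "E1"]),
    ("Third (CPU)", 3, ["IF", "--", "--", "ID", "EX"]),
    ("Third (FPU)", 3, ["IF", "--", "--", "DF", "E1"]),
]
chart = render(rows)

if __name__ == "__main__":
    print("\n".join(chart))
```

In this layout the EX stage of the next instruction falls in slot 6 rather than slot 4, which is the two-slot delay stated in the text.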
FPUL Store Instruction (STS): Includes the following instruction type:

• STS FPUL,Rn

Figure 8.87 FPUL Store Instruction (STS) Pipeline

The CPU pipeline has five stages, IF, ID, EX, MA, and WB (figure 8.87); and the FPU pipeline has four stages, IF, DF, E1, and E2. The CPU MA stage contends with IF. Contention will also result if an instruction that uses the destination of this instruction follows immediately after it.

FPUL Store Instruction (STS.L): Includes the following instruction type:

• STS.L FPUL,@-Rn

Figure 8.88 FPUL Store Instruction (STS.L) Pipeline

The CPU pipeline has four stages, IF, ID, EX, and MA (figure 8.88); and the FPU pipeline has four stages, IF, DF, E1, and E2. The CPU MA stage contends with IF.

FPSCR Store Instruction (STS): Includes the following instruction type:

• STS FPSCR,Rn

Figure 8.89 FPSCR Store Instruction (STS) Pipeline

The CPU pipeline has five stages, IF, ID, EX, MA, and WB (figure 8.89); and the FPU pipeline has four stages, IF, DF, E1, and E2.
Contention occurs as shown in Figure 8.12, and execution of the next instruction is delayed by two slots. The CPU MA stage contends with IF. Contention will also result if an instruction that uses the destination of this instruction follows immediately after it.

FPSCR Store Instruction (STS.L): Includes the following instruction type:

• STS.L FPSCR,@-Rn

Figure 8.90 FPSCR Store Instruction (STS.L) Pipeline

The CPU pipeline has four stages, IF, ID, EX, and MA (figure 8.90); and the FPU pipeline has four stages, IF, DF, E1, and E2. Contention occurs as shown in Figure 8.12, and execution of the next instruction is delayed by two slots. The CPU MA stage contends with IF.

Floating-point Register Transfer Instructions: Include the following instruction types:

• FLDS FRm,FPUL
• FMOV FRm,FRn
• FSTS FPUL,FRn

Figure 8.91 Floating-point Register Transfer Instruction Pipeline

The CPU pipeline has three stages, IF, ID, and EX (figure 8.91); and the FPU pipeline has five stages, IF, DF, E1, E2, and SF. Contention occurs if an instruction that reads from the destination of this instruction follows immediately after it.
Floating-point Register Immediate Instructions: Include the following instruction types:

• FLDI0 FRn
• FLDI1 FRn

Figure 8.92 Floating-point Register Immediate Instructions

The CPU pipeline has three stages, IF, ID, and EX (figure 8.92); and the FPU pipeline has five stages, IF, DF, E1, E2, and SF. Contention occurs if an instruction that reads from the destination of this instruction follows immediately after it.

Floating-point Register Load Instructions: Include the following instruction types:

• FMOV.S @Rm,FRn
• FMOV.S @Rm+,FRn
• FMOV.S @(R0,Rm),FRn

Figure 8.93 Floating-point Register Load Instruction Pipeline

The CPU pipeline has four stages, IF, ID, EX and MA (figure 8.93); and the FPU pipeline has five stages, IF, DF, E1, E2, and SF. The CPU MA stage contends with IF. Contention will also result if an instruction that reads from the destination of this instruction follows immediately after it.
Floating-point Register Store Instructions: Include the following instruction types:
• FMOV.S FRm,@Rn
• FMOV.S FRm,@-Rn
• FMOV.S FRm,@(R0,Rn)

Figure 8.94 Floating-point Register Store Instruction Pipeline (CPU pipeline and FPU pipeline stages per slot for the instruction, the next instruction, and the third instruction in series)

The CPU pipeline has four stages, IF, ID, EX, and MA (figure 8.94); and the FPU pipeline has four stages, IF, DF, E1, and E2. The CPU MA stage contends with IF.

Floating-point Operation Instructions (Excluding FDIV): Include the following instruction types:
• FABS FRn
• FADD FRm,FRn
• FLOAT FPUL,FRn
• FMAC FR0,FRm,FRn
• FMUL FRm,FRn
• FNEG FRn
• FSUB FRm,FRn
• FTRC FRm,FPUL

Figure 8.95 Floating-point Operation Instructions (Excluding FDIV) Pipeline (CPU pipeline and FPU pipeline stages per slot for the instruction, the next instruction, and the third instruction in series)

The CPU pipeline has three stages, IF, ID, and EX (figure 8.95); and the FPU pipeline has five stages, IF, DF, E1, E2, and SF. Contention occurs if an instruction that reads from the destination of this instruction follows immediately after it.
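The destination-read contention rule stated for the floating-point instructions above can be checked mechanically over an instruction sequence. The following is a minimal sketch in Python, not part of the manual: the tuple representation (mnemonic, source registers, destination register) and the function name are hypothetical, and the check is deliberately simplified to "the immediately following instruction reads the previous destination".

```python
from typing import List, Optional, Tuple

# Hypothetical representation, not from the manual:
# (mnemonic, source registers, destination register or None)
Insn = Tuple[str, Tuple[str, ...], Optional[str]]


def find_destination_contention(program: List[Insn]) -> List[int]:
    """Return indices of instructions that read the destination of the
    instruction immediately before them."""
    hits = []
    for i in range(1, len(program)):
        _, _, prev_dest = program[i - 1]
        _, sources, _ = program[i]
        if prev_dest is not None and prev_dest in sources:
            hits.append(i)
    return hits


example = [
    ("FADD", ("FR1", "FR2"), "FR2"),  # FADD FR1,FR2 writes FR2
    ("FMUL", ("FR2", "FR3"), "FR3"),  # reads FR2 in the next slot: contention
    ("FNEG", ("FR4",), "FR4"),        # does not read FR3: no contention
]
print(find_destination_contention(example))  # [1]
```

Moving an independent instruction between the FADD and the FMUL in the example would clear the reported index, which is the reordering the contention rule suggests.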
Floating-point Operation Instruction (FDIV): Include the following instruction type:
• FDIV FRm,FRn

Figure 8.96 Floating-point Operation Instruction (FDIV) Pipeline (CPU pipeline and FPU pipeline stages per slot for the instruction, the next instruction, and the third instruction in series, shown for two cases)
Case 1: If next instruction is a floating-point instruction or an FPU-related CPU instruction
Case 2: If next instruction is a CPU instruction and the following instruction is a floating-point instruction or an FPU-related CPU instruction

The CPU pipeline has three stages, IF, ID, and EX (figure 8.96); and the FPU pipeline has 17 stages: IF, DF, 13 E1 stages repeated in succession, E2, and SF. Contention occurs as shown in Figure 8.13. If the FDIV pipeline overlaps with the pipeline of a floating-point instruction or an FPU-related CPU instruction, all stages from E1 onward are stalled until execution of FDIV completes, and the following instructions are also stalled. Consequently, performance can be improved by not placing any floating-point instructions or FPU-related CPU instructions within the 14 instructions immediately following the FDIV instruction; CPU instructions placed there execute normally.
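The scheduling advice for FDIV can be turned into a simple lint pass over assembly text. The sketch below is an illustration only. The window of 14 instructions is taken from the text above; the classification in `uses_fpu` is an assumption (mnemonics beginning with "F" are treated as floating-point instructions, and LDS/STS forms naming FPUL or FPSCR as the FPU-related CPU instructions), and both function names are hypothetical.

```python
from typing import List

FDIV_WINDOW = 14  # number of following instructions named in the text above


def uses_fpu(insn: str) -> bool:
    """Assumed classification: F* mnemonics, or LDS/STS naming FPUL/FPSCR."""
    text = insn.upper()
    mnemonic = text.split()[0]
    if mnemonic.startswith("F"):
        return True
    return mnemonic.startswith(("LDS", "STS")) and ("FPUL" in text or "FPSCR" in text)


def fdiv_stall_candidates(program: List[str]) -> List[int]:
    """Indices of FPU-using instructions inside the window after any FDIV."""
    flagged = set()
    for i, insn in enumerate(program):
        if insn.upper().split()[0] == "FDIV":
            for j in range(i + 1, min(i + 1 + FDIV_WINDOW, len(program))):
                if uses_fpu(program[j]):
                    flagged.add(j)
    return sorted(flagged)


program = ["FDIV FR1,FR2", "ADD R1,R2", "FADD FR3,FR4", "MOV R2,R3"]
print(fdiv_stall_candidates(program))  # [2]
```

A flagged index is only a candidate for reordering; whether a stall actually costs cycles depends on the surrounding code.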
Floating-point Compare Instructions: Include the following instruction types:
• FCMP/EQ FRm,FRn
• FCMP/GT FRm,FRn

Figure 8.97 Floating-point Compare Instruction Pipeline (CPU pipeline and FPU pipeline stages per slot for the instruction, the next instruction, and the third instruction in series)

The CPU pipeline has three stages, IF, ID, and EX (figure 8.97); and the FPU pipeline has three stages, IF, DF, and E1.

Appendix A Instruction Code

A.1 Instruction Set by Addressing Mode

Table A.1 Instruction Set by Addressing Mode

Addressing Mode   Category   Sample Instruction   Types
No operand   —   NOP   8
Direct register addressing   Destination operand only   MOVT Rn   22
Direct register addressing   Source and destination operand   ADD Rm,Rn   42
Direct register addressing   Load and store with control register or system register   LDC Rm,SR / STS MACH,Rn   18
Indirect register addressing   Source operand only   JMP @Rm   2
Indirect register addressing   Destination operand only   TAS.B @Rn   1
Indirect register addressing   Data transfer direct from register   MOV.L Rm,@Rn   8
Post-increment indirect register addressing   Multiply/accumulate operation   MAC.W @Rm+,@Rn+   2
Post-increment indirect register addressing   Data transfer direct from register   MOV.L @Rm+,Rn   4
Post-increment indirect register addressing   Load to control register or system register   LDC.L @Rm+,SR   8
Pre-decrement indirect register addressing   Data transfer direct from register   MOV.L Rm,@-Rn   4
Pre-decrement indirect register addressing   Store from control register or system register   STC.L SR,@-Rn   8
Indirect register addressing with displacement   Data transfer direct to register   MOV.L Rm,@(disp,Rn)   6
Indirect indexed register addressing   Data transfer direct to register   MOV.L Rm,@(R0,Rn)   8
Indirect GBR addressing with displacement   Data transfer direct to register   MOV.L R0,@(disp,GBR)   6
Indirect indexed GBR addressing   Immediate data transfer   AND.B #imm,@(R0,GBR)   4
PC relative addressing with displacement   Data transfer direct to register   MOV.L @(disp,PC),Rn   3
PC relative addressing with Rn   Branch instruction   BRAF Rn   2
PC relative addressing   Branch instruction   BRA label   6
Immediate addressing   Load to register   FLDI0 FRn   2
Immediate addressing   Arithmetic logical operations direct with register   ADD #imm,Rn   7
Immediate addressing   Specify exception processing vector   TRAPA #imm   1
Total: 172

Note: Figures not in parentheses ( ) indicate the number of instructions for the SH-3E and figures in parentheses ( ) indicate the number of instructions for the SH-3.

A.1.1 No Operand

Table A.2 No Operand

Instruction   Operation   Code   Cycles   T Bit
CLRT   0 → T   0000000000001000   1   0
CLRMAC   0 → MACH, MACL   0000000000101000   1   —
DIV0U   0 → M/Q/T   0000000000011001   1   0
NOP   No operation   0000000000001001   1   —
RTE   Delayed branching, Stack area → PC/SR   0000000000101011   4   —
RTS   Delayed branching, PR → PC   0000000000001011   2   —
SETT   1 → T   0000000000011000   1   1
SLEEP   Sleep   0000000000011011   3   —

A.1.2 Direct Register Addressing

Table A.3 Destination Operand Only

Instruction   Operation   Code   Cycles   T Bit
CMP/PL Rn   Rn > 0, 1 → T   0100nnnn00010101   1   Comparison result
CMP/PZ Rn   Rn ≥ 0, 1 → T   0100nnnn00010001   1   Comparison result
DT Rn   Rn – 1 → Rn, when Rn is 0, 1 → T. When Rn is nonzero, 0 → T   0100nnnn00010000   1   Comparison result
FABS FRn   abs(FRn) → FRn   1111nnnn01011101   1   —
FLOAT FPUL,FRn   (float)FPUL → FRn   1111nnnn00101101   1   —
FNEG FRn   –1.0 × FRn → FRn   1111nnnn01001101   1   —
FTRC FRm,FPUL   (int)FRm → FPUL   1111mmmm00111101   1   —
MOVT Rn   T → Rn   0000nnnn00101001   1   —
ROTL Rn   T ← Rn ← MSB   0100nnnn00000100   1   MSB
ROTR Rn   LSB → Rn → T   0100nnnn00000101   1   LSB
ROTCL Rn   T ← Rn ← T   0100nnnn00100100   1   MSB
ROTCR Rn   T → Rn → T   0100nnnn00100101   1   LSB
SHAL Rn   T ← Rn ← 0   0100nnnn00100000   1   MSB
SHAR Rn   MSB → Rn → T   0100nnnn00100001   1   LSB
SHLL Rn   T ← Rn ← 0   0100nnnn00000000   1   MSB
SHLR Rn   0 → Rn → T   0100nnnn00000001   1   LSB
SHLL2 Rn   Rn << 2 → Rn   0100nnnn00001000   1   —
SHLR2 Rn   Rn >> 2 → Rn   0100nnnn00001001   1   —
SHLL8 Rn   Rn << 8 → Rn   0100nnnn00011000   1   —
SHLR8 Rn   Rn >> 8 → Rn   0100nnnn00011001   1   —
SHLL16 Rn   Rn << 16 → Rn   0100nnnn00101000   1   —
SHLR16 Rn   Rn >> 16 → Rn   0100nnnn00101001   1   —

Table A.4 Source and Destination Operand

Instruction   Operation   Code   Cycles   T Bit
ADD Rm,Rn   Rn + Rm → Rn   0011nnnnmmmm1100   1   —
ADDC Rm,Rn   Rn + Rm + T → Rn, carry → T   0011nnnnmmmm1110   1   Carry
ADDV Rm,Rn   Rn + Rm → Rn, overflow → T   0011nnnnmmmm1111   1   Overflow
AND Rm,Rn   Rn & Rm → Rn   0010nnnnmmmm1001   1   —
CMP/EQ Rm,Rn   When Rn = Rm, 1 → T   0011nnnnmmmm0000   1   Comparison result
CMP/HS Rm,Rn   When unsigned and Rn ≥ Rm, 1 → T   0011nnnnmmmm0010   1   Comparison result
CMP/GE Rm,Rn   When signed and Rn ≥ Rm, 1 → T   0011nnnnmmmm0011   1   Comparison result
CMP/HI Rm,Rn   When unsigned and Rn > Rm, 1 → T   0011nnnnmmmm0110   1   Comparison result
CMP/GT Rm,Rn   When signed and Rn > Rm, 1 → T   0011nnnnmmmm0111   1   Comparison result
CMP/STR Rm,Rn   When a byte in Rn equals a byte in Rm, 1 → T   0010nnnnmmmm1100   1   Comparison result
DIV1 Rm,Rn   1 step division (Rn ÷ Rm)   0011nnnnmmmm0100   1   Calculation result
DIV0S Rm,Rn   MSB of Rn → Q, MSB of Rm → M, M ^ Q → T   0010nnnnmmmm0111   1   Calculation result
DMULS.L Rm,Rn   Signed operation of Rn × Rm → MACH, MACL   0011nnnnmmmm1101   2 to 4*   —
DMULU.L Rm,Rn   Unsigned operation of Rn × Rm → MACH, MACL   0011nnnnmmmm0101   2 to 4*   —
EXTS.B Rm,Rn   Sign-extend Rm from byte → Rn   0110nnnnmmmm1110   1   —
EXTS.W Rm,Rn   Sign-extend Rm from word → Rn   0110nnnnmmmm1111   1   —
EXTU.B Rm,Rn   Zero-extend Rm from byte → Rn   0110nnnnmmmm1100   1   —
EXTU.W Rm,Rn   Zero-extend Rm from word → Rn   0110nnnnmmmm1101   1   —
FADD FRm,FRn   FRm + FRn → FRn   1111nnnnmmmm0000   1   —
FCMP/EQ FRm,FRn   (FRn == FRm)? 1:0 → T   1111nnnnmmmm0100   1   Comparison result
FCMP/GT FRm,FRn   (FRn > FRm)? 1:0 → T   1111nnnnmmmm0101   1   Comparison result
FDIV FRm,FRn   FRn/FRm → FRn   1111nnnnmmmm0011   13   —
FMAC FR0,FRm,FRn   (FR0 × FRm) + FRn → FRn   1111nnnnmmmm1110   1   —
FMOV FRm,FRn   FRm → FRn   1111nnnnmmmm1100   1   —
FMUL FRm,FRn   FRn × FRm → FRn   1111nnnnmmmm0010   1   —
FSUB FRm,FRn   FRn – FRm → FRn   1111nnnnmmmm0001   1   —
MOV Rm,Rn   Rm → Rn   0110nnnnmmmm0011   1   —
MUL.L Rm,Rn   Rn × Rm → MAC   0000nnnnmmmm0111   2 to 4*   —
MULS.W Rm,Rn   With sign, Rn × Rm → MAC   0010nnnnmmmm1111   1 to 3*   —
MULU.W Rm,Rn   Unsigned, Rn × Rm → MAC   0010nnnnmmmm1110   1 to 3*   —
NEG Rm,Rn   0 – Rm → Rn   0110nnnnmmmm1011   1   —
NEGC Rm,Rn   0 – Rm – T → Rn, Borrow → T   0110nnnnmmmm1010   1   Borrow
NOT Rm,Rn   ~Rm → Rn   0110nnnnmmmm0111   1   —
OR Rm,Rn   Rn | Rm → Rn   0010nnnnmmmm1011   1   —
SUB Rm,Rn   Rn – Rm → Rn   0011nnnnmmmm1000   1   —
SUBC Rm,Rn   Rn – Rm – T → Rn, Borrow → T   0011nnnnmmmm1010   1   Borrow
SUBV Rm,Rn   Rn – Rm → Rn, Underflow → T   0011nnnnmmmm1011   1   Underflow
SWAP.B Rm,Rn   Rm → Swap upper and lower halves of lower 2 bytes → Rn   0110nnnnmmmm1000   1   —
SWAP.W Rm,Rn   Rm → Swap upper and lower word → Rn   0110nnnnmmmm1001   1   —
TST Rm,Rn   Rn & Rm, when result is 0, 1 → T   0010nnnnmmmm1000   1   Test results
XOR Rm,Rn   Rn ^ Rm → Rn   0010nnnnmmmm1010   1   —
XTRCT Rm,Rn   Rm: Center 32 bits of Rn → Rn   0010nnnnmmmm1101   1   —

Note: * The normal minimum number of execution states.
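The Code column of Table A.4 gives each 16-bit instruction word as a pattern in which nnnn and mmmm stand for the register numbers of Rn and Rm. As an illustration, the sketch below fills such a pattern with concrete register numbers. The function name `encode_nm` is hypothetical, and the sketch assumes each field occupies exactly four pattern characters, as in the table.

```python
def encode_nm(pattern: str, n: int, m: int) -> int:
    """Fill the nnnn and mmmm fields of a 16-character code pattern."""
    if len(pattern) != 16:
        raise ValueError("pattern must be 16 characters long")
    if not (0 <= n <= 15 and 0 <= m <= 15):
        raise ValueError("register numbers must be in the range 0 to 15")
    n_bits = format(n, "04b")  # most significant bit first
    m_bits = format(m, "04b")
    n_pos = m_pos = 0
    out = []
    for ch in pattern:
        if ch == "n":
            out.append(n_bits[n_pos])
            n_pos += 1
        elif ch == "m":
            out.append(m_bits[m_pos])
            m_pos += 1
        elif ch in "01":
            out.append(ch)
        else:
            raise ValueError(f"unexpected pattern character {ch!r}")
    return int("".join(out), 2)


# ADD R1,R2: Rm = R1, Rn = R2, pattern 0011nnnnmmmm1100 from Table A.4
print(hex(encode_nm("0011nnnnmmmm1100", n=2, m=1)))  # 0x321c
```

The same helper works for the floating-point rows, since FRn and FRm use the same field positions in the patterns listed.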
Table A.5 Load and Store with Control Register or System Register

Instruction   Operation   Code   Cycles   T Bit
FLDS FRm,FPUL   FRm → FPUL   1111mmmm00011101   1   —
FSTS FPUL,FRn   FPUL → FRn   1111nnnn00001101   1   —
LDC Rm,SR   Rm → SR   0100mmmm00001110   1   LSB
LDC Rm,GBR   Rm → GBR   0100mmmm00011110   1   —
LDC Rm,VBR   Rm → VBR   0100mmmm00101110   1   —
LDS Rm,FPSCR   Rm → FPSCR   0100mmmm01101010   1   —
LDS Rm,FPUL   Rm → FPUL   0100mmmm01011010   1   —
LDS Rm,MACH   Rm → MACH   0100mmmm00001010   1   —
LDS Rm,MACL   Rm → MACL   0100mmmm00011010   1   —
LDS Rm,PR   Rm → PR   0100mmmm00101010   1   —
STC SR,Rn   SR → Rn   0000nnnn00000010   1   —
STC GBR,Rn   GBR → Rn   0000nnnn00010010   1   —
STC VBR,Rn   VBR → Rn   0000nnnn00100010   1   —
STS FPSCR,Rn   FPSCR → Rn   0000nnnn01101010   1   —
STS FPUL,Rn   FPUL → Rn   0000nnnn01011010   1   —
STS MACH,Rn   MACH → Rn   0000nnnn00001010   1   —
STS MACL,Rn   MACL → Rn   0000nnnn00011010   1   —
STS PR,Rn   PR → Rn   0000nnnn00101010   1   —

A.1.3 Indirect Register Addressing

Table A.6 Source Operand Only

Instruction   Operation   Code   Cycles   T Bit
JMP @Rm   Delayed branching, Rm → PC   0100mmmm00101011   2   —
JSR @Rm   Delayed branching, PC → PR, Rm → PC   0100mmmm00001011   2   —

Table A.7 Destination Operand Only

Instruction   Operation   Code   Cycles   T Bit
TAS.B @Rn   When (Rn) is 0, 1 → T, 1 → MSB of (Rn)   0100nnnn00011011   4   Test results

Table A.8 Data Transfer Direct to Register

Instruction   Operation   Code   Cycles   T Bit
FMOV.S FRm,@Rn   FRm → (Rn)   1111nnnnmmmm1010   1   —
FMOV.S @Rm,FRn   (Rm) → FRn   1111nnnnmmmm1000   1   —
MOV.B Rm,@Rn   Rm → (Rn)   0010nnnnmmmm0000   1   —
MOV.W Rm,@Rn   Rm → (Rn)   0010nnnnmmmm0001   1   —
MOV.L Rm,@Rn   Rm → (Rn)   0010nnnnmmmm0010   1   —
MOV.B @Rm,Rn   (Rm) → sign extension → Rn   0110nnnnmmmm0000   1   —
MOV.W @Rm,Rn   (Rm) → sign extension → Rn   0110nnnnmmmm0001   1   —
MOV.L @Rm,Rn   (Rm) → Rn   0110nnnnmmmm0010   1   —

A.1.4 Post-Increment Indirect Register Addressing

Table A.9 Multiply/Accumulate Operation

Instruction   Operation   Code   Cycles   T Bit
MAC.L @Rm+,@Rn+   Signed operation of (Rn) × (Rm) + MAC → MAC   0000nnnnmmmm1111   3/(2 to 4)*   —
MAC.W @Rm+,@Rn+   Signed operation of (Rn) × (Rm) + MAC → MAC   0100nnnnmmmm1111   3/(2)*   —

Note: * Normal minimum number of execution states (the number in parentheses is the number of states when there is contention with preceding/following instructions).

Table A.10 Data Transfer Direct from Register

Instruction   Operation   Code   Cycles   T Bit
FMOV.S @Rm+,FRn   (Rm) → FRn, Rm + 4 → Rm   1111nnnnmmmm1001   1   —
MOV.B @Rm+,Rn   (Rm) → sign extension → Rn, Rm + 1 → Rm   0110nnnnmmmm0100   1   —
MOV.W @Rm+,Rn   (Rm) → sign extension → Rn, Rm + 2 → Rm   0110nnnnmmmm0101   1   —
MOV.L @Rm+,Rn   (Rm) → Rn, Rm + 4 → Rm   0110nnnnmmmm0110   1   —

Table A.11 Load to Control Register or System Register

Instruction   Operation   Code   Cycles   T Bit
LDC.L @Rm+,SR   (Rm) → SR, Rm + 4 → Rm   0100mmmm00000111   3   LSB
LDC.L @Rm+,GBR   (Rm) → GBR, Rm + 4 → Rm   0100mmmm00010111   3   —
LDC.L @Rm+,VBR   (Rm) → VBR, Rm + 4 → Rm   0100mmmm00100111   3   —
LDS.L @Rm+,FPSCR   (Rm) → FPSCR, Rm + 4 → Rm   0100mmmm01100110   1   —
LDS.L @Rm+,FPUL   (Rm) → FPUL, Rm + 4 → Rm   0100mmmm01010110   1   —
LDS.L @Rm+,MACH   (Rm) → MACH, Rm + 4 → Rm   0100mmmm00000110   1   —
LDS.L @Rm+,MACL   (Rm) → MACL, Rm + 4 → Rm   0100mmmm00010110   1   —
LDS.L @Rm+,PR   (Rm) → PR, Rm + 4 → Rm   0100mmmm00100110   1   —

A.1.5 Pre-Decrement Indirect Register Addressing

Table A.12 Data Transfer Direct from Register

Instruction   Operation   Code   Cycles   T Bit
FMOV.S FRm,@-Rn   Rn – 4 → Rn, FRm → (Rn)   1111nnnnmmmm1011   1   —
MOV.B Rm,@-Rn   Rn – 1 → Rn, Rm → (Rn)   0010nnnnmmmm0100   1   —
MOV.W Rm,@-Rn   Rn – 2 → Rn, Rm → (Rn)   0010nnnnmmmm0101   1   —
MOV.L Rm,@-Rn   Rn – 4 → Rn, Rm → (Rn)   0010nnnnmmmm0110   1   —

Table A.13 Store from Control Register or System Register

Instruction   Operation   Code   Cycles   T Bit
STC.L SR,@-Rn   Rn – 4 → Rn, SR → (Rn)   0100nnnn00000011   2   —
STC.L GBR,@-Rn   Rn – 4 → Rn, GBR → (Rn)   0100nnnn00010011   2   —
STC.L VBR,@-Rn   Rn – 4 → Rn, VBR → (Rn)   0100nnnn00100011   2   —
STS.L FPSCR,@-Rn   Rn – 4 → Rn, FPSCR → (Rn)   0100nnnn01100010   1   —
STS.L FPUL,@-Rn   Rn – 4 → Rn, FPUL → (Rn)   0100nnnn01010010   1   —
STS.L MACH,@-Rn   Rn – 4 → Rn, MACH → (Rn)   0100nnnn00000010   1   —
STS.L MACL,@-Rn   Rn – 4 → Rn, MACL → (Rn)   0100nnnn00010010   1   —
STS.L PR,@-Rn   Rn – 4 → Rn, PR → (Rn)   0100nnnn00100010   1   —

A.1.6 Indirect Register Addressing with Displacement

Table A.14 Indirect Register Addressing with Displacement

Instruction   Operation   Code   Cycles   T Bit
MOV.B R0,@(disp,Rn)   R0 → (disp + Rn)   10000000nnnndddd   1   —
MOV.W R0,@(disp,Rn)   R0 → (disp + Rn)   10000001nnnndddd   1   —
MOV.L Rm,@(disp,Rn)   Rm → (disp + Rn)   0001nnnnmmmmdddd   1   —
MOV.B @(disp,Rm),R0   (disp + Rm) → sign extension → R0   10000100mmmmdddd   1   —
MOV.W @(disp,Rm),R0   (disp + Rm) → sign extension → R0   10000101mmmmdddd   1   —
MOV.L @(disp,Rm),Rn   (disp + Rm) → Rn   0101nnnnmmmmdddd   1   —

A.1.7 Indirect Indexed Register Addressing

Table A.15 Indirect Indexed Register Addressing

Instruction   Operation   Code   Cycles   T Bit
MOV.B Rm,@(R0,Rn)   Rm → (R0 + Rn)   0000nnnnmmmm0100   1   —
MOV.W Rm,@(R0,Rn)   Rm → (R0 + Rn)   0000nnnnmmmm0101   1   —
MOV.L Rm,@(R0,Rn)   Rm → (R0 + Rn)   0000nnnnmmmm0110   1   —
FMOV.S FRm,@(R0,Rn)   FRm → (R0 + Rn)   1111nnnnmmmm0111   1   —
MOV.B @(R0,Rm),Rn   (R0 + Rm) → sign extension → Rn   0000nnnnmmmm1100   1   —
MOV.W @(R0,Rm),Rn   (R0 + Rm) → sign extension → Rn   0000nnnnmmmm1101   1   —
MOV.L @(R0,Rm),Rn   (R0 + Rm) → Rn   0000nnnnmmmm1110   1   —
FMOV.S @(R0,Rm),FRn   (R0 + Rm) → FRn   1111nnnnmmmm0110   1   —

A.1.8 Indirect GBR Addressing with Displacement

Table A.16 Indirect GBR Addressing with Displacement

Instruction   Operation   Code   Cycles   T Bit
MOV.B R0,@(disp,GBR)   R0 → (disp + GBR)   11000000dddddddd   1   —
MOV.W R0,@(disp,GBR)   R0 → (disp + GBR)   11000001dddddddd   1   —
MOV.L R0,@(disp,GBR)   R0 → (disp + GBR)   11000010dddddddd   1   —
MOV.B @(disp,GBR),R0   (disp + GBR) → sign extension → R0   11000100dddddddd   1   —
MOV.W @(disp,GBR),R0   (disp + GBR) → sign extension → R0   11000101dddddddd   1   —
MOV.L @(disp,GBR),R0   (disp + GBR) → R0   11000110dddddddd   1   —

A.1.9 Indirect Indexed GBR Addressing

Table A.17 Indirect Indexed GBR Addressing

Instruction   Operation   Code   Cycles   T Bit
AND.B #imm,@(R0,GBR)   (R0 + GBR) & imm → (R0 + GBR)   11001101iiiiiiii   3   —
OR.B #imm,@(R0,GBR)   (R0 + GBR) | imm → (R0 + GBR)   11001111iiiiiiii   3   —
TST.B #imm,@(R0,GBR)   (R0 + GBR) & imm, when result is 0, 1 → T   11001100iiiiiiii   3   Test results
XOR.B #imm,@(R0,GBR)   (R0 + GBR) ^ imm → (R0 + GBR)   11001110iiiiiiii   3   —

A.1.10 PC Relative Addressing with Displacement

Table A.18 PC Relative Addressing with Displacement

Instruction   Operation   Code   Cycles   T Bit
MOV.W @(disp,PC),Rn   (disp + PC) → sign extension → Rn   1001nnnndddddddd   1   —
MOV.L @(disp,PC),Rn   (disp + PC) → Rn   1101nnnndddddddd   1   —
MOVA @(disp,PC),R0   disp + PC → R0   11000111dddddddd   1   —

A.1.11 PC Relative Addressing

Table A.19 PC Relative Addressing with Rn

Instruction   Operation   Code   Cycles   T Bit
BRAF Rm   Delayed branch, Rm + PC → PC   0000mmmm00100011   2   —
BSRF Rm   Delayed branch, PC → PR, Rm + PC → PC   0000mmmm00000011   2   —

Table A.20 PC Relative Addressing

Instruction   Operation   Code   Cycles   T Bit
BF label   When T = 0, disp + PC → PC; when T = 1, nop   10001011dddddddd   3/1*   —
BF/S label   If T = 0, disp + PC → PC; if T = 1, nop   10001111dddddddd   2/1*   —
BT label   When T = 1, disp + PC → PC; when T = 0, nop   10001001dddddddd   3/1*   —
BT/S label   If T = 1, disp + PC → PC; if T = 0, nop   10001101dddddddd   2/1*   —
BRA label   Delayed branching, disp + PC → PC   1010dddddddddddd   2   —
BSR label   Delayed branching, PC → PR, disp + PC → PC   1011dddddddddddd   2   —

Note: * One state when it does not branch.
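To make the branch rows of Table A.20 concrete, the sketch below computes a branch destination from an instruction word. It is an illustration under stated assumptions, not the manual's definition: the displacement is sign-extended and scaled by 2, as written in the format tables (Tables A.48 and A.49); `pc` is whatever value the tables mean by PC (on SuperH parts this is commonly the branch address plus 4, which is not stated in this section); and the result is wrapped to 32 bits. The function names are hypothetical.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def branch_target(pc: int, opcode: int) -> int:
    """Destination of BF, BF/S, BT, BT/S (8-bit disp) or BRA, BSR (12-bit disp)."""
    if (opcode >> 12) in (0b1010, 0b1011):            # BRA, BSR
        disp = sign_extend(opcode, 12)
    elif (opcode >> 8) in (0x8B, 0x8F, 0x89, 0x8D):   # BF, BF/S, BT, BT/S
        disp = sign_extend(opcode, 8)
    else:
        raise ValueError("not a PC-relative branch from Table A.20")
    return (pc + disp * 2) & 0xFFFFFFFF               # assumed 32-bit wraparound


print(hex(branch_target(0x1000, 0x8B02)))  # BF with disp = 2: 0x1004
print(hex(branch_target(0x1000, 0x8BFE)))  # BF with disp = -2: 0xffc
```

For the conditional branches the computed address applies only when the T bit condition in the table holds; otherwise execution continues with the next instruction.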
A.1.12 Immediate

Table A.21 Load to Register

Instruction   Operation   Code   Cycles   T Bit
FLDI0 FRn   0x00000000 → FRn   1111nnnn10001101   1   —
FLDI1 FRn   0x3F800000 → FRn   1111nnnn10011101   1   —

Table A.22 Arithmetic Logical Operations Direct with Register

Instruction   Operation   Code   Cycles   T Bit
ADD #imm,Rn   Rn + imm → Rn   0111nnnniiiiiiii   1   —
AND #imm,R0   R0 & imm → R0   11001001iiiiiiii   1   —
CMP/EQ #imm,R0   When R0 = imm, 1 → T   10001000iiiiiiii   1   Comparison result
MOV #imm,Rn   imm → sign extension → Rn   1110nnnniiiiiiii   1   —
OR #imm,R0   R0 | imm → R0   11001011iiiiiiii   1   —
TST #imm,R0   R0 & imm, when result is 0, 1 → T   11001000iiiiiiii   1   Test results
XOR #imm,R0   R0 ^ imm → R0   11001010iiiiiiii   1   —

Table A.23 Specify Exception Processing Vector

Instruction   Operation   Code   Cycles   T Bit
TRAPA #imm   Stack area → PC/SR (imm × 4 + VBR) → PC   11000011iiiiiiii   8   —

A.2 Instruction Sets by Instruction Format

Tables A.24 to A.54 list instruction codes and execution cycles by instruction format.

Table A.24 Instruction Sets by Format

Format   Category   Sample Instruction   Types
0   —   NOP   8
n   Direct register addressing   MOVT Rn   18
n   Direct register addressing (store with control or system registers)   STS MACH,Rn   8
n   Indirect register addressing   TAS.B @Rn   1
n   Pre-decrement indirect register addressing   STC.L SR,@-Rn   8
n   Floating-point instruction   FABS FRn   6
m   Direct register addressing (load with control or system registers)   LDC Rm,SR   8
m   PC relative addressing with Rm   BRAF Rm   2
m   Indirect register addressing   JMP @Rm   2
m   Post-increment indirect register addressing   LDC.L @Rm+,SR   8
m   Floating-point instruction   FLDS FRm,FPUL   2
nm   Direct register addressing   ADD Rm,Rn   34
nm   Indirect register addressing   MOV.L Rm,@Rn   6
nm   Post-increment indirect register addressing (multiply/accumulate operation)   MAC.W @Rm+,@Rn+   2
nm   Post-increment indirect register addressing   MOV.L @Rm+,Rn   3
nm   Pre-decrement indirect register addressing   MOV.L Rm,@-Rn   3
nm   Indirect indexed register addressing   MOV.L Rm,@(R0,Rn)   6
nm   Floating-point instruction   FADD FRm,FRn   14
md   Indirect register addressing with displacement   MOV.B @(disp,Rm),R0   2
nd4   Indirect register addressing with displacement   MOV.B R0,@(disp,Rn)   2
nmd   Indirect register addressing with displacement   MOV.L Rm,@(disp,Rn)   2
d   Indirect GBR addressing with displacement   MOV.L R0,@(disp,GBR)   6
d   Indirect PC addressing with displacement   MOVA @(disp,PC),R0   1
d   PC relative addressing   BF disp   4
d12   PC relative addressing   BRA disp   2
nd8   PC relative addressing with displacement   MOV.L @(disp,PC),Rn   2
i   Indirect indexed GBR addressing   AND.B #imm,@(R0,GBR)   4
i   Immediate addressing (arithmetic and logical operations direct with register)   AND #imm,R0   5
i   Immediate addressing (specify exception processing vector)   TRAPA #imm   1
ni   Immediate addressing (direct register arithmetic operations and data transfers)   ADD #imm,Rn   2
Total: 172

A.2.1 0 Format

Table A.25 0 Format

Instruction   Operation   Code   Cycles   T Bit
CLRT   0 → T   0000000000001000   1   0
CLRMAC   0 → MACH, MACL   0000000000101000   1   —
DIV0U   0 → M/Q/T   0000000000011001   1   0
NOP   No operation   0000000000001001   1   —
RTE   Delayed branch, Stack area → PC/SR   0000000000101011   4   LSB
RTS   Delayed branching, PR → PC   0000000000001011   2   —
SETT   1 → T   0000000000011000   1   1
SLEEP   Sleep   0000000000011011   3*   —

Note: * The number of execution cycles before the chip enters sleep mode.

A.2.2 n Format

Table A.26 Direct Register

Instruction   Operation   Code   Cycles   T Bit
CMP/PL Rn   Rn > 0, 1 → T   0100nnnn00010101   1   Comparison result
CMP/PZ Rn   Rn ≥ 0, 1 → T   0100nnnn00010001   1   Comparison result
DT Rn   Rn – 1 → Rn, when Rn is 0, 1 → T. When Rn is nonzero, 0 → T   0100nnnn00010000   1   Comparison result
MOVT Rn   T → Rn   0000nnnn00101001   1   —
ROTL Rn   T ← Rn ← MSB   0100nnnn00000100   1   MSB
ROTR Rn   LSB → Rn → T   0100nnnn00000101   1   LSB
ROTCL Rn   T ← Rn ← T   0100nnnn00100100   1   MSB
ROTCR Rn   T → Rn → T   0100nnnn00100101   1   LSB
SHAL Rn   T ← Rn ← 0   0100nnnn00100000   1   MSB
SHAR Rn   MSB → Rn → T   0100nnnn00100001   1   LSB
SHLL Rn   T ← Rn ← 0   0100nnnn00000000   1   MSB
SHLR Rn   0 → Rn → T   0100nnnn00000001   1   LSB
SHLL2 Rn   Rn << 2 → Rn   0100nnnn00001000   1   —
SHLR2 Rn   Rn >> 2 → Rn   0100nnnn00001001   1   —
SHLL8 Rn   Rn << 8 → Rn   0100nnnn00011000   1   —
SHLR8 Rn   Rn >> 8 → Rn   0100nnnn00011001   1   —
SHLL16 Rn   Rn << 16 → Rn   0100nnnn00101000   1   —
SHLR16 Rn   Rn >> 16 → Rn   0100nnnn00101001   1   —

Table A.27 Direct Register (Store with Control and System Registers)

Instruction   Operation   Code   Cycles   T Bit
STC SR,Rn   SR → Rn   0000nnnn00000010   1   —
STC GBR,Rn   GBR → Rn   0000nnnn00010010   1   —
STC VBR,Rn   VBR → Rn   0000nnnn00100010   1   —
STS FPSCR,Rn   FPSCR → Rn   0000nnnn01101010   1   —
STS FPUL,Rn   FPUL → Rn   0000nnnn01011010   1   —
STS MACH,Rn   MACH → Rn   0000nnnn00001010   1   —
STS MACL,Rn   MACL → Rn   0000nnnn00011010   1   —
STS PR,Rn   PR → Rn   0000nnnn00101010   1   —

Table A.28 Indirect Register

Instruction   Operation   Code   Cycles   T Bit
TAS.B @Rn   When (Rn) is 0, 1 → T, 1 → MSB of (Rn)   0100nnnn00011011   4   Test results

Table A.29 Indirect Pre-Decrement Register

Instruction   Operation   Code   Cycles   T Bit
STC.L SR,@-Rn   Rn – 4 → Rn, SR → (Rn)   0100nnnn00000011   1   —
STC.L GBR,@-Rn   Rn – 4 → Rn, GBR → (Rn)   0100nnnn00010011   1   —
STC.L VBR,@-Rn   Rn – 4 → Rn, VBR → (Rn)   0100nnnn00100011   1   —
STS.L FPSCR,@-Rn   Rn – 4 → Rn, FPSCR → (Rn)   0100nnnn01100010   1   —
STS.L FPUL,@-Rn   Rn – 4 → Rn, FPUL → (Rn)   0100nnnn01010010   1   —
STS.L MACH,@-Rn   Rn – 4 → Rn, MACH → (Rn)   0100nnnn00000010   1   —
STS.L MACL,@-Rn   Rn – 4 → Rn, MACL → (Rn)   0100nnnn00010010   1   —
STS.L PR,@-Rn   Rn – 4 → Rn, PR → (Rn)   0100nnnn00100010   1   —

Note: SH-3E instructions.
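The code patterns in the n-format tables above can also be read in the other direction, to recognise an instruction word and recover its register field. The following is a minimal sketch of that idea: the dictionary holds only a handful of rows copied from Tables A.26, A.27 and A.29, and the names `N_FORMAT`, `match` and `decode` are hypothetical, not part of any Renesas tool.

```python
from typing import Dict, Optional, Tuple

# A few n-format rows copied from the tables above; the selection is arbitrary.
N_FORMAT = {
    "CMP/PL Rn": "0100nnnn00010101",
    "DT Rn": "0100nnnn00010000",
    "MOVT Rn": "0000nnnn00101001",
    "STS MACH,Rn": "0000nnnn00001010",
    "STS.L PR,@-Rn": "0100nnnn00100010",
}


def match(pattern: str, opcode: int) -> Optional[Dict[str, int]]:
    """Return the field values if opcode fits the pattern, otherwise None."""
    fields: Dict[str, int] = {}
    for i, ch in enumerate(pattern):
        bit = (opcode >> (15 - i)) & 1      # pattern is written MSB first
        if ch in "01":
            if bit != int(ch):
                return None
        else:
            fields[ch] = (fields.get(ch, 0) << 1) | bit
    return fields


def decode(opcode: int) -> Optional[Tuple[str, Dict[str, int]]]:
    """Look the opcode up in the small table above."""
    for name, pattern in N_FORMAT.items():
        fields = match(pattern, opcode)
        if fields is not None:
            return name, fields
    return None


print(decode(0x4315))  # ('CMP/PL Rn', {'n': 3})
```

Because the fixed bits of the listed patterns differ from one another, at most one row matches a given word; an instruction word outside this small table simply returns None.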
302 Table A.30 Floating-Point Instruction Instruction Operation Code Cycles T Bit FABS FRn FRn → FRn 1111nnnn01011101 1 — FLDI0 FRn H'00000000 → FRn 1111nnnn10001101 1 — FLDI1 FRn H'3F800000 → FRn 1111nnnn10011101 1 — FLOAT FPUL,FRn (float)FPUL → FRn 1111nnnn00101101 1 — FNEG FRn -FRn → FRn 1111nnnn01001101 1 — FSTS FPUL,FRn FPUL → FRn 1111nnnn00001101 1 — A.2.3 m Format Table A.31 Direct Register (Load from Control and System Registers) Instruction Operation Code Cycles T Bit LDC Rm,SR Rm → SR 0100mmmm00001110 1 LSB LDC Rm,GBR Rm → GBR 0100mmmm00011110 1 — LDC Rm,VBR Rm → VBR 0100mmmm00101110 1 — LDS Rm,FPSCR Rm → FPSCR 0100nnnn01101010 1 — LDS Rm,FPUL Rm → FPUL 0100nnnn01011010 1 — LDS Rm,MACH Rm → MACH 0100mmmm00001010 1 — LDS Rm,MACL Rm → MACL 0100mmmm00011010 1 — LDS Rm,PR Rm → PR 0100mmmm00101010 1 — Table A.32 Indirect Register Instruction Operation Code Cycles T Bit JMP @Rm Delayed branch, Rm → PC 0100mmmm00101011 2 — JSR @Rm Delayed branch, PC → PR, Rm → PC 0100mmmm00001011 2 — 303 Table A.33 Indirect Post-Increment Register Instruction Operation Code Cycles T Bit LDC.L @Rm+,SR (Rm) → SR, Rm + 4 → Rm 0100mmmm00000111 3 LSB LDC.L @Rm+,GBR (Rm) → GBR, Rm + 4 → Rm 0100mmmm00010111 3 — LDC.L @Rm+,VBR (Rm) → VBR, Rm + 4 → Rm 0100mmmm00100111 3 — LDS.L @Rm+,FPSCR @Rm → FPSCR, Rm + 4 → Rm 0100nnnn01100110 1 — LDS.L @Rm+,FPUL @Rm → FPUL, Rm + 4 → Rm 0100nnnn01010110 1 — LDS.L @Rm+,MACH (Rm) → MACH, Rm + 4 → Rm 0100mmmm00000110 1 — LDS.L @Rm+,MACL (Rm) → MACL, Rm + 4 → Rm 0100mmmm00010110 1 — LDS.L @Rm+,PR (Rm) → PR, Rm + 4 → Rm 0100mmmm00100110 1 — Table A.34 PC Relative Addressing with Rn Instruction Operation Code Cycles T Bit BRAF Rn Delayed branch, Rn + PC → PC 0000nnnn00100011 2 — BSRF Rn Delayed branch, PC → PR, Rn + PC → PC 0000nnnn00000011 2 — Table A.35 Floating-Point Instructions Instruction Operation Code Cycles T Bit FLDS FRm,FPUL FRm → FPUL 1111nnnn00011101 1 — FTRC FRm,FPUL (long)FRm → FPUL 1111nnnn00111101 1 — 304 A.2.4 nm Format Table A.36 Direct 
Register Instruction Operation Code Cycles T Bit ADD Rm,Rn Rm + Rn → Rn 0011nnnnmmmm1100 1 — ADDC Rm,Rn Rn + Rm + T → Rn, carry → T 0011nnnnmmmm1110 1 Carry ADDV Rm,Rn Rn + Rm → Rn, overflow → T 0011nnnnmmmm1111 1 Overflow AND Rm,Rn Rn & Rm → Rn 0010nnnnmmmm1001 1 — CMP/EQ Rm,Rn When Rn = Rm, 1 → T 0011nnnnmmmm0000 1 Comparison result CMP/HS Rm,Rn When unsigned and Rn Rm, 1 → T 0011nnnnmmmm0010 1 Comparison result CMP/GE Rm,Rn When signed and Rn Rm, 1 → T 0011nnnnmmmm0011 1 Comparison result CMP/HI Rm,Rn When unsigned and Rn > Rm, 1 → T 0011nnnnmmmm0110 1 Comparison result CMP/GT Rm,Rn When signed and Rn > Rm, 1 → T 0011nnnnmmmm0111 1 Comparison result CMP/STR Rm,Rn When a byte in Rn equals a byte in Rm, 1 → T 0010nnnnmmmm1100 1 Comparison result DIV1 Rm,Rn 1 step division (Rn ÷ Rm) 0011nnnnmmmm0100 1 Calculation result DIV0S Rm,Rn MSB of Rn → Q, MSB of Rm → M, M ^ Q → T 0010nnnnmmmm0111 1 Calculation result DMULS.L Rm,Rn Signed operation of Rn × Rm → MACH, MACL 0011nnnnmmmm1101 2 to 4* — DMULU.L Rm,Rn Unsigned operation of Rn × Rm → MACH, MACL 0011nnnnmmmm0101 2 to 4* — EXTS.B Rm,Rn Sign-extend Rm from byte → Rn 0110nnnnmmmm1110 1 — EXTS.W Rm,Rn Sign-extend Rm from word → Rn 0110nnnnmmmm1111 1 — EXTU.B Rm,Rn Zero-extend Rm from byte → Rn 0110nnnnmmmm1100 1 — EXTU.W Rm,Rn Zero-extend Rm from word → Rn 0110nnnnmmmm1101 1 — MOV Rm,Rn Rm → Rn 0110nnnnmmmm0011 1 — 305 Table A.36 Direct Register (cont) Instruction Operation Code Cycles T Bit MUL.L Rm,Rn Rn × Rm → MAC 0000nnnnmmmm0111 2 to 4* — MULS.W Rm,Rn With sign, Rn × Rm → MAC 0010nnnnmmmm1111 1 to 3* — MULU.W Rm,Rn Unsigned, Rn × Rm → MAC 0010nnnnmmmm1110 1 to 3* — NEG Rm,Rn 0 – Rm → Rn 0110nnnnmmmm1011 1 — NEGC Rm,Rn 0 – Rm – T → Rn, Borrow → T 0110nnnnmmmm1010 1 Borrow NOT Rm,Rn ~Rm → Rn 0110nnnnmmmm0111 1 — OR Rm,Rn Rn | Rm → Rn 0010nnnnmmmm1011 1 — SUB Rm,Rn Rn – Rm → Rn 0011nnnnmmmm1000 1 — SUBC Rm,Rn Rn – Rm – T → Rn, Borrow → T 0011nnnnmmmm1010 1 Borrow SUBV Rm,Rn Rn – Rm → Rn, Underflow → T 0011nnnnmmmm1011 
1 Underflow SWAP.B Rm,Rn Rm → Swap upper and lower halves of lower 2 bytes → Rn 0110nnnnmmmm1000 1 — SWAP.W Rm,Rn Rm → Swap upper and lower word → Rn 0110nnnnmmmm1001 1 — TST Rm,Rn Rn & Rm, when result is 0, 1 → T 0010nnnnmmmm1000 1 Test results XOR Rm,Rn Rn ^ Rm → Rn 0010nnnnmmmm1010 1 — XTRCT Rm,Rn Rm: Center 32 bits of Rn → Rn 0010nnnnmmmm1101 1 — Note: The normal minimum number of execution states. Table A.37 Indirect Register Instruction Operation Code Cycles T Bit MOV.B Rm,@Rn Rm → (Rn) 0010nnnnmmmm0000 1 — MOV.W Rm,@Rn Rm → (Rn) 0010nnnnmmmm0001 1 — MOV.L Rm,@Rn Rm → (Rn) 0010nnnnmmmm0010 1 — MOV.B @Rm,Rn (Rm) → sign extension → Rn 0110nnnnmmmm0000 1 — MOV.W @Rm,Rn (Rm) → sign extension → Rn 0110nnnnmmmm0001 1 — MOV.L @Rm,Rn (Rm) → Rn 1 — 306 0110nnnnmmmm0010 Table A.38 Indirect Post-Increment Register (Multiply/Accumulate Operation) Instruction Operation Code Cycles T Bit MAC.L @Rm+,@Rn+ Signed operation of (Rn) × (Rm) + MAC → MAC 0000nnnnmmmm1111 3/(2 to 4)* — MAC.W @Rm+,@Rn+ Signed operation of (Rn) × (Rm) + MAC → MAC 0100nnnnmmmm1111 3/(2)* — Note: * Normal minimum number of execution states (the number in parentheses is the number of states when there is contention with preceding/following instructions). 
Table A.39 Indirect Post-Increment Register Instruction Operation Code Cycles T Bit MOV.B @Rm+,Rn (Rm) → sign extension → Rn, Rm + 1 → Rm 0110nnnnmmmm0100 1 — MOV.W @Rm+,Rn (Rm) → sign extension → Rn, Rm + 2 → Rm 0110nnnnmmmm0101 1 — MOV.L @Rm+,Rn (Rm) → Rn, Rm + 4 → Rm 0110nnnnmmmm0110 1 — Table A.40 Indirect Pre-Decrement Register Instruction Operation Code Cycles T Bit MOV.B Rm,@–Rn Rn – 1 → Rn, Rm → (Rn) 0010nnnnmmmm0100 1 — MOV.W Rm,@–Rn Rn – 2 → Rn, Rm → (Rn) 0010nnnnmmmm0101 1 — MOV.L Rm,@–Rn Rn – 4 → Rn, Rm → (Rn) 0010nnnnmmmm0110 1 — Table A.41 Indirect Indexed Register Instruction Operation Code Cycles T Bit MOV.B Rm,@(R0,Rn) Rm → (R0 + Rn) 0000nnnnmmmm0100 1 — MOV.W Rm,@(R0,Rn) Rm → (R0 + Rn) 0000nnnnmmmm0101 1 — MOV.L Rm,@(R0,Rn) Rm → (R0 + Rn) 0000nnnnmmmm0110 1 — MOV.B @(R0,Rm),Rn (R0 + Rm) → sign extension → Rn 0000nnnnmmmm1100 1 — MOV.W @(R0,Rm),Rn (R0 + Rm) → sign extension → Rn 0000nnnnmmmm1101 1 — MOV.L @(R0,Rm),Rn (R0 + Rm) → Rn 0000nnnnmmmm1110 1 — 307 Table A.42 Floating Point Instructions Instruction Operation Code Cycles T Bit FADD FRm,FRn FRn+FRm → FRn 1111nnnnmmmm0000 1 — FCMP/EQ FRm,FRn (FRn=FRm)? 1:0 → T 1111nnnnmmmm0100 1 Comparison result FCMP/GT FRm,FRn (FRn>FRm)? 
1:0 → T 1111nnnnmmmm0101 1 Comparison result FDIV FRm,FRn FRn/FRm → FRn 1111nnnnmmmm0011 13 — FMAC FR0,FRm,FRn FR0×FRm+FRn → FRn 1111nnnnmmmm1110 1 — FMOV FRm,FRn FRm → FRn 1111nnnnmmmm1100 1 — FMOV.S @(R0,Rm),FRn (R0+Rm) → FRn 1111nnnnmmmm0110 1 — FMOV.S @Rm+,FRn (Rm) → FRn,Rm+4 → Rm 1111nnnnmmmm1001 1 — FMOV.S @Rm,FRn (Rm) → FRn 1111nnnnmmmm1000 1 — FMOV.S FRm,@(R0,Rn) FRm → (R0+Rn) 1111nnnnmmmm0111 1 — FMOV.S FRm,@-Rn Rn-4 → Rn, FRm → (Rn) 1111nnnnmmmm1011 1 — FMOV.S FRm,@Rn FRm → (Rn) 1111nnnnmmmm1010 1 — FMUL FRm,FRn FRn × FRm → FRn 1111nnnnmmmm0010 1 — FSUB FRm,FRn FRn-FRm → FRn 1111nnnnmmmm0001 1 — A.2.5 md Format Table A.43 md Format Instruction Operation Code Cycles T Bit MOV.B @(disp,Rm),R0 (disp + Rm) → sign extension → R0 10000100mmmmdddd 1 — MOV.W @(disp,Rm),R0 (disp × 2 + Rm) → sign extension → R0 10000101mmmmdddd 1 — A.2.6 nd4 Format Table A.44 nd4 Format Instruction Operation Code Cycles T Bit MOV.B R0,@(disp,Rn) R0 → (disp + Rn) 10000000nnnndddd 1 — MOV.W R0,@(disp,Rn) R0 → (disp × 2 + Rn) 10000001nnnndddd 1 — 308 A.2.7 nmd Format Table A.45 nmd Format Instruction Operation Code Cycles T Bit MOV.L Rm,@(disp,Rn) Rm → (disp + Rn) 0001nnnnmmmmdddd 1 — MOV.L @(disp,Rm),Rn (disp × 4 + Rm) → Rn 0101nnnnmmmmdddd 1 — A.2.8 d Format Table A.46 Indirect GBR with Displacement Instruction Operation Code Cycles T Bit MOV.B R0,@(disp,GBR) R0 → (disp + GBR) 11000000dddddddd 1 — MOV.W R0,@(disp,GBR) R0 → (disp × 2 + GBR) 11000001dddddddd 1 — MOV.L R0,@(disp,GBR) R0 → (disp × 4 + GBR) 11000010dddddddd 1 — MOV.B @(disp,GBR),R0 (disp + GBR) → sign extension → R0 11000100dddddddd 1 — MOV.W @(disp,GBR),R0 (disp × 2 + GBR) → sign extension → R0 11000101dddddddd 1 — MOV.L @(disp,GBR),R0 (disp × 4 + GBR) → R0 11000110dddddddd 1 — Table A.47 PC Relative with Displacement Instruction Operation Code Cycles T Bit MOVA disp × 4 + PC → R0 11000111dddddddd 1 — @(disp,PC),R0 Table A.48 PC Relative Instruction Operation Code Cycles T Bit BF When T = 0, disp × 2 + PC → PC; when T = 
1, nop 10001011dddddddd 3/1* — BF/S label If T = 0, disp × 2 + PC → PC; if T = 1, nop 10001111dddddddd 2/1* — BT label When T = 1, disp × 2 + PC → PC; 10001001dddddddd when T = 0, nop 3/1* — BT/S label If T = 1, disp × 2 + PC → PC; if T = 0, nop 2/1* label 10001101dddddddd Note: * One state when it does not branch. 309 A.2.9 d12 Format Table A.49 d12 Format Instruction Operation BRA label BSR label Code Cycles T Bit Delayed branching, disp × 2 + PC → PC 1010dddddddddddd 2 — Delayed branching, PC → PR, disp × 2 + PC → PC 2 — 1011dddddddddddd A.2.10 nd8 Format Table A.50 nd8 Format Instruction Operation Code Cycles T Bit MOV.W @(disp,PC),Rn (disp × 2 + PC) → sign extension → Rn 1001nnnndddddddd 1 — MOV.L @(disp,PC),Rn (disp × 4 + PC) → Rn 1101nnnndddddddd 1 — A.2.11 i Format Table A.51 Indirect Indexed GBR Instruction Operation Code Cycles T Bit AND.B #imm,@(R0,GBR) (R0 + GBR) & imm → (R0 + GBR) 11001101iiiiiiii 3 — OR.B #imm,@(R0,GBR) (R0 + GBR) | imm → (R0 + GBR) 11001111iiiiiiii 3 — TST.B #imm,@(R0,GBR) (R0 + GBR) & imm, when result is 0, 1 → T 11001100iiiiiiii 3 Test results XOR.B #imm,@(R0,GBR) (R0 + GBR) ^ imm → (R0 + GBR) 11001110iiiiiiii 3 — 310 Table A.52 Immediate (Arithmetic Logical Operation with Direct Register) Instruction Operation Code Cycles T Bit AND #imm,R0 R0 & imm → R0 11001001iiiiiiii 1 — CMP/EQ #imm,R0 When R0 = imm, 1 → T 10001000iiiiiiii 1 Comparison results OR #imm,R0 R0 | imm → R0 11001011iiiiiiii 1 — TST #imm,R0 R0 & imm, when result is 0, 1 → T 11001000iiiiiiii 1 Test results XOR #imm,R0 R0 ^ imm → R0 11001010iiiiiiii 1 — Table A.53 Immediate (Specify Exception Processing Vector) Instruction Operation Code Cycles T Bit TRAPA Stack area → PC/SR (imm × 4 + VBR) → PC 11000011iiiiiiii 8 — #imm A.2.12 ni Format Table A.54 ni Format Instruction Operation Code Cycles T Bit ADD #imm,Rn Rn + imm → Rn 0111nnnniiiiiiii 1 — MOV #imm,Rn imm → sign extension → Rn 1110nnnniiiiiiii 1 — 311 A.3 Instruction Set by Instruction Code Table A.55 lists 
instruction codes and execution cycles by instruction code.

Table A.55 Instruction Set by Instruction Code

Instruction             Operation  Code  Cycles  T Bit
CLRT                    0 → T  0000000000001000  1  0
NOP                     No operation  0000000000001001  1  —
RTS                     Delayed branching, PR → PC  0000000000001011  2  —
SETT                    1 → T  0000000000011000  1  1
DIV0U                   0 → M/Q/T  0000000000011001  1  0
SLEEP                   Sleep  0000000000011011  3  —
CLRMAC                  0 → MACH, MACL  0000000000101000  1  —
RTE                     Delayed branch, SSR/SPC → SR/PC  0000000000101011  4  —
STC SR,Rn               SR → Rn  0000nnnn00000010  1  —
BSRF Rm                 Delayed branch, PC → PR, Rm + PC → PC  0000mmmm00000011  2  —
STS MACH,Rn             MACH → Rn  0000nnnn00001010  1  —
STC GBR,Rn              GBR → Rn  0000nnnn00010010  1  —
STS MACL,Rn             MACL → Rn  0000nnnn00011010  1  —
STC VBR,Rn              VBR → Rn  0000nnnn00100010  1  —
BRAF Rm                 Delayed branch, Rm + PC → PC  0000mmmm00100011  2  —
MOVT Rn                 T → Rn  0000nnnn00101001  1  —
STS PR,Rn               PR → Rn  0000nnnn00101010  1  —
STS FPUL,Rn             FPUL → Rn  0000nnnn01011010  1  —
STS FPSCR,Rn            FPSCR → Rn  0000nnnn01101010  1  —
MOV.B Rm,@(R0,Rn)       Rm → (R0 + Rn)  0000nnnnmmmm0100  1  —
MOV.W Rm,@(R0,Rn)       Rm → (R0 + Rn)  0000nnnnmmmm0101  1  —
MOV.L Rm,@(R0,Rn)       Rm → (R0 + Rn)  0000nnnnmmmm0110  1  —
MUL.L Rm,Rn             Rn × Rm → MACL  0000nnnnmmmm0111  2 to 4*1  —
MOV.B @(R0,Rm),Rn       (R0 + Rm) → sign extension → Rn  0000nnnnmmmm1100  1  —
MOV.W @(R0,Rm),Rn       (R0 + Rm) → sign extension → Rn  0000nnnnmmmm1101  1  —
MOV.L @(R0,Rm),Rn       (R0 + Rm) → Rn  0000nnnnmmmm1110  1  —
MAC.L @Rm+,@Rn+         Signed operation of (Rn) × (Rm) + MAC → MAC  0000nnnnmmmm1111  3/(2 to 4)*1  —
MOV.L Rm,@(disp,Rn)     Rm → (disp × 4 + Rn)  0001nnnnmmmmdddd  1  —
MOV.B Rm,@Rn            Rm → (Rn)  0010nnnnmmmm0000  1  —
MOV.W Rm,@Rn            Rm → (Rn)  0010nnnnmmmm0001  1  —
MOV.L Rm,@Rn            Rm → (Rn)  0010nnnnmmmm0010  1  —
MOV.B Rm,@-Rn           Rn – 1 → Rn, Rm → (Rn)  0010nnnnmmmm0100  1  —
MOV.W Rm,@-Rn           Rn – 2 → Rn, Rm → (Rn)  0010nnnnmmmm0101  1  —
MOV.L Rm,@-Rn           Rn – 4 → Rn, Rm → (Rn)  0010nnnnmmmm0110  1  —
DIV0S Rm,Rn             MSB of Rn → Q, MSB of Rm → M, M ^ Q → T  0010nnnnmmmm0111  1  Calculation result
TST Rm,Rn               Rn & Rm, when result is 0, 1 → T  0010nnnnmmmm1000  1  Test results
AND Rm,Rn               Rn & Rm → Rn  0010nnnnmmmm1001  1  —
XOR Rm,Rn               Rn ^ Rm → Rn  0010nnnnmmmm1010  1  —
OR Rm,Rn                Rn | Rm → Rn  0010nnnnmmmm1011  1  —
CMP/STR Rm,Rn           When a byte in Rn equals a byte in Rm, 1 → T  0010nnnnmmmm1100  1  Comparison result
XTRCT Rm,Rn             Rm: Center 32 bits of Rn → Rn  0010nnnnmmmm1101  1  —
MULU.W Rm,Rn            Unsigned, Rn × Rm → MAC  0010nnnnmmmm1110  1 to 3*1  —
MULS.W Rm,Rn            Signed, Rn × Rm → MAC  0010nnnnmmmm1111  1 to 3*1  —
CMP/EQ Rm,Rn            When Rn = Rm, 1 → T  0011nnnnmmmm0000  1  Comparison result
CMP/HS Rm,Rn            When unsigned and Rn ≥ Rm, 1 → T  0011nnnnmmmm0010  1  Comparison result
CMP/GE Rm,Rn            When signed and Rn ≥ Rm, 1 → T  0011nnnnmmmm0011  1  Comparison result
DIV1 Rm,Rn              1 step division (Rn ÷ Rm)  0011nnnnmmmm0100  1  Calculation result
DMULU.L Rm,Rn           Unsigned operation of Rn × Rm → MACH, MACL  0011nnnnmmmm0101  2 to 4*1  —
CMP/HI Rm,Rn            When unsigned and Rn > Rm, 1 → T  0011nnnnmmmm0110  1  Comparison result
CMP/GT Rm,Rn            When signed and Rn > Rm, 1 → T  0011nnnnmmmm0111  1  Comparison result
SUB Rm,Rn               Rn – Rm → Rn  0011nnnnmmmm1000  1  —
SUBC Rm,Rn              Rn – Rm – T → Rn, Borrow → T  0011nnnnmmmm1010  1  Borrow
SUBV Rm,Rn              Rn – Rm → Rn, underflow → T  0011nnnnmmmm1011  1  Underflow
ADD Rm,Rn               Rm + Rn → Rn  0011nnnnmmmm1100  1  —
DMULS.L Rm,Rn           Signed operation of Rn × Rm → MACH, MACL  0011nnnnmmmm1101  2 to 4*1  —
ADDC Rm,Rn              Rn + Rm + T → Rn, carry → T  0011nnnnmmmm1110  1  Carry
ADDV Rm,Rn              Rn + Rm → Rn, overflow → T  0011nnnnmmmm1111  1  Overflow
SHLL Rn                 T ← Rn ← 0  0100nnnn00000000  1  MSB
SHLR Rn                 0 → Rn → T  0100nnnn00000001  1  LSB
STS.L MACH,@-Rn         Rn – 4 → Rn, MACH → (Rn)  0100nnnn00000010  1  —
STC.L SR,@-Rn           Rn – 4 → Rn, SR → (Rn)  0100nnnn00000011  2  —
ROTL Rn                 T ← Rn ← MSB  0100nnnn00000100  1  MSB
ROTR Rn                 LSB → Rn → T  0100nnnn00000101  1  LSB
LDS.L @Rm+,MACH         (Rm) → MACH, Rm + 4 → Rm  0100mmmm00000110  1  —
LDC.L @Rm+,SR           (Rm) → SR, Rm + 4 → Rm  0100mmmm00000111  3  LSB
SHLL2 Rn                Rn << 2 → Rn  0100nnnn00001000  1  —
SHLR2 Rn                Rn >> 2 → Rn  0100nnnn00001001  1  —
LDS Rm,MACH             Rm → MACH  0100mmmm00001010  1  —
JSR @Rm                 Delayed branching, PC → PR, Rm → PC  0100mmmm00001011  2  —
LDC Rm,SR               Rm → SR  0100mmmm00001110  1  LSB
DT Rn                   Rn – 1 → Rn, when Rn is 0, 1 → T. When Rn is nonzero, 0 → T  0100nnnn00010000  1  Comparison result
CMP/PZ Rn               When Rn ≥ 0, 1 → T  0100nnnn00010001  1  Comparison result
STS.L MACL,@-Rn         Rn – 4 → Rn, MACL → (Rn)  0100nnnn00010010  1  —
STC.L GBR,@-Rn          Rn – 4 → Rn, GBR → (Rn)  0100nnnn00010011  2  —
CMP/PL Rn               When Rn > 0, 1 → T  0100nnnn00010101  1  Comparison result
LDS.L @Rm+,MACL         (Rm) → MACL, Rm + 4 → Rm  0100mmmm00010110  1  —
LDC.L @Rm+,GBR          (Rm) → GBR, Rm + 4 → Rm  0100mmmm00010111  3  —
SHLL8 Rn                Rn << 8 → Rn  0100nnnn00011000  1  —
SHLR8 Rn                Rn >> 8 → Rn  0100nnnn00011001  1  —
LDS Rm,MACL             Rm → MACL  0100mmmm00011010  1  —
TAS.B @Rn               When (Rn) is 0, 1 → T, 1 → MSB of (Rn)  0100nnnn00011011  4  Test results
LDC Rm,GBR              Rm → GBR  0100mmmm00011110  1  —
SHAL Rn                 T ← Rn ← 0  0100nnnn00100000  1  MSB
SHAR Rn                 MSB → Rn → T  0100nnnn00100001  1  LSB
STS.L PR,@-Rn           Rn – 4 → Rn, PR → (Rn)  0100nnnn00100010  1  —
STC.L VBR,@-Rn          Rn – 4 → Rn, VBR → (Rn)  0100nnnn00100011  2  —
ROTCL Rn                T ← Rn ← T  0100nnnn00100100  1  MSB
ROTCR Rn                T → Rn → T  0100nnnn00100101  1  LSB
LDS.L @Rm+,PR           (Rm) → PR, Rm + 4 → Rm  0100mmmm00100110  1  —
LDC.L @Rm+,VBR          (Rm) → VBR, Rm + 4 → Rm  0100mmmm00100111  3  —
SHLL16 Rn               Rn << 16 → Rn  0100nnnn00101000  1  —
SHLR16 Rn               Rn >> 16 → Rn  0100nnnn00101001  1  —
LDS Rm,PR               Rm → PR  0100mmmm00101010  1  —
JMP @Rm                 Delayed branching, Rm → PC  0100mmmm00101011  2  —
LDC Rm,VBR              Rm → VBR  0100mmmm00101110  1  —
STS.L FPUL,@-Rn         Rn – 4 → Rn, FPUL → (Rn)  0100nnnn01010010  1  —
LDS.L @Rm+,FPUL         (Rm) → FPUL, Rm + 4 → Rm  0100mmmm01010110  1  —
LDS Rm,FPUL             Rm → FPUL  0100mmmm01011010  1  —
STS.L FPSCR,@-Rn        Rn – 4 → Rn, FPSCR → (Rn)  0100nnnn01100010  1  —
LDS.L @Rm+,FPSCR        (Rm) → FPSCR, Rm + 4 → Rm  0100mmmm01100110  1  —
LDS Rm,FPSCR            Rm → FPSCR  0100mmmm01101010  1  —
MAC.W @Rm+,@Rn+         With sign, (Rn) × (Rm) + MAC → MAC  0100nnnnmmmm1111  3/(2)*1  —
MOV.L @(disp,Rm),Rn     (disp × 4 + Rm) → Rn  0101nnnnmmmmdddd  1  —
MOV.B @Rm,Rn            (Rm) → sign extension → Rn  0110nnnnmmmm0000  1  —
MOV.W @Rm,Rn            (Rm) → sign extension → Rn  0110nnnnmmmm0001  1  —
MOV.L @Rm,Rn            (Rm) → Rn  0110nnnnmmmm0010  1  —
MOV Rm,Rn               Rm → Rn  0110nnnnmmmm0011  1  —
MOV.B @Rm+,Rn           (Rm) → sign extension → Rn, Rm + 1 → Rm  0110nnnnmmmm0100  1  —
MOV.W @Rm+,Rn           (Rm) → sign extension → Rn, Rm + 2 → Rm  0110nnnnmmmm0101  1  —
MOV.L @Rm+,Rn           (Rm) → Rn, Rm + 4 → Rm  0110nnnnmmmm0110  1  —
NOT Rm,Rn               ~Rm → Rn  0110nnnnmmmm0111  1  —
SWAP.B Rm,Rn            Rm → Swap upper and lower halves of lower 2 bytes → Rn  0110nnnnmmmm1000  1  —
SWAP.W Rm,Rn            Rm → Swap upper and lower word → Rn  0110nnnnmmmm1001  1  —
NEGC Rm,Rn              0 – Rm – T → Rn, Borrow → T  0110nnnnmmmm1010  1  Borrow
NEG Rm,Rn               0 – Rm → Rn  0110nnnnmmmm1011  1  —
EXTU.B Rm,Rn            Zero-extend Rm from byte → Rn  0110nnnnmmmm1100  1  —
EXTU.W Rm,Rn            Zero-extend Rm from word → Rn  0110nnnnmmmm1101  1  —
EXTS.B Rm,Rn            Sign-extend Rm from byte → Rn  0110nnnnmmmm1110  1  —
EXTS.W Rm,Rn            Sign-extend Rm from word → Rn  0110nnnnmmmm1111  1  —
ADD #imm,Rn             Rn + imm → Rn  0111nnnniiiiiiii  1  —
MOV.B R0,@(disp,Rn)     R0 → (disp + Rn)  10000000nnnndddd  1  —
MOV.W R0,@(disp,Rn)     R0 → (disp × 2 + Rn)  10000001nnnndddd  1  —
MOV.B @(disp,Rm),R0     (disp + Rm) → sign extension → R0  10000100mmmmdddd  1  —
MOV.W @(disp,Rm),R0     (disp × 2 + Rm) → sign extension → R0  10000101mmmmdddd  1  —
CMP/EQ #imm,R0          When R0 = imm, 1 → T  10001000iiiiiiii  1  Comparison results
BT label                When T = 1, disp × 2 + PC → PC; when T = 0, nop  10001001dddddddd  3/1*2  —
BF label                When T = 0, disp × 2 + PC → PC; when T = 1, nop  10001011dddddddd  3/1*2  —
BT/S label              If T = 1, disp × 2 + PC → PC; if T = 0, nop  10001101dddddddd  2/1*2  —
BF/S label              If T = 0, disp × 2 + PC → PC; if T = 1, nop  10001111dddddddd  2/1*2  —
MOV.W @(disp,PC),Rn     (disp × 2 + PC) → sign extension → Rn  1001nnnndddddddd  1  —
BRA label               Delayed branching, disp × 2 + PC → PC  1010dddddddddddd  2  —
BSR label               Delayed branching, PC → PR, disp × 2 + PC → PC  1011dddddddddddd  2  —
MOV.B R0,@(disp,GBR)    R0 → (disp + GBR)  11000000dddddddd  1  —
MOV.W R0,@(disp,GBR)    R0 → (disp × 2 + GBR)  11000001dddddddd  1  —
MOV.L R0,@(disp,GBR)    R0 → (disp × 4 + GBR)  11000010dddddddd  1  —
TRAPA #imm              PC/SR → stack area, (imm × 4 + VBR) → PC  11000011iiiiiiii  8  —
MOV.B @(disp,GBR),R0    (disp + GBR) → sign extension → R0  11000100dddddddd  1  —
MOV.W @(disp,GBR),R0    (disp × 2 + GBR) → sign extension → R0  11000101dddddddd  1  —
MOV.L @(disp,GBR),R0    (disp × 4 + GBR) → R0  11000110dddddddd  1  —
MOVA @(disp,PC),R0      disp × 4 + PC → R0  11000111dddddddd  1  —
TST #imm,R0             R0 & imm, when result is 0, 1 → T  11001000iiiiiiii  1  Test results
AND #imm,R0             R0 & imm → R0  11001001iiiiiiii  1  —
XOR #imm,R0             R0 ^ imm → R0  11001010iiiiiiii  1  —
OR #imm,R0              R0 | imm → R0  11001011iiiiiiii  1  —
TST.B #imm,@(R0,GBR)    (R0 + GBR) & imm, when result is 0, 1 → T  11001100iiiiiiii  3  Test results
AND.B #imm,@(R0,GBR)    (R0 + GBR) & imm → (R0 + GBR)  11001101iiiiiiii  3  —
XOR.B #imm,@(R0,GBR)    (R0 + GBR) ^ imm → (R0 + GBR)  11001110iiiiiiii  3  —
OR.B #imm,@(R0,GBR)     (R0 + GBR) | imm → (R0 + GBR)  11001111iiiiiiii  3  —
MOV.L @(disp,PC),Rn     (disp × 4 + PC) → Rn  1101nnnndddddddd  1  —
MOV #imm,Rn             imm → sign extension → Rn  1110nnnniiiiiiii  1  —
FSTS FPUL,FRn           FPUL → FRn  1111nnnn00001101  1  —
FLDS FRm,FPUL           FRm → FPUL  1111mmmm00011101  1  —
FLOAT FPUL,FRn          (float)FPUL → FRn  1111nnnn00101101  1  —
FTRC FRm,FPUL           (long)FRm → FPUL  1111mmmm00111101  1  —
FNEG FRn                -FRn → FRn  1111nnnn01001101  1  —
FABS FRn                |FRn| → FRn  1111nnnn01011101  1  —
FLDI0 FRn               H'00000000 → FRn  1111nnnn10001101  1  —
FLDI1 FRn               H'3F800000 → FRn  1111nnnn10011101  1  —
FADD FRm,FRn            FRn + FRm → FRn  1111nnnnmmmm0000  1  —
FSUB FRm,FRn            FRn – FRm → FRn  1111nnnnmmmm0001  1  —
FMUL FRm,FRn            FRn × FRm → FRn  1111nnnnmmmm0010  1  —
FDIV FRm,FRn            FRn/FRm → FRn  1111nnnnmmmm0011  13  —
FCMP/EQ FRm,FRn         (FRn = FRm)? 1:0 → T  1111nnnnmmmm0100  1  Comparison result
FCMP/GT FRm,FRn         (FRn > FRm)? 1:0 → T  1111nnnnmmmm0101  1  Comparison result
FMOV.S @(R0,Rm),FRn     (R0 + Rm) → FRn  1111nnnnmmmm0110  1  —
FMOV.S FRm,@(R0,Rn)     FRm → (R0 + Rn)  1111nnnnmmmm0111  1  —
FMOV.S @Rm,FRn          (Rm) → FRn  1111nnnnmmmm1000  1  —
FMOV.S @Rm+,FRn         (Rm) → FRn, Rm + 4 → Rm  1111nnnnmmmm1001  1  —
FMOV.S FRm,@Rn          FRm → (Rn)  1111nnnnmmmm1010  1  —
FMOV.S FRm,@-Rn         Rn – 4 → Rn, FRm → (Rn)  1111nnnnmmmm1011  1  —
FMOV FRm,FRn            FRm → FRn  1111nnnnmmmm1100  1  —
FMAC FR0,FRm,FRn        FR0 × FRm + FRn → FRn  1111nnnnmmmm1110  1  —

Notes: 1. Normal minimum number of execution states (the number in parentheses is the number of states when there is contention with preceding/following instructions).
2. One state when it does not branch.

A.4 Operation Code Map

Table A.56 shows the operation code map.
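As a rough illustration of how the code patterns in Table A.55 are read, the following Python sketch matches a 16-bit instruction code against a handful of the listed patterns and pulls out the nnnn, mmmm, immediate, and displacement fields. The function names `sign_extend` and `decode` are hypothetical, only a few instructions are covered, and the `pc` argument is simply whatever PC value the instruction sees; the sketch does not model how the hardware derives that value. Treating the 12-bit BRA displacement as a signed quantity is an assumption stated here rather than something the table spells out.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode(code: int, pc: int = 0):
    """Match a 16-bit code against a few Table A.55 patterns (illustrative only)."""
    top = (code >> 12) & 0xF   # bits 15-12
    n = (code >> 8) & 0xF      # nnnn field, bits 11-8
    m = (code >> 4) & 0xF      # mmmm field, bits 7-4
    low = code & 0xF           # bits 3-0

    if code == 0x0009:                      # 0000000000001001
        return ("NOP",)
    if top == 0x6 and low == 0x3:           # 0110nnnnmmmm0011
        return ("MOV", f"R{m}", f"R{n}")
    if top == 0x3 and low == 0xC:           # 0011nnnnmmmm1100
        return ("ADD", f"R{m}", f"R{n}")
    if top == 0xE:                          # 1110nnnniiiiiiii, imm is sign-extended
        return ("MOV", f"#{sign_extend(code, 8)}", f"R{n}")
    if top == 0xA:                          # 1010dddddddddddd, disp x 2 + PC
        target = (sign_extend(code, 12) * 2 + pc) & 0xFFFFFFFF
        return ("BRA", target)
    return ("UNKNOWN", code)
```

For example, `decode(0x6123)` yields `("MOV", "R2", "R1")`, because the nnnn field (bits 11 to 8) names the destination and the mmmm field (bits 7 to 4) names the source.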
Table A.56 Operation Code Map

Each row gives the instruction code (MSB to LSB) followed by the instructions selected by the Fx or MD field. The four columns of the map are Fx: 0000 / MD: 00, Fx: 0001 / MD: 01, Fx: 0010 / MD: 10, and Fx: 0011–1111 / MD: 11; a row with no entries has no instruction assigned.

0000 Rn Fx 0000
0000 Rn Fx 0001
0000 Rn Fx 0010    Fx 0000: STC SR,Rn    Fx 0001: STC GBR,Rn    Fx 0010: STC VBR,Rn
0000 Rm Fx 0011    Fx 0000: BSRF Rm    Fx 0010: BRAF Rm
0000 Rn Rm 01MD    MD 00: MOV.B Rm,@(R0,Rn)    MD 01: MOV.W Rm,@(R0,Rn)    MD 10: MOV.L Rm,@(R0,Rn)    MD 11: MUL.L Rm,Rn
0000 0000 Fx 1000    Fx 0000: CLRT    Fx 0001: SETT    Fx 0010: CLRMAC
0000 0000 Fx 1001    Fx 0000: NOP    Fx 0001: DIV0U
0000 0000 Fx 1010
0000 0000 Fx 1011    Fx 0000: RTS    Fx 0001: SLEEP    Fx 0010: RTE
0000 Rn Fx 1000
0000 Rn Fx 1001    Fx 0010: MOVT Rn
0000 Rn Fx 1010    Fx 0000: STS MACH,Rn    Fx 0001: STS MACL,Rn    Fx 0010: STS PR,Rn    Fx 0011–1111: STS FPUL,Rn / STS FPSCR,Rn
0000 Rn Fx 1011
0000 Rn Rm 11MD    MD 00: MOV.B @(R0,Rm),Rn    MD 01: MOV.W @(R0,Rm),Rn    MD 10: MOV.L @(R0,Rm),Rn    MD 11: MAC.L @Rm+,@Rn+
0001 Rn Rm disp    MOV.L Rm,@(disp:4,Rn)
0010 Rn Rm 00MD    MD 00: MOV.B Rm,@Rn    MD 01: MOV.W Rm,@Rn    MD 10: MOV.L Rm,@Rn
0010 Rn Rm 01MD    MD 00: MOV.B Rm,@-Rn    MD 01: MOV.W Rm,@-Rn    MD 10: MOV.L Rm,@-Rn    MD 11: DIV0S Rm,Rn
0010 Rn Rm 10MD    MD 00: TST Rm,Rn    MD 01: AND Rm,Rn    MD 10: XOR Rm,Rn    MD 11: OR Rm,Rn
0010 Rn Rm 11MD    MD 00: CMP/STR Rm,Rn    MD 01: XTRCT Rm,Rn    MD 10: MULU.W Rm,Rn    MD 11: MULS.W Rm,Rn
0011 Rn Rm 00MD    MD 00: CMP/EQ Rm,Rn    MD 10: CMP/HS Rm,Rn    MD 11: CMP/GE Rm,Rn
0011 Rn Rm 01MD    MD 00: DIV1 Rm,Rn    MD 01: DMULU.L Rm,Rn    MD 10: CMP/HI Rm,Rn    MD 11: CMP/GT Rm,Rn
0011 Rn Rm 10MD    MD 00: SUB Rm,Rn    MD 10: SUBC Rm,Rn    MD 11: SUBV Rm,Rn
0011 Rn Rm 11MD    MD 00: ADD Rm,Rn    MD 01: DMULS.L Rm,Rn    MD 10: ADDC Rm,Rn    MD 11: ADDV Rm,Rn
0100 Rn Fx 0000    Fx 0000: SHLL Rn    Fx 0001: DT Rn    Fx 0010: SHAL Rn
0100 Rn Fx 0001    Fx 0000: SHLR Rn    Fx 0001: CMP/PZ Rn    Fx 0010: SHAR Rn
0100 Rn Fx 0010    Fx 0000: STS.L MACH,@-Rn    Fx 0001: STS.L MACL,@-Rn    Fx 0010: STS.L PR,@-Rn    Fx 0011–1111: STS.L FPSCR,@-Rn / STS.L FPUL,@-Rn
0100 Rn Fx 0011    Fx 0000: STC.L SR,@-Rn    Fx 0001: STC.L GBR,@-Rn    Fx 0010: STC.L VBR,@-Rn
0100 Rn Fx 0100    Fx 0000: ROTL Rn    Fx 0010: ROTCL Rn
0100 Rn Fx 0101    Fx 0000: ROTR Rn    Fx 0001: CMP/PL Rn    Fx 0010: ROTCR Rn
0100 Rm Fx 0110    Fx 0000: LDS.L @Rm+,MACH    Fx 0001: LDS.L @Rm+,MACL    Fx 0010: LDS.L @Rm+,PR    Fx 0011–1111: LDS.L @Rm+,FPSCR / LDS.L @Rm+,FPUL
0100 Rm Fx 0111    Fx 0000: LDC.L @Rm+,SR    Fx 0001: LDC.L @Rm+,GBR    Fx 0010: LDC.L @Rm+,VBR
0100 Rn Fx 1000    Fx 0000: SHLL2 Rn    Fx 0001: SHLL8 Rn    Fx 0010: SHLL16 Rn
0100 Rn Fx 1001    Fx 0000: SHLR2 Rn    Fx 0001: SHLR8 Rn    Fx 0010: SHLR16 Rn
0100 Rm Fx 1010    Fx 0000: LDS Rm,MACH    Fx 0001: LDS Rm,MACL    Fx 0010: LDS Rm,PR    Fx 0011–1111: LDS Rm,FPSCR / LDS Rm,FPUL
0100 Rm/Rn Fx 1011    Fx 0000: JSR @Rm    Fx 0001: TAS.B @Rn    Fx 0010: JMP @Rm
0100 Rm Fx 1100
0100 Rm Fx 1101
0100 Rm Fx 1110    Fx 0000: LDC Rm,SR    Fx 0001: LDC Rm,GBR    Fx 0010: LDC Rm,VBR    Fx 0011–1111: LDC Rm,SSR
0100 Rn Rm 1111    MAC.W @Rm+,@Rn+
0101 Rn Rm disp    MOV.L @(disp:4,Rm),Rn
0110 Rn Rm 00MD    MD 00: MOV.B @Rm,Rn    MD 01: MOV.W @Rm,Rn    MD 10: MOV.L @Rm,Rn    MD 11: MOV Rm,Rn
0110 Rn Rm 01MD    MD 00: MOV.B @Rm+,Rn    MD 01: MOV.W @Rm+,Rn    MD 10: MOV.L @Rm+,Rn    MD 11: NOT Rm,Rn
0110 Rn Rm 10MD    MD 00: SWAP.B Rm,Rn    MD 01: SWAP.W Rm,Rn    MD 10: NEGC Rm,Rn    MD 11: NEG Rm,Rn
0110 Rn Rm 11MD    MD 00: EXTU.B Rm,Rn    MD 01: EXTU.W Rm,Rn    MD 10: EXTS.B Rm,Rn    MD 11: EXTS.W Rm,Rn
0111 Rn imm    ADD #imm:8,Rn
1000 00MD Rn disp    MD 00: MOV.B R0,@(disp:4,Rn)    MD 01: MOV.W R0,@(disp:4,Rn)
1000 01MD Rm disp    MD 00: MOV.B @(disp:4,Rm),R0    MD 01: MOV.W @(disp:4,Rm),R0
1000 10MD imm/disp    MD 00: CMP/EQ #imm:8,R0    MD 01: BT disp:8    MD 11: BF disp:8
1000 11MD imm/disp    MD 01: BT/S disp:8    MD 11: BF/S disp:8
1001 Rn disp    MOV.W @(disp:8,PC),Rn
1010 disp    BRA disp:12
1011 disp    BSR disp:12
1100 00MD imm/disp    MD 00: MOV.B R0,@(disp:8,GBR)    MD 01: MOV.W R0,@(disp:8,GBR)    MD 10: MOV.L R0,@(disp:8,GBR)    MD 11: TRAPA #imm:8
1100 01MD disp    MD 00: MOV.B @(disp:8,GBR),R0    MD 01: MOV.W @(disp:8,GBR),R0    MD 10: MOV.L @(disp:8,GBR),R0    MD 11: MOVA @(disp:8,PC),R0
1100 10MD imm    MD 00: TST #imm:8,R0    MD 01: AND #imm:8,R0    MD 10: XOR #imm:8,R0    MD 11: OR #imm:8,R0
1100 11MD imm    MD 00: TST.B #imm:8,@(R0,GBR)    MD 01: AND.B #imm:8,@(R0,GBR)    MD 10: XOR.B #imm:8,@(R0,GBR)    MD 11: OR.B #imm:8,@(R0,GBR)
1101 Rn disp    MOV.L @(disp:8,PC),Rn
1110 Rn imm    MOV #imm:8,Rn
1111 —    Floating-point instruction

Appendix B Pipeline Operation and Contention

The SH-2E is designed so that basic instructions are executed in one cycle. An instruction requires two or more cycles when, for example, a branch instruction changes the branch destination address or when contention between MA and IF increases the number of cycles. Table B.1 gives the number of execution cycles and stages for the different types of contention and the instructions affected. Instructions without contention, and instructions that require 2 or more cycles even without contention, are also shown.
Instructions contend in the following ways:

CPU instructions
• Operations and transfers between registers are executed in one cycle with no contention.
• No contention occurs, but the instruction still requires 2 or more cycles.
• Contention occurs, increasing the number of execution cycles. Contention combinations are:
  — MA contends with IF
  — MA contends with IF and sometimes with memory loads as well
  — MA contends with IF and sometimes with the multiplier as well
  — MA contends with IF and sometimes with memory loads and sometimes with the multiplier

Floating-point instructions or FPU-related CPU instructions
• No contention occurs with the FCMP instruction.
• MA contends with IF in the case of store instructions involving FR0 to FR15 and FPUL.
• For floating-point operation instructions other than FDIV, floating-point register transfer instructions, and floating-point register immediate instructions, contention occurs if an instruction that reads from the destination of the instruction follows immediately after it.
• MA contends with IF in the case of load instructions involving FR0 to FR15 and FPUL. Contention also occurs if an instruction that reads from the destination of the load follows immediately after it.
• Contention occurs if an instruction that uses Rn follows the STS FPUL,Rn or STS FPSCR,Rn instruction.
• In the case of FPSCR load instructions, contention occurs as shown in Figure 8.11.
• In the case of FPSCR store instructions, contention occurs as shown in Figure 8.12, and MA contends with IF.
• In the case of the FDIV instruction, contention occurs as shown in Figure 8.13.
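The rule that contention occurs when an instruction reading the destination follows immediately afterwards can be checked mechanically over an instruction sequence. The Python sketch below is a minimal illustration under stated assumptions: the `Insn` record and `adjacent_read_after_write` function are hypothetical names, the register read/write sets are supplied by hand rather than decoded, and the check only reports where a destination is read by the very next instruction. It does not estimate how many cycles are lost, and it should be fed only instructions to which the rule applies (for example, not FCMP or FDIV, which are described separately above).

```python
from typing import List, NamedTuple, Tuple


class Insn(NamedTuple):
    """Hand-annotated instruction: text plus registers written and read."""
    text: str
    writes: Tuple[str, ...]
    reads: Tuple[str, ...]


def adjacent_read_after_write(program: List[Insn]) -> List[Tuple[int, str]]:
    """Return (index, register) for each instruction that reads a register
    written by the instruction immediately before it."""
    hits = []
    for i in range(len(program) - 1):
        shared = set(program[i].writes) & set(program[i + 1].reads)
        for reg in sorted(shared):
            hits.append((i + 1, reg))
    return hits


# Hypothetical sequence: FMUL reads FR2 right after FADD writes it.
example = [
    Insn("FADD FR1,FR2", writes=("FR2",), reads=("FR1", "FR2")),
    Insn("FMUL FR2,FR3", writes=("FR3",), reads=("FR2", "FR3")),
    Insn("FMOV FR0,FR4", writes=("FR4",), reads=("FR0",)),
]
```

Calling `adjacent_read_after_write(example)` flags the second instruction and register FR2; reordering so that an unrelated instruction sits between the writer and the reader removes the hit.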
Table B.1 Instructions and Their Contention Patterns

Cycles        Stages  Instructions

Contention: None
1             3       • Transfers between registers
                      • Operations between registers (except when a multiplier is involved)
                      • Logical operations between registers
                      • Shift instructions
                      • System control ALU instructions
2             3       Unconditional branches
3/1           3       Conditional branches
3             3       SLEEP instruction
4             5       RTE instruction
8             9       TRAPA instruction

Contention: MA contends with IF
1             4       • Memory store instructions
                      • STS.L instruction (PR)
2             4       STC.L instruction
3             6       Memory logic operations
4             6       TAS instruction

Contention: MA contends with IF and sometimes with memory loads as well
1             5       • Memory load instructions
                      • LDS.L instruction (PR)
3             5       LDC.L instruction

Contention: MA contends with IF and sometimes with the multiplier as well
1             4       • Register to MAC transfer instructions
                      • Memory to MAC transfer instructions
                      • MAC to memory transfer instructions
1 to 3*       6       Multiplication instructions
3/(2)*        7       Multiply/accumulate instructions
3/(2 to 4)*   9       Double length multiply/accumulate instructions (SH-2 CPU only)
2 to 4*       9       Double length multiplication instructions (SH-2 CPU only)

Contention: MA contends with IF and sometimes with memory loads and sometimes with the multiplier
1             5       MAC to register transfer instructions

Note: * The normal minimum number of execution states. (The number in parentheses is the number in contention with the preceding/following instructions.)
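The 3/1 entry for conditional branches in Table B.1 can be turned into a quick cycle estimate for a loop. The Python sketch below assumes the reading given in the notes to Appendix A, namely that the larger figure applies when the branch is taken and one state applies when it is not; the function name `conditional_branch_cycles` and its parameters are hypothetical, and the estimate covers only the branch instruction itself, not the loop body or any contention with neighboring instructions.

```python
def conditional_branch_cycles(taken: int, not_taken: int,
                              taken_cost: int = 3, not_taken_cost: int = 1) -> int:
    """Total cycles spent in one conditional branch instruction.

    taken / not_taken are how many times the branch was or was not taken.
    The default costs follow the 3/1 entry; pass taken_cost=2 for the
    delayed forms listed as 2/1 in Appendix A.
    """
    if taken < 0 or not_taken < 0:
        raise ValueError("counts must be non-negative")
    return taken * taken_cost + not_taken * not_taken_cost


# A loop that runs 10 times takes its closing branch 9 times and
# falls through once.
loop_branch_cycles = conditional_branch_cycles(taken=9, not_taken=1)
```

With the default 3/1 costs the closing branch of a 10-iteration loop accounts for 9 × 3 + 1 = 28 cycles; with the 2/1 figure it accounts for 19.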
Table B.2 Types of Contention and Instruction Behavior (Floating-point Instructions or FPU-related CPU Instructions)

Contention: None
  Cycles: 1    Stages: 3 (FPU pipeline), 3 (CPU pipeline)
  Instructions: FCMP/EQ FRm,FRn; FCMP/GT FRm,FRn

Contention:
  • MA in CPU pipeline contends with IF
  Cycles: 1    Stages: 4 (FPU pipeline), 4 (CPU pipeline)
  Instructions: STS.L FPUL,@-Rn; FMOV.S FRm,@Rn; FMOV.S FRm,@-Rn; FMOV.S FRm,@(R0,Rn)

Contention:
  • Contention occurs if next instruction reads destination register
  Cycles: 1    Stages: 5 (FPU pipeline), 3 (CPU pipeline)
  Instructions: FLDS FRm,FPUL; FMOV FRm,FRn; FSTS FPUL,FRn; FLDI0 FRn; FLDI1 FRn; FABS FRn; FADD FRm,FRn; FLOAT FPUL,FRn; FMAC FR0,FRm,FRn; FMUL FRm,FRn; FNEG FRn; FSUB FRm,FRn; FTRC FRm,FPUL

Contention:
  • Contention occurs if next instruction reads destination register
  • MA in CPU pipeline contends with IF
  Cycles: 1    Stages: 5 (FPU pipeline), 4 (CPU pipeline)
  Instructions: LDS Rm,FPUL; LDS.L @Rm+,FPUL; FMOV.S @Rm,FRn; FMOV.S @Rm+,FRn; FMOV.S @(R0,Rm),FRn

Contention:
  • Contention occurs if next instruction uses Rn
  • MA in CPU pipeline contends with IF
  Cycles: 1    Stages: 4 (FPU pipeline), 5 (CPU pipeline)
  Instructions: STS FPUL,Rn

Contention:
  • Contention occurs as shown in Figure 8.11
  Cycles: 1    Stages: 5 (FPU pipeline), 4 (CPU pipeline)
  Instructions: LDS Rm,FPSCR; LDS.L @Rm+,FPSCR

Contention:
  • Contention occurs as shown in Figure 8.12
  • Contention occurs if next instruction uses Rn
  • MA in CPU pipeline contends with IF
  Cycles: 1    Stages: 4 (FPU pipeline), 5 (CPU pipeline)
  Instructions: STS FPSCR,Rn

Contention:
  • Contention occurs as shown in Figure 8.12
  • MA in CPU pipeline contends with IF
  Cycles: 1    Stages: 4 (FPU pipeline), 4 (CPU pipeline)
  Instructions: STS.L FPSCR,@-Rn

Contention:
  • Contention occurs as shown in Figure 8.13
  Cycles: 13    Stages: 17 (FPU pipeline), 3 (CPU pipeline)
  Instructions: FDIV FRm,FRn