e200z759n3 Core Reference Manual
Supports: e200z759n3
e200z759n3CRM
Rev. 2
January 2015

e200z759n3 Core Reference Manual, Rev. 2
Freescale Semiconductor

Chapter 1 e200z759n3 Overview
1.1 Overview of the e200z759n3 ..... 23
  1.1.1 Features ..... 23
  1.1.2 Microarchitecture summary ..... 24
    1.1.2.1 Instruction unit features ..... 26
    1.1.2.2 Integer unit features ..... 27
    1.1.2.3 Load/store unit features ..... 27
    1.1.2.4 Cache features ..... 27
    1.1.2.5 MMU Features ..... 28
    1.1.2.6 e200z759n3 system bus features ..... 28

Chapter 2 Register Model
2.1 PowerPC Book E registers ..... 33
2.2 Zen-specific special purpose registers ..... 35
2.3 Zen-specific device control registers ..... 37
2.4 Special-purpose register descriptions ..... 37
  2.4.1 Machine State Register (MSR) ..... 37
  2.4.2 Processor ID Register (PIR) ..... 39
  2.4.3 Processor Version Register (PVR) ..... 40
  2.4.4 System Version Register (SVR) ..... 41
  2.4.5 Integer Exception Register (XER) ..... 41
  2.4.6 Exception Syndrome Register ..... 42
    2.4.6.1 PowerPC VLE mode instruction syndrome ..... 44
    2.4.6.2 Misaligned instruction fetch syndrome ..... 44
  2.4.7 Machine Check Syndrome Register (MCSR) ..... 45
  2.4.8 Timer Control Register (TCR) ..... 47
  2.4.9 Timer Status Register (TSR) ..... 48
  2.4.10 Debug registers ..... 49
  2.4.11 Hardware Implementation Dependent Register 0 (HID0) ..... 50
  2.4.12 Hardware Implementation Dependent Register 1 (HID1) ..... 52
  2.4.13 Branch Unit Control and Status Register (BUCSR) ..... 53
  2.4.14 L1 Cache Control and Status Registers (L1CSR0, L1CSR1) ..... 54
  2.4.15 L1 Cache Configuration registers (L1CFG0, L1CFG1) ..... 54
  2.4.16 L1 Cache Flush and Invalidate registers (L1FINV0, L1FINV1) ..... 55
  2.4.17 MMU Control and Status Register (MMUCSR0) ..... 55
  2.4.18 MMU Configuration register (MMUCFG) ..... 55
  2.4.19 TLB Configuration registers (TLB0CFG, TLB1CFG) ..... 55
2.5 SPR register access ..... 55
  2.5.1 Invalid SPR references ..... 55
  2.5.2 Synchronization requirements for SPRs ..... 56
  2.5.3 Special purpose register summary ..... 57
2.6 Reset settings ..... 60

Chapter 3 Instruction Model
3.1 Unsupported instructions and instruction forms ..... 65
3.2 Implementation-specific instructions ..... 65
3.3 Book E instruction extensions ..... 66
3.4 Memory access alignment support ..... 66
3.5 Memory synchronization and reservation instructions ..... 66
3.6 Branch prediction ..... 68
3.7 Interruption of instructions by interrupt requests ..... 68
3.8 New Zen instructions and APUs ..... 68
3.9 ISEL APU ..... 69
3.10 Debug APU ..... 69
  3.10.1 Debug notify halt instructions ..... 71
3.11 Machine Check APU ..... 73
3.12 WAIT APU ..... 75
3.13 Enhanced reservations APU ..... 76
3.14 Volatile Context Save/Restore APU ..... 79
3.15 Unimplemented SPRs and read-only SPRs ..... 87
3.16 Invalid forms of instructions ..... 87
  3.16.1 Load and store with update instructions ..... 87
  3.16.2 Load multiple word (lmw, e_lmw) instruction ..... 87
  3.16.3 Branch conditional to count register instructions ..... 87
  3.16.4 Instructions with reserved fields non-zero ..... 88
3.17 Instruction summary ..... 88
  3.17.1 Instruction index sorted by mnemonic ..... 89
  3.17.2 Instruction index sorted by opcode ..... 102

Chapter 4 Instruction Pipeline and Execution Timing
4.1 Overview of operation ..... 117
  4.1.1 Control unit ..... 119
  4.1.2 Instruction unit ..... 119
  4.1.3 Branch unit ..... 119
  4.1.4 Instruction decode unit ..... 119
  4.1.5 Exception handling ..... 120
4.2 Execution units ..... 120
  4.2.1 Integer execution units ..... 120
  4.2.2 Load / store unit ..... 120
  4.2.3 Embedded floating-point execution units ..... 120
4.3 Instruction pipeline ..... 120
  4.3.1 Description of pipeline stages ..... 122
  4.3.2 Instruction prefetch buffers and branch target buffer ..... 123
  4.3.3 Single-cycle instruction pipeline operation ..... 125
  4.3.4 Basic load and store instruction pipeline operation ..... 125
  4.3.5 Change-of-flow instruction pipeline operation ..... 126
  4.3.6 Basic multi-cycle instruction pipeline operation ..... 128
  4.3.7 Additional examples of instruction pipeline operation for load and store ..... 129
  4.3.8 Move to/from SPR instruction pipeline operation ..... 131
4.4 Control hazards ..... 133
4.5 Instruction serialization ..... 133
  4.5.1 Completion serialization ..... 133
  4.5.2 Dispatch serialization ..... 134
  4.5.3 Refetch serialization ..... 134
4.6 Concurrent instruction execution ..... 135
4.7 Instruction Timings ..... 136
4.8 Operand placement on performance ..... 141

Chapter 5 Embedded Floating-Point APU (EFPU2)
5.1 Nomenclature and conventions ..... 143
5.2 EFPU programming model ..... 143
  5.2.1 Signal Processing Extension / Embedded Floating-point Status and Control Register (SPEFSCR) ..... 143
  5.2.2 GPRs and PowerISA 2.06 instructions ..... 147
  5.2.3 SPE/EFPU available bit in MSR ..... 147
  5.2.4 Embedded floating-point exception bit in ESR ..... 147
  5.2.5 EFPU exceptions ..... 147
    5.2.5.1 EFPU unavailable exception ..... 148
    5.2.5.2 Embedded floating-point data exception ..... 148
    5.2.5.3 Embedded floating-point round exception ..... 148
  5.2.6 Exception Priorities ..... 149
5.3 Embedded floating-point APU operations ..... 149
  5.3.1 Floating-point data formats ..... 149
    5.3.1.1 Single-precision floating-point format ..... 150
    5.3.1.2 Half-precision floating-point format ..... 151
  5.3.2 IEEE 754 compliance ..... 152
  5.3.3 Floating-point exceptions ..... 153
  5.3.4 Embedded scalar single-precision floating-point instructions ..... 153
  5.3.5 EFPU Vector Single-precision Embedded Floating-Point Instructions ..... 186
5.4 Embedded floating-point results summary ..... 238
5.5 EFPU instruction timing ..... 253
  5.5.1 EFPU single-precision vector floating-point instruction timing ..... 254
  5.5.2 EFPU single-precision scalar floating-point instruction timing ..... 255
5.6 Instruction forms and opcodes ..... 256
  5.6.1 Opcodes for EFPU vector floating-point instructions ..... 257
  5.6.2 Opcodes for EFPU scalar single-precision floating-point instructions ..... 259

Chapter 6 Signal Processing Extension APU (SPE APU)
6.1 Nomenclature and conventions ..... 261
6.2 SPE programming model ..... 261
  6.2.1 SPE Status and Control Register (SPEFSCR) ..... 261
  6.2.2 Accumulator ..... 263
    6.2.2.1 Context switch ..... 264
  6.2.3 GPRs and PowerPC Book E instructions ..... 264
  6.2.4 SPE available bit in MSR ..... 264
  6.2.5 SPE exception bit in ESR ..... 264
  6.2.6 SPE exceptions ..... 264
    6.2.6.1 SPE APU Unavailable exception ..... 265
  6.2.7 Exception priorities ..... 265
6.3 Integer SPE simple instructions ..... 265
6.4 Integer SPE multiply, multiply-accumulate, and operation to accumulator instructions (complex integer instructions) ..... 307
  6.4.1 Multiply halfword instructions ..... 308
  6.4.2 Multiply words instructions ..... 372
  6.4.3 Add/subtract word to accumulator instructions ..... 412
  6.4.4 Initializing and reading the accumulator ..... 420
6.5 SPE vector load/store instructions ..... 422
6.6 SPE instruction timing ..... 458
  6.6.1 SPE integer simple instructions timing ..... 458
  6.6.2 SPE load and store instruction timing ..... 460
  6.6.3 SPE complex integer instruction timing ..... 461
6.7 Instruction forms and opcodes ..... 465
  6.7.1 SPE vector integer simple instructions ..... 466
  6.7.2 Opcodes for SPE load and store instructions ..... 468
  6.7.3 Opcodes for SPE complex integer instructions ..... 469

Chapter 7 Interrupts and Exceptions
7.1 e200z759n3 interrupts ..... 479
7.2 Exception Syndrome Register (ESR) ..... 482
7.3 Machine State Register (MSR) ..... 484
  7.3.1 Machine Check Syndrome Register (MCSR) ..... 486
7.4 Interrupt Vector Prefix Registers (IVPR) ..... 489
7.5 Interrupt Vector Offset Registers (IVORxx) ..... 490
7.6 Hardware Interrupt Vector Offset Values (p_voffset[0:15]) ..... 490
7.7 Interrupt definitions ..... 491
  7.7.1 Critical Input interrupt (IVOR0) ..... 491
  7.7.2 Machine Check interrupt (IVOR1) ..... 492
    7.7.2.1 Machine check causes ..... 492
      7.7.2.1.1 Error report machine check exceptions ..... 492
      7.7.2.1.2 Non-maskable interrupt machine check exceptions ..... 497
      7.7.2.1.3 Asynchronous machine check exceptions ..... 497
    7.7.2.2 Machine check interrupt actions ..... 504
    7.7.2.3 Checkstop state ..... 505
  7.7.3 Data Storage interrupt (IVOR2) ..... 505
  7.7.4 Instruction Storage interrupt (IVOR3) ..... 506
  7.7.5 External Input interrupt (IVOR4) ..... 507
  7.7.6 Alignment interrupt (IVOR5) ..... 508
  7.7.7 Program interrupt (IVOR6) ..... 508
  7.7.8 Floating-Point Unavailable interrupt (IVOR7) ..... 509
  7.7.9 System Call interrupt (IVOR8) ..... 510
  7.7.10 Auxiliary Processor Unavailable interrupt (IVOR9) ..... 510
  7.7.11 Decrementer interrupt (IVOR10) ..... 510
  7.7.12 Fixed-Interval Timer interrupt (IVOR11) ..... 511
  7.7.13 Watchdog Timer interrupt (IVOR12) ..... 512
  7.7.14 Data TLB Error interrupt (IVOR13) ..... 512
  7.7.15 Instruction TLB Error interrupt (IVOR14) ..... 513
  7.7.16 Debug interrupt (IVOR15) ..... 514
  7.7.17 System Reset interrupt ..... 516
  7.7.18 SPE/EFPU APU Unavailable interrupt (IVOR32) ..... 518
  7.7.19 Embedded Floating-point Data interrupt (IVOR33) ..... 518
  7.7.20 Embedded Floating-point Round interrupt (IVOR34) ..... 519
  7.7.21 Performance monitor interrupt (IVOR35) ..... 519
7.8 Exception recognition and priorities ..... 520
  7.8.1 Exception priorities ..... 522
7.9 Interrupt processing ..... 525
  7.9.1 Enabling and disabling exceptions ..... 526
  7.9.2 Returning from an interrupt handler ..... 527
7.10 Process switching ..... 527

Chapter 8 Performance Monitor
8.1 Overview ..... 529
8.2 Performance Monitor APU instructions ..... 530
8.3 Performance Monitor APU registers ..... 531
  8.3.1 Invalid PMR references ..... 532
  8.3.2 References to read-only PMRs ..... 532
  8.3.3 Performance Monitor Global Control Register 0 (PMGC0) ..... 532
  8.3.4 User Performance Monitor Global Control Register 0 (UPMGC0) ..... 534
  8.3.5 Performance Monitor Local Control A Registers (PMLCa0–PMLCa3) ..... 534
  8.3.6 User Performance Monitor Local Control A Registers (UPMLCa0–UPMLCa3) ..... 535
  8.3.7 Performance Monitor Local Control B Registers (PMLCb0–PMLCb3) ..... 535
  8.3.8 User Performance Monitor Local Control B registers (UPMLCb0–UPMLCb3) ..... 540
  8.3.9 Performance Monitor Counter registers (PMC0–PMC3) ..... 540
  8.3.10 User Performance Monitor Counter registers (UPMC0–UPMC3) ..... 541
8.4 Performance monitor interrupt ..... 541
8.5 Event counting ..... 542
  8.5.1 MSR-based context filtering ..... 542
8.6 Examples ..... 543
  8.6.1 Chaining counters ..... 543
  8.6.2 Thresholding ..... 543
8.7 Event selection ..... 544

Chapter 9 Power Management
9.1 Power management ..... 555
  9.1.1 Active state ..... 555
  9.1.2 Waiting state ..... 555
  9.1.3 Halted state ..... 555
  9.1.4 Stopped state ..... 556
  9.1.5 Power management pins ..... 556
  9.1.6 Power management control bits ..... 557
  9.1.7 Software considerations for power management using wait instructions ..... 557
  9.1.8 Software considerations for power management using Doze, Nap or Sleep ..... 558
  9.1.9 Debug considerations for power management ..... 558

Chapter 10 Memory Management Unit
10.1 Overview ..... 559
10.2 Effective to real address translation ..... 559
  10.2.1 Effective addresses ..... 559
  10.2.2 Address spaces ..... 559
  10.2.3 Process ID ..... 560
  10.2.4 Translation flow ..... 560
  10.2.5 Permissions ..... 562
  10.2.6 Restrictions on 1 KB and 2 KB page size usage ..... 563
10.3 Translation Lookaside Buffer (TLB) ..... 563
10.4 Configuration information ..... 564
  10.4.1 MMU Configuration Register (MMUCFG) ..... 564
  10.4.2 TLB0 Configuration Register (TLB0CFG) ..... 565
  10.4.3 TLB1 Configuration Register (TLB1CFG) ..... 566
10.5 Software interface and TLB instructions ..... 567
  10.5.1 TLB read entry instruction (tlbre) ..... 568
  10.5.2 TLB write entry instruction (tlbwe) ..... 568
  10.5.3 TLB search instruction (tlbsx) ..... 568
  10.5.4 TLB Invalidate (tlbivax) Instruction ..... 569
  10.5.5 TLB synchronize instruction (tlbsync) ..... 570
10.6 TLB operations ..... 571
  10.6.1 Translation reload ..... 571
  10.6.2 Reading the TLB ..... 571
  10.6.3 Writing the TLB ..... 571
  10.6.4 Searching the TLB ..... 571
  10.6.5 TLB miss exception update ..... 572
  10.6.6 IPROT invalidation protection ..... 572
2 8 Freescale Semiconductor 10.6.7 TLB load on reset ...........................................................................................................572 10.6.8 The G bit .........................................................................................................................573 10.7 MMU control registers ..................................................................................................................573 10.7.1 Data Exception Address Register (DEAR) .....................................................................573 10.7.2 MMU Control and Status Register 0 (MMUCSR0) .......................................................573 10.7.3 MMU assist registers (MAS) ..........................................................................................574 10.7.3.1 MMU Read/Write and Replacement Control register (MAS0) ....................574 10.7.3.2 Descriptor Context and Configuration Control register (MAS1) .................575 10.7.3.3 EPN and Page Attributes register (MAS2) ...................................................576 10.7.3.4 RPN and Access Control register (MAS3) ...................................................577 10.7.3.5 Hardware Replacement Assist Configuration register (MAS4) ...................578 10.7.3.6 TLB Search Context Register 0 (MAS6) ......................................................579 10.7.4 MAS registers summary .................................................................................................580 10.7.5 MAS register updates ......................................................................................................580 10.8 TLB coherency control ..................................................................................................................581 10.9 Core interface operation for MMU control instructions ...............................................................581 10.9.1 Transfer type encodings for MMU control instructions 
.................................................581
10.10 Effect of hardware debug on MMU operation ....................................582
10.11 External translation alterations for realtime systems ........................583
Chapter 11 L1 Cache
11.1 Overview ......................................................................585
11.2 16 KB cache organization ......................................................586
11.3 Cache lookup ..................................................................587
11.4 Cache control .................................................................589
11.4.1 L1 Cache Control and Status Register 0 (L1CSR0) .............................589
11.4.2 L1 Cache Control and Status Register 1 (L1CSR1) .............................593
11.4.3 L1 Cache Configuration Register 0 (L1CFG0) ..................................595
11.4.4 L1 Cache Configuration Register 1 (L1CFG1) ..................................596
11.5 Data cache software coherency .................................................597
11.6 Address aliasing ..............................................................597
11.7 Cache Operation ...............................................................598
11.7.1 Cache enable/disable ........................................................598
11.7.2 Cache fills .................................................................598
11.7.3 Cache line replacement ......................................................599
11.7.4 Cache miss access ordering ..................................................599
11.7.5 Cache-inhibited accesses ....................................................599
11.7.6 Guarded accesses ............................................................600
11.7.7 Cache-inhibited guarded accesses ............................................600
11.7.8 Cache invalidation ..........................................................600
11.7.9 Cache flush/invalidate by set and way .......................................601
11.7.9.1 L1 Flush and Invalidate Control Register 0 (L1FINV0) ......................601
11.7.9.2 L1 Flush and Invalidate Control Register 1 (L1FINV1) ......................602
11.8 Cache parity and EDC protection ...............................................603
11.8.1 Cache error action control ..................................................604
11.8.1.1 L1CSR[0,1][I,D]CEA = 00, machine check generation on error ................604
11.8.1.2 L1CSR[0,1][I,D]CEA = 01, correction/auto-invalidation on error ............605
11.8.1.2.1 Instruction cache errors ................................................605
11.8.1.2.2 Data cache errors .......................................................606
11.8.1.2.3 Data cache line flush or invalidation due to reservation instructions (l[b,h,w]arx, st[b,h,w]cx.) ....607
11.8.2 Parity/EDC error handling for cache control operations and instructions .....607
11.8.2.1 L1FINV[0,1] operations ....................................................607
11.8.2.2 Cache touch instructions (dcbt, dcbtst, icbt) .............................608
11.8.2.3 icbi instructions ..........................................................608
11.8.2.4 dcbi instructions ..........................................................608
11.8.2.5 dcbst instructions .........................................................609
11.8.2.6 dcbf instructions ..........................................................609
11.8.2.7 dcbz instructions ..........................................................609
11.8.2.8 Cache locking instructions (dcbtls, dcbtstls, dcblc, icbtls, icblc) .......610
11.8.3 Cache inhibited accesses and parity/EDC errors ..............................610
11.8.4 Snoop operations and parity/EDC errors
........................................................................611
11.8.5 EDC checkbit/syndrome coding scheme generation — ICache ......................611
11.8.6 EDC checkbit/syndrome coding scheme generation — DCache ......................612
11.8.7 Cache error injection .......................................................612
11.9 Push and store buffers ........................................................613
11.10 Cache management instructions ................................................614
11.10.1 Instruction cache block invalidate (icbi) instruction ......................614
11.10.2 Instruction cache block touch (icbt) instruction ...........................614
11.10.3 Data cache block allocate (dcba) instruction ...............................614
11.10.4 Data cache block flush (dcbf) instruction ..................................615
11.10.5 Data cache block invalidate (dcbi) instruction .............................615
11.10.6 Data cache block store (dcbst) instruction .................................615
11.10.7 Data cache block touch (dcbt) instruction ..................................615
11.10.8 Data cache block touch for store (dcbtst) instruction ......................615
11.10.9 Data cache block set to zero (dcbz) instruction ............................615
11.11 Touch instructions ...........................................................616
11.12 Cache line locking/unlocking APU
...............................................................................................616
11.12.1 Overview ...................................................................616
11.12.2 dcbtls — data cache block touch and lock set ...............................618
11.12.3 dcbtstls — data cache block touch for store and lock set ...................619
11.12.4 dcblc — data cache block lock clear ........................................619
11.12.5 icbtls — instruction cache block touch and lock set ........................620
11.12.6 icblc — instruction cache block lock clear .................................621
11.12.7 Effects of other cache instructions on locked lines ........................622
11.12.8 Flash clearing of lock bits ................................................622
11.13 Cache instructions and exceptions ............................................623
11.13.1 Exception conditions for cache instructions ................................623
11.13.2 Transfer type encodings for cache management instructions ..................624
11.14 Sequential consistency .......................................................625
11.15 Self-modifying code requirements .............................................625
11.16 Page table control bits ......................................................625
11.16.1 Writethrough stores ........................................................626
11.16.2 Cache-inhibited accesses ...................................................626
11.16.3 Memory coherence required ..................................................626
11.16.4 Guarded storage ............................................................626
11.16.5 Misaligned accesses and the endian (E) bit .................................626
11.17 Reservation instructions and cache interactions ..............................626
11.18 Effect of hardware debug on cache operation ..................................627
11.19 Cache memory access for debug / error handling ...............................627
11.19.1 Cache memory access via software ...........................................627
11.19.2 Cache memory access through JTAG/OnCE port .................................628
11.19.3 Cache Debug Access Control register (CDACNTL) ..............................629
11.19.3.1 Cache Debug Access Data register (CDADATA) ...............................630
11.20 Hardware Debug (Cache) Control
Register 0 ...............................................................................631
11.21 Hardware cache coherency .....................................................632
11.21.1 Coherency protocol .........................................................633
11.21.2 Snoop command port .........................................................633
11.21.3 Snoop request queue ........................................................635
11.21.4 Snoop lookup operation .....................................................635
11.21.5 Snoop errors ...............................................................636
11.21.6 Snoop collisions ...........................................................636
11.21.7 Snoop synchronization ......................................................636
11.21.7.1 Synchronization port request .............................................636
11.21.7.2 Snoop command port request ...............................................637
11.21.8 Starvation control .........................................................637
11.21.9 Queue flow control .........................................................637
11.21.10 Snooping in low power states ..............................................638
Chapter 12 Debug Support
12.1 Overview ......................................................................639
12.1.1
Software debug facilities ................................................................639
12.1.1.1 PowerISA 2.06 compatibility ...............................................640
12.1.2 Additional debug facilities ..................................................640
12.1.3 Hardware debug facilities ...................................................640
12.1.4 Sharing debug resources by software/hardware ................................641
12.1.4.1 Simultaneous hardware and software debug event handling ...................641
12.2 Software debug events and exceptions ..........................................642
12.2.1 Instruction Address Compare event ...........................................643
12.2.2 Data Address Compare event ..................................................644
12.2.2.1 Data Address Compare event status updates .................................645
12.2.3 Linked Instruction Address and Data Address Compare event ...................655
12.2.4 Trap debug event ............................................................656
12.2.5 Branch Taken debug event ....................................................656
12.2.6 Instruction Complete debug event ............................................656
12.2.7 Interrupt Taken debug event .................................................657
12.2.8 Critical Interrupt Taken debug event ........................................657
12.2.9 Return debug event ..........................................................657
12.2.10 Critical Return debug event ................................................658
12.2.11 Debug Counter debug event ..................................................658
12.2.12 External debug event .......................................................658
12.2.13 Unconditional debug event ..................................................658
12.3 Debug registers ...............................................................659
12.3.1 Debug address and value registers ...........................................659
12.3.2 Debug Counter register (DBCNT) ..............................................660
12.3.3 Debug Control and Status registers ..........................................660
12.3.3.1 Debug Control Register 0 (DBCR0) ..........................................661
12.3.3.2 Debug Control Register 1 (DBCR1) ..........................................663
12.3.3.3 Debug Control Register 2 (DBCR2)
.............................................................665
12.3.3.4 Debug Control Register 3 (DBCR3) ..........................................669
12.3.3.5 Debug Control Register 4 (DBCR4) ..........................................674
12.3.3.6 Debug Control Register 5 (DBCR5) ..........................................675
12.3.3.7 Debug Control Register 6 (DBCR6) ..........................................677
12.3.3.8 Debug Status register (DBSR) ..............................................679
12.3.4 Debug External Resource Control register (DBERC0) ...........................681
12.3.5 Debug Event Select register (DEVENT) ........................................688
12.3.6 Debug Data Acquisition Message register (DDAM) ..............................689
12.4 External debug support ........................................................689
12.4.1 External debug registers ....................................................690
12.4.1.1 External Debug Control Register 0 (EDBCR0) ................................691
12.4.1.2 External Debug Status Register 0 (EDBSR0) .................................692
12.4.1.3 External Debug Status Register Mask 0 (EDBSRMSK0) .........................694
12.4.2 OnCE introduction ...........................................................696
12.4.3 JTAG/OnCE pins ..............................................................698
12.4.4 OnCE internal interface signals .............................................698
12.4.4.1 CPU debug request (dbg_dbgrq)
..................................................................699
12.4.4.2 CPU debug acknowledge (cpu_dbgack) ........................................699
12.4.4.3 CPU address, attributes ...................................................699
12.4.4.4 CPU data ..................................................................699
12.4.5 OnCE interface signals ......................................................699
12.4.5.1 OnCE enable (jd_en_once) ..................................................699
12.4.5.2 OnCE debug request/event (jd_de_b, jd_de_en) ..............................700
12.4.5.3 e200z759n3 OnCE debug output (jd_debug_b) .................................700
12.4.5.4 e200z759n3 CPU clock on input (jd_mclk_on) ................................700
12.4.5.5 Watchpoint events (jd_watchpt[0:29]) ......................................700
12.4.6 e200z759n3 OnCE controller and serial interface .............................701
12.4.6.1 e200z759n3 OnCE Status Register (OSR) .....................................701
12.4.6.2 e200z759n3 OnCE Command register (OCMD) ...................................702
12.4.6.3 e200z759n3 OnCE Control Register (OCR) ....................................706
12.4.7 Access to debug resources ...................................................708
12.4.8 Methods of entering debug mode ..............................................710
12.4.8.1 External debug request during RESET .......................................710
12.4.8.2 Debug request during RESET ................................................710
12.4.8.3 Debug request during normal activity ......................................711
12.4.8.4 Debug request during Waiting, Halted, or Stopped state ....................711
12.4.8.5 Software request during normal activity ...................................711
12.4.8.6 Debug notify halt instructions ............................................711
12.4.9 CPU Status and Control Scan Chain Register (CPUSCR) .........................712
12.4.9.1 Instruction Register (IR) .................................................712
12.4.9.2 Control State register (CTL) ..............................................713
12.4.9.3 Program Counter register (PC) .............................................716
12.4.9.4 Write-Back Bus Register (WBBRlow, WBBRhigh) ...............................716
12.4.9.5 Machine State Register (MSR) ..............................................717
12.4.9.6 Exiting debug mode and interrupt blocking .................................717
12.4.10 Instruction Address FIFO buffer (PC FIFO) ..................................717
12.4.10.1 PC FIFO ..................................................................717
12.4.11 Reserved registers (reserved) ..............................................719
12.5 Watchpoint support ............................................................719
12.6 MMU and cache operation during debug ..........................................721
12.7 Cache array access during debug ...............................................722
12.8 Basic steps for enabling, using, and exiting external debug mode ..............722
12.9 Parallel Signature unit .......................................................723
12.9.1 Parallel Signature Control Register (PSCR) ..................................725
12.9.2 Parallel Signature Status Register (PSSR) ...................................725
12.9.3 Parallel Signature High Register (PSHR) .....................................726
12.9.4 Parallel Signature Low Register (PSLR) ......................................726
12.9.5 Parallel Signature Counter Register (PSCTR) .................................727
12.9.6 Parallel Signature Update High Register (PSUHR) .............................727
12.9.7 Parallel Signature Update Low Register (PSULR) ..............................727
Chapter 13 Nexus 3 Module
13.1 Introduction ..................................................................729
13.1.1 General description
.........................................................................................................729
13.1.2 Terms and definitions .......................................................729
13.1.3 Feature list ................................................................730
13.1.4 Functional block diagram ....................................................732
13.2 Enabling Nexus 3 operation ....................................................732
13.3 TCODEs supported ..............................................................733
13.4 Nexus 3 programmer’s model ....................................................739
13.4.1 Client Select Control register (CSC) ........................................741
13.4.2 Port Configuration Register (PCR) — reference only ..........................741
13.4.3 Nexus Development Control Register 1 (DC1) ..................................742
13.4.4 Nexus Development Control Registers 2 and 3 (DC2, DC3) ......................744
13.4.5 Nexus Development Control Register 4 (DC4) ..................................748
13.4.6 Development Status register (DS) ............................................749
13.4.7 Watchpoint Trigger registers (WT, PTSTC, PTETC, DTSTC, DTETC) ...............749
13.4.8 Nexus Watchpoint Mask register (WMSK) .......................................754
13.4.9 Nexus Overrun Control Register (OVCR) .......................................755
13.4.10 Data Trace Control Register (DTC) ..........................................756
13.4.11 Data Trace Start Address Registers (DTSA1–4) ...............................758
13.4.12 Data Trace End Address registers (DTEA1–4) .................................758
13.4.13 Read/Write Access Control/Status register (RWCS) ...........................760
13.4.14 Read/Write Access Data (RWD) ...............................................761
13.4.15 Read/Write Access Address register (RWA) ...................................763
13.5 Nexus 3 register access via JTAG/OnCE .........................................763
13.6 Nexus message fields ..........................................................764
13.6.1 TCODE field .................................................................764
13.6.2 Source ID field (SRC) .......................................................764
13.6.3 Relative
address field (U-ADDR) ...................................................................................764
13.6.4 Full address field (F-ADDR) .................................................765
13.6.5 Address space indication field (MAP) ........................................765
13.7 Nexus message queues ..........................................................766
13.7.1 Message queue overrun .......................................................766
13.7.2 CPU stall ...................................................................766
13.7.3 Message suppression .........................................................766
13.7.4 Nexus message priority ......................................................767
13.7.5 Data Acquisition Message (DQM) priority loss response .......................768
13.7.6 Ownership Trace Message (OTM) priority loss response ........................768
13.7.7 Program Trace Message (PTM) priority loss response ..........................768
13.7.8 Data Trace Message (DTM) priority loss response .............................768
13.8 Debug Status messages .........................................................768
13.9 Error messages ................................................................769
13.10 Ownership trace ..............................................................769
13.10.1 Overview
.........................................................................................................................769
13.10.2 Ownership Trace Messaging (OTM) ............................................769
13.11 Program trace ................................................................770
13.11.1 Branch Trace messaging types ...............................................770
13.11.1.1 Zen Indirect Branch message instructions .................................771
13.11.1.2 Zen Direct Branch Message instructions ...................................771
13.11.1.3 BTM using Branch History Messages ........................................772
13.11.1.4 BTM using Traditional Program Trace messages .............................772
13.11.2 BTM Message formats ........................................................772
13.11.2.1 Indirect Branch Messages (history) .......................................772
13.11.2.2 Indirect Branch Messages (traditional) ...................................773
13.11.2.3 Direct Branch Messages (traditional) .....................................773
13.11.3 Program Trace message fields ...............................................773
13.11.3.1 Sequential Instruction Count field (ICNT) ................................773
13.11.3.2 Branch/Predicate Instruction History (HIST) ..............................774
13.11.3.3 Execution mode indication ................................................774
13.11.4 Resource Full Messages .....................................................775
13.11.5 Program Correlation Messages (PCM) .........................................775
13.11.5.1 Program Correlation Message generation for TLB update with new address translation ....777
13.11.5.2 Program Correlation Message generation for TLB invalidate (tlbivax) operations ....778
13.11.5.3 Program Correlation Message generation for PID updates or MSR[IS] updates ..
.......................................................................................................................................778
13.11.6 Program trace overflow error messages ......................................778
13.11.7 Program trace synchronization messages .....................................778
13.11.8 Enabling Program Trace .....................................................780
13.11.9 Program Trace timing diagrams (2 MDO / 1 MSEO configuration) ...............781
13.12 Data Trace ...................................................................782
13.12.1 Data Trace Messaging (DTM) .................................................782
13.12.2 DTM Message formats ........................................................782
13.12.2.1 Data Write Messages ......................................................782
13.12.2.2 Data Read Messages .......................................................782
13.12.2.3 Data Trace Synchronization Messages ......................................783
13.12.3 DTM operation ..............................................................784
13.12.3.1 Data trace windowing .....................................................784
13.12.3.2 Data access / instruction access data tracing ............................785
13.12.3.3 Data trace filtering .....................................................785
13.12.3.4 Zen bus cycle special cases ..............................................785
13.12.4 Data Trace Timing
Diagrams(8 MDO / 2 MSEO configuration) ...................................786 13.13 Data Acquisition messaging ..........................................................................................................786 13.13.1Data Acquisition ID Tag field .........................................................................................787 13.13.2Data Acquisition Data field ............................................................................................787 13.13.3Data Acquisition Trace event ..........................................................................................787 13.14 Watchpoint Trace Messaging ........................................................................................................787 13.14.1Watchpoint Timing Diagram (2 MDO / 1 MSEO configuration) ...................................789 13.15 Nexus 3 read/write access to memory-mapped resources .............................................................790 13.15.1Single write Access .........................................................................................................790 13.15.2Block write access ..........................................................................................................791 13.15.3Single read access ...........................................................................................................791 13.15.4Block read access ............................................................................................................792 13.15.5Error handling .................................................................................................................792 13.15.5.1 AHB read/write error ....................................................................................793 13.15.5.2 Access termination ........................................................................................793 13.15.6Read/write access error message 
....................................................................................793 13.16 Nexus 3 pin interface .....................................................................................................................793 13.16.1Pins implemented ............................................................................................................793 13.16.2Pin protocol .....................................................................................................................796 13.17 Rules for output messages .............................................................................................................798 e200z759n3 Core Reference Manual, Rev. 2 Freescale Semiconductor 15 13.18 Auxiliary port arbitration ..............................................................................................................798 13.19 Examples .......................................................................................................................................798 13.20 Electrical characteristics ................................................................................................................801 13.21 IEEE 1149.1 (JTAG) RD/WR sequences ......................................................................................801 13.21.1JTAG sequence for accessing internal Nexus registers ..................................................801 13.21.2JTAG sequence for read access of memory-mapped resources ......................................802 13.21.3JTAG sequence for write access of memory-mapped resources ....................................802 Chapter 14 External Core Complex Interfaces 14.1 Signal index ...................................................................................................................................806 14.2 Signal descriptions ........................................................................................................................813 14.2.1 e200z759n3 
processor clock (m_clk) .............................................................................814 14.2.2 Reset-related signals .......................................................................................................814 14.2.2.1 Power-on reset (m_por) ................................................................................814 14.2.2.2 Reset (p_reset_b) ..........................................................................................814 14.2.2.3 Watchdog reset status (p_wrs[0:1]) ..............................................................815 14.2.2.4 Debug reset control (p_dbrstc[0:1]) ..............................................................815 14.2.2.5 Reset base (p_rstbase[0:29]) .........................................................................815 14.2.2.6 Reset endian mode (p_rst_endmode) ............................................................815 14.2.2.7 Reset VLE Mode (p_rst_vlemode) ...............................................................815 14.2.2.8 JTAG/OnCE reset (j_trst_b) .........................................................................815 14.2.3 Address and data buses ...................................................................................................816 14.2.3.1 Address bus (p_d_haddr[31:0], p_i_haddr[31:0]) ........................................816 14.2.3.2 Read data bus (p_d_hrdata[63:0], p_i_hrdata[63:0]) ....................................816 14.2.3.3 Write data bus (p_d_hwdata[63:0]) ..............................................................816 14.2.4 Transfer attribute signals .................................................................................................817 14.2.4.1 Transfer type (p_d_htrans[1:0], p_i_htrans[1:0]) .........................................817 14.2.4.2 Write (p_d_hwrite, p_i_hwrite) ....................................................................817 14.2.4.3 Transfer size (p_d_hsize[1:0], p_i_hsize[1:0]) 
.............................................817 14.2.4.4 Burst type (p_d_hburst[2:0], p_i_hburst[2:0]) ..............................................818 14.2.4.5 Protection control (p_d_hprot[5:0], p_i_hprot[5:0]) ....................................818 14.2.4.6 Transfer data error (p_d_htrans_derr) ...........................................................820 14.2.4.7 Globally coherent access — (p_d_gbl) .........................................................820 14.2.4.8 Cache way replacement (p_d_wayrep[0:1], p_i_wayrep[0:1]) ....................820 14.2.5 Byte lane specification ....................................................................................................820 14.2.5.1 Unaligned access (p_d_hunalign, p_i_hunalign) ..........................................821 14.2.5.2 Byte strobes (p_d_hbstrb[7:0], p_i_hbstrb[7:0]) ..........................................821 14.2.6 Transfer control signals ...................................................................................................831 14.2.6.1 Transfer ready (p_d_hready, p_i_hready) .....................................................831 14.2.6.2 Transfer response (p_d_hresp[2:0], p_i_hresp[1:0]) ....................................831 14.2.6.3 Bus stall global write request (p_stall_bus_gwrite) ......................................832 14.2.7 AHB clock enable signals ...............................................................................................832 14.2.7.1 Instruction AHB clock enable (p_i_ahb_clken) ...........................................832 14.2.7.2 Data AHB clock enable (p_d_ahb_clken) ....................................................833 e200z759n3 Core Reference Manual, Rev. 
2 16 Freescale Semiconductor 14.2.8 Master ID configuration signals .....................................................................................833 14.2.8.1 CPU master ID (p_masterid[3:0]) .................................................................833 14.2.8.2 Nexus master ID (nex_masterid[3:0]) ..........................................................833 14.2.9 Coherency control signals ...............................................................................................833 14.2.9.1 Snoop ready (p_snp_rdy) ..............................................................................833 14.2.9.2 Snoop request (p_snp_req) ...........................................................................834 14.2.9.3 Snoop command input (p_snp_cmd_in[0:1]) ...............................................834 14.2.9.4 Snoop request ID input (p_snp_id_in[0:3]) ..................................................834 14.2.9.5 Snoop address input (p_snp_addr_in[0:26]) .................................................835 14.2.9.6 Snoop acknowledge (p_snp_ack) .................................................................835 14.2.9.7 Snoop request ID output (p_snp_id_out[0:3]) ..............................................835 14.2.9.8 Snoop response (p_snp_resp[0:4]) ................................................................835 14.2.9.9 Cache stalled (p_cac_stalled) ........................................................................836 14.2.9.10 Data cache enabled (p_d_cache_en) .............................................................836 14.2.10Memory synchronization control signals ........................................................................836 14.2.10.1 Synchronization request in (p_sync_req_in) ................................................836 14.2.10.2 Synchronization request acknowledge out (p_sync_ack_out) ......................836 14.2.10.3 Synchronization request out (p_sync_req_out) 
............................................837 14.2.10.4 Synchronization request acknowledge in (p_sync_ack_in) ..........................837 14.2.11Interrupt signals ..............................................................................................................837 14.2.11.1 External input interrupt request (p_extint_b) ................................................837 14.2.11.2 Critical input interrupt request (p_critint_b) .................................................838 14.2.11.3 Non-maskable input interrupt request (p_nmi_b) .........................................838 14.2.11.4 Interrupt pending (p_ipend) ..........................................................................838 14.2.11.5 Autovector (p_avec_b) .................................................................................838 14.2.11.6 Interrupt vector offset (p_voffset[0:15]) .......................................................838 14.2.11.7 Interrupt vector acknowledge (p_iack) .........................................................839 14.2.11.8 Machine check (p_mcp_b) ............................................................................839 14.2.12External translation alteration signals .............................................................................839 14.2.12.1 External PID enable (p_extpid_en) ...............................................................839 14.2.12.2 External PID in (p_extpid[6:7]) ....................................................................839 14.2.13Timer facility signals ......................................................................................................840 14.2.13.1 Timer disable (p_tbdisable) ..........................................................................840 14.2.13.2 Timer external clock (p_tbclk) ......................................................................840 14.2.13.3 Timer interrupt status (p_tbint) 
.....................................................................840 14.2.14Processor reservation signals ..........................................................................................840 14.2.14.1 CPU reservation status (p_rsrv) ....................................................................840 14.2.14.2 CPU reservation clear (p_rsrv_clr) ...............................................................840 14.2.15Miscellaneous processor signals .....................................................................................841 14.2.15.1 CPU ID (p_cpuid[0:7]) .................................................................................841 14.2.15.2 PID0 outputs (p_pid0[0:7]) ...........................................................................841 14.2.15.3 PID0 update (p_pid0_updt) ..........................................................................841 14.2.15.4 System version (p_sysvers[0:31]) .................................................................841 14.2.15.5 Processor version (p_pvrin[16:31]) ..............................................................841 14.2.15.6 HID1 system control (p_hid1_sysctl[0:7]) ...................................................842 e200z759n3 Core Reference Manual, Rev. 2 Freescale Semiconductor 17 14.2.15.7 Debug event outputs (p_devnt_out[0:7]) ......................................................842 14.2.16Processor state signals ....................................................................................................842 14.2.16.1 Processor mode (p_mode[0:3]) .....................................................................842 14.2.16.2 Processor execution pipeline status (p_pstat_pipe0[0:5], p_pstat_pipe1[0:5]) .. 
.......................................................................................................................................842 14.2.16.3 Branch prediction status (p_brstat[0:1]) .......................................................843 14.2.16.4 Processor exception enable MSR values (p_msr_EE, p_msr_CE, p_msr_DE, p_msr_ME) ...................................................................................................................844 14.2.16.5 Processor return from interrupt (p_rfi, p_rfci, p_rfdi, p_rfmci) ...................844 14.2.16.6 Processor machine check (p_mcp_out) ........................................................844 14.2.17Power management control signals ................................................................................844 14.2.17.1 Processor waiting (p_waiting) ......................................................................844 14.2.17.2 Processor halt request (p_halt) ......................................................................845 14.2.17.3 Processor halted (p_halted) ...........................................................................845 14.2.17.4 Processor stop request (p_stop) ....................................................................845 14.2.17.5 Processor stopped (p_stopped) .....................................................................845 14.2.17.6 Low-power mode signals (p_doze, p_nap, p_sleep) .....................................845 14.2.17.7 Wakeup (p_wakeup) .....................................................................................845 14.2.18Performance monitor signals ..........................................................................................846 14.2.18.1 Performance monitor event (p_pm_event) ...................................................846 14.2.18.2 Performance monitor counter 0 overflow state (p_pmc0_ov) ......................846 14.2.18.3 Performance monitor counter 1 overflow state (p_pmc1_ov) ......................846 
14.2.18.4 Performance monitor counter 2 overflow state (p_pmc2_ov) .... 846
14.2.18.5 Performance monitor counter 3 overflow state (p_pmc3_ov) .... 846
14.2.18.6 Performance monitor counter 3 qualifier inputs (p_pmc[0,1,2,3]_qual) .... 846
14.2.19 Debug event input signals .... 846
14.2.19.1 Unconditional debug event (p_ude) .... 847
14.2.19.2 External debug event 1 (p_devt1) .... 847
14.2.19.3 External debug event 2 (p_devt2) .... 847
14.2.20 Debug event output signals (p_devnt_out[0:7]) .... 847
14.2.21 Debug/emulation (Nexus 1/ OnCE) support signals .... 847
14.2.21.1 OnCE enable (jd_en_once) .... 848
14.2.21.2 Debug session (jd_debug_b) .... 848
14.2.21.3 Debug request (jd_de_b) .... 848
14.2.21.4 DE_b active high output enable (jd_de_en) .... 849
14.2.21.5 Processor clock on (jd_mclk_on) .... 849
14.2.21.6 Watchpoint events (jd_watchpt[0:29]) .... 849
14.2.22 Development support (Nexus 3) signals .... 849
14.2.23 JTAG support signals .... 850
14.2.23.1 JTAG/OnCE serial input (j_tdi) .... 850
14.2.23.2 JTAG/OnCE serial clock (j_tclk) .... 850
14.2.23.3 JTAG/OnCE serial output (j_tdo) .... 850
14.2.23.4 JTAG/OnCE test mode select (j_tms) .... 850
14.2.23.5 JTAG/OnCE test reset (j_trst_b) .... 851
14.2.23.6 Test-Logic-Reset (j_tst_log_rst) .... 851
14.2.23.7 Run-Test/Idle (j_rti) .... 851
14.2.23.8 Capture IR (j_capture_ir) .... 851
14.2.23.9 Shift IR (j_shift_ir) .... 851
14.2.23.10 Update IR (j_update_ir) .... 852
14.2.23.11 Capture DR (j_capture_dr) .... 852
14.2.23.12 Shift DR (j_shift_dr) .... 852
14.2.23.13 Update DR w/write (j_update_gp_reg) .... 852
14.2.23.14 Register select (j_gp_regsel) .... 852
14.2.23.15 Enable OnCE register select (j_en_once_regsel) .... 852
14.2.23.16 External Nexus register select (j_nexus_regsel) .... 853
14.2.23.17 External LSRL register select (j_lsrl_regsel) .... 853
14.2.23.18 Serial data (j_serial_data) .... 853
14.2.23.19 Key data in (j_key_in) .... 854
14.2.24 JTAG ID signals .... 854
14.2.24.1 JTAG ID sequence (j_id_sequence[0:1]) .... 855
14.2.24.2 JTAG ID sequence (j_id_sequence[2:9]) .... 855
14.2.24.3 JTAG ID version (j_id_version[0:3]) .... 855
14.2.25 Test signals .... 856
14.3 Timing diagrams .... 856
14.3.1 AHB clock enable and the internal HCLK .... 856
14.3.2 Processor instruction/data transfers .... 856
14.3.2.1 Basic read transfer cycles .... 858
14.3.2.2 Read transfer with wait state .... 859
14.3.2.3 Basic write transfer cycles .... 860
14.3.2.4 Write transfer with wait states .... 862
14.3.2.5 Read and write transfers .... 863
14.3.2.6 Misaligned accesses .... 867
14.3.2.7 Burst accesses .... 870
14.3.2.8 Error termination operation .... 874
14.3.3 Memory synchronization control operation .... 877
14.3.4 Cache coherency interface operation .... 880
14.3.4.1 Stop mode entry/exit and snoop ready signaling .... 884
14.3.5 Power management .... 885
14.3.6 Interrupt Interface .... 886
14.3.7 Time base interface .... 888
14.3.8 JTAG test interface .... 889

Chapter 15 Internal Core Interfaces
15.1 Signal index .... 891
15.2 Signal descriptions .... 896
15.2.1 Address and data buses .... 896
15.2.1.1 Data address bus (p_d_addr[0:31]) .... 896
15.2.1.2 Instruction address bus (p_i_addr[0:31]) .... 897
15.2.1.3 Data input data bus (p_d_data_in[0:63]) .... 897
15.2.1.4 Instruction input data bus (p_i_data_in[0:63]) .... 897
15.2.1.5 Data output data bus (p_d_data_out[0:63]) .... 897
15.2.2 Transfer attribute signals .... 897
15.2.2.1 Read/write (p_d_rw_b) .... 897
15.2.2.2 Data transfer code (p_d_tc[0:1]) .... 897
15.2.2.3 Instruction transfer code (p_i_tc[0:4]) .... 897
15.2.2.4 Data transfer size (p_d_tsiz[0:2]) .... 899
15.2.2.5 Element size (p_elsiz[0:1]) .... 899
15.2.2.6 Instruction Transfer Size (p_i_tsiz[0:2]) .... 900
15.2.2.7 Data Transfer Type (p_d_ttype[0:5]) .... 900
15.2.2.8 Data sequential access (p_d_seq_b) .... 901
15.2.2.9 Instruction sequential access (p_i_seq_b) .... 901
15.2.2.10 Misaligned access (p_d_misal_b) .... 901
15.2.2.11 Block data transfer (p_d_bdt) .... 901
15.2.2.12 Error kill control (p_d_err_kill, p_i_err_kill) .... 901
15.2.3 Transfer control signals .... 902
15.2.3.1 Halt ZLB (p_d_halt_zlb, p_i_halt_zlb) .... 902
15.2.3.2 Transfer request (p_d_treq_b, p_i_treq_b) .... 902
15.2.3.3 Transfer busy (p_d_tbusy[0:1]_b, p_i_tbusy[0:1]_b) .... 902
15.2.3.4 Transfer abort (p_d_abort_b, p_i_abort_b) .... 902
15.2.3.5 Transfer acknowledge (p_d_ta_b, p_i_ta_b) .... 903
15.2.3.6 Transfer error acknowledge (p_d_tea_b, p_i_tea_b) .... 903
15.2.3.7 Translation miss (p_d_tmiss_b, p_i_tmiss_b) .... 903
15.2.3.8 Byte ordering error (p_d_boerr_b, p_i_boerr_b) .... 903
15.2.3.9 Alignment error (p_d_alignerr_b) .... 903
15.2.3.10 Cache tag parity error (p_d_tag_perr_b, p_i_tag_perr_b) .... 903
15.2.3.11 Cache data parity error (p_d_data_perr_b, p_i_data_perr_b) .... 904
15.2.3.12 External termination error (p_d_xte_b, p_i_xte_b) .... 904
15.2.3.13 Guarded termination status (p_d_ta_g) .... 904
15.2.3.14 Cache-inhibited termination status (p_d_ta_ci) .... 904
15.2.3.15 Access physical address (p_[d,i]_ta_addr[0:31]) .... 904
15.2.3.16 Termination error signaling and qualification .... 904
15.2.3.17 Store exclusive failure (p_d_xfail_b) .... 905
15.2.3.18 Read endian mode select (p_d_rdbigend_b, p_i_rdbigend_b) .... 906
15.2.3.19 Write endian mode select (p_d_wrbigend_b) .... 906
15.2.3.20 VLE mode select (p_rd_vle) .... 906
15.2.4 Byte lane specification .... 906
15.2.5 External SPR interface signals .... 929
15.2.5.1 SPR number (p_sprnum[0:9]) .... 929
15.2.5.2 SPR read data (p_spr_in[0:31]) .... 929
15.2.5.3 SPR write data (p_spr_out[0:31]) .... 929
15.2.5.4 SPR read control (p_rd_spr) .... 929
15.2.5.5 SPR write control (p_wr_spr) .... 929
15.2.6 Miscellaneous processor signals .... 929
15.2.6.1 PID0 outputs (p_pid0[0:7]) .... 929
15.2.6.2 PID0 update (p_pid0_updt) .... 929
15.2.7 Cache/MMU status signals .... 930
15.2.7.1 Cache enabled (p_d_cache_enabled, p_i_cache_enabled) .... 930
15.2.7.2 Cache/MMU busy (p_d_cmbusy, p_i_cmbusy) .... 930
15.2.7.3 Cache set CUL (p_d_set_cul, p_i_set_cul) .... 930
15.2.7.4 User cache lock DSI control (p_ucl_dsi) .... 930
15.2.7.5 Cache push parity error (p_d_cp_perr) .... 930
15.2.7.6 Cache push address (p_d_push_addr[0:31]) .... 930
15.2.7.7 Bus write error (p_d_bus_wrerr) .... 931
15.2.7.8 Bus write error address (p_d_bus_wrerr_addr[0:31]) .... 931
15.2.7.9 Cache linefill status (p_d_lf_status[0:3], p_i_lf_status[0:3]) .... 931
15.2.7.10 Linefill status address (p_d_lf_addr[0:31], p_i_lf_addr[0:31]) .... 931
15.2.7.11 Debug mode MMU disable (p_d_dmdis, p_i_dmdis) .... 931
15.2.7.12 Debug mode MMU ‘VLE’ attribute (p_dbg_vle) .... 931
15.2.7.13 Debug mode MMU ‘W’ attribute (p_d_dbg_w) .... 932
15.2.7.14 Debug mode MMU ‘I’ attribute (p_d_dbg_i, p_i_dbg_i) .... 932
15.2.7.15 Debug mode MMU ‘M’ attribute (p_d_dbg_m, p_i_dbg_m) .... 932
15.2.7.16 Debug mode MMU ‘G’ attribute (p_d_dbg_g) .... 932
15.2.7.17 Debug mode MMU ‘E’ attribute (p_d_dbg_e, p_i_dbg_e) .... 932
15.2.8 EFPU interface signals .... 932
15.2.9 Test signals .... 932
15.3 Timing diagrams .... 932
15.3.1 Processor instruction/data transfers .... 932
15.3.1.1 Basic read transfer cycles .... 934
15.3.1.2 Read transfer with wait states .... 936
15.3.1.3 Basic write transfer cycles .... 938
15.3.1.4 Write transfer with wait states .... 940
15.3.1.5 Read and write transfers .... 941
15.3.1.6 Misaligned accesses .... 944
15.3.1.7 Abort operation .... 949
15.3.1.8 Error termination and abort operation .... 950
15.3.2 SPR interface operation .... 953

Appendix A Register Summary
Appendix B Revision History

Chapter 1 e200z759n3 Overview

1.1 Overview of the e200z759n3

The e200z759n3 processor family is a set of CPU cores that implement low-cost versions of the PowerISA 2.06 architecture.
The e200z759n3 is a dual-issue, 32-bit PowerISA 2.06 compliant design with 64-bit general purpose registers (GPRs). PowerISA 2.06 floating-point instructions are not supported by the e200z759n3 in hardware, but are trapped and may be emulated by software. An Embedded Floating-point (EFPU2) APU is provided to support real-time single-precision embedded numerics operations using the general-purpose registers. A Signal Processing Extension (SPE) APU is provided to support real-time SIMD fixed-point and single-precision embedded numerics operations using the general-purpose registers.

All arithmetic instructions that execute in the core operate on data in the GPRs. The GPRs have been extended to 64 bits in order to support vector instructions defined by the SPE APU. These instructions operate on a vector pair of 16-bit or 32-bit data types, and deliver vector and scalar results.

In addition to the base PowerISA 2.06 instruction set support, the e200z759n3 core also implements the VLE (variable-length encoding) technology, providing improved code density. The VLE technology is further documented in “PowerPC VLE Definition, Version 1.03”, a separate document.

The e200z759n3 processor integrates a pair of integer execution units, a branch control unit, an instruction fetch unit, a load/store unit, and a multi-ported register file capable of sustaining six read and three write operations per clock. Most integer instructions execute in a single clock cycle. Branch target prefetching is performed by the branch unit to allow single-cycle branches in many cases.

The e200z759n3 contains a 16 KB instruction cache (ICache), a 16 KB data cache, and a Memory Management Unit (MMU). A Nexus Class 3+ module is also integrated.
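The vector-pair view of a 64-bit GPR described above can be sketched in portable C. This is only a host-side model, not e200z759n3 code: the type `gpr64_t` and the helper names are hypothetical, and the lane-wise add is meant to mirror the general behavior of an SPE word-vector add (such as `evaddw`), assuming the upper element occupies the most significant 32 bits.

```c
#include <stdint.h>

/* Hypothetical host-side model of a 64-bit GPR holding two 32-bit elements. */
typedef uint64_t gpr64_t;

/* Upper element: most significant 32 bits of the register. */
static uint32_t gpr_upper(gpr64_t r) { return (uint32_t)(r >> 32); }

/* Lower element: least significant 32 bits of the register. */
static uint32_t gpr_lower(gpr64_t r) { return (uint32_t)r; }

/* Build a 64-bit register value from its two 32-bit elements. */
static gpr64_t gpr_pack(uint32_t upper, uint32_t lower)
{
    return ((gpr64_t)upper << 32) | lower;
}

/*
 * Lane-wise modulo add: each 32-bit element wraps on its own,
 * so a carry out of the lower element never reaches the upper one.
 */
static gpr64_t vec_add_words(gpr64_t a, gpr64_t b)
{
    return gpr_pack(gpr_upper(a) + gpr_upper(b),
                    gpr_lower(a) + gpr_lower(b));
}
```

The point of the model is the difference from an ordinary 64-bit add: adding 1 to a lower element of 0xFFFFFFFF yields 0 in that element and leaves the upper element untouched.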
1.1.1 Features

The following is a list of some of the key features of the e200z759n3:
• Dual issue, 32-bit PowerISA 2.06 compliant CPU
• Implements the VLE APU for reduced code footprint
• In-order execution and retirement
• Precise exception handling
• Branch processing unit
  — Dedicated branch address calculation adder
  — Branch target prefetching using BTB
  — Return Address Stack
• Load/store unit
  — 3 cycle load latency
  — Fully pipelined
  — Big and Little endian support
  — Misaligned access support
• 64-bit General Purpose Register file
• Dual AHB 2.v6 64-bit System buses
• Memory Management Unit (MMU) with 32-entry fully-associative TLB and multiple page size support
• 16 KB, 4-way set-associative Harvard I and D caches
• Embedded Floating-point APU (EFPU2) supporting scalar and SIMD single-precision floating-point operations
• Signal Processing Extension (SPE1.1) APU supporting SIMD fixed-point operations, using the 64-bit General Purpose Register file
• Performance Monitor APU supporting execution profiling
• Nexus Class 3-plus Real-time Development Unit
• Power management
  — Low power design - extensive clock gating
  — Power saving modes: doze, nap, sleep, wait
  — Dynamic power management of execution units, caches and MMUs
• Testability
  — Synthesizable, MuxD scan design
  — Optional ABIST/MBIST for arrays
  — Built-in Parallel Signature Unit

1.1.2 Microarchitecture summary

The e200z759n3 processor utilizes a ten-stage instruction pipeline, with four stages for execution. The Instruction Fetch 0, Instruction Fetch 1, Instruction Fetch 2, Instruction Decode 0, Instruction Decode 1/Register File Read/EA Calc, Execute 0/Memory Access 0, Execute 1/Memory Access 1, Execute 2/Memory Access 2, Execute 3, and Register Writeback stages operate in an overlapped fashion, allowing single clock instruction execution for most instructions.
The integer execution units each consist of a 32-bit Arithmetic Unit (AU), a Logic Unit (LU), a 32-bit Barrel shifter (Shifter), a Mask-Insertion Unit (MIU), a Condition Register manipulation Unit (CRU), a Count-Leading-Zeros unit (CLZ), a 32x32 Hardware Multiplier array, and result feed-forward hardware. Integer EU1 also supports hardware division. Most arithmetic and logical operations are executed in a single cycle, with the exception of multiply, which is implemented with a pipelined hardware array, and the divide instructions. The Count-Leading-Zeros unit operates in a single clock cycle.

The Instruction Unit contains a PC incrementer and dedicated Branch Address adders to minimize delays during change of flow operations. Sequential prefetching is performed to ensure a supply of instructions into the execution pipeline. Branch target prefetching is performed to accelerate taken branches. Prefetched instructions are placed into an instruction buffer. Branch target addresses are calculated in parallel with branch instruction decode, resulting in execution time of four clocks for correctly predicted branches. Conditional branches that are not taken execute in a single clock. Branches with successful BTB target prefetching have an effective execution time of one clock if correctly predicted.

Memory load and store operations are provided for byte, halfword, word (32-bit), and doubleword data, with automatic zero or sign extension of byte and halfword load data as well as optional byte reversal of data. These instructions can be pipelined to allow effective single cycle throughput. Load and store multiple word instructions allow low overhead context save and restore operations. The load/store unit contains a dedicated effective address adder to allow effective address generation to be optimized.
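The zero extension, sign extension, and byte reversal options mentioned for loads can be illustrated with a host-side C model. This is a sketch under stated assumptions rather than a description of the hardware: memory is modeled as a big-endian byte array, and the function names (`load_half_zero`, `load_half_sign`, `load_word`, `load_word_reversed`) are hypothetical. They correspond loosely to the behavior of the PowerPC lhz, lha, lwz, and lwbrx instructions.

```c
#include <stdint.h>

/* Halfword load with zero extension to 32 bits (big-endian byte order). */
static inline uint32_t load_half_zero(const uint8_t *p)
{
    return ((uint32_t)p[0] << 8) | p[1];
}

/* Halfword load with sign extension: bit 15 of the halfword is
 * replicated into the upper 16 bits of the 32-bit result. */
static inline int32_t load_half_sign(const uint8_t *p)
{
    uint32_t v = load_half_zero(p);
    return (int32_t)(v & 0x7FFFu) - (int32_t)(v & 0x8000u);
}

/* Word load in big-endian byte order. */
static inline uint32_t load_word(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Word load with byte reversal: the same four bytes are assembled
 * in the opposite order. */
static inline uint32_t load_word_reversed(const uint8_t *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

For the bytes FF FE, the zero-extended result is 0x0000FFFE while the sign-extended result is -2; for the bytes 12 34 56 78, the byte-reversed word is 0x78563412.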
The Condition Register unit supports the condition register (CR) and condition register operations defined by the PowerPC architecture. The condition register consists of eight 4-bit fields that reflect the results of certain operations, such as move, integer and floating-point compare, arithmetic, and logical instructions, and provide a mechanism for testing and branching.

Vectored and autovectored interrupts are supported by the CPU. Vectored interrupt support is provided to allow multiple interrupt sources to have unique interrupt handlers invoked with no software overhead.

The SPE1.1 APU supports vector instructions operating on 16-bit and 32-bit fixed-point data types.

The EFPU2 APU supports 32-bit IEEE-754 single-precision floating-point formats, and supports scalar and vector single-precision floating-point operations in a pipelined fashion. The 64-bit general purpose register file is used for source and destination operands, and there is a unified storage model for scalar single-precision floating-point data types of 32 bits and the normal integer type. Low latency fixed-point and floating-point add, subtract, mixed add/subtract, sum, diff, min, max, multiply, multiply-add, multiply-sub, divide, square root, compare, and conversion operations are provided, and most operations can be pipelined.

[Figure 1-1. e200z759n3 block diagram. Blocks shown: CPU control logic; OnCE/Nexus control logic; memory management unit; instruction unit (instruction buffer, PC unit, branch unit); integer execution units; multiply units; SIMD and EFP units; load/store unit; GPR, SPR, CR, LR, CTR, and XER registers; external SPR interface (mtspr/mfspr); instruction cache and data cache; instruction and data bus interface units, each with 32-bit address, 64-bit data, and N control signals.]

1.1.2.1 Instruction unit features

The features of the e200z759n3 Instruction unit are:
• 64-bit path to cache supports fetching of two 32-bit instructions per clock
• Instruction buffer holds up to 10 32-bit instructions
• Dedicated PC incrementer supporting instruction prefetches
• Branch unit with dedicated branch address adder, and branch lookahead logic (BTB) supporting single cycle execution of successfully predicted branches

1.1.2.2 Integer unit features

The e200z759n3 integer units support single cycle execution of most integer instructions:
• 32-bit AU for arithmetic and comparison operations
• 32-bit LU for logical operations
• 32-bit priority encoder for count leading zeros function
• 32-bit single cycle barrel shifter for static shifts and rotates
• 32-bit mask unit for data masking and insertion
• Divider logic for signed and unsigned divide in 4-15 clocks with minimized execution timing (EU1 only)
• Pipelined 32x32 hardware multiplier array supports 32 × 32 → 32 multiply with 3 clock latency, 1 clock throughput

1.1.2.3 Load/store unit features

The e200z759n3 load/store unit supports load, store, and the load multiple / store multiple instructions:
• 32-bit effective address adder for data memory address calculations
• Pipelined operation supports throughput of one load or store operation per cycle
• Dedicated 64-bit interface to memory supports saving and restoring of up to two registers per cycle for load multiple and store multiple word instructions

1.1.2.4 Cache features

The features of the cache are as follows:
• Separate 16 KB, 4-way set-associative instruction and data caches (Harvard architecture)
• Copyback and Writethrough Support
• 8-entry store buffer
• Push buffer
• Linefill buffer
• 32-bit address bus plus attributes and control
• Separate uni-directional 64-bit read data bus and 64-bit write data bus
• Support for cache line locking
• Support for
way allocation
• Support for write allocation policies
• Support for tag and data parity
• Support for multi-bit EDC for the ICache
• Correction/auto-invalidation capability for the I and D caches
• Hardware cache coherency support for the data cache

1.1.2.5 MMU Features

The features of the MMU are as follows:
• Virtual memory support
• 32-bit virtual and physical addresses
• 8-bit process identifier
• 32-entry fully-associative TLB
• Multiple page size support from 1 KB to 4 GB
• Entry flush protection

1.1.2.6 e200z759n3 system bus features

The features of the e200z759n3 system bus interface are as follows:
• Independent Instruction and Data interfaces
• AMBA AHB2.v6 protocol
• 32-bit address bus, 64-bit data bus, plus attributes and control
• Data interface provides separate uni-directional 64-bit read and write data buses
• Support for HCLK running at a slower rate than CPU clock

Chapter 2 Register Model

This section describes the registers implemented in the e200z759n3 core. It includes an overview of registers defined by the Power Architecture Book E architecture, highlighting differences in how these registers are implemented in the e200z759n3 core, and provides a detailed description of e200z759n3-specific registers. Full descriptions of the architecture-defined register set are provided in Book E: Enhanced PowerPC™ Architecture.

The Power Architecture Book E architecture defines register-to-register operations for all computational instructions. Source data for these instructions are accessed from the on-chip registers or are provided as immediate values embedded in the opcode. The three-register instruction format allows specification of a target register distinct from the two source registers, thus preserving the original data for use by other instructions.
Data is transferred between memory and registers with explicit load and store instructions only.

The e200z759n3 extends the General Purpose Registers to 64 bits to support SPE and EFPU APU operations. PowerPC Book E instructions operate on the lower 32 bits of the GPRs only, and the upper 32 bits are unaffected by these instructions. SPE vector instructions operate on the entire 64-bit register. The SPE APU defines load and store instructions for transferring 64-bit values to/from memory.

Figure 1 and Figure 3 show the complete e200z759n3 register set. Figure 1 shows the registers that are accessible while in supervisor mode, and Figure 3 shows the set of registers that are accessible while in user mode. The number to the right of the special-purpose registers (SPRs) is the decimal number used in the instruction syntax to access the register (for example, the integer exception register (XER) is SPR 1).

SUPERVISOR Mode Programmer’s Model SPRs

General Registers
  Condition Register: CR
  General-Purpose Registers: GPR0–GPR31
  Count Register: CTR (SPR 9)
  Link Register: LR (SPR 8)
  XER: XER (SPR 1)
  Accumulator: ACC
Exception Handling/Control Registers
  SPR General: SPRG0 (SPR 272), SPRG1 (SPR 273), SPRG2 (SPR 274), SPRG3 (SPR 275), SPRG4 (SPR 276), SPRG5 (SPR 277), SPRG6 (SPR 278), SPRG7 (SPR 279), SPRG8 (SPR 604), SPRG9 (SPR 605)
  User SPR: USPRG0 (SPR 256)
  Save and Restore: SRR0 (SPR 26), SRR1 (SPR 27), CSRR0 (SPR 58), CSRR1 (SPR 59), DSRR0 [1] (SPR 574), DSRR1 [1] (SPR 575), MCSRR0 [1] (SPR 570), MCSRR1 [1] (SPR 571)
  Interrupt Vector Prefix: IVPR (SPR 63)
  Interrupt Vector Offset: IVOR0–IVOR15 (SPR 400–415), IVOR32–IVOR35 [1] (SPR 528–531)
  Exception Syndrome: ESR (SPR 62)
  Machine Check Syndrome Register: MCSR (SPR 572)
  Machine Check Address Register: MCAR (SPR 573)
  Data Exception Address: DEAR (SPR 61)
Processor Control Registers
  Machine State: MSR
  Processor Version: PVR (SPR 287)
  Processor ID: PIR (SPR 286)
  System Version: SVR (SPR 1023)
  Hardware Implementation Dependent [1]: HID0 (SPR 1008), HID1 (SPR 1009)
Timers
  Time Base (write only): TBL (SPR 284), TBU (SPR 285)
  Decrementer: DEC (SPR 22), DECAR (SPR 54)
  Control and Status: TCR (SPR 340), TSR (SPR 336)
Debug Registers [2]
  Debug Control: DBCR0 (SPR 308), DBCR1 (SPR 309), DBCR2 (SPR 310), DBCR3 [1] (SPR 561), DBCR4 [1] (SPR 563), DBCR5 [1] (SPR 564), DBCR6 [1] (SPR 603), DBERC0 [1] (SPR 569)
  Debug Status: DBSR (SPR 304)
  Debug Counter [1]: DBCNT (SPR 562)
  Instruction Address Compare: IAC1 (SPR 312), IAC2 (SPR 313), IAC3 (SPR 314), IAC4 (SPR 315), IAC5 (SPR 565), IAC6 (SPR 566), IAC7 (SPR 567), IAC8 (SPR 568)
  Data Address Compare: DAC1 (SPR 316), DAC2 (SPR 317)
  Data Value Compare: DVC1 (SPR 318), DVC2 (SPR 319)
  DEVENT [1] (SPR 975), DDAM [1] (SPR 576)
BTB Register
  BTB Control [1]: BUCSR (SPR 1013)
SPE/EFPU Registers
  SPE/EFPU APU Status and Control Register: SPEFSCR (SPR 512)
Memory Management Registers
  Process ID: PID0 (SPR 48)
  MMU Assist [1]: MAS0 (SPR 624), MAS1 (SPR 625), MAS2 (SPR 626), MAS3 (SPR 627), MAS4 (SPR 628), MAS6 (SPR 630)
  Control & Configuration: MMUCSR0 (SPR 1012), MMUCFG (SPR 1015), TLB0CFG (SPR 688), TLB1CFG (SPR 689)
Cache Registers
  Cache Control [1]: L1CSR0 (SPR 1010), L1CSR1 (SPR 1011), L1FINV0 (SPR 1016), L1FINV1 (SPR 959)
  Cache Configuration (Read-only): L1CFG0 (SPR 515), L1CFG1 (SPR 516)

1 - These Zen-specific registers may not be supported by other Power Architecture processors
2 - Optional registers defined by the Power Architecture Book-E architecture
3 - Read-only registers

Figure 1. e200z759n3 supervisor mode programmer’s model SPRs

Supervisor Mode Programmer’s Model DCRs and PMRs

PSU Registers [1]
  PSCR (DCR 272), PSSR (DCR 273), PSHR (DCR 274), PSLR (DCR 275), PSCTR (DCR 276), PSUHR (DCR 277), PSULR (DCR 278)
Performance Monitor Registers [1]
  Control: PMGC0 (PMR 400), PMLCa0–PMLCa3 (PMR 144–147), PMLCb0–PMLCb3 (PMR 272–275)
  Counters: PMC0 (PMR 16), PMC1 (PMR 17), PMC2 (PMR 18), PMC3 (PMR 19)
  User Control (read-only): UPMGC0 (PMR 384), UPMLCa0–UPMLCa3 (PMR 128–131), UPMLCb0–UPMLCb3 (PMR 256–259)
  User Counters (read-only): UPMC0 (PMR 0), UPMC1 (PMR 1), UPMC2 (PMR 2), UPMC3 (PMR 3)
Cache Access Registers [1]
  CDACNTL (DCR 351), CDADATA (DCR 350)

1 - These Zen-specific registers may not be supported by other Power Architecture processors

Figure 2. e200z759n3 supervisor mode programmer’s model DCRs and PMRs

USER Mode Programmer’s Model SPRs

General Registers
  Condition Register: CR
  General-Purpose Registers: GPR0–GPR31
  Count Register: CTR (SPR 9)
  Link Register: LR (SPR 8)
  XER: XER (SPR 1)
  Accumulator: ACC
Timers (Read only)
  Time Base: TBL (SPR 268), TBU (SPR 269)
Control Registers
  SPR General (Read-only): SPRG4 (SPR 260), SPRG5 (SPR 261), SPRG6 (SPR 262), SPRG7 (SPR 263)
  User SPR: USPRG0 (SPR 256)
Debug
  DEVENT (SPR 975), DDAM (SPR 576)
Cache Register (Read-only)
  Cache Configuration: L1CFG0 (SPR 515), L1CFG1 (SPR 516)
APU Registers
  SPE/EFPU APU Status and Control Register: SPEFSCR (SPR 512)

Figure 3. e200z759n3 user mode programmer’s model SPRs

User Mode Programmer’s Model PMRs

Performance Monitor Registers [1]
  User Control (read-only): UPMGC0 (PMR 384), UPMLCa0–UPMLCa3 (PMR 128–131), UPMLCb0–UPMLCb3 (PMR 256–259)
  User Counters (read-only): UPMC0 (PMR 0), UPMC1 (PMR 1), UPMC2 (PMR 2), UPMC3 (PMR 3)

1 - These Zen-specific registers may not be supported by other Power Architecture processors

Figure 4.
e200z759n3 user mode programmer’s model PMRs

General purpose registers (GPRs) are accessed through instruction operands. Access to other registers can be explicit (by using instructions for that purpose, such as the Move to Special Purpose Register (mtspr) and Move from Special Purpose Register (mfspr) instructions) or implicit as part of the execution of an instruction. Some registers are accessed both explicitly and implicitly.

2.1 PowerPC Book E registers

The e200z759n3 supports most of the registers defined by Book E: Enhanced PowerPC™ Architecture. Notable exceptions are the Floating Point registers FPR0-FPR31 and FPSCR. The e200z759n3 does not support the Book E Floating Point Architecture in hardware. The General Purpose registers have been extended to 64 bits. The Zen-supported Power Architecture Book E registers are described as follows (Zen-specific registers are described in the next sub-section):
• User-level registers — The user-level registers can be accessed by all software with either user or supervisor privileges. They include the following:
  — General-purpose registers (GPRs). The thirty-two 64-bit GPRs (GPR0–GPR31) serve as data source or destination registers for integer instructions and provide data for generating addresses. PowerPC Book E instructions affect only the lower 32 bits of the GPRs. SPE and EFP APU instructions are provided, which operate on the entire 64-bit register.
  — Condition register (CR). The 32-bit CR consists of eight 4-bit fields, CR0–CR7, that reflect results of certain arithmetic operations and provide a mechanism for testing and branching. See “Condition Register (CR),” in Chapter 3, “Branch and Condition Register Operations” of Book E: Enhanced PowerPC™ Architecture.
  The remaining user-level registers are SPRs. Note that the Power Architecture provides the mtspr and mfspr instructions for accessing SPRs.
  — Integer exception register (XER).
The XER indicates overflow and carries for integer operations. See “XER Register (XER),” in Chapter 4, “Integer Operations” of Book E: Enhanced PowerPC™ Architecture for more information.
  — Link register (LR). The LR provides the branch target address for the Branch [Conditional] to Link Register (bclr, bclrl, se_blr, se_blrl) instructions, and is used to hold the address of the instruction that follows a branch and link instruction, typically used for linking to subroutines. See “Link Register (LR)”, in Chapter 3, “Branch and Condition Register Operations” of Book E: Enhanced PowerPC™ Architecture.
  — Count register (CTR). The CTR holds a loop count that can be decremented during execution of appropriately coded branch instructions. The CTR also provides the branch target address for the Branch [Conditional] to Count Register (bcctr, bcctrl, se_bctr, se_bctrl) instructions. See “Count Register (CTR)”, in Chapter 3, “Branch and Condition Register Operations” of Book E: Enhanced PowerPC™ Architecture.
  — The Time Base facility (TB) consists of two 32-bit registers: Time Base Upper (TBU) and Time Base Lower (TBL). These two registers are accessible in a read-only fashion to user-level software. See “Time Base”, in Chapter 8, “Timer Facilities” of Book E: Enhanced PowerPC™ Architecture.
  — SPRG4-SPRG7. The PowerPC Book E architecture defines Software-Use Special Purpose Registers (SPRGs). SPRG4 through SPRG7 are accessible in a read-only fashion by user-level software. Zen does not allow user mode access to the SPRG3 register (defined as implementation dependent by Book E).
  — USPRG0. The Power Architecture Book E architecture defines User Software-Use Special Purpose Register USPRG0, which is accessible in a read-write fashion by user-level software.
• Supervisor-level registers — In addition to the registers accessible in user mode, supervisor-level software has access to additional control and status registers used for configuration, exception handling, and other operating system functions. The Power Architecture Book E architecture defines the following supervisor-level registers:
  — Processor Control registers
    – Machine State Register (MSR). The MSR defines the state of the processor. The MSR can be modified by the Move to Machine State Register (mtmsr), System Call (sc, se_sc), and Return from Exception (rfi, rfci, rfdi, rfmci, se_rfi, se_rfci, se_rfdi, se_rfmci) instructions. It can be read by the Move from Machine State Register (mfmsr) instruction. When an interrupt occurs, the contents of the MSR are saved to one of the machine state save/restore registers (SRR1, CSRR1, DSRR1, MCSRR1).
    – Processor version register (PVR). This register is a read-only register that identifies the version (model) and revision level of the Power Architecture processor.
    – Processor Identification Register (PIR). This read/write register is provided to distinguish the processor from other processors in the system.
  — Storage Control register
    – Process ID Register (PID, also referred to as PID0). This register is provided to indicate the current process or task identifier. It is used by the MMU as an extension to the effective address, and by external Nexus 2/3/4 modules for Ownership Trace message generation. PowerPC Book E allows for multiple PIDs; the e200z759n3 implements only one.
  — Interrupt Registers
    – Data Exception Address Register (DEAR). After most Data Storage Interrupts (DSI), or on an Alignment Interrupt or Data TLB Miss Interrupt, the DEAR is set to the effective address (EA) generated by the faulting instruction.
    – SPRG0–SPRG7, USPRG0. The SPRG0–SPRG7 and USPRG0 registers are provided for operating system use.
Zen does not allow user mode access to the SPRG3 register (defined as implementation dependent by Book E).
    – Exception Syndrome Register (ESR). The ESR register provides a syndrome to differentiate between the different kinds of exceptions that can generate the same interrupt.
    – Interrupt Vector Prefix Register (IVPR) and the Interrupt Vector Offset Registers (IVOR0-IVOR15, IVOR32-IVOR35). These registers together provide the address of the interrupt handler for different classes of interrupts.
    – Save/Restore Register 0 (SRR0). The SRR0 register is used to save machine state on a non-critical interrupt, and contains the address of the instruction at which execution resumes when an rfi or se_rfi instruction is executed at the end of a non-critical class interrupt handler routine.
    – Critical Save/Restore register 0 (CSRR0). The CSRR0 register is used to save machine state on a critical interrupt, and contains the address of the instruction at which execution resumes when an rfci or se_rfci instruction is executed at the end of a critical class interrupt handler routine.
    – Save/Restore register 1 (SRR1). The SRR1 register is used to save machine state from the MSR on non-critical interrupts, and to restore machine state when an rfi or se_rfi executes.
    – Critical Save/Restore register 1 (CSRR1). The CSRR1 register is used to save machine state from the MSR on critical interrupts, and to restore machine state when rfci or se_rfci executes.
  — Debug facility registers
    – Debug Control Registers (DBCR0-DBCR2). These registers provide control for enabling and configuring debug events.
    – Debug Status Register (DBSR). This register contains debug event status.
    – Instruction Address Compare registers (IAC1-IAC4). These registers contain addresses and/or masks that specify Instruction Address Compare debug events.
    – Data address compare registers (DAC1-2).
These registers contain addresses and/or masks that specify Data Address Compare debug events.
    – Data value compare registers (DVC1-2). These registers contain data values that specify Data Value Compare debug events.
  — Timer Registers
    – Time base (TB). The TB is a 64-bit structure provided for maintaining the time of day and operating interval timers. The TB consists of two 32-bit registers, Time Base Upper (TBU) and Time Base Lower (TBL). The Time Base registers can be written to only by supervisor-level software, but can be read by both user and supervisor-level software.
    – Decrementer register (DEC). This register is a 32-bit decrementing counter that provides a mechanism for causing a decrementer exception after a programmable delay.
    – Decrementer Auto-Reload (DECAR). This register is provided to support the auto-reload feature of the Decrementer.
    – Timer Control Register (TCR). This register controls Decrementer, Fixed-Interval Timer, and Watchdog Timer options.
    – Timer Status Register (TSR). This register contains status on timer events and the most recent Watchdog Timer-initiated processor reset.

2.2 Zen-specific special purpose registers

The Power Architecture Book E architecture allows implementation-specific special purpose registers. Those incorporated in the Zen core are as follows:
• User-level registers — The user-level registers can be accessed by all software with either user or supervisor privileges. They include the following:
  — Signal Processing Extension / Embedded Floating-point APU status and control register (SPEFSCR). The SPEFSCR contains all fixed-point and floating-point exception signal bits, exception summary bits, exception enable bits, and rounding control bits needed for compliance with the IEEE 754 standard. See Section 6.2.1, SPE Status and Control Register (SPEFSCR), in Chapter 6, Signal Processing Extension APU (SPE APU).
  — The L1 Cache Configuration registers (L1CFG0, L1CFG1).
These read-only registers allow software to query the configuration of the L1 Harvard caches.
• Supervisor-level registers — The following supervisor-level registers are defined in Zen in addition to the Power Architecture Book E registers described above:
  — Configuration Registers
    – Hardware implementation-dependent register 0 (HID0). This register controls various processor and system functions.
    – Hardware implementation-dependent register 1 (HID1). This register controls various processor and system functions.
  — Exception Handling and Control Registers
    – Machine Check Save/Restore register 0 (MCSRR0). The MCSRR0 register is used to save machine state on a Machine Check interrupt, and contains the address of the instruction at which execution resumes when an rfmci or se_rfmci instruction is executed.
    – Machine Check Save/Restore register 1 (MCSRR1). The MCSRR1 register is used to save machine state from the MSR on Machine Check interrupts, and to restore machine state when an rfmci or se_rfmci instruction is executed.
    – Machine Check Syndrome register (MCSR). This register provides a syndrome to differentiate between the different kinds of conditions that can generate a Machine Check.
    – Machine Check Address register (MCAR). This register provides an address associated with certain Machine Checks.
    – Debug Save/Restore register 0 (DSRR0). When enabled, the DSRR0 register is used to save the address of the instruction at which execution continues when an rfdi or se_rfdi instruction executes at the end of a debug interrupt handler routine.
    – Debug Save/Restore register 1 (DSRR1). When enabled, the DSRR1 register is used to save machine status on debug interrupts and to restore machine status when an rfdi or se_rfdi instruction executes.
    – SPRG8, SPRG9. The SPRG8 and SPRG9 registers are provided for operating system use for the Machine check and Debug APUs.
  — Debug Facility Registers
    – Instruction Address Compare registers (IAC5–IAC8). These registers contain addresses and/or masks that are used to specify Instruction Address Compare debug events.
    – Debug Control Registers 3–6 (DBCR3, DBCR4, DBCR5, DBCR6). These registers provide control for debug functions not described in the Power Architecture Book E architecture.
    – Debug External Resource Control Register 0 (DBERC0). This register provides control for debug functions not described in the Power Architecture Book E architecture.
    – Debug Counter Register (DBCNT). This register provides counter capability for debug functions.
  — Branch Unit Control and Status Register (BUCSR). This register controls operation of the BTB.
  — Cache Registers
    – L1 Cache Configuration Registers (L1CFG0, L1CFG1). These read-only registers allow software to query the configuration of the L1 Caches.
    – L1 Cache Control and Status Registers (L1CSR0, L1CSR1). These registers control the operation of the L1 Caches, such as cache enabling, cache invalidation, and cache locking.
    – L1 Cache Flush and Invalidate Registers (L1FINV0, L1FINV1). These registers control software flushing and invalidation of the L1 Caches.
  — Memory Management Unit Registers
    – MMU Configuration Register (MMUCFG). This read-only register allows software to query the configuration of the MMU.
    – MMU Assist (MAS0-MAS4, MAS6) registers. These registers provide the interface to the Zen core from the Memory Management Unit.
    – MMU Control and Status Register (MMUCSR0). This register controls invalidation of the MMU.
    – TLB Configuration Registers (TLB0CFG, TLB1CFG). These read-only registers allow software to query the configuration of the TLBs.
  — System version register (SVR). This register is a read-only register that identifies the version (model) and revision level of the System that includes a Zen Power Architecture processor.
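All of the SPRs listed in Sections 2.1 and 2.2 are reached with mfspr and mtspr using their decimal SPR numbers. As an illustration of how such a number ends up in an instruction word, the following C sketch builds the 32-bit Book E encodings. The helper names (`spr_field`, `encode_mfspr`, `encode_mtspr`) are hypothetical and are not part of any toolchain; the sketch relies on the standard PowerPC convention that the 10-bit SPR number is stored with its two 5-bit halves swapped, which should be checked against the instruction set reference before use.

```c
#include <stdint.h>

/* Place a 10-bit SPR number into the instruction's SPR field.
 * The low 5 bits of the number go into the more significant half
 * of the field, and the high 5 bits into the less significant half. */
static inline uint32_t spr_field(uint32_t spr)
{
    return ((spr & 0x1Fu) << 16) | (((spr >> 5) & 0x1Fu) << 11);
}

/* mfspr rT,SPR: primary opcode 31, extended opcode 339. */
static inline uint32_t encode_mfspr(uint32_t rt, uint32_t spr)
{
    return 0x7C0002A6u | ((rt & 0x1Fu) << 21) | spr_field(spr);
}

/* mtspr SPR,rS: primary opcode 31, extended opcode 467. */
static inline uint32_t encode_mtspr(uint32_t spr, uint32_t rs)
{
    return 0x7C0003A6u | ((rs & 0x1Fu) << 21) | spr_field(spr);
}
```

For example, `encode_mfspr(0, 8)` yields 0x7C0802A6, the familiar encoding of mflr r0 (LR is SPR 8), and `encode_mfspr(3, 286)` yields the word that reads the PIR (SPR 286) into r3.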
Note that it is not guaranteed that the implementation of Zen core-specific registers is consistent among Power Architecture processors, although other processors may implement similar or identical registers. All Zen SPR definitions are compliant with the Freescale EIS definitions.

2.3 Zen-specific device control registers

In addition to the SPRs described above, implementations may also choose to implement one or more Device Control Registers (DCRs). The e200z759n3 implements a set of device control registers that provide a parallel signature capability in the Parallel Signature Unit (PSU). These registers are described in Section 12.9, Parallel Signature unit.

2.4 Special-purpose register descriptions

2.4.1 Machine State Register (MSR)

A complete description of the Machine State Register (MSR) begins on pg. 37 of Book E: Enhanced PowerPC™ Architecture v0.99. The Machine State Register defines the state of the processor. Chapter 7, Interrupts and Exceptions, describes how the MSR is affected when interrupts occur. The Zen MSR is shown in Figure 5.

Bit:    0:4  5     6    7:12  13  14  15  16  17  18  19  20   21  22  23   24  25  26  27  28  29   30  31
Field:  0    UCLE  SPE  0     WE  CE  0   EE  PR  FP  ME  FE0  0   DE  FE1  0   0   IS  DS  0   PMM  RI  0
Read/Write; Reset - 0x0

Figure 5. Machine State Register (MSR)

The MSR bits are defined in Table 2.

Table 2. MSR field descriptions

Bit(s)         Name   Description
0:4 (32:36)    —      Reserved [1]
5 (37)         UCLE   User Cache Lock Enable
                      0 Execution of the cache locking instructions in user mode (MSR[PR]=1) disabled; DSI exception taken instead, and ILK or DLK set in ESR.
                      1 Execution of the cache lock instructions in user mode enabled.
6 (38)         SPE    SPE/EFPU Available
                      0 Execution of SPE and EFPU APU vector instructions is disabled; SPE/EFPU Unavailable exception taken instead, and SPE bit is set in ESR.
                      1 Execution of SPE and EFPU APU vector instructions is enabled.
7:12 (39:44)   —      Reserved [1]
13 (45)        WE     Wait State (Power management) enable
                      0 Power management is disabled.
                      1 Power management is enabled. The processor can enter a power-saving mode when additional conditions are present. The mode chosen is determined by the DOZE, NAP, and SLEEP bits in the HID0 register, described in Section 2.4.11, Hardware Implementation Dependent Register 0 (HID0).
14 (46)        CE     Critical Interrupt Enable
                      0 Critical Input and Watchdog Timer interrupts are disabled.
                      1 Critical Input and Watchdog Timer interrupts are enabled.
15 (47)        —      Preserved [1]
16 (48)        EE     External Interrupt Enable
                      0 External Input, Decrementer, and Fixed-Interval Timer interrupts are disabled.
                      1 External Input, Decrementer, and Fixed-Interval Timer interrupts are enabled.
17 (49)        PR     Problem State
                      0 The processor is in supervisor mode, can execute any instruction, and can access any resource (e.g. GPRs, SPRs, MSR, etc.).
                      1 The processor is in user mode, cannot execute any privileged instruction, and cannot access any privileged resource.
18 (50)        FP     Floating-Point Available
                      0 Floating point unit is unavailable. The processor cannot execute floating-point instructions, including floating-point loads, stores, and moves.
                      1 Floating Point unit is available. The processor can execute floating-point instructions.
                      Note: For e200z759n3, the floating point unit is not supported in hardware, and an Illegal Instruction exception will be generated for attempted execution of PowerPC Book E floating point instructions regardless of the setting of FP. FP is ignored, but cleared on exceptions.
19 (51)        ME     Machine Check Enable
                      0 Asynchronous Machine Check interrupts are disabled.
                      1 Asynchronous Machine Check interrupts are enabled.
20 (52)        FE0    Floating-point exception mode 0 (not used by Zen)
21 (53)        —      Reserved [1]
22 (54)        DE     Debug Interrupt Enable
                      0 Debug interrupts are disabled.
                      1 Debug interrupts are enabled.
23 (55)        FE1    Floating-point exception mode 1 (not used by Zen)
24 (56)        —      Reserved [1]
25 (57)        —      Preserved [1]
26 (58)        IS     Instruction Address Space
                      0 The processor directs all instruction fetches to address space 0 (TS=0 in the relevant TLB entry).
                      1 The processor directs all instruction fetches to address space 1 (TS=1 in the relevant TLB entry).
27 (59)        DS     Data Address Space
                      0 The processor directs all data storage accesses to address space 0 (TS=0 in the relevant TLB entry).
                      1 The processor directs all data storage accesses to address space 1 (TS=1 in the relevant TLB entry).
28 (60)        —      Reserved [1]
29 (61)        PMM    Performance monitor mark bit. System software can set PMM when a marked process is running to enable statistics to be gathered only during the execution of the marked process. MSR[PR] and MSR[PMM] together define a state that the processor (supervisor or user) and the process (marked or unmarked) may be in at any time. If this state matches an individual state specified in the Performance Monitor registers PMLCan, the state for which monitoring is enabled, counting is enabled.
30 (62)        RI     Recoverable Interrupt. This bit is provided for software use to detect nested exception conditions. This bit is cleared by hardware when a Machine Check interrupt is taken.
31 (63)        —      Preserved [1]

NOTES:
1 These bits are not implemented, will be read as zero, and writes are ignored.

2.4.2 Processor ID Register (PIR)

The processor ID for the CPU core is contained in the Processor ID Register (PIR). The contents of the PIR register are a reflection of hardware input signals to the Zen core following reset. This register may be written by software to modify the default reset value.

Bit:    0:31
Field:  ID
SPR - 286; Read/Write; Reset - bits 24:31 updated to reflect the values on p_cpuid[0:7], bits 0:23 reset to 0

Figure 6.
Processor ID Register (PIR)

The PIR fields are defined in Table 3.

Table 3. PIR field descriptions

Bits  Name  Description
0:23  ID  These bits are reset to 0. These bits are writable by software.
24:31  ID  These bits are reset to the values provided on the p_cpuid[0:7] input signals. These bits are writable by software.

2.4.3 Processor Version Register (PVR)

The Processor Version Register (PVR) contains the processor version number for the CPU core.

Figure 7. Processor Version Register (PVR)
Bit layout: 0:3 Manuf. ID (1000), 4:5 reserved (00), 6:11 Type (010110), 12:15 Version, 16:19 MBG Use, 20:23 Minor Rev, 24:27 Major Rev, 28:31 MBG ID.
SPR - 287; Read-only

This register contains fields to specify a particular implementation of a Zen family member as well as allocating fields to be used by a particular business unit at their discretion. This register is read-only. Interface signals p_pvrin[16:31] provide the contents of a portion of this register.

Table 4. PVR field descriptions

Bits  Name  Description
0:3  Manuf. ID  These bits identify the Manufacturer ID. Freescale is 4'b1000.
4:5  —  These bits are reserved (00)
6:11  Type  These bits identify the processor type. Zen Z7 is 6'b010110.
12:15  Version  These bits identify the version of the processor and inclusion of optional elements. For e200z759n3, these are tied to 4'b1001.
16:19  MBG Use  These bits are allocated for use by Freescale Business Groups to distinguish different system variants, and are provided by the p_pvrin[16:19] input signals.
20:23  Minor Rev  These bits distinguish between implementations of the version, and are provided by the p_pvrin[20:23] input signals.
24:27  Major Rev  These bits distinguish between implementations of the version, and are provided by the p_pvrin[24:27] input signals.
28:31  MBG ID  These bits identify the Freescale Business Group responsible for a particular mask set, and are provided by the p_pvrin[28:31] input signals. MBG value of 4'b0000 is reserved.

2.4.4 System Version Register (SVR)

The System Version Register (SVR) contains system version information for a Zen-based SoC.

Figure 8. System Version Register (SVR)
Bit layout: 0:31 System Version.
SPR - 1023; Read-only

This register is used to specify a particular implementation of a Zen-based system by a particular business unit at their discretion. This register is read-only.

Table 5. SVR field descriptions

Bits  Name  Description
0:31  Version  These bits are allocated for use by Freescale Business Groups to distinguish different system variants, and are provided by the p_sysvers[0:31] input signals.

2.4.5 Integer Exception Register (XER)

A complete description of the Integer Exception Register (XER) begins on pg. 51 of Book E: Enhanced PowerPC™ Architecture v0.99. The XER bit assignments are shown in Figure 9.

Figure 9. Integer Exception Register (XER)
Bit layout: 0 SO, 1 OV, 2 CA, 3:24 reserved, 25:31 Bytecnt.
SPR - 1; Read/Write; Reset - 0x0

The XER fields are defined in Table 6.

Table 6. XER field descriptions

Bits  Name  Description
0 (32)  SO  Summary Overflow (per Book E)
1 (33)  OV  Overflow (per Book E)
2 (34)  CA  Carry (per Book E)
3:24 (35:56)  —  Reserved (note 1)
25:31 (57:63)  Bytecnt (note 2)  Preserved for lswi, lswx, stswi, stswx string instructions

NOTES:
1 These bits are not implemented, will be read as zero, and writes are ignored.
2 These bits are implemented to support emulation of the string instructions.

2.4.6 Exception Syndrome Register

A complete description of the Exception Syndrome Register (ESR) begins on pg. 142 of Book E: Enhanced PowerPC™ Architecture v0.99.
The Exception Syndrome Register (ESR) provides a syndrome to differentiate between exceptions that can generate the same interrupt type. Zen adds some implementation specific bits to this register, as seen in Figure 10.

Figure 10. Exception Syndrome Register (ESR)
Bit layout: 0:3 allocated, 4 PIL, 5 PPR, 6 PTR, 7 FP, 8 ST, 9 reserved, 10 DLK, 11 ILK, 12 AP, 13 PUO, 14 BO, 15 PIE, 16:23 reserved, 24 SPE, 25 allocated, 26 VLEMI, 27:29 allocated, 30 MIF, 31 allocated.
SPR - 62; Read/Write; Reset - 0x0

The ESR fields are defined in Table 7.

Table 7. ESR field descriptions

Bits  Name  Description
    Associated interrupt type
0:3 (32:35)  —  Allocated (note 1)
    —
4 (36)  PIL  Illegal Instruction exception (For e200z759n3, PIL used for all illegal/unimps)
    Program
5 (37)  PPR  Privileged Instruction exception
    Program
6 (38)  PTR  Trap exception
    Program
7 (39)  FP  Floating-point operation
    Alignment, Data Storage, Data TLB, Program
8 (40)  ST  Store operation
    Alignment, Data Storage, Data TLB
9 (41)  —  Reserved (note 2)
    —
10 (42)  DLK  Data Cache Locking
    Data Storage
11 (43)  ILK  Instruction Cache Locking
    Data Storage
12 (44)  AP  Auxiliary Processor operation (Currently unused in Zen)
    Alignment, Data Storage, Data TLB, Program
13 (45)  PUO  Unimplemented Operation exception (Not used by e200z759n3, PIL used for all illegal/unimps)
    Program
14 (46)  BO  Byte Ordering exception, Mismatched Instruction Storage exception
    Data Storage, Instruction Storage
15 (47)  PIE  Program Imprecise exception (Reserved)
    Currently unused in Zen
16:23 (48:55)  —  Reserved (note 2)
    —
24 (56)  SPE  SPE/EFPU APU Operation
    SPE/EFPU Unavailable, EFPU Floating-point Data Exception, EFPU Floating-point Round Exception, Alignment, Data Storage, Data TLB
25 (57)  —  Allocated (note 1)
    —
26 (58)  VLEMI  VLE Mode Instruction
    SPE/EFPU Unavailable, EFPU Floating-point Data Exception, EFPU Floating-point Round Exception, Data Storage, Data TLB, Instruction Storage, Alignment, Program, System Call
27:29 (59:61)  —  Allocated (note 1)
    —
30 (62)  MIF  Misaligned Instruction Fetch
    Instruction Storage, Instruction TLB
31 (63)  —  Allocated (note 1)
    —

NOTES:
1 These bits are not implemented and should be written with zero for future compatibility.
2 These bits are not implemented, and should be written with zero for future compatibility.

2.4.6.1 PowerPC VLE mode instruction syndrome

The ESR[VLEMI] bit is provided to indicate that an interrupt was caused by a PowerPC VLE instruction. This syndrome bit is set on an exception associated with execution or attempted execution of a PowerPC VLE instruction. This bit is updated for the interrupt types indicated in Table 7.

2.4.6.2 Misaligned instruction fetch syndrome

The ESR[MIF] bit is provided to indicate that an Instruction Storage Interrupt was caused by an attempt to fetch an instruction from a BookE page that was not aligned on a word boundary. The fetch may have been caused by execution of a Branch class instruction from a VLE page to a non-VLE page, a Branch to LR instruction with LR[62]=1, a Branch to CTR instruction with CTR[62]=1, execution of an rfi or se_rfi instruction with SRR0[62]=1, execution of an rfci or se_rfci instruction with CSRR0[62]=1, execution of an rfdi or se_rfdi instruction with DSRR0[62]=1, or execution of an rfmci or se_rfmci instruction with MCSRR0[62]=1, where the destination address corresponds to an instruction page that is not marked as a PowerPC VLE page.

The ESR[MIF] bit is also used to indicate that an Instruction TLB Interrupt was caused by a TLB miss on the second half of a misaligned 32-bit PowerPC VLE Instruction. For this case, SRR0 will be pointing to the first half of the instruction, which will reside on the previous page from the miss at page offset 0xFFE. The ITLB handler may need to realize that the miss corresponds to the next page, although MMU MAS2 contents will correctly reflect the page corresponding to the miss.

2.4.7 Machine Check Syndrome Register (MCSR)

When the core complex takes a machine check interrupt, it updates the Machine Check Syndrome register (MCSR) to differentiate between machine check conditions. The MCSR is shown in Figure 11.

Figure 11. Machine Check Syndrome Register (MCSR)
Bit layout: 0 MCP, 1 IC_DPERR, 2 CP_PERR, 3 DC_DPERR, 4 EXCP_ERR, 5 IC_TPERR, 6 DC_TPERR, 7 IC_LKERR, 8 DC_LKERR, 9:10 reserved, 11 NMI, 12 MAV, 13 MEA, 14 reserved, 15 IF, 16 LD, 17 ST, 18 G, 19:25 reserved, 26 SNPERR, 27 BUS_IRERR, 28 BUS_DRERR, 29 BUS_WRERR, 30:31 reserved.
SPR - 572; Read/Clear; Reset - 0x0

Table 8 describes MCSR fields. The MCSR indicates the source of a machine check condition. When an “Async Mchk” or “Error Report” syndrome bit in the MCSR is set, the core complex asserts p_mcp_out for system information.

Note that the bits in the MCSR are implemented as “write ‘1’ to clear”, so software must write ones into those bit positions it wishes to clear, typically by writing back what was originally read. See Section 7.7.2, Machine Check interrupt (IVOR1), for more details of the MCSR settings.

Table 8.
MCSR field descriptions

Bit  Name  Description
    Exception type (note 1); Recoverable
0 (32)  MCP  Machine check input pin
    Async Mchk; Maybe
1 (33)  IC_DPERR  Instruction Cache data array parity error
    Async Mchk; Precise
2 (34)  CP_PERR  Data Cache push parity error
    Async Mchk; Unlikely
3 (35)  DC_DPERR  Data Cache data array parity error
    Async Mchk; Maybe
4 (36)  EXCP_ERR  ISI, ITLB, or Bus Error on first instruction fetch for an exception handler
    Async Mchk; Precise
5 (37)  IC_TPERR  Instruction Cache Tag parity error
    Async Mchk; Precise
6 (38)  DC_TPERR  Data Cache Tag parity error
    Async Mchk; Maybe
7 (39)  IC_LKERR  Instruction Cache Lock error. Indicates a cache control operation or invalidation operation invalidated one or more locked lines in the ICache.
    Status; —
8 (40)  DC_LKERR  Data Cache Lock error. Indicates a cache control operation or instruction invalidation operation invalidated one or more locked lines in the DCache.
    Status; —
9:10 (41:42)  —  Reserved, should be cleared.
    —; —
11 (43)  NMI  NMI input pin
    NMI; —
12 (44)  MAV  MCAR Address Valid. Indicates that the address contained in the MCAR was updated by hardware to correspond to the first detected Async Mchk error condition.
    Status; —
13 (45)  MEA  MCAR holds Effective Address. If MAV=1, MEA=1 indicates that the MCAR contains an effective address and MEA=0 indicates that the MCAR contains a physical address.
    Status; —
14 (46)  —  Reserved, should be cleared.
    —; —
15 (47)  IF  Instruction Fetch Error Report. An error occurred during the attempt to fetch an instruction. MCSRR0 contains the instruction address.
    Error Report; Precise
16 (48)  LD  Load type instruction Error Report. An error occurred during the attempt to execute the load type instruction located at the address stored in MCSRR0.
    Error Report; Precise
17 (49)  ST  Store type instruction Error Report. An error occurred during the attempt to execute the store type instruction located at the address stored in MCSRR0.
    Error Report; Precise
18 (50)  G  Guarded instruction Error Report. An error occurred during the attempt to execute the load or store type instruction located at the address stored in MCSRR0 and the access was guarded and encountered an error on the external bus.
    Error Report; Precise
19:25 (51:57)  —  Reserved, should be cleared.
    —; —
26 (58)  SNPERR  Snoop Lookup Error. An error occurred during certain snoop operations. This is typically due to a data cache tag parity error, in which case DC_TPERR will also be set.
    Async Mchk; Unlikely
27 (59)  BUS_IRERR  Read bus error on Instruction fetch or linefill
    Async Mchk; Precise if data used
28 (60)  BUS_DRERR  Read bus error on data load or linefill
    Async Mchk; Precise if data used
29 (61)  BUS_WRERR  Write bus error on store or cache line push
    Async Mchk; Unlikely
30:31 (62:63)  —  Reserved, should be cleared.
    —; —

NOTES:
1 The Exception Type indicates the exception type associated with a given syndrome bit.
- “Error Report” indicates that this bit is only set for error report exceptions that cause machine check interrupts. These bits are only updated when the machine check interrupt is actually taken. Error report exceptions are not gated by MSR[ME]. These are synchronous exceptions. These bits will remain set until cleared by software writing a “1” to the bit position(s) to be cleared.
- “Status” indicates that this bit provides additional status information regarding the logging of a machine check exception. These bits will remain set until cleared by software writing a “1” to the bit position(s) to be cleared.
- “NMI” indicates that this bit is only set for the non-maskable interrupt type exception that causes a machine check interrupt. This bit is only updated when the machine check interrupt is actually taken. NMI exceptions are not gated by MSR[ME]. This is an asynchronous exception. This bit will remain set until cleared by software writing a “1” to the bit position.
- “Async Mchk” indicates that this bit is set for an asynchronous machine check exception. These bits are set immediately upon detection of the error. Once any “Async Mchk” bit is set in the MCSR, a machine check interrupt will occur if MSR[ME]=1. If MSR[ME]=0, the machine check exception will remain pending. These bits will remain set until cleared by software writing a “1” to the bit position(s) to be cleared.

2.4.8 Timer Control Register (TCR)

The Timer Control Register (TCR) provides control information for the CPU timer facilities. A complete description of the TCR begins on pg. 182 of Book E: Enhanced PowerPC™ Architecture v0.99. The TCR[WRC] field functions are defined to be implementation-dependent and are described below. In addition, the Zen core implements two fields not specified in Book E, TCR[WPEXT] and TCR[FPEXT]. The TCR is shown in Figure 12.

Figure 12. Timer Control Register (TCR)
Bit layout: 0:1 WP, 2:3 WRC, 4 WIE, 5 DIE, 6:7 FP, 8 FIE, 9 ARE, 10 reserved, 11:14 WPEXT, 15:18 FPEXT, 19:31 reserved.
SPR - 340; Read/Write; Reset - 0x0

The TCR fields are defined in Table 9.

Table 9. TCR field descriptions

Bits  Name  Description
0:1 (32:33)  WP  Watchdog Timer Period
    When concatenated with WPEXT, specifies one of 64 bit locations of the time base used to signal a watchdog timer exception on a transition from 0 to 1.
    TCR[WPEXT][0:3],TCR[WP][0:1] == 6'b000000 selects TBU[0]
    TCR[WPEXT][0:3],TCR[WP][0:1] == 6'b111111 selects TBL[31]

Table 9.
TCR field descriptions (continued)

Bits  Name  Description
2:3 (34:35)  WRC  Watchdog Timer Reset Control
    00 No Watchdog Timer reset will occur
    01 Assert watchdog reset status output 1 (p_wrs[1]) on second time-out of Watchdog Timer
    10 Assert watchdog reset status output 0 (p_wrs[0]) on second time-out of Watchdog Timer
    11 Assert watchdog reset status outputs 0 and 1 (p_wrs[0], p_wrs[1]) on second time-out of Watchdog Timer
    TCR[WRC] resets to 0b00. This field may be set by software, but cannot be cleared by software (except by a software-induced reset). Once written to a non-zero value, this field may no longer be altered by software.
4 (36)  WIE  Watchdog Timer Interrupt Enable
5 (37)  DIE  Decrementer Interrupt Enable
6:7 (38:39)  FP  Fixed-Interval Timer Period
    When concatenated with FPEXT, specifies one of 64 bit locations of the time base used to signal a fixed-interval timer exception on a transition from 0 to 1.
    TCR[FPEXT][0:3],TCR[FP][0:1] == 6'b000000 selects TBU[0]
    TCR[FPEXT][0:3],TCR[FP][0:1] == 6'b111111 selects TBL[31]
8 (40)  FIE  Fixed-Interval Timer Interrupt Enable
9 (41)  ARE  Auto-reload Enable
10 (42)  —  Reserved (note 1)
11:14 (43:46)  WPEXT  Watchdog Timer Period Extension (see above description for WP)
    These bits get prepended to the TCR[WP] bits to allow selection of one of the 64 Time Base bits used to signal a Watchdog Timer exception.
    tb[0:63] ← TBU[0:31] || TBL[0:31]
    wp ← TCR[WPEXT] || TCR[WP]
    tb_wp_bit ← tb[wp]
15:18 (47:50)  FPEXT  Fixed-Interval Timer Period Extension (see above description for FP)
    These bits get prepended to the TCR[FP] bits to allow selection of one of the 64 Time Base bits used to signal a Fixed-Interval Timer exception.
    tb[0:63] ← TBU[0:31] || TBL[0:31]
    fp ← TCR[FPEXT] || TCR[FP]
    tb_fp_bit ← tb[fp]
19:31 (51:63)  —  Reserved (note 1)

NOTES:
1 These bits are not implemented and should be written with zero for future compatibility.

2.4.9 Timer Status Register (TSR)

The Timer Status Register (TSR) provides status information for the CPU timer facilities. A complete description of the TSR begins on pg. 184 of Book E: Enhanced PowerPC™ Architecture v0.99. The TSR[WRS] field is defined to be implementation-dependent and is described below. The TSR is shown in Figure 13.

Figure 13. Timer Status Register (TSR)
Bit layout: 0 ENW, 1 WIS, 2:3 WRS, 4 DIS, 5 FIS, 6:31 reserved.
SPR - 336; Read/Clear; Reset - 0x0

The TSR fields are defined in Table 10.

Table 10. TSR field descriptions

Bits  Name  Description
0 (32)  ENW  Enable Next Watchdog
1 (33)  WIS  Watchdog timer interrupt status
2:3 (34:35)  WRS  Watchdog timer reset status
    00 No second time-out of Watchdog Timer has occurred.
    01 Assertion of watchdog reset status output 1 (p_wrs[1]) on second time-out of Watchdog Timer has occurred.
    10 Assertion of watchdog reset status output 0 (p_wrs[0]) on second time-out of Watchdog Timer has occurred.
    11 Assertion of watchdog reset status outputs 0 and 1 (p_wrs[0], p_wrs[1]) on second time-out of Watchdog Timer has occurred.
4 (36)  DIS  Decrementer interrupt status
5 (37)  FIS  Fixed-Interval Timer interrupt status
6:31 (38:63)  —  Reserved (note 1)

NOTES:
1 These bits are not implemented and should be written with zero for future compatibility.

NOTE
The Timer Status Register can be read using mfspr RT,TSR. The Timer Status Register cannot be directly written to. Instead, bits in the Timer Status Register corresponding to 1 bits in GPR(RS) can be cleared using mtspr TSR,RS.

2.4.10 Debug registers

The Debug facility registers are described in Chapter 12, Debug Support.
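The read/clear behaviour described in the TSR note above (a 1 written by mtspr clears the corresponding status bit, a 0 leaves it alone) can be sketched in portable C. This is a minimal software model and not the hardware interface: the macro names and the function `tsr_after_mtspr` are hypothetical, introduced only for illustration. The bit positions come from Table 10, using the manual's numbering in which bit 0 is the most-significant bit.

```c
#include <stdint.h>

/* The manual numbers bits from the most-significant end:
 * bit 0 is the MSB of the 32-bit register, bit 31 the LSB. */
#define TSR_BIT(n)    (UINT32_C(1) << (31 - (n)))

/* Field positions taken from Table 10 (names are illustrative). */
#define TSR_ENW       TSR_BIT(0)                  /* Enable Next Watchdog      */
#define TSR_WIS       TSR_BIT(1)                  /* Watchdog interrupt status */
#define TSR_WRS_MASK  (TSR_BIT(2) | TSR_BIT(3))   /* Watchdog reset status     */
#define TSR_DIS       TSR_BIT(4)                  /* Decrementer status        */
#define TSR_FIS       TSR_BIT(5)                  /* Fixed-interval status     */

/* Hypothetical model of "mtspr TSR,RS": every 1 bit in rs clears the
 * matching TSR bit, every 0 bit in rs leaves the TSR bit unchanged. */
static inline uint32_t tsr_after_mtspr(uint32_t tsr, uint32_t rs)
{
    return tsr & ~rs;
}
```

Under this model, acknowledging only the decrementer means writing `TSR_DIS` alone; writing back the full value that was read would also clear any other status bits that happened to be set. Section 2.4.7 describes the same write-1-to-clear convention for the MCSR.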
2.4.11 Hardware Implementation Dependent Register 0 (HID0)

The HID0 register is a Zen implementation dependent register used for various configuration and control functions. The HID0 register is shown in Figure 14.

Figure 14. Hardware Implementation Dependent Register 0 (HID0)
Bit layout: 0 EMCP, 1:7 reserved, 8 DOZE, 9 NAP, 10 SLEEP, 11:13 reserved, 14 ICR, 15 NHR, 16 reserved, 17 TBEN, 18 SEL_TBCLK, 19 DCLREE, 20 DCLRCE, 21 CICLRDE, 22 MCCLRDE, 23 DAPUEN, 24:30 reserved, 31 NOPTI.
SPR - 1008; Read/Write; Reset - 0x0

The HID0 fields are defined in Table 11.

Table 11. HID0 field descriptions

Bits  Name  Description
0 [32]  EMCP  Enable machine check pin (p_mcp_b)
    0 p_mcp_b pin is disabled.
    1 p_mcp_b pin is enabled. Asserting p_mcp_b causes a machine check interrupt to be reported.
1:7 [33:39]  —  Reserved (note 1)
8 [40]  DOZE  Configure for Doze power management mode
    0 Doze mode is disabled.
    1 Doze mode is enabled.
    Doze mode is invoked by setting MSR[WE] while this bit is set.
9 [41]  NAP  Configure for Nap power management mode
    0 Nap mode is disabled.
    1 Nap mode is enabled.
    Nap mode is invoked by setting MSR[WE] while this bit is set.
10 [42]  SLEEP  Configure for Sleep power management mode
    0 Sleep mode is disabled.
    1 Sleep mode is enabled.
    Sleep mode is invoked by setting MSR[WE] while this bit is set.
    Only one of DOZE, NAP, or SLEEP should be set for proper operation.
11:13 [43:45]  —  Reserved (note 1)
14 [46]  ICR  Interrupt Inputs Clear Reservation
    0 External Input, Critical Input, and Non-Maskable Interrupts do not affect reservation status.
    1 External Input, Critical Input, and Non-Maskable Interrupts clear an outstanding reservation.
15 [47]  NHR  Not hardware reset
    0 Indicates to a reset exception handler that a reset occurred if software had previously set this bit.
    1 Indicates to a reset exception handler that no reset occurred if software had previously set this bit.
    Provided for software use: set anytime by software, cleared by reset.
16 [48]  —  Reserved (note 1)
17 [49]  TBEN  TimeBase Enable
    0 TimeBase is disabled.
    1 TimeBase is enabled.
18 [50]  SEL_TBCLK  Select TimeBase Clock
    0 TimeBase is based on processor clock.
    1 TimeBase is based on p_tbclk input.
    This bit controls the clock source for the TimeBase. Altering this bit must be done while the time base is disabled to preclude glitching of the counter. Timer interrupts should be disabled prior to alteration, and the TBL and TBU registers re-initialized following a change of TimeBase clock source.
19 [51]  DCLREE  Debug Interrupt Clears MSR[EE]
    0 MSR[EE] unaffected by Debug Interrupt.
    1 MSR[EE] cleared by Debug Interrupt.
    This bit controls whether Debug interrupts force External Input interrupts to be disabled, or whether they remain unaffected.
20 [52]  DCLRCE  Debug Interrupt Clears MSR[CE]
    0 MSR[CE] unaffected by Debug Interrupt.
    1 MSR[CE] cleared by Debug Interrupt.
    This bit controls whether Debug interrupts force Critical interrupts to be disabled, or whether they remain unaffected.
21 [53]  CICLRDE  Critical Interrupt Clears MSR[DE]
    0 MSR[DE] unaffected by Critical class interrupt.
    1 MSR[DE] cleared by Critical class interrupt.
    This bit controls whether certain Critical interrupts (Critical Input, Watchdog Timer) force Debug interrupts to be disabled, or whether they remain unaffected. Machine Check interrupts have a separate control bit.
    If Critical Interrupt Debug events are enabled (DBCR0[CIRPT] set, which should only be done when the Debug APU is enabled), and MSR[DE] is set at the time of a (Critical Input, Watchdog Timer) Critical interrupt, a debug event will be generated after the Critical Interrupt Handler has been fetched, and the Debug handler will be executed first. In this case, DSRR0[DE] will have been cleared, such that after returning from the debug handler, the Critical interrupt handler will not be run with MSR[DE] enabled.
22 [54]  MCCLRDE  Machine Check Interrupt Clears MSR[DE]
    0 MSR[DE] unaffected by Machine Check interrupt.
    1 MSR[DE] cleared by Machine Check interrupt.
    This bit controls whether Machine Check interrupts force Debug interrupts to be disabled, or whether they remain unaffected.
23 [55]  DAPUEN  Debug APU enable
    0 Debug APU disabled.
    1 Debug APU enabled.
    This bit controls whether the Debug APU is enabled. When enabled, Debug interrupts use the DSRR0/DSRR1 registers for saving state, and the rfdi instruction is available for returning from a debug interrupt.
    When disabled, Debug Interrupts use the critical interrupt resources CSRR0/CSRR1 for saving state, the rfci instruction is used for returning from a debug interrupt, and the rfdi instruction is treated as an illegal instruction. When disabled, the settings of the DCLREE, DCLRCE, CICLRDE, and MCCLRDE bits are ignored and are assumed to be ‘1’s.
    Read and write access to DSRR0/DSRR1 via the mfspr and mtspr instructions is not affected by this bit.
24 [56]  —  Reserved (note 1)
25:30 [57:62]  —  Reserved (note 1)
31 [63]  NOPTI  No-op Touch Instructions
    0 icbt, dcbt, dcbtst instructions operate normally.
    1 icbt, dcbt, dcbtst instructions are no-oped.
    This bit only affects the icbt, dcbt, and dcbtst instructions.

NOTES:
1 These bits are not implemented and should be written with zero for future compatibility.

2.4.12 Hardware Implementation Dependent Register 1 (HID1)

The HID1 register is used for bus configuration and system control. HID1 is shown in Figure 15.

Bit layout: 0:15 reserved, 16:23 SYSCTL, 24 ATS, 25:31 reserved.
SPR - 1009; Read/Write; Reset - 0x0
Figure 15.
Hardware Implementation Dependent Register 1 (HID1)

The HID1 fields are defined in Table 12.

Table 12. HID1 field descriptions

Bits  Name  Description
0:15 [32:47]  —  Reserved (note 1)
16:23 [48:55]  SYSCTL  System Control
    These bits are reflected on the outputs of the p_hid1_sysctl[0:7] output signals for use in controlling the system. They may need external synchronization.
24 [56]  ATS  Atomic status (read-only)
    Indicates state of the reservation bit in the load/store unit. See Section 3.5, Memory synchronization and reservation instructions for more detail.
25:31 [57:63]  —  Reserved (note 1)

NOTES:
1 These bits are not implemented and should be written with zero for future compatibility.

2.4.13 Branch Unit Control and Status Register (BUCSR)

The BUCSR register is used for general control and status of the branch target buffer (BTB). BUCSR is shown in Figure 16.

Figure 16. Branch Unit Control and Status Register (BUCSR)
Bit layout: 0:21 reserved, 22 BBFI, 23:25 reserved, 26:27 BALLOC, 28 reserved, 29:30 BPRED, 31 BPEN.
SPR - 1013; Read/Write; Reset - 0x0

The BUCSR fields are defined in Table 13.

Table 13. BUCSR field descriptions

Bits  Name  Description
0:21 [32:53]  —  Reserved (note 1)
22 [54]  BBFI  Branch target buffer flash invalidate.
    When written to a ‘1’, BBFI flash clears the valid bit of all entries in the branch buffer; clearing occurs regardless of the value of the enable bit (BPEN). BBFI is always read as 0.
23:25 [55:57]  —  Reserved (note 1)
26:27 [58:59]  BALLOC  Branch Target Buffer Allocation Control
    00 - Branch Target Buffer allocation for all branches is enabled.
    01 - Branch Target Buffer allocation is disabled for backward branches.
    10 - Branch Target Buffer allocation is disabled for forward branches.
    11 - Branch Target Buffer allocation is disabled for both branch directions.
    This field controls BTB allocation for branch acceleration when BPEN = 1. BTB hits are not affected by the settings of this field. For branches with ‘AA’ = ‘1’, the MSB of the displacement field is still used to indicate forward/backward, even though the branch is absolute.
28 [60]  —  Reserved (note 1)
29:30 [61:62]  BPRED  Branch Prediction Control (Static)
    00 - Branch predicted taken on BTB miss for all branches.
    01 - Branch predicted taken on BTB miss only for forward branches.
    10 - Branch predicted taken on BTB miss only for backward branches.
    11 - Branch predicted not taken on BTB miss for both branch directions.
    This field controls operation of static prediction mechanism on a BTB miss. Unless disabled, fetching of the predicted target location will be performed for branch acceleration. BPRED operates independently of BPEN, and with a BPEN setting of 0, will be used to perform static prediction of all unresolved branches. BTB hits are not affected by the settings of this field. For certain applications, setting BPRED to a non-default value may result in improved performance.
31 [63]  BPEN  Branch target buffer prediction enable.
    0 Branch target buffer prediction disabled
    1 Branch target buffer prediction enabled (enables BTB to predict branches)
    When the BPEN bit is cleared, no hits will be generated from the BTB, and no new entries will be allocated. Entries are not automatically invalidated when BPEN is cleared; the BBFI bit controls entry invalidation. BPEN operates independently of BPRED, and will be used even with a BPRED setting of 00.

NOTES:
1 These bits are not implemented and should be written with zero for future compatibility.

2.4.14 L1 Cache Control and Status Registers (L1CSR0, L1CSR1)

The L1CSR0 and L1CSR1 registers are used for general control and status of the L1 caches. A description of the L1CSR0 and L1CSR1 registers can be found in Chapter 11, L1 Cache.
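As an illustration of the BUCSR layout in Table 13 (Section 2.4.13), the following sketch composes a 32-bit value from the BBFI, BALLOC, BPRED, and BPEN fields. It is a minimal example under stated assumptions: the macro names and the helper `bucsr_compose` are hypothetical, and only the bit positions are taken from the table. Writing the value to the register with mtspr is left out so that the block stays portable.

```c
#include <stdint.h>

/* MSB-0 numbering: bit n of a 32-bit register sits at shift (31 - n). */
#define BUCSR_BBFI          (UINT32_C(1) << (31 - 22))  /* bit 22: flash invalidate */
#define BUCSR_BALLOC_SHIFT  (31 - 27)                   /* bits 26:27               */
#define BUCSR_BPRED_SHIFT   (31 - 30)                   /* bits 29:30               */
#define BUCSR_BPEN          (UINT32_C(1) << (31 - 31))  /* bit 31: BTB enable       */

/* Hypothetical helper: build a BUCSR value from its fields.
 * balloc and bpred are the 2-bit encodings listed in Table 13. */
static inline uint32_t bucsr_compose(unsigned balloc, unsigned bpred,
                                     int enable, int flash_invalidate)
{
    uint32_t v = 0;
    v |= (uint32_t)(balloc & 0x3u) << BUCSR_BALLOC_SHIFT;
    v |= (uint32_t)(bpred & 0x3u) << BUCSR_BPRED_SHIFT;
    if (enable)
        v |= BUCSR_BPEN;
    if (flash_invalidate)
        v |= BUCSR_BBFI;
    return v;
}
```

For example, `bucsr_compose(0, 0, 1, 1)` requests a flash invalidate and enables BTB prediction with the default allocation and static prediction settings. Table 15 lists a context synchronizing instruction as required after mtspr BUCSR.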
2.4.15 L1 Cache Configuration registers (L1CFG0, L1CFG1)

The L1CFG0 and L1CFG1 registers provide configuration information for the L1 caches supplied with this version of the Zen CPU core. A description of the L1CFG0 and L1CFG1 registers can be found in Chapter 11, L1 Cache.

2.4.16 L1 Cache Flush and Invalidate registers (L1FINV0, L1FINV1)

The L1FINV0 and L1FINV1 registers provide software-based flush and invalidation control for the L1 caches supplied with this version of the Zen CPU core. A description of the L1FINV0 and L1FINV1 registers can be found in Chapter 11, L1 Cache.

2.4.17 MMU Control and Status Register (MMUCSR0)

The MMUCSR0 register is used for general control of the MMU. A description of the MMUCSR0 register can be found in Chapter 10, Memory Management Unit.

2.4.18 MMU Configuration register (MMUCFG)

The MMUCFG register provides configuration information for the MMU supplied with this version of the Zen CPU core. A description of the MMUCFG register can be found in Chapter 10, Memory Management Unit.

2.4.19 TLB Configuration registers (TLB0CFG, TLB1CFG)

The TLB0CFG and TLB1CFG registers provide configuration information for the MMU TLBs supplied with this version of the Zen CPU core. A description of these registers can be found in Chapter 10, Memory Management Unit.

2.5 SPR register access

SPRs are accessed with the mfspr and mtspr instructions. The following sections outline additional access requirements.

2.5.1 Invalid SPR references

System behavior when an invalid SPR is referenced depends on the apparent privilege level of the register. The register privilege level is determined by bit 5 in the SPR address. If the invalid SPR is accessible in user mode, then an illegal exception is generated. If the invalid SPR is accessible only in supervisor mode and the CPU core is in supervisor mode (MSR[PR] = 0), then an illegal exception is generated. If the invalid SPR address is accessible only in supervisor mode and the CPU is not in supervisor mode (MSR[PR] = 1), then a privilege exception is generated.

NOTE
Writes to read-only SPRs and reads of write-only SPRs are treated as invalid SPR references.

Table 14. System response to invalid SPR reference

SPR address bit 5   Mode         MSR[PR]   Response
0                   -            -         Illegal exception
1                   Supervisor   0         Illegal exception
1                   User         1         Privilege exception

2.5.2 Synchronization requirements for SPRs

With the exception of the following registers, there are no synchronization requirements for accessing SPRs beyond those stated in PowerPC Book E. A complete description of synchronization requirements is contained in Chapter 11 of Book E: Enhanced PowerPC™ Architecture v0.99, beginning on page 219. Software requirements for synchronization before and after accessing these registers are shown in Table 15. The notation CSI in the table refers to a context synchronizing instruction, which includes sc, isync, rfi, rfci, and rfdi.

Table 15. Additional synchronization requirements for SPRs

Context altering event or instruction                                    Required before  Required after  Notes

mtmsr
  mtmsr[UCLE]                                                            none             CSI             -
  mtmsr[SPE]                                                             none             CSI             -
  mtmsr[PMM]                                                             none             CSI             -

mfspr
  DBCNT             Debug Counter register                               msync            none            1
  DBSR              Debug Status register                                msync            none            -
  HID0              Hardware implementation dependent reg 0              none             none            -
  HID1              Hardware implementation dependent reg 1              msync            none            -
  L1CSR0, L1CSR1    L1 cache control and status registers 0,1            msync            none            -
  L1FINV0, L1FINV1  L1 cache flush and invalidate control registers 0,1  msync            none            -
  MMUCSR            MMU control and status register 0                    CSI              none            -

mtspr
  BUCSR             Branch Unit Control and Status Register              none             CSI             -
  DBCNT             Debug Counter register                               none             CSI             1
  DBCR0-6           Debug Control Register 0-6                           none             CSI             -
  DBSR              Debug Status Register                                msync            none            -
  HID0              Hardware implementation dependent reg 0              CSI              isync           -
  HID1              Hardware implementation dependent reg 1              msync, isync     CSI             -
  L1CSR0            L1 cache control and status register 0               msync, isync     CSI             -
  L1CSR1            L1 cache control and status register 1               none             CSI             -
  L1FINV0, L1FINV1  L1 cache flush and invalidate control registers 0,1  msync            CSI             -
  MASx              MMU MAS registers                                    none             CSI             -
  MMUCSR            MMU control and status register 0                    CSI              CSI             -
  PID               PID0 register                                        none             CSI             -
  SPEFSCR           SPEFSCR register                                     none             CSI             2

Notes:
1. Not required if the counter is not currently enabled.
2. Not required for status bit clearing; required for altering exception enable or rounding mode bits.

2.5.3 Special purpose register summary

PowerPC Book E and implementation-specific SPRs for the Zen core are listed in the following table. All registers are 32 bits in size. Register bits are numbered from bit 0 to bit 31 (most-significant to least-significant). Shaded entries represent optional registers.
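The bit-numbering convention above (bit 0 is the most-significant bit) and the privilege rule of Section 2.5.1 can be illustrated with a short model. This is a minimal sketch, not part of the core documentation: the function names are hypothetical, and it assumes that "bit 5 in the SPR address" is counted MSB-first within the 10-bit SPR number, which is consistent with the Privileged column of the SPR summary table (for example SRR0 = 26 is privileged, LR = 8 is not).

```python
def msb0_mask(bit: int, width: int = 32) -> int:
    """Return the mask for a bit given in MSB-0 numbering (bit 0 = most significant)."""
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    return 1 << (width - 1 - bit)


def spr_is_supervisor_only(sprn: int) -> bool:
    """Apparent privilege level of an SPR number.

    Assumption: bit 5 is counted MSB-0 within the 10-bit SPR address.
    """
    return bool(sprn & msb0_mask(5, width=10))


def invalid_spr_response(sprn: int, msr_pr: int) -> str:
    """Response to an invalid SPR reference, following Table 14."""
    if spr_is_supervisor_only(sprn) and msr_pr == 1:
        return "privilege exception"
    return "illegal exception"
```

For instance, msb0_mask(0) selects the most-significant bit of a 32-bit register, and invalid_spr_response(26, 1) models a user-mode reference to an SPR number in the supervisor-only range.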
An SPR may be read or written with the mfspr and mtspr instructions. In the instruction syntax, compilers should recognize the mnemonic name given in the table below.

Table 16. Special purpose registers

Mnemonic  Name                                              SPR number  Access          Privileged  Zen-specific
BUCSR     Branch Unit Control and Status Register           1013        R/W             Yes         Yes
CSRR0     Critical Save/Restore Register 0                  58          R/W             Yes         No
CSRR1     Critical Save/Restore Register 1                  59          R/W             Yes         No
CTR       Count Register                                    9           R/W             No          No
DAC1      Data Address Compare 1                            316         R/W             Yes         No
DAC2      Data Address Compare 2                            317         R/W             Yes         No
DBCNT     Debug Counter register                            562         R/W             Yes         Yes
DBCR0     Debug Control Register 0                          308         R/W             Yes         No
DBCR1     Debug Control Register 1                          309         R/W             Yes         No
DBCR2     Debug Control Register 2                          310         R/W             Yes         No
DBCR3     Debug control register 3                          561         R/W             Yes         Yes
DBCR4     Debug control register 4                          563         R/W             Yes         Yes
DBCR5     Debug control register 5                          564         R/W             Yes         Yes
DBCR6     Debug control register 6                          603         R/W             Yes         Yes
DBERC0    Debug external resource control register 0        569         Read-only       Yes         Yes
DBSR      Debug Status Register                             304         Read/Clear (1)  Yes         No
DDAM      Debug Data Acquisition Messaging register         576         R/W             No          Yes
DEAR      Data Exception Address Register                   61          R/W             Yes         No
DEC       Decrementer                                       22          R/W             Yes         No
DECAR     Decrementer Auto-Reload                           54          R/W             Yes         No
DEVENT    Debug Event register                              975         R/W             No          Yes
DSRR0     Debug save/restore register 0                     574         R/W             Yes         Yes
DSRR1     Debug save/restore register 1                     575         R/W             Yes         Yes
DVC1      Data Value Compare 1                              318         R/W             Yes         No
DVC2      Data Value Compare 2                              319         R/W             Yes         No
ESR       Exception Syndrome Register                       62          R/W             Yes         No
HID0      Hardware implementation dependent reg 0           1008        R/W             Yes         Yes
HID1      Hardware implementation dependent reg 1           1009        R/W             Yes         Yes
IAC1      Instruction Address Compare 1                     312         R/W             Yes         No
IAC2      Instruction Address Compare 2                     313         R/W             Yes         No
IAC3      Instruction Address Compare 3                     314         R/W             Yes         No
IAC4      Instruction Address Compare 4                     315         R/W             Yes         No
IAC5      Instruction Address Compare 5                     565         R/W             Yes         Yes
IAC6      Instruction Address Compare 6                     566         R/W             Yes         Yes
IAC7      Instruction Address Compare 7                     567         R/W             Yes         Yes
IAC8      Instruction Address Compare 8                     568         R/W             Yes         Yes
IVOR0     Interrupt Vector Offset Register 0                400         R/W             Yes         No
IVOR1     Interrupt Vector Offset Register 1                401         R/W             Yes         No
IVOR2     Interrupt Vector Offset Register 2                402         R/W             Yes         No
IVOR3     Interrupt Vector Offset Register 3                403         R/W             Yes         No
IVOR4     Interrupt Vector Offset Register 4                404         R/W             Yes         No
IVOR5     Interrupt Vector Offset Register 5                405         R/W             Yes         No
IVOR6     Interrupt Vector Offset Register 6                406         R/W             Yes         No
IVOR7     Interrupt Vector Offset Register 7                407         R/W             Yes         No
IVOR8     Interrupt Vector Offset Register 8                408         R/W             Yes         No
IVOR9     Interrupt Vector Offset Register 9                409         R/W             Yes         No
IVOR10    Interrupt Vector Offset Register 10               410         R/W             Yes         No
IVOR11    Interrupt Vector Offset Register 11               411         R/W             Yes         No
IVOR12    Interrupt Vector Offset Register 12               412         R/W             Yes         No
IVOR13    Interrupt Vector Offset Register 13               413         R/W             Yes         No
IVOR14    Interrupt Vector Offset Register 14               414         R/W             Yes         No
IVOR15    Interrupt Vector Offset Register 15               415         R/W             Yes         No
IVOR32    Interrupt vector offset register 32               528         R/W             Yes         Yes
IVOR33    Interrupt vector offset register 33               529         R/W             Yes         Yes
IVOR34    Interrupt vector offset register 34               530         R/W             Yes         Yes
IVOR35    Interrupt vector offset register 35               531         R/W             Yes         Yes
IVPR      Interrupt Vector Prefix Register                  63          R/W             Yes         No
LR        Link Register                                     8           R/W             No          No
L1CFG0    L1 cache config register 0                        515         Read-only       No          Yes
L1CFG1    L1 cache config register 1                        516         Read-only       No          Yes
L1CSR0    L1 cache control and status register 0            1010        R/W             Yes         Yes
L1CSR1    L1 cache control and status register 1            1011        R/W             Yes         Yes
L1FINV0   L1 cache flush and invalidate control register 0  1016        R/W             Yes         Yes
L1FINV1   L1 cache flush and invalidate control register 1  959         R/W             Yes         Yes
MAS0      MMU assist register 0                             624         R/W             Yes         Yes
MAS1      MMU assist register 1                             625         R/W             Yes         Yes
MAS2      MMU assist register 2                             626         R/W             Yes         Yes
MAS3      MMU assist register 3                             627         R/W             Yes         Yes
MAS4      MMU assist register 4                             628         R/W             Yes         Yes
MAS6      MMU assist register 6                             630         R/W             Yes         Yes
MCAR      Machine Check Address Register                    573         R/W             Yes         Yes
MCSR      Machine Check Syndrome Register                   572         Read/Clear (2)  Yes         Yes
MCSRR0    Machine Check Save/Restore Register 0             570         R/W             Yes         Yes
MCSRR1    Machine Check Save/Restore Register 1             571         R/W             Yes         Yes
MMUCFG    MMU configuration register                        1015        Read-only       Yes         Yes
MMUCSR    MMU control and status register 0                 1012        R/W             Yes         Yes
PID0      Process ID Register                               48          R/W             Yes         No
PIR       Processor ID Register                             286         R/W             Yes         No
PVR       Processor Version Register                        287         Read-only       Yes         No
SPEFSCR   SPE APU status and control register               512         R/W             No          No
SPRG0     SPR General 0                                     272         R/W             Yes         No
SPRG1     SPR General 1                                     273         R/W             Yes         No
SPRG2     SPR General 2                                     274         R/W             Yes         No
SPRG3     SPR General 3                                     275         R/W             Yes         No
SPRG4     SPR General 4                                     260         Read-only       No          No
                                                            276         R/W             Yes         No
SPRG5     SPR General 5                                     261         Read-only       No          No
                                                            277         R/W             Yes         No
SPRG6     SPR General 6                                     262         Read-only       No          No
                                                            278         R/W             Yes         No
SPRG7     SPR General 7                                     263         Read-only       No          No
                                                            279         R/W             Yes         No
SPRG8     SPR General 8                                     604         R/W             Yes         Yes
SPRG9     SPR General 9                                     605         R/W             Yes         Yes
SRR0      Save/Restore Register 0                           26          R/W             Yes         No
SRR1      Save/Restore Register 1                           27          R/W             Yes         No
SVR       System Version Register                           1023        Read-only       Yes         Yes
TBL       Time Base Lower                                   268         Read-only       No          No
                                                            284         Write-only      Yes         No
TBU       Time Base Upper                                   269         Read-only       No          No
                                                            285         Write-only      Yes         No
TCR       Timer Control Register                            340         R/W             Yes         No
TLB0CFG   TLB0 configuration register                       688         Read-only       Yes         Yes
TLB1CFG   TLB1 configuration register                       689         Read-only       Yes         Yes
TSR       Timer Status Register                             336         Read/Clear (3)  Yes         No
USPRG0    User SPR General 0                                256         R/W             No          No
XER       Integer Exception Register                        1           R/W             No          No

NOTES:
1. The Debug Status Register can be read using mfspr RT,DBSR. The Debug Status Register cannot be directly written to. Instead, bits in the Debug Status Register corresponding to '1' bits in GPR(RS) can be cleared using mtspr DBSR,RS.
2. The Machine Check Syndrome Register can be read using mfspr RT,MCSR. The Machine Check Syndrome Register cannot be directly written to. Instead, bits in the Machine Check Syndrome Register corresponding to '1' bits in GPR(RS) can be cleared using mtspr MCSR,RS.
3. The Timer Status Register can be read using mfspr RT,TSR. The Timer Status Register cannot be directly written to. Instead, bits in the Timer Status Register corresponding to '1' bits in GPR(RS) can be cleared using mtspr TSR,RS.

2.6 Reset settings

Table 17 shows the state of the PowerPC Book E architected registers and other optional resources immediately following a system reset.

Table 17.
Reset settings for Zen resources

Resource             System reset setting
Program Counter      p_rstbase[0:29] || 2'b00
GPRs                 Unaffected (1)
CR                   Unaffected (1)
BUCSR                0x0000_0000
CSRR0                Unaffected (1)
CSRR1                Unaffected (1)
CTR                  Unaffected (1)
DAC1                 0x0000_0000 (2)
DAC2                 0x0000_0000 (2)
DBCNT                Unaffected (1)
DBCR0                0x0000_0000 (2)
DBCR1                0x0000_0000 (2)
DBCR2                0x0000_0000 (2)
DBCR3                0x0000_0000 (2)
DBCR4                0x0000_0000 (2)
DBCR5                0x0000_0000 (2)
DBCR6                0x0000_0000 (2)
DBSR                 0x1000_0000 (2)
DDAM                 0x0000_0000 (2)
DEAR                 Unaffected (1)
DEC                  Unaffected (1)
DECAR                Unaffected (1)
DEVENT               0x0000_0000 (2)
DSRR0                Unaffected (1)
DSRR1                Unaffected (1)
DVC1                 Unaffected (1)
DVC2                 Unaffected (1)
ESR                  0x0000_0000
HID0                 0x0000_0000
HID1                 0x0000_0000
IAC1                 0x0000_0000 (2)
IAC2                 0x0000_0000 (2)
IAC3                 0x0000_0000 (2)
IAC4                 0x0000_0000 (2)
IAC5                 0x0000_0000 (2)
IAC6                 0x0000_0000 (2)
IAC7                 0x0000_0000 (2)
IAC8                 0x0000_0000 (2)
IVORxx               Unaffected (1)
IVPR                 Unaffected (1)
LR                   Unaffected (1)
L1CFG0, L1CFG1 (3)   -
L1CSR0, 1            0x0000_0000
L1FINV0, 1           0x0000_0000
MAS0                 Unaffected (1)
MAS1                 Unaffected (1)
MAS2                 Unaffected (1)
MAS3                 Unaffected (1)
MAS4                 Unaffected (1)
MAS6                 Unaffected (1)
MCAR                 Unaffected (1)
MCSR                 0x0000_0000
MCSRR0               Unaffected (1)
MCSRR1               Unaffected (1)
MMUCFG (3)           -
MSR                  0x0000_0000
PID0                 0x0000_0000
PIR                  0x0000_00 || p_cpuid[0:7]
PVR (3)              -
SPEFSCR              0x0000_0000
SPRG0                Unaffected (1)
SPRG1                Unaffected (1)
SPRG2                Unaffected (1)
SPRG3                Unaffected (1)
SPRG4                Unaffected (1)
SPRG5                Unaffected (1)
SPRG6                Unaffected (1)
SPRG7                Unaffected (1)
SPRG8                Unaffected (1)
SPRG9                Unaffected (1)
SRR0                 Unaffected (1)
SRR1                 Unaffected (1)
SVR (3)              -
TBL                  Unaffected (1)
TBU                  Unaffected (1)
TCR                  0x0000_0000
TSR                  0x0000_0000
TLB0CFG (3)          -
TLB1CFG (3)          -
USPRG0               Unaffected (1)
XER                  0x0000_0000

NOTES:
1. Undefined on m_por assertion, unchanged on p_reset_b assertion.
2. Reset by processor reset p_reset_b if DBCR0[EDM]=0, as well as unconditionally by m_por.
3. Read-only registers.

Chapter 3 Instruction Model

This chapter provides additional information about the Book E Power Architecture as it relates specifically to e200z759n3.

The e200z759n3 is a 32-bit implementation of the Book E Power Architecture as defined in Book E: Enhanced PowerPC™ Architecture. This architecture specification includes a recognition that different processor implementations may require clarifications, extensions or deviations from the architectural descriptions. The PowerPC Book E instruction set is described in Chapter 12 "Instruction Set" of Book E: Enhanced PowerPC™ Architecture v0.99, beginning on page 223.

3.1 Unsupported instructions and instruction forms

Because e200z759n3 is a 32-bit PowerPC Book E core, all of the instructions defined for 64-bit implementations of the PowerPC Book E architecture are illegal on Zen. See Appendix A of Book E: Enhanced PowerPC™ Architecture for more information on 64-bit instructions. Zen takes a program interrupt of the illegal instruction exception type upon encountering a 64-bit PowerPC Book E instruction.

The e200z759n3 core does not support the instructions listed in Table 18. An illegal instruction exception is generated if the processor attempts to execute one of these instructions.

Table 18. List of unsupported instructions

Type / name                                  Mnemonics
String Instructions                          lswi, lswx, stswi, stswx
Floating Point Instructions                  fxxxx, lfxxx, sfxxxx, mcrfs, mffs, mtfxxx
Device control register and Move from APID   mfapidi, mfdcrx, mtdcrx

3.2 Implementation-specific instructions

Several PowerPC Book E defined instructions are implementation-specific. Table 19 summarizes the Zen implementation-specific instructions.

Table 19.
Implementation-specific instruction summary

Mnemonic                  Implementation details
mfapidi, mfdcrx, mtdcrx   Unimplemented instructions (treated as illegal on e200z759n3)
stbcx., sthcx., stwcx.    Address match with prior lbarx, lharx, or lwarx not required for store to be performed
mfdcr, mtdcr (1)          Optionally supported instructions

NOTES:
1. The Zen CPU takes an illegal instruction exception for unsupported DCR values.

3.3 Book E instruction extensions

This section describes the various extensions to Book E instructions to support the PowerPC VLE APU.

rfci, rfdi, rfi, rfmci - no longer mask bit 62 of CSRR0, DSRR0, SRR0, or MCSRR0 respectively. The destination address is [D,C,MC]SRR0[32:62] || 0b0.

bclr, bclrl, bcctr, bcctrl - no longer mask bit 62 of the LR or CTR respectively. The destination address is [LR,CTR][32:62] || 0b0.

3.4 Memory access alignment support

The Zen core provides hardware support for unaligned memory accesses; however, there is a performance degradation for accesses that cross a 64-bit (8 byte) boundary. For loads that hit in the cache, the throughput of the load/store unit is degraded to 1 misaligned load every 2 cycles. Stores that are misaligned across a 64-bit (8 byte) boundary can be translated at a rate of 2 cycles per store. Frequent use of unaligned memory accesses is discouraged because of the impact on performance.

NOTE
Accesses that cross a translation boundary may be restarted. A misaligned access that crosses a page boundary is restarted in its entirety in the event of a TLB miss of the second portion of the access. This may result in the first portion being accessed twice. Accesses that cross a translation boundary where the endianness changes cause a byte ordering DSI exception.

3.5 Memory synchronization and reservation instructions

The msync instruction provides a synchronization function and a memory barrier function.
This instruction waits for all preceding instructions and data memory accesses to complete before the msync instruction completes. Subsequent instructions in the instruction stream are not initiated until after the msync instruction completes to ensure these functions have been performed. In addition, the msync instruction and the mbar instruction with MO=0 or 1 handshake with the system to ensure that all accesses initiated by this CPU have been "performed" with respect to all other processors and mechanisms prior to completion of the instruction. Refer to Section 14.2.10, Memory synchronization control signals, for further detail on the hardware handshake sequence.

On the Zen core, the mbar instruction with MO=0 or 1 behaves similarly to the msync instruction, but only waits for previous data memory accesses rather than all previous instructions to complete before completing. The mbar instruction with MO=2 behaves similarly to the msync instruction, but only waits for previous data memory accesses rather than all previous instructions to complete before completing, and does not signal synchronizations to other processors through the synchronization port. The mbar instruction may be preferred for most memory synchronization operations because, unlike the msync instruction, it does not stall instruction execution if no load or store operations remain in the execution pipeline. The mbar instruction with the MO field not equal to 0, 1, or 2 is treated as illegal by the Zen core.

The Zen core implements the lwarx and stwcx. instructions as described in Book E, as well as the lharx, lbarx, sthcx., and stbcx. instructions defined by the EIS Enhanced Reservation APU. If the EA is not a multiple of the access size for these instructions, an alignment interrupt is invoked.
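The differences between msync and the mbar variants described earlier in this section can be summarised as a small decision table. The following is a minimal illustrative sketch only: the type and function names are hypothetical, and the three boolean fields are simply a restatement of the prose above, not a model of pipeline timing.

```python
from typing import NamedTuple


class BarrierBehaviour(NamedTuple):
    waits_for_all_instructions: bool  # all preceding instructions must complete
    waits_for_data_accesses: bool     # preceding data memory accesses must complete
    system_handshake: bool            # synchronization is signalled to the system


def barrier_behaviour(mnemonic: str, mo: int = 0) -> BarrierBehaviour:
    """Summarise msync/mbar behaviour as described in the text (illustrative only)."""
    if mnemonic == "msync":
        return BarrierBehaviour(True, True, True)
    if mnemonic == "mbar":
        if mo in (0, 1):
            return BarrierBehaviour(False, True, True)
        if mo == 2:
            return BarrierBehaviour(False, True, False)
        # Any other MO value is treated as illegal by the core.
        raise ValueError("illegal instruction: mbar MO must be 0, 1, or 2")
    raise ValueError("unknown mnemonic: " + mnemonic)
```

Read this way, mbar with MO=2 is the lightest of the three forms, since it neither waits for non-memory instructions nor signals the synchronization port.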
Zen allows reservation instructions to access a page that is marked as write-through required or cache-inhibited, and no data storage interrupt is invoked.

As allowed by PowerPC Book E, the Zen core does not require the EA of a reservation store-type instruction to fall within the same reservation granule as the EA of a preceding reservation load-type instruction in order for the store to succeed. Reservation granularity is implementation-dependent. The Zen core does not define a reservation granule explicitly; reservation granularity is defined by external logic. When no external logic is provided, the Zen core performs no address comparison checking, thus the effective implementation granularity is "null".

The Zen core implements an internal status flag (HID1[ATS]) representing reservation status. This flag is set when a load-type reservation instruction is executed and completes without error, and remains set until it is cleared by one of the following mechanisms:

1. Execution of a store-type reservation instruction is completed without error, or
2. The Zen core p_rsrv_clr input signal is asserted, or
3. The reservation is invalidated when an external input, critical input, or non-maskable interrupt is signaled and the HID0[ICR] bit is set.

When the Zen core decodes a store-type reservation instruction, it checks the value of the local reservation flag (HID1[ATS]). If the status indicates that no reservation is active, then the store-type reservation instruction is treated as a nop. No exceptions are taken and no access is performed, so no data breakpoint occurs, regardless of matching the data breakpoint attributes.

The Zen core treats reservation accesses as though they were both cache inhibited and guarded, regardless of storage attributes. A hit to a cache line corresponding to the address of a reservation access will be flushed to memory if dirty, prior to the reservation access being issued to the bus.
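The set and clear conditions of the internal reservation flag listed above can be sketched as a small state model. This is a hedged illustration under stated assumptions: the class and method names are hypothetical, errors during the load-type instruction are reduced to a single boolean, and a store-type instruction that finds the flag set is assumed to complete without error.

```python
class ReservationFlag:
    """Illustrative model of the HID1[ATS] reservation status flag."""

    def __init__(self, hid0_icr: bool = False) -> None:
        self.active = False        # models HID1[ATS]
        self.hid0_icr = hid0_icr   # models HID0[ICR]

    def load_reserve(self, completed_without_error: bool = True) -> None:
        """lbarx/lharx/lwarx: the flag is set only if the load completes without error."""
        if completed_without_error:
            self.active = True

    def store_conditional(self) -> bool:
        """stbcx./sthcx./stwcx.: return True if the store is attempted.

        With no active reservation the instruction is treated as a nop.
        Assumption: an attempted store completes without error and clears the flag.
        """
        if not self.active:
            return False
        self.active = False
        return True

    def assert_p_rsrv_clr(self) -> None:
        """Assertion of the p_rsrv_clr input clears the flag unconditionally."""
        self.active = False

    def interrupt_signaled(self) -> None:
        """External input, critical input, or NMI: clears the flag only if HID0[ICR] is set."""
        if self.hid0_icr:
            self.active = False
```

A typical sequence in this model is load_reserve() followed by store_conditional(), which returns True once and then False for any further store-type attempt until a new reservation is established.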
The access is performed externally, regardless of a cache hit. This is done to allow external reservation logic to be built that properly signals a reservation failure.

The Zen core provides the input signal p_xfail_b, which is sampled at termination of a st[b,h,w]cx. store transfer to allow an external agent or mechanism to indicate that the st[b,h,w]cx. instruction has failed to update memory, even though a reservation existed for the store at the time it was issued. This is not considered an error, and causes the condition codes for the st[b,h,w]cx. instruction to be written as if a reservation did not exist for the st[b,h,w]cx. instruction. In addition, any outstanding reservation is cleared.

The p_rsrv_clr input signal is not intended for normal use in managing reservations. It is provided for specialized system applications. The normal bus protocol is used to manage reservations using external reservation logic in systems with multiple coherent bus masters, using the transfer type and transfer response signals. In single coherent master systems, no external logic is required, and the internal reservation flag is sufficient to support multi-tasking applications.

3.6 Branch prediction

The e200z759n3 instruction fetching mechanism uses a branch target buffer (BTB) that holds branch target addresses combined with a 2-bit saturating up-down counter scheme for branch prediction. Branch paths are predicted by either the branch target buffer (BTB hit) or a selectable static prediction algorithm (BTB miss) and subsequently checked to see if the prediction was correct. This enables operation beyond a conditional branch without waiting for the branch to be decoded and resolved.
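The combination of a 2-bit saturating up-down counter on a BTB hit and a static choice on a BTB miss can be sketched as follows. This is a generic textbook model, not a description of the actual hardware: the counter state encoding (0 to 3, with 2 and 3 meaning "predict taken"), the initial state, the use of a dictionary as the BTB, and all names are assumptions made for illustration. BTB allocation and replacement are not modelled.

```python
from typing import Dict


class TwoBitCounter:
    """Generic 2-bit saturating up-down counter (states 0..3, assumed encoding)."""

    def __init__(self, state: int = 1) -> None:
        if not 0 <= state <= 3:
            raise ValueError("state must be in 0..3")
        self.state = state

    def predict_taken(self) -> bool:
        # Assumed encoding: the upper two states predict taken.
        return self.state >= 2

    def update(self, taken: bool) -> None:
        """Count up on a taken branch, down on a not-taken branch, saturating at 0 and 3."""
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def predict_taken(btb: Dict[int, TwoBitCounter], fetch_addr: int, static_taken: bool) -> bool:
    """Use the counter on a BTB hit, otherwise fall back to the static prediction."""
    counter = btb.get(fetch_addr)
    if counter is not None:
        return counter.predict_taken()
    return static_taken
```

The saturating behaviour means a single mispredicted iteration of a frequently taken branch does not immediately flip the prediction, which is the usual motivation for using two bits rather than one.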
The instruction fetch unit predicts the direction of the branch as follows:

• Predict taken for any backward branch whose fetch address hits in the BTB and is predicted taken by the counter, or misses in the BTB and static prediction control in BUCSR for backward branches indicates "predict taken". Otherwise predict not-taken.
• Predict taken for any forward branch whose fetch address hits in the BTB and is predicted taken by the counter, or misses in the BTB and static prediction control in BUCSR for forward branches indicates "predict taken". Otherwise predict not-taken.

3.7 Interruption of instructions by interrupt requests

In general, the e200z759n3 core samples pending non-maskable interrupts, external input, and critical input interrupt requests at instruction boundaries. However, in order to reduce interrupt latency, long running instructions may be interrupted prior to completion. Instructions in this class include divides (divw[uo][.], efsdiv, evfsdiv, evdivw[su]), load multiple word (lmw, e_lmw), and store multiple word (stmw, e_stmw). In addition, the e_lmvgprw, e_stmvgprw, e_lmvsprw, and e_stmvsprw Volatile Context Save/Restore APU instructions may also be interrupted prior to completion. When interrupted prior to completion, the value saved in SRR0/CSRR0/MCSRR0 will be the address of the interrupted instruction. The instruction will be restarted from the beginning after returning to it from the interrupt handler.
3.8 New Zen instructions and APUs

The e200z759n3 core implements the following Freescale EIS APUs that extend the PowerPC Book E instruction set:

• The ISEL APU, which is described in Section 3.9, ISEL APU
• The Enhanced Debug APU and the Debug Notify Halt instructions, described in Section 3.10, Debug APU
• The Machine Check APU, which is described in Section 3.11, Machine Check APU
• The WAIT APU, which is described in Section 3.12, WAIT APU
• The Enhanced Reservations APU, which is described in Section 3.13, Enhanced reservations APU
• The Volatile Context Save/Restore APU, which is described in Section 3.14, Volatile Context Save/Restore APU
• The Embedded Floating-Point APU version 2, described along with supporting instructions in Chapter 5, Embedded Floating-Point APU (EFPU2)
• The Signal Processing Extension (SPE) APU version 1, described along with supporting instructions in Chapter 6, Signal Processing Extension APU (SPE APU)
• The Performance Monitor APU, which is described in Chapter 8, Performance Monitor
• The Cache Line-locking APU, which is described in Section 11.12, Cache line locking/unlocking APU

3.9 ISEL APU

The ISEL APU defines the isel instruction, which provides a means to select one of two registers and place the result in a destination register under the control of a predicate value supplied by a bit in the condition register. This instruction can be used to eliminate branches in software and in many cases improve performance. This instruction can also increase program execution time determinism by eliminating the need to predict the target and direction of the branches replaced by the integer select function.
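The select behaviour of isel can be modelled in a few lines. This is an illustrative sketch only: the function name and argument order are hypothetical, the register file is modelled as a Python list, and the condition register is assumed to be a 32-bit value whose bit 0 is the most-significant bit, with crb in the range 0 to 31. As in the instruction definition, an RA field of 0 selects the value zero rather than the contents of GPR0.

```python
from typing import List


def isel(gpr: List[int], cr: int, rt: int, ra: int, rb: int, crb: int) -> None:
    """Model of isel RT, RA, RB, crb operating on a list of 32 GPR values."""
    if not 0 <= crb <= 31:
        raise ValueError("crb must be in 0..31")
    a = 0 if ra == 0 else gpr[ra]   # RA|0: the value zero is used when RA = 0
    c = (cr >> (31 - crb)) & 1      # CR bit crb, assuming MSB-0 numbering
    gpr[rt] = a if c else gpr[rb]   # select without any branch
```

Because the result depends only on a condition register bit, a compare followed by isel can replace a compare-and-branch sequence such as a minimum or maximum selection.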
The instruction form and definition are as follows:

isel - Integer Select

isel RT, RA, RB, crb

  Bits    0:5   6:10   11:15   16:20   21:25   26:30   31
  Value   31    RT     RA      RB      crb     01111   0

  if RA = 0 then a ← 0x0000_0000 else a ← GPR(RA)
  c ← CR[crb]
  if c then GPR(RT) ← a
  else GPR(RT) ← GPR(RB)

For isel, if the bit of the CR specified by (crb) is set, the contents of RA|0 are copied into RT. If the bit of the CR specified by (crb) is clear, the contents of RB are copied into RT.

Other registers altered:
• None

3.10 Debug APU

e200z759n3 implements the Freescale EIS Debug APU to support the capability to handle the Debug interrupt as an additional interrupt level. To support this interrupt level, a new 'return from debug interrupt' (rfdi, se_rfdi) instruction is defined as part of the Debug APU, along with a new pair of save/restore registers, DSRR0 and DSRR1.

When the Debug APU is enabled (HID0[DAPUEN] = 1), the rfdi or se_rfdi instruction provides a means to return from a debug interrupt. See Section 2.4.11, Hardware Implementation Dependent Register 0 (HID0) for more information about enabling the Debug APU. The instruction form and definition are as follows:

rfdi - Return From Debug Interrupt

rfdi

  Bits    0:5   6:20   21:30        31
  Value   19    ///    0000100111   0

  MSR ← DSRR1
  PC ← DSRR0[0:30] || 1'b0

The rfdi instruction is used to return from a Debug interrupt, or as a means of simultaneously establishing a new context and synchronizing on that new context.

The contents of Debug Save/Restore Register 1 are placed into the Machine State Register. If the new Machine State Register value does not enable any pending exceptions, then the next instruction is fetched, under control of the new Machine State Register value, from the address DSRR0[0:30] || 1'b0.
If the new Machine State Register value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into Save/Restore Register 0 or Critical Save/Restore Register 0 by the interrupt processing mechanism is the address of the instruction that would have been executed next had the interrupt not occurred (i.e. the address in Debug Save/Restore Register 0 at the time of the execution of the rfdi).

Execution of this instruction is privileged and context synchronizing.

Special Registers Altered:
• MSR

When the Debug APU is disabled (HID0[DAPUEN] = 0), this instruction is treated as an illegal instruction.

se_rfdi - Return From Debug Interrupt

se_rfdi

  Bits    0:15
  Value   0000 0000 0000 1010

  MSR ← DSRR1
  PC ← DSRR0[32:62] || 0b0

The rfdi or se_rfdi instruction is used to return from a Debug interrupt, or as a means of simultaneously establishing a new context and synchronizing on that new context.

The contents of Debug Save/Restore Register 1 are placed into the Machine State Register. If the new Machine State Register value does not enable any pending exceptions, then the next instruction is fetched, under control of the new Machine State Register value, from the address DSRR0[32:62] || 0b0. If the new Machine State Register value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into Save/Restore Register 0 or Critical Save/Restore Register 0 by the interrupt processing mechanism is the address of the instruction that would have been executed next had the interrupt not occurred (i.e. the address in Debug Save/Restore Register 0 at the time of the execution of the rfdi or se_rfdi).

Execution of this instruction is privileged and context synchronizing.
Special Registers Altered:
• MSR

When the Debug APU is disabled (HID0[DAPUEN] = 0), this instruction is treated as an illegal instruction.

3.10.1 Debug notify halt instructions

The dnh, e_dnh, and se_dnh instructions provide a bridge between the execution of instructions on the core in a non-halted mode, and an external debug facility. dnh, e_dnh, and se_dnh allow software to transition the core from a running state to a debug halted state if enabled by an external debugger, and dnh provides the external debugger with bits reserved in the instruction itself to pass additional information. For e200z759n3, when the CPU enters a debug halted state due to a dnh, e_dnh, or se_dnh instruction, the instruction is stored in the CPUSCR[IR] portion, and the CPUSCR[PC] value points to the instruction. The external debugger should update the CPUSCR prior to exiting the debug halted state to point past the dnh, e_dnh, or se_dnh instruction.

Note that the dnh instruction is only available in Book E instruction pages, and the e_dnh and se_dnh instructions are only available in VLE instruction pages.

dnh - Debugger Notify Halt

dnh dui, duis

  Bits    0:5      6:10   11:20   21:30        31
  Value   010011   dui    duis    0011000110   /

  if EDBCR[DNH_EN] = 1 then
      implementation dependent register ← dui
      halt processor
  else
      illegal instruction exception

Execution of the dnh instruction causes the processor to halt if the external debug facility has enabled such action by previously setting the EDBCR[DNH_EN] bit. If the processor is halted, the contents of the dui field are provided to the external debug facility to identify the reason for the halt.

If EDBCR[DNH_EN] has not been previously set by the external debug facility, executing the dnh instruction produces an illegal instruction exception.
The duis field is provided to pass additional information about the halt, but requires that actions be performed by the external debug facility to access the dnh instruction to read the contents of the field.

The dnh instruction is not privileged, and executes the same regardless of the state of MSR[PR].

The current state of the processor debug facility, whether the processor is in IDM or EDM mode, has no effect on the execution of the dnh instruction.

Other registers altered:
• None.

Software Note: After the dnh instruction has executed, the instruction itself can be read back by the Illegal Instruction Interrupt handler or the external debug facility if the contents of the dui and duis fields are of interest. If the processor entered the Illegal Instruction Interrupt handler, software can use SRR0 to obtain the address of the dnh instruction that caused the handler to be invoked. If the processor is halted in debug mode, the external debug facility can access the CPUSCR register to obtain the dnh instruction that caused the processor to halt.

e_dnh - Debugger Notify Halt

e_dnh dui, duis

  Bits    0:5      6:10   11:20   21:30        31
  Value   011111   dui    duis    0001100001   /

  if EDBCR[DNH_EN] = 1 then
      implementation dependent register ← dui
      halt processor
  else
      illegal instruction exception

Execution of the e_dnh instruction causes the processor to halt if the external debug facility has enabled such action by previously setting the EDBCR[DNH_EN] bit. If the processor is halted, the contents of the dui field are provided to the external debug facility to identify the reason for the halt.

If EDBCR[DNH_EN] has not been previously set by the external debug facility, executing the e_dnh instruction produces an illegal instruction exception.

The duis field is provided to pass additional information about the halt, but requires that actions be performed by the external debug facility to access the e_dnh instruction to read the contents of the field.
The e_dnh instruction is not privileged, and executes the same regardless of the state of MSR[PR].

The current state of the processor debug facility, whether the processor is in IDM or EDM mode, has no effect on the execution of the e_dnh instruction.

Other registers altered:
• None.

se_dnh - Debugger Notify Halt

se_dnh

  Bits    0:15
  Value   0000 0000 0000 1111

  if EDBCR[DNH_EN] = 1 then
      halt processor
  else
      illegal instruction exception

Execution of the se_dnh instruction causes the processor to halt if the external debug facility has enabled such action by previously setting the EDBCR[DNH_EN] bit.

If EDBCR[DNH_EN] has not been previously set by the external debug facility, executing the se_dnh instruction produces an illegal instruction exception.

The se_dnh instruction is not privileged, and executes the same regardless of the state of MSR[PR].

The current state of the processor debug facility, whether the processor is in IDM or EDM mode, has no effect on the execution of the se_dnh instruction.

Other registers altered:
• None.

3.11 Machine Check APU

e200z759n3 implements the Freescale EIS Machine Check APU to support the capability to handle the Machine Check interrupt as an additional interrupt level. To support this interrupt level, a new 'return from Machine Check interrupt' (rfmci, se_rfmci) instruction is defined as part of the Machine Check APU, along with a new pair of save/restore registers, MCSRR0 and MCSRR1, a machine check syndrome register MCSR, and a machine check address register MCAR.

The rfmci and se_rfmci instructions provide a means to return from a Machine Check interrupt. The instruction forms and definitions are as follows:

rfmci - Return From Machine Check Interrupt

rfmci

  Bits    0:5   6:20   21:30        31
  Value   19    ///    0000100110   0

  MSR ← MCSRR1
  PC ← MCSRR0[0:30] || 1'b0

The rfmci instruction is used to return from a Machine Check interrupt, or as a means of simultaneously establishing a new context and synchronizing on that new context.

The contents of Machine Check Save/Restore Register 1 are placed into the Machine State Register. If the new Machine State Register value does not enable any pending exceptions, then the next instruction is fetched, under control of the new Machine State Register value, from the address MCSRR0[0:30] || 1'b0. If the new Machine State Register value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into the appropriate Save/Restore Register 0 by the interrupt processing mechanism is the address of the instruction that would have been executed next had the interrupt not occurred (i.e. the address in Machine Check Save/Restore Register 0 at the time of the execution of the rfmci).

Execution of this instruction is privileged and context synchronizing.

Special Registers Altered:
• MSR

NOTE
This instruction is only available in Book E instruction pages; it is not available in VLE instruction pages.

se_rfmci - Return From Machine Check Interrupt

se_rfmci

  Bits    0:15
  Value   0000 0000 0000 1011

  MSR ← MCSRR1
  PC ← MCSRR0[0:30] || 1'b0

The se_rfmci instruction is used to return from a Machine Check interrupt, or as a means of simultaneously establishing a new context and synchronizing on that new context.

The contents of Machine Check Save/Restore Register 1 are placed into the Machine State Register.
If the new Machine State Register value does not enable any pending exceptions, then the next instruction is fetched, under control of the new Machine State Register value, from the address MCSRR0[0:30] || 1’b0. If the new Machine State Register value enables one or more pending exceptions, the interrupt associated with the highest priority pending exception is generated; in this case the value placed into the appropriate Save/Restore Register 0 by the interrupt processing mechanism is the address of the instruction that would have been executed next had the interrupt not occurred (i.e. the address in Machine Check Save/Restore Register 0 at the time of the execution of the se_rfmci).

Execution of this instruction is privileged and context synchronizing.

Special Registers Altered:
• MSR

NOTE
This instruction is only available in VLE instruction pages; it is not available in Book E instruction pages.

3.12 WAIT APU

The wait instruction allows software to cease all synchronous activity, waiting for an asynchronous interrupt or debug interrupt to occur. The instruction can be used to cease processor activity in both user and supervisor modes. The asynchronous interrupts that cause the waiting state to be exited, if enabled, are critical input, external input, and machine check pin (p_mcp_b). Non-maskable interrupts (p_nmi_b) also cause the waiting state to be exited.

wait
Wait for Interrupt

wait

Encoding: 0:5 = 011111, 6:20 = ///, 21:30 = 0000111110, 31 = /

The wait instruction provides an ordering function for the effects of all instructions executed by the processor executing the wait instruction and stops synchronous processor activity. Executing a wait instruction ensures that all instructions have completed before the wait instruction completes, causes processor instruction fetching to cease, and ensures that no subsequent instructions are initiated until an asynchronous interrupt or a debug interrupt occurs.

Once the wait instruction has completed, the program counter will point to the next sequential instruction. The saved value in xSRR0 when the processor re-initiates activity will point to the instruction following the wait instruction.

Execution of a wait instruction places the CPU in the “waiting” state, which is indicated by assertion of the p_waiting output signal. The signal will be negated after leaving the “waiting” state.

Software must ensure that interrupts responsible for exiting the waiting state are enabled before executing a wait instruction.

NOTE
The wait instruction can be used in verification test cases to signal the end of a test case. The encoding for the instruction is the same in both big-endian and little-endian modes.

3.13 Enhanced reservations APU

Zen implements the EIS enhanced reservations APU, which extends the load and reserve and store conditional instructions to support byte and halfword data types. These instructions operate in the same manner as the lwarx and stwcx. instructions, except for the size of the access.

lbarx
Load Byte And Reserve Indexed

lbarx RT,RA,RB (X-mode)

Encoding: 0:5 = 011111, 6:10 = RT, 11:15 = RA, 16:20 = RB, 21:30 = 0000110100, 31 = /

if RA=0 then a ← ⁶⁴0 else a ← GPR(RA)
if X-mode then EA ← ³²0 || (a + GPR(RB))[32:63]
RESERVE ← 1
RESERVE_ADDR ← real_addr(EA)
GPR(RT) ← ⁵⁶0 || MEM(EA,1)

Let the effective address (EA) be calculated as follows:
• For lbarx, let EA be 32 0s concatenated with bits 32:63 of the sum of the contents of GPR(RA), or 64 0s if RA=0, and the contents of GPR(RB).

The byte in storage addressed by EA is loaded into GPR(RT)[56:63]. GPR(RT)[0:55] are set to 0.

This instruction creates a reservation for use by a Store Byte Conditional instruction. An address computed from the EA is associated with the reservation and replaces any address previously associated with the reservation.

Special Registers Altered:
• None

lharx
Load Halfword And Reserve Indexed

lharx RT,RA,RB (X-mode)

Encoding: 0:5 = 011111, 6:10 = RT, 11:15 = RA, 16:20 = RB, 21:30 = 0001110100, 31 = /

if RA=0 then a ← ⁶⁴0 else a ← GPR(RA)
EA ← ³²0 || (a + GPR(RB))[32:63]
RESERVE ← 1
RESERVE_ADDR ← real_addr(EA)
GPR(RT) ← ⁴⁸0 || MEM(EA,2)

Let the effective address (EA) be calculated as follows:
• For lharx, let EA be 32 0s concatenated with bits 32:63 of the sum of the contents of GPR(RA), or 64 0s if RA=0, and the contents of GPR(RB).

The halfword in storage addressed by EA is loaded into GPR(RT)[48:63]. GPR(RT)[0:47] are set to 0.

This instruction creates a reservation for use by a Store Halfword Conditional instruction. An address computed from the EA is associated with the reservation and replaces any address previously associated with the reservation.

EA must be a multiple of 2. If it is not, either an Alignment interrupt is invoked or the results are boundedly undefined.

Special Registers Altered:
• None

stbcx.
Store Byte Conditional Indexed

stbcx. RS,RA,RB (X-mode)

Encoding: 0:5 = 011111, 6:10 = RS, 11:15 = RA, 16:20 = RB, 21:30 = 1010110110, 31 = 1

if RA=0 then a ← ⁶⁴0 else a ← GPR(RA)
EA ← ³²0 || (a + GPR(RB))[32:63]
if RESERVE then
    if RESERVE_ADDR = real_addr(EA) then
        MEM(EA,1) ← GPR(RS)[56:63]
        CR0 ← 0b00 || 0b1 || XER[SO]
    else
        u ← undefined 1-bit value
        if u then MEM(EA,1) ← GPR(RS)[56:63]
        CR0 ← 0b00 || u || XER[SO]
    RESERVE ← 0
else
    CR0 ← 0b00 || 0b0 || XER[SO]

Let the effective address (EA) be calculated as follows:
• For stbcx., let EA be 32 0s concatenated with bits 32:63 of the sum of the contents of GPR(RA), or 64 0s if RA=0, and the contents of GPR(RB).

If a reservation exists and the storage address specified by the stbcx. is the same as that specified by the lbarx instruction that established the reservation, the contents of bits 56:63 of GPR(RS) are stored into the byte in storage addressed by EA and the reservation is cleared.
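The interaction between RESERVE, RESERVE_ADDR and CR0 in the RTL for lbarx and stbcx. can be exercised with a small software model. This is only an illustrative sketch under stated assumptions: the structure and function names are hypothetical, real_addr(EA) is modelled as the identity, memory is a small byte array, and in the address-mismatch case (where the architecture leaves the outcome undefined) the model arbitrarily chooses u = 0, that is, no store.

```c
#include <stdbool.h>
#include <stdint.h>

#define MEM_SIZE 64u

/* Hypothetical single-core model state. */
typedef struct {
    uint8_t  mem[MEM_SIZE];
    bool     reserve;       /* RESERVE       */
    uint32_t reserve_addr;  /* RESERVE_ADDR  */
} core_model;

/* lbarx: establish a reservation and return the byte zero-extended. */
static uint32_t model_lbarx(core_model *c, uint32_t ea)
{
    c->reserve = true;
    c->reserve_addr = ea;            /* real_addr(EA) modelled as EA */
    return c->mem[ea % MEM_SIZE];
}

/* stbcx.: returns the value placed in CR0[EQ], 1 if the store was performed. */
static int model_stbcx(core_model *c, uint32_t ea, uint32_t rs)
{
    int performed = 0;
    if (c->reserve) {
        if (c->reserve_addr == ea) {
            c->mem[ea % MEM_SIZE] = (uint8_t)(rs & 0xFFu); /* GPR(RS)[56:63] */
            performed = 1;
        }
        /* Mismatch: undefined whether the store happens; this model
         * picks u = 0, one of the permitted outcomes. */
        c->reserve = false;          /* reservation is cleared either way */
    }
    return performed;
}
```

A second stbcx. issued without an intervening lbarx returns 0 in this model, which mirrors the "reservation does not exist" branch of the RTL.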
If a reservation exists but the storage address specified by the stbcx. is not the same as that specified by the Load and Reserve instruction that established the reservation, the reservation is cleared, and it is undefined whether the instruction completes without altering storage.

If a reservation does not exist, the instruction completes without altering storage.

CR Field 0 is set to reflect whether the store operation was performed, as follows.

CR0[LT GT EQ SO] = 0b00 || store_performed || XER[SO]

Special Registers Altered:
• CR0

sthcx.
Store Halfword Conditional Indexed

sthcx. RS,RA,RB (X-mode)

Encoding: 0:5 = 011111, 6:10 = RS, 11:15 = RA, 16:20 = RB, 21:30 = 1011010110, 31 = 1

if RA=0 then a ← ⁶⁴0 else a ← GPR(RA)
EA ← ³²0 || (a + GPR(RB))[32:63]
if RESERVE then
    if RESERVE_ADDR = real_addr(EA) then
        MEM(EA,2) ← GPR(RS)[48:63]
        CR0 ← 0b00 || 0b1 || XER[SO]
    else
        u ← undefined 1-bit value
        if u then MEM(EA,2) ← GPR(RS)[48:63]
        CR0 ← 0b00 || u || XER[SO]
    RESERVE ← 0
else
    CR0 ← 0b00 || 0b0 || XER[SO]

Let the effective address (EA) be calculated as follows:
• For sthcx., let EA be 32 0s concatenated with bits 32:63 of the sum of the contents of GPR(RA), or 64 0s if RA=0, and the contents of GPR(RB).

If a reservation exists and the storage address specified by the sthcx. is the same as that specified by the lharx instruction that established the reservation, the contents of bits 48:63 of GPR(RS) are stored into the halfword in storage addressed by EA and the reservation is cleared.

If a reservation exists but the storage address specified by the sthcx. is not the same as that specified by the Load and Reserve instruction that established the reservation, the reservation is cleared, and it is undefined whether the instruction completes without altering storage.

If a reservation does not exist, the instruction completes without altering storage.

CR Field 0 is set to reflect whether the store operation was performed, as follows.
CR0[LT GT EQ SO] = 0b00 || store_performed || XER[SO]

EA must be a multiple of 2. If it is not, either an Alignment interrupt is invoked or the results are boundedly undefined.

Special Registers Altered:
• CR0

3.14 Volatile Context Save/Restore APU

Zen implements the EIS Volatile Context Save/Restore APU to support the capability to quickly save and restore volatile register context on entry into an interrupt handler. To support this functionality, a new set of instructions is defined as part of the APU.
• e_lmvgprw, e_stmvgprw: load/store multiple volatile gprs (r0, r3:r12)
• e_lmvsprw, e_stmvsprw: load/store multiple volatile sprs (CR, LR, CTR, and XER)
• e_lmvsrrw, e_stmvsrrw: load/store multiple volatile srrs (SRR0, SRR1)
• e_lmvcsrrw, e_stmvcsrrw: load/store multiple volatile csrrs (CSRR0, CSRR1)
• e_lmvdsrrw, e_stmvdsrrw: load/store multiple volatile dsrrs (DSRR0, DSRR1)
• e_lmvmcsrrw, e_stmvmcsrrw: load/store multiple volatile mcsrrs (MCSRR0, MCSRR1)

These instructions are available in VLE instruction pages to perform a multiple register load or store to a word aligned memory address.

e_lmvgprw
Load Multiple Volatile GPR Word

e_lmvgprw D8(RA) (D8-mode)

Encoding: 0:5 = 000110, 6:10 = 00000, 11:15 = RA, 16:23 = 00010000, 24:31 = D8

if RA=0 then EA ← EXTS(D8) else EA ← (GPR(RA)+EXTS(D8))
GPR(r0)[32:63] ← MEM(EA,4)
EA ← (EA+4)
r ← 3
do while r ≤ 12
    GPR(r)[32:63] ← MEM(EA,4)
    EA ← (EA+4)
    r ← r + 1

Let the effective address (EA) be the sum of the contents of GPR(RA), or 0 if RA=0, and the sign-extended value of the D8 instruction field.

Bits 32:63 of registers GPR(r0) and GPR(r3) through GPR(r12) are loaded from 11 consecutive words in storage starting at address EA.

EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly undefined.
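The effective address calculation, the alignment requirement, and the register order of e_lmvgprw can be followed in a short software model. The sketch below is hypothetical throughout (names, memory size, and the word-addressed backing array are assumptions), and it reports a misaligned EA with an error return, which stands in for only one of the outcomes the architecture permits.

```c
#include <stdint.h>

#define MEM_WORDS 64u

/* Hypothetical model state: 32 GPRs and a small word-addressed memory
 * in which byte address A maps to mem[A / 4]. */
typedef struct {
    uint32_t gpr[32];
    uint32_t mem[MEM_WORDS];
} vctx_model;

/* Model of e_lmvgprw D8(RA).
 * Returns 0 on success, -1 if EA is not a multiple of 4 or the
 * 11-word frame falls outside the model's memory. */
static int model_e_lmvgprw(vctx_model *m, unsigned ra, int8_t d8)
{
    uint32_t base = (ra == 0u) ? 0u : m->gpr[ra];
    uint32_t ea = base + (uint32_t)(int32_t)d8;      /* EXTS(D8) */

    if ((ea & 3u) != 0u)
        return -1;                                   /* misaligned EA */
    if (ea / 4u + 11u > MEM_WORDS)
        return -1;                                   /* outside model memory */

    uint32_t idx = ea / 4u;
    m->gpr[0] = m->mem[idx++];                       /* r0 comes first */
    for (unsigned r = 3u; r <= 12u; r++)             /* then r3 through r12 */
        m->gpr[r] = m->mem[idx++];
    return 0;
}
```

Because the base value is read before any register is written, the model behaves the same when RA itself lies in the r3:r12 range; whether the hardware does so is not stated in this section.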
Special Registers Altered:
• None

e_stmvgprw
Store Multiple Volatile GPR Word

e_stmvgprw D8(RA) (D8-mode)

Encoding: 0:5 = 000110, 6:10 = 00000, 11:15 = RA, 16:23 = 00010001, 24:31 = D8

if RA=0 then EA ← EXTS(D8) else EA ← (GPR(RA)+EXTS(D8))
MEM(EA,4) ← GPR(r0)[32:63]
EA ← (EA+4)
r ← 3
do while r ≤ 12
    MEM(EA,4) ← GPR(r)[32:63]
    r ← r + 1
    EA ← (EA+4)

Let the effective address (EA) be the sum of the contents of GPR(RA), or 0 if RA=0, and the sign-extended value of the D8 instruction field.

Bits 32:63 of registers GPR(r0) and GPR(r3) through GPR(r12) are stored in 11 consecutive words in storage starting at address EA.

EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly undefined.

Special Registers Altered:
• None

e_lmvsprw
Load Multiple Volatile SPR Word

e_lmvsprw D8(RA) (D8-mode)

Encoding: 0:5 = 000110, 6:10 = 00001, 11:15 = RA, 16:23 = 00010000, 24:31 = D8

if RA=0 then EA ← EXTS(D8) else EA ← (GPR(RA)+EXTS(D8))
CR[32:63] ← MEM(EA,4)
EA ← (EA+4)
LR[32:63] ← MEM(EA,4)
EA ← (EA+4)
CTR[32:63] ← MEM(EA,4)
EA ← (EA+4)
XER[32:63] ← MEM(EA,4)

Let the effective address (EA) be the sum of the contents of GPR(RA), or 0 if RA=0, and the sign-extended value of the D8 instruction field.

Bits 32:63 of registers CR, LR, CTR, and XER are loaded from 4 consecutive words in storage starting at address EA.

EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly undefined.

Special Registers Altered:
• CR
• LR
• CTR
• XER

e_stmvsprw
Store Multiple Volatile SPR Word

e_stmvsprw D8(RA) (D8-mode)

Encoding: 0:5 = 000110, 6:10 = 00001, 11:15 = RA, 16:23 = 00010001, 24:31 = D8

if RA=0 then EA ← EXTS(D8) else EA ← (GPR(RA)+EXTS(D8))
MEM(EA,4) ← CR[32:63]
EA ← (EA+4)
MEM(EA,4) ← LR[32:63]
EA ← (EA+4)
MEM(EA,4) ← CTR[32:63]
EA ← (EA+4)
MEM(EA,4) ← XER[32:63]

Let the effective address (EA) be the sum of the contents of GPR(RA), or 0 if RA=0, and the sign-extended value of the D8 instruction field.

Bits 32:63 of registers CR, LR, CTR, and XER are stored in 4 consecutive words in storage starting at address EA.

EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly undefined.

Special Registers Altered:
• None

e_lmvsrrw
Load Multiple Volatile SRR Word

e_lmvsrrw D8(RA) (D8-mode)

Encoding: 0:5 = 000110, 6:10 = 00100, 11:15 = RA, 16:23 = 00010000, 24:31 = D8

if RA=0 then EA ← EXTS(D8) else EA ← (GPR(RA)+EXTS(D8))
SRR0[32:63] ← MEM(EA,4)
EA ← (EA+4)
SRR1[32:63] ← MEM(EA,4)

Let the effective address (EA) be the sum of the contents of GPR(RA), or 0 if RA=0, and the sign-extended value of the D8 instruction field.

Bits 32:63 of registers SRR0 and SRR1 are loaded from consecutive words in storage starting at address EA.

EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly undefined.

Special Registers Altered:
• SRR0
• SRR1

e_stmvsrrw
Store Multiple Volatile SRR Word

e_stmvsrrw D8(RA) (D8-mode)

Encoding: 0:5 = 000110, 6:10 = 00100, 11:15 = RA, 16:23 = 00010001, 24:31 = D8

if RA=0 then EA ← EXTS(D8) else EA ← (GPR(RA)+EXTS(D8))
MEM(EA,4) ← SRR0[32:63]
EA ← (EA+4)
MEM(EA,4) ← SRR1[32:63]

Let the effective address (EA) be the sum of the contents of GPR(RA), or 0 if RA=0, and the sign-extended value of the D8 instruction field.

Bits 32:63 of registers SRR0 and SRR1 are stored in consecutive words in storage starting at address EA.

EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly undefined.
Special Registers Altered:
• None

e_lmvcsrrw
Load Multiple Volatile CSRR Word

e_lmvcsrrw D8(RA) (D8-mode)

Encoding: 0:5 = 000110, 6:10 = 00101, 11:15 = RA, 16:23 = 00010000, 24:31 = D8

if RA=0 then EA ← EXTS(D8) else EA ← (GPR(RA)+EXTS(D8))
CSRR0[32:63] ← MEM(EA,4)
EA ← (EA+4)
CSRR1[32:63] ← MEM(EA,4)

Let the effective address (EA) be the sum of the contents of GPR(RA), or 0 if RA=0, and the sign-extended value of the D8 instruction field.

Bits 32:63 of registers CSRR0 and CSRR1 are loaded from consecutive words in storage starting at address EA.

EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly undefined.

Special Registers Altered:
• CSRR0
• CSRR1

e_stmvcsrrw
Store Multiple Volatile CSRR Word

e_stmvcsrrw D8(RA) (D8-mode)

Encoding: 0:5 = 000110, 6:10 = 00101, 11:15 = RA, 16:23 = 00010001, 24:31 = D8

if RA=0 then EA ← EXTS(D8) else EA ← (GPR(RA)+EXTS(D8))
MEM(EA,4) ← CSRR0[32:63]
EA ← (EA+4)
MEM(EA,4) ← CSRR1[32:63]

Let the effective address (EA) be the sum of the contents of GPR(RA), or 0 if RA=0, and the sign-extended value of the D8 instruction field.

Bits 32:63 of registers CSRR0 and CSRR1 are stored in consecutive words in storage starting at address EA.

EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly undefined.

Special Registers Altered:
• None

e_lmvdsrrw
Load Multiple Volatile DSRR Word

e_lmvdsrrw D8(RA) (D8-mode)

Encoding: 0:5 = 000110, 6:10 = 00110, 11:15 = RA, 16:23 = 00010000, 24:31 = D8

if RA=0 then EA ← EXTS(D8) else EA ← (GPR(RA)+EXTS(D8))
DSRR0[32:63] ← MEM(EA,4)
EA ← (EA+4)
DSRR1[32:63] ← MEM(EA,4)

Let the effective address (EA) be the sum of the contents of GPR(RA), or 0 if RA=0, and the sign-extended value of the D8 instruction field.

Bits 32:63 of registers DSRR0 and DSRR1 are loaded from consecutive words in storage starting at address EA.
EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly undefined.

Special Registers Altered:
• DSRR0
• DSRR1

e_stmvdsrrw
Store Multiple Volatile DSRR Word

e_stmvdsrrw D8(RA) (D8-mode)

Encoding: 0:5 = 000110, 6:10 = 00110, 11:15 = RA, 16:23 = 00010001, 24:31 = D8

if RA=0 then EA ← EXTS(D8) else EA ← (GPR(RA)+EXTS(D8))
MEM(EA,4) ← DSRR0[32:63]
EA ← (EA+4)
MEM(EA,4) ← DSRR1[32:63]

Let the effective address (EA) be the sum of the contents of GPR(RA), or 0 if RA=0, and the sign-extended value of the D8 instruction field.

Bits 32:63 of registers DSRR0 and DSRR1 are stored in consecutive words in storage starting at address EA.

EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly undefined.

Special Registers Altered:
• None

e_lmvmcsrrw
Load Multiple Volatile MCSRR Word

e_lmvmcsrrw D8(RA) (D8-mode)

Encoding: 0:5 = 000110, 6:10 = 00111, 11:15 = RA, 16:23 = 00010000, 24:31 = D8

if RA=0 then EA ← EXTS(D8) else EA ← (GPR(RA)+EXTS(D8))
MCSRR0[32:63] ← MEM(EA,4)
EA ← (EA+4)
MCSRR1[32:63] ← MEM(EA,4)

Let the effective address (EA) be the sum of the contents of GPR(RA), or 0 if RA=0, and the sign-extended value of the D8 instruction field.

Bits 32:63 of registers MCSRR0 and MCSRR1 are loaded from consecutive words in storage starting at address EA.

EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly undefined.

Special Registers Altered:
• MCSRR0
• MCSRR1

e_stmvmcsrrw
Store Multiple Volatile MCSRR Word

e_stmvmcsrrw D8(RA) (D8-mode)

Encoding: 0:5 = 000110, 6:10 = 00111, 11:15 = RA, 16:23 = 00010001, 24:31 = D8

if RA=0 then EA ← EXTS(D8) else EA ← (GPR(RA)+EXTS(D8))
MEM(EA,4) ← MCSRR0[32:63]
EA ← (EA+4)
MEM(EA,4) ← MCSRR1[32:63]

Let the effective address (EA) be the sum of the contents of GPR(RA), or 0 if RA=0, and the sign-extended value of the D8 instruction field.
Bits 32:63 of registers MCSRR0 and MCSRR1 are stored in consecutive words in storage starting at address EA.

EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly undefined.

Special Registers Altered:
• None

3.15 Unimplemented SPRs and read-only SPRs

Zen fully decodes the SPR field of the mfspr and mtspr instructions. If the SPR specified is undefined and not privileged, an illegal instruction exception is generated. If the SPR specified is undefined and privileged and the CPU is in user mode (MSR[PR]=1), a privileged instruction exception is generated. If the SPR specified is undefined and privileged and the CPU is in supervisor mode (MSR[PR]=0), an illegal instruction exception is generated.

For the mtspr instruction, if the SPR specified is read-only and not privileged, an illegal instruction exception is generated. If the SPR specified is read-only and privileged and the CPU is in user mode (MSR[PR]=1), a privileged instruction exception is generated. If the SPR specified is read-only and privileged and the CPU is in supervisor mode (MSR[PR]=0), an illegal instruction exception is generated.

3.16 Invalid forms of instructions

3.16.1 Load and store with update instructions

PowerPC Book E defines the case when a load with update instruction specifies the same register in the RT and RA field of the instruction as an invalid format. For this invalid case, the Zen core will perform the instruction and update the register with the load data. In addition, if RA=0 for any load or store with update instruction, the Zen core will update RA (GPR0).

3.16.2 Load multiple word (lmw, e_lmw) instruction

PowerPC Book E defines as invalid any form of the lmw or e_lmw instruction in which RA is in the range of registers to be loaded, including the case in which RA=0.
On Zen, invalid forms of the lmw or e_lmw instruction will be executed as follows:
• Case 1: RA is in the range of RT, RA != 0. In this case, address generation for individual loads to register targets is done using the architectural value of RA that existed when beginning execution of this lmw or e_lmw instruction. RA will be overwritten with a value fetched from memory as if it had not been the base register. Note that if the instruction is interrupted and restarted, the base address may be different if RA has been overwritten.
• Case 2: RA=0 and RT=0. In this case, address generation for all loads to register targets RT=0 to RT=31 will be done substituting the value of 0 for the RA operand.

3.16.3 Branch conditional to count register instructions

PowerPC Book E defines as invalid any bcctr or bcctrl instruction that specifies the ‘decrement and test CTR’ (BO[2]=0) option. For these invalid forms of instructions, Zen will execute the instruction by decrementing the CTR and branching to the location specified by the pre-decremented CTR value if all CR and CTR conditions are met as specified by the other BO field settings.

3.16.4 Instructions with reserved fields non-zero

PowerPC Book E defines certain bit fields in various instructions as reserved and specifies that these fields be set to zero. Per the Book E recommendation, Zen ignores the value of the reserved field (bit 31) in X-form integer load and store instructions. Zen ignores the value of the reserved ‘z’ bits in the BO field of branch instructions. For all other instructions, Zen will generate an illegal instruction exception if a reserved field is non-zero.

3.17 Instruction summary

Table 20 and Table 21 list all 32-bit instructions in PowerPC Book E, as well as certain Zen specific instructions, sorted by mnemonic.
Format, Opcode, Mnemonic, Instruction name, and page number in Book E: Enhanced PowerPCtm Architecture v0.99 are included in the table. For Zen specific instructions, page number is not shown. Entries with a are unsupported by the Zen core, and will signal an illegal instruction exception. Implementation dependent instructions are noted with a footnote. Instructions that are optionally supported (when an optional function is added to the base core) are shown with shaded entries. Note that specific APUs are not included in the table below: • Cache Maintenance APU • SPE APU • VLE APU • WAIT APU • Enhanced Reservation APU • Volatile Context Save/Restore APU e200z759n3 Core Reference Manual, Rev. 2 88 Freescale Semiconductor 3.17.1 Instruction index sorted by mnemonic Format Opcode Primary (Inst0:5) Extended (Inst21:31) Mnemonic Instruction BooK E 0.99 page Table 20. Instructions sorted by mnemonic X 011111 01000 01010 0 add Add 223 X 011111 01000 01010 1 add. Add & record CR 223 X 011111 00000 01010 0 addc Add Carrying 224 X 011111 00000 01010 1 addc. Add Carrying & record CR 224 X 011111 10000 01010 0 addco Add Carrying & record OV 224 X 011111 10000 01010 1 addco. Add Carrying & record OV & CR 224 X 011111 00100 01010 0 adde Add Extended with CA 225 X 011111 00100 01010 1 adde. Add Extended with CA & record CR 225 X 011111 10100 01010 0 addeo Add Extended with CA & record OV 225 X 011111 10100 01010 1 addeo. Add Extended with CA & record OV & CR 225 D 001110 ----- ----- - addi Add Immediate 226 D 001100 ----- ----- - addic Add Immediate Carrying 227 D 001101 ----- ----- - addic. Add Immediate Carrying & record CR 227 D 001111 ----- ----- - addis Add Immediate Shifted 226 X 011111 00111 01010 0 addme Add to Minus One Extended with CA 228 X 011111 00111 01010 1 addme. Add to Minus One Extended with CA & record CR 228 X 011111 10111 01010 0 addmeo Add to Minus One Extended with CA & record OV 228 X 011111 10111 01010 1 addmeo. 
Add to Minus One Extended with CA & record OV & CR X 011111 11000 01010 0 addo Add & record OV 223 X 011111 11000 01010 1 addo. Add & record OV & CR 223 X 011111 00110 01010 0 addze Add to Zero Extended with CA 229 X 011111 00110 01010 1 addze. Add to Zero Extended with CA & record CR 229 228 Legend: - Don’t care, usually part of an operand field / Reserved bit, invalid instruction form if encoded as 1 ? Allocated for implementation-dependent use. See User’s Manual for the implementation e200z759n3 Core Reference Manual, Rev. 2 Freescale Semiconductor 89 BooK E 0.99 page Table 20. Instructions sorted by mnemonic (continued) Format Opcode Primary (Inst0:5) Extended (Inst21:31) Mnemonic Instruction X 011111 10110 01010 0 addzeo Add to Zero Extended with CA & record OV 229 X 011111 10110 01010 1 addzeo. Add to Zero Extended with CA & record OV & CR 229 X 011111 00000 11100 0 and AND 230 X 011111 00000 11100 1 and. AND & record CR 230 X 011111 00001 11100 0 andc AND with Complement 230 X 011111 00001 11100 1 andc. AND with Complement & record CR 230 D 011100 ----- ----- - andi. AND Immediate & record CR 230 D 011101 ----- ----- - andis. 
AND Immediate Shifted & record CR 230 I 010010 ----- ----0 0 b Branch 231 I 010010 ----- ----1 0 ba Branch Absolute 231 B 010000 ----- ----0 0 bc Branch Conditional 232 B 010000 ----- ----1 0 bca Branch Conditional Absolute 232 XL 010011 10000 10000 0 bcctr Branch Conditional to Count Register 233 XL 010011 10000 10000 1 bcctrl Branch Conditional to Count Register & Link 233 B 010000 ----- ----0 1 bcl Branch Conditional & Link 232 B 010000 ----- ----1 1 bcla Branch Conditional & Link Absolute 232 XL 010011 00000 10000 0 bclr Branch Conditional to Link Register 234 XL 010011 00000 10000 1 bclrl Branch Conditional to Link Register & Link 234 I 010010 ----- ----0 1 bl Branch & Link 231 I 010010 ----- ----1 1 bla Branch & Link Absolute 231 X 011111 00000 00000 / cmp Compare 235 D 001011 ----- ----- - cmpi Compare Immediate 235 X 011111 00001 00000 / cmpl Compare Logical 236 Legend: - Don’t care, usually part of an operand field / Reserved bit, invalid instruction form if encoded as 1 ? Allocated for implementation-dependent use. See User’s Manual for the implementation e200z759n3 Core Reference Manual, Rev. 2 90 Freescale Semiconductor Format Opcode Primary (Inst0:5) Extended (Inst21:31) Mnemonic Instruction BooK E 0.99 page Table 20. Instructions sorted by mnemonic (continued) D 001010 ----- ----- - cmpli Compare Logical Immediate 236 X 011111 00000 11010 0 cntlzw Count Leading Zeros Word 237 X 011111 00000 11010 1 cntlzw. 
Count Leading Zeros Word & record CR 237 XL 010011 01000 00001 / crand Condition Register AND 238 XL 010011 00100 00001 / crandc Condition Register AND with Complement 238 XL 010011 01001 00001 / creqv Condition Register Equivalent 238 XL 010011 00111 00001 / crnand Condition Register NAND 239 XL 010011 00001 00001 / crnor Condition Register NOR 239 XL 010011 01110 00001 / cror Condition Register OR 239 XL 010011 01101 00001 / crorc Condition Register OR with Complement 240 XL 010011 00110 00001 / crxor Condition Register XOR 240 X 011111 10111 10110 / dcba Data Cache Block Allocate 241 X 011111 00010 10110 / dcbf Data Cache Block Flush 242 X 011111 01110 10110 / dcbi Data Cache Block Invalidate 243 X 011111 01100 00110 / dcblc1 Data Cache Block Lock Clear — X 011111 00001 10110 / dcbst Data Cache Block Store 245 X 011111 01000 10110 / dcbt Data Cache Block Touch 246 X 011111 00101 00110 / dcbtls1 Data Cache Block Touch and Lock Set X 011111 00111 10110 / dcbtst Data Cache Block Touch for Store X 011111 00100 00110 / X 011111 11111 10110 / dcbz Data Cache Block set to Zero 248 X 011111 01111 01011 0 divw Divide Word 251 X 011111 01111 01011 1 divw. Divide Word & record CR 251 dcbtstls1 Data Cache Block Touch for Store and Lock Set — 247 — Legend: - Don’t care, usually part of an operand field / Reserved bit, invalid instruction form if encoded as 1 ? Allocated for implementation-dependent use. See User’s Manual for the implementation e200z759n3 Core Reference Manual, Rev. 2 Freescale Semiconductor 91 BooK E 0.99 page Table 20. Instructions sorted by mnemonic (continued) Format Opcode Primary (Inst0:5) Extended (Inst21:31) Mnemonic Instruction X 011111 11111 01011 0 divwo Divide Word & record OV 251 X 011111 11111 01011 1 divwo. Divide Word & record OV & CR 251 X 011111 01110 01011 0 divwu Divide Word Unsigned 252 X 011111 01110 01011 1 divwu. 
Divide Word Unsigned & record CR  252
X  011111  11110 01011 0  divwuo  Divide Word Unsigned & record OV  252
X  011111  11110 01011 1  divwuo.  Divide Word Unsigned & record OV & CR  252
X  011111  01000 11100 0  eqv  Equivalent  253
X  011111  01000 11100 1  eqv.  Equivalent & record CR  253
X  011111  11101 11010 0  extsb  Extend Sign Byte  254
X  011111  11101 11010 1  extsb.  Extend Sign Byte & record CR  254
X  011111  11100 11010 0  extsh  Extend Sign Halfword  254
X  011111  11100 11010 1  extsh.  Extend Sign Halfword & record CR  254
X  111111  01000 01000 0  ²  255
X  111111  01000 01000 1  ²  255
A  111111  ----- 10101 0  ²  256
A  111111  ----- 10101 1  ²  256
A  111011  ----- 10101 0  ²  256
A  111011  ----- 10101 1  ²  256
X  111111  11010 01110 /  ²  257
X  111111  00001 00000 /  ²  259
X  111111  00000 00000 /  ²  259
X  111111  11001 01110 /  ²  260
X  111111  11001 01111 /  ²  260
X  111111  00000 01110 0  ²  262
X  111111  00000 01110 1  ²  262
X  111111  00000 01111 0  ²  262
X  111111  00000 01111 1  ²  262
A  111111  ----- 10010 0  ²  264
A  111111  ----- 10010 1  ²  264
A  111011  ----- 10010 0  ²  264
A  111011  ----- 10010 1  ²  264
A  111111  ----- 11101 0  ²  265
A  111111  ----- 11101 1  ²  265
A  111011  ----- 11101 0  ²  265
A  111011  ----- 11101 1  ²  265
X  111111  00010 01000 0  ²  266
X  111111  00010 01000 1  ²  266
A  111111  ----- 11100 0  ²  267
A  111111  ----- 11100 1  ²  267
A  111011  ----- 11100 0  ²  267
A  111011  ----- 11100 1  ²  267
A  111111  ----- 11001 0  ²  268
A  111111  ----- 11001 1  ²  268
A  111011  ----- 11001 0  ²  268
A  111011  ----- 11001 1  ²  268
X  111111  00100 01000 0  ²  269
X  111111  00100 01000 1  ²  269
X  111111  00001 01000 0  ²  269
X  111111  00001 01000 1  ²  269
A  111111  ----- 11111 0  ²  270
A  111111  ----- 11111 1  ²  270
A  111011  ----- 11111 0  ²  270
A  111011  ----- 11111 1  ²  270
A  111111  ----- 11110 0  ²  271
A  111111  ----- 11110 1  ²  271
A  111011  ----- 11110 0  ²  271
A  111011  ----- 11110 1  ²  271
A  111011  ----- 11000 0  ²  272
A  111011  ----- 11000 1  ²  272
X  111111  00000 01100 0  ²  273
X  111111  00000 01100 1  ²  273
A  111111  ----- 11010 0  ²  276
A  111111  ----- 11010 1  ²  276
A  111111  ----- 10111 0  ²  277
A  111111  ----- 10111 1  ²  277
A  111111  ----- 10110 0  ²  278
A  111111  ----- 10110 1  ²  278
A  111011  ----- 10110 0  ²  278
A  111011  ----- 10110 1  ²  278
A  111111  ----- 10100 0  ²  279
A  111111  ----- 10100 1  ²  279
A  111011  ----- 10100 0  ²  279
A  111011  ----- 10100 1  ²  279
X  011111  11110 10110 /  icbi  Instruction Cache Block Invalidate  280
X  011111  00111 00110 /  icblc¹  Instruction Cache Block Lock Clear  —
X  011111  00000 10110 /  icbt  Instruction Cache Block Touch  281
X  011111  01111 00110 /  icbtls¹  Instruction Cache Block Touch and Lock Set  —
??  011111  ----- 01111 /  isel³  Integer Select  —
XL  010011  00100 10110 /  isync  Instruction Synchronize  282
D  100010  ----- ----- -  lbz  Load Byte & Zero  283
D  100011  ----- ----- -  lbzu  Load Byte & Zero with Update  283
X  011111  00011 10111 /  lbzux  Load Byte & Zero with Update Indexed  283
X  011111  00010 10111 /  lbzx  Load Byte & Zero Indexed  283
D  110010  ----- ----- -  ²  286
D  110011  ----- ----- -  ²  286
X  011111  10011 10111 /  ²  286
X  011111  10010 10111 /  ²  286
D  110000  ----- ----- -  ²  287
D  110001  ----- ----- -  ²  287
X  011111  10001 10111 /  ²  287
X  011111  10000 10111 /  ²  287
D  101010  ----- ----- -  lha  Load Halfword Algebraic  288
D  101011  ----- ----- -  lhau  Load Halfword Algebraic with Update  288
X  011111  01011 10111 /  lhaux  Load Halfword Algebraic with Update Indexed  288
X  011111  01010 10111 /  lhax  Load Halfword Algebraic Indexed  288
X  011111  11000 10110 /  lhbrx  Load Halfword Byte-Reverse Indexed  289
D  101000  ----- ----- -  lhz  Load Halfword & Zero  290
D  101001  ----- ----- -  lhzu  Load Halfword & Zero with Update  290
X  011111  01001 10111 /  lhzux  Load Halfword & Zero with Update Indexed  290
X  011111  01000 10111 /  lhzx  Load Halfword & Zero Indexed  290
D  101110  ----- ----- -  lmw  Load Multiple Word  291
X  011111  10010 10101 /  ⁴  292
X  011111  10000 10101 /  ⁴  292
X  011111  00000 10100 /  lwarx⁵  Load Word & Reserve Indexed  294
X  011111  10000 10110 /  lwbrx  Load Word Byte-Reverse Indexed  296
D  100000  ----- ----- -  lwz  Load Word & Zero  297
D  100001  ----- ----- -  lwzu  Load Word & Zero with Update  297
X  011111  00001 10111 /  lwzux  Load Word & Zero with Update Indexed  297
X  011111  00000 10111 /  lwzx  Load Word & Zero Indexed  297
X  011111  11010 10110 /  mbar⁵  Memory Barrier  298
XL  010011  00000 00000 /  mcrf  Move Condition Register Field  299
X  111111  00010 00000 /  ²  300
X  011111  10000 00000 /  mcrxr  Move to Condition Register from XER  300
X  011111  01000 10011 /  ⁴  301
X  011111  00000 10011 /  mfcr  Move From Condition Register  301
XFX  011111  01010 00011 /  mfdcr  Move From Device Control Register  302
X  011111  01000 00011 /  ⁴  302
X  111111  10010 00111 0  ²  303
X  111111  10010 00111 1  ²  303
X  011111  00010 10011 /  mfmsr  Move From Machine State Register  303
XFX  011111  01010 10011 /  mfspr  Move From Special Purpose Register  304
X  011111  10010 10110 /  msync⁵  Memory Synchronize  305
XFX  011111  00100 10000 /  mtcrf  Move To Condition Register Fields  306
XFX  011111  01110 00011 /  mtdcr  Move To Device Control Register  307
X  011111  01100 00011 /  ⁴  307
X  111111  00010 00110 0  ²  308
X  111111  00010 00110 1  ²  308
X  111111  00001 00110 0  ²  308
X  111111  00001 00110 1  ²  308
XFL  111111  10110 00111 0  ²  309
XFL  111111  10110 00111 1  ²  309
X  111111  00100 00110 0  ²  310
X  111111  00100 00110 1  ²  310
X  011111  00100 10010 /  mtmsr  Move To Machine State Register  311
XFX  011111  01110 10011 /  mtspr  Move To Special Purpose Register  312
X  011111  /0010 01011 0  mulhw  Multiply High Word  314
X  011111  /0010 01011 1  mulhw.  Multiply High Word & record CR  314
X  011111  /0000 01011 0  mulhwu  Multiply High Word Unsigned  314
X  011111  /0000 01011 1  mulhwu.  Multiply High Word Unsigned & record CR  314
D  000111  ----- ----- -  mulli  Multiply Low Immediate  315
X  011111  00111 01011 0  mullw  Multiply Low Word  316
X  011111  00111 01011 1  mullw.  Multiply Low Word & record CR  316
X  011111  10111 01011 0  mullwo  Multiply Low Word & record OV  316
X  011111  10111 01011 1  mullwo.  Multiply Low Word & record OV & CR  316
X  011111  01110 11100 0  nand  NAND  317
X  011111  01110 11100 1  nand.  NAND & record CR  317
X  011111  00011 01000 0  neg  Negate  318
X  011111  00011 01000 1  neg.  Negate & record CR  318
X  011111  10011 01000 0  nego  Negate & record OV  318
X  011111  10011 01000 1  nego.  Negate & record OV & record CR  318
X  011111  00011 11100 0  nor  NOR  319
X  011111  00011 11100 1  nor.  NOR & record CR  319
X  011111  01101 11100 0  or  OR  320
X  011111  01101 11100 1  or.  OR & record CR  320
X  011111  01100 11100 0  orc  OR with Complement  320
X  011111  01100 11100 1  orc.  OR with Complement & record CR  320
D  011000  ----- ----- -  ori  OR Immediate  320
D  011001  ----- ----- -  oris  OR Immediate Shifted  320
XL  010011  00001 10011 /  rfci  Return From Critical Interrupt  321
XL  010011  00001 00111 /  rfdi⁶  Return From Debug Interrupt  —
XL  010011  00001 10010 /  rfi  Return From Interrupt  322
XL  010011  00001 00110 /  rfmci⁷  Return From Machine Check Interrupt  —
M  010100  ----- ----- 0  rlwimi  Rotate Left Word Immediate then Mask Insert  327
M  010100  ----- ----- 1  rlwimi.  Rotate Left Word Immediate then Mask Insert & record CR  327
M  010101  ----- ----- 0  rlwinm  Rotate Left Word Immediate then AND with Mask  328
M  010101  ----- ----- 1  rlwinm.  Rotate Left Word Immediate then AND with Mask & record CR  328
M  010111  ----- ----- 0  rlwnm  Rotate Left Word then AND with Mask  328
M  010111  ----- ----- 1  rlwnm.  Rotate Left Word then AND with Mask & record CR  328
SC  010001  ///// ////1 /  sc  System Call  330
X  011111  00000 11000 0  slw  Shift Left Word  332
X  011111  00000 11000 1  slw.  Shift Left Word & record CR  332
X  011111  11000 11000 0  sraw  Shift Right Algebraic Word  334
X  011111  11000 11000 1  sraw.  Shift Right Algebraic Word & record CR  334
X  011111  11001 11000 0  srawi  Shift Right Algebraic Word Immediate  334
X  011111  11001 11000 1  srawi.  Shift Right Algebraic Word Immediate & record CR  334
X  011111  10000 11000 0  srw  Shift Right Word  336
X  011111  10000 11000 1  srw.  Shift Right Word & record CR  336
D  100110  ----- ----- -  stb  Store Byte  337
D  100111  ----- ----- -  stbu  Store Byte with Update  337
X  011111  00111 10111 /  stbux  Store Byte with Update Indexed  337
X  011111  00110 10111 /  stbx  Store Byte Indexed  337
D  110110  ----- ----- -  ²  340
D  110111  ----- ----- -  ²  340
X  011111  10111 10111 /  ²  340
X  011111  10110 10111 /  ²  340
X  011111  11110 10111 /  ²  341
D  110100  ----- ----- -  ²  342
D  110101  ----- ----- -  ²  342
X  011111  10101 10111 /  ²  342
X  011111  10100 10111 /  ²  342
D  101100  ----- ----- -  sth  Store Halfword  343
X  011111  11100 10110 /  sthbrx  Store Halfword Byte-Reverse Indexed  344
D  101101  ----- ----- -  sthu  Store Halfword with Update  343
X  011111  01101 10111 /  sthux  Store Halfword with Update Indexed  343
X  011111  01100 10111 /  sthx  Store Halfword Indexed  343
D  101111  ----- ----- -  stmw  Store Multiple Word  345
X  011111  10110 10101 /  ⁴  346
X  011111  10100 10101 /  ⁴  346
D  100100  ----- ----- -  stw  Store Word  347
X  011111  10100 10110 /  stwbrx  Store Word Byte-Reverse Indexed  348
X  011111  00100 10110 1  stwcx.⁵  Store Word Conditional Indexed & record CR  349
D  100101  ----- ----- -  stwu  Store Word with Update  347
X  011111  00101 10111 /  stwux  Store Word with Update Indexed  347
X  011111  00100 10111 /  stwx  Store Word Indexed  347
X  011111  00001 01000 0  subf  Subtract From  351
X  011111  00001 01000 1  subf.  Subtract From & record CR  351
X  011111  00000 01000 0  subfc  Subtract From Carrying  352
X  011111  00000 01000 1  subfc.  Subtract From Carrying & record CR  352
X  011111  10000 01000 0  subfco  Subtract From Carrying & record OV  352
X  011111  10000 01000 1  subfco.  Subtract From Carrying & record OV & CR  352
X  011111  00100 01000 0  subfe  Subtract From Extended with CA  353
X  011111  00100 01000 1  subfe.  Subtract From Extended with CA & record CR  353
X  011111  10100 01000 0  subfeo  Subtract From Extended with CA & record OV  353
X  011111  10100 01000 1  subfeo.  Subtract From Extended with CA & record OV & CR  353
D  001000  ----- ----- -  subfic  Subtract From Immediate Carrying  354
X  011111  00111 01000 0  subfme  Subtract From Minus One Extended with CA  355
X  011111  00111 01000 1  subfme.  Subtract From Minus One Extended with CA & record CR  355
X  011111  10111 01000 0  subfmeo  Subtract From Minus One Extended with CA & record OV  355
X  011111  10111 01000 1  subfmeo.  Subtract From Minus One Extended with CA & record OV & CR  355
X  011111  10001 01000 0  subfo  Subtract From & record OV  351
X  011111  10001 01000 1  subfo.  Subtract From & record OV & CR  351
X  011111  00110 01000 0  subfze  Subtract From Zero Extended with CA  356
X  011111  00110 01000 1  subfze.  Subtract From Zero Extended with CA & record CR  356
X  011111  10110 01000 0  subfzeo  Subtract From Zero Extended with CA & record OV  356
X  011111  10110 01000 1  subfzeo.  Subtract From Zero Extended with CA & record OV & CR  356
X  011111  11000 10010 /  tlbivax  TLB Invalidate Virtual Address Indexed  358
X  011111  11101 10010 /  tlbre  TLB Read Entry  359
X  011111  11100 10010 ?  tlbsx  TLB Search Indexed  360
X  011111  10001 10110 /  tlbsync  TLB Synchronize  361
X  011111  11110 10010 /  tlbwe  TLB Write Entry  362
X  011111  00000 00100 /  tw  Trap Word  363
D  000011  ----- ----- -  twi  Trap Word Immediate  363
X  011111  00100 00011 /  wrtee  Write External Enable  364
X  011111  00101 00011 /  wrteei  Write External Enable Immediate  364
X  011111  01001 11100 0  xor  XOR  365
X  011111  01001 11100 1  xor.  XOR & record CR  365
D  011010  ----- ----- -  xori  XOR Immediate  365
D  011011  ----- ----- -  xoris  XOR Immediate Shifted  365

Legend:
-  Don’t care, usually part of an operand field
/  Reserved bit, invalid instruction form if encoded as 1
?  Allocated for implementation-dependent use. See User’s Manual for the implementation

NOTES:
1 Motorola Book E cache locking APU, refer to Section 11.12, Cache line locking/unlocking APU.
2 Attempted execution causes an illegal instruction exception.
3 Motorola Book E isel APU, refer to Section 3.9, ISEL APU.
4 Attempted execution causes an illegal instruction exception.
5 See Section 3.5, Memory synchronization and reservation instructions.
6 See Section 3.10, Debug APU.
7 See Section 3.11, Machine Check APU.

3.17.2 Instruction index sorted by opcode

Table 21.
Instructions sorted by opcode

Format  Primary opcode (Inst0:5)  Extended opcode (Inst21:31)  Mnemonic  Instruction  Book E 0.99 page
D  000011  ----- ----- -  twi  Trap Word Immediate  363
D  000111  ----- ----- -  mulli  Multiply Low Immediate  315
D  001000  ----- ----- -  subfic  Subtract From Immediate Carrying  354
D  001010  ----- ----- -  cmpli  Compare Logical Immediate  236
D  001011  ----- ----- -  cmpi  Compare Immediate  235
D  001100  ----- ----- -  addic  Add Immediate Carrying  227
D  001101  ----- ----- -  addic.  Add Immediate Carrying & record CR  227
D  001110  ----- ----- -  addi  Add Immediate  226
D  001111  ----- ----- -  addis  Add Immediate Shifted  226
B  010000  ----- ----0 0  bc  Branch Conditional  232
B  010000  ----- ----0 1  bcl  Branch Conditional & Link  232
B  010000  ----- ----1 0  bca  Branch Conditional Absolute  232
B  010000  ----- ----1 1  bcla  Branch Conditional & Link Absolute  232
SC  010001  ///// ////1 /  sc  System Call  330
I  010010  ----- ----0 0  b  Branch  231
I  010010  ----- ----0 1  bl  Branch & Link  231
I  010010  ----- ----1 0  ba  Branch Absolute  231
I  010010  ----- ----1 1  bla  Branch & Link Absolute  231
XL  010011  00000 00000 /  mcrf  Move Condition Register Field  299
XL  010011  00000 10000 0  bclr  Branch Conditional to Link Register  234
XL  010011  00000 10000 1  bclrl  Branch Conditional to Link Register & Link  234
XL  010011  00001 00001 /  crnor  Condition Register NOR  239
XL  010011  00001 00110 /  rfmci  Return From Machine Check Interrupt  —
XL  010011  00001 00111 /  rfdi  Return From Debug Interrupt  —
XL  010011  00001 10010 /  rfi  Return From Interrupt  322
XL  010011  00001 10011 /  rfci  Return From Critical Interrupt  321
XL  010011  00100 00001 /  crandc  Condition Register AND with Complement  238
XL  010011  00100 10110 /  isync  Instruction Synchronize  282
XL  010011  00110 00001 /  crxor  Condition Register XOR  240
XL  010011  00111 00001 /  crnand  Condition Register NAND  239
XL  010011  01000 00001 /  crand  Condition Register AND  238
XL  010011  01001 00001 /  creqv  Condition Register Equivalent  238
XL  010011  01101 00001 /  crorc  Condition Register OR with Complement  240
XL  010011  01110 00001 /  cror  Condition Register OR  239
XL  010011  10000 10000 0  bcctr  Branch Conditional to Count Register  233
XL  010011  10000 10000 1  bcctrl  Branch Conditional to Count Register & Link  233
M  010100  ----- ----- 0  rlwimi  Rotate Left Word Immediate then Mask Insert  327
M  010100  ----- ----- 1  rlwimi.  Rotate Left Word Immediate then Mask Insert & record CR  327
M  010101  ----- ----- 0  rlwinm  Rotate Left Word Immediate then AND with Mask  328
M  010101  ----- ----- 1  rlwinm.  Rotate Left Word Immediate then AND with Mask & record CR  328
M  010111  ----- ----- 0  rlwnm  Rotate Left Word then AND with Mask  328
M  010111  ----- ----- 1  rlwnm.  Rotate Left Word then AND with Mask & record CR  328
D  011000  ----- ----- -  ori  OR Immediate  320
D  011001  ----- ----- -  oris  OR Immediate Shifted  320
D  011010  ----- ----- -  xori  XOR Immediate  365
D  011011  ----- ----- -  xoris  XOR Immediate Shifted  365
D  011100  ----- ----- -  andi.  AND Immediate & record CR  230
D  011101  ----- ----- -  andis.  AND Immediate Shifted & record CR  230
??  011111  ----- 01111 /  isel  Integer Select  —
X  011111  00000 00000 /  cmp  Compare  235
X  011111  00000 00100 /  tw  Trap Word  363
X  011111  00000 01000 0  subfc  Subtract From Carrying  352
X  011111  00000 01000 1  subfc.  Subtract From Carrying & record CR  352
X  011111  00000 01010 0  addc  Add Carrying  224
X  011111  00000 01010 1  addc.  Add Carrying & record CR  224
X  011111  /0000 01011 0  mulhwu  Multiply High Word Unsigned  314
X  011111  /0000 01011 1  mulhwu.  Multiply High Word Unsigned & record CR  314
X  011111  00000 10011 /  mfcr  Move From Condition Register  301
X  011111  00000 10100 /  lwarx  Load Word & Reserve Indexed  294
X  011111  00000 10110 /  icbt  Instruction Cache Block Touch  281
X  011111  00000 10111 /  lwzx  Load Word & Zero Indexed  297
X  011111  00000 11000 0  slw  Shift Left Word  332
X  011111  00000 11000 1  slw.  Shift Left Word & record CR  332
X  011111  00000 11010 0  cntlzw  Count Leading Zeros Word  237
X  011111  00000 11010 1  cntlzw.  Count Leading Zeros Word & record CR  237
X  011111  00000 11100 0  and  AND  230
X  011111  00000 11100 1  and.  AND & record CR  230
X  011111  00001 00000 /  cmpl  Compare Logical  236
X  011111  00001 01000 0  subf  Subtract From  351
X  011111  00001 01000 1  subf.  Subtract From & record CR  351

Legend:
-  Don’t care, usually part of an operand field
/  Reserved bit, invalid instruction form if encoded as 1
?  Allocated for implementation-dependent use. See User’s Manual for the implementation
X  011111  00001 10110 /  dcbst  Data Cache Block Store  245
X  011111  00001 10111 /  lwzux  Load Word & Zero with Update Indexed  297
X  011111  00001 11100 0  andc  AND with Complement  230
X  011111  00001 11100 1  andc.  AND with Complement & record CR  230
X  011111  /0010 01011 0  mulhw  Multiply High Word  314
X  011111  /0010 01011 1  mulhw.  Multiply High Word & record CR  314
X  011111  00010 10011 /  mfmsr  Move From Machine State Register  303
X  011111  00010 10110 /  dcbf  Data Cache Block Flush  242
X  011111  00010 10111 /  lbzx  Load Byte & Zero Indexed  283
X  011111  00011 01000 0  neg  Negate  318
X  011111  00011 01000 1  neg.  Negate & record CR  318
X  011111  00011 10111 /  lbzux  Load Byte & Zero with Update Indexed  283
X  011111  00011 11100 0  nor  NOR  319
X  011111  00011 11100 1  nor.  NOR & record CR  319
X  011111  00100 00011 /  wrtee  Write External Enable  364
X  011111  00100 00110 /  dcbtstls¹  Data Cache Block Touch for Store and Lock Set  —
X  011111  00100 01000 0  subfe  Subtract From Extended with CA  353
X  011111  00100 01000 1  subfe.  Subtract From Extended with CA & record CR  353
X  011111  00100 01010 0  adde  Add Extended with CA  225
X  011111  00100 01010 1  adde.  Add Extended with CA & record CR  225
XFX  011111  00100 10000 /  mtcrf  Move To Condition Register Fields  306
X  011111  00100 10010 /  mtmsr  Move To Machine State Register  311
X  011111  00100 10110 1  stwcx.  Store Word Conditional Indexed & record CR  349
X  011111  00100 10111 /  stwx  Store Word Indexed  347
X  011111  00101 00011 /  wrteei  Write External Enable Immediate  364
X  011111  00101 00110 /  dcbtls¹  Data Cache Block Touch and Lock Set  —
X  011111  00101 10111 /  stwux  Store Word with Update Indexed  347
X  011111  00110 01000 0  subfze  Subtract From Zero Extended with CA  356
X  011111  00110 01000 1  subfze.  Subtract From Zero Extended with CA & record CR  356
X  011111  00110 01010 0  addze  Add to Zero Extended with CA  229
X  011111  00110 01010 1  addze.  Add to Zero Extended with CA & record CR  229
X  011111  00110 10111 /  stbx  Store Byte Indexed  337
X  011111  00111 00110 /  icblc¹  Instruction Cache Block Lock Clear  —
X  011111  00111 01000 0  subfme  Subtract From Minus One Extended with CA  355
X  011111  00111 01000 1  subfme.  Subtract From Minus One Extended with CA & record CR  355
X  011111  00111 01010 0  addme  Add to Minus One Extended with CA  228
X  011111  00111 01010 1  addme.  Add to Minus One Extended with CA & record CR  228
X  011111  00111 01011 0  mullw  Multiply Low Word  316
X  011111  00111 01011 1  mullw.  Multiply Low Word & record CR  316
X  011111  00111 10110 /  dcbtst  Data Cache Block Touch for Store  247
X  011111  00111 10111 /  stbux  Store Byte with Update Indexed  337
X  011111  01000 00011 /  302
X  011111  01000 01010 0  add  Add  223
X  011111  01000 01010 1  add.  Add & record CR  223
X  011111  01000 10011 /  301
X  011111  01000 10110 /  dcbt  Data Cache Block Touch  246
X  011111  01000 10111 /  lhzx  Load Halfword & Zero Indexed  290
X  011111  01000 11100 0  eqv  Equivalent  253
X  011111  01000 11100 1  eqv.  Equivalent & record CR  253
X  011111  01001 10111 /  lhzux  Load Halfword & Zero with Update Indexed  290
X  011111  01001 11100 0  xor  XOR  365
X  011111  01001 11100 1  xor.  XOR & record CR  365
XFX  011111  01010 00011 /  mfdcr  Move From Device Control Register  302
XFX  011111  01010 10011 /  mfspr  Move From Special Purpose Register  304
X  011111  01010 10111 /  lhax  Load Halfword Algebraic Indexed  288
X  011111  01011 10111 /  lhaux  Load Halfword Algebraic with Update Indexed  288
X  011111  01100 00011 /  307
X  011111  01100 00110 /  dcblc¹  Data Cache Block Lock Clear  —
X  011111  01100 10111 /  sthx  Store Halfword Indexed  343
X  011111  01100 11100 0  orc  OR with Complement  320
X  011111  01100 11100 1  orc.  OR with Complement & record CR  320
X  011111  01101 10111 /  sthux  Store Halfword with Update Indexed  343
X  011111  01101 11100 0  or  OR  320
X  011111  01101 11100 1  or.  OR & record CR  320
XFX  011111  01110 00011 /  mtdcr  Move To Device Control Register  307
X  011111  01110 01011 0  divwu  Divide Word Unsigned  252
X  011111  01110 01011 1  divwu.  Divide Word Unsigned & record CR  252
XFX  011111  01110 10011 /  mtspr  Move To Special Purpose Register  312
X  011111  01110 10110 /  dcbi  Data Cache Block Invalidate  243
X  011111  01110 11100 0  nand  NAND  317
X  011111  01110 11100 1  nand.  NAND & record CR  317
X  011111  01111 00110 /  icbtls¹  Instruction Cache Block Touch and Lock Set  —
X  011111  01111 01011 0  divw  Divide Word  251
X  011111  01111 01011 1  divw.  Divide Word & record CR  251
X  011111  10000 00000 /  mcrxr  Move to Condition Register from XER  300
X  011111  10000 01000 0  subfco  Subtract From Carrying & record OV  352
X  011111  10000 01000 1  subfco.  Subtract From Carrying & record OV & CR  352
X  011111  10000 01010 0  addco  Add Carrying & record OV  224
X  011111  10000 01010 1  addco.  Add Carrying & record OV & CR  224
X  011111  10000 10101 /  292
X  011111  10000 10110 /  lwbrx  Load Word Byte-Reverse Indexed  296
X  011111  10000 10111 /  287
X  011111  10000 11000 0  srw  Shift Right Word  336
X  011111  10000 11000 1  srw.  Shift Right Word & record CR  336
X  011111  10001 01000 0  subfo  Subtract From & record OV  351
X  011111  10001 01000 1  subfo.  Subtract From & record OV & CR  351
X  011111  10001 10110 /  tlbsync  TLB Synchronize  361
X  011111  10001 10111 /  287
X  011111  10010 10101 /  292
X  011111  10010 10110 /  msync  Memory Synchronize  305
X  011111  10010 10111 /  286
X  011111  10011 01000 0  nego  Negate & record OV  318
X  011111  10011 01000 1  nego.  Negate & record OV & record CR  318
X  011111  10011 10111 /  286
X  011111  10100 01000 0  subfeo  Subtract From Extended with CA & record OV  353
X  011111  10100 01000 1  subfeo.  Subtract From Extended with CA & record OV & CR  353
X  011111  10100 01010 0  addeo  Add Extended with CA & record OV  225
X  011111  10100 01010 1  addeo.  Add Extended with CA & record OV & CR  225
X  011111  10100 10101 /  346
X  011111  10100 10110 /  stwbrx  Store Word Byte-Reverse Indexed  348
X  011111  10100 10111 /  342
X  011111  10101 10111 /  342
X  011111  10110 01000 0  subfzeo  Subtract From Zero Extended with CA & record OV  356
X  011111  10110 01000 1  subfzeo.  Subtract From Zero Extended with CA & record OV & CR  356
X  011111  10110 01010 0  addzeo  Add to Zero Extended with CA & record OV  229
X  011111  10110 01010 1  addzeo.  Add to Zero Extended with CA & record OV & CR  229
X  011111  10110 10101 /  346
X  011111  10110 10111 /  340
X  011111  10111 01000 0  subfmeo  Subtract From Minus One Extended with CA & record OV  355
X  011111  10111 01000 1  subfmeo.  Subtract From Minus One Extended with CA & record OV & CR  355
X  011111  10111 01010 0  addmeo  Add to Minus One Extended with CA & record OV  228
X  011111  10111 01010 1  addmeo.  Add to Minus One Extended with CA & record OV & CR  228
X  011111  10111 01011 0  mullwo  Multiply Low Word & record OV  316
X  011111  10111 01011 1  mullwo.  Multiply Low Word & record OV & CR  316
X  011111  10111 10110 /  dcba  Data Cache Block Allocate  241
X  011111  10111 10111 /  340
X  011111  11000 01010 0  addo  Add & record OV  223
X  011111  11000 01010 1  addo.  Add & record OV & CR  223
X  011111  11000 10010 /  tlbivax  TLB Invalidate Virtual Address Indexed  358
X  011111  11000 10110 /  lhbrx  Load Halfword Byte-Reverse Indexed  289
X  011111  11000 11000 0  sraw  Shift Right Algebraic Word  334
X  011111  11000 11000 1  sraw.  Shift Right Algebraic Word & record CR  334
X  011111  11001 11000 0  srawi  Shift Right Algebraic Word Immediate  334
X  011111  11001 11000 1  srawi.  Shift Right Algebraic Word Immediate & record CR  334
X  011111  11010 10110 /  mbar  Memory Barrier  298
X  011111  11100 10010 ?  tlbsx  TLB Search Indexed  360
X  011111  11100 10110 /  sthbrx  Store Halfword Byte-Reverse Indexed  344
X  011111  11100 11010 0  extsh  Extend Sign Halfword  254
X  011111  11100 11010 1  extsh.  Extend Sign Halfword & record CR  254
X  011111  11101 10010 /  tlbre  TLB Read Entry  359
X  011111  11101 11010 0  extsb  Extend Sign Byte  254
X  011111  11101 11010 1  extsb.  Extend Sign Byte & record CR  254
X  011111  11110 01011 0  divwuo  Divide Word Unsigned & record OV  252
X  011111  11110 01011 1  divwuo.  Divide Word Unsigned & record OV & CR  252
X  011111  11110 10010 /  tlbwe  TLB Write Entry  362
X  011111  11110 10110 /  icbi  Instruction Cache Block Invalidate  280
X  011111  11110 10111 /  341
X  011111  11111 01011 0  divwo  Divide Word & record OV  251
X  011111  11111 01011 1  divwo.  Divide Word & record OV & CR  251
X  011111  11111 10110 /  dcbz  Data Cache Block set to Zero  248
D  100000  ----- ----- -  lwz  Load Word & Zero  297
D  100001  ----- ----- -  lwzu  Load Word & Zero with Update  297
D  100010  ----- ----- -  lbz  Load Byte & Zero  283
D  100011  ----- ----- -  lbzu  Load Byte & Zero with Update  283
D  100100  ----- ----- -  stw  Store Word  347
D  100101  ----- ----- -  stwu  Store Word with Update  347
D  100110  ----- ----- -  stb  Store Byte  337
D  100111  ----- ----- -  stbu  Store Byte with Update  337
D  101000  ----- ----- -  lhz  Load Halfword & Zero  290
D  101001  ----- ----- -  lhzu  Load Halfword & Zero with Update  290
D  101010  ----- ----- -  lha  Load Halfword Algebraic  288
D  101011  ----- ----- -  lhau  Load Halfword Algebraic with Update  288
D  101100  ----- ----- -  sth  Store Halfword  343
D  101101  ----- ----- -  sthu  Store Halfword with Update  343
D  101110  ----- ----- -  lmw  Load Multiple Word  291
D  101111  ----- ----- -  stmw  Store Multiple Word  345
D  110000  ----- ----- -  287
D  110001  ----- ----- -  287
D  110010  ----- ----- -  286
D  110011  ----- ----- -  286
D  110100  ----- ----- -  342
D  110101  ----- ----- -  342
D  110110  ----- ----- -  340
D  110111  ----- ----- -  340
A  111011  ----- 10010 0  264
A  111011  ----- 10010 1  264
A  111011  ----- 10100 0  279
A  111011  ----- 10100 1  279
A  111011  ----- 10101 0  256
A  111011  ----- 10101 1  256
A  111011  ----- 10110 0  278
A  111011  ----- 10110 1  278
A  111011  ----- 11000 0  272
A  111011  ----- 11000 1  272
A  111011  ----- 11001 0  268
A  111011  ----- 11001 1  268
A  111011  ----- 11100 0  267
A  111011  ----- 11100 1  267
A  111011  ----- 11101 0  265
A  111011  ----- 11101 1  265
A  111011  ----- 11110 0  271
A  111011  ----- 11110 1  271
A  111011  ----- 11111 0  270
A  111011  ----- 11111 1  270
A  111111  ----- 10010 0  264
A  111111  ----- 10010 1  264
A  111111  ----- 10100 0  279
A  111111  ----- 10100 1  279
A  111111  ----- 10101 0  256
A  111111  ----- 10101 1  256
A  111111  ----- 10110 0  278
A  111111  ----- 10110 1  278
A  111111  ----- 10111 0  277
A  111111  ----- 10111 1  277
A  111111  ----- 11001 0  268
A  111111  ----- 11001 1  268
A  111111  ----- 11010 0  276
A  111111  ----- 11010 1  276
A  111111  ----- 11100 0  267
A  111111  ----- 11100 1  267
A  111111  ----- 11101 0  265
A  111111  ----- 11101 1  265
A  111111  ----- 11110 0  271
A  111111  ----- 11110 1  271
A  111111  ----- 11111 0  270
A  111111  ----- 11111 1  270
X  111111  00000 00000 /  259
X  111111  00000 01100 0  273
X  111111  00000 01100 1  273
X  111111  00000 01110 0  262
X  111111  00000 01110 1  262
X  111111  00000 01111 0  262
X  111111  00000 01111 1  262
X  111111  00001 00000 /  259
X  111111  00001 00110 0  308
X  111111  00001 00110 1  308
X  111111  00001 01000 0  269
X  111111  00001 01000 1  269
X  111111  00010 00000 /  300
X  111111  00010 00110 0  308
X  111111  00010 00110 1  308
X  111111  00010 01000 0  266
X  111111  00010 01000 1  266
X  111111  00100 00110 0  310
X  111111  00100 00110 1  310
X  111111  00100 01000 0  269
X  111111  00100 01000 1  269
X  111111  01000 01000 0  255
X  111111  01000 01000 1  255
X  111111  10010 00111 0  303
X  111111  10010 00111 1  303
XFL  111111  10110 00111 0  309
XFL  111111  10110 00111 1  309
X  111111  11001 01110 /  260
X  111111  11001 01111 /  260
X  111111  11010 01110 /  257

NOTES:
1 Motorola Book E cache locking APU, refer to Section 11.12, Cache line locking/unlocking APU.

Chapter 4 Instruction Pipeline and Execution Timing

This section describes the Zen instruction pipeline and instruction timing information.
The core is partitioned into the following subsystems:
• Instruction unit
• Control unit
• Integer units
• Load/store unit
• Core interface

4.1 Overview of operation

A block diagram of the e200z759n3 core is shown in Figure 17. The instruction fetch unit prefetches instructions from memory into the instruction buffers. The decode unit decodes each instruction and generates information needed by the branch unit and the execution units. Prefetched instructions are written into the instruction buffers.

The instruction issue unit attempts to issue a pair of instructions each cycle to the execution units. Source operands for each of the instructions are provided from the GPRs or from the operand feed-forward muxes. Data or resource hazards may create stall conditions that cause instruction issue to be stalled for one or more cycles until the hazard is eliminated.

The execution units write the result of a finished instruction onto the proper result bus and into the destination registers. The writeback logic retires an instruction when the instruction has finished execution. Up to three results can be simultaneously written, depending on the size of the result.

Two execution units are provided to allow dual issue of most instructions. Only a single load/store unit is provided. Only a single integer divide unit is provided, thus a pair of divide instructions cannot issue simultaneously. In addition, the divide unit is blocking.

[Block diagram. Blocks shown: OnCE/Nexus; CPU control logic; SPE units; memory management unit; SPRs (LR, CR, CTR, XER); integer execution units; multiply units; GPRs; instruction unit (instruction buffer, PC unit, branch unit); load/store unit; external SPR interface (mtspr/mfspr); instruction cache and data cache; instruction and data bus interface units, each with 32-bit address, 64-bit data, and N control signals.]
Figure 17. Zen block diagram

Table 22 shows the e200z759n3 concurrent instruction issue capabilities. Note that data dependencies between instructions will generally preclude dual-issue. In particular, read-after-write dependencies are handled by stalling the issue pipeline as required to ensure the proper execution ordering.

Table 22. Concurrent instruction issue capabilities

Class of instruction  Branch  Load/store  Scalar integer  Scalar float  Vector integer  Vector float  Special
branch                —       ✓           ✓               ✓             ✓               ✓             —
load/store            ✓       —           ✓               ✓             ✓               ✓             —
scalar integer        ✓       ✓           ✓¹              ✓             ✓²              ✓             —
scalar float          ✓       ✓           ✓               ✓             ✓               —             —
vector integer        ✓       ✓           ✓²              ✓             ✓³              ✓             —
vector float          ✓       ✓           ✓               —             ✓               —             —
special               —       —           —               —             —               —             —

NOTES:
1 excludes divide class instructions occurring in both issue slots
2 excludes vector MAC/multiply class instructions occurring with scalar multiply, or divide class instructions occurring in both issue slots
3 excludes vector MAC/multiply class instructions occurring in both issue slots, or divide class instructions occurring in both issue slots

4.1.1 Control unit

The control unit coordinates the instruction fetch unit, branch unit, instruction decode unit, instruction issue unit, completion unit and exception handling logic.

4.1.2 Instruction unit

The instruction unit controls the flow of instructions from the cache to the instruction buffers and decode unit. Ten instruction prefetch buffers allow the instruction unit to fetch instructions ahead of actual execution, and serve to decouple memory and the execution pipeline.

4.1.3 Branch unit

The branch unit executes branch instructions, predicts conditional branches, and provides branch target addresses for instruction fetches. It contains a 32-entry Branch Target Buffer (BTB) to accelerate execution of branch instructions as well as a 3-entry Return Stack used for subroutine return address prediction.

4.1.4 Instruction decode unit

The decode unit includes the instruction buffers.
A pair of instructions can be decoded each cycle. The major functions of the decode logic are:
• Opcode decoding to determine the instruction class and resource requirements for each instruction being decoded.
• Source and destination register dependency checking.
• Execution unit assignment.
• Determining any decode serializations, and inhibiting subsequent instruction decoding.

The decode unit operates in a single processor clock cycle.

4.1.5 Exception handling

The exception handling unit includes logic to handle exceptions, interrupts, and traps.

4.2 Execution units

The core data execution units consist of the integer units, SPE units, EFPU floating-point units, and the load/store unit. Included in the execution units section are the thirty-two 64-bit general purpose registers (GPRs). Instructions with data dependencies begin execution when all such dependencies are resolved.

4.2.1 Integer execution units

Each integer execution unit is used to process arithmetic and logical instructions. Adds, subtracts, compares, count leading zeros, shifts and rotates execute in a single cycle. Integer multiplies and divides execute in multiple clock cycles. Multiply instructions have a latency of 3 cycles for result data and 4 cycles for condition codes for record forms, with a throughput of 1 per cycle. Divide instructions have a variable latency (4-15 cycles) depending upon the operand data. The worst case integer divide will take 15 cycles. While the divide is running, the rest of the pipeline is unavailable for additional instructions (blocking divide).

4.2.2 Load / store unit

The load/store unit executes instructions that move data between the GPRs and the memory subsystem. Loads, when free of data dependencies, execute with a maximum throughput of one per cycle and three cycle latency. Stores also execute with a maximum throughput of one per cycle and three cycle latency.
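The latency and throughput figures quoted above for multiplies, divides, loads, and stores can be turned into a rough cycle estimate. The following is a minimal sketch, not a timing model of the core: the function and constant names are hypothetical, the operations are assumed to be independent and issued back to back, and dual issue, fetch effects, and all other stalls are ignored.

```python
# Figures taken from the text above; the names are illustrative only.
MUL_LATENCY = 3        # result data latency of an integer multiply
LOAD_LATENCY = 3       # latency of a load free of data dependencies
DIV_WORST_CASE = 15    # worst-case integer divide latency


def pipelined_cycles(count, latency, throughput=1):
    """Cycles until the last of `count` independent, back-to-back
    pipelined operations delivers its result.

    The first result appears after `latency` cycles and each further
    result follows `throughput` cycles after the previous one.
    """
    if count <= 0:
        return 0
    return latency + (count - 1) * throughput


def blocking_cycles(count, latency):
    """Cycles for `count` operations on a blocking unit, where the next
    operation cannot start until the previous one has finished."""
    if count <= 0:
        return 0
    return count * latency
```

Under these assumptions, `pipelined_cycles(4, MUL_LATENCY)` gives 6 cycles for four independent multiplies, while `blocking_cycles(2, DIV_WORST_CASE)` gives 30 cycles for two worst-case divides, which illustrates why the blocking divide is the more expensive case.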
Store data can be fed-forward from an immediately preceding load with no stall.

4.2.3 Embedded floating-point execution units

The embedded floating-point execution units are used to process EFPU floating-point arithmetic instructions. Add, subtract, compare, multiply, and multiply-accumulate pipelines have a latency of 4 cycles with a maximum throughput of 1 per cycle. EFPU floating-point divide and square root instructions have a latency of 9 cycles. While the divide is running, the rest of the pipeline is unavailable for additional instructions (blocking divide).

4.3 Instruction pipeline

The processor pipeline consists of stages for instruction fetch, instruction decode, register read, execution, and result writeback. Certain stages involve multiple clock cycles of execution. The processor also contains an instruction prefetch buffer to allow buffering of instructions prior to the decode stage. Instructions proceed from this buffer to the instruction decode stage by entering the instruction decode register IR.

Table 23. Pipeline stages

Stage               Description
IFETCH0             Instruction Fetch From Memory, stage 0
IFETCH1             Instruction Fetch From Memory, stage 1
IFETCH2             Instruction Fetch From Memory, stage 2
DECODE0             Instruction Decode, stage 0
DECODE1 / RF READ   Instruction Decode, stage 1 / Register Read / Operand Forwarding / Memory Effective Address Generation
EXECUTE0 / MEM0     Instruction Execution stage 0 / Memory Access stage 0
EXECUTE1 / MEM1     Instruction Execution stage 1 / Memory Access stage 1
EXECUTE2 / MEM2     Instruction Execution stage 2 / Memory Access stage 2
EXECUTE3            Instruction Execution stage 3
WB                  Write Back to Registers

[Pipeline diagram. Simple instructions advance in pairs (I0,I1 followed one cycle later by I2,I3) through IFetch0, IFetch1, IFetch2, Decode0, Decode1/Reg read/FFwd, Execute0, three Feedforward stages, and Writeback. Load instructions L0,L1 advance together through IFetch0, IFetch1, IFetch2, Decode0, and Decode1/Reg read/EA calc, then L0 followed by L1 pass one at a time through Memory0, Memory1, Memory2, Feedforward, and Writeback.]
Figure 18. Pipeline diagram

4.3.1 Description of pipeline stages

The Fetch pipeline stages retrieve instructions from the memory system and determine where the next instruction fetch is performed. Up to two 32-bit instructions or four 16-bit instructions are sent from memory to the instruction buffers each cycle.

The Decode pipeline stages decode instructions, read operands from the register file, and perform dependency checking.

Execution occurs in one or more of the four execute pipeline stages in each execution unit (perhaps over multiple cycles). Execution of most load/store instructions is pipelined. The load/store unit has four pipeline stages. The pipeline stages are: effective address calculation (EA Calc), memory access (MEM0, MEM1), and data format and forward (MEM2).

Simple integer instructions complete execution in the Execute 0 stage of the pipeline. Multiply instructions require all four execute stages but may be pipelined as well. Most condition-setting instructions complete in the Execute 0 stage of the pipeline, thus conditional branches dependent on a condition-setting instruction may be resolved by an instruction in this stage.

Result feed-forward hardware forwards the result of one instruction into the source operand(s) of a following instruction so that the execution of data-dependent instructions does not wait until the completion of the result writeback.
Feed forward hardware is supplied to allow bypassing of completed instructions from all four execute stages into the first execution stage for a subsequent data-dependent instruction.

4.3.2 Instruction prefetch buffers and branch target buffer

Zen contains a 10-entry instruction prefetch buffer that supplies instructions into the Instruction Register (IR) for decoding. Each slot in the prefetch buffer is 32 bits wide, capable of holding a single 32-bit instruction, or a pair of 16-bit instructions.

Instruction prefetches request a 64-bit doubleword and the prefetch buffer is filled with a pair of instructions at a time, except for the case of a change of flow fetch where the target is to the second (odd) word. In that case only a 32-bit prefetch is performed to load the instruction prefetch buffer. This 32-bit fetch may be immediately followed by a 64-bit prefetch to fill Slots 0 and 1 in the event that the branch is resolved to be taken.

In normal sequential execution, instructions are loaded into the IR from prefetch buffer Slot 0 and 1, and as a pair of slots are emptied, they are refilled. Whenever a pair of slots is empty, a 64-bit prefetch is initiated, which fills the earliest empty slot pairs beginning with Slot 0.

If the instruction prefetch buffer empties, instruction issue stalls, and the buffer is refilled. The first returned instruction is forwarded directly to the IR. Open cycles on the memory bus are utilized to keep the buffer full when possible.

[Diagram: 64-bit fetch data (DATA 0:63) fills prefetch buffer Slots 0 through 9, arranged as even/odd pairs, which feed the IR through a mux and then the decode logic.]
Figure 19. Zen instruction prefetch buffers

To resolve branch instructions and improve the accuracy of branch predictions, Zen implements a dynamic branch prediction mechanism using a 32-entry branch target buffer (BTB). An entry is allocated in the BTB whenever a normal branch resolves as taken and the BTB is enabled.
Certain other branches do not allocate BTB entries: blr, bclr, bctr, bcctr. Entries in the BTB are allocated on taken branches using a FIFO replacement algorithm.

Each BTB entry holds the branch target address, and a 2-bit branch history counter whose value is incremented or decremented on a BTB hit, depending on whether the branch was taken. The counter can assume four different values: strongly taken, weakly taken, weakly not taken, and strongly not taken. On initial allocation of an entry to the BTB for a taken branch, the counter is initialized to the weakly-taken state.

A branch will be predicted as taken on a hit in the BTB with a counter value of strongly or weakly taken. In this case the target address contained in the BTB is used to redirect the instruction fetch stream to the target of the branch prior to the branch reaching the instruction decode stage. In the case of a BTB miss, static prediction is used to predict the outcome of the branch. In the case of a mispredicted branch, the instruction fetch stream will return to the proper instruction stream after the branch has been resolved.

When a branch is predicted taken and the branch is later resolved (in the branch execute stage), the value of the appropriate BTB counter is updated. If a branch whose counter indicates weakly taken is resolved as taken, the counter increments so that the prediction becomes strongly taken. If the branch resolves as not taken, the prediction changes to weakly not-taken. The counter saturates in the strongly taken state when the prediction is correct.

Zen does not implement the static branch prediction that is defined by the Power Architecture. The BO prediction bit in branch encodings is ignored.

Dynamic branch prediction is enabled by setting BUCSR[BPEN].
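The counter and replacement behavior described above can be sketched in a few lines. This is an illustrative model only, assuming a direct lookup by branch address; the class and method names are hypothetical, and the sketch leaves out the Instruction Space tag, the branch types that never allocate entries, and the BUCSR controls.

```python
from collections import OrderedDict

# Counter states, ordered from strongly not taken (0) to strongly taken (3).
STRONG_NT, WEAK_NT, WEAK_T, STRONG_T = range(4)


class BranchTargetBuffer:
    """Toy BTB with 2-bit saturating counters and FIFO replacement."""

    def __init__(self, entries=32):
        self.entries = entries
        # branch address -> [target address, counter]; insertion order
        # is preserved, so the first key is always the oldest entry.
        self.table = OrderedDict()

    def predict(self, branch_addr):
        """Return the predicted target for a taken prediction, else None."""
        entry = self.table.get(branch_addr)
        if entry is not None and entry[1] >= WEAK_T:
            return entry[0]
        return None  # BTB miss, or hit with a not-taken counter value

    def resolve(self, branch_addr, taken, target=None):
        """Update the BTB once the branch outcome is known."""
        entry = self.table.get(branch_addr)
        if entry is not None:
            # Hit: saturating increment or decrement of the counter.
            if taken:
                entry[1] = min(entry[1] + 1, STRONG_T)
            else:
                entry[1] = max(entry[1] - 1, STRONG_NT)
        elif taken:
            # Miss on a taken branch: allocate, evicting the oldest entry.
            if len(self.table) >= self.entries:
                self.table.popitem(last=False)
            self.table[branch_addr] = [target, WEAK_T]
```

A newly allocated branch is predicted taken at once, because it starts in the weakly taken state; one not-taken outcome moves it to weakly not taken and the prediction flips.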
Allocation of branch target buffer entries may be controlled using the BUCSR[BALLOC] field to control whether forward or backward branches (or both) are candidates for entry into the BTB, and thus for branch prediction. Once a branch is in the BTB, BUCSR[BALLOC] has no further effect on that branch entry. Clearing BUCSR[BPEN] disables dynamic branch prediction, in which case Zen reverts to a static prediction mechanism using the BUCSR[BPRED] field to control whether forward or backward branches (or both) are predicted taken or not taken.

The BTB uses virtual addresses for performing tag comparisons. On allocation of a BTB entry, the effective address of a taken branch, along with the current Instruction Space (as indicated by MSR[IS]), is loaded into the entry and the counter value is set to weakly taken. The current PID value is not maintained as part of the tag information. Zen does support automatic flushing of the BTB when the current PID value is updated by a mtspr PID0 instruction. Software is otherwise responsible for maintaining coherency in the BTB when the effective to real (virtual to physical) address mapping changes. This is supported by the BUCSR[BBFI] control bit.

[Diagram: 32 BTB entries (entry 0 through entry 31). Each entry has a TAG portion holding branch addr[0:30] and IS, and a DATA portion holding target address[0:30] and the counter. IS = Instruction Space.]
Figure 20. Zen branch target buffer

4.3.3 Single-cycle instruction pipeline operation

Sequences of single-cycle execution instructions follow the flow in Figure 21. Instructions are issued and completed in program order. Most arithmetic and logical instructions fall into this category.

Stage sequence by instruction:
1st Inst(s).   IF0 IF1 IF2 D0 D1/RR E0 FF FF FF WB
2nd Inst(s).   IF0 IF1 IF2 D0 D1/RR E0 FF FF FF WB
3rd Inst(s).   IF0 IF1 IF2 D0 D1/RR E0 FF FF FF WB
4th Inst(s).   IF0 IF1 IF2 D0 D1/RR E0 FF FF FF WB
Figure 21. Basic pipeline flow, single cycle instructions

4.3.4 Basic load and store instruction pipeline operation

For load and store instructions, the effective address is calculated in the EA Calc stage, and memory is accessed in the MEM0–MEM1 stages. Data selection and alignment is performed in MEM2, and the result is available at the end of MEM2 for the following instruction.

Stage sequence by instruction:
1st LD Inst.               IF0 IF1 IF2 D0 D1/RR/EA M0 M1 M2 FF WB
2nd LD/ST Inst.            IF0 IF1 IF2 D0 D1/RR/EA M0 M1 M2 FF WB
3rd Inst. (single cycle)   IF0 IF1 IF2 D0 D1/RR — E0 FF FF FF WB
Figure 22. Basic pipeline flow, load/store instructions

4.3.5 Change-of-flow instruction pipeline operation

Simple change of flow instructions require 4 cycles to refill the pipeline with the target instruction for taken branches and branch and link instructions with no BTB hit and no prediction required (condition resolved prior to branch decode).

Stage sequence by instruction:
BR Inst.       IF0 IF1 IF2 D0/EA (D1/RR) (E0) (E1) (E2) (E3) WB
Target Inst.   TF0 TF1 TF2 D0 D1/RR E0 E1 E2 E3 WB
Figure 23. Basic pipeline flow, branch instructions, no prediction

For branch type instructions, in some situations this 4 cycle timing may be reduced by performing the target fetch speculatively while the branch instruction is still being fetched into the instruction buffer if the branch target address can be obtained from the BTB. The resulting branch timing reduces to a single clock when the target fetch is initiated early enough and the branch is correctly predicted.

Stage sequence by instruction:
BR Inst.                 IF0 IF1 IF2 D0 (D1) (E0) (E1) (E2) (E3) WB
Target Inst. (BTB hit)   TF0 TF1 TF2 D0 D1/RR E0 E1 E2 E3 WB
Figure 24. Basic pipeline flow, branch instructions, BTB hit, correct prediction, branch taken

For certain cases where the branch is incorrectly predicted, 6 cycles are required to correct the misprediction outcome.
Figure 25 shows one example.

Stage sequence by instruction (the branch row is annotated "resolve condition" in the bracketed stages):
BR Inst. (predict not taken)   IF0 IF1 IF2 D0 (D1/RR) (E0) (E1) (E2) (E3) WB
Target Inst.                   TF0 TF1 TF2 D0 D1/RR E0 E1 E2 E3
Figure 25. Basic pipeline flow, branch instructions, predict not taken, incorrect prediction

For bcctr and e_bctr cases where the branch is correctly predicted as taken, 5 cycles are required to execute the branch as shown in Figure 26.

Stage sequence by instruction (the branch row is annotated "resolve condition" in the bracketed stages):
BR Inst. (predict taken)   IF0 IF1 IF2 D0 (D1/RR) (E0) (E1) (E2) (E3) WB
Target Inst.               TF0 TF1 TF2 D0 D1/RR E0 E1 E2 E3
Figure 26. Basic pipeline flow, bcctr Instruction, predict taken, correct prediction

For bcctr and e_bctr cases where the branch is incorrectly predicted as taken, but the fall-through instruction is already in the instruction buffer, 3 cycles are required to execute the branch as shown in Figure 27.

Stage sequence by instruction (the branch row is annotated "resolve condition" in the bracketed stages):
BR Inst. (predict taken)   IF0 IF1 IF2 D0 (D1/RR) (E0) (E1) (E2) (E3) WB
Target Inst.               TF0 TF1 TF2 (discard)
Fall-through Inst.         D0 D1/RR E0 E1 E2 E3 WB
Figure 27. Basic pipeline flow, bcctr Instruction, predict taken, incorrect prediction, instruction buffer not empty

For bcctr and e_bctr cases where the branch is incorrectly predicted as taken, and the fall-through instruction is not already in the instruction buffer (a rare case), 6 cycles are required to execute the branch as shown in Figure 28.

Stage sequence by instruction (the branch row is annotated "resolve condition" in the bracketed stages):
BR Inst. (predict taken)   IF0 IF1 IF2 D0 (D1/RR) (E0) (E1) (E2) (E3) WB
Target Inst.               TF0 TF1 TF2 (discard)
Fall-through Inst.         IF0 IF1 IF2 D0 D1/RR E0 E1 E2 E3
Figure 28. Basic pipeline flow, bcctr Instruction, predict taken, incorrect prediction, instruction buffer empty

4.3.6 Basic multi-cycle instruction pipeline operation

Most multi-cycle instructions may be pipelined so that the effective execution time is smaller than the overall number of clocks spent in execution.
The restrictions to this execution overlap are that no data dependencies between the instructions are present, and that instructions must complete and write back results in order. A single-cycle instruction that follows a multi-cycle instruction must wait for completion of the multi-cycle instruction prior to its writeback in order to meet the in-order requirement. Result feed-forward paths are provided so that execution may continue prior to result writeback.

Stage sequence by instruction:
1st Inst. (multiply)                       IF0 IF1 IF2 D0 D1/RR E0 E1 E2 E3 WB
2nd Inst(s). (single cycle)                IF0 IF1 IF2 D0 D1/RR E0 FF FF FF WB
3rd Inst(s). (single cycle)                IF0 IF1 IF2 D0 D1/RR E0 FF FF FF WB
4th Inst(s). (single cycle, dep on mul)    IF0 IF1 IF2 D0 D1/RR E0 FF FF FF WB
Figure 29. Basic pipeline flow, integer multiply class instructions

The divide and load and store multiple instructions require multiple cycles in the execute stage.

Stage sequence by instruction:
long inst.                  IF0 IF1 IF2 D0 D1/RR E0 E1 E2 E3 .... Elast WB
next inst. (single cycle)   IF0 IF1 IF2 D0 D1/RR — — — — — E0 FF FF FF
Figure 30. Basic pipeline flow, long instruction

4.3.7 Additional examples of instruction pipeline operation for load and store

Figure 31 shows an example of pipelining two non-data-dependent load or store instructions with a following load target data-dependent single cycle instruction. While the first load or store begins accessing memory in the M0 stage, the next load can be calculating a new effective address in the D1/EA stage. The add in this example will stall for two cycles since a data dependency exists on the target register of the second load.

Stage sequence by instruction:
1st LD Inst.                            IF0 IF1 IF2 D0 D1/RR/EA M0 M1 M2 FF WB
2nd LD/ST Inst.                         IF0 IF1 IF2 D0 D1/RR/EA M0 M1 M2 FF WB
3rd Inst. (add, depends on 2nd load)    IF0 IF1 IF2 D0 D1/RR — — E0 FF FF FF WB
Figure 31. Pipelined load instructions with load target data dependency

Figure 32 shows an example of pipelining a data-dependent add instruction following a load with update instruction. While the first load begins accessing memory in the M0 stage, the next load with update can be calculating a new effective address in the EA Calc stage. Following the EA Calc, the updated base register value can be fed-forward to subsequent instructions. The add in this example will not stall, even though a data dependency exists on the updated base register of the load with update.

Stage sequence by instruction:
1st Inst. (load)                        IF0 IF1 IF2 D0 D1/RR/EA M0 M1 M2 FF WB
2nd Inst. (load w/update)               IF0 IF1 IF2 D0 D1/RR/EA M0 M1 M2 FF WB
3rd Inst. (add, depends on 2nd load)    IF0 IF1 IF2 D0 D1/RR E0 FF FF FF WB
Figure 32. Pipelined instructions with base register update data dependency

Figure 33 shows an example of pipelining a data-dependent store instruction following a load instruction. While the first load begins accessing memory in the M0 stage, the store can be calculating a new effective address in the D1/EA stage. The store in this example will not stall, even though its store data depends on the data returned by the load.

Stage sequence by instruction:
1st Inst. (load)                          IF0 IF1 IF2 D0 D1/RR/EA M0 M1 M2 FF WB
2nd Inst. (store, data depends on load)   IF0 IF1 IF2 D0 D1/RR/EA M0 M1 M2 FF WB
Figure 33. Pipelined store instruction with store data dependency

4.3.8 Move to/from SPR instruction pipeline operation

Many mtspr and mfspr instructions are treated like single cycle instructions in the pipeline, and do not cause stalls. The exceptions are accesses to the MSR, the Debug SPRs, the SPE Unit, and the Cache/MMU SPRs, which do cause stalls. Figure 34 through Figure 36 show examples of mtspr and mfspr instruction timing.

Figure 34 applies to the Debug SPRs and the SPE APU’s SPEFSCR.
These instructions do not begin execution until all previous instructions have finished their execute stage(s). In addition, execution of subsequent instructions is stalled until the mfspr and mtspr instructions complete.

Stage sequence by instruction:
Prev Inst.                      IF0 IF1 IF2 D0 D1/RR/EA E0 E1 E2 E3 WB
mtspr, mfspr debug, SPE Inst.   IF0 IF1 IF2 D0 D1/RR — — — E0 E1 E2 E3 WB
Next Inst.                      IF0 IF1 IF2 D0 D1/RR — — — — — — E0 E1
Figure 34. mtspr, mfspr instruction execution, debug and SPE SPRs

Figure 35 applies to the mtmsr instruction and the wrtee and wrteei instructions. Execution of subsequent instructions is stalled until the cycle after these instructions writeback.

Stage sequence by instruction:
Prev Inst.                   IF0 IF1 IF2 D0 D1/RR/EA E0 E1 E2 E3 WB
mtmsr, wrtee, wrteei Inst.   IF0 IF1 IF2 D0 D1/RR E0 E1 E2 E3 WB
Next Inst.                   IF0 IF1 IF2 D0 D1/RR — — — — — E0 E1 E2
Figure 35. mtmsr, wrtee[i] instruction execution

Access to cache and MMU SPRs is stalled until all outstanding bus accesses have completed on both interfaces and the caches and MMU are idle (p_[d,i]_cmbusy negated) to allow an access window where no translations or cache cycles are required. Figure 36 shows an example where an outstanding bus access causes mtspr/mfspr execution to be delayed until the bus becomes idle. Other situations such as a cache linefill may cause the cache to be busy even when the processor interface is idle (p_[d,i]_tbusy[0]_b is negated). In these cases execution stalls until the cache and MMU are idle as signaled by negation of p_[d,i]_cmbusy.

Processor access requests will be held off during execution of a Cache/MMU SPR instruction. A subsequent access request may be generated the cycle following the last execute stage (i.e. during the WB cycle). This same protocol applies to cache and MMU management instructions (e.g. dcbz, dcbf, etc., tlbre, tlbwe, etc.).

Stage sequence by instruction:
Prev Inst.                      IF0 IF1 IF2 D0 D1/RR/EA E0 E1 E2 E3 WB
mtspr, mfspr debug, SPE Inst.   IF0 IF1 IF2 D0 D1/RR — — — E0 E1 E2 E3 WB
Next Inst.                      IF0 IF1 IF2 D0 D1/RR — — — — — — E0 E1
Signals shown: p_rd_spr, p_wr_spr; p_[d,i]_treq_b; p_[d,i]_tbusy[0]_b; p_[d,i]_ta_b; p_[d,i]_cmbusy
Figure 36. Cache / MMU mtspr, mfspr and management instruction execution

4.4 Control hazards

Several internal control hazards exist in Zen that can cause certain instruction sequences to incur one or more stall cycles. These include:
• mfspr instruction preceded by a mtspr instruction: issue stalls until the mtspr completes

4.5 Instruction serialization

There are three types of serialization required by the core:
• Completion serialization
• Dispatch (Decode/Issue) serialization
• Refetch serialization

4.5.1 Completion serialization

A completion serialized instruction is held for execution until all prior instructions have completed. The instruction will then execute once it is next to complete in program order. Results from these instructions will not be available for or forwarded to subsequent instructions until the instruction completes. Instructions that are completion serialized are:
• Instructions that access or modify system control or status registers, e.g. mcrxr, mtmsr, wrtee, wrteei, mtspr, mfspr (except to CTR/LR)
• Instructions that manage caches and TLBs
• Instructions defined by the architecture as context or execution synchronizing: isync, se_isync, msync, rfi, rfci, rfdi, rfmci, se_rfi, se_rfci, se_rfdi, se_rfmci, sc, se_sc
• wait

4.5.2 Dispatch serialization

Some instructions are dispatch-serialized by the core. An instruction that is dispatch-serialized prevents the next instruction from decoding until all instructions up to and including the dispatch-serialized instruction complete.

Instructions that are dispatch serialized are isync, se_isync, msync, rfi, rfci, rfdi, rfmci, se_rfi, se_rfci, se_rfdi, se_rfmci, sc, se_sc.
The mbar instruction is “pseudo-dispatch” serialized; it prevents the next instruction from decoding until all previous load and store class instructions have completed.

4.5.3 Refetch serialization

Refetch serialized instructions inhibit dispatching of subsequent instructions and force a pipeline refill to refetch subsequent instructions after completion. These include:
• The context synchronizing instructions isync, se_isync.
• The rfi, rfci, rfdi, rfmci, se_rfi, se_rfci, se_rfdi, se_rfmci, sc, se_sc instructions.

Figure 39 shows

[Timing diagram over time slots 1 to 11. Rows: single cycle instructions in execution, with later instructions aborted or stalled; the final sample point of p_extint_b; p_iack; and the first instruction of the handler (IF0 IF1 IF2 D0 D1/RR EX0 WB).]
Figure 37. Interrupt recognition and handler instruction execution

[Timing diagram over time slots 1 to 11. Rows: load/store instructions in the memory stages with wait cycles, with later instructions aborted or stalled; the final sample point of p_extint_b; p_iack; and the first instruction of the handler.]
Figure 38. Interrupt recognition and handler instruction execution, load/store in progress

[Timing diagram over time slots 1 to 10. Rows: a multi-cycle interruptible instruction (D1 E0 E1, then aborted); the next instruction (D1, (E0), then aborted); the final sample point of p_extint_b; p_iack; and the first instruction of the handler.]
Figure 39. Interrupt recognition and handler instruction execution, multi-cycle instruction abort

4.6 Concurrent instruction execution

The core effectively has several execution units:
• Branch unit
• Dual scalar integer units
• Dual vector integer units
• Dual scalar Embedded Floating-point units / Single vector Embedded Floating-point unit
• Load/store unit

These execution units are pipelined and support overlapped execution of instructions.
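The pairing rules of Table 22 can be expressed as a small lookup over these unit classes. This is a minimal sketch under stated assumptions: it takes the marked cells of Table 22 to mean that two classes may issue together and the dashes to mean that they may not, it does not model the table's footnotes (divide and vector MAC/multiply restrictions), and the names `CLASSES`, `NOT_PAIRED`, and `can_dual_issue` are hypothetical.

```python
# Instruction classes, named as in Table 22.
CLASSES = (
    "branch", "load/store", "scalar integer", "scalar float",
    "vector integer", "vector float", "special",
)

# Unordered class pairs assumed NOT to issue together, in addition to
# anything involving the "special" class. A one-element frozenset
# stands for a class paired with itself.
NOT_PAIRED = {
    frozenset(["branch"]),                        # one branch unit
    frozenset(["load/store"]),                    # one load/store unit
    frozenset(["vector float"]),                  # one vector float unit
    frozenset(["scalar float", "vector float"]),
}


def can_dual_issue(first, second):
    """Return True if the two classes may occupy both issue slots.

    Only the class-level pairing is checked; data dependencies and the
    footnoted divide/multiply exclusions are not considered.
    """
    for name in (first, second):
        if name not in CLASSES:
            raise ValueError(f"unknown instruction class: {name}")
    if "special" in (first, second):
        return False
    return frozenset([first, second]) not in NOT_PAIRED
```

For example, `can_dual_issue("load/store", "scalar integer")` is true, while `can_dual_issue("load/store", "load/store")` is false, matching the single load/store unit listed above.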
In certain cases, the branch unit predicts branches and supplies a speculative instruction stream to the instruction buffer unit. The following instruction timing section indicates the number of cycles an instruction executes in the appropriate unit; however, determining the elapsed time or cycles to execute a sequence of instructions is beyond the scope of this document.

4.7 Instruction Timings

Instruction timing in number of processor clock cycles for various instruction classes is shown in Table 24. Pipelined instructions are shown with cycles of total latency and throughput cycles. Divide instructions are not pipelined and block other instructions from executing during divide execution. Timing for SPE instructions is detailed in Section 6.6, SPE instruction timing.

Load/store multiple instruction cycles are represented as a fixed number of cycles plus a variable number of cycles, where ‘n’ is the number of words accessed by the instruction. In addition, cycle times marked with a ‘&’ require a variable number of additional cycles due to serialization.

Table 24. Instruction class cycle counts

Class of Instructions                              Latency         Throughput      Special notes
integer: add, sub, shift, rotate, logical, cntlzw  1               1               -
integer: compare                                   1               1               -
Branch                                             6/4/1           6/4/1           Correct branch lookahead allows single cycle execution. Worst-case mispredicted branch is 6 cycles.
multiply                                           3/4             1               Result data is available after 3 cycles; record form conditions are available after the 4th cycle.
divide                                             4-15            4-15            Data-dependent timing
CR logical                                         1               1               -
loads (non-multiple)                               3               1               -
load multiple                                      3 + n/2 (max)   1 + n/2 (max)   Actual timing depends on n and address alignment.
stores (non-multiple)                              3               1               -
store multiple                                     3 + n/2 (max)   1 + n/2 (max)   Actual timing depends on n and address alignment.
mtmsr, wrtee, wrteei                               6&              6               -
mcrf                                               1               1               -
mfspr, mtspr                                       4&              4&              Applies to Debug SPRs, optional unit SPRs
mfspr, mfmsr                                       1               1               Applies to internal, non-Debug SPRs
mfcr, mtcr                                         1               1               -
rfi, rfci, rfdi, rfmci                             6               -               -
sc                                                 4               -               -
tw, twi                                            4               -               Trap taken timing

Detailed timing for each instruction mnemonic along with serialization requirements is shown in Table 25.

Table 25. Instruction Timing by Mnemonic

Mnemonic                            Latency       Serialization
add[o][.]                           1             none
addc[o][.]                          1             none
adde[o][.]                          1             none
addi                                1             none
addic[.]                            1             none
addis                               1             none
addme[o][.]                         1             none
addze[o][.]                         1             none
and[.]                              1             none
andc[.]                             1             none
andi.                               1             none
andis.                              1             none
b[l][a]                             6/4/1         none
bc[l][a]                            6/4/1         none
bcctr[l]                            6/5/3/1       none
bclr[l]                             6/5/3/1       none
cmp                                 1             none
cmpi                                1             none
cmpl                                1             none
cmpli                               1             none
cntlzw[.]                           1             none
crand                               1             none
crandc                              1             none
creqv                               1             none
crnand                              1             none
crnor                               1             none
cror                                1             none
crorc                               1             none
crxor                               1             none
divw[o][.]                          4-15¹         none
divwu[o][.]                         4-15¹         none
eqv[.]                              1             none
extsb[.]                            1             none
extsh[.]                            1             none
isel                                1             none
isync                               6²            refetch
lbarx                               3             none
lbz                                 3³            none
lbzu                                3³            none
lbzux                               3³            none
lbzx                                3³            none
lha                                 3³            none
lharx                               3             none
lhau                                3³            none
lhaux                               3³            none
lhax                                3³            none
lhbrx                               3³            none
lhz                                 3³            none
lhzu                                3³            none
lhzux                               3³            none
lhzx                                3³            none
lmw                                 3 + (n/2)     none
lwarx                               3             none
lwbrx                               3³            none
lwz                                 3³            none
lwzu                                3³            none
lwzux                               3³            none
lwzx                                3³            none
mbar                                1²            pseudo-dispatch
mcrf                                1             none
mcrxr                               1             completion
mfcr                                1             none
mfmsr                               1             none
mfspr (except DEBUG)                1             none
mfspr (DEBUG)                       3²            completion
msync                               1²            completion
mtcrf                               2             none
mtmsr                               6²            completion
mtspr (DEBUG)                       4²            completion
mtspr (except DEBUG, msr, hid0/1)   1             none
mulhw[.]                            3/4           none
mulhwu[.]                           3/4           none
mulli                               3/4           none
mullw[o][.]                         3/4           none
nand[.]                             1             none
neg[o][.]                           1             none
nop (ori r0,r0,0)                   1             none
nor[.]                              1             none
or[.]                               1             none
orc[.]                              1             none
ori                                 1             none
oris                                1             none
rfci                                6             refetch
rfdi                                6             refetch
rfi                                 6             refetch
rfmci                               6             refetch
rlwimi[.]                           1             none
rlwinm[.]                           1             none
rlwnm[.]                            1             none
sc                                  4             refetch
slw[.]                              1             none
sraw[.]                             1             none
srawi[.]                            1             none
srw[.]                              1             none
stb                                 3³            none
stbcx.                              3             none
stbu                                3³            none
stbux                               3³            none
stbx                                3³            none
sth                                 3³            none
sthbrx                              3³            none
sthcx.                              3             none
sthu                                3³            none
sthux                               3³            none
sthx                                3³            none
stmw                                3 + (n/2)     none
stw                                 3³            none
stwbrx                              3³            none
stwcx.                              3             none
stwu                                3³            none
stwux                               3³            none
stwx                                3³            none
subf[o][.]                          1             none
subfc[o][.]                         1             none
subfe[o][.]                         1             none
subfic                              1             none
subfme[o][.]                        1             none
subfze[o][.]                        1             none
tw                                  4             none
twi                                 4             none
wrtee                               6             completion
wrteei                              6             completion
xor[.]                              1             none
xori                                1             none
xoris                               1             none

NOTES:
1 With early-out capability, timing is data dependent
2 Plus additional synchronization time
3 Aligned

4.8 Operand placement on performance

The placement (location and alignment) of operands in memory affects the relative performance of memory accesses, in some cases significantly. Table 26 indicates the effects for the Zen core.

In Table 26, optimal means that one effective address (EA) calculation occurs during the memory operation. Good means that multiple EA calculations occur during the memory operation, which may cause additional bus activities with multiple bus transfers. Poor means that an alignment interrupt is generated by the storage operation.

Table 26. Performance effects of storage operand placement

Operand                         Boundary crossing*
Size         Byte alignment     None         Cache line   Protection boundary
4 Byte       4                  optimal¹     -            -
4 Byte       <4                 good²        good         good
2 Byte       2                  optimal      -            -
2 Byte       <2                 good         good         good
1 Byte       1                  optimal      -            -
lmw, stmw    4                  good         good         good
lmw, stmw    <4                 poor³        poor         poor
string       N/A                -            -            -

NOTES:
1 optimal: One EA calculation occurs.
2 good: Multiple EA calculations occur, which may cause additional bus activities with multiple bus transfers.
3 poor: Alignment Interrupt occurs.

Chapter 5 Embedded Floating-Point APU (EFPU2)

This chapter describes the instruction set architecture of the Embedded Floating-point APU version 2 (EFPU2) implemented on e200z759n3. This unit implements scalar and vector single-precision floating-point instructions to accelerate signal processing and other algorithms. In comparison to version 1.1 of the EFPU architecture, version 2 of the architecture implements additional operations such as minimum, maximum, and square root, as well as an extensive set of vector operations with permuted operands and mixed add/sub, sum, and differences. For the remainder of this chapter, the term EFPU implies version 2 of the architecture unless otherwise noted.

5.1 Nomenclature and conventions

Several conventions regarding nomenclature are used in this chapter:
• Bits 0 to 31 of a 64-bit register are referenced as field 0, upper half, or high-order element of the register. Bits 32 to 63 are referred to as field 1, lower half, or lower-order element of the register. Each half is an element of a GPR.
• Mnemonics for EFPU instructions begin with the letters ‘evfs’ (embedded vector floating single) or ‘efs’ (embedded (scalar) floating single).

5.2 EFPU programming model

The e200z759n3 core provides a register file with thirty-two 64-bit registers. The Power Architecture 32-bit Book E instructions operate on the lower (least significant) 32 bits of the 64-bit register. EFPU instructions view the 64-bit register either as a vector of two 32-bit elements or as a single scalar 32-bit element. Vector floating-point instructions operate on a vector of two 32-bit single-precision floating-point numbers resident in the 64-bit GPRs.
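The element numbering in Section 5.1 can be illustrated with a small host-side model. This is only a sketch: the helper names (`high_element`, `low_element`, `write_low`, `write_vector`) are hypothetical and not part of any Freescale tool, and a GPR is modeled as a plain 64-bit integer in which architectural bit 0 is the most significant bit.

```python
MASK32 = 0xFFFFFFFF


def high_element(gpr):
    """Return bits 0:31 of the 64-bit GPR (field 0, upper half)."""
    return (gpr >> 32) & MASK32


def low_element(gpr):
    """Return bits 32:63 of the 64-bit GPR (field 1, lower half)."""
    return gpr & MASK32


def write_low(gpr, value):
    """Model a scalar or 32-bit Book E result: only the lower half changes."""
    return (gpr & (MASK32 << 32)) | (value & MASK32)


def write_vector(high, low):
    """Model a vector result: both 32-bit elements are written."""
    return ((high & MASK32) << 32) | (low & MASK32)
```

For instance, `write_vector(0x3F800000, 0x40000000)` models a GPR holding the single-precision encodings of 1.0 in the high element and 2.0 in the low element.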
Scalar single-precision floating-point instructions operate on the lower half of GPRs. The floating-point instructions do not have a separate register file; there is a single shared register file for all instructions.

There are no record forms of EFPU instructions. EFPU compare instructions store the result of the comparison into the condition register (CR). The meaning of the CR bits is overloaded for the vector operations. Floating-point compare instructions treat NaNs, Infinity and Denorm as normalized numbers for the comparison calculation when default results are provided.

5.2.1 Signal Processing Extension / Embedded Floating-point Status and Control Register (SPEFSCR)

Status and control for embedded floating-point uses the SPEFSCR register. This register is also used by the SPE APU. Status and control bits are shared for vector floating-point operations, scalar floating-point operations and SPE vector operations. The SPEFSCR register is implemented as special purpose register (SPR) number 512 and is read and written by the mfspr and mtspr instructions. The SPEFSCR is shown in Figure 5-1.

Bits 0:15:   SOVH  OVH  FGH  FXH  FINVH  FDBZH  FUNFH  FOVFH  0  0  FINXS  FINVS  FDBZS  FUNFS  FOVFS  MODE
Bits 16:31:  SOV  OV  FG  FX  FINV  FDBZ  FUNF  FOVF  0  FINXE  FINVE  FDBZE  FUNFE  FOVFE  FRMC (30:31)
SPR - 512; Read/Write; Reset - 0x0
Figure 5-1. SPE/EFPU Status and Control Register (SPEFSCR)

The SPEFSCR bits are defined in Table 5-1.

Table 5-1. SPEFSCR field descriptions

Bits            Name    Description
0 (32)          SOVH    Summary Integer Overflow High. Defined by SPE.
1 (33)          OVH     Integer Overflow High. Defined by SPE.
2 (34)          FGH     Embedded Floating-point Guard bit High. FGH is supplied for use by the Floating-point Round exception handler. FGH is zeroed if a Floating-point Data Exception occurs for the high element(s). FGH corresponds to the high element result. FGH is cleared by a scalar floating-point instruction.
3 (35)          FXH     Embedded Floating-point Sticky bit High. FXH is supplied for use by the Floating-point Round exception handler. FXH is zeroed if a Floating-point Data Exception occurs for the high element(s). FXH corresponds to the high element result. FXH is cleared by a scalar floating-point instruction.
4 (36)          FINVH   Embedded Floating-point Invalid Operation / Input error High. In mode 0, the FINVH bit is set to 1 if the A or B high element operand of a floating-point instruction is Infinity, NaN, or Denorm, or if the operation is a divide and the high element dividend and divisor are both 0. In mode 1, the FINVH bit is set on an IEEE754 invalid operation (IEEE754-1985 sec 7.1) in the high element. FINVH is cleared by a scalar floating-point instruction.
5 (37)          FDBZH   Embedded Floating-point Divide by Zero High. The FDBZH bit is set to 1 when a floating-point divide instruction is executed with a high element divisor of 0 and the high element dividend is a finite non-zero number. FDBZH is cleared by a scalar floating-point instruction.
6 (38)          FUNFH   Embedded Floating-point Underflow High. The FUNFH bit is set to 1 when the execution of a floating-point instruction results in an underflow in the high element. FUNFH is cleared by a scalar floating-point instruction.
7 (39)          FOVFH   Embedded Floating-point Overflow High. The FOVFH bit is set to 1 when the execution of a floating-point instruction results in an overflow in the high element. FOVFH is cleared by a scalar floating-point instruction.
8:9 (40:41)     -       Reserved
10 (42)         FINXS   Embedded Floating-point Inexact Sticky Flag. The FINXS bit is set to 1 whenever the execution of a floating-point instruction delivers an inexact result for either the low or high element and no Floating-point Data exception is taken for either element, or if a floating-point instruction results in overflow (FOVF=1 or FOVFH=1) but Floating-point Overflow exceptions are disabled (FOVFE=0), or if a floating-point instruction results in underflow (FUNF=1 or FUNFH=1) but Floating-point Underflow exceptions are disabled (FUNFE=0), and no Floating-point Data exception occurs. The FINXS bit remains set until it is cleared by a mtspr instruction specifying the SPEFSCR register.
11 (43)         FINVS   Embedded Floating-point Invalid Operation Sticky Flag. The FINVS bit is set to 1 when a floating-point instruction sets the FINVH or FINV bit to 1. The FINVS bit remains set until it is cleared by a mtspr instruction specifying the SPEFSCR register.
12 (44)         FDBZS   Embedded Floating-point Divide by Zero Sticky Flag. The FDBZS bit is set to 1 when a floating-point divide instruction sets the FDBZH or FDBZ bit to 1. The FDBZS bit remains set until it is cleared by a mtspr instruction specifying the SPEFSCR register.
13 (45)         FUNFS   Embedded Floating-point Underflow Sticky Flag. The FUNFS bit is set to 1 when a floating-point instruction sets the FUNFH or FUNF bit to 1. The FUNFS bit remains set until it is cleared by a mtspr instruction specifying the SPEFSCR register.
14 (46)         FOVFS   Embedded Floating-point Overflow Sticky Flag. The FOVFS bit is set to 1 when a floating-point instruction sets the FOVFH or FOVF bit to 1. The FOVFS bit remains set until it is cleared by a mtspr instruction specifying the SPEFSCR register.
15 (47)         MODE    Embedded Floating-point Operating Mode.
                          0 Default hardware results operating mode
                          1 IEEE754 hardware results operating mode (not supported by Zen)
                        This bit controls the operating mode of the EFPU. Zen supports only mode 0. Software should read the value of this bit after writing it to determine if the implementation supports the selected mode. Implementations return the value written if the selected mode is supported; otherwise the value read indicates the hardware supported mode.
16 (48)         SOV     Summary integer overflow. Defined by SPE.
17 (49)         OV      Integer overflow. Defined by SPE.
18 (50)         FG      Embedded Floating-point Guard bit. FG is supplied for use by the Floating-point Round exception handler. FG is zeroed if a Floating-point Data Exception occurs for the low element(s). FG corresponds to the low element result.
19 (51)         FX      Embedded Floating-point Sticky bit. FX is supplied for use by the Floating-point Round exception handler. FX is zeroed if a Floating-point Data Exception occurs for the low element(s). FX corresponds to the low element result.
20 (52)         FINV    Embedded Floating-point Invalid Operation / Input error. In mode 0, the FINV bit is set to 1 if the A or B low element operand of a floating-point instruction is Infinity, NaN, or Denorm, or if the operation is a divide and the low element dividend and divisor are both 0. In mode 1, the FINV bit is set on an IEEE754 invalid operation (IEEE754-1985 sec 7.1) in the low element.
21 (53)         FDBZ    Embedded Floating-point Divide by Zero. The FDBZ bit is set to 1 when a floating-point divide instruction is executed with a low element divisor of 0 and the low element dividend is a finite non-zero number.
22 (54)         FUNF    Embedded Floating-point Underflow. The FUNF bit is set to 1 when the execution of a floating-point instruction results in an underflow in the low element.
23 (55)         FOVF    Embedded Floating-point Overflow. The FOVF bit is set to 1 when the execution of a floating-point instruction results in an overflow in the low element.
24 (56)         -       Reserved
25 (57)         FINXE   Embedded Floating-point Inexact Exception Enable.
                          0 Exception disabled
                          1 Exception enabled
                        If the exception is enabled, a Floating-point Round exception is taken if, for both elements, the result of a floating-point instruction does not overflow or underflow and the result for either element is inexact (FG | FX = 1, or FGH | FXH = 1), or if the result overflows for either element (FOVF=1 or FOVFH=1) but Floating-point Overflow exceptions are disabled (FOVFE=0), or if the result underflows (FUNF=1 or FUNFH=1) but Floating-point Underflow exceptions are disabled (FUNFE=0), and no Floating-point Data exception occurs.
26 (58)         FINVE   Embedded Floating-point Invalid Operation / Input Error Exception Enable.
                          0 Exception disabled
                          1 Exception enabled
                        If the exception is enabled, a Floating-point Data exception is taken if the FINV or FINVH bit is set by a floating-point instruction.
27 (59)         FDBZE   Embedded Floating-point Divide by Zero Exception Enable.
                          0 Exception disabled
                          1 Exception enabled
                        If the exception is enabled, a Floating-point Data exception is taken if the FDBZ or FDBZH bit is set by a floating-point instruction.
28 (60)         FUNFE   Embedded Floating-point Underflow Exception Enable.
                          0 Exception disabled
                          1 Exception enabled
                        If the exception is enabled, a Floating-point Data exception is taken if the FUNF or FUNFH bit is set by a floating-point instruction.
29 (61)         FOVFE   Embedded Floating-point Overflow Exception Enable.
                          0 Exception disabled
                          1 Exception enabled
                        If the exception is enabled, a Floating-point Data exception is taken if the FOVF or FOVFH bit is set by a floating-point instruction.
30:31 (62:63)   FRMC    Embedded Floating-point Rounding Mode Control.
                          00 Round to Nearest
                          01 Round toward Zero
                          10 Round toward +Infinity
                          11 Round toward -Infinity

5.2.2 GPRs and PowerISA 2.06 instructions

The e200z759n3 core implements the 32-bit forms of the Book E instructions. All 32-bit PowerISA 2.06 instructions operate upon the lower half of the 64-bit GPR. These instructions do not affect the upper half of a GPR.

5.2.3 SPE/EFPU available bit in MSR

MSR[SPE] is defined as the SPE/EFPU available bit. If this bit is clear and software attempts to execute any of the EFPU vector instructions (evfsxxx) that affect the upper 32 bits of a GPR, the EFPU APU Unavailable exception is taken. If this bit is set, software can execute any of the EFPU instructions.

5.2.4 Embedded floating-point exception bit in ESR

ESR[SPE] is defined as the SPE/EFPU exception bit. This bit is set whenever the processor takes an exception related to the execution of a SPE APU instruction. This bit is also set whenever the processor takes an interrupt related to the execution of the embedded floating-point instructions. (Note that the same bit is used for SPE APU exceptions. Thus, SPE and embedded floating-point interrupts are indistinguishable in the ESR.)

5.2.5 EFPU exceptions

The architecture defines the following Embedded Floating-point APU exceptions:
• SPE/EFPU Unavailable exception
• EFPU Floating-point Data exception
• EFPU Floating-point Round exception
Three new interrupt vector offset registers (IVORs), IVOR32, IVOR33, and IVOR34, are used by the exception model. The SPR number for IVOR32 is 528, for IVOR33 it is 529, and for IVOR34 it is 530. These registers are privileged.
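As a reading aid for Table 5-1 and the exception descriptions that follow, the sketch below decodes a 32-bit SPEFSCR value into its named flags and rounding mode. It assumes the manual's numbering, in which bit 0 is the most significant bit of the 32-bit SPR. The names `SPEFSCR_BITS`, `mask` and `decode_spefscr` are illustrative only and do not come from any vendor header.

```python
# Bit positions follow the manual's numbering: bit 0 is the most significant
# bit of the 32-bit SPR, bit 31 the least significant.
SPEFSCR_BITS = {
    "SOVH": 0, "OVH": 1, "FGH": 2, "FXH": 3, "FINVH": 4, "FDBZH": 5,
    "FUNFH": 6, "FOVFH": 7, "FINXS": 10, "FINVS": 11, "FDBZS": 12,
    "FUNFS": 13, "FOVFS": 14, "MODE": 15, "SOV": 16, "OV": 17, "FG": 18,
    "FX": 19, "FINV": 20, "FDBZ": 21, "FUNF": 22, "FOVF": 23,
    "FINXE": 25, "FINVE": 26, "FDBZE": 27, "FUNFE": 28, "FOVFE": 29,
}

# FRMC occupies bits 30:31, the two least significant bits.
ROUNDING_MODES = {
    0b00: "nearest",
    0b01: "toward zero",
    0b10: "toward +infinity",
    0b11: "toward -infinity",
}


def mask(bit):
    """Convert a big-endian bit number (0..31) into an integer mask."""
    return 1 << (31 - bit)


def decode_spefscr(value):
    """Return (set of flag names that are 1, rounding mode name)."""
    flags = {name for name, bit in SPEFSCR_BITS.items() if value & mask(bit)}
    return flags, ROUNDING_MODES[value & 0b11]
```

With this numbering the five exception enables FINXE, FINVE, FDBZE, FUNFE and FOVFE correspond to masks 0x40, 0x20, 0x10, 0x08 and 0x04.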
5.2.5.1 EFPU unavailable exception

The EFPU Unavailable exception is taken if MSR[SPE] is cleared and execution of an EFPU vector instruction (evfsxxx) is attempted. When the EFPU Unavailable exception occurs, the processor suppresses execution of the instruction causing the exception. The SRR0, SRR1, MSR, and ESR registers are modified as follows:
• SRR0 is set to the effective address of the instruction causing the exception.
• SRR1 is set to the contents of the MSR at the time of the exception.
• MSR bits CE, ME and DE are unchanged. All other bits are cleared.
• The ESR[SPE] bit is set. All other ESR bits are cleared.
Instruction execution resumes at address IVPR[0:15] || IVOR32[16:27] || 0b0000.

5.2.5.2 Embedded floating-point data exception

The embedded floating-point data exception vector is used for enabled floating-point invalid operation/input error, underflow, overflow, and divide by zero exceptions (collectively called floating-point data exceptions). When one of these enabled floating-point exceptions occurs, the processor suppresses execution of the instruction causing the exception. The SRR0, SRR1, MSR, ESR and SPEFSCR registers are modified as follows:
• SRR0 is set to the effective address of the instruction causing the exception.
• SRR1 is set to the contents of the MSR at the time of the exception.
• MSR bits CE, ME and DE are unchanged. All other bits are cleared.
• The ESR[SPE] bit is set. All other ESR bits are cleared.
• One or more SPEFSCR status bits are set to indicate the type of exception. The affected bits are FINVH, FINV, FDBZH, FDBZ, FOVFH, FOVF, FUNFH, and FUNF. SPEFSCR[FG, FGH, FX, FXH] are cleared.
Instruction execution resumes at address IVPR[0:15] || IVOR33[16:27] || 0b0000.
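The handler address concatenation used above can be sketched as plain mask arithmetic. In this sketch IVPR and the IVOR are modeled as 32-bit integers with bit 0 as the most significant bit, so bits 0:15 are the upper halfword and bits 16:27 are the twelve bits above the low nibble. The function name `handler_address` and the register values in the usage note are made up for illustration.

```python
def handler_address(ivpr, ivor):
    """Form IVPR[0:15] || IVORn[16:27] || 0b0000 as a 32-bit address."""
    prefix = ivpr & 0xFFFF0000   # IVPR bits 0:15
    offset = ivor & 0x0000FFF0   # IVOR bits 16:27; low four bits forced to 0
    return prefix | offset
```

For example, a hypothetical IVPR of 0x40001234 with IVOR33 set to 0x00000ABC would give a handler address of 0x40000AB0: the low halfword of IVPR and the low nibble of the IVOR do not contribute.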
5.2.5.3 Embedded floating-point round exception

The embedded floating-point round exception occurs if the SPEFSCR[FINXE] bit is set and either the unrounded result of an operation is not exact, or an overflow occurs and overflow exceptions are disabled (FOVF or FOVFH set with FOVFE cleared), or an underflow occurs and underflow exceptions are disabled (FUNF or FUNFH set with FUNFE cleared), and no floating-point data exception is taken. The embedded floating-point round exception will not occur if an enabled embedded floating-point data exception occurs.

When the embedded floating-point round exception occurs, the unrounded (truncated) result of an inexact high or low element is placed in the target register. If only a single element is inexact, the other exact element is updated with the correctly rounded result, and the FG and FX bits corresponding to that exact element are both ‘0’.

The bits FG and FX are provided so that an exception handler can round the result as it desires. FG (called the ‘guard’ bit) is the value of the bit immediately to the right of the lsb of the destination format mantissa from the infinitely precise intermediate calculation before rounding. FX (called the ‘sticky’ bit) is the value of the ‘or’ of all the bits to the right of the guard bit (FG) of the destination format mantissa from the infinitely precise intermediate calculation before rounding.

The SRR0, SRR1, MSR, ESR and SPEFSCR registers are modified as follows:
• SRR0 is set to the effective address of the instruction following the instruction causing the exception.
• SRR1 is set to the contents of the MSR at the time of the exception.
• MSR bits CE, ME and DE are unchanged. All other bits are cleared.
• The ESR[SPE] bit is set. All other ESR bits are cleared.
• SPEFSCR[FGH, FG, FXH, FX] are set appropriately. SPEFSCR[FINXS] will be set.
Instruction execution resumes at address IVPR[0:15] || IVOR34[16:27] || 0b0000.
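How a round exception handler might use the guard and sticky bits can be sketched as follows. This is not the manual's handler; it is a minimal model under stated assumptions: the truncated result is a 32-bit single-precision bit pattern, rounding up in magnitude is done by adding 1 to that pattern (which carries into the exponent field when the fraction is all ones), and the case where the increment would leave the normalized range is not handled. The function name `round_truncated` is hypothetical; the rounding mode constants follow the FRMC encoding in Table 5-1.

```python
# FRMC encodings from Table 5-1
RN, RZ, RP, RM = 0b00, 0b01, 0b10, 0b11


def round_truncated(bits, fg, fx, frmc):
    """Round a truncated single-precision bit pattern using guard (fg) and
    sticky (fx) bits, for the rounding mode selected by frmc."""
    sign = (bits >> 31) & 1
    lsb = bits & 1
    inexact = bool(fg or fx)
    if frmc == RN:
        # Round to nearest, ties to even: round up when above half, or
        # exactly half with an odd lsb.
        bump = bool(fg) and bool(fx or lsb)
    elif frmc == RZ:
        bump = False                      # truncation is already correct
    elif frmc == RP:
        bump = inexact and sign == 0      # toward +infinity
    else:
        bump = inexact and sign == 1      # toward -infinity
    return (bits + 1) & 0xFFFFFFFF if bump else bits
```

In round-to-nearest mode, a truncated 0x3F800001 with FG=1 and FX=0 is an exact tie with an odd lsb, so the sketch returns 0x3F800002; the same tie on 0x3F800000 is left unchanged.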
5.2.6 Exception Priorities

The following list shows the priority order in which exceptions are taken:
1. EFPU Unavailable exception
2. EFPU Floating-point Data exception
3. EFPU Floating-point Round exception
An embedded Floating-point Data exception will be taken if either element generates an embedded Floating-point Data exception. An embedded Floating-point Round exception will be taken if either element generates an embedded Floating-point Round exception and neither element generates an EFPU Floating-point Data exception.

5.3 Embedded floating-point APU operations

e200z759n3 implements floating-point instructions that operate upon the contents of a 64-bit register that is a vector of two single-precision floating-point elements. The floating-point unit shares the same register file as the integer unit. There is no separate floating-point register file. Floating-point instructions are also provided to perform scalar single-precision floating-point operations on the low elements of registers, without affecting the high-order portion. The Power Architecture UISA and Book E floating-point instructions are not implemented in e200z759n3.

The Freescale EIS architecture definition for embedded floating-point defines two operating modes: a real-time, ‘default results’ oriented mode (mode 0) and a ‘true IEEE754 results’ operating mode (mode 1). Implementations of the embedded floating-point APU may choose to implement one or both of these modes. The e200z759n3 hardware implements mode 0. IEEE754 compliant operation is still available in mode 0 with the assistance of a software envelope.

5.3.1 Floating-point data formats

The EFPU supports single-precision scalar and single-precision vector floating-point data operations and conversions. In addition, conversions between single-precision floating-point and the half-precision floating-point storage format are supported. These formats are described in the following subsections.

5.3.1.1 Single-precision floating-point format

Each single-precision floating-point data element is 32 bits wide with one sign bit (s), 8 bits of biased exponent (e) and 23 bits of fraction (f). In the IEEE-754 specification, floating-point values are represented in a format consisting of three explicit fields (sign field, biased exponent field, and fraction field) and an implicit hidden bit.

Bit 0       S         sign bit: 0 - positive; 1 - negative
Bits 1:8    exp       biased exponent field (excess 127 notation)
Bits 9:31   fraction  fractional portion of number (preceded by the implicit hidden bit)
Figure 5-2. Single-precision data format

For normalized numbers, the biased exponent value ‘e’ lies in the range of 1 to 254, corresponding to an actual exponent value E in the range -126 to +127, the hidden bit is a ‘1’, and the value of the number is interpreted as

$(-1)^{S} \times 2^{E} \times (1.\mathit{fraction})$

where E is the unbiased exponent and 1.fraction is the significand consisting of a leading ‘1’ (the hidden bit) and a fractional part (fraction field). With this format, the maximum positive normalized number (pmax) is represented by the encoding 0x7F7FFFFF, which is approximately 3.4E+38 ($\approx 2^{128}$), and the minimum positive normalized value (pmin) is represented by the encoding 0x00800000, which is approximately 1.2E-38 ($2^{-126}$).

Two specific values of the biased exponent are reserved, 0 and 255, for encoding the special values ±0, ±∞, NaN, and Denorm.

Zeros of both positive and negative sign are represented by a biased exponent value e of zero and a fraction f that is zero.

Infinities of both positive and negative sign are represented by a biased exponent value of 255 and a fraction that is zero.

Denormalized numbers of both positive and negative sign are represented by a biased exponent value e of 0 and a fraction f that is non-zero. For these numbers, the hidden bit is defined by the IEEE-754 standard to be ‘0’. This number type is not directly supported in hardware.
Instead, either a software exception handler is invoked, or a default value is defined, depending on the operating mode.

Not a Numbers (NaNs) are represented by a biased exponent value e of 255 and a fraction f that is non-zero.

Defining pmax to be the most positive normalized value (farthest from zero), pmin the smallest positive normalized value (closest to zero), nmax the most negative normalized value (farthest from zero) and nmin the smallest normalized negative value (closest to zero), an overflow is said to have occurred if the numerically correct result of an instruction is such that r>pmax or r<nmax. An underflow is said to have occurred if the numerically correct result of an instruction is such that 0<r<pmin or nmin<r<0. In this case, r may be denormalized, or may be smaller than the smallest denormalized number.

If e=255 and f!=0, then the value is a NaN. If e=0 and f=0, then the value is a signed 0.

The EFPU hardware will not produce +Inf, -Inf, NaN, or a Denormalized number. If the result of an instruction overflows and Floating-point Overflow exceptions are disabled (SPEFSCR[FOVFE] bit is cleared), then pmax or nmax is generated as the result of that instruction depending upon the sign of the result. If the result of an instruction underflows and Floating-point Underflow exceptions are disabled (SPEFSCR[FUNFE] bit is cleared), then +0 or -0 is generated as the result of that instruction based upon the sign of the result.

5.3.1.2 Half-precision floating-point format

Half-precision floating-point storage format is supported by the EFPU with conversion operations to and from single-precision floating-point format. No computational operations are defined for half-precision format numbers. Each half-precision floating-point data element is 16 bits wide with one sign bit (s), 5 bits of biased exponent (e) and 10 bits of fraction (f).
In the IEEE-754r proposal, half-precision floating-point values are represented in a format consisting of three explicit fields (sign field, biased exponent field, and fraction field) and an implicit hidden bit.

Bit 0       S         sign bit: 0 - positive; 1 - negative
Bits 1:5    exp       biased exponent field (excess 15 notation)
Bits 6:15   fraction  fractional portion of number (preceded by the implicit hidden bit)
Figure 5-3. Half-precision data format

For normalized numbers, the biased exponent value ‘e’ lies in the range of 1 to 30, corresponding to an actual exponent value E in the range -14 to +15, the hidden bit is a ‘1’, and the value of the number is interpreted as

$(-1)^{S} \times 2^{E} \times (1.\mathit{fraction})$

where E is the unbiased exponent and 1.fraction is the significand consisting of a leading ‘1’ (the hidden bit) and a fractional part (fraction field). With this format, the maximum positive normalized number (pmaxhp) is represented by the encoding 0x7BFF, which is 65504, and the minimum positive normalized value (pminhp) is represented by the encoding 0x0400, which is approximately 6.1E-5 ($2^{-14}$).

Two specific values of the biased exponent are reserved, 0 and 31, for encoding the special values ±0, ±∞, NaN, and Denorm.

Zeros of both positive and negative sign are represented by a biased exponent value e of zero and a fraction f that is zero.

Infinities of both positive and negative sign are represented by a biased exponent value of 31 and a fraction that is zero.

Denormalized numbers of both positive and negative sign are represented by a biased exponent value e of 0 and a fraction f that is non-zero. For these numbers, the hidden bit is defined to be ‘0’.

Not a Numbers (NaNs) are represented by a biased exponent value e of 31 and a fraction f that is non-zero.
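The half-precision value formula and the quoted pmaxhp and pminhp encodings can be checked with a short sketch. It assumes the field layout of Figure 5-3 (sign in the most significant bit, five exponent bits, ten fraction bits) and only handles normalized encodings; the function name `hp_normal_value` is made up for this illustration.

```python
def hp_normal_value(h):
    """Return the value of a normalized half-precision encoding h (16 bits)
    as a Python float, using (-1)^S * 2^(e-15) * 1.fraction."""
    s = (h >> 15) & 1        # sign bit
    e = (h >> 10) & 0x1F     # biased exponent, excess 15
    f = h & 0x3FF            # 10-bit fraction
    if not 1 <= e <= 30:
        raise ValueError("not a normalized half-precision encoding")
    return (-1.0) ** s * 2.0 ** (e - 15) * (1 + f / 1024)
```

Evaluating `hp_normal_value(0x7BFF)` gives 65504.0 and `hp_normal_value(0x0400)` gives 2**-14 (about 6.1E-5), matching the pmaxhp and pminhp values stated above.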
Defining pmaxhp to be the most positive normalized value (farthest from zero), pminhp the smallest positive normalized value (closest to zero), nmaxhp the most negative normalized value (farthest from zero) and nminhp the smallest normalized negative value (closest to zero), an overflow is said to have occurred if the numerically correct result of a conversion is such that r>pmaxhp or r<nmaxhp. An underflow is said to have occurred if the numerically correct result of a conversion is such that 0<r<pminhp or nminhp<r<0. In this case, r may be denormalized, or may be smaller than the smallest denormalized number.

If e=31 and f!=0, then the value is a NaN. If e=0 and f=0, then the value is a signed 0.

The EFPU hardware will not produce +Inf, -Inf, NaN, or a Denormalized number. If the result of a conversion to half-precision format overflows and Floating-point Overflow exceptions are disabled (SPEFSCR[FOVFE] bit is cleared), then pmaxhp or nmaxhp is generated as the result of that instruction depending upon the sign of the result. If the result of a conversion to half-precision format underflows and Floating-point Underflow exceptions are disabled (SPEFSCR[FUNFE] bit is cleared), then +0 or -0 is generated as the result of that instruction based upon the sign of the result.

Conversions from half-precision format to single-precision format are always exact, unless the source operand is a NaN, Inf, or Denorm. In such cases, if Floating-point Invalid Input exceptions are disabled (SPEFSCR[FINVE] bit is cleared), the conversion results in a properly signed max norm or zero default result.

5.3.2 IEEE 754 compliance

The Freescale EIS architecture specifies that the EFPU implements a single-precision floating-point system as defined in ANSI/IEEE Standard 754-1985 but may rely on software support in order to conform fully with the standard.
Thus, whenever an input operand of the floating-point instruction has data values that are +Infinity, -Infinity, Denormalized, NaN, or when the result of an operation produces an overflow or an underflow, an exception may be taken and the exception handler is responsible for delivering IEEE 754 compliant behavior if desired.

When floating-point invalid input exceptions are disabled (SPEFSCR[FINVE] is cleared), default results are provided by the hardware when an Infinity, Denormalized, or NaN input is received, or for the operation 0/0.

When floating-point underflow exceptions are disabled (SPEFSCR[FUNFE] is cleared) and the result of a floating-point operation underflows, a signed zero result is produced. The inexact exception is also signaled for this condition.

When floating-point overflow exceptions are disabled (SPEFSCR[FOVFE] is cleared) and the result of a floating-point operation overflows, a pmax or nmax result is produced. The inexact exception is also signaled for this condition.

An exception enable flag (SPEFSCR[FINXE]) is also provided for generating an exception when an inexact result is produced, to allow a software handler to conform to the IEEE 754 standard. A divide by zero exception enable flag (SPEFSCR[FDBZE]) is also provided for generating an exception when a divide by zero operation is attempted, to allow a software handler to conform to the IEEE 754 standard. All of these exceptions may be disabled, and the hardware will then deliver an appropriate default result.

Overflow and underflow conditions are determined after rounding on Zen implementations.

5.3.3 Floating-point exceptions

See Section 5.2.5, EFPU exceptions.
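The mode 0 operand classes and default results described in Sections 5.3.1.1 and 5.3.2 can be summarized in a short sketch. It is an illustration only: the function names are hypothetical, operands are modeled as 32-bit single-precision bit patterns, and the default underflow result follows the general "signed zero" statement above (individual instruction descriptions may refine the sign of the zero).

```python
PMAX = 0x7F7FFFFF  # largest positive normalized single-precision encoding


def classify_sp(bits):
    """Classify a single-precision encoding by its exponent and fraction."""
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    if e == 0:
        return "zero" if f == 0 else "denorm"
    if e == 255:
        return "infinity" if f == 0 else "nan"
    return "normal"


def sets_finv(bits):
    """In mode 0, an Infinity, NaN, or Denorm operand sets FINV (or FINVH)."""
    return classify_sp(bits) in ("infinity", "nan", "denorm")


def default_overflow(sign):
    """Default result with overflow exceptions disabled: pmax or nmax."""
    return (sign << 31) | PMAX


def default_underflow(sign):
    """Default result with underflow exceptions disabled: signed zero."""
    return sign << 31
```

For example, `classify_sp(0x00000001)` reports a denormalized operand, so `sets_finv` is true for it, while `default_overflow(1)` yields the nmax encoding 0xFF7FFFFF.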
5.3.4 Embedded scalar single-precision floating-point instructions

In the following instruction descriptions, “sa” is the sign of operand A, “ea” is the biased exponent value of operand A, “sb” is the sign of operand B, “eb” is the biased exponent value of operand B, “ei” is an intermediate exponent value, and “r” is a result value.

efsabs

Floating-Point Single-Precision Absolute Value

efsabs rD,rA

Encoding: bits 0:5 = 0b000100, 6:10 = RD, 11:15 = RA, 16:20 = 0b00000, 21:31 = 0b01011000100

RD[32:63] = 0b0 || RA[33:63]

Description: The sign bit of the low element of RA is set to 0 and the result is placed into the low element of RD.

Exceptions: If the low element of RA is Infinity, Denorm, or NaN, the SPEFSCR[FINV] bit is set, and FG and FX are cleared. FGH and FXH are cleared as well. If Floating-point Invalid Input exceptions are enabled, an exception is taken and the destination register is not updated.

efsadd

Floating-Point Single-Precision Add

efsadd rD,rA,rB

Encoding: bits 0:5 = 0b000100, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 0b01011000000

RD[32:63] = RA[32:63] +sp RB[32:63]

Description: The low element of RA is added to the low element of RB and the result is stored in the low element of RD. If RA is NaN or infinity, the result is either pmax (sa==0) or nmax (sa==1). Otherwise, if RB is NaN or infinity, the result is either pmax (sb==0) or nmax (sb==1). Otherwise, if an overflow occurs, pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in RD.

Exceptions: If the contents of RA or RB are Infinity, Denorm, or NaN, the SPEFSCR[FINV] bit is set. If SPEFSCR[FINVE] is set, an exception is taken, and the destination register is not updated.
Otherwise, if an overflow occurs, the SPEFSCR[FOVF] bit is set, or if an underflow occurs, the SPEFSCR[FUNF] bit is set. If either underflow or overflow exceptions are enabled and the corresponding bit is set, an exception is taken. If any of these exceptions are taken, the destination register is not updated.

If the result of this instruction is inexact, or if an overflow occurs but overflow exceptions are disabled, and no other exception is taken, the SPEFSCR[FINXS] bit is set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and FXH bits are cleared.

FGH, FXH, FG, and FX are cleared if an overflow, underflow, or invalid operation/input error is signaled, regardless of enabled exceptions.

efscfh

Convert Floating-Point Single-Precision from Half-Precision

efscfh rD,rB

Encoding: bits 0:5 = 0b000100, 6:10 = RD, 11:15 = 0b00100, 16:20 = RB, 21:31 = 0b01011010001

    FP16format f;
    FP32format result;
    f ← rB[48:63]
    if (fexp = 0) & (ffrac = 0) then
        result ← fsign || ³¹0                      // signed zero value
    else if Isa16NaNorInfinity(f) then
        SPEFSCR[FINV] ← 1
        result ← fsign || 0b11111110 || ²³1        // max value
    else if Isa16Denorm(f) then
        SPEFSCR[FINV] ← 1
        result ← fsign || ³¹0
    else
        resultsign ← fsign
        resultexp ← fexp - 15 + 127
        resultfrac ← ffrac || ¹³0
    rD[32:63] = result

Description: The half-precision FP number in the low half of the low element in RB is converted to a single-precision floating-point value and the result is placed into the low element of RD. The rounding mode is not used since this conversion is always exact.

Exceptions: If the source element of rB is Infinity, Denorm, or NaN, SPEFSCR[FINV] is set.
If SPEFSCR[FINVE] is set, an interrupt is taken, the destination register is not updated, and the FGH, FXH, FG, and FX bits are cleared.

efscfsf

Convert Floating-Point Single-Precision from Signed Fraction

efscfsf rD,rB

Encoding: bits 0:5 = 0b000100, 6:10 = RD, 11:15 = 0b00000, 16:20 = RB, 21:31 = 0b01011010011

    bl = RB[32:63]
    RD[32:63] = CnvtSF32ToFP32(bl)

Description: The signed fractional low element in RB is converted to a single-precision floating-point value using the current rounding mode and the result is placed into the low element of RD.

Exceptions: This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversion is not exact. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and FXH bits are cleared.

efscfsi

Convert Floating-Point Single-Precision from Signed Integer

efscfsi rD,rB

Encoding: bits 0:5 = 0b000100, 6:10 = RD, 11:15 = 0b00000, 16:20 = RB, 21:31 = 0b01011010001

    bl = RB[32:63]
    RD[32:63] = CnvtSI32ToFP32(bl)

Description: The signed integer low element in RB is converted to a single-precision floating-point value using the current rounding mode and the result is placed into the low element of RD.

Exceptions: This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversion is not exact. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and FXH bits are cleared.
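To make the inexact condition for efscfsi concrete: a single-precision value carries 24 significant bits, so a 32-bit integer that needs more than 24 significant bits cannot be converted exactly. The Python sketch below is a software model only, not the hardware algorithm. It assumes round-to-nearest (the behavior of `struct.pack` with the `f` format) rather than the rounding mode selected in SPEFSCR, and the function name is hypothetical.

```python
import struct


def cnvt_si32_to_fp32_nearest(value):
    """Model of a signed-integer to single-precision conversion.

    Returns (bits, inexact). Assumes round-to-nearest; the hardware uses
    the current rounding mode instead.
    """
    if not -2**31 <= value < 2**31:
        raise ValueError("operand must fit in a signed 32-bit integer")
    # float(value) is exact in double precision; packing with 'f' rounds
    # the double to single precision.
    bits = struct.unpack(">I", struct.pack(">f", float(value)))[0]
    rounded = struct.unpack(">f", struct.pack(">I", bits))[0]
    # The conversion is inexact when rounding changed the value.
    return bits, rounded != value
```

For example, 2^24 converts exactly, while 2^24 + 1 does not, which corresponds to the case where SPEFSCR[FINXS] would be set.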
efscfuf

Convert Floating-Point Single-Precision from Unsigned Fraction

efscfuf rD,rB

Encoding: bits 0:5 = 0b000100, 6:10 = RD, 11:15 = 0b00000, 16:20 = RB, 21:31 = 0b01011010010

    bl = RB[32:63]
    RD[32:63] = CnvtUF32ToFP32(bl)

Description: The unsigned fractional low element in RB is converted to a single-precision floating-point value using the current rounding mode and the result is placed into the low element of RD.

Exceptions: This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversion is not exact. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and FXH bits are cleared.

efscfui

Convert Floating-Point Single-Precision from Unsigned Integer

efscfui rD,rB

Encoding: bits 0:5 = 0b000100, 6:10 = RD, 11:15 = 0b00000, 16:20 = RB, 21:31 = 0b01011010000

    bl = RB[32:63]
    RD[32:63] = CnvtUI32ToFP32(bl)

Description: The unsigned integer low element in RB is converted to a single-precision floating-point value using the current rounding mode and the result is placed into the low element of RD.

Exceptions: This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversion is not exact. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and FXH bits are cleared.
efscmpgt

Floating-Point Single-Precision Compare Greater Than

efscmpgt crfD,rA,rB

Encoding: bits 0:5 = 0b000100, 6:8 = crfD, 9:10 = 0b00, 11:15 = RA, 16:20 = RB, 21:31 = 0b01011001100

    al = RA[32:63]
    bl = RB[32:63]
    if (al > bl) then cl = 1
    else cl = 0
    CR[4*crfD:4*crfD+3] = undefined || cl || undefined || undefined

Description: The low element of RA is compared against the low element of RB. If RA is greater than RB, the bit in the crfD is set; otherwise it is cleared. Comparison ignores the sign of 0 (+0 = -0).

Exceptions: If the contents of RA or RB are Infinity, Denorm, or NaN, the SPEFSCR[FINV] bit is set, and the FGH, FXH, FG, and FX bits are cleared. If Floating-point Invalid Input exceptions are enabled, an exception is taken and the Condition Register is not updated. Otherwise, the comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of ‘e’ and ‘f’ directly.

efscmpeq

Floating-Point Single-Precision Compare Equal

efscmpeq crfD,rA,rB

Encoding: bits 0:5 = 0b000100, 6:8 = crfD, 9:10 = 0b00, 11:15 = RA, 16:20 = RB, 21:31 = 0b01011001110

    al = RA[32:63]
    bl = RB[32:63]
    if (al == bl) then cl = 1
    else cl = 0
    CR[4*crfD:4*crfD+3] = undefined || cl || undefined || undefined

Description: The low element of RA is compared against the low element of RB. If RA is equal to RB, the bit in the crfD is set; otherwise it is cleared. Comparison ignores the sign of 0 (+0 = -0).

Exceptions: If the contents of RA or RB are Infinity, Denorm, or NaN, the SPEFSCR[FINV] bit is set, and the FGH, FXH, FG, and FX bits are cleared. If Floating-point Invalid Input exceptions are enabled, an exception is taken and the Condition Register is not updated. Otherwise, the comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of ‘e’ and ‘f’ directly.
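The comparison rule used by efscmpgt, efscmpeq, and efscmplt differs from IEEE 754 in that NaNs, Infinities, and Denorms are ordered by their ‘e’ and ‘f’ fields. One way to model this, under the assumption that "using e and f directly" amounts to sign-magnitude ordering of the bit patterns, is sketched below in Python. The function name `efscmp` and its return shape are hypothetical, and only the case with Invalid Input exceptions disabled is modeled.

```python
def _is_special(bits):
    """True for Infinity, NaN, or Denorm operands (these set SPEFSCR[FINV])."""
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    return e == 255 or (e == 0 and f != 0)


def _key(bits):
    """Sign-magnitude ordering key; +0 and -0 both map to 0."""
    magnitude = bits & 0x7FFFFFFF
    return -magnitude if bits >> 31 else magnitude


def efscmp(op, a, b):
    """Model a compare with Invalid Input exceptions disabled.

    op is 'gt', 'eq' or 'lt'. Returns (cl, finv), where cl is the value
    written to the defined CR bit and finv models SPEFSCR[FINV].
    """
    finv = _is_special(a) or _is_special(b)
    ka, kb = _key(a), _key(b)
    cl = {"gt": ka > kb, "eq": ka == kb, "lt": ka < kb}[op]
    return cl, finv
```

Under this model a NaN pattern such as 0x7FC00000 compares greater than +Infinity (0x7F800000), which an IEEE 754 comparison would report as unordered.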
efscmplt

Floating-Point Single-Precision Compare Less Than

efscmplt crfD,rA,rB

Encoding: bits 0:5 = 0b000100, 6:8 = crfD, 9:10 = 0b00, 11:15 = RA, 16:20 = RB, 21:31 = 0b01011001101

    al = RA[32:63]
    bl = RB[32:63]
    if (al < bl) then cl = 1
    else cl = 0
    CR[4*crfD:4*crfD+3] = undefined || cl || undefined || undefined

Description: The low element of RA is compared against the low element of RB. If RA is less than RB, the bit in the crfD is set; otherwise it is cleared. Comparison ignores the sign of 0 (+0 = -0).

Exceptions: If the contents of RA or RB are Infinity, Denorm, or NaN, the SPEFSCR[FINV] bit is set, and the FGH, FXH, FG, and FX bits are cleared. If Floating-point Invalid Input exceptions are enabled, an exception is taken and the Condition Register is not updated. Otherwise, the comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of ‘e’ and ‘f’ directly.

efscth

Convert Floating-Point Single-Precision to Half-Precision

efscth rD,rB

Encoding: bits 0:5 = 0b000100, 6:10 = RD, 11:15 = 0b00100, 16:20 = RB, 21:31 = 0b01011010101

    FP32format f;
    FP16format result;
    f ← rB[32:63]
    if (fexp = 0) & (ffrac = 0) then
        result ← fsign || ¹⁵0                       // signed zero value
    else if Isa32NaNorInfinity(f) then
        SPEFSCR[FINV] ← 1
        result ← fsign || 0b11110 || ¹⁰1            // max value
    else if Isa32Denorm(f) then
        SPEFSCR[FINV] ← 1
        result ← fsign || ¹⁵0
    else
        unbias ← fexp - 127
        if unbias > 15 then
            result ← fsign || 0b11110 || ¹⁰1        // max value
            SPEFSCR[FOVF] ← 1
        else if unbias < -14 && (result would not round up to bmin) then
            result ← fsign || ¹⁵0                   // like-signed zero value
            SPEFSCR[FUNF] ← 1
        else
            resultsign ← fsign
            resultexp ← unbias + 15
            resultfrac ← ffrac[0:9]
            guard ← ffrac[10]
            sticky ← (ffrac[11:22] ≠ 0)
            result ← Round16(result, LOWER, guard, sticky)
            SPEFSCR[FG] ← guard
            SPEFSCR[FX] ← sticky
            if guard | sticky then SPEFSCR[FINXS] ← 1
    rD[32:63] = ¹⁶0 || result

Description: The single-precision FP number in the low element in RB is converted to a half-precision floating-point value using the current rounding mode. The result is then prepended with 16 zeros and placed into the low element of RD.

Exceptions: If the source element of rB is Infinity, Denorm, or NaN, SPEFSCR[FINV] is set. If SPEFSCR[FINVE] is set, an interrupt is taken, the destination register is not updated, and the FGH, FXH, FG, and FX bits are cleared.

Otherwise, if an overflow occurs, SPEFSCR[FOVF] is set, or if an underflow occurs, SPEFSCR[FUNF] is set. If either underflow or overflow exceptions are enabled and the corresponding bit is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated.

If the result of this instruction is inexact, or if an overflow occurs but overflow exceptions are disabled, and no other interrupt is taken, SPEFSCR[FINXS] is set. If the floating-point inexact exception is enabled, an interrupt is taken using the floating-point round interrupt vector. In this case, the destination register is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be performed in the interrupt handler, and the FGH and FXH bits are cleared.

FGH, FXH, FG, and FX are cleared if an overflow, underflow, or invalid operation/input error is signaled, regardless of enabled exceptions.
efsctsf

Convert Floating-Point Single-Precision to Signed Fraction

efsctsf rD,rB

Encoding: bits 0:5 = 0b000100, 6:10 = RD, 11:15 = 0b00000, 16:20 = RB, 21:31 = 0b01011010111

    bl = RB[32:63]
    if (bl == Denorm) then
        RD[32:63] = 0
    else if ((bl == +0) || (bl == -0)) then         // zero cases
        RD[32:63] = 0
    else if (ebl < 127) then
        RD[32:63] = CnvtFP32ToSF32Sat(bl)
    else if ((ebl == 127) && (sbl == 1) && (fbl == 0)) then
        RD[32:63] = 0x80000000                      // max negative, no overflow
    else if (bl == NAN) then
        RD[32:63] = 0
    else                                            // Overflow
        if (sbl == 0) then                          // Positive
            RD[32:63] = 0x7FFFFFFF
        else
            RD[32:63] = 0x80000000

Description: The single-precision floating-point low element in RB is converted to a signed fraction using the current rounding mode, and the result is saturated if it cannot be represented in a 32-bit fraction. NaNs are converted as though they were zero.

Exceptions: If the contents of RB are Infinity, Denorm, or NaN, or if an overflow occurs, the SPEFSCR[FINV] bit is set, and the FGH, FXH, FG, and FX bits are cleared. If SPEFSCR[FINVE] is set, an exception is taken, and the destination register is not updated.

This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversion is not exact. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and FXH bits are cleared.
efsctsi

Convert Floating-Point Single-Precision to Signed Integer

efsctsi rD,rB

Encoding: bits 0:5 = 0b000100, 6:10 = RD, 11:15 = 0b00000, 16:20 = RB, 21:31 = 0b01011010101

    bl = RB[32:63]
    if (bl == Denorm) then
        RD[32:63] = 0
    else if (ebl < 158) then
        RD[32:63] = CnvtFP32ToSI32Sat(bl)
    else if ((ebl == 158) && (sbl == 1) && (fbl == 0)) then
        RD[32:63] = 0x80000000                      // max negative, no overflow
    else if (bl == NAN) then
        RD[32:63] = 0
    else                                            // Overflow
        if (sbl == 0) then                          // Positive
            RD[32:63] = 0x7FFFFFFF
        else
            RD[32:63] = 0x80000000

Description: The single-precision floating-point low element in RB is converted to a signed integer using the current rounding mode, and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Exceptions: If the contents of RB are Infinity, Denorm, or NaN, or if an overflow occurs, the SPEFSCR[FINV] bit is set, and the FGH, FXH, FG, and FX bits are cleared. If SPEFSCR[FINVE] is set, an exception is taken, the destination register is not updated, and no other status bits are set.

This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversion is not exact. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and FXH bits are cleared.
efsctsiz

Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero

efsctsiz rD,rB

Encoding: bits 0:5 = 0b000100, 6:10 = RD, 11:15 = 0b00000, 16:20 = RB, 21:31 = 0b01011011010

    bl = RB[32:63]
    if (bl == Denorm) then
        RD[32:63] = 0
    else if (ebl < 158) then
        RD[32:63] = CnvtFP32ToSI32Sat(bl)
    else if ((ebl == 158) && (sbl == 1) && (fbl == 0)) then
        RD[32:63] = 0x80000000                      // max negative, no overflow
    else if (bl == NAN) then
        RD[32:63] = 0
    else                                            // Overflow
        if (sbl == 0) then                          // Positive
            RD[32:63] = 0x7FFFFFFF
        else
            RD[32:63] = 0x80000000

Description: The single-precision floating-point low element in RB is converted to a signed integer using the rounding mode Round toward Zero, and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Exceptions: If the contents of RB are Infinity, Denorm, or NaN, or if an overflow occurs, the SPEFSCR[FINV] bit is set, and the FGH, FXH, FG, and FX bits are cleared. If SPEFSCR[FINVE] is set, an exception is taken, the destination register is not updated, and no other status bits are set.

This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversion is not exact. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and FXH bits are cleared.
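The efsctsiz decision sequence translates almost line for line into software. The Python sketch below follows the pseudocode branches in order and models only the case with all exceptions disabled; the function name and the `(result, finv)` return shape are illustrative choices, and Python's `int()` truncation stands in for CnvtFP32ToSI32Sat with Round toward Zero.

```python
import struct


def efsctsiz(bits):
    """Model of efsctsiz. Returns (RD[32:63] as an unsigned pattern, finv)."""
    s = bits >> 31               # sign (sbl)
    e = (bits >> 23) & 0xFF      # biased exponent (ebl)
    f = bits & 0x7FFFFF          # fraction (fbl)
    if e == 0 and f != 0:                    # Denorm
        return 0, True
    if e < 158:                              # |value| < 2**31, always fits
        value = struct.unpack(">f", struct.pack(">I", bits))[0]
        return int(value) & 0xFFFFFFFF, False    # int() truncates toward zero
    if e == 158 and s == 1 and f == 0:       # exactly -2**31, no overflow
        return 0x80000000, False
    if e == 255 and f != 0:                  # NaN converts as zero
        return 0, True
    # Overflow (this branch also covers +/-Infinity): saturate by sign.
    return (0x7FFFFFFF if s == 0 else 0x80000000), True
```

Note that the threshold 158 is 127 + 31: an operand with biased exponent 158 has magnitude of at least 2^31, so only the single value -2^31 is representable from that exponent upward.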
efsctuf

Convert Floating-Point Single-Precision to Unsigned Fraction

efsctuf rD,rB

Encoding: bits 0:5 = 0b000100, 6:10 = RD, 11:15 = 0b00000, 16:20 = RB, 21:31 = 0b01011010110

    bl = RB[32:63]
    if (bl == Denorm) then                          // force denorm to zero
        RD[32:63] = 0
    else if ((bl == +0) || (bl == -0)) then         // zero cases
        RD[32:63] = 0
    else if (sbl == 1) then                         // Negative
        RD[32:63] = 0
    else if (ebl < 127) then
        RD[32:63] = CnvtFP32ToUF32Sat(bl)
    else if (bl == NAN) then
        RD[32:63] = 0
    else                                            // Overflow
        RD[32:63] = 0xFFFFFFFF

Description: The single-precision floating-point low element in RB is converted to an unsigned fraction using the current rounding mode, and the result is saturated if it cannot be represented in a 32-bit unsigned fraction. NaNs are converted as though they were zero.

Exceptions: If the contents of RB are Infinity, Denorm, or NaN, or if an overflow occurs, the SPEFSCR[FINV] bit is set, and the FGH, FXH, FG, and FX bits are cleared. If SPEFSCR[FINVE] is set, an exception is taken, and the destination register is not updated.

This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversion is not exact. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and FXH bits are cleared.
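As an illustration of the unsigned fraction format, a 32-bit unsigned fraction can be read as an integer scaled by 2^-32, so 0.5 is 0x80000000 and values of 1.0 or more saturate to 0xFFFFFFFF. The Python sketch below follows the efsctuf pseudocode branch by branch. It is a model under assumptions: it truncates instead of applying the current rounding mode, it does not model any SPEFSCR status bits, and the function name is hypothetical.

```python
import struct


def efsctuf_trunc(bits):
    """Model of efsctuf using truncation (assumption: rounding mode RZ)."""
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    if e == 0:                   # Denorm (f != 0) and zero (f == 0) cases
        return 0
    if s == 1:                   # negative operands convert to 0
        return 0
    if e < 127:                  # 0 < value < 1.0: scale by 2**32
        value = struct.unpack(">f", struct.pack(">I", bits))[0]
        return int(value * 2**32)    # power-of-two scaling is exact
    if e == 255 and f != 0:      # NaN converts as zero
        return 0
    return 0xFFFFFFFF            # overflow: value >= 1.0 or +Infinity
```

The largest operand below 1.0 (biased exponent 126, fraction all ones) maps to 0xFFFFFF00, so the in-range branch never exceeds 32 bits.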
efsctui

Convert Floating-Point Single-Precision to Unsigned Integer

efsctui rD,rB

Encoding: bits 0:5 = 0b000100, 6:10 = RD, 11:15 = 0b00000, 16:20 = RB, 21:31 = 0b01011010100

    bl = RB[32:63]
    if (bl == Denorm) then                          // force denorm to zero
        RD[32:63] = 0
    else if ((bl == +0) || (bl == -0)) then         // zero cases
        RD[32:63] = 0
    else if (sbl == 1) then                         // Negative
        RD[32:63] = 0
    else if (ebl <= 158) then
        RD[32:63] = CnvtFP32ToUI32Sat(bl)
    else if (bl == NAN) then
        RD[32:63] = 0
    else                                            // Overflow
        RD[32:63] = 0xFFFFFFFF

Description: The single-precision floating-point low element in RB is converted to an unsigned integer using the current rounding mode, and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Exceptions: If the contents of RB are Infinity, Denorm, or NaN, or if an overflow occurs, the SPEFSCR[FINV] bit is set, and the FGH, FXH, FG, and FX bits are cleared. If SPEFSCR[FINVE] is set, an exception is taken, and the destination register is not updated.

This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversion is not exact. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and FXH bits are cleared.
efsctuiz

Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero

efsctuiz rD,rB

Encoding: bits 0:5 = 0b000100, 6:10 = RD, 11:15 = 0b00000, 16:20 = RB, 21:31 = 0b01011011000

    bl = RB[32:63]
    if (bl == Denorm) then                          // force denorm to zero
        RD[32:63] = 0
    else if ((bl == +0) || (bl == -0)) then         // zero cases
        RD[32:63] = 0
    else if (sbl == 1) then                         // Negative
        RD[32:63] = 0
    else if (ebl <= 158) then
        RD[32:63] = CnvtFP32ToUI32Sat(bl)
    else if (bl == NAN) then
        RD[32:63] = 0
    else                                            // Overflow
        RD[32:63] = 0xFFFFFFFF

Description: The single-precision floating-point low element in RB is converted to an unsigned integer using the rounding mode Round toward Zero, and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Exceptions: If the contents of RB are Infinity, Denorm, or NaN, or if an overflow occurs, the SPEFSCR[FINV] bit is set, and the FGH, FXH, FG, and FX bits are cleared. If SPEFSCR[FINVE] is set, an exception is taken, and the destination register is not updated.

This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversion is not exact. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and FXH bits are cleared.

efsdiv

Floating-Point Single-Precision Divide

efsdiv rD,rA,rB

Encoding: bits 0:5 = 0b000100, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 0b01011001001

RD[32:63] = RA[32:63] ÷sp RB[32:63]

Description: The low element of RA is divided by the low element of RB and the result is stored in the low element of RD. If RB is a NaN or infinity, the result is a properly signed zero.
Otherwise, if RB is a denormalized number or a zero, or if RA is either NaN or infinity, the result is either pmax (sa==sb) or nmax (sa!=sb). Otherwise, if an overflow occurs, pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, +0 or -0 (as appropriate) is stored in RD.

Exceptions: If the contents of RA or RB are Infinity, Denorm, or NaN, or if both RA and RB are +/-0, the SPEFSCR[FINV] bit is set. If SPEFSCR[FINVE] is set, an exception is taken, and the destination register is not updated.

Otherwise, if the content of RB is +/-0 and the content of RA is a finite normalized non-zero number, the SPEFSCR[FDBZ] bit is set. If Floating-point Divide by Zero exceptions are enabled, an exception is then taken.

Otherwise, if an overflow occurs, the SPEFSCR[FOVF] bit is set, or if an underflow occurs, the SPEFSCR[FUNF] bit is set. If either underflow or overflow exceptions are enabled and the corresponding bit is set, an exception is taken. If any of these exceptions are taken, the destination register is not updated.

If the result of this instruction is inexact, or if an overflow occurs but overflow exceptions are disabled, and no other exception is taken, the SPEFSCR[FINXS] bit is set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and FXH bits are cleared.

FGH, FXH, FG, and FX are cleared if an overflow, underflow, divide by zero, or invalid operation/input error is signaled, regardless of enabled exceptions.
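The special-operand rules for efsdiv form a small decision table, which the following Python sketch encodes for the case where all exceptions are disabled. It is a model only: the function name and return shape are hypothetical, "properly signed" is assumed to mean the exclusive-OR of the operand signs, and operand combinations that the text above does not assign a default result to (including ordinary division) are reported as `None` rather than guessed.

```python
PMAX, NMAX = 0x7F7FFFFF, 0xFF7FFFFF
POS_ZERO, NEG_ZERO = 0x00000000, 0x80000000


def _classify(bits):
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    if e == 0:
        return "zero" if f == 0 else "denorm"
    if e == 255:
        return "inf" if f == 0 else "nan"
    return "norm"


def efsdiv_special(a, b):
    """Return (result_bits_or_None, finv, fdbz) for efsdiv operands a / b."""
    ca, cb = _classify(a), _classify(b)
    negative = (a >> 31) != (b >> 31)        # assumed: sign is sa XOR sb
    special = ("inf", "nan", "denorm")
    # SPEFSCR[FINV]: Infinity, Denorm, or NaN operand, or 0/0.
    finv = ca in special or cb in special or (ca == "zero" and cb == "zero")
    # SPEFSCR[FDBZ]: only when FINV does not apply and RA is normalized.
    fdbz = (not finv) and cb == "zero" and ca == "norm"
    if cb in ("nan", "inf"):
        result = NEG_ZERO if negative else POS_ZERO
    elif cb in ("denorm", "zero") or ca in ("nan", "inf"):
        result = NMAX if negative else PMAX
    else:
        result = None                        # not covered by the special rules
    return result, finv, fdbz
```

The check order mirrors the description: the RB NaN/infinity rule takes precedence over the pmax/nmax rule.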
efsmadd

Floating-Point Single-Precision Multiply-Add

efsmadd rD,rA,rB

Encoding: bits 0:5 = 0b000100, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 0b01011000010

RD[32:63] = ((RA[32:63] ×fp RB[32:63]) +sp RD[32:63])

Description: The low element of rA is multiplied by the low element of rB, the intermediate product is added to the low element of rD, and the result is stored in the low element of rD. If RA or RB are either zero or denormalized, the intermediate product is a properly signed zero. Otherwise, if RA or RB are either NaN or infinity, the intermediate product is either pmax (sa==sb) or nmax (sa!=sb), and this value is used for the result and stored into RD. Otherwise, the intermediate product is added to the corresponding element of RD. If RD is NaN or infinity, the result is either pmax (sd==0) or nmax (sd==1). Otherwise, if an overflow occurs, pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in RD.

Exceptions: If the contents of RA or RB are Infinity, Denorm, or NaN, the SPEFSCR[FINV] bit is set. If SPEFSCR[FINVE] is set, an exception is taken, and the destination register is not updated. Otherwise, if an overflow occurs, the SPEFSCR[FOVF] bit is set, or if an underflow occurs, the SPEFSCR[FUNF] bit is set. If either underflow or overflow exceptions are enabled and the corresponding bit is set, an exception is taken. If any of these exceptions are taken, the destination register is not updated.

If the result of this instruction is inexact, or if an overflow occurs on the add but overflow exceptions are disabled, and no other exception is taken, the SPEFSCR[FINXS] bit is set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector.
In this case, the destination register is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and FXH bits are cleared.

FGH, FXH, FG, and FX are cleared if an overflow, underflow, or invalid operation/input error is signaled, regardless of enabled exceptions.

efsmax

Floating-Point Single-Precision Maximum

efsmax rD,rA,rB

Encoding: bits 0:5 = 0b000100, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 0b01010110000

    al ← rA[32:63]
    bl ← rB[32:63]
    if (al < bl) then temp ← bl
    else temp ← al
    if (isnan(al) & ~(isnan(bl))) then temp ← bl
    if (isnan(bl) & ~(isnan(al))) then temp ← al
    rD[32:63] ← temp

Description: The low element of rA is compared against the low element of rB. The larger element is selected and placed into the low element of rD. The maximum of +0 and -0 is +0.

Exceptions: If the contents of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV] is set, and the FGH, FXH, FG, and FX bits are cleared. If SPEFSCR[FINVE] is set, an interrupt is taken, and the destination register is not updated. Otherwise, the comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of ‘e’ and ‘f’ directly. If one of the elements is a NaN and the other is not, the non-NaN element is selected rather than the comparison result. If the selected element is denorm, the result is a same-signed zero. If the selected element is +NaN or +infinity, the corresponding result is pmax. Otherwise, if the selected element is -NaN or -infinity, the corresponding result is nmax.
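The selection and post-processing steps of efsmax can be sketched in software as follows. This Python model covers only the case with Invalid Input exceptions disabled and does not model status bits. Two points are assumptions of the sketch rather than statements from the manual: operands are ordered by sign and magnitude of their bit patterns (one reading of "using e and f directly"), and -0 is ordered just below +0 so that the maximum of +0 and -0 is +0.

```python
PMAX, NMAX = 0x7F7FFFFF, 0xFF7FFFFF


def _is_nan(bits):
    return (bits >> 23) & 0xFF == 255 and bits & 0x7FFFFF != 0


def _order_key(bits):
    """Sign-magnitude ordering key that places -0 just below +0."""
    magnitude = bits & 0x7FFFFFFF
    return -magnitude - 1 if bits >> 31 else magnitude


def efsmax(a, b):
    """Model of efsmax on 32-bit patterns; returns the rD[32:63] pattern."""
    temp = b if _order_key(a) < _order_key(b) else a
    # A single NaN operand loses to the non-NaN operand.
    if _is_nan(a) and not _is_nan(b):
        temp = b
    if _is_nan(b) and not _is_nan(a):
        temp = a
    s = temp >> 31
    e = (temp >> 23) & 0xFF
    f = temp & 0x7FFFFF
    if e == 0 and f != 0:        # selected Denorm becomes a same-signed zero
        return s << 31
    if e == 255:                 # selected NaN or Infinity becomes pmax/nmax
        return PMAX if s == 0 else NMAX
    return temp
```

A model of efsmin would swap the initial selection and keep the same NaN and post-processing steps.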
efsmin

Floating-Point Single-Precision Minimum

efsmin rD,rA,rB

Encoding: bits 0:5 = 0b000100, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 0b01010110001

    al ← rA[32:63]
    bl ← rB[32:63]
    if (al < bl) then temp ← al
    else temp ← bl
    if (isnan(al) & ~(isnan(bl))) then temp ← bl
    if (isnan(bl) & ~(isnan(al))) then temp ← al
    rD[32:63] ← temp

Description: The low element of rA is compared against the low element of rB. The smaller element is selected and placed into the low element of rD. The minimum of +0 and -0 is -0.

Exceptions: If the contents of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV] is set, and the FGH, FXH, FG, and FX bits are cleared. If SPEFSCR[FINVE] is set, an interrupt is taken, and the destination register is not updated. Otherwise, the comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of ‘e’ and ‘f’ directly. If one of the elements is a NaN and the other is not, the non-NaN element is selected rather than the comparison result. If the selected element is denorm, the result is a same-signed zero. If the selected element is +NaN or +infinity, the corresponding result is pmax. Otherwise, if the selected element is -NaN or -infinity, the corresponding result is nmax.

efsmsub

Floating-Point Single-Precision Multiply-Subtract

efsmsub rD,rA,rB

Encoding: bits 0:5 = 0b000100, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 0b01011000011

RD[32:63] = ((RA[32:63] ×fp RB[32:63]) -sp RD[32:63])

Description: The low element of rA is multiplied by the low element of rB, the low element of rD is subtracted from the intermediate product, and the result is stored in the low element of rD. If RA or RB are either zero or denormalized, the intermediate product is a properly signed zero. Otherwise, if RA or RB are either NaN or infinity, the intermediate product is either pmax (sa==sb) or nmax (sa!=sb), and this value is used for the result and stored into RD.
Otherwise, the low element of rD is subtracted from the intermediate product. If RD is NaN or infinity, the result is either nmax (sd==0) or pmax (sd==1). Otherwise, if an overflow occurs, pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in RD.

Exceptions: If the contents of RA or RB are Infinity, Denorm, or NaN, the SPEFSCR[FINV] bit is set. If SPEFSCR[FINVE] is set, an exception is taken, and the destination register is not updated. Otherwise, if an overflow occurs, the SPEFSCR[FOVF] bit is set, or if an underflow occurs, the SPEFSCR[FUNF] bit is set. If either underflow or overflow exceptions are enabled and the corresponding bit is set, an exception is taken. If any of these exceptions are taken, the destination register is not updated.

If the result of this instruction is inexact, or if an overflow occurs but overflow exceptions are disabled, and no other exception is taken, the SPEFSCR[FINXS] bit is set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and FXH bits are cleared.

FGH, FXH, FG, and FX are cleared if an overflow, underflow, or invalid operation/input error is signaled, regardless of enabled exceptions.

efsmul

Floating-Point Single-Precision Multiply

efsmul rD,rA,rB

Encoding: bits 0:5 = 0b000100, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 0b01011001000

RD[32:63] = RA[32:63] ×sp RB[32:63]

Description: The low element of RA is multiplied by the low element of RB and the result is stored in the low element of RD. If RA or RB are either zero or denormalized, the result is a properly signed zero.
Otherwise, if RA or RB are either NaN or infinity, the result is either pmax (sa==sb) or nmax (sa!=sb). Otherwise, if an overflow occurs, pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, +0 or -0 (as appropriate) is stored in RD.

Exceptions: If the contents of RA or RB are Infinity, Denorm, or NaN, the SPEFSCR[FINV] bit is set. If SPEFSCR[FINVE] is set, an exception is taken, and the destination register is not updated. Otherwise, if an overflow occurs, the SPEFSCR[FOVF] bit is set, or if an underflow occurs, the SPEFSCR[FUNF] bit is set. If either underflow or overflow exceptions are enabled and the corresponding bit is set, an exception is taken. If any of these exceptions are taken, the destination register is not updated.

If the result of this instruction is inexact, or if an overflow occurs but overflow exceptions are disabled, and no other exception is taken, the SPEFSCR[FINXS] bit is set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and FXH bits are cleared.

FGH, FXH, FG, and FX are cleared if an overflow, underflow, or invalid operation/input error is signaled, regardless of enabled exceptions.

efsnabs

Floating-Point Single-Precision Negative Absolute Value

efsnabs rD,rA

Encoding: bits 0:5 = 0b000100, 6:10 = RD, 11:15 = RA, 16:20 = 0b00000, 21:31 = 0b01011000101

RD[32:63] = 0b1 || RA[33:63]

Description: The sign bit of the low element of RA is set to 1 and the result is placed into the low element of RD.

Exceptions: If the low element of RA is Infinity, Denorm, or NaN, the SPEFSCR[FINV] bit is set, and FG and FX are cleared. FGH and FXH are cleared as well.
If Floating-point Invalid Input exceptions are enabled then an exception is taken, and the destination register is not updated.

efsneg
Floating-Point Single-Precision Negate

efsneg rD,rA

Encoding: bits 0:5 = 0b000100 | 6:10 = RD | 11:15 = RA | 16:20 = 0b00000 | 21:31 = 0b01011000110

RD[32:63] = ¬RA[32] || RA[33:63]

Description: The sign bit of the low element of RA is complemented and the result is placed into the low element of RD.

Exceptions: If the low element of RA is Infinity, Denorm, or NaN, the SPEFSCR[FINV] bit is set, and FG and FX are cleared. FGH and FXH are cleared as well. If Floating-point Invalid Input exceptions are enabled then an exception is taken, and the destination register is not updated.

efsnmadd
Floating-Point Single-Precision Negative Multiply-Add

efsnmadd rD,rA,rB

Encoding: bits 0:5 = 0b000100 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 0b01011001010

RD[32:63] = -((RA[32:63] ×fp RB[32:63]) +sp RD[32:63])

The low element of rA is multiplied by the low element of rB, the intermediate product is added to the low element of rD, and the negated result is stored in the low element of rD. If RA or RB are either zero or denormalized, the intermediate product is a properly signed zero. Otherwise, if RA or RB are either NaN or infinity, the intermediate product is either pmax (sa==sb), or nmax (sa!=sb), and this value is used for the result and stored into RD. Otherwise, the intermediate product is added to the corresponding element of RD, and the final result is negated. If RD is NaN or infinity, the result is either nmax (sd==0), or pmax (sd==1). Otherwise, if an overflow occurs, then pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, then -0 (for rounding modes RN, RZ, RP) or +0 (for rounding mode RM) is stored in RD.

Exceptions: If the contents of RA or RB are Infinity, Denorm, or NaN, the SPEFSCR[FINV] bit is set.
If SPEFSCR[FINVE] is set, an exception is taken, and the destination register is not updated. Otherwise, if an overflow occurs, then the SPEFSCR[FOVF] bit is set, or if an underflow occurs, then the SPEFSCR[FUNF] bit is set. If either underflow or overflow exceptions are enabled and the corresponding bit is set, an exception is taken. If any of these exceptions are taken, the destination register is not updated.

If the result of this instruction is inexact or if an overflow occurs but overflow exceptions are disabled, and no other exception is taken, the SPEFSCR[FINXS] bit will be set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and FXH bits are cleared.

FGH, FXH, FG and FX will be cleared if an overflow, underflow, or invalid operation/input error is signaled, regardless of enabled exceptions.

efsnmsub
Floating-Point Single-Precision Negative Multiply-Subtract

efsnmsub rD,rA,rB

Encoding: bits 0:5 = 0b000100 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 0b01011001011

RD[32:63] = -((RA[32:63] ×fp RB[32:63]) -sp RD[32:63])

The low element of rA is multiplied by the low element of rB, the low element of rD is subtracted from the intermediate product, and the negated result is stored in the low element of rD. If RA or RB are either zero or denormalized, the intermediate product is a properly signed zero. Otherwise, if RA or RB are either NaN or infinity, the intermediate product is either pmax (sa==sb), or nmax (sa!=sb), and this value is negated to obtain the result and is stored into RD. Otherwise, the low element of rD is subtracted from the intermediate product, and the final result is negated.
If RD is NaN or infinity, the final result is either pmax (sd==0), or nmax (sd==1). Otherwise, if an overflow occurs, then pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, then -0 (for rounding modes RN, RZ, RP) or +0 (for rounding mode RM) is stored in RD.

Exceptions: If the contents of RA or RB are Infinity, Denorm, or NaN, the SPEFSCR[FINV] bit is set. If SPEFSCR[FINVE] is set, an exception is taken, and the destination register is not updated. Otherwise, if an overflow occurs, then the SPEFSCR[FOVF] bit is set, or if an underflow occurs, then the SPEFSCR[FUNF] bit is set. If either underflow or overflow exceptions are enabled and the corresponding bit is set, an exception is taken. If any of these exceptions are taken, the destination register is not updated.

If the result of this instruction is inexact or if an overflow occurs but overflow exceptions are disabled, and no other exception is taken, the SPEFSCR[FINXS] bit will be set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and FXH bits are cleared.

FGH, FXH, FG and FX will be cleared if an overflow, underflow, or invalid operation/input error is signaled, regardless of enabled exceptions.

efssqrt
Floating-Point Single-Precision Square Root

efssqrt rD,rA

Encoding: bits 0:5 = 0b000100 | 6:10 = RD | 11:15 = RA | 16:20 = 0b00000 | 21:31 = 0b01011000111

rD[32:63] ← SQRT(rA[32:63])

The square root of the low element of rA is calculated, and the result is stored in the low element of rD. If the low element of rA is zero or denorm, the result is a same-signed zero. If the low element of rA is +NaN or +infinity, the corresponding result is pmax.
Otherwise, if the low element of rA is non-zero and has a negative sign, including -NaN or -infinity, the corresponding result is -0. Otherwise, if an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in the low element of rD.

Exceptions: If the low element of rA is non-zero and has a negative sign, or is Infinity, Denorm, or NaN, SPEFSCR[FINV] is set, and SPEFSCR[FGH,FXH,FG,FX] are cleared. If SPEFSCR[FINVE] is set, an interrupt is taken and the destination register is not updated. Otherwise, if an underflow occurs, SPEFSCR[FUNF] is set. If underflow exceptions are enabled and a corresponding status bit is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated.

If the result element of this instruction is inexact, or underflows but underflow exceptions are disabled, and no other interrupt is taken, SPEFSCR[FINXS] is set. If the floating-point inexact exception is enabled, an interrupt is taken using the floating-point round interrupt vector. In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the interrupt handler, and the FGH and FXH bits are cleared.

FG, FX, FGH, and FXH are cleared if an underflow or an invalid operation/input error is signaled for the low element, regardless of enabled exceptions.

efssub
Floating-Point Single-Precision Subtract

efssub rD,rA,rB

Encoding: bits 0:5 = 0b000100 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 0b01011000001

RD[32:63] = RA[32:63] -sp RB[32:63]

Description: The low element of RB is subtracted from the low element of RA and the result is stored in the low element of RD. If RA is NaN or infinity, the result is either pmax (sa==0), or nmax (sa==1). Otherwise, if RB is NaN or infinity, the result is either nmax (sb==0), or pmax (sb==1).
Otherwise, if an overflow occurs, then pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, then +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in RD.

Exceptions: If the contents of RA or RB are Infinity, Denorm, or NaN, the SPEFSCR[FINV] bit is set. If SPEFSCR[FINVE] is set, an exception is taken, and the destination register is not updated. Otherwise, if an overflow occurs, then the SPEFSCR[FOVF] bit is set, or if an underflow occurs, then the SPEFSCR[FUNF] bit is set. If either underflow or overflow exceptions are enabled and the corresponding bit is set, an exception is taken. If any of these exceptions are taken, the destination register is not updated.

If the result of this instruction is inexact or if an overflow occurs but overflow exceptions are disabled, and no other exception is taken, the SPEFSCR[FINXS] bit will be set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and FXH bits are cleared.

FGH, FXH, FG and FX will be cleared if an overflow, underflow, or invalid operation/input error is signaled, regardless of enabled exceptions.

efststeq
Floating-Point Single-Precision Test Equal

efststeq crfD,rA,rB

Encoding: bits 0:5 = 0b000100 | 6:8 = crfD | 9:10 = 0b00 | 11:15 = RA | 16:20 = RB | 21:31 = 0b01011011110

Description:
    al = RA[32:63]
    bl = RB[32:63]
    if (al == bl) then cl = 1
    else cl = 0
    CR[4*crfD:4*crfD+3] = undefined || cl || undefined || undefined

The low element of RA is compared against the low element of RB. If RA is equal to RB, then the bit in the crfD is set, otherwise it is cleared. Comparison ignores the sign of 0 (+0 = -0).
The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly. No exceptions are generated during the execution of the efststeq instruction. If strict IEEE 754 compliance is required, then the program should use the efscmpeq instruction.

Implementation note: In an implementation, the execution of efststeq is likely to be faster than the execution of the efscmpeq instruction.

efststgt
Floating-Point Single-Precision Test Greater Than

efststgt crfD,rA,rB

Encoding: bits 0:5 = 0b000100 | 6:8 = crfD | 9:10 = 0b00 | 11:15 = RA | 16:20 = RB | 21:31 = 0b01011011100

Description:
    al = RA[32:63]
    bl = RB[32:63]
    if (al > bl) then cl = 1
    else cl = 0
    CR[4*crfD:4*crfD+3] = undefined || cl || undefined || undefined

The low element of RA is compared against the low element of RB. If RA is greater than RB, then the bit in the crfD is set, otherwise it is cleared. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly. No exceptions are generated during the execution of the efststgt instruction. If strict IEEE 754 compliance is required, then the program should use the efscmpgt instruction.

Implementation note: In an implementation, the execution of efststgt is likely to be faster than the execution of the efscmpgt instruction.

efststlt
Floating-Point Single-Precision Test Less Than

efststlt crfD,rA,rB

Encoding: bits 0:5 = 0b000100 | 6:8 = crfD | 9:10 = 0b00 | 11:15 = RA | 16:20 = RB | 21:31 = 0b01011011101

Description:
    al = RA[32:63]
    bl = RB[32:63]
    if (al < bl) then cl = 1
    else cl = 0
    CR[4*crfD:4*crfD+3] = undefined || cl || undefined || undefined

The low element of RA is compared against the low element of RB. If RA is less than RB, then the bit in the crfD is set, otherwise it is cleared.
Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly. No exceptions are generated during the execution of the efststlt instruction. If strict IEEE 754 compliance is required, then the program should use the efscmplt instruction.

Implementation note: In an implementation, the execution of efststlt is likely to be faster than the execution of the efscmplt instruction.

5.3.5 EFPU Vector Single-precision Embedded Floating-Point Instructions

In the following instruction descriptions, "sa" is the sign of operand A, "ea" is the biased exponent value of operand A, "sb" is the sign of operand B, "eb" is the biased exponent value of operand B, "ei" is an intermediate exponent value, and "r" is a result value.

evfsabs
Vector Floating-Point Single-Precision Absolute Value

evfsabs rD,rA

Encoding: bits 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = 0b00000 | 21:31 = 0b01010000100

RD[0:31] = 0b0 || RA[1:31]
RD[32:63] = 0b0 || RA[33:63]

Description: The sign bit of each element in RA is set to 0 and the results are placed into RD.

Exceptions: If the contents of either element of RA are Infinity, Denorm, or NaN, the SPEFSCR[FINV, FINVH] bits are set appropriately, and the SPEFSCR[FGH, FXH, FG, FX] bits are cleared appropriately. If Floating-point Invalid Input exceptions are enabled then an exception is taken, and the destination register is not updated.

evfsadd
Vector Floating-Point Single-Precision Add

evfsadd rD,rA,rB

Encoding: bits 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 0b01010000000

RD[0:31] = RA[0:31] +sp RB[0:31]
RD[32:63] = RA[32:63] +sp RB[32:63]

Description: Each single-precision floating-point element of RA is added to the corresponding element of RB and the results are stored in RD. If RA is NaN or infinity, the result is either pmax (sa==0), or nmax (sa==1).
Otherwise, if RB is NaN or infinity, the result is either pmax (sb==0), or nmax (sb==1). Otherwise, if an overflow occurs, then pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, then +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in RD.

Exceptions: If the contents of either element of RA or RB are Infinity, Denorm, or NaN, the SPEFSCR[FINV, FINVH] bits are set appropriately, and the SPEFSCR[FGH, FXH, FG, FX] bits are cleared appropriately. If SPEFSCR[FINVE] is set, an exception is taken and the destination register is not updated. Otherwise, if an overflow occurs, then the SPEFSCR[FOVF, FOVFH] bits are set appropriately, or if an underflow occurs, then the SPEFSCR[FUNF, FUNFH] bits are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit is set, an exception is taken. If any of these exceptions are taken, the destination register is not updated.

If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled, and no other exception is taken, or underflows but underflow exceptions are disabled, and no other exception is taken, the SPEFSCR[FINXS] bit will be set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the exception handler.

FG and FX (FGH and FXH) will be cleared if an overflow or underflow exception is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).
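The operand-priority rule that evfsadd applies to NaN and infinity inputs can be sketched per element as follows. This is an illustrative Python model, not the hardware algorithm: the function names and the raw 32-bit pattern interface are hypothetical, and pmax and nmax are assumed to be the largest-magnitude normalized single-precision encodings (0x7F7FFFFF and 0xFF7FFFFF). Only the special-operand cases are modeled; the ordinary add, overflow, and underflow paths are left out.

```python
PMAX = 0x7F7FFFFF  # assumed encoding of pmax: exponent 0b11111110, fraction all ones
NMAX = 0xFF7FFFFF  # assumed encoding of nmax: same magnitude, sign bit set


def is_nan_or_inf(bits):
    """True when the 8-bit exponent field of a single-precision pattern is all ones."""
    return ((bits >> 23) & 0xFF) == 0xFF


def special_add_result(a_bits, b_bits):
    """Return the saturated element result, or None if neither operand is NaN/infinity."""
    if is_nan_or_inf(a_bits):      # operand A is checked first; its sign picks pmax or nmax
        return NMAX if (a_bits >> 31) else PMAX
    if is_nan_or_inf(b_bits):      # otherwise the sign of operand B decides
        return NMAX if (b_bits >> 31) else PMAX
    return None                    # ordinary add path, not modeled in this sketch
```

Under these assumptions, adding +infinity (operand A) to -infinity (operand B) yields pmax, because operand A is examined before operand B.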
evfsaddsub
Vector Floating-Point Single-Precision Add / Subtract

evfsaddsub rD,rA,rB

Encoding: bits 0:5 = 0b000100 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 0b01010100010

rD[0:31] ← rA[0:31] +sp rB[0:31]
rD[32:63] ← rA[32:63] -sp rB[32:63]

The high-order single-precision floating-point element of rA is added to the corresponding element of rB, the low-order single-precision floating-point element of rB is subtracted from the corresponding element of rA, and the results are stored in rD. If an element of rA is NaN or infinity, the corresponding result is either pmax (sa==0) or nmax (sa==1). Otherwise, if an element of rB is NaN or infinity, the corresponding result is either pmax (sb==0) or nmax (sb==1). Otherwise, if an overflow occurs, pmax or nmax (as appropriate) is stored in the corresponding element of rD. If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in the corresponding element of rD.

Exceptions: If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV,FINVH] are set appropriately, and SPEFSCR[FGH,FXH,FG,FX] are cleared appropriately. If SPEFSCR[FINVE] is set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs, SPEFSCR[FOVF,FOVFH] are set appropriately, or if an underflow occurs, SPEFSCR[FUNF,FUNFH] are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated.

If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled, and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt is taken, SPEFSCR[FINXS,FINXSH] is set. If the floating-point inexact exception is enabled, an interrupt is taken using the floating-point round interrupt vector.
In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the interrupt handler.

FG and FX (FGH and FXH) are cleared if an overflow or underflow interrupt is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).

evfsaddsubx
Vector Floating-Point Single-Precision Add / Subtract Exchanged

evfsaddsubx rD,rA,rB

Encoding: bits 0:5 = 0b000100 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 0b01010101010

rD[0:31] ← rA[32:63] +sp rB[0:31]
rD[32:63] ← rA[0:31] -sp rB[32:63]

The high-order single-precision floating-point element of rB is added to the low-order element of rA, the low-order single-precision floating-point element of rB is subtracted from the high-order element of rA, and the results are stored in rD. If an element of rA is NaN or infinity, the corresponding result is either pmax (sa==0) or nmax (sa==1). Otherwise, if an element of rB is NaN or infinity, the corresponding result is either pmax (sb==0) or nmax (sb==1). Otherwise, if an overflow occurs, pmax or nmax (as appropriate) is stored in the corresponding element of rD. If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in the corresponding element of rD.

Exceptions: If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV,FINVH] are set appropriately, and SPEFSCR[FGH,FXH,FG,FX] are cleared appropriately. If SPEFSCR[FINVE] is set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs, SPEFSCR[FOVF,FOVFH] are set appropriately, or if an underflow occurs, SPEFSCR[FUNF,FUNFH] are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled, and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt is taken, SPEFSCR[FINXS,FINXSH] is set. If the floating-point inexact exception is enabled, an interrupt is taken using the floating-point round interrupt vector. In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the interrupt handler.

FG and FX (FGH and FXH) are cleared if an overflow or underflow interrupt is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).

evfsaddx
Vector Floating-Point Single-Precision Add Exchanged

evfsaddx rD,rA,rB

Encoding: bits 0:5 = 0b000100 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 0b01010101000

rD[0:31] ← rA[32:63] +sp rB[0:31]
rD[32:63] ← rA[0:31] +sp rB[32:63]

The high-order single-precision floating-point element of rB is added to the low-order element of rA, the low-order single-precision floating-point element of rB is added to the high-order element of rA, and the results are stored in rD. If an element of rA is NaN or infinity, the corresponding result is either pmax or nmax (as appropriate). Otherwise, if an element of rB is NaN or infinity, the corresponding result is either pmax or nmax (as appropriate). Otherwise, if an overflow occurs, pmax or nmax (as appropriate) is stored in the corresponding element of rD. If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in the corresponding element of rD.

Exceptions: If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV,FINVH] are set appropriately, and SPEFSCR[FGH,FXH,FG,FX] are cleared appropriately.
If SPEFSCR[FINVE] is set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs, SPEFSCR[FOVF,FOVFH] are set appropriately, or if an underflow occurs, SPEFSCR[FUNF,FUNFH] are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated.

If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled, and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt is taken, SPEFSCR[FINXS,FINXSH] is set. If the floating-point inexact exception is enabled, an interrupt is taken using the floating-point round interrupt vector. In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the interrupt handler.

FG and FX (FGH and FXH) are cleared if an overflow or underflow interrupt is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).
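The four vector add variants differ only in which elements are paired and whether the low result is a sum or a difference. The sketch below restates the operation lines for evfsadd, evfsaddsub, evfsaddsubx, and evfsaddx in Python, purely to make the element routing easy to compare. It assumes a register is represented as a hypothetical (high, low) tuple of Python floats, so it uses double-precision arithmetic and models none of the saturation, rounding, or status-bit behavior.

```python
# Each register is modeled as a (high, low) tuple: high = bits 0:31, low = bits 32:63.

def evfsadd(a, b):
    """rD_high = rA_high + rB_high; rD_low = rA_low + rB_low."""
    return (a[0] + b[0], a[1] + b[1])


def evfsaddsub(a, b):
    """rD_high = rA_high + rB_high; rD_low = rA_low - rB_low."""
    return (a[0] + b[0], a[1] - b[1])


def evfsaddsubx(a, b):
    """rD_high = rA_low + rB_high; rD_low = rA_high - rB_low (rA elements exchanged)."""
    return (a[1] + b[0], a[0] - b[1])


def evfsaddx(a, b):
    """rD_high = rA_low + rB_high; rD_low = rA_high + rB_low (rA elements exchanged)."""
    return (a[1] + b[0], a[0] + b[1])
```

With rA = (1.0, 2.0) and rB = (10.0, 20.0), the exchanged forms read the low element of rA for the high result, so evfsaddx produces (12.0, 21.0) where evfsadd produces (11.0, 22.0).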
evfscfh
Vector Convert Floating-Point Single-Precision from Half-Precision

evfscfh rD,rB

Encoding: bits 0:5 = 0b000100 | 6:10 = RD | 11:15 = 0b00100 | 16:20 = RB | 21:31 = 0b01010010001

    FP16format f;
    FP32format result;
    fh ← rB[16:31]
    fl ← rB[48:63]

    if (fh_exp = 0) & (fh_frac = 0) then
        resulth ← fh_sign || ³¹0                     // signed zero value
    else if Isa16NaNorInfinity(fh) then
        SPEFSCR[FINVH] ← 1
        resulth ← fh_sign || 0b11111110 || ²³1       // max value
    else if Isa16Denorm(fh) then
        SPEFSCR[FINVH] ← 1
        resulth ← fh_sign || ³¹0
    else
        resulth_sign ← fh_sign
        resulth_exp ← fh_exp - 15 + 127
        resulth_frac ← fh_frac || ¹³0

    if (fl_exp = 0) & (fl_frac = 0) then
        resultl ← fl_sign || ³¹0                     // signed zero value
    else if Isa16NaNorInfinity(fl) then
        SPEFSCR[FINV] ← 1
        resultl ← fl_sign || 0b11111110 || ²³1       // max value
    else if Isa16Denorm(fl) then
        SPEFSCR[FINV] ← 1
        resultl ← fl_sign || ³¹0
    else
        resultl_sign ← fl_sign
        resultl_exp ← fl_exp - 15 + 127
        resultl_frac ← fl_frac || ¹³0

    rD[0:31] = resulth; rD[32:63] = resultl

The half-precision FP number in each element in RB is converted to a single-precision floating-point value and the result is placed into the corresponding element of RD. The rounding mode is not used since this conversion is always exact.

Exceptions: If either element of RB is Infinity, Denorm, or NaN, then the SPEFSCR[FINV, FINVH] bits are set appropriately, and the SPEFSCR[FGH, FXH, FG, FX] bits are cleared. If SPEFSCR[FINVE] is set, an exception is taken, the destination register is not updated, and no other status bits are set.

evfscfsf
Vector Convert Floating-Point Single-Precision from Signed Fraction

evfscfsf rD,rB

Encoding: bits 0:5 = 4 | 6:10 = RD | 11:15 = 0b00000 | 16:20 = RB | 21:31 = 0b01010010011

Description:
    RD[0:31] = CnvtSF32ToFP32(RB[0:31])
    RD[32:63] = CnvtSF32ToFP32(RB[32:63])

Each signed fractional element of rB is converted to a single-precision floating-point value using the current rounding mode and the results are placed into the corresponding elements of rD.
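As a rough illustration of the signed-fraction conversion just described, the sketch below interprets a 32-bit word as a two's-complement fraction scaled by 2^-31 and rounds it to single precision. The Python function name is hypothetical (it only mirrors CnvtSF32ToFP32), the 2^-31 scaling is an assumption about the fractional format, and only round-to-nearest is modeled, since that is what `struct` applies when packing a single-precision value.

```python
import struct


def cnvt_sf32_to_fp32(word):
    """Model of one element: signed 32-bit fraction in [-1, 1) -> nearest single-precision value."""
    word &= 0xFFFFFFFF
    if word & 0x80000000:
        word -= 1 << 32                # two's-complement sign extension
    value = word / 2**31               # exact in double precision (assumed 2^-31 scaling)
    # Packing as 'f' rounds the double to the nearest single-precision value.
    return struct.unpack(">f", struct.pack(">f", value))[0]
```

For example, 0x40000000 converts exactly to 0.5, while 0x7FFFFFFF (1 - 2^-31) cannot be represented in 24 bits of significand and rounds to 1.0 in this model, which is the kind of case where the inexact status would be signaled.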
Exceptions: This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversions are not exact. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result(s). The FGH, FXH, FG and FX bits are properly updated to allow rounding to be performed in the exception handler.

evfscfsi
Vector Convert Floating-Point Single-Precision from Signed Integer

evfscfsi rD,rB

Encoding: bits 0:5 = 4 | 6:10 = RD | 11:15 = 0b00000 | 16:20 = RB | 21:31 = 0b01010010001

Description:
    RD[0:31] = CnvtSI32ToFP32(RB[0:31])
    RD[32:63] = CnvtSI32ToFP32(RB[32:63])

Each signed integer element of rB is converted to the nearest single-precision floating-point value using the current rounding mode and the results are placed into the corresponding element of rD.

Exceptions: This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversions are not exact. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result(s). The FGH, FXH, FG and FX bits are properly updated to allow rounding to be performed in the exception handler.

evfscfuf
Vector Convert Floating-Point Single-Precision from Unsigned Fraction

evfscfuf rD,rB

Encoding: bits 0:5 = 4 | 6:10 = RD | 11:15 = 0b00000 | 16:20 = RB | 21:31 = 0b01010010010

    RD[0:31] = CnvtUF32ToFP32(RB[0:31])
    RD[32:63] = CnvtUF32ToFP32(RB[32:63])

Each unsigned fractional element of rB is converted to a single-precision floating-point value using the current rounding mode and the results are placed into the corresponding elements of rD.

Exceptions: This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversions are not exact.
If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result(s). The FGH, FXH, FG and FX bits are properly updated to allow rounding to be performed in the exception handler.

evfscfui
Vector Convert Floating-Point Single-Precision from Unsigned Integer

evfscfui rD,rB

Encoding: bits 0:5 = 4 | 6:10 = RD | 11:15 = 0b00000 | 16:20 = RB | 21:31 = 0b01010010000

Description:
    RD[0:31] = CnvtUI32ToFP32(RB[0:31])
    RD[32:63] = CnvtUI32ToFP32(RB[32:63])

Each unsigned integer element of rB is converted to the nearest single-precision floating-point value using the current rounding mode and the results are placed into the corresponding elements of rD.

Exceptions: This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversions are not exact. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result(s). The FGH, FXH, FG and FX bits are properly updated to allow rounding to be performed in the exception handler.

evfscmpeq
Vector Floating-Point Single-Precision Compare Equal

evfscmpeq crfD,rA,rB

Encoding: bits 0:5 = 4 | 6:8 = crfD | 9:10 = 0b00 | 11:15 = RA | 16:20 = RB | 21:31 = 0b01010001110

Description:
    ah = RA[0:31]
    al = RA[32:63]
    bh = RB[0:31]
    bl = RB[32:63]
    if (ah == bh) then ch = 1
    else ch = 0
    if (al == bl) then cl = 1
    else cl = 0
    CR[4*crfD:4*crfD+3] = ch || cl || (ch | cl) || (ch & cl)

Each element of rA is compared against the corresponding element of rB. If rA equals rB, the crfD bit is set, otherwise it is cleared. Comparison ignores the sign of 0 (+0 = -0).
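The way the vector compare instructions pack their two per-element results into a 4-bit CR field can be sketched as below. The helper name is hypothetical; it simply evaluates ch || cl || (ch | cl) || (ch & cl) with ch as the most significant bit of the field, which is the bit ordering assumed here.

```python
def cr_field(ch, cl):
    """Pack two 1-bit compare results into the 4-bit field ch || cl || (ch | cl) || (ch & cl)."""
    ch &= 1
    cl &= 1
    return (ch << 3) | (cl << 2) | ((ch | cl) << 1) | (ch & cl)
```

The third bit therefore answers "did either element compare true" and the fourth bit answers "did both elements compare true", so a single conditional branch on the field can test either condition.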
Exceptions: If the contents of either element of RA or RB are Infinity, Denorm, or NaN, the SPEFSCR[FINV, FINVH] bits are set appropriately, and the SPEFSCR[FGH, FXH, FG, FX] bits are cleared appropriately. If Floating-point Invalid Input exceptions are enabled then an exception is taken, and the Condition Register is not updated. Otherwise, the comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly.

evfscmpgt
Vector Floating-Point Single-Precision Compare Greater Than

evfscmpgt crfD,rA,rB

Encoding: bits 0:5 = 4 | 6:8 = crfD | 9:10 = 0b00 | 11:15 = RA | 16:20 = RB | 21:31 = 0b01010001100

Description:
    ah = RA[0:31]
    al = RA[32:63]
    bh = RB[0:31]
    bl = RB[32:63]
    if (ah > bh) then ch = 1
    else ch = 0
    if (al > bl) then cl = 1
    else cl = 0
    CR[4*crfD:4*crfD+3] = ch || cl || (ch | cl) || (ch & cl)

Each element of rA is compared against the corresponding element of rB. If rA is greater than rB, the bit in the crfD is set, otherwise it is cleared. Comparison ignores the sign of 0 (+0 = -0).

Exceptions: If the contents of either element of RA or RB are Infinity, Denorm, or NaN, the SPEFSCR[FINV, FINVH] bits are set appropriately, and the SPEFSCR[FGH, FXH, FG, FX] bits are cleared appropriately. If Floating-point Invalid Input exceptions are enabled then an exception is taken, and the Condition Register is not updated. Otherwise, the comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly.
evfscmplt
Vector Floating-Point Single-Precision Compare Less Than

evfscmplt crfD,rA,rB

Encoding: bits 0:5 = 4 | 6:8 = crfD | 9:10 = 0b00 | 11:15 = RA | 16:20 = RB | 21:31 = 0b01010001101

Description:
    ah = RA[0:31]
    al = RA[32:63]
    bh = RB[0:31]
    bl = RB[32:63]
    if (ah < bh) then ch = 1
    else ch = 0
    if (al < bl) then cl = 1
    else cl = 0
    CR[4*crfD:4*crfD+3] = ch || cl || (ch | cl) || (ch & cl)

Each element of rA is compared against the corresponding element of rB. If rA is less than rB, the bit in the crfD is set, otherwise it is cleared. Comparison ignores the sign of 0 (+0 = -0).

Exceptions: If the contents of either element of RA or RB are Infinity, Denorm, or NaN, the SPEFSCR[FINV, FINVH] bits are set appropriately, and the SPEFSCR[FGH, FXH, FG, FX] bits are cleared appropriately. If Floating-point Invalid Input exceptions are enabled then an exception is taken, and the Condition Register is not updated. Otherwise, the comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly.
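One way to picture a comparison that treats NaNs, infinities, and denorms "as normalized numbers, using their values of 'e' and 'f' directly" is to order the raw bit patterns as sign-magnitude integers. The sketch below does that for a single element of a less-than compare. It is an interpretation offered for illustration, not a statement of how the hardware is built; the function names are hypothetical and the status-bit and exception behavior is not modeled.

```python
def magnitude_key(bits):
    """Map a single-precision bit pattern to a signed integer that preserves ordering.

    The exponent and fraction fields are used as-is, so NaN and infinity order above
    every normalized number and a denorm orders just above zero. +0 and -0 both map to 0.
    """
    mag = bits & 0x7FFFFFFF
    return -mag if (bits >> 31) else mag


def raw_less_than(a_bits, b_bits):
    """Element compare a < b on raw bit patterns, ignoring the sign of zero."""
    return magnitude_key(a_bits) < magnitude_key(b_bits)
```

In this model -0 is not less than +0, and a +infinity pattern (0x7F800000) compares below a +NaN pattern (0x7FC00000) simply because its fraction field is smaller.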
evfscth
Vector Convert Floating-Point Single-Precision to Half-Precision

evfscth rD,rB

Encoding: bits 0:5 = 0b000100 | 6:10 = RD | 11:15 = 0b00100 | 16:20 = RB | 21:31 = 0b01010010101

    FP32format fh, fl;
    FP16format resulth, resultl;
    fh ← rB[0:31]; fl ← rB[32:63]

    if (fh_exp = 0) & (fh_frac = 0) then
        resulth ← fh_sign || ¹⁵0                     // signed zero value
    else if Isa32NaNorInfinity(fh) then
        SPEFSCR[FINVH] ← 1
        resulth ← fh_sign || 0b11110 || ¹⁰1          // max value
    else if Isa32Denorm(fh) then
        SPEFSCR[FINVH] ← 1
        resulth ← fh_sign || ¹⁵0
    else
        unbias ← fh_exp - 127
        if unbias > 15 then
            resulth ← fh_sign || 0b11110 || ¹⁰1      // max value
            SPEFSCR[FOVFH] ← 1
        else if unbias < -14 && (result would not round up to bmin) then
            resulth ← fh_sign || ¹⁵0                 // like-signed zero value
            SPEFSCR[FUNFH] ← 1
        else
            resulth_sign ← fh_sign; resulth_exp ← unbias + 15; resulth_frac ← fh_frac[0:9]
            guard ← fh_frac[10]; sticky ← (fh_frac[11:22] ≠ 0)
            resulth ← Round16(resulth, LOWER, guard, sticky)
            SPEFSCR[FGH] ← guard; SPEFSCR[FXH] ← sticky
            if guard | sticky then SPEFSCR[FINXS] ← 1

    if (fl_exp = 0) & (fl_frac = 0) then
        resultl ← fl_sign || ¹⁵0                     // signed zero value
    else if Isa32NaNorInfinity(fl) then
        SPEFSCR[FINV] ← 1
        resultl ← fl_sign || 0b11110 || ¹⁰1          // max value
    else if Isa32Denorm(fl) then
        SPEFSCR[FINV] ← 1
        resultl ← fl_sign || ¹⁵0
    else
        unbias ← fl_exp - 127
        if unbias > 15 then
            resultl ← fl_sign || 0b11110 || ¹⁰1      // max value
            SPEFSCR[FOVF] ← 1
        else if unbias < -14 && (result would not round up to bmin) then
            resultl ← fl_sign || ¹⁵0                 // like-signed zero value
            SPEFSCR[FUNF] ← 1
        else
            resultl_sign ← fl_sign; resultl_exp ← unbias + 15; resultl_frac ← fl_frac[0:9]
            guard ← fl_frac[10]; sticky ← (fl_frac[11:22] ≠ 0)
            resultl ← Round16(resultl, LOWER, guard, sticky)
            SPEFSCR[FG] ← guard; SPEFSCR[FX] ← sticky
            if guard | sticky then SPEFSCR[FINXS] ← 1

    rD[0:31] = ¹⁶0 || resulth; rD[32:63] = ¹⁶0 || resultl

The single-precision FP number in each element in RB is converted to a half-precision floating-point value using the current rounding mode.
The result is then prepended with 16 zeros, and placed into the corresponding element of RD.

Exceptions: If the contents of either element of rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs, SPEFSCR[FOVF, FOVFH] are set appropriately, or if an underflow occurs, SPEFSCR[FUNF, FUNFH] are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated.

If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled, and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt is taken, SPEFSCR[FINXS, FINXSH] are set. If the floating-point inexact exception is enabled, an interrupt is taken using the floating-point round interrupt vector. In this case, the destination register is updated with the truncated result(s). The FGH, FXH, FG, and FX bits are properly updated to allow rounding to be performed in the interrupt handler.

FG and FX (FGH and FXH) are cleared if an overflow or underflow interrupt is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).
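The per-element conversion flow of evfscth can be sketched in Python as follows. This is an illustrative model under stated assumptions: the function name fp32_to_fp16_truncated is hypothetical, flag names are returned as plain strings rather than written to SPEFSCR, the Round16 step is deliberately omitted so the function returns the truncated result together with the guard and sticky bits, and the "would not round up to bmin" refinement of the underflow test is ignored.

```python
def fp32_to_fp16_truncated(bits: int):
    """Convert one 32-bit single-precision pattern to a 16-bit half pattern.

    Returns (half_bits, flags, guard, sticky). The value is the truncated,
    pre-rounding result; Round16 is not modelled in this sketch.
    """
    sign = (bits >> 31) & 1
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    signed_zero = sign << 15                          # sign || 15 zero bits
    max_value = (sign << 15) | (0b11110 << 10) | 0x3FF

    if exp == 0 and frac == 0:
        return signed_zero, set(), 0, 0
    if exp == 0xFF:                                   # NaN or infinity
        return max_value, {"FINV"}, 0, 0
    if exp == 0:                                      # denormalized input
        return signed_zero, {"FINV"}, 0, 0

    unbias = exp - 127
    if unbias > 15:                                   # too large for half
        return max_value, {"FOVF"}, 0, 0
    if unbias < -14:                                  # assumption: no round-up to bmin
        return signed_zero, {"FUNF"}, 0, 0

    half = (sign << 15) | ((unbias + 15) << 10) | (frac >> 13)
    guard = (frac >> 12) & 1                          # fraction bit 10 (big-endian numbering)
    sticky = int((frac & 0xFFF) != 0)                 # fraction bits 11:22
    flags = {"FINXS"} if (guard | sticky) else set()
    return half, flags, guard, sticky


print(hex(fp32_to_fp16_truncated(0x3F800000)[0]))    # 1.0 as a half-precision pattern
```

The manual numbers fraction bits from the most significant end, so frac[0:9] corresponds to the top ten bits of the 23-bit fraction, which is why the sketch shifts right by 13.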
evfsctsf
Vector Convert Floating-Point Single-Precision to Signed Fraction

evfsctsf rD,rB

Encoding: bits 0:5 = 000100 (opcode 4); 6:10 = RD; 11:15 = 00000; 16:20 = RB; 21:31 = 01010010111

Description:
    ah = RB[0:31]
    if (ah == Denorm) then
        RD[0:31] = 0
    else if ((ah == +0) || (ah == -0))                  // zero cases
        RD[0:31] = 0
    else if (e_ah < 127) then
        RD[0:31] = CnvtFP32ToSF32Sat(ah)
    else if ((e_ah == 127) && (s_ah == 1) && (f_ah == 0)) then
        RD[0:31] = 0x80000000                           // max negative, no overflow
    else if (ah == NAN) then
        RD[0:31] = 0
    else                                                // Overflow
        if (s_ah == 0) then                             // Positive
            RD[0:31] = 0x7FFFFFFF
        else
            RD[0:31] = 0x80000000

    al = RB[32:63]
    if (al == Denorm) then
        RD[32:63] = 0
    else if ((al == +0) || (al == -0))                  // zero cases
        RD[32:63] = 0
    else if (e_al < 127) then
        RD[32:63] = CnvtFP32ToSF32Sat(al)
    else if ((e_al == 127) && (s_al == 1) && (f_al == 0)) then
        RD[32:63] = 0x80000000                          // max negative, no overflow
    else if (al == NAN) then
        RD[32:63] = 0
    else                                                // Overflow
        if (s_al == 0) then                             // Positive
            RD[32:63] = 0x7FFFFFFF
        else
            RD[32:63] = 0x80000000

Each single-precision floating-point element in RB is converted to a signed fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit signed fraction. NaNs are converted as though they were zero.

Exceptions: If either element of RB is Infinity, Denorm, or NaN, or if an overflow occurs, then the SPEFSCR[FINV, FINVH] bits are set appropriately, and the SPEFSCR[FGH, FXH, FG, FX] bits are cleared appropriately. If SPEFSCR[FINVE] is set, an exception is taken, the destination register is not updated, and no other status bits are set.

If either result element of this instruction is inexact and no other exception is taken, the SPEFSCR[FINXS] bit will be set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result.
The FGH, FXH, FG and FX bits are properly updated to allow rounding to be performed in the exception handler.

evfsctsi
Vector Convert Floating-Point Single-Precision to Signed Integer

evfsctsi rD,rB

Encoding: bits 0:5 = 000100 (opcode 4); 6:10 = RD; 11:15 = 00000; 16:20 = RB; 21:31 = 01010010101

Description:
    ah = RB[0:31]
    if (ah == Denorm) then
        RD[0:31] = 0
    else if (e_ah < 158) then
        RD[0:31] = CnvtFP32ToSI32Sat(ah)
    else if ((e_ah == 158) && (s_ah == 1) && (f_ah == 0)) then
        RD[0:31] = 0x80000000                           // max negative, no overflow
    else if (ah == NAN) then
        RD[0:31] = 0
    else                                                // Overflow
        if (s_ah == 0) then                             // Positive
            RD[0:31] = 0x7FFFFFFF
        else
            RD[0:31] = 0x80000000

    al = RB[32:63]
    if (al == Denorm) then
        RD[32:63] = 0
    else if (e_al < 158) then
        RD[32:63] = CnvtFP32ToSI32Sat(al)
    else if ((e_al == 158) && (s_al == 1) && (f_al == 0)) then
        RD[32:63] = 0x80000000                          // max negative, no overflow
    else if (al == NAN) then
        RD[32:63] = 0
    else                                                // Overflow
        if (s_al == 0) then                             // Positive
            RD[32:63] = 0x7FFFFFFF
        else
            RD[32:63] = 0x80000000

Each single-precision floating-point element in RB is converted to a signed integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Exceptions: If the contents of either element of RB are Infinity, Denorm, or NaN, or if an overflow occurs on conversion, then the SPEFSCR[FINV, FINVH] bits are set appropriately, and the SPEFSCR[FGH, FXH, FG, FX] bits are cleared appropriately. If SPEFSCR[FINVE] is set, an exception is taken, the destination register is not updated, and no other status bits are set.

If either result element of this instruction is inexact and no other exception is taken, the SPEFSCR[FINXS] bit will be set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector.
In this case, the destination register is updated with the truncated result. The FGH, FXH, FG and FX bits are properly updated to allow rounding to be performed in the exception handler.

evfsctsiz
Vector Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero

evfsctsiz rD,rB

Encoding: bits 0:5 = 000100 (opcode 4); 6:10 = RD; 11:15 = 00000; 16:20 = RB; 21:31 = 01010011010

Description:
    ah = RB[0:31]
    if (ah == Denorm) then
        RD[0:31] = 0
    else if (e_ah < 158) then
        RD[0:31] = CnvtFP32ToSI32Sat(ah)
    else if ((e_ah == 158) && (s_ah == 1) && (f_ah == 0)) then
        RD[0:31] = 0x80000000                           // max negative, no overflow
    else if (ah == NAN) then
        RD[0:31] = 0
    else                                                // Overflow
        if (s_ah == 0) then                             // Positive
            RD[0:31] = 0x7FFFFFFF
        else
            RD[0:31] = 0x80000000

    al = RB[32:63]
    if (al == Denorm) then
        RD[32:63] = 0
    else if (e_al < 158) then
        RD[32:63] = CnvtFP32ToSI32Sat(al)
    else if ((e_al == 158) && (s_al == 1) && (f_al == 0)) then
        RD[32:63] = 0x80000000                          // max negative, no overflow
    else if (al == NAN) then
        RD[32:63] = 0
    else                                                // Overflow
        if (s_al == 0) then                             // Positive
            RD[32:63] = 0x7FFFFFFF
        else
            RD[32:63] = 0x80000000

Each single-precision floating-point element in RB is converted to a signed integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Exceptions: If either element of RB is Infinity, Denorm, or NaN, or if an overflow occurs, then the SPEFSCR[FINV, FINVH] bits are set appropriately, and the SPEFSCR[FGH, FXH, FG, FX] bits are cleared appropriately. If SPEFSCR[FINVE] is set, an exception is taken, the destination register is not updated, and no other status bits are set.

If either result element of this instruction is inexact and no other exception is taken, the SPEFSCR[FINXS] bit will be set. If the Floating-point Inexact exception is enabled, an exception is taken using the
Floating-point Round exception vector. In this case, the destination register is updated with the truncated result. The FGH, FXH, FG and FX bits are properly updated to allow rounding to be performed in the exception handler.

evfsctuf
Vector Convert Floating-Point Single-Precision to Unsigned Fraction

evfsctuf rD,rB

Encoding: bits 0:5 = 000100 (opcode 4); 6:10 = RD; 11:15 = 00000; 16:20 = RB; 21:31 = 01010010110

Description:
    ah = RB[0:31]
    if (ah == Denorm) then                              // force denorm to zero
        RD[0:31] = 0
    else if ((ah == +0) || (ah == -0))                  // zero cases
        RD[0:31] = 0
    else if (s_ah == 1)                                 // Negative
        RD[0:31] = 0
    else if (e_ah < 127)
        RD[0:31] = CnvtFP32ToUF32Sat(ah)
    else if (ah == NAN) then
        RD[0:31] = 0
    else                                                // Overflow
        RD[0:31] = 0xFFFFFFFF

    al = RB[32:63]
    if (al == Denorm) then
        RD[32:63] = 0
    else if ((al == +0) || (al == -0))                  // zero cases
        RD[32:63] = 0
    else if (s_al == 1)                                 // Negative
        RD[32:63] = 0
    else if (e_al < 127)
        RD[32:63] = CnvtFP32ToUF32Sat(al)
    else if (al == NAN) then
        RD[32:63] = 0
    else                                                // Overflow
        RD[32:63] = 0xFFFFFFFF

Each single-precision floating-point element in RB is converted to an unsigned fraction using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit fraction. NaNs are converted as though they were zero.

Exceptions: If either element of RB is Infinity, Denorm, or NaN, or if an overflow occurs, then the SPEFSCR[FINV, FINVH] bits are set appropriately, and the SPEFSCR[FGH, FXH, FG, FX] bits are cleared appropriately. If SPEFSCR[FINVE] is set, an exception is taken, the destination register is not updated, and no other status bits are set.

If either result element of this instruction is inexact and no other exception is taken, the SPEFSCR[FINXS] bit will be set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector.
In this case, the destination register is updated with the truncated result. The FGH, FXH, FG and FX bits are properly updated to allow rounding to be performed in the exception handler.

evfsctui
Vector Convert Floating-Point Single-Precision to Unsigned Integer

evfsctui rD,rB

Encoding: bits 0:5 = 000100 (opcode 4); 6:10 = RD; 11:15 = 00000; 16:20 = RB; 21:31 = 01010010100

Description:
    ah = RB[0:31]
    if (ah == Denorm) then                              // force denorm to zero
        RD[0:31] = 0
    else if ((ah == +0) || (ah == -0))                  // zero cases
        RD[0:31] = 0
    else if (s_ah == 1)                                 // Negative
        RD[0:31] = 0
    else if (e_ah <= 158)
        RD[0:31] = CnvtFP32ToUI32Sat(ah)
    else if (ah == NAN) then
        RD[0:31] = 0
    else                                                // Overflow
        RD[0:31] = 0xFFFFFFFF

    al = RB[32:63]
    if (al == Denorm) then
        RD[32:63] = 0
    else if ((al == +0) || (al == -0))                  // zero cases
        RD[32:63] = 0
    else if (s_al == 1)                                 // Negative
        RD[32:63] = 0
    else if (e_al <= 158)
        RD[32:63] = CnvtFP32ToUI32Sat(al)
    else if (al == NAN) then
        RD[32:63] = 0
    else                                                // Overflow
        RD[32:63] = 0xFFFFFFFF

Each single-precision floating-point element in RB is converted to an unsigned integer using the current rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Exceptions: If either element of RB is Infinity, Denorm, or NaN, or if an overflow occurs, then the SPEFSCR[FINV, FINVH] bits are set appropriately, and the SPEFSCR[FGH, FXH, FG, FX] bits are cleared appropriately. If SPEFSCR[FINVE] is set, an exception is taken, the destination register is not updated, and no other status bits are set.

If either result element of this instruction is inexact and no other exception is taken, the SPEFSCR[FINXS] bit will be set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector.
In this case, the destination register is updated with the truncated result. The FGH, FXH, FG and FX bits are properly updated to allow rounding to be performed in the exception handler.

evfsctuiz
Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero

evfsctuiz rD,rB

Encoding: bits 0:5 = 000100 (opcode 4); 6:10 = RD; 11:15 = 00000; 16:20 = RB; 21:31 = 01010011000

Description:
    ah = RB[0:31]
    if (ah == Denorm) then                              // force denorm to zero
        RD[0:31] = 0
    else if ((ah == +0) || (ah == -0))                  // zero cases
        RD[0:31] = 0
    else if (s_ah == 1)                                 // Negative
        RD[0:31] = 0
    else if (e_ah <= 158)
        RD[0:31] = CnvtFP32ToUI32Sat(ah)
    else if (ah == NAN) then
        RD[0:31] = 0
    else                                                // Overflow
        RD[0:31] = 0xFFFFFFFF

    al = RB[32:63]
    if (al == Denorm) then
        RD[32:63] = 0
    else if ((al == +0) || (al == -0))                  // zero cases
        RD[32:63] = 0
    else if (s_al == 1)                                 // Negative
        RD[32:63] = 0
    else if (e_al <= 158)
        RD[32:63] = CnvtFP32ToUI32Sat(al)
    else if (al == NAN) then
        RD[32:63] = 0
    else                                                // Overflow
        RD[32:63] = 0xFFFFFFFF

Each single-precision floating-point element in RB is converted to an unsigned integer using the rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted as though they were zero.

Exceptions: If either element of RB is Infinity, Denorm, or NaN, or if an overflow occurs, then the SPEFSCR[FINV, FINVH] bits are set appropriately, and the SPEFSCR[FGH, FXH, FG, FX] bits are cleared appropriately. If SPEFSCR[FINVE] is set, an exception is taken, the destination register is not updated, and no other status bits are set.

If either result element of this instruction is inexact and no other exception is taken, the SPEFSCR[FINXS] bit will be set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector.
In this case, the destination register is updated with the truncated result. The FGH, FXH, FG and FX bits are properly updated to allow rounding to be performed in the exception handler.

evfsdiff
Vector Floating-Point Single-Precision Differences

evfsdiff rD,rA,rB

Encoding: bits 0:5 = 000100 (opcode 4); 6:10 = RD; 11:15 = RA; 16:20 = RB; 21:31 = 01010100101

    rD[0:31] ← rA[0:31] -sp rA[32:63]
    rD[32:63] ← rB[0:31] -sp rB[32:63]

The low-order single-precision floating-point element of rA is subtracted from the high-order element of rA, the low-order single-precision floating-point element of rB is subtracted from the high-order element of rB, and the results are stored in rD. If the high-order element of rA or rB is NaN or infinity, the corresponding result is either pmax or nmax (as appropriate). Otherwise, if the low-order element of rA or rB is NaN or infinity, the corresponding result is either pmax or nmax (as appropriate). Otherwise, if an overflow occurs, pmax or nmax (as appropriate) is stored in the corresponding element of rD. If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in the corresponding element of rD.

Exceptions: If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs, SPEFSCR[FOVF, FOVFH] are set appropriately, or if an underflow occurs, SPEFSCR[FUNF, FUNFH] are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled, and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt is taken, SPEFSCR[FINXS, FINXSH] are set. If the floating-point inexact exception is enabled, an interrupt is taken using the floating-point round interrupt vector. In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the interrupt handler.

FG and FX (FGH and FXH) are cleared if an overflow or underflow interrupt is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).

evfsdiffsum
Vector Floating-Point Single-Precision Difference / Sum

evfsdiffsum rD,rA,rB

Encoding: bits 0:5 = 000100 (opcode 4); 6:10 = RD; 11:15 = RA; 16:20 = RB; 21:31 = 01010100111

    rD[0:31] ← rA[0:31] -sp rA[32:63]
    rD[32:63] ← rB[0:31] +sp rB[32:63]

The low-order single-precision floating-point element of rA is subtracted from the high-order element of rA, the low-order single-precision floating-point element of rB is added to the high-order element of rB, and the results are stored in rD. If the high-order element of rA or rB is NaN or infinity, the corresponding result is either pmax or nmax (as appropriate). Otherwise, if the low-order element of rA or rB is NaN or infinity, the corresponding result is either pmax or nmax (as appropriate). Otherwise, if an overflow occurs, pmax or nmax (as appropriate) is stored in the corresponding element of rD. If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in the corresponding element of rD.

Exceptions: If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately.
If SPEFSCR[FINVE] is set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs, SPEFSCR[FOVF, FOVFH] are set appropriately, or if an underflow occurs, SPEFSCR[FUNF, FUNFH] are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated.

If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled, and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt is taken, SPEFSCR[FINXS, FINXSH] are set. If the floating-point inexact exception is enabled, an interrupt is taken using the floating-point round interrupt vector. In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the interrupt handler.

FG and FX (FGH and FXH) are cleared if an overflow or underflow interrupt is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).

evfsdiv
Vector Floating-Point Single-Precision Divide

evfsdiv rD,rA,rB

Encoding: bits 0:5 = 000100 (opcode 4); 6:10 = RD; 11:15 = RA; 16:20 = RB; 21:31 = 01010001001

    RD[0:31] = RA[0:31] ÷sp RB[0:31]
    RD[32:63] = RA[32:63] ÷sp RB[32:63]

Each single-precision floating-point element of rA is divided by the corresponding element of rB and the result is stored in rD. If RB is a NaN or infinity, the result is a properly signed zero. Otherwise, if RB is a denormalized number or a zero, or if RA is either NaN or infinity, the result is either pmax (sa==sb), or nmax (sa!=sb). Otherwise, if an overflow occurs, then pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, then +0 or -0 (as appropriate) is stored in RD.
Exceptions: If the contents of RA or RB are Infinity, Denorm, or NaN, or if both RA and RB are +/-0, the SPEFSCR[FINV, FINVH] bits are set appropriately, and the SPEFSCR[FGH, FXH, FG, FX] bits are cleared appropriately. If SPEFSCR[FINVE] is set, an exception is taken and the destination register is not updated. Otherwise, if the content of RB is +/-0 and the content of RA is a finite normalized non-zero number, the SPEFSCR[FDBZ, FDBZH] bits are set appropriately. If Floating-point Divide by Zero exceptions are enabled, an exception is then taken. Otherwise, if an overflow occurs, then the SPEFSCR[FOVF, FOVFH] bits are set appropriately, or if an underflow occurs, then the SPEFSCR[FUNF, FUNFH] bits are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding bit is set, an exception is taken. If any of these exceptions are taken, the destination register is not updated.

If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled, and no other exception is taken, or underflows but underflow exceptions are disabled, and no other exception is taken, the SPEFSCR[FINXS] bit will be set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the exception handler.

FG and FX (FGH and FXH) will be cleared if an overflow or underflow exception is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).
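The special-operand rules for evfsdiv are easier to check against a small decision function. The Python sketch below handles one element pair and is illustrative only: the names classify and evfsdiv_special are hypothetical, the bit patterns chosen for pmax and nmax (0x7F7FFFFF and 0xFF7FFFFF, the largest-magnitude normalized values) are an assumption, flags are returned as strings instead of being written to SPEFSCR, and the ordinary divide path (including overflow and underflow) is not modelled.

```python
PMAX, NMAX = 0x7F7FFFFF, 0xFF7FFFFF     # assumed patterns for pmax / nmax


def classify(bits: int) -> str:
    """Classify a 32-bit single-precision pattern by exponent and fraction."""
    exp, frac = (bits >> 23) & 0xFF, bits & 0x7FFFFF
    if exp == 0xFF:
        return "nan_or_inf"
    if exp == 0:
        return "zero" if frac == 0 else "denorm"
    return "norm"


def evfsdiv_special(a: int, b: int):
    """Return (result_bits or None, flag or None) for one element pair.

    A result of None means no special case applies and the ordinary
    divide path (not modelled here) would produce the result.
    """
    ka, kb = classify(a), classify(b)
    sign = (a ^ b) >> 31                 # 0 when sa == sb, 1 otherwise
    special = {"nan_or_inf", "denorm"}

    if ka in special or kb in special or (ka == "zero" and kb == "zero"):
        flag = "FINV"                    # invalid operation/input
    elif kb == "zero" and ka == "norm":
        flag = "FDBZ"                    # divide by zero
    else:
        flag = None

    if kb == "nan_or_inf":
        return sign << 31, flag          # properly signed zero
    if kb in ("denorm", "zero") or ka == "nan_or_inf":
        return (NMAX if sign else PMAX), flag
    return None, flag


print(evfsdiv_special(0x3F800000, 0x00000000))   # 1.0 / +0
```

The order of the two result tests mirrors the prose: a NaN or infinity in RB takes priority over every other special case.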
evfsmadd
Vector Floating-Point Single-Precision Multiply-Add

evfsmadd rD,rA,rB

Encoding: bits 0:5 = 000100 (opcode 4); 6:10 = RD; 11:15 = RA; 16:20 = RB; 21:31 = 01010000010

    RD[0:31] = ((RA[0:31] ×fp RB[0:31]) +sp RD[0:31])
    RD[32:63] = ((RA[32:63] ×fp RB[32:63]) +sp RD[32:63])

Each single-precision floating-point element of rA is multiplied with the corresponding element of rB, the intermediate product is added to the corresponding element of rD, and the result is stored in rD. If RA or RB are either zero or denormalized, the intermediate product is a properly signed zero. Otherwise, if RA or RB are either NaN or infinity, the intermediate product is either pmax (sa==sb), or nmax (sa!=sb), and this value is used for the result and stored into RD. Otherwise, the intermediate product is added to the corresponding element of RD. If RD is NaN or infinity, the result is either pmax (sd==0), or nmax (sd==1). Otherwise, if an overflow occurs, then pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, then +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in RD.

Exceptions: If the contents of either element of RA, RB, or RD are Infinity, Denorm, or NaN, the SPEFSCR[FINV, FINVH] bits are set appropriately, and the SPEFSCR[FGH, FXH, FG, FX] bits are cleared appropriately. If SPEFSCR[FINVE] is set, an exception is taken and the destination register is not updated. Otherwise, if an overflow occurs, then the SPEFSCR[FOVF, FOVFH] bits are set appropriately, or if an underflow occurs, then the SPEFSCR[FUNF, FUNFH] bits are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit is set, an exception is taken. If any of these exceptions are taken, the destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled, and no other exception is taken, or underflows but underflow exceptions are disabled, and no other exception is taken, the SPEFSCR[FINXS] bit will be set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the exception handler.

FG and FX (FGH and FXH) will be cleared if an overflow or underflow exception is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).

evfsmax
Vector Floating-Point Single-Precision Maximum

evfsmax rD,rA,rB

Encoding: bits 0:5 = 000100 (opcode 4); 6:10 = RD; 11:15 = RA; 16:20 = RB; 21:31 = 01010100000

    ah ← rA[0:31]
    bh ← rB[0:31]
    if (ah < bh) then temph ← bh
    else temph ← ah
    if (isnan(ah) & ~(isnan(bh))) then temph ← bh
    if (isnan(bh) & ~(isnan(ah))) then temph ← ah
    rD[0:31] ← temph

    al ← rA[32:63]
    bl ← rB[32:63]
    if (al < bl) then templ ← bl
    else templ ← al
    if (isnan(al) & ~(isnan(bl))) then templ ← bl
    if (isnan(bl) & ~(isnan(al))) then templ ← al
    rD[32:63] ← templ

Each single-precision floating-point element of rA is compared against the corresponding element of rB. The larger element is selected and placed into the corresponding element of rD. The maximum of +0 and -0 is +0.

Exceptions: If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is set, an interrupt is taken, and the destination register is not updated. Otherwise, the comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of ‘e’ and ‘f’ directly.
If one of the elements is a NaN and the other is not, the non-NaN element is selected rather than the comparison result. If the selected element is denorm, the result is a zero of the same sign. If the selected element is +NaN or +infinity, the corresponding result is pmax. Otherwise, if the selected element is -NaN or -infinity, the corresponding result is nmax.

evfsmin
Vector Floating-Point Single-Precision Minimum

evfsmin rD,rA,rB

Encoding: bits 0:5 = 000100 (opcode 4); 6:10 = RD; 11:15 = RA; 16:20 = RB; 21:31 = 01010100001

    ah ← rA[0:31]
    bh ← rB[0:31]
    if (ah < bh) then temph ← ah
    else temph ← bh
    if (isnan(ah) & ~(isnan(bh))) then temph ← bh
    if (isnan(bh) & ~(isnan(ah))) then temph ← ah
    rD[0:31] ← temph

    al ← rA[32:63]
    bl ← rB[32:63]
    if (al < bl) then templ ← al
    else templ ← bl
    if (isnan(al) & ~(isnan(bl))) then templ ← bl
    if (isnan(bl) & ~(isnan(al))) then templ ← al
    rD[32:63] ← templ

Each single-precision floating-point element of rA is compared against the corresponding element of rB. The smaller element is selected and placed into the corresponding element of rD. The minimum of +0 and -0 is -0.

Exceptions: If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is set, an interrupt is taken, and the destination register is not updated. Otherwise, the comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of ‘e’ and ‘f’ directly. If one of the elements is a NaN and the other is not, the non-NaN element is selected rather than the comparison result. If the selected element is denorm, the result is a zero of the same sign. If the selected element is +NaN or +infinity, the corresponding result is pmax. Otherwise, if the selected element is -NaN or -infinity, the corresponding result is nmax.
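The selection and post-processing steps shared by evfsmax and evfsmin can be sketched for one element pair in Python. This is a model under assumptions rather than a statement of the hardware behavior: the names order_key, is_nan, and select_element are hypothetical, the pmax and nmax patterns (0x7F7FFFFF, 0xFF7FFFFF) are assumed, and the comparison is implemented as a sign-magnitude ordering in which -0 sorts below +0 so that the stated results for the +0 / -0 case fall out. SPEFSCR updates and the FINVE interrupt path are not modelled.

```python
PMAX, NMAX = 0x7F7FFFFF, 0xFF7FFFFF     # assumed patterns for pmax / nmax


def order_key(bits: int) -> int:
    """Order raw patterns by sign and magnitude; -0 sorts just below +0."""
    magnitude = bits & 0x7FFFFFFF
    return -magnitude - 1 if bits >> 31 else magnitude


def is_nan(bits: int) -> bool:
    return ((bits >> 23) & 0xFF) == 0xFF and (bits & 0x7FFFFF) != 0


def select_element(a: int, b: int, want_max: bool) -> int:
    """Pick the larger (or smaller) of two elements, then apply the
    denorm / NaN / infinity substitutions described in the text."""
    less = order_key(a) < order_key(b)
    if want_max:
        temp = b if less else a
    else:
        temp = a if less else b
    # A single NaN operand loses to the non-NaN operand.
    if is_nan(a) and not is_nan(b):
        temp = b
    if is_nan(b) and not is_nan(a):
        temp = a
    sign, exp, frac = temp >> 31, (temp >> 23) & 0xFF, temp & 0x7FFFFF
    if exp == 0 and frac != 0:
        return sign << 31                # denorm becomes a like-signed zero
    if exp == 0xFF:
        return NMAX if sign else PMAX    # NaN or infinity saturates
    return temp


print(hex(select_element(0x3F800000, 0x40000000, want_max=True)))   # max(1.0, 2.0)
```

Treating -0 as smaller than +0 is a design choice of this sketch; the pseudocode itself only states the outcome for that pair.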
evfsmsub
Vector Floating-Point Single-Precision Multiply-Subtract

evfsmsub rD,rA,rB

Encoding: bits 0:5 = 000100 (opcode 4); 6:10 = RD; 11:15 = RA; 16:20 = RB; 21:31 = 01010000011

    RD[0:31] = ((RA[0:31] ×fp RB[0:31]) -sp RD[0:31])
    RD[32:63] = ((RA[32:63] ×fp RB[32:63]) -sp RD[32:63])

Each single-precision floating-point element of rA is multiplied with the corresponding element of rB, the corresponding element of rD is subtracted from the intermediate product, and the result is stored in rD. If RA or RB are either zero or denormalized, the intermediate product is a properly signed zero. Otherwise, if RA or RB are either NaN or infinity, the intermediate product is either pmax (sa==sb), or nmax (sa!=sb), and this value is used for the result and stored into RD. Otherwise, the corresponding element of rD is subtracted from the intermediate product. If RD is NaN or infinity, the result is either nmax (sd==0), or pmax (sd==1). Otherwise, if an overflow occurs, then pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, then +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in RD.

Exceptions: If the contents of either element of RA, RB, or RD are Infinity, Denorm, or NaN, the SPEFSCR[FINV, FINVH] bits are set appropriately, and the SPEFSCR[FGH, FXH, FG, FX] bits are cleared appropriately. If SPEFSCR[FINVE] is set, an exception is taken and the destination register is not updated. Otherwise, if an overflow occurs, then the SPEFSCR[FOVF, FOVFH] bits are set appropriately, or if an underflow occurs, then the SPEFSCR[FUNF, FUNFH] bits are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit is set, an exception is taken. If any of these exceptions are taken, the destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled, and no other exception is taken, or underflows but underflow exceptions are disabled, and no other exception is taken, the SPEFSCR[FINXS] bit will be set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the exception handler.

FG and FX (FGH and FXH) will be cleared if an overflow or underflow exception is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).

evfsmul
Vector Floating-Point Single-Precision Multiply

evfsmul rD,rA,rB

Encoding: bits 0:5 = 000100 (opcode 4); 6:10 = RD; 11:15 = RA; 16:20 = RB; 21:31 = 01010001000

    RD[0:31] = RA[0:31] ×sp RB[0:31]
    RD[32:63] = RA[32:63] ×sp RB[32:63]

Each single-precision floating-point element of rA is multiplied with the corresponding element of rB and the result is stored in rD. If RA or RB are either zero or denormalized, the result is a properly signed zero. Otherwise, if RA or RB are either NaN or infinity, the result is either pmax (sa==sb), or nmax (sa!=sb). Otherwise, if an overflow occurs, then pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, then +0 or -0 (as appropriate) is stored in RD.

Exceptions: If the contents of either element of RA or RB are Infinity, Denorm, or NaN, the SPEFSCR[FINV, FINVH] bits are set appropriately, and the SPEFSCR[FGH, FXH, FG, FX] bits are cleared appropriately. If SPEFSCR[FINVE] is set, an exception is taken and the destination register is not updated. Otherwise, if an overflow occurs, then the SPEFSCR[FOVF, FOVFH] bits are set appropriately, or if an underflow occurs, then the SPEFSCR[FUNF, FUNFH] bits are set appropriately.
If either underflow or overflow exceptions are enabled and a corresponding status bit is set, an exception is taken. If any of these exceptions are taken, the destination register is not updated.

If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled, and no other exception is taken, or underflows but underflow exceptions are disabled, and no other exception is taken, the SPEFSCR[FINXS] bit will be set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the exception handler.

FG and FX (FGH and FXH) will be cleared if an overflow or underflow exception is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).

evfsmule
Vector Floating-Point Single-Precision Multiply By Even Element

evfsmule rD,rA,rB

Encoding: bits 0:5 = 000100 (opcode 4); 6:10 = RD; 11:15 = RA; 16:20 = RB; 21:31 = 01010101110

    rD[0:31] ← rA[0:31] ×sp rB[0:31]
    rD[32:63] ← rA[0:31] ×sp rB[32:63]

The single-precision floating-point elements of rB are multiplied by the even (high-order) element of rA, and the results are stored in rD. If an element of rB or the even element of rA is either zero or denormalized, the corresponding result is a properly signed zero. Otherwise, if an element of rB or the even element of rA is either NaN or infinity, the corresponding result is either pmax (asign==bsign), or nmax (asign!=bsign). Otherwise, if an overflow occurs, pmax or nmax (as appropriate) is stored in the corresponding element of rD. If an underflow occurs, +0 or -0 (as appropriate) is stored in the corresponding element of rD.
Exceptions: If the contents of either element of rB or the even element of rA are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs, SPEFSCR[FOVF, FOVFH] are set appropriately, or if an underflow occurs, SPEFSCR[FUNF, FUNFH] are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated.

If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled, and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt is taken, SPEFSCR[FINXS] is set. If the floating-point inexact exception is enabled, an interrupt is taken using the floating-point round interrupt vector. In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the interrupt handler.

FG and FX (FGH and FXH) are cleared if an overflow or underflow exception is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).

evfsmulo
Vector Floating-Point Single-Precision Multiply By Odd Element

evfsmulo rD,rA,rB

Encoding: bits 0:5 = 000100 (opcode 4); 6:10 = RD; 11:15 = RA; 16:20 = RB; 21:31 = 01010101111

    rD[0:31] ← rA[32:63] ×sp rB[0:31]
    rD[32:63] ← rA[32:63] ×sp rB[32:63]

The single-precision floating-point elements of rB are multiplied by the odd (low-order) element of rA, and the results are stored in rD. If an element of rB or the odd element of rA is either zero or denormalized, the corresponding result is a properly signed zero.
Otherwise, if an element of rB or the odd element of rA is either NaN or infinity, the corresponding result is either pmax (asign==bsign), or nmax (asign!=bsign). Otherwise, if an overflow occurs, pmax or nmax (as appropriate) is stored in the corresponding element of rD. If an underflow occurs, +0 or -0 (as appropriate) is stored in the corresponding element of rD.

Exceptions: If the contents of either element of rB or the odd element of rA is Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs, SPEFSCR[FOVF, FOVFH] are set appropriately, or if an underflow occurs, SPEFSCR[FUNF, FUNFH] are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated. If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled, and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt is taken, SPEFSCR[FINXS] is set. If the floating-point inexact exception is enabled, an interrupt is taken using the floating-point round interrupt vector. In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the interrupt handler. FG and FX (FGH and FXH) are cleared if an overflow or underflow exception is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).

evfsmulx
Vector Floating-Point Single-Precision Multiply Exchanged

evfsmulx rD,rA,rB

Bits    0:5     6:10  11:15  16:20  21:31
Field   000100  RD    RA     RB     01010101100

rD0:31 = rA32:63 ×sp rB0:31
rD32:63 = rA0:31 ×sp rB32:63

The high-order single-precision floating-point element of rB is multiplied by the low-order element of rA, the low-order single-precision floating-point element of rB is multiplied by the high-order element of rA, and the results are stored in rD. If an element of rA or rB is either zero or denormalized, the corresponding result is a properly signed zero. Otherwise, if an element of rA or rB is either NaN or infinity, the corresponding result is either pmax (asign==bsign), or nmax (asign!=bsign). Otherwise, if an overflow occurs, pmax or nmax (as appropriate) is stored in the corresponding element of rD. If an underflow occurs, +0 or -0 (as appropriate) is stored in the corresponding element of rD.

Exceptions: If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs, SPEFSCR[FOVF, FOVFH] are set appropriately, or if an underflow occurs, SPEFSCR[FUNF, FUNFH] are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated. If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled, and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt is taken, SPEFSCR[FINXS] is set. If the floating-point inexact exception is enabled, an interrupt is taken using the floating-point round interrupt vector.
In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the interrupt handler. FG and FX (FGH and FXH) are cleared if an overflow or underflow exception is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).

evfsnabs
Vector Floating-Point Single-Precision Negative Absolute Value

evfsnabs rD,rA

Bits    0:5  6:10  11:15  16:20  21:31
Field   4    RD    RA     00000  01010000101

RD0:31 = 0b1 || RA1:31
RD32:63 = 0b1 || RA33:63

Description: The sign bit of each element in RA is set to 1 and the results are placed into RD.

Exceptions: If the contents of either element of RA are Infinity, Denorm, or NaN, the SPEFSCR[FINV, FINVH] bits are set appropriately, and the SPEFSCR[FGH, FXH, FG, FX] bits are cleared appropriately. If Floating-point Invalid Input exceptions are enabled then an exception is taken, and the destination register is not updated.

evfsneg
Vector Floating-Point Single-Precision Negate

evfsneg rD,rA

Bits    0:5  6:10  11:15  16:20  21:31
Field   4    RD    RA     00000  01010000110

RD0:31 = ¬RA0 || RA1:31
RD32:63 = ¬RA32 || RA33:63

Description: The sign bit of each element in RA is complemented and the results are placed into RD.

Exceptions: If the contents of either element of RA are Infinity, Denorm, or NaN, the SPEFSCR[FINV, FINVH] bits are set appropriately, and the SPEFSCR[FGH, FXH, FG, FX] bits are cleared appropriately. If Floating-point Invalid Input exceptions are enabled then an exception is taken, and the destination register is not updated.

evfsnmadd
Vector Floating-Point Single-Precision Negative Multiply-Add

evfsnmadd rD,rA,rB

Bits    0:5  6:10  11:15  16:20  21:31
Field   4    RD    RA     RB     01010001010

RD0:31 = -((RA0:31 ×fp RB0:31) +sp RD0:31)
RD32:63 = -((RA32:63 ×fp RB32:63) +sp RD32:63)

Each single-precision floating-point element of rA is multiplied with the corresponding element of rB, the intermediate product is added to the corresponding element of rD, and the negated result is stored in rD. If RA or RB are either zero or denormalized, the intermediate product is a properly signed zero. Otherwise, if RA or RB are either NaN or infinity, the intermediate product is either pmax (sa==sb), or nmax (sa!=sb), and this value is used for the result and stored into RD. Otherwise, the intermediate product is added to the corresponding element of RD, and the final result is negated. If RD is NaN or infinity, the result is either nmax (sd==0), or pmax (sd==1). Otherwise, if an overflow occurs, then pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, then -0 (for rounding modes RN, RZ, RP) or +0 (for rounding mode RM) is stored in RD.

Exceptions: If the contents of either element of RA, RB, or RD are Infinity, Denorm, or NaN, the SPEFSCR[FINV, FINVH] bits are set appropriately, and the SPEFSCR[FGH, FXH, FG, FX] bits are cleared appropriately. If SPEFSCR[FINVE] is set, an exception is taken and the destination register is not updated. Otherwise, if an overflow occurs, then the SPEFSCR[FOVF, FOVFH] bits are set appropriately, or if an underflow occurs, then the SPEFSCR[FUNF, FUNFH] bits are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit is set, an exception is taken. If any of these exceptions are taken, the destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled, and no other exception is taken, or underflows but underflow exceptions are disabled, and no other exception is taken, the SPEFSCR[FINXS] bit will be set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the exception handler. FG and FX (FGH and FXH) will be cleared if an overflow or underflow exception is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).

evfsnmsub
Vector Floating-Point Single-Precision Negative Multiply-Subtract

evfsnmsub rD,rA,rB

Bits    0:5  6:10  11:15  16:20  21:31
Field   4    RD    RA     RB     01010001011

RD0:31 = -((RA0:31 ×fp RB0:31) -sp RD0:31)
RD32:63 = -((RA32:63 ×fp RB32:63) -sp RD32:63)

Each single-precision floating-point element of rA is multiplied with the corresponding element of rB, the corresponding element of rD is subtracted from the intermediate product, and the negated result is stored in rD. If RA or RB are either zero or denormalized, the intermediate product is a properly signed zero. Otherwise, if RA or RB are either NaN or infinity, the intermediate product is either pmax (sa==sb), or nmax (sa!=sb), and this value is negated to obtain the result and is stored into RD. Otherwise, the corresponding element of rD is subtracted from the intermediate product, and the final result is negated. If RD is NaN or infinity, the final result is either pmax (sd==0), or nmax (sd==1). Otherwise, if an overflow occurs, then pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, then -0 (for rounding modes RN, RZ, RP) or +0 (for rounding mode RM) is stored in RD.
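The order in which the evfsnmsub description checks for special operands can be sketched as below. This is one reading of the text for a single element, not a statement of the hardware logic: the function name is hypothetical, and the sketch returns None whenever the ordinary calculation (including overflow and underflow handling) would apply.

```python
PMAX = 0x7F7FFFFF   # largest positive normalized single-precision value
NMAX = 0xFF7FFFFF   # largest-magnitude negative normalized value

def classify(bits):
    """Classify a 32-bit pattern as zero, denorm, inf, nan, or norm."""
    exp, frac = (bits >> 23) & 0xFF, bits & 0x7FFFFF
    if exp == 0:
        return "zero" if frac == 0 else "denorm"
    if exp == 0xFF:
        return "inf" if frac == 0 else "nan"
    return "norm"

def nmsub_special(a, b, d):
    """Saturated result bits for the special cases, else None.

    a, b, d are the 32-bit patterns of one element of RA, RB, and RD.
    """
    ka, kb, kd = classify(a), classify(b), classify(d)
    # A zero or denormalized multiplicand makes the product a signed zero,
    # so the NaN/infinity test on RA and RB is skipped in that case.
    product_is_zero = ka in ("zero", "denorm") or kb in ("zero", "denorm")
    if not product_is_zero and (ka in ("inf", "nan") or kb in ("inf", "nan")):
        same_sign = ((a ^ b) >> 31) == 0
        # The product is pmax (same signs) or nmax; the result is its negation.
        return NMAX if same_sign else PMAX
    if kd in ("inf", "nan"):
        # -(product - RD) takes the sign of RD: pmax if sd==0, nmax if sd==1.
        return PMAX if (d >> 31) == 0 else NMAX
    return None  # ordinary calculation applies
```

For instance, nmsub_special(0x7F800000, 0x40000000, 0x3F800000) models +infinity times 2.0: the intermediate product saturates to pmax, so the negated result is nmax.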
Exceptions: If the contents of either element of RA, RB, or RD are Infinity, Denorm, or NaN, the SPEFSCR[FINV, FINVH] bits are set appropriately, and the SPEFSCR[FGH, FXH, FG, FX] bits are cleared appropriately. If SPEFSCR[FINVE] is set, an exception is taken and the destination register is not updated. Otherwise, if an overflow occurs, then the SPEFSCR[FOVF, FOVFH] bits are set appropriately, or if an underflow occurs, then the SPEFSCR[FUNF, FUNFH] bits are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit is set, an exception is taken. If any of these exceptions are taken, the destination register is not updated. If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled, and no other exception is taken, or underflows but underflow exceptions are disabled, and no other exception is taken, the SPEFSCR[FINXS] bit will be set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the exception handler. FG and FX (FGH and FXH) will be cleared if an overflow or underflow exception is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).

evfssqrt
Vector Floating-Point Single-Precision Square Root

evfssqrt rD,rA

Bits    0:5     6:10  11:15  16:20  21:31
Field   000100  RD    RA     00000  01010000111

rD0:31 = SQRT(rA0:31)
rD32:63 = SQRT(rA32:63)

The square root of each single-precision floating-point element of rA is calculated, and the results are stored in rD. If an element of rA is zero or denorm, the corresponding result is a zero with the same sign. If an element of rA is +NaN or +infinity, the corresponding result is pmax.
Otherwise, if an element of rA is non-zero and has a negative sign, including -NaN or -infinity, the corresponding result is -0. Otherwise, if an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in the corresponding element of rD.

Exceptions: If the contents of either element of rA are non-zero and have a negative sign, or are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is set, an interrupt is taken and the destination register is not updated. Otherwise, if an underflow occurs, SPEFSCR[FUNF, FUNFH] are set appropriately. If underflow exceptions are enabled and a corresponding status bit is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated. If either result element of this instruction is inexact, or underflows but underflow exceptions are disabled, and no other interrupt is taken, SPEFSCR[FINXS, FINXSH] are set. If the floating-point inexact exception is enabled, an interrupt is taken using the floating-point round interrupt vector. In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the interrupt handler. FG and FX (FGH and FXH) are cleared if an underflow interrupt is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).

evfssub
Vector Floating-Point Single-Precision Subtract

evfssub rD,rA,rB

Bits    0:5  6:10  11:15  16:20  21:31
Field   4    RD    RA     RB     01010000001

RD0:31 = RA0:31 -sp RB0:31
RD32:63 = RA32:63 -sp RB32:63

Description: Each single-precision floating-point element of RB is subtracted from the corresponding element of RA and the results are stored in RD. If RA is NaN or infinity, the result is either pmax (sa==0), or nmax (sa==1).
Otherwise, if RB is NaN or infinity, the result is either nmax (sb==0), or pmax (sb==1). Otherwise, if an overflow occurs, then pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, then +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in RD.

Exceptions: If the contents of either element of RA or RB are Infinity, Denorm, or NaN, the SPEFSCR[FINV, FINVH] bits are set appropriately, and the SPEFSCR[FGH, FXH, FG, FX] bits are cleared appropriately. If SPEFSCR[FINVE] is set, an exception is taken and the destination register is not updated. Otherwise, if an overflow occurs, then the SPEFSCR[FOVF, FOVFH] bits are set appropriately, or if an underflow occurs, then the SPEFSCR[FUNF, FUNFH] bits are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit is set, an exception is taken. If any of these exceptions are taken, the destination register is not updated. If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled, and no other exception is taken, or underflows but underflow exceptions are disabled, and no other exception is taken, the SPEFSCR[FINXS] bit will be set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the exception handler. FG and FX (FGH and FXH) will be cleared if an overflow or underflow exception is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).

evfssubadd
Vector Floating-Point Single-Precision Subtract / Add

evfssubadd rD,rA,rB

Bits    0:5     6:10  11:15  16:20  21:31
Field   000100  RD    RA     RB     01010100011

rD0:31 = rA0:31 -sp rB0:31
rD32:63 = rA32:63 +sp rB32:63

The high-order single-precision floating-point element of rB is subtracted from the corresponding element of rA, the low-order single-precision floating-point element of rB is added to the corresponding element of rA, and the results are stored in rD. If an element of rA is NaN or infinity, the corresponding result is either pmax or nmax (as appropriate). Otherwise, if an element of rB is NaN or infinity, the corresponding result is either nmax or pmax (as appropriate). Otherwise, if an overflow occurs, pmax or nmax (as appropriate) is stored in the corresponding element of rD. If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in the corresponding element of rD.

Exceptions: If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs, SPEFSCR[FOVF, FOVFH] are set appropriately, or if an underflow occurs, SPEFSCR[FUNF, FUNFH] are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated. If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled, and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt is taken, SPEFSCR[FINXS] is set. If the floating-point inexact exception is enabled, an interrupt is taken using the floating-point round interrupt vector.
In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the interrupt handler. FG and FX (FGH and FXH) are cleared if an overflow or underflow interrupt is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).

evfssubaddx
Vector Floating-Point Single-Precision Subtract / Add Exchanged

evfssubaddx rD,rA,rB

Bits    0:5     6:10  11:15  16:20  21:31
Field   000100  RD    RA     RB     01010101011

rD0:31 = rA32:63 -sp rB0:31
rD32:63 = rA0:31 +sp rB32:63

The high-order single-precision floating-point element of rB is subtracted from the low-order element of rA, the low-order single-precision floating-point element of rB is added to the high-order element of rA, and the results are stored in rD. If an element of rA is NaN or infinity, the corresponding result is either pmax or nmax (as appropriate). Otherwise, if an element of rB is NaN or infinity, the corresponding result is either nmax or pmax (as appropriate). Otherwise, if an overflow occurs, pmax or nmax (as appropriate) is stored in the corresponding element of rD. If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in the corresponding element of rD.

Exceptions: If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs, SPEFSCR[FOVF, FOVFH] are set appropriately, or if an underflow occurs, SPEFSCR[FUNF, FUNFH] are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit is set, an interrupt is taken.
If any of these interrupts are taken, the destination register is not updated. If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled, and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt is taken, SPEFSCR[FINXS] is set. If the floating-point inexact exception is enabled, an interrupt is taken using the floating-point round interrupt vector. In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the interrupt handler. FG and FX (FGH and FXH) are cleared if an overflow or underflow interrupt is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).

evfssubx
Vector Floating-Point Single-Precision Subtract Exchanged

evfssubx rD,rA,rB

Bits    0:5     6:10  11:15  16:20  21:31
Field   000100  RD    RA     RB     01010101001

rD0:31 = rA32:63 -sp rB0:31
rD32:63 = rA0:31 -sp rB32:63

The high-order single-precision floating-point element of rB is subtracted from the low-order element of rA, the low-order single-precision floating-point element of rB is subtracted from the high-order element of rA, and the results are stored in rD. If an element of rA is NaN or infinity, the corresponding result is either pmax or nmax (as appropriate). Otherwise, if an element of rB is NaN or infinity, the corresponding result is either nmax or pmax (as appropriate). Otherwise, if an overflow occurs, pmax or nmax (as appropriate) is stored in the corresponding element of rD. If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in the corresponding element of rD.
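The element exchange performed by evfssubx can be sketched in Python as follows. This is a minimal model under stated assumptions, not the hardware implementation: all function names are hypothetical, status flags and interrupts are omitted, round-to-nearest is assumed (so an underflow yields +0), the "as appropriate" saturation signs are taken from the evfssub description (sign of rA, or inverted sign of rB), and denormalized inputs are treated as same-signed zeros in line with Table 5-2.

```python
import struct

PMAX, NMAX = 0x7F7FFFFF, 0xFF7FFFFF   # max positive / negative normalized
MIN_NORMAL = 2.0 ** -126               # smallest positive normalized magnitude

def f2b(x):
    """Bit pattern of a Python float rounded to single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def b2f(bits):
    """Python float for a 32-bit single-precision bit pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def is_nan_or_inf(bits):
    return ((bits >> 23) & 0xFF) == 0xFF

def flush(bits):
    """Treat a denormalized input as a zero of the same sign (assumption)."""
    return (bits & 0x80000000) if ((bits >> 23) & 0xFF) == 0 else bits

def sub_element(a, b):
    """a - b for one element, with the saturating special cases."""
    if is_nan_or_inf(a):
        return NMAX if (a >> 31) else PMAX    # sign of a
    if is_nan_or_inf(b):
        return PMAX if (b >> 31) else NMAX    # inverted sign of b
    diff = b2f(flush(a)) - b2f(flush(b))
    if diff != 0.0 and abs(diff) < MIN_NORMAL:
        return 0x00000000                     # underflow: +0 for RN, RZ, RP
    try:
        return f2b(diff)
    except OverflowError:                     # too large for single precision
        return NMAX if diff < 0 else PMAX

def evfssubx(ra, rb):
    """ra, rb are 64-bit integers; bits 0:31 are the high-order word."""
    a_hi, a_lo = (ra >> 32) & 0xFFFFFFFF, ra & 0xFFFFFFFF
    b_hi, b_lo = (rb >> 32) & 0xFFFFFFFF, rb & 0xFFFFFFFF
    hi = sub_element(a_lo, b_hi)   # rD0:31  = rA32:63 - rB0:31
    lo = sub_element(a_hi, b_lo)   # rD32:63 = rA0:31  - rB32:63
    return (hi << 32) | lo

def pack2(hi, lo):
    """Build a 64-bit register value from two Python floats."""
    return (f2b(hi) << 32) | f2b(lo)
```

With rA = pack2(10.0, 1.0) and rB = pack2(0.25, 4.0), the model returns pack2(0.75, 6.0): the high result uses the low element of rA, and the low result uses the high element of rA.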
Exceptions: If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs, SPEFSCR[FOVF, FOVFH] are set appropriately, or if an underflow occurs, SPEFSCR[FUNF, FUNFH] are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated. If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled, and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt is taken, SPEFSCR[FINXS] is set. If the floating-point inexact exception is enabled, an interrupt is taken using the floating-point round interrupt vector. In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the interrupt handler. FG and FX (FGH and FXH) are cleared if an overflow or underflow interrupt is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).

evfssum
Vector Floating-Point Single-Precision Sums

evfssum rD,rA,rB

Bits    0:5     6:10  11:15  16:20  21:31
Field   000100  RD    RA     RB     01010100100

rD0:31 = rA0:31 +sp rA32:63
rD32:63 = rB0:31 +sp rB32:63

The high-order single-precision floating-point element of rA is added to the low-order element of rA, the high-order single-precision floating-point element of rB is added to the low-order element of rB, and the results are stored in rD. If the high-order element of rA or rB is NaN or infinity, the corresponding result is either pmax or nmax (as appropriate).
Otherwise, if the low-order element of rA or rB is NaN or infinity, the corresponding result is either pmax or nmax (as appropriate). Otherwise, if an overflow occurs, pmax or nmax (as appropriate) is stored in the corresponding element of rD. If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in the corresponding element of rD.

Exceptions: If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs, SPEFSCR[FOVF, FOVFH] are set appropriately, or if an underflow occurs, SPEFSCR[FUNF, FUNFH] are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated. If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled, and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt is taken, SPEFSCR[FINXS, FINXSH] are set. If the floating-point inexact exception is enabled, an interrupt is taken using the floating-point round interrupt vector. In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the interrupt handler. FG and FX (FGH and FXH) are cleared if an overflow or underflow interrupt is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).

evfssumdiff
Vector Floating-Point Single-Precision Sum / Difference

evfssumdiff rD,rA,rB

Bits    0:5     6:10  11:15  16:20  21:31
Field   000100  RD    RA     RB     01010100110

rD0:31 = rA0:31 +sp rA32:63
rD32:63 = rB0:31 -sp rB32:63

The high-order single-precision floating-point element of rA is added to the low-order element of rA, the low-order single-precision floating-point element of rB is subtracted from the high-order element of rB, and the results are stored in rD. If the high-order element of rA or rB is NaN or infinity, the corresponding result is either pmax or nmax (as appropriate). Otherwise, if the low-order element of rA or rB is NaN or infinity, the corresponding result is either pmax or nmax (as appropriate). Otherwise, if an overflow occurs, pmax or nmax (as appropriate) is stored in the corresponding element of rD. If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or -0 (for rounding mode RM) is stored in the corresponding element of rD.

Exceptions: If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs, SPEFSCR[FOVF, FOVFH] are set appropriately, or if an underflow occurs, SPEFSCR[FUNF, FUNFH] are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated. If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled, and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt is taken, SPEFSCR[FINXS, FINXSH] are set. If the floating-point inexact exception is enabled, an interrupt is taken using the floating-point round interrupt vector.
In this case, the destination register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the interrupt handler. FG and FX (FGH and FXH) are cleared if an overflow or underflow interrupt is taken, or if an invalid operation/input error is signaled for the low (high) element (regardless of FINVE).

evfststeq
Vector Floating-Point Single-Precision Test Equal

evfststeq crfD,rA,rB

Bits    0:5  6:8   9:10  11:15  16:20  21:31
Field   4    crfD  00    RA     RB     01010011110

Description:
ah = RA0:31
al = RA32:63
bh = RB0:31
bl = RB32:63
if (ah == bh) then ch = 1 else ch = 0
if (al == bl) then cl = 1 else cl = 0
CR4*crfD:4*crfD+3 = ch || cl || (ch | cl) || (ch & cl)

Each element of rA is compared against the corresponding element of rB. If an element of rA equals the corresponding element of rB, the corresponding bit in crfD is set; otherwise it is cleared. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly. No exceptions are taken during the execution of evfststeq. If strict IEEE 754 compliance is required, the program should use evfscmpeq.

Implementation note: In an implementation, the execution of evfststeq is likely to be faster than the execution of evfscmpeq.

evfststgt
Vector Floating-Point Single-Precision Test Greater Than

evfststgt crfD,rA,rB

Bits    0:5  6:8   9:10  11:15  16:20  21:31
Field   4    crfD  00    RA     RB     01010011100

Description:
ah = RA0:31
al = RA32:63
bh = RB0:31
bl = RB32:63
if (ah > bh) then ch = 1 else ch = 0
if (al > bl) then cl = 1 else cl = 0
CR4*crfD:4*crfD+3 = ch || cl || (ch | cl) || (ch & cl)

Each element of rA is compared against the corresponding element of rB. If an element of rA is greater than the corresponding element of rB, the corresponding bit in crfD is set; otherwise it is cleared. Comparison ignores the sign of 0 (+0 = -0).
The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly. No exceptions are taken during the execution of evfststgt. If strict IEEE 754 compliance is required, the program should use evfscmpgt.

Implementation note: In an implementation, the execution of evfststgt is likely to be faster than the execution of evfscmpgt.

evfststlt
Vector Floating-Point Single-Precision Test Less Than

evfststlt crfD,rA,rB

Bits    0:5  6:8   9:10  11:15  16:20  21:31
Field   4    crfD  00    RA     RB     01010011101

Description:
ah = RA0:31
al = RA32:63
bh = RB0:31
bl = RB32:63
if (ah < bh) then ch = 1 else ch = 0
if (al < bl) then cl = 1 else cl = 0
CR4*crfD:4*crfD+3 = ch || cl || (ch | cl) || (ch & cl)

Each element of rA is compared with the corresponding element of rB. If an element of rA is less than the corresponding element of rB, the corresponding bit in crfD is set; otherwise it is cleared. Comparison ignores the sign of 0 (+0 = -0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of 'e' and 'f' directly. No exceptions are taken during the execution of evfststlt. If strict IEEE 754 compliance is required, the program should use evfscmplt.

Implementation note: In an implementation, the execution of evfststlt is likely to be faster than the execution of evfscmplt.

5.4 Embedded floating-point results summary

The following table summarizes the results of floating-point operations on various combinations of input operands. Flag settings are performed on appropriate element flags.

Table 5-2. Floating-point results summary - add, sub, mul, div

Operation  Operand A  Operand B  Result       FINV  FOVF  FUNF  FDBZ  FINX

Add
Add        ∞          ∞          amax         1     0     0     0     0
Add        ∞          NaN        amax         1     0     0     0     0
Add        ∞          denorm     amax         1     0     0     0     0
Add        ∞          zero       amax         1     0     0     0     0
Add        ∞          norm       amax         1     0     0     0     0
Add        NaN        ∞          amax         1     0     0     0     0
Add        NaN        NaN        amax         1     0     0     0     0
Add        NaN        denorm     amax         1     0     0     0     0
Add        NaN        zero       amax         1     0     0     0     0
Add        NaN        norm       amax         1     0     0     0     0
Add        denorm     ∞          bmax         1     0     0     0     0
Add        denorm     NaN        bmax         1     0     0     0     0
Add        denorm     denorm     zero(1)      1     0     0     0     0
Add        denorm     zero       zero(1)      1     0     0     0     0
Add        denorm     norm       operand_b    1     0     0     0     0
Add        zero       ∞          bmax         1     0     0     0     0
Add        zero       NaN        bmax         1     0     0     0     0
Add        zero       denorm     zero(1)      1     0     0     0     0
Add        zero       zero       zero(1)      0     0     0     0     0
Add        zero       norm       operand_b    0     0     0     0     0
Add        norm       ∞          bmax         1     0     0     0     0
Add        norm       NaN        bmax         1     0     0     0     0
Add        norm       denorm     operand_a    1     0     0     0     0
Add        norm       zero       operand_a    0     0     0     0     0
Add        norm       norm       _Calc_       0     *     *     0     *

Subtract
Sub        ∞          ∞          amax         1     0     0     0     0
Sub        ∞          NaN        amax         1     0     0     0     0
Sub        ∞          denorm     amax         1     0     0     0     0
Sub        ∞          zero       amax         1     0     0     0     0
Sub        ∞          norm       amax         1     0     0     0     0
Sub        NaN        ∞          amax         1     0     0     0     0
Sub        NaN        NaN        amax         1     0     0     0     0
Sub        NaN        denorm     amax         1     0     0     0     0
Sub        NaN        zero       amax         1     0     0     0     0
Sub        NaN        norm       amax         1     0     0     0     0
Sub        denorm     ∞          -bmax        1     0     0     0     0
Sub        denorm     NaN        -bmax        1     0     0     0     0
Sub        denorm     denorm     zero(2)      1     0     0     0     0
Sub        denorm     zero       zero(2)      1     0     0     0     0
Sub        denorm     norm       -operand_b   1     0     0     0     0
Sub        zero       ∞          -bmax        1     0     0     0     0
Sub        zero       NaN        -bmax        1     0     0     0     0
Sub        zero       denorm     zero(2)      1     0     0     0     0
Sub        zero       zero       zero(2)      0     0     0     0     0
Sub        zero       norm       -operand_b   0     0     0     0     0
Sub        norm       ∞          -bmax        1     0     0     0     0
Sub        norm       NaN        -bmax        1     0     0     0     0
Sub        norm       denorm     operand_a    1     0     0     0     0
Sub        norm       zero       operand_a    0     0     0     0     0
Sub        norm       norm       _Calc_       0     *     *     0     *

Multiply (note 3)
Mul        ∞          ∞          max          1     0     0     0     0
Mul        ∞          NaN        max          1     0     0     0     0
Mul        ∞          denorm     zero         1     0     0     0     0
Mul        ∞          zero       zero         1     0     0     0     0
Mul        ∞          norm       max          1     0     0     0     0
Mul        NaN        ∞          max          1     0     0     0     0
Mul        NaN        NaN        max          1     0     0     0     0
Mul        NaN        denorm     zero         1     0     0     0     0
Mul        NaN        zero       zero         1     0     0     0     0
Mul        NaN        norm       max          1     0     0     0     0
Mul        denorm     ∞          zero         1     0     0     0     0
Mul        denorm     NaN        zero         1     0     0     0     0
Mul        denorm     denorm     zero         1     0     0     0     0
Mul        denorm     zero       zero         1     0     0     0     0
Mul        denorm     norm       zero         1     0     0     0     0
Mul        zero       ∞          zero         1     0     0     0     0
Mul        zero       NaN        zero         1     0     0     0     0
Mul        zero       denorm     zero         1     0     0     0     0
Mul        zero       zero       zero         0     0     0     0     0
Mul        zero       norm       zero         0     0     0     0     0
Mul        norm       ∞          max          1     0     0     0     0
Mul        norm       NaN        max          1     0     0     0     0
Mul        norm       denorm     zero         1     0     0     0     0
Mul        norm       zero       zero         0     0     0     0     0
Mul        norm       norm       _Calc_       0     *     *     0     *

Divide (note 3)
Div        ∞          ∞          zero         1     0     0     0     0
Div        ∞          NaN        zero         1     0     0     0     0
Div        ∞          denorm     max          1     0     0     0     0
Div        ∞          zero       max          1     0     0     0     0
Div        ∞          norm       max          1     0     0     0     0
Div        NaN        ∞          zero         1     0     0     0     0
Div        NaN        NaN        zero         1     0     0     0     0
Div        NaN        denorm     max          1     0     0     0     0
Div        NaN        zero       max          1     0     0     0     0
Div        NaN        norm       max          1     0     0     0     0
Div        denorm     ∞          zero         1     0     0     0     0
Div        denorm     NaN        zero         1     0     0     0     0
Div        denorm     denorm     max          1     0     0     0     0
Div        denorm     zero       max          1     0     0     0     0
Div        denorm     norm       zero         1     0     0     0     0
Div        zero       ∞          zero         1     0     0     0     0
Div        zero       NaN        zero         1     0     0     0     0
Div        zero       denorm     max          1     0     0     0     0
Div        zero       zero       max          1     0     0     0     0
Div        zero       norm       zero         0     0     0     0     0
Div        norm       ∞          zero         1     0     0     0     0
Div        norm       NaN        zero         1     0     0     0     0
Div        norm       denorm     max          1     0     0     0     0
Div        norm       zero       max          0     0     0     1     0
Div        norm       norm       _Calc_       0     *     *     0     *

Notes: the following definitions apply
1 - sign of result is positive when sign_a and sign_b are different for all rounding modes except round to minus infinity, where it is negative.
2 - sign of result is positive when sign_a and sign_b are the same for all rounding modes except round to minus infinity, where it is negative.
3 - sign of result is always (sign_a XOR sign_b)
* - updated according to results of calculation
_Calc_ - result is updated with the results of calculation
max - max normalized number with sign of (sign_a XOR sign_b)
amax - max normalized number with sign of sign_a
bmax - max normalized number with sign of sign_b
nmax - max negative normalized number
pmax - max positive normalized number

Table 5-3. Floating-point results summary - madd, msub, nmadd, nmsub

Operation  Operand A  Operand B                    Operand D                    Result     FINV  FOVF  FUNF  FDBZ  FINX

madd
madd       ∞, NaN     ∞, NaN, norm                 ∞, NaN, denorm, zero, norm   abmax      1     0     0     0     0
madd       ∞, NaN     denorm, zero                 ∞, NaN                       dmax       1     0     0     0     0
madd       ∞, NaN     denorm, zero                 denorm, zero                 zero(1)    1     0     0     0     0
madd       ∞, NaN     denorm, zero                 norm                         operand_d  1     0     0     0     0
madd       denorm     ∞, NaN, denorm, zero, norm   ∞, NaN                       dmax       1     0     0     0     0
madd       denorm     ∞, NaN, denorm, zero, norm   denorm, zero                 zero(1)    1     0     0     0     0
madd       denorm     ∞, NaN, denorm, zero, norm   norm                         operand_d  1     0     0     0     0
madd       zero       ∞, NaN, denorm               ∞, NaN                       dmax       1     0     0     0     0
madd       zero       ∞, NaN, denorm               denorm, zero                 zero(1)    1     0     0     0     0
madd       zero       ∞, NaN, denorm               norm                         operand_d  1     0     0     0     0
madd       zero       zero, norm                   ∞, NaN                       dmax       1     0     0     0     0
madd       zero       zero, norm                   denorm                       zero(1)    1     0     0     0     0
madd       zero       zero, norm                   zero                         zero(1)    0     0     0     0     0
madd       zero       zero, norm                   norm                         operand_d  0     0     0     0     0
madd       norm       ∞, NaN                       ∞, NaN, denorm, zero, norm   abmax      1     0     0     0     0
madd       norm       denorm                       ∞, NaN                       dmax       1     0     0     0     0

Operation Operand A Operand B Operand D Result FINV FOVF FUNF FDBZ FINX Table 5-3.
Floating-point results summary — madd, msub, nmadd, nmsub (continued) madd norm denorm denorm, zero zero1 1 0 0 0 0 madd norm denorm norm operand_d 1 0 0 0 0 madd norm zero , NaN dmax 1 0 0 0 0 madd norm zero denorm zero1 1 0 0 0 0 madd norm zero zero zero1 0 0 0 0 0 madd norm zero norm operand_d 0 0 0 0 0 madd norm norm , NaN dmax 1 0 0 0 0 madd norm norm denorm ab_Calc 1 * * 0 * madd norm norm zero ab_Calc 0 * * 0 * madd norm norm norm _Calc_ 0 * * 0 * nmadd nmadd , NaN , NaN, Norm , NaN, denorm, zero, Norm -abmax 1 0 0 0 0 nmadd , NaN denorm, zero , NaN -dmax 1 0 0 0 0 nmadd , NaN denorm, zero denorm, zero zero3 1 0 0 0 0 nmadd , NaN denorm, zero Norm -operand_d 1 0 0 0 0 nmadd denorm , NaN, denorm, zero, Norm , NaN -dmax 1 0 0 0 0 nmadd denorm , NaN, denorm, zero, Norm denorm, zero zero3 1 0 0 0 0 nmadd denorm , NaN, denorm, zero, Norm Norm -operand_d 1 0 0 0 0 nmadd zero , NaN, denorm, , NaN -dmax 1 0 0 0 0 nmadd zero , NaN, denorm denorm, zero zero3 1 0 0 0 0 nmadd zero , NaN, denorm Norm -operand_d 1 0 0 0 0 nmadd zero zero, Norm , NaN -dmax 1 0 0 0 0 nmadd zero zero, Norm denorm zero3 1 0 0 0 0 nmadd zero zero, Norm zero zero3 0 0 0 0 0 nmadd zero zero, Norm Norm -operand_d 0 0 0 0 0 nmadd norm , NaN , NaN, denorm, zero, Norm -abmax 1 0 0 0 0 nmadd norm denorm , NaN -dmax 1 0 0 0 0 nmadd norm denorm denorm, zero zero3 1 0 0 0 0 nmadd norm denorm norm -operand_d 1 0 0 0 0 e200z759n3 Core Reference Manual, Rev. 2 Freescale Semiconductor 243 Operation Operand A Operand B Operand D Result FINV FOVF FUNF FDBZ FINX Table 5-3. 
Floating-point results summary — madd, msub, nmadd, nmsub (continued) nmadd norm zero , NaN -dmax 1 0 0 0 0 nmadd norm zero denorm zero3 1 0 0 0 0 nmadd norm zero zero zero3 0 0 0 0 0 nmadd norm zero norm -operand_d 0 0 0 0 0 nmadd norm norm , NaN -dmax 1 0 0 0 0 nmadd norm norm denorm -ab_Calc 1 * * 0 * nmadd norm norm zero -ab_Calc 0 * * 0 * nmadd norm norm norm -(_Calc_) 0 * * 0 * msub msub , NaN , NaN, Norm , NaN, denorm, zero, Norm abmax 1 0 0 0 0 msub , NaN denorm, zero , NaN -dmax 1 0 0 0 0 msub , NaN denorm, zero denorm, zero zero2 1 0 0 0 0 msub , NaN denorm, zero Norm -operand_d 1 0 0 0 0 msub denorm , NaN, denorm, zero, Norm , NaN -dmax 1 0 0 0 0 msub denorm , NaN, denorm, zero, Norm denorm, zero zero2 1 0 0 0 0 msub denorm , NaN, denorm, zero, Norm Norm -operand_d 1 0 0 0 0 msub zero , NaN, denorm, , NaN -dmax 1 0 0 0 0 msub zero , NaN, denorm denorm, zero zero2 1 0 0 0 0 msub zero , NaN, denorm Norm -operand_d 1 0 0 0 0 msub zero zero, Norm , NaN -dmax 1 0 0 0 0 2 msub zero zero, Norm denorm zero 1 0 0 0 0 msub zero zero, Norm zero zero2 0 0 0 0 0 msub zero zero, Norm Norm -operand_d 0 0 0 0 0 msub norm , NaN , NaN, denorm, zero, Norm abmax 1 0 0 0 0 msub norm denorm , NaN -dmax 1 0 0 0 0 msub norm denorm denorm, zero zero2 1 0 0 0 0 msub norm denorm norm -operand_d 1 0 0 0 0 msub norm zero , NaN -dmax 1 0 0 0 0 2 1 0 0 0 0 msub norm zero denorm zero e200z759n3 Core Reference Manual, Rev. 2 244 Freescale Semiconductor Operation Operand A Operand B Operand D Result FINV FOVF FUNF FDBZ FINX Table 5-3. 
Floating-point results summary — madd, msub, nmadd, nmsub (continued) msub norm zero zero zero2 0 0 0 0 0 msub norm zero norm -operand_d 0 0 0 0 0 msub norm norm , NaN -dmax 1 0 0 0 0 msub norm norm denorm ab_Calc 1 * * 0 * msub norm norm zero ab_Calc 0 * * 0 * msub norm norm norm _Calc_ 0 * * 0 * nmsub nmsub , NaN , NaN, Norm , NaN, denorm, zero, Norm -abmax 1 0 0 0 0 nmsub , NaN denorm, zero , NaN dmax 1 0 0 0 0 nmsub , NaN denorm, zero denorm, zero zero4 1 0 0 0 0 nmsub , NaN denorm, zero Norm operand_d 1 0 0 0 0 nmsub denorm , NaN, denorm, zero, Norm , NaN dmax 1 0 0 0 0 nmsub denorm , NaN, denorm, zero, Norm denorm, zero zero4 1 0 0 0 0 nmsub denorm , NaN, denorm, zero, Norm Norm operand_d 1 0 0 0 0 nmsub zero , NaN, denorm, , NaN dmax 1 0 0 0 0 nmsub zero , NaN, denorm denorm, zero zero4 1 0 0 0 0 nmsub zero , NaN, denorm Norm operand_d 1 0 0 0 0 nmsub zero zero, Norm , NaN dmax 1 0 0 0 0 4 nmsub zero zero, Norm denorm zero 1 0 0 0 0 nmsub zero zero, Norm zero zero4 0 0 0 0 0 nmsub zero zero, Norm Norm -operand_d 0 0 0 0 0 nmsub norm , NaN , NaN, denorm, zero, Norm -abmax 1 0 0 0 0 nmsub norm denorm , NaN dmax 1 0 0 0 0 nmsub norm denorm denorm, zero zero4 1 0 0 0 0 nmsub norm denorm norm operand_d 1 0 0 0 0 nmsub norm zero , NaN dmax 1 0 0 0 0 4 nmsub norm zero denorm zero 1 0 0 0 0 nmsub norm zero zero zero4 0 0 0 0 0 nmsub norm zero norm operand_d 0 0 0 0 0 e200z759n3 Core Reference Manual, Rev. 2 Freescale Semiconductor 245 Operation Operand A Operand B Operand D Result FINV FOVF FUNF FDBZ FINX Table 5-3. Floating-point results summary — madd, msub, nmadd, nmsub (continued) nmsub norm norm , NaN dmax 1 0 0 0 0 nmsub norm norm denorm -ab_Calc 1 * * 0 * nmsub norm norm zero -ab_Calc 0 * * 0 * nmsub norm norm norm -(_Calc_) 0 * * 0 * Notes: the following definitions apply 1 - sign of result is positive when (sign_a XOR sign_b) and sign_d are different for all rounding modes except round to minus infinity, where it is negative. 
2 - sign of result is positive when (sign_a XOR sign_b) and sign_d are the same for all rounding modes except round to minus infinity, where it is negative.
3 - sign of result is negative when (sign_a XOR sign_b) and sign_d are different for all rounding modes except round to minus infinity, where it is positive.
4 - sign of result is negative when (sign_a XOR sign_b) and sign_d are the same for all rounding modes except round to minus infinity, where it is positive.
* - updated according to results of calculation
ab_Calc - result is updated with the results of intermediate product calculation, rounded
_Calc_ - result is updated with the results of calculation, rounded
abmax - max normalized number with sign of (sign_a XOR sign_b)
dmax - max normalized number with sign of sign_d
nmax - max negative normalized number
pmax - max positive normalized number

Table 5-4. Floating-point results summary—sqrt

Operand A   Result    FINV  FOVF  FUNF  FDBZ  FINX
+∞          pmax      1     0     0     0     0
-∞          -0        1     0     0     0     0
+NaN        pmax      1     0     0     0     0
-NaN        -0        1     0     0     0     0
+denorm     +zero     1     0     0     0     0
-denorm     -zero     1     0     0     0     0
+zero       +zero     0     0     0     0     0
-zero       -zero     0     0     0     0     0
+norm       _Calc_    0     *     *     0     *
-norm       -0        1     0     0     0     0

FINV FOVF FUNF FDBZ FINX
Table 5-5.
Floating-point results summary—min, max pmax 1 0 0 0 0 pmax 1 0 0 0 0 +NaN pmax 1 0 0 0 0 -NaN pmax 1 0 0 0 0 denorm pmax 1 0 0 0 0 zero pmax 1 0 0 0 0 Norm pmax 1 0 0 0 0 pmax 1 0 0 0 0 nmax 1 0 0 0 0 +NaN nmax 1 0 0 0 0 -NaN nmax 1 0 0 0 0 denorm bzero 1 0 0 0 0 zero bzero 1 0 0 0 0 Norm operand_b 1 0 0 0 0 +NaN pmax 1 0 0 0 0 +NaN nmax 1 0 0 0 0 +NaN +NaN pmax 1 0 0 0 0 +NaN -NaN pmax 1 0 0 0 0 +NaN denorm bzero 1 0 0 0 0 +NaN zero bzero 1 0 0 0 0 +NaN Norm operand_b 1 0 0 0 0 -NaN pmax 1 0 0 0 0 -NaN nmax 1 0 0 0 0 -NaN +NaN pmax 1 0 0 0 0 -NaN -NaN nmax 1 0 0 0 0 -NaN denorm bzero 1 0 0 0 0 -NaN zero bzero 1 0 0 0 0 -NaN Norm operand_b 1 0 0 0 0 +denorm pmax 1 0 0 0 0 +denorm azero 1 0 0 0 0 +denorm +NaN azero 1 0 0 0 0 Operand A Operand B Result Max e200z759n3 Core Reference Manual, Rev. 2 Freescale Semiconductor 247 Operand A Operand B Result FINV FOVF FUNF FDBZ FINX Table 5-5. Floating-point results summary—min, max (continued) +denorm -NaN azero 1 0 0 0 0 +denorm denorm azero 1 0 0 0 0 +denorm zero azero 1 0 0 0 0 +denorm +Norm operand_b 1 0 0 0 0 +denorm -Norm azero 1 0 0 0 0 -denorm pmax 1 0 0 0 0 -denorm azero 1 0 0 0 0 -denorm +NaN azero 1 0 0 0 0 -denorm -NaN azero 1 0 0 0 0 -denorm denorm bzero 1 0 0 0 0 -denorm zero bzero 1 0 0 0 0 -denorm +Norm operand_b 1 0 0 0 0 -denorm -Norm azero 1 0 0 0 0 +zero pmax 1 0 0 0 0 +zero azero 1 0 0 0 0 +zero +NaN azero 1 0 0 0 0 +zero -NaN azero 1 0 0 0 0 +zero denorm azero 1 0 0 0 0 +zero zero azero 0 0 0 0 0 +zero +Norm operand_b 0 0 0 0 0 +zero -Norm azero 0 0 0 0 0 -zero pmax 1 0 0 0 0 -zero azero 1 0 0 0 0 -zero +NaN azero 1 0 0 0 0 -zero -NaN azero 1 0 0 0 0 -zero denorm bzero 1 0 0 0 0 -zero zero bzero 0 0 0 0 0 -zero +Norm operand_b 0 0 0 0 0 -zero -Norm azero 0 0 0 0 0 +Norm pmax 1 0 0 0 0 +Norm operand_a 1 0 0 0 0 +Norm +NaN operand_a 1 0 0 0 0 e200z759n3 Core Reference Manual, Rev. 2 248 Freescale Semiconductor Operand A Operand B Result FINV FOVF FUNF FDBZ FINX Table 5-5. 
Floating-point results summary—min, max (continued) +Norm -NaN operand_a 1 0 0 0 0 +Norm denorm operand_a 1 0 0 0 0 +Norm zero operand_a 0 0 0 0 0 +Norm Norm _Calc_ 0 0 0 0 0 -Norm pmax 1 0 0 0 0 -Norm operand_a 1 0 0 0 0 -Norm +NaN operand_a 1 0 0 0 0 -Norm -NaN operand_a 1 0 0 0 0 -Norm denorm bzero 1 0 0 0 0 -Norm zero bzero 0 0 0 0 0 -Norm Norm _Calc_ 0 0 0 0 0 Min pmax 1 0 0 0 0 nmax 1 0 0 0 0 +NaN pmax 1 0 0 0 0 -NaN pmax 1 0 0 0 0 denorm bzero 1 0 0 0 0 zero bzero 1 0 0 0 0 Norm operand_b 1 0 0 0 0 nmax 1 0 0 0 0 nmax 1 0 0 0 0 +NaN nmax 1 0 0 0 0 -NaN nmax 1 0 0 0 0 denorm nmax 1 0 0 0 0 zero nmax 1 0 0 0 0 Norm nmax 1 0 0 0 0 +NaN pmax 1 0 0 0 0 +NaN nmax 1 0 0 0 0 +NaN +NaN pmax 1 0 0 0 0 +NaN -NaN nmax 1 0 0 0 0 +NaN denorm bzero 1 0 0 0 0 +NaN zero bzero 1 0 0 0 0 e200z759n3 Core Reference Manual, Rev. 2 Freescale Semiconductor 249 Operand A Operand B Result FINV FOVF FUNF FDBZ FINX Table 5-5. Floating-point results summary—min, max (continued) +NaN Norm operand_b 1 0 0 0 0 -NaN pmax 1 0 0 0 0 -NaN nmax 1 0 0 0 0 -NaN +NaN nmax 1 0 0 0 0 -NaN -NaN nmax 1 0 0 0 0 -NaN denorm bzero 1 0 0 0 0 -NaN zero bzero 1 0 0 0 0 -NaN Norm operand_b 1 0 0 0 0 +denorm azero 1 0 0 0 0 +denorm nmax 1 0 0 0 0 +denorm +NaN azero 1 0 0 0 0 +denorm -NaN azero 1 0 0 0 0 +denorm denorm bzero 1 0 0 0 0 +denorm zero bzero 1 0 0 0 0 +denorm +Norm azero 1 0 0 0 0 +denorm -Norm operand_b 1 0 0 0 0 -denorm azero 1 0 0 0 0 -denorm nmax 1 0 0 0 0 -denorm +NaN azero 1 0 0 0 0 -denorm -NaN azero 1 0 0 0 0 -denorm denorm azero 1 0 0 0 0 -denorm zero azero 1 0 0 0 0 -denorm +Norm azero 1 0 0 0 0 -denorm -Norm operand_b 1 0 0 0 0 +zero azero 1 0 0 0 0 +zero nmax 1 0 0 0 0 +zero +NaN azero 1 0 0 0 0 +zero -NaN azero 1 0 0 0 0 +zero denorm bzero 1 0 0 0 0 +zero zero bzero 0 0 0 0 0 +zero +Norm azero 0 0 0 0 0 +zero -Norm operand_b 0 0 0 0 0 e200z759n3 Core Reference Manual, Rev. 2 250 Freescale Semiconductor Operand A Operand B Result FINV FOVF FUNF FDBZ FINX Table 5-5. 
Floating-point results summary—min, max (continued) -zero azero 1 0 0 0 0 -zero nmax 1 0 0 0 0 -zero +NaN azero 1 0 0 0 0 -zero -NaN azero 1 0 0 0 0 -zero denorm azero 1 0 0 0 0 -zero zero azero 0 0 0 0 0 -zero +Norm azero 0 0 0 0 0 -zero -Norm operand_b 0 0 0 0 0 +Norm operand_a 1 0 0 0 0 +Norm nmax 1 0 0 0 0 +Norm +NaN operand_a 1 0 0 0 0 +Norm -NaN operand_a 1 0 0 0 0 +Norm denorm bzero 1 0 0 0 0 +Norm zero bzero 0 0 0 0 0 +Norm Norm _Calc_ 0 0 0 0 0 -Norm operand_a 1 0 0 0 0 -Norm nmax 1 0 0 0 0 -Norm +NaN operand_a 1 0 0 0 0 -Norm -NaN operand_a 1 0 0 0 0 -Norm denorm operand_a 1 0 0 0 0 -Norm zero operand_a 0 0 0 0 0 -Norm Norm _Calc_ 0 0 0 0 0 Operand B integer result efsctui[z] Fractional result efsctuf FINV FOVF FUNF FDBZ FINX Table 5-6. Floating-point results summary — convert to unsigned + 0xFFFF_FFFF 0xFFFF_FFFF 1 0 0 0 0 - zero zero 1 0 0 0 0 +NaN zero zero 1 0 0 0 0 -NaN zero zero 1 0 0 0 0 denorm zero zero 1 0 0 0 0 zero zero zero 0 0 0 0 0 e200z759n3 Core Reference Manual, Rev. 2 Freescale Semiconductor 251 Operand B integer result efsctui[z] Fractional result efsctuf FINV FOVF FUNF FDBZ FINX Table 5-6. Floating-point results summary — convert to unsigned (continued) +norm _Calc_ _Calc_ * 0 0 0 * -norm zero zero 0 0 0 0 0 Operand B integer result efsctsi[z] Fractional result efsctsf FINV FOVF FUNF FDBZ FINX Table 5-7. Floating-point results summary —convert to signed + 0x7FFF_FFFF 0x7FFF_FFFF 1 0 0 0 0 - 0x8000_0000 0x8000_0000 1 0 0 0 0 +NaN zero zero 1 0 0 0 0 -NaN zero zero 1 0 0 0 0 denorm zero zero 1 0 0 0 0 zero zero zero 0 0 0 0 0 +norm _Calc_ _Calc_ * 0 0 0 * -norm _Calc_ _Calc_ * 0 0 0 * Operand B integer source efscfui Fractional source efscfuf FINV FOVF FUNF FDBZ FINX Table 5-8. Floating-point results summary — convert from unsigned zero zero zero 0 0 0 0 0 norm _Calc_ _Calc_ 0 0 0 0 * Operand B integer source efscfsi Fractional source efscfsf FINV FOVF FUNF FDBZ FINX Table 5-9. 
Floating-point results summary — convert from signed

zero        zero      zero      0     0     0     0     0
norm        _Calc_    _Calc_    0     0     0     0     *

Table 5-10. Floating-point results summary — fabs, fnabs, fneg

Operand A   fabs               fnabs           fneg   FINV  FOVF  FUNF  FDBZ  FINX
∞           +∞                 -∞              -A     1     0     0     0     0
NaN         Sign bit cleared   Sign bit set    -A     1     0     0     0     0
denorm      Sign bit cleared   Sign bit set    -A     1     0     0     0     0
zero        zero               zero            zero   0     0     0     0     0
norm        norm               norm            norm   0     0     0     0     0

Table 5-11. Floating-point results summary — convert from half-precision

Operand B   e[v]fscfh   FINV  FOVF  FUNF  FDBZ  FINX
∞           bmax        1     0     0     0     0
NaN         bmax        1     0     0     0     0
denorm      bzero       1     0     0     0     0
zero        bzero       0     0     0     0     0
+norm       _Calc_      0     0     0     0     *
-norm       _Calc_      0     0     0     0     *

Table 5-12. Floating-point results summary — convert to half-precision

Operand B   e[v]fscth   FINV  FOVF  FUNF  FDBZ  FINX
∞           bmaxhp      1     0     0     0     0
NaN         bmaxhp      1     0     0     0     0
denorm      bzero       1     0     0     0     0
zero        bzero       0     0     0     0     0
+norm       _Calc_      0     *     *     0     *
-norm       _Calc_      0     *     *     0     *

5.5 EFPU instruction timing

Instruction timing for EFPU instructions, in processor clock cycles, is shown in Table 5-13 and Table 5-14. Pipelined instructions are shown with their total latency and their throughput in cycles. Divide instructions are not pipelined and block other instructions from executing during divide execution.

Instruction pipelining in the CPU is affected by the possibility of a floating-point instruction generating an exception. A load or store class instruction that follows an EFPU instruction stalls until it can be ensured that no previous instruction can generate a floating-point exception. This determination is based on which floating-point exception enable bits are set (FINVE, FOVFE, FUNFE, FDBZE, and FINXE) and on the point in the FPU pipeline at which an exception can be guaranteed not to occur.
Invalid input operands are detected in the first stage of the pipeline, while underflow, overflow, and inexactness are determined later in the pipeline. Best overall performance occurs either when floating-point exceptions are disabled or when load and store class instructions are scheduled such that previous floating-point instructions have already resolved the possibility of exceptional results.

5.5.1 EFPU single-precision vector floating-point instruction timing

Instruction timing for EFPU vector floating-point instructions is shown in Table 5-13. The table is sorted by opcode. The number of stall cycles for evfsdiv and evfssqrt is (latency) cycles.

Table 5-13. EFPU vector floating-point instruction timing

Instruction    Latency  Throughput   Comments
evfsabs        4        1            —
evfsadd        4        1            —
evfsaddx       4        1            —
evfsaddsub     4        1            —
evfsaddsubx    4        1            —
evfscfh        4        1            —
evfscfsf       4        1            —
evfscfsi       4        1            —
evfscfuf       4        1            —
evfscfui       4        1            —
evfscmpeq      4        1            —
evfscmpgt      4        1            —
evfscmplt      4        1            —
evfscth        4        1            —
evfsctsf       4        1            —
evfsctsi       4        1            —
evfsctsiz      4        1            —
evfsctuf       4        1            —
evfsctui       4        1            —
evfsctuiz      4        1            —
evfsdiff       4        1            —
evfsdiffsum    4        1            —
evfsdiv        13       13           blocking, no overlap with next inst.
evfsmax        4        1            —
evfsmin        4        1            —
evfsmadd       4        1 (note 1)   dest also used as source
evfsmsub       4        1 (note 1)   dest also used as source
evfsmul        4        1            —
evfsmule       4        1            —
evfsmulo       4        1            —
evfsmulx       4        1            —
evfsnabs       4        1            —
evfsneg        4        1            —
evfsnmadd      4        1 (note 1)   dest also used as source
evfsnmsub      4        1 (note 1)   dest also used as source
evfssqrt       15       15           blocking, no overlap with next inst.
evfssub        4        1            —
evfssubx       4        1            —
evfssubadd     4        1            —
evfssubaddx    4        1            —
evfssum        4        1            —
evfssumdiff    4        1            —
evfststeq      4        1            —
evfststgt      4        1            —
evfststlt      4        1            —

Note 1: Destination register is also a source register, so for full throughput, back-to-back operations must use a different destination register.

5.5.2 EFPU single-precision scalar floating-point instruction timing

Instruction timing for EFPU single-precision scalar floating-point instructions is shown in Table 5-14. The table is sorted by opcode.

Table 5-14. EFPU single-precision scalar floating-point instruction timing

Instruction    Latency  Throughput   Comments
efsabs         4        1            —
efsadd         4        1            —
efscfh         4        1            —
efscfsf        4        1            —
efscfsi        4        1            —
efscfuf        4        1            —
efscfui        4        1            —
efscmpeq       4        1            —
efscmpgt       4        1            —
efscmplt       4        1            —
efscth         4        1            —
efsctsf        4        1            —
efsctsi        4        1            —
efsctsiz       4        1            —
efsctuf        4        1            —
efsctui        4        1            —
efsctuiz       4        1            —
efsdiv         13       13           blocking, no execution overlap with next instruction
efsmadd        4        1 (note 1)   dest also used as source
efsmsub        4        1 (note 1)   dest also used as source
efsmax         4        1            —
efsmin         4        1            —
efsmul         4        1            —
efsnabs        4        1            —
efsneg         4        1            —
efsnmadd       4        1 (note 1)   dest also used as source
efsnmsub       4        1 (note 1)   dest also used as source
efssqrt        15       15           blocking, no overlap with next inst.
efssub         4        1            —
efststeq       4        1            —
efststgt       4        1            —
efststlt       4        1            —

Note 1: Destination register is also a source register, so for full throughput, back-to-back operations must use a different destination register.

5.6 Instruction forms and opcodes

Table 5-15 gives the division of the opcode space for the EFPU instructions. This is the architectural assignment; not all instructions are implemented in all versions of the CPU.

Table 5-15.
Opcode space division

Opcode bits 0–5   Opcode bits 21–28   Instruction class
4                 0101 00xx           Embedded vector floating-point instructions
4                 0101 010x           Embedded vector floating-point instructions
4                 0101 0110           Embedded scalar floating-point single-precision instructions
4                 0101 0111           Reserved (Embedded scalar floating-point double-precision instructions) (note 1)
4                 0101 10xx           Embedded scalar floating-point single-precision instructions
4                 0101 11xx           Reserved (Embedded scalar floating-point double-precision instructions) (note 1)

Note 1: Attempted execution of a defined EFP double-precision instruction results in an Illegal instruction exception if MSR[SPE] = 1, or an EFPU Unavailable exception if MSR[SPE] = 0.

5.6.1 Opcodes for EFPU vector floating-point instructions

Table 5-16. Embedded vector floating-point instruction opcodes

Instruction    0–5  6–10      11–15   16–20   21–24   25–31     Comments
evfsadd        4    rD        rA      rB      0101    0000000   —
evfssub        4    rD        rA      rB      0101    0000001   rA - rB
evfsmadd       4    rD        rA      rB      0101    0000010   —
evfsmsub       4    rD        rA      rB      0101    0000011   —
evfsabs        4    rD        rA      00000   0101    0000100   —
evfsnabs       4    rD        rA      00000   0101    0000101   —
evfsneg        4    rD        rA      00000   0101    0000110   —
evfssqrt       4    rD        rA      00000   0101    0000111   —
evfsmul        4    rD        rA      rB      0101    0001000   —
evfsdiv        4    rD        rA      rB      0101    0001001   —
evfsnmadd      4    rD        rA      rB      0101    0001010   —
evfsnmsub      4    rD        rA      rB      0101    0001011   —
evfscmpgt      4    crfD 00   rA      rB      0101    0001100   —
evfscmplt      4    crfD 00   rA      rB      0101    0001101   —
evfscmpeq      4    crfD 00   rA      rB      0101    0001110   —
—              4    —         —       —       0101    0001111   —
evfscfui       4    rD        00000   rB      0101    0010000   —
evfscfsi       4    rD        00000   rB      0101    0010001   —
evfscfh        4    rD        00100   rB      0101    0010001   —
evfscfuf       4    rD        00000   rB      0101    0010010   —
evfscfsf       4    rD        00000   rB      0101    0010011   —
evfsctui       4    rD        00000   rB      0101    0010100   —
evfsctsi       4    rD        00000   rB      0101    0010101   —
evfscth        4    rD        00100   rB      0101    0010101   —
evfsctuf       4    rD        00000   rB      0101    0010110   —
evfsctsf       4    rD        00000   rB      0101    0010111   —
evfsctuiz      4    rD        00000   rB      0101    0011000   —
—              4    —         —       —       0101    0011001   —
evfsctsiz      4    rD        00000   rB      0101    0011010   —
—              4    —         —       —       0101    0011011   —
evfststgt      4    crfD 00   rA      rB      0101    0011100   —
evfststlt      4    crfD 00   rA      rB      0101    0011101   —
evfststeq      4    crfD 00   rA      rB      0101    0011110   —
—              4    —         —       —       0101    0011111   —
evfsmax        4    rD        rA      rB      0101    0100000   —
evfsmin        4    rD        rA      rB      0101    0100001   —
evfsaddsub     4    rD        rA      rB      0101    0100010   —
evfssubadd     4    rD        rA      rB      0101    0100011   rA - rB; rA + rB
evfssum        4    rD        rA      rB      0101    0100100   —
evfsdiff       4    rD        rA      rB      0101    0100101   —
evfssumdiff    4    rD        rA      rB      0101    0100110   —
evfsdiffsum    4    rD        rA      rB      0101    0100111   —
evfsaddx       4    rD        rA      rB      0101    0101000   —
evfssubx       4    rD        rA      rB      0101    0101001   —
evfsaddsubx    4    rD        rA      rB      0101    0101010   —
evfssubaddx    4    rD        rA      rB      0101    0101011   rA - rB; rA + rB
evfsmulx       4    rD        rA      rB      0101    0101100   —
—              4    rD        rA      rB      0101    0101101   —
evfsmule       4    rD        rA      rB      0101    0101110   —
evfsmulo       4    rD        rA      rB      0101    0101111   —

5.6.2 Opcodes for EFPU scalar single-precision floating-point instructions

Table 5-17. Embedded scalar single-precision floating-point instruction opcodes

Instruction    0–5  6–10      11–15   16–20   21–24   25–31     Comments
efsmax         4    rD        rA      rB      0101    0110000   —
efsmin         4    rD        rA      rB      0101    0110001   —
efsadd         4    rD        rA      rB      0101    1000000   —
efssub         4    rD        rA      rB      0101    1000001   rA - rB
efsmadd        4    rD        rA      rB      0101    1000010   —
efsmsub        4    rD        rA      rB      0101    1000011   —
efsabs         4    rD        rA      00000   0101    1000100   —
efsnabs        4    rD        rA      00000   0101    1000101   —
efsneg         4    rD        rA      00000   0101    1000110   —
efssqrt        4    rD        rA      00000   0101    1000111   —
efsmul         4    rD        rA      rB      0101    1001000   —
efsdiv         4    rD        rA      rB      0101    1001001   —
efsnmadd       4    rD        rA      rB      0101    1001010   —
efsnmsub       4    rD        rA      rB      0101    1001011   —
efscmpgt       4    crfD 00   rA      rB      0101    1001100   —
efscmplt       4    crfD 00   rA      rB      0101    1001101   —
efscmpeq       4    crfD 00   rA      rB      0101    1001110   —
efscfd         4    rD        00000   rB      0101    1001111   optional, not implemented
efscfui        4    rD        00000   rB      0101    1010000   —
efscfsi        4    rD        00000   rB      0101    1010001   —
efscfh         4    rD        00100   rB      0101    1010001   —
efscfuf        4    rD        00000   rB      0101    1010010   —
efscfsf        4    rD        00000   rB      0101    1010011   —
efsctui        4    rD        00000   rB      0101    1010100   —
efsctsi        4    rD        00000   rB      0101    1010101   —
efscth         4    rD        00100   rB      0101    1010101   —
efsctuf        4    rD        00000   rB      0101    1010110   —
efsctsf        4    rD        00000   rB      0101    1010111   —
efsctuiz       4    rD        00000   rB      0101    1011000   —
—              4    —         —       —       0101    1011001   —
efsctsiz       4    rD        00000   rB      0101    1011010   —
—              4    —         —       —       0101    1011011   —
efststgt       4    crfD 00   rA      rB      0101    1011100   —
efststlt       4    crfD 00   rA      rB      0101    1011101   —
efststeq       4    crfD 00   rA      rB      0101    1011110   —
—              4    —         —       —       0101    1011111   —

Chapter 6 Signal Processing Extension APU (SPE APU)

This chapter describes the instruction set architecture of the SPE version 1.1 APU. This unit implements instructions to accelerate signal processing and other algorithms.
6.1 Nomenclature and conventions

Several conventions regarding nomenclature are used in this chapter:
• Due to historical precedent, the terms SPE and SIMD are sometimes used interchangeably.
• Bits 0–31 of a 64-bit register are referred to as field 0, the upper half, or the high-order element of the register. Bits 32–63 are referred to as field 1, the lower half, or the low-order element of the register. Each half is an element of a GPR.
• Mnemonics for SPE APU instructions generally begin with the letters 'ev' (vector).

6.2 SPE programming model

The e200z759n3 core provides a register file with thirty-two 64-bit registers. The Power Architecture 32-bit Book E instructions operate on the lower (least significant) 32 bits of the 64-bit register. New SPE instructions are defined that view the 64-bit register as a vector of two 32-bit elements, and some of the instructions also read or write 16-bit elements. These new instructions can also be used to perform scalar operations by ignoring the results in the upper 32-bit half of the register. Some instructions are defined that produce a 64-bit scalar result.

Vector fixed-point instructions operate on a vector of two 32-bit or four 16-bit fixed-point numbers resident in the 64-bit GPRs.

The SPE and Book E instructions issue from a single instruction stream. There are no record forms of SPE instructions.

Vector compare instructions store the result of the comparison into the condition register (CR). The meaning of the CR bits is overloaded for the vector operations. Vector compare instructions specify a CR field, two source registers, and the type of compare: greater than, less than, or equal. Two bits in the CR field are written with the result of the vector compare, one for each element. The remaining two bits reflect the AND and the OR of the two vector compare results.

A partially visible accumulator register is architected for the SPE integer and fractional multiply-accumulate forms of instructions.
Its usage is described in Section 6.2.2, Accumulator.

6.2.1 SPE Status and Control Register (SPEFSCR)

The e200z759n3 core implements the SPEFSCR register for status reporting and control of SPE instructions. This register is also used by the Embedded Floating-Point APUs. Status and control bits are shared for floating-point operations and SPE operations. The SPEFSCR register is implemented as special purpose register (SPR) number 512; it is read with the mfspr instruction and written with the mtspr instruction. The SPEFSCR is shown in Figure 6-1.

Bits 0–15:   SOVH  OVH  FGH  FXH  FINVH  FDBZH  FUNFH  FOVFH  0  0  FINXS  FINVS  FDBZS  FUNFS  FOVFS  MODE
Bits 16–31:  SOV   OV   FG   FX   FINV   FDBZ   FUNF   FOVF   0  FINXE  FINVE  FDBZE  FUNFE  FOVFE  FRMC (30:31)
SPR — 512; Read/Write; Reset — 0x0

Figure 6-1. SPE Status and Control Register (SPEFSCR)

The SPEFSCR bits are defined in Table 6-1.

Table 6-1. SPEFSCR field descriptions

Bits      Name    Description
0 (32)    SOVH    Summary Integer Overflow High. The SOVH bit is set to 1 whenever an instruction sets OVH. The SOVH bit remains set until it is cleared by a mtspr instruction specifying the SPEFSCR register.
1 (33)    OVH     Integer Overflow High. The OVH bit is set to 1 whenever an integer or fractional SPE instruction signals an overflow in the upper half of the result.
2 (34)    FGH     Embedded Floating-Point Guard bit High. Defined by Embedded Floating-Point APUs.
3 (35)    FXH     Embedded Floating-Point Inexact bit High. Defined by Embedded Floating-Point APUs.
4 (36)    FINVH   Embedded Floating-Point Invalid Operation / Input error High. Defined by Embedded Floating-Point APUs.
5 (37)    FDBZH   Embedded Floating-Point Divide by Zero High. Defined by Embedded Floating-Point APUs.
6 (38)    FUNFH   Embedded Floating-Point Underflow High. Defined by Embedded Floating-Point APUs.
7 (39)    FOVFH   Embedded Floating-Point Overflow High. Defined by Embedded Floating-Point APUs.
8:9 (40:41)     —       Reserved
10 (42)         FINXS   Embedded Floating-Point Inexact Sticky Flag. Defined by Embedded Floating-Point APUs.
11 (43)         FINVS   Embedded Floating-Point Invalid Operation Sticky Flag. Defined by Embedded Floating-Point APUs.
12 (44)         FDBZS   Embedded Floating-Point Divide by Zero Sticky Flag. Defined by Embedded Floating-Point APUs.
13 (45)         FUNFS   Embedded Floating-Point Underflow Sticky Flag. Defined by Embedded Floating-Point APUs.
14 (46)         FOVFS   Embedded Floating-Point Overflow Sticky Flag. Defined by Embedded Floating-Point APUs.
15 (47)         MODE    Embedded Floating-Point Operating Mode. Defined by Embedded Floating-Point APUs.
16 (48)         SOV     Summary Integer Overflow. The SOV bit is set to 1 whenever an instruction sets OV. The SOV bit remains set until it is cleared by a mtspr instruction specifying the SPEFSCR register.
17 (49)         OV      Integer Overflow. The OV bit is set to 1 whenever an integer or fractional SPE instruction signals an overflow in the low element result.
18 (50)         FG      Embedded Floating-Point Guard bit (low/scalar). Defined by Embedded Floating-Point APUs.
19 (51)         FX      Embedded Floating-Point Inexact bit (low/scalar). Defined by Embedded Floating-Point APUs.
20 (52)         FINV    Embedded Floating-Point Invalid Operation / Input error (low/scalar). Defined by Embedded Floating-Point APUs.
21 (53)         FDBZ    Embedded Floating-Point Divide by Zero (low/scalar). Defined by Embedded Floating-Point APUs.
22 (54)         FUNF    Embedded Floating-Point Underflow (low/scalar). Defined by Embedded Floating-Point APUs.
23 (55)         FOVF    Embedded Floating-Point Overflow (low/scalar). Defined by Embedded Floating-Point APUs.
24 (56)         —       Reserved
25 (57)         FINXE   Embedded Floating-Point Round (Inexact) Exception Enable. Defined by Embedded Floating-Point APUs.
26 (58)         FINVE   Embedded Floating-Point Invalid Operation / Input Error Exception Enable. Defined by Embedded Floating-Point APUs.
27 (59)         FDBZE   Embedded Floating-Point Divide by Zero Exception Enable. Defined by Embedded Floating-Point APUs.
28 (60)         FUNFE   Embedded Floating-Point Underflow Exception Enable. Defined by Embedded Floating-Point APUs.
29 (61)         FOVFE   Embedded Floating-Point Overflow Exception Enable. Defined by Embedded Floating-Point APUs.
30:31 (62:63)   FRMC    Embedded Floating-Point Rounding Mode Control. Defined by Embedded Floating-Point APUs.

6.2.2 Accumulator

The e200z759n3 core has a 64-bit architectural accumulator register that holds the results of the SPE multiply accumulate (MAC) fixed-point instructions. The accumulator allows back-to-back execution of dependent fixed-point MAC instructions, a pattern that is found in the inner loops of DSP code such as filters. The accumulator is partially visible to the programmer in that its results do not have to be explicitly read to use them. Instead, they are always copied into a 64-bit destination GPR specified as part of the instruction. The accumulator, however, must be explicitly cleared when starting a new MAC loop. Depending on the type of instruction, the accumulator can hold either a single 64-bit value or a vector of two 32-bit elements.

An example of a MAC instruction is evmhossfaaw rD,rA,rB. In this instruction, the least significant 16 bits of rA and rB are multiplied for both elements of the vector (see Figure "evmhossfaaw" on page 358), the result is shifted left one bit and added to the accumulator, and the result is saturated to 32 bits if an overflow occurs. The final result is placed both in the accumulator and in rD. Thus the result of this instruction can be used by accessing rD.
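The per-element arithmetic of the evmhossfaaw example can be sketched in Python as follows. This is a minimal illustrative model under stated assumptions, not the architectural definition: the names mac_odd_frac_sat, _s16, _s32, and _sat32 are hypothetical, registers and the accumulator are passed as plain 64-bit integers, SPEFSCR overflow bits are not modeled, and the product 0x8000 × 0x8000 is computed with ordinary integer arithmetic, which may differ from the hardware corner-case behavior.

```python
def _s16(x):
    """Interpret the low 16 bits of x as a signed value."""
    x &= 0xFFFF
    return x - 0x1_0000 if x & 0x8000 else x


def _s32(x):
    """Interpret the low 32 bits of x as a signed value."""
    x &= 0xFFFF_FFFF
    return x - 0x1_0000_0000 if x & 0x8000_0000 else x


def _sat32(v):
    """Clamp v to the signed 32-bit range."""
    return max(-(1 << 31), min((1 << 31) - 1, v))


def mac_odd_frac_sat(acc, ra, rb):
    """Hypothetical model of the MAC step described in the text.

    For each 32-bit element (upper, then lower): multiply the least
    significant 16 bits of ra and rb as signed values, shift the product
    left one bit, add it to the matching accumulator element, and
    saturate the sum to 32 bits. The returned 64-bit value is what would
    be written to both the accumulator and rD.
    """
    result = 0
    for shift in (32, 0):
        a = _s16(ra >> shift)
        b = _s16(rb >> shift)
        product = (a * b) << 1          # fractional multiply: shift left one bit
        total = _sat32(_s32(acc >> shift) + product)
        result |= (total & 0xFFFF_FFFF) << shift
    return result
```

With 0.5 × 0.5 in the upper element and 0.25 × 0.5 in the lower element (16-bit fractional values 0x4000 and 0x2000), a zero accumulator yields 0x2000_0000 and 0x1000_0000 in the two elements; starting the upper accumulator element at 0x7FFF_FFFF shows the saturation case.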
To read the accumulator contents into a register, use a multiply-accumulate instruction in which one of the operands is zero, as the following sequence shows:

    evxor     RD, RD, RD    // Zero the contents of RD; not necessary if a zero
                            // is available in some register.
    evmwumiaa RD, RD, RD    // Multiply 0 by 0, add the 0 result to the accumulator,
                            // and store the value back in ACC and RD.

To initialize the accumulator, the evmra instruction is used.

6.2.2.1 Context switch

When a context switch occurs, the OS must explicitly save the accumulator as part of the context of the swapped-out task and then explicitly load the accumulator from the context of the task that is being swapped in. When the old task is later resumed, its accumulator must be restored first.

6.2.3 GPRs and PowerPC Book E instructions

The e200z759n3 core implements the 32-bit forms of the Book E instructions. All 32-bit PowerPC Book E instructions operate upon the lower half of the 64-bit GPR. These instructions do not affect the upper half of a GPR.

6.2.4 SPE available bit in MSR

MSR[SPE] is defined as the SPE available bit. If this bit is clear and software attempts to execute any SPE instruction other than the brinc instruction (which does not affect the upper 32 bits of a GPR), the SPE APU Unavailable exception is taken. If this bit is set, software can execute any of the SPE instructions.

6.2.5 SPE exception bit in ESR

ESR[SPE] is defined as the SPE exception bit. This bit is set whenever the processor takes an exception related to the execution of the SPE APU instructions.

6.2.6 SPE exceptions

The architecture defines the following SPE APU exceptions:
• SPE APU Unavailable exception
• SPE Vector Alignment exception (not used by e200z759n3)
The interrupt vector offset registers (IVORs) IVOR32 (SPE / Embedded Floating Point Unavailable Interrupt) and IVOR5 (Alignment Interrupt) are used by the interrupt model. The SPR number for IVOR32 is 528; IVOR5 is defined by Book E. These registers are privileged.

6.2.6.1 SPE APU Unavailable exception

The SPE APU Unavailable exception is taken if MSR[SPE] is cleared and execution of an SPE APU instruction other than the brinc instruction is attempted. When the SPE APU Unavailable exception occurs, the processor suppresses execution of the instruction causing the exception. The SRR0, SRR1, MSR, and ESR registers are modified as follows:
• SRR0 is set to the effective address of the instruction causing the exception.
• SRR1 is set to the contents of the MSR at the time of the exception.
• MSR[CE, ME, DE] are unchanged. All other MSR bits are cleared.
• The ESR[SPE] bit is set. All other ESR bits are cleared.

Instruction execution resumes at address IVPR[0:15] || IVOR32[16:27] || 0b0000.

6.2.7 Exception priorities

The following list shows the priority order in which exceptions are taken:
1. SPE APU Unavailable exception

6.3 Integer SPE simple instructions

brinc
Bit Reversed Increment

brinc rD,rA,rB

Encoding: 0:5 = 4 | 6:10 = rD | 11:15 = RA | 16:20 = RB | 21:31 = 010 0000 1111

    n = 16                                            // Implementation dependent value
    mask = rB[64-n:63]                                // Least sig. n bits of 32-bit reg
    a = rA[64-n:63]
    d = bitreverse(1 + bitreverse(a | (¬mask)))
    rD[32:63] = rA[32:63-n] || (d & mask)             // || is concatenation

The brinc instruction provides a way for software to access FFT data in a bit-reversed manner. rA contains the index into a buffer that contains the data on which the FFT is to be performed. rB contains a mask that allows the index to be updated with bit-reversed addressing.
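As a rough illustration of the brinc pseudocode, the following Python sketch models the instruction on 32-bit register values. The names brinc and bitreverse and the default n = 16 are illustrative choices (n is implementation dependent), and the sketch assumes that the mask is complemented inside the inner bit reversal, which is the usual bit-reversed-increment formulation; verify against the hardware before relying on it.

```python
def bitreverse(value, n):
    """Reverse the order of the low n bits of value (higher bits are dropped)."""
    out = 0
    for _ in range(n):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def brinc(ra, rb, n=16):
    """Return the low 32 bits of rD for a bit-reversed increment of ra under mask rb."""
    low = (1 << n) - 1
    mask = rb & low                      # least significant n bits of rB
    a = ra & low                         # least significant n bits of rA
    # Bits outside the mask are forced to 1 so the carry ripples through them.
    d = bitreverse(1 + bitreverse(a | (~mask & low), n), n)
    upper = ra & 0xFFFFFFFF & ~low       # untouched upper bits of rA
    return upper | (d & mask)
```

With a mask of three ones (eight byte-sized samples), repeated calls starting from index 0 visit the indices 0, 4, 2, 6, 1, 5, 3, 7 and then wrap back to 0.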
Typically, this instruction precedes a load with index instruction, for example:

    brinc r2, r3, r4
    lhax  r8, r5, r2

rB contains a bit mask that is based upon the number of points in the FFT. To access a buffer of n byte-sized data items with bit-reversed addressing, the mask has log2(n) ones in the least significant bit positions and zeros in the remaining most significant positions. If, however, the data size is a multiple of a half word or a word, the mask is constructed so that the ones are shifted left by log2(size of the data) and zeros are placed in the least significant bit positions. Table 6-2 shows example mask values for different data sizes and numbers of data samples.

Table 6-2. Data samples and sizes

Number of       Data size
data samples    Byte              Half word         Word               Double word
8               000...00000111    000...00001110    000...000011100    000...0000111000
16              000...00001111    000...00011110    000...000111100    000...0001111000
32              000...00011111    000...00111110    000...001111100    000...0011111000
64              000...00111111    000...01111110    000...011111100    000...0111111000

NOTE
An implementation can restrict the number of bits specified in a mask. In the e200z759n3 implementation, the number of bits is 16, which allows the user to perform bit-reversed address computations for 65536 byte-sized samples.

evabs
Vector Absolute Value

evabs rD,rA

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = 0000 0 | 21:31 = 010 0000 1000

    RD[0:31] = ABS(RA[0:31])
    RD[32:63] = ABS(RA[32:63])

The absolute value of each element of rA is placed into the corresponding element of rD. The absolute value of 0x8000_0000 (the most negative number) is returned as 0x8000_0000. No overflow is detected.
evaddiw
Vector Add Immediate Word

evaddiw rD,rB,UIMM

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = UIMM | 16:20 = RB | 21:31 = 010 0000 0010

    RD[0:31] = RB[0:31] + EXTZ(UIMM)        // Modulo sum
    RD[32:63] = RB[32:63] + EXTZ(UIMM)      // Modulo sum

The 5-bit UIMM value is zero-extended and added to each element of rB and the results are placed into the corresponding elements of rD.

evaddw
Vector Add Word

evaddw rD,rA,rB

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 010 0000 0000

    RD[0:31] = RA[0:31] + RB[0:31]          // Modulo sum
    RD[32:63] = RA[32:63] + RB[32:63]       // Modulo sum

Adds each element of rA to the corresponding element of rB and places the results into the corresponding elements of rD. The sum is a modulo sum.

evand
Vector AND

evand rD,rA,rB

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 010 0001 0001

    RD[0:31] = RA[0:31] & RB[0:31]          // Bitwise AND
    RD[32:63] = RA[32:63] & RB[32:63]       // Bitwise AND

Performs a bitwise AND of each element of rA and rB and places the results into the corresponding elements of rD.

evandc
Vector AND with Complement

evandc rD,rA,rB

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 010 0001 0010

    RD[0:31] = RA[0:31] & (¬RB[0:31])       // Bitwise ANDC
    RD[32:63] = RA[32:63] & (¬RB[32:63])    // Bitwise ANDC

Performs a bitwise AND of each element of rA with the complement of the corresponding element of rB and places the results into the corresponding elements of rD.
evcmpeq
Vector Compare Equal

evcmpeq crfD,rA,rB

Encoding: 0:5 = 4 | 6:8 = crfD | 9:10 = 00 | 11:15 = RA | 16:20 = RB | 21:31 = 010 0011 0100

    ah = RA[0:31]
    al = RA[32:63]
    bh = RB[0:31]
    bl = RB[32:63]
    if (ah == bh) then ch = 1 else ch = 0
    if (al == bl) then cl = 1 else cl = 0
    CR[4*crfD:4*crfD+3] = ch || cl || (ch | cl) || (ch & cl)

The msb in crfD is set if the high-order element of rA is equal to the high-order element of rB, and cleared otherwise. The next most significant bit in crfD is set if the low-order element of rA is equal to the low-order element of rB, and cleared otherwise. The last two bits of crfD are set to the OR and AND of the results of the high and low element compares.

evcmpgts
Vector Compare Greater Than Signed

evcmpgts crfD,rA,rB

Encoding: 0:5 = 4 | 6:8 = crfD | 9:10 = 00 | 11:15 = RA | 16:20 = RB | 21:31 = 010 0011 0001

    ah = RA[0:31]
    al = RA[32:63]
    bh = RB[0:31]
    bl = RB[32:63]
    if (ah > bh) then ch = 1 else ch = 0
    if (al > bl) then cl = 1 else cl = 0
    CR[4*crfD:4*crfD+3] = ch || cl || (ch | cl) || (ch & cl)

The msb in crfD is set if the high-order element of rA is greater than the high-order element of rB, and cleared otherwise. The next most significant bit in crfD is set if the low-order element of rA is greater than the low-order element of rB, and cleared otherwise. The last two bits of crfD are set to the OR and AND of the results of the high and low element compares.
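The way the vector compare instructions pack their two element results into a 4-bit CR field can be illustrated with a short Python sketch. The function names are hypothetical, registers are modeled as 64-bit integers with the high element in the upper 32 bits, and the returned value is the CR field with its msb in bit 3 of the integer.

```python
MASK32 = 0xFFFFFFFF


def elements(reg):
    """Split a 64-bit register value into (high element, low element)."""
    return (reg >> 32) & MASK32, reg & MASK32


def to_signed32(value):
    """Interpret a 32-bit value as a two's complement number."""
    return value - (1 << 32) if value & 0x80000000 else value


def cr_field(ch, cl):
    """Build ch || cl || (ch | cl) || (ch & cl) as a 4-bit integer."""
    return (ch << 3) | (cl << 2) | ((ch | cl) << 1) | (ch & cl)


def evcmpeq(ra, rb):
    (ah, al), (bh, bl) = elements(ra), elements(rb)
    return cr_field(int(ah == bh), int(al == bl))


def evcmpgts(ra, rb):
    (ah, al), (bh, bl) = elements(ra), elements(rb)
    return cr_field(int(to_signed32(ah) > to_signed32(bh)),
                    int(to_signed32(al) > to_signed32(bl)))


def evcmpgtu(ra, rb):
    (ah, al), (bh, bl) = elements(ra), elements(rb)
    return cr_field(int(ah > bh), int(al > bl))
```

The last two bits let software branch on "either element matched" or "both elements matched" with a single CR bit test. Note how 0xFFFFFFFF compares as less than 0 in the signed form but greater than 0 in the unsigned form.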
evcmpgtu
Vector Compare Greater Than Unsigned

evcmpgtu crfD,rA,rB

Encoding: 0:5 = 4 | 6:8 = crfD | 9:10 = 00 | 11:15 = RA | 16:20 = RB | 21:31 = 010 0011 0000

    ah = RA[0:31]
    al = RA[32:63]
    bh = RB[0:31]
    bl = RB[32:63]
    if (ah >U bh) then ch = 1 else ch = 0
    if (al >U bl) then cl = 1 else cl = 0
    CR[4*crfD:4*crfD+3] = ch || cl || (ch | cl) || (ch & cl)

The msb in crfD is set if the high-order element of rA is greater than the high-order element of rB, and cleared otherwise. The next most significant bit in crfD is set if the low-order element of rA is greater than the low-order element of rB, and cleared otherwise. The last two bits of crfD are set to the OR and AND of the results of the high and low element compares.

evcmplts
Vector Compare Less Than Signed

evcmplts crfD,rA,rB

Encoding: 0:5 = 4 | 6:8 = crfD | 9:10 = 00 | 11:15 = RA | 16:20 = RB | 21:31 = 010 0011 0011

    ah = RA[0:31]
    al = RA[32:63]
    bh = RB[0:31]
    bl = RB[32:63]
    if (ah < bh) then ch = 1 else ch = 0
    if (al < bl) then cl = 1 else cl = 0
    CR[4*crfD:4*crfD+3] = ch || cl || (ch | cl) || (ch & cl)

The msb in crfD is set if the high-order element of rA is less than the high-order element of rB, and cleared otherwise. The next most significant bit in crfD is set if the low-order element of rA is less than the low-order element of rB, and cleared otherwise. The last two bits of crfD are set to the OR and AND of the results of the high and low element compares.
evcmpltu
Vector Compare Less Than Unsigned

evcmpltu crfD,rA,rB

Encoding: 0:5 = 4 | 6:8 = crfD | 9:10 = 00 | 11:15 = RA | 16:20 = RB | 21:31 = 010 0011 0010

    ah = RA[0:31]
    al = RA[32:63]
    bh = RB[0:31]
    bl = RB[32:63]
    if (ah <U bh) then ch = 1 else ch = 0
    if (al <U bl) then cl = 1 else cl = 0
    CR[4*crfD:4*crfD+3] = ch || cl || (ch | cl) || (ch & cl)

The msb in crfD is set if the high-order element of rA is less than the high-order element of rB, and cleared otherwise. The next most significant bit in crfD is set if the low-order element of rA is less than the low-order element of rB, and cleared otherwise. The last two bits of crfD are set to the OR and AND of the results of the high and low element compares.

evcntlsw
Vector Count Leading Sign Bits Word

evcntlsw rD,rA

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = 0000 0 | 21:31 = 010 0000 1110

Counts the number of leading sign bits in each element of rA and places the counts into the corresponding elements of rD.

evcntlzw
Vector Count Leading Zeros Word

evcntlzw rD,rA

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = 0000 0 | 21:31 = 010 0000 1101

Counts the number of leading zeros in each element of rA and places the counts into the corresponding elements of rD.
evdivws
Vector Divide Word Signed

evdivws rD,rA,rB

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 100 1100 0110

    dividendh = RA[0:31]
    dividendl = RA[32:63]
    divisorh = RB[0:31]
    divisorl = RB[32:63]
    RD[0:31] = dividendh ÷ divisorh
    RD[32:63] = dividendl ÷ divisorl

Implementation Details:

    ovh = 0
    ovl = 0
    if ((dividendh < 0) && (divisorh == 0)) then
        RD[0:31] = 0x80000000
        ovh = 1
    else if ((dividendh >= 0) && (divisorh == 0)) then
        RD[0:31] = 0x7FFFFFFF
        ovh = 1
    else if ((dividendh == 0x80000000) && (divisorh == -1)) then
        RD[0:31] = 0x7FFFFFFF
        ovh = 1
    if ((dividendl < 0) && (divisorl == 0)) then
        RD[32:63] = 0x80000000
        ovl = 1
    else if ((dividendl >= 0) && (divisorl == 0)) then
        RD[32:63] = 0x7FFFFFFF
        ovl = 1
    else if ((dividendl == 0x80000000) && (divisorl == -1)) then
        RD[32:63] = 0x7FFFFFFF
        ovl = 1
    SPEFSCR[OVH] = ovh
    SPEFSCR[OV] = ovl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | ovh
    SPEFSCR[SOV] = SPEFSCR[SOV] | ovl

The two dividends are the two elements of the contents of rA. The two divisors are the two elements of the contents of rB. Two 32-bit quotients are formed as a result of the division on each of the upper and lower elements and the quotients are placed into rD. The remainders are not supplied as a result of this operation. Both the operands and quotients are interpreted as signed integers. If an overflow occurs (see the Power Architecture UISA divw instruction for the cases), the corresponding SPEFSCR bits are set; otherwise they are cleared. In case of overflow, a saturated value is delivered into the destination register.
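The overflow cases listed in the implementation details for evdivws can be exercised with a small Python model. It is a sketch under assumptions: the function names are hypothetical, the quotient is assumed to truncate toward zero (as the Power Architecture divw instruction does), and the sticky SOVH/SOV bits are left to the caller, so only the per-instruction ovh and ovl values are returned.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit value as a two's complement number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def divide_element(dividend, divisor):
    """Return (32-bit quotient, overflow flag) for one signed element."""
    a, b = to_signed32(dividend), to_signed32(divisor)
    if b == 0:
        return (0x80000000 if a < 0 else 0x7FFFFFFF), 1
    if a == -0x80000000 and b == -1:
        return 0x7FFFFFFF, 1
    quotient = abs(a) // abs(b)          # assumption: truncate toward zero
    if (a < 0) != (b < 0):
        quotient = -quotient
    return quotient & MASK32, 0


def evdivws(ra, rb):
    """Return (rD, ovh, ovl) for 64-bit register values ra and rb."""
    high, ovh = divide_element(ra >> 32, rb >> 32)
    low, ovl = divide_element(ra, rb)
    return (high << 32) | low, ovh, ovl
```

For example, dividing the elements (-7, 7) by (2, 0) yields -3 in the high element with no overflow and the saturated value 0x7FFFFFFF in the low element with ovl set.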
evdivwu
Vector Divide Word Unsigned

evdivwu rD,rA,rB

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 100 1100 0111

    dividendh = RA[0:31]
    dividendl = RA[32:63]
    divisorh = RB[0:31]
    divisorl = RB[32:63]
    RD[0:31] = dividendh ÷ divisorh
    RD[32:63] = dividendl ÷ divisorl

Implementation Details:

    ovh = 0
    ovl = 0
    if (divisorh == 0) then
        RD[0:31] = 0xFFFFFFFF
        ovh = 1
    if (divisorl == 0) then
        RD[32:63] = 0xFFFFFFFF
        ovl = 1
    SPEFSCR[OVH] = ovh
    SPEFSCR[OV] = ovl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | ovh
    SPEFSCR[SOV] = SPEFSCR[SOV] | ovl

The two dividends are the two elements of the contents of rA. The two divisors are the two elements of the contents of rB. Two 32-bit quotients are formed as a result of the division on each of the upper and lower elements and the quotients are placed into rD. The remainders are not supplied as a result of this operation. Both the operands and quotients are interpreted as unsigned integers. If an overflow occurs (see the Power Architecture UISA divwu instruction for the cases), the corresponding SPEFSCR bits are set; otherwise they are cleared. In case of overflow, a saturated value is delivered into the destination register.

eveqv
Vector Equivalent

eveqv rD,rA,rB

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 010 0001 1001

    RD[0:31] = RA[0:31] ≡ RB[0:31]          // Bitwise XNOR
    RD[32:63] = RA[32:63] ≡ RB[32:63]       // Bitwise XNOR

Performs a bitwise XNOR of each element of rA and rB and places the results into the corresponding elements of rD.

evextsb
Vector Extend Sign Byte

evextsb rD,rA

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = 0000 0 | 21:31 = 010 0000 1010

    RD[0:31] = EXTS(RA[24:31])
    RD[32:63] = EXTS(RA[56:63])

Extends the sign of the low-order byte in each of the elements in rA and places the results into rD.
evextsh
Vector Extend Sign Half Word

evextsh rD,rA

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = 0000 0 | 21:31 = 010 0000 1011

    RD[0:31] = EXTS(RA[16:31])
    RD[32:63] = EXTS(RA[48:63])

Extends the sign of the low-order half word in each of the elements in rA and places the results into rD.

evmergehi
Vector Merge High

evmergehi rD,rA,rB

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 010 0010 1100

    RD[0:31] = RA[0:31]
    RD[32:63] = RB[0:31]

The high-order elements of rA and rB are merged and placed into rD as shown in Figure 6-2.

Figure 6-2. High order element merging with evmergehi

NOTE
A vector splat high can be performed by specifying the same register in rA and rB.

evmergehilo
Vector Merge High/Low

evmergehilo rD,rA,rB

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 010 0010 1110

    RD[0:31] = RA[0:31]
    RD[32:63] = RB[32:63]

The high-order element of rA and the low-order element of rB are merged and placed into rD as shown in Figure 6-3.

Figure 6-3. High order element merging with evmergehilo

evmergelo
Vector Merge Low

evmergelo rD,rA,rB

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 010 0010 1101

    RD[0:31] = RA[32:63]
    RD[32:63] = RB[32:63]

The low-order elements of rA and rB are merged and placed in rD as shown in Figure 6-4.

Figure 6-4. Low order element merging evmergelo

NOTE
A vector splat low can be performed by specifying the same register in rA and rB.

evmergelohi
Vector Merge Low/High

evmergelohi rD,rA,rB

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 010 0010 1111

    RD[0:31] = RA[32:63]
    RD[32:63] = RB[0:31]

The low-order element of rA and the high-order element of rB are merged and placed into rD as shown in Figure 6-5.

Figure 6-5. Low order element merging evmergelohi

NOTE
A vector swap can be performed by specifying the same register in rA and rB.

evnand
Vector NAND

evnand rD,rA,rB

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 010 0001 1110

    RD[0:31] = ¬(RA[0:31] & RB[0:31])       // Bitwise NAND
    RD[32:63] = ¬(RA[32:63] & RB[32:63])    // Bitwise NAND

Performs a bitwise NAND of each element of rA and rB and places the results into the corresponding elements of rD.

evneg
Vector Negate

evneg rD,rA

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = 0000 0 | 21:31 = 010 0000 1001

    RD[0:31] = NEG(RA[0:31])
    RD[32:63] = NEG(RA[32:63])

The negative value of each element of rA is placed into the corresponding element of rD. The negative value of 0x8000_0000 (the most negative number) is returned as 0x8000_0000. No overflow is detected.

evnor
Vector NOR

evnor rD,rA,rB

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 010 0001 1000

    RD[0:31] = ¬(RA[0:31] | RB[0:31])       // Bitwise NOR
    RD[32:63] = ¬(RA[32:63] | RB[32:63])    // Bitwise NOR

Performs a bitwise NOR of each element of rA and rB and places the results into the corresponding elements of rD.

evor
Vector OR

evor rD,rA,rB

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 010 0001 0111

    RD[0:31] = RA[0:31] | RB[0:31]          // Bitwise OR
    RD[32:63] = RA[32:63] | RB[32:63]       // Bitwise OR

Performs a bitwise OR of each element of rA and rB and places the results into the corresponding elements of rD.

evorc
Vector OR with Complement

evorc rD,rA,rB

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 010 0001 1011

    RD[0:31] = RA[0:31] | (¬RB[0:31])       // Bitwise ORC
    RD[32:63] = RA[32:63] | (¬RB[32:63])    // Bitwise ORC

Performs a bitwise OR of each element of rA with the complement of the corresponding element of rB and places the results into the corresponding elements of rD.
evrlw
Vector Rotate Left Word

evrlw rD,rA,rB

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 010 0010 1000

    nh = RB[27:31]
    nl = RB[59:63]
    RD[0:31] = ROTL(RA[0:31], nh)
    RD[32:63] = ROTL(RA[32:63], nl)

Rotates left each of the elements of rA by the amounts specified in rB and places the results into rD. The rotate amounts are specified by 5-bit fields in rB. Separate rotate values for each element of rA are specified in bit positions rB[27:31] and rB[59:63].

evrlwi
Vector Rotate Left Word Immediate

evrlwi rD,rA,UIMM

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = UIMM | 21:31 = 010 0010 1010

    n = UIMM
    RD[0:31] = ROTL(RA[0:31], n)
    RD[32:63] = ROTL(RA[32:63], n)

Rotates left both elements of rA by the amount specified by the 5-bit UIMM immediate value and places the results into rD.

evrndw
Vector Round Word

evrndw rD,rA

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = 0000 0 | 21:31 = 010 0000 1100

    RD[0:31] = (RA[0:31] + 0x00008000) & 0xFFFF0000       // Modulo sum
    RD[32:63] = (RA[32:63] + 0x00008000) & 0xFFFF0000     // Modulo sum

Rounds the 32-bit elements of rA into 16 bits and places the results into rD. The resulting 16 bits of each element are placed in the most significant 16 bits of each element of rD, and the low-order 16 bits of each element are zeroed.

evsel
Vector Select

evsel rD,rA,rB,crfS

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:28 = 010 0111 1 | 29:31 = crfS

    ch = CR[crfS*4]
    cl = CR[crfS*4+1]
    if (ch == 1) then RD[0:31] = RA[0:31] else RD[0:31] = RB[0:31]
    if (cl == 1) then RD[32:63] = RA[32:63] else RD[32:63] = RB[32:63]

If the msb of the crfS field of CR is set, the high-order element of rA is placed in the high-order element of rD; otherwise, the high-order element of rB is placed into the high-order element of rD.
If the next most significant bit in the crfS field of CR is set, the low-order element of rA is placed in the low-order element of rD; otherwise, the low-order element of rB is placed into the low-order element of rD. This is shown in Figure 6-6.

Figure 6-6. evsel

evslw
Vector Shift Left Word

evslw rD,rA,rB

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 010 0010 0100

    nh = RB[26:31]
    nl = RB[58:63]
    RD[0:31] = SL(RA[0:31], nh)
    RD[32:63] = SL(RA[32:63], nl)

Shifts left each element of rA by the amounts specified in rB and places the results into rD. The separate shift amounts for each element are specified by 6-bit fields in rB in bit positions 26:31 and 58:63. Shift amounts from 32 to 63 give a zero result.

evslwi
Vector Shift Left Word Immediate

evslwi rD,rA,UIMM

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = UIMM | 21:31 = 010 0010 0110

    n = UIMM
    RD[0:31] = SL(RA[0:31], n)
    RD[32:63] = SL(RA[32:63], n)

Shifts left each element of rA by the 5-bit UIMM value and places the results into rD.

evsplatfi
Vector Splat Fractional Immediate

evsplatfi rD,SIMM

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = SIMM | 16:20 = 0000 0 | 21:31 = 010 0010 1011

    RD[0:31] = SIMM || 0...0        // SIMM followed by 27 zero bits
    RD[32:63] = SIMM || 0...0       // SIMM followed by 27 zero bits

The 5-bit SIMM value is padded with trailing zeros and placed into both elements of rD as shown in Figure 6-7. The SIMM value is placed in bit positions rD[0:4] and rD[32:36].

    SIMM:  SABCD
    RD:    SABCD000...........000000 (bits 0:31)   SABCD000...........000000 (bits 32:63)

Figure 6-7. Splat for evsplatfi

evsplati
Vector Splat Immediate

evsplati rD,SIMM

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = SIMM | 16:20 = 0000 0 | 21:31 = 010 0010 1001

    RD[0:31] = EXTS(SIMM)
    RD[32:63] = EXTS(SIMM)

The 5-bit SIMM immediate value is sign-extended and placed into both elements of rD as shown in Figure 6-8.
    SIMM:  SABCD
    RD:    SSS......................SABCD (bits 0:31)   SSS......................SABCD (bits 32:63)

Figure 6-8. Sign-extend in evsplati

evsrwis
Vector Shift Right Word Immediate Signed

evsrwis rD,rA,UIMM

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = UIMM | 21:31 = 010 0010 0011

    n = UIMM
    RD[0:31] = EXTS(RA[0:31-n])
    RD[32:63] = EXTS(RA[32:63-n])

Shifts right arithmetically each element of rA by the 5-bit UIMM value and places the results into rD. The sign bit of each source element in rA is extended right into the most significant bit positions of each result element.

evsrwiu
Vector Shift Right Word Immediate Unsigned

evsrwiu rD,rA,UIMM

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = UIMM | 21:31 = 010 0010 0010

    n = UIMM
    RD[0:31] = EXTZ(RA[0:31-n])
    RD[32:63] = EXTZ(RA[32:63-n])

Shifts right logically each element of rA by the 5-bit UIMM value and places the results into rD. Zero bits are shifted into the most significant bit positions of each result element.

evsrws
Vector Shift Right Word Signed

evsrws rD,rA,rB

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 010 0010 0001

    nh = RB[26:31]
    nl = RB[58:63]
    RD[0:31] = EXTS(RA[0:31-nh])
    RD[32:63] = EXTS(RA[32:63-nl])

Shifts right arithmetically each element of rA by the amount specified in rB and places the results into rD. Separate shift amounts for each element are specified by 6-bit fields in rB that occupy bit positions 26:31 and 58:63. The sign bit of each source element in rA is extended right into the most significant bit positions of each result element. Shift amounts from 32 to 63 give a result of 32 sign bits.
evsrwu
Vector Shift Right Word Unsigned

evsrwu rD,rA,rB

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 010 0010 0000

    nh = RB[26:31]
    nl = RB[58:63]
    RD[0:31] = EXTZ(RA[0:31-nh])
    RD[32:63] = EXTZ(RA[32:63-nl])

Shifts right logically each element of rA by the amounts specified in rB and places the results into rD. Separate shift amounts for each element are specified by 6-bit fields in rB that occupy bit positions 26:31 and 58:63. Zero bits are shifted into the most significant bit positions. Shift amounts from 32 to 63 give a zero result.

evsubfw
Vector Subtract from Word

evsubfw rD,rA,rB

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 010 0000 0100

    RD[0:31] = RB[0:31] - RA[0:31]          // Modulo sum
    RD[32:63] = RB[32:63] - RA[32:63]       // Modulo sum

Each element of rA is subtracted from the corresponding element of rB and the results are placed into the corresponding elements of rD.

evsubifw
Vector Subtract Immediate from Word

evsubifw rD,UIMM,rB

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = UIMM | 16:20 = RB | 21:31 = 010 0000 0110

    RD[0:31] = RB[0:31] - EXTZ(UIMM)        // Modulo sum
    RD[32:63] = RB[32:63] - EXTZ(UIMM)      // Modulo sum

The 5-bit UIMM value is zero-extended and subtracted from each element of rB and the results are placed into the corresponding elements of rD. Note that the same value is subtracted from each element.

evxor
Vector XOR

evxor rD,rA,rB

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 010 0001 0110

    RD[0:31] = RA[0:31] ⊕ RB[0:31]          // Bitwise XOR
    RD[32:63] = RA[32:63] ⊕ RB[32:63]       // Bitwise XOR

Performs a bitwise exclusive-OR of each element of rA and rB and places the results into the corresponding elements of rD.
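The register-controlled shifts in this section (evslw, evsrws, evsrwu) take a separate 6-bit shift amount for each element, and amounts of 32 to 63 are well defined. The following Python sketch illustrates that behavior; the function names mirror the mnemonics but are otherwise hypothetical, and registers are modeled as 64-bit integers with the high element in the upper 32 bits.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit value as a two's complement number."""
    return value - (1 << 32) if value & 0x80000000 else value


def _per_element(ra, rb, op):
    """Apply op(element, amount) to the high and low elements independently."""
    result = 0
    for shift in (32, 0):
        element = (ra >> shift) & MASK32
        amount = (rb >> shift) & 0x3F    # rB[26:31] or rB[58:63]
        result |= (op(element, amount) & MASK32) << shift
    return result


def evslw(ra, rb):
    # Amounts of 32 to 63 give a zero result.
    return _per_element(ra, rb, lambda x, n: x << n if n < 32 else 0)


def evsrwu(ra, rb):
    # Logical shift: zeros enter from the left; 32 to 63 give zero.
    return _per_element(ra, rb, lambda x, n: x >> n if n < 32 else 0)


def evsrws(ra, rb):
    # Arithmetic shift: 32 to 63 give 32 copies of the sign bit,
    # which equals an arithmetic shift by 31.
    return _per_element(ra, rb, lambda x, n: to_signed32(x) >> min(n, 31))
```

Because each element carries its own amount, one instruction can, for instance, shift the high element by 4 while clearing the low element with an amount of 32.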
6.4 Integer SPE multiply, multiply-accumulate, and operation to accumulator instructions (complex integer instructions)

A number of forms of multiply and multiply-accumulate operations are supported in the SPE APU, as are add and subtract to accumulator operations. The SPE supports signed and unsigned forms, and optional fractional forms. For all of these instructions, the fractional form does not apply to unsigned forms because integer and fractional forms are identical for unsigned operands. Table 6-3 defines mnemonic extensions for these instructions.

Table 6-3. Mnemonic extensions for multiply-accumulate instructions

Extension   Meaning                               Comments

Multiply form
he          halfword even                         16 × 16 → 32
heg         halfword even guarded                 16 × 16 → 32, 64-bit final accum result
ho          halfword odd                          16 × 16 → 32
hog         halfword odd guarded                  16 × 16 → 32, 64-bit final accum result
w           word                                  32 × 32 → 64
wh          word high                             32 × 32 → 32, high order 32 bits of product
wl          word low                              32 × 32 → 32, low order 32 bits of product

Data type
smf         signed modulo fractional              Wrap, no saturate
smi         signed modulo integer                 Wrap, no saturate
ssf         signed saturate fractional            -
ssi         signed saturate integer               -
umi         unsigned modulo integer               Wrap, no saturate
usi         unsigned saturate integer             -

Accumulate options
a           update accumulator                    Update accumulator (no add)
aa          add to accumulator                    Add result to accumulator (64-bit sum)
aaw         add to accumulator (words)            Add word results to accumulator words (pair of 32-bit sums)
an          add negated                           Add negated result to accumulator (64-bit sum)
anw         add negated to accumulator (words)    Add negated word results to accumulator words (pair of 32-bit sums)

6.4.1 Multiply halfword instructions

The following instructions perform 16x16 multiplies from the odd or even half of elements, with and without accumulates, using signed or unsigned integer or fractional operands, and with optional saturation.

evmhegsmfaa
Multiply Half Words, Even, Guarded, Signed, Modulo, Fractional and Accumulate

evmhegsmfaa rD,rA,rB (O=0, F=1, S=1)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 0010 1011

    prod[0:31] = rA[32:47] * rB[32:47]
    temp1[0:63] = EXTS(prod[0:31] || 0)
    temp2[0:64] = ACC[0:63] + temp1[0:63]
    rD[0:63] = ACC[0:63] = temp2[1:64]

The low even-numbered signed fractional halfword element in rA is multiplied by the corresponding signed fractional halfword element in rB. The 32-bit intermediate product is sign-extended to 64 bits and then shifted left by one bit and added to the contents of the 64-bit accumulator to form a 65-bit intermediate sum. The lower 64 bits of the intermediate sum are placed back into the accumulator and also written into rD.

NOTE
This is a modulo sum. There is no check for overflow and no saturation is performed. An overflow from the 64-bit sum, if one occurs, is not recorded into SPEFSCR.

Figure 6-9. evmhegsmfaa
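As an illustration of the guarded fractional multiply-accumulate, the following Python sketch follows the evmhegsmfaa pseudocode. The function and helper names are hypothetical; registers and the accumulator are modeled as 64-bit integers, so rA[32:47] is the upper halfword of the low 32 bits. Python integers are unbounded, so the sign extension is implicit and the final mask performs the modulo-64-bit wrap.

```python
MASK64 = (1 << 64) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def evmhegsmfaa(acc, ra, rb):
    """Return the 64-bit value written to both the accumulator and rD."""
    a = to_signed(ra >> 16, 16)   # rA[32:47]
    b = to_signed(rb >> 16, 16)   # rB[32:47]
    temp1 = (a * b) << 1          # prod || 0, sign-extended implicitly
    return (acc + temp1) & MASK64 # keep the lower 64 bits of the 65-bit sum
```

Because the sum is kept to 64 bits, the product of 0x8000 and 0x8000 (that is, -1.0 times -1.0) accumulates as +2^31 without saturation, which is the headroom the guarded forms provide.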
evmhegsmfan
Multiply Half Words, Even, Guarded, Signed, Modulo, Fractional and Accumulate Negative

evmhegsmfan rD,rA,rB (O=0, F=1, S=1)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 1010 1011

    prod[0:31] = rA[32:47] * rB[32:47]
    temp1[0:63] = EXTS(prod[0:31] || 0)
    temp2[0:64] = ACC[0:63] - temp1[0:63]
    rD[0:63] = ACC[0:63] = temp2[1:64]

The low even-numbered signed fractional halfword element in rA is multiplied by the corresponding signed fractional halfword element in rB. The 32-bit intermediate product is sign-extended to 64 bits and then shifted left by one bit and subtracted from the contents of the 64-bit accumulator to form a 65-bit intermediate difference. The lower 64 bits of the intermediate difference are placed back into the accumulator and also written into rD.

NOTE
This is a modulo difference. There is no check for overflow and no saturation is performed. An overflow from the 64-bit difference, if one occurs, is not recorded into SPEFSCR.

Figure 6-10. evmhegsmfan

evmhegsmiaa
Multiply Half Words, Even, Guarded, Signed, Modulo, Integer and Accumulate

evmhegsmiaa rD,rA,rB (O=0, F=0, S=1)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 0010 1001

    prod[0:31] = rA[32:47] *si rB[32:47]
    temp1[0:63] = EXTS(prod[0:31])
    temp2[0:64] = ACC[0:63] + temp1[0:63]
    rD[0:63] = ACC[0:63] = temp2[1:64]

The low even-numbered signed integer halfword element in rA is multiplied by the corresponding signed integer halfword element in rB. The intermediate product is sign-extended to 64 bits and added to the contents of the 64-bit accumulator to form a 65-bit intermediate sum. The lower 64 bits of the intermediate sum are placed back into the accumulator and also written into rD.

NOTE
This is a modulo sum. There is no check for overflow and no saturation is performed.
An overflow from the 64-bit sum, if one occurs, is not recorded into SPEFSCR.

Figure 6-11. evmhegsmiaa

evmhegsmian
Multiply Half Words, Even, Guarded, Signed, Modulo, Integer and Accumulate Negative

evmhegsmian rD,rA,rB (O=0, F=0, S=1)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 1010 1001

    prod[0:31] = rA[32:47] *si rB[32:47]
    temp1[0:63] = EXTS(prod[0:31])
    temp2[0:64] = ACC[0:63] - temp1[0:63]
    rD[0:63] = ACC[0:63] = temp2[1:64]

The low even-numbered signed integer halfword element in rA is multiplied by the corresponding signed integer halfword element in rB. The intermediate product is sign-extended to 64 bits and subtracted from the contents of the 64-bit accumulator to form a 65-bit intermediate difference. The lower 64 bits of the intermediate difference are placed back into the accumulator and also written into rD.

NOTE
This is a modulo difference. There is no check for overflow and no saturation is performed. An overflow from the 64-bit difference, if one occurs, is not recorded into SPEFSCR.

Figure 6-12. evmhegsmian

evmhegumiaa
Multiply Half Words, Even, Guarded, Unsigned, Modulo, Integer and Accumulate

evmhegumiaa rD,rA,rB (O=0, F=0, S=0)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 0010 1000

    prod[0:31] = rA[32:47] *ui rB[32:47]
    temp1[0:63] = EXTZ(prod[0:31])
    temp2[0:64] = ACC[0:63] + temp1[0:63]
    rD[0:63] = ACC[0:63] = temp2[1:64]

The low even-numbered unsigned integer halfword element in rA is multiplied by the corresponding unsigned integer halfword element in rB. The intermediate product is zero-extended to 64 bits and added to the contents of the 64-bit accumulator to form a 65-bit intermediate sum.
The lower 64 bits of the intermediate sum are placed back into the accumulator and also written into rD.

Figure 6-13. evmhegumiaa

evmhegumian
Multiply Half Words, Even, Guarded, Unsigned, Modulo, Integer and Accumulate Negative

evmhegumian rD,rA,rB    (O=0, F=0, S=0)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 1010 1000

    prod[0:31] = rA[32:47] *ui rB[32:47]
    temp1[0:63] = EXTZ(prod[0:31])
    temp2[0:64] = ACC[0:63] - temp1[0:63]
    rD[0:63] = ACC[0:63] = temp2[1:64]

The low even-numbered unsigned integer halfword element in rA is multiplied by the corresponding unsigned integer halfword element in rB. The intermediate product is zero-extended to 64 bits and subtracted from the contents of the 64-bit accumulator to form a 65-bit intermediate difference. The lower 64 bits of the intermediate difference are placed back into the accumulator and also written into rD.

NOTE: This is a modulo difference. There is no check for overflow and no saturation is performed. An overflow from the 64-bit difference, if one occurs, is not recorded into SPEFSCR.

Figure 6-14. evmhegumian

evmhesmf
Vector Multiply Half Words, Even, Signed, Modulo, Fractional

evmhesmf rD,rA,rB    (M=1, O=0, F=1, S=1, A=0)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 100 0000 1011

    prod[0:31] = rA[0:15] * rB[0:15]
    prod[32:63] = rA[32:47] * rB[32:47]
    temp1[0:32] = prod[0:31] || 0
    temp2[0:32] = prod[32:63] || 0
    rD[0:31] = temp1[1:32]
    rD[32:63] = temp2[1:32]

Each even-numbered signed fractional halfword element in rA is multiplied by the corresponding signed fractional halfword element in rB. The two 32-bit signed fractional products are shifted left by one bit to remove the redundant sign bit, and are then placed into the two word elements of rD.
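The evmhesmf operation described above can be sketched as a small software model. This is only an illustration under stated assumptions: registers are held as 64-bit Python integers with bit 0 as the most significant bit, and the helper names `s16` and `evmhesmf` are hypothetical, not part of any toolchain.

```python
MASK32 = 0xFFFFFFFF

def s16(x):
    """Interpret the low 16 bits of x as a two's-complement integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def evmhesmf(ra, rb):
    """Hypothetical model of evmhesmf on 64-bit register values."""
    # rA[0:15] * rB[0:15], then append a 0 bit (shift left by one)
    hi = (s16(ra >> 48) * s16(rb >> 48)) << 1
    # rA[32:47] * rB[32:47], then append a 0 bit
    lo = (s16(ra >> 16) * s16(rb >> 16)) << 1
    # Keep the low 32 bits of each 33-bit value (modulo behaviour)
    return ((hi & MASK32) << 32) | (lo & MASK32)
```

In this model 0.5 * 0.5 (0x4000 in each even halfword) gives 0x20000000 in each word, and -1.0 * -1.0 wraps to 0x80000000 because the modulo form does not saturate.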
Figure 6-15. evmhesmf

evmhesmfa
Vector Multiply Half Words, Even, Signed, Modulo, Fractional, to Accumulator

evmhesmfa rD,rA,rB    (M=1, O=0, F=1, S=1, A=1)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 100 0010 1011

    prod[0:31] = rA[0:15] * rB[0:15]
    prod[32:63] = rA[32:47] * rB[32:47]
    temp1[0:32] = prod[0:31] || 0
    temp2[0:32] = prod[32:63] || 0
    rD[0:31] = temp1[1:32]
    rD[32:63] = temp2[1:32]
    ACC[0:63] = rD[0:63]

Each even-numbered signed fractional halfword element in rA is multiplied by the corresponding signed fractional halfword element in rB. The two 32-bit signed fractional products are shifted left by one bit to remove the redundant sign bit, and are then placed into the two word elements of rD. The result in rD is also placed in the accumulator.

Other registers altered: ACC

Figure 6-16. evmhesmfa

evmhesmfaaw
Vector Multiply Half Words, Even, Signed, Modulo, Fractional and Accumulate into Words

evmhesmfaaw rD,rA,rB    (M=1, O=0, F=1, S=1)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 0000 1011

    temp1[0:32] = (rA[0:15] * rB[0:15]) || 0
    temp2[0:32] = (rA[32:47] * rB[32:47]) || 0
    temp3[0:32] = ACC[0:31] + temp1[1:32]
    temp4[0:32] = ACC[32:63] + temp2[1:32]
    ACC[0:31] = rD[0:31] = temp3[1:32]
    ACC[32:63] = rD[32:63] = temp4[1:32]

For each word element in the accumulator the following operations are performed in the order shown: Each even-numbered signed fractional halfword element in rA is multiplied by the corresponding signed fractional halfword element in rB. The intermediate 32-bit product is shifted left by one bit to remove the redundant sign bit, and is then added to the contents of the accumulator word to form a 33-bit intermediate sum. The low 32 bits of the intermediate sum are placed into the accumulator word and the corresponding rD word.
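The per-word modulo accumulation above can be illustrated with a short sketch. It assumes the accumulator and registers are passed as 64-bit Python integers (bit 0 most significant) and returns the value written to both ACC and rD; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def s16(x):
    """Interpret the low 16 bits of x as a two's-complement integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def evmhesmfaaw(acc, ra, rb):
    """Hypothetical model: returns the new ACC value (also written to rD)."""
    hi = (s16(ra >> 48) * s16(rb >> 48)) << 1   # even halfword 0, fractional
    lo = (s16(ra >> 16) * s16(rb >> 16)) << 1   # even halfword 2, fractional
    # Each accumulator word is updated independently; only the low 32 bits
    # of each intermediate sum are kept, so overflow wraps silently.
    new_hi = ((acc >> 32) + hi) & MASK32
    new_lo = (acc + lo) & MASK32
    return (new_hi << 32) | new_lo
```

With the upper accumulator word at 0x7FFFFFFF, adding the product 0x20000000 wraps to 0x9FFFFFFF in this model; a saturating form such as evmhessfaaw would instead clamp the word.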
Other registers altered: ACC

Figure 6-17. evmhesmfaaw

evmhesmfanw
Vector Multiply Half Words, Even, Signed, Modulo, Fractional and Accumulate Negative into Words

evmhesmfanw rD,rA,rB    (M=1, O=0, F=1, S=1)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 1000 1011

    temp1[0:32] = (rA[0:15] * rB[0:15]) || 0
    temp2[0:32] = (rA[32:47] * rB[32:47]) || 0
    temp3[0:32] = ACC[0:31] - temp1[1:32]
    temp4[0:32] = ACC[32:63] - temp2[1:32]
    ACC[0:31] = rD[0:31] = temp3[1:32]
    ACC[32:63] = rD[32:63] = temp4[1:32]

For each word element in the accumulator the following operations are performed in the order shown: Each even-numbered signed fractional halfword element in rA is multiplied by the corresponding signed fractional halfword element in rB. The intermediate 32-bit product is shifted left by one bit to remove the redundant sign bit, and is then subtracted from the contents of the accumulator word to form a 33-bit intermediate difference. The low 32 bits of the intermediate difference are placed into the accumulator word and the corresponding rD word.

Other registers altered: ACC

Figure 6-18. evmhesmfanw

evmhesmi
Vector Multiply Half Words, Even, Signed, Modulo, Integer

evmhesmi rD,rA,rB    (M=1, O=0, F=0, S=1, A=0)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 100 0000 1001

    rD[0:31] = rA[0:15] *si rB[0:15]
    rD[32:63] = rA[32:47] *si rB[32:47]

Each even-numbered signed integer halfword element in rA is multiplied by the corresponding signed integer halfword element in rB. The two 32-bit signed integer products are placed into the two word elements of rD.

Figure 6-19. evmhesmi
evmhesmia
Vector Multiply Half Words, Even, Signed, Modulo, Integer, to Accumulator

evmhesmia rD,rA,rB    (M=1, O=0, F=0, S=1, A=1)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 100 0010 1001

    rD[0:31] = rA[0:15] *si rB[0:15]
    rD[32:63] = rA[32:47] *si rB[32:47]
    ACC[0:63] = rD[0:63]

Each even-numbered signed integer halfword element in rA is multiplied by the corresponding signed integer halfword element in rB. The two 32-bit signed integer products are placed into the two word elements of rD. The result in rD is also placed in the accumulator.

Other registers altered: ACC

Figure 6-20. evmhesmia

evmhesmiaaw
Vector Multiply Half Words, Even, Signed, Modulo, Integer and Accumulate into Words

evmhesmiaaw rD,rA,rB    (M=1, O=0, F=0, S=1)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 0000 1001

    temp1[0:31] = rA[0:15] *si rB[0:15]
    temp2[0:31] = rA[32:47] *si rB[32:47]
    temp3[0:32] = ACC[0:31] + temp1[0:31]
    temp4[0:32] = ACC[32:63] + temp2[0:31]
    ACC[0:31] = rD[0:31] = temp3[1:32]
    ACC[32:63] = rD[32:63] = temp4[1:32]

For each word element in the accumulator the following operations are performed in the order shown: Each even-numbered signed integer halfword element in rA is multiplied by the corresponding signed integer halfword element in rB. The intermediate 32-bit product is added to the contents of the accumulator word to form a 33-bit intermediate sum. The low 32 bits of the intermediate sum are placed into the accumulator word and the corresponding rD word.

Other registers altered: ACC

Figure 6-21. evmhesmiaaw
evmhesmianw
Vector Multiply Half Words, Even, Signed, Modulo, Integer and Accumulate Negative into Words

evmhesmianw rD,rA,rB    (M=1, O=0, F=0, S=1)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 1000 1001

    temp1[0:31] = rA[0:15] *si rB[0:15]
    temp2[0:31] = rA[32:47] *si rB[32:47]
    temp3[0:32] = ACC[0:31] - temp1[0:31]
    temp4[0:32] = ACC[32:63] - temp2[0:31]
    ACC[0:31] = rD[0:31] = temp3[1:32]
    ACC[32:63] = rD[32:63] = temp4[1:32]

For each word element in the accumulator the following operations are performed in the order shown: Each even-numbered signed integer halfword element in rA is multiplied by the corresponding signed integer halfword element in rB. The intermediate 32-bit product is subtracted from the contents of the accumulator word to form a 33-bit intermediate difference. The low 32 bits of the intermediate difference are placed into the accumulator word and the corresponding rD word.

Other registers altered: ACC

Figure 6-22. evmhesmianw

evmhessf
Vector Multiply Half Words, Even, Signed, Saturate, Fractional

evmhessf rD,rA,rB    (M=0, O=0, F=1, S=1, A=0)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 100 0000 0011

    temp1[0:32] = (rA[0:15] * rB[0:15]) || 0
    temp2[0:32] = (rA[32:47] * rB[32:47]) || 0
    movh = temp1[0] ⊕ temp1[1]
    movl = temp2[0] ⊕ temp2[1]
    rD[0:31] = SATURATE(movh, 0x7FFFFFFF, temp1[1:32])
    rD[32:63] = SATURATE(movl, 0x7FFFFFFF, temp2[1:32])
    SPEFSCR[OVH] = movh
    SPEFSCR[OV] = movl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | movh
    SPEFSCR[SOV] = SPEFSCR[SOV] | movl

Each even-numbered signed fractional halfword element in rA is multiplied by the corresponding signed fractional halfword element in rB. The two 32-bit signed fractional products are shifted left one bit to eliminate the redundant sign bit, and are then placed into the two word elements of rD.
If the inputs are -1.0 and -1.0 the result is saturated to the most positive signed fraction (0x7FFFFFFF). If saturation occurs, the appropriate overflow and summary overflow bits are recorded in SPEFSCR.

Other registers altered: SPEFSCR

Figure 6-23. evmhessf

evmhessfa
Vector Multiply Half Words, Even, Signed, Saturate, Fractional, to Accumulator

evmhessfa rD,rA,rB    (M=0, O=0, F=1, S=1, A=1)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 100 0010 0011

    temp1[0:32] = (rA[0:15] * rB[0:15]) || 0
    temp2[0:32] = (rA[32:47] * rB[32:47]) || 0
    movh = temp1[0] ⊕ temp1[1]
    movl = temp2[0] ⊕ temp2[1]
    rD[0:31] = SATURATE(movh, 0x7FFFFFFF, temp1[1:32])
    rD[32:63] = SATURATE(movl, 0x7FFFFFFF, temp2[1:32])
    ACC[0:63] = rD[0:63]
    SPEFSCR[OVH] = movh
    SPEFSCR[OV] = movl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | movh
    SPEFSCR[SOV] = SPEFSCR[SOV] | movl

Each even-numbered signed fractional halfword element in rA is multiplied by the corresponding signed fractional halfword element in rB. The two 32-bit signed fractional products are shifted left one bit to eliminate the redundant sign bit, and are then placed into the two word elements of rD. If the inputs are -1.0 and -1.0 the result is saturated to the most positive signed fraction (0x7FFFFFFF). The result in rD is also placed in the accumulator. If saturation occurs, the appropriate overflow and summary overflow bits are recorded in SPEFSCR.

Other registers altered: SPEFSCR, ACC

Figure 6-24. evmhessfa
evmhessfaaw
Vector Multiply Half Words, Even, Signed, Saturate, Fractional and Accumulate into Words

evmhessfaaw rD,rA,rB    (M=0, O=0, F=1, S=1)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 0000 0011

    temp1[0:32] = (rA[0:15] * rB[0:15]) || 0
    temp2[0:32] = (rA[32:47] * rB[32:47]) || 0
    movh = temp1[0] ⊕ temp1[1]
    movl = temp2[0] ⊕ temp2[1]
    temp3[0:31] = SATURATE(movh, 0x7FFFFFFF, temp1[1:32])
    temp4[0:31] = SATURATE(movl, 0x7FFFFFFF, temp2[1:32])
    temp5[0:32] = {ACC[0], ACC[0:31]} + {temp3[0], temp3[0:31]}
    temp6[0:32] = {ACC[32], ACC[32:63]} + {temp4[0], temp4[0:31]}
    ovh = temp5[0] ⊕ temp5[1]
    ovl = temp6[0] ⊕ temp6[1]
    rD[0:31] = SATURATE_ACC(ovh, temp5[0], 0x80000000, 0x7FFFFFFF, temp5[1:32])
    rD[32:63] = SATURATE_ACC(ovl, temp6[0], 0x80000000, 0x7FFFFFFF, temp6[1:32])
    ACC[0:31] = rD[0:31]
    ACC[32:63] = rD[32:63]
    SPEFSCR[OVH] = movh | ovh
    SPEFSCR[OV] = movl | ovl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | movh | ovh
    SPEFSCR[SOV] = SPEFSCR[SOV] | movl | ovl

For each word element in the accumulator the following operations are performed in the order shown: Each even-numbered signed fractional halfword element in rA is multiplied by the corresponding signed fractional halfword element in rB. The two 32-bit signed fractional products are shifted left one bit to eliminate the redundant sign bit. If the inputs are -1.0 and -1.0 the intermediate result is saturated to the most positive signed fraction (0x7FFFFFFF). The intermediate 32-bit product is added to the contents of the accumulator word to form an intermediate sum. If the intermediate sum has overflowed, the appropriate saturation value (0x7FFFFFFF if positive overflow or 0x80000000 if negative overflow) is placed into the accumulator word and the corresponding rD word. Otherwise, the low 32 bits of the intermediate sum are placed into the accumulator word and the corresponding rD word. If there is an overflow from either the multiply or the addition, the overflow information is recorded in the SPEFSCR overflow and summary overflow bits.
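The two saturation stages described above (multiply, then accumulate) can be sketched in software as follows. This is a minimal model under assumptions: registers are 64-bit Python integers with bit 0 most significant, all helper names are hypothetical, and the multiply overflow test is written as an explicit -1.0 * -1.0 check, which is the only input pair for which the top two bits of the 33-bit shifted product differ.

```python
MASK32 = 0xFFFFFFFF

def s16(x):
    """Low 16 bits of x as a two's-complement integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def s32(x):
    """Low 32 bits of x as a two's-complement integer."""
    x &= MASK32
    return x - 0x100000000 if x & 0x80000000 else x

def frac_mul_sat(a, b):
    """Signed fractional multiply; returns (product, mov)."""
    a, b = s16(a), s16(b)
    if a == -0x8000 and b == -0x8000:      # -1.0 * -1.0 cannot be represented
        return 0x7FFFFFFF, 1
    return (a * b) << 1, 0

def add_sat(acc_word, addend):
    """Saturating signed 32-bit add; returns (word, ov)."""
    total = s32(acc_word) + addend
    if total > 0x7FFFFFFF:
        return 0x7FFFFFFF, 1
    if total < -0x80000000:
        return 0x80000000, 1
    return total & MASK32, 0

def evmhessfaaw(acc, ra, rb):
    """Hypothetical model: returns (new ACC and rD value, OVH, OV)."""
    ph, movh = frac_mul_sat(ra >> 48, rb >> 48)
    pl, movl = frac_mul_sat(ra >> 16, rb >> 16)
    wh, ovh = add_sat(acc >> 32, ph)
    wl, ovl = add_sat(acc, pl)
    return (wh << 32) | wl, movh | ovh, movl | ovl
```

The sticky SOVH and SOV bits are not modelled; a caller would OR the returned flags into its own summary state.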
Other registers altered: SPEFSCR, ACC

Figure 6-25. evmhessfaaw

evmhessfanw
Vector Multiply Half Words, Even, Signed, Saturate, Fractional and Accumulate Negative into Words

evmhessfanw rD,rA,rB    (M=0, O=0, F=1, S=1)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 1000 0011

    temp1[0:32] = (rA[0:15] * rB[0:15]) || 0
    temp2[0:32] = (rA[32:47] * rB[32:47]) || 0
    movh = temp1[0] ⊕ temp1[1]
    movl = temp2[0] ⊕ temp2[1]
    temp3[0:31] = SATURATE(movh, 0x7FFFFFFF, temp1[1:32])
    temp4[0:31] = SATURATE(movl, 0x7FFFFFFF, temp2[1:32])
    temp5[0:32] = {ACC[0], ACC[0:31]} - {temp3[0], temp3[0:31]}
    temp6[0:32] = {ACC[32], ACC[32:63]} - {temp4[0], temp4[0:31]}
    ovh = temp5[0] ⊕ temp5[1]
    ovl = temp6[0] ⊕ temp6[1]
    rD[0:31] = SATURATE_ACC(ovh, temp5[0], 0x80000000, 0x7FFFFFFF, temp5[1:32])
    rD[32:63] = SATURATE_ACC(ovl, temp6[0], 0x80000000, 0x7FFFFFFF, temp6[1:32])
    ACC[0:31] = rD[0:31]
    ACC[32:63] = rD[32:63]
    SPEFSCR[OVH] = movh | ovh
    SPEFSCR[OV] = movl | ovl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | movh | ovh
    SPEFSCR[SOV] = SPEFSCR[SOV] | movl | ovl

For each word element in the accumulator the following operations are performed in the order shown: Each even-numbered signed fractional halfword element in rA is multiplied by the corresponding signed fractional halfword element in rB. The two 32-bit signed fractional products are shifted left one bit to eliminate the redundant sign bit. If the inputs are -1.0 and -1.0 the intermediate result is saturated to the most positive signed fraction (0x7FFFFFFF). The intermediate 32-bit product is subtracted from the contents of the accumulator word to form an intermediate difference. If the intermediate difference has overflowed, the appropriate saturation value (0x7FFFFFFF if positive overflow or 0x80000000 if negative overflow) is placed into the accumulator word and the corresponding rD word.
Otherwise, the low 32 bits of the intermediate difference are placed into the accumulator word and the corresponding rD word. If there is an overflow from either the multiply or the subtraction, the overflow information is recorded in the SPEFSCR overflow and summary overflow bits.

Other registers altered: SPEFSCR, ACC

Figure 6-26. evmhessfanw

evmhessiaaw
Vector Multiply Half Words, Even, Signed, Saturate, Integer and Accumulate into Words

evmhessiaaw rD,rA,rB    (M=0, O=0, F=0, S=1)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 0000 0001

    temp1[0:31] = rA[0:15] *si rB[0:15]
    temp2[0:31] = rA[32:47] *si rB[32:47]
    temp3[0:32] = {ACC[0], ACC[0:31]} + {temp1[0], temp1[0:31]}
    temp4[0:32] = {ACC[32], ACC[32:63]} + {temp2[0], temp2[0:31]}
    ovh = temp3[0] ⊕ temp3[1]
    ovl = temp4[0] ⊕ temp4[1]
    rD[0:31] = SATURATE_ACC(ovh, temp3[0], 0x80000000, 0x7FFFFFFF, temp3[1:32])
    rD[32:63] = SATURATE_ACC(ovl, temp4[0], 0x80000000, 0x7FFFFFFF, temp4[1:32])
    ACC[0:31] = rD[0:31]
    ACC[32:63] = rD[32:63]
    SPEFSCR[OVH] = ovh
    SPEFSCR[OV] = ovl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | ovh
    SPEFSCR[SOV] = SPEFSCR[SOV] | ovl

For each word element in the accumulator the following operations are performed in the order shown: Each even-numbered signed integer halfword element in rA is multiplied by the corresponding signed integer halfword element in rB. The intermediate 32-bit product is added to the contents of the accumulator word to form an intermediate sum. If the intermediate sum has overflowed, the appropriate saturation value (0x7FFFFFFF if positive overflow or 0x80000000 if negative overflow) is placed into the accumulator word and the corresponding rD word. Otherwise, the low 32 bits of the intermediate sum are placed into the accumulator word and the corresponding rD word.
If there is an overflow from the addition, the overflow information is recorded in the SPEFSCR overflow and summary overflow bits.

Other registers altered: SPEFSCR, ACC

Figure 6-27. Even form of vector halfword multiply (evmhessiaaw)

evmhessianw
Vector Multiply Half Words, Even, Signed, Saturate, Integer and Accumulate Negative into Words

evmhessianw rD,rA,rB    (M=0, O=0, F=0, S=1)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 1000 0001

    temp1[0:31] = rA[0:15] *si rB[0:15]
    temp2[0:31] = rA[32:47] *si rB[32:47]
    temp3[0:32] = {ACC[0], ACC[0:31]} - {temp1[0], temp1[0:31]}
    temp4[0:32] = {ACC[32], ACC[32:63]} - {temp2[0], temp2[0:31]}
    ovh = temp3[0] ⊕ temp3[1]
    ovl = temp4[0] ⊕ temp4[1]
    rD[0:31] = SATURATE_ACC(ovh, temp3[0], 0x80000000, 0x7FFFFFFF, temp3[1:32])
    rD[32:63] = SATURATE_ACC(ovl, temp4[0], 0x80000000, 0x7FFFFFFF, temp4[1:32])
    ACC[0:31] = rD[0:31]
    ACC[32:63] = rD[32:63]
    SPEFSCR[OVH] = ovh
    SPEFSCR[OV] = ovl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | ovh
    SPEFSCR[SOV] = SPEFSCR[SOV] | ovl

For each word element in the accumulator, the following operations are performed in the order shown: Each even-numbered signed integer halfword element in rA is multiplied by the corresponding signed integer halfword element in rB. The intermediate 32-bit product is subtracted from the contents of the accumulator word to form a 33-bit intermediate difference. If the intermediate difference has overflowed, the appropriate saturation value (0x7FFFFFFF if positive overflow or 0x80000000 if negative overflow) is placed into the accumulator word and the corresponding rD word. Otherwise, the low 32 bits of the intermediate difference are placed into the accumulator word and the corresponding rD word. If there is an overflow from the subtraction, the overflow information is recorded in the SPEFSCR overflow and summary overflow bits.
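The 33-bit overflow test used by the saturating subtraction above can be mirrored bit for bit in a short sketch. The model assumes each operand is sign-extended by one bit, the difference is kept to 33 bits, and overflow is the exclusive-OR of the top two bits; the names `sext33`, `sat_sub_word` and `evmhessianw` are hypothetical helpers chosen for this illustration.

```python
MASK32 = 0xFFFFFFFF
MASK33 = 0x1FFFFFFFF

def s16(x):
    """Low 16 bits of x as a two's-complement integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def sext33(word):
    """Form {w[0], w[0:31]}: replicate the sign bit into a 33rd bit."""
    word &= MASK32
    return word | ((word >> 31) << 32)

def sat_sub_word(acc_word, prod):
    """Return (result word, ov) for one accumulator word."""
    t = (sext33(acc_word) - sext33(prod)) & MASK33   # 33-bit difference
    b0 = (t >> 32) & 1                               # bit 0 (sign of true result)
    b1 = (t >> 31) & 1                               # bit 1
    if b0 ^ b1:                                      # top bits differ: overflow
        return (0x80000000 if b0 else 0x7FFFFFFF), 1
    return t & MASK32, 0

def evmhessianw(acc, ra, rb):
    """Hypothetical model: returns (new ACC and rD value, OVH, OV)."""
    ph = s16(ra >> 48) * s16(rb >> 48)
    pl = s16(ra >> 16) * s16(rb >> 16)
    wh, ovh = sat_sub_word(acc >> 32, ph)
    wl, ovl = sat_sub_word(acc, pl)
    return (wh << 32) | wl, ovh, ovl
```

Comparing the top two bits of the 33-bit result avoids any wider arithmetic, which is presumably why the pseudocode is written that way; the sketch makes no claim about the actual hardware implementation.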
Other registers altered: SPEFSCR, ACC

Figure 6-28. evmhessianw

evmheumi
Vector Multiply Half Words, Even, Unsigned, Modulo, Integer

evmheumi rD,rA,rB    (M=1, O=0, F=0, S=0, A=0)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 100 0000 1000

    rD[0:31] = rA[0:15] *ui rB[0:15]
    rD[32:63] = rA[32:47] *ui rB[32:47]

Each even-numbered unsigned integer halfword element in rA is multiplied by the corresponding unsigned integer halfword element in rB. The two 32-bit unsigned integer products are placed into the two word elements of rD.

Figure 6-29. evmheumi: even multiply of two unsigned modulo integer elements

evmheumia
Vector Multiply Half Words, Even, Unsigned, Modulo, Integer, to Accumulator

evmheumia rD,rA,rB    (M=1, O=0, F=0, S=0, A=1)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 100 0010 1000

    rD[0:31] = rA[0:15] *ui rB[0:15]
    rD[32:63] = rA[32:47] *ui rB[32:47]
    ACC[0:63] = rD[0:63]

Each even-numbered unsigned integer halfword element in rA is multiplied by the corresponding unsigned integer halfword element in rB. The two 32-bit unsigned integer products are placed into the two word elements of rD. The result in rD is also placed in the accumulator.

Other registers altered: ACC

Figure 6-30. evmheumia
evmheumiaaw
Vector Multiply Half Words, Even, Unsigned, Modulo, Integer and Accumulate into Words

evmheumiaaw rD,rA,rB    (M=1, O=0, F=0, S=0)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 0000 1000

    temp1[0:31] = rA[0:15] *ui rB[0:15]
    temp2[0:31] = rA[32:47] *ui rB[32:47]
    temp3[0:32] = ACC[0:31] + temp1[0:31]
    temp4[0:32] = ACC[32:63] + temp2[0:31]
    ACC[0:31] = rD[0:31] = temp3[1:32]
    ACC[32:63] = rD[32:63] = temp4[1:32]

For each word element in the accumulator the following operations are performed in the order shown: Each even-numbered unsigned integer halfword element in rA is multiplied by the corresponding unsigned integer halfword element in rB. The intermediate 32-bit product is added to the contents of the accumulator word to form a 33-bit intermediate sum. The low 32 bits of the intermediate sum are placed into the accumulator word and the corresponding rD word.

Other registers altered: ACC

Figure 6-31. evmheumiaaw

evmheumianw
Vector Multiply Half Words, Even, Unsigned, Modulo, Integer and Accumulate Negative into Words

evmheumianw rD,rA,rB    (M=1, O=0, F=0, S=0)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 1000 1000

    temp1[0:31] = rA[0:15] *ui rB[0:15]
    temp2[0:31] = rA[32:47] *ui rB[32:47]
    temp3[0:32] = ACC[0:31] - temp1[0:31]
    temp4[0:32] = ACC[32:63] - temp2[0:31]
    ACC[0:31] = rD[0:31] = temp3[1:32]
    ACC[32:63] = rD[32:63] = temp4[1:32]

For each word element in the accumulator the following operations are performed in the order shown: Each even-numbered unsigned integer halfword element in rA is multiplied by the corresponding unsigned integer halfword element in rB. The intermediate 32-bit product is subtracted from the contents of the accumulator word to form a 33-bit intermediate difference. The low 32 bits of the intermediate difference are placed into the accumulator word and the corresponding rD word.
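The unsigned modulo accumulate forms above differ only in the sign of the update, so one sketch can cover both. It assumes 64-bit Python integers with bit 0 most significant; the function name and the `negate` flag are hypothetical conveniences, not instruction fields.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def evmheumi_acc(acc, ra, rb, negate=False):
    """Hypothetical model of evmheumiaaw (negate=False) and
    evmheumianw (negate=True); returns the new ACC and rD value."""
    ph = ((ra >> 48) & MASK16) * ((rb >> 48) & MASK16)  # rA[0:15] *ui rB[0:15]
    pl = ((ra >> 16) & MASK16) * ((rb >> 16) & MASK16)  # rA[32:47] *ui rB[32:47]
    if negate:
        ph, pl = -ph, -pl
    # Modulo behaviour: keep only the low 32 bits of each word result
    new_hi = ((acc >> 32) + ph) & MASK32
    new_lo = (acc + pl) & MASK32
    return (new_hi << 32) | new_lo
```

For example, subtracting a product of 6 from an accumulator word holding 5 yields 0xFFFFFFFF in this model, with no indication of the wrap; evmheusianw would instead clamp that word to 0x00000000 and record the event in SPEFSCR.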
Other registers altered: ACC

Figure 6-32. evmheumianw

evmheusiaaw
Vector Multiply Half Words, Even, Unsigned, Saturate, Integer and Accumulate into Words

evmheusiaaw rD,rA,rB    (M=0, O=0, F=0, S=0)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 0000 0000

    temp1[0:31] = rA[0:15] *ui rB[0:15]
    temp2[0:31] = rA[32:47] *ui rB[32:47]
    temp3[0:32] = ACC[0:31] + temp1[0:31]
    temp4[0:32] = ACC[32:63] + temp2[0:31]
    ovh = temp3[0]
    ovl = temp4[0]
    rD[0:31] = SATURATE_ACC(ovh, 0xFFFFFFFF, temp3[1:32])
    rD[32:63] = SATURATE_ACC(ovl, 0xFFFFFFFF, temp4[1:32])
    ACC[0:31] = rD[0:31]
    ACC[32:63] = rD[32:63]
    SPEFSCR[OVH] = ovh
    SPEFSCR[OV] = ovl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | ovh
    SPEFSCR[SOV] = SPEFSCR[SOV] | ovl

For each word element in the accumulator the following operations are performed in the order shown: Each even-numbered unsigned integer halfword element in rA is multiplied by the corresponding unsigned integer halfword element in rB. The intermediate 32-bit product is added to the contents of the accumulator word to form a 33-bit intermediate sum. If the intermediate sum has overflowed, the saturation value 0xFFFFFFFF is placed into the accumulator word and the corresponding rD word. Otherwise, the low 32 bits of the intermediate sum are placed into the accumulator word and the corresponding rD word. If there is an overflow from the addition, the overflow information is recorded in the SPEFSCR overflow and summary overflow bits.

Other registers altered: SPEFSCR, ACC

Figure 6-33. evmheusiaaw
evmheusianw
Vector Multiply Half Words, Even, Unsigned, Saturate, Integer and Accumulate Negative into Words

evmheusianw rD,rA,rB    (M=0, O=0, F=0, S=0)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 1000 0000

    temp1[0:31] = rA[0:15] *ui rB[0:15]
    temp2[0:31] = rA[32:47] *ui rB[32:47]
    temp3[0:32] = ACC[0:31] - temp1[0:31]
    temp4[0:32] = ACC[32:63] - temp2[0:31]
    ovh = temp3[0]
    ovl = temp4[0]
    rD[0:31] = SATURATE_ACC(ovh, 0x00000000, temp3[1:32])
    rD[32:63] = SATURATE_ACC(ovl, 0x00000000, temp4[1:32])
    ACC[0:31] = rD[0:31]
    ACC[32:63] = rD[32:63]
    SPEFSCR[OVH] = ovh
    SPEFSCR[OV] = ovl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | ovh
    SPEFSCR[SOV] = SPEFSCR[SOV] | ovl

For each word element in the accumulator the following operations are performed in the order shown: Each even-numbered unsigned integer halfword element in rA is multiplied by the corresponding unsigned integer halfword element in rB. The intermediate 32-bit product is subtracted from the contents of the accumulator word to form a 33-bit intermediate difference. If the intermediate difference has underflowed (is negative), the saturation value 0x00000000 is placed into the accumulator word and the corresponding rD word. Otherwise, the low 32 bits of the intermediate difference are placed into the accumulator word and the corresponding rD word. If there is an underflow from the subtraction, the underflow information is recorded in the SPEFSCR overflow and summary overflow bits.

Other registers altered: SPEFSCR, ACC

Figure 6-34. evmheusianw
evmhogsmfaa
Multiply Half Words, Odd, Guarded, Signed, Modulo, Fractional and Accumulate

evmhogsmfaa rD,rA,rB    (O=1, F=1, S=1)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 0010 1111

    prod[0:31] = rA[48:63] * rB[48:63]
    temp1[0:63] = EXTS(prod[0:31] || 0)
    temp2[0:64] = ACC[0:63] + temp1[0:63]
    rD[0:63] = ACC[0:63] = temp2[1:64]

The low odd-numbered signed fractional halfword element in rA is multiplied by the corresponding signed fractional halfword element in rB. The 32-bit intermediate product is shifted left by one bit, sign-extended to 64 bits, and added to the contents of the 64-bit accumulator to form a 65-bit intermediate sum. The lower 64 bits of the intermediate sum are placed back into the accumulator and also written into rD.

NOTE: This is a modulo sum. There is no check for overflow and no saturation is performed. An overflow from the 64-bit sum, if one occurs, is not recorded into SPEFSCR.

Figure 6-35. evmhogsmfaa

evmhogsmfan
Multiply Half Words, Odd, Guarded, Signed, Modulo, Fractional and Accumulate Negative

evmhogsmfan rD,rA,rB    (O=1, F=1, S=1)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 1010 1111

    prod[0:31] = rA[48:63] * rB[48:63]
    temp1[0:63] = EXTS(prod[0:31] || 0)
    temp2[0:64] = ACC[0:63] - temp1[0:63]
    rD[0:63] = ACC[0:63] = temp2[1:64]

The low odd-numbered signed fractional halfword element in rA is multiplied by the corresponding signed fractional halfword element in rB. The 32-bit intermediate product is shifted left by one bit, sign-extended to 64 bits, and subtracted from the contents of the 64-bit accumulator to form a 65-bit intermediate difference. The lower 64 bits of the intermediate difference are placed back into the accumulator and also written into rD.

NOTE: This is a modulo difference. There is no check for overflow and no saturation is performed.
An overflow from the 64-bit difference, if one occurs, is not recorded into SPEFSCR.

Figure 6-36. evmhogsmfan

evmhogsmiaa
Multiply Half Words, Odd, Guarded, Signed, Modulo, Integer and Accumulate

evmhogsmiaa rD,rA,rB    (O=1, F=0, S=1)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 0010 1101

    prod[0:31] = rA[48:63] *si rB[48:63]
    temp1[0:63] = EXTS(prod[0:31])
    temp2[0:64] = ACC[0:63] + temp1[0:63]
    rD[0:63] = ACC[0:63] = temp2[1:64]

The low odd-numbered signed integer halfword element in rA is multiplied by the corresponding signed integer halfword element in rB. The intermediate product is sign-extended to 64 bits and added to the contents of the 64-bit accumulator to form a 65-bit intermediate sum. The lower 64 bits of the intermediate sum are placed back into the accumulator and also written into rD.

NOTE: This is a modulo sum. There is no check for overflow and no saturation is performed. An overflow from the 64-bit sum, if one occurs, is not recorded into SPEFSCR.

Figure 6-37. evmhogsmiaa

evmhogsmian
Multiply Half Words, Odd, Guarded, Signed, Modulo, Integer and Accumulate Negative

evmhogsmian rD,rA,rB    (O=1, F=0, S=1)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 1010 1101

    prod[0:31] = rA[48:63] *si rB[48:63]
    temp1[0:63] = EXTS(prod[0:31])
    temp2[0:64] = ACC[0:63] - temp1[0:63]
    rD[0:63] = ACC[0:63] = temp2[1:64]

The low odd-numbered signed integer halfword element in rA is multiplied by the corresponding signed integer halfword element in rB. The intermediate product is sign-extended to 64 bits and subtracted from the contents of the 64-bit accumulator to form a 65-bit intermediate difference. The lower 64 bits of the intermediate difference are placed back into the accumulator and also written into rD.
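The guarded integer forms above (evmhogsmiaa and evmhogsmian) accumulate a single product into the full 64-bit accumulator. A minimal sketch, assuming 64-bit Python integers with bit 0 most significant and a hypothetical `negate` flag to select the subtracting form:

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def s16(x):
    """Low 16 bits of x as a two's-complement integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def evmhogsmi_acc(acc, ra, rb, negate=False):
    """Hypothetical model of evmhogsmiaa (negate=False) and
    evmhogsmian (negate=True); returns the new ACC and rD value."""
    prod = s16(ra) * s16(rb)             # rA[48:63] *si rB[48:63]
    # Python integers are unbounded, so the sign extension to 64 bits is
    # implicit; masking keeps the lower 64 bits of the 65-bit result.
    total = acc - prod if negate else acc + prod
    return total & MASK64
```

Only the low halfword of each source register participates; the other three halfwords are ignored by this model, matching the single multiplier input selected by the bit ranges in the pseudocode.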
NOTE: This is a modulo difference. There is no check for overflow and no saturation is performed. An overflow from the 64-bit difference, if one occurs, is not recorded into SPEFSCR.

Figure 6-38. evmhogsmian

evmhogumiaa
Multiply Half Words, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate

evmhogumiaa rD,rA,rB    (O=1, F=0, S=0)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 0010 1100

    prod[0:31] = rA[48:63] *ui rB[48:63]
    temp1[0:63] = EXTZ(prod[0:31])
    temp2[0:64] = ACC[0:63] + temp1[0:63]
    rD[0:63] = ACC[0:63] = temp2[1:64]

The low odd-numbered unsigned integer halfword element in rA is multiplied by the corresponding unsigned integer halfword element in rB. The intermediate product is zero-extended to 64 bits and added to the contents of the 64-bit accumulator to form a 65-bit intermediate sum. The lower 64 bits of the intermediate sum are placed back into the accumulator and also written into rD.

NOTE: This is a modulo sum. There is no check for overflow and no saturation is performed. An overflow from the 64-bit sum, if one occurs, is not recorded into SPEFSCR.

Figure 6-39. evmhogumiaa

evmhogumian
Multiply Half Words, Odd, Guarded, Unsigned, Modulo, Integer and Accumulate Negative

evmhogumian rD,rA,rB    (O=1, F=0, S=0)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 1010 1100

    prod[0:31] = rA[48:63] *ui rB[48:63]
    temp1[0:63] = EXTZ(prod[0:31])
    temp2[0:64] = ACC[0:63] - temp1[0:63]
    rD[0:63] = ACC[0:63] = temp2[1:64]

The low odd-numbered unsigned integer halfword element in rA is multiplied by the corresponding unsigned integer halfword element in rB. The intermediate product is zero-extended to 64 bits and subtracted from the contents of the 64-bit accumulator to form a 65-bit intermediate difference.
The lower 64 bits of the intermediate difference are placed back into the accumulator and also written into rD.

NOTE: This is a modulo difference. There is no check for overflow and no saturation is performed. An overflow from the 64-bit difference, if one occurs, is not recorded into SPEFSCR.

Figure 6-40. evmhogumian

evmhosmf
Vector Multiply Half Words, Odd, Signed, Modulo, Fractional

evmhosmf rD,rA,rB    (M=1, O=1, F=1, S=1, A=0)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 100 0000 1111

    temp1[0:32] = (rA[16:31] * rB[16:31]) || 0
    temp2[0:32] = (rA[48:63] * rB[48:63]) || 0
    rD[0:31] = temp1[1:32]
    rD[32:63] = temp2[1:32]

Each odd-numbered signed fractional halfword element in rA is multiplied by the corresponding signed fractional halfword element in rB. The two 32-bit signed fractional products are shifted left by one bit to remove the redundant sign bit, and are then placed into the two word elements of rD.

Figure 6-41. evmhosmf

evmhosmfa
Vector Multiply Half Words, Odd, Signed, Modulo, Fractional, to Accumulator

evmhosmfa rD,rA,rB    (M=1, O=1, F=1, S=1, A=1)
Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 100 0010 1111

    prod[0:31] = rA[16:31] * rB[16:31]
    prod[32:63] = rA[48:63] * rB[48:63]
    temp1[0:32] = prod[0:31] || 0
    temp2[0:32] = prod[32:63] || 0
    rD[0:31] = temp1[1:32]
    rD[32:63] = temp2[1:32]
    ACC[0:63] = rD[0:63]

Each odd-numbered signed fractional halfword element in rA is multiplied by the corresponding signed fractional halfword element in rB. The two 32-bit signed fractional products are shifted left by one bit to remove the redundant sign bit, and are then placed into the two word elements of rD. The result in rD is also placed in the accumulator.

Other registers altered: ACC

Figure 6-42. evmhosmfa
evmhosmfaaw

Vector Multiply Half Words, Odd, Signed, Modulo, Fractional and Accumulate into Words

evmhosmfaaw rD,rA,rB (M=1, O=1, F=1, S=1)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 101 0000 1111

    temp1[0:32] = (rA[16:31] * rB[16:31]) || 0
    temp2[0:32] = (rA[48:63] * rB[48:63]) || 0
    temp3[0:32] = ACC[0:31] + temp1[1:32]
    temp4[0:32] = ACC[32:63] + temp2[1:32]
    ACC[0:31] = rD[0:31] = temp3[1:32]
    ACC[32:63] = rD[32:63] = temp4[1:32]

For each word element in the accumulator the following operations are performed in the order shown:

Each odd-numbered signed fractional halfword element in rA is multiplied by the corresponding signed fractional halfword element in rB. The intermediate 32-bit product is shifted left by one bit to remove the redundant sign bit, and is then added to the contents of the accumulator word to form a 33-bit intermediate sum. The low 32 bits of the intermediate sum are placed into the accumulator word and the corresponding rD word.

Other registers altered: ACC

Figure 6-43. evmhosmfaaw

evmhosmfanw

Vector Multiply Half Words, Odd, Signed, Modulo, Fractional and Accumulate Negative into Words

evmhosmfanw rD,rA,rB (M=1, O=1, F=1, S=1)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 101 1000 1111

    temp1[0:32] = (rA[16:31] * rB[16:31]) || 0
    temp2[0:32] = (rA[48:63] * rB[48:63]) || 0
    temp3[0:32] = ACC[0:31] - temp1[1:32]
    temp4[0:32] = ACC[32:63] - temp2[1:32]
    ACC[0:31] = rD[0:31] = temp3[1:32]
    ACC[32:63] = rD[32:63] = temp4[1:32]

For each word element in the accumulator the following operations are performed in the order shown:

Each odd-numbered signed fractional halfword element in rA is multiplied by the corresponding signed fractional halfword element in rB.
The intermediate 32-bit product is shifted left by one bit to remove the redundant sign bit, and is then subtracted from the contents of the accumulator word to form a 33-bit intermediate difference. The low 32 bits of the intermediate difference are placed into the accumulator word and the corresponding rD word.

Other registers altered: ACC

Figure 6-44. evmhosmfanw

evmhosmi

Vector Multiply Half Words, Odd, Signed, Modulo, Integer

evmhosmi rD,rA,rB (M=1, O=1, F=0, S=1, A=0)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 100 0000 1101

    rD[0:31] = rA[16:31] *si rB[16:31]
    rD[32:63] = rA[48:63] *si rB[48:63]

Each odd-numbered signed integer halfword element in rA is multiplied by the corresponding signed integer halfword element in rB. The two 32-bit signed integer products are placed into the two word elements of rD.

Figure 6-45. evmhosmi

evmhosmia

Vector Multiply Half Words, Odd, Signed, Modulo, Integer, to Accumulator

evmhosmia rD,rA,rB (M=1, O=1, F=0, S=1, A=1)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 100 0010 1101

    rD[0:31] = rA[16:31] *si rB[16:31]
    rD[32:63] = rA[48:63] *si rB[48:63]
    ACC[0:63] = rD[0:63]

Each odd-numbered signed integer halfword element in rA is multiplied by the corresponding signed integer halfword element in rB. The two 32-bit signed integer products are placed into the two word elements of rD. The result in rD is also placed in the accumulator.

Other registers altered: ACC

Figure 6-46. evmhosmia
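To make the odd-halfword selection in evmhosmi and evmhosmia concrete, the following Python sketch models evmhosmia. It assumes registers are held as 64-bit integers with bit 0 as the most significant bit; the names `evmhosmia_model` and `s16` are hypothetical and exist only for this illustration.

```python
MASK32 = 0xFFFFFFFF

def s16(x):
    """Interpret the low 16 bits of x as a signed two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def evmhosmia_model(ra, rb):
    """Illustrative model only; returns (rD, ACC) as 64-bit integers."""
    # Odd halfwords are bits 16:31 and 48:63; even halfwords are ignored.
    hi = (s16(ra >> 32) * s16(rb >> 32)) & MASK32   # rA[16:31] *si rB[16:31]
    lo = (s16(ra) * s16(rb)) & MASK32               # rA[48:63] *si rB[48:63]
    rd = (hi << 32) | lo
    return rd, rd                                   # ACC[0:63] = rD[0:63]

# Odd halfwords: (-3 * 5) in the upper word, (7 * -2) in the lower word.
# The even halfwords of rA (0x1234, 0x5678) do not affect the result.
rd, acc = evmhosmia_model(0x1234FFFD56780007, 0x000000050000FFFE)
```

A 16x16 signed product always fits in 32 bits, so the masking here only converts negative Python integers to their 32-bit two's-complement pattern.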
evmhosmiaaw

Vector Multiply Half Words, Odd, Signed, Modulo, Integer and Accumulate into Words

evmhosmiaaw rD,rA,rB (M=1, O=1, F=0, S=1)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 101 0000 1101

    temp1[0:31] = rA[16:31] *si rB[16:31]
    temp2[0:31] = rA[48:63] *si rB[48:63]
    temp3[0:32] = ACC[0:31] + temp1[0:31]
    temp4[0:32] = ACC[32:63] + temp2[0:31]
    ACC[0:31] = rD[0:31] = temp3[1:32]
    ACC[32:63] = rD[32:63] = temp4[1:32]

For each word element in the accumulator the following operations are performed in the order shown:

Each odd-numbered signed integer halfword element in rA is multiplied by the corresponding signed integer halfword element in rB. The intermediate 32-bit product is added to the contents of the accumulator word to form a 33-bit intermediate sum. The low 32 bits of the intermediate sum are placed into the accumulator word and the corresponding rD word.

Other registers altered: ACC

Figure 6-47. evmhosmiaaw

evmhosmianw

Vector Multiply Half Words, Odd, Signed, Modulo, Integer and Accumulate Negative into Words

evmhosmianw rD,rA,rB (M=1, O=1, F=0, S=1)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 101 1000 1101

    temp1[0:31] = rA[16:31] *si rB[16:31]
    temp2[0:31] = rA[48:63] *si rB[48:63]
    temp3[0:32] = ACC[0:31] - temp1[0:31]
    temp4[0:32] = ACC[32:63] - temp2[0:31]
    ACC[0:31] = rD[0:31] = temp3[1:32]
    ACC[32:63] = rD[32:63] = temp4[1:32]

For each word element in the accumulator the following operations are performed in the order shown:

Each odd-numbered signed integer halfword element in rA is multiplied by the corresponding signed integer halfword element in rB. The intermediate 32-bit product is subtracted from the contents of the accumulator word to form a 33-bit intermediate difference. The low 32 bits of the intermediate difference are placed into the accumulator word and the corresponding rD word.
Other registers altered: ACC

Figure 6-48. evmhosmianw

evmhossf

Vector Multiply Half Words, Odd, Signed, Saturate, Fractional

evmhossf rD,rA,rB (M=0, O=1, F=1, S=1, A=0)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 100 0000 0111

    temp1[0:32] = (rA[16:31] * rB[16:31]) || 0
    temp2[0:32] = (rA[48:63] * rB[48:63]) || 0
    movh = temp1[0] ⊕ temp1[1]
    movl = temp2[0] ⊕ temp2[1]
    rD[0:31] = SATURATE(movh, 0x7FFFFFFF, temp1[1:32])
    rD[32:63] = SATURATE(movl, 0x7FFFFFFF, temp2[1:32])
    SPEFSCR[OVH] = movh
    SPEFSCR[OV] = movl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | movh
    SPEFSCR[SOV] = SPEFSCR[SOV] | movl

Each odd-numbered signed fractional halfword element in rA is multiplied by the corresponding signed fractional halfword element in rB. The two 32-bit signed fractional products are shifted left one bit to eliminate the redundant sign bit, and are then placed into the two word elements of rD. If the inputs are –1.0 and –1.0 the result is saturated to the most positive signed fraction (0x7FFFFFFF). If saturation occurs, the overflow and summary overflow bits are recorded.

Other registers altered: SPEFSCR

Figure 6-49. evmhossf

evmhossfa

Vector Multiply Half Words, Odd, Signed, Saturate, Fractional, to Accumulator

evmhossfa rD,rA,rB (M=0, O=1, F=1, S=1, A=1)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 100 0010 0111

    temp1[0:32] = (rA[16:31] * rB[16:31]) || 0
    temp2[0:32] = (rA[48:63] * rB[48:63]) || 0
    movh = temp1[0] ⊕ temp1[1]
    movl = temp2[0] ⊕ temp2[1]
    rD[0:31] = SATURATE(movh, 0x7FFFFFFF, temp1[1:32])
    rD[32:63] = SATURATE(movl, 0x7FFFFFFF, temp2[1:32])
    ACC[0:63] = rD[0:63]
    SPEFSCR[OVH] = movh
    SPEFSCR[OV] = movl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | movh
    SPEFSCR[SOV] = SPEFSCR[SOV] | movl

Each odd-numbered signed fractional halfword element in rA is multiplied by the corresponding signed fractional halfword element in rB.
The two 32-bit signed fractional products are shifted left one bit to eliminate the redundant sign bit, and are then placed into the two word elements of rD. If the inputs are –1.0 and –1.0 the result is saturated to the most positive signed fraction (0x7FFFFFFF). If saturation occurs, the overflow and summary overflow bits are recorded. The result in rD is also placed in the accumulator.

Other registers altered: SPEFSCR, ACC

Figure 6-50. evmhossfa

evmhossfaaw

Vector Multiply Half Words, Odd, Signed, Saturate, Fractional and Accumulate into Words

evmhossfaaw rD,rA,rB (M=0, O=1, F=1, S=1)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 101 0000 0111

    temp1[0:32] = (rA[16:31] * rB[16:31]) || 0
    temp2[0:32] = (rA[48:63] * rB[48:63]) || 0
    movh = temp1[0] ⊕ temp1[1]
    movl = temp2[0] ⊕ temp2[1]
    temp3[0:31] = SATURATE(movh, 0x7FFFFFFF, temp1[1:32])
    temp4[0:31] = SATURATE(movl, 0x7FFFFFFF, temp2[1:32])
    temp5[0:32] = {ACC[0],ACC[0:31]} + {temp3[0],temp3[0:31]}
    temp6[0:32] = {ACC[32],ACC[32:63]} + {temp4[0],temp4[0:31]}
    ovh = temp5[0] ⊕ temp5[1]
    ovl = temp6[0] ⊕ temp6[1]
    rD[0:31] = SATURATE_ACC(ovh, temp5[0], 0x80000000, 0x7FFFFFFF, temp5[1:32])
    rD[32:63] = SATURATE_ACC(ovl, temp6[0], 0x80000000, 0x7FFFFFFF, temp6[1:32])
    ACC[0:31] = rD[0:31]
    ACC[32:63] = rD[32:63]
    SPEFSCR[OVH] = movh | ovh
    SPEFSCR[OV] = movl | ovl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | movh | ovh
    SPEFSCR[SOV] = SPEFSCR[SOV] | movl | ovl

Each odd-numbered signed fractional halfword element in rA is multiplied by the corresponding signed fractional halfword element in rB. The two 32-bit signed fractional products are shifted left one bit to eliminate the redundant sign bit. If the inputs are –1.0 and –1.0 the intermediate result is saturated to the most positive signed fraction (0x7FFFFFFF). The intermediate 32-bit products are added to the respective accumulator word to form an intermediate sum.
If the intermediate sum has overflowed, the appropriate saturation value (0x7FFFFFFF if positive overflow or 0x80000000 if negative overflow) is placed into the accumulator word and the corresponding rD word. Otherwise, the low 32 bits of the intermediate sum are placed into the accumulator word and the corresponding rD word. If there is an overflow from either the multiply or the addition, the overflow information is recorded in the SPEFSCR overflow and summary overflow bits.

Other registers altered: SPEFSCR, ACC

Figure 6-51. evmhossfaaw

evmhossfanw

Vector Multiply Half Words, Odd, Signed, Saturate, Fractional and Accumulate Negative into Words

evmhossfanw rD,rA,rB (M=0, O=1, F=1, S=1)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 101 1000 0111

    temp1[0:32] = (rA[16:31] * rB[16:31]) || 0
    temp2[0:32] = (rA[48:63] * rB[48:63]) || 0
    movh = temp1[0] ⊕ temp1[1]
    movl = temp2[0] ⊕ temp2[1]
    temp3[0:31] = SATURATE(movh, 0x7FFFFFFF, temp1[1:32])
    temp4[0:31] = SATURATE(movl, 0x7FFFFFFF, temp2[1:32])
    temp5[0:32] = {ACC[0],ACC[0:31]} - {temp3[0],temp3[0:31]}
    temp6[0:32] = {ACC[32],ACC[32:63]} - {temp4[0],temp4[0:31]}
    ovh = temp5[0] ⊕ temp5[1]
    ovl = temp6[0] ⊕ temp6[1]
    rD[0:31] = SATURATE_ACC(ovh, temp5[0], 0x80000000, 0x7FFFFFFF, temp5[1:32])
    rD[32:63] = SATURATE_ACC(ovl, temp6[0], 0x80000000, 0x7FFFFFFF, temp6[1:32])
    ACC[0:31] = rD[0:31]
    ACC[32:63] = rD[32:63]
    SPEFSCR[OVH] = movh | ovh
    SPEFSCR[OV] = movl | ovl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | movh | ovh
    SPEFSCR[SOV] = SPEFSCR[SOV] | movl | ovl

For each word element in the accumulator the following operations are performed in the order shown:

Each odd-numbered signed fractional halfword element in rA is multiplied by the corresponding signed fractional halfword element in rB. The two 32-bit signed fractional products are shifted left one bit to eliminate the redundant sign bit.
If the inputs are –1.0 and –1.0 the intermediate result is saturated to the most positive signed fraction (0x7FFFFFFF). The intermediate 32-bit product is subtracted from the contents of the accumulator word to form an intermediate difference. If the intermediate difference has overflowed, the appropriate saturation value (0x7FFFFFFF if positive overflow or 0x80000000 if negative overflow) is placed into the accumulator word and the corresponding rD word. Otherwise, the low 32 bits of the intermediate difference are placed into the accumulator word and the corresponding rD word. If there is an overflow from either the multiply or the subtraction, the overflow information is recorded in the SPEFSCR overflow and summary overflow bits.

Other registers altered: SPEFSCR, ACC

Figure 6-52. evmhossfanw

evmhossiaaw

Vector Multiply Half Words, Odd, Signed, Saturate, Integer and Accumulate into Words

evmhossiaaw rD,rA,rB (M=0, O=1, F=0, S=1)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 101 0000 0101

    temp1[0:31] = rA[16:31] *si rB[16:31]
    temp2[0:31] = rA[48:63] *si rB[48:63]
    temp3[0:32] = {ACC[0],ACC[0:31]} + {temp1[0],temp1[0:31]}
    temp4[0:32] = {ACC[32],ACC[32:63]} + {temp2[0],temp2[0:31]}
    ovh = temp3[0] ⊕ temp3[1]
    ovl = temp4[0] ⊕ temp4[1]
    rD[0:31] = SATURATE_ACC(ovh, temp3[0], 0x80000000, 0x7FFFFFFF, temp3[1:32])
    rD[32:63] = SATURATE_ACC(ovl, temp4[0], 0x80000000, 0x7FFFFFFF, temp4[1:32])
    ACC[0:31] = rD[0:31]
    ACC[32:63] = rD[32:63]
    SPEFSCR[OVH] = ovh
    SPEFSCR[OV] = ovl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | ovh
    SPEFSCR[SOV] = SPEFSCR[SOV] | ovl

For each word element in the accumulator the following operations are performed in the order shown:

Each odd-numbered signed integer halfword element in rA is multiplied by the corresponding signed integer halfword element in rB.
The intermediate 32-bit product is added to the contents of the accumulator word to form an intermediate sum. If the intermediate sum has overflowed, the appropriate saturation value (0x7FFFFFFF if positive overflow or 0x80000000 if negative overflow) is placed into the accumulator word and the corresponding rD word. Otherwise, the low 32 bits of the intermediate sum are placed into the accumulator word and the corresponding rD word. If there is an overflow from the addition, the overflow information is recorded in the SPEFSCR overflow and summary overflow bits.

Other registers altered: SPEFSCR, ACC

Figure 6-53. evmhossiaaw

evmhossianw

Vector Multiply Half Words, Odd, Signed, Saturate, Integer and Accumulate Negative into Words

evmhossianw rD,rA,rB (M=0, O=1, F=0, S=1)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 101 1000 0101

    temp1[0:31] = rA[16:31] *si rB[16:31]
    temp2[0:31] = rA[48:63] *si rB[48:63]
    temp3[0:32] = {ACC[0],ACC[0:31]} - {temp1[0],temp1[0:31]}
    temp4[0:32] = {ACC[32],ACC[32:63]} - {temp2[0],temp2[0:31]}
    ovh = temp3[0] ⊕ temp3[1]
    ovl = temp4[0] ⊕ temp4[1]
    rD[0:31] = SATURATE_ACC(ovh, temp3[0], 0x80000000, 0x7FFFFFFF, temp3[1:32])
    rD[32:63] = SATURATE_ACC(ovl, temp4[0], 0x80000000, 0x7FFFFFFF, temp4[1:32])
    ACC[0:31] = rD[0:31]
    ACC[32:63] = rD[32:63]
    SPEFSCR[OVH] = ovh
    SPEFSCR[OV] = ovl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | ovh
    SPEFSCR[SOV] = SPEFSCR[SOV] | ovl

For each word element in the accumulator the following operations are performed in the order shown:

Each odd-numbered signed integer halfword element in rA is multiplied by the corresponding signed integer halfword element in rB. The intermediate 32-bit product is subtracted from the contents of the accumulator word to form an intermediate difference.
If the intermediate difference has overflowed, the appropriate saturation value (0x7FFFFFFF if positive overflow or 0x80000000 if negative overflow) is placed into the accumulator word and the corresponding rD word. Otherwise, the low 32 bits of the intermediate difference are placed into the accumulator word and the corresponding rD word. If there is an overflow from the subtraction, the overflow information is recorded in the SPEFSCR overflow and summary overflow bits.

Other registers altered: SPEFSCR, ACC

Figure 6-54. evmhossianw

evmhoumi

Vector Multiply Half Words, Odd, Unsigned, Modulo, Integer

evmhoumi rD,rA,rB (M=1, O=1, F=0, S=0, A=0)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 100 0000 1100

    rD[0:31] = rA[16:31] *ui rB[16:31]
    rD[32:63] = rA[48:63] *ui rB[48:63]

Each odd-numbered unsigned integer halfword element in rA is multiplied by the corresponding unsigned integer halfword element in rB. The two 32-bit unsigned integer products are placed into the two word elements of rD.

Figure 6-55. evmhoumi

evmhoumia

Vector Multiply Half Words, Odd, Unsigned, Modulo, Integer, to Accumulator

evmhoumia rD,rA,rB (M=1, O=1, F=0, S=0, A=1)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 100 0010 1100

    rD[0:31] = rA[16:31] *ui rB[16:31]
    rD[32:63] = rA[48:63] *ui rB[48:63]
    ACC[0:63] = rD[0:63]

Each odd-numbered unsigned integer halfword element in rA is multiplied by the corresponding unsigned integer halfword element in rB. The two 32-bit unsigned integer products are placed into the two word elements of rD. The result in rD is also placed in the accumulator.

Other registers altered: ACC

Figure 6-56. evmhoumia

evmhoumiaaw

Vector Multiply Half Words, Odd, Unsigned, Modulo, Integer and Accumulate into Words

evmhoumiaaw rD,rA,rB (M=1, O=1, F=0, S=0)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 101 0000 1100

    temp1[0:31] = rA[16:31] *ui rB[16:31]
    temp2[0:31] = rA[48:63] *ui rB[48:63]
    temp3[0:32] = ACC[0:31] + temp1[0:31]
    temp4[0:32] = ACC[32:63] + temp2[0:31]
    ACC[0:31] = rD[0:31] = temp3[1:32]
    ACC[32:63] = rD[32:63] = temp4[1:32]

For each word element in the accumulator the following operations are performed in the order shown:

Each odd-numbered unsigned integer halfword element in rA is multiplied by the corresponding unsigned integer halfword element in rB. The intermediate 32-bit product is added to the contents of the accumulator word to form a 33-bit intermediate sum. The low 32 bits of the intermediate sum are placed into the accumulator word and the corresponding rD word.

Other registers altered: ACC

Figure 6-57. evmhoumiaaw

evmhoumianw

Vector Multiply Half Words, Odd, Unsigned, Modulo, Integer and Accumulate Negative into Words

evmhoumianw rD,rA,rB (M=1, O=1, F=0, S=0)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 101 1000 1100

    temp1[0:31] = rA[16:31] *ui rB[16:31]
    temp2[0:31] = rA[48:63] *ui rB[48:63]
    temp3[0:32] = ACC[0:31] - temp1[0:31]
    temp4[0:32] = ACC[32:63] - temp2[0:31]
    ACC[0:31] = rD[0:31] = temp3[1:32]
    ACC[32:63] = rD[32:63] = temp4[1:32]

For each word element in the accumulator the following operations are performed in the order shown:

Each odd-numbered unsigned integer halfword element in rA is multiplied by the corresponding unsigned integer halfword element in rB. The intermediate 32-bit product is subtracted from the contents of the accumulator word to form a 33-bit intermediate difference.
The low 32 bits of the intermediate difference are placed into the accumulator word and the corresponding rD word.

Other registers altered: ACC

Figure 6-58. evmhoumianw

evmhousiaaw

Vector Multiply Half Words, Odd, Unsigned, Saturate, Integer and Accumulate into Words

evmhousiaaw rD,rA,rB (M=0, O=1, F=0, S=0)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 101 0000 0100

    temp1[0:31] = rA[16:31] *ui rB[16:31]
    temp2[0:31] = rA[48:63] *ui rB[48:63]
    temp3[0:32] = ACC[0:31] + temp1[0:31]
    temp4[0:32] = ACC[32:63] + temp2[0:31]
    ovh = temp3[0]
    ovl = temp4[0]
    rD[0:31] = SATURATE_ACC(ovh, 0xFFFFFFFF, temp3[1:32])
    rD[32:63] = SATURATE_ACC(ovl, 0xFFFFFFFF, temp4[1:32])
    ACC[0:31] = rD[0:31]
    ACC[32:63] = rD[32:63]
    SPEFSCR[OVH] = ovh
    SPEFSCR[OV] = ovl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | ovh
    SPEFSCR[SOV] = SPEFSCR[SOV] | ovl

For each word element in the accumulator the following operations are performed in the order shown:

Each odd-numbered unsigned integer halfword element in rA is multiplied by the corresponding unsigned integer halfword element in rB. The intermediate 32-bit product is added to the contents of the accumulator word to form a 33-bit intermediate sum. If the intermediate sum has overflowed, 0xFFFFFFFF is placed into the accumulator word and the corresponding rD word. Otherwise, the low 32 bits of the intermediate sum are placed into the accumulator word and the corresponding rD word. If there is an overflow from the addition, the overflow information is recorded in the SPEFSCR overflow and summary overflow bits.

Other registers altered: SPEFSCR, ACC

Figure 6-59. evmhousiaaw
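The unsigned saturating accumulate described for evmhousiaaw can be sketched in Python as follows. The function names and the convention of passing ACC, rA and rB as 64-bit integers are assumptions made for this example. The sticky summary bits SPEFSCR[SOVH] and SPEFSCR[SOV] are not modeled; only the per-instruction overflow flags are returned.

```python
MASK32 = 0xFFFFFFFF

def sat_add_u32(acc_word, prod):
    """Return (result, overflow flag) for one 32-bit accumulator word."""
    total = acc_word + prod       # 33-bit intermediate sum
    ov = total >> 32              # carry out, i.e. bit 0 of the 33-bit sum
    return (MASK32 if ov else total), ov

def evmhousiaaw_model(acc, ra, rb):
    """Illustrative model only; returns (new ACC and rD value, OVH, OV)."""
    prod_hi = ((ra >> 32) & 0xFFFF) * ((rb >> 32) & 0xFFFF)  # rA[16:31] *ui rB[16:31]
    prod_lo = (ra & 0xFFFF) * (rb & 0xFFFF)                  # rA[48:63] *ui rB[48:63]
    hi, ovh = sat_add_u32(acc >> 32, prod_hi)
    lo, ovl = sat_add_u32(acc & MASK32, prod_lo)
    return (hi << 32) | lo, ovh, ovl

# Upper word: 0xFFFFFFF0 + 0x10 * 2 overflows and saturates.
# Lower word: 1 + 3 * 4 does not overflow.
result, ovh, ovl = evmhousiaaw_model(0xFFFFFFF000000001,
                                     0x0000001000000003,
                                     0x0000000200000004)
```

Each word lane saturates independently, which is why the upper lane can clamp to 0xFFFFFFFF while the lower lane holds an ordinary sum.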
evmhousianw

Vector Multiply Half Words, Odd, Unsigned, Saturate, Integer and Accumulate Negative into Words

evmhousianw rD,rA,rB (M=0, O=1, F=0, S=0)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 101 1000 0100

    temp1[0:31] = rA[16:31] *ui rB[16:31]
    temp2[0:31] = rA[48:63] *ui rB[48:63]
    temp3[0:32] = ACC[0:31] - temp1[0:31]
    temp4[0:32] = ACC[32:63] - temp2[0:31]
    ovh = temp3[0]
    ovl = temp4[0]
    rD[0:31] = SATURATE_ACC(ovh, 0x00000000, temp3[1:32])
    rD[32:63] = SATURATE_ACC(ovl, 0x00000000, temp4[1:32])
    ACC[0:31] = rD[0:31]
    ACC[32:63] = rD[32:63]
    SPEFSCR[OVH] = ovh
    SPEFSCR[OV] = ovl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | ovh
    SPEFSCR[SOV] = SPEFSCR[SOV] | ovl

For each word element in the accumulator the following operations are performed in the order shown:

Each odd-numbered unsigned integer halfword element in rA is multiplied by the corresponding unsigned integer halfword element in rB. The intermediate 32-bit product is subtracted from the contents of the accumulator word to form a 33-bit intermediate difference. If the intermediate difference has underflowed (is negative), 0x00000000 is placed into the accumulator word and the corresponding rD word. Otherwise, the low 32 bits of the intermediate difference are placed into the accumulator word and the corresponding rD word. If there is an underflow from either subtraction, the underflow information is recorded in the SPEFSCR overflow and summary overflow bits.

Other registers altered: SPEFSCR, ACC

Figure 6-60. evmhousianw

6.4.2 Multiply words instructions

The following instructions perform 32x32 multiplies, returning either the higher or lower portion of the product, with and without accumulates, using signed or unsigned integer or fractional operands, with optional saturation.
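As a quick orientation for the word multiplies in this section, the sketch below splits a 32x32 integer product into its higher and lower 32-bit portions. It is a plain arithmetic illustration, not a model of any single instruction; the helper names `s32` and `mul_word_parts` are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

def s32(x):
    """Interpret the low 32 bits of x as a signed two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mul_word_parts(a, b, signed):
    """Return (high word, low word) of the 64-bit product of two 32-bit words."""
    if signed:
        prod = s32(a) * s32(b)
    else:
        prod = (a & MASK32) * (b & MASK32)
    prod &= MASK64                    # 64-bit two's-complement pattern
    return prod >> 32, prod & MASK32  # bits 0:31 and bits 32:63

unsigned_parts = mul_word_parts(0xFFFFFFFF, 2, signed=False)
signed_parts = mul_word_parts(0xFFFFFFFF, 2, signed=True)
```

With these operands the high words differ (1 for the unsigned product, 0xFFFFFFFF for the signed product of -1 and 2), while the low word is 0xFFFFFFFE in both cases.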
evmwhsmf

Vector Multiply Word High Signed, Modulo, Fractional

evmwhsmf rD,rA,rB (M=1, F=1, S=1, A=0)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 100 0100 1111

    temp1[0:63] = rA[0:31] * rB[0:31]
    temp2[0:63] = rA[32:63] * rB[32:63]
    rD[0:31] = temp1[1:32]
    rD[32:63] = temp2[1:32]

Each signed fractional word element in rA is multiplied by the corresponding signed fractional word element in rB. Bits 1:32 of the two 64-bit signed fractional products (eliminating the redundant sign bit) are placed into the two word elements of rD.

Figure 6-61. evmwhsmf

evmwhsmfa

Vector Multiply Word High Signed, Modulo, Fractional, to Accumulator

evmwhsmfa rD,rA,rB (M=1, F=1, S=1, A=1)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 100 0110 1111

    temp1[0:63] = rA[0:31] * rB[0:31]
    temp2[0:63] = rA[32:63] * rB[32:63]
    rD[0:31] = temp1[1:32]
    rD[32:63] = temp2[1:32]
    ACC[0:63] = rD[0:63]

Each signed fractional word element in rA is multiplied by the corresponding signed fractional word element in rB. Bits 1:32 of the two 64-bit signed fractional products (eliminating the redundant sign bit) are placed into the two word elements of rD. The result in rD is also placed in the accumulator.

Other registers altered: ACC

Figure 6-62. evmwhsmfa

evmwhsmi

Vector Multiply Word High Signed, Modulo, Integer

evmwhsmi rD,rA,rB (M=1, F=0, S=1, A=0)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 100 0100 1101

    temp1[0:63] = rA[0:31] *si rB[0:31]
    temp2[0:63] = rA[32:63] *si rB[32:63]
    rD[0:31] = temp1[0:31]
    rD[32:63] = temp2[0:31]

Each signed integer word element in rA is multiplied by the corresponding signed integer word element in rB. The upper 32 bits of the two 64-bit signed integer products are placed into the two word elements of rD.

Figure 6-63. evmwhsmi

evmwhsmia

Vector Multiply Word High Signed, Modulo, Integer, to Accumulator

evmwhsmia rD,rA,rB (M=1, F=0, S=1, A=1)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 100 0110 1101

    temp1[0:63] = rA[0:31] *si rB[0:31]
    temp2[0:63] = rA[32:63] *si rB[32:63]
    rD[0:31] = temp1[0:31]
    rD[32:63] = temp2[0:31]
    ACC[0:63] = rD[0:63]

Each signed integer word element in rA is multiplied by the corresponding signed integer word element in rB. The upper 32 bits of the two 64-bit signed integer products are placed into the two word elements of rD. The result in rD is also placed in the accumulator.

Other registers altered: ACC

Figure 6-64. evmwhsmia

evmwhssf

Vector Multiply Word High Signed, Saturate, Fractional

evmwhssf rD,rA,rB (M=0, F=1, S=1, A=0)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 100 0100 0111

    temp1[0:63] = rA[0:31] * rB[0:31]
    temp2[0:63] = rA[32:63] * rB[32:63]
    movh = temp1[0] ⊕ temp1[1]
    movl = temp2[0] ⊕ temp2[1]
    rD[0:31] = SATURATE(movh, 0x7FFFFFFF, temp1[1:32])
    rD[32:63] = SATURATE(movl, 0x7FFFFFFF, temp2[1:32])
    SPEFSCR[OVH] = movh
    SPEFSCR[OV] = movl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | movh
    SPEFSCR[SOV] = SPEFSCR[SOV] | movl

Each signed fractional word element in rA is multiplied by the corresponding signed fractional word element in rB. Bits 1:32 of the two 64-bit signed fractional products (eliminating the redundant sign bit) are placed into the two word elements of rD. If the inputs are –1.0 and –1.0 the result is saturated to the most positive signed fraction (0x7FFFFFFF). If saturation occurs, the overflow and summary overflow bits are recorded.

Other registers altered: SPEFSCR

Figure 6-65. evmwhssf
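The evmwhssf behavior described above, including the single saturating case, can be sketched in Python as follows. The names `evmwhssf_model`, `mul_high_sat_frac` and `s32` are hypothetical, registers are assumed to be 64-bit integers with bit 0 as the most significant bit, and the sticky SPEFSCR summary bits are not modeled.

```python
MASK32 = 0xFFFFFFFF

def s32(x):
    """Interpret the low 32 bits of x as a signed two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mul_high_sat_frac(a, b):
    """One word lane: return (rD word, overflow flag)."""
    if (a & MASK32) == 0x80000000 and (b & MASK32) == 0x80000000:
        return 0x7FFFFFFF, 1          # -1.0 * -1.0 saturates to most positive
    prod = s32(a) * s32(b)            # 64-bit signed product, bits 0:63
    return (prod >> 31) & MASK32, 0   # bits 1:32 drop the redundant sign bit

def evmwhssf_model(ra, rb):
    """Illustrative model only; returns (rD, OVH, OV)."""
    hi, movh = mul_high_sat_frac(ra >> 32, rb >> 32)
    lo, movl = mul_high_sat_frac(ra, rb)
    return (hi << 32) | lo, movh, movl

# Upper lane: -1.0 * -1.0 (saturates); lower lane: 0.5 * 0.5 = 0.25
rd, ovh, ovl = evmwhssf_model(0x8000000040000000, 0x8000000040000000)
```

The explicit check for 0x80000000 operands is equivalent to the movh and movl tests in the pseudocode, since bits 0 and 1 of a signed 32x32 product differ only for that operand pair.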
evmwhssfa

Vector Multiply Word High Signed, Saturate, Fractional, to Accumulator

evmwhssfa rD,rA,rB (M=0, F=1, S=1, A=1)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 100 0110 0111

    temp1[0:63] = rA[0:31] * rB[0:31]
    temp2[0:63] = rA[32:63] * rB[32:63]
    movh = temp1[0] ⊕ temp1[1]
    movl = temp2[0] ⊕ temp2[1]
    rD[0:31] = SATURATE(movh, 0x7FFFFFFF, temp1[1:32])
    rD[32:63] = SATURATE(movl, 0x7FFFFFFF, temp2[1:32])
    ACC[0:63] = rD[0:63]
    SPEFSCR[OVH] = movh
    SPEFSCR[OV] = movl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | movh
    SPEFSCR[SOV] = SPEFSCR[SOV] | movl

Each signed fractional word element in rA is multiplied by the corresponding signed fractional word element in rB. Bits 1:32 of the two 64-bit signed fractional products (eliminating the redundant sign bit) are placed into the two word elements of rD. If the inputs are –1.0 and –1.0 the result is saturated to the most positive signed fraction (0x7FFFFFFF). If saturation occurs, the overflow and summary overflow bits are recorded. The result in rD is also placed in the accumulator.

Other registers altered: SPEFSCR, ACC

Figure 6-66. evmwhssfa

evmwhumi

Vector Multiply Word High Unsigned, Modulo, Integer

evmwhumi rD,rA,rB (M=1, F=0, S=0, A=0)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 100 0100 1100

    temp1[0:63] = rA[0:31] *ui rB[0:31]
    temp2[0:63] = rA[32:63] *ui rB[32:63]
    rD[0:31] = temp1[0:31]
    rD[32:63] = temp2[0:31]

Each unsigned integer word element in rA is multiplied by the corresponding unsigned integer word element in rB. The upper 32 bits of the two 64-bit unsigned integer products are placed into the two word elements of rD.

Figure 6-67. evmwhumi
evmwhumia

Vector Multiply Word High Unsigned, Modulo, Integer, to Accumulator

evmwhumia rD,rA,rB (M=1, F=0, S=0, A=1)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 100 0110 1100

    temp1[0:63] = rA[0:31] *ui rB[0:31]
    temp2[0:63] = rA[32:63] *ui rB[32:63]
    rD[0:31] = temp1[0:31]
    rD[32:63] = temp2[0:31]
    ACC[0:63] = rD[0:63]

Each unsigned integer word element in rA is multiplied by the corresponding unsigned integer word element in rB. The upper 32 bits of the two 64-bit unsigned integer products are placed into the two word elements of rD. The result in rD is also placed in the accumulator.

Other registers altered: ACC

Figure 6-68. evmwhumia

evmwlsmiaaw

Vector Multiply Word Low Signed, Modulo, Integer and Accumulate in Words

evmwlsmiaaw rD,rA,rB (M=1, F=0, S=1)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 101 0100 1001

    temp1[0:63] = rA[0:31] *si rB[0:31]
    temp2[0:63] = rA[32:63] *si rB[32:63]
    rD[0:31] = ACC[0:31] + temp1[32:63]
    rD[32:63] = ACC[32:63] + temp2[32:63]
    ACC[0:63] = rD[0:63]

For each word element in the accumulator the following operations are performed in the order shown:

Each signed integer word element in rA is multiplied by the corresponding signed integer word element in rB. The low 32 bits of the 64-bit intermediate product are added to the contents of the accumulator word and placed into the corresponding rD word. The result in rD is also placed in the accumulator.

NOTE
This instruction produces a valid result only if the intermediate product can be represented in the lower 32 bits.

Other registers altered: ACC

Figure 6-69. evmwlsmiaaw
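The modulo accumulate performed by evmwlsmiaaw can be illustrated with the following Python sketch. The function and helper names are assumptions for this example, and ACC, rA and rB are assumed to be passed as 64-bit integers with bit 0 as the most significant bit.

```python
MASK32 = 0xFFFFFFFF

def s32(x):
    """Interpret the low 32 bits of x as a signed two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def evmwlsmiaaw_model(acc, ra, rb):
    """Illustrative model only; returns (rD, ACC) as 64-bit integers."""
    # Masking to 32 bits keeps only the low word of each product and wraps
    # the sum, which is the modulo (non-saturating) behavior.
    hi = (s32(acc >> 32) + s32(ra >> 32) * s32(rb >> 32)) & MASK32
    lo = (s32(acc) + s32(ra) * s32(rb)) & MASK32
    rd = (hi << 32) | lo
    return rd, rd                     # ACC[0:63] = rD[0:63]

# Upper lane: 16 + (-2 * 3) = 10; lower lane: 0xFFFFFFFF + (2 * 1) wraps to 1
rd, acc = evmwlsmiaaw_model(0x00000010FFFFFFFF,
                            0xFFFFFFFE00000002,
                            0x0000000300000001)
```

No overflow is detected or recorded in this form, so the wrapped lower lane is returned without any indication.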
evmwlsmianw

Vector Multiply Word Low Signed, Modulo, Integer and Accumulate Negative in Words

evmwlsmianw rD,rA,rB (M=1, F=0, S=1)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 101 1100 1001

    temp1[0:63] = rA[0:31] *si rB[0:31]
    temp2[0:63] = rA[32:63] *si rB[32:63]
    rD[0:31] = ACC[0:31] - temp1[32:63]
    rD[32:63] = ACC[32:63] - temp2[32:63]
    ACC[0:63] = rD[0:63]

For each word element in the accumulator the following operations are performed in the order shown:

Each signed integer word element in rA is multiplied by the corresponding signed integer word element in rB. The low 32 bits of the 64-bit intermediate product are subtracted from the contents of the accumulator word and placed into the corresponding rD word. The result in rD is also placed in the accumulator.

NOTE
This instruction produces a valid result only if the intermediate product can be represented in the lower 32 bits.

Other registers altered: ACC

Figure 6-70. evmwlsmianw

evmwlssiaaw

Vector Multiply Word Low Signed, Saturate, Integer and Accumulate in Words

evmwlssiaaw rD,rA,rB (M=0, F=0, S=1)

Encoding: bits 0:5 = 4, 6:10 = RD, 11:15 = RA, 16:20 = RB, 21:31 = 101 0100 0001

    temp1[0:63] = rA[0:31] *si rB[0:31]
    temp2[0:63] = rA[32:63] *si rB[32:63]
    temp3[0:32] = {ACC[0],ACC[0:31]} + {temp1[32],temp1[32:63]}
    temp4[0:32] = {ACC[32],ACC[32:63]} + {temp2[32],temp2[32:63]}
    ovh = temp3[0] ⊕ temp3[1]
    ovl = temp4[0] ⊕ temp4[1]
    rD[0:31] = SATURATE_ACC(ovh, temp3[0], 0x80000000, 0x7FFFFFFF, temp3[1:32])
    rD[32:63] = SATURATE_ACC(ovl, temp4[0], 0x80000000, 0x7FFFFFFF, temp4[1:32])
    ACC[0:31] = rD[0:31]
    ACC[32:63] = rD[32:63]
    SPEFSCR[OVH] = ovh
    SPEFSCR[OV] = ovl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | ovh
    SPEFSCR[SOV] = SPEFSCR[SOV] | ovl

For each word element in the accumulator the following operations are performed in the order shown:

Each signed integer word element in rA is multiplied by the corresponding signed integer word element in rB.
The low 32 bits of the 64-bit intermediate product are added to the contents of the accumulator word to form an intermediate sum. If the intermediate sum has overflowed, the appropriate saturation value (0x7FFFFFFF if positive overflow or 0x80000000 if negative overflow) is placed into the accumulator word and the corresponding rD word. Otherwise, the low 32 bits of the intermediate sum are placed into the accumulator word and the corresponding rD word. If there is an overflow from the addition, the overflow information is recorded in the SPEFSCR overflow and summary overflow bits.

NOTE
This instruction produces a valid result only if the intermediate product can be represented in the lower 32 bits.

Other registers altered: SPEFSCR, ACC

Figure 6-71. evmwlssiaaw

evmwlssianw

Vector Multiply Word Low Signed, Saturate, Integer and Accumulate Negative in Words

evmwlssianw rD,rA,rB (M=0, F=0, S=1)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 1100 0001

    temp1[0:63] = rA[0:31] *si rB[0:31]
    temp2[0:63] = rA[32:63] *si rB[32:63]
    temp3[0:32] = {ACC[0], ACC[0:31]} - {temp1[32], temp1[32:63]}
    temp4[0:32] = {ACC[32], ACC[32:63]} - {temp2[32], temp2[32:63]}
    ovh = temp3[0] ⊕ temp3[1]
    ovl = temp4[0] ⊕ temp4[1]
    rD[0:31] = SATURATE_ACC(ovh, temp3[0], 0x80000000, 0x7FFFFFFF, temp3[1:32])
    rD[32:63] = SATURATE_ACC(ovl, temp4[0], 0x80000000, 0x7FFFFFFF, temp4[1:32])
    ACC[0:31] = rD[0:31]
    ACC[32:63] = rD[32:63]
    SPEFSCR[OVH] = ovh
    SPEFSCR[OV] = ovl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | ovh
    SPEFSCR[SOV] = SPEFSCR[SOV] | ovl

For each word element in the accumulator the following operations are performed in the order shown: Each signed integer word element in rA is multiplied by the corresponding signed integer word element in rB.
The low 32 bits of the 64-bit intermediate product are subtracted from the contents of the accumulator word to form an intermediate difference. If the intermediate difference has overflowed, the appropriate saturation value (0x7FFFFFFF if positive overflow or 0x80000000 if negative overflow) is placed into the accumulator word and the corresponding rD word. Otherwise, the low 32 bits of the intermediate difference are placed into the accumulator word and the corresponding rD word. If there is an overflow from the subtraction, the overflow information is recorded in the SPEFSCR overflow and summary overflow bits.

NOTE
This instruction produces a valid result only if the intermediate product can be represented in the lower 32 bits.

Other registers altered: SPEFSCR, ACC

Figure 6-72. evmwlssianw

evmwlumi

Vector Multiply Word Low Unsigned, Modulo, Integer

evmwlumi rD,rA,rB (M=1, F=0, S=0, A=0)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 100 0100 1000

    temp1[0:63] = rA[0:31] *ui rB[0:31]
    temp2[0:63] = rA[32:63] *ui rB[32:63]
    rD[0:31] = temp1[32:63]
    rD[32:63] = temp2[32:63]

Each unsigned integer word element in rA is multiplied by the corresponding unsigned integer word element in rB. The lower 32 bits of the two 64-bit unsigned integer products are placed into the two word elements of rD.

NOTE
The low-order 32 bits of the product are independent of whether the word elements in rA and rB are treated as signed or unsigned 32-bit integers.

Figure 6-73. evmwlumi
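The 33-bit overflow check used by evmwlssiaaw and evmwlssianw can be sketched for a single word element as follows. This is a hedged illustration: the behavior of SATURATE_ACC is assumed from the prose descriptions (select the negative limit when the 33-bit sign bit is set, otherwise the positive limit), and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK33 = 0x1FFFFFFFF

def sext33(word):
    """{w[0], w[0:31]}: replicate the sign bit to form a 33-bit value."""
    word &= MASK32
    return word | ((word >> 31) << 32)

def saturate_acc(ov, sign, neg_limit, pos_limit, value):
    """Assumed SATURATE_ACC behavior: clamp only when ov is set."""
    if not ov:
        return value
    return neg_limit if sign else pos_limit

def accumulate_word_sat(acc_word, product_low32, negative=False):
    """One word lane of evmwlssiaaw (or evmwlssianw when negative=True).
    Returns (result word, overflow flag)."""
    a, p = sext33(acc_word), sext33(product_low32)
    temp = (a - p if negative else a + p) & MASK33
    bit0 = (temp >> 32) & 1          # temp[0], sign of the 33-bit result
    bit1 = (temp >> 31) & 1          # temp[1], sign of the low 32 bits
    ov = bit0 ^ bit1
    result = saturate_acc(ov, bit0, 0x80000000, 0x7FFFFFFF, temp & MASK32)
    return result, ov
```

The two sign bits disagree exactly when the true result does not fit in 32 bits, which is why a single XOR suffices to detect overflow in either direction.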
evmwlumia

Vector Multiply Word Low Unsigned, Modulo, Integer, to Accumulator

evmwlumia rD,rA,rB (M=1, F=0, S=0, A=1)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 100 0110 1000

    temp1[0:63] = rA[0:31] *ui rB[0:31]
    temp2[0:63] = rA[32:63] *ui rB[32:63]
    rD[0:31] = temp1[32:63]
    rD[32:63] = temp2[32:63]
    ACC[0:63] = rD[0:63]

Each unsigned integer word element in rA is multiplied by the corresponding unsigned integer word element in rB. The lower 32 bits of the two 64-bit unsigned integer products are placed into the two word elements of rD. The result in rD is also placed in the accumulator.

NOTE
The low-order 32 bits of the product are independent of whether the word elements in rA and rB are treated as signed or unsigned 32-bit integers.

Other registers altered: ACC

Figure 6-74. evmwlumia

evmwlumiaaw

Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate in Words

evmwlumiaaw rD,rA,rB (M=1, F=0, S=0)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 0100 1000

    temp1[0:63] = rA[0:31] *ui rB[0:31]
    temp2[0:63] = rA[32:63] *ui rB[32:63]
    rD[0:31] = ACC[0:31] + temp1[32:63]
    rD[32:63] = ACC[32:63] + temp2[32:63]
    ACC[0:63] = rD[0:63]

For each word element in the accumulator the following operations are performed in the order shown: Each unsigned integer word element in rA is multiplied by the corresponding unsigned integer word element in rB. The low 32 bits of the 64-bit intermediate product are added to the contents of the accumulator word and placed into the corresponding rD word. The result in rD is also placed in the accumulator.

NOTE
This instruction produces a valid result only if the intermediate product can be represented in the lower 32 bits.

Other registers altered: ACC

Figure 6-75. evmwlumiaaw
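The note on evmwlumi and evmwlumia, that the low-order 32 bits of the product do not depend on signedness, can be checked with a short sketch. The function name `product_words` is hypothetical and exists only for this illustration.

```python
MASK32 = 0xFFFFFFFF

def signed32(word):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return word - (1 << 32) if word & 0x80000000 else word

def product_words(a, b, is_signed):
    """Return (high word, low word) of the 64-bit product of two
    32-bit patterns, treating them as signed or unsigned."""
    if is_signed:
        a, b = signed32(a), signed32(b)
    product = a * b
    return (product >> 32) & MASK32, product & MASK32

# The low words agree for either interpretation; the high words may not.
samples = [0, 1, 2, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF, 0x12345678]
low_words_match = all(
    product_words(a, b, True)[1] == product_words(a, b, False)[1]
    for a in samples for b in samples
)
```

For example, with operands 0xFFFFFFFF and 2 the low word is 0xFFFFFFFE either way, while the high word is 0xFFFFFFFF when signed and 1 when unsigned.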
evmwlumianw

Vector Multiply Word Low Unsigned, Modulo, Integer and Accumulate Negative in Words

evmwlumianw rD,rA,rB (M=1, F=0, S=0)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 1100 1000

    temp1[0:63] = rA[0:31] *ui rB[0:31]
    temp2[0:63] = rA[32:63] *ui rB[32:63]
    rD[0:31] = ACC[0:31] - temp1[32:63]
    rD[32:63] = ACC[32:63] - temp2[32:63]
    ACC[0:63] = rD[0:63]

For each word element in the accumulator the following operations are performed in the order shown: Each unsigned integer word element in rA is multiplied by the corresponding unsigned integer word element in rB. The low 32 bits of the 64-bit intermediate product are subtracted from the contents of the accumulator word and placed into the corresponding rD word. The result in rD is also placed in the accumulator.

NOTE
This instruction produces a valid result only if the intermediate product can be represented in the lower 32 bits.

Other registers altered: ACC

Figure 6-76. evmwlumianw

evmwlusiaaw

Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate in Words

evmwlusiaaw rD,rA,rB (M=0, F=0, S=0)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 0100 0000

    temp1[0:63] = rA[0:31] *ui rB[0:31]
    temp2[0:63] = rA[32:63] *ui rB[32:63]
    temp3[0:32] = ACC[0:31] + temp1[32:63]
    temp4[0:32] = ACC[32:63] + temp2[32:63]
    ovh = temp3[0]
    ovl = temp4[0]
    rD[0:31] = SATURATE_ACC(ovh, 0xFFFFFFFF, temp3[1:32])
    rD[32:63] = SATURATE_ACC(ovl, 0xFFFFFFFF, temp4[1:32])
    ACC[0:31] = rD[0:31]
    ACC[32:63] = rD[32:63]
    SPEFSCR[OVH] = ovh
    SPEFSCR[OV] = ovl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | ovh
    SPEFSCR[SOV] = SPEFSCR[SOV] | ovl

For each word element in the accumulator the following operations are performed in the order shown: Each unsigned integer word element in rA is multiplied by the corresponding unsigned integer word element in rB.
The low 32 bits of the 64-bit intermediate product are added to the contents of the accumulator word to form a 33-bit intermediate sum. If the intermediate sum has overflowed, 0xFFFFFFFF is placed into the accumulator word and the corresponding rD word. Otherwise, the low 32 bits of the intermediate sum are placed into the accumulator word and the corresponding rD word. If there is an overflow from the addition, the overflow information is recorded in the SPEFSCR overflow and summary overflow bits.

NOTE
This instruction produces a valid result only if the intermediate product can be represented in the lower 32 bits.

Other registers altered: SPEFSCR, ACC

Figure 6-77. evmwlusiaaw

evmwlusianw

Vector Multiply Word Low Unsigned, Saturate, Integer and Accumulate Negative in Words

evmwlusianw rD,rA,rB (M=0, F=0, S=0)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 1100 0000

    temp1[0:63] = rA[0:31] *ui rB[0:31]
    temp2[0:63] = rA[32:63] *ui rB[32:63]
    temp3[0:32] = ACC[0:31] - temp1[32:63]
    temp4[0:32] = ACC[32:63] - temp2[32:63]
    ovh = temp3[0]
    ovl = temp4[0]
    rD[0:31] = SATURATE_ACC(ovh, 0x00000000, temp3[1:32])
    rD[32:63] = SATURATE_ACC(ovl, 0x00000000, temp4[1:32])
    ACC[0:31] = rD[0:31]
    ACC[32:63] = rD[32:63]
    SPEFSCR[OVH] = ovh
    SPEFSCR[OV] = ovl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | ovh
    SPEFSCR[SOV] = SPEFSCR[SOV] | ovl

For each word element in the accumulator the following operations are performed in the order shown: Each unsigned integer word element in rA is multiplied by the corresponding unsigned integer word element in rB. The low 32 bits of the 64-bit intermediate product are subtracted from the contents of the accumulator word to form a 33-bit intermediate difference. If the intermediate difference has underflowed, 0x00000000 is placed into the accumulator word and the corresponding rD word.
Otherwise, the low 32 bits of the intermediate difference are placed into the accumulator word and the corresponding rD word. If there is an underflow from the subtraction, the underflow information is recorded in the SPEFSCR overflow and summary overflow bits.

NOTE
This instruction produces a valid result only if the intermediate product can be represented in the lower 32 bits.

Other registers altered: SPEFSCR, ACC

Figure 6-78. evmwlusianw

evmwsmf

Vector Multiply Word Signed, Modulo, Fractional

evmwsmf rD,rA,rB (M=1, F=1, S=1, A=0)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 100 0101 1011

    temp1[0:64] = (rA[32:63] * rB[32:63]) || 0
    rD[0:63] = temp1[1:64]

The low signed fractional word element in rA is multiplied by the corresponding low signed fractional word element in rB. Bits 1:63 of the 64-bit signed fractional product are padded on the right with a '0', and this result is placed in rD.

Figure 6-79. evmwsmf

evmwsmfa

Vector Multiply Word Signed, Modulo, Fractional, to Accumulator

evmwsmfa rD,rA,rB (M=1, F=1, S=1, A=1)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 100 0111 1011

    temp1[0:64] = (rA[32:63] * rB[32:63]) || 0
    ACC[0:63] = rD[0:63] = temp1[1:64]

The low signed fractional word element in rA is multiplied by the corresponding low signed fractional word element in rB. Bits 1:63 of the 64-bit signed fractional product are padded on the right with a '0', and this result is placed in rD. The result in rD is also placed in the accumulator.

Other registers altered: ACC

Figure 6-80. evmwsmfa
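The fractional multiply described for evmwsmf and evmwsmfa can be sketched in Python as below. This is an illustrative model under stated assumptions: operands are treated as signed 1.31 fractions in the low word, the result as a signed 1.63 fraction, and the helper names (`signed`, `q63_to_float`) are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def signed(value, bits):
    """Interpret the low `bits` bits of value as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def evmwsmf(ra, rb):
    """Modulo fractional multiply of the low words; returns rD.
    Appending a 0 and dropping bit 0 equals a left shift by one,
    truncated to 64 bits."""
    product = signed(ra & MASK32, 32) * signed(rb & MASK32, 32)
    return (product << 1) & MASK64

def q63_to_float(reg64):
    """Read a 64-bit pattern as a signed fraction in [-1.0, 1.0)."""
    return signed(reg64, 64) / 2.0**63
```

In this model 0.5 × 0.5 (0x40000000 × 0x40000000) yields 0x2000000000000000, which is 0.25. The modulo form does not saturate, so -1.0 × -1.0 wraps to 0x8000000000000000, which reads back as -1.0.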
evmwsmfaa

Vector Multiply Word Signed, Modulo, Fractional and Accumulate

evmwsmfaa rD,rA,rB (M=1, F=1, S=1)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 0101 1011

    temp1[0:64] = (rA[32:63] * rB[32:63]) || 0
    temp2[0:64] = ACC[0:63] + temp1[1:64]
    ACC[0:63] = rD[0:63] = temp2[1:64]

The low signed fractional word element in rA is multiplied by the corresponding low signed fractional word element in rB. Bits 1:63 of the 64-bit signed fractional product are padded on the right with a '0', and this result is added to the contents of the 64-bit accumulator to form a 65-bit intermediate sum. The lower 64 bits of the intermediate sum are placed back into the accumulator and also written into rD.

Other registers altered: ACC

Figure 6-81. evmwsmfaa

evmwsmfan

Vector Multiply Word Signed, Modulo, Fractional and Accumulate Negative

evmwsmfan rD,rA,rB (M=1, F=1, S=1)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 1101 1011

    temp1[0:64] = (rA[32:63] * rB[32:63]) || 0
    temp2[0:64] = ACC[0:63] - temp1[1:64]
    ACC[0:63] = rD[0:63] = temp2[1:64]

The low signed fractional word element in rA is multiplied by the corresponding low signed fractional word element in rB. Bits 1:63 of the 64-bit signed fractional product are padded on the right with a '0', and this result is subtracted from the contents of the 64-bit accumulator to form a 65-bit intermediate difference. The lower 64 bits of the intermediate difference are placed back into the accumulator and also written into rD.

Other registers altered: ACC

Figure 6-82. evmwsmfan
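One consequence of the modulo accumulate forms evmwsmfaa and evmwsmfan is that an accumulate followed by an accumulate-negative with the same operands restores the accumulator, even when the intermediate value wraps. The sketch below models this; it is an illustration only, and `frac_product` is a hypothetical helper.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def signed32(word):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return word - (1 << 32) if word & 0x80000000 else word

def frac_product(ra, rb):
    """temp1[1:64]: low-word fractional product shifted left by one."""
    product = signed32(ra & MASK32) * signed32(rb & MASK32)
    return (product << 1) & MASK64

def evmwsmfaa(acc, ra, rb):
    """Keep the lower 64 bits of ACC + product; returns (rD, ACC)."""
    rd = (acc + frac_product(ra, rb)) & MASK64
    return rd, rd

def evmwsmfan(acc, ra, rb):
    """Keep the lower 64 bits of ACC - product; returns (rD, ACC)."""
    rd = (acc - frac_product(ra, rb)) & MASK64
    return rd, rd
```

Starting from 0x7FFFFFFFFFFFFFFF, adding 0.5 × 0.5 gives 0x9FFFFFFFFFFFFFFF (a signed wrap with no saturation), and subtracting the same product returns the original value.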
evmwsmi

Vector Multiply Word Signed, Modulo, Integer

evmwsmi rD,rA,rB (M=1, F=0, S=1, A=0)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 100 0101 1001

    temp[0:63] = rA[32:63] *si rB[32:63]
    rD[0:63] = temp[0:63]

The low signed integer word element in rA is multiplied by the corresponding low signed integer word element in rB. The 64-bit signed integer product is placed in rD.

Figure 6-83. evmwsmi

evmwsmia

Vector Multiply Word Signed, Modulo, Integer, to Accumulator

evmwsmia rD,rA,rB (M=1, F=0, S=1, A=1)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 100 0111 1001

    temp[0:63] = rA[32:63] *si rB[32:63]
    ACC[0:63] = rD[0:63] = temp[0:63]

The low signed integer word element in rA is multiplied by the corresponding low signed integer word element in rB. The 64-bit signed integer product is placed in rD. The result in rD is also placed in the accumulator.

Other registers altered: ACC

Figure 6-84. evmwsmia

evmwsmiaa

Vector Multiply Word Signed, Modulo, Integer and Accumulate

evmwsmiaa rD,rA,rB (M=1, F=0, S=1)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 0101 1001

    temp1[0:63] = rA[32:63] *si rB[32:63]
    temp2[0:64] = ACC[0:63] + temp1[0:63]
    ACC[0:63] = rD[0:63] = temp2[1:64]

The low signed integer word element in rA is multiplied by the corresponding low signed integer word element in rB. The intermediate product is added to the contents of the 64-bit accumulator to form a 65-bit intermediate sum. The lower 64 bits of the intermediate sum are placed back into the accumulator and also written into rD.

Other registers altered: ACC

Figure 6-85. evmwsmiaa
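A typical use of evmwsmiaa is accumulating a sum of products. The sketch below models the instruction and uses it in a dot-product loop; it is an illustration under stated assumptions, with the accumulator assumed to start at zero and `dot_product` being a hypothetical helper rather than anything defined by the manual.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def signed32(word):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return word - (1 << 32) if word & 0x80000000 else word

def evmwsmiaa(acc, ra, rb):
    """Signed low-word product added to the 64-bit accumulator,
    keeping the lower 64 bits; returns (rD, ACC)."""
    product = signed32(ra & MASK32) * signed32(rb & MASK32)
    rd = (acc + product) & MASK64
    return rd, rd

def dot_product(xs, ys):
    """Accumulate x*y over two integer sequences, one evmwsmiaa per
    pair. The accumulator is assumed to have been cleared first."""
    acc = 0
    for x, y in zip(xs, ys):
        _, acc = evmwsmiaa(acc, x & MASK32, y & MASK32)
    return acc
```

For the inputs [1, -2, 3] and [4, 5, -6] the accumulator ends at 0xFFFFFFFFFFFFFFE8, the 64-bit two's-complement encoding of -24.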
evmwsmian

Vector Multiply Word Signed, Modulo, Integer and Accumulate Negative

evmwsmian rD,rA,rB (M=1, F=0, S=1)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 1101 1001

    temp1[0:63] = rA[32:63] *si rB[32:63]
    temp2[0:64] = ACC[0:63] - temp1[0:63]
    ACC[0:63] = rD[0:63] = temp2[1:64]

The low signed integer word element in rA is multiplied by the corresponding low signed integer word element in rB. The intermediate product is subtracted from the contents of the 64-bit accumulator to form a 65-bit intermediate difference. The lower 64 bits of the intermediate difference are placed back into the accumulator and also written into rD.

Other registers altered: ACC

Figure 6-86. evmwsmian

evmwssf

Vector Multiply Word Signed, Saturate, Fractional

evmwssf rD,rA,rB (M=0, F=1, S=1, A=0)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 100 0101 0011

    temp[0:64] = (rA[32:63] * rB[32:63]) || 0
    movl = temp[0] ⊕ temp[1]
    rD[0:63] = SATURATE(movl, 0x7FFFFFFFFFFFFFFF, temp[1:64])
    SPEFSCR[OVH] = 0
    SPEFSCR[OV] = movl
    SPEFSCR[SOV] = SPEFSCR[SOV] | movl

The low signed fractional word element in rA is multiplied by the corresponding low signed fractional word element in rB. The 64-bit signed fractional product is placed in rD. If the inputs are -1.0 and -1.0, the result is saturated to the most positive signed fraction (0x7FFFFFFFFFFFFFFF). If saturation occurs, it is recorded in the overflow and summary overflow bits.

Other registers altered: SPEFSCR

Figure 6-87. evmwssf
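The saturation check in evmwssf can be sketched as follows. This is a hedged model, not the manual's definition: it assumes the missing operator between temp[0] and temp[1] in the operation listing is an exclusive OR, and it returns the overflow flag alongside rD instead of modeling SPEFSCR.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF
MOST_POSITIVE = 0x7FFFFFFFFFFFFFFF

def signed32(word):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return word - (1 << 32) if word & 0x80000000 else word

def evmwssf(ra, rb):
    """Saturating fractional multiply of the low words.
    Returns (rD, movl)."""
    product = signed32(ra & MASK32) * signed32(rb & MASK32)
    temp = (product & MASK64) << 1             # 65-bit temp[0:64]
    movl = ((temp >> 64) ^ (temp >> 63)) & 1   # temp[0] XOR temp[1]
    rd = MOST_POSITIVE if movl else temp & MASK64
    return rd, movl
```

Only -1.0 × -1.0 makes the two top bits differ, because it is the only operand pair whose true product (+1.0) falls outside the representable range.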
evmwssfa

Vector Multiply Word Signed, Saturate, Fractional, to Accumulator

evmwssfa rD,rA,rB (M=0, F=1, S=1, A=1)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 100 0111 0011

    temp[0:64] = (rA[32:63] * rB[32:63]) || 0
    movl = temp[0] ⊕ temp[1]
    ACC[0:63] = rD[0:63] = SATURATE(movl, 0x7FFFFFFFFFFFFFFF, temp[1:64])
    SPEFSCR[OVH] = 0
    SPEFSCR[OV] = movl
    SPEFSCR[SOV] = SPEFSCR[SOV] | movl

The low signed fractional word element in rA is multiplied by the corresponding low signed fractional word element in rB. The 64-bit signed fractional product is placed in rD. If the inputs are -1.0 and -1.0, the result is saturated to the most positive signed fraction (0x7FFFFFFFFFFFFFFF). If saturation occurs, it is recorded in the overflow and summary overflow bits. The result in rD is also placed in the accumulator.

Other registers altered: SPEFSCR, ACC

Figure 6-88. evmwssfa

evmwssfaa

Vector Multiply Word Signed, Saturate, Fractional and Accumulate

evmwssfaa rD,rA,rB (M=0, F=1, S=1)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 0101 0011

    temp1[0:64] = (rA[32:63] * rB[32:63]) || 0
    mov = temp1[0] ⊕ temp1[1]
    temp2[0:63] = SATURATE(mov, 0x7FFFFFFFFFFFFFFF, temp1[1:64])
    temp3[0:64] = {ACC[0], ACC[0:63]} + {temp2[0], temp2[0:63]}
    ov = temp3[0] ⊕ temp3[1]
    rD[0:63] = SATURATE_ACC(ov, temp3[0], 0x8000000000000000, 0x7FFFFFFFFFFFFFFF, temp3[1:64])
    ACC[0:63] = rD[0:63]
    SPEFSCR[OV] = mov | ov
    SPEFSCR[OVH] = 0
    SPEFSCR[SOV] = SPEFSCR[SOV] | mov | ov

The low signed fractional word element in rA is multiplied by the corresponding low signed fractional word element in rB. If the inputs are -1.0 and -1.0, the product is saturated to the most positive signed fraction (0x7FFFFFFFFFFFFFFF). The 64-bit intermediate product is shifted left by one bit (to eliminate the redundant sign bit) and padded on the right with a '0', and this value is then added to the contents of the 64-bit accumulator to form an intermediate sum.
If the intermediate sum has overflowed, the appropriate saturation value (0x7FFFFFFFFFFFFFFF if positive overflow or 0x8000000000000000 if negative overflow) is placed into the accumulator and rD. Otherwise, the low 64 bits of the intermediate sum are placed into the accumulator and rD. The overflow and summary overflow bits are recorded to indicate occurrence of saturation on either the multiply or the addition.

Other registers altered: SPEFSCR, ACC

Figure 6-89. evmwssfaa

evmwssfan

Vector Multiply Word Signed, Saturate, Fractional and Accumulate Negative

evmwssfan rD,rA,rB (M=0, F=1, S=1)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 1101 0011

    temp1[0:64] = (rA[32:63] * rB[32:63]) || 0
    mov = temp1[0] ⊕ temp1[1]
    temp2[0:63] = SATURATE(mov, 0x7FFFFFFFFFFFFFFF, temp1[1:64])
    temp3[0:64] = {ACC[0], ACC[0:63]} - {temp2[0], temp2[0:63]}
    ov = temp3[0] ⊕ temp3[1]
    rD[0:63] = SATURATE_ACC(ov, temp3[0], 0x8000000000000000, 0x7FFFFFFFFFFFFFFF, temp3[1:64])
    ACC[0:63] = rD[0:63]
    SPEFSCR[OV] = mov | ov
    SPEFSCR[OVH] = 0
    SPEFSCR[SOV] = SPEFSCR[SOV] | mov | ov

The low signed fractional word element in rA is multiplied by the corresponding low signed fractional word element in rB. If the inputs are -1.0 and -1.0, the product is saturated to the most positive signed fraction (0x7FFFFFFFFFFFFFFF). The 64-bit intermediate product is shifted left by one bit (to eliminate the redundant sign bit) and padded on the right with a '0', and this value is then subtracted from the contents of the 64-bit accumulator to form an intermediate difference. If the intermediate difference has overflowed, the appropriate saturation value (0x7FFFFFFFFFFFFFFF if positive overflow or 0x8000000000000000 if negative overflow) is placed into the accumulator and rD.
Otherwise, the low 64 bits of the intermediate difference are placed into the accumulator and rD. The overflow and summary overflow bits are recorded to indicate occurrence of saturation on either the multiply or the subtraction.

Other registers altered: SPEFSCR, ACC

Figure 6-90. evmwssfan

evmwumi

Vector Multiply Word Unsigned, Modulo, Integer

evmwumi rD,rA,rB (M=1, F=0, S=0, A=0)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 100 0101 1000

    temp[0:63] = rA[32:63] *ui rB[32:63]
    rD[0:63] = temp[0:63]

The low unsigned integer word element in rA is multiplied by the corresponding low unsigned integer word element in rB. The 64-bit unsigned integer product is placed in rD.

Figure 6-91. evmwumi

evmwumia

Vector Multiply Word Unsigned, Modulo, Integer, to Accumulator

evmwumia rD,rA,rB (M=1, F=0, S=0, A=1)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 100 0111 1000

    temp[0:63] = rA[32:63] *ui rB[32:63]
    ACC[0:63] = rD[0:63] = temp[0:63]

The low unsigned integer word element in rA is multiplied by the corresponding low unsigned integer word element in rB. The 64-bit unsigned integer product is placed in rD. The result in rD is also placed in the accumulator.

Other registers altered: ACC

Figure 6-92. evmwumia
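The two saturation stages of evmwssfaa and evmwssfan (first on the multiply, then on the accumulate) can be sketched together as shown below. This is an illustrative model under assumptions: SATURATE and SATURATE_ACC are taken to behave as the prose describes, the flag returned stands in for the value written to SPEFSCR[OV], and `mac_frac_sat` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF
MASK65 = 0x1FFFFFFFFFFFFFFFF
POS_SAT = 0x7FFFFFFFFFFFFFFF
NEG_SAT = 0x8000000000000000

def signed32(word):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return word - (1 << 32) if word & 0x80000000 else word

def sext65(value):
    """{v[0], v[0:63]}: replicate the sign bit to form a 65-bit value."""
    return value | ((value >> 63) << 64)

def mac_frac_sat(acc, ra, rb, negative=False):
    """evmwssfaa (or evmwssfan when negative=True).
    Returns (rD, mov | ov)."""
    product = signed32(ra & MASK32) * signed32(rb & MASK32)
    temp1 = (product & MASK64) << 1                 # 65-bit temp1[0:64]
    mov = ((temp1 >> 64) ^ (temp1 >> 63)) & 1       # multiply saturation
    temp2 = POS_SAT if mov else temp1 & MASK64
    a, p = sext65(acc), sext65(temp2)
    temp3 = (a - p if negative else a + p) & MASK65
    sign = temp3 >> 64                              # temp3[0]
    ov = (sign ^ (temp3 >> 63)) & 1                 # accumulate saturation
    if ov:
        rd = NEG_SAT if sign else POS_SAT
    else:
        rd = temp3 & MASK64
    return rd, mov | ov
```

Either stage alone is enough to set the returned flag, matching the statement that the overflow bits record saturation on either the multiply or the accumulate.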
evmwumiaa

Vector Multiply Word Unsigned, Modulo, Integer and Accumulate

evmwumiaa rD,rA,rB (M=1, F=0, S=0)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 0101 1000

    temp1[0:63] = rA[32:63] *ui rB[32:63]
    temp2[0:64] = ACC[0:63] + temp1[0:63]
    ACC[0:63] = rD[0:63] = temp2[1:64]

The low unsigned integer word element in rA is multiplied by the corresponding low unsigned integer word element in rB. The intermediate product is added to the contents of the 64-bit accumulator to form a 65-bit intermediate sum. The lower 64 bits of the intermediate sum are placed back into the accumulator and also written into rD.

Other registers altered: ACC

Figure 6-93. evmwumiaa

evmwumian

Vector Multiply Word Unsigned, Modulo, Integer and Accumulate Negative

evmwumian rD,rA,rB (M=1, F=0, S=0)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = RB | 21:31 = 101 1101 1000

    temp1[0:63] = rA[32:63] *ui rB[32:63]
    temp2[0:64] = ACC[0:63] - temp1[0:63]
    ACC[0:63] = rD[0:63] = temp2[1:64]

The low unsigned integer word element in rA is multiplied by the corresponding low unsigned integer word element in rB. The intermediate product is subtracted from the contents of the 64-bit accumulator to form a 65-bit intermediate difference. The lower 64 bits of the intermediate difference are placed back into the accumulator and also written into rD.

Other registers altered: ACC

Figure 6-94. evmwumian

6.4.3 Add/subtract word to accumulator instructions

The following instructions perform addition and subtraction, with and without accumulates, using signed or unsigned integer or fractional operands, with optional saturation.
evaddsmiaaw

Vector Add Signed, Modulo, Integer to Accumulator Word

evaddsmiaaw rD,rA (M=1, S=1)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = 0000 0 | 21:31 = 100 1100 1001

    rD[0:31] = ACC[0:31] + rA[0:31]
    rD[32:63] = ACC[32:63] + rA[32:63]
    ACC[0:63] = rD[0:63]

Each word element in rA is added to the corresponding word element in the accumulator and placed into the corresponding rD word. The result in rD is also placed in the accumulator.

Other registers altered: ACC

Figure 6-95. evaddsmiaaw

evaddssiaaw

Vector Add Signed, Saturate, Integer to Accumulator Word

evaddssiaaw rD,rA (M=0, S=1)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = 0000 0 | 21:31 = 100 1100 0001

    temp1[0:32] = EXTS(ACC[0:31]) + EXTS(rA[0:31])
    temp2[0:32] = EXTS(ACC[32:63]) + EXTS(rA[32:63])
    ovh = temp1[0] ⊕ temp1[1]
    ovl = temp2[0] ⊕ temp2[1]
    rD[0:31] = SATURATE_ACC(ovh, temp1[0], 0x80000000, 0x7FFFFFFF, temp1[1:32])
    rD[32:63] = SATURATE_ACC(ovl, temp2[0], 0x80000000, 0x7FFFFFFF, temp2[1:32])
    ACC[0:31] = rD[0:31]
    ACC[32:63] = rD[32:63]
    SPEFSCR[OVH] = ovh
    SPEFSCR[OV] = ovl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | ovh
    SPEFSCR[SOV] = SPEFSCR[SOV] | ovl

Each word element in rA is added to the corresponding word element in the accumulator to form a 33-bit intermediate sum. If the intermediate sum has overflowed, the appropriate saturation value (0x7FFFFFFF if positive overflow or 0x80000000 if negative overflow) is placed into the accumulator word and the corresponding rD word. Otherwise, the low 32 bits of the intermediate sum are placed into the accumulator word and the corresponding rD word. If there is an overflow from the addition, the overflow information is recorded in the SPEFSCR overflow and summary overflow bits.

Other registers altered: SPEFSCR, ACC

Figure 6-96. evaddssiaaw
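The SPEFSCR updates listed for evaddssiaaw distinguish per-operation bits (OVH, OV), which are overwritten every time, from summary bits (SOVH, SOV), which are only ever ORed. The sketch below models that difference. It is illustrative only: SPEFSCR is represented as a plain dictionary holding just these four fields, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK33 = 0x1FFFFFFFF

def exts33(word):
    """EXTS: sign-extend a 32-bit word to 33 bits."""
    return word | ((word >> 31) << 32)

def add_word_sat(acc_word, ra_word):
    """One signed saturating word add. Returns (result, overflow)."""
    temp = (exts33(acc_word) + exts33(ra_word)) & MASK33
    sign, msb = temp >> 32, (temp >> 31) & 1
    if sign ^ msb:
        return (0x80000000 if sign else 0x7FFFFFFF), 1
    return temp & MASK32, 0

def evaddssiaaw(acc, ra, spefscr):
    """Returns (rD, ACC, updated SPEFSCR fields)."""
    hi, ovh = add_word_sat((acc >> 32) & MASK32, (ra >> 32) & MASK32)
    lo, ovl = add_word_sat(acc & MASK32, ra & MASK32)
    updated = dict(spefscr, OVH=ovh, OV=ovl,
                   SOVH=spefscr["SOVH"] | ovh,   # sticky summary bits
                   SOV=spefscr["SOV"] | ovl)
    rd = (hi << 32) | lo
    return rd, rd, updated
```

After an overflowing add followed by a non-overflowing one, OVH reads 0 again while SOVH remains 1 in this model.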
evaddumiaaw

Vector Add Unsigned, Modulo, Integer to Accumulator Word

evaddumiaaw rD,rA (M=1, S=0)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = 0000 0 | 21:31 = 100 1100 1000

    rD[0:31] = ACC[0:31] + rA[0:31]
    rD[32:63] = ACC[32:63] + rA[32:63]
    ACC[0:63] = rD[0:63]

Each word element in rA is added to the corresponding word element in the accumulator and placed into the corresponding rD word. The result in rD is also placed in the accumulator.

Other registers altered: ACC

Figure 6-97. evaddumiaaw

evaddusiaaw

Vector Add Unsigned, Saturate, Integer to Accumulator Word

evaddusiaaw rD,rA (M=0, S=0)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = 0000 0 | 21:31 = 100 1100 0000

    temp1[0:32] = EXTZ(ACC[0:31]) + EXTZ(rA[0:31])
    temp2[0:32] = EXTZ(ACC[32:63]) + EXTZ(rA[32:63])
    ovh = temp1[0]
    ovl = temp2[0]
    rD[0:31] = SATURATE(ovh, 0xFFFFFFFF, temp1[1:32])
    rD[32:63] = SATURATE(ovl, 0xFFFFFFFF, temp2[1:32])
    ACC[0:31] = rD[0:31]
    ACC[32:63] = rD[32:63]
    SPEFSCR[OVH] = ovh
    SPEFSCR[OV] = ovl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | ovh
    SPEFSCR[SOV] = SPEFSCR[SOV] | ovl

Each word element in rA is added to the corresponding word element in the accumulator to form a 33-bit intermediate sum. If the intermediate sum has overflowed, 0xFFFFFFFF is placed into the accumulator word and the corresponding rD word. Otherwise, the low 32 bits of the intermediate sum are placed into the accumulator word and the corresponding rD word. If there is an overflow from the addition, the overflow information is recorded in the SPEFSCR overflow and summary overflow bits.

Other registers altered: SPEFSCR, ACC

Figure 6-98. evaddusiaaw
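For the unsigned saturating add in evaddusiaaw, overflow is simply the carry out of the 32-bit sum (bit 0 of the 33-bit intermediate value). A minimal sketch follows; it returns the two overflow indications directly rather than modeling SPEFSCR, and `add_word_usat` is a hypothetical helper.

```python
MASK32 = 0xFFFFFFFF

def add_word_usat(acc_word, ra_word):
    """One unsigned saturating word add. Returns (result, overflow)."""
    temp = acc_word + ra_word      # zero-extended operands, 33-bit sum
    ov = temp >> 32                # temp[0] is the carry out
    return (MASK32 if ov else temp & MASK32), ov

def evaddusiaaw(acc, ra):
    """Returns (rD, ovh, ovl); ACC receives the same value as rD."""
    hi, ovh = add_word_usat((acc >> 32) & MASK32, (ra >> 32) & MASK32)
    lo, ovl = add_word_usat(acc & MASK32, ra & MASK32)
    return (hi << 32) | lo, ovh, ovl
```

With an accumulator of 0xFFFFFFFF00000010 and rA of 0x0000000100000020, the upper word clamps to 0xFFFFFFFF while the lower word becomes 0x00000030.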
evsubfsmiaaw

Vector Subtract Signed, Modulo, Integer to Accumulator Word

evsubfsmiaaw rD,rA (M=1, S=1)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = 0000 0 | 21:31 = 100 1100 1011

    rD[0:31] = ACC[0:31] - rA[0:31]
    rD[32:63] = ACC[32:63] - rA[32:63]
    ACC[0:63] = rD[0:63]

Each word element in rA is subtracted from the corresponding word element in the accumulator and placed into the corresponding rD word. The result in rD is also placed in the accumulator.

Other registers altered: ACC

Figure 6-99. evsubfsmiaaw

evsubfssiaaw

Vector Subtract Signed, Saturate, Integer to Accumulator Word

evsubfssiaaw rD,rA (M=0, S=1)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = 0000 0 | 21:31 = 100 1100 0011

    temp1[0:32] = EXTS(ACC[0:31]) - EXTS(rA[0:31])
    temp2[0:32] = EXTS(ACC[32:63]) - EXTS(rA[32:63])
    ovh = temp1[0] ⊕ temp1[1]
    ovl = temp2[0] ⊕ temp2[1]
    rD[0:31] = SATURATE_ACC(ovh, temp1[0], 0x80000000, 0x7FFFFFFF, temp1[1:32])
    rD[32:63] = SATURATE_ACC(ovl, temp2[0], 0x80000000, 0x7FFFFFFF, temp2[1:32])
    ACC[0:31] = rD[0:31]
    ACC[32:63] = rD[32:63]
    SPEFSCR[OVH] = ovh
    SPEFSCR[OV] = ovl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | ovh
    SPEFSCR[SOV] = SPEFSCR[SOV] | ovl

Each word element in rA is subtracted from the corresponding word element in the accumulator to form a 33-bit intermediate difference. If the intermediate difference has overflowed, the appropriate saturation value (0x7FFFFFFF if positive overflow or 0x80000000 if negative overflow) is placed into the accumulator word and the corresponding rD word. Otherwise, the low 32 bits of the intermediate difference are placed into the accumulator word and the corresponding rD word. If there is an overflow from the subtraction, the overflow information is recorded in the SPEFSCR overflow and summary overflow bits.

Other registers altered: SPEFSCR, ACC

Figure 6-100. evsubfssiaaw
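The practical difference between the modulo form (evsubfsmiaaw) and the saturating form (evsubfssiaaw) shows up only when the true difference leaves the 32-bit signed range. The sketch below contrasts the two for a single word element. Note that it uses a clamp on the exact integer difference rather than the 33-bit XOR test from the operation listing; the two formulations are arithmetically equivalent, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def signed32(word):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return word - (1 << 32) if word & 0x80000000 else word

def sub_word_modulo(acc_word, ra_word):
    """Modulo form: the difference wraps around 2**32."""
    return (acc_word - ra_word) & MASK32

def sub_word_saturate(acc_word, ra_word):
    """Saturating form: clamp the exact signed difference.
    Returns (result word, overflow flag)."""
    diff = signed32(acc_word) - signed32(ra_word)
    clamped = max(INT32_MIN, min(INT32_MAX, diff))
    return clamped & MASK32, int(clamped != diff)
```

Subtracting 1 from the most negative word 0x80000000 wraps to 0x7FFFFFFF in the modulo form but stays at 0x80000000, with overflow reported, in the saturating form.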
evsubfumiaaw

Vector Subtract Unsigned, Modulo, Integer to Accumulator Word

evsubfumiaaw rD,rA (M=1, S=0)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = 0000 0 | 21:31 = 100 1100 1010

    rD[0:31] = ACC[0:31] - rA[0:31]
    rD[32:63] = ACC[32:63] - rA[32:63]
    ACC[0:63] = rD[0:63]

Each word element in rA is subtracted from the corresponding word element in the accumulator and placed into the corresponding rD word. The result in rD is also placed in the accumulator.

Other registers altered: ACC

Figure 6-101. evsubfumiaaw

evsubfusiaaw

Vector Subtract Unsigned, Saturate, Integer to Accumulator Word

evsubfusiaaw rD,rA (M=0, S=0)

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = 0000 0 | 21:31 = 100 1100 0010

    temp1[0:32] = EXTZ(ACC[0:31]) - EXTZ(rA[0:31])
    temp2[0:32] = EXTZ(ACC[32:63]) - EXTZ(rA[32:63])
    ovh = temp1[0]
    ovl = temp2[0]
    rD[0:31] = SATURATE(ovh, 0x00000000, temp1[1:32])
    rD[32:63] = SATURATE(ovl, 0x00000000, temp2[1:32])
    ACC[0:31] = rD[0:31]
    ACC[32:63] = rD[32:63]
    SPEFSCR[OVH] = ovh
    SPEFSCR[OV] = ovl
    SPEFSCR[SOVH] = SPEFSCR[SOVH] | ovh
    SPEFSCR[SOV] = SPEFSCR[SOV] | ovl

Each word element in rA is subtracted from the corresponding word element in the accumulator to form a 33-bit intermediate difference. If the intermediate difference has underflowed, 0x00000000 is placed into the accumulator word and the corresponding rD word. Otherwise, the low 32 bits of the intermediate difference are placed into the accumulator word and the corresponding rD word. If there is an underflow from the subtraction, the underflow information is recorded in the SPEFSCR overflow and summary overflow bits.

Other registers altered: SPEFSCR, ACC

Figure 6-102. evsubfusiaaw

6.4.4 Initializing and reading the accumulator

To read the accumulator contents into a register, use a multiply-accumulate instruction in which one operand is zero, as the following sequence shows:

    evxor RD, RD, RD      // Zero the contents of RD; not necessary if
                          // a zero is available in some register.
    evmwumiaa RD, RD, RD  // Multiply 0 with 0, add the 0 result to the
                          // accumulator, and store the value back in acc and RD

To initialize the accumulator, the evmra instruction is used:

evmra

Move Register to Accumulator

evmra rD,rA

Encoding: 0:5 = 4 | 6:10 = RD | 11:15 = RA | 16:20 = 0000 0 | 21:31 = 100 1100 0100

    rD[0:63] = ACC[0:63] = rA[0:63]

The contents of rA are written into the accumulator and copied into rD. This is the method for initializing the accumulator.

6.5 SPE vector load/store instructions

SPE vector load and store instructions are provided with a variety of options. The mnemonics are formed as follows:

ev{l,st}<X><Y>[Z]x

• X specifies the size of the load.
• Y specifies the size of data packed into the value being loaded. For example, evldhx specifies a load that brings in a doubleword composed of four halfwords.
• Z specifies the operation to be performed, such as unpack or splat.

All load and store instructions are specified as indexed forms. A specification of 0 in the rA field of the instruction results in the non-indexed form of the instruction. For all loads and stores, only the lower 32 bits of registers rA and rB are used and the effective address is 32 bits. PowerISA 2.06 load instructions are implemented such that the upper half of all registers is left unchanged for a load.
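The addressing rules stated above for indexed loads and stores (a 0 in the rA field selects a base of zero, only the lower 32 bits of rA and rB are used, and the effective address is 32 bits) can be sketched as follows. This is a hedged illustration: the register file is modeled as a Python list of 64-bit integers, the function name is hypothetical, and wrap-around of the sum modulo 2**32 is an assumption drawn from the statement that the effective address is 32 bits.

```python
MASK32 = 0xFFFFFFFF

def effective_address_indexed(gpr, ra_field, rb_field):
    """Compute a 32-bit EA for an indexed SPE load or store.

    gpr is a list of 64-bit register values; ra_field and rb_field
    are the register numbers encoded in the instruction."""
    # An rA field of 0 means "no base", not "contents of r0".
    base = 0 if ra_field == 0 else gpr[ra_field] & MASK32
    index = gpr[rb_field] & MASK32
    # Assumed: the sum is truncated to 32 bits.
    return (base + index) & MASK32
```

Because the upper halves of rA and rB are masked off before the add, SPE data left in the upper word of an address register does not disturb the address in this model.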
evldd
Vector Load Double into Double

evldd rD,d(rA)

    Bits    0–5   6–10   11–15   16–20     21–31
    Field   4     RD     RA      UIMM(1)   011 0000 0001
    (1) d = UIMM<<3

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + EXTZ(UIMM*8)
    RD = MEM(EA,8)

Figure 6-103 shows how bytes are loaded into rD as determined by the endian mode.

    Byte addr              0 1 2 3 4 5 6 7
    Memory                 a b c d e f g h
    GPR in big endian      a b c d e f g h
    GPR in little endian   h g f e d c b a

Figure 6-103. evldd results in big- and little-endian modes

evlddx
Vector Load Double into Double Indexed

evlddx rD,rA,rB

    Bits    0–5   6–10   11–15   16–20   21–31
    Field   4     RD     RA      RB      011 0000 0000

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + (rB)
    RD = MEM(EA,8)

Figure 6-104 shows how bytes are loaded into rD as determined by the endian mode.

    Byte addr              0 1 2 3 4 5 6 7
    Memory                 a b c d e f g h
    GPR in big endian      a b c d e f g h
    GPR in little endian   h g f e d c b a

Figure 6-104. evlddx results in big- and little-endian modes

evldw
Vector Load Double into Words

evldw rD,d(rA)

    Bits    0–5   6–10   11–15   16–20     21–31
    Field   4     RD     RA      UIMM(1)   011 0000 0011
    (1) d = UIMM<<3

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + EXTZ(UIMM*8)
    RD[0:31]  = MEM(EA,4)
    RD[32:63] = MEM(EA+4,4)

Figure 6-105 shows how bytes are loaded into rD as determined by the endian mode.

    Byte addr              0 1 2 3 4 5 6 7
    Memory                 a b c d e f g h
    GPR in big endian      a b c d e f g h
    GPR in little endian   d c b a h g f e

Figure 6-105. evldw results in big- and little-endian modes

evldwx
Vector Load Double into Words Indexed

evldwx rD,rA,rB

    Bits    0–5   6–10   11–15   16–20   21–31
    Field   4     RD     RA      RB      011 0000 0010

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + (rB)
    RD[0:31]  = MEM(EA,4)
    RD[32:63] = MEM(EA+4,4)

Figure 6-106 shows how bytes are loaded into rD as determined by the endian mode.

    Byte addr              0 1 2 3 4 5 6 7
    Memory                 a b c d e f g h
    GPR in big endian      a b c d e f g h
    GPR in little endian   d c b a h g f e

Figure 6-106. evldwx results in big- and little-endian modes

evldh
Vector Load Double into Halfwords

evldh rD,d(rA)

    Bits    0–5   6–10   11–15   16–20     21–31
    Field   4     RD     RA      UIMM(1)   011 0000 0101
    (1) d = UIMM<<3

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + EXTZ(UIMM*8)
    RD[0:15]  = MEM(EA,2)
    RD[16:31] = MEM(EA+2,2)
    RD[32:47] = MEM(EA+4,2)
    RD[48:63] = MEM(EA+6,2)

Figure 6-107 shows how bytes are loaded into rD as determined by the endian mode.

    Byte addr              0 1 2 3 4 5 6 7
    Memory                 a b c d e f g h
    GPR in big endian      a b c d e f g h
    GPR in little endian   b a d c f e h g

Figure 6-107. evldh results in big- and little-endian modes

evldhx
Vector Load Double into Halfwords Indexed

evldhx rD,rA,rB

    Bits    0–5   6–10   11–15   16–20   21–31
    Field   4     RD     RA      RB      011 0000 0100

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + (rB)
    RD[0:15]  = MEM(EA,2)
    RD[16:31] = MEM(EA+2,2)
    RD[32:47] = MEM(EA+4,2)
    RD[48:63] = MEM(EA+6,2)

Figure 6-108 shows how bytes are loaded into rD as determined by the endian mode.

    Byte addr              0 1 2 3 4 5 6 7
    Memory                 a b c d e f g h
    GPR in big endian      a b c d e f g h
    GPR in little endian   b a d c f e h g

Figure 6-108. evldhx results in big- and little-endian modes

evlwhe
Vector Load Word into Halfwords Even

evlwhe rD,d(rA)

    Bits    0–5   6–10   11–15   16–20     21–31
    Field   4     RD     RA      UIMM(1)   011 0001 0001
    (1) d = UIMM<<2

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + EXTZ(UIMM*4)
    RD[0:15]  = MEM(EA,2)
    RD[16:31] = 0x0000
    RD[32:47] = MEM(EA+2,2)
    RD[48:63] = 0x0000

Figure 6-109 shows how bytes are loaded into rD as determined by the endian mode.

    Byte addr              0 1 2 3
    Memory                 a b c d
    GPR in big endian      a b Z Z c d Z Z    (Z = zero)
    GPR in little endian   b a Z Z d c Z Z    (Z = zero)

Figure 6-109. evlwhe results in big- and little-endian modes

evlwhex
Vector Load Word into Halfwords Even Indexed

evlwhex rD,rA,rB

    Bits    0–5   6–10   11–15   16–20   21–31
    Field   4     RD     RA      RB      011 0001 0000

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + (rB)
    RD[0:15]  = MEM(EA,2)
    RD[16:31] = 0x0000
    RD[32:47] = MEM(EA+2,2)
    RD[48:63] = 0x0000

Figure 6-110 shows how bytes are loaded into rD as determined by the endian mode.

    Byte addr              0 1 2 3
    Memory                 a b c d
    GPR in big endian      a b Z Z c d Z Z    (Z = zero)
    GPR in little endian   b a Z Z d c Z Z    (Z = zero)

Figure 6-110. evlwhex results in big- and little-endian modes

evlwhou
Vector Load Word into Halfwords Odd Unsigned (zero-extended)

evlwhou rD,d(rA)

    Bits    0–5   6–10   11–15   16–20     21–31
    Field   4     RD     RA      UIMM(1)   011 0001 0101
    (1) d = UIMM<<2

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + EXTZ(UIMM*4)
    RD[0:15]  = 0x0000
    RD[16:31] = MEM(EA,2)
    RD[32:47] = 0x0000
    RD[48:63] = MEM(EA+2,2)

Figure 6-111 shows how bytes are loaded into rD as determined by the endian mode.

    Byte addr              0 1 2 3
    Memory                 a b c d
    GPR in big endian      Z Z a b Z Z c d    (Z = zero)
    GPR in little endian   Z Z b a Z Z d c    (Z = zero)

Figure 6-111. evlwhou results in big- and little-endian modes

evlwhoux
Vector Load Word into Halfwords Odd Unsigned Indexed (zero-extended)

evlwhoux rD,rA,rB

    Bits    0–5   6–10   11–15   16–20   21–31
    Field   4     RD     RA      RB      011 0001 0100

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + (rB)
    RD[0:15]  = 0x0000
    RD[16:31] = MEM(EA,2)
    RD[32:47] = 0x0000
    RD[48:63] = MEM(EA+2,2)

Figure 6-112 shows how bytes are loaded into rD as determined by the endian mode.

    Byte addr              0 1 2 3
    Memory                 a b c d
    GPR in big endian      Z Z a b Z Z c d    (Z = zero)
    GPR in little endian   Z Z b a Z Z d c    (Z = zero)

Figure 6-112. evlwhoux results in big- and little-endian modes
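To make the even and odd halfword placement of evlwhe and evlwhou concrete, the following Python sketch models the pseudocode and the byte-order figures above. It is a hedged illustration rather than a description of the hardware: memory is modeled as a `bytes` object, the destination register as a 64-bit integer with bit 0 as the most significant bit, and the helper name `read_halfword` and the `little_endian` flag are assumptions made for this example.

```python
def read_halfword(mem, ea, little_endian=False):
    """MEM(EA,2): read two bytes as an unsigned 16-bit value."""
    return int.from_bytes(mem[ea:ea + 2], "little" if little_endian else "big")

def evlwhe(mem, ea, little_endian=False):
    """Place the two halfwords in RD[0:15] and RD[32:47]; zero the rest."""
    h0 = read_halfword(mem, ea, little_endian)
    h1 = read_halfword(mem, ea + 2, little_endian)
    return (h0 << 48) | (h1 << 16)

def evlwhou(mem, ea, little_endian=False):
    """Place the two halfwords in RD[16:31] and RD[48:63], zero-extended."""
    h0 = read_halfword(mem, ea, little_endian)
    h1 = read_halfword(mem, ea + 2, little_endian)
    return (h0 << 32) | h1
```

With memory bytes a, b, c, d at the effective address, the big-endian results correspond to "a b Z Z c d Z Z" and "Z Z a b Z Z c d", matching the figures.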
evlwhos
Vector Load Word into Halfwords Odd Signed (with sign extension)

evlwhos rD,d(rA)

    Bits    0–5   6–10   11–15   16–20     21–31
    Field   4     RD     RA      UIMM(1)   011 0001 0111
    (1) d = UIMM<<2

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + EXTZ(UIMM*4)
    RD[0:31]  = EXTS(MEM(EA,2))
    RD[32:63] = EXTS(MEM(EA+2,2))

Figure 6-113 shows how bytes are loaded into rD as determined by the endian mode.

    Byte addr              0 1 2 3
    Memory                 a b c d
    GPR in big endian      S S a b S S c d    (S = sign)
    GPR in little endian   S S b a S S d c    (S = sign)

Figure 6-113. evlwhos results in big- and little-endian modes

In big-endian memory, the msbs of a and c are sign-extended. In little-endian memory, the msbs of b and d are sign-extended.

evlwhosx
Vector Load Word into Halfwords Odd Signed Indexed (with sign extension)

evlwhosx rD,rA,rB

    Bits    0–5   6–10   11–15   16–20   21–31
    Field   4     RD     RA      RB      011 0001 0110

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + (rB)
    RD[0:31]  = EXTS(MEM(EA,2))
    RD[32:63] = EXTS(MEM(EA+2,2))

Figure 6-114 shows how bytes are loaded into rD as determined by the endian mode.

    Byte addr              0 1 2 3
    Memory                 a b c d
    GPR in big endian      S S a b S S c d    (S = sign)
    GPR in little endian   S S b a S S d c    (S = sign)

Figure 6-114. evlwhosx results in big- and little-endian modes

In big-endian memory, the msbs of a and c are sign-extended. In little-endian memory, the msbs of b and d are sign-extended.

evlwwsplat
Vector Load Word into Word and Splat

evlwwsplat rD,d(rA)

    Bits    0–5   6–10   11–15   16–20     21–31
    Field   4     RD     RA      UIMM(1)   011 0001 1001
    (1) d = UIMM<<2

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + EXTZ(UIMM*4)
    RD[0:31]  = MEM(EA,4)
    RD[32:63] = MEM(EA,4)

Figure 6-115 shows how bytes are loaded into rD as determined by the endian mode.

    Byte addr              0 1 2 3
    Memory                 a b c d
    GPR in big endian      a b c d a b c d
    GPR in little endian   d c b a d c b a

Figure 6-115. evlwwsplat results in big- and little-endian modes

evlwwsplatx
Vector Load Word into Word and Splat Indexed

evlwwsplatx rD,rA,rB

    Bits    0–5   6–10   11–15   16–20   21–31
    Field   4     RD     RA      RB      011 0001 1000

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + (rB)
    RD[0:31]  = MEM(EA,4)
    RD[32:63] = MEM(EA,4)

Figure 6-116 shows how bytes are loaded into rD as determined by the endian mode.

    Byte addr              0 1 2 3
    Memory                 a b c d
    GPR in big endian      a b c d a b c d
    GPR in little endian   d c b a d c b a

Figure 6-116. evlwwsplatx results in big- and little-endian modes

evlwhsplat
Vector Load Word into Halfwords and Splat

evlwhsplat rD,d(rA)

    Bits    0–5   6–10   11–15   16–20     21–31
    Field   4     RD     RA      UIMM(1)   011 0001 1101
    (1) d = UIMM<<2

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + EXTZ(UIMM*4)
    RD[0:15]  = MEM(EA,2)
    RD[16:31] = MEM(EA,2)
    RD[32:47] = MEM(EA+2,2)
    RD[48:63] = MEM(EA+2,2)

Figure 6-117 shows how bytes are loaded into rD as determined by the endian mode.

    Byte addr              0 1 2 3
    Memory                 a b c d
    GPR in big endian      a b a b c d c d
    GPR in little endian   b a b a d c d c

Figure 6-117. evlwhsplat results in big- and little-endian modes

evlwhsplatx
Vector Load Word into Halfwords and Splat Indexed

evlwhsplatx rD,rA,rB

    Bits    0–5   6–10   11–15   16–20   21–31
    Field   4     RD     RA      RB      011 0001 1100

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + (rB)
    RD[0:15]  = MEM(EA,2)
    RD[16:31] = MEM(EA,2)
    RD[32:47] = MEM(EA+2,2)
    RD[48:63] = MEM(EA+2,2)

Figure 6-118 shows how bytes are loaded into rD as determined by the endian mode.

    Byte addr              0 1 2 3
    Memory                 a b c d
    GPR in big endian      a b a b c d c d
    GPR in little endian   b a b a d c d c

Figure 6-118. evlwhsplatx results in big- and little-endian modes
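The splat behavior of evlwwsplat and evlwhsplat described above can be modeled with a few lines of Python. This is only a sketch under stated assumptions: memory is a `bytes` object, the result is returned as a 64-bit integer whose most significant bit is register bit 0, and the function and parameter names are chosen here for illustration.

```python
def evlwwsplat(mem, ea, little_endian=False):
    """Load one word and copy it into both word elements of the result."""
    order = "little" if little_endian else "big"
    word = int.from_bytes(mem[ea:ea + 4], order)
    return (word << 32) | word

def evlwhsplat(mem, ea, little_endian=False):
    """Load two halfwords; splat each across the halves of one word element."""
    order = "little" if little_endian else "big"
    h0 = int.from_bytes(mem[ea:ea + 2], order)      # MEM(EA,2)
    h1 = int.from_bytes(mem[ea + 2:ea + 4], order)  # MEM(EA+2,2)
    return (h0 << 48) | (h0 << 32) | (h1 << 16) | h1
```

Note that the byte swap in little-endian mode is applied per loaded element (word or halfword) before the element is replicated, which is why the little-endian rows in the figures read "d c b a d c b a" and "b a b a d c d c".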
evlhhesplat
Vector Load Halfword into Halfword Even and Splat

evlhhesplat rD,d(rA)

    Bits    0–5   6–10   11–15   16–20     21–31
    Field   4     RD     RA      UIMM(1)   011 0000 1001
    (1) d = UIMM<<1

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + EXTZ(UIMM*2)
    RD[0:15]  = MEM(EA,2)
    RD[16:31] = 0x0000
    RD[32:47] = MEM(EA,2)
    RD[48:63] = 0x0000

Figure 6-119 shows how bytes are loaded into rD as determined by the endian mode.

    Byte addr              0 1
    Memory                 a b
    GPR in big endian      a b Z Z a b Z Z    (Z = zero)
    GPR in little endian   b a Z Z b a Z Z    (Z = zero)

Figure 6-119. evlhhesplat results in big- and little-endian modes

evlhhesplatx
Vector Load Halfword into Halfword Even and Splat Indexed

evlhhesplatx rD,rA,rB

    Bits    0–5   6–10   11–15   16–20   21–31
    Field   4     RD     RA      RB      011 0000 1000

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + (rB)
    RD[0:15]  = MEM(EA,2)
    RD[16:31] = 0x0000
    RD[32:47] = MEM(EA,2)
    RD[48:63] = 0x0000

Figure 6-120 shows how bytes are loaded into rD as determined by the endian mode.

    Byte addr              0 1
    Memory                 a b
    GPR in big endian      a b Z Z a b Z Z    (Z = zero)
    GPR in little endian   b a Z Z b a Z Z    (Z = zero)

Figure 6-120. evlhhesplatx results in big- and little-endian modes

evlhhousplat
Vector Load Halfword into Halfword Odd Unsigned and Splat

evlhhousplat rD,d(rA)

    Bits    0–5   6–10   11–15   16–20     21–31
    Field   4     RD     RA      UIMM(1)   011 0000 1101
    (1) d = UIMM<<1

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + EXTZ(UIMM*2)
    RD[0:15]  = 0x0000
    RD[16:31] = MEM(EA,2)
    RD[32:47] = 0x0000
    RD[48:63] = MEM(EA,2)

Figure 6-121 shows how bytes are loaded into rD as determined by the endian mode.

    Byte addr              0 1
    Memory                 a b
    GPR in big endian      Z Z a b Z Z a b    (Z = zero)
    GPR in little endian   Z Z b a Z Z b a    (Z = zero)

Figure 6-121. evlhhousplat results in big- and little-endian modes

evlhhousplatx
Vector Load Halfword into Halfword Odd Unsigned and Splat Indexed

evlhhousplatx rD,rA,rB

    Bits    0–5   6–10   11–15   16–20   21–31
    Field   4     RD     RA      RB      011 0000 1100

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + (rB)
    RD[0:15]  = 0x0000
    RD[16:31] = MEM(EA,2)
    RD[32:47] = 0x0000
    RD[48:63] = MEM(EA,2)

Figure 6-122 shows how bytes are loaded into rD as determined by the endian mode.

    Byte addr              0 1
    Memory                 a b
    GPR in big endian      Z Z a b Z Z a b    (Z = zero)
    GPR in little endian   Z Z b a Z Z b a    (Z = zero)

Figure 6-122. evlhhousplatx results in big- and little-endian modes

evlhhossplat
Vector Load Halfword into Halfword Odd Signed and Splat

evlhhossplat rD,d(rA)

    Bits    0–5   6–10   11–15   16–20     21–31
    Field   4     RD     RA      UIMM(1)   011 0000 1111
    (1) d = UIMM<<1

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + EXTZ(UIMM*2)
    RD[0:31]  = EXTS(MEM(EA,2))
    RD[32:63] = EXTS(MEM(EA,2))

Figure 6-123 shows how bytes are loaded into rD as determined by the endian mode.

    Byte addr              0 1
    Memory                 a b
    GPR in big endian      S S a b S S a b    (S = sign)
    GPR in little endian   S S b a S S b a    (S = sign)

Figure 6-123. evlhhossplat results in big- and little-endian modes

In big-endian memory, the msb of a is sign-extended. In little-endian memory, the msb of b is sign-extended.

evlhhossplatx
Vector Load Halfword into Halfword Odd Signed and Splat Indexed

evlhhossplatx rD,rA,rB

    Bits    0–5   6–10   11–15   16–20   21–31
    Field   4     RD     RA      RB      011 0000 1110

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + (rB)
    RD[0:31]  = EXTS(MEM(EA,2))
    RD[32:63] = EXTS(MEM(EA,2))

Figure 6-124 shows how bytes are loaded into rD as determined by the endian mode.

    Byte addr              0 1
    Memory                 a b
    GPR in big endian      S S a b S S a b    (S = sign)
    GPR in little endian   S S b a S S b a    (S = sign)

Figure 6-124. evlhhossplatx results in big- and little-endian modes

In big-endian memory, the msb of a is sign-extended. In little-endian memory, the msb of b is sign-extended.

evstdd
Vector Store Double of Double

evstdd rS,d(rA)

    Bits    0–5   6–10   11–15   16–20     21–31
    Field   4     RS     RA      UIMM(1)   011 0010 0001
    (1) d = UIMM<<3

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + EXTZ(UIMM*8)
    MEM(EA,8) = RS[0:63]

Figure 6-125 shows how bytes are stored in memory as determined by the endian mode.

    GPR                       a b c d e f g h
    Byte addr                 0 1 2 3 4 5 6 7
    Memory in big endian      a b c d e f g h
    Memory in little endian   h g f e d c b a

Figure 6-125. evstdd results in big- and little-endian modes

evstddx
Vector Store Double of Double Indexed

evstddx rS,rA,rB

    Bits    0–5   6–10   11–15   16–20   21–31
    Field   4     RS     RA      RB      011 0010 0000

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + (rB)
    MEM(EA,8) = RS[0:63]

Figure 6-126 shows how bytes are stored in memory as determined by the endian mode.

    GPR                       a b c d e f g h
    Byte addr                 0 1 2 3 4 5 6 7
    Memory in big endian      a b c d e f g h
    Memory in little endian   h g f e d c b a

Figure 6-126. evstddx results in big- and little-endian modes

evstdw
Vector Store Double of Two Words

evstdw rS,d(rA)

    Bits    0–5   6–10   11–15   16–20     21–31
    Field   4     RS     RA      UIMM(1)   011 0010 0011
    (1) d = UIMM<<3

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + EXTZ(UIMM*8)
    MEM(EA,4)   = RS[0:31]
    MEM(EA+4,4) = RS[32:63]

Figure 6-127 shows how bytes are stored in memory as determined by the endian mode.

    GPR                       a b c d e f g h
    Byte addr                 0 1 2 3 4 5 6 7
    Memory in big endian      a b c d e f g h
    Memory in little endian   d c b a h g f e

Figure 6-127. evstdw results in big- and little-endian modes
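The difference between evstdd and evstdw only shows up in little-endian mode, where the byte swap is applied to the whole doubleword or to each word separately. The Python sketch below models that behavior; it is an illustration under assumptions (memory as a `bytearray`, the source register as a 64-bit integer, hypothetical function and parameter names), not a statement about the hardware implementation.

```python
def evstdd(mem, ea, rs, little_endian=False):
    """MEM(EA,8) = RS[0:63]: store the doubleword as a single 8-byte element."""
    order = "little" if little_endian else "big"
    mem[ea:ea + 8] = (rs & 0xFFFFFFFFFFFFFFFF).to_bytes(8, order)

def evstdw(mem, ea, rs, little_endian=False):
    """Store RS[0:31] at EA and RS[32:63] at EA+4 as two 4-byte elements."""
    order = "little" if little_endian else "big"
    upper = (rs >> 32) & 0xFFFFFFFF  # RS[0:31]
    lower = rs & 0xFFFFFFFF          # RS[32:63]
    mem[ea:ea + 4] = upper.to_bytes(4, order)
    mem[ea + 4:ea + 8] = lower.to_bytes(4, order)
```

With register bytes a through h, the little-endian memory images produced by this model are "h g f e d c b a" for evstdd and "d c b a h g f e" for evstdw, as in Figures 6-125 and 6-127.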
evstdwx
Vector Store Double of Two Words Indexed

evstdwx rS,rA,rB

    Bits    0–5   6–10   11–15   16–20   21–31
    Field   4     RS     RA      RB      011 0010 0010

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + (rB)
    MEM(EA,4)   = RS[0:31]
    MEM(EA+4,4) = RS[32:63]

Figure 6-128 shows how bytes are stored in memory as determined by the endian mode.

    GPR                       a b c d e f g h
    Byte addr                 0 1 2 3 4 5 6 7
    Memory in big endian      a b c d e f g h
    Memory in little endian   d c b a h g f e

Figure 6-128. evstdwx results in big- and little-endian modes

evstdh
Vector Store Double of Four Halfwords

evstdh rS,d(rA)

    Bits    0–5   6–10   11–15   16–20     21–31
    Field   4     RS     RA      UIMM(1)   011 0010 0101
    (1) d = UIMM<<3

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + EXTZ(UIMM*8)
    MEM(EA,2)   = RS[0:15]
    MEM(EA+2,2) = RS[16:31]
    MEM(EA+4,2) = RS[32:47]
    MEM(EA+6,2) = RS[48:63]

Figure 6-129 shows how bytes are stored in memory as determined by the endian mode.

    GPR                       a b c d e f g h
    Byte addr                 0 1 2 3 4 5 6 7
    Memory in big endian      a b c d e f g h
    Memory in little endian   b a d c f e h g

Figure 6-129. evstdh results in big- and little-endian modes

evstdhx
Vector Store Double of Four Halfwords Indexed

evstdhx rS,rA,rB

    Bits    0–5   6–10   11–15   16–20   21–31
    Field   4     RS     RA      RB      011 0010 0100

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + (rB)
    MEM(EA,2)   = RS[0:15]
    MEM(EA+2,2) = RS[16:31]
    MEM(EA+4,2) = RS[32:47]
    MEM(EA+6,2) = RS[48:63]

Figure 6-130 shows how bytes are stored in memory as determined by the endian mode.

    GPR                       a b c d e f g h
    Byte addr                 0 1 2 3 4 5 6 7
    Memory in big endian      a b c d e f g h
    Memory in little endian   b a d c f e h g

Figure 6-130. evstdhx results in big- and little-endian modes

evstwwe
Vector Store Word of Word from Even

evstwwe rS,d(rA)

    Bits    0–5   6–10   11–15   16–20     21–31
    Field   4     RS     RA      UIMM(1)   011 0011 1001
    (1) d = UIMM<<2

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + EXTZ(UIMM*4)
    MEM(EA,4) = RS[0:31]

Figure 6-131 shows how bytes are stored in memory as determined by the endian mode.

    GPR                       a b c d e f g h
    Byte addr                 0 1 2 3
    Memory in big endian      a b c d
    Memory in little endian   d c b a

Figure 6-131. evstwwe results in big- and little-endian modes

evstwwex
Vector Store Word of Word from Even Indexed

evstwwex rS,rA,rB

    Bits    0–5   6–10   11–15   16–20   21–31
    Field   4     RS     RA      RB      011 0011 1000

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + (rB)
    MEM(EA,4) = RS[0:31]

Figure 6-132 shows how bytes are stored in memory as determined by the endian mode.

    GPR                       a b c d e f g h
    Byte addr                 0 1 2 3
    Memory in big endian      a b c d
    Memory in little endian   d c b a

Figure 6-132. evstwwex results in big- and little-endian modes

evstwwo
Vector Store Word of Word from Odd

evstwwo rS,d(rA)

    Bits    0–5   6–10   11–15   16–20     21–31
    Field   4     RS     RA      UIMM(1)   011 0011 1101
    (1) d = UIMM<<2

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + EXTZ(UIMM*4)
    MEM(EA,4) = RS[32:63]

Figure 6-133 shows how bytes are stored in memory as determined by the endian mode.

    GPR                       a b c d e f g h
    Byte addr                 0 1 2 3
    Memory in big endian      e f g h
    Memory in little endian   h g f e

Figure 6-133. evstwwo results in big- and little-endian modes

evstwwox
Vector Store Word of Word from Odd Indexed

evstwwox rS,rA,rB

    Bits    0–5   6–10   11–15   16–20   21–31
    Field   4     RS     RA      RB      011 0011 1100

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + (rB)
    MEM(EA,4) = RS[32:63]

Figure 6-134 shows how bytes are stored in memory as determined by the endian mode.

    GPR                       a b c d e f g h
    Byte addr                 0 1 2 3
    Memory in big endian      e f g h
    Memory in little endian   h g f e

Figure 6-134. evstwwox results in big- and little-endian modes

evstwhe
Vector Store Word of Two Halfwords from Even

evstwhe rS,d(rA)

    Bits    0–5   6–10   11–15   16–20     21–31
    Field   4     RS     RA      UIMM(1)   011 0011 0001
    (1) d = UIMM<<2

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + EXTZ(UIMM*4)
    MEM(EA,2)   = RS[0:15]
    MEM(EA+2,2) = RS[32:47]

Figure 6-135 shows how bytes are stored in memory as determined by the endian mode.

    GPR                       a b c d e f g h
    Byte addr                 0 1 2 3
    Memory in big endian      a b e f
    Memory in little endian   b a f e

Figure 6-135. evstwhe results in big- and little-endian modes

evstwhex
Vector Store Word of Two Halfwords from Even Indexed

evstwhex rS,rA,rB

    Bits    0–5   6–10   11–15   16–20   21–31
    Field   4     RS     RA      RB      011 0011 0000

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + (rB)
    MEM(EA,2)   = RS[0:15]
    MEM(EA+2,2) = RS[32:47]

Figure 6-136 shows how bytes are stored in memory as determined by the endian mode.

    GPR                       a b c d e f g h
    Byte addr                 0 1 2 3
    Memory in big endian      a b e f
    Memory in little endian   b a f e

Figure 6-136. evstwhex results in big- and little-endian modes

evstwho
Vector Store Word of Two Halfwords from Odd

evstwho rS,d(rA)

    Bits    0–5   6–10   11–15   16–20     21–31
    Field   4     RS     RA      UIMM(1)   011 0011 0101
    (1) d = UIMM<<2

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + EXTZ(UIMM*4)
    MEM(EA,2)   = RS[16:31]
    MEM(EA+2,2) = RS[48:63]

Figure 6-137 shows how bytes are stored in memory as determined by the endian mode.

    GPR                       a b c d e f g h
    Byte addr                 0 1 2 3
    Memory in big endian      c d g h
    Memory in little endian   d c h g

Figure 6-137. evstwho results in big- and little-endian modes
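As a cross-check of the even and odd halfword selection in evstwhe and evstwho above, here is a small Python model. It is a sketch with assumed names and data structures: the source register is a 64-bit integer with register bit 0 as the most significant bit, memory is a `bytearray`, and the helper `_store_halfwords` is hypothetical.

```python
def _store_halfwords(mem, ea, first, second, little_endian):
    """Write two 16-bit elements at EA and EA+2 in the selected byte order."""
    order = "little" if little_endian else "big"
    mem[ea:ea + 2] = first.to_bytes(2, order)
    mem[ea + 2:ea + 4] = second.to_bytes(2, order)

def evstwhe(mem, ea, rs, little_endian=False):
    """Store the even halfwords RS[0:15] and RS[32:47]."""
    _store_halfwords(mem, ea, (rs >> 48) & 0xFFFF, (rs >> 16) & 0xFFFF,
                     little_endian)

def evstwho(mem, ea, rs, little_endian=False):
    """Store the odd halfwords RS[16:31] and RS[48:63]."""
    _store_halfwords(mem, ea, (rs >> 32) & 0xFFFF, rs & 0xFFFF,
                     little_endian)
```

With register bytes a through h, the model writes "a b e f" (evstwhe) and "c d g h" (evstwho) in big-endian mode, and swaps the bytes within each halfword in little-endian mode.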
evstwhox
Vector Store Word of Two Halfwords from Odd Indexed

evstwhox rS,rA,rB

    Bits    0–5   6–10   11–15   16–20   21–31
    Field   4     RS     RA      RB      011 0011 0100

    if (rA == 0) then b = 0
    else b = (rA)
    EA = b + (rB)
    MEM(EA,2)   = RS[16:31]
    MEM(EA+2,2) = RS[48:63]

Figure 6-138 shows how bytes are stored in memory as determined by the endian mode.

    GPR                       a b c d e f g h
    Byte addr                 0 1 2 3
    Memory in big endian      c d g h
    Memory in little endian   d c h g

Figure 6-138. evstwhox results in big- and little-endian modes

6.6 SPE instruction timing

Instruction timing, in processor clock cycles, for SPE instructions is shown in Table 6-4, Table 6-5, and Table 6-6. Pipelined instructions are shown with their total latency in cycles and their throughput in cycles. Divide instructions are not pipelined and block other instructions from executing during divide execution.

6.6.1 SPE integer simple instructions timing

Instruction timing for SPE integer simple instructions is shown in Table 6-4. The table is sorted by opcode. These instructions are issued as a pair of operations.

Table 6-4. Timing for integer simple instructions

    Instruction    Latency  Throughput  Comments
    brinc          1        1           -
    evabs          1        1           -
    evaddiw        1        1           -
    evaddw         1        1           -
    evand          1        1           -
    evandc         1        1           -
    evcmpeq        1        1           -
    evcmpgts       1        1           -
    evcmpgtu       1        1           -
    evcmplts       1        1           -
    evcmpltu       1        1           -
    evcntlsw       1        1           -
    evcntlzw       1        1           -
    eveqv          1        1           -
    evextsb        1        1           -
    evextsh        1        1           -
    evmergehi      1        1           -
    evmergehilo    1        1           -
    evmergelo      1        1           -
    evmergelohi    1        1           -
    evnand         1        1           -
    evneg          1        1           -
    evnor          1        1           -
    evor           1        1           -
    evorc          1        1           -
    evrlw          1        1           -
    evrlwi         1        1           -
    evrndw         1        1           -
    evsel          1        1           -
    evslw          1        1           -
    evslwi         1        1           -
    evsplatfi      1        1           -
    evsplati       1        1           -
    evsrwis        1        1           -
    evsrwiu        1        1           -
    evsrws         1        1           -
    evsrwu         1        1           -
    evsubfw        1        1           -
    evsubifw       1        1           -
    evxor          1        1           -

6.6.2 SPE load and store instruction timing

Instruction timing for SPE load and store instructions is shown in Table 6-5. The table is sorted by opcode. Actual timing depends on alignment; the table gives the timing for aligned operands.

Table 6-5. SPE load and store instruction timing

    Instruction    Latency  Throughput  Comments
    evldd          3        1           -
    evlddx         3        1           -
    evldh          3        1           -
    evldhx         3        1           -
    evldw          3        1           -
    evldwx         3        1           -
    evlhhesplat    3        1           -
    evlhhesplatx   3        1           -
    evlhhossplat   3        1           -
    evlhhossplatx  3        1           -
    evlhhousplat   3        1           -
    evlhhousplatx  3        1           -
    evlwhe         3        1           -
    evlwhex        3        1           -
    evlwhos        3        1           -
    evlwhosx       3        1           -
    evlwhou        3        1           -
    evlwhoux       3        1           -
    evlwhsplat     3        1           -
    evlwhsplatx    3        1           -
    evlwwsplat     3        1           -
    evlwwsplatx    3        1           -
    evstdd         3        1           -
    evstddx        3        1           -
    evstdh         3        1           -
    evstdhx        3        1           -
    evstdw         3        1           -
    evstdwx        3        1           -
    evstwhe        3        1           -
    evstwhex       3        1           -
    evstwho        3        1           -
    evstwhox       3        1           -
    evstwwe        3        1           -
    evstwwex       3        1           -
    evstwwo        3        1           -
    evstwwox       3        1           -

6.6.3 SPE complex integer instruction timing

Instruction timing for SPE complex integer instructions is shown in Table 6-6. The table is sorted by opcode. For the divide instructions, following instructions stall for (latency) cycles.

Table 6-6. SPE complex integer instruction timing

    Instruction    Latency    Throughput  Comments
    evaddsmiaaw    1          1           -
    evaddssiaaw    1          1           -
    evaddumiaaw    1          1           -
    evaddusiaaw    1          1           -
    evdivws        12–32(1)   12–32(1)    -
    evdivwu        12–32(1)   12–32(1)    -
    evmhegsmfaa    4          1           -
    evmhegsmfan    4          1           -
    evmhegsmiaa    4          1           -
    evmhegsmian    4          1           -
    evmhegumiaa    4          1           -
    evmhegumian    4          1           -
    evmhesmf       4          1           -
    evmhesmfa      4          1           -
    evmhesmfaaw    4          1           -
    evmhesmfanw    4          1           -
    evmhesmi       4          1           -
    evmhesmia      4          1           -
    evmhesmiaaw    4          1           -
    evmhesmianw    4          1           -
    evmhessf       4          1           -
    evmhessfa      4          1           -
    evmhessfaaw    4          1           -
    evmhessfanw    4          1           -
    evmhessiaaw    4          1           -
    evmhessianw    4          1           -
    evmheumi       4          1           -
    evmheumia      4          1           -
    evmheumiaaw    4          1           -
    evmheumianw    4          1           -
    evmheusiaaw    4          1           -
    evmheusianw    4          1           -
    evmhogsmfaa    4          1           -
    evmhogsmfan    4          1           -
    evmhogsmiaa    4          1           -
    evmhogsmian    4          1           -
    evmhogumiaa    4          1           -
    evmhogumian    4          1           -
    evmhosmf       4          1           -
    evmhosmfa      4          1           -
    evmhosmfaaw    4          1           -
    evmhosmfanw    4          1           -
    evmhosmi       4          1           -
    evmhosmia      4          1           -
    evmhosmiaaw    4          1           -
    evmhosmianw    4          1           -
    evmhossf       4          1           -
    evmhossfa      4          1           -
    evmhossfaaw    4          1           -
    evmhossfanw    4          1           -
    evmhossiaaw    4          1           -
    evmhossianw    4          1           -
    evmhoumi       4          1           -
    evmhoumia      4          1           -
    evmhoumiaaw    4          1           -
    evmhoumianw    4          1           -
    evmhousiaaw    4          1           -
    evmhousianw    4          1           -
    evmra          4          1           -
    evmwhsmf       4          1           -
    evmwhsmfa      4          1           -
    evmwhsmi       4          1           -
    evmwhsmia      4          1           -
    evmwhssf       4          1           -
    evmwhssfa      4          1           -
    evmwhumi       4          1           -
    evmwhumia      4          1           -
    evmwlsmiaaw    4          1           -
    evmwlsmianw    4          1           -
    evmwlssiaaw    4          1           -
    evmwlssianw    4          1           -
    evmwlumi       4          1           -
    evmwlumia      4          1           -
    evmwlumiaaw    4          1           -
    evmwlumianw    4          1           -
    evmwlusiaaw    4          1           -
    evmwlusianw    4          1           -
    evmwsmf        4          1           -
    evmwsmfa       4          1           -
    evmwsmfaa      4          1           -
    evmwsmfan      4          1           -
    evmwsmi        4          1           -
    evmwsmia       4          1           -
    evmwsmiaa      4          1           -
    evmwsmian      4          1           -
    evmwssf        4          1           -
    evmwssfa       4          1           -
    evmwssfaa      4          1           -
    evmwssfan      4          1           -
    evmwumi        4          1           -
    evmwumia       4          1           -
    evmwumiaa      4          1           -
    evmwumian      4          1           -
    evsubfsmiaaw   1          1           -
    evsubfssiaaw   1          1           -
    evsubfumiaaw   1          1           -
    evsubfusiaaw   1          1           -
    (1) Timing is data dependent

6.7 Instruction forms and opcodes

Table 6-7 gives the division of the opcode space for the new SPE instructions.

Table 6-7. Opcode space division

    Opcode bits
    0–5   21–25   Instruction class
    4     0100*   SPE APU integer simple instructions
    4     01010   EFPU floating-point instructions
    4     01011   Embedded floating-point APU instructions
    4     01100   SPE APU load/store instructions
    4     01101   SPE APU reserved for future use
    4     0111*   SPE APU reserved for future use
    4     10***   SPE APU integer complex instructions
    4     11***   SPE APU integer complex instructions: reserved for future use

6.7.1 SPE vector integer simple instructions

For instructions that have signed and unsigned forms, bit 31 is 1 for the signed form and 0 for the unsigned form. For instructions that have immediate forms, bit 30 is 1 for immediate forms. All instructions have the destination register specified in bits 6–10, which differs from Power Architecture ISA/Book E, where some instructions have the destination in bits 11–15.

Table 6-8. Opcodes for integer simple instructions

    Instruction   0–5  6–10     11–15  16–20  21–31           Comments
    brinc         4    RD       RA     RB     010 0000 1111   -
    evabs         4    RD       RA     00000  010 0000 1000   -
    evaddiw       4    RD       UIMM   RB     010 0000 0010   -
    evaddw        4    RD       RA     RB     010 0000 0000   -
    evand         4    RD       RA     RB     010 0001 0001   RD = RA & RB
    evandc        4    RD       RA     RB     010 0001 0010   RD = RA & (~RB)
    evcmpeq       4    crfD 00  RA     RB     010 0011 0100   -
    evcmpgts      4    crfD 00  RA     RB     010 0011 0001   -
    evcmpgtu      4    crfD 00  RA     RB     010 0011 0000   -
    evcmplts      4    crfD 00  RA     RB     010 0011 0011   -
    evcmpltu      4    crfD 00  RA     RB     010 0011 0010   -
    evcntlsw      4    RD       RA     00000  010 0000 1110   -
    evcntlzw      4    RD       RA     00000  010 0000 1101   -
    eveqv         4    RD       RA     RB     010 0001 1001   RD = ~(RA XOR RB)
    evextsb       4    RD       RA     00000  010 0000 1010   -
    evextsh       4    RD       RA     00000  010 0000 1011   -
    evmergehi     4    RD       RA     RB     010 0010 1100   -
    evmergehilo   4    RD       RA     RB     010 0010 1110   -
    evmergelo     4    RD       RA     RB     010 0010 1101   -
    evmergelohi   4    RD       RA     RB     010 0010 1111   -
    evnand        4    RD       RA     RB     010 0001 1110   RD = ~(RA & RB)
    evneg         4    RD       RA     00000  010 0000 1001   -
    evnor         4    RD       RA     RB     010 0001 1000   RD = ~(RA | RB)
    evor          4    RD       RA     RB     010 0001 0111   RD = RA | RB
    evorc         4    RD       RA     RB     010 0001 1011   RD = RA | (~RB)
    evrlw         4    RD       RA     RB     010 0010 1000   -
    evrlwi        4    RD       RA     UIMM   010 0010 1010   -
    evrndw        4    RD       RA     00000  010 0000 1100   -
    evsel         4    RD       RA     RB     010 0111 1crfS  crfS is a 3-bit field
    evslw         4    RD       RA     RB     010 0010 0100   -
    evslwi        4    RD       RA     UIMM   010 0010 0110   -
    evsplatfi     4    RD       SIMM   00000  010 0010 1011   -
    evsplati      4    RD       SIMM   00000  010 0010 1001   -
    evsrwis       4    RD       RA     UIMM   010 0010 0011   -
    evsrwiu       4    RD       RA     UIMM   010 0010 0010   -
    evsrws        4    RD       RA     RB     010 0010 0001   -
    evsrwu        4    RD       RA     RB     010 0010 0000   -
    evsubfw       4    RD       RA     RB     010 0000 0100   -
    evsubifw      4    RD       UIMM   RB     010 0000 0110   -
    evxor         4    RD       RA     RB     010 0001 0110   RD = RA XOR RB
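The field layout in Table 6-8 can be exercised with a short Python sketch that packs and unpacks a 32-bit instruction word using the manual's bit numbering (bit 0 is the most significant bit). The helper names `field`, `encode`, and `decode_simple` are assumptions made for this illustration. The `signed_bit` and `immediate_bit` entries simply report bits 31 and 30; as the text above says, they are only meaningful for instructions that actually have signed/unsigned or immediate forms.

```python
def field(word, first, last):
    """Extract bits first..last (inclusive), where bit 0 is the MSB of 32."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)

def encode(rd, ra, rb, opcode):
    """Pack primary opcode 4, three 5-bit fields, and the 11-bit opcode."""
    return (4 << 26) | (rd << 21) | (ra << 16) | (rb << 11) | opcode

def decode_simple(word):
    """Split an instruction word into the columns used by Table 6-8."""
    return {
        "primary": field(word, 0, 5),
        "rd": field(word, 6, 10),
        "ra": field(word, 11, 15),
        "rb": field(word, 16, 20),
        "opcode": field(word, 21, 31),
        "signed_bit": field(word, 31, 31),
        "immediate_bit": field(word, 30, 30),
    }
```

For example, `encode(3, 4, 5, 0b01000010001)` builds the word for evand with RD=3, RA=4, RB=5 according to the table row for evand, and `decode_simple` recovers the same fields.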
6.7.2 Opcodes for SPE load and store instructions

Load instructions have a ‘0’ in bit 26 whereas all stores have a ‘1’ in bit 26. Bits 27 and 28 indicate the size of the data access to memory. Bit 31 indicates whether the index is immediate or the contents of a register. All store instructions have the source of the data register specified in bits 6:10 (RS).

Table 6-9. SPE load and store instruction opcodes

    Instruction    0–5  6–10  11–15  16–20  21–31          Comments
    evldd          4    RD    RA     UIMM   011 0000 0001  -
    evlddx         4    RD    RA     RB     011 0000 0000  -
    evldh          4    RD    RA     UIMM   011 0000 0101  -
    evldhx         4    RD    RA     RB     011 0000 0100  -
    evldw          4    RD    RA     UIMM   011 0000 0011  -
    evldwx         4    RD    RA     RB     011 0000 0010  -
    evlhhesplat    4    RD    RA     UIMM   011 0000 1001  -
    evlhhesplatx   4    RD    RA     RB     011 0000 1000  -
    evlhhossplat   4    RD    RA     UIMM   011 0000 1111  -
    evlhhossplatx  4    RD    RA     RB     011 0000 1110  -
    evlhhousplat   4    RD    RA     UIMM   011 0000 1101  -
    evlhhousplatx  4    RD    RA     RB     011 0000 1100  -
    evlwhe         4    RD    RA     UIMM   011 0001 0001  -
    evlwhex        4    RD    RA     RB     011 0001 0000  -
    evlwhos        4    RD    RA     UIMM   011 0001 0111  -
    evlwhosx       4    RD    RA     RB     011 0001 0110  -
    evlwhou        4    RD    RA     UIMM   011 0001 0101  -
    evlwhoux       4    RD    RA     RB     011 0001 0100  -
    evlwhsplat     4    RD    RA     UIMM   011 0001 1101  -
    evlwhsplatx    4    RD    RA     RB     011 0001 1100  -
    evlwwsplat     4    RD    RA     UIMM   011 0001 1001  -
    evlwwsplatx    4    RD    RA     RB     011 0001 1000  -
    evstdd         4    RS    RA     UIMM   011 0010 0001  -
    evstddx        4    RS    RA     RB     011 0010 0000  -
    evstdh         4    RS    RA     UIMM   011 0010 0101  -
    evstdhx        4    RS    RA     RB     011 0010 0100  -
    evstdw         4    RS    RA     UIMM   011 0010 0011  -
    evstdwx        4    RS    RA     RB     011 0010 0010  -
    evstwhe        4    RS    RA     UIMM   011 0011 0001  -
    evstwhex       4    RS    RA     RB     011 0011 0000  -
    evstwho        4    RS    RA     UIMM   011 0011 0101  -
    evstwhox       4    RS    RA     RB     011 0011 0100  -
    evstwwe        4    RS    RA     UIMM   011 0011 1001  -
    evstwwex       4    RS    RA     RB     011 0011 1000  -
    evstwwo        4    RS    RA     UIMM   011 0011 1101  -
    evstwwox       4    RS    RA     RB     011 0011 1100  -

6.7.3 Opcodes for SPE complex integer instructions

Table 6-10. Opcodes for complex integer instructions, sorted by mnemonic

    Instruction   0–5  6–10  11–15  16–20  21–31
    evaddsmiaaw   4    RD    RA     00000  100 1100 1001
    evaddssiaaw   4    RD    RA     00000  100 1100 0001
    evaddumiaaw   4    RD    RA     00000  100 1100 1000
    evaddusiaaw   4    RD    RA     00000  100 1100 0000
    evdivws       4    RD    RA     RB     100 1100 0110
    evdivwu       4    RD    RA     RB     100 1100 0111
    evmhegsmfaa   4    RD    RA     RB     101 0010 1011
    evmhegsmfan   4    RD    RA     RB     101 1010 1011
    evmhegsmiaa   4    RD    RA     RB     101 0010 1001
    evmhegsmian   4    RD    RA     RB     101 1010 1001
    evmhegumiaa   4    RD    RA     RB     101 0010 1000
    evmhegumian   4    RD    RA     RB     101 1010 1000
    evmhesmf      4    RD    RA     RB     100 0000 1011
    evmhesmfa     4    RD    RA     RB     100 0010 1011
    evmhesmfaaw   4    RD    RA     RB     101 0000 1011
    evmhesmfanw   4    RD    RA     RB     101 1000 1011
    evmhesmi      4    RD    RA     RB     100 0000 1001
    evmhesmia     4    RD    RA     RB     100 0010 1001
    evmhesmiaaw   4    RD    RA     RB     101 0000 1001
    evmhesmianw   4    RD    RA     RB     101 1000 1001
    evmhessf      4    RD    RA     RB     100 0000 0011
    evmhessfa     4    RD    RA     RB     100 0010 0011
    evmhessfaaw   4    RD    RA     RB     101 0000 0011
    evmhessfanw   4    RD    RA     RB     101 1000 0011
    evmhessiaaw   4    RD    RA     RB     101 0000 0001
    evmhessianw   4    RD    RA     RB     101 1000 0001
    evmheumi      4    RD    RA     RB     100 0000 1000
    evmheumia     4    RD    RA     RB     100 0010 1000
    evmheumiaaw   4    RD    RA     RB     101 0000 1000
    evmheumianw   4    RD    RA     RB     101 1000 1000
    evmheusiaaw   4    RD    RA     RB     101 0000 0000
    evmheusianw   4    RD    RA     RB     101 1000 0000
    evmhogsmfaa   4    RD    RA     RB     101 0010 1111
    evmhogsmfan   4    RD    RA     RB     101 1010 1111
    evmhogsmiaa   4    RD    RA     RB     101 0010 1101
    evmhogsmian   4    RD    RA     RB     101 1010 1101
    evmhogumiaa   4    RD    RA     RB     101 0010 1100
    evmhogumian   4    RD    RA     RB     101 1010 1100
    evmhosmf      4    RD    RA     RB     100 0000 1111
    evmhosmfa     4    RD    RA     RB     100 0010 1111
    evmhosmfaaw   4    RD    RA     RB     101 0000 1111
    evmhosmfanw   4    RD    RA     RB     101 1000 1111
    evmhosmi      4    RD    RA     RB     100 0000 1101
    evmhosmia     4    RD    RA     RB     100 0010 1101
    evmhosmiaaw   4    RD    RA     RB     101 0000 1101
    evmhosmianw   4    RD    RA     RB     101 1000 1101
    evmhossf      4    RD    RA     RB     100 0000 0111
    evmhossfa     4    RD    RA     RB     100 0010 0111
    evmhossfaaw   4    RD    RA     RB     101 0000 0111
    evmhossfanw   4    RD    RA     RB     101 1000 0111
    evmhossiaaw   4    RD    RA     RB     101 0000 0101
    evmhossianw   4    RD    RA     RB     101 1000 0101
    evmhoumi      4    RD    RA     RB     100 0000 1100
    evmhoumia     4    RD    RA     RB     100 0010 1100
    evmhoumiaaw   4    RD    RA     RB     101 0000 1100
    evmhoumianw   4    RD    RA     RB     101 1000 1100
    evmhousiaaw   4    RD    RA     RB     101 0000 0100
    evmhousianw   4    RD    RA     RB     101 1000 0100
    evmra         4    RD    RA     00000  100 1100 0100

Table 6-10.
Opcodes for complex integer instructions, sorted by mnemonic (continued) Opcode bits Instruction 0–5 6–10 11–15 16–20 21–31 evmwhsmf 4 RD RA RB 100 0100 1111 evmwhsmfa 4 RD RA RB 100 0110 1111 evmwhsmi 4 RD RA RB 100 0100 1101 evmwhsmia 4 RD RA RB 100 0110 1101 evmwhssf 4 RD RA RB 100 0100 0111 evmwhssfa 4 RD RA RB 100 0110 0111 evmwhumi 4 RD RA RB 100 0100 1100 evmwhumia 4 RD RA RB 100 0110 1100 evmwlsmiaaw 4 RD RA RB 101 0100 1001 evmwlsmianw 4 RD RA RB 101 1100 1001 evmwlssiaaw 4 RD RA RB 101 0100 0001 evmwlssianw 4 RD RA RB 101 1100 0001 evmwlumi 4 RD RA RB 100 0100 1000 evmwlumia 4 RD RA RB 100 0110 1000 evmwlumiaaw 4 RD RA RB 101 0100 1000 evmwlumianw 4 RD RA RB 101 1100 1000 evmwlusiaaw 4 RD RA RB 101 0100 0000 evmwlusianw 4 RD RA RB 101 1100 0000 evmwsmf 4 RD RA RB 100 0101 1011 evmwsmfa 4 RD RA RB 100 0111 1011 evmwsmfaa 4 RD RA RB 101 0101 1011 evmwsmfan 4 RD RA RB 101 1101 1011 evmwsmi 4 RD RA RB 100 0101 1001 evmwsmia 4 RD RA RB 100 0111 1001 evmwsmiaa 4 RD RA RB 101 0101 1001 evmwsmian 4 RD RA RB 101 1101 1001 e200z759n3 Core Reference Manual, Rev. 2 472 Freescale Semiconductor Table 6-10. Opcodes for complex integer instructions, sorted by mnemonic (continued) Opcode bits Instruction 0–5 6–10 11–15 16–20 21–31 evmwssf 4 RD RA RB 100 0101 0011 evmwssfa 4 RD RA RB 100 0111 0011 evmwssfaa 4 RD RA RB 101 0101 0011 evmwssfan 4 RD RA RB 101 1101 0011 evmwumi 4 RD RA RB 100 0101 1000 evmwumia 4 RD RA RB 100 0111 1000 evmwumiaa 4 RD RA RB 101 0101 1000 evmwumian 4 RD RA RB 101 1101 1000 evsubfsmiaaw 4 RD RA 00000 100 1100 1011 evsubfssiaaw 4 RD RA 00000 100 1100 0011 evsubfumiaaw 4 RD RA 00000 100 1100 1010 evsubfusiaaw 4 RD RA 00000 100 1100 0010 Table 6-11. 
Opcodes for complex integer instructions, sorted by opcode Opcode bits Instruction 0–5 6–10 11–15 16–20 21–31 evmhessf 4 RD RA RB 100 0000 0011 evmhossf 4 RD RA RB 100 0000 0111 evmheumi 4 RD RA RB 100 0000 1000 evmhesmi 4 RD RA RB 100 0000 1001 evmhesmf 4 RD RA RB 100 0000 1011 evmhoumi 4 RD RA RB 100 0000 1100 evmhosmi 4 RD RA RB 100 0000 1101 evmhosmf 4 RD RA RB 100 0000 1111 evmhessfa 4 RD RA RB 100 0010 0011 evmhossfa 4 RD RA RB 100 0010 0111 e200z759n3 Core Reference Manual, Rev. 2 Freescale Semiconductor 473 Table 6-11. Opcodes for complex integer instructions, sorted by opcode (continued) Opcode bits Instruction 0–5 6–10 11–15 16–20 21–31 evmheumia 4 RD RA RB 100 0010 1000 evmhesmia 4 RD RA RB 100 0010 1001 evmhesmfa 4 RD RA RB 100 0010 1011 evmhoumia 4 RD RA RB 100 0010 1100 evmhosmia 4 RD RA RB 100 0010 1101 evmhosmfa 4 RD RA RB 100 0010 1111 evmwhssf 4 RD RA RB 100 0100 0111 evmwlumi 4 RD RA RB 100 0100 1000 evmwhumi 4 RD RA RB 100 0100 1100 evmwhsmi 4 RD RA RB 100 0100 1101 evmwhsmf 4 RD RA RB 100 0100 1111 evmwssf 4 RD RA RB 100 0101 0011 evmwumi 4 RD RA RB 100 0101 1000 evmwsmi 4 RD RA RB 100 0101 1001 evmwsmf 4 RD RA RB 100 0101 1011 evmwhssfa 4 RD RA RB 100 0110 0111 evmwlumia 4 RD RA RB 100 0110 1000 evmwhumia 4 RD RA RB 100 0110 1100 evmwhsmia 4 RD RA RB 100 0110 1101 evmwhsmfa 4 RD RA RB 100 0110 1111 evmwssfa 4 RD RA RB 100 0111 0011 evmwumia 4 RD RA RB 100 0111 1000 evmwsmia 4 RD RA RB 100 0111 1001 evmwsmfa 4 RD RA RB 100 0111 1011 evaddusiaaw 4 RD RA 00000 100 1100 0000 evaddssiaaw 4 RD RA 00000 100 1100 0001 e200z759n3 Core Reference Manual, Rev. 2 474 Freescale Semiconductor Table 6-11. 
Opcodes for complex integer instructions, sorted by opcode (continued) Opcode bits Instruction 0–5 6–10 11–15 16–20 21–31 evsubfusiaaw 4 RD RA 00000 100 1100 0010 evsubfssiaaw 4 RD RA 00000 100 1100 0011 evmra 4 RD RA 00000 100 1100 0100 evdivws 4 RD RA RB 100 1100 0110 evdivwu 4 RD RA RB 100 1100 0111 evaddumiaaw 4 RD RA 00000 100 1100 1000 evaddsmiaaw 4 RD RA 00000 100 1100 1001 evsubfumiaaw 4 RD RA 00000 100 1100 1010 evsubfsmiaaw 4 RD RA 00000 100 1100 1011 evmheusiaaw 4 RD RA RB 101 0000 0000 evmhessiaaw 4 RD RA RB 101 0000 0001 evmhessfaaw 4 RD RA RB 101 0000 0011 evmhousiaaw 4 RD RA RB 101 0000 0100 evmhossiaaw 4 RD RA RB 101 0000 0101 evmhossfaaw 4 RD RA RB 101 0000 0111 evmheumiaaw 4 RD RA RB 101 0000 1000 evmhesmiaaw 4 RD RA RB 101 0000 1001 evmhesmfaaw 4 RD RA RB 101 0000 1011 evmhoumiaaw 4 RD RA RB 101 0000 1100 evmhosmiaaw 4 RD RA RB 101 0000 1101 evmhosmfaaw 4 RD RA RB 101 0000 1111 evmhegumiaa 4 RD RA RB 101 0010 1000 evmhegsmiaa 4 RD RA RB 101 0010 1001 evmhegsmfaa 4 RD RA RB 101 0010 1011 evmhogumiaa 4 RD RA RB 101 0010 1100 evmhogsmiaa 4 RD RA RB 101 0010 1101 e200z759n3 Core Reference Manual, Rev. 2 Freescale Semiconductor 475 Table 6-11. 
Opcodes for complex integer instructions, sorted by opcode (continued) Opcode bits Instruction 0–5 6–10 11–15 16–20 21–31 evmhogsmfaa 4 RD RA RB 101 0010 1111 evmwlusiaaw 4 RD RA RB 101 0100 0000 evmwlssiaaw 4 RD RA RB 101 0100 0001 evmwlumiaaw 4 RD RA RB 101 0100 1000 evmwlsmiaaw 4 RD RA RB 101 0100 1001 evmwssfaa 4 RD RA RB 101 0101 0011 evmwumiaa 4 RD RA RB 101 0101 1000 evmwsmiaa 4 RD RA RB 101 0101 1001 evmwsmfaa 4 RD RA RB 101 0101 1011 evmheusianw 4 RD RA RB 101 1000 0000 evmhessianw 4 RD RA RB 101 1000 0001 evmhessfanw 4 RD RA RB 101 1000 0011 evmhousianw 4 RD RA RB 101 1000 0100 evmhossianw 4 RD RA RB 101 1000 0101 evmhossfanw 4 RD RA RB 101 1000 0111 evmheumianw 4 RD RA RB 101 1000 1000 evmhesmianw 4 RD RA RB 101 1000 1001 evmhesmfanw 4 RD RA RB 101 1000 1011 evmhoumianw 4 RD RA RB 101 1000 1100 evmhosmianw 4 RD RA RB 101 1000 1101 evmhosmfanw 4 RD RA RB 101 1000 1111 evmhegumian 4 RD RA RB 101 1010 1000 evmhegsmian 4 RD RA RB 101 1010 1001 evmhegsmfan 4 RD RA RB 101 1010 1011 evmhogumian 4 RD RA RB 101 1010 1100 evmhogsmian 4 RD RA RB 101 1010 1101 e200z759n3 Core Reference Manual, Rev. 2 476 Freescale Semiconductor Table 6-11. Opcodes for complex integer instructions, sorted by opcode (continued) Opcode bits Instruction 0–5 6–10 11–15 16–20 21–31 evmhogsmfan 4 RD RA RB 101 1010 1111 evmwlusianw 4 RD RA RB 101 1100 0000 evmwlssianw 4 RD RA RB 101 1100 0001 evmwlumianw 4 RD RA RB 101 1100 1000 evmwlsmianw 4 RD RA RB 101 1100 1001 evmwssfan 4 RD RA RB 101 1101 0011 evmwumian 4 RD RA RB 101 1101 1000 evmwsmian 4 RD RA RB 101 1101 1001 evmwsmfan 4 RD RA RB 101 1101 1011 e200z759n3 Core Reference Manual, Rev. 2 Freescale Semiconductor 477 e200z759n3 Core Reference Manual, Rev. 2 478 Freescale Semiconductor Chapter 7 Interrupts and Exceptions The PowerISA 2.06 document defines the mechanisms by which the e200z759n3 core implements interrupts and exceptions. 
The document uses the term interrupt for the action in which the processor saves its old context and begins execution at a predetermined interrupt handler address. Exceptions are events that, when enabled, cause the processor to take an interrupt. This chapter uses the same terminology.

The Power Architecture exception mechanism allows the processor to change to supervisor state as a result of unusual conditions arising in the execution of instructions, and from external signals, bus errors, or various internal conditions. When interrupts occur, information about the state of the processor is saved to machine state save/restore registers (SRR0/SRR1, CSRR0/CSRR1, DSRR0/DSRR1, or MCSRR0/MCSRR1) and the processor begins execution at an address (interrupt vector) determined by the Interrupt Vector Prefix register (IVPR) and one of the Interrupt Vector Offset registers (IVOR). Processing of instructions within the interrupt handler begins in supervisor mode.

Multiple exception conditions can map to a single interrupt vector, and may be distinguished by examining registers associated with the interrupt. The Exception Syndrome register (ESR) is updated with information specific to the exception type when an interrupt occurs.

To prevent loss of state information, interrupt handlers must save the information stored in the machine state save/restore registers soon after the interrupt has been taken. Four sets of these registers are implemented: SRR0 and SRR1 for non-critical interrupts, CSRR0 and CSRR1 for critical interrupts, DSRR0 and DSRR1 for debug interrupts (when the Debug APU is enabled), and MCSRR0 and MCSRR1 for machine check interrupts. Hardware supports nesting of critical interrupts within non-critical interrupts, machine check interrupts within both critical and non-critical interrupts, and debug interrupts within critical, non-critical, and machine check interrupts.
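The four save/restore register pairs and the hardware nesting rules described here can be summarized in a small lookup, sketched below in C. This is an illustration only: the type and function names are hypothetical, and encoding the nesting rules as an ordering of the enum values is an assumption made for compactness, not something the manual defines.

```c
#include <stdbool.h>

/* Interrupt classes, ordered so that a class may nest (in hardware) inside
 * any class with a smaller value. The ordering is an illustrative encoding
 * of the nesting rules in the text, not an architectural definition. */
typedef enum {
    IRQ_CLASS_NONCRITICAL = 0,
    IRQ_CLASS_CRITICAL,
    IRQ_CLASS_MACHINE_CHECK,
    IRQ_CLASS_DEBUG            /* uses DSRR0/DSRR1 only when the Debug APU is enabled */
} irq_class_t;

/* Names of the save/restore registers used by one interrupt class. */
typedef struct {
    const char *addr_reg;  /* receives the return address  */
    const char *msr_reg;   /* receives the MSR contents    */
} save_restore_pair_t;

/* Return the save/restore register pair for an interrupt class. */
static inline save_restore_pair_t save_restore_pair(irq_class_t c)
{
    switch (c) {
    case IRQ_CLASS_CRITICAL:      return (save_restore_pair_t){ "CSRR0",  "CSRR1"  };
    case IRQ_CLASS_MACHINE_CHECK: return (save_restore_pair_t){ "MCSRR0", "MCSRR1" };
    case IRQ_CLASS_DEBUG:         return (save_restore_pair_t){ "DSRR0",  "DSRR1"  };
    case IRQ_CLASS_NONCRITICAL:
    default:                      return (save_restore_pair_t){ "SRR0",   "SRR1"   };
    }
}

/* True when hardware keeps the outer handler's saved state intact because
 * the inner interrupt uses a different register pair, per the nesting rules
 * above. Same-class nesting returns false: software must save state first. */
static inline bool hw_supports_nesting(irq_class_t inner, irq_class_t outer)
{
    return inner > outer;
}
```

For example, `hw_supports_nesting(IRQ_CLASS_CRITICAL, IRQ_CLASS_NONCRITICAL)` is true, while nesting a non-critical interrupt inside a critical handler is not covered by the hardware rules and returns false.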
It is up to the interrupt handler to save necessary state information if interrupts of a given class are re-enabled within the handler.

The following terms are used to describe the stages of exception processing:

Recognition
    Exception recognition occurs when the condition that can cause an exception is identified by the processor. This is also referred to as an exception event.
Taken
    An interrupt is said to be taken when control of instruction execution is passed to the interrupt handler; that is, the context is saved, the instruction at the appropriate vector offset is fetched, and the interrupt handler routine begins.
Handling
    Interrupt handling is performed by the software linked to the appropriate vector offset. Interrupt handling is begun in supervisor mode.

Returning from an interrupt is performed by executing an rfi, rfci, rfdi, or rfmci instruction, or the corresponding se_rfi, se_rfci, se_rfdi, or se_rfmci VLE instruction, to restore state information from the respective machine state save/restore register pair.

7.1 e200z759n3 interrupts

As specified by the PowerISA 2.06 architecture, interrupts can be either precise or imprecise, synchronous or asynchronous, and critical or non-critical. Asynchronous exceptions are caused by events external to the processor’s instruction execution; synchronous exceptions are directly caused by instructions or an event somehow synchronous to the program flow, such as a context switch. A precise interrupt architecturally guarantees that no instruction beyond the instruction causing the exception has (visibly) executed. Critical interrupts are provided with a separate save/restore register pair (CSRR0/CSRR1) to allow certain critical exceptions to be handled within a non-critical interrupt handler.
Machine check interrupts are also provided with a separate save/restore register pair (MCSRR0/MCSRR1) to allow machine check exceptions to be handled within a non-critical or critical interrupt handler.

The types of interrupts handled are shown in Table 7-1. Refer to Chapter 7 of Book E: Enhanced PowerPC™ Architecture v0.99 for exact details of each interrupt type.

Table 7-1. Interrupt classifications

Interrupt types | Synchronous/asynchronous | Precise/imprecise | Critical/non-critical/debug/machine check
System Reset | Asynchronous, non-maskable | Imprecise | -
Machine Check | - | - | Machine Check
Non-Maskable Input interrupt | Asynchronous, non-maskable | Imprecise | Machine Check
Critical Input interrupt; Watchdog Timer interrupt | Asynchronous, maskable | Imprecise | Critical
External Input Interrupt; Fixed-Interval Timer interrupt; Decrementer interrupt | Asynchronous, maskable | Imprecise | Non-critical
Performance Monitor interrupts | Synchronous/Asynchronous, maskable | Imprecise | Non-critical
Instruction-based Debug interrupts | Synchronous | Precise | Critical / Debug
Debug Interrupt (UDE); Debug Imprecise interrupt | Asynchronous | Imprecise | Critical / Debug
Data Storage / Alignment / TLB interrupts; Instruction Storage / TLB interrupts | Synchronous | Precise | Non-critical

These classifications are discussed in greater detail in Section 7.7, Interrupt definitions. Interrupts implemented in e200z759n3 and the exception conditions that cause them are listed in Table 7-2.

Table 7-2. Exceptions and conditions

Interrupt type | Interrupt vector offset register | Causing conditions
System reset | none, vector to [p_rstbase[0:29]] || 2’b00 | Reset by assertion of p_reset_b.
Critical Input | IVOR 0¹ | p_critint_b is asserted and MSR[CE]=1.
Machine check | IVOR 1 | • p_mcp_b transitions from negated to asserted • ISI, ITLB Error on first instruction fetch for an exception handler • Parity Error signaled on cache access • External bus error
Machine check (NMI) | IVOR 1 | p_nmi_b transitions from negated to asserted.
Data Storage | IVOR 2 | • Access control. • Byte ordering due to misaligned access across page boundary to pages with mismatched E bits • Cache locking exception
Instruction Storage | IVOR 3 | • Access control. • Byte ordering due to misaligned instruction across page boundary to pages with mismatched VLE bits, or access to page with VLE set, and E indicating little-endian. • Misaligned Instruction fetch due to a change of flow to an odd halfword instruction boundary on a BookE (non-VLE) instruction page
External Input | IVOR 4¹ | p_extint_b is asserted and MSR[EE]=1.
Alignment | IVOR 5 | • lmw, stmw not word aligned • lwarx or stwcx. not word aligned, lharx or sthcx. not halfword aligned • dcbz with disabled cache, or to W or I storage • SPE ld and st instructions not properly aligned
Program | IVOR 6 | Illegal, Privileged, Trap, AP enabled.
Floating-point unavailable | IVOR 7 | Unused by e200z759n3.
System call | IVOR 8 | Execution of the System Call (sc, se_sc) instruction
AP unavailable | IVOR 9 | Unused by e200z759n3
Decrementer | IVOR 10 | As specified in Book E: Enhanced PowerPC™ Architecture v0.99, Ch. 8, pg. 190-191
Fixed Interval Timer | IVOR 11 | As specified in Book E: Enhanced PowerPC™ Architecture v0.99, Ch. 8, pg. 191-192
Watchdog Timer | IVOR 12 | As specified in Book E: Enhanced PowerPC™ Architecture v0.99, Ch. 8, pg. 192-194
Data TLB Error | IVOR 13 | Data translation lookup did not match a valid entry in the TLB
Instruction TLB Error | IVOR 14 | Instruction translation lookup did not match a valid entry in the TLB
Debug | IVOR 15 | Trap, Instruction Address Compare, Data Address Compare, Instruction Complete, Branch Taken, Return from Interrupt, Interrupt Taken, Debug Counter, External Debug Event, Unconditional Debug Event
Reserved | IVOR 16-31 | -
SPE/EFPU Unavailable Exception | IVOR 32 | See Section 6.2.6.1, SPE APU Unavailable exception, and Section 5.2.5.1, EFPU unavailable exception
EFPU Data Exception | IVOR 33 | See Section 5.2.5.2, Embedded floating-point data exception
EFPU Round Exception | IVOR 34 | See Section 5.2.5.3, Embedded floating-point round exception
Performance Monitor | IVOR 35 | Performance Monitor Enabled Condition or Event

¹ Autovectored External and Critical Input interrupts use this IVOR. Vectored interrupts supply an interrupt vector offset directly.

7.2 Exception Syndrome Register (ESR)

The Exception Syndrome Register (ESR) provides a syndrome to differentiate between exceptions that can generate the same interrupt type. e200z759n3 adds some implementation specific bits to this register, as seen in Figure 7-1.

Figure 7-1. Exception Syndrome Register (ESR)
SPR - 62; Read/Write; Reset - 0x0

The ESR bits are defined in Table 7-3.

Table 7-3. ESR field descriptions

Bits | Name | Description | Associated interrupt type
0:3 (32:35) | - | Allocated¹ | -
4 (36) | PIL | Illegal Instruction exception (For e200z759n3, PIL used for all illegal/unimplemented instructions) | Program
5 (37) | PPR | Privileged Instruction exception | Program
6 (38) | PTR | Trap exception | Program
7 (39) | FP | Floating-point operation | Alignment (not on Zen); Data Storage (not on Zen); Data TLB (not on Zen); Program
8 (40) | ST | Store operation | Alignment; Data Storage; Data TLB
9 (41) | - | Reserved² | -
10 (42) | DLK | Data Cache Locking | Data Storage
11 (43) | ILK | Instruction Cache Locking | Data Storage
12 (44) | AP | Auxiliary Processor operation (Not used by Zen) | Alignment (not on Zen); Data Storage (not on Zen); Data TLB (not on Zen); Program (not on Zen)
13 (45) | PUO | Unimplemented Operation exception (Not used by e200z759n3, PIL used for all illegal/unimplemented instructions) | Program
14 (46) | BO | Byte Ordering exception; Mismatched Instruction Storage exception | Data Storage; Instruction Storage
15 (47) | PIE | Program Imprecise exception (Reserved) | Currently unused by Zen
16:23 (48:55) | - | Reserved² | -
24 (56) | SPE | SPE/EFPU APU Operation | SPE/EFPU Unavailable; EFPU Floating-point Data Exception; EFPU Floating-point Round Exception; Alignment; Data Storage; Data TLB
25 (57) | - | Allocated¹ | -
26 (58) | VLEMI | VLE Mode Instruction | SPE/EFPU Unavailable; EFPU Floating-point Data Exception; EFPU Floating-point Round Exception; Data Storage; Data TLB; Instruction Storage; Alignment; Program; System Call
27:29 (59:61) | - | Allocated¹ | -
30 (62) | MIF | Misaligned Instruction Fetch | Instruction Storage; Instruction TLB
31 (63) | - | Allocated¹ | -

¹ These bits are not implemented and should be written with zero for future compatibility.
² These bits are not implemented, and should be written with zero for future compatibility.

7.3 Machine State Register (MSR)

The Machine State Register defines the state of the processor. The e200z759n3 MSR is shown in Figure 7-2.
Figure 7-2. Machine State Register (MSR)
Read/Write; Reset - 0x0

The MSR bits are defined in Table 7-4.

Table 7-4. MSR field descriptions

Bits | Name | Description
0:4 (32:36) | - | Reserved¹
5 (37) | UCLE | User Cache Lock Enable. 0: Execution of the cache locking instructions in user mode (MSR[PR]=1) disabled; DSI exception taken instead, and ILK or DLK set in ESR. 1: Execution of the cache lock instructions in user mode enabled.
6 (38) | SPE | SPE/EFPU Available. 0: Execution of SPE and EFPU APU vector instructions is disabled; SPE/EFPU Unavailable exception taken instead, and SPE bit is set in ESR. 1: Execution of SPE and EFPU APU vector instructions is enabled.
7:12 (39:44) | - | Reserved¹
13 (45) | WE | Wait State (Power management) enable. This bit is defined as optional in the PowerISA 2.06 architecture. 0: Power management is disabled. 1: Power management is enabled. The processor can enter a power-saving mode when additional conditions are present. The mode chosen is determined by the DOZE, NAP, and SLEEP bits in the HID0 register, described in Section 2.4.11, Hardware Implementation Dependent Register 0 (HID0).
14 (46) | CE | Critical Interrupt Enable. 0: Critical Input and Watchdog Timer interrupts are disabled. 1: Critical Input and Watchdog Timer interrupts are enabled.
15 (47) | - | Reserved¹
16 (48) | EE | External Interrupt Enable. 0: External Input, Decrementer, and Fixed-Interval Timer interrupts are disabled. 1: External Input, Decrementer, and Fixed-Interval Timer interrupts are enabled.
17 (49) | PR | Problem State. 0: The processor is in supervisor mode, can execute any instruction, and can access any resource (e.g. GPRs, SPRs, MSR, etc.). 1: The processor is in user mode, cannot execute any privileged instruction, and cannot access any privileged resource.
18 (50) | FP | Floating-Point Available. 0: Floating point unit is unavailable. The processor cannot execute floating-point instructions, including floating-point loads, stores, and moves. 1: Floating-point unit is available. The processor can execute floating-point instructions. Note that for e200z759n3, the floating point unit is not supported in hardware, and an Illegal Instruction exception will be generated for attempted execution of PowerISA 2.06 floating point instructions regardless of the setting of FP. FP is ignored, but cleared on exceptions.
19 (51) | ME | Machine Check Enable. 0: Asynchronous Machine Check interrupts are disabled. 1: Asynchronous Machine Check interrupts are enabled.
20 (52) | FE0 | Floating-point exception mode 0 (not used by Zen)
21 (53) | - | Reserved¹
22 (54) | DE | Debug Interrupt Enable. 0: Debug interrupts are disabled. 1: Debug interrupts are enabled.
23 (55) | FE1 | Floating-point exception mode 1 (not used by Zen)
24 (56) | - | Reserved¹
25 (57) | - | Preserved¹
26 (58) | IS | Instruction Address Space. 0: The processor directs all instruction fetches to address space 0 (TS=0 in the relevant TLB entry). 1: The processor directs all instruction fetches to address space 1 (TS=1 in the relevant TLB entry).
27 (59) | DS | Data Address Space. 0: The processor directs all data storage accesses to address space 0 (TS=0 in the relevant TLB entry). 1: The processor directs all data storage accesses to address space 1 (TS=1 in the relevant TLB entry).
28 (60) | - | Reserved¹
29 (61) | PMM | PMM Performance monitor mark bit. System software can set PMM when a marked process is running to enable statistics to be gathered only during the execution of the marked process. MSR[PR] and MSR[PMM] together define a state that the processor (supervisor or user) and the process (marked or unmarked) may be in at any time. If this state matches an individual state specified in the Performance Monitor registers PMLCan, the state for which monitoring is enabled, counting is enabled.
30 (62) | RI | Recoverable Interrupt. This bit is provided for software use to detect nested exception conditions. This bit is cleared by hardware when a Machine Check interrupt is taken.
31 (63) | - | Preserved¹

¹ These bits are not implemented, will be read as zero, and writes are ignored.

7.3.1 Machine Check Syndrome Register (MCSR)

When the processor takes a machine check interrupt, it updates the Machine Check Syndrome register (MCSR) to differentiate between machine check conditions. The MCSR is shown in Figure 7-3.

Figure 7-3. Machine Check Syndrome Register (MCSR)
SPR - 572; Read/Clear; Reset - 0x0

Table 7-5 describes MCSR fields. The MCSR indicates the source of a machine check condition. When an “Async Mchk” or “Error Report” syndrome bit in the MCSR is set, the core complex asserts p_mcp_out for system information. All bits in the MCSR are implemented as “write ‘1’ to clear”. Software in the machine check handler is expected to clear the MCSR bits it has sampled prior to re-enabling MSR[ME] to avoid a redundant machine check exception and to prepare for updated status bit information on the next machine check interrupt. Hardware will not clear a bit in the MCSR other than at reset. Software will typically sample MCSR early in the machine check handler, and will use the sampled value to clear those bits that were set at the time of sampling.
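The sample-then-clear sequence for a write-1-to-clear register can be sketched on a host machine with a small software model. The sketch below is illustrative only: `mcsr_model_t` and the `mcsr_*` helpers are hypothetical stand-ins for the real SPR 572 accesses, and only the bit positions are taken from the MCSR field descriptions. It shows that writing back the sampled value clears exactly the sampled bits, so a bit that hardware sets after sampling stays set.

```c
#include <stdint.h>

/* Mask for bit n in the manual's numbering, where bit 0 is the most
 * significant bit of the 32-bit register. */
#define MCSR_BIT(n)     (UINT32_C(1) << (31 - (n)))
#define MCSR_MCP        MCSR_BIT(0)   /* machine check input pin        */
#define MCSR_BUS_WRERR  MCSR_BIT(29)  /* write bus error on store/push  */

/* Host-side model of the MCSR (hypothetical; real code would use
 * mfspr/mtspr on SPR 572 instead of this struct). */
typedef struct { uint32_t bits; } mcsr_model_t;

/* Hardware side of the model: an error is detected, its bit is set. */
static inline void mcsr_hw_set(mcsr_model_t *m, uint32_t mask)
{
    m->bits |= mask;
}

/* Software read of the register. */
static inline uint32_t mcsr_read(const mcsr_model_t *m)
{
    return m->bits;
}

/* Software write with write-1-to-clear semantics: every 1 in `value`
 * clears the corresponding bit; 0 bits leave the register unchanged. */
static inline void mcsr_write(mcsr_model_t *m, uint32_t value)
{
    m->bits &= ~value;
}
```

A handler modeled this way would call `mcsr_read()` once, decode the sampled value, and pass that same value to `mcsr_write()`; writing all ones instead would also discard any condition logged in the meantime.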
Note that additional bits may become set during the handler after sampling if an asynchronous event occurs. By writing back only the originally sampled bits, another machine check can be generated to process the new conditions after the original handler re-enables MSR[ME], either explicitly or by restoring the MSR from MCSRR1 at the return. Note that any set bit in the MCSR other than status-type bits will cause a subsequent machine check interrupt once MSR[ME]=1.

Table 7-5. MCSR field descriptions

Bit | Name | Description | Exception Type¹ | Recoverable
0 (32) | MCP | Machine check input pin | Async Mchk | Maybe
1 (33) | IC_DPERR | Instruction Cache data array parity error | Async Mchk | Precise
2 (34) | CP_PERR | Data Cache push parity error | Async Mchk | Unlikely
3 (35) | DC_DPERR | Data Cache data array parity error | Async Mchk | Maybe
4 (36) | EXCP_ERR | ISI, ITLB, or Bus Error on first instruction fetch for an exception handler | Async Mchk | Precise
5 (37) | IC_TPERR | Instruction Cache Tag parity error | Async Mchk | Precise
6 (38) | DC_TPERR | Data Cache Tag parity error | Async Mchk | Maybe
7 (39) | IC_LKERR | Instruction Cache Lock error. Indicates a cache control operation or invalidation operation invalidated one or more locked lines in the ICache or encountered an uncorrectable lock error, or that an ICache miss with an uncorrectable lock error occurred. May also be set on locked line refill error. | Status | -
8 (40) | DC_LKERR | Data Cache Lock error. Indicates a cache control operation or invalidation operation invalidated one or more locked lines in the DCache or encountered an uncorrectable lock error, or that an ICache miss with an uncorrectable lock error occurred. May also be set on locked line refill error. | Status | -
9:10 (41:42) | - | Reserved, should be cleared. | - | -
11 (43) | NMI | NMI input pin | NMI | -
12 (44) | MAV | MCAR Address Valid. Indicates that the address contained in the MCAR was updated by hardware to correspond to the first detected Async Mchk error condition. | Status | -
13 (45) | MEA | MCAR holds Effective Address. If MAV=1, MEA=1 indicates that the MCAR contains an effective address and MEA=0 indicates that the MCAR contains a physical address. | Status | -
14 (46) | - | Reserved, should be cleared. | - | -
15 (47) | IF | Instruction Fetch Error Report. An error occurred during the attempt to fetch an instruction. This could be due to a parity error, or an external bus error. MCSRR0 contains the instruction address. | Error Report | Precise
16 (48) | LD | Load type instruction Error Report. An error occurred during the attempt to execute the load type instruction located at the address stored in MCSRR0. This could be due to a parity error or an external bus error. | Error Report | Precise
17 (49) | ST | Store type instruction Error Report. An error occurred during the attempt to execute the store type instruction located at the address stored in MCSRR0. This could be due to a parity error, or on certain external bus errors. | Error Report | Precise
18 (50) | G | Guarded instruction Error Report. An error occurred during the attempt to execute the load or store type instruction located at the address stored in MCSRR0 and the access was guarded and encountered an error on the external bus. | Error Report | Precise
19:25 (51:57) | - | Reserved, should be cleared. | - | -
26 (58) | SNPERR | Snoop Lookup Error. An error occurred during certain snoop operations. This is typically due to a data cache tag parity error, in which case DC_TPERR will also be set. | Async Mchk | Unlikely?
27 (59) | BUS_IRERR | Read bus error on Instruction fetch or linefill | Async Mchk | Precise if data used
28 (60) | BUS_DRERR | Read bus error on data load or linefill | Async Mchk | Precise if data used
29 (61) | BUS_WRERR | Write bus error on store or cache line push | Async Mchk | Unlikely
30:31 (62:63) | - | Reserved, should be cleared. | - | -

¹ The Exception Type indicates the exception type associated with a given syndrome bit:
- “Error Report” indicates that this bit is only set for error report exceptions that cause machine check interrupts. These bits are only updated when the machine check interrupt is actually taken. Error report exceptions are not gated by MSR[ME]. These are synchronous exceptions. These bits will remain set until cleared by software writing a “1” to the bit position(s) to be cleared.
- “Status” indicates that this bit provides additional status information regarding the logging of a machine check exception. These bits will remain set until cleared by software writing a “1” to the bit position(s) to be cleared.
- “NMI” indicates that this bit is only set for the non-maskable interrupt type exception that causes a machine check interrupt. This bit is only updated when the machine check interrupt is actually taken. NMI exceptions are not gated by MSR[ME]. This is an asynchronous exception. This bit will remain set until cleared by software writing a “1” to the bit position.
- “Async Mchk” indicates that this bit is set for an asynchronous machine check exception. These bits are set immediately upon detection of the error. Once any “Async Mchk” bit is set in the MCSR, a machine check interrupt will occur if MSR[ME]=1. If MSR[ME]=0, the machine check exception will remain pending. These bits will remain set until cleared by software writing a “1” to the bit position(s) to be cleared.

7.4 Interrupt Vector Prefix Registers (IVPR)

The Interrupt Vector Prefix Register is used during interrupt processing for determining the starting address of a software handler used to handle an interrupt.
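As a rough illustration of how a handler address is formed, the sketch below concatenates the IVPR Vector Base (bits 0:15) with an IVOR Vector Offset (bits 16:27) and four low-order zero bits, following the field positions given in the IVPR and IVOR field descriptions. The function and macro names are hypothetical, and the second helper assumes the 12-bit p_voffset[0:11] value is passed right-justified in an integer; neither is an API defined by the manual.

```c
#include <stdint.h>

/* Field masks in conventional (LSB = bit 31) terms. */
#define IVPR_VECTOR_BASE_MASK    UINT32_C(0xFFFF0000)  /* IVPR bits 0:15  */
#define IVOR_VECTOR_OFFSET_MASK  UINT32_C(0x0000FFF0)  /* IVOR bits 16:27 */

/* Autovectored case: IVPR[0:15] || IVORxx[16:27] || 0b0000.
 * Reserved bits of either register are masked off before combining. */
static inline uint32_t handler_address(uint32_t ivpr, uint32_t ivor)
{
    return (ivpr & IVPR_VECTOR_BASE_MASK) | (ivor & IVOR_VECTOR_OFFSET_MASK);
}

/* Non-autovectored case: IVPR[0:15] || p_voffset[0:11] || 0b0000.
 * `voffset12` holds p_voffset[0:11] right-justified (an assumption of
 * this sketch); it is placed in address bits 16:27. */
static inline uint32_t vectored_handler_address(uint32_t ivpr, uint16_t voffset12)
{
    return (ivpr & IVPR_VECTOR_BASE_MASK) |
           ((uint32_t)(voffset12 & 0x0FFFu) << 4);
}
```

With IVPR = 0x40000000 and an IVOR value of 0x00000100, `handler_address()` yields 0x40000100; because the low four bits are always zero, handlers start on 16-byte boundaries within the 64 KB region selected by the Vector Base.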
The value contained in the Vector Offset field of the IVOR selected for a particular interrupt type is concatenated with the Vector Base value held in the Interrupt Vector Prefix register (IVPR) to form an instruction address from which execution is to begin. The format of IVPR is shown in Figure 7-4. Vector Base 0 1 2 3 4 5 6 7 8 0 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 SPR - 63; Read/Write Figure 7-4. e200z759n3 Interrupt Vector Prefix Register (IVPR) The IVPR fields are defined in Table 7-6. e200z759n3 Core Reference Manual, Rev. 2 Freescale Semiconductor 489 Table 7-6. IVPR field descriptions Bits Name 0:15 (32:47) Vec Base Vector Base This field is used to define the base location of the vector table, aligned to a 64 KB boundary. This field provides the high-order 16 bits of the location of all interrupt handlers. The contents of the IVORxx register appropriate for the type of exception being processed are concatenated with the IVPR Vector Base to form the address of the handler in memory. 16:31 (48:63) 1 Description Reserved1 — These bits are not implemented, will be read as zero, and writes are ignored. 7.5 Interrupt Vector Offset Registers (IVORxx) The Interrupt Vector Offset Registers are used during interrupt processing for determining the starting address of a software handler used to handle an interrupt. The value contained in the Vector Offset field of the IVOR selected for a particular interrupt type is concatenated with the value held in the Interrupt Vector Prefix register (IVPR) to form an instruction address from which execution is to begin. The format of a e200z759n3 IVOR is shown in Figure 7-5. 0 0 1 2 3 4 5 6 7 Vector Offset 8 0 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 SPR - 400-415, 528-530; Read/Write Figure 7-5. e200z759n3 Interrupt Vector Offset Register (IVOR) The IVOR fields are defined in Table 7-7. Table 7-7. 
IVOR field descriptions 1 7.6 Bits Name 0:15 (32:47) — 16:27 (48:59) Vector Offset 28:31 (60:63) — Description Reserved1 Vector Offset This field is used to provide a quadword index from the base address provided by the IVPR to locate an interrupt handler. Reserved1 These bits are not implemented, will be read as zero, and writes are ignored. Hardware Interrupt Vector Offset Values (p_voffset[0:15]) The p_voffset[0:15] input signals provide a hardware vector offset to be used when exception processing begins for an incoming interrupt request. These signals are sampled along with the p_extint_b and p_critint_b interrupt request inputs, and must be driven to a valid value when either of these signals is e200z759n3 Core Reference Manual, Rev. 2 490 Freescale Semiconductor asserted unless the p_avec_b signal is also asserted. If p_avec_b is asserted, these inputs are not used. p_voffset[0:11] are used in forming the exception handler address, and p_voffset[12:15] are reserved and should be driven low. 7.7 Interrupt definitions 7.7.1 Critical Input interrupt (IVOR0) A Critical Input exception is signaled to the processor by the assertion of the critical interrupt pin (p_critint_b). When e200z759n3 detects the exception, if the exception is enabled by MSRCE, e200z759n3 takes the Critical Input interrupt. The p_critint_b input is a level-sensitive signal expected to remain asserted until e200z759n3 acknowledges the interrupt. If p_critint_b is negated early, recognition of the interrupt request is not guaranteed. After e200z759n3 begins execution of the critical interrupt handler, the system can safely negate p_critint_b. A Critical Input interrupt may be delayed by other higher priority exceptions or if MSRCE is cleared when the exception occurs. Table 7-8 lists register settings when a Critical Input interrupt is taken. Table 7-8. 
Critical Input interrupt—register settings Register Setting description CSRR0 Set to the effective address of the instruction that the processor would have attempted to execute next if no exception conditions were present. CSRR1 Set to the contents of the MSR at the time of the interrupt MSR UCLE 0 SPE 0 WE 0 CE 0 EE 0 PR 0 ESR Unchanged MCSR Unchanged DEAR Unchanged Vector IVPR0:15 || IVOR016:27 || 4b0000 (autovectored) IVPR0:15 || p_voffset[0:11] || 4b0000 (non-autovectored) 1 FP ME FE0 DE 0 — 0 —/01 FE1 IS DS PMM RI 0 0 0 0 — DE is cleared when the Debug APU is disabled. Clearing of DE is optionally supported by control in HID0 when the Debug APU is enabled. When the Debug APU is enabled, the MSRDE bit is not automatically cleared by a Critical Input interrupt, but can be configured to be cleared via the HID0 register (HID0CICLRDE). Refer to Section 2.4.11, Hardware Implementation Dependent Register 0 (HID0). IVOR0 is the vector offset register used by autovectored Critical Input interrupts to determine the interrupt handler location. e200z759n3 also provides the capability to directly vector Critical Input interrupts to multiple handlers by allowing a Critical Input interrupt request to be accompanied by a vector offset. The e200z759n3 Core Reference Manual, Rev. 2 Freescale Semiconductor 491 p_voffset[0:11] input signals are used in place of the value in IVOR0 to form the interrupt vector when a Critical Input interrupt request is not autovectored (p_avec_b negated when p_critint_b asserted). 7.7.2 Machine Check interrupt (IVOR1) e200z759n3 implements the Machine Check exception as defined in the Freescale EIS Machine Check APU except for automatic clearing of the MSRDE bit (see later paragraph). This behavior is different from the definition in PowerISA 2.06. e200z759n3 initiates a Machine Check interrupt if any of the machine check sources listed in Table 7-2 is detected. 
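The handler address formation described in Sections 7.4 through 7.7.1 can be sketched in C as follows. This is a host-side illustration under stated assumptions, not code from this manual: the function name handler_address is hypothetical, p_voffset[0:15] is assumed to be packed into a 16-bit value with p_voffset[0] as the most significant bit, and the state of p_avec_b is passed as a boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Form the 32-bit handler address.
 * Bit numbering follows the manual: bit 0 is the most significant bit.
 *   autovectored:      IVPR[0:15] || IVORn[16:27]    || 0b0000
 *   non-autovectored:  IVPR[0:15] || p_voffset[0:11] || 0b0000
 * The packing of p_voffset into a uint16_t is an assumption of this sketch.
 */
static uint32_t handler_address(uint32_t ivpr, uint32_t ivor,
                                uint16_t p_voffset, bool autovectored)
{
    uint32_t base = ivpr & 0xFFFF0000u;           /* IVPR[0:15], 64 KB aligned */
    uint32_t offset;

    if (autovectored) {
        offset = ivor & 0x0000FFF0u;              /* IVORn[16:27], low 4 bits zero */
    } else {
        offset = (uint32_t)(p_voffset & 0xFFF0u); /* p_voffset[0:11]; [12:15] reserved */
    }
    return base | offset;
}
```

Because the low four address bits are always zero, handlers start on 16-byte boundaries, and because IVPR supplies only the upper 16 bits, every handler lies within the same 64 KB region.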
As defined in the Freescale EIS Machine Check APU, a machine check interrupt is taken for error report and NMI type machine check conditions even if MSR[ME] is cleared, without the processor generating an internal checkstop condition. Processing of asynchronous type machine check sources (the sources reflected in the MCSR “async mchk” syndrome bits) is gated by MSR[ME].

The Freescale EIS Machine Check APU defines a separate set of save/restore registers (MCSRR0/1), a Machine Check Syndrome register (MCSR) to record the source(s) of machine checks, and a Machine Check Address register (MCAR) to hold an address associated with a machine check for certain classes of machine checks. Return from Machine Check instructions (rfmci, se_rfmci) are also provided to support returns using MCSRR0/1.

The MSR[RI] status bit is provided for software use in determining whether multiple nested machine check exceptions have occurred. Software may interrogate the MCSRR1[RI] bit to determine whether a machine check occurred during the initial portion of a machine check handler, before the handler code that sets MSR[RI] to ‘1’ to indicate that the handler can now tolerate another machine check condition without losing state necessary for recovery.

The MSR[DE] bit is not automatically cleared by a Machine Check exception, but can be configured to be cleared or left unchanged via the HID0 register (HID0[MCCLRDE]). Refer to Section 2.4.11, Hardware Implementation Dependent Register 0 (HID0).

7.7.2.1 Machine check causes

Machine check causes are divided into different types:
• Error Report Machine Check conditions
• Non-Maskable Interrupt (NMI) machine check exceptions
• Asynchronous machine check exceptions

This division is intended to facilitate machine check handling in uni-processor, multiprocessor, and multi-threaded systems. Although the initial implementation of the e200z759n3 does not implement multithreading, future versions are expected to, and the machine check model will remain compatible. In addition, the model is equally applicable to a single-threaded design.

7.7.2.1.1 Error report machine check exceptions

Error report machine check exceptions are directly associated with the current instruction execution stream, and are presented to the interrupt mechanism in a manner analogous to an Instruction Storage or Data Storage interrupt. Since the execution stream cannot continue without corrupting architectural state, these exceptions are not masked by MSR[ME]. Error report machine check exceptions are not necessarily recoverable if they occur during the initial portion of a machine check handler. The MSR[RI] and MCSRR1[RI] bits are provided to assist software in determining recoverability.

For error report machine check exceptions, the MCSR (Machine Check Syndrome Register) is updated only when the machine check interrupt is actually taken. The MCAR is not updated for error report machine check exceptions.

Error report machine check exceptions encountered by program execution can be flushed if an older exception exists or if an asynchronous interrupt or machine check is taken before the instruction that encountered the error becomes the oldest instruction in the machine. In this case the corresponding MCSR bit is not set by the flushed exception condition (although the bit may already have been set by a previous instruction’s exception). Note that an async machine check condition may occur for the same error condition prior to the error report machine check, and the error report machine check may be discarded.

Depending on the type of error, the MCSR IF, LD, G, or ST bit(s) will be set by hardware to reflect the error being reported. Software is responsible for clearing these syndrome bits by writing a ‘1’ to the bit(s) to be cleared. Hardware will not clear an error report bit once it is set.
• MCSR[IF] will be set if the error occurred during an instruction fetch.
• MCSR[LD] will be set if the error occurred for a load instruction. If the error occurred for a guarded load and the error source was the external bus, MCSR[G] will also be set.
• MCSR[ST] will be set if the error occurred in the data cache (parity) or MMU (DTLB Error or DSI) for a store type instruction (including dcbz), if an external termination error was received on a cache-inhibited guarded store or on a store conditional instruction, or if an unsuccessful flush with invalidation occurs on a store conditional instruction due to a tag or data parity error or external bus error. If an external termination error occurred on a cache-inhibited guarded store, or on a guarded store conditional, MCSR[G] will also be set.

Note that most (if not all) error report machine check exceptions will be accompanied by an associated asynchronous machine check exception on a single-threaded e200z759n3, although this will not generally be the case for a multi-threaded version.

Table 7-9. Error report machine check exceptions

Synchronous machine check source | Error type | MCSR updates | Precise¹
---------------------------------|------------|--------------|---------
Instruction Fetch | (ICache tag array parity error or data array parity error) & L1CSR1[ICEA]=00 | IF | yes
 | ICache uncorrectable tag array parity error & L1CSR1[ICEA]=01 & line potentially locked (locked or lock parity error) was invalidated | IF | yes
 | cacheable miss & L1CSR1[ICEA]=00 & any line with lock parity error | IF | yes
 | cacheable miss & L1CSR1[ICEA]=01 & line with uncorrectable lock parity error was invalidated | IF | yes
 | External termination error | IF | yes
Load instruction | (DCache tag array parity error or data array parity error) & L1CSR0[DCEA]=00 | LD | yes
 | (DCache uncorrectable tag array parity error or data array parity error) & L1CSR0[DCEA]=01 & (line potentially locked (locked or lock parity error) was invalidated, or line potentially dirty (dirty or dirty parity error)) | LD | yes
 | cacheable miss & L1CSR0[DCEA]=00 & any line with lock parity error, or dirty parity error on replacement line | LD | yes
 | cacheable miss & L1CSR0[DCEA]=01 & line with uncorrectable lock parity error was invalidated | LD | yes
 | External termination error on load data | LD, [G]² | yes
Load and reserve instruction | DCache tag array parity error & L1CSR0[DCEA]=00 | LD | yes
 | DCache hit and dirty parity error & L1CSR0[DCEA]=00 | LD | yes
 | (DCache uncorrectable tag array parity error or data array parity error) & L1CSR0[DCEA]=01 & line potentially dirty (dirty or dirty parity error) | LD | yes
 | DCache data push parity error³ | LD | yes
 | External termination error on dirty push³ | LD | yes
 | External termination error on load | LD, [G]² | yes
Store instruction | DCache tag array parity error & L1CSR0[DCEA]=00 | ST | yes
 | DCache uncorrectable tag array parity error & L1CSR0[DCEA]=01 & (line potentially locked (locked or lock parity error) was invalidated, or line potentially dirty (dirty or dirty parity error)) | ST | yes
 | cacheable miss & L1CSR0[DCEA]=00 & any line with lock parity error, or dirty parity error on replacement line | ST | yes
 | cacheable miss & L1CSR0[DCEA]=01 & line with uncorrectable lock parity error was invalidated | ST | yes
 | External termination error on CI+G store⁴ | ST, G | yes
Store conditional instruction | DCache tag array parity error & L1CSR0[DCEA]=00 | ST | yes
 | DCache hit and dirty parity error & L1CSR0[DCEA]=00 | ST | yes
 | DCache uncorrectable tag array parity error & L1CSR0[DCEA]=01 & line potentially dirty (dirty or dirty parity error) | ST | yes
 | DCache data push parity error⁵ | ST | yes
 | External termination error on dirty push⁵ | ST | yes
 | External termination error on store conditional | ST, [G]⁶ | yes
dcbst instruction | DCache tag array parity error & miss & L1CSR0[DCEA]=00 & any line with error is potentially dirty (dirty or dirty parity error) | LD | yes
 | DCache uncorrectable tag array parity error & cacheable miss & L1CSR0[DCEA]=01 & line potentially dirty (dirty or dirty parity error) | LD | yes
dcbf instruction | DCache tag array parity error & miss & L1CSR0[DCEA]=00 & (line potentially locked (locked or lock parity error) or line potentially dirty (dirty or dirty parity error)) | LD | yes
 | DCache uncorrectable tag array parity error & miss & L1CSR0[DCEA]=01 & (line potentially locked (locked or lock parity error) or line potentially dirty (dirty or dirty parity error)) | LD | yes
dcblc instruction | DCache tag array parity error & cacheable miss & L1CSR0[DCEA]=00 & line potentially locked (locked or lock parity error) | LD | yes
 | DCache uncorrectable tag array parity error & cacheable miss & L1CSR0[DCEA]=01 & line potentially locked (locked or lock parity error) | LD | yes
dcbtls, dcbtstls instruction | (DCache tag array parity error or lock error) & miss & L1CSR0[DCEA]=00 | LD | yes
 | DCache uncorrectable tag array parity error & cacheable miss & L1CSR0[DCEA]=01 & (line potentially locked (locked or lock parity error) was invalidated, or line potentially dirty (dirty or dirty parity error)) | LD | yes
 | cacheable miss & L1CSR0[DCEA]=00 & any line with lock parity error, or dirty parity error on replacement line | LD | yes
 | cacheable miss & L1CSR0[DCEA]=01 & line with uncorrectable lock parity error was invalidated | LD | yes
 | External termination error on linefill | LD, [G]² | yes
dcbz instruction⁷ | (DCache tag array parity error or lock error) & cacheable miss & L1CSR0[DCEA]=00 | ST | yes
 | DCache uncorrectable tag array parity error & cacheable miss & L1CSR0[DCEA]=01 & (line potentially locked (locked or lock parity error) was invalidated, or line potentially dirty (dirty or dirty parity error)) | ST | yes
 | cacheable miss & L1CSR0[DCEA]=00 & any line with lock parity error, or dirty parity error on replacement line | ST | yes
 | cacheable miss & L1CSR0[DCEA]=01 & line with uncorrectable lock parity error was invalidated | ST | yes
L1FINV0 flush or flush with invalidate operation | DCache tag parity error & L1CSR0[DCEA]=00 and line potentially dirty (dirty or dirty parity error); DCache uncorrectable tag parity error & L1CSR0[DCEA]=01 and line potentially dirty (dirty or dirty parity error) | LD | yes
icblc instruction | ICache tag array parity error & cacheable miss & L1CSR1[ICEA]=00 & line potentially locked (locked or lock parity error) | IF | yes
 | ICache uncorrectable tag array parity error & cacheable miss & L1CSR1[ICEA]=01 & line potentially locked (locked or lock parity error) was invalidated | IF | yes
icbtls instruction | (ICache tag array parity error or lock error) & cacheable miss & L1CSR1[ICEA]=00 | IF | yes
 | ICache uncorrectable tag array parity error & cacheable miss & L1CSR1[ICEA]=01 & line potentially locked (locked or lock parity error) was invalidated | IF | yes
 | External termination error on linefill | IF | yes
Exception vectoring | ISI, ITLB, or Bus Error on first instruction fetch for an exception handler | IF | yes

¹ MCSRR0 will point to the instruction associated with the machine check condition.
² G will be set if the load was a guarded load.
³ Can only occur if the load and reserve causes a dirty line to be flushed.
⁴ Only reported if the store was a cache-inhibited guarded store.
⁵ Can only occur if the store conditional causes a dirty line to be flushed.
⁶ Only reported if the store was a guarded store.
⁷ Alignment error may be generated concurrently.

7.7.2.1.2 Non-maskable interrupt machine check exceptions

Non-maskable interrupt exceptions are reported via the p_nmi_b input pin, which is transition sensitive. NMI exceptions are not gated by MSR[ME], and thus are not necessarily recoverable if an NMI exception occurs during the initial part of a machine check exception handler. The MSR[RI] and MCSRR1[RI] bits are provided to assist software in determining recoverability.

For NMI machine check exceptions, MCSR[NMI] is updated (set) only when the machine check interrupt is actually taken. Hardware does not clear the MCSR[NMI] syndrome bit. Software is responsible for clearing this syndrome bit by writing a ‘1’ to the bit(s) to be cleared.
Hardware will not clear an NMI bit once it is set. The MCAR is not updated for NMI machine check exceptions.

7.7.2.1.3 Asynchronous machine check exceptions

The remainder of machine check exceptions are classified as asynchronous machine check exceptions, as they are reported directly by the subsystem or resource that detected the condition. For many cases, the asynchronous condition will be reported simultaneously with a corresponding error report condition. These conditions are reported by immediately setting the corresponding MCSR “async mchk” syndrome bit, regardless of the state of MSR[ME]. Interrupts due to asynchronous machine check exceptions are gated by MSR[ME]. If MSR[ME]=0 at the time an async mchk bit becomes set, the interrupt will be postponed until MSR[ME] is later set to ‘1’ (although a machine check interrupt may occur at the time of the event due to an error report exception).

Asynchronous events are cumulative; hardware does not clear an async mchk syndrome bit. Software is responsible for clearing these syndrome bits by writing a ‘1’ to the bit(s) to be cleared. Hardware will not clear an async mchk bit once it is set.

If MCSR[MAV] is cleared at the time an asynchronous machine check exception occurs that has a corresponding address (either an effective or real address) to log in the MCAR, then the MCAR and the MCSR[MEA] bit are updated, and the MCSR[MAV] bit is set. If MCSR[MAV] was previously set, then the MCAR and the MCSR[MEA] bit are not affected.

Table 7-10 details all asynchronous machine check sources.

Table 7-10.
Asynchronous machine check exceptions Asynchronous Transaction machine check source source MCSR update1 Machine Check Input Pin3 External n/a Instruction Cache Instruction Fetch Data Cache Error type Tag array parity error & L1CSR1ICEA=00 MCAR update2 MCP MAV none IC_TPERR RA ICache hit, data array parity error & L1CSR1ICEA=00 IC_DPERR RA ICache cacheable miss, lock error, & L1CSR1ICEA=00 IC_TPERR, IC_LKERR RA L1CSR1ICEA=01 & auto-invalidation of locked or potentially locked line due to uncorrectable tag parity error IC_TPERR, IC_LKERR RA icblc Tag array parity error & cacheable miss & L1CSR1ICEA=00 & line potentially locked (locked or lock parity error) IC_TPERR, [IC_LKERR (if lock parity error)] RA icbtls (Tag array parity error or lock error) & cacheable miss & L1CSR1ICEA=00 IC_TPERR, [IC_LKERR (if lock parity error)] RA icblc icbtls L1CSR1ICEA=01 & Auto-invalidation of locked line due to uncorrectable tag parity error IC_TPERR, IC_LKERR RA dcblc Tag array parity error & cacheable miss & L1CSR0DCEA=00 & line potentially locked (lock or lock parity error) DC_TPERR, [DC_LKERR (if lock parity error)] RA MAV e200z759n3 Core Reference Manual, Rev. 2 498 Freescale Semiconductor Table 7-10. 
Asynchronous machine check exceptions (continued) Asynchronous Transaction machine check source source Data Cache Error type load or store Tag array parity error & L1CSR0DCEA=00 MCSR update1 MAV MCAR update2 DC_TPERR, [DC_LKERR (if lock parity error on line with tag parity error)] RA DC_TPERR RA Tag array parity error & cacheable miss & L1CSR0DCEA=00 DC_TPERR RA Tag array parity error & miss & L1CSR0DCEA=00 & (line potentially locked (locked or lock parity error) or line potentially dirty (dirty or dirty parity error)) DC_TPERR, [DC_LKERR (if lock parity error)] RA atomic load Hit & L1CSR0DCEA=00 & line has dirty parity error or store DC_TPERR RA dcbst, Tag array parity error & atomic load miss & L1CSR0DCEA=00 & line potentially dirty (dirty or store or dirty parity error) DC_TPERR, [DC_LKERR (if lock parity error)] RA load or store DCache cacheable miss & dcbtls L1CSR0DCEA=‘00’ & lock parity error dcbtstls dcbz DC_TPERR, DC_LKERR RA load or store dcbtls dcbtstls dcbz DCache cacheable miss & L1CSR0DCEA=‘00’ & dirty parity error on line to be replaced DC_TPERR RA load or store dcbtls dcbtstls dcbz DCache uncorrectable tag array parity error & L1CSR0DCEA=‘01’ & (line potentially locked (locked or lock parity error) was invalidated, or line potentially dirty (dirty or dirty parity error)) DC_TPERR, [DC_LKERR] RA L1FINV0 Tag array parity error & flush or flush L1CSR0DCEA=00 w/inv & line dirty or potentially dirty dcbtls dcbtstls dcbz dcbf e200z759n3 Core Reference Manual, Rev. 2 Freescale Semiconductor 499 Table 7-10. 
Asynchronous machine check exceptions (continued) Asynchronous Transaction machine check source source Data Cache Error type L1FINV0 flush w/inv DCache uncorrectable tag array parity error & L1CSR0DCEA=‘01’ & line potentially dirty (dirty or dirty parity error)) dcblc DCache uncorrectable tag array parity error & L1CSR0DCEA=‘01’ & (line potentially locked (locked or lock parity error) was invalidated MCSR update1 MAV dcbst, DCache uncorrectable tag atomic load array parity error & or store L1CSR0DCEA=‘01’ & line potentially dirty (dirty or dirty parity error) MCAR update2 DC_TPERR RA DC_TPERR, [DC_LKERR] RA DC_TPERR, [DC_LKERR (if uncorrectable lock parity error)] RA dcbf DCache uncorrectable tag array parity error & L1CSR0DCEA=‘01’ & (line potentially locked (locked or lock parity error) or line potentially dirty (dirty or dirty parity error)) DC_TPERR, [DC_LKERR (if uncorrectable lock parity error)] RA L1FINV0 flush DCache uncorrectable tag array parity error & L1CSR0DCEA=‘01’ & line potentially dirty (dirty or dirty parity error) DC_TPERR RA DCache hit, data array parity error & L1CSR0DCEA=00 DC_DPERR RA DCache hit, data array parity error & L1CSR0DCEA=‘01’ & line potentially dirty (dirty or dirty parity error) DC_DPERR RA CP_PERR RA load replacement Data array push parity push error dcbf push dcbst push L1FINV0 push reservation instruction forced-push e200z759n3 Core Reference Manual, Rev. 2 500 Freescale Semiconductor Table 7-10. 
Asynchronous machine check exceptions (continued) Asynchronous Transaction machine check source source Error type MCSR update1 MCAR update2 Data Cache snoop lookup Tag array parity error & (cacheable miss, or hit only to way with tag parity error) MAV DC_TPERR, SNPERR BIU store or push Bus error on write or push MAV BUS_WRERR RA BUS_DRERR RA load Bus error on load fetch or store/w alloc linefill ate dcbtls dcbtstls Snoop Lookup Exception Vectoring RA (snoop address) load Bus error on error recovery refill BUS_DRERR RA instruction fetch Bus error on error recovery refill BUS_IRERR RA icbtls CI or cache disabled Ifetch Bus error on icbtls fill Bus error on CI Ifetch Bus error on cache disabled Ifetch BUS_IRERR RA load Bus error on locked line error recovery refill BUS_DRERR, DC_LKERR RA instruction fetch Bus error on locked line error recovery refill BUS_IRERR, IC_LKERR RA INV snoop command type Tag array parity error & (miss, or hit only to way with tag parity error) MAV SNPERR, DC_TPERR RA4 first ISI or Bus Error on first instruction instruction fetch for an fetch for an exception handler exception handler MAV EXCP_ERR RA first ITLB Error on first instruction instruction fetch for an fetch for an exception handler exception handler MAV EXCP_ERR EA 1 The MCSR update column indicates which bits in the MCSR will be updated when the exception is logged. The MCAR update column indicates whether or not the error will provide either a real address (RA), effective address (EA), or no address (none) that is associated with the error. 3 The machine check input pin is used by the platform logic to indicate machine check type errors that are detected by the platform. Software must query error logging information within the platform logic to determine the specific error condition and source. 4 The RA stored in the MCAR for this case will be Snoop Address value, with the index bits set to 0. 
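The logging rules for asynchronous machine checks described before Table 7-10 (cumulative syndrome bits, MCAR captured only while MAV is clear, interrupt gated by the MSR ME bit) can be modeled on a host as a minimal sketch. Everything named here is hypothetical: mc_model_t, model_log_async, and model_interrupt_taken are illustration-only, the flag values M_MAV and M_MEA are model-local and are not the hardware bit positions of the MCSR, and the sketch assumes that MEA set means the MCAR holds an effective address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model-local flag values; NOT the hardware bit positions of MCSR. */
#define M_MAV 0x1u /* MCAR address valid */
#define M_MEA 0x2u /* MCAR holds an effective address (assumed meaning) */

typedef struct {
    uint32_t mcsr; /* syndrome bits plus the two model flags above */
    uint32_t mcar; /* logged address */
} mc_model_t;

/* Log one asynchronous machine check condition. */
static void model_log_async(mc_model_t *m, uint32_t syndrome,
                            bool has_addr, uint32_t addr, bool addr_is_ea)
{
    bool mav_was_set = (m->mcsr & M_MAV) != 0;

    m->mcsr |= syndrome;              /* set immediately; events accumulate */

    if (has_addr && !mav_was_set) {   /* first address wins until MAV is cleared */
        m->mcar = addr;
        m->mcsr = addr_is_ea ? (m->mcsr | M_MEA) : (m->mcsr & ~M_MEA);
        m->mcsr |= M_MAV;
    }
}

/* An async machine check interrupt is taken only while ME is set. */
static bool model_interrupt_taken(const mc_model_t *m, uint32_t async_mask,
                                  bool msr_me)
{
    return msr_me && (m->mcsr & async_mask) != 0;
}
```

In this model a second error that arrives before software clears MAV still sets its syndrome bit, but the MCAR keeps the address of the first error, which is why a handler may see several syndrome bits and only one address.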
Table 7-11 details the priority of asynchronous machine check updates to the MCAR when multiple simultaneous async machine check conditions occur. Note that a lower priority condition may occur first and a higher priority condition may occur later, before the machine check interrupt handler reads the MCSR and MCAR. In that case the interrupt handler will not necessarily see the higher priority MCAR value, even though multiple MCSR bits are set.

Table 7-11. Asynchronous machine check MCAR update priority
Priority (0 = highest) 0 Asynchronous machine check source Exception Vectoring Transaction source Error type (MCSR update) first instruction fetch for an exception handler ISI or Bus Error on first instruction fetch for an exception handler EXCP_ERR first instruction fetch for an exception handler ITLB Error on first instruction fetch for an exception handler EXCP_ERR replacement push dcbf push dcbst push L1FINV0 push reservation-type instruction forced push Dirty push parity error 1 Data Cache 2 BIU store or push Bus error on write or push BUS_WRERR 3 Data Cache load or store dcblc dcbtls dcbtstls dcbz Uncorrectable tag array parity error & L1CSR0DCEA=01 & locked line invalidated DC_TPERR, DC_LKERR 4 Instruction Cache icblc icbtls instruction fetch Uncorrectable tag array parity error & L1CSR1ICEA=01 & locked line invalidated IC_TPERR, IC_LKERR 5 BIU load Bus error on locked line error recovery refill BUS_DRERR, DC_LKERR 6 BIU instruction fetch Bus error on locked line error recovery refill BUS_IRERR, IC_LKERR 7 Data Cache load or store dcbf dcbtls dcbtstls dcbz L1FINV0 flush or flush w/inv & line dirty Tag array parity error & L1CSR0DCEA=00 load or store dcbtls dcbtstls dcbz Cacheable miss & L1CSR0DCEA=‘00’ & dirty parity error on line to be replaced 7 Data Cache CP_PERR DC_TPERR Uncorrectable tag array parity error & L1CSR0DCEA=01 & line dirty or potentially dirty DC_TPERR e200z759n3 Core
Reference Manual, Rev. 2 502 Freescale Semiconductor Table 7-11. Asynchronous machine check MCAR update priority (continued) Priority (0 — highest) Asynchronous machine check source 7 Data Cache 8 9 10 Data Cache Data Cache Data Cache Transaction source Error type (MCSR update) DC_TPERR, DC_LKERR load or store dcbtls dcbtstls dcbz Cacheable miss & L1CSR0DCEA=00 & lock parity error dcbst Tag array parity error & L1CSR0DCEA=00 & line potentially dirty (dirty or dirty parity error) DC_TPERR, [DC_LKERR (if lock parity error)] Uncorrectable tag array parity error & L1CSR0DCEA=01 & line potentially dirty (dirty or dirty parity error) DC_TPERR, [DC_LKERR (if uncorrectable lock parity error)] Tag array parity error & L1CSR0DCEA=00 & line potentially locked (locked or lock parity error) DC_TPERR, [DC_LKERR (if lock parity error)] Uncorrectable tag array parity error & L1CSR0DCEA=01 & line potentially locked (locked or lock parity error) DC_TPERR, [DC_LKERR (if uncorrectable lock parity error)] dcblc load Cacheable miss & L1CSR0DCEA=01 & uncorrectable lock parity error Data array parity error & L1CSR0DCEA=00 DC_DPERR Data array parity error & line dirty or potentially dirty & L1CSR0DCEA=01 11 Instruction Cache icblc Tag array parity error & L1CSR1ICEA=00 & line locked or lock parity error IC_TPERR, [IC_LKERR] icbtls Tag array parity error & L1CSR1ICEA=00 IC_TPERR Cacheable miss & L1CSR1ICEA=00 & lock parity error IC_TPERR, IC_LKERR Cacheable miss & L1CSR1ICEA=01 & uncorrectable lock parity error e200z759n3 Core Reference Manual, Rev. 2 Freescale Semiconductor 503 Table 7-11. 
Asynchronous machine check MCAR update priority (continued)

Priority (0 = highest) | Asynchronous machine check source | Transaction source | Error type | MCSR update
-----------------------|-----------------------------------|--------------------|------------|------------
12 | BIU | load, store w/allocate, dcbtls, dcbtstls | Bus error on load or linefill or data refill | BUS_DRERR
13 | BIU | icbtls; CI or cache disabled Ifetch | Bus error on linefill or data refill; Bus error on CI Ifetch; Bus error on cache disabled Ifetch | BUS_IRERR
14 | Data Cache | snoop lookup | Tag parity error & (miss, or hit only to way with tag parity error) | DC_TPERR, SNPERR
15 | Instruction Cache | Instruction Fetch | Tag array parity error & L1CSR1[ICEA]=00 | IC_TPERR
16 | Instruction Cache | Instruction Fetch | Data array parity error & L1CSR1[ICEA]=00 | IC_DPERR
17 | Instruction Cache | Instruction Fetch | Cacheable miss & L1CSR1[ICEA]=00 & lock parity error; Cacheable miss & L1CSR1[ICEA]=01 & uncorrectable lock parity error | IC_TPERR, IC_LKERR

7.7.2.2 Machine check interrupt actions

Machine Check interrupts for “error report” conditions and NMI are enabled and taken regardless of the state of MSR[ME]. Machine check interrupts due to an “async mchk” syndrome bit being set in MCSR are only taken when MSR[ME]=1.

When a Machine Check interrupt is taken, registers are updated as shown in Table 7-12.

Table 7-12. Machine check interrupt: register settings

Register  Setting description
MCSRR0    On a best-effort basis e200z759n3 sets this to the address of some instruction that was executing or about to be executing when the machine check condition occurred.
MCSRR1    Set to the contents of the MSR at the time of the interrupt.
MSR       UCLE 0, SPE 0, WE 0, CE 0, EE 0, PR 0, FP 0, ME 0, FE0 0, DE 0/unchanged¹, FE1 0, IS 0, DS 0, PMM 0, RI 0
ESR       Unchanged
MCSR      Updated to reflect the source(s) of a machine check. Hardware only sets appropriate bits; no previously set bits are cleared by hardware.
MCAR      See Table 7-10
Vector    IVPR[0:15] || IVOR1[16:27] || 0b0000

¹ DE is cleared when the Debug APU is disabled. Clearing of DE is optionally supported by control in HID0 when the Debug APU is enabled.

The Machine Check Syndrome register is provided to identify the source(s) of a machine check and, in conjunction with MCSRR1[RI], may be used to identify recoverable events.

The MSR[RI] status bit is provided for software use in determining whether multiple nested machine check exceptions have occurred. Software may interrogate the MCSRR1[RI] bit to determine whether a machine check occurred during the initial portion of a machine check handler, before the handler code that sets MSR[RI] to ‘1’ to indicate that the handler can now tolerate another machine check condition without losing state necessary for recovery. The interrupt handler should set MSR[RI] as soon as possible after saving working registers and MCSRR0/1, to avoid loss of state if another machine check condition were to occur.

The Machine Check input pin p_mcp_b can be masked by HID0[EMCP]. The Non-Maskable Interrupt machine check input pin p_nmi_b is never masked.

Precise external termination errors occur when a load or cache-inhibited or guarded store is terminated by assertion of p_tea_b (external bus ERROR termination response); these result in both an “error report” and an “async mchk” machine check exception.

Some machine check exceptions are unrecoverable in the sense that execution cannot resume in the context that existed before the interrupt; however, system software can use the machine check interrupt handler to try to identify and recover from the machine check condition.

7.7.2.3 Checkstop state

Machine checks no longer result in a checkstop, and no checkstop state is implemented on Zen z7.
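The bookkeeping a machine check handler performs on MCSR and MCSRR1, as described in Section 7.7.2.2, can be sketched as pure C helpers. This is an illustration under stated assumptions rather than code from this manual: the helper names are hypothetical, register access (mfspr/mtspr) is deliberately left out so the logic can run on a host, BUS_WRERR uses bit 29 as given in the MCSR field descriptions, and the RI position (bit 30 of the 32-bit MSR image) is an assumption that should be checked against Section 2.4.1.

```c
#include <stdbool.h>
#include <stdint.h>

/* The manual numbers bits from the most significant end: bit 0 is the MSB. */
#define BIT32(n) (1u << (31 - (n)))

#define MCSR_BUS_WRERR BIT32(29) /* bit 29 (61), from the MCSR field table */
#define MSR_RI         BIT32(30) /* assumption: RI is bit 30 (62) of the MSR */

/* True if the interrupted context had RI=1, i.e. the machine check did not
 * arrive during the unprotected start of another machine check handler. */
static bool mc_context_recoverable(uint32_t mcsrr1)
{
    return (mcsrr1 & MSR_RI) != 0;
}

/* Value to write back to MCSR so that only the bits this handler observed
 * and dealt with are cleared. Bits set by hardware after the read stay set. */
static uint32_t mcsr_w1c_value(uint32_t observed, uint32_t handled_mask)
{
    return observed & handled_mask;
}

/* Host-side model of write-1-to-clear: each written 1 clears that bit. */
static uint32_t mcsr_after_write(uint32_t current, uint32_t written)
{
    return current & ~written;
}
```

Writing back only the observed bits, rather than all ones, matters because asynchronous syndrome bits are cumulative: an error logged between the read and the write would otherwise be cleared without ever being examined.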
7.7.3 Data Storage interrupt (IVOR2) A Data Storage interrupt (DSI) may occur if no higher priority exception exists and one of the following exception conditions exists: • Read or Write Access Control exception condition • Byte Ordering exception condition • Cache Locking exception condition Access control is defined as in PowerISA 2.06. A Byte Ordering exception condition occurs for any misaligned access across a page boundary to pages with mismatched E bits. Cache locking exception e200z759n3 Core Reference Manual, Rev. 2 Freescale Semiconductor 505 conditions occur for any attempt to execute a dcbtls, dcbtstls, dcblc, icbtls, or icblc in user mode with MSRUCLE = 0. Table 7-13 lists register settings when a DSI is taken. Table 7-13. Data Storage Interrupt—register settings Register 7.7.4 Setting description SRR0 Set to the effective address of the excepting load/store instruction. SRR1 Set to the contents of the MSR at the time of the interrupt MSR UCLE 0 SPE 0 WE 0 CE — EE 0 PR 0 FP ME FE0 DE ESR Access: Byte ordering: Cache locking: [ST], [SPE], [VLEMI]. All other bits cleared. [ST], [SPE], [VLEMI], BO. All other bits cleared. (DLK, ILK), [VLEMI], [ST]. All other bits cleared. 0 — 0 — FE1 IS DS PMM RI 0 0 0 0 — MCSR Unchanged DEAR For Access and Byte ordering exceptions, set to the effective address of a byte within the page whose access caused the violation. Undefined on Cache locking exceptions (Zen does not update the DEAR on a cache locking exception) Vector IVPR0:15 || IVOR216:27 || 4b0000 Instruction Storage interrupt (IVOR3) An Instruction Storage interrupt (ISI) occurs when no higher priority exception exists and an Execute Access Control exception occurs. This interrupt is implemented as defined by PowerISA 2.06.,with the addition of Misaligned Instruction Fetch exceptions, and the extension of the Byte Ordering exception status to also cover Mismatched Instruction Storage exceptions. 
Exception extensions implemented in e200z759n3 for PowerISA VLE involve extending the definition of the Instruction Storage Interrupt to include Byte Ordering exceptions for instruction accesses, and Misaligned Instruction Fetch exceptions, and corresponding updates to the ESR as shown in Table 7-14 and Table 7-15.

Table 7-14. ISI exceptions and conditions
Interrupt type: Instruction Storage
Interrupt vector offset register: IVOR 3
Causing conditions:
• Access control.
• Byte ordering due to misaligned instruction across page boundary to pages with mismatched VLE bits, or access to page with VLE set, and E indicating little-endian.
• Misaligned Instruction fetch due to a change of flow to an odd halfword instruction boundary on a BookE (non-VLE) instruction page

Table 7-15 lists register settings when an ISI is taken.

Table 7-15. Instruction storage interrupt—register settings
Register  Setting description
SRR0  Set to the effective address of the excepting instruction.
SRR1  Set to the contents of the MSR at the time of the interrupt
MSR  UCLE 0, SPE 0, WE 0, CE —, EE 0, PR 0, FP 0, ME —, FE0 0, DE —, FE1 0, IS 0, DS 0, PMM 0, RI —
ESR  [BO, MIF, VLEMI]. All other bits cleared.
MCSR  Unchanged
DEAR  Unchanged
Vector  IVPR[0:15] || IVOR3[16:27] || 4'b0000

7.7.5 External Input interrupt (IVOR4)

An External Input exception is signaled to the processor by the assertion of the external interrupt pin (p_extint_b). The p_extint_b input is a level-sensitive signal expected to remain asserted until e200z759n3 acknowledges the external interrupt. If p_extint_b is negated early, recognition of the interrupt request is not guaranteed. When e200z759n3 detects the exception, if the exception is enabled by MSR[EE], e200z759n3 takes the External Input interrupt.

An External Input interrupt may be delayed by other higher priority exceptions or if MSR[EE] is cleared when the exception occurs.
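The Vector rows in the register-settings tables of this chapter concatenate the upper 16 bits of IVPR, a 12-bit offset, and four zero bits. The C sketch below illustrates that arithmetic, including the External Input case in which p_voffset[0:11] replaces IVOR4 when the request is not autovectored. The function names are hypothetical, and the sketch assumes 32-bit register images with bit 0 as the most significant bit and p_voffset passed as a right-justified 12-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

/* IVPR[0:15] || IVORn[16:27] || 4'b0000 */
static uint32_t interrupt_vector(uint32_t ivpr, uint32_t ivor)
{
    /* Bits 0:15 are the upper halfword; bits 16:27 map to mask 0x0000FFF0. */
    return (ivpr & 0xFFFF0000u) | (ivor & 0x0000FFF0u);
}

/*
 * External Input vector: IVOR4 supplies the offset when the request is
 * autovectored; otherwise the 12-bit p_voffset value is used in its place.
 */
static uint32_t extint_vector(uint32_t ivpr, uint32_t ivor4,
                              uint32_t voffset, bool autovectored)
{
    uint32_t offset = autovectored ? (ivor4 & 0x0000FFF0u)
                                   : ((voffset & 0x0FFFu) << 4);
    return (ivpr & 0xFFFF0000u) | offset;
}
```

For example, with IVPR = 0x40000000 and p_voffset = 0x123, a non-autovectored request resolves to 0x40001230 under these assumptions.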
Table 7-16 lists register settings when an External Input interrupt is taken.

Table 7-16. External Input interrupt—register settings
Register  Setting description
SRR0  Set to the effective address of the instruction that the processor would have attempted to execute next if no exception conditions were present.
SRR1  Set to the contents of the MSR at the time of the interrupt
MSR  UCLE 0, SPE 0, WE 0, CE —, EE 0, PR 0, FP 0, ME —, FE0 0, DE —, FE1 0, IS 0, DS 0, PMM 0, RI —
ESR  Unchanged
MCSR  Unchanged
DEAR  Unchanged
Vector  IVPR[0:15] || IVOR4[16:27] || 4'b0000 (autovectored)
        IVPR[0:15] || p_voffset[0:11] || 4'b0000 (non-autovectored)

IVOR4 is the vector offset register used by autovectored External Input interrupts to determine the interrupt handler location. e200z759n3 also provides the capability to directly vector External Input interrupts to multiple handlers by allowing an External Input interrupt request to be accompanied by a vector offset. The p_voffset[0:11] input signals are used in place of the value in IVOR4 when an External Input interrupt request is not autovectored (p_avec_b negated when p_extint_b asserted).

7.7.6 Alignment interrupt (IVOR5)

e200z759n3 implements the Alignment interrupt as defined by PowerISA 2.06. An Alignment exception is generated when any of the following occurs:
• The operand of lmw or stmw is not word aligned.
• The operand of lwarx or stwcx. is not word aligned.
• The operand of lharx or sthcx. is not halfword aligned.
• Execution of a dcbz instruction is attempted with a disabled cache.
• Execution of a dcbz instruction is attempted with an enabled cache and W or I = 1.
• Execution of a SPE APU load or store instruction that is not properly aligned is attempted.

Table 7-17 lists register settings when an alignment interrupt is taken.

Table 7-17. Alignment interrupt—register settings
Register  Setting description
SRR0  Set to the effective address of the excepting load/store instruction.
SRR1  Set to the contents of the MSR at the time of the interrupt
MSR  UCLE 0, SPE 0, WE 0, CE —, EE 0, PR 0, FP 0, ME —, FE0 0, DE —, FE1 0, IS 0, DS 0, PMM 0, RI —
ESR  [ST], [SPE], [VLEMI]. All other bits cleared.
MCSR  Unchanged
DEAR  Set to the effective address of a byte of the load or store whose access caused the violation.
Vector  IVPR[0:15] || IVOR5[16:27] || 4'b0000

7.7.7 Program interrupt (IVOR6)

e200z759n3 implements the Program interrupt as defined by PowerISA 2.06. A program interrupt occurs when no higher priority exception exists and one or more of the following exception conditions defined in PowerISA 2.06 occur:
• Illegal Instruction exception
• Privileged Instruction exception
• Trap exception
• Unimplemented Operation exception

e200z759n3 will invoke an Illegal Instruction program exception on attempted execution of the following instructions:
• Unimplemented instructions
• Instruction from the illegal instruction class
• mtspr and mfspr instructions with an undefined SPR specified
• mtdcr and mfdcr instructions with an undefined DCR specified

e200z759n3 will invoke a Privileged Instruction program exception on attempted execution of the following instructions when MSR[PR]=1 (user mode):
• A privileged instruction
• mtspr and mfspr instructions that specify a SPRN value with SPRN[5]=1 (even if the SPR is undefined).

e200z759n3 will invoke a Trap exception on execution of the tw and twi instructions if the trap conditions are met and the exception is not also enabled as a Debug interrupt.

e200z759n3 will invoke an Illegal Instruction program exception on attempted execution of the instructions lswi, lswx, stswi, stswx, mfapidi, mfdcrx, mtdcrx, or on any PowerISA 2.06 floating point instruction when MSR[FP]=1. All other defined or allocated instructions that are not implemented by e200z759n3 will cause an Illegal Instruction program exception.

Table 7-18 lists register settings when a Program interrupt is taken.
Table 7-18. Program interrupt—register settings
Register  Setting description
SRR0  Set to the effective address of the excepting instruction.
SRR1  Set to the contents of the MSR at the time of the interrupt
MSR  UCLE 0, SPE 0, WE 0, CE —, EE 0, PR 0, FP 0, ME —, FE0 0, DE —, FE1 0, IS 0, DS 0, PMM 0, RI —
ESR  Illegal: PIL, [VLEMI]. All other bits cleared.
     Privileged: PPR, [VLEMI]. All other bits cleared.
     Trap: PTR, [VLEMI]. All other bits cleared.
MCSR  Unchanged
DEAR  Unchanged
Vector  IVPR[0:15] || IVOR6[16:27] || 4'b0000

7.7.8 Floating-Point Unavailable interrupt (IVOR7)

The Floating-point Unavailable exception is not used by e200z759n3.

7.7.9 System Call interrupt (IVOR8)

A System Call interrupt occurs when a System Call (sc, se_sc) instruction is executed and no higher priority exception exists.

Exception extensions implemented in e200z759n3 for PowerISA VLE include modification of the System Call Interrupt definition to include updating the ESR.

Table 7-19 lists register settings when a System Call interrupt is taken.

Table 7-19. System Call interrupt—register settings
Register  Setting description
SRR0  Set to the effective address of the instruction following the sc instruction.
SRR1  Set to the contents of the MSR at the time of the interrupt
MSR  UCLE 0, SPE 0, WE 0, CE —, EE 0, PR 0, FP 0, ME —, FE0 0, DE —, FE1 0, IS 0, DS 0, PMM 0, RI —
ESR  [VLEMI]. All other bits cleared.
MCSR  Unchanged
DEAR  Unchanged
Vector  IVPR[0:15] || IVOR8[16:27] || 4'b0000

7.7.10 Auxiliary Processor Unavailable interrupt (IVOR9)

An Auxiliary Processor Unavailable exception is defined by PowerISA 2.06 to occur when an attempt is made to execute an APU instruction that is implemented but configured as unavailable, and no higher priority exception condition exists. e200z759n3 does not utilize this interrupt.
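The mtspr and mfspr rules given for the Program interrupt (IVOR6) can be summarized in a small decision function. The C sketch below is illustrative only: the type and function names are hypothetical, the SPRN is treated as a 10-bit field numbered 0:9 with bit 0 most significant (so SPRN bit 5 has the value 0x10), and whether a given SPR is defined on the core is supplied by the caller rather than looked up.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    SPR_ACCESS_OK,         /* no Program exception from these rules */
    SPR_ACCESS_ILLEGAL,    /* Illegal Instruction exception (ESR PIL) */
    SPR_ACCESS_PRIVILEGED  /* Privileged Instruction exception (ESR PPR) */
} spr_access_t;

/* SPRN bit 5 of the 10-bit SPRN[0:9] field marks a privileged SPR number. */
static bool sprn_is_privileged(uint32_t sprn)
{
    return (sprn & 0x010u) != 0u;
}

/*
 * Outcome of mtspr/mfspr for a given SPRN.
 * user_mode    : MSR[PR] = 1
 * spr_defined  : caller-supplied flag, true if the SPR exists on the core
 */
static spr_access_t spr_access_check(uint32_t sprn, bool user_mode,
                                     bool spr_defined)
{
    /* In user mode a privileged SPRN traps even if the SPR is undefined. */
    if (user_mode && sprn_is_privileged(sprn)) {
        return SPR_ACCESS_PRIVILEGED;
    }
    if (!spr_defined) {
        return SPR_ACCESS_ILLEGAL;
    }
    return SPR_ACCESS_OK;
}
```

As an example, SRR0 (SPR 26) has SPRN bit 5 set, so a user-mode access is reported as privileged, while CTR (SPR 9) does not.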
7.7.11 Decrementer interrupt (IVOR10)

e200z759n3 implements the Decrementer exception as described in Chapter 8, “Timer Facilities” beginning on page 181 in Book E: Enhanced PowerPC™ Architecture v0.99. A Decrementer interrupt occurs when no higher priority exception exists, a Decrementer exception condition exists (TSR[DIS]=1), and the interrupt is enabled (both TCR[DIE] and MSR[EE]=1).

The Timer Status Register (TSR) holds the Decrementer interrupt bit set by the Timer facility when an exception is detected. Software must clear this bit in the interrupt handler to avoid repeated Decrementer interrupts.

Table 7-20 lists register settings when a Decrementer interrupt is taken.

Table 7-20. Decrementer interrupt—register settings
Register  Setting description
SRR0  Set to the effective address of the instruction that the processor would have attempted to execute next if no exception conditions were present.
SRR1  Set to the contents of the MSR at the time of the interrupt
MSR  UCLE 0, SPE 0, WE 0, CE —, EE 0, PR 0, FP 0, ME —, FE0 0, DE —, FE1 0, IS 0, DS 0, PMM 0, RI —
ESR  Unchanged
MCSR  Unchanged
DEAR  Unchanged
Vector  IVPR[0:15] || IVOR10[16:27] || 4'b0000

7.7.12 Fixed-Interval Timer interrupt (IVOR11)

e200z759n3 implements the Fixed-Interval Timer (FIT) exception as described in Chapter 8, “Timer Facilities” beginning on page 181 in Book E: Enhanced PowerPC™ Architecture v0.99. The triggering of the exception is caused by selected bits in the Time Base register changing from 0 to 1.

A Fixed-Interval Timer interrupt occurs when no higher priority exception exists, a FIT exception exists (TSR[FIS]=1), and the interrupt is enabled (both TCR[FIE] and MSR[EE]=1).

The Timer Status Register (TSR) holds the FIT interrupt bit set by the Timer facility when an exception is detected. Software must clear this bit in the interrupt handler to avoid repeated FIT interrupts.

Table 7-21 lists register settings when a FIT interrupt is taken.
Table 7-21. Fixed-Interval Timer interrupt—register settings
Register  Setting description
SRR0  Set to the effective address of the instruction that the processor would have attempted to execute next if no exception conditions were present.
SRR1  Set to the contents of the MSR at the time of the interrupt
MSR  UCLE 0, SPE 0, WE 0, CE —, EE 0, PR 0, FP 0, ME —, FE0 0, DE —, FE1 0, IS 0, DS 0, PMM 0, RI —
ESR  Unchanged
MCSR  Unchanged
DEAR  Unchanged
Vector  IVPR[0:15] || IVOR11[16:27] || 4'b0000

7.7.13 Watchdog Timer interrupt (IVOR12)

e200z759n3 implements the Watchdog Timer (WDT) exception as described in Chapter 8, “Timer Facilities” beginning on page 181 in Book E: Enhanced PowerPC™ Architecture v0.99. The triggering of the exception is caused by the first enabled watchdog time-out.

A Watchdog Timer interrupt occurs when no higher priority exception exists, a Watchdog Timer exception exists (TSR[WIS]=1), and the interrupt is enabled (both TCR[WIE] and MSR[CE]=1).

The Timer Status Register (TSR) holds the Watchdog interrupt bit set by the Timer facility when an exception is detected. Software must clear this bit in the interrupt handler to avoid repeated Watchdog interrupts.

Table 7-22 lists register settings when a Watchdog Timer interrupt is taken.

Table 7-22. Watchdog Timer interrupt—register settings
Register  Setting description
CSRR0  Set to the effective address of the instruction that the processor would have attempted to execute next if no exception conditions were present.
CSRR1  Set to the contents of the MSR at the time of the interrupt
MSR  UCLE 0, SPE 0, WE 0, CE 0, EE 0, PR 0, FP 0, ME —, FE0 0, DE 0/—¹, FE1 0, IS 0, DS 0, PMM 0, RI —
ESR  Unchanged
MCSR  Unchanged
DEAR  Unchanged
Vector  IVPR[0:15] || IVOR12[16:27] || 4'b0000
¹ DE is cleared when the Debug APU is disabled. Clearing of DE is optionally supported by control in HID0 when the Debug APU is enabled.
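The enabling conditions for the three timer interrupts in Sections 7.7.11 through 7.7.13 differ only in the status bit, the TCR enable, and the MSR enable they use. The C sketch below models those conditions with plain boolean fields so that no register bit positions have to be assumed. The structure and function names are hypothetical, and the "no higher priority exception exists" condition is deliberately left out.

```c
#include <stdbool.h>

/* Boolean snapshot of the bits named in the text (layout is illustrative). */
typedef struct {
    bool tsr_dis, tsr_fis, tsr_wis; /* TSR status: Decrementer, FIT, Watchdog */
    bool tcr_die, tcr_fie, tcr_wie; /* TCR interrupt enables */
    bool msr_ee, msr_ce;            /* MSR external / critical enables */
} timer_state_t;

/* Decrementer: TSR[DIS] = 1 and both TCR[DIE] and MSR[EE] = 1. */
static bool decrementer_interrupt_enabled(const timer_state_t *s)
{
    return s->tsr_dis && s->tcr_die && s->msr_ee;
}

/* Fixed-Interval Timer: TSR[FIS] = 1 and both TCR[FIE] and MSR[EE] = 1. */
static bool fit_interrupt_enabled(const timer_state_t *s)
{
    return s->tsr_fis && s->tcr_fie && s->msr_ee;
}

/* Watchdog: TSR[WIS] = 1 and both TCR[WIE] and MSR[CE] = 1 (not MSR[EE]). */
static bool watchdog_interrupt_enabled(const timer_state_t *s)
{
    return s->tsr_wis && s->tcr_wie && s->msr_ce;
}
```

The Watchdog case is the one to watch: it is gated by MSR[CE] and saves state in CSRR0/CSRR1, so clearing MSR[EE] alone does not hold it off.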
The MSR[DE] bit is not automatically cleared by a Watchdog Timer interrupt, but can be configured to be cleared via the HID0 register (HID0[CICLRDE]). Refer to Section 2.4.11, Hardware Implementation Dependent Register 0 (HID0).

7.7.14 Data TLB Error interrupt (IVOR13)

A Data TLB Error interrupt occurs when no higher priority exception exists and a Data TLB Error exception exists due to a data translation lookup miss in the TLB.

Table 7-23 lists register settings when a DTLB interrupt is taken.

Table 7-23. Data TLB Error interrupt—register settings
Register  Setting description
SRR0  Set to the effective address of the excepting load/store instruction.
SRR1  Set to the contents of the MSR at the time of the interrupt
MSR  UCLE 0, SPE 0, WE 0, CE —, EE 0, PR 0, FP 0, ME —, FE0 0, DE —, FE1 0, IS 0, DS 0, PMM 0, RI —
ESR  [ST], [SPE], [VLEMI]. All other bits cleared.
MCSR  Unchanged
DEAR  Set to the effective address of a byte of the load or store whose access caused the violation.
Vector  IVPR[0:15] || IVOR13[16:27] || 4'b0000

7.7.15 Instruction TLB Error interrupt (IVOR14)

An Instruction TLB Error interrupt occurs when no higher priority exception exists and an Instruction TLB Error exception exists due to an instruction translation lookup miss in the TLB.

Exception extensions implemented in e200z759n3 for PowerISA VLE involve extending the definition of the Instruction TLB Error Interrupt to include updating the ESR.

Table 7-24 lists register settings when an ITLB interrupt is taken.

Table 7-24. Instruction TLB Error interrupt—register settings
Register  Setting description
SRR0  Set to the effective address of the excepting instruction.
SRR1  Set to the contents of the MSR at the time of the interrupt
MSR  UCLE 0, SPE 0, WE 0, CE —, EE 0, PR 0, FP 0, ME —, FE0 0, DE —, FE1 0, IS 0, DS 0, PMM 0, RI —
ESR  [MIF]. All other bits cleared.
MCSR  Unchanged
DEAR  Unchanged
Vector  IVPR[0:15] || IVOR14[16:27] || 4'b0000

7.7.16 Debug interrupt (IVOR15)

e200z759n3 implements the Debug Interrupt as defined in PowerISA 2.06 with the following changes:
• When the Debug APU is enabled, Debug is no longer a critical interrupt, but uses DSRR0 and DSRR1 for saving machine state on context switch
• A Return from debug interrupt instruction (rfdi or se_rfdi) is implemented to support the new machine state registers
• A Critical Interrupt Taken debug event is defined to allow critical interrupts to generate a debug event
• A Critical Return debug event is defined to allow debug events to be generated for rfci and se_rfci instructions

There are multiple sources that can signal a Debug exception. A Debug interrupt occurs when no higher priority exception exists, a Debug exception exists in the Debug Status Register, and Debug interrupts are enabled (both DBCR0[IDM]=1 (internal debug mode) and MSR[DE]=1). Enabling debug events and other debug modes are discussed further in Chapter 12, Debug Support.

With the Debug APU enabled (see Section 2.4.11, Hardware Implementation Dependent Register 0 (HID0)), the Debug interrupt has its own set of machine state save/restore registers (DSRR0, DSRR1) to allow debugging of both critical and non-critical interrupt handlers. In addition, the capability is provided to allow interrupts to be handled while in a debug software handler. External and Critical interrupts are not automatically disabled when a Debug interrupt occurs, but they can be configured to be disabled via the HID0 register (HID0[DCLREE, DCLRCE]). Refer to Section 2.4.11, Hardware Implementation Dependent Register 0 (HID0). When the Debug APU is disabled, Debug interrupts use the CSRR0 and CSRR1 registers to save machine state.

NOTE
For additional details regarding the following descriptions of debug exception types, refer to Section 12.2, Software debug events and exceptions.
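The two decisions described for the Debug interrupt, namely whether it can be taken and which save/restore register pair it uses, can be sketched in C as follows. The names are hypothetical, the inputs are plain booleans rather than register images, and the "no higher priority exception exists" condition is not modeled.

```c
#include <stdbool.h>

typedef enum {
    DEBUG_SAVE_CSRR, /* CSRR0/CSRR1: Debug APU disabled (critical class) */
    DEBUG_SAVE_DSRR  /* DSRR0/DSRR1: Debug APU enabled, return with rfdi */
} debug_save_regs_t;

/* Debug interrupts are enabled only when DBCR0[IDM] = 1 and MSR[DE] = 1. */
static bool debug_interrupt_enabled(bool dbcr0_idm, bool msr_de)
{
    return dbcr0_idm && msr_de;
}

/* Select the machine state save/restore pair from the Debug APU setting. */
static debug_save_regs_t debug_save_regs(bool debug_apu_enabled)
{
    return debug_apu_enabled ? DEBUG_SAVE_DSRR : DEBUG_SAVE_CSRR;
}
```

With the Debug APU disabled, the debug handler shares CSRR0/CSRR1 with critical interrupts, which is why the text advises against enabling critical return and critical interrupt taken debug events in that configuration.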
An Instruction Address Compare (IAC) debug exception occurs when there is an instruction address match as defined by the debug control registers and Instruction Address Compare events are enabled. This could either be a direct instruction address match or a selected set of instruction addresses. IAC has the highest interrupt priority of all instruction-based interrupts, even if the instruction itself may have encountered an Instruction TLB error or Instruction Storage exception.

A Branch Taken (BRT) debug exception is signaled when a branch instruction is considered taken by the branch unit and branch taken events are enabled. The Debug interrupt is taken when no higher priority exception is pending.

A Data Address Compare (DAC) exception is signaled when there is a data access address match as defined by the debug control registers and Data Address Compare events are enabled. This could either be a direct data address match or a selected set of data addresses, or a combination of data address and data value matching. The Debug interrupt is taken when no higher priority exception is pending.

The e200z759n3 implementation provides IAC linked with DAC exceptions. This results in a DAC exception only if one or more IAC conditions are also met. See Chapter 12, Debug Support, for more details.

A Trap (TRAP) debug exception occurs when a program trap exception is generated while trap events are enabled. If MSR[DE] is set, the Debug exception has higher priority than the Program exception in this case, and will be taken instead of a Trap type Program Interrupt. The Debug interrupt is taken when no higher priority exception is pending. If MSR[DE] is cleared when a trap debug exception occurs, a Trap exception type Program interrupt will occur instead.

A Return (RET) debug exception occurs when executing an rfi or se_rfi instruction and return debug events are enabled.
Return debug exceptions are not generated for rfci or se_rfci instructions. If MSR[DE]=1 at the time of the execution of the rfi or se_rfi, a Debug interrupt will occur provided there exists no higher priority exception that is enabled to cause an interrupt. CSRR0 (Debug APU disabled) or DSRR0 (Debug APU enabled) will be set to the address of the rfi or se_rfi instruction. If MSR[DE]=0 at the time of the execution of the rfi or se_rfi, a Debug interrupt will not occur immediately, but the event will be recorded by setting the DBSR[RET] and DBSR[IDE] status bits.

A Critical Return (CRET) debug exception occurs when executing an rfci or se_rfci instruction and critical return debug events are enabled. Critical return debug exceptions are only generated for rfci or se_rfci instructions. If MSR[DE]=1 at the time of the execution of the rfci or se_rfci, a Debug interrupt will occur provided there exists no higher priority exception that is enabled to cause an interrupt. CSRR0 (Debug APU disabled) or DSRR0 (Debug APU enabled) will be set to the address of the rfci or se_rfci instruction. If MSR[DE]=0 at the time of the execution of the rfci or se_rfci, a Debug interrupt will not occur immediately, but the event will be recorded by setting the DBSR[CRET] and DBSR[IDE] status bits. Note that critical return debug events should not normally be enabled unless the Debug APU is enabled to avoid corruption of CSRR0/1.

An Instruction Complete (ICMP) debug exception is signaled following execution and completion of an instruction while this event is enabled.

A mtmsr or mtdbcr0 that causes both MSR[DE] and DBCR0[IDM] to end up set, enabling precise debug mode, may cause an Imprecise (Delayed) Debug exception to be generated due to an earlier recorded event in the Debug Status register.

An Interrupt Taken (IRPT) debug exception occurs when a non-critical interrupt context switch is detected. This exception is imprecise and unordered with respect to the program flow.
Note that an IRPT Debug interrupt will only occur when detecting a non-critical interrupt on e200z759n3. The value saved in CSRR0/DSRR0 will be the address of the non-critical interrupt handler.

A Critical Interrupt Taken (CIRPT) debug exception occurs when a critical interrupt context switch is detected. This exception is imprecise and unordered with respect to the program flow. Note that a CIRPT Debug interrupt will only occur when detecting a critical interrupt on e200z759n3. The value saved in CSRR0/DSRR0 will be the address of the critical interrupt handler. Note that Critical Interrupt Taken debug events should not normally be enabled unless the Debug APU is enabled to avoid corruption of CSRR0/1.

An Unconditional Debug Event (UDE) exception occurs when the Unconditional Debug Event pin (p_ude) transitions to the asserted state.

Debug Counter Debug exceptions occur when enabled and one of the Debug counters decrements to zero.

External Debug exceptions occur when enabled and one of the External Debug Event pins (p_devt1, p_devt2) transitions to the asserted state.

The Debug Status Register (DBSR) provides a syndrome to differentiate between debug exceptions that can generate the same interrupt. For more details see Chapter 12, Debug Support.

Table 7-25 lists register settings when a Debug interrupt is taken.

Table 7-25. Debug interrupt—register settings
Register  Setting description
CSRR0/DSRR0¹  Set to the effective address of the excepting instruction for IAC, BRT, RET, CRET, and TRAP.
     Set to the effective address of the next instruction to be executed following the excepting instruction for DAC and ICMP.
     For a UDE, IRPT, CIRPT, DCNT, or DEVT type exception, set to the effective address of the instruction that the processor would have attempted to execute next if no exception conditions were present.
CSRR1/DSRR1  Set to the contents of the MSR at the time of the interrupt
MSR  UCLE 0, SPE 0, WE 0, CE —/0², EE —/0², PR 0, FP 0, ME —, FE0 0, DE 0, FE1 0, IS 0, DS 0, PMM 0, RI —
DBSR³  Unconditional Debug Event: UDE
     Instr. Complete Debug Event: ICMP
     Branch Taken Debug Event: BRT
     Interrupt Taken Debug Event: IRPT
     Critical Interrupt Taken Debug Event: CIRPT
     Trap Instruction Debug Event: TRAP
     Instruction Address Compare: {IAC1, IAC2, IAC3, IAC4}
     Data Address Compare: {DAC1R, DAC1W, DAC2R, DAC2W}
     Return Debug Event: RET
     Critical Return Debug Event: CRET
     Debug Counter Event: {DCNT1, DCNT2}
     External Debug Event: {DEVT1, DEVT2}
     and optionally, an Imprecise Debug Event flag: {IDE}
ESR  Unchanged
MCSR  Unchanged
DEAR  Unchanged
Vector  IVPR[0:15] || IVOR15[16:27] || 4'b0000
¹ Assumes that the Debug interrupt is precise
² Conditional based on control bits in HID0
³ Note that multiple DBSR bits may be set

7.7.17 System Reset interrupt

e200z759n3 implements the System Reset interrupt as defined in PowerISA 2.06. The System Reset exception is a non-maskable, asynchronous exception signaled to the processor through the assertion of system-defined signals.

A System reset may be initiated by either asserting the p_reset_b input signal or during power-on reset by asserting m_por. The m_por signal must be asserted during power up and must remain asserted for a period that allows internal logic to be reset. The p_reset_b signal must also remain asserted for a period that allows internal logic to be reset. This period is specified in the hardware specifications. If m_por or p_reset_b is asserted for less than the required interval, the results are not predictable.

When a reset request occurs, the processor branches to the system reset exception vector (value on p_rstbase[0:29] concatenated with 2’b00) without attempting to reach a recoverable state. If reset occurs during normal operation, all operations cease and the machine state is lost.
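The reset vector described above is the 30-bit p_rstbase[0:29] value followed by two zero bits. A minimal C sketch of that concatenation is shown below. The function name is hypothetical, and the sketch assumes the p_rstbase value is supplied right-justified in a 32-bit integer.

```c
#include <stdint.h>

/*
 * p_rstbase[0:29] || 2'b00: the 30-bit base forms address bits 0:29 and the
 * two least significant address bits are zero, giving a word-aligned vector.
 */
static uint32_t reset_vector(uint32_t rstbase)
{
    return (rstbase & 0x3FFFFFFFu) << 2;
}
```

Under these assumptions a p_rstbase value of 0x00000400 yields a reset vector of 0x00001000.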
CPU internal state after a reset is defined in Section 2.6, Reset settings.

Reset may also be initiated by Watchdog Timer or Debug Reset Control. Watchdog Timer and Debug Reset Control provide the capability to assert the p_wrs[0:1] and p_dbrstc[0:1] signals. External logic may factor this into the p_reset_b input signal to cause an e200z759n3 reset to occur.

Table 7-26 shows the TSR register bits associated with Watchdog Timer reset status. Note that these bits will be cleared when a processor reset occurs, thus if the p_wrs[0:1] outputs are factored into p_reset_b, they will only be seen in the “00” state by software.

Table 7-26. TSR Watchdog Timer reset status
Bits  Name  Function
2:3 (34:35)  WRS  00: No action performed by Watchdog Timer
     01: Watchdog Timer second time-out caused p_wrs[1] to be asserted
     10: Watchdog Timer second time-out caused p_wrs[0] to be asserted
     11: Watchdog Timer second time-out caused p_wrs[0] and p_wrs[1] to be asserted

Table 7-27 shows the DBSR register bits associated with reset status.

Table 7-27. DBSR most recent reset
Bits  Name  Function
2:3 (34:35)  MRR  00: No reset occurred since these bits were last cleared by software
     01: A reset occurred since these bits were last cleared by software
     10: Reserved
     11: Reserved

Table 7-28 lists register settings when a System Reset interrupt is taken.

Table 7-28. System Reset Interrupt—register settings
Register  Setting description
CSRR0  Undefined.
CSRR1  Undefined.
MSR  UCLE 0, SPE 0, WE 0, CE 0, EE 0, PR 0, FP 0, ME 0, FE0 0, DE 0, FE1 0, IS 0, DS 0, PMM 0, RI 0
ESR  Cleared
DEAR  Undefined
Vector  [p_rstbase[0:29]] || 2’b00

7.7.18 SPE/EFPU APU Unavailable interrupt (IVOR32)

The SPE APU Unavailable exception is taken if MSR[SPE] is cleared and execution of a SPE or EFPU APU instruction other than the scalar floating-point instructions (efsxxx) or brinc is attempted.
When the SPE/EFPU APU Unavailable exception occurs, the processor suppresses execution of the instruction causing the exception.

Table 7-29 lists register settings when a SPE/EFPU Unavailable interrupt is taken.

Table 7-29. SPE/EFPU Unavailable interrupt—register settings
Register  Setting description
SRR0  Set to the effective address of the excepting SPE/EFPU instruction.
SRR1  Set to the contents of the MSR at the time of the interrupt
MSR  UCLE 0, SPE 0, WE 0, CE —, EE 0, PR 0, FP 0, ME —, FE0 0, DE —, FE1 0, IS 0, DS 0, PMM 0, RI —
ESR  SPE, [VLEMI]. All other bits cleared.
MCSR  Unchanged
DEAR  Unchanged
Vector  IVPR[0:15] || IVOR32[16:27] || 4'b0000

7.7.19 Embedded Floating-point Data interrupt (IVOR33)

The Embedded Floating-point Data interrupt is taken if no higher priority exception exists and an EFPU Floating-point Data exception is generated. When a Floating-point Data exception occurs, the processor suppresses execution of the instruction causing the exception.

Table 7-30 lists register settings when an EFPU Floating-point Data interrupt is taken.

Table 7-30. Embedded Floating-point Data interrupt—register settings
Register  Setting description
SRR0  Set to the effective address of the excepting EFPU instruction.
SRR1  Set to the contents of the MSR at the time of the interrupt
MSR  UCLE 0, SPE 0, WE 0, CE —, EE 0, PR 0, FP 0, ME —, FE0 0, DE —, FE1 0, IS 0, DS 0, PMM 0, RI —
ESR  SPE, [VLEMI]. All other bits cleared.
MCSR  Unchanged
DEAR  Unchanged
Vector  IVPR[0:15] || IVOR33[16:27] || 4'b0000

7.7.20 Embedded Floating-point Round interrupt (IVOR34)

The Embedded Floating-point Round interrupt is taken when an EFPU floating-point instruction generates an inexact result and inexact exceptions are enabled.

Table 7-31 lists register settings when an EFPU Floating-point Round interrupt is taken.

Table 7-31. Embedded Floating-point Round interrupt—register settings
Register  Setting description
SRR0  Set to the effective address of the instruction following the excepting EFPU instruction.
SRR1  Set to the contents of the MSR at the time of the interrupt
MSR  UCLE 0, SPE 0, WE 0, CE —, EE 0, PR 0, FP 0, ME —, FE0 0, DE —, FE1 0, IS 0, DS 0, PMM 0, RI —
ESR  SPE, [VLEMI]. All other bits cleared.
MCSR  Unchanged
DEAR  Unchanged
Vector  IVPR[0:15] || IVOR34[16:27] || 4'b0000

7.7.21 Performance monitor interrupt (IVOR35)

Zen Z7 provides a performance monitor interrupt that may be generated by an enabled condition or event. An enabled condition or event is as follows: a PMCx register overflow condition occurs with the following settings:
• PMLCax[CE] = 1; that is, for the given counter the overflow condition is enabled.
• PMCx[OV] = 1; that is, the given counter indicates an overflow.

For a performance monitor interrupt to be signaled on an enabled condition or event, PMGC0[PMIE] must be set.

Although an exception condition may occur with MSR[EE] = 0, the interrupt cannot be taken until MSR[EE] = 1.

The priority of the performance monitor interrupt is below all other asynchronous interrupts.

For details, see Section 8.4, Performance monitor interrupt.

Table 7-32 lists register settings when a performance monitor interrupt is taken.

Table 7-32. Performance monitor interrupt—register settings
Register  Setting description
SRR0  Set to the effective address of the next instruction to be executed.
SRR1  Set to the contents of the MSR at the time of the interrupt
MSR  UCLE 0, SPE 0, WE 0, CE —, EE 0, PR 0, FP 0, ME —, FE0 0, DE —, FE1 0, IS 0, DS 0, PMM 0, RI —
ESR  Unchanged
MCSR  Unchanged
DEAR  Unchanged
Vector  IVPR[0:15] || IVOR35[16:27] || 4'b0000

7.8 Exception recognition and priorities

The following list of exception categories describes how e200z759n3 handles exceptions up to the point of signaling the appropriate interrupt to occur.
Also, instruction completion is defined as updating all architectural registers associated with that instruction as necessary, and then removing the instruction from the pipeline.
• Interrupts caused by asynchronous events (exceptions). These exceptions are further distinguished by whether they are maskable and recoverable.
— Asynchronous, non-maskable, non-recoverable: System reset by assertion of p_reset_b
Has highest priority and is taken immediately regardless of other pending exceptions or recoverability. (Includes Watchdog Timer Reset Control and Debug Reset Control)
— Asynchronous, non-maskable, possibly non-recoverable: Non-maskable interrupt by assertion of p_nmi_b
Has priority over any other pending exception except system reset conditions. Recoverability is dependent on whether MCSRR0/1 are holding essential state info and are overwritten when the NMI occurs.
— Asynchronous, maskable/non-maskable, recoverable/non-recoverable: Machine check interrupt
Has priority over any other pending exception except system reset conditions. Recoverability is dependent on the source of the exception.
— Asynchronous, maskable, recoverable: External Input, Fixed-Interval Timer, Decrementer, Critical Input, Performance Monitor, Unconditional Debug, External Debug Event, Debug Counter Event, and Watchdog Timer interrupts
Before handling this type of exception, the processor needs to reach a recoverable state. A maskable recoverable exception will remain pending until taken or cancelled by software.
• Synchronous, non instruction-based interrupts. The only exception in this category is the Interrupt Taken debug exception, recognized by an interrupt taken event. It is not considered instruction-based but is synchronous with respect to the program flow.
— Synchronous, maskable, recoverable: Interrupt Taken debug event
The machine will be in a recoverable state due to the state of the machine at the context switch triggering this event.
• Instruction-based interrupts. These interrupts are further organized by the point in instruction processing in which they generate an exception.
— Instruction Fetch: Instruction Storage, Instruction TLB, and Instruction Address Compare debug exceptions
Once these types of exceptions are detected, the excepting instruction is tagged. When the excepting instruction is next to begin execution and a recoverable state has been reached, the interrupt is taken. If an event prior to the excepting instruction causes a redirection of execution, the instruction fetch exception is discarded (but may be encountered again).
— Instruction Dispatch/Execution: Program, System Call, Data Storage, Alignment, SPE/EFPU Unavailable, Data TLB, Embedded Floating-point Data, Embedded Floating-point Round, Debug (Trap, Branch Taken, Ret) interrupts
These types of exceptions are determined during decode or execution of an instruction. The exception remains pending until all instructions before the exception-causing instruction in program order complete. The interrupt is then taken without completing the exception-causing instruction. If completing previous instructions causes an exception, that exception takes priority over the pending instruction dispatch/execution exception, which is discarded (but may be encountered again when instruction processing resumes).
— Post-Instruction Execution: Debug (Data Address Compare, Instruction Complete) interrupt
These Debug exceptions are generated following execution and completion of an instruction while the event is enabled.
If executing the instruction produces conditions for another type of exception with higher priority, that exception is taken and the post-instruction exception is discarded for the instruction (but may be encountered again when instruction processing resumes).

7.8.1 Exception priorities

Exceptions are prioritized as described in Table 7-33. Some exceptions may be masked or imprecise, which affects their priority. Non-maskable exceptions such as reset and machine check may occur at any time and are not delayed even if an interrupt is being serviced, so state information for any interrupt may be lost. Reset and certain machine checks are non-recoverable.

Table 7-33. e200z759n3 exception priorities

Priority  Exception                 IVOR  Cause

Asynchronous exceptions
0         System reset              none  Assertion of p_reset_b, Watchdog Timer Reset Control, or Debug Reset Control
1         Machine check             1     Assertion of p_mcp_b, assertion of p_nmi_b, Cache Parity errors, exception on fetch of first instruction of an interrupt handler, external bus errors
2         —                         —     —
3 [1]     Debug                     15
            • UDE: Assertion of p_ude (Unconditional Debug Event)
            • DEVT1: Assertion of p_devt1 and event enabled (External Debug Event 1)
            • DEVT2: Assertion of p_devt2 and event enabled (External Debug Event 2)
            • DCNT1: Debug Counter 1 exception
            • DCNT2: Debug Counter 2 exception
            • IDE: Imprecise Debug Event (event imprecise due to previous higher priority interrupt)
4 [1]     Critical Input            0     Assertion of p_critint_b
5 [1]     Watchdog Timer            12    Watchdog Timer first enabled time-out
6 [1]     External Input            4     Assertion of p_extint_b
7 [1]     Fixed-Interval Timer      11    Posting of a FIT exception in TSR due to programmer-specified bit transition in the Time Base register
8 [1]     Decrementer               10    Posting of a Decrementer exception in TSR due to programmer-specified Decrementer condition
9 [1]     Performance Monitor       35    Performance Monitor Enabled Condition or Event

Instruction Fetch exceptions
10        Debug: IAC (unlinked)     15    Instruction address compare match for enabled IAC debug event and DBCR0[IDM] asserted
11        ITLB Error                14    Instruction translation lookup miss in the TLB
12        Instruction Storage       3
            • Access control.
            • Byte ordering due to misaligned instruction across page boundary to pages with mismatched VLE bits, or access to page with VLE set, and E indicating little-endian.
            • Misaligned instruction fetch due to a change of flow to an odd halfword instruction boundary on a BookE (non-VLE) instruction page, due to value in LR, CTR, or xSRR0.

Instruction Dispatch/Execution interrupts
13        Program: Illegal          6     Attempted execution of an illegal instruction.
14        Program: Privileged       6     Attempted execution of a privileged instruction in user mode.
15        SPE/EFPU Unavailable      32    Any SPE or EFPU unavailable exception condition.
16        Program: Unimplemented    6     Attempted execution of an unimplemented instruction (unused by e200z759n3).
17        Debug                     15
            • BRT: Attempted execution of a taken branch instruction.
            • Trap: Condition specified in tw or twi instruction met.
            • RET: Attempted execution of an rfi instruction.
            • CRET: Attempted execution of an rfci instruction.
            Note: Exception requires the corresponding debug event to be enabled, MSR[DE]=1, and DBCR0[IDM]=1.
18        Program: Trap             6     Condition specified in tw or twi instruction met and not trap debug.
          System Call               8     Execution of the System Call (sc, se_sc) instruction.
          EFPU Floating-point Data  33    Denormalized, NaN, or Infinity data detected as input or output, or underflow, overflow, divide by zero, or invalid operation in the EFPU APU.
          EFPU Round                34    Inexact result.
19        Alignment                 5     lmw, stmw, lwarx, or stwcx. not word aligned. lharx or sthcx. not halfword aligned. dcbz with cache disabled.
20        Debug with concurrent DTLB or DSI exception, or concurrent async machine check    15
            • DAC/IAC linked [2]: Data Address Compare linked with Instruction Address Compare
            • DAC unlinked [2]: Data Address Compare unlinked
            Debug with concurrent DTLB or DSI exception, or async machine check condition on the DAC. DBSR[IDE] is also set.
            Note: Exception requires the corresponding debug event to be enabled, MSR[DE]=1, and DBCR0[IDM]=1. In this case, the Debug exception is considered imprecise, and DBSR[IDE] will be set. Saved PC will point to the load or store instruction causing the DAC event.
21        Data TLB Error            13    Data translation lookup miss in the TLB.
22        Data Storage              2
            • Access control.
            • Byte ordering due to misaligned access across page boundary to pages with mismatched E bits.
            • Cache locking due to attempt to execute a dcbtls, dcbtstls, dcblc, icbtls, or icblc in user mode with MSR[UCLE] = 0.
23        Alignment                 5     dcbz to W=1 or I=1 storage with cache enabled.
24        Debug                     15
            • IRPT: Interrupt taken (non-critical)
            • CIRPT: Critical Interrupt taken (critical only)
            Note: Exception requires the corresponding debug event to be enabled, MSR[DE]=1, and DBCR0[IDM]=1.

Post-instruction execution exceptions
25        Debug                     15
            • DAC/IAC linked [2]: Data Address Compare linked with Instruction Address Compare
            • DAC unlinked [2]: Data Address Compare unlinked
            Note: Exception requires the corresponding debug event to be enabled, MSR[DE]=1, and DBCR0[IDM]=1. Saved PC will point to the instruction following the load or store instruction causing the DAC event.
26        Debug: ICMP               15    Completion of an instruction.
            Note: Exception requires the corresponding debug event to be enabled, MSR[DE]=1, and DBCR0[IDM]=1.

[1] These asynchronous exceptions are sampled at instruction boundaries, and thus may actually occur after exceptions that are due to a currently executing instruction. If one of these exceptions occurs during execution of an instruction in the pipeline, it is not processed until the pipeline has been flushed, and the exception associated with the excepting instruction may occur first.
[2] When no Data Storage Interrupt or Data TLB Error occurs, e200z759n3 implements the data address compare debug exceptions as post-instruction exceptions, which differs from the PowerISA 2.06 definition. When a TEA (a DTLB error, DSI, or Machine Check (external TEA)) occurs in conjunction with an enabled DAC or linked DAC/IAC on a load or store class instruction, or a Debug Counter event based on a counted DAC, the Debug Interrupt takes priority, and the saved PC value will point to the load or store class instruction, rather than to the next instruction.

7.9 Interrupt processing

When an interrupt is taken, the processor uses SRR0/SRR1 for non-critical interrupts, CSRR0/CSRR1 for critical interrupts, MCSRR0/MCSRR1 for machine check interrupts, and either CSRR0/CSRR1 or DSRR0/DSRR1 for debug interrupts to save the contents of the MSR and to assist in identifying where instruction execution should resume after the interrupt is handled.

When an interrupt occurs, one of SRR0/CSRR0/DSRR0/MCSRR0 is set to the address of the instruction that caused the exception, or to the following instruction if appropriate.

SRR1 is used to save machine state (selected MSR bits) on non-critical interrupts and to restore those values when an rfi instruction is executed. CSRR1 is used to save machine state (selected MSR bits) on critical interrupts and to restore those values when an rfci instruction is executed. DSRR1 is used to save machine state (selected MSR bits) on debug interrupts when the Debug APU is enabled and to restore those values when an rfdi instruction is executed.
MCSRR1 is used to save machine state (selected MSR bits) on machine check interrupts and to restore those values when an rfmci instruction is executed.

The Exception Syndrome register is loaded with information specific to the exception type. Some interrupt types can only be caused by a single exception type, and thus do not use an ESR setting to indicate the interrupt cause.

The Machine State register is updated to preclude unrecoverable interrupts from occurring during the initial portion of the interrupt handler. Specific settings are described in Table 7-34.

For Alignment, Data Storage, or Data TLB Miss interrupts, the Data Exception Address Register (DEAR) is loaded with the address that caused the interrupt to occur.

For Machine Check interrupts, the Machine Check Syndrome register is loaded with information specific to the exception type. For certain machine checks, the MCAR is loaded with an address corresponding to the machine check.

Instruction fetch and execution resumes, using the new MSR value, at a location specific to the exception type. The location is determined by the Interrupt Vector Prefix Register (IVPR) and an Interrupt Vector Offset Register (IVOR) specific to each type of interrupt (see Table 7-2).

Table 7-34 shows the MSR settings for different interrupt categories.

Table 7-34. MSR setting due to interrupt

Bits      MSR definition  Reset setting  Non-critical interrupt  Critical interrupt  Debug interrupt  Machine check interrupt
5 (37)    UCLE            0              0                       0                   0                0
6 (38)    SPE             0              0                       0                   0                0
13 (45)   WE              0              0                       0                   0                0
14 (46)   CE              0              —                       0                   —/0 [1]          0
16 (48)   EE              0              0                       0                   —/0 [1]          0
17 (49)   PR              0              0                       0                   0                0
18 (50)   FP              0              0                       0                   0                0
19 (51)   ME              0              —                       —                   —                0
20 (52)   FE0             0              0                       0                   0                0
22 (54)   DE              0              —                       —/0 [1]             0                —/0 [1]
23 (55)   FE1             0              0                       0                   0                0
26 (58)   IS              0              0                       0                   0                0
27 (59)   DS              0              0                       0                   0                0
29 (61)   PMM             0              0                       0                   0                0
30 (62)   RI              0              —                       —                   —                0

Reserved and preserved bits are unimplemented and read as 0.
[1] Conditionally cleared based on control bits in HID0.

7.9.1 Enabling and disabling exceptions

When a condition exists that may cause an exception to be generated, it must be determined whether the exception is enabled for that condition.
• System reset exceptions cannot be masked.
• Machine check exceptions cannot be masked from sources other than the machine check pin, and certain other async machine check status settings. Assertion of p_mcp_b is only recognized if the machine check pin enable bit (HID0[EMCP]) is set. Certain machine check exceptions can be enabled and disabled through bit(s) in the HID0 register.
• Asynchronous, maskable non-critical exceptions (such as the External Input and Decrementer) are enabled by setting MSR[EE]. When MSR[EE]=0, recognition of these exception conditions is delayed. MSR[EE] is cleared automatically when a non-critical or critical interrupt is taken to mask further recognition of conditions causing those exceptions.
• Asynchronous, maskable critical exceptions (such as Critical Input and Watchdog Timer) are enabled by setting MSR[CE]. When MSR[CE]=0, recognition of these exception conditions is delayed. MSR[CE] is cleared automatically when a critical interrupt is taken to mask further recognition of conditions causing those exceptions.
• Synchronous and asynchronous Debug exceptions are enabled by setting MSR[DE]. When MSR[DE]=0, recognition of these exception conditions is masked.
MSR[DE] is cleared automatically when a Debug interrupt is taken to mask further recognition of conditions causing those exceptions. See Chapter 12, Debug Support, for more details on individual control of debug exceptions.

7.9.2 Returning from an interrupt handler

The return from interrupt (rfi, se_rfi), return from critical interrupt (rfci, se_rfci), return from debug interrupt (rfdi, se_rfdi), and return from machine check interrupt (rfmci, se_rfmci) instructions perform context synchronization by allowing previously-issued instructions to complete before returning to the interrupted process. In general, execution of return from interrupt type instructions ensures the following:
• All previous instructions have completed to a point where they can no longer cause an exception. This includes post-execute type exceptions.
• Previous instructions complete execution in the context (privilege and protection) under which they were issued.
• The rfi and se_rfi instructions copy SRR1 bits back into the MSR.
• The rfci and se_rfci instructions copy CSRR1 bits back into the MSR.
• The rfdi and se_rfdi instructions copy DSRR1 bits back into the MSR.
• The rfmci and se_rfmci instructions copy MCSRR1 bits back into the MSR.
• Instructions fetched after this instruction execute in the context established by this instruction.
• Program execution resumes at the instruction indicated by SRR0 for rfi and se_rfi, CSRR0 for rfci and se_rfci, MCSRR0 for rfmci and se_rfmci, and DSRR0 for rfdi and se_rfdi.

Note that the return instructions rfi and se_rfi may be subject to a Return type debug exception, and that the return from critical interrupt instructions rfci and se_rfci may be subject to a Critical Return type debug exception.

For a complete description of context synchronization, refer to Book E: Enhanced PowerPC™ Architecture.
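
The pairing of interrupt class, save/restore registers, and return instruction described in Sections 7.9 and 7.9.2 can be summarized as a small lookup. The following C sketch is illustrative only: the type and function names are hypothetical, and the choice of rfci for debug interrupts when the Debug APU is disabled is an inference from the statement that such interrupts then use CSRR0/CSRR1.

```c
#include <stdbool.h>

/* Interrupt classes named in Section 7.9 (enumerator names are hypothetical). */
enum irq_class { IRQ_NONCRITICAL, IRQ_CRITICAL, IRQ_DEBUG, IRQ_MACHINE_CHECK };

/* Names of the registers and return instruction tied to one interrupt class. */
struct irq_context {
    const char *save_pc;   /* register that receives the resume address  */
    const char *save_msr;  /* register that receives selected MSR bits   */
    const char *ret_insn;  /* instruction that restores both on return   */
};

/* Look up the save/restore resources for an interrupt class.
 * debug_apu_enabled only matters for IRQ_DEBUG. */
static struct irq_context irq_context_for(enum irq_class cls, bool debug_apu_enabled)
{
    switch (cls) {
    case IRQ_CRITICAL:
        return (struct irq_context){ "CSRR0", "CSRR1", "rfci" };
    case IRQ_MACHINE_CHECK:
        return (struct irq_context){ "MCSRR0", "MCSRR1", "rfmci" };
    case IRQ_DEBUG:
        /* Assumption: with the Debug APU disabled, debug interrupts share
         * the critical save/restore registers and return with rfci. */
        if (debug_apu_enabled)
            return (struct irq_context){ "DSRR0", "DSRR1", "rfdi" };
        return (struct irq_context){ "CSRR0", "CSRR1", "rfci" };
    case IRQ_NONCRITICAL:
    default:
        return (struct irq_context){ "SRR0", "SRR1", "rfi" };
    }
}
```

A handler generator or a debugger script could use such a table to pick the matching return instruction, since returning with the wrong one would restore the MSR from the wrong save register.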

7.10 Process switching

The following instructions are useful for restoring proper context during process switching:
• The msync instruction orders the effects of data memory instruction execution. All instructions previously initiated appear to have completed before the msync instruction completes, and no subsequent instructions appear to be initiated until the msync instruction completes.
• The isync instruction waits for all previous instructions to complete and then discards any fetched instructions, causing subsequent instructions to be fetched (or refetched) from memory and to execute in the context (privilege, translation, and protection) established by the previous instructions.
• The stwcx. instruction clears any outstanding reservations, ensuring that a load and reserve instruction in an old process is not paired with a store conditional instruction in a new one.

Chapter 8
Performance Monitor

This chapter describes the performance monitor, which is generally defined by the Freescale EIS and implemented as an APU on the e200z759n3 core. Although the programming model is defined by the EIS, some features are defined by the implementation; in particular, the events that can be counted.

8.1 Overview

The performance monitor provides the ability to count predefined events and processor clocks associated with particular operations, such as cache misses, mispredicted branches, or the number of cycles an execution unit stalls. The count of such events can be used to trigger the performance monitor interrupt.

The performance monitor can be used to do the following:
• Improve system performance by monitoring software execution and then recoding algorithms for more efficiency. For example, memory hierarchy behavior can be monitored and analyzed to optimize task scheduling or data distribution algorithms.
• Characterize processors in environments not easily characterized by benchmarking.
• Help system developers bring up and debug their systems.

The performance monitor comprises the following resources:
• The performance monitor mark bit in the MSR (MSR[PMM]). This bit controls which programs are monitored.
• The move to/from performance monitor registers (PMR) instructions, mtpmr and mfpmr.
• The external inputs p_pm_qual and p_pm_event.
• The external outputs p_pmc0_ov, p_pmc1_ov, p_pmc2_ov, and p_pmc3_ov.
• PMRs:
— The performance monitor counter registers PMC0–PMC3 are 32-bit counters used to count software-selectable events. UPMC0–UPMC3 provide user-level read access to these registers. Counted events are those that should be of general value. They are identified in Table 8-10.
— The performance monitor global control register PMGC0 controls the counting of performance monitor events. It takes priority over all other performance monitor control registers. UPMGC0 provides user-level read access to PMGC0.
— The performance monitor local control registers PMLCa0–PMLCa3 and PMLCb0–PMLCb3 control individual performance monitor counters. Each counter has a corresponding PMLCa and PMLCb register. UPMLCa0–UPMLCa3 and UPMLCb0–UPMLCb3 provide user-level read access to PMLCa0–PMLCa3 and PMLCb0–PMLCb3.
• The performance monitor interrupt follows the Book E interrupt model and is assigned to interrupt vector offset register 35 (IVOR35). It has the lowest priority of all asynchronous interrupts.

Software communication with the performance monitor APU is achieved through PMRs rather than SPRs.

8.2 Performance Monitor APU instructions

The Performance Monitor APU defines the mfpmr and mtpmr instructions for reading and writing the PMRs as shown below.

mfpmr    Move from Performance Monitor Register

mfpmr rD,PMRN                                                Form: X

Bits:    0:5      6:10   11:15     16:20     21:30        31
Field:   011111   rD     PMRN5:9   PMRN0:4   0101001110   /

GPR(rD) ← PMREG(PMRN)

The contents of the performance monitor register designated by PMRN are placed into GPR[rD].

When MSR[PR] = 1, specifying a performance monitor register that is not implemented or is write-only and is not privileged (that is, PMRN5=0) results in an illegal instruction exception-type Program Interrupt. When MSR[PR] = 1, specifying a performance monitor register that is not implemented or is write-only and is privileged (that is, PMRN5=1) results in a privileged instruction exception-type Program Interrupt. When MSR[PR] = 0, specifying a performance monitor register that is not implemented or is write-only results in an illegal instruction exception-type Program Interrupt.

mtpmr    Move to Performance Monitor Register

mtpmr PMRN,rS                                                Form: X

Bits:    0:5      6:10   11:15     16:20     21:30        31
Field:   011111   rS     PMRN5:9   PMRN0:4   0111001110   /

PMREG(PMRN) ← GPR(rS)

The contents of GPR[rS] are placed into the performance monitor register designated by PMRN.

When MSR[PR] = 1, specifying a performance monitor register that is not implemented or is read-only and is not privileged (that is, PMRN5=0) results in an illegal instruction exception-type Program Interrupt. When MSR[PR] = 1, specifying a performance monitor register that is not implemented or is read-only and is privileged (that is, PMRN5=1) results in a privileged instruction exception-type Program Interrupt. When MSR[PR] = 0, specifying a performance monitor register that is not implemented or is read-only results in an illegal instruction exception-type Program Interrupt.

8.3 Performance Monitor APU registers

The Freescale EIS defines a set of register resources used exclusively by the performance monitor.
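
As a worked illustration of the mfpmr and mtpmr encodings in Section 8.2, the following C sketch assembles the 32-bit instruction word from a GPR number and a 10-bit PMR number, and tests the PMRN5 privilege bit. It is a sketch under stated assumptions: the function and macro names are hypothetical, and the extended opcode values are simply the decimal readings of the bit patterns in the instruction diagrams (0101001110 and 0111001110). Note that the two halves of PMRN are swapped in the instruction word, with PMRN5:9 placed ahead of PMRN0:4.

```c
#include <stdint.h>

/* Extended opcodes (instruction bits 21:30), read from the diagrams above. */
#define XO_MFPMR 334u  /* 0b0101001110 */
#define XO_MTPMR 462u  /* 0b0111001110 */

/* Build an mfpmr/mtpmr instruction word (hypothetical helper).
 * Manual bit numbering has bit 0 as the most-significant bit, so a field
 * ending at bit N is shifted left by (31 - N). */
static uint32_t pmr_insn(uint32_t xo, unsigned gpr, unsigned pmrn)
{
    uint32_t lo = pmrn & 0x1Fu;         /* PMRN5:9 -> instruction bits 11:15 */
    uint32_t hi = (pmrn >> 5) & 0x1Fu;  /* PMRN0:4 -> instruction bits 16:20 */

    return (31u << 26)                       /* primary opcode 011111, bits 0:5 */
         | ((uint32_t)(gpr & 0x1Fu) << 21)   /* rD or rS, bits 6:10             */
         | (lo << 16)
         | (hi << 11)
         | (xo << 1);                        /* bits 21:30; bit 31 stays 0      */
}

/* PMRN5 is bit 5 of the 10-bit PMR number (bit 0 = MSB), which is value
 * bit 4 in ordinary C numbering. 1 = supervisor-level PMR. */
static int pmr_is_privileged(unsigned pmrn)
{
    return (int)((pmrn >> 4) & 1u);
}
```

For example, pmr_insn(XO_MFPMR, 3, 400) yields 0x7C70629C, the word for mfpmr r3,400 (a read of PMGC0), and pmr_is_privileged distinguishes PMGC0 (PMR 400) from its user-level alias UPMGC0 (PMR 384).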
PMRs are similar to the SPRs defined in the Book E architecture and are accessed by mtpmr and mfpmr instructions, which are also defined by the Freescale EIS. Table 8-1 lists supervisor-level (privileged) PMRs.

Table 8-1. Supervisor-level PMRs (PMR[5] = 1)

Name     Register name                                  PMR number  pmr[0–4]  pmr[5–9]  Section
PMC0     Performance monitor counter 0                  16          00000     10000     8.3.9
PMC1     Performance monitor counter 1                  17          00000     10001     8.3.9
PMC2     Performance monitor counter 2                  18          00000     10010     8.3.9
PMC3     Performance monitor counter 3                  19          00000     10011     8.3.9
PMGC0    Performance monitor global control register 0  400         01100     10000     8.3.3
PMLCa0   Performance monitor local control a0           144         00100     10000     8.3.5
PMLCa1   Performance monitor local control a1           145         00100     10001     8.3.5
PMLCa2   Performance monitor local control a2           146         00100     10010     8.3.5
PMLCa3   Performance monitor local control a3           147         00100     10011     8.3.5
PMLCb0   Performance monitor local control b0           272         01000     10000     8.3.7
PMLCb1   Performance monitor local control b1           273         01000     10001     8.3.7
PMLCb2   Performance monitor local control b2           274         01000     10010     8.3.7
PMLCb3   Performance monitor local control b3           275         01000     10011     8.3.7

User-level PMRs in Table 8-2 are read-only and are accessed with mfpmr.

Table 8-2. User-level PMRs (PMR[5] = 0) (read-only)

Name     Register name                                       PMR number  pmr[0–4]  pmr[5–9]  Section
UPMC0    User performance monitor counter 0                  0           00000     00000     8.3.10
UPMC1    User performance monitor counter 1                  1           00000     00001     8.3.10
UPMC2    User performance monitor counter 2                  2           00000     00010     8.3.10
UPMC3    User performance monitor counter 3                  3           00000     00011     8.3.10
UPMGC0   User performance monitor global control register 0  384         01100     00000     8.3.4
UPMLCa0  User performance monitor local control a0           128         00100     00000     8.3.6
UPMLCa1  User performance monitor local control a1           129         00100     00001     8.3.6
UPMLCa2  User performance monitor local control a2           130         00100     00010     8.3.6
UPMLCa3  User performance monitor local control a3           131         00100     00011     8.3.6
UPMLCb0  User performance monitor local control b0           256         01000     00000     8.3.8
UPMLCb1  User performance monitor local control b1           257         01000     00001     8.3.8
UPMLCb2  User performance monitor local control b2           258         01000     00010     8.3.8
UPMLCb3  User performance monitor local control b3           259         01000     00011     8.3.8

8.3.1 Invalid PMR references

Behavior when an invalid PMR is referenced depends on the privilege level of the register and MSR[PR]. Table 8-3 shows the response for various references to invalid PMRs.

Table 8-3. Response to an invalid PMR reference

PMR address bit 5  MSR[PR]         Response
0 (user)           x               Illegal exception
1 (supervisor)     0 (supervisor)  Illegal exception
1 (supervisor)     1 (user)        Privileged exception

8.3.2 References to read-only PMRs

If an mtpmr instruction is executed to a read-only PMR, e200z759n3 will take an Illegal exception.

8.3.3 Performance Monitor Global Control Register 0 (PMGC0)

The performance monitor global control register PMGC0 shown in Figure 8-1 controls all performance monitor counters.

Bits:    0    1     2      3:18  19:20  21:22  23    24:31
Field:   FAC  PMIE  FCECE  0     TBSEL  0      TBEE  0
PMR - 400; Read/Write; Reset - 0x0

Figure 8-1. Performance Monitor Global Control Register (PMGC0)

PMGC0 is cleared by reset. Reading this register does not change its contents. Table 8-4 describes PMGC0 fields.

Table 8-4. PMGC0 field descriptions

Bits           Name   Description
0 (32)         FAC    Freeze All Counters.
                      0 The PMCs are incremented (if permitted by other PMGC/PMLC control bits).
                      1 The PMCs are not incremented.
                      When FAC is set to 1 by hardware or software, it has no effect on PMLCax[FC]; PMLCax[FC] maintains its current value until changed by software. FAC setting by hardware is controlled by PMGC0[FCECE].
1 (33)         PMIE   Performance monitor interrupt enable.
                      0 Performance monitor interrupts are disabled.
                      1 Performance monitor interrupts are enabled and occur when an enabled condition or event occurs, at which time PMGC0[PMIE] is cleared.
                      Software can clear PMIE to prevent performance monitor interrupts. Performance monitor interrupts are caused by time base events or PMCx counter overflows.
2 (34)         FCECE  Freeze Counters on Enabled Condition or Event.
                      0 The PMCs can be incremented (if permitted by other PM control bits).
                      1 The PMCs can be incremented (if permitted by other PM control bits) only until an enabled condition or event occurs. When an enabled condition or event occurs, PMGC0[FAC] is set to 1. It is up to software to clear PMGC0[FAC] to 0.
                      An enabled condition or event is defined as one of the following:
                      • When the msb = 1 in PMCx and PMLCax[CE] = 1.
                      • When the time-base bit specified by PMGC0[TBSEL] transitions to 1 and PMGC0[TBEE] = 1.
                      The use of the trigger and freeze counter conditions depends on the enabled conditions and events described in Section 7.2, “Performance Monitor Interrupt.”
3:18 (35:50)   —      Reserved, should be cleared.
19:20 (51:52)  TBSEL  Time Base Selector. Selects the time base bit that can cause a time base transition event (the event occurs when the selected bit changes from 0 to 1).
                      00 TB63 (TBL31)
                      01 TB55 (TBL23)
                      10 TB51 (TBL19)
                      11 TB47 (TBL15)
                      Time-base frequency is implementation-dependent, so software should invoke a system service program to obtain the frequency before choosing a TBSEL value.
21:22 (53:54)  —      Reserved, should be cleared.
23 (55)        TBEE   Time base transition Event Enable.
                      0 Time base transition events are disabled.
                      1 Time base transition events are enabled. A time base transition is signaled to the performance monitor if the TB bit specified in PMGC0[TBSEL] changes from 0 to 1. Time base transition events can be used to freeze counters (PMGC0[FCECE]) or signal an exception (PMGC0[PMIE]).
                      Although the exception signal condition may occur with MSR[EE] = 0, the interrupt cannot be taken until MSR[EE] = 1. Changing PMGC0[TBSEL] while PMGC0[TBEE] is enabled may cause a false 0 to 1 transition that signals the specified action (freeze, exception) to occur immediately.
24:31 (56:63)  —      Reserved, should be cleared.

8.3.4 User Performance Monitor Global Control Register 0 (UPMGC0)

UPMGC0 provides user-level read access to PMGC0. UPMGC0 can be read by user-level software with the mfpmr instruction using PMR 384.

8.3.5 Performance Monitor Local Control A Registers (PMLCa0–PMLCa3)

The local control A registers (PMLCa0–PMLCa3) function as event selectors and give local control for the corresponding performance monitor counters. PMLCa is used in conjunction with the corresponding PMLCb register. PMLCa registers are shown in Figure 8-2.

Bits:    0   1    2    3     4     5   6:7  8:15   16  17:19  20:31
Field:   FC  FCS  FCU  FCM1  FCM0  CE  0    EVENT  0   PMP    0
PMR - 144, 145, 146, 147; Read/Write; Reset - 0x0

Figure 8-2. Performance Monitor Local Control A Registers (PMLCa0–PMLCa3)

PMLCa registers are cleared by reset. Table 8-5 describes PMLCa fields.

Table 8-5. PMLCa0–PMLCa3 field descriptions

Bits           Name   Description
0 (32)         FC     Freeze Counter.
                      0 The PMC can be incremented (if enabled by other performance monitor control fields).
                      1 The PMC will not be incremented.
1 (33)         FCS    Freeze Counter in Supervisor state.
                      0 The PMC can be incremented (if enabled by other performance monitor control fields).
                      1 The PMC will not be incremented if MSR[PR] is cleared.
2 (34)         FCU    Freeze Counter in User state.
                      0 The PMC can be incremented (if enabled by other performance monitor control fields).
                      1 The PMC will not be incremented if MSR[PR] is set.
3 (35)         FCM1   Freeze Counter while Mark is set.
                      0 The PMC can be incremented (if enabled by other performance monitor control fields).
                      1 The PMC will not be incremented if MSR[PMM] is set.
4 (36)         FCM0   Freeze Counter while Mark is cleared.
                      0 The PMC can be incremented (if enabled by other performance monitor control fields).
                      1 The PMC will not be incremented if MSR[PMM] is cleared.
5 (37)         CE     Condition Enable.
                      0 Overflow conditions for PMCn cannot occur (PMCn cannot cause interrupts or freeze counters).
                      1 An overflow condition is present when the most-significant bit of PMCn is equal to 1.
                      It is recommended that CE be cleared when counter PMCn is selected for chaining.
6:7 (38:39)    —      Reserved for EVENT expansion, should be cleared.
8:15 (40:47)   EVENT  Event selector. See Section 8.7, Event selection.
16 (48)        —      Reserved, should be cleared.
17:19 (49:51)  PMP    Performance Monitor Watchpoint Periodicity Select.
                      000 Performance Monitor Watchpoint x asserts on any change of counter x bit 32 (period = 2^31)
                      001 Performance Monitor Watchpoint x asserts on any change of counter x bit 43 (period = 2^20)
                      010 Performance Monitor Watchpoint x asserts on any change of counter x bit 49 (period = 2^14)
                      011 Performance Monitor Watchpoint x asserts on any change of counter x bit 55 (period = 2^8)
                      100 Performance Monitor Watchpoint x asserts on any change of counter x bit 59 (period = 2^4)
                      101 Performance Monitor Watchpoint x asserts on any change of counter x bit 61 (period = 2^2)
                      110 Performance Monitor Watchpoint x asserts on any change of counter x bit 62 (period = 2^1)
                      111 Performance Monitor Watchpoint x asserts on any change of counter x bit 63 (period = 2^0) [1]
20:31 (52:63)  —      Reserved, should be cleared.

[1] For certain events that may count an even number of times per cycle, this watchpoint is not guaranteed to assert with PMP=111.

8.3.6 User Performance Monitor Local Control A Registers (UPMLCa0–UPMLCa3)

The PMLCa register contents are aliased to UPMLCa0–UPMLCa3, which can be read by user-level software with mfpmr using the PMR numbers in Table 8-2.

8.3.7 Performance Monitor Local Control B Registers (PMLCb0–PMLCb3)

Local control B registers (PMLCb0–PMLCb3) specify triggering conditions, a threshold value, and a multiple to apply to a threshold event selected for the corresponding performance monitor counter. For the e200z759n3, thresholding is supported only for PMC0 and PMC1. PMLCb is used in conjunction with the corresponding PMLCa register.

Bits:    0  1:3        4  5:7         8  9:12       13  14:17       18         19:20  21:23      24:25  26:31
Field:   0  TRIGONCTL  0  TRIGOFFCTL  0  TRIGONSEL  0   TRIGOFFSEL  TRIGGERED  0      THRESHMUL  0      THRESHOLD
PMR - 272, 273, 274, 275; Read/Write; Reset - 0x0

Figure 8-3. Performance Monitor Local Control B Registers (PMLCb0–PMLCb3)

PMLCb is cleared by reset. Table 8-6 describes PMLCb fields.

Table 8-6. PMLCb0–PMLCb3 field descriptions

Bits           Name        Description
0 (32)         —           Reserved, should be cleared.
1:3 (33:35)    TRIGONCTL   Trigger-on Control Class - Class of Trigger-on source.
                           000 Trigger-on control is disabled if TRIGONSEL is 0000 (that is, counting is not affected by triggers). All other values for TRIGONSEL are reserved.
                           001 Trigger-on control based on selected PMC condition(s)
                           010 Trigger-on based on selected processor event(s)
                           011 Trigger-on based on selected hardware signal(s)
                           100 Trigger-on based on selected watchpoint occurrence (watchpoint #0–15)
                           101 Trigger-on based on selected watchpoint occurrence (extension for watchpoint #16–31)
                           11x Reserved
                           Indicates the condition under which triggering to start counting occurs.
No triggering will occur while PMGC0FAC or PMLCanFC is set to ‘1’. Reserved, should be cleared. TRIGOFFCNTL Trigger-off Control Class - Class of Trigger-off source 000 Trigger-off control is disabled if TRIGOFFSEL is 0000 (i.e. counting is not affected by triggers) All other values for TRIGOFFSEL are reserved. 001 Trigger-off control based on selected PMC condition(s) 010 Trigger-off based on selected processor event(s) 011 Trigger-off based on selected hardware signal(s) 100 Trigger-off based on selected watchpoint occurrence (watchpoint #0–15) 101 Trigger-off based on selected watchpoint occurrence (extension for watchpoint #16-31) 11x Reserved Indicates the condition under which triggering to stop counting occurs. No triggering will occur while PMGC0FAC or PMLCanFC is set to ‘1’. — Reserved, should be cleared. e200z759n3 Core Reference Manual, Rev. 2 536 Freescale Semiconductor Table 8-6. PMLCb0–PMLCb3 field descriptions (continued) Bits Name 9:12 (41:44) TRIGONSEL Description Trigger-on Source Select - Source Select based on setting of TRIGONCTL TRIGONCTL = 000: 0000 Trigger-on control is disabled 0001 –1111 : Reserved TRIGONCTL = 001: This field should be to the ID of the PMCy that should trigger event counting to start. When PMCy overflows, the trigger will be generated. When TRIGONSEL = PMCx (i.e. self-select), no triggering will occur due to any counter change. If TRIGONSEL = TRIGOFFSEL, triggering results are undefined. 0000 Trigger-on when PMC0OV transitions to a ‘1’. 0001 Trigger-on when PMC1OV transitions to a ‘1’. 0010 Trigger-on when PMC2OV transitions to a ‘1’. 0011 Trigger-on when PMC3OV transitions to a ‘1’. 0100 – 1111 : Reserved TRIGONCTL = 010: 0000 Trigger-on when next processor interrupt occurs (software may want to set PMGC0PMIE = 0 for this setting). 
0001 – 1111 : Reserved TRIGONCTL = 011: 0000 Trigger on assertion of p_devnt_out[0] 0001 Trigger on assertion of p_devnt_out[1] 0010 Trigger on assertion of p_devnt_out[2] 0011 Trigger on assertion of p_devnt_out[3] 0100 Trigger on assertion of p_devnt_out[4] 0101 Trigger on assertion of p_devnt_out[5] 0110 Trigger on assertion of p_devnt_out[6] 0111 Trigger on assertion of p_devnt_out[7] 1000 Trigger on rise of p_pmcn_qual input 1001 – 1111 : Reserved TRIGONCTL = 100: 0000 Trigger-on based on watchpoint #0 occurrence 0001 Trigger-on based on watchpoint #1 occurrence 0010 Trigger-on based on watchpoint #2 occurrence ... 1110 Trigger-on based on watchpoint #14 occurrence 1111 Trigger-on based on watchpoint #15 occurrence TRIGONCTL = 101: 0000 Trigger-on based on watchpoint #16 occurrence 0001 Trigger-on based on watchpoint #17 occurrence 0010 Trigger-on based on watchpoint #18 occurrence ... 1100 Trigger-on based on watchpoint #28 occurrence 1101 Trigger-on based on watchpoint #29 occurrence 1110 – 1111 : Reserved 13 (45) — Reserved, should be cleared. e200z759n3 Core Reference Manual, Rev. 2 Freescale Semiconductor 537 Table 8-6. PMLCb0–PMLCb3 field descriptions (continued) Bits Name 14:17 (46:49) TRIGOFFSEL Description Trigger-off Source Select - Source Select based on setting of TRIGOFFCTL TRIGOFFCTL = 000: 0000 Trigger-off control is disabled 0001 – 1111 : Reserved TRIGOFFCTL = 001: This field should be to the ID of the PMCy that should trigger event counting to stop. When PMCy overflows, the trigger will be generated. When TRIGOFFSEL = PMCx (i.e. self-select), no triggering will occur due to any counter change. If TRIGONSEL = TRIGOFFSEL, triggering results are undefined. 0000 Trigger-off when PMC0OV transitions to a ‘1’. 0001 Trigger-off when PMC1OV transitions to a ‘1’. 0010 Trigger-off when PMC2OV transitions to a ‘1’. 0011 Trigger-off when PMC3OV transitions to a ‘1’. 
    0100–1111 Reserved
  TRIGOFFCTL = 010:
    0000 Trigger-off when next processor interrupt occurs (software may want to set PMGC0[PMIE] = 0 for this setting).
    0001–1111 Reserved
  TRIGOFFCTL = 011:
    0000 Trigger-off based on assertion of p_devnt_out[0]
    0001 Trigger-off based on assertion of p_devnt_out[1]
    0010 Trigger-off based on assertion of p_devnt_out[2]
    0011 Trigger-off based on assertion of p_devnt_out[3]
    0100 Trigger-off based on assertion of p_devnt_out[4]
    0101 Trigger-off based on assertion of p_devnt_out[5]
    0110 Trigger-off based on assertion of p_devnt_out[6]
    0111 Trigger-off based on assertion of p_devnt_out[7]
    1000 Trigger-off based on fall of p_pmcn_qual input
    1001–1111 Reserved
  TRIGOFFCTL = 100:
    0000 Trigger-off based on watchpoint #0 occurrence
    0001 Trigger-off based on watchpoint #1 occurrence
    0010 Trigger-off based on watchpoint #2 occurrence
    ...
    1110 Trigger-off based on watchpoint #14 occurrence
    1111 Trigger-off based on watchpoint #15 occurrence
  TRIGOFFCTL = 101:
    0000 Trigger-off based on watchpoint #16 occurrence
    0001 Trigger-off based on watchpoint #17 occurrence
    0010 Trigger-off based on watchpoint #18 occurrence
    ...
    1100 Trigger-off based on watchpoint #28 occurrence
    1101 Trigger-off based on watchpoint #29 occurrence
    1110–1111 Reserved

18 (50) TRIGGERED
  Triggered
    0 Counter has not been triggered
    1 Counter has been triggered
  TRIGGERED can be set or cleared by hardware or software. Setting of TRIGGERED by hardware is controlled by PMLCbx[TRIGONCTL]. If PMLCbx[TRIGONCTL] is set to enable trigger-on control, TRIGGERED will be set by hardware when the next trigger-on event occurs and TRIGGERED is currently cleared. Clearing of TRIGGERED by hardware is controlled by PMLCbx[TRIGOFFCTL].
  If PMLCbx[TRIGOFFCTL] is set to enable trigger-off control, TRIGGERED will be cleared by hardware when the next trigger-off event occurs and TRIGGERED is currently set. The state of TRIGGERED qualifies counting if either PMLCbx[TRIGONCTL] or PMLCbx[TRIGOFFCTL] is set to enable triggering (other qualifiers on counting, such as PMGC0[FAC] and the PMLCa controls, operate independently of TRIGGERED). If both PMLCbx[TRIGONCTL] and PMLCbx[TRIGOFFCTL] are cleared to disable triggering, the state of TRIGGERED has no effect on counting. TRIGGERED has no effect on PMLCax[FC]; PMLCax[FC] maintains its current value until changed by software.

19:20 (51:52) -
  Reserved, should be cleared.

21:23 (53:55) THRESHMUL [1]
  Threshold multiple.
    000 Threshold field is multiplied by 1 (PMLCbn[THRESHOLD] × 1)
    001 Threshold field is multiplied by 2 (PMLCbn[THRESHOLD] × 2)
    010 Threshold field is multiplied by 4 (PMLCbn[THRESHOLD] × 4)
    011 Threshold field is multiplied by 8 (PMLCbn[THRESHOLD] × 8)
    100 Threshold field is multiplied by 16 (PMLCbn[THRESHOLD] × 16)
    101 Threshold field is multiplied by 32 (PMLCbn[THRESHOLD] × 32)
    110 Threshold field is multiplied by 64 (PMLCbn[THRESHOLD] × 64)
    111 Threshold field is multiplied by 128 (PMLCbn[THRESHOLD] × 128)

24:25 (56:57) -
  Reserved, should be cleared.

26:31 (58:63) THRESHOLD [1]
  Threshold
  Only events that exceed this value multiplied by the multiple selected in THRESHMUL are counted. Events to which a threshold value applies are implementation dependent, as are the unit (for example, duration in cycles) and the granularity with which the threshold value is interpreted. By varying the threshold value, software can obtain a profile of the event characteristics subject to thresholding by monitoring a program repeatedly using a different threshold value each time.

[1] These fields are not implemented in PMLCb2 and PMLCb3, and read as zero.
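As a rough illustration of the field layout in Table 8-6, the following C sketch packs the fields whose bit positions are listed above into a 32-bit PMLCb image. It assumes the Power Architecture convention that bit 0 is the most significant bit of the 32-bit register. The function names (pmlcb_field, pmlcb_pack, pmlcb_triggered) are hypothetical and are not taken from any vendor header. The trigger control class fields are omitted because their bit positions are not listed in this part of the table, and writing the image to the register with mtpmr is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Place `value` into the field occupying bits msb..lsb of a 32-bit
 * register, using Power Architecture numbering (bit 0 = most significant). */
uint32_t pmlcb_field(uint32_t value, unsigned msb, unsigned lsb)
{
    assert(msb <= lsb && lsb <= 31u);
    unsigned width = lsb - msb + 1u;
    uint32_t mask = (width == 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    assert((value & ~mask) == 0u);      /* value must fit in the field */
    return value << (31u - lsb);
}

/* Hypothetical helper: build a PMLCb image from the Table 8-6 fields
 * TRIGONSEL (9:12), TRIGOFFSEL (14:17), THRESHMUL (21:23) and
 * THRESHOLD (26:31). Reserved bits are left cleared. */
uint32_t pmlcb_pack(uint32_t trigonsel, uint32_t trigoffsel,
                    uint32_t threshmul, uint32_t threshold)
{
    return pmlcb_field(trigonsel, 9u, 12u)
         | pmlcb_field(trigoffsel, 14u, 17u)
         | pmlcb_field(threshmul, 21u, 23u)
         | pmlcb_field(threshold, 26u, 31u);
}

/* Extract the TRIGGERED flag (bit 18) from a PMLCb image. */
int pmlcb_triggered(uint32_t pmlcb)
{
    return (int)((pmlcb >> (31u - 18u)) & 1u);
}
```

For example, pmlcb_pack(1, 2, 3, 10) returns 0x0008830A: TRIGONSEL = 0001 (PMC1), TRIGOFFSEL = 0010 (PMC2), THRESHMUL = 011 (multiply by 8), and THRESHOLD = 10.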
8.3.8 User Performance Monitor Local Control B registers (UPMLCb0–UPMLCb3)

The contents of PMLCb0–PMLCb3 are aliased to UPMLCb0–UPMLCb3, which can be read by user-level software with mfpmr using the PMR numbers in Table 8-2.

8.3.9 Performance Monitor Counter registers (PMC0–PMC3)

The performance monitor counter registers PMC0–PMC3 shown in Figure 8-4 are 32-bit counters that can be programmed to generate overflow event signals when they overflow. Each counter is enabled to count up to 128 processor events.

[Register diagram: bit 0 is OV; bits 1:31 are Counter Value]
PMR - 16, 17, 18, 19; Read/Write; Reset - 0x0
Figure 8-4. Performance Monitor Counter registers (PMC0–PMC3)

PMCs are cleared by reset. Table 8-7 describes the PMC register fields.

Table 8-7. PMC0–PMC3 field descriptions

0 (32) OV
  Overflow
    0 Counter has not reached an overflow state.
    1 Counter has reached an overflow state.
  Note: this bit is not sticky, and thus will not remain set if the counter subsequently counts past 0xFFFF_FFFF.

1:31 (33:63) Counter Value
  Indicates the number of occurrences of the specified event. The minimum value for a counter is 0 (0x0000_0000) and the maximum value is 4,294,967,295 (0xFFFF_FFFF). A counter can increment by 0, 1, 2, 3, or 4 (based on the number of events occurring in a given counter cycle) up to the maximum value and then wraps to the minimum value. A counter enters the overflow state when the high-order bit is set. A performance monitor interrupt handler can easily identify overflowed counters, even if the interrupt is masked for many cycles (during which the counters may continue incrementing). A high-order bit is normally set only when the counter increments from a value below 2,147,483,648 (0x8000_0000) to a value greater than or equal to 2,147,483,648 (0x8000_0000).

NOTE
Initializing PMCs to overflowed values is discouraged.
If an overflowed value is loaded into a PMCn that held a non-overflowed value (and PMGC0[PMIE], PMLCan[CE], and MSR[EE] are set), an interrupt may be falsely generated before any events are counted.

The response to an overflow condition depends on the configuration, as follows:
• If PMLCan[CE] is clear, no special actions occur on overflow of PMCn: the counter continues incrementing, and no event is signaled.
• If PMLCan[CE] and PMGC0[FCECE] are both set, all counters are frozen when PMCn overflows.
• If PMLCan[CE] and PMGC0[PMIE] are set, an exception is signaled on overflow of PMCn. Performance monitor interrupts are masked when MSR[EE] = 0. An exception may be signaled while MSR[EE] = 0, but the interrupt is not taken until MSR[EE] = 1 and is only guaranteed to be taken if the overflow condition is still present (i.e., the counter has not counted past 0xFFFF_FFFF, in which case the OV bit would become cleared) and the configuration has not been changed in the meantime to disable the exception. If PMLCan[CE] or PMGC0[PMIE] is cleared, the exception is no longer signaled.

The following sequence is recommended for setting counter values and configurations:
1. Set PMGC0[FAC] to freeze the counters.
2. Using mtpmr instructions, initialize counters and configure control registers.
3. Release the counters by clearing PMGC0[FAC] with a final mtpmr.

8.3.10 User Performance Monitor Counter registers (UPMC0–UPMC3)

The contents of PMC0–PMC3 are aliased to UPMC0–UPMC3, which can be read by user-level software with the mfpmr instruction using the PMR numbers in Table 8-2.

8.4 Performance monitor interrupt

The performance monitor interrupt is triggered by an enabled condition or event. The enabled conditions or events defined for the e200z759n3 are the following:
• A PMCn overflow condition occurs when both of the following are true:
  - The counter’s overflow condition is enabled; PMLCan[CE] is set.
  - The counter indicates an overflow; PMCn[OV] is set.
• A time base event occurs with the following settings:
  - Time base events are enabled with PMGC0[TBEE] = 1
  - The TBL bit specified in PMGC0[TBSEL] changes from 0 to 1

The two performance monitor exception conditions differ in whether they are level-sensitive or edge-sensitive. A performance monitor exception condition that is caused by a PMCn overflow condition is level-sensitive to the values of PMLCan[CE] and PMCn[OV]: as long as both values are set to ‘1’, the exception condition continues to exist and the performance monitor interrupt can be taken if the remaining performance monitor interrupt gating conditions are met. However, the exception due to the time base event is set only when both PMGC0[TBEE] = 1 and the transition from ‘0’ to ‘1’ occurs in the specified TBL bit. This condition is not cleared once it occurs, regardless of whether the TBL bit subsequently transitions to a ‘0’, but this exception is automatically cleared whenever any performance monitor interrupt is subsequently taken.

If PMGC0[PMIE] is set, an enabled condition or event triggers the signaling of a performance monitor exception. If PMGC0[FCECE] is set, an enabled condition or event forces all performance monitor counters to freeze.

Although the performance monitor exception condition may occur with MSR[EE] = 0, the interrupt cannot be taken until MSR[EE] = 1. If PMCn overflows and would signal an exception (PMLCan[CE] = 1 and PMGC0[PMIE] = 1) while MSR[EE] = 0, and freezing of the counters is not enabled (PMGC0[FCECE] is clear), it is possible that PMCn could wrap around to all zeros again without the performance monitor interrupt being taken.

Interrupt handlers should clear a counter overflow condition or the corresponding Condition Enable to prevent a repeated interrupt for the same event.
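The level-sensitive overflow condition and the handler recommendation above can be modeled on a host in C. The sketch below is a software model only: it holds the four counters and their Condition Enable bits in plain arrays instead of reading them with mfpmr, and the names pm_overflow_pending and pm_service_overflows are hypothetical. Clearing only the OV bit while keeping the low-order count is one possible handler policy assumed here; a handler could equally reload a start value or clear the Condition Enable.

```c
#include <assert.h>
#include <stdint.h>

#define PM_NUM_COUNTERS 4u
#define PMC_OV 0x80000000u   /* PMCn[OV]: bit 0, the high-order bit */

/* Model of the level-sensitive overflow condition: counter n contributes
 * an exception condition while its Condition Enable (ce[n]) and PMCn[OV]
 * are both set. Returns a bitmask with bit n set for each such counter. */
unsigned pm_overflow_pending(const uint32_t pmc[PM_NUM_COUNTERS],
                             const uint8_t ce[PM_NUM_COUNTERS])
{
    assert(pmc != 0 && ce != 0);
    unsigned mask = 0u;
    for (unsigned n = 0u; n < PM_NUM_COUNTERS; n++) {
        if (ce[n] != 0u && (pmc[n] & PMC_OV) != 0u) {
            mask |= 1u << n;
        }
    }
    return mask;
}

/* Handler-side step: clear the overflow condition of every pending counter
 * so that the same event does not raise the interrupt again. Only the OV
 * bit is cleared here (assumed policy); the low-order count is preserved. */
unsigned pm_service_overflows(uint32_t pmc[PM_NUM_COUNTERS],
                              const uint8_t ce[PM_NUM_COUNTERS])
{
    unsigned serviced = pm_overflow_pending(pmc, ce);
    for (unsigned n = 0u; n < PM_NUM_COUNTERS; n++) {
        if ((serviced >> n) & 1u) {
            pmc[n] &= ~PMC_OV;
        }
    }
    return serviced;
}
```

A counter whose OV bit is set but whose Condition Enable is clear is left untouched by this model, matching the rule that no event is signaled for it.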
The priority of the performance monitor interrupt is specified in Section 7.8.1, Exception priorities.

8.5 Event counting

This section describes configurability and specific unconditional counting modes.

8.5.1 MSR-based context filtering

Counting can be configured to be conditionally enabled if conditions in the processor state match a software-specified condition. Because a software task scheduler may switch a processor’s execution among multiple processes and because statistics on only a particular process may be of interest, a facility is provided to mark a process. The performance monitor mark bit, MSR[PMM], is used for this purpose. System software may set this bit when a marked process is running. This enables statistics to be gathered only during the execution of the marked process. The states of MSR[PR] and MSR[PMM] define a state that the processor (supervisor or user) and the process (marked or unmarked) may be in at any time. If this state matches an individual state specified by the PMLCan[FCS, FCU, FCM1, FCM0] fields, counting is enabled for PMCn.

For the e200z759n3 implementation, a given event may or may not support MSR-based context filtering. For events that do not support MSR-based context filtering, the FCS, FCU, FCM1, and FCM0 controls have no effect on the counting of that event.

The processor states and the settings of the FCS, FCU, FCM1, and FCM0 bits in PMLCan necessary to enable monitoring of each processor state are shown in Table 8-8.

Table 8-8. Processor States and PMLCa0–PMLCa3 bit settings

Processor State | FCS | FCU | FCM1 | FCM0
All (no context filtering) | 0 | 0 | 0 | 0
Marked | 0 | 0 | 0 | 1
Not marked | 0 | 0 | 1 | 0
Supervisor | 0 | 1 | 0 | 0
Marked and supervisor | 0 | 1 | 0 | 1
Not marked and supervisor | 0 | 1 | 1 | 0
User | 1 | 0 | 0 | 0
Marked and user | 1 | 0 | 0 | 1
Not marked and user | 1 | 0 | 1 | 0
None (counting disabled) | X | X | 1 | 1
None (counting disabled) | 1 | 1 | X | X

8.6 Examples

The following sections provide examples of how to use the performance monitor facility.

8.6.1 Chaining counters

The counter chaining feature can be used to allow a higher event count than is possible with a single counter. Chaining two counters together effectively adds 32 bits to a counter register where rollover of the first counter generates a carry out feeding the second counter. By defining the event of interest to be another PMC’s rollover occurrence, the chained counter increments each time the first counter rolls over to zero. Multiple counters may be chained together.

Because the entire chained value cannot be read in a single instruction, a rollover may occur between counter reads, producing an inaccurate value. A sequence like the following is necessary to read the complete chained value when it spans multiple counters and the counters are not frozen. The example shown is for a two-counter case.

    loop:
        mfpmr   Rx,pmctr1       #load from upper counter
        mfpmr   Ry,pmctr0       #load from lower counter
        mfpmr   Rz,pmctr1       #load from upper counter
        cmp     cr0,0,Rz,Rx     #see if ‘old’ = ‘new’
        bc      4,2,loop        #loop if carry occurred between reads

The comparison and loop are necessary to ensure that a consistent set of values has been obtained. The above sequence is not necessary if the counters are frozen.

8.6.2 Thresholding

Threshold event measurement enables the counting of duration and usage events. For example, data cache load miss cycles (events C0:xx and C1:xx) require a threshold value. A data cache load miss cycles event is counted only when the number of cycles spent waiting for the miss is greater than the threshold.
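The thresholding rule can be sketched in C using the THRESHMUL and THRESHOLD encodings from Table 8-6, where the effective threshold is the 6-bit THRESHOLD value multiplied by 1, 2, 4, ..., 128. The structure and function names below are hypothetical, and the sketch assumes the threshold unit is cycles; the manual notes that the unit and granularity are implementation dependent.

```c
#include <assert.h>
#include <stdint.h>

/* THRESHMUL (3 bits) and THRESHOLD (6 bits) as described in Table 8-6. */
struct pm_threshold {
    uint8_t threshmul;   /* 0..7, selects a multiple of 2^threshmul (1..128) */
    uint8_t threshold;   /* 0..63 */
};

/* Effective threshold: THRESHOLD multiplied by the selected multiple. */
uint32_t pm_threshold_cycles(struct pm_threshold t)
{
    assert(t.threshmul <= 7u && t.threshold <= 63u);
    return (uint32_t)t.threshold << t.threshmul;
}

/* Pick the finest-grained encoding whose effective value does not exceed
 * `cycles`; requests above 63 * 128 = 8064 saturate at the maximum. */
struct pm_threshold pm_threshold_encode(uint32_t cycles)
{
    struct pm_threshold t = { 7u, 63u };
    for (uint8_t mul = 0u; mul <= 7u; mul++) {
        if (cycles <= (63u << mul)) {
            t.threshmul = mul;
            t.threshold = (uint8_t)(cycles >> mul);
            break;
        }
    }
    return t;
}

/* An event is counted only when its duration is greater than the threshold. */
int pm_threshold_event_counted(uint32_t duration_cycles, struct pm_threshold t)
{
    return duration_cycles > pm_threshold_cycles(t);
}
```

With this sketch, a requested threshold of 100 cycles encodes as THRESHMUL = 001 and THRESHOLD = 50 (exactly 100), while 1000 cycles encodes as THRESHMUL = 100 and THRESHOLD = 62, giving an effective threshold of 992 because the coarser multiple cannot represent 1000 exactly.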
Because the data cache load miss cycles event is supported by two counters and each counter has an individual threshold, one execution of a performance monitor program can sample two different threshold values. Measuring code performance with multiple concurrent thresholds may expedite code profiling significantly.

8.7 Event selection

Event selection is specified through the PMLCan registers described in Section 8.3.5, Performance Monitor Local Control A Registers (PMLCa0–PMLCa3). The event-select fields in PMLCan[EVENT] are described in Table 8-10, which lists encodings for the selectable events to be monitored. Table 8-10 establishes a correlation between each counter, events to be traced, and the pattern required for the desired selection.

The Spec/Nonspec column indicates whether the event count includes any occurrences due to processing that was not architecturally required by the Power Architecture sequential execution model (speculative processing).
• Speculative counts include speculative operations that were later flushed.
• Nonspeculative counts do not include speculative operations, which are flushed.

The PR, PMM filtering column indicates whether a given event supports MSR-based context filtering.

Table 8-9 describes how event types are indicated in Table 8-10.

Table 8-9. Event types

Event type | Label | Description
Reference | Ref:# | Shared across counters PMC0–PMC3.
Common | Com:# | Shared across counters PMC0–PMC3.
Counter-specific | C[0–3]:# | Counted only on one or more specific counters. The notation indicates the counter to which an event is assigned. For example, an event assigned to counter PMC0 is shown as C0:#.

Table 8-10 describes performance monitor events.

Table 8-10.
Performance monitor event selection

Number | Event | Spec/nonspec | PR, PMM filtering [1] | Count description

General events
Com:0 | Nothing | Nonspec | - | Register counter holds current value
Ref:1 [2] | Processor cycles | Nonspec | yes | Every processor cycle not in waiting, halted, stopped states and not in a debug session.
Com:2 [3] | Instructions completed | Nonspec | yes | Completed instructions. 0, 1, 2, or 3 per cycle.
Com:3 [2] | Processor cycles with 0 instructions issued | Nonspec | yes | Ref:1 cycles with no instructions entering execution
Com:4 [2] | Processor cycles with 1 instruction issued | Nonspec | yes | Ref:1 cycles with one instruction entering execution
Com:5 [2] | Processor cycles with 2 instructions issued | Nonspec | yes | Ref:1 cycles with two instructions entering execution
Com:6 [3] | Instruction words fetched | Spec | yes | Fetched instruction words. 0, 1, 2, 3, or 4 per cycle. (Note that an instruction word may hold 1 or 2 instructions, or 2 partial instructions, when fetching from a VLE page.)
Com:7 | - | - | - | -
Com:8 | PM_EVENT transitions | - | - | 0 to 1 transitions on the p_pm_event input.
Com:9 | PM_EVENT cycles | - | - | Processor (Ref:1) cycles that occur when the p_pm_event input is asserted.

Instruction types completed
Com:10 [3] | Branch instructions completed | Nonspec | yes | Completed branch instructions, includes branch and link type instructions
Com:11 [3] | Branch and link type instructions completed | Nonspec | yes | Completed branch and link type instructions
Com:12 [3] | Conditional branch instructions completed | Nonspec | yes | Completed conditional branch instructions
Com:13 [3] | Taken Branch instructions completed | Nonspec | yes | Completed branch instructions that were taken. Includes branch and link type instructions.
Com:14 [3] | Taken Conditional Branch instructions completed | Nonspec | yes | Completed conditional branch instructions that were taken.
Com:15 [3] | Load instructions completed | Nonspec | yes | Completed load, load-multiple type instructions
Com:16 [3] | Store instructions completed | Nonspec | yes | Completed store, store-multiple type instructions
Com:17 [3] | Load micro-ops completed | Nonspec | yes | Completed load micro-ops. (l*, evl*, load-update (1 load micro-op), load-multiple (1–32 micro-ops), dcbt, dcbtls, dcbtst, dcbtstls, and dcbtst, dcbf, dcblc, dcbst, icbi, icblc, icbt, icbtls). Misaligned loads crossing a 64-bit boundary count as two micro-ops.
Com:18 [3] | Store micro-ops completed | Nonspec | yes | Completed store micro-ops. (st*, evst*, store-update (1 store micro-op), store-multiple (1–32 micro-ops), dcbi, dcbz). Misaligned stores crossing a 64-bit boundary count as two micro-ops.
Com:19 [3] | Integer instructions completed | Nonspec | yes | Completed simple integer instructions (not a load-type/store-type/branch/mul/div, EFPU, or SPE)
Com:20 [3] | Multiply instructions completed | Nonspec | yes | Completed Multiply instructions (non-EFPU)
Com:21 [3] | Divide instructions completed | Nonspec | yes | Completed Divide instructions including SPE (non-EFPU)
Com:22 [3] | Divide instruction execution cycles | Nonspec | yes | Cycles of execution for all Divide instructions (non-EFPU)
Com:23 [3] | SPE/EFPU instructions completed | Nonspec | yes | Completed SPE/EFPU instructions. Does not include SPE/EFPU load and store instructions.
Com:24 [3] | SPE simple instructions completed | Nonspec | yes | Completed SPE simple instructions. All SPE instructions included except SPE load and store instructions, div, dotp, mul and mac-type instructions.
Com:25 [3] | SPE mul/mac/dotp instructions completed | Nonspec | yes | Completed SPE mul/mac/dotp instructions. Does not include other SPE instructions, or brinc instructions.
Com:26 [3] | EFPU FP instructions completed | Nonspec | yes | Completed EFPU FP (evfs, efs) instructions.
Com:27 [3] | Number of return from interrupt instructions | Nonspec | yes | Includes all types of return from interrupts (i.e., rfi, rfci, rfdi, rfmci, and VLE variants)

Branch prediction and execution events
Com:28 [3] | Finished branches that miss the BTB | Spec | yes | Includes all taken branch instructions that missed in the BTB
Com:29 [3] | Branches mispredicted (for any reason) | Spec | yes | Counts branch instructions mispredicted due to direction or target (for example, if the LR or CTR contents change).
Com:30 [3] | Branches in the BTB mispredicted due to direction prediction | Spec | yes | Counts branch instructions that hit the BTB but were mispredicted due to direction prediction.
Com:31 [3] | Incorrect target prediction using the link stack | Spec | yes | -
Com:32 [3] | BTB hits | Spec | yes | Branch instructions that hit in the BTB
Com:33 | - | - | - | -
Com:34 | - | - | - | -

Pipeline stalls
Com:35 | - | - | - | -
Com:36 | - | - | - | -
Com:37 [2] | Cycles decode stalled due to no instructions available | Spec | yes | No instruction available to decode
Com:38 [2] | Cycles issue stalled | Spec | yes | Cycles the issue buffer is not empty but 0 instructions issued
Com:39 [2] | Cycles branch issue stalled | Spec | yes | Branch held in decode awaiting resolution
Com:40 [2] | Cycles execution stalled waiting for load data | Spec | yes | Load stalls
Com:41 [2] | Cycles execution stalled waiting for non-load/store SPE/EFPU result data | Spec | yes | Stalled waiting on mul, div, FP or MAC results

Load/store, data cache, and data line fill events
Com:42 | - | - | - | -
Com:43 | - | - | - | -
Com:44 [3] | Total translation hits | Spec | yes | -
Com:45 [3] | Load translation hits | Spec | yes | Cacheable l* or evl* micro-ops translated. (Includes load micro-ops from load-multiple and load-update instructions.)
Com:46 [3] | Store translation hits | Spec | yes | Cacheable st* or evst* micro-ops translated.
(Includes micro-ops from store-multiple and store-update instructions.)
Com:47 [3] | Touch translation hits | Spec | yes | Cacheable dcbt and dcbtst instructions translated (L1 only) and causing linefills. (Doesn’t count touches that are converted to nops, i.e., exceptions, non-cacheable, HID0[NOPTI] is set, cache hits, etc.)
Com:48 [3] | Data cache op translation hits | Spec | yes | dcba, dcbf, dcbst, and dcbz instructions translated
Com:49 [3] | Data cache lock set instructions completed | Nonspec | yes | dcbtls and dcbtstls instructions completed
Com:50 [3] | Data cache lock clear instructions completed | Nonspec | yes | dcblc instructions completed
Com:51 [3] | Cache-inhibited load access translation hits | Spec | yes | Cache inhibited load accesses translated
Com:52 [3] | Cache-inhibited store access translation hits | Spec | yes | Cache inhibited store accesses translated
Com:53 [3] | Guarded load translation hits | Spec | yes | Guarded loads translated
Com:54 [3] | Guarded store translation hits | Spec | yes | Guarded stores translated
Com:55 [3] | Write-through store translation hits | Spec | yes | Write-through stores translated
Com:56 [3] | Misaligned load or store accesses translated | Spec | yes | Misaligned load or store accesses translated. Count once per misaligned load or store.
Com:57 [3] | DCache linefills | Spec | yes | Counts DCache reloads for any reason, including touch-type reloads. Typically used to determine approximate data cache miss rate (along with loads/stores completed).
Com:58 [3] | DCache copybacks | Spec | yes | Does not count copybacks due to dcbf, dcbst, or L1FINV0 operations
Com:59 [3] | DCache sequential accesses | Spec | yes | Number of sequential accesses
Com:60 [3] | DCache stream hits | Spec | yes | Number of load hits due to streaming
Com:61 [3] | DCache linefill buffer hits | Spec | yes | Number of load hits to the linefill buffer
Com:62 [3] | Store stalls due to store to line of active linefill | Spec | yes | Stall cycles due to store to linefill in progress
Com:63 [3] | Store buffer full stalls | Spec | yes | Stall cycles due to store buffer full
Com:64 [2] | DCache throttling stalls | Spec | yes | Cycles the data cache asserts p_d_halt_zlb, which actually causes a CPU stall
Com:65 [3] | DCache recycled accesses | Spec | yes | Number of loads or stores recycled for a re-lookup
Com:66 [3] | DCache recycled access stalls | Spec | yes | Number of stall cycles due to recycled accesses for a re-lookup
Com:67 [3] | DCache CPU aborted accesses | Spec | yes | Number of aborted requests
Com:68 [3] | Data MMU miss | Spec | yes | Counts number of DTLB events
Com:69 [3] | Data MMU error | Spec | yes | Counts number of DSI events

Fetch, instruction cache, instruction line fill, and instruction prefetch events
Com:70 | - | - | - | -
Com:71 | - | - | - | -
Com:72 [3] | ICache linefills | Spec | yes | Counts ICache reloads due to demand fetch. Used to determine instruction cache miss rate (along with instructions completed)
Com:73 [3] | Number of fetches | Spec | yes | Counts fetches that write at least one instruction to the instruction buffer. (With instruction fetched (com:4), can be used to compute instructions-per-fetch)
Com:74 [3] | ICache lock set instructions completed | Nonspec | yes | icbtls instructions completed
Com:75 [3] | ICache lock clear instructions completed | Nonspec | yes | icblc instructions completed
Com:76 [3] | Cache-inhibited instruction access translation hits | Spec | yes | Cache-inhibited instruction accesses translated
Com:77 [2] | ICache throttling stalls | Spec | yes | Cycles the instruction cache asserts p_i_halt_zlb, which actually causes a CPU stall
Com:78 [3] | ICache recycled accesses | Spec | yes | Number of instruction access requests recycled for a re-lookup
Com:79 [3] | ICache recycled access stalls | Spec | yes | Number of stall cycles due to recycled accesses for a re-lookup
Com:80 [3] | ICache CPU aborted accesses | Spec | yes | Number of aborted requests
Com:81 [3] | Instruction MMU miss | Spec | yes | Counts number of events
Com:82 [3] | Instruction MMU error | Spec | yes | Counts number of events

BIU interface usage
Com:83 | - | - | - | -
Com:84 | - | - | - | -
Com:85 [3] | BIU instruction-side requests | Spec | yes | Instruction-side transactions
Com:86 [3] | BIU instruction-side cycles | Spec | yes | Instruction-side transaction cycles
Com:87 [3] | BIU data-side requests | Spec | yes | Data-side transactions
Com:88 [3] | BIU data-side copyback requests | Spec | yes | Replacement pushes including dcbf, dcbst, L1FINV0, copybacks.
Com:89 [3] | BIU data-side cycles | Spec | yes | Data-side transaction cycles
Com:90 [3] | BIU single-beat write cycles | Nonspec | yes | Single-beat write transaction cycles
Com:91 | - | - | - | -

Snoop
Com:92 | Snoop requests | N/A | - | Externally generated snoop requests. (Counts snoop TSs.)
Com:93 | Snoop hits | N/A | - | Snoop hits on all data-side resources regardless of the cache state (modified, shared, or exclusive)
Com:94 [3] | Snoop induced CPU to DCache stalls | N/A | - | Cycles a pending DCache access from CPU is stalled due to contention with snoops
Com:95 | Snoop Queue full cycles | N/A | - | Cycles the snoop queue is full
Com:96 | - | - | - | -
Chaining events [4]
Com:97 | PMC0 rollover | N/A | - | PMC0[OV] transitions from 1 to 0.
Com:98 | PMC1 rollover | N/A | - | PMC1[OV] transitions from 1 to 0.
Com:99 | PMC2 rollover | N/A | - | PMC2[OV] transitions from 1 to 0.
Com:100 | PMC3 rollover | N/A | - | PMC3[OV] transitions from 1 to 0.

Interrupt events
Com:101 | - | - | - | -
Com:102 | - | - | - | -
Com:103 | Interrupts taken | Nonspec | - | -
Com:104 | External input interrupts taken | Nonspec | - | -
Com:105 | Critical input interrupts taken | Nonspec | - | -
Com:106 | Watchdog timer interrupts taken | Nonspec | - | -
Com:107 | System call and trap interrupts | Nonspec | yes | -
Com:108 [2] | Cycles in which MSR[EE] = 0 | Nonspec | - | -
Com:109 [2] | Cycles in which MSR[CE] = 0 | Nonspec | - | -
Ref:110 | Transitions of TBL bit selected by PMGC0[TBSEL] | Nonspec | - | -

DEVENT events
Com:111 | DEVNT0 is generated | Nonspec | yes | Assertion of p_devnt_out0 detected
Com:112 | DEVNT1 is generated | Nonspec | yes | Assertion of p_devnt_out1 detected
Com:113 | DEVNT2 is generated | Nonspec | yes | Assertion of p_devnt_out2 detected
Com:114 | DEVNT3 is generated | Nonspec | yes | Assertion of p_devnt_out3 detected
Com:115 | DEVNT4 is generated | Nonspec | yes | Assertion of p_devnt_out4 detected
Com:116 | DEVNT5 is generated | Nonspec | yes | Assertion of p_devnt_out5 detected
Com:117 | DEVNT6 is generated | Nonspec | yes | Assertion of p_devnt_out6 detected
Com:118 | DEVNT7 is generated | Nonspec | yes | Assertion of p_devnt_out7 detected

Watchpoint events
Com:119 [2] | Watchpoint #0 occurs | Nonspec | yes | assertion of jd_watchpt0 detected
Com:120 [2] | Watchpoint #1 occurs | Nonspec | yes | assertion of jd_watchpt1 detected
Com:121 [2] | Watchpoint #2 occurs | Nonspec | yes | assertion of jd_watchpt2 detected
Com:122 [2] | Watchpoint #3 occurs | Nonspec | yes | assertion of jd_watchpt3 detected
Com:123 [2] | Watchpoint #4 occurs | Nonspec | yes | assertion of jd_watchpt4 detected
Com:124 [2] | Watchpoint #5 occurs | Nonspec | yes | assertion of jd_watchpt5 detected
Com:125 [2] | Watchpoint #6 occurs | Nonspec | yes | assertion of jd_watchpt6 detected
Com:126 [2] | Watchpoint #7 occurs | Nonspec | yes | assertion of jd_watchpt7 detected
Com:127 [2] | Watchpoint #8 occurs | Nonspec | yes | assertion of jd_watchpt8 detected
Com:128 [2] | Watchpoint #9 occurs | Nonspec | yes | assertion of jd_watchpt9 detected
Com:129 | Watchpoint #10 occurs | Nonspec | yes | assertion of jd_watchpt10 detected
Com:130 | Watchpoint #11 occurs | Nonspec | yes | assertion of jd_watchpt11 detected
Com:131 | Watchpoint #12 occurs | Nonspec | yes | assertion of jd_watchpt12 detected
Com:132 | Watchpoint #13 occurs | Nonspec | yes | assertion of jd_watchpt13 detected
Com:133 [2] | Watchpoint #14 occurs | Nonspec | yes | assertion of jd_watchpt14 detected
Com:134 [2] | Watchpoint #15 occurs | Nonspec | yes | assertion of jd_watchpt15 detected
Com:135 [2] | Watchpoint #16 occurs | Nonspec | yes | assertion of jd_watchpt16 detected
Com:136 [2] | Watchpoint #17 occurs | Nonspec | yes | assertion of jd_watchpt17 detected
Com:137 [2] | Watchpoint #18 occurs | Nonspec | yes | assertion of jd_watchpt18 detected
Com:138 [2] | Watchpoint #19 occurs | Nonspec | yes | assertion of jd_watchpt19 detected
Com:139 | Watchpoint #20 occurs | Nonspec | yes | assertion of jd_watchpt20 detected
Com:140 | Watchpoint #21 occurs | Nonspec | yes | assertion of jd_watchpt21 detected
Com:141 | Watchpoint #22 occurs | Nonspec | yes | assertion of jd_watchpt22 detected
Com:142 | Watchpoint #23 occurs | Nonspec | yes | assertion of jd_watchpt23 detected
Com:143 | Watchpoint #24 occurs | Nonspec | yes | assertion of
jd_watchpt24 detected
Com:144 | Watchpoint #25 occurs | Nonspec | yes | assertion of jd_watchpt25 detected
Com:145 | Watchpoint #26 occurs | Nonspec | yes | assertion of jd_watchpt26 detected
Com:146 [2] | Watchpoint #27 occurs | Nonspec | yes | assertion of jd_watchpt27 detected
Com:147 [2] | Watchpoint #28 occurs | Nonspec | yes | assertion of jd_watchpt28 detected
Com:148 [2] | Watchpoint #29 occurs | Nonspec | yes | assertion of jd_watchpt29 detected
Com:149 | - | - | - | -
Com:150 | - | - | - | -

NEXUS events
Com:151 [3] | Cycle CPU is stalled by Nexus3 FIFO full | Nonspec | yes | OVCR stall control set to stall on FIFO fullness

Threshold events
C0:152 [3], C1:152 [3] | Data cache load miss cycles | Spec | yes | Instances when the number of cycles between a load miss in the data cache and update of the data cache exceeds the threshold.
C0:153 [3], C1:153 [3] | Instruction cache fetch miss cycles | Spec | yes | Instances when the number of cycles between miss in the instruction cache and update of the instruction cache exceeds the threshold.
C0:154 [3], C1:154 [3] | External input interrupt latency cycles | N/A | - | Instances when the number of cycles between request for interrupt (p_int_b) asserted (but possibly masked/disabled) and redirecting fetch to external interrupt vector exceeds threshold. Once the redirection has occurred, no further threshold comparisons are made until either the interrupt request negates, or the external input interrupt is re-enabled by setting MSR[EE].
C0:155 [3], C1:155 [3] | Critical input interrupt latency cycles | N/A | - | Instances when the number of cycles between request for critical interrupt (p_critint_b) is asserted (but possibly masked/disabled) and redirecting fetch to the critical interrupt vector exceeds threshold.
Once the redirection has occurred, no further threshold comparisons begin until either the interrupt request negates and is then re-asserted, or the critical input interrupt is re-enabled by setting MSR[CE].
C0:156 [3], C1:156 [3] | Watchdog timer interrupt latency cycles | N/A | - | Instances when the number of cycles between watchdog timer time-out request for critical interrupt becomes pending (watchdog interrupt enabled (TCR[WIE] set) and time-out occurs (TSR[ENW,WIS] become 0b11)) and redirecting fetch to the critical interrupt vector exceeds the threshold. Once the redirection has occurred, no further threshold comparisons begin until either the watchdog interrupt request negates and is then re-asserted, or the watchdog interrupt is re-enabled by setting MSR[CE].
C0:157 [3], C1:157 [3] | External input interrupt pending latency cycles | N/A | - | Instances when the number of cycles between external interrupt pending (enabled and pin asserted) and redirecting fetch to the external interrupt vector exceeds the threshold. Once the redirection has occurred, no further threshold comparisons are made until either the interrupt request negates and is then re-asserted, or the external input interrupt is re-enabled by setting MSR[EE].
C0:158 [3], C1:158 [3] | Critical input interrupt pending latency cycles | N/A | - | Instances when the number of cycles between pin request for critical interrupt pending (enabled and pin asserted) and redirecting fetch to the critical interrupt vector exceeds the threshold. Once the redirection has occurred, no further threshold comparisons are made until either the interrupt request negates and is then re-asserted, or the critical input interrupt is re-enabled by setting MSR[CE].

[1] The notation for the PR, PMM filtering column either contains a ‘yes’ or a ‘-’.
A ‘yes’ indicates that the MSR-based context filtering function is available for that event. A ‘-’ indicates that the MSR-based context filtering is not available for that event and will have no effect on the counting of that event. See Section 8.5.1, MSR-based context filtering, for more information. 2 This event is not counted while the processor is in the waiting, halted, or stopped states, or during a debug session 3 This event is not counted while the processor is in a debug session. 4 For chaining events, if a counter is configured to count its own rollover, the result is undefined. e200z759n3 Core Reference Manual, Rev. 2 Freescale Semiconductor 553 e200z759n3 Core Reference Manual, Rev. 2 554 Freescale Semiconductor Chapter 9 Power Management 9.1 Power management Power management is supported by e200z759n3 cores to minimize overall system power consumption. The e200z759n3 core provides the ability to initiate power management from external sources as well as through software techniques. The power states on the e200z759n3 core are described below. 9.1.1 Active state The Active state is the default state for the e200z759n3 core in which all of its internal units are operating at full processor clock speed. In this state, the e200z759n3 core still provides dynamic power management in which individual internal functional units may stop clocking automatically whenever they are idle. 9.1.2 Waiting state The e200z759n3 core enters the Waiting state as a result of executing a wait instruction. Following entry into the waiting state, instruction execution and bus activity is suspended. Most internal clocks are gated off in this state. The e200z759n3 core asserts p_waiting to indicate it is in the waiting state. Prior to entering the waiting state, all outstanding instructions and bus transactions will be completed, and the cache’s store and push buffers will be flushed. 
The m_clk input should remain running while in the waiting state to allow for interrupt sampling, to allow further transitions into the Halted or Stopped state if requested, and to keep the Time Base operational if it is using m_clk as the clock source.

In the waiting state, the core is waiting for a valid unmasked pending interrupt request. Once a pending interrupt request is received, the core will exit the waiting state and begin interrupt processing. The return program counter value will point to the next instruction after the wait instruction. The interrupt can be an external input interrupt, various critical interrupts, a debug interrupt (based on ICMP), a non-maskable interrupt, or a machine check interrupt (p_mcp_b assertion, etc.). Once the interrupt processing begins, the core will not return to the waiting state until another wait instruction is executed.

The waiting state can be temporarily exited and returned to if a request is made to enter hardware debug mode (various mechanisms), the Halted state, or the Stopped state. After exiting one of these states, the processor will return to the waiting state. While temporarily exited, the p_waiting output will negate, and will be re-asserted once the CPU returns to the waiting state.

9.1.3 Halted state

Instruction execution and bus activity are suspended in the Halted state. Most internal clocks are gated off in this state. The e200z759n3 core asserts p_halted to indicate it is in the halted state. Prior to entering the halted state, all outstanding bus transactions will be completed, and the cache’s store and push buffers will be flushed. The m_clk input should remain running while in the Halted state to ensure that snoop requests continue to be processed, to allow further transitions into the Stopped state if requested, and to keep the Time Base operational if it is using m_clk as the clock source.

9.1.4 Stopped state

The Stopped state is characterized as having all internal functional units of the e200z759n3 core stopped except the Time Base unit and the clock control state machine logic. The internal m_clk may be kept running to keep the Time Base active and to allow quick recovery to the full on state. Clocks are not running to functional units in this state except for the Time Base.

The Stopped state is reached after transitioning through the Halted state with the p_stop input asserted. The p_stopped output signal will be asserted once the Stopped state is reached. The CPU will not enter the Stopped state until all snoops have been processed and the snoop queue is empty. System logic is responsible for ensuring that snoop requests are no longer generated once the p_stop input is asserted, in order to allow a transition from the Halted to the Stopped state.

While in the Stopped state, further power savings may be achieved by disabling the Time Base by asserting p_tbdisable, or by stopping the m_clk input. This is done externally by the system after the e200z759n3 core is safely in the Stopped state and has asserted the p_stopped output signal. To exit from the Stopped state, the system must first restart the m_clk input.

The Time Base unit is off during the Stopped state if it is using m_clk as the clock source and m_clk is stopped, or if Time Base clocking is disabled by the assertion of p_tbdisable. In either case, system software will usually have to access an external time base source after returning to the full on state in order to re-initialize the Time Base unit. In addition, it will not be possible to use a Time Base related interrupt source to exit low power states.
e200z759n3 also provides the capability of clocking the Time Base from an independent (but externally synchronized) clock source, which would allow the Time Base to be maintained during the Stopped state, and would allow a Time Base related interrupt to be generated to indicate an exit condition from the Stopped state.

[Figure 9-1. Power management state diagram. States: Active, Waiting, Halted (p_halted asserted), Stopped (p_stopped asserted). Transition labels recoverable from the figure: exec wait; ipend; ~p_halt & ~p_stop; ~p_halt & ~p_stop & ~ipend; p_halt | p_stop; ~p_halt & ~p_stop & prev_waited; ~p_halt & ~p_stop & ~prev_waited; ~p_stop & p_halt; p_stop; ~p_stop.]

9.1.5 Power management pins

p_waiting - output pin asserted when the e200z759n3 core is in the Waiting state.
p_halt - input pin asserted by system logic to request the core to go into the Halted state. Negating this pin causes the e200z759n3 core to transition back into the Active or Waiting state if p_stop is also negated.
p_halted - output pin asserted when the e200z759n3 core is in the Halted state.
p_stop - input pin asserted by system logic to request that the e200z759n3 core go into the Stopped state. Negating this pin causes the e200z759n3 core to transition back into the Halted state from the Stopped state.
p_stopped - output pin asserted when the e200z759n3 core is in the Stopped state.
p_tbdisable - input pin asserted by system logic when clocking of the Time Base should be disabled.
p_tbint - output pin asserted when an internal Time Base interrupt request is signaled.
p_doze, p_nap, p_sleep - output pins that reflect the state of HID0[DOZE], HID0[NAP], and HID0[SLEEP] respectively. These pins are qualified with MSR[WE] = 1. Interpretation of these signals is done by the system logic.
p_wakeup - output pin asserted when an interrupt is pending, or when another condition requires the clock to be running.
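The state descriptions in Sections 9.1.1 to 9.1.4, together with the pin descriptions above, can be modeled in software for system-level simulation. The following is a minimal sketch in C, not the core's actual control logic. The type and function names (pm_state_t, pm_inputs_t, pm_model_t, pm_step) are hypothetical, the assignment of conditions to arcs is one reading of Figure 9-1, and the priority given to a pending interrupt over a halt or stop request in the Waiting state is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Power states named in Sections 9.1.1 to 9.1.4. */
typedef enum { PM_ACTIVE, PM_WAITING, PM_HALTED, PM_STOPPED } pm_state_t;

/* Inputs sampled on each evaluation step (hypothetical grouping). */
typedef struct {
    bool p_halt;     /* system request to enter the Halted state     */
    bool p_stop;     /* system request to enter the Stopped state    */
    bool ipend;      /* a valid unmasked interrupt is pending        */
    bool exec_wait;  /* a wait instruction has just been executed    */
} pm_inputs_t;

typedef struct {
    pm_state_t state;
    bool prev_waited; /* set when Halted was entered from Waiting */
} pm_model_t;

/* Advance the model by one step. Arc priorities are assumptions. */
static void pm_step(pm_model_t *m, pm_inputs_t in)
{
    assert(m != NULL);
    bool low_power_req = in.p_halt || in.p_stop;

    switch (m->state) {
    case PM_ACTIVE:
        if (low_power_req) {
            m->prev_waited = false;
            m->state = PM_HALTED;
        } else if (in.exec_wait) {
            m->state = PM_WAITING;
        }
        break;
    case PM_WAITING:
        if (in.ipend) {                 /* assumed to win over halt/stop */
            m->state = PM_ACTIVE;
        } else if (low_power_req) {
            m->prev_waited = true;      /* return here after Halted */
            m->state = PM_HALTED;
        }
        break;
    case PM_HALTED:
        if (in.p_stop) {
            m->state = PM_STOPPED;      /* Stopped is reached via Halted */
        } else if (!in.p_halt) {
            m->state = m->prev_waited ? PM_WAITING : PM_ACTIVE;
        }
        break;
    case PM_STOPPED:
        if (!in.p_stop) {
            m->state = PM_HALTED;       /* negating p_stop returns to Halted */
        }
        break;
    }
}
```

In this sketch the p_waiting, p_halted, and p_stopped outputs correspond directly to the state field. Snoop queue draining, debug entry, and clock gating are deliberately left out.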
9.1.6 Power management control bits

The following bits are used by software to generate a request to enter a power-saving state and to choose the state to be entered:
• MSR[WE]: The WE bit is used to qualify assertion of the p_doze, p_nap, and p_sleep output pins to the system logic. When MSR[WE] is negated, these pins are negated. When MSR[WE] is set, these pins reflect the state of their respective control bits in the HID0 register.
• HID0[DOZE]: The interpretation of the doze mode bit is done by the external system logic. Doze mode on the e200z759n3 core is intended to be the halted state with the clocks running.
• HID0[NAP]: The interpretation of the nap mode bit is done by the external system logic. Nap mode on the e200z759n3 core may be used for a powerdown state with the Time Base enabled.
• HID0[SLEEP]: The interpretation of the sleep mode bit is done by the external system logic. Sleep mode on the e200z759n3 core may be used for a powerdown state with the Time Base disabled.

9.1.7 Software considerations for power management using wait instructions

Executing a wait instruction causes the e200z759n3 core to complete instruction fetch and execution activity and await an interrupt. The p_waiting output is asserted once the Waiting state is entered. External system hardware may interpret the state of this signal and activate the p_halt and/or p_stop inputs to cause the e200z759n3 core to enter a quiescent state in which clocks may be disabled for low power operation. Alternatively, system hardware may utilize some other clock control mechanism while the processor is in the Waiting state and p_wakeup remains negated.

9.1.8 Software considerations for power management using Doze, Nap or Sleep

Setting MSR[WE] generates a request to enter a power saving state. The power saving state (doze, nap, or sleep) must be previously determined by setting the appropriate HID0 bit.
Setting MSR[WE] has no direct effect on instruction execution; it is simply reflected on p_doze, p_nap, and p_sleep, depending on the setting of HID0[DOZE], HID0[NAP], and HID0[SLEEP] respectively. Note that the e200z759n3 core is not affected by assertion of these pins directly. External system hardware may interpret the state of these signals and activate the p_halt and/or p_stop inputs to cause the e200z759n3 core to enter a quiescent state in which clocks may be disabled for low power operation.

To ensure a clean transition into and out of a power saving mode, the following program sequence is recommended:

    loop:   sync
            mtmsr (WE)
            isync
            br loop         (optionally use a wait instruction)

An interrupt is typically used to exit a power saving state. The p_wakeup output is used to indicate to the system logic that an interrupt (or a debug request) has become pending. System logic uses this output to re-enable the clocks and exit a low power state. The interrupt handler is responsible for determining how to exit the low power loop if one is used. Wait instructions will be exited automatically. If an external hardware interrupt is used to perform the wake-up, the vectored interrupt capability provided by the core may be useful in making this determination.

9.1.9 Debug considerations for power management

When a debug request is presented to the e200z759n3 core while in the Waiting, Halted, or Stopped state, the p_wakeup signal will be asserted. When m_clk is provided to the CPU, it will temporarily exit the Waiting, Halted, or Stopped state and will enter Debug mode regardless of the assertion of p_halt or p_stop. The p_waiting, p_halted, and p_stopped outputs will be negated for as long as the CPU remains in a debug session (jd_debug_b asserted). When the debug session is exited, the CPU will re-sample the p_halt and p_stop inputs and will re-enter the Halted or Stopped state as appropriate.
If the CPU was previously waiting, and no interrupt was received while in the debug session, it will re-enter the Waiting state and re-assert p_waiting.

Chapter 10 Memory Management Unit

10.1 Overview

The e200z759n3 Memory Management Unit is a 32-bit PowerISA 2.06 compliant implementation, with the following feature set:
• Freescale EIS MMU architecture compliant
• Translates from 32-bit effective to 32-bit real addresses
• 32-entry fully associative TLB with support for twenty-three page sizes (1 KB, 2 KB, 4 KB, 8 KB, 16 KB, 32 KB, 64 KB, 128 KB, 256 KB, 512 KB, 1 MB, 2 MB, 4 MB, 8 MB, 16 MB, 32 MB, 64 MB, 128 MB, 256 MB, 512 MB, 1 GB, 2 GB, 4 GB)
• Hardware assist for TLB miss exceptions
• Software managed by tlbre, tlbwe, tlbsx, tlbsync, and tlbivax instructions
• Support for external control of entry matching for a subset of TID values to support non-intrusive runtime mapping modifications

10.2 Effective to real address translation

10.2.1 Effective addresses

Instruction accesses are generated by sequential instruction fetches or due to a change in program flow (branches and interrupts). Data accesses are generated by load, store, and cache management instructions. The e200z759n3 instruction fetch, branch, and load/store units generate 32-bit effective addresses. The MMU translates this effective address to a 32-bit real address, which is then used for memory accesses.

The PowerISA 2.06 architecture divides the effective (virtual) and real (physical) address space into pages. The page represents the granularity of effective address translation, permission control, and memory/cache attributes. The e200z759n3 MMU supports twenty-three page sizes (1 KB, 2 KB, 4 KB, 8 KB, 16 KB, 32 KB, 64 KB, 128 KB, 256 KB, 512 KB, 1 MB, 2 MB, 4 MB, 8 MB, 16 MB, 32 MB, 64 MB, 128 MB, 256 MB, 512 MB, 1 GB, 2 GB, 4 GB).
In order for an effective to real address translation to exist, a valid entry for the page containing the effective address must be in a Translation Lookaside Buffer (TLB). Addresses for which no TLB entry exists (a TLB miss) cause Instruction or Data TLB Errors.

10.2.2 Address spaces

Instruction accesses are generated by sequential instruction fetches or due to a change in program flow (branches and interrupts). Data accesses are generated by load, store, and cache management instructions.

The PowerISA 2.06 architecture defines two effective address spaces for instruction accesses and two effective address spaces for data accesses. The current effective address space for instruction or data accesses is determined by the value of MSR[IS] and MSR[DS], respectively. The address space indicator (the value of either MSR[IS] or MSR[DS], as appropriate) is used in addition to the effective address generated by the processor for translation into a physical address by the TLB mechanism. Because MSR[IS] and MSR[DS] are both cleared to ‘0’ when an interrupt occurs, an address space value of 0b0 can be used to denote interrupt-related address spaces (or possibly all system software address spaces), and an address space value of 0b1 can be used to denote non-interrupt-related address spaces (or possibly all user address spaces).

The address space associated with an instruction or data access is included as part of the virtual address in the translation process (AS). The p_tc[1] interface signal indicates the appropriate address space.

10.2.3 Process ID

The PowerISA 2.06 architecture defines that a process ID (PID) value is associated with each effective address (instruction or data) generated by the processor. At the Book E level, a single PID register is defined as a 32-bit register, and it maintains the value of the PID for the current process.
This PID value is included as part of the virtual address in the translation process (PID0). For the e200z759n3 MMU, the PID is 8 bits in length. The most-significant 24 bits are unimplemented and read as ‘0’. The p_pid0[0:7] interface signals indicate the current process ID.

10.2.4 Translation flow

The effective address, concatenated with the address space value of the corresponding MSR bit (MSR[IS] or MSR[DS]), is compared to the appropriate number of bits of the EPN field (depending on the page size) and the TS field of TLB entries. If the contents of the effective address plus the address space bit match the EPN field and TS bit of the TLB entry, that TLB entry is a candidate for a possible translation match. In addition to a match in the EPN field and TS, a matching TLB entry must match with the current Process ID of the access (in PID0), or have a TID value of ‘0’, indicating the entry is globally shared among all processes.

Figure 10-1 shows the translation match logic for the effective address plus its attributes, collectively called the virtual address, and how it is compared with the corresponding fields in the TLB entries.

[Figure 10-1. Virtual address and TLB entry compare process. A TLB entry hit requires TLB_entry[V] to be set, TLB_entry[TS] to equal AS (from MSR[IS] or MSR[DS]), TLB_entry[TID] either to equal the Process ID (private page) or to equal 0 (shared page), and TLB_entry[EPN] to equal the EA page number bits.]

The page size defined for a TLB entry determines how many bits of the effective address are compared with the corresponding EPN field in the TLB entry as shown in Table 10-1. On a TLB hit, the corresponding bits of the Real Page Number (RPN) field are used to form the real address.

Table 10-1. Page size field encodings and EPN field comparison

SIZE field | Page size (2^SIZE KB) | EA to EPN comparison
0b00000 | 1 KB | EA[0:21] =? EPN[0:21]
0b00001 | 2 KB | EA[0:20] =? EPN[0:20]
0b00010 | 4 KB | EA[0:19] =? EPN[0:19]
0b00011 | 8 KB | EA[0:18] =? EPN[0:18]
0b00100 | 16 KB | EA[0:17] =? EPN[0:17]
0b00101 | 32 KB | EA[0:16] =? EPN[0:16]
0b00110 | 64 KB | EA[0:15] =? EPN[0:15]
0b00111 | 128 KB | EA[0:14] =? EPN[0:14]
0b01000 | 256 KB | EA[0:13] =? EPN[0:13]
0b01001 | 512 KB | EA[0:12] =? EPN[0:12]
0b01010 | 1 MB | EA[0:11] =? EPN[0:11]
0b01011 | 2 MB | EA[0:10] =? EPN[0:10]
0b01100 | 4 MB | EA[0:9] =? EPN[0:9]
0b01101 | 8 MB | EA[0:8] =? EPN[0:8]
0b01110 | 16 MB | EA[0:7] =? EPN[0:7]
0b01111 | 32 MB | EA[0:6] =? EPN[0:6]
0b10000 | 64 MB | EA[0:5] =? EPN[0:5]
0b10001 | 128 MB | EA[0:4] =? EPN[0:4]
0b10010 | 256 MB | EA[0:3] =? EPN[0:3]
0b10011 | 512 MB | EA[0:2] =? EPN[0:2]
0b10100 | 1 GB | EA[0:1] =? EPN[0:1]
0b10101 | 2 GB | EA[0] =? EPN[0]
0b10110 | 4 GB | (none)

On a TLB hit, the generation of the physical address occurs as shown in Figure 10-2.

[Figure 10-2. Effective to real address translation flow. The AS bit (MSR[DS] for data access, MSR[IS] for instruction fetch), the PID, and the 32-bit effective address (effective page address in bits 0 to n-1, offset in bits n to 31) form the virtual address. The RPN field of the matching TLB entry supplies the real page number (bits 0 to n-1), which is combined with the unchanged offset (bits n to 31) to form the 32-bit real address.]

NOTE: n = 32 - log2(page size); n <= 22; n = 20 for 4 KB page size.

10.2.5 Permissions

An operating system may restrict access to virtual pages by selectively granting permissions for user mode read, write, and execute, and supervisor mode read, write, and execute on a per page basis. These permissions can be set up for a particular system (for example, program code might be execute-only, data structures may be mapped as read/write/no-execute) and can also be changed by the operating system based on application requests and operating system policies.
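The match and address-formation rules of Section 10.2.4 (Figure 10-1, Table 10-1, and Figure 10-2) can be illustrated with a small software model. The following C sketch is for illustration only and does not describe the hardware implementation. The names tlb_entry_t, page_mask, tlb_entry_hits, and tlb_translate are hypothetical, EPN and RPN are assumed to be stored left-aligned in a 32-bit word (bits 0:21), and permission checking is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Subset of the TLB entry fields used for matching (hypothetical layout). */
typedef struct {
    bool     v;     /* valid bit                                  */
    uint8_t  ts;    /* translation address space, compared to AS  */
    uint8_t  tid;   /* translation ID, 0 = shared by all PIDs     */
    uint8_t  size;  /* SIZE field, page size = 2^SIZE KB, 0..22   */
    uint32_t epn;   /* effective page number, left-aligned        */
    uint32_t rpn;   /* real page number, left-aligned             */
} tlb_entry_t;

/* Mask of the address bits that form the page number for a SIZE value.
 * SIZE = 0 (1 KB) gives 0xFFFFFC00, i.e. EA[0:21];
 * SIZE = 22 (4 GB) gives 0, i.e. no EA bits are compared. */
static uint32_t page_mask(uint8_t size)
{
    assert(size <= 22);
    uint64_t page_bytes = (uint64_t)1024 << size;
    return (uint32_t)~(page_bytes - 1);
}

/* Match rule of Figure 10-1: V, TS == AS, TID == PID or TID == 0, EPN. */
static bool tlb_entry_hits(const tlb_entry_t *e, uint32_t ea,
                           uint8_t as, uint8_t pid)
{
    uint32_t mask = page_mask(e->size);
    return e->v
        && e->ts == as
        && (e->tid == 0 || e->tid == pid)
        && ((ea ^ e->epn) & mask) == 0;
}

/* Returns true on a hit and writes the real address to *ra.
 * A hit to multiple entries is a programming error; this model
 * simply uses the first matching entry. */
static bool tlb_translate(const tlb_entry_t *tlb, size_t n, uint32_t ea,
                          uint8_t as, uint8_t pid, uint32_t *ra)
{
    for (size_t i = 0; i < n; i++) {
        if (tlb_entry_hits(&tlb[i], ea, as, pid)) {
            uint32_t mask = page_mask(tlb[i].size);
            *ra = (tlb[i].rpn & mask) | (ea & ~mask); /* RPN || offset */
            return true;
        }
    }
    return false; /* TLB miss: Instruction or Data TLB Error */
}
```

For example, with a 4 KB entry (SIZE = 0b00010) whose EPN is 0x40001000 and RPN is 0x00123000, the effective address 0x40001ABC translates to the real address 0x00123ABC, because the upper 20 bits are replaced and the 12-bit offset is preserved.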
The UX, SX, UW, SW, UR, and SR access control bits are provided to support selective permissions (access control):
• SR: Supervisor read permission. Allows loads and load-type cache management instructions to access the page while in supervisor mode (MSR[PR] = 0).
• SW: Supervisor write permission. Allows stores and store-type cache management instructions to access the page while in supervisor mode (MSR[PR] = 0).
• SX: Supervisor execute permission. Allows instruction fetches to access the page and instructions to be executed from the page while in supervisor mode (MSR[PR] = 0).
• UR: User read permission. Allows loads and load-type cache management instructions to access the page while in user mode (MSR[PR] = 1).
• UW: User write permission. Allows stores and store-type cache management instructions to access the page while in user mode (MSR[PR] = 1).
• UX: User execute permission. Allows instruction fetches to access the page and instructions to be executed from the page while in user mode (MSR[PR] = 1).

If the translation match was successful, the permission bits are checked as shown in Figure 10-3. If the access is not allowed by the access permission mechanism, the processor generates an Instruction or Data Storage interrupt (ISI or DSI). The current privilege level of an access is signaled to the MMU with the CPU’s p_tc[0] output signal.

[Figure 10-3. Granting of access permission. Given a TLB match, access is granted for an instruction fetch according to TLB_entry[UX] or TLB_entry[SX], for a load-class data access according to TLB_entry[UR] or TLB_entry[SR], and for a store-class data access according to TLB_entry[UW] or TLB_entry[SW], with MSR[PR] selecting between the user and supervisor bit.]

10.2.6 Restrictions on 1 KB and 2 KB page size usage

Because of certain implementation limitations regarding coherency lookup operations (lookup is done by physical address), if 1 KB or 2 KB pages are used, the low order virtual address bits used to index the cache (A[20:21] for 1 KB pages, A[20] for 2 KB pages) must match the corresponding physical address bit value(s). For example, if logical page X maps to physical page P, then X and P must have the same values of A[20:21] for 1 KB pages, and A[20] for 2 KB pages. This restriction must be followed for proper CPU operation.

10.3 Translation Lookaside Buffer (TLB)

The Freescale EIS architecture defines support for zero or more TLBs in an implementation, each with its own characteristics, and provides configuration information for software to query the existence and structure of the TLB(s) through a set of special purpose registers: MMUCFG, TLB0CFG, TLB1CFG, etc. By convention, TLB0 is used for a set associative TLB with fixed page sizes, TLB1 is used for a fully associative TLB with variable page sizes, and TLB2 is arbitrarily defined by an implementation. The e200z759n3 MMU supports a TLB that is fully associative and supports variable page sizes, thus it corresponds to TLB1.

TLB1 consists of a 32-entry, fully associative CAM array with support for twenty-three page sizes. To perform a lookup, the CAM is searched in parallel for a matching TLB entry. The contents of this TLB entry are then concatenated with the page offset of the original effective address. The result constitutes the real (physical) address of the access.

A hit to multiple TLB entries is considered to be a programming error. If this occurs, the TLB generates an invalid address but an exception will not be reported.

Table 10-2. TLB entry bit definitions

Field | Comments
V | Valid bit for entry
TS | Translation address space (compared against AS bit)
TID[0:7] | Translation ID (compared against PID0 or ‘0’)
EPN[0:21] | Effective page number (compared against effective address)
RPN[0:21] | Real page number (translated address)
SIZE[0:4] | Page size (see Table 10-1)
SX, SW, SR | Supervisor execute, write, and read permission bits
UX, UW, UR | User execute, write, and read permission bits
WIMGE | Translation attributes (write-through required, cache-inhibited, memory coherence required, guarded, endian)
U0-U3 | User bits, used only by software
IPROT | Invalidation protect
VLE | VLE page indicator

10.4 Configuration information

Information about the configuration for a given MMU implementation is available to system software by reading the contents of the MMU configuration SPRs. These SPRs describe the architectural version of the MMU, the number of TLB arrays, and the characteristics of each TLB array.

10.4.1 MMU Configuration Register (MMUCFG)

The MMU Configuration Register (MMUCFG) is a 32-bit read-only register. The SPR number for MMUCFG is 1015 in decimal. MMUCFG provides information about the configuration of the e200z759n3 MMU design. The MMUCFG register is shown in Figure 10-4.

[Figure 10-4. MMU Configuration Register (MMUCFG). Bits 0:7 reserved (0), 8:14 RASIZE, 15:16 reserved (0), 17:20 NPIDS, 21:25 PIDSIZE, 26:27 reserved (0), 28:29 NTLBS, 30:31 MAVN. SPR 1015; read-only.]

The MMUCFG bits are described in Table 10-3.

Table 10-3.
MMUCFG field descriptions

Bits | Name | Function
0:7 [32:39] | - | Reserved¹
8:14 [40:46] | RASIZE | Number of Bits of Real Address supported. 0100000: This version of the MMU implements 32 real address bits
15:16 [47:48] | - | Reserved¹
17:20 [49:52] | NPIDS | Number of PID Registers. 0001: This version of the MMU implements one PID register (PID0)
21:25 [53:57] | PIDSIZE | PID Register Size. 00111: PID registers contain 8 bits in this version of the MMU
26:27 [58:59] | - | Reserved¹
28:29 [60:61] | NTLBS | Number of TLBs. 01: This version of the MMU implements two TLB structures: a null TLB0 and a fully-associative TLB for TLB1
30:31 [62:63] | MAVN | MMU Architecture Version Number. 00: This version of the MMU implements Version 1.0 of the Freescale EIS MMU Architecture

¹ These bits are not implemented and will be read as zero.

10.4.2 TLB0 Configuration Register (TLB0CFG)

The TLB0 Configuration Register (TLB0CFG) is a 32-bit read-only register. The SPR number for TLB0CFG is 688 in decimal. TLB0CFG provides information about the configuration of TLB0. Since the e200z759n3 MMU design does not implement TLB0, this register reads as all ‘0’. It is supplied to allow software to query it in a fashion compatible with other Freescale EIS designs. The TLB0CFG register is shown in Figure 10-5.

[Figure 10-5. TLB0 Configuration Register (TLB0CFG). Bits 0:7 ASSOC, 8:11 MINSIZE, 12:15 MAXSIZE, 16 IPROT, 17 AVAIL, 18 P2PSA, 19 reserved (0), 20:31 NENTRY. SPR 688; read-only.]

The TLB0CFG bits are described in Table 10-4.

Table 10-4. TLB0CFG field descriptions

Bits | Name | Function
0:7 [32:39] | ASSOC | Associativity. 0
8:11 [40:43] | MINSIZE | Minimum Page Size. 0
12:15 [44:47] | MAXSIZE | Maximum Page Size. 0
16 [48] | IPROT | Invalidate Protect Capability. 0: Not present in TLB0
17 [49] | AVAIL | Page Size Availability. 0: No variable page sizes available
18 [50] | P2PSA | Power-of-2 Page Size Availability. 0: No odd powers of 2 page sizes are supported
19 [51] | - | Reserved¹
20:31 [52:63] | NENTRY | Number of Entries. 0: TLB0 contains 0 entries

¹ These bits are not implemented and will be read as zero.

10.4.3 TLB1 Configuration Register (TLB1CFG)

The TLB1 Configuration Register (TLB1CFG) is a 32-bit read-only register. The SPR number for TLB1CFG is 689 in decimal. TLB1CFG provides information about the configuration of TLB1 in the e200z759n3 MMU. The TLB1CFG register is shown in Figure 10-6.

[Figure 10-6. TLB1 Configuration Register (TLB1CFG). Bits 0:7 ASSOC, 8:11 MINSIZE, 12:15 MAXSIZE, 16 IPROT, 17 AVAIL, 18 P2PSA, 19 reserved (0), 20:31 NENTRY. SPR 689; read-only.]

The TLB1CFG bits are described in Table 10-5.

Table 10-5. TLB1CFG field descriptions

Bits | Name | Function
0:7 [32:39] | ASSOC | Associativity. 0x20: Indicates that TLB1 associativity is 32
8:11 [40:43] | MINSIZE | Minimum Page Size. 0x0: Smallest page size is 1 KB
12:15 [44:47] | MAXSIZE | Maximum Page Size. 0xB: Largest page size is 4 GB
16 [48] | IPROT | Invalidate Protect Capability. 1: Invalidate Protect Capability is supported in TLB1
17 [49] | AVAIL | Page Size Availability. 1: All page sizes between MINSIZE and MAXSIZE are supported
18 [50] | P2PSA | Power-of-2 Page Size Availability. 1: All odd powers of 2 page sizes between MINSIZE and MAXSIZE are supported (2 KB, 8 KB, 32 KB, etc.)
19 [51] | - | Reserved¹
20:31 [52:63] | NENTRY | Number of Entries. 0x20: Indicates that TLB1 contains 32 entries

¹ These bits are not implemented and will be read as zero.

10.5 Software interface and TLB instructions

The TLB is accessed indirectly through several MMU Assist (MAS) registers. Software can write and read the MMU Assist registers with mtspr and mfspr instructions. These registers contain information related to reading and writing a given entry within the TLB. Data is read from the TLB into the MAS registers with a tlbre (TLB read entry) instruction. Data is written to the TLB from the MAS registers with a tlbwe (TLB write entry) instruction.

Certain fields of the MAS registers are also written by hardware when an Instruction TLB Error or Data TLB Error interrupt occurs. On a TLB Error interrupt, the MAS registers will be written by hardware with the proper EA, default attributes (TID, WIMGE, permissions, etc.), TLB selection information, and an entry in the TLB to replace. Software manages this entry selection information by updating a replacement entry value during TLB miss handling. Software must provide the correct RPN and permission information in one of the MAS registers before executing a tlbwe instruction.

On taking a DSI or ISI interrupt, software should update the search PID (SPID) and search address space (SAS) fields in the MAS registers using PID0 and the appropriate MSR[IS] or MSR[DS] values that were used when the DSI or ISI exception was recognized. During the interrupt handler, software can issue a TLB search instruction (tlbsx), which uses the SPID field along with the SAS field, to determine the entry related to the DSI or ISI exception. (It is possible that the entry that caused the DSI or ISI interrupt no longer exists in the TLB by the time the search occurs if a TLB invalidate or replacement removes the entry between the time the exception is recognized and when the tlbsx is executed.)
The tlbre, tlbwe, tlbsx, tlbivax, and tlbsync instructions are privileged.

10.5.1 TLB read entry instruction (tlbre)

The TLB read entry instruction causes the content of a single TLB entry to be placed in the MMU assist registers. The entry is specified by the TLBSEL and ESEL fields of the MAS0 register. The entry contents are placed in the MAS1, MAS2, and MAS3 registers. See Table 10-15 for details on how MAS register fields are updated.

tlbre    TLB read entry
tlbre
Encoding: bits 0:5 = 31 | bits 6:20 = 0 | bits 21:30 = 1110110010 | bit 31 = 0

    tlb_entry_id = MAS0(TLBSEL, ESEL)
    result = MMU(tlb_entry_id)
    MAS1, MAS2, MAS3 = result

10.5.2 TLB write entry instruction (tlbwe)

The TLB write entry instruction causes the contents of certain fields within the MMU assist registers MAS1, MAS2, and MAS3 to be written into a single TLB entry in the MMU. The entry written is specified by the TLBSEL and ESEL fields of the MAS0 register.

tlbwe    TLB write entry
tlbwe
Encoding: bits 0:5 = 31 | bits 6:20 = 0 | bits 21:30 = 1111010010 | bit 31 = 0

    tlb_entry_id = MAS0(TLBSEL, ESEL)
    MMU(tlb_entry_id) = MAS1, MAS2, MAS3

10.5.3 TLB search instruction (tlbsx)

The TLB search instruction updates the MMU assist registers conditionally based on success or failure of a lookup of the TLB. The lookup is controlled by an effective address provided by GPR[RB] as specified in the instruction encoding, as well as by the SAS and SPID search fields in MAS6. The values placed into MAS0, MAS1, MAS2, and MAS3 differ depending on a successful or unsuccessful search. See Table 10-15 for details on how MAS register fields are updated.
tlbsx    TLB Search Indexed
tlbsx RA,RB    (Form X)
Encoding: bits 0:5 = 31 | bits 6:10 = 0 | bits 11:15 = RA | bits 16:20 = RB | bits 21:30 = 1110010010 | bit 31 = 0

    if RA!=0 then EA = GPR(RA) + GPR(RB)
    else EA = GPR(RB)
    ProcessIDs = MAS6(SPID), 8’b00000000
    AS = MAS6(SAS)
    VA = AS || ProcessIDs || EA
    if Valid_TLB_matching_entry_exists(VA)
    then result = see Table 10-15, column labelled “tlbsx hit”
    else result = see Table 10-15, column labelled “tlbsx miss”
    MAS0, MAS1, MAS2, MAS3 = result

10.5.4 TLB Invalidate (tlbivax) Instruction

The TLB invalidate operation is performed whenever a TLB Invalidate Virtual Address Indexed (tlbivax) instruction is executed. This instruction invalidates TLB entries that correspond to the virtual address calculated by this instruction. The address is detailed in Table 10-6. No information other than that shown in Table 10-6 is used for the invalidation (entry AS and TID values are treated as don’t-cares).

Additional information about the targeted TLB entries is encoded in two of the lower bits of the effective address calculated by the tlbivax instruction. Bit 28 of the tlbivax effective address is the TLBSEL field. This bit should be set to ‘1’ to ensure TLB1 is targeted by the invalidate. Bit 29 of the tlbivax effective address is the INV_ALL field. If this bit is set, the invalidate operation invalidates all entries of TLB1 that are not marked as invalidation protected (an entry is protected when its IPROT bit is set to ‘1’). The bits of EA used to perform the tlbivax invalidation of TLB1 are bits 0:21.

Table 10-6. tlbivax EA bit definitions

Bits | Description
0:21 | EA[0:21]
22:27 | Reserved¹
28 | TLBSEL (1 = TLB1). Should be set to ‘1’ for future compatibility.
29 | INV_ALL
30:31 | Reserved¹

¹ These bits should be zero for future compatibility. They are ignored.
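As an illustration of Table 10-6, the following C sketch builds the effective address value that software would place in a GPR before executing tlbivax. The macro and function names are hypothetical. The only layout assumed is the one in Table 10-6, using the manual's bit numbering in which bit 0 is the most significant bit of the 32-bit word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Convert a big-endian (bit 0 = MSB) bit number into a 32-bit mask. */
#define PPC_BIT32(n)        (UINT32_C(1) << (31 - (n)))

#define TLBIVAX_EA_MASK     UINT32_C(0xFFFFFC00)  /* bits 0:21, EA[0:21] */
#define TLBIVAX_TLBSEL      PPC_BIT32(28)         /* 1 = TLB1            */
#define TLBIVAX_INV_ALL     PPC_BIT32(29)         /* invalidate all      */

static_assert(PPC_BIT32(28) == UINT32_C(0x8), "bit 28 is mask 0x8");
static_assert(PPC_BIT32(29) == UINT32_C(0x4), "bit 29 is mask 0x4");

/* Compose the tlbivax effective address for TLB1.
 * Reserved bits 22:27 and 30:31 are left as zero. */
static uint32_t tlbivax_make_ea(uint32_t ea, bool inv_all)
{
    uint32_t value = (ea & TLBIVAX_EA_MASK) | TLBIVAX_TLBSEL;
    if (inv_all) {
        value |= TLBIVAX_INV_ALL;
    }
    return value;
}

/* Field extractors, useful when modeling or tracing the instruction. */
static uint32_t tlbivax_ea_bits(uint32_t value) { return value & TLBIVAX_EA_MASK; }
static bool tlbivax_is_inv_all(uint32_t value)  { return (value & TLBIVAX_INV_ALL) != 0; }
```

For example, invalidating the page containing address 0x40001ABC in TLB1 uses the value 0x40001808, and a full invalidation of unprotected TLB1 entries can use 0x0000000C. In real code the resulting value would be loaded into the register named by RB (with RA = 0) ahead of the tlbivax instruction.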
tlbivax: TLB Invalidate Virtual Address Indexed
  Syntax:    tlbivax RA,RB (Form X)
  Encoding:  bits 0:5 = 31; bits 6:10 = 0; bits 11:15 = RA; bits 16:20 = RB; bits 21:30 = 1100010010; bit 31 = 0
  Operation:
    if RA!=0 then EA = GPR(RA) + GPR(RB)
    else EA = GPR(RB)
    VA = EA
    if (Valid_TLB_matching_entry_exists(VA) or INV_ALL) and Entry_IPROT_not_set
      then Invalidate entry

10.5.5 TLB synchronize instruction (tlbsync)

The TLB synchronize instruction is treated as a privileged no-op by the e200z759n3.

tlbsync: TLB Synchronize
  Syntax:    tlbsync
  Encoding:  bits 0:5 = 31; bits 6:20 = 0; bits 21:30 = 1000110110; bit 31 = 0

10.6 TLB operations

10.6.1 Translation reload

The TLB reload function is performed in software with some hardware assist. This hardware assist consists of:
• Six 32-bit MMU assist registers (MAS0-MAS4 and MAS6) for support of the tlbre, tlbwe, and tlbsx TLB management instructions.
• Loading of MAS0-2 based upon defaults in MAS4 for TLB miss exceptions. This automatically generates most of the TLB entry.
• Loading of the data exception address register (DEAR) with the effective address of the load, store, or cache management instruction that caused an Alignment, Data TLB Miss, or Data Storage Interrupt.
• The tlbwe instruction. When tlbwe is executed, the new TLB entry contained in MAS1-MAS3 is written into the TLB entry selected by MAS0.

10.6.2 Reading the TLB

The TLB array can be read by first writing the necessary information into MAS0 using mtspr and then executing the tlbre instruction. To read an entry from the TLB, the TLBSEL field in MAS0 must be set to '01', and the ESEL bits in MAS0 must be set to point to the desired entry. After executing the tlbre instruction, MAS1-MAS3 will be updated with the data from the selected TLB entry.

10.6.3 Writing the TLB

The TLB1 array can be written by first writing the necessary information into MAS0-MAS3 using mtspr and then executing the tlbwe instruction.
To write an entry into the TLB, the TLBSEL field in MAS0 must be set to '01', and the ESEL bits in MAS0 must be set to point to the desired entry. When the tlbwe instruction is executed, the TLB entry information stored in MAS1-MAS3 will be written into the selected TLB entry.

10.6.4 Searching the TLB

The TLB can be searched using the tlbsx instruction by first writing the necessary information into MAS6. The tlbsx instruction will search using EPN[0:21] from the GPR selected by the instruction, SAS (search AS bit) in MAS6, and SPID in MAS6. If the search is successful, the given TLB entry information will be loaded into MAS0-MAS3. The valid bit in MAS1 is used as the success flag. If the search is successful, the valid bit in MAS1 will be set; if unsuccessful it is cleared. The tlbsx instruction is useful for finding the TLB entry that caused a DSI or ISI exception.

10.6.5 TLB miss exception update

When a TLB miss exception occurs, MAS0-MAS3 are updated with the defaults specified in MAS4, and the AS and EPN[0:21] of the access that caused the exception. In addition, the ESEL bits are updated with the replacement entry value. This sets up all the TLB entry data necessary for a TLB write except for the RPN[0:21], the U0-U3 user bits, and the UX/SX/UW/SW/UR/SR permission bits, all of which are stored in MAS3. Thus, if the defaults stored in MAS4 are applicable to the TLB entry to be loaded, the TLB miss exception handler will only have to update MAS3 via mtspr before executing tlbwe. If the defaults are not applicable to the TLB entry being loaded, then the TLB miss exception handler will have to update MAS0-MAS2 before performing the TLB write.

10.6.6 IPROT invalidation protection

The IPROT bit is used to protect TLB entries from invalidation. TLB entries with IPROT set are not invalidated by a tlbivax instruction (even when INV_ALL is indicated), nor by the MMUCSR0[TLB1_FI] control function.
The IPROT bit is used to protect interrupt vectors/handlers, since the instruction fetch of those vectors must be guaranteed to never take a TLB miss exception.

10.6.7 TLB load on reset

During reset, all TLB entries except entry 0 are invalidated. TLB entry 0 is loaded with the values in the following table:

Table 10-7. TLB entry 0 values after reset
Field     | Reset value              | Comments
VALID     | 1                        | Entry is valid
TS        | 0                        | Address space 0
TID[0:7]  | 0x00                     | TID value for shared (global) page
EPN[0:21] | value of p_rstbase[0:21] | Page address present on p_rstbase[0:29]. See Section 14.2.2.5, Reset base (p_rstbase[0:29])
RPN[0:21] | value of p_rstbase[0:21] | Page address present on p_rstbase[0:29]. See Section 14.2.2.5, Reset base (p_rstbase[0:29])
SIZE[0:4] | 00010                    | 4 KB page size
SX/SW/SR  | 111                      | Full supervisor mode access allowed
UX/UW/UR  | 111                      | Full user mode access allowed
WIMG      | 0100                     | Cache inhibited, non-coherent
E         | value of p_rst_endmode   | Value present on p_rst_endmode. See Section 14.2.2.6, Reset endian mode (p_rst_endmode)
U0-U3     | 0000                     | User bits
IPROT     | 1                        | Page is protected from invalidation
VLE       | value of p_rst_vlemode   | Value present on p_rst_vlemode signal. See Section 14.2.2.7, Reset VLE Mode (p_rst_vlemode).

10.6.8 The G bit

The G bit provides protection from bus accesses that could be cancelled due to an exception on a prior uncompleted instruction. If G=1 (guarded), these types of accesses must stall (if they miss in the cache) until the exception status of the instruction(s) in progress is known. If G=0 (unguarded), then these accesses may be issued to the bus regardless of the completion status of other instructions. Since the e200z759n3 does not make requests to the bus for load or store instructions that miss in the cache until it is known that prior instructions will complete without exceptions, proper operation will always occur to guarded storage.
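The TLB miss exception update of Section 10.6.5 can be modeled in C as a host-side sketch. This is not a description of the hardware implementation: the structure and function names are hypothetical, the field positions are taken from the MAS register tables in Section 10.7.3 (bit 0 = most significant bit), and the per-field actions follow the "Instr/data TLB error" column of Table 10-15. Reserved register bits are simply written as zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software image of the MAS registers used by the model. */
struct mas_regs { uint32_t mas0, mas1, mas2, mas3, mas4, mas6; };

/* Model of the MAS updates on an instruction or data TLB error.
 * ea   = effective address of the faulting access
 * as   = MSR[IS] or MSR[DS], depending on the access type
 * pid0 = current PID0 value */
static void model_tlb_miss_update(struct mas_regs *m, uint32_t ea,
                                  unsigned as, uint8_t pid0)
{
    uint32_t nv      = m->mas0 & 0x1Fu;         /* MAS0[NV], bits 27:31      */
    uint32_t tidseld = (m->mas4 >> 16) & 0x3u;  /* MAS4[TIDSELD], bits 14:15 */

    assert(as <= 1u);
    assert(tidseld == 0u || tidseld == 3u);     /* 01 and 10 are reserved    */

    uint32_t tid = (tidseld == 0u) ? pid0 : 0u; /* 00 = PID0, 11 = TIDZ      */

    m->mas0 = (m->mas4 & (0x3u << 28))          /* TLBSEL <- TLBSELD         */
            | (nv << 16)                        /* ESEL   <- NV              */
            | nv;                               /* NV unchanged              */
    m->mas1 = (1u << 31)                        /* VALID  <- 1, IPROT <- 0   */
            | (tid << 16)                       /* TID                       */
            | ((uint32_t)as << 12)              /* TS     <- MSR[IS/DS]      */
            | (m->mas4 & (0x1Fu << 7));         /* TSIZE  <- TSIZED          */
    m->mas2 = (ea & 0xFFFFFC00u)                /* EPN[0:21]                 */
            | (m->mas4 & 0x3Fu);                /* VLE, W, I, M, G, E defaults */
    m->mas3 = 0u;                               /* RPN and access bits zeroed */
    m->mas6 = ((uint32_t)pid0 << 16) | as;      /* SPID <- PID0, SAS <- AS   */
}
```

After this update, a handler that accepts the MAS4 defaults only has to supply MAS3 (RPN, user bits, and permissions) before executing tlbwe, which matches the behavior described in Section 10.6.5.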
10.7 MMU control registers

10.7.1 Data Exception Address Register (DEAR)

The Data Exception Address register is loaded with the effective address of the data access that results in an Alignment, Data TLB Miss, or DSI exception.

Figure 10-7. Data Exception Address Register (DEAR)
  Bits 0:31: Effective Page Address
  SPR 61; Read/Write; Reset: Unaffected

The DEAR register can be read or written using the mfspr and mtspr instructions.

10.7.2 MMU Control and Status Register 0 (MMUCSR0)

The MMU Control and Status Register 0 (MMUCSR0) is a 32-bit register. The SPR number for MMUCSR0 is 1012 in decimal. MMUCSR0 controls the state of the MMU. The MMUCSR0 register is shown in Figure 10-8.

Figure 10-8. MMU Control and Status Register 0 (MMUCSR0)
  Bits 0:29: 0; bit 30: TLB1_FI; bit 31: 0
  SPR 1012; Read/Write; Reset: 0x0

The MMUCSR0 bits are described in Table 10-8.

Table 10-8. MMUCSR0 field descriptions
Bits         | Name    | Description
0:29 [32:61] | -       | Reserved [1]
30 [62]      | TLB1_FI | TLB1 flash invalidate
             |         |   0 No flash invalidate
             |         |   1 TLB1 invalidation operation
             |         | When written to a '1', a TLB1 invalidation operation is initiated by hardware. Once complete, this bit is reset to '0'. Writing a '1' while an invalidation operation is in progress will result in an undefined operation. Writing a '0' to this bit while an invalidation operation is in progress will be ignored. TLB1 invalidation operations require 3 cycles to complete.
31 [63]      | -       | Reserved [1]
[1] These bits are not implemented, will be read as zero, and writes are ignored.

10.7.3 MMU assist registers (MAS)

The e200z759n3 uses six special purpose registers (MAS0, MAS1, MAS2, MAS3, MAS4, and MAS6) to facilitate reading, writing, and searching the TLBs. The MAS registers can be read or written using the mfspr and mtspr instructions.
The e200z759n3 does not implement the MAS5 register, present in other Freescale Book E designs, because the tlbsx instruction only searches based on a single SPID value.

10.7.3.1 MMU Read/Write and Replacement Control register (MAS0)

The MAS0 register is shown in Figure 10-9. Fields are defined in Table 10-9.

Figure 10-9. MMU Assist Register 0 (MAS0)
  Fields: TLBSEL (01), ESEL, NV
  SPR 624; Read/Write; Reset: Unaffected

Table 10-9. MAS0 field descriptions
Bit           | Name   | Description
0:1 [32:33]   | -      | Reserved [1]
2:3 [34:35]   | TLBSEL | Selects TLB for access: 00=TLB0, 01=TLB1 (ignored by Zen, should be written to 01 for future compatibility)
4:10 [36:42]  | -      | Reserved [1]
11:15 [43:47] | ESEL   | Entry select for TLB.
16:25 [48:57] | -      | Reserved [1]
27:31 [59:63] | NV     | Next replacement victim for TLB1 (software managed). Software updates this field; it is copied to the ESEL field on a TLB Error (see Table 10-15)
[1] These bits are not implemented, will be read as zero, and writes are ignored.

10.7.3.2 Descriptor Context and Configuration Control register (MAS1)

The MAS1 register is shown in Figure 10-10. Fields are defined in Table 10-10.

Figure 10-10. MMU Assist Register 1 (MAS1)
  Fields: VALID, IPROT, TID, TS, TSIZE
  SPR 625; Read/Write; Reset: Unaffected

Table 10-10. MAS1 field descriptions
Bit           | Name  | Description
0 [32]        | VALID | TLB Entry Valid
              |       |   0 This TLB entry is invalid
              |       |   1 This TLB entry is valid
1 [33]        | IPROT | Invalidation Protect
              |       |   0 Entry is not protected from invalidation
              |       |   1 Entry is protected from invalidation as described in Section 10.6.6, IPROT invalidation protection.
              |       | Protects TLB entry from invalidation by tlbivax (TLB1 only), or flash invalidates through MMUCSR0[TLB1_FI].
2:7 [34:39]   | -     | Reserved [1]
8:15 [40:47]  | TID   | Translation ID bits. This field is compared with the current process IDs of the effective address to be translated. A TID value of 0 defines an entry as global and matches with all process IDs.
16:18 [48:50] | -     | Reserved [1]
19 [51]       | TS    | Translation address space. This bit is compared with the IS or DS fields of the MSR (depending on the type of access) to determine if this TLB entry may be used for translation.
20:24 [52:56] | TSIZE | Entry's page size. Supported page sizes are:
              |       |   0b00000 = 1 KB      0b01100 = 4 MB
              |       |   0b00001 = 2 KB      0b01101 = 8 MB
              |       |   0b00010 = 4 KB      0b01110 = 16 MB
              |       |   0b00011 = 8 KB      0b01111 = 32 MB
              |       |   0b00100 = 16 KB     0b10000 = 64 MB
              |       |   0b00101 = 32 KB     0b10001 = 128 MB
              |       |   0b00110 = 64 KB     0b10010 = 256 MB
              |       |   0b00111 = 128 KB    0b10011 = 512 MB
              |       |   0b01000 = 256 KB    0b10100 = 1 GB
              |       |   0b01001 = 512 KB    0b10101 = 2 GB
              |       |   0b01010 = 1 MB      0b10110 = 4 GB
              |       |   0b01011 = 2 MB
              |       | All other values are undefined
25:31 [57:63] | -     | Reserved [1]
[1] These bits are not implemented, will be read as zero, and writes are ignored.

10.7.3.3 EPN and Page Attributes register (MAS2)

The MAS2 register is shown in Figure 10-11. Fields are defined in Table 10-11.

Figure 10-11. MMU Assist Register 2 (MAS2)
  Fields: EPN, VLE, W, I, M, G, E
  SPR 626; Read/Write; Reset: Unaffected

Table 10-11. MAS2 field descriptions
Bit           | Name | Description
0:21 [32:53]  | EPN  | Effective page number [0:21]
22:25 [54:57] | -    | Reserved [1]
26 [58]       | VLE  | PowerISA VLE
              |      |   0 This page is a standard BookE page
              |      |   1 This page is a PowerISA VLE page
              |      | This bit will always read as zero and writes will be ignored if p_vle_present is negated.
27 [59]       | W    | Write-through Required
              |      |   0 This page is considered write-back with respect to the caches in the system
              |      |   1 All stores performed to this page are written through to main memory
28 [60]       | I    | Cache Inhibited
              |      |   0 This page is considered cacheable
              |      |   1 This page is considered cache-inhibited
29 [61]       | M    | Memory Coherence Required
              |      |   0 Memory Coherence is not required
              |      |   1 Memory Coherence is required
30 [62]       | G    | Guarded
              |      |   0 Accesses to this page are not guarded, and can be performed before it is known if they are required by the sequential execution model
              |      |   1 All loads and stores to this page are performed without speculation (i.e. they are known to be required)
              |      | The Zen Z7 uses the guarded attribute as described in Section 11.16, Page table control bits.
31 [63]       | E    | Endianness
              |      |   0 The page is accessed in big-endian byte order.
              |      |   1 The page is accessed in true little-endian byte order.
              |      | Determines endianness for the corresponding page. Refer to Section 15.2.4, Byte lane specification, for more information
[1] These bits are not implemented, will be read as zero, and writes are ignored.

10.7.3.4 RPN and Access Control register (MAS3)

The MAS3 register is shown in Figure 10-12. Fields are defined in Table 10-12.

Figure 10-12. MMU Assist Register 3 (MAS3)
  Fields: RPN, U0, U1, U2, U3, UX, SX, UW, SW, UR, SR
  SPR 627; Read/Write; Reset: Unaffected

Table 10-12. MAS3 field descriptions
Bit           | Name   | Description
0:21 [32:53]  | RPN    | Real page number [0:21]. Only bits that correspond to a page number are valid. Bits that represent offsets within a page are ignored and should be zero.
22:25 [54:57] | U0-U3  | User bits [0-3] for use by system software
26:31 [58:63] | PERMIS | Permission bits (UX, SX, UW, SW, UR, SR)

10.7.3.5 Hardware Replacement Assist Configuration register (MAS4)

The MAS4 register is shown in Figure 10-13. Fields are defined in Table 10-13.

Figure 10-13. MMU Assist Register 4 (MAS4)
  Fields: TLBSELD (01), TIDSELD, TSIZED, VLED, WD, ID, MD, GD, ED
  SPR 628; Read/Write; Reset: Unaffected

Table 10-13. MAS4 field descriptions
Bit           | Name    | Description
0:1 [32:33]   | -       | Reserved [1]
2:3 [34:35]   | TLBSELD | Default TLB selected: 00=TLB0, 01=TLB1
4:13 [36:45]  | -       | Reserved [1]
14:15 [46:47] | TIDSELD | Default PID# to load TID from
              |         |   00 PID0
              |         |   01 Reserved, do not use
              |         |   10 Reserved, do not use
              |         |   11 TIDZ (8'h00) (use all zeros, the globally shared value)
16:19 [48:51] | -       | Reserved [1]
20:24 [52:56] | TSIZED  | Default TSIZE value
25 [57]       | -       | Reserved [1]
26 [58]       | VLED    | Default VLE value
27:31 [59:63] | DWIMGE  | Default WIMGE values
[1] These bits are not implemented, will be read as zero, and writes are ignored.

NOTE
MAS5 is not implemented on the MPC560xS.

10.7.3.6 TLB Search Context Register 0 (MAS6)

The MAS6 register is shown in Figure 10-14. Fields are defined in Table 10-14.

Figure 10-14. MMU Assist Register 6 (MAS6)
  Fields: SPID, SAS
  SPR 630; Read/Write; Reset: Unaffected

Table 10-14. MAS6 field descriptions
Bit           | Name | Description
0:7 [32:39]   | -    | Reserved [1]
8:15 [40:47]  | SPID | PID value for searches
16:30 [48:62] | -    | Reserved [1]
31 [63]       | SAS  | AS value for searches
[1] These bits are not implemented, will be read as zero, and writes are ignored.

10.7.4 MAS registers summary

The MAS registers are summarized in Figure 10-15.
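As an illustration of the field layouts in Tables 10-9 through 10-12, the following C sketch composes the MAS0-MAS3 values that software might load with mtspr before executing tlbwe. All macro, type, and function names are hypothetical. The shift amounts are derived from the bit positions in the tables (bit 0 = most significant bit), the page-size formula 1 KB << TSIZE is inferred from the TSIZE list in Table 10-10, and the sketch leaves TS, IPROT, and the user bits at zero.

```c
#include <assert.h>
#include <stdint.h>

#define MAS0_TLBSEL_TLB1  (1u << 28)                       /* bits 2:3 = 01 */
#define MAS0_ESEL(n)      (((uint32_t)(n) & 0x1Fu) << 16)  /* bits 11:15    */
#define MAS1_VALID        (1u << 31)                       /* bit 0         */
#define MAS1_IPROT        (1u << 30)                       /* bit 1         */
#define MAS1_TID(t)       (((uint32_t)(t) & 0xFFu) << 16)  /* bits 8:15     */
#define MAS1_TS           (1u << 12)                       /* bit 19        */
#define MAS1_TSIZE(s)     (((uint32_t)(s) & 0x1Fu) << 7)   /* bits 20:24    */
#define MAS2_EPN_MASK     0xFFFFFC00u                      /* bits 0:21     */
#define MAS2_VLE          (1u << 5)                        /* bit 26        */
#define MAS2_W            (1u << 4)
#define MAS2_I            (1u << 3)
#define MAS2_M            (1u << 2)
#define MAS2_G            (1u << 1)
#define MAS2_E            (1u << 0)                        /* bit 31        */
#define MAS3_UX           (1u << 5)                        /* bit 26        */
#define MAS3_SX           (1u << 4)
#define MAS3_UW           (1u << 3)
#define MAS3_SW           (1u << 2)
#define MAS3_UR           (1u << 1)
#define MAS3_SR           (1u << 0)                        /* bit 31        */

struct mas_image { uint32_t mas0, mas1, mas2, mas3; };

/* Page size in bytes for a TSIZE encoding (0b00000 = 1 KB ... 0b10110 = 4 GB). */
static uint64_t tsize_to_bytes(unsigned tsize)
{
    assert(tsize <= 22u);          /* all other values are undefined */
    return (uint64_t)1024u << tsize;
}

/* Hypothetical helper: MAS0-MAS3 image for one TLB1 entry. Address bits
 * that fall inside the page offset are cleared, as Table 10-12 requires. */
static struct mas_image mas_for_entry(unsigned esel, uint8_t tid, unsigned tsize,
                                      uint32_t ea, uint32_t pa,
                                      uint32_t attrs, uint32_t perms)
{
    uint32_t page_mask = (uint32_t)~(tsize_to_bytes(tsize) - 1u) & MAS2_EPN_MASK;
    struct mas_image m;

    m.mas0 = MAS0_TLBSEL_TLB1 | MAS0_ESEL(esel);
    m.mas1 = MAS1_VALID | MAS1_TID(tid) | MAS1_TSIZE(tsize);
    m.mas2 = (ea & page_mask) | (attrs & 0x3Fu);   /* VLE, W, I, M, G, E */
    m.mas3 = (pa & page_mask) | (perms & 0x3Fu);   /* UX..SR             */
    return m;
}
```

For example, a global (TID = 0), cache-inhibited, guarded 4 KB supervisor page in entry 3 would be described by mas_for_entry(3, 0, 2, ea, pa, MAS2_I | MAS2_G, MAS3_SX | MAS3_SW | MAS3_SR).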
Figure 10-15. MMU assist registers summary (field layouts of MAS0, MAS1, MAS2, MAS3, MAS4, and MAS6, as defined in Figures 10-9 through 10-14)

10.7.5 MAS register updates

Table 10-15 details the updates to each MAS register field for each update type.

Table 10-15. MMU assist register field updates
Bit/field                | MAS affected | Instr/data TLB error | tlbsx hit                         | tlbsx miss     | tlbre                         | tlbwe | ISI/DSI
TLBSEL                   | 0            | TLBSELD              | 'Hitting TLB'                     | TLBSELD        | NC                            | NC    | NC
ESEL                     | 0            | NV                   | matched entry                     | NV             | NC                            | NC    | NC
NV                       | 0            | NC                   | NC                                | NC             | NC                            | NC    | NC
VALID                    | 1            | 1                    | 1                                 | 0              | V(array)                      | NC    | NC
IPROT                    | 1            | 0                    | Matched IPROT if TLB1 hit, else 0 | 0              | IPROT(array) if TLB1, else 0  | NC    | NC
TID[0:7]                 | 1            | TIDSELD (pid0, TIDZ) | TID(array)                        | SPID           | TID(array)                    | NC    | NC
TS                       | 1            | MSR(IS/DS)           | SAS                               | SAS            | TS(array)                     | NC    | NC
TSIZE[0:4]               | 1            | TSIZED               | TSIZE(array)                      | TSIZED         | TSIZE(array)                  | NC    | NC
EPN[0:21]                | 2            | I/D EPN              | EPN(array)                        | tlbsx EPN      | EPN(array)                    | NC    | NC
VWIMGE                   | 2            | Default values       | VWIMGE(array)                     | Default values | VWIMGE(array)                 | NC    | NC
RPN[0:21]                | 3            | Zeroed               | RPN(array)                        | Zeroed         | RPN(array)                    | NC    | NC
ACCESS (PERMISS + U0:U3) | 3            | Zeroed               | Access(array)                     | Zeroed         | Access(array)                 | NC    | NC
TLBSELD                  | 4            | NC                   | NC                                | NC             | NC                            | NC    | NC
TIDSELD[0:1]             | 4            | NC                   | NC                                | NC             | NC                            | NC    | NC
TSIZED[0:4]              | 4            | NC                   | NC                                | NC             | NC                            | NC    | NC
Default VWIMGE           | 4            | NC                   | NC                                | NC             | NC                            | NC    | NC
SPID                     | 6            | PID0                 | NC                                | NC             | NC                            | NC    | NC
SAS                      | 6            | MSR(IS/DS)           | NC                                | NC             | NC                            | NC    | NC

10.8 TLB coherency control

The e200z759n3 core provides the ability to invalidate a TLB entry as described in the Book E Power Architecture. The tlbivax instruction invalidates local TLB entries only. No broadcast is performed, as no hardware-based coherency support is provided.
The tlbivax instruction invalidates by effective address only. This means that only the TLB entry's EPN bits are used to determine if the TLB entry should be invalidated. It is therefore possible for a single tlbivax instruction to invalidate multiple TLB entries, since the AS and TID fields of the entries are ignored.

10.9 Core interface operation for MMU control instructions

MMU control instructions use the normal CPU interface. The address bus will be driven with the effective address value calculated by the instruction (if any), the access will be treated as a Supervisor Data word-size write, and the Transfer Type encodings will be used to distinguish these operations from other load and store operations. These transfers will not cause debug Data Address Compare matches to occur regardless of the effective address that is driven.

10.9.1 Transfer type encodings for MMU control instructions

Transfer type encodings are used to indicate whether a normal access, atomic access, cache management control access, or MMU management control access is being requested. These attribute signals are driven with addresses when an access is requested. Table 10-16 shows the definitions of the p_d_ttype[0:5] encodings.

Table 10-16. Transfer type encoding
p_d_ttype[0:5] [1] | Transfer type                      | Instruction
00000e             | Normal                             | normal loads / stores
000010             | Atomic                             | lbarx, lharx, lwarx, stbcx., sthcx., and stwcx.
00010e             | Flush Data Block                   | dcbst
00011e             | Flush and Invalidate Data Block    | dcbf
00100e             | Allocate and Zero Data Block       | dcbz
001010             | Invalidate Data Block              | dcbi
00110e             | Invalidate Instruction Block       | icbi
001110             | Multiple word load/store           | lmw, stmw
010000             | TLB Invalidate                     | tlbivax
010010             | TLB Search                         | tlbsx
010100             | TLB Read entry                     | tlbre
010110             | TLB Write entry                    | tlbwe
011000             | Touch for Instruction              | icbt
011010             | Lock Clear for Instruction         | icblc
011100             | Touch for Instruction and Lock Set | icbtls
011110             | Lock Clear for Data                | dcblc
10000e             | Touch for Data                     | dcbt
10001e             | Touch for Data Store               | dcbtst
100100             | Touch for Data and Lock Set        | dcbtls
100110             | Touch for Data Store and Lock Set  | dcbtstls
[1] p_ttype[5] ('e') is set to 0.

10.10 Effect of hardware debug on MMU operation

Hardware debug facilities utilize normal CPU instructions to access register and memory contents during a debug session. If desired during a debug session, the debug firmware may disable the translation process and may substitute default values for the Access Protection (UX, UR, UW, SX, SR, SW) bits, and values obtained from the OnCE Control Register for Page Attribute (VLE, W, I, M, G, E) bits normally provided by a matching TLB entry. In addition, no address translation is performed, and instead, a 1:1 mapping of effective to real addresses is performed. When translation is disabled during the debug session, no TLB miss or TLB Access Protection related DSI conditions will occur. If the debugger desires to use the normal translation process, the MMU may be left enabled in the OnCE OCR, and normal translation (including the possibility of a TLB Miss or DSI) will remain in effect. Refer to Section 12.4.6.3, e200z759n3 OnCE Control Register (OCR), for more detail on controlling MMU operation during debug sessions.

10.11 External translation alterations for realtime systems

In order to support realtime systems in which dynamic mapping of calibration or other data types is needed, the MMU provides special capabilities on a subset of TLB entries. These capabilities allow external hardware to dynamically select one of multiple mappings to one or more physical pages by the same logical address. This capability provides an inexpensive way of dynamically overlaying selected RAM pages on top of read-only memory during runtime. The particular physical page a given logical page maps to can be dynamically altered by means of the p_extpid[6:7] inputs. This capability is only provided for TLB1 entries #0-#15, and only for a restricted subset of PID values.

Enabling of the dynamic mapping capability is controlled by the p_extpid_en control input. This input is sampled with the rising edge of the clock, and when asserted, allows for the dynamic remapping capability to be used.

When one or more of TLB1 entries #0-#15 is programmed with a TID value of 8'b1111xxxx, special entry-specific logic is enabled for the entry. This logic causes the sampled values of the p_extpid[6:7] inputs to be used in place of PID0[6:7] for the purposes of comparison of this entry with the current PID0 register contents to determine an entry hit condition. In addition, for those entries within entries #0-#15 programmed with a TID value of 8'b1111xx11, the comparison of TID[6:7] to PID0[6:7] for a match is always forced true. This means that the hit condition for these entries is independent of the sampled values of the p_extpid[6:7] inputs.
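The TID comparison rule for this capability can be sketched in C as follows. This is a reading of the text of this section and of the global-TID rule in Table 10-10, not a description of the actual logic; the function name and parameters are hypothetical. TID[0:5] corresponds to the upper six bits of the 8-bit value and TID[6:7] to the lower two bits. The sketch covers only the TID/PID part of the hit condition; the VALID, TS, and EPN comparisons are unchanged and omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the TID field of TLB1 entry `entry` matches.
 * tid       = TID stored in the entry
 * pid0      = current PID0 register contents
 * extpid    = sampled value of p_extpid[6:7] (0..3)
 * extpid_en = sampled value of p_extpid_en */
static bool tid_matches(unsigned entry, uint8_t tid, uint8_t pid0,
                        unsigned extpid, bool extpid_en)
{
    assert(extpid <= 3u);

    if (tid == 0u) {
        return true;                      /* global entry matches every PID */
    }

    bool special = extpid_en && entry < 16u && (tid & 0xF0u) == 0xF0u;
    if (!special) {
        return tid == pid0;               /* ordinary full 8-bit compare    */
    }

    if ((tid >> 2) != (pid0 >> 2)) {
        return false;                     /* TID[0:5] must equal PID0[0:5]  */
    }

    unsigned tid_lo = tid & 0x3u;         /* TID[6:7]                       */
    return tid_lo == 3u                   /* 11: compare forced true        */
        || tid_lo == extpid;              /* else p_extpid replaces PID0[6:7] */
}
```

With PID0 = 8'b111101xx, an entry with TID 8'b11110100 hits only while p_extpid[6:7] is 00, whereas an entry with TID 8'b11110111 hits for any p_extpid[6:7] value.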
Entries within entries #0-#15 programmed with a TID value of 8'b1111nm00 will match a PID0 value of 8'b1111nmxx when the p_extpid[6:7] inputs are 00. Those programmed with a TID value of 8'b1111nm01 will match a PID0 value of 8'b1111nmxx when the p_extpid[6:7] inputs are 01, and those programmed with a TID value of 8'b1111nm10 will match a PID0 value of 8'b1111nmxx when the p_extpid[6:7] inputs are 10. Those entries within entries #0-#15 programmed with a TID value of 8'b1111nm11 will match a PID0 value of 8'b1111nmxx regardless of the sampled values of the p_extpid[6:7] inputs.

This logic allows application software of this type to set up to three independent mappings for a set of calibration pages, and for external hardware to select between one of the three based on the driven values of the p_extpid[6:7] inputs. The other pages are mapped with a common set of entries with stored TID values of 1111xx11, which will match for all sets of calibration page selections. This specialized software must use PID values in the range of 111100xx to 111111xx.

Software is responsible for coordinating the modification to the p_extpid[6:7] inputs to ensure they only change when there is no possibility of an error induced by simultaneous use.

Figure 10-16 shows the equivalent logical operation of the capability.

Figure 10-16. External translation alteration TLB entry compare process (entry hit logic combining TLB_entry[V], the TLB_entry[TS] compare against AS from MSR[IS] or MSR[DS], the TLB_entry[EPN] compare against the EA page number bits, and the TID compare against the process ID, with p_extpid_en selecting p_extpid[6:7] as modified_PID[6:7]. Note: functionality available for entries #0-15 only.)

Chapter 11
L1 Cache

This chapter describes the organization of the on-chip L1 Caches, cache control instructions, and various cache operations. It describes the interaction between the caches, the load/store unit (LSU), the instruction unit, and the memory subsystem. This chapter also describes the replacement algorithm used for the L1 Caches.

The L1 Caches incorporate the following features:
• 16 KB I + 16 KB D Harvard cache design
• Virtually indexed, physically tagged
• 32-byte line size
• 64-bit data, 32-bit address
• Pseudo round-robin replacement algorithm
• 8-entry store buffer
• Push (copyback) buffer
• Linefill buffer
• Hit under fill/copyback
• Supports up to two outstanding misses
• Multi-bit EDC protection for the ICache data and tag arrays, with correction/auto-invalidation capability
• Multi-bit EDC protection for the DCache tag arrays, parity protection for the DCache data arrays; with correction/auto-invalidation capability

11.1 Overview

The e200z759n3 processor supports a pair of 16 KB 4-way set-associative split instruction and data caches with a 32-byte line size. The caches improve system performance by providing low-latency data to the e200z759n3 instruction and data pipelines, which decouples processor performance from system memory performance. The caches are virtually indexed and physically tagged. Instruction and data addresses from the processor to the caches are virtual addresses used to index the cache array. The MMU provides the virtual to physical translation for use in performing the cache tag compare. If the physical address matches a valid cache tag entry, the access hits in the cache. For a read operation, the cache supplies the data to the processor, and for a write operation, the data from the processor updates the cache. If the access does not match a valid cache tag entry (misses in the cache) or a write access must be written through to memory, the cache performs a bus cycle on the system bus.
Figure 11-1. e200z759n3 caches (block diagram: the processor core and memory management unit connect through address and data paths to the ICache and DCache interfaces; each cache has control logic, a tag array, a data array, and a bus interface module attached to the instruction or data system bus)

11.2 16 KB cache organization

Each e200z759n3 16 KB cache is organized as four ways of 128 sets with each line containing 32 bytes (four doublewords) of storage. Figure 11-2 illustrates the cache organization along with the cache line format.

Figure 11-2. 16 KB cache organization and line format
  Organization: WAY 0 to WAY 3, each holding one line for SET 0 to SET 127
  Cache line format: TAG | V | D | L | Doubleword0 | Doubleword1 | Doubleword2 | Doubleword3
    TAG: 22 bit Physical Address Tag + Parity
    L: Lock bits
    D: Dirty bits (DCACHE only)
    V: Valid bit

Virtual address bits A[20:26] provide an index to select a set. Ways are selected according to the rules of set association. Each line consists of a physical address tag, status bits, and four doublewords of data. Address bits A[27:29] select the word within the line.

11.3 Cache lookup

Once enabled, the appropriate cache will be searched for a tag match on instruction fetches and data accesses from the CPU. If a match is found, the cached data is forwarded on a read access to the instruction fetch unit or the load/store unit (data access), or is updated on a write access, and may also be written-through to memory if required.
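The address split described in Section 11.2 can be sketched in C. The function and macro names are hypothetical, and the sketch assumes that the 22-bit physical tag is the upper 22 bits (A[0:21]) of the translated address; bit 0 is the most significant bit of the 32-bit address.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32u
#define CACHE_SETS       128u
#define CACHE_WAYS       4u

/* Four ways of 128 sets of 32-byte lines give the 16 KB capacity. */
static_assert(CACHE_LINE_BYTES * CACHE_SETS * CACHE_WAYS == 16u * 1024u,
              "geometry must add up to 16 KB");

/* Set index: virtual address bits A[20:26] (7 bits, 128 sets). */
static unsigned cache_set_index(uint32_t va)
{
    return (va >> 5) & 0x7Fu;
}

/* Word within the line: address bits A[27:29] (8 words per 32-byte line). */
static unsigned cache_word_in_line(uint32_t va)
{
    return (va >> 2) & 0x7u;
}

/* Tag reference: assumed to be physical address bits A[0:21]. */
static uint32_t cache_tag(uint32_t pa)
{
    return pa >> 10;
}
```

The index and word select use 10 address bits in total (A[20:29]), which is consistent with the 1 KB minimum page size: those bits are never changed by translation, so a virtual index always refers to the same set as the physical address would.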
When a read miss occurs, if there is a TLB hit and the I bit of the hitting TLB entry is clear, the translated physical miss address is used to fetch a four doubleword cache line beginning with the requested doubleword (critical doubleword first). The line is fetched into a linefill buffer and the critical doubleword is forwarded to the CPU. Subsequent doublewords may be streamed to the CPU if they have been requested, or they may be forwarded from the linefill buffer if the data has already been received from the bus and is valid in the buffer.

When a write miss occurs, if there is a TLB hit, and the I and G bits of the hitting TLB entry are clear and write allocation is enabled via the L1CSR0[DCWA] control bit, the translated physical address is used to fetch a four doubleword cache line beginning with the doubleword corresponding to the store address (critical doubleword first). The line is fetched into the linefill buffer and merged with the store data. Subsequently, the line is placed into the appropriate cache block. If write allocation is disabled, or the write is not cacheable or is guarded, no cache line fetch is performed for the write.

During a cache line fill, doublewords received from the bus are placed into the cache linefill buffer, and may be forwarded (streamed) to the CPU if such a read request is pending. Accesses from the CPU following delivery of the critical doubleword may be satisfied from the cache (hit under fill, non-blocking) or from the linefill buffer if the requested information has been already received.

If write allocation is enabled, subsequent stores that hit the linefill buffer address while a linefill is in progress for a previous store or dcbtst miss will be merged into the linefill buffer. No merging of stores will be performed during a linefill initiated by a load miss.
When a cache linefill occurs, the linefill buffer contents are placed into the cache array using two accesses; each occurs after receiving a pair of doublewords. The cache always fills an entire line, thereby providing validity on a line-by-line basis. A DCache line is always in one of the following states: invalid, valid, or dirty (and valid). For invalid lines, the V bit is clear, causing the cache line to be ignored during lookups. Valid lines have their V bit set and D bits cleared, indicating the line contains valid data consistent with memory. Dirty cache lines have the D and V bits set, indicating that the line has valid entries that have not been written to memory. ICache lines are either invalid or valid. In addition, a cache line in either cache may be locked (L bits set) indicating the line is not available for replacement.

The caches should be explicitly invalidated after a hardware reset; reset does not invalidate the cache lines. Following initial power-up, the cache contents will be undefined. The L, D and V bits may be set on some lines, necessitating the invalidation of the caches by software before being enabled.

Figure 11-3 illustrates the general flow of cache operation for each 16 KB cache to determine if the address is already allocated in the cache.
(1) The cache set index, virtual address bits A[20:26], is used to select one cache set. A set is defined as the grouping of lines (one from each way) corresponding to the same index into the cache array.
(2) The higher order physical address bits A[0:21] are used as a tag reference or used to update the cache line tag field.
(3) The tags from the selected cache set are compared with the tag reference. If any one of the tags matches the tag reference and the tag status is valid, a cache hit has occurred.
(4) Virtual address bits A[27:28] are used to select one of the four doublewords in each line.
A cache hit indicates that the selected doubleword in that cache line contains valid data (for a read access), or can be written with new data depending on the status of the W access control bit from the MMU (for a write access to the DCache).

Figure 11-3. 16 KB cache lookup flow
(Diagram: virtual address bits A[20:26] select one of sets 0 through 127; each of ways 0 through 3 holds a tag, status bits, and doublewords DW0 through DW3. The tag reference, physical address bits A[0:21], is compared with the tag of each way to produce HIT 0 through HIT 3, which are logically ORed to form HIT.)

11.4 Cache control

Control of the cache is provided by bits in the L1 Cache Control and Status registers (L1CSR0, L1CSR1). Control bits are provided to enable/disable the cache and to invalidate it of all entries. In addition, availability of each way of the caches may be selectively controlled for use. This way control provides cache way locking capability, as well as controlling way availability on a cache line replacement. Ways 0-3 may be selectively disabled for instruction miss replacements and data miss replacements in the respective caches by using the WID and WDD control bits.

Software is responsible for maintaining coherency between instruction and data caches, since independent copies of a cache line may be present in both caches: one allocated by an instruction access, another by a data access.

11.4.1 L1 Cache Control and Status Register 0 (L1CSR0)

The L1 Cache Control and Status Register 0 (L1CSR0) is a 32-bit register used for general control of the data cache as well as providing general control over disabling ways in both caches. The L1CSR0 register is accessed using a mfspr or mtspr instruction. The SPR number for L1CSR0 is 1010 in decimal. The L1CSR0 register is shown in Figure 11-4.

Figure 11-4. L1 Cache Control and Status Register 0 (L1CSR0)
SPR - 1010; Read/Write; Reset - 0x0

The L1CSR0 bits are described in Table 11-1.

Table 11-1. L1CSR0 field descriptions

Bits    Name     Description
0:3     WID      Way Instruction Disable
                 0 The corresponding way in the instruction cache is available for replacement by instruction miss line fills.
                 1 The corresponding way in the instruction cache is not available for replacement by instruction miss line fills.
                 Bit 0 corresponds to way 0. Bit 1 corresponds to way 1. Bit 2 corresponds to way 2. Bit 3 corresponds to way 3.
                 The WID bits may be used for locking ways of the instruction cache, and also are used in determining the replacement policy of the instruction cache.
4:7     WDD      Way Data Disable
                 0 The corresponding way in the data cache is available for replacement by data miss line fills.
                 1 The corresponding way in the data cache is not available for replacement by data miss line fills.
                 Bit 4 corresponds to way 0. Bit 5 corresponds to way 1. Bit 6 corresponds to way 2. Bit 7 corresponds to way 3.
                 The WDD bits may be used for locking ways of the data cache, and also are used in determining the replacement policy of the data cache.
8:10    -        Reserved (note 1)
11      DCWM     Data Cache Write Mode
                 0 Data cache operates in writethrough mode
                 1 Data cache operates in copyback mode
                 When set to writethrough mode, the “W” page attribute from the MMU is ignored and all writes are treated as writethrough required. When set to copyback mode, write accesses are performed in copyback mode unless the “W” page attribute from the MMU is set.
12:13   DCWA     Data Cache Write Allocation Policy
                 00 Cache line allocation on a cacheable write miss is disabled
                 01 Cache line allocation on a cacheable copyback write miss is enabled
                 10 Cache line allocation on a cacheable copyback or writethrough write miss is enabled
                 11 Reserved
                 This field also controls merging of store data into the linefill buffer while a cache linefill is in progress. Store data will not be merged when write allocation is disabled. If DCWA is non-zero, store data merging is enabled regardless of the type (writethrough/copyback) of write.
14      -        Reserved (note 1)
15      DCECE    Data Cache Error Checking Enable
                 0 Error checking is disabled
                 1 Error checking is enabled
16      DCEI     Data Cache Error Injection
                 0 Cache error injection is disabled
                 1 Parity errors are purposefully injected into every byte subsequently written into the cache. The parity bit of each 8-bit data element written will be inverted. This includes writes due to store hits as well as writes due to cache line refills.
                 DCEI will cause injection of errors regardless of the setting of DCECE, although reporting of errors will be masked while DCECE=0.
17      -        Reserved (note 1)
18:19   DCEDT    Data Cache Error Detection Type
                 00 Reserved (defaults to DCEDT=01 (EDC) actions)
                 01 EDC error detection is selected for the tag array and parity is selected for the data arrays
                 1x Reserved
20      DCSLC    Data Cache Snoop Lock Clear
                 0 Snoop has not invalidated a locked line
                 1 Snoop has invalidated a locked line
                 Indicates a cache line lock was cleared by a snoop operation that caused an invalidation. This bit is set by hardware and will remain set until cleared by software writing 0 to this bit location.
21      DCUL     Data Cache Unable to Lock
                 Indicates a lock set instruction was not effective in locking a cache line. This bit is set by hardware on an “unable to lock” condition (other than lock overflows), and will remain set until cleared by software writing 0 to this bit location.
22      DCLO     Data Cache Lock Overflow
                 Indicates a lock overflow (overlocking) condition occurred. This bit is set by hardware on an “overlocking” condition, and will remain set until cleared by software writing 0 to this bit location.
23      DCLFC    Data Cache Lock Bits Flash Clear
                 When written to a ‘1’, a cache lock bits flash clear operation is initiated by hardware. Once complete, this bit is reset to ‘0’. Writing a ‘1’ while a flash clear operation is in progress will result in an undefined operation. Writing a ‘0’ to this bit while a flash clear operation is in progress will be ignored. Cache lock bits flash clear operations require approximately 134 cycles to complete. Clearing occurs regardless of the enable (DCE) value.
24      DCLOA    Data Cache Lock Overflow Allocate
                 Set by software to allow a lock request to replace a locked line when a lock overflow situation exists.
                 0 Indicates a lock overflow condition will not replace an existing locked line with the requested line
                 1 Indicates a lock overflow condition will replace an existing locked line with the requested line
25:26   DCEA     Data Cache Error Action
                 00 Error detection causes a machine check exception.
                 01 Error detection causes correction/auto-invalidation. No machine check is generated for uncorrectable errors unless the cache line was locked and invalidated or is dirty. Dirty lines are not auto-invalidated. In EDC mode, correction is performed for single-bit tag errors, single-bit lock errors, and single or multi-bit dirty errors. Correction is performed for data errors by reloading of the line.
                 1x Reserved
27      -        Reserved (note 1)
28      DCBZ32   Data Cache dcba, dcbz operation length
                 0 dcba, dcbz operations operate on an entire cache line
                 1 dcba, dcbz operations operate on 32 bytes of a cache line
                 Note: This bit is implemented for forward compatibility. Since cache lines are 32 bytes, this bit is ignored for dcba, dcbz operations.
29      DCABT    Data Cache Operation Aborted
                 Indicates a Cache Invalidate or a Cache Lock Bits Flash Clear operation was aborted prior to completion. This bit is set by hardware on an aborted condition, and will remain set until cleared by software writing 0 to this bit location.
30      DCINV    Data Cache Invalidate
                 0 No cache invalidate
                 1 Cache invalidation operation
                 When written to a ‘1’, a cache invalidation operation is initiated by hardware. Once complete, this bit is reset to ‘0’. Writing a ‘1’ while an invalidation operation is in progress will result in an undefined operation. Writing a ‘0’ to this bit while an invalidation operation is in progress will be ignored. Cache invalidation operations require approximately 134 cycles to complete. Invalidation occurs regardless of the enable (DCE) value.
                 During cache invalidations, the parity check bits are written with a value dependent on the DCEDT selection. DCEDT should be written with the desired value for subsequent cache operation when DCINV is set to ‘1’ for proper operation of the cache.
31      DCE      Data Cache Enable
                 0 Cache is disabled
                 1 Cache is enabled
                 When disabled, cache lookups are not performed for normal load or store accesses, or for snoop requests. Other L1CSR0 cache control operations are still available. Also, operation of the store buffer is not affected by DCE.

Note 1: These bits are not implemented and should be written with zero for future compatibility.
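Because Table 11-1 numbers bits with bit 0 as the most significant bit, converting a field position into a conventional mask is a common source of mistakes. The sketch below is a host-side illustration only: `ppc_mask`, `ppc_field`, and `l1csr0_enable_value` are hypothetical helper names, and the chosen combination of fields is an example rather than a recommended initialization value.

```python
# Convert Power-style bit positions (bit 0 = MSB of a 32-bit register)
# into ordinary masks, using the L1CSR0 layout from Table 11-1.

def ppc_mask(msb: int, lsb: int) -> int:
    """Mask covering register bits msb..lsb in Power bit numbering."""
    width = lsb - msb + 1
    return ((1 << width) - 1) << (31 - lsb)


def ppc_field(value: int, msb: int, lsb: int) -> int:
    """Place `value` into the field msb..lsb, truncating to the field width."""
    return (value << (31 - lsb)) & ppc_mask(msb, lsb)


L1CSR0_WID = ppc_mask(0, 3)      # way instruction disable
L1CSR0_WDD = ppc_mask(4, 7)      # way data disable
L1CSR0_DCWM = ppc_mask(11, 11)   # 1 = copyback mode
L1CSR0_DCECE = ppc_mask(15, 15)  # error checking enable
L1CSR0_DCEDT = ppc_mask(18, 19)  # error detection type
L1CSR0_DCABT = ppc_mask(29, 29)  # operation aborted (sticky)
L1CSR0_DCINV = ppc_mask(30, 30)  # invalidate
L1CSR0_DCE = ppc_mask(31, 31)    # enable


def l1csr0_enable_value(copyback: bool, error_checking: bool) -> int:
    """Compose an illustrative L1CSR0 value with the data cache enabled."""
    value = L1CSR0_DCE
    if copyback:
        value |= L1CSR0_DCWM
    if error_checking:
        # DCEDT = 01 selects EDC for the tag array and parity for the data arrays.
        value |= L1CSR0_DCECE | ppc_field(0b01, 18, 19)
    return value


print(hex(l1csr0_enable_value(copyback=True, error_checking=True)))
```

Reserved fields are left at zero, which matches the note that unimplemented bits should be written with zero.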
11.4.2 L1 Cache Control and Status Register 1 (L1CSR1)

The L1 Cache Control and Status Register 1 (L1CSR1) is a 32-bit register used for general control of the instruction cache. The L1CSR1 register is accessed using a mfspr or mtspr instruction. The SPR number for L1CSR1 is 1011 in decimal. The L1CSR1 register is shown in Figure 11-5.

Figure 11-5. L1 Cache Control and Status Register 1 (L1CSR1)
SPR - 1011; Read/Write; Reset - 0x0

The L1CSR1 bits are described in Table 11-2.

Table 11-2. L1CSR1 field descriptions

Bits    Name     Description
0:14    -        Reserved (note 1)
15      ICECE    Instruction Cache Error Checking Enable
                 0 Error checking is disabled
                 1 Error checking is enabled
16      ICEI     Instruction Cache Error Injection Enable
                 0 Cache error injection is disabled
                 1 When ICEDT=01, a double-bit error will be injected into each doubleword written into the cache by inverting the two uppermost parity check bits (p_chk[0:1]).
                 ICEI will cause injection of errors regardless of the setting of ICECE, although reporting of errors will be masked when ICECE=0.
17      -        Reserved (note 1)
18:19   ICEDT    Instruction Cache Error Detection Type
                 00 Reserved (defaults to ICEDT=01 (EDC) actions)
                 01 EDC error detection is selected
                 1x Reserved
20      -        Reserved (note 1)
21      ICUL     Instruction Cache Unable to Lock
                 Indicates a lock set instruction was not effective in locking a cache line. This bit is set by hardware on an “unable to lock” condition (other than lock overflows), and will remain set until cleared by software writing 0 to this bit location.
22      ICLO     Instruction Cache Lock Overflow
                 Indicates a lock overflow (overlocking) condition occurred. This bit is set by hardware on an “overlocking” condition, and will remain set until cleared by software writing 0 to this bit location.
23      ICLFC    Instruction Cache Lock Bits Flash Clear
                 When written to a ‘1’, a cache lock bits flash clear operation is initiated by hardware. Once complete, this bit is reset to ‘0’. Writing a ‘1’ while a flash clear operation is in progress will result in an undefined operation. Writing a ‘0’ to this bit while a flash clear operation is in progress will be ignored. Cache lock bits flash clear operations require approximately 134 cycles to complete. Clearing occurs regardless of the enable (ICE) value.
24      ICLOA    Instruction Cache Lock Overflow Allocate
                 Set by software to allow a lock request to replace a locked line when a lock overflow situation exists.
                 0 Indicates a lock overflow condition will not replace an existing locked line with the requested line
                 1 Indicates a lock overflow condition will replace an existing locked line with the requested line
25:26   ICEA     Instruction Cache Error Action
                 00 Error detection causes a machine check exception.
                 01 Error detection causes correction/auto-invalidation. No machine check is generated unless a locked line is invalidated. Correction is performed for single-bit tag and lock errors, and lines with multi-bit tag or lock errors are invalidated. In parity mode, tag or lock errors will result in invalidation of lines. Correction is performed for single or multi-bit data errors by reloading of the line.
                 1x Reserved
27:28   -        Reserved (note 1)
29      ICABT    Instruction Cache Operation Aborted
                 Indicates a Cache Invalidate or a Cache Lock Bits Flash Clear operation was aborted prior to completion. This bit is set by hardware on an aborted condition, and will remain set until cleared by software writing 0 to this bit location.
30      ICINV    Instruction Cache Invalidate
                 0 No cache invalidate
                 1 Cache invalidation operation
                 When written to a ‘1’, a cache invalidation operation is initiated by hardware. Once complete, this bit is reset to ‘0’. Writing a ‘1’ while an invalidation operation is in progress will result in an undefined operation. Writing a ‘0’ to this bit while an invalidation operation is in progress will be ignored. Cache invalidation operations require approximately 134 cycles to complete. Invalidation occurs regardless of the enable (ICE) value.
                 During cache invalidations, the parity check bits are written with a value dependent on the ICEDT selection. ICEDT should be written with the desired value for subsequent cache operation when ICINV is set to ‘1’ for proper operation of the cache.
31      ICE      Instruction Cache Enable
                 0 Cache is disabled
                 1 Cache is enabled
                 When disabled, cache lookups are not performed for instruction accesses. Other L1CSR1 cache control operations are still available and are not affected by ICE.

Note 1: These bits are not implemented and should be written with zero for future compatibility.

11.4.3 L1 Cache Configuration Register 0 (L1CFG0)

The L1 Cache Configuration Register 0 (L1CFG0) is a 32-bit read-only register. L1CFG0 provides information about the configuration of the e200z759n3 L1 data cache design. The contents of the L1CFG0 register can be read using a mfspr instruction. The SPR number for L1CFG0 is 515 in decimal. The L1CFG0 register is shown in Figure 11-6.

Figure 11-6. L1 Cache Configuration Register 0 (L1CFG0)
SPR - 515; Read-only

The L1CFG0 bits are described in Table 11-3.

Table 11-3. L1CFG0 field descriptions

Bits    Name     Description
0:1     CARCH    Cache Architecture
                 00 The cache architecture is Harvard
2       CWPA     Cache Way Partitioning Available
                 1 The caches support partitioning of way availability for I/D accesses
3       DCFAHA   Data Cache Flush All by Hardware Available
                 0 The data cache does not support Flush All in Hardware
4       DCFISWA  Data Cache Flush/Invalidate by Set and Way Available
                 1 The data cache supports flushing/invalidation by set and way via the L1FINV0 SPR
5:6     -        Reserved - read as zeros
7:8     DCBSIZE  Data Cache Block Size
                 00 The data cache implements a block size of 32 bytes
9:10    DCREPL   Data Cache Replacement Policy
                 10 The data cache implements a pseudo-round-robin replacement policy
11      DCLA     Data Cache Locking APU Available
                 1 The data cache implements the line locking APU
12      DCECA    Data Cache Error Checking Available
                 1 The data cache implements error checking
13:20   DCNWAY   Data Cache Number of Ways
                 0x03 The data cache is 4-way set-associative
21:31   DCSIZE   Data Cache Size
                 0x010 The size of the data cache is 16 KB

11.4.4 L1 Cache Configuration Register 1 (L1CFG1)

The L1 Cache Configuration Register 1 (L1CFG1) is a 32-bit read-only register. L1CFG1 provides information about the configuration of the e200z759n3 L1 instruction cache design. The contents of the L1CFG1 register can be read using a mfspr instruction. The SPR number for L1CFG1 is 516 in decimal. The L1CFG1 register is shown in Figure 11-7.

Figure 11-7. L1 Cache Configuration Register 1 (L1CFG1)
SPR - 516; Read-only

The L1CFG1 bits are described in Table 11-4.

Table 11-4. L1CFG1 field descriptions

Bits    Name     Description
0:3     -        Reserved - read as zeros
4       ICFISWA  Instruction Cache Flush/Invalidate by Set and Way Available
                 1 The instruction cache supports invalidation by set and way via the L1FINV1 SPR
5:6     -        Reserved - read as zeros
7:8     ICBSIZE  Instruction Cache Block Size
                 00 The instruction cache implements a block size of 32 bytes
9:10    ICREPL   Instruction Cache Replacement Policy
                 10 The instruction cache implements a pseudo-round-robin replacement policy
11      ICLA     Instruction Cache Locking APU Available
                 1 The instruction cache implements the line locking APU
12      ICECA    Instruction Cache Error Checking Available
                 1 The instruction cache implements error checking
13:20   ICNWAY   Instruction Cache Number of Ways
                 0x03 The instruction cache is 4-way set-associative
21:31   ICSIZE   Instruction Cache Size
                 0x010 The size of the instruction cache is 16 KB

11.5 Data cache software coherency

Data cache coherency is supported through software operations to invalidate, flush dirty lines to memory, or invalidate dirty lines. The data cache may operate in either writethrough or copyback mode, and in conjunction with an MMU, may designate certain accesses as writethrough or copyback. Data cache misses will force the push and store buffers to empty prior to performing the access to ensure coherency.

11.6 Address aliasing

Each cache is virtually indexed and physically tagged, thus the problems associated with potential cache synonyms due to effective address aliasing are eliminated, unless 1 KB or 2 KB pages are used. If 1 KB or 2 KB pages are used and multiple virtual addresses are mapped to the same physical address, the low-order virtual address bits used to index the cache (A[20:21] for 1 KB pages, A[20] for 2 KB pages) must be the same for each of the virtual pages, and these index bits must match the corresponding physical address bits.
For example, if logical pages X and Y map to physical page P, then X, Y, and P must have the same values of A[20:21] for 1 KB pages, and A[20] for 2 KB pages. Note that this limitation should already be met because of the requirements on 1 KB and 2 KB page usage mandated by Section 10.2.6, Restrictions on 1 KB and 2 KB page size usage.

11.7 Cache Operation

11.7.1 Cache enable/disable

The caches are enabled or disabled by using the respective Cache Enable bits, L1CSR0[DCE] and L1CSR1[ICE]. Cache Enable bits are cleared by power-on reset or normal reset, disabling the caches. When a cache is disabled, the cache tag status bits are ignored, and the cache is not accessed for snoops, normal loads, stores, or instruction fetches. All normal accesses are propagated to the system bus as single-beat (non-burst) transactions.

Note that the state of the Cache Inhibited access attribute (the I bit) remains independent of the state of L1CSR0[DCE] and L1CSR1[ICE]. Disabling a cache does not affect the translation logic in the Memory Management Unit. Translation attributes will still be used when generating attribute information on the system buses. The store buffer is still available for use even when the data cache is disabled.

Altering the DCE or ICE bit must be preceded by an isync and msync to prevent the cache from being disabled or enabled in the middle of a data or instruction access. In addition, the cache may need to be globally flushed before it is disabled to prevent coherency problems when it is re-enabled.

All cache operations are affected by disabling the cache. Cache management instructions (except for mtspr L1FINV{0,1} and mtspr L1CSR{0,1}) do not affect a cache when it is disabled.

11.7.2 Cache fills

Cache line fills are requested when a cacheable load or instruction miss occurs.
Cacheable store misses only allocate cache lines if data cache write allocation is enabled for the type of store being performed. The cache line fill is performed critical doubleword first on the bus using a burst access. The critical doubleword is forwarded to the requesting unit before being written to the cache, thus minimizing stalls due to fill delays.

Cache line fills load a four-doubleword linefill buffer, and updates to the cache array are performed as half-lines are received. Read accesses may hit in the linefill buffer, in which case data is supplied from the buffer to the CPU. On writes that hit the buffer address when write allocation is disabled, the writes stall until the cache fill has completed. When write allocation is enabled, these writes update the linefill buffer only if the buffer is being filled due to a store miss; otherwise the write also stalls until the linefill completes.

Data may be streamed to the CPU as it arrives from the bus if a corresponding request is pending. In addition, the cache supports hit under fill, allowing subsequent CPU accesses to be satisfied by cache hits while the remainder of the line fill completes. This non-blocking capability improves performance by hiding a portion of the line fill latency when data already in the cache or linefill buffer is subsequently requested by the CPU.

The cache supports up to three outstanding misses, and will forward these miss requests to the BIU. Miss data is always returned from the BIU to the cache in order.

Cache fill operations are performed as wrapping bursts on the system bus. If an error response is received on any element of the burst, the burst will be terminated, and the cache line will be marked invalid.
If one or more store hit updates occur to the linefill buffer during allocation of a line for a store miss and a subsequent error response is received during the linefill, the original store miss access and each individual hitting store access will be performed on the system bus as if they were non-allocating. In this case, an async machine check exception will be signaled for the linefill.

11.7.3 Cache line replacement

On a cache miss, the cache controller uses a pseudo-round-robin replacement algorithm to determine which cache line will be selected to be replaced. There is a single replacement counter for each cache. The replacement algorithm acts as follows: On a miss, if the replacement pointer is pointing to a way that is not enabled for replacement (the selected line or way is locked), it is incremented until an available way is selected (if any). After a cache line is successfully filled without error, the replacement pointer increments to point to the next cache way. If no way is available for the replacement, the access is treated as a single beat access and no cache linefill occurs.

Lines selected for replacement that are dirty (modified) must be copied back to main memory. This is performed by first storing the replaced line in a 32-byte push buffer while the missed data is fetched. After filling the new line, the contents of the buffer are written to memory beginning with doubleword 0.

Each replacement counter is initialized to point to way 0 on a reset or on a respective cache invalidate all operation. A replacement counter may also be set to a specific value via an L1FINV{0,1} command.

11.7.4 Cache miss access ordering

Cacheable cache misses may be processed out-of-order by e200z759n3. Load misses that are not cache-inhibited are allowed to bypass buffered stores and push buffer pushes as long as no address alias exists. Alias checking is performed by comparing the index of the load with the index of each buffered store and push.
If no alias match exists, the load is allowed to bypass buffered stores and pushes, regardless of the attributes associated with those stores. Load misses will be performed in-order with respect to other load misses. Store accesses do not bypass loads. Stores are not necessarily performed in order from the point of view of the memory system, since a store miss may cause a linefill to satisfy the store prior to previously buffered stores being completed, as long as no aliasing occurs. Memory access ordering must be enforced by software where required, using the mbar and/or msync instructions, per the PowerArch storage ordering rules.

11.7.5 Cache-inhibited accesses

When the Cache-Inhibited attribute is indicated by translation and a cache miss occurs, all accesses are performed as single beat transactions on the system bus. Cache-Inhibited status is ignored on all cache hits.

For cache-inhibited load access misses, the processor termination is withheld for the load until the store buffer has been flushed of all entries, the push buffer has been emptied, and the load has completed to memory. Cache-inhibited store accesses that are not marked as Guarded are placed in the store buffer (when enabled) and the processor termination occurs when the store buffer entry is allocated (see Section 11.9, Push and store buffers).

11.7.6 Guarded accesses

When the Guarded attribute is indicated by translation and a cache miss occurs, the access will not proceed on the external bus until all previously initiated demand-accesses have been terminated to the processor without error. Buffered stores are considered terminated to the processor when they are placed into the store buffer. Guarded load misses that are not cache-inhibited are allowed to bypass buffered stores and push buffer pushes as long as no address alias exists, regardless of a buffered store being guarded.
Guarded stores will not allocate cache lines on a miss, but are buffered in the store buffer if the access is not also cache-inhibited, regardless of being writethrough required or not (regardless of W bit or L1CSR0[DCWM] values), and will be performed as single-beat accesses on the bus.

11.7.7 Cache-inhibited guarded accesses

When the Cache-inhibited and Guarded attributes are indicated by translation and a cache miss occurs, accesses are performed as single beat transactions on the system bus. Cache-inhibited status is normally ignored on all cache hits. Cache-inhibited status for writethrough stores that are also guarded will not be ignored, however.

For cache-inhibited guarded access misses, or for cache-inhibited guarded writethrough store hits, the processor termination is withheld until the store buffer has been flushed of all entries, the push buffer has been emptied, and the access has completed to memory (see Section 11.9, Push and store buffers). Cache-inhibited guarded stores with W=0 or L1CSR0[DCWM]=1 that hit ignore the Cache-inhibited and Guarded status.

11.7.8 Cache invalidation

e200z759n3 supports full invalidation of the caches under software control. The caches may be invalidated through the L1CSR0[DCINV] and L1CSR1[ICINV] cache invalidate control bits. This function is available even when a cache is disabled. Reset does not invalidate a cache automatically. Software must use the {D,I}CINV control for invalidation after a reset. Proper use of this bit is to determine that it is clear and then set it with a pair of mfspr/mtspr operations.

A 0-to-1 transition on {D,I}CINV causes a flash invalidation to be initiated, which lasts for multiple (approximately 134) CPU cycles. Once set, the {D,I}CINV bit will be cleared by hardware after the operation is complete. It will remain set during the invalidation interval, and may be tested by software to determine when the operation has completed.
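The check, set, and poll sequence for DCINV can be illustrated with a small host-side model. This is a sketch under stated assumptions, not hardware-accurate code: `FakeL1CSR0` is a hypothetical stand-in whose `mfspr`/`mtspr` methods only mimic the self-clearing DCINV bit and the sticky DCABT status bit from Table 11-1, and the number of busy reads is arbitrary. On a real core the accesses would be mfspr/mtspr instructions to SPR 1010.

```python
DCINV = 1 << 1   # L1CSR0 bit 30 in Power numbering
DCABT = 1 << 2   # L1CSR0 bit 29 in Power numbering


class FakeL1CSR0:
    """Toy register model: DCINV self-clears after a few reads."""

    def __init__(self, busy_reads: int = 3, abort: bool = False):
        self.value = 0
        self._busy_reads = busy_reads
        self._abort = abort
        self._remaining = 0

    def mfspr(self) -> int:
        if self.value & DCINV:
            if self._remaining == 0:
                self.value &= ~DCINV        # hardware clears the bit when done
                if self._abort:
                    self.value |= DCABT     # sticky until software writes 0
            else:
                self._remaining -= 1
        return self.value

    def mtspr(self, new: int) -> None:
        if self.value & DCINV:
            new |= DCINV                    # DCINV cannot be changed while busy
        elif new & DCINV:
            self._remaining = self._busy_reads  # 0-to-1 transition starts the operation
        self.value = new


def invalidate(spr, max_polls: int = 1000) -> bool:
    """Start an invalidation and poll for completion; True means not aborted."""
    current = spr.mfspr()
    if current & DCINV:
        raise RuntimeError("invalidation already in progress")
    spr.mtspr((current & ~DCABT) | DCINV)   # clear stale abort status, set DCINV
    for _ in range(max_polls):
        current = spr.mfspr()
        if not current & DCINV:
            return not current & DCABT
    raise TimeoutError("DCINV did not clear")


print(invalidate(FakeL1CSR0()))
print(invalidate(FakeL1CSR0(abort=True)))
```

A caller would typically retry the operation when `invalidate` returns False, since an abort means the invalidation did not complete.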
A mtspr operation to L1CSR{0,1} that attempts to change the state of {D,I}CINV during invalidation will not affect the state of that bit.

In order to properly generate the tag parity/check bits during the invalidation process, the error detection type control located in the L1CSR{0,1}[{D,I}CEDT] field should be configured properly at the time the invalidation operation is initiated. A subsequent change to the error detection type control will require a new invalidation to avoid improper interpretation of previously stored tag parity/check bits.

During the process of performing the invalidation, a cache does not respond to accesses other than snoop accesses, and remains busy. Interrupts may still be recognized and processed, potentially aborting the invalidation operation. When this occurs, the L1CSR{0,1}[{D,I}CABT] bit will be set to indicate unsuccessful completion of the operation. Software should read the L1CSR{0,1} register to determine that the operation has completed (L1CSR{0,1}[{D,I}CINV] bit cleared), and then check the status of the L1CSR{0,1}[{D,I}CABT] bit to determine completion status.

NOTE
While most implementations of the e200z759n3 will stall further instruction execution during this invalidation interval, this is not guaranteed across all implementations, thus software should be written using these guidelines.

Individual cache lines may be invalidated using the icbi, dcbi, or dcbf instructions. These instructions require the respective cache to be enabled in order to operate normally.

11.7.9 Cache flush/invalidate by set and way

e200z759n3 supports cache flushing under software control. The caches may be flushed and/or invalidated by index and way through a mtspr L1FINV{0,1} instruction. The L1 Flush and Invalidate Control Registers (L1FINV{0,1}) are 32-bit SPRs used to select a cache set and way to be flushed/invalidated. No tag match is required.
This function is available even when a cache is disabled. L1FINV0 is used for data cache operations, while L1FINV1 is used for instruction cache operations.

11.7.9.1 L1 Flush and Invalidate Control Register 0 (L1FINV0)

The SPR number for L1FINV0 is 1016 in decimal. The L1FINV0 register is shown in Figure 11-8. The L1FINV0 bits are described in Table 11-5.

Figure 11-8. L1 Flush/Invalidate Register 0 (L1FINV0)
SPR - 1016; Read/Write; Reset - 0x0

Table 11-5. L1FINV0 field descriptions

Bits    Name     Description
0:5     -        Reserved (note 1) for way extension
6:7     CWAY     Cache Way
                 Specifies the data cache way to be selected
8:19    -        Reserved (note 1) for set extension
20:26   CSET     Cache Set
                 Specifies the cache set to be selected
27:29   -        Reserved (note 1) for set/command extension
30:31   CCMD     Cache Command
                 00 The data contained in this entry is invalidated without flushing
                 01 The data contained in this entry is flushed if dirty and valid without invalidation
                 10 The data contained in this entry is flushed if dirty and valid and then is invalidated
                 11 Reset way replacement pointer to the way indicated by CWAY

Note 1: These bits are not implemented and should be written with zero for future compatibility.

For cache flush operations, if a transfer error occurs on a data cache line flush, the push of the remaining portion of the cache line is aborted, the line remains marked dirty and valid, and a machine check condition is signaled.

For flush and flush with invalidation operations, data parity errors do not abort a flush to memory, but a machine check will be generated at the completion of the flush. In both cases the cache line is left unchanged.
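The field layout in Table 11-5 translates directly into the word written by mtspr L1FINV0. The following host-side sketch shows one way to build those words; `l1finv0_word` and `flush_invalidate_all_words` are hypothetical helper names, and walking every set and way is an illustrative pattern rather than a sequence prescribed by this section.

```python
# Build L1FINV0 command words from the Table 11-5 layout
# (Power numbering: CWAY = bits 6:7, CSET = bits 20:26, CCMD = bits 30:31).

CCMD_INVALIDATE = 0b00        # invalidate without flushing
CCMD_FLUSH = 0b01             # flush if dirty and valid, keep the line
CCMD_FLUSH_INVALIDATE = 0b10  # flush if dirty and valid, then invalidate
CCMD_RESET_POINTER = 0b11     # reset replacement pointer to CWAY

NUM_WAYS = 4
NUM_SETS = 128


def l1finv0_word(way: int, cset: int, ccmd: int) -> int:
    """Encode one set/way command; reserved bits are left at zero."""
    if not 0 <= way < NUM_WAYS:
        raise ValueError("way out of range")
    if not 0 <= cset < NUM_SETS:
        raise ValueError("set out of range")
    if not 0 <= ccmd <= 0b11:
        raise ValueError("command out of range")
    return (way << 24) | (cset << 5) | ccmd


def flush_invalidate_all_words() -> list:
    """Words that would visit every line of the 16 KB data cache once."""
    return [
        l1finv0_word(way, cset, CCMD_FLUSH_INVALIDATE)
        for cset in range(NUM_SETS)
        for way in range(NUM_WAYS)
    ]


print(hex(l1finv0_word(way=2, cset=5, ccmd=CCMD_FLUSH_INVALIDATE)))
print(len(flush_invalidate_all_words()))
```

The same encoding applies to L1FINV1, except that Table 11-6 defines only commands 00 and 11 for the instruction cache.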
For flush with invalidation operations to clean lines, tag parity errors and data parity errors are ignored, and the line is invalidated. Note that only the line indicated by CSET and CWAY is checked for errors; lines in the other ways are ignored.

For invalidation without flushing operations, tag parity errors, data parity errors, and dirty-bit parity errors are ignored, and the line will be invalidated.

11.7.9.2 L1 Flush and Invalidate Control Register 1 (L1FINV1)

The SPR number for L1FINV1 is 959 in decimal. The L1FINV1 register is shown in Figure 11-9. The L1FINV1 bits are described in Table 11-6.

Figure 11-9. L1 Flush/Invalidate Register 1 (L1FINV1)
SPR - 959; Read/Write; Reset - 0x0

Table 11-6. L1FINV1 field descriptions

Bits    Name     Description
0:5     -        Reserved (note 1) for way extension
6:7     CWAY     Cache Way
                 Specifies the instruction cache way to be selected
8:19    -        Reserved (note 1) for set extension
20:26   CSET     Cache Set
                 Specifies the instruction cache set to be selected
27:29   -        Reserved (note 1) for set/command extension
30:31   CCMD     Cache Command
                 00 The data contained in this entry is invalidated
                 01 Reserved
                 10 Reserved
                 11 Reset way replacement pointer to the way indicated by CWAY

Note 1: These bits are not implemented and should be written with zero for future compatibility.

11.8 Cache parity and EDC protection

Cache parity is supported for both the tag and data arrays of each cache. Six parity check bits are provided for each tag entry for the tag arrays of both caches to support multi-bit error detection (EDC), and redundant dirty bits are provided in the data cache to provide dirty-bit parity checking without requiring a read-modify-write operation when the dirty bit is set.
Redundant lock bits are provided as well for both the ICache and the DCache. Byte parity is supported for the data arrays of the data cache, and eight parity check bits are provided for each doubleword in the data arrays of the ICache, which are used for multi-bit error detection (EDC–DED, double error detection). Utilizing EDC protection, many multi-bit errors are also detected. Parity and EDC checking is controlled by the L1CSR0DCECE, L1CSR0DCEDT, L1CSR1ICECE, and L1CSR1ICEDT control fields. When error checking is enabled, checking is performed on each cache access, whether for lookup, snoop lookup, or for dirty line replacement. Parity or EDC errors are not signaled by the respective cache when cache error checking is disabled for that cache (L1CSR[0,1][I,D]CECE=0). For normal cache lookups due to instruction fetching, loads, or stores, if an uncorrectable tag EDC error is detected on any portion of the accessed tags, a parity error is signaled, regardless of whether a cache hit or miss occurs. Otherwise, if a cache hit for a load occurs and a data parity error is detected on any portion of the accessed doubleword of data, a parity error is also signaled. Data parity errors are ignored for store hits, since the parity will be updated for the data being stored. Data parity errors are ignored for misses unless the replacement line is dirty or incurs a dirty bit parity error, since the parity will be updated for the new linefill data being stored. Signaling of a parity error may not cause an exception to occur, depending on the error detection action to be taken. Instead, a correction/auto-invalidation cycle may be performed. A dirty line push will not be generated for a dirty line replacement that incurs an uncorrectable tag EDC error. In this case, a machine check will be generated, but no push will have been requested to the external bus, and the cache line will be left unchanged. 
For dirty line pushes from the data cache, accessing the data arrays for the push data may occur after the burst write has been requested on the external bus, thus a push of dirty data may actually push data that contains a parity error. A machine check will be signaled, but the burst will not be aborted, and the line will be invalidated and replaced.

Dirty bit parity is checked when invalidation or replacement operations are required. If a dirty parity error is detected on a cache line replacement, in correction/auto-invalidation mode, it is ignored, and the line is pushed normally. In machine check mode, a machine check exception will be signaled indicating a tag parity error. Dirty status or dirty parity errors will prevent the auto-invalidation of cache lines with tag EDC errors. If a dirty parity error occurs, in correction/auto-invalidation mode the line is assumed to be dirty, and if correction/auto-invalidation is enabled, the error will be corrected by re-writing all three dirty bits to ‘1’. This implies that a single or multi-bit error that sets one or more dirty bits from an initially cleared state will cause the line to appear dirty. This should not cause a functional issue however, since the only result is that a clean but coherent line may be pushed on a flush or replacement in correction/auto-invalidation mode.

Regardless of the error action mode indicated by {D,I}CEA, lock bit parity errors will not signal an exception for normal hits without a tag parity error. If correction/auto-invalidation is enabled, on each cache lookup operation, if a single-bit lock error is detected in one or more ways, it will be corrected by re-writing all lock bits to the correct state. Uncorrectable lock errors will remain unchanged. For cache hits without a tag EDC error, all lock parity errors are ignored.
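The dirty-bit and lock-bit correction rules above can be sketched as a small software model. The dirty rule (any disagreement among the three copies is treated as dirty and rewritten as all ones) comes from the text. The lock rule is an assumption: Section 11.8.6 says only that four lock bits give single-bit correction and double-bit detection, so the sketch uses a plain majority vote over four replicated copies, which may not match the hardware encoding. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct redundant_state {
    uint8_t bits;        /* value held after the correction attempt */
    bool error;          /* the redundant copies disagreed */
    bool uncorrectable;  /* error detected but left unchanged */
};

/* Three redundant dirty bits in the low 3 bits of d. Any mismatch is
 * corrected by assuming the line is dirty (all copies rewritten to 1). */
static struct redundant_state dirty_correct(uint8_t d)
{
    struct redundant_state r = { (uint8_t)(d & 0x7u), false, false };
    if (r.bits != 0x0u && r.bits != 0x7u) {
        r.error = true;
        r.bits = 0x7u;
    }
    return r;
}

/* Four redundant lock bits in the low 4 bits of l. ASSUMPTION: copies are
 * plain replicas, so a 3-to-1 split is corrected by majority and a 2-to-2
 * split is detected but left unchanged. */
static struct redundant_state lock_correct(uint8_t l)
{
    uint8_t v = (uint8_t)(l & 0xFu);
    int ones = (v & 1u) + ((v >> 1) & 1u) + ((v >> 2) & 1u) + ((v >> 3) & 1u);
    struct redundant_state r = { v, false, false };
    if (ones == 1) {
        r.error = true;
        r.bits = 0x0u;
    } else if (ones == 3) {
        r.error = true;
        r.bits = 0xFu;
    } else if (ones == 2) {
        r.error = true;
        r.uncorrectable = true;
    }
    return r;
}
```

The asymmetry is deliberate: a dirty error always resolves toward "dirty" so that modified data is never silently dropped, whereas an ambiguous lock error is left in place.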
Lock parity errors on a cacheable miss (after a correction attempt if correction/auto-invalidation is enabled) will result in the line(s) being invalidated if clean and in a machine check being generated. A new line will not be allocated, and the lock bits will not be updated on the invalidation. Lock bit parity errors are ignored for non-cacheable accesses.

Signaling of a parity error or EDC error may cause a Machine Check exception to occur, and one or more syndrome bits to be set in the Machine Check Syndrome register, or may instead result in a correction/auto-invalidation operation and not result in an exception being signaled, or both may occur, depending on the error action control setting in the appropriate cache control register. Refer to Section 11.8.1, Cache error action control, for details of the cache error action controls. Refer to Section 7.7.2, Machine Check interrupt (IVOR1), and to Section 2.4.7, Machine Check Syndrome Register (MCSR), for a description of Machine Check conditions.

11.8.1 Cache error action control

The L1CSR0DCEA and L1CSR1ICEA control fields allow for selection of several policies to apply when errors are detected during a cache lookup, and are described in the following subsections.

11.8.1.1 L1CSR[0,1][I,D]CEA = 00, machine check generation on error

Selection of the machine check generation on error policy allows for all errors to be processed by software. Parity or EDC errors that could result in incorrect operation will cause a machine check condition. In order to be recoverable, the machine check handler must not incur another parity or EDC error during the initial portion of the machine check handler. Parity/EDC errors will not generate a machine check exception for cache-inhibited accesses.
If machine check generation on error is enabled (L1CSR[0,1][I,D]CEA=00) and an EDC error is detected on any portion of the accessed tags for a cacheable load or store access, a machine check is reported, regardless of whether a cache hit or miss occurs. Otherwise, if a cache hit occurs and a parity or EDC error is detected on any portion of the accessed doubleword of data for a load or an instruction access, a machine check is also reported. For store accesses, data parity errors are ignored. Lock or dirty parity errors on a cacheable miss will cause a machine check to be reported indicating a lock error and/or a tag parity error. Dirty parity errors on a cache hit for a reservation instruction (lwarx, stwcx., etc.) will result in a machine check and will indicate a tag parity error. If a miss occurs and a tag EDC error is detected on a lookup for a cacheable reservation instruction (lwarx, stwcx., etc.), it will be ignored if the line is clean; otherwise, if the line is dirty or a dirty parity error occurs, a machine check will be generated and the reservation access will not be run externally. Cache inhibited reservation accesses will ignore all parity/EDC errors.

11.8.1.2 L1CSR[0,1][I,D]CEA = 01, correction/auto-invalidation on error

The correction/auto-invalidation on error policy attempts to cause most parity and EDC errors to be transparently handled by correcting lines with single-bit tag errors, and invalidating lines with uncorrectable tag errors or with data errors and then causing cache refills to reload correct data from memory, without generation of exceptions. Exceptions are only generated when invalidations could cause or would cause a change in correct behavior, such as changing the locked status of a line, or invalidating potentially dirty data. Parity/EDC errors will not generate invalidations that could cause a machine check exception for cache-inhibited accesses however.
When using EDC protection for the cache tags (L1CSR[0,1][D,I]CEDT=01), single-bit tag errors are corrected by the cache hardware during a correction/auto-invalidation cycle. Clean unlocked lines with multi-bit errors are invalidated on cache hits, with no machine check signaled. Clean locked lines with uncorrectable tag errors are invalidated on cache misses, and a machine check is signaled. Note that since the data arrays have a higher probability of incurring an error than the tag arrays, due to the relative storage capacities, most errors will be transparently corrected, even if they are double-bit or multi-bit errors. Using writethrough mode for critical data will ensure that invalidation or refills are able to recover from errors transparently in most cases.

11.8.1.2.1 Instruction cache errors

If correction/auto-invalidation on error is enabled (L1CSR1ICEA=01) and an error is detected on any portion of the accessed tags or data for an access, a correction/auto-invalidation cycle is inserted, regardless of whether a cache hit or miss occurs. During this cycle, any tag entry with a single-bit tag or lock error is corrected and re-written to correct the stored error. Tag entries with uncorrectable errors are invalidated if unlocked or are invalidated if a cache miss will occur after a correction/auto-invalidation cycle regardless of locked status. If a locked line is invalidated, a machine check will occur, no replacement will occur, and the locked status will remain set for the invalidated line(s) to assist software in determining the location of the error(s). Following the correction/auto-invalidation cycle, a re-lookup is performed for the access.

If a cache hit occurs on a way without a tag EDC error, and an EDC error is detected on any portion of the accessed doubleword of data, a miss is forced, and the same line is refilled from system memory, retaining the existing lock status. The replacement pointer for the cache is not updated in these circumstances.
If a cache hit occurs on a way without a tag EDC error, EDC errors on all other lines are ignored, and no invalidations for those lines will occur.

For all cases of invalidations, if any line that was locked or incurred a lock error was invalidated, a machine check will also occur, even though auto-invalidation is selected. Invalidation is not blocked for locked lines or lines with lock parity errors on cache misses. The lock bits will remain unmodified by the invalidation operation to allow for potential software recovery.

If a refill of a locked line due to a data EDC error encounters an external bus error during the linefill, a machine check will be generated, the line will be invalidated, and the lock bits will remain set.

11.8.1.2.2 Data cache errors

If correction/auto-invalidation on error is enabled (L1CSR0DCEA=01) and an error is detected on any portion of the accessed tags, or if a lock or dirty parity error is detected, an invalidation/correction cycle is inserted, regardless of whether a cache hit or miss occurs. Following the invalidation/correction cycle, a re-lookup is performed for the access.

During the correction/auto-invalidation cycle, any tag entry with a tag or lock error is corrected if possible, and re-written to correct the stored error. Tag entries with uncorrectable errors are invalidated if the line is clean and unlocked, or if the line is clean and a miss will occur after the re-lookup, regardless of lock status. Dirty parity errors are corrected by setting all dirty bits to ‘1’. Dirty lines and lines with a dirty parity error are not invalidated. Following the correction/auto-invalidation cycle, a re-lookup is performed for the access.
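The invalidation rule for a data cache line with an uncorrectable tag error, in correction/auto-invalidation mode, can be written as a small decision function. This is a minimal sketch of the rule as stated in this section, not a description of the hardware implementation. The type and function names are hypothetical, and machine check reporting for dirty lines is intentionally not modeled because it depends on the outcome of the re-lookup.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-line status as seen during the correction/auto-invalidation cycle. */
struct line_state {
    bool dirty_or_dirty_err;  /* line is dirty or has a dirty parity error */
    bool locked_or_lock_err;  /* line is locked or has a lock parity error */
};

struct tag_err_action {
    bool invalidate;     /* line is invalidated during the cycle */
    bool machine_check;  /* invalidation of a locked line also raises a machine check */
};

/* Action for one line that has an uncorrectable tag EDC error.
 * miss_after_relookup: true if the access will miss once the cycle completes. */
static struct tag_err_action
dcache_uncorrectable_tag_action(struct line_state line, bool miss_after_relookup)
{
    struct tag_err_action a = { false, false };

    if (line.dirty_or_dirty_err) {
        /* Dirty lines and lines with a dirty parity error are never
         * auto-invalidated. Machine check reporting is not modeled here. */
        return a;
    }
    if (!line.locked_or_lock_err || miss_after_relookup) {
        a.invalidate = true;
        /* Invalidating a locked line (or one with a lock error) is reported. */
        a.machine_check = line.locked_or_lock_err;
    }
    return a;
}
```

A clean locked line is therefore only invalidated when the access would miss anyway, which is also the only case in which the function reports a machine check.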
If a cache hit occurs on a way without a tag EDC error, and a parity error is detected on any portion of the accessed doubleword of data for a load, if the line is clean, a miss is forced and the line is refilled from system memory, retaining the existing lock status. The replacement pointer for the cache is not updated in these circumstances. All other clean unlocked lines with uncorrectable tag errors will have been invalidated during the correction/auto-invalidation cycle if one was initially needed. Tag EDC errors on lines that were not invalidated earlier due to lock or dirty status will be ignored since a cache hit occurs. For stores, parity errors on data are ignored, and no invalidation or refill of any lines will occur on a hit to a way without a tag EDC error. Note that since the data arrays have a higher probability of incurring an error than the tag arrays, due to the relative storage capacities, most errors will be transparently corrected. Using writethrough mode for critical data will ensure that invalidation or refills are able to recover from errors transparently in most cases. If a cache hit occurs on a way without a tag EDC error, and a parity error is detected on any portion of the accessed doubleword of data for a load, and the line is dirty or a dirty error occurs, no refill of the cache line will occur, the line will not be invalidated, and a machine check will also occur, even if auto-invalidation is selected. All other clean unlocked lines with uncorrectable tag errors will also have been invalidated during the correction/auto-invalidation cycle if one was initially needed. 
Tag EDC errors on lines that were not invalidated earlier due to lock or dirty status will be ignored.

If a cache hit occurs only on a line(s) with an uncorrectable tag EDC error after an invalidation/correction cycle has been performed, since the line is dirty or has a dirty parity error (it would have been invalidated otherwise), a machine check is generated, and no linefill is performed.

If a cache miss occurs and any line with an uncorrectable tag EDC error is dirty or has a dirty parity error, the line is not invalidated, a machine check is generated, and no linefill is performed. All clean lines with tag errors will have been invalidated/corrected on a cache miss, regardless of locked status.

For all cases of invalidations, if any line that was locked or incurred a lock error was invalidated, a machine check will also occur, even though auto-invalidation is selected. Invalidation on a miss is not blocked for locked lines or lines with lock parity errors unless the access is cache-inhibited or the line is dirty. The lock bits will remain unmodified by the invalidation operation to allow for potential software recovery.

If a refill of a locked line due to a data parity error encounters an external bus error during the linefill, a machine check will be generated, the line will be invalidated, and the lock bits will remain set.

11.8.1.2.3 Data cache line flush or invalidation due to reservation instructions (l[b,h,w]arx, st[b,h,w]cx.)

Normally, when executing a load and reserve, or a store conditional instruction, a cache line hit results in the line being pushed (if dirty) and marked clean, and the reservation access performed as a single-beat access. Certain parity or EDC errors may cause other actions however.
If a cache hit to a line with no tag EDC error occurs when performing a lookup for a load or store reservation access, the line will be pushed if dirty, or if a dirty parity error occurs, and will be marked as clean. Locked status will not be changed. A push parity error may occur during the push if a data parity error is encountered, and a machine check will be generated. In this case the reservation access will not be performed. Otherwise, a load reservation access is then performed as a single-beat access, ignoring the cache data. A store reservation access is performed as a writethrough single-beat write access on the bus, regardless of whether it is marked as writethrough required. If the write access completes without error and succeeds (no ERROR or XFAIL response from the bus), then the cache is updated with the store data, but the line is left in a clean state. Uncorrectable tag errors on other clean unlocked lines will cause invalidation of those lines without signaling a machine check. Uncorrectable tag errors on other cache lines that are locked or are dirty will be ignored.

Otherwise, if any line has an uncorrectable tag EDC error and is dirty or has a dirty parity error, a machine check is generated, and the line(s) remains unchanged. Clean unlocked lines with tag EDC errors will be invalidated or corrected, but locked lines or lines with a lock error will not be invalidated on a cache miss, since no new cache line will be allocated.

11.8.2 Parity/EDC error handling for cache control operations and instructions

Parity/EDC errors are not signaled when the respective L1CSR0DCECE and L1CSR1ICECE cache error checking enable bits are cleared. When set, the following sections describe error handling for cache control operations and cache control instructions.
11.8.2.1 L1FINV[0,1] operations

For invalidation operations via the L1FINV[0,1] control registers, uncorrectable tag EDC errors will result in the specified line being invalidated, and no error will be reported, regardless of the setting of L1CSR[0,1][I,D]CEA. Data parity or EDC errors and dirty errors are ignored. Parity or EDC errors on all other ways not specified by the CWAY value for the L1FINV[0,1] are ignored, regardless of the settings of L1CSR[0,1][D,I]CEA.

For flush and flush with invalidate operations via the L1FINV0 control register, if no uncorrectable tag EDC error occurs on the specified line, it is flushed to memory if dirty or if a dirty parity error occurs, and then invalidated for flush with invalidate operations, and no machine check is signaled for dirty parity errors. If an uncorrectable tag EDC error occurs on the specified line, and the line is dirty or a dirty error is encountered, no flush or invalidation will be performed, the line will remain unchanged, and a machine check will be generated. For flush operations, an uncorrectable tag EDC error on a clean line will be ignored, and no error will be reported. For flush with invalidate operations, an uncorrectable tag EDC error on a clean line will result in the specified line being invalidated, and no error will be reported. Lock status is ignored for these operations. Data parity errors may result in a push parity error and a machine check generated, but the line will still be flushed to memory if not prevented due to an uncorrectable tag EDC error. If a push parity error occurs, the line will be left unaffected for flush with invalidate operations.

Lock status will be cleared on an invalidation or flush with invalidation that does not result in a machine check.

11.8.2.2 Cache touch instructions (dcbt, dcbtst, icbt)

Parity errors are not signaled on a lookup for a dcbt, dcbtst, or icbt instruction.
For those instructions, an uncorrectable tag EDC error results in a nop and no error is reported, regardless of error checking being enabled. No invalidations will occur.

11.8.2.3 icbi instructions

For icbi instructions, on a hit to any locked or unlocked line without an uncorrectable tag EDC error (with or without a lock parity error), or on a hit to an unlocked line with an uncorrectable tag EDC error, the line(s) is invalidated, regardless of the setting of L1CSR1ICEA, and no machine check is generated. If L1CSR1ICEA = ‘01’ and any line has a tag EDC error, a correction/invalidation cycle is inserted to correct tags with single-bit errors, and to invalidate unlocked lines with multi-bit errors. Locked lines with uncorrectable tag errors that miss are unaffected. No machine check will be generated.

If a hit occurs to a line with a tag EDC error (after a correction for L1CSR1ICEA = ‘01’) that is locked or has a lock parity error, the line is left unaffected, and no machine check is generated, regardless of the setting of L1CSR1ICEA.

If a miss occurs, all parity/EDC errors are ignored, the lines are left unaffected, and no machine check is generated, regardless of the setting of L1CSR1ICEA. All data EDC errors are ignored regardless of L1CSR1ICEA.

11.8.2.4 dcbi instructions

For dcbi instructions, on a hit to a line without a tag EDC error, the line is invalidated, regardless of the setting of L1CSR0DCEA. For this case, data, lock, and dirty parity errors are ignored. When L1CSR0DCEA = ‘00’, tag parity/EDC errors on other lines are ignored. When L1CSR0DCEA = ‘01’, uncorrectable tag EDC errors on other lines will also cause clean unlocked lines to be invalidated, regardless of hit or miss. No machine check is generated regardless of the setting of L1CSR0DCEA.

For dcbi instructions that hit to a line with a tag EDC error, the line(s) is invalidated if clean and unlocked and no machine check is generated, regardless of the setting of L1CSR0DCEA.
Uncorrectable tag EDC errors will cause other clean unlocked lines to be invalidated when L1CSR0DCEA = ‘01’, regardless of hit or miss. If a hit occurs to a line with an uncorrectable tag EDC error and the line is dirty, or is locked or has a lock parity error, the line is left unaffected, and no machine check is generated, regardless of the setting of L1CSR0DCEA.

For dcbi instructions that miss in all ways, when L1CSR0DCEA = ‘00’, no invalidation is performed regardless of tag parity/EDC errors and no machine check is signaled. Uncorrectable tag EDC errors will cause clean unlocked lines to be invalidated when L1CSR0DCEA = ‘01’, and no machine check is signaled. All other lines are left unchanged.

11.8.2.5 dcbst instructions

For dcbst instructions, on a hit to any line without a tag EDC error, if the line is dirty, or has a dirty bit error, the line is flushed. Lock errors are ignored. When L1CSR0DCEA = ‘00’, tag EDC errors on other lines are ignored. When L1CSR0DCEA = ‘01’, uncorrectable tag EDC errors on other lines will also cause clean unlocked lines to be invalidated, regardless of hit or miss. No machine check is generated regardless of the setting of L1CSR0DCEA. For dcbst, lock and dirty errors are ignored on a hit. Data parity errors will not prevent the line from being flushed, but will cause a machine check to be generated due to a push parity error.

For cacheable dcbst instructions that hit only to a line with a tag EDC error or that miss in all ways, a machine check will be generated if L1CSR0DCEA = ‘00’ and any line with a tag EDC error is dirty. Lock errors are ignored. If L1CSR0DCEA = ‘01’, clean unlocked lines with an uncorrectable tag EDC error are invalidated, and no errors are signaled unless any line with an uncorrectable tag EDC error is also dirty or has a dirty parity error.
If any line with an uncorrectable tag EDC error is dirty, or has a dirty parity error, the line is not flushed and a machine check is generated, regardless of the settings of L1CSR0DCEA.

11.8.2.6 dcbf instructions

For dcbf instructions, on a hit to any line without a tag EDC error, if the line is dirty, or has a dirty bit error, the line is flushed and invalidated. Lock errors are ignored. When L1CSR0DCEA = ‘00’, tag parity/EDC errors on other lines are ignored. When L1CSR0DCEA = ‘01’, uncorrectable tag EDC errors on other lines will also cause clean unlocked lines to be invalidated, regardless of hit or miss. No machine check is generated regardless of the setting of L1CSR0DCEA. For dcbf, data parity errors will not prevent the line from being flushed, but will cause a machine check to be generated due to a push parity error.

For cacheable dcbf instructions that hit only to a line with a tag EDC error or that miss in all ways, a machine check will be generated if L1CSR0DCEA = ‘00’ and any line with a tag EDC error is dirty, locked, or has a dirty parity error or a lock parity error. If L1CSR0DCEA = ‘01’, clean unlocked lines with an uncorrectable tag EDC error are invalidated, and no errors are signaled unless any line with an uncorrectable tag EDC error is also dirty, locked, or has a dirty parity error or a lock parity error. If any line with an uncorrectable tag EDC error is dirty, or has a dirty parity error, the line is not flushed and a machine check is generated. If any line with an uncorrectable tag EDC error is locked, or has a lock parity error, the line is not invalidated, and a machine check is generated.

11.8.2.7 dcbz instructions

For dcbz instructions, on a hit to any line without a tag EDC error, the line is zeroed and set to dirty. Data errors, lock errors, and dirty errors are ignored. When L1CSR0DCEA = ‘00’, tag parity/EDC errors on other lines are ignored.
When L1CSR0DCEA = ‘01’, uncorrectable tag EDC errors on other lines will also cause clean unlocked lines to be invalidated, regardless of hit or miss. No machine check is generated regardless of the setting of L1CSR0DCEA. For dcbz, lock errors are ignored on a hit.

For cacheable dcbz instructions that hit only to a line with a tag EDC error or that miss in all ways, a machine check will be generated if L1CSR0DCEA = ‘00’ and any line has a tag parity/EDC or lock error. If L1CSR0DCEA = ‘01’, all line(s) with an uncorrectable tag EDC error are invalidated if clean. If a clean line that was locked or had a lock parity error was invalidated, a machine check is generated. If any line with an uncorrectable tag EDC error is dirty or has a dirty parity error, the line is not affected, and a machine check is generated, regardless of the settings of L1CSR0DCEA. If a machine check is generated, no dcbz operation will be performed.

11.8.2.8 Cache locking instructions (dcbtls, dcbtstls, dcblc, icbtls, icblc)

For dcbtls, dcbtstls, dcblc, icbtls, and icblc instructions, on a hit to any line without a tag EDC error, the lock bits are set or cleared appropriately, and data, lock, and dirty bit parity or EDC errors are ignored. When L1CSR[0,1][D,I]CEA = ‘00’, tag parity/EDC or lock errors on other lines are ignored. When L1CSR[0,1][D,I]CEA = ‘01’, uncorrectable tag EDC errors on other lines will also cause clean unlocked lines to be invalidated, regardless of hit or miss. No machine check is generated regardless of the setting of L1CSR[0,1][D,I]CEA.

For cacheable dcbtls, dcbtstls, and icbtls instructions that hit only to a line with a tag EDC error or that miss in all ways, a machine check will be generated if L1CSR[0,1][D,I]CEA = ‘00’ and any line has a tag parity/EDC error or a lock error.
If L1CSR[0,1][D,I]CEA = ‘01’, clean lines with an uncorrectable tag EDC error are invalidated and if a clean line that was locked or had a lock parity error was invalidated, a machine check is generated. If any line with an uncorrectable tag EDC error is dirty, or has a dirty parity error, the line is not affected and a machine check is generated, regardless of the settings of L1CSR[0,1][D,I]CEA.

For cacheable dcblc and icblc instructions that hit only to a line with a tag EDC error or that miss in all ways, a machine check will be generated if L1CSR[0,1][D,I]CEA = ‘00’ and any line with a tag parity/EDC error is locked or has a lock parity error. If L1CSR[0,1][D,I]CEA = ‘01’, lock and dirty parity errors will not cause a machine check on their own, but clean lines with an uncorrectable tag EDC error are invalidated, and if a clean line that was locked or had a lock parity error was invalidated, a machine check is generated. If any locked line with an uncorrectable tag EDC error is dirty, or has a dirty parity error, the line is not affected and a machine check is generated, regardless of the settings of L1CSR[0,1][D,I]CEA.

11.8.3 Cache inhibited accesses and parity/EDC errors

For non-cacheable access misses, no cache parity/EDC exceptions are signaled. When operating with correction/auto-invalidation disabled, tag EDC errors will cause misses for cache-inhibited accesses, and no machine check will be generated. When correction/auto-invalidation mode is enabled, a correction/auto-invalidation cycle will be run to correct/auto-invalidate tag, dirty, and lock errors, but invalidations will only be performed for uncorrectable tag errors on clean unlocked lines.

If a cache-inhibited load or instruction fetch access hit occurs to a line with no tag EDC error, and the requested doubleword of data has no parity/EDC error, the access is treated as a cache hit and the CI status is ignored.
Otherwise, if the requested doubleword of data has a parity/EDC error, the access is treated as a cache-inhibited cache miss and the cache data is ignored, even if dirty. No machine check will be generated in this case. A cache-inhibited store hit to a line with no tag EDC error will cause the data to be written to the cache, as well as to memory if the store is a writethrough store, and all data parity errors will be ignored.

If a cache hit occurs to a line with an uncorrectable tag error, the hit is ignored, and the access is performed as a cache-inhibited cache miss and the cache data is ignored, even if dirty. No machine check will be generated in this case.

For cache control instructions such as dcbf, dcbi, icbi, and dcbst that are performed to addresses marked as cache-inhibited, no machine checks are generated, and the operations are only performed on/for lines that would not cause exceptions for the non-CI cases.

11.8.4 Snoop operations and parity/EDC errors

For snoop command lookups in which a hit occurs to a cache line with no tag EDC error, tag EDC errors in other lines are ignored, and no error condition is signaled.

Otherwise, for snoop command lookups in which a tag EDC error occurs and no hit occurs to a tag entry without a parity/EDC error, no correction attempt for the tags with errors will be made regardless of L1CSR0DCEA, and the snoop response will indicate an error condition. When such a tag EDC error occurs on a snoop invalidate command, the invalidation will not occur, and the error will result in a machine check. The snoop queue will continue to be serviced, and the machine check will not necessarily be recoverable. A checkstop condition will not occur however. In this respect, it is treated similarly to a non-maskable interrupt, and the MSR[RI] bit should be used accordingly by software.
11.8.5 EDC checkbit/syndrome coding scheme generation — ICache

When operating with EDC enabled (L1CSR1ICEDT=01), double bit error detection codes are used to protect the tag and data portions of an instruction cache line. Each tag entry utilizes six check bits to cover the tag + valid bit, and each doubleword of data in the data arrays utilizes eight check bits. The specific coding schemes are shown in Table 11-7 and Table 11-8. The lock bits utilize bit-level redundancy, thus are independently protected.

Table 11-7 shows the checkbit coding for each tag entry. A ‘*’ in the table indicates the bit is XOR’ed to form the final checkbit value.

Table 11-7. Tag checkbit generation
[Rows: checkbits p_tchk[0:5]. Columns: tag bits 0:21 and the valid bit (V). The ‘*’ entries of this matrix are not recoverable from this copy of the text.]

Table 11-8 shows the checkbit coding for each doubleword data entry. A ‘*’ in the table indicates the bit is XOR’ed to form the final checkbit value.

Table 11-8. Data checkbit generation
[Rows: checkbits p_dchk[0:7]. Columns: data bits 0:63, presented in two halves (0:31 and 32:63). The ‘*’ entries of this matrix are not recoverable from this copy of the text.]

11.8.6 EDC checkbit/syndrome coding scheme generation — DCache

When operating with EDC enabled (L1CSR0DCEDT=01), double bit error detection codes are used to protect the tag portion of a data cache line. The data array continues to utilize single-bit parity protection. Each data cache tag entry utilizes six check bits to cover the tag + valid bit. The specific coding scheme for the tag array is the same as is used for the ICache, and is shown in Table 11-7. The dirty and lock bits utilize bit-level redundancy, thus are independently protected. Three dirty bits are provided to support single-bit and double-bit error detection. Correction is performed by setting the dirty bits to ‘1’ if a dirty parity error occurs and auto-invalidation/correction is enabled. Four lock bits are provided to support single-bit error correction and double-bit error detection.

11.8.7 Cache error injection

Cache error injection provides a way to test error recovery by intentionally injecting parity errors into the instruction and/or data cache.

Error injection into the instruction cache operates as follows:
• If L1CSR1ICEI is set and L1CSR1ICEDT=01, any instruction cache line fill to the instruction cache data has the associated two most significant parity check bits inverted in the instruction cache data array for each doubleword loaded.

Error injection for the data cache operates as follows:
• If L1CSR0DCEI is set, any cache line fill to the data cache data array has all of the associated parity bits inverted in the data array for each doubleword loaded. Additionally, inverted parity bits are generated for any bytes stored into the data cache data array on a store hit.

Cache parity error injection is not performed for cache debug write accesses, since parity bit values written can be directly controlled (see Section 11.19.3, Cache Debug Access Control register (CDACNTL)).

In order to clear the parity errors, a cache invalidation or an invalidation of the lines that could have had an injected parity error may be performed. Line invalidation may be performed by an icbi/dcbi instruction, or an L1FINV[0,1] invalidation operation.

11.9 Push and store buffers

The push buffer reduces latency for requested new data on a data cache miss by temporarily holding displaced dirty data while the new data is fetched from memory. The push buffer contains 32 bytes of storage (one displaced cache line).

If a data cache miss displaces a dirty line, the linefill request is forwarded to the external bus. While waiting for the response, the current contents of the dirty cache line are placed into the push buffer. Once the linefill transaction (burst read) completes, the cache controller can generate the appropriate burst write bus transaction to write the contents of the push buffer into memory.

The store buffer contains a FIFO that can defer pending write misses or writes marked as write-through in order to maximize performance.
The store buffer can buffer as many as eight words (32 bytes) for this purpose. The store buffer may be disabled for debug purposes. Operation of the store buffer is independent of the L1CSR0[DCE] bit.

When the store buffer is enabled, non-allocating store operations that miss the cache or that are marked as writethrough are placed in the store buffer, and the CPU access is terminated. Each store buffer entry contains 32 bits of physical address, 32 bits of data, size information, and 3 bits of access attribute information (W, G, and S/U) in order to properly drive the attribute output signals on a buffered store access. Cache-inhibited guarded stores, however, are not buffered; they are delayed until the push and store buffers have been emptied.

Once the push or store buffer has valid data, the internal bus controller uses the next available external bus cycle to generate the appropriate write cycles. If another data cache fill is required (for example, a cache load or a store-with-allocate miss) while the processor pipeline continues executing instructions, an alias check is performed between the linefill address and all valid entries in the store and push buffer using the index portion of the access address. If no match is found, the linefill may bypass pending stores in the store or push buffer. Otherwise, if an alias exists (the index matches any valid store buffer entry), the data cache pipeline stalls until the aliased entries have been flushed from the store and push buffer, and only then generates the required external bus transaction for the linefill. Single-beat read transactions do not bypass pending stores in the push or store buffer.

The push buffer is always emptied prior to queued store buffer entries to avoid memory consistency issues.
Once the push buffer has been loaded with dirty data to be written back to memory, a subsequent store may be buffered, but will not be written to memory until the push has completed.

For cache-inhibited load accesses or cache-inhibited guarded store accesses, the processor termination is withheld until the store buffer has been flushed of all entries, the push buffer has been emptied, and the access has completed to memory. A write to the L1CSR0 register may be used to force the push and store buffers to empty before proceeding with the actual L1CSR0 update. Additionally, the msync and mbar instructions also cause these buffers to be emptied prior to completion.

If an external transfer ERROR response occurs while emptying the store buffer, a machine check exception is signaled to the CPU, and a store for the next entry to be written (if any) is initiated.

If a transfer error occurs for a push buffer transaction, the push of the remaining portion of the cache line is aborted, and a machine check exception is signaled to the CPU. This is also the case for a cache control operation that causes a line to be pushed. Following the transfer error, the line will be marked invalid. If it is possible for a transfer error to be returned by the system on a push or a buffered store, and this could cause a problem, the address must be marked guarded and cache inhibited. External termination errors that occur on any push of a dirty cache line will result in a machine check condition.

11.10 Cache management instructions

This section describes the implementation of Cache Management instructions in e200z759n3.

11.10.1 Instruction cache block invalidate (icbi) instruction

• icbi is described on page 280 of Book E: Enhanced PowerPC™ Architecture v0.99.
• If the cache line containing the byte addressed by the EA associated with this instruction is present in the instruction cache, it is invalidated, regardless of lock status.
If an instruction cache linefill is in progress and the linefill data corresponds to the EA associated with an icbi, the instruction cache is not updated with linefill data.

11.10.2 Instruction cache block touch (icbt) instruction

• icbt is described on page 281 of Book E: Enhanced PowerPC™ Architecture v0.99.
• If HID0[NOPTI] is set, this instruction is treated as a no-op.

11.10.3 Data cache block allocate (dcba) instruction

• dcba is described on page 241 of Book E: Enhanced PowerPC™ Architecture v0.99.
• This instruction is treated as a no-op.

11.10.4 Data cache block flush (dcbf) instruction

• dcbf is described on page 242 of Book E: Enhanced PowerPC™ Architecture v0.99.
• If the cache line containing the byte addressed by the EA associated with this instruction is present in the data cache, it is copied back to memory if dirty. The line is subsequently invalidated regardless of whether it was copied back or locked. If a data cache linefill is in progress and the linefill data corresponds to the EA associated with a dcbf, the data cache is not updated with linefill data.
• This instruction is treated as a load for the purposes of access protection.
• If the data cache is disabled, this instruction is treated as a no-op.

11.10.5 Data cache block invalidate (dcbi) instruction

• dcbi is described on page 243 of Book E: Enhanced PowerPC™ Architecture v0.99.
• If the cache line containing the byte addressed by the EA associated with this instruction is present in the data cache, it is invalidated, regardless of lock status. No copyback occurs if the line is present in the data cache and dirty. If a data cache linefill is in progress and the linefill data corresponds to the EA associated with a dcbi, the data cache is not updated with linefill data.
• This instruction is privileged.
• This instruction is treated as a store for the purposes of access protection.
• If the data cache is disabled, this instruction is treated as a no-op in supervisor mode.

11.10.6 Data cache block store (dcbst) instruction

• dcbst is described on page 245 of Book E: Enhanced PowerPC™ Architecture v0.99.
• If the cache line containing the byte addressed by the EA associated with this instruction is present in the data cache, it is copied back to memory if dirty. The line is subsequently marked clean, and the lock status is unchanged.
• This instruction is treated as a load for the purposes of access protection.
• If the data cache is disabled, this instruction is treated as a no-op.

11.10.7 Data cache block touch (dcbt) instruction

• dcbt is described on page 246 of Book E: Enhanced PowerPC™ Architecture v0.99.
• If HID0[NOPTI] is set, this instruction is treated as a no-op.

11.10.8 Data cache block touch for store (dcbtst) instruction

• dcbtst is described on page 247 of Book E: Enhanced PowerPC™ Architecture v0.99.
• If HID0[NOPTI] is set, this instruction is treated as a no-op.

11.10.9 Data cache block set to zero (dcbz) instruction

• dcbz is described on page 248 of Book E: Enhanced PowerPC™ Architecture v0.99.
• If the cache line containing the byte addressed by the EA associated with this instruction is present in the data cache, all bytes in the line are zeroed, the line is marked as modified, and remains valid. Lock status remains unchanged. If the cache line is not present and the address is cacheable, it is established in the data cache (without fetching from memory), all bytes in the line are zeroed, and the line is marked as modified and valid.
• This instruction is treated as a store for the purposes of access protection.
• dcbz causes an Alignment exception if the EA is marked by the MMU as Cache-inhibited and a data cache miss occurs, or if the EA is marked by the MMU as Writethrough Required, or if the data cache is disabled or is operating in writethrough mode, or if an overlocking condition prevents the allocation of a line into the data cache.

11.11 Touch instructions

Because of the limitations of the icbt, dcbt, and dcbtst instructions, a program that uses them improperly may actually see a degradation in performance. To avoid this, e200z759n3 provides the HID0[NOPTI] control bit, which causes these instructions to be treated as no-ops.

11.12 Cache line locking/unlocking APU

11.12.1 Overview

e200z759n3 supports the Freescale EIS Cache Line Locking APU, which defines user-mode instructions to perform cache locking/unlocking. Three of the instructions are for data cache locking control (dcblc, dcbtls, dcbtstls) and two instructions are for instruction cache locking control (icblc, icbtls).

The dcbtls, dcbtstls, and dcblc lock instructions are treated as reads for checking access permissions when translated by the TLB, and exceptions are taken for Data TLB errors or Data Storage interrupts. The icbtls and icblc instructions require either execute (X) or read (R) permission when translated by the TLB. Exceptions are taken using Data TLB errors (DTLB) or Data Storage Interrupts (DSI), not ITLB or ISI.

The user-mode cache lock enable MSR[UCLE] bit may be used to restrict user-mode cache line locking. If MSR[UCLE] is clear, any cache lock instruction executed in user mode takes a Cache-locking DSI exception (unless nop’ed) and sets either ESR[DLK] or ESR[ILK]. If MSR[UCLE] is set, cache-locking instructions can be executed in user mode and do not take a DSI for cache locking. However, they may still cause a DSI for access violations or cause machine checks for external termination errors.
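The MSR[UCLE] rule above can be sketched as a small C model. This is an illustration only, assuming a hypothetical helper named ucle_check and a hypothetical result enum; it covers just the user-mode permission check and does not model conditions that NOP the instruction first (for example CT != 0) or any other exception source.

```c
#include <stdbool.h>

/* Hypothetical result type: which ESR bit a cache-locking DSI would set. */
typedef enum {
    LOCK_PERMITTED,     /* no cache-locking DSI from the UCLE check */
    LOCK_DSI_ESR_DLK,   /* cache-locking DSI, ESR[DLK] set (data cache) */
    LOCK_DSI_ESR_ILK    /* cache-locking DSI, ESR[ILK] set (instruction cache) */
} lock_check_t;

/* Hypothetical helper modeling only the MSR[UCLE] check.
 *   msr_pr   : MSR[PR], true when executing in user mode
 *   msr_ucle : MSR[UCLE], user-mode cache lock enable
 *   is_data  : true for dcbtls/dcbtstls/dcblc, false for icbtls/icblc
 */
lock_check_t ucle_check(bool msr_pr, bool msr_ucle, bool is_data)
{
    if (msr_pr && !msr_ucle) {
        return is_data ? LOCK_DSI_ESR_DLK : LOCK_DSI_ESR_ILK;
    }
    return LOCK_PERMITTED;
}
```

For example, ucle_check(true, false, true) models a dcbtls issued in user mode with MSR[UCLE] clear and yields LOCK_DSI_ESR_DLK, while any supervisor-mode call yields LOCK_PERMITTED.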
There are cases when attempting to set a lock will fail even when no DSI or DTLB exceptions occur. These are as follows:
• The target address is marked cache-inhibited and a cache miss occurs
• The cache is disabled or all ways of the cache are disabled for replacement
• The cache target indicated by the CT field (bits 7-10) of the instruction is not 0

In these cases, the lock set instruction is treated as a NOP, and the cache unable to lock L1CSR{0,1}[CUL] bit is set.

Assuming no exception conditions occur (DSI or DTLB error), for dcbtls, dcbtstls, and icbtls an attempt is made to lock the corresponding cache line. If a miss occurs, and all of the available ways (ways enabled for a particular access type) are already locked in a given cache set, an attempt to lock another line in the same set results in an overlocking situation. In this case, the cache overlock bit L1CSR{0,1}[CLO] is set to indicate that an overlocking situation occurred. This does not cause an exception condition. The new line is conditionally placed in the cache, displacing a previously locked line, depending on the setting of the appropriate L1CSR{0,1}[CLOA] bit.

The CUL conditions have priority over the CLO condition.

If multiple NOP or exception conditions arise on a cache lock instruction, the results are determined by the order of precedence described in Table 11-9.

It is possible to lock all ways of a given cache set. If an attempt is made to perform a non-locking line fill for a new address in the same cache set, the new line is not put into the cache. It is satisfied on the bus using a single-beat transfer instead of normal burst transfers. If a dcbz instruction is executed, and all ways available for allocation have been locked, an Alignment exception is generated and no line is put into the cache.
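The unable-to-lock (CUL) and overlock (CLO) conditions just described can be summarized in a short C sketch. The type and function names are hypothetical, the model assumes no DSI or DTLB exception has occurred, and it follows the list of failure cases literally; the CLOA-dependent allocation and the treatment of hits when ways are disabled are not modeled.

```c
#include <stdbool.h>

/* Hypothetical outcome of a lock set instruction (dcbtls, dcbtstls, icbtls). */
typedef enum {
    LOCKSET_LOCKED,   /* lock attempt proceeds normally */
    LOCKSET_CUL,      /* treated as NOP, L1CSR{0,1}[CUL] set */
    LOCKSET_CLO       /* overlocking, L1CSR{0,1}[CLO] set */
} lockset_result_t;

/* Hypothetical description of one lock set request. */
typedef struct {
    unsigned ct;                     /* CT field of the instruction */
    bool cache_enabled;              /* target cache is enabled */
    bool all_ways_disabled;          /* all ways disabled for replacement */
    bool cache_inhibited;            /* target address marked cache-inhibited */
    bool hit;                        /* line already present in the cache */
    bool all_available_ways_locked;  /* every available way in the set is locked */
} lockset_req_t;

lockset_result_t lockset_outcome(const lockset_req_t *r)
{
    /* The CUL conditions have priority over the CLO condition. */
    if (r->ct != 0) {
        return LOCKSET_CUL;
    }
    if (!r->cache_enabled || r->all_ways_disabled) {
        return LOCKSET_CUL;
    }
    if (r->cache_inhibited && !r->hit) {
        return LOCKSET_CUL;
    }
    /* Overlocking: a miss while every available way is already locked. */
    if (!r->hit && r->all_available_ways_locked) {
        return LOCKSET_CLO;
    }
    return LOCKSET_LOCKED;
}
```

A request with CT = 0, the cache enabled, a cacheable address, a miss, and all available ways locked returns LOCKSET_CLO; changing only CT to a nonzero value returns LOCKSET_CUL instead, reflecting the stated priority.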
Cache line locking interacts with the ability to control replacement of lines in certain cache ways via the L1CSR0 WID and WDD control bits. If any cache line locking instruction (icbtls, dcbtls, dcbtstls) is allowed to execute and finds a matching line already present in the cache, the line’s lock bit is set regardless of the settings of the WID and WDD fields. In this case, no replacement has been made. However, for cache misses that occur while executing a cache line lock set instruction, the only candidate lines available for locking are those that correspond to ways of the cache that have not been disabled for the particular type of line locking instruction (controlled by WDD for dcbtls and dcbtstls, controlled by WID for icbtls). Thus, an overlocking condition may result even though fewer than four lines with the same index are locked.

The cache-locking DSI handler must decide whether or not to lock a given cache line based upon available cache resources. If the locking instruction is a set lock instruction, and if the handler decides to lock the line, it should do the following:
• Add the line address to its list of locked lines.
• Execute the appropriate set lock instruction to lock the cache line.
• Modify save/restore register 0 to point to the instruction immediately after the locking instruction that caused the DSI.
• Execute an rfi.

If the locking instruction is a clear lock instruction, and if the handler decides to unlock the line, it should do the following:
• Remove the line address from its list of locked lines.
• Execute the appropriate clear lock instruction to unlock the cache line.
• Modify save/restore register 0 to point to the instruction immediately after the locking instruction that caused the DSI.
• Execute an rfi.

11.12.2 dcbtls — data cache block touch and lock set

dcbtls    Data Cache Block Touch and Lock Set

dcbtls CT, RA, RB    (E=0)    Form X

Bits    0-5   6   7-10   11-15   16-20   21-30        31
Field   31    /   CT     RA      RB      0010100110   /

Figure 11-10. dcbtls — data cache block touch and lock set

Description:
if RA=0 then a ← ⁶⁴0 else a ← GPR(RA)
EA ← ³²0 || (a + GPR(RB))₃₂:₆₃
PrefetchDataCacheBlockLockSet(CT, EA)

If CT=0, the cache line corresponding to EA is loaded and locked into the level 1 data cache. If CT=0 and the line already exists in the data cache, dcbtls locks the line without refetching it from external memory.

Exceptions:
If the MSR[UCLE] (user-mode cache lock enable) bit is set, dcbtls may be performed while in user mode (MSR[PR]=1). If the MSR[UCLE] bit is clear, an attempt to perform this instruction in user mode causes a data cache locking error DSI unless the CT field or other conditions otherwise NOP the instruction.

The e200z759n3 only supports CT=0. If CT is some value other than 0, the dcbtls is NOP’ed and the L1CSR0[DCUL] bit is set indicating an unable-to-lock condition occurred. No other exceptions are reported.

If the data cache is disabled, the dcbtls is NOP’ed and the L1CSR0[DCUL] bit is set indicating an unable-to-lock condition occurred. No other exceptions are reported.

The dcbtls instruction is treated as a load with respect to translation and will cause a DSI interrupt for access violations, as well as causing a Data TLB error interrupt if the target address cannot be translated.

If the block corresponding to EA is cache-inhibited and a data cache miss occurs, the instruction is NOP’ed (no DSI is taken due to the cache-inhibited status), and the L1CSR0[DCUL] bit is set indicating an unable-to-lock condition occurred.

Other registers altered:
• L1CSR0 (see below)

When a dcbtls is performed to an index and a way cannot be locked, the L1CSR0[DCUL] bit is set indicating an unable-to-lock condition occurred.
This also occurs whenever the dcbtls must be NOP’ed.

When a dcbtls is performed to an index in the data cache that already has all the ways locked, this is referred to as an over-locking situation. There is no exception generated by an over-locking situation. Instead the L1CSR0[DCLO] bit is set, indicating an over-lock condition occurred. A line is allocated and locked in the cache depending on the setting of the L1CSR0[DCLOA] control bit.

If system software wants to precisely determine if an overlock condition has happened, it must perform the following code sequence:

    dcbtls
    msync
    mfspr (L1CSR0)
    (check L1CSR0[DCUL] bit for cache index unable-to-lock condition)
    (check L1CSR0[DCLO] bit for cache index over-lock condition)

11.12.3 dcbtstls — data cache block touch for store and lock set

dcbtstls    Data Cache Block Touch for Store and Lock Set

dcbtstls CT, RA, RB    (E=0)    Form X

Bits    0-5   6   7-10   11-15   16-20   21-30        31
Field   31    /   CT     RA      RB      0010000110   /

Figure 11-11. dcbtstls — data cache block touch for store and lock set

Description:
if RA=0 then a ← ⁶⁴0 else a ← GPR(RA)
EA ← ³²0 || (a + GPR(RB))₃₂:₆₃
PrefetchDataCacheBlockLockSet(CT, EA)

e200z759n3 treats the dcbtstls instruction identically to the dcbtls instruction since no hardware coherency mechanisms are implemented for the cache.

11.12.4 dcblc — data cache block lock clear

dcblc    Data Cache Block Lock Clear

dcblc CT, RA, RB    (E=0)    Form X

Bits    0-5   6   7-10   11-15   16-20   21-30        31
Field   31    /   CT     RA      RB      0110000110   /

Figure 11-12. dcblc — data cache block lock clear

Description:
if RA=0 then a ← ⁶⁴0 else a ← GPR(RA)
EA ← ³²0 || (a + GPR(RB))₃₂:₆₃
DataCacheClearLockBit(CT, EA)

If CT=0, and the line is present in the L1 data cache, the lock bit for that line is cleared, making that line eligible for replacement.
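The X-form layouts in Figures 11-10 through 11-14 can be exercised with a small C encoder. This is a sketch only: the function name encode_cache_lock and the enum names are hypothetical, the extended opcode values are the 10-bit patterns printed in the figures, and the reserved bits 6 and 31 are simply left as zero.

```c
#include <stdint.h>

/* Extended opcodes (instruction bits 21-30) as printed in the figures. */
enum {
    XO_DCBTLS   = 0x0A6,  /* 0010100110 */
    XO_DCBTSTLS = 0x086,  /* 0010000110 */
    XO_DCBLC    = 0x186,  /* 0110000110 */
    XO_ICBTLS   = 0x1E6,  /* 0111100110 */
    XO_ICBLC    = 0x0E6   /* 0011100110 */
};

/* Hypothetical helper: packs one cache-locking instruction word.
 * The manual numbers bits from 0 at the most significant end, so a field
 * whose last bit is n is shifted left by (31 - n). */
uint32_t encode_cache_lock(uint32_t xo, uint32_t ct, uint32_t ra, uint32_t rb)
{
    return (31u << 26)              /* primary opcode 31, bits 0-5 */
         | ((ct & 0xFu)   << 21)    /* CT, bits 7-10 */
         | ((ra & 0x1Fu)  << 16)    /* RA, bits 11-15 */
         | ((rb & 0x1Fu)  << 11)    /* RB, bits 16-20 */
         | ((xo & 0x3FFu) << 1);    /* extended opcode, bits 21-30 */
}
```

For example, encode_cache_lock(XO_DCBTLS, 0, 0, 3) produces 0x7C00194C, the word for dcbtls with CT=0, RA=0, and RB=3 under these assumptions.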
Exceptions:
If the MSR[UCLE] (user-mode cache lock enable) bit is set, dcblc may be performed while in user mode (MSR[PR]=1). If the MSR[UCLE] bit is clear, an attempt to perform this instruction in user mode causes a DSI, unless the CT field or other conditions otherwise NOP the instruction.

The e200z759n3 only supports CT=0. If CT is some value other than 0, the dcblc is NOP’ed. No other exceptions are reported.

If the data cache is disabled, the dcblc is NOP’ed. No other exceptions are reported.

The dcblc instruction is treated as a load with respect to translation and will cause a DSI interrupt for access violations, as well as causing a Data TLB error interrupt if the target address cannot be translated.

11.12.5 icbtls — instruction cache block touch and lock set

icbtls    Instruction Cache Block Touch and Lock Set

icbtls CT, RA, RB    (E=0)    Form X

Bits    0-5   6   7-10   11-15   16-20   21-30        31
Field   31    /   CT     RA      RB      0111100110   /

Figure 11-13. icbtls — instruction cache block touch and lock set

Description:
if RA=0 then a ← ⁶⁴0 else a ← GPR(RA)
EA ← ³²0 || (a + GPR(RB))₃₂:₆₃
PrefetchInstructionCacheBlockLockSet(CT, EA)

If CT=0, the cache line corresponding to EA is loaded and locked into the level 1 instruction cache. If CT=0 and the line already exists in the instruction cache, icbtls locks the line without refetching it from external memory.

Exceptions:
If the MSR[UCLE] (user-mode cache lock enable) bit is set, icbtls may be performed while in user mode (MSR[PR]=1). If the MSR[UCLE] bit is clear, an attempt to perform this instruction in user mode causes an Instruction cache locking error DSI unless the CT field or other conditions otherwise NOP the instruction.

The e200z759n3 only supports CT=0. If CT is some value other than 0, the icbtls is NOP’ed and the L1CSR1[ICUL] bit is set indicating an unable-to-lock condition occurred. No other exceptions are reported.
If the instruction cache is disabled, the icbtls is NOP’ed and the L1CSR1[ICUL] bit is set indicating an unable-to-lock condition occurred. No other exceptions are reported.

The icbtls instruction requires either execute or read (X or R) permissions with respect to translation and will cause a DSI interrupt for access violations, as well as causing a Data TLB error interrupt if the target address cannot be translated.

If the block corresponding to EA is cache-inhibited and an instruction cache miss occurs, the instruction is NOP’ed (no DSI is taken due to the cache-inhibited status), and the L1CSR1[ICUL] bit is set indicating an unable-to-lock condition occurred.

Other registers altered:
• L1CSR1 (see below)

When icbtls is performed to an index and a way cannot be locked, the L1CSR1[ICUL] bit is set indicating an unable-to-lock condition occurred. This also occurs whenever icbtls must be NOP’ed.

When icbtls is performed to an index in the instruction cache that already has all the ways locked, this is referred to as an over-locking situation. There is no exception generated by an over-locking situation. Instead the L1CSR1[ICLO] bit is set, indicating an over-lock condition occurred. A line is allocated and locked in the cache depending on the setting of the L1CSR1[ICLOA] control bit.

If system software wants to precisely determine if an overlock condition has happened, it must perform the following code sequence:

    icbtls
    msync
    mfspr (L1CSR1)
    (check L1CSR1[ICUL] bit for cache index unable-to-lock condition)
    (check L1CSR1[ICLO] bit for cache index over-lock condition)

11.12.6 icblc — instruction cache block lock clear

icblc    Instruction Cache Block Lock Clear

icblc CT, RA, RB    (E=0)    Form X

Bits    0-5   6   7-10   11-15   16-20   21-30        31
Field   31    /   CT     RA      RB      0011100110   /

Figure 11-14. icblc — instruction cache block lock clear

Description:
if RA=0 then a ← ⁶⁴0 else a ← GPR(RA)
EA ← ³²0 || (a + GPR(RB))₃₂:₆₃
InstCacheClearLockBit(CT, EA)

If CT=0, and the line is present in the instruction cache, the lock bit for that line is cleared, making that line eligible for replacement.

Exceptions:
If the MSR[UCLE] (user-mode cache lock enable) bit is set, icblc may be performed while in user mode (MSR[PR]=1). If the MSR[UCLE] bit is clear, an attempt to perform this instruction in user mode causes an Instruction cache locking error DSI unless the CT field or other conditions otherwise NOP the instruction.

The e200z759n3 only supports CT=0. If CT is some value other than 0, the icblc is NOP’ed. No other exceptions are reported.

If the instruction cache is disabled, the icblc is NOP’ed. No other exceptions are reported.

The icblc instruction requires either execute or read (X or R) permissions with respect to translation and will cause a DSI interrupt for access violations, as well as causing a Data TLB error interrupt if the target address cannot be translated.

11.12.7 Effects of other cache instructions on locked lines

The following cache instructions have no effect on the state of a cache line's lock bit: icbt, dcba, dcbz, dcbst, dcbt, and dcbtst.

The following cache instructions flush/invalidate and unlock a cache line in the respective L1 caches: dcbf, dcbi, and icbi.

11.12.8 Flash clearing of lock bits

e200z759n3 supports flash clearing of cache lock bits under software control by using the CLFC (cache flash clear locks) control bit in the L1CSR{0,1} register.

Lock bits are not cleared automatically upon power-up (m_por) or normal reset (p_reset_b). Software must use the CLFC control bit to clear the lock bits after a reset. The proper way to use this bit is to confirm that it is clear and then set it, using a pair of mfspr and mtspr operations.
A 0-to-1 transition on CLFC initiates a flash clearing of the lock bits, which lasts for multiple (approx. 134) CPU cycles. Once set, the CLFC bit is cleared by hardware after the operation is complete. It remains set during the clearing interval, and may be tested by software to determine when the operation has completed. An mtspr operation to L1CSR{0,1} that attempts to change the state of L1CSR{0,1}[CLFC] during invalidation will not affect the state of that bit.

While the flash clearing is in progress, the cache does not respond to accesses, and remains busy. Interrupts may still be recognized and processed, potentially aborting the flash clearing operation. When this occurs, the L1CSR{0,1}[ABT] bit is set to indicate unsuccessful completion of the operation. Software should read the L1CSR{0,1} register to determine that the operation has completed (L1CSR{0,1}[CLFC] bit cleared), and then check the status of the L1CSR{0,1}[ABT] bit to determine completion status.

NOTE
Most implementations of the e200z759n3 stall further instruction execution during this flash clearing interval, but this is not guaranteed across all implementations, so software should be written using these guidelines.

11.13 Cache instructions and exceptions

All cache management instructions (except icbt, dcba, dcbt, and dcbtst) can generate TLB miss exceptions if the effective address cannot be translated, or may generate DSI exceptions due to permission violations. In addition, dcbz may generate an Alignment interrupt as described in Section 11.10.9, Data cache block set to zero (dcbz) instruction.

The cache locking instructions dcblc, dcbtls, dcbtstls, icblc and icbtls generate DSI exceptions if the MSR[UCLE] bit is clear and the locking instruction is executed in user mode (MSR[PR]=1).
Data cache locking instructions that result in a DSI exception for this reason set the ESR[DLK] bit (documented as DLK0 in Book E), and Instruction cache locking instructions that result in a DSI exception for this reason set the ESR[ILK] bit (documented as DLK1 in Book E).

11.13.1 Exception conditions for cache instructions

If multiple NOP or exception conditions arise on a cache instruction, the results are determined by the order of precedence described in Table 11-9.

Table 11-9. Special case handling

Operation            CT!=0  Cache     TLB   User &  Protection  WT or cache in  Cache   CI and   All available  External
                            disabled  miss  UCLE=0  violation   writethrough    parity  miss in  ways locked    termination
                                                                mode            error   cache                   error
icbt, dcbt, dcbtst   NOP    NOP       NOP   —       NOP         —               NOP     NOP      NOP            NOP
dcbtls               DCUL   DCUL      DTLB  DLK     DSI         —               MC      DCUL     DCLO           MC
dcbtstls             DCUL   DCUL      DTLB  DLK     DSI         —               MC      DCUL     DCLO           MC
dcblc                NOP    NOP       DTLB  DLK     DSI         —               MC      —        —              —
icbtls               ICUL   ICUL      DTLB  ILK     DSI         —               MC      ICUL     ICLO           MC
icblc                NOP    NOP       DTLB  ILK     DSI         —               MC      —        —              —
dcbz                 —      ALI       DTLB  —       DSI         ALI             MC      ALI      ALI            —
dcbf, dcbst          —      NOP       DTLB  —       DSI         —               MC      —        —              MC
icbi, dcbi           —      NOP       DTLB  —       DSI         —               —       —        —              —
Atomic load or store —      —         DTLB  —       DSI         —               MC      —        —              MC
load, store          —      —         DTLB  —       DSI         —               MC      —        —              MC

Notes:
• Priority decreases from left to right
• Cache operations that do not set or clear locks ignore the value of the CT field
• “—” (dash) indicates executes normally
• “NOP” indicates treated as a no-op
• DSI = data storage interrupt; ALI = alignment interrupt; DTLB = data TLB interrupt
• DCUL, ICUL = no-op, and set L1CSR{0,1}[CUL]
• DCLO, ICLO = no-op, and set L1CSR{0,1}[CLO]
• DLK, ILK = data storage interrupt (DSI) and set ESR[DLK] or ESR[ILK]
• MC = Machine Check and update MCAR

11.13.2 Transfer type encodings for cache management instructions

Transfer type encodings are used to indicate to the Cache whether a normal access, atomic access, cache management control access, or MMU management control access is being requested. These attribute signals are driven with addresses when an access is requested. Table 11-10 shows the definitions of the p_d_ttype[0:5] encodings.

Table 11-10. Transfer type encoding

p_d_ttype[0:5]¹  Transfer type                       Instruction
00000e           Normal                              normal loads / stores
000010           Atomic                              lwarx, stwcx., lharx, sthcx., lbarx, stbcx.
00010e           Flush Data Block                    dcbst
00011e           Flush and Invalidate Data Block     dcbf
00100e           Allocate and Zero Data Block        dcbz
001010           Invalidate Data Block               dcbi
00110e           Invalidate Instruction Block        icbi
001110           Multiple word load/store            lmw, stmw
010000           TLB Invalidate                      tlbivax
010010           TLB Search                          tlbsx
010100           TLB Read entry                      tlbre
010110           TLB Write entry                     tlbwe
011000           Touch for Instruction               icbt
011010           Lock Clear for Instruction          icblc
011100           Touch for Instruction and Lock Set  icbtls
011110           Lock Clear for Data                 dcblc
10000e           Touch for Data                      dcbt
10001e           Touch for Data Store                dcbtst
100100           Touch for Data and Lock Set         dcbtls
100110           Touch for Data Store and Lock Set   dcbtstls

¹ p_ttype[5] (‘e’) is set to 0.

11.14 Sequential consistency

The Power Architecture requires that all memory operations executed by a single processor be sequentially self-consistent. This means that all memory accesses appear to be executed in the order that is specified by the program with respect to exceptions and data dependencies. The e200z759n3 CPU achieves this effect by operating a single pipeline to the Cache/MMU. All memory accesses are presented to the MMU in the exact order that they appear in the program and therefore exceptions are determined in order.

11.15 Self-modifying code requirements

The following sequence of instructions will synchronize the instruction stream.
1. dcbf
2. icbi
3. msync
4. isync

This sequence ensures that the operation is correct for PowerISA 2.06 processors that implement separate instruction and data caches, as well as for multi-processor cache-coherent systems.

11.16 Page table control bits

The Power Architecture allows certain memory characteristics to be set on a page and on a block basis. These characteristics include writethrough (using the W-bit), cacheability (using the I-bit), coherency (using the M-bit), guarded memory (using the G-bit), and endianness (using the E-bit). Incorrect use of these bits may create situations where coherency paradoxes are observed by the processor.
In particular, this can happen when the state of these bits is changed without appropriate precautions being taken (that is, flushing the pages that correspond to the changed bits from the cache), or when the address translations of aliased real addresses specify different values for any of the WIMGE bits.

Generally, certain combinations of WIMG settings are allowed by the Book E Power Architecture; others may present cache coherence paradoxes and are considered programming errors.

11.16.1 Writethrough stores

A writethrough store (WIMGE = b’1xxxx’) may normally hit to a valid cache line. In this case, the cache line remains in its current state, the store data is written into the cache, and the store goes out on the bus as a single-beat write.

11.16.2 Cache-inhibited accesses

When the Cache-inhibited attribute is indicated by translation (WIMGE = b’x1xxx’) and a cache miss occurs, all accesses are performed as single-beat transactions on the system bus with a size indicator corresponding to the size of the load, store or prefetch operation. Cache-inhibited status is ignored on all cache hits.

11.16.3 Memory coherence required

For the e200z759n3, the “memory coherence required” storage attribute (WIMGE = b’xx1xx’) is reflected on the p_d_gbl output during each external data access, to indicate to external coherency logic that memory coherence is required. This bit is ignored for instruction accesses.

11.16.4 Guarded storage

For the e200z759n3, the guarded storage attribute (WIMGE = b’xxx1x’) is used to determine if a second outstanding data cache miss may proceed to the system interface prior to the termination of the first outstanding miss. If the second address is marked as guarded, it will not be presented to the external interface until the previous miss has been completed without error.
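The WIMGE bit patterns used in Sections 11.16.1 through 11.16.4 can be decoded with a few masks. The sketch below is illustrative only: it assumes the five attributes are packed into the low five bits of a byte with W as the most significant of the five, mirroring the b’1xxxx’ notation, and the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of a 5-bit WIMGE value. */
typedef struct {
    bool w;  /* writethrough required,      b'1xxxx' */
    bool i;  /* cache-inhibited,            b'x1xxx' */
    bool m;  /* memory coherence required,  b'xx1xx' */
    bool g;  /* guarded,                    b'xxx1x' */
    bool e;  /* endianness (E) bit,         b'xxxx1' */
} wimge_t;

/* Assumed packing: bit 4 = W, bit 3 = I, bit 2 = M, bit 1 = G, bit 0 = E. */
wimge_t wimge_decode(uint8_t bits)
{
    wimge_t a;
    a.w = (bits & 0x10u) != 0;
    a.i = (bits & 0x08u) != 0;
    a.m = (bits & 0x04u) != 0;
    a.g = (bits & 0x02u) != 0;
    a.e = (bits & 0x01u) != 0;
    return a;
}

/* Cache-inhibited status matters only on a miss; it is ignored on hits. */
bool ci_forces_single_beat(wimge_t a, bool cache_hit)
{
    return a.i && !cache_hit;
}
```

For instance, wimge_decode(0x1A) (binary 11010) reports W, I, and G set with M and E clear, and ci_forces_single_beat returns true for that value only when the access misses the cache.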
11.16.5 Misaligned accesses and the endian (E) bit

Misaligned load or store accesses that cross page boundaries can cause data corruption if the two pages do not have the same endianness (that is, one page is big endian while the other page is little endian). If this occurs, the processor would not get all the bytes, or would get some of them out of order, resulting in garbled data. To protect against data corruption, the e200z759n3 core takes a DSI exception and sets the BO (byte ordering) bit in the Exception Syndrome register whenever this situation occurs.

11.17 Reservation instructions and cache interactions

The e200z759n3 core treats reservation instruction (lbarx, lharx, lwarx, stbcx., sthcx., and stwcx.) accesses as though they were cache inhibited, regardless of page attributes. Additionally, a cache line corresponding to the address of a reservation instruction access will be flushed to memory if dirty, prior to the reservation access being issued to the bus. This is done to allow external reservation logic to be built that properly signals a reservation failure. The bus access will be treated as a single-beat transfer.

11.18 Effect of hardware debug on cache operation

Hardware debug facilities utilize normal CPU instructions to access register and memory contents during a debug session. This may have the unavoidable side-effect of causing the store and push buffers to be flushed. During hardware debug, the MMU page attributes are controllable by the debug firmware via settings of the OnCE Control register (OCR). Refer to Section 12.4.6.3, e200z759n3 OnCE Control Register (OCR). Cache snoop operations continue to be serviced during debug sessions.
11.19 Cache memory access for debug / error handling

The cache memory provides resources needed to do foreground accesses via mtdcr instructions executed by the processor, or background accesses through the JTAG/OnCE port to read and write the cache SRAM arrays. Accesses are supported via a pair of device control registers (DCRs) that are also mapped into OnCE-accessible registers. These resources are intended for use by special debug tools and by debug or specialized error recovery exception software, not by general application code.

Access to the cache memory SRAM arrays using mtdcr instructions may be performed by supervisor-level software after appropriate synchronization has been performed with msync, isync instruction pairs. Access to the cache memory SRAM arrays using the JTAG port is conditional on the CPU being in debug mode. The CPU must be placed in debug state prior to initiation of a read or write access via OnCE.

This facility allows access only to the SRAM arrays used for cache tag and data storage. This function is available even when the cache is disabled. The cache linefill buffer, push buffer, store buffer, and late write buffer are all outside of the SRAM arrays and are not accessible. However, before a debug memory access request is serviced, the push and store buffers will be written to external memory, and the late write and linefill buffers will be written to the cache arrays.

11.19.1 Cache memory access via software

Cache debug access control and data information are accessed by executing mfdcr and mtdcr instructions to the Cache Debug Access control and data registers CDACNTL and CDADATA (see Table 11-11 and Table 11-12). Accesses are performed one word (32 bits) at a time.

For a Cache write access, software must first write the CDADATA register with the desired tag and status flags, or data values.
The second step is to write the CDACNTL register with the desired tag or data location and parity values, with R/W = 0 to select a write and GO = 01 to start the access. Note that writing a 64-bit value for data requires two passes, one for the even word (A29=0) and one for the odd word (A29=1). Each 32-bit write will update all of the parity/check bits, so in general, if only a single 32-bit write is performed, it should be preceded by a read of the data that is not being modified, in order to properly compute or store all 8 parity/check bits when the modified 32-bit data is written. Tag writes are accomplished in a single pass.

For a Cache read access, software must first write the CDACNTL register with the desired tag or data location, with R/W = 1 to select a read and GO = 01 to start the access. The second step is to read the CDADATA register for the tag or data and read the CDACNTL register for parity information.

Completion of any operation can be determined by reading the CDACNTL register. Operations are indicated as complete when CDACNTL[30:31] = ‘00’. Software should poll the CDACNTL register to determine when an access has been completed prior to assuming validity of any other information in the CDACNTL or CDADATA registers. Note that no parity errors are generated as a result of mtdcr/mfdcr instructions involving the CDACNTL or CDADATA registers.

To ensure proper cache write operation, the following program sequence is recommended:

    loop:
        msync
        isync
        mtdcr cdadata, rS1      // set up write data
        mtdcr cdacntl, rS2      // write control to initiate write
        msync
        isync
        mfdcr rN, cdacntl       // check for done
        andi. rT, rN, #3        // isolate GO, CDACNTL[30:31]
        bne loop
        .
        .

To ensure proper cache read operation, the following program sequence is recommended:

    loop:
        msync
        isync
        mtdcr cdacntl, rS2      // write control to initiate read
        msync
        isync
        mfdcr rN, cdacntl       // check for done
        andi. rT, rN, #3        // isolate GO, CDACNTL[30:31]
        bne loop
        mfdcr rT, cdadata       // return data
        .
        .
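The rS2 control value used in the sequences above has to be assembled from the CDACNTL fields listed in Table 11-11. The following Python sketch shows one way a host-side tool or test script might build and check that 32-bit value. It assumes the manual's bit numbering, in which bit 0 is the most significant bit, so a field whose last bit is n is shifted left by 31 - n. The function and dictionary names are hypothetical.

```python
# Sketch only: names are hypothetical, not part of any vendor API.
# Field positions (first bit, last bit) are taken from Table 11-11; bit 0 is the
# most significant bit of the 32-bit register.
FIELDS = {
    "td": (0, 0),        # T/D: 0 = data array, 1 = tag array
    "cway": (2, 3),      # cache way
    "cset": (6, 12),     # cache set
    "word": (13, 15),    # word within the set (data array only)
    "parity": (16, 23),  # parity / EDC check bits
    "cache": (28, 28),   # 0 = data cache, 1 = instruction cache
    "rw": (29, 29),      # 0 = write, 1 = read
    "go": (30, 31),      # 01 starts the access, 00 = inactive or complete
}


def pack_cdacntl(**values):
    """Pack named field values into a CDACNTL image; omitted fields are 0."""
    reg = 0
    for name, value in values.items():
        first, last = FIELDS[name]
        width = last - first + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bit(s)")
        reg |= value << (31 - last)
    return reg


def unpack_cdacntl(reg):
    """Split a CDACNTL image back into its named fields."""
    return {
        name: (reg >> (31 - last)) & ((1 << (last - first + 1)) - 1)
        for name, (first, last) in FIELDS.items()
    }


def access_done(reg):
    """True when GO reads 00, the same test as 'andi. rT, rN, #3' above."""
    return (reg & 0x3) == 0
```

For example, pack_cdacntl(td=1, cway=1, cset=5, rw=1, go=1) describes a read of the data cache tag array at way 1, set 5. Reserved bits are left at zero, as the table footnote asks.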
Conflict conditions with snoop accesses to the same cache line cannot be resolved in a manner that guarantees that a value read will not change state before a subsequent value written. No interlocking is performed, so a cache entry read as being valid or written to a valid state may become invalid at any time.

11.19.2 Cache memory access through JTAG/OnCE port

Cache debug access control and data information are serially accessed through the OnCE controller and access the Cache Debug Access control and data registers CDACNTL and CDADATA (see Table 11-11 and Table 11-12). Accesses are performed one word (32 bits) at a time.

For a Cache write access, the user must first write the CDADATA register with the desired tag or data values. The second step is to write the CDACNTL register with the desired tag or data location, parity and dirty information (for data writes only), with R/W = 0 to select a write and GO = 01 to start the access.

For a Cache read access, the user must first write the CDACNTL register with the desired tag or data location, with R/W = 1 to select a read and GO = 01 to start the access. The second step is to read the CDADATA register for the tag or data and read the CDACNTL register for parity.

Completion of any operation can be determined by reading the CDACNTL register. Operations are indicated as complete when CDACNTL[30:31] = ‘00’. Debug firmware should poll the CDACNTL register to determine when an access has been completed prior to assuming validity of any other information in the CDACNTL or CDADATA registers.

Conflict conditions with snoop accesses to the same cache line cannot be resolved in a manner that guarantees that a value read will not change state before a subsequent value written. No interlocking is performed, so a cache entry read as being valid or written to a valid state may become invalid at any time.
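The two-step write and read procedures in Sections 11.19.1 and 11.19.2 share the same ordering: load CDADATA (writes only), start the access through CDACNTL, poll GO until it reads 00, then collect the results. The Python sketch below illustrates that ordering against a hypothetical register transport. The read_reg and write_reg method names, the FakePort class, and the polling limit are assumptions made for the example, not features of the OnCE controller. FakePort completes every access immediately and exists only so the sequence can be exercised; it does not model the core, the msync/isync requirements of the software path, or the dirty information mentioned for JTAG data writes.

```python
# Sketch only: `port` is any object offering read_reg(name) and write_reg(name, value).
# Those method names are hypothetical stand-ins for a debug tool's register access.
RW_READ = 1 << 2    # CDACNTL bit 29: 1 selects a read, 0 selects a write
GO_START = 0x1      # CDACNTL bits 30:31 = 01 starts the access
GO_MASK = 0x3       # CDACNTL bits 30:31 read 00 when the access is complete
PARITY_SHIFT = 8    # CDACNTL bits 16:23 hold the parity/check bits


def wait_done(port, max_polls=1000):
    """Poll CDACNTL until GO reads 00 and return the final CDACNTL value."""
    for _ in range(max_polls):
        ctl = port.read_reg("CDACNTL")
        if ctl & GO_MASK == 0:
            return ctl
    raise TimeoutError("CDACNTL GO bits did not return to 00")


def cache_array_write(port, location, data, parity):
    """location: CDACNTL image holding only T/D, CWAY, CSET, WORD and CACHE."""
    port.write_reg("CDADATA", data)                       # step 1: tag or data value
    control = location | (parity << PARITY_SHIFT) | GO_START
    port.write_reg("CDACNTL", control)                    # step 2: R/W = 0, GO = 01
    wait_done(port)


def cache_array_read(port, location):
    """Return (CDADATA value, parity bits read back from CDACNTL)."""
    port.write_reg("CDACNTL", location | RW_READ | GO_START)  # step 1: R/W = 1, GO = 01
    ctl = wait_done(port)
    data = port.read_reg("CDADATA")                       # step 2: tag or data
    return data, (ctl >> PARITY_SHIFT) & 0xFF


class FakePort:
    """In-memory stand-in used only to exercise the sequence above."""

    def __init__(self):
        self.regs = {"CDACNTL": 0, "CDADATA": 0}
        self.array = {}

    def read_reg(self, name):
        return self.regs[name]

    def write_reg(self, name, value):
        self.regs[name] = value
        if name == "CDACNTL" and value & GO_MASK == GO_START:
            key = value & 0xFFFF0008          # location fields (bits 0:15) plus CACHE
            if value & RW_READ:
                data, parity = self.array.get(key, (0, 0))
                self.regs["CDADATA"] = data
                value = (value & ~(0xFF << PARITY_SHIFT)) | (parity << PARITY_SHIFT)
            else:
                stored_parity = (value >> PARITY_SHIFT) & 0xFF
                self.array[key] = (self.regs["CDADATA"], stored_parity)
            self.regs["CDACNTL"] = value & ~GO_MASK   # access completes at once
```

Because no interlocking is performed against snoops, a value returned by a sequence like cache_array_read should be treated as a snapshot that may already be stale.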
11.19.3 Cache Debug Access Control register (CDACNTL)

The Cache Debug Access Control Register (CDACNTL) contains location information (T/D, CWAY, CSET, and WORD), and control (R/W and GO) needed to access the Cache Tag or Data SRAM arrays. Also included here are the SRAM parity bit values that must be supplied by the user for write accesses, and that will be supplied by the cache for read accesses. The CDACNTL register is shown in Figure 11-15.

Bits    0    1  2:3   4:5  6:12  13:15  16:23   24:27  28     29   30:31
Field   T/D  0  CWAY  0    CSET  WORD   PARITY  0      CACHE  R/W  GO

DCR - 351; Read/Write; Reset - 0x0

Figure 11-15. Cache Debug Access Control register (CDACNTL)

Table 11-11 provides bit definitions for the Cache Debug Access Control Register.

Table 11-11. CDACNTL field descriptions

Bit     Name        Description
0       T/D         Tag / Data
                    0 Data array selected
                    1 Tag array selected
1       -           Reserved(1)
2:3     CWAY        Cache Way
                    Specifies the cache way to be selected
4:5     -           Reserved(1)
6:12    CSET        Cache Set
                    Specifies the cache set to be selected
13:15   WORD        Word (Data array access only, I or D cache)
                    Specifies one of eight words of selected set
16:23   PARITY /    Parity check bits(2) (I or D cache)
        EDC CHECK   EDC Mode (L1CSR[0,1][D,I]CEDT = 01):
        BITS        DCache Data array: Byte parity bits. One bit per data byte.
                    bit 16: Parity for byte 0, bit 17: Parity for byte 1, ... bit 23: Parity for byte 7.
                    ICache Data Array: parity check bits for data.
                    Bits 16:23 correspond to p_dchk[0:7] (See Table 11-8).
                    Tag Array: parity check bits for tag.
                    Bits 16:21 correspond to p_tchk[0:5] (See Table 11-7). Bits 22:23 reserved.
24:27   -           Reserved(1)
28      CACHE       Cache Select
                    Specifies the cache to be selected
                    0 Selects the data cache for the operation.
                    1 Selects the instruction cache for the operation.
29      R/W         Read / Write
                    0 Selects write operation. Write the data in the CDADATA register to the location specified by this CDACNTL register.
                    1 Selects read operation. Read the cache memory location specified by this CDACNTL register, store the resulting data in the CDADATA register, and store the parity bits in this CDACNTL register.
30:31   GO          GO command bits
                    00 Inactive or complete (no action taken). Hardware sets GO=00 when an operation is complete.
                    01 Read or write cache memory location specified by this CDACNTL register.
                    1x Reserved

(1) These bits are not implemented and should be written zero for future compatibility.
(2) Cache parity checkers assume odd parity when using parity protection. EDC coding is used otherwise.

11.19.3.1 Cache Debug Access Data register (CDADATA)

The Cache Debug Access Data Register (CDADATA) contains the SRAM data for a debug access. The same register is used for Tag and Data SRAM read and write operations for both caches. Note that a single 32-bit word is accessed. Accessing an entire 64-bit doubleword requires two passes. The CDADATA register is shown in Figure 11-16.

Bits    0:31
Field   TAG or DATA

DCR - 350; Read/Write; Reset - Undefined/Unaffected

Figure 11-16. Cache Debug Access Data register (CDADATA)

Table 11-12 provides bit definitions for the Cache Debug Access Data Register.

Table 11-12. CDADATA field descriptions

Bit(s)  Name    Description
0:31    TAG     TAG Array Access Data - when accessing the tag array of either cache:
                0:21 Tag compare bits
                22 Reserved
                23 Valid bit
                24:27 Lock bits. These four bits should have the same value, 1-Locked, 0-Unlocked.
                28:30 Dirty bits (data cache only). These three bits should have the same value, 1-Dirty, 0-Clean.
        DATA    DATA Array Access Data (Bytes 0:3 of the selected word) - when accessing the data array of either cache:
                0:7 Byte 0
                8:15 Byte 1
                16:23 Byte 2
                24:31 Byte 3

11.20 Hardware Debug (Cache) Control Register 0

Hardware debug control register 0 is used to disable certain cache features for hardware debug purposes. This register is not intended for normal use. The HDBCR0 register is accessed using a mfspr or mtspr instruction. The SPR number for HDBCR0 is 976 in decimal. The HDBCR0 register is shown in Figure 11-17.

Bits    0:24  25   26      27  28   29     30  31
Field   0     MBD  SNPDIS  0   DSB  DSTRM  0   ISTRM

SPR - 976; Read/Write; Reset - 0x0; Supervisor-only

Figure 11-17. Hardware Debug Control Register 0 (HDBCR0)

The HDBCR0 bits are described in Table 11-13.

Table 11-13. HDBCR0 field descriptions

Bits    Name    Description
0:24    -       Reserved(1)
25      MBD     Msync/Mbar Broadcast Disable
                0 msync/mbar broadcasting is enabled. p_sync_req_out is asserted normally and p_sync_ack_in is used to terminate msync and mbar MO=0,1 instruction execution.
                1 msync/mbar broadcasting is disabled. p_sync_req_out remains negated, and p_sync_ack_in is ignored and not used to terminate msync and mbar MO=0,1 instruction execution.
                Note: MBD settings have no effect on the operation of p_sync_req_in and p_sync_ack_out. Normal handshaking and completion of the synchronization request input will be performed.
26      SNPDIS  Snoop Disable
                0 Snooping is not disabled. Snoops are processed normally according to the settings of L1CSR0[DCE].
                1 Snoop lookups are disabled. Snoops are processed in the same manner as when the data cache is disabled, that is, null responses are generated and no snoop lookups are performed.
27      -       Reserved(1)
28      DSB     Disable Store Buffer
                0 Store buffer enabled
                1 Store buffer disabled
29      DSTRM   Disable Data Cache Streaming
                0 DCache streaming is enabled
                1 DCache streaming is disabled
30      -       Reserved(1)
31      ISTRM   Disable Instruction Cache Streaming
                0 ICache streaming is enabled
                1 ICache streaming is disabled

(1) These bits are not implemented and should be written with zero for future compatibility.

11.21 Hardware cache coherency

Hardware cache coherency is supported to allow for dual-core or CPU + I/O coherency. The cache must operate in writethrough mode for those pages of memory requiring coherency operations. Coherency is maintained by the use of snoop invalidation commands provided to the CPU through a dedicated snoop interface port. Snooping is only performed while the data cache is enabled (L1CSR0[DCE] = 1). Figure 11-18 shows an abstract block diagram of the structure.

[Block diagram: CPU, Arbiter, DCache and Cache Control, Snoop Port Control, and Snoop Command Queue. Inputs: p_snp_req, p_snp_cmd[0:1], p_snp_addr_in[0:26], p_snp_id_in[0:3]. Outputs: p_snp_rdy, p_snp_ack, p_snp_resp[0:4], p_snp_id_out[0:3]. Other signals shown: id[0:3], snp_addr[0:26], cmd, p_cac_stalled.]

Figure 11-18. Snoop command port

11.21.1 Coherency protocol

The cache operates in a 2-state protocol for coherency purposes. A coherent cache line should only assume the Valid or Invalid state. No Modified or Shared state is supported for coherent cache lines (although modified state is available for non-coherent lines), thus no snoop copyback or intervention operations are required.

A snoop invalidation signaling port is provided to receive coherency requests. Snoop invalidation requests are received at the snoop invalidation port, and arbitrate with the CPU for access to the data cache tags for lookup and cache line invalidation.
External coherency logic provides snoop invalidation requests to the snoop invalidation port based on the bus activity of other coherent bus masters, and these invalidation requests are later processed and a response provided.

Memory regions that require coherency operations must be marked as “memory coherence required” (page’s M bit set) and as “writethrough” (page’s W bit set). External data accesses by the CPU reflect the value of the M bit of the accessed page on the p_d_gbl output. Typically, external coherency logic will monitor external accesses by a CPU (or other agent), and will request invalidation operations to other coherent entities for write accesses that also have p_d_gbl asserted. Non-shared data should be placed into pages with the M bit cleared, thus avoiding unnecessary coherency operations.

11.21.2 Snoop command port

The snoop command port provides the signaling mechanism between external coherency logic and the snoop request queue. Command requests are received on the p_snp_cmd[0:1], p_snp_id_in[0:3], and p_snp_addr_in[0:26] inputs when the p_snp_req signal is properly asserted, and responses to snoop command requests are provided on the p_snp_ack, p_snp_resp[0:4], and p_snp_id_out[0:3] outputs.

Snoop invalidation requests provide the physical address of the data to be invalidated (p_snp_addr_in[0:26]), along with a four-bit ID field (p_snp_id_in[0:3]), which flows through the command pipeline and is returned on the p_snp_id_out[0:3] output port along with the completion status provided on p_snp_resp[0:4] when p_snp_ack is asserted.

The p_snp_rdy output signal provides a handshaking mechanism for flow control of snoop requests to prevent overflow of the internal snoop queue, which buffers incoming snoop requests from the snoop command port prior to cache tag lookups and updates.
Negation of p_snp_rdy indicates that another snoop command port request will not be accepted due to resource constraints in the snoop pipeline. Refer to Section 14.2.9, Coherency control signals, for details on the operating protocol of the snoop command port.

The command value is stored in the snoop queue along with the snoop address and snoop ID value. Table 11-14 shows the definitions of the p_snp_cmd[0:1] encodings.

Table 11-14. p_snp_cmd[0:1] Snoop command encoding

p_snp_cmd[0:1]   Command type
00               Null - no status bit operation performed, lookup is performed
01               INV - invalidate matching cache entry
10               SYNC - synchronize snoop queue
11               Reserved

The NULL command is used for testing of interface handshaking and other status gathering purposes. The NULL command performs a snoop lookup operation, but performs no actual cache tag or status modifications (even in the presence of tag EDC errors). The INV command causes a snoop lookup and subsequent invalidation of a matching cache line. The SYNC command causes the snoop queue to be emptied with highest priority relative to CPU requests.

Table 11-15 shows the definitions of the p_snp_resp[0:4] encodings.

Table 11-15. p_snp_resp[0:4] Snoop response encoding

p_snp_resp[0:4](1)   Response type
000cc                NULL - no operation performed or no matching cache entry
001cc                AutoInv - AutoInvalidation performed on clean unlocked lines with tag parity errors
010cc                ERROR - Error in processing a snoop request due to TAG parity error.
                     For NULL commands, a tag parity error occurred and no hit to a tag without error occurred. No modification of cache entries, no machine check generated internally.
                     For INV commands, a) possible invalidation of locked line with tag parity error occurred, or b) dirty line left valid with tag parity error, or c) no true hit occurred, and one or more lines reported tag parity errors. Machine check generated internally.
01100                SYNC - Sync completed, snoop queue synchronized
100cc                HIT Clean - matching unlocked cache entry found
101cc                HIT Dirty - matching unlocked dirty cache entry found
110cc                HIT Locked - matching clean locked cache entry found
111cc                HIT Dirty Locked - matching dirty locked cache entry found

(1) cc - number of collapsed requests: 00 - no collapsing, 01 - two requests combined, 10 - three requests combined, 11 - four requests combined

The NULL response indicates there was no matching cache entry found for a null or invalidate command or the cache was disabled when the request was originally made. The HIT responses indicate that a matching cache entry was found. The SYNC response indicates that all previous entries in the snoop queue were emptied. The ERROR response indicates that an error occurred in processing a snoop request due to a cache tag parity error. The AutoInv response indicates that one or more cache lines with tag parity errors were invalidated.

Responses for a Null command are either NULL, HIT, or ERROR. Responses for an INV command are either Null (no hit occurred or cache is disabled), Hit (a matching entry was found and invalidated), or ERROR (a tag parity error was found and left valid, no guarantee of the command success). The response for a Sync command is SYNC (completed).

11.21.3 Snoop request queue

The snoop request queue provides a queueing mechanism between the snoop command port and the cache. As requests are accepted from the snoop invalidate port, they are queued into an 8-deep FIFO queue for arbitration to the cache for tag and status lookup and conditional status clearing. Snoops can be collapsed within the queue under certain circumstances to minimize the number of invalidation lookups performed.
When two consecutive snoop requests refer to the same cache line, they are collapsed (timing permitting) into a single snoop invalidation cycle. Collapsed entries are indicated complete via an encoding of the p_snp_resp[0:4] status outputs.

Snoop invalidation requests have a lower priority than CPU data accesses or change of flow accesses when only a single queue entry is occupied. This allows for some optimization in cycle-stealing of the tag array from the CPU in an attempt to minimize CPU stalls. Snoop invalidation request priority is raised when a “snoop sync” command is received on the snoop command port or when a sync request is generated on the synchronization port (p_sync_req_in), regardless of the number of active queue entries.

11.21.4 Snoop lookup operation

Entries in the snoop request queue are processed in-order after arbitrating for the cache tag and status bit arrays. Once the CPU has been stalled from performing further tag accesses, the snoop request queue is processed by performing a tag lookup, and a subsequent status bit write to clear the valid bit of a matching valid entry.

Invalidation hits require two tag array accesses to first read, and then to update the valid bit. A subsequent snoop lookup may be pipelined while the first lookup of a pair of lookups is being processed to determine a hit/miss condition. In this manner, a pair of hitting invalidation requests will block the CPU for a total of 5 cycles. A single snoop lookup requires 3 cycles of latency on a miss, and 4 cycles on a hit prior to allowing the CPU to resume cache accesses. If the snoop queue contains enough entries, snoop read and write accesses to the cache tag are pipelined, and the total blockage will be 3*number_of_hits + number_of_misses + 1.
In certain cases where the CPU has pipelined one or more cache misses, initial snoop accesses will be interlaced with CPU tag accesses prior to assuming highest priority in order to allow for proper operation of linefill and copyback operations initiated by the CPU.

As entries are removed from the queue and the invalidation lookups are performed, the results of the lookups are provided on the response output signals, along with the original request ID.

11.21.5 Snoop errors

Errors can occur during snoop lookup operations and are signaled on the snoop response output port. Tag parity errors that prevent an accurate hit/miss determination on the snoop request address may result in an error response signaled via p_snp_resp[0:4], as well as a machine check to the CPU for the INV command if a locked line was invalidated, if a line was dirty and not invalidated, or if a tag parity error occurred and no hit occurred to a line without error. When such a tag parity error occurs, the invalidation will not occur to the line(s) with error. The snoop queue will continue to be serviced, and the machine check will not necessarily be recoverable. A checkstop condition will not occur, however. In this respect, it is treated similarly to a non-maskable interrupt, and the MSR[RI] bit should be used accordingly by software.

11.21.6 Snoop collisions

Snoop requests may collide with an outstanding or pending cache linefill. Since there is no particular guarantee of the precise time an actual snoop invalidation lookup will occur relative to a cache linefill request, the CPU may in some instances be in the process of filling a line corresponding to a snoop invalidate request. In this case, the snoop will cause the linefills to be marked such that they are not loaded into the cache. Load miss operations that are in progress may use the data as it returns, however.
The responses for these collisions will be based on the state the cache line would have taken if the linefill completes successfully. Snoop requests should not c