e200z760n3 Power Architecture Core Reference Manual

e200z760n3 Power Architecture®
Core Reference Manual
Supports
e200z760n3
e200z760RM
Rev. 2
06/2012
How to Reach Us:
Home Page:
www.freescale.com
Web Support:
http://www.freescale.com/support
USA/Europe or Locations Not Listed:
Freescale Semiconductor, Inc.
Technical Information Center, EL516
2100 East Elliot Road
Tempe, Arizona 85284
+1-800-521-6274 or
+1-480-768-2130
www.freescale.com/support
Europe, Middle East, and Africa:
Freescale Halbleiter Deutschland GmbH
Technical Information Center
Schatzbogen 7
81829 Muenchen, Germany
+44 1296 380 456 (English)
+46 8 52200080 (English)
+49 89 92103 559 (German)
+33 1 69 35 48 48 (French)
www.freescale.com/support
Japan:
Freescale Semiconductor Japan Ltd.
Headquarters
ARCO Tower 15F
1-8-1, Shimo-Meguro, Meguro-ku
Tokyo 153-0064
Japan
0120 191014 or
+81 3 5437 9125
[email protected]
Asia/Pacific:
Freescale Semiconductor China Ltd.
Exchange Building 23F
No. 118 Jianguo Road
Chaoyang District
Beijing 100022
China
+86 010 5879 8000
[email protected]
For Literature Requests Only:
Freescale Semiconductor
Literature Distribution Center
+1-800 441-2447 or
+1-303-675-2140
Fax: +1-303-675-2150
LDCForFreescaleSemiconductor@hibbertgroup.com
Document Number: e200z760RM
Rev. 2, 06/2012
Information in this document is provided solely to enable system and software
implementers to use Freescale Semiconductor products. There are no express or
implied copyright licenses granted hereunder to design or fabricate any integrated
circuits or integrated circuits based on the information in this document.
Freescale Semiconductor reserves the right to make changes without further notice to
any products herein. Freescale Semiconductor makes no warranty, representation or
guarantee regarding the suitability of its products for any particular purpose, nor does
Freescale Semiconductor assume any liability arising out of the application or use of
any product or circuit, and specifically disclaims any and all liability, including without
limitation consequential or incidental damages. “Typical” parameters which may be
provided in Freescale Semiconductor data sheets and/or specifications can and do
vary in different applications and actual performance may vary over time. All operating
parameters, including “Typicals” must be validated for each customer application by
customer’s technical experts. Freescale Semiconductor does not convey any license
under its patent rights nor the rights of others. Freescale Semiconductor products are
not designed, intended, or authorized for use as components in systems intended for
surgical implant into the body, or other applications intended to support or sustain life,
or for any other application in which the failure of the Freescale Semiconductor product
could create a situation where personal injury or death may occur. Should Buyer
purchase or use Freescale Semiconductor products for any such unintended or
unauthorized application, Buyer shall indemnify and hold Freescale Semiconductor
and its officers, employees, subsidiaries, affiliates, and distributors harmless against all
claims, costs, damages, and expenses, and reasonable attorney fees arising out of,
directly or indirectly, any claim of personal injury or death associated with such
unintended or unauthorized use, even if such claim alleges that Freescale
Semiconductor was negligent regarding the design or manufacture of the part.
Freescale, the Freescale logo, and PowerQUICC are trademarks of
Freescale Semiconductor, Inc. Reg. U.S. Pat. & Tm. Off. All other product or
service names are the property of their respective owners.
© 2010-2012 Freescale Semiconductor, Inc.
Contents
About This Book
Audience
Organization
Suggested Reading
    General Information
Acronyms and Abbreviations
Terminology Conventions
Chapter 1
e200z7 Core Complex Overview
1.1 e200z7 Overview
1.1.1 Features
1.2 Programming Model
1.2.1 Register Set
1.2.2 Instruction Set
1.2.2.1 VLE Category
1.2.3 Interrupts and Exception Handling
1.2.3.1 Interrupt Handling
1.2.3.2 Interrupt Classes
1.2.3.3 Interrupt Types
1.2.3.4 Interrupt Registers
1.3 Microarchitecture Summary
1.3.1 Instruction Unit Features
1.3.2 Integer Unit Features
1.3.3 Load/Store Unit (LSU) Features
1.3.4 L1 Cache Features
1.3.5 Memory Management Unit (MMU) Features
1.3.6 System Bus (Core Complex Interface) Features
1.3.7 Nexus 3+ Module Features
Chapter 2
Register Model
2.1 Power ISA Embedded Category Registers
2.2 e200-Specific Special Purpose Registers
2.3 e200-Specific Device Control Registers
2.4 Special-Purpose Register Descriptions
2.4.1 Machine State Register (MSR)
2.4.2 Processor ID Register (PIR)
2.4.3 Processor Version Register (PVR)
2.4.4 System Version Register (SVR)
2.4.5 Integer Exception Register (XER)
2.4.6 Exception Syndrome Register
2.4.6.1 Power ISA VLE Mode Instruction Syndrome
2.4.6.2 Misaligned Instruction Fetch Syndrome
2.4.7 Machine Check Syndrome Register (MCSR)
2.4.8 Timer Control Register (TCR)
2.4.9 Timer Status Register (TSR)
2.4.10 Debug Registers
2.4.11 Hardware Implementation Dependent Register 0 (HID0)
2.4.12 Hardware Implementation Dependent Register 1 (HID1)
2.4.13 Branch Unit Control and Status Register (BUCSR)
2.4.14 L1 Cache Control and Status Registers (L1CSR0, L1CSR1)
2.4.15 L1 Cache Configuration Registers (L1CFG0, L1CFG1)
2.4.16 L1 Cache Flush and Invalidate Registers (L1FINV0, L1FINV1)
2.4.17 MMU Control and Status Register (MMUCSR0)
2.4.18 MMU Configuration Register (MMUCFG)
2.4.19 TLB Configuration Registers (TLB0CFG, TLB1CFG)
2.5 SPR Register Access
2.5.1 Invalid SPR References
2.5.2 Synchronization Requirements for SPRs
2.5.3 Special Purpose Register Summary
2.6 Reset Settings
Chapter 3
Instruction Model
3.1 Unsupported Instructions and Instruction Forms
3.2 Implementation Specific Instructions
3.3 Power ISA Embedded Category Instruction Extensions
3.4 Memory Access Alignment Support
3.5 Memory Synchronization and Reservation Instructions
3.6 Branch Prediction
3.7 Interruption of Instructions by Interrupt Requests
3.8 New e200 Functionality
3.9 ISEL instruction
3.10 Enhanced Debug
3.10.1 Debug Notify Halt Instructions
3.11 Machine Check
3.12 WAIT Instruction
3.13 Enhanced Reservations
3.14 Volatile Context Save/Restore Unit
3.15 Unimplemented SPRs and Read-Only SPRs
3.16 Invalid Forms of Instructions
3.16.1 Load and Store with Update instructions
3.16.2 Load Multiple Word (lmw, e_lmw) instruction
3.16.3 Branch Conditional to Count Register Instructions
3.16.4 Instructions With Reserved Fields Non-Zero
3.17 Instruction Summary
Chapter 4
Instruction Pipeline and Execution Timing
4.1 Overview of Operation
4.1.1 Control Unit
4.1.2 Instruction Unit
4.1.3 Branch Unit
4.1.4 Instruction Decode Unit
4.1.5 Exception Handling
4.2 Execution Units
4.2.1 Integer Execution Units
4.2.2 Load/Store Unit
4.2.3 Embedded Floating-point Execution Units
4.3 Instruction Pipeline
4.3.1 Description of Pipeline Stages
4.3.2 Instruction Prefetch Buffers and Branch Target Buffer
4.3.3 Single-Cycle Instruction Pipeline Operation
4.3.4 Basic Load and Store Instruction Pipeline Operation
4.3.5 Change-of-Flow Instruction Pipeline Operation
4.3.6 Basic Multicycle Instruction Pipeline Operation
4.3.7 Additional Examples of Instruction Pipeline Operation for Load and Store
4.3.8 Move to/from SPR Instruction Pipeline Operation
4.4 Control Hazards
4.5 Instruction Serialization
4.5.1 Completion Serialization
4.5.2 Dispatch Serialization
4.5.3 Refetch Serialization
4.6 Concurrent Instruction Execution
4.7 Instruction Timings
4.8 Operand Placement on Performance
Chapter 5
Embedded Floating-Point Unit
5.1 Nomenclature and Conventions
5.2 EFPU Programming Model
5.2.1 Signal Processing Extension/Embedded Floating-Point Status and Control Register (SPEFSCR)
5.2.2 GPRs and Power ISA Embedded Category Instructions
5.2.3 SPE/EFPU Available Bit in MSR
5.2.4 Embedded Floating-point Exception Bit in ESR
5.2.5 EFPU Exceptions
5.2.5.1 EFPU Unavailable Exception
5.2.5.2 Embedded Floating-point Data Exception
5.2.5.3 Embedded Floating-point Round Exception
5.2.6 Exception Priorities
5.3 Embedded Floating-Point Unit Operations
5.3.1 Floating-point Data Formats
5.3.1.1 Single-Precision Floating-point Format
5.3.1.2 Half-Precision Floating-point Format
5.3.2 IEEE 754 Compliance
5.3.3 Floating-Point Exceptions
5.3.4 Embedded Scalar Single-Precision Floating-Point Instructions
5.3.5 EFPU Vector Single-precision Embedded Floating-Point Instructions
5.4 Embedded Floating-point Results Summary
5.5 EFPU Instruction Timing
5.5.1 EFPU Single-Precision Vector Floating-Point Instruction Timing
5.5.2 EFPU Single-precision Scalar Floating-Point Instruction Timing
5.6 Instruction Forms and Opcodes
Chapter 6
Signal Processing Extension (SPE)
6.1 Nomenclature and Conventions
6.2 SPE Programming Model
6.2.1 GPR Registers
6.2.2 Accumulator Register
6.2.3 SPE Status and Control Register (SPEFSCR)
6.2.3.1 Context Switch
6.2.4 GPRs and Power ISA Instructions
6.2.5 SPE Available Bit in MSR
6.2.6 SPE Exception Bit in ESR
6.2.7 Data Formats
6.2.7.1 Integer Format
6.2.7.2 Fractional Format
6.2.8 Computational Operations
6.2.8.1 Simple Vector Arithmetic Instructions
6.2.8.2 Vector Logical Instructions
6.2.8.3 Vector Shift/Rotate Instructions
6.2.8.4 Vector Compare and Vector Set Instructions
6.2.8.5 Vector Select Instructions
6.2.8.6 Vector Data Arrangement Instructions
6.2.8.7 Multiply and accumulate instructions
6.2.8.8 Dot product instructions
6.2.8.9 Miscellaneous Vector Instructions
6.2.9 Load and Store Instructions
6.2.9.1 Addressing Modes—Non-Update forms
6.2.9.1.1 Base + Scaled Immediate Addressing—Non-Update Form
6.2.9.1.2 Base + Index Addressing
6.2.9.2 Addressing Modes—Update forms
6.2.9.3 Addressing Modes—Modify forms
6.2.9.3.1 Linear Addressing Update Mode
6.2.9.3.2 Circular Addressing Modify Mode
6.2.9.3.3 Bit-Reversed Addressing Modify Mode
6.2.9.4 Vector Load and Store Instruction Summary
6.2.10 SPE Exceptions
6.2.10.1 SPE/Embedded Floating-point Unavailable Exception
6.2.10.2 SPE Vector Alignment Exception
6.2.11 Exception Priorities
6.3 SPE Instruction Timing
6.3.1 SPE Simple Vector Arithmetic Instructions Timing
6.3.2 SPE Complex Integer Instruction Timing
6.3.3 SPE Vector Logical Instruction Timing
6.3.4 SPE Vector Shift/Rotate Instruction Timing
6.3.5 SPE Vector Compare and Vector Set Instruction Timing
6.3.6 SPE Vector Select Instruction Timing
6.3.7 SPE Vector Data Arrangement Instruction Timing
6.3.8 SPE Multiply and Multiply/Accumulate Instruction Timing
6.3.9 SPE Dot Product Instruction Timing
6.3.10 SPE Misc. Vector Instruction Timing
6.3.11 SPE Load and Store Instruction Timing
Chapter 7
Interrupts and Exceptions
7.1 e200 Interrupts
7.2 Exception Syndrome Register
7.3 Machine State Register
7.3.1 Machine Check Syndrome Register (MCSR)
7.4 Interrupt Vector Prefix Registers (IVPR)
7.5 Interrupt Vector Offset Registers (IVORxx)
7.6 Interrupt Definitions
7.6.1 Critical Input Interrupt (IVOR0)
7.6.2 Machine Check Interrupt (IVOR1)
7.6.2.1 Machine Check Causes
7.6.2.1.1 Error Report Machine Check Exceptions
7.6.2.1.2 Nonmaskable Interrupt Machine Check Exceptions
7.6.2.1.3 Asynchronous Machine Check Exceptions
7.6.2.2 Machine Check Interrupt Actions
7.6.2.3 Checkstop State
7.6.3 Data Storage Interrupt (IVOR2)
7.6.4 Instruction Storage Interrupt (IVOR3)
7.6.5 External Input Interrupt (IVOR4)
7.6.6 Alignment Interrupt (IVOR5)
7.6.7 Program Interrupt (IVOR6)
7.6.8 Floating-Point Unavailable Interrupt (IVOR7)
7.6.9 System Call Interrupt (IVOR8)
7.6.10 Auxiliary Processor Unavailable Interrupt (IVOR9)
7.6.11 Decrementer Interrupt (IVOR10)
7.6.12 Fixed-Interval Timer Interrupt (IVOR11)
7.6.13 Watchdog Timer Interrupt (IVOR12)
7.6.14 Data TLB Error Interrupt (IVOR13)
7.6.15 Instruction TLB Error Interrupt (IVOR14)
7.6.16 Debug Interrupt (IVOR15)
7.6.17 System Reset Interrupt
7.6.18 SPE/EFPU Unavailable Interrupt (IVOR32)
7.6.19 Embedded Floating-Point Data Interrupt (IVOR33)
7.6.20 Embedded Floating-Point Round Interrupt (IVOR34)
7.6.21 Performance Monitor Interrupt (IVOR35)
7.7 Exception Recognition and Priorities
7.7.1 Exception Priorities
7.8 Interrupt Processing
7.8.1 Enabling and Disabling Exceptions
7.8.2 Returning from an Interrupt Handler
7.9 Process Switching
Chapter 8
Performance Monitor
8.1 Overview
8.2 Performance Monitor Instructions
8.3 Performance Monitor Registers
8.3.1 Invalid PMR References
8.3.2 References to Read-only PMRs
8.3.3 Global Control Register 0 (PMGC0)
8.3.4 User Global Control Register 0 (UPMGC0)
8.3.5 Local Control A Registers (PMLCa0–PMLCa3)
8.3.6 User Local Control A Registers (UPMLCa0–UPMLCa3)
8.3.7 Local Control B Registers (PMLCb0–PMLCb3)
8.3.8 User Local Control B Registers (UPMLCb0–UPMLCb3)
8.3.9 Performance Monitor Counter Registers (PMC0–PMC3)
8.3.10 User Performance Monitor Counter Registers (UPMC0–UPMC3)
8.4 Performance Monitor Interrupt
8.5 Event Counting
8.5.1 MSR-based Context Filtering
8.6 Examples
8.6.1 Chaining Counters
8.6.2 Thresholding
8.7 Event Selection
Chapter 9
L1 Cache
9.1 Overview ........ 9-1
9.2 16-KB Cache Organization ........ 9-3
9.3 Cache Lookup ........ 9-3
9.4 Cache Control ........ 9-5
9.4.1 L1 Cache Control and Status Register 0 (L1CSR0) ........ 9-5
9.4.2 L1 Cache Control and Status Register 1 (L1CSR1) ........ 9-9
9.4.3 L1 Cache Configuration Register 0 (L1CFG0) ........ 9-10
9.4.4 L1 Cache Configuration Register 1 (L1CFG1) ........ 9-11
9.5 Data Cache Software Coherency ........ 9-12
9.6 Address Aliasing ........ 9-12
9.7 Cache Operation ........ 9-12
9.7.1 Cache Enable/Disable ........ 9-13
9.7.2 Cache Fills ........ 9-13
9.7.3 Cache Line Replacement ........ 9-14
9.7.4 Cache Miss Access Ordering ........ 9-14
9.7.5 Cache-Inhibited Accesses ........ 9-15
9.7.6 Guarded Accesses ........ 9-15
9.7.7 Cache-Inhibited Guarded Accesses ........ 9-15
9.7.8 Cache Invalidation ........ 9-15
9.7.9 Cache Flush/Invalidate by Set and Way ........ 9-16
9.7.9.1 L1 Flush/Invalidate Register 0 (L1FINV0) ........ 9-16
9.7.9.2 L1 Flush/Invalidate Register 1 (L1FINV1) ........ 9-17
9.8 Cache Parity and EDC Protection ........ 9-18
9.8.1 Cache Error Action Control ........ 9-19
9.8.1.1 L1CSR0[DCEA]/L1CSR1[ICEA] = 00, Machine Check Generation on Error ........ 9-19
9.8.1.2 L1CSR0[DCEA]/L1CSR1[ICEA] = 01, Correction/Auto-invalidation on Error ........ 9-20
9.8.1.2.1 Instruction Cache Errors ........ 9-20
9.8.1.2.2 Data Cache Errors ........ 9-21
9.8.1.2.3 Data cache line flush or invalidation due to reservation instructions (l[b,h,w]arx, st[b,h,w]cx.) ........ 9-22
9.8.2 Parity/EDC Error Handling for Cache Control Operations and Instructions ........ 9-22
9.8.2.1 L1FINV0/L1FINV1 Operations ........ 9-23
9.8.2.2 Cache touch instructions (dcbt, dcbtst, icbt) ........ 9-23
9.8.2.3 icbi instructions ........ 9-23
9.8.2.4 dcbi instructions ........ 9-23
9.8.2.5 dcbst instructions ........ 9-24
9.8.2.6 dcbf Instructions ........ 9-24
9.8.2.7 dcbz Instructions ........ 9-25
9.8.2.8 Cache Locking Instructions (dcbtls, dcbtstls, dcblc, icbtls, icblc) ........ 9-25
9.8.3 Cache Inhibited Accesses and Parity/EDC Errors ........ 9-26
9.8.4 Snoop Operations and Parity/EDC Errors ........ 9-26
9.8.5 EDC Checkbit/Syndrome Coding Scheme Generation - Icache ........ 9-26
9.8.6 EDC Checkbit/Syndrome Coding Scheme Generation - Dcache ........ 9-28
9.8.7 Cache Error Injection ........ 9-28
9.8.8 Cache Error Cross-Signaling ........ 9-29
9.9 Push and Store Buffers ........ 9-29
9.10 Cache Management Instructions ........ 9-30
9.10.1 Instruction Cache Block Invalidate (icbi) Instruction ........ 9-30
9.10.2 Instruction Cache Block Touch (icbt) Instruction ........ 9-30
9.10.3 Data Cache Block Allocate (dcba) Instruction ........ 9-30
9.10.4 Data Cache Block Flush (dcbf) Instruction ........ 9-30
9.10.5 Data Cache Block Invalidate (dcbi) Instruction ........ 9-31
9.10.6 Data Cache Block Store (dcbst) Instruction ........ 9-31
9.10.7 Data Cache Block Touch (dcbt) Instruction ........ 9-31
9.10.8 Data Cache Block Touch for Store (dcbtst) Instruction ........ 9-31
9.10.9 Data Cache Block set to Zero (dcbz) Instruction ........ 9-31
9.11 Touch Instructions ........ 9-32
9.12 Cache Line Locking/Unlocking Unit ........ 9-32
9.12.1 Overview ........ 9-32
9.12.2 Instruction Details ........ 9-34
9.12.3 Effects of Other Cache Instructions on Locked Lines ........ 9-41
9.12.4 Flash Clearing of Lock Bits ........ 9-41
9.13 Cache Instructions and Exceptions ........ 9-42
9.13.1 Exception Conditions for Cache Instructions ........ 9-42
9.13.2 Transfer Type Encodings for Cache Management Instructions ........ 9-43
9.14 Sequential Consistency ........ 9-44
9.15 Self-Modifying Code Requirements ........ 9-44
9.16 Page Table Control Bits ........ 9-45
9.16.1 Write-through Stores ........ 9-45
9.16.2 Cache-Inhibited Accesses ........ 9-45
9.16.3 Memory Coherence Required ........ 9-45
9.16.4 Guarded Storage ........ 9-45
9.16.5 Misaligned Accesses and the Endian (E) Bit ........ 9-45
9.17 Reservation Instructions and Cache Interactions ........ 9-46
9.18 Effect of Hardware Debug on Cache Operation ........ 9-46
9.19 Cache Memory Access For Debug/Error Handling ........ 9-46
9.19.1 Cache Memory Access via Software ........ 9-46
9.19.2 Cache Memory Access Through JTAG/OnCE Port ........ 9-48
9.19.3 Cache Debug Access Control Register (CDACNTL) ........ 9-48
9.19.3.1 Cache Debug Access Data Register (CDADATA) ........ 9-49
9.20 Hardware Debug (Cache) Control Register 0 ........ 9-50
9.21 Hardware Debug (Cache) Coherency ........ 9-51
9.21.1 Coherency Protocol ........ 9-52
9.21.2 Snoop Command Port ........ 9-53
9.21.3 Snoop Request Queue ........ 9-54
9.21.4 Snoop Lookup Operation ........ 9-55
9.21.5 Snoop Errors ........ 9-55
9.21.6 Snoop Collisions ........ 9-55
9.21.7 Snoop Synchronization ........ 9-56
9.21.7.1 Synchronization Port Request ........ 9-56
9.21.7.2 Snoop Command Port Request ........ 9-56
9.21.8 Starvation Control ........ 9-56
9.21.9 Queue Flow Control ........ 9-57
9.21.10 Snooping in Low Power States ........ 9-57
Chapter 10
Memory Management Unit
10.1 Overview ........ 10-1
10.2 Effective to Real Address Translation ........ 10-1
10.2.1 Effective Addresses ........ 10-1
10.2.2 Address Spaces ........ 10-2
10.2.3 Process ID ........ 10-2
10.2.4 Translation Flow ........ 10-2
10.2.5 Permissions ........ 10-4
10.2.6 Restrictions on 1-KB and 2-KB Page Size Usage ........ 10-5
10.3 Translation Lookaside Buffer ........ 10-5
10.4 Configuration Information ........ 10-6
10.4.1 MMU Configuration Register (MMUCFG) ........ 10-6
10.4.2 TLB0 Configuration Register (TLB0CFG) ........ 10-7
10.4.3 TLB1 Configuration Register (TLB1CFG) ........ 10-8
10.5 Software Interface and TLB Instructions ........ 10-9
10.5.1 TLB Read Entry Instruction (tlbre) ........ 10-9
10.5.2 TLB Write Entry Instruction (tlbwe) ........ 10-10
10.5.3 TLB Search Instruction (tlbsx) ........ 10-10
10.5.4 TLB Invalidate (tlbivax) Instruction ........ 10-11
10.5.5 TLB Synchronize Instruction (tlbsync) ........ 10-12
10.6 TLB Operations ........ 10-12
10.6.1 Translation Reload ........ 10-12
10.6.2 Reading the TLB ........ 10-13
10.6.3 Writing the TLB ........ 10-13
10.6.4 Searching the TLB ........ 10-13
10.6.5 TLB Miss Exception Update ........ 10-13
10.6.6 IPROT Invalidation Protection ........ 10-13
10.6.7 TLB Load on Reset ........ 10-14
10.6.8 The G bit ........ 10-14
10.7 MMU Control Registers ........ 10-14
10.7.1 Data Exception Address Register (DEAR) ........ 10-15
10.7.2 MMU Control and Status Register 0 (MMUCSR0) ........ 10-15
10.7.3 MMU Assist Registers (MAS) ........ 10-16
10.7.4 MAS Register Updates ........ 10-21
10.8 TLB Coherency Control ........ 10-21
10.9 Core Interface Operation for MMU Control Instructions ........ 10-22
10.9.1 Transfer Type Encodings for MMU Control Instructions ........ 10-22
10.10 Effect of Hardware Debug on MMU Operation ........ 10-23
10.11 External Translation Alterations for Realtime Systems ........ 10-23
Chapter 11
External Core Complex Interfaces
11.1 Signal Index ........ 11-2
11.2 Signal Descriptions ........ 11-10
11.2.1 e200 Processor Clock (m_clk) ........ 11-10
11.2.2 Reset-related Signals ........ 11-10
11.2.2.1 Power-on Reset (m_por) ........ 11-11
11.2.2.2 Reset (p_reset_b) ........ 11-11
11.2.2.3 Watchdog Reset Status (p_wrs[0:1]) ........ 11-11
11.2.2.4 Debug Reset Control (p_dbrstc[0:1]) ........ 11-11
11.2.2.5 Reset Base (p_rstbase[0:29]) ........ 11-12
11.2.2.6 Reset Endian Mode (p_rst_endmode) ........ 11-12
11.2.2.7 Reset VLE Mode (p_rst_vlemode) ........ 11-12
11.2.2.8 JTAG/OnCE Reset (j_trst_b) ........ 11-12
11.2.3 Address and Data Buses ........ 11-12
11.2.3.1 Address Bus (p_d_haddr[31:0], p_i_haddr[31:0]) ........ 11-12
11.2.3.2 Read Data Bus (p_d_hrdata[63:0], p_i_hrdata[63:0]) ........ 11-12
11.2.3.3 Write Data Bus (p_d_hwdata[63:0]) ........ 11-13
11.2.4 Transfer Attribute Signals ........ 11-13
11.2.4.1 Transfer Type (p_d_htrans[1:0], p_i_htrans[1:0]) ........ 11-14
11.2.4.2 Write (p_d_hwrite, p_i_hwrite) ........ 11-14
11.2.4.3 Transfer Size (p_d_hsize[1:0], p_i_hsize[1:0]) ........ 11-14
11.2.4.4 Burst Type (p_d_hburst[2:0], p_i_hburst[2:0]) ........ 11-15
11.2.4.5 Protection Control (p_d_hprot[5:0], p_i_hprot[5:0]) ........ 11-15
11.2.4.6 Data Transfer Error (p_d_htrans_derr) ........ 11-17
11.2.4.7 Globally Coherent Access (p_d_gbl) ........ 11-17
11.2.4.8 Cache Way Replacement (p_d_wayrep[0:1], p_i_wayrep[0:1]) ........ 11-17
11.2.5 Byte Lane Specification ........ 11-17
11.2.5.1 Unaligned Access (p_d_hunalign, p_i_hunalign) ........ 11-17
11.2.5.2 Byte Strobes (p_d_hbstrb[7:0], p_i_hbstrb[7:0]) ........ 11-18
11.2.6 Transfer Control Signals ........ 11-28
11.2.6.1 Transfer Ready (p_d_hready, p_i_hready) ........ 11-28
11.2.6.2 Transfer Response (p_d_hresp[2:0], p_i_hresp[1:0]) ........ 11-28
11.2.6.3 Bus Stall Global Write Request (p_stall_bus_gwrite) ........ 11-29
11.2.7 AHB Clock Enable Signals ........ 11-29
11.2.7.1 Instruction AHB Clock Enable (p_i_ahb_clken) ........ 11-29
11.2.7.2 Data AHB Clock Enable (p_d_ahb_clken) ........ 11-29
11.2.8 Master ID Configuration Signals ........ 11-30
11.2.8.1 CPU Master ID (p_masterid[3:0]) ........ 11-30
11.2.8.2 Nexus Master ID (nex_masterid[3:0]) ........ 11-30
11.2.9 Coherency Control Signals ........ 11-30
11.2.9.1 Snoop Ready (p_snp_rdy) ........ 11-30
11.2.9.2 Snoop Request (p_snp_req) ........ 11-30
11.2.9.3 Snoop Command Input (p_snp_cmd_in[0:1]) ........ 11-31
11.2.9.4 Snoop Request ID Input (p_snp_id_in[0:3]) ........ 11-31
11.2.9.5 Snoop Address Input (p_snp_addr_in[0:26]) ........ 11-31
11.2.9.6 Snoop Acknowledge (p_snp_ack) ........ 11-31
11.2.9.7 Snoop Request ID Output (p_snp_id_out[0:3]) ........ 11-32
11.2.9.8 Snoop Response (p_snp_resp[0:4]) ........ 11-32
11.2.9.9 Cache Stalled (p_cac_stalled) ........ 11-32
11.2.9.10 Data Cache Enabled (p_d_cache_en) ........ 11-32
11.2.10 Memory Synchronization Control Signals ........ 11-33
11.2.10.1 Synchronization Request In (p_sync_req_in) ........ 11-33
11.2.10.2 Synchronization Request Acknowledge Out (p_sync_ack_out) ........ 11-33
11.2.10.3 Synchronization Request Out (p_sync_req_out) ........ 11-33
11.2.10.4 Synchronization Request Acknowledge In (p_sync_ack_in) ........ 11-34
11.2.11 Interrupt Signals ........ 11-34
11.2.11.1 External Input Interrupt Request (p_extint_b) ........ 11-34
11.2.11.2 Critical Input Interrupt Request (p_critint_b) ........ 11-34
11.2.11.3 Nonmaskable Input Interrupt Request (p_nmi_b) ........ 11-34
11.2.11.4 Interrupt Pending (p_ipend) ........ 11-35
11.2.11.5 Auto-vector (p_avec_b) ........ 11-35
11.2.11.6 Interrupt Vector Offset (p_voffset[0:15]) ........ 11-35
11.2.11.7 Interrupt Vector Acknowledge (p_iack) ........ 11-35
11.2.11.8 Machine Check (p_mcp_b) ........ 11-36
11.2.12 Lockstep Enable Signal (p_lkstep_en) ........ 11-36
11.2.13 Cache Error Cross-signaling Signals ........ 11-36
11.2.13.1 Cache Tag Error Out (p_[d,i]_cache_tagerr_out) ........ 11-36
11.2.13.2 Cache Data Error Out (p_[d,i]_cache_dataerr_out) ........ 11-37
11.2.13.3 Cache Push Data Error Out (p_d_pusherr_out) ........ 11-37
11.2.13.4 Cache Error Address Out (p_[d,i]_cerraddr_out[0:31]) ........ 11-37
11.2.13.5 Cache Tag Error Way(s) Out (p_[d,i]_tagerrway_out[0:3]) ........ 11-37
11.2.13.6 Cache Dirty Error Way(s) Out (p_d_drterrway_out[0:3]) ........ 11-37
11.2.13.7 Cache Lock Error Way(s) Out (p_[d,i]_lkerrway_out[0:3]) ........ 11-38
11.2.13.8 Cache Data Error In (p_[d,i]_cache_dataerr_in) ........ 11-38
11.2.13.9 Cache Push Data Error In (p_d_pusherr_in) ........ 11-38
11.2.13.10 Cache Tag Error In (p_[d,i]_cache_tagerr_in) ........ 11-38
11.2.13.11 Cache Tag Error Way(s) In (p_[d,i]_tagerrway_in[0:3]) ........ 11-38
11.2.13.12 Cache Dirty Error Way(s) In (p_d_drterrway_in[0:3]) ........ 11-39
11.2.13.13 Cache Lock Error Way(s) In (p_[d,i]_lkerrway_in[0:3]) ........ 11-39
11.2.14 External Translation Alteration Signals ........ 11-39
11.2.14.1 External PID Enable (p_extpid_en) ........ 11-39
11.2.14.2 External PID In (p_extpid[6:7]) ........ 11-39
11.2.15 Timer Facility Signals ........ 11-40
11.2.15.1 Timer Disable (p_tbdisable) ........ 11-40
11.2.15.2 Timer External Clock (p_tbclk) ........ 11-40
11.2.15.3 Timer Interrupt Status (p_tbint) ........ 11-40
11.2.16 Processor Reservation Signals ........ 11-40
11.2.16.1 CPU Reservation Status (p_rsrv) ........ 11-40
11.2.16.2 CPU Reservation Clear (p_rsrv_clr) ........ 11-40
11.2.17 Miscellaneous Processor Signals ........ 11-41
11.2.17.1 CPU ID (p_cpuid[0:7]) ........ 11-41
11.2.17.2 PID0 outputs (p_pid0[0:7]) ........ 11-41
11.2.17.3 PID0 Update (p_pid0_updt) ........ 11-41
11.2.17.4 System Version (p_sysvers[0:31]) ........ 11-41
11.2.17.5 Processor Version (p_pvrin[16:31]) ........ 11-41
11.2.17.6 HID1 System Control (p_hid1_sysctl[0:7]) ........ 11-42
11.2.17.7 Debug Event Outputs (p_devnt_out[0:7]) ........ 11-42
11.2.18 Processor State Signals ........ 11-42
11.2.18.1 Processor Mode (p_mode[0:3]) ........ 11-42
11.2.18.2 Processor Execution Pipeline Status (p_pstat_pipe0[0:5], p_pstat_pipe1[0:5]) ........ 11-42
11.2.18.3 Branch Prediction Status (p_brstat[0:1]) ........ 11-44
11.2.18.4 Processor Exception Enable MSR Values (p_msr_EE, p_msr_CE, p_msr_DE, p_msr_ME) ........ 11-44
11.2.18.5 Processor Return from Interrupt (p_rfi, p_rfci, p_rfdi, p_rfmci) ........ 11-44
11.2.18.6 Processor Machine Check (p_mcp_out) ........ 11-44
11.2.19 Power Management Control Signals ........ 11-45
11.2.19.1 Wait, Halt, Stop Signals ........ 11-45
11.2.19.2 Low-Power Mode Signals (p_doze, p_nap, p_sleep) ........ 11-45
11.2.19.3 Wakeup (p_wakeup) ........ 11-45
11.2.20 Performance Monitor Signals ........ 11-46
11.2.21 Debug Event Input Signals ........ 11-47
11.2.21.1 Unconditional Debug Event (p_ude) ........ 11-47
11.2.21.2 External Debug Event 1 (p_devt1) ........ 11-47
11.2.21.3 External Debug Event 2 (p_devt2) ........ 11-47
11.2.21.4 Debug Event Output Signals (p_devnt_out[0:7]) ........ 11-48
11.2.22 Debug/Emulation (Nexus 1/OnCE) Support Signals ........ 11-48
11.2.22.1 OnCE Enable (jd_en_once) ........ 11-48
11.2.22.2 Debug Session (jd_debug_b) ........ 11-48
11.2.22.3 Debug Request (jd_de_b) ........ 11-49
11.2.22.4 DE_b Active High Output Enable (jd_de_en) ........ 11-49
11.2.22.5 Processor Clock On (jd_mclk_on) ........ 11-49
11.2.22.6 Watchpoint Events (jd_watchpt[0:26]) ........ 11-49
11.2.23 Debug Lockstep Cross-signaling Signals ........ 11-49
11.2.23.1 Debug Request EDM In (p_dbgrq_edm_in) ........ 11-50
11.2.23.2 Debug Go Request In (p_dbg_go_in) ........ 11-50
11.2.23.3 Debug Request EDM Out (p_dbgrq_edm_out) ........ 11-50
11.2.23.4 Debug Go Request Out (p_dbg_go_out) ........ 11-50
11.2.24 Development Support (Nexus 3) Signals ........ 11-50
11.2.25 JTAG Support Signals ........ 11-51
11.2.25.1 JTAG/OnCE Serial Input (j_tdi) ........ 11-51
11.2.25.2 JTAG/OnCE Serial Clock (j_tclk) ........ 11-51
11.2.25.3 JTAG/OnCE Serial Output (j_tdo) ........ 11-51
11.2.25.4 JTAG/OnCE Test Mode Select (j_tms) ........ 11-52
11.2.25.5 JTAG/OnCE Test Reset (j_trst_b) ........ 11-52
11.2.25.6 TAP Controller State Indicator Signals ........ 11-52
11.2.25.7 Register Select (j_gp_regsel) ........ 11-53
11.2.25.8 Enable Once Register Select (j_en_once_regsel) ........ 11-53
11.2.25.9 External Nexus Register Select (j_nexus_regsel) ........ 11-53
11.2.25.10 External LSRL Register Select (j_lsrl_regsel) ........ 11-54
11.2.25.11 Serial Data (j_serial_data) ........ 11-54
11.2.25.12 Key Data In (j_key_in) ........ 11-55
11.2.26 JTAG ID Signals ........ 11-56
11.2.26.1 JTAG ID Sequence (j_id_sequence[0:1]) ........ 11-56
11.2.26.2 JTAG ID Sequence (j_id_sequence[2:9]) ........ 11-56
11.2.26.3 JTAG ID Version (j_id_version[0:3]) ........ 11-57
11.3 Timing Diagrams ........ 11-57
11.3.1 AHB Clock Enable and the Internal HCLK ........ 11-57
11.3.2 Processor Instruction/Data Transfers ........ 11-58
11.3.2.1 Basic Read Transfer Cycles ........ 11-60
11.3.2.2 Read Transfer with Wait State ........ 11-62
11.3.2.3 Basic Write Transfer Cycles ........ 11-63
11.3.2.4 Write Transfer with Wait States ........ 11-65
11.3.2.5 Read and Write Transfers ........ 11-66
11.3.2.6 Misaligned Accesses ........ 11-70
11.3.2.7 Burst Accesses ........ 11-73
11.3.2.8 Error Termination Operation ........ 11-76
11.3.3 Memory Synchronization Control Operation ........ 11-80
11.3.4 Cache Error Cross-signaling Operation ........ 11-84
11.3.4.1 Cross-signaling with Machine Check Operation Selected ........ 11-85
11.3.4.2 Cross-signaling with Auto-invalidation Operation Selected ........ 11-86
11.3.5 Cache Coherency Interface Operation ........ 11-101
11.3.5.1 Stop Mode Entry/Exit and Snoop Ready Signaling ........ 11-106
11.3.6 Debug Lockstep Cross-signaling Operation ........ 11-109
11.3.6.1 Debug Entry Cross-signaling ........ 11-110
11.3.6.2 Debug Exit Cross-signaling ........ 11-113
11.3.6.3 Update_DR State Cross-signaling ........ 11-116
11.3.7 Power Management ........ 11-118
11.3.8 Interrupt Interface ........ 11-118
11.3.9 Time Base Interface ........ 11-122
11.3.10 JTAG Test Interface ........ 11-122
Chapter 12
Power Management
12.1 Power Management ........ 12-1
12.1.1 Active State ........ 12-1
12.1.2 Waiting State ........ 12-1
12.1.3 Halted State ........ 12-2
12.1.4 Stopped State ........ 12-2
12.1.5 Power Management Pins ........ 12-3
12.1.6 Power Management Control Bits ........ 12-4
12.1.7 Software Considerations for Power Management using Wait Instructions ........ 12-4
12.1.8 Software Considerations for Power Management using Doze, Nap, or Sleep ........ 12-4
12.1.9 Debug Considerations for Power Management ........ 12-5
Chapter 13
Debug Support
13.1 Overview ........ 13-1
13.1.1 Software Debug Facilities ........ 13-1
13.1.1.1 Power ISA Embedded Category Compatibility ........ 13-2
13.1.2 Additional Debug Facilities ........ 13-2
13.1.3 Hardware Debug Facilities ........ 13-2
13.1.4 Software/Hardware Debug Resource Sharing ........ 13-3
13.1.4.1 Simultaneous Hardware and Software Debug Event Handling ........ 13-3
13.2 Software Debug Events and Exceptions ........ 13-4
13.2.1 Instruction Address Compare Event ........ 13-5
13.2.2 Data Address Compare Event ........ 13-6
13.2.2.1 Data Address Compare Event Status Updates ........ 13-7
13.2.3 Linked Instruction Address and Data Address Compare Event ........ 13-17
13.2.4 Trap Debug Event ........ 13-17
13.2.5 Branch Taken Debug Event ........ 13-17
13.2.6 Instruction Complete Debug Event ........ 13-18
13.2.7 Interrupt Taken Debug Event ........ 13-18
13.2.8 Critical Interrupt Taken Debug Event ........ 13-18
13.2.9 Return Debug Event ........ 13-19
13.2.10 Critical Return Debug Event ........ 13-19
13.2.11 Debug Counter Debug Event ........ 13-19
13.2.12 External Debug Event ........ 13-19
13.2.13 Unconditional Debug Event ........ 13-19
13.3 Debug Registers ........ 13-20
13.3.1 Debug Address and Value Registers ........ 13-20
13.3.2 Debug Counter Register (DBCNT) ........ 13-21
13.3.3 Debug Control and Status Registers ........ 13-22
13.3.3.1 Debug Control Register 0 (DBCR0) ........ 13-22
13.3.3.2 Debug Control Register 1 (DBCR1) ........ 13-25
13.3.3.3 Debug Control Register 2 (DBCR2) ........ 13-27
13.3.3.4 Debug Control Register 3 (DBCR3) ........ 13-30
13.3.3.5 Debug Control Register 4 (DBCR4) ........ 13-36
13.3.3.6 Debug Control Register 5 (DBCR5) ........ 13-37
13.3.3.7 Debug Control Register 6 (DBCR6) ........ 13-39
13.3.3.8 Debug Status Register (DBSR) ........ 13-40
13.3.4 Debug External Resource Control Register (DBERC0) ........ 13-43
13.3.5 Debug Event Select Register (DEVENT) ........ 13-50
13.3.6 Debug Data Acquisition Message Register (DDAM) ........ 13-51
13.4 External Debug Support ........ 13-51
13.4.1 External Debug Registers ........ 13-52
13.4.1.1 External Debug Control Register 0 (EDBCR0) ........ 13-53
13.4.1.2 External Debug Status Register 0 (EDBSR0) ........ 13-53
13.4.1.3 External Debug Status Register Mask 0 (EDBSRMSK0) ........ 13-56
13.4.2 OnCE Introduction ........ 13-57
13.4.3 JTAG/OnCE Pins ........ 13-60
13.4.4 OnCE Internal Interface Signals ........ 13-60
13.4.4.1 CPU Debug Request (dbg_dbgrq) ........ 13-60
13.4.4.2 CPU Debug Acknowledge (cpu_dbgack) ........ 13-61
13.4.4.3 CPU Address, Attributes ........ 13-61
13.4.4.4 CPU Data ........ 13-61
13.4.5 OnCE Interface Signals ........ 13-61
13.4.5.1 OnCE Enable (jd_en_once) ........ 13-61
13.4.5.2 OnCE Debug Request/Event (jd_de_b, jd_de_en) ........ 13-61
13.4.5.3 e200 OnCE Debug Output (jd_debug_b) ........ 13-62
13.4.5.4 e200 CPU Clock On Input (jd_mclk_on) ........ 13-62
13.4.5.5 Watchpoint Events (jd_watchpt[0:29]) ........ 13-62
13.4.6 e200 OnCE Controller and Serial Interface ........ 13-63
13.4.6.1 e200 OnCE Status Register ........ 13-63
13.4.6.2 e200 OnCE Command Register (OCMD) ........ 13-64
13.4.6.3 e200 OnCE Control Register (OCR) ........ 13-68
13.4.7 Access to Debug Resources ........ 13-70
13.4.8 Methods of Entering Debug Mode ........ 13-72
13.4.8.1 External Debug Request During RESET ........ 13-72
13.4.8.2 Debug Request During RESET ........ 13-72
13.4.8.3 Debug Request During Normal Activity ........ 13-73
13.4.8.4 Debug Request During Waiting, Halted, or Stopped State ........ 13-73
13.4.8.5 Software Request During Normal Activity ........ 13-73
13.4.8.6 Debug Notify Halt Instructions ........ 13-73
13.4.9 CPU Status and Control Scan Chain Register (CPUSCR) ........ 13-74
13.4.9.1 Instruction Register (IR) ........ 13-74
13.4.9.2 Control State Register (CTL) ........ 13-75
13.4.9.3 Program Counter Register (PC) ........ 13-78
13.4.9.4 Write-Back Bus Register (WBBR[low], WBBR[high]) ........ 13-79
13.4.9.5 Machine State Register (MSR) ........ 13-79
13.4.9.6 Exiting Debug Mode and Interrupt Blocking ........ 13-79
13.4.10 Instruction Address FIFO Buffer (PC FIFO) ........ 13-80
13.4.10.1 PC FIFO ........ 13-80
13.4.11 Reserved Registers (Reserved) ........ 13-82
13.5 Watchpoint Support ........ 13-82
13.6 MMU and Cache Operation During Debug ........ 13-84
13.7 Cache Array Access During Debug ........ 13-85
13.8 Basic Steps for Enabling, Using, and Exiting External Debug Mode ........ 13-85
13.9 Parallel Signature Unit ........ 13-86
13.9.1 Parallel Signature Control Register (PSCR) ........ 13-88
13.9.2 Parallel Signature Status Register (PSSR) ........ 13-89
13.9.3 Parallel Signature High Register (PSHR) ........ 13-89
13.9.4 Parallel Signature Low Register (PSLR) ........ 13-89
13.9.5 Parallel Signature Counter Register (PSCTR) ........ 13-90
13.9.6 Parallel Signature Update High Register (PSUHR) ........ 13-90
13.9.7 Parallel Signature Update Low Register (PSULR) ........ 13-90
Chapter 14
Nexus 3 Module
14.1 Introduction ........ 14-1
14.1.1 Terms and Definitions ........ 14-1
14.1.2 Feature List ........ 14-2
14.1.3 Functional Block Diagram ........ 14-4
14.2 Enabling Nexus 3 Operation ........ 14-4
14.3 TCODEs Supported ........ 14-5
14.4 Nexus 3 Programmer’s Model ........ 14-10
14.4.1 Client Select Control (CSC) ........ 14-12
14.4.2 Port Configuration Register (PCR) - reference only ........ 14-13
14.4.3 Nexus Development Control Register 1 (DC1) ........ 14-14
14.4.4 Nexus Development Control Register 2 (DC2) ........ 14-15
14.4.5 Nexus Development Control Register 3 (DC3) ........ 14-18
14.4.6 Nexus Development Control Register 4 (DC4) ........ 14-19
14.4.7 Development Status Register (DS) ........ 14-21
14.4.8 Watchpoint Trigger Registers (WT, PTSTC, PTETC, DTSTC, DTETC) ........ 14-21
14.4.9 Nexus Watchpoint Mask Register (WMSK) ........ 14-27
14.4.10 Nexus Overrun Control Register (OVCR) ........ 14-28
14.4.11 Data Trace Control Register (DTC) ........ 14-29
14.4.12 Data Trace Start Address Registers (DTSA1–4) ........ 14-30
14.4.13 Data Trace End Address Registers (DTEA1–4) ........ 14-31
14.4.14 Read/Write Access Control/Status (RWCS) ........ 14-32
14.4.15 Read/Write Access Data (RWD) ........ 14-33
14.4.16 Read/Write Access Address (RWA) ........ 14-35
14.5 JTAG/OnCE Nexus 3 Register Access ........ 14-35
14.6 Nexus Message Fields ........ 14-36
14.6.1 TCODE Field ........ 14-36
14.6.2 Source ID Field (SRC) ........ 14-36
14.6.3 Relative Address Field (U-ADDR) ........ 14-36
14.6.4 Full Address Field (F-ADDR) ........ 14-37
14.6.5 Address Space Indication Field (MAP) ........ 14-37
14.7 Nexus Message Queues ........ 14-38
14.7.1 Message Queue Overrun ........ 14-38
14.7.2 CPU Stall ........ 14-38
14.7.3 Message Suppression ........ 14-38
14.7.4 Nexus Message Priority ........ 14-39
14.7.5 Data Acquisition Message Priority Loss Response ........ 14-40
14.7.6 Ownership Trace Message Priority Loss Response ........ 14-40
14.7.7 Program Trace Message Priority Loss Response ........ 14-40
14.7.8 Data Trace Message Priority Loss Response ........ 14-40
14.8 Debug Status Messages ........ 14-41
14.9 Error Messages ........ 14-41
14.10 Ownership Trace ........ 14-41
14.10.1 Overview ........ 14-41
14.10.2 Ownership Trace Messaging (OTM) ........ 14-42
14.11 Program Trace ........ 14-42
14.11.1 Branch Trace Messaging Types ........ 14-43
14.11.1.1 e200 Indirect Branch Message Instructions ........ 14-43
14.11.1.2 e200 Direct Branch Message Instructions ........ 14-44
14.11.1.3 BTM Using Branch History Messages ........ 14-44
14.11.1.4 BTM using Traditional Program Trace Messages ........ 14-44
14.11.2 BTM Message Formats ........ 14-45
14.11.2.1 Indirect Branch Messages (History) ........ 14-45
14.11.2.2 Indirect Branch Messages (Traditional) ........ 14-45
14.11.2.3 Direct Branch Messages (Traditional) ........ 14-45
14.11.3 Program Trace Message Fields ........ 14-46
14.11.3.1 Sequential Instruction Count Field (ICNT) ........ 14-46
14.11.3.2 Branch/Predicate Instruction History (HIST) ........ 14-46
14.11.3.3 Execution Mode Indication ........ 14-47
14.11.3.4 Resource Full Messages ........ 14-47
14.11.3.5 Program Correlation Messages ........ 14-48
14.11.3.6 Program Correlation Message Generation for TLB Update with New Address Translation ........ 14-50
14.11.3.7 Program Correlation Message Generation for TLB Invalidate (tlbivax) Operations ........ 14-50
14.11.3.8 Program Correlation Message Generation for PID Updates or MSR[IS] Updates ........ 14-50
14.11.3.9 Program Trace Overflow Error Messages ........ 14-51
14.11.3.10 Program Trace Synchronization Messages ........ 14-51
14.11.4 Enabling Program Trace ........ 14-53
14.11.5 Program Trace Timing Diagrams (2 MDO/1 MSEO Configuration) ........ 14-53
14.12 Data Trace ........ 14-55
14.12.1 Data Trace Messaging (DTM) ........ 14-55
14.12.2 DTM Message Formats ........ 14-55
14.12.2.1 Data Write Messages ........ 14-55
14.12.2.2 Data Read Messages ........ 14-56
14.12.2.3 Data Trace Synchronization Messages ........ 14-56
14.12.3 DTM Operation ........ 14-57
14.12.3.1 Data Trace Windowing ........ 14-58
14.12.3.2 Data Access/Instruction Access Data Tracing ........ 14-58
14.12.3.3 Data Trace Filtering ........ 14-58
14.12.3.4 e200 Bus Cycle Special Cases ........ 14-58
14.12.4 Data Trace Timing Diagrams (8 MDO/2 MSEO Configuration) ........ 14-59
14.13 Data Acquisition Messaging ........ 14-60
14.13.1 Data Acquisition ID Tag Field ........ 14-60
14.13.2 Data Acquisition Data Field ........ 14-60
14.13.3 Data Acquisition Trace Event ........ 14-60
14.14 Watchpoint Trace Messaging ........ 14-61
14.14.1 Watchpoint Timing Diagram (2 MDO/1 MSEO configuration) ........ 14-62
14.15 Nexus 3 Read/Write Access to Memory-Mapped Resources ........ 14-63
14.15.1 Single Write Access ........ 14-63
14.15.2 Block Write Access ........ 14-64
14.15.3 Single Read Access ........ 14-64
14.15.4 Block Read Access ........ 14-65
14.15.5 Error Handling ........ 14-65
14.15.5.1 AHB Read/Write Error ........ 14-66
14.15.5.2 Access Termination ........ 14-66
14.15.6 Read/Write Access Error Message ........ 14-66
14.16 Nexus 3 Pin Interface ........ 14-66
14.16.1 Pins Implemented ........ 14-67
14.16.2 Pin Protocol ........ 14-69
14.17 Rules for Output Messages ........ 14-71
14.18 Auxiliary Port Arbitration ........ 14-72
14.19 Examples ........ 14-72
14.20 Electrical Characteristics ........ 14-75
14.21 IEEE 1149.1 (JTAG) RD/WR Sequences ........ 14-75
Appendix A
Register Summary
Appendix B
Revision History
B.1 Changes between revisions 1 and 2 ........ B-1
B.2 Changes between revisions 0 and 1 ........ B-2
Tables
Number  Title ..... Page
i  Acronyms and Abbreviated Terms ..... xlii
ii  Terminology Conventions ..... xliii
iii  Instruction Field Conventions ..... xliv
1-1  Cache Block Lock and Unlock Instructions ..... 1-7
1-2  Scalar and Vector Embedded Floating-Point Instructions ..... 1-7
1-3  Interrupt Types ..... 1-10
1-4  Interrupt Registers ..... 1-11
1-5  Exceptions and Conditions ..... 1-12
2-1  MSR Field Descriptions ..... 2-9
2-2  PIR Field Descriptions ..... 2-11
2-3  PVR Field Descriptions ..... 2-12
2-4  SVR Field Descriptions ..... 2-12
2-5  XER Field Descriptions ..... 2-13
2-6  ESR Field Descriptions ..... 2-14
2-7  Machine Check Syndrome Register (MCSR) ..... 2-16
2-8  Timer Control Register Field Descriptions ..... 2-19
2-9  Timer Status Register Field Descriptions ..... 2-20
2-10  Hardware Implementation Dependent Register 0 ..... 2-21
2-11  Hardware Implementation Dependent Register 1 ..... 2-23
2-12  Branch Unit Control and Status Register ..... 2-24
2-13  System Response to Invalid SPR Reference ..... 2-26
2-14  Additional synchronization requirements for SPRs ..... 2-26
2-15  Special Purpose Registers ..... 2-28
2-16  Reset Settings for e200 Resources ..... 2-31
3-1  Implementation-Specific Instruction Summary ..... 3-1
3-2  Instructions Sorted by Mnemonic ..... 3-25
3-3  Instructions Sorted by Opcode ..... 3-32
4-1  Concurrent Instruction Issue Capabilities ..... 4-3
4-2  Pipeline Stages ..... 4-5
4-3  Instruction Class Cycle Counts ..... 4-20
4-4  Instruction Timing by Mnemonic ..... 4-21
4-5  Performance Effects of Storage Operand Placement ..... 4-25
5-1  SPE/EFPU Status and Control Register ..... 5-2
5-2  Floating-Point Results Summary—Add, Sub, Mul, Div ..... 5-92
5-3  Floating-Point Results Summary—madd, msub, nmadd, nmsub ..... 5-96
5-4  Floating-Point Results Summary—sqrt ..... 5-100
5-5  Floating-Point Results Summary—Min, Max ..... 5-101
5-6  Floating-Point Results Summary—Convert to Unsigned ..... 5-106
5-7  Floating-Point Results Summary—Convert to Signed ..... 5-106
5-8  Floating-Point Results Summary—Convert from Unsigned ..... 5-106
5-9  Floating-Point Results Summary—Convert from Signed ..... 5-106
5-10  Floating-Point Results Summary—fabs, fnabs, fneg ..... 5-107
5-11  Floating-point Results Summary—Convert from half-precision ..... 5-107
5-12  Floating-point Results Summary—Convert to half-precision ..... 5-107
5-13  EFPU Vector Floating-Point Instruction Timing ..... 5-108
5-14  EFPU Single-precision Scalar Floating-Point Instruction Timing ..... 5-109
5-15  Opcode Space Division ..... 5-111
5-16  Embedded Vector Floating-Point Instruction Opcodes ..... 5-111
5-17  Embedded Scalar Single-Precision Floating-Point Instruction Opcodes ..... 5-113
6-1  RTL Notation ..... 6-2
6-2  SPE Status and Control Register ..... 6-4
6-3  Simple Vector Arithmetic Instructions ..... 6-8
6-4  Simple Vector Logical Instructions ..... 6-15
6-5  Simple Vector Shift/Rotate Instructions ..... 6-15
6-6  Vector Compare Instructions ..... 6-16
6-7  Vector Set Instructions ..... 6-16
6-8  Vector Select Instructions ..... 6-16
6-9  Vector Data Arrangement Instructions ..... 6-17
6-10  Mnemonic Extensions for Multiply Accumulate Instructions ..... 6-22
6-11  Mnemonic Extensions for Dot Product Instructions ..... 6-23
6-12  Misc. Vector Instructions ..... 6-24
6-13  Vector Load and Store Instructions ..... 6-28
6-14  Simple Vector Arithmetic Instruction Timing ..... 6-31
6-15  SPE Complex Integer Instruction Timing ..... 6-34
6-16  SPE Vector Logical Instruction Timing ..... 6-34
6-17  SPE Vector Shift/Rotate Instruction Timing ..... 6-35
6-18  SPE Vector Compare Instruction Timing ..... 6-35
6-19  SPE Vector Set Instruction Timing ..... 6-36
6-20  SPE Vector Select Instruction Timing ..... 6-36
6-21  SPE Vector Data Arrangement Instruction Timing ..... 6-36
6-22  SPE Multiply and Multiply/Accumulate Instruction Timing ..... 6-39
6-23  SPE Dot Product Instruction Timing ..... 6-39
6-24  SPE Misc. Vector Instruction Timing ..... 6-39
6-25  SPE Load and Store Instruction Timing ..... 6-39
7-1  Interrupt Classifications ..... 7-2
7-2  Exceptions and Conditions ..... 7-3
7-3  ESR Bit Settings ..... 7-4
7-4  MSR Bit Settings ..... 7-6
7-5  Machine Check Syndrome Register (MCSR) ..... 7-9
7-6  IVPR Register Fields ..... 7-12
7-7  IVOR Register Fields ..... 7-13
7-8  Interrupts ..... 7-13
7-9  Critical Input Interrupt—Register Settings ..... 7-14
7-10  Error Report Machine Check Exceptions ..... 7-16
7-11  Asynchronous Machine Check Exceptions ..... 7-21
7-12  Asynchronous Machine Check MCAR Update Priority ..... 7-25
7-13  Machine Check Interrupt—Register Settings ..... 7-27
7-14  Data Storage Interrupt—Register Settings ..... 7-28
7-15  ISI Exceptions and Conditions ..... 7-29
7-16  Instruction Storage Interrupt—Register Settings ..... 7-29
7-17  External Input Interrupt—Register Settings ..... 7-30
7-18  Alignment Interrupt—Register Settings ..... 7-31
7-19  Program Interrupt—Register Settings ..... 7-32
7-20  Floating-Point Unavailable Interrupt—Register Settings ..... 7-33
7-21  System Call Interrupt—Register Settings ..... 7-33
7-22  Decrementer Interrupt—Register Settings ..... 7-34
7-23  Fixed-Interval Timer Interrupt—Register Settings ..... 7-35
7-24  Watchdog Timer Interrupt—Register Settings ..... 7-35
7-25  Data TLB Error Interrupt—Register Settings ..... 7-36
7-26  Instruction TLB Error Interrupt—Register Settings ..... 7-36
7-27  Debug Interrupt—Register Settings ..... 7-39
7-28  TSR Watchdog Timer Reset Status ..... 7-40
7-29  DBSR Most Recent Reset ..... 7-40
7-30  System Reset Interrupt—Register Settings ..... 7-41
7-31  SPE/EFPU Unavailable Interrupt—Register Settings ..... 7-41
7-32  Embedded Floating-Point Data Interrupt—Register Settings ..... 7-42
7-33  Embedded Floating-point Round Interrupt—Register Settings ..... 7-42
7-34  Performance Monitor Interrupt—Register Settings ..... 7-43
7-35  Zen Exception Priorities ..... 7-45
7-36  MSR Setting Due to Interrupt ..... 7-48
8-1  Supervisor-Level PMRs (PMR[5] = 1) ..... 8-3
8-2  User-Level PMRs (PMR[5] = 0) (Read-Only) ..... 8-4
8-3  Response to an Invalid PMR Reference ..... 8-4
8-4  PMGC0 Field Descriptions ..... 8-5
8-5  PMLCa0–PMLCa3 Field Descriptions ..... 8-6
8-6  PMLCb0–PMLCb3 Field Descriptions ..... 8-8
8-7  PMC0–PMC3 Field Descriptions ..... 8-12
8-8  Processor States and PMLCa0–PMLCa3 Bit Settings ..... 8-14
8-9  Event Types ..... 8-16
8-10  Performance Monitor Event Selection ..... 8-16
9-1  L1CSR0 Field Descriptions ..... 9-6
9-2  L1CSR1 Field Descriptions ..... 9-9
9-3  L1CFG0 Field Descriptions ..... 9-11
9-4  L1CFG1 Field Descriptions ..... 9-12
9-5  L1FINV0 Field Descriptions ..... 9-17
9-6  L1FINV1 Field Descriptions ..... 9-18
9-7  Tag Checkbit Generation ..... 9-27
9-8  Data Checkbit Generation ..... 9-27
9-9  Cache Line Locking/Unlocking Unit Instructions ..... 9-32
9-10  Cache Line Locking/Unlocking Unit Instructions ..... 9-34
9-11  Special Case Handling ..... 9-42
9-12  Transfer Type Encoding ..... 9-43
9-13  CDACNTL Field Descriptions ..... 9-48
9-14  CDADATA Field Descriptions ..... 9-50
9-15  HDBCR0 Field Descriptions ..... 9-51
9-16  p_snp_cmd[0:1] Snoop Command Encoding ..... 9-53
9-17  p_snp_resp[0:4] Snoop Response Encoding ..... 9-53
10-1  Page Size Field Encodings and EPN Field Comparison ..... 10-3
10-2  TLB Entry Bit Definitions ..... 10-6
10-3  MMUCFG Field Descriptions ..... 10-7
10-4  TLB0CFG Field Descriptions ..... 10-7
10-5  TLB1CFG Field Descriptions ..... 10-8
10-6  tlbivax EA Bit Definitions ..... 10-11
10-7  TLB Entry 0 Values after Reset ..... 10-14
10-8  MMUCSR0—MMU Control and Status Register 0 ..... 10-15
10-9  MAS0—MMU Read/Write and Replacement Control ..... 10-16
10-10  MAS1—Descriptor Context and Configuration Control ..... 10-17
10-11  MAS2—EPN and Page Attributes ..... 10-18
10-12  MAS3—RPN and Access Control ..... 10-19
10-13  MAS4—Hardware Replacement Assist Configuration Register ..... 10-19
10-14  MAS6—TLB Search Context Register 0 ..... 10-20
10-15  MMU Assist Register Field Updates ..... 10-21
10-16  Transfer Type Encoding ..... 10-22
11-1  Interface Signal Definitions ..... 11-4
11-2  p_hrdata[63:0] Byte Address Mappings ..... 11-13
11-3  p_d_hwdata[63:0] Byte Address Mappings ..... 11-13
11-4  p_[d,i]_htrans[1:0] Transfer Type Encoding ..... 11-14
11-5  p_[d,i]_hsize[1:0] Transfer Size Encoding ..... 11-14
11-6  p_[d,i]_hburst[2:0] Burst Type Encoding ..... 11-15
11-7  p_d_hprot[5:0] Protection Control Encoding ..... 11-15
11-8  p_i_hprot[5:0] Protection Control Encoding ..... 11-16
11-9  Mapping of Access attributes to p_d_hprot[4:2] Protection Control ..... 11-16
11-10  p_[d,i]_hbstrb[7:0] to Byte Address Mappings ..... 11-18
11-11  Byte Strobe Assertion for Transfers ..... 11-18
11-12  Big- and Little-Endian Storage ..... 11-20
11-13  p_d_hresp[2:0] Transfer Response Encoding ..... 11-28
11-14  p_i_hresp[1:0] Transfer Response Encoding ..... 11-28
11-15  p_snp_cmd[0:1] Snoop Command Encoding ..... 11-31
11-16  p_snp_resp[0:4] Snoop Response Encoding ..... 11-32
11-17  Processor Mode Encoding ..... 11-42
11-18  Processor Execution Pipeline Status Encoding ..... 11-43
11-19  Branch Prediction Status Encoding ..... 11-44
11-20  e200 Debug/Emulation Support Signals ..... 11-48
11-21  e200 Development Support (Nexus) Signals ..... 11-50
11-22  JTAG Primary Interface Signals ..... 11-51
11-23  JTAG Signals Used to Support External Registers ..... 11-52
11-24  JTAG General Purpose Register Select Decoding ..... 11-53
11-25  JTAG Register ID Fields ..... 11-56
11-26  JTAG ID Register Inputs ..... 11-56
13-1  DAC Events and Resultant Updates ..... 13-8
13-2  DAC Events and Resultant Updates, Dual-Issue Case 1 ..... 13-11
13-3  DAC Events and Resultant Updates, Dual-Issue Case 2 ..... 13-13
13-4  DAC Events and Resultant Updates, Dual-Issue Case 3 ..... 13-14
13-5  DAC Events and Resultant Updates, Dual-Issue Case 4 ..... 13-16
13-6  DBCR0 Bit Definitions ..... 13-23
13-7  DBCR1 Bit Definitions ..... 13-25
13-8  DBCR2 Bit Definitions ..... 13-27
13-9  DBCR3 Bit Definitions ..... 13-32
13-10  DBCR4 Bit Definitions ..... 13-36
13-11  DBCR5 Bit Definitions ..... 13-37
13-12  DBCR6 Bit Definitions ..... 13-39
13-13  DBSR Bit Definitions ..... 13-41
13-14  DBERC0 Bit Definitions ..... 13-44
13-15  DBERC0 Resource Control ..... 13-46
13-16  DEVENT Bit Definitions ..... 13-50
13-17  DDAM Bit Definitions ..... 13-51
13-18  EDBCR0 Bit Definitions ..... 13-53
13-19  EDBSR0 Bit Definitions ..... 13-54
13-20  EDBSRMSK0 Bit Definitions ..... 13-56
13-21  JTAG/OnCE Primary Interface Signals ..... 13-60
13-22  OnCE Status Register Bit Definitions ..... 13-64
13-23  OnCE Command Register Bit Definitions ..... 13-65
13-24  e200 OnCE Register Addressing ..... 13-66
13-25  OnCE Control Register Bit Definitions ..... 13-69
13-26  OnCE Register Access Requirements ..... 13-71
13-27  Control State Register Field Descriptions ..... 13-75
13-28  Watchpoint Output Signal Assignments ..... 13-82
13-29  PSCR Field Descriptions ..... 13-88
13-30  PSSR Field Descriptions ..... 13-89
14-1  Terms and Definitions ..... 14-1
14-2  Supported TCODEs ..... 14-5
14-3  Error Code Encoding (TCODE = 8) ..... 14-9
14-4  Error Type Encoding (TCODE = 8) ..... 14-9
14-5  RCODE values (TCODE = 27) ..... 14-9
14-6  Event Code Encoding (TCODE = 33) ..... 14-9
14-7  Data Trace Size Encodings (TCODE = 5,6,13,14) ..... 14-10
14-8  Nexus 3 Register Map ..... 14-11
14-9  Client Select Control Register Fields ..... 14-12
14-10  Port Configuration Register Fields ..... 14-13
14-11  Development Control Register 1 Fields ..... 14-14
14-12  Development Control Register 2 Fields ..... 14-15
14-13  Development Control Register 3 Fields ..... 14-18
14-14  Development Control Register 4 Fields ..... 14-20
14-15  Development Status Register Fields ..... 14-21
14-16  Watchpoint Trigger Register Fields ..... 14-22
14-17  Program Trace Start Trigger Control Register Fields ..... 14-24
14-18  Program Trace End Trigger Control Register Fields ..... 14-25
14-19  Data Trace Start Trigger Control Register Fields ..... 14-26
14-20  Data Trace End Trigger Control Register Fields ..... 14-27
14-21  Watchpoint Mask Register Fields ..... 14-28
14-22  Nexus Overrun Control Register Fields ..... 14-29
14-23  Data Trace Control Register Fields ..... 14-30
14-24  Data Trace—Address Range Options ..... 14-31
14-25  Read/Write Access Control/Status Register Fields ..... 14-32
14-26  Read/Write Access Status Bit Encoding ..... 14-33
14-27  RWD Data Placement for Transfers ..... 14-34
14-28  RWD Byte Lane Mapping ..... 14-34
14-29  Message Type Priority and Message Dropped Responses ..... 14-39
14-30  Indirect Branch Message Sources ..... 14-43
14-31  Direct Branch Message Sources ..... 14-44
14-32  Branch/Predicate History Events ..... 14-47
14-33  RCODE Encoding ..... 14-48
14-34  Program Trace Exception Summary ..... 14-52
14-35  Data Trace Exception Summary ..... 14-57
14-36  e200 Bus Cycle Cases ..... 14-58
14-37  Watchpoint Source Encoding ..... 14-62
14-38 JTAG Pins for Nexus 3 .......... 14-67
14-39 Nexus 3 Auxiliary Pins .......... 14-67
14-40 Nexus Port Arbitration Signals .......... 14-68
14-41 MSEO Pin(s) Protocol .......... 14-69
14-42 MDO Request Encodings .......... 14-72
14-43 Indirect Branch Message Example (2 MDO/1 MSEO) .......... 14-73
14-44 Indirect Branch Message Example (8 MDO/2 MSEO) .......... 14-73
14-45 Direct Branch Message Example (2 MDO/1 MSEO) .......... 14-74
14-46 Direct Branch Message Example (8 MDO/2 MSEO) .......... 14-74
14-47 Data Write Message Example (8 MDO/1 MSEO) .......... 14-74
14-48 Data Write Message Example (8 MDO/2 MSEO) .......... 14-75
14-49 Accessing Internal Nexus 3 Registers via JTAG/OnCE .......... 14-75
14-50 Accessing Memory-Mapped Resources (Reads) .......... 14-76
14-51 Accessing Memory-Mapped Resources (Writes) .......... 14-76
B-1 Changes between revisions 1 and 2 .......... B-1
B-2 Changes between revisions 0 and 1 .......... B-2
Figures
Figure Number    Title    Page Number
1-1 e200z7 Block Diagram .......... 1-2
1-2 e200z760 Supervisor Mode Programmer’s Model .......... 1-4
1-3 e200z760 Supervisor Mode Programmer’s Model DCRs and PMRs .......... 1-5
1-4 e200z7 User Mode Programmer’s Model .......... 1-5
1-5 e200 User Mode Programmer’s Model PMRs .......... 1-6
2-1 e200z760 Supervisor Mode Programmer’s Model .......... 2-2
2-2 e200z760 Supervisor Mode Programmer’s Model DCRs and PMRs .......... 2-3
2-3 e200z7 User Mode Programmer’s Model .......... 2-3
2-4 e200 User Mode Programmer’s Model PMRs .......... 2-4
2-5 Machine State Register (MSR) .......... 2-9
2-6 Processor ID Register (PIR) .......... 2-11
2-7 Processor Version Register (PVR) .......... 2-11
2-8 System Version Register (SVR) .......... 2-12
2-9 Integer Exception Register (XER) .......... 2-13
2-10 Exception Syndrome Register (ESR) .......... 2-13
2-11 Machine Check Syndrome Register (MCSR) .......... 2-16
2-12 Timer Control Register (TCR) .......... 2-18
2-13 Timer Status Register (TSR) .......... 2-20
2-14 Hardware Implementation Dependent Register 0 (HID0) .......... 2-21
2-15 Hardware Implementation Dependent Register 1 (HID1) .......... 2-23
2-16 Branch Unit Control and Status Register (BUCSR) .......... 2-24
4-1 e200z7 Block Diagram .......... 4-2
4-2 Pipeline Diagram .......... 4-6
4-3 e200 Instruction Prefetch Buffers .......... 4-8
4-4 e200 Branch Target Buffer .......... 4-9
4-5 Basic Pipeline Flow, Single Cycle Instructions .......... 4-10
4-6 Basic Pipeline Flow, Load/Store Instructions .......... 4-11
4-7 Basic Pipeline Flow, Branch Instructions, No Prediction .......... 4-11
4-8 Basic Pipeline Flow, Branch Instructions, BTB Hit, Correct Prediction, Branch Taken .......... 4-12
4-9 Basic Pipeline Flow, Branch Instructions, Predict Not Taken, Incorrect Prediction .......... 4-12
4-10 Basic Pipeline Flow, bcctr Instruction, Predict Taken, Incorrect Prediction, Instruction Buffer Not Empty .......... 4-12
4-11 Basic Pipeline Flow, bcctr Instruction, Predict Taken, Incorrect Prediction, Instruction Buffer Not Empty .......... 4-13
4-12 Basic Pipeline Flow, bcctr Instruction, Predict Taken, Incorrect Prediction, Instruction Buffer Empty .......... 4-13
4-13 Basic Pipeline Flow, Integer Multiply Class Instructions .......... 4-14
4-14 Basic Pipeline Flow, Long Instruction .......... 4-14
4-15 Pipelined Load Instructions with Load Target Data Dependency .......... 4-15
4-16 Pipelined Instructions with Base Register Update Data Dependency .......... 4-15
4-17 Pipelined Store Instruction with Store Data Dependency .......... 4-16
4-18 mtspr, mfspr Instruction Execution, Debug, and SPE SPRs .......... 4-16
4-19 mtmsr, wrtee[i] Instruction Execution .......... 4-17
4-20 Cache/MMU mtspr, mfspr, and Management Instruction Execution .......... 4-18
5-1 SPE/EFPU Status and Control Register (SPEFSCR) .......... 5-2
5-2 Single-Precision Data Format .......... 5-8
5-3 Half-Precision Data Format .......... 5-9
6-1 Vector Storage in GPRs .......... 6-3
6-2 Accumulator Storage .......... 6-3
6-3 SPE/EFPU Status and Control Register (SPEFSCR) .......... 6-4
6-4 rA Used to Form Update Value for Mode 1000 .......... 6-27
6-5 rA Used to Form Update Value for Bit-Reversed Addressing Update Mode .......... 6-28
7-1 Exception Syndrome Register (ESR) .......... 7-4
7-2 Machine State Register (MSR) .......... 7-6
7-3 Machine Check Syndrome Register (MCSR) .......... 7-9
7-4 e200 Interrupt Vector Prefix Register (IVPR) .......... 7-12
7-5 e200 Interrupt Vector Offset Register (IVOR) .......... 7-12
8-1 Performance Monitor Global Control Register (PMGC0) .......... 8-5
8-2 Performance Monitor Local Control A Registers (PMLCa0–PMLCa3) .......... 8-6
8-3 Performance Monitor Local Control B Registers (PMLCb0–PMLCb3) .......... 8-8
8-4 Performance Monitor Counter Registers (PMC0–PMC3) .......... 8-12
9-1 e200z7 Caches .......... 9-2
9-2 16-KB Cache Organization and Line Format .......... 9-3
9-3 16-KB Cache Lookup Flow .......... 9-5
9-4 L1 Cache Control and Status Register 0 (L1CSR0) .......... 9-6
9-5 L1 Cache Control and Status Register 1 (L1CSR1) .......... 9-9
9-6 L1 Cache Configuration Register 0 (L1CFG0) .......... 9-10
9-7 L1 Cache Configuration Register 1 (L1CFG1) .......... 9-11
9-8 L1 Flush/Invalidate Register 0 (L1FINV0) .......... 9-16
9-9 L1 Flush/Invalidate Register 1 (L1FINV1) .......... 9-17
9-10 Cache Debug Access Control Register (CDACNTL) .......... 9-48
9-11 Cache Debug Access Data Register (CDADATA) .......... 9-50
9-12 Hardware Debug Control Register 0 (HDBCR0) .......... 9-50
9-13 Snoop Command Port .......... 9-52
10-1 Virtual Address and TLB-Entry Compare Process .......... 10-3
10-2 Effective to Real Address Translation Flow .......... 10-4
10-3 Granting of Access Permission .......... 10-5
10-4 MMU Configuration Register (MMUCFG) .......... 10-6
10-5 TLB0 Configuration Register (TLB0CFG) .......... 10-7
10-6 TLB1 Configuration Register (TLB1CFG) .......... 10-8
10-7 Data Exception Address Register .......... 10-15
10-8 MMU Control and Status Register 0 (MMUCSR0) .......... 10-15
10-9 MMU Assist Register 0 (MAS0) .......... 10-16
10-10 MMU Assist Register 1 (MAS1) .......... 10-16
10-11 MMU Assist Register 2 (MAS2) .......... 10-18
10-12 MMU Assist Register 3 (MAS3) .......... 10-19
10-13 MMU Assist Register 4 (MAS4) .......... 10-19
10-14 MMU Assist Register 6 (MAS6) .......... 10-20
10-15 External Translation Alteration TLB Entry Compare Process .......... 10-24
11-1 e200 Signal Groups—part 1 .......... 11-3
11-2 e200 Signal Group—part 2 .......... 11-4
11-3 Example External JTAG Register Design .......... 11-55
11-4 AHB Clock Enable Operation—1 .......... 11-57
11-5 AHB Clock Enable Operation—2 .......... 11-58
11-6 AHB Clock Enable Operation—3 .......... 11-58
11-7 Basic Read Transfers .......... 11-60
11-8 Read Transfer with Wait-state .......... 11-62
11-9 Basic Write Transfers .......... 11-63
11-10 Write Transfer with Wait-State .......... 11-65
11-11 Single Cycle Read and Write Transfers—1 .......... 11-66
11-12 Single Cycle Read and Write Transfers—2 .......... 11-67
11-13 Multi-Cycle Read and Write Transfers—1 .......... 11-68
11-14 Multi-Cycle Read and Write Transfers—2 .......... 11-69
11-15 Misaligned Read Transfer .......... 11-70
11-16 Misaligned Write Transfer .......... 11-71
11-17 Misaligned Write, Single Cycle Read Transfer .......... 11-72
11-18 Burst Read Transfer .......... 11-73
11-19 Burst Read with Wait-state Transfer .......... 11-74
11-20 Burst Write Transfer .......... 11-75
11-21 Burst Write with Wait-state Transfer .......... 11-76
11-22 Read and Write Transfers, Instr. Read Error Termination .......... 11-77
11-23 Data Read Error Termination .......... 11-78
11-24 Misaligned Write Error Termination, Burst Substituted .......... 11-79
11-25 Burst Read Error Termination, Burst Write Substituted .......... 11-80
11-26 Memory Sync Operation (basic operation) .......... 11-81
11-27 Memory Sync Operation (interruption operation) .......... 11-82
11-28 Memory Sync Operation (snoop queue empty) .......... 11-83
11-29 Memory Sync Operation (2nd msync back-to-back) .......... 11-83
11-30 Cross-signaling Exception Output Operation .......... 11-85
11-31 Cross-signaling Exception Input Operation .......... 11-86
11-32 Cross-signaling Invalidation Output Operation—Data Error .......... 11-87
11-33 Cross-signaling Invalidation Output Operation—Tag Error, Miss .......... 11-88
11-34 Cross-signaling Invalidation Output Operation—Tag Error, Hit .......... 11-89
11-35 Cross-signaling Invalidation Output Operation—Tag Error, Locked Inv .......... 11-90
11-36 Cross-signaling Invalidation Output Operation—Dirty Error .......... 11-91
11-37 Cross-signaling Invalidation Output Operation—Tag Error, Dirty Error .......... 11-92
11-38 Cross-signaling Invalidation Output Operation—Tag Error, Dirty Error, and Lock Error .......... 11-93
11-39 Cross-signaling Invalidation Input Operation—Data Error .......... 11-94
11-40 Cross-signaling Invalidation Input Operation—Tag Error, Miss .......... 11-95
11-41 Cross-signaling Invalidation Input Operation—Tag Error, Hit .......... 11-96
11-42 Cross-signaling Invalidation Input Operation—Tag Error, Locked Inv .......... 11-97
11-43 Cross-signaling Invalidation Input Operation—Dirty Error .......... 11-98
11-44 Cross-signaling Invalidation Input Operation—Tag Error, Dirty Error .......... 11-99
11-45 Cross-signaling Invalidation Input Operation—Tag Error, Dirty Error, and Lock Error .......... 11-100
11-46 Cross-signaling Push Parity error Output Operation—Error on DW 1 .......... 11-101
11-47 Basic Cache Coherency Interface Operation—Misses .......... 11-102
11-48 Basic Cache Coherency Interface Operation—Hit .......... 11-102
11-49 Cache Coherency Interface Operation—Snoop Starvation Timeout .......... 11-103
11-50 Cache Coherency Interface Operation—p_snp_rdy operation .......... 11-103
11-51 Cache Coherency Interface Operation—p_snp_rdy operation, p_snp_req negation prior to acceptance .......... 11-104
11-52 p_snp_rdy Operation, p_snp_req Negation Prior to Acceptance, Reasserted Later in Ready Window .......... 11-105
11-53 Stop Mode Entry, p_snp_rdy Operation .......... 11-107
11-54 Stop Mode Exit, p_snp_rdy Operation .......... 11-108
11-55 Debug Entry Cross-Signaling Interface, non-Lockstep Mode .......... 11-110
11-56 Debug Entry Cross-Signaling Interface, lockstep mode .......... 11-111
11-57 Debug Entry Cross-Signaling Interface, lockstep mode (2) .......... 11-112
11-58 Debug Exit Cross-Signaling Interface, non-lockstep mode .......... 11-113
11-59 Debug Exit Cross-Signaling Interface, lockstep mode .......... 11-114
11-60 Debug Exit Cross-Signaling Interface, lockstep mode (2) .......... 11-115
11-61 Debug Update_DR State Cross-Signaling Interface, lockstep mode .......... 11-116
11-62 Debug Update_DR State Cross-Signaling Interface, lockstep mode (2) .......... 11-117
11-63 Wakeup Control Signal (p_wakeup) .......... 11-118
11-64 Interrupt Interface Input Signals .......... 11-118
11-65 Interrupt Pending Operation .......... 11-119
11-66 Interrupt Acknowledge Operation—1 .......... 11-120
11-67 Interrupt Acknowledge Operation—2 .......... 11-121
11-68 Time Base Input Timing .......... 11-122
11-69 Test Clock Input Timing .......... 11-122
11-70 j_trst_b Timing .......... 11-122
11-71 Test Access Port Timing .......... 11-123
12-1 Power Management State Diagram .......... 12-3
13-1 e200z7 Debug Resources .......... 13-4
13-2 DVC1, DVC2 Registers .......... 13-21
13-3 DBCNT Register .......... 13-21
13-4 DBCR0 Register .......... 13-22
13-5 DBCR1 Register .......... 13-25
13-6 DBCR2 Register .......... 13-27
13-7 DBCR3 Register .......... 13-32
13-8 DBCR4 Register .......... 13-36
13-9 DBCR5 Register .......... 13-37
13-10 DBCR6 Register .......... 13-39
13-11 DBSR Register .......... 13-41
13-12 DBERC0 Register .......... 13-44
13-13 DEVENT Register .......... 13-50
13-14 DDAM Register .......... 13-51
13-15 EDBCR0 Register .......... 13-53
13-16 EDBSR0 Register .......... 13-54
13-17 EDBSRMSK0 Register .......... 13-56
13-18 OnCE TAP Controller and Registers .......... 13-58
13-19 e200 OnCE Controller and Serial Interface .......... 13-63
13-20 e200 OnCE Status Register .......... 13-63
13-21 OnCE Command Register .......... 13-65
13-22 OnCE Control Register .......... 13-68
13-23 CPU Scan Chain Register (CPUSCR) .......... 13-74
13-24 Control State Register (CTL) .......... 13-75
13-25 OnCE PC FIFO .......... 13-81
13-26 PSU Structure .......... 13-87
13-27 Parallel Signature Control Register (PSCR) .......... 13-88
13-28 Parallel Signature Status Register (PSSR) .......... 13-89
13-29 Parallel Signature High Register (PSHR) .......... 13-89
13-30 Parallel Signature Low Register (PSLR) .......... 13-90
13-31 Parallel Signature Counter Register (PSCTR) .......... 13-90
13-32 Parallel Signature Update High Register (PSUHR) .......... 13-90
13-33 Parallel Signature Update Low Register (PSULR) .......... 13-91
14-1 Nexus 3 Functional Block Diagram .......... 14-4
14-2 Client Select Control Register .......... 14-12
14-3 Port Configuration Register (PCR) .......... 14-13
14-4 Development Control Register 1 .......... 14-14
14-5 Development Control Register 2 (DC2) .......... 14-15
14-6 Development Control Register 3 (DC3) .......... 14-18
14-7 Development Control Register 4 .......... 14-20
14-8 Development Status Register .......... 14-21
14-9 Watchpoint Trigger (WT) Register .......... 14-22
14-10 Program Trace Start Trigger Control (PTSTC) Register .......... 14-23
14-11 Program Trace End Trigger Control (PTETC) Register .......... 14-24
14-12 Data Trace Start Trigger Control (DTSTC) Register .......... 14-25
14-13 Data Trace End Trigger Control (DTETC) Register .......... 14-26
14-14 Watchpoint Mask Register .......... 14-27
14-15 Nexus Overrun Control Register .......... 14-28
14-16 Data Trace Control Register .......... 14-29
14-17 Data Trace Start Address n Register .......... 14-31
14-18 Data Trace End Address n Register .......... 14-31
14-19 Read/Write Access Control/Status Register .......... 14-32
14-20 Read/Write Access Data Register .......... 14-33
14-21 Read/Write Access Address Register .......... 14-35
14-22 Relative Address Generation and Re-creation .......... 14-37
14-23 Debug Status Message Format .......... 14-41
14-24 Error Message Format .......... 14-41
14-25 Ownership Trace Message Format .......... 14-42
14-26 Indirect Branch Message (History) Format .......... 14-45
14-27 Indirect Branch Message Format .......... 14-45
14-28 Direct Branch Message Format .......... 14-45
14-29 Resource Full Message Format .......... 14-48
14-30 Program Correlation Message Formats .......... 14-50
14-31 Direct/Indirect Branch with Sync Message Format .......... 14-52
14-32 Indirect Branch History with Sync Message Format .......... 14-52
14-33 Program Trace—Indirect Branch Message (Traditional) .......... 14-53
14-34 Program Trace—Indirect Branch Message (History) .......... 14-54
14-35 Program Trace—Direct Branch (Traditional) and Error Messages .......... 14-54
14-36 Program Trace—Indirect Branch with Sync Message .......... 14-54
14-37 Data Write Message Format .......... 14-55
14-38 Data Read Message Format .......... 14-56
14-39 Data Write/Read with Synchronization Message Format .......... 14-57
14-40 Data Trace—Data Write Message .......... 14-59
14-41 Data Trace—Data Read with Sync Message .......... 14-60
14-42 Data Acquisition Message Format .......... 14-61
14-43 Watchpoint Message Format .......... 14-61
14-44 Watchpoint Message and Watchpoint Error Message .......... 14-62
14-45 Error Message Format .......... 14-66
14-46 Single Pin MSEO Transfers .......... 14-70
14-47 Dual Pin MSEO Transfers .......... 14-71
A-1 e200z760 Supervisor Mode Programmer’s Model .......... A-2
A-2 e200z760 Supervisor Mode Programmer’s Model DCRs and PMRs .......... A-3
A-3 e200z7 User Mode Programmer’s Model .......... A-3
Figure Number    Title    Page Number
A-4    e200 User Mode Programmer’s Model PMRs .... A-4
A-5    Machine State Register (MSR) .... A-4
A-6    Processor ID Register (PIR) .... A-4
A-7    Processor Version Register (PVR) .... A-4
A-8    System Version Register (SVR) .... A-5
A-9    Integer Exception Register (XER) .... A-5
A-10   Exception Syndrome Register (ESR) .... A-5
A-11   Machine Check Syndrome Register (MCSR) .... A-5
A-12   Timer Control Register (TCR) .... A-5
A-13   Timer Status Register (TSR) .... A-6
A-14   Hardware Implementation Dependent Register 0 (HID0) .... A-6
A-15   Hardware Implementation Dependent Register 1 (HID1) .... A-6
A-16   Branch Unit Control and Status Register (BUCSR) .... A-6
A-17   SPE/EFPU Status and Control Register (SPEFSCR) .... A-6
A-18   e200 Interrupt Vector Offset Register (IVOR) .... A-7
A-19   Performance Monitor Global Control Register (PMGC0) .... A-7
A-20   Performance Monitor Local Control A Registers (PMLCa0–PMLCa3) .... A-7
A-21   Performance Monitor Local Control B Registers (PMLCb0–PMLCb3) .... A-7
A-22   Performance Monitor Counter Registers (PMC0–PMC3) .... A-7
A-23   L1 Cache Control and Status Register 0 (L1CSR0) .... A-8
A-24   L1 Cache Control and Status Register 1 (L1CSR1) .... A-8
A-25   L1 Cache Configuration Register 0 (L1CFG0) .... A-8
A-26   L1 Cache Configuration Register 1 (L1CFG1) .... A-8
A-27   L1 Flush/Invalidate Register 0 (L1FINV0) .... A-8
A-28   L1 Flush/Invalidate Register 1 (L1FINV1) .... A-9
A-29   MMU Configuration Register (MMUCFG) .... A-9
A-30   TLB0 Configuration Register (TLB0CFG) .... A-9
A-31   TLB1 Configuration Register (TLB1CFG) .... A-9
A-32   Data Exception Address Register .... A-9
A-33   MMU Control and Status Register 0 (MMUCSR0) .... A-9
A-34   MMU Assist Register 0 (MAS0) .... A-10
A-35   MMU Assist Register 1 (MAS1) .... A-10
A-36   MMU Assist Register 2 (MAS2) .... A-10
A-37   MMU Assist Register 3 (MAS3) .... A-10
A-38   MMU Assist Register 4 (MAS4) .... A-10
A-39   MMU Assist Register 6 (MAS6) .... A-10
A-40   DVC1, DVC2 Registers .... A-11
A-41   DBCNT Register .... A-11
A-42   DBCR0 Register .... A-11
A-43   DBCR1 Register .... A-11
A-44   DBCR2 Register .... A-12
A-45   DBCR3 Register .... A-12
A-46   DBCR4 Register .... A-12
A-47   DBCR5 Register .... A-12
A-48   DBCR6 Register .... A-13
A-49   DBSR Register .... A-13
A-50   DBERC0 Register .... A-13
A-51   DEVENT Register .... A-14
A-52   DDAM Register .... A-14
A-53   EDBCR0 Register .... A-14
A-54   EDBSR0 Register .... A-14
A-55   EDBSRMSK0 Register .... A-15
A-56   e200 OnCE Status Register .... A-15
A-57   OnCE Command Register .... A-15
A-58   OnCE Control Register .... A-15
A-59   CPU Scan Chain Register (CPUSCR) .... A-16
A-60   Control State Register (CTL) .... A-16
A-61   Parallel Signature Control Register (PSCR) .... A-17
A-62   Parallel Signature Status Register (PSSR) .... A-17
A-63   Parallel Signature High Register (PSHR) .... A-17
A-64   Parallel Signature Low Register (PSLR) .... A-17
A-65   Parallel Signature Counter Register (PSCTR) .... A-17
A-66   Parallel Signature Update High Register (PSUHR) .... A-17
A-67   Parallel Signature Update Low Register (PSULR) .... A-18
About This Book
The primary objective of this manual is to describe the functionality of the e200z760 embedded
microprocessor core for software and hardware developers. This book is intended as a companion to the
EREF: A Programmer's Reference Manual for Freescale Embedded Processors (hereafter referred to as
the EREF).
Users of prior implementations of the e200 core family, such as the e200z6, may notice new terminology
employed throughout this manual. In 2004, most of Freescale’s Embedded Implementation Standards
(EIS) were contributed to help launch Power.org whose mission was to develop, enable, and promote
technology originally conceived as the PowerPC architecture. References to “PowerPC” are replaced with
“Power ISA (Instruction Set Architecture) embedded category.” The term “Auxiliary Processing Unit
(APU)” is used to describe a collection of functionality within the EIS. These APUs were either absorbed
into various parts of the new Power ISA or retained their identity and became known as individual, and
sometimes optional, “categories” or “subcategories” of the Power ISA.
This document includes four levels of architectural and implementation definition, as follows:
• Power ISA embedded category—defines a set of user-level instructions and registers that are a part
of the Power ISA.
• e200 implementation details—In some cases, the Power ISA definition provides a general
framework, leaving specific details up to the implementation. Some of these details are common
to all members of the e200 core family and may be indicated as such.
• e200z7 implementation details—The next level of architectural specificity describes those features
that are shared across the cores in the e200z7 sub-family but that may not be present in the other
members of the e200 product line.
• e200z760n3 implementation details—The e200z7 subfamily includes one or more specific cores
with unique combinations of functionality. Each processor core in the e200z7 product line defines
instructions, registers, register fields, and other aspects that are more detailed than the architectural
layers described above. When features are implemented differently between the varieties of e200z7
cores, they are specifically noted as such.
As with any technical documentation, it is the readers’ responsibility to be sure they are using the most
recent version of the documentation.
Audience
It is assumed that the reader understands operating systems, microprocessor system design, and the basic
principles of RISC processing.
Organization
Following is a summary and a brief description of the major parts of this reference manual:
• Chapter 1, “e200z7 Core Complex Overview,” provides a general description of e200z760
functionality.
• Chapter 2, “Register Model,” is useful for software engineers who need to understand the
programming model for the three programming environments and the functionality of each
register.
• Chapter 3, “Instruction Model,” provides an overview of the addressing modes and a description
of the instructions. Instructions are organized by function.
• Chapter 4, “Instruction Pipeline and Execution Timing,” describes how instructions are fetched,
decoded, issued, executed, and completed, and how instruction results are presented to the
processor and memory system. Tables are provided that indicate latency and throughput for each
of the instructions supported by the e200z7.
• Chapter 5, “Embedded Floating-Point Unit,” describes the instruction set architecture of the
Embedded Floating-point (EFPU) implemented on the e200z7. This unit implements scalar and
vector single-precision floating-point instructions to accelerate signal processing and other
algorithms. The e200z760n3 implements version 2 of the embedded floating-point unit (EFPU2).
• Chapter 6, “Signal Processing Extension (SPE),” describes the instruction set architecture of the SPE,
which implements instructions to accelerate signal processing and other algorithms.
• Chapter 7, “Interrupts and Exceptions,” describes how the e200z7 implements the interrupt model
as it is defined by the Book E architecture.
• Chapter 9, “L1 Cache,” describes the organization of the on-chip L1 Caches, cache control
instructions, and various cache operations.
• Chapter 10, “Memory Management Unit,” provides specific hardware and software details
regarding the e200z7 MMU implementation.
• Chapter 11, “External Core Complex Interfaces,” describes those aspects of the CCB that are
configurable or that provide status information through the programming interface. It provides a
glossary of signals mentioned throughout the book to offer a clearer understanding of how the core
is integrated as part of a larger device.
• Chapter 12, “Power Management,” describes the power management facilities as they are defined
and implemented in the e200z7 core.
• Chapter 13, “Debug Support,” describes the internal debug facilities as they are implemented in
the e200z760 core.
• Chapter 14, “Nexus 3 Module,” describes the Nexus3 module, which provides real-time
development capabilities for e200z760 processors in compliance with the IEEE-ISTO Nexus
5001-2008 standard.
• Appendix A, “Register Summary,” compiles the register figures for this manual.
• Appendix B, “Revision History,” contains a revision history for this manual.
Suggested Reading
This section lists additional reading that provides background for the information in this manual as well as
general information about the architecture.
General Information
The following documentation provides useful information about Power Architecture® technology and
computer architecture in general:
• Power ISA™ Version 2.05, by Power.org™, 2007, available at the Power.org website.
• PowerPC Architecture Book, by Brad Frey, IBM, 2005, available at the IBM website.
• Computer Architecture: A Quantitative Approach, Fourth Edition, by John L. Hennessy and David
A. Patterson, Morgan Kaufmann Publishers, 2006.
• Computer Organization and Design: The Hardware/Software Interface, Third Edition, by David
A. Patterson and John L. Hennessy, Morgan Kaufmann Publishers, 2007.
Freescale documentation is available from the sources listed on the back cover of this manual; the
document order numbers are included in parentheses for ease in ordering:
• EREF: A Programmer's Reference Manual for Freescale Embedded Processors (EREFRM).
Describes the programming, memory management, cache, and interrupt models defined by the
Power ISA™ for embedded environment processors.
• Power ISA™. The latest version of the Power instruction set architecture can be downloaded from
the website www.power.org.
• Category-specific programming environments manuals. These books describe the three major
extensions to the embedded environment of the Power ISA. These include the
following:
— AltiVec™ Technology Programming Environments Manual (ALTIVECPEM)
— Signal Processing Engine (SPE) Programming Environments Manual: A Supplement to the
EREF (SPEPEM)
— Variable-Length Encoding (VLE) Programming Environments Manual: A Supplement to the
EREF (VLEPEM)
• Core reference manuals—These books describe the features and behavior of individual
microprocessor cores and provide specific information about how functionality described in the
EREF is implemented by a particular core. They also describe implementation-specific features
and microarchitectural details, such as instruction timing and cache hardware details, that lie
outside the architecture specification.
• Integrated device reference manuals—These manuals describe the features and behavior of
integrated devices that implement and utilize a Power ISA processor core.
• Addenda/errata to reference manuals—When processors have follow-on parts, often an addendum
is provided that describes the additional features and functionality changes. These addenda are
intended for use with the corresponding reference manuals.
• Hardware specifications—Hardware specifications provide specific data regarding bus timing,
signal behavior, and AC, DC, and thermal characteristics, as well as other design considerations.
• Technical summaries—Each device has a technical summary that provides an overview of its
features. This document is roughly equivalent to the overview (Chapter 1) of an
implementation’s reference manual.
• Application notes—These short documents address specific design issues useful to programmers
and engineers working with Freescale processors.
Additional literature is published as new processors become available. For a current list of documentation,
refer to http://www.freescale.com.
Acronyms and Abbreviations
Table i contains acronyms and abbreviations that are used in this document. Note that the meanings for
some acronyms (such as XER) are historical, and the words for which an acronym stands may not be
intuitively obvious.
Table i. Acronyms and Abbreviated Terms
Term    Meaning
CR      Condition register
CTR     Count register
DCR     Data control register
DTLB    Data translation lookaside buffer
EA      Effective address
ECC     Error checking and correction
FPR     Floating-point register
GPR     General-purpose register
IEEE    Institute of Electrical and Electronics Engineers
LR      Link register
LRU     Least recently used
LSB     Least-significant byte
lsb     Least-significant bit
MMU     Memory management unit
MSB     Most-significant byte
msb     Most-significant bit
MSR     Machine state register
NaN     Not a number
No-op   No operation
PTE     Page table entry
PVR     Processor version register
RISC    Reduced instruction set computing
RTL     Register transfer language
SIMM    Signed immediate value
SPR     Special-purpose register
SRR0    Machine status save/restore register 0
SRR1    Machine status save/restore register 1
TB      Time base facility
TBL     Time base lower register
TBU     Time base upper register
TLB     Translation lookaside buffer
UIMM    Unsigned immediate value
UISA    User instruction set architecture
VA      Virtual address
VLE     Variable-length encoding
XER     Register used for indicating conditions such as carries and overflows for integer operations
Terminology Conventions
Table ii lists certain terms used in this manual that differ from the architecture terminology conventions.
Table ii. Terminology Conventions
The Architecture Specification          This Manual
Extended mnemonics                      Simplified mnemonics
Fixed-point unit (FXU)                  Integer unit (IU)
Privileged mode (or privileged state)   Supervisor-level privilege
Problem mode (or problem state)         User-level privilege
Real address                            Physical address
Relocation                              Translation
Storage (locations)                     Memory
Storage (the act of)                    Access
Store in                                Write back
Store through                           Write through
Table iii describes instruction field notation conventions used in this manual.
Table iii. Instruction Field Conventions
The Architecture Specification   Equivalent to:
BA, BB, BT                       crbA, crbB, crbD (respectively)
BF, BFA                          crfD, crfS (respectively)
D                                d
DS                               ds
/, //, ///                       0...0 (shaded)
RA, RB, RT, RS                   rA, rB, rD, rS (respectively)
SI                               SIMM
U                                IMM
UI                               UIMM
Chapter 1
e200z7 Core Complex Overview
This chapter provides an overview of the e200z7 microprocessor core built on Power Architecture®
technology for embedded processors. It includes the following:
• A summary of the feature set for this core
• An overview of the register set
• An overview of the instruction set
• An overview of interrupts and exception handling
• A summary of instruction pipeline and flow
• A description of the memory management architecture
• High-level details of the core memory and coherency model
1.1
e200z7 Overview
The e200z7 processor core is a low-cost implementation of Power Architecture technology for embedded
processors. It is a dual-issue, 32-bit, Power ISA-compliant design with 64-bit, general-purpose registers
(GPRs).
In addition to the base Power ISA embedded category instruction set, the e200z7 also implements the
variable-length encoding (VLE) category, providing improved code density. See the EREF and
supplementary VLE PEM for more information about the VLE extension.
Instructions of the signal processing extension (SPE) category, as well as of the embedded vector and
scalar floating-point categories, are provided to support real-time integer and single-precision embedded
floating-point operations using the GPRs. The e200z7 does not support Power ISA floating-point
instructions in hardware, but traps them so they can be emulated by software.
All arithmetic instructions that execute in the core operate on data in the GPRs, which have been extended
to 64 bits to support vector instructions defined by the SPE and embedded vector floating-point categories.
These instructions operate on a vector pair of 16- or 32-bit data types and deliver vector and scalar results.
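The vector-pair view of a 64-bit GPR can be illustrated with a small host-side C model. This is only a sketch under stated assumptions: the helper names (pack_pair, upper_word, lower_word, vec_add_word) are hypothetical, and the element-wise modulo add stands in for the general style of an SPE vector add on 32-bit elements; it is not a definition of any particular instruction.

```c
#include <stdint.h>

/* Hypothetical model: one 64-bit GPR viewed as a pair of 32-bit elements.
 * "hi" is the upper word and "lo" is the lower word of the register. */
static uint64_t pack_pair(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

static uint32_t upper_word(uint64_t gpr) { return (uint32_t)(gpr >> 32); }
static uint32_t lower_word(uint64_t gpr) { return (uint32_t)gpr; }

/* Element-wise add: each 32-bit element is summed independently and wraps
 * modulo 2^32, so a carry out of the lower word never reaches the upper word. */
static uint64_t vec_add_word(uint64_t ra, uint64_t rb)
{
    uint32_t hi = upper_word(ra) + upper_word(rb);
    uint32_t lo = lower_word(ra) + lower_word(rb);
    return pack_pair(hi, lo);
}
```

In this model, adding the pairs (1, 0xFFFFFFFF) and (2, 1) gives (3, 0): the lower element wraps without disturbing the upper element, which is the property that distinguishes a vector-pair add from a plain 64-bit add.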
The e200z7 contains a 16-KB instruction cache, a 16-KB data cache, as well as a memory management
unit. A Nexus Class 3+ module is also integrated.
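The cache sizes above, together with the 4-way set-associative organization listed among the key features, determine the number of sets once a line size is known. The following C sketch shows the arithmetic. The 32-byte line size used in the usage note is an assumption for illustration only (the line size is not stated in this overview), and the structure and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical description of one L1 cache. */
struct cache_geometry {
    uint32_t size_bytes;  /* total capacity, for example 16 KB */
    uint32_t ways;        /* associativity, for example 4 */
    uint32_t line_bytes;  /* line size; an assumed value, not given in this overview */
};

/* Number of sets = capacity / (ways * line size). */
static uint32_t cache_num_sets(const struct cache_geometry *g)
{
    return g->size_bytes / (g->ways * g->line_bytes);
}

/* Set selected by an address: the line number modulo the number of sets. */
static uint32_t cache_set_index(const struct cache_geometry *g, uint32_t addr)
{
    return (addr / g->line_bytes) % cache_num_sets(g);
}
```

With a 16-KB, 4-way cache and an assumed 32-byte line, cache_num_sets yields 128, so addresses that differ by a multiple of 4 KB map to the same set.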
Figure 1-1 shows a high-level block diagram of the e200z7 core.
Figure 1-1. e200z7 Block Diagram
The diagram shows the following blocks:
• Instruction/control unit: fetch unit (program counter, 10-instruction buffer, two/four instructions
per fetch), 32-entry branch target buffer, branch processing unit, and dual-instruction, in-order dispatch
• Pipeline: 3-cycle fetch stage, decode stage, EA calc, execute stage (four execute stages with overlapped
execution and feed forwarding), and write-back stage with dual-instruction, in-order write back
• Registers: 32 GPRs (64-bit), CR, XER, LR, CTR, and SPRs
• Execution units: integer unit, load/store unit, branch unit, SPE unit, embedded scalar FPU, and
embedded vector FPU
• Unified memory unit: software-managed L1 unified MMU with a 64-entry, fully associative TLB,
MAS registers, and 1 KB–4 GB page sizes
• Instruction bus interface unit and data bus interface unit: each with a 32-bit address bus, a 64-bit
data bus, and N control signals
• Additional features: OnCE/Nexus 1/Nexus 3 control logic, AMBA AHB-Lite bus, SPE (SIMD), VLE,
embedded scalar/vector floating-point, power management, time base/decrementer counter, clock
multiplier, and an optional extension block
1.1.1
Features
Key features of the e200z7 are summarized as follows:
• Dual-issue, 32-bit Power ISA–compliant core
• Implementation of the VLE category for reduced code footprint
• In-order execution and retirement
• Precise exception handling
• Branch processing unit (BPU)
— Dedicated branch address calculation adder
— Branch target prefetching using a branch target buffer (BTB)
— Return address stack
• Load/store unit (LSU)
— Three-cycle load latency
— Fully pipelined
— Big- and little-endian support
— Misaligned access support
— AMBA (advanced microcontroller bus architecture) AHB-Lite (advanced high-performance
bus) 64-bit system bus
• Memory management unit (MMU) with 64-entry, fully associative TLB and multiple page-size
support
• 16-KB, 4-way set-associative Harvard instruction and data caches
• SPE unit supporting SIMD fixed-point and single-precision floating-point operations, using the
64-bit GPR file
• Embedded floating-point unit (EFPU) supporting scalar single-precision floating-point operations
• Performance management unit (PMU) supporting execution profiling
• Nexus Class 3+ real-time development unit
• Power management
— Low-power design: extensive clock gating
— Power-saving modes: doze, nap, sleep, and wait
— Dynamic power management of execution units, caches, and MMUs
• e200z7-specific debug interrupt
• Testability
— Synthesizable, full MuxD scan design
— Built-in parallel signature unit
1.2
Programming Model
This section describes the register model, instruction model, and the interrupt model as they are defined
by the Power ISA, Freescale EIS, and the e200z7 implementation.
1.2.1
Register Set
Figure 1-2 through Figure 1-5 show the complete e200z7 register set divided into supervisor and user-level
registers and grouped into general-purpose registers (GPRs), special-purpose registers (SPRs), device
control registers (DCRs), and any performance monitor registers (PMRs) that may be implemented in a
particular variation of the e200z7 core family. The number given for each special-purpose register
(SPR) in these figures is the decimal number used in the instruction syntax to access the register. For
example, the integer exception register (XER) is SPR 1.
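To show how the decimal SPR number ends up in an instruction, the following C sketch builds the 32-bit instruction words for mfspr and mtspr. The field layout (primary opcode 31, extended opcodes 339 and 467, and the 10-bit SPR number stored with its two 5-bit halves swapped) is taken from the general Power ISA encoding rather than from this section, so treat it as an assumption to be checked against the EREF. The function names are hypothetical.

```c
#include <stdint.h>

/* SPR numbers used in this sketch, as given in the register figures. */
enum { SPR_XER = 1, SPR_LR = 8, SPR_CTR = 9 };

/* The 10-bit SPR number is split into two 5-bit halves that are stored
 * swapped: the low half lands in the higher-order bit field. */
static uint32_t spr_field(uint32_t spr)
{
    return ((spr & 0x1Fu) << 16) | (((spr >> 5) & 0x1Fu) << 11);
}

/* mfspr rD,SPR: primary opcode 31, extended opcode 339. */
static uint32_t encode_mfspr(uint32_t rd, uint32_t spr)
{
    return (31u << 26) | ((rd & 0x1Fu) << 21) | spr_field(spr) | (339u << 1);
}

/* mtspr SPR,rS: primary opcode 31, extended opcode 467. */
static uint32_t encode_mtspr(uint32_t spr, uint32_t rs)
{
    return (31u << 26) | ((rs & 0x1Fu) << 21) | spr_field(spr) | (467u << 1);
}
```

For example, encode_mfspr(3, SPR_XER) produces 0x7C6102A6, the word an assembler would be expected to emit for "mfspr r3,1". The swapped halves matter only for SPR numbers above 31, such as the implementation-specific registers numbered in the 500s and 1000s.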
Figure 1-2 shows the supervisor mode programmer’s model.
General Registers
  CR; CTR (SPR 9); LR (SPR 8); XER (SPR 1); GPR0–GPR31; ACC
Processor Control Registers
  Machine state: MSR
  Processor version: PVR (SPR 287)
  Processor ID: PIR (SPR 286)
  System version (note 1): SVR (SPR 1023)
  Hardware implementation dependent (note 1): HID0 (SPR 1008); HID1 (SPR 1009)
Exception Handling/Control Registers
  SPR general: SPRG0–SPRG7 (SPR 272–279); SPRG8 (SPR 604); SPRG9 (SPR 605)
  User SPR: USPRG0 (SPR 256)
  Save and restore: SRR0 (SPR 26); SRR1 (SPR 27); CSRR0 (SPR 58); CSRR1 (SPR 59);
    DSRR0 (SPR 574, note 1); DSRR1 (SPR 575, note 1); MCSRR0 (SPR 570, note 1); MCSRR1 (SPR 571, note 1)
  Interrupt vector: IVPR (SPR 63); IVOR0–IVOR15 (SPR 400–415); IVOR32–IVOR35 (SPR 528–531, note 1)
  Exception syndrome register: ESR (SPR 62)
  Machine check syndrome register: MCSR (SPR 572)
  Machine check address register: MCAR (SPR 573)
  Data exception address register: DEAR (SPR 61)
Timers
  Time base (write only): TBL (SPR 284); TBU (SPR 285)
  Decrementer: DEC (SPR 22); DECAR (SPR 54)
  Control and status: TCR (SPR 340); TSR (SPR 336)
BTB Register
  BTB control (note 1): BUCSR (SPR 1013)
SPE/EFPU Registers
  SPE/EFPU status and control register: SPEFSCR (SPR 512)
Debug Registers
  Debug control: DBCR0–DBCR2 (SPR 308–310); DBCR3 (SPR 561, note 1); DBCR4 (SPR 563, note 1);
    DBCR5 (SPR 564, note 1); DBCR6 (SPR 603, note 1); DBERC0 (SPR 569, note 1);
    DEVENT (SPR 975, note 1); DDAM (SPR 576, note 1)
  Debug status: DBSR (SPR 304)
  Debug counter: DBCNT (SPR 562, note 1)
  Instruction address compare: IAC1–IAC4 (SPR 312–315); IAC5–IAC8 (SPR 565–568)
  Data address compare: DAC1 (SPR 316); DAC2 (SPR 317)
  Data value compare: DVC1 (SPR 318); DVC2 (SPR 319)
Memory Management Registers
  MMU assist (note 1): MAS0–MAS4 (SPR 624–628); MAS6 (SPR 630)
  Process ID: PID0 (SPR 48)
  Control and configuration: MMUCSR0 (SPR 1012); MMUCFG (SPR 1015); TLB0CFG (SPR 688); TLB1CFG (SPR 689)
Cache Registers
  Cache control (note 1): L1CSR0 (SPR 1010); L1CSR1 (SPR 1011); L1FINV0 (SPR 1016); L1FINV1 (SPR 959)
  Cache configuration (read only): L1CFG0 (SPR 515); L1CFG1 (SPR 516)
Notes:
1 - These e200-specific registers may not be supported by other Power Architecture processors
2 - Optional registers defined by the Power ISA embedded architecture
3 - Read-only registers
Figure 1-2. e200z760 Supervisor Mode Programmer’s Model
Figure 1-3 shows the DCRs and PMRs of the supervisor mode programmer’s model.
Performance Monitor Registers (note 1)
  Control: PMGC0 (PMR 400); PMLCa0–PMLCa3 (PMR 144–147); PMLCb0–PMLCb3 (PMR 272–275)
  Counters: PMC0–PMC3 (PMR 16–19)
  User control (read-only): UPMGC0 (PMR 384); UPMLCa0–UPMLCa3 (PMR 128–131);
    UPMLCb0–UPMLCb3 (PMR 256–259)
  User counters (read-only): UPMC0–UPMC3 (PMR 0–3)
PSU Registers (note 1)
  PSCR (DCR 272); PSSR (DCR 273); PSHR (DCR 274); PSLR (DCR 275); PSCTR (DCR 276);
  PSUHR (DCR 277); PSULR (DCR 278)
Cache Access Registers (note 1)
  CDACNTL (DCR 351); CDADATA (DCR 350)
Note:
1) These e200-specific registers may not be supported by other Power ISA embedded category processors
Figure 1-3. e200z760 Supervisor Mode Programmer’s Model DCRs and PMRs
Figure 1-4 shows the user mode programmer’s model.
General Registers
  CR; CTR (SPR 9); LR (SPR 8); XER (SPR 1); GPR0–GPR31; ACC
Timers (Read-Only)
  Time base: TBL (SPR 268); TBU (SPR 269)
Control Registers
  SPR general (read-only): SPRG4–SPRG7 (SPR 260–263)
  User SPR: USPRG0 (SPR 256)
Cache Register (Read-Only)
  Cache configuration: L1CFG0 (SPR 515); L1CFG1 (SPR 516)
Category Registers
  SPE status and control register: SPEFSCR (SPR 512)
Debug
  DEVENT (SPR 975); DDAM (SPR 576)
Figure 1-4. e200z7 User Mode Programmer’s Model
Figure 1-5 shows the user mode programmer’s model PMRs.
Performance Monitor Registers
  User control (read-only): UPMGC0 (PMR 384); UPMLCa0–UPMLCa3 (PMR 128–131);
    UPMLCb0–UPMLCb3 (PMR 256–259)
  User counters (read-only): UPMC0–UPMC3 (PMR 0–3)
Note:
These e200-specific registers may not be supported by other Power ISA embedded category processors.
Figure 1-5. e200 User Mode Programmer’s Model PMRs
The GPRs are accessed through instruction operands. Access to other registers can be explicit (by using
instructions for that purpose such as the Move To Special Purpose Register (mtspr) and Move From
Special Purpose Register (mfspr) instructions) or implicit as part of the execution of an instruction. Some
registers are accessed both explicitly and implicitly.
For more information about the registers, see Chapter 2, “Register Model.”
1.2.2
Instruction Set
The e200z7 supports the following architectural extensions: VLE, ISEL, debug, machine check, wait,
SPE, cache line locking, and enhanced reservations.
The e200z7 implements the following instructions:
• The Power ISA instruction set for 32-bit embedded implementations. This is composed primarily
of the user-level instructions defined by the user instruction set architecture (UISA). The e200z7
does not include the Power ISA floating-point, load string, or store string instructions.
• The e200z7 supports the following EIS-defined instructions:
— Integer select category. This category consists of the Integer Select instruction (isel), which
functions as an if-then-else statement that selects between two source registers by comparison
to a CR bit. This instruction eliminates conditional branches, takes fewer clock cycles than the
equivalent coding, and reduces the code footprint.
— Cache line lock and unlock category. The cache block lock and unlock category consists of the
instructions described in Table 1-1, which defines a set of instructions for locking and clearing
cache lines.
Table 1-1. Cache Block Lock and Unlock Instructions
Name                                             Mnemonic   Syntax
Data Cache Block Lock Clear                      dcblc      CT,rA,rB
Data Cache Block Touch and Lock Set              dcbtls     CT,rA,rB
Data Cache Block Touch for Store and Lock Set    dcbtstls   CT,rA,rB
Instruction Cache Block Lock Clear               icblc      CT,rA,rB
Instruction Cache Block Touch and Lock Set       icbtls     CT,rA,rB
— Debug category. This category defines the Return from Debug Interrupt instruction (rfdi),
which defines a separate set of interrupt save and restore registers to provide greater
responsiveness for debug interrupts.
— SPE vector category. New vector instructions are defined that view the 64-bit GPRs as being
composed of a vector of two 32-bit elements (some of the instructions also read or write 16-bit
elements). Some scalar instructions are defined for DSP that produce a 64-bit scalar result.
— The embedded floating-point categories provide single-precision scalar and vector
floating-point instructions. Scalar floating-point instructions use only the lower 32 bits of the
GPRs for single-precision floating-point calculations. Table 1-2 lists embedded floating-point
instructions.
— Wait category. This category consists of the wait instruction that allows software to cease all
synchronous activity and wait for an asynchronous interrupt to occur.
— Machine check category. This feature set adds two new instructions (rfmci, se_rfmci) and four
new registers (MCSRR0, MCSRR1, MCSR, MCAR).
— Volatile Context Save/Restore category supports the capability to quickly save and restore
volatile register context on entry into an interrupt handler.
Table 1-2. Scalar and Vector Embedded Floating-Point Instructions
Instruction                                                          Scalar     Vector        Syntax
Convert Floating-Point from Signed Fraction                          efscfsf    evfscfsf      rD,rB
Convert Floating-Point from Signed Integer                           efscfsi    evfscfsi      rD,rB
Convert Floating-Point from Unsigned Fraction                        efscfuf    evfscfuf      rD,rB
Convert Floating-Point from Unsigned Integer                         efscfui    evfscfui      rD,rB
Convert Floating-Point Single-Precision from Half-Precision          efscfh     evfscfh       rD,rB
Convert Floating-Point Single-Precision to Half-Precision            efscth     evfscth       rD,rB
Convert Floating-Point to Signed Fraction                            efsctsf    evfsctsf      rD,rB
Convert Floating-Point to Signed Integer                             efsctsi    evfsctsi      rD,rB
Convert Floating-Point to Signed Integer with Round Toward Zero      efsctsiz   evfsctsiz     rD,rB
Convert Floating-Point to Unsigned Fraction                          efsctuf    evfsctuf      rD,rB
Convert Floating-Point to Unsigned Integer                           efsctui    evfsctui      rD,rB
Convert Floating-Point to Unsigned Integer with Round Toward Zero    efsctuiz   evfsctuiz     rD,rB
Floating-Point Absolute Value                                        efsabs     evfsabs       rD,rA
Floating-Point Add                                                   efsadd     evfsadd       rD,rA,rB
Floating-Point Compare Equal                                         efscmpeq   evfscmpeq     crD,rA,rB
Floating-Point Compare Greater Than                                  efscmpgt   evfscmpgt     crD,rA,rB
Floating-Point Compare Less Than                                     efscmplt   evfscmplt     crD,rA,rB
Floating-Point Divide                                                efsdiv     evfsdiv       rD,rA,rB
Floating-Point Multiply                                              efsmul     evfsmul       rD,rA,rB
Floating-Point Negate                                                efsneg     evfsneg       rD,rA
Floating-Point Negative Absolute Value                               efsnabs    evfsnabs      rD,rA
Floating-Point Subtract                                              efssub     evfssub       rD,rA,rB
Floating-Point Test Equal                                            efststeq   evfststeq     crD,rA,rB
Floating-Point Test Greater Than                                     efststgt   evfststgt     crD,rA,rB
Floating-Point Test Less Than                                        efststlt   evfststlt     crD,rA,rB
Floating-Point Single-Precision Maximum                              efsmax     evfsmax       rD,rA,rB
Floating-Point Single-Precision Minimum                              efsmin     evfsmin       rD,rA,rB
Floating-Point Single-Precision Multiply-Add                         efsmadd    evfsmadd      rD,rA,rB
Floating-Point Single-Precision Negative Multiply-Add                efsnmadd   evfsnmadd     rD,rA,rB
Floating-Point Single-Precision Multiply-Subtract                    efsmsub    evfsmsub      rD,rA,rB
Floating-Point Single-Precision Negative Multiply-Subtract           efsnmsub   evfsnmsub     rD,rA,rB
Floating-Point Single-Precision Square Root                          efssqrt    evfssqrt      rD,rA
Vector Floating-Point Single-Precision Add / Subtract                -          evfsaddsub    rD,rA,rB
Vector Floating-Point Single-Precision Add / Subtract Exchanged      -          evfsaddsubx   rD,rA,rB
Vector Floating-Point Single-Precision Add Exchanged                 -          evfsaddx      rD,rA,rB
Vector Floating-Point Single-Precision Difference / Sum              -          evfsdiffsum   rD,rA,rB
Vector Floating-Point Single-Precision Differences                   -          evfsdiff      rD,rA,rB
Vector Floating-Point Single-Precision Multiply By Even Element      -          evfsmule      rD,rA,rB
Vector Floating-Point Single-Precision Multiply By Odd Element       -          evfsmulo      rD,rA,rB
Vector Floating-Point Single-Precision Multiply Exchanged            -          evfsmulx      rD,rA,rB
Vector Floating-Point Single-Precision Subtract / Add Exchanged      -          evfssubaddx   rD,rA,rB
Vector Floating-Point Single-Precision Subtract Exchanged            -          evfssubx      rD,rA,rB
Vector Floating-Point Single-Precision Subtract / Add                -          evfssubadd    rD,rA,rB
Vector Floating-Point Single-Precision Sum / Difference              -          evfssumdiff   rD,rA,rB
Vector Floating-Point Single-Precision Sums                          -          evfssum       rD,rA,rB
For more information about the instruction set, see Chapter 3, “Instruction Model.”
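The Integer Select instruction described earlier in this section can be pictured as a branch-free conditional move. The following is a minimal C sketch of that behavior, not the architectural definition: the function names (cr_bit, isel_model) are hypothetical, CR bits are numbered with bit 0 as the most significant bit, and the special case in which an rA field of 0 supplies the constant zero is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return CR bit 'crb' (0..31), where bit 0 is the most significant bit. */
static inline bool cr_bit(uint32_t cr, unsigned crb)
{
    return ((cr >> (31u - crb)) & 1u) != 0u;
}

/* Illustrative model of isel: the result is the rA value when the
 * selected CR bit is set, otherwise the rB value. No branch is needed
 * in the instruction stream. */
static inline uint32_t isel_model(uint32_t ra_value, uint32_t rb_value,
                                  uint32_t cr, unsigned crb)
{
    return cr_bit(cr, crb) ? ra_value : rb_value;
}
```

For example, with CR0[GT] (CR bit 1) set, isel_model(5, 9, cr, 1) selects the first operand and isel_model(5, 9, cr, 0) selects the second.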
1.2.2.1
VLE Category
This section describes the extensions to the architecture to support VLE.
• rfci, rfdi, rfi do not mask bit 62 of CSRR0, DSRR0, or SRR0. The destination address is
[D,C]SRR0[32–62] || 0b0.
• bclr, bclrl, bcctr, bcctrl do not mask bit 62 of the LR or CTR. The destination address is
[LR, CTR][32–62] || 0b0.
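The address formation in the two bullets above amounts to clearing only the least significant bit of the source register. The sketch below illustrates this in C under the assumption that the 32-bit register value maps architectural bits 32–63 onto C bits 31–0; the function names are hypothetical. The second function shows, for contrast only, the word-aligned target that results when both low-order bits are cleared, as in the base (non-VLE) architecture.

```c
#include <stdint.h>

/* VLE behavior described above: destination = reg[32-62] || 0b0,
 * so only architectural bit 63 (C bit 0) is cleared and the target
 * may be halfword aligned. */
static inline uint32_t vle_branch_target(uint32_t lr_ctr_or_srr0)
{
    return lr_ctr_or_srr0 & ~UINT32_C(1);
}

/* Contrast: a word-aligned target clears bits 62 and 63. */
static inline uint32_t word_aligned_target(uint32_t lr_ctr_or_srr0)
{
    return lr_ctr_or_srr0 & ~UINT32_C(3);
}
```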
1.2.3
Interrupts and Exception Handling
The core supports an extended exception handling model, with nested interrupt capability and extensive
interrupt vector programmability. The following sections define the interrupt model, including an
overview of interrupt handling as implemented on the e200z7 core, a brief description of the interrupt
classes, and an overview of the registers involved in the processes.
For more information about interrupts and exception handling, see Chapter 7, “Interrupts and Exceptions.”
1.2.3.1
Interrupt Handling
In general, interrupt processing begins with an exception that occurs due to external conditions, errors, or
program execution problems. When an exception occurs, the processor checks whether interrupt
processing is enabled for that particular exception. If enabled, the interrupt causes the state of the processor
to be saved in the appropriate registers and prepares to begin execution of the handler located at the
associated vector address for that particular exception.
Once the handler is executing, the implementation may need to check bits in the exception syndrome
register (ESR), the machine check syndrome register (MCSR), or the signal processing and embedded
floating-point status and control register (SPEFSCR), depending on the exception type, to verify the
specific cause of the exception and take appropriate action.
The core complex supports the interrupts described in Section 1.2.3.4, “Interrupt Registers.”
1.2.3.2
Interrupt Classes
All interrupts may be categorized as asynchronous/synchronous and critical/noncritical.
• Asynchronous interrupts (such as machine check, critical input, and external interrupts) are caused
by events that are independent of instruction execution. For asynchronous interrupts, the address
reported in a save/restore register is the address of the instruction that would have executed next
had the asynchronous interrupt not occurred.
• Synchronous interrupts are those that are caused directly by the execution or attempted execution
of instructions. Synchronous interrupts are further divided into precise and imprecise types.
— Synchronous precise interrupts are those that precisely indicate the address of the instruction
causing the exception that generated the interrupt or, in some cases, the address of the
immediately following instruction. The interrupt type and status bits allow determination of
which of the two instructions has been addressed in the appropriate save/restore register.
— Synchronous imprecise interrupts are those that may indicate the address of the instruction
causing the exception that generated the interrupt, or some instruction after the instruction
causing the interrupt. If the interrupt was caused by either the context synchronizing
mechanism or the execution synchronizing mechanism, the address in the appropriate
save/restore register is the address of the interrupt-forcing instruction. If the interrupt was not
caused by either of those mechanisms, the address in the save/restore register is the last
instruction to start execution and may not have completed. No instruction following the
instruction in the save/restore register has executed.
1.2.3.3
Interrupt Types
The e200z7 core processes all interrupts as either debug, machine check, critical, or noncritical types.
Separate control and status register sets are provided for each type of interrupt. Table 1-3 describes the
interrupt types.
Table 1-3. Interrupt Types

Category: Noncritical interrupts
Description: First-level interrupts that let the processor change program flow to handle conditions
generated by external signals, errors, or unusual conditions arising from program execution or from
programmable timer-related events. These interrupts are largely identical to those defined by the OEA.
Programming Resources: SRR0/SRR1 SPRs and rfi instruction. Asynchronous noncritical interrupts can be
masked by the external interrupt enable bit, MSR[EE].

Category: Critical interrupts
Description: Critical input, watchdog timer, and debug interrupts. These interrupts can be taken during a
noncritical interrupt or during regular program flow. The critical input and watchdog timer interrupts are
treated as critical interrupts. If the debug feature is not enabled, a debug interrupt is treated as a
critical interrupt.
Programming Resources: Critical save and restore SPRs (CSRR0/CSRR1) and rfci. Critical input and
watchdog timer critical interrupts can be masked by the critical enable bit, MSR[CE]. Debug events can
be masked by the debug enable bit, MSR[DE].

Category: Machine check interrupt
Description: Provides a separate set of resources for the machine check interrupt. See Section 7.6.2,
“Machine Check Interrupt (IVOR1).”
Programming Resources: Machine check save and restore SPRs (MCSRR0/MCSRR1) and rfmci. Maskable with
the machine check enable bit, MSR[ME]. Includes the machine check syndrome register (MCSR).

Category: Debug interrupt
Description: Provides a separate set of resources for the debug interrupt. See Section 7.6.16, “Debug
Interrupt (IVOR15).”
Programming Resources: Debug save and restore SPRs (DSRR0/DSRR1) and rfdi. Can be masked by the debug
interrupt enable bit, MSR[DE]. Includes the debug status register (DBSR).
Because save/restore register pairs are serially reusable, care must be taken to preserve program state that
may be lost when an unordered interrupt is taken.
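The masking relationships listed in Table 1-3 can be summarized as a lookup from interrupt type to MSR enable bit. The following C sketch is illustrative only: the enum and function names are hypothetical, and the bit masks follow the conventional Book E MSR layout (CE = bit 46, EE = bit 48, ME = bit 51, DE = bit 54), which is an assumption here and should be checked against the MSR description in Chapter 2.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed MSR bit masks (conventional Book E layout; verify against
 * the MSR register description before relying on them). */
#define MSR_CE UINT32_C(0x00020000) /* critical enable           */
#define MSR_EE UINT32_C(0x00008000) /* external interrupt enable */
#define MSR_ME UINT32_C(0x00001000) /* machine check enable      */
#define MSR_DE UINT32_C(0x00000200) /* debug interrupt enable    */

/* Hypothetical labels for the maskable cases named in Table 1-3. */
enum irq_kind {
    IRQ_NONCRITICAL_ASYNC,    /* masked by MSR[EE] */
    IRQ_CRITICAL_INPUT_OR_WDT, /* masked by MSR[CE] */
    IRQ_MACHINE_CHECK,        /* masked by MSR[ME] */
    IRQ_DEBUG                 /* masked by MSR[DE] */
};

static inline uint32_t irq_enable_mask(enum irq_kind kind)
{
    switch (kind) {
    case IRQ_NONCRITICAL_ASYNC:     return MSR_EE;
    case IRQ_CRITICAL_INPUT_OR_WDT: return MSR_CE;
    case IRQ_MACHINE_CHECK:         return MSR_ME;
    case IRQ_DEBUG:                 return MSR_DE;
    }
    return 0u;
}

/* True when the enable bit for this interrupt kind is set in 'msr'. */
static inline bool irq_enabled(uint32_t msr, enum irq_kind kind)
{
    return (msr & irq_enable_mask(kind)) != 0u;
}
```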
1.2.3.4
Interrupt Registers
The registers associated with interrupt handling are described in Table 1-4.
Table 1-4. Interrupt Registers
Register
Description
Noncritical Interrupt Registers
SRR0
Save/restore register 0—Stores the address of the instruction causing the exception or the address of the
instruction that will execute after the rfi instruction.
SRR1
Save/restore register 1—Saves machine state on noncritical interrupts and restores machine state when an rfi
instruction is executed.
Critical Interrupt Registers
CSRR0
Critical save/restore register 0—On critical interrupts, stores either the address of the instruction causing the
exception or the address of the instruction that executes after the rfci.
CSRR1
Critical save/restore register 1—Saves machine state on critical interrupts and restores machine state when an rfci
instruction is executed.
Debug Interrupt Registers
DSRR0
Debug save/restore register 0—Used to store the address of the instruction that will execute when an rfdi
instruction is executed.
DSRR1
Debug save/restore register 1—Stores machine state on debug interrupts and restores machine state when an rfdi
instruction is executed.
Machine Check Interrupts
MCSRR0
Machine check save/restore register 0—On machine check interrupts, stores either the address of the instruction
causing the exception or the address of the instruction that executes after the rfmci instruction.
MCSRR1
Machine check save/restore register 1—Saves machine state on machine check interrupts and restores those
values when an rfmci instruction is executed.
Syndrome Registers
MCSR
Machine check syndrome register—Saves machine check syndrome information on machine check interrupts.
ESR
Exception syndrome register—Provides a syndrome to differentiate among the different kinds of exceptions that
generate the same interrupt type. Upon generation of a specific exception type, the associated bits are set and all
other bits are cleared.
SPE Interrupt Registers
SPEFSCR Signal processing and embedded floating-point status and control register—Provides interrupt control and status
as well as various condition bits associated with the operations performed by the SPE.
Other Interrupt Registers
DEAR
Data exception address register—Contains the address that was referenced by a load, store, or cache
management instruction that caused an alignment, data TLB miss, or data storage interrupt.
IVPR
IVORs
Together, IVPR[32–47] || IVORn[48–59] || 0b0000 define the address of an interrupt-processing routine. See
Table 1-5 and Chapter 7, “Interrupts and Exceptions,” for more information.
MSR
Machine state register—Defines the state of the processor. When an interrupt occurs, it is updated to preclude
unrecoverable interrupts from occurring during the initial portion of the interrupt handler.
DBSR
Debug status register—Contains status on debug events and the most recent processor reset. When debug
interrupts are enabled, a set bit in DBSR that is not MRR, VLES, or CNT1TRG causes a debug interrupt to be
generated.
Each interrupt has an associated interrupt vector address, obtained by concatenating IVPR[32–47] with the
address index in the associated IVOR (that is, IVPR[32–47] || IVORn[48–59] || 0b0000). The resulting
address is that of the instruction to be executed when that interrupt occurs. IVPR and IVOR values are
indeterminate on reset and must be initialized by the system software using mtspr. Table 1-5 lists IVOR
registers implemented on the e200z7 core and the associated interrupts.
Table 1-5. Exceptions and Conditions
IVORn      Interrupt Type
None (1)   System reset (not an interrupt)
0 (2)      Critical input
1          Machine check
2          Data storage
3          Instruction storage
4          External input
5          Alignment
6          Program
7          Floating-point unavailable
8          System call
9          APU unavailable (not used by this core)
10         Decrementer
11         Fixed-interval timer
12         Watchdog timer
13         Data TLB error
14         Instruction TLB error
15         Debug
16–31      Reserved
32         SPE unavailable
33         SPE data exception
34         SPE round exception
Notes:
1) Vector to [p_rstbase[0:29]] || 0xFFC.
2) Auto-vectored external and critical input interrupts use this IVOR. Vectored interrupts supply an interrupt vector offset directly.
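The vector address concatenation IVPR[32–47] || IVORn[48–59] || 0b0000 can be sketched in C as a pair of masks. This is a minimal illustration that assumes both SPR values have already been read into 32-bit variables, with architectural bits 32–63 mapped onto C bits 31–0; the function name is hypothetical.

```c
#include <stdint.h>

/* Form the handler address from the vector prefix and vector offset:
 *   IVPR[32-47]  -> upper 16 bits of the address
 *   IVORn[48-59] -> bits 15..4 of the address
 *   0b0000       -> the low four bits are always zero */
static inline uint32_t interrupt_vector_address(uint32_t ivpr, uint32_t ivor)
{
    return (ivpr & UINT32_C(0xFFFF0000)) | (ivor & UINT32_C(0x0000FFF0));
}
```

One consequence of this layout is that handlers are 16-byte aligned and all lie within the same 64-KB region selected by IVPR.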
1.3
Microarchitecture Summary
The e200z7 processor has a ten-stage pipeline with four stages for instruction execution. These stages
operate in an overlapped fashion, allowing single clock instruction execution for most instructions.
1. Instruction fetch 0
2. Instruction fetch 1
3. Instruction fetch 2
4. Instruction decode 0
5. Instruction decode 1/register file read/effective address calculation
6. Execute 0/memory access 0
7. Execute 1/memory access 1
8. Execute 2/memory access 2
9. Execute 3
10. Register writeback
The integer execution unit consists of a 32-bit arithmetic unit, a logic unit, a 32-bit barrel shifter, a
mask-insertion unit, a condition register manipulation unit, a count-leading-zeros unit, a 32 × 32 hardware
multiplier array, result feed-forward hardware, and support hardware for division.
Most arithmetic and logical operations are executed in a single cycle with the exception of multiply, which
is implemented with a pipelined hardware array, and the divide instructions. A count-leading-zeros unit
operates in a single clock cycle.
The instruction unit contains a program counter incrementer and a dedicated branch address adder to
minimize delays during change-of-flow operations. Sequential prefetching is performed to ensure a supply
of instructions into the execution pipeline. Branch target prefetching is performed to accelerate taken
branches. Prefetched instructions are placed into an instruction buffer.
Branch target addresses are calculated in parallel with branch instruction decode, resulting in execution
time of four clocks for correctly predicted branches. Conditional branches which are not taken execute in
a single clock. Branches with successful BTB target prefetching have an effective execution time of one
clock if correctly predicted.
Memory load and store operations are provided for byte, half-word, word (32-bit), and double-word data
with automatic zero or sign extension of byte and half-word load data as well as optional byte reversal of
data. These instructions can be pipelined to allow effective single-cycle throughput. Load and store
multiple word instructions allow low-overhead context save and restore operations. The load/store unit
(LSU) contains a dedicated effective address adder to optimize effective address generation.
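The load data transformations mentioned above (zero extension, sign extension, and byte reversal) can be modeled in a few lines of C. The sketch below is illustrative and models only the data path, not memory access or alignment; the function names are hypothetical and are loosely patterned on half-word zero-extending, half-word algebraic, and word byte-reversed loads.

```c
#include <stdint.h>

/* Zero extension: a half-word is placed in the low 16 bits and the
 * upper 16 bits of the 32-bit result are cleared. */
static inline uint32_t extend_half_zero(uint16_t half)
{
    return (uint32_t)half;
}

/* Sign extension: bit 15 of the half-word is copied into the upper
 * 16 bits of the 32-bit result. */
static inline uint32_t extend_half_signed(uint16_t half)
{
    uint32_t value = half;
    return (value & 0x8000u) ? (value | UINT32_C(0xFFFF0000)) : value;
}

/* Byte reversal of a 32-bit word (byte 0 swaps with byte 3, and
 * byte 1 swaps with byte 2). */
static inline uint32_t byte_reverse32(uint32_t word)
{
    return ((word & UINT32_C(0x000000FF)) << 24) |
           ((word & UINT32_C(0x0000FF00)) << 8)  |
           ((word >> 8) & UINT32_C(0x0000FF00))  |
           (word >> 24);
}
```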
The condition register unit supports the condition register (CR) and condition register operations defined
by the architecture. The CR consists of eight 4-bit fields that reflect the results of certain operations
generated by instructions such as move, integer and floating-point compare, arithmetic, and logical
instructions. The CR also provides a mechanism for testing and branching.
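Because the CR is organized as eight 4-bit fields, software that inspects a saved CR image typically extracts one field and tests a flag within it. The following C sketch assumes the standard Power ISA ordering, in which CR0 occupies the most significant four bits and each field holds LT, GT, EQ, and SO from most to least significant bit; the names are hypothetical.

```c
#include <stdint.h>

/* Flag positions within one 4-bit CR field (integer compare results). */
#define CR_LT 0x8u /* less than / negative    */
#define CR_GT 0x4u /* greater than / positive */
#define CR_EQ 0x2u /* equal / zero            */
#define CR_SO 0x1u /* summary overflow        */

/* Extract CR field n (0..7) from a 32-bit CR image. CR0 is the most
 * significant nibble and CR7 the least significant. */
static inline unsigned cr_field(uint32_t cr, unsigned n)
{
    return (unsigned)((cr >> (4u * (7u - (n & 7u)))) & 0xFu);
}
```

For example, (cr_field(cr, 0) & CR_EQ) != 0 tests whether the last record-form operation or compare targeting CR0 reported equality.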
Vectored and auto-vectored interrupts are supported by the CPU. Vectored interrupt support is provided to
allow multiple interrupt sources to have unique interrupt handlers invoked with no software overhead.
The SPE category supports vector instructions operating on 8-, 16-, and 32-bit integer and fractional data
types. The vector and scalar floating-point instructions operate on 32-bit IEEE Std 754™ single-precision
floating-point formats, and support single-precision floating-point operations in a pipelined fashion.
The 64-bit GPRs are used for source and destination operands for all vector instructions, and there is a
unified storage model for single-precision floating-point data types of 32 bits and the normal integer type.
The following low latency fixed-point and floating-point operations are provided:
• Add
• Subtract
• Mixed Add/subtract
• Sum
• Diff
• Min
• Max
• Multiply
• Multiply-add
• Multiply-sub
• Divide
• Square Root
• Compare
• Conversion
Most operations can be pipelined.
1.3.1
Instruction Unit Features
The e200z7 instruction unit implements the following:
• 64-bit fetch path that supports fetching of two 32-bit or up to four 16-bit VLE instructions per clock
• Instruction buffer holds up to ten 32-bit instructions
• Dedicated PC incrementer supporting instruction prefetches
• Branch processing unit with dedicated branch address adder and branch target buffer (BTB)
supporting single-cycle execution of successfully predicted branches
1.3.2
Integer Unit Features
The integer unit supports single-cycle execution of most integer instructions:
• 32-bit AU for arithmetic and comparison operations
• 32-bit LU for logical operations
• 32-bit priority encoder for count-leading-zeros function
• 32-bit single-cycle barrel shifter for static shifts and rotates
• 32-bit mask unit for data masking and insertion
• Divider logic for signed and unsigned divide in 4 to 15 clocks with minimized execution timing
(EU1 only)
• Pipelined 32 × 32 hardware multiplier array that supports 32 × 32 → 32 multiply with 3-clock
latency, 1-clock throughput
1.3.3
Load/Store Unit (LSU) Features
The e200z7 LSU supports load, store, and load multiple/store multiple instructions:
• 32-bit effective address adder for data memory address calculations
• Pipelined operation supports throughput of one load or store operation per cycle
• Dedicated 64-bit interface to memory supports saving and restoring of up to 2 registers per cycle
for load multiple and store multiple word instructions
1.3.4
L1 Cache Features
The features of the cache are as follows:
• Separate 16-KB, 4-way set-associative instruction and data caches
• Copy-back and write-through support
• Eight-entry store buffer
• Push buffer
• Line-fill buffers with critical double-word forwarding for both data loads and instruction fetches
• 32-bit address bus plus attributes and control
• Separate unidirectional 64-bit read and 64-bit write data buses
• Cache line locking
• Data cache locking control instructions: Data Cache Block Touch and Lock Set (dcbtls), Data
Cache Block Touch for Store and Lock Set (dcbtstls), and Data Cache Block Lock Clear (dcblc)
• Instruction cache locking control instructions: Instruction Cache Block Touch and Lock Set (icbtls)
and Instruction Cache Block Lock Clear (icblc)
• Way allocation
• Write allocation policies
• Tag and data parity
• Hardware cache coherency support for the data cache
• Supports multibit EDC for the instruction cache
• Correction/auto-invalidation capability for the instruction and data caches
• e200z7-specific L1 cache flush and invalidate registers (L1FINV0 and L1FINV1) support
software-based flush and invalidation control on a set and way basis
1.3.5
Memory Management Unit (MMU) Features
The MMU is an implementation of the embedded MMU category of the Power ISA, with the following
feature set:
• 32-bit effective-to-real address translation
• 8-bit process identifier (PID)
• 64-entry, fully associative TLB
• Support for multiple page sizes (1 KB to 4 GB)
• Software managed by tlbre, tlbwe, tlbsx, tlbsync, and tlbivax instructions
• Entry flush protection
1.3.6
System Bus (Core Complex Interface) Features
The features of the core complex interface are as follows:
• Independent instruction and data buses
• Advanced microcontroller bus architecture (AMBA) and advanced high-performance bus
(AHB2.v6)-Lite protocol
• 32-bit address bus, 64-bit data bus, plus attributes and control on each bus
• Instruction interface supports read transfers of 16, 32, and 64 bits.
• Data interface has separate unidirectional 64-bit read data bus and 64-bit write data bus.
• Both the instruction and data interface buses support misaligned transfers and true big- and
little-endian operating modes, and operate in a pipelined fashion
• Support for HCLK running at a slower rate than CPU clock
1.3.7
Nexus 3+ Module Features
The Nexus 3+ module provides real-time development capabilities for e200z7 processors in compliance
with the IEEE-ISTO 5001™-2008 standard. This module provides development support capabilities
without requiring the use of address and data pins for internal visibility. The ‘3+’ suffix indicates that some
Nexus Class 4 features are implemented.
A portion of the pin interface (the JTAG port) is shared with the OnCE/Nexus 1 unit. The IEEE-ISTO
5001-2008 standard defines an extensible auxiliary port, which is used in conjunction with the JTAG port
in e200z7 processors.
Chapter 2
Register Model
This section describes the registers implemented in the e200z7 core. It includes an overview of registers
defined by the Power ISA embedded category architecture and highlights any differences in how these
registers are implemented in the e200z7 core. This section also provides detailed descriptions of
e200-specific registers. Full descriptions of the architecture-defined register set are provided in the EREF.
The Power ISA embedded category architecture defines register-to-register operations for all
computational instructions. Source data for these instructions are accessed from the on-chip registers or
are provided as immediate values embedded in the opcode. The three-register instruction format allows
specification of a target register distinct from the two source registers, thus preserving the original data for
use by other instructions. Data is transferred between memory and registers with explicit load and store
instructions only.
The e200z7 extends the general-purpose registers (GPRs) to 64 bits for supporting the signal processing
engine (SPE) and embedded floating point unit (EFPU) operations. Power ISA embedded category
instructions operate on the lower 32 bits of the GPRs only, and the upper 32 bits are unaffected by these
instructions. SPE vector instructions operate on the entire 64-bit register. The SPE defines load and store
instructions for transferring 64-bit values to/from memory.
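The split between 32-bit and 64-bit views of a GPR described above can be modeled with a single 64-bit value. This C sketch is illustrative only: the type and function names are hypothetical, and "lower 32 bits" is taken to mean the least significant half of the register (architectural bits 32–63).

```c
#include <stdint.h>

/* One 64-bit general-purpose register. */
typedef struct {
    uint64_t bits;
} gpr64;

/* Result of a 32-bit (non-SPE) instruction: only the lower 32 bits
 * change; the upper 32 bits are left unaffected. */
static inline void gpr_write_low32(gpr64 *reg, uint32_t value)
{
    reg->bits = (reg->bits & UINT64_C(0xFFFFFFFF00000000)) | value;
}

/* Result of an SPE vector instruction: both 32-bit elements are written. */
static inline void gpr_write_vector(gpr64 *reg, uint32_t upper, uint32_t lower)
{
    reg->bits = ((uint64_t)upper << 32) | lower;
}

static inline uint32_t gpr_upper32(const gpr64 *reg)
{
    return (uint32_t)(reg->bits >> 32);
}

static inline uint32_t gpr_lower32(const gpr64 *reg)
{
    return (uint32_t)(reg->bits & UINT64_C(0xFFFFFFFF));
}
```

A practical consequence is that code which mixes SPE and ordinary integer instructions on the same register cannot assume the upper half is zero after a 32-bit operation.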
Figures 2-1 through 2-4 show the complete e200z7 register set divided into supervisor and user-level
registers and grouped into general-purpose registers (GPRs), special-purpose registers (SPRs), device
control registers (DCRs), and any performance monitor registers (PMRs) that may be implemented in a
particular variation of the e200z7 core family. The number to the right of each special-purpose register
(SPR) is the decimal number used in the instruction syntax to access the register. For example, the integer
exception register (XER) is SPR 1.
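To show how the decimal SPR number ends up in an instruction, the sketch below builds the 32-bit instruction words for mfspr and mtspr in C. It assumes the standard Power ISA 32-bit encoding (primary opcode 31, extended opcodes 339 and 467, with the two 5-bit halves of the 10-bit SPR number swapped in the instruction word); the function names are hypothetical, and the instruction descriptions in Chapter 3 and the EREF remain the authoritative reference.

```c
#include <stdint.h>

/* Place a 10-bit SPR number into the instruction word. The low five
 * bits of the number go in the higher-order part of the field and the
 * high five bits in the lower-order part. */
static inline uint32_t spr_field(unsigned spr)
{
    return ((uint32_t)(spr & 0x1Fu) << 16) |
           ((uint32_t)((spr >> 5) & 0x1Fu) << 11);
}

/* mfspr rD,SPR: primary opcode 31, extended opcode 339. */
static inline uint32_t encode_mfspr(unsigned rd, unsigned spr)
{
    return UINT32_C(0x7C0002A6) | ((uint32_t)(rd & 0x1Fu) << 21) | spr_field(spr);
}

/* mtspr SPR,rS: primary opcode 31, extended opcode 467. */
static inline uint32_t encode_mtspr(unsigned spr, unsigned rs)
{
    return UINT32_C(0x7C0003A6) | ((uint32_t)(rs & 0x1Fu) << 21) | spr_field(spr);
}
```

With this encoding, reading XER (SPR 1) into r3 yields the instruction word 0x7C6102A6, and reading LR (SPR 8) into r0 yields 0x7C0802A6.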
Figure 2-1 shows the supervisor mode programmer’s model.
General Registers
  Condition Register: CR
  Count: CTR (SPR 9)
  Link: LR (SPR 8)
  XER: XER (SPR 1)
  General-Purpose Registers: GPR0, GPR1, ..., GPR31
  Accumulator: ACC
Processor Control Registers
  Machine State: MSR
  Processor Version: PVR (SPR 287)
  Processor ID: PIR (SPR 286)
  System Version (1): SVR (SPR 1023)
  Hardware Implementation Dependent (1): HID0 (SPR 1008), HID1 (SPR 1009)
Exception Handling/Control Registers
  SPR General: SPRG0 (SPR 272), SPRG1 (SPR 273), SPRG2 (SPR 274), SPRG3 (SPR 275), SPRG4 (SPR 276),
    SPRG5 (SPR 277), SPRG6 (SPR 278), SPRG7 (SPR 279), SPRG8 (SPR 604), SPRG9 (SPR 605)
  User SPR: USPRG0 (SPR 256)
  Save and Restore: SRR0 (SPR 26), SRR1 (SPR 27), CSRR0 (SPR 58), CSRR1 (SPR 59), DSRR0 (1) (SPR 574),
    DSRR1 (1) (SPR 575), MCSRR0 (SPR 570), MCSRR1 (1) (SPR 571)
  Interrupt Vector: IVPR (SPR 63), IVOR0 (SPR 400), IVOR1 (SPR 401), ..., IVOR15 (SPR 415),
    IVOR32 (1) (SPR 528), ..., IVOR35 (1) (SPR 531)
  Exception Syndrome Register: ESR (SPR 62)
  Machine Check Syndrome Register: MCSR (SPR 572)
  Machine Check Address Register: MCAR (SPR 573)
  Data Exception: DEAR (SPR 61)
Timers
  Time Base (write only): TBL (SPR 284), TBU (SPR 285)
  Decrementer: DEC (SPR 22), DECAR (SPR 54)
  Control and Status: TCR (SPR 340), TSR (SPR 336)
BTB Register
  BTB Control (1): BUCSR (SPR 1013)
SPE/EFPU Registers
  SPE/EFPU Status and Control Register: SPEFSCR (SPR 512)
Debug Registers
  Debug Control: DBCR0 (SPR 308), DBCR1 (SPR 309), DBCR2 (SPR 310), DBCR3 (1) (SPR 561),
    DBCR4 (1) (SPR 563), DBCR5 (1) (SPR 564), DBCR6 (1) (SPR 603), DBERC0 (1) (SPR 569),
    DEVENT (1) (SPR 975), DDAM (1) (SPR 576)
  Debug Status: DBSR (SPR 304)
  Debug Counter: DBCNT (1) (SPR 562)
  Instruction Address Compare: IAC1 (SPR 312), IAC2 (SPR 313), IAC3 (SPR 314), IAC4 (SPR 315),
    IAC5 (SPR 565), IAC6 (SPR 566), IAC7 (SPR 567), IAC8 (SPR 568)
  Data Address Compare: DAC1 (SPR 316), DAC2 (SPR 317)
  Data Value Compare: DVC1 (SPR 318), DVC2 (SPR 319)
Memory Management Registers
  MMU Assist (1): MAS0 (SPR 624), MAS1 (SPR 625), MAS2 (SPR 626), MAS3 (SPR 627), MAS4 (SPR 628),
    MAS6 (SPR 630)
  Process ID: PID0 (SPR 48)
  Control & Configuration: MMUCSR0 (SPR 1012), MMUCFG (SPR 1015), TLB0CFG (SPR 688), TLB1CFG (SPR 689)
Cache Registers
  Cache Configuration (read only): L1CFG0 (SPR 515), L1CFG1 (SPR 516)
  Cache Control (1): L1CSR0 (SPR 1010), L1CSR1 (SPR 1011), L1FINV0 (SPR 1016), L1FINV1 (SPR 959)
Notes:
1 - These e200-specific registers may not be supported by other Power Architecture processors
2 - Optional registers defined by the Power ISA embedded architecture
3 - Read-only registers
Figure 2-1. e200z760 Supervisor Mode Programmer’s Model
Figure 2-2 shows the supervisor mode programmer’s model DCRs and PMRs.
Performance Monitor Registers (1)
  Control: PMGC0 (PMR 400), PMLCa0 (PMR 144), PMLCa1 (PMR 145), PMLCa2 (PMR 146), PMLCa3 (PMR 147),
    PMLCb0 (PMR 272), PMLCb1 (PMR 273), PMLCb2 (PMR 274), PMLCb3 (PMR 275)
  Counters: PMC0 (PMR 16), PMC1 (PMR 17), PMC2 (PMR 18), PMC3 (PMR 19)
  User Control (read-only): UPMGC0 (PMR 384), UPMLCa0 (PMR 128), UPMLCa1 (PMR 129), UPMLCa2 (PMR 130),
    UPMLCa3 (PMR 131), UPMLCb0 (PMR 256), UPMLCb1 (PMR 257), UPMLCb2 (PMR 258), UPMLCb3 (PMR 259)
  User Counters (read-only): UPMC0 (PMR 0), UPMC1 (PMR 1), UPMC2 (PMR 2), UPMC3 (PMR 3)
PSU Registers (1)
  PSCR (DCR 272), PSSR (DCR 273), PSHR (DCR 274), PSLR (DCR 275), PSCTR (DCR 276), PSUHR (DCR 277),
    PSULR (DCR 278)
Cache Access Registers (1)
  CDACNTL (DCR 351), CDADATA (DCR 350)
Note:
1) These e200-specific registers may not be supported by other Power ISA embedded category processors
Figure 2-2. e200z760 Supervisor Mode Programmer’s Model DCRs and PMRs
Figure 2-3 shows the user mode programmer’s model.
General Registers
  Condition Register: CR
  Count Register: CTR (SPR 9)
  Link Register: LR (SPR 8)
  XER: XER (SPR 1)
  General-Purpose Registers: GPR0, GPR1, ..., GPR31
  Accumulator: ACC
Timers (Read-Only)
  Time Base: TBL (SPR 268), TBU (SPR 269)
Control Registers
  SPR General (Read-Only): SPRG4 (SPR 260), SPRG5 (SPR 261), SPRG6 (SPR 262), SPRG7 (SPR 263)
  User SPR: USPRG0 (SPR 256)
Category Registers
  SPE Status and Control Register: SPEFSCR (SPR 512)
Debug
  DEVENT (SPR 975), DDAM (SPR 576)
Cache Register (Read-Only)
  Cache Configuration: L1CFG0 (SPR 515), L1CFG1 (SPR 516)
Figure 2-3. e200z7 User Mode Programmer’s Model
Figure 2-4 shows the user mode programmer’s model PMRs.
Performance Monitor Registers (see Note)
  User Control (read-only)
    UPMGC0    PMR 384
    UPMLCa0   PMR 128
    UPMLCa1   PMR 129
    UPMLCa2   PMR 130
    UPMLCa3   PMR 131
    UPMLCb0   PMR 256
    UPMLCb1   PMR 257
    UPMLCb2   PMR 258
    UPMLCb3   PMR 259
  User Counters (read-only)
    UPMC0     PMR 0
    UPMC1     PMR 1
    UPMC2     PMR 2
    UPMC3     PMR 3
Note: These e200-specific registers may not be supported by other Power ISA embedded category processors.
Figure 2-4. e200 User Mode Programmer’s Model PMRs
The GPRs are accessed through instruction operands. Access to other registers can be explicit (by using
instructions for that purpose such as Move to Special Purpose Register (mtspr) and Move from Special
Purpose Register (mfspr) instructions) or implicit as part of the execution of an instruction. Some registers
are accessed both explicitly and implicitly.
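As an illustration of explicit access, the following C sketch wraps mfspr and mtspr in macros. It assumes a GCC-compatible compiler with extended inline assembly when built for a Power Architecture target; the macro names MFSPR and MTSPR, the enum names, and the host fallback array host_spr_file are hypothetical and exist only so the sketch can also be compiled off-target. The SPR numbers are the ones listed in this chapter.

```c
#include <stdint.h>

/* SPR numbers as listed in this chapter. */
enum {
    SPR_XER    = 1,
    SPR_LR     = 8,
    SPR_CTR    = 9,
    SPR_USPRG0 = 256,
    SPR_PIR    = 286,
    SPR_PVR    = 287
};

#if defined(__GNUC__) && (defined(__powerpc__) || defined(__PPC__))
/* Target build: emit the real instructions (GCC-style inline assembly). */
#define MFSPR(spr) \
    __extension__({ uint32_t v_; \
        __asm__ volatile("mfspr %0, %1" : "=r"(v_) : "i"(spr)); v_; })
#define MTSPR(spr, val) \
    __asm__ volatile("mtspr %0, %1" : : "i"(spr), "r"((uint32_t)(val)))
#else
/* Host build (hypothetical stand-in): a plain array models the SPR file. */
static uint32_t host_spr_file[1024];
#define MFSPR(spr)      (host_spr_file[(spr)])
#define MTSPR(spr, val) ((void)(host_spr_file[(spr)] = (uint32_t)(val)))
#endif
```

The SPR number must be a compile-time constant because it is encoded in the instruction itself. Access to supervisor-level SPRs from user mode is not permitted, so such a macro is only meaningful for the registers available at the current privilege level.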
2.1
Power ISA Embedded Category Registers
The e200z7 supports most of the registers defined by the Power ISA embedded category architecture.
Notable exceptions are the floating-point registers FPR0–FPR31 and FPSCR. The e200z7 does not support
the Power ISA floating-point category functionality in hardware. The GPRs have been extended to 64 bits.
The EREF contains complete descriptions of the Power ISA embedded registers; they are described
briefly as follows:
• User-level registers—The user-level registers can be accessed by all software with either user or
supervisor privileges. They include the following:
— General-purpose registers (GPRs). The thirty-two 64-bit GPRs (GPR0–GPR31) serve as data
source or destination registers for integer instructions and provide data for generating
addresses. Power ISA embedded category instructions affect only the lower 32 bits of the
GPRs. SPE and EFPU instructions are provided which operate on the entire 64-bit register.
— Condition register (CR). The 32-bit CR consists of eight 4-bit fields, CR0–CR7, that reflect
results of certain arithmetic operations and provide a mechanism for testing and branching.
The remaining user-level registers are SPRs. Note that the Power ISA embedded category
architecture provides the mtspr and mfspr instructions for accessing SPRs.
— Integer exception register (XER). The XER indicates overflow and carries for integer
operations.
— Link register (LR). The LR provides the branch target address for the branch [conditional] to
link register (bclr, bclrl, se_blr, se_blrl) instructions, and is used to hold the address of the
instruction that follows a branch and link instruction, typically used for linking to subroutines.
— Count register (CTR). The CTR holds a loop count that can be decremented during execution
of appropriately coded branch instructions. The CTR also provides the branch target address
for the branch [conditional] to count register (bcctr, bcctrl, se_bctr, se_bctrl) instructions.
— Time base (TB). The TB facility consists of two 32-bit registers—time base upper (TBU) and
time base lower (TBL). These two registers are accessible in a read-only fashion to user-level
software.
— SPRG4–SPRG7. The Power ISA embedded category architecture defines software-use special
purpose registers (SPRGs). SPRG4–SPRG7 are accessible in a read-only fashion by user-level
software. The e200 does not allow user mode access to the SPRG3 register (defined as
implementation dependent by Power ISA).
— USPRG0. The Power ISA embedded category architecture defines user software-use special
purpose register (USPRG0). The USPRG0 is accessible in a read-write fashion by user-level
software.
• Supervisor-level registers—In addition to the registers accessible in user mode, supervisor-level
software has access to additional control and status registers used for configuration, exception
handling, and other operating system functions. The Power ISA embedded category architecture
defines the following supervisor-level registers:
— Processor control registers
– Machine state register (MSR). The MSR defines the state of the processor. The MSR can be
modified by the move to machine state register (mtmsr), system call (sc, se_sc), and return
from exception (rfi, rfci, rfdi, rfmci, se_rfi, se_rfci, se_rfdi, se_rfmci) instructions. It can
be read by the Move from Machine State Register (mfmsr) instruction. When an interrupt
occurs, the contents of the MSR are saved to one of the machine state save/restore
registers (SRR1, CSRR1, DSRR1, MCSRR1).
– Processor version register (PVR). This register is a read-only register that identifies the
version (model) and revision level of the processor.
– Processor identification register (PIR). This read/write register is provided to distinguish the
processor from other processors in the system.
— Storage control register
– Process ID register (PID, also referred to as PID0). This register is provided to indicate the
current process or task identifier. It is used by the MMU as an extension to the effective
address, and by external Nexus 2/3/4 modules for ownership trace message generation. The
Power ISA embedded category architecture allows for multiple PIDs; the e200z7
implements only one.
— Interrupt registers
– Data exception address register (DEAR). After most data storage interrupts (DSI), or on an
alignment interrupt or data TLB miss interrupt, the DEAR is set to the effective address
(EA) generated by the faulting instruction.
– SPRG0–SPRG7, USPRG0. The SPRG0–SPRG7 and USPRG0 registers are provided for
operating system use. The e200 does not allow user mode access to the SPRG3 register
(defined as implementation dependent by the Power ISA embedded category architecture).
– Exception syndrome register (ESR). The ESR register provides a syndrome to differentiate
between the different kinds of exceptions which can generate the same interrupt.
– Interrupt vector prefix register (IVPR) and the interrupt vector offset registers
(IVOR0–IVOR15, IVOR32–IVOR35). These registers together provide the address of the
interrupt handler for different classes of interrupts.
– Save/restore register 0 (SRR0). The SRR0 register is used to save machine state on a
noncritical interrupt, and contains the address of the instruction at which execution resumes
when an rfi or se_rfi instruction is executed at the end of a noncritical class interrupt handler
routine.
– Critical save/restore register 0 (CSRR0). The CSRR0 register is used to save machine state
on a critical interrupt, and contains the address of the instruction at which execution resumes
when an rfci or se_rfci instruction is executed at the end of a critical class interrupt handler
routine.
– Save/restore register 1 (SRR1). The SRR1 register is used to save machine state from the
MSR on noncritical interrupts, and to restore machine state when an rfi or se_rfi executes.
– Critical save/restore register 1 (CSRR1). The CSRR1 register is used to save machine state
from the MSR on critical interrupts, and to restore machine state when rfci or se_rfci
executes.
— Debug facility registers
– Debug control registers (DBCR0–DBCR2). These registers provide control for enabling
and configuring debug events.
– Debug status register (DBSR). This register contains debug event status.
– Instruction address compare registers (IAC1–IAC4). These registers contain addresses
and/or masks which are used to specify instruction address compare debug events.
– Data address compare registers (DAC1–DAC2). These registers contain addresses and/or
masks which are used to specify data address compare debug events.
– Data value compare registers (DVC1–DVC2). These registers contain data values which are
used to specify data value compare debug events.
— Timer registers
– Time base (TB). The TB is a 64-bit structure provided for maintaining the time of day and
operating interval timers. The TB consists of two 32-bit registers, time base upper (TBU)
and time base lower (TBL). The time base registers can be written only by supervisor-level
software, but can be read by both user and supervisor-level software.
– Decrementer register (DEC). This register is a 32-bit decrementing counter that provides a
mechanism for causing a decrementer exception after a programmable delay.
– Decrementer auto-reload (DECAR). This register is provided to support the auto-reload
feature of the decrementer.
– Timer control register (TCR). This register controls decrementer, fixed-interval timer, and
watchdog timer options.
– Timer status register (TSR). This register contains status on timer events and the most recent
watchdog timer-initiated processor reset.
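Because the time base described above is split into two 32-bit registers, software that needs a 64-bit value has to guard against TBL carrying into TBU between the two reads. The following C sketch shows the usual read-upper, read-lower, re-read-upper loop. The accessors read_tbu and read_tbl are hypothetical; on the target they would wrap mfspr for TBU (SPR 269) and TBL (SPR 268), while here a simulated counter that advances on every read stands in for the hardware.

```c
#include <stdint.h>

/* Simulated 64-bit time base (assumption for illustration only).
 * Each read returns the current half and then advances the counter. */
static uint64_t sim_time_base;

static uint32_t read_tbu(void) { return (uint32_t)(sim_time_base++ >> 32); }
static uint32_t read_tbl(void) { return (uint32_t)(sim_time_base++); }

/* Read a consistent 64-bit time base value from the two 32-bit halves. */
static uint64_t read_time_base(void)
{
    uint32_t hi, lo, hi_again;

    do {
        hi       = read_tbu();
        lo       = read_tbl();
        hi_again = read_tbu();
    } while (hi != hi_again);   /* TBL carried into TBU mid-read: retry */

    return ((uint64_t)hi << 32) | lo;
}
```

If the upper half changes between the first and third read, the lower half may belong to either value of the upper half, so the sequence is simply repeated.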
2.2
e200-Specific Special Purpose Registers
The Power ISA embedded category architecture allows implementation-specific special purpose registers.
Those incorporated in the e200 core are as follows:
• User-level registers—The user-level registers can be accessed by all software with either user or
supervisor privileges. They include the following:
— Signal processing extension/embedded floating-point unit status and control register
(SPEFSCR). The SPEFSCR contains all fixed-point and floating-point exception signal bits,
exception summary bits, exception enable bits, and rounding control bits needed for
compliance with IEEE 754. See Section 6.2.3, “SPE Status and Control Register (SPEFSCR).”
— The L1 cache configuration registers (L1CFG0, L1CFG1). These read-only registers allow
software to query the configuration of the L1 Harvard caches.
• Supervisor-level registers—The following supervisor-level registers are defined in the e200 in
addition to the Power ISA embedded category registers described above:
— Configuration registers
– Hardware implementation dependent register 0 (HID0). This register controls various
processor and system functions.
– Hardware implementation dependent register 1 (HID1). This register controls various
processor and system functions.
— Exception handling and control registers
– Machine check save/restore register 0 (MCSRR0). The MCSRR0 register is used to save
machine state on a machine check interrupt, and contains the address of the instruction at
which execution resumes when an rfmci or se_rfmci instruction is executed.
– Machine Check save/restore register 1 (MCSRR1). The MCSRR1 register is used to save
machine state from the MSR on machine check interrupts, and to restore machine state when
an rfmci or se_rfmci instruction is executed.
– Machine check syndrome register (MCSR). This register provides a syndrome to
differentiate between the different kinds of conditions which can generate a machine check.
– Machine check address register (MCAR). This register provides an address associated with
certain machine checks.
– Debug save/restore register 0 (DSRR0). When enabled, the DSRR0 register is used to save
the address of the instruction at which execution continues when an rfdi or se_rfdi
instruction executes at the end of a debug interrupt handler routine.
– Debug save/restore register 1 (DSRR1). When enabled, the DSRR1 register is used to save
machine status on debug interrupts and to restore machine status when an rfdi or se_rfdi
instruction executes.
– SPRG8 and SPRG9. The SPRG8 and SPRG9 registers are provided for operating system
use for the machine check and debug APUs.
— Debug facility registers
– Instruction address compare registers (IAC5–IAC8). These registers contain addresses
and/or masks which are used to specify instruction address compare debug events.
– Debug control registers (DBCR3–DBCR6). These registers provide control for debug
functions not described in Power ISA embedded category architecture.
– Debug external resource control register 0 (DBERC0). This register provides control for
debug functions not described in Power ISA embedded category architecture.
– Debug counter register (DBCNT). This register provides counter capability for debug
functions.
— Branch unit control and status register (BUCSR). This register controls operation of the BTB.
— Cache registers
– L1 cache configuration registers (L1CFG0, L1CFG1) are read-only registers that allow
software to query the configuration of the L1 caches.
– L1 cache control and status registers (L1CSR0, L1CSR1) control the operation of the L1
caches such as cache enabling, cache invalidation, cache locking, etc.
– L1 cache flush and invalidate registers (L1FINV0, L1FINV1) control software flushing
and invalidation of the L1 caches.
— Memory management unit registers
– MMU configuration register (MMUCFG) is a read-only register that allows software to
query the configuration of the MMU.
– MMU assist (MAS0–MAS4, MAS6) registers. These registers provide the interface to the
e200 core from the MMU.
– MMU control and status register (MMUCSR0) controls invalidation of the MMU.
– TLB configuration registers (TLB0CFG, TLB1CFG) are read-only registers that allow
software to query the configuration of the TLBs.
— System version register (SVR). This register is a read-only register that identifies the version
(model) and revision level of the system which includes the e200 processor.
NOTE
It is not guaranteed that the implementation of e200 core-specific registers
is consistent among the Power ISA embedded category processors, although
other processors may implement similar or identical registers.
All e200 SPR definitions are compliant with the Freescale EIS definitions.
2.3
e200-Specific Device Control Registers
In addition to the SPRs described above, implementations may also choose to implement one or more
device control registers (DCRs). The e200z7 implements a set of device control registers that provide a
parallel signature capability in the parallel signature unit (PSU). These registers are described in
Section 13.9, “Parallel Signature Unit.”
2.4
Special-Purpose Register Descriptions
This section describes the special-purpose registers. Each subsection contains an initial description
followed by a register figure and a table of bit field definitions.
2.4.1
Machine State Register (MSR)
A complete description of the machine state register (MSR) is included in the EREF. The MSR defines the
state of the processor. Chapter 7, “Interrupts and Exceptions,” describes how the MSR is affected when
interrupts occur. The e200 MSR is shown in Figure 2-5.
Access: Read/Write
Fields (bit positions): — (0–4), UCLE (5), SPE (6), — (7–12), WE (13), CE (14), — (15), EE (16), PR (17),
FP (18), ME (19), FE0 (20), — (21), DE (22), FE1 (23), — (24–25), IS (26), DS (27), — (28), PMM (29),
RI (30), — (31)
Reset: All zeros
Figure 2-5. Machine State Register (MSR)
The MSR bits are defined in Table 2-1.
Table 2-1. MSR Field Descriptions
Bits            Name    Description
0–4 (32–36)     —       Reserved
5 (37)          UCLE    User Cache Lock Enable
    0 Execution of the cache locking instructions in user mode (MSR[PR] = 1) is disabled; a DSI exception
      is taken instead, and ILK or DLK is set in ESR.
    1 Execution of the cache lock instructions in user mode is enabled.
6 (38)          SPE     SPE/EFPU Available
    0 Execution of SPE and EFPU vector instructions is disabled; an SPE/EFPU unavailable exception is
      taken instead, and the SPE bit is set in ESR.
    1 Execution of SPE and EFPU vector instructions is enabled.
7–12 (39–44)    —       Reserved
13 (45)         WE      Wait State (Power Management) Enable
    0 Power management is disabled.
    1 Power management is enabled. The processor can enter a power-saving mode when additional
      conditions are present. The mode chosen is determined by the DOZE, NAP, and SLEEP bits in the
      HID0 register, described in Section 2.4.11, “Hardware Implementation Dependent Register 0
      (HID0).”
14 (46)         CE      Critical Interrupt Enable
    0 Critical input and watchdog timer interrupts are disabled.
    1 Critical input and watchdog timer interrupts are enabled.
15 (47)         —       Reserved
16 (48)         EE      External Interrupt Enable
    0 External input, decrementer, and fixed-interval timer interrupts are disabled.
    1 External input, decrementer, and fixed-interval timer interrupts are enabled.
17 (49)         PR      Problem State
    0 The processor is in supervisor mode, can execute any instruction, and can access any resource (for
      example, GPRs, SPRs, MSR, etc.).
    1 The processor is in user mode, cannot execute any privileged instruction, and cannot access any
      privileged resource.
18 (50)         FP      Floating-Point Available
    0 Floating-point unit is unavailable. The processor cannot execute floating-point instructions, including
      floating-point loads, stores, and moves. (An FP unavailable interrupt is generated on attempted
      execution of floating-point instructions.)
    1 Floating-point unit is available. The processor can execute floating-point instructions.
    Note that for the e200, the floating-point unit is not supported in hardware, and an unimplemented
    operation exception is generated for attempted execution of Power ISA embedded category
    floating-point instructions when FP is set.
19 (51)         ME      Machine Check Enable
    0 Asynchronous machine check interrupts are disabled.
    1 Asynchronous machine check interrupts are enabled.
20 (52)         FE0     Floating-Point Exception Mode 0 (not used by e200)
21 (53)         —       Reserved
22 (54)         DE      Debug Interrupt Enable
    0 Debug interrupts are disabled.
    1 Debug interrupts are enabled.
23 (55)         FE1     Floating-Point Exception Mode 1 (not used by e200)
24 (56)         —       Reserved
25 (57)         —       Reserved
26 (58)         IS      Instruction Address Space
    0 The processor directs all instruction fetches to address space 0 (TS = 0 in the relevant TLB entry).
    1 The processor directs all instruction fetches to address space 1 (TS = 1 in the relevant TLB entry).
27 (59)         DS      Data Address Space
    0 The processor directs all data storage accesses to address space 0 (TS = 0 in the relevant TLB
      entry).
    1 The processor directs all data storage accesses to address space 1 (TS = 1 in the relevant TLB
      entry).
28 (60)         —       Reserved
29 (61)         PMM     Performance Monitor Mark
    System software can set PMM when a marked process is running to enable statistics to be gathered
    only during the execution of the marked process. MSR[PR] and MSR[PMM] together define a state
    that the processor (supervisor or user) and the process (marked or unmarked) may be in at any time.
    If this state matches a state for which monitoring is enabled in the performance monitor registers
    PMLCan, counting is enabled.
30 (62)         RI      Recoverable Interrupt
    This bit is provided for software use to detect nested exception conditions. This bit is cleared by
    hardware when a machine check interrupt is taken.
31 (63)         —       Reserved
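The bit positions in Table 2-1 use Power ISA numbering, in which bit 0 is the most significant bit of the 32-bit register (the numbers in parentheses are the same bits in 64-bit numbering, that is, the 32-bit number plus 32). The following C sketch derives conventional masks from those positions. The macro and function names are hypothetical and are not taken from any Freescale header.

```c
#include <stdint.h>
#include <stdbool.h>

/* Bit n in Power ISA 32-bit numbering (bit 0 = MSB) maps to 1 << (31 - n). */
#define PPC_BIT32(n)  (UINT32_C(1) << (31 - (n)))

#define MSR_UCLE  PPC_BIT32(5)   /* user cache lock enable   */
#define MSR_SPE   PPC_BIT32(6)   /* SPE/EFPU available       */
#define MSR_WE    PPC_BIT32(13)  /* wait state enable        */
#define MSR_CE    PPC_BIT32(14)  /* critical interrupt enable */
#define MSR_EE    PPC_BIT32(16)  /* external interrupt enable */
#define MSR_PR    PPC_BIT32(17)  /* problem (user) state     */
#define MSR_ME    PPC_BIT32(19)  /* machine check enable     */
#define MSR_DE    PPC_BIT32(22)  /* debug interrupt enable   */
#define MSR_IS    PPC_BIT32(26)  /* instruction address space */
#define MSR_DS    PPC_BIT32(27)  /* data address space       */
#define MSR_PMM   PPC_BIT32(29)  /* performance monitor mark */
#define MSR_RI    PPC_BIT32(30)  /* recoverable interrupt    */

/* True when the MSR image describes user mode (MSR[PR] = 1). */
static bool msr_is_user_mode(uint32_t msr)
{
    return (msr & MSR_PR) != 0;
}

/* Return an MSR image with external, decrementer, and fixed-interval
 * timer interrupts masked (MSR[EE] = 0); other bits are unchanged. */
static uint32_t msr_mask_external(uint32_t msr)
{
    return msr & ~MSR_EE;
}
```

These helpers only manipulate a copy of the MSR value; writing the result back requires mtmsr in supervisor mode.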
2.4.2
Processor ID Register (PIR)
The processor ID for the CPU core is contained in the processor ID register (PIR), shown in Figure 2-6.
The contents of the PIR register are a reflection of hardware input signals to the e200 core following reset.
This register may be written by software to modify the default reset value.
SPR 286
Access: Read/Write
Fields (bit positions): ID (0–31)
Reset: bits 0–23 are 0; bits 24–31 are updated to reflect the values on p_cpuid[0:7]
Figure 2-6. Processor ID Register (PIR)
The PIR fields are defined in Table 2-2.
Table 2-2. PIR Field Descriptions
Bits     Name    Description
0–23     ID      These bits are reset to 0 and are writable by software.
24–31    ID      These bits are reset to the values provided on the p_cpuid[0:7] input signals and are writable by software.
2.4.3
Processor Version Register (PVR)
The processor version register (PVR), shown in Figure 2-7, contains the processor version number for the
CPU core.
SPR 287
Access: Read only
Fields (bit positions): MANID (0–3), — (4–5), Type (6–11), Version (12–15), MBG Use (16–19),
Minor Rev (20–23), Major Rev (24–27), MBG ID (28–31)
Figure 2-7. Processor Version Register (PVR)
This register contains fields to specify a particular implementation of an e200 family member. This register
is read only. Interface signals p_pvrin[16:31] provide the contents of a portion of this register.
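Using the field positions given for the PVR in Table 2-3, a value read from SPR 287 can be split into its fields as in the following C sketch. The helper ppc_field32, the structure pvr_fields, and the function pvr_decode are hypothetical names introduced for illustration; the sample value in the usage note is made up apart from its upper 16 bits, which follow the MANID, Type, and Version values stated for the e200z760n3.

```c
#include <stdint.h>

/* Extract bits first..last (inclusive) of a 32-bit value using
 * Power ISA numbering, where bit 0 is the most significant bit. */
static uint32_t ppc_field32(uint32_t value, unsigned first, unsigned last)
{
    unsigned width = last - first + 1;
    uint32_t mask  = (width >= 32) ? UINT32_MAX
                                   : ((UINT32_C(1) << width) - 1);
    return (value >> (31 - last)) & mask;
}

struct pvr_fields {
    uint32_t manid;      /* bits 0-3   */
    uint32_t type;       /* bits 6-11  */
    uint32_t version;    /* bits 12-15 */
    uint32_t mbg_use;    /* bits 16-19 */
    uint32_t minor_rev;  /* bits 20-23 */
    uint32_t major_rev;  /* bits 24-27 */
    uint32_t mbg_id;     /* bits 28-31 */
};

static struct pvr_fields pvr_decode(uint32_t pvr)
{
    struct pvr_fields f;
    f.manid     = ppc_field32(pvr, 0, 3);
    f.type      = ppc_field32(pvr, 6, 11);
    f.version   = ppc_field32(pvr, 12, 15);
    f.mbg_use   = ppc_field32(pvr, 16, 19);
    f.minor_rev = ppc_field32(pvr, 20, 23);
    f.major_rev = ppc_field32(pvr, 24, 27);
    f.mbg_id    = ppc_field32(pvr, 28, 31);
    return f;
}
```

For example, decoding the illustrative value 0x8163A5C3 yields MANID 0b1000, Type 0b010110, and Version 0b0011; the lower 16 bits depend on the p_pvrin[16:31] inputs of the particular device.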
Table 2-3. PVR Field Descriptions
Bits     Name        Description
0–3      MANID       These bits identify the manufacturer ID. Freescale is 0b1000.
4–5      —           Reserved
6–11     Type        These bits identify the processor type. e200z7 is 0b010110.
12–15    Version     These bits identify the version of the processor and inclusion of optional elements. For
                     e200z760n3, these are tied to 0b0011.
16–19    MBG Use     These bits are allocated for use by Freescale to distinguish different system variants and
                     are provided by the p_pvrin[16:19] input signals.
20–23    Minor Rev   These bits distinguish between implementations of the version and are provided by the
                     p_pvrin[20:23] input signals.
24–27    Major Rev   These bits distinguish between implementations of the version and are provided by the
                     p_pvrin[24:27] input signals.
28–31    MBG ID      These bits identify the Freescale organization responsible for a particular mask set and
                     are provided by the p_pvrin[28:31] input signals.
                     MBG value of 0b0000 is reserved.
2.4.4
System Version Register (SVR)
The system version register (SVR), shown in Figure 2-8, contains system version information for an
e200-based SoC.
SPR 1023
Access: Read only
Fields (bit positions): System Version (0–31)
Figure 2-8. System Version Register (SVR)
This register is used to specify a particular implementation of an e200-based system. This register is read
only.
Table 2-4. SVR Field Descriptions
Bits     Name             Description
0–31     System Version   These bits are allocated for use by Freescale to distinguish different system variants, and
                          are provided by the p_sysvers[0:31] input signals.
2.4.5
Integer Exception Register (XER)
A complete description of the integer exception register (XER) can be found in the EREF. The XER bit
assignments are shown in Figure 2-9.
SPR 1
Access: Read/Write
Fields (bit positions): SO (0), OV (1), CA (2), — (3–24), Bytecnt (25–31)
Reset: All zeros
Figure 2-9. Integer Exception Register (XER)
The XER fields are defined in Table 2-5.
Table 2-5. XER Field Descriptions
Bits             Name              Description
0 (32)           SO                Summary Overflow (per Power ISA embedded category)
1 (33)           OV                Overflow (per Power ISA embedded category)
2 (34)           CA                Carry (per Power ISA embedded category)
3–24 (35–56)     —                 Reserved
25–31 (57–63)    Bytecnt (note 1)  Reserved for lswi, lswx, stswi, stswx string instructions
Note 1: These bits are implemented to support emulation of the string instructions.
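The XER layout in Table 2-5 can be expressed as masks in C as sketched below. The mask and function names are hypothetical; the Bytecnt helpers are shown only because the field is read/write and is intended for software that emulates the string instructions.

```c
#include <stdint.h>

/* XER bits in Power ISA numbering (bit 0 = most significant bit). */
#define XER_SO            UINT32_C(0x80000000)  /* bit 0: summary overflow */
#define XER_OV            UINT32_C(0x40000000)  /* bit 1: overflow         */
#define XER_CA            UINT32_C(0x20000000)  /* bit 2: carry            */
#define XER_BYTECNT_MASK  UINT32_C(0x0000007F)  /* bits 25-31              */

/* Return the 7-bit byte count held in XER[25:31]. */
static uint32_t xer_bytecnt(uint32_t xer)
{
    return xer & XER_BYTECNT_MASK;
}

/* Return an XER image with the byte count replaced; the count is
 * truncated to 7 bits and all other bits are preserved. */
static uint32_t xer_set_bytecnt(uint32_t xer, uint32_t count)
{
    return (xer & ~XER_BYTECNT_MASK) | (count & XER_BYTECNT_MASK);
}
```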
2.4.6
Exception Syndrome Register (ESR)
A complete description of the exception syndrome register (ESR) can be found in the EREF. The exception
syndrome register (ESR) provides a syndrome to differentiate between exceptions that can generate the
same interrupt type. The e200 adds some implementation-specific bits to this register, as seen in
Figure 2-10.
SPR 62
Access: Read/Write
Fields (bit positions): — (0–3), PIL (4), PPR (5), PTR (6), FP (7), ST (8), — (9), DLK (10), ILK (11),
AP (12), PUO (13), BO (14), PIE (15), — (16–23), SPE (24), — (25), VLEMI (26), — (27–29), MIF (30),
— (31)
Reset: All zeros
Figure 2-10. Exception Syndrome Register (ESR)
The ESR fields are defined in Table 2-6.
Table 2-6. ESR Field Descriptions
Bit(s)           Name    Description                                   Associated Interrupt Type
0–3 (32–35)      —       Reserved                                      —
4 (36)           PIL     Illegal Instruction Exception                 Program
5 (37)           PPR     Privileged Instruction Exception              Program
6 (38)           PTR     Trap Exception                                Program
7 (39)           FP      Floating-Point Operation                      Alignment, Data storage, Data TLB, Program
8 (40)           ST      Store Operation                               Alignment, Data storage, Data TLB
9 (41)           —       Reserved                                      —
10 (42)          DLK     Data Cache Locking                            Data storage
11 (43)          ILK     Instruction Cache Locking                     Data storage
12 (44)          AP      Auxiliary Processor Operation                 Alignment, Data storage, Data TLB, Program
                         (Currently unused in e200)
13 (45)          PUO     Unimplemented Operation Exception             Program
14 (46)          BO      Byte Ordering Exception                       Data storage
                         Mismatched Instruction Storage Exception      Instruction storage
15 (47)          PIE     Program Imprecise Exception (Reserved)        —
                         Currently unused in e200
16–23 (48–55)    —       Reserved                                      —
24 (56)          SPE     SPE/EFPU Operation                            SPE/EFPU unavailable, EFPU floating-point data
                                                                       exception, EFPU floating-point round exception,
                                                                       Alignment, Data storage, Data TLB
25 (57)          —       Reserved                                      —
26 (58)          VLEMI   VLE Mode Instruction                          SPE/EFPU unavailable, EFPU floating-point data
                                                                       exception, EFPU floating-point round exception,
                                                                       Data storage, Data TLB, Instruction storage,
                                                                       Alignment, Program, System call
27–29 (59–61)    —       Reserved                                      —
30 (62)          MIF     Misaligned Instruction Fetch                  Instruction storage, Instruction TLB
31 (63)          —       Reserved                                      —
2.4.6.1
Power ISA VLE Mode Instruction Syndrome
The ESR[VLEMI] is provided to indicate that an interrupt was caused by a Power ISA VLE instruction.
This syndrome bit is set on an exception associated with execution or attempted execution of a Power ISA
VLE instruction. This bit is updated for the interrupt types indicated in Table 2-6.
2.4.6.2
Misaligned Instruction Fetch Syndrome
The ESR[MIF] is provided to indicate that an instruction storage interrupt was caused by an attempt to
fetch an instruction from a Power ISA embedded category page which was not aligned on a word
boundary. The fetch may have been caused by execution of a branch class instruction from a VLE page to
a non-VLE page, a branch to LR instruction with LR[62] = 1, a branch to CTR instruction with
CTR[62] = 1, execution of an rfi or se_rfi instruction with SRR0[62] = 1, execution of an rfci or se_rfci
instruction with CSRR0[62] = 1, execution of an rfdi or se_rfdi instruction with DSRR0[62] = 1, or
execution of an rfmci or se_rfmci instruction with MCSRR0[62] = 1, where the destination address
corresponds to an instruction page which is not marked as a Power ISA VLE page.
ESR[MIF] is also used to indicate that an instruction TLB interrupt was caused by a TLB miss on the
second half of a misaligned 32-bit Power ISA VLE instruction. For this case, SRR0 will be pointing to the
first half of the instruction, which will reside on the previous page from the miss at page offset 0xFFE. The
ITLB handler may need to realize that the miss corresponds to the next page, although MMU MAS2
contents will correctly reflect the page corresponding to the miss.
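The instruction TLB case described above can be sketched in C as follows. This is only an illustration of the address arithmetic for a handler that works from SRR0 rather than from MAS2: when ESR[MIF] is set, the miss is taken to be on the second halfword of the 32-bit VLE instruction, that is, at SRR0 + 2. The function name and the way ESR and SRR0 are passed in are assumptions made for the example.

```c
#include <stdint.h>

/* ESR bits from Table 2-6 (Power ISA numbering, bit 0 = MSB). */
#define ESR_VLEMI  (UINT32_C(1) << (31 - 26))  /* VLE mode instruction         */
#define ESR_MIF    (UINT32_C(1) << (31 - 30))  /* misaligned instruction fetch */

/* Return the effective address whose translation missed, given the
 * ESR and SRR0 values captured by an instruction TLB miss interrupt.
 * With MIF set, SRR0 points at the first half of the instruction on
 * the previous page, so the missing page holds the second halfword. */
static uint32_t itlb_miss_address(uint32_t esr, uint32_t srr0)
{
    if (esr & ESR_MIF) {
        return srr0 + 2u;
    }
    return srr0;
}
```

A handler that takes the missing page from MAS2 does not need this adjustment, because MAS2 already reflects the page corresponding to the miss.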
2.4.7
Machine Check Syndrome Register (MCSR)
When the core complex takes a machine check interrupt, it updates the machine check syndrome register
(MCSR) to differentiate between machine check conditions. Figure 2-11 shows the MCSR.
SPR 572
Access: w1c (all implemented fields)
Fields (bit positions): MCP (0), IC_DPERR (1), CP_PERR (2), DC_DPERR (3), EXCP_ERR (4),
IC_TPERR (5), DC_TPERR (6), IC_LKERR (7), DC_LKERR (8), — (9–10), NMI (11), MAV (12), MEA (13),
— (14), IF (15), LD (16), ST (17), G (18), — (19–25), SNPERR (26), BUS_IRERR (27), BUS_DRERR (28),
BUS_WRERR (29), — (30–31)
Reset: All zeros
Figure 2-11. Machine Check Syndrome Register (MCSR)
Table 2-7 describes MCSR fields. The MCSR indicates the source of a machine check condition. When an
async mchk or error report syndrome bit in the MCSR is set, the core complex asserts p_mcp_out for
system information. Note that the bits in the MCSR are implemented as write one to clear, so software must
write ones into those bit positions it wishes to clear, typically by writing back what was originally read.
See Section 7.6.2, “Machine Check Interrupt (IVOR1),” for more details of the MCSR settings.
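The write-one-to-clear behavior described above can be modeled in a few lines of C. In this sketch the register is simulated by a variable, and mcsr_read and mcsr_write are hypothetical stand-ins for mfspr and mtspr on SPR 572; the point is that writing back the value that was read clears exactly the bits that software has observed.

```c
#include <stdint.h>

/* Simulated MCSR (assumption: stands in for SPR 572 on the target). */
static uint32_t sim_mcsr;

static uint32_t mcsr_read(void)
{
    return sim_mcsr;
}

/* Write-one-to-clear: every 1 in 'value' clears the matching bit,
 * every 0 leaves the matching bit unchanged. */
static void mcsr_write(uint32_t value)
{
    sim_mcsr &= ~value;
}

/* Typical handler step: capture the syndrome, then clear what was seen. */
static uint32_t mcsr_read_and_clear(void)
{
    uint32_t seen = mcsr_read();
    mcsr_write(seen);
    return seen;
}
```

Because only the bits that were read are written back as ones, a syndrome bit that becomes set between the read and the write stays set and is not lost.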
Table 2-7. Machine Check Syndrome Register (MCSR)
Bits             Name         Description                                          Exception Type (note 1)   Recoverable
0 (32)           MCP          Machine check input pin                              Async Mchk                Maybe
1 (33)           IC_DPERR     Instruction Cache data array parity error            Async Mchk                Precise
2 (34)           CP_PERR      Data Cache push parity error                         Async Mchk                Unlikely
3 (35)           DC_DPERR     Data Cache data array parity error                   Async Mchk                Maybe
4 (36)           EXCP_ERR     ISI, ITLB, or Bus Error on first instruction fetch   Async Mchk                Precise
                              for an exception handler
5 (37)           IC_TPERR     Instruction Cache Tag parity error                   Async Mchk                Precise
6 (38)           DC_TPERR     Data Cache Tag parity error                          Async Mchk                Maybe
7 (39)           IC_LKERR     Instruction Cache Lock error                         Status                    —
    Indicates a cache control operation or invalidation operation invalidated one or more locked lines in the Icache.
8 (40)           DC_LKERR     Data Cache Lock error                                Status                    —
    Indicates a cache control operation or instruction invalidation operation invalidated one or more locked lines
    in the Dcache.
9–10 (41–42)     —            Reserved                                             —                         —
11 (43)          NMI          NMI input pin                                        NMI                       —
12 (44)          MAV          MCAR Address Valid                                   Status                    —
    Indicates that the address contained in the MCAR was updated by hardware to correspond to the first
    detected Async Mchk error condition.
13 (45)          MEA          MCAR holds Effective Address                         Status                    —
    If MAV = 1, MEA = 1 indicates that the MCAR contains an effective address and MEA = 0 indicates that the
    MCAR contains a physical address.
14 (46)          —            Reserved                                             —                         —
15 (47)          IF           Instruction Fetch Error Report                       Error Report              Precise
    An error occurred during the attempt to fetch an instruction. MCSRR0 contains the instruction address.
16 (48)          LD           Load type instruction Error Report                   Error Report              Precise
    An error occurred during the attempt to execute the load type instruction located at the address stored in
    MCSRR0.
17 (49)          ST           Store type instruction Error Report                  Error Report              Precise
    An error occurred during the attempt to execute the store type instruction located at the address stored in
    MCSRR0.
18 (50)          G            Guarded instruction Error Report                     Error Report              Precise
    An error occurred during the attempt to execute the load or store type instruction located at the address
    stored in MCSRR0 and the access was guarded and encountered an error on the external bus.
19–25 (51–57)    —            Reserved                                             —                         —
26 (58)          SNPERR       Snoop Lookup Error                                   Async Mchk                Unlikely
    An error occurred during certain snoop operations. This is typically due to a data cache tag parity error, in
    which case DC_TPERR will also be set.
27 (59)          BUS_IRERR    Read bus error on Instruction fetch or linefill      Async Mchk                Precise if data used
28 (60)          BUS_DRERR    Read bus error on data load or linefill              Async Mchk                Precise if data used
29 (61)          BUS_WRERR    Write bus error on store or cache line push          Async Mchk                Unlikely
30–31 (62–63)    —            Reserved                                             —                         —
Note 1: The Exception Type indicates the exception type associated with a given syndrome bit.
- “Error Report” indicates that this bit is only set for error report exceptions which cause machine check interrupts. These
bits are only updated when the machine check interrupt is actually taken. Error report exceptions are not gated by
MSR[ME]. These are synchronous exceptions. These bits remain set until cleared by software writing a 1 to the bit
position(s) to be cleared.
- “Status” indicates that this bit provides additional status information regarding the logging of a machine check
exception. These bits remain set until cleared by software writing a 1 to the bit position(s) to be cleared.
- “NMI” indicates that this bit is only set for the non-maskable interrupt type exception which causes a machine check
interrupt. This bit is only updated when the machine check interrupt is actually taken. NMI exceptions are not gated by
MSR[ME]. This is an asynchronous exception. This bit remains set until cleared by software writing a 1 to the bit position.
- “Async Mchk” indicates that this bit is set for an asynchronous machine check exception. These bits are set immediately
upon detection of the error. Once any “Async Mchk” bit is set in the MCSR, a machine check interrupt will occur if
MSR[ME] = 1. If MSR[ME] = 0, the machine check exception will remain pending. These bits remain set until cleared by
software writing a 1 to the bit position(s) to be cleared.
2.4.8
Timer Control Register (TCR)
The timer control register (TCR) provides control information for the CPU timer facilities. A complete
description of the TCR is included in the EREF. The TCR[WRC] field functions are defined to be
implementation dependent and are described below. In addition, the e200 core implements two fields not
specified in the Power ISA embedded category, TCR[WPEXT] and TCR[FPEXT]. Figure 2-12 shows the
TCR.
SPR 340    Access: Read/Write
Bits | 0–1 | 2–3 | 4 | 5 | 6–7 | 8 | 9 | 10 | 11–14 | 15–18 | 19–31
Field | WP | WRC | WIE | DIE | FP | FIE | ARE | — | WPEXT | FPEXT | —
Reset | All zeros
Figure 2-12. Timer Control Register (TCR)
The TCR fields are defined in Table 2-8.
Table 2-8. Timer Control Register Field Descriptions
Bits | Name | Description
0–1 (32–33) | WP | Watchdog Timer Period. When concatenated with WPEXT, specifies one of the 64 bit locations of the time base used to signal a watchdog timer exception on a transition from 0 to 1. TCR[WPEXT][0–3],TCR[WP][0–1] == 0b000000 selects TBU[0]. TCR[WPEXT][0–3],TCR[WP][0–1] == 0b111111 selects TBL[31].
2–3 (34–35) | WRC | Watchdog Timer Reset Control. 00: No watchdog timer reset will occur. 01: Assert watchdog reset status output 1 (p_wrs[1]) on second timeout of watchdog timer. 10: Assert watchdog reset status output 0 (p_wrs[0]) on second timeout of watchdog timer. 11: Assert watchdog reset status outputs 0 and 1 (p_wrs[0], p_wrs[1]) on second timeout of watchdog timer. TCR[WRC] resets to 0b00. This field may be set by software, but cannot be cleared by software (except by a software-induced reset). Once written to a non-zero value, this field may no longer be altered by software.
4 (36) | WIE | Watchdog Timer Interrupt Enable
5 (37) | DIE | Decrementer Interrupt Enable
6–7 (38–39) | FP | Fixed-Interval Timer Period. When concatenated with FPEXT, specifies one of the 64 bit locations of the time base used to signal a fixed-interval timer exception on a transition from 0 to 1. TCR[FPEXT][0–3],TCR[FP][0–1] == 0b000000 selects TBU[0]. TCR[FPEXT][0–3],TCR[FP][0–1] == 0b111111 selects TBL[31].
8 (40) | FIE | Fixed-Interval Timer Interrupt Enable
9 (41) | ARE | Auto-Reload Enable
10 (42) | — | Reserved [1]
11–14 (43–46) | WPEXT | Watchdog Timer Period Extension (see the description of WP). These bits are prepended to the TCR[WP] bits to allow selection of one of the 64 time base bits used to signal a watchdog timer exception. tb[0–63] ← TBU[0–31] || TBL[0–31]; wp ← TCR[WPEXT] || TCR[WP]; tb_wp_bit ← tb[wp]
15–18 (47–50) | FPEXT | Fixed-Interval Timer Period Extension (see the description of FP). These bits are prepended to the TCR[FP] bits to allow selection of one of the 64 time base bits used to signal a fixed-interval timer exception. tb[0–63] ← TBU[0–31] || TBL[0–31]; fp ← TCR[FPEXT] || TCR[FP]; tb_fp_bit ← tb[fp]
19–31 (51–63) | — | Reserved [1]
1. These bits are not implemented and should be written with zero for future compatibility.
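The period selection described in Table 2-8 can be modeled in C. The following is a minimal sketch, not a description of the hardware implementation; the function names are hypothetical, and it assumes the extension and period fields are passed as right-justified integers.

```c
#include <stdint.h>

/* Build the 6-bit time base bit index: the 4-bit extension field
 * (WPEXT or FPEXT) is prepended to the 2-bit period field (WP or FP). */
static unsigned tcr_period_index(unsigned ext4, unsigned p2)
{
    return ((ext4 & 0xFu) << 2) | (p2 & 0x3u);
}

/* Return the selected bit of tb[0-63] = TBU[0-31] || TBL[0-31].
 * Bit 0 is the most-significant bit of TBU; bit 63 is the
 * least-significant bit of TBL. */
static unsigned tb_selected_bit(uint32_t tbu, uint32_t tbl, unsigned index)
{
    uint64_t tb = ((uint64_t)tbu << 32) | tbl;
    return (unsigned)((tb >> (63u - (index & 63u))) & 1u);
}

/* A timer exception is signalled on a 0 -> 1 transition of the selected bit. */
static int tb_rising_edge(unsigned prev_bit, unsigned cur_bit)
{
    return prev_bit == 0u && cur_bit == 1u;
}
```

With this model, an index of 0 observes TBU[0] (the slowest-changing bit) and an index of 63 observes TBL[31] (the fastest-changing bit).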
2.4.9
Timer Status Register (TSR)
The timer status register (TSR) provides status information for the CPU timer facilities. A complete
description of the TSR can be found in the EREF. TSR[WRS] is defined to be implementation dependent
and is described below. The TSR is shown in Figure 2-13.
SPR 336    Access: w1c
Bits | 0 | 1 | 2–3 | 4 | 5 | 6–31
Field | ENW | WIS | WRS | DIS | FIS | —
Write | w1c | w1c | w1c | w1c | w1c | —
Reset | All zeros
Figure 2-13. Timer Status Register (TSR)
The TSR fields are defined in Table 2-9.
Table 2-9. Timer Status Register Field Descriptions
Bits | Name | Description
0 (32) | ENW | Enable Next Watchdog
1 (33) | WIS | Watchdog Timer Interrupt Status
2–3 (34–35) | WRS | Watchdog Timer Reset Status. 00: No second time-out of watchdog timer has occurred. 01: Assertion of watchdog reset status output 1 (p_wrs[1]) on second timeout of watchdog timer has occurred. 10: Assertion of watchdog reset status output 0 (p_wrs[0]) on second timeout of watchdog timer has occurred. 11: Assertion of watchdog reset status outputs 0 and 1 (p_wrs[0], p_wrs[1]) on second timeout of watchdog timer has occurred.
4 (36) | DIS | Decrementer Interrupt Status
5 (37) | FIS | Fixed-Interval Timer Interrupt Status
6–31 (38–63) | — | Reserved
NOTE
The timer status register can be read using mfspr RT,TSR. The timer status
register cannot be written directly. Instead, the bits in the timer status register
that correspond to 1 bits in GPR[RS] can be cleared using mtspr TSR,RS.
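The write-1-to-clear behavior in the note above can be illustrated with a small C model. This is a sketch under stated assumptions: the constant and function names are hypothetical, and the masks are derived from the bit positions in Table 2-9 with bit 0 taken as the most-significant bit of the 32-bit register.

```c
#include <stdint.h>

/* Hypothetical mask names; values follow Table 2-9 (bit 0 = MSB). */
#define TSR_ENW UINT32_C(0x80000000) /* bit 0 */
#define TSR_WIS UINT32_C(0x40000000) /* bit 1 */
#define TSR_DIS UINT32_C(0x08000000) /* bit 4 */
#define TSR_FIS UINT32_C(0x04000000) /* bit 5 */

/* Model of an mtspr to a w1c register: every bit that is 1 in the value
 * written (GPR[RS]) is cleared; bits written as 0 are left unchanged. */
static uint32_t w1c_update(uint32_t current, uint32_t written)
{
    return current & ~written;
}
```

For example, a decrementer interrupt handler would write only the DIS mask, which leaves a pending FIS indication intact.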
2.4.10
Debug Registers
The debug facility registers are described in Chapter 13, “Debug Support.”
2.4.11
Hardware Implementation Dependent Register 0 (HID0)
The HID0 register, shown in Figure 2-14, is an e200-implementation dependent register used for various
configuration and control functions.
SPR 1008    Access: Read/Write
Bits | 0 | 1–7 | 8 | 9 | 10 | 11–13 | 14 | 15
Field | EMCP | — | DOZE | NAP | SLEEP | — | ICR | NHR
Bits | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24–30 | 31
Field | — | TBEN | SEL_TBCLK | DCLREE | DCLRCE | CICLRDE | MCCLRDE | DAPUEN | — | NOPTI
Reset | All zeros
Figure 2-14. Hardware Implementation Dependent Register 0 (HID0)
The HID0 fields are defined in Table 2-10.
Table 2-10. Hardware Implementation Dependent Register 0
Bits | Name | Description
0 [32] | EMCP | Enable Machine Check Pin (p_mcp_b). 0: p_mcp_b pin is disabled. 1: p_mcp_b pin is enabled. Asserting p_mcp_b causes a machine check interrupt to be reported.
1–7 [33–39] | — | Reserved
8 [40] | DOZE | Configure for Doze Power Management Mode. 0: Doze mode is disabled. 1: Doze mode is enabled. Doze mode is invoked by setting MSR[WE] while this bit is set.
9 [41] | NAP | Configure for Nap Power Management Mode. 0: Nap mode is disabled. 1: Nap mode is enabled. Nap mode is invoked by setting MSR[WE] while this bit is set.
10 [42] | SLEEP | Configure for Sleep Power Management Mode. 0: Sleep mode is disabled. 1: Sleep mode is enabled. Sleep mode is invoked by setting MSR[WE] while this bit is set. Only one of DOZE, NAP, or SLEEP should be set for proper operation.
11–13 [43–45] | — | Reserved
14 [46] | ICR | Interrupt Inputs Clear Reservation. 0: External input, critical input, and nonmaskable interrupts do not affect reservation status. 1: External input, critical input, and nonmaskable interrupts clear an outstanding reservation.
15 [47] | NHR | Not Hardware Reset. 0: Indicates to a reset exception handler that a reset occurred if software had previously set this bit. 1: Indicates to a reset exception handler that no reset occurred if software had previously set this bit. Provided for software use: set anytime by software, cleared by reset.
Table 2-10. Hardware Implementation Dependent Register 0 (continued)
Bits | Name | Description
16 [48] | — | Reserved
17 [49] | TBEN | Time Base Enable. 0: Time base is disabled. 1: Time base is enabled.
18 [50] | SEL_TBCLK | Select Time Base Clock. 0: Time base is based on processor clock. 1: Time base is based on p_tbclk input. This bit controls the clock source for the time base. Altering this bit must be done while the time base is disabled to preclude glitching of the counter. Timer interrupts should be disabled prior to alteration, and the TBL and TBU registers re-initialized following a change of time base clock source.
19 [51] | DCLREE | Debug Interrupt Clears MSR[EE]. 0: MSR[EE] unaffected by debug interrupt. 1: MSR[EE] cleared by debug interrupt. This bit controls whether debug interrupts force external input interrupts to be disabled, or whether they remain unaffected.
20 [52] | DCLRCE | Debug Interrupt Clears MSR[CE]. 0: MSR[CE] unaffected by debug interrupt. 1: MSR[CE] cleared by debug interrupt. This bit controls whether debug interrupts force critical interrupts to be disabled, or whether they remain unaffected.
21 [53] | CICLRDE | Critical Interrupt Clears MSR[DE]. 0: MSR[DE] unaffected by critical class interrupt. 1: MSR[DE] cleared by critical class interrupt. This bit controls whether certain critical interrupts (critical input, watchdog timer) force debug interrupts to be disabled, or whether they remain unaffected. Machine check interrupts have a separate control bit. Note that if critical interrupt debug events are enabled (DBCR0[CIRPT] set, which should only be done when the debug functionality is enabled), and MSR[DE] is set at the time of a (critical input, watchdog timer) critical interrupt, a debug event will be generated after the critical interrupt handler has been fetched, and the debug handler will be executed first. In this case, DSRR1[DE] will have been cleared, such that after returning from the debug handler, the critical interrupt handler will not be run with MSR[DE] enabled.
22 [54] | MCCLRDE | Machine Check Interrupt Clears MSR[DE]. 0: MSR[DE] unaffected by machine check interrupt. 1: MSR[DE] cleared by machine check interrupt. This bit controls whether machine check interrupts force debug interrupts to be disabled, or whether they remain unaffected.
Table 2-10. Hardware Implementation Dependent Register 0 (continued)
Bits | Name | Description
23 [55] | DAPUEN | Debug functionality Enable. 0: Debug functionality disabled. 1: Debug functionality enabled. This bit controls whether the debug functionality is enabled. When enabled, debug interrupts use the DSRR0/DSRR1 registers for saving state, and the rfdi instruction is available for returning from a debug interrupt. When disabled, debug interrupts use the critical interrupt resources CSRR0/CSRR1 for saving state, the rfci instruction is used for returning from a debug interrupt, and the rfdi instruction is treated as an illegal instruction. When disabled, the settings of the DCLREE, DCLRCE, CICLRDE, and MCCLRDE bits are ignored and are assumed to be ones. Read and write access to DSRR0/DSRR1 via the mfspr and mtspr instructions is not affected by this bit.
24 [56] | — | Reserved
25–30 [57–62] | — | Reserved
31 [63] | NOPTI | No-Op Touch Instructions. 0: icbt, dcbt, dcbtst instructions operate normally. 1: icbt, dcbt, dcbtst instructions are no-oped. This bit only affects the icbt, dcbt, and dcbtst instructions.
2.4.12
Hardware Implementation Dependent Register 1 (HID1)
The HID1 register is used for bus configuration and system control. HID1 is shown in Figure 2-15.
SPR 1009    Access: Read/Write
Bits | 0–15 | 16–23 | 24 | 25–31
Field | — | SYSCTL | ATS | —
Reset | All zeros
Figure 2-15. Hardware Implementation Dependent Register 1 (HID1)
The HID1 fields are defined in Table 2-11.
Table 2-11. Hardware Implementation Dependent Register 1
Bits | Name | Description
0–15 [32–47] | — | Reserved
16–23 [48–55] | SYSCTL | System Control. These bits are reflected on the outputs of the p_hid1_sysctl[0:7] output signals for use in controlling the system. They may need external synchronization.
24 [56] | ATS | Atomic status (read-only). Indicates state of the reservation bit in the load/store unit. See Section 3.5, “Memory Synchronization and Reservation Instructions,” for more detail.
25–31 [57–63] | — | Reserved
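The HID1 field positions can be expressed as C masks. The following is a minimal sketch with hypothetical names, assuming bit 0 is the most-significant bit of the 32-bit register so that SYSCTL (bits 16–23) is shifted left by 8 and ATS (bit 24) corresponds to 1 << 7.

```c
#include <stdint.h>

#define HID1_SYSCTL_SHIFT 8u                                  /* 31 - 23 */
#define HID1_SYSCTL_MASK  (UINT32_C(0xFF) << HID1_SYSCTL_SHIFT)
#define HID1_ATS          (UINT32_C(1) << 7)                  /* bit 24 */

/* Return a HID1 image with SYSCTL replaced and all other bits preserved. */
static uint32_t hid1_set_sysctl(uint32_t hid1, uint8_t sysctl)
{
    return (hid1 & ~HID1_SYSCTL_MASK) | ((uint32_t)sysctl << HID1_SYSCTL_SHIFT);
}

/* Extract the 8-bit SYSCTL field from a HID1 image. */
static uint8_t hid1_get_sysctl(uint32_t hid1)
{
    return (uint8_t)((hid1 & HID1_SYSCTL_MASK) >> HID1_SYSCTL_SHIFT);
}

/* Nonzero if the read-only ATS bit reports an outstanding reservation. */
static int hid1_reservation_held(uint32_t hid1)
{
    return (hid1 & HID1_ATS) != 0;
}
```

The functions operate on a register image only; reading and writing the SPR itself would be done with mfspr and mtspr.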
2.4.13
Branch Unit Control and Status Register (BUCSR)
The BUCSR register is used for general control and status of the branch target buffer (BTB). BUCSR is
shown in Figure 2-16.
SPR 1013    Access: Read/Write
Bits | 0–21 | 22 | 23–25 | 26–27 | 28 | 29–30 | 31
Field | — | BBFI | — | BALLOC | — | BPRED | BPEN
Reset | All zeros
Figure 2-16. Branch Unit Control and Status Register (BUCSR)
The BUCSR fields are defined in Table 2-12.
Table 2-12. Branch Unit Control and Status Register
Bits | Name | Description
0–21 [32–53] | — | Reserved
22 [54] | BBFI | Branch Target Buffer Flash Invalidate. When written to a one, BBFI flash clears the valid bit of all entries in the branch buffer; clearing occurs regardless of the value of the enable bit (BPEN). Note: BBFI is always read as zero.
23–25 [55–57] | — | Reserved
26–27 [58–59] | BALLOC | Branch Target Buffer Allocation Control. 00: Branch target buffer allocation for all branches is enabled. 01: Branch target buffer allocation is disabled for backward branches. 10: Branch target buffer allocation is disabled for forward branches. 11: Branch target buffer allocation is disabled for both branch directions. This field controls BTB allocation for branch acceleration when BPEN = 1. Note that BTB hits are not affected by the settings of this field. Note that for branches with AA = 1, the MSB of the displacement field is still used to indicate forward/backward, even though the branch is absolute.
28 [60] | — | Reserved
Table 2-12. Branch Unit Control and Status Register (continued)
Bits | Name | Description
29–30 [61–62] | BPRED | Branch Prediction Control (Static). 00: Branch predicted taken on BTB miss for all branches. 01: Branch predicted taken on BTB miss only for forward branches. 10: Branch predicted taken on BTB miss only for backward branches. 11: Branch predicted not taken on BTB miss for both branch directions. This field controls operation of the static prediction mechanism on a BTB miss. Unless disabled, fetching of the predicted target location will be performed for branch acceleration. BPRED operates independently of BPEN, and with a BPEN setting of 0, will be used to perform static prediction of all unresolved branches. Note that BTB hits are not affected by the settings of this field. Note that for certain applications, setting BPRED to a non-default value may result in improved performance.
31 [63] | BPEN | Branch Target Buffer Prediction Enable. 0: Branch target buffer prediction disabled. 1: Branch target buffer prediction enabled (enables BTB to predict branches). When the BPEN bit is cleared, no hits will be generated from the BTB, and no new entries will be allocated. Entries are not automatically invalidated when BPEN is cleared; the BBFI bit controls entry invalidation. BPEN operates independently of BPRED, and will be used even with a BPRED setting of 00.
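A BUCSR value can be composed from the fields in Table 2-12 as sketched below. The names are hypothetical and the shifts assume bit 0 is the most-significant bit of the 32-bit register; the sketch only builds a register image and does not perform the mtspr or the context synchronization that follows it.

```c
#include <stdint.h>

#define BUCSR_BBFI         (UINT32_C(1) << 9) /* bit 22 */
#define BUCSR_BALLOC_SHIFT 4u                 /* bits 26-27 */
#define BUCSR_BPRED_SHIFT  1u                 /* bits 29-30 */
#define BUCSR_BPEN         (UINT32_C(1) << 0) /* bit 31 */

/* Compose a BUCSR image. balloc and bpred are the 2-bit encodings from
 * Table 2-12; bpen enables BTB prediction; bbfi requests a flash
 * invalidate of all BTB entries (BBFI always reads back as zero). */
static uint32_t bucsr_compose(unsigned balloc, unsigned bpred, int bpen, int bbfi)
{
    uint32_t value = 0;
    value |= (uint32_t)(balloc & 0x3u) << BUCSR_BALLOC_SHIFT;
    value |= (uint32_t)(bpred & 0x3u) << BUCSR_BPRED_SHIFT;
    if (bpen)
        value |= BUCSR_BPEN;
    if (bbfi)
        value |= BUCSR_BBFI;
    return value;
}
```

A typical enable sequence under these assumptions would write bucsr_compose(0, 0, 1, 1), which invalidates stale entries and enables prediction in one write.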
2.4.14
L1 Cache Control and Status Registers (L1CSR0, L1CSR1)
The L1CSR0 and L1CSR1 registers are used for general control and status of the L1 caches. A description
of the L1CSR0 and L1CSR1 registers can be found in Chapter 9, “L1 Cache.”
2.4.15
L1 Cache Configuration Registers (L1CFG0, L1CFG1)
The L1CFG0 and L1CFG1 registers provide configuration information for the L1 caches supplied with
this version of the e200 CPU core. A description of the L1CFG0 and L1CFG1 registers can be found in
Chapter 9, “L1 Cache.”
2.4.16
L1 Cache Flush and Invalidate Registers (L1FINV0, L1FINV1)
The L1FINV0 and L1FINV1 registers provide software-based flush and invalidation control for the L1
caches supplied with this version of the e200 CPU core. A description of the L1FINV0 and L1FINV1
registers can be found in Chapter 9, “L1 Cache.”
2.4.17
MMU Control and Status Register (MMUCSR0)
The MMUCSR0 register is used for general control of the MMU. A description of the MMUCSR0 register
can be found in Chapter 10, “Memory Management Unit.”
2.4.18
MMU Configuration Register (MMUCFG)
The MMUCFG register provides configuration information for the MMU supplied with this version of the
e200 CPU core. A description of the MMUCFG register can be found in Chapter 10, “Memory
Management Unit.”
2.4.19
TLB Configuration Registers (TLB0CFG, TLB1CFG)
The TLB0CFG and TLB1CFG registers provide configuration information for the MMU TLBs supplied
with this version of the e200 CPU core. A description of these registers can be found in Chapter 10,
“Memory Management Unit.”
2.5
SPR Register Access
SPRs are accessed with the mfspr and mtspr instructions. The following sections outline additional access
requirements.
2.5.1
Invalid SPR References
System behavior when an invalid SPR is referenced depends on the apparent privilege level of the register
(refer to Table 2-13). The register privilege level is determined by bit 5 in the SPR address. If the invalid
SPR is accessible in user mode, then an illegal exception is generated. If the invalid SPR is accessible only
in supervisor mode and the CPU core is in supervisor mode (MSR[PR] = 0), then an illegal exception is
generated. If the invalid SPR address is accessible only in supervisor mode and the CPU is not in
supervisor mode (MSR[PR] = 1), then a privilege exception is generated.
Note that writes to read-only SPRs and reads of write-only SPRs are treated as invalid SPR references.
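The decision described above can be sketched in C. This is an illustration only, with hypothetical names; it assumes the SPR address is the 10-bit SPR number with bit 0 as its most-significant bit, so that bit 5 corresponds to the mask 0x10.

```c
#include <stdint.h>

enum spr_fault { SPR_ILLEGAL, SPR_PRIVILEGE };

/* Bit 5 of the 10-bit SPR number (bits 0-9, bit 0 = MSB) is value bit 4. */
static int spr_is_supervisor_only(unsigned sprn)
{
    return (sprn & 0x10u) != 0;
}

/* Exception taken for a reference to an invalid SPR.
 * msr_pr is MSR[PR]: 0 = supervisor mode, 1 = user mode. */
static enum spr_fault invalid_spr_response(unsigned sprn, int msr_pr)
{
    if (spr_is_supervisor_only(sprn) && msr_pr)
        return SPR_PRIVILEGE;
    return SPR_ILLEGAL;
}
```

Under this assumption the SPR numbers listed later in this chapter are consistent with the rule: for example, 1008 (HID0, privileged) has the bit set, while 8 (LR, not privileged) does not.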
Table 2-13. System Response to Invalid SPR Reference
SPR Address Bit 5 | Mode | MSR[PR] | Response
0 | — | — | Illegal exception
1 | Supervisor | 0 | Illegal exception
1 | User | 1 | Privilege exception
2.5.2
Synchronization Requirements for SPRs
With the exception of the following registers, there are no synchronization requirements for accessing
SPRs beyond those stated in the Power ISA embedded architecture. A complete description of
synchronization requirements is contained in the EREF. Software requirements for synchronization
before/after accessing these registers are shown in Table 2-14. The notation CSI in the table refers to a
context synchronizing instruction; these include sc, isync, rfi, rfci, and rfdi.
Table 2-14. Additional synchronization requirements for SPRs
Context Altering Event or Instruction | Required Before | Required After | Notes
mtmsr[UCLE] | none | CSI | —
mtmsr[SPE] | none | CSI | —
mtmsr[PMM] | none | CSI | —
mfspr
Table 2-14. Additional synchronization requirements for SPRs (continued)
Required
Before
Required
After
Notes
Debug Counter register
msync
none
1
DBSR
Debug Status register
msync
none
—
HID0
Hardware implementation dependent reg 0
none
none
—
HID1
Hardware implementation dependent reg 1
msync
none
—
msync
none
—
Context Altering Event or Instruction
DBCNT
L1CSR0, L1CSR1 L1 cache control and status registers 0,1
L1FINV0,
L1FINV1
L1 cache flush and invalidate control registers 0,1
MMUCSR
MMU control and status register 0
msync
none
—
CSI
none
—
mtspr
BUCSR
Branch Unit Control and Status Register
none
CSI
—
DBCNT
Debug Counter register
none
CSI
1
Debug Control Register 0–6
none
CSI
—
msync
none
—
DBCR0–6
DBSR
Debug Status Register
HID0
Hardware implementation dependent reg 0
CSI
isync
—
HID1
Hardware implementation dependent reg 1
msync, isync
CSI
—
L1CSR0
L1 cache control and status register 0
msync, isync
CSI
—
L1CSR1
L1 cache control and status registers 1
none
CSI
—
L1FINV0,
L1FINV1
MASx
MMUCSR
PID
SPEFSCR
L1 cache flush and invalidate control registers 0,1
msync
CSI
—
MMU MAS registers
none
CSI
—
MMU control and status register 0
CSI
CSI
—
PID0 register
none
CSI
—
none
2
—
SPEFSCR register
CSI
Notes:
1. not required if counter is not currently enabled
2. not required for status bit clearing, required for altering exception enable or rounding mode bits
2.5.3
Special Purpose Register Summary
Power ISA embedded category and implementation-specific SPRs for the e200 core are listed in
Table 2-15. All registers are 32 bits in size. Register bits are numbered from bit 0 to bit 31 (most-significant
to least-significant). Shaded entries represent optional registers. An SPR register may be read or written
with the mfspr and mtspr instructions. In the instruction syntax, compilers should recognize the
mnemonic name given in the table below.
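Because register bits are numbered from bit 0 (most-significant) to bit 31 (least-significant), field positions must be converted before they are used as C masks. The helpers below are a minimal sketch with hypothetical names, not part of any toolchain header.

```c
#include <stdint.h>

/* Mask for bit n of a 32-bit register, where bit 0 is the MSB. */
#define PPC_BIT32(n) (UINT32_C(1) << (31u - (n)))

/* Mask for a field spanning bits first..last (inclusive, first <= last),
 * using the same MSB-first numbering. */
static uint32_t ppc_field_mask32(unsigned first, unsigned last)
{
    unsigned width = last - first + 1u;
    uint32_t width_mask = (width == 32u) ? UINT32_MAX
                                         : ((UINT32_C(1) << width) - 1u);
    return width_mask << (31u - last);
}
```

For example, HID0[TBEN] (bit 17) becomes PPC_BIT32(17), and HID1[SYSCTL] (bits 16–23) becomes ppc_field_mask32(16, 23).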
Table 2-15. Special Purpose Registers
Mnemonic | Name | SPR Number | Access | Privileged | e200 Specific
BUCSR | Branch unit control and status register | 1013 | R/W | Yes | Yes
CSRR0 | Critical save/restore register 0 | 58 | R/W | Yes | No
CSRR1 | Critical save/restore register 1 | 59 | R/W | Yes | No
CTR | Count register | 9 | R/W | No | No
DAC1 | Data address compare 1 | 316 | R/W | Yes | No
DAC2 | Data address compare 2 | 317 | R/W | Yes | No
DBCNT | Debug counter register | 562 | R/W | Yes | Yes
DBCR0 | Debug control register 0 | 308 | R/W | Yes | No
DBCR1 | Debug control register 1 | 309 | R/W | Yes | No
DBCR2 | Debug control register 2 | 310 | R/W | Yes | No
DBCR3 | Debug control register 3 | 561 | R/W | Yes | Yes
DBCR4 | Debug control register 4 | 563 | R/W | Yes | Yes
DBCR5 | Debug control register 5 | 564 | R/W | Yes | Yes
DBCR6 | Debug control register 6 | 603 | R/W | Yes | Yes
DBERC0 | Debug external resource control register 0 | 569 | Read-only | Yes | Yes
DBSR | Debug status register | 304 | Read/Clear [1] | Yes | No
DDAM | Debug data acquisition messaging register | 576 | R/W | No | Yes
DEAR | Data exception address register | 61 | R/W | Yes | No
DEC | Decrementer | 22 | R/W | Yes | No
DECAR | Decrementer auto-reload | 54 | R/W | Yes | No
DEVENT | Debug event register | 975 | R/W | No | Yes
DSRR0 | Debug save/restore register 0 | 574 | R/W | Yes | Yes
DSRR1 | Debug save/restore register 1 | 575 | R/W | Yes | Yes
DVC1 | Data value compare 1 | 318 | R/W | Yes | No
DVC2 | Data value compare 2 | 319 | R/W | Yes | No
ESR | Exception syndrome register | 62 | R/W | Yes | No
HID0 | Hardware implementation dependent reg 0 | 1008 | R/W | Yes | Yes
HID1 | Hardware implementation dependent reg 1 | 1009 | R/W | Yes | Yes
IAC1 | Instruction address compare 1 | 312 | R/W | Yes | No
IAC2 | Instruction address compare 2 | 313 | R/W | Yes | No
IAC3 | Instruction address compare 3 | 314 | R/W | Yes | No
IAC4 | Instruction address compare 4 | 315 | R/W | Yes | No
Table 2-15. Special Purpose Registers (continued)
Mnemonic | Name | SPR Number | Access | Privileged | e200 Specific
IAC5 | Instruction address compare 5 | 565 | R/W | Yes | Yes
IAC6 | Instruction address compare 6 | 566 | R/W | Yes | Yes
IAC7 | Instruction address compare 7 | 567 | R/W | Yes | Yes
IAC8 | Instruction address compare 8 | 568 | R/W | Yes | Yes
IVOR0 | Interrupt vector offset register 0 | 400 | R/W | Yes | No
IVOR1 | Interrupt vector offset register 1 | 401 | R/W | Yes | No
IVOR2 | Interrupt vector offset register 2 | 402 | R/W | Yes | No
IVOR3 | Interrupt vector offset register 3 | 403 | R/W | Yes | No
IVOR4 | Interrupt vector offset register 4 | 404 | R/W | Yes | No
IVOR5 | Interrupt vector offset register 5 | 405 | R/W | Yes | No
IVOR6 | Interrupt vector offset register 6 | 406 | R/W | Yes | No
IVOR7 | Interrupt vector offset register 7 | 407 | R/W | Yes | No
IVOR8 | Interrupt vector offset register 8 | 408 | R/W | Yes | No
IVOR9 | Interrupt vector offset register 9 | 409 | R/W | Yes | No
IVOR10 | Interrupt vector offset register 10 | 410 | R/W | Yes | No
IVOR11 | Interrupt vector offset register 11 | 411 | R/W | Yes | No
IVOR12 | Interrupt vector offset register 12 | 412 | R/W | Yes | No
IVOR13 | Interrupt vector offset register 13 | 413 | R/W | Yes | No
IVOR14 | Interrupt vector offset register 14 | 414 | R/W | Yes | No
IVOR15 | Interrupt vector offset register 15 | 415 | R/W | Yes | No
IVOR32 | Interrupt vector offset register 32 | 528 | R/W | Yes | Yes
IVOR33 | Interrupt vector offset register 33 | 529 | R/W | Yes | Yes
IVOR34 | Interrupt vector offset register 34 | 530 | R/W | Yes | Yes
IVOR35 | Interrupt vector offset register 35 | 531 | R/W | Yes | Yes
IVPR | Interrupt vector prefix register | 63 | R/W | Yes | No
LR | Link register | 8 | R/W | No | No
L1CFG0 | L1 cache config register 0 | 515 | Read-only | No | Yes
L1CFG1 | L1 cache config register 1 | 516 | Read-only | No | Yes
L1CSR0 | L1 cache control and status register 0 | 1010 | R/W | Yes | Yes
L1CSR1 | L1 cache control and status register 1 | 1011 | R/W | Yes | Yes
L1FINV0 | L1 cache flush and invalidate control register 0 | 1016 | R/W | Yes | Yes
L1FINV1 | L1 cache flush and invalidate control register 1 | 959 | R/W | Yes | Yes
Table 2-15. Special Purpose Registers (continued)
Mnemonic | Name | SPR Number | Access | Privileged | e200 Specific
MAS0 | MMU assist register 0 | 624 | R/W | Yes | Yes
MAS1 | MMU assist register 1 | 625 | R/W | Yes | Yes
MAS2 | MMU assist register 2 | 626 | R/W | Yes | Yes
MAS3 | MMU assist register 3 | 627 | R/W | Yes | Yes
MAS4 | MMU assist register 4 | 628 | R/W | Yes | Yes
MAS6 | MMU assist register 6 | 630 | R/W | Yes | Yes
MCAR | Machine check address register | 573 | R/W | Yes | Yes
MCSR | Machine check syndrome register | 572 | R/Clear [2] | Yes | Yes
MCSRR0 | Machine check save/restore register 0 | 570 | R/W | Yes | Yes
MCSRR1 | Machine check save/restore register 1 | 571 | R/W | Yes | Yes
MMUCFG | MMU configuration register | 1015 | Read-only | Yes | Yes
MMUCSR | MMU control and status register 0 | 1012 | R/W | Yes | Yes
PID0 | Process ID register | 48 | R/W | Yes | No
PIR | Processor ID register | 286 | R/W | Yes | No
PVR | Processor version register | 287 | Read-only | Yes | No
SPEFSCR | SPE status and control register | 512 | R/W | No | No
SPRG0 | SPR general 0 | 272 | R/W | Yes | No
SPRG1 | SPR general 1 | 273 | R/W | Yes | No
SPRG2 | SPR general 2 | 274 | R/W | Yes | No
SPRG3 | SPR general 3 | 275 | R/W | Yes | No
SPRG4 | SPR general 4 | 260 | Read-only | No | No
SPRG4 | SPR general 4 | 276 | R/W | Yes | No
SPRG5 | SPR general 5 | 261 | Read-only | No | No
SPRG5 | SPR general 5 | 277 | R/W | Yes | No
SPRG6 | SPR general 6 | 262 | Read-only | No | No
SPRG6 | SPR general 6 | 278 | R/W | Yes | No
SPRG7 | SPR general 7 | 263 | Read-only | No | No
SPRG7 | SPR general 7 | 279 | R/W | Yes | No
SPRG8 | SPR general 8 | 604 | R/W | Yes | Yes
SPRG9 | SPR general 9 | 605 | R/W | Yes | Yes
SRR0 | Save/restore register 0 | 26 | R/W | Yes | No
SRR1 | Save/restore register 1 | 27 | R/W | Yes | No
Table 2-15. Special Purpose Registers (continued)
Mnemonic | Name | SPR Number | Access | Privileged | e200 Specific
SVR | System version register | 1023 | Read-only | Yes | Yes
TBL | Time base lower | 268 | Read-only | No | No
TBL | Time base lower | 284 | Write-only | Yes | No
TBU | Time base upper | 269 | Read-only | No | No
TBU | Time base upper | 285 | Write-only | Yes | No
TCR | Timer control register | 340 | R/W | Yes | No
TLB0CFG | TLB0 configuration register | 688 | Read-only | Yes | Yes
TLB1CFG | TLB1 configuration register | 689 | Read-only | Yes | Yes
TSR | Timer status register | 336 | Read/Clear [3] | Yes | No
USPRG0 | User SPR general 0 | 256 | R/W | No | No
XER | Integer exception register | 1 | R/W | No | No
Notes:
1. The Debug Status Register can be read using mfspr RT,DBSR. The Debug Status Register cannot be directly written to. Instead, bits in the Debug Status Register corresponding to ‘1’ bits in GPR(RS) can be cleared using mtspr DBSR,RS.
2. The Machine Check Syndrome Register can be read using mfspr RT,MCSR. The Machine Check Syndrome Register cannot be directly written to. Instead, bits in the Machine Check Syndrome Register corresponding to ‘1’ bits in GPR(RS) can be cleared using mtspr MCSR,RS.
3. The Timer Status Register can be read using mfspr RT,TSR. The Timer Status Register cannot be directly written to. Instead, bits in the Timer Status Register corresponding to ‘1’ bits in GPR(RS) can be cleared using mtspr TSR,RS.
2.6
Reset Settings
Table 2-16 shows the state of the Power ISA embedded category architecture registers and other optional
resources immediately following a system reset.
Table 2-16. Reset Settings for e200 Resources
Resource | System Reset Setting
Program Counter | p_rstbase[0:29] || 0b00
GPR | Unaffected [1]
CR | Unaffected [1]
BUCSR | All zeros
CSRR0 | Unaffected [1]
CSRR1 | Unaffected [1]
CTR | Unaffected [1]
DAC1 | All zeros [2]
DAC2 | All zeros [2]
Table 2-16. Reset Settings for e200 Resources (continued)
Resource | System Reset Setting
DBCNT | Unaffected [1]
DBCR0 | All zeros [2]
DBCR1 | All zeros [2]
DBCR2 | All zeros [2]
DBCR3 | All zeros [2]
DBCR4 | All zeros [2]
DBCR5 | All zeros [2]
DBCR6 | All zeros [2]
DBSR | 0x1000_0000 [2]
DDAM | All zeros [2]
DEAR | Unaffected [1]
DEC | Unaffected [1]
DECAR | Unaffected [1]
DEVENT | All zeros [2]
DSRR0 | Unaffected [1]
DSRR1 | Unaffected [1]
DVC1 | Unaffected [1]
DVC2 | Unaffected [1]
ESR | All zeros
HID0 | All zeros
HID1 | All zeros
IAC1 | All zeros [2]
IAC2 | All zeros [2]
IAC3 | All zeros [2]
IAC4 | All zeros [2]
IAC5 | All zeros [2]
IAC6 | All zeros [2]
IAC7 | All zeros [2]
IAC8 | All zeros [2]
IVORxx | Unaffected [1]
IVPR | Unaffected [1]
LR | Unaffected [1]
L1CFG0, L1CFG1 [3] | —
Table 2-16. Reset Settings for e200 Resources (continued)
Resource | System Reset Setting
L1CSR0, 1 | All zeros
L1FINV0, 1 | All zeros
MAS0 | Unaffected [1]
MAS1 | Unaffected [1]
MAS2 | Unaffected [1]
MAS3 | Unaffected [1]
MAS4 | Unaffected [1]
MAS6 | Unaffected [1]
MCAR | Unaffected [1]
MCSR | All zeros
MCSRR0 | Unaffected [1]
MCSRR1 | Unaffected [1]
MMUCFG [3] | —
MSR | All zeros
PID0 | All zeros
PIR | 0x00_0000 || p_cpuid[0:7]
PVR [3] | —
SPEFSCR | All zeros
SPRG0 | Unaffected [1]
SPRG1 | Unaffected [1]
SPRG2 | Unaffected [1]
SPRG3 | Unaffected [1]
SPRG4 | Unaffected [1]
SPRG5 | Unaffected [1]
SPRG6 | Unaffected [1]
SPRG7 | Unaffected [1]
SPRG8 | Unaffected [1]
SPRG9 | Unaffected [1]
SRR0 | Unaffected [1]
SRR1 | Unaffected [1]
SVR [3] | —
TBL | Unaffected [1]
TBU | Unaffected [1]
Table 2-16. Reset Settings for e200 Resources (continued)
Resource | System Reset Setting
TCR | All zeros
TSR | All zeros
TLB0CFG [3] | —
TLB1CFG [3] | —
USPRG0 | Unaffected [1]
XER | All zeros
1. Undefined on m_por assertion, unchanged on p_reset_b assertion.
2. Reset by processor reset p_reset_b if DBCR0[EDM] = 0, as well as unconditionally by m_por.
3. Read-only registers.
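The two concatenated reset values in Table 2-16 can be written out in C. This is a sketch only; the function names are hypothetical, and it assumes the p_rstbase[0:29] and p_cpuid[0:7] input values are supplied as right-justified integers.

```c
#include <stdint.h>

/* Program counter after reset: p_rstbase[0:29] || 0b00.
 * The 30-bit base is placed in the upper bits; the low two bits are zero. */
static uint32_t reset_pc(uint32_t rstbase30)
{
    return (rstbase30 & UINT32_C(0x3FFFFFFF)) << 2;
}

/* PIR after reset: 0x00_0000 || p_cpuid[0:7].
 * The upper 24 bits are zero; the low 8 bits carry the CPU ID inputs. */
static uint32_t reset_pir(uint8_t cpuid)
{
    return (uint32_t)cpuid;
}
```

The reset program counter produced this way is always word-aligned.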
Chapter 3
Instruction Model
This chapter provides additional information about the Power ISA embedded category architecture as it
relates specifically to the e200z760n3.
The e200z7 is a 32-bit implementation of the Power ISA embedded category architecture as described in
the EREF. However, different processor implementations may require clarifications, extensions, or
deviations from the architectural descriptions. See the processor-specific reference manuals for details
about deviations.
3.1
Unsupported Instructions and Instruction Forms
Because the e200z7 is a 32-bit Power ISA embedded category core, all of the instructions defined for
64-bit implementations of the Power ISA architecture are illegal on the e200. See the EREF for more
information on 64-bit instructions. The e200 takes an illegal instruction exception type program interrupt
upon encountering a 64-bit Power ISA instruction.
Besides the 64-bit instructions, there are other Power ISA embedded category instructions not supported
by the e200z7. If one of these instructions is executed on the e200z7, an unimplemented operation or FP
(floating-point) unavailable exception is generated.
3.2
Implementation Specific Instructions
Several Power ISA embedded category instructions are implementation-specific. Table 3-1 summarizes
these e200 implementation-specific instructions.
Table 3-1. Implementation-Specific Instruction Summary
Mnemonic | Implementation Details
mfapidi, mfdcrx, mtdcrx | Unimplemented instructions
stbcx., sthcx., stwcx. | Address match with prior lbarx, lharx, or lwarx not required for store to be performed
mfdcr, mtdcr | Optionally supported instructions [1]
1. The e200 CPU will take an illegal instruction exception for unsupported DCR values.
3.3
Power ISA Embedded Category Instruction Extensions
This section describes how certain Power ISA embedded category instructions support the Power ISA
VLE functionality:
• rfci, rfdi, rfi, rfmci: No longer mask bit 62 of CSRR0, DSRR0, SRR0, or MCSRR0, respectively. The
destination address is xSRR0[32:62] || 0b0, where xSRR0 is the corresponding save/restore register.
• bclr, bclrl, bcctr, bcctrl: No longer mask bit 62 of the LR or CTR, respectively. The destination
address is [LR,CTR][32:62] || 0b0.
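The destination address computation in the list above amounts to clearing only the least-significant bit of the source register. A minimal C sketch follows; the function name is hypothetical, and the 32-bit register value is assumed to hold bits 32:63 of the architectural numbering.

```c
#include <stdint.h>

/* Destination = reg[32:62] || 0b0: bits 32-62 are kept (including bit 62,
 * which is no longer masked) and bit 63, the least-significant bit, is
 * forced to zero. This permits halfword-aligned VLE targets. */
static uint32_t return_target(uint32_t reg)
{
    return reg & ~UINT32_C(1);
}
```

A register value of 0x1003 therefore yields a target of 0x1002 rather than the word-aligned 0x1000.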
3.4
Memory Access Alignment Support
The e200 core provides hardware support for unaligned memory accesses; however, there is a performance
degradation for accesses which cross a 64-bit (8-byte) boundary. For loads that hit in the cache, the
throughput of the load/store unit is degraded to 1 misaligned load every 2 cycles. Stores which are
misaligned across a 64-bit (8-byte) boundary can be translated at a rate of 2 cycles per store. Frequent use
of unaligned memory accesses is discouraged because of the impact on performance.
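Software that wants to avoid the misalignment penalty can test whether an access crosses a 64-bit (8-byte) boundary. The check below is a sketch with a hypothetical function name; it assumes access sizes of at most 8 bytes.

```c
#include <stdint.h>

/* Nonzero if an access of `size` bytes starting at `addr` spans two
 * different 8-byte-aligned doublewords. */
static int crosses_dword_boundary(uint32_t addr, unsigned size)
{
    return size > 0u && ((addr & 7u) + size) > 8u;
}
```

Note that an access can be misaligned without crossing the boundary, for example a halfword at an address ending in 0x1; only the crossing case is described above as incurring the 2-cycle rate.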
NOTE
Accesses which cross a translation boundary may be restarted. A misaligned
access which crosses a page boundary is restarted in its entirety in the event
of a TLB miss of the second portion of the access. This may result in the first
portion being accessed twice.
Accesses that cross a translation boundary where the endianness changes
cause a byte ordering DSI exception.
3.5 Memory Synchronization and Reservation Instructions
The msync instruction provides a synchronization function and a memory barrier function. This
instruction waits for all preceding instructions and data memory accesses to complete before the msync
instruction completes. Subsequent instructions in the instruction stream are not initiated until after the
msync instruction completes to ensure these functions have been performed.
In addition, the msync instruction and the mbar instruction with MO = 0 or 1 handshake with the system
to ensure that all accesses initiated by this CPU have been “performed” with respect to all other processors
and mechanisms prior to completion of the instruction.
On the e200 core, the mbar instruction with MO = 0 or 1 behaves similarly to the msync instruction,
but only waits for previous data memory accesses rather than all previous instructions to complete before
completing. The mbar instruction with MO = 2 behaves in the same way, except that it
does not signal synchronizations to other processors through the synchronization port. The mbar
instruction may be preferred for most memory synchronization operations because, unlike the msync
instruction, it does not stall instruction execution if no load or store operations remain in the execution
pipeline. The mbar instruction with the MO field not equal to 0, 1, or 2 is treated as illegal by the e200
core.
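The differences between msync and the mbar variants described above can be summarized in a small lookup. This is a descriptive sketch, not a timing model; the type and function names are hypothetical, and ValueError merely stands in for the illegal-instruction treatment.

```python
from typing import NamedTuple


class BarrierBehavior(NamedTuple):
    waits_for_all_prior_instructions: bool
    waits_for_prior_data_accesses: bool
    signals_other_processors: bool


def barrier_behavior(mnemonic: str, mo: int = 0) -> BarrierBehavior:
    """Summarize the barrier semantics described in this section."""
    if mnemonic == "msync":
        return BarrierBehavior(True, True, True)
    if mnemonic == "mbar":
        if mo in (0, 1):
            # Like msync, but waits only for prior data memory accesses.
            return BarrierBehavior(False, True, True)
        if mo == 2:
            # Same wait, but no synchronization signaled to other processors.
            return BarrierBehavior(False, True, False)
        raise ValueError("mbar with MO other than 0, 1, or 2 is treated as illegal")
    raise ValueError(f"not a barrier instruction: {mnemonic}")


print(barrier_behavior("mbar", 2))
```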
The e200 core implements the lwarx and stwcx. instructions as described in the Power ISA embedded
category, as well as the lharx, lbarx, sthcx., and stbcx. instructions defined by the EIS enhanced
reservation functionality. If the EA is not a multiple of the access size for these instructions, an alignment
interrupt is invoked. The e200 allows reservation instructions to access a page that is marked as
write-through required or cache-inhibited, and no data storage interrupt is invoked.
As allowed by the Power ISA embedded category, the e200 core does not require that the EA of the
store-type instruction must be to the same reservation granule as the EA of a preceding reservation
load-type instruction for a reservation store-type instruction to succeed. Reservation granularity is
implementation dependent. The e200 core does not define a reservation granule explicitly; reservation
granularity is defined by external logic. When no external logic is provided, the e200 core performs no
address comparison checking, thus the effective implementation granularity is null.
The e200 core implements an internal status flag (HID1[ATS]) representing reservation status. This flag
is set when a load-type reservation instruction is executed and completes without error, and remains set
until it is cleared by one of the following mechanisms:
• Execution of a store-type reservation instruction is completed without error.
• The e200 core p_rsrv_clr input signal is asserted.
• The reservation is invalidated when an external input, critical input, or nonmaskable interrupt is
signaled and HID0[ICR] is set.
When the e200 core decodes a store-type reservation instruction, it checks the value of the local reservation
flag (HID1[ATS]). If the status indicates that no reservation is active, the store-type reservation instruction
is treated as a no-op. No exceptions are taken and no access is performed; thus no data breakpoint occurs,
regardless of matching the data breakpoint attributes.
The e200 core treats reservation accesses as though they were both cache inhibited and guarded, regardless
of page attributes. A cache line corresponding to the address of a reservation access is flushed to memory
if dirty, and then invalidated, prior to the reservation access being issued to the bus. This is done to allow
external reservation logic to be built, which properly signals a reservation failure.
The e200 core provides the input signal p_xfail_b, which is sampled at termination of a st[b,h,w]cx. store
transfer to allow an external agent or mechanism to indicate that the st[b,h,w]cx. instruction has failed to
update memory, even though a reservation existed for the store at the time it was issued. This is not
considered an error and causes the condition codes for the st[b,h,w]cx. instruction to be written as if a
reservation did not exist for the st[b,h,w]cx. instruction. In addition, any outstanding reservation is
cleared.
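The reservation flag behavior described above can be sketched as a small state machine. This is a simplified model under stated assumptions: the class and method names are hypothetical, memory is a plain dictionary, no external reservation logic is present (so no address comparison is performed), and the xfail argument stands in for an asserted p_xfail_b input.

```python
class ReservationModel:
    """Toy model of the internal reservation flag (HID1[ATS])."""

    def __init__(self, icr: bool = False):
        self.ats = False   # models HID1[ATS]
        self.icr = icr     # models HID0[ICR]
        self.memory = {}

    def load_reserve(self, addr: int) -> int:
        """lbarx/lharx/lwarx completing without error sets the flag."""
        self.ats = True
        return self.memory.get(addr, 0)

    def store_conditional(self, addr: int, value: int, xfail: bool = False) -> bool:
        """st[b,h,w]cx.: returns True if the store was performed."""
        if not self.ats:
            return False           # treated as a no-op, no access performed
        self.ats = False           # the reservation is cleared either way
        if xfail:
            return False           # external agent reported a failed update
        self.memory[addr] = value  # note: addr is not compared to the load address
        return True

    def rsrv_clr(self) -> None:
        """Models assertion of the p_rsrv_clr input."""
        self.ats = False

    def interrupt(self) -> None:
        """External input, critical input, or nonmaskable interrupt signaled."""
        if self.icr:
            self.ats = False


m = ReservationModel()
m.load_reserve(0x100)
print(m.store_conditional(0x200, 7))  # succeeds even though the address differs
print(m.store_conditional(0x200, 8))  # no reservation remains, so this is a no-op
```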
The p_rsrv_clr input signal is not intended for normal use in managing reservations. It is provided for
specialized system applications. The normal bus protocol is used to manage reservations using external
reservation logic in systems with multiple coherent bus masters, using the transfer type and transfer
response signals. In single coherent master systems, no external logic is required, and the internal
reservation flag is sufficient to support multitasking applications.
3.6 Branch Prediction
The e200z7 instruction fetching mechanism uses a branch target buffer (BTB) that holds branch target
addresses combined with a 2-bit saturating up-down counter scheme for branch prediction. Branch paths
are predicted by either the branch target buffer (BTB hit) or a selectable static prediction algorithm (BTB
miss) and subsequently checked to see if the prediction was correct. This enables operation beyond a
conditional branch without waiting for the branch to be decoded and resolved. The instruction fetch unit
predicts the direction of the branch as follows:
• Predict taken for any backward branch whose fetch address hits in the BTB and is predicted taken
by the counter or misses in the BTB and static prediction control in BUCSR for backward branches
indicates ‘predict taken’. Otherwise, predict not-taken.
• Predict taken for any forward branch whose fetch address hits in the BTB and is predicted taken
by the counter or misses in the BTB and static prediction control in BUCSR for forward branches
indicates ‘predict taken’. Otherwise, predict not-taken.
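The prediction rules above can be illustrated with a toy predictor. Several details are assumptions made for the sketch and are not taken from this manual: the counter threshold (taken when the counter is 2 or 3), allocation only on taken branches with an initial counter of 2, an unbounded BTB, and the constructor parameters that stand in for the BUCSR static prediction controls.

```python
class BranchPredictor:
    """Toy BTB with 2-bit saturating counters plus static fallback on a miss."""

    def __init__(self, static_backward_taken: bool = True,
                 static_forward_taken: bool = False):
        self.btb = {}  # fetch address -> [target address, 2-bit counter]
        self.static_backward_taken = static_backward_taken
        self.static_forward_taken = static_forward_taken

    def predict(self, fetch_addr: int, target_addr: int) -> bool:
        entry = self.btb.get(fetch_addr)
        if entry is not None:
            return entry[1] >= 2          # BTB hit: counter decides
        if target_addr < fetch_addr:      # BTB miss: static prediction by direction
            return self.static_backward_taken
        return self.static_forward_taken

    def update(self, fetch_addr: int, target_addr: int, taken: bool) -> None:
        entry = self.btb.get(fetch_addr)
        if entry is None:
            if taken:
                self.btb[fetch_addr] = [target_addr, 2]
            return
        # Saturating up-down count in the range 0..3.
        entry[1] = min(3, entry[1] + 1) if taken else max(0, entry[1] - 1)


p = BranchPredictor()
print(p.predict(0x100, 0x080))  # miss, backward branch, static control says taken
print(p.predict(0x100, 0x200))  # miss, forward branch, static control says not taken
```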
3.7 Interruption of Instructions by Interrupt Requests
In general, the e200z7 core samples pending nonmaskable interrupts, external input, and critical input
interrupt requests at instruction boundaries. However, in order to reduce interrupt latency, long running
instructions may be interrupted prior to completion. Instructions in this class include divides (divw[uo][.],
efsdiv, evfsdiv, evdivw[su]), load multiple word (lmw, e_lmw), and store multiple word (stmw, e_stmw).
In addition, the e_lmvgprw, e_stmvgprw, e_lmvsprw, and e_stmvsprw Volatile Context Save/Restore
functionality instructions may also be interrupted prior to completion. When interrupted prior to
completion, the value saved in SRR0/CSRR0/MCSRR0 is the address of the interrupted instruction. The
instruction is restarted from the beginning after returning to it from the interrupt handler.
3.8 New e200 Functionality
The e200z7 core implements the following functionality, which may be new to users migrating from earlier
implementations of the e200 core family. Many of these instructions are now part of the Power ISA embedded
architecture, while others are currently only implemented as EIS functionality in Freescale processors.
• The Power ISA isel instruction described in Section 3.9, “ISEL instruction,” and also in the EREF.
• The Power ISA Enhanced Debug Functionality and the Debug Notify Halt instructions described
in Section 3.10, “Enhanced Debug,” and also in the EREF.
• The Power ISA Machine Check functionality described in Section 3.11, “Machine Check,” and
also in the EREF.
• The Power ISA wait instruction described in Section 3.12, “WAIT Instruction.”
• The volatile context save/restore unit, which is described in Section 3.14, “Volatile Context
Save/Restore Unit.”
• The Power ISA embedded floating-point unit, described along with supporting instructions in
Chapter 5, “Embedded Floating-Point Unit.” The EFPU is a subset of the SPE.
• The Power ISA Signal Processing Extension (SPE) version, described along with supporting
instructions in Chapter 6, “Signal Processing Extension (SPE).”
• The Power ISA performance monitor functionality, which is described in Chapter 8, “Performance
Monitor,” and also described in the EREF.
• The Power ISA cache line-locking functionality described in Section 9.10, “Cache Management
Instructions,” and also in the EREF.
• The enhanced reservations functionality described in Section 3.13, “Enhanced Reservations.”
3.9 ISEL instruction
The isel instruction provides a means to select one of two registers and place the result in a destination
register under the control of a predicate value supplied by a bit in the condition register. This instruction
can be used to eliminate branches in software and in many cases improve performance. This instruction
can also increase program execution time determinism by eliminating the need to predict the target and
direction of the branches replaced by the integer select function. The instruction form and definition is as
follows.
isel
Integer Select
isel RT, RA, RB, crb
Bits    0–5   6–10   11–15   16–20   21–25   26–30   31
Field   31    RT     RA      RB      crb     01111   0
if RA=0 then a ← ³²0 else a ← GPR(RA)
c = CR[crb]
if c then GPR(RT) ← a
else GPR(RT) ← GPR(RB)
For isel, if the bit of the CR specified by (crb) is set, the contents of RA | 0 are copied into RT. If the bit of
the CR specified by (crb) is clear, the contents of RB are copied into RT.
Other registers altered:
• None
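The isel definition above can be modeled in a few lines. This is an illustrative sketch: the register file is a Python list, the CR is held as a 32-bit integer whose most significant bit corresponds to crb = 0 (an assumption of this model), and the function name simply mirrors the mnemonic.

```python
MASK32 = 0xFFFFFFFF


def isel(gpr: list, cr: int, rt: int, ra: int, rb: int, crb: int) -> None:
    """GPR(RT) <- (RA|0) if CR bit crb is set, else GPR(RB)."""
    a = 0 if ra == 0 else gpr[ra]
    c = (cr >> (31 - crb)) & 1  # crb = 0 selects the most significant CR bit
    gpr[rt] = (a if c else gpr[rb]) & MASK32


# Branch-free maximum: assume a prior compare of r4 with r5 set CR0[GT] (crb = 1).
gpr = [0] * 32
gpr[4], gpr[5] = 10, 3
cr_gt = 0b0100 << 28            # CR0 = LT GT EQ SO = 0100
isel(gpr, cr_gt, rt=3, ra=4, rb=5, crb=1)
print(gpr[3])
```

The same pattern replaces a compare-and-branch pair with a compare followed by isel, which removes one prediction from the instruction stream.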
3.10 Enhanced Debug
The e200z7 implements the Power ISA embedded debug architecture to support the capability to handle
the debug interrupt as an additional interrupt level. To support this interrupt level, a new return from debug
interrupt (rfdi, se_rfdi) instruction is defined as part of the debug APU, along with a new pair of
save/restore registers, DSRR0 and DSRR1.
When the debug capability is enabled (HID0[DAPUEN] = 1), the rfdi or se_rfdi instruction provides a
means to return from a debug interrupt. See Section 2.4.11, “Hardware Implementation Dependent
Register 0 (HID0),” for more information about enabling the debug functionality.
The instruction form and definition is as follows:
rfdi
Return From Debug Interrupt
rfdi
Bits    0–5   6–20   21–30        31
Field   19    ///    0000100111   0
MSR ← DSRR1
PC ← DSRR0[0:30] || 0b0
The rfdi instruction is used to return from a debug interrupt, or as a means of simultaneously establishing
a new context and synchronizing on that new context.
The contents of debug save/restore register 1 are placed into the machine state register. If the new machine
state register value does not enable any pending exceptions, then the next instruction is fetched, under
control of the new machine state register value from the address DSRR0[0–30] || 0b0. If the new machine
state register value enables one or more pending exceptions, the interrupt associated with the highest
priority pending exception is generated; in this case the value placed into save/restore register 0 or critical
save/restore register 0 by the interrupt processing mechanism is the address of the instruction that would
have been executed next had the interrupt not occurred (that is, the address in debug save/restore register 0
at the time of the execution of the rfdi).
Execution of this instruction is privileged and context synchronizing.
Special registers altered:
• MSR
When the debug unit is disabled (HID0[DAPUEN] = 0), this instruction is treated as an illegal instruction.
se_rfdi
Return From Debug Interrupt
se_rfdi
Bits    0–15
Field   0000 0000 0000 1010
MSR ← DSRR1
PC ← DSRR0[32:62] || 0b0
The rfdi or se_rfdi instruction is used to return from a debug interrupt or as a means of simultaneously
establishing a new context and synchronizing on that new context.
The contents of debug save/restore register 1 are placed into the machine state register. If the new machine
state register value does not enable any pending exceptions, then the next instruction is fetched, under
control of the new machine state register value from the address DSRR0[32–62] || 0b0. If the new machine
state register value enables one or more pending exceptions, the interrupt associated with the highest
priority pending exception is generated; in this case the value placed into save/restore register 0 or critical
save/restore register 0 by the interrupt processing mechanism is the address of the instruction that would
have been executed next had the interrupt not occurred (that is, the address in debug save/restore register 0
at the time of the execution of the rfdi or se_rfdi).
Execution of this instruction is privileged and context synchronizing.
Special registers altered:
• MSR
When the debug unit is disabled (HID0[DAPUEN] = 0), this instruction is treated as an illegal instruction.
3.10.1 Debug Notify Halt Instructions
The dnh, e_dnh, and se_dnh instructions provide a bridge between the execution of instructions on the
core in a non-halted mode, and an external debug facility. dnh, e_dnh, and se_dnh allow software to
transition the core from a running state to a debug halted state if enabled by an external debugger, and dnh
provides the external debugger with bits reserved in the instruction itself to pass additional information.
For e200z760n3, when the CPU enters a debug halted state due to a dnh, e_dnh, or se_dnh instruction,
the instruction is stored in the CPUSCR[IR] portion. The CPUSCR[PC] value points to the instruction.
The external debugger should update the CPUSCR prior to exiting the debug halted state to point past the
dnh, e_dnh, or se_dnh instruction.
Note that the dnh instruction is only available in Power ISA embedded category instruction pages, and the
e_dnh and se_dnh instructions are only available in VLE instruction pages.
dnh
Debugger Notify Halt
dnh dui, duis
Bits    0–5      6–10   11–20   21–30        31
Field   010011   dui    duis    0011000110   /
if EDBCR[DNH_EN] = 1 then
    implementation dependent register ← dui
    halt processor
else
    illegal instruction exception
Execution of the dnh instruction causes the processor to halt if the external debug facility has enabled such
action by previously setting EDBCR[DNH_EN]. If the processor is halted, the contents of the dui field are
provided to the external debug facility to identify the reason for the halt.
If EDBCR[DNH_EN] has not been previously set by the external debug facility, executing the dnh
instruction produces an illegal instruction exception.
The duis field is provided to pass additional information about the halt, but requires that actions be
performed by the external debug facility to access the dnh instruction to read the contents of the field.
The dnh instruction is not privileged, and executes the same regardless of the state of MSR[PR].
The current state of the processor debug facility, whether the processor is in IDM or EDM mode, has no
effect on the execution of the dnh instruction.
Other registers altered:
• None.
Software Note: After the dnh instruction has executed, the instruction itself can be read back by the Illegal
Instruction Interrupt handler or the external debug facility if the contents of the dui and duis field are of
interest. If the processor entered the Illegal Instruction Interrupt handler, software can use SRR0 to obtain
the address of the dnh instruction which caused the handler to be invoked. If the processor is halted in
debug mode, the external debug facility can access the CPUSCR register to obtain the dnh instruction
which caused the processor to halt.
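Once a handler or debugger has read the instruction word back, the dui and duis fields can be extracted by bit position. The sketch below assumes the field layout shown in the instruction forms in this section (dui in bits 6–10, duis in bits 11–20, with bit 0 as the most significant bit); the function names are hypothetical and the reserved bit 31 is ignored.

```python
def decode_dnh(word: int):
    """Return (mnemonic, dui, duis) for a 32-bit dnh or e_dnh word, else None."""
    opcode = (word >> 26) & 0x3F   # bits 0-5
    dui = (word >> 21) & 0x1F      # bits 6-10
    duis = (word >> 11) & 0x3FF    # bits 11-20
    xo = (word >> 1) & 0x3FF       # bits 21-30
    if (opcode, xo) == (0b010011, 0b0011000110):
        return ("dnh", dui, duis)
    if (opcode, xo) == (0b011111, 0b0001100001):
        return ("e_dnh", dui, duis)
    return None


def encode_dnh(dui: int, duis: int) -> int:
    """Build a dnh word, used here only to exercise the decoder."""
    return (0b010011 << 26) | ((dui & 0x1F) << 21) | ((duis & 0x3FF) << 11) | (0b0011000110 << 1)


print(decode_dnh(encode_dnh(5, 0x155)))
```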
e_dnh
Debugger Notify Halt
e_dnh dui, duis
Bits    0–5      6–10   11–20   21–30        31
Field   011111   dui    duis    0001100001   /
if EDBCR[DNH_EN] = 1 then
    implementation dependent register ← dui
    halt processor
else
    illegal instruction exception
Execution of the e_dnh instruction causes the processor to halt if the external debug facility has enabled
such action by previously setting EDBCR[DNH_EN]. If the processor is halted, the contents of the dui
field are provided to the external debug facility to identify the reason for the halt.
If EDBCR[DNH_EN] has not been previously set by the external debug facility, executing the e_dnh
instruction produces an illegal instruction exception.
The duis field is provided to pass additional information about the halt, but requires that actions be
performed by the external debug facility to access the e_dnh instruction to read the contents of the field.
The e_dnh instruction is not privileged, and executes the same regardless of the state of MSR[PR].
The current state of the processor debug facility, whether the processor is in IDM or EDM mode, has no
effect on the execution of the e_dnh instruction.
Other registers altered:
• None
se_dnh
Debugger Notify Halt
se_dnh
Bits    0–15
Field   0000 0000 0000 1111
if EDBCR[DNH_EN] = 1 then
    halt processor
else
    illegal instruction exception
Execution of the se_dnh instruction causes the processor to halt if the external debug facility has enabled
such action by previously setting EDBCR[DNH_EN].
If EDBCR[DNH_EN] has not been previously set by the external debug facility, executing the se_dnh
instruction produces an illegal instruction exception.
The se_dnh instruction is not privileged, and executes the same regardless of the state of MSR[PR].
The current state of the processor debug facility, whether the processor is in IDM or EDM mode, has no
effect on the execution of the se_dnh instruction.
Other registers altered:
• None.
3.11 Machine Check
The e200z7 implements the Power ISA embedded category machine check functionality to support the
capability to handle the machine check interrupt as an additional interrupt level. To support this interrupt
level, a new Return From Machine Check Interrupt (rfmci, se_rfmci) instruction is defined as part of the
machine check capability, along with a new pair of save/restore registers (MCSRR0 and MCSRR1), a
machine check syndrome register (MCSR), and a machine check address register (MCAR).
The rfmci and se_rfmci instructions provide a means to return from a machine check interrupt. The
instruction forms and definitions are as follows:
rfmci
Return From Machine Check Interrupt
rfmci
Bits    0–5   6–20   21–30        31
Field   19    ///    0000100110   0
MSR ← MCSRR1
PC ← MCSRR0[0:30] || 0b0
The rfmci instruction is used to return from a machine check interrupt, or as a means of simultaneously
establishing a new context and synchronizing on that new context.
The contents of machine check save/restore register 1 are placed into the machine state register. If the new
machine state register value does not enable any pending exceptions, then the next instruction is fetched,
under control of the new machine state register value from the address MCSRR0[0:30] || 0b0. If the new
machine state register value enables one or more pending exceptions, the interrupt associated with the
highest priority pending exception is generated; in this case the value placed into the appropriate
save/restore register 0 by the interrupt processing mechanism is the address of the instruction that would
have been executed next had the interrupt not occurred (that is, the address in machine check save/restore
register 0 at the time of the execution of the rfmci).
Execution of this instruction is privileged and context synchronizing.
Special registers altered:
• MSR
NOTE
This instruction is only available in 32-bit Power ISA embedded category
instruction pages. It is not available in VLE instruction pages.
se_rfmci
Return From Machine Check Interrupt
se_rfmci
Bits    0–15
Field   0000 0000 0000 1011
MSR ← MCSRR1
PC ← MCSRR0[0:30] || 0b0
The se_rfmci instruction is used to return from a machine check interrupt, or as a means of simultaneously
establishing a new context and synchronizing on that new context.
The contents of machine check save/restore register 1 are placed into the machine state register. If the new
machine state register value does not enable any pending exceptions, then the next instruction is fetched,
under control of the new machine state register value from the address MCSRR0[0–30] || 0b0. If the new
machine state register value enables one or more pending exceptions, the interrupt associated with the
highest priority pending exception is generated; in this case the value placed into the appropriate
save/restore register 0 by the interrupt processing mechanism is the address of the instruction that would
have been executed next had the interrupt not occurred (that is, the address in machine check save/restore
register 0 at the time of the execution of the se_rfmci).
Execution of this instruction is privileged and context synchronizing.
Special registers altered:
• MSR
NOTE
This instruction is only available in VLE instruction pages. It is not
available in 32-bit Power ISA embedded category instruction pages.
3.12 WAIT Instruction
The wait instruction allows software to cease all synchronous activity, waiting for an asynchronous
interrupt or debug interrupt to occur. The instruction can be used to cease processor activity in both user
and supervisor modes. The asynchronous interrupts that cause the waiting state to be exited, if enabled,
are critical input, external input, and machine check pin (p_mcp_b). Nonmaskable interrupts (p_nmi_b) also
cause the waiting state to be exited.
wait
Wait for Interrupt
wait
Bits    0–5      6–20   21–30        31
Field   011111   ///    0000111110   /
The wait instruction provides an ordering function for the effects of all instructions executed by the
processor executing the wait instruction and stops synchronous processor activity. Executing a wait
instruction ensures that all instructions have completed before the wait instruction completes, causes
processor instruction fetching to cease, and ensures that no subsequent instructions are initiated until an
asynchronous interrupt or a debug interrupt occurs.
Once the wait instruction has completed, the program counter will point to the next sequential instruction.
The saved value in xSRR0 when the processor re-initiates activity will point to the instruction following
the wait instruction.
Execution of a wait instruction places the CPU in the waiting state and is indicated by assertion of the
p_waiting output signal. The signal will be negated after leaving the waiting state.
Software must ensure that interrupts responsible for exiting the waiting state are enabled before executing
a wait instruction.
Architecture Note: The wait instruction can be used in verification test cases to signal the end of a test
case. The encoding for the instruction is the same in both big- and little-endian modes.
3.13 Enhanced Reservations
The e200 implements the Freescale EIS enhanced reservations functionality that extends the load and
reserve and store conditional instructions to support byte and half-word data types. These instructions
operate in the same manner as the lwarx and stwcx. instructions, except for the size of the access.
lbarx
Load Byte And Reserve Indexed
lbarx RT,RA,RB (X-mode)
Bits    0–5      6–10   11–15   16–20   21–30        31
Field   011111   RT     RA      RB      0000110100   /
if RA=0 then a ← ⁶⁴0 else a ← GPR(RA)
if X-mode then EA ← ³²0 || (a + GPR(RB))[32:63]
RESERVE ← 1
RESERVE_ADDR ← real_addr(EA)
GPR(RT) ← ⁵⁶0 || MEM(EA,1)
Let the effective address (EA) be calculated as follows:
• For lbarx, let EA be 32 zeros concatenated with bits 32–63 of the sum of the contents of GPR(RA),
or 64 zeros if RA = 0, and the contents of GPR(RB).
The byte in storage addressed by EA is loaded into GPR(RT)[56–63]. GPR(RT)[0–55] are set to zero.
This instruction creates a reservation for use by a store byte conditional instruction. An address computed
from the EA is associated with the reservation and replaces any address previously associated with the
reservation.
Special registers altered:
• None
lharx
Load Half Word And Reserve Indexed
lharx RT,RA,RB (X-mode)
Bits    0–5      6–10   11–15   16–20   21–30        31
Field   011111   RT     RA      RB      0001110100   /
if RA=0 then a ← ⁶⁴0 else a ← GPR(RA)
EA ← ³²0 || (a + GPR(RB))[32:63]
RESERVE ← 1
RESERVE_ADDR ← real_addr(EA)
GPR(RT) ← ⁴⁸0 || MEM(EA,2)
Let the effective address (EA) be calculated as follows:
• For lharx, let EA be 32 zeros concatenated with bits 32–63 of the sum of the contents of GPR(RA),
or 64 zeros if RA = 0, and the contents of GPR(RB).
The half-word in storage addressed by EA is loaded into GPR(RT)[48–63]. GPR(RT)[0–47] are set to
zero.
This instruction creates a reservation for use by a Store Half Word Conditional instruction. An address
computed from the EA is associated with the reservation and replaces any address previously associated
with the reservation.
EA must be a multiple of 2. If it is not, either an alignment interrupt is invoked or the results are boundedly
undefined.
Special registers altered:
• None
stbcx.
Store Byte Conditional Indexed
stbcx. RS,RA,RB (X-mode)
Bits    0–5      6–10   11–15   16–20   21–30        31
Field   011111   RS     RA      RB      1010110110   1
if RA=0 then a ← ⁶⁴0 else a ← GPR(RA)
EA ← ³²0 || (a + GPR(RB))[32:63]
if RESERVE then
    if RESERVE_ADDR = real_addr(EA) then
        MEM(EA,1) ← GPR(RS)[56:63]
        CR0 ← 0b00 || 0b1 || XER[SO]
    else
        u ← undefined 1-bit value
        if u then MEM(EA,1) ← GPR(RS)[56:63]
        CR0 ← 0b00 || u || XER[SO]
    RESERVE ← 0
else
    CR0 ← 0b00 || 0b0 || XER[SO]
Let the effective address (EA) be calculated as follows:
• For stbcx., let EA be 32 zeros concatenated with bits 32–63 of the sum of the contents of GPR(RA),
or 64 zeros if RA = 0, and the contents of GPR(RB).
If a reservation exists and the storage address specified by the stbcx. is the same as that specified by the
lbarx instruction that established the reservation, the contents of bits 56–63 of GPR(RS) are stored into
the byte in storage addressed by EA and the reservation is cleared.
If a reservation exists but the storage address specified by the stbcx. is not the same as that specified by
the Load and Reserve instruction that established the reservation, the reservation is cleared, and it is
undefined whether the instruction completes without altering storage.
If a reservation does not exist, the instruction completes without altering storage.
CR Field 0 is set to reflect whether the store operation was performed, as follows.
CR0[LT GT EQ SO] = 0b00 || store_performed || XER[SO]
Special registers altered:
• CR0
sthcx.
Store Half Word Conditional Indexed
sthcx. RS,RA,RB (X-mode)
Bits    0–5      6–10   11–15   16–20   21–30        31
Field   011111   RS     RA      RB      1011010110   1
if RA=0 then a ← ⁶⁴0 else a ← GPR(RA)
EA ← ³²0 || (a + GPR(RB))[32:63]
if RESERVE then
    if RESERVE_ADDR = real_addr(EA) then
        MEM(EA,2) ← GPR(RS)[48:63]
        CR0 ← 0b00 || 0b1 || XER[SO]
    else
        u ← undefined 1-bit value
        if u then MEM(EA,2) ← GPR(RS)[48:63]
        CR0 ← 0b00 || u || XER[SO]
    RESERVE ← 0
else
    CR0 ← 0b00 || 0b0 || XER[SO]
Let the effective address (EA) be calculated as follows:
• For sthcx., let EA be 32 zeros concatenated with bits 32–63 of the sum of the contents of GPR(RA),
or 64 zeros if RA = 0, and the contents of GPR(RB).
If a reservation exists and the storage address specified by the sthcx. is the same as that specified by the
lharx instruction that established the reservation, the contents of bits 48–63 of GPR(RS) are stored into
the half-word in storage addressed by EA and the reservation is cleared.
If a reservation exists but the storage address specified by the sthcx. is not the same as that specified by
the Load and Reserve instruction that established the reservation, the reservation is cleared, and it is
undefined whether the instruction completes without altering storage.
If a reservation does not exist, the instruction completes without altering storage.
CR Field 0 is set to reflect whether the store operation was performed, as follows.
CR0[LT GT EQ SO] = 0b00 || store_performed || XER[SO]
EA must be a multiple of 2. If it is not, either an alignment interrupt is invoked or the results are boundedly
undefined.
Special registers altered:
• CR0
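A typical use of the byte and half-word reservation instructions is a read-modify-write retry loop that tests CR0[EQ] after the conditional store. The sketch below models that loop for a byte. The function names are hypothetical, memory is a bytearray, and the interfere_once flag is an artificial stand-in for a reservation that is lost between the load and the store.

```python
def cr0_after_store_conditional(store_performed: bool, xer_so: int) -> int:
    """CR0[LT GT EQ SO] = 0b00 || store_performed || XER[SO], as a 4-bit value."""
    return (int(store_performed) << 1) | (xer_so & 1)


def atomic_byte_add(memory: bytearray, addr: int, delta: int,
                    interfere_once: bool = False) -> int:
    """Retry an lbarx / stbcx. style update until CR0[EQ] is set; return attempts."""
    attempts = 0
    while True:
        attempts += 1
        reserve = True                    # lbarx: RESERVE <- 1
        old = memory[addr]                # lbarx: load the byte
        if interfere_once and attempts == 1:
            reserve = False               # reservation cleared before the store
        if reserve:
            memory[addr] = (old + delta) & 0xFF   # stbcx.: store performed
        cr0 = cr0_after_store_conditional(reserve, xer_so=0)
        if cr0 & 0b0010:                  # EQ set means the store was performed
            return attempts


mem = bytearray(16)
mem[3] = 0xFF
print(atomic_byte_add(mem, 3, 1), mem[3])
```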
3.14 Volatile Context Save/Restore Unit
The e200 implements the EIS volatile context save/restore unit to support the capability to quickly save
and restore volatile register context on entry into an interrupt handler. To support this functionality, a new
set of instructions is defined as part of the unit.
• e_lmvgprw, e_stmvgprw—load/store multiple volatile GPRs (r0, r3:r12)
• e_lmvsprw, e_stmvsprw—load/store multiple volatile SPRs (CR, LR, CTR, and XER)
• e_lmvsrrw, e_stmvsrrw—load/store multiple volatile SRRs (SRR0, SRR1)
• e_lmvcsrrw, e_stmvcsrrw—load/store multiple volatile CSRRs (CSRR0, CSRR1)
• e_lmvdsrrw, e_stmvdsrrw—load/store multiple volatile DSRRs (DSRR0, DSRR1)
• e_lmvmcsrrw, e_stmvmcsrrw —load/store multiple volatile MCSRRs (MCSRR0, MCSRR1)
These instructions are available in VLE instruction pages to perform a multiple register load or store to a
word aligned memory address.
e_lmvgprw
Load Multiple Volatile GPR Word
e_lmvgprw D8(RA) (D8-mode)
Bits    0–5      6–10    11–15   16–23      24–31
Field   000110   00000   RA      00010000   D8
if RA=0 then EA ← EXTS(D8)
else EA ← (GPR(RA)+EXTS(D8))
GPR(r0)[32:63] ← MEM(EA,4)
EA ← (EA+4)
r ← 3
do while r ≤ 12
    GPR(r)[32:63] ← MEM(EA,4)
    EA ← (EA+4)
    r ← r + 1
Let the effective address (EA) be the sum of the content of GPR(RA) and the sign-extended value of the
D8 instruction field. If RA = 0, the content of GPR(RA) equals 0 and EA is the sign-extended value of the
D8 instruction field.
Bits 32–63 of registers GPR(R0), and GPR(R3) through GPR(R12) are loaded from 11 consecutive words in
storage starting at address EA.
EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly
undefined.
Special registers altered:
• None
e_stmvgprw
Store Multiple Volatile GPR Word
e_stmvgprw D8(RA) (D8-mode)
Bits    0–5      6–10    11–15   16–23      24–31
Field   000110   00000   RA      00010001   D8
if RA=0 then EA ← EXTS(D8)
else EA ← (GPR(RA)+EXTS(D8))
MEM(EA,4) ← GPR(r0)[32:63]
EA ← (EA+4)
r ← 3
do while r ≤ 12
    MEM(EA,4) ← GPR(r)[32:63]
    r ← r + 1
    EA ← (EA+4)
Let the effective address (EA) be the sum of the content of GPR(RA) and the sign-extended value of the
D8 instruction field. If RA = 0, the content of GPR(RA) equals 0 and EA is the sign-extended value of the
D8 instruction field.
Bits 32–63 of registers GPR(R0), and GPR(R3) through GPR(R12) are stored in 11 consecutive words in
storage starting at address EA.
EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly
undefined.
Special registers altered:
• None
e_lmvsprw
Load Multiple Volatile SPR Word
e_lmvsprw D8(RA) (D8-mode)
Bit:    0–5          6–10       11–15  16–23            24–31
Field:  0 0 0 1 1 0  0 0 0 0 1  RA     0 0 0 1 0 0 0 0  D8
if RA=0 then EA ← EXTS(D8)
else EA ← (GPR(RA)+EXTS(D8))
CR[32:63] ← MEM(EA,4)
EA ← (EA+4)
LR[32:63] ← MEM(EA,4)
EA ← (EA+4)
CTR[32:63] ← MEM(EA,4)
EA ← (EA+4)
XER[32:63] ← MEM(EA,4)
Let the effective address (EA) be the sum of the content of GPR(RA) and the sign-extended value of the
D8 instruction field. If RA = 0, the content of GPR(RA) equals 0 and EA is the sign-extended value of the
D8 instruction field.
Bits 32–63 of registers CR, LR, CTR, and XER are loaded from four consecutive words in storage starting at
address EA.
EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly
undefined.
Special registers altered:
CR, LR, CTR, XER
NOTE
If the EA is misaligned and the e_lmvsprw is followed by either a branch to
link register or a branch to count register within 4 instructions, the core can
lock up during exception handling for the misalignment. To avoid this issue,
do not use a misaligned EA with e_lmvsprw, or ensure that at least 4 instructions
separate the e_lmvsprw from the branch to LR or CTR. This issue does not
apply to Book E applications.
e_stmvsprw
Store Multiple Volatile SPR Word
e_stmvsprw D8(RA) (D8-mode)
Bit:    0–5          6–10       11–15  16–23            24–31
Field:  0 0 0 1 1 0  0 0 0 0 1  RA     0 0 0 1 0 0 0 1  D8
if RA=0 then EA ← EXTS(D8)
else EA ← (GPR(RA)+EXTS(D8))
MEM(EA,4) ← CR[32:63]
EA ← (EA+4)
MEM(EA,4) ← LR[32:63]
EA ← (EA+4)
MEM(EA,4) ← CTR[32:63]
EA ← (EA+4)
MEM(EA,4) ← XER[32:63]
Let the effective address (EA) be the sum of the content of GPR(RA) and the sign-extended value of the
D8 instruction field. If RA = 0, the content of GPR(RA) equals 0 and EA is the sign-extended value of the
D8 instruction field.
Bits 32–63 of registers CR, LR, CTR, and XER are stored in four consecutive words in storage starting at
address EA.
EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly
undefined.
Special registers altered:
None
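As an illustration of the memory image that e_stmvsprw produces and e_lmvsprw consumes, the four words can be described by a C structure. The type name `vle_spr_frame_t` is hypothetical; only the field order and the 4-byte spacing come from the operation described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the four words written by e_stmvsprw,
 * in the order the instruction stores them. */
typedef struct {
    uint32_t cr;   /* EA + 0  */
    uint32_t lr;   /* EA + 4  */
    uint32_t ctr;  /* EA + 8  */
    uint32_t xer;  /* EA + 12 */
} vle_spr_frame_t;
```

A handler written in C could overlay such a structure on the save area, provided the area is word aligned as the instruction requires.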
e_lmvsrrw
Load Multiple Volatile SRR Word
e_lmvsrrw D8(RA) (D8-mode)
Bit:    0–5          6–10       11–15  16–23            24–31
Field:  0 0 0 1 1 0  0 0 1 0 0  RA     0 0 0 1 0 0 0 0  D8
if RA=0 then EA ← EXTS(D8)
else EA ← (GPR(RA)+EXTS(D8))
SRR0[32:63] ← MEM(EA,4)
EA ← (EA+4)
SRR1[32:63] ← MEM(EA,4)
Let the effective address (EA) be the sum of the content of GPR(RA) and the sign-extended value of the
D8 instruction field. If RA = 0, the content of GPR(RA) equals 0 and EA is the sign-extended value of the
D8 instruction field.
Bits 32–63 of registers SRR0 and SRR1 are loaded from consecutive words in storage starting at address
EA.
EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly
undefined.
Special registers altered:
SRR0, SRR1
e_stmvsrrw
Store Multiple Volatile SRR Word
e_stmvsrrw D8(RA) (D8-mode)
Bit:    0–5          6–10       11–15  16–23            24–31
Field:  0 0 0 1 1 0  0 0 1 0 0  RA     0 0 0 1 0 0 0 1  D8
if RA=0 then EA ← EXTS(D8)
else EA ← (GPR(RA)+EXTS(D8))
MEM(EA,4) ← SRR0[32:63]
EA ← (EA+4)
MEM(EA,4) ← SRR1[32:63]
Let the effective address (EA) be the sum of the content of GPR(RA) and the sign-extended value of the
D8 instruction field. If RA = 0, the content of GPR(RA) equals 0 and EA is the sign-extended value of the
D8 instruction field.
Bits 32–63 of registers SRR0 and SRR1 are stored in consecutive words in storage starting at address EA.
EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly
undefined.
Special registers altered:
None
e_lmvcsrrw
Load Multiple Volatile CSRR Word
e_lmvcsrrw D8(RA) (D8-mode)
Bit:    0–5          6–10       11–15  16–23            24–31
Field:  0 0 0 1 1 0  0 0 1 0 1  RA     0 0 0 1 0 0 0 0  D8
if RA=0 then EA ← EXTS(D8)
else EA ← (GPR(RA)+EXTS(D8))
CSRR0[32:63] ← MEM(EA,4)
EA ← (EA+4)
CSRR1[32:63] ← MEM(EA,4)
Let the effective address (EA) be the sum of the content of GPR(RA) and the sign-extended value of the
D8 instruction field. If RA = 0, the content of GPR(RA) equals 0 and EA is the sign-extended value of the
D8 instruction field.
Bits 32–63 of registers CSRR0 and CSRR1 are loaded from consecutive words in storage starting at
address EA.
EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly
undefined.
Special registers altered:
CSRR0, CSRR1
e_stmvcsrrw
Store Multiple Volatile CSRR Word
e_stmvcsrrw D8(RA) (D8-mode)
Bit:    0–5          6–10       11–15  16–23            24–31
Field:  0 0 0 1 1 0  0 0 1 0 1  RA     0 0 0 1 0 0 0 1  D8
if RA=0 then EA ← EXTS(D8)
else EA ← (GPR(RA)+EXTS(D8))
MEM(EA,4) ← CSRR0[32:63]
EA ← (EA+4)
MEM(EA,4) ← CSRR1[32:63]
Let the effective address (EA) be the sum of the content of GPR(RA) and the sign-extended value of the
D8 instruction field. If RA = 0, the content of GPR(RA) equals 0 and EA is the sign-extended value of the
D8 instruction field.
Bits 32–63 of registers CSRR0 and CSRR1 are stored in consecutive words in storage starting at address
EA.
EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly
undefined.
Special registers altered:
None
e_lmvdsrrw
Load Multiple Volatile DSRR Word
e_lmvdsrrw D8(RA) (D8-mode)
Bit:    0–5          6–10       11–15  16–23            24–31
Field:  0 0 0 1 1 0  0 0 1 1 0  RA     0 0 0 1 0 0 0 0  D8
if RA=0 then EA ← EXTS(D8)
else EA ← (GPR(RA)+EXTS(D8))
DSRR0[32:63] ← MEM(EA,4)
EA ← (EA+4)
DSRR1[32:63] ← MEM(EA,4)
Let the effective address (EA) be the sum of the content of GPR(RA) and the sign-extended value of the
D8 instruction field. If RA = 0, the content of GPR(RA) equals 0 and EA is the sign-extended value of the
D8 instruction field.
Bits 32–63 of registers DSRR0 and DSRR1 are loaded from consecutive words in storage starting at
address EA.
EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly
undefined.
Special registers altered:
DSRR0, DSRR1
e_stmvdsrrw
Store Multiple Volatile DSRR Word
e_stmvdsrrw D8(RA) (D8-mode)
Bit:    0–5          6–10       11–15  16–23            24–31
Field:  0 0 0 1 1 0  0 0 1 1 0  RA     0 0 0 1 0 0 0 1  D8
if RA=0 then EA ← EXTS(D8)
else EA ← (GPR(RA)+EXTS(D8))
MEM(EA,4) ← DSRR0[32:63]
EA ← (EA+4)
MEM(EA,4) ← DSRR1[32:63]
Let the effective address (EA) be the sum of the content of GPR(RA) and the sign-extended value of the
D8 instruction field. If RA = 0, the content of GPR(RA) equals 0 and EA is the sign-extended value of the
D8 instruction field.
Bits 32–63 of registers DSRR0 and DSRR1 are stored in consecutive words in storage starting at address
EA.
EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly
undefined.
Special registers altered:
None
e_lmvmcsrrw
Load Multiple Volatile MCSRR Word
e_lmvmcsrrw D8(RA) (D8-mode)
Bit:    0–5          6–10       11–15  16–23            24–31
Field:  0 0 0 1 1 0  0 0 1 1 1  RA     0 0 0 1 0 0 0 0  D8
if RA=0 then EA ← EXTS(D8)
else EA ← (GPR(RA)+EXTS(D8))
MCSRR0[32:63] ← MEM(EA,4)
EA ← (EA+4)
MCSRR1[32:63] ← MEM(EA,4)
Let the effective address (EA) be the sum of the content of GPR(RA) and the sign-extended value of the
D8 instruction field. If RA = 0, the content of GPR(RA) equals 0 and EA is the sign-extended value of the
D8 instruction field.
Bits 32–63 of registers MCSRR0 and MCSRR1 are loaded from consecutive words in storage starting at
address EA.
EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly
undefined.
Special registers altered:
MCSRR0, MCSRR1
e_stmvmcsrrw
Store Multiple Volatile MCSRR Word
e_stmvmcsrrw D8(RA) (D8-mode)
Bit:    0–5          6–10       11–15  16–23            24–31
Field:  0 0 0 1 1 0  0 0 1 1 1  RA     0 0 0 1 0 0 0 1  D8
if RA=0 then EA ← EXTS(D8)
else EA ← (GPR(RA)+EXTS(D8))
MEM(EA,4) ← MCSRR0[32:63]
EA ← (EA+4)
MEM(EA,4) ← MCSRR1[32:63]
Let the effective address (EA) be the sum of the content of GPR(RA) and the sign-extended value of the
D8 instruction field. If RA = 0, the content of GPR(RA) equals 0 and EA is the sign-extended value of the
D8 instruction field.
Bits 32–63 of registers MCSRR0 and MCSRR1 are stored in consecutive words in storage starting at
address EA.
EA must be a multiple of 4. If it is not, either an Alignment interrupt is invoked or the results are boundedly
undefined.
Special registers altered:
None
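The ten volatile context instructions share one D8-mode layout: primary opcode 000110 in bits 0–5, a register-group selector in bits 6–10, RA in bits 11–15, a load/store code in bits 16–23, and D8 in bits 24–31. The following C sketch assembles an instruction word from those fields. The enum and function names are hypothetical, and the sketch is meant only for cross-checking the bit patterns shown in the encodings above, not as an assembler.

```c
#include <stdint.h>

/* Bits 6-10 of the encoding select the register group (hypothetical names). */
enum vle_volatile_group {
    VLE_GRP_GPR   = 0,  /* r0, r3..r12        */
    VLE_GRP_SPR   = 1,  /* CR, LR, CTR, XER   */
    VLE_GRP_SRR   = 4,  /* SRR0, SRR1         */
    VLE_GRP_CSRR  = 5,  /* CSRR0, CSRR1       */
    VLE_GRP_DSRR  = 6,  /* DSRR0, DSRR1       */
    VLE_GRP_MCSRR = 7   /* MCSRR0, MCSRR1     */
};

/* Build the 32-bit word for e_lmv*w (is_store == 0) or e_stmv*w (is_store != 0). */
static uint32_t vle_encode_volatile(enum vle_volatile_group group, int is_store,
                                    unsigned ra, uint8_t d8)
{
    uint32_t word = 0x06u << 26;                 /* bits 0-5:   000110              */
    word |= (uint32_t)group << 21;               /* bits 6-10:  register group      */
    word |= (uint32_t)(ra & 31u) << 16;          /* bits 11-15: RA                  */
    word |= (is_store ? 0x11u : 0x10u) << 8;     /* bits 16-23: 00010001 / 00010000 */
    word |= d8;                                  /* bits 24-31: D8                  */
    return word;
}
```

For example, under these field assignments e_stmvgprw 0(r1) assembles to 0x18011100 and e_lmvgprw 0(r1) to 0x18011000.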
3.15 Unimplemented SPRs and Read-Only SPRs
The e200 fully decodes the SPR field of the mfspr and mtspr instructions. If the SPR specified is
undefined and not privileged, an illegal instruction exception is generated. If the SPR specified is
undefined and privileged and the CPU is in user mode (MSR[PR] = 1), a privileged instruction exception
is generated. If the SPR specified is undefined and privileged and the CPU is in supervisor mode
(MSR[PR] = 0), an illegal instruction exception is generated.
For the mtspr instruction, if the SPR specified is read-only and not privileged, an illegal instruction
exception is generated. If the SPR specified is read-only and privileged and the CPU is in user mode
(MSR[PR] = 1), a privileged instruction exception is generated. If the SPR specified is read-only and
privileged and the CPU is in supervisor mode (MSR[PR] = 0), an illegal instruction exception is generated.
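The rules above reduce to a small decision function, sketched here in C. The type and function names are hypothetical, and the sketch covers only the two cases this section describes (an undefined SPR, or an mtspr to a read-only SPR); all other accesses are reported as not covered rather than modeled.

```c
#include <stdbool.h>

typedef enum {
    SPR_NOT_COVERED,        /* access is outside the rules of this section */
    SPR_ILLEGAL_INSTR,      /* illegal instruction exception               */
    SPR_PRIVILEGED_INSTR    /* privileged instruction exception            */
} spr_outcome_t;

/* user_mode corresponds to MSR[PR] = 1; supervisor mode to MSR[PR] = 0. */
static spr_outcome_t spr_access_outcome(bool is_mtspr, bool spr_defined,
                                        bool spr_read_only, bool spr_privileged,
                                        bool user_mode)
{
    bool faulting = !spr_defined || (is_mtspr && spr_read_only);
    if (!faulting)
        return SPR_NOT_COVERED;
    /* Only a privileged SPR accessed from user mode yields the privileged
     * instruction exception; every other faulting case is illegal. */
    if (spr_privileged && user_mode)
        return SPR_PRIVILEGED_INSTR;
    return SPR_ILLEGAL_INSTR;
}
```

Note that the privilege check takes priority over the illegal-instruction outcome only when the SPR number is privileged and the core is in user mode; in supervisor mode the same access is reported as illegal.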
3.16 Invalid Forms of Instructions
This section discusses invalid forms of instructions.
3.16.1 Load and Store with Update Instructions
The Power ISA embedded category defines the case when a load with update instruction specifies the same
register in the RT and RA field of the instruction as an invalid format. For this invalid case, the e200 core
will perform the instruction and update the register with the load data. In addition, if RA = 0 for any load
or store with update instruction, the e200 core will update RA (GPR0).
3.16.2 Load Multiple Word (lmw, e_lmw) Instruction
The Power ISA embedded category defines as invalid any form of the lmw or e_lmw instruction in which
RA is in the range of registers to be loaded, including the case in which RA = 0. On the e200, invalid forms
of the lmw or e_lmw instruction are executed as follows:
• Case 1: RA is in the range of RT, RA ≠ 0. In this case, address generation for individual loads to
register targets is done using the architectural value of RA which existed when beginning execution
of this lmw or e_lmw instruction. RA will be overwritten with a value fetched from memory as if
it had not been the base register. Note that if the instruction is interrupted and restarted, the base
address may be different if RA has been overwritten.
• Case 2: RA = 0 and RT = 0. In this case, address generation for all loads to register targets RT = 0
to RT = 31 will be done substituting the value of 0 for the RA operand.
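The two cases can be illustrated with a simplified C model of lmw. This is a sketch under assumptions: memory is a hypothetical array of words indexed by EA/4, the function name `model_lmw` is invented for illustration, and interruption and restart of the instruction are not modeled.

```c
#include <stdint.h>

/* Simplified lmw model: loads GPR(rt)..GPR(31) from consecutive words.
 * mem_words[i] stands for the word at byte address 4*i (assumption). */
static void model_lmw(uint32_t gpr[32], const uint32_t *mem_words,
                      unsigned rt, unsigned ra, int32_t d)
{
    /* The base is sampled once, before any target register is written.
     * RA = 0 substitutes the value 0 for the RA operand (Case 2). */
    uint32_t base = (ra == 0) ? 0u : gpr[ra];
    uint32_t ea = base + (uint32_t)d;

    for (unsigned r = rt; r <= 31; r++) {
        /* If r == ra (Case 1), RA is overwritten with loaded data, but
         * later addresses still derive from the sampled base. */
        gpr[r] = mem_words[ea / 4];
        ea += 4;
    }
}
```

Because the base is sampled only once, a restart after RA has been overwritten would begin from a different address, which is the hazard the Case 1 text warns about.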
3.16.3 Branch Conditional to Count Register Instructions
The Power ISA embedded category defines as invalid any bcctr or bcctrl instruction which specifies the
‘decrement and test CTR’ (BO2 = 0) option. For these invalid forms, the e200 executes the
instruction by decrementing the CTR and branching to the location specified by the pre-decremented CTR
value if all CR and CTR conditions are met as specified by the other BO field settings.
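A minimal C sketch of this behavior follows. The function name is hypothetical, the evaluation of the CR and CTR conditions is passed in as a flag rather than modeled, and any masking of low-order target address bits is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models bcctr/bcctrl with the invalid 'decrement and test CTR' option.
 * Returns true if the branch is taken; *target then holds the
 * pre-decremented CTR value. CTR is decremented in either case. */
static bool model_bcctr_decrement(uint32_t *ctr, bool conditions_met,
                                  uint32_t *target)
{
    uint32_t old_ctr = *ctr;   /* value used as the branch target    */
    *ctr = old_ctr - 1u;       /* CTR is decremented unconditionally */
    if (conditions_met) {
        *target = old_ctr;
        return true;
    }
    return false;
}
```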
3.16.4 Instructions With Reserved Fields Non-Zero
The Power ISA embedded category defines certain bit fields in various instructions as reserved and
specifies that these fields be set to zero. Per the Power ISA embedded category recommendation, e200
ignores the value of the reserved field (bit 31) in X-form integer load and store instructions. The e200
ignores the value of the reserved z bits in the BO field of branch instructions. For all other instructions, the
e200 generates an illegal instruction exception if a reserved field is non-zero.
3.17 Instruction Summary
Table 3-2 and Table 3-3 list all 32-bit instructions in the Power ISA embedded category architecture as
well as certain e200-specific instructions, sorted by mnemonic. The table includes the following: format,
opcode, mnemonic, and instruction name. For e200-specific instructions, the page number is not shown.
Instructions that are not listed here, but which are part of the Power ISA embedded category, signal
an illegal instruction, unimplemented operation, or FP unavailable exception. Implementation-dependent
instructions are noted with a footnote. Instructions that are optionally supported (when an optional
function is added to the base core) are shown with shaded entries.
Note that specific areas of functionality are not included in the table below:
• Cache maintenance instructions
• SPE
• VLE
• WAIT instruction
• Enhanced reservation functionality
• Volatile context save/restore
Table 3-2 lists the instruction index sorted by mnemonic.
Table 3-2. Instructions Sorted by Mnemonic
Format  Primary      Extended       Mnemonic  Instruction
        (Inst[0–5])  (Inst[21–31])
X       011111       01000 01010 0  add       Add
X       011111       01000 01010 1  add.      Add and Record CR
X       011111       00000 01010 0  addc      Add Carrying
X       011111       00000 01010 1  addc.     Add Carrying and Record CR
X       011111       10000 01010 0  addco     Add Carrying and Record OV
X       011111       10000 01010 1  addco.    Add Carrying and Record OV & CR
X       011111       00100 01010 0  adde      Add Extended with CA
X       011111       00100 01010 1  adde.     Add Extended with CA and Record CR
X       011111       10100 01010 0  addeo     Add Extended with CA and Record OV
X       011111       10100 01010 1  addeo.    Add Extended with CA and Record OV & CR
D       001110       ----- ----- -  addi      Add Immediate
D       001100       ----- ----- -  addic     Add Immediate Carrying
D       001101       ----- ----- -  addic.    Add Immediate Carrying and Record CR
D       001111       ----- ----- -  addis     Add Immediate Shifted
X       011111       00111 01010 0  addme     Add to Minus One Extended with CA
X       011111       00111 01010 1  addme.    Add to Minus One Extended with CA and Record CR
X       011111       10111 01010 0  addmeo    Add to Minus One Extended with CA and Record OV
X       011111       10111 01010 1  addmeo.   Add to Minus One Extended with CA and Record OV & CR
X       011111       11000 01010 0  addo      Add and Record OV
X       011111       11000 01010 1  addo.     Add and Record OV and CR
X       011111       00110 01010 0  addze     Add to Zero Extended with CA
X       011111       00110 01010 1  addze.    Add to Zero Extended with CA and Record CR
X       011111       10110 01010 0  addzeo    Add to Zero Extended with CA and Record OV
X       011111       10110 01010 1  addzeo.   Add to Zero Extended with CA and Record OV & CR
X       011111       00000 11100 0  and       AND
X       011111       00000 11100 1  and.      AND and Record CR
X       011111       00001 11100 0  andc      AND with Complement
X       011111       00001 11100 1  andc.     AND with Complement and Record CR
D       011100       ----- ----- -  andi.     AND Immediate and Record CR
D       011101       ----- ----- -  andis.    AND Immediate Shifted and Record CR
I       010010       ----- ----0 0  b         Branch
I       010010       ----- ----1 0  ba        Branch Absolute
B       010000       ----- ----0 0  bc        Branch Conditional
B       010000       ----- ----1 0  bca       Branch Conditional Absolute
XL      010011       10000 10000 0  bcctr     Branch Conditional to Count Register
XL      010011       10000 10000 1  bcctrl    Branch Conditional to Count Register and Link
B       010000       ----- ----0 1  bcl       Branch Conditional and Link
B       010000       ----- ----1 1  bcla      Branch Conditional and Link Absolute
XL      010011       00000 10000 0  bclr      Branch Conditional to Link Register
XL      010011       00000 10000 1  bclrl     Branch Conditional to Link Register & Link
I       010010       ----- ----0 1  bl        Branch and Link
I       010010       ----- ----1 1  bla       Branch and Link Absolute
X       011111       00000 00000 /  cmp       Compare
D       001011       ----- ----- -  cmpi      Compare Immediate
X       011111       00001 00000 /  cmpl      Compare Logical
D       001010       ----- ----- -  cmpli     Compare Logical Immediate
X       011111       00000 11010 0  cntlzw    Count Leading Zeros Word
X       011111       00000 11010 1  cntlzw.   Count Leading Zeros Word & Record CR
XL      010011       01000 00001 /  crand     Condition Register AND
XL      010011       00100 00001 /  crandc    Condition Register AND with Complement
XL      010011       01001 00001 /  creqv     Condition Register Equivalent
XL      010011       00111 00001 /  crnand    Condition Register NAND
XL      010011       00001 00001 /  crnor     Condition Register NOR
XL      010011       01110 00001 /  cror      Condition Register OR
XL      010011       01101 00001 /  crorc     Condition Register OR with Complement
XL      010011       00110 00001 /  crxor     Condition Register XOR
X       011111       10111 10110 /  dcba      Data Cache Block Allocate
X       011111       00010 10110 /  dcbf      Data Cache Block Flush
X       011111       01110 10110 /  dcbi      Data Cache Block Invalidate
X       011111       01100 00110 /  dcblc     Data Cache Block Lock Clear
X       011111       00001 10110 /  dcbst     Data Cache Block Store
X       011111       01000 10110 /  dcbt      Data Cache Block Touch
X       011111       00101 00110 /  dcbtls    Data Cache Block Touch and Lock Set
X       011111       00111 10110 /  dcbtst    Data Cache Block Touch for Store
X       011111       00100 00110 /  dcbtstls  Data Cache Block Touch for Store and Lock Set
X       011111       11111 10110 /  dcbz      Data Cache Block Set to Zero
X       011111       01111 01011 0  divw      Divide Word
X       011111       01111 01011 1  divw.     Divide Word and Record CR
X       011111       11111 01011 0  divwo     Divide Word and Record OV
X       011111       11111 01011 1  divwo.    Divide Word and Record OV & CR
X       011111       01110 01011 0  divwu     Divide Word Unsigned
X       011111       01110 01011 1  divwu.    Divide Word Unsigned and Record CR
X       011111       11110 01011 0  divwuo    Divide Word Unsigned and Record OV
X       011111       11110 01011 1  divwuo.   Divide Word Unsigned and Record OV & CR
X       011111       01000 11100 0  eqv       Equivalent
X       011111       01000 11100 1  eqv.      Equivalent and Record CR
X       011111       11101 11010 0  extsb     Extend Sign Byte
X       011111       11101 11010 1  extsb.    Extend Sign Byte and Record CR
X       011111       11100 11010 0  extsh     Extend Sign Half Word
X       011111       11100 11010 1  extsh.    Extend Sign Half Word and Record CR
X       011111       11110 10110 /  icbi      Instruction Cache Block Invalidate
X       011111       00111 00110 /  icblc     Instruction Cache Block Lock Clear
X       011111       00000 10110 /  icbt      Instruction Cache Block Touch
X       011111       01111 00110 /  icbtls    Instruction Cache Block Touch and Lock Set
??      011111       ----- 01111 /  isel      Integer Select
XL      010011       00100 10110 /  isync     Instruction Synchronize
D       100010       ----- ----- -  lbz       Load Byte & Zero
D       100011       ----- ----- -  lbzu      Load Byte & Zero with Update
X       011111       00011 10111 /  lbzux     Load Byte & Zero with Update Indexed
X       011111       00010 10111 /  lbzx      Load Byte & Zero Indexed
D       101010       ----- ----- -  lha       Load Half Word Algebraic
D       101011       ----- ----- -  lhau      Load Half Word Algebraic with Update
X       011111       01011 10111 /  lhaux     Load Half Word Algebraic with Update Indexed
X       011111       01010 10111 /  lhax      Load Half Word Algebraic Indexed
X       011111       11000 10110 /  lhbrx     Load Half Word Byte-Reverse Indexed
D       101000       ----- ----- -  lhz       Load Half Word & Zero
D       101001       ----- ----- -  lhzu      Load Half Word & Zero with Update
X       011111       01001 10111 /  lhzux     Load Half Word & Zero with Update Indexed
X       011111       01000 10111 /  lhzx      Load Half Word & Zero Indexed
D       101110       ----- ----- -  lmw       Load Multiple Word
X       011111       00000 10100 /  lwarx     Load Word & Reserve Indexed
X       011111       10000 10110 /  lwbrx     Load Word Byte-Reverse Indexed
D       100000       ----- ----- -  lwz       Load Word & Zero
D       100001       ----- ----- -  lwzu      Load Word & Zero with Update
X       011111       00001 10111 /  lwzux     Load Word & Zero with Update Indexed
X       011111       00000 10111 /  lwzx      Load Word & Zero Indexed
X       011111       11010 10110 /  mbar      Memory Barrier
XL      010011       00000 00000 /  mcrf      Move Condition Register Field
X       011111       10000 00000 /  mcrxr     Move to Condition Register from XER
X       011111       00000 10011 /  mfcr      Move From Condition Register
XFX     011111       01010 00011 /  mfdcr     Move From Device Control Register
X       011111       00010 10011 /  mfmsr     Move From Machine State Register
XFX     011111       01010 10011 /  mfspr     Move From Special Purpose Register
X       011111       10010 10110 /  msync     Memory Synchronize
XFX     011111       00100 10000 /  mtcrf     Move To Condition Register Fields
XFX     011111       01110 00011 /  mtdcr     Move To Device Control Register
X       011111       00100 10010 /  mtmsr     Move To Machine State Register
XFX     011111       01110 10011 /  mtspr     Move To Special Purpose Register
X       011111       /0010 01011 0  mulhw     Multiply High Word
X       011111       /0010 01011 1  mulhw.    Multiply High Word & Record CR
X       011111       /0000 01011 0  mulhwu    Multiply High Word Unsigned
X       011111       /0000 01011 1  mulhwu.   Multiply High Word Unsigned & Record CR
D       000111       ----- ----- -  mulli     Multiply Low Immediate
X       011111       00111 01011 0  mullw     Multiply Low Word
X       011111       00111 01011 1  mullw.    Multiply Low Word & Record CR
X       011111       10111 01011 0  mullwo    Multiply Low Word & Record OV
X       011111       10111 01011 1  mullwo.   Multiply Low Word & Record OV & CR
X       011111       01110 11100 0  nand      NAND
X       011111       01110 11100 1  nand.     NAND & Record CR
X       011111       00011 01000 0  neg       Negate
X       011111       00011 01000 1  neg.      Negate & Record CR
X       011111       10011 01000 0  nego      Negate & Record OV
X       011111       10011 01000 1  nego.     Negate & Record OV & Record CR
X       011111       00011 11100 0  nor       NOR
X       011111       00011 11100 1  nor.      NOR & Record CR
X       011111       01101 11100 0  or        OR
X       011111       01101 11100 1  or.       OR & Record CR
X       011111       01100 11100 0  orc       OR with Complement
X       011111       01100 11100 1  orc.      OR with Complement & Record CR
D       011000       ----- ----- -  ori       OR Immediate
D       011001       ----- ----- -  oris      OR Immediate Shifted
XL      010011       00001 10011 /  rfci      Return From Critical Interrupt
XL      010011       00001 00111 /  rfdi      Return From Debug Interrupt
XL      010011       00001 10010 /  rfi       Return From Interrupt
XL      010011       00001 00110 /  rfmci     Return From Machine Check Interrupt
M       010100       ----- ----- 0  rlwimi    Rotate Left Word Immediate then Mask Insert
M       010100       ----- ----- 1  rlwimi.   Rotate Left Word Immediate then Mask Insert & Record CR
M       010101       ----- ----- 0  rlwinm    Rotate Left Word Immediate then AND with Mask
M       010101       ----- ----- 1  rlwinm.   Rotate Left Word Immediate then AND with Mask & Record CR
M       010111       ----- ----- 0  rlwnm     Rotate Left Word then AND with Mask
M       010111       ----- ----- 1  rlwnm.    Rotate Left Word then AND with Mask & Record CR
SC      010001       ///// ////1 /  sc        System Call
X       011111       00000 11000 0  slw       Shift Left Word
X       011111       00000 11000 1  slw.      Shift Left Word & Record CR
X       011111       11000 11000 0  sraw      Shift Right Algebraic Word
X       011111       11000 11000 1  sraw.     Shift Right Algebraic Word & Record CR
X       011111       11001 11000 0  srawi     Shift Right Algebraic Word Immediate
X       011111       11001 11000 1  srawi.    Shift Right Algebraic Word Immediate & Record CR
X       011111       10000 11000 0  srw       Shift Right Word
X       011111       10000 11000 1  srw.      Shift Right Word & Record CR
D       100110       ----- ----- -  stb       Store Byte
D       100111       ----- ----- -  stbu      Store Byte with Update
X       011111       00111 10111 /  stbux     Store Byte with Update Indexed
X       011111       00110 10111 /  stbx      Store Byte Indexed
D       101100       ----- ----- -  sth       Store Half Word
X       011111       11100 10110 /  sthbrx    Store Half Word Byte-Reverse Indexed
D       101101       ----- ----- -  sthu      Store Half Word with Update
X       011111       01101 10111 /  sthux     Store Half Word with Update Indexed
X       011111       01100 10111 /  sthx      Store Half Word Indexed
D       101111       ----- ----- -  stmw      Store Multiple Word
D       100100       ----- ----- -  stw       Store Word
X       011111       10100 10110 /  stwbrx    Store Word Byte-Reverse Indexed
X       011111       00100 10110 1  stwcx.    Store Word Conditional Indexed & Record CR
D       100101       ----- ----- -  stwu      Store Word with Update
X       011111       00101 10111 /  stwux     Store Word with Update Indexed
X       011111       00100 10111 /  stwx      Store Word Indexed
X       011111       00001 01000 0  subf      Subtract From
X       011111       00001 01000 1  subf.     Subtract From & Record CR
X       011111       00000 01000 0  subfc     Subtract From Carrying
X       011111       00000 01000 1  subfc.    Subtract From Carrying & Record CR
X       011111       10000 01000 0  subfco    Subtract From Carrying & Record OV
X       011111       10000 01000 1  subfco.   Subtract From Carrying & Record OV & CR
X       011111       00100 01000 0  subfe     Subtract From Extended with CA
X       011111       00100 01000 1  subfe.    Subtract From Extended with CA & Record CR
X       011111       10100 01000 0  subfeo    Subtract From Extended with CA & Record OV
X       011111       10100 01000 1  subfeo.   Subtract From Extended with CA & Record OV & CR
D       001000       ----- ----- -  subfic    Subtract From Immediate Carrying
X       011111       00111 01000 0  subfme    Subtract From Minus One Extended with CA
X       011111       00111 01000 1  subfme.   Subtract From Minus One Extended with CA & Record CR
X       011111       10111 01000 0  subfmeo   Subtract From Minus One Extended with CA & Record OV
X       011111       10111 01000 1  subfmeo.  Subtract From Minus One Extended with CA & Record OV & CR
X       011111       10001 01000 0  subfo     Subtract From & Record OV
X       011111       10001 01000 1  subfo.    Subtract From & Record OV & CR
X       011111       00110 01000 0  subfze    Subtract From Zero Extended with CA
X       011111       00110 01000 1  subfze.   Subtract From Zero Extended with CA & Record CR
X       011111       10110 01000 0  subfzeo   Subtract From Zero Extended with CA & Record OV
X       011111       10110 01000 1  subfzeo.  Subtract From Zero Extended with CA & Record OV & CR
X       011111       11000 10010 /  tlbivax   TLB Invalidate Virtual Address Indexed
X       011111       11101 10010 /  tlbre     TLB Read Entry
X       011111       11100 10010 ?  tlbsx     TLB Search Indexed
X       011111       10001 10110 /  tlbsync   TLB Synchronize
X       011111       11110 10010 /  tlbwe     TLB Write Entry
X       011111       00000 00100 /  tw        Trap Word
D       000011       ----- ----- -  twi       Trap Word Immediate
X       011111       00100 00011 /  wrtee     Write External Enable
X       011111       00101 00011 /  wrteei    Write External Enable Immediate
X       011111       01001 11100 0  xor       XOR
X       011111       01001 11100 1  xor.      XOR and Record CR
D       011010       ----- ----- -  xori      XOR Immediate
D       011011       ----- ----- -  xoris     XOR Immediate Shifted
Note:
- Don’t care, usually part of an operand field.
/ Reserved bit, invalid instruction form if encoded as 1.
? Allocated for implementation-dependent use. See user’s manual for the implementation.
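The Primary and Extended columns refer to bit ranges of the 32-bit instruction word in Power Architecture numbering, where bit 0 is the most significant bit. As a minimal sketch, the two fields can be extracted in C as shown below; the function names are hypothetical.

```c
#include <stdint.h>

/* Inst[0-5]: the six most significant bits of the instruction word. */
static unsigned inst_primary_opcode(uint32_t inst)
{
    return (unsigned)(inst >> 26);
}

/* Inst[21-31]: the eleven least significant bits of the instruction word. */
static unsigned inst_extended_field(uint32_t inst)
{
    return (unsigned)(inst & 0x7FFu);
}
```

For example, add r3,r4,r5 encodes as 0x7C642A14: the primary opcode is 0b011111 and Inst[21–31] is 0b01000010100, matching the add row of the table. For D-form instructions such as addi, the low 11 bits belong to the immediate operand, which is why the table shows them as don't-care.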
Table 3-3 lists the instruction index sorted by opcode.
Table 3-3. Instructions Sorted by Opcode
Format  Primary      Extended       Mnemonic  Instruction
        (Inst[0–5])  (Inst[21–31])
D       000011       ----- ----- -  twi       Trap Word Immediate
D       000111       ----- ----- -  mulli     Multiply Low Immediate
D       001000       ----- ----- -  subfic    Subtract From Immediate Carrying
D       001010       ----- ----- -  cmpli     Compare Logical Immediate
D       001011       ----- ----- -  cmpi      Compare Immediate
D       001100       ----- ----- -  addic     Add Immediate Carrying
D       001101       ----- ----- -  addic.    Add Immediate Carrying & Record CR
D       001110       ----- ----- -  addi      Add Immediate
D       001111       ----- ----- -  addis     Add Immediate Shifted
B       010000       ----- ----0 0  bc        Branch Conditional
B       010000       ----- ----0 1  bcl       Branch Conditional & Link
B       010000       ----- ----1 0  bca       Branch Conditional Absolute
B       010000       ----- ----1 1  bcla      Branch Conditional & Link Absolute
SC      010001       ///// ////1 /  sc        System Call
I       010010       ----- ----0 0  b         Branch
I       010010       ----- ----0 1  bl        Branch & Link
I       010010       ----- ----1 0  ba        Branch Absolute
I       010010       ----- ----1 1  bla       Branch & Link Absolute
XL      010011       00000 00000 /  mcrf      Move Condition Register Field
XL      010011       00000 10000 0  bclr      Branch Conditional to Link Register
XL      010011       00000 10000 1  bclrl     Branch Conditional to Link Register & Link
XL      010011       00001 00001 /  crnor     Condition Register NOR
XL      010011       00001 00110 /  rfmci     Return From Machine Check Interrupt
XL      010011       00001 00111 /  rfdi      Return From Debug Interrupt
XL      010011       00001 10010 /  rfi       Return From Interrupt
XL      010011       00001 10011 /  rfci      Return From Critical Interrupt
XL      010011       00100 00001 /  crandc    Condition Register AND with Complement
XL      010011       00100 10110 /  isync     Instruction Synchronize
XL      010011       00110 00001 /  crxor     Condition Register XOR
XL      010011       00111 00001 /  crnand    Condition Register NAND
XL      010011       01000 00001 /  crand     Condition Register AND
XL      010011       01001 00001 /  creqv     Condition Register Equivalent
XL      010011       01101 00001 /  crorc     Condition Register OR with Complement
XL      010011       01110 00001 /  cror      Condition Register OR
XL      010011       10000 10000 0  bcctr     Branch Conditional to Count Register
XL      010011       10000 10000 1  bcctrl    Branch Conditional to Count Register & Link
M       010100       ----- ----- 0  rlwimi    Rotate Left Word Immediate then Mask Insert
M       010100       ----- ----- 1  rlwimi.   Rotate Left Word Immediate then Mask Insert & Record CR
M       010101       ----- ----- 0  rlwinm    Rotate Left Word Immediate then AND with Mask
M       010101       ----- ----- 1  rlwinm.   Rotate Left Word Immediate then AND with Mask & Record CR
M       010111       ----- ----- 0  rlwnm     Rotate Left Word then AND with Mask
M       010111       ----- ----- 1  rlwnm.    Rotate Left Word then AND with Mask & Record CR
D       011000       ----- ----- -  ori       OR Immediate
D       011001       ----- ----- -  oris      OR Immediate Shifted
D       011010       ----- ----- -  xori      XOR Immediate
D       011011       ----- ----- -  xoris     XOR Immediate Shifted
D       011100       ----- ----- -  andi.     AND Immediate & Record CR
D       011101       ----- ----- -  andis.    AND Immediate Shifted & Record CR
??      011111       ----- 01111 /  isel      Integer Select
X       011111       00000 00000 /  cmp       Compare
X       011111       00000 00100 /  tw        Trap Word
X       011111       00000 01000 0  subfc     Subtract From Carrying
X       011111       00000 01000 1  subfc.    Subtract From Carrying & Record CR
X       011111       00000 01010 0  addc      Add Carrying
X       011111       00000 01010 1  addc.     Add Carrying & Record CR
X       011111       /0000 01011 0  mulhwu    Multiply High Word Unsigned
X       011111       /0000 01011 1  mulhwu.   Multiply High Word Unsigned & Record CR
X       011111       00000 10011 /  mfcr      Move From Condition Register
X       011111       00000 10100 /  lwarx     Load Word & Reserve Indexed
X       011111       00000 10110 /  icbt      Instruction Cache Block Touch
X       011111       00000 10111 /  lwzx      Load Word & Zero Indexed
X       011111       00000 11000 0  slw       Shift Left Word
X       011111       00000 11000 1  slw.      Shift Left Word & Record CR
X       011111       00000 11010 0  cntlzw    Count Leading Zeros Word
X       011111       00000 11010 1  cntlzw.   Count Leading Zeros Word & Record CR
X       011111       00000 11100 0  and       AND
X       011111       00000 11100 1  and.      AND & Record CR
X       011111       00001 00000 /  cmpl      Compare Logical
X       011111       00001 01000 0  subf      Subtract From
X       011111       00001 01000 1  subf.     Subtract From & Record CR
X       011111       00001 10110 /  dcbst     Data Cache Block Store
X       011111       00001 10111 /  lwzux     Load Word & Zero with Update Indexed
X       011111       00001 11100 0  andc      AND with Complement
X       011111       00001 11100 1  andc.     AND with Complement & Record CR
X       011111       /0010 01011 0  mulhw     Multiply High Word
X       011111       /0010 01011 1  mulhw.    Multiply High Word & Record CR
X       011111       00010 10011 /  mfmsr     Move From Machine State Register
X       011111       00010 10110 /  dcbf      Data Cache Block Flush
X       011111       00010 10111 /  lbzx      Load Byte & Zero Indexed
X       011111       00011 01000 0  neg       Negate
X       011111       00011 01000 1  neg.      Negate & Record CR
X       011111       00011 10111 /  lbzux     Load Byte & Zero with Update Indexed
X       011111       00011 11100 0  nor       NOR
X       011111       00011 11100 1  nor.      NOR & Record CR
X       011111       00100 00011 /  wrtee     Write External Enable
X       011111       00100 00110 /  dcbtstls  Data Cache Block Touch for Store and Lock Set
X       011111       00100 01000 0  subfe     Subtract From Extended with CA
X       011111       00100 01000 1  subfe.    Subtract From Extended with CA & Record CR
X       011111       00100 01010 0  adde      Add Extended with CA
X       011111       00100 01010 1  adde.     Add Extended with CA & Record CR
XFX     011111       00100 10000 /  mtcrf     Move to Condition Register Fields
X       011111       00100 10010 /  mtmsr     Move to Machine State Register
X       011111       00100 10110 1  stwcx.    Store Word Conditional Indexed & Record CR
X       011111       00100 10111 /  stwx      Store Word Indexed
X       011111       00101 00011 /  wrteei    Write External Enable Immediate
X       011111   00101 00110 /   dcbtls     Data Cache Block Touch and Lock Set
X       011111   00101 10111 /   stwux      Store Word with Update Indexed
X       011111   00110 01000 0   subfze     Subtract From Zero Extended with CA
X       011111   00110 01000 1   subfze.    Subtract From Zero Extended with CA & Record CR
X       011111   00110 01010 0   addze      Add to Zero Extended with CA
X       011111   00110 01010 1   addze.     Add to Zero Extended with CA & Record CR
X       011111   00110 10111 /   stbx       Store Byte Indexed
X       011111   00111 00110 /   icblc      Instruction Cache Block Lock Clear
X       011111   00111 01000 0   subfme     Subtract From Minus One Extended with CA
X       011111   00111 01000 1   subfme.    Subtract From Minus One Extended with CA & Record CR
X       011111   00111 01010 0   addme      Add to Minus One Extended with CA
X       011111   00111 01010 1   addme.     Add to Minus One Extended with CA & Record CR
X       011111   00111 01011 0   mullw      Multiply Low Word
X       011111   00111 01011 1   mullw.     Multiply Low Word & Record CR
X       011111   00111 10110 /   dcbtst     Data Cache Block Touch for Store
X       011111   00111 10111 /   stbux      Store Byte with Update Indexed
X       011111   01000 01010 0   add        Add
X       011111   01000 01010 1   add.       Add & Record CR
X       011111   01000 10110 /   dcbt       Data Cache Block Touch
X       011111   01000 10111 /   lhzx       Load Half Word & Zero Indexed
X       011111   01000 11100 0   eqv        Equivalent
X       011111   01000 11100 1   eqv.       Equivalent & Record CR
X       011111   01001 10111 /   lhzux      Load Half Word & Zero with Update Indexed
X       011111   01001 11100 0   xor        XOR
X       011111   01001 11100 1   xor.       XOR & Record CR
XFX     011111   01010 00011 /   mfdcr      Move From Device Control Register
XFX     011111   01010 10011 /   mfspr      Move From Special Purpose Register
X       011111   01010 10111 /   lhax       Load Half Word Algebraic Indexed
X       011111   01011 10111 /   lhaux      Load Half Word Algebraic with Update Indexed
X       011111   01100 00110 /   dcblc      Data Cache Block Lock Clear
X       011111   01100 10111 /   sthx       Store Half Word Indexed
X       011111   01100 11100 0   orc        OR with Complement
X       011111   01100 11100 1   orc.       OR with Complement & Record CR
X       011111   01101 10111 /   sthux      Store Half Word with Update Indexed
X       011111   01101 11100 0   or         OR
X       011111   01101 11100 1   or.        OR & Record CR
XFX     011111   01110 00011 /   mtdcr      Move to Device Control Register
X       011111   01110 01011 0   divwu      Divide Word Unsigned
X       011111   01110 01011 1   divwu.     Divide Word Unsigned & Record CR
XFX     011111   01110 10011 /   mtspr      Move to Special Purpose Register
X       011111   01110 10110 /   dcbi       Data Cache Block Invalidate
X       011111   01110 11100 0   nand       NAND
X       011111   01110 11100 1   nand.      NAND & Record CR
X       011111   01111 00110 /   icbtls     Instruction Cache Block Touch and Lock Set
X       011111   01111 01011 0   divw       Divide Word
X       011111   01111 01011 1   divw.      Divide Word & Record CR
X       011111   10000 00000 /   mcrxr      Move to Condition Register from XER
X       011111   10000 01000 0   subfco     Subtract From Carrying & Record OV
X       011111   10000 01000 1   subfco.    Subtract From Carrying & Record OV & CR
X       011111   10000 01010 0   addco      Add Carrying & Record OV
X       011111   10000 01010 1   addco.     Add Carrying & Record OV & CR
X       011111   10000 10110 /   lwbrx      Load Word Byte-Reverse Indexed
X       011111   10000 11000 0   srw        Shift Right Word
X       011111   10000 11000 1   srw.       Shift Right Word & Record CR
X       011111   10001 01000 0   subfo      Subtract From & Record OV
X       011111   10001 01000 1   subfo.     Subtract From & Record OV & CR
X       011111   10001 10110 /   tlbsync    TLB Synchronize
X       011111   10010 10110 /   msync      Memory Synchronize
X       011111   10011 01000 0   nego       Negate & Record OV
X       011111   10011 01000 1   nego.      Negate & Record OV & Record CR
X       011111   10100 01000 0   subfeo     Subtract From Extended with CA & Record OV
X       011111   10100 01000 1   subfeo.    Subtract From Extended with CA & Record OV & CR
X       011111   10100 01010 0   addeo      Add Extended with CA & Record OV
X       011111   10100 01010 1   addeo.     Add Extended with CA & Record OV & CR
X       011111   10100 10110 /   stwbrx     Store Word Byte-Reverse Indexed
X       011111   10110 01000 0   subfzeo    Subtract From Zero Extended with CA & Record OV
X       011111   10110 01000 1   subfzeo.   Subtract From Zero Extended with CA & Record OV & CR
X       011111   10110 01010 0   addzeo     Add to Zero Extended with CA & Record OV
X       011111   10110 01010 1   addzeo.    Add to Zero Extended with CA & Record OV & CR
X       011111   10111 01000 0   subfmeo    Subtract From Minus One Extended with CA & Record OV
X       011111   10111 01000 1   subfmeo.   Subtract From Minus One Extended with CA & Record OV & CR
X       011111   10111 01010 0   addmeo     Add to Minus One Extended with CA & Record OV
X       011111   10111 01010 1   addmeo.    Add to Minus One Extended with CA & Record OV & CR
X       011111   10111 01011 0   mullwo     Multiply Low Word & Record OV
X       011111   10111 01011 1   mullwo.    Multiply Low Word & Record OV & CR
X       011111   10111 10110 /   dcba       Data Cache Block Allocate
X       011111   11000 01010 0   addo       Add & Record OV
X       011111   11000 01010 1   addo.      Add & Record OV & CR
X       011111   11000 10010 /   tlbivax    TLB Invalidate Virtual Address Indexed
X       011111   11000 10110 /   lhbrx      Load Half Word Byte-Reverse Indexed
X       011111   11000 11000 0   sraw       Shift Right Algebraic Word
X       011111   11000 11000 1   sraw.      Shift Right Algebraic Word & Record CR
X       011111   11001 11000 0   srawi      Shift Right Algebraic Word Immediate
X       011111   11001 11000 1   srawi.     Shift Right Algebraic Word Immediate & Record CR
X       011111   11010 10110 /   mbar       Memory Barrier
X       011111   11100 10010 ?   tlbsx      TLB Search Indexed
X       011111   11100 10110 /   sthbrx     Store Half Word Byte-Reverse Indexed
X       011111   11100 11010 0   extsh      Extend Sign Half Word
X       011111   11100 11010 1   extsh.     Extend Sign Half Word & Record CR
X       011111   11101 10010 /   tlbre      TLB Read Entry
X       011111   11101 11010 0   extsb      Extend Sign Byte
X       011111   11101 11010 1   extsb.     Extend Sign Byte & Record CR
X       011111   11110 01011 0   divwuo     Divide Word Unsigned & Record OV
X       011111   11110 01011 1   divwuo.    Divide Word Unsigned & Record OV & CR
X       011111   11110 10010 /   tlbwe      TLB Write Entry
X       011111   11110 10110 /   icbi       Instruction Cache Block Invalidate
X       011111   11111 01011 0   divwo      Divide Word & Record OV
X       011111   11111 01011 1   divwo.     Divide Word & Record OV & CR
X       011111   11111 10110 /   dcbz       Data Cache Block set to Zero
D       100000   ----- ----- -   lwz        Load Word & Zero
D       100001   ----- ----- -   lwzu       Load Word & Zero with Update
D       100010   ----- ----- -   lbz        Load Byte & Zero
D       100011   ----- ----- -   lbzu       Load Byte & Zero with Update
D       100100   ----- ----- -   stw        Store Word
D       100101   ----- ----- -   stwu       Store Word with Update
D       100110   ----- ----- -   stb        Store Byte
D       100111   ----- ----- -   stbu       Store Byte with Update
D       101000   ----- ----- -   lhz        Load Half Word & Zero
D       101001   ----- ----- -   lhzu       Load Half Word & Zero with Update
D       101010   ----- ----- -   lha        Load Half Word Algebraic
D       101011   ----- ----- -   lhau       Load Half Word Algebraic with Update
D       101100   ----- ----- -   sth        Store Half Word
D       101101   ----- ----- -   sthu       Store Half Word with Update
D       101110   ----- ----- -   lmw        Load Multiple Word
D       101111   ----- ----- -   stmw       Store Multiple Word
Notes:
- Don’t care, usually part of an operand field.
/ Reserved bit, invalid instruction form if encoded as 1.
? Allocated for implementation-dependent use. See user’s manual for the implementation.
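As an illustration of how the two opcode columns of Table 3-3 relate to an instruction word, the following Python sketch extracts Inst[0–5] and Inst[21–31] from a 32-bit encoding. The function name `opcode_fields` is hypothetical and not part of any Freescale tool. The sketch assumes the Power Architecture convention that bit 0 is the most significant bit, and it groups the extended field as 5 + 5 + 1 bits only to mirror the table layout.

```python
from typing import Tuple


def opcode_fields(word: int) -> Tuple[str, str]:
    """Return (primary, extended) opcode bit strings for a 32-bit word.

    Bits are numbered from the most significant end, so Inst[0-5] is the
    top 6 bits and Inst[21-31] is the low 11 bits of the word.
    """
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("instruction word must fit in 32 bits")
    primary = format(word >> 26, "06b")
    ext = format(word & 0x7FF, "011b")
    # Group as 5 + 5 + 1 bits to match the "Extended" column of the table.
    extended = f"{ext[:5]} {ext[5:10]} {ext[10]}"
    return primary, extended


# 0x7C642A14 is the encoding of `add r3,r4,r5`.
print(opcode_fields(0x7C642A14))
```

For the `add` encoding shown, the result is `('011111', '01000 01010 0')`, which matches the `add` row of the table. For D-form instructions the extended field is part of the operand and carries no opcode information.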
Chapter 4
Instruction Pipeline and Execution Timing
This chapter describes the e200 instruction pipeline and instruction timing information. The core is
partitioned into the following subsystems:
• Instruction unit
• Control unit
• Integer units
• Load/store unit
• Core interface
4.1
Overview of Operation
A block diagram of the e200z7 core is shown in Figure 4-1.
[Figure 4-1 block diagram not reproduced. Labels recoverable from the figure:
• Instruction/control unit: fetch unit, program counter, instruction buffer (10 instructions, filled two/four instructions at a time), 32-entry branch target buffer, branch processing unit, dual-instruction in-order dispatch
• Pipeline: two-cycle fetch stage, decode stage, execute stage (four execute stages with overlapped execution and feed forwarding), write-back stage (dual-instruction, in-order write back)
• Unified memory unit: software-managed L1 unified MMU, 64-entry fully associative TLB, MAS registers, 4 KB–4 GB page sizes
• Execution units: branch unit, integer unit, load/store unit (with EA calc), SPE unit, embedded scalar FPU, embedded vector FPU
• Registers: 32 GPRs (64-bit), CR, XER, LR, CTR, SPRs
• Instruction bus interface unit and data bus interface unit: 32 address, 64 data, N control
• Additional features: OnCE/Nexus 1/Nexus 3 control logic, AMBA AHB-Lite bus, SPE (SIMD), VLE, embedded scalar/vector floating-point, power management, time base/decrementer counter, clock multiplier]
Figure 4-1. e200z7 Block Diagram
The instruction fetch unit prefetches instructions from memory into the instruction buffers. The decode
unit decodes each instruction and generates information needed by the branch unit and the execution units.
Prefetched instructions are written into the instruction buffers.
The instruction issue unit attempts to issue a pair of instructions each cycle to the execution units. Source
operands for each of the instructions are provided from the general purpose registers (GPRs) or from the
operand feed-forward muxes. Data or resource hazards may create stall conditions which cause instruction
issue to be stalled for one or more cycles until the hazard is eliminated.
The execution units write the result of a finished instruction onto the proper result bus and into the
destination registers. The write-back logic retires an instruction when the instruction has finished
execution. Up to three results can be simultaneously written, depending on the size of the result.
Two execution units are provided to allow dual issue of most instructions. Only a single load/store unit is
provided. Only a single integer divide unit is provided, thus a pair of divide instructions cannot issue
simultaneously. In addition, the divide unit is blocking.
Table 4-1 shows the e200z7 concurrent instruction issue capabilities. Note that data dependencies between
instructions generally preclude dual issue. In particular, read-after-write dependencies are handled by
stalling the issue pipeline as required to ensure the proper execution ordering.
Table 4-1. Concurrent Instruction Issue Capabilities
Class of Instruction  Branch  Load/Store  Scalar Integer  Scalar Float  Vector Integer  Vector Float  Special
Branch                -       Yes         Yes             Yes           Yes             Yes           -
Load/store            Yes     -           Yes             Yes           Yes             Yes           -
Scalar integer        Yes     Yes         Yes (1)         Yes           Yes (2)         Yes           -
Scalar float          Yes     Yes         Yes             -             Yes             -             -
Vector integer        Yes     Yes         Yes (2)         Yes           Yes (3)         Yes           -
Vector float          Yes     Yes         Yes             -             Yes             -             -
Special               -       -           -               -             -               -             -
1 Excludes divide class instructions occurring in both issue slots.
2 Excludes vector MAC/multiply class instructions occurring with scalar multiply, or divide class instructions occurring in both issue slots.
3 Excludes vector MAC/multiply class instructions occurring in both issue slots, or divide class instructions occurring in both issue slots.
4.1.1
Control Unit
The control unit coordinates the instruction fetch unit, branch unit, instruction decode unit, instruction
issue unit, completion unit, and exception handling logic.
4.1.2
Instruction Unit
The instruction unit controls the flow of instructions from the cache to the instruction buffers and decode
unit. Ten instruction prefetch buffers allow the instruction unit to fetch instructions ahead of actual
execution, and serve to decouple memory and the execution pipeline.
4.1.3
Branch Unit
The branch unit executes branch instructions, predicts conditional branches, and provides branch target
addresses for instruction fetches. It contains a 32-entry branch target buffer (BTB) to accelerate execution
of branch instructions as well as a 3-entry Return Stack used for subroutine return address prediction.
4.1.4
Instruction Decode Unit
The decode unit includes the instruction buffers. A pair of instructions can be decoded each cycle. The
major functions of the decode logic are:
• Opcode decoding to determine the instruction class and resource requirements for each instruction
being decoded.
• Source and destination register dependency checking.
• Execution unit assignment.
• Detection of decode serializations and inhibition of subsequent instruction decoding.
The decode unit operates in a single processor clock cycle.
4.1.5
Exception Handling
The exception handling unit includes logic to handle exceptions, interrupts, and traps.
4.2
Execution Units
The core data execution units consist of the integer units, SPE units, EFPU floating-point units, and the
load/store unit. Included in the execution units section are the 32- by 64-bit GPRs. Instructions with data
dependencies begin execution when all such dependencies are resolved.
4.2.1
Integer Execution Units
Each integer execution unit is used to process arithmetic and logical instructions. Adds, subtracts,
compares, count leading zeros, shifts, and rotates execute in a single cycle. Integer multiplies and divides
execute in multiple clock cycles.
Multiply instructions have a latency of 3 cycles for result data and 4 cycles for condition codes for record
forms, with a throughput of 1 per cycle.
Divide instructions have a variable latency (4–15 cycles) depending on the operand data. The worst case
integer divide will take 15 cycles. While the divide is running, the rest of the pipeline is unavailable for
additional instructions (blocking divide).
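The difference between a pipelined multiply and a blocking divide can be made concrete with a rough cycle estimate. The Python sketch below is a first-order model only: the helper names are hypothetical, and it ignores dual issue, data dependencies, and fetch effects. It uses the 3-cycle multiply result latency and the 15-cycle worst-case divide latency quoted above.

```python
def pipelined_cycles(count: int, latency: int) -> int:
    """Cycles until the last of `count` independent operations finishes,
    assuming one operation can start every cycle (fully pipelined)."""
    if count <= 0:
        return 0
    return latency + (count - 1)


def blocking_cycles(count: int, latency: int) -> int:
    """Cycles for `count` operations on a blocking unit, where each
    operation must finish before the next one can start."""
    return max(count, 0) * latency


MULTIPLY_LATENCY = 3     # result data latency quoted in the text
DIVIDE_WORST_CASE = 15   # worst-case divide latency quoted in the text

print(pipelined_cycles(4, MULTIPLY_LATENCY))   # four independent multiplies
print(blocking_cycles(4, DIVIDE_WORST_CASE))   # four worst-case divides
```

Under these assumptions four independent multiplies take about 6 cycles, while four worst-case divides take about 60, which is why divide-heavy inner loops are worth restructuring where possible.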
4.2.2
Load/Store Unit
The load/store unit executes instructions that move data between the GPRs and the memory subsystem.
When free of data dependencies, loads execute with a maximum throughput of 1 per cycle and a 3-cycle
latency. Stores also execute with a maximum throughput of 1 per cycle and a 3-cycle latency. Store data
can be fed-forward from an immediately preceding load with no stall.
4.2.3
Embedded Floating-point Execution Units
The embedded floating-point execution units are used to process EFPU floating-point arithmetic
instructions. Adds, subtracts, compares, multiply, and multiply-accumulate pipelines have a latency of
4 cycles with a maximum throughput of 1 per cycle. EFPU floating-point divide and square root
instructions have a latency of 9 cycles. While the divide is running, the rest of the pipeline is unavailable
for additional instructions (blocking divide).
4.3
Instruction Pipeline
The processor pipeline consists of stages for instruction fetch, instruction decode, register read, execution,
and result writeback. Certain stages involve multiple clock cycles of execution. The processor also
contains an instruction prefetch buffer to allow buffering of instructions prior to the decode stage.
Instructions proceed from this buffer to the instruction decode stage by entering the instruction decode
register IR.
Table 4-2 describes the pipeline stages.
Table 4-2. Pipeline Stages
Stage             Description
IFETCH0           Instruction fetch from memory, stage 0
IFETCH1           Instruction fetch from memory, stage 1
IFETCH2           Instruction fetch from memory, stage 2
DECODE0           Instruction decode, stage 0
DECODE1/RF READ   Instruction decode, stage 1/Register read/Operand forwarding/Memory effective address generation
EXECUTE0/MEM0     Instruction execution stage 0/Memory access stage 0
EXECUTE1/MEM1     Instruction execution stage 1/Memory access stage 1
EXECUTE2/MEM2     Instruction execution stage 2/Memory access stage 2
EXECUTE3          Instruction execution stage 3
WB                Write back to registers
Figure 4-2 shows a pipeline diagram.
[Figure 4-2 not reproduced. Stage sequences shown in the figure:
Simple instructions (pairs I0,I1 and I2,I3): IFetch0, IFetch1, IFetch2, Decode0, Decode1/Reg read/FFwd, Execute0, Feedforward, Feedforward, Feedforward, Writeback
Load instructions (L0, L1): IFetch0, IFetch1, IFetch2, Decode0, Decode1/Reg read/EA calc, Memory0, Memory1, Memory2, Feedforward, Writeback]
Figure 4-2. Pipeline Diagram
4.3.1
Description of Pipeline Stages
The fetch pipeline stages retrieve instructions from the memory system and determine where the next
instruction fetch is performed. Up to two 32-bit instructions or four 16-bit instructions are sent from
memory to the instruction buffers each cycle.
The decode pipeline stages decode instructions, read operands from the register file, and perform
dependency checking.
Execution occurs in one or more of the four execute pipeline stages in each execution unit (perhaps over
multiple cycles). Execution of most load/store instructions is pipelined. The load/store unit has the
following four pipeline stages:
• EA Calc—effective address calculation
• MEM0—memory access
• MEM1—memory access
• MEM2—data format and forward
Simple integer instructions complete execution in the Execute0 stage of the pipeline. Multiply instructions
require all four execute stages but may be pipelined as well. Most condition-setting instructions complete
in the Execute0 stage of the pipeline, thus conditional branches dependent on a condition-setting
instruction may be resolved by an instruction in this stage.
Result feed-forward hardware forwards the result of one instruction into the source operand(s) of a
following instruction so that the execution of data-dependent instructions does not wait until the completion
of the result writeback. Feed-forward hardware is supplied to allow bypassing of completed instructions
from all four execute stages into the first execution stage for a subsequent data-dependent instruction.
4.3.2
Instruction Prefetch Buffers and Branch Target Buffer
The e200 contains a 10-entry instruction prefetch buffer that supplies instructions into the instruction
register (IR) for decoding. Each slot in the prefetch buffer is 32 bits wide, capable of holding a single 32-bit
instruction, or a pair of 16-bit instructions. Figure 4-3 shows the instruction prefetch buffers.
Instruction prefetches request a 64-bit double word and the prefetch buffer is filled with a pair of
instructions at a time, except for the case of a change of flow fetch where the target is to the second (odd)
word. In that case, only a 32-bit prefetch is performed to load the instruction prefetch buffer. This 32-bit
fetch may be immediately followed by a 64-bit prefetch to fill slots 0 and 1 in the event that the branch is
resolved to be taken.
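The width of the first prefetch after a change of flow follows directly from the position of the target within its aligned 64-bit double word. A minimal Python sketch of that rule is shown here. The function name is hypothetical, and the sketch covers only the width decision described above, not the follow-on 64-bit prefetch.

```python
def first_fetch_bits(target_addr: int) -> int:
    """Width in bits of the first prefetch for a change-of-flow target.

    A target in the second (odd) word of an aligned double word has
    address bit 2 set (byte offset 4..7), so only that 32-bit word is
    fetched. Any other target gets a full 64-bit prefetch.
    """
    return 32 if target_addr & 0x4 else 64


print(first_fetch_bits(0x1000))  # target in the even word
print(first_fetch_bits(0x1004))  # target in the odd word
```

Aligning frequently taken branch targets to 8-byte boundaries therefore lets the first fetch after the branch deliver a full pair of 32-bit instructions.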
In normal sequential execution, instructions are loaded into the IR from prefetch buffer slots 0 and 1, and
as a pair of slots are emptied, they are refilled. Whenever a pair of slots is empty, a 64-bit prefetch is
initiated which fills the earliest empty slot pairs beginning with slot 0.
If the instruction prefetch buffer empties, the instruction issue stalls and the buffer is refilled. The first
returned instruction is forwarded directly to the IR. Open cycles on the memory bus are utilized to keep
the buffer full when possible.
Figure 4-3 shows the instruction prefetch buffer.
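[Figure 4-3 not reproduced. Labels recoverable from the figure: DATA 0:63 input; prefetch buffer slots SLOT0 through SLOT9, arranged as even/odd pairs (SLOT0/SLOT1, SLOT2/SLOT3, SLOT4/SLOT5, SLOT6/SLOT7, SLOT8/SLOT9); MUX; IR; Decode.]
Figure 4-3. e200 Instruction Prefetch Buffers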
NOTE
- The e200z7 core can prefetch up to 2 cache lines (64 bytes total) beyond
the current instruction execution point. Executing code within the last 64
bytes of a memory region such as internal SRAM or Flash may cause a bus
error when pre-fetching occurs past the end of memory. Do not place code
to be executed within the last 64 bytes of a memory region.
- An ECC exception can occur if pre-fetches occur at locations that are valid
but not yet initialized for ECC. When executing code from internal ECC
SRAM, initialize memory beyond the end of the code until the next 32-byte
aligned address and then an additional 64 bytes to ensure that pre-fetches
cannot land in uninitialized SRAM.
- The Boot Assist Module (BAM) is located at the end of the address space
and so may cause instruction pre-fetches to wrap-around to address 0 in
internal flash memory. If this first block of flash memory contains ECC
errors, such as from an aborted program or erase operation, a machine-check
exception will be asserted. At this point in the boot procedure, exceptions
are disabled, but the machine-check will remain pending and the exception
vector will be taken if user application code subsequently enables the
machine check interrupt. To guard against the possibility of the BAM
causing a machine-check exception to be taken, user application code
should write all 1s to the Machine Check Syndrome Register (MCSR) to
clear it before enabling the machine check interrupt.
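The address arithmetic in the first two items of this note can be checked with a short script at build time. The Python sketch below is illustrative only. All names are hypothetical, `code_end` is taken to be the first address past the last code byte, and an already aligned `code_end` is treated as being at "the next 32-byte aligned address", which is one reading of the note.

```python
PREFETCH_BYTES = 64   # prefetch reach stated in the note (2 cache lines)
LINE_BYTES = 32       # alignment granule stated in the note


def align_up(addr: int, alignment: int) -> int:
    """Round addr up to a multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


def ecc_init_end(code_end: int) -> int:
    """First address that does NOT need ECC initialization: the end of
    the code, rounded up to 32 bytes, plus an additional 64 bytes."""
    return align_up(code_end, LINE_BYTES) + PREFETCH_BYTES


def code_is_safe(code_end: int, region_end: int) -> bool:
    """True if no code lies within the last 64 bytes of the region.
    Both arguments are exclusive end addresses."""
    return code_end <= region_end - PREFETCH_BYTES


print(hex(ecc_init_end(0x40001234)))
print(code_is_safe(0x4000FFC4, 0x40010000))
```

For code ending at 0x40001234, the SRAM up to (but not including) 0x40001280 would be initialized. The second call reports `False` because that code extends into the final 64 bytes of a region ending at 0x40010000.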
To resolve branch instructions and improve the accuracy of branch predictions, the e200 implements a
dynamic branch prediction mechanism using a 32-entry branch target buffer (BTB).
An entry is allocated in the BTB whenever a normal branch resolves as taken and the BTB is enabled.
Certain other branches do not allocate BTB entries: bctr, bcctr. Entries in the BTB are allocated on taken
branches using a FIFO replacement algorithm.
Each BTB entry holds the branch target address, and a 2-bit branch history counter whose value is
incremented or decremented on a BTB hit, depending on whether the branch was taken. The counter can
assume four different values: strongly taken, weakly taken, weakly not taken, and strongly not taken. On
initial allocation of an entry to the BTB for a taken branch, the counter is initialized to the weakly-taken
state.
A branch is predicted as taken on a hit in the BTB with a counter value of strongly or weakly taken. In this
case the target address contained in the BTB is used to redirect the instruction fetch stream to the target of
the branch prior to the branch reaching the instruction decode stage. In the case of a BTB miss, static
prediction is used to predict the outcome of the branch. In the case of a mispredicted branch, the instruction
fetch stream will return to the proper instruction stream after the branch has been resolved.
When a branch is predicted taken and the branch is later resolved (in the branch execute stage), the value
of the appropriate BTB counter is updated. If a branch whose counter indicates weakly taken is resolved
as taken, the counter increments so that the prediction becomes strongly taken. If the branch resolves as
not taken, the prediction changes to weakly not-taken. The counter saturates in the strongly taken states
when the prediction is correct.
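The behavior of the 2-bit history counter described above can be modeled as a saturating counter. The following Python sketch is a behavioral illustration only. The class name and the integer encoding of the four states are assumptions, since the hardware encoding is not given here.

```python
# Assumed ordering of the four states, from least to most "taken".
STRONG_NT, WEAK_NT, WEAK_T, STRONG_T = range(4)


class BranchCounter:
    """Behavioral model of one BTB entry's 2-bit history counter."""

    def __init__(self) -> None:
        # A newly allocated entry starts in the weakly-taken state.
        self.state = WEAK_T

    def predict_taken(self) -> bool:
        """Predict taken in the weakly or strongly taken states."""
        return self.state >= WEAK_T

    def update(self, taken: bool) -> None:
        """Move one step toward the resolved outcome, saturating at
        the strongly-taken and strongly-not-taken ends."""
        if taken:
            self.state = min(self.state + 1, STRONG_T)
        else:
            self.state = max(self.state - 1, STRONG_NT)


counter = BranchCounter()
for outcome in (True, True, False, False):
    counter.update(outcome)
print(counter.predict_taken())
```

Starting from weakly taken, two taken outcomes saturate the counter at strongly taken, and two not-taken outcomes then bring it to weakly not taken, so the final prediction in this run is not taken.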
The e200 does not implement the static branch prediction that is defined by the Power ISA embedded
category architecture. The BO prediction bit in branch encodings is ignored.
Dynamic branch prediction is enabled by setting BUCSR[BPEN]. Allocation of branch target buffer
entries may be controlled using BUCSR[BALLOC] to control whether forward or backward branches (or
both) are candidates for entry into the BTB, and thus for branch prediction. Once a branch is in the BTB,
BUCSR[BALLOC] has no further effect on that branch entry. Clearing BUCSR[BPEN] disables dynamic
branch prediction, in which case the e200 reverts to a static prediction mechanism using BUCSR[BPRED]
to control whether forward or backward branches (or both) are predicted taken or not taken.
The BTB uses virtual addresses for performing tag comparisons. On allocation of a BTB entry, the
effective address of a taken branch, along with the current Instruction Space (as indicated by MSR[IS]) is
loaded into the entry and the counter value is set to weakly taken. The current PID value is not maintained
as part of the tag information.
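To make the tagging and replacement rules concrete, the Python sketch below models a BTB whose entries are keyed by branch effective address plus instruction space and replaced in FIFO order. It is a behavioral illustration under stated assumptions, not a description of the hardware: the class and method names are hypothetical, and the PID is deliberately left out of the key, as described above.

```python
from collections import OrderedDict


class BranchTargetBuffer:
    """Behavioral model: (branch EA, IS) tag, FIFO replacement."""

    def __init__(self, entries: int = 32) -> None:
        self.entries = entries
        self._table = OrderedDict()  # insertion order = allocation order

    def lookup(self, branch_ea: int, instr_space: int):
        """Return the entry on a tag match, or None on a miss."""
        return self._table.get((branch_ea, instr_space))

    def allocate(self, branch_ea: int, instr_space: int, target: int) -> None:
        """Allocate an entry for a taken branch that missed in the BTB."""
        key = (branch_ea, instr_space)
        if key in self._table:
            return
        if len(self._table) >= self.entries:
            self._table.popitem(last=False)  # evict the oldest entry
        self._table[key] = {"target": target, "counter": "weakly taken"}

    def flush(self) -> None:
        """Invalidate every entry, for example after a mapping change."""
        self._table.clear()


btb = BranchTargetBuffer(entries=2)
btb.allocate(0x100, 0, 0x200)
btb.allocate(0x110, 0, 0x300)
btb.allocate(0x120, 0, 0x400)   # evicts the entry for 0x100
print(btb.lookup(0x100, 0), btb.lookup(0x110, 1))
```

Both lookups miss in this run: the first because its entry was the oldest and was evicted, the second because the instruction space differs from the one stored in the tag. Because the PID is not part of the tag, two processes that share an effective address can alias in such a structure unless the buffer is flushed.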
The e200 does support automatic flushing of the BTB when the current PID value is updated by an mtspr
PID0 instruction. Software is otherwise responsible for maintaining coherency in the BTB when the
effective-to-real (virtual-to-physical) address mapping changes. The BUCSR[BBFI] control bit is provided
for this purpose.
Figure 4-4 shows the branch target buffer.
Entry      Tag                      Data
entry 0    branch addr[0:30], IS    target address[0:30], counter
entry 1    branch addr[0:30], IS    target address[0:30], counter
...        ...                      ...
entry 31   branch addr[0:30], IS    target address[0:30], counter
IS = Instruction Space
Figure 4-4. e200 Branch Target Buffer
NOTE
Under certain conditions, if a static branch prediction and a dynamic return
prediction (which uses the subroutine return address stack) occur
simultaneously in the BTB, the e200z7 core can issue an errant fetch address
to the memory system (instruction fetched from wrong address). This can
only happen when the static branch prediction is "taken" but the branch
actually resolves to "not taken". To prevent the issue from occurring, set
BUCSR[BPRED] = 0b11 to configure static branch prediction to "not
taken". This issue does not apply to VLE.
4.3.3
Single-Cycle Instruction Pipeline Operation
Sequences of single-cycle execution instructions follow the flow shown in Figure 4-5. Instructions are
issued and completed in program order. Most arithmetic and logical instructions fall into this category.
[Figure 4-5 pipeline diagram not reproduced. Stage sequence per row, in time-slot order:
1st Inst(s).: IF0 IF1 IF2 D0 D1/RR E0 FF FF FF WB
2nd Inst(s).: IF0 IF1 IF2 D0 D1/RR E0 FF FF FF WB
3rd Inst(s).: IF0 IF1 IF2 D0 D1/RR E0 FF FF FF WB
4th Inst(s).: IF0 IF1 IF2 D0 D1/RR E0 FF FF FF WB]
Figure 4-5. Basic Pipeline Flow, Single Cycle Instructions
4.3.4
Basic Load and Store Instruction Pipeline Operation
Figure 4-6 shows the basic pipeline flow for load and store instructions. The effective address is calculated
in the EA Calc stage, and memory is accessed in the MEM0–MEM1 stages. Data selection and alignment
is performed in MEM2, and the result is available at the end of MEM2.
[Figure 4-6 pipeline diagram not reproduced. Stage sequence per row, in time-slot order ("-" marks an idle slot):
1st LD Inst.: IF0 IF1 IF2 D0 D1/RR/EA M0 M1 M2 FF WB
2nd LD/ST Inst.: IF0 IF1 IF2 D0 D1/RR/EA M0 M1 M2 FF WB
3rd Inst. (single cycle): IF0 IF1 IF2 D0 D1/RR - E0 FF FF FF WB]
Figure 4-6. Basic Pipeline Flow, Load/Store Instructions
4.3.5
Change-of-Flow Instruction Pipeline Operation
Figure 4-7 shows a pipeline flow with no prediction. Simple change of flow instructions require 4 cycles
to refill the pipeline with the target instruction for taken branches and branch and link instructions with no
BTB hit and no prediction required (condition resolved prior to branch decode).
[Figure 4-7 pipeline diagram not reproduced. Stage sequence per row, in time-slot order:
BR Inst.: IF0 IF1 IF2 D0/EA (D1/RR) (E0) (E1) (E2) (E3) WB
Target Inst.: TF0 TF1 TF2 D0 D1/RR E0 E1 E2 E3 WB]
Figure 4-7. Basic Pipeline Flow, Branch Instructions, No Prediction
Figure 4-8 shows a pipeline flow with correct prediction and a branch taken. For branch type instructions,
this 4-cycle timing may be reduced in some situations by performing the target fetch speculatively while
the branch instruction is still being fetched into the instruction buffer if the branch target address can be
obtained from the BTB. The resulting branch timing reduces to a single clock when the target fetch is
initiated early enough and the branch is correctly predicted.
[Figure 4-8 pipeline diagram not reproduced. Stage sequence per row, in time-slot order:
BR Inst. (BTB hit): IF0 IF1 IF2 D0 (D1) (E0) (E1) (E2) (E3) WB
Target Inst.: TF0 TF1 TF2 D0 D1/RR E0 E1 E2 E3 WB]
Figure 4-8. Basic Pipeline Flow, Branch Instructions, BTB Hit, Correct Prediction, Branch Taken
Figure 4-9 shows a case where the branch is incorrectly predicted, and 6 cycles are required to correct the
misprediction outcome.
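[Figure 4-9 pipeline diagram not reproduced. Stage sequence per row, as far as recoverable:
BR Inst. (predict not taken): IF0 IF1 IF2 D0 (D1/RR) (E0) (E1) (E2) (E3) WB, with the condition resolved during the early execute stages
Target Inst.: TF0 TF1 TF2 D0 D1/RR E0 E1 E2 E3]
Figure 4-9. Basic Pipeline Flow, Branch Instructions, Predict Not Taken, Incorrect Prediction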
Figure 4-10 shows bcctr and e_bctr cases where the branch is correctly predicted as taken, and 5 cycles
are required to execute the branch.
[Figure 4-10 pipeline diagram not reproduced. Stage sequence per row, as far as recoverable:
BR Inst. (predict taken): IF0 IF1 IF2 D0 (D1/RR) (E0) (E1) (E2) (E3) WB, with the condition resolved during the early execute stages
Target Inst.: TF0 TF1 TF2 D0 D1/RR E0 E1 E2 E3]
Figure 4-10. Basic Pipeline Flow, bcctr Instruction, Predict Taken, Correct Prediction
Figure 4-11 shows the bcctr and e_bctr cases where the branch is incorrectly predicted as taken, but the
fall-through instruction is already in the instruction buffer. 3 cycles are required to execute this branch.
[Figure 4-11 pipeline diagram not reproduced. Stage sequence per row, as far as recoverable:
BR Inst. (predict taken): IF0 IF1 IF2 D0 (D1/RR) (E0) (E1) (E2) (E3) WB, with the condition resolved during the early execute stages
Target Inst.: TF0 TF1 TF2 (discarded)
Fall-Through Inst.: D0 D1/RR E0 E1 E2 E3 WB]
Figure 4-11. Basic Pipeline Flow, bcctr Instruction, Predict Taken, Incorrect Prediction,
Instruction Buffer Not Empty
Figure 4-12 shows bcctr and e_bctr cases where the branch is incorrectly predicted as taken, and the
fall-through instruction is not already in the instruction buffer (a rare case). 6 cycles are required to execute
this branch.
Figure 4-12. Basic Pipeline Flow, bcctr Instruction, Predict Taken, Incorrect Prediction,
Instruction Buffer Empty
(Diagram rows: BR Inst. predicted taken, condition resolved in E0/E1; Target Inst., fetched in TF0 through TF2 and discarded; Fall-Through Inst., refetched from IF0 through WB.)
4.3.6
Basic Multicycle Instruction Pipeline Operation
Most multicycle instructions may be pipelined so that the effective execution time is smaller than the
overall number of clocks spent in execution. The restrictions to this execution overlap are that no data
dependencies between the instructions can be present and that instructions must complete and write back
results in order. A single cycle instruction that follows a multicycle instruction must wait for completion
of the multicycle instruction prior to its writeback in order to meet the in-order requirement. Result
feed-forward paths are provided so that execution may continue prior to result writeback.
Figure 4-13 shows the basic pipeline flow for integer multiply class instructions.
Figure 4-13. Basic Pipeline Flow, Integer Multiply Class Instructions
(Diagram rows: 1st Inst. (Multiply), stages IF0 through E3 and WB; 2nd and 3rd Inst(s). (Single Cycle), E0 followed by three FF stages and WB; 4th Inst(s). (Single Cycle, Dep On Mul), E0 followed by three FF stages and WB.)
The divide, load multiple, and store multiple instructions require multiple cycles in the execute stage, as
shown in Figure 4-14.
Figure 4-14. Basic Pipeline Flow, Long Instruction
(Diagram rows: Long Inst., stages E0 through Elast, then WB; Next Inst. (Single Cycle), stalled after D1/RR until the long instruction finishes executing, then E0 and three FF stages.)
4.3.7
Additional Examples of Instruction Pipeline Operation for Load and Store
Figure 4-15 shows an example of pipelining two non-data-dependent load or store instructions with a
following load target data-dependent single cycle instruction. While the first load or store begins accessing
memory in the M0 stage, the next load can be calculating a new effective address in the D1/EA stage. The
add in this example stalls for 2 cycles since a data dependency exists on the target register of the second
load.
Figure 4-15. Pipelined Load Instructions with Load Target Data Dependency
(Diagram rows: 1st LD Inst. and 2nd LD/ST Inst., stages IF0, IF1, IF2, D0, D1/RR/EA, M0, M1, M2, FF, WB; 3rd Inst. (Add, depends on 2nd load), two stall cycles after D1/RR, then E0, three FF stages, and WB.)
Figure 4-16 shows an example of pipelining a data-dependent add instruction following a load with update
instruction. While the first load begins accessing memory in the M0 stage, the next load with update can
be calculating a new effective address in the EA Calc stage. Following the EA Calc, the updated base
register value can be fed-forward to subsequent instructions. The add in this example does not stall, even
though a data dependency exists on the updated base register of the load with update.
Figure 4-16. Pipelined Instructions with Base Register Update Data Dependency
(Diagram rows: 1st Inst. (Load) and 2nd Inst. (Load W/Update), stages IF0, IF1, IF2, D0, D1/RR/EA, M0, M1, M2, FF, WB; 3rd Inst. (Add, Depends on 2nd Load), D1/RR followed directly by E0, three FF stages, and WB.)
Figure 4-17 shows an example of pipelining a data-dependent store instruction following a load
instruction. While the first load begins accessing memory in the M0 stage, the store can be calculating a
new effective address in the D1/EA stage. The store in this example does not stall, even though its store
data depends on the data returned by the load instruction.
Figure 4-17. Pipelined Store Instruction with Store Data Dependency
(Diagram rows: 1st Inst. (Load) and 2nd Inst. (Store, Data Depends on Load), stages IF0, IF1, IF2, D0, D1/RR/EA, M0, M1, M2, FF, WB, with no stall cycles.)
4.3.8
Move to/from SPR Instruction Pipeline Operation
Many mtspr and mfspr instructions are treated like single-cycle instructions in the pipeline and do not
cause stalls. The exceptions are accesses to the MSR, the debug SPRs, the SPE unit, and the cache/MMU
SPRs, which do cause stalls. Figure 4-18 through Figure 4-20 show examples of mtspr and mfspr instruction timing.
Figure 4-18 applies to the debug SPRs and the SPE unit’s SPEFSCR. These instructions do not begin
execution until all previous instructions have finished their execute stage(s). In addition, execution of
subsequent instructions is stalled until the mfspr and mtspr instructions complete.
Figure 4-18. mtspr, mfspr Instruction Execution, Debug, and SPE SPRs
(Diagram rows: Prev Inst., stages IF0 through E3 and WB; mtspr, mfspr Debug, SPE Inst., stalled after D1/RR until the previous instruction finishes executing, then E0 through E3 and WB; Next Inst., stalled after D1/RR until the mtspr or mfspr completes.)
Figure 4-19 applies to the mtmsr instruction and the wrtee and wrteei instructions. Execution of
subsequent instructions is stalled until the cycle after these instructions writeback.
Figure 4-19. mtmsr, wrtee[i] Instruction Execution
(Diagram rows: Prev Inst., stages IF0 through E3 and WB; mtmsr, wrtee, wrteei Inst., stages IF0 through E3 and WB; Next Inst., stalled after D1/RR until the cycle after writeback, then E0 onward.)
Accesses to cache and MMU SPRs are stalled until all outstanding bus accesses have completed on both
interfaces and the caches and MMU are idle (p_[d,i]_cmbusy negated) to allow an access window where
no translations or cache cycles are required. Figure 4-20 shows an example where an outstanding bus
access causes mtspr/mfspr execution to be delayed until the bus becomes idle. Other situations such as a
cache linefill may cause the cache to be busy even when the processor interface is idle (p_[d,i]_tbusy[0]_b
is negated). In these cases execution stalls until the cache and MMU are idle as signaled by negation of
p_[d,i]_cmbusy. Processor access requests will be held off during execution of a cache/MMU SPR
instruction. A subsequent access request may be generated the cycle following the last execute stage (that
is, during the WB cycle). This same protocol applies to cache and MMU management instructions (for
example, dcbz, dcbf, tlbre, and tlbwe).
Figure 4-20. Cache/MMU mtspr, mfspr, and Management Instruction Execution
(Diagram rows: Prev Inst.; mtspr, mfspr Inst., stalled after D1/RR; Next Inst., stalled until the SPR access completes. Signal traces: p_rd_spr, p_wr_spr, p_[d,i]_treq_b, p_[d,i]_tbusy[0]_b, p_[d,i]_ta_b, p_[d,i]_cmbusy.)
4.4
Control Hazards
Internal control hazards exist in the e200 that can cause certain instruction sequences to incur one or more
stall cycles. For example, when an mtspr instruction precedes an mfspr instruction, the issue stalls until
the mtspr completes.
4.5
Instruction Serialization
There are three types of serialization required by the core, as follows:
• Completion serialization
• Dispatch (decode/issue) serialization
• Refetch serialization
4.5.1
Completion Serialization
A completion serialized instruction is held for execution until all prior instructions have completed. The
instruction will then execute once it is next to complete in program order. Results from these instructions
will not be available for or forwarded to subsequent instructions until the instruction completes.
Instructions which are completion serialized are as follows:
• Instructions that access or modify system control or status registers, for example, mcrxr, mtmsr,
wrtee, wrteei, mtspr, and mfspr (except to CTR/LR).
• Instructions that manage caches and TLBs
• Instructions defined by the architecture as context or execution synchronizing: isync, se_isync,
msync, rfi, rfci, rfdi, rfmci, se_rfi, se_rfci, se_rfdi, se_rfmci, sc, se_sc.
• wait
4.5.2
Dispatch Serialization
Some instructions are dispatch-serialized by the core. An instruction that is dispatch-serialized prevents
the next instruction from decoding until all instructions up to and including the dispatch-serialized
instruction have completed. The dispatch-serialized instructions are isync, se_isync, msync, rfi, rfci,
rfdi, rfmci, se_rfi, se_rfci, se_rfdi, se_rfmci, sc, se_sc.
The mbar instruction is pseudo-dispatch serialized; it prevents the next instruction from decoding until all
previous load and store class instructions have completed.
4.5.3
Refetch Serialization
Refetch serialized instructions inhibit dispatching of subsequent instructions and force a pipeline refill to
refetch subsequent instructions after completion. These include the following:
• The context synchronizing instructions isync, se_isync.
• The rfi, rfci, rfdi, rfmci, se_rfi, se_rfci, se_rfdi, se_rfmci, sc, se_sc instructions.
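The three serialization lists in Sections 4.5.1 through 4.5.3 overlap, so it can help to view them as flags per mnemonic. The following C sketch is illustrative only: the names ser_table, ser_lookup, and the SER_* flags are hypothetical and not part of any Freescale header. It omits mtspr and mfspr, whose behavior depends on the SPR accessed, and the cache and TLB management instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Serialization classes from Sections 4.5.1 through 4.5.3, as bit flags. */
enum {
    SER_NONE            = 0,
    SER_COMPLETION      = 1 << 0,
    SER_DISPATCH        = 1 << 1,
    SER_REFETCH         = 1 << 2,
    SER_PSEUDO_DISPATCH = 1 << 3   /* mbar only */
};

#define SER_CDR (SER_COMPLETION | SER_DISPATCH | SER_REFETCH)

struct ser_entry {
    const char *mnemonic;
    unsigned    flags;
};

/* Hypothetical lookup table built from the lists in the text. */
static const struct ser_entry ser_table[] = {
    { "isync",  SER_CDR }, { "se_isync", SER_CDR },
    { "rfi",    SER_CDR }, { "rfci",     SER_CDR },
    { "rfdi",   SER_CDR }, { "rfmci",    SER_CDR },
    { "se_rfi", SER_CDR }, { "se_rfci",  SER_CDR },
    { "se_rfdi", SER_CDR }, { "se_rfmci", SER_CDR },
    { "sc",     SER_CDR }, { "se_sc",    SER_CDR },
    { "msync",  SER_COMPLETION | SER_DISPATCH },
    { "mbar",   SER_PSEUDO_DISPATCH },
    { "mcrxr",  SER_COMPLETION }, { "mtmsr",  SER_COMPLETION },
    { "wrtee",  SER_COMPLETION }, { "wrteei", SER_COMPLETION },
    { "wait",   SER_COMPLETION }
};

/* Return the serialization flags for a mnemonic, or SER_NONE if unlisted. */
static unsigned ser_lookup(const char *mnemonic)
{
    assert(mnemonic != NULL);
    for (size_t i = 0; i < sizeof ser_table / sizeof ser_table[0]; i++) {
        if (strcmp(ser_table[i].mnemonic, mnemonic) == 0) {
            return ser_table[i].flags;
        }
    }
    return SER_NONE;
}
```

For example, ser_lookup("isync") reports all three classes, while ser_lookup("msync") reports completion and dispatch serialization only.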
4.6
Concurrent Instruction Execution
The core effectively has the following execution units:
• Branch unit
• Dual scalar integer units
• Dual vector integer units
• Dual scalar embedded floating-point units/single vector embedded floating-point unit
• Load/store unit
These execution units are pipelined and support overlapped execution of instructions. In certain cases, the
branch unit predicts branches and supplies a speculative instruction stream to the instruction buffer unit.
Section 4.7, “Instruction Timings,” accurately indicates the number of cycles an instruction executes in the
appropriate unit. However, determining the elapsed time or cycles to execute a sequence of instructions is
beyond the scope of this document.
4.7
Instruction Timings
Instruction timing in number of processor clock cycles for various instruction classes is shown in
Table 4-3. Pipelined instructions are shown with cycles of total latency and throughput cycles. Divide
instructions are not pipelined and block other instructions from executing during divide execution. Timing
for SPE instructions is detailed in Section 6.3, “SPE Instruction Timing.”
Load/store multiple instruction cycles are represented as a fixed number of cycles plus a variable number
of cycles where n is the number of words accessed by the instruction. In addition, cycle times marked with
an ‘&’ require a variable number of additional cycles due to serialization.
Table 4-3. Instruction Class Cycle Counts
Class of Instructions                               Latency          Throughput       Special Notes
Integer: add, sub, shift, rotate, logical, cntlzw   1                1                —
Integer: compare                                    1                1                —
Branch                                              6/4/1            6/4/1            Correct branch lookahead allows single cycle execution. Worst-case mispredicted branch is 6 cycles.
Multiply                                            3/4              1                Result data is available after 3 cycles, record form conditions are available after fourth cycle.
Divide                                              4–15             4–15             Data dependent timing
CR logical                                          1                1                —
Loads (non-multiple)                                3                1                —
Load multiple                                       3 + n/2 (max)    1 + n/2 (max)    Actual timing depends on n and address alignment.
Stores (non-multiple)                               3                1                —
Store multiple                                      3 + n/2 (max)    1 + n/2 (max)    Actual timing depends on n and address alignment.
mtmsr, wrtee, wrteei                                6&               6                —
mcrf                                                1                1                —
mfspr, mtspr                                        4&               4&               Applies to debug SPRs, optional unit SPRs
mfspr, mfmsr                                        1                1                Applies to internal, non-debug SPRs
mfcr, mtcr                                          1                1                —
rfi, rfci, rfdi, rfmci                              6                —                —
sc                                                  4                —                —
tw, twi                                             4                —                Trap taken timing
Table 4-4 shows detailed timing for each instruction mnemonic along with its serialization requirements.
Table 4-4. Instruction Timing by Mnemonic
Mnemonic                            Latency     Serialization
add[o][.]                           1           None
addc[o][.]                          1           None
adde[o][.]                          1           None
addi                                1           None
addic[.]                            1           None
addis                               1           None
addme[o][.]                         1           None
addze[o][.]                         1           None
and[.]                              1           None
andc[.]                             1           None
andi.                               1           None
andis.                              1           None
b[l][a]                             6/4/1       None
bc[l][a]                            6/4/1       None
bcctr[l]                            6/5/3/1     None
bclr[l]                             6/5/3/1     None
cmp                                 1           None
cmpi                                1           None
cmpl                                1           None
cmpli                               1           None
cntlzw[.]                           1           None
crand                               1           None
crandc                              1           None
creqv                               1           None
crnand                              1           None
crnor                               1           None
cror                                1           None
crorc                               1           None
crxor                               1           None
divw[o][.]                          4–15 [1]    None
divwu[o][.]                         4–15 [1]    None
eqv[.]                              1           None
extsb[.]                            1           None
extsh[.]                            1           None
isel                                1           None
isync                               6 [2]       Refetch
lbarx                               3           None
lbz                                 3 [3]       None
lbzu                                3 [3]       None
lbzux                               3 [3]       None
lbzx                                3 [3]       None
lha                                 3 [3]       None
lharx                               3           None
lhau                                3 [3]       None
lhaux                               3 [3]       None
lhax                                3 [3]       None
lhbrx                               3 [3]       None
lhz                                 3 [3]       None
lhzu                                3 [3]       None
lhzux                               3 [3]       None
lhzx                                3 [3]       None
lmw                                 3 + (n/2)   None
lwarx                               3           None
lwbrx                               3 [3]       None
lwz                                 3 [3]       None
lwzu                                3 [3]       None
lwzux                               3 [3]       None
lwzx                                3 [3]       None
mbar                                1 [2]       Pseudodispatch
mcrf                                1           None
mcrxr                               1           Completion
mfcr                                1           None
mfmsr                               1           None
mfspr (except DEBUG)                1           None
mfspr (DEBUG)                       3 [2]       Completion
msync                               1 [2]       Completion
mtcrf                               2           None
mtmsr                               6 [2]       Completion
mtspr (DEBUG)                       4 [2]       Completion
mtspr (except DEBUG, msr, hid0/1)   1           None
mulhw[.]                            3/4         None
mulhwu[.]                           3/4         None
mulli                               3/4         None
mullw[o][.]                         3/4         None
nand[.]                             1           None
neg[o][.]                           1           None
nop (ori r0,r0,0)                   1           None
nor[.]                              1           None
or[.]                               1           None
orc[.]                              1           None
ori                                 1           None
oris                                1           None
rfci                                6           Refetch
rfdi                                6           Refetch
rfi                                 6           Refetch
rfmci                               6           Refetch
rlwimi[.]                           1           None
rlwinm[.]                           1           None
rlwnm[.]                            1           None
sc                                  4           Refetch
slw[.]                              1           None
sraw[.]                             1           None
srawi[.]                            1           None
srw[.]                              1           None
stb                                 3 [3]       None
stbcx.                              3           None
stbu                                3 [3]       None
stbux                               3 [3]       None
stbx                                3 [3]       None
sth                                 3 [3]       None
sthbrx                              3 [3]       None
sthcx.                              3           None
sthu                                3 [3]       None
sthux                               3 [3]       None
sthx                                3 [3]       None
stmw                                3 + (n/2)   None
stw                                 3 [3]       None
stwbrx                              3 [3]       None
stwcx.                              3           None
stwu                                3 [3]       None
stwux                               3 [3]       None
stwx                                3 [3]       None
subf[o][.]                          1           None
subfc[o][.]                         1           None
subfe[o][.]                         1           None
subfic                              1           None
subfme[o][.]                        1           None
subfze[o][.]                        1           None
tw                                  4           None
twi                                 4           None
wrtee                               6           Completion
wrteei                              6           Completion
xor[.]                              1           None
xori                                1           None
xoris                               1           None
[1] With early-out capability, timing is data dependent.
[2] Plus additional synchronization time.
[3] Aligned.
4.8
Operand Placement on Performance
The placement (location and alignment) of operands in memory affects relative performance of memory
accesses, and in some cases, affects it significantly. Table 4-5 indicates the effects for the e200 core.
In Table 4-5, optimal means that one effective address (EA) calculation occurs during the memory
operation. Good means that multiple EA calculations occur during the memory operation which may cause
additional bus activities with multiple bus transfers. Poor means that an alignment interrupt is generated
by the storage operation.
Table 4-5. Performance Effects of Storage Operand Placement
Operand                      Boundary Crossing
Size         Byte Align.     None       Cache Line    Protection Boundary
4 Bytes      4               Optimal    —             —
             <4              Good       Good          Good
2 Bytes      2               Optimal    —             —
             <2              Good       Good          Good
1 Byte       1               Optimal    —             —
lmw, stmw    4               Good       Good          Good
             <4              Poor       Poor          Poor
string       N/A             —          —             —
Note:
Optimal: One EA calculation occurs.
Good: Multiple EA calculations occur which may cause additional bus activities with multiple bus transfers.
Poor: Alignment Interrupt occurs.
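The classification in Table 4-5 can be expressed as a small decision function. The following C sketch is illustrative; the names placement_t and classify_placement are hypothetical, and the function looks only at address alignment, which is what the table rows distinguish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Performance classes used in Table 4-5. */
typedef enum {
    PLACEMENT_OPTIMAL,  /* one EA calculation */
    PLACEMENT_GOOD,     /* multiple EA calculations, possibly extra bus transfers */
    PLACEMENT_POOR      /* alignment interrupt */
} placement_t;

/*
 * Classify a storage access per Table 4-5.
 *   ea           effective address of the operand
 *   size         operand size in bytes (1, 2, or 4); ignored when is_multiple
 *   is_multiple  true for lmw/stmw, which expect 4-byte alignment
 */
static placement_t classify_placement(uint32_t ea, unsigned size, bool is_multiple)
{
    if (is_multiple) {
        /* lmw/stmw: word aligned is Good, anything else takes an alignment interrupt. */
        return (ea % 4u == 0u) ? PLACEMENT_GOOD : PLACEMENT_POOR;
    }
    assert(size == 1u || size == 2u || size == 4u);
    /* Scalar accesses: naturally aligned is Optimal, misaligned is Good. */
    return (ea % size == 0u) ? PLACEMENT_OPTIMAL : PLACEMENT_GOOD;
}
```

A misaligned word load such as classify_placement(0x1002, 4, false) is classified as Good, whereas the same address used with lmw is classified as Poor.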
Chapter 5
Embedded Floating-Point Unit
This chapter describes the instruction set architecture of the embedded floating-point unit version 2
(EFPU) implemented on the e200z7. This unit implements scalar and vector single-precision
floating-point instructions to accelerate signal processing and other algorithms. In comparison to version
1.1 of the EFPU architecture, version 2 of the architecture implements additional operations such as
minimum, maximum, and square root, as well as an extensive set of vector operations with permuted
operands and mixed add/sub, sum, and differences. For the remainder of this chapter, the term EFPU
implies version 2 of the architecture unless otherwise noted.
5.1
Nomenclature and Conventions
The following conventions regarding nomenclature are used in this chapter:
• Bits 0–31 of a 64-bit register are referred to as field 0, the upper half, or the high-order element of the
register. Bits 32–63 are referred to as field 1, the lower half, or the low-order element of the register.
Each half is an element of a GPR.
• Mnemonics for EFPU instructions begin with the letters ‘evfs’ (embedded vector floating single)
or ‘efs’ (embedded (scalar) floating single).
5.2
EFPU Programming Model
The e200z7 core provides a register file with thirty-two 64-bit registers. The Power ISA embedded
category 32-bit instructions operate on the lower (least significant) 32 bits of the 64-bit register. EFPU
instructions are defined that view the 64-bit register as being composed of a vector of two 32-bit elements,
or a single scalar 32-bit element. Vector floating-point instructions operate on a vector of two 32-bit
single-precision floating-point numbers resident in the 64-bit GPRs. Scalar single-precision floating-point
instructions operate on the lower half of GPRs. The floating-point instructions do not have a separate
register file; there is a single shared register file for all instructions.
There are no record forms of EFPU instructions. EFPU compare instructions store the result of the
comparison into the condition register (CR). The meanings of the CR bits are overloaded for the vector
operations. Floating-point compare instructions treat NaNs, Infinity and Denorm as normalized numbers
for the comparison calculation when default results are provided.
5.2.1
Signal Processing Extension/Embedded Floating-Point Status and Control Register (SPEFSCR)
Status and control for embedded floating-point uses the SPEFSCR register. This register is also used by
the SPE unit. Status and control bits are shared for vector floating-point operations, scalar floating-point
operations and SPE vector operations. The SPEFSCR register is implemented as special purpose register
(SPR) number 512 and is read and written by the mfspr and mtspr instructions. The SPEFSCR is shown
in Figure 5-1.
SPR 512                                                             Access: Read/Write
Bits 0–15:  SOVH (0), OVH (1), FGH (2), FXH (3), FINVH (4), FDBZH (5), FUNFH (6), FOVFH (7), — (8), RM (9), FINXS (10), FINVS (11), FDBZS (12), FUNFS (13), FOVFS (14), MODE (15)
Bits 16–31: SOV (16), OV (17), FG (18), FX (19), FINV (20), FDBZ (21), FUNF (22), FOVF (23), — (24), FINXE (25), FINVE (26), FDBZE (27), FUNFE (28), FOVFE (29), FRMC (30–31)
Reset: All zeros
Figure 5-1. SPE/EFPU Status and Control Register (SPEFSCR)
The SPEFSCR bits are defined in Table 5-1.
Table 5-1. SPE/EFPU Status and Control Register
Bits  Name  Description
0 (32)  SOVH  Summary Integer Overflow High. Defined by SPE.
1 (33)  OVH  Integer Overflow High. Defined by SPE.
2 (34)  FGH  Embedded Floating-Point Guard Bit High
    FGH is supplied for use by the floating-point round exception handler. FGH is zeroed if a floating-point
    data exception occurs for the high element(s). FGH corresponds to the high element result. FGH is
    cleared by a scalar floating-point instruction.
3 (35)  FXH  Embedded Floating-Point Sticky Bit High
    FXH is supplied for use by the floating-point round exception handler. FXH is zeroed if a floating-point
    data exception occurs for the high element(s). FXH corresponds to the high element result. FXH is
    cleared by a scalar floating-point instruction.
4 (36)  FINVH  Embedded Floating-Point Invalid Operation/Input Error High
    In mode 0, the FINVH bit is set to 1 if the A or B high element operand of a floating-point instruction is
    Infinity, NaN, or Denorm, or if the operation is a divide and the high element dividend and divisor are
    both 0.
    In mode 1, the FINVH bit is set on an IEEE Std 754 invalid operation (see IEEE 754-1985, Section 7.1)
    in the high element.
    FINVH is cleared by a scalar floating-point instruction.
5 (37)  FDBZH  Embedded Floating-Point Divide by Zero High
    The FDBZH bit is set to 1 when a floating-point divide instruction is executed with a high element divisor
    of 0, and the high element dividend is a finite non-zero number. FDBZH is cleared by a scalar
    floating-point instruction.
6 (38)  FUNFH  Embedded Floating-Point Underflow High
    The FUNFH bit is set to 1 when the execution of a floating-point instruction results in an underflow in
    the high element. FUNFH is cleared by a scalar floating-point instruction.
7 (39)  FOVFH  Embedded Floating-Point Overflow High
    The FOVFH bit is set to 1 when the execution of a floating-point instruction results in an overflow in the
    high element. FOVFH is cleared by a scalar floating-point instruction.
8 (40)  —  Reserved
9 (41)  RM  Rounding Mode (fixed point)
    Defined by SPE.
10 (42)  FINXS  Embedded Floating-Point Inexact Sticky Flag
    The FINXS bit is set to 1 whenever the execution of a floating-point instruction delivers an inexact result
    for either the low or high element and no floating-point data exception is taken for either element, or if
    the result of a floating-point instruction results in overflow (FOVF = 1 or FOVFH = 1), but floating-point
    overflow exceptions are disabled (FOVFE = 0), or if the result of a floating-point instruction results in
    underflow (FUNF = 1 or FUNFH = 1), but floating-point underflow exceptions are disabled (FUNFE = 0),
    and no floating-point data exception occurs. The FINXS bit remains set until it is cleared by an mtspr
    instruction specifying the SPEFSCR register.
11 (43)  FINVS  Embedded Floating-Point Invalid Operation Sticky Flag
    The FINVS bit is set to 1 when a floating-point instruction sets the FINVH or FINV bit to 1. The FINVS
    bit remains set until it is cleared by an mtspr instruction specifying the SPEFSCR register.
12 (44)  FDBZS  Embedded Floating-Point Divide by Zero Sticky Flag
    The FDBZS bit is set to 1 when a floating-point divide instruction sets the FDBZH or FDBZ bit to 1. The
    FDBZS bit remains set until it is cleared by an mtspr instruction specifying the SPEFSCR register.
13 (45)  FUNFS  Embedded Floating-Point Underflow Sticky Flag
    The FUNFS bit is set to 1 when a floating-point instruction sets the FUNFH or FUNF bit to 1. The FUNFS
    bit remains set until it is cleared by an mtspr instruction specifying the SPEFSCR register.
14 (46)  FOVFS  Embedded Floating-Point Overflow Sticky Flag
    The FOVFS bit is set to 1 when a floating-point instruction sets the FOVFH or FOVF bit to 1. The FOVFS
    bit remains set until it is cleared by an mtspr instruction specifying the SPEFSCR register.
15 (47)  MODE  Embedded Floating-Point Operating Mode
    0 Default hardware results operating mode
    1 IEEE 754 standard hardware results operating mode (not supported by the e200)
    This bit controls the operating mode of the EFPU.
    The e200 supports only mode 0.
    Software should read the value of this bit after writing it to determine if the implementation supports the
    selected mode. Implementations will return the value written if the selected mode is a supported mode,
    otherwise the value read will indicate the hardware supported mode.
16 (48)  SOV  Summary Integer Overflow. Defined by SPE.
17 (49)  OV  Integer Overflow. Defined by SPE.
18 (50)  FG  Embedded Floating-Point Guard Bit
    FG is supplied for use by the floating-point round exception handler. FG is zeroed if a floating-point data
    exception occurs for the low element(s). FG corresponds to the low element result.
19 (51)  FX  Embedded Floating-Point Sticky Bit
    FX is supplied for use by the floating-point round exception handler. FX is zeroed if a floating-point data
    exception occurs for the low element(s). FX corresponds to the low element result.
20 (52)  FINV  Embedded Floating-Point Invalid Operation/Input Error
    In mode 0, the FINV bit is set to 1 if the A or B low element operand of a floating-point instruction is
    Infinity, NaN, or Denorm, or if the operation is a divide and the low element dividend and divisor are both
    0.
    In mode 1, the FINV bit is set on an IEEE 754 standard invalid operation (IEEE 754-1985, Section 7.1)
    in the low element.
21 (53)  FDBZ  Embedded Floating-Point Divide by Zero
    The FDBZ bit is set to 1 when a floating-point divide instruction is executed with a low element divisor of
    0, and the low element dividend is a finite non-zero number.
22 (54)  FUNF  Embedded Floating-Point Underflow
    The FUNF bit is set to 1 when the execution of a floating-point instruction results in an underflow in the
    low element.
23 (55)  FOVF  Embedded Floating-Point Overflow
    The FOVF bit is set to 1 when the execution of a floating-point instruction results in an overflow in the
    low element.
24 (56)  —  Reserved
25 (57)  FINXE  Embedded Floating-Point Inexact Exception Enable
    0 Exception disabled
    1 Exception enabled
    If the exception is enabled, a floating-point round exception is taken if for both elements, the result of a
    floating-point instruction does not result in overflow or underflow, and the result for either element is
    inexact (FG | FX = 1, or FGH | FXH = 1), or if the result of a floating-point instruction does result in
    overflow (FOVF = 1 or FOVFH = 1) for either element, but floating-point overflow exceptions are
    disabled (FOVFE = 0), or if the result of a floating-point instruction results in underflow (FUNF = 1 or
    FUNFH = 1), but floating-point underflow exceptions are disabled (FUNFE = 0), and no floating-point
    data exception occurs.
26 (58)  FINVE  Embedded Floating-Point Invalid Operation/Input Error Exception Enable
    0 Exception disabled
    1 Exception enabled
    If the exception is enabled, a floating-point data exception is taken if the FINV or FINVH bit is set by a
    floating-point instruction.
27 (59)  FDBZE  Embedded Floating-Point Divide by Zero Exception Enable
    0 Exception disabled
    1 Exception enabled
    If the exception is enabled, a floating-point data exception is taken if the FDBZ or FDBZH bit is set by a
    floating-point instruction.
28 (60)  FUNFE  Embedded Floating-Point Underflow Exception Enable
    0 Exception disabled
    1 Exception enabled
    If the exception is enabled, a floating-point data exception is taken if the FUNF or FUNFH bit is set by
    a floating-point instruction.
29 (61)  FOVFE  Embedded Floating-Point Overflow Exception Enable
    0 Exception disabled
    1 Exception enabled
    If the exception is enabled, a floating-point data exception is taken if the FOVF or FOVFH bit is set by
    a floating-point instruction.
30–31 (62–63)  FRMC  Embedded Floating-Point Rounding Mode Control
    00 Round to nearest
    01 Round toward zero
    10 Round toward +infinity
    11 Round toward –infinity
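Because Table 5-1 numbers bits from the most significant end (bit 0 is the MSB of the 32-bit register), converting a bit number to a C mask is a common source of errors. The following sketch shows one way to do the conversion. The helper names (spefscr_bit, spefscr_sticky_mask, spefscr_clear_sticky, spefscr_frmc) are hypothetical, and the code operates on a plain 32-bit value rather than reading SPR 512 itself.

```c
#include <assert.h>
#include <stdint.h>

/* Bit numbers taken from Table 5-1 (32-bit numbering, bit 0 = MSB). */
enum {
    SPEFSCR_BIT_FINXS = 10,
    SPEFSCR_BIT_FINVS = 11,
    SPEFSCR_BIT_FDBZS = 12,
    SPEFSCR_BIT_FUNFS = 13,
    SPEFSCR_BIT_FOVFS = 14,
    SPEFSCR_BIT_FINXE = 25
};

/* Convert a big-endian bit number (0..31) into a conventional C mask. */
static uint32_t spefscr_bit(unsigned bit)
{
    assert(bit < 32u);
    return UINT32_C(1) << (31u - bit);
}

/* Mask covering the five sticky flags FINXS..FOVFS (bits 10..14). */
static uint32_t spefscr_sticky_mask(void)
{
    return spefscr_bit(SPEFSCR_BIT_FINXS) | spefscr_bit(SPEFSCR_BIT_FINVS) |
           spefscr_bit(SPEFSCR_BIT_FDBZS) | spefscr_bit(SPEFSCR_BIT_FUNFS) |
           spefscr_bit(SPEFSCR_BIT_FOVFS);
}

/* Value to write back with mtspr so that all sticky flags are cleared. */
static uint32_t spefscr_clear_sticky(uint32_t spefscr)
{
    return spefscr & ~spefscr_sticky_mask();
}

/* Extract FRMC (bits 30..31), the two least significant bits. */
static unsigned spefscr_frmc(uint32_t spefscr)
{
    return (unsigned)(spefscr & UINT32_C(3));
}
```

On the target, the value passed to these helpers would come from an mfspr of SPR 512, and the result of spefscr_clear_sticky would be written back with mtspr.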
5.2.2
GPRs and Power ISA Embedded Category Instructions
The e200z7 core implements the 32-bit forms of the Power ISA embedded category instructions. These
32-bit instructions operate upon the lower half of the 64-bit GPR. These instructions do not affect the upper
half of a GPR.
5.2.3
SPE/EFPU Available Bit in MSR
MSR[SPE] is defined as the SPE/EFPU available bit. If this bit is clear and software attempts to execute
any of the EFPU vector instructions (evfsxxx) that affect the upper 32 bits of a GPR, the EFPU Unavailable
exception is taken. If this bit is set, software can execute any of the EFPU instructions.
5.2.4
Embedded Floating-point Exception Bit in ESR
ESR[SPE] is defined as the SPE/EFPU exception bit. This bit is set whenever the processor takes an
exception related to the execution of an SPE instruction. This bit is also set whenever the processor takes
an interrupt related to the execution of the embedded floating-point instructions. Because the same bit is
used for both, SPE and embedded floating-point interrupts are indistinguishable in the ESR.
5.2.5
EFPU Exceptions
The architecture defines the following embedded floating-point exceptions:
• SPE/EFPU unavailable exception
• EFPU floating-point data exception
• EFPU floating-point round exception
Three new interrupt vector offset registers (IVORs), IVOR32, IVOR33, and IVOR34, are used by the
exception model. The SPR numbers are as follows:
• 528 for IVOR32
• 529 for IVOR33
• 530 for IVOR34
These registers are privileged.
5.2.5.1
EFPU Unavailable Exception
The EFPU unavailable exception is taken if MSR[SPE] is cleared and execution of an EFPU vector
instruction (evfsxxx) is attempted. When the EFPU Unavailable exception occurs, the processor suppresses
execution of the instruction causing the exception. The SRR0, SRR1, MSR, and ESR registers are
modified as follows:
• SRR0 is set to the effective address of the instruction causing the exception.
• SRR1 is set to the contents of the MSR at the time of the exception.
• MSR[CE, ME, DE] are unchanged. All other bits are cleared.
• ESR[SPE] is set. All other ESR bits are cleared.
Instruction execution resumes at address IVPR[0–15]||IVOR32[16–27]||0b0000.
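The concatenation IVPR[0–15]||IVORn[16–27]||0b0000 uses the same MSB-first bit numbering, so IVPR contributes its upper 16 bits and the IVOR contributes bits 16 through 27, with the low four bits forced to zero. A minimal C sketch of that computation follows; the function name efpu_vector_address is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Form an interrupt handler address from IVPR and an IVOR value.
 * IVPR[0-15]  -> mask 0xFFFF0000 (upper halfword)
 * IVOR[16-27] -> mask 0x0000FFF0 (16-byte aligned offset)
 */
static uint32_t efpu_vector_address(uint32_t ivpr, uint32_t ivor)
{
    uint32_t addr = (ivpr & UINT32_C(0xFFFF0000)) | (ivor & UINT32_C(0x0000FFF0));
    assert((addr & UINT32_C(0xF)) == 0u);   /* low four bits are always zero */
    return addr;
}
```

The same computation applies to IVOR32, IVOR33, and IVOR34; only the IVOR value passed in changes.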
5.2.5.2
Embedded Floating-point Data Exception
The embedded floating-point data exception vector is used for enabled floating-point invalid
operation/input error, underflow, overflow, and divide by zero exceptions (collectively called
floating-point data exceptions). When one of these enabled floating-point exceptions occurs, the processor
suppresses execution of the instruction causing the exception. The SRR0, SRR1, MSR, ESR and
SPEFSCR registers are modified as follows:
• SRR0 is set to the effective address of the instruction causing the exception.
• SRR1 is set to the contents of the MSR at the time of the exception.
• MSR bits CE, ME and DE are unchanged. All other bits are cleared.
• ESR[SPE] is set. All other ESR bits are cleared.
• One or more SPEFSCR status bits are set to indicate the type of exception. The affected bits are
FINVH, FINV, FDBZH, FDBZ, FOVFH, FOVF, FUNFH, and FUNF. SPEFSCR[FG, FGH, FX,
FXH] are cleared.
Instruction execution resumes at address IVPR[0–15]||IVOR33[16–27]||0b0000.
5.2.5.3
Embedded Floating-point Round Exception
If SPEFSCR[FINXE] is set, the embedded floating-point round exception occurs in any of the following
conditions as long as no floating-point data exception is taken:
• The unrounded result of an operation is not exact.
• An overflow occurs and overflow exceptions are disabled (FOVF or FOVFH set with FOVFE
cleared).
• An underflow occurs and underflow exceptions are disabled (FUNF or FUNFH set with FUNFE cleared).
The embedded floating-point round exception does not occur if an enabled embedded floating-point data
exception occurs. When the embedded floating-point round exception occurs, the unrounded (truncated)
result of an inexact high or low element is placed in the target register. If only a single element is inexact,
the other exact element is updated with the correctly rounded result. The FG and FX bits corresponding to
the other exact element are both 0.
The bits FG and FX are provided so that an exception handler can round the result as it desires. FG (called
the “guard” bit) is the value of the bit immediately to the right of the lsb of the destination format mantissa
from the infinitely precise intermediate calculation before rounding. FX (called the “sticky” bit) is the
value of the “or” of all the bits to the right of the guard bit (FG) of the destination format mantissa from
the infinitely precise intermediate calculation before rounding.
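The guard and sticky definitions can be illustrated with a short Python sketch. It assumes the infinitely precise intermediate significand is available as an integer that carries a known number of extra low-order bits beyond the destination lsb; the function names are hypothetical, and round-to-nearest-even is shown only as one rounding a handler might choose.

```python
def guard_and_sticky(significand, extra_bits):
    """Return (FG, FX) for an integer significand with extra_bits bits below the destination lsb."""
    if extra_bits < 1:
        raise ValueError("need at least one bit beyond the destination lsb")
    guard = (significand >> (extra_bits - 1)) & 1                      # bit just right of the lsb
    sticky = int((significand & ((1 << (extra_bits - 1)) - 1)) != 0)   # OR of all bits right of the guard bit
    return guard, sticky


def round_nearest_even(significand, extra_bits):
    """Round the truncated significand to nearest, ties to even, using only FG and FX."""
    truncated = significand >> extra_bits
    guard, sticky = guard_and_sticky(significand, extra_bits)
    if guard and (sticky or (truncated & 1)):
        truncated += 1
    return truncated
```

Because the truncated result plus FG and FX are enough to apply any of the rounding modes, a handler does not need the full intermediate value.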
The SRR0, SRR1, MSR, ESR and SPEFSCR registers are modified as follows:
• SRR0 is set to the effective address of the instruction following the instruction causing the
exception.
• SRR1 is set to the contents of the MSR at the time of the exception.
• MSR bits CE, ME and DE are unchanged. All other bits are cleared.
• ESR[SPE] is set. All other ESR bits are cleared.
• SPEFSCR[FGH, FG, FXH, FX] are set appropriately. SPEFSCR[FINXS] is set.
Instruction execution resumes at address IVPR[0–15]||IVOR34[16–27]||0b0000.
5.2.6
Exception Priorities
The following list shows the priority order in which exceptions are taken:
1. EFPU unavailable exception
2. EFPU floating-point data exception
3. EFPU floating-point round exception
An embedded floating-point data exception is taken if either element generates an embedded
floating-point data exception. An embedded floating-point round exception is taken if either element
generates an embedded floating-point round exception and neither element generates an EFPU
floating-point data exception.
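The priority rules can be summarized in a few lines. The sketch below models only the selection logic described above; the function name and its boolean arguments are hypothetical and do not correspond to hardware signals.

```python
def select_efpu_exception(unavailable, data_high, data_low, round_high, round_low):
    """Return the EFPU exception that is taken, or None if no exception is pending."""
    if unavailable:
        return "unavailable"
    if data_high or data_low:      # either element raises a data exception
        return "data"
    if round_high or round_low:    # round is taken only when neither element raised data
        return "round"
    return None
```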
5.3
Embedded Floating-Point Unit Operations
The e200z7 implements floating-point instructions that operate upon the contents of a 64-bit register that
is a vector of two single-precision floating-point elements. The floating-point unit shares the same register
file as the integer unit. There is no separate floating-point register file. Floating-point instructions are also
provided to perform scalar single precision floating-point operations on the low elements of registers,
without affecting the high-order portion. The Power ISA floating-point instructions are not implemented
in the e200z7.
The Freescale EIS architecture definition for embedded floating-point defines two operating modes: a
real-time, default results oriented mode (mode 0) and a “true IEEE 754 standard results” operating mode
(mode 1). Implementations of the embedded floating-point unit may choose to implement one or both of
these modes. The e200z7 hardware implements mode 0. The IEEE 754-compatible operation is still
available in mode 0 with assistance of a software envelope.
5.3.1
Floating-point Data Formats
The EFPU supports single-precision scalar and single-precision vector floating-point data operations and
conversions. In addition, conversions between single-precision floating-point and the half-precision
floating-point storage format are supported. These formats are described in the following subsections.
5.3.1.1
Single-Precision Floating-point Format
Each single-precision floating-point data element is 32 bits wide with one sign bit (s), 8 bits of biased
exponent (e) and 23 bits of fraction (f).
In IEEE 754, floating point values are represented in a format consisting of three explicit fields (sign field,
biased exponent field, and fraction field) and an implicit hidden bit.
[Figure 5-2 layout: bit 0 = S; bits 1–8 = exp; bits 9–31 = Fraction. The hidden bit is implicit and sits between exp and Fraction.]
S—sign bit; 0—positive; 1—negative.
exp—biased exponent field (excess 127 notation).
Fraction—fractional portion of number.
Figure 5-2. Single-Precision Data Format
For normalized numbers, the biased exponent value ‘e’ lies in the range of 1 to 254 corresponding to an
actual exponent value E in the range –126 to +127, the hidden bit is a ‘1’ (for normalized numbers), and
the value of the number is interpreted as in the following equation:
$(-1)^{S} \times 2^{E} \times (1.\text{fraction})$
Eqn. 5-1
where E is the unbiased exponent and 1.fraction is the significand consisting of a leading ‘1’ (the hidden
bit) and a fractional part (fraction field). With this format, the maximum positive normalized number
(pmax) is represented by the encoding 0x7F7FFFFF, which is approximately 3.4E+38 (≈ 2^128). The
minimum positive normalized value (pmin) is represented by the encoding 0x00800000, which is
approximately 1.2E–38 (2^–126).
Two specific values of the biased exponent are reserved, 0 and 255, for encoding the special values ±0, ±∞,
NaN, and Denorm, as follows:
• Zeros of both positive and negative sign are represented by a biased exponent value e of zero and
a fraction f which is zero.
• Infinities of both positive and negative sign are represented by a biased exponent value of 255 and
a fraction which is zero.
• Denormalized numbers of both positive and negative sign are represented by a biased exponent
value e of 0 and a fraction f which is non-zero. For these numbers, the hidden bit is defined by the
IEEE 754 standard to be ‘0’. This number type is not directly supported in hardware. Instead, either
a software exception handler is invoked, or a default value is defined, depending on the operating
mode.
• Not a Numbers (NaNs) are represented by a biased exponent value e of 255 and a fraction f which
is non-zero.
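The encoding rules in the list above can be checked with a small Python sketch that classifies a 32-bit pattern and evaluates Eqn. 5-1 for normalized numbers. The function names are hypothetical; the bit positions follow Figure 5-2, with bit 0 as the most significant bit.

```python
def classify_single(bits):
    """Classify a 32-bit single-precision encoding by its exponent and fraction fields."""
    e = (bits >> 23) & 0xFF      # biased exponent, bits 1-8
    f = bits & 0x7FFFFF          # fraction, bits 9-31
    if e == 0:
        return "zero" if f == 0 else "denorm"
    if e == 255:
        return "infinity" if f == 0 else "nan"
    return "normalized"


def normalized_value(bits):
    """Evaluate (-1)^S x 2^E x (1.fraction) for a normalized encoding."""
    if classify_single(bits) != "normalized":
        raise ValueError("encoding is not a normalized number")
    s = (bits >> 31) & 1
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    return (-1) ** s * 2.0 ** (e - 127) * (1 + f / 2 ** 23)
```

Evaluating `normalized_value(0x7F7FFFFF)` and `normalized_value(0x00800000)` reproduces pmax and pmin as defined in this section.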
Defining pmax to be the most positive normalized value (farthest from zero), pmin the smallest positive
normalized value (closest to zero), nmax the most negative normalized value (farthest from zero) and nmin
the smallest normalized negative value (closest to zero), an overflow is said to have occurred if the
numerically correct result of an instruction is such that r > pmax or r < nmax. An underflow is said to have
occurred if the numerically correct result of an instruction is such that 0 < r < pmin or nmin < r < 0. In this
case, r may be denormalized, or may be smaller than the smallest denormalized number. If e = 255 and
f ≠ 0, then the value is a NaN. If e = 0 and f = 0, then the value is a signed 0.
The EFPU hardware does not produce +∞, –∞, NaN, or a denormalized number. If the result of an
instruction overflows and floating-point overflow exceptions are disabled (SPEFSCR[FOVFE] is cleared),
pmax or nmax is generated as the result of that instruction depending upon the sign of the result. If the
result of an instruction underflows and floating-point underflow exceptions are disabled
(SPEFSCR[FUNFE] is cleared), +0 or –0 is generated as the result of that instruction based upon the sign
of the result.
5.3.1.2
Half-Precision Floating-point Format
Half-precision floating-point storage format is supported by the EFPU with conversion operations to and
from single-precision floating-point format. No computational operations are defined for half-precision
format numbers.
Each half-precision floating-point data element is 16 bits wide with one sign bit (s), 5 bits of biased
exponent (e) and 10 bits of fraction (f).
In the IEEE 754r proposal, half-precision floating point values are represented in a format consisting of
three explicit fields (sign field, biased exponent field, and fraction field) and an implicit hidden bit, as
shown in Figure 5-3.
[Figure 5-3 layout: bit 0 = S; bits 1–5 = exp; bits 6–15 = fraction. The hidden bit is implicit and sits between exp and fraction.]
S = sign bit, 0 = positive, 1 = negative
exp = biased exponent field (excess 15 notation)
fraction = fractional portion of number
Figure 5-3. Half-Precision Data Format
For normalized numbers, the biased exponent value ‘e’ lies in the range of 1 to 30 corresponding to an
actual exponent value E in the range –14 to +15; the hidden bit is a 1; and the value of the number is
interpreted as in the following equation.
$(-1)^{S} \times 2^{E} \times (1.\text{fraction})$
Eqn. 5-2
where E is the unbiased exponent and 1.fraction is the significand consisting of a leading ‘1’ (the hidden
bit) and a fractional part (fraction field).
With this format, the maximum positive normalized number (pmaxhp) is represented by the encoding
0x7BFF, which is 65504, and the minimum positive normalized value (pminhp) is represented by the
encoding 0x0400, which is approximately 6.1E–5 (2^–14).
Two specific values of the biased exponent are reserved, 0 and 31, for encoding the special values ±0, ±∞,
NaN, and Denorm, as follows:
• Zeros of both positive and negative sign are represented by a biased exponent value e of zero and
a fraction f which is zero.
• Infinities of both positive and negative sign are represented by a biased exponent value of 31 and
a fraction which is zero.
• Denormalized numbers of both positive and negative sign are represented by a biased exponent
value e of 0 and a fraction f which is non-zero. For these numbers, the hidden bit is defined to be ‘0’.
• Not a Numbers (NaNs) are represented by a biased exponent value e of 31 and a fraction f which
is non-zero.
Defining pmaxhp to be the most positive normalized value (farthest from zero), pminhp the smallest
positive normalized value (closest to zero), nmaxhp the most negative normalized value (farthest from
zero) and nminhp the smallest normalized negative value (closest to zero), an overflow is said to have
occurred if the numerically correct result of a conversion is such that r > pmaxhp or r < nmaxhp. An
underflow is said to have occurred if the numerically correct result of a conversion is such that
0 < r < pminhp or nminhp < r < 0. In this case, r may be denormalized, or may be smaller than the smallest
denormalized number. If e = 31 and f ≠ 0, then the value is a NaN. If e = 0 and f = 0, then the value is a
signed 0.
The EFPU hardware does not produce +∞, –∞, NaN, or a denormalized number. If the result of a
conversion to half-precision format overflows and floating-point overflow exceptions are disabled
(SPEFSCR[FOVFE] is cleared), then pmaxhp or nmaxhp is generated as the result of that instruction
depending upon the sign of the result. If the result of conversion to half-precision format underflows and
floating-point underflow exceptions are disabled (SPEFSCR[FUNFE] is cleared), then +0 or –0 is
generated as the result of that instruction based upon the sign of the result. Conversions from half-precision
format to single-precision format are always exact, unless the source operand is a NaN, Inf, or Denorm. In
such cases, if floating-point invalid input exceptions are disabled (SPEFSCR[FINVE] is cleared), the
conversion results in a properly signed max norm or zero default result.
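The half-precision limits quoted above can be reproduced with a short Python sketch that evaluates Eqn. 5-2 and applies the overflow and underflow definitions to a candidate conversion result r. The function names are hypothetical, and the range check ignores rounding effects at the boundaries.

```python
def half_value(bits):
    """Evaluate (-1)^S x 2^E x (1.fraction) for a normalized 16-bit encoding."""
    s = (bits >> 15) & 1
    e = (bits >> 10) & 0x1F      # biased exponent, excess 15
    f = bits & 0x3FF             # 10-bit fraction
    if not 1 <= e <= 30:
        raise ValueError("encoding is not a normalized number")
    return (-1) ** s * 2.0 ** (e - 15) * (1 + f / 1024)


PMAXHP = half_value(0x7BFF)      # most positive normalized value
PMINHP = half_value(0x0400)      # smallest positive normalized value


def half_range(r):
    """Apply the overflow/underflow definitions to a numerically correct result r."""
    if abs(r) > PMAXHP:
        return "overflow"
    if 0 < abs(r) < PMINHP:
        return "underflow"
    return "in range"
```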
5.3.2
IEEE 754 Compliance
The Freescale EIS architecture specifies that the EFPU implements a single-precision floating-point
system as defined in ANSI/IEEE 754-1985 but may rely on software support in order to conform fully with
the standard. Thus, whenever an input operand of a floating-point instruction is +∞, –∞,
denormalized, or NaN, or when the result of an operation produces an overflow or an underflow, an
exception may be taken. The exception handler is responsible for delivering IEEE 754-compatible
behavior, if desired.
When floating-point invalid input exceptions are disabled (SPEFSCR[FINVE] is cleared), default results
are provided by the hardware when an infinity, denormalized, or NaN input is received, or for the operation
0/0. When floating-point underflow exceptions are disabled (SPEFSCR[FUNFE] is cleared) and the result
of a floating-point operation underflows, a signed zero result is produced. The inexact exception is also
signaled for this condition. When floating-point overflow exceptions are disabled (SPEFSCR[FOVFE] is
cleared) and the result of a floating-point operation overflows, a pmax or nmax result is produced. The
inexact exception is also signaled for this condition. An exception enable flag (SPEFSCR[FINXE]) is also
provided for generating an exception when an inexact result is produced, which allows a software handler
to conform to the IEEE 754 standard. A divide by zero exception enable flag (SPEFSCR[FDBZE]) is also
provided for generating an exception when a divide by zero operation is attempted to allow a software
handler to conform to the IEEE 754 standard. All of these exceptions may be disabled, and the hardware
then delivers an appropriate default result.
Overflow and underflow conditions are determined after rounding on e200 implementations.
5.3.3
Floating-Point Exceptions
See Section 5.2.5, “EFPU Exceptions.”
5.3.4
Embedded Scalar Single-Precision Floating-Point Instructions
The instruction descriptions in this section use the following conventions:
• sa = the sign of operand A
• ea = the biased exponent value of operand A
• sb = the sign of operand B
• eb = the biased exponent value of operand B
• ei = an intermediate exponent value
• r = a result value.
efsabs
Floating-Point Single-Precision Absolute Value
efsabs rD,rA
Encoding: 0b000100 (bits 0–5) | RD (6–10) | RA (11–15) | 0b00000 (16–20) | 0b01011000100 (21–31)
RD32:63 = 0b0 || RA33:63
Description:
The sign bit of the low element of RA is set to 0 and the result is placed into the low element of RD.
Exceptions:
If the low element of RA is Infinity, Denorm, or NaN, SPEFSCR[FINV] is set, and FG and FX are cleared.
FGH and FXH are cleared as well. If Floating-point Invalid Input exceptions are enabled, an exception is
taken, and the destination register is not updated.
efsadd
Floating-Point Single-Precision Add
efsadd rD,rA,rB
Encoding: 0b000100 (bits 0–5) | RD (6–10) | RA (11–15) | RB (16–20) | 0b01011000000 (21–31)
RD32:63 = RA32:63 +sp RB32:63
Description:
The low element of RA is added to the low element of RB and the result is stored in the low element of
RD. If RA is NaN or infinity, the result is either pmax (sa==0), or nmax (sa==1). Otherwise, if RB is NaN
or infinity, the result is either pmax (sb==0), or nmax (sb==1). Otherwise, if an overflow occurs, then pmax
or nmax (as appropriate) is stored in RD. If an underflow occurs, then +0 (for rounding modes RN, RZ,
RP) or –0 (for rounding mode RM) is stored in RD.
Exceptions:
If the contents of RA or RB are Infinity, Denorm, or NaN, SPEFSCR[FINV] is set. If SPEFSCR[FINVE]
is set, an exception is taken, and the destination register is not updated. Otherwise, if an overflow occurs,
SPEFSCR[FOVF] is set, and if an underflow occurs, SPEFSCR[FUNF] is set. If either underflow or
overflow exceptions are enabled and the corresponding bit is set, an exception is taken. If any of these
exceptions are taken, the destination register is not updated.
If the result of this instruction is inexact or if an overflow occurs but overflow exceptions are disabled, and
no other exception is taken, SPEFSCR[FINXS] is set. If the Floating-point Inexact exception is enabled,
an exception is taken using the Floating-point Round exception vector. In this case, the destination register
is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be
performed in the exception handler, and the FGH and FXH bits are cleared.
FGH, FXH, FG, and FX are cleared if an overflow, underflow, or invalid operation/input error is signaled,
regardless of enabled exceptions.
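The default-result behavior of efsadd with all EFPU exceptions disabled can be sketched in Python as follows. This is a simplified model, not a description of the hardware: it covers round-to-nearest only, the function and flag names are hypothetical, and the handling of a denormalized operand (replaced by a like-signed zero) is an assumption that this section does not state. FINXS is also set on underflow, following Section 5.3.2.

```python
import struct
from fractions import Fraction

PMAX_BITS = 0x7F7FFFFF   # pmax; OR with the sign bit gives nmax
PMIN = 2.0 ** -126


def fields(bits):
    return (bits >> 31) & 1, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def to_float(bits):
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def efsadd_model(a_bits, b_bits):
    """Return (result_bits, flags) for the low elements, exceptions disabled, mode RN."""
    flags = set()
    sa, ea, fa = fields(a_bits)
    sb, eb, fb = fields(b_bits)
    a_denorm = ea == 0 and fa != 0
    b_denorm = eb == 0 and fb != 0
    if ea == 255 or eb == 255 or a_denorm or b_denorm:
        flags.add("FINV")
    if ea == 255:                       # RA is NaN or infinity
        return (sa << 31) | PMAX_BITS, flags
    if eb == 255:                       # RB is NaN or infinity
        return (sb << 31) | PMAX_BITS, flags
    # ASSUMPTION: a denormalized operand is replaced by a like-signed zero.
    a = to_float(a_bits & 0x80000000) if a_denorm else to_float(a_bits)
    b = to_float(b_bits & 0x80000000) if b_denorm else to_float(b_bits)
    total = a + b                       # double sum; rounding it to single matches direct rounding
    try:
        result_bits = struct.unpack(">I", struct.pack(">f", total))[0]
    except OverflowError:               # rounded result exceeds pmax in magnitude
        sign = 0x80000000 if total < 0 else 0
        return sign | PMAX_BITS, flags | {"FOVF", "FINXS"}
    if total != 0 and abs(to_float(result_bits)) < PMIN:
        return 0x00000000, flags | {"FUNF", "FINXS"}   # +0 for rounding mode RN
    if Fraction(to_float(result_bits)) != Fraction(a) + Fraction(b):
        flags.add("FINXS")              # rounded result differs from the exact sum
    return result_bits, flags
```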
efscfh
Convert Floating-Point Single-Precision from Half-Precision
efscfh rD,rB
Encoding: 0b000100 (bits 0–5) | RD (6–10) | 0b00100 (11–15) | RB (16–20) | 0b01011010001 (21–31)
FP16format f;
FP32format result;
f ← rB[48:63]
if ((f_exp = 0) & (f_frac = 0)) then
    result ← f_sign || ³¹0                      // signed zero value
else if Isa16NaNorInfinity(f) then
    SPEFSCR[FINV] ← 1
    result ← f_sign || 0b11111110 || ²³1        // max value
else if Isa16Denorm(f) then
    SPEFSCR[FINV] ← 1
    result ← f_sign || ³¹0
else
    result_sign ← f_sign
    result_exp ← f_exp - 15 + 127
    result_frac ← f_frac || ¹³0
rD[32:63] = result
The half-precision FP number in the low half of the low element in RB is converted to a single-precision
floating-point value and the result is placed into the low element of RD. The rounding mode is not used
since this conversion is always exact.
Exceptions:
If the source element of rB is Infinity, Denorm, or NaN, SPEFSCR[FINV] is set. If SPEFSCR[FINVE] is
set, an interrupt is taken; the destination register is not updated; and the FGH, FXH, FG, and FX bits are
cleared.
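The efscfh conversion with invalid input exceptions disabled can be followed in a direct Python transcription of the pseudocode. The function name and the flag set are hypothetical conveniences; the result is returned as a 32-bit pattern.

```python
def efscfh_model(rb_low16):
    """Return (single_bits, flags) for a half-precision source, FINVE cleared."""
    flags = set()
    s = (rb_low16 >> 15) & 1
    e = (rb_low16 >> 10) & 0x1F
    f = rb_low16 & 0x3FF
    if e == 0 and f == 0:                 # signed zero
        return s << 31, flags
    if e == 31:                           # NaN or infinity: like-signed max norm
        flags.add("FINV")
        return (s << 31) | 0x7F7FFFFF, flags
    if e == 0:                            # denorm: like-signed zero
        flags.add("FINV")
        return s << 31, flags
    # Normalized: rebias the exponent and pad the fraction with 13 zero bits.
    return (s << 31) | ((e - 15 + 127) << 23) | (f << 13), flags
```

No rounding step appears because every normalized half-precision value is exactly representable in single precision.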
efscfsf
Convert Floating-Point Single-Precision from Signed Fraction
efscfsf rD,rB
Encoding: 0b000100 (bits 0–5) | RD (6–10) | 0b00000 (11–15) | RB (16–20) | 0b01011010011 (21–31)
Description:
bl = RB32:63
RD32:63 = CnvtSF32ToFP32(bl)
The signed fractional low element in RB is converted to a single-precision floating-point value using the
current rounding mode and the result is placed into the low element of RD.
Exceptions:
This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversion is not exact. If
the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round
exception vector. In this case, the destination register is updated with the truncated result, the FG and FX
bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and
FXH bits are cleared.
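A possible reading of CnvtSF32ToFP32 is sketched below in Python. It assumes that the signed fraction is a 32-bit two's complement value scaled by 2^-31 and that the rounding mode is round-to-nearest; both the function name and these assumptions are illustrative and are not taken from this section.

```python
import struct


def efscfsf_model(rb_low32):
    """Return (single_bits, inexact) for a signed-fraction source, rounding mode RN."""
    signed = rb_low32 - (1 << 32) if rb_low32 & 0x80000000 else rb_low32
    exact = signed / 2 ** 31            # exactly representable as a double
    result_bits = struct.unpack(">I", struct.pack(">f", exact))[0]
    rounded = struct.unpack(">f", struct.pack(">I", result_bits))[0]
    return result_bits, rounded != exact   # inexact corresponds to SPEFSCR[FINXS]
```

A source with more than 24 significant bits, such as 0x7FFFFFFF, cannot be represented exactly and therefore reports an inexact result.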
efscfsi
Convert Floating-Point Single-Precision from Signed Integer
efscfsi rD,rB
Encoding: 0b000100 (bits 0–5) | RD (6–10) | 0b00000 (11–15) | RB (16–20) | 0b01011010001 (21–31)
Description:
bl = RB32:63
RD32:63 = CnvtSI32ToFP32(bl)
The signed integer low element in RB is converted to a single-precision floating-point value using the
current rounding mode and the result is placed into the low element of RD.
Exceptions:
This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversion is not exact. If
the floating-point inexact exception is enabled, an exception is taken using the floating-point round
exception vector. In this case, the destination register is updated with the truncated result, the FG and FX
bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and
FXH bits are cleared.
efscfuf
Convert Floating-Point Single-Precision from Unsigned Fraction
efscfuf rD,rB
Encoding: 0b000100 (bits 0–5) | RD (6–10) | 0b00000 (11–15) | RB (16–20) | 0b01011010010 (21–31)
Description:
bl = RB32:63
RD32:63 = CnvtUF32ToFP32(bl)
The unsigned fractional low element in RB is converted to a single-precision floating-point value using
the current rounding mode and the result is placed into the low element of RD.
Exceptions:
This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversion is not exact. If
the floating-point inexact exception is enabled, an exception is taken using the floating-point round
exception vector. In this case, the destination register is updated with the truncated result, the FG and FX
bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and
FXH bits are cleared.
efscfui
Convert Floating-Point Single-Precision from Unsigned Integer
efscfui rD,rB
Encoding: 0b000100 (bits 0–5) | RD (6–10) | 0b00000 (11–15) | RB (16–20) | 0b01011010000 (21–31)
Description:
bl = RB32:63
RD32:63 = CnvtUI32ToFP32(bl)
The unsigned integer low element in RB is converted to a single-precision floating-point value using the
current rounding mode and the result is placed into the low element of RD.
Exceptions:
This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversion is not exact. If
the floating-point inexact exception is enabled, an exception is taken using the floating-point round
exception vector. In this case, the destination register is updated with the truncated result, the FG and FX
bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and
FXH bits are cleared.
efscmpgt
Floating-Point Single-Precision Compare Greater Than
efscmpgt crfD,rA,rB
Encoding: 0b000100 (bits 0–5) | crfD (6–8) | 0b00 (9–10) | RA (11–15) | RB (16–20) | 0b01011001100 (21–31)
Description:
al = RA32:63
bl = RB32:63
if (al > bl) then cl = 1
else cl = 0
CR4*crfD:4*crfD+3 = undefined || cl || undefined || undefined
The low element of RA is compared against the low element of RB. If RA is greater than RB, then the bit
in the crfD is set, otherwise it is cleared. Comparison ignores the sign of 0 (+0 = –0).
Exceptions:
If the contents of RA or RB are infinity, denorm, or NaN, SPEFSCR[FINV] is set, and the FGH, FXH, FG,
and FX bits are cleared. If floating-point invalid input exceptions are enabled, an exception is taken,
and the condition register is not updated. Otherwise, the comparison proceeds after treating NaNs,
infinities, and denorms as normalized numbers, using their values of ‘e’ and ‘f’ directly.
efscmpeq
Floating-Point Single-Precision Compare Equal
efscmpeq crfD,rA,rB
Encoding: 0b000100 (bits 0–5) | crfD (6–8) | 0b00 (9–10) | RA (11–15) | RB (16–20) | 0b01011001110 (21–31)
Description:
al = RA32:63
bl = RB32:63
if (al == bl) then cl = 1
else cl = 0
CR4*crfD:4*crfD+3 = undefined || cl || undefined || undefined
The low element of RA is compared against the low element of RB. If RA is equal to RB, then the bit in
the crfD is set, otherwise it is cleared. Comparison ignores the sign of 0 (+0 = –0).
Exceptions:
If the contents of RA or RB are infinity, denorm, or NaN, SPEFSCR[FINV] is set, and the FGH, FXH, FG,
and FX bits are cleared. If floating-point invalid input exceptions are enabled, an exception is taken and
the condition register is not updated. Otherwise, the comparison proceeds after treating NaNs, infinities,
and denorms as normalized numbers, using their values of ‘e’ and ‘f’ directly.
efscmplt
Floating-Point Single-Precision Compare Less Than
efscmplt crfD,rA,rB
Encoding: 0b000100 (bits 0–5) | crfD (6–8) | 0b00 (9–10) | RA (11–15) | RB (16–20) | 0b01011001101 (21–31)
Description:
al = RA32:63
bl = RB32:63
if (al < bl) then cl = 1
else cl = 0
CR4*crfD:4*crfD+3 = undefined || cl || undefined || undefined
The low element of RA is compared against the low element of RB. If RA is less than RB, then the bit in
the crfD is set, otherwise it is cleared. Comparison ignores the sign of 0 (+0 = –0).
Exceptions:
If the contents of RA or RB are Infinity, Denorm, or NaN, SPEFSCR[FINV] is set, and the FGH, FXH, FG,
and FX bits are cleared. If Floating-point Invalid Input exceptions are enabled, an exception is taken,
and the condition register is not updated. Otherwise, the comparison proceeds after treating NaNs,
Infinities, and Denorms as normalized numbers, using their values of ‘e’ and ‘f’ directly.
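One way to model the comparisons with invalid input exceptions disabled is to order operands by sign and by the raw exponent and fraction fields, which is what treating NaNs, infinities, and denorms as normalized numbers amounts to. The Python sketch below takes that interpretation; the function names and the returned dictionary are hypothetical.

```python
def _order_key(bits):
    """Signed magnitude built from the e and f fields; +0 and -0 both map to 0."""
    magnitude = bits & 0x7FFFFFFF
    return -magnitude if bits & 0x80000000 else magnitude


def _is_special(bits):
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    return e == 255 or (e == 0 and f != 0)   # infinity, NaN, or denorm


def efscmp_model(a_bits, b_bits):
    """Return the gt/eq/lt outcomes and the FINV status for the low elements."""
    ka, kb = _order_key(a_bits), _order_key(b_bits)
    return {
        "gt": ka > kb,    # cl for efscmpgt
        "eq": ka == kb,   # cl for efscmpeq
        "lt": ka < kb,    # cl for efscmplt
        "FINV": _is_special(a_bits) or _is_special(b_bits),
    }
```

Under this model an infinity compares greater than pmax and a NaN orders according to its fraction field, so software that needs IEEE 754 unordered semantics has to test SPEFSCR[FINV] or enable the exception.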
efscth
Convert Floating-Point Single-Precision to Half-Precision
efscth rD,rB
Encoding: 0b000100 (bits 0–5) | RD (6–10) | 0b00100 (11–15) | RB (16–20) | 0b01011010101 (21–31)
FP32format f;
FP16format result;
f ← rB[32:63]
if ((f_exp = 0) & (f_frac = 0)) then
    result ← f_sign || ¹⁵0                      // signed zero value
else if Isa32NaNorInfinity(f) then
    SPEFSCR[FINV] ← 1
    result ← f_sign || 0b11110 || ¹⁰1           // max value
else if Isa32Denorm(f) then
    SPEFSCR[FINV] ← 1
    result ← f_sign || ¹⁵0
else
    unbias ← f_exp - 127
    if unbias > 15 then
        result ← f_sign || 0b11110 || ¹⁰1       // max value
        SPEFSCR[FOVF] ← 1
    else if unbias < -14 && (result would not round up to bmin) then
        result ← f_sign || ¹⁵0                  // like-signed zero value
        SPEFSCR[FUNF] ← 1
    else
        result_sign ← f_sign
        result_exp ← unbias + 15
        result_frac ← f_frac[0:9]
        guard ← f_frac[10]
        sticky ← (f_frac[11:22] ≠ 0)
        result ← Round16(result, LOWER, guard, sticky)
        SPEFSCR[FG] ← guard
        SPEFSCR[FX] ← sticky
        if guard | sticky then
            SPEFSCR[FINXS] ← 1
rD[32:63] = ¹⁶0 || result
The single-precision FP number in the low element in RB is converted to a half-precision floating-point
value using the current rounding mode. The result is then prepended with 16 zeros, and placed into the low
element of RD.
Exceptions:
If the source element of rB is Infinity, Denorm, or NaN, SPEFSCR[FINV] is set. If SPEFSCR[FINVE] is
set, an interrupt is taken, the destination register is not updated, and the FGH, FXH, FG, and FX bits are
cleared. Otherwise, if an overflow occurs, SPEFSCR[FOVF] is set, and if an underflow occurs,
SPEFSCR[FUNF] is set. If either underflow or overflow exceptions are enabled and the corresponding bit
is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated.
If the result of this instruction is inexact or if an overflow occurs but overflow exceptions are disabled, and
no other interrupt is taken, SPEFSCR[FINXS] is set. If the floating-point inexact exception is enabled, an
interrupt is taken using the floating-point round interrupt vector. In this case, the destination register is
updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be
performed in the interrupt handler, and the FGH and FXH bits are cleared.
FGH, FXH, FG, and FX are cleared if an overflow, underflow, or invalid operation/input error is signaled,
regardless of enabled exceptions.
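The efscth flow with all exceptions disabled can be sketched in Python as shown below. This is a simplified model under stated assumptions: only round-to-nearest-even is implemented in place of Round16, overflow and underflow are evaluated after rounding, a result that rounds past pmaxhp is treated as an overflow, and FINXS is reported together with FOVF and FUNF. The function and flag names are hypothetical.

```python
HALF_MAX = 0x7BFF   # magnitude of pmaxhp / nmaxhp


def efscth_model(rb_low32):
    """Return (half_bits, flags) for a single-precision source, mode RN, exceptions disabled."""
    s = (rb_low32 >> 31) & 1
    e = (rb_low32 >> 23) & 0xFF
    f = rb_low32 & 0x7FFFFF
    sign = s << 15
    if e == 0 and f == 0:
        return sign, set()                    # signed zero
    if e == 255:
        return sign | HALF_MAX, {"FINV"}      # NaN or infinity: max value
    if e == 0:
        return sign, {"FINV"}                 # denorm: like-signed zero
    unbias = e - 127
    top = f >> 13                             # f_frac[0:9], the ten most significant bits
    guard = (f >> 12) & 1                     # f_frac[10]
    sticky = int((f & 0xFFF) != 0)            # OR of f_frac[11:22]
    sig = (1 << 10) | top                     # hidden bit plus fraction
    if guard and (sticky or (top & 1)):       # round to nearest, ties to even
        sig += 1
        if sig == 1 << 11:                    # carry out of the significand
            sig >>= 1
            unbias += 1
    if unbias > 15:
        return sign | HALF_MAX, {"FOVF", "FINXS"}
    if unbias < -14:
        return sign, {"FUNF", "FINXS"}
    flags = set()
    if guard:
        flags.add("FG")
    if sticky:
        flags.add("FX")
    if guard or sticky:
        flags.add("FINXS")
    return sign | ((unbias + 15) << 10) | (sig & 0x3FF), flags
```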
efsctsf
Convert Floating-Point Single-Precision to Signed Fraction
efsctsf rD,rB
Encoding: 0b000100 (bits 0–5) | RD (6–10) | 0b00000 (11–15) | RB (16–20) | 0b01011010111 (21–31)
Description:
bl = RB32:63
if (bl == Denorm) then
RD32:63 = 0
else if ((bl == +0) || (bl == -0)) // zero cases
RD32:63 = 0
else if (ebl < 127) then
RD32:63 = CnvtFP32ToSF32Sat(bl)
else if ((ebl == 127) && (sbl == 1) && (fbl==0)) then
RD32:63 = 0x80000000 // max negative, no overflow
else if (bl == NAN) then RD32:63 = 0
else // Overflow
if (sbl == 0) then // Positive
RD32:63 = 0x7FFFFFFF
else
RD32:63 = 0x80000000
The single-precision floating-point low element in RB is converted to a signed fraction using the current
rounding mode and the result is saturated if it cannot be represented in a 32-bit fraction. NaNs are
converted as though they were zero.
Exceptions:
If the contents of RB are Infinity, Denorm, or NaN, or if an overflow occurs, SPEFSCR[FINV] is set, and
the FGH, FXH, FG, and FX bits are cleared. If SPEFSCR[FINVE] is set, an exception is taken, and the
destination register is not updated.
This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversion is not exact. If
the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round
exception vector. In this case, the destination register is updated with the truncated result, the FG and FX
bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and
FXH bits are cleared.
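The efsctsf decision sequence can be followed in the Python sketch below, with all exceptions disabled. It assumes that CnvtFP32ToSF32Sat scales the value by 2^31 and that the rounding mode is round-to-nearest-even; the function and flag names are hypothetical.

```python
import struct


def efsctsf_model(rb_low32):
    """Return (fraction_bits, flags) for a single-precision source, mode RN."""
    flags = set()
    s = (rb_low32 >> 31) & 1
    e = (rb_low32 >> 23) & 0xFF
    f = rb_low32 & 0x7FFFFF
    if e == 0:                                   # zero or denorm converts to 0
        if f != 0:
            flags.add("FINV")
        return 0, flags
    if e == 255 and f != 0:                      # NaN converts as though it were zero
        flags.add("FINV")
        return 0, flags
    if e < 127:                                  # magnitude below 1.0
        value = struct.unpack(">f", struct.pack(">I", rb_low32))[0]
        scaled = value * 2.0 ** 31               # exact: scaling by a power of two
        result = round(scaled)                   # Python rounds ties to even
        if result != scaled:
            flags.add("FINXS")
        return result & 0xFFFFFFFF, flags
    if e == 127 and s == 1 and f == 0:           # exactly -1.0: max negative, no overflow
        return 0x80000000, flags
    flags.add("FINV")                            # overflow, including infinities
    return (0x80000000 if s else 0x7FFFFFFF), flags
```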
efsctsi
Convert Floating-Point Single-Precision to Signed Integer
efsctsi rD,rB
Encoding: 0b000100 (bits 0–5) | RD (6–10) | 0b00000 (11–15) | RB (16–20) | 0b01011010101 (21–31)
Description:
bl = RB32:63
if (bl == Denorm) then
RD32:63 = 0
else if (ebl < 158) then
RD32:63 = CnvtFP32ToSI32Sat(bl)
else if ((ebl == 158) && (sbl == 1) && (fbl==0)) then
RD32:63 = 0x80000000 // max negative, no overflow
else if (bl == NAN) then RD32:63 = 0
else // Overflow
if (sbl == 0) then // Positive
RD32:63 = 0x7FFFFFFF
else
RD32:63 = 0x80000000
The single-precision floating-point low element in RB is converted to a signed integer using the current
rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted
as though they were zero.
Exceptions:
If the contents of RB are Infinity, Denorm, or NaN, or if an overflow occurs, SPEFSCR[FINV] is set, and
the FGH, FXH, FG, and FX bits are cleared. If SPEFSCR[FINVE] is set, an exception is taken, the
destination register is not updated, and no other status bits are set.
This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversion is not exact. If
the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round
exception vector. In this case, the destination register is updated with the truncated result, the FG and FX
bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and
FXH bits are cleared.
efsctsiz
Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero
efsctsiz rD,rB
Encoding: 0b000100 (bits 0–5) | RD (6–10) | 0b00000 (11–15) | RB (16–20) | 0b01011011010 (21–31)
Description:
bl = RB32:63
if (bl == Denorm) then
RD32:63 = 0
else if (ebl < 158) then
RD32:63 = CnvtFP32ToSI32Sat(bl)
else if ((ebl == 158) && (sbl == 1) && (fbl==0)) then
RD32:63 = 0x80000000 // max negative, no overflow
else if (bl == NAN) then RD32:63 = 0
else // Overflow
if (sbl == 0) then // Positive
RD32:63 = 0x7FFFFFFF
else
RD32:63 = 0x80000000
The single-precision floating-point low element in RB is converted to a signed integer using the rounding
mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs
are converted as though they were zero.
Exceptions:
If the contents of RB are Infinity, Denorm, or NaN, or if an overflow occurs, SPEFSCR[FINV] is set, and
the FGH, FXH, FG, and FX bits are cleared. If SPEFSCR[FINVE] is set, an exception is taken, the
destination register is not updated, and no other status bits are set.
This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversion is not exact. If
the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round
exception vector. In this case, the destination register is updated with the truncated result, the FG and FX
bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and
FXH bits are cleared.
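The efsctsiz decision sequence, with all exceptions disabled, can be modeled in Python as follows. The function and flag names are hypothetical; truncation toward zero stands in for CnvtFP32ToSI32Sat, which is sufficient here because every source with a biased exponent below 158 already fits in 32 bits.

```python
import math
import struct


def efsctsiz_model(rb_low32):
    """Return (integer_bits, flags) for a single-precision source, round toward zero."""
    flags = set()
    s = (rb_low32 >> 31) & 1
    e = (rb_low32 >> 23) & 0xFF
    f = rb_low32 & 0x7FFFFF
    if e == 0:                                   # zero or denorm converts to 0
        if f != 0:
            flags.add("FINV")
        return 0, flags
    if e == 255 and f != 0:                      # NaN converts as though it were zero
        flags.add("FINV")
        return 0, flags
    if e < 158:                                  # magnitude below 2^31
        value = struct.unpack(">f", struct.pack(">I", rb_low32))[0]
        result = math.trunc(value)
        if result != value:
            flags.add("FINXS")
        return result & 0xFFFFFFFF, flags
    if e == 158 and s == 1 and f == 0:           # exactly -2^31: max negative, no overflow
        return 0x80000000, flags
    flags.add("FINV")                            # overflow, including infinities
    return (0x80000000 if s else 0x7FFFFFFF), flags
```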
efsctuf
Convert Floating-Point Single-Precision to Unsigned Fraction
efsctuf rD,rB
Encoding: 0b000100 (bits 0–5) | RD (6–10) | 0b00000 (11–15) | RB (16–20) | 0b01011010110 (21–31)
Description:
bl = RB32:63
if (bl == Denorm) then // force denorm to zero
RD32:63 = 0
else if ((bl == +0) || (bl == -0)) // zero cases
RD32:63 = 0
else if (sbl == 1) // Negative
RD32:63 = 0
else if (ebl < 127)
RD32:63 = CnvtFP32ToUF32Sat(bl)
else if (bl == NAN) then RD32:63 = 0
else // Overflow
RD32:63 = 0xFFFFFFFF
The single-precision floating-point low element in RB is converted to an unsigned fraction using the
current rounding mode and the result is saturated if it cannot be represented in a 32-bit unsigned fraction.
NaNs are converted as though they were zero.
Exceptions:
If the contents of RB are Infinity, Denorm, or NaN, or if an overflow occurs, SPEFSCR[FINV] is set, and
the FGH, FXH, FG, and FX bits are cleared. If SPEFSCR[FINVE] is set, an exception is taken, and the
destination register is not updated.
This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversion is not exact. If
the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round
exception vector. In this case, the destination register is updated with the truncated result, the FG and FX
bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and
FXH bits are cleared.
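A minimal C sketch of the value selection above is given here for illustration. It assumes the unsigned fraction is the input value scaled by 2^32, and it models Round toward Zero only, whereas the instruction uses the current rounding mode; the function name efsctuf_model_rz is hypothetical and status bits are not modeled.

```c
#include <stdint.h>

/* Hypothetical model of the efsctuf result for rounding mode RZ only.
 * `bits` is the raw single-precision pattern held in RB[32:63]. */
uint32_t efsctuf_model_rz(uint32_t bits)
{
    uint32_t s = bits >> 31;            /* sbl */
    uint32_t e = (bits >> 23) & 0xFFu;  /* ebl (biased exponent) */
    uint32_t f = bits & 0x7FFFFFu;      /* fbl */

    if (e == 0u)
        return 0u;                      /* +0, -0, and denorms give 0 */
    if (s)
        return 0u;                      /* negative inputs give 0 */
    if (e < 127u) {                     /* 0 < value < 1.0 */
        uint32_t sig = 0x800000u | f;   /* value = sig * 2^(e-150) */
        /* fraction = value * 2^32 = sig * 2^(e-118) */
        if (e >= 118u)
            return sig << (e - 118u);   /* exact, shift of at most 8 */
        if (118u - e >= 24u)
            return 0u;                  /* all significand bits shifted out */
        return sig >> (118u - e);       /* truncate low-order bits */
    }
    if (e == 255u && f != 0u)
        return 0u;                      /* NaN converts as though it were zero */
    return 0xFFFFFFFFu;                 /* value >= 1.0 or +infinity: saturate */
}
```

Under these assumptions 0.5 (0x3F000000) converts to 0x80000000 and 1.0 (0x3F800000) saturates to 0xFFFFFFFF.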
efsctui
Convert Floating-Point Single-Precision to Unsigned Integer
efsctui rD,rB
Bits    0:5      6:10   11:15   16:20   21:31
Field   000100   RD     00000   RB      01011010100
Description:
bl = RB32:63
if (bl == Denorm) then // force denorm to zero
    RD32:63 = 0
else if ((bl == +0) || (bl == -0)) then // zero cases
    RD32:63 = 0
else if (sbl == 1) then // Negative
    RD32:63 = 0
else if (ebl <= 158) then
    RD32:63 = CnvtFP32ToUI32Sat(bl)
else if (bl == NAN) then
    RD32:63 = 0
else // Overflow
    RD32:63 = 0xFFFFFFFF
The single-precision floating-point low element in RB is converted to an unsigned integer using the current
rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted
as though they were zero.
Exceptions:
If the contents of RB are Infinity, Denorm, or NaN, or if an overflow occurs, SPEFSCR[FINV] is set, and
the FGH, FXH, FG, and FX bits are cleared. If SPEFSCR[FINVE] is set, an exception is taken, and the
destination register is not updated.
This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversion is not exact. If
the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round
exception vector. In this case, the destination register is updated with the truncated result, the FG and FX
bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and
FXH bits are cleared.
efsctuiz
Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero
efsctuiz rD,rB
Bits    0:5      6:10   11:15   16:20   21:31
Field   000100   RD     00000   RB      01011011000
Description:
bl = RB32:63
if (bl == Denorm) then // force denorm to zero
    RD32:63 = 0
else if ((bl == +0) || (bl == -0)) then // zero cases
    RD32:63 = 0
else if (sbl == 1) then // Negative
    RD32:63 = 0
else if (ebl <= 158) then
    RD32:63 = CnvtFP32ToUI32Sat(bl)
else if (bl == NAN) then
    RD32:63 = 0
else // Overflow
    RD32:63 = 0xFFFFFFFF
The single-precision floating-point low element in RB is converted to an unsigned integer using the
rounding mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer.
NaNs are converted as though they were zero.
Exceptions:
If the contents of RB are Infinity, Denorm, or NaN, or if an overflow occurs, SPEFSCR[FINV] is set, and
the FGH, FXH, FG, and FX bits are cleared. If SPEFSCR[FINVE] is set, an exception is taken, and the
destination register is not updated.
This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversion is not exact. If
the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round
exception vector. In this case, the destination register is updated with the truncated result, the FG and FX
bits are properly updated to allow rounding to be performed in the exception handler, and the FGH and
FXH bits are cleared.
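The following C sketch illustrates the efsctuiz value selection under the Round toward Zero mode named above. It is not a definitive implementation: the function name efsctuiz_model is hypothetical and status bits are not modeled.

```c
#include <stdint.h>

/* Hypothetical model of the efsctuiz result (round toward zero).
 * `bits` is the raw single-precision pattern held in RB[32:63]. */
uint32_t efsctuiz_model(uint32_t bits)
{
    uint32_t s = bits >> 31;            /* sbl */
    uint32_t e = (bits >> 23) & 0xFFu;  /* ebl (biased exponent) */
    uint32_t f = bits & 0x7FFFFFu;      /* fbl */

    if (e == 0u)
        return 0u;                      /* +0, -0, and denorms give 0 */
    if (s)
        return 0u;                      /* negative inputs give 0 */
    if (e <= 158u) {                    /* value < 2^32: representable */
        uint32_t sig = 0x800000u | f;   /* value = sig * 2^(e-150) */
        if (e < 127u)
            return 0u;                  /* value < 1 truncates to 0 */
        if (e <= 150u)
            return sig >> (150u - e);   /* discard fraction bits */
        return sig << (e - 150u);       /* exact, shift of at most 8 */
    }
    if (e == 255u && f != 0u)
        return 0u;                      /* NaN converts as though it were zero */
    return 0xFFFFFFFFu;                 /* overflow or +infinity: saturate */
}
```

For example, 0x40490FDB (approximately 3.14159) converts to 3, and 0x4F800000 (2^32) saturates to 0xFFFFFFFF.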
efsdiv
Floating-Point Single-Precision Divide
efsdiv rD,rA,rB
Bits    0:5      6:10   11:15   16:20   21:31
Field   000100   RD     RA      RB      01011001001
RD32:63 = RA32:63 ÷sp RB32:63
Description:
The low element of RA is divided by the low element of RB and the result is stored in the low element of
RD. If RB is a NaN or infinity, the result is a properly signed zero. Otherwise, if RB is a denormalized
number or a zero, or if RA is either NaN or infinity, the result is either pmax (sa==sb), or nmax (sa!=sb).
Otherwise, if an overflow occurs, then pmax or nmax (as appropriate) is stored in RD. If an underflow
occurs, then +0 or –0 (as appropriate) is stored in RD.
Exceptions:
If the contents of RA or RB are Infinity, Denorm, or NaN, or if both RA and RB are ±0, SPEFSCR[FINV]
is set. If SPEFSCR[FINVE] is set, an exception is taken, and the destination register is not updated.
Otherwise, if the content of RB is ±0 and the content of RA is a finite normalized non-zero number,
SPEFSCR[FDBZ] is set. If Floating-point Divide by Zero exceptions are enabled, an exception is then
taken. Otherwise, if an overflow occurs, SPEFSCR[FOVF] is set, or if an underflow occurs,
SPEFSCR[FUNF] is set. If either underflow or overflow exceptions are enabled and the corresponding bit
is set, an exception is taken. If any of these exceptions are taken, the destination register is not updated.
If the result of this instruction is inexact or if an overflow occurs but overflow exceptions are disabled, and
no other exception is taken, SPEFSCR[FINXS] is set. If the Floating-point Inexact exception is enabled,
an exception is taken using the Floating-point Round exception vector. In this case, the destination register
is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be
performed in the exception handler, and the FGH and FXH bits are cleared.
FGH, FXH, FG and FX will be cleared if an overflow, underflow, divide by zero, or invalid operation/input
error is signaled, regardless of enabled exceptions.
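The special-operand rules above can be sketched in C as shown below. This is a hedged illustration: the names fp_class, div_special, and efsdiv_special are hypothetical, pmax and nmax are assumed to be the largest-magnitude normalized values (0x7F7FFFFF and 0xFF7FFFFF), and ordinary division, overflow, and underflow are left to the caller.

```c
#include <stdint.h>

#define PMAX 0x7F7FFFFFu  /* assumed encoding of pmax */
#define NMAX 0xFF7FFFFFu  /* assumed encoding of nmax */

typedef enum { CLS_ZERO, CLS_DENORM, CLS_NORM, CLS_INF, CLS_NAN } fp_class;

/* Classify a raw single-precision bit pattern. */
static fp_class classify(uint32_t b)
{
    uint32_t e = (b >> 23) & 0xFFu, f = b & 0x7FFFFFu;
    if (e == 0u)   return f ? CLS_DENORM : CLS_ZERO;
    if (e == 255u) return f ? CLS_NAN : CLS_INF;
    return CLS_NORM;
}

static int is_invalid(fp_class c)
{
    return c == CLS_DENORM || c == CLS_INF || c == CLS_NAN;
}

typedef struct {
    int handled;   /* 1 if a special-operand rule produced rd */
    uint32_t rd;   /* result when handled == 1 */
    int finv;      /* SPEFSCR[FINV] would be set */
    int fdbz;      /* SPEFSCR[FDBZ] would be set */
} div_special;

/* Hypothetical model of the efsdiv special-operand ladder (exceptions disabled). */
div_special efsdiv_special(uint32_t ra, uint32_t rb)
{
    fp_class ca = classify(ra), cb = classify(rb);
    uint32_t sign = (ra ^ rb) & 0x80000000u;   /* set when sa != sb */
    div_special r = { 1, 0u, 0, 0 };

    r.finv = is_invalid(ca) || is_invalid(cb) ||
             (ca == CLS_ZERO && cb == CLS_ZERO);
    r.fdbz = !r.finv && cb == CLS_ZERO;        /* normalized non-zero / zero */

    if (cb == CLS_NAN || cb == CLS_INF)
        r.rd = sign;                           /* properly signed zero */
    else if (cb == CLS_DENORM || cb == CLS_ZERO ||
             ca == CLS_NAN || ca == CLS_INF)
        r.rd = sign ? NMAX : PMAX;
    else
        r.handled = 0;                         /* ordinary division applies */
    return r;
}
```

With this sketch, 1.0 divided by +0 returns pmax with FDBZ set, while 0 divided by 0 returns pmax with FINV set and FDBZ clear.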
efsmadd
Floating-Point Single-Precision Multiply-Add
efsmadd rD,rA,rB
Bits    0:5      6:10   11:15   16:20   21:31
Field   000100   RD     RA      RB      01011000010
RD32:63 = ((RA32:63 Xfp RB32:63) +sp RD32:63)
The low element of rA is multiplied by the low element of rB, the intermediate product is added to the
low element of rD, and the result is stored in the low element of rD. If RA or RB are either zero or
denormalized, the intermediate product is a properly signed zero. Otherwise, if RA or RB are either NaN
or infinity, the intermediate product is either pmax (sa==sb), or nmax (sa != sb), and this value is used for
the result and stored into RD. Otherwise, the intermediate product is added to the corresponding element
of RD. If RD is NaN or infinity, the result is either pmax (sd==0), or nmax (sd==1). Otherwise, if an
overflow occurs, then pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, then +0 (for
rounding modes RN, RZ, RP) or –0 (for rounding mode RM) is stored in RD.
Exceptions:
If the contents of RA or RB are Infinity, Denorm, or NaN, SPEFSCR[FINV] is set. If SPEFSCR[FINVE]
is set, an exception is taken, and the destination register is not updated. Otherwise, if an overflow occurs,
SPEFSCR[FOVF] is set, and if an underflow occurs, SPEFSCR[FUNF] is set. If either underflow or
overflow exceptions are enabled and the corresponding bit is set, an exception is taken. If any of these
exceptions are taken, the destination register is not updated.
If the result of this instruction is inexact, or if an overflow occurs on the add but overflow exceptions are
disabled and no other exception is taken, SPEFSCR[FINXS] is set. If the Floating-point Inexact exception
is enabled, an exception is taken using the Floating-point Round exception vector. In this case, the
destination register is updated with the truncated result, the FG and FX bits are properly updated to allow
rounding to be performed in the exception handler, and the FGH and FXH bits are cleared.
FGH, FXH, FG and FX will be cleared if an overflow, underflow, or invalid operation/input error is
signaled, regardless of enabled exceptions.
efsmax
Floating-Point Single-Precision Maximum
efsmax rD,rA,rB
Bits    0:5      6:10   11:15   16:20   21:31
Field   000100   RD     RA      RB      01010110000
al = rA32:63
bl = rB32:63
if (al < bl) then temp = bl
else temp = al
if (isnan(al) & ~(isnan(bl))) then temp = bl
if (isnan(bl) & ~(isnan(al))) then temp = al
rD32:63 = temp
The low element of rA is compared against the low element of rB. The larger element is selected and
placed into the low element of rD. The maximum of +0 and –0 is +0.
Exceptions:
If the contents of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV] is set, and the FGH, FXH, FG
and FX bits are cleared. If SPEFSCR[FINVE] is set, an interrupt is taken, and the destination register is
not updated. Otherwise, the comparison proceeds after treating NaNs, Infinities, and Denorms as
normalized numbers, using their values of ‘e’ and ‘f’ directly. If one of the elements is a NaN and the other
is not, the non-NaN element is selected rather than the comparison result. If the selected element is
denorm, the result is a zero of the same sign. If the selected element is +NaN or +infinity, the corresponding
result is pmax. Otherwise, if the selected element is –NaN or –infinity, the corresponding result is nmax.
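The selection and result rules for efsmax can be sketched in C as follows, assuming invalid-input exceptions are disabled. The function names are hypothetical, pmax and nmax are assumed to be 0x7F7FFFFF and 0xFF7FFFFF, and the ordering key is one possible way to compare sign-magnitude patterns by 'e' and 'f' directly while ranking -0 below +0.

```c
#include <stdint.h>

#define PMAX 0x7F7FFFFFu  /* assumed encoding of pmax */
#define NMAX 0xFF7FFFFFu  /* assumed encoding of nmax */

static int is_nan(uint32_t b)
{
    return ((b >> 23) & 0xFFu) == 0xFFu && (b & 0x7FFFFFu) != 0u;
}

/* Map a sign-magnitude pattern to an unsigned key that preserves numeric
 * order, with -0 ordered just below +0. */
static uint32_t order_key(uint32_t b)
{
    return (b & 0x80000000u) ? ~b : (b | 0x80000000u);
}

/* Hypothetical model of efsmax with SPEFSCR[FINVE] clear. */
uint32_t efsmax_model(uint32_t al, uint32_t bl)
{
    uint32_t temp = (order_key(al) < order_key(bl)) ? bl : al;

    /* A single NaN operand loses to the non-NaN operand. */
    if (is_nan(al) && !is_nan(bl)) temp = bl;
    if (is_nan(bl) && !is_nan(al)) temp = al;

    uint32_t e = (temp >> 23) & 0xFFu;
    uint32_t f = temp & 0x7FFFFFu;
    if (e == 0u && f != 0u)
        return temp & 0x80000000u;                 /* denorm: zero of same sign */
    if (e == 0xFFu)
        return (temp & 0x80000000u) ? NMAX : PMAX; /* NaN or infinity */
    return temp;
}
```

The same structure would apply to efsmin with the first comparison reversed.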
efsmin
Floating-Point Single-Precision Minimum
efsmin rD,rA,rB
Bits    0:5      6:10   11:15   16:20   21:31
Field   000100   RD     RA      RB      01010110001
al = rA32:63
bl = rB32:63
if (al < bl) then temp = al
else temp = bl
if (isnan(al) & ~(isnan(bl))) then temp = bl
if (isnan(bl) & ~(isnan(al))) then temp = al
rD32:63 = temp
The low element of rA is compared against the low element of rB. The smaller element is selected and
placed into the low element of rD. The minimum of +0 and –0 is –0.
Exceptions:
If the contents of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV] is set, and the FGH, FXH, FG
and FX bits are cleared. If SPEFSCR[FINVE] is set, an interrupt is taken, and the destination register is
not updated. Otherwise, the comparison proceeds after treating NaNs, Infinities, and Denorms as
normalized numbers, using their values of ‘e’ and ‘f’ directly. If one of the elements is a NaN and the other
is not, the non-NaN element is selected rather than the comparison result. If the selected element is
denorm, the result is a zero of the same sign. If the selected element is +NaN or +infinity, the corresponding
result is pmax. Otherwise, if the selected element is –NaN or –infinity, the corresponding result is nmax.
efsmsub
Floating-Point Single-Precision Multiply-Subtract
efsmsub rD,rA,rB
Bits    0:5      6:10   11:15   16:20   21:31
Field   000100   RD     RA      RB      01011000011
RD32:63 = ((RA32:63 Xfp RB32:63) -sp RD32:63)
The low element of rA is multiplied by the low element of rB, the low element of rD is subtracted from
the intermediate product, and the result is stored in the low element of rD. If RA or RB are either zero or
denormalized, the intermediate product is a properly signed zero. Otherwise, if RA or RB are either NaN
or infinity, the intermediate product is either pmax (sa==sb), or nmax (sa != sb), and this value is used for
the result and stored into RD. Otherwise, the low element of rD is subtracted from the intermediate
product. If RD is NaN or infinity, the result is either nmax (sd==0), or pmax (sd==1). Otherwise, if an
overflow occurs, then pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, then +0 (for
rounding modes RN, RZ, RP) or –0 (for rounding mode RM) is stored in RD.
Exceptions:
If the contents of RA or RB are Infinity, Denorm, or NaN, SPEFSCR[FINV] is set. If SPEFSCR[FINVE]
is set, an exception is taken, and the destination register is not updated. Otherwise, if an overflow occurs,
SPEFSCR[FOVF] is set, and if an underflow occurs, SPEFSCR[FUNF] is set. If either underflow or
overflow exceptions are enabled and the corresponding bit is set, an exception is taken. If any of these
exceptions are taken, the destination register is not updated.
If the result of this instruction is inexact or if an overflow occurs but overflow exceptions are disabled, and
no other exception is taken, SPEFSCR[FINXS] is set. If the Floating-point Inexact exception is
enabled, an exception is taken using the Floating-point Round exception vector. In this case, the
destination register is updated with the truncated result, the FG and FX bits are properly updated to allow
rounding to be performed in the exception handler, and the FGH and FXH bits are cleared.
FGH, FXH, FG and FX will be cleared if an overflow, underflow, or invalid operation/input error is
signaled, regardless of enabled exceptions.
efsmul
Floating-Point Single-Precision Multiply
efsmul rD,rA,rB
Bits    0:5      6:10   11:15   16:20   21:31
Field   000100   RD     RA      RB      01011001000
RD32:63 = RA32:63 Xsp RB32:63
Description:
The low element of RA is multiplied by the low element of RB and the result is stored in the low element
of RD. If RA or RB are either zero or denormalized, the result is a properly signed zero. Otherwise, if RA
or RB are either NaN or infinity, the result is either pmax (sa == sb), or nmax (sa != sb). Otherwise, if an
overflow occurs, then pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, then +0 or
–0 (as appropriate) is stored in RD.
Exceptions:
If the contents of RA or RB are Infinity, Denorm, or NaN, SPEFSCR[FINV] is set. If SPEFSCR[FINVE]
is set, an exception is taken, and the destination register is not updated. Otherwise, if an overflow occurs,
SPEFSCR[FOVF] is set, and if an underflow occurs, SPEFSCR[FUNF] is set. If either underflow or
overflow exceptions are enabled and the corresponding bit is set, an exception is taken. If any of these
exceptions are taken, the destination register is not updated.
If the result of this instruction is inexact or if an overflow occurs but overflow exceptions are disabled, and
no other exception is taken, SPEFSCR[FINXS] is set. If the Floating-point Inexact exception is enabled,
an exception is taken using the Floating-point Round exception vector. In this case, the destination register
is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be
performed in the exception handler, and the FGH and FXH bits are cleared.
FGH, FXH, FG and FX are cleared if an overflow, underflow, or invalid operation/input error is signaled,
regardless of enabled exceptions.
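As an illustration of the instruction format used throughout this section, the C sketch below assembles a 32-bit instruction word from its fields. It assumes Power Architecture bit numbering (bit 0 is the most significant bit), so that the primary opcode occupies bits 0:5, RD bits 6:10, RA bits 11:15, RB bits 16:20, and the extended opcode bits 21:31. The function name efpu_encode is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: build an EFPU instruction word from its fields.
 * Bit 0 is the MSB, so field 0:5 sits in the top six bits of the word. */
uint32_t efpu_encode(uint32_t rd, uint32_t ra, uint32_t rb, uint32_t xo)
{
    return (4u << 26)             /* bits 0:5   primary opcode 000100 */
         | ((rd & 31u) << 21)     /* bits 6:10  RD */
         | ((ra & 31u) << 16)     /* bits 11:15 RA */
         | ((rb & 31u) << 11)     /* bits 16:20 RB */
         | (xo & 0x7FFu);         /* bits 21:31 extended opcode */
}
```

With the efsmul extended opcode 0b01011001000 (0x2C8), efsmul r3,r4,r5 would encode as efpu_encode(3, 4, 5, 0x2C8), which evaluates to 0x10642AC8.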
efsnabs
Floating-Point Single-Precision Negative Absolute Value
efsnabs rD,rA
Bits    0:5      6:10   11:15   16:20   21:31
Field   000100   RD     RA      00000   01011000101
RD32:63 = 0b1 || RA33:63
Description:
The sign bit of the low element of RA is set to 1 and the result is placed into the low element of RD.
Exceptions:
If the low element of RA is Infinity, Denorm, or NaN, SPEFSCR[FINV] is set, and FG and FX are cleared.
FGH and FXH are cleared as well. If Floating-point Invalid Input exceptions are enabled then an exception
is taken, and the destination register is not updated.
efsneg
Floating-Point Single-Precision Negate
efsneg rD,rA
Bits    0:5      6:10   11:15   16:20   21:31
Field   000100   RD     RA      00000   01011000110
RD32:63 = ¬RA32 || RA33:63
Description:
The sign bit of the low element of RA is complemented and the result is placed into the low element of RD.
Exceptions:
If the low element of RA is Infinity, Denorm, or NaN, SPEFSCR[FINV] is set, and FG and FX are cleared.
FGH and FXH are cleared as well. If Floating-point Invalid Input exceptions are enabled then an exception
is taken, and the destination register is not updated.
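Because efsnabs and efsneg only manipulate the sign bit, their results can be sketched in C as single bitwise operations on the raw 32-bit pattern. The function names below are hypothetical; efs_invalid_input illustrates the Infinity, Denorm, or NaN test that sets SPEFSCR[FINV].

```c
#include <stdint.h>

/* efsnabs: RD[32:63] = 0b1 || RA[33:63] (force the sign bit to 1). */
uint32_t efsnabs_model(uint32_t ra) { return ra | 0x80000000u; }

/* efsneg: RD[32:63] = ~RA[32] || RA[33:63] (complement the sign bit). */
uint32_t efsneg_model(uint32_t ra) { return ra ^ 0x80000000u; }

/* Returns 1 when the operand is Infinity, NaN, or Denorm. */
int efs_invalid_input(uint32_t ra)
{
    uint32_t e = (ra >> 23) & 0xFFu;   /* biased exponent */
    uint32_t f = ra & 0x7FFFFFu;       /* fraction */
    return e == 0xFFu || (e == 0u && f != 0u);
}
```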
efsnmadd
Floating-Point Single-Precision Negative Multiply-Add
efsnmadd rD,rA,rB
Bits    0:5      6:10   11:15   16:20   21:31
Field   000100   RD     RA      RB      01011001010
RD32:63 = -((RA32:63 Xfp RB32:63) +sp RD32:63)
The low element of rA is multiplied by the low element of rB, the intermediate product is added to the
low element of rD, and the negated result is stored in the low element of rD. If RA or RB are either zero
or denormalized, the intermediate product is a properly signed zero. Otherwise, if RA or RB are either NaN
or infinity, the intermediate product is either pmax (sa==sb), or nmax (sa != sb), and this value is used for
the result and stored into RD. Otherwise, the intermediate product is added to the corresponding element
of RD, and the final result is negated. If RD is NaN or infinity, the result is either nmax (sd==0), or pmax
(sd==1). Otherwise, if an overflow occurs, then pmax or nmax (as appropriate) is stored in RD. If an
underflow occurs, then –0 (for rounding modes RN, RZ, RP) or +0 (for rounding mode RM) is stored in
RD.
Exceptions:
If the contents of RA or RB are Infinity, Denorm, or NaN, SPEFSCR[FINV] is set. If SPEFSCR[FINVE]
is set, an exception is taken, and the destination register is not updated. Otherwise, if an overflow occurs,
SPEFSCR[FOVF] is set, and if an underflow occurs, SPEFSCR[FUNF] is set. If either underflow or
overflow exceptions are enabled and the corresponding bit is set, an exception is taken. If any of these
exceptions are taken, the destination register is not updated.
If the result of this instruction is inexact or if an overflow occurs but overflow exceptions are disabled, and
no other exception is taken, SPEFSCR[FINXS] is set. If the Floating-point Inexact exception is enabled,
an exception is taken using the Floating-point Round exception vector. In this case, the destination register
is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be
performed in the exception handler, and the FGH and FXH bits are cleared.
FGH, FXH, FG and FX will be cleared if an overflow, underflow, or invalid operation/input error is
signaled, regardless of enabled exceptions.
efsnmsub
Floating-Point Single-Precision Negative Multiply-Subtract
efsnmsub rD,rA,rB
Bits    0:5      6:10   11:15   16:20   21:31
Field   000100   RD     RA      RB      01011001011
RD32:63 = -((RA32:63 Xfp RB32:63) -sp RD32:63)
The low element of rA is multiplied by the low element of rB, the low element of rD is
subtracted from the intermediate product, and the negated result is stored in the low element of rD. If RA
or RB are either zero or denormalized, the intermediate product is a properly signed zero. Otherwise, if
RA or RB are either NaN or infinity, the intermediate product is either pmax (sa==sb), or nmax (sa != sb),
and this value is negated to obtain the result and is stored into RD. Otherwise, the low element of rD is
subtracted from the intermediate product, and the final result is negated. If RD is NaN or infinity, the final
result is either pmax (sd==0), or nmax (sd==1). Otherwise, if an overflow occurs, then pmax or nmax (as
appropriate) is stored in RD. If an underflow occurs, then –0 (for rounding modes RN, RZ, RP) or +0 (for
rounding mode RM) is stored in RD.
Exceptions:
If the contents of RA or RB are Infinity, Denorm, or NaN, SPEFSCR[FINV] is set. If SPEFSCR[FINVE]
is set, an exception is taken, and the destination register is not updated. Otherwise, if an overflow occurs,
SPEFSCR[FOVF] is set, and if an underflow occurs, SPEFSCR[FUNF] is set. If either underflow or
overflow exceptions are enabled and the corresponding bit is set, an exception is taken. If any of these
exceptions are taken, the destination register is not updated.
If the result of this instruction is inexact or if an overflow occurs but overflow exceptions are disabled, and
no other exception is taken, SPEFSCR[FINXS] is set. If the Floating-point Inexact exception is enabled,
an exception is taken using the Floating-point Round exception vector. In this case, the destination register
is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be
performed in the exception handler, and the FGH and FXH bits are cleared.
FGH, FXH, FG and FX are cleared if an overflow, underflow, or invalid operation/input error is signaled,
regardless of enabled exceptions.
efssqrt
Floating-Point Single-Precision Square Root
efssqrt rD,rA
Bits    0:5      6:10   11:15   16:20   21:31
Field   000100   RD     RA      00000   01011000111
rD32:63 = SQRT(rA32:63)
The square root of the low element of rA is calculated, and the result is stored in the low element of rD.
If the low element of rA is zero or denorm, the result is a zero of the same sign. If the low element of rA is
+NaN or +infinity, the corresponding result is pmax. Otherwise, if the low element of rA is non-zero and
has a negative sign, including –NaN or –infinity, the corresponding result is –0. Otherwise, if an underflow
occurs, +0 (for rounding modes RN, RZ, RP) or –0 (for rounding mode RM) is stored in the low element
of rD.
Exceptions:
If the low element of rA is non-zero and has a negative sign, or is Infinity, Denorm, or NaN,
SPEFSCR[FINV] is set, and SPEFSCR[FGH, FXH, FG, FX] are cleared. If SPEFSCR[FINVE] is set, an
interrupt is taken and the destination register is not updated. Otherwise, if an underflow occurs,
SPEFSCR[FUNF] is set. If underflow exceptions are enabled and a corresponding status bit is set, an
interrupt is taken. If any of these interrupts are taken, the destination register is not updated.
If the result element of this instruction is inexact, or underflows but underflow exceptions are disabled,
and no other interrupt is taken, SPEFSCR[FINXS] is set. If the floating-point inexact exception is enabled,
an interrupt is taken using the floating-point round interrupt vector. In this case, the destination register is
updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be
performed in the interrupt handler, and the FGH and FXH bits are cleared.
FG, FX, FGH, and FXH are cleared if an underflow or an invalid operation/input error is signaled for the
low element, regardless of enabled exceptions.
efssub
Floating-Point Single-Precision Subtract
efssub rD,rA,rB
Bits    0:5      6:10   11:15   16:20   21:31
Field   000100   RD     RA      RB      01011000001
RD32:63 = RA32:63 -sp RB32:63
Description:
The low element of RB is subtracted from the low element of RA and the result is stored in the low element
of RD. If RA is NaN or infinity, the result is either pmax (sa==0), or nmax (sa==1). Otherwise, if RB is
NaN or infinity, the result is either nmax (sb==0), or pmax (sb==1). Otherwise, if an overflow occurs, then
pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, then +0 (for rounding modes RN,
RZ, RP) or –0 (for rounding mode RM) is stored in RD.
Exceptions:
If the contents of RA or RB are Infinity, Denorm, or NaN, SPEFSCR[FINV] is set. If SPEFSCR[FINVE]
is set, an exception is taken, and the destination register is not updated. Otherwise, if an overflow occurs,
SPEFSCR[FOVF] is set, and if an underflow occurs, SPEFSCR[FUNF] is set. If either underflow or
overflow exceptions are enabled and the corresponding bit is set, an exception is taken. If any of these
exceptions are taken, the destination register is not updated.
If the result of this instruction is inexact or if an overflow occurs but overflow exceptions are disabled, and
no other exception is taken, SPEFSCR[FINXS] is set. If the Floating-point Inexact exception is enabled,
an exception is taken using the Floating-point Round exception vector. In this case, the destination register
is updated with the truncated result, the FG and FX bits are properly updated to allow rounding to be
performed in the exception handler, and the FGH and FXH bits are cleared.
FGH, FXH, FG and FX are cleared if an overflow, underflow, or invalid operation/input error is signaled,
regardless of enabled exceptions.
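The priority of the NaN and infinity rules for efssub can be illustrated with the C sketch below. It assumes pmax and nmax are encoded as 0x7F7FFFFF and 0xFF7FFFFF; the names sub_special and efssub_special are hypothetical, and ordinary subtraction, overflow, and underflow are outside its scope.

```c
#include <stdint.h>

#define PMAX 0x7F7FFFFFu  /* assumed encoding of pmax */
#define NMAX 0xFF7FFFFFu  /* assumed encoding of nmax */

typedef struct {
    int handled;   /* 1 if a NaN or infinity rule produced rd */
    uint32_t rd;   /* result when handled == 1 */
} sub_special;

/* Returns 1 for NaN or infinity (biased exponent of all ones). */
static int is_nan_or_inf(uint32_t b)
{
    return ((b >> 23) & 0xFFu) == 0xFFu;
}

/* Hypothetical model of the efssub special-operand rules.
 * RA is checked first; RB's sign is inverted because RB is subtracted. */
sub_special efssub_special(uint32_t ra, uint32_t rb)
{
    sub_special r = { 0, 0u };
    if (is_nan_or_inf(ra)) {
        r.handled = 1;
        r.rd = (ra >> 31) ? NMAX : PMAX;   /* pmax (sa==0), nmax (sa==1) */
    } else if (is_nan_or_inf(rb)) {
        r.handled = 1;
        r.rd = (rb >> 31) ? PMAX : NMAX;   /* nmax (sb==0), pmax (sb==1) */
    }
    return r;
}
```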
efststeq
Floating-Point Single-Precision Test Equal
efststeq crfD,rA,rB
Bits    0:5      6:8    9:10   11:15   16:20   21:31
Field   000100   crfD   00     RA      RB      01011011110
Description:
al = RA32:63
bl = RB32:63
if (al == bl) then cl = 1
else cl = 0
CR4*crfD:4*crfD+3 = undefined || cl || undefined || undefined
The low element of RA is compared against the low element of RB. If RA is equal to RB, CR bit
4*crfD+1 (the second bit of field crfD) is set; otherwise it is cleared. The other three bits of the field are
undefined. The comparison ignores the sign of 0 (+0 = –0) and proceeds after treating NaNs, Infinities, and
Denorms as normalized numbers, using their values of ‘e’ and ‘f’ directly.
No exceptions are generated during the execution of the efststeq instruction. If strict conformity to the
IEEE 754 standard is required, the program should use the efscmpeq instruction.
Implementation note: efststeq is likely to execute faster than efscmpeq.
efststgt
Floating-Point Single-Precision Test Greater Than
efststgt crfD,rA,rB
Bits    0:5      6:8    9:10   11:15   16:20   21:31
Field   000100   crfD   00     RA      RB      01011011100
Description:
al = RA32:63
bl = RB32:63
if (al > bl) then cl = 1
else cl = 0
CR4*crfD:4*crfD+3 = undefined || cl || undefined || undefined
The low element of RA is compared against the low element of RB. If RA is greater than RB, CR bit
4*crfD+1 (the second bit of field crfD) is set; otherwise it is cleared. The other three bits of the field are
undefined. The comparison ignores the sign of 0 (+0 = –0) and proceeds after treating NaNs, Infinities, and
Denorms as normalized numbers, using their values of ‘e’ and ‘f’ directly.
No exceptions are generated during the execution of the efststgt instruction. If strict conformity to the
IEEE 754 standard is required, the program should use the efscmpgt instruction.
Implementation note: efststgt is likely to execute faster than efscmpgt.
efststlt
Floating-Point Single-Precision Test Less Than
efststlt crfD,rA,rB
Bits    0:5      6:8    9:10   11:15   16:20   21:31
Field   000100   crfD   00     RA      RB      01011011101
Description:
al = RA32:63
bl = RB32:63
if (al < bl) then cl = 1
else cl = 0
CR4*crfD:4*crfD+3 = undefined || cl || undefined || undefined
The low element of RA is compared against the low element of RB. If RA is less than RB, CR bit
4*crfD+1 (the second bit of field crfD) is set; otherwise it is cleared. The other three bits of the field are
undefined. The comparison ignores the sign of 0 (+0 = –0) and proceeds after treating NaNs, Infinities, and
Denorms as normalized numbers, using their values of ‘e’ and ‘f’ directly.
No exceptions are generated during the execution of the efststlt instruction. If strict conformity to the
IEEE 754 standard is required, the program should use the efscmplt instruction.
Implementation note: efststlt is likely to execute faster than efscmplt.
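The behavior of the three test instructions (efststeq, efststgt, efststlt) can be sketched in C by comparing the raw bit patterns as sign-magnitude numbers. This is one plausible reading of "using their values of 'e' and 'f' directly", under which a NaN pattern compares equal to an identical pattern and greater than infinity. All function names are hypothetical, and the undefined CR bits are shown as 0 purely for illustration.

```c
#include <stdint.h>

/* Convert a sign-magnitude pattern to a signed key; -0 and +0 both map to 0. */
static int64_t sm_key(uint32_t b)
{
    int64_t mag = (int64_t)(b & 0x7FFFFFFFu);
    return (b & 0x80000000u) ? -mag : mag;
}

/* Each function returns the value of cl from the description. */
int efststeq_model(uint32_t al, uint32_t bl) { return sm_key(al) == sm_key(bl); }
int efststgt_model(uint32_t al, uint32_t bl) { return sm_key(al) >  sm_key(bl); }
int efststlt_model(uint32_t al, uint32_t bl) { return sm_key(al) <  sm_key(bl); }

/* Place cl into a 4-bit CR field image: undefined || cl || undefined || undefined. */
unsigned cr_field_image(int cl) { return cl ? 0x4u : 0x0u; }
```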
5.3.5 EFPU Vector Single-precision Embedded Floating-Point Instructions
The instruction descriptions in this section use the following conventions:
• sa = the sign of operand A
• ea = the biased exponent value of operand A
• sb = the sign of operand B
• eb = the biased exponent value of operand B
• ei = an intermediate exponent value
• r = a result value.
evfsabs
Vector Floating-Point Single-Precision Absolute Value
evfsabs rD,rA
Bits    0:5   6:10   11:15   16:20   21:31
Field   4     RD     RA      00000   01010000100
RD0:31 = 0b0 || RA1:31
RD32:63 = 0b0 || RA33:63
Description:
The sign bit of each element in RA is set to 0 and the results are placed into RD.
Exceptions:
If the contents of either element of RA are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are set
appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If Floating-point Invalid
Input exceptions are enabled, an exception is taken and the destination register is not updated.
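For the vector form, both elements of the 64-bit register are processed at once. The C sketch below illustrates this for evfsabs, assuming the high element occupies bits 0:31 (the upper half of a uint64_t) and that FINVH and FINV report the high and low elements respectively. The names evfsabs_result and evfsabs_model are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint64_t rd;   /* 64-bit result: high element in the upper 32 bits */
    int finvh;     /* SPEFSCR[FINVH]: high element is Infinity, Denorm, or NaN */
    int finv;      /* SPEFSCR[FINV]:  low element is Infinity, Denorm, or NaN */
} evfsabs_result;

/* Returns 1 when a single-precision pattern is Infinity, NaN, or Denorm. */
static int invalid_input(uint32_t b)
{
    uint32_t e = (b >> 23) & 0xFFu, f = b & 0x7FFFFFu;
    return e == 0xFFu || (e == 0u && f != 0u);
}

/* Hypothetical model of evfsabs with invalid-input exceptions disabled. */
evfsabs_result evfsabs_model(uint64_t ra)
{
    evfsabs_result r;
    r.rd    = ra & 0x7FFFFFFF7FFFFFFFull;          /* clear bit 0 and bit 32 */
    r.finvh = invalid_input((uint32_t)(ra >> 32)); /* element RA[0:31] */
    r.finv  = invalid_input((uint32_t)ra);         /* element RA[32:63] */
    return r;
}
```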
evfsadd
Vector Floating-Point Single-Precision Add
evfsadd rD,rA,rB
Bits    0:5   6:10   11:15   16:20   21:31
Field   4     RD     RA      RB      01010000000
RD0:31 = RA0:31 +sp RB0:31
RD32:63 = RA32:63 +sp RB32:63
Description:
Each single-precision floating-point element of RA is added to the corresponding element of RB and the
results are stored in RD. If RA is NaN or infinity, the result is either pmax (sa==0), or nmax (sa==1).
Otherwise, if RB is NaN or infinity, the result is either pmax (sb==0), or nmax (sb==1). Otherwise, if an
overflow occurs, then pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, then +0 (for
rounding modes RN, RZ, RP) or –0 (for rounding mode RM) is stored in RD.
Exceptions:
If the contents of either element of RA or RB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are
set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is
set, an exception is taken and the destination register is not updated. Otherwise, if an overflow occurs,
SPEFSCR[FOVF, FOVFH] are set appropriately, and if an underflow occurs, SPEFSCR[FUNF, FUNFH]
are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit
is set, an exception is taken. If any of these exceptions are taken, the destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled,
and no other exception is taken, or underflows but underflow exceptions are disabled, and no other
exception is taken, SPEFSCR[FINXS] is set. If the Floating-point Inexact exception is enabled, an
exception is taken using the Floating-point Round exception vector. In this case, the destination register is
updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be
performed in the exception handler.
FG and FX (FGH and FXH) will be cleared if an overflow or underflow exception is taken, or if an invalid
operation/input error is signaled for the low (high) element (regardless of FINVE).
evfsaddsub
Vector Floating-Point Single-Precision Add / Subtract
evfsaddsub rD,rA,rB
Bits    0:5      6:10   11:15   16:20   21:31
Field   000100   RD     RA      RB      01010100010
rD0:31 = rA0:31 +sp rB0:31
rD32:63 = rA32:63 -sp rB32:63
The high-order single-precision floating-point element of rA is added to the corresponding element of rB,
the low-order single-precision floating-point element of rB is subtracted from the corresponding element
of rA, and the results are stored in rD. If an element of rA is NaN or infinity, the corresponding result is
either pmax (sa==0) or nmax (sa==1). Otherwise, if an element of rB is NaN or infinity, the corresponding
result is either pmax (sb==0) or nmax (sb==1). Otherwise, if an overflow occurs, pmax or nmax (as
appropriate) is stored in the corresponding element of rD. If an underflow occurs, +0 (for rounding modes
RN, RZ, RP) or –0 (for rounding mode RM) is stored in the corresponding element of rD.
Exceptions:
If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are
set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is
set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs,
SPEFSCR[FOVF, FOVFH] are set appropriately, and if an underflow occurs, SPEFSCR[FUNF, FUNFH]
are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit
is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled,
and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt
is taken, SPEFSCR[FINXS, FINXSH] is set. If the floating-point inexact exception is enabled, an interrupt
is taken using the floating-point round interrupt vector. In this case, the destination register is updated with
the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the
interrupt handler.
FG and FX (FGH and FXH) are cleared if an overflow or underflow interrupt is taken, or if an invalid
operation/input error is signaled for the low (high) element (regardless of FINVE).
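The result-selection rules above can be illustrated with a small software model. This is a minimal sketch, not the hardware algorithm: it assumes round-to-nearest (RN), does not model SPEFSCR status bits or denormalized inputs, and the names `element_op`, `evfsaddsub`, and `pack` are hypothetical helpers introduced only for this example.

```python
import struct

PMAX = 0x7F7FFFFF  # largest positive normalized single-precision value
NMAX = 0xFF7FFFFF  # largest-magnitude negative normalized value


def to_bits(x):
    """Encode a Python float as a 32-bit single-precision pattern."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def to_float(bits):
    """Decode a 32-bit single-precision pattern into a Python float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def is_nan_or_inf(bits):
    return ((bits >> 23) & 0xFF) == 0xFF


def element_op(a, b, subtract):
    """One element of the add/subtract, following the rules in the text."""
    if is_nan_or_inf(a):
        return NMAX if (a >> 31) else PMAX   # selected by sa
    if is_nan_or_inf(b):
        return NMAX if (b >> 31) else PMAX   # selected by sb
    x, y = to_float(a), to_float(b)
    exact = x - y if subtract else x + y
    if abs(exact) > to_float(PMAX):          # overflow: saturate
        return NMAX if exact < 0 else PMAX
    result = to_bits(exact)
    if ((result >> 23) & 0xFF) == 0 and (result & 0x7FFFFF) != 0:
        return 0                             # underflow: +0 (RN assumed)
    return result


def evfsaddsub(ra, rb):
    """ra, rb are 64-bit integers; bits 0:31 are the upper 32 bits."""
    hi = element_op(ra >> 32, rb >> 32, subtract=False)
    lo = element_op(ra & 0xFFFFFFFF, rb & 0xFFFFFFFF, subtract=True)
    return (hi << 32) | lo


def pack(hi, lo):
    """Build a 64-bit register image from two Python floats."""
    return (to_bits(hi) << 32) | to_bits(lo)
```

For example, `evfsaddsub(pack(1.5, 4.0), pack(2.0, 1.0))` yields the same register image as `pack(3.5, 3.0)`, and an infinite high element in rA saturates that element to pmax.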
evfsaddsubx
Vector Floating-Point Single-Precision Add / Subtract Exchanged
evfsaddsubx rD,rA,rB
Bits:   0:5     6:10  11:15  16:20  21:31
Field:  000100  RD    RA     RB     01010101010
rD0:31 ← rA32:63 +sp rB0:31
rD32:63 ← rA0:31 -sp rB32:63
The high-order single-precision floating-point element of rB is added to the low-order element of rA, the
low-order single-precision floating-point element of rB is subtracted from the high-order element of rA,
and the results are stored in rD. If an element of rA is NaN or infinity, the corresponding result is either
pmax (sa==0) or nmax (sa==1). Otherwise, if an element of rB is NaN or infinity, the corresponding result
is either pmax (sb==0) or nmax (sb==1). Otherwise, if an overflow occurs, pmax or nmax (as appropriate)
is stored in the corresponding element of rD. If an underflow occurs, +0 (for rounding modes RN, RZ, RP)
or –0 (for rounding mode RM) is stored in the corresponding element of rD.
Exceptions:
If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are
set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is
set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs,
SPEFSCR[FOVF, FOVFH] are set appropriately, and if an underflow occurs, SPEFSCR[FUNF, FUNFH]
are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit
is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled,
and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt
is taken, SPEFSCR[FINXS, FINXSH] is set. If the floating-point inexact exception is enabled, an interrupt
is taken using the floating-point round interrupt vector. In this case, the destination register is updated with
the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the
interrupt handler.
FG and FX (FGH and FXH) are cleared if an overflow or underflow interrupt is taken, or if an invalid
operation/input error is signaled for the low (high) element (regardless of FINVE).
evfsaddx
Vector Floating-Point Single-Precision Add Exchanged
evfsaddx rD,rA,rB
Bits:   0:5     6:10  11:15  16:20  21:31
Field:  000100  RD    RA     RB     01010101000
rD0:31 ← rA32:63 +sp rB0:31
rD32:63 ← rA0:31 +sp rB32:63
The high-order single-precision floating-point element of rB is added to the low-order element of rA, the
low-order single-precision floating-point element of rB is added to the high-order element of rA, and the
results are stored in rD. If an element of rA is NaN or infinity, the corresponding result is either pmax or
nmax (as appropriate). Otherwise, if an element of rB is NaN or infinity, the corresponding result is either
pmax or nmax (as appropriate). Otherwise, if an overflow occurs, pmax or nmax (as appropriate) is stored
in the corresponding element of rD. If an underflow occurs, +0 (for rounding modes RN, RZ, RP) or –0
(for rounding mode RM) is stored in the corresponding element of rD.
Exceptions:
If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are
set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is
set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs,
SPEFSCR[FOVF, FOVFH] are set appropriately, and if an underflow occurs, SPEFSCR[FUNF, FUNFH]
are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit
is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled,
and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt
is taken, SPEFSCR[FINXS, FINXSH] is set. If the floating-point inexact exception is enabled, an interrupt
is taken using the floating-point round interrupt vector. In this case, the destination register is updated with
the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the
interrupt handler.
FG and FX (FGH and FXH) are cleared if an overflow or underflow interrupt is taken, or if an invalid
operation/input error is signaled for the low (high) element (regardless of FINVE).
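The three-operand forms above share the primary opcode (4, in bits 0:5) and the RD, RA, and RB field positions; only bits 21:31 differ. The following sketch assembles an instruction word from those fields. The function name `encode` and the dictionary `XO` are hypothetical, and the bit patterns are copied from the encodings shown for each instruction above.

```python
# Bits 21:31 of each instruction, as given in the encodings above.
XO = {
    "evfsaddsub":  0b01010100010,
    "evfsaddsubx": 0b01010101010,
    "evfsaddx":    0b01010101000,
}


def encode(mnemonic, rd, ra, rb):
    """Assemble a 32-bit instruction word (bit 0 is the most significant bit)."""
    for reg in (rd, ra, rb):
        if not 0 <= reg <= 31:
            raise ValueError("register number out of range")
    return (4 << 26) | (rd << 21) | (ra << 16) | (rb << 11) | XO[mnemonic]
```

With these assumptions, `encode("evfsaddsub", 1, 2, 3)` produces `0x10221AA2`.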
evfscfh
Vector Convert Floating-Point Single-Precision from Half-Precision
evfscfh rD,rB
Bits:   0:5     6:10  11:15  16:20  21:31
Field:  000100  RD    00100  RB     01010010001
FP16format fh, fl;
FP32format resulth, resultl;
fh ← rB16:31
fl ← rB48:63
if (fh_exp = 0) & (fh_frac = 0) then
    resulth ← fh_sign || ³¹0    // signed zero value (sign, then 31 zero bits)
else if Isa16NaNorInfinity(fh) then
    SPEFSCR[FINVH] ← 1
    resulth ← fh_sign || 0b11111110 || ²³1    // max value (23 one bits in the fraction)
else if Isa16Denorm(fh) then
    SPEFSCR[FINVH] ← 1
    resulth ← fh_sign || ³¹0
else
    resulth_sign ← fh_sign
    resulth_exp ← fh_exp - 15 + 127
    resulth_frac ← fh_frac || ¹³0
if (fl_exp = 0) & (fl_frac = 0) then
    resultl ← fl_sign || ³¹0    // signed zero value
else if Isa16NaNorInfinity(fl) then
    SPEFSCR[FINV] ← 1
    resultl ← fl_sign || 0b11111110 || ²³1    // max value
else if Isa16Denorm(fl) then
    SPEFSCR[FINV] ← 1
    resultl ← fl_sign || ³¹0
else
    resultl_sign ← fl_sign
    resultl_exp ← fl_exp - 15 + 127
    resultl_frac ← fl_frac || ¹³0
rD0:31 = resulth; rD32:63 = resultl
The half-precision FP number in each element in RB is converted to a single-precision floating-point value
and the result is placed into the corresponding element of RD. The rounding mode is not used since this
conversion is always exact.
Exceptions:
If either element of RB is Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are set appropriately, and
SPEFSCR[FGH, FXH, FG, FX] are cleared. If SPEFSCR[FINVE] is set, an exception is taken; the
destination register is not updated; and no other status bits are set.
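The per-element conversion can be sketched in software as follows. This is an illustrative model under stated assumptions: the half-precision operand is taken from the low-order 16 bits of each 32-bit element of rB, the exception-enable behavior is not modeled, and `half_to_single` and `evfscfh` are hypothetical function names. Each helper returns the result together with the invalid flag that the text says is set.

```python
def half_to_single(h):
    """Convert a 16-bit half-precision pattern to a 32-bit single pattern.

    Returns (result_bits, invalid_flag).
    """
    sign = (h >> 15) & 1
    exp = (h >> 10) & 0x1F
    frac = h & 0x3FF
    if exp == 0 and frac == 0:
        return sign << 31, 0                               # signed zero
    if exp == 0x1F:                                        # NaN or infinity
        return (sign << 31) | (0xFE << 23) | 0x7FFFFF, 1   # max value
    if exp == 0:                                           # denormalized input
        return sign << 31, 1                               # like-signed zero
    # Normalized: rebias the exponent, pad the fraction with 13 zero bits.
    return (sign << 31) | ((exp - 15 + 127) << 23) | (frac << 13), 0


def evfscfh(rb):
    """rb is a 64-bit integer. Returns (rd, finvh, finv)."""
    hi, finvh = half_to_single((rb >> 32) & 0xFFFF)
    lo, finv = half_to_single(rb & 0xFFFF)
    return (hi << 32) | lo, finvh, finv
```

For instance, the half-precision pattern `0x3C00` (1.0) maps to `0x3F800000`, while `0x7C00` (infinity) maps to the maximum normalized value `0x7F7FFFFF` with the invalid flag set.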
evfscfsf
Vector Convert Floating-Point Single-Precision from Signed Fraction
evfscfsf rD,rB
Bits:   0:5  6:10  11:15  16:20  21:31
Field:  4    RD    00000  RB     01010010011
Description:
RD0:31 = CnvtSF32ToFP32(RB0:31)
RD32:63 = CnvtSF32ToFP32(RB32:63)
Each signed fractional element of rB is converted to a single-precision floating-point value using the
current rounding mode and the results are placed into the corresponding elements of rD.
Exceptions:
This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversions are not exact. If
the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round
exception vector. In this case, the destination register is updated with the truncated result(s). The FGH,
FXH, FG, and FX bits are properly updated to allow rounding to be performed in the exception handler.
evfscfsi
Vector Convert Floating-Point Single-Precision from Signed Integer
evfscfsi rD,rB
Bits:   0:5  6:10  11:15  16:20  21:31
Field:  4    RD    00000  RB     01010010001
Description:
RD0:31 = CnvtSI32ToFP32(RB0:31)
RD32:63 = CnvtSI32ToFP32(RB32:63)
Each signed integer element of rB is converted to the nearest single-precision floating-point value using
the current rounding mode and the results are placed into the corresponding element of rD.
Exceptions:
This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversions are not exact. If
the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round
exception vector. In this case, the destination register is updated with the truncated result(s). The FGH,
FXH, FG, and FX bits are properly updated to allow rounding to be performed in the exception handler.
evfscfuf
Vector Convert Floating-Point Single-Precision from Unsigned Fraction
evfscfuf rD,rB
Bits:   0:5  6:10  11:15  16:20  21:31
Field:  4    RD    00000  RB     01010010010
RD0:31 = CnvtUF32ToFP32(RB0:31)
RD32:63 = CnvtUF32ToFP32(RB32:63)
Each unsigned fractional element of rB is converted to a single-precision floating-point value using the
current rounding mode and the results are placed into the corresponding elements of rD.
Exceptions:
This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversions are not exact. If
the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round
exception vector. In this case, the destination register is updated with the truncated result(s). The FGH,
FXH, FG, and FX bits are properly updated to allow rounding to be performed in the exception handler.
evfscfui
Vector Convert Floating-Point Single-Precision from Unsigned Integer
evfscfui rD,rB
Bits:   0:5  6:10  11:15  16:20  21:31
Field:  4    RD    00000  RB     01010010000
Description:
RD0:31 = CnvtUI32ToFP32(RB0:31)
RD32:63 = CnvtUI32ToFP32(RB32:63)
Each unsigned integer element of rB is converted to the nearest single-precision floating-point value using
the current rounding mode and the results are placed into the corresponding elements of rD.
Exceptions:
This instruction can signal an inexact status and set SPEFSCR[FINXS] if the conversions are not exact. If
the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point Round
exception vector. In this case, the destination register is updated with the truncated result(s). The FGH,
FXH, FG, and FX bits are properly updated to allow rounding to be performed in the exception handler.
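The four convert-from instructions above (evfscfsf, evfscfsi, evfscfuf, evfscfui) differ only in whether the source element is signed and whether it is scaled as a fraction. The sketch below models one element. It is an illustration, not the definition of the CnvtSF32ToFP32-style functions: it assumes round-to-nearest, assumes a signed fraction is scaled by 2^31 and an unsigned fraction by 2^32, and uses the hypothetical name `cnvt_to_fp32`. The second return value indicates whether the conversion was inexact.

```python
import struct


def cnvt_to_fp32(word, signed, fractional):
    """Convert one 32-bit integer or fraction element to single precision.

    Returns (result_bits, inexact).
    """
    value = word & 0xFFFFFFFF
    if signed and (value & 0x80000000):
        value -= 1 << 32                       # two's-complement interpretation
    if fractional:
        scale = 31 if signed else 32           # assumed fraction scaling
        exact = value / (1 << scale)
    else:
        exact = float(value)
    # struct rounds the double to single precision (round-to-nearest-even).
    bits = struct.unpack(">I", struct.pack(">f", exact))[0]
    rounded = struct.unpack(">f", struct.pack(">I", bits))[0]
    return bits, rounded != exact
```

Under these assumptions the signed fraction `0x40000000` converts exactly to 0.5 (`0x3F000000`), while the unsigned integer `0x01000001` (16777217) is not representable in 24 significand bits and is reported as inexact.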
evfscmpeq
Vector Floating-Point Single-Precision Compare Equal
evfscmpeq crfD,rA,rB
Bits:   0:5  6:8   9:10  11:15  16:20  21:31
Field:  4    crfD  00    RA     RB     01010001110
Description:
ah = RA0:31
al = RA32:63
bh = RB0:31
bl = RB32:63
if (ah == bh) then ch = 1
else ch = 0
if (al == bl) then cl = 1
else cl = 0
CR4*crfD:4*crfD+3 = ch || cl || (ch | cl) || (ch & cl)
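Each element of rA is compared against the corresponding element of rB. If the rA element equals the rB
element, the corresponding bit of crfD (ch for the high element, cl for the low element) is set; otherwise it
is cleared. The remaining two bits of crfD hold the OR and the AND of ch and cl. The comparison ignores
the sign of 0 (+0 = –0).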
Exceptions:
If the contents of either element of RA or RB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are
set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If Floating-point Invalid
Input exceptions are enabled then an exception is taken, and the condition register is not updated.
Otherwise, the comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers,
using their values of ‘e’ and ‘f’ directly.
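The way the two element results are packed into the 4-bit CR field can be sketched as follows. This is an illustrative model only: the function names are hypothetical, status bits are not modeled, and equality of non-zero operands is assumed to be a direct comparison of the 32-bit patterns, which is one reading of treating special values "as normalized numbers".

```python
def fs_equal(a, b):
    """Element equality on 32-bit patterns; +0 and -0 compare equal."""
    if (a & 0x7FFFFFFF) == 0 and (b & 0x7FFFFFFF) == 0:
        return 1
    return int(a == b)


def cr_field(ch, cl):
    """Pack ch || cl || (ch | cl) || (ch & cl) into a 4-bit value."""
    return (ch << 3) | (cl << 2) | ((ch | cl) << 1) | (ch & cl)


def evfscmpeq(ra, rb):
    """ra, rb are 64-bit integers. Returns the 4-bit CR field value."""
    ch = fs_equal(ra >> 32, rb >> 32)
    cl = fs_equal(ra & 0xFFFFFFFF, rb & 0xFFFFFFFF)
    return cr_field(ch, cl)
```

A field value of `0b1111` therefore means both elements matched, `0b1010` means only the high element matched, `0b0110` means only the low element matched, and `0b0000` means neither matched.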
evfscmpgt
Vector Floating-Point Single-Precision Compare Greater Than
evfscmpgt crfD,rA,rB
Bits:   0:5  6:8   9:10  11:15  16:20  21:31
Field:  4    crfD  00    RA     RB     01010001100
Description:
ah = RA0:31
al = RA32:63
bh = RB0:31
bl = RB32:63
if (ah > bh) then ch = 1
else ch = 0
if (al > bl) then cl = 1
else cl = 0
CR4*crfD:4*crfD+3 = ch || cl || (ch | cl) || (ch & cl)
Each element of rA is compared against the corresponding element of rB. If the rA element is greater than
the rB element, the corresponding bit of crfD (ch for the high element, cl for the low element) is set;
otherwise it is cleared. The comparison ignores the sign of 0 (+0 = –0).
Exceptions:
If the contents of either element of RA or RB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are
set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If Floating-point Invalid
Input exceptions are enabled then an exception is taken, and the condition register is not updated.
Otherwise, the comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers,
using their values of ‘e’ and ‘f’ directly.
evfscmplt
Vector Floating-Point Single-Precision Compare Less Than
evfscmplt crfD,rA,rB
Bits:   0:5  6:8   9:10  11:15  16:20  21:31
Field:  4    crfD  00    RA     RB     01010001101
Description:
ah = RA0:31
al = RA32:63
bh = RB0:31
bl = RB32:63
if (ah < bh) then ch = 1
else ch = 0
if (al < bl) then cl = 1
else cl = 0
CR4*crfD:4*crfD+3 = ch || cl || (ch | cl) || (ch & cl)
Each element of rA is compared against the corresponding element of rB. If the rA element is less than the
rB element, the corresponding bit of crfD (ch for the high element, cl for the low element) is set; otherwise
it is cleared. The comparison ignores the sign of 0 (+0 = –0).
Exceptions:
If the contents of either element of RA or RB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are
set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If Floating-point Invalid
Input exceptions are enabled then an exception is taken, and the condition register is not updated.
Otherwise, the comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers,
using their values of ‘e’ and ‘f’ directly.
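For the ordered comparisons (greater than, less than), the statement that NaNs, infinities, and denorms are compared "as normalized numbers, using their values of 'e' and 'f' directly" suggests a sign-magnitude ordering of the raw bit patterns. The sketch below shows that interpretation. It is an assumption for illustration, not a statement of the hardware's exact behavior, and the names `order_key`, `fs_less`, and `fs_greater` are hypothetical.

```python
def order_key(bits):
    """Map a 32-bit pattern to an integer that sorts in numeric order.

    The exponent and fraction are used directly as a magnitude, so an
    infinity pattern simply sorts above the largest normalized value.
    Both +0 and -0 map to 0.
    """
    magnitude = bits & 0x7FFFFFFF
    return -magnitude if (bits >> 31) else magnitude


def fs_less(a, b):
    return int(order_key(a) < order_key(b))


def fs_greater(a, b):
    return int(order_key(a) > order_key(b))
```

With this model, `fs_less(0x80000000, 0x00000000)` is 0 because the sign of zero is ignored, and the infinity pattern `0x7F800000` compares greater than pmax (`0x7F7FFFFF`).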
evfscth
Vector Convert Floating-Point Single-Precision to Half-Precision
evfscth rD,rB
Bits:   0:5     6:10  11:15  16:20  21:31
Field:  000100  RD    00100  RB     01010010101
FP32format fh, fl;
FP16format resulth, resultl;
fh ← rB0:31; fl ← rB32:63
if (fh_exp = 0) & (fh_frac = 0) then
    resulth ← fh_sign || ¹⁵0    // signed zero value
else if Isa32NaNorInfinity(fh) then
    SPEFSCR[FINVH] ← 1
    resulth ← fh_sign || 0b11110 || ¹⁰1    // max value
else if Isa32Denorm(fh) then
    SPEFSCR[FINVH] ← 1
    resulth ← fh_sign || ¹⁵0
else
    unbias ← fh_exp - 127
    if unbias > 15 then
        resulth ← fh_sign || 0b11110 || ¹⁰1    // max value
        SPEFSCR[FOVFH] ← 1
    else if unbias < -14 && (result would not round up to bmin) then
        resulth ← fh_sign || ¹⁵0    // like-signed zero value
        SPEFSCR[FUNFH] ← 1
    else
        resulth_sign ← fh_sign; resulth_exp ← unbias + 15; resulth_frac ← fh_frac[0:9]
        guard ← fh_frac[10]; sticky ← (fh_frac[11:22] ≠ 0)
        resulth ← Round16(resulth, LOWER, guard, sticky)
        SPEFSCR[FGH] ← guard; SPEFSCR[FXH] ← sticky
        if guard | sticky then SPEFSCR[FINXS] ← 1
if (fl_exp = 0) & (fl_frac = 0) then
    resultl ← fl_sign || ¹⁵0    // signed zero value
else if Isa32NaNorInfinity(fl) then
    SPEFSCR[FINV] ← 1
    resultl ← fl_sign || 0b11110 || ¹⁰1    // max value
else if Isa32Denorm(fl) then
    SPEFSCR[FINV] ← 1
    resultl ← fl_sign || ¹⁵0
else
    unbias ← fl_exp - 127
    if unbias > 15 then
        resultl ← fl_sign || 0b11110 || ¹⁰1    // max value
        SPEFSCR[FOVF] ← 1
    else if unbias < -14 && (result would not round up to bmin) then
        resultl ← fl_sign || ¹⁵0    // like-signed zero value
        SPEFSCR[FUNF] ← 1
    else
        resultl_sign ← fl_sign; resultl_exp ← unbias + 15; resultl_frac ← fl_frac[0:9]
        guard ← fl_frac[10]; sticky ← (fl_frac[11:22] ≠ 0)
        resultl ← Round16(resultl, LOWER, guard, sticky)
        SPEFSCR[FG] ← guard; SPEFSCR[FX] ← sticky
        if guard | sticky then SPEFSCR[FINXS] ← 1
rD0:31 = ¹⁶0 || resulth; rD32:63 = ¹⁶0 || resultl
The single-precision FP number in each element in RB is converted to a half-precision floating-point value
using the current rounding mode. The result is then prepended with 16 zeros, and placed into the
corresponding element of RD.
Exceptions:
If the contents of either element of rB is Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are set
appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is set,
an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs,
SPEFSCR[FOVF, FOVFH] are set appropriately, and if an underflow occurs, SPEFSCR[FUNF, FUNFH]
are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit
is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled,
and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt
is taken, SPEFSCR[FINXS, FINXSH] is set. If the floating-point inexact exception is enabled, an interrupt
is taken using the floating-point round interrupt vector. In this case, the destination register is updated with
the truncated result(s). The FGH, FXH, FG, and FX bits are properly updated to allow rounding to be
performed in the interrupt handler.
FG and FX (FGH and FXH) are cleared if an overflow or underflow interrupt is taken, or if an invalid
operation/input error is signaled for the low (high) element (regardless of FINVE).
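The element conversion can be modeled up to, but not including, the final rounding step. The sketch below returns the truncated half-precision result together with the guard and sticky bits, which is the information the text says is made available so that rounding can be completed in an interrupt handler. It is a simplification under stated assumptions: any unbiased exponent below -14 is treated as underflow (the "would not round up to bmin" refinement is ignored), Round16 is not modeled, and the name `single_to_half_trunc` and the status strings are hypothetical.

```python
def single_to_half_trunc(s):
    """Convert a 32-bit single pattern to a truncated 16-bit half pattern.

    Returns (half_bits, guard, sticky, status), where status is one of
    "ok", "invalid", "overflow", or "underflow".
    """
    sign = (s >> 31) & 1
    exp = (s >> 23) & 0xFF
    frac = s & 0x7FFFFF
    hsign = sign << 15
    if exp == 0 and frac == 0:
        return hsign, 0, 0, "ok"                 # signed zero
    if exp == 0xFF:
        return hsign | 0x7BFF, 0, 0, "invalid"   # NaN or infinity: max value
    if exp == 0:
        return hsign, 0, 0, "invalid"            # denormalized input
    unbias = exp - 127
    if unbias > 15:
        return hsign | 0x7BFF, 0, 0, "overflow"  # max value
    if unbias < -14:
        return hsign, 0, 0, "underflow"          # like-signed zero (simplified)
    half = hsign | ((unbias + 15) << 10) | (frac >> 13)  # frac[0:9]
    guard = (frac >> 12) & 1                     # frac[10]
    sticky = int((frac & 0xFFF) != 0)            # frac[11:22] != 0
    return half, guard, sticky, "ok"
```

For example, 1.0 (`0x3F800000`) converts exactly to `0x3C00` with guard and sticky both 0, whereas `0x3F801000` truncates to the same half-precision pattern with guard = 1, signaling an inexact result.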
evfsctsf
Vector Convert Floating-Point Single-Precision to Signed Fraction
evfsctsf rD,rB
Bits:   0:5  6:10  11:15  16:20  21:31
Field:  4    RD    00000  RB     01010010111
Description:
ah = RB0:31
if (ah == Denorm) then
    RD0:31 = 0
else if ((ah == +0) || (ah == -0)) then // zero cases
    RD0:31 = 0
else if (eah < 127) then
    RD0:31 = CnvtFP32ToSF32Sat(ah)
else if ((eah == 127) && (sah == 1) && (fah==0)) then
    RD0:31 = 0x80000000 // max negative, no overflow
else if (ah == NAN) then RD0:31 = 0
else // Overflow
    if (sah == 0) then // Positive
        RD0:31 = 0x7FFFFFFF
    else
        RD0:31 = 0x80000000
al = RB32:63
if (al == Denorm) then
    RD32:63 = 0
else if ((al == +0) || (al == -0)) then // zero cases
    RD32:63 = 0
else if (eal < 127) then
    RD32:63 = CnvtFP32ToSF32Sat(al)
else if ((eal == 127) && (sal == 1) && (fal==0)) then
    RD32:63 = 0x80000000 // max negative, no overflow
else if (al == NAN) then RD32:63 = 0
else // Overflow
    if (sal == 0) then // Positive
        RD32:63 = 0x7FFFFFFF
    else
        RD32:63 = 0x80000000
Each single-precision floating-point element in RB is converted to a signed fraction using the current
rounding mode and the result is saturated if it cannot be represented in a 32-bit signed fraction. NaNs are
converted as though they were zero.
Exceptions:
If either element of RB is Infinity, Denorm, or NaN or if an overflow occurs, SPEFSCR[FINV, FINVH]
are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE]
is set, an exception is taken; the destination register is not updated; and no other status bits are set.
If either result element of this instruction is inexact and no other exception is taken, SPEFSCR[FINXS] is
set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point
Round exception vector. In this case, the destination register is updated with the truncated result. The FGH,
FXH, FG, and FX bits are properly updated to allow rounding to be performed in the exception handler.
evfsctsi
Vector Convert Floating-Point Single-Precision to Signed Integer
evfsctsi rD,rB
Bits:   0:5  6:10  11:15  16:20  21:31
Field:  4    RD    00000  RB     01010010101
Description:
ah = RB0:31
if (ah == Denorm) then
    RD0:31 = 0
else if (eah < 158) then
    RD0:31 = CnvtFP32ToSI32Sat(ah)
else if ((eah == 158) && (sah == 1) && (fah==0)) then
    RD0:31 = 0x80000000 // max negative, no overflow
else if (ah == NAN) then RD0:31 = 0
else // Overflow
    if (sah == 0) then // Positive
        RD0:31 = 0x7FFFFFFF
    else
        RD0:31 = 0x80000000
al = RB32:63
if (al == Denorm) then
    RD32:63 = 0
else if (eal < 158) then
    RD32:63 = CnvtFP32ToSI32Sat(al)
else if ((eal == 158) && (sal == 1) && (fal==0)) then
    RD32:63 = 0x80000000 // max negative, no overflow
else if (al == NAN) then RD32:63 = 0
else // Overflow
    if (sal == 0) then // Positive
        RD32:63 = 0x7FFFFFFF
    else
        RD32:63 = 0x80000000
Each single-precision floating-point element in RB is converted to a signed integer using the current
rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted
as though they were zero.
Exceptions:
If the contents of either element of RB are Infinity, Denorm, or NaN or if an overflow occurs on
conversion, SPEFSCR[FINV, FINVH] are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are
cleared appropriately. If SPEFSCR[FINVE] is set, an exception is taken, the destination register is not
updated, and no other status bits are set.
If either result element of this instruction is inexact and no other exception is taken, SPEFSCR[FINXS] is
set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point
Round exception vector. In this case, the destination register is updated with the truncated result. The FGH,
FXH, FG and FX bits are properly updated to allow rounding to be performed in the exception handler.
evfsctsiz
Vector Convert Floating-Point Single-Precision to Signed Integer with Round toward Zero
evfsctsiz rD,rB
Bits:   0:5  6:10  11:15  16:20  21:31
Field:  4    RD    00000  RB     01010011010
Description:
ah = RB0:31
if (ah == Denorm) then
    RD0:31 = 0
else if (eah < 158) then
    RD0:31 = CnvtFP32ToSI32Sat(ah)
else if ((eah == 158) && (sah == 1) && (fah==0)) then
    RD0:31 = 0x80000000 // max negative, no overflow
else if (ah == NAN) then RD0:31 = 0
else // Overflow
    if (sah == 0) then // Positive
        RD0:31 = 0x7FFFFFFF
    else
        RD0:31 = 0x80000000
al = RB32:63
if (al == Denorm) then
    RD32:63 = 0
else if (eal < 158) then
    RD32:63 = CnvtFP32ToSI32Sat(al)
else if ((eal == 158) && (sal == 1) && (fal==0)) then
    RD32:63 = 0x80000000 // max negative, no overflow
else if (al == NAN) then RD32:63 = 0
else // Overflow
    if (sal == 0) then // Positive
        RD32:63 = 0x7FFFFFFF
    else
        RD32:63 = 0x80000000
Each single-precision floating-point element in RB is converted to a signed integer using the rounding
mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs
are converted as though they were zero.
Exceptions:
If either element of RB is Infinity, Denorm, or NaN or if an overflow occurs, SPEFSCR[FINV, FINVH]
are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE]
is set, an exception is taken, the destination register is not updated, and no other status bits are set.
If either result element of this instruction is inexact and no other exception is taken, SPEFSCR[FINXS] is
set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point
Round exception vector. In this case, the destination register is updated with the truncated result. The FGH,
FXH, FG and FX bits are properly updated to allow rounding to be performed in the exception handler.
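The decision sequence in the pseudocode above can be followed for a single element in software. The sketch below is an illustration only: it covers the round-toward-zero case, does not model SPEFSCR updates, and uses the hypothetical name `fp32_to_si32_rz` in place of the manual's CnvtFP32ToSI32Sat. A biased exponent of 158 corresponds to a magnitude of 2^31, which is why it is the first value that no longer fits (except for exactly -2^31).

```python
def fp32_to_si32_rz(bits):
    """Convert a 32-bit single pattern to a 32-bit signed integer pattern,
    truncating toward zero and saturating on overflow."""
    sign = (bits >> 31) & 1
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0:
        return 0                                  # zero or denorm
    if exp < 158:
        mantissa = frac | 0x800000                # restore the hidden bit
        shift = exp - 150                         # value = mantissa * 2**shift
        if shift >= 0:
            magnitude = mantissa << shift
        else:
            magnitude = mantissa >> -shift        # truncation toward zero
        value = -magnitude if sign else magnitude
        return value & 0xFFFFFFFF
    if exp == 158 and sign == 1 and frac == 0:
        return 0x80000000                         # max negative, no overflow
    if exp == 0xFF and frac != 0:
        return 0                                  # NaN converts as zero
    return 0x80000000 if sign else 0x7FFFFFFF     # overflow: saturate
```

For example, 2.75 (`0x40300000`) converts to 2, -2.75 converts to `0xFFFFFFFE` (-2), and +infinity saturates to `0x7FFFFFFF`.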
evfsctuf
Vector Convert Floating-Point Single-Precision to Unsigned Fraction
evfsctuf rD,rB
Bits:   0:5  6:10  11:15  16:20  21:31
Field:  4    RD    00000  RB     01010010110
Description:
ah = RB0:31
if (ah == Denorm) then // force denorm to zero
    RD0:31 = 0
else if ((ah == +0) || (ah == -0)) // zero cases
    RD0:31 = 0
else if (sah == 1) // Negative
    RD0:31 = 0
else if (eah < 127)
    RD0:31 = CnvtFP32ToUF32Sat(ah)
else if (ah == NAN) then RD0:31 = 0
else // Overflow
    RD0:31 = 0xFFFFFFFF
al = RB32:63
if (al == Denorm) then
    RD32:63 = 0
else if ((al == +0) || (al == -0)) // zero cases
    RD32:63 = 0
else if (sal == 1) // Negative
    RD32:63 = 0
else if (eal < 127)
    RD32:63 = CnvtFP32ToUF32Sat(al)
else if (al == NAN) then RD32:63 = 0
else // Overflow
    RD32:63 = 0xFFFFFFFF
Each single-precision floating-point element in RB is converted to an unsigned fraction using the current
rounding mode and the result is saturated if it cannot be represented in a 32-bit fraction. NaNs are
converted as though they were zero.
Exceptions:
If either element of RB is Infinity, Denorm, or NaN, or if an overflow occurs, SPEFSCR[FINV, FINVH]
are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE]
is set, an exception is taken; the destination register is not updated; and no other status bits are set.
If either result element of this instruction is inexact and no other exception is taken, SPEFSCR[FINXS] is
set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point
Round exception vector. In this case, the destination register is updated with the truncated result. The FGH,
FXH, FG and FX bits are properly updated to allow rounding to be performed in the exception handler.
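The fractional case differs from the integer case mainly in scaling: a value below 1.0 (biased exponent less than 127) is multiplied by 2^32 before the fraction bits are taken. The sketch below follows the order of the checks in the pseudocode for one element. It is a hedged illustration: it truncates rather than applying the current rounding mode, it assumes the unsigned fraction format places all 32 bits to the right of the binary point, and `fp32_to_uf32_trunc` is a hypothetical name.

```python
def fp32_to_uf32_trunc(bits):
    """Convert a 32-bit single pattern to a 32-bit unsigned fraction,
    truncating low-order bits and saturating on overflow."""
    sign = (bits >> 31) & 1
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0:
        return 0                       # zero, or denorm forced to zero
    if sign:
        return 0                       # negative values convert to zero
    if exp < 127:
        mantissa = frac | 0x800000     # restore the hidden bit
        shift = exp - 118              # (exp - 150) + 32 for the 2**32 scale
        if shift >= 0:
            return mantissa << shift
        return mantissa >> -shift      # truncation
    if exp == 0xFF and frac != 0:
        return 0                       # NaN converts as zero
    return 0xFFFFFFFF                  # 1.0 or larger: saturate
```

Here 0.5 (`0x3F000000`) becomes `0x80000000`, 0.75 becomes `0xC0000000`, and 1.0 saturates to `0xFFFFFFFF` because it cannot be represented as a fraction.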
evfsctui
Vector Convert Floating-Point Single-Precision to Unsigned Integer
evfsctui rD,rB
Bits:   0:5  6:10  11:15  16:20  21:31
Field:  4    RD    00000  RB     01010010100
Description:
ah = RB0:31
if (ah == Denorm) then // force denorm to zero
    RD0:31 = 0
else if ((ah == +0) || (ah == -0)) // zero cases
    RD0:31 = 0
else if (sah == 1) // Negative
    RD0:31 = 0
else if (eah <= 158)
    RD0:31 = CnvtFP32ToUI32Sat(ah)
else if (ah == NAN) then RD0:31 = 0
else // Overflow
    RD0:31 = 0xFFFFFFFF
al = RB32:63
if (al == Denorm) then
    RD32:63 = 0
else if ((al == +0) || (al == -0)) // zero cases
    RD32:63 = 0
else if (sal == 1) // Negative
    RD32:63 = 0
else if (eal <= 158)
    RD32:63 = CnvtFP32ToUI32Sat(al)
else if (al == NAN) then RD32:63 = 0
else // Overflow
    RD32:63 = 0xFFFFFFFF
Each single-precision floating-point element in RB is converted to an unsigned integer using the current
rounding mode and the result is saturated if it cannot be represented in a 32-bit integer. NaNs are converted
as though they were zero.
Exceptions:
If either element of RB is Infinity, Denorm, or NaN, or if an overflow occurs, SPEFSCR[FINV, FINVH]
are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE]
is set, an exception is taken; the destination register is not updated; and no other status bits are set.
If either result element of this instruction is inexact and no other exception is taken, SPEFSCR[FINXS] is
set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point
Round exception vector. In this case, the destination register is updated with the truncated result. The FGH,
FXH, FG and FX bits are properly updated to allow rounding to be performed in the exception handler.
evfsctuiz
Vector Convert Floating-Point Single-Precision to Unsigned Integer with Round toward Zero
evfsctuiz rD,rB
Bits:   0:5  6:10  11:15  16:20  21:31
Field:  4    RD    00000  RB     01010011000
Description:
ah = RB0:31
if (ah == Denorm) then // force denorm to zero
    RD0:31 = 0
else if ((ah == +0) || (ah == -0)) // zero cases
    RD0:31 = 0
else if (sah == 1) // Negative
    RD0:31 = 0
else if (eah <= 158)
    RD0:31 = CnvtFP32ToUI32Sat(ah)
else if (ah == NAN) then RD0:31 = 0
else // Overflow
    RD0:31 = 0xFFFFFFFF
al = RB32:63
if (al == Denorm) then
    RD32:63 = 0
else if ((al == +0) || (al == -0)) // zero cases
    RD32:63 = 0
else if (sal == 1) // Negative
    RD32:63 = 0
else if (eal <= 158)
    RD32:63 = CnvtFP32ToUI32Sat(al)
else if (al == NAN) then RD32:63 = 0
else // Overflow
    RD32:63 = 0xFFFFFFFF
Each single-precision floating-point element in RB is converted to an unsigned integer using the rounding
mode Round toward Zero and the result is saturated if it cannot be represented in a 32-bit integer. NaNs
are converted as though they were zero.
Exceptions:
If either element of RB is Infinity, Denorm, or NaN, or if an overflow occurs, SPEFSCR[FINV, FINVH]
are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is
set, an exception is taken, the destination register is not updated, and no other status bits are set.
If either result element of this instruction is inexact and no other exception is taken, SPEFSCR[FINXS] is
set. If the Floating-point Inexact exception is enabled, an exception is taken using the Floating-point
Round exception vector. In this case, the destination register is updated with the truncated result. The FGH,
FXH, FG, and FX bits are properly updated to allow rounding to be performed in the exception handler.
evfsdiff
Vector Floating-Point Single-Precision Differences
evfsdiff rD,rA,rB
Bits:   0:5     6:10  11:15  16:20  21:31
Field:  000100  RD    RA     RB     01010100101
rD0:31 ← rA0:31 -sp rA32:63
rD32:63 ← rB0:31 -sp rB32:63
The low-order single-precision floating-point element of rA is subtracted from the high-order element of
rA, the low-order single-precision floating-point element of rB is subtracted from the high-order element
of rB, and the results are stored in rD. If the high-order element of rA or rB is NaN or infinity, the
corresponding result is either pmax or nmax (as appropriate). Otherwise, if the low-order element of rA or
rB is NaN or infinity, the corresponding result is either pmax or nmax (as appropriate). Otherwise, if an
overflow occurs, pmax or nmax (as appropriate) is stored in the corresponding element of rD. If an
underflow occurs, +0 (for rounding modes RN, RZ, RP) or –0 (for rounding mode RM) is stored in the
corresponding element of rD.
Exceptions:
If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are
set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is
set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs,
SPEFSCR[FOVF, FOVFH] are set appropriately, and if an underflow occurs, SPEFSCR[FUNF, FUNFH]
are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit
is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled,
and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt
is taken, SPEFSCR[FINXS, FINXSH] is set. If the floating-point inexact exception is enabled, an interrupt
is taken using the floating-point round interrupt vector. In this case, the destination register is updated with
the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the
interrupt handler.
FG and FX (FGH and FXH) are cleared if an overflow or underflow interrupt is taken, or if an invalid
operation/input error is signaled for the low (high) element (regardless of FINVE).
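The instruction word layout used by the vector floating-point instructions in this section can be sketched as a small encoder. This is an illustration only: the function name encode_efp_vector is hypothetical, and the extended opcode value is the 11-bit pattern 01010100101 read from the evfsdiff encoding diagram. Bit 0 is the most significant bit of the 32-bit word, so the primary opcode in bits 0-5 is shifted left by 26.

```c
#include <stdint.h>

/* 11-bit extended opcode of evfsdiff (bits 21-31), as read from the diagram. */
#define XO_EVFSDIFF 0x2A5u

/* Hypothetical encoder for the rD,rA,rB form: primary opcode 4 in bits 0-5,
 * rD in bits 6-10, rA in bits 11-15, rB in bits 16-20, extended opcode in
 * bits 21-31. */
uint32_t encode_efp_vector(unsigned rd, unsigned ra, unsigned rb, uint32_t xo)
{
    return (4u << 26)
         | ((uint32_t)(rd & 31u) << 21)
         | ((uint32_t)(ra & 31u) << 16)
         | ((uint32_t)(rb & 31u) << 11)
         | (xo & 0x7FFu);
}
```

With these assumptions, evfsdiff r3,r4,r5 encodes as 0x10642AA5.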
evfsdiffsum
Vector Floating-Point Single-Precision Difference / Sum
evfsdiffsum rD,rA,rB
Bits:    0-5       6-10   11-15   16-20   21-31
Field:   000100    RD     RA      RB      01010100111
rD0:31 ← rA0:31 -sp rA32:63
rD32:63 ← rB0:31 +sp rB32:63
The low-order single-precision floating-point element of rA is subtracted from the high-order element of
rA, the low-order single-precision floating-point element of rB is added to the high-order element of rB,
and the results are stored in rD. If the high-order element of rA or rB is NaN or infinity, the corresponding
result is either pmax or nmax (as appropriate). Otherwise, if the low order element of rA or rB is NaN or
infinity, the corresponding result is either pmax or nmax (as appropriate). Otherwise, if an overflow occurs,
pmax or nmax (as appropriate) is stored in the corresponding element of rD. If an underflow occurs, +0
(for rounding modes RN, RZ, RP) or –0 (for rounding mode RM) is stored in the corresponding element
of rD.
Exceptions:
If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are
set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is
set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs,
SPEFSCR[FOVF, FOVFH] are set appropriately, and if an underflow occurs, SPEFSCR[FUNF, FUNFH]
are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit
is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled,
and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt
is taken, SPEFSCR[FINXS, FINXSH] is set. If the floating-point inexact exception is enabled, an interrupt
is taken using the floating-point round interrupt vector. In this case, the destination register is updated with
the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the
interrupt handler.
FG and FX (FGH and FXH) are cleared if an overflow or underflow interrupt is taken, or if an invalid
operation/input error is signaled for the low (high) element (regardless of FINVE).
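The element routing of evfsdiffsum can be sketched with ordinary C floats. This is a data-flow illustration only: the vec2f type and function name are hypothetical, the hi member stands for bits 0:31 and lo for bits 32:63, and none of the NaN, infinity, denorm, saturation, or SPEFSCR behavior described above is modeled.

```c
/* Hypothetical two-element register image: hi = bits 0:31, lo = bits 32:63. */
typedef struct { float hi; float lo; } vec2f;

/* Data flow only: both operations stay within one source register. */
vec2f evfsdiffsum_model(vec2f ra, vec2f rb)
{
    vec2f rd;
    rd.hi = ra.hi - ra.lo;   /* rD0:31  <- rA0:31 - rA32:63 */
    rd.lo = rb.hi + rb.lo;   /* rD32:63 <- rB0:31 + rB32:63 */
    return rd;
}
```

Unlike the element-wise add and subtract instructions, each result element here combines the two halves of a single source register.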
evfsdiv
Vector Floating-Point Single-Precision Divide
evfsdiv rD,rA,rB
Bits:    0-5   6-10   11-15   16-20   21-31
Field:   4     RD     RA      RB      01010001001
RD0:31 = RA0:31 ÷sp RB0:31
RD32:63 = RA32:63 ÷sp RB32:63
Each single-precision floating-point element of rA is divided by the corresponding element of rB and the
result is stored in rD. If RB is a NaN or infinity, the result is a properly signed zero. Otherwise, if RB is a
denormalized number or a zero, or if RA is either NaN or infinity, the result is either pmax (sa==sb), or
nmax (sa!=sb). Otherwise, if an overflow occurs, then pmax or nmax (as appropriate) is stored in RD. If
an underflow occurs, then +0 or –0 (as appropriate) is stored in RD.
Exceptions:
If the contents of RA or RB are Infinity, Denorm, or NaN, or if both RA and RB are ±0, the
SPEFSCR[FINV, FINVH] are set appropriately, and the SPEFSCR[FGH, FXH, FG, FX] are cleared
appropriately. If SPEFSCR[FINVE] is set, an exception is taken and the destination register is not updated.
Otherwise, if the content of RB is ±0 and the content of RA is a finite normalized non-zero number, the
SPEFSCR[FDBZ, FDBZH] are set appropriately. If Floating-point Divide by Zero exceptions are enabled,
an exception is then taken. Otherwise, if an overflow occurs, SPEFSCR[FOVF, FOVFH] are set
appropriately, and if an underflow occurs, SPEFSCR[FUNF, FUNFH] are set appropriately. If either
underflow or overflow exceptions are enabled and a corresponding bit is set, an exception is taken. If any
of these exceptions are taken, the destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled,
and no other exception is taken, or underflows but underflow exceptions are disabled, and no other
exception is taken, SPEFSCR[FINXS] is set. If the Floating-point Inexact exception is enabled, an
exception is taken using the Floating-point Round exception vector. In this case, the destination register is
updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be
performed in the exception handler.
FG and FX (FGH and FXH) will be cleared if an overflow or underflow exception is taken, or if an invalid
operation/input error is signaled for the low (high) element (regardless of FINVE).
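The order in which evfsdiv picks a special-case result for one element can be sketched as follows. This is a minimal model under stated assumptions: the names are hypothetical, pmax and nmax are assumed to be the bit patterns 0x7F7FFFFF and 0xFF7FFFFF, a "properly signed" result is assumed to take the exclusive-OR of the operand signs, and the ordinary quotient, overflow, underflow, and status flags are left to the caller.

```c
#include <stdint.h>

typedef enum { DIV_SIGNED_ZERO, DIV_SIGNED_MAX, DIV_COMPUTE } div_case;

static int is_nan_or_inf(uint32_t b)     { return ((b >> 23) & 0xFFu) == 0xFFu; }
static int is_zero_or_denorm(uint32_t b) { return ((b >> 23) & 0xFFu) == 0u; }

/* Hypothetical selector: a is the RA element, b is the RB element.
 * *special receives the result bits for the two special cases. */
div_case evfsdiv_select(uint32_t a, uint32_t b, uint32_t *special)
{
    uint32_t sign = (a ^ b) & 0x80000000u;     /* assumed sign rule: sa XOR sb */

    if (is_nan_or_inf(b)) {                    /* RB is NaN or infinity */
        *special = sign;                       /* signed zero */
        return DIV_SIGNED_ZERO;
    }
    if (is_zero_or_denorm(b) || is_nan_or_inf(a)) {
        *special = sign | 0x7F7FFFFFu;         /* pmax if sa==sb, else nmax */
        return DIV_SIGNED_MAX;
    }
    *special = 0;
    return DIV_COMPUTE;                        /* ordinary quotient path */
}
```

The RB test comes first, so a NaN or infinity in RA divided by a NaN or infinity in RB yields a signed zero rather than pmax or nmax.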
evfsmadd
Vector Floating-Point Single-Precision Multiply-Add
evfsmadd rD,rA,rB
Bits:    0-5   6-10   11-15   16-20   21-31
Field:   4     RD     RA      RB      01010000010
RD0:31 = ((RA0:31 ×fp RB0:31) +sp RD0:31)
RD32:63 = ((RA32:63 ×fp RB32:63) +sp RD32:63)
Each single-precision floating-point element of rA is multiplied with the corresponding element of rB, the
intermediate product is added to the corresponding element of rD, and the result is stored in rD. If RA or
RB are either zero or denormalized, the intermediate product is a properly signed zero. Otherwise, if RA
or RB are either NaN or infinity, the intermediate product is either pmax (sa==sb), or nmax (sa!=sb), and
this value is used for the result and stored into RD. Otherwise, the intermediate product is added to the
corresponding element of RD. If RD is NaN or infinity, the result is either pmax (sd==0), or nmax (sd==1).
Otherwise, if an overflow occurs, then pmax or nmax (as appropriate) is stored in RD. If an underflow
occurs, then +0 (for rounding modes RN, RZ, RP) or –0 (for rounding mode RM) is stored in RD.
Exceptions:
If the contents of either element of RA, RB, or RD are Infinity, Denorm, or NaN, SPEFSCR[FINV,
FINVH] are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If
SPEFSCR[FINVE] is set, an exception is taken and the destination register is not updated. Otherwise, if
an overflow occurs, SPEFSCR[FOVF, FOVFH] are set appropriately, and if an underflow occurs,
SPEFSCR[FUNF, FUNFH] are set appropriately. If either underflow or overflow exceptions are enabled
and a corresponding status bit is set, an exception is taken. If any of these exceptions are taken, the
destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled,
and no other exception is taken, or underflows but underflow exceptions are disabled, and no other
exception is taken, SPEFSCR[FINXS] is set. If the Floating-point Inexact exception is enabled, an
exception is taken using the Floating-point Round exception vector. In this case, the destination register is
updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be
performed in the exception handler.
FG and FX (FGH and FXH) will be cleared if an overflow or underflow exception is taken, or if an invalid
operation/input error is signaled for the low (high) element (regardless of FINVE).
evfsmax
Vector Floating-Point Single-Precision Maximum
evfsmax rD,rA,rB
Bits:    0-5       6-10   11-15   16-20   21-31
Field:   000100    RD     RA      RB      01010100000
ah ← rA0:31
bh ← rB0:31
if (ah < bh) then temph ← bh
else temph ← ah
if (isnan(ah) & ~(isnan(bh))) then temph ← bh
if (isnan(bh) & ~(isnan(ah))) then temph ← ah
rD0:31 ← temph
al ← rA32:63
bl ← rB32:63
if (al < bl) then templ ← bl
else templ ← al
if (isnan(al) & ~(isnan(bl))) then templ ← bl
if (isnan(bl) & ~(isnan(al))) then templ ← al
rD32:63 ← templ
Each single-precision floating-point element of rA is compared against the corresponding elements of rB.
The larger element is selected and placed into the corresponding element of rD. The maximum of +0 and
–0 is +0.
Exceptions:
If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are
set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is
set, an interrupt is taken, and the destination register is not updated. Otherwise, the comparison proceeds
after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of ‘e’ and ‘f’
directly. If one of the elements is a NaN and the other is not, the non-NaN element is selected rather than
the comparison result. If the selected element is denorm, the result is a same signed zero. If the selected
element is +NaN or +infinity, the corresponding result is pmax. Otherwise, if the selected element is –NaN
or –infinity, the corresponding result is nmax.
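The selection and result mapping for one evfsmax element, in the case where no interrupt is taken, can be sketched on raw bit patterns. The names below are hypothetical, pmax and nmax are assumed to be 0x7F7FFFFF and 0xFF7FFFFF, and the comparison is modeled as a sign-magnitude ordering of the sign, 'e', and 'f' fields so that –0 orders below +0. SPEFSCR updates are not modeled.

```c
#include <stdint.h>

#define PMAX 0x7F7FFFFFu   /* assumed encoding of pmax */
#define NMAX 0xFF7FFFFFu   /* assumed encoding of nmax */

static int is_nan(uint32_t b)
{
    return ((b >> 23) & 0xFFu) == 0xFFu && (b & 0x007FFFFFu) != 0;
}

/* Sign-magnitude ordering key; -0 sorts just below +0. */
static int64_t order_key(uint32_t b)
{
    int64_t mag = (int64_t)(b & 0x7FFFFFFFu);
    return (b & 0x80000000u) ? -mag - 1 : mag;
}

/* Hypothetical model of one element of evfsmax. */
uint32_t evfsmax_element(uint32_t a, uint32_t b)
{
    uint32_t sel = (order_key(a) < order_key(b)) ? b : a;
    if (is_nan(a) && !is_nan(b)) sel = b;      /* prefer the non-NaN operand */
    if (is_nan(b) && !is_nan(a)) sel = a;

    uint32_t exp = (sel >> 23) & 0xFFu;
    if (exp == 0)                              /* zero or denorm: same-signed zero */
        return sel & 0x80000000u;
    if (exp == 255)                            /* NaN or infinity: saturate by sign */
        return (sel & 0x80000000u) ? NMAX : PMAX;
    return sel;
}
```

With this ordering, the maximum of +0 and –0 is +0 regardless of operand order, as the description requires.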
evfsmin
Vector Floating-Point Single-Precision Minimum
evfsmin rD,rA,rB
Bits:    0-5       6-10   11-15   16-20   21-31
Field:   000100    RD     RA      RB      01010100001
ah ← rA0:31
bh ← rB0:31
if (ah < bh) then temph ← ah
else temph ← bh
if (isnan(ah) & ~(isnan(bh))) then temph ← bh
if (isnan(bh) & ~(isnan(ah))) then temph ← ah
rD0:31 ← temph
al ← rA32:63
bl ← rB32:63
if (al < bl) then templ ← al
else templ ← bl
if (isnan(al) & ~(isnan(bl))) then templ ← bl
if (isnan(bl) & ~(isnan(al))) then templ ← al
rD32:63 ← templ
Each single-precision floating-point element of rA is compared against the corresponding elements of rB.
The smaller element is selected and placed into the corresponding element of rD. The minimum of +0 and
–0 is –0.
Exceptions:
If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are
set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is
set, an interrupt is taken, and the destination register is not updated. Otherwise, the comparison proceeds
after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of ‘e’ and ‘f’
directly. If one of the elements is a NaN and the other is not, the non-NaN element is selected rather than
the comparison result. If the selected element is denorm, the result is a same signed zero. If the selected
element is +NaN or +infinity, the corresponding result is pmax. Otherwise, if the selected element is –NaN
or –infinity, the corresponding result is nmax.
evfsmsub
Vector Floating-Point Single-Precision Multiply-Subtract
evfsmsub rD,rA,rB
Bits:    0-5   6-10   11-15   16-20   21-31
Field:   4     RD     RA      RB      01010000011
RD0:31 = ((RA0:31 ×fp RB0:31) -sp RD0:31)
RD32:63 = ((RA32:63 ×fp RB32:63) -sp RD32:63)
Each single-precision floating-point element of rA is multiplied with the corresponding element of rB, the
corresponding element of rD is subtracted from the intermediate product, and the result is stored in rD. If
RA or RB are either zero or denormalized, the intermediate product is a properly signed zero. Otherwise,
if RA or RB are either NaN or infinity, the intermediate product is either pmax (sa==sb), or nmax (sa!=sb),
and this value is used for the result and stored into RD. Otherwise, the corresponding element of rD is
subtracted from the intermediate product. If RD is NaN or infinity, the result is either nmax (sd==0), or
pmax (sd==1). Otherwise, if an overflow occurs, then pmax or nmax (as appropriate) is stored in RD. If an
underflow occurs, then +0 (for rounding modes RN, RZ, RP) or –0 (for rounding mode RM) is stored in
RD.
Exceptions:
If the contents of either element of RA, RB, or RD are Infinity, Denorm, or NaN, SPEFSCR[FINV,
FINVH] are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If
SPEFSCR[FINVE] is set, an exception is taken and the destination register is not updated. Otherwise, if
an overflow occurs, SPEFSCR[FOVF, FOVFH] are set appropriately, and if an underflow occurs,
SPEFSCR[FUNF, FUNFH] are set appropriately. If either underflow or overflow exceptions are enabled
and a corresponding status bit is set, an exception is taken. If any of these exceptions are taken, the
destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled,
and no other exception is taken, or underflows but underflow exceptions are disabled, and no other
exception is taken, SPEFSCR[FINXS] is set. If the Floating-point Inexact exception is enabled, an
exception is taken using the Floating-point Round exception vector. In this case, the destination register is
updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be
performed in the exception handler.
FG and FX (FGH and FXH) will be cleared if an overflow or underflow exception is taken, or if an invalid
operation/input error is signaled for the low (high) element (regardless of FINVE).
evfsmul
Vector Floating-Point Single-Precision Multiply
evfsmul rD,rA,rB
Bits:    0-5   6-10   11-15   16-20   21-31
Field:   4     RD     RA      RB      01010001000
RD0:31 = RA0:31 ×sp RB0:31
RD32:63 = RA32:63 ×sp RB32:63
Each single-precision floating-point element of rA is multiplied with the corresponding element of rB and
the result is stored in rD. If RA or RB are either zero or denormalized, the result is a properly signed zero.
Otherwise, if RA or RB are either NaN or infinity, the result is either pmax (sa==sb), or nmax (sa!=sb).
Otherwise, if an overflow occurs, then pmax or nmax (as appropriate) is stored in RD. If an underflow
occurs, then +0 or –0 (as appropriate) is stored in RD.
Exceptions:
If the contents of either element of RA or RB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are
set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is
set, an exception is taken and the destination register is not updated. Otherwise, if an overflow occurs,
SPEFSCR[FOVF, FOVFH] are set appropriately, and if an underflow occurs, SPEFSCR[FUNF, FUNFH]
are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit
is set, an exception is taken. If any of these exceptions are taken, the destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled,
and no other exception is taken, or underflows but underflow exceptions are disabled, and no other
exception is taken, SPEFSCR[FINXS] is set. If the Floating-point Inexact exception is enabled, an
exception is taken using the Floating-point Round exception vector. In this case, the destination register is
updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be
performed in the exception handler.
FG and FX (FGH and FXH) will be cleared if an overflow or underflow exception is taken, or if an invalid
operation/input error is signaled for the low (high) element (regardless of FINVE).
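The result selection for one evfsmul element, when no exception is taken, can be sketched in C. This is an approximation under stated assumptions: the names are hypothetical, pmax and nmax are assumed to be 0x7F7FFFFF and 0xFF7FFFFF, the host is assumed to use IEEE 754 binary32 and binary64 with round-to-nearest, and underflow is approximated as an exact product magnitude below FLT_MIN. SPEFSCR updates and the other rounding modes are not modeled.

```c
#include <stdint.h>
#include <string.h>
#include <float.h>

#define PMAX_BITS 0x7F7FFFFFu   /* assumed encoding of pmax (sign bit clear) */

static float bits_to_float(uint32_t b) { float f; memcpy(&f, &b, sizeof f); return f; }
static uint32_t float_to_bits(float f) { uint32_t b; memcpy(&b, &f, sizeof b); return b; }

/* Hypothetical model of one element of evfsmul. */
uint32_t evfsmul_element(uint32_t a, uint32_t b)
{
    uint32_t sign = (a ^ b) & 0x80000000u;     /* result sign: sa XOR sb */
    uint32_t ea = (a >> 23) & 0xFFu;
    uint32_t eb = (b >> 23) & 0xFFu;

    if (ea == 0 || eb == 0)                    /* zero or denorm operand */
        return sign;                           /* properly signed zero */
    if (ea == 255 || eb == 255)                /* NaN or infinity operand */
        return sign | PMAX_BITS;               /* pmax or nmax by sign */

    /* Two 24-bit significands give at most 48 bits, so this product is exact. */
    double p = (double)bits_to_float(a) * (double)bits_to_float(b);
    double mag = (p < 0) ? -p : p;

    if (mag > FLT_MAX)                         /* overflow saturates */
        return sign | PMAX_BITS;
    if (mag < FLT_MIN)                         /* underflow flushes to signed zero */
        return sign;
    return float_to_bits((float)p);            /* host rounding, assumed RN */
}
```

The zero and denorm test precedes the NaN and infinity test, so zero multiplied by infinity yields a signed zero in this model, matching the order of the description.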
evfsmule
Vector Floating-Point Single-Precision Multiply By Even Element
evfsmule rD,rA,rB
Bits:    0-5       6-10   11-15   16-20   21-31
Field:   000100    RD     RA      RB      01010101110
rD0:31 ← rA0:31 ×sp rB0:31
rD32:63 ← rA0:31 ×sp rB32:63
The single-precision floating-point elements of rB are multiplied by the even (high-order) element of rA,
and the results are stored in rD. If an element of rB or the even element of rA is either zero or denormalized,
the corresponding result is a properly signed zero. Otherwise, if an element of rB or the even element of
rA is either NaN or infinity, the corresponding result is either pmax (asign==bsign), or nmax (asign!=bsign).
Otherwise, if an overflow occurs, pmax or nmax (as appropriate) is stored in the corresponding element of
rD. If an underflow occurs, +0 or –0 (as appropriate) is stored in the corresponding element of rD.
Exceptions:
If the contents of either element of rB or the even element of rA is Infinity, Denorm, or NaN,
SPEFSCR[FINV, FINVH] are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared
appropriately. If SPEFSCR[FINVE] is set, an interrupt is taken and the destination register is not updated.
Otherwise, if an overflow occurs, SPEFSCR[FOVF, FOVFH] are set appropriately, or if an underflow
occurs, SPEFSCR[FUNF, FUNFH] are set appropriately. If either underflow or overflow exceptions are
enabled and a corresponding status bit is set, an interrupt is taken. If any of these interrupts are taken, the
destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled,
and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt
is taken, SPEFSCR[FINXS] is set. If the floating-point inexact exception is enabled, an interrupt is taken
using the floating-point round interrupt vector. In this case, the destination register is updated with the
truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the
interrupt handler.
FG and FX (FGH and FXH) are cleared if an overflow or underflow exception is taken, or if an invalid
operation/input error is signaled for the low (high) element (regardless of FINVE).
evfsmulo
Vector Floating-Point Single-Precision Multiply By Odd Element
evfsmulo rD,rA,rB
Bits:    0-5       6-10   11-15   16-20   21-31
Field:   000100    RD     RA      RB      01010101111
rD0:31 ← rA32:63 ×sp rB0:31
rD32:63 ← rA32:63 ×sp rB32:63
The single-precision floating-point elements of rB are multiplied by the odd (low-order) element of rA,
and the results are stored in rD. If an element of rB or the odd element of rA is either zero or denormalized,
the corresponding result is a properly signed zero. Otherwise, if an element of rB or the odd element of rA
is either NaN or infinity, the corresponding result is either pmax (asign==bsign), or nmax (asign!=bsign).
Otherwise, if an overflow occurs, pmax or nmax (as appropriate) is stored in the corresponding element of
rD. If an underflow occurs, +0 or –0 (as appropriate) is stored in the corresponding element of rD.
Exceptions:
If the contents of either element of rB or the odd element of rA is Infinity, Denorm, or NaN,
SPEFSCR[FINV, FINVH] are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared
appropriately. If SPEFSCR[FINVE] is set, an interrupt is taken and the destination register is not updated.
Otherwise, if an overflow occurs, SPEFSCR[FOVF, FOVFH] are set appropriately, and if an underflow
occurs, SPEFSCR[FUNF, FUNFH] are set appropriately. If either underflow or overflow exceptions are
enabled and a corresponding status bit is set, an interrupt is taken. If any of these interrupts are taken, the
destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled,
and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt
is taken, SPEFSCR[FINXS] is set. If the floating-point inexact exception is enabled, an interrupt is taken
using the floating-point round interrupt vector. In this case, the destination register is updated with the
truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the
interrupt handler.
FG and FX (FGH and FXH) are cleared if an overflow or underflow exception is taken, or if an invalid
operation/input error is signaled for the low (high) element (regardless of FINVE).
evfsmulx
Vector Floating-Point Single-Precision Multiply Exchanged
evfsmulx rD,rA,rB
Bits:    0-5       6-10   11-15   16-20   21-31
Field:   000100    RD     RA      RB      01010101100
rD0:31 ← rA32:63 ×sp rB0:31
rD32:63 ← rA0:31 ×sp rB32:63
The high-order single-precision floating-point element of rB is multiplied by the low-order element of rA,
the low-order single-precision floating-point element of rB is multiplied by the high-order element of rA,
and the results are stored in rD. If an element of rA or rB is either zero or denormalized, the corresponding
result is a properly signed zero. Otherwise, if an element of rA or rB is either NaN or infinity, the
corresponding result is either pmax (asign==bsign), or nmax (asign!=bsign). Otherwise, if an overflow
occurs, pmax or nmax (as appropriate) is stored in the corresponding element of rD. If an underflow occurs,
+0 or –0 (as appropriate) is stored in the corresponding element of rD.
Exceptions:
If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are
set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is
set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs,
SPEFSCR[FOVF, FOVFH] are set appropriately, and if an underflow occurs, SPEFSCR[FUNF, FUNFH]
are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit
is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled,
and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt
is taken, SPEFSCR[FINXS] is set. If the floating-point inexact exception is enabled, an interrupt is taken
using the floating-point round interrupt vector. In this case, the destination register is updated with the
truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the
interrupt handler.
FG and FX (FGH and FXH) are cleared if an overflow or underflow exception is taken, or if an invalid
operation/input error is signaled for the low (high) element (regardless of FINVE).
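The three element pairings used by evfsmule, evfsmulo, and evfsmulx differ only in which rA element feeds each product. The sketch below shows that routing with ordinary C floats. The vec2f type and function names are hypothetical, hi stands for bits 0:31 (the even element) and lo for bits 32:63 (the odd element), and the special-value, saturation, and SPEFSCR behavior described above is not modeled.

```c
/* Hypothetical two-element register image: hi = bits 0:31, lo = bits 32:63. */
typedef struct { float hi; float lo; } vec2f;

/* evfsmule: both rB elements are scaled by the even (high) element of rA. */
vec2f evfsmule_model(vec2f ra, vec2f rb)
{
    vec2f rd = { ra.hi * rb.hi, ra.hi * rb.lo };
    return rd;
}

/* evfsmulo: both rB elements are scaled by the odd (low) element of rA. */
vec2f evfsmulo_model(vec2f ra, vec2f rb)
{
    vec2f rd = { ra.lo * rb.hi, ra.lo * rb.lo };
    return rd;
}

/* evfsmulx: the rA elements are exchanged before the element-wise multiply. */
vec2f evfsmulx_model(vec2f ra, vec2f rb)
{
    vec2f rd = { ra.lo * rb.hi, ra.hi * rb.lo };
    return rd;
}
```

In all three forms the rB elements keep their positions; only the rA source element changes.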
evfsnabs
Vector Floating-Point Single-Precision Negative Absolute Value
evfsnabs rD,rA
Bits:    0-5   6-10   11-15   16-20   21-31
Field:   4     RD     RA      00000   01010000101
RD0:31 = 0b1 || RA1:31
RD32:63 = 0b1 || RA33:63
Description:
The sign bit of each element in RA is set to 1 and the results are placed into RD.
Exceptions:
If the contents of either element of RA are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are set
appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If Floating-point Invalid
Input exceptions are enabled then an exception is taken, and the destination register is not updated.
evfsneg
Vector Floating-Point Single-Precision Negate
evfsneg rD,rA
Bits:    0-5   6-10   11-15   16-20   21-31
Field:   4     RD     RA      00000   01010000110
RD0:31 = ¬RA0 || RA1:31
RD32:63 = ¬RA32 || RA33:63
Description:
The sign bit of each element in RA is complemented and the results are placed into RD.
Exceptions:
If the contents of either element of RA are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are set
appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If Floating-point Invalid
Input exceptions are enabled then an exception is taken, and the destination register is not updated.
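Because evfsnabs and evfsneg only touch the sign bits (bit 0 and bit 32 of the 64-bit register), their data results reduce to a single mask operation. The sketch below assumes the 64-bit register is held in a uint64_t with bit 0 as the most significant bit. The function names are hypothetical, and the SPEFSCR[FINV, FINVH] handling described above is not modeled.

```c
#include <stdint.h>

/* Sign bits of the high element (bit 0) and the low element (bit 32). */
#define EFP_SIGN_MASK 0x8000000080000000ull

/* evfsnabs: force both sign bits to 1. */
uint64_t evfsnabs_model(uint64_t ra) { return ra | EFP_SIGN_MASK; }

/* evfsneg: complement both sign bits. */
uint64_t evfsneg_model(uint64_t ra)  { return ra ^ EFP_SIGN_MASK; }
```

The exponent and fraction fields pass through unchanged in both cases.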
evfsnmadd
Vector Floating-Point Single-Precision Negative Multiply-Add
evfsnmadd rD,rA,rB
Bits:    0-5   6-10   11-15   16-20   21-31
Field:   4     RD     RA      RB      01010001010
RD0:31 = -((RA0:31 ×fp RB0:31) +sp RD0:31)
RD32:63 = -((RA32:63 ×fp RB32:63) +sp RD32:63)
Each single-precision floating-point element of rA is multiplied with the corresponding element of rB, the
intermediate product is added to the corresponding element of rD, and the negated result is stored in rD.
If RA or RB are either zero or denormalized, the intermediate product is a properly signed zero. Otherwise,
if RA or RB are either NaN or infinity, the intermediate product is either pmax (sa==sb), or nmax (sa!=sb),
and this value is used for the result and stored into RD. Otherwise, the intermediate product is added to the
corresponding element of RD, and the final result is negated. If RD is NaN or infinity, the result is either
nmax (sd==0), or pmax (sd==1). Otherwise, if an overflow occurs, then pmax or nmax (as appropriate) is
stored in RD. If an underflow occurs, then –0 (for rounding modes RN, RZ, RP) or +0 (for rounding mode
RM) is stored in RD.
Exceptions:
If the contents of either element of RA, RB, or RD are Infinity, Denorm, or NaN, SPEFSCR[FINV,
FINVH] are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If
SPEFSCR[FINVE] is set, an exception is taken and the destination register is not updated. Otherwise, if
an overflow occurs, SPEFSCR[FOVF, FOVFH] are set appropriately, and if an underflow occurs,
SPEFSCR[FUNF, FUNFH] are set appropriately. If either underflow or overflow exceptions are enabled
and a corresponding status bit is set, an exception is taken. If any of these exceptions are taken, the
destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled,
and no other exception is taken, or underflows but underflow exceptions are disabled, and no other
exception is taken, SPEFSCR[FINXS] is set. If the Floating-point Inexact exception is enabled, an
exception is taken using the Floating-point Round exception vector. In this case, the destination register is
updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be
performed in the exception handler.
FG and FX (FGH and FXH) will be cleared if an overflow or underflow exception is taken, or if an invalid
operation/input error is signaled for the low (high) element (regardless of FINVE).
evfsnmsub
Vector Floating-Point Single-Precision Negative Multiply-Subtract
evfsnmsub rD,rA,rB
Bits:    0-5   6-10   11-15   16-20   21-31
Field:   4     RD     RA      RB      01010001011
RD0:31 = -((RA0:31 ×fp RB0:31) -sp RD0:31)
RD32:63 = -((RA32:63 ×fp RB32:63) -sp RD32:63)
Each single-precision floating-point element of rA is multiplied with the corresponding element of rB, the
corresponding element of rD is subtracted from the intermediate product, and the negated result is stored
in rD. If RA or RB are either zero or denormalized, the intermediate product is a properly signed zero.
Otherwise, if RA or RB are either NaN or infinity, the intermediate product is either pmax (sa==sb), or
nmax (sa!=sb), and this value is negated to obtain the result and is stored into RD. Otherwise, the
corresponding element of rD is subtracted from the intermediate product, and the final result is negated.
If RD is NaN or infinity, the final result is either pmax (sd==0), or nmax (sd==1). Otherwise, if an overflow
occurs, then pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, then –0 (for rounding
modes RN, RZ, RP) or +0 (for rounding mode RM) is stored in RD.
Exceptions:
If the contents of either element of RA, RB, or RD are Infinity, Denorm, or NaN, SPEFSCR[FINV,
FINVH] are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If
SPEFSCR[FINVE] is set, an exception is taken and the destination register is not updated. Otherwise, if
an overflow occurs, SPEFSCR[FOVF, FOVFH] are set appropriately, and if an underflow occurs,
SPEFSCR[FUNF, FUNFH] are set appropriately. If either underflow or overflow exceptions are enabled
and a corresponding status bit is set, an exception is taken. If any of these exceptions are taken, the
destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled,
and no other exception is taken, or underflows but underflow exceptions are disabled, and no other
exception is taken, SPEFSCR[FINXS] is set. If the Floating-point Inexact exception is enabled, an
exception is taken using the Floating-point Round exception vector. In this case, the destination register is
updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be
performed in the exception handler.
FG and FX (FGH and FXH) will be cleared if an overflow or underflow exception is taken, or if an invalid
operation/input error is signaled for the low (high) element (regardless of FINVE).
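For the four multiply-accumulate forms (evfsmadd, evfsmsub, evfsnmadd, evfsnmsub), the result chosen when the RD element is NaN or infinity follows one pattern: the result sign is the RD sign, inverted once if RD is subtracted and once more if the final result is negated. The sketch below captures that pattern. The function name is hypothetical, pmax and nmax are assumed to be 0x7F7FFFFF and 0xFF7FFFFF, and the function applies only when RA and RB are ordinary normalized values so that the RD rule is reached.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: result element when the RD element is NaN or infinity.
 *   subtract = true for evfsmsub and evfsnmsub (RD is subtracted)
 *   negate   = true for evfsnmadd and evfsnmsub (final result is negated) */
uint32_t rd_special_result(uint32_t rd, bool subtract, bool negate)
{
    uint32_t sign = (rd >> 31) ^ (uint32_t)subtract ^ (uint32_t)negate;
    return (sign << 31) | 0x7F7FFFFFu;   /* pmax if sign is 0, nmax if sign is 1 */
}
```

For a positive infinity in RD this yields pmax for evfsmadd and evfsnmsub, and nmax for evfsmsub and evfsnmadd, matching the four descriptions.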
evfssqrt
Vector Floating-Point Single-Precision Square Root
evfssqrt rD,rA
Bits:    0-5       6-10   11-15   16-20   21-31
Field:   000100    RD     RA      00000   01010000111
rD0:31 ← SQRT(rA0:31)
rD32:63 ← SQRT(rA32:63)
The square root of each single-precision floating-point element of rA is calculated, and the results are
stored in rD. If an element of rA is zero or denorm, the result is a same signed zero. If an element of rA is
+NaN or +infinity, the corresponding result is pmax. Otherwise, if an element of rA is non-zero and has a
negative sign, including –NaN or –infinity, the corresponding result is –0. Otherwise, if an underflow
occurs, +0 (for rounding modes RN, RZ, RP) or –0 (for rounding mode RM) is stored in the corresponding
element of rD.
Exceptions:
If the contents of either element of rA are non-zero and have a negative sign, or are Infinity, Denorm, or
NaN, SPEFSCR[FINV, FINVH] are set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared
appropriately. If SPEFSCR[FINVE] is set, an interrupt is taken and the destination register is not updated.
Otherwise, if an underflow occurs, SPEFSCR[FUNF, FUNFH] are set appropriately. If underflow
exceptions are enabled and a corresponding status bit is set, an interrupt is taken. If any of these interrupts
are taken, the destination register is not updated.
If either result element of this instruction is inexact, or underflows but underflow exceptions are disabled,
and no other interrupt is taken, SPEFSCR[FINXS, FINXSH] is set. If the floating-point inexact exception
is enabled, an interrupt is taken using the floating-point round interrupt vector. In this case, the destination
register is updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding
to be performed in the interrupt handler.
FG and FX (FGH and FXH) are cleared if an underflow interrupt is taken, or if an invalid operation/input
error is signaled for the low (high) element (regardless of FINVE).
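The special-case handling described above can be sketched per element as follows. This is a minimal illustrative model, not the hardware implementation: the helper names (`classify`, `sqrt_element`) are hypothetical, only the FINV flag is modeled, and the normal-number path uses round-to-nearest rather than the configured rounding mode.

```python
import math
import struct

PMAX = 0x7F7FFFFF  # largest positive normalized single-precision value (pmax)


def classify(bits):
    """Classify a 32-bit single-precision pattern by its exponent and fraction."""
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    if exp == 0xFF:
        return "nan" if frac else "inf"
    if exp == 0:
        return "denorm" if frac else "zero"
    return "norm"


def sqrt_element(bits):
    """Return (result_bits, finv) for one 32-bit element of rA."""
    sign = bits >> 31
    kind = classify(bits)
    if kind == "zero":
        return bits, 0              # same-signed zero, no invalid flag
    if kind == "denorm":
        return sign << 31, 1        # same-signed zero, FINV set
    if sign:
        return 0x80000000, 1        # any other negative input gives -0
    if kind in ("nan", "inf"):
        return PMAX, 1              # +NaN and +infinity give pmax
    value = struct.unpack(">f", struct.pack(">I", bits))[0]
    # Assumption: round-to-nearest; FG/FX and the other rounding modes are not modeled.
    return struct.unpack(">I", struct.pack(">f", math.sqrt(value)))[0], 0
```

For example, `sqrt_element(0x40800000)` (4.0) yields the bit pattern of 2.0 with FINV clear, while any negative normalized input yields –0 with FINV set.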
evfssub
Vector Floating-Point Single-Precision Subtract
evfssub rD,rA,rB
Encoding:
  bits 0:5    4
  bits 6:10   RD
  bits 11:15  RA
  bits 16:20  RB
  bits 21:31  01010000001
RD0:31 = RA0:31 -sp RB0:31
RD32:63 = RA32:63 -sp RB32:63
Description:
Each single-precision floating-point element of RB is subtracted from the corresponding element of RA
and the results are stored in RD. If RA is NaN or infinity, the result is either pmax (sa==0) or nmax (sa==1).
Otherwise, if RB is NaN or infinity, the result is either nmax (sb==0) or pmax (sb==1). Otherwise, if an
overflow occurs, then pmax or nmax (as appropriate) is stored in RD. If an underflow occurs, then +0 (for
rounding modes RN, RZ, RP) or –0 (for rounding mode RM) is stored in RD.
Exceptions:
If the contents of either element of RA or RB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are
set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is
set, an exception is taken and the destination register is not updated. Otherwise, if an overflow occurs,
SPEFSCR[FOVF, FOVFH] are set appropriately, and if an underflow occurs, SPEFSCR[FUNF, FUNFH]
are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit
is set, an exception is taken. If any of these exceptions are taken, the destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled,
and no other exception is taken, or underflows but underflow exceptions are disabled, and no other
exception is taken, SPEFSCR[FINXS] is set. If the Floating-point Inexact exception is enabled, an
exception is taken using the Floating-point Round exception vector. In this case, the destination register is
updated with the truncated result(s). The FG and FX bits are properly updated to allow rounding to be
performed in the exception handler.
FG and FX (FGH and FXH) are cleared if an overflow or underflow exception is taken, or if an invalid
operation/input error is signaled for the low (high) element (regardless of FINVE).
evfssubadd
Vector Floating-Point Single-Precision Subtract/Add
evfssubadd rD,rA,rB
Encoding:
  bits 0:5    000100
  bits 6:10   RD
  bits 11:15  RA
  bits 16:20  RB
  bits 21:31  01010100011
rD0:31 ← rA0:31 -sp rB0:31
rD32:63 ← rA32:63 +sp rB32:63
The high-order single-precision floating-point element of rB is subtracted from the corresponding element
of rA, the low-order single-precision floating-point element of rB is added to the corresponding
element of rA, and the results are stored in rD. If an element of rA is NaN or infinity, the corresponding
result is either pmax or nmax (as appropriate). Otherwise, if an element of rB is NaN or infinity, the
corresponding result is either nmax or pmax (as appropriate). Otherwise, if an overflow occurs, pmax or
nmax (as appropriate) is stored in the corresponding element of rD. If an underflow occurs, +0 (for
rounding modes RN, RZ, RP) or –0 (for rounding mode RM) is stored in the corresponding element of rD.
Exceptions:
If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are
set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is
set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs,
SPEFSCR[FOVF, FOVFH] are set appropriately, or if an underflow occurs, SPEFSCR[FUNF, FUNFH]
are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit
is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled,
and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt
is taken, SPEFSCR[FINXS] is set. If the floating-point inexact exception is enabled, an interrupt is taken
using the floating-point round interrupt vector. In this case, the destination register is updated with the
truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the
interrupt handler.
FG and FX (FGH and FXH) are cleared if an overflow or underflow interrupt is taken, or if an invalid
operation/input error is signaled for the low (high) element (regardless of FINVE).
evfssubaddx
Vector Floating-Point Single-Precision Subtract / Add Exchanged
evfssubaddx rD,rA,rB
Encoding:
  bits 0:5    000100
  bits 6:10   RD
  bits 11:15  RA
  bits 16:20  RB
  bits 21:31  01010101011
rD0:31 ← rA32:63 -sp rB0:31
rD32:63 ← rA0:31 +sp rB32:63
The high-order single-precision floating-point element of rB is subtracted from the low-order element of
rA, the low-order single-precision floating-point element of rB is added to the high-order element of
rA, and the results are stored in rD. If an element of rA is NaN or infinity, the
corresponding result is either pmax or nmax (as appropriate). Otherwise, if an element of rB is NaN or
infinity, the corresponding result is either nmax or pmax (as appropriate). Otherwise, if an overflow occurs,
pmax or nmax (as appropriate) is stored in the corresponding element of rD. If an underflow occurs, +0
(for rounding modes RN, RZ, RP) or –0 (for rounding mode RM) is stored in the corresponding element
of rD.
Exceptions:
If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are
set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is
set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs,
SPEFSCR[FOVF, FOVFH] are set appropriately, and if an underflow occurs, SPEFSCR[FUNF, FUNFH]
are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit
is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled,
and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt
is taken, SPEFSCR[FINXS] is set. If the floating-point inexact exception is enabled, an interrupt is taken
using the floating-point round interrupt vector. In this case, the destination register is updated with the
truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the
interrupt handler.
FG and FX (FGH and FXH) are cleared if an overflow or underflow interrupt is taken, or if an invalid
operation/input error is signaled for the low (high) element (regardless of FINVE).
evfssubx
Vector Floating-Point Single-Precision Subtract Exchanged
evfssubx rD,rA,rB
Encoding:
  bits 0:5    000100
  bits 6:10   RD
  bits 11:15  RA
  bits 16:20  RB
  bits 21:31  01010101001
rD0:31 ← rA32:63 -sp rB0:31
rD32:63 ← rA0:31 -sp rB32:63
The high-order single-precision floating-point element of rB is subtracted from the low-order element of
rA, the low-order single-precision floating-point element of rB is subtracted from the high-order element
of rA, and the results are stored in rD. If an element of rA is NaN or infinity, the
corresponding result is either pmax or nmax (as appropriate). Otherwise, if an element of rB is NaN or
infinity, the corresponding result is either nmax or pmax (as appropriate). Otherwise, if an overflow occurs,
pmax or nmax (as appropriate) is stored in the corresponding element of rD. If an underflow occurs, +0
(for rounding modes RN, RZ, RP) or –0 (for rounding mode RM) is stored in the corresponding element
of rD.
Exceptions:
If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are
set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is
set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs,
SPEFSCR[FOVF, FOVFH] are set appropriately, and if an underflow occurs, SPEFSCR[FUNF, FUNFH]
are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit
is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated.
If either result element of this instruction is inexact, overflows but overflow exceptions are disabled and
no other interrupt is taken, or underflows but underflow exceptions are disabled and no other interrupt is
taken, SPEFSCR[FINXS] is set. If the floating-point inexact exception is enabled, an interrupt is taken
using the floating-point round interrupt vector. In this case, the destination register is updated with the
truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the
interrupt handler.
FG and FX (FGH and FXH) are cleared if an overflow or underflow interrupt is taken, or if an invalid
operation/input error is signaled for the low (high) element (regardless of FINVE).
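The element routing of evfssubadd, evfssubaddx, and evfssubx follows directly from their register-transfer equations and can be sketched as below. This is only an illustration: the function names mirror the mnemonics for readability, each register is modeled as a hypothetical (high, low) tuple standing for bits 0:31 and 32:63, and ordinary Python floats stand in for the single-precision arithmetic, so saturation to pmax/nmax and the SPEFSCR flags are not modeled.

```python
def evfssubadd(ra, rb):
    """rD0:31 = rA0:31 - rB0:31; rD32:63 = rA32:63 + rB32:63."""
    return (ra[0] - rb[0], ra[1] + rb[1])


def evfssubaddx(ra, rb):
    """rD0:31 = rA32:63 - rB0:31; rD32:63 = rA0:31 + rB32:63."""
    return (ra[1] - rb[0], ra[0] + rb[1])


def evfssubx(ra, rb):
    """rD0:31 = rA32:63 - rB0:31; rD32:63 = rA0:31 - rB32:63."""
    return (ra[1] - rb[0], ra[0] - rb[1])
```

With rA = (5.0, 1.0) and rB = (2.0, 3.0), the three functions return (3.0, 4.0), (-1.0, 8.0), and (-1.0, 2.0) respectively, which shows how the exchanged forms swap the rA elements before operating.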
evfssum
Vector Floating-Point Single-Precision Sums
evfssum rD,rA,rB
Encoding:
  bits 0:5    000100
  bits 6:10   RD
  bits 11:15  RA
  bits 16:20  RB
  bits 21:31  01010100100
rD0:31 ← rA0:31 +sp rA32:63
rD32:63 ← rB0:31 +sp rB32:63
The high-order single-precision floating-point element of rA is added to the low-order element of rA, the
high-order single-precision floating-point element of rB is added to the low-order element of rB, and the
results are stored in rD. If the high-order element of rA or rB is NaN or infinity, the corresponding result
is either pmax or nmax (as appropriate). Otherwise, if the low-order element of rA or rB is NaN or infinity,
the corresponding result is either pmax or nmax (as appropriate). Otherwise, if an overflow occurs, pmax
or nmax (as appropriate) is stored in the corresponding element of rD. If an underflow occurs, +0 (for
rounding modes RN, RZ, RP) or –0 (for rounding mode RM) is stored in the corresponding element of rD.
Exceptions:
If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are
set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is
set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs,
SPEFSCR[FOVF, FOVFH] are set appropriately, and if an underflow occurs, SPEFSCR[FUNF, FUNFH]
are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit
is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated.
If either result element of this instruction is inexact, overflows but overflow exceptions are disabled and
no other interrupt is taken, or underflows but underflow exceptions are disabled and no other interrupt is
taken, SPEFSCR[FINXS, FINXSH] is set. If the floating-point inexact exception is enabled, an interrupt
is taken using the floating-point round interrupt vector. In this case, the destination register is updated with
the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the
interrupt handler.
FG and FX (FGH and FXH) are cleared if an overflow or underflow interrupt is taken, or if an invalid
operation/input error is signaled for the low (high) element (regardless of FINVE).
evfssumdiff
Vector Floating-Point Single-Precision Sum / Difference
evfssumdiff rD,rA,rB
Encoding:
  bits 0:5    000100
  bits 6:10   RD
  bits 11:15  RA
  bits 16:20  RB
  bits 21:31  01010100110
rD0:31 ← rA0:31 +sp rA32:63
rD32:63 ← rB0:31 -sp rB32:63
The high-order single-precision floating-point element of rA is added to the low-order element of rA, the
low-order single-precision floating-point element of rB is subtracted from the high-order element of rB,
and the results are stored in rD. If the high-order element of rA or rB is NaN or infinity, the corresponding
result is either pmax or nmax (as appropriate). Otherwise, if the low-order element of rA or rB is NaN or
infinity, the corresponding result is either pmax or nmax (as appropriate). Otherwise, if an overflow occurs,
pmax or nmax (as appropriate) is stored in the corresponding element of rD. If an underflow occurs, +0
(for rounding modes RN, RZ, RP) or –0 (for rounding mode RM) is stored in the corresponding element
of rD.
Exceptions:
If the contents of either element of rA or rB are Infinity, Denorm, or NaN, SPEFSCR[FINV, FINVH] are
set appropriately, and SPEFSCR[FGH, FXH, FG, FX] are cleared appropriately. If SPEFSCR[FINVE] is
set, an interrupt is taken and the destination register is not updated. Otherwise, if an overflow occurs,
SPEFSCR[FOVF, FOVFH] are set appropriately, and if an underflow occurs, SPEFSCR[FUNF, FUNFH]
are set appropriately. If either underflow or overflow exceptions are enabled and a corresponding status bit
is set, an interrupt is taken. If any of these interrupts are taken, the destination register is not updated.
If either result element of this instruction is inexact, or overflows but overflow exceptions are disabled,
and no other interrupt is taken, or underflows but underflow exceptions are disabled, and no other interrupt
is taken, SPEFSCR[FINXS, FINXSH] is set. If the floating-point inexact exception is enabled, an interrupt
is taken using the floating-point round interrupt vector. In this case, the destination register is updated with
the truncated result(s). The FG and FX bits are properly updated to allow rounding to be performed in the
interrupt handler.
FG and FX (FGH and FXH) are cleared if an overflow or underflow interrupt is taken, or if an invalid
operation/input error is signaled for the low (high) element (regardless of FINVE).
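Unlike the element-wise operations, evfssum and evfssumdiff combine the two elements within each source register. A minimal sketch of that data flow, assuming registers are modeled as hypothetical (high, low) tuples and Python floats stand in for single-precision arithmetic (special operands, saturation, and SPEFSCR flags are not modeled):

```python
def evfssum(ra, rb):
    """rD0:31 = rA0:31 + rA32:63; rD32:63 = rB0:31 + rB32:63."""
    return (ra[0] + ra[1], rb[0] + rb[1])


def evfssumdiff(ra, rb):
    """rD0:31 = rA0:31 + rA32:63; rD32:63 = rB0:31 - rB32:63."""
    return (ra[0] + ra[1], rb[0] - rb[1])
```

With rA = (1.0, 2.0) and rB = (10.0, 4.0), evfssum gives (3.0, 14.0) and evfssumdiff gives (3.0, 6.0).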
evfststeq
Vector Floating-Point Single-Precision Test Equal
evfststeq crfD,rA,rB
Encoding:
  bits 0:5    4
  bits 6:8    crfD
  bits 9:10   00
  bits 11:15  RA
  bits 16:20  RB
  bits 21:31  01010011110
Description:
ah = RA0:31
al = RA32:63
bh = RB0:31
bl = RB32:63
if (ah == bh) then ch = 1
else ch = 0
if (al == bl) then cl = 1
else cl = 0
CR[4*crfD:4*crfD+3] = ch || cl || (ch | cl) || (ch & cl)
Each element of rA is compared against the corresponding element of rB. If the rA element equals the
rB element, the corresponding bit in crfD is set; otherwise it is cleared. Comparison ignores the sign of 0
(+0 = –0). The comparison proceeds after treating NaNs, Infinities, and Denorms as normalized numbers,
using their values of ‘e’ and ‘f’ directly.
No exceptions are taken during the execution of evfststeq. If strict conformity to IEEE 754 standard is
required, the program should use evfscmpeq.
Implementation note: In an implementation, the execution of evfststeq is likely to be faster than the
execution of evfscmpeq.
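The construction of the 4-bit CR field from the two element comparisons can be sketched as follows. The helper names (`split`, `tst_eq`, `evfststeq_cr`) are hypothetical, and the equality rule is an assumption drawn from the description above: elements are compared by their raw sign, exponent, and fraction fields, except that +0 and –0 compare equal.

```python
def split(reg64):
    """Split a 64-bit register value into (bits 0:31, bits 32:63)."""
    return (reg64 >> 32) & 0xFFFFFFFF, reg64 & 0xFFFFFFFF


def tst_eq(a, b):
    """Equality on raw fields; the sign of zero is ignored."""
    if (a & 0x7FFFFFFF) == 0 and (b & 0x7FFFFFFF) == 0:
        return 1
    return int(a == b)


def evfststeq_cr(ra, rb):
    """Return the 4-bit value ch || cl || (ch | cl) || (ch & cl), ch leftmost."""
    ah, al = split(ra)
    bh, bl = split(rb)
    ch = tst_eq(ah, bh)
    cl = tst_eq(al, bl)
    return (ch << 3) | (cl << 2) | ((ch | cl) << 1) | (ch & cl)
```

The third and fourth bits let software branch on "either element matched" or "both elements matched" without extracting ch and cl separately.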
evfststgt
Vector Floating-Point Single-Precision Test Greater Than
evfststgt crfD,rA,rB
Encoding:
  bits 0:5    4
  bits 6:8    crfD
  bits 9:10   00
  bits 11:15  RA
  bits 16:20  RB
  bits 21:31  01010011100
Description:
ah = RA0:31
al = RA32:63
bh = RB0:31
bl = RB32:63
if (ah > bh) then ch = 1
else ch = 0
if (al > bl) then cl = 1
else cl = 0
CR[4*crfD:4*crfD+3] = ch || cl || (ch | cl) || (ch & cl)
Each element of rA is compared against the corresponding element of rB. If rA is greater than rB, the bit
in crfD is set, otherwise it is cleared. Comparison ignores the sign of 0 (+0 = –0). The comparison
proceeds after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of ‘e’ and
‘f’ directly.
No exceptions are taken during the execution of evfststgt. If strict conformity to IEEE 754 standard is
required, the program should use evfscmpgt.
Implementation note: In an implementation, the execution of evfststgt is likely to be faster than the
execution of evfscmpgt.
evfststlt
Vector Floating-Point Single-Precision Test Less Than
evfststlt crfD,rA,rB
Encoding:
  bits 0:5    4
  bits 6:8    crfD
  bits 9:10   00
  bits 11:15  RA
  bits 16:20  RB
  bits 21:31  01010011101
Description:
ah = RA0:31
al = RA32:63
bh = RB0:31
bl = RB32:63
if (ah < bh) then ch = 1
else ch = 0
if (al < bl) then cl = 1
else cl = 0
CR[4*crfD:4*crfD+3] = ch || cl || (ch | cl) || (ch & cl)
Each element of rA is compared with the corresponding element of rB. If rA is less than rB, the bit in the
crfD is set, otherwise it is cleared. Comparison ignores the sign of 0 (+0 = –0). The comparison proceeds
after treating NaNs, Infinities, and Denorms as normalized numbers, using their values of ‘e’ and ‘f’
directly.
No exceptions are taken during the execution of evfststlt. If strict conformity to IEEE 754 standard is
required, the program should use evfscmplt.
Implementation note: In an implementation, the execution of evfststlt is likely to be faster than the
execution of evfscmplt.
5.4 Embedded Floating-point Results Summary
Table 5-2 summarizes the results of floating-point operations for add, sub, mul, and div. Flag settings
apply to the flags of the corresponding element.
Table 5-2. Floating-Point Results Summary—Add, Sub, Mul, Div
Operation  Operand A  Operand B  Result         FINV  FOVF  FUNF  FDBZ  FINX
Add
Add        ∞          ∞          amax           1     0     0     0     0
Add        ∞          NaN        amax           1     0     0     0     0
Add        ∞          denorm     amax           1     0     0     0     0
Add        ∞          zero       amax           1     0     0     0     0
Add        ∞          norm       amax           1     0     0     0     0
Add        NaN        ∞          amax           1     0     0     0     0
Add        NaN        NaN        amax           1     0     0     0     0
Add        NaN        denorm     amax           1     0     0     0     0
Add        NaN        zero       amax           1     0     0     0     0
Add        NaN        norm       amax           1     0     0     0     0
Add        denorm     ∞          bmax           1     0     0     0     0
Add        denorm     NaN        bmax           1     0     0     0     0
Add        denorm     denorm     zero (note 1)  1     0     0     0     0
Add        denorm     zero       zero (note 1)  1     0     0     0     0
Add        denorm     norm       operand_b      1     0     0     0     0
Add        zero       ∞          bmax           1     0     0     0     0
Add        zero       NaN        bmax           1     0     0     0     0
Add        zero       denorm     zero (note 1)  1     0     0     0     0
Add        zero       zero       zero (note 1)  0     0     0     0     0
Add        zero       norm       operand_b      0     0     0     0     0
Add        norm       ∞          bmax           1     0     0     0     0
Add        norm       NaN        bmax           1     0     0     0     0
Add        norm       denorm     operand_a      1     0     0     0     0
Add        norm       zero       operand_a      0     0     0     0     0
Add        norm       norm       _Calc_         0     *     *     0     *
Subtract
Sub        ∞          ∞          amax           1     0     0     0     0
Sub        ∞          NaN        amax           1     0     0     0     0
Sub        ∞          denorm     amax           1     0     0     0     0
Sub        ∞          zero       amax           1     0     0     0     0
Sub        ∞          norm       amax           1     0     0     0     0
Sub        NaN        ∞          amax           1     0     0     0     0
Sub        NaN        NaN        amax           1     0     0     0     0
Sub        NaN        denorm     amax           1     0     0     0     0
Sub        NaN        zero       amax           1     0     0     0     0
Sub        NaN        norm       amax           1     0     0     0     0
Sub        denorm     ∞          –bmax          1     0     0     0     0
Sub        denorm     NaN        –bmax          1     0     0     0     0
Sub        denorm     denorm     zero (note 2)  1     0     0     0     0
Sub        denorm     zero       zero (note 2)  1     0     0     0     0
Sub        denorm     norm       –operand_b     1     0     0     0     0
Sub        zero       ∞          –bmax          1     0     0     0     0
Sub        zero       NaN        –bmax          1     0     0     0     0
Sub        zero       denorm     zero (note 2)  1     0     0     0     0
Sub        zero       zero       zero (note 2)  0     0     0     0     0
Sub        zero       norm       –operand_b     0     0     0     0     0
Sub        norm       ∞          –bmax          1     0     0     0     0
Sub        norm       NaN        –bmax          1     0     0     0     0
Sub        norm       denorm     operand_a      1     0     0     0     0
Sub        norm       zero       operand_a      0     0     0     0     0
Sub        norm       norm       _Calc_         0     *     *     0     *
Multiply (note 3)
Mul        ∞          ∞          max            1     0     0     0     0
Mul        ∞          NaN        max            1     0     0     0     0
Mul        ∞          denorm     zero           1     0     0     0     0
Mul        ∞          zero       zero           1     0     0     0     0
Mul        ∞          norm       max            1     0     0     0     0
Mul        NaN        ∞          max            1     0     0     0     0
Mul        NaN        NaN        max            1     0     0     0     0
Mul        NaN        denorm     zero           1     0     0     0     0
Mul        NaN        zero       zero           1     0     0     0     0
Mul        NaN        norm       max            1     0     0     0     0
Mul        denorm     ∞          zero           1     0     0     0     0
Mul        denorm     NaN        zero           1     0     0     0     0
Mul        denorm     denorm     zero           1     0     0     0     0
Mul        denorm     zero       zero           1     0     0     0     0
Mul        denorm     norm       zero           1     0     0     0     0
Mul        zero       ∞          zero           1     0     0     0     0
Mul        zero       NaN        zero           1     0     0     0     0
Mul        zero       denorm     zero           1     0     0     0     0
Mul        zero       zero       zero           0     0     0     0     0
Mul        zero       norm       zero           0     0     0     0     0
Mul        norm       ∞          max            1     0     0     0     0
Mul        norm       NaN        max            1     0     0     0     0
Mul        norm       denorm     zero           1     0     0     0     0
Mul        norm       zero       zero           0     0     0     0     0
Mul        norm       norm       _Calc_         0     *     *     0     *
Divide (note 3)
Div        ∞          ∞          zero           1     0     0     0     0
Div        ∞          NaN        zero           1     0     0     0     0
Div        ∞          denorm     max            1     0     0     0     0
Div        ∞          zero       max            1     0     0     0     0
Div        ∞          norm       max            1     0     0     0     0
Div        NaN        ∞          zero           1     0     0     0     0
Div        NaN        NaN        zero           1     0     0     0     0
Div        NaN        denorm     max            1     0     0     0     0
Div        NaN        zero       max            1     0     0     0     0
Div        NaN        norm       max            1     0     0     0     0
Div        denorm     ∞          zero           1     0     0     0     0
Div        denorm     NaN        zero           1     0     0     0     0
Div        denorm     denorm     max            1     0     0     0     0
Div        denorm     zero       max            1     0     0     0     0
Div        denorm     norm       zero           1     0     0     0     0
Div        zero       ∞          zero           1     0     0     0     0
Div        zero       NaN        zero           1     0     0     0     0
Div        zero       denorm     max            1     0     0     0     0
Div        zero       zero       max            1     0     0     0     0
Div        zero       norm       zero           0     0     0     0     0
Div        norm       ∞          zero           1     0     0     0     0
Div        norm       NaN        zero           1     0     0     0     0
Div        norm       denorm     max            1     0     0     0     0
Div        norm       zero       max            0     0     0     1     0
Div        norm       norm       _Calc_         0     *     *     0     *
Notes:
The following definitions apply:
1. - sign of result is positive when sign_a and sign_b are different for all rounding modes except round to minus infinity, where it
is negative.
2. - sign of result is positive when sign_a and sign_b are the same for all rounding modes except round to minus infinity, where
it is negative.
3. - sign of result is always (sign_a XOR sign_b)
* - updated according to results of calculation
_Calc_ - result is updated with the results of calculation
max - max normalized number with sign of (sign_a XOR sign_b)
amax - max normalized number with sign of sign_a
bmax - max normalized number with sign of sign_b
nmax - max negative normalized number
pmax - max positive normalized number
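The Add rows of Table 5-2 follow a regular pattern that can be sketched as a small lookup function. This is an illustrative reading of the table, not a description of the hardware: the function name `add_summary` and the operand-class strings are hypothetical, and only the symbolic result and the FINV flag are returned.

```python
def add_summary(kind_a, kind_b):
    """Return (result, finv) for operand classes taken from
    {"inf", "nan", "denorm", "zero", "norm"}, per the Add rows of Table 5-2."""
    special = ("inf", "nan")
    # FINV is set whenever either operand is infinity, NaN, or denorm.
    finv = int(any(k in special or k == "denorm" for k in (kind_a, kind_b)))
    if kind_a in special:
        return "amax", finv          # operand A takes priority
    if kind_b in special:
        return "bmax", finv
    a_zero = kind_a in ("zero", "denorm")   # denorms behave as zeros here
    b_zero = kind_b in ("zero", "denorm")
    if a_zero and b_zero:
        return "zero", finv          # sign given by note 1
    if a_zero:
        return "operand_b", finv
    if b_zero:
        return "operand_a", finv
    return "_Calc_", finv            # FOVF, FUNF, FINX depend on the calculation
```

For instance, `add_summary("denorm", "norm")` returns `("operand_b", 1)`, matching the corresponding table row.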
Table 5-3 summarizes the results of floating-point operations for madd, msub, nmadd, and nmsub.
Table 5-3. Floating-Point Results Summary—madd, msub, nmadd, nmsub
Operation  Operand A  Operand B                   Operand D                   Result         FINV  FOVF  FUNF  FDBZ  FINX
madd
madd       ∞, NaN     ∞, NaN, norm                ∞, NaN, denorm, zero, norm  abmax          1     0     0     0     0
madd       ∞, NaN     denorm, zero                ∞, NaN                      dmax           1     0     0     0     0
madd       ∞, NaN     denorm, zero                denorm, zero                zero (note 1)  1     0     0     0     0
madd       ∞, NaN     denorm, zero                norm                        operand_d      1     0     0     0     0
madd       denorm     ∞, NaN, denorm, zero, norm  ∞, NaN                      dmax           1     0     0     0     0
madd       denorm     ∞, NaN, denorm, zero, norm  denorm, zero                zero (note 1)  1     0     0     0     0
madd       denorm     ∞, NaN, denorm, zero, norm  norm                        operand_d      1     0     0     0     0
madd       zero       ∞, NaN, denorm              ∞, NaN                      dmax           1     0     0     0     0
madd       zero       ∞, NaN, denorm              denorm, zero                zero (note 1)  1     0     0     0     0
madd       zero       ∞, NaN, denorm              norm                        operand_d      1     0     0     0     0
madd       zero       zero, norm                  ∞, NaN                      dmax           1     0     0     0     0
madd       zero       zero, norm                  denorm                      zero (note 1)  1     0     0     0     0
madd       zero       zero, norm                  zero                        zero (note 1)  0     0     0     0     0
madd       zero       zero, norm                  norm                        operand_d      0     0     0     0     0
madd       norm       ∞, NaN                      ∞, NaN, denorm, zero, norm  abmax          1     0     0     0     0
madd       norm       denorm                      ∞, NaN                      dmax           1     0     0     0     0
madd       norm       denorm                      denorm, zero                zero (note 1)  1     0     0     0     0
madd       norm       denorm                      norm                        operand_d      1     0     0     0     0
madd       norm       zero                        ∞, NaN                      dmax           1     0     0     0     0
madd       norm       zero                        denorm                      zero (note 1)  1     0     0     0     0
madd       norm       zero                        zero                        zero (note 1)  0     0     0     0     0
madd       norm       zero                        norm                        operand_d      0     0     0     0     0
madd       norm       norm                        ∞, NaN                      dmax           1     0     0     0     0
madd       norm       norm                        denorm                      ab_Calc        1     *     *     0     *
madd       norm       norm                        zero                        ab_Calc        0     *     *     0     *
madd       norm       norm                        norm                        _Calc_         0     *     *     0     *
nmadd
nmadd      ∞, NaN     ∞, NaN, norm                ∞, NaN, denorm, zero, norm  –abmax         1     0     0     0     0
nmadd      ∞, NaN     denorm, zero                ∞, NaN                      –dmax          1     0     0     0     0
nmadd      ∞, NaN     denorm, zero                denorm, zero                zero (note 3)  1     0     0     0     0
nmadd      ∞, NaN     denorm, zero                norm                        –operand_d     1     0     0     0     0
nmadd      denorm     ∞, NaN, denorm, zero, norm  ∞, NaN                      –dmax          1     0     0     0     0
nmadd      denorm     ∞, NaN, denorm, zero, norm  denorm, zero                zero (note 3)  1     0     0     0     0
nmadd      denorm     ∞, NaN, denorm, zero, norm  norm                        –operand_d     1     0     0     0     0
nmadd      zero       ∞, NaN, denorm              ∞, NaN                      –dmax          1     0     0     0     0
nmadd      zero       ∞, NaN, denorm              denorm, zero                zero (note 3)  1     0     0     0     0
nmadd      zero       ∞, NaN, denorm              norm                        –operand_d     1     0     0     0     0
nmadd      zero       zero, norm                  ∞, NaN                      –dmax          1     0     0     0     0
nmadd      zero       zero, norm                  denorm                      zero (note 3)  1     0     0     0     0
nmadd      zero       zero, norm                  zero                        zero (note 3)  0     0     0     0     0
nmadd      zero       zero, norm                  norm                        –operand_d     0     0     0     0     0
nmadd      norm       ∞, NaN                      ∞, NaN, denorm, zero, norm  –abmax         1     0     0     0     0
nmadd      norm       denorm                      ∞, NaN                      –dmax          1     0     0     0     0
nmadd      norm       denorm                      denorm, zero                zero (note 3)  1     0     0     0     0
nmadd      norm       denorm                      norm                        –operand_d     1     0     0     0     0
nmadd      norm       zero                        ∞, NaN                      –dmax          1     0     0     0     0
nmadd      norm       zero                        denorm                      zero (note 3)  1     0     0     0     0
nmadd      norm       zero                        zero                        zero (note 3)  0     0     0     0     0
nmadd      norm       zero                        norm                        –operand_d     0     0     0     0     0
nmadd      norm       norm                        ∞, NaN                      –dmax          1     0     0     0     0
nmadd      norm       norm                        denorm                      –ab_Calc       1     *     *     0     *
nmadd      norm       norm                        zero                        –ab_Calc       0     *     *     0     *
nmadd      norm       norm                        norm                        –(_Calc_)      0     *     *     0     *
msub
msub       ∞, NaN     ∞, NaN, norm                ∞, NaN, denorm, zero, norm  abmax          1     0     0     0     0
msub       ∞, NaN     denorm, zero                ∞, NaN                      –dmax          1     0     0     0     0
msub       ∞, NaN     denorm, zero                denorm, zero                zero (note 2)  1     0     0     0     0
msub       ∞, NaN     denorm, zero                norm                        –operand_d     1     0     0     0     0
msub       denorm     ∞, NaN, denorm, zero, norm  ∞, NaN                      –dmax          1     0     0     0     0
msub       denorm     ∞, NaN, denorm, zero, norm  denorm, zero                zero (note 2)  1     0     0     0     0
msub       denorm     ∞, NaN, denorm, zero, norm  norm                        –operand_d     1     0     0     0     0
msub       zero       ∞, NaN, denorm              ∞, NaN                      –dmax          1     0     0     0     0
msub       zero       ∞, NaN, denorm              denorm, zero                zero (note 2)  1     0     0     0     0
msub       zero       ∞, NaN, denorm              norm                        –operand_d     1     0     0     0     0
msub       zero       zero, norm                  ∞, NaN                      –dmax          1     0     0     0     0
msub       zero       zero, norm                  denorm                      zero (note 2)  1     0     0     0     0
msub       zero       zero, norm                  zero                        zero (note 2)  0     0     0     0     0
msub       zero       zero, norm                  norm                        –operand_d     0     0     0     0     0
msub       norm       ∞, NaN                      ∞, NaN, denorm, zero, norm  abmax          1     0     0     0     0
msub       norm       denorm                      ∞, NaN                      –dmax          1     0     0     0     0
msub       norm       denorm                      denorm, zero                zero (note 2)  1     0     0     0     0
msub       norm       denorm                      norm                        –operand_d     1     0     0     0     0
msub       norm       zero                        ∞, NaN                      –dmax          1     0     0     0     0
msub       norm       zero                        denorm                      zero (note 2)  1     0     0     0     0
msub       norm       zero                        zero                        zero (note 2)  0     0     0     0     0
msub       norm       zero                        norm                        –operand_d     0     0     0     0     0
msub       norm       norm                        ∞, NaN                      –dmax          1     0     0     0     0
msub       norm       norm                        denorm                      ab_Calc        1     *     *     0     *
msub       norm       norm                        zero                        ab_Calc        0     *     *     0     *
msub       norm       norm                        norm                        _Calc_         0     *     *     0     *
nmsub
nmsub      ∞, NaN     ∞, NaN, norm                ∞, NaN, denorm, zero, norm  –abmax         1     0     0     0     0
nmsub      ∞, NaN     denorm, zero                ∞, NaN                      dmax           1     0     0     0     0
nmsub      ∞, NaN     denorm, zero                denorm, zero                zero (note 4)  1     0     0     0     0
nmsub      ∞, NaN     denorm, zero                norm                        operand_d      1     0     0     0     0
nmsub      denorm     ∞, NaN, denorm, zero, norm  ∞, NaN                      dmax           1     0     0     0     0
nmsub      denorm     ∞, NaN, denorm, zero, norm  denorm, zero                zero (note 4)  1     0     0     0     0
nmsub      denorm     ∞, NaN, denorm, zero, norm  norm                        operand_d      1     0     0     0     0
nmsub      zero       ∞, NaN, denorm              ∞, NaN                      dmax           1     0     0     0     0
nmsub      zero       ∞, NaN, denorm              denorm, zero                zero (note 4)  1     0     0     0     0
nmsub      zero       ∞, NaN, denorm              norm                        operand_d      1     0     0     0     0
nmsub      zero       zero, norm                  ∞, NaN                      dmax           1     0     0     0     0
nmsub      zero       zero, norm                  denorm                      zero (note 4)  1     0     0     0     0
nmsub      zero       zero, norm                  zero                        zero (note 4)  0     0     0     0     0
nmsub      zero       zero, norm                  norm                        –operand_d     0     0     0     0     0
nmsub      norm       ∞, NaN                      ∞, NaN, denorm, zero, norm  –abmax         1     0     0     0     0
nmsub      norm       denorm                      ∞, NaN                      dmax           1     0     0     0     0
nmsub      norm       denorm                      denorm, zero                zero (note 4)  1     0     0     0     0
nmsub      norm       denorm                      norm                        operand_d      1     0     0     0     0
nmsub      norm       zero                        ∞, NaN                      dmax           1     0     0     0     0
nmsub      norm       zero                        denorm                      zero (note 4)  1     0     0     0     0
nmsub      norm       zero                        zero                        zero (note 4)  0     0     0     0     0
nmsub      norm       zero                        norm                        operand_d      0     0     0     0     0
nmsub      norm       norm                        ∞, NaN                      dmax           1     0     0     0     0
nmsub      norm       norm                        denorm                      –ab_Calc       1     *     *     0     *
nmsub      norm       norm                        zero                        –ab_Calc       0     *     *     0     *
nmsub      norm       norm                        norm                        –(_Calc_)      0     *     *     0     *
Notes:
The following definitions apply:
1. – sign of result is positive when (sign_a XOR sign_b) and sign_d are different for all rounding modes except round to minus
infinity, where it is negative.
2. – sign of result is positive when (sign_a XOR sign_b) and sign_d are the same for all rounding modes except round to minus
infinity, where it is negative.
3. – sign of result is negative when (sign_a XOR sign_b) and sign_d are different for all rounding modes except round to minus
infinity, where it is positive.
4. – sign of result is negative when (sign_a XOR sign_b) and sign_d are the same for all rounding modes except round to minus
infinity, where it is positive.
* – updated according to results of calculation
ab_Calc – result is updated with the results of intermediate product calculation, rounded
_Calc_ – result is updated with the results of calculation, rounded
abmax – max normalized number with sign of (sign_a XOR sign_b)
dmax – max normalized number with sign of sign_d
nmax – max negative normalized number
pmax – max positive normalized number
Table 5-4 summarizes the results of floating-point operations for sqrt.
Table 5-4. Floating-Point Results Summary—sqrt
Operand A  Result   FINV  FOVF  FUNF  FDBZ  FINX
+∞         pmax     1     0     0     0     0
–∞         –0       1     0     0     0     0
+NaN       pmax     1     0     0     0     0
–NaN       –0       1     0     0     0     0
+denorm    +zero    1     0     0     0     0
–denorm    –zero    1     0     0     0     0
+zero      +zero    0     0     0     0     0
–zero      –zero    0     0     0     0     0
+norm      _Calc_   0     *     *     0     *
–norm      –0       1     0     0     0     0
Table 5-5 shows the floating-point results summary for min and max.
Table 5-5. Floating–Point Results Summary—Min, Max
Operand A
Operand B
Result
FI
NV
FOV
F
FUN
F
FDB
Z
F
I
NX
Max


pmax
1
0
0
0
0


pmax
1
0
0
0
0

+NaN
pmax
1
0
0
0
0

–NaN
pmax
1
0
0
0
0

denorm
pmax
1
0
0
0
0

zero
pmax
1
0
0
0
0

Norm
pmax
1
0
0
0
0


pmax
1
0
0
0
0


nmax
1
0
0
0
0

+NaN
nmax
1
0
0
0
0

–NaN
nmax
1
0
0
0
0

denorm
bzero
1
0
0
0
0

zero
bzero
1
0
0
0
0

Norm
operand_b
1
0
0
0
0
+NaN

pmax
1
0
0
0
0
+NaN

nmax
1
0
0
0
0
+NaN
+NaN
pmax
1
0
0
0
0
+NaN
–NaN
pmax
1
0
0
0
0
+NaN
denorm
bzero
1
0
0
0
0
+NaN
zero
bzero
1
0
0
0
0
+NaN
Norm
operand_b
1
0
0
0
0
–NaN

pmax
1
0
0
0
0
–NaN

nmax
1
0
0
0
0
–NaN
+NaN
pmax
1
0
0
0
0
–NaN
–NaN
nmax
1
0
0
0
0
–NaN
denorm
bzero
1
0
0
0
0
–NaN
zero
bzero
1
0
0
0
0
–NaN
Norm
operand_b
1
0
0
0
0
+denorm

pmax
1
0
0
0
0
+denorm

azero
1
0
0
0
0
+denorm
+NaN
azero
1
0
0
0
0
+denorm
–NaN
azero
1
0
0
0
0
+denorm
denorm
azero
1
0
0
0
0
+denorm
zero
azero
1
0
0
0
0
+denorm
+Norm
operand_b
1
0
0
0
0
+denorm
–Norm
azero
1
0
0
0
0
–denorm

pmax
1
0
0
0
0
–denorm

azero
1
0
0
0
0
–denorm
+NaN
azero
1
0
0
0
0
–denorm
–NaN
azero
1
0
0
0
0
–denorm
denorm
bzero
1
0
0
0
0
–denorm
zero
bzero
1
0
0
0
0
–denorm
+Norm
operand_b
1
0
0
0
0
–denorm
–Norm
azero
1
0
0
0
0
+zero

pmax
1
0
0
0
0
+zero

azero
1
0
0
0
0
+zero
+NaN
azero
1
0
0
0
0
+zero
–NaN
azero
1
0
0
0
0
+zero
denorm
azero
1
0
0
0
0
+zero
zero
azero
0
0
0
0
0
+zero
+Norm
operand_b
0
0
0
0
0
+zero
–Norm
azero
0
0
0
0
0
–zero

pmax
1
0
0
0
0
–zero

azero
1
0
0
0
0
–zero
+NaN
azero
1
0
0
0
0
–zero
–NaN
azero
1
0
0
0
0
–zero
denorm
bzero
1
0
0
0
0
–zero
zero
bzero
0
0
0
0
0
–zero
+Norm
operand_b
0
0
0
0
0
–zero
–Norm
azero
0
0
0
0
0
+Norm

pmax
1
0
0
0
0
+Norm

operand_a
1
0
0
0
0
+Norm
+NaN
operand_a
1
0
0
0
0
+Norm
–NaN
operand_a
1
0
0
0
0
+Norm
denorm
operand_a
1
0
0
0
0
+Norm
zero
operand_a
0
0
0
0
0
+Norm
Norm
_Calc_
0
0
0
0
0
–Norm

pmax
1
0
0
0
0
–Norm

operand_a
1
0
0
0
0
–Norm
+NaN
operand_a
1
0
0
0
0
–Norm
–NaN
operand_a
1
0
0
0
0
–Norm
denorm
bzero
1
0
0
0
0
–Norm
zero
bzero
0
0
0
0
0
–Norm
Norm
_Calc_
0
0
0
0
0
Min


pmax
1
0
0
0
0


nmax
1
0
0
0
0

+NaN
pmax
1
0
0
0
0

–NaN
pmax
1
0
0
0
0

denorm
bzero
1
0
0
0
0

zero
bzero
1
0
0
0
0

Norm
operand_b
1
0
0
0
0


nmax
1
0
0
0
0


nmax
1
0
0
0
0

+NaN
nmax
1
0
0
0
0

–NaN
nmax
1
0
0
0
0

denorm
nmax
1
0
0
0
0

zero
nmax
1
0
0
0
0

Norm
nmax
1
0
0
0
0
+NaN

pmax
1
0
0
0
0
+NaN

nmax
1
0
0
0
0
+NaN
+NaN
pmax
1
0
0
0
0
+NaN
–NaN
nmax
1
0
0
0
0
+NaN
denorm
bzero
1
0
0
0
0
+NaN
zero
bzero
1
0
0
0
0
+NaN
Norm
operand_b
1
0
0
0
0
–NaN

pmax
1
0
0
0
0
–NaN

nmax
1
0
0
0
0
–NaN
+NaN
nmax
1
0
0
0
0
–NaN
–NaN
nmax
1
0
0
0
0
–NaN
denorm
bzero
1
0
0
0
0
–NaN
zero
bzero
1
0
0
0
0
–NaN
Norm
operand_b
1
0
0
0
0
+denorm

azero
1
0
0
0
0
+denorm

nmax
1
0
0
0
0
+denorm
+NaN
azero
1
0
0
0
0
+denorm
–NaN
azero
1
0
0
0
0
+denorm
denorm
bzero
1
0
0
0
0
+denorm
zero
bzero
1
0
0
0
0
+denorm
+Norm
azero
1
0
0
0
0
+denorm
–Norm
operand_b
1
0
0
0
0
–denorm

azero
1
0
0
0
0
–denorm

nmax
1
0
0
0
0
–denorm
+NaN
azero
1
0
0
0
0
–denorm
–NaN
azero
1
0
0
0
0
–denorm
denorm
azero
1
0
0
0
0
–denorm
zero
azero
1
0
0
0
0
–denorm
+Norm
azero
1
0
0
0
0
–denorm
–Norm
operand_b
1
0
0
0
0
+zero

azero
1
0
0
0
0
+zero

nmax
1
0
0
0
0
+zero
+NaN
azero
1
0
0
0
0
+zero
–NaN
azero
1
0
0
0
0
+zero
denorm
bzero
1
0
0
0
0
+zero
zero
bzero
0
0
0
0
0
+zero
+Norm
azero
0
0
0
0
0
+zero
–Norm
operand_b
0
0
0
0
0
–zero

azero
1
0
0
0
0
–zero

nmax
1
0
0
0
0
–zero
+NaN
azero
1
0
0
0
0
–zero
–NaN
azero
1
0
0
0
0
–zero
denorm
azero
1
0
0
0
0
–zero
zero
azero
0
0
0
0
0
–zero
+Norm
azero
0
0
0
0
0
–zero
–Norm
operand_b
0
0
0
0
0
+Norm

operand_a
1
0
0
0
0
+Norm

nmax
1
0
0
0
0
+Norm
+NaN
operand_a
1
0
0
0
0
+Norm
–NaN
operand_a
1
0
0
0
0
+Norm
denorm
bzero
1
0
0
0
0
+Norm
zero
bzero
0
0
0
0
0
+Norm
Norm
_Calc_
0
0
0
0
0
–Norm

operand_a
1
0
0
0
0
–Norm

nmax
1
0
0
0
0
–Norm
+NaN
operand_a
1
0
0
0
0
–Norm
–NaN
operand_a
1
0
0
0
0
–Norm
denorm
operand_a
1
0
0
0
0
–Norm
zero
operand_a
0
0
0
0
0
–Norm
Norm
_Calc_
0
0
0
0
0
Table 5-6 shows the floating-point results summary for convert to unsigned.
Table 5-6. Floating-Point Results Summary—Convert to Unsigned
Operand B   Integer Result   Fractional Result   FINV  FOVF  FUNF  FDBZ  FINX
            efsctui[z]       efsctuf
+∞          0xFFFF_FFFF      0xFFFF_FFFF         1     0     0     0     0
–∞          zero             zero                1     0     0     0     0
+NaN        zero             zero                1     0     0     0     0
–NaN        zero             zero                1     0     0     0     0
denorm      zero             zero                1     0     0     0     0
zero        zero             zero                0     0     0     0     0
+norm       _Calc_           _Calc_              *     0     0     0     *
–norm       zero             zero                0     0     0     0     0
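The special-operand rows of Table 5-6 can be read as a small decision procedure. The following C sketch is a host-side model only, not the hardware implementation: the names `float_from_bits`, `ctui_result`, and `model_efsctuiz` are hypothetical, only the result and FINV columns are modeled, and the behavior for positive normalized inputs too large for 32 bits (saturate and flag) is an assumption, since the table shows only `_Calc_` and `*` for that row.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a float from its raw IEEE 754 bit pattern. */
static float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical result record: converted value plus the FINV flag. */
typedef struct {
    uint32_t result;
    int finv;
} ctui_result;

/* Model of the Table 5-6 integer column (truncating form). */
static ctui_result model_efsctuiz(float b)
{
    uint32_t bits;
    memcpy(&bits, &b, sizeof bits);
    uint32_t sign = bits >> 31;
    uint32_t exp  = (bits >> 23) & 0xFFu;
    uint32_t frac = bits & 0x7FFFFFu;
    ctui_result r = { 0u, 0 };

    if (exp == 0xFFu) {
        /* Infinity or NaN: FINV is set; only +infinity gives all ones. */
        r.finv = 1;
        if (frac == 0u && sign == 0u)
            r.result = 0xFFFFFFFFu;
    } else if (exp == 0u) {
        /* Zero gives zero with no flag; denorm gives zero with FINV. */
        r.finv = (frac != 0u);
    } else if (sign != 0u) {
        /* Negative normalized: zero result, FINV shown as 0 in the table. */
    } else if (b >= 4294967296.0f) {
        /* Assumption: out-of-range positive values saturate and set FINV. */
        r.result = 0xFFFFFFFFu;
        r.finv = 1;
    } else {
        r.result = (uint32_t)b;   /* truncate toward zero */
    }
    return r;
}
```

The model classifies the operand from its exponent and fraction fields rather than through `<math.h>`, which mirrors the operand categories used in the table (infinity, NaN, denorm, zero, norm).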
Table 5-7 shows the floating-point results summary for convert to signed.
Table 5-7. Floating-Point Results Summary—Convert to Signed
Operand B   Integer Result   Fractional Result   FINV  FOVF  FUNF  FDBZ  FINX
            efsctsi[z]       efsctsf
+∞          0x7FFF_FFFF      0x7FFF_FFFF         1     0     0     0     0
–∞          0x8000_0000      0x8000_0000         1     0     0     0     0
+NaN        zero             zero                1     0     0     0     0
–NaN        zero             zero                1     0     0     0     0
denorm      zero             zero                1     0     0     0     0
zero        zero             zero                0     0     0     0     0
+norm       _Calc_           _Calc_              *     0     0     0     *
–norm       _Calc_           _Calc_              *     0     0     0     *
Table 5-8 shows the floating-point results summary for convert from unsigned.
Table 5-8. Floating-Point Results Summary—Convert from Unsigned
Operand B   Integer Source   Fractional Source   FINV  FOVF  FUNF  FDBZ  FINX
            efscfui          efscfuf
zero        zero             zero                0     0     0     0     0
norm        _Calc_           _Calc_              0     0     0     0     *
Table 5-9 shows the floating-point results summary for convert from signed.
Table 5-9. Floating-Point Results Summary—Convert from Signed
Operand B   Integer Source   Fractional Source   FINV  FOVF  FUNF  FDBZ  FINX
            efscfsi          efscfsf
zero        zero             zero                0     0     0     0     0
norm        _Calc_           _Calc_              0     0     0     0     *
Table 5-10 shows the floating-point results summary for fabs, fnabs, fneg.
Table 5-10. Floating-Point Results Summary—fabs, fnabs, fneg
Operand A   fabs               fnabs          fneg   FINV  FOVF  FUNF  FDBZ  FINX
∞           +∞                 –∞             –A     1     0     0     0     0
NaN         Sign bit cleared   Sign bit set   –A     1     0     0     0     0
denorm      Sign bit cleared   Sign bit set   –A     1     0     0     0     0
zero        zero               zero           zero   0     0     0     0     0
norm        norm               norm           norm   0     0     0     0     0
Table 5-11 shows the floating-point results summary for convert from half-precision.
Table 5-11. Floating-Point Results Summary—Convert from Half-Precision
Operand B   e[v]fscfh   FINV  FOVF  FUNF  FDBZ  FINX
∞           bmax        1     0     0     0     0
NaN         bmax        1     0     0     0     0
denorm      bzero       1     0     0     0     0
zero        bzero       0     0     0     0     0
+norm       _Calc_      0     0     0     0     *
–norm       _Calc_      0     0     0     0     *
Table 5-12 shows the floating-point results summary for convert to half-precision.
Table 5-12. Floating-Point Results Summary—Convert to Half-Precision
Operand B   e[v]fscth   FINV  FOVF  FUNF  FDBZ  FINX
∞           bmaxhp      1     0     0     0     0
NaN         bmaxhp      1     0     0     0     0
denorm      bzero       1     0     0     0     0
zero        bzero       0     0     0     0     0
+norm       _Calc_      0     *     *     0     *
–norm       _Calc_      0     *     *     0     *
5.5
EFPU Instruction Timing
Instruction timing, in processor clock cycles, for EFPU instructions is shown in Table 5-13 and
Table 5-14. Pipelined instructions are shown with their total latency and their throughput in cycles. Divide
instructions are not pipelined and block other instructions from executing during divide execution.
Instruction pipelining in the CPU is affected by the possibility of a floating-point instruction generating an
exception. A load or store class instruction that follows an EFPU instruction stalls until it can be ensured
that no previous instruction can generate a floating-point exception. This determination is based on which
floating-point exception enable bits are set (FINVE, FOVFE, FUNFE, FDBZE, and FINXE) and at what
point in the FPU pipeline an exception can be guaranteed to not occur. Invalid input operands are detected
in the first stage of the pipeline, while underflow, overflow, and inexactness are determined later in the
pipeline. Best overall performance occurs when either floating-point exceptions are disabled, or when load
and store class instructions are scheduled such that previous floating-point instructions have already
resolved the possibility of exceptional results.
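As a rough illustration of how the latency and throughput figures combine, the sketch below estimates the cycle count for a run of independent instructions of one kind. It assumes an idealized pipeline with no operand dependencies and no load/store stalls, which is a simplification of the behavior described above; the function name `efpu_cycles` is hypothetical.

```c
/* Idealized estimate: the first result appears after `latency` cycles and
 * each further independent instruction adds `throughput` cycles.
 * Assumes no dependencies between the n instructions and no stalls. */
static unsigned efpu_cycles(unsigned latency, unsigned throughput, unsigned n)
{
    if (n == 0u)
        return 0u;
    return latency + (n - 1u) * throughput;
}
```

With the values listed in the timing tables, eight independent adds (latency 4, throughput 1) would take about 11 cycles under this model, while two divides (latency 13, throughput 13) would take about 26 cycles because the divider is not pipelined.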
5.5.1
EFPU Single-Precision Vector Floating-Point Instruction Timing
Instruction timing for EFPU vector floating-point instructions is shown in Table 5-13. The table is sorted
by opcode. The number of stall cycles for evfsdiv and evfssqrt is (latency) cycles.
Table 5-13. EFPU Vector Floating-Point Instruction Timing
Instruction  Latency  Throughput   Comments
evfsabs      4        1            —
evfsadd      4        1            —
evfsaddx     4        1            —
evfsaddsub   4        1            —
evfsaddsubx  4        1            —
evfscfh      4        1            —
evfscfsf     4        1            —
evfscfsi     4        1            —
evfscfuf     4        1            —
evfscfui     4        1            —
evfscmpeq    4        1            —
evfscmpgt    4        1            —
evfscmplt    4        1            —
evfscth      4        1            —
evfsctsf     4        1            —
evfsctsi     4        1            —
evfsctsiz    4        1            —
evfsctuf     4        1            —
evfsctui     4        1            —
evfsctuiz    4        1            —
evfsdiff     4        1            —
evfsdiffsum  4        1            —
evfsdiv      13       13           blocking, no overlap with next inst.
evfsmax      4        1            —
evfsmin      4        1            —
evfsmadd     4        1 (note 1)   dest also used as source
evfsmsub     4        1 (note 1)   dest also used as source
evfsmul      4        1            —
evfsmule     4        1            —
evfsmulo     4        1            —
evfsmulx     4        1            —
evfsnabs     4        1            —
evfsneg      4        1            —
evfsnmadd    4        1 (note 1)   dest also used as source
evfsnmsub    4        1 (note 1)   dest also used as source
evfssqrt     15       15           blocking, no overlap with next inst.
evfssub      4        1            —
evfssubx     4        1            —
evfssubadd   4        1            —
evfssubaddx  4        1            —
evfssum      4        1            —
evfssumdiff  4        1            —
evfststeq    4        1            —
evfststgt    4        1            —
evfststlt    4        1            —
Note:
1 Destination register is also a source register, so for full throughput, back-to-back operations must use a different dest reg.
5.5.2
EFPU Single-precision Scalar Floating-Point Instruction Timing
Instruction timing for EFPU single-precision scalar floating-point instructions is shown in Table 5-14. The
table is sorted by opcode.
Table 5-14. EFPU Single-precision Scalar Floating-Point Instruction Timing
Instruction  Latency  Throughput   Comments
efsabs       4        1            —
efsadd       4        1            —
efscfh       4        1            —
efscfsf      4        1            —
efscfsi      4        1            —
efscfuf      4        1            —
efscfui      4        1            —
efscmpeq     4        1            —
efscmpgt     4        1            —
efscmplt     4        1            —
efscth       4        1            —
efsctsf      4        1            —
efsctsi      4        1            —
efsctsiz     4        1            —
efsctuf      4        1            —
efsctui      4        1            —
efsctuiz     4        1            —
efsdiv       13       13           blocking, no execution overlap with next instruction
efsmadd      4        1 (note 1)   dest also used as source
efsmsub      4        1 (note 1)   dest also used as source
efsmax       4        1            —
efsmin       4        1            —
efsmul       4        1            —
efsnabs      4        1            —
efsneg       4        1            —
efsnmadd     4        1 (note 1)   dest also used as source
efsnmsub     4        1 (note 1)   dest also used as source
efssqrt      15       15           blocking, no overlap with next inst.
efssub       4        1            —
efststeq     4        1            —
efststgt     4        1            —
efststlt     4        1            —
Note:
1 Destination register is also a source register, so for full throughput, back-to-back operations must use a different dest reg.
5.6
Instruction Forms and Opcodes
Table 5-15 gives the division of the opcode space for the EFPU instructions. This is the architectural
assignment; not all instructions are implemented in all versions of the CPU.
Table 5-15. Opcode Space Division
Opcode Bits 0–5   Opcode Bits 21–28   Instruction Class
4                 0101 00xx           Embedded vector floating-point instructions
4                 0101 010x           Embedded vector floating-point instructions
4                 0101 0110           Embedded scalar floating-point single-precision instructions
4                 0101 0111           Reserved (Embedded scalar floating-point double-precision instructions) (note 1)
4                 0101 10xx           Embedded scalar floating-point single-precision instructions
4                 0101 11xx           Reserved (Embedded scalar floating-point double-precision instructions) (note 1)
Note:
1 Attempted execution of a defined EFP double-precision instruction results in an unimplemented instruction exception if
MSR[SPE] = 1 or an EFPU unavailable exception if MSR[SPE] = 0.
Table 5-16 shows the embedded vector floating-point instruction opcodes.
Table 5-16. Embedded Vector Floating-Point Instruction Opcodes
             Opcode Bits
Instruction  0–5  6–10     11–15   16–20   21–24   25–31     Comments
evfsadd      4    rD       rA      rB      0101    0000000   —
evfssub      4    rD       rA      rB      0101    0000001   rA – rB
evfsmadd     4    rD       rA      rB      0101    0000010   —
evfsmsub     4    rD       rA      rB      0101    0000011   —
evfsabs      4    rD       rA      00000   0101    0000100   —
evfsnabs     4    rD       rA      00000   0101    0000101   —
evfsneg      4    rD       rA      00000   0101    0000110   —
evfssqrt     4    rD       rA      00000   0101    0000111   —
evfsmul      4    rD       rA      rB      0101    0001000   —
evfsdiv      4    rD       rA      rB      0101    0001001   —
evfsnmadd    4    rD       rA      rB      0101    0001010   —
evfsnmsub    4    rD       rA      rB      0101    0001011   —
evfscmpgt    4    crfD 00  rA      rB      0101    0001100   —
evfscmplt    4    crfD 00  rA      rB      0101    0001101   —
evfscmpeq    4    crfD 00  rA      rB      0101    0001110   —
—            4    —        —       —       0101    0001111   —
evfscfui     4    rD       00000   rB      0101    0010000   —
evfscfsi     4    rD       00000   rB      0101    0010001   —
evfscfh      4    rD       00100   rB      0101    0010001   —
evfscfuf     4    rD       00000   rB      0101    0010010   —
evfscfsf     4    rD       00000   rB      0101    0010011   —
evfsctui     4    rD       00000   rB      0101    0010100   —
evfsctsi     4    rD       00000   rB      0101    0010101   —
evfscth      4    rD       00100   rB      0101    0010101   —
evfsctuf     4    rD       00000   rB      0101    0010110   —
evfsctsf     4    rD       00000   rB      0101    0010111   —
evfsctuiz    4    rD       00000   rB      0101    0011000   —
—            4    —        —       —       0101    0011001   —
evfsctsiz    4    rD       00000   rB      0101    0011010   —
—            4    —        —       —       0101    0011011   —
evfststgt    4    crfD 00  rA      rB      0101    0011100   —
evfststlt    4    crfD 00  rA      rB      0101    0011101   —
evfststeq    4    crfD 00  rA      rB      0101    0011110   —
—            4    —        —       —       0101    0011111   —
evfsmax      4    rD       rA      rB      0101    0100000   —
evfsmin      4    rD       rA      rB      0101    0100001   —
evfsaddsub   4    rD       rA      rB      0101    0100010   —
evfssubadd   4    rD       rA      rB      0101    0100011   rA – rB; rA + rB
evfssum      4    rD       rA      rB      0101    0100100   —
evfsdiff     4    rD       rA      rB      0101    0100101   —
evfssumdiff  4    rD       rA      rB      0101    0100110   —
evfsdiffsum  4    rD       rA      rB      0101    0100111   —
evfsaddx     4    rD       rA      rB      0101    0101000   —
evfssubx     4    rD       rA      rB      0101    0101001   —
evfsaddsubx  4    rD       rA      rB      0101    0101010   —
evfssubaddx  4    rD       rA      rB      0101    0101011   rA – rB; rA + rB
evfsmulx     4    rD       rA      rB      0101    0101100   —
—            4    rD       rA      rB      0101    0101101   —
evfsmule     4    rD       rA      rB      0101    0101110   —
evfsmulo     4    rD       rA      rB      0101    0101111   —
Table 5-17 shows the embedded scalar single-precision floating-point instruction opcodes.
Table 5-17. Embedded Scalar Single-Precision Floating-Point Instruction Opcodes
             Opcode Bits
Instruction  0–5  6–10     11–15   16–20   21–24   25–31     Comments
efsmax       4    rD       rA      rB      0101    0110000   —
efsmin       4    rD       rA      rB      0101    0110001   —
efsadd       4    rD       rA      rB      0101    1000000   —
efssub       4    rD       rA      rB      0101    1000001   rA – rB
efsmadd      4    rD       rA      rB      0101    1000010   —
efsmsub      4    rD       rA      rB      0101    1000011   —
efsabs       4    rD       rA      00000   0101    1000100   —
efsnabs      4    rD       rA      00000   0101    1000101   —
efsneg       4    rD       rA      00000   0101    1000110   —
efssqrt      4    rD       rA      00000   0101    1000111   —
efsmul       4    rD       rA      rB      0101    1001000   —
efsdiv       4    rD       rA      rB      0101    1001001   —
efsnmadd     4    rD       rA      rB      0101    1001010   —
efsnmsub     4    rD       rA      rB      0101    1001011   —
efscmpgt     4    crfD 00  rA      rB      0101    1001100   —
efscmplt     4    crfD 00  rA      rB      0101    1001101   —
efscmpeq     4    crfD 00  rA      rB      0101    1001110   —
efscfd       4    rD       00000   rB      0101    1001111   optional, not implemented
efscfui      4    rD       00000   rB      0101    1010000   —
efscfsi      4    rD       00000   rB      0101    1010001   —
efscfh       4    rD       00100   rB      0101    1010001   —
efscfuf      4    rD       00000   rB      0101    1010010   —
efscfsf      4    rD       00000   rB      0101    1010011   —
efsctui      4    rD       00000   rB      0101    1010100   —
efsctsi      4    rD       00000   rB      0101    1010101   —
efscth       4    rD       00100   rB      0101    1010101   —
efsctuf      4    rD       00000   rB      0101    1010110   —
efsctsf      4    rD       00000   rB      0101    1010111   —
efsctuiz     4    rD       00000   rB      0101    1011000   —
—            4    —        —       —       0101    1011001   —
efsctsiz     4    rD       00000   rB      0101    1011010   —
—            4    —        —       —       0101    1011011   —
efststgt     4    crfD 00  rA      rB      0101    1011100   —
efststlt     4    crfD 00  rA      rB      0101    1011101   —
efststeq     4    crfD 00  rA      rB      0101    1011110   —
—            4    —        —       —       0101    1011111   —
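The field layout used in the opcode tables (bits 0–5, 6–10, 11–15, 16–20, and 21–31, with bit 0 as the most significant bit of the 32-bit instruction word) can be turned into an encoder. The C sketch below is illustrative only: the function name `efpu_encode` and the `XO_*` macro names are hypothetical, and the two extended-opcode values are simply the 21–24 and 25–31 columns for evfsadd and efsadd concatenated into an 11-bit field.

```c
#include <stdint.h>

/* Bits 21-31 (4 + 7 bits) for two instructions, taken from the tables. */
#define XO_EVFSADD 0x280u   /* 0101 0000000 */
#define XO_EFSADD  0x2C0u   /* 0101 1000000 */

/* Assemble a three-register EFPU instruction word.
 * Bit 0 is the MSB, so the field at bits 0-5 sits at shift 26. */
static uint32_t efpu_encode(uint32_t rD, uint32_t rA, uint32_t rB, uint32_t xo)
{
    return (4u << 26)               /* bits 0-5: primary opcode 4 */
         | ((rD & 0x1Fu) << 21)     /* bits 6-10 */
         | ((rA & 0x1Fu) << 16)     /* bits 11-15 */
         | ((rB & 0x1Fu) << 11)     /* bits 16-20 */
         | (xo & 0x7FFu);           /* bits 21-31 */
}
```

For example, under this layout `efpu_encode(3, 4, 5, XO_EFSADD)` yields 0x10642AC0. Compare and convert instructions place different content in bits 6–10 or 11–15 (crfD, or a fixed 00000/00100 pattern), so they would need their own wrappers.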
Chapter 6
Signal Processing Extension (SPE)
This chapter provides an overview of the signal processing engine, version 2.1, which is designed to
accelerate signal processing applications normally suited to DSP operation. This is accomplished using
short vectors (two, four, or eight elements) within 64-bit GPRs and using single instruction multiple data
(SIMD) operations to perform the requisite computations. SPE2.1 also architects an accumulator register
to allow for certain back-to-back operations without loop unrolling. SPE2.1 is fully backward compatible
with the original SPE. The remainder of this document uses the term SPE to refer to version 2.1 unless
otherwise noted.
6.1
Nomenclature and Conventions
Several conventions regarding nomenclature are used in this chapter:
• Due to historical precedent, the terms SPE and SIMD are sometimes used interchangeably.
• The signal processing engine is abbreviated as SPE.
• All register bit numbering is 64-bit with bit 0 being the most significant bit. Registers that are only
32-bit define bit 32 as the most significant bit. For both 32-bit and 64-bit registers, bit 63 is the least
significant bit.
• Bits 0–31 of a 64-bit register are referenced as word 0, upper word, even word, or high word
element of the register. Bits 32–63 are referred to as word 1, lower word, odd word, or low word
element of the register. Each word is an element of a 64-bit GPR.
• Bits 0–15 of a 64-bit register are referenced as half word 0. Bits 16–31 are referred to as
half word 1. Bits 32–47 are referenced as half word 2. Bits 48–63 are referred to as half word 3.
Each half word is an element of a 64-bit GPR.
• Bits 0–7 of a 64-bit register are referenced as byte 0. Bits 8–15 are referred to as byte 1. Bits 16–23
are referenced as byte 2. Bits 24–31 are referred to as byte 3. Bits 32–39 are referred to as byte 4.
Bits 40–47 are referenced as byte 5. Bits 48–55 are referred to as byte 6. Bits 56–63 are referenced
as byte 7. Each byte is an element of a 64-bit GPR.
• Bits 0–15 and bits 32–47 are referenced as even half words. Bits 16–31 and bits 48–63 are
referenced as odd half words. Bits 0–15 and bits 16–31 are referenced as upper half words. Bits
32–47 and bits 48–63 are referenced as lower half words.
• Mnemonics for SPE instructions generally begin with the letters ‘ev’ (embedded vector).
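The element numbering above counts from the most significant end of the register, which is the reverse of the usual C shift convention. The sketch below shows one way to extract word, half word, and byte elements from a 64-bit value under that numbering; the helper names are hypothetical, and callers are assumed to pass an in-range element index.

```c
#include <stdint.h>

/* Word i (0 = upper/even, 1 = lower/odd) of a 64-bit GPR image. */
static uint32_t gpr_word(uint64_t gpr, unsigned i)
{
    return (uint32_t)(gpr >> (32u * (1u - i)));
}

/* Half word i (0..3), counted from the most significant end. */
static uint16_t gpr_half(uint64_t gpr, unsigned i)
{
    return (uint16_t)(gpr >> (16u * (3u - i)));
}

/* Byte i (0..7), counted from the most significant end. */
static uint8_t gpr_byte(uint64_t gpr, unsigned i)
{
    return (uint8_t)(gpr >> (8u * (7u - i)));
}
```

For the value 0x0011223344556677, word 1 is 0x44556677, half word 1 is 0x2233, and byte 5 is 0x55.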
Table 6-1 shows RTL conventions that are used in this chapter.
Table 6-1. RTL Notation
Notation   Meaning
sf         Signed fractional multiplication.
           Result of multiplying 2 quantities having bit lengths x and y taking the least significant x + y – 1 bits of
           the product and concatenating a 0 to the least significant bit forming a signed fractional result of x + y
           bits.
si         Signed integer multiplication
su, sui    Signed by Unsigned multiplication (same for int and frac)
ui         Unsigned integer multiplication
<<         Logical shift left. x << y shifts value x left by y bits, leaving zeros in the vacated bits.
>>         Logical shift right. x >> y shifts value x right by y bits, leaving zeros in the vacated bits.
6.2
SPE Programming Model
The e200z760n3 core provides a register file with thirty-two 64-bit registers. Embedded category
instructions of the Power ISA operate on the lower (least significant) 32 bits of the 64-bit register. SPE
instructions generally take elements from each source register and operate on them with the corresponding
elements of a second source register (and/or the accumulator) to produce results. Results are placed in the
destination register and/or the accumulator. Vector instructions (that is, instructions that produce results
of more than one element) provide results for each element that are independent of the computation of the
other elements. These instructions can also be used to perform scalar DSP operations by ignoring the
results of the upper 32-bit half of the register file.
SPE compare instructions and set instructions with record store the comparison result into the condition
register (CR). The meaning of the CR bits is overloaded for SPE operations. SPE compare
instructions specify a CR field, two source registers, and the type of compare: greater than, less than, or
equal. Two bits of the CR field are written with the result of the vector compare: one for each of the high
and low 32-bit elements of the result. The remaining two bits reflect the ANDing and ORing of the vector
compare results. An additional set of compare instructions (evsetxx[.]) returns a set of Boolean values into a
destination register, allowing subsequent predicated computational operations, such as a select
operation, to be performed.
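The CR field update for a vector compare can be sketched in C as follows. This is a host-side model with hypothetical names (`spe_cmp_cr_field`, `model_vector_cmpgts`); the ordering of the four bits within the field (high result, low result, OR, AND, from most to least significant) is an assumption based on the general SPE definition and is not stated in the paragraph above.

```c
#include <stdint.h>

/* Pack the two element results into a 4-bit CR field.
 * Assumed order, most significant first: high, low, high|low, high&low. */
static unsigned spe_cmp_cr_field(int high_result, int low_result)
{
    unsigned ch = high_result ? 1u : 0u;
    unsigned cl = low_result ? 1u : 0u;
    return (ch << 3) | (cl << 2) | ((ch | cl) << 1) | (ch & cl);
}

/* Signed greater-than compare of both 32-bit elements of two 64-bit values. */
static unsigned model_vector_cmpgts(uint64_t a, uint64_t b)
{
    int32_t ah = (int32_t)(uint32_t)(a >> 32);
    int32_t al = (int32_t)(uint32_t)a;
    int32_t bh = (int32_t)(uint32_t)(b >> 32);
    int32_t bl = (int32_t)(uint32_t)b;
    return spe_cmp_cr_field(ah > bh, al > bl);
}
```

A branch on the OR bit then means "any element passed", and a branch on the AND bit means "all elements passed", which is what makes the two summary bits useful for loop exits.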
A partially visible accumulator register is architected for the SPE integer and fractional multiply
accumulate forms of instructions. Its usage is described in Section 6.2.2, “Accumulator Register.”
6.2.1
GPR Registers
The SPE requires a GPR register file with thirty-two 64-bit registers. For 32-bit implementations, the
embedded category Power ISA instructions that normally operate on a 32-bit register access and
change only the least significant 32 bits of the GPRs and leave the most significant 32 bits unchanged.
SPE instructions view the 64-bit register as being composed of a vector of elements, each of which is 32
bits, 16 bits, or 8 bits wide.
Nomenclature is as follows:
• The most significant 32 bits are called word 0 (W0), the upper word, high word or even word.
• The least significant 32 bits are called word 1 (W1), the lower word, low word or odd word.
• Half word elements are called half word 0, 1, 2, or 3, from most significant to least significant.
• Byte elements are called byte 0, 1, 2, 3, 4, 5, 6, or 7, from most significant to least significant.
Unless otherwise specified, SPE instructions write all 64 bits of the destination register.
Figure 6-1 shows vector storage in GPRs.
Bits          0–7     8–15    16–23   24–31   32–39   40–47   48–55   56–63
GPR           Double word (bits 0–63)
GPR           Upper word (word 0)             Lower word (word 1)
GPR           half word 0     half word 1     half word 2     half word 3
GPR           byte 0  byte 1  byte 2  byte 3  byte 4  byte 5  byte 6  byte 7
Figure 6-1. Vector Storage in GPRs
6.2.2
Accumulator Register
The accumulator is a 64-bit register that allows the back-to-back execution of dependent MAC and dot
product instructions, as is commonly found in the inner loops of DSP code such as FIR filters and FFTs.
The accumulator is partially visible to the programmer in that its results do not have to be explicitly read
to use them. Instead, they are always copied into a 64-bit destination GPR that is specified as part of the
instruction. However, the accumulator has to be explicitly initialized when starting a new accumulation
loop.
The accumulator is used for the following kinds of instructions:
• Certain integer/fractional accumulation
• Multiply accumulate (MAC)
• Dot product
• Summation forms
Based upon the type of instruction, the accumulator can hold either a single 64-bit value or a vector of two
32-bit elements, a vector of four 16-bit elements, or vector of eight 8-bit elements. In addition, for certain
instructions, the accumulator can be updated along with the destination register.
Figure 6-2 shows accumulator storage.
Bits          0–7     8–15    16–23   24–31   32–39   40–47   48–55   56–63
ACC           Double word (bits 0–63)
ACC           Upper word                      Lower word
ACC           half word 0     half word 1     half word 2     half word 3
ACC           byte 0  byte 1  byte 2  byte 3  byte 4  byte 5  byte 6  byte 7
Figure 6-2. Accumulator Storage
An example of a MAC instruction is evmhossfaaw rD,rA,rB. In this instruction, the least significant 16
bits of rA and rB are multiplied for both elements of the vector; the result is shifted left one bit and added
to the accumulator; and the result is possibly saturated to 32 bits in case of overflow. The final result is
placed both in the accumulator and also in rD. Therefore, the result of this instruction can be used by
accessing rD.
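The steps in this description (multiply the low 16 bits of each 32-bit element, shift left one bit, add to the accumulator, saturate to 32 bits, write both rD and the accumulator) can be sketched as a host-side C model. This is a simplified illustration and not the architectural definition: all names are hypothetical, and the handling of the 0x8000 × 0x8000 product (clamped to 0x7FFF_FFFF before the add) is an assumption.

```c
#include <stdint.h>

/* Clamp a 64-bit intermediate to the signed 32-bit range, noting overflow. */
static int32_t sat32(int64_t v, int *ov)
{
    if (v > INT32_MAX) { *ov = 1; return INT32_MAX; }
    if (v < INT32_MIN) { *ov = 1; return INT32_MIN; }
    return (int32_t)v;
}

/* One 32-bit element: signed fractional multiply of the low half words,
 * then a saturating add into the matching accumulator word. */
static int32_t mac_element(int32_t acc, uint32_t a_elem, uint32_t b_elem, int *ov)
{
    int16_t a = (int16_t)(a_elem & 0xFFFFu);
    int16_t b = (int16_t)(b_elem & 0xFFFFu);
    int64_t prod = ((int64_t)a * (int64_t)b) * 2;   /* shift left one bit */
    if (prod > INT32_MAX) {     /* only reachable for 0x8000 * 0x8000 */
        prod = INT32_MAX;       /* assumption: product itself saturates */
        *ov = 1;
    }
    return sat32((int64_t)acc + prod, ov);
}

typedef struct {
    uint64_t rd;    /* value written to rD */
    uint64_t acc;   /* value written to the accumulator */
    int ovh;        /* overflow seen in the high element */
    int ov;         /* overflow seen in the low element */
} mac_result;

static mac_result model_evmhossfaaw(uint64_t acc, uint64_t ra, uint64_t rb)
{
    mac_result r = { 0u, 0u, 0, 0 };
    int32_t hi = mac_element((int32_t)(uint32_t)(acc >> 32),
                             (uint32_t)(ra >> 32), (uint32_t)(rb >> 32), &r.ovh);
    int32_t lo = mac_element((int32_t)(uint32_t)acc,
                             (uint32_t)ra, (uint32_t)rb, &r.ov);
    r.rd = ((uint64_t)(uint32_t)hi << 32) | (uint32_t)lo;
    r.acc = r.rd;   /* the same result goes to both destinations */
    return r;
}
```

With both low half words equal to 0x4000 (0.5 as a signed fraction) and a zero accumulator, each element of the result is 0x2000_0000 (0.25).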
To read the accumulator contents into a register, the evmar instruction is used. To initialize the
accumulator, the evmra instruction or another instruction targeting the accumulator such as evsplatixxa is
used.
6.2.3
SPE Status and Control Register (SPEFSCR)
The e200z760n3 core implements the SPEFSCR register for status reporting and control of SPE
instructions. This register is also used by the embedded floating-point units. Status and control bits
are shared for floating-point operations and SPE operations. The SPEFSCR register is implemented
as special purpose register (SPR) number 512 and is read and written by the mfspr and mtspr instructions
in both user and supervisor mode. SPE instructions affect both the high element status flags (bits 0–1)
and the low element status flags (bits 16–17).
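Because the register is documented with bit 0 as the most significant bit, C masks for it need a small conversion. The sketch below defines masks for the four integer overflow flags at bits 0–1 and 16–17 (SOVH, OVH, SOV, and OV) and two helpers that operate on a register image held in a variable. The macro and function names are hypothetical, and the actual register access through mfspr and mtspr with SPR 512 is deliberately left out because it depends on the toolchain.

```c
#include <stdint.h>

/* Convert an MSB-first bit number (0..31) into a C mask. */
#define SPEFSCR_BIT(n)  (UINT32_C(1) << (31 - (n)))

#define SPEFSCR_SOVH    SPEFSCR_BIT(0)    /* summary overflow, high element */
#define SPEFSCR_OVH     SPEFSCR_BIT(1)    /* overflow, high element */
#define SPEFSCR_SOV     SPEFSCR_BIT(16)   /* summary overflow, low element */
#define SPEFSCR_OV      SPEFSCR_BIT(17)   /* overflow, low element */

/* Return the register image with all four overflow flags cleared. */
static uint32_t spefscr_clear_overflow(uint32_t value)
{
    return value & ~(SPEFSCR_SOVH | SPEFSCR_OVH | SPEFSCR_SOV | SPEFSCR_OV);
}

/* Nonzero if either sticky (summary) overflow flag is set. */
static int spefscr_any_sticky_overflow(uint32_t value)
{
    return (value & (SPEFSCR_SOVH | SPEFSCR_SOV)) != 0u;
}
```

A typical pattern would be to clear the flags before a block of saturating arithmetic and test the summary bits once afterwards, since the summary bits stay set until software clears them.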
Figure 6-3 shows the SPEFSCR.
SPR 512                                                          Access: Read/Write
Bit     0     1     2     3     4      5      6      7      8   9   10     11     12     13     14     15
Field   SOVH  OVH   FGH   FXH   FINVH  FDBZH  FUNFH  FOVFH  —   RM  FINXS  FINVS  FDBZS  FUNFS  FOVFS  MODE
Reset   All zeros
Bit     16    17    18    19    20     21     22     23     24  25     26     27     28     29     30–31
Field   SOV   OV    FG    FX    FINV   FDBZ   FUNF   FOVF   —   FINXE  FINVE  FDBZE  FUNFE  FOVFE  FRMC
Reset   All zeros
Figure 6-3. SPE/EFPU Status and Control Register (SPEFSCR)
Table 6-2 describes the SPEFSCR bits.
Table 6-2. SPE Status and Control Register
Bits            Name    Description
0 (32)          SOVH    Summary Integer Overflow High
                        The SOVH bit is set to 1 whenever an instruction sets OVH. The SOVH bit remains set until it is
                        cleared by an mtspr instruction specifying the SPEFSCR register.
1 (33)          OVH     Integer Overflow High
                        The OVH bit is set to 1 whenever an integer or fractional SPE instruction signals an overflow in the
                        upper half of the result.
2 (34)          FGH     Embedded Floating-Point Guard bit High
                        Defined by Embedded Floating-Point APUs.
3 (35)          FXH     Embedded Floating-Point Inexact bit High
                        Defined by Embedded Floating-Point APUs.
4 (36)          FINVH   Embedded Floating-Point Invalid Operation/Input error High
                        Defined by Embedded Floating-Point APUs.
5 (37)          FDBZH   Embedded Floating-Point Divide by Zero High
                        Defined by Embedded Floating-Point APUs.
6 (38)          FUNFH   Embedded Floating-Point Underflow High
                        Defined by Embedded Floating-Point APUs.
7 (39)          FOVFH   Embedded Floating-Point Overflow High
                        Defined by Embedded Floating-Point APUs.
8 (40)          —       Reserved
9 (41)          RM      Rounding Mode - Fixed Point
                        0 Normal Rounding (Biased-rounding), rounding performed by adding 1/2 lsb
                        1 Round to Nearest Even Rounding (convergent rounding), round to nearest even value
10 (42)         FINXS   Embedded Floating-Point Inexact Sticky Flag
                        Defined by Embedded Floating-Point APUs.
11 (43)         FINVS   Embedded Floating-Point Invalid Operation Sticky Flag
                        Defined by Embedded Floating-Point APUs.
12 (44)         FDBZS   Embedded Floating-Point Divide by Zero Sticky Flag
                        Defined by Embedded Floating-Point APUs.
13 (45)         FUNFS   Embedded Floating-Point Underflow Sticky Flag
                        Defined by Embedded Floating-Point APUs.
14 (46)         FOVFS   Embedded Floating-Point Overflow Sticky Flag
                        Defined by Embedded Floating-Point APUs.
15 (47)         MODE    Embedded Floating-Point Operating Mode
                        Defined by Embedded Floating-Point APUs.
16 (48)         SOV     Summary Integer Overflow
                        The SOV bit is set to 1 whenever an instruction sets OV. The SOV bit remains set until it is cleared by
                        an mtspr instruction specifying the SPEFSCR register.
17 (49)         OV      Integer Overflow
                        The OV bit is set to 1 whenever an integer or fractional SPE instruction signals an overflow in the low
                        element result.
18 (50)         FG      Embedded Floating-Point Guard bit (low/scalar)
                        Defined by Embedded Floating-Point APUs.
19 (51)         FX      Embedded Floating-Point Inexact bit (low/scalar)
                        Defined by Embedded Floating-Point APUs.
20 (52)         FINV    Embedded Floating-Point Invalid Operation / Input error (low/scalar)
                        Defined by Embedded Floating-Point APUs.
21 (53)         FDBZ    Embedded Floating-Point Divide by Zero (low/scalar)
                        Defined by Embedded Floating-Point APUs.
22 (54)         FUNF    Embedded Floating-Point Underflow (low/scalar)
                        Defined by Embedded Floating-Point APUs.
23 (55)         FOVF    Embedded Floating-Point Overflow (low/scalar)
                        Defined by Embedded Floating-Point APUs.
24 (56)         —       Reserved
25 (57)         FINXE   Embedded Floating-Point Round (Inexact) Exception Enable
                        Defined by Embedded Floating-Point APUs.
26 (58)         FINVE   Embedded Floating-Point Invalid Operation / Input Error Exception Enable
                        Defined by Embedded Floating-Point APUs.
27 (59)         FDBZE   Embedded Floating-Point Divide by Zero Exception Enable
                        Defined by Embedded Floating-Point APUs.
28 (60)         FUNFE   Embedded Floating-Point Underflow Exception Enable
                        Defined by Embedded Floating-Point APUs.
29 (61)         FOVFE   Embedded Floating-Point Overflow Exception Enable
                        Defined by Embedded Floating-Point APUs.
30–31 (62–63)   FRMC    Embedded Floating-Point Rounding Mode Control
                        Defined by Embedded Floating-Point APUs.
6.2.3.1
Context Switch
When a context switch occurs, the OS process must explicitly save the accumulator as part of the context
of the swapped-out task and then explicitly load the accumulator from the context of the new task that is
being swapped in. When the old task is restarted, its accumulator must be restored before restarting the
task.
6.2.4
GPRs and Power ISA Instructions
The e200z760n3 core implements the 32-bit forms of the embedded category instructions in the Power
ISA. All 32-bit Power ISA instructions operate upon the lower half of the 64-bit GPR. These instructions
do not affect the upper half of a GPR.
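The effect of a 32-bit instruction on a 64-bit GPR can be modeled as a merge that replaces only the low word. The following one-function C sketch illustrates this; the name `gpr_write32` is hypothetical.

```c
#include <stdint.h>

/* Model a 32-bit result write: the upper half of the GPR is preserved. */
static uint64_t gpr_write32(uint64_t gpr, uint32_t result)
{
    return (gpr & 0xFFFFFFFF00000000ULL) | (uint64_t)result;
}
```

One practical consequence is that stale data can remain in the upper half of a register after scalar code runs, so vector code should not assume the upper word is zero unless it has written it.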
6.2.5
SPE Available Bit in MSR
MSR[SPE] is defined as the SPE available bit. If this bit is clear and software attempts to execute any of
the SPE instructions other than the brinc instruction (which does not affect the upper 32-bits of a GPR),
the SPE unavailable exception is taken. If this bit is set, software can execute any of the SPE instructions.
6.2.6
SPE Exception Bit in ESR
ESR[SPE] is defined as the SPE exception bit. This bit is set whenever the processor takes an exception
related to the execution of the SPE instructions.
6.2.7
Data Formats
The SPE provides two different data formats, integer and fractional. Integer data formats can be treated as
signed or unsigned quantities. Fractional data formats are usually treated as signed quantities.
6.2.7.1
Integer Format
Integer data format is the same as what is conventionally used in computing.
Unsigned integers consist of 8, 16, 32, or 64-bit binary integer values. The largest representable value is
$2^n - 1$, where n represents the number of bits in the value. The smallest representable value is 0. Certain
computations that produce values larger than $2^n - 1$ or smaller than 0 set OV or OVH in the SPEFSCR.
Signed integers consist of 8, 16, 32, or 64-bit binary values in twos-complement form. The largest
representable value is $2^{n-1} - 1$, where n represents the number of bits in the value. The smallest
representable value is $-2^{n-1}$. Certain computations that produce values larger than $2^{n-1} - 1$ or smaller than
$-2^{n-1}$ set OV or OVH in the SPEFSCR.
6.2.7.2
Fractional Format
Fractional data format is the same as that conventionally used for DSP fractional arithmetic. Fractional data
is useful for representing data converted from analog devices.
Unsigned fractions consist of 16-, 32-, or 64-bit binary fractional values that range from 0 to less than 1.
Unsigned fractions place the binary point immediately to the left of the most significant bit. The most
significant bit of the value represents the value $2^{-1}$, the next most significant bit represents the value $2^{-2}$,
and so on. The largest representable value is $1 - 2^{-n}$, where n represents the number of bits in the value. The
smallest representable value is 0. Certain computations that produce values larger than $1 - 2^{-n}$ or smaller
than 0 set OV or OVH in the SPEFSCR. SPE does not contain explicit instructions that manipulate
unsigned fractional data. Unsigned integer forms produce the same bit-exact results as unsigned fractional
forms would; therefore, unsigned fractional instruction forms are not defined for SPE.
Signed fractions consist of 16-, 32-, or 64-bit binary fractional values in twos-complement form that range
from –1 to less than 1. Signed fractions in 1.31 or 1.63 format place the binary point immediately to the
right of the most significant bit. The largest representable value is $1 - 2^{-(n-1)}$, where n represents the number
of bits in the value. The smallest representable value is –1. Certain computations that produce values larger
than $1 - 2^{-(n-1)}$ or smaller than –1 set OV or OVH in the SPEFSCR. Multiplication of two signed fractional
values causes the result to be shifted left one bit to remove the resultant redundant sign bit in the product.
In this case, a 0 bit is concatenated as the least significant bit of the shifted result.
Guarded fractional representations are also available in 33.31 format and in 17.47 format for a subset of
operations, providing additional integer (guard) bits above the binary point.
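The signed fractional interpretation and the one-bit left shift after multiplication can be checked with a short Python sketch. The helper names (`to_signed`, `frac_to_float`, `frac_mul_16`) are hypothetical, and the multiply shown is a modulo (non-saturating) model of a 1.15 × 1.15 product producing a 1.31 result; it is not a description of any specific SPE instruction.

```python
def to_signed(value: int, bits: int) -> int:
    """Reinterpret an n-bit pattern as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def frac_to_float(value: int, bits: int) -> float:
    """Read an n-bit pattern as a signed fraction in the range [-1, 1)."""
    return to_signed(value, bits) / (1 << (bits - 1))

def frac_mul_16(a: int, b: int) -> int:
    """1.15 x 1.15 -> 1.31 product, modulo form.

    The raw 32-bit product carries two sign bits; shifting left by one
    removes the redundant one and appends a 0 as the new least significant bit.
    """
    product = to_signed(a, 16) * to_signed(b, 16)
    return (product << 1) & 0xFFFFFFFF

half = 0x4000                           # 0.5 in 1.15 format
quarter = frac_mul_16(half, half)       # 0x20000000, which reads as 0.25 in 1.31
overflow = frac_mul_16(0x8000, 0x8000)  # (-1) x (-1) = +1 is not representable
```

The last line shows the one case in which the shifted product overflows: the modulo result 0x80000000 reads as –1, which is the situation a saturating form would clamp to the largest representable value.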
6.2.8
Computational Operations
SPE supports several different computational capabilities. Both modulo and saturation results can be
produced. Modulo results truncate the overflow bits of a calculation. Saturation results substitute the
maximum or minimum representable value (for the data type) on overflow or underflow, respectively.
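The difference between the two result types can be seen in a minimal sketch for signed integer addition. The function names are illustrative and the model ignores the SPEFSCR overflow flags.

```python
def add_modulo_signed(a: int, b: int, bits: int) -> int:
    """Signed add with modulo behavior: overflow bits are truncated (wraps around)."""
    lo = -(1 << (bits - 1))
    return ((a + b - lo) % (1 << bits)) + lo

def add_saturate_signed(a: int, b: int, bits: int) -> int:
    """Signed add with saturation: clamp to the representable range."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, a + b))

wrapped = add_modulo_signed(32767, 1, 16)    # wraps to the most negative value
clamped = add_saturate_signed(32767, 1, 16)  # stays at the largest value
```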
Instructions are provided for a wide range of computational capability. The operation types can be divided
into several basic categories:
• Simple vector instructions. These instructions use the corresponding elements of the operands to
produce a vector result that is placed in the destination register, the accumulator, or both.
[Figure: the same operation is applied to the upper elements (bits 0–31) and to the lower elements
(bits 32–63) of rA and rB; each result is placed in the corresponding element of rD.]
— Arithmetic, logical, shift, and rotate of vector elements
— Averaging, summation, rounding, min, max, sum of absolute differences, absolute differences, and
saturation operations
— Vector permutation, packing, unpacking, merge, swap, extraction, interleave, and de-interleave
operations
• Multiply and accumulate instructions. These instructions perform multiply operations, optionally
add the result to the accumulator and place the result into the destination register and optionally
into the accumulator. These instructions are composed of different multiply forms, data formats
and data accumulate options.
• Dot product instructions. These instructions perform multiple multiply operations, optionally add
the results to the accumulator, and place the result into the destination register and optionally into
the accumulator. These instructions are composed of different forms, data formats and data
accumulate options.
• Load and store instructions. These instructions provide load and store capabilities for moving data
to and from memory. A variety of forms are provided that position data for efficient computation.
• Compare instructions and set instructions.
• Miscellaneous instructions. These instructions perform miscellaneous functions such as field
manipulation, bit-reversed and circular incrementing, count leading, and more.
6.2.8.1
Simple Vector Arithmetic Instructions
Simple vector arithmetic instructions are outlined in Table 6-3.
Table 6-3. Simple Vector Arithmetic Instructions
Basic Operation | Variants | Description | ACC?
Absolute Value | evabsb, evabsh, evabs, evabsd | absolute value byte, half word, word, double word elements | —
Absolute Value | evabsbs, evabshs, evabss, evabsds | abs b, h, w, d with saturation | —
Table 6-3. Simple Vector Arithmetic Instructions (continued)
Basic Operation | Variants | Description | ACC?
Absolute Difference | evabsdifsb, evabsdifsh, evabsdifsw, evabsdifub, evabsdifuh, evabsdifuw | absolute difference signed/unsigned byte, half word, word elements | —
Add | evaddb, evaddh, evaddw, evaddd | add byte, half word, word, double word elements | —
Add | evaddbss, evaddhss, evaddwss, evadddss, evaddbus, evaddhus, evaddwus, evadddus | add byte, half word, word, double word elements with signed or unsigned saturation | —
Add | evaddhx, evaddhxss, evaddhxus | add exchanged half word elements with optional signed or unsigned saturation. The even and odd half word elements of operand rA are pairwise exchanged before adding | —
Add | evaddwx, evaddwxss, evaddwxus | add exchanged word elements with optional signed or unsigned saturation. The high and low word elements of operand rA are exchanged before adding | —
Add | evaddib, evaddih, evaddiw | add unsigned imm value UIMM to all elements | —
Add | evaddsmiaaw, evaddssiaaw, evaddumiaaw, evaddusiaaw | add word elements from rA and Accumulator using signed/unsigned modulo/saturation operations, results into rD and Accumulator | Y
Add | evaddsmiaa, evaddssiaa, evaddusiaa | add 64-bit value in rA and Accumulator with optional signed/unsigned saturation, result into rD and Accumulator | Y
AddSubf | evadd2subf2h, evadd2subf2hss | add for upper 2 half word elements, subf for lower 2 elements, with optional signed saturation. | —
AddSubf | evaddsubfh, evaddsubfhss | add for even half word elements, subf for odd elements, with optional signed saturation. | —
AddSubf | evaddsubfhx, evaddsubfhxss | The even and odd half word elements of operand rA are pairwise exchanged and then the resulting even elements are added and the odd elements are subtracted to/from elements in rB, with optional signed saturation. | —
AddSubf | evaddsubfw, evaddsubfwss | The high word element of rA is added and the low word element of rA is subtracted to/from the corresponding element of rB, with optional signed saturation. | —
AddSubf | evaddsubfwx, evaddsubfwxss | The word elements of rA are exchanged and then the resulting high word element is added and low word element is subtracted to/from word elements of rB, with optional signed saturation. | —
Average | evavgbs, evavghs, evavgws, evavgds, evavgbsr, evavghsr, evavgwsr, evavgdsr, evavgbu, evavghu, evavgwu, evavgdu, evavgbur, evavghur, evavgwur, evavgdur | compute the average of corresponding elements in rA and rB, signed/unsigned with optional rounding | —
Count Leading | evcntlsh, evcntlzh | count leading sign/zero bits in each half word | —
Count Leading | evcntlsw, evcntlzw | count leading sign/zero bits in each word | —
Table 6-3. Simple Vector Arithmetic Instructions (continued)
Basic Operation | Variants | Description | ACC?
Divide | evdivws, evdivwu | 32 / 32 → 32 signed, unsigned integer | —
Divide | evdivwsf, evdivwuf | 32 / 32 → 32 signed, unsigned fractional | —
Divide | evdivs, evdivu | 64 / 64 → 64, signed, unsigned | —
Extend | evextsb, evextzb | the low byte of each word element in rA is sign/zero extended to a word and placed into rD | —
Extend | evextsbh | the odd bytes of rA are sign extended to half words and placed into rD | —
Extend | evextsh, evextzh (use evclrh) | the odd half word elements in rA are sign/zero extended to a word and placed into rD | —
Extend | evextsw | the low word element in rA is sign extended to 64-bits and placed into rD | —
Maximum | evmaxbs, evmaxhs, evmaxws, evmaxds | maximum of elements in rA signed; b, h, w, d | —
Maximum | evmaxbu, evmaxhu, evmaxwu, evmaxdu | maximum of elements in rA unsigned; b, h, w, d | —
Maximum | evmaxbpsh, evmaxbpuh | pairwise maximum of bytes in rA extended to half word, signed/unsigned | —
Maximum | evmaxhpsw, evmaxhpuw | pairwise maximum of half words in rA extended to word, signed/unsigned | —
Maximum | evmaxwpsd, evmaxwpud | pairwise maximum of words in rA extended to double word, signed/unsigned | —
Maximum Magnitude | evmaxmagws | pairwise maximum of magnitude values of signed words in rA | —
Minimum | evminbs, evminhs, evminws, evminds | minimum of elements in rA signed; b, h, w, d | —
Minimum | evminbu, evminhu, evminwu, evmindu | minimum of elements in rA unsigned; b, h, w, d | —
Minimum | evminbpsh, evminbpuh | pairwise minimum of bytes in rA extended to half word, signed/unsigned | —
Minimum | evminhpsw, evminhpuw | pairwise minimum of half words in rA extended to word, signed/unsigned | —
Minimum | evminwpsd, evminwpud | pairwise minimum of words in rA extended to double word, signed/unsigned | —
Negate | evnegb, evnegh, evneg, evnegd | negate signed elements in rA; b, h, w, d | —
Negate | evnegbs, evneghs, evnegs, evnegds | negate signed elements in rA with saturation; b, h, w, d | —
Negate | evnegbo, evnegho, evnegwo | negate signed odd elements in rA; b, h, w | —
Negate | evnegbos, evneghos, evnegwos | negate signed odd elements in rA with saturation; b, h, w | —
Table 6-3. Simple Vector Arithmetic Instructions (continued)
Basic Operation | Variants | Description | ACC?
Round | evrndhb, evrndhbss, evrndhbus | The four half word elements of rA are rounded into 8-bits and placed into the even bytes of rD with optional signed or unsigned saturation | —
Round | evrndhnb, evrndhnbss, evrndhnbus | The four half word elements of rA are rounded into 8-bits using round to nearest even rounding and placed into the even bytes of rD with optional signed or unsigned saturation | —
Round | evrndwh, evrndwhss, evrndwhus | The two word elements of rA are rounded into 16-bits and placed into the even half words of rD with optional signed or unsigned saturation | —
Round | evrndwnh, evrndwnhss, evrndwnhus | The two word elements of rA are rounded into 16-bits using round to nearest even rounding and placed into the even half words of rD with optional signed or unsigned saturation | —
Round | evrnddw, evrnddwss, evrnddwus | The double word element of rA is rounded into 32-bits and placed into the high word of rD with optional signed or unsigned saturation. The low word is cleared. | —
Round | evrndndw, evrndndwss, evrndndwus | The double word element of rA is rounded into 32-bits using round to nearest even rounding and placed into the high word of rD with optional signed or unsigned saturation. The low word is cleared. | —
Sum of Absolute Differences | evsad2sh, evsad2sha, evsad2shaaw | Sums of pairs of absolute differences of 2 signed half words, optionally loading the Accumulator, or accumulating with the Accumulator values | opt.
Sum of Absolute Differences | evsad2uh, evsad2uha, evsad2uhaaw | Sums of pairs of absolute differences of 2 unsigned half words, optionally loading the Accumulator, or accumulating with the Accumulator values | opt.
Sum of Absolute Differences | evsad4sb, evsad4sba, evsad4sbaaw | Sums of four absolute differences of 2 signed bytes, optionally loading the Accumulator, or accumulating with the Accumulator values | opt.
Sum of Absolute Differences | evsad4ub, evsad4uba, evsad4ubaaw | Sums of four absolute differences of 2 unsigned bytes, optionally loading the Accumulator, or accumulating with the Accumulator values | opt.
Sum of Absolute Differences | evsadsw, evsadswa, evsadswaa | Sum of pair of absolute differences of 2 signed words, optionally loading the Accumulator, or accumulating with the Accumulator value | opt.
Sum of Absolute Differences | evsaduw, evsaduwa, evsaduwaa | Sum of pair of absolute differences of 2 unsigned words, optionally loading the Accumulator, or accumulating with the Accumulator value | opt.
Table 6-3. Simple Vector Arithmetic Instructions (continued)
Basic Operation | Variants | Description | ACC?
Saturate | evsatsbub | Saturate signed byte to unsigned byte range | —
Saturate | evsatubsb | Saturate unsigned byte to signed byte range | —
Saturate | evsatsdsw, evsatsduw | Saturate signed double word to signed or unsigned word range | —
Saturate | evsatuduw | Saturate unsigned double word to unsigned word range | —
Saturate | evsatshsb, evsatshub | Saturate signed half word to signed or unsigned byte range | —
Saturate | evsatshuh | Saturate signed half word to unsigned half word range | —
Saturate | evsatuhub | Saturate unsigned half word to unsigned byte range | —
Saturate | evsatuhsh | Saturate unsigned half word to signed half word range | —
Saturate | evsatswgsdf | Saturate signed word guarded (17.47) to signed double word fractional (1.63) range | —
Saturate | evsatswsh, evsatswuh | Saturate signed word to signed or unsigned half word range | —
Saturate | evsatswuw | Saturate signed word to unsigned word range | —
Saturate | evsatuwuh | Saturate unsigned word to unsigned half word range | —
Saturate | evsatuwsw | Saturate unsigned word to signed word range | —
Subf | evsubfb, evsubfh, evsubfw, evsubfd | subtract byte, half word, word, double word elements | —
Subf | evsubfbss, evsubfhss, evsubfwss, evsubfdss, evsubfbus, evsubfhus, evsubfwus, evsubfdus | subtract byte, half word, word, double word elements with signed or unsigned saturation | —
Subf | evsubfhx, evsubfhxss, evsubfhxus | subtract exchanged half word elements with optional signed or unsigned saturation. The even and odd half word elements of operand rA are pairwise exchanged before subtracting | —
Subf | evsubfwx, evsubfwxss, evsubfwxus | subtract exchanged word elements with optional signed or unsigned saturation. The high and low word elements of operand rA are exchanged before subtracting | —
Subf | evsubifb, evsubifh, evsubifw | subtract unsigned imm value UIMM from all elements | —
Subf | evsubfsmiaaw, evsubfssiaaw, evsubfumiaaw, evsubfusiaaw | subtract word elements in rA from Accumulator using signed/unsigned modulo/saturation operations, results into rD and Accumulator | Y
Subf | evsubfsmiaa, evsubfssiaa, evsubfusiaa | subtract 64-bit value in rA from Accumulator with optional signed/unsigned saturation, result into rD and Accumulator | Y
Table 6-3. Simple Vector Arithmetic Instructions (continued)
Basic Operation | Variants | Description | ACC?
SubfAdd | evsubf2add2h, evsubf2add2hss | subtract for upper 2 half word elements, add for lower 2 elements, with optional signed saturation. | —
SubfAdd | evsubfaddh, evsubfaddhss | subtract for even half word elements, add for odd elements, with optional signed saturation. | —
SubfAdd | evsubfaddhx, evsubfaddhxss | The even and odd half word elements of operand rA are pairwise exchanged and then the resulting even elements are subtracted and the odd elements are added from/to elements in rB, with optional signed saturation. | —
SubfAdd | evsubfaddw, evsubfaddwss | The low word element of rA is added and the high word element of rA is subtracted to/from the corresponding element of rB, with optional signed saturation. | —
SubfAdd | evsubfaddwx, evsubfaddwxss | The word elements of rA are exchanged and then the resulting high word element is subtracted and low word element is added from/to word elements of rB, with optional signed saturation. | —
Table 6-3. Simple Vector Arithmetic Instructions (continued)
Basic Operation | Variants | Description | ACC?
Summation/Diff | evsumws, evsumwu, evsumwsa, evsumwua | The signed or unsigned word elements of rA are summed together into 64 bits and placed into rD and optionally into the Accumulator | opt
Summation/Diff | evsumwsaa, evsumwuaa | The signed or unsigned word elements of rA are summed together along with the contents of the Accumulator and placed into rD and the Accumulator | Y
Summation/Diff | evsum2hs, evsum2hu, evsum2hsa, evsum2hua | Signed or unsigned pairs of half word elements of rA are summed together into words and placed into rD and optionally into the Accumulator | opt
Summation/Diff | evsum2hsaaw, evsum2huaaw | Signed or unsigned pairs of half word elements of rA are summed together along with the contents of the corresponding word element of the accumulator, into words, and placed into rD and the Accumulator | Y
Summation/Diff | evsum4bs, evsum4bu, evsum4bsa, evsum4bua | Signed or unsigned quads of byte elements of rA are summed together into words and placed into rD and optionally into the Accumulator | opt
Summation/Diff | evsum4bsaaw, evsum4buaaw | Signed or unsigned quads of byte elements of rA are summed together along with the contents of the corresponding word element of the accumulator, into words, and placed into rD and the Accumulator | Y
Summation/Diff | evsum2his, evsum2hisa | Signed pairs of interleaved half word elements of rA are summed together into words and placed into rD and optionally into the Accumulator | opt
Summation/Diff | evsum2hisaaw | Signed pairs of interleaved half word elements of rA are summed together along with the contents of the corresponding word element of the accumulator, into words, and placed into rD and the Accumulator | Y
Summation/Diff | evdiff2his, evdiff2hisa | Signed pairs of interleaved half word elements of rA are subtracted to produce a pair of word differences and placed into rD and optionally into the Accumulator | opt
Summation/Diff | evdiff2hisaaw | Signed pairs of interleaved half word elements of rA are subtracted to produce a pair of word differences and the differences are added together with the contents of the corresponding word element of the accumulator, into words, and placed into rD and the Accumulator | Y
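As an illustration of one row of Table 6-3, the following Python sketch models a sum of four absolute differences of unsigned bytes, producing one sum per word element. It is a hedged model only: the function name `sad4_unsigned_bytes` is hypothetical, and the assumption that the byte differences are taken between corresponding bytes of rA and rB within each word is an interpretation of the table text, not a statement of the exact instruction semantics.

```python
def sad4_unsigned_bytes(ra: int, rb: int) -> int:
    """For each 32-bit word element, sum |a - b| over its four unsigned bytes.

    Returns a 64-bit value whose high and low words hold the two sums.
    Operand pairing is an assumption made for illustration.
    """
    result = 0
    for shift in (32, 0):                       # high word first, then low word
        wa = (ra >> shift) & 0xFFFFFFFF
        wb = (rb >> shift) & 0xFFFFFFFF
        total = sum(abs(((wa >> s) & 0xFF) - ((wb >> s) & 0xFF))
                    for s in (24, 16, 8, 0))    # four byte lanes per word
        result |= total << shift
    return result

sums = sad4_unsigned_bytes(0x01020304_FF000000, 0x04030201_00000000)
```

Each per-word sum is at most 4 × 255 = 1020, so it always fits in the 32-bit result element in this model.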
6.2.8.2
Vector Logical Instructions
Vector logical instructions are outlined in Table 6-4.
Table 6-4. Simple Vector Logical Instructions
Basic Operation | Variants | Description
AND | evand | AND word elements of rA and rB
ANDC | evandc | AND word elements of rA with complemented elements of rB
Clear | evclrbe, evclrbo | Clear (zero) even bytes of source value in rA using immediate mask (mask). Clear (zero) odd bytes of source value in rA using immediate mask (mask).
Clear | evclrh | Clear (zero) half word elements of source value in rA using immediate mask (mask).
NAND | evnand | NAND word elements of rA and rB
NOR | evnor | NOR word elements of rA and rB
OR | evor | OR word elements of rA and rB
ORC | evorc | OR word elements of rA with complemented elements of rB
XNOR | eveqv | XNOR word elements of rA and rB
XOR | evxor | XOR word elements of rA and rB
6.2.8.3
Vector Shift/Rotate Instructions
Vector shift and rotate instructions are outlined in Table 6-5.
Table 6-5. Simple Vector Shift/Rotate Instructions
Basic Operation | Variants | Description
Shift Left | evslb, evslh, evslw, evsl, evslbi, evslhi, evslwi, evsli | Logical shift left of the 8-, 16-, 32-, or 64-bit element(s) in rA by the amount(s) in rB or by the immediate value UIMM
Shift Left | evsloi | Logical shift left of the value in rA by 0 to 7 bytes
Logical Shift Right | evsrbu, evsrhu, evsrwu, evsru, evsrbiu, evsrhiu, evsrwiu, evsriu | Logical shift right of the 8-, 16-, 32-, or 64-bit element(s) in rA by the amount(s) in rB or by the immediate value UIMM
Logical Shift Right | evsroiu | Logical shift right of the value in rA by 0 to 7 bytes
Arithmetic Shift Right | evsrbs, evsrhs, evsrws, evsrs, evsrbis, evsrhis, evsrwis, evsris | Arithmetic shift right of the 8-, 16-, 32-, or 64-bit element(s) in rA by the amount(s) in rB or by the immediate value UIMM
Arithmetic Shift Right | evsrois | Arithmetic shift right of the value in rA by 0 to 7 bytes
Rotate Left | evrlb, evrlh, evrlw, evrlbi, evrlhi, evrlwi | Rotate left of the 8-, 16-, or 32-bit elements in rA by the amount(s) in rB or by the immediate value UIMM
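Two of the operations in Table 6-5 can be modeled in a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, the rotate models only the immediate form with the same count applied to both word elements, and the byte shift models a whole-register logical shift left by 0 to 7 bytes.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits (n taken modulo 32)."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32

def rotate_words_imm(ra: int, uimm: int) -> int:
    """Rotate each 32-bit word element of a 64-bit value left by uimm."""
    hi = rotl32((ra >> 32) & MASK32, uimm)
    lo = rotl32(ra & MASK32, uimm)
    return (hi << 32) | lo

def shift_left_bytes(ra: int, nbytes: int) -> int:
    """Logical shift left of the whole 64-bit value by 0 to 7 bytes."""
    if not 0 <= nbytes <= 7:
        raise ValueError("byte count must be in the range 0..7")
    return (ra << (8 * nbytes)) & MASK64

rotated = rotate_words_imm(0x80000001_12345678, 4)
shifted = shift_left_bytes(0x0011223344556677, 2)
```

The rotate keeps the two word elements independent: bits rotated out of one word never enter the other.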
6.2.8.4
Vector Compare and Vector Set Instructions
Vector compare and set instructions are outlined in Table 6-6 and Table 6-7. The compare operations
update the condition register with the results of the comparison.
Table 6-6. Vector Compare Instructions
Basic Comparison Operation | Variants | Description
= | evcmpeq, evcmpeqd | Compare word or double word elements for equal
> | evcmpgts, evcmpgtu, evcmpgtds, evcmpgtdu | Compare word or double word elements for greater than signed/unsigned
< | evcmplts, evcmpltu, evcmpltds, evcmpltdu | Compare word or double word elements for less than signed/unsigned
Table 6-7. Vector Set Instructions
Basic Comparison Operation | Variants | Description
= | evseteqb[.], evseteqh[.], evseteqw[.] | Compare byte, half word or word elements in rA and rB for equal. For each byte, half word or word, set destination byte, half word or word to all ‘1’s if condition met. Optionally set CR0 with comparison results.
> | evsetgtbs[.], evsetgtbu[.], evsetgths[.], evsetgthu[.], evsetgtws[.], evsetgtwu[.] | Compare byte, half word or word elements in rA and rB for greater than signed or unsigned. For each byte, half word or word, set destination byte, half word or word to all ‘1’s if condition met. Optionally set CR0 with comparison results.
< | evsetltbs[.], evsetltbu[.], evsetlths[.], evsetlthu[.], evsetltws[.], evsetltwu[.] | Compare byte, half word or word elements in rA and rB for less than signed or unsigned. For each byte, half word or word, set destination byte, half word or word to all ‘1’s if condition met. Optionally set CR0 with comparison results.
6.2.8.5
Vector Select Instructions
Vector select instructions are outlined in Table 6-8.
Table 6-8. Vector Select Instructions
Operation | Variants | Description
Select | evsel | Select word elements from rA or rB based on crS condition register field
Select Bits | evselbit | Select bit elements from rA or rB based on select bit vector in rD, place results into rD
Select Bits | evselbitm0 | Insert bit elements from rB into rD based on select bit mask in rA of 0, place results into rD
Select Bits | evselbitm1 | Insert bit elements from rB into rD based on select bit mask in rA of 1, place results into rD
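The general idea behind the select operations in Table 6-8 can be sketched as follows. The sketch is deliberately generic: the function names are hypothetical, the two booleans stand in for whatever bits of the crS field control each word, and the polarity of the bit mask (a 1 selects the first source) is an assumption, since the table does not state it.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def select_words(ra: int, rb: int, high_from_a: bool, low_from_a: bool) -> int:
    """Pick each 32-bit word element from ra or rb independently."""
    hi = (ra if high_from_a else rb) >> 32 & MASK32
    lo = (ra if low_from_a else rb) & MASK32
    return (hi << 32) | lo

def select_bits(ra: int, rb: int, mask: int) -> int:
    """Bitwise select: mask bit 1 takes the bit from ra, mask bit 0 from rb
    (assumed polarity)."""
    return (ra & mask) | (rb & ~mask & MASK64)

by_word = select_words(0x11111111_22222222, 0x33333333_44444444, True, False)
by_bit = select_bits(0x1111111111111111, 0x2222222222222222, 0xFFFFFFFF00000000)
```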
6.2.8.6
Vector Data Arrangement Instructions
Vector data arrangement instructions are outlined in Table 6-9. These instructions are used to rearrange
fields of elements from one or more source vector registers.
Table 6-9. Vector Data Arrangement Instructions
Basic Operation | Variants | Description
De-interleave | evdlveb | de-interleave even bytes; the vector of even byte elements in rA and even byte elements in rB are concatenated and placed into rD
De-interleave | evdlveob | de-interleave even/odd bytes; the vector of even byte elements in rA and odd byte elements in rB are concatenated and placed into rD
De-interleave | evdlvob | de-interleave odd bytes; the vector of odd byte elements in rA and odd byte elements in rB are concatenated and placed into rD
De-interleave | evdlvoeb | de-interleave odd/even bytes; the vector of odd byte elements in rA and even byte elements in rB are concatenated and placed into rD
De-interleave | evdlveh | de-interleave even half words; the even half word elements in rA and even half word elements in rB are concatenated and placed into rD
De-interleave | evdlveoh | de-interleave even/odd half words; the even half word elements in rA and odd half word elements in rB are concatenated and placed into rD
De-interleave | evdlvoh | de-interleave odd half words; the odd half word elements in rA and odd half word elements in rB are concatenated and placed into rD
De-interleave | evdlvoeh | de-interleave odd/even half words; the odd half word elements in rA and even half word elements in rB are concatenated and placed into rD
Table 6-9. Vector Data Arrangement Instructions (continued)
Basic Operation | Variants | Description
Interleave | evilveh | interleave even half words; the even half words from rA are placed into the even half words of rD and the even half words of rB are placed into the odd half words of rD
Interleave | evilveoh | interleave even/odd half words; the even half words from rA are placed into the even half words of rD and the odd half words of rB are placed into the odd half words of rD
Interleave | evilvhih | interleave high half words; the high half words from rA are placed into the even half words of rD and the high half words of rB are placed into the odd half words of rD
Interleave | evilvhiloh | interleave high/low half words; the high half words from rA are placed into the even half words of rD and the low half words of rB are placed into the odd half words of rD
Interleave | evilvloh | interleave low half words; the low half words from rA are placed into the even half words of rD and the low half words of rB are placed into the odd half words of rD
Interleave | evilvlohih | interleave low/high half words; the low half words from rA are placed into the even half words of rD and the high half words of rB are placed into the odd half words of rD
Interleave | evilvoeh | interleave odd/even half words; the odd half words from rA are placed into the even half words of rD and the even half words of rB are placed into the odd half words of rD
Interleave | evilvoh | interleave odd half words; the odd half words from rA are placed into the even half words of rD and the odd half words of rB are placed into the odd half words of rD
Merge | evmergehi | merge high words; the high word from rA is placed into the high word of rD and the high word of rB is placed into the low word of rD
Merge | evmergehilo | merge high/low words; the high word from rA is placed into the high word of rD and the low word of rB is placed into the low word of rD
Merge | evmergelo | merge low words; the low word from rA is placed into the high word of rD and the low word of rB is placed into the low word of rD
Merge | evmergelohi | merge low/high words; the low word from rA is placed into the high word of rD and the high word of rB is placed into the low word of rD
Permute | evperm | Permute the byte elements in rB according to the permute vector in rA and place the results in rD
Permute | evperm2 | Permute the vector of concatenated byte elements from rA and rB according to the permute vector in rD and place the results in rD
Permute | evperm3 | Permute the vector of concatenated byte elements from rD and rB according to the permute vector in rA and place the results in rD
Table 6-9. Vector Data Arrangement Instructions (continued)
Basic Operation | Variants | Description
Pack | evpksdsws, evpkuduws | Pack the signed or unsigned double word elements from rA and rB into a pair of signed or unsigned word elements in rD, saturating if necessary
Pack | evpksdswfrs | Pack the signed double word fractional elements from rA and rB into a pair of signed word elements in rD using the current rounding mode in SPEFSCR, saturating if necessary
Pack | evpksdshefrs | Pack the signed 33.31 guarded fractional elements from rA and rB into a pair of signed half word even elements in rD using the current rounding mode in SPEFSCR, saturating if necessary
Pack | evpkshsbs, evpkshubs, evpkuhubs | Pack the 8 signed or unsigned half word elements from rA and rB into 8 signed or unsigned byte elements in rD, saturating if necessary
Pack | evpkswgshefrs | Pack the signed 17.47 guarded fractional elements from rA and rB into a pair of signed half word even elements in rD using the current rounding mode in SPEFSCR, saturating if necessary
Pack | evpkswgswfrs | Pack the signed 17.47 guarded fractional elements from rA and rB into a pair of signed word elements in rD using the current rounding mode in SPEFSCR, saturating if necessary
Pack | evpkswshs, evpkswuhs, evpkuwuhs | Pack the 4 signed or unsigned word elements from rA and rB into 4 signed or unsigned half word elements in rD, saturating if necessary
Pack | evpkswshilvs | Pack the 4 signed word elements from rA and rB into 4 signed half word elements in rD with interleaving, saturating if necessary
Pack | evpkswshfrs | Pack the 4 signed fractional word elements from rA and rB into 4 signed or fractional half word elements in rD using the current rounding mode in SPEFSCR, saturating if necessary
Pack | evpkswshilvfrs | Pack the 4 signed fractional word elements from rA and rB into 4 signed or fractional half word elements in rD with interleaving using the current rounding mode in SPEFSCR, saturating if necessary
Table 6-9. Vector Data Arrangement Instructions (continued)
Basic Operation | Variants | Description
Splat | evsplatb | splat (replicate) the byte from rA selected by the immediate field into all byte elements of rD
Splat | evsplath | splat (replicate) the half word from rA selected by the immediate field into all half word elements of rD
Splat | evsplatfib, evsplatfih, evsplatfi | Splat the 5-bit SIMM field as a signed fraction into all byte, half word, or word elements of rD
Splat | evsplatfiba, evsplatfiha, evsplatfia | Splat the 5-bit SIMM field as a signed fraction into all byte, half word, or word elements of rD and the accumulator
Splat | evsplatfid | Splat the 5-bit SIMM field as a signed fraction into rD
Splat | evsplatfida | Splat the 5-bit SIMM field as a signed fraction into rD and the accumulator
Splat | evsplatfibo, evsplatfiho, evsplatfio | Splat the 5-bit SIMM field as a signed fraction into the odd byte, half word, or word elements of rD
Splat | evsplatfiboa, evsplatfihoa, evsplatfioa | Splat the 5-bit SIMM field as a signed fraction into the odd byte, half word, or word elements of rD and the accumulator
Splat | evsplatib, evsplatih, evsplati | Splat the 5-bit SIMM field as a signed integer into all byte, half word, or word elements of rD
Splat | evsplatiba, evsplatiha, evsplatia | Splat the 5-bit SIMM field as a signed integer into all byte, half word, or word elements of rD and the accumulator
Splat | evsplatid | Splat the 5-bit SIMM field as a signed integer into rD
Splat | evsplatida | Splat the 5-bit SIMM field as a signed integer into rD and the accumulator
Splat | evsplatibe, evsplatihe, evsplatie | Splat the 5-bit SIMM field as a signed integer into the even byte, half word, or word elements of rD
Splat | evsplatibea, evsplatihea, evsplatiea | Splat the 5-bit SIMM field as a signed integer into the even byte, half word, or word elements of rD and the accumulator
Table 6-9. Vector Data Arrangement Instructions (continued)
Basic Operation | Variants | Description
Swap | evswapbhilo | Bytes within the upper 2 byte pairs in rA are swapped, and concatenated with swapped bytes in the lower 2 byte pairs of rB.
Swap | evswapblohi | Bytes within the lower 2 byte pairs in rA are swapped, and concatenated with swapped bytes in the upper 2 byte pairs of rB.
Swap | evswaphe | The even half words in rA are swapped, and merged with the odd half words of rB.
Swap | evswaphhi | The upper 2 half words in rA are swapped, and concatenated with the lower 2 half words of rB.
Swap | evswaphhilo | The upper 2 half words in rA are swapped, and concatenated with swapped lower 2 half words of rB.
Swap | evswaphlo | The lower 2 half words in rA are swapped, and concatenated after the upper 2 half words of rB.
Swap | evswaphlohi | The lower 2 half words in rA are swapped, and concatenated with swapped upper 2 half words of rB.
Swap | evswapho | The odd half words in rA are swapped, and then merged with even half words of rB.
Unpack | evunpkhibsi, evunpkhibui, evunpklobsi, evunpklobui | Unpack the high or low 4 bytes of rA into signed or unsigned integer half words
Unpack | evunpkhihf, evunpkhihsi, evunpkhihui, evunpklohf, evunpklohsi, evunpklohui | Unpack the high or low 2 half words of rA into signed fractional, signed integer, or unsigned integer words
Unpack | evunpkhiwgsf, evunpklowgsf | Unpack the high or low word of rA into guarded signed fractional (17.47) format
Extract | evxtrb | A specified byte in rA is placed into a specified byte of rD, zeroing all other bytes of rD
Extract | evxtrd | A double word is extracted from the concatenated byte elements of rA and rB and placed into rD
Extract | evxtrh | A specified half word in rA is placed into a specified half word of rD, zeroing all other half words of rD
Insert | evinsb | A specified byte in rA is placed into a specified byte of rD; all other bytes of rD are unchanged.
Insert | evinsh | A specified half word in rA is placed into a specified half word of rD; all other half words of rD are unchanged.
6.2.8.7
Multiply and accumulate instructions
These instructions perform multiply operations, optionally add the result to the accumulator and place the
result into the destination register and optionally into the accumulator. These instructions are composed of
different multiply forms, data formats and data accumulate options. The mnemonics for these instructions
indicate their various characteristics. These are shown in Table 6-10.
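As a rough illustration of the accumulate options listed in Table 6-10, the following C sketch models the modulo (non-saturating) forms of the a, aa, aaw, an, and anw options on a 64-bit accumulator. The type and function names (acc_option, spe_accumulate, and the helpers) are hypothetical and are not part of the architecture; the returned value stands for what is written to both rD and the Accumulator. The aaw3 and anw3 forms, which take rD as an input, and the saturating data formats are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for a subset of the Table 6-10 accumulate options. */
typedef enum { ACC_A, ACC_AA, ACC_AAW, ACC_AN, ACC_ANW } acc_option;

/* Add two 64-bit values as two independent 32-bit word elements. */
uint64_t add_words(uint64_t x, uint64_t y)
{
    uint32_t hi = (uint32_t)(x >> 32) + (uint32_t)(y >> 32); /* bits 0:31  */
    uint32_t lo = (uint32_t)x + (uint32_t)y;                 /* bits 32:63 */
    return ((uint64_t)hi << 32) | lo;
}

/* Subtract y from x as two independent 32-bit word elements. */
uint64_t sub_words(uint64_t x, uint64_t y)
{
    uint32_t hi = (uint32_t)(x >> 32) - (uint32_t)(y >> 32); /* bits 0:31  */
    uint32_t lo = (uint32_t)x - (uint32_t)y;                 /* bits 32:63 */
    return ((uint64_t)hi << 32) | lo;
}

/* Returns the value placed in both rD and the Accumulator (modulo forms). */
uint64_t spe_accumulate(acc_option opt, uint64_t acc, uint64_t result)
{
    switch (opt) {
    case ACC_A:   return result;                 /* result -> rD, Accumulator */
    case ACC_AA:  return acc + result;           /* single 64-bit sum         */
    case ACC_AAW: return add_words(acc, result); /* per-word sums             */
    case ACC_AN:  return acc - result;           /* single 64-bit difference  */
    case ACC_ANW: return sub_words(acc, result); /* per-word differences      */
    }
    assert(0 && "unknown accumulate option");
    return 0;
}
```

The difference between the aa and aaw forms shows up when the low word overflows: the 64-bit form carries into the upper word, while the word-element form wraps each word independently.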
Table 6-10. Mnemonic Extensions for Multiply Accumulate Instructions
Extension | Meaning | Comments
Multiply Form
he | half word even | 16 × 16 → 32
heg | half word even guarded | 16 × 16 → 32, 64-bit final accumulate result
ho | half word odd | 16 × 16 → 32
hog | half word odd guarded | 16 × 16 → 32, 64-bit final accumulate result
w | word | 32 × 32 → 64
wehg | word even high guarded | 32 × 32 → 64 in 17.47 format
wh | word high | 32 × 32 → 32 (high order 32 bits of product)
wl | word low | 32 × 32 → 32 (low order 32 bits of product)
wohg | word odd high guarded | 32 × 32 → 64 in 17.47 format
Data Format
smf | signed modulo fractional | modulo, no saturation or overflow
smfr | signed modulo fractional round | modulo, no saturation or overflow, rounding based on current rounding mode
smi | signed modulo integer | modulo, no saturation or overflow
ssf | signed saturate fractional | saturation on product and accumulate
ssfr | signed saturate fractional round | saturation on product and accumulate, rounding based on current rounding mode
ssi | signed saturate integer | saturation on accumulate
umi | unsigned modulo integer | modulo, no saturation or overflow
usi | unsigned saturate integer | saturation on accumulate
Accumulate Option
a | place in Accumulator | result → rD, Accumulator
aa | add to Accumulator | Accumulator + result → rD, Accumulator
aaw | add to Accumulator as word elements | Accumulator[0:31] + result[0:31] → rD[0:31], Accumulator[0:31]; Accumulator[32:63] + result[32:63] → rD[32:63], Accumulator[32:63]
aaw3 | add to rD as word elements | rD[0:31] + result[0:31] → rD[0:31], Accumulator[0:31]; rD[32:63] + result[32:63] → rD[32:63], Accumulator[32:63]
an | add negated to Accumulator | Accumulator – result → rD, Accumulator
anw | add negated to Accumulator as word elements | Accumulator[0:31] – result[0:31] → rD[0:31], Accumulator[0:31]; Accumulator[32:63] – result[32:63] → rD[32:63], Accumulator[32:63]
anw3 | add negated to rD as word elements | rD[0:31] – result[0:31] → rD[0:31], Accumulator[0:31]; rD[32:63] – result[32:63] → rD[32:63], Accumulator[32:63]
6.2.8.8
Dot product instructions
These instructions perform multiple multiply operations, optionally add the results to the accumulator, and
place the result into the destination register and optionally into the accumulator. These instructions are
composed of different forms, data formats and data accumulate options. The mnemonics for these
instructions indicate their various characteristics. These are shown in Table 6-11.
Table 6-11. Mnemonic Extensions for Dot Product Instructions
Extension | Meaning | Comments
Multiply Form
b | byte | 8 × 8 + 8 × 8 + 8 × 8 + 8 × 8 → 32, high and low
4h | four half words | 16 × 16 + 16 × 16 + 16 × 16 + 16 × 16 → 32
4hg | four half words guarded | 16 × 16 + 16 × 16 + 16 × 16 + 16 × 16 → 64
h | half word | 16 × 16 op 16 × 16 → 32, high and low
hih | high half words | 16 × 16 op 16 × 16 → 32, high half words, used for complex mul
loh | low half words | 16 × 16 op 16 × 16 → 32, low half words, used for complex mul
4hxga | four half words exchanged guarded add | (16 × 16 + 16 × 16) + (16 × 16 + 16 × 16) → 64, even and odd rA half words changed
4hxgs | four half words exchanged guarded subtract | (16 × 16 - 16 × 16) + (16 × 16 - 16 × 16) → 64, even and odd rA half words changed
w | word | 32 × 32 op 32 × 32 → 64
wg | word guarded | 32 × 32 op 32 × 32 → 64 in 17.47 fractional format
wxga | word exchanged guarded add | 32 × 32 + 32 × 32 → 64 in 17.47 fractional format, words in rA are changed
wxgs | word exchanged guarded subtract | 32 × 32 - 32 × 32 → 64 in 17.47 fractional format, words in rA are changed
Operation
a | add | addition of intermediate products
s | subtract | subtraction of intermediate products
c | complex | complex format arithmetic
Data Format
smf | add signed modulo fractional | modulo, no saturation or overflow
smi | signed modulo integer | modulo, no saturation or overflow
ssf | signed saturate fractional | saturation on product and accumulate
ssfr | signed saturate fractional round | saturation on product and accumulate, rounding based on current rounding mode
ssi | signed saturate integer | saturation on product and accumulate
umi | unsigned modulo integer | modulo, no saturation or overflow
usi | unsigned saturate integer | saturation on product and accumulate
Accumulate Option
a | place in Accumulator | result → rD, Accumulator
aa | add to Accumulator | Accumulator + result → rD, Accumulator
aa3 | add to Accumulator, 3op | rD + result → rD, Accumulator
aaw | add to Accumulator as word elements | Accumulator[0:31] + result[0:31] → rD[0:31], Accumulator[0:31]; Accumulator[32:63] + result[32:63] → rD[32:63], Accumulator[32:63]
aaw3 | add to Accumulator as word elements, 3 op | rD[0:31] + result[0:31] → rD[0:31], Accumulator[0:31]; rD[32:63] + result[32:63] → rD[32:63], Accumulator[32:63]
6.2.8.9
Miscellaneous Vector Instructions
Miscellaneous vector instructions are outlined in Table 6-12.
Table 6-12. Misc. Vector Instructions
Operation | Variants | Description
load vector for shift | evlvsl | load vector for shift left; place a vector of constant values for a vector permute for left shift
load vector for shift | evlvsr | load vector for shift right; place a vector of constant values for a vector permute for right shift
store Accumulator | evmar | move Accumulator to register rA
load Accumulator | evmra | move register rA to Accumulator
Bit reversed increment | brinc | Compute a bit-reversed increment for a memory offset for bit-reversed addressing
Circular Increment | circinc | Computes a modulo increment for supporting circular buffer index pointer modification
6.2.9
Load and Store Instructions
SPE provides a number of load and store instructions. These instructions provide load and store
capabilities for moving data elements between the GPRs and memory. Data elements of 8, 16, 32, and 64
bits are supported. A variety of forms are provided that position data for efficient computation.
6.2.9.1
Addressing Modes—Non-Update forms
Base + index and base + scaled immediate addressing modes are provided. Base registers hold 64-bit
pointer values (32-bit pointers in a 32-bit implementation of the architecture), while registers used as index
values provide 32-bit index values. Scaled immediate values are unsigned and are scaled by the size of the
access.
6.2.9.1.1
Base + Scaled Immediate Addressing—Non-Update Form
In the base + scaled immediate addressing mode, register rA holds a 32-bit pointer value or a value of zero
(if rA = 0), and an immediate field in the instruction word provides a 5-bit unsigned immediate value
which is zero-extended and scaled (shifted left) by 1, 2, or 3, depending on the size (half word, word, or
double word) of the access. The sum of the value in rA and the zero-extended scaled immediate forms the
effective address:
if (rA = 0) then b ← 0
else b ← (rA[32:63])
SCL ← {1,2,3} // left-shift amount for half word, word, or double word
EA ← b + EXTZ(UIMM << SCL)
6.2.9.1.2
Base + Index Addressing
In the Base + Index addressing mode, register rA holds a 32-bit pointer value or a value of zero (if rA = 0),
while register rB provides a 32-bit index. The sum forms the effective address:
if (rA = 0) then b ← 0
else b ← (rA[32:63])
EA ← b + (rB)
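The two non-update effective address calculations described in Section 6.2.9.1 can be sketched in C as follows. This is only an illustrative model under stated assumptions: gpr_lo is a hypothetical array holding the low 32 bits (bits 32:63) of each GPR, ra and rb are register numbers taken from the instruction encoding, and scl is assumed to be the left-shift amount (1, 2, or 3) described in the text. The function names are not part of the architecture.

```c
#include <assert.h>
#include <stdint.h>

/* Base + scaled immediate: EA = b + (UIMM shifted left by scl).
 * gpr_lo[] is a hypothetical view of GPR bits 32:63. */
uint32_t spe_ea_scaled_imm(const uint32_t gpr_lo[32], unsigned ra,
                           unsigned uimm, unsigned scl)
{
    assert(ra < 32 && uimm < 32);   /* 5-bit register and immediate fields */
    assert(scl >= 1 && scl <= 3);   /* half word, word, or double word     */
    uint32_t b = (ra == 0) ? 0u : gpr_lo[ra];  /* rA = 0 selects a zero base */
    return b + ((uint32_t)uimm << scl);
}

/* Base + index: EA = b + (rB). */
uint32_t spe_ea_indexed(const uint32_t gpr_lo[32], unsigned ra, unsigned rb)
{
    assert(ra < 32 && rb < 32);
    uint32_t b = (ra == 0) ? 0u : gpr_lo[ra];
    return b + gpr_lo[rb];
}
```

Note that a register field of 0 for rA yields a base of zero regardless of the contents of GPR0, which is why the sketch tests the register number and not the register value.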
6.2.9.2
Addressing Modes—Update forms
The base + scaled immediate addressing mode is also provided with an update form. As in the non-update
form, base register rA holds 32-bit pointer values. For the update form of the base + scaled immediate
addressing mode, the same effective address calculation is used as defined in Section 6.2.9.1.1, “Base +
Scaled Immediate Addressing—Non-Update Form,” and the calculated effective address is placed into rA
by the instruction.
For the base + scaled immediate with update addressing mode, scaled immediate values of 0 are reserved
for future definition and are treated as illegal. Instruction encodings with rA = 0 are also reserved for future
definition and treated as illegal instructions.
6.2.9.3
Addressing Modes—Modify forms
The base + index addressing mode is also provided with a set of modify forms. In the modify forms,
register rB holds 32-bit pointer values, while register rA is used to provide an index value as well as to
provide specialized control information for performing a post-modification to the lower 32 bits of rA.
Modify forms are provided to allow for parallel address computations to occur, which are useful for
sequential accessing of arrays, lists, circular buffers, and other complex data structures. Modify forms of
load and store instructions cause a calculated update value to be placed in the lower portion of register rA.
Support for specialized addressing modes is available when using base + index modify forms.
For the base + index modify forms, the modify calculation mode selection is based on a mode field in
register rA (rA[0:3]). Modify forms modify the original value in rA based on an addressing calculation
performed in parallel with the load or store instruction, which may or may not be the value of the effective
address of the load or store instruction, depending on the actual calculation mode. This is in contrast to
normal update forms of the Power Arch load and store instructions since the new value placed into rA need
not correspond to the effective address of the load or store.
The following three modify calculation modes are currently defined and selected by the value in rA[0:3]:
• Linear addressing: mode = 0000
• Circular addressing: mode = 1000
• Bit-reversed addressing: mode = 1010
All other mode encodings are reserved, and either result in an unimplemented instruction exception, or a
boundedly undefined result depending on the implementation.
Instruction encodings with rA = 0 are reserved for future definition and are treated as illegal instructions.
6.2.9.3.1
Linear Addressing Update Mode
Linear addressing update calculation mode causes the sum of rA[32:63] and rB[32:63] to be placed into
rA[32:63]:
if (mode = 0000) then
rA[32:63] ← rA[32:63] + rB[32:63]
6.2.9.3.2
Circular Addressing Modify Mode
Circular addressing modify mode is provided to support addressing of circular buffers. Circular addressing
mode causes a circular increment to be performed on a portion of rA[32:63] (the circular buffer index
portion of rA) after the EA calculation, using the offset and length specifiers in rA, and the result is placed
into rA[32:63]. rA[0:31] is left unchanged. rA[32:63] must be >=si 0 and <=ui Length, and the magnitude
of Offset must be <= Length + 1, or the resulting value is boundedly undefined. rB must point to a
double-word boundary in memory, and Length + 1 must be a multiple of eight bytes or an alignment error
will be generated.
Figure 6-4 shows how rA is used in forming the update value for mode 1000 (circinc).
rA fields (most significant first): Mode (1000) | - | Offset (signed) | - | Length (unsigned) | Index (must always be positive and <=ui Length)
Offset[0:7] ← rA[8:15]; // signed byte offset, must be <= Length+1
Length[0:15] ← rA[16:31]; // unsigned buffer length-1 in bytes. Length is byte index of
// last byte in buffer.
// buffer must be aligned on a doubleword boundary, and be a
// multiple of 8 bytes, i.e. Length[13:15] = 3'b111.
Index[0:31] ← rA[32:63]; // index into buffer, must be <=ui Length[0:15], (so always >=si 0).
if ((Offset[0] = 0) & ((EXTS32(Offset[0:7]) + Index[0:31]) >ui EXTZ32(Length[0:15]))) then
rA[32:63] ← Index[0:31] + EXTS32(Offset[0:7]) - EXTZ32(Length[0:15]) - 1; // wrap at end
elseif ((Offset[0] = 1) & ((EXTS32(Offset[0:7]) + Index[0:31]) <si 0)) then
rA[32:63] ← Index[0:31] + EXTS32(Offset[0:7]) + EXTZ32(Length[0:15]) + 1; // wrap at start
else rA[32:63] ← Index[0:31] + EXTS32(Offset[0:7]);
Figure 6-4. rA Used to Form Update Value for Mode 1000
Note that misalignment may cause the operand fetched to span the virtual boundary between the last byte
of the buffer at byte Buffer[Length] and the first byte of the Buffer at byte Buffer[0].
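The circular update calculation in Figure 6-4 can be modeled in C roughly as follows. This is a sketch only, assuming the field positions given in the figure's pseudocode (Offset in rA[8:15], Length in rA[16:31], Index in rA[32:63]) and Power Architecture bit numbering, in which bit 0 is the most significant bit of the 64-bit register. The function name spe_circ_modify is hypothetical, and the sketch does not model the boundedly undefined cases or alignment errors.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the updated 64-bit rA for circular modify mode (mode = 1000).
 * rA[0:31] is returned unchanged; only the Index in rA[32:63] is updated. */
uint64_t spe_circ_modify(uint64_t ra)
{
    uint32_t mode   = (uint32_t)(ra >> 60);            /* rA[0:3]   */
    uint32_t raw    = (uint32_t)(ra >> 48) & 0xFFu;    /* rA[8:15]  */
    uint32_t length = (uint32_t)(ra >> 32) & 0xFFFFu;  /* rA[16:31] */
    uint32_t index  = (uint32_t)ra;                    /* rA[32:63] */
    int32_t  offset = (int32_t)(raw ^ 0x80u) - 0x80;   /* EXTS32(Offset) */
    uint32_t sum;

    assert(mode == 0x8u);                /* circular addressing only */

    sum = index + (uint32_t)offset;      /* 32-bit modulo arithmetic */
    if (offset >= 0 && sum > length)
        sum = sum - length - 1u;         /* wrap at end of buffer    */
    else if (offset < 0 && (sum & 0x80000000u))
        sum = sum + length + 1u;         /* wrap at start of buffer  */

    return (ra & 0xFFFFFFFF00000000ULL) | sum;
}
```

For example, with a 32-byte buffer (Length = 31) and an Offset of 8, an Index of 24 advances past the end of the buffer and wraps to 0, while an Offset of -8 applied at Index 0 wraps to 24.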
6.2.9.3.3
Bit-Reversed Addressing Modify Mode
Bit-reversed addressing modify calculation mode is provided to support addressing of buffers and arrays
used in FFT calculations.
When using bit-reversed addressing modify mode, a bit-reversed increment is performed on rA[32:63]
after the EA calculation, using a mask specifier in rA. The mask specifier is also used to indicate the bits
of rA[32:63] which are updated.
Figure 6-5 shows how rA is used in forming the update value for bit-reversed addressing update mode.
Note that the computation is similar to the brinc instruction computation, but the mask is applied to
updating only those bits of rA indicated by a ‘1’ in the mask value, unlike in the brinc instruction, in which
all low order bits of rD corresponding to the maximum mask size are updated.
rA fields: Mode (1010) in rA[0:3] | 0....0 in rA[4:15] | Mask in rA[16:31] | Index in rA[32:63]
Mask ← rA[16:31]
// mask value is log2(#points) 1s, zero extended, then left shifted log2(element size
// in bytes). e.g., a 16 point FFT on half words has a mask of 16'b0000000000011110
a ← rA[48:63] // up to 64KB in a single FFT
d ← bitreverse(1 + bitreverse(a | ~Mask))
rA[32:63] ← rA[32:47] || ((rA[48:63] & ~Mask) | (d & Mask)) // different than brinc. allows main
// pointer sharing to multiple buffers less than 64KB in size.
Figure 6-5. rA Used to Form Update Value for Bit-Reversed Addressing Update Mode
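The bit-reversed update in Figure 6-5 can be sketched in C as shown below. The sketch assumes the field positions given in the figure (Mask in rA[16:31], the updated portion in rA[48:63]) and Power Architecture bit numbering with bit 0 as the most significant bit. The names bitrev16 and spe_brev_modify are hypothetical helpers introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the bit order of a 16-bit value. */
uint16_t bitrev16(uint16_t x)
{
    uint16_t r = 0;
    for (int i = 0; i < 16; i++) {
        r = (uint16_t)((r << 1) | (x & 1u));
        x = (uint16_t)(x >> 1);
    }
    return r;
}

/* Returns the updated 64-bit rA for bit-reversed modify mode (mode = 1010).
 * Only the bits of rA[48:63] selected by Mask are changed. */
uint64_t spe_brev_modify(uint64_t ra)
{
    uint16_t mask  = (uint16_t)(ra >> 32);   /* rA[16:31] */
    uint16_t a     = (uint16_t)ra;           /* rA[48:63] */
    uint16_t nmask = (uint16_t)~mask;
    uint16_t d, lo;

    assert((ra >> 60) == 0xAu);              /* bit-reversed addressing only */

    d  = bitrev16((uint16_t)(1u + bitrev16((uint16_t)(a | nmask))));
    lo = (uint16_t)((a & nmask) | (d & mask));
    return (ra & 0xFFFFFFFFFFFF0000ULL) | lo;
}
```

Using the mask from the figure's example (a 16-point FFT on half words, Mask = 0x001E), successive calls starting from a byte offset of 0 step through offsets 0x10, 0x08, 0x18, and so on, which corresponds to elements 8, 4, 12 in bit-reversed order.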
6.2.9.4
Vector Load and Store Instruction Summary
Vector load and store instructions are provided to load and store various size vectors of byte, half-word,
word or double-word size. These instructions allow for endian-neutral code to be written. In addition,
update forms of the non-indexed instructions are provided to allow for base register updates. Variations of
the load instructions provide splat (replication) capability for placing a smaller vector element into
multiple element positions in a vector register.
Vector load and store instructions are outlined in Table 6-13.
Table 6-13. Vector Load and Store Instructions
Operation | Variants | Description
Load Byte | evlbbsplatb, evlbbsplatbu, evlbbsplatbx, evlbbsplatbmx | load byte and splat byte into 8 byte element positions
Load Double Word | evldb, evldbu, evldbx, evldbmx | load double word as byte elements
Load Double Word | evldd, evlddu, evlddx, evlddmx | load double word as double word
Load Double Word | evldh, evldhu, evldhx, evldhmx | load double word as half word elements
Load Double Word | evldw, evldwu, evldwx, evldwmx | load double word as word elements
Load Half Word | evlhhesplat, evlhhesplatu, evlhhesplatx, evlhhesplatmx | load half word into even half word elements, zeroing the odd half word elements
Load Half Word | evlhhossplat, evlhhossplatu, evlhhossplatx, evlhhossplatmx | load half word into odd half word elements, sign-extending to word elements
Load Half Word | evlhhousplat, evlhhousplatu, evlhhousplatx, evlhhousplatmx | load half word into odd half word elements, zero-extending to word elements
Load Half Word | evlhhsplath, evlhhsplathu, evlhhsplathx, evlhhsplathmx | load half word into all half word elements
Load Word | evlwbe, evlwbeu, evlwbex, evlwbemx | load word as four byte elements into the four even byte elements, zeroing the odd byte elements
Load Word | evlwbos, evlwbosu, evlwbosx, evlwbosmx | load word as four byte elements into the four odd byte elements, sign-extending to half word elements
Load Word | evlwbou, evlwbouu, evlwboux, evlwboumx | load word as four byte elements into the four odd byte elements, zero-extending to half word elements
Load Word | evlwbsplatw, evlwbsplatwu, evlwbsplatwx, evlwbsplatwmx | load word as four byte elements into both word elements
Load Word | evlwhe, evlwheu, evlwhex, evlwhemx | load word as two half word elements into the two even half word elements, zeroing the odd half word elements
Load Word | evlwhos, evlwhosu, evlwhosx, evlwhosmx | load word as two half word elements into the two odd half word elements, sign-extending to word elements
Load Word | evlwhou, evlwhouu, evlwhoux, evlwhoumx | load word into the two odd half word elements, zero-extending to word elements
Load Word | evlwhsplat, evlwhsplatu, evlwhsplatx, evlwhsplatmx | load word as two half word elements, placing the first half word into both upper half word elements, second half word into both lower half word elements
Load Word | evlwhsplatw, evlwhsplatwu, evlwhsplatwx, evlwhsplatwmx | load word as two half word elements, into both word elements
Load Word | evlwwsplat, evlwwsplatu, evlwwsplatx, evlwwsplatmx | load word as word element, into both word elements
Store Double Word | evstdb, evstdbu, evstdbx, evstdbmx | store double word as byte elements
Store Double Word | evstdd, evstddu, evstddx, evstddmx | store double word as double word
Store Double Word | evstdh, evstdhu, evstdhx, evstdhmx | store double word as half word elements
Store Double Word | evstdw, evstdwu, evstdwx, evstdwmx | store double word as word elements
Store Half Word | evsthb, evsthbu, evsthbx, evsthbmx | store half word as byte elements
Store Word | evstwb, evstwbu, evstwbx, evstwbmx | store word as four byte elements
Store Word | evstwbe, evstwbeu, evstwbex, evstwbemx | store word from four even byte elements
Store Word | evstwbo, evstwbou, evstwbox, evstwbomx | store word from four odd byte elements
Store Word | evstwhe, evstwheu, evstwhex, evstwhemx | store word from two even half word elements
Store Word | evstwho, evstwhou, evstwhox, evstwhomx | store word from two odd half word elements
Store Word | evstwwe, evstwweu, evstwwex, evstwwemx | store word from even word element
Store Word | evstwwo, evstwwou, evstwwox, evstwwomx | store word from odd word element
6.2.10
SPE Exceptions
The architecture defines the following SPE exceptions:
• SPE unavailable exception
• SPE vector alignment exception
Interrupt vector offset registers (IVOR) IVOR32 (SPE/embedded floating point unavailable interrupt) and
IVOR5 (alignment interrupt) are used by the interrupt model. The SPR number for IVOR32 is 528;
IVOR5 is defined by Power ISA. These registers are privileged.
6.2.10.1
SPE/Embedded Floating-point Unavailable Exception
The SPE/embedded floating-point unavailable exception is taken if MSR[SPE] is cleared and execution
of an SPE instruction other than the brinc instruction is attempted. When the SPE/embedded floating-point
unavailable exception occurs, the processor suppresses execution of the instruction causing the exception.
The SRR0, SRR1, MSR, and ESR registers are modified as follows:
• SRR0 is set to the effective address of the instruction causing the exception.
• SRR1 is set to the contents of the MSR at the time of the exception.
• MSR[CE, ME, DE] are unchanged. All other bits are cleared.
• ESR[SPE] is set. All other ESR bits are cleared.
Instruction execution resumes at address IVPR[0–15]||IVOR32[16–27]||0b0000.
6.2.10.2
SPE Vector Alignment Exception
For e200z760n3, the SPE vector alignment exception is taken if the effective address of any of the
following instructions is not aligned to a 32-bit boundary: evldd[u], evlddx, evldw[u], evldwx, evldh[u],
evldhx, evstdd[u], evstddx, evstdw[u], evstdwx, evstdh[u], and evstdhx. When an SPE vector alignment
exception occurs, the processor suppresses the execution of the instruction causing the alignment
exception and takes an alignment interrupt.
SRR0, SRR1, MSR, ESR, and DEAR are modified as follows:
• SRR0 is set to the effective address of the instruction causing the alignment exception.
• SRR1 is set to the contents of the MSR at the time of the exception.
• MSR[CE, ME, DE] are unchanged. All other bits are cleared.
• ESR[SPE] (bit 24) is set. ESR[ST] is set only if the instruction causing the exception is a store and
is cleared for a load. All other bits are cleared.
• DEAR is updated with the effective address of a byte of the load or store.
Instruction execution resumes at address IVPR[0–15]||IVOR5[16–27]||0b0000.
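The handler address formation used by both SPE exceptions (IVPR[0–15] || IVORn[16–27] || 0b0000) and the 32-bit alignment condition from Section 6.2.10.2 can be illustrated with the following C sketch. It assumes 32-bit register values with bit 0 as the most significant bit, so IVPR[0–15] is the upper half word and IVORn[16–27] corresponds to the mask 0x0000FFF0. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Form the interrupt handler address: IVPR[0:15] || IVORn[16:27] || 0b0000. */
uint32_t interrupt_vector_address(uint32_t ivpr, uint32_t ivor)
{
    uint32_t addr = (ivpr & 0xFFFF0000u) | (ivor & 0x0000FFF0u);
    assert((addr & 0xFu) == 0u);  /* low four bits are always zero */
    return addr;
}

/* Nonzero if the effective address is not aligned to a 32-bit boundary,
 * the condition described for the SPE vector alignment exception. */
int spe_vector_misaligned(uint32_t ea)
{
    return (ea & 0x3u) != 0u;
}
```

Bits of IVPR below the upper half word and bits of the IVOR outside positions 16–27 do not contribute to the resulting address in this model.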
6.2.11
Exception Priorities
The following list shows the priority order in which exceptions are taken:
1. SPE Unavailable exception
2. SPE Vector Alignment exception
An SPE vector alignment exception is taken if an SPE double-word vector load or store access is attempted
with an address which is not 32-bit aligned.
6.3
SPE Instruction Timing
Instruction timing for SPE instructions, in processor clock cycles, is shown in the following
tables. Pipelined instructions are shown with total latency and throughput in cycles. Divide
instructions are not pipelined and block other instructions from executing during divide execution.
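As a rough guide to reading the latency and throughput columns, the following C sketch estimates cycle counts under an idealized pipeline model. It assumes no stalls, no dual-issue effects, and no special result forwarding, so it is an approximation for reasoning about the tables and not a statement of actual e200z7 behavior. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Idealized cycles for n independent, back-to-back pipelined instructions:
 * the first result appears after 'latency' cycles, and each further
 * instruction completes 'throughput' cycles after the previous one. */
uint32_t pipelined_cycles(uint32_t n, uint32_t latency, uint32_t throughput)
{
    assert(latency >= 1 && throughput >= 1);
    if (n == 0)
        return 0;
    return latency + (n - 1) * throughput;
}

/* Idealized cycles for n instructions that each consume the previous
 * result, assuming the full latency is exposed on every step. */
uint32_t dependent_chain_cycles(uint32_t n, uint32_t latency)
{
    assert(latency >= 1);
    return n * latency;
}
```

Under these assumptions, eight independent instructions with a latency of 4 and a throughput of 1 take about 11 cycles, whereas eight instructions in a strict dependency chain take about 32.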
6.3.1
SPE Simple Vector Arithmetic Instructions Timing
Table 6-14 shows instruction timing for SPE integer simple instructions. The table is sorted by opcode.
These instructions are issued as a pair of operations.
Table 6-14. Simple Vector Arithmetic Instruction Timing
Basic Operation | Instruction | Latency | Throughput
Absolute Value | evabsb, evabsh, evabs, evabsd | 1 | 1
Absolute Value | evabsbs, evabshs, evabss, evabsds | 1 | 1
Absolute Difference | evabsdifsb, evabsdifsh, evabsdifsw, evabsdifub, evabsdifuh, evabsdifuw | 1 | 1
Add | evaddb, evaddh, evaddw, evaddd | 1 | 1
Add | evaddbss, evaddhss, evaddwss, evadddss, evaddbus, evaddhus, evaddwus, evadddus | 1 | 1
Add | evaddhx, evaddhxss, evaddhxus | 1 | 1
Add | evaddwx, evaddwxss, evaddwxus | 1 | 1
Add | evaddib, evaddih, evaddiw | 1 | 1
Add | evaddsmiaaw, evaddssiaaw, evaddumiaaw, evaddusiaaw | 1 | 1
Add | evaddsmiaa, evaddssiaa, evaddusiaa | 1 | 1
AddSubf | evadd2subf2h, evadd2subf2hss | 1 | 1
AddSubf | evaddsubfh, evaddsubfhss | 1 | 1
AddSubf | evaddsubfhx, evaddsubfhxss | 1 | 1
AddSubf | evaddsubfw, evaddsubfwss | 1 | 1
AddSubf | evaddsubfwx, evaddsubfwxss | 1 | 1
Average | evavgbs, evavghs, evavgws, evavgds, evavgbsr, evavghsr, evavgwsr, evavgdsr, evavgbu, evavghu, evavgwu, evavgdu, evavgbur, evavghur, evavgwur, evavgdur | 1 | 1
Count Leading | evcntlsh, evcntlzh, evcntlsw, evcntlzw | 1 | 1
Extend | evextsb, evextzb | 1 | 1
Extend | evextsbh | 1 | 1
Extend | evextsh, evextzh (use evclrh) | 1 | 1
Extend | evextsw | 1 | 1
Maximum | evmaxbs, evmaxhs, evmaxws, evmaxds, evmaxbu, evmaxhu, evmaxwu, evmaxdu | 1 | 1
Maximum | evmaxbpsh, evmaxbpuh | 1 | 1
Maximum | evmaxhpsw, evmaxhpuw | 1 | 1
Maximum | evmaxwpsd, evmaxwpud | 1 | 1
Maximum Magnitude | evmaxmagws | 1 | 1
Minimum | evminbs, evminhs, evminws, evminds, evminbu, evminhu, evminwu, evmindu | 1 | 1
Minimum | evminbpsh, evminbpuh | 1 | 1
Minimum | evminhpsw, evminhpuw | 1 | 1
Minimum | evminwpsd, evminwpud | 1 | 1
Negate | evnegb, evnegh, evneg, evnegd | 1 | 1
Negate | evnegbs, evneghs, evnegs, evnegds | 1 | 1
Negate | evnegbo, evnegho, evnegwo | 1 | 1
Negate | evnegbos, evneghos, evnegwos | 1 | 1
Round | evrndhb, evrndhbss, evrndhbus | 1 | 1
Round | evrndhnb, evrndhnbss, evrndhnbus | 1 | 1
Round | evrndwh, evrndwhss, evrndwhus | 1 | 1
Round | evrndwnh, evrndwnhss, evrndwnhus | 1 | 1
Round | evrnddw, evrnddwss, evrnddwus | 1 | 1
Round | evrndndw, evrndndwss, evrndndwus | 1 | 1
Sum of Absolute Differences | evsad2sh, evsad2sha, evsad2shaaw | 1 | 1
Sum of Absolute Differences | evsad2uh, evsad2uha, evsad2uhaaw | 1 | 1
Sum of Absolute Differences | evsad4sb, evsad4sba, evsad4sbaaw | 1 | 1
Sum of Absolute Differences | evsad4ub, evsad4uba, evsad4ubaaw | 1 | 1
Sum of Absolute Differences | evsadsw, evsadswa, evsadswaa | 1 | 1
Sum of Absolute Differences | evsaduw, evsaduwa, evsaduwaa | 1 | 1
Saturate | evsatsbub | 1 | 1
Saturate | evsatubsb | 1 | 1
Saturate | evsatsdsw, evsatsduw | 1 | 1
Saturate | evsatuduw | 1 | 1
Saturate | evsatshsb, evsatshub | 1 | 1
Saturate | evsatshuh | 1 | 1
Saturate | evsatuhub | 1 | 1
Saturate | evsatuhsh | 1 | 1
Saturate | evsatswgsdf | 1 | 1
Saturate | evsatswsh, evsatswuh | 1 | 1
Saturate | evsatswuw | 1 | 1
Saturate | evsatuwuh | 1 | 1
Saturate | evsatuwsw | 1 | 1
Subf | evsubfb, evsubfh, evsubfw, evsubfd | 1 | 1
Subf | evsubfbss, evsubfhss, evsubfwss, evsubfdss, evsubfbus, evsubfhus, evsubfwus, evsubfdus | 1 | 1
Subf | evsubfhx, evsubfhxss, evsubfhxus | 1 | 1
Subf | evsubfwx, evsubfwxss, evsubfwxus | 1 | 1
Subf | evsubifb, evsubifh, evsubifw | 1 | 1
Subf | evsubfsmiaaw, evsubfssiaaw, evsubfumiaaw, evsubfusiaaw | 1 | 1
Subf | evsubfsmiaa, evsubfssiaa, evsubfusiaa | 1 | 1
SubfAdd | evsubf2add2h, evsubf2add2hss | 1 | 1
SubfAdd | evsubfaddh, evsubfaddhss | 1 | 1
SubfAdd | evsubfaddhx, evsubfaddhxss | 1 | 1
SubfAdd | evsubfaddw, evsubfaddwss | 1 | 1
SubfAdd | evsubfaddwx, evsubfaddwxss | 1 | 1
Summation/Diff | evsumws, evsumwu, evsumwsa, evsumwua | 1 | 1
Summation/Diff | evsumwsaa, evsumwuaa | 1 | 1
Summation/Diff | evsum2hs, evsum2hu, evsum2hsa, evsum2hua | 1 | 1
Summation/Diff | evsum2hsaaw, evsum2huaaw | 1 | 1
Summation/Diff | evsum4bs, evsum4bu, evsum4bsa, evsum4bua | 1 | 1
Summation/Diff | evsum4bsaaw, evsum4buaaw | 1 | 1
Summation/Diff | evsum2his, evsum2hisa | 1 | 1
Summation/Diff | evsum2hisaaw | 1 | 1
Summation/Diff | evdiff2his, evdiff2hisa | 1 | 1
Summation/Diff | evdiff2hisaaw | 1 | 1
6.3.2
SPE Complex Integer Instruction Timing
Table 6-15 shows instruction timing for SPE complex integer instructions. For the divide instructions,
following instructions are stalled for a number of cycles equal to the divide latency.
Table 6-15. SPE Complex Integer Instruction Timing
Operation | Instruction | Latency | Throughput
Divide | evdivws, evdivwu, evdivwsf, evdivwuf, evdivs, evdivu | 12-32 (1) | 12-32 (1)
(1) Timing is data dependent
6.3.3
SPE Vector Logical Instruction Timing
Table 6-16 shows instruction timing for SPE simple vector logical instructions.
Table 6-16. SPE Vector Logical Instruction Timing
Basic Operation | Instruction | Latency | Throughput
AND | evand | 1 | 1
ANDC | evandc | 1 | 1
Clear | evclrbe, evclrbo | 1 | 1
Clear | evclrh | 1 | 1
NAND | evnand | 1 | 1
NOR | evnor | 1 | 1
OR | evor | 1 | 1
ORC | evorc | 1 | 1
XNOR | eveqv | 1 | 1
XOR | evxor | 1 | 1
6.3.4
SPE Vector Shift/Rotate Instruction Timing
Instruction timing for SPE vector shift/rotate instructions is shown in Table 6-17.
Table 6-17. SPE Vector Shift/Rotate Instruction Timing
Basic Operation | Instruction | Latency | Throughput
Shift Left | evslb, evslh, evslw, evsl, evslbi, evslhi, evslwi, evsli | 1 | 1
Shift Left | evsloi | 1 | 1
Logical Shift Right | evsrbu, evsrhu, evsrwu, evsru, evsrbiu, evsrhiu, evsrwiu, evsriu | 1 | 1
Logical Shift Right | evsroiu | 1 | 1
Arithmetic Shift Right | evsrbs, evsrhs, evsrws, evsrs, evsrbis, evsrhis, evsrwis, evsris | 1 | 1
Arithmetic Shift Right | evsrois | 1 | 1
Rotate Left | evrlb, evrlh, evrlw, evrlbi, evrlhi, evrlwi | 1 | 1
6.3.5
SPE Vector Compare and Vector Set Instruction Timing
Instruction timing for SPE vector compare and set instructions is shown in Table 6-18 and Table 6-19.
Table 6-18 shows the SPE vector compare instruction timing.
Table 6-18. SPE Vector Compare Instruction Timing
Basic Comparison Operation | Instruction | Latency | Throughput
= | evcmpeq, evcmpeqd | 1 | 1
> | evcmpgts, evcmpgtu, evcmpgtds, evcmpgtdu | 1 | 1
< | evcmplts, evcmpltu, evcmpltds, evcmpltdu | 1 | 1
Table 6-19 shows the SPE vector set instruction timing.
Table 6-19. SPE Vector Set Instruction Timing
Comparison Operation | Instruction | Latency | Throughput
= | evseteqb[.], evseteqh[.], evseteqw[.] | 1 | 1
> | evsetgtbs[.], evsetgtbu[.], evsetgths[.], evsetgthu[.], evsetgtws[.], evsetgtwu[.] | 1 | 1
< | evsetltbs[.], evsetltbu[.], evsetlths[.], evsetlthu[.], evsetltws[.], evsetltwu[.] | 1 | 1
6.3.6
SPE Vector Select Instruction Timing
Table 6-20 shows instruction timing for SPE vector select instructions.
Table 6-20. SPE Vector Select Instruction Timing
Operation | Instruction | Latency | Throughput
Select | evsel | 1 | 1
Select Bits | evselbit | 1 | 1
Select Bits | evselbitm0 | 1 | 1
Select Bits | evselbitm1 | 1 | 1
6.3.7
SPE Vector Data Arrangement Instruction Timing
Table 6-21 shows the instruction timing for SPE vector data arrangement instructions.
Table 6-21. SPE Vector Data Arrangement Instruction Timing
Operation | Instruction | Latency | Throughput
De-interleave | evdlveb | 1 | 1
De-interleave | evdlveob | 1 | 1
De-interleave | evdlvob | 1 | 1
De-interleave | evdlvoeb | 1 | 1
De-interleave | evdlveh | 1 | 1
De-interleave | evdlveoh | 1 | 1
De-interleave | evdlvoh | 1 | 1
De-interleave | evdlvoeh | 1 | 1
Interleave | evilveh | 1 | 1
Interleave | evilveoh | 1 | 1
Interleave | evilvhih | 1 | 1
Interleave | evilvhiloh | 1 | 1
Interleave | evilvloh | 1 | 1
Interleave | evilvlohih | 1 | 1
Interleave | evilvoeh | 1 | 1
Interleave | evilvoh | 1 | 1
Merge | evmergehi | 1 | 1
Merge | evmergehilo | 1 | 1
Merge | evmergelo | 1 | 1
Merge | evmergelohi | 1 | 1
Permute | evperm | 1 | 1
Permute | evperm2 | 1 | 1
Permute | evperm3 | 1 | 1
Pack | evpksdsws, evpkuduws | 1 | 1
Pack | evpksdswfrs | 1 | 1
Pack | evpksdshefrs | 1 | 1
Pack | evpkshsbs, evpkshubs, evpkuhubs | 1 | 1
Pack | evpkswgshefrs | 1 | 1
Pack | evpkswgswfrs | 1 | 1
Pack | evpkswshs, evpkswuhs, evpkuwuhs | 1 | 1
Pack | evpkswshilvs | 1 | 1
Pack | evpkswshfrs | 1 | 1
Pack | evpkswshilvfrs | 1 | 1
Splat | evsplatb | 1 | 1
Splat | evsplath | 1 | 1
Splat | evsplatfib, evsplatfih, evsplatfi | 1 | 1
Splat | evsplatfiba, evsplatfiha, evsplatfia | 1 | 1
Splat | evsplatfid | 1 | 1
Splat | evsplatfida | 1 | 1
Splat | evsplatfibo, evsplatfiho, evsplatfio | 1 | 1
Splat | evsplatfiboa, evsplatfihoa, evsplatfioa | 1 | 1
Splat | evsplatib, evsplatih, evsplati | 1 | 1
Splat | evsplatiba, evsplatiha, evsplatia | 1 | 1
Splat | evsplatid | 1 | 1
Splat | evsplatida | 1 | 1
Splat | evsplatibe, evsplatihe, evsplatie | 1 | 1
Splat | evsplatibea, evsplatihea, evsplatiea | 1 | 1
Swap | evswapbhilo | 1 | 1
Swap | evswapblohi | 1 | 1
Swap | evswaphe | 1 | 1
Swap | evswaphhi | 1 | 1
Swap | evswaphhilo | 1 | 1
Swap | evswaphlo | 1 | 1
Swap | evswaphlohi | 1 | 1
Swap | evswapho | 1 | 1
Unpack | evunpkhibsi, evunpkhibui, evunpklobsi, evunpklobui | 1 | 1
Unpack | evunpkhihf, evunpkhihsi, evunpkhihui, evunpklohf, evunpklohsi, evunpklohui | 1 | 1
Unpack | evunpkhiwgsf, evunpklowgsf | 1 | 1
Extract | evxtrb | 1 | 1
Extract | evxtrd | 1 | 1
Extract | evxtrh | 1 | 1
Insert | evinsb | 1 | 1
Insert | evinsh | 1 | 1
6.3.8
SPE Multiply and Multiply/Accumulate Instruction Timing
Table 6-22 shows instruction timing for SPE multiply and multiply/accumulate instructions.
Table 6-22. SPE Multiply and Multiply/Accumulate Instruction Timing
Instruction | Latency | Throughput
all evm{b,h,w} instructions | 4 | 1
6.3.9
SPE Dot Product Instruction Timing
Table 6-23 shows instruction timing for SPE dot product instructions.
Table 6-23. SPE Dot Product Instruction Timing
Instruction | Latency | Throughput
all evdotp instructions | 4 | 1
6.3.10
SPE Misc. Vector Instruction Timing
Table 6-24 shows instruction timing for SPE miscellaneous instructions.
Table 6-24. SPE Misc. Vector Instruction Timing
Operation | Instruction | Latency | Throughput
load vector for shift | evlvsl | 1 | 1
load vector for shift | evlvsr | 1 | 1
store Accumulator | evmar | 1 | 1
load Accumulator | evmra | 1 | 1
Bit reversed increment | brinc | 1 | 1
6.3.11
SPE Load and Store Instruction Timing
Table 6-25 shows instruction timing for SPE load and store instructions.
Table 6-25. SPE Load and Store Instruction Timing
Instruction
Latency
Throughput
all ev loads
3
1
all ev stores
3
1
Chapter 7
Interrupts and Exceptions
The Power ISA embedded category architecture defines the mechanisms by which the e200 core
implements interrupts and exceptions. The architecture uses the term ‘interrupt’ for the action in
which the processor saves its old context and begins execution at a predetermined interrupt handler
address. An ‘exception’ is an event that, when enabled, causes the processor to take an interrupt.
This chapter uses the same terminology.
The Power ISA embedded category exception mechanism allows the processor to change to supervisor
state as a result of unusual conditions arising in the execution of instructions, and from external signals,
bus errors, or various internal conditions. When interrupts occur, information about the state of the
processor is saved to machine state save/restore registers (SRR0/SRR1, CSRR0/CSRR1,
DSRR0/DSRR1, or MCSRR0/MCSRR1) and the processor begins execution at an address (interrupt vector)
determined by the interrupt vector prefix register (IVPR) and one of the interrupt vector offset registers
(IVOR). Processing of instructions within the interrupt handler begins in supervisor mode.
Multiple exception conditions can map to a single interrupt vector and may be distinguished by examining
registers associated with the interrupt. The exception syndrome register (ESR) is updated with information
specific to the exception type when an interrupt occurs.
To prevent loss of state information, interrupt handlers must save the information stored in the machine
state save/restore registers soon after the interrupt has been taken. Four sets of these registers are
implemented: SRR0 and SRR1 for noncritical interrupts, CSRR0 and CSRR1 for critical interrupts,
DSRR0 and DSRR1 for debug interrupts (when the debug unit is enabled), and MCSRR0 and MCSRR1
for machine check interrupts. Hardware supports nesting of critical interrupts within noncritical interrupts,
machine check interrupts within both critical and noncritical interrupts, and debug interrupts within
critical, noncritical, and machine check interrupts. It is up to the interrupt handler to save necessary state
information if interrupts of a given class are re-enabled within the handler.
The following terms are used to describe the stages of exception processing:
Recognition
    Exception recognition occurs when the condition that can cause an exception is
    identified by the processor. This is also referred to as an exception event.
Taken
    An interrupt is said to be taken when control of instruction execution is passed to
    the interrupt handler; that is, the context is saved and the instruction at the
    appropriate vector offset is fetched and the interrupt handler routine begins.
Handling
    Interrupt handling is performed by the software linked to the appropriate vector
    offset. Interrupt handling is begun in supervisor mode.
Returning from an interrupt is performed by executing an rfi, rfci, rfdi, or rfmci instruction (or se_rfi,
se_rfci, se_rfdi, or se_rfmci VLE instruction) to restore state information from the respective machine
state save/restore register pair.
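The pairing of interrupt class, save/restore registers, and return instruction described above can be summarized as a lookup table. The following is a minimal sketch in C; the enum, struct, and function names are illustrative and do not come from any Freescale header.

```c
#include <stddef.h>

/* Interrupt classes described in this chapter (names are illustrative). */
typedef enum {
    IRQ_NONCRITICAL = 0,
    IRQ_CRITICAL,
    IRQ_DEBUG,            /* DSRR0/DSRR1 apply when the debug unit is enabled */
    IRQ_MACHINE_CHECK,
    IRQ_CLASS_COUNT
} irq_class_t;

typedef struct {
    const char *addr_reg;      /* save/restore register 0 of the pair */
    const char *msr_reg;       /* save/restore register 1 of the pair */
    const char *ret_insn;      /* return instruction */
    const char *vle_ret_insn;  /* VLE form of the return instruction */
} irq_class_info_t;

static const irq_class_info_t irq_class_info[IRQ_CLASS_COUNT] = {
    [IRQ_NONCRITICAL]   = { "SRR0",   "SRR1",   "rfi",   "se_rfi"   },
    [IRQ_CRITICAL]      = { "CSRR0",  "CSRR1",  "rfci",  "se_rfci"  },
    [IRQ_DEBUG]         = { "DSRR0",  "DSRR1",  "rfdi",  "se_rfdi"  },
    [IRQ_MACHINE_CHECK] = { "MCSRR0", "MCSRR1", "rfmci", "se_rfmci" },
};

/* Returns the table entry for a class, or NULL for an out-of-range value. */
const irq_class_info_t *irq_class_lookup(irq_class_t cls)
{
    if ((unsigned)cls >= (unsigned)IRQ_CLASS_COUNT)
        return NULL;
    return &irq_class_info[cls];
}
```

A handler generator or debugger script could use such a table to pick the return instruction that matches the register pair the hardware used on entry.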
7.1
e200 Interrupts
As specified by the Power ISA embedded category architecture, interrupts can be either precise or
imprecise, synchronous or asynchronous, and critical or noncritical. Asynchronous exceptions are caused
by events external to the processor’s instruction execution; synchronous exceptions are directly caused by
instructions or an event somehow synchronous to the program flow, such as a context switch. A precise
interrupt architecturally guarantees that no instruction beyond the instruction causing the exception has
(visibly) executed. Critical interrupts are provided with a separate save/restore register pair
(CSRR0/CSRR1) to allow certain critical exceptions to be handled within a noncritical interrupt handler.
Machine check interrupts are also provided with a separate save/restore register pair (MCSRR0/MCSRR1)
to allow machine check exceptions to be handled within a noncritical or critical interrupt handler.
The types of interrupts handled are shown in Table 7-1. Refer to the “Interrupts and Exceptions” chapter
in the EREF for exact details of each interrupt type.
Table 7-1. Interrupt Classifications
Interrupt Types                        Synchronous/Asynchronous             Precise/Imprecise  Critical/Noncritical/Debug/Machine Check
System reset                           Asynchronous, nonmaskable            Imprecise          —
Machine check                          —                                    —                  Machine check
Nonmaskable input interrupt            Asynchronous, nonmaskable            Imprecise          Machine check
Critical input interrupt               Asynchronous, maskable               Imprecise          Critical
Watchdog timer interrupt               Asynchronous, maskable               Imprecise          Critical
External input interrupt               Asynchronous, maskable               Imprecise          Noncritical
Fixed-interval timer interrupt         Asynchronous, maskable               Imprecise          Noncritical
Decrementer interrupt                  Asynchronous, maskable               Imprecise          Noncritical
Performance monitor interrupts         Synchronous/asynchronous, maskable   Imprecise          Noncritical
Instruction-based debug interrupts     Synchronous                          Precise            Critical/debug
Debug interrupt (UDE)                  Asynchronous                         Imprecise          Critical/debug
Debug imprecise interrupt              Asynchronous                         Imprecise          Critical/debug
Data storage/alignment/TLB interrupts  Synchronous                          Precise            Noncritical
Instruction storage/TLB interrupts     Synchronous                          Precise            Noncritical
These classifications are discussed in greater detail in Section 7.6, “Interrupt Definitions.” Table 7-2 lists
the interrupts implemented in the e200 and the exception conditions that cause them.
Table 7-2. Exceptions and Conditions
Interrupt Type              Interrupt Vector Offset Register  Causing Conditions
System reset                None; vector to                   Reset by assertion of p_reset_b.
                            [p_rstbase[0:29]] || 0b00
Critical input              IVOR0 (note 1)                    p_critint_b is asserted and MSR[CE] = 1.
Machine check               IVOR1                             1. p_mcp_b transitions from negated to asserted
                                                              2. ISI, ITLB error on first instruction fetch for an exception handler
                                                              3. Parity error signaled on cache access
                                                              4. External bus error
Machine check (NMI)         IVOR1                             p_nmi_b transitions from negated to asserted
Data storage                IVOR2                             1. Access control.
                                                              2. Byte ordering due to misaligned access across page boundary to pages with mismatched E bits
                                                              3. Cache locking exception
Instruction storage         IVOR3                             1. Access control.
                                                              2. Byte ordering due to misaligned instruction across page boundary to pages with mismatched VLE bits, or access to page with VLE set, and E indicating little endian.
                                                              3. Misaligned instruction fetch due to a change of flow to an odd half word instruction boundary on a Power ISA (non-VLE) instruction page
External input              IVOR4 (note 1)                    p_extint_b is asserted and MSR[EE] = 1.
Alignment                   IVOR5                             1. lmw, stmw not word aligned
                                                              2. lwarx or stwcx. not word aligned, lharx or sthcx. not half word aligned
                                                              3. dcbz with disabled cache, or to W or I storage
                                                              4. SPE ld and st instructions not properly aligned
Program                     IVOR6                             Illegal, privileged, trap, FP enabled, AP enabled, Unimplemented operation
Floating-point unavailable  IVOR7                             MSR[FP] = 0 and attempt to execute a Power ISA floating point operation
System call                 IVOR8                             Execution of the system call (sc, se_sc) instruction
AP unavailable              IVOR9                             Unused by e200
Decrementer                 IVOR10                            As specified in the EREF, “Timer Facilities” chapter
Fixed interval timer        IVOR11                            As specified in the EREF, “Timer Facilities” chapter
Watchdog timer              IVOR12                            As specified in the EREF, “Timer Facilities” chapter
Data TLB error              IVOR13                            Data translation lookup did not match a valid entry in the TLB
Instruction TLB error       IVOR14                            Instruction translation lookup did not match a valid entry in the TLB
Debug                       IVOR15                            Trap, instruction address compare, data address compare, instruction complete, branch taken, return from interrupt, interrupt taken, debug counter, external debug event, unconditional debug event
Reserved                    IVOR16–IVOR31                     —
SPE/EFPU unavailable        IVOR32                            See Section 5.2.5.1, “EFPU Unavailable Exception.”
exception
EFPU data exception         IVOR33                            See Section 5.2.5.2, “Embedded Floating-point Data Exception.”
EFPU round exception        IVOR34                            See Section 5.2.5.3, “Embedded Floating-point Round Exception.”
Performance monitor         IVOR35                            Performance monitor enabled condition or event
Note 1: Auto-vectored external and critical input interrupts use this IVOR. Vectored interrupts supply an interrupt vector offset directly.
7.2
Exception Syndrome Register
The exception syndrome register (ESR) provides a syndrome to differentiate between exceptions that can
generate the same interrupt type. The e200 adds some implementation-specific bits to this register, as seen
in Figure 7-1.
SPR 62    Access: Read/Write
Bits 0–15:   0–3 —, 4 PIL, 5 PPR, 6 PTR, 7 FP, 8 ST, 9 —, 10 DLK, 11 ILK, 12 AP, 13 PUO, 14 BO, 15 PIE
Bits 16–31:  16–23 —, 24 SPE, 25 —, 26 VLEMI, 27–29 —, 30 MIF, 31 —
Reset: All zeros
Figure 7-1. Exception Syndrome Register (ESR)
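Because this manual numbers bits from the most significant end (bit 0 is the MSB of the 32-bit register), a bit number from the ESR layout has to be converted before it can be used as a C mask. The following sketch shows one way to do that and to decode four of the Program interrupt syndrome bits. The macro and function names are assumptions made for illustration; the bit positions are the ones given for the ESR in this section.

```c
#include <stdint.h>

/* Convert a Power Architecture bit number (0 = MSB) of a 32-bit
   register into a mask. */
#define PPC_BIT32(n)  (UINT32_C(1) << (31 - (n)))

/* ESR fields, using the bit positions listed for the ESR. */
#define ESR_PIL    PPC_BIT32(4)   /* illegal instruction        */
#define ESR_PPR    PPC_BIT32(5)   /* privileged instruction     */
#define ESR_PTR    PPC_BIT32(6)   /* trap                       */
#define ESR_FP     PPC_BIT32(7)   /* floating-point operation   */
#define ESR_ST     PPC_BIT32(8)   /* store operation            */
#define ESR_DLK    PPC_BIT32(10)  /* data cache locking         */
#define ESR_ILK    PPC_BIT32(11)  /* instruction cache locking  */
#define ESR_PUO    PPC_BIT32(13)  /* unimplemented operation    */
#define ESR_BO     PPC_BIT32(14)  /* byte ordering              */
#define ESR_SPE    PPC_BIT32(24)  /* SPE/EFPU operation         */
#define ESR_VLEMI  PPC_BIT32(26)  /* VLE mode instruction       */
#define ESR_MIF    PPC_BIT32(30)  /* misaligned instruction fetch */

/* Hypothetical helper: name the cause of a Program interrupt from a
   saved ESR value. Only PIL, PPR, PTR, and PUO are examined here. */
const char *esr_program_cause(uint32_t esr)
{
    if (esr & ESR_PIL) return "illegal instruction";
    if (esr & ESR_PPR) return "privileged instruction";
    if (esr & ESR_PTR) return "trap";
    if (esr & ESR_PUO) return "unimplemented operation";
    return "unknown";
}
```

On hardware the ESR value would come from reading SPR 62; the sketch takes the value as a plain argument so it can be exercised on a host.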
The ESR bits are defined in Table 7-3.
Table 7-3. ESR Bit Settings
Bits           Name   Description                                 Associated Interrupt Type
0–3 (32–35)    —      Reserved                                    —
4 (36)         PIL    Illegal instruction exception               Program
5 (37)         PPR    Privileged instruction exception            Program
6 (38)         PTR    Trap exception                              Program
7 (39)         FP     Floating-point operation                    Alignment (not on the e200), Data storage (not on the e200), Data TLB (not on the e200), Program
8 (40)         ST     Store operation                             Alignment, Data storage, Data TLB
9 (41)         —      Reserved                                    —
10 (42)        DLK    Data Cache Locking                          Data storage
11 (43)        ILK    Instruction Cache Locking                   Data storage
12 (44)        AP     Auxiliary Processor operation               Alignment (not on the e200), Data storage (not on the e200), Data TLB (not on the e200), Program (not on the e200)
                      (Not used by the e200)
13 (45)        PUO    Unimplemented Operation exception           Program
14 (46)        BO     Byte Ordering exception                     Data storage, Instruction storage
                      Mismatched Instruction Storage exception
15 (47)        PIE    Program Imprecise exception                 Currently unused by the e200
                      (Reserved)
16–23 (48–55)  —      Reserved                                    —
24 (56)        SPE    SPE/EFPU Operation                          SPE/EFPU unavailable, EFPU floating-point data exception, EFPU floating-point round exception, Alignment, Data storage, Data TLB
25 (57)        —      Reserved                                    —
26 (58)        VLEMI  VLE Mode Instruction                        SPE/EFPU unavailable, EFPU floating-point data exception, EFPU floating-point round exception, Data storage, Data TLB, Instruction storage, Alignment, Program, System call
27–29 (59–61)  —      Reserved                                    —
30 (62)        MIF    Misaligned Instruction Fetch                Instruction storage, Instruction TLB
31 (63)        —      Reserved                                    —
7.3
Machine State Register
The machine state register, shown in Figure 7-2, defines the state of the processor.
Access: Read/Write
Bits 0–15:   0–4 —, 5 UCLE, 6 SPE, 7–12 —, 13 WE, 14 CE, 15 —
Bits 16–31:  16 EE, 17 PR, 18 FP, 19 ME, 20 FE0, 21 —, 22 DE, 23 FE1, 24–25 —, 26 IS, 27 DS, 28 —, 29 PMM, 30 RI, 31 —
Reset: All zeros
Figure 7-2. Machine State Register (MSR)
The MSR bits are defined in Table 7-4.
Table 7-4. MSR Bit Settings
Bits          Name  Description
0–4 (32–36)   —     Reserved
5 (37)        UCLE  User Cache Lock Enable
                    0 Execution of the cache locking instructions in user mode (MSR[PR] = 1) disabled; DSI exception taken instead, and ILK or DLK set in ESR.
                    1 Execution of the cache lock instructions in user mode enabled
6 (38)        SPE   SPE/EFPU Available
                    0 Execution of SPE and EFPU vector instructions is disabled; SPE/EFPU unavailable exception taken instead, and SPE bit is set in ESR.
                    1 Execution of SPE and EFPU vector instructions is enabled.
7–12 (39–44)  —     Reserved
13 (45)       WE    Wait State (power management) Enable. This bit is defined as optional in the Power ISA embedded category architecture.
                    0 Power management is disabled
                    1 Power management is enabled. The processor can enter a power-saving mode when additional conditions are present. The mode chosen is determined by the DOZE, NAP, and SLEEP bits in the HID0 register, described in Section 2.4.11, “Hardware Implementation Dependent Register 0 (HID0).”
14 (46)       CE    Critical Interrupt Enable
                    0 Critical input and watchdog timer interrupts are disabled.
                    1 Critical input and watchdog timer interrupts are enabled.
15 (47)       —     Reserved
16 (48)       EE    External Interrupt Enable
                    0 External input, decrementer, and fixed-interval timer interrupts are disabled.
                    1 External input, decrementer, and fixed-interval timer interrupts are enabled.
17 (49)       PR    Problem State
                    0 The processor is in supervisor mode, can execute any instruction, and can access any resource (for example, GPRs, SPRs, MSR, etc.).
                    1 The processor is in user mode, cannot execute any privileged instruction, and cannot access any privileged resource.
18 (50)       FP    Floating-Point Available
                    0 Floating-point unit is unavailable. The processor cannot execute floating-point instructions, including floating-point loads, stores, and moves. (An FP Unavailable interrupt is generated on attempted execution of floating-point instructions.)
                    1 Floating-point unit is available. The processor can execute floating-point instructions.
                    Note that for the e200, the floating-point unit is not supported in hardware, and an Unimplemented Operation exception is generated for attempted execution of Power ISA floating-point instructions when FP is set.
19 (51)       ME    Machine Check Enable
                    0 Asynchronous machine check interrupts are disabled
                    1 Asynchronous machine check interrupts are enabled
20 (52)       FE0   Floating-Point Exception Mode 0 (not used by the e200)
21 (53)       —     Reserved
22 (54)       DE    Debug Interrupt Enable
                    0 Debug interrupts are disabled
                    1 Debug interrupts are enabled
23 (55)       FE1   Floating-Point Exception Mode 1 (not used by the e200)
24 (56)       —     Reserved
25 (57)       —     Reserved
26 (58)       IS    Instruction Address Space
                    0 The processor directs all instruction fetches to address space 0 (TS = 0 in the relevant TLB entry).
                    1 The processor directs all instruction fetches to address space 1 (TS = 1 in the relevant TLB entry).
27 (59)       DS    Data Address Space
                    0 The processor directs all data storage accesses to address space 0 (TS = 0 in the relevant TLB entry).
                    1 The processor directs all data storage accesses to address space 1 (TS = 1 in the relevant TLB entry).
28 (60)       —     Reserved
29 (61)       PMM   Performance Monitor Mark bit. System software can set PMM when a marked process is running to enable statistics to be gathered only during the execution of the marked process. MSR[PR] and MSR[PMM] together define a state that the processor (supervisor or user) and the process (marked or unmarked) may be in at any time. If this state matches an individual state specified in the performance monitor registers PMLCan (the state for which monitoring is enabled), counting is enabled.
30 (62)       RI    Recoverable Interrupt. This bit is provided for software use to detect nested exception conditions. This bit is cleared by hardware when a machine check interrupt is taken.
31 (63)       —     Reserved
7.3.1
Machine Check Syndrome Register (MCSR)
When the processor takes a machine check interrupt, it updates the machine check syndrome register
(MCSR) to differentiate between machine check conditions. The MCSR indicates the source of a machine
check condition. When an async mchk or error report syndrome bit in the MCSR is set, the core complex
asserts p_mcp_out for system information.
All bits in the MCSR are implemented as write one to clear. Software in the machine check handler is
expected to clear the MCSR bits it has sampled prior to re-enabling MSR[ME] to avoid a redundant
machine check exception and to prepare for updated status bit information on the next machine check
interrupt. Hardware does not clear a bit in the MCSR other than at reset. Software typically samples MCSR
early in the machine check handler and uses the sampled value to clear those bits that were set at the time
of sampling. Note that additional bits may become set during the handler after sampling if an asynchronous
event occurs. Because only the originally sampled bits are written back, another machine check can be
generated to process the new conditions after the original handler re-enables MSR[ME], either explicitly
or by restoring the MSR from MCSRR1 at the return.
Note that any set bit in the MCSR other than status-type bits causes a subsequent machine check interrupt
once MSR[ME] = 1.
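The sample-then-clear sequence described above can be illustrated with a small software model of a write-one-to-clear register. This is a sketch for host-side experimentation only: the struct and helper names are hypothetical, and on hardware the read and write would be accesses to the MCSR special-purpose register rather than to a C variable.

```c
#include <stdint.h>

/* Software model of a write-one-to-clear (w1c) register such as the MCSR. */
typedef struct {
    uint32_t bits;
} w1c_reg_t;

/* Models hardware logging a new condition in the register. */
static void w1c_hw_set(w1c_reg_t *r, uint32_t mask)
{
    r->bits |= mask;
}

static uint32_t w1c_read(const w1c_reg_t *r)
{
    return r->bits;
}

/* Writing 1 to a bit position clears it; writing 0 leaves it unchanged. */
static void w1c_write(w1c_reg_t *r, uint32_t value)
{
    r->bits &= ~value;
}

/* Handler step 1: sample the register early in the handler. */
static uint32_t mcsr_sample(const w1c_reg_t *mcsr)
{
    return w1c_read(mcsr);
}

/* Handler step 2: write back exactly the sampled value, so that bits
   set by hardware after the sample remain set. */
static void mcsr_clear_sampled(w1c_reg_t *mcsr, uint32_t sampled)
{
    w1c_write(mcsr, sampled);
}
```

If a second condition is logged between the two steps, its bit is still set after `mcsr_clear_sampled()` returns, which is what allows a further machine check to be taken once MSR[ME] is set again.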
Figure 7-3 shows the MCSR.
SPR 572    Access: w1c
Bits 0–15:   0 MCP, 1 IC_DPERR, 2 CP_PERR, 3 DC_DPERR, 4 EXCP_ERR, 5 IC_TPERR, 6 DC_TPERR, 7 IC_LKERR, 8 DC_LKERR, 9–10 —, 11 NMI, 12 MAV, 13 MEA, 14 —, 15 IF
Bits 16–31:  16 LD, 17 ST, 18 G, 19–25 —, 26 SNPERR, 27 BUS_IRERR, 28 BUS_DRERR, 29 BUS_WRERR, 30–31 —
Reset: All zeros
Figure 7-3. Machine Check Syndrome Register (MCSR)
Table 7-5 describes MCSR fields.
Table 7-5. Machine Check Syndrome Register (MCSR)
Bit            Name        Exception Type (note 1)  Recoverable           Description
0 (32)         MCP         Async Mchk               Maybe                 Machine check input pin
1 (33)         IC_DPERR    Async Mchk               Precise               Instruction Cache data array parity error
2 (34)         CP_PERR     Async Mchk               Unlikely              Data Cache push parity error
3 (35)         DC_DPERR    Async Mchk               Maybe                 Data Cache data array parity error
4 (36)         EXCP_ERR    Async Mchk               Precise               ISI, ITLB, or Bus Error on first instruction fetch for an exception handler
5 (37)         IC_TPERR    Async Mchk               Precise               Instruction Cache Tag parity error
6 (38)         DC_TPERR    Async Mchk               Maybe                 Data Cache Tag parity error
7 (39)         IC_LKERR    Status                   —                     Instruction Cache Lock error. Indicates a cache control operation or invalidation operation invalidated one or more locked lines in the Icache or encountered an uncorrectable lock error, or that an Icache miss with an uncorrectable lock error occurred. May also be set on locked line refill error.
8 (40)         DC_LKERR    Status                   —                     Data Cache Lock error. Indicates a cache control operation or invalidation operation invalidated one or more locked lines in the Dcache or encountered an uncorrectable lock error, or that a Dcache miss with an uncorrectable lock error occurred. May also be set on locked line refill error.
9–10 (41–42)   —           —                        —                     Reserved, should be cleared.
11 (43)        NMI         NMI                      —                     NMI input pin
12 (44)        MAV         Status                   —                     MCAR Address Valid. Indicates that the address contained in the MCAR was updated by hardware to correspond to the first detected Async Mchk error condition.
13 (45)        MEA         Status                   —                     MCAR holds Effective Address. If MAV = 1, MEA = 1 indicates that the MCAR contains an effective address and MEA = 0 indicates that the MCAR contains a physical address.
14 (46)        —           —                        —                     Reserved, should be cleared.
15 (47)        IF          Error Report             Precise               Instruction Fetch Error Report. An error occurred during the attempt to fetch an instruction. This could be due to a parity error or an external bus error. MCSRR0 contains the instruction address.
16 (48)        LD          Error Report             Precise               Load type instruction Error Report. An error occurred during the attempt to execute the load type instruction located at the address stored in MCSRR0. This could be due to a parity error or an external bus error.
17 (49)        ST          Error Report             Precise               Store type instruction Error Report. An error occurred during the attempt to execute the store type instruction located at the address stored in MCSRR0. This could be due to a parity error, or on certain external bus errors.
18 (50)        G           Error Report             Precise               Guarded instruction Error Report. An error occurred during the attempt to execute the load or store type instruction located at the address stored in MCSRR0 and the access was guarded and encountered an error on the external bus.
19–25 (51–57)  —           —                        —                     Reserved, should be cleared.
26 (58)        SNPERR      Async Mchk               Unlikely?             Snoop Lookup Error. An error occurred during certain snoop operations. This is typically due to a data cache tag parity error, in which case DC_TPERR will also be set.
27 (59)        BUS_IRERR   Async Mchk               Precise if data used  Read bus error on instruction fetch or linefill
28 (60)        BUS_DRERR   Async Mchk               Precise if data used  Read bus error on data load or linefill
29 (61)        BUS_WRERR   Async Mchk               Unlikely              Write bus error on store or cache line push
30–31 (62–63)  —           —                        —                     Reserved, should be cleared.
Note 1: The Exception Type indicates the exception type associated with a given syndrome bit as follows:
• Error Report: this bit is only set for error report exceptions which cause machine check interrupts. These bits
are only updated when the machine check interrupt is actually taken. Error report exceptions are not gated by MSR[ME].
These are synchronous exceptions. These bits remain set until cleared by software writing a 1 to the bit position(s) to be
cleared.
• Status: this bit provides additional status information regarding the logging of a machine check exception.
These bits remain set until cleared by software writing a 1 to the bit position(s) to be cleared.
• NMI: this bit is only set for the non-maskable interrupt type exception which causes a machine check interrupt.
This bit is only updated when the machine check interrupt is actually taken. NMI exceptions are not gated by MSR[ME]. This
is an asynchronous exception. This bit remains set until cleared by software writing a 1 to the bit position.
• Async Mchk: this bit is set for an asynchronous machine check exception. These bits are set immediately upon
detection of the error. Once any “Async Mchk” bit is set in the MCSR, a machine check interrupt will occur if MSR[ME] = 1. If
MSR[ME] = 0, the machine check exception will remain pending. These bits remain set until cleared by software writing a 1 to
the bit position(s) to be cleared.
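The four exception types can be expressed as bit masks over the MCSR, which makes the MSR[ME] gating rule for Async Mchk bits easy to state in code. The sketch below assumes the bit assignments listed for the MCSR in this section; the macro and function names are illustrative only.

```c
#include <stdint.h>
#include <stdbool.h>

/* Power Architecture bit numbering: bit 0 is the MSB of a 32-bit register. */
#define PPC_BIT32(n)  (UINT32_C(1) << (31 - (n)))

/* Async Mchk bits: MCP, IC_DPERR, CP_PERR, DC_DPERR, EXCP_ERR, IC_TPERR,
   DC_TPERR (bits 0-6) and SNPERR, BUS_IRERR, BUS_DRERR, BUS_WRERR (26-29). */
#define MCSR_ASYNC_MASK \
    (PPC_BIT32(0) | PPC_BIT32(1) | PPC_BIT32(2) | PPC_BIT32(3) | \
     PPC_BIT32(4) | PPC_BIT32(5) | PPC_BIT32(6) | \
     PPC_BIT32(26) | PPC_BIT32(27) | PPC_BIT32(28) | PPC_BIT32(29))

/* Status bits: IC_LKERR, DC_LKERR, MAV, MEA. */
#define MCSR_STATUS_MASK \
    (PPC_BIT32(7) | PPC_BIT32(8) | PPC_BIT32(12) | PPC_BIT32(13))

/* NMI bit. */
#define MCSR_NMI  PPC_BIT32(11)

/* Error Report bits: IF, LD, ST, G. */
#define MCSR_ERROR_REPORT_MASK \
    (PPC_BIT32(15) | PPC_BIT32(16) | PPC_BIT32(17) | PPC_BIT32(18))

/* True when a set Async Mchk bit causes a machine check interrupt now. */
bool mcsr_async_mchk_taken(uint32_t mcsr, bool msr_me)
{
    return msr_me && (mcsr & MCSR_ASYNC_MASK) != 0;
}

/* True when an Async Mchk condition is logged but held pending
   because MSR[ME] = 0. */
bool mcsr_async_mchk_pending(uint32_t mcsr, bool msr_me)
{
    return !msr_me && (mcsr & MCSR_ASYNC_MASK) != 0;
}
```

Status, NMI, and Error Report bits are deliberately excluded from `MCSR_ASYNC_MASK`: status bits do not cause an interrupt on their own, and NMI and error report exceptions are not gated by MSR[ME].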
7.4
Interrupt Vector Prefix Registers (IVPR)
The interrupt vector prefix register is used during interrupt processing for determining the starting address
of a software handler used to handle an interrupt. The value contained in the vector offset field of the IVOR
selected for a particular interrupt type is concatenated with the value held in the interrupt vector prefix
register (IVPR) to form an instruction address from which execution is to begin. The format of IVPR is
shown in Figure 7-4.
SPR 63    Access: Read/Write
Bits 0–15:   Vector Base
Bits 16–31:  —
Reset: All zeros
Figure 7-4. e200 Interrupt Vector Prefix Register (IVPR)
The IVPR fields are defined in Table 7-6.
Table 7-6. IVPR Register Fields
Bits           Name      Description
0–15 (32–47)   Vec Base  Vector Base. This field is used to define the base location of the vector table, aligned to a 64-KB boundary. This field provides the high-order 16 bits of the location of all interrupt handlers. The contents of the IVORxx register appropriate for the type of exception being processed are concatenated with the IVPR vector base to form the address of the handler in memory.
16–31 (48–63)  —         Reserved
7.5
Interrupt Vector Offset Registers (IVORxx)
The interrupt vector offset registers are used during interrupt processing for determining the starting
address of a software handler used to handle an interrupt. The value contained in the vector offset field of
the IVOR selected for a particular interrupt type is concatenated with the value held in the interrupt vector
prefix register (IVPR) to form an instruction address from which execution is to begin.
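The concatenation can be written as a pair of masks. In this sketch, which uses hypothetical names, the vector base occupies IVPR[0:15] (the upper 16 bits) and the vector offset occupies IVOR[16:27]; the low four bits of the result are zero, so every handler starts on a 16-byte boundary.

```c
#include <stdint.h>

#define IVPR_VECBASE_MASK  UINT32_C(0xFFFF0000)  /* IVPR[0:15]  */
#define IVOR_OFFSET_MASK   UINT32_C(0x0000FFF0)  /* IVOR[16:27] */

/* Forms IVPR[0:15] || IVOR[16:27] || 0b0000. Reserved bits in either
   register are ignored. */
uint32_t interrupt_vector_address(uint32_t ivpr, uint32_t ivor)
{
    return (ivpr & IVPR_VECBASE_MASK) | (ivor & IVOR_OFFSET_MASK);
}
```

For example, a vector base of 0x4000 in IVPR and an IVOR value of 0x0120 give a handler address of 0x40000120.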
Figure 7-5 shows the format of the e200 IVORs.
SPR 400–415, 528–530    Access: Read/Write
Bits 0–15:   —
Bits 16–27:  Vector Offset
Bits 28–31:  —
Figure 7-5. e200 Interrupt Vector Offset Register (IVOR)
The IVOR fields are defined in Table 7-7.
Table 7-7. IVOR Register Fields
Bits           Name           Description
0–15 (32–47)   —              Reserved
16–27 (48–59)  Vector Offset  Vector Offset. This field is used to provide a quad-word index from the base address provided by the IVPR to locate an interrupt handler.
28–31 (60–63)  —              Reserved
7.6
Interrupt Definitions
This section provides detailed descriptions of the interrupts listed in Table 7-8.
Table 7-8. Interrupts
IVOR Number  Type of Interrupt                Section
IVOR0        Critical Input                   7.6.1
IVOR1        Machine Check                    7.6.2
IVOR2        Data Storage                     7.6.3
IVOR3        Instruction Storage              7.6.4
IVOR4        External Input                   7.6.5
IVOR5        Alignment                        7.6.6
IVOR6        Program                          7.6.7
IVOR7        Floating-Point Unavailable       7.6.8
IVOR8        System Call                      7.6.9
IVOR9        Auxiliary Processor Unavailable  7.6.10
IVOR10       Decrementer                      7.6.11
IVOR11       Fixed-Interval Timer             7.6.12
IVOR12       Watchdog Timer                   7.6.13
IVOR13       Data TLB Error                   7.6.14
IVOR14       Instruction TLB Error            7.6.15
IVOR15       Debug                            7.6.16
IVOR16       System Reset                     7.6.17
IVOR32       SPE/EFPU Unavailable             7.6.18
IVOR33       Embedded Floating-Point Data     7.6.19
IVOR34       Embedded Floating-Point Round    7.6.20
IVOR35       Performance Monitor              7.6.21
7.6.1
Critical Input Interrupt (IVOR0)
A critical input exception is signaled to the processor by the assertion of the critical interrupt pin
(p_critint_b). When the e200 detects the exception, it takes the critical input interrupt if the exception is
enabled by MSR[CE]. The p_critint_b input is a level-sensitive signal expected to remain asserted until
the e200 acknowledges the interrupt. If p_critint_b is negated early, recognition of the interrupt request is
not guaranteed. After the e200 begins execution of the critical interrupt handler, the system can safely
negate p_critint_b.
A critical input interrupt may be delayed by other higher priority exceptions or if MSR[CE] is cleared
when the exception occurs.
Table 7-9 lists register settings when a critical input interrupt is taken.
Table 7-9. Critical Input Interrupt—Register Settings
Register  Setting Description
CSRR0     Set to the effective address of the instruction that the processor would have attempted to execute next if no exception conditions were present.
CSRR1     Set to the contents of the MSR at the time of the interrupt
MSR       UCLE 0, SPE 0, WE 0, CE 0, EE 0, PR 0, FP 0, ME —, FE0 0, DE —/0 (note 1),
          FE1 0, IS 0, DS 0, PMM 0, RI —
ESR       Unchanged
MCSR      Unchanged
DEAR      Unchanged
Vector    IVPR[0:15] || IVOR0[16:27] || 0b0000 (auto-vectored)
          IVPR[0:15] || p_voffset[0:11] || 0b0000 (non-auto-vectored)
Note 1: DE is cleared when the debug unit is disabled. When the debug unit is enabled, control in HID0 optionally supports the clearing
of DE.
When the debug unit is enabled, MSR[DE] is not automatically cleared by a critical input interrupt, but
can be configured to be cleared via the HID0 register (HID0[CICLRDE]). Refer to Section 2.4.11,
“Hardware Implementation Dependent Register 0 (HID0).”
IVOR0 is the vector offset register used by auto-vectored critical input interrupts to determine the interrupt
handler location. e200 also provides the capability to directly vector critical input interrupts to multiple
handlers by allowing a critical input interrupt request to be accompanied by a vector offset. The
p_voffset[0:11] input signals are used in place of the value in IVOR0 to form the interrupt vector when a
critical input interrupt request is not auto-vectored (p_avec_b negated when p_critint_b asserted).
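The register updates for a critical input interrupt can be collected into a small behavioral model. The sketch below is an illustration under stated assumptions, not a description of the hardware implementation: it reads the “—” entries for ME and RI in Table 7-9 as “unchanged”, treats reserved MSR bits as zero after the interrupt, and uses hypothetical names for the struct, function, and parameters.

```c
#include <stdint.h>
#include <stdbool.h>

/* Power Architecture bit numbering: bit 0 is the MSB of a 32-bit register. */
#define PPC_BIT32(n)  (UINT32_C(1) << (31 - (n)))
#define MSR_ME  PPC_BIT32(19)
#define MSR_DE  PPC_BIT32(22)
#define MSR_RI  PPC_BIT32(30)

typedef struct {
    uint32_t msr;
    uint32_t csrr0;
    uint32_t csrr1;
    uint32_t ivpr;
    uint32_t ivor0;
    uint32_t pc;     /* address of the next instruction to fetch */
} core_model_t;

/* Applies the Table 7-9 settings.
   next_insn_addr:      address the processor would have executed next
   auto_vectored:       true when p_avec_b is asserted with p_critint_b
   p_voffset:           12-bit offset supplied with a vectored request
   debug_unit_enabled,
   hid0_ciclrde:        select whether MSR[DE] is preserved */
void take_critical_input(core_model_t *c, uint32_t next_insn_addr,
                         bool auto_vectored, uint32_t p_voffset,
                         bool debug_unit_enabled, bool hid0_ciclrde)
{
    uint32_t keep = MSR_ME | MSR_RI;   /* assumed unchanged */
    uint32_t offset;

    c->csrr0 = next_insn_addr;
    c->csrr1 = c->msr;                 /* MSR at the time of the interrupt */

    /* DE survives only with the debug unit enabled and CICLRDE clear. */
    if (debug_unit_enabled && !hid0_ciclrde)
        keep |= MSR_DE;
    c->msr &= keep;

    /* IVOR0[16:27] or p_voffset[0:11], followed by 0b0000. */
    offset = auto_vectored ? (c->ivor0 & UINT32_C(0x0000FFF0))
                           : ((p_voffset & UINT32_C(0xFFF)) << 4);
    c->pc = (c->ivpr & UINT32_C(0xFFFF0000)) | offset;
}
```

With IVPR = 0x40000000, a vectored request carrying p_voffset = 0x123 starts the handler at 0x40001230, whereas an auto-vectored request uses the offset held in IVOR0.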
7.6.2
Machine Check Interrupt (IVOR1)
The e200 implements the machine check exception as defined in the Freescale EIS machine check unit
except for automatic clearing of MSR[DE]. This behavior is different from the definition in Power ISA
embedded category architecture. The e200 initiates a machine check interrupt if any of the machine check
sources listed in Table 7-2 is detected.
As defined in Freescale EIS machine check unit, a machine check interrupt is taken for error report and
NMI type machine check conditions, even if MSR[ME] is cleared, without the processor generating an
internal checkstop condition. MSR[ME] gates the processing of asynchronous type machine check sources
(the sources reflected in the MCSR async mchk syndrome bits).
The Freescale EIS machine check unit defines a separate set of save/restore registers
(MCSRR0–MCSRR1), a machine check syndrome register (MCSR) to record the source(s) of machine
checks, and a machine check address register (MCAR) to hold an address associated with a machine check
for certain classes of machine checks. Return from machine check instructions (rfmci, se_rfmci) are also
provided to support returns using MCSRR0–MCSRR1.
The MSR[RI] status bit is provided for software use in determining if multiple nested machine check
exceptions have occurred. Software may interrogate MCSRR1[RI] to determine whether a machine check
occurred during the initial portion of a machine check handler prior to the handler code that sets MSR[RI]
to one to indicate that the handler can now tolerate another machine check condition without losing state
necessary for recovery.
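The use of MSR[RI] can be illustrated with a minimal model that tracks only the RI-related state. It assumes, as for the other save/restore register pairs, that MCSRR1 receives the MSR value at the time the machine check is taken; the struct and function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define MSR_RI  (UINT32_C(1) << (31 - 30))   /* MSR bit 30 */

typedef struct {
    uint32_t msr;
    uint32_t mcsrr1;
} mc_model_t;

/* Taking a machine check: the old MSR is saved and hardware clears RI. */
void take_machine_check(mc_model_t *m)
{
    m->mcsrr1 = m->msr;
    m->msr &= ~MSR_RI;
}

/* Called by the handler once it has saved MCSRR0/MCSRR1, to signal that
   another machine check can now be tolerated. */
void handler_mark_recoverable(mc_model_t *m)
{
    m->msr |= MSR_RI;
}

/* MCSRR1[RI] = 0 means the interrupted code was itself in the initial,
   unprotected portion of a machine check handler. */
bool interrupted_state_recoverable(const mc_model_t *m)
{
    return (m->mcsrr1 & MSR_RI) != 0;
}
```

If a second machine check arrives before the first handler calls `handler_mark_recoverable()`, the second handler sees MCSRR1[RI] = 0 and knows that the state saved for the first machine check has been overwritten.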
MSR[DE] is not automatically cleared by a machine check exception, but can be configured to be cleared
or left unchanged via the HID0 register (HID0[MCCLRDE]). Refer to Section 2.4.11, “Hardware
Implementation Dependent Register 0 (HID0).”
7.6.2.1
Machine Check Causes
Machine check causes are divided into different types, as follows:
• Error report machine check conditions
• Nonmaskable interrupt (NMI) machine check exceptions
• Asynchronous machine check exceptions
This division is intended to facilitate machine check handling in uniprocessor, multiprocessor, and
multithreaded systems. Although the initial implementation of the e200z7 does not implement
multithreading, future versions are expected to, and the machine check model will remain compatible. In
addition, the model is equally applicable to a single-threaded design.
7.6.2.1.1
Error Report Machine Check Exceptions
Error report machine check exceptions are directly associated with the current instruction execution stream
and are presented to the interrupt mechanism in a manner analogous to an instruction storage or data
storage interrupt. Since the execution stream cannot continue execution without suffering from corruption
of architectural state, these exceptions are not masked by MSR[ME]. Error report machine check
exceptions are not necessarily recoverable if they occur during the initial portion of a machine check
handler. MSR[RI] and MCSRR1[RI] are provided to assist software in determining recoverability.
For error report machine check exceptions, the MCSR (machine check syndrome register) is updated only
when the machine check interrupt is actually taken. The MCAR is not updated for error report machine
check exceptions.
e200z7 Power Architecture Core Reference Manual, Rev. 2
Freescale Semiconductor
7-15
Interrupts and Exceptions
Error report machine check exceptions encountered by program execution can be flushed if an older
exception exists or if an asynchronous interrupt or machine check is taken before the instruction that
encountered the error becomes the oldest instruction in the machine. In this case, the corresponding MCSR
bit is not set due to the flushed exception condition (although the corresponding bit may have already been
set by a previous instruction’s exception).
Note that an async machine check condition may occur for the same error condition prior to the error report
machine check. The error report machine check may be discarded.
Depending on the type of error, hardware sets the MCSR IF, LD, G, or ST bit(s) to reflect the error being
reported. Software is responsible for clearing these syndrome bits by writing a one to the bit(s) to be
cleared. Hardware does not clear an error report bit once it is set. The bits are set as follows:
• MCSR[IF] is set if the error occurs during an instruction fetch
• MCSR[LD] is set if the error occurs for a load instruction. If the error occurs for a guarded load
and the error source was from the external bus, MCSR[G] is also set.
• MCSR[ST] is set if the error occurs in the data cache (parity) or MMU (DTLB error or DSI) for a
store type instruction (including dcbz), if an external termination error is received on a
cache-inhibited guarded store or on a store conditional instruction, or if an unsuccessful flush with
invalidation occurs on a store conditional instruction due to a tag or data parity error or external
bus error. If an external termination error occurs on a cache-inhibited guarded store or on a guarded
store conditional, MCSR[G] is also set.
Note that most (if not all) error report machine check exceptions are accompanied by an associated
asynchronous machine check exception on a single-threaded e200z7, although this is not generally the
case for a multithreaded version.
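The write-one-to-clear behavior of the error report syndrome bits described above can be sketched in C as follows. This is a minimal model, not code from this manual: the bit masks are hypothetical placeholders (the real positions must be taken from the MCSR register description), and the helper names are invented for illustration.

```c
#include <stdint.h>

/* HYPOTHETICAL masks for illustration only. Take the real bit positions
 * from the MCSR register description before using this on hardware. */
#define MCSR_IF (1u << 3)
#define MCSR_LD (1u << 2)
#define MCSR_ST (1u << 1)
#define MCSR_G  (1u << 0)
#define MCSR_ERROR_REPORT_BITS (MCSR_IF | MCSR_LD | MCSR_ST | MCSR_G)

/* Value to write to the MCSR to clear every error report bit that is
 * currently set. Bits written as zero are left untouched. */
static inline uint32_t mcsr_error_report_clear_value(uint32_t mcsr)
{
    return mcsr & MCSR_ERROR_REPORT_BITS;
}

/* Software model of a write-one-to-clear register write: each one in
 * 'value' clears the corresponding bit of 'reg'. */
static inline uint32_t w1c_write(uint32_t reg, uint32_t value)
{
    return reg & ~value;
}
```

A handler would typically read the MCSR once, act on the bits it saw, and then write back only those bits, so that a syndrome bit set in the meantime is not lost.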
Table 7-10 shows the error report machine check exceptions.
Table 7-10. Error Report Machine Check Exceptions
Synchronous Machine Check Source | Error Type | MCSR Updates | Precise (note 1)
Instruction Fetch | (Icache tag array parity error or data array parity error) and L1CSR1[ICEA] = 00 | IF | Yes
Instruction Fetch | (Icache uncorrectable tag array parity error or data array parity error) and L1CSR1[ICEA] = 01 and line potentially locked (locked or lock parity error) was invalidated | IF | Yes
Instruction Fetch | Cacheable miss and L1CSR1[ICEA] = 00 and any line with lock parity error | IF | Yes
Instruction Fetch | Cacheable miss and L1CSR1[ICEA] = 01 and line with uncorrectable lock parity error was invalidated | IF | Yes
Instruction Fetch | External termination error | IF | Yes
Load instruction | (Dcache tag array parity error or data array parity error) and L1CSR0[DCEA] = 00 | LD | Yes
Load instruction | (Dcache uncorrectable tag array parity error or data array parity error) and L1CSR0[DCEA] = 01 and (line potentially locked (locked or lock parity error) was invalidated, or line potentially dirty (dirty or dirty parity error)) | LD | Yes
Load instruction | Cacheable miss and L1CSR0[DCEA] = 00 and any line with lock parity error, or dirty parity error on replacement line | LD | Yes
Load instruction | Cacheable miss and L1CSR0[DCEA] = 01 and line with uncorrectable lock parity error was invalidated | LD | Yes
Load instruction | External termination error on load data | LD, [G] (note 2) | Yes
Load and reserve instruction | Dcache tag array parity error and L1CSR0[DCEA] = 00 | LD | Yes
Load and reserve instruction | Dcache hit and dirty parity error and L1CSR0[DCEA] = 00 | LD | Yes
Load and reserve instruction | (Dcache uncorrectable tag array parity error or data array parity error) and L1CSR0[DCEA] = 01 and line potentially dirty (dirty or dirty parity error) | LD | Yes
Load and reserve instruction | Dcache data push parity error (note 3) | LD | Yes
Load and reserve instruction | External termination error on dirty push (note 3) | LD | Yes
Load and reserve instruction | External termination error on load | LD, [G] (note 2) | Yes
Store instruction | Dcache tag array parity error and L1CSR0[DCEA] = 00 | ST | Yes
Store instruction | Dcache uncorrectable tag array parity error and L1CSR0[DCEA] = 01 and (line potentially locked (locked or lock parity error) was invalidated, or line potentially dirty (dirty or dirty parity error)) | ST | Yes
Store instruction | Cacheable miss and L1CSR0[DCEA] = 00 and any line with lock parity error, or dirty parity error on replacement line | ST | Yes
Store instruction | Cacheable miss and L1CSR0[DCEA] = 01 and line with uncorrectable lock parity error was invalidated | ST | Yes
Store instruction | External termination error on unbuffered store (note 4) | ST, [G] (note 7) | Yes
Store instruction | External termination error on CI+G store (note 5) | ST, G | Yes
Store conditional instruction | Dcache tag array parity error and L1CSR0[DCEA] = 00 | ST | Yes
Store conditional instruction | Dcache hit and dirty parity error and L1CSR0[DCEA] = 00 | ST | Yes
Store conditional instruction | Dcache uncorrectable tag array parity error and L1CSR0[DCEA] = 01 and line potentially dirty (dirty or dirty parity error) | ST | Yes
Store conditional instruction | Dcache data push parity error (note 6) | ST | Yes
Store conditional instruction | External termination error on dirty push (note 6) | ST | Yes
Store conditional instruction | External termination error on store conditional | ST, [G] (note 7) | Yes
dcbst instruction | Dcache tag array parity error and miss and L1CSR0[DCEA] = 00 and any line with error is potentially dirty (dirty or dirty parity error) | LD | Yes
dcbst instruction | Dcache uncorrectable tag array parity error and cacheable miss and L1CSR0[DCEA] = 01 and line potentially dirty (dirty or dirty parity error) | LD | Yes
dcbf instruction | Dcache tag array parity error and miss and L1CSR0[DCEA] = 00 and (line potentially locked (locked or lock parity error) or line potentially dirty (dirty or dirty parity error)) | LD | Yes
dcbf instruction | Dcache uncorrectable tag array parity error and miss and L1CSR0[DCEA] = 01 and (line potentially locked (locked or lock parity error) or line potentially dirty (dirty or dirty parity error)) | LD | Yes
dcblc instruction | Dcache tag array parity error and cacheable miss and L1CSR0[DCEA] = 00 and line potentially locked (locked or lock parity error) | LD | Yes
dcblc instruction | Dcache uncorrectable tag array parity error and cacheable miss and L1CSR0[DCEA] = 01 and line potentially locked (locked or lock parity error) | LD | Yes
dcbtls, dcbtstls instruction | (Dcache tag array parity error or lock error) and miss and L1CSR0[DCEA] = 00 | LD | Yes
dcbtls, dcbtstls instruction | Dcache uncorrectable tag array parity error and cacheable miss and L1CSR0[DCEA] = 01 and (line potentially locked (locked or lock parity error) was invalidated, or line potentially dirty (dirty or dirty parity error)) | LD | Yes
dcbtls, dcbtstls instruction | Cacheable miss and L1CSR0[DCEA] = 00 and any line with lock parity error, or dirty parity error on replacement line | LD | Yes
dcbtls, dcbtstls instruction | Cacheable miss and L1CSR0[DCEA] = 01 and line with uncorrectable lock parity error was invalidated | LD | Yes
dcbtls, dcbtstls instruction | External termination error on linefill | LD, [G] (note 2) | Yes
dcbz instruction (note 8) | (Dcache tag array parity error or lock error) and cacheable miss and L1CSR0[DCEA] = 00 | ST | Yes
dcbz instruction (note 8) | Dcache uncorrectable tag array parity error and cacheable miss and L1CSR0[DCEA] = 01 and (line potentially locked (locked or lock parity error) was invalidated, or line potentially dirty (dirty or dirty parity error)) | ST | Yes
dcbz instruction (note 8) | Cacheable miss and L1CSR0[DCEA] = 00 and any line with lock parity error, or dirty parity error on replacement line | ST | Yes
dcbz instruction (note 8) | Cacheable miss and L1CSR0[DCEA] = 01 and line with uncorrectable lock parity error was invalidated | ST | Yes
L1FINV0 flush or flush with invalidate operation | Dcache tag parity error and L1CSR0[DCEA] = 00 and line potentially dirty (dirty or dirty parity error); Dcache uncorrectable tag parity error and L1CSR0[DCEA] = 01 and line potentially dirty (dirty or dirty parity error) | LD | Yes
icblc instruction | Icache tag array parity error and cacheable miss and L1CSR1[ICEA] = 00 and line potentially locked (locked or lock parity error) | IF | Yes
icblc instruction | Icache uncorrectable tag array parity error and cacheable miss and L1CSR1[ICEA] = 01 and line potentially locked (locked or lock parity error) was invalidated | IF | Yes
icbtls instruction | (Icache tag array parity error or lock error) and cacheable miss and L1CSR1[ICEA] = 00 | IF | Yes
icbtls instruction | Icache uncorrectable tag array parity error and cacheable miss and L1CSR1[ICEA] = 01 and line potentially locked (locked or lock parity error) was invalidated | IF | Yes
icbtls instruction | External termination error on linefill | IF | Yes
Exception Vectoring | ISI, ITLB, or Bus Error on first instruction fetch for an exception handler | IF | Yes
Notes:
1 MCSRR0 will point to the instruction associated with the machine check condition.
2 G will be set if the load was a guarded load.
3 Can only occur if the load and reserve causes a dirty line to be flushed.
4 Store may be unbuffered if the store buffer is disabled and the store is not allocating a cache line.
5 Only reported if the store was a cache-inhibited guarded store.
6 Can only occur if the store conditional causes a dirty line to be flushed.
7 Only reported if the store was a guarded store.
8 Alignment error may be generated concurrently.
7.6.2.1.2
Nonmaskable Interrupt Machine Check Exceptions
Nonmaskable interrupt exceptions are reported via the p_nmi_b input pin, which is transition sensitive.
MSR[ME] does not gate NMI exceptions, thus they are not necessarily recoverable if an NMI exception
occurs during the initial part of a machine check exception handler. MSR[RI] and MCSRR1[RI] assist
software in determining recoverability.
For NMI machine check exceptions, MCSR[NMI] is updated (set) only when the machine check interrupt
is actually taken. Hardware does not clear the MCSR[NMI] syndrome bit once it is set; software is
responsible for clearing it by writing a one to the bit.
The MCAR is not updated for NMI machine check exceptions.
7.6.2.1.3
Asynchronous Machine Check Exceptions
The remainder of machine check exceptions are classified as asynchronous machine check exceptions, as
they are reported directly by the subsystem or resource which detected the condition. For many cases, the
asynchronous condition is reported simultaneously with a corresponding error report condition. These
conditions are reported by immediately setting the corresponding MCSR async mchk syndrome bit,
regardless of the state of MSR[ME]. Interrupts due to asynchronous machine check exceptions are gated
by MSR[ME]. If MSR[ME] = 0 at the time an async mchk bit becomes set, the interrupt is postponed until
MSR[ME] is later set (although a machine check interrupt may occur at the time of the event due to an
error report exception). Asynchronous events are cumulative: hardware does not clear an async mchk
syndrome bit once it is set. Software is responsible for clearing these syndrome bits by writing a one to
the bit(s) to be cleared.
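The gating described above (syndrome bits are logged immediately, while the interrupt itself waits for MSR[ME]) can be illustrated with a small software model. This is a sketch under stated assumptions: the structure and function names are hypothetical, and only the async mchk bits of the MCSR are modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model state: only the async mchk syndrome bits and
 * MSR[ME] are represented. */
typedef struct {
    uint32_t mcsr_async; /* accumulated async mchk syndrome bits */
    bool     msr_me;     /* current value of MSR[ME] */
} mchk_model_t;

/* An asynchronous condition sets its syndrome bit immediately,
 * regardless of MSR[ME]. Bits accumulate until software clears them. */
static inline void async_mchk_log(mchk_model_t *m, uint32_t syndrome_bit)
{
    m->mcsr_async |= syndrome_bit;
}

/* The interrupt is taken only while MSR[ME] = 1 and at least one async
 * mchk syndrome bit is set; otherwise it stays postponed. */
static inline bool async_mchk_interrupt_taken(const mchk_model_t *m)
{
    return m->msr_me && (m->mcsr_async != 0u);
}
```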
If MCSR[MAV] is cleared at the time an asynchronous machine check exception occurs that has a
corresponding address (either an effective or real address) to log in the MCAR, the MCAR and
MCSR[MEA] are updated, and MCSR[MAV] is set. If MCSR[MAV] was previously set, the MCAR and
MCSR[MEA] are not affected.
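The first-error-wins rule for the MCAR can be sketched as follows. The type and function names are hypothetical, and the polarity of MEA (true meaning that the logged address is an effective address) is an assumption made for this example; check the MCSR register description for the actual encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the MCAR and the two related MCSR bits. */
typedef struct {
    uint32_t mcar; /* logged address */
    bool     mav;  /* MCSR[MAV]: MCAR holds a valid address */
    bool     mea;  /* MCSR[MEA]: assumed true for an effective address */
} mcar_log_t;

/* Log an address for an asynchronous machine check. If MAV is already
 * set, the earlier address is preserved and nothing changes. */
static inline void mcar_log_address(mcar_log_t *log, uint32_t addr,
                                    bool is_effective)
{
    if (log->mav) {
        return;
    }
    log->mcar = addr;
    log->mea  = is_effective;
    log->mav  = true;
}
```

In this model the handler must clear MAV before a later error address can be captured.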
Table 7-11 details all asynchronous machine check sources.
Table 7-11. Asynchronous Machine Check Exceptions
Asynchronous
Machine Check
Source
Transaction Source
Error Type
MCSR Update1
MCAR
Update2
External
N/A
Machine Check Input Pin3
MCP
None
Instruction Cache
Instruction Fetch
Tag array parity error and
L1CSR1[ICEA] = 00
Data Cache
MAV
IC_TPERR
RA
Icache hit, data array parity
error and
L1CSR1[ICEA] = 00
IC_DPERR
RA
Icache cacheable miss, lock
error, and
L1CSR1[ICEA] = 00
IC_TPERR,
IC_LKERR
RA
L1CSR1[ICEA] = 01 and
auto-invalidation of locked or
potentially locked line due to
uncorrectable tag parity error
IC_TPERR,
IC_LKERR
RA
icblc
Tag array parity error and
cacheable miss and
L1CSR1[ICEA] = 00 and line
potentially locked (locked or
lock parity error)
IC_TPERR,
[IC_LKERR (if lock
parity error)]
RA
icbtls
(Tag array parity error or lock
error) and cacheable miss
and L1CSR1[ICEA] = 00
IC_TPERR,
[IC_LKERR (if lock
parity error)]
RA
icblc
icbtls
L1CSR1[ICEA] = 01 and
Auto-invalidation of locked
line due to uncorrectable tag
parity error
IC_TPERR,
IC_LKERR
RA
dcblc
Tag array parity error and
cacheable miss and
L1CSR0[DCEA] = 00 and
line potentially locked (lock or
lock parity error)
DC_TPERR,
[DC_LKERR (if
lock parity error)]
RA
MAV
Data Cache
load or store
Tag array parity error and
L1CSR0[DCEA] = 00
L1FINV0 flush or flush
w/inv & line dirty or
potentially dirty
MAV
DC_TPERR,
[DC_LKERR (if
lock parity error on
line with tag parity
error)]
RA
Tag array parity error and
L1CSR0[DCEA] = 00
DC_TPERR
RA
dcbtls
dcbtstls
dcbz
Tag array parity error and
cacheable miss and
L1CSR0[DCEA] = 00
DC_TPERR
RA
dcbf
Tag array parity error and
miss and
L1CSR0[DCEA] = 00 and
(line potentially locked
(locked or lock parity error) or
line potentially dirty (dirty or
dirty parity error))
DC_TPERR,
[DC_LKERR (if
lock parity error)]
RA
atomic load or store
Hit and L1CSR0[DCEA] = 00
and line has dirty parity error
DC_TPERR
RA
dcbst,
atomic load or store
Tag array parity error and
miss and
L1CSR0[DCEA] = 00 and
line potentially dirty (dirty or
dirty parity error)
DC_TPERR,
[DC_LKERR (if
lock parity error)]
RA
load or store
dcbtls
dcbtstls
dcbz
Dcache cacheable miss and
L1CSR0[DCEA] = 00 and
lock parity error
DC_TPERR,
DC_LKERR
RA
load or store
dcbtls
dcbtstls
dcbz
Dcache cacheable miss and
L1CSR0[DCEA] = 00 and
dirty parity error on line to be
replaced
DC_TPERR
RA
load or store
dcbtls
dcbtstls
dcbz
Dcache uncorrectable tag
array parity error and
L1CSR0[DCEA] = 01 and
(line potentially locked
(locked or lock parity error)
was invalidated, or line
potentially dirty (dirty or dirty
parity error))
DC_TPERR,
[DC_LKERR]
RA
Data Cache
L1FINV0 flush w/inv
Dcache uncorrectable tag
array parity error and
L1CSR0[DCEA] = 01 and
line potentially dirty (dirty or
dirty parity error))
dcblc
DC_TPERR
RA
Dcache uncorrectable tag
array parity error and
L1CSR0[DCEA] = 01 and
(line potentially locked
(locked or lock parity error)
was invalidated
DC_TPERR,
[DC_LKERR]
RA
dcbst,
atomic load or store
Dcache uncorrectable tag
array parity error and
L1CSR0[DCEA] = 01 and
line potentially dirty (dirty or
dirty parity error)
DC_TPERR,
[DC_LKERR (if
uncorrectable lock
parity error)]
RA
dcbf
Dcache uncorrectable tag
array parity error and
L1CSR0[DCEA] = 01 and
(line potentially locked
(locked or lock parity error) or
line potentially dirty (dirty or
dirty parity error))
DC_TPERR,
[DC_LKERR (if
uncorrectable lock
parity error)]
RA
L1FINV0 flush
Dcache uncorrectable tag
array parity error and
L1CSR0[DCEA] = 01 and
line potentially dirty (dirty or
dirty parity error)
DC_TPERR
RA
load
Dcache hit, data array parity
error and
L1CSR0[DCEA] = 00
DC_DPERR
RA
Dcache hit, data array parity
error and
L1CSR0[DCEA] = 01 and
line potentially dirty (dirty or
dirty parity error)
DC_DPERR
RA
Data array push parity error
CP_PERR
RA
DC_TPERR,
SNPERR
RA
(snoop
address)
replacement push
dcbf push
dcbst push
L1FINV0 push
MAV
reservation instruction
forced-push
Data Cache
snoop lookup
Tag array parity error and
(cacheable miss, or hit only
to way with tag parity error)
MAV
BIU
store or push
Bus error on write or push
load store/w allocate
dcbtls
dcbtstls
MAV
BUS_WRERR
RA
Bus error on load fetch or
linefill
BUS_DRERR
RA
load
Bus error on error recovery
refill
BUS_DRERR
RA
instruction fetch
Bus error on error recovery
refill
BUS_IRERR
RA
icbtls
CI or cache disabled
Ifetch
Bus error on icbtls fill
Bus error on CI Ifetch
Bus error on cache disabled
Ifetch
BUS_IRERR
RA
load
Bus error on locked line error
recovery refill
BUS_DRERR,
DC_LKERR
RA
instruction fetch
Bus error on locked line error
recovery refill
BUS_IRERR,
IC_LKERR
RA
Snoop Lookup
INV snoop command
type
Tag array parity error and
(miss, or hit only to way with
tag parity error)
MAV
SNPERR,
DC_TPERR
RA4
Exception Vectoring
first instruction fetch
for an exception
handler
ISI or Bus Error on first
instruction fetch for an
exception handler
MAV
EXCP_ERR
RA
first instruction fetch
for an exception
handler
ITLB Error on first instruction
fetch for an exception handler
MAV
EXCP_ERR
EA
1 The MCSR update column indicates which bits in the MCSR will be updated when the exception is logged.
2 The MCAR update column indicates whether the error provides a real address (RA), an effective address
(EA), or no address (none) associated with the error.
3 The machine check input pin is used by the platform logic to indicate machine check type errors which are detected by
the platform. Software must query error logging information within the platform logic to determine the specific error
condition and source.
4 The RA stored in the MCAR for this case will be the snoop address value, with the index bits set to 0.
Table 7-12 details the priority of asynchronous machine check updates to the MCAR when multiple
simultaneous async machine check conditions occur. Note that since a lower priority condition may occur
and then a higher priority condition may subsequently occur prior to the machine check interrupt handler
reading the MCSR and MCAR, the interrupt handler may not necessarily see the higher priority MCAR
value, even though multiple MCSR bits are set.
Table 7-12. Asynchronous Machine Check MCAR Update Priority
Priority
(0 = highest)
Asynchronous
Machine Check
Source
0
Exception Vectoring
Transaction Source
Error Type
(MCSR Update)
first instruction fetch for
an exception handler
ISI or Bus Error on first
instruction fetch for an
exception handler
EXCP_ERR
first instruction fetch for
an exception handler
ITLB Error on first instruction
fetch for an exception handler
EXCP_ERR
replacement push
dcbf push
dcbst push
L1FINV0 push
reservation-type
instruction forced push
Dirty push parity error
1
Data Cache
2
BIU
store or push
Bus error on write or push
3
Data Cache
load or store
dcblc
dcbtls
dcbtstls
dcbz
Uncorrectable tag array parity
error and L1CSR0[DCEA] = 01
and locked line invalidated
DC_TPERR,
DC_LKERR
4
Instruction Cache
icblc
icbtls
instruction fetch
Uncorrectable tag array parity
error, L1CSR1[ICEA] = 01, and
locked line invalidated
IC_TPERR,
IC_LKERR
5
BIU
load
Bus error on locked line error
recovery refill
BUS_DRERR,
DC_LKERR
6
BIU
instruction fetch
Bus error on locked line error
recovery refill
BUS_IRERR,
IC_LKERR
7
Data Cache
load or store
dcbf
dcbtls
dcbtstls
dcbz
L1FINV0 flush or flush
w/inv & line dirty
Tag array parity error and
L1CSR0[DCEA] = 00
DC_TPERR
CP_PERR
BUS_WRERR
Uncorrectable tag array parity
error and L1CSR0[DCEA] = 01
and line dirty or potentially dirty
7
Data Cache
load or store
dcbtls
dcbtstls
dcbz
Cacheable miss and
L1CSR0[DCEA] = 00 and dirty
parity error on line to be
replaced
DC_TPERR
7
Data Cache
load or store
dcbtls
dcbtstls
dcbz
Cacheable miss and
L1CSR0[DCEA] = 00 and lock
parity error
DC_TPERR,
DC_LKERR
Cacheable miss and
L1CSR0[DCEA] = 01 and
uncorrectable lock parity error
8
Data Cache
9
Data Cache
Transaction Source
dcbst
dcblc
Error Type
(MCSR Update)
Tag array parity error &
L1CSR0[DCEA] = 00 & line
potentially dirty (dirty or dirty
parity error)
DC_TPERR,
[DC_LKERR (if lock
parity error)]
Uncorrectable tag array parity
error, L1CSR0[DCEA] = 01,
line potentially dirty (dirty or
dirty parity error)
DC_TPERR,
[DC_LKERR (if
uncorrectable lock
parity error)]
Tag array parity error,
L1CSR0[DCEA] = 00, line
potentially locked (locked or
lock parity error)
DC_TPERR,
[DC_LKERR (if lock
parity error)]
Uncorrectable tag array parity
error, L1CSR0[DCEA] = 01,
and line potentially locked
(locked or lock parity error)
10
Data Cache
load
Data array parity error and
L1CSR0[DCEA] = 00
DC_TPERR,
[DC_LKERR (if
uncorrectable lock
parity error)]
DC_DPERR
Data array parity error, line dirty
or potentially dirty,
L1CSR0[DCEA] = 01
11
Instruction Cache
icblc
Tag array parity error,
L1CSR1[ICEA] = 00, line
locked or lock parity error
IC_TPERR,
[IC_LKERR]
icbtls
Tag array parity error and
L1CSR1[ICEA] = 00
IC_TPERR
Cacheable miss,
L1CSR1[ICEA] = 00, lock
parity error
IC_TPERR,
IC_LKERR
Cacheable miss,
L1CSR1[ICEA] = 01,
uncorrectable lock parity error
12
BIU
load store/w allocate
dcbtls
dcbtstls
Bus error on load or linefill or
data refill
BUS_DRERR
CI or cache disabled
Ifetch
Bus error on CI Ifetch
Bus error on cache disabled
Ifetch
13
BIU
Bus error on linefill or data refill
Bus error on CI Ifetch
Bus error on cache disabled
Ifetch
BUS_IRERR
snoop lookup
Tag parity error and (miss, or hit
only to way with tag parity error)
DC_TPERR,
SNPERR
Instruction Fetch
Tag array parity error and
L1CSR1[ICEA] = 00
IC_TPERR
Data array parity error and
L1CSR1[ICEA] = 00
IC_DPERR
Cacheable miss,
L1CSR1[ICEA] = 00, lock
parity error
IC_TPERR,
IC_LKERR
icbtls
CI or cache disabled
Ifetch
14
Data Cache
15
Instruction Cache
16
Instruction Cache
17
Instruction Cache
Instruction Fetch
Cacheable miss,
L1CSR1[ICEA] = 01,
uncorrectable lock parity error
7.6.2.2
Machine Check Interrupt Actions
Machine check interrupts for error report conditions and NMI are enabled and taken regardless of the state
of MSR[ME]. Machine check interrupts due to an async mchk syndrome bit being set in MCSR are only
taken when MSR[ME] = 1. When a machine check interrupt is taken, registers are updated as shown in
Table 7-13.
Table 7-13. Machine Check Interrupt—Register Settings
Register  Setting Description
MCSRR0    On a best-effort basis, the e200 sets this to the address of some instruction that was executing or about to be executed when the machine check condition occurred.
MCSRR1    Set to the contents of the MSR at the time of the interrupt
MSR       UCLE = 0, SPE = 0, WE = 0, CE = 0, EE = 0, PR = 0, FP = 0, ME = 0, FE0 = 0, DE = 0 or unchanged (note 1), FE1 = 0, IS = 0, DS = 0, PMM = 0, RI = 0
ESR       Unchanged
MCSR      Updated to reflect the source(s) of a machine check. Hardware only sets the appropriate bits; no previously set bits are cleared by hardware.
MCAR      See Table 7-11
Vector    IVPR[0–15] || IVOR1[16–27] || 0b0000
Note 1: DE is cleared when the debug unit is disabled. When the debug unit is enabled, control in HID0 optionally supports clearing DE.
The machine check syndrome register is provided to identify the source(s) of a machine check and may be
used to identify recoverable events in conjunction with MCSRR1[RI].
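The Vector entry of Table 7-13 concatenates the upper half of the IVPR with bits 16–27 of the IVOR and four zero bits. A minimal sketch of that computation follows, assuming the 32-bit numbering used in this section, where bit 0 is the most significant bit; the function name is hypothetical.

```c
#include <stdint.h>

/* Handler address = IVPR[0-15] || IVORn[16-27] || 0b0000.
 * With bit 0 as the most significant bit of a 32-bit register,
 * IVPR[0-15] is the upper halfword and IVORn[16-27] is mask 0x0000FFF0. */
static inline uint32_t interrupt_vector(uint32_t ivpr, uint32_t ivor)
{
    return (ivpr & 0xFFFF0000u) | (ivor & 0x0000FFF0u);
}
```

For example, with the illustrative values IVPR = 0x40001234 and IVOR1 = 0x0000ABCD, the low bits of each register are ignored and the handler address is 0x4000ABC0.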
The MSR[RI] status bit is provided for software use in determining if multiple nested machine check
exceptions have occurred. Software may interrogate MCSRR1[RI] to determine if a machine check
occurred during the initial portion of a machine check handler prior to the handler code that sets MSR[RI]
to indicate that the handler can now tolerate another machine check condition without losing state
necessary for recovery. The interrupt handler should set MSR[RI] as soon as possible after saving the
working registers, MCSRR0, and MCSRR1 to avoid loss of state if another machine check condition were to occur.
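The recoverability test described above can be sketched in C. The mask value assumes that RI occupies bit 30 of the MSR in 32-bit big-endian bit numbering (0x00000002); confirm this against the MSR register description. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: MSR[RI] is bit 30 (bit 0 = most significant bit). */
#define MSR_RI 0x00000002u

/* MCSRR1 holds the MSR of the interrupted context. If RI was clear
 * there, the machine check hit before an earlier handler had saved its
 * state, so that state may have been lost. */
static inline bool mchk_context_recoverable(uint32_t mcsrr1)
{
    return (mcsrr1 & MSR_RI) != 0u;
}

/* MSR value the handler writes once MCSRR0, MCSRR1, and the working
 * registers have been saved. */
static inline uint32_t msr_with_ri_set(uint32_t msr)
{
    return msr | MSR_RI;
}
```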
The machine check input pin p_mcp_b can be masked by HID0[EMCP].
The nonmaskable interrupt machine check input pin p_nmi_b is never masked.
Precise external termination errors occur when a load, or a cache-inhibited or guarded store, is terminated by
assertion of p_tea_b (external bus ERROR termination response); these result in both an error report and
an async mchk machine check exception.
Some machine check exceptions are unrecoverable in the sense that execution cannot resume in the
context that existed before the interrupt. However, system software can use the machine check interrupt
handler to try to identify and recover from the machine check condition.
7.6.2.3
Checkstop State
Machine checks no longer result in a checkstop and there is no checkstop state implemented on the e200z7.
7.6.3
Data Storage Interrupt (IVOR2)
A data storage interrupt (DSI) may occur if no higher priority exception exists and one of the following
exception conditions exists:
• Read or write access control exception condition
• Byte ordering exception condition
• Cache locking exception condition
Access control is defined as in the Power ISA embedded category. A byte ordering exception condition
occurs for any misaligned access across a page boundary to pages with mismatched E bits. Cache locking
exception conditions occur for any attempt to execute a dcbtls, dcbtstls, dcblc, icbtls, or icblc in user
mode with MSR[UCLE] = 0.
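The cache locking exception condition reduces to a test of two MSR bits. The sketch below assumes the usual Book E positions for PR and UCLE in 32-bit numbering (bit 17 and bit 5, with bit 0 most significant); verify both masks against the MSR description. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: bit positions in 32-bit numbering, bit 0 = MSB. */
#define MSR_UCLE 0x04000000u /* user cache lock enable, bit 5 */
#define MSR_PR   0x00004000u /* problem (user) state, bit 17 */

/* True when executing dcbtls, dcbtstls, dcblc, icbtls, or icblc would
 * raise a cache locking DSI: user mode with MSR[UCLE] = 0. */
static inline bool cache_lock_op_causes_dsi(uint32_t msr)
{
    return ((msr & MSR_PR) != 0u) && ((msr & MSR_UCLE) == 0u);
}
```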
Table 7-14 lists register settings when a DSI is taken.
Table 7-14. Data Storage Interrupt—Register Settings
Register  Setting Description
SRR0      Set to the effective address of the excepting load/store instruction.
SRR1      Set to the contents of the MSR at the time of the interrupt
MSR       UCLE = 0, SPE = 0, WE = 0, CE unchanged, EE = 0, PR = 0, FP = 0, ME unchanged, FE0 = 0, DE unchanged, FE1 = 0, IS = 0, DS = 0, PMM = 0, RI unchanged
ESR       Access: [ST], [SPE], [VLEMI]. All other bits cleared.
          Byte ordering: [ST], [SPE], [VLEMI], BO. All other bits cleared.
          Cache locking: (DLK, ILK), [VLEMI], [ST]. All other bits cleared.
MCSR      Unchanged
DEAR      For access and byte ordering exceptions, set to the effective address of a byte within the page whose access caused the violation. Undefined on cache locking exceptions (the e200 does not update the DEAR on a cache locking exception).
Vector    IVPR[0–15] || IVOR2[16–27] || 0b0000
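The MSR row of Table 7-14 can be read as a mask operation: CE, ME, DE, and RI keep their values and every other listed bit is cleared. The sketch below models that reading. The masks assume standard Book E bit positions in 32-bit numbering and should be checked against the MSR description; the function name is hypothetical, and MSR bits the table does not list are not modeled.

```c
#include <stdint.h>

/* ASSUMPTION: MSR masks in 32-bit numbering (bit 0 = MSB). */
#define MSR_UCLE 0x04000000u
#define MSR_SPE  0x02000000u
#define MSR_WE   0x00040000u
#define MSR_CE   0x00020000u
#define MSR_EE   0x00008000u
#define MSR_PR   0x00004000u
#define MSR_FP   0x00002000u
#define MSR_ME   0x00001000u
#define MSR_FE0  0x00000800u
#define MSR_DE   0x00000200u
#define MSR_FE1  0x00000100u
#define MSR_IS   0x00000020u
#define MSR_DS   0x00000010u
#define MSR_PMM  0x00000004u
#define MSR_RI   0x00000002u

/* Bits that Table 7-14 shows as cleared when a DSI is taken. */
#define MSR_DSI_CLEARED (MSR_UCLE | MSR_SPE | MSR_WE | MSR_EE | MSR_PR | \
                         MSR_FP | MSR_FE0 | MSR_FE1 | MSR_IS | MSR_DS | \
                         MSR_PMM)

/* MSR after a data storage interrupt: CE, ME, DE, and RI pass through. */
static inline uint32_t msr_after_dsi(uint32_t msr)
{
    return msr & ~MSR_DSI_CLEARED;
}
```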
7.6.4
Instruction Storage Interrupt (IVOR3)
An instruction storage interrupt (ISI) occurs when no higher priority exception exists and an execute
access control exception occurs. This interrupt is implemented as defined by the Power ISA embedded
category, with the addition of misaligned instruction fetch exceptions, and the extension of the byte
ordering exception status to also cover mismatched instruction storage exceptions.
Exception extensions implemented in the e200 for Power ISA VLE involve extending the definition of the
instruction storage interrupt to include byte ordering exceptions for instruction accesses, misaligned
instruction fetch exceptions, and corresponding updates to the ESR as shown in Table 7-15 and Table 7-16.
Table 7-15. ISI Exceptions and Conditions
Interrupt Type: Instruction storage
Interrupt Vector Offset Register: IVOR3
Causing Conditions:
1. Access control.
2. Byte ordering due to misaligned instruction across page boundary to pages with mismatched VLE bits, or access to page with VLE set and E indicating little endian.
3. Misaligned instruction fetch due to a change of flow to an odd half word instruction boundary on a Power ISA (non-VLE) instruction page.
Table 7-16 lists register settings when an ISI is taken.
Table 7-16. Instruction Storage Interrupt—Register Settings
Register  Setting Description
SRR0      Set to the effective address of the excepting instruction
SRR1      Set to the contents of the MSR at the time of the interrupt
MSR       UCLE = 0, SPE = 0, WE = 0, CE unchanged, EE = 0, PR = 0, FP = 0, ME unchanged, FE0 = 0, DE unchanged, FE1 = 0, IS = 0, DS = 0, PMM = 0, RI unchanged
ESR       [BO, MIF, VLEMI]. All other bits cleared.
MCSR      Unchanged
DEAR      Unchanged
Vector    IVPR[0–15] || IVOR3[16–27] || 0b0000
7.6.5
External Input Interrupt (IVOR4)
An external input exception is signaled to the processor by the assertion of the external interrupt pin
(p_extint_b). The p_extint_b input is a level-sensitive signal expected to remain asserted until the e200
acknowledges the external interrupt. If p_extint_b is negated early, recognition of the interrupt request is
not guaranteed. When e200 detects the exception, if the exception is enabled by MSR[EE], the e200 takes
the external input interrupt.
An external input interrupt may be delayed by other higher priority exceptions or if MSR[EE] is cleared
when the exception occurs.
Table 7-17 lists register settings when an external input interrupt is taken.
Table 7-17. External Input Interrupt—Register Settings
Register  Setting Description
SRR0      Set to the effective address of the instruction that the processor would have attempted to execute next if no exception conditions were present.
SRR1      Set to the contents of the MSR at the time of the interrupt
MSR       UCLE = 0, SPE = 0, WE = 0, CE unchanged, EE = 0, PR = 0, FP = 0, ME unchanged, FE0 = 0, DE unchanged, FE1 = 0, IS = 0, DS = 0, PMM = 0, RI unchanged
ESR       Unchanged
MCSR      Unchanged
DEAR      Unchanged
Vector    IVPR[0–15] || IVOR4[16–27] || 0b0000 (auto-vectored)
          IVPR[0–15] || p_voffset[0:11] || 0b0000 (non-auto-vectored)
IVOR4 is the vector offset register used by auto-vectored external input interrupts to determine the
interrupt handler location. The e200 also provides the capability to directly vector external input interrupts
to multiple handlers by allowing an external input interrupt request to be accompanied by a vector offset.
The p_voffset[0:11] input signals are used in place of the value in IVOR4 when an external input interrupt
request is not auto-vectored (p_avec_b negated when p_extint_b asserted).
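The choice between IVOR4 and p_voffset[0:11] can be sketched as below. This assumes 32-bit numbering with bit 0 as the most significant bit, so the 12-bit p_voffset value takes the place of IVOR4[16–27] and is shifted left by four; the function name and the boolean parameter are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* External input handler address.
 * auto-vectored:     IVPR[0-15] || IVOR4[16-27]    || 0b0000
 * non-auto-vectored: IVPR[0-15] || p_voffset[0:11] || 0b0000 */
static inline uint32_t extint_vector(uint32_t ivpr, uint32_t ivor4,
                                     uint32_t p_voffset, bool autovectored)
{
    uint32_t offset = autovectored
                          ? (ivor4 & 0x0000FFF0u)
                          : ((p_voffset & 0x00000FFFu) << 4);
    return (ivpr & 0xFFFF0000u) | offset;
}
```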
7.6.6
Alignment Interrupt (IVOR5)
The e200 implements the alignment interrupt as defined by the Power ISA embedded category. An
alignment exception is generated when any of the following occurs:
• The operand of lmw or stmw is not word aligned.
• The operand of lwarx or stwcx. is not word aligned.
• The operand of lharx or sthcx. is not half word aligned.
• Execution of a dcbz instruction is attempted with a disabled cache.
• Execution of a dcbz instruction is attempted with an enabled cache and W or I = 1.
• Execution of an SPE load or store instruction that is not properly aligned.
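The operand alignment conditions in the list above amount to a low-order address bit test. A minimal sketch follows, with hypothetical function names; it covers only the word and half word cases and leaves out the dcbz and SPE conditions.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when 'ea' is a multiple of 'size'; 'size' must be a power of two. */
static inline bool is_aligned(uint32_t ea, uint32_t size)
{
    return (ea & (size - 1u)) == 0u;
}

/* lmw, stmw, lwarx, and stwcx. require a word-aligned operand. */
static inline bool word_op_alignment_exception(uint32_t ea)
{
    return !is_aligned(ea, 4u);
}

/* lharx and sthcx. require a half-word-aligned operand. */
static inline bool halfword_op_alignment_exception(uint32_t ea)
{
    return !is_aligned(ea, 2u);
}
```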
Table 7-18 lists register settings when an alignment interrupt is taken.
Table 7-18. Alignment Interrupt—Register Settings
Register  Setting Description
SRR0      Set to the effective address of the excepting load/store instruction.
SRR1      Set to the contents of the MSR at the time of the interrupt
MSR       UCLE = 0, SPE = 0, WE = 0, CE unchanged, EE = 0, PR = 0, FP = 0, ME unchanged, FE0 = 0, DE unchanged, FE1 = 0, IS = 0, DS = 0, PMM = 0, RI unchanged
ESR       [ST], [SPE], [VLEMI]. All other bits cleared.
MCSR      Unchanged
DEAR      Set to the effective address of a byte of the load or store whose access caused the violation.
Vector    IVPR[0–15] || IVOR5[16–27] || 0b0000
7.6.7
Program Interrupt (IVOR6)
The e200 implements the program interrupt as defined by the Power ISA embedded category. A program
interrupt occurs when no higher priority exception exists and one or more of the following exception
conditions defined in Power ISA embedded category occur:
• Illegal instruction exception
• Privileged instruction exception
• Trap exception
• Unimplemented operation exception
The e200 invokes an illegal instruction program exception on attempted execution of the following
instructions:
• Instruction from the illegal instruction class
• mtspr and mfspr instructions with an undefined SPR specified
• mtdcr and mfdcr instructions with an undefined DCR specified
The e200 invokes a privileged instruction program exception on attempted execution of the following
instructions when MSR[PR] = 1 (user mode):
• A privileged instruction
• mtspr and mfspr instructions that specify an SPRN value with SPRN[5] = 1 (even if the SPR is
undefined).
The e200 invokes a trap exception on execution of the tw and twi instructions if the trap conditions are
met and the exception is not also enabled as a debug interrupt.
The e200 invokes an unimplemented operation program exception on attempted execution of the
instructions lswi, lswx, stswi, stswx, mfapidi, mfdcrx, mtdcrx, or on any Power ISA embedded category
floating point instruction when MSR[FP] = 1. All other defined or allocated instructions that are not
implemented by the e200 cause an illegal instruction program exception.
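The SPRN[5] test for privileged SPR accesses can be illustrated with a short sketch. It assumes SPRN is the 10-bit SPR number with bits numbered 0 (most significant) to 9 (least significant), so SPRN[5] has the weight 0x10; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: SPRN is a 10-bit value, bit 0 = MSB, so SPRN[5] = 2^(9-5). */
#define SPRN_PRIV_BIT 0x010u

/* True if the SPR number is in the privileged range (SPRN[5] = 1). */
bool sprn_is_privileged(uint32_t sprn)
{
    return (sprn & SPRN_PRIV_BIT) != 0;
}

/* True if mtspr/mfspr to this SPR number would cause a privileged
 * instruction program exception; msr_pr is MSR[PR] (1 = user mode). */
bool spr_access_is_privileged_exception(uint32_t sprn, bool msr_pr)
{
    return msr_pr && sprn_is_privileged(sprn);
}
```

Under this assumption SPR 26 (SRR0) is privileged, while SPR 8 (LR) is not.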
Table 7-19 lists register settings when a program interrupt is taken.
Table 7-19. Program Interrupt—Register Settings
Register    Setting Description
SRR0        Set to the effective address of the excepting instruction.
SRR1        Set to the contents of the MSR at the time of the interrupt
MSR         UCLE  0        FP    0        FE1   0
            SPE   0        ME    —        IS    0
            WE    0        FE0   0        DS    0
            CE    —        DE    —        PMM   0
            EE    0                       RI    —
            PR    0
ESR         Illegal:        PIL, [VLEMI]. All other bits cleared.
            Privileged:     PPR, [VLEMI]. All other bits cleared.
            Trap:           PTR, [VLEMI]. All other bits cleared.
            Unimplemented:  PUO, [FP], [VLEMI]. All other bits cleared.
MCSR        Unchanged
DEAR        Unchanged
Vector      IVPR[0–15] || IVOR6[16–27] || 0b0000
7.6.8 Floating-Point Unavailable Interrupt (IVOR7)
The floating-point unavailable exception is implemented as defined in the Power ISA embedded category.
A floating-point unavailable interrupt occurs when no higher priority exception exists, an attempt is made
to execute a floating-point instruction (including floating-point load, store, or move instructions), and the
floating-point available bit in the MSR is disabled (MSR[FP] = 0).
Table 7-20 lists register settings when a floating-point unavailable interrupt is taken.
Table 7-20. Floating-Point Unavailable Interrupt—Register Settings
Register    Setting Description
SRR0        Set to the effective address of the excepting instruction.
SRR1        Set to the contents of the MSR at the time of the interrupt
MSR         UCLE  0        FP    0        FE1   0
            SPE   0        ME    —        IS    0
            WE    0        FE0   0        DS    0
            CE    —        DE    —        PMM   0
            EE    0                       RI    —
            PR    0
ESR         Unchanged
MCSR        Unchanged
DEAR        Unchanged
Vector      IVPR[0–15] || IVOR7[16–27] || 0b0000
7.6.9 System Call Interrupt (IVOR8)
A system call interrupt occurs when a system call (sc, se_sc) instruction is executed and no higher priority
exception exists.
Exception extensions implemented in the e200 for the Power ISA VLE include modification of the system
call interrupt definition to include updating the ESR.
Table 7-21 lists register settings when a system call interrupt is taken.
Table 7-21. System Call Interrupt—Register Settings
Register    Setting Description
SRR0        Set to the effective address of the instruction following the sc instruction.
SRR1        Set to the contents of the MSR at the time of the interrupt
MSR         UCLE  0        FP    0        FE1   0
            SPE   0        ME    —        IS    0
            WE    0        FE0   0        DS    0
            CE    —        DE    —        PMM   0
            EE    0                       RI    —
            PR    0
ESR         [VLEMI] All other bits cleared.
MCSR        Unchanged
DEAR        Unchanged
Vector      IVPR[0–15] || IVOR8[16–27] || 0b0000
7.6.10 Auxiliary Processor Unavailable Interrupt (IVOR9)
An auxiliary processor unavailable exception is defined by the Power ISA embedded category to occur
when an attempt is made to execute an auxiliary processor unit instruction which is implemented but
configured as unavailable, and no higher priority exception condition exists.
The e200 does not utilize this interrupt.
7.6.11 Decrementer Interrupt (IVOR10)
The e200 implements the decrementer exception as described in the EREF. A decrementer interrupt occurs
when no higher priority exception exists, a decrementer exception condition exists (TSR[DIS] = 1), and
the interrupt is enabled (both TCR[DIE] and MSR[EE] = 1).
The timer status register (TSR) holds the decrementer interrupt bit set by the timer facility when an
exception is detected. Software must clear this bit in the interrupt handler to avoid repeated decrementer
interrupts.
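The enabling condition and the handler's obligation to clear TSR[DIS] can be sketched as follows. The bit masks are assumptions based on the Book E timer register layout (TSR[DIS] at bit 36, TCR[DIE] at bit 37, MSR[EE] at bit 48 in 64-bit numbering) and should be checked against the register definitions in Chapter 2. The sketch also assumes that TSR bits are cleared by writing a 1 to them; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed masks for 32-bit registers (Power bit 0 = MSB). */
#define TSR_DIS 0x08000000u  /* decrementer interrupt status */
#define TCR_DIE 0x04000000u  /* decrementer interrupt enable */
#define MSR_EE  0x00008000u  /* external interrupt enable    */

/* True when a pending decrementer exception can be taken as an interrupt. */
bool decrementer_interrupt_enabled(uint32_t tsr, uint32_t tcr, uint32_t msr)
{
    return (tsr & TSR_DIS) != 0 && (tcr & TCR_DIE) != 0 && (msr & MSR_EE) != 0;
}

/* Model of a write-1-to-clear register: each 1 written clears that bit. */
uint32_t tsr_after_w1c_write(uint32_t tsr, uint32_t written)
{
    return tsr & ~written;
}
```

In a handler, the equivalent step would be writing the TSR_DIS mask to the TSR before returning, so that the exception condition no longer exists when MSR[EE] is restored.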
Table 7-22 lists register settings when a decrementer interrupt is taken.
Table 7-22. Decrementer Interrupt—Register Settings
Register    Setting Description
SRR0        Set to the effective address of the instruction that the processor would have attempted to
            execute next if no exception conditions were present.
SRR1        Set to the contents of the MSR at the time of the interrupt
MSR         UCLE  0        FP    0        FE1   0
            SPE   0        ME    —        IS    0
            WE    0        FE0   0        DS    0
            CE    —        DE    —        PMM   0
            EE    0                       RI    —
            PR    0
ESR         Unchanged
MCSR        Unchanged
DEAR        Unchanged
Vector      IVPR[0–15] || IVOR10[16–27] || 0b0000
7.6.12 Fixed-Interval Timer Interrupt (IVOR11)
The e200 implements the fixed-interval timer (FIT) exception as described in the EREF. The triggering of
the exception is caused by selected bits in the time base register changing from 0 to 1.
A fixed-interval timer interrupt occurs when no higher priority exception exists, a FIT exception exists
(TSR[FIS] = 1), and the interrupt is enabled (both TCR[FIE] and MSR[EE] = 1).
The timer status register (TSR) holds the FIT interrupt bit set by the timer facility when an exception is
detected. Software must clear this bit in the interrupt handler to avoid repeated FIT interrupts.
Table 7-23 lists register settings when a FIT interrupt is taken.
Table 7-23. Fixed-Interval Timer Interrupt—Register Settings
Register    Setting Description
SRR0        Set to the effective address of the instruction that the processor would have attempted to
            execute next if no exception conditions were present.
SRR1        Set to the contents of the MSR at the time of the interrupt.
MSR         UCLE  0        FP    0        FE1   0
            SPE   0        ME    —        IS    0
            WE    0        FE0   0        DS    0
            CE    —        DE    —        PMM   0
            EE    0                       RI    —
            PR    0
ESR         Unchanged
MCSR        Unchanged
DEAR        Unchanged
Vector      IVPR[0–15] || IVOR11[16–27] || 0b0000
7.6.13 Watchdog Timer Interrupt (IVOR12)
The e200 implements the watchdog timer (WDT) exception as described in the EREF. The triggering of
the exception is caused by the first enabled watchdog time-out.
A watchdog timer interrupt occurs when no higher priority exception exists, a watchdog timer exception
exists (TSR[WIS] = 1), and the interrupt is enabled (both TCR[WIE] and MSR[CE] = 1).
The timer status register (TSR) holds the watchdog interrupt bit set by the timer facility when an exception
is detected. Software must clear this bit in the interrupt handler to avoid repeated watchdog interrupts.
Table 7-24 lists register settings when a watchdog timer interrupt is taken.
Table 7-24. Watchdog Timer Interrupt—Register Settings
Register    Setting Description
CSRR0       Set to the effective address of the instruction that the processor would have attempted to
            execute next if no exception conditions were present.
CSRR1       Set to the contents of the MSR at the time of the interrupt
MSR         UCLE  0        FP    0        FE1   0
            SPE   0        ME    —        IS    0
            WE    0        FE0   0        DS    0
            CE    0        DE    0/—¹     PMM   0
            EE    0                       RI    —
            PR    0
ESR         Unchanged
MCSR        Unchanged
DEAR        Unchanged
Vector      IVPR[0–15] || IVOR12[16–27] || 0b0000
¹ DE is cleared when the debug unit is disabled. Clearing of DE is optionally supported by control in HID0
  when the debug unit is enabled.
MSR[DE] is not automatically cleared by a watchdog timer interrupt, but can be configured to be cleared
via the HID0 register (HID0[CICLRDE]). Refer to Section 2.4.11, “Hardware Implementation Dependent
Register 0 (HID0).”
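The MSR update for a watchdog timer interrupt, including the optional clearing of MSR[DE], can be modeled as a mask operation. This is a sketch only: the MSR bit masks are assumptions based on the Book E MSR layout (ME at bit 51, DE at bit 54, RI at bit 62 in 64-bit numbering), the HID0[CICLRDE] control is passed in as a boolean rather than decoded from HID0, reserved MSR bits are simply treated as cleared, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed MSR masks for a 32-bit MSR (Power bit 0 = MSB). */
#define MSR_ME 0x00001000u  /* machine check enable       */
#define MSR_DE 0x00000200u  /* debug interrupt enable     */
#define MSR_RI 0x00000002u  /* recoverable interrupt flag */

/* New MSR after a watchdog timer (critical) interrupt is taken.
 * ME and RI are left unchanged; DE is kept only when the debug unit is
 * enabled and HID0[CICLRDE] does not request clearing; all other defined
 * bits are cleared. */
uint32_t msr_after_watchdog_interrupt(uint32_t msr,
                                      bool debug_unit_enabled,
                                      bool hid0_ciclrde)
{
    uint32_t keep = MSR_ME | MSR_RI;
    if (debug_unit_enabled && !hid0_ciclrde) {
        keep |= MSR_DE;
    }
    return msr & keep;
}
```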
7.6.14 Data TLB Error Interrupt (IVOR13)
A data TLB error interrupt occurs when no higher priority exception exists and a data TLB error exception
exists due to a data translation lookup miss in the TLB.
Table 7-25 lists register settings when a DTLB interrupt is taken.
Table 7-25. Data TLB Error Interrupt—Register Settings
Register    Setting Description
SRR0        Set to the effective address of the excepting load/store instruction.
SRR1        Set to the contents of the MSR at the time of the interrupt
MSR         UCLE  0        FP    0        FE1   0
            SPE   0        ME    —        IS    0
            WE    0        FE0   0        DS    0
            CE    —        DE    —        PMM   0
            EE    0                       RI    —
            PR    0
ESR         [ST], [SPE], [VLEMI]. All other bits cleared.
MCSR        Unchanged
DEAR        Set to the effective address of a byte of the load or store whose access caused the violation.
Vector      IVPR[0–15] || IVOR13[16–27] || 0b0000
7.6.15 Instruction TLB Error Interrupt (IVOR14)
An instruction TLB error interrupt occurs when no higher priority exception exists and an instruction TLB
error exception exists due to an instruction translation lookup miss in the TLB.
Exception extensions implemented in the e200 for the Power ISA VLE involve extending the definition
of the instruction TLB error interrupt to include updating the ESR.
Table 7-26 lists register settings when an ITLB interrupt is taken.
Table 7-26. Instruction TLB Error Interrupt—Register Settings
Register    Setting Description
SRR0        Set to the effective address of the excepting instruction.
SRR1        Set to the contents of the MSR at the time of the interrupt
MSR         UCLE  0        FP    0        FE1   0
            SPE   0        ME    —        IS    0
            WE    0        FE0   0        DS    0
            CE    —        DE    —        PMM   0
            EE    0                       RI    —
            PR    0
ESR         [MIF] All other bits cleared.
MCSR        Unchanged
DEAR        Unchanged
Vector      IVPR[0:15] || IVOR14[16:27] || 0b0000
7.6.16 Debug Interrupt (IVOR15)
The e200 implements the debug interrupt as defined in Power ISA embedded category with the following
changes:
• When the debug unit is enabled, debug is no longer a critical interrupt, but uses DSRR0 and
DSRR1 for saving machine state on context switch.
• A return from debug interrupt instruction (rfdi or se_rfdi) is implemented to support the new
machine state registers.
• A critical interrupt taken debug event is defined to allow critical interrupts to generate a debug
event.
• A critical return debug event is defined to allow debug events to be generated for rfci and se_rfci
instructions.
There are multiple sources that can signal a debug exception. A debug interrupt occurs when no higher
priority exception exists, a debug exception exists in the debug status register, and debug interrupts are
enabled (both DBCR0[IDM] = 1 (internal debug mode) and MSR[DE] = 1). Enabling debug events and
other debug modes are discussed further in Chapter 13, “Debug Support.” With the debug unit enabled,
(see Section 2.4.11, “Hardware Implementation Dependent Register 0 (HID0)”), the debug interrupt has
its own set of machine state save/restore registers (DSRR0, DSRR1) to allow debugging of both critical
and noncritical interrupt handlers. In addition, interrupts can be handled while in a debug software handler.
External and critical interrupts are not automatically disabled when a debug interrupt occurs but can be
configured to be cleared via the HID0 register (HID0[DCLREE, DCLRCE]). Refer to Section 2.4.11,
“Hardware Implementation Dependent Register 0 (HID0).” When the debug unit is disabled, debug
interrupts use the CSRR0 and CSRR1 registers to save machine state.
NOTE
For additional details regarding the following descriptions of debug
exception types, refer to Section 13.2, “Software Debug Events and
Exceptions.”
An instruction address compare (IAC) debug exception occurs when there is an instruction address match
as defined by the debug control registers and instruction address compare events are enabled. This could
either be a direct instruction address match or a selected set of instruction addresses. IAC has the highest
interrupt priority of all instruction-based interrupts, even if the instruction itself may have encountered an
instruction TLB error or instruction storage exception.
A branch taken (BRT) debug exception is signaled when a branch instruction is considered taken by the
branch unit and branch taken events are enabled. The debug interrupt is taken when no higher priority
exception is pending.
A data address compare (DAC) exception is signaled when there is a data access address match as defined
by the debug control registers and data address compare events are enabled. This could either be a direct
data address match or a selected set of data addresses, or a combination of data address and data value
matching. The debug interrupt is taken when no higher priority exception is pending.
The e200 implementation provides IAC linked with DAC exceptions. This results in a DAC exception
only if one or more IAC conditions are also met. See Chapter 13, “Debug Support,” for more details.
A trap (TRAP) debug exception occurs when a program trap exception is generated while trap events are
enabled. If MSR[DE] is set, the debug exception has higher priority than the program exception in this
case and will be taken instead of a trap type program interrupt. The debug interrupt is taken when no higher
priority exception is pending. If MSR[DE] is cleared when a trap debug exception occurs, a trap exception
type program interrupt will occur instead.
A return (RET) debug exception occurs when executing an rfi or se_rfi instruction and return debug events
are enabled. Return debug exceptions are not generated for rfci or se_rfci instructions. If MSR[DE] = 1 at
the time of the execution of the rfi or se_rfi, a debug interrupt occurs provided that no higher priority
exception is enabled to cause an interrupt. CSRR0 (debug unit disabled) or DSRR0 (debug unit enabled)
is set to the address of the rfi or se_rfi instruction. If MSR[DE] = 0 at the time of the execution of the rfi
or se_rfi, a debug interrupt does not occur immediately, but the event is recorded by setting the
DBSR[RET] and DBSR[IDE] status bits.
A critical return (CRET) debug exception occurs when executing an rfci or se_rfci instruction and critical
return debug events are enabled. Critical return debug exceptions are only generated for rfci or se_rfci
instructions. If MSR[DE] = 1 at the time of the execution of the rfci or se_rfci, a debug interrupt occurs
provided that no higher priority exception is enabled to cause an interrupt. CSRR0 (debug unit disabled)
or DSRR0 (debug unit enabled) is set to the address of the rfci or se_rfci instruction. If MSR[DE] = 0 at
the time of the execution of the rfci or se_rfci, a debug interrupt does not occur immediately, but the event
is recorded by setting the DBSR[CRET] and DBSR[IDE] status bits. Note that critical return debug events
should not normally be enabled unless the debug unit is enabled to avoid corruption of CSRR0/1.
An instruction complete (ICMP) debug exception is signaled following execution and completion of an
instruction while this event is enabled.
An mtmsr or mtdbcr0 that causes both MSR[DE] and DBCR0[IDM] to end up set, enabling precise debug
mode, may cause an imprecise (delayed) debug exception to be generated due to an earlier recorded event
in the debug status register.
An interrupt taken (IRPT) debug exception occurs when a noncritical interrupt context switch is detected.
This exception is imprecise and unordered with respect to the program flow. Note that an IRPT debug
interrupt only occurs when detecting a noncritical interrupt on the e200. The value saved in
CSRR0/DSRR0 is the address of the noncritical interrupt handler.
A critical interrupt taken (CIRPT) debug exception occurs when a critical interrupt context switch is
detected. This exception is imprecise and unordered with respect to the program flow. Note that a CIRPT
debug interrupt only occurs when detecting a critical interrupt on the e200. The value saved in
CSRR0/DSRR0 is the address of the critical interrupt handler. Note that critical interrupt taken debug
events should not normally be enabled unless the debug unit is enabled to avoid corruption of CSRR0/1.
An unconditional debug event (UDE) exception occurs when the unconditional debug event pin (p_ude)
transitions to the asserted state.
Debug counter debug exceptions occur when enabled and one of the debug counters decrements to zero.
External debug exceptions occur when enabled and one of the external debug event pins (p_devt1,
p_devt2) transitions to the asserted state.
The debug status register (DBSR) provides a syndrome to differentiate between debug exceptions that can
generate the same interrupt. For more details see Chapter 13, “Debug Support.”
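The choice of save/restore registers and the kind of address saved for each debug event type, as described above and in Table 7-27, can be summarized in a short sketch. The enum and function names are hypothetical, and the address classification assumes the debug interrupt is precise.

```c
#include <stdbool.h>

/* Hypothetical labels for the debug event types named in this section. */
typedef enum {
    DBG_IAC, DBG_BRT, DBG_RET, DBG_CRET, DBG_TRAP,
    DBG_DAC, DBG_ICMP,
    DBG_UDE, DBG_IRPT, DBG_CIRPT, DBG_DCNT, DBG_DEVT
} debug_event_t;

typedef enum { SAVE_CSRR, SAVE_DSRR } save_pair_t;

typedef enum {
    ADDR_EXCEPTING_INSN,   /* address of the excepting instruction          */
    ADDR_FOLLOWING_INSN,   /* next instruction after the excepting one      */
    ADDR_NEXT_TO_EXECUTE   /* instruction that would have executed next     */
} saved_addr_t;

/* DSRR0/DSRR1 are used when the debug unit is enabled, CSRR0/CSRR1 otherwise. */
save_pair_t debug_save_pair(bool debug_unit_enabled)
{
    return debug_unit_enabled ? SAVE_DSRR : SAVE_CSRR;
}

/* A pending debug exception causes an interrupt only in internal debug mode
 * with MSR[DE] set. */
bool debug_interrupt_enabled(bool dbcr0_idm, bool msr_de)
{
    return dbcr0_idm && msr_de;
}

/* Kind of address saved in CSRR0/DSRR0 for a precise debug interrupt. */
saved_addr_t debug_saved_address(debug_event_t ev)
{
    switch (ev) {
    case DBG_IAC: case DBG_BRT: case DBG_RET: case DBG_CRET: case DBG_TRAP:
        return ADDR_EXCEPTING_INSN;
    case DBG_DAC: case DBG_ICMP:
        return ADDR_FOLLOWING_INSN;
    default:  /* UDE, IRPT, CIRPT, DCNT, DEVT */
        return ADDR_NEXT_TO_EXECUTE;
    }
}
```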
Table 7-27 lists register settings when a debug interrupt is taken.
Table 7-27. Debug Interrupt—Register Settings
Register        Setting Description
CSRR0/DSRR0¹    Set to the effective address of the excepting instruction for IAC, BRT, RET, CRET, and TRAP.
                Set to the effective address of the next instruction to be executed following the excepting
                instruction for DAC and ICMP.
                For a UDE, IRPT, CIRPT, DCNT, or DEVT type exception, set to the effective address of the
                instruction that the processor would have attempted to execute next if no exception
                conditions were present.
CSRR1/DSRR1     Set to the contents of the MSR at the time of the interrupt
MSR             UCLE  0        FP    0        FE1   0
                SPE   0        ME    —        IS    0
                WE    0        FE0   0        DS    0
                CE    —/0²     DE    0        PMM   0
                EE    —/0²                    RI    —
                PR    0
DBSR³           Unconditional debug event:              UDE
                Instr. complete debug event:            ICMP
                Branch taken debug event:               BRT
                Interrupt taken debug event:            IRPT
                Critical interrupt taken debug event:   CIRPT
                Trap instruction debug event:           TRAP
                Instruction address compare:            {IAC1, IAC2, IAC3, IAC4}
                Data address compare:                   {DAC1R, DAC1W, DAC2R, DAC2W}
                Return debug event:                     RET
                Critical return debug event:            CRET
                Debug counter event:                    {DCNT1, DCNT2}
                External debug event:                   {DEVT1, DEVT2}
                and optionally, an
                Imprecise debug event flag              {IDE}
ESR             Unchanged
MCSR            Unchanged
DEAR            Unchanged
Vector          IVPR[0–15] || IVOR15[16–27] || 0b0000
¹ Assumes that the debug interrupt is precise.
² Conditional based on control bits in HID0.
³ Note that multiple DBSR bits may be set.
7.6.17 System Reset Interrupt
The e200 implements the system reset interrupt as defined in the Power ISA embedded category. The
system reset exception is a nonmaskable, asynchronous exception signaled to the processor through the
assertion of system-defined signals.
A system reset may be initiated by either asserting the p_reset_b input signal or during power-on reset by
asserting m_por. The m_por signal must be asserted during power up and must remain asserted for a period
that allows internal logic to be reset. The p_reset_b signal must also remain asserted for a period that
allows internal logic to be reset. This period is specified in the hardware specifications. If m_por or
p_reset_b is asserted for less than the required interval, the results are not predictable.
When a reset request occurs, the processor branches to the system reset exception vector (value on
p_rstbase[0:29] concatenated with 0b00) without attempting to reach a recoverable state. If reset occurs
during normal operation, all operations cease and the machine state is lost. CPU internal state after a reset
is defined in Section 2.6, “Reset Settings.”
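As a small illustration, the reset vector formed from p_rstbase[0:29] concatenated with 0b00 can be computed as below. The function name is hypothetical, and the 30-bit p_rstbase value is assumed to be passed right-justified in a 32-bit integer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: p_rstbase[0:29] || 0b00.
 * The 30-bit base is shifted left by two, giving a word-aligned address. */
uint32_t reset_vector(uint32_t p_rstbase30)
{
    assert(p_rstbase30 <= 0x3FFFFFFFu);  /* only 30 bits are defined */
    return p_rstbase30 << 2;
}
```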
Reset may also be initiated by watchdog timer or debug reset control. The watchdog timer and debug reset
control provide the capability to assert the p_wrs[0:1] and p_dbrstc[0:1] signals. External logic may factor
this into the p_reset_b input signal to cause an e200 reset to occur.
Table 7-28 shows the TSR register bits associated with watchdog timer reset status. Note that these bits are
cleared when a processor reset occurs; thus if the p_wrs[0:1] outputs are factored into p_reset_b, they are
only seen in the 00 state by software.
Table 7-28. TSR Watchdog Timer Reset Status
Bits           Name    Function
2–3 (34–35)    WRS     00  No action performed by watchdog timer
                       01  Watchdog Timer second time-out caused p_wrs1 to be asserted
                       10  Watchdog Timer second time-out caused p_wrs0 to be asserted
                       11  Watchdog Timer second time-out caused p_wrs0 and p_wrs1 to be asserted
Table 7-29 shows the DBSR register bits associated with reset status.
Table 7-29. DBSR Most Recent Reset
Bits           Name    Function
2–3 (34–35)    MRR     00  No reset occurred since these bits were last cleared by software
                       01  A reset occurred since these bits were last cleared by software
                       10  Reserved
                       11  Reserved
Table 7-30 lists register settings when a system reset interrupt is taken.
Table 7-30. System Reset Interrupt—Register Settings
Register    Setting Description
CSRR0       Undefined
CSRR1       Undefined
MSR         UCLE  0        FP    0        FE1   0
            SPE   0        ME    0        IS    0
            WE    0        FE0   0        DS    0
            CE    0        DE    0        PMM   0
            EE    0                       RI    0
            PR    0
ESR         Cleared
DEAR        Undefined
Vector      [p_rstbase[0:29]] || 0b00
7.6.18 SPE/EFPU Unavailable Interrupt (IVOR32)
The SPE unit unavailable exception is taken if MSR[SPE] is cleared and execution of an SPE or EFPU
instruction other than the scalar floating-point instructions (efsxxx) or brinc is attempted. When the
SPE/EFPU unavailable exception occurs, the processor suppresses execution of the instruction causing the
exception. Table 7-31 lists register settings when an SPE/EFPU unavailable interrupt is taken.
Table 7-31. SPE/EFPU Unavailable Interrupt—Register Settings
Register    Setting Description
SRR0        Set to the effective address of the excepting SPE/EFPU instruction.
SRR1        Set to the contents of the MSR at the time of the interrupt
MSR         UCLE  0        FP    0        FE1   0
            SPE   0        ME    —        IS    0
            WE    0        FE0   0        DS    0
            CE    —        DE    —        PMM   0
            EE    0                       RI    —
            PR    0
ESR         SPE, [VLEMI]. All other bits cleared.
MCSR        Unchanged
DEAR        Unchanged
Vector      IVPR[0–15] || IVOR32[16–27] || 0b0000
7.6.19 Embedded Floating-Point Data Interrupt (IVOR33)
The embedded floating-point data interrupt is taken if no higher priority exception exists and an EFPU
floating-point data exception is generated. When a floating-point data exception occurs, the processor
suppresses execution of the instruction causing the exception.
Table 7-32 lists register settings when an EFPU floating-point data interrupt is taken.
Table 7-32. Embedded Floating-Point Data Interrupt—Register Settings
Register    Setting Description
SRR0        Set to the effective address of the excepting EFPU instruction.
SRR1        Set to the contents of the MSR at the time of the interrupt
MSR         UCLE  0        FP    0        FE1   0
            SPE   0        ME    —        IS    0
            WE    0        FE0   0        DS    0
            CE    —        DE    —        PMM   0
            EE    0                       RI    —
            PR    0
ESR         SPE, [VLEMI]. All other bits cleared.
MCSR        Unchanged
DEAR        Unchanged
Vector      IVPR[0–15] || IVOR33[16–27] || 0b0000
7.6.20 Embedded Floating-Point Round Interrupt (IVOR34)
The embedded floating-point round interrupt is taken when an EFPU floating-point instruction generates
an inexact result and inexact exceptions are enabled.
Table 7-33 lists register settings when an EFPU floating-point round interrupt is taken.
Table 7-33. Embedded Floating-point Round Interrupt—Register Settings
Register    Setting Description
SRR0        Set to the effective address of the instruction following the excepting EFPU instruction.
SRR1        Set to the contents of the MSR at the time of the interrupt
MSR         UCLE  0        FP    0        FE1   0
            SPE   0        ME    —        IS    0
            WE    0        FE0   0        DS    0
            CE    —        DE    —        PMM   0
            EE    0                       RI    —
            PR    0
ESR         SPE, [VLEMI]. All other bits cleared.
MCSR        Unchanged
DEAR        Unchanged
Vector      IVPR[0–15] || IVOR34[16–27] || 0b0000
7.6.21 Performance Monitor Interrupt (IVOR35)
The e200z7 provides a performance monitor interrupt that may be generated by an enabled condition or
event. An enabled condition or event is as follows:
A PMCx register overflow condition occurs with the following settings:
• PMLCax[CE] = 1; that is, for the given counter the overflow condition is enabled.
• PMCx[OV] = 1; that is, the given counter indicates an overflow.
For a performance monitor interrupt to be signaled on an enabled condition or event, PMGC0[PMIE] must
be set.
Although an exception condition may occur with MSR[EE] = 0, the interrupt cannot be taken until
MSR[EE] = 1.
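The conditions above can be combined into two predicates, one for the exception being signaled and one for the interrupt actually being taken. The bit masks are assumptions based on the EREF-style performance monitor register layout (PMCx[OV] as the most significant bit, PMLCax[CE] at bit 37, PMGC0[PMIE] at bit 33 in 64-bit numbering) and should be verified against the performance monitor register definitions for this core. The function names are hypothetical, and only a single counter is modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed masks for 32-bit registers (Power bit 0 = MSB). */
#define PMC_OV     0x80000000u  /* counter overflow indication   */
#define PMLCA_CE   0x04000000u  /* overflow condition enable     */
#define PMGC0_PMIE 0x40000000u  /* performance monitor int. enable */
#define MSR_EE     0x00008000u  /* external interrupt enable     */

/* True when one counter presents an enabled overflow condition and
 * PMGC0[PMIE] allows the exception to be signaled. */
bool pm_exception_signaled(uint32_t pmgc0, uint32_t pmlca, uint32_t pmc)
{
    bool enabled_condition = (pmlca & PMLCA_CE) != 0 && (pmc & PMC_OV) != 0;
    return enabled_condition && (pmgc0 & PMGC0_PMIE) != 0;
}

/* The exception may exist with MSR[EE] = 0, but the interrupt is only
 * taken once MSR[EE] = 1. */
bool pm_interrupt_can_be_taken(uint32_t pmgc0, uint32_t pmlca,
                               uint32_t pmc, uint32_t msr)
{
    return pm_exception_signaled(pmgc0, pmlca, pmc) && (msr & MSR_EE) != 0;
}
```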
The priority of the performance monitor interrupt is below all other asynchronous interrupts. For details,
see Section 7.7.1, “Exception Priorities.”
Table 7-34 lists register settings when a performance monitor interrupt is taken.
Table 7-34. Performance Monitor Interrupt—Register Settings
Register    Setting Description
SRR0        Set to the effective address of the next instruction to be executed.
SRR1        Set to the contents of the MSR at the time of the interrupt
MSR         UCLE  0        FP    0        FE1   0
            SPE   0        ME    —        IS    0
            WE    0        FE0   0        DS    0
            CE    —        DE    —        PMM   0
            EE    0                       RI    —
            PR    0
ESR         Unchanged
MCSR        Unchanged
DEAR        Unchanged
Vector      IVPR[0–15] || IVOR35[16–27] || 0b0000
7.7 Exception Recognition and Priorities
The following list of exception categories describes how the e200 handles exceptions up to the point of
signaling the appropriate interrupt to occur. Also, instruction completion is defined as updating all
architectural registers associated with that instruction as necessary, and then removing the instruction from
the pipeline.
• Interrupts caused by asynchronous events (exceptions). These exceptions are further distinguished
by whether they are maskable and recoverable.
— Asynchronous, nonmaskable, nonrecoverable:
– System reset by assertion of p_reset_b
– Has highest priority and is taken immediately regardless of other pending exceptions or
recoverability. (Includes watchdog timer reset control and debug reset control.)
— Asynchronous, nonmaskable, possibly nonrecoverable:
– Nonmaskable interrupt by assertion of p_nmi_b
– Has priority over any other pending exception except system reset conditions.
Recoverability is dependent on whether MCSRR0/1 are holding essential state info and are
overwritten when the NMI occurs.
— Asynchronous, maskable/nonmaskable, recoverable/nonrecoverable:
– Machine check interrupt
– Has priority over any other pending exception except system reset conditions.
Recoverability is dependent on the source of the exception.
— Asynchronous, maskable, recoverable:
– External input, fixed-interval timer, decrementer, critical input, performance monitor,
unconditional debug, external debug event, debug counter event, and watchdog timer
interrupts
– Before handling this type of exception, the processor needs to reach a recoverable state. A
maskable recoverable exception will remain pending until taken or canceled by software.
• Synchronous, non-instruction based interrupts. The only exception in this category is the interrupt
taken debug exception, recognized by an interrupt taken event. It is not considered
instruction-based but is synchronous with respect to the program flow.
— Synchronous, maskable, recoverable:
– Interrupt taken debug event
– The machine will be in a recoverable state due to the state of the machine at the context
switch triggering this event.
• Instruction-based interrupts. These interrupts are further organized by the point in instruction
processing in which they generate an exception.
— Instruction fetch:
– Instruction storage, instruction TLB, and instruction address compare debug exceptions.
– Once these types of exceptions are detected, the excepting instruction is tagged. When the
excepting instruction is next to begin execution and a recoverable state has been reached,
the interrupt is taken. If an event prior to the excepting instruction causes a redirection of
execution, the instruction fetch exception is discarded (but may be encountered again).
— Instruction dispatch/execution:
– Program, system call, data storage, alignment, floating-point unavailable, SPE/EFPU
unavailable, data TLB, embedded floating-point data, embedded floating-point round, debug
(trap, branch taken, ret) interrupts.
– These types of exceptions are determined during decode or execution of an instruction. The
exception remains pending until all instructions before the exception causing instruction in
program order complete. The interrupt is then taken without completing the
exception-causing instruction. If completing previous instructions causes an exception, that
exception takes priority over the pending instruction dispatch/execution exception, which is
discarded (but may be encountered again when instruction processing resumes).
— Post-instruction execution:
– Debug (data address compare, instruction complete) interrupt.
– These debug exceptions are generated following execution and completion of an instruction
while the event is enabled. If executing the instruction produces conditions for another type
of exception with higher priority, that exception is taken and the post-instruction exception
is discarded for the instruction (but may be encountered again when instruction processing
resumes).
7.7.1 Exception Priorities
Exceptions are prioritized as described in Table 7-35. Some exceptions may be masked or imprecise,
which affects their priority. Nonmaskable exceptions such as reset and machine check may occur at any
time and are not delayed even if an interrupt is being serviced; thus state information for any interrupt may
be lost. Reset and certain machine checks are nonrecoverable.
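Selecting among several pending asynchronous exceptions can be sketched as a scan in priority order. The ordering and IVOR numbers below follow the asynchronous rows of Table 7-35; the enum, the array, and the function are hypothetical names, and the pending mask is assumed to contain only exceptions that are already enabled (masking by MSR and control register bits is not modeled).

```c
#include <stdint.h>

/* Asynchronous exceptions in decreasing priority order (Table 7-35). */
typedef enum {
    EXC_SYSTEM_RESET,    /* priority 0 */
    EXC_MACHINE_CHECK,   /* priority 1 */
    EXC_DEBUG_ASYNC,     /* priority 3: UDE, DEVT1/2, DCNT1/2, IDE */
    EXC_CRITICAL_INPUT,  /* priority 4 */
    EXC_WATCHDOG,        /* priority 5 */
    EXC_EXTERNAL_INPUT,  /* priority 6 */
    EXC_FIT,             /* priority 7 */
    EXC_DECREMENTER,     /* priority 8 */
    EXC_PERF_MON,        /* priority 9 */
    EXC_COUNT
} async_exc_t;

/* IVOR number used by each exception; -1 marks system reset (no IVOR). */
static const int async_ivor[EXC_COUNT] = { -1, 1, 15, 0, 12, 4, 11, 10, 35 };

/* Returns the highest-priority pending exception, or -1 if none is pending.
 * Bit e of 'pending' is set when exception e is pending and enabled. */
int highest_priority_pending(uint32_t pending)
{
    for (int e = 0; e < EXC_COUNT; e++) {
        if (pending & (1u << e)) {
            return e;
        }
    }
    return -1;
}
```

For example, if both the decrementer and the external input exceptions are pending, the sketch selects the external input exception, which would vector through IVOR4.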
Table 7-35. Zen Exception Priorities
Priority  Exception               Cause                                                                   IVOR
Asynchronous Exceptions
0         System reset            Assertion of p_reset_b, Watchdog Timer Reset Control, or Debug Reset    None
                                  Control
1         Machine check           Assertion of p_mcp_b, assertion of p_nmi_b, Cache Parity errors,        1
                                  exception on fetch of first instruction of an interrupt handler,
                                  external bus errors
2         —                       —                                                                       —
3¹        Debug:                                                                                          15
          1. UDE                  1. Assertion of p_ude (Unconditional Debug Event)
          2. DEVT1                2. Assertion of p_devt1 and event enabled (External Debug Event 1)
          3. DEVT2                3. Assertion of p_devt2 and event enabled (External Debug Event 2)
          4. DCNT1                4. Debug Counter 1 exception
          5. DCNT2                5. Debug Counter 2 exception
          6. IDE                  6. Imprecise Debug Event (event imprecise due to previous higher
                                  priority interrupt)
4¹        Critical Input          Assertion of p_critint_b                                                0
5¹        Watchdog Timer          Watchdog Timer first enabled time-out                                   12
6¹        External Input          Assertion of p_extint_b                                                 4
7¹        Fixed-Interval Timer    Posting of a FIT exception in TSR due to programmer-specified bit       11
                                  transition in the Time Base register
8¹        Decrementer             Posting of a Decrementer exception in TSR due to programmer-specified   10
                                  Decrementer condition
9¹        Performance Monitor     Performance Monitor Enabled Condition or Event                          35
Instruction Fetch Exceptions
10        Debug:                  Instruction address compare match for enabled IAC debug event and       15
          IAC (unlinked)          DBCR0[IDM] asserted
11        ITLB Error              Instruction translation lookup miss in the TLB                          14
12        Instruction Storage     Access control.                                                         3
                                  Byte ordering due to misaligned instruction across page boundary to
                                  pages with mismatched VLE bits, or access to page with VLE set, and E
                                  indicating little-endian.
                                  Misaligned Instruction fetch due to a change of flow to an odd half
                                  word instruction boundary on a Power ISA (non-VLE) instruction page,
                                  due to value in LR, CTR, or xSRR0
Instruction Dispatch/Execution Interrupts
13        Program: Illegal        Attempted execution of an illegal instruction.                          6
14        Program: Privileged     Attempted execution of a privileged instruction in user-mode            6
15        Floating-point          Any floating-point unavailable exception condition.                     7
          Unavailable
16        SPE/EFPU Unavailable    Any SPE or EFPU unavailable exception condition.                        32
17        Program: Unimplemented  Attempted execution of an unimplemented instruction.                    6
18        Debug:                                                                                          15
          1. BRT                  1. Attempted execution of a taken branch instruction
          2. Trap                 2. Condition specified in tw or twi instruction met.
          3. RET                  3. Attempted execution of a rfi instruction.
          4. CRET                 4. Attempted execution of an rfci instruction.
                                  Note: Exceptions require corresponding debug event enabled,
                                  MSR[DE] = 1, and DBCR0[IDM] = 1.
19        Program: Trap           Condition specified in tw or twi instruction met and not trap debug.    6
System Call
Execution of the System Call (sc, se_sc) instruction.
8
EFPU Floating-point Data Denormalized, NaN, or Infinity data detected as input or output, or underflow,
overflow, divide by zero, or invalid operation in the EFPU.
33
EFPU Round
Inexact Result
34
Alignment
lmw, stmw, lwarx, or stwcx. not word aligned.
lharx, or sthcx. not half-word aligned.
dcbz with cache disabled.
5
e200z7 Power Architecture Core Reference Manual, Rev. 2
7-46
Freescale Semiconductor
Interrupts and Exceptions
Table 7-35. Zen Exception Priorities (continued)
Priority
Exception
Cause
20
Debug:
Debug with concurrent
DTLB or DSI exception,
or concurrent async
machine check:
1. DAC/IAC linked2
2. DAC unlinked2
IVOR
15
Debug with concurrent DTLB or DSI exception, or async machine check
condition on the DAC. DBSR[IDE] also set.
1. Data Address Compare linked with Instruction Address Compare
2. Data Address Compare unlinked
Note: Exceptions requires corresponding debug event enabled,
MSR[DE] = 1, and DBCR0[IDM] = 1. In this case, the debug exception is
considered imprecise, and DBSR[IDE] will be set. Saved PC will point to the
load or store instruction causing the DAC event.
21
Data TLB Error
Data translation lookup miss in the TLB.
13
22
Data Storage
Access control.
Byte ordering due to misaligned access across page boundary to pages with
mismatched E bits.
Cache locking due to attempt to execute a dcbtls, dcbtstls, dcblc, icbtls, or
icblc in user mode with MSR[UCLE] = 0.
2
23
Alignment
dcbz to W = 1 or I = 1 storage with cache enabled
5
24
Debug:
1. IRPT
2. CIRPT
15
1. Interrupt taken (non-critical)
2. Critical Interrupt taken (critical only)
Note: Exceptions requires corresponding debug event enabled,
MSR[DE] = 1 and DBCR0[IDM] = 1.
Post-Instruction Execution Exceptions
25
Debug:
1. DAC/IAC linked2 1. Data Address Compare linked with Instruction Address Compare
2. Data Address Compare unlinked
2. DAC unlinked2
Notes: Exceptions requires corresponding debug event enabled,
MSR[DE] = 1, and DBCR0[IDM] = 1. Saved PC will point to the instruction
following the load or store instruction causing the DAC event.
15
26
Debug:
1. ICMP
15
1. Completion of an instruction.
Note: Exceptions requires corresponding debug event enabled,
MSR[DE] = 1, and DBCR0[IDM] = 1.
1
These asynchronous exceptions are sampled at instruction boundaries, thus may actually occur after exceptions which
are due to a currently executing instruction. If one of these exceptions occurs during execution of an instruction in the
pipeline, it is not processed until the pipeline has been flushed, and the exception associated with the excepting
instruction may occur first.
2 When no Data Storage Interrupt or Data TLB Error occurs, the Zen implements the data address compare debug
exceptions as post-instruction exceptions which differs from the Power ISA definition. When a TEA (either a DTLB error
or DSI or Machine Check (external TEA)) occurs in conjunction with an enabled DAC or linked DAC/IAC on a load or
store class instruction, or a debug counter event based on a counted DAC, the debug Interrupt takes priority, and the
saved PC value will point to the load or store class instruction, rather than to the next instruction.
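As an illustration of how the asynchronous rows of Table 7-35 are read, the following is a minimal Python sketch that picks the highest-priority exception from a set of pending ones. The dictionary keys and the function name are hypothetical, and the model deliberately ignores masking (MSR[EE], MSR[CE], MSR[DE]) and the instruction-boundary sampling described in footnote 1.

```python
# Priority numbers of the asynchronous exceptions in Table 7-35.
# A lower number means a higher priority. Key names are illustrative only.
ASYNC_PRIORITY = {
    "system_reset": 0,
    "machine_check": 1,
    "debug": 3,
    "critical_input": 4,
    "watchdog_timer": 5,
    "external_input": 6,
    "fixed_interval_timer": 7,
    "decrementer": 8,
    "performance_monitor": 9,
}


def highest_priority(pending):
    """Return the pending exception with the smallest priority number, or None."""
    known = [name for name in pending if name in ASYNC_PRIORITY]
    if not known:
        return None
    return min(known, key=ASYNC_PRIORITY.__getitem__)


if __name__ == "__main__":
    print(highest_priority({"decrementer", "external_input"}))
```

With both a decrementer and an external input exception pending, the sketch selects the external input, since its priority number (6) is lower than the decrementer's (8).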
7.8
Interrupt Processing
When an interrupt is taken, the processor uses the following registers to save the contents of the MSR and
to assist in identifying where instruction execution should resume after the interrupt is handled:
• SRR0/SRR1 for noncritical interrupts
• CSRR0/CSRR1 for critical interrupts
• MCSRR0/MCSRR1 for machine check interrupts
• Either CSRR0/CSRR1 or DSRR0/DSRR1 for debug interrupts
When an interrupt occurs, one of SRR0/CSRR0/DSRR0/MCSRR0 is set to the address of the instruction
that caused the exception or to the following instruction as appropriate.
• SRR1 is used to save machine state (selected MSR bits) on noncritical interrupts and to restore
those values when an rfi instruction is executed.
• CSRR1 is used to save machine status (selected MSR bits) on critical interrupts and to restore those
values when an rfci instruction is executed.
• DSRR1 is used to save machine status (selected MSR bits) on debug interrupts when the debug
unit is enabled and to restore those values when an rfdi instruction is executed.
• MCSRR1 is used to save machine status (selected MSR bits) on machine check interrupts and to
restore those values when an rfmci instruction is executed.
The exception syndrome register is loaded with information specific to the exception type. Some interrupt
types can only be caused by a single exception type, and thus do not use an ESR setting to indicate the
interrupt cause.
The machine state register is updated to preclude unrecoverable interrupts from occurring during the initial
portion of the interrupt handler. Specific settings are described in Table 7-36.
• For alignment, data storage, or data TLB miss interrupts, the data exception address register
(DEAR) is loaded with the address which caused the interrupt to occur.
• For machine check interrupts, the machine check syndrome register is loaded with information
specific to the exception type. For certain machine checks, the MCAR is loaded with an address
corresponding to the machine check.
Instruction fetch and execution resumes, using the new MSR value, at a location specific to the exception
type. The location is determined by the interrupt vector prefix register (IVPR), and an interrupt vector
offset register (IVOR) specific for each type of interrupt (see Table 7-2).
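The handler location can be sketched as follows. This is a minimal Python model that assumes the usual Book E arrangement, in which IVPR bits 0–15 supply the vector base and IVORn bits 16–27 supply a 16-byte-aligned offset; those field positions are an assumption here and should be checked against the IVPR and IVOR register descriptions. The function and constant names are hypothetical.

```python
# Assumed field positions (Book E style); verify against the register descriptions.
IVPR_BASE_MASK = 0xFFFF0000    # IVPR bits 0-15: vector base
IVOR_OFFSET_MASK = 0x0000FFF0  # IVORn bits 16-27: vector offset, low 4 bits zero


def handler_address(ivpr, ivor):
    """Combine the vector base (IVPR) with one interrupt's vector offset (IVORn)."""
    return (ivpr & IVPR_BASE_MASK) | (ivor & IVOR_OFFSET_MASK)


if __name__ == "__main__":
    # Bits outside the assumed fields are ignored by the masks.
    print(hex(handler_address(0x40001234, 0x00000045)))
```

In this model, bits written outside the assumed fields have no effect on the resulting address.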
Table 7-36 shows the MSR settings for different interrupt categories.
Table 7-36. MSR Setting Due to Interrupt
Bits | MSR Definition | Reset Setting | Noncritical Interrupt | Critical Interrupt | Debug Interrupt | Machine Check Interrupt
5 (37) | UCLE | 0 | 0 | 0 | 0 | 0
6 (38) | SPE | 0 | 0 | 0 | 0 | 0
13 (45) | WE | 0 | 0 | 0 | 0 | 0
14 (46) | CE | 0 | — | 0 | —/0¹ | 0
16 (48) | EE | 0 | 0 | 0 | —/0¹ | 0
17 (49) | PR | 0 | 0 | 0 | 0 | 0
18 (50) | FP | 0 | 0 | 0 | 0 | 0
19 (51) | ME | 0 | — | — | — | 0
20 (52) | FE0 | 0 | 0 | 0 | 0 | 0
22 (54) | DE | 0 | — | —/0¹ | 0 | —/0¹
23 (55) | FE1 | 0 | 0 | 0 | 0 | 0
26 (58) | IS | 0 | 0 | 0 | 0 | 0
27 (59) | DS | 0 | 0 | 0 | 0 | 0
29 (61) | PMM | 0 | 0 | 0 | 0 | 0
30 (62) | RI | 0 | — | — | — | 0
Reserved and preserved bits are unimplemented and read as 0.
1 Conditionally cleared based on control bits in HID0.
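The table can be read as a function from the old MSR value and the interrupt category to the new MSR value. The Python sketch below is one hedged way to model that, assuming that "—" means the bit is left unchanged and that "—/0" means the bit is cleared only when the corresponding HID0 control bit requests it. The `hid0_clears` parameter is a hypothetical stand-in for those HID0 controls, which are not named in this section.

```python
def msr_bit(n):
    """Mask for MSR bit n in the manual's 32-bit numbering (bit 0 is the MSB)."""
    return 1 << (31 - n)


MSR_BITS = {
    "UCLE": msr_bit(5), "SPE": msr_bit(6), "WE": msr_bit(13),
    "CE": msr_bit(14), "EE": msr_bit(16), "PR": msr_bit(17),
    "FP": msr_bit(18), "ME": msr_bit(19), "FE0": msr_bit(20),
    "DE": msr_bit(22), "FE1": msr_bit(23), "IS": msr_bit(26),
    "DS": msr_bit(27), "PMM": msr_bit(29), "RI": msr_bit(30),
}

# Bits shown as "—" in Table 7-36 (assumed to mean "left unchanged").
UNCHANGED = {
    "noncritical": {"CE", "ME", "DE", "RI"},
    "critical": {"ME", "RI"},
    "debug": {"ME", "RI"},
    "machine_check": set(),
}

# Bits shown as "—/0": cleared only if the (unnamed here) HID0 control says so.
CONDITIONAL = {
    "noncritical": set(),
    "critical": {"DE"},
    "debug": {"CE", "EE"},
    "machine_check": {"DE"},
}


def msr_after_interrupt(msr, kind, hid0_clears=()):
    """Return the MSR value the handler starts with for the given interrupt kind."""
    new_msr = msr
    for name, mask in MSR_BITS.items():
        if name in UNCHANGED[kind]:
            continue
        if name in CONDITIONAL[kind] and name not in hid0_clears:
            continue
        new_msr &= ~mask
    return new_msr & 0xFFFFFFFF
```

For example, a noncritical interrupt taken with MSR[EE], MSR[CE], and MSR[ME] set leaves CE and ME set and clears EE, so critical interrupts and machine checks remain enabled inside the handler.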
7.8.1
Enabling and Disabling Exceptions
When a condition exists that may cause an exception to be generated, determine whether the exception is
enabled for that condition according to the following.
• System reset exceptions cannot be masked.
• Machine check exceptions cannot be masked from sources other than the machine check pin, and
certain other async machine check status settings. Assertion of p_mcp_b is only recognized if the
machine check pin enable bit (HID0[EMCP]) is set. Certain machine check exceptions can be
enabled and disabled through bit(s) in the HID0 register.
• Asynchronous, maskable noncritical exceptions (such as the external input and decrementer) are
enabled by setting MSR[EE]. When MSR[EE] = 0, recognition of these exception conditions is
delayed. MSR[EE] is cleared automatically when a noncritical or critical interrupt is taken to mask
further recognition of conditions causing those exceptions.
• Asynchronous, maskable critical exceptions (such as critical input and watchdog timer) are
enabled by setting MSR[CE]. When MSR[CE] = 0, recognition of these exception conditions is
delayed. MSR[CE] is cleared automatically when a critical interrupt is taken to mask further
recognition of conditions causing those exceptions.
• Synchronous and asynchronous debug exceptions are enabled by setting MSR[DE]. When
MSR[DE] = 0, recognition of these exception conditions is masked. MSR[DE] is cleared
automatically when a debug interrupt is taken to mask further recognition of conditions causing
those exceptions. See Chapter 13, “Debug Support,” for more details on individual control of
debug exceptions.
• The floating-point unavailable exception can be prevented by setting MSR[FP] (although an
unimplemented instruction exception will be generated by e200 instead).
7.8.2
Returning from an Interrupt Handler
The return from interrupt (rfi, se_rfi), return from critical interrupt (rfci, se_rfci), return from debug
interrupt (rfdi, se_rfdi), and return from machine check interrupt (rfmci, se_rfmci) instructions perform
context synchronization by allowing previously-issued instructions to complete before returning to the
interrupted process. In general, execution of return from interrupt type instructions ensures the following:
• All previous instructions have completed to a point where they can no longer cause an exception.
This includes post-execute type exceptions.
• Previous instructions complete execution in the context (privilege and protection) under which
they were issued.
• The rfi and se_rfi instructions copy SRR1 bits back into the MSR.
• The rfci and se_rfci instructions copy CSRR1 bits back into the MSR.
• The rfdi and se_rfdi instructions copy DSRR1 bits back into the MSR.
• The rfmci and se_rfmci instructions copy MCSRR1 bits back into the MSR.
• Instructions fetched after this instruction execute in the context established by this instruction.
• Program execution resumes at the instruction indicated by SRR0 for rfi and se_rfi, CSRR0 for rfci
and se_rfci, MCSRR0 for rfmci and se_rfmci, and DSRR0 for rfdi and se_rfdi.
Note that the return instructions rfi and se_rfi may be subject to a return type debug exception and that the
return from critical interrupt instructions rfci and se_rfci may be subject to a critical return type debug
exception. For a complete description of context synchronization, refer to the EREF.
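The pairing between return instructions and save/restore registers can be sketched in Python as below. This is a simplified model: the `cpu` dictionary and both function names are hypothetical, the whole MSR is saved rather than only the selected MSR bits, and context synchronization and debug events are not modeled.

```python
# Save/restore register pair used by each return instruction (per this section).
RETURN_REGS = {
    "rfi": ("SRR0", "SRR1"),
    "rfci": ("CSRR0", "CSRR1"),
    "rfdi": ("DSRR0", "DSRR1"),
    "rfmci": ("MCSRR0", "MCSRR1"),
}


def take_interrupt(cpu, return_instr, resume_pc, handler_pc, handler_msr):
    """Save the resume address and MSR, then enter the handler."""
    rr0, rr1 = RETURN_REGS[return_instr]
    cpu[rr0] = resume_pc
    cpu[rr1] = cpu["MSR"]
    cpu["PC"] = handler_pc
    cpu["MSR"] = handler_msr


def return_from_interrupt(cpu, return_instr):
    """Restore the MSR and resume at the saved address."""
    rr0, rr1 = RETURN_REGS[return_instr]
    cpu["MSR"] = cpu[rr1]
    cpu["PC"] = cpu[rr0]


if __name__ == "__main__":
    cpu = {"PC": 0x1000, "MSR": 0x8000}
    take_interrupt(cpu, "rfi", 0x1004, 0x2000, 0)    # noncritical interrupt
    take_interrupt(cpu, "rfci", 0x2008, 0x3000, 0)   # critical interrupt nests
    return_from_interrupt(cpu, "rfci")
    return_from_interrupt(cpu, "rfi")
    print(hex(cpu["PC"]), hex(cpu["MSR"]))
```

Because each interrupt class has its own register pair, the critical interrupt in the usage example can nest inside the noncritical handler without overwriting SRR0/SRR1.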
7.9
Process Switching
The following instructions are useful for restoring proper context during process switching:
• The msync instruction orders the effects of data memory instruction execution. All instructions
previously initiated appear to have completed before the msync instruction completes, and no
subsequent instructions appear to be initiated until the msync instruction completes.
• The isync instruction waits for all previous instructions to complete and then discards any fetched
instructions, causing subsequent instructions to be fetched (or refetched) from memory and to
execute in the context (privilege, translation, and protection) established by the previous
instructions.
• The stwcx. instruction clears any outstanding reservations, ensuring that a load and reserve
instruction in an old process is not paired with a store conditional instruction in a new one.
Chapter 8
Performance Monitor
This chapter describes the performance monitor, which is generally defined by the Freescale EIS and
implemented as a unit on the e200z7 core. Although the programming model is defined by the EIS, some
features are defined by the implementation—in particular, the events that can be counted.
8.1
Overview
The performance monitor provides the ability to count predefined events and processor clocks associated
with particular operations, such as cache misses, mispredicted branches, or the number of cycles an
execution unit stalls. The count of such events can be used to trigger the performance monitor interrupt.
The performance monitor can do the following:
• Improve system performance by monitoring software execution and then recoding algorithms for
more efficiency. For example, memory hierarchy behavior can be monitored and analyzed to
optimize task scheduling or data distribution algorithms.
• Characterize processors in environments not easily characterized by benchmarking.
• Help system developers bring up and debug their systems.
The performance monitor consists of the following resources:
• The performance monitor mark bit in the MSR (MSR[PMM]). This bit controls which programs
are monitored.
• The move to/from performance monitor registers (PMR) instructions, mtpmr and mfpmr.
• The external input p_pm_event.
• The external outputs p_pmc0_ov, p_pmc1_ov, p_pmc2_ov, and p_pmc3_ov.
• PMRs, as follows:
— The performance monitor counter registers PMC0–PMC3 are 32-bit counters used to count
software-selectable events. UPMC0–UPMC3 provide user-level read access to these registers.
Counted events are those that should be of general value. They are identified in Table 8-10.
— The performance monitor global control register PMGC0 controls the counting of performance
monitor events. It takes priority over all other performance monitor control registers. UPMGC0
provides user-level read access to PMGC0.
— The performance monitor local control registers PMLCa0–PMLCa3 and PMLCb0–PMLCb3
control individual performance monitor counters. Each counter has a corresponding PMLCa
and PMLCb register. UPMLCa0–UPMLCa3 and UPMLCb0–UPMLCb3 provide user-level
read access to PMLCa0–PMLCa3 and PMLCb0–PMLCb3.
• The performance monitor interrupt follows the embedded category in the Power ISA interrupt
model and is assigned to interrupt vector offset register 35 (IVOR35). It has the lowest priority of
all asynchronous interrupts.
Software communication with the performance monitor APU is achieved through PMRs rather than SPRs.
8.2
Performance Monitor Instructions
The performance monitor defines the mfpmr and mtpmr instructions for reading and writing the PMRs
as follows.
mfpmr
Move from Performance Monitor Register
mfpmr rD,PMRN    Form: X
Bits | 0–5 | 6–10 | 11–15 | 16–20 | 21–30 | 31
Field | 0 1 1 1 1 1 | rD | PMRN[5–9] | PMRN[0–4] | 0 1 0 1 0 0 1 1 1 0 | /
GPR(rD) ← PMREG(PMRN)
The contents of the performance monitor register designated by PMRN are placed into GPR[rD].
MSR[PR] has the following results:
• When MSR[PR] = 1, specifying a performance monitor register that is not implemented or is write
only and is not privileged (i.e. PMRN[5] = 0) results in an illegal instruction exception-type
Program Interrupt.
• When MSR[PR] = 1, specifying a performance monitor register that is not implemented or is write
only and is privileged (i.e. PMRN[5] = 1) results in a privileged instruction exception-type
Program Interrupt.
• When MSR[PR] = 0, specifying a performance monitor register that is not implemented or is
write-only results in an illegal instruction exception type Program Interrupt.
mtpmr
Move to Performance Monitor Register
mtpmr PMRN,rS    Form: X
Bits | 0–5 | 6–10 | 11–15 | 16–20 | 21–30 | 31
Field | 0 1 1 1 1 1 | rS | PMRN[5–9] | PMRN[0–4] | 0 1 1 1 0 0 1 1 1 0 | /
PMREG(PMRN) ← GPR(rS)
The contents of GPR[rS] are placed into the performance monitor register designated by PMRN.
MSR[PR] has the following results:
• When MSR[PR] = 1, specifying a performance monitor register that is not implemented or is
read-only and is not privileged (i.e. PMRN[5] = 0) results in an illegal instruction exception-type
Program Interrupt.
• When MSR[PR] = 1, specifying a performance monitor register that is not implemented or is
read-only and is privileged (i.e. PMRN[5] = 1) results in a privileged instruction exception-type
Program Interrupt.
• When MSR[PR] = 0, specifying a performance monitor register that is not implemented or is
read-only results in an illegal instruction exception type Program Interrupt.
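The two instruction formats above can be checked with a small encoder. The Python sketch below assembles the 32-bit instruction words from the field layouts shown for mfpmr and mtpmr; note that the two 5-bit halves of PMRN are swapped in the encoding (PMRN[5–9] occupies bits 11–15 and PMRN[0–4] occupies bits 16–20). The function names are hypothetical, and bit 31 (shown as "/") is assumed to be encoded as 0.

```python
PRIMARY_OPCODE = 0b011111   # bits 0-5
XO_MFPMR = 0b0101001110     # bits 21-30 of mfpmr
XO_MTPMR = 0b0111001110     # bits 21-30 of mtpmr


def _encode(xo, gpr, pmrn):
    if not 0 <= gpr <= 31:
        raise ValueError("GPR number must be 0..31")
    if not 0 <= pmrn <= 0x3FF:
        raise ValueError("PMRN is a 10-bit value")
    pmrn_0_4 = (pmrn >> 5) & 0x1F   # high-order half of PMRN
    pmrn_5_9 = pmrn & 0x1F          # low-order half of PMRN
    return (
        (PRIMARY_OPCODE << 26)      # bits 0-5
        | (gpr << 21)               # bits 6-10: rD or rS
        | (pmrn_5_9 << 16)          # bits 11-15
        | (pmrn_0_4 << 11)          # bits 16-20
        | (xo << 1)                 # bits 21-30; bit 31 assumed 0
    )


def encode_mfpmr(rd, pmrn):
    """Instruction word for mfpmr rD,PMRN."""
    return _encode(XO_MFPMR, rd, pmrn)


def encode_mtpmr(pmrn, rs):
    """Instruction word for mtpmr PMRN,rS."""
    return _encode(XO_MTPMR, rs, pmrn)


if __name__ == "__main__":
    print(hex(encode_mfpmr(3, 400)))   # read PMR 400 (PMGC0) into r3
    print(hex(encode_mtpmr(400, 3)))   # write r3 to PMR 400 (PMGC0)
```

An encoder like this can be useful when a toolchain does not recognize the mfpmr and mtpmr mnemonics and the instruction has to be emitted as a raw word.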
8.3
Performance Monitor Registers
The Freescale EIS defines a set of register resources used exclusively by the performance monitor. PMRs
are similar to the SPRs defined in the embedded category in the Power ISA architecture and are accessed
by mtpmr and mfpmr instructions, which are also defined by the Freescale EIS. Table 8-1 lists
supervisor-level (privileged) PMRs.
Table 8-1. Supervisor-Level PMRs (PMR[5] = 1)
Name | Register Name | PMR Number | pmr[0–4] | pmr[5–9] | Section/Page
PMC0 | Performance monitor counter 0 | 16 | 00000 | 10000 | 8.3.9/8-12
PMC1 | Performance monitor counter 1 | 17 | 00000 | 10001 | 8.3.9/8-12
PMC2 | Performance monitor counter 2 | 18 | 00000 | 10010 | 8.3.9/8-12
PMC3 | Performance monitor counter 3 | 19 | 00000 | 10011 | 8.3.9/8-12
PMGC0 | Performance monitor global control register 0 | 400 | 01100 | 10000 | 8.3.3/8-5
PMLCa0 | Performance monitor local control a0 | 144 | 00100 | 10000 | 8.3.5/8-6
PMLCa1 | Performance monitor local control a1 | 145 | 00100 | 10001 | 8.3.5/8-6
PMLCa2 | Performance monitor local control a2 | 146 | 00100 | 10010 | 8.3.5/8-6
PMLCa3 | Performance monitor local control a3 | 147 | 00100 | 10011 | 8.3.5/8-6
PMLCb0 | Performance monitor local control b0 | 272 | 01000 | 10000 | 8.3.7/8-7
PMLCb1 | Performance monitor local control b1 | 273 | 01000 | 10001 | 8.3.7/8-7
PMLCb2 | Performance monitor local control b2 | 274 | 01000 | 10010 | 8.3.7/8-7
PMLCb3 | Performance monitor local control b3 | 275 | 01000 | 10011 | 8.3.7/8-7
Table 8-2 shows the user-level PMRs, which are read-only and accessed with mfpmr.
Table 8-2. User-Level PMRs (PMR[5] = 0) (Read-Only)
Name | Register Name | PMR Number | pmr[0–4] | pmr[5–9] | Section/Page
UPMC0 | User performance monitor counter 0 | 0 | 00000 | 00000 | 8.3.10/8-13
UPMC1 | User performance monitor counter 1 | 1 | 00000 | 00001 | 8.3.10/8-13
UPMC2 | User performance monitor counter 2 | 2 | 00000 | 00010 | 8.3.10/8-13
UPMC3 | User performance monitor counter 3 | 3 | 00000 | 00011 | 8.3.10/8-13
UPMGC0 | User performance monitor global control register 0 | 384 | 01100 | 00000 | 8.3.4/8-6
UPMLCa0 | User performance monitor local control a0 | 128 | 00100 | 00000 | 8.3.6/8-7
UPMLCa1 | User performance monitor local control a1 | 129 | 00100 | 00001 | 8.3.6/8-7
UPMLCa2 | User performance monitor local control a2 | 130 | 00100 | 00010 | 8.3.6/8-7
UPMLCa3 | User performance monitor local control a3 | 131 | 00100 | 00011 | 8.3.6/8-7
UPMLCb0 | User performance monitor local control b0 | 256 | 01000 | 00000 | 8.3.8/8-12
UPMLCb1 | User performance monitor local control b1 | 257 | 01000 | 00001 | 8.3.8/8-12
UPMLCb2 | User performance monitor local control b2 | 258 | 01000 | 00010 | 8.3.8/8-12
UPMLCb3 | User performance monitor local control b3 | 259 | 01000 | 00011 | 8.3.8/8-12
8.3.1
Invalid PMR References
Behavior when an invalid PMR is referenced depends on the privilege level of the register and MSR[PR].
Table 8-3 shows the response for various references to invalid PMRs.
Table 8-3. Response to an Invalid PMR Reference
PMR Address Bit 5 | MSR[PR] | Response
0 (user) | x | Illegal exception
1 (supervisor) | 0 (supervisor) | Illegal exception
1 (supervisor) | 1 (user) | Privileged exception
8.3.2
References to Read-only PMRs
If a mtpmr instruction is executed to a read-only PMR, the e200z7 takes an illegal exception.
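The decision in Table 8-3 depends only on PMR address bit 5 and MSR[PR], so it can be written as a two-input function. The Python sketch below is a minimal model of that table; the function names are hypothetical, and it covers only references to invalid PMRs, not accesses to valid ones.

```python
def is_supervisor_pmr(pmrn):
    """True if PMRN[5] is set, i.e. the 0x10 bit of the 10-bit PMR number."""
    return bool(pmrn & 0x10)


def invalid_pmr_response(pmrn, msr_pr):
    """Exception type for a reference to an invalid PMR (Table 8-3)."""
    if is_supervisor_pmr(pmrn) and msr_pr == 1:
        return "privileged"
    return "illegal"


if __name__ == "__main__":
    for pmrn, pr in [(5, 0), (5, 1), (20, 0), (20, 1)]:
        print(pmrn, pr, invalid_pmr_response(pmrn, pr))
```

PMR numbers 5 and 20 are used here only because neither appears in Table 8-1 or Table 8-2; 5 falls in the user range and 20 in the supervisor range.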
8.3.3
Global Control Register 0 (PMGC0)
The performance monitor global control register (PMGC0), shown in Figure 8-1, controls all performance
monitor counters.
PMR 400    Access: Read/Write
Bits | 0 | 1 | 2 | 3–18 | 19–20 | 21–22 | 23 | 24–31
Field | FAC | PMIE | FCECE | — | TBSEL | — | TBEE | —
Reset | All zeros
Figure 8-1. Performance Monitor Global Control Register (PMGC0)
PMGC0 is cleared by reset. Reading this register does not change its contents. Table 8-4 describes
PMGC0’s fields.
Table 8-4. PMGC0 Field Descriptions
Bits | Name | Description
0 (32) | FAC | Freeze All Counters
  When FAC is set by hardware or software, it has no effect on PMLCax[FC]; PMLCax[FC] maintains its
  current value until changed by software. FAC setting by hardware is controlled by PMGC0[FCECE].
  0 The PMCs are incremented (if permitted by other PMGC/PMLC control bits).
  1 The PMCs are not incremented.
1 (33) | PMIE | Performance monitor interrupt Enable
  Software can clear PMIE to prevent performance monitor interrupts. Performance monitor interrupts are
  caused by time base events or PMCx overflow.
  0 Performance monitor interrupts are disabled.
  1 Performance monitor interrupts are enabled and occur when an enabled condition or event occurs, at
  which time PMGC0[PMIE] is cleared.
2 (34) | FCECE | Freeze Counters on Enabled Condition or Event
  An enabled condition or event is defined as one of the following:
  • When the msb = 1 in PMCx and PMLCax[CE] = 1.
  • When the time-base bit specified by PMGC0[TBSEL] transitions to 1 and PMGC0[TBEE] = 1.
  The use of the trigger and freeze counter conditions depends on the enabled conditions and events
  described in Section 8.4, “Performance Monitor Interrupt.”
  0 The PMCs can be incremented (if permitted by other PM control bits).
  1 The PMCs can be incremented (if permitted by other PM control bits) only until an enabled condition or
  event occurs. When an enabled condition or event occurs, PMGC0[FAC] is set to 1. It is up to software
  to clear PMGC0[FAC] to 0.
3–18 (35–50) | — | Reserved, should be cleared.
19–20 (51–52) | TBSEL | Time Base Selector
  Selects the time base bit that can cause a time base transition event (the event occurs when the selected
  bit changes from 0 to 1).
  Time-base frequency is implementation dependent, so software should invoke a system service program
  to obtain the frequency before choosing a TBSEL value.
  00 TB[63] (TBL[31])
  01 TB[55] (TBL[23])
  10 TB[51] (TBL[19])
  11 TB[47] (TBL[15])
21–22 (53–54) | — | Reserved, should be cleared.
23 (55) | TBEE | Time base transition Event Enable
  Time base transition events can be used to freeze counters (PMGC0[FCECE]) or signal an exception
  (PMGC0[PMIE]). Although the exception signal condition may occur with MSR[EE] = 0, the interrupt
  cannot be taken until MSR[EE] = 1.
  Changing PMGC0[TBSEL] while PMGC0[TBEE] is enabled may cause a false 0 to 1 transition that
  signals the specified action (freeze, exception) to occur immediately.
  0 Time base transition events are disabled.
  1 Time base transition events are enabled. A time base transition is signalled to the performance monitor
  if the TB bit specified in PMGC0[TBSEL] changes from 0 to 1.
24–31 (56–63) | — | Reserved, should be cleared.
8.3.4
User Global Control Register 0 (UPMGC0)
UPMGC0 provides user-level read access to PMGC0. UPMGC0 can be read by user-level software with
the mfpmr instruction using PMR 384.
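Since UPMGC0 returns the PMGC0 layout, the same masks apply to values read through either register. The Python sketch below derives the field masks from the bit positions in Table 8-4 (bit 0 is the most-significant bit of the 32-bit register) and computes how many time-base ticks separate successive 0-to-1 transitions of the bit selected by TBSEL. The constant and function names are hypothetical.

```python
def field_mask(first, last):
    """Mask covering bits first..last in 32-bit numbering (bit 0 is the MSB)."""
    width = last - first + 1
    return ((1 << width) - 1) << (31 - last)


PMGC0_FAC = field_mask(0, 0)
PMGC0_PMIE = field_mask(1, 1)
PMGC0_FCECE = field_mask(2, 2)
PMGC0_TBSEL = field_mask(19, 20)
PMGC0_TBEE = field_mask(23, 23)
TBSEL_SHIFT = 31 - 20

# TBSEL encoding -> selected time-base bit (64-bit numbering, TB[63] is the LSB).
TBSEL_TB_BIT = {0b00: 63, 0b01: 55, 0b10: 51, 0b11: 47}


def pmgc0_value(fac=False, pmie=False, fcece=False, tbsel=0, tbee=False):
    """Build a PMGC0 value; reserved bits are left cleared."""
    if tbsel not in TBSEL_TB_BIT:
        raise ValueError("TBSEL is a 2-bit field")
    value = tbsel << TBSEL_SHIFT
    if fac:
        value |= PMGC0_FAC
    if pmie:
        value |= PMGC0_PMIE
    if fcece:
        value |= PMGC0_FCECE
    if tbee:
        value |= PMGC0_TBEE
    return value


def ticks_between_events(tbsel):
    """Time-base ticks between successive 0-to-1 transitions of the selected bit."""
    return 1 << (64 - TBSEL_TB_BIT[tbsel])


if __name__ == "__main__":
    print(hex(pmgc0_value(pmie=True, fcece=True, tbsel=0b10, tbee=True)))
    print(ticks_between_events(0b10))
```

The tick counts follow from a free-running time base: TB[63] toggles on every tick, so it rises once every 2 ticks, and each step toward the more-significant end doubles that interval.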
8.3.5
Local Control A Registers (PMLCa0–PMLCa3)
The local control A registers (PMLCa0–PMLCa3) function as event selectors and give local control for
the corresponding performance monitor counters. PMLCa is used in conjunction with the corresponding
PMLCb register. PMLCa registers are shown in Figure 8-2.
PMR 144–147    Access: Read/Write
Bits | 0 | 1 | 2 | 3 | 4 | 5 | 6–7 | 8–15 | 16 | 17–19 | 20–31
Field | FC | FCS | FCU | FCM1 | FCM0 | CE | — | EVENT | — | PMP | —
Reset | All zeros
Figure 8-2. Performance Monitor Local Control A Registers (PMLCa0–PMLCa3)
PMLCa registers are cleared by reset. Table 8-5 describes PMLCa fields.
Table 8-5. PMLCa0–PMLCa3 Field Descriptions
Bits | Name | Description
0 (32) | FC | Freeze Counter.
  0 The PMC can be incremented (if enabled by other performance monitor control fields).
  1 The PMC will not be incremented.
1 (33) | FCS | Freeze Counter in Supervisor state.
  0 The PMC can be incremented (if enabled by other performance monitor control fields).
  1 The PMC will not be incremented if MSR[PR] is cleared.
2 (34) | FCU | Freeze Counter in User state.
  0 The PMC can be incremented (if enabled by other performance monitor control fields).
  1 The PMC will not be incremented if MSR[PR] is set.
3 (35) | FCM1 | Freeze Counter while Mark is set.
  0 The PMC can be incremented (if enabled by other performance monitor control fields).
  1 The PMC will not be incremented if MSR[PMM] is set.
4 (36) | FCM0 | Freeze Counter while Mark is cleared.
  0 The PMC can be incremented (if enabled by other performance monitor control fields).
  1 The PMC will not be incremented if MSR[PMM] is cleared.
5 (37) | CE | Condition Enable.
  It is recommended that CE be cleared when counter PMCn is selected for chaining.
  0 Overflow conditions for PMCn cannot occur (PMCn cannot cause interrupts or freeze counters).
  1 An overflow condition is present when the most-significant-bit of PMCn is equal to 1.
6–7 (38–39) | — | Reserved for EVENT expansion, should be cleared.
8–15 (40–47) | EVENT | Event selector. See Section 8.7, “Event Selection.”
16 (48) | — | Reserved, should be cleared.
17–19 (49–51) | PMP | Performance Monitor Watchpoint Periodicity Select
  000 Performance Monitor Watchpoint x triggers on any change of counterx bit 32 (period = 2^31)
  001 Performance Monitor Watchpoint x triggers on any change of counterx bit 43 (period = 2^20)
  010 Performance Monitor Watchpoint x triggers on any change of counterx bit 49 (period = 2^14)
  011 Performance Monitor Watchpoint x triggers on any change of counterx bit 55 (period = 2^8)
  100 Performance Monitor Watchpoint x triggers on any change of counterx bit 59 (period = 2^4)
  101 Performance Monitor Watchpoint x triggers on any change of counterx bit 61 (period = 2^2)
  110 Performance Monitor Watchpoint x triggers on any change of counterx bit 62 (period = 2^1)
  111 Performance Monitor Watchpoint x triggers on any change of counterx bit 63 (period = 2^0)¹
20–31 (52–63) | — | Reserved, should be cleared.
1 For certain events which may count an even number of times per cycle, this watchpoint is not guaranteed to assert with
PMP = 111.
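The PMLCa fields in Table 8-5 can be exercised with a short Python sketch. It packs a PMLCa value, evaluates the per-counter freeze bits against MSR[PR] and MSR[PMM], and maps PMP to the watchpoint period. All names are hypothetical, the event number is a placeholder rather than an entry from the event table, and the freeze check ignores PMGC0[FAC] and the PMLCb triggers. The `preload_for` helper shows a common way to use the CE overflow condition: starting a counter at 0x80000000 minus N makes its most-significant bit become 1 after N counted events.

```python
def bit(n):
    """Mask for bit n in 32-bit numbering (bit 0 is the MSB)."""
    return 1 << (31 - n)


PMLCA_FC, PMLCA_FCS, PMLCA_FCU, PMLCA_FCM1, PMLCA_FCM0, PMLCA_CE = (
    bit(n) for n in range(6)
)
EVENT_SHIFT = 31 - 15   # EVENT occupies bits 8-15
PMP_SHIFT = 31 - 19     # PMP occupies bits 17-19

# PMP encoding -> counter bit (64-bit numbering) that drives the watchpoint.
PMP_COUNTER_BIT = {0: 32, 1: 43, 2: 49, 3: 55, 4: 59, 5: 61, 6: 62, 7: 63}


def pmlca_value(event, ce=False, pmp=0, freeze=0):
    """Build a PMLCa value; `freeze` is an OR of the PMLCA_F* masks."""
    if not 0 <= event <= 0xFF:
        raise ValueError("EVENT is an 8-bit field")
    if pmp not in PMP_COUNTER_BIT:
        raise ValueError("PMP is a 3-bit field")
    value = freeze | (event << EVENT_SHIFT) | (pmp << PMP_SHIFT)
    if ce:
        value |= PMLCA_CE
    return value


def counter_frozen(pmlca, msr_pr, msr_pmm):
    """True if any PMLCa freeze bit stops the counter in the given MSR state."""
    return bool(
        pmlca & PMLCA_FC
        or (pmlca & PMLCA_FCS and not msr_pr)
        or (pmlca & PMLCA_FCU and msr_pr)
        or (pmlca & PMLCA_FCM1 and msr_pmm)
        or (pmlca & PMLCA_FCM0 and not msr_pmm)
    )


def pmp_period(pmp):
    """Watchpoint period in counts for a PMP setting."""
    return 1 << (63 - PMP_COUNTER_BIT[pmp])


def preload_for(n_events):
    """PMC start value whose MSB becomes 1 after n_events increments."""
    if not 1 <= n_events <= 0x80000000:
        raise ValueError("n_events out of range for a 32-bit counter")
    return 0x80000000 - n_events
```

For instance, `pmlca_value(0x12, ce=True, freeze=PMLCA_FCS)` describes a counter that counts placeholder event 0x12 in user state only and reports an overflow condition once its most-significant bit is set.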
8.3.6
User Local Control A Registers (UPMLCa0–UPMLCa3)
The PMLCa register contents are aliased to UPMLCa0–UPMLCa3, which can be read by user-level
software with mfpmr using PMR numbers in Table 8-2.
8.3.7
Local Control B Registers (PMLCb0–PMLCb3)
Local control B registers (PMLCb0–PMLCb3), shown in Figure 8-3, specify triggering conditions, a
threshold value, and a multiple to apply to a threshold event selected for the corresponding performance
monitor counter. For the e200z7, thresholding is supported only for PMC0 and PMC1. PMLCb is used in
conjunction with the corresponding PMLCa register.
PMR 272–275    Access: Read/Write
Bits | 0 | 1–3 | 4 | 5–7 | 8 | 9–12 | 13 | 14–17 | 18–31
Field | — | TRIGONCTL | — | TRIGOFFCTL | — | TRIGONSEL | — | TRIGOFFSEL | TRIGGERED, —, THRESHMUL, —, THRESHOLD (in that order; see Table 8-6 for bit positions)
Reset | All zeros
Figure 8-3. Performance Monitor Local Control B Registers (PMLCb0–PMLCb3)
PMLCb is cleared by reset. Table 8-6 describes PMLCb fields.
Table 8-6. PMLCb0–PMLCb3 Field Descriptions
Bits | Name | Description
0 (32) | — | Reserved, should be cleared.
1:3 (33:35) | TRIGONCTL | Trigger-on Control Class: class of trigger-on source
  Indicates the condition under which triggering to start counting occurs. No triggering will occur
  while PMGC0[FAC] or PMLCan[FC] is set.
  000 Trigger-on control is disabled if TRIGONSEL is 0000 (i.e. counting is not affected by
  triggers). All other values for TRIGONSEL are reserved.
  001 Trigger-on control based on selected PMC condition(s)
  010 Trigger-on based on selected processor event(s)
  011 Trigger-on based on selected hardware signal(s)
  100 Trigger-on based on selected watchpoint occurrence (watchpoint #0–15)
  101 Trigger-on based on selected watchpoint occurrence (extension for watchpoint #16–31)
  11x Reserved
4 (36) | — | Reserved, should be cleared.
5:7 (37:39) | TRIGOFFCTL | Trigger-off Control Class: class of trigger-off source
  Indicates the condition under which triggering to stop counting occurs. No triggering will occur
  while PMGC0[FAC] or PMLCan[FC] is set.
  000 Trigger-off control is disabled if TRIGOFFSEL is 0000 (i.e. counting is not affected by
  triggers). All other values for TRIGOFFSEL are reserved.
  001 Trigger-off control based on selected PMC condition(s)
  010 Trigger-off based on selected processor event(s)
  011 Trigger-off based on selected hardware signal(s)
  100 Trigger-off based on selected watchpoint occurrence (watchpoint #0–15)
  101 Trigger-off based on selected watchpoint occurrence (extension for watchpoint #16–31)
  11x Reserved
8 (40) | — | Reserved, should be cleared.
9:12 (41:44) | TRIGONSEL | Trigger-on Source Select: source select based on setting of TRIGONCTL
• TRIGONCTL = 000:
0000 Trigger-on control is disabled
0001–1111 Reserved
• TRIGONCTL = 001:
This field should be set to the ID of the PMCy that should trigger event counting to start. When PMCy
overflows, the trigger will be generated.
When TRIGONSEL = PMCx (i.e. self-select), no triggering will occur due to any counter change.
If TRIGONSEL = TRIGOFFSEL, triggering results are undefined.
0000 Trigger-on when PMC0[OV] transitions to a 1.
0001 Trigger-on when PMC1[OV] transitions to a 1.
0010 Trigger-on when PMC2[OV] transitions to a 1.
0011 Trigger-on when PMC3[OV] transitions to a 1.
0100–1111 Reserved
• TRIGONCTL = 010:
0000 Trigger-on when next processor interrupt occurs (software may want to set
PMGC0[PMIE] = 0 for this setting).
0001–1111 Reserved
• TRIGONCTL = 011:
0000 Trigger on assertion of p_devnt_out[0]
0001 Trigger on assertion of p_devnt_out[1]
0010 Trigger on assertion of p_devnt_out[2]
0011 Trigger on assertion of p_devnt_out[3]
0100 Trigger on assertion of p_devnt_out[4]
0101 Trigger on assertion of p_devnt_out[5]
0110 Trigger on assertion of p_devnt_out[6]
0111 Trigger on assertion of p_devnt_out[7]
1000 Trigger on rise of p_pmcn_qual input
1001–1111 Reserved
• TRIGONCTL = 100:
0000 Trigger-on based on watchpoint #0 occurrence
0001 Trigger-on based on watchpoint #1 occurrence
0010 Trigger-on based on watchpoint #2 occurrence
…
1110 Trigger-on based on watchpoint #14 occurrence
1111 Trigger-on based on watchpoint #15 occurrence
• TRIGONCTL = 101:
0000 Trigger-on based on watchpoint #16 occurrence
0001 Trigger-on based on watchpoint #17 occurrence
0010 Trigger-on based on watchpoint #18 occurrence
…
1000 Trigger-on based on watchpoint #24 occurrence
1001 Trigger-on based on watchpoint #25 occurrence
1100–1111 Reserved
13
(45)
—
Reserved, should be cleared.
14:17
(46:49)
TRIGOFFSEL
Trigger-off Source Select - Source Select based on setting of TRIGOFFCTL
• TRIGOFFCTL = 000:
0000 Trigger-off control is disabled
0001–1111 Reserved
• TRIGOFFCTL = 001:
This field should be set to the ID of the PMCy that should trigger event counting to stop. The
trigger is generated when PMCy overflows.
When TRIGOFFSEL = PMCx (i.e. self-select), no triggering will occur due to any counter change.
If TRIGONSEL = TRIGOFFSEL, triggering results are undefined.
0000 Trigger-off when PMC0[OV] transitions to a 1.
0001 Trigger-off when PMC1[OV] transitions to a 1.
0010 Trigger-off when PMC2[OV] transitions to a 1.
0011 Trigger-off when PMC3[OV] transitions to a 1.
0100–1111 Reserved
• TRIGOFFCTL = 010:
0000 Trigger-off when next processor interrupt occurs (software may want to set
PMGC0[PMIE] = 0 for this setting).
0001–1111 Reserved
• TRIGOFFCTL = 011:
0000 Trigger-off based on assertion of p_devnt_out[0]
0001 Trigger-off based on assertion of p_devnt_out[1]
0010 Trigger-off based on assertion of p_devnt_out[2]
0011 Trigger-off based on assertion of p_devnt_out[3]
0100 Trigger-off based on assertion of p_devnt_out[4]
0101 Trigger-off based on assertion of p_devnt_out[5]
0110 Trigger-off based on assertion of p_devnt_out[6]
0111 Trigger-off based on assertion of p_devnt_out[7]
1000 Trigger-off based on fall of p_pmcn_qual input
1001–1111 Reserved
• TRIGOFFCTL = 100:
0000 Trigger-off based on watchpoint #0 occurrence
0001 Trigger-off based on watchpoint #1 occurrence
0010 Trigger-off based on watchpoint #2 occurrence
…
1110 Trigger-off based on watchpoint #14 occurrence
1111 Trigger-off based on watchpoint #15 occurrence
• TRIGOFFCTL = 101:
0000 Trigger-off based on watchpoint #16 occurrence
0001 Trigger-off based on watchpoint #17 occurrence
0010 Trigger-off based on watchpoint #18 occurrence
…
1000 Trigger-off based on watchpoint #24 occurrence
1001 Trigger-off based on watchpoint #25 occurrence
1100–1111 Reserved
18
(50)
TRIGGERED
Triggered
0 Counter has not been triggered
1 Counter has been triggered
TRIGGERED can be set or cleared by hardware or software. PMLCbx[TRIGONCTL] controls
TRIGGERED setting by hardware. If PMLCbx[TRIGONCTL] is set to enable trigger-on control,
TRIGGERED will be set by hardware when the next trigger-on event occurs and TRIGGERED is
currently cleared.
PMLCbx[TRIGOFFCTL] controls TRIGGERED clearing by hardware. If PMLCbx[TRIGOFFCTL]
is set to enable trigger-off control, TRIGGERED will be cleared by hardware when the next
trigger-off event occurs and TRIGGERED is currently set.
The state of TRIGGERED qualifies counting if either PMLCbx[TRIGONCTL] or
PMLCbx[TRIGOFFCTL] is set to enable triggering (other qualifiers on counting such as
PMGC0[FAC] and PMLCa controls operate independently of TRIGGERED). If both
PMLCbx[TRIGONCTL] and PMLCbx[TRIGOFFCTL] are cleared to disable triggering, the state
of TRIGGERED has no effect on counting.
TRIGGERED has no effect on PMLCax[FC]; PMLCax[FC] maintains its current value until
changed by software.
19:20
(51:52)
—
Reserved, should be cleared.
21:23
(53:55)
THRESHMUL¹
Threshold multiple.
000 Threshold field is multiplied by 1 (PMLCbn[THRESHOLD] × 1)
001 Threshold field is multiplied by 2 (PMLCbn[THRESHOLD] × 2)
010 Threshold field is multiplied by 4 (PMLCbn[THRESHOLD] × 4)
011 Threshold field is multiplied by 8 (PMLCbn[THRESHOLD] × 8)
100 Threshold field is multiplied by 16 (PMLCbn[THRESHOLD] × 16)
101 Threshold field is multiplied by 32 (PMLCbn[THRESHOLD] × 32)
110 Threshold field is multiplied by 64 (PMLCbn[THRESHOLD] × 64)
111 Threshold field is multiplied by 128 (PMLCbn[THRESHOLD] × 128)
24:25
(56:57)
—
Reserved, should be cleared.
26:31
(58:63)
THRESHOLD¹
Threshold
Only events that exceed this value multiplied by THRESHMUL are counted. Events to which a
threshold value applies are implementation dependent, as are the unit (for example duration in
cycles) and the granularity with which the threshold value is interpreted.
By varying the threshold value, software can obtain a profile of the event characteristics subject
to thresholding by monitoring a program repeatedly using a different threshold value each time.
¹ These fields are not implemented in PMLCb2 and PMLCb3 and are read as zero.
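The bit positions listed in Table 8-6 can be turned into shift-and-mask constants. The following C sketch is illustrative only. It assumes the usual Power Architecture numbering in which bit 0 is the most significant bit of the 32-bit register, it covers only the TRIGONSEL, TRIGOFFSEL, TRIGGERED, THRESHMUL, and THRESHOLD fields, and the helper names (pmlcb_pack, pmlcb_effective_threshold) are hypothetical rather than part of any Freescale header.

```c
#include <assert.h>
#include <stdint.h>

/* Shift for a field whose last bit is big-endian bit `lsb_be`
 * of a 32-bit register (bit 0 = most significant bit). */
#define PMLCB_SHIFT(lsb_be) (31u - (lsb_be))

#define PMLCB_TRIGONSEL_SHIFT  PMLCB_SHIFT(12u) /* bits 9:12  */
#define PMLCB_TRIGOFFSEL_SHIFT PMLCB_SHIFT(17u) /* bits 14:17 */
#define PMLCB_TRIGGERED_SHIFT  PMLCB_SHIFT(18u) /* bit 18     */
#define PMLCB_THRESHMUL_SHIFT  PMLCB_SHIFT(23u) /* bits 21:23 */
#define PMLCB_THRESHOLD_SHIFT  PMLCB_SHIFT(31u) /* bits 26:31 */

/* Hypothetical helper: build a register image from the fields above.
 * Reserved bits and all other PMLCb fields are left as zero. */
static uint32_t pmlcb_pack(uint32_t trigonsel, uint32_t trigoffsel,
                           uint32_t triggered, uint32_t threshmul,
                           uint32_t threshold)
{
    assert(trigonsel <= 0xFu && trigoffsel <= 0xFu);
    assert(triggered <= 1u && threshmul <= 7u && threshold <= 0x3Fu);
    return (trigonsel  << PMLCB_TRIGONSEL_SHIFT)
         | (trigoffsel << PMLCB_TRIGOFFSEL_SHIFT)
         | (triggered  << PMLCB_TRIGGERED_SHIFT)
         | (threshmul  << PMLCB_THRESHMUL_SHIFT)
         | (threshold  << PMLCB_THRESHOLD_SHIFT);
}

/* Effective threshold: THRESHOLD multiplied by 1, 2, 4, ... 128,
 * that is THRESHOLD shifted left by the THRESHMUL encoding. */
static uint32_t pmlcb_effective_threshold(uint32_t pmlcb)
{
    uint32_t mul = (pmlcb >> PMLCB_THRESHMUL_SHIFT) & 0x7u;
    uint32_t thr = (pmlcb >> PMLCB_THRESHOLD_SHIFT) & 0x3Fu;
    return thr << mul;
}
```

With these encodings the largest effective threshold is 63 × 128 = 8064. Because the threshold fields are not implemented in PMLCb2 and PMLCb3, a value packed this way is only meaningful for PMLCb0 and PMLCb1.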
8.3.8
User Local Control B Registers (UPMLCb0–UPMLCb3)
The contents of PMLCb0–PMLCb3 are aliased to UPMLCb0–UPMLCb3, which can be read by user-level
software with mfpmr using PMR numbers in Table 8-2.
8.3.9
Performance Monitor Counter Registers (PMC0–PMC3)
The performance monitor counter registers PMC0–PMC3 shown in Figure 8-4 are 32-bit counters that can
be programmed to generate overflow event signals when they overflow. Each counter is enabled to count
up to 128 processor events.
PMR 16–19
Access: Read/Write
Bit 0: OV
Bits 1–31: Counter Value
Reset: All zeros
Figure 8-4. Performance Monitor Counter Registers (PMC0–PMC3)
PMCs are cleared by reset. Table 8-7 describes the PMC register fields.
Table 8-7. PMC0–PMC3 Field Descriptions
Bits
Name
Description
0
(32)
OV
Overflow
0 Counter has not reached an overflow state.
1 Counter has reached an overflow state.
1–31
(33–63)
Counter Value
Indicates the number of occurrences of the specified event.
A counter can increment by 0, 1, 2, 3, or 4 (based on the number of events occurring in a given counter
cycle) up to the maximum value and then wraps to the minimum value.
A counter enters the overflow state when the high-order bit is set. A performance monitor interrupt handler
can easily identify overflowed counters, even if the interrupt is masked for many cycles (during which the
counters may continue incrementing). A high-order bit is set normally only when the counter increments
from a value below 2,147,483,648 (0x8000_0000) to a value greater than or equal to 2,147,483,648
(0x8000_0000).
NOTE
Initializing PMCs to overflowed values is discouraged. If an overflowed
value is loaded into a PMCn that held a non-overflowed value (and
PMGC0[PMIE], PMLCan[CE], and MSR[EE] are set), an interrupt may be
falsely generated before any events are counted.
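The overflow rule above (OV is the high-order bit, set when the count reaches 0x8000_0000) suggests a common way to request an overflow after a chosen number of events: start the counter at 0x8000_0000 minus that number. The C sketch below illustrates the arithmetic only. The helper names are hypothetical, and it does not touch hardware.

```c
#include <assert.h>
#include <stdint.h>

#define PMC_OV         0x80000000u /* bit 0 (most significant): overflow */
#define PMC_COUNT_MASK 0x7FFFFFFFu /* bits 1-31: counter value           */

/* Nonzero when the counter image is in the overflow state. */
static int pmc_overflowed(uint32_t pmc)
{
    return (pmc & PMC_OV) != 0u;
}

/* Counter value field (bits 1-31) without the OV bit. */
static uint32_t pmc_count(uint32_t pmc)
{
    return pmc & PMC_COUNT_MASK;
}

/* Hypothetical helper: initial value that leaves `events` counts before
 * the high-order bit becomes set. The result is never an overflowed
 * value, in line with the NOTE above. */
static uint32_t pmc_preload_for(uint32_t events)
{
    assert(events >= 1u && events <= PMC_OV);
    return PMC_OV - events;
}
```

Since a counter may advance by up to 4 in one cycle, the value observed when OV becomes set can be slightly above 0x8000_0000, so a handler should not assume an exact count at the moment of overflow.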
The response to an overflow condition depends on the configuration, as follows:
• If PMLCan[CE] is clear, no special actions occur on overflow of PMCn: the counter continues
incrementing, and no event is signaled.
• If PMLCan[CE] and PMGC0[FCECE] are both set, all counters are frozen when PMCn overflows.
• If PMLCan[CE] and PMGC0[PMIE] are set, an exception is signaled on overflow of PMCn.
Performance Monitor Interrupts are masked when MSR[EE] = 0. An exception may be signaled
while MSR[EE] = 0, but the interrupt is not taken until MSR[EE] = 1 and is only guaranteed to be
taken if the overflow condition is still present and the configuration has not been changed in the
meantime to disable the exception. If PMLCan[CE] or PMGC0[PMIE] is cleared, the exception is
no longer signaled.
The following sequence is recommended for setting counter values and configurations:
1. Set PMGC0[FAC] to freeze the counters.
2. Using mtpmr instructions, initialize counters and configure control registers.
3. Release the counters by clearing PMGC0[FAC] with a final mtpmr.
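The three steps above can be sketched in C as follows. This is a host-side illustration, not driver code: write_pmr and read_pmr are hypothetical stand-ins for mtpmr and mfpmr that operate on an array, and the PMGC0 register number (400) and FAC bit position (most significant bit) are assumptions that should be checked against Table 8-2 and the PMGC0 register description. Only the PMC numbers (PMR 16–19) come from Figure 8-4.

```c
#include <assert.h>
#include <stdint.h>

#define PMR_PMC0  16u          /* PMC0-PMC3 are PMR 16-19 (Figure 8-4)  */
#define PMR_PMGC0 400u         /* ASSUMED PMR number for PMGC0          */
#define PMGC0_FAC 0x80000000u  /* ASSUMED position of the FAC bit       */

/* Simulated PMR file so the sketch can run without target hardware. */
static uint32_t sim_pmr[1024];

static void write_pmr(uint32_t pmr, uint32_t value) /* stands in for mtpmr */
{
    assert(pmr < 1024u);
    sim_pmr[pmr] = value;
}

static uint32_t read_pmr(uint32_t pmr)              /* stands in for mfpmr */
{
    assert(pmr < 1024u);
    return sim_pmr[pmr];
}

/* Program one counter and one control register using the recommended
 * freeze / configure / release order. `ctl_pmr` is supplied by the
 * caller because control register numbers are not listed here. */
static void pm_program_counter(uint32_t idx, uint32_t initial,
                               uint32_t ctl_pmr, uint32_t ctl_value)
{
    assert(idx < 4u);
    write_pmr(PMR_PMGC0, read_pmr(PMR_PMGC0) | PMGC0_FAC);  /* 1. freeze    */
    write_pmr(PMR_PMC0 + idx, initial);                     /* 2. counter   */
    write_pmr(ctl_pmr, ctl_value);                          /*    control   */
    write_pmr(PMR_PMGC0, read_pmr(PMR_PMGC0) & ~PMGC0_FAC); /* 3. release   */
}
```

Other PMGC0 bits are preserved by the read-modify-write in steps 1 and 3, so an interrupt enable that was set before the call is still set afterwards.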
8.3.10
User Performance Monitor Counter Registers (UPMC0–UPMC3)
The contents of PMC0–PMC3 are aliased to UPMC0–UPMC3, which can be read by user-level software
with the mfpmr instruction using PMR numbers in Table 8-2.
8.4
Performance Monitor Interrupt
The performance monitor interrupt is triggered by an enabled condition or event. The enabled condition
or events defined for the e200z7 are the following:
• A PMCn overflow condition occurs when both of the following are true:
— The counter’s overflow condition is enabled; PMLCan[CE] is set.
— The counter indicates an overflow; PMCn[OV] is set.
• A time base event occurs when both of the following are true:
— Time base events are enabled; PMGC0[TBEE] is set.
— The TBL bit specified in PMGC0[TBSEL] changes from 0 to 1.
The two performance monitor exception conditions are treated differently with respect to whether or not
the conditions are level sensitive or edge sensitive. A performance monitor exception condition which is
caused by a PMCn overflow condition is level sensitive to the values of PMLCan[CE] and PMCn[OV].
This means that as long as these values are both set to ‘1’, then the exception condition continues to exist
and the performance monitor interrupt can be taken if the remainder of the performance monitor interrupt
gating conditions are met. However, the exception due to the time base event is set only when both
PMGC0[TBEE] = 1 and the transition from ‘0’ to ‘1’ occurs in the specified TBL bit. This condition is not
cleared once it occurs, regardless of whether the TBL bit subsequently transitions to a ‘0’, but this
exception is automatically cleared whenever any performance monitor interrupt is subsequently taken.
• If PMGC0[PMIE] is set, an enabled condition or event triggers the signaling of a performance
monitor exception.
• If PMGC0[FCECE] is set, an enabled condition or event forces all performance monitor counters
to freeze.
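The difference between the level-sensitive overflow condition and the latched time base event can be made concrete with a small software model. The C sketch below is an interpretation of the text above, not a description of the hardware implementation. The structure and function names are hypothetical, and freezing through PMGC0[FCECE] is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool ce[4];      /* PMLCan[CE] for n = 0..3                  */
    bool ov[4];      /* PMCn[OV] for n = 0..3                    */
    bool pmie;       /* PMGC0[PMIE]                              */
    bool tbee;       /* PMGC0[TBEE]                              */
    bool tb_latched; /* model-internal latch for time base event */
} pm_model_t;

/* Call when the TBL bit selected by PMGC0[TBSEL] changes from 0 to 1.
 * The event is latched only if time base events are enabled. */
static void pm_on_tbl_rise(pm_model_t *m)
{
    if (m->tbee)
        m->tb_latched = true;
}

/* Level sensitive: recomputed from the current CE and OV values. */
static bool pm_overflow_condition(const pm_model_t *m)
{
    for (int n = 0; n < 4; n++)
        if (m->ce[n] && m->ov[n])
            return true;
    return false;
}

/* An exception is signaled only while PMGC0[PMIE] is set. */
static bool pm_exception_signaled(const pm_model_t *m)
{
    return m->pmie && (pm_overflow_condition(m) || m->tb_latched);
}

/* Returns true if the interrupt is taken. Taking it clears the latched
 * time base event; the overflow condition persists until software
 * clears OV or CE. */
static bool pm_try_take_interrupt(pm_model_t *m, bool msr_ee)
{
    if (!msr_ee || !pm_exception_signaled(m))
        return false;
    m->tb_latched = false;
    return true;
}
```

In this model a pending overflow keeps producing interrupts until the handler clears PMCn[OV] or PMLCan[CE], while a time base event produces at most one.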
Although the performance monitor exception condition may occur with MSR[EE] = 0, the interrupt cannot
be taken until MSR[EE] = 1. If PMCn overflows and would signal an exception (PMLCan[CE] = 1 and
PMGC0[PMIE] = 1) while MSR[EE] = 0, and freezing of the counters is not enabled (PMGC0[FCECE]
is clear), it is possible that PMCn could wrap around to all zeros again without the performance monitor
interrupt being taken.
Interrupt handlers should clear the counter overflow condition or the corresponding condition enable
(PMLCan[CE]) to prevent a repeated interrupt for the same event.
The priority of the performance monitor interrupt is specified in Section 7.7.1, “Exception Priorities.”
8.5
Event Counting
This section describes configurability and specific unconditional counting modes.
8.5.1
MSR-based Context Filtering
Counting can be configured to be conditionally enabled if conditions in the processor state match a
software-specified condition. Because a software task scheduler may switch a processor’s execution
among multiple processes and because statistics on only a particular process may be of interest, a facility
is provided to mark a process. The performance monitor mark bit, MSR[PMM], is used for this purpose.
System software may set this bit when a marked process is running. This enables statistics to be gathered
only during the execution of the marked process. The states of MSR[PR] and MSR[PMM] define a state
that the processor (supervisor or user) and the process (marked or unmarked) may be in at any time. If this
state matches an individual state specified by the PMLCan[FCS, FCU, FCM1, FCM0] fields, counting is
enabled for PMCn.
For the e200z7 implementation, a given event may or may not support MSR-based context filtering. For
events that do not support MSR-based context filtering, the FCS, FCU, FCM1, and FCM0 controls have
no effect on the counting of that event.
The processor states and the settings of the FCS, FCU, FCM1, and FCM0 bits in PMLCan necessary to
enable monitoring of each processor state are shown in Table 8-8.
Table 8-8. Processor States and PMLCa0–PMLCa3 Bit Settings
Processor State | FCS | FCU | FCM1 | FCM0
All (no context filtering) | 0 | 0 | 0 | 0
Marked | 0 | 0 | 0 | 1
Not marked | 0 | 0 | 1 | 0
Supervisor | 0 | 1 | 0 | 0
Marked and supervisor | 0 | 1 | 0 | 1
Not marked and supervisor | 0 | 1 | 1 | 0
User | 1 | 0 | 0 | 0
Marked and user | 1 | 0 | 0 | 1
Not marked and user | 1 | 0 | 1 | 0
None (counting disabled) | X | X | 1 | 1
None (counting disabled) | 1 | 1 | X | X
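The settings in Table 8-8 are consistent with reading each control bit as a freeze condition: FCS freezes counting in supervisor state, FCU in user state, FCM1 while the process is marked, and FCM0 while it is not marked. The C sketch below encodes that reading. It is an interpretation inferred from the table, it assumes MSR[PR] = 1 denotes user state, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when an event that supports MSR-based context filtering
 * would be counted for the given processor state and PMLCan settings. */
static bool pm_context_counts(bool msr_pr, bool msr_pmm,
                              bool fcs, bool fcu, bool fcm1, bool fcm0)
{
    if (fcs && !msr_pr)   /* frozen in supervisor state      */
        return false;
    if (fcu && msr_pr)    /* frozen in user state            */
        return false;
    if (fcm1 && msr_pmm)  /* frozen while process is marked  */
        return false;
    if (fcm0 && !msr_pmm) /* frozen while process not marked */
        return false;
    return true;
}
```

For example, the "Marked and supervisor" row (FCS = 0, FCU = 1, FCM1 = 0, FCM0 = 1) yields true only for MSR[PR] = 0 and MSR[PMM] = 1, and both "None" rows yield false for every state.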
8.6
Examples
The following sections provide examples of how to use the performance monitor facility.
8.6.1
Chaining Counters
The counter chaining feature can be used to allow a higher event count than is possible with a single
counter. Chaining two counters together effectively adds 32 bits to a counter register where rollover of the
first counter generates a carry out feeding the second counter. By defining the event of interest to be
another PMC’s rollover occurrence, the chained counter increments each time the first counter rolls over
to zero. Multiple counters may be chained together.
Because the entire chained value cannot be read in a single instruction, a rollover may occur between
counter reads, producing an inaccurate value. A sequence like the following is necessary to read the
complete chained value when it spans multiple counters and the counters are not frozen. The example
shown is for a two-counter case.
loop:
    mfpmr   Rx,pmctr1       #load from upper counter
    mfpmr   Ry,pmctr0       #load from lower counter
    mfpmr   Rz,pmctr1       #load from upper counter
    cmp     cr0,0,Rz,Rx     #see if ‘old’ = ‘new’
    bc      4,2,loop        #loop if carry occurred between reads
The comparison and loop are necessary to ensure that a consistent set of values has been obtained. The
above sequence is not necessary if the counters are frozen.
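The same read-compare-retry pattern can be written in C. The sketch below is illustrative: the counter accessor is passed in as a function pointer so the logic can be exercised on a host, and sim_read_pmc is a hypothetical scripted stand-in for mfpmr that mimics a rollover of the lower counter between the first two reads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accessor type standing in for mfpmr on counter `idx`. */
typedef uint32_t (*pmc_read_fn)(unsigned idx);

/* Read a two-counter chain as one 64-bit value. The upper counter is
 * read before and after the lower counter; if it changed, a carry
 * occurred between the reads and the sequence is repeated. */
static uint64_t pm_read_chained(pmc_read_fn read_pmc,
                                unsigned lower, unsigned upper)
{
    uint32_t hi, lo, hi_again;
    assert(read_pmc != 0);
    do {
        hi       = read_pmc(upper);
        lo       = read_pmc(lower);
        hi_again = read_pmc(upper);
    } while (hi != hi_again);
    return ((uint64_t)hi << 32) | lo;
}

/* Host-side stand-in: scripted values for upper, lower, upper reads.
 * The first pass sees the upper counter change from 7 to 8. */
static const uint32_t sim_script[6] = { 7u, 0xFFFFFFFFu, 8u, 8u, 2u, 8u };
static unsigned sim_pos;

static uint32_t sim_read_pmc(unsigned idx)
{
    (void)idx; /* the script already encodes which counter is read */
    return sim_script[sim_pos++ % 6u];
}
```

With the scripted values, the first pass is rejected and the second pass returns upper = 8, lower = 2. On real hardware the accessor would wrap mfpmr for the two chained counters.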
8.6.2
Thresholding
Threshold event measurement enables the counting of duration and usage events. For example, data cache
load miss cycles (events C0:xx and C1:xx) require a threshold value. A data cache load miss cycles event
is counted only when the number of cycles spent waiting for the miss is greater than the threshold. Because
this event is supported by two counters and each counter has an individual threshold, one execution of a
performance monitor program can sample two different threshold values. Measuring code performance
with multiple concurrent thresholds may expedite code profiling significantly.
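Counts collected at several thresholds can be turned into a duration profile by differencing: the number of events whose duration falls between two thresholds is the count at the lower threshold minus the count at the higher one. The C sketch below shows that arithmetic under the assumption that every count was taken over the same workload. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* counts[i] is the number of events that exceeded the i-th threshold,
 * with thresholds in strictly increasing order. bins[i] receives the
 * number of events above threshold i but not above threshold i + 1;
 * the last bin holds events above the largest threshold. */
static void pm_threshold_profile(const uint32_t *counts, size_t n,
                                 uint32_t *bins)
{
    assert(counts != NULL && bins != NULL && n > 0);
    for (size_t i = 0; i + 1 < n; i++) {
        /* Run-to-run variation can make a later count larger than an
         * earlier one; clamp the bin to zero in that case. */
        bins[i] = (counts[i] >= counts[i + 1])
                      ? counts[i] - counts[i + 1]
                      : 0u;
    }
    bins[n - 1] = counts[n - 1];
}
```

For counts of 100, 40, and 5 at three increasing thresholds, the bins are 60, 35, and 5. Using both threshold-capable counters in one run provides two of these counts at once.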
8.7
Event Selection
Event selection is specified through the PMLCan registers described in Section 8.3.5, “Local Control A
Registers (PMLCa0–PMLCa3).” The event-select fields in PMLCan[EVENT] are described in Table 8-10,
which lists encodings for the selectable events to be monitored. Table 8-10 establishes a correlation
between each counter, events to be traced, and the pattern required for the desired selection.
The Spec/Nonspec column indicates whether the event count includes any occurrences due to processing
that was not architecturally required by the PowerPC sequential execution model (speculative processing).
• Speculative counts include speculative operations that were later flushed.
• Non-speculative counts do not include speculative operations, which are flushed.
The PR, PMM filtering column indicates whether a given event supports MSR-based context filtering.
Table 8-9 describes how event types are indicated in Table 8-10.
Table 8-9. Event Types
Event Type | Label | Description
Reference | Ref:# | Shared across counters PMC0–PMC3.
Common | Com:# | Shared across counters PMC0–PMC3.
Counter-specific | C[0–3]:# | Counted only on one or more specific counters. The notation indicates the counter to which an event is assigned. For example, an event assigned to counter PMC0 is shown as C0:#.
Table 8-10 describes performance monitor events.
Table 8-10. Performance Monitor Event Selection
Number | Event | Spec/Nonspec | PR, PMM Filtering¹ | Count Description
General Events
Com:0 | Nothing | Nonspec | — | Register counter holds current value
Ref:1² | Processor cycles | Nonspec | Yes | Every processor cycle not in waiting, halted, stopped states and not in a debug session.
Com:2³ | Instructions completed | Nonspec | Yes | Completed instructions. 0, 1, 2, or 3 per cycle.
Com:3² | Processor cycles with 0 instructions issued | Nonspec | Yes | Ref:1 cycles with no instructions entering execution
Com:4² | Processor cycles with 1 instruction issued | Nonspec | Yes | Ref:1 cycles with one instruction entering execution
Com:5² | Processor cycles with 2 instructions issued | Nonspec | Yes | Ref:1 cycles with two instructions entering execution
Com:6³ | Instruction words fetched | Spec | Yes | Fetched instruction words. 0, 1, 2, 3, or 4 per cycle. (Note that an instruction word may hold 1 or 2 instructions, or 2 partial instructions when fetching from a VLE page.)
Com:7 | — | — | — | —
Com:8 | PM_EVENT transitions | — | — | 0 to 1 transitions on the p_pm_event input.
Com:9 | PM_EVENT cycles | — | — | Processor (Ref:1) cycles that occur when the p_pm_event input is asserted.
Instruction Types Completed
Com:10³ | Branch instructions completed | Nonspec | Yes | Completed branch instructions, includes branch and link type instructions
Com:11³ | Branch and link type instructions completed | Nonspec | Yes | Completed branch and link type instructions
Com:12³ | Conditional branch instructions completed | Nonspec | Yes | Completed conditional branch instructions
Com:13³ | Taken Branch instructions completed | Nonspec | Yes | Completed branch instructions which were taken. Includes branch and link type instructions.
Com:14³ | Taken Conditional Branch instructions completed | Nonspec | Yes | Completed conditional branch instructions which were taken.
Com:15³ | Load instructions completed | Nonspec | Yes | Completed load, load-multiple type instructions
Com:16³ | Store instructions completed | Nonspec | Yes | Completed store, store-multiple type instructions
Com:17³ | Load micro-ops completed | Nonspec | Yes | Completed load micro-ops. (l*, evl*, load-update (1 load micro-op), load-multiple (1–32 micro-ops), dcbt, dcbtls, dcbtst, dcbtstls, and dcbtst, dcbf, dcblc, dcbst, icbi, icblc, icbt, icbtls). Misaligned loads crossing a 64-bit boundary count as two micro-ops.
Com:18³ | Store micro-ops completed | Nonspec | Yes | Completed store micro-ops. (st*, evst*, store-update (1 store micro-op), store-multiple (1–32 micro-ops), dcbi, dcbz). Misaligned stores crossing a 64-bit boundary count as two micro-ops.
Com:19³ | Integer instructions completed | Nonspec | Yes | Completed simple integer instructions (not a load-type/store-type/branch/mul/div, EFPU, or SPE)
Com:20³ | Multiply instructions completed | Nonspec | Yes | Completed Multiply instructions (non-EFPU)
Com:21³ | Divide instructions completed | Nonspec | Yes | Completed Divide instructions including SPE (non-EFPU)
Com:22³ | Divide instruction execution cycles | Nonspec | Yes | Cycles of execution for all Divide instructions (non-EFPU)
Com:23³ | SPE/EFPU instructions completed | Nonspec | Yes | Completed SPE/EFPU instructions. Does not include SPE/EFPU load and store instructions.
Com:24³ | SPE simple instructions completed | Nonspec | Yes | Completed SPE simple instructions. All SPE instructions included except SPE load and store instructions, div, dotp, mul and mac-type instructions.
Com:25³ | SPE mul/mac/dotp instructions completed | Nonspec | Yes | Completed SPE mul/mac/dotp instructions. Does not include other SPE instructions, or brinc instructions.
Com:26³ | EFPU FP instructions completed | Nonspec | Yes | Completed EFPU FP (evfs, efs) instructions.
Com:27³ | Number of return from interrupt instructions | Nonspec | Yes | Includes all types of return from interrupts (i.e. rfi, rfci, rfdi, rfmci, and VLE variants)
Branch Prediction and Execution Events
Com:28³ | Finished branches that miss the BTB | Spec | Yes | Includes all taken branch instructions which missed in the BTB
Com:29³ | Branches mispredicted (for any reason) | Spec | Yes | Counts branch instructions mispredicted due to direction or target (for example if the LR or CTR contents change).
Com:30³ | Branches in the BTB mispredicted due to direction prediction. | Spec | Yes | Counts branch instructions which hit the BTB but were mispredicted due to direction prediction.
Com:31³ | Incorrect target prediction using the link stack | Spec | Yes | —
Com:32³ | BTB hits | Spec | Yes | Branch instructions that hit in the BTB
Com:33 | — | — | — | —
Com:34 | — | — | — | —
Pipeline Stalls
Com:35 | — | — | — | —
Com:36 | — | — | — | —
Com:37² | Cycles decode stalled due to no instructions available | Spec | Yes | No instruction available to decode
Com:38² | Cycles issue stalled | Spec | Yes | Cycles the issue buffer is not empty but 0 instructions issued
Com:39² | Cycles branch issue stalled | Spec | Yes | Branch held in decode awaiting resolution
Com:40² | Cycles execution stalled waiting for load data | Spec | Yes | load stalls
Com:41² | Cycles execution stalled waiting for non-load/store SPE/EFPU result data | Spec | Yes | Stalled waiting on mul, div, FP or MAC results
Load/Store, Data Cache, and Data Line Fill Events
Com:42 | — | — | — | —
Com:43 | — | — | — | —
Com:44³ | Total translation hits | Spec | Yes | —
Com:45³ | Load translation hits | Spec | Yes | Cacheable l* or evl* micro-ops translated. (includes load micro-ops from load-multiple and load-update instructions)
Com:46³ | Store translation hits | Spec | Yes | Cacheable st* or evst* micro-ops translated. (includes micro-ops from store-multiple, and store-update instructions)
Com:47³ | Touch translation hits | Spec | Yes | Cacheable dcbt and dcbtst instructions translated (L1 only) (Doesn’t count touches that are converted to nops i.e. exceptions, non-cacheable, HID0[NOPTI] is set.)
Com:48³ | Data cache op translation hits | Spec | Yes | dcba, dcbf, dcbst, and dcbz instructions translated
Com:49³ | Data cache lock set instructions completed | Nonspec | Yes | dcbtls and dcbtstls instructions completed
Com:50³ | Data cache lock clear instructions completed | Nonspec | Yes | dcblc instructions completed
Com:51³ | Cache-inhibited load access translation hits | Spec | Yes | Cache inhibited load accesses translated
Com:52³ | Cache-inhibited store access translation hits | Spec | Yes | Cache inhibited store accesses translated
Com:53³ | Guarded load translation hits | Spec | Yes | Guarded loads translated
Com:54³ | Guarded store translation hits | Spec | Yes | Guarded stores translated
Com:55³ | Write-through store translation hits | Spec | Yes | Write-through stores translated
Com:56³ | Misaligned load or store accesses translated | Spec | Yes | Misaligned load or store accesses translated. Count once per misaligned load or store.
Com:57³ | Dcache linefills | Spec | Yes | Counts dcache reloads for any reason, including touch-type reloads. Typically used to determine approximate data cache miss rate (along with loads/stores completed).
Com:58³ | Dcache copybacks | Spec | Yes | Does not count copybacks due to dcbf, dcbst, or L1FINV0 operations
Com:59³ | Dcache sequential accesses | Spec | Yes | Number of sequential accesses
Com:60³ | Dcache stream hits | Spec | Yes | Number of load hits due to streaming
Com:61³ | Dcache linefill buffer hits | Spec | Yes | Number of load hit to the linefill buffer
Com:62³ | Store stalls due to store to line of active linefill | Spec | Yes | Stall cycles due to store to linefill in progress
Com:63³ | Store buffer full stalls | Spec | Yes | Stall cycles due to store buffer full
Com:64² | Dcache throttling stalls | Spec | Yes | Cycles the data cache asserts p_d_halt_zlb which actually cause a CPU stall
Com:65³ | Dcache recycled accesses | Spec | Yes | Number of loads or stores recycled for a re-lookup
Com:66³ | Dcache recycled access stalls | Spec | Yes | Number of stall cycles due to recycled accesses for a re-lookup
Com:67³ | Dcache CPU aborted accesses | Spec | Yes | Number of aborted requests
Com:68³ | Data MMU miss | Spec | Yes | Counts number of DTLB events
Com:69³ | Data MMU error | Spec | Yes | Counts number of DSI events
Fetch, Instruction Cache, Instruction Line Fill, and Instruction Prefetch Events
Com:70 | — | — | — | —
Com:71 | — | — | — | —
Com:72³ | Icache linefills | Spec | Yes | Counts icache reloads due to demand fetch. Used to determine instruction cache miss rate (along with instructions completed)
Com:73³ | Number of fetches | Spec | Yes | Counts fetches that write at least one instruction to the instruction buffer. (With instruction fetched (com:4), can be used to compute instructions-per-fetch)
Com:74³ | Icache lock set instructions completed | Nonspec | Yes | icbtls instructions completed
Com:75³ | Icache lock clear instructions completed | Nonspec | Yes | icblc instructions completed
Com:76³ | Cache-inhibited instruction access translation hits | Spec | Yes | Cache-inhibited instruction accesses translated
Com:77² | Icache throttling stalls | Spec | Yes | Cycles the instruction cache asserts p_i_halt_zlb that actually cause a CPU stall
Com:78³ | Icache recycled accesses | Spec | Yes | Number of instruction access requests recycled for a re-lookup
Com:79³ | Icache recycled access stalls | Spec | Yes | Number of stall cycles due to recycled accesses for a re-lookup
Com:80³ | Icache CPU aborted accesses | Spec | Yes | Number of aborted requests
Com:81³ | Instruction MMU miss | Spec | Yes | Counts number of events
Com:82³ | Instruction MMU error | Spec | Yes | Counts number of events
BIU Interface Usage
Com:83 | — | — | — | —
Com:84 | — | — | — | —
Com:85³ | BIU instruction-side requests | Spec | Yes | instruction-side transactions
Com:86³ | BIU instruction-side cycles | Spec | Yes | instruction-side transaction cycles
Com:87³ | BIU data-side requests | Spec | Yes | data-side transactions
Com:88³ | BIU data-side copyback requests | Spec | Yes | Replacement pushes including dcbf, dcbst, L1FINV0, copybacks.
Com:89³ | BIU data-side cycles | Spec | Yes | data-side transaction cycles
Com:90³ | BIU single-beat write cycles | Nonspec | Yes | single beat write transaction cycles
Com:91 | — | — | — | —
Snoop
Com:92 | Snoop requests | N/A | — | Externally generated snoop requests. (Counts snoop TSs.)
Com:93 | Snoop hits | N/A | — | Snoop hits on all data-side resources regardless of the cache state (modified, shared, or exclusive)
Com:94³ | Snoop induced CPU to Dcache stalls | N/A | — | Cycles a pending Dcache access from CPU is stalled due to contention with snoops
Com:95 | Snoop Queue full cycles | N/A | — | Cycles the snoop queue is full
Com:96 | — | — | — | —
Chaining Events⁴
Com:97 | PMC0 rollover | N/A | — | PMC0[OV] transitions from 1 to 0.
Com:98 | PMC1 rollover | N/A | — | PMC1[OV] transitions from 1 to 0.
Com:99 | PMC2 rollover | N/A | — | PMC2[OV] transitions from 1 to 0.
Com:100 | PMC3 rollover | N/A | — | PMC3[OV] transitions from 1 to 0.
Interrupt Events
Com:101 | — | — | — | —
Com:102 | — | — | — | —
Com:103 | Interrupts taken | Nonspec | — | —
Com:104 | External input interrupts taken | Nonspec | — | —
Com:105 | Critical input interrupts taken | Nonspec | — | —
Com:106 | Watchdog timer interrupts taken | Nonspec | — | —
Com:107 | System call and trap interrupts | Nonspec | Yes | —
Com:108² | Cycles in which MSR[EE] = 0 | Nonspec | — | —
Com:109² | Cycles in which MSR[CE] = 0 | Nonspec | — | —
Ref:110 | Transitions of TBL bit selected by PMGC0[TBSEL]. | Nonspec | — | Counts transitions of the TBL bit selected by PMGC0[TBSEL]. Counts both 0→1 and 1→0.
DEVENT Events
Com:111 | DEVNT0 is generated | Nonspec | Yes | assertion of p_devnt_out0 detected
Com:112 | DEVNT1 is generated | Nonspec | Yes | assertion of p_devnt_out1 detected
Com:113 | DEVNT2 is generated | Nonspec | Yes | assertion of p_devnt_out2 detected
Com:114 | DEVNT3 is generated | Nonspec | Yes | assertion of p_devnt_out3 detected
Com:115 | DEVNT4 is generated | Nonspec | Yes | assertion of p_devnt_out4 detected
Com:116 | DEVNT5 is generated | Nonspec | Yes | assertion of p_devnt_out5 detected
Com:117 | DEVNT6 is generated | Nonspec | Yes | assertion of p_devnt_out6 detected
Com:118 | DEVNT7 is generated | Nonspec | Yes | assertion of p_devnt_out7 detected
Watchpoint Events
Com:119² | Watchpoint #0 occurs | Nonspec | Yes | assertion of jd_watchpt0 detected
Com:120² | Watchpoint #1 occurs | Nonspec | Yes | assertion of jd_watchpt1 detected
Com:121² | Watchpoint #2 occurs | Nonspec | Yes | assertion of jd_watchpt2 detected
Com:122² | Watchpoint #3 occurs | Nonspec | Yes | assertion of jd_watchpt3 detected
Com:123² | Watchpoint #4 occurs | Nonspec | Yes | assertion of jd_watchpt4 detected
Com:124² | Watchpoint #5 occurs | Nonspec | Yes | assertion of jd_watchpt5 detected
Com:125² | Watchpoint #6 occurs | Nonspec | Yes | assertion of jd_watchpt6 detected
Com:126² | Watchpoint #7 occurs | Nonspec | Yes | assertion of jd_watchpt7 detected
Com:127² | Watchpoint #8 occurs | Nonspec | Yes | assertion of jd_watchpt8 detected
Com:128² | Watchpoint #9 occurs | Nonspec | Yes | assertion of jd_watchpt9 detected
Com:129 | Watchpoint #10 occurs | Nonspec | Yes | assertion of jd_watchpt10 detected
Com:130 | Watchpoint #11 occurs | Nonspec | Yes | assertion of jd_watchpt11 detected
Com:131 | Watchpoint #12 occurs | Nonspec | Yes | assertion of jd_watchpt12 detected
Com:132 | Watchpoint #13 occurs | Nonspec | Yes | assertion of jd_watchpt13 detected
Com:133² | Watchpoint #14 occurs | Nonspec | Yes | assertion of jd_watchpt14 detected
Com:134² | Watchpoint #15 occurs | Nonspec | Yes | assertion of jd_watchpt15 detected
Com:135² | Watchpoint #16 occurs | Nonspec | Yes | assertion of jd_watchpt16 detected
Com:136² | Watchpoint #17 occurs | Nonspec | Yes | assertion of jd_watchpt17 detected
Com:137² | Watchpoint #18 occurs | Nonspec | Yes | assertion of jd_watchpt18 detected
Com:138² | Watchpoint #19 occurs | Nonspec | Yes | assertion of jd_watchpt19 detected
Com:139 | Watchpoint #20 occurs | Nonspec | Yes | assertion of jd_watchpt20 detected
Com:140 | Watchpoint #21 occurs | Nonspec | Yes | assertion of jd_watchpt21 detected
Com:141 | Watchpoint #22 occurs | Nonspec | Yes | assertion of jd_watchpt22 detected
Com:142 | Watchpoint #23 occurs | Nonspec | Yes | assertion of jd_watchpt23 detected
Com:143 | Watchpoint #24 occurs | Nonspec | Yes | assertion of jd_watchpt24 detected
Com:144 | Watchpoint #25 occurs | Nonspec | Yes | assertion of jd_watchpt25 detected
Com:145 | Watchpoint #26 occurs | Nonspec | Yes | assertion of jd_watchpt26 detected
Com:146 | — | — | — | —
Com:147 | — | — | — | —
Com:148 | — | — | — | —
Com:149 | — | — | — | —
Com:150 | — | — | — | —
NEXUS Events
Com:151³ | Cycle CPU is stalled by Nexus3 FIFO full | Nonspec | Yes | OVCR stall control set to stall on FIFO fullness
Threshold Events
C0:152³, C1:152³ | Data cache load miss cycles | Spec | Yes | Instances when the number of cycles between a load miss in the data cache and update of the data cache exceeds the threshold.
C0:153³, C1:153³ | Instruction cache fetch miss cycles | Spec | Yes | Instances when the number of cycles between miss in the instruction cache and update of the instruction cache exceeds the threshold.
C0:154³, C1:154³ | External input interrupt latency cycles | N/A | — | Instances when the number of cycles between request for interrupt (p_int_b) asserted (but possibly masked/disabled) and redirecting fetch to external interrupt vector exceeds threshold. Once the redirection has occurred, no further threshold comparisons are made until either the interrupt request negates, or the external input interrupt is re-enabled by setting MSR[EE].
C0:155³, C1:155³ | Critical input interrupt latency cycles | N/A | — | Instances when the number of cycles between request for critical interrupt (p_critint_b) is asserted (but possibly masked/disabled) and redirecting fetch to the critical interrupt vector exceeds threshold. Once the redirection has occurred, no further threshold comparisons begin until either the interrupt request negates and is then re-asserted, or the critical input interrupt is re-enabled by setting MSR[CE].
e200z7 Power Architecture Core Reference Manual, Rev. 2
Freescale Semiconductor
8-23
Performance Monitor
Table 8-10. Performance Monitor Event Selection (continued)
Spec/
Nonspec
PR, PMM
Filtering1
Watchdog timer interrupt
latency cycles
N/A
—
Instances when the number of cycles between watchdog
timer time-out request for critical interrupt becomes
pending (watchdog interrupt enabled (TCR[WIE] set) and
time-out occurs (TSR[ENW, WIS] become 0b11)) and
redirecting fetch to the critical interrupt vector exceeds the
threshold. Once the redirection has occurred, no further
threshold comparisons begin until either the watchdog
interrupt request negates and is then re-asserted, or the
watchdog interrupt is re-enabled by setting MSR[CE].
C0:1573
C1:1573
External input interrupt
pending latency cycles
N/A
—
Instances when the number of cycles between external
interrupt pending (enabled and pin asserted) and
redirecting fetch to the external interrupt vector exceeds
the threshold. Once the redirection has occurred, no
further threshold comparisons are made until either the
interrupt request negates and is then re-asserted, or the
external input interrupt is re-enabled by setting MSR[EE].
C0:1583
C1:1583
Critical input interrupt
pending latency cycles
N/A
—
Instances when the number of cycles between pin
request for critical interrupt pending (enabled and pin
asserted) and redirecting fetch to the critical interrupt
vector exceeds the threshold. Once the redirection has
occurred, no further threshold comparisons are made
until either the interrupt request negates and is then
re-asserted, or the critical input interrupt is re-enabled by
setting MSR[CE].
Number
Event
C0:1563
C1:1563
Count Description
1 The PR, PMM filtering column contains either a ‘yes’ or a ‘—’. A ‘yes’ indicates that the MSR-based context
filtering function is available for that event. A ‘—’ indicates that the MSR-based context filtering is not available for that event
and has no effect on the counting of that event. See Section 8.5.1, “MSR-based Context Filtering” for more information.
2 This event is not counted while the processor is in the waiting, halted, or stopped states, or during a debug session.
3 This event is not counted while the processor is in a debug session.
4 For chaining events, if a counter is configured to count its own rollover, the result is undefined.
Chapter 9
L1 Cache
This chapter describes the organization of the on-chip L1 caches, cache control instructions, and various
cache operations. It describes the interaction between the caches, the load/store unit (LSU), the instruction
unit, and the memory subsystem. This chapter also describes the replacement algorithm used for the L1
caches.
The L1 caches incorporate the following features:
• 16-KB I + 16-KB D Harvard cache design
• Virtually indexed, physically tagged
• 32-byte line size
• 64-bit data, 32-bit address
• Pseudo round-robin replacement algorithm
• 8-entry store buffer
• Push (copyback) buffer
• Linefill buffer
• Hit under fill/copyback
• Supports up to two outstanding misses
• Parity or Multibit EDC protection for the ICache data and tag arrays, with
correction/auto-invalidation capability
• Parity or Multibit EDC protection for the DCache tag arrays, parity protection for the DCache data
arrays with correction/auto-invalidation capability
9.1
Overview
The processor supports a pair of 16-KB, 4-way set-associative, split instruction and data caches with a
32-byte line size. The caches improve system performance by providing low-latency data to the e200z7
instruction and data pipelines, which decouples processor performance from system memory performance.
The caches are virtually indexed and physically tagged.
Instruction and data addresses from the processor to the caches are virtual addresses used to index the
cache array. The MMU provides the virtual to physical translation for use in performing the cache tag
compare. If the physical address matches a valid cache tag entry, the access hits in the cache. For a read
operation, the cache supplies the data to the processor, and for a write operation, the data from the
processor updates the cache. If the access does not match a valid cache tag entry (misses in the cache) or
a write access must be written through to memory, the cache performs a bus cycle on the system bus.
Figure 9-1 shows the e200z7 caches.
[Figure: block diagram. The processor core and the memory management unit connect through address and data paths to the Icache interface and the Dcache interface. Each cache contains control logic, a tag array, a data array, and a bus interface module that connects to its own system bus (instruction or data).]
Figure 9-1. e200z7 Caches
9.2
16-KB Cache Organization
Each 16-KB cache is organized as four ways of 128 sets with each line containing 32 bytes (four double
words) of storage. Figure 9-2 illustrates the cache organization along with the cache line format:
[Figure: four ways (WAY 0 through WAY 3), each holding one line for every set from SET 0 through SET 127.]
CACHE LINE FORMAT: TAG | V | D | L | Double word0 | Double word1 | Double word2 | Double word3
TAG = 22 bit Physical Address Tag + Parity
L = Lock bits
D = Dirty bits (DCACHE Only)
V = Valid bit
Figure 9-2. 16-KB Cache Organization and Line Format
Virtual address bits A[20–26] provide an index to select a set. Ways are selected according to the rules of
set association.
Each line consists of a physical address tag, status bits, and four double words of data. Address bits
A[27–29] select the word within the line.
9.3
Cache Lookup
Once enabled, the appropriate cache will be searched for a tag match on instruction fetches and data
accesses from the CPU. If a match is found, the cached data is forwarded on a read access to the instruction
fetch unit or the load/store unit (data access), or it is updated on a write access. It may also be
written-through to memory if required.
When a read miss occurs, if there is a TLB hit and the I bit of the hitting TLB entry is clear, the translated
physical address is used to fetch a four double-word cache line beginning with the requested double-word
(critical double-word first). The line is fetched into a linefill buffer and the critical double-word is
forwarded to the CPU. Subsequent double-words may be streamed to the CPU if they have been requested,
or they may be forwarded from the linefill buffer if the data has already been received from the bus and is
valid in the buffer.
When a write miss occurs, if there is a TLB hit, and the I and G bits of the hitting TLB entry are clear and
write allocation is enabled via the L1CSR0[DCWA] control bit, the translated physical address is used to
fetch a four double-word cache line beginning with the double word corresponding to the store address
(critical double-word first). The line is fetched into the linefill buffer and merged with the store data.
Subsequently, the line is placed into the appropriate cache block. If write allocation is disabled, or the write
is not cacheable or is guarded, no cache line fetch is performed for the write.
During a cache line fill, double words received from the bus are placed into the cache linefill buffer, and
may be forwarded (streamed) to the CPU if such a read request is pending. Accesses from the CPU
following delivery of the critical double word may be satisfied from the cache (hit under fill, non-blocking)
or from the linefill buffer if the requested information has been already received.
If write allocation is enabled, subsequent stores that hit the linefill buffer address while a linefill is in
progress for a previous store or dcbtst miss are merged into the linefill buffer. No merging of stores is
performed during a linefill initiated by a load miss.
When a cache linefill occurs, the linefill buffer contents are placed into the cache array using two accesses;
each occurs after receiving a pair of double words.
The cache always fills an entire line, thereby providing validity on a line-by-line basis. A DCache line is
always in one of the following states: invalid, valid, or dirty (and valid). The state settings are as follows:
• For invalid lines, the V bit is clear, causing the cache line to be ignored during lookups.
• For valid lines, the V bit is set and D bits are cleared, indicating the line contains valid data
consistent with memory.
• For dirty lines, the D and V bits are set, indicating that the line has valid entries that have not been
written to memory.
ICache lines are either invalid or valid. In addition, a cache line in either cache may be locked (L bits set),
indicating the line is not available for replacement.
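The state rules above can be written as a small decode function. This is only a sketch: the type and function names are illustrative and do not come from the manual, and the per-line D and L "bits" are each collapsed into a single flag for simplicity.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative names; the manual defines only the V, D, and L status bits. */
typedef enum { LINE_INVALID, LINE_VALID, LINE_DIRTY } line_state_t;

/* Decode a DCache line state from its valid and dirty status.
 * The D status is only meaningful while V is set. */
static line_state_t dcache_line_state(bool v, bool d)
{
    if (!v)
        return LINE_INVALID;          /* line is ignored during lookups */
    return d ? LINE_DIRTY : LINE_VALID;
}

/* A line with its lock status set is not available for replacement. */
static bool line_replaceable(bool l)
{
    return !l;
}
```

An ICache line would use only the invalid and valid outcomes, since the D bits exist in the DCache only.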
The caches should be explicitly invalidated after a hardware reset; reset does not invalidate the cache lines.
Following initial power-up, the cache contents are undefined. The L, D, and V bits may be set on some
lines, necessitating the invalidation of the caches by software before being enabled.
Figure 9-3 illustrates the general flow of cache operation for each 16-KB cache. The following steps
determine whether the address is already allocated in the cache:
1. The cache set index (virtual address bits A[20–26]) is used to select one cache set. A set is defined
as the grouping of four lines (one from each way), corresponding to the same index into the cache
array.
2. The higher order physical address bits A[0–21] are used as a tag reference or used to update the
cache line tag field.
3. The tags from the selected cache set are compared with the tag reference. If any one of the tags
matches the tag reference and the tag status is valid, a cache hit has occurred.
4. Virtual address bits A[27–28] are used to select one of the four double words in each line. A cache
hit indicates that the selected double word in that cache line contains valid data (for a read access),
or can be written with new data depending on the status of the W access control bit from the MMU
(for a write access to the DCache).
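The four steps above can be sketched as a software model of one 16-KB cache. This is an illustration under stated assumptions, not a description of the hardware implementation: the structure layout and the names (addr_bits, cache_lookup, dword_select) are hypothetical, and bit numbering follows the Power Architecture convention in which A0 is the most significant bit of the 32-bit address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { CACHE_WAYS = 4, CACHE_SETS = 128, LINE_BYTES = 32 };

/* Extract bits A[first..last] of a 32-bit address, where A0 is the MSB. */
static uint32_t addr_bits(uint32_t addr, unsigned first, unsigned last)
{
    assert(first <= last && last <= 31);
    unsigned width = last - first + 1;
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (addr >> (31 - last)) & mask;
}

/* Hypothetical model of the tag array: one tag and V bit per line. */
typedef struct {
    uint32_t tag;   /* physical address bits A[0-21] */
    bool valid;     /* V bit */
} tag_entry_t;

typedef struct {
    tag_entry_t e[CACHE_SETS][CACHE_WAYS];
} cache_tags_t;

/* Returns the hitting way (0 to 3), or -1 on a miss. */
static int cache_lookup(const cache_tags_t *tags, uint32_t vaddr, uint32_t paddr)
{
    uint32_t set = addr_bits(vaddr, 20, 26);   /* step 1: set index from the virtual address */
    uint32_t ref = addr_bits(paddr, 0, 21);    /* step 2: tag reference from the physical address */
    for (int way = 0; way < CACHE_WAYS; way++) {
        /* step 3: a hit needs a matching tag and a set V bit */
        if (tags->e[set][way].valid && tags->e[set][way].tag == ref)
            return way;
    }
    return -1;
}

/* Step 4: double word within the line, from virtual address bits A[27-28]. */
static uint32_t dword_select(uint32_t vaddr)
{
    return addr_bits(vaddr, 27, 28);
}
```

The constants reproduce the geometry given in Section 9.2: 4 ways × 128 sets × 32 bytes per line = 16 KB.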
[Figure: the index field of the virtual address, A[20:26], selects one set from SET 0 through SET 127. The tag reference, physical address bits A[0:21], is compared against the TAG of WAY 0 through WAY 3 in the selected set. The four comparator outputs (HIT 0 through HIT 3) are combined by a logical OR to form HIT and drive the MUX select that chooses among DW0 through DW3 for the data or instruction output.]
Figure 9-3. 16-KB Cache Lookup Flow
9.4
Cache Control
Control of the cache is provided by bits in the L1 cache control and status registers (L1CSR0, L1CSR1).
Control bits are provided to enable/disable the cache and to invalidate all of its entries. In addition,
availability of each way of the caches may be selectively controlled for use. This way control provides
cache way locking capability, as well as controlling way availability on a cache line replacement. Ways
0–3 may be selectively disabled for instruction miss replacements and data miss replacements in the
respective caches by using the WID and WDD control bits. Software is responsible for maintaining
coherency between instruction and data caches, since independent copies of a cache line may be present
in both caches: one allocated by an instruction access and another by a data access.
9.4.1
L1 Cache Control and Status Register 0 (L1CSR0)
The L1 cache control and status register 0 (L1CSR0) is a 32-bit register used for general control of the data
cache as well as providing general control over disabling ways in both caches. The L1CSR0 register is
accessed using a mfspr or mtspr instruction. The SPR number for L1CSR0 is 1010 in decimal. The
L1CSR0 register is shown in Figure 9-4.
SPR 1010; Access: Read/Write
Bits | Field
0–3 | WID
4–7 | WDD
8–10 | —
11 | DCWM
12–13 | DCWA
14 | —
15 | DCECE
16 | DCEI
17 | —
18–19 | DCEDT
20 | DCSLC
21 | DCUL
22 | DCLO
23 | DCLFC
24 | DCLOA
25–26 | DCEA
27 | —
28 | DCBZ32
29 | DCABT
30 | DCINV
31 | DCE
Reset: All zeros
Figure 9-4. L1 Cache Control and Status Register 0 (L1CSR0)
The L1CSR0 bits are described in Table 9-1.
Table 9-1. L1CSR0 Field Descriptions
Bits | Name | Description
0–3 | WID | Way Instruction Disable.
  0 The corresponding way in the instruction cache is available for replacement by instruction miss line fills.
  1 The corresponding way in the instruction cache is not available for replacement by instruction miss line fills.
  • Bit 0 corresponds to way 0.
  • Bit 1 corresponds to way 1.
  • Bit 2 corresponds to way 2.
  • Bit 3 corresponds to way 3.
  The WID bits may be used for locking ways of the instruction cache and also are used to determine the replacement policy of the instruction cache.
4–7 | WDD | Way Data Disable.
  0 The corresponding way in the data cache is available for replacement by data miss line fills.
  1 The corresponding way in the data cache is not available for replacement by data miss line fills.
  • Bit 4 corresponds to way 0.
  • Bit 5 corresponds to way 1.
  • Bit 6 corresponds to way 2.
  • Bit 7 corresponds to way 3.
  The WDD bits may be used for locking ways of the data cache and also are used to determine the replacement policy of the data cache.
8–10 | — | Reserved (note 1)
11 | DCWM | Data Cache Write Mode
  0 Data Cache operates in writethrough mode
  1 Data Cache operates in copyback mode
  When cleared (writethrough mode), the “W” page attribute from the MMU is ignored and all writes are treated as writethrough required. When set (copyback mode), write accesses are performed in copyback mode unless the “W” page attribute from the MMU is set.
12–13 | DCWA | Data Cache Write Allocation Policy
  00 Cache line allocation on a cacheable write miss is disabled
  01 Cache line allocation on a cacheable copyback write miss is enabled
  10 Cache line allocation on a cacheable copyback or writethrough write miss is enabled
  11 Reserved
  This field also controls merging of store data into the linefill buffer while a cache linefill is in progress. Store data will not be merged when write allocation is disabled. If DCWA is non-zero, store data merging is enabled regardless of the type (writethrough/copyback) of write.
14 | — | Reserved (note 1)
15 | DCECE | Data Cache Error Checking Enable
  0 Error Checking is disabled
  1 Error Checking is enabled
16 | DCEI | Data Cache Error Injection
  0 Cache Error Injection is disabled
  1 Parity errors will be purposefully injected into every byte subsequently written into the cache. The parity bit of each 8-bit data element written will be inverted. This includes writes due to store hits as well as writes due to cache line refills.
  DCEI will cause injection of errors regardless of the setting of DCECE, although reporting of errors will be masked while DCECE = 0.
17 | — | Reserved (note 1)
18–19 | DCEDT | Data Cache Error Detection Type
  00 Parity Error Detection is selected for both the tag and data arrays
  01 EDC Error Detection is selected for the tag array and parity is selected for the data arrays
  1x Reserved
20 | DCSLC | Data Cache Snoop Lock Clear
  0 Snoop has not invalidated a locked line
  1 Snoop has invalidated a locked line
  Indicates a cache line lock was cleared by a snoop operation which caused an invalidation. This bit is set by hardware and will remain set until cleared by software writing 0 to this bit location.
21 | DCUL | Data Cache Unable to Lock
  Indicates a lock set instruction was not effective in locking a cache line. This bit is set by hardware on an “unable to lock” condition (other than lock overflows) and will remain set until cleared by software writing 0 to this bit location.
22 | DCLO | Data Cache Lock Overflow
  Indicates a lock overflow (overlocking) condition occurred. This bit is set by hardware on an “overlocking” condition and will remain set until cleared by software writing 0 to this bit location.
23 | DCLFC | Data Cache Lock Bits Flash Clear
  When written to a ‘1’, a cache lock bits flash clear operation is initiated by hardware. Once complete, this bit is reset to ‘0’. Writing a ‘1’ while a flash clear operation is in progress will result in an undefined operation. Writing a ‘0’ to this bit while a flash clear operation is in progress will be ignored. Cache Lock Bits Flash Clear operations require approximately 134 cycles to complete. Clearing occurs regardless of the enable (DCE) value.
24 | DCLOA | Data Cache Lock Overflow Allocate
  Set by software to allow a lock request to replace a locked line when a lock overflow situation exists.
  0 Indicates a lock overflow condition will not replace an existing locked line with the requested line
  1 Indicates a lock overflow condition will replace an existing locked line with the requested line
25–26 | DCEA | Data Cache Error Action
  00 Error Detection causes Machine Check exception.
  01 Error Detection causes Correction/Auto-invalidation. No machine check is generated for uncorrectable errors unless the cache line was locked and invalidated or is dirty. Dirty lines are not auto-invalidated. In EDC mode, correction is performed for single-bit tag errors, single-bit lock errors, and single or multi-bit dirty errors. In parity mode, tag and lock errors will result in invalidation of clean lines. For both modes, correction is performed for data errors by reloading of the line.
  1x Reserved
27 | — | Reserved (note 1)
28 | DCBZ32 | Data Cache dcba, dcbz operation length
  0 dcba, dcbz operations operate on an entire cache line
  1 dcba, dcbz operations operate on 32 bytes of a cache line
  This bit is implemented for forward compatibility. Since cache lines are 32 bytes, this bit is ignored for dcba, dcbz operations.
29 | DCABT | Data Cache Operation Aborted
  Indicates a cache Invalidate or a Cache Lock Bits Flash Clear operation was aborted prior to completion. This bit is set by hardware on an aborted condition, and will remain set until cleared by software writing 0 to this bit location.
30 | DCINV | Data Cache Invalidate
  0 No cache invalidate
  1 Cache invalidation operation
  When written to a ‘1’, a cache invalidation operation is initiated by hardware. Once complete, this bit is reset to ‘0’. Writing a ‘1’ while an invalidation operation is in progress will result in an undefined operation. Writing a ‘0’ to this bit while an invalidation operation is in progress will be ignored. Cache invalidation operations require approximately 134 cycles to complete. Invalidation occurs regardless of the enable (DCE) value.
  During cache invalidations, the parity check bits are written with a value dependent on the DCEDT selection. DCEDT should be written with the desired value for subsequent cache operation when DCINV is set to ‘1’ for proper operation of the cache.
31 | DCE | Data Cache Enable
  0 Cache is disabled
  1 Cache is enabled
  When disabled, cache lookups are not performed for normal load or store accesses, or for snoop requests.
  Other L1CSR0 cache control operations are still available. Also, operation of the store buffer is not affected by DCE.
1 These bits are not implemented and should be written with zero for future compatibility.
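As an illustration of how the bit positions in Table 9-1 map onto a 32-bit register value, the following sketch derives field masks from the bit numbers. The macro and function names are hypothetical and are not part of any Freescale header; the only assumption beyond the table is the Power Architecture numbering convention in which bit 0 is the most significant bit.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering bits first..last of a 32-bit SPR, where bit 0 is the MSB. */
#define SPR_FIELD(first, last) \
    ((uint32_t)((0xFFFFFFFFu >> (first)) & (0xFFFFFFFFu << (31 - (last)))))

/* Hypothetical names for a subset of the L1CSR0 fields in Table 9-1. */
#define L1CSR0_WID        SPR_FIELD(0, 3)
#define L1CSR0_WDD        SPR_FIELD(4, 7)
#define L1CSR0_DCWM       SPR_FIELD(11, 11)
#define L1CSR0_DCWA       SPR_FIELD(12, 13)
#define L1CSR0_DCECE      SPR_FIELD(15, 15)
#define L1CSR0_DCEDT      SPR_FIELD(18, 19)
#define L1CSR0_DCLFC      SPR_FIELD(23, 23)
#define L1CSR0_DCEA       SPR_FIELD(25, 26)
#define L1CSR0_DCABT      SPR_FIELD(29, 29)
#define L1CSR0_DCINV      SPR_FIELD(30, 30)
#define L1CSR0_DCE        SPR_FIELD(31, 31)
#define L1CSR0_WID_WAY(n) SPR_FIELD(0 + (n), 0 + (n))   /* bit 0 is way 0 */
#define L1CSR0_WDD_WAY(n) SPR_FIELD(4 + (n), 4 + (n))   /* bit 4 is way 0 */

/* Shift 'value' into the position of the contiguous field 'mask'. */
static uint32_t spr_field_prep(uint32_t mask, uint32_t value)
{
    assert(mask != 0);
    uint32_t lsb = mask & (~mask + 1u);     /* lowest set bit of the mask */
    uint32_t shifted = value * lsb;
    assert((shifted & ~mask) == 0);         /* value must fit in the field */
    return shifted;
}

/* Extract the contiguous field 'mask' from a register value. */
static uint32_t spr_field_get(uint32_t mask, uint32_t reg)
{
    assert(mask != 0);
    return (reg & mask) / (mask & (~mask + 1u));
}
```

For example, selecting copyback mode (DCWM = 1), write allocation on copyback misses (DCWA = 01), and the enable bit gives `L1CSR0_DCWM | spr_field_prep(L1CSR0_DCWA, 1) | L1CSR0_DCE`, which evaluates to 0x00140001. Reserved bits stay zero, as the table note requires.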
9.4.2
L1 Cache Control and Status Register 1 (L1CSR1)
The L1 cache control and status register 1 (L1CSR1), shown in Figure 9-5, is a 32-bit register used for
general control of the instruction cache. The L1CSR1 register is accessed using an mfspr or mtspr
instruction. The SPR number for L1CSR1 is 1011 in decimal.
SPR 1011; Access: Read/Write
Bits | Field
0–14 | —
15 | ICECE
16 | ICEI
17 | —
18–19 | ICEDT
20 | —
21 | ICUL
22 | ICLO
23 | ICLFC
24 | ICLOA
25–26 | ICEA
27–28 | —
29 | ICABT
30 | ICINV
31 | ICE
Reset: All zeros
Figure 9-5. L1 Cache Control and Status Register 1 (L1CSR1)
The L1CSR1 bits are described in Table 9-2.
Table 9-2. L1CSR1 Field Descriptions
Bits | Name | Description
0–14 | — | Reserved
15 | ICECE | Instruction Cache Error Checking Enable
  0 Error Checking is disabled
  1 Error Checking is enabled
16 | ICEI | Instruction Cache Error Injection Enable
  0 Cache Error Injection is disabled
  1 When ICEDT = 00, parity errors are purposefully injected into every byte subsequently written into the cache. The parity bit of each 8-bit data element written is inverted on cache linefills. When ICEDT = 01, a double-bit error is injected into each double word written into the cache by inverting the two uppermost parity check bits (p_chk[0:1]).
  ICEI causes injection of errors regardless of the setting of ICECE, although reporting of errors is masked when ICECE = 0.
17 | — | Reserved
18–19 | ICEDT | Instruction Cache Error Detection Type
  00 Parity Error Detection is selected for both the tag and data arrays
  01 EDC Error Detection is selected
  1x Reserved
20 | — | Reserved
21 | ICUL | Instruction Cache Unable to Lock
  Indicates a lock set instruction was not effective in locking a cache line. This bit is set by hardware on an “unable to lock” condition (other than lock overflows) and remains set until cleared by software writing 0 to this bit location.
22 | ICLO | Instruction Cache Lock Overflow
  Indicates a lock overflow (overlocking) condition occurred. This bit is set by hardware on an “overlocking” condition and remains set until cleared by software writing 0 to this bit location.
23 | ICLFC | Instruction Cache Lock Bits Flash Clear
  When written to a 1, a cache lock bits flash clear operation is initiated by hardware. Once complete, this bit is reset to 0. Writing a 1 while a flash clear operation is in progress will result in an undefined operation. Writing a 0 to this bit while a flash clear operation is in progress will be ignored. Cache Lock Bits Flash Clear operations require approximately 134 cycles to complete. Clearing occurs regardless of the enable (ICE) value.
24 | ICLOA | Instruction Cache Lock Overflow Allocate
  Set by software to allow a lock request to replace a locked line when a lock overflow situation exists.
  0 Indicates a lock overflow condition will not replace an existing locked line with the requested line
  1 Indicates a lock overflow condition will replace an existing locked line with the requested line
25–26 | ICEA | Instruction Cache Error Action
  00 Error Detection causes machine check exception.
  01 Error Detection causes correction/auto-invalidation. No machine check is generated unless a locked line is invalidated. In EDC mode, correction is performed for single-bit tag and lock errors, and lines with multi-bit tag or lock errors are invalidated. In parity mode, tag or lock errors will result in invalidation of lines. For both modes, correction is performed for single or multi-bit data errors by reloading of the line.
  1x Reserved
27–28 | — | Reserved
29 | ICABT | Instruction Cache Operation Aborted
  Indicates a Cache Invalidate or a Cache Lock Bits Flash Clear operation was aborted prior to completion. This bit is set by hardware on an aborted condition, and will remain set until cleared by software writing 0 to this bit location.
30 | ICINV | Instruction Cache Invalidate
  0 No cache invalidate
  1 Cache invalidation operation
  When written to a 1, a cache invalidation operation is initiated by hardware. Once complete, this bit is reset to 0. Writing a 1 while an invalidation operation is in progress will result in an undefined operation. Writing a 0 to this bit while an invalidation operation is in progress will be ignored. Cache invalidation operations require approximately 134 cycles to complete. Invalidation occurs regardless of the enable (ICE) value.
  During cache invalidations, the parity check bits are written with a value dependent on the ICEDT selection. ICEDT should be written with the desired value for subsequent cache operation when ICINV is set to ‘1’ for proper operation of the cache.
31 | ICE | Instruction Cache Enable
  0 Cache is disabled
  1 Cache is enabled
  When disabled, cache lookups are not performed for instruction accesses.
  Other L1CSR1 cache control operations are still available and are not affected by ICE.
9.4.3
L1 Cache Configuration Register 0 (L1CFG0)
The L1 cache configuration register 0 (L1CFG0) is a 32-bit read-only register that provides information
about the configuration of the e200z7 L1 data cache design. The contents of the L1CFG0 register can be
read using a mfspr instruction. Figure 9-6 shows the L1CFG0 register.
SPR 515; Access: Read only
Bits | Field | Reset
0–1 | CARCH | 00
2 | CWPA | 1
3 | DCFAHA | 0
4 | DCFISWA | 1
5–6 | — | 00
7–8 | DCBSIZE | 00
9–10 | DCREPL | 10
11 | DCLA | 1
12 | DCECA | 1
13–20 | DCNWAY | 00000011
21–31 | DCSIZE | 00000010000
Figure 9-6. L1 Cache Configuration Register 0 (L1CFG0)
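The reset bits shown in Figure 9-6 can be decoded into the cache geometry. The sketch below is illustrative only: the names are hypothetical, the constant 0x28581810 is simply the figure's reset bits assembled into one 32-bit word (bit 0 as the MSB), and it assumes, based on the single documented encoding for each field, that the number of ways is the DCNWAY value plus one and that DCSIZE is expressed in KB.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the decoded geometry. */
typedef struct {
    uint32_t ways;         /* DCNWAY + 1 (assumed encoding) */
    uint32_t size_kb;      /* DCSIZE, assumed to be in KB */
    uint32_t block_bytes;  /* 32 for the documented encoding 00 */
    uint32_t sets;         /* derived: size / (ways * block size) */
} l1cfg_info_t;

/* Extract bits first..last of a 32-bit register, where bit 0 is the MSB. */
static uint32_t be_field(uint32_t reg, unsigned first, unsigned last)
{
    assert(first <= last && last <= 31);
    unsigned width = last - first + 1;
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (reg >> (31 - last)) & mask;
}

/* Decode the fields that L1CFG0 and L1CFG1 place at the same bit positions. */
static l1cfg_info_t l1cfg_decode(uint32_t l1cfg)
{
    l1cfg_info_t info;
    assert(be_field(l1cfg, 7, 8) == 0u);     /* only block size 00 (32 bytes) is documented */
    info.block_bytes = 32u;
    info.ways = be_field(l1cfg, 13, 20) + 1u;
    info.size_kb = be_field(l1cfg, 21, 31);
    info.sets = info.size_kb * 1024u / (info.ways * info.block_bytes);
    return info;
}
```

Decoding 0x28581810 with this function yields 4 ways, 16 KB, and 128 sets, which matches the organization described in Section 9.2.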
The L1CFG0 bits are described in Table 9-3.
Table 9-3. L1CFG0 Field Descriptions
Bits | Name | Description
0–1 | CARCH | Cache Architecture
  00 The cache architecture is Harvard
2 | CWPA | Cache Way Partitioning Available
  1 The caches support partitioning of way availability for I/D accesses
3 | DCFAHA | Data Cache Flush All by Hardware Available
  0 The data cache does not support Flush All in Hardware
4 | DCFISWA | Data Cache Flush/Invalidate by Set and Way Available
  1 The data cache supports flushing/invalidation by Set and Way via the L1FINV0 spr
5–6 | — | Reserved—read as zeros
7–8 | DCBSIZE | Data Cache Block Size
  00 The data cache implements a block size of 32 bytes
9–10 | DCREPL | Data Cache Replacement Policy
  10 The data cache implements a pseudo-round-robin replacement policy
11 | DCLA | Data Cache Locking unit Available
  1 The data cache implements the line locking unit
12 | DCECA | Data Cache Error Checking Available
  1 The data cache implements error checking
13–20 | DCNWAY | Data Cache Number of Ways
  0x03 The data cache is 4-way set-associative
21–31 | DCSIZE | Data Cache Size
  0x010 The size of the data cache is 16 KB
9.4.4
L1 Cache Configuration Register 1 (L1CFG1)
The L1 cache configuration register 1 (L1CFG1) is a 32-bit read-only register that provides information
about the configuration of the e200z760n3 L1 instruction cache design. The contents of the L1CFG1
register can be read using a mfspr instruction. Figure 9-7 shows the L1CFG1 register.
SPR 516; Access: Read only
Bits | Field | Reset
0–3 | — | 0000
4 | ICFISWA | 1
5–6 | — | 00
7–8 | ICBSIZE | 00
9–10 | ICREPL | 10
11 | ICLA | 1
12 | ICECA | 1
13–20 | ICNWAY | 00000011
21–31 | ICSIZE | 00000010000
Figure 9-7. L1 Cache Configuration Register 1 (L1CFG1)
The L1CFG1 bits are described in Table 9-4.
Table 9-4. L1CFG1 Field Descriptions
Bits | Name | Description
0–3 | — | Reserved—read as zeros
4 | ICFISWA | Instruction Cache Flush/Invalidate by Set and Way Available
  1 The instruction cache supports invalidation by Set and Way via the L1FINV1 spr
5–6 | — | Reserved—read as zeros
7–8 | ICBSIZE | Instruction Cache Block Size
  00 The instruction cache implements a block size of 32 bytes
9–10 | ICREPL | Instruction Cache Replacement Policy
  10 The instruction cache implements a pseudo-round-robin replacement policy
11 | ICLA | Instruction Cache Locking unit Available
  1 The instruction cache implements the line locking unit
12 | ICECA | Instruction Cache Error Checking Available
  1 The instruction cache implements error checking
13–20 | ICNWAY | Instruction Cache Number of Ways
  0x03 The instruction cache is 4-way set-associative
21–31 | ICSIZE | Instruction Cache Size
  0x010 The size of the instruction cache is 16 KB
9.5
Data Cache Software Coherency
Data cache coherency is supported through software operations to invalidate, flush dirty lines to memory,
or invalidate dirty lines. The data cache may operate in either write-through or copyback modes, and in
conjunction with an MMU, may designate certain accesses as write-through or copyback. Data cache misses
force the push and store buffers to empty prior to performing the access to ensure coherency.
9.6
Address Aliasing
Each cache is virtually indexed and physically tagged, thus the problems associated with potential cache
synonyms due to effective address aliasing are eliminated, unless 1 KB or 2 KB pages are used. If 1 KB
or 2 KB pages are used and multiple virtual addresses are mapped to the same physical address, the low
order virtual address bits used to index the cache (A[20–21] for 1 KB pages, A20 for 2 KB pages) must be
the same for each of the virtual pages, and these index bit(s) must match the corresponding physical
address bit(s) value. For example, if logical pages X and Y map to physical page P, then X, Y, and P must
have the same values of A[20–21] for 1 KB pages, and A20 for 2 KB pages. Note that this limitation should
be already met because of the requirements on 1 KB and 2 KB page usage mandated by Section 10.2.6,
“Restrictions on 1-KB and 2-KB Page Size Usage.”
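The constraint on 1-KB and 2-KB pages can be checked mechanically. The following is a minimal sketch with hypothetical function names; it assumes 32-bit addresses numbered with A0 as the most significant bit, so A20 is the bit with value 0x800 and A21 is the bit with value 0x400.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Cache index bits that lie above the page offset and are therefore
 * subject to translation: A[20-21] for 1-KB pages, A20 for 2-KB pages,
 * none for pages of 4 KB and larger. */
static uint32_t translated_index_bits(unsigned page_kb)
{
    assert(page_kb >= 1);
    if (page_kb == 1)
        return 0x00000C00u;   /* A20 and A21 */
    if (page_kb == 2)
        return 0x00000800u;   /* A20 */
    return 0u;
}

/* True if a virtual-to-physical mapping keeps the index bits identical,
 * which is the condition that avoids cache synonyms. */
static bool alias_mapping_ok(uint32_t vaddr, uint32_t paddr, unsigned page_kb)
{
    uint32_t mask = translated_index_bits(page_kb);
    return ((vaddr ^ paddr) & mask) == 0u;
}
```

If every virtual page that maps to physical page P passes this check against P, the virtual pages also agree with each other in those bits, which is the condition stated above for logical pages X and Y.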
9.7
Cache Operation
This section contains the following subsections, which discuss cache operation in detail:
• Section 9.7.1, “Cache Enable/Disable”
• Section 9.7.2, “Cache Fills”
• Section 9.7.3, “Cache Line Replacement”
• Section 9.7.4, “Cache Miss Access Ordering”
• Section 9.7.5, “Cache-Inhibited Accesses”
• Section 9.7.6, “Guarded Accesses”
• Section 9.7.7, “Cache-Inhibited Guarded Accesses”
• Section 9.7.8, “Cache Invalidation”
• Section 9.7.9, “Cache Flush/Invalidate by Set and Way”
9.7.1
Cache Enable/Disable
The caches are enabled or disabled by using the cache enable bits L1CSR0[DCE] and L1CSR1[ICE]
respectively. Cache enable bits are cleared by power-on reset or normal reset, disabling the caches.
When a cache is disabled, the cache tag status bits are ignored, and the cache is not accessed for snoops,
normal loads, stores, or instruction fetches. All normal accesses are propagated to the system bus as
single-beat (non-burst) transactions.
Note that the state of the Cache Inhibited access attribute (the I bit) remains independent of the state of
L1CSR0[DCE] and L1CSR1[ICE]. Disabling a cache does not affect the translation logic in the memory
management unit. Translation attributes are still used when generating attribute information on the system
buses.
The store buffer is still available for use even when the data cache is disabled.
Altering the DCE or ICE bit must be preceded by an isync and msync to prevent the cache from being
disabled or enabled in the middle of a data or instruction access. In addition, the cache may need to be
globally flushed before it is disabled to prevent coherency problems when it is re-enabled.
All cache operations are affected by disabling the cache. Cache management instructions (except for
mtspr L1FINV0,1 and mtspr L1CSR0,1) do not affect a cache when it is disabled.
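One possible software sequence for bringing up the data cache after reset (invalidate, wait for completion, then enable) is sketched below. This is an illustration under assumptions, not a sequence prescribed by this manual: the accessor structure, the function names, and the retry-on-abort policy are all hypothetical. On hardware, the read and write hooks would wrap mfspr and mtspr for SPR 1010, with the isync and msync required before altering DCE; here they are plain function pointers, and a small software stand-in for the register is included so the flow can be exercised off-target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define L1CSR0_DCABT 0x00000004u   /* bit 29 */
#define L1CSR0_DCINV 0x00000002u   /* bit 30 */
#define L1CSR0_DCE   0x00000001u   /* bit 31 */

/* Hypothetical accessor hooks for L1CSR0. */
typedef struct {
    uint32_t (*read)(void *ctx);
    void (*write)(void *ctx, uint32_t value);
    void *ctx;
} spr_ops_t;

/* Invalidate the data cache, then enable it. Returns false if every
 * attempt was reported as aborted (DCABT set). */
static bool dcache_invalidate_and_enable(const spr_ops_t *ops, unsigned max_tries)
{
    assert(ops && ops->read && ops->write);
    for (unsigned attempt = 0; attempt < max_tries; attempt++) {
        uint32_t v = ops->read(ops->ctx);
        v &= ~L1CSR0_DCABT;                      /* clear stale abort status by writing 0 */
        ops->write(ops->ctx, v | L1CSR0_DCINV);  /* start the invalidation */
        do {
            v = ops->read(ops->ctx);
        } while (v & L1CSR0_DCINV);              /* hardware resets DCINV when complete */
        if (!(v & L1CSR0_DCABT)) {
            ops->write(ops->ctx, v | L1CSR0_DCE);
            return true;
        }
    }
    return false;
}

/* Software stand-in for the register, for off-target testing only. */
typedef struct {
    uint32_t value;
    unsigned busy_reads;   /* reads remaining before DCINV self-clears */
} sim_l1csr0_t;

static uint32_t sim_read(void *ctx)
{
    sim_l1csr0_t *s = ctx;
    if (s->value & L1CSR0_DCINV) {
        if (s->busy_reads == 0)
            s->value &= ~L1CSR0_DCINV;           /* operation complete */
        else
            s->busy_reads--;
    }
    return s->value;
}

static void sim_write(void *ctx, uint32_t value)
{
    sim_l1csr0_t *s = ctx;
    s->value = value;
    if (value & L1CSR0_DCINV)
        s->busy_reads = 3;
}
```

The read-modify-write preserves the other L1CSR0 fields, so a DCEDT value programmed beforehand is still in place when DCINV is written, as the DCINV description requires.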
9.7.2
Cache Fills
Cache line fills are requested when a cacheable load or instruction miss occurs. Cacheable store misses
only allocate cache lines if data cache write allocation is enabled for the type of store being performed. In
addition, no allocation is performed for a write-through store when the store buffer is disabled.
The cache line fill is performed critical double word first on the bus that is using a burst access. The critical
double word is forwarded to the requesting unit before being written to the cache, thus minimizing stalls
due to fill delays. Cache line fills load a four double word linefill buffer, and updates to the cache array are
performed as half-lines are received.
Read accesses may hit in the linefill buffer, in which case data is supplied from the buffer to the CPU. On writes which hit
to the buffer address, when write allocation is disabled, the writes stall until the cache fill has been
completed. When write allocation is enabled, these writes update the linefill buffer if the buffer is being
filled due to a store miss only; otherwise the write also stalls until the linefill completes.
Data may be streamed to the CPU as it arrives from the bus if a corresponding request is pending. In
addition, the cache supports hit under fill, allowing subsequent CPU accesses to be satisfied by cache hits
while the remainder of the line fill completes. This non-blocking capability improves performance by
hiding a portion of the line fill latency when data already in the cache or linefill buffer is subsequently
requested by the CPU.
The cache supports up to three outstanding misses and forwards these miss requests to the BIU. Miss data
is always returned from the BIU to the cache in-order.
Cache fill operations are performed as wrapping bursts on the system bus. If an error response is received
on any element of the burst, the burst will be terminated, and the cache line will be marked invalid.
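As a rough illustration of critical-double-word-first ordering, the sketch below computes the order in which the four double words of a 32-byte line would be returned by a wrapping burst. The function name `linefill_order` is hypothetical, and the ascending wrap order within the line is an assumption rather than something stated in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DWORDS_PER_LINE 4u   /* four 8-byte double words per 32-byte line */

/* Fill order[] with the double-word indices (0..3) in the order a
 * wrapping burst would return them, starting with the critical
 * (requested) double word. Ascending wrap order is an assumption. */
void linefill_order(uint32_t miss_addr, unsigned order[DWORDS_PER_LINE])
{
    assert(order != NULL);
    /* Address bits 4:3 select the double word within the line. */
    unsigned critical = (miss_addr >> 3) & (DWORDS_PER_LINE - 1u);
    for (unsigned beat = 0; beat < DWORDS_PER_LINE; beat++) {
        order[beat] = (critical + beat) % DWORDS_PER_LINE;
    }
}
```

For a miss at an address whose bits 4:3 equal 2, this model yields the order 2, 3, 0, 1, so the requested double word can be forwarded first while the rest of the line is still arriving.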
If one or more store hit updates occur to the linefill buffer during allocation of a line for a store miss and
a subsequent error response is received during the linefill, the original store miss access and each
individual hitting store access are performed on the system bus as if they were non-allocating. In this case,
an async machine check exception is signaled for the linefill.
9.7.3
Cache Line Replacement
On a cache miss, the cache controller uses a pseudo-round-robin replacement algorithm to determine
which cache line will be selected to be replaced. There is a single replacement counter for each cache. The
replacement algorithm acts as follows: On a miss, if the replacement pointer is pointing to a way that is
not enabled for replacement (the selected line or way is locked), it is incremented until an available way
is selected (if any). After a cache line is successfully filled without error, the replacement pointer
increments to point to the next cache way. If no way is available for the replacement, the access is treated
as a single beat access and no cache linefill occurs.
Lines selected for replacement which are dirty (modified) must be copied back to main memory. This is
performed by first storing the replaced line in a 32-byte push buffer while the missed data is fetched. After
filling the new line, the contents of the buffer are written to memory beginning with double word 0.
Each replacement counter is initialized to point to way 0 on a reset or on a respective cache invalidate all
operation. A replacement counter may also be set to a specific value via an L1FINV0/L1FINV1 command.
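The replacement behavior described above can be modeled in a few lines. This is only a sketch: the names `select_victim` and `next_pointer` are hypothetical, four ways are assumed from the 2-bit CWAY field, and the pointer is assumed to wrap from the last way back to way 0.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_WAYS 4u  /* assumed from the 2-bit CWAY field */

/* Pick the way to replace on a miss. 'ptr' is the replacement pointer;
 * bit n of 'unavailable' is set if way n is locked or otherwise not
 * enabled for replacement. Returns the way, or -1 if no way is
 * available (the access is then single-beat with no linefill). */
int select_victim(unsigned ptr, uint8_t unavailable)
{
    assert(ptr < NUM_WAYS);
    for (unsigned step = 0; step < NUM_WAYS; step++) {
        unsigned way = (ptr + step) % NUM_WAYS;
        if (!(unavailable & (1u << way))) {
            return (int)way;
        }
    }
    return -1;
}

/* After an error-free fill of 'filled_way', the pointer advances to
 * the next way (wrap-around is an assumption of this model). */
unsigned next_pointer(unsigned filled_way)
{
    assert(filled_way < NUM_WAYS);
    return (filled_way + 1u) % NUM_WAYS;
}
```

With the pointer at way 1 and ways 1 and 2 locked, the model selects way 3; with all four ways locked it returns -1, which corresponds to the single-beat, no-linefill case.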
9.7.4
Cache Miss Access Ordering
Cacheable cache misses may be processed out-of-order by the e200z760n3. Load misses which are not
cache-inhibited are allowed to bypass buffered stores and push buffer pushes as long as no address alias
exists. Alias checking is performed by comparing the index of the load with the index of each buffered
store and push. If no alias match exists, the load is allowed to bypass buffered stores and pushes, regardless
of the attributes associated with those stores. Load misses are performed in-order with respect to other load
misses. Store accesses do not bypass loads. Stores are not necessarily performed in order from the point of
view of the memory system, since a store miss may cause a linefill to satisfy the store prior to previously
buffered stores being completed, as long as no aliasing occurs.
Memory access ordering must be enforced by software where required, using the mbar and/or msync
instructions according to the Power Architecture storage ordering rules.
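The index-based alias check can be sketched as follows. The names are hypothetical, and the index extraction assumes 32-byte lines and a 7-bit set index (matching the width of the CSET field). Because only the index is compared, two different addresses that map to the same set are conservatively treated as aliases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SHIFT 5u     /* 32-byte lines (assumed) */
#define INDEX_MASK 0x7Fu  /* 7-bit set index, matching the CSET width */

static unsigned cache_index(uint32_t addr)
{
    return (addr >> LINE_SHIFT) & INDEX_MASK;
}

/* Returns true if a cacheable load miss may bypass every pending
 * buffered store and push, i.e. none of them shares its cache index. */
bool load_may_bypass(uint32_t load_addr,
                     const uint32_t *pending, size_t count)
{
    assert(pending != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (cache_index(pending[i]) == cache_index(load_addr)) {
            return false;   /* index alias: the load must wait */
        }
    }
    return true;
}
```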
9.7.5
Cache-Inhibited Accesses
When the Cache-inhibited attribute is indicated by translation and a cache miss occurs, all accesses are
performed as single beat transactions on the system bus. Cache-inhibited status is ignored on all cache hits.
For cache-inhibited load access misses, the processor termination is withheld for the load until the store
buffer has been flushed of all entries, the push buffer has been emptied, and the load has completed to
memory. Cache-inhibited store accesses that are not marked as Guarded are placed in the store buffer
(when enabled), and the processor termination occurs when the store buffer entry is allocated (see
Section 9.9, “Push and Store Buffers”).
9.7.6
Guarded Accesses
When the Guarded attribute is indicated by translation and a cache miss occurs, the access does not
proceed on the external bus until all previously initiated demand-accesses have been terminated to the
processor without error. Buffered stores are considered terminated to the processor when they are placed
into the store buffer. Guarded load misses that are not cache-inhibited are allowed to bypass buffered stores
and push buffer pushes as long as no address alias exists, regardless of whether a buffered store is guarded.
Guarded stores do not allocate cache lines on a miss. Instead, if the access is not cache-inhibited, they are
buffered in the store buffer (when enabled), regardless of whether or not they are write through required
(regardless of W bit or L1CSR0[DCWM] values), and performed as single-beat accesses on the bus.
9.7.7
Cache-Inhibited Guarded Accesses
When the Cache-inhibited and Guarded attributes are indicated by translation and a cache miss occurs,
accesses are performed as single beat transactions on the system bus. Cache-inhibited status is normally
ignored on all cache hits. Cache-inhibited status for write-through stores that are also guarded is not
ignored, however. For cache-inhibited guarded access misses, or for cache-inhibited guarded
write-through store hits, the processor termination is withheld until the store buffer has been flushed of all
entries, the push buffer has been emptied, and the access has completed to memory (see Section 9.9, “Push
and Store Buffers”). Cache-inhibited guarded stores with W = 0 or L1CSR0[DCWM] = 1 that hit
ignore the Cache-inhibited and Guarded status.
9.7.8
Cache Invalidation
The e200z7 supports full invalidation of the caches under software control. The cache may be invalidated
through the L1CSR0[DCINV] and L1CSR1[ICINV] cache invalidate control bits. This function is
available even when a cache is disabled.
Reset does not invalidate a cache automatically. Software must use the DCINV/ICINV control for
invalidation after a reset. Proper use of this bit is to determine that it is clear and then set it with a pair of
mfspr/mtspr operations. A 0-to-1 transition on DCINV/ICINV causes a flash invalidation to be initiated,
which lasts for multiple (approximately 134) CPU cycles. Once set, the DCINV/ICINV bit is cleared by
hardware after the operation is complete. It remains set during the invalidation interval and may be tested
by software to determine when the operation has completed. An mtspr operation to L1CSR0/1 that
attempts to change the state of DCINV/ICINV during invalidation does not affect the state of that bit.
To properly generate the tag parity/check bits during the invalidation process, the error detection type
control located in L1CSR0[DCEDT]/L1CSR1[ICEDT] should be configured properly at the time that the
invalidation operation is initiated. A subsequent change to the error detection type control requires a new
invalidation to avoid improper interpretation of previously stored tag parity/check bits.
During the process of performing the invalidation, a cache does not respond to accesses that are not snoop
accesses and remains busy. Interrupts may still be recognized and processed, potentially aborting the
invalidation operation. When this occurs, L1CSR0,1[ABT] is set to indicate unsuccessful completion of
the operation. Software should read the L1CSR0/L1CSR1 register to determine that the operation has
completed (L1CSR0[DCINV] or L1CSR1[ICINV] cleared), and then check the status of L1CSR0,1[ABT] to determine
completion status.
NOTE
Note that while this implementation of the e200z7 stalls further instruction
execution during this invalidation interval, this is not guaranteed across all
implementations. Thus, software should be written using these guidelines.
Individual cache lines may be invalidated using the icbi, dcbi, or dcbf instructions. These instructions
require the respective cache to be enabled in order to operate normally.
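The invalidation guidelines in this section (check that the invalidate bit is clear, set it, poll until hardware clears it, then check the abort status) might be coded along the following lines. This is a host-runnable sketch, not target code: the `l1csr_ops` accessors are hypothetical wrappers for mfspr/mtspr, the two bit masks are assumed positions that must be checked against the L1CSR0/L1CSR1 register descriptions, and clearing a stale abort flag by writing 0 is also an assumption. The `sim_*` functions are a stand-in register so the routine can be tried off-target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit masks; confirm against the register descriptions. */
#define L1CSR_INV 0x00000002u  /* DCINV / ICINV (assumed position) */
#define L1CSR_ABT 0x00000004u  /* abort status  (assumed position) */

/* Hypothetical accessors wrapping mfspr/mtspr for one L1CSR register. */
typedef struct {
    uint32_t (*read)(void);
    void (*write)(uint32_t value);
} l1csr_ops;

/* Start a flash invalidation and wait for it to finish.
 * Returns true on success, false if the operation was aborted. */
bool cache_invalidate(const l1csr_ops *csr)
{
    assert(csr != NULL && csr->read != NULL && csr->write != NULL);

    /* Determine that no invalidation is already in progress. */
    while (csr->read() & L1CSR_INV) {
    }

    /* A 0-to-1 transition on the invalidate bit starts the operation. */
    csr->write((csr->read() & ~L1CSR_ABT) | L1CSR_INV);

    /* Hardware clears the bit when the operation completes. */
    while (csr->read() & L1CSR_INV) {
    }

    return (csr->read() & L1CSR_ABT) == 0u;
}

/* Host-side stand-in register: the invalidate bit self-clears after
 * a few reads. Not part of any real hardware interface. */
uint32_t sim_reg;
int sim_busy_reads;

uint32_t sim_read(void)
{
    if ((sim_reg & L1CSR_INV) && --sim_busy_reads <= 0) {
        sim_reg &= ~L1CSR_INV;
    }
    return sim_reg;
}

void sim_write(uint32_t value)
{
    if ((value & L1CSR_INV) && !(sim_reg & L1CSR_INV)) {
        sim_busy_reads = 3;   /* invalidation now "in progress" */
    }
    sim_reg = value;
}
```

A caller that receives `false` would normally retry the operation. Polling rather than assuming a stall follows the NOTE above, since not every implementation is guaranteed to stall instruction execution during the invalidation interval.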
9.7.9
Cache Flush/Invalidate by Set and Way
The e200z7 supports cache flushing under software control. The caches may be flushed and/or invalidated
by index and way through an mtspr l1finv{0,1} instruction.
The L1 flush and invalidate control registers (L1FINV0, L1FINV1) are 32-bit SPRs used to select a cache
set and way to be flushed/invalidated. No tag match is required. This function is available even when a
cache is disabled. L1FINV0 is used for data cache operations, while L1FINV1 is used for instruction cache
operations.
9.7.9.1
L1 Flush/Invalidate Register 0 (L1FINV0)
The SPR number for L1FINV0 is 1016 in decimal. The L1FINV0 register is shown in Figure 9-8.
SPR 1016                                   Access: Read/Write
Bits     0–5     6–7     8–19    20–26    27–29    30–31
R/W      —       CWAY    —       CSET     —        CCMD
Reset    All zeros
Figure 9-8. L1 Flush/Invalidate Register 0 (L1FINV0)
The L1FINV0 bits are described in Table 9-5.
Table 9-5. L1FINV0 Field Descriptions
Bits     Name    Description
0–5      —       Reserved¹ for way extension
6–7      CWAY    Cache Way
                 Specifies the data cache way to be selected
8–19     —       Reserved¹ for set extension
20–26    CSET    Cache Set
                 Specifies the cache set to be selected
27–29    —       Reserved¹ for set/command extension
30–31    CCMD    Cache Command
                 00 The data contained in this entry is invalidated without flushing
                 01 The data contained in this entry is flushed if dirty and valid without invalidation
                 10 The data contained in this entry is flushed if dirty and valid and then is invalidated
                 11 Reset way replacement pointer to the way indicated by CWAY
¹ These bits are not implemented and should be written with zero for future compatibility.
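As an illustration of the field layout in Table 9-5, the following sketch packs CWAY, CSET, and CCMD into a 32-bit L1FINV0 value. The enum and function names are hypothetical. The shifts follow from the Power Architecture convention that bit 0 is the most significant bit, so a field ending at bit n is shifted left by 31 - n.

```c
#include <assert.h>
#include <stdint.h>

/* CCMD encodings from Table 9-5. */
enum l1finv_cmd {
    L1FINV_INVALIDATE       = 0u, /* invalidate without flushing */
    L1FINV_FLUSH            = 1u, /* flush if dirty and valid */
    L1FINV_FLUSH_INVALIDATE = 2u, /* flush, then invalidate */
    L1FINV_SET_REPL_POINTER = 3u  /* reset replacement pointer to CWAY */
};

/* Build an L1FINV0 value; reserved bits are left as zero. */
uint32_t l1finv0_encode(unsigned way, unsigned set, enum l1finv_cmd cmd)
{
    assert(way < 4u);    /* CWAY: bits 6-7, 2 bits, shift 31 - 7 = 24 */
    assert(set < 128u);  /* CSET: bits 20-26, 7 bits, shift 31 - 26 = 5 */
    return ((uint32_t)way << 24) | ((uint32_t)set << 5) | (uint32_t)cmd;
}
```

For example, way 1, set 3, with the flush-and-invalidate command encodes as 0x01000062.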
For cache flush operations, if a transfer error occurs on a data cache line flush, the push of the remaining
portion of the cache line is aborted; the line remains marked dirty and valid; and a machine check condition
is signaled.
For flush and flush with invalidation operations, data parity errors do not abort a flush to memory, but a
machine check is generated at the completion of the flush. In both cases, the cache line is left unchanged.
For flush with invalidation operations to clean lines, tag parity errors and data parity errors are ignored,
and the line is invalidated. Note that only the line indicated by CSET and CWAY is checked for errors;
lines in the other ways are ignored.
For invalidation without flushing operations, tag parity errors, data parity errors, and dirty-bit parity errors
are ignored, and the line is invalidated.
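A whole-cache flush can be built from repeated set/way commands. The sketch below walks every value the CWAY (2-bit) and CSET (7-bit) fields can encode and issues a flush-with-invalidate command for each. The write hook is a hypothetical stand-in for `mtspr L1FINV0`, and `record_l1finv0` exists only so the loop can be exercised on a host. Any synchronization required around the mtspr on a real target, and handling of a machine check raised during a flush, are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DCACHE_WAYS 4u    /* all values of the 2-bit CWAY field */
#define DCACHE_SETS 128u  /* all values of the 7-bit CSET field */
#define CCMD_FLUSH_INVALIDATE 2u

/* Hypothetical hook that performs "mtspr L1FINV0, value" on target. */
typedef void (*l1finv0_write_fn)(uint32_t value);

/* Flush and invalidate every set/way; returns the number of commands. */
unsigned dcache_flush_invalidate_all(l1finv0_write_fn write_l1finv0)
{
    unsigned issued = 0;
    assert(write_l1finv0 != NULL);
    for (unsigned set = 0; set < DCACHE_SETS; set++) {
        for (unsigned way = 0; way < DCACHE_WAYS; way++) {
            /* CWAY at shift 24, CSET at shift 5, CCMD at shift 0. */
            write_l1finv0((way << 24) | (set << 5) | CCMD_FLUSH_INVALIDATE);
            issued++;
        }
    }
    return issued;
}

/* Host-side stub that records the last command written. */
uint32_t last_l1finv0_value;

void record_l1finv0(uint32_t value)
{
    last_l1finv0_value = value;
}
```

This is the kind of global flush that the cache enable/disable discussion suggests may be needed before the data cache is disabled.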
9.7.9.2
L1 Flush/Invalidate Register 1 (L1FINV1)
The SPR number for L1FINV1 is 959 in decimal. The L1FINV1 register is shown in Figure 9-9.
SPR 959                                    Access: Read/Write
Bits     0–5     6–7     8–19    20–26    27–29    30–31
R/W      —       CWAY    —       CSET     —        CCMD
Reset    All zeros
Figure 9-9. L1 Flush/Invalidate Register 1 (L1FINV1)
The L1FINV1 bits are described in Table 9-6.
Table 9-6. L1FINV1 Field Descriptions
Bits     Name    Description
0–5      —       Reserved¹ for way extension
6–7      CWAY    Cache Way
                 Specifies the instruction cache way to be selected
8–19     —       Reserved¹ for set extension
20–26    CSET    Cache Set
                 Specifies the instruction cache set to be selected
27–29    —       Reserved¹ for set/command extension
30–31    CCMD    Cache Command
                 00 The data contained in this entry is invalidated
                 01 Reserved
                 10 Reserved
                 11 Reset way replacement pointer to the way indicated by CWAY
¹ These bits are not implemented and should be written with zero for future compatibility.
9.8
Cache Parity and EDC Protection
Cache parity is supported for both the tag and data arrays of each cache. Six parity check bits are provided
for each tag entry for the tag arrays of both caches to support multi-bit error detection (EDC), and
redundant dirty bits are provided in the data cache to provide dirty-bit parity checking without requiring a
read-modify-write operation when the dirty bit is set. Redundant lock bits are provided as well for both the
Icache and the Dcache. Byte parity is supported for the data arrays of the data cache. Eight parity check
bits are provided for each double word in the data arrays of the ICache, which can be used either for standard
byte parity checking (single-bit error detection) or for multibit error detection (EDC—DED, double error
detection). When utilizing EDC protection, many multibit errors are also detected.
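To make the byte-parity scheme concrete, the sketch below derives eight parity bits for one double word, one per byte. It is illustrative only: the function name is hypothetical, and both the even-parity polarity and the mapping of parity bit i to byte i are assumptions, since this section does not specify them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute one parity bit per byte of an 8-byte double word.
 * Bit i of the result covers bytes[i]. Even parity is assumed:
 * each parity bit is the XOR of the eight data bits of its byte. */
uint8_t dword_byte_parity(const uint8_t bytes[8])
{
    uint8_t parity = 0;
    assert(bytes != NULL);
    for (unsigned i = 0; i < 8u; i++) {
        uint8_t b = bytes[i];
        b ^= (uint8_t)(b >> 4);   /* fold the byte down to one bit */
        b ^= (uint8_t)(b >> 2);
        b ^= (uint8_t)(b >> 1);
        parity |= (uint8_t)((b & 1u) << i);
    }
    return parity;
}
```

A single flipped data bit changes exactly one of the eight parity bits, which is why byte parity gives single-bit error detection. Two flipped bits in the same byte cancel out, which is the case the EDC option is intended to catch.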
Parity and EDC checking is controlled by the L1CSR0[DCECE], L1CSR0[DCEDT], L1CSR1[ICECE],
and L1CSR1[ICEDT] control fields. When error checking is enabled, checking is performed on each cache
access, whether for lookup, snoop lookup, or for dirty line replacement. Parity or EDC errors are not
signaled by the respective cache when cache error checking is disabled for that cache
(L1CSR0[DCECE] or L1CSR1[ICECE] = 0).
For normal cache lookups due to instruction fetching, loads, or stores, if an uncorrectable tag parity or
EDC error is detected on any portion of the accessed tags, a parity error is signaled, regardless of whether
a cache hit or miss occurs. Otherwise, if a cache hit for a load occurs and a data parity error is detected on
any portion of the accessed double word of data, a parity error is also signaled. Data parity errors are
ignored for store hits, since the parity is updated for the data being stored. Data parity errors are ignored
for misses unless the replacement line is dirty or incurs a dirty bit parity error, since the parity will be
updated for the new linefill data being stored.
Signaling of a parity error may not cause an exception to occur, depending on the error detection action to
be taken. Instead, a correction/auto-invalidation cycle may be performed.
A dirty line push is not generated for a dirty line replacement that incurs an uncorrectable tag parity or
EDC error. In this case, a machine check is generated, but no push is requested on the external bus, and
the cache line is left unchanged. For dirty line pushes from the data cache, accessing the data arrays for the
push data may occur after the burst write has been requested on the external bus. Therefore, a push of dirty
data may actually push data that contains a parity error. A machine check is signaled, but the burst is not
aborted, and the line is invalidated and replaced.
Dirty bit parity is checked when invalidation or replacement operations are required. If a dirty parity error
is detected on a cache line replacement, in correction/auto-invalidation mode, it is ignored, and the line is
pushed normally. In machine check mode, a machine check exception is signaled, indicating a tag parity
error. Dirty status or dirty parity errors prevent the auto-invalidation of cache lines with tag parity or EDC
errors. If a dirty parity error occurs in correction/auto-invalidation mode, the line is assumed to be dirty. If
correction/auto-invalidation is enabled, the error is corrected by re-writing all three dirty bits to 1. This
implies that a single or multi-bit error that sets one or more dirty bits from an initially cleared state causes
the line to appear dirty. This should not cause a functional issue, however, because the only result is that a
clean but coherent line may be pushed on a flush or replacement in correction/auto-invalidation mode.
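The dirty-bit correction rule can be modeled as below. The sketch assumes that a dirty parity error means the three redundant dirty bits disagree with one another; the text does not spell out the exact check, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DIRTY_MASK 0x7u  /* three redundant dirty bits */

/* Modeled as: the three copies of the dirty bit do not all agree. */
bool dirty_parity_error(uint8_t dirty_bits)
{
    assert((dirty_bits & ~DIRTY_MASK) == 0u);
    return dirty_bits != 0u && dirty_bits != DIRTY_MASK;
}

/* Correction/auto-invalidation mode: a line with a dirty parity error
 * is assumed dirty, and all three bits are rewritten to 1. */
uint8_t correct_dirty_bits(uint8_t dirty_bits)
{
    return dirty_parity_error(dirty_bits) ? (uint8_t)DIRTY_MASK : dirty_bits;
}
```

Under this model a clean line (000) that takes a single-bit upset (for example 100) is corrected to 111. That matches the behavior described above: the line merely appears dirty and may be pushed unnecessarily.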
Regardless of the error action mode indicated by DCEA/ICEA, lock bit parity errors do not signal an
exception for normal hits without a tag parity error. If correction/auto-invalidation is enabled, on each
cache lookup operation, a single-bit lock error that is detected in one or more ways is corrected by
rewriting all lock bits to the correct state. Uncorrectable lock errors remain unchanged. For cache hits
without a tag parity/EDC error, all lock parity errors are ignored. Lock parity errors on a cacheable miss
(after a correction attempt if correction/auto-invalidation is enabled) result in the line(s) being invalidated
if clean and a machine check being generated. A new line is not allocated, and the lock bits are not updated
on the invalidation. Lock bit parity errors are ignored for non-cacheable accesses.
Signaling of a parity error or EDC error may cause a Machine Check exception to occur and one or more
syndrome bits to be set in the machine check syndrome register. However, it may instead result in a
correction/auto-invalidation operation and not in an exception being signaled. Both may also occur,
depending on the error action control setting in the appropriate cache control register. Refer to
Section 9.8.1, “Cache Error Action Control,” for details of the cache error action controls. Refer to
Section 7.6.2, “Machine Check Interrupt (IVOR1),” and to Section 2.4.7, “Machine Check Syndrome
Register (MCSR),” for a description of Machine Check conditions.
9.8.1
Cache Error Action Control
The L1CSR0[DCEA] and L1CSR1[ICEA] control fields allow the selection of several policies to apply
when errors are detected during a cache lookup. They are described in the following subsections.
9.8.1.1
L1CSR0[DCEA]/L1CSR1[ICEA] = 00, Machine Check Generation on
Error
Selection of the machine check generation on error policy allows all errors to be processed by software.
Parity or EDC errors that may result in incorrect operation cause a machine check condition. To be
recoverable, the machine check handler must not incur another parity or EDC error during the initial
portion of the machine check handler. Parity/EDC errors do not generate a machine check exception for
cache-inhibited accesses.
If machine check generation on error is enabled (L1CSR0[DCEA]/ L1CSR1[ICEA] = 00) and an
uncorrectable parity or EDC error is detected on any portion of the accessed tags for a cacheable load or
store access, a machine check is reported, regardless of whether a cache hit or miss occurs. If a cache hit
occurs and a parity or EDC error is detected on any portion of the accessed double word of data for a load
or an instruction access, a machine check is also reported. For store accesses, data parity errors are ignored.
Lock or dirty parity errors on a cacheable miss cause a machine check to be reported, indicating a lock
error and/or a tag parity error. Dirty parity errors on a cache hit for a reservation instruction (lwarx, stwcx.,
etc.) result in a machine check and indicate a tag parity error. If a miss occurs and a tag parity/EDC error
is detected on a lookup for a cacheable reservation instruction (lwarx, stwcx., etc.), it is ignored if the line
is clean. If the line is dirty or a dirty parity error occurs, a machine check is generated and the reservation
access is not run externally. Cache inhibited reservation accesses ignore all parity/EDC errors.
9.8.1.2
L1CSR0[DCEA]/L1CSR1[ICEA] = 01, Correction/Auto-invalidation on
Error
The correction/auto-invalidation on error policy attempts to cause most parity and EDC errors to be
transparently handled by correcting lines with single-bit tag errors, invalidating lines with uncorrectable
tag errors or with data errors, and then causing cache refills to reload correct data from memory, without
generation of exceptions. Exceptions are only generated when invalidations could cause or would cause a
change in correct behavior, such as changing the locked status of a line, or invalidating potentially dirty
data. Parity/EDC errors do not generate invalidations that could cause a machine check exception for
cache-inhibited accesses, however.
When using EDC protection for the cache tags (L1CSR0[DCEDT]/L1CSR1[ICEDT] = 01), single-bit tag
errors are corrected by the cache hardware during a correction/auto-invalidation cycle. Clean unlocked
lines with multi-bit errors are invalidated on cache hits, with no machine check signaled. Clean locked
lines with uncorrectable tag errors are invalidated on cache misses, and a machine check is signaled. When
operating with only parity protection for the cache tags (L1CSR0[DCEDT]/L1CSR1[ICEDT] = 00), clean
unlocked cache entries with detectable tag errors are invalidated rather than corrected by the cache
hardware during a correction/auto-invalidation cycle.
Note that since the data arrays have a higher probability of incurring an error than the tag arrays, due to
the relative storage capacities, most errors are transparently corrected, even if they are double-bit or
multi-bit errors. Using write-through mode for critical data ensures that invalidation or refills are able to
recover from errors transparently in most cases.
9.8.1.2.1
Instruction Cache Errors
If correction/auto-invalidation on error is enabled (L1CSR1[ICEA] = 01) and an error is detected on any
portion of the accessed tags or data for an access, a correction/auto-invalidation cycle is inserted,
regardless of whether a cache hit or miss occurs. During this cycle, any tag entry with a single-bit tag or
lock error is corrected if possible (correction is not possible when operating with only parity protection for
the tags), and re-written to correct the stored error. Tag entries with uncorrectable errors are invalidated if
unlocked or are invalidated if a cache miss occurs after a correction/auto-invalidation cycle regardless of
locked status. If a locked line is invalidated, a machine check occurs, no replacement occurs, and the
locked status remains set for the invalidated line(s) to assist software in determining the location of the
error(s).
Following the correction/auto-invalidation cycle, a re-lookup is performed for the access. If a cache hit
occurs on a way without a tag parity/EDC error, and a parity or EDC error is detected on any portion of
the accessed double word of data, a miss is forced, and the same line is refilled from system memory,
retaining the existing lock status. The replacement pointer for the cache is not updated in these
circumstances. If a cache hit occurs on a way without a tag parity/EDC error, parity or EDC errors on all
other lines are ignored, and no invalidations for those lines occur.
For all cases of invalidations, if any line which was locked or incurred a lock error was invalidated, a
machine check also occurs, even though auto-invalidation is selected. Invalidation is not blocked for
locked lines or lines with lock parity errors on cache misses. The lock bits remain unmodified by the
invalidation operation to allow for potential software recovery.
If a refill of a locked line due to a data parity/EDC error encounters an external bus error during the linefill,
a machine check is generated, the line is invalidated, and the lock bits remain set.
9.8.1.2.2
Data Cache Errors
If correction/auto-invalidation on error is enabled (L1CSR0[DCEA] = 01) and an error is detected on any
portion of the accessed tags, or if a lock or dirty parity error is detected, an invalidation/correction cycle
is inserted, regardless of whether a cache hit or miss occurs. Following the invalidation/correction cycle,
a re-lookup is performed for the access. During the correction/auto-invalidation cycle, any tag entry with
a tag or lock error is corrected if possible, and re-written to correct the stored error. Tag entries with
uncorrectable errors are invalidated if the line is clean and unlocked, or if the line is clean and a miss will
occur after the re-lookup, regardless of lock status. Dirty parity errors are corrected by setting all dirty bits
to ‘1’. Dirty lines and lines with a dirty parity error are not invalidated.
Following the correction/auto-invalidation cycle, a re-lookup is performed for the access. If a cache hit
occurs on a way without a tag parity/EDC error, and a parity error is detected on any portion of the accessed
double word of data for a load, if the line is clean, a miss is forced and the line is refilled from system
memory, retaining the existing lock status. The replacement pointer for the cache is not updated in these
circumstances. All other clean unlocked lines with uncorrectable tag errors will have been invalidated
during the correction/auto-invalidation cycle if one was initially needed. Tag parity/EDC errors on lines
which were not invalidated earlier due to lock or dirty status will be ignored since a cache hit occurs. For
stores, parity errors on data are ignored, and no invalidation or refill of any lines will occur on a hit to a
way without a tag parity/EDC error.
Note that since the data arrays have a higher probability of incurring an error than the tag arrays, due to
the relative storage capacities, most errors will be transparently corrected. Using write-through mode for
critical data will ensure that invalidation or refills are able to recover from errors transparently in most
cases.
If a cache hit occurs on a way without a tag parity/EDC error, and a parity error is detected on any portion
of the accessed double word of data for a load, and the line is dirty or a dirty error occurs, no refill of the
cache line will occur, the line will not be invalidated, and a machine check will also occur, even if
auto-invalidation is selected. All other clean unlocked lines with uncorrectable tag errors will also have
been invalidated during the correction/auto-invalidation cycle if one was initially needed. Tag parity/EDC
errors on lines which were not invalidated earlier due to lock or dirty status will be ignored.
If a cache hit occurs only on a line(s) with an uncorrectable tag parity/EDC error after an
invalidation/correction cycle has been performed, since the line is dirty or has a dirty parity error (it would have been
invalidated otherwise), a machine check is generated, and no linefill is performed.
If a cache miss occurs and any line with an uncorrectable tag parity/EDC error is dirty or has a dirty parity
error, the line is not invalidated, a machine check is generated, and no linefill is performed. All clean lines
with tag errors will have been invalidated/corrected on a cache miss, regardless of locked status.
For all cases of invalidations, if any line which was locked or incurred a lock error was invalidated, a
machine check will also occur, even though auto-invalidation is selected. Invalidation on a miss is not
blocked for locked lines or lines with lock parity errors unless the access is cache-inhibited or is dirty. The
lock bits will remain unmodified by the invalidation operation to allow for potential software recovery.
If a refill of a locked line due to a data parity error encounters an external bus error during the linefill, a
machine check will be generated, the line will be invalidated, and the lock bits will remain set.
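The data cache rules above for a line with an uncorrectable tag parity/EDC error can be condensed into a small decision function. This is a reading aid under stated assumptions, not a description of the hardware logic: the names are hypothetical, the access is assumed to be cacheable with L1CSR0[DCEA] = 01, and data parity errors on the hitting line are outside the model.

```c
#include <assert.h>
#include <stdbool.h>

#define LINE_DIRTY  0x1u  /* line is dirty or has a dirty parity error */
#define LINE_LOCKED 0x2u  /* line is locked or has a lock error */

typedef struct {
    bool invalidated;
    bool machine_check;
} tag_error_outcome;

/* Outcome for one line with an uncorrectable tag parity/EDC error.
 * 'hit_ok' is true when the re-lookup hits a way with no tag error. */
tag_error_outcome dcache_tag_error_outcome(unsigned line_flags, bool hit_ok)
{
    tag_error_outcome out = { false, false };
    assert((line_flags & ~(LINE_DIRTY | LINE_LOCKED)) == 0u);

    if (line_flags & LINE_DIRTY) {
        /* Never invalidated; ignored on a good hit, else machine check. */
        out.machine_check = !hit_ok;
    } else if (!(line_flags & LINE_LOCKED)) {
        out.invalidated = true;      /* clean and unlocked */
    } else if (!hit_ok) {
        out.invalidated = true;      /* clean and locked, on a miss */
        out.machine_check = true;    /* a locked line was invalidated */
    }
    /* Clean, locked, good hit elsewhere: left alone, error ignored. */
    return out;
}
```

The only case that loses a locked line is a clean locked line on a miss, and that case is reported through a machine check so software can recover.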
9.8.1.2.3
Data cache line flush or invalidation due to reservation instructions
(l[b,h,w]arx, st[b,h,w]cx.)
Normally, when executing a load and reserve, or a store conditional instruction, a cache line hit results in
the line being pushed (if dirty) and marked clean, and the reservation access performed as a single-beat
access. Certain parity or EDC errors may cause other actions however.
If a cache hit to a line with no tag parity/EDC error occurs when performing a lookup for a load or store
reservation access, the line will be pushed if dirty, or if a dirty parity error occurs, and will be marked as
clean. Locked status will not be changed. A push parity error may occur during the push if a data parity
error is encountered, and a machine check will be generated. In this case the reservation access will not be
performed. Otherwise, a load reservation access is then performed as a single-beat access, ignoring the
cache data. A store reservation access is performed as a write-through single-beat write access on the bus,
regardless of whether it is marked as write-through required. If the write access completes without error
and succeeds (no ERROR or XFAIL response from the bus), then the cache is updated with the store data,
but the line is left in a clean state. Uncorrectable tag errors on other clean unlocked lines will cause
invalidation of those lines without signaling a machine check. Uncorrectable tag errors on other cache lines
which are locked or are dirty will be ignored.
Otherwise, if any line has an uncorrectable tag parity/EDC error and is dirty or has a dirty parity error, a
machine check is generated, and the line(s) remains unchanged. Clean unlocked lines with tag parity/EDC
errors will be invalidated or corrected, but locked lines or lines with a lock error will not be invalidated on
a cache miss, since no new cache line will be allocated.
9.8.2
Parity/EDC Error Handling for Cache Control Operations and
Instructions
Parity/EDC errors are not signaled when the respective L1CSR0[DCECE] and L1CSR1[ICECE] cache
error checking enable bits are cleared. The following sections describe error handling for cache control
operations and cache control instructions when error checking is enabled.
9.8.2.1
L1FINV0/L1FINV1 Operations
For invalidation operations via the L1FINV0/L1FINV1 control registers, uncorrectable tag parity or EDC
errors result in the specified line being invalidated. No error is reported, regardless of the setting of the
DCEA/ICEA bit. Data parity or EDC errors and dirty errors are ignored. Parity or EDC errors on all other
ways not specified by the CWAY value for L1FINV0/L1FINV1 are ignored, regardless of the settings of
the DCEA/ICEA bit.
For flush and flush with invalidate operations via the L1FINV0 control register, if no uncorrectable tag
parity/EDC error occurs on the specified line, it is flushed to memory if dirty or if a dirty parity error occurs
and then invalidated for flush with invalidate operations. No machine check is signaled for dirty parity
errors. If an uncorrectable tag parity/EDC error occurs on the specified line, and the line is dirty or a dirty
error is encountered, no flush or invalidation is performed. The line remains unchanged, and a machine
check is generated. For flush operations, an uncorrectable tag parity or EDC error on a clean line is
ignored, and no error is reported. For flush with invalidate operations, an uncorrectable tag parity or EDC
error on a clean line results in the specified line being invalidated, and no error is reported. Lock status is
ignored for these operations.
Data parity errors may result in a push parity error and a machine check being generated. However, the
line is still flushed to memory if not prevented due to an uncorrectable tag parity/EDC error. If a push
parity error occurs, the line is left unaffected for flush with invalidate operations. Lock status is cleared on
an invalidation or flush with invalidation that does not result in a machine check.
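The tag-error rules above for L1FINV0/L1FINV1 operations can be summarized as a small decision function. The following is a minimal sketch for reasoning about the cases, not a description of the hardware implementation. The type and function names (finv_op, finv_result, l1finv_outcome) are hypothetical, and data (push) parity errors and lock status are deliberately left out of the model.

```c
#include <stdbool.h>

/* Operation selected through the L1FINV0/L1FINV1 control register. */
typedef enum { OP_INVALIDATE, OP_FLUSH, OP_FLUSH_INVALIDATE } finv_op;

/* Outcome of the operation on the addressed line (hypothetical model type). */
typedef struct {
    bool flushed;        /* line written back to memory */
    bool invalidated;    /* line marked invalid */
    bool machine_check;  /* machine check generated */
} finv_result;

/*
 * tag_err: uncorrectable tag parity/EDC error on the addressed line.
 * dirty:   the line is dirty or a dirty parity error is present.
 * ASSUMPTION: data (push) parity errors are not modeled here.
 */
finv_result l1finv_outcome(finv_op op, bool tag_err, bool dirty)
{
    finv_result r = { false, false, false };

    if (op == OP_INVALIDATE) {
        r.invalidated = true;       /* tag errors never block an invalidate */
        return r;
    }
    if (tag_err && dirty) {
        r.machine_check = true;     /* no flush, no invalidate, line unchanged */
        return r;
    }
    if (!tag_err && dirty)
        r.flushed = true;           /* dirty parity errors do not signal an error */
    if (op == OP_FLUSH_INVALIDATE)
        r.invalidated = true;       /* also covers a clean line with a tag error */
    return r;
}
```

For example, a flush of a dirty line with an uncorrectable tag error yields only a machine check, while a flush with invalidate of a clean line with the same error yields only an invalidation.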
9.8.2.2 Cache Touch Instructions (dcbt, dcbtst, icbt)
Parity errors are not signaled on a lookup for a dcbt, dcbtst, or icbt instruction. For those instructions, an
uncorrectable tag parity or EDC error results in a No-op and no error is reported, regardless of error
checking being enabled. No invalidations occur.
9.8.2.3 icbi Instructions
For icbi instructions, on a hit to any locked or unlocked line without an uncorrectable tag parity/EDC error
(with or without a lock parity error), or on a hit to an unlocked line with an uncorrectable tag parity/EDC
error, the line(s) is invalidated, regardless of the setting of L1CSR1[ICEA]. No machine check is
generated. If L1CSR1[ICEA] = 01 and any line has a tag parity/EDC error, a correction/invalidation cycle
is inserted to correct tags with single-bit errors and to invalidate unlocked lines with multi-bit errors.
Locked lines with uncorrectable tag errors which miss are unaffected. No machine check is generated.
If a hit occurs to a line with a tag parity/EDC error (after a correction for L1CSR1[ICEA] = 01) that is
locked or has a lock parity error, the line is left unaffected. No machine check is generated, regardless of
the setting of L1CSR1[ICEA].
If a miss occurs, all parity/EDC errors are ignored and the lines are left unaffected. No machine check is
generated, regardless of the setting of L1CSR1[ICEA].
All data parity or EDC errors are ignored regardless of L1CSR1[ICEA].
9.8.2.4 dcbi Instructions
For dcbi instructions, on a hit to a line without a tag parity/EDC error, the line is invalidated, regardless
of the setting of L1CSR0[DCEA]. For this case, data, lock, and dirty parity errors are ignored. When
L1CSR0[DCEA] = 00, tag parity/EDC errors on other lines are ignored. When L1CSR0[DCEA] = 01,
uncorrectable tag parity/EDC errors on other lines also cause clean unlocked lines to be invalidated,
regardless of hit or miss. No machine check is generated regardless of the setting of L1CSR0[DCEA].
For dcbi instructions that hit to a line with a tag parity/EDC error, the line(s) is invalidated if clean and
unlocked and no machine check is generated, regardless of the setting of L1CSR0[DCEA]. Uncorrectable
tag parity/EDC errors will cause other clean unlocked lines to be invalidated when L1CSR0[DCEA] = 01,
regardless of hit or miss. If a hit occurs to a line with an uncorrectable tag parity/EDC error and the line is
dirty, or is locked or has a lock parity error, the line is left unaffected, and no machine check is generated,
regardless of the setting of L1CSR0[DCEA].
For dcbi instructions that miss in all ways, when L1CSR0[DCEA] = 00, no invalidation is performed
regardless of tag parity/EDC errors and no machine check is signaled. Uncorrectable tag parity/EDC errors
cause clean unlocked lines to be invalidated when L1CSR0[DCEA] = 01, and no machine check is
signaled. All other lines are left unchanged.
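The dcbi rules above reduce to a per-line predicate, since no case generates a machine check. The sketch below is one way to express it. The dline type and dcbi_invalidates function are hypothetical names, tag_err stands for an uncorrectable tag parity/EDC error, and locked is assumed to cover both a set lock bit and a lock parity error.

```c
#include <stdbool.h>

/* Per-line state relevant to dcbi (hypothetical model type). */
typedef struct {
    bool hit;      /* the dcbi address matches this line */
    bool tag_err;  /* uncorrectable tag parity/EDC error */
    bool dirty;    /* line is dirty */
    bool locked;   /* line is locked or has a lock parity error */
} dline;

/*
 * Returns true if dcbi invalidates the line.
 * dcea_auto models L1CSR0[DCEA] = 01 (auto-invalidation enabled).
 * No case in this model generates a machine check.
 */
bool dcbi_invalidates(dline l, bool dcea_auto)
{
    if (!l.tag_err)
        return l.hit;            /* plain hit: invalidate; data, lock, dirty errors ignored */
    if (l.dirty || l.locked)
        return false;            /* tag error on a dirty or locked line: left unaffected */
    return l.hit || dcea_auto;   /* clean, unlocked line with a tag error */
}
```

With DCEA = 00, a clean unlocked line with a tag error is only invalidated when it is the line that hits. With DCEA = 01 it is invalidated regardless of hit or miss.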
9.8.2.5 dcbst Instructions
For dcbst instructions, on a hit to any line without a tag parity or EDC error, if the line is dirty, or has a
dirty bit error, the line is flushed. Lock errors are ignored. When L1CSR0[DCEA] = 00, tag parity/EDC
errors on other lines are ignored. When L1CSR0[DCEA] = 01, uncorrectable tag parity/EDC errors on
other lines also cause clean unlocked lines to be invalidated, regardless of hit or miss. No machine check
is generated regardless of the setting of L1CSR0[DCEA]. For dcbst, lock and dirty errors are ignored on
a hit. Data parity errors will not prevent the line from being flushed, but will cause a machine check to be
generated due to a push parity error.
For cacheable dcbst instructions that hit only to a line with a tag parity/EDC error or that miss in all ways,
a machine check will be generated if L1CSR0[DCEA] = 00 and any line with a tag parity or EDC error is
dirty. Lock errors are ignored. If L1CSR0[DCEA] = 01, clean unlocked lines with an uncorrectable tag
parity/EDC error are invalidated, and no errors are signaled unless any line with an uncorrectable tag
parity/EDC error is also dirty or has a dirty parity error. If any line with an uncorrectable tag parity/EDC
error is dirty or has a dirty parity error, the line is not flushed and a machine check is generated, regardless
of the settings of L1CSR0[DCEA].
9.8.2.6 dcbf Instructions
For dcbf instructions, on a hit to any line without a tag parity or EDC error, if the line is dirty or has a dirty
bit error, the line is flushed and invalidated. Lock errors are ignored. When L1CSR0[DCEA] = 00, tag
parity/EDC errors on other lines are ignored. When L1CSR0[DCEA] = 01, uncorrectable tag parity/EDC
errors on other lines also cause clean unlocked lines to be invalidated, regardless of hit or miss. No
machine check is generated regardless of the setting of L1CSR0[DCEA]. For dcbf, data parity errors do
not prevent the line from being flushed, but cause a machine check to be generated due to a push parity
error.
For cacheable dcbf instructions that hit only to a line with a tag parity/EDC error or that miss in all ways,
a machine check is generated if L1CSR0[DCEA] = 00 and any line with a tag parity or EDC error is dirty,
locked, or has a dirty parity error or a lock parity error. If L1CSR0[DCEA] = 01, clean unlocked lines with
an uncorrectable tag parity/EDC error are invalidated, and no errors are signaled unless any line with an
uncorrectable tag parity/EDC error is also dirty, locked, or has a dirty parity error or a lock parity error. If
any line with an uncorrectable tag parity/EDC error is dirty or has a dirty parity error, the line is not flushed
and a machine check is generated. If any line with an uncorrectable tag parity/EDC error is locked or has
a lock parity error, the line is not invalidated, and a machine check is generated.
9.8.2.7 dcbz Instructions
For dcbz instructions, on a hit to any line without a tag parity/EDC error, the line is zeroed and set to dirty.
Data errors, lock errors, and dirty errors are ignored. When L1CSR0[DCEA] = 00, tag parity/EDC errors
on other lines are ignored. When L1CSR0[DCEA] = 01, uncorrectable tag parity/EDC errors on other lines
also cause clean unlocked lines to be invalidated, regardless of hit or miss. No machine check is generated
regardless of the setting of L1CSR0[DCEA]. For dcbz, lock errors are ignored on a hit.
For cacheable dcbz instructions that hit only to a line with a tag parity/EDC error or that miss in all ways,
a machine check is generated if L1CSR0[DCEA] = 00 and any line has a tag parity/EDC or lock error. If
L1CSR0[DCEA] = 01, all line(s) with an uncorrectable tag parity/EDC error are invalidated if clean. If a
clean line which was locked or had a lock parity error was invalidated, a machine check is generated. If
any line with an uncorrectable tag parity/EDC error is dirty or has a dirty parity error, the line is not
affected, and a machine check is generated, regardless of the settings of L1CSR0[DCEA]. If a machine
check is generated, no dcbz operation will be performed.
9.8.2.8 Cache Locking Instructions (dcbtls, dcbtstls, dcblc, icbtls, icblc)
For dcbtls, dcbtstls, dcblc, icbtls, and icblc instructions, on a hit to any line without a tag parity or EDC
error, the lock bits are set or cleared appropriately, and data, lock, and dirty bit parity or EDC errors are
ignored. When L1CSR0[DCEA]/L1CSR1[ICEA] = 00, tag parity/EDC or lock errors on other lines are
ignored. When L1CSR0[DCEA]/L1CSR1[ICEA] = 01, uncorrectable tag parity/EDC errors on other lines
also cause clean unlocked lines to be invalidated, regardless of hit or miss. No machine check is generated
regardless of the setting of L1CSR0[DCEA]/L1CSR1[ICEA].
For cacheable dcbtls, dcbtstls, and icbtls instructions that hit only to a line with a tag parity/EDC error or
which miss in all ways, a machine check is generated if L1CSR0[DCEA]/L1CSR1[ICEA] = 00 and any
line has a tag parity/EDC error or a lock error. If L1CSR0[DCEA]/L1CSR1[ICEA] = 01, clean lines with
an uncorrectable tag parity/EDC error are invalidated and if a clean line which was locked or had a lock
parity error was invalidated, a machine check is generated. If any line with an uncorrectable tag
parity/EDC error is dirty or has a dirty parity error, the line is not affected and a machine check is
generated, regardless of the settings of L1CSR0[DCEA]/L1CSR1[ICEA].
For cacheable dcblc and icblc instructions that hit only to a line with a tag parity/EDC error or that miss
in all ways, a machine check is generated if L1CSR0[DCEA]/L1CSR1[ICEA] = 00 and any line with a tag
parity/EDC error is locked or has a lock parity error. If L1CSR0[DCEA]/L1CSR1[ICEA] = 01, lock and
dirty parity errors do not cause a machine check on their own, but clean lines with an uncorrectable tag
parity/EDC error are invalidated. If a clean line that was locked or had a lock parity error was invalidated,
a machine check is generated. If any locked line with an uncorrectable tag parity/EDC error is dirty or has
a dirty parity error, the line is not affected and a machine check is generated, regardless of the settings of
L1CSR0[DCEA]/L1CSR1[ICEA].
9.8.3 Cache Inhibited Accesses and Parity/EDC Errors
For non-cacheable access misses, no cache parity/EDC exceptions are signaled. When operating with
correction/auto-invalidation disabled, tag parity/EDC errors cause misses for cache-inhibited accesses,
and no machine check is generated. When correction/auto-invalidation mode is enabled, a
correction/auto-invalidation cycle is run to correct/auto-invalidate tag, dirty, and lock errors, but
invalidations are only performed for uncorrectable tag errors on clean unlocked lines. If a cache-inhibited
load or instruction fetch access hit occurs to a line with no tag parity/EDC error, and the requested double
word of data has no parity/EDC error, the access is treated as a cache hit and the CI status is ignored.
Otherwise, if the requested double word of data has a parity/EDC error, the access is treated as a
cache-inhibited cache miss and the cache data is ignored, even if dirty. No machine check is generated in
this case. A cache-inhibited store hit to a line with no tag parity/EDC error causes the data to be written to
the cache, as well as to memory if the store is a write-through store, and all data parity errors are ignored.
If a cache hit occurs to a line with an uncorrectable tag error, the hit is ignored, the access is performed as
a cache-inhibited cache miss, and the cache data is ignored, even if dirty. No machine check is generated
in this case.
For cache control instructions such as dcbf, dcbi, icbi, and dcbst, which are performed to addresses
marked as cache-inhibited, no machine checks are generated. The operations are only performed on/for
lines which would not cause exceptions for the non-CI cases.
9.8.4 Snoop Operations and Parity/EDC Errors
For snoop command lookups in which a hit occurs to a cache line with no tag parity/EDC error, tag
parity/EDC errors in other lines are ignored, and no error condition is signaled.
Otherwise, for snoop command lookups in which a tag parity/EDC error occurs and no hit occurs to a tag
entry without a parity/EDC error, no correction attempt for the tags with errors is made regardless of
L1CSR0[DCEA]. The snoop response indicates an error condition. When such a tag parity/EDC error
occurs on a snoop invalidate command, the invalidation does not occur, and the error results in a machine
check. The snoop queue continues to be serviced, and the machine check will not necessarily be
recoverable. A checkstop condition does not occur, however. In this respect, it is treated similarly to a
non-maskable interrupt, and MSR[RI] should be used accordingly by software.
9.8.5
EDC Checkbit/Syndrome Coding Scheme Generation—Icache
When operating with EDC enabled (L1CSR1[ICEDT] = 01), double bit error detection codes are used to
protect the tag and data portions of an instruction cache line. Each tag entry utilizes six check bits to cover
the tag + valid bit, and each double word of data in the data arrays utilizes eight check bits. These same
bits are used for parity coding when the L1CSR1[ICEDT] control field selects parity mode. The specific
coding schemes are shown in Table 9-7 and Table 9-8. The lock bits utilize bit-level redundancy, thus are
independently protected.
Table 9-7 shows the checkbit coding for each tag entry. A ‘*’ in the table indicates the bit is XORed to
form the final checkbit value.
Table 9-7. Tag Checkbit Generation
[Table body not recoverable from the source layout. Rows are the checkbits p_tchk[0:5]; columns are tag bits 0–21 and the valid bit (V); a ‘*’ marks each bit that is XORed into the given checkbit.]
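The XOR construction described for Table 9-7 can be illustrated with a mask per checkbit, where each mask has a 1 wherever the table has a ‘*’. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the masks must be supplied by the caller from the actual table (none are given here), and the mapping of checkbit i to result bit i is an arbitrary choice for illustration. The syndrome computation (stored checkbits XOR recomputed checkbits) is the conventional approach for such codes and is assumed rather than taken from this manual.

```c
#include <stdint.h>

/* XOR together the bits of v selected by mask; returns 0 or 1. */
static unsigned xor_reduce(uint32_t v, uint32_t mask)
{
    v &= mask;
    v ^= v >> 16;
    v ^= v >> 8;
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

/*
 * Compute six checkbits over the tag + valid field.
 * masks[i] selects the bits XORed into checkbit i (one table row).
 * ASSUMPTION: checkbit i is returned in bit i of the result.
 */
unsigned tag_checkbits(uint32_t tag_and_valid, const uint32_t masks[6])
{
    unsigned chk = 0;
    for (int i = 0; i < 6; i++)
        chk |= xor_reduce(tag_and_valid, masks[i]) << i;
    return chk;
}

/* Syndrome: zero when stored and recomputed checkbits agree. */
unsigned tag_syndrome(uint32_t tag_and_valid, unsigned stored,
                      const uint32_t masks[6])
{
    return stored ^ tag_checkbits(tag_and_valid, masks);
}
```

A nonzero syndrome indicates that the tag, the valid bit, or the checkbits themselves have changed since the checkbits were generated.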
Table 9-8 shows the checkbit coding for each double word data entry. A ‘*’ in the table indicates the bit is
XORed to form the final checkbit value.
Table 9-8. Data Checkbit Generation
[Table body not recoverable from the source layout. Rows are the checkbits p_dchk[0:7]; columns are data bits 0–63; a ‘*’ marks each bit that is XORed into the given checkbit.]
9.8.6
EDC Checkbit/Syndrome Coding Scheme Generation—Dcache
When operating with EDC enabled (L1CSR0[DCEDT] = 01), double bit error detection codes are used to
protect the tag portion of a data cache line. The data array continues to utilize single-bit parity protection.
Each data cache tag entry utilizes six check bits to cover the tag + valid bit. Two of these same bits are
used for parity coding when the L1CSR0[DCEDT] control field selects parity mode. The specific coding
scheme for the tag array is the same as is used for the Icache, and is shown in Table 9-7. The dirty and lock
bits utilize bit-level redundancy, thus are independently protected. Three dirty bits are provided to support
single-bit and double-bit error detection. Correction is performed by setting the dirty bits to 1 if a dirty
parity error occurs and correction/auto-invalidation is enabled. Four lock bits are provided to support
single-bit error correction and double-bit error detection.
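The bit-level redundancy described above can be sketched as follows. This is an illustrative model only. The function names are hypothetical, and the majority-vote decoding of the four lock copies is an assumption that is consistent with single-bit correction and double-bit detection, not a statement of how the hardware decodes them.

```c
#include <stdbool.h>

/* Count the set bits among the low four bits of v. */
static int popcount4(unsigned v)
{
    return (int)((v & 1u) + ((v >> 1) & 1u) + ((v >> 2) & 1u) + ((v >> 3) & 1u));
}

/* Dirty: three copies; any disagreement is a dirty parity error. */
bool dirty_error(unsigned copies3)
{
    copies3 &= 0x7u;
    return copies3 != 0u && copies3 != 0x7u;
}

/* Correction as described in the text: force all dirty copies to 1 on an error. */
unsigned dirty_correct(unsigned copies3)
{
    return dirty_error(copies3) ? 0x7u : (copies3 & 0x7u);
}

typedef enum { LOCKBITS_OK, LOCKBITS_CORRECTED, LOCKBITS_UNCORRECTABLE } lockbits_status;

/*
 * Lock: four copies, decoded by majority vote (ASSUMED decoding).
 * A 2-2 split is a double-bit error; *locked is left untouched in that case.
 */
lockbits_status lock_decode(unsigned copies4, bool *locked)
{
    int ones = popcount4(copies4);
    if (ones == 2)
        return LOCKBITS_UNCORRECTABLE;
    *locked = ones > 2;
    return (ones == 0 || ones == 4) ? LOCKBITS_OK : LOCKBITS_CORRECTED;
}
```

Forcing the dirty state to 1 on an error is the conservative choice: a line that may hold modified data is treated as modified, so it is written back rather than silently dropped.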
9.8.7
Cache Error Injection
Cache error injection provides a way to test error recovery by intentionally injecting parity errors into the
instruction and/or data cache.
Error injection into the instruction cache operates as follows:
• If L1CSR1[ICEI] is set and L1CSR1[ICEDT] = 00, any instruction cache line fill to the
instruction cache data has all of the associated parity bits inverted in the instruction cache data
array for each double word loaded.
• If L1CSR1[ICEI] is set and L1CSR1[ICEDT] = 01, any instruction cache line fill to the instruction
cache data has the associated two most significant parity check bits inverted in the instruction
cache data array for each double word loaded.
Error injection for the data cache operates as follows:
• If L1CSR0[DCEI] is set, any cache line fill to the data cache data array has all of the associated
parity bits inverted in the data array for each double word loaded. Additionally, inverted parity bits
are generated for any bytes stored into the data cache data array on a store hit.
Cache parity error injection is not performed for cache debug write accesses, since parity bit values written
can be directly controlled (see Section 9.19.3, “Cache Debug Access Control Register (CDACNTL)”).
In order to clear the parity errors, a cache invalidation or an invalidation of the lines which could have had
an injected parity error may be performed. Line invalidation may be performed by an icbi/dcbi instruction,
or an L1FINV[0,1] invalidation operation.
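The effect of error injection on the stored check bits can be modeled in a few lines. The sketch below is illustrative only and rests on several assumptions that are not confirmed by this section: parity mode is modeled as one even-parity bit per byte of the double word, and the "two most significant" check bits are taken to be bits 7 and 6 of an 8-bit value. All function names are hypothetical.

```c
#include <stdint.h>

/*
 * One parity bit per byte of a double word.
 * ASSUMPTION: even parity, with result bit i covering byte i.
 */
uint8_t byte_parity(uint64_t dword)
{
    uint8_t p = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t b = (uint8_t)(dword >> (8 * i));
        b = (uint8_t)(b ^ (b >> 4));
        b = (uint8_t)(b ^ (b >> 2));
        b = (uint8_t)(b ^ (b >> 1));
        p = (uint8_t)(p | ((b & 1u) << i));
    }
    return p;
}

/* Parity mode with injection enabled: every parity bit is stored inverted. */
uint8_t inject_parity(uint8_t parity)
{
    return (uint8_t)~parity;
}

/* EDC mode (instruction cache): invert the two most significant check bits. */
uint8_t inject_edc(uint8_t checkbits)
{
    return (uint8_t)(checkbits ^ 0xC0u);
}

/* Bitmask of bytes whose stored parity disagrees with the data. */
uint8_t parity_mismatch(uint64_t dword, uint8_t stored)
{
    return (uint8_t)(byte_parity(dword) ^ stored);
}
```

Under this model, a double word loaded with injection enabled in parity mode reports a mismatch on every byte when it is next read, which is what makes injection useful for exercising a recovery path.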
9.8.8
Cache Error Cross-Signaling
Cache error cross-signaling provides a way to support multiple cores running in lock-step when one of the
CPUs encounters a parity/EDC error in the instruction and/or data cache. Refer to Section 11.2.13, “Cache
Error Cross-signaling Signals,” and Section 11.3.4, “Cache Error Cross-signaling Operation,” for more
details of operation.
9.9
Push and Store Buffers
The push buffer reduces latency for requested new data on a data cache miss by temporarily holding
displaced dirty data while the new data is fetched from memory. The push buffer contains 32 bytes of
storage (one displaced cache line).
If a data cache miss displaces a dirty line, the linefill request is forwarded to the external bus. While
waiting for the response, the current contents of the dirty cache line are placed into the push buffer. Once
the linefill transaction (burst read) completes, the cache controller can generate the appropriate burst write
bus transaction to write the contents of the push buffer into memory.
The store buffer contains a FIFO that can defer pending write misses or writes marked as write-through in
order to maximize performance. The store buffer can buffer as many as eight words (32 bytes) for this
purpose. The store buffer may be disabled for debug purposes. Operation of the store buffer is independent
of L1CSR0[DCE]. When the store buffer is enabled, non-allocating store operations which miss the cache
or which are marked as writethrough are placed in the store buffer, and the CPU access is terminated. Each
store buffer entry contains 32 bits of physical address, 32 bits of data, size information, and 3 bits of access
attribute information (W, G, and S/U) in order to properly drive the attribute output signals on a buffered
store access. Cache-inhibited guarded stores are not buffered, however, and are delayed from being
performed until the push and store buffers have been emptied.
Once the push or store buffer has valid data, the internal bus controller uses the next available external bus
cycle to generate the appropriate write cycles. In the event that another data cache fill is required (e.g.,
cache load or store w/allocate miss to process) during the continued instruction execution by the processor
pipeline, an alias check is performed between the linefill address and all valid entries in the store and push
buffer using the index portion of the access address. If no match is found, the linefill may bypass pending
stores in the store or push buffer. Otherwise, if an alias exists (index matches any valid store buffer entry),
the data cache pipeline will stall until the aliased entries have been flushed from the store and push buffer
before generating the required external bus transaction for the linefill.
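The alias check described above compares only the index portion of the addresses. The following sketch models that comparison in software. The 32-byte line size and the eight store buffer entries come from the text; the number of sets (128, corresponding to a 16 KB, 4-way cache) is an assumption, and the sb_entry type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES 32u   /* from the text: one cache line is 32 bytes */
#define NUM_SETS   128u  /* ASSUMPTION: 16 KB, 4-way set-associative cache */
#define SB_ENTRIES 8u    /* from the text: up to eight buffered words */

/* One store buffer entry (attribute and size fields omitted). */
typedef struct {
    bool     valid;
    uint32_t addr;   /* physical address of the buffered store */
    uint32_t data;
} sb_entry;

/* Index portion of an address: selects the cache set. */
static uint32_t set_index(uint32_t addr)
{
    return (addr / LINE_BYTES) % NUM_SETS;
}

/*
 * Returns true if a linefill to fill_addr aliases a valid push or store
 * buffer entry, in which case the linefill waits for the buffers to drain.
 */
bool linefill_aliases(uint32_t fill_addr, const sb_entry sb[SB_ENTRIES],
                      bool push_valid, uint32_t push_addr)
{
    uint32_t idx = set_index(fill_addr);

    if (push_valid && set_index(push_addr) == idx)
        return true;
    for (unsigned i = 0; i < SB_ENTRIES; i++) {
        if (sb[i].valid && set_index(sb[i].addr) == idx)
            return true;
    }
    return false;
}
```

Because only the index is compared, two unrelated addresses that map to the same set are treated as aliases. The check is conservative: it may stall a linefill that does not actually conflict, but it never lets a conflicting one bypass.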
Single-beat read transactions will not bypass pending stores in the push or store buffer.
The push buffer is always emptied prior to queued store buffer entries to avoid memory consistency issues.
Once the push buffer has been loaded with dirty data to be written back to memory, a subsequent store may
be buffered, but will not be written to memory until the push has completed.
For cache-inhibited load accesses or cache-inhibited guarded store accesses, the processor termination is
withheld until the store buffer has been flushed of all entries, the push buffer has been emptied, and the
access has completed to memory.
A write to the L1CSR0 register may be used to force the push and store buffers to empty before proceeding
with the actual L1CSR0 update. Additionally, the msync and mbar instructions also cause these buffers
to be emptied prior to completion.
If an external transfer ERROR response occurs while emptying the store buffer, a machine check exception
is signaled to the CPU, and a store for the next entry to be written (if any) is initiated. If a transfer error
occurs for a push buffer transaction, the push of the remaining portion of the cache line is aborted, and a
machine check exception is signaled to the CPU. This is also the case for a cache control operation that
causes a line to be pushed. Following the transfer error, the line is marked invalid. If it is possible for a
transfer error to be returned by the system on a push or a buffered store, and this could cause a problem,
the address must be marked guarded and cache inhibited.
External termination errors that occur on any push of a dirty cache line result in a machine check
condition.
9.10
Cache Management Instructions
This section describes the implementation of cache management instructions in the e200z7.
9.10.1
Instruction Cache Block Invalidate (icbi) Instruction
If the cache line containing the byte addressed by the EA associated with this instruction is present in the
instruction cache, it is invalidated, regardless of lock status. If an instruction cache linefill is in progress
and the linefill data corresponds to the EA associated with an icbi, the instruction cache is not updated with
linefill data.
See the EREF for the full description of icbi.
9.10.2
Instruction Cache Block Touch (icbt) Instruction
If HID0[NOPTI] is set, this instruction is treated as a no-op. See the EREF for the full description of icbt.
9.10.3
Data Cache Block Allocate (dcba) Instruction
This instruction is treated as a no-op. See the EREF for the full description of dcba.
9.10.4
Data Cache Block Flush (dcbf) Instruction
If the cache line containing the byte addressed by the EA associated with this instruction is present in the
data cache, it is copied back to memory if dirty. The line is subsequently invalidated regardless of whether
it was copied back or locked. If a data cache linefill is in progress and the linefill data corresponds to the
EA associated with a dcbf, the data cache is not updated with linefill data.
This instruction is treated in the following way:
• As a load for the purposes of access protection
• As a no-op if the data cache is disabled
See the EREF for the full description of dcbf.
9.10.5
Data Cache Block Invalidate (dcbi) Instruction
If the cache line containing the byte addressed by the EA associated with this instruction is present in the
data cache, it is invalidated, regardless of lock status. No copyback occurs if the line is present in the data
cache and dirty. If a data cache linefill is in progress and the linefill data corresponds to the EA associated
with a dcbi, the data cache is not updated with linefill data.
This instruction is privileged. It is treated in the following way:
• As a store for the purposes of access protection
• As a no-op in supervisor mode if the data cache is disabled
See the EREF for the full description of dcbi.
9.10.6
Data Cache Block Store (dcbst) Instruction
If the cache line containing the byte addressed by the EA associated with this instruction is present in the
data cache, it is copied back to memory if dirty. The line is subsequently marked clean, and the lock status
is unchanged. The following conditions apply:
• This instruction is treated as a load for the purpose of access protection.
• If the data cache is disabled, this instruction is treated as a no-op.
See the EREF for the full description of dcbst.
9.10.7
Data Cache Block Touch (dcbt) Instruction
If HID0[NOPTI] is set, this instruction is treated as a no-op. See the EREF for the full description of dcbt.
9.10.8
Data Cache Block Touch for Store (dcbtst) Instruction
If HID0[NOPTI] is set, this instruction is treated as a no-op. See the EREF for the full description of
dcbtst.
9.10.9
Data Cache Block set to Zero (dcbz) Instruction
If the cache line containing the byte addressed by the EA associated with this instruction is present in the
data cache, all bytes in the line are zeroed, the line is marked as modified and remains valid. Lock status
remains unchanged. If the cache line is not present and the address is cacheable, it is established in the data
cache (without fetching from memory), all bytes in the line are zeroed, and the line is marked as modified
and valid.
This instruction is treated as a store for the purposes of access protection.
dcbz causes an alignment exception if the EA is marked by the MMU as cache-inhibited and a data cache
miss occurs, if the EA is marked by the MMU as write through required, if the data cache is disabled or is
operating in writethrough mode, or if an overlocking condition prevents the allocation of a line into the
data cache.
See the EREF for the full description of dcbz.
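The conditions under which dcbz takes an alignment exception can be collected into a single predicate. The sketch below is a minimal model of the list given above; the dcbz_ctx type, its field names, and the function name are hypothetical.

```c
#include <stdbool.h>

/* Inputs that decide whether dcbz raises an alignment exception (model type). */
typedef struct {
    bool cache_inhibited;  /* MMU marks the EA as cache-inhibited */
    bool hit;              /* the line is present in the data cache */
    bool write_through;    /* MMU marks the EA as write-through required */
    bool cache_enabled;    /* data cache is enabled */
    bool cache_wt_mode;    /* data cache is operating in writethrough mode */
    bool overlocked;       /* all ways available for allocation are locked */
} dcbz_ctx;

bool dcbz_alignment_exception(const dcbz_ctx *c)
{
    if (c->cache_inhibited && !c->hit)
        return true;               /* cache-inhibited and a miss occurs */
    if (c->write_through)
        return true;               /* write-through required */
    if (!c->cache_enabled || c->cache_wt_mode)
        return true;               /* cache disabled or in writethrough mode */
    if (!c->hit && c->overlocked)
        return true;               /* a line cannot be allocated */
    return false;
}
```

Software that uses dcbz to clear buffers can apply the same checks up front and fall back to ordinary stores for regions where the instruction would trap.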
9.11
Touch Instructions
Due to the limitations of using the icbt, dcbt, and dcbtst instructions, a program that uses these
instructions improperly may actually see a degradation in performance from their use. To avoid this, the
e200z7 provides the HID0[NOPTI] control bit to cause these instructions to be treated as no-ops.
9.12
Cache Line Locking/Unlocking Unit
This section has the following structure:
• Section 9.12.1, “Overview,” provides an overview of the Freescale EIS cache line
locking/unlocking unit.
• Section 9.12.2, “Instruction Details,” describes the instructions shown in Table 9-9.
Table 9-9. Cache Line Locking/Unlocking Unit Instructions
Acronym    Definition                                       Cross-Reference
dcbtls     Data cache block touch and lock set              9-35
dcbtstls   Data cache block touch for store and lock set    9-37
dcblc      Data cache block lock clear                      9-38
icbtls     Instruction cache block touch and lock set       9-39
icblc      Instruction cache block lock clear               9-41
• Section 9.12.3, “Effects of Other Cache Instructions on Locked Lines,” identifies which
instructions have no effect on the state of the lock bit and which instructions flush/invalidate and
unlock a cache line.
• Section 9.12.4, “Flash Clearing of Lock Bits,” explains how the e200z7 supports flash clearing of
lock bits.
9.12.1
Overview
The e200z7 supports the Freescale EIS cache line locking unit which defines user-mode instructions to
perform cache locking/unlocking. Three of the instructions are for data cache locking control (dcblc,
dcbtls, dcbtstls) and two instructions are for instruction cache locking control (icblc, icbtls).
The dcbtls, dcbtstls, and dcblc lock instructions are treated as reads for checking access permissions when
translated by the TLB, and exceptions are taken for data TLB errors or data storage interrupts. The icbtls
and icblc instructions require either execute (X) or read (R) permission when translated by the TLB.
Exceptions are taken using data TLB errors (DTLB) or data storage interrupts (DSI), not ITLB or ISI.
The user-mode cache lock enable MSR[UCLE] may be used to restrict user-mode cache line locking. If
MSR[UCLE] is clear, any cache lock instruction executed in user-mode takes a cache-locking DSI
exception (unless no-oped) and sets either ESR[DLK] or ESR[ILK]. If MSR[UCLE] is set, cache-locking
instructions can be executed in user-mode and they will not take a DSI for cache-locking. However, they
may still cause a DSI for access violations or cause machine checks for external termination errors.
The following list identifies cases where attempting to set a lock fails, even when no DSI or DTLB
exceptions occur.
• The target address is marked cache-inhibited and a cache miss occurs.
• The cache is disabled or all ways of the cache are disabled for replacement.
• The cache target indicated by the CT field (bits 6–10) of the instruction is not 0.
In these cases, the lock set instruction is treated as a no-op, and the cache unable to lock
L1CSR{0,1}[CUL] bit is set.
Assuming no exception conditions occur (DSI or DTLB error), for dcbtls, dcbtstls, and icbtls an attempt
is made to lock the corresponding cache line. If a miss occurs, and all of the available ways (ways enabled
for a particular access type) are already locked in a given cache set, an attempt to lock another line in the
same set will result in an overlocking situation. In this case, the cache overlock bit L1CSR{0,1}[CLO] is
set to indicate that an overlocking situation occurred. This does not cause an exception condition. The new
line is conditionally placed in the cache, displacing a previously locked line depending on the setting of
the appropriate L1CSR{0,1}[CLOA].
The CUL conditions have priority over the CLO condition.
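The priority of the unable-to-lock (CUL) conditions over the overlock (CLO) condition can be expressed as an ordered check. The following is a sketch for lock set instructions (dcbtls, dcbtstls, icbtls) that assumes no DSI or DTLB exception has occurred. The lock_req type, the enum, and the function name are hypothetical, and the CUL list is applied as written, without regard to hit or miss for the disabled-cache and disabled-ways cases.

```c
#include <stdbool.h>

typedef enum { LOCK_SET_OK, LOCK_SET_CUL, LOCK_SET_CLO } lock_set_status;

/* Inputs for a lock set attempt (hypothetical model type). */
typedef struct {
    unsigned ct;                /* CT field of the instruction */
    bool cache_enabled;         /* target cache is enabled */
    bool any_way_available;     /* at least one way is enabled for replacement */
    bool cache_inhibited;       /* target address is cache-inhibited */
    bool hit;                   /* line is already in the cache */
    bool all_available_locked;  /* every available way in the set is locked */
} lock_req;

lock_set_status lock_set_outcome(const lock_req *r)
{
    /* Unable-to-lock: the instruction is no-oped and the CUL bit is set. */
    if (r->ct != 0 || !r->cache_enabled || !r->any_way_available ||
        (r->cache_inhibited && !r->hit))
        return LOCK_SET_CUL;

    /* Overlock: a miss with no unlocked way left. CLO is set, no exception. */
    if (!r->hit && r->all_available_locked)
        return LOCK_SET_CLO;

    return LOCK_SET_OK;
}
```

Checking the CUL conditions first means that a cache-inhibited miss into a fully locked set reports CUL only, which matches the stated priority.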
If multiple no-op or exception conditions arise on a cache lock instruction, the results are determined by
the order of precedence described in Table 9-11.
It is possible to lock all ways of a given cache set. If an attempt is made to perform a non-locking line fill
for a new address in the same cache set, the new line is not put into the cache. It is satisfied on the bus
using a single beat transfer instead of normal burst transfers. If a dcbz instruction is executed, and all ways
available for allocation have been locked, an alignment exception is generated and no line is put into the
cache.
Cache line locking interacts with the ability to control replacement of lines in certain cache ways via the
L1CSR0 WID and WDD control bits. If any cache line locking instruction (icbtls, dcbtls, dcbtstls) is
allowed to execute and finds a matching line already present in the cache, the line’s lock bit is set
regardless of the settings of the WID and WDD fields. In this case, no replacement has been made.
However, for cache misses that occur while executing a cache line lock set instruction, the only candidate
lines available for locking are those that correspond to ways of the cache that have not been disabled for
the particular type of line locking instruction (controlled by WDD for dcbtls and dcbtstls, controlled by
WID for icbtls). Thus, an overlocking condition may result even though fewer than four lines with the
same index are locked.
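The interaction between way disabling and locking on a miss can be illustrated with bit masks, one bit per way. This is a sketch under stated assumptions: the four-way organization follows from the text, but the mask representation (bit i set means way i is disabled or locked) is chosen for illustration and is not the encoding of the WID or WDD fields. Function names are hypothetical.

```c
#include <stdbool.h>

#define NUM_WAYS 4u
#define WAY_MASK ((1u << NUM_WAYS) - 1u)

/* Ways that can receive a newly locked line on a miss. */
unsigned lock_candidates(unsigned disabled_mask, unsigned locked_mask)
{
    return ~(disabled_mask | locked_mask) & WAY_MASK;
}

/*
 * Overlock on a miss: at least one way is enabled for this access type,
 * but every enabled way in the set is already locked.
 */
bool is_overlock(unsigned disabled_mask, unsigned locked_mask)
{
    unsigned enabled = ~disabled_mask & WAY_MASK;
    return enabled != 0u && (enabled & ~locked_mask) == 0u;
}
```

For example, with ways 2 and 3 disabled and ways 0 and 1 locked, is_overlock reports an overlock even though only two lines in the set are locked. If all four ways are disabled, the result is false because that case is an unable-to-lock condition instead.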
The cache-locking DSI handler must decide whether or not to lock a given cache line based upon available
cache resources. If the locking instruction is a set lock instruction, and if the handler decides to lock the
line, it should do the following:
• Add the line address to its list of locked lines.
• Execute the appropriate set lock instruction to lock the cache line.
• Modify save/restore register 0 to point to the instruction immediately after the locking instruction
that caused the DSI.
• Execute an rfi.
If the locking instruction is a clear lock instruction, and if the handler decides to unlock the line, it should
do the following:
• Remove the line address from its list of locked lines.
• Execute the appropriate clear lock instruction to unlock the cache line.
• Modify save/restore register 0 to point to the instruction immediately after the locking instruction
that caused the DSI.
• Execute an rfi.
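The handler steps above can be sketched as a host-side model. This is an illustration under stated assumptions, not handler code for the core: the class name, the policy limit `max_locked_lines`, the 32-byte line size, and the 4-byte instruction length used to advance save/restore register 0 are all assumptions, and the `lock_line`/`unlock_line` callbacks stand in for executing the actual lock instruction.

```python
INSN_BYTES = 4  # assumed length of the faulting locking instruction


class LockedLineList:
    """Hypothetical bookkeeping for a cache-locking DSI handler."""

    def __init__(self, max_locked_lines, line_size=32):
        self.max_locked_lines = max_locked_lines  # handler policy limit (assumed)
        self.line_size = line_size                # bytes per cache line (assumed)
        self.locked = set()

    def _line(self, address):
        # Align the address down to the start of its cache line.
        return address & ~(self.line_size - 1)

    def handle_set_lock(self, address, srr0, lock_line):
        """Return the new SRR0 value, or None if the handler declines to lock."""
        line = self._line(address)
        if line not in self.locked and len(self.locked) >= self.max_locked_lines:
            return None                # declined; what follows is a policy choice
        self.locked.add(line)          # add the line address to the list
        lock_line(line)                # stands in for the set lock instruction
        return srr0 + INSN_BYTES       # resume after the locking instruction (then rfi)

    def handle_clear_lock(self, address, srr0, unlock_line):
        """Return the new SRR0 value after unlocking the line."""
        line = self._line(address)
        self.locked.discard(line)      # remove the line address from the list
        unlock_line(line)              # stands in for the clear lock instruction
        return srr0 + INSN_BYTES       # resume after the locking instruction (then rfi)
```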
9.12.2
Instruction Details
This section provides details for the instructions shown in Table 9-10:
Table 9-10. Cache Line Locking/Unlocking Unit Instructions
Acronym     Definition                                       Cross-Reference
dcbtls      Data cache block touch and lock set              9-35
dcbtstls    Data cache block touch for store and lock set    9-37
dcblc       Data cache block lock clear                      9-38
icbtls      Instruction cache block touch and lock set       9-39
icblc       Instruction cache block lock clear               9-41
dcbtls
Data Cache Block Touch and Lock Set
dcbtls CT, RA, RB                                        (E=0) Form X
Encoding: bits 0–5 = 31 | bits 6–10 = CT | bits 11–15 = RA | bits 16–20 = RB | bits 21–30 = 0010100110 | bit 31 = 0
Description:
if RA=0 then a <- 0 (64 zero bits) else a <- GPR(RA)
EA <- (32 zero bits) || (a + GPR(RB))[32:63]
PrefetchDataCacheBlockLockSet(CT, EA)
If CT = 0, the cache line corresponding to EA is loaded and locked into the level 1 data cache.
If CT = 0 and the line already exists in the data cache, dcbtls locks the line without refetching it from
external memory.
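The X-form layout and the effective address calculation used by the five locking instructions can be sketched on a host as follows. This is an illustrative model, not an assembler: the function names are assumptions, the field positions and extended opcodes are taken from the encodings given in this section, and the effective address is modeled as the low 32 bits of the sum.

```python
PRIMARY_OPCODE = 31

# Extended opcodes (bits 21-30) as given for each instruction in this section.
EXTENDED_OPCODE = {
    "dcbtls":   0b0010100110,
    "dcbtstls": 0b0010000110,
    "dcblc":    0b0110000110,
    "icbtls":   0b0111100110,
    "icblc":    0b0011100110,
}


def encode(mnemonic, ct, ra, rb):
    """Build the 32-bit instruction word for a cache locking instruction."""
    for name, value in (("CT", ct), ("RA", ra), ("RB", rb)):
        if not 0 <= value < 32:
            raise ValueError(f"{name} must fit in 5 bits")
    # Bit 0 is the most significant bit, so bits 0-5 sit at shift 26, and so on.
    return ((PRIMARY_OPCODE << 26) | (ct << 21) | (ra << 16) | (rb << 11)
            | (EXTENDED_OPCODE[mnemonic] << 1))


def effective_address(gpr, ra, rb):
    """EA = (RA|0) + GPR(RB), truncated to 32 bits."""
    a = 0 if ra == 0 else gpr[ra]
    return (a + gpr[rb]) & 0xFFFFFFFF
```

Note that RA = 0 selects the constant zero rather than the contents of GPR0, which is why `effective_address` never reads `gpr[0]` for the base.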
Exceptions:
If the MSR[UCLE] (user-mode cache lock enable) bit is set, dcbtls may be performed while in user mode
(MSR[PR] = 1). If MSR[UCLE] is clear, an attempt to perform this instruction in user mode causes a
data cache locking error DSI unless the CT field or other conditions otherwise no-op the instruction.
The e200z7 only supports CT = 0. If CT is some value other than 0, or if the data cache is disabled, the
dcbtls is no-op’ed and L1CSR0[DCUL] is set, indicating that an unable-to-lock condition occurred.
No other exceptions are reported.
The dcbtls instruction is treated as a load with respect to translation and causes a DSI interrupt for access
violations, as well as causing a Data TLB error interrupt if the target address cannot be translated.
If the block corresponding to EA is cache-inhibited and a data cache miss occurs, the instruction is
no-oped, (no DSI is taken due to the cache-inhibited status), and L1CSR0[DCUL] is set, indicating an
unable-to-lock condition occurred.
Other registers altered:
• L1CSR0 (see below)
When a dcbtls is performed to an index and a way cannot be locked, L1CSR0[DCUL] is set, indicating
that an unable-to-lock condition occurred. This also occurs whenever the dcbtls must be no-op’ed.
When a dcbtls is performed to an index in the data cache that already has all the ways locked, this is
referred to as an over-locking situation. There is no exception generated by an over-locking situation.
Instead, L1CSR0[DCLO] is set, indicating an over-lock condition occurred. A line is allocated and locked
in the cache depending on the setting of the L1CSR0[DCLOA] control bit. If system software wants to
precisely determine if an overlock condition has happened, it must perform the following code sequence:
dcbtls
msync
mfspr (L1CSR0)
(check L1CSR0[DCUL] bit for cache index unable-to-lock condition)
(check L1CSR0[DCLO] bit for cache index over-lock condition)
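The final two steps of the sequence above, testing the status bits in the value returned by mfspr, can be sketched as follows. This is a host-side illustration only. The bit positions used for DCUL and DCLO are assumptions that are not stated in this section and must be checked against the L1CSR0 register description; bit 0 is taken as the most significant bit, as elsewhere in this manual.

```python
ASSUMED_DCUL_BIT = 21  # assumption: verify against the L1CSR0 description
ASSUMED_DCLO_BIT = 22  # assumption: verify against the L1CSR0 description


def msb0_mask(bit, width=32):
    """Mask for a single bit when bit 0 is the most significant bit."""
    return 1 << (width - 1 - bit)


def lock_status(l1csr0):
    """Report the unable-to-lock and over-lock flags in an L1CSR0 value."""
    return {
        "unable_to_lock": bool(l1csr0 & msb0_mask(ASSUMED_DCUL_BIT)),
        "overlock": bool(l1csr0 & msb0_mask(ASSUMED_DCLO_BIT)),
    }
```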
dcbtstls
Data Cache Block Touch for Store and Lock Set
dcbtstls CT, RA, RB                                      (E=0) Form X
Encoding: bits 0–5 = 31 | bits 6–10 = CT | bits 11–15 = RA | bits 16–20 = RB | bits 21–30 = 0010000110 | bit 31 = 0
Description:
if RA=0 then a <- 0 (64 zero bits) else a <- GPR(RA)
EA <- (32 zero bits) || (a + GPR(RB))[32:63]
PrefetchDataCacheBlockLockSet(CT, EA)
The e200z7 treats the dcbtstls instruction identically to the dcbtls instruction because no hardware
coherency mechanisms are implemented for the cache.
dcblc
Data Cache Block Lock Clear
dcblc CT, RA, RB                                         (E=0) Form X
Encoding: bits 0–5 = 31 | bits 6–10 = CT | bits 11–15 = RA | bits 16–20 = RB | bits 21–30 = 0110000110 | bit 31 = 0
Description:
if RA=0 then a <- 0 (64 zero bits) else a <- GPR(RA)
EA <- (32 zero bits) || (a + GPR(RB))[32:63]
DataCacheClearLockBit(CT, EA)
If CT = 0, and the line is present in the L1 data cache, the lock bit for that line is cleared, making that line
eligible for replacement.
Exceptions:
If the MSR[UCLE] (user-mode cache lock enable) bit is set, dcblc may be performed while in user mode
(MSR[PR] = 1). If MSR[UCLE] is clear, an attempt to perform this instruction in user mode causes a DSI,
unless the CT field or other conditions otherwise no-op the instruction.
The e200z7 only supports CT = 0. If CT is some value other than 0, or if the data cache is disabled, the
dcblc is no-op’ed. No other exceptions are reported.
The dcblc instruction is treated as a load with respect to translation and causes a DSI interrupt for access
violations, as well as a Data TLB error interrupt if the target address cannot be translated.
icbtls
Instruction Cache Block Touch and Lock Set
icbtls CT, RA, RB                                        (E=0) Form X
Encoding: bits 0–5 = 31 | bits 6–10 = CT | bits 11–15 = RA | bits 16–20 = RB | bits 21–30 = 0111100110 | bit 31 = 0
Description:
if RA=0 then a <- 0 (64 zero bits) else a <- GPR(RA)
EA <- (32 zero bits) || (a + GPR(RB))[32:63]
PrefetchInstructionCacheBlockLockSet(CT, EA)
If CT = 0, the cache line corresponding to EA is loaded and locked into the level 1 instruction cache.
If CT = 0 and the line already exists in the instruction cache, icbtls locks the line without refetching it from
external memory.
Exceptions:
If MSR[UCLE] (user-mode cache lock enable) is set, icbtls may be performed while in user mode
(MSR[PR] = 1). If MSR[UCLE] is clear, an attempt to perform this instruction in user mode causes an
instruction cache locking error DSI unless the CT field or other conditions otherwise no-op the instruction.
The e200z7 only supports CT = 0. If CT is some value other than 0, or if the instruction cache is disabled,
the icbtls is no-op’ed and L1CSR1[ICUL] is set, indicating that an unable-to-lock condition occurred.
No other exceptions are reported.
The icbtls instruction requires either execute or read (X or R) permissions with respect to translation and
causes a DSI interrupt for access violations, as well as a Data TLB error interrupt if the target address cannot
be translated.
If the block corresponding to EA is cache-inhibited and an instruction cache miss occurs, the instruction
is no-op’ed (no DSI is taken due to the cache-inhibited status), and the L1CSR1[ICUL] bit is set, indicating
that an unable-to-lock condition occurred.
Other registers altered:
• L1CSR1 (see below)
When icbtls is performed to an index and a way cannot be locked, the L1CSR1[ICUL] bit is set, indicating
that an unable-to-lock condition occurred. This also occurs whenever icbtls must be no-op’ed.
When icbtls is performed to an index in the instruction cache that already has all the ways locked, this is
referred to as an overlocking situation. There is no exception generated by an overlocking situation.
Instead L1CSR1[ICLO] is set, indicating an overlock condition occurred. A line is allocated and locked in
the cache depending on the setting of the L1CSR1[ICLOA] control bit. If system software wants to
precisely determine whether an overlock condition has happened, it must perform the following code
sequence:
icbtls
msync
mfspr (L1CSR1)
(check L1CSR1[ICUL] bit for cache index unable-to-lock condition)
(check L1CSR1[ICLO] bit for cache index over-lock condition)
icblc
Instruction Cache Block Lock Clear
icblc CT, RA, RB                                         (E=0) Form X
Encoding: bits 0–5 = 31 | bits 6–10 = CT | bits 11–15 = RA | bits 16–20 = RB | bits 21–30 = 0011100110 | bit 31 = 0
Description:
if RA=0 then a <- 0 (64 zero bits) else a <- GPR(RA)
EA <- (32 zero bits) || (a + GPR(RB))[32:63]
InstCacheClearLockBit(CT, EA)
If CT = 0, and the line is present in the instruction cache, the lock bit for that line is cleared, making that
line eligible for replacement.
Exceptions:
If the MSR[UCLE] (user-mode cache lock enable) bit is set, icblc may be performed while in user mode
(MSR[PR] = 1). If the MSR[UCLE] bit is clear, an attempt to perform this instruction in user mode
causes an instruction cache locking error DSI unless the CT field or other conditions otherwise no-op the
instruction.
The e200z7 only supports CT = 0. If CT is some value other than 0, or if the instruction cache is disabled,
the icblc is no-op’ed. No other exceptions are reported.
The icblc instruction requires either execute or read (X or R) permissions with respect to translation and
causes a DSI interrupt for access violations, as well as a Data TLB error interrupt if the target address cannot
be translated.
9.12.3
Effects of Other Cache Instructions on Locked Lines
The following cache instructions have no effect on the state of a cache line's lock bit: icbt, dcba, dcbz,
dcbst, dcbt, and dcbtst.
The following cache instructions flush/invalidate and unlock a cache line in the respective L1 caches:
dcbf, dcbi, and icbi.
9.12.4
Flash Clearing of Lock Bits
The e200z7 supports flash clearing of cache lock bits under software control by using the CLFC (cache
lock bits flash clear) control bit in the L1CSR{0,1} register.
Lock bits are not cleared automatically upon power-up (m_por) or normal reset (p_reset_b). Software
must use the CLFC control bit to clear the lock bits after a reset. The proper procedure is to confirm that
the bit is clear and then set it, using an mfspr/mtspr pair. A 0-to-1 transition on CLFC initiates a flash
clearing of the lock bits, which lasts for multiple (approx. 134) CPU cycles. Once set, the
CLFC bit is cleared by hardware after the operation is complete. It remains set during the clearing
interval and may be tested by software to determine when the operation has completed. An mtspr
operation to L1CSR{0,1} that attempts to change the state of L1CSR{0,1}[CLFC] during the flash clearing
does not affect the state of that bit.
During the process of performing the flash clearing, the cache does not respond to accesses and remains
busy. Interrupts may still be recognized and processed, potentially aborting the flash clearing operation.
When this occurs, L1CSR{0,1}[ABT] is set to indicate unsuccessful completion of the operation.
Software should read the L1CSR{0,1} register to determine that the operation has completed
(L1CSR{0,1}[CLFC] cleared) and then check the status of L1CSR{0,1}[ABT] to determine completion
status.
NOTE
Note that while most implementations of the e200z7 stall further instruction
execution during this flash clearing interval, this is not guaranteed across all
implementations. Thus, software should be written using these guidelines.
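The flash clear procedure described in this section (confirm CLFC is clear, set it, poll until hardware clears it, then check ABT) can be sketched as a host-side model. This is a sketch under stated assumptions, not driver code: the mask values for CLFC and ABT are placeholders that must be taken from the L1CSR{0,1} register description, the `read_spr`/`write_spr` callbacks stand in for mfspr/mtspr, and `FakeL1CSR` is a hypothetical stand-in for the hardware that completes after a fixed number of reads.

```python
ASSUMED_CLFC_MASK = 1 << 8  # placeholder: verify against the L1CSR description
ASSUMED_ABT_MASK = 1 << 2   # placeholder: verify against the L1CSR description


def flash_clear_locks(read_spr, write_spr, clfc_mask=ASSUMED_CLFC_MASK,
                      abt_mask=ASSUMED_ABT_MASK, max_polls=1000):
    """Run one flash clear; return True on success, False if it was aborted."""
    value = read_spr()
    if value & clfc_mask:
        raise RuntimeError("CLFC already set; a flash clear is in progress")
    write_spr(value | clfc_mask)          # 0-to-1 transition starts the clear
    for _ in range(max_polls):
        value = read_spr()
        if not (value & clfc_mask):       # hardware clears CLFC when done
            return (value & abt_mask) == 0
    raise TimeoutError("CLFC did not clear")


class FakeL1CSR:
    """Hypothetical register model used only to exercise the routine above."""

    def __init__(self, busy_reads=3, abort=False):
        self.value = 0
        self._busy = 0
        self._busy_reads = busy_reads
        self._abort = abort

    def read(self):
        if self._busy > 0:
            self._busy -= 1
            if self._busy == 0:           # clearing interval has ended
                self.value &= ~ASSUMED_CLFC_MASK
                if self._abort:
                    self.value |= ASSUMED_ABT_MASK
        return self.value

    def write(self, new_value):
        if self.value & ASSUMED_CLFC_MASK:
            new_value |= ASSUMED_CLFC_MASK  # writes cannot change CLFC while busy
        elif new_value & ASSUMED_CLFC_MASK:
            self._busy = self._busy_reads
        self.value = new_value
```

Polling rather than assuming a stall follows the note above, since stalling during the clearing interval is not guaranteed across implementations.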
9.13
Cache Instructions and Exceptions
All cache management instructions (except icbt, dcba, dcbt, and dcbtst) can generate TLB miss
exceptions if the effective address cannot be translated, or may generate DSI exceptions due to permission
violations. In addition, dcbz may generate an alignment interrupt as described in Section 9.10.9, “Data
Cache Block set to Zero (dcbz) Instruction.”
The cache locking instructions dcblc, dcbtls, dcbtstls, icblc and icbtls generate DSI exceptions if
MSR[UCLE] is clear and the locking instruction is executed in user mode (MSR[PR] = 1). Data cache
locking instructions that result in a DSI exception for this reason set ESR[DLK] (documented as DLK0 in
the Power ISA embedded category), and instruction cache locking instructions that result in a DSI
exception for this reason set ESR[ILK] (documented as DLK1 in the Power ISA embedded category).
9.13.1
Exception Conditions for Cache Instructions
If multiple no-op or exception conditions arise on a cache instruction, the results are determined by the
order of precedence described in Table 9-11.
Table 9-11. Special Case Handling
Operation              CT ≠ 0   Cache      TLB     User &     Protection   WT or Cache in      Cache Parity   CI and miss   All Available   External
                                Disabled   Miss    UCLE = 0   Violation    Writethrough mode   Error          in cache      ways locked     Termination Error
icbt, dcbt, dcbtst     No-op    No-op      No-op   -          No-op        -                   No-op          No-op         No-op           No-op
dcbtls                 DCUL     DCUL       DTLB    DLK        DSI          -                   MC             DCUL          DCLO            MC
dcbtstls               DCUL     DCUL       DTLB    DLK        DSI          -                   MC             DCUL          DCLO            MC
dcblc                  No-op    No-op      DTLB    DLK        DSI          -                   MC             -             -               -
icbtls                 ICUL     ICUL       DTLB    ILK        DSI          -                   MC             ICUL          ICLO            MC
icblc                  No-op    No-op      DTLB    ILK        DSI          -                   MC             -             -               -
dcbz                   -        ALI        DTLB    -          DSI          ALI                 MC             ALI           ALI             -
dcbf, dcbst            -        No-op      DTLB    -          DSI          -                   MC             -             -               MC
icbi, dcbi             -        No-op      DTLB    -          DSI          -                   -              -             -               -
Atomic load or store   -        -          DTLB    -          DSI          -                   MC             -             -               MC
load                   -        -          DTLB    -          DSI          -                   MC             -             -               MC
store                  -        -          DTLB    -          DSI          -                   MC             -             -               MC
Notes:
• Priority decreases from left to right
• Cache operations that do not set or clear locks ignore the value of the CT field
• “dash” indicates executes normally
• “No-op” indicates treated as a no-op
• DSI = data storage interrupt; ALI = alignment interrupt; DTLB = data TLB interrupt
• DCUL, ICUL = no-op, and set L1CSR0[DCUL] or L1CSR1[ICUL]
• DCLO, ICLO = no-op, and set L1CSR0[DCLO] or L1CSR1[ICLO]
• DLK, ILK = data storage interrupt (DSI) and set ESR[DLK] or ESR[ILK]
• MC = Machine Check and update MCAR
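The left-to-right priority rule of Table 9-11 can be sketched as a lookup. This is an illustrative model only: the condition names are hypothetical labels for the table columns, the column order is the one read from Table 9-11 (CT ≠ 0 first, external termination error last), and only the dcbtls row is shown.

```python
# Condition columns of Table 9-11, highest priority first (left to right).
CONDITIONS = [
    "ct_nonzero", "cache_disabled", "tlb_miss", "user_and_ucle0",
    "protection_violation", "writethrough", "cache_parity_error",
    "ci_and_miss", "all_available_ways_locked", "external_termination_error",
]

# Outcomes for dcbtls, one per column; "-" means the instruction executes normally.
DCBTLS_ROW = ["DCUL", "DCUL", "DTLB", "DLK", "DSI", "-", "MC", "DCUL", "DCLO", "MC"]


def resolve(row, active_conditions):
    """Return the outcome of the highest-priority active condition."""
    for condition, outcome in zip(CONDITIONS, row):
        if condition in active_conditions and outcome != "-":
            return outcome
    return "-"
```

For example, a dcbtls that both misses in the TLB and targets a fully locked index resolves to DTLB, because the TLB miss column lies to the left of the all-ways-locked column.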
9.13.2
Transfer Type Encodings for Cache Management Instructions
Transfer type encodings are used to indicate to the cache whether a normal access, atomic access, cache
management control access, or MMU management control access is being requested. These attribute
signals are driven with addresses when an access is requested. Table 9-12 shows the definitions of the
p_d_ttype[0:5] encodings.
Table 9-12. Transfer Type Encoding
p_d_ttype[0:5] (1)   Transfer Type                        Instruction
00000e               Normal                               Normal loads/stores
000010               Atomic                               lwarx, stwcx., lharx, sthcx., lbarx, stbcx.
00010e               Flush Data Block                     dcbst
00011e               Flush and Invalidate Data Block      dcbf
00100e               Allocate and Zero Data Block         dcbz
001010               Invalidate Data Block                dcbi
00110e               Invalidate Instruction Block         icbi
001110               Multiple word load/store             lmw, stmw
010000               TLB Invalidate                       tlbivax
010010               TLB Search                           tlbsx
010100               TLB Read entry                       tlbre
010110               TLB Write entry                      tlbwe
011000               Touch for Instruction                icbt
011010               Lock Clear for Instruction           icblc
011100               Touch for Instruction and Lock Set   icbtls
011110               Lock Clear for Data                  dcblc
10000e               Touch for Data                       dcbt
10001e               Touch for Data Store                 dcbtst
100100               Touch for Data and Lock Set          dcbtls
100110               Touch for Data Store and Lock Set    dcbtstls
(1) p_ttype[5] ’e’ is set to 0.
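A decoder for the encodings in Table 9-12 can be sketched as follows, for example when interpreting captured bus traces. This is a hypothetical host-side helper, not part of the core: the function name is an assumption, and the ’e’ position is treated as matching either value so that the lookup does not depend on how that bit is driven.

```python
# (pattern, transfer type, instructions) rows transcribed from Table 9-12.
TTYPE_TABLE = [
    ("00000e", "Normal", "Normal loads/stores"),
    ("000010", "Atomic", "lwarx, stwcx., lharx, sthcx., lbarx, stbcx."),
    ("00010e", "Flush Data Block", "dcbst"),
    ("00011e", "Flush and Invalidate Data Block", "dcbf"),
    ("00100e", "Allocate and Zero Data Block", "dcbz"),
    ("001010", "Invalidate Data Block", "dcbi"),
    ("00110e", "Invalidate Instruction Block", "icbi"),
    ("001110", "Multiple word load/store", "lmw, stmw"),
    ("010000", "TLB Invalidate", "tlbivax"),
    ("010010", "TLB Search", "tlbsx"),
    ("010100", "TLB Read entry", "tlbre"),
    ("010110", "TLB Write entry", "tlbwe"),
    ("011000", "Touch for Instruction", "icbt"),
    ("011010", "Lock Clear for Instruction", "icblc"),
    ("011100", "Touch for Instruction and Lock Set", "icbtls"),
    ("011110", "Lock Clear for Data", "dcblc"),
    ("10000e", "Touch for Data", "dcbt"),
    ("10001e", "Touch for Data Store", "dcbtst"),
    ("100100", "Touch for Data and Lock Set", "dcbtls"),
    ("100110", "Touch for Data Store and Lock Set", "dcbtstls"),
]


def decode_ttype(value):
    """Map a 6-bit p_d_ttype[0:5] value to (transfer type, instructions) or None."""
    bits = format(value & 0x3F, "06b")   # bit 0 is the leftmost character
    for pattern, transfer_type, instructions in TTYPE_TABLE:
        if all(p in ("e", b) for p, b in zip(pattern, bits)):
            return transfer_type, instructions
    return None
```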
9.14
Sequential Consistency
The Power ISA embedded category architecture requires that all memory operations executed by a single
processor be sequentially self-consistent. This means that all memory accesses appear to be executed in
the order that is specified by the program with respect to exceptions and data dependencies. The e200 CPU
achieves this effect by operating a single pipeline to the cache/MMU. All memory accesses are presented
to the MMU in the exact order that they appear in the program, and therefore exceptions are determined
in order.
9.15
Self-Modifying Code Requirements
The following sequence of instructions synchronizes the instruction stream.
1. dcbf
2. icbi
3. msync
4. isync
This sequence ensures that the operation is correct for Power ISA embedded category processors that
implement separate instruction and data caches, as well as for multi-processor cache-coherent systems.
9.16
Page Table Control Bits
The Power ISA embedded category architecture allows certain memory characteristics to be set on a page
and on a block basis. These characteristics include write through (using the W-bit), cacheability (using the
I-bit), coherency (using the M-bit), guarded memory (using the G-bit), and endianness (using the E-bit).
Incorrect use of these bits may create situations where coherency paradoxes are observed by the processor.
In particular, this can happen when the state of these bits is changed without appropriate precautions
being taken (that is, flushing the pages that correspond to the changed bits from the cache) or when the
address translations of aliased real addresses specify different values for any of the WIMGE bits.
Certain mixes of WIMG settings are allowed by the Power ISA embedded category architecture.
However, others may present cache coherence paradoxes and are considered programming errors.
9.16.1
Write-through Stores
A write-through store (WIMGE = b’1xxxx’) may normally hit to a valid cache line. In this case, the cache
line remains in its current state, the store data is written into the cache, and the store goes out on the bus
as a single beat write.
9.16.2
Cache-Inhibited Accesses
When the cache-inhibited attribute is indicated by translation (WIMGE = b’x1xxx’) and a cache miss
occurs, all accesses are performed as single beat transactions on the system bus with a size indicator
corresponding to the size of the load, store, or prefetch operation.
Note that cache inhibited status is ignored on all cache hits.
9.16.3
Memory Coherence Required
For the e200z7, the “memory coherence required” storage attribute (WIMGE = b’xx1xx’) is reflected on
the p_d_gbl output during each external data access, to indicate to external coherency logic that memory
coherence is required. This bit is ignored for instruction accesses.
9.16.4
Guarded Storage
For the e200z7, the guarded storage attribute (WIMGE = b’xxx1x’) is used to determine if a second
outstanding data cache miss may proceed to the system interface prior to the termination of the first
outstanding miss. If the second address is marked as guarded, it is not presented to the external interface
until the previous miss has been completed without error.
9.16.5
Misaligned Accesses and the Endian (E) Bit
Misaligned load or store accesses that cross page boundaries can cause data corruption if the two pages do
not have the same endianness (that is, one page is big endian while the other page is little endian). If this
occurs, the processor would not get all the bytes, or would get some of them out of order, resulting in
garbled data. To protect against data corruption, the e200 core takes a DSI exception and sets the BO (byte
ordering) bit in the exception syndrome register whenever this situation occurs.
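The WIMGE patterns used in Sections 9.16.1 through 9.16.5 (b’1xxxx’ for W through b’xxxx1’ for E) can be decoded with a small helper. This is an illustrative sketch: the function name and the dictionary representation are assumptions, and the 5-bit value is taken with W as the most significant bit, matching the patterns in the text.

```python
def decode_wimge(bits):
    """Split a 5-bit WIMGE value (W is the most significant bit) into flags."""
    if not 0 <= bits < 32:
        raise ValueError("WIMGE is a 5-bit field")
    return {name: bool((bits >> (4 - index)) & 1)
            for index, name in enumerate("WIMGE")}
```

For example, `decode_wimge(0b10100)` reports W and M set, a page that is both write-through and marked memory coherence required.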
9.17
Reservation Instructions and Cache Interactions
The e200 core treats reservation instruction (lbarx, lharx, lwarx, stbcx., sthcx., and stwcx.) accesses as
though they were cache inhibited, regardless of page attributes. Additionally, a cache line corresponding
to the address of a reservation instruction access is flushed to memory if dirty, and then invalidated (even
if marked as locked) prior to the reservation access being issued to the bus. This allows external reservation
logic to be built which properly signals a reservation failure. The bus access is treated as a single-beat
transfer.
9.18
Effect of Hardware Debug on Cache Operation
Hardware debug facilities utilize normal CPU instructions to access register and memory contents during
a debug session. This may have the unavoidable side-effect of causing the store and push buffers to be
flushed. During hardware debug, the MMU page attributes are controllable by the debug firmware via
settings of the OnCE control register (OCR).
Cache snoop operations continue to be serviced during debug sessions.
Refer to Section 13.4.6.3, “e200 OnCE Control Register (OCR).”
9.19
Cache Memory Access For Debug/Error Handling
The cache memory provides resources needed to do foreground accesses via mtdcr instructions executed
by the processor, or background accesses through the JTAG/OnCE port to read and write the cache SRAM
arrays. Accesses are supported via a pair of device control registers (DCRs) which are also mapped into
OnCE-accessible registers. These resources are intended for use by special debug tools and by debug or
specialized error recovery exception software, not by general application code.
Access to the cache memory SRAM arrays using mtdcr instructions may be performed by
supervisor-level software after appropriate synchronization has been performed with msync, isync
instruction pairs. Access to the cache memory SRAM arrays using the JTAG port is conditional on the
CPU being in debug mode. The CPU must be placed in debug state prior to initiation of a read or write
access via OnCE.
This facility allows access only to the SRAM arrays used for cache tag and data storage. This function is
available even when the cache is disabled. The cache linefill buffer, push buffer, store buffer, and late write
buffer are all outside of the SRAM arrays and are not accessible. However, before a debug memory access
request is serviced, the push and store buffers will be written to external memory, and the late write and
linefill buffers will be written to the cache arrays.
9.19.1
Cache Memory Access via Software
Cache debug access control and data information are accessed by executing mfdcr and mtdcr instructions
to the Cache Debug Access control and data registers CDACNTL and CDADATA (see Table 9-13 and
Table 9-14). Accesses are performed one word (32 bits) at a time.
For a Cache write access, software must first write the CDADATA register with the desired tag and status
flags, or data values. The second step is to write the CDACNTL register with desired tag or data location
and parity values, and assert the R/W and GO bits in CDACNTL.
Note that writing a 64-bit value for data requires two passes, one for the even word (A29 = 0) and one for
the odd word (A29 = 1). Each 32-bit write will update all of the parity/check bits, so in general, if only a
single 32-bit write is performed, it should be preceded by a read of the data which is not being modified
in order to properly compute or store all 8 parity/check bits when the modified 32-bit data is written. Tag
writes are accomplished in a single pass.
For a Cache read access, software must first access and write the CDACNTL register with desired tag or
data location, and assert the R/W and GO bits in CDACNTL. The second step is to read the CDADATA
register for the tag or data and read the CDACNTL register for parity information.
Completion of any operation can be determined by reading the CDACNTL register. Operations are
indicated as complete when CDACNTL[30:31] = 00. Software should poll the CDACNTL register to
determine when an access has been completed prior to assuming validity of any other information in the
CDACNTL or CDADATA registers.
Note that no parity errors are generated as a result of mtdcr/mfdcr instructions involving the CDACNTL
or CDADATA registers.
To ensure proper cache write operation, the following program sequence is recommended:
loop:
msync
isync
mtdcr cdadata, rS1 // set up write data
mtdcr cdacntl, rS2 // write control to initiate write
msync
isync
mfdcr rN, cdacntl // check for done
andi. rT, rN, #3
bne loop
.
.
To ensure proper cache read operation, the following program sequence is recommended:
loop:
msync
isync
mtdcr cdacntl, rS2 // write control to initiate read
msync
isync
mfdcr rN, cdacntl // check for done
andi. rT, rN, #3
bne loop
mfdcr rT, cdadata // return data
.
.
Conflict conditions with snoop accesses to the same cache line cannot be resolved in a manner that
guarantees a value read will not change state before a subsequent value written. No interlocking is
performed, so a cache entry read as being valid or written to a valid state may become invalid at any time.
9.19.2
Cache Memory Access Through JTAG/OnCE Port
Cache debug access control and data information are serially accessed through the OnCE controller and
access the Cache Debug Access control and data registers CDACNTL and CDADATA (see Table 9-13 and
Table 9-14). Accesses are performed one word (32 bits) at a time.
For a Cache write access, the user must first write the CDADATA register with the desired tag or data
values. The second step is to write the CDACNTL register with desired tag or data location, parity and
dirty information (for data writes only), and assert the R/W and GO bits in CDACNTL.
For a Cache read access, the user must first access and write the CDACNTL register with desired tag or
data location, and assert the R/W and GO bits in CDACNTL. The second step is to access and read the
CDADATA register for the tag or data and read the CDACNTL register for parity.
Completion of any operation can be determined by reading the CDACNTL register. Operations are
indicated as complete when CDACNTL[30:31] = 00. Debug firmware should poll the CDACNTL register
to determine when an access has been completed prior to assuming validity of any other information in the
CDACNTL or CDADATA registers.
Conflict conditions with snoop accesses to the same cache line cannot be resolved in a manner that
guarantees a value read will not change state before a subsequent value written. No interlocking is
performed, so a cache entry read as being valid or written to a valid state may become invalid at any time.
9.19.3
Cache Debug Access Control Register (CDACNTL)
The Cache Debug Access Control Register (CDACNTL) contains location information (T/D, CWAY,
CSET, and WORD), and control (R/W and GO) needed to access the Cache Tag or Data SRAM arrays.
Also included here are the SRAM parity bit values which must be supplied by the user for write accesses,
and which will be supplied by the cache for read accesses. The CDACNTL register is shown in
Figure 9-10.
DCR 351                                                  Access: Read/Write
Bits:    0     1    2–3    4–5   6–12   13–15   16–23    24–27   28      29    30–31
Field:   T/D   -    CWAY   -     CSET   WORD    PARITY   -       CACHE   R/W   GO
Reset:   All zeros
Figure 9-10. Cache Debug Access Control Register (CDACNTL)
Table 9-13 describes the CDACNTL bits.
Table 9-13. CDACNTL Field Descriptions
Bit      Name          Description
0        T/D           Tag/Data:
                       0 Data array selected
                       1 Tag array selected
1        -             Reserved (1)
2–3      CWAY          Cache Way
                       Specifies the cache way to be selected
4–5      -             Reserved (1)
6–12     CSET          Cache Set:
                       Specifies the cache set to be selected
13–15    WORD          Word (Data array access only, I or D cache)
                       Specifies one of eight words of selected set
16–23    PARITY/EDC    Parity check bits (2) (I or D cache)
         CHECK BITS    Parity Mode (L1CSR[0,1][D,I]CEDT = 00):
                       Data array: Byte parity bits. One bit per data byte. bit 16: Parity for byte 0,
                       bit 17: Parity for byte 1.... bit 23: Parity for byte 7.
                       Tag Array: parity check bits for tag. Bit 16 corresponds to parity of tag[0:11].
                       Bit 17 corresponds to parity of tag[12:21]+V. bits 18:23 reserved.
                       EDC Mode (L1CSR[0,1][D,I]CEDT = 01):
                       Dcache Data array: Byte parity bits. One bit per data byte. bit 16: Parity for
                       byte 0, bit 17: Parity for byte 1.... bit 23: Parity for byte 7.
                       Icache Data Array: parity check bits for data. Bits 16:23 correspond to
                       p_dchk[0:7] (See Table 9-8).
                       Tag Array: parity check bits for tag. Bits 16:21 correspond to p_tchk[0:5]
                       (See Table 9-7). bits 22:23 reserved.
24–27    -             Reserved (1)
28       CACHE         Cache Select
                       Specifies the cache to be selected
                       0 Selects the data cache for the operation.
                       1 Selects the instruction cache for the operation.
29       R/W           Read / Write:
                       0 Selects write operation. Write the data in the CDADATA register to the location
                         specified by this CDACNTL register.
                       1 Selects read operation. Read the cache memory location specified by this
                         CDACNTL register and store the resulting data in the CDADATA register and
                         store the parity bits in this CDACNTL register.
30–31    GO            GO command bits
                       00 Inactive or complete (no action taken). Hardware sets GO=00 when an
                          operation is complete
                       01 Read or write cache memory location specified by this CDACNTL register.
                       1x Reserved
(1) These bits are not implemented and should be written zero for future compatibility.
(2) Cache parity checkers assume odd parity when using parity protection. EDC coding is used otherwise.
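The CDACNTL layout in Table 9-13 can be sketched as a packing helper, for example in a debug tool that prepares the value written by mtdcr or through the OnCE port. This is a hypothetical host-side illustration: the function and parameter names are assumptions, and bit 0 is the most significant bit of the 32-bit register, as in the table.

```python
def field(value, first_bit, last_bit):
    """Place value in bits first_bit..last_bit, where bit 0 is the MSB of 32."""
    width = last_bit - first_bit + 1
    if not 0 <= value < (1 << width):
        raise ValueError(f"value does not fit in bits {first_bit}-{last_bit}")
    return value << (31 - last_bit)


def pack_cdacntl(tag_array, way, cache_set, word=0, parity=0,
                 icache=False, read=True):
    """Build a CDACNTL value with GO = 01 to start a cache array access."""
    return (field(int(tag_array), 0, 0)      # T/D: 1 selects the tag array
            | field(way, 2, 3)               # CWAY
            | field(cache_set, 6, 12)        # CSET
            | field(word, 13, 15)            # WORD (data array accesses only)
            | field(parity, 16, 23)          # PARITY/EDC check bits (writes)
            | field(int(icache), 28, 28)     # CACHE: 1 selects the instruction cache
            | field(int(read), 29, 29)       # R/W: 1 selects a read
            | field(0b01, 30, 31))           # GO = 01


def is_complete(cdacntl):
    """An operation is complete when CDACNTL[30:31] reads back as 00."""
    return (cdacntl & 0x3) == 0
```

The `is_complete` test corresponds to the `andi. rT, rN, #3` check in the recommended polling sequences.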
9.19.3.1
Cache Debug Access Data Register (CDADATA)
The cache debug access data register (CDADATA) contains the SRAM data for a debug access. The same
register is used for Tag and Data SRAM read and write operations for both caches. Note that a single 32-bit
word is accessed. Accessing an entire 64-bit double word requires two passes. The CDADATA register is
shown in Figure 9-11.
DCR 350                                                  Access: Read/Write
Bits:    0–31
Field:   TAG or DATA
Reset:   Undefined/Unaffected
Figure 9-11. Cache Debug Access Data Register (CDADATA)
Table 9-14 describes the CDADATA bits.
Table 9-14. CDADATA Field Descriptions
Bit      Name    Description
0–31     TAG     TAG Array Access Data
                 When accessing the tag array of either cache, it has the following values:
                 0–21    Tag compare bits
                 22      Reserved
                 23      Valid bit
                 24–27   Lock bits. These four bits should have the same value:
                         1 Locked
                         0 Unlocked
                 28–30   Dirty bits (data cache only). These three bits should have the same value:
                         1 Dirty
                         0 Clean
         DATA    DATA Array Access Data (Bytes 0–3 of the selected word)
                 When accessing the data array of either cache, it has the following values:
                 0–7     byte 0
                 8–15    byte 1
                 16–23   byte 2
                 24–31   byte 3
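A tag word read back through CDADATA can be split into its fields as sketched below. This is an illustrative host-side helper with assumed names; the field positions follow Table 9-14, with bit 0 as the most significant bit, and the `consistent` flag simply reports whether the replicated lock and dirty bits agree with each other.

```python
def extract(word, first_bit, last_bit):
    """Return bits first_bit..last_bit of a 32-bit word, where bit 0 is the MSB."""
    width = last_bit - first_bit + 1
    return (word >> (31 - last_bit)) & ((1 << width) - 1)


def decode_tag_word(word):
    """Decode a CDADATA value read from the tag array."""
    lock_bits = extract(word, 24, 27)
    dirty_bits = extract(word, 28, 30)
    return {
        "tag": extract(word, 0, 21),
        "valid": bool(extract(word, 23, 23)),
        "locked": lock_bits == 0b1111,
        "dirty": dirty_bits == 0b111,        # meaningful for the data cache only
        "consistent": lock_bits in (0, 0b1111) and dirty_bits in (0, 0b111),
    }
```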
9.20
Hardware Debug (Cache) Control Register 0
Hardware debug control register 0 (HDBCR0) is used to disable certain cache features for hardware debug
purposes. This register is not intended for normal use. The HDBCR0 register is accessed using an mfspr or
mtspr instruction. The SPR number for HDBCR0 is 976 in decimal. The HDBCR0 register is shown in
Figure 9-12.
SPR 976                                                  Access: Read/Write (Supervisor only)
Bits:    0–24   25    26       27   28    29      30   31
Field:   -      MBD   SNPDIS   -    DSB   DSTRM   -    ISTRM
Reset:   All zeros
Figure 9-12. Hardware Debug Control Register 0 (HDBCR0)
Table 9-15 describes the HDBCR0 bits.
Table 9-15. HDBCR0 Field Descriptions
Bits     Name      Description
0–24     -         Reserved (1)
25       MBD       Msync/Mbar Broadcast Disable
                   0 - msync/mbar broadcasting is enabled. p_sync_req_out asserted normally and
                       p_sync_ack_in is used to terminate msync and mbar MO=0,1 instruction execution
                   1 - msync/mbar broadcasting is disabled. p_sync_req_out remains negated, and
                       p_sync_ack_in is ignored and not used to terminate msync and mbar MO=0,1
                       instruction execution.
                   Note: MBD settings have no effect on the operation of p_sync_req_in and
                   p_sync_ack_out. Normal handshaking and completion of the synchronization
                   request input will be performed.
26       SNPDIS    Snoop Disable
                   0 - Snooping is not disabled. Snoops are processed normally according to the settings
                       of L1CSR0[DCE].
                   1 - Snoop lookups are disabled. Snoops are processed in the same manner as when
                       the data cache is disabled, i.e., null responses are generated and no snoop lookups
                       are performed.
27       -         Reserved (1)
28       DSB       Disable Store Buffer
                   0 - Store buffer enabled
                   1 - Store buffer disabled
29       DSTRM     Disable Data Cache Streaming
                   0 - DCache streaming is enabled
                   1 - DCache streaming is disabled
30       -         Reserved (1)
31       ISTRM     Disable Instruction Cache Streaming
                   0 - ICache streaming is enabled
                   1 - ICache streaming is disabled
(1) These bits are not implemented and should be written with zero for future compatibility.
9.21
Hardware Debug (Cache) Coherency
Hardware cache coherency is supported to allow for dual-core or CPU + I/O coherency. The cache must
operate in writethrough mode for those pages of memory requiring coherency operations. Coherency is
maintained by the use of snoop invalidation commands provided to the CPU through a dedicated snoop
interface port. Snooping is only performed while the data cache is enabled (L1CSR0[DCE] = 1).
Figure 9-13 shows an abstract block diagram of the structure.
[Figure 9-13 block diagram. Blocks shown: CPU, Arbiter, Snoop Command Queue, Snoop Port Control, and
DCache and Cache Control. Signals shown: p_snp_req, p_snp_cmd[0:1], p_snp_addr_in[0:26],
p_snp_id_in[0:3], p_snp_rdy, p_snp_ack, p_snp_resp[0:4], p_snp_id_out[0:3], p_cac_stalled, and the
internal snp_addr[0:26], cmd, and id[0:3] paths.]
Figure 9-13. Snoop Command Port
9.21.1
Coherency Protocol
The cache operates in a 2-state protocol for coherency purposes. A coherent cache line should only ever
be in the Valid or Invalid state. No Modified or Shared state is supported for coherent cache lines
(although modified state is available for non-coherent lines), thus no snoop copyback or intervention
operations are required. A snoop invalidation signaling port is provided to receive coherency requests.
Snoop invalidation requests are received at the snoop invalidation port, and arbitrate with the CPU for
access to the data cache tags for lookup and cache line invalidation. External coherency logic provides
snoop invalidation requests to the snoop invalidation port based on the bus activity of other coherent bus
masters, and these invalidation requests are later processed and a response provided. Memory regions
which require coherency operations must be marked as “memory coherence required” (page’s M bit set)
and as “writethrough” (page’s W bit set).
External data accesses by the CPU reflect the value of the M bit of the accessed page on the p_d_gbl
output. Typically, external coherency logic will monitor external accesses by a CPU (or other agent), and
will request invalidation operations to other coherent entities for write accesses which also have p_d_gbl
asserted. Non-shared data should be placed into pages with the M bit cleared, thus avoiding unnecessary
coherency operations.
9.21.2
Snoop Command Port
The snoop command port provides the signaling mechanism between external coherency logic and the
snoop request queue. Command requests are received on the p_snp_cmd[0:1], p_snp_id_in[0:3], and
p_snp_addr_in[0:26] inputs when the p_snp_req signal is properly asserted, and responses to snoop
command requests are provided on the p_snp_ack, p_snp_resp[0:4], and p_snp_id_out[0:3] outputs.
Snoop invalidation requests provide the physical address of the data to be invalidated
(p_snp_addr_in[0:26]), along with a four-bit ID field (p_snp_id_in[0:3]) which flows through the
command pipeline and is returned on the p_snp_id_out[0:3] output port along with the completion status
provided on p_snp_resp[0:4] when p_snp_ack is asserted.
The p_snp_rdy output signal provides a handshaking mechanism for flow control of snoop requests to
prevent overflow of the internal snoop queue which buffers incoming snoop requests from the snoop
command port prior to cache tag lookups and updates. Negation of p_snp_rdy indicates that another snoop
command port request will not be accepted due to resource constraints in the snoop pipeline.
Refer to Section 11.2.9, “Coherency Control Signals,” for details on the operating protocol of the snoop
command port.
The command value is stored in the snoop queue along with the snoop address and snoop ID value.
Table 9-16 shows the definitions of the p_snp_cmd[0:1] encodings.
Table 9-16. p_snp_cmd[0:1] Snoop Command Encoding

p_snp_cmd[0:1]   Command Type
00               Null: no status bit operation performed; lookup is performed
01               INV: invalidate matching cache entry
10               SYNC: synchronize snoop queue
11               Reserved
The NULL command is used for testing interface handshaking and other status gathering purposes. The
NULL command performs a snoop lookup operation, but performs no actual cache tag or status
modifications (even in the presence of tag parity or EDC errors). The INV command causes a snoop
lookup and subsequent invalidation of a matching cache line. The SYNC command causes the snoop
queue to be emptied with highest priority relative to CPU requests.
Table 9-17 shows the definitions of the p_snp_resp[0:4] encodings.
Table 9-17. p_snp_resp[0:4] Snoop Response Encoding

p_snp_resp[0:4]¹   Response Type
000cc              NULL: no operation performed or no matching cache entry
001cc              AutoInv: AutoInvalidation performed on clean unlocked lines with tag parity errors
010cc              ERROR: Error in processing a snoop request due to TAG parity error.
                   For NULL commands, a tag parity error occurred and no hit to a tag without error
                   occurred. No modification of cache entries, no machine check generated internally.
                   For INV commands, a) possible invalidation of locked line with tag parity error
                   occurred, or b) dirty line left valid with tag parity error, or c) no true hit occurred,
                   and one or more lines reported tag parity errors. Machine check generated internally.
01100              SYNC: Sync completed, snoop queue synchronized
100cc              HIT Clean: matching unlocked cache entry found
101cc              HIT Dirty: matching unlocked dirty cache entry found
110cc              HIT Locked: matching clean locked cache entry found
111cc              HIT Dirty Locked: matching dirty locked cache entry found

¹ cc = number of collapsed requests
  00 = no collapsing
  01 = two requests combined
  10 = three requests combined
  11 = four requests combined
The NULL response indicates that either there was no matching cache entry found for a null or invalidate
command or the cache was disabled when the request was originally made. The HIT responses indicate
that a matching cache entry was found. The SYNC response indicates all previous entries in the snoop
queue were emptied. The ERROR response indicates that an error occurred in processing a snoop request
due to a cache tag parity error. The AutoInv response indicates that one or more cache lines with tag parity
errors were invalidated.
Responses for a NULL command are either NULL, HIT, or ERROR. Responses for an INV command are
either NULL (no hit occurred or cache is disabled), HIT (a matching entry was found and invalidated), or
ERROR (a line with a tag parity error was found and left valid, so success of the command is not
guaranteed). The response for a SYNC command is SYNC (sync completed).
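The response encoding in Table 9-17 can be unpacked as in the following sketch, which might be useful in a testbench or bus-monitor model. The type and function names are hypothetical. The sketch assumes the five response bits are packed into an integer with p_snp_resp[0] as the most significant bit, so the bit strings in the table read directly as binary values.

```c
#include <stdint.h>

/* Response type taken from p_snp_resp[0:2] (upper three bits). */
typedef enum {
    SNP_RESP_NULL             = 0, /* 000cc */
    SNP_RESP_AUTOINV          = 1, /* 001cc */
    SNP_RESP_ERROR            = 2, /* 010cc */
    SNP_RESP_SYNC             = 3, /* 01100 */
    SNP_RESP_HIT_CLEAN        = 4, /* 100cc */
    SNP_RESP_HIT_DIRTY        = 5, /* 101cc */
    SNP_RESP_HIT_LOCKED       = 6, /* 110cc */
    SNP_RESP_HIT_DIRTY_LOCKED = 7  /* 111cc */
} snp_resp_type_t;

typedef struct {
    snp_resp_type_t type;
    unsigned requests; /* requests covered: cc = 00 means 1, 11 means 4 */
    int is_hit;        /* nonzero for any of the four HIT responses */
} snp_resp_t;

/* Decode a 5-bit response value (assumed packing: bit 0 of the signal
 * is bit 4 of 'resp', bit 4 of the signal is bit 0 of 'resp'). */
static snp_resp_t snp_resp_decode(uint8_t resp)
{
    snp_resp_t r;
    r.type = (snp_resp_type_t)((resp >> 2) & 0x7u);
    r.requests = (unsigned)(resp & 0x3u) + 1u;
    r.is_hit = (resp & 0x10u) != 0;
    return r;
}
```

For example, the value 0b10001 decodes as HIT Clean covering two collapsed requests, and 0b01100 decodes as SYNC.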
9.21.3
Snoop Request Queue
The snoop request queue provides a queueing mechanism between the snoop command port and the cache.
As requests are accepted from the snoop invalidate port, they are queued into an 8-deep FIFO queue for
arbitration to the cache for tag and status lookup and conditional status clearing.
Snoops can be collapsed within the queue under certain circumstances to minimize the number of
invalidation lookups performed. When two consecutive snoop requests refer to the same cache line, they
are collapsed (timing permitting) into a single snoop invalidation cycle. Collapsed entries are indicated
complete via an encoding of the p_snp_resp[0:4] status outputs.
Snoop invalidation requests have a lower priority than CPU data accesses or change of flow accesses when
only a single queue entry is occupied. This allows for some optimization in cycle-stealing of the tag array
from the CPU in an attempt to minimize CPU stalls. The snoop invalidation request priority is raised when
a snoop sync command is received on the snoop command port or when a sync request is generated on the
synchronization port (p_sync_req_in), regardless of the number of active queue entries.
9.21.4
Snoop Lookup Operation
Entries in the snoop request queue are processed in-order after arbitrating for the cache tag and status bit
arrays. Once the CPU has been stalled from performing further tag accesses, the snoop request queue is
processed by performing a tag lookup and a subsequent status bit write to clear the valid bit of a matching
valid entry. Invalidation hits require two tag array accesses to read and then update the valid bit. A
subsequent snoop lookup may be pipelined while the first lookup of a pair of lookups is being processed
to determine a hit/miss condition. In this manner, a pair of hitting invalidation requests block the CPU for
a total of 5 cycles. A single snoop lookup requires 3 cycles of latency on a miss and 4 cycles on a hit prior
to allowing the CPU to resume cache accesses. If the snoop queue contains enough entries, snoop read and
write accesses to the cache tag are pipelined, and the total blockage is
3 × number_of_hits + number_of_misses + 1. In certain cases where the CPU has pipelined one or more
cache misses, initial snoop accesses are interlaced with CPU tag accesses prior to assuming highest priority
in order to allow for proper operation of linefill and copyback operations initiated by the CPU.
As entries are removed from the queue and the invalidation lookups are performed, the results of the
lookups are provided on the response output signals, along with the original request ID.
9.21.5
Snoop Errors
Errors can occur during snoop lookup operations and are signaled on the snoop response output port. Tag
parity errors that prevent an accurate hit/miss determination on the snoop request address may result in an
error response signaled via p_snp_resp[0:4]. They may also result in a machine check to the CPU for the
INV command if a locked line was invalidated, if a line was dirty and not invalidated, or if a tag parity
error occurred and no hit occurred to a line without error. When such a tag parity error occurs, the
invalidation does not occur to the line(s) with error. The snoop queue continues to be serviced, and the
machine check is not necessarily recoverable. A checkstop condition does not occur, however. In this
respect, it is treated similarly to a non-maskable interrupt, and MSR[RI] should be used accordingly by
software.
9.21.6
Snoop Collisions
Snoop requests may collide with an outstanding or pending cache linefill.
Because there is no particular guarantee of the precise time an actual snoop invalidation lookup occurs
relative to a cache linefill request, in some instances the CPU may be in the process of filling a line
corresponding to a snoop invalidate request. In this case, the snoop causes the linefills to be marked such
that they are not loaded into the cache. However, load miss operations that are in progress may use the data
as it returns. The response for these collisions is based on the state that the cache line would have taken
if the linefill had completed successfully.
Snoop requests should not collide with dirty line copyback or flush operations because the coherent pages
must be marked as write through required. These snoop collisions are ignored.
9.21.7
Snoop Synchronization
Synchronization of the snoop queue occurs under two conditions: a synchronization port sync request and
a snoop command port sync request.
9.21.7.1
Synchronization Port Request
Assertion of the p_sync_req_in signal causes the snoop queue to assume highest priority and be flushed.
It is assumed that the system stops generating snoop requests during a synchronization of the queue to
allow it to drain. However, if snoop requests continue to be received, the acknowledgement of the
synchronization request is delayed until the queue finally drains to the point that all queue entries that were
present prior to the recognition of the sync request have been serviced.
In general, the synchronization port is expected to be utilized to handshake execution of msync
instructions from an alternate CPU. Additional snoop requests do not typically occur until the
synchronization handshake is complete, since no further bus writes will be requested by the alternate CPU.
However, if additional coherency traffic occurs due to another alternate master, it follows the normal
queueing process and does not block the eventual assertion of the p_sync_ack_out signal.
9.21.7.2
Snoop Command Port Request
Receiving a snoop command port snoop sync request encoded via the p_snp_cmd[0:1] inputs causes the
snoop queue to assume highest priority and to be flushed to the point the command has reached the head
of the queue and been acknowledged. After the command has been completed, snoop queue priority
reverts to normal operation, unless another snoop sync command has been received and placed into the
queue, in which case snoop queue priority remains elevated until all snoop sync commands have been
processed from the queue.
9.21.8
Starvation Control
To avoid starvation of a higher priority CPU by a continuous stream of snoop requests from a lower
priority master that blocks CPU forward progress, some form of starvation control is desired. This is
implemented with a forward progress counter that tracks the number of contiguous cycles the CPU has
been prevented from accessing the cache due to snoop command port access requests. Once the count has
been exceeded, the CPU regains highest priority for one access cycle. A similar counter exists for the
snoop queue to allow for periodic snoop request processing when the queue holds only a single entry. Each
counter is 4 bits and causes a priority inversion to occur for tag access upon timeout.
The presence of one or more sync commands in the snoop queue when the counter expires delays the
priority inversion until the queue has been emptied up to the point that the sync(s) have been completed.
Subsequent syncs received while the starvation timeout is being postponed may also prevent the priority
inversion after the original sync(s) have been completed if additional snoops have been queued during the
sync command processing. This is not normally expected to occur in a typical system.
In addition, external logic may be used to implement additional safeguards by monitoring the
p_cac_stalled output, which indicates that the CPU has a pending cache access request blocked due to
snoop access activity.
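The forward progress counter described above can be pictured with a small behavioral model. This is a sketch under stated assumptions, not a description of the actual hardware: the text gives only the counter width (4 bits), so the timeout point used here (a blocked cycle arriving when the counter already holds 15) is an assumption, the names are hypothetical, and the postponement caused by queued sync commands is not modeled.

```c
#include <stdint.h>
#include <stdbool.h>

#define FP_COUNTER_MAX 0xFu /* largest value a 4-bit counter can hold */

typedef struct {
    uint8_t count; /* contiguous cycles the requester has been blocked */
} fp_counter_t;

/* Advance the model by one clock. 'blocked' is true when the requester
 * (CPU or snoop queue) was denied tag access in this cycle. Returns true
 * when the counter times out, which is where a priority inversion for
 * one access would occur. Any unblocked cycle restarts the count. */
static bool fp_counter_tick(fp_counter_t *c, bool blocked)
{
    if (!blocked) {
        c->count = 0;
        return false;
    }
    if (c->count == FP_COUNTER_MAX) {
        c->count = 0; /* assumed: counter restarts after the timeout */
        return true;
    }
    c->count++;
    return false;
}
```

One instance of the model would track the CPU and a second would track the snoop queue, matching the two counters mentioned in the text.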
9.21.9
Queue Flow Control
To avoid overflow of the snoop queue, the p_snp_rdy output is provided to indicate whether an additional
snoop command port request will be accepted on the following clock cycle. When negated, no further
command requests can be honored until a snoop queue entry becomes available.
To provide for flow control of CPU-generated snoop requests to another CPU’s queue, the
p_stall_bus_gwrite input is provided. This input suspends further bus activity that is requesting a global
write cycle. Other bus traffic is not affected.
9.21.10 Snooping in Low Power States
If the clock is running, snooping remains enabled while in the waiting or halted states. Snoops should only
be issued to the core complex while the core is in the normal, halted, or waiting states and both the p_stop
and p_stopped signals are negated.
When a request is made to enter stop mode via the assertion of p_stop, the p_snp_rdy output is negated.
While the core complex is in the stopped (power-down) state, bus snooping is disabled, and the p_snp_rdy
output is held negated. Snoop requests are processed around the assertion of the stop mode entry request
(assertion of p_stop) per the normal protocol associated with p_snp_rdy negation, including acceptance
of a snoop request during a small interval around p_snp_rdy negation. Therefore, additional snoop
operations may need to occur prior to entering the stopped state. All snoop queue entries are processed
prior to the assertion of p_stopped.
Chapter 10
Memory Management Unit
10.1
Overview
The e200z7 memory management unit (MMU) is a 32-bit Power ISA embedded category compliant
implementation, with the following feature set:
• Freescale EIS MMU architecture compliant
• Translates from 32-bit effective to 32-bit real addresses
• 64-entry fully associative TLB with support for twenty-three page sizes (1 KB, 2 KB, 4 KB, 8
KB, 16 KB, 32 KB, 64 KB, 128 KB, 256 KB, 512 KB, 1 MB, 2 MB, 4 MB, 8 MB, 16 MB,
32 MB, 64 MB, 128 MB, 256 MB, 512 MB, 1 GB, 2 GB, 4 GB)
• Hardware assist for TLB miss exceptions
• Software managed by tlbre, tlbwe, tlbsx, tlbsync, and tlbivax instructions
• Support for external control of entry matching for a subset of TID values to support non-intrusive
runtime mapping modifications
10.2
Effective to Real Address Translation
This section describes effective to real address translation. It contains the following subsections:
• Section 10.2.1, “Effective Addresses”
• Section 10.2.2, “Address Spaces”
• Section 10.2.3, “Process ID”
• Section 10.2.4, “Translation Flow”
• Section 10.2.5, “Permissions”
• Section 10.2.6, “Restrictions on 1-KB and 2-KB Page Size Usage”
10.2.1
Effective Addresses
Instruction accesses are generated by sequential instruction fetches or due to a change in program flow
(branches and interrupts). Data accesses are generated by load, store, and cache management instructions.
The e200 instruction fetch, branch, and load/store units generate 32-bit effective addresses. The MMU
translates this effective address to a 32-bit real address that is then used for memory accesses.
The Power ISA embedded category architecture divides the effective (virtual) and real (physical) address
space into pages. The page represents the granularity of effective address translation, permission control,
and memory/cache attributes. The MMU supports twenty-three page sizes (1 KB, 2 KB, 4 KB, 8 KB, 16
KB, 32 KB, 64 KB, 128 KB, 256 KB, 512 KB, 1 MB, 2 MB, 4 MB, 8 MB, 16 MB, 32 MB, 64 MB,
128 MB, 256 MB, 512 MB, 1 GB, 2 GB, 4 GB). For an effective to real address translation to exist, a
valid entry for the page containing the effective address must be in a translation lookaside buffer (TLB).
Addresses for which no TLB entry exists (a TLB miss) cause instruction or data TLB errors.
10.2.2
Address Spaces
Instruction accesses are generated by sequential instruction fetches or due to a change in program flow
(branches and interrupts). Data accesses are generated by load, store, and cache management instructions.
The Power ISA embedded category architecture defines two effective address spaces for instruction
accesses and two effective address spaces for data accesses. The current effective address space for
instruction or data accesses is determined by the value of MSR[IS] and MSR[DS], respectively. The
address space indicator (the value of either MSR[IS] or MSR[DS], as appropriate) is used in addition to
the effective address generated by the processor for translation into a physical address by the TLB
mechanism. Because MSR[IS] and MSR[DS] are both cleared to 0 when an interrupt occurs, an address
space value of 0b0 can be used to denote interrupt-related address spaces (or possibly all system software
address spaces), and an address space value of 0b1 can be used to denote non interrupt-related (or possibly
all user address spaces) address spaces.
The address space associated with an instruction or data access is included as part of the virtual address in
the translation process (AS). The p_tc[1] interface signal indicates the appropriate address space.
10.2.3
Process ID
The Power ISA embedded category architecture defines that a process ID (PID) value is associated with
each effective address (instruction or data) generated by the processor. At the Power ISA embedded
category level, a single PID register is defined as a 32-bit register, and it maintains the value of the PID for
the current process. This PID value is included as part of the virtual address in the translation process
(PID0). For the e200z7 MMU, the PID is 8 bits in length. The most-significant 24 bits are unimplemented
and read as 0. The p_pid0[0:7] interface signals indicate the current process ID.
10.2.4
Translation Flow
The effective address, concatenated with the address space value of the corresponding MSR bit (MSR[IS]
or MSR[DS]), is compared to the appropriate number of bits of the EPN field (depending on the page size)
and the TS field of TLB entries. If the contents of the effective address plus the address space bit matches
the EPN field and TS bit of the TLB entry, that TLB entry is a candidate for a possible translation match.
In addition to a match in the EPN field and TS, a matching TLB entry must match with the current Process
ID of the access (in PID0), or have a TID value of ‘0’, indicating the entry is globally shared among all
processes.
Figure 10-1 shows the translation match logic for the effective address plus its attributes, collectively
called the virtual address, and how it is compared with the corresponding fields in the TLB entries.
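[Figure 10-1 logic diagram: a TLB entry hit requires TLB_entry[V] to be set, TLB_entry[TS] to equal AS
(from MSR[IS] or MSR[DS]), TLB_entry[EPN] to equal the EA page number bits, and either
TLB_entry[TID] to equal the Process ID (private page) or TLB_entry[TID] to equal 0 (shared page).]
Figure 10-1. Virtual Address and TLB-Entry Compare Process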
The page size defined for a TLB entry determines how many bits of the effective address are compared
with the corresponding EPN field in the TLB entry as shown in Table 10-1. On a TLB hit, the
corresponding bits of the real page number (RPN) field are used to form the real address.
Table 10-1. Page Size Field Encodings and EPN Field Comparison

SIZE Field   Page Size (2^SIZE KB)   EA to EPN Comparison
0b00000      1 KB                    EA[0–21] =? EPN[0–21]
0b00001      2 KB                    EA[0–20] =? EPN[0–20]
0b00010      4 KB                    EA[0–19] =? EPN[0–19]
0b00011      8 KB                    EA[0–18] =? EPN[0–18]
0b00100      16 KB                   EA[0–17] =? EPN[0–17]
0b00101      32 KB                   EA[0–16] =? EPN[0–16]
0b00110      64 KB                   EA[0–15] =? EPN[0–15]
0b00111      128 KB                  EA[0–14] =? EPN[0–14]
0b01000      256 KB                  EA[0–13] =? EPN[0–13]
0b01001      512 KB                  EA[0–12] =? EPN[0–12]
0b01010      1 MB                    EA[0–11] =? EPN[0–11]
0b01011      2 MB                    EA[0–10] =? EPN[0–10]
0b01100      4 MB                    EA[0–9] =? EPN[0–9]
0b01101      8 MB                    EA[0–8] =? EPN[0–8]
0b01110      16 MB                   EA[0–7] =? EPN[0–7]
0b01111      32 MB                   EA[0–6] =? EPN[0–6]
0b10000      64 MB                   EA[0–5] =? EPN[0–5]
0b10001      128 MB                  EA[0–4] =? EPN[0–4]
0b10010      256 MB                  EA[0–3] =? EPN[0–3]
0b10011      512 MB                  EA[0–2] =? EPN[0–2]
0b10100      1 GB                    EA[0–1] =? EPN[0–1]
0b10101      2 GB                    EA[0] =? EPN[0]
0b10110      4 GB                    (none)
On a TLB hit, the generation of the physical address occurs as shown in Figure 10-2.
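[Figure 10-2 flow diagram: the address space bit AS (MSR[DS] for data access, MSR[IS] for instruction
fetch), the PID, and the 32-bit effective address (effective page address in bits 0 to n–1, offset in bits
n to 31) form the virtual address presented to the multiple-entry TLB. The RPN field of the matching entry
supplies the real page number (bits 0 to n–1) of the 32-bit real address, and the offset (bits n to 31) is
carried over from the effective address.]
NOTE: n = 32 – log2(page size)
n ≤ 22
n = 20 for 4-KB page size.
Figure 10-2. Effective to Real Address Translation Flow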
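The match and translation steps described in this section can be summarized in a short software model. It is a sketch for illustration only: the structure layout and function names are hypothetical, the EPN and RPN fields are assumed to be stored left-aligned in address bits 0–21 of a 32-bit word, and permission checking is not included.

```c
#include <stdint.h>
#include <stdbool.h>

/* Simplified view of the TLB entry fields used for matching. */
typedef struct {
    bool     v;    /* valid bit */
    uint8_t  ts;   /* translation space, compared against AS */
    uint8_t  tid;  /* translation ID; 0 means shared by all processes */
    uint8_t  size; /* SIZE field: page size is 2^SIZE KB, SIZE = 0..22 */
    uint32_t epn;  /* effective page number, left-aligned (assumed) */
    uint32_t rpn;  /* real page number, left-aligned (assumed) */
} tlb_entry_t;

/* Mask covering the page offset bits (address bits n..31). A 64-bit
 * intermediate keeps the 4-GB case (SIZE = 22) well defined. */
static uint32_t page_offset_mask(uint8_t size)
{
    return (uint32_t)((1ull << (10u + size)) - 1ull);
}

/* Match rule from Figure 10-1: V set, TS equals AS, TID equals the PID
 * or is zero, and the page number bits of EA equal those of EPN. */
static bool tlb_entry_hit(const tlb_entry_t *e, uint32_t ea,
                          uint8_t as, uint8_t pid)
{
    uint32_t page_mask = ~page_offset_mask(e->size);
    return e->v
        && e->ts == as
        && (e->tid == 0 || e->tid == pid)
        && ((ea ^ e->epn) & page_mask) == 0;
}

/* Real address: RPN bits above the offset, EA bits within the offset. */
static uint32_t tlb_real_address(const tlb_entry_t *e, uint32_t ea)
{
    uint32_t off = page_offset_mask(e->size);
    return (e->rpn & ~off) | (ea & off);
}
```

With a 4-KB entry (SIZE = 2) whose EPN is 0x40001000 and RPN is 0x00123000, the effective address 0x40001ABC translates to 0x00123ABC. For a 4-GB page the page mask is zero, so no EA bits are compared, matching the last row of Table 10-1.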
10.2.5
Permissions
An operating system may restrict access to virtual pages by selectively granting permissions for user mode
read, write, and execute, and supervisor mode read, write, and execute on a per page basis. These
permissions can be set up for a particular system (for example, program code might be execute-only, data
structures may be mapped as read/write/no-execute) and can also be changed by the operating system
based on application requests and operating system policies.
The UX, SX, UW, SW, UR, and SR access control bits are provided to support selective permissions
(access control):
• SR: Supervisor read permission. Allows loads and load-type cache management instructions to
access the page while in supervisor mode (MSR[PR] = 0).
• SW: Supervisor write permission. Allows stores and store-type cache management instructions to
access the page while in supervisor mode (MSR[PR] = 0).
• SX: Supervisor execute permission. Allows instruction fetches to access the page and instructions
to be executed from the page while in supervisor mode (MSR[PR] = 0).
• UR: User read permission. Allows loads and load-type cache management instructions to access
the page while in user mode (MSR[PR] = 1).
• UW: User write permission. Allows stores and store-type cache management instructions to
access the page while in user mode (MSR[PR] = 1).
• UX: User execute permission. Allows instruction fetches to access the page and instructions to be
executed from the page while in user mode (MSR[PR] = 1).
If the translation match was successful, the permission bits are checked as shown in Figure 10-3. If the
access is not allowed by the access permission mechanism, the processor generates an Instruction or Data
Storage interrupt (ISI or DSI). The current privilege level of an access is signaled to the MMU with the
CPU’s p_tc[0] output signal.
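[Figure 10-3 logic diagram: given a TLB match (see Figure 10-1), access is granted when the permission
bit selected by MSR[PR] and the access type is set: TLB_entry[UX] or TLB_entry[SX] for an instruction
fetch, TLB_entry[UR] or TLB_entry[SR] for a load-class data access, and TLB_entry[UW] or
TLB_entry[SW] for a store-class data access.]
Figure 10-3. Granting of Access Permission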
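The permission check reduces to selecting one of the six bits based on MSR[PR] and the access class. The sketch below illustrates this selection; the type and function names are hypothetical and the model is not taken from the manual.

```c
#include <stdbool.h>

typedef enum { ACCESS_FETCH, ACCESS_LOAD, ACCESS_STORE } access_t;

/* The six access control bits of a TLB entry. */
typedef struct {
    bool ux, sx; /* user / supervisor execute */
    bool uw, sw; /* user / supervisor write */
    bool ur, sr; /* user / supervisor read */
} tlb_perms_t;

/* Returns true if the access is permitted. 'msr_pr' is true in user
 * mode (MSR[PR] = 1) and false in supervisor mode (MSR[PR] = 0).
 * A false result corresponds to an ISI (fetch) or DSI (load/store). */
static bool access_granted(const tlb_perms_t *p, bool msr_pr, access_t kind)
{
    switch (kind) {
    case ACCESS_FETCH: return msr_pr ? p->ux : p->sx;
    case ACCESS_LOAD:  return msr_pr ? p->ur : p->sr;
    case ACCESS_STORE: return msr_pr ? p->uw : p->sw;
    }
    return false;
}
```

Load-type and store-type cache management instructions would be classified as ACCESS_LOAD and ACCESS_STORE respectively, following the bullet list above.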
10.2.6 Restrictions on 1-KB and 2-KB Page Size Usage
Because of certain implementation limitations regarding coherency lookup operations (lookup is done by
physical address), the low order virtual address bits used to index the cache must match the corresponding
physical address bit value(s) if 1-KB or 2-KB pages are used. These bits are A[20–21] for 1-KB pages and
A20 for 2-KB pages. For example, if logical page X maps to physical page P, then X and P must have the
same values of A[20–21] for 1-KB pages, and A20 for 2-KB pages. This restriction must be followed for
proper CPU operation.
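System software that builds TLB entries could verify this restriction before writing a mapping. The following check is a minimal sketch: the function name is hypothetical, and it assumes that A[n] follows the Power Architecture convention in which A0 is the most significant bit of the 32-bit address.

```c
#include <stdint.h>
#include <stdbool.h>

/* Address bit n in Power numbering (bit 0 is the most significant). */
#define ADDR_BIT(n) ((uint32_t)1u << (31u - (n)))

/* Returns true if mapping effective address 'ea' to real address 'ra'
 * respects the small-page restriction. 'size' is the TLB SIZE field:
 * 0 selects 1-KB pages, 1 selects 2-KB pages, and larger values are
 * not restricted. */
static bool small_page_mapping_ok(uint32_t ea, uint32_t ra, uint8_t size)
{
    uint32_t must_match;

    if (size == 0) {
        must_match = ADDR_BIT(20) | ADDR_BIT(21); /* 1-KB pages */
    } else if (size == 1) {
        must_match = ADDR_BIT(20);                /* 2-KB pages */
    } else {
        must_match = 0;
    }
    return ((ea ^ ra) & must_match) == 0;
}
```

A mapping that differs only in A21 is therefore rejected for a 1-KB page but accepted for a 2-KB page, where A21 is part of the page offset.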
10.3
Translation Lookaside Buffer
The Freescale EIS architecture defines support for zero or more TLBs in an implementation, each with its
own characteristics, and provides configuration information for software to query the existence and
structure of the TLB(s) through a set of special purpose registers: MMUCFG, TLB0CFG, TLB1CFG, etc.
By convention, TLB0 is used for a set associative TLB with fixed page sizes, TLB1 is used for a fully
associative TLB with variable page sizes, and TLB2 is arbitrarily defined by an implementation. The
e200z7 MMU supports a TLB which is fully associative and supports variable page sizes, thus it
corresponds to TLB1.
TLB1 consists of a 64-entry, fully associative CAM array with support for 23 page sizes. To perform a
lookup, the CAM is searched in parallel for a matching TLB entry. The contents of this TLB entry are then
concatenated with the page offset of the original effective address. The result constitutes the real (physical)
address of the access.
A hit to multiple TLB entries is considered to be a programming error. If this occurs, the TLB generates
an invalid address but an exception will not be reported.
Table 10-2 shows the TLB entry bit definitions.
Table 10-2. TLB Entry Bit Definitions

Field         Comments
V             Valid bit for entry
TS            Translation address space (compared against AS bit)
TID[0–7]      Translation ID (compared against PID0 or ‘0’)
EPN[0–21]     Effective page number (compared against effective address)
RPN[0–21]     Real page number (translated address)
SIZE[0–4]     Page size (see Table 10-1)
SX, SW, SR    Supervisor execute, write, and read permission bits
UX, UW, UR    User execute, write, and read permission bits
WIMGE         Translation attributes (write-through required, cache-inhibited, memory coherence
              required, guarded, endian)
U0–U3         User bits (used only by software)
IPROT         Invalidation protect
VLE           VLE page indicator
10.4
Configuration Information
Information about the configuration for a given MMU implementation is available to system software by
reading the contents of the MMU configuration SPRs. These SPRs describe the architectural version of
the MMU, the number of TLB arrays, and the characteristics of each TLB array.
10.4.1
MMU Configuration Register (MMUCFG)
The MMU configuration register (MMUCFG), shown in Figure 10-4, is a 32-bit read-only register that
provides information about the configuration of the e200z7 MMU design.
SPR 1015                                                    Access: Read only

Bits    0–7   8–14     15–16   17–20   21–25     26–27   28–29   30–31
Field   -     RASIZE   -       NPIDS   PIDSIZE   -       NTLBS   MAVN

Figure 10-4. MMU Configuration Register (MMUCFG)
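Software that queries MMUCFG needs to pull the individual fields out of the 32-bit value. The sketch below shows one way to do this with the bit ranges from the register layout. The helper and structure names are hypothetical, and the value 0x004009C4 in the usage note is assembled from the field values listed in Table 10-3 rather than quoted from the manual.

```c
#include <stdint.h>

/* Extract bits first..last of a 32-bit value using Power Architecture
 * numbering, where bit 0 is the most significant bit. */
static uint32_t ppc_field32(uint32_t value, unsigned first, unsigned last)
{
    unsigned width = last - first + 1u;
    uint32_t mask = (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (value >> (31u - last)) & mask;
}

typedef struct {
    uint32_t rasize;  /* bits 8-14: number of real address bits */
    uint32_t npids;   /* bits 17-20: number of PID registers */
    uint32_t pidsize; /* bits 21-25: encoded PID register size */
    uint32_t ntlbs;   /* bits 28-29: encoded number of TLBs */
    uint32_t mavn;    /* bits 30-31: MMU architecture version number */
} mmucfg_t;

/* Decode a raw MMUCFG value, as would be read with mfspr from SPR 1015. */
static mmucfg_t mmucfg_decode(uint32_t raw)
{
    mmucfg_t f;
    f.rasize  = ppc_field32(raw, 8, 14);
    f.npids   = ppc_field32(raw, 17, 20);
    f.pidsize = ppc_field32(raw, 21, 25);
    f.ntlbs   = ppc_field32(raw, 28, 29);
    f.mavn    = ppc_field32(raw, 30, 31);
    return f;
}
```

Decoding 0x004009C4 yields RASIZE = 32, NPIDS = 1, PIDSIZE = 7 (0b00111, which the table describes as 8-bit PID registers), NTLBS = 1 (0b01, which the table describes as two TLB structures), and MAVN = 0.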
Table 10-3 describes the MMUCFG bits.

Table 10-3. MMUCFG Field Descriptions

Bits             Name      Function
0–7 [32–39]      -         Reserved
8–14 [40–46]     RASIZE    Number of Bits of Real Address supported
                           0100000 This version of the MMU implements 32 real address bits
15–16 [47–48]    -         Reserved
17–20 [49–52]    NPIDS     Number of PID Registers
                           0001 This version of the MMU implements one PID register (PID0)
21–25 [53–57]    PIDSIZE   PID Register Size
                           00111 PID registers contain 8 bits in this version of the MMU
26–27 [58–59]    -         Reserved
28–29 [60–61]    NTLBS     Number of TLBs
                           01 This version of the MMU implements two TLB structures: a null TLB0 and
                           a fully-associative TLB for TLB1
30–31 [62–63]    MAVN      MMU Architecture Version Number
                           00 This version of the MMU implements version 1.0 of the Freescale EIS MMU
                           architecture
10.4.2
TLB0 Configuration Register (TLB0CFG)
The TLB0 configuration register (TLB0CFG) is a 32-bit read-only register that provides information about
the configuration of TLB0. Because the e200z7 MMU design does not implement TLB0, this register
reads as all ‘0’. It is supplied to allow software to query it in a fashion compatible with other Freescale EIS
designs. The TLB0CFG register is shown in Figure 10-5.
SPR 688                                                     Access: Read only

Bits    0–7     8–11      12–15     16      17      18      19   20–31
Field   ASSOC   MINSIZE   MAXSIZE   IPROT   AVAIL   P2PSA   -    NENTRY
Reset   All zeros

Figure 10-5. TLB0 Configuration Register (TLB0CFG)
The TLB0CFG bits are described in Table 10-4.

Table 10-4. TLB0CFG Field Descriptions

Bits             Name      Function
0–7 [32–39]      ASSOC     Associativity
                           0
8–11 [40–43]     MINSIZE   Minimum Page Size
                           0
12–15 [44–47]    MAXSIZE   Maximum Page Size
                           0
16 [48]          IPROT     Invalidate Protect Capability
                           0 Not present in TLB0
17 [49]          AVAIL     Page Size Availability
                           0 No variable page sizes available
18 [50]          P2PSA     Power-of-2 Page Size Availability
                           0 No odd powers of 2 page sizes are supported
19 [51]          -         Reserved¹
20–31 [52–63]    NENTRY    Number of Entries
                           0 TLB0 contains 0 entries

¹ These bits are not implemented and will be read as zero.
10.4.3
TLB1 Configuration Register (TLB1CFG)
The TLB1 configuration register (TLB1CFG) is a 32-bit read-only register that provides information about
the configuration of TLB1 in the e200z7 MMU. Figure 10-6 shows the TLB1CFG register.
SPR 689                                                     Access: Read only

Bits    0–7     8–11      12–15     16      17      18      19   20–31
Field   ASSOC   MINSIZE   MAXSIZE   IPROT   AVAIL   P2PSA   -    NENTRY

Figure 10-6. TLB1 Configuration Register (TLB1CFG)
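A decoder for TLB1CFG might look like the sketch below, which lets software discover the TLB geometry instead of hard-coding it. The names are hypothetical. Two assumptions are made: the raw value 0x400BE040 used in the usage note is assembled from the field values in Table 10-5 rather than quoted from the manual, and MINSIZE and MAXSIZE are interpreted as powers of 4 KB, which is consistent with the two encodings the table gives (0x0 for 1 KB and 0xB for 4 GB).

```c
#include <stdint.h>
#include <stdbool.h>

/* Extract bits first..last (bit 0 is the most significant bit). The
 * widths used here are all below 32, so the shift is well defined. */
static uint32_t tlbcfg_field(uint32_t value, unsigned first, unsigned last)
{
    unsigned width = last - first + 1u;
    return (value >> (31u - last)) & ((1u << width) - 1u);
}

typedef struct {
    uint32_t assoc;       /* bits 0-7 */
    uint64_t min_page_kb; /* from MINSIZE, bits 8-11 (assumed 4^n KB) */
    uint64_t max_page_kb; /* from MAXSIZE, bits 12-15 (assumed 4^n KB) */
    bool     iprot;       /* bit 16 */
    bool     avail;       /* bit 17 */
    bool     p2psa;       /* bit 18 */
    uint32_t nentry;      /* bits 20-31 */
} tlbcfg_t;

/* Decode a raw TLBnCFG value, as would be read with mfspr. */
static tlbcfg_t tlbcfg_decode(uint32_t raw)
{
    tlbcfg_t c;
    c.assoc       = tlbcfg_field(raw, 0, 7);
    c.min_page_kb = 1ull << (2u * tlbcfg_field(raw, 8, 11));
    c.max_page_kb = 1ull << (2u * tlbcfg_field(raw, 12, 15));
    c.iprot       = tlbcfg_field(raw, 16, 16) != 0;
    c.avail       = tlbcfg_field(raw, 17, 17) != 0;
    c.p2psa       = tlbcfg_field(raw, 18, 18) != 0;
    c.nentry      = tlbcfg_field(raw, 20, 31);
    return c;
}
```

Decoding 0x400BE040 gives an associativity of 64, 64 entries, a 1-KB minimum page size, a maximum page size of 4194304 KB (4 GB), and all three capability bits set. The same decoder applied to an all-zero TLB0CFG value reports zero entries.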
The TLB1CFG bits are described in Table 10-5.

Table 10-5. TLB1CFG Field Descriptions

Bits             Name      Function
0–7 [32–39]      ASSOC     Associativity
                           0x40 Indicates that TLB1 associativity is 64
8–11 [40–43]     MINSIZE   Minimum Page Size
                           0x0 Smallest page size is 1 KB
12–15 [44–47]    MAXSIZE   Maximum Page Size
                           0xB Largest page size is 4 GB
16 [48]          IPROT     Invalidate Protect Capability
                           1 Invalidate protect capability is supported in TLB1
17 [49]          AVAIL     Page Size Availability
                           1 All page sizes between MINSIZE and MAXSIZE are supported
18 [50]          P2PSA     Power-of-2 Page Size Availability
                           1 All odd powers of 2 page sizes between MINSIZE and MAXSIZE are supported
                           (2K, 8K, 32K, etc.)
19 [51]          -         Reserved
20–31 [52–63]    NENTRY    Number of Entries
                           0x40 Indicates that TLB1 contains 64 entries
10.5
Software Interface and TLB Instructions
The TLB is accessed indirectly through several MMU assist (MAS) registers. Software can write and read
the MMU assist registers with mtspr and mfspr instructions. These registers contain information related
to reading and writing a given entry within the TLB. Data is read from the TLB into the MAS registers
with a tlbre (TLB read entry) instruction. Data is written to the TLB from the MAS registers with a tlbwe
(TLB write entry) instruction.
Certain fields of the MAS registers are also written by hardware when an Instruction TLB Error or Data
TLB Error interrupt occurs.
On a TLB Error interrupt, hardware writes the MAS registers with the proper EA, default attributes
(TID, WIMGE, permissions, etc.), TLB selection information, and the TLB entry to replace.
Software manages this entry selection information by updating a replacement entry value during TLB miss
handling. Software must provide the correct RPN and permission information in one of the MAS registers
before executing a tlbwe instruction.
On taking a DSI or ISI interrupt, software should update the search PID (SPID) and search address space
(SAS) fields in MAS6 using PID0 and the MSR[IS] or MSR[DS] value that was in effect
when the DSI or ISI exception was recognized. Within the interrupt handler, software can issue a
TLB search instruction (tlbsx), which uses the SPID field along with the SAS field, to determine the entry
related to the DSI or ISI exception. (It is possible that the entry which caused the DSI or ISI interrupt no
longer exists in the TLB by the time the search occurs if a TLB invalidate or replacement removes the entry
between the time the exception is recognized and when the tlbsx is executed.)
The tlbre, tlbwe, tlbsx, tlbivax, and tlbsync instructions are privileged.
10.5.1
TLB Read Entry Instruction (tlbre)
The TLB read entry instruction causes the content of a single TLB entry to be placed in the MMU assist
registers. The entry is specified by the TLBSEL and ESEL fields of the MAS0 register. The entry contents
are placed in the MAS1, MAS2, and MAS3 registers. See Table 10-15 for details on how MAS register
fields are updated.
tlbre    TLB Read Entry
Encoding:
  Bits 0–5      31
  Bits 6–20     0
  Bits 21–30    1110110010
  Bit 31        0
tlb_entry_id = MAS0(TLBSEL, ESEL)
result = MMU(tlb_entry_id)
MAS1, MAS2, MAS3 = result
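As a cross-check of the encoding, the 32-bit instruction word can be assembled by hand from its fields. The sketch below is illustrative only: the function name and the enum are hypothetical, bit 0 is taken as the most significant bit as in the encoding diagrams, and the extended opcodes for the other TLB instructions are copied from their descriptions in this section.

```c
#include <stdint.h>

/* Extended opcodes (instruction bits 21-30) as given in this section. */
enum tlb_xo {
    XO_TLBRE   = 0x3B2, /* 0b1110110010 */
    XO_TLBWE   = 0x3D2, /* 0b1111010010 */
    XO_TLBSX   = 0x392, /* 0b1110010010 */
    XO_TLBIVAX = 0x312, /* 0b1100010010 */
    XO_TLBSYNC = 0x236  /* 0b1000110110 */
};

/*
 * Hypothetical helper: build an X-form instruction word.
 * Bits 0-5 hold the primary opcode 31, bits 11-15 RA, bits 16-20 RB,
 * bits 21-30 the extended opcode, and bit 31 is 0.
 * tlbre, tlbwe and tlbsync use ra = rb = 0.
 */
static uint32_t encode_tlb_insn(unsigned ra, unsigned rb, enum tlb_xo xo)
{
    return (31u << 26)
         | ((ra & 0x1Fu) << 16)
         | ((rb & 0x1Fu) << 11)
         | (((uint32_t)xo & 0x3FFu) << 1);
}
```

Such a helper is mainly useful when a toolchain lacks the mnemonic and the word has to be emitted as raw data, or when checking a disassembly by hand.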
10.5.2
TLB Write Entry Instruction (tlbwe)
The TLB write entry instruction causes the contents of certain fields within the MMU assist registers
MAS1, MAS2, and MAS3 to be written into a single TLB entry in the MMU. The entry written is specified
by the TLBSEL and ESEL fields of the MAS0 register.
tlbwe    TLB Write Entry
Encoding:
  Bits 0–5      31
  Bits 6–20     0
  Bits 21–30    1111010010
  Bit 31        0
tlb_entry_id = MAS0(TLBSEL, ESEL)
MMU(tlb_entry_id) = MAS1, MAS2, MAS3
10.5.3
TLB Search Instruction (tlbsx)
The TLB search instruction updates the MMU assist registers conditionally based on success or failure of
a lookup of the TLB. The lookup is controlled by an effective address provided by GPR[RB] as specified
in the instruction encoding, as well as by the SAS and SPID search fields in MAS6. The values placed into
MAS0, MAS1, MAS2, and MAS3 differ depending on a successful or unsuccessful search. See
Table 10-15 for details on how MAS register fields are updated.
tlbsx    TLB Search Indexed
tlbsx RA,RB    Form X
Encoding:
  Bits 0–5      31
  Bits 6–10     0
  Bits 11–15    RA
  Bits 16–20    RB
  Bits 21–30    1110010010
  Bit 31        0
if RA!=0 then EA = GPR(RA) + GPR(RB)
else EA = GPR(RB)
ProcessIDs = MAS6(SPID), 8’b00000000
AS = MAS6(SAS)
VA = AS || ProcessIDs || EA
if Valid_TLB_matching_entry_exists(VA)
then result = see Table 10-15, column labelled “tlbsx hit”
else result = see Table 10-15, column labelled “tlbsx miss”
MAS0, MAS1, MAS2, MAS3 = result
10.5.4
TLB Invalidate (tlbivax) Instruction
The TLB invalidate operation is performed whenever a TLB Invalidate Virtual Address Indexed (tlbivax)
instruction is executed. This instruction invalidates the TLB entries that correspond to the virtual address
calculated by the instruction. The address is detailed in Table 10-6. No information other than that
shown in Table 10-6 is used for the invalidation (the AS and TID values of an entry are don’t-cares).
Additional information about the targeted TLB entries is encoded in two of the lower bits of the effective
address calculated by the tlbivax instruction. Bit 28 of the tlbivax effective address is the TLBSEL field.
This bit should be set to ‘1’ to ensure TLB1 is targeted by the invalidate. Bit 29 of the tlbivax effective
address is the INV_ALL field. If this bit is set, the invalidate operation invalidates all entries of TLB1
except those marked as invalidation protected (entries whose IPROT bit is set to 1).
The bits of EA used to perform the tlbivax invalidation of TLB1 are bits 0–21.
Table 10-6. tlbivax EA Bit Definitions
Bits      Field
0–21      EA[0–21]
22–27     Reserved¹
28        TLBSEL (1 = TLB1). Should be set to 1 for future compatibility.
29        INV_ALL
30–31     Reserved¹
¹ These bits should be zero for future compatibility. They are ignored.
tlbivax    TLB Invalidate Virtual Address Indexed
tlbivax RA,RB    Form X
Encoding:
  Bits 0–5      31
  Bits 6–10     0
  Bits 11–15    RA
  Bits 16–20    RB
  Bits 21–30    1100010010
  Bit 31        0
if RA!=0 then EA = GPR(RA) + GPR(RB)
else EA = GPR(RB)
VA = EA
if (Valid_TLB_matching_entry_exists(VA) or INV_ALL) and Entry_IPROT_not_set
then Invalidate entry
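The effective address passed to tlbivax carries both the page address and the control bits of Table 10-6. A minimal sketch of composing that value follows; the macro and function names are hypothetical, and the masks assume the usual big-endian bit numbering in which bit 0 is the most significant bit of the 32-bit EA.

```c
#include <stdint.h>
#include <stdbool.h>

#define TLBIVAX_EA_PAGE_MASK 0xFFFFFC00u /* EA[0-21] */
#define TLBIVAX_EA_TLBSEL1   0x00000008u /* EA bit 28: target TLB1 */
#define TLBIVAX_EA_INV_ALL   0x00000004u /* EA bit 29: invalidate all unprotected entries */

/*
 * Hypothetical helper: form the value to place in GPR(RB) for
 * "tlbivax 0,RB". Reserved bits 22-27 and 30-31 are left at zero.
 */
static uint32_t tlbivax_make_ea(uint32_t addr, bool inv_all)
{
    uint32_t ea = (addr & TLBIVAX_EA_PAGE_MASK) | TLBIVAX_EA_TLBSEL1;
    if (inv_all) {
        ea |= TLBIVAX_EA_INV_ALL;
    }
    return ea;
}
```

Entries with IPROT set survive either form of the operation, so a full invalidate still leaves protected mappings in place.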
10.5.5
TLB Synchronize Instruction (tlbsync)
The TLB synchronize instruction is treated as a privileged no-op by the e200z7.
tlbsync    TLB Synchronize
tlbsync
Encoding:
  Bits 0–5      31
  Bits 6–20     0
  Bits 21–30    1000110110
  Bit 31        0
10.6
TLB Operations
This section discusses the TLB operations. It consists of the following subsections:
• Section 10.6.1, “Translation Reload”
• Section 10.6.2, “Reading the TLB”
• Section 10.6.3, “Writing the TLB”
• Section 10.6.4, “Searching the TLB”
• Section 10.6.5, “TLB Miss Exception Update”
• Section 10.6.6, “IPROT Invalidation Protection”
• Section 10.6.7, “TLB Load on Reset”
• Section 10.6.8, “The G bit”
10.6.1
Translation Reload
The TLB reload function is performed in software with some hardware assist. This hardware assist consists
of the following:
• Six 32-bit MMU assist registers (MAS0–4, MAS6) for support of the tlbre, tlbwe, and tlbsx TLB
management instructions.
• Loading of MAS0–2 based upon defaults in MAS4 for TLB miss exceptions. This automatically
generates most of the TLB entry.
• Loading of the data exception address register (DEAR) with the effective address of the load, store,
or cache management instruction that caused an Alignment, Data TLB Miss, or Data Storage
Interrupt.
• The tlbwe instruction. When tlbwe is executed, the new TLB entry contained in MAS1–MAS3 is
written into the TLB entry selected by MAS0.
10.6.2
Reading the TLB
The TLB array can be read by first writing the necessary information into MAS0 using mtspr and then
executing the tlbre instruction. To read an entry from the TLB, the TLBSEL field in MAS0 must be set to
01, and the ESEL bits in MAS0 must be set to point to the desired entry. After executing the tlbre
instruction, MAS1–MAS3 are updated with the data from the selected TLB entry.
10.6.3
Writing the TLB
The TLB1 array can be written by first writing the necessary information into MAS0–MAS3 using mtspr
and then executing the tlbwe instruction. To write an entry into the TLB, the TLBSEL field in MAS0 must
be set to 01, and the ESEL bits in MAS0 must be set to point to the desired entry. When the tlbwe
instruction is executed, the TLB entry information stored in MAS1–MAS3 is written into the selected TLB
entry.
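The register images for such a write can be prepared as plain 32-bit values before they are moved to the SPRs. The following is a sketch under stated assumptions, not a definitive implementation: all macro, type, and function names are hypothetical, field positions are taken from the MAS register figures in Section 10.7.3 with bit 0 as the most significant bit, and clearing the in-page offset bits of EPN is a choice made here by analogy with the RPN rule in the MAS3 description.

```c
#include <stdint.h>
#include <stdbool.h>

#define MAS0_TLBSEL_TLB1 0x10000000u                     /* bits 2-3 = 01 */
#define MAS0_ESEL(n)     (((uint32_t)(n) & 0x3Fu) << 16) /* bits 10-15 */
#define MAS1_VALID       0x80000000u                     /* bit 0 */
#define MAS1_IPROT       0x40000000u                     /* bit 1 */
#define MAS1_TID(t)      (((uint32_t)(t) & 0xFFu) << 16) /* bits 8-15 */
#define MAS1_TS(s)       (((uint32_t)(s) & 0x1u) << 12)  /* bit 19 */
#define MAS1_TSIZE(z)    (((uint32_t)(z) & 0x1Fu) << 7)  /* bits 20-24 */
#define MAS2_I           0x00000008u                     /* bit 28 */
#define MAS2_G           0x00000002u                     /* bit 30 */
#define MAS3_SX          0x00000010u                     /* bit 27 */
#define MAS3_SW          0x00000004u                     /* bit 29 */
#define MAS3_SR          0x00000001u                     /* bit 31 */

typedef struct {
    uint32_t mas0, mas1, mas2, mas3;
} tlb1_image;

/* Hypothetical helper: compute MAS0-MAS3 values for one TLB1 entry. */
static tlb1_image make_tlb1_image(unsigned esel, uint32_t ea, uint32_t pa,
                                  unsigned tsize, uint8_t tid, unsigned ts,
                                  uint32_t attrs, uint32_t perms, bool iprot)
{
    /* Page size is 1 KB << tsize; keep only the page-number bits. */
    uint32_t page_mask =
        (uint32_t)~((((uint64_t)1024) << (tsize & 0x1Fu)) - 1u);
    tlb1_image img;

    img.mas0 = MAS0_TLBSEL_TLB1 | MAS0_ESEL(esel);
    img.mas1 = MAS1_VALID | (iprot ? MAS1_IPROT : 0u)
             | MAS1_TID(tid) | MAS1_TS(ts) | MAS1_TSIZE(tsize);
    img.mas2 = (ea & page_mask) | (attrs & 0x3Fu);   /* EPN + VLE/W/I/M/G/E */
    img.mas3 = (pa & page_mask) | (perms & 0x3FFu);  /* RPN + U0-U3 + permissions */
    return img;
}
```

On the target, the four values would then be moved to MAS0–MAS3 with mtspr and committed with tlbwe; that step needs toolchain-specific inline assembly and is left out of the sketch.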
10.6.4
Searching the TLB
The TLB can be searched using the tlbsx instruction by first writing the necessary information into MAS6.
The tlbsx instruction searches using EPN[0–21] from the GPR selected by the instruction, SAS (search
AS bit) in MAS6, and SPID in MAS6. If the search is successful, the given TLB entry information is
loaded into MAS0–MAS3. The valid bit in MAS1 is used as the success flag. If the search is successful,
the valid bit in MAS1 is set; if unsuccessful it is cleared. The tlbsx instruction is useful for finding the TLB
entry that caused a DSI or ISI exception.
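A small sketch of the two software-side steps around tlbsx: forming the MAS6 search context and testing the MAS1 valid bit afterwards. The function names are hypothetical; the field positions assume SPID in MAS6 bits 8–15, SAS in MAS6 bit 31, and VALID in MAS1 bit 0, with bit 0 as the most significant bit.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: MAS6 value for a search with the given PID and AS. */
static uint32_t mas6_search_context(uint8_t spid, unsigned sas)
{
    return ((uint32_t)spid << 16) | (sas & 1u);
}

/* Hypothetical helper: true if the MAS1 value left by tlbsx reports a hit. */
static bool mas1_search_hit(uint32_t mas1)
{
    return (mas1 & 0x80000000u) != 0u;
}
```

In a DSI or ISI handler, the SPID argument would normally come from PID0 and the SAS argument from the MSR[IS] or MSR[DS] value in effect when the exception was recognized.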
10.6.5
TLB Miss Exception Update
When a TLB miss exception occurs, MAS0–MAS3 are updated with the defaults specified in MAS4, and
the AS and EPN[0–21] of the access that caused the exception. In addition, the ESEL bits are updated with
the replacement entry value.
This sets up all the TLB entry data necessary for a TLB write except for the RPN[0–21], the U0–U3 user
bits, and the UX/SX/UW/SW/UR/SR permission bits, all of which are stored in MAS3. Thus, if the
defaults stored in MAS4 are applicable to the TLB entry to be loaded, the TLB miss exception handler will
only have to update MAS3 via mtspr before executing tlbwe. If the defaults are not applicable to the TLB
entry being loaded, the TLB miss exception handler must update MAS0–MAS2 before performing the
TLB write.
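The hardware update described above can be mirrored in a small host-side model, which is convenient for unit-testing a miss handler off-target. This is a sketch of one reading of the "Instr/Data TLB Error" column of Table 10-15, not a description of the actual logic: the type and function names are hypothetical, reserved bits are simply written as zero, and the reserved TIDSELD encodings are treated like TIDZ.

```c
#include <stdint.h>

typedef struct {
    uint32_t mas0, mas1, mas2, mas3, mas4, mas6;
} mas_regs;

/*
 * Hypothetical model of the MAS updates on an Instruction or Data TLB Error.
 * ea   : effective address of the access that missed
 * as   : MSR[IS] (instruction fetch) or MSR[DS] (data access)
 * pid0 : current PID0 value
 */
static void model_tlb_error_update(mas_regs *m, uint32_t ea,
                                   unsigned as, uint8_t pid0)
{
    uint32_t nv      = m->mas0 & 0x0000003Fu;      /* MAS0[NV], bits 26-31 */
    uint32_t tlbseld = m->mas4 & 0x30000000u;      /* MAS4[TLBSELD], bits 2-3 */
    uint32_t tidseld = (m->mas4 >> 16) & 0x3u;     /* MAS4[TIDSELD], bits 14-15 */
    uint32_t tsized  = m->mas4 & 0x00000F80u;      /* MAS4[TSIZED], bits 20-24 */
    uint32_t vwimge  = m->mas4 & 0x0000003Fu;      /* MAS4 VLED + WD..ED, bits 26-31 */
    uint32_t tid     = (tidseld == 0u) ? pid0 : 0u; /* 00 = PID0, 11 = TIDZ */

    m->mas0 = tlbseld | (nv << 16) | nv;           /* ESEL <- NV, NV unchanged */
    m->mas1 = 0x80000000u                          /* VALID = 1, IPROT = 0 */
            | (tid << 16)
            | ((as & 1u) << 12)                    /* TS <- MSR[IS] or MSR[DS] */
            | tsized;                              /* same bit positions in MAS1 */
    m->mas2 = (ea & 0xFFFFFC00u) | vwimge;         /* EPN + default attributes */
    m->mas3 = 0u;                                  /* RPN and access bits zeroed */
    m->mas6 = ((uint32_t)pid0 << 16) | (as & 1u);  /* SPID <- PID0, SAS <- AS */
}
```

After this update, a handler that accepts the MAS4 defaults only has to supply MAS3 before issuing tlbwe, which matches the description above.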
10.6.6
IPROT Invalidation Protection
The IPROT bit is used to protect TLB entries from invalidation. TLB entries with IPROT set are not
invalidated by a tlbivax instruction (even when INV_ALL is indicated), nor by the MMUCSR0[TLB1_FI]
control function. The IPROT bit is used to protect interrupt vectors/handlers because the instruction fetch
of those vectors must be guaranteed to never take a TLB miss exception.
10.6.7
TLB Load on Reset
During reset, all TLB entries except entry 0 are invalidated. TLB entry 0 is loaded with the values in
Table 10-7:
Table 10-7. TLB Entry 0 Values after Reset
Field       Reset Value                  Comments
VALID       1                            Entry is valid
TS          0                            Address space 0
TID[0–7]    0x00                         TID value for shared (global) page
EPN[0–21]   value of p_rstbase[0–21]     Page address present on p_rstbase[0:29].
                                         See Section 11.2.2.5, “Reset Base (p_rstbase[0:29]).”
RPN[0–21]   value of p_rstbase[0–21]     Page address present on p_rstbase[0:29].
                                         See Section 11.2.2.5, “Reset Base (p_rstbase[0:29]).”
SIZE[0–4]   00010                        4 KB page size
SX/SW/SR    111                          Full supervisor mode access allowed
UX/UW/UR    111                          Full user mode access allowed
WIMG        0100                         Cache inhibited, non-coherent
E           value of p_rst_endmode       Value present on p_rst_endmode.
                                         See Section 11.2.2.6, “Reset Endian Mode (p_rst_endmode).”
U0–U3       0000                         User bits
IPROT       1                            Page is protected from invalidation
VLE         value of p_rst_vlemode       Value present on p_rst_vlemode signal.
                                         See Section 11.2.2.7, “Reset VLE Mode (p_rst_vlemode).”
10.6.8
The G bit
The G-bit provides protection from bus accesses that can be cancelled due to an exception on a prior
uncompleted instruction.
If G = 1 (guarded), these types of accesses must stall (if they miss in the cache) until the exception status
of the instruction(s) in progress is known. If G = 0 (unguarded), these accesses may be issued to the bus
regardless of the completion status of other instructions. Since the e200z7 does not make requests to the
bus for load or store instructions which miss in the cache until it is known that prior instructions will
complete without exceptions, proper operation always occurs to guarded storage.
10.7
MMU Control Registers
This section discusses the following registers:
• Section 10.7.1, “Data Exception Address Register (DEAR)”
• Section 10.7.2, “MMU Control and Status Register 0 (MMUCSR0)”
• Section 10.7.3, “MMU Assist Registers (MAS)”
• Section 10.7.4, “MAS Register Updates”
10.7.1
Data Exception Address Register (DEAR)
The data exception address register (DEAR), shown in Figure 10-7, is loaded with the effective address of
the data access that results in an Alignment, Data TLB Miss, or DSI exception.
SPR 61                                                    Access: Read/Write
Bits     0–31
Field    Effective Page Address
Reset    Unaffected
Figure 10-7. Data Exception Address Register
The DEAR register can be read or written using the mfspr and mtspr instructions.
10.7.2
MMU Control and Status Register 0 (MMUCSR0)
The MMU control and status register 0 (MMUCSR0), shown in Figure 10-8, controls the state of the
MMU.
SPR 1012                                                  Access: Read/Write
Bits     0–29   30        31
Field    —      TLB1_FI   —
Reset    All zeros
Figure 10-8. MMU Control and Status Register 0 (MMUCSR0)
The MMUCSR0 bits are described in Table 10-8.
Table 10-8. MMUCSR0—MMU Control and Status Register 0
Bits              Name      Description
0–29 [32–61]      —         Reserved
30 [62]           TLB1_FI   TLB1 flash invalidate
                            0 No flash invalidate
                            1 TLB1 invalidation operation
                            When written to a 1, a TLB1 invalidation operation is initiated by hardware.
                            Once complete, this bit is reset to 0. Writing a 1 while an invalidation
                            operation is in progress will result in an undefined operation. Writing a 0
                            to this bit while an invalidation operation is in progress will be ignored.
                            TLB1 invalidation operations require 3 cycles to complete.
31 [63]           —         Reserved
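Because writing a 1 while an invalidation is in progress is undefined, software typically writes TLB1_FI once and then waits for the bit to read back as 0. The sketch below shows that sequence with register access abstracted behind callbacks. Everything named here is hypothetical: on the target the callbacks would wrap mfspr/mtspr of SPR 1012, and the `fake_` functions are only a host-side stand-in (assumed to stay busy for three reads) so the sequence can be exercised off-target.

```c
#include <stdint.h>
#include <stdbool.h>

#define MMUCSR0_TLB1_FI 0x00000002u /* bit 30, with bit 0 as the MSB */

typedef struct {
    uint32_t (*read)(void *ctx);
    void (*write)(void *ctx, uint32_t value);
    void *ctx;
} mmucsr0_access;

/* Start a TLB1 flash invalidate and poll for completion (bounded). */
static bool tlb1_flash_invalidate(const mmucsr0_access *acc, unsigned max_polls)
{
    acc->write(acc->ctx, MMUCSR0_TLB1_FI);
    for (unsigned i = 0; i < max_polls; i++) {
        if ((acc->read(acc->ctx) & MMUCSR0_TLB1_FI) == 0u) {
            return true;  /* hardware cleared the bit: operation finished */
        }
    }
    return false;         /* still busy after max_polls reads */
}

/* Host-side stand-in for the register, for testing only. */
typedef struct {
    uint32_t value;
    unsigned busy_reads;
} fake_mmucsr0;

static uint32_t fake_read(void *ctx)
{
    fake_mmucsr0 *r = (fake_mmucsr0 *)ctx;
    if (r->busy_reads > 0u && --r->busy_reads == 0u) {
        r->value &= ~MMUCSR0_TLB1_FI;
    }
    return r->value;
}

static void fake_write(void *ctx, uint32_t value)
{
    fake_mmucsr0 *r = (fake_mmucsr0 *)ctx;
    r->value = value;
    r->busy_reads = (value & MMUCSR0_TLB1_FI) ? 3u : 0u;
}
```

Entries with IPROT set are not affected by the flash invalidate, so protected mappings remain valid after the sequence completes.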
10.7.3
MMU Assist Registers (MAS)
The e200z7 uses six special purpose registers (MAS0, MAS1, MAS2, MAS3, MAS4, and MAS6) to
facilitate reading, writing, and searching the TLBs. The MAS registers can be read or written using the
mfspr and mtspr instructions. The e200z7 does not implement the MAS5 register, present in other
Freescale EIS designs, because the tlbsx instruction only searches based on a single SPID value.
Figure 10-9 shows the MAS0 register.
SPR 624                                                   Access: Read/Write
Bits     0–1   2–3           4–9   10–15   16–25   26–31
Field    —     TLBSEL (01)   —     ESEL    —       NV
Reset    Unaffected
Figure 10-9. MMU Assist Register 0 (MAS0)
Table 10-9 describes the fields.
Table 10-9. MAS0 —MMU Read/Write and Replacement Control
Bit               Name      Comments, or Function when Set
0–1 [32–33]       —         Reserved
2–3 [34–35]       TLBSEL    Selects TLB for access: 00=TLB0, 01=TLB1
                            (ignored by the e200, should be written to 01 for future compatibility)
4–9 [36–41]       —         Reserved
10–15 [42–47]     ESEL      Entry select for TLB.
16–25 [48–57]     —         Reserved
26–31 [58–63]     NV        Next replacement victim for TLB1 (software managed). Software updates this
                            field; it is copied to the ESEL field on a TLB Error (see Table 10-15).
The MAS1 register is shown in Figure 10-10.
SPR 625                                                   Access: Read/Write
Bits     0       1       2–7   8–15   16–18   19   20–24   25–31
Field    VALID   IPROT   —     TID    —       TS   TSIZE   —
Reset    Unaffected
Figure 10-10. MMU Assist Register 1 (MAS1)
Table 10-10 describes the fields.
Table 10-10. MAS1—Descriptor Context and Configuration Control
Bit               Name      Comments, or Function when Set
0 [32]            VALID     TLB Entry Valid
                            0 This TLB entry is invalid
                            1 This TLB entry is valid
1 [33]            IPROT     Invalidation Protect
                            0 Entry is not protected from invalidation
                            1 Entry is protected from invalidation as described in Section 10.6.6,
                              “IPROT Invalidation Protection.”
                            Protects TLB entry from invalidation by tlbivax (TLB1 only), or flash
                            invalidates through MMUCSR0[TLB1_FI].
2–7 [34–39]       —         Reserved
8–15 [40–47]      TID       Translation ID bits
                            This field is compared with the current process IDs of the effective address
                            to be translated. A TID value of 0 defines an entry as global and matches
                            with all process IDs.
16–18 [48–50]     —         Reserved
19 [51]           TS        Translation address space
                            This bit is compared with the IS or DS fields of the MSR (depending on the
                            type of access) to determine if this TLB entry may be used for translation.
20–24 [52–56]     TSIZE     Entry’s page size
                            Supported page sizes are:
                            0b00000–1 KB
                            0b00001–2 KB
                            0b00010–4 KB
                            0b00011–8 KB
                            0b00100–16 KB
                            0b00101–32 KB
                            0b00110–64 KB
                            0b00111–128 KB
                            0b01000–256 KB
                            0b01001–512 KB
                            0b01010–1 MB
                            0b01011–2 MB
                            0b01100–4 MB
                            0b01101–8 MB
                            0b01110–16 MB
                            0b01111–32 MB
                            0b10000–64 MB
                            0b10001–128 MB
                            0b10010–256 MB
                            0b10011–512 MB
                            0b10100–1 GB
                            0b10101–2 GB
                            0b10110–4 GB
                            All other values are undefined
25–31 [57–63]     —         Reserved
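The TSIZE encodings listed in Table 10-10 follow a simple rule: the page size is 1 KB shifted left by the TSIZE value, up to 0b10110 (4 GB). A minimal sketch of the conversion in both directions, with hypothetical function names and a 64-bit result so that the 4 GB size is representable:

```c
#include <stdint.h>

#define TSIZE_MAX 0x16u /* 0b10110, 4 GB; larger encodings are undefined */

/* Page size in bytes for a TSIZE encoding, or 0 for an undefined encoding. */
static uint64_t tsize_to_bytes(unsigned tsize)
{
    if (tsize > TSIZE_MAX) {
        return 0u;
    }
    return (uint64_t)1024 << tsize;
}

/* TSIZE encoding for an exact page size in bytes, or -1 if unsupported. */
static int bytes_to_tsize(uint64_t bytes)
{
    for (unsigned t = 0; t <= TSIZE_MAX; t++) {
        if (((uint64_t)1024 << t) == bytes) {
            return (int)t;
        }
    }
    return -1;
}
```

The same encoding applies to the TSIZED default in MAS4, since that field is copied into TSIZE on a TLB miss.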
Figure 10-11 shows the MAS2 register.
SPR 626                                                   Access: Read/Write
Bits     0–21   22–25   26    27   28   29   30   31
Field    EPN    —       VLE   W    I    M    G    E
Reset    Unaffected
Figure 10-11. MMU Assist Register 2 (MAS2)
Table 10-11 describes the fields.
Table 10-11. MAS2—EPN and Page Attributes
Bit               Name      Comments, or Function when Set
0–21 [32–53]      EPN       Effective page number [0–21]
22–25 [54–57]     —         Reserved¹
26 [58]           VLE       Power ISA VLE
                            0 This page is a standard Power ISA page
                            1 This page is a Power ISA VLE page
                            This bit will always read as zero and writes will be ignored if
                            p_vle_present is negated.
27 [59]           W         Write-through Required
                            0 This page is considered write-back with respect to the caches in the system
                            1 All stores performed to this page are written through to main memory
28 [60]           I         Cache Inhibited
                            0 This page is considered cacheable
                            1 This page is considered cache-inhibited
29 [61]           M         Memory Coherence Required
                            0 Memory Coherence is not required
                            1 Memory Coherence is required
30 [62]           G         Guarded
                            0 Accesses to this page are not guarded and can be performed before it is
                              known if they are required by the sequential execution model
                            1 All loads and stores to this page are performed without speculation
                              (i.e. they are known to be required)
                            See Section 9.16, “Page Table Control Bits,” for more information on how
                            the e200z7 uses the guarded attribute.
31 [63]           E         Endianness
                            0 The page is accessed in big-endian byte order.
                            1 The page is accessed in true little-endian byte order.
                            Determines endianness for the corresponding page. Refer to Section 11.2.5,
                            “Byte Lane Specification,” for more information.
¹ These bits are not implemented, will be read as zero, and writes are ignored.
The MAS3 register is shown in Figure 10-12.
SPR 627                                                   Access: Read/Write
Bits     0–21   22   23   24   25   26   27   28   29   30   31
Field    RPN    U0   U1   U2   U3   UX   SX   UW   SW   UR   SR
Reset    Unaffected
Figure 10-12. MMU Assist Register 3 (MAS3)
Table 10-12 describes the fields.
Table 10-12. MAS3—RPN and Access Control
Bit               Name      Comments, or Function when Set
0–21 [32–53]      RPN       Real page number [0–21]
                            Only bits that correspond to a page number are valid. Bits that represent
                            offsets within a page are ignored and should be zero.
22–25 [54–57]     U0-U3     User bits [0–3] for use by system software
26–31 [58–63]     PERMIS    Permission bits (UX, SX, UW, SW, UR, SR)
The MAS4 register is shown in Figure 10-13.
SPR 628                                                   Access: Read/Write
Bits     0–1   2–3            4–13   14–15     16–19   20–24    25   26     27   28   29   30   31
Field    —     TLBSELD (01)   —      TIDSELD   —       TSIZED   —    VLED   WD   ID   MD   GD   ED
Reset    Unaffected
Figure 10-13. MMU Assist Register 4 (MAS4)
Table 10-13 describes the fields.
Table 10-13. MAS4—Hardware Replacement Assist Configuration Register
Bit               Name      Comments, or Function when Set
0–1 [32–33]       —         Reserved
2–3 [34–35]       TLBSELD   Default TLB selected
                            00 TLB0
                            01 TLB1
4–13 [36–45]      —         Reserved
14–15 [46–47]     TIDSELD   Default PID# to load TID from
                            00 PID0
                            01 Reserved, do not use
                            10 Reserved, do not use
                            11 TIDZ (0x00) (Use all zeros, the globally shared value)
16–19 [48–51]     —         Reserved
20–24 [52–56]     TSIZED    Default TSIZE value
25 [57]           —         Reserved
26 [58]           VLED      Default VLE value
27–31 [59–63]     DWIMGE    Default WIMGE values
The MAS6 register is shown in Figure 10-14.
SPR 630                                                   Access: Read/Write
Bits     0–7   8–15   16–30   31
Field    —     SPID   —       SAS
Reset    Unaffected
Figure 10-14. MMU Assist Register 6 (MAS6)
Table 10-14 describes the fields.
Table 10-14. MAS6—TLB Search Context Register 0
Bit               Name      Comments, or Function when Set
0–7 [32–39]       —         Reserved
8–15 [40–47]      SPID      PID value for searches
16–30 [48–62]     —         Reserved
31 [63]           SAS       AS value for searches
10.7.4
MAS Register Updates
Table 10-15 details the updates to each MAS register field for each update type.
Table 10-15. MMU Assist Register Field Updates
Bit/Field                  MAS affected  Instr/Data TLB Error  tlbsx hit                          tlbsx miss      tlbre                          tlbwe  ISI/DSI
TLBSEL                     0             TLBSELD               ‘Hitting TLB’                      TLBSELD         NC                             NC     NC
ESEL                       0             NV                    matched entry                      NV              NC                             NC     NC
NV                         0             NC                    NC                                 NC              NC                             NC     NC
VALID                      1             1                     1                                  0               V(array)                       NC     NC
IPROT                      1             0                     Matched IPROT if TLB1 hit, else 0  0               IPROT(array) if TLB1, else 0   NC     NC
TID[0–7]                   1             TIDSELD (pid0,TIDZ)   TID(array)                         SPID            TID(array)                     NC     NC
TS                         1             MSR(IS/DS)            SAS                                SAS             TS(array)                      NC     NC
TSIZE[0–4]                 1             TSIZED                TSIZE(array)                       TSIZED          TSIZE(array)                   NC     NC
EPN[0–21]                  2             I/D EPN               EPN(array)                         tlbsx EPN       EPN(Array)                     NC     NC
VWIMGE                     2             Default values        VWIMGE(array)                      Default values  VWIMGE(array)                  NC     NC
RPN[0–21]                  3             Zeroed                RPN(Array)                         Zeroed          RPN(Array)                     NC     NC
ACCESS (PERMISS + U0:U3)   3             Zeroed                Access(Array)                      Zeroed          Access(Array)                  NC     NC
TLBSELD                    4             NC                    NC                                 NC              NC                             NC     NC
TIDSELD[0–1]               4             NC                    NC                                 NC              NC                             NC     NC
TSIZED[0–4]                4             NC                    NC                                 NC              NC                             NC     NC
Default VWIMGE             4             NC                    NC                                 NC              NC                             NC     NC
SPID                       6             PID0                  NC                                 NC              NC                             NC     NC
SAS                        6             MSR(IS/DS)            NC                                 NC              NC                             NC     NC
10.8
TLB Coherency Control
The e200 core allows invalidation of a TLB entry as described in the Power ISA embedded category
architecture. The tlbivax instruction invalidates local TLB entries only. No broadcast is performed, as no
hardware-based coherency support is provided.
The tlbivax instruction invalidates by effective address only. This means that only the TLB entry’s EPN
bits are used to determine if the TLB entry should be invalidated. It is therefore possible for a single
tlbivax instruction to invalidate multiple TLB entries, since the AS and TID fields of the entries are
ignored.
10.9
Core Interface Operation for MMU Control Instructions
MMU control instructions are carried out over the normal CPU interface. The
address bus is driven with the effective address value calculated by the instruction (if any). The access is
treated as a Supervisor Data word-size write, and the Transfer Type encodings are used to distinguish these
operations from other load and store operations. These transfers do not cause debug data address compare
matches to occur regardless of the effective address that is driven.
10.9.1
Transfer Type Encodings for MMU Control Instructions
Transfer type encodings are used to indicate whether a normal access, atomic access, cache management
control access, or MMU management control access is being requested. These attribute signals are driven
with addresses when an access is requested. Table 10-16 shows the definitions of the p_d_ttype[0:5]
encodings.
Table 10-16. Transfer Type Encoding
p_d_ttype[0:5]¹   Transfer Type                       Instruction
00000e            Normal                              Normal loads/stores
000010            Atomic                              lbarx, lharx, lwarx, stbcx., sthcx., and stwcx.
00010e            Flush Data Block                    dcbst
00011e            Flush and Invalidate Data Block     dcbf
00100e            Allocate and Zero Data Block        dcbz
001010            Invalidate Data Block               dcbi
00110e            Invalidate Instruction Block        icbi
001110            Multiple word load/store            lmw, stmw
010000            TLB Invalidate                      tlbivax
010010            TLB Search                          tlbsx
010100            TLB Read entry                      tlbre
010110            TLB Write entry                     tlbwe
011000            Touch for Instruction               icbt
011010            Lock Clear for Instruction          icblc
011100            Touch for Instruction and Lock Set  icbtls
011110            Lock Clear for Data                 dcblc
10000e            Touch for Data                      dcbt
10001e            Touch for Data Store                dcbtst
100100            Touch for Data and Lock Set         dcbtls
100110            Touch for Data Store and Lock Set   dcbtstls
¹ p_ttype[5] ‘e’ is set to 0.
10.10 Effect of Hardware Debug on MMU Operation
Hardware debug facilities utilize normal CPU instructions to access register and memory contents during
a debug session. If desired during a debug session, the debug firmware may disable the translation process
and may substitute default values for the access protection (UX, UR, UW, SX, SR, SW) bits, and values
obtained from the OnCE control register for page attribute (VLE, W, I, M, G, E) bits normally provided by
a matching TLB entry. In addition, no address translation is performed, and instead, a 1:1 mapping of
effective to real addresses is performed. While translation is disabled during the debug session, no TLB miss or TLB
access-protection-related DSI conditions occur. If the debugger wants to use the normal translation
process, the MMU can be left enabled in the OnCE OCR, and normal translation (including the possibility
of a TLB Miss or DSI) remains in effect.
Refer to Section 13.4.6.3, “e200 OnCE Control Register (OCR),” for more detail on controlling MMU
operation during debug sessions.
10.11 External Translation Alterations for Realtime Systems
To support realtime systems in which dynamic mapping of calibration or other data types is needed, the
MMU provides special capabilities on a subset of TLB entries. These capabilities allow external hardware
to dynamically select one of multiple mappings to one or more physical pages by the same logical address.
This capability provides an inexpensive way of dynamically overlaying selected RAM pages on top of
read-only memory during runtime. The particular physical page a given logical page maps to can be
dynamically altered by means of the p_extpid[6:7] inputs. This capability is only provided for TLB1
entries 0–15, and only for a restricted subset of PID values.
The p_extpid_en control input controls the enabling of the dynamic mapping capability. This input is
sampled with the rising edge of the clock, and when asserted, allows the use of the dynamic remapping
capability.
When one or more of TLB1 entries 0–15 is programmed with a TID value of 0b1111xxxx, special
entry-specific logic is enabled for the entry. This logic causes the sampled values of the p_extpid[6:7]
inputs to be used in place of PID0[6–7] for the purposes of comparison of this entry with the current PID0
register contents to determine an entry hit condition.
In addition, for those entries within entries 0–15 programmed with a TID value of 0b1111xx11, the
comparison of TID[6–7] to PID0[6–7] for a match is always forced true. This means that the hit condition
for these entries is independent of the sampled values of the p_extpid[6:7] inputs.
Entries within entries 0–15 programmed with a TID value of 0b1111nm00 match a PID0 value of
0b1111nmxx when p_extpid[6:7] inputs are 00. Those programmed with a TID value of 0b1111nm01
match a PID0 value of 0b1111nmxx when p_extpid[6:7] inputs are 01. Those programmed with a TID
value of 0b1111nm10 match a PID0 value of 0b1111nmxx when p_extpid[6:7] inputs are 10. Those
entries within entries 0–15 programmed with a TID value of 0b1111nm11 match a PID0 value of
0b1111nmxx regardless of the sampled values of the p_extpid[6:7] inputs.
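The matching rules above can be summarized in a small software model of the TID/PID portion of the entry compare. This is a sketch of one interpretation of the text, not the hardware logic: the function name and parameters are hypothetical, the V, TS, and EPN compares are left out, and TID bit 0 is taken as the most significant bit of the 8-bit field.

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * Hypothetical model of the TID/PID compare for one TLB1 entry.
 * tid       : TID stored in the entry
 * pid0      : current PID0 value
 * entry     : TLB1 entry number (0-63)
 * extpid_en : sampled p_extpid_en input
 * extpid    : sampled p_extpid[6:7] inputs (2 bits)
 */
static bool tid_compare_hits(uint8_t tid, uint8_t pid0, unsigned entry,
                             bool extpid_en, unsigned extpid)
{
    if (tid == 0u) {
        return true;                      /* shared (global) page */
    }

    /* Special logic: enabled input, entries 0-15, TID = 0b1111xxxx. */
    bool special = extpid_en && entry <= 15u && (tid & 0xF0u) == 0xF0u;
    if (!special) {
        return tid == pid0;               /* ordinary private-page compare */
    }

    if ((tid & 0xFCu) != (pid0 & 0xFCu)) {
        return false;                     /* TID[0:5] must equal PID0[0:5] */
    }
    if ((tid & 0x03u) == 0x03u) {
        return true;                      /* 0b1111xx11: bits 6-7 compare forced true */
    }
    return (tid & 0x03u) == (extpid & 0x03u); /* p_extpid replaces PID0[6:7] */
}
```

With this model, entries programmed with TID values ending in 00, 01, and 10 are selected one at a time by the external inputs, while an entry ending in 11 matches for every selection.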
This logic allows application software of this type to set up to three independent mappings for a set of
calibration pages and for external hardware to select between one of the three based on the driven values
of the p_extpid[6:7] inputs. The other pages are mapped with a common set of entries with stored TID
values of 1111xx11, which match for all sets of calibration page selections. This specialized software must
use PID values in the range of 111100xx to 111111xx.
Software is responsible for coordinating the modification to the p_extpid[6:7] inputs to ensure they only
change when there is no possibility of an error induced by simultaneous use.
Figure 10-15 shows the equivalent logical operation of the capability.
[Figure 10-15 is a logic diagram of the TLB entry hit condition. Signals shown: TLB_entry[V];
TLB_entry[TS] compared with AS (from MSR[IS] or MSR[DS]); TLB_entry[EPN] compared with the EA page
number bits; TLB_entry[TID] compared with the process ID (private page) or tested for zero (shared
page); p_extpid_en and TLB_entry[TID0:3] enabling the selection of p_extpid6:7 in place of
Process ID[6:7] as modified_PID[6:7]; TLB_entry[TID6:7] forcing the compare true for PID/TID 6:7
(mask_TID6:7_cmp).]
Note: Functionality available for entry #0–15 only
Figure 10-15. External Translation Alteration TLB Entry Compare Process
Chapter 11
External Core Complex Interfaces
This chapter describes the external interfaces to the e200z7 core complex. This chapter also documents
signal descriptions and data transfer protocols.
The external interfaces encompass control and data signals supporting instruction and data transfers as
well as support for interrupts, including vectored interrupt logic, reset support, power management
interface signals, debug event signals, time base control and status information, processor state
information, Nexus 1/3/OnCE/JTAG interface signals, and a test interface.
The memory portion of the e200 core interface consists of a pair of 64-bit wide standard AHB system
buses, one for instructions and the other for data. The data memory interface supports read and write
transfers of 8, 16, 24, 32, and 64 bits, supports misaligned transfers, supports true big- and little-endian
operating modes, and operates in a pipelined fashion. The instruction memory interface supports read
transfers of 16, 32, and 64 bits, supports misaligned transfers, supports true big- and little-endian
operating modes, and operates in a pipelined fashion.
The memory interface supported by the BIUs is based on the AHB 2.v6 definition. Additional sideband
signals have been added to support additional control functions.
NOTE
The AHB bit and byte ordering reflect a natural little-endian ordering, as
used by the AMBA documentation. The e200z7 BIU automatically
performs the necessary byte lane conversions to support big-endian
transfers. Memories and peripheral devices/interfaces should be wired
according to byte lane addresses defined in Section 11.2.5, “Byte Lane
Specification,” and Table